Wrapup of Visualizing Biological Data ’15

From 24 to 27 March I visited the Broad Institute of MIT and Harvard in Boston to attend the VizBi 2015 conference. The aim of this conference is to advance knowledge in the visualization of biological data; the 2015 edition was the 6th international meeting. Here is a long-overdue recap of two talks that I found particularly interesting.

On Wednesday John Stasko kicked off as keynote speaker with some very interesting notions about the different applications of visualization: it is either for presentation (= explanatory) or for analysis (= exploratory). This difference is important because each has its own goals. When presenting results the goals are to clarify, focus, highlight, simplify and persuade. When analyzing data the goals are to explore, make decisions and use statistical descriptors.

A good quote also came by here: “If you know what you are looking for, you probably don’t need visualizations.”

So when you do decide you need a visualization, it is most useful for analysis (= exploratory). In this case it can help if you:

  • don’t know what you are looking for
  • don’t have a priori questions
  • want to know what questions to ask

So typically these kinds of visualizations show all variables, illustrate overview and detail, and facilitate comparison. A consequence is that “analysis visualizations” are difficult to understand: because the underlying data is complex, the visualization is probably complex too. This is not a bad thing, but the user needs to invest time to decode the visualization.

A perfect example of an exploratory visualization is the Attribute Explorer from 1998 [1]. Here the authors used the notion of compromise to analyze a dataset. For example, when searching for a new house you might look at the price, the commuting time and the number of bedrooms. However, when setting a hard limit on each of these attributes you might miss the house that has a perfect price and number of bedrooms but a commute that is just 5 minutes too long. The paper shows that by implementing coupled histograms the user is still able to see these “compromise solutions”. The PDF of the article is available here, showing some old-school histograms.
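The idea of “compromise solutions” can be sketched in a few lines of JavaScript. This is only a minimal illustration and not the Attribute Explorer itself: the houses, the limits and the function names (failedLimits, groupByMisses) are all made up for this example. Instead of discarding every house that fails a limit, the sketch counts how many limits each house misses, so the near misses stay visible.

```javascript
// Hypothetical example data: houses and limits are invented for illustration.
const houses = [
  { id: 'A', price: 240000, commuteMin: 25, bedrooms: 3 },
  { id: 'B', price: 230000, commuteMin: 35, bedrooms: 3 },
  { id: 'C', price: 320000, commuteMin: 50, bedrooms: 1 },
];

// Each limit is a predicate on one attribute.
const limits = {
  price: (h) => h.price <= 250000,
  commuteMin: (h) => h.commuteMin <= 30,
  bedrooms: (h) => h.bedrooms >= 3,
};

// Return the names of the limits a house fails.
function failedLimits(house) {
  return Object.keys(limits).filter((name) => !limits[name](house));
}

// Group houses by the number of limits they miss:
// 0 = satisfies everything, 1 = compromise candidate, and so on.
function groupByMisses(items) {
  const groups = {};
  for (const item of items) {
    const failed = failedLimits(item);
    if (!groups[failed.length]) {
      groups[failed.length] = [];
    }
    groups[failed.length].push({ id: item.id, failed: failed });
  }
  return groups;
}

const groups = groupByMisses(houses);
console.log('All limits satisfied:', groups[0]);
console.log('Near misses (one limit violated):', groups[1]);
```

With this made-up data, house B shows up as a near miss because only its commute exceeds the limit, which is exactly the kind of house a hard filter would hide.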

The concepts of the Attribute Explorer from 1998 are nowadays still relevant

The takeaway: a visualization is radically different depending on whether one presents the data or analyses it.

An often-encountered problem with visualization is high data complexity: too high to visualize in one go. There are a few options to tackle this:

  • pack all the data into one complex representation
  • spread the data over multiple coordinated views (pixels are John’s friend)
  • use interaction to reveal different subsets of the data

When interacting with data, users have different intents. A 2007 InfoVis paper co-authored by Stasko [2] describes seven of them:

  1. Select
  2. Explore
  3. Reconfigure
  4. Encode
  5. Abstract/Elaborate
  6. Filter
  7. Connect

However, 95% of the interactions are made up of Tooltip & Selection (to get details), Navigation, and Brushing & Linking. This gives rise to a chicken-and-egg problem: why are only those 4 used so extensively, and how can one make a visualization more effective?

An example Stasko showed was the use of a tablet[3] where there is a whole wealth of new gestures available, as is best illustrated in this video:

As a conclusion Stasko gives his own formula that captures the value of visualization.

Value of Visualization = Time + Insight + Essence + Confidence:

  • T: Ability to minimize the total time needed to answer a wide variety of questions about the data
  • I: Ability to spur and discover insights or insightful questions about the data
  • E: Ability to convey an overall essence or take-away sense of the data
  • C: Ability to generate confidence and trust about the data, its domain and context

On Friday Daniel Evanko (@devanko) from Nature Publishing Group spoke about the future of visualizations in publications. There is currently a big gap between the rich data sets that people publish and the way these are incorporated in scientific articles. Evanko made some interesting points from a publisher’s perspective.

The current “rich” standards such as PDF are probably good for a dozen years to come; however, newer formats built on D3, Java or R can break or become unsupported at any time. On the other hand, basic print formats such as paper or microfilm can be kept for 100 years. Although this is a conservative standpoint, in my opinion it does make sense to keep the long-term perspective in mind when releasing new publication formats, because who says Java will be supported in 20 years? However, I think that with thorough design the community should be able to come up with defined standards that have the lifetime of microfilm.

Another argument Evanko used was that the few papers published with interactive visualizations do not generate a lot of traffic, from which the conclusion was drawn that the audience doesn’t want this kind of visualization, so publishers will not offer it. Again I feel we may be dealing with a chicken-and-egg problem here.

I’m grateful to the Otto Mønsteds Fond for providing support to attend VizBi ’15.

 

References

  1. Spence R, Tweedie L: The Attribute Explorer: information synthesis via exploration. Interact Comput 1998, 11:137–146.
  2. Yi JS, Kang YA, Stasko JT, Jacko JA: Toward a Deeper Understanding of the Role of Interaction in Information Visualization. IEEE Trans Vis Comput Graph 2007, 13:1224–1231.
  3. Sadana R, Stasko J: Designing and implementing an interactive scatterplot visualization for a tablet computer. Proc. 2014 Int Work. 2014:265–272.

 

Filed under Talk

Recap of the Nanopore sequencing conference ‘London Calling’ by ONT

Last Thursday and Friday Oxford Nanopore Technologies (ONT) hosted its first conference, ‘London Calling’, where participants of the MinION Access Program (MAP) presented their results and experiences after 11 months of the program. The CTO of ONT also delivered a session in which the future directions were outlined. Below is a quick recap of the two days of London Calling.

There were about 20 talks (agenda) by a broad range of scientists, from microbiologists to bioinformaticians. A few observations I found interesting to share:

  • John Tyson (University of British Columbia) wrote a script that slightly alters the voltage during the run to keep the yield curve linear; he uses this method as standard for each of his runs
  • The majority of the presenters use only the 2D reads
  • A nice month-by-month overview of the MAP program can be found in Nick Loman’s talk here
  • Miles Carroll (Public Health England), Josh Quick (University of Birmingham) and Thomas Hoenen (NIH/NIAID) went to Africa last year to sequence the Ebola virus outbreak and were able to map the outbreak on a phylogenetic timescale; they used RT-PCR to generate the input material. The main conclusions were that field sequencing with the MinION works, that the Ebola mutation rate is not higher than that of other viruses, and that key drug targets are not mutating.
  • People are exploring a lot of options to use it in a clinical setting, for example for rapid identification of bacterial infections (Justin O’Grady, University of East Anglia) or for pharmacogenomics (Ron Ammar, University of Toronto); in short, which drugs not to prescribe to patients because their liver cannot metabolise them due to a genetic variant, read the paper here.
  • A detailed account of how to assemble a bacterial genome with only Nanopore data by Jared Simpson can be found on Slideshare; it’s an interactive version of this pre-print
  • Currently MinION + MiSeq data is the way to go for genome assembly in the short term (according to Mick Watson). Alistair Darby (University of Liverpool) argued for using just one sequencing technology to perform the whole genome assembly, because too much time is wasted integrating the different sequencing methods with different algorithms.

DNA sequencing becomes really personal now

During the talks some requests were put forward:

  • More automation for lib prep / a faster lib prep protocol (this will be tackled with VolTRAX and/or a bead protocol for low-input material, and a 10-minute protocol for 1D reads announced by CTO Clive Brown)
  • More stable performance between individual flow cells
  • Base calling off-line so no need to connect to the cloud
  • Tweaking the base caller for base pair modifications (for example methylation)

On Thursday afternoon Clive Brown, the CTO of ONT, gave his talk. On Twitter it was compared to a “Steve Jobs style” product reveal.

A few points he presented:

  • At the end of the year or next year there will be a new MinION release that has the ASIC electronics not in the flow cell but in the MinION itself; this would drastically cut the price of the flow cells (from $1000 to $25). Another big change is that the chip will contain 3000 channels instead of 512. Furthermore, the runtime of these devices will be around 2 weeks.
  • All shipments should soon be possible at room temperature
  • A “fast mode” will be available within the next 3 months, in which a typical run will generate not 2 Gbase but 40 Gbase of data.
  • VolTRAX is being developed: it can be clicked onto a flow cell and will automate the full lib prep process. ONT imagines that users can load a mL of blood sample on the VolTRAX and it will be prepped automatically.
  • At the same time ONT will implement a different price structure where you pay per hour of sequencing instead of per flow cell, so you can run a MinION for 3 hours, pay, say, $270 and nothing else.
  • The PromethION (roughly 48 MinIONs in one machine, with more channels per chip) will be launched with sequencing core facilities as the main customer in mind; however, ONT will create an access program for this (PEAP) as well. The PromethION will include the above improvements too, making it potentially more productive than a HiSeq.

Oxford Nanopore Technologies CTO Clive Brown showcasing the automatic sample preparation device VolTRAX.

In conclusion the conference atmosphere was very upbeat with a lot of enthusiasm for the future of nanopore sequencing. Can’t wait to get this MinION started.

 

 

Filed under Talk

3D printing: Prevent warping with ABS

A common problem when 3D printing with ABS is the warping that occurs when printing larger objects. Warping is the bending of the outer edges of the printed object due to shrinkage of the ABS as it cools down. There are already plenty of solutions around (e.g. this Instructable, where Kapton tape is used) but I found this one to work particularly well without any Kapton tape.

I found it works best to put leftover pieces of ABS in acetone and let them dissolve for about an hour (preferably use a glass jar, since acetone dissolves several common plastics). The resulting solution becomes pitch black and is a bit viscous. Next, heat up the glass bed of the printer (110 °C) and apply a thin layer of the ABS solution along the outline of your print. Watch out for the acetone fumes, since acetone is a very flammable liquid! Use a ventilated room.

When the first layer of the brim is printed, add drops of the ABS solution on the corners of the brim. This will partly dissolve the brim but makes it stick even better to the plate.

Note that this ONLY works with ABS and not with PLA because PLA does not dissolve in acetone.

Filed under 3Dprinting

Book review: Guide to Information Graphics

I found “The Wall Street Journal Guide to Information Graphics” in a bookshop and thought I would give it a go. It is written by Dona M. Wong, a former student of Edward Tufte. The book’s aim is to be a “reference to be put on one’s desk”; the question is who should put it on his or her desk and who should not. Spread over five chapters, the book stays at a very basic level of graphic design. Chapter two has a nice layout showing Do’s on the right page and Don’ts on the left page, and contains a lot of useful tips (e.g. try to avoid a legend in a line graph and instead label the lines directly at their right-hand end), but the book rarely explains the theory behind the advice. For this, one has to go back to Tufte’s “The Visual Display of Quantitative Information”. Continue reading

Filed under Book Review

A step closer in culturing “unculturable” bacteria?

Last week the first truly novel antibiotic since the ’90s saw the light. The authors made the discovery by diluting a soil sample in agar and casting it in a matrix with tiny holes. Next, this matrix chamber (called the iChip) was sealed with a semi-permeable membrane and placed back in the soil. After a month colonies were picked, and after co-culturing with S. aureus the authors found the new antibiotic. The compound, teixobactin, is produced by a large biosynthetic gene cluster of the previously uncharacterized bacterium Eleftheria terrae. The new antibiotic is active against S. aureus but also against M. tuberculosis and probably a whole range of other Gram-positives. Back to the culturing: with the iChip the authors claim that the growth recovery “approaches 50%, as compared to 1% of cells from soil that will grow on a nutrient Petri dish.”

iChip in action (Photo: Slava Epstein / Northeastern University)

It is thus encouraging to see that by reconstituting the environmental cues (by placing the iChip back in the soil) a bigger fraction of the microorganisms is able to grow. This back-to-nature approach has a parallel in the in vitro culturing of eukaryotic cells, where supplementing with fetal calf serum is used to reintroduce as many growth-stimulating cues as possible. The question remains whether the limit of this method has been hit or whether the recovery rate can be ramped up even further.

Another paper, published last month in Applied and Environmental Microbiology, ramped up growth in a different way. The authors systematically investigated the effect of autoclaving the phosphate buffer together with the agar (which is currently the practice in most labs) or separately from it. It turns out that autoclaving the components separately makes a night-and-day difference in the number of cells that grow on the plates. Figure 3 in the paper shows about 8 × 10^7 CFU/g of soil when phosphate and agar were not autoclaved together, compared to 3 × 10^7 CFU/g of soil when they were autoclaved together. In other words, a roughly 2.7-fold increase in colony formation (8 / 3 ≈ 2.7). In the conclusion the authors even report a 50-fold increase in CFU. The reason for this difference lies in the formation of hydrogen peroxide in the agar when it is autoclaved together with the phosphate buffer. The authors include a figure that indeed shows a correlation between an increase in H2O2 and phosphate buffer.

What can be learned from these two articles? First of all that it remains very challenging to culture a large fraction of the microbes out there and second the process is, after 100 years of cell culturing, still being improved.

Ling, L. L. et al. A new antibiotic kills pathogens without detectable resistance. Nature (2015). doi:10.1038/nature14098 [$]

Tanaka, T. et al. A hidden pitfall in agar media preparation undermines cultivability of microorganisms. Appl. Environ. Microbiol. 80, 7659–7666 (2014). [$]

Filed under Science Article

Validate DNA FASTA file with a JavaScript function

I couldn’t find a quick function that validates a single DNA FASTA file in JavaScript, so here is a setup (it uses the same structure as this protein FASTA validator from the Oxford Protein Informatics Group).

function validateDNA(seq) {
	// Based on: http://www.blopig.com/blog/2013/03/a-javascript-function-to-validate-fasta-sequences/

	// remove leading and trailing whitespace
	seq = seq.trim();

	// split on newlines (Unix or Windows line endings)
	var lines = seq.split(/\r?\n/);

	// if the input starts with a FASTA header, drop that first line
	if (seq[0] === '>') {
		lines.splice(0, 1);
	}

	// join the remaining lines back into a single string without newlines
	// or leading/trailing spaces
	seq = lines.join('').trim();

	// search for characters that are not G, A, T, C or whitespace (case-insensitive)
	if (seq.search(/[^gatc\s]/i) !== -1) {
		// the seq string contains non-DNA characters
		return false;
		// The next line can be used instead to return a cleaned version of the DNA:
		// return seq.replace(/[^gatc]/gi, "");
	}

	// the seq string contains only G, A, T and C
	return true;
}
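When a file fails validation it is often handy to know where the offending characters are. Below is a minimal sketch of a companion helper along the same lines; findInvalidBases is a hypothetical name and not part of the function above or of the blopig validator. It assumes a single-record FASTA string and reports each non-GATC character with its 1-based position in the sequence.

```javascript
// Hypothetical helper: list every character that is not G, A, T or C,
// together with its 1-based position in the sequence (header excluded).
function findInvalidBases(fasta) {
  const lines = fasta.trim().split(/\r?\n/);

  // Drop a single FASTA header line, if present.
  if (lines.length > 0 && lines[0].startsWith('>')) {
    lines.shift();
  }

  // Remove all whitespace so positions refer to the sequence itself.
  const seq = lines.join('').replace(/\s/g, '');

  const problems = [];
  for (let i = 0; i < seq.length; i++) {
    if (!/[gatc]/i.test(seq[i])) {
      problems.push({ position: i + 1, character: seq[i] });
    }
  }
  return problems;
}

// The N in the second line is the 11th base of the joined sequence.
console.log(findInvalidBases('>seq1\nGATTACA\nGATNACA'));
```

An empty result array means the sequence contains only G, A, T and C, which matches the case where validateDNA returns true.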

Have fun

Filed under Tools

Measuring light intensity with the TSL-235 and a Raspberry Pi

There are many ways to measure light intensity using a Raspberry Pi, and one of them (that doesn’t require an ADC) is the TSL-235. It is a photodiode connected to a small circuit that generates a pulse train whose frequency depends on the light intensity. This frequency can be read out directly using a Raspberry Pi as explained below.
Continue reading
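As a rough illustration of the readout principle (and not the code from the full post), the frequency can be estimated by timestamping the rising edges of the sensor output and dividing the number of periods by the elapsed time. The sketch below only covers that calculation; frequencyFromEdges is a hypothetical name, the timestamps are simulated, and how you capture real edges (which GPIO library, which pin) is left open because it depends on your setup.

```javascript
// Estimate the pulse frequency in Hz from rising-edge timestamps given in
// microseconds. Capturing the timestamps from a GPIO pin is not shown here.
function frequencyFromEdges(timestampsUs) {
  if (timestampsUs.length < 2) {
    return null; // at least two edges are needed to measure one period
  }
  const first = timestampsUs[0];
  const last = timestampsUs[timestampsUs.length - 1];
  const elapsedUs = last - first;
  if (elapsedUs <= 0) {
    return null; // timestamps must be increasing
  }
  // N edges span N - 1 full periods; 1e6 converts microseconds to seconds.
  return ((timestampsUs.length - 1) * 1e6) / elapsedUs;
}

// Simulated edges of a 1 kHz signal: one rising edge every 1000 µs.
const simulatedEdges = [0, 1000, 2000, 3000, 4000];
console.log(frequencyFromEdges(simulatedEdges) + ' Hz');
```

Counting over a longer window averages out timing jitter, at the cost of a slower reading; converting the frequency into an absolute light intensity requires the calibration data from the sensor datasheet.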

Filed under Tools

What do a TIM barrel and flavodoxin have in common?

Last month Birte Höcker, an experimental and theoretical protein scientist from the Max Planck Institute for Developmental Biology, published an interesting article in Nature Chemical Biology. The general challenge described is that sequences that look different at first inspection can give rise to very similar 3D structures. It also shows a nice combination of bioinformatics complemented with experimental work. Central in this case are the two folds (Figure 1, blue and green):

  • TIM barrel, also known as the (βα)8-barrel, consisting of 8 β-strands in the core and 8 α-helices around them
  • Flavodoxin, which folds with α-helices on both sides sandwiching 5 β-strands in the core.

So the common theme: they both have α-helices on the outside and sheets on the inside.

The underlying question here is: How is one fold converted to the other?

Continue reading

Filed under Science Article

Highlights of the International Synthetic and Systems Biology Summer School 2014

Last week I joined the International Synthetic and Systems Biology Summer School in Taormina, Italy, and as the title describes it was all about synthetic and systems biology, with some pretty cool speakers. Ron Weiss talked about the general principles of genetic circuits and their current limitations (the record is currently 12 different synthetic promoters in one designed network). Sarpeshkar focused on the stochastic nature of cells and the associated noise; he showed how cells can be simulated or mirrored using analog circuits. Paul Freemont took Weiss’ design principles and showed how to apply them to different examples; he also elaborated on an efficient way of characterizing new circuits and parts. Tanja Kortemme, a former postdoc from the Baker lab, gave an introduction to computational protein design and used some neat examples to show its power (and limitations). Below are some highlights and the relevant links to the literature that was discussed.

  Continue reading

Filed under Talk

Setting up Syncthing for Raspberry Pi

Nowadays everything is apparently in the cloud. However, the cloud comes in many different sizes and shapes; in my case primarily in the form of Dropbox. There are always downsides to “outsourcing” your data to an undefined cloud. Some popular alternatives include BitTorrent Sync (not open source, and your data is still touched by a third party), ownCloud (an open source Dropbox clone, although its performance on a Raspberry Pi is not very smooth) and, since December 2013, Syncthing. The last one is a kind of open source implementation of BitTorrent Sync, so in contrast to Dropbox your data is distributed over your own computers and not stored on a distant server. In my setup I use a headless Raspberry Pi that is tucked away in a cupboard as a node in the network, so this client is always online. Continue reading

Filed under Tools