porecamp: a great week of nanopore sequencing

This week I attended Porecamp at the University of Birmingham, a workshop focused on the use of the MinION nanopore sequencer. The workshop was hosted by Nick Loman and included interactive sessions with Matt Loose, Mick Watson, Josh Quick, John Tyson, Justin O’Grady and Jared Simpson, so pretty much every aspect of nanopore sequencing, from library preparation to assembly polishing, was covered. Below is a brief overview of the activities; a detailed account will soon be written up in an F1000 article by the participants.

Everyone had the opportunity to bring DNA samples to try in the new ‘native barcoding protocol’. This pre-release protocol allows pooling of multiple samples on one flow cell by attaching a barcode to each individual sample in an extra ligation step. The initial results looked pretty good, in the sense that it should be possible to obtain an equal distribution of DNA from a pooled library. It also became evident that using high-quality DNA improves the output from the MinION. When working with genomic DNA, the best strategy is to start from a fresh culture, extract directly with phenol-chloroform and avoid freezing the DNA before the library prep.

Josh explaining the library prep protocol

John Tyson and Matt Loose thoroughly demonstrated the use of software add-ons to improve the process. John’s scripts optimize the way the sequencer selects the correct pore to sequence from, and Matt’s minoTour software lets you analyse the data in real time as it comes off the sequencer. He also showed some pretty cool initial results of the read-until feature, for example to balance the reads of a pooled sample.

Matt performing a -1 G nanopore run

On the bioinformatics side, after diving into the fast5 file format, we gave Heng Li’s new assembler miniasm a try, resulting in a very rapid genome assembly. It will be interesting to see how miniasm will find its way into the assembly pipelines.

Concluding, this was an extremely valuable week to get to know everyone and exchange knowledge on the latest practices in the nanopore sequencing world. So again, a big thanks for the perfect organization.

The course material is available on GitHub, and additional information can be found on Twitter under #porecamp.

Filed under Course

deFUME webserver paper published last week!

Last week we published our deFUME paper in the open access journal BMC Research Notes. The aim was to create an easy-to-use web-based interface for processing, annotation and visualization of functional metagenomics sequencing data, specifically targeting wet-lab scientists (i.e. non-bioinformaticians).
A quick intro into functional metagenomics: it’s a subfield of the more widely known metagenomics. The term metagenomics was first introduced by Handelsman and Clardy in 1998 and refers to extracting DNA from the environment (the metagenome) and studying it by either sequencing or functional analysis. The first approach does what the name says: extract and sequence as much DNA as possible, then use bioinformatics tools to try to determine gene functions. In this way Hess et al. [2] were able to computationally identify 27,755 putative carbohydrate-active genes in cow rumen. A drawback of this method, however, is that these genes still need to be experimentally validated.

Different phenotypes that can be observed when expressing a metagenomic library, for example halo formation, pigmentation or morphological changes.

Functional metagenomics works, in that sense, the other way around: a metagenomic library is transformed into a laboratory host (for example E. coli) and cultured while monitoring for a phenotypic change. For example, if one is looking for proteases, the agar plate can be supplemented with milk, and colonies creating a halo can be deemed positive for proteolytic activity. These colonies can subsequently be sequenced and the predicted genes functionally annotated. For this last process we created the deFUME webserver, which integrates the whole process from vector trimming to domain annotation into one pipeline.

The workflow of deFUME is visualized in the figure below, where processes are depicted in red and (intermediate) files in black:

deFUME web server flowchart; processes are in red and files/objects in black. From [1]

As input, deFUME takes either Sanger chromatograms (as .ab1 files) or, in the case of a next-generation sequencing run, the assembled nucleotide sequences in FASTA format. In the next steps the data is processed and annotated with BLAST and InterPro data, leaving the user to work with the results in an interactive table, for example to filter on E-value, remove hypothetical proteins, or show more or less detail. Finally, the annotations can be exported in FASTA or GenBank format, or as a simple CSV file.

Why would you use the webserver?

  1. It’s free for academic users
  2. It saves time compared to, for example, running the same workflow in CLC
  3. It’s easy because you don’t spend time on intermediate files, for example vector trimming the contigs and pushing those to BLAST.

Screenshot of deFUME showing the functional annotations (A) and the interactive toolbox (B). From [1]

So where did this idea originate from?

It actually started out in the summer of 2013 with a small project at the CIID (Copenhagen Institute of Interaction Design) where we designed all kinds of interactive visualizations. In the lab we had a functional metagenomic data set lying around, but some colleagues found it challenging to analyze the data and interact with it. So out of curiosity I made the following sketch (on GitHub) in Processing that would, based on InterPro data, give a quick overview of the sequences and annotated InterPro domains.

Screenshot of the initial sketch made in Processing

This small Processing sketch was a direct hit, and the idea arose to make this kind of interaction more widely available. One basic necessity would be to also include the data processing in the visualization, so the user only has to push one button to get an interactive visualization.
Therefore we implemented a backend that runs on the Center for Biological Sequence analysis (CBS) servers at the Technical University of Denmark (DTU) and handles the data pipeline, from basecalling to BLASTing. Another quick realization was that a Processing sketch is not extremely portable or user-friendly; a web interface, on the other hand, would be. Therefore we built a table-based module (using jqGrid) to display the functional annotations and used the HTML5 canvas to draw a visual representation of the data. We used JavaScript to let the different components talk to each other and some D3.js to display a histogram of GO terms. On the backend, the pipeline is implemented in Perl and all the data is structured and stored in a single JSON object that is delivered to the client using PHP.
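To make the client-server handover concrete, here is a minimal sketch of what such a single JSON object and its retrieval could look like; the field names and the endpoint URL are hypothetical, not the actual deFUME schema:

// Hypothetical shape of the JSON object the backend could deliver
// (field names are illustrative; the real deFUME schema may differ).
var example = {
	contigs: [
		{
			id: "contig_1",
			length: 1423,
			orfs: [
				{
					start: 12,
					end: 1130,
					blastHit: "beta-lactamase [Escherichia coli]",
					eValue: 1e-42,
					interproDomains: ["IPR000871"]
				}
			]
		}
	]
};

// The client could then fetch such an object from a PHP endpoint
// (the URL below is made up for this sketch) and hand it to the
// table and canvas modules.
var xhr = new XMLHttpRequest();
xhr.open('GET', '/defume/results.php?job=1234', true);
xhr.onload = function () {
	var data = JSON.parse(xhr.responseText);
	console.log(data.contigs.length + ' contigs loaded');
};
xhr.send();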

What is next?
We are very happy with the current version, but while developing we already came across a number of features that would make a great appearance in version 2, for example EcoCyc integration, reporting of GC content over the stretch of the contig, exporting the InterPro annotations in the GenBank file, and optimizing the coloring scheme. So in case you are a student and interested in working on deFUME, you can drop me an email.

The deFUME paper can be found here, the webserver here, and a working example here. Contributions can be made to the deFUME GitHub repository.

[1] van der Helm, E., Geertz-Hansen, H. M., Genee, H. J., Malla, S. & Sommer, M. O. A. deFUME: Dynamic exploration of functional metagenomic sequencing data. BMC Res. Notes 8, 328 (2015).

[2] Hess, M. et al. Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331, 463–7 (2011).

Filed under Publications

Ultimaker: replacing the temperature sensor

Last week the Ultimaker 2 gave an ominous ERROR – STOPPED TEMP SENSOR message.

The Ultimaker 2 temperature error

After consulting Ultimaker support and measuring the resistance over the Pt100 sensor in the printer head (only 138 Ohm when heated up, which would correspond to only about 100 °C), the culprit was quickly identified. Luckily the Ultimaker support page contains a very elaborate step-by-step instruction on how to replace the Pt100 sensor. Although the instruction is very clear, it takes quite some time to perform all of the disassembly and subsequent assembly steps to replace the Pt100. Also be sure to replace the temperature sensor and not the heating element, since they both have the same shape; the heating element is only slightly bigger.
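For reference, the 138 Ohm reading translates to a temperature via the standard linear Pt100 approximation (100 Ohm at 0 °C, roughly 0.385 Ohm per °C); a quick back-of-the-envelope sketch:

// Approximate a Pt100 temperature from its resistance using the
// common linear approximation R(T) = R0 * (1 + alpha * T).
function pt100Temperature(resistanceOhm) {
	var R0 = 100;        // resistance in Ohm at 0 °C
	var alpha = 0.00385; // typical temperature coefficient per °C
	return (resistanceOhm / R0 - 1) / alpha;
}

console.log(pt100Temperature(138)); // ~98.7 °C, i.e. the hot end reads only ~100 °C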

Heating element on the left and the new Pt100 temperature sensor on the right

After removing the temperature sensor from the heat block with the help of some WD-40, it was pretty clear that the sensor was, for unknown reasons, completely destroyed. Replacing the Pt100 with a fresh one from the factory directly solved the problem, and we are happily printing again.

The broken Pt100 temperature sensor

Filed under 3Dprinting

Wrap-up of Visualizing Biological Data ’15

From the 24th till the 27th of March I visited the Broad Institute of Harvard and MIT in Boston to attend the VizBi 2015 conference. The scope of this conference is to advance knowledge on the visualization of biological data; the 2015 iteration was the 6th international meeting. Below is a long-overdue recap of two talks that I thought were particularly interesting.

On Wednesday John Stasko kicked off as keynote speaker with some very interesting notions about the different applications of visualization: it should be used either for presentation (=explanatory) or for analysis (=exploratory). This difference is important since each has its own goals. For example, when presenting results the goals are to clarify, focus, highlight, simplify and persuade, whereas when analyzing data the goal is to explore, make decisions and use statistical descriptors.

A good quote also passed by here: “If you know what you are looking for, you probably don’t need visualizations”.

So when you do decide you need a visualization, it is most useful for analysis (=exploratory). In this case it can help you if you:

  • Don’t know what you are looking for
  • Don’t have a priori questions
  • Want to know what questions to ask

Typically these kinds of visualizations show all variables, illustrate overview and detail, and facilitate comparison. A consequence of this setup is that “analysis visualizations” are difficult to understand: the underlying data is complex, so the visualization is probably also complex. This is not a bad thing, but the user needs to invest time to decode the visualization.

A perfect example of an exploratory visualization is the Attribute Explorer from 1998 [1]. Here the authors used the notion of compromise to analyze a dataset. For example, when searching for a new house you might look at the price, the commuting time and the number of bedrooms. However, when setting a hard limit on each of these attributes, you might miss the house that has a perfect price and number of bedrooms but is just a 5-minute-longer commute away. The paper shows that by implementing coupled histograms the user is still able to see these “compromise solutions”. The PDF of the article is available here, showing some old-school histograms.

The concepts of the Attribute Explorer from 1998 are nowadays still relevant
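To make the compromise idea concrete, here is a toy sketch (made-up data, not from the paper) of scoring items by how many criteria they satisfy instead of hard-filtering them:

// Toy illustration of the Attribute Explorer idea: instead of discarding
// items that fail a hard filter, count how many criteria each item
// satisfies, so near-misses ("compromise solutions") stay visible.
var houses = [
	{ price: 250000, commuteMin: 30, bedrooms: 3 },
	{ price: 240000, commuteMin: 35, bedrooms: 3 }, // near-miss on commute
	{ price: 400000, commuteMin: 20, bedrooms: 2 }
];

var criteria = [
	function (h) { return h.price <= 260000; },
	function (h) { return h.commuteMin <= 30; },
	function (h) { return h.bedrooms >= 3; }
];

houses.forEach(function (h) {
	var satisfied = criteria.filter(function (test) { return test(h); }).length;
	// A hard filter would keep only satisfied === criteria.length;
	// coloring by 'satisfied' keeps the compromise solutions in view.
	console.log(JSON.stringify(h) + ' satisfies ' + satisfied + '/' + criteria.length + ' criteria');
});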

The takeaway: a visualization is radically different depending on whether one presents the data or analyses the data.

An often-encountered problem with visualization is that the data complexity is too high to visualize in one go. There are a few options to tackle this:

  • pack all the data in one complex representation
  • spread the data into multiple coordinated views (pixels are John’s friend)
  • use interaction to reveal different subsets of the data

When interacting with data, users have different intents; a 2007 InfoVis paper by Stasko [2] describes seven of them:

  1. Select
  2. Explore
  3. Reconfigure
  4. Encode
  5. Abstract/Elaborate
  6. Filter
  7. Connect

However, 95% of the interactions are made up of tooltip & selection (to get details), navigation, and brushing & linking. This gives rise to a chicken-and-egg problem: why are only those four intents used so extensively, and how can one make a visualization more effective?

An example Stasko showed was the use of a tablet [3], where a whole wealth of new gestures is available, as best illustrated in this video:

As a conclusion, Stasko gave his own formula that captures the value of visualization.

Value of Visualization = Time + Insight + Essence + Confidence:

  • T: Ability to minimize the total time needed to answer a wide variety of questions about the data
  • I: Ability to spur and discover insights or insightful questions about the data
  • E: Ability to convey an overall essence or take-away sense of the data
  • C: Ability to generate confidence and trust about the data, its domain and context

On Friday Daniel Evanko (@devanko) from Nature Publishing Group spoke about the future of visualizations in publications. There is currently a big gap between all the rich data sets that people publish and the way these are incorporated in scientific articles. Evanko made some interesting points from a publisher’s perspective.

The current “rich” standards such as PDF are probably good for a dozen years to come, whereas newer formats such as D3, Java and R can break or become unsupported at any time. On the other hand, basic print formats such as paper or microfilm can be kept for 100 years. Although this is a conservative standpoint, in my opinion it indeed makes sense to keep the long-term perspective in mind when releasing new publication formats, because who says Java will be supported in 20 years? However, I think that with thorough design the community should be able to come up with defined standards that have the lifetime of microfilm.

Another argument Evanko made was that the few papers published with interactive visualizations do not generate a lot of traffic, from which the conclusion was drawn that the audience doesn’t want these kinds of visualizations, so publishers will not offer them. Again, I feel we may be dealing with a chicken-and-egg problem here.

I’m grateful to the Otto Mønsteds Fond for providing support to attend VizBi ’15.

References

  1. Spence R, Tweedie L: The Attribute Explorer: information synthesis via exploration. Interact Comput 1998, 11:137–146.
  2. Yi JS, Kang YA, Stasko JT, Jacko JA: Toward a Deeper Understanding of the Role of Interaction in Information Visualization. IEEE Trans Vis Comput Graph 2007, 13:1224–1231.
  3. Sadana R, Stasko J: Designing and implementing an interactive scatterplot visualization for a tablet computer. Proceedings of the 2014 International Working Conference on Advanced Visual Interfaces (AVI ’14), 2014:265–272.

Filed under Talk

Recap of the Nanopore sequencing conference ‘London Calling’ by ONT

Last Thursday and Friday, Oxford Nanopore Technologies (ONT) hosted its first conference, ‘London Calling’, where participants of the MinION Access Program (MAP) presented their results and experiences after 11 months of the program. The CTO of ONT also delivered a session where the future directions were outlined. Below is a quick recap of the two days of London Calling.

There were about 20 talks (agenda) by a broad range of scientists, from microbiologists to bioinformaticians. A few observations I found interesting to share:

  • John Tyson (University of British Columbia) wrote a script that slightly alters the voltage during the run to keep the yield curve linear; he now uses this method as standard for each of his runs
  • The majority of the presenters only use the 2D reads
  • A nice month-by-month overview of the MAP program can be found in Nick Loman’s talk here
  • Miles Carroll (Public Health England), Josh Quick (University of Birmingham) and Thomas Hoenen (NIH/NIAID) went to Africa last year to sequence the Ebola virus outbreak and were able to map the outbreak on a phylogenetic timescale; they used RT-PCR to generate the input material. The main conclusions were that field sequencing with the MinION works, that the Ebola mutation rate is not higher than that of other viruses, and that key drug targets are not mutating
  • People are exploring a lot of options to use the MinION in a clinical setting, for example for rapid identification of bacterial infections (Justin O’Grady, University of East Anglia) or for pharmacogenomics (Ron Ammar, University of Toronto); in short, determining which drugs not to prescribe to patients because their liver cannot metabolise them due to a genetic variant; read the paper here
  • A detailed account of how to assemble a bacterial genome with only nanopore data, by Jared Simpson, can be found on SlideShare; it’s an interactive version of this pre-print
  • Currently, MinION + MiSeq data is, according to Mick Watson, the way to go for genome assembly in the short term. Alistair Darby (University of Liverpool) argued to just use one sequencing technology for the whole genome assembly, because too much time can be (and is) wasted integrating all the different sequencing methods with different algorithms

DNA sequencing becomes really personal now

During the talks some requests were put forward:

  • More automation for library prep / a faster library prep protocol (this will be tackled with VolTRAX and/or a bead protocol for low-input material, plus a 10-minute protocol for 1D reads announced by CTO Clive Brown)
  • More stable performance between individual flow cells
  • Offline basecalling, so there is no need to connect to the cloud
  • Tweaking the basecaller for base modifications (for example methylation)

On Thursday afternoon there was the talk by Clive Brown, the CTO of ONT. On Twitter the talk was compared to a “Steve Jobs style” way of revealing new products.

A few points he presented:

  • At the end of this year or next year there will be a new MinION release that has the ASIC electronics not in the flow cell but in the MinION itself, which would drastically cut the price of the flow cells (from $1000 to $25). Another big change is that the chip will contain 3000 channels instead of 512. Furthermore, the runtime of these devices will be around 2 weeks.
  • All shipments should be at room temperature soon
  • A “fast mode” will be available within the next 3 months, where a typical run will generate not 2 Gbase of data but 40 Gbase.
  • VolTRAX is being developed, a device that can be clicked onto a flow cell and will automate the full library prep process; they imagine users can load a mL of blood sample onto the VolTRAX and it will be prepped automatically.
  • At the same time, ONT will implement a different price structure where you pay per hour of sequencing instead of per flow cell, so you can just run a MinION for 3 hours, pay, say, $270, and not pay anything else.
  • The PromethION (think 48 MinIONs in one machine, with more channels per chip) will be launched with sequencing core facilities as the main customer in mind; however, ONT will create an access program (PEAP) for it as well. The PromethION will include the above improvements too, making it potentially more productive than a HiSeq.
Oxford Nanopore Technologies CTO Clive Brown showcasing the automatic sample preparation device VolTRAX.

In conclusion, the conference atmosphere was very upbeat, with a lot of enthusiasm for the future of nanopore sequencing. I can’t wait to get this MinION started.

Filed under Talk

3D printing: preventing warping with ABS

A common problem when 3D printing with ABS is the warping that occurs when printing larger objects. Warping is the bending of the outer edges of the printed object due to shrinkage of the ABS as it cools down. There are already plenty of solutions around (e.g. this Instructable, where Kapton tape is used), but I found the following one works particularly well without any Kapton tape.

I found it works best to dissolve leftover pieces of ABS in acetone and let them dissolve for about an hour (preferably in a glass jar, since acetone dissolves several common plastics). The resulting solution is pitch black and a bit viscous. Next, heat up the glass bed of the printer (110 °C) and apply a thin layer of the ABS solution around the outline of your print. Watch out for the acetone fumes: acetone is a very flammable liquid, so use a ventilated room.

When the first layer of the brim is printed, add drops of the ABS solution to the corners of the brim. This will partly dissolve the brim but makes it stick even better to the plate.

Note that this ONLY works with ABS and not with PLA, because PLA does not dissolve in acetone.

Filed under 3Dprinting

Book review: Guide to Information Graphics

I found “The Wall Street Journal Guide to Information Graphics” in a bookshop and thought to give it a go. It is written by Dona M. Wong, a former student of Edward Tufte. The book’s aim is to be a “reference to be put on one’s desk”; the question is who should put it on his or her desk and who should not? Distributed over five chapters, the book maintains a very basic level of graphic design. Chapter two has a nice layout showing Do’s on the right page and Don’ts on the left page, and contains a lot of useful tips (e.g. try to avoid a legend in a line graph; instead, label the lines directly on their right side), but the book rarely explains the theory behind the advice. For this, one has to reach back to Tufte’s “The Visual Display of Quantitative Information”. Continue reading

Filed under Book Review

A step closer to culturing “unculturable” bacteria?

Last week the first really novel antibiotic since the ’90s saw the light. The authors made the discovery by diluting a soil sample in agar and casting this into a matrix with tiny holes. Next, this matrix chamber (called the iChip) was sealed with a semi-permeable membrane and placed back in the soil. After a month, colonies were picked, and after co-culturing with S. aureus the authors found the new antibiotic. The compound, teixobactin, is produced by a large biosynthetic gene cluster of the previously uncharacterized bacterium Eleftheria terrae. The new antibiotic is active against S. aureus but also against M. tuberculosis and probably a whole range of other Gram-positives. Back to the culturing: with the iChip, the authors claim to show that growth recovery “approaches 50%, as compared to 1% of cells from soil that will grow on a nutrient Petri dish”.

iChip in action (Photo: Slava Epstein / Northeastern University)

It is thus encouraging to see that by reconstituting the environmental cues (by placing the iChip back in the soil), a bigger fraction of the microorganisms is able to grow. This back-to-nature approach has a parallel with in vitro culturing techniques for eukaryotic cells, where supplementing with fetal calf serum is used to reintroduce as many growth-stimulating cues as possible. The question remains whether the limit of this method has been reached or whether the recovery rate can be ramped up even further.

Another paper, published last month in Applied and Environmental Microbiology, ramped up growth in a different way. The authors systematically investigated the effect of autoclaving phosphate buffer together with the agar (which is currently the practice in most labs) or separately. It turns out that autoclaving the components separately makes a difference of day and night with regard to the number of cells that grow on the plates. Figure 3 in the paper shows about 8×10^7 CFU/g of soil when phosphate and agar were not autoclaved together, compared to 3×10^7 CFU/g of soil when they were autoclaved together; in other words, a ~2.5-fold increase in colony formation. In the conclusion the authors even report a 50-fold increase in CFU. The reason for this difference lies in the formation of hydrogen peroxide in the agar when it is autoclaved together with the phosphate buffer. The authors include a figure that indeed shows a correlation between the increase in H2O2 and the phosphate buffer.

What can be learned from these two articles? First of all, it remains very challenging to culture a large fraction of the microbes out there; second, the process is, after 100 years of cell culturing, still being improved.

Ling, L. L. et al. A new antibiotic kills pathogens without detectable resistance. Nature (2015). doi:10.1038/nature14098 [$]

Tanaka, T. et al. A hidden pitfall in agar media preparation undermines cultivability of microorganisms. Appl. Environ. Microbiol. 80, 7659–7666 (2014). [$]

Filed under Science Article

Validate a DNA FASTA file with a JavaScript function

I couldn’t find a quick function that validates a single DNA FASTA sequence in JavaScript, so here is a setup (it uses the same structure as this protein FASTA validator from the Oxford Protein Informatics Group).

function validateDNA(seq) {
	// Based on: http://www.blopig.com/blog/2013/03/a-javascript-function-to-validate-fasta-sequences/

	// immediately remove trailing spaces
	seq = seq.trim();

	// split on newlines...
	var lines = seq.split('\n');

	// check for a header line and remove it
	if (seq[0] === '>') {
		// remove one line, starting at the first position
		lines.splice(0, 1);
	}

	// join the array back into a single string without newlines and
	// trailing or leading spaces
	seq = lines.join('').trim();

	// an empty sequence (or a header without sequence) is not valid
	if (seq.length === 0) {
		return false;
	}

	// search for characters that are not G, A, T or C (case-insensitive)
	if (seq.search(/[^gatc\s]/i) !== -1) {
		// the seq string contains non-DNA characters
		return false;
		/// The next line can be used instead to return a cleaned version of the DNA
		/// return seq.replace(/[^gatcGATC]/g, "");
	} else {
		// the seq string contains only GATC
		return true;
	}
}
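A quick sanity check of the function in the browser console (the sequences are just made-up snippets):

console.log(validateDNA('>seq1\nGATTACA'));  // true
console.log(validateDNA('GGGAAATTTCCC'));    // true, the header is optional
console.log(validateDNA('>seq2\nGATXACA'));  // false, contains an X
console.log(validateDNA(''));                // false, empty input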

Have fun

Filed under Tools

Measuring light intensity with the TSL-235 and a Raspberry Pi

There are many ways to measure light intensity using a Raspberry Pi, and one of them (which doesn’t require an ADC) is the TSL-235. It is a photodiode connected to a small circuit that generates a pulse train whose frequency depends on the light intensity. This frequency can be directly read out using a Raspberry Pi, as explained below.
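To give the core idea before the full walk-through: count the sensor’s output pulses over a fixed time window and you have the frequency. A minimal sketch, assuming Node.js with the onoff package and the TSL-235 output wired to GPIO 4 (the wiring and library choice are illustrative, not necessarily what the full post uses):

// Count rising edges from the TSL-235 and report the pulse frequency.
var Gpio = require('onoff').Gpio;
var sensor = new Gpio(4, 'in', 'rising'); // TSL-235 output on GPIO 4

var pulses = 0;
sensor.watch(function (err) {
	if (err) throw err;
	pulses++; // one rising edge per output pulse
});

// Report the frequency (in Hz) once per second; a higher frequency
// means a higher light intensity. Note that counting edges in software
// will saturate at high light levels, where the TSL-235 can output
// hundreds of kHz.
setInterval(function () {
	console.log('Frequency: ' + pulses + ' Hz');
	pulses = 0;
}, 1000);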
Continue reading

Filed under Tools