Tag Archives: DNA

Background on the poreFUME pre-print

Last week our pre-print on nanopore sequencing came online at bioRxiv. Nanopore sequencing is a relatively new sequencing technology that is starting to come of age, and last year we started experimenting with the Oxford Nanopore Technologies (ONT) MinION sequencer ourselves. This post summarizes some of the background behind the pre-print.

Previously I covered the London Calling 2015 event, where a lot of progress in the development of the MinION was showcased. We were keen to find out how the MinION could contribute to our daily lab work, but also to see what new ground could be covered with this new sequencing technology.

One of the topics colleagues in the lab are working on is the dissemination of antibiotic resistance genes, as the emergence of pathogens that are resistant to antibiotics is a major healthcare challenge. Therefore we thought of combining the MinION with antibiotic resistance gene profiling; more specifically, coupling functional metagenomic selections with nanopore sequencing.

Previous work in this field, for example by Justin O’Grady and colleagues, showed how the MinION can be used to identify the structure and chromosomal insertion site of a bacterial antibiotic resistance island in Salmonella Typhi.

Instead of going after single isolates, we set out to map the antibiotic resistance genes present in the gut of a hospitalized patient (the resistome). The resistome can influence the outcome of antibiotic treatment, so it is highly interesting to gain insight into this complex network. Through a collaboration under the EvoTAR programme with Willem van Schaik of the University of Utrecht, we had a clinical fecal sample from an ICU patient available, which we used in the experiments.

Typical workflow for the construction and selection of a functional metagenomic library.

Typical functional metagenomic workflow, in which metagenomic DNA is isolated from a (complex) environment, in this case a fecal sample. The DNA is sheared, ligated and transformed into E. coli. When profiling for antibiotic resistance genes, the cells are plated on agar containing various antibiotics. Finally, the metagenomic inserts are sequenced and annotated.

The key to capturing the resistome in this experimental setup is the use of functional metagenomic selections. Instead of culturing individual microorganisms directly from a fecal sample, metagenomic DNA is extracted from the sample. This metagenomic DNA is subsequently sheared, ligated, transformed into E. coli and finally plated on solid agar containing various antibiotics. Only E. coli cells that harbor a metagenomic DNA fragment conferring an antibiotic-resistant phenotype can survive. With these functional metagenomic selections in hand, the complexity of the resistome can be mapped rapidly.

And this is where the MinION comes in. Although other sequencing technologies, such as the Illumina and PacBio platforms, are available, neither provides both long reads and low capital requirements.

After some initial failed attempts to get the MinION sequencer running in our lab, we started to see >100 Mbase runs in October last year. In addition, PoreCamp in Birmingham last December provided some useful data, on top of a great experience and nice people (a new round of PoreCamp takes place next week).

To analyze the sequencing data that Metrichor generates, we developed the poreFUME pipeline, which automates barcode demultiplexing, error correction (using nanocorrect) and antibiotic resistance gene annotation (using CARD). The poreFUME software is available on GitHub as a Python script, and the subsequent analysis is also available on GitHub as a Jupyter notebook.

The Jupyter notebook with the analysis in the pre-print is available here.
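To give a feel for what the barcode demultiplexing step involves, here is a minimal JavaScript sketch. It is not the poreFUME code (which is a Python script and may match barcodes differently): it simply assigns each read to the barcode with the smallest Levenshtein edit distance to the start of the read. The helper names, barcode names and sequences and the `maxDist` cutoff are hypothetical, and real nanopore reads would also require searching both strands and allowing for adapter sequence in front of the barcode.

```javascript
// Hypothetical sketch of barcode demultiplexing (not the poreFUME implementation).
// Each read is assigned to the barcode whose sequence best matches the start of
// the read; reads without a close enough match end up in "unassigned".

// Classic dynamic-programming Levenshtein distance between strings a and b.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Compare each (upper-case) barcode to the first barcode.length bases of the read.
// maxDist is a hypothetical cutoff for accepting the best match.
function assignBarcode(read, barcodes, maxDist) {
  let best = "unassigned";
  let bestDist = Infinity;
  for (const [name, bc] of Object.entries(barcodes)) {
    const d = editDistance(read.slice(0, bc.length).toUpperCase(), bc);
    if (d < bestDist) {
      bestDist = d;
      best = name;
    }
  }
  return bestDist <= maxDist ? best : "unassigned";
}

// Group reads into bins keyed by their assigned barcode name.
function demultiplex(reads, barcodes, maxDist = 2) {
  const bins = {};
  for (const read of reads) {
    const name = assignBarcode(read, barcodes, maxDist);
    (bins[name] = bins[name] || []).push(read);
  }
  return bins;
}
```

Using an edit distance rather than an exact match tolerates the insertions and deletions that are common in nanopore reads; the cutoff trades sensitivity against misassignment between barcodes.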

To benchmark the nanopore sequencing data, we also sequenced the sample with Sanger and PacBio. From these results we found that we could achieve >97% sequence accuracy, and we were able to identify all 26 antibiotic resistance genes in both the PacBio and the nanopore data sets.
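Accuracy figures like this are often reported as percent identity between a read and a reference sequence. As a minimal sketch, assuming the two sequences have already been aligned (gaps written as `-`) and using one common definition (identical columns divided by all alignment columns), percent identity can be computed as below. The pre-print may define accuracy differently, and the alignment step itself is not shown; `percentIdentity` is a hypothetical helper.

```javascript
// Percent identity between two gapped, equal-length aligned sequences.
// Assumed definition: identical non-gap columns / all alignment columns * 100.
function percentIdentity(alignedA, alignedB) {
  if (alignedA.length !== alignedB.length || alignedA.length === 0) {
    throw new Error("aligned sequences must be non-empty and of equal length");
  }
  let matches = 0;
  for (let i = 0; i < alignedA.length; i++) {
    const a = alignedA[i].toUpperCase();
    const b = alignedB[i].toUpperCase();
    // a column counts as a match only if both bases are identical and not a gap
    if (a === b && a !== "-") {
      matches++;
    }
  }
  return (100 * matches) / alignedA.length;
}
```

For example, `percentIdentity("ACGT-ACGTA", "ACGTTACCTA")` gives 80, since 8 of the 10 columns are identical (one gap column and one mismatch).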

Since the whole workflow can be performed relatively quickly, it would be really interesting to take these techniques to the next stage and do in-situ resistome profiling. In particular, integrating Matt Loose’s read-until functionality could open up new avenues. Furthermore, these experiments were done with the R7 chemistry; the new R9 chemistry seems able to deliver even higher accuracy and faster turnaround.

The FASTA files and poreFUME output used in the analysis are already online; the raw PacBio and MinION data are available at ENA.

Update 2016-11-01: Added the ENA link to the raw data

Filed under Publications

Validate DNA FASTA file with a javascript function

I couldn’t find a quick function that validates a single DNA FASTA file in JavaScript, so here is one (it uses the same structure as this protein FASTA validator from the Oxford Protein Informatics Group).

function validateDNA(seq) {
	// Based on: http://www.blopig.com/blog/2013/03/a-javascript-function-to-validate-fasta-sequences/

	// check there is something to validate at all
	if (!seq) {
		return false;
	}

	// immediately remove leading and trailing whitespace
	seq = seq.trim();

	// split on newlines (\r?\n also handles Windows line endings)
	var lines = seq.split(/\r?\n/);

	// check for a header line
	if (seq[0] === '>') {
		// remove one line, starting at the first position
		lines.splice(0, 1);
	}

	// join the array back into a single string without newlines and
	// trailing or leading spaces
	seq = lines.join('').trim();

	// a header without any sequence is not valid DNA
	if (!seq) {
		return false;
	}

	// search for characters that are not G, A, T or C (case-insensitive;
	// whitespace inside the sequence is tolerated)
	if (seq.search(/[^gatc\s]/i) !== -1) {
		// the seq string contains non-DNA characters
		return false;
		// To return a cleaned version of the DNA instead, replace the line above with:
		// return seq.replace(/[^gatcGATC]/g, "");
	}

	// the seq string contains only G, A, T and C
	return true;
}
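The function above checks a single record. If a file may contain several records, one option is to split it on header lines and check each sequence separately. The sketch below does that; `validateMultiFASTA` and the shape of its return value are hypothetical, and a record without any sequence lines is reported as invalid.

```javascript
// Hypothetical multi-record variant: returns { header, valid } for each FASTA record.
// Records start with a '>' header line; sequence lines may be wrapped.
function validateMultiFASTA(text) {
  const records = [];
  let current = null;
  for (const rawLine of text.trim().split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith(">")) {
      // a header line starts a new record
      current = { header: line.slice(1).trim(), seq: "" };
      records.push(current);
    } else if (current) {
      // sequence line: append it, ignoring any internal whitespace
      current.seq += line.replace(/\s+/g, "");
    } else if (line !== "") {
      // sequence data before any header is collected into an unnamed record
      current = { header: "", seq: line.replace(/\s+/g, "") };
      records.push(current);
    }
  }
  // a record is valid when its sequence is non-empty and contains only G, A, T or C
  return records.map(r => ({ header: r.header, valid: /^[gatc]+$/i.test(r.seq) }));
}
```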

Have fun

Filed under Tools

Highlights of the International Synthetic and Systems Biology Summer School 2014

Last week I joined the International Synthetic and Systems Biology Summer School in Taormina, Italy. As the name suggests, it was all about synthetic and systems biology, with some pretty cool speakers. Ron Weiss talked about the general principles of genetic circuits and their current limitations (the record is currently 12 different synthetic promoters in 1 designed network). Sarpeshkar focused on the stochastic nature of cells and the associated noise, and showed how these can be simulated or mirrored using analog circuits. Paul Freemont took Weiss’ design principles and showed how to apply them to different examples; he also elaborated on an efficient way of characterizing new circuits and parts. Tanja Kortemme, a former postdoc in the Baker lab, gave an introduction to the capabilities of computational protein design and, using some neat examples, showed the power (and limitations) of computational design. Below are some highlights and links to the relevant literature that was discussed.

Filed under Talk

Book review: The Double Helix

I recently came across a copy of “The Double Helix” by James Watson and will share some highlights in this post.

The book became quite famous for giving a glimpse of how science was done in the 1950s. However, it also generated a fair amount of controversy because of its harsh personal attacks, especially on Rosalind Franklin, who made essential contributions but was never acknowledged to the same extent as Watson and Crick.

Filed under Book Review

Happy DNA day!

Today it is exactly 60 years since J.D. Watson and F.H.C. Crick published their famous DNA structure paper in Nature (April 25, 1953). You can read the original article on the Nature website here for free. In 1962, James Watson, Francis Crick and Maurice Wilkins were awarded the Nobel Prize in Physiology or Medicine for their discovery. Since 2003, April 25 has been celebrated as DNA Day.

Filed under Science Article