Category Archives: Publications

Background on the poreFUME pre-print

Last week our pre-print on nanopore sequencing came online at bioRxiv. Nanopore sequencing is a relatively new sequencing technology that is starting to come of age. As part of this process, we started playing with the ONT MinION sequencer last year. This post summarizes some of the background behind the pre-print.

Previously I covered the London Calling 2015 event, where a lot of progress on the development of the MinION was showcased. We were keen to find out how the MinION could contribute to our daily lab work, and also to see what new ground could be covered with this new sequencing technology.

One of the topics colleagues in the lab are working on is the dissemination of antibiotic resistance genes, since the emergence of pathogens that are resistant to antibiotics is a major healthcare challenge. Therefore we thought of combining the MinION with antibiotic resistance gene profiling. More specifically, we coupled functional metagenomic selections with nanopore sequencing.

Previous work in this field, for example by Justin O’Grady and colleagues, showed the use of the MinION [$] to identify the structure and chromosomal insertion site of a bacterial antibiotic resistance island in Salmonella Typhi.

Instead of going after single isolates, we set out to map the antibiotic resistance genes that are present in the gut (the resistome) of a hospitalized patient. The resistome can influence the outcome of antibiotic treatment, and it is therefore highly interesting to gain insight into this complex network. Through a collaboration under the EvoTAR programme with Willem van Schaik of the University of Utrecht, we had a clinical fecal sample of an ICU patient available, which we used in the experiments.

Typical functional metagenomic workflow where metagenomic DNA is isolated from a (complex) environment, in this case a fecal sample. The DNA is sheared, ligated and transformed into E. coli. When profiling for antibiotic resistance genes, the cells are plated on agar containing various antibiotics. Finally the metagenomic inserts are sequenced and annotated.

Key to the whole experimental setup for capturing the resistome is the use of functional metagenomic selections. Instead of culturing individual microorganisms directly from a fecal sample, metagenomic DNA is extracted from the sample. This metagenomic DNA is subsequently sheared, ligated and transformed into E. coli, and finally plated out on solid agar containing various antibiotics. Only E. coli cells that harbor a metagenomic DNA fragment encoding an antibiotic-resistant phenotype can survive. With these functional metagenomic selections in hand, the complexity of the resistome can be rapidly mapped.

And this is where the MinION comes in. Although other sequencing technologies, such as the Illumina and PacBio platforms, are available, they do not provide both long reads and low capital requirements.



After some initial failed attempts to get the MinION sequencer running in our lab, we started to see >100 Mbase runs in October last year. PoreCamp in Birmingham last December also provided, on top of a great experience and nice people, some useful data (next week a new round of PoreCamp takes place).

To analyze the sequencing data that Metrichor generates we developed the poreFUME pipeline, which automates the process of barcode demultiplexing, error correction (using nanocorrect) and antibiotic resistance gene annotation (using CARD). The poreFUME software is available on GitHub as a Python script. The subsequent analysis is also available on GitHub, in a Jupyter notebook.

The Jupyter notebook with the analysis in the pre-print is available here.
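
To give a feel for the first step of such a pipeline, barcode demultiplexing, here is a minimal Python sketch. It is not the poreFUME implementation: the function names, the example barcodes, the 60-base search window and the error threshold are all assumptions made up for illustration. The idea is to look for each barcode near the start of a read while tolerating the substitutions and indels that are common in nanopore reads.

```python
def best_substring_distance(pattern, text):
    """Minimum edit distance between `pattern` and any substring of `text`."""
    # Row 0 is all zeros: the match may start anywhere in `text` for free.
    prev = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, start=1):
        # Column 0: aligning pattern[:i] against nothing costs i deletions.
        curr = [i] + [0] * len(text)
        for j, t in enumerate(text, start=1):
            curr[j] = min(
                prev[j - 1] + (p != t),  # match or substitution
                prev[j] + 1,             # pattern base missing from the read
                curr[j - 1] + 1,         # extra base in the read
            )
        prev = curr
    # The match may also end anywhere in `text`.
    return min(prev)


def demultiplex(reads, barcodes, window=60, max_errors=3):
    """Bin reads by the best-matching barcode found in their first `window` bases.

    `barcodes` maps a (hypothetical) barcode name to its sequence.
    Reads without a barcode within `max_errors` edits go to "unassigned".
    """
    bins = {name: [] for name in barcodes}
    bins["unassigned"] = []
    for read in reads:
        head = read[:window]
        scores = {name: best_substring_distance(seq, head)
                  for name, seq in barcodes.items()}
        best = min(scores, key=scores.get)
        target = best if scores[best] <= max_errors else "unassigned"
        bins[target].append(read)
    return bins


if __name__ == "__main__":
    # Made-up barcodes and reads, for illustration only.
    example_barcodes = {"BC01": "ACGTACGTAC", "BC02": "TTGGCCAATT"}
    example_reads = ["GGACGTACGAACTTTTTTTT", "GGGGGGGGGGGGGGGGGGGG"]
    for name, members in demultiplex(example_reads, example_barcodes, max_errors=2).items():
        print(name, len(members))
```

A real pipeline would also have to check the reverse complement of each read and deal with ties between barcodes, both of which are left out here to keep the sketch short.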

To benchmark the nanopore sequencing data, we also sequenced the sample with Sanger and PacBio sequencing. Based on this comparison we achieved a sequence accuracy of >97%, and we were able to identify all 26 antibiotic resistance genes in both the PacBio and nanopore sets.
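
For readers wondering what a number like ">97% accuracy" means in practice, one common way to express it is percent identity over a pairwise alignment. The sketch below assumes the two sequences have already been aligned (gaps written as "-") and uses one of several possible definitions of identity; the function name is hypothetical and this is not necessarily the exact calculation used in the pre-print.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences carry the same base.

    Both inputs must be rows of the same pairwise alignment, with "-" for gaps.
    Columns where both rows have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Made-up alignment: 7 identical columns out of 9.
    print(round(percent_identity("ACGT-ACGT", "ACGTTACGA"), 1))
```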

Since the whole workflow can be performed relatively quickly, it would be really interesting to move these techniques to the next stage and do in-situ resistome profiling. Integrating Matt Loose’s read-until functionality in particular could open up new avenues. Furthermore, these experiments were done with the R7 chemistry, but it seems that the new R9 chemistry is able to deliver even higher accuracies and faster turn-around.

The FASTA files and poreFUME output used in the analysis are already online, and the raw PacBio and MinION data are available at ENA.

Update 2016-11-01: Added the ENA link to the raw data


Filed under Publications

deFUME webserver paper published last week!

Last week we published our deFUME paper in the open access journal BMC Research Notes. The aim is an easy-to-use web-based interface for processing, annotation and visualization of functional metagenomics sequencing data, specifically targeting wet-lab scientists (or non-bioinformaticians).
A quick intro to functional metagenomics: it’s a subfield of the more widely known metagenomics. The term metagenomics was first introduced by Handelsman and Clardy in 1998 and refers to extracting DNA from the environment (the metagenome) and studying it by either sequencing or functional analysis. The first approach does what the name says: extract and sequence as much DNA as possible, and use bioinformatics tools to try to determine the function. In this way Hess et al. [2] were able to computationally identify 27,755 putative carbohydrate-active genes in cow rumen. A drawback of this method, however, is that these genes still need to be experimentally validated.

Different phenotypes that can be observed when expressing a metagenomic library, for example halo formation, pigmentation or morphological changes.

Functional metagenomics works the other way around: a metagenomic library is transformed into a laboratory host (for example E. coli) and cultured while monitoring for a phenotypic change. For example, if one is looking for proteases, the agar plate can be supplemented with milk, and colonies creating a halo can be deemed positive for proteolytic activity. These colonies can subsequently be sequenced and the predicted genes functionally annotated. For this last process we created the deFUME webserver, which integrates the whole process, from vector trimming to domain annotation, into one pipeline.

The workflow of deFUME is visualized in the figure below where processes are depicted in red and (intermediate) files in black:

deFUME web server flowchart: processes are in red and files/objects in black. From [1].

As input files deFUME takes either Sanger chromatograms (as .ab1 files) or, in the case of a next-generation sequencing run, the assembled nucleotide sequences in FASTA format. In the next steps the data is processed and annotated with BLAST and InterPro data. The user can then interact with the data in an interactive table, for example to filter on e-value, remove hypothetical proteins or show more or less detail. Finally, the annotations can be exported in FASTA or GenBank format or as a simple CSV file.
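
To make the kind of table filtering described above concrete, here is a small Python sketch. It is only an illustration, not deFUME code (the web interface does this in the browser): the function name, the row fields `evalue` and `description`, the default threshold and the example rows are all assumptions.

```python
def filter_annotations(rows, max_evalue=1e-5, drop_hypothetical=True):
    """Keep annotation rows with an e-value at or below `max_evalue`.

    Each row is a dict with (hypothetical) keys "evalue" and "description".
    With `drop_hypothetical`, rows described as hypothetical proteins are removed.
    """
    kept = []
    for row in rows:
        if row["evalue"] > max_evalue:
            continue
        if drop_hypothetical and "hypothetical protein" in row["description"].lower():
            continue
        kept.append(row)
    return kept


if __name__ == "__main__":
    # Made-up annotation rows, for illustration only.
    annotations = [
        {"orf": "contig1_orf1", "evalue": 1e-50, "description": "beta-lactamase"},
        {"orf": "contig1_orf2", "evalue": 1e-30, "description": "Hypothetical protein"},
        {"orf": "contig2_orf1", "evalue": 0.5, "description": "transposase"},
    ]
    for row in filter_annotations(annotations):
        print(row["orf"], row["description"])
```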

Why would you use the webserver?

  1. It’s free for academic users
  2. It saves time compared to, for example, running the same workflow in CLC
  3. It’s easy because you don’t spend time on intermediate files, for example vector trimming the contigs and pushing those to BLAST.

Screenshot of deFUME showing the functional annotations (A) and the interactive toolbox (B). From [1]

So where did this idea originate from?

It actually started in the summer of 2013 with a small project at the CIID (Copenhagen Institute of Interaction Design), where we designed all kinds of interactive visualizations. In the lab we had a functional metagenomic data set lying around, but some colleagues found it challenging to analyze the data and interact with it. So out of curiosity I made the following sketch (on GitHub) in Processing that would, based on InterPro data, give a quick overview of the sequences and annotated InterPro domains.

Screenshot of the initial sketch made in Processing

This small Processing sketch was a direct hit, and the idea arose to make this kind of interaction more widely available. One basic necessity would be to include the data processing in the visualization, so the user only has to push one button to get an interactive visualization.
Therefore we implemented a backend that runs on the Center for Biological Sequence Analysis (CBS) servers at the Technical University of Denmark (DTU) and handles the data pipeline, from basecalling to BLASTing. Another quick realization was that a Processing sketch is not very portable or user-friendly, whereas a web interface would be. Therefore we built a table-based module (using jqGrid) to display the functional annotations and used the HTML5 canvas to draw a visual representation of the data. We used JavaScript to let the different components talk to each other and some D3.js to display a histogram of GO terms. On the backend the pipeline is implemented in Perl, and all the data is structured and stored in a single JSON object that is delivered to the client using PHP.

What is next?
We are very happy with the current version, but while developing we already came across a number of features that would make a great appearance in version 2, for example EcoCyc integration, reporting of GC content along the contig, exporting the InterPro annotations in the GenBank file and optimizing the coloring scheme. So in case you are a student and interested in working on deFUME, you can drop me an email.
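
As an illustration of one item on that wish list, GC content along a contig could be reported with a simple sliding window. The Python sketch below is only one possible approach and not part of deFUME; the function name, the window size and the step size are assumptions.

```python
def gc_content_windows(sequence, window=100, step=50):
    """Return (start, gc_fraction) tuples for sliding windows along `sequence`.

    A sequence shorter than `window` yields a single, shorter window.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    sequence = sequence.upper()
    results = []
    last_start = max(len(sequence) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = sequence[start:start + window]
        if not chunk:
            break  # empty input sequence
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, gc / len(chunk)))
    return results


if __name__ == "__main__":
    # Toy contig: a GC-rich half followed by an AT-rich half.
    for start, fraction in gc_content_windows("GGCCAATT", window=4, step=4):
        print(start, fraction)
```

Plotting these fractions against the window start positions would give the per-contig GC profile.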

The deFUME paper can be found here, and the webserver here, with a working example here. Contributions can be made to the deFUME GitHub repository.

[1] van der Helm, E., Geertz-Hansen, H. M., Genee, H. J., Malla, S. & Sommer, M. O. A. deFUME: Dynamic exploration of functional metagenomic sequencing data. BMC Res. Notes 8, 328 (2015).

[2] Hess, M. et al. Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331, 463–7 (2011).
