Last week we published our deFUME paper in the open-access journal BMC Research Notes. deFUME aims to be an easy-to-use web-based interface for processing, annotating and visualizing functional metagenomics sequencing data, specifically targeting wet-lab scientists (or non-bioinformaticians).
A quick intro to functional metagenomics: it’s a subfield of the more widely known metagenomics. The term metagenomics was first introduced by Handelsman and Clardy in 1998 and describes extracting DNA from the environment (the metagenome) and studying it by either sequencing or functional analysis. The first approach does what the name says: extract and sequence as much DNA as possible, then use bioinformatics tools to try to determine the function. In this way Hess et al. were able to computationally identify 27,755 putative carbohydrate-active genes in cow rumen. A drawback of this method, however, is that these genes still need to be experimentally validated.
Functional metagenomics works, in that sense, the other way around: a metagenomic library is transformed into a laboratory host (for example E. coli) and cultured while monitoring for a phenotypic change. For example, if one is looking for proteases, the agar plate can be supplemented with milk, and colonies that create a halo can be deemed positive for proteolytic activity. These colonies can subsequently be sequenced and the predicted genes functionally annotated. For this last process we created the deFUME webserver, which integrates everything from vector trimming to domain annotation into one pipeline.
The workflow of deFUME is visualized in the figure below where processes are depicted in red and (intermediate) files in black:
As input, deFUME takes either Sanger chromatograms (as .ab1 files) or, in the case of a next-generation sequencing run, the assembled nucleotide sequences in FASTA format. In the next steps the data is processed and annotated with BLAST and InterPro data. The user can then explore the results in an interactive table, for example to filter on e-value, remove hypothetical proteins or show more or less detail. Finally, the annotations can be exported in FASTA or GenBank format or as a simple CSV file.
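To make the filtering step concrete, here is a minimal Python sketch of the kind of filter the interactive table offers: keep hits below an e-value cutoff and optionally drop hypothetical proteins. This is not deFUME's actual code. The `Hit` record, the `filter_hits` function, the default cutoff of 1e-5 and the example data are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One annotated hit; the field names are hypothetical, not deFUME's schema."""
    contig: str
    description: str
    evalue: float


def filter_hits(hits, max_evalue=1e-5, drop_hypothetical=True):
    """Keep hits at or below max_evalue, optionally dropping hypothetical proteins."""
    kept = []
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # not significant enough for the chosen cutoff
        if drop_hypothetical and "hypothetical protein" in hit.description.lower():
            continue  # uninformative annotation
        kept.append(hit)
    return kept


# Made-up example data, for illustration only.
hits = [
    Hit("contig_1", "serine protease", 1e-40),
    Hit("contig_1", "hypothetical protein", 1e-30),
    Hit("contig_2", "ABC transporter", 0.5),
]

for hit in filter_hits(hits):
    print(hit.contig, hit.description, hit.evalue)
```

Loosening `max_evalue` or setting `drop_hypothetical=False` mirrors showing more detail in the table, while tightening the cutoff shows less.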
Why would you use the webserver?
- It’s free for academic users
- It saves time compared to, for example running the same workflow in CLC
- It’s easy because you don’t spent time on intermediate files, for example vector trimming the contigs and pushing those to BLAST.
So where did this idea originate from?
It actually started in the summer of 2013 with a small project at the CIID (Copenhagen Institute of Interaction Design), where we designed all kinds of interactive visualizations. In the lab we had a functional metagenomic data set lying around, but some colleagues found it challenging to analyze and interact with the data. So out of curiosity I made the following sketch (on GitHub) in Processing that would, based on InterPro data, give a quick overview of the sequences and their annotated InterPro domains.
This small Processing sketch was a direct hit, and the idea arose to make this kind of interaction more widely available. One basic necessity would be to include the data processing as well, so that the user only has to push one button to get an interactive visualization.
What is next?
We are very happy with the current version, but while developing it we already came across a number of features that would fit well in version 2, for example EcoCyc integration, reporting of GC content along the length of the contig, exporting the InterPro annotations in the GenBank file and optimizing the coloring scheme. So in case you are a student and interested in working on deFUME, you can drop me an email.
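To give an idea of what GC content reporting along a contig could look like, here is a minimal sliding-window sketch in Python. It is only an illustration, not a planned deFUME implementation. The function names, window size and step size are assumptions.

```python
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq, window=100, step=50):
    """Return (start, gc_fraction) pairs for windows along the contig.

    Contigs no longer than one window yield a single value for the whole
    sequence. A trailing partial window is not reported.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if len(seq) <= window:
        return [(0, gc_content(seq))]
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy contig: a GC-only stretch followed by an AT-only stretch.
for start, gc in sliding_gc("GGCCAATT", window=4, step=4):
    print(start, gc)
```

Plotting the resulting values against the window start positions would give the GC profile of a contig.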
van der Helm, E., Geertz-Hansen, H. M., Genee, H. J., Malla, S. & Sommer, M. O. A. deFUME: Dynamic exploration of functional metagenomic sequencing data. BMC Res. Notes 8, 328 (2015).
Hess, M. et al. Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331, 463–467 (2011).