Posted on June 21, 2011 by Matt Collison in Software Comparisons

A Survey of Metabolic Reconstruction Tools for Metagenomic Datasets

Matthew G. Collison 1

1 Biopharmaceutical Bioprocess Technology Centre, Merz Court, Newcastle University, NE1 7RU

Introduction

Metagenomics is a rapidly expanding field, and predicting the metabolic capability of a mixed microbial community from metagenomic samples has become an important challenge in bioinformatics. In response, several widely used software packages for metabolic reconstruction of metagenomic datasets have been released. However, because these projects have raced towards similar goals, their approaches have converged, and it can be unclear where the differences lie between the latest releases. Here I aim to provide an independent survey that identifies the advantages and limitations of some recent releases of metabolic reconstruction software for metagenomic datasets.

Summary

There are two major subdivisions of metabolic reconstruction software for metagenomics: visualisation tools with light-weight analysis features, which aim to aid manual curation of the resulting pathways, and heavy-weight analysis tools with computationally demanding features, which attempt to enrich annotation prior to visualisation and provide automated statistical analysis across datasets. Which package is most advantageous depends on your experimental objectives, the level of analysis required and the size of your datasets.

As shown in Table 1, all recent releases are moving towards scalable data handling. There is also potential for portability across the tools: the light-weight visualisation packages (iPath2.0 and Pathway Projector) can be customised to view integrated datasets and advanced annotations from the heavy-weight analysis tools (metaSHARK, MEGAN and MGrast).

Light-weight visualisation tools

The main advantage of iPath2.0 (10.1093/nar/gkr313) and Pathway Projector (10.1371/journal.pone.0007710) is their user interface, which allows seamless navigation across the metabolic networks. This makes them very useful for generating figures, demonstrating functional trends and getting a quick impression of the metabolic context of a pathway, metabolite or enzyme. The main limitation, however, is that in isolation these tools usually accept only KEGG identifiers, so they cover a limited proportion of the metagenomic reads and require pre-computed database hits.
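To make the coverage limitation concrete, here is a minimal Python sketch of the bookkeeping a light-weight viewer leaves to the user: given pre-computed hits, work out what fraction of reads carry a KEGG identifier and which identifiers would be passed on for highlighting. The function name, the read names and the read-to-KO mapping are all hypothetical and chosen for illustration; this is not the input format of either tool.

```python
def kegg_coverage(read_ids, read_to_ko):
    """Return (fraction of reads with a KEGG identifier, sorted unique identifiers).

    read_ids   -- all reads in the sample (hypothetical names)
    read_to_ko -- pre-computed hits, read name -> KEGG Orthology identifier
    """
    annotated = [r for r in read_ids if read_to_ko.get(r)]
    kos = sorted({read_to_ko[r] for r in annotated})
    fraction = len(annotated) / len(read_ids) if read_ids else 0.0
    return fraction, kos


# Illustrative data only: four reads, one of which has no database hit.
reads = ["read1", "read2", "read3", "read4"]
hits = {"read1": "K00001", "read3": "K00002", "read4": "K00001"}

fraction, kos = kegg_coverage(reads, hits)
print(f"{fraction:.0%} of reads carry a KEGG identifier: {kos}")
```

Reads without an identifier simply disappear from the map, which is why the coverage figure is worth checking before interpreting any highlighted pathway.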

Enriched annotation metabolic tools

To address the low percentage of annotated sequence reads in light-weight analysis tools, metaSHARK (10.1093/nar/gkl196) and KAAS (10.1093/nar/gkm321) provide additional analysis. To enrich annotation in the dataset, metaSHARK uses enzyme profile searches against the PRIAM database and KAAS uses BLAST alignment against the KEGG GENES database. Both approaches increase the percentage of annotated metagenomic data and provide functional information for some genes whose function was previously undefined. These tools can be inaccurate, however, because the functional assignments are less reliable, which means metaSHARK pathways can be inconsistent with KAAS pathways.
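One simple way to see how far two enrichment methods disagree is to compare their calls gene by gene. The sketch below assumes that the output of each tool has already been exported and reduced to a dictionary of gene name to EC number; that common format, the function name and the example values are my assumptions, not something either tool produces directly.

```python
def compare_assignments(first, second):
    """Compare two gene -> EC number mappings and report where they differ."""
    shared = sorted(set(first) & set(second))
    return {
        "agree": [g for g in shared if first[g] == second[g]],
        "conflict": [g for g in shared if first[g] != second[g]],
        "only_first": sorted(set(first) - set(second)),
        "only_second": sorted(set(second) - set(first)),
    }


# Illustrative calls only (hypothetical genes, real EC number syntax).
profile_calls = {"gene1": "1.1.1.1", "gene2": "2.7.1.1", "gene3": "4.1.2.13"}
blast_calls = {"gene1": "1.1.1.1", "gene2": "2.7.1.2", "gene4": "5.3.1.9"}

result = compare_assignments(profile_calls, blast_calls)
for category, genes in result.items():
    print(f"{category}: {genes}")
```

Genes in the conflict list are the ones most worth checking by hand before trusting a reconstructed pathway.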

Comprehensive metagenomics packages

MGrast (10.1186/1471-2105-9-386) and MEGAN (10.1101/gr.5969107) provide comprehensive analysis of metagenomics data and generate integrated figures showing functional and taxonomic trends. Their main references are the KEGG Orthology and SEED databases, which allow the software to categorise the function and taxonomy of each sequence read and plot trends within the data. MGrast has been shown to give better results due to its local alignment algorithm, which uses multiple sources for comparison, whereas MEGAN is set up only to take BLAST results against NCBI-NR and NCBI-NT (this can be extended, in which case the results are equivalent). The main practical difference is that MGrast requires uploading data into the cloud, whereas MEGAN is a standalone software package that nevertheless requires large computational power for the initial BLAST analysis. The major limitation of these complete metagenomics packages is that functional annotation can still be limited by the high percentage of entries with undefined function in metagenomics databases. Therefore, specific in-depth analysis packages are being developed that reference the broad databases and enrich annotation. HUMAnN, developed for the Human Microbiome Project, is one example and promises vast improvements in coverage and statistical analysis.
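As a rough illustration of what happens after the BLAST step, the Python sketch below keeps the best-scoring hit per read from standard 12-column tabular BLAST output and counts reads per functional category. It is a simplified stand-in, not the binning algorithm used by MEGAN or MGrast; the function names, the subject-to-function mapping and the example hits are hypothetical.

```python
import csv
import io
from collections import Counter


def best_hits(blast_tab):
    """Keep the highest bit-score hit per read from 12-column tabular BLAST output."""
    best = {}
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def count_functions(best, subject_to_function):
    """Count reads per functional category; subjects with no known function are 'unassigned'."""
    return Counter(
        subject_to_function.get(subject, "unassigned") for subject, _ in best.values()
    )


# Illustrative hits only: read1 has two hits, read3 hits a protein of undefined function.
example = "\n".join([
    "read1\tprotA\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t200",
    "read1\tprotB\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-40\t150",
    "read2\tprotB\t85.0\t90\t13\t0\t1\t90\t5\t94\t1e-30\t120",
    "read3\tprotC\t70.0\t80\t24\t0\t1\t80\t1\t80\t1e-10\t60",
])

best = best_hits(io.StringIO(example))
counts = count_functions(best, {"protA": "glycolysis", "protB": "TCA cycle"})
print(dict(counts))
```

The "unassigned" bucket is the limitation described above in miniature: a read can have a perfectly good database hit and still contribute nothing to the functional picture.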

Table 1. Comparison of software features.

Software feature | iPath2.0 | Pathway Projector | KEGG Atlas | metaSHARK | MEGAN 4 | MGrast v3 | HUMAnN
Implementation | web server | web server | web server | web server and standalone | standalone | web server | standalone
Visualisation | Flash Flex | ZUI using Google Maps API | KEGG Atlas | SHARKview, interactive | independent comparative figures | multiple figure generation | multiple figure generation (interactive)
Computational analysis | light-weight against KEGG db | light-weight against KEGG db | light-weight | middle-weight enzyme profile search | middle-weight internal (after BLAST analysis) | heavy-weight external | heavy-weight internal
Scalability | highly scalable | KEGG and custom | KEGG, KAAS? | highly parallel | parallel through cloud | scalable | scalable
Database reference | KEGG and custom | KEGG and custom | KEGG, KAAS? | PRIAM | SEED, KEGG, eggNOG | SEED, KEGG, eggNOG | multiple
Statistical outputs | none | none | none | none | multiple | multiple | multiple


3 Comments

  1. Jack A Gilbert

    July 30, 2011 @ 4:50 pm

    I really enjoyed this little introduction, but wanted to bring to your attention the latest tool in the arsenal, Predicted Relative Metabolic Turnover (PRMT). This tool allows you to predict the relative turnover of metabolites generated by a metagenomic or metatranscriptomic dataset derived from comparable samples, e.g. over time or space. It was published last month in the BMC journal Microbial Informatics and Experimentation, and is open access. http://www.microbialinformaticsj.com/content/1/1/4/abstract

    I think you will find this very useful, a practical alternative to visualization of pathways.

  2. Matt Collison

    August 1, 2011 @ 10:57 am

    Thanks for the feedback Jack. PRMT looks very promising.

    I will review the tool and update the survey. I am also going to include a breakdown of each software package in the next update, so some additional detail, and I’ll include PRMT in this.

    Thanks again for the update
    Matt

  3. Doug

    December 27, 2012 @ 1:45 am

    Thanks for the summary of software for metagenomics studies. One article I have recently read also discusses software developments that allow identification of individual species within a metagenomics sample. One package is under development at the University of Washington in Seattle. Apparently it allows identification of individual genomes, even those with only minor representation in the sample.
