A Survey of Metabolic Reconstruction Tools for Metagenomic Datasets
Metagenomics is a rapidly expanding field, and predicting the metabolic capability of a mixed microbial community from metagenomic samples has become an important challenge in bioinformatics. In response, several widely used software packages for metabolic reconstruction of metagenomic datasets have been released. However, because these packages pursue similar goals, their approaches have converged, and it can be unclear where the differences between the latest releases lie. Here I aim to provide an independent survey identifying the advantages and limitations of some recent releases of metabolic reconstruction software for metagenomic datasets.
Metabolic reconstruction software for metagenomics falls into two major groups: visualisation tools with light-weight analysis features, which aim to aid manual curation of the resulting pathways, and heavy-weight analysis tools with computationally demanding features, which attempt to enrich annotation prior to visualisation and provide automated statistical analysis across datasets. Which package is most advantageous depends on the level of analysis required, the size of your datasets and your experimental objectives.
As shown in Table 1, all recent releases are moving towards scalable data handling. There is also potential for portability across the tools: the light-weight visualisation packages (iPath2.0 and Pathway Projector) can be customised to view integrated datasets and advanced annotations from the heavy-weight analysis tools (metaSHARK, MEGAN and MGrast).
Light-weight visualisation tools
The main advantage of iPath2.0 (10.1093/nar/gkr313) and Pathway Projector (10.1371/journal.pone.0007710) is their user interface, which allows seamless navigation across the metabolic networks. This makes them very useful for generating figures, demonstrating functional trends and getting a fast impression of the metabolic context of a pathway, metabolite or enzyme. Their main limitation is that, used in isolation, these tools usually accept only KEGG identifiers, so they cover only a limited fraction of the metagenomic reads and require pre-computed database hits.
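The coverage limitation described above can be quantified before any visualisation is attempted. The following is a minimal sketch, assuming the pre-computed database hits are already available as a mapping from read identifier to a list of KEGG identifiers. The function name `annotation_coverage`, the read names and the mapping are all hypothetical and not part of any of the tools discussed.

```python
def annotation_coverage(read_ids, read_to_kegg):
    """Return the fraction of reads that carry at least one KEGG identifier.

    read_ids     -- iterable of all read identifiers in the sample
    read_to_kegg -- dict mapping a read identifier to a list of KEGG
                    identifiers (hypothetical pre-computed hits)
    """
    read_ids = list(read_ids)
    if not read_ids:
        return 0.0
    # A read counts as annotated only if its hit list is non-empty.
    annotated = sum(1 for read in read_ids if read_to_kegg.get(read))
    return annotated / len(read_ids)


# Illustrative data only: four reads, two of which have KEGG hits.
reads = ["read1", "read2", "read3", "read4"]
hits = {"read1": ["K00001"], "read3": ["K00845", "K01803"]}
coverage = annotation_coverage(reads, hits)
print(f"{coverage:.0%} of reads can be placed on a KEGG-based map")
```

A low value here indicates that most of the sample would simply be invisible on a KEGG-only pathway map, which is the motivation for the enrichment tools in the next section.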
Enriched annotation metabolic tools
To address the low percentage of annotated sequence reads in light-weight analysis tools, metaSHARK (10.1093/nar/gkl196) and KAAS (10.1093/nar/gkm321) provide additional analysis. To enrich annotation in the dataset, metaSHARK uses enzyme profile searches against the PRIAM database and KAAS uses BLAST alignment against the KEGG GENES database. Both approaches increase the percentage of annotated metagenomic data and provide functional information for some genes whose function was previously undefined. The trade-off is lower reliability in functional assignment, so these tools can be inaccurate and the pathways predicted by metaSHARK can be inconsistent with those from KAAS.
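To illustrate the alignment-based style of enrichment, the sketch below assigns an orthology identifier to each read from its best BLAST hit. It assumes the standard 12-column BLAST tabular output (query, subject, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, e-value, bit score). This is a simplified single-best-hit scheme and not the actual KAAS procedure; the function names, the e-value cut-off and the gene-to-orthology mapping are assumptions made for illustration.

```python
import csv
import io


def best_hits(blast_tab, max_evalue=1e-5):
    """Keep the highest-scoring subject per query from BLAST tabular output."""
    best = {}
    for row in csv.reader(blast_tab, delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hit is too weak to trust
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def assign_orthologs(hits, gene_to_ko):
    """Translate best-hit genes into orthology identifiers where known."""
    return {q: gene_to_ko[s] for q, s in hits.items() if s in gene_to_ko}


# Illustrative input only: gene names and the mapping are hypothetical.
example = io.StringIO(
    "read1\tgeneA\t98.0\t50\t1\t0\t1\t50\t1\t50\t1e-20\t95.0\n"
    "read1\tgeneB\t80.0\t50\t10\t0\t1\t50\t1\t50\t1e-8\t60.0\n"
    "read2\tgeneC\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t25.0\n"
)
gene_to_ko = {"geneA": "K00001"}
assignments = assign_orthologs(best_hits(example), gene_to_ko)
print(assignments)
```

Loosening `max_evalue` annotates more reads but admits weaker hits, which is one way to see why enriched annotations from different tools can disagree.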
Comprehensive metagenomics packages
MGrast (10.1186/1471-2105-9-386) and MEGAN (10.1101/gr.5969107) provide comprehensive analysis of metagenomics data and generate integrated figures showing functional and taxonomic trends. Their main references are the KEGG Orthology and SEED databases, which allow the software to categorise the function and taxonomy of each sequence read and plot trends within the data. MGrast has been shown to give better results because its local alignment algorithm compares against multiple sources, whereas MEGAN is set up by default to take only BLAST results against NCBI-NR and NCBI-NT (this can be extended, in which case the results are equivalent). The main practical difference is that MGrast requires data to be uploaded into the cloud, whereas MEGAN is a standalone package that nevertheless needs substantial computational power for the initial BLAST analysis. The major limitation of these complete metagenomics packages is that functional annotation can still be limited by the high percentage of entries with undefined function in metagenomics databases. Specialised in-depth analysis packages are therefore being developed that reference the broad databases and enrich annotation. HUMAnN, developed for the Human Microbiome Project, is one example and promises vast improvements in coverage and statistical analysis.
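The idea of categorising each read by function and then comparing trends across datasets can be sketched as follows. This is a minimal illustration, assuming each sample is already available as a mapping from read identifier to a functional category; it is not how MGrast or MEGAN are implemented, and the function names, sample names and category labels are hypothetical.

```python
from collections import Counter


def relative_abundance(read_to_category):
    """Convert per-read category labels into fractions of the sample."""
    counts = Counter(read_to_category.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


def compare_samples(samples):
    """Build a category-by-sample table of relative abundances."""
    profiles = {name: relative_abundance(reads) for name, reads in samples.items()}
    categories = sorted({c for profile in profiles.values() for c in profile})
    # Categories absent from a sample are reported as 0.0 so rows align.
    return {
        category: {name: profiles[name].get(category, 0.0) for name in profiles}
        for category in categories
    }


# Illustrative data only: two small samples with made-up assignments.
samples = {
    "sampleA": {"r1": "Carbohydrates", "r2": "Carbohydrates",
                "r3": "Respiration", "r4": "Unclassified"},
    "sampleB": {"r1": "Respiration", "r2": "Unclassified"},
}
table = compare_samples(samples)
for category, row in table.items():
    print(category, row)
```

Normalising to fractions makes samples of different sequencing depth comparable, and keeping an explicit "Unclassified" category exposes the proportion of reads with undefined function rather than hiding it.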
|Software feature||iPath2.0||Pathway Projector||KEGG Atlas||metaSHARK||MEGAN 4||MGrast v3||HUMAnN|
|Implementation||web server||web server||web server||web server and standalone||standalone||web server||standalone|
|Visualisation||Flash Flex||ZUI using Google Maps API||KEGG Atlas||SHARKview||interactive independent comparative figures||multiple figure generation||multiple figure generation (interactive)|
|computational analysis||lightweight against KEGG db||lightweight against KEGG db||lightweight||middle weight enzyme profile search||middle weight internal (after blast analysis)||heavy weight external||heavy weight internal|
|Scalability||highly scalable||KEGG and custom||KEGG KAAS?||highly parallel||parallel through cloud||scalable||scalable|
|Database reference||KEGG and custom||KEGG and custom||KEGG KAAS?||PRIAM||SEED KEGG eggNOG||SEED KEGG eggNOG||multiple|