Abstract
Estimating taxonomic content is a key problem in metagenomic sequencing data analysis. However, extracting such content from high-throughput next-generation sequencing data is very time-consuming with currently available software. Here, we present CloudLCA, a parallel lowest common ancestor (LCA) algorithm that significantly improves the efficiency of determining taxonomic composition in metagenomic data analysis. Results show that CloudLCA (1) has a running time that grows nearly linearly with dataset size, (2) achieves linear speedup as the number of processors increases, especially for large datasets, and (3) processes nearly 215 million reads per minute on a cluster of ten thin nodes. Compared with MEGAN, a well-known metagenome analyzer, on a fat node, CloudLCA is up to five times faster, and its peak memory usage is approximately 18.5% of MEGAN's. CloudLCA can run on a single multiprocessor node or on a cluster. It is expected to become part of MEGAN to accelerate read analysis. It generates the same output as MEGAN, and this output can be imported directly into MEGAN for downstream analysis. Moreover, CloudLCA is a general-purpose solution for finding the lowest common ancestor and can be applied in other fields that require an LCA algorithm.
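To make the LCA step concrete, the following is a minimal sketch of LCA-based taxonomic assignment in a map/reduce style, assuming a toy taxonomy stored as a child-to-parent map. The taxon IDs, the `PARENT` table, and the function names `path_to_root`, `lca`, `map_read`, and `reduce_counts` are hypothetical and for illustration only; this is not CloudLCA's implementation, and hit filtering by alignment score is omitted. Each read's hits are reduced to their lowest common ancestor independently, which is why this step can be distributed across processors or cluster nodes.

```python
from collections import Counter

# Hypothetical toy taxonomy: child taxon ID -> parent taxon ID (the root maps to itself).
PARENT = {
    1: 1,  # root
    2: 1,  # Bacteria
    3: 2,  # Proteobacteria
    4: 3,  # Escherichia
    5: 3,  # Salmonella
    6: 2,  # Firmicutes
}


def path_to_root(taxon, parent):
    """Return the taxa from `taxon` up to the root, inclusive."""
    path = [taxon]
    while parent[taxon] != taxon:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lca(taxa, parent):
    """Lowest common ancestor of a non-empty collection of taxa."""
    taxa = list(taxa)
    # Ancestors shared by every taxon: intersect the root paths.
    common = set(path_to_root(taxa[0], parent))
    for t in taxa[1:]:
        common &= set(path_to_root(t, parent))
    # The deepest shared ancestor is the first shared node met walking upward.
    for node in path_to_root(taxa[0], parent):
        if node in common:
            return node
    raise ValueError("taxa do not share a common root")


def map_read(read):
    """Map step: (read_id, hit taxa) -> (read_id, LCA taxon). Reads are independent."""
    read_id, hits = read
    return read_id, lca(hits, PARENT)


def reduce_counts(assignments):
    """Reduce step: count how many reads are assigned to each taxon."""
    return Counter(taxon for _, taxon in assignments)


reads = [
    ("r1", [4, 5]),  # Escherichia + Salmonella -> Proteobacteria (3)
    ("r2", [4]),     # single hit -> Escherichia (4)
    ("r3", [4, 6]),  # Escherichia + Firmicutes -> Bacteria (2)
]

# Sequential here; in a parallel setting the map step would be spread over workers.
assignments = list(map(map_read, reads))
counts = reduce_counts(assignments)
print(assignments)
print(dict(counts))
```

The path-intersection approach costs time proportional to taxonomy depth per hit, which is adequate for a sketch; a production tool would likely precompute ancestor information to answer LCA queries faster.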
Keywords: CloudLCA, metagenome analysis, cloud computing
Footnotes
These authors contributed equally to the work.
Contributor Information
Yi Zhao, Email: biozy@ict.ac.cn.
Runsheng Chen, Email: chenrs@sun5.ibp.ac.cn.
