The Bio-Community Perl toolkit for microbial ecology

Florent E Angly; Christopher J Fields; Gene W Tyson

Bioinformatics. 2014 Mar 10;30(13):1926–1927. doi: 10.1093/bioinformatics/btu130

PMCID: PMC4071200 PMID: 24618462

Abstract

Summary: The development of bioinformatic solutions for microbial ecology in Perl is limited by the lack of modules to represent and manipulate microbial community profiles from amplicon and meta-omics studies. Here we introduce Bio-Community, an open-source, collaborative toolkit that extends BioPerl. Bio-Community interfaces with commonly used programs using various file formats, including BIOM, and provides operations such as rarefaction and taxonomic summaries. Bio-Community will help bioinformaticians to quickly piece together custom analysis pipelines and develop novel software.

Availability and implementation: Bio-Community is cross-platform Perl code available from http://search.cpan.org/dist/Bio-Community under the Perl license. A readme file describes software installation and how to contribute.

Contact: f.angly@uq.edu.au

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

Sequencing is common in most fields of biological research, and the throughput of modern platforms is orders of magnitude higher than that of traditional Sanger sequencing (Metzker, 2010). The BioPerl bioinformatic toolkit (Stajich et al., 2002) has attracted a large community of users and developers and has become critical in many sequencing projects by allowing quick code development and interaction between programs that use incompatible file formats. In microbial ecology, sequencing is used routinely for 16S rRNA gene amplicon surveys (Tringe and Hugenholtz, 2008), metagenomics (Handelsman, 2004) and metatranscriptomics (Frias-Lopez et al., 2008). Because most microorganisms remain uncultivated (Rappé and Giovannoni, 2003), culture-independent molecular surveys are essential for the characterization of environmental microbial communities. However, they require large computational resources, novel bioinformatic tools and elaborate pipelines. Many tools have been developed to analyze the resulting sequence data. For example, libraries written in Python (Knight et al., 2007) and R (Dixon, 2003; Kembel et al., 2010) provide building blocks for bioinformatic software. QIIME (Caporaso et al., 2010) and mothur (Schloss et al., 2009) are dedicated packages with scripts to build complete analysis pipelines, but they use incompatible file formats. Here, we introduce Bio-Community, a set of format-agnostic modules and scripts to parse and manipulate taxonomic or functional microbial community profiles.

2 FEATURES

2.1 Object model

Bio-Community is a Perl object-oriented toolkit that extends BioPerl. It is centered around the Community object, which contains a group of entities from the same geographic area (Fig. 1).

Fig. 1. Main objects, their attributes and operation modules.

These entities are Member objects, representing individual genomes, genes, taxa or operational taxonomic units from amplicon and meta-omic surveys. Member objects store attributes such as an identifier, a taxon or a sequence and can be given weights to account for the fact that there is no one-to-one relationship between a sequencing read and a microbial cell. The relative abundance or abundance rank of a Member can be calculated based on this Member’s count, weight and the total count in the Community (Fig. 2). Similarly, absolute abundance is based on total microbial abundance in the community, quantifiable by epifluorescence microscopy, qPCR or flow cytometry (Rinsoz et al., 2008).

Fig. 2. Relation between abundance types. Relative abundance depends on member counts and weights, whereas absolute abundance is further derived from a total abundance measure.
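
The relation between the abundance types can be illustrated with a minimal sketch. It is written in Python for illustration only and does not reproduce the Bio-Community Perl API: the `Member` class and the three functions are hypothetical names, and the sketch assumes that each weight divides the raw count (e.g. a read count divided by the 16S rRNA gene copy number) and that relative abundance is reported as a percentage.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Member:
    """Hypothetical stand-in for a community member (not the Bio-Community API)."""
    ident: str
    count: float
    weights: tuple = ()  # e.g. gene copy number, genome length

    def weighted_count(self):
        # Assumption: every weight divides the raw count; prod(()) is 1.
        return self.count / prod(self.weights)


def relative_abundances(members):
    """Relative abundance (%) of each member, based on weighted counts."""
    total = sum(m.weighted_count() for m in members)
    if total == 0:
        return {m.ident: 0.0 for m in members}
    return {m.ident: 100.0 * m.weighted_count() / total for m in members}


def abundance_ranks(members):
    """Rank 1 is the most abundant member; ties keep their input order."""
    ordered = sorted(members, key=lambda m: m.weighted_count(), reverse=True)
    return {m.ident: rank for rank, m in enumerate(ordered, start=1)}


def absolute_abundances(members, total_abundance):
    """Scale relative abundances by a measured total (e.g. cells per ml)."""
    rel = relative_abundances(members)
    return {ident: pct / 100.0 * total_abundance for ident, pct in rel.items()}


# Invented example data: read counts and gene copy numbers per cell.
community = [
    Member("otu_a", count=40, weights=(4,)),
    Member("otu_b", count=10, weights=(1,)),
    Member("otu_c", count=30, weights=(2,)),
]

print(relative_abundances(community))
print(abundance_ranks(community))
print(absolute_abundances(community, total_abundance=7e6))
```

In this invented example, otu_a has the most reads but otu_c ranks first once copy number is taken into account, which is the kind of correction that weights are meant to provide.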

2.2 Diversity metrics

Bio-Community quantifies community α, β and γ diversity (Whittaker, 1972) using a range of metrics [reviewed by Magurran (2004)]. The diversity of a single Community object, α diversity, is represented by metrics of richness, evenness, dominance and indices (Supplementary Table S1). Several Community objects can be grouped into a Meta object, representing a metacommunity (Leibold et al., 2004). This object provides methods to measure γ diversity, i.e. the collective diversity of its communities, and β diversity, i.e. their dissimilarity. The γ metrics are the same as those available for α diversity, whereas those for β diversity include qualitative and quantitative forms (Supplementary Table S1).
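
As a rough illustration of the three diversity levels, the sketch below computes a few textbook metrics in Python. The choice of metrics (richness, Shannon index, Shannon evenness, Simpson dominance, Jaccard and Bray-Curtis dissimilarity) and all function names are assumptions made for this example; the metrics actually implemented in Bio-Community are those listed in Supplementary Table S1. Each community is represented as a plain dictionary of member counts.

```python
from math import log


def richness(counts):
    """Number of members observed at least once."""
    return sum(1 for c in counts.values() if c > 0)


def shannon(counts):
    """Shannon index H' = -sum(p * ln p) over observed members."""
    total = sum(counts.values())
    return -sum((c / total) * log(c / total) for c in counts.values() if c > 0)


def shannon_evenness(counts):
    """H' divided by its maximum, ln(richness); 0.0 for fewer than two members."""
    s = richness(counts)
    return shannon(counts) / log(s) if s > 1 else 0.0


def simpson_dominance(counts):
    """Probability that two random draws belong to the same member."""
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())


def pool(communities):
    """Merge several communities; alpha metrics on the result give gamma diversity."""
    pooled = {}
    for counts in communities:
        for ident, c in counts.items():
            pooled[ident] = pooled.get(ident, 0) + c
    return pooled


def jaccard(a, b):
    """Qualitative beta diversity: 1 - shared members / all members."""
    sa = {k for k, c in a.items() if c > 0}
    sb = {k for k, c in b.items() if c > 0}
    union = sa | sb
    return 1 - len(sa & sb) / len(union) if union else 0.0


def bray_curtis(a, b):
    """Quantitative beta diversity based on count differences."""
    num = sum(abs(a.get(k, 0) - b.get(k, 0)) for k in set(a) | set(b))
    den = sum(a.values()) + sum(b.values())
    return num / den if den else 0.0


# Invented example data for two sites of one metacommunity.
site1 = {"A": 10, "B": 10}
site2 = {"B": 5, "C": 15}

print(richness(site1), shannon_evenness(site1))  # alpha diversity of site1
print(richness(pool([site1, site2])))            # gamma diversity (richness)
print(jaccard(site1, site2), bray_curtis(site1, site2))  # beta diversity
```

Computing γ diversity by pooling the communities and reusing the α functions mirrors the statement above that the γ metrics are the same as those available for α diversity.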

2.3 Data input and output

Community profiles (e.g. a site-by-species table) describe the distribution of members in biological samples. Operations to read and write these files are handled by the IO module and are important for exchanging data between programs that use different formats. We have implemented parsers for five common file types (Supplementary Table S2), including the BIOM standard (McDonald et al., 2012). Examples of these file types are given in the t/data folder of the Bio-Community package. The parsers automatically detect the format of a file from its content using the FormatGuesser module, and iteratively record member identifier, taxonomy and abundance.
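
Content-based format detection can be sketched as follows. This Python example is illustrative only: the detection rules, the function names and the two formats handled (a JSON BIOM 1.0 document and a generic tab-delimited table) are assumptions, not the logic of the FormatGuesser module, and taxonomy is ignored for brevity. Both parsers return the same structure, a dictionary of member counts per sample, which is what allows data to move between formats.

```python
import json


def guess_format(text):
    """Guess the profile format from its content (hypothetical rules)."""
    if text.lstrip().startswith("{"):
        try:
            doc = json.loads(text)
        except json.JSONDecodeError:
            return "unknown"
        # A BIOM 1.0 document is a JSON object with these keys, among others.
        if isinstance(doc, dict) and {"rows", "columns", "data"}.issubset(doc):
            return "biom"
        return "unknown"
    lines = [line for line in text.splitlines() if line.strip()]
    if lines and "\t" in lines[0]:
        return "table"
    return "unknown"


def parse_biom(text):
    """Read a BIOM 1.0 JSON document into {sample: {member: count}}."""
    doc = json.loads(text)
    samples = [col["id"] for col in doc["columns"]]
    members = [row["id"] for row in doc["rows"]]
    profile = {sample: {} for sample in samples}
    if doc.get("matrix_type") == "sparse":
        # Sparse data is a list of [row index, column index, value] triplets.
        for row, col, value in doc["data"]:
            profile[samples[col]][members[row]] = value
    else:
        # Dense data is a list of rows, one value per column.
        for row, values in enumerate(doc["data"]):
            for col, value in enumerate(values):
                if value:
                    profile[samples[col]][members[row]] = value
    return profile


def parse_table(text):
    """Read a tab-delimited table (members as rows, samples as columns)."""
    lines = [line for line in text.splitlines() if line.strip()]
    samples = lines[0].split("\t")[1:]
    profile = {sample: {} for sample in samples}
    for line in lines[1:]:
        fields = line.split("\t")
        for sample, value in zip(samples, fields[1:]):
            if float(value):
                profile[sample][fields[0]] = float(value)
    return profile


def read_profile(text):
    """Detect the format, then dispatch to the matching parser."""
    parsers = {"biom": parse_biom, "table": parse_table}
    fmt = guess_format(text)
    if fmt not in parsers:
        raise ValueError("unrecognized community profile format")
    return parsers[fmt](text)


# Invented example data: the same profile in two formats.
biom_text = json.dumps({
    "format": "Biological Observation Matrix 1.0.0",
    "matrix_type": "sparse",
    "shape": [2, 2],
    "rows": [{"id": "otu_a"}, {"id": "otu_b"}],
    "columns": [{"id": "soil"}, {"id": "water"}],
    "data": [[0, 0, 5], [1, 0, 3], [1, 1, 7]],
})
table_text = "member\tsoil\twater\notu_a\t5\t0\notu_b\t3\t7\n"

print(guess_format(biom_text), read_profile(biom_text))
print(guess_format(table_text), read_profile(table_text))
```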

2.4 Tools

Tool modules can perform operations such as community transformation, rarefaction and taxonomic summaries (Fig. 1). Utility scripts using these modules are available in Bio-Community (Supplementary Table S3). They allow biologists to perform specific operations on community profiles, but they do not form an entire microbial analysis pipeline. These scripts can also be regarded as examples of how to integrate Bio-Community into bioinformatic scripts (Fig. 3). This integration can also leverage external modules to rapidly develop powerful custom scripts, e.g. Getopt::Euclid for handling command-line arguments, BioPerl modules for reading sequences or running external programs (e.g. BLAST; Camacho et al., 2009) and Statistics::R for using R libraries or visualization capabilities.
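
Two of the operations named above, rarefaction and taxonomic summaries, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the algorithm used by the Bio-Community tool modules: rarefaction is implemented here as a single random subsample without replacement to a fixed depth, counts are assumed to be integers, and the function names, the lineage representation and the "Unclassified" label are hypothetical.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=None):
    """Randomly subsample integer counts, without replacement, to `depth`."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"cannot rarefy {total} counts to a depth of {depth}")
    rng = random.Random(seed)
    # Expand counts into one entry per observation, then draw `depth` of them.
    observations = [ident for ident, n in counts.items() for _ in range(n)]
    return dict(Counter(rng.sample(observations, depth)))


def summarize(counts, taxonomy, level):
    """Sum counts of members that share a lineage down to `level` (0-based)."""
    summary = Counter()
    for ident, n in counts.items():
        lineage = taxonomy.get(ident, [])
        if len(lineage) > level:
            name = ";".join(lineage[: level + 1])
        else:
            name = "Unclassified"
        summary[name] += n
    return dict(summary)


# Invented example data.
counts = {"otu_a": 50, "otu_b": 30, "otu_c": 20, "otu_d": 5}
taxonomy = {
    "otu_a": ["Bacteria", "Proteobacteria"],
    "otu_b": ["Bacteria", "Firmicutes"],
    "otu_c": ["Bacteria", "Proteobacteria"],
    "otu_d": ["Bacteria"],  # not classified at the phylum level
}

rarefied = rarefy(counts, depth=40, seed=1)
print(rarefied)
print(summarize(counts, taxonomy, level=1))
```

Rarefying all communities to the same depth before comparing them removes differences that are due only to sequencing effort, and summarizing at a coarser taxonomic level groups members that share the same lineage prefix.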

Fig. 3. Vignette illustrating the use of Bio-Community to read a BIOM community profile and report member information.

3 CONCLUSIONS

Bio-Community supports several file formats to interface with popular programs and will help bioinformaticians quickly construct custom analysis pipelines or novel software for microbial ecology. The integration of relative and absolute abundance with diversity metrics permits holistic microbial studies (Dinsdale et al., 2008; Dove et al., 2013; Nathani et al., 2013), while weights can be added to account for biases caused by gene copy number (Kembel et al., 2012) or genome length (Angly et al., 2009; Beszteri et al., 2010). We encourage programmers to join the development of Bio-Community at https://github.com/bioperl/Bio-Community and to add support for new file formats, diversity metrics or tools.

Funding: Australian Research Council DE120101213 to FEA and DP1093175 to GWT.

Conflict of interest: none declared.

REFERENCES

Angly FE, et al. The GAAS metagenomic tool and its estimations of viral and microbial average genome size in four major biomes. PLoS Comput. Biol. 2009;5:e1000593. doi: 10.1371/journal.pcbi.1000593.
Beszteri B, et al. Average genome size: a potential source of bias in comparative metagenomics. ISME J. 2010;4:1075–1077. doi: 10.1038/ismej.2010.29.
Camacho C, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. doi: 10.1186/1471-2105-10-421.
Caporaso JG, et al. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods. 2010;7:335–336. doi: 10.1038/nmeth.f.303.
Dinsdale EA, et al. Microbial ecology of four coral atolls in the northern Line Islands. PLoS One. 2008;3:e1584. doi: 10.1371/journal.pone.0001584.
Dixon P. VEGAN, a package of R functions for community ecology. J. Veg. Sci. 2003;14:927–930.
Dove SG, et al. Future reef decalcification under a business-as-usual CO2 emission scenario. Proc. Natl Acad. Sci. USA. 2013;110:15342–15347. doi: 10.1073/pnas.1302701110.
Frias-Lopez J, et al. Microbial community gene expression in ocean surface waters. Proc. Natl Acad. Sci. USA. 2008;105:3805–3810. doi: 10.1073/pnas.0708897105.
Handelsman J. Metagenomics: application of genomics to uncultured microorganisms. Microbiol. Mol. Biol. Rev. 2004;68:669–685. doi: 10.1128/MMBR.68.4.669-685.2004.
Kembel SW, et al. Incorporating 16S gene copy number information improves estimates of microbial diversity and abundance. PLoS Comput. Biol. 2012;8:e1002743. doi: 10.1371/journal.pcbi.1002743.
Kembel SW, et al. Picante: R tools for integrating phylogenies and ecology. Bioinformatics. 2010;26:1463–1464. doi: 10.1093/bioinformatics/btq166.
Knight R, et al. PyCogent: a toolkit for making sense from sequence. Genome Biol. 2007;8:R171. doi: 10.1186/gb-2007-8-8-r171.
Leibold MA, et al. The metacommunity concept: a framework for multi-scale community ecology. Ecol. Lett. 2004;7:601–613.
Magurran AE. Measuring biological diversity. Oxford, United Kingdom: Blackwell; 2004.
McDonald D, et al. The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome. GigaScience. 2012;1:7. doi: 10.1186/2047-217X-1-7.
Metzker ML. Sequencing technologies - the next generation. Nat. Rev. Genet. 2010;11:31–46. doi: 10.1038/nrg2626.
Nathani NM, et al. Comparative evaluation of rumen metagenome community using qPCR and MG-RAST. AMB Express. 2013;3:55. doi: 10.1186/2191-0855-3-55.
Rappé MS, Giovannoni SJ. The uncultured microbial majority. Annu. Rev. Microbiol. 2003;57:369–394. doi: 10.1146/annurev.micro.57.030502.090759.
Rinsoz T, et al. Application of real-time PCR for total airborne bacterial assessment: comparison with epifluorescence microscopy and culture-dependent methods. Atmos. Environ. 2008;42:6767–6774.
Schloss PD, et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Env. Microbiol. 2009;75:7537–7541. doi: 10.1128/AEM.01541-09.
Stajich JE, et al. The Bioperl toolkit: perl modules for the life sciences. Genome Res. 2002;12:1611–1618. doi: 10.1101/gr.361602.
Tringe SG, Hugenholtz P. A renaissance for the pioneering 16S rRNA gene. Curr. Opin. Microbiol. 2008;11:442–446. doi: 10.1016/j.mib.2008.09.011.
Whittaker RH. Evolution and measurement of species diversity. Taxon. 1972;21:213–251.