Author manuscript; available in PMC: 2018 Jan 1.
Published in final edited form as: Transl Res. 2016 Jul 25;179:7–23. doi: 10.1016/j.trsl.2016.07.012

High-Resolution Characterization of the Human Microbiome

Cecilia Noecker 1,#, Colin P McNally 1,#, Alexander Eng 1,#, Elhanan Borenstein 1,2,3
PMCID: PMC5164958  NIHMSID: NIHMS811770  PMID: 27513210

Abstract

The human microbiome plays an important and increasingly recognized role in human health. Studies of the microbiome typically employ targeted sequencing of the 16S rRNA gene, whole metagenome shotgun sequencing, or other meta-omic technologies to characterize the microbiome's composition, activity, and dynamics. Processing, analyzing, and interpreting these data involve numerous computational tools that aim to filter, cluster, annotate, and quantify the obtained data and ultimately provide an accurate and interpretable profile of the microbiome's taxonomy, functional capacity, and behavior. These tools, however, are often limited in resolution and accuracy and may fail to capture many biologically- and clinically-relevant microbiome features, such as strain-level variation or nuanced functional response to perturbation. Over the past few years, extensive efforts have been invested toward addressing these challenges and developing novel computational methods for accurate and high-resolution characterization of microbiome data. These methods aim to quantify strain-level composition and variation, detect and characterize rare microbiome species, link specific genes to individual taxa, and more accurately characterize the functional capacity and dynamics of the microbiome. These methods and the ability to produce detailed and precise microbiome information are clearly essential for informing microbiome-based personalized therapies. In this review, we survey these methods, highlighting the challenges each method sets out to address and briefly describing methodological approaches.

Introduction

Recent marked advances in sequencing technologies have been followed by an explosion of studies utilizing these technologies to explore a wide range of microbial communities, including those that inhabit the human body. Such studies apply targeted sequencing of the 16S rRNA gene as well as whole metagenome shotgun sequencing to characterize the human microbiome in numerous settings. Analyses of these sequencing data commonly use an assortment of clustering, binning, annotation, and assembly algorithms to ultimately profile the composition of species in each sample, the set of genes they collectively encode, or the genome sequence of specific member species (Figure 1). Combined, these efforts to map the human microbiome in health and in disease have led to an increased appreciation for the important role of the microbiome in human well-being1–5.

Figure 1. Schemes of microbiome analysis.

Metagenomic data, as well as other meta-omic data, can be processed and analyzed in various ways to address a diverse set of questions concerning the microbiome's composition, capacity, and function.

Nevertheless, common computational metagenomic analysis methods are often limited in resolution and may fail to resolve nuanced, yet important and potentially clinically-relevant details concerning the composition of species and genes in the microbiome. Standard 16S rRNA surveys, for example, are often limited to a genus-level taxonomic identification6, can fail to distinguish closely related taxonomic groups, and cannot always unambiguously discriminate rare, low-abundance taxa from noise7. Shotgun metagenomic analyses may similarly fail to identify the taxonomic origins of a gene of interest or to produce accurate and unbiased estimates of gene families’ abundances8,9.

Clearly, however, given the complexity of the human microbiome, accurate and high-resolution mapping of the microbiome is crucial for gaining a principled understanding of community behavior, function, and ultimately its impact on the host10. For example, accurately profiling strain-level microbiome composition is vital for tracking ecological trends over time, such as the spread of bacterial vaginosis-associated strains between sexual partners11. Discerning subtle genomic variation between closely related strains of the same species may also have important clinical implications, as in the case of Propionibacterium acnes, which displays extensive strain variation in the skin microbiome with potential impact on various skin conditions12. Likewise, Escherichia coli has well-characterized variation in toxin production, which results in high pathogenicity for a subset of strains, while other strains are commonly found in healthy gut microbiomes13. Careful differentiation of strains can also inform clinical decision making by, for example, providing valuable insights as to whether a patient will respond to the heart failure drug digoxin14. Accurate detection of low abundance species is similarly essential as such rare species may still play important roles in various biological processes. Indeed, even species present at less than 0.01% abundance in oral microbial communities can play a key role in causing oral inflammatory disease15. A high-quality, unbiased, and rigorous characterization of the metagenome's gene content is equally important for pinpointing disease-associated shifts in the functional capacity of the microbiome9.

Moreover, many molecular processes that play important roles in the microbiome's activity and dynamics go beyond the microbiome's taxonomic and genic composition and accordingly cannot be profiled through metagenome sequencing. For example, oligosaccharides found in breast milk can change microbial gene expression and production of physiologically relevant microbial metabolites in the infant gut without affecting the abundance of most species16. Exploring such processes may require detailed information about transcript and protein variation, metabolite concentrations, and spatial distribution. Indeed, new ‘omics’ technologies can now comprehensively quantify such community features, but computational methods available to analyze the resulting sizable and novel datasets are largely still in early stages and may be limited in resolution.

In this review, we accordingly describe an array of recent computational methods and analytical approaches that set out to address these challenges and to provide high-resolution, multi-omic, systematic characterizations of the microbiome at multiple levels (Table 1). While some of these approaches have primarily been applied to environmental microbial communities, all are broadly applicable and potentially useful in the context of the human microbiome and its health impacts. We first discuss taxonomic analysis of the microbiome, focusing on methods for detecting strain-level variation within each member species. We specifically describe methods that utilize targeted 16S rRNA or whole metagenome sequencing data for strain-level profiling, identification, and tracking, either de novo or based on existing reference genomes. We also describe recent methods for assembling the genomes of novel strains directly from metagenomic data. We next discuss methods for improved functional characterization of the microbiome, including accurate detection of the various gene families encoded by the metagenome and precise quantification of their abundances, and for linking taxonomic and functional profiles. Lastly, we describe several recent frameworks for analyzing and integrating other microbiome-derived high-throughput ‘omic’ datasets and for profiling additional facets of the microbiome's composition and activity.

Table 1.

Tool | Synopsis | Ref. | Website

16S rRNA analysis
M-pick | Identifies taxonomic clusters based on modularity analysis of a sequence read graph | 24 | http://plaza.ufl.edu/xywang/Mpick.htm
Swarm | Uses iterative linkage clustering to identify natural subpopulations | 27 | https://github.com/torognes/swarm
Minimum entropy decomposition | Hierarchically clusters sequences by iteratively subdividing based on sequence entropy | 30 | http://merenlab.org/2014/11/04/med/
Oligotyping | Identifies subpopulations within predefined OTUs by identifying and clustering the most informative nucleotide positions | 31 | http://merenlab.org/projects/oligotyping/
Oclust | Hierarchically clusters PacBio CCS reads into OTUs | 33 | https://github.com/oscar-franzen/oclust/
CopyRighter | Corrects 16S rRNA data for copy number variation | 40 | https://github.com/fangly/AmpliCopyrighter
metagenomeSeq | R package for normalization and differential abundance analysis | 42 | http://cbcb.umd.edu/software/metagenomeSeq/
phyloSeq | R package for processing, normalization, differential abundance analysis, and visualization | 43 | https://joey711.github.io/phyloseq/
RAIDA | Differential abundance analysis using ratios between taxa | 44 | http://cals.arizona.edu/~anling/sbg/software.htm

Strain-level characterization using shotgun metagenomic data
WG-FAST | Identifies strains from low coverage genome datasets | 63 | https://github.com/jasonsahl/wgfast
Sigma | Taxonomic analysis of metagenome data at the strain level, including variant calling and statistical uncertainty calculations | 62 | http://sigma.omicsbio.org
ConStrains | Identifies strains from metagenomic sequence data and reconstructs their phylogeny | 64 | https://bitbucket.org/luo-chengwei/constrains
PathoScope | Identifies the proportion of reads from individual microbial strains in metagenomic sequencing data | 70 | https://sourceforge.net/projects/pathoscope/
- | Large-scale characterization of strain-level copy number variation | 66 | http://elbo.gs.washington.edu/download.html
phyloCNV | Profiles species abundance; identifies, characterizes, and analyzes strains based on SNPs and copy number variation | 65 | https://github.com/snayfach/PhyloCNV

Assembling reference genomes from shotgun metagenomic data
MEGAHIT | Assembles metagenomic short reads with low memory use | 78 | https://github.com/voutcn/megahit
MetaVelvet | Assembles metagenomic short reads into contigs | 79 | http://metavelvet.dna.bio.keio.ac.jp/
CONCOCT | Binning using Gaussian mixture models and sequence composition and abundance | 90 | https://github.com/BinPro/CONCOCT
GroopM | Automated genome recovery from metagenomes | 87 | http://ecogenomics.github.io/GroopM/
MetaBAT | Bins genomes based on tetranucleotide frequency and abundance | 89 | https://bitbucket.org/berkeleylab/metabat
MaxBin | Bins assembled genomes using EM algorithm | 88 | http://downloads.jbei.org/data/microbial_communities/MaxBin/MaxBin.html
ABAWACA | Binning based on mono-, di-, and tri-nucleotide frequency and abundance, using hierarchical clustering | 91 | https://github.com/CK7/abawaca
CheckM | Assesses quality of putative genomes | 94 | http://ecogenomics.github.io/CheckM/
MetaDecon | Metagenomic deconvolution and reconstruction of species-specific genomic content | 93 | http://elbo.gs.washington.edu/software_metadecon.html

Specialized functional annotation HMM databases
FOAM | Annotation of KEGG orthology groups | 119 | http://cbb.pnnl.gov/portal/software/FOAM.html
Resfams | Annotation of antibiotic resistance genes | 108 | http://www.dantaslab.org/resfams
dbCAN | Annotation of carbohydrate-active enzyme (CAZyme) genes | 125 | http://csbl.bmb.uga.edu/dbCAN/annotate.php

Functional annotation and quantification of shotgun metagenomic data
ShortBRED | Representative marker-based protein family profiling | 126 | https://huttenhower.sph.harvard.edu/shortbred
MUSiCC | Accurate normalization of functional profiles | 9 | http://elbo.gs.washington.edu/software_musicc.html
MicrobeCensus | Accurate normalization of functional profiles | 129 | https://github.com/snayfach/MicrobeCensus

Combined taxonomic and functional annotation of shotgun metagenomic data
LMAT | k-mer based taxonomic assignment of metagenomic reads | 133 | https://sourceforge.net/projects/lmat/
Kraken | k-mer based taxonomic assignment of metagenomic reads | 134 | http://ccb.jhu.edu/software/kraken/
TAC-ELM | Neural network-based taxonomic classification of metagenomic reads | 135 | http://cs.gmu.edu/~mlbio/TAC-ELM/
MetAnnotate | Combined taxonomic and functional annotation with HMMs and alignment to HMM-family protein sequences | 138 | http://metannotate.uwaterloo.ca
SeMeta | Multiple-level clustering of reads followed by cluster assignment to taxa through representative read alignment | 140 | http://it.hcmute.edu.vn/bioinfo/metapro/SeMeta.html
MetaCluster-TA | Clusters assembled contigs and assigns taxonomy by alignment | 139 | http://i.cs.hku.hk/~alse/MetaCluster/index.html

Meta-omic analysis tools
TAG | Assembles a metatranscriptome incorporating information from a metagenome assembly | 153 | http://omics.informatics.indiana.edu/TAG/
Anvi'o | Interactive processing and visualization of metagenomes and metatranscriptomes | 156 | http://merenlab.org/projects/anvio/
Pipasic | Produces quantitative strain-specific peptide assignments using a sequence similarity correction | 163 | https://sourceforge.net/projects/pipasic/
BacSpace | Processing and quantitative analysis of microbial community FISH imaging data | 174 | https://bitbucket.org/kchuanglab/bacspace/downloads
MIMOSA | Metabolic model-based integration of microbiome taxonomic and metabolomic profiles | 172 | http://elbo.gs.washington.edu/software_MIMOSA.html

High-Resolution Characterization of the Microbiome's Taxonomic Composition

One of the most common and relatively accessible starting points for human microbiome analysis is taxonomic profiling. Specifically, by sequencing and analyzing taxonomy-associated marker genes, researchers can readily identify the various species present in a given microbiome sample and estimate the relative abundances of each species17. The study of such taxonomic profiles and the way they vary across individuals or between cohorts can provide numerous insights into the link between the microbiome ecology and the host's health. Such studies can, for example, pinpoint specific species with known virulence factors or community-wide dysbiotic features as biomarkers of disease18,19. As noted above, however, taxonomic profiling is often limited in resolution and may therefore hinder our ability to detect more fine-grained determinants of disease. Below, we describe several new and exciting developments in the analysis of both marker gene data and whole metagenomes that aim to provide a more detailed, high-resolution map of the microbiome's taxonomy.

High-resolution and accurate analysis of 16S rRNA data

To date, the most prevalent form of comprehensive microbiome taxonomic data is produced via targeted amplification and sequencing of the 16S rRNA gene, a commonly used phylogenetic marker20. The analysis of such 16S rRNA sequencing data typically involves clustering of the obtained sequences (usually based on sequence overall percent similarity) into Operational Taxonomic Units (OTUs), and determining the relative abundance of each OTU in the sample. The taxonomy of each OTU can then be inferred by clustering reads with reference sequences of known taxonomy or by a classifier algorithm that predicts each OTU's (or each read's) likely taxonomy21,22.

This clustering-based approach is efficient, widely-used, and well-established; yet several challenges remain in terms of accurate and precise taxonomic quantification at the species or strain level. First, a measure of the overall percent similarity between two 16S rRNA sequences may not fully capture the variation present in the sequenced region or the taxonomic divergence that this variation represents. Indeed, the number and nature of polymorphisms and of true subpopulations included within a single, similarity-based OTU cluster can vary greatly across OTUs23. A number of recently introduced algorithms aim to account for such variation using graph-based clustering approaches and grouping sequences based on local base differences between reads rather than by overall percent similarity24,25. These algorithms have proved successful in identifying higher-resolution sequence clusters and more accurately describing the population structure in each sample. One example of such an algorithm, termed Swarm26,27, first performs exact linkage clustering, grouping each read with any read in the same cluster from which it differs by a single nucleotide, and then refines each cluster based on read abundance distributions. This approach has been successfully applied to characterize fine-scale taxonomic profiles of bacteria and protists in several environments28,29. An alternative method for addressing the limitation of similarity-based clustering, termed minimum entropy decomposition, generates a hierarchy of read groupings by iteratively subdividing the dataset into groups based on the entropy explained by each division30.
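To make the linkage idea concrete, the following is a minimal sketch of the first step only: unique sequences are chained into the same cluster whenever they lie within one edit (substitution, insertion, or deletion) of a sequence already in that cluster. It is an illustration under simplifying assumptions, not the Swarm implementation: the function names and the toy reads are hypothetical, the all-against-all comparison is far slower than what a production tool would use, and the abundance-based refinement step is omitted.

```python
from collections import deque


def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by at most one substitution, insertion, or deletion."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) sequence
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # zero or one substitution
    return a[i:] == b[i + 1:]          # exactly one extra base in b


def link_clusters(seqs):
    """Single-linkage clustering of unique sequences at a one-edit threshold."""
    unique = sorted(set(seqs))
    unassigned = set(unique)
    clusters = []
    for seed in unique:
        if seed not in unassigned:
            continue
        unassigned.remove(seed)
        cluster, queue = [seed], deque([seed])
        while queue:
            current = queue.popleft()
            # Pull in every still-unassigned sequence one edit away from `current`.
            neighbors = [s for s in unassigned if within_one_edit(current, s)]
            for s in neighbors:
                unassigned.remove(s)
                cluster.append(s)
                queue.append(s)
        clusters.append(sorted(cluster))
    return clusters


# Hypothetical toy reads, for illustration only.
reads = ["ACGT", "ACGA", "ACCA", "TTTT", "TTTA", "GGGGGG", "ACGT"]
clusters = link_clusters(reads)
```

Note that "ACCA" and "ACGT" differ at two positions yet end up in the same cluster because "ACGA" links them. This chaining is the defining property of linkage-based grouping, and it is also why a subsequent refinement step is useful.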

Moreover, the clusters produced by any OTU picking method may also vary in homogeneity and within-cluster diversity across the various samples. Describing this within-cluster variation may lead to sub-OTU level taxonomic insights, such as sharing of an OTU subpopulation across samples. One computational approach to capture this variation (termed Oligotyping) uses Shannon entropy calculations to detect the most informative nucleotide variation and to correctly identify subpopulations within predefined OTU clusters31. This method relies on a combination of strategies to deemphasize likely sequencing errors compared to true strain variation and has been successfully applied to track strains of Gardnerella vaginalis shared between sexual partners11 and to study population dynamics in the oral microbiome and in sewage30,32. Another approach for extracting more detailed taxonomic information from 16S rRNA reads relies on longer-read sequencing technologies (most notably, PacBio Single Molecule Real Time sequencing) to obtain sequence data from more variable regions of this gene. Two recently introduced pipelines process and cluster PacBio circular consensus sequencing reads33,34, accounting for the specific characteristics of this different sequencing platform.
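The entropy calculation at the core of this idea can be sketched in a few lines. The example below computes the Shannon entropy of each column in a set of aligned reads, selects the highest-entropy positions, and labels each read by its bases at those positions. It is a simplified illustration and not the Oligotyping pipeline itself: the function names and toy reads are hypothetical, the reads are assumed to be pre-aligned and of equal length, and the noise-filtering strategies mentioned above are left out.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the bases observed at one alignment position."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def oligotypes(aligned_reads, n_positions):
    """Label each read by its bases at the n_positions highest-entropy columns."""
    length = len(aligned_reads[0])
    entropies = [
        column_entropy([read[i] for read in aligned_reads]) for i in range(length)
    ]
    # Rank columns from most to least variable, then keep the top ones in order.
    ranked = sorted(range(length), key=lambda i: -entropies[i])
    top = sorted(ranked[:n_positions])
    labels = Counter("".join(read[i] for i in top) for read in aligned_reads)
    return top, labels


# Hypothetical aligned reads from a single OTU, for illustration only.
aligned = ["ACGTA"] * 3 + ["ACCTA"] * 3
positions, label_counts = oligotypes(aligned, n_positions=1)
```

In this toy input only the third column varies, so it is the single position selected, and the OTU splits into two equally sized subpopulations labeled "G" and "C".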

Notably, as methodologies for characterizing 16S rRNA datasets have proliferated, so have studies comparing and evaluating these approaches22,35–37. These studies, however, have not necessarily reached a clear consensus on the superior approach, but have rather demonstrated that the choice of algorithm can have substantial impact on subsequent analyses, and that the best choice of method likely depends on the community being analyzed and the sequencing technology used.

Once sequences have been grouped into OTUs, OTU abundances can be analyzed and compared across taxa or samples. However, even accurate measurements of 16S rRNA read counts may not mirror the abundances of the various taxa in the community. First, since the copy number of the 16S rRNA gene varies across microbial genomes, 16S-based surveys may overestimate the abundances of taxa with multiple copies of this gene. Using reference genome information to normalize for this variation can adjust and improve estimates of the relative abundance of different taxa in the same sample38–40. PCR amplification can also introduce bias into abundance comparisons between taxa, since ribosomal genes from some taxa may amplify poorly with commonly-used primer sets41. This limitation also prevents the comparison of taxonomic abundances across different datasets generated using different primers. Lastly, the relative abundance of reads assigned to a given OTU across samples can be skewed by changes in the absolute abundance of another OTU, a phenomenon known as compositional bias. A number of tools have been introduced to correct for this bias, primarily by adopting techniques developed to address a similar problem in RNA-Seq experiments42–44, or alternatively to account for this effect in analyzing relative abundance values45. Failure to address this bias can result, for example, in the identification of spurious correlations between the abundances of different OTUs, limiting our ability to robustly analyze co-occurrence relationships between different taxa46. Combined, these various biases render the relationship between the relative abundances of 16S rRNA reads and true taxonomic abundances extremely complex.
One recent study set out to comprehensively characterize the joint impact of the various factors influencing this relationship by using synthetic mock communities of vaginal microbiome taxa and fitting regression models that predict true abundance of a given taxon as a function of both 16S rRNA read count and several taxon-specific bias correction terms47. While such a detailed approach can be helpful for interpreting and analyzing 16S rRNA datasets of well-studied taxa, the precise relationship between 16S read counts and true community taxonomic structure for many microbiome studies remains to be characterized.
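As a simple illustration of the copy number correction discussed above, the sketch below divides each taxon's read count by its 16S rRNA gene copy number and renormalizes the result to relative abundances. The taxon names, read counts, and copy numbers are hypothetical, and the function is not taken from any of the cited tools; in practice copy numbers would come from reference genomes or be inferred phylogenetically, and this step addresses neither primer bias nor compositional effects.

```python
def copy_number_corrected_abundance(read_counts, copy_numbers):
    """Convert 16S read counts to relative abundances adjusted for gene copy number.

    read_counts:  dict mapping taxon -> number of 16S reads assigned to it
    copy_numbers: dict mapping taxon -> 16S rRNA gene copies per genome
    """
    adjusted = {taxon: read_counts[taxon] / copy_numbers[taxon] for taxon in read_counts}
    total = sum(adjusted.values())
    return {taxon: value / total for taxon, value in adjusted.items()}


# Hypothetical two-taxon sample: taxon_A carries 7 copies of the gene, taxon_B only 1.
reads = {"taxon_A": 700, "taxon_B": 300}
copies = {"taxon_A": 7, "taxon_B": 1}
corrected = copy_number_corrected_abundance(reads, copies)
```

Here taxon_A accounts for 70% of the reads, but after correction it represents only 25% of the estimated cells, which shows how strongly uncorrected profiles can favor taxa with many copies of the gene.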

Resolving strain-level taxonomy from shotgun metagenomic data

While 16S rRNA-based surveys can provide important insights into the taxonomic composition of a given microbiome sample, their ability to resolve strain-level genetic diversity is inherently limited. In fact, substantial genotypic variation can exist in the absence of noticeable 16S rRNA sequence divergence48. This variation can impact the capacity and behavior of a species and ultimately impact community-level activity. For example, some species, such as E. coli, have extremely marked variation in gene content48,49, which can influence the strain's pathogenicity or ecological niche50,51. Moreover, the concept of a bacterial species is in fact somewhat subjective and may not be captured well by the level of divergence in a ribosomal gene sequence6. It is therefore often informative to go beyond species-level resolution and to characterize the composition of strains (i.e., within-species taxonomic divisions) and strain-level variation within the microbiome.

Unfortunately, however, traditional methods for detecting, characterizing, and tracking strain-level diversity rely on sequencing52–56 or applying microarrays57,58 to cultured isolates, and are therefore not readily applicable in a microbiome setting where many microbial taxa cannot be easily isolated or cultured59,60. Moreover, efforts to isolate all strains of interest in a given microbiome sample and sequence their genomes can be extremely resource-intensive61. An increasingly feasible alternative is to decipher strain-level diversity directly from shotgun metagenomic data using a plethora of novel and sophisticated computational techniques. Indeed, by identifying within-species genetic variation directly from metagenomic samples, a more comprehensive set of strains can be characterized in a high-throughput manner from a single sequencing experiment. This approach has been successfully applied, for example, to detect pathogenic strains of E. coli in clinical samples or for biosurveillance62,63, to identify novel strain-level dynamics in the infant gut64, to confirm the retention of personal strains over time65, and to demonstrate extensive, widespread, and clinically-relevant strain-level variation in the gut microbiome66.

Notably, strain-level variation can manifest in two ways: single nucleotide polymorphisms (SNPs) within shared genomic content, and variation in the presence (or copy number) of complete genes or specific segments of the genome. Most recently developed metagenomics-based SNP analysis methods take advantage of reference genome collections to estimate community diversity, detect strains of interest, or find shared strains between different metagenomic samples. These methods may use full genomes62,63,65,67 or marker genes known to contain loci with strain-identifying SNPs64. Since some reference genomes may be extremely similar to each other, the first step in many of these methods is to cluster genomes by similarity and select a representative genome for each cluster to which metagenomic reads can be aligned. This step speeds up the alignment process and reduces the likelihood of reads from the same strain being mapped across different but closely related references. One method, Sigma62, first uses such reference genome alignments to estimate relative abundances of taxa and then relies on the obtained estimates to refine the read mapping assignments. ConStrains64 takes a more efficient approach of mapping reads only to previously-identified informative marker genes, which facilitates strain profile comparisons between samples (e.g. for tracking strain dynamics in the infant microbiome68,69). Another tool, PathoScope70, identifies strains using reference genomes and estimates the relative share of different strains from the same species in a given sample, and is particularly optimized to detect low abundance strains from clinical samples.

The detection of strain-level variation in gene copy number (CNVs) or in gene content is a more specialized application of shotgun metagenomic-based taxonomic classification that focuses specifically on functional variation. Indeed, CNVs are an important source of functional difference between strains, with many variable genes involved in metabolism71,72, membrane and transport proteins56–58, and virulence55,71. Identifying which genes are present, absent, or vary in copy number across the various strains in a microbiome sample is therefore a crucial task that has been addressed by several recent studies. Most of these studies rely on mapping short metagenomic reads to some set of reference genomes (using a variety of read-mapping strategies), aiming to detect genomic regions for which the observed coverage deviates from expectation. Analysis of data from the Human Microbiome Project, for example, used a similar approach, mapping reads directly to a reference genome of Streptococcus mitis and demonstrating strain-level variation in the presence/absence of various genomic elements of this species1. More recently, a first large-scale analysis of strain-level copy number variation was introduced, using universal single-copy genes to translate coverage measurements into copy number estimates and inferring the copy number of thousands of genes across dozens of species and in >100 samples66. Comparing copy number estimates across samples, this study has demonstrated extensive and widespread strain variation in the gut, including variation associated with obesity and inflammatory bowel disease. Several later studies used a similar approach for detecting strain-level variation but focused mostly on genes’ presence/absence rather than on variation in copy number73. Other studies extending this approach first constructed pan-genomes (as inventories of all genes known to occur in any strain of a particular species) and then mapped reads to these pan-genomes65,74.
An alternative approach to directly mapping short reads to reference genomes is to first assemble metagenomic reads into contigs, identify predicted genes in these contigs, and then align those to a reference71,72. These longer query sequences may improve strain-specific gene identification but may be more limited in scale.
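The idea of translating coverage into copy number estimates can be illustrated with a short sketch. Assuming per-gene coverage for one species in one sample has already been obtained by read mapping, the coverage of each gene is divided by the median coverage of a set of universal single-copy genes, which serves as the expected coverage of a gene present once per genome. The gene names and coverage values below are hypothetical, the use of the median as the baseline is an assumption made here for robustness, and the published analysis involves additional steps not shown.

```python
from statistics import median


def estimate_copy_numbers(gene_coverage, single_copy_genes):
    """Estimate per-gene copy number relative to universal single-copy genes.

    gene_coverage:     dict mapping gene -> mean read coverage in one sample
    single_copy_genes: genes assumed to occur exactly once per genome
    """
    baseline = median(gene_coverage[g] for g in single_copy_genes)
    if baseline <= 0:
        raise ValueError("single-copy genes have no coverage; species may be absent")
    return {gene: cov / baseline for gene, cov in gene_coverage.items()}


# Hypothetical coverage values for one species in one sample.
coverage = {
    "scg1": 10.0, "scg2": 12.0, "scg3": 11.0,  # universal single-copy genes
    "geneX": 22.0,                              # candidate duplication
    "geneY": 0.0,                               # candidate deletion
}
copy_numbers = estimate_copy_numbers(coverage, ["scg1", "scg2", "scg3"])
```

In this toy sample the baseline coverage is 11, so geneX is estimated at two copies per genome and geneY at zero. Comparing such estimates across samples is what reveals strain-level differences in gene content.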

Assembling reference genomes from metagenomes

Detailed identification of strains and species can be more informative if combined with information about the gene content of each genome. Genome content information represents a mechanistic link between the taxonomy of a given microbial organism and its functional capacity, and, more generally, between community ecology and community-wide activity. Indeed, many prevalent gut species have been isolated and sequenced, yet many microbial taxa (and many strains) still lack any reference sequence75. Assembling complete genomes directly from shotgun metagenomic reads is therefore a crucial (though clearly nontrivial) task. A recent study, for example, has demonstrated the utility of assembling shotgun reads for linking taxonomic and functional dynamics after a dietary intervention for patients with Prader-Willi syndrome76. The past several years have, however, witnessed substantial progress in the quality and number of genomes recovered and assembled from metagenomes77.

Assembling genomes from metagenomes commonly involves two steps. First, shotgun metagenomic reads are assembled into contigs; second, the obtained contigs are grouped into multiple bins such that each bin ideally includes contigs from the same taxon. The assembly step can be performed using numerous assemblers that have been optimized for assembling metagenomic reads such as MEGAHIT78, MetaVelvet-SL79, Ray Meta80, or IDBA-UD81. The binning step often relies on nucleotide composition, exploiting the relationship between phylogenetic relatedness and similarity in various sequence features such as GC content or k-mer frequency82. Such nucleotide composition-based methods are prevalent and well-established, and several different implementations are available83,84. More recently, a different strategy for binning was introduced, which utilizes the fact that different reads from the same species will tend to co-vary in abundance across samples72,85. This approach was later refined by using the obtained differentially abundant bins of contigs to re-assemble the reads86. Finally, over the past few years, several exciting methods that integrate both the nucleotide composition-based and the differential abundance-based approaches have been published, including GroopM87, MaxBin88, MetaBAT89, CONCOCT90, and ABAWACA91. Another recently introduced method, termed Latent Strain Analysis (LSA), bins genomes using singular value decomposition, enabling it to assemble genomes from very large datasets and thus identify rare species not found with other methods92. Alternative approaches bin reads or whole genes without assembly, for example by utilizing the expected co-variation between the abundance of various genomic elements in the metagenome and the abundance of the OTU from which they originated to deconvolve the metagenome into taxon-specific genomic data93.
When considering these various binning methods, it should be noted that nucleotide composition-based methods have the advantage of being applicable even when only a single metagenomic sample is available, whereas differential abundance-based methods require multiple (and ideally a large number of) samples. When multiple samples are available, however, recent methods that combine both nucleotide composition and differential abundance will likely perform best. A comprehensive comparison of the performance of these many different binning algorithms has not yet been presented, though tools for validating the quality and completeness of assembled genomes are available (see, for example, CheckM94 and METAQUAST95).
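The two signals used for binning can each be sketched briefly. The example below computes a tetranucleotide frequency profile for a contig and a Euclidean distance between two profiles (the composition signal), and a Pearson correlation between the abundances of two contigs across samples (the differential abundance signal). It is a minimal illustration and does not reproduce any of the binning tools named above: all function names, sequences, and coverage values are hypothetical, reverse-complement k-mers are not merged as a real tool would typically do, and no actual clustering of contigs into bins is performed.

```python
import math
from itertools import product

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 tetranucleotides


def tetranucleotide_profile(contig):
    """Relative frequency of each 4-mer in a contig (k-mers with other symbols skipped)."""
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(contig) - 3):
        kmer = contig[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in KMERS]


def profile_distance(p, q):
    """Euclidean distance between two composition profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pearson(x, y):
    """Pearson correlation between two abundance vectors measured across samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


# Hypothetical contigs: a1 and a2 share composition, c does not.
contig_a1, contig_a2, contig_c = "ACGT" * 50, "ACGT" * 30, "GGCC" * 50
d_same = profile_distance(tetranucleotide_profile(contig_a1),
                          tetranucleotide_profile(contig_a2))
d_diff = profile_distance(tetranucleotide_profile(contig_a1),
                          tetranucleotide_profile(contig_c))

# Hypothetical coverage of two contigs across four samples.
co_abundance = pearson([5.0, 10.0, 20.0, 40.0], [2.5, 5.0, 10.0, 20.0])
```

Contigs with a small composition distance and a high abundance correlation are candidates for the same bin. The composition signal is available from a single sample, whereas the correlation requires several samples, which mirrors the trade-off described above.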

While the methods above rely solely on metagenomic short read data, new molecular technologies hold promise for improving metagenome-based genome assembly. For example, combining short read sequencing with Hi-C data (which provide information about the physical proximity of the different sequences) has been shown to improve contig binning in synthetic mixtures of microbes96–98. Long read and single-molecule sequencing can similarly help to link sequences from the same genome. For example, PacBio reads have been combined with short reads to reconstruct high-quality, closed genomes from the skin microbiome99, and synthetic long reads have been successfully used to improve assembly quality100,101. These approaches require additional experimental and computational steps, but may significantly improve the ability to recover quality genomes from complex community samples, and are particularly promising for recovering genomes of rare species101.

High-Resolution Characterization of the Microbiome's Functional Capacity

Taxonomic analyses can be extremely useful for detecting disease-associated shifts in community composition and for characterizing states of ecological dysbiosis. Some research questions, however, may be best addressed by considering the aggregate functional potential of the microbiome, regardless of the individual species that carry a specific gene or perform a specific function. Identifying which gene families are encoded in a metagenome provides insight into the capacity of the community as a whole and allows for comparison of the functional potential of a given sample to that of another sample or another environment102. It can facilitate, for example, the identification of novel metabolic functions103,104, disease-associated shifts in the microbiome's metabolic capacity2,105, functional profile variations due to environmental fluctuations106,107, or antibiotic resistance genes108,109. In such settings, researchers commonly take a gene-centric approach, treating the community as a single supra-organism110–112 and profiling the set of genes collectively encoded by the metagenome. To this end, these studies directly annotate each read in the metagenome (or each gene identified in assembled contigs) with a functional category. Importantly, this approach is particularly useful when the community harbors many poorly characterized species with no reference genome. In this section, we describe recent developments in functional annotation of metagenomic samples that aim to provide a more nuanced, targeted, and accurate quantification of an individual microbiome's functional capabilities.

Accurate annotation and quantification of the metagenome's functional profile

Functional annotation of shotgun metagenomic reads can be accomplished by a variety of recently introduced frameworks113–118, and is typically based on mapping these reads to genes or protein domains with known functional classifications. Read mapping is done either by aligning each read to a reference database of gene or protein sequences or by using probabilistic models (such as Hidden Markov Models; HMMs) to evaluate the likelihood that a given read belongs to a specific protein family or domain.

The general annotation approach provides a useful broad overview of the functional profile of a community, but may have a high false positive rate due to the large reference databases used. Such false positives may represent, for example, reads originating from genes that in fact have no closely related references in the database but that still map to genes with which they share regions of homology even though they may not perform the same function. To address this shortcoming, recent efforts have produced tailored reference databases that cover specific classes of proteins, in the hope that such specialized databases could improve the specificity and accuracy of functional annotation. Specifically, while large databases of protein-based HMMs exist, several specialized HMM databases have been recently introduced for metagenomic annotation. For example, FOAM119 is a database designed to identify genes matching KEGG Orthology groups116,120 that can aid in characterizing the metabolic potential of communities121,122. Resfams108, on the other hand, was developed to recognize the structure of antibiotic resistance genes and has been used to study the human gut resistomes of different cultures123,124. Yet another database, dbCAN125, specifically targets carbohydrate-active enzymes. A related method, ShortBRED126, similarly quantifies a specialized set of proteins of interest, but uses alignment-based annotations rather than HMMs for a more efficient and general approach that allows for customized user-defined reference databases. These metagenomics-specific and specialized databases are a key component for accurate annotation of complex metagenomic samples. Much progress has also been made in methods for read alignment, focusing primarily on speeding up the alignment process117,118 or providing efficient web-based annotation tools114,116.

Notably, however, even when shotgun reads are aligned to an appropriate database the resulting calculated functional profile can be markedly impacted by various factors, including experimental and computational biases and the protocol used to annotate each read based on the obtained alignments. Sample processing and library preparation can, for example, bias the predicted functional profile of a metagenomic sample127. A recent study systematically evaluated such homology-based annotation practices and demonstrated that variation introduced by computational protocol selection could completely mask true biological variation between samples, suggesting goal-specific best-practice guidelines for metagenomic annotation128. Moreover, once the samples’ functional profiles have been determined, rigorous normalization and calibration of samples are still required to allow accurate comparison across samples (e.g., to identify disease-associated functional shifts). A couple of recent studies, however, have demonstrated that the commonly used compositional normalization (i.e., using the relative abundance of each gene family within the metagenome) introduces marked biases both across and within microbiome samples9,129. These studies have further presented novel methods (termed MUSiCC9 and MicrobeCensus129) that use universal single-copy genes to calibrate measurements of gene abundances and to correct these biases. Use of these methods should improve the accuracy and statistical power of future comparative functional analyses.
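
As a rough illustration of the idea behind marker-gene calibration, the Python sketch below divides each gene family's abundance by the median abundance of a set of universal single-copy genes, which yields an estimate of average copies per genome rather than a relative abundance. It is a simplified sketch under stated assumptions, not the MUSiCC or MicrobeCensus implementation; the function name, gene identifiers, and abundance values are hypothetical.

```python
from statistics import median


def normalize_by_single_copy_genes(abundances, single_copy_genes):
    """Rescale gene-family abundances to approximate average copies per genome.

    abundances: dict mapping gene family -> length-corrected read abundance.
    single_copy_genes: gene families assumed to occur once in every genome.
    """
    marker_values = [abundances[g] for g in single_copy_genes if g in abundances]
    if not marker_values:
        raise ValueError("no single-copy marker genes found in the sample")
    # The median marker abundance serves as a proxy for the number of genomes sampled.
    scale = median(marker_values)
    if scale <= 0:
        raise ValueError("marker gene abundance must be positive")
    return {gene: value / scale for gene, value in abundances.items()}


# Hypothetical abundances for one sample (illustrative values only).
sample = {
    "marker_A": 98.0,
    "marker_B": 100.0,
    "marker_C": 104.0,
    "gene_X": 250.0,
    "gene_Y": 25.0,
}
markers = ["marker_A", "marker_B", "marker_C"]

normalized = normalize_by_single_copy_genes(sample, markers)
print(normalized["gene_X"])  # 2.5 copies per genome on average
print(normalized["gene_Y"])  # 0.25 copies per genome on average
```

Because the scaling factor comes from genes expected at one copy per genome, the resulting values do not depend on how the remaining gene families are distributed in the sample, which is the property that compositional normalization lacks.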

Integrated characterization of function and taxonomy

As noted above, methods for characterizing both the microbiome's taxonomic profile and its functional capacity have advanced rapidly over the past few years. Yet, a remaining important challenge is the integrated analysis of these two aspects of the microbiome and the determination of which taxa provide which functions. Such information will not only allow us to fill in gaps in the availability of reference genomes but is also a crucial first step in the development and design of targeted microbiome manipulations that could modulate the community's function.

A simple approach to associate taxa with functional potential is to annotate reads (or partial assemblies) with both taxonomy and function using any of the methods discussed above. Determining the taxon of origin for the many reads in a metagenomic sample, however, can be both computationally expensive and methodologically challenging due to the short length of shotgun reads and varying distribution of taxonomy-distinguishing loci across genomes. To address the latter issue, early tools such as MEGAN130 and MTR131 used a lowest common ancestor (LCA) approach that assigns a read the highest-resolution taxonomic classification that is shared by all sequences to which the read aligned. LCA classifications are clearly limited in resolution, leaving a large fraction of reads with only a coarse-grained taxonomic assignment or none at all132. To improve the precision of taxonomic assignment of shotgun metagenomic reads, several recently introduced tools have incorporated information on k-mer frequency profiles in reference genome databases, though how those profiles are used varies greatly between tools. LMAT133 and Kraken134 both assign taxonomy based on identified LCA taxa for k-mers in each query sequence. Other methods train models on the k-mer profiles associated with each taxon, using a variety of machine learning approaches including neural networks (TAC-ELM135), naïve Bayes classifiers (RITA136), or linear models-based methods137. TAC-ELM also incorporates data on GC content and RITA combines BLAST-based reference alignments. Comparisons between Kraken and the linear model-based method above suggest that while exact k-mer matching methods like LMAT and Kraken are more accurate when query sequences originate from reference genomes, they may produce overly specific classifications for sequences from genomes absent from the reference database137. 
Moreover, Kraken requires fairly long (31 nucleotide) k-mer matches, which may potentially reject many short reads due to insufficient data. These observations suggest that exact k-mer matching methods are most appropriate when a metagenome is dominated by well-characterized taxa and consists of sufficiently long reads, whereas machine learning approaches are superior for samples with more novel or unclassified microbes.
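
The LCA rule described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each reference hit is represented by its full lineage from root to tip; it is not the implementation used by MEGAN, MTR, or any other tool, and the function name and example lineages are chosen here for illustration only.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxonomic prefix shared by all lineages.

    Each lineage is a list of taxon names ordered from root to tip.
    An empty input yields an empty assignment.
    """
    if not lineages:
        return []
    shared = []
    # Walk the ranks in parallel and stop at the first disagreement.
    for ranks in zip(*lineages):
        if all(name == ranks[0] for name in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


# Illustrative case: one read aligns equally well to two reference genomes.
hits = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
     "Enterobacteriaceae", "Escherichia", "Escherichia coli"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
     "Enterobacteriaceae", "Salmonella", "Salmonella enterica"],
]

assignment = lowest_common_ancestor(hits)
print(assignment[-1])  # Enterobacteriaceae
```

The example shows why LCA assignments tend to be coarse: a read that matches two genera equally well can only be placed at the family level, even though each individual hit is resolved to species.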

A few alternative methods for taxonomic assignment of shotgun reads use more specialized techniques. MetAnnotate138 first uses an HMM approach to functionally annotate metagenomic reads, then determines taxonomy based on comparisons to the homologs of the matching protein family. Notably, this approach combines both functional annotation and taxonomic assignment into a single pipeline. Another tool, MetaCluster-TA139, partially assembles reads, clusters the resulting contigs, and then assigns the LCA taxonomy given cluster alignments to genomes. SeMeta140 similarly groups reads that contain overlapping sequence, clusters those groups by k-mer profiles, and then assigns each cluster a taxonomic classification using an LCA approach for representative reads. These clustering-based methods aim to leverage groups of reads to obtain a broader genomic context for taxonomic classification (in contrast to the k-mer approaches that classify single reads), but may still produce low-resolution or incorrect taxonomic assignments if clustering of reads is incorrect. Together, these novel techniques allow more detailed and accurate functional profiling of microbiome samples, which will ultimately aid in understanding the human microbiome's functional capacity, dynamics, and impact on the host.

Characterization of Other Microbiome Facets via Meta-Omic Assays

While deep genomic characterization of microbial communities has rapidly advanced our understanding of community structure and function, many community features cannot be captured by metagenomic assays. For example, the oral microbiome undergoes a dramatic shift in metabolism in response to carbohydrate consumption without any taxonomic group shifting substantially in abundance141. Likewise, communities with very different taxonomic profiles may in fact have similar functional metabolic profiles142. To study such processes in detail and to characterize these additional facets of the microbiome's activity, researchers utilize comprehensive ‘meta-omics’ technologies (including metatranscriptomics, metaproteomics, and metabolomics) that can systematically characterize community-wide gene expression, protein abundance, and metabolite concentration over time or in response to perturbations. In fact, multiple reviews have recently called for an integrative approach that combines and compares these ‘omic’ assays to identify and characterize the underlying biological mechanisms in the microbiome143–146. However, analyzing each of these omic datasets presents substantial bioinformatic challenges that have only been partially addressed to date. As in metagenomics, accurate and high-resolution quantification of the measured elements and accounting for various regularities, biases, and dependencies in the data are key for realizing the full potential of these exciting high-throughput datasets. These meta-omic assays, and the unique challenges each one presents, are discussed below.

A metatranscriptomic assay generally involves reverse transcription and cDNA sequencing of RNA material isolated from a microbiome sample. Such measurements of gene expression at the community level can provide important information on how different species respond to each other and to environmental changes such as antibiotic treatment147 or dietary perturbations148. This technology was further used to characterize gene expression patterns in a diverse range of communities148–152. A typical analysis of such metatranscriptomic data consists of transcript assembly, annotation with functional and/or taxonomic information, normalization, and testing for differential expression between sample groups. None of these processing and analysis steps is necessarily simple or straightforward. The assembly of metatranscriptomic data can be performed by any transcript assembler, but it may be useful to leverage reference information from an associated metagenome. For example, a recently developed method applied a de Bruijn graph-based approach to incorporate information on metagenome assembly quality and completeness to improve subsequent transcript assembly153. Assembled transcripts can be annotated for taxonomy and function using any of the metagenome annotation tools described above. However, as in the case of metagenomic assembly, fully assembled transcripts may not always be easy to obtain or informative (although, as an alternative, an assembly-free metatranscriptome-specific annotation pipeline is also available154). Moreover, a recent simulation study recommended that a reasonably unbiased analysis could be achieved by both assembling transcripts and including unassembled transcripts in subsequent clustering and annotation155. 
Notably, even after the metatranscriptome has been processed and the number of reads associated with each gene and/or taxon has been calculated, evaluating and exploring such data is a daunting task due to the potentially thousands of taxa, each with thousands of expressed genes, that are represented by these data. To address this challenge, an interactive tool (termed Anvi'o) has been recently introduced, implementing several metatranscriptomic and metagenomic processing algorithms and producing clear visualizations of assemblies and profiles at the species-, gene-, contig-, and sample-level156.

Statistically sound normalization and rigorous quantitative comparisons of such complex metatranscriptomic datasets are a further challenge. The abundance of reads from a given transcript in a metatranscriptome depends on multiple factors, including the expression level of that transcript in its resident species, the abundance of that species in the community, and various biases associated with RNA-Seq experiments (such as compositional bias and batch effects). Extensive simulation and evaluation of such RNA-Seq biases and the development of rigorous methods for addressing them have produced useful tools for analyzing single-organism RNA-Seq experiments and correcting RNA-Seq-associated biases157–159, some of which have already been applied in the microbial community setting. In contrast, however, methods for differentiating between transcript abundance changes occurring due to gene regulation in a given taxon versus those occurring due to ecological shifts are still lacking and are an important area for future research.
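
One way to make this dependence explicit, offered here only as a simplified sketch and not as an established model, is to write the expected number of reads for transcript g from taxon t in sample s as a product of the factors listed above. All symbols are introduced here for illustration: a is the abundance of the taxon in the community, e is the per-cell expression level of the transcript, and b collects the technical biases of the assay.

```latex
\mathbb{E}\left[ r_{g,t,s} \right] \;\propto\; a_{t,s} \cdot e_{g,t,s} \cdot b_{g,s}
```

Under this simplification, a change in the observed read count r between two samples is equally consistent with a change in e (regulation within the taxon) or a change in a (an ecological shift), which is why read counts alone cannot separate the two.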

Similarly, while metaproteomic assays present a powerful opportunity to understand protein-level regulation in complex communities, the analysis of such data presents a plethora of challenges, including both the traditional obstacles associated with proteomics-based experiments and additional complications associated with assaying a mixed community of microbes. Such studies generally use tandem mass spectrometry to quantify peptide fragments, and then identify the source proteins of each peptide by searching against a reference database of theoretical or previously collected spectra. Since a peptide typically cannot be identified unless it is found in the reference database, the choice of database and search parameters can have a substantial impact on the obtained results. Indeed, this effect was convincingly shown in a recent study comparing peptide identifications in a human intestinal metaproteomic dataset with a classic single-organism proteomic dataset using several different metaproteomic databases and search strategies160. An efficient way to narrow the search space and identify uncharacterized proteins is to use a database of theoretical spectra constructed from associated metagenome sequencing reads to search the obtained peptides161. In a recent study, for example, a strain-resolved metagenome was used to analyze a longitudinal metaproteomic dataset from the gut of a preterm infant162. Further difficulty associated with analyzing community-wide proteomic data arises because a given peptide may match homologous proteins across multiple taxa. Pipasic is a recently developed tool that addresses this challenge by correcting for the amount of similarity in peptide sequences from different strains163. 
Moreover, proteins at the community level display an enormous dynamic range of abundances, and it therefore cannot be reliably determined whether a peptide not detected in a given sample by an untargeted assay is indeed completely absent or present but at a very low abundance. This incompleteness restricts the utility of metaproteomics for community metabolism modeling, though this limitation may be ultimately mitigated by improving technology. As a promising example, one recent study was able to use metaproteomic data to construct and compare detailed metabolic models of two naphthalene-degrading bacterial communities164.

Importantly, while genes and proteins vary across taxa, metabolites are, at least in principle, universal. Accordingly, in contrast to metatranscriptomics and metaproteomics, the processing and analysis of community-wide metabolomic data can rely on standard approaches for single-organism metabolomics with essentially no modifications. For untargeted mass spectrometry metabolomics, these analyses typically involve normalization and putative identification of metabolites by searching either for matches in a spectral library or for known compounds with matching mass and chromatographic elution profiles165. The greater challenge, however, lies in the interpretation of these datasets, and in linking the observed variation in biomolecule abundances with other data on community structure and function. Statistical associations between disease, metabolite concentrations, and microbial species abundances have been observed in case-control studies of Crohn's disease, colorectal cancer, and C. difficile infection, among other conditions, but the mechanistic nature of these links remains unclear166–169. A few studies have further used metabolic pathway information to quantify the link between shifts in the metagenome and functionally related metabolome variation170,171. Moreover, a recent study has introduced a novel computational framework, MIMOSA, for metabolic model-based integration of community taxonomic and metabolomic data and for evaluating whether variation in the metabolome can be explained mechanistically by variation in the community's taxonomic profile172. Such methods are crucial for gaining a principled, systems-level understanding of how changes in community ecology impact community metabolism and behavior.
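
The putative identification step mentioned above, matching a measured feature to known compounds by mass and elution profile, can be sketched as follows. This is a minimal illustration under simplifying assumptions: adducts and isotopes are ignored, a single retention time stands in for the elution profile, and the compound names, masses, retention times, and tolerances are hypothetical rather than drawn from a curated library.

```python
def match_feature(observed_mass, observed_rt, library,
                  tolerance_ppm=10.0, rt_tolerance_min=0.2):
    """Return library compounds consistent with an observed feature.

    library maps compound name -> (reference mass, reference retention time in minutes).
    A compound matches when its mass error is within tolerance_ppm and its
    retention time differs by at most rt_tolerance_min.
    """
    matches = []
    for name, (ref_mass, ref_rt) in library.items():
        ppm_error = abs(observed_mass - ref_mass) / ref_mass * 1e6
        rt_error = abs(observed_rt - ref_rt)
        if ppm_error <= tolerance_ppm and rt_error <= rt_tolerance_min:
            matches.append(name)
    return sorted(matches)


# Hypothetical reference library (illustrative values only).
library = {
    "compound_A": (180.0634, 2.10),
    "compound_B": (180.0640, 5.80),
    "compound_C": (88.0524, 1.20),
}

candidates = match_feature(180.0636, 2.15, library)
print(candidates)  # ['compound_A']
```

In this example compound_B lies within the mass tolerance but elutes much later, so only compound_A is retained. Such matches remain putative, since distinct compounds can share both mass and retention behavior.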

Lastly, while not strictly an ‘omics’ assay, high-resolution imaging of microbial communities and the study of community spatial distributions is another area of rapid technological and bioinformatic development. Spatial factors can affect microbial community nutrient availability, communication, and biofilm formation, among other processes173. Methods for quantifying the distribution of microbes in a community and relating it to associated omic data are therefore clearly needed. One increasingly popular technique is fluorescent in situ hybridization (FISH) with probes specific to various bacterial taxa of interest, combined with high-resolution microscopy174,175. A recently developed tool, called BacSpace, systematically processes and analyzes such data by filtering out non-microbial fluorescence and calculating and aggregating distances between different microbial cells and environmental landmarks174. Another approach to examine microbial biogeography on a larger scale involves mapping and visualizing many metabolomic and taxonomic profiles via a 3D model of a community site. This strategy has been applied to communities growing on solid culture176, as well as to the human skin microbiome177. Computational and quantitative methods along these lines are crucial for incorporating information on spatial heterogeneity into a more complete mechanistic and quantitative understanding of the microbiome.

Conclusion

The growing appreciation for the scientific and clinical importance of the human microbiome has given rise to an explosion of microbiome studies. These studies now routinely generate, assemble, and explore high-dimensional meta-omic data at an unprecedented scale. Above, we have broadly outlined the most common types of approaches and computational tools available for processing and analyzing such data, with emphasis on several areas in which increasingly higher resolution and precision can be gained from computational analysis of microbiome data (Figure 1). Fortunately, such tools are regularly distributed as open-source software that can be applied to datasets from a wide range of studies (Table 1). It is important to note, however, that these methods (and likely many other methods that will be developed to address these challenges in coming years) are ultimately limited by the large numbers of genes of unknown function and yet-uncharacterized taxa present in the microbiome. Developing efficient, cost-effective, and rigorous methods to demystify these hidden layers of microbiome diversity is therefore necessary to realize the full potential of microbiome research. Nevertheless, the resolution and scale of microbial community profiling will likely continue to improve with future technology development. These technologies will provide an increasingly more detailed view of the structure and function of the microbiome's subpopulations and even single cells across time and space, the behavior of such subpopulations, and the way they interact with one another and with the host. These advances will contribute to the growing field of personalized medicine, with applications ranging from precise identification of pathogenic strains for targeted treatment, through careful monitoring of dysbiotic microbial communities in disease, to personalized and rational design of microbiome manipulations.

Acknowledgment

CN is supported by an NSF IGERT DGE-1258485 fellowship. CM is supported by “Interdisciplinary Training in Genomic Sciences” National Human Genome Research Institute Grant T32 HG00035. This work was supported in part by New Innovator Award DP2 AT007802-01 to EB.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

All authors have read the journal's policy on disclosure of potential conflicts of interest. All authors have read the journal's authorship agreement and the manuscript has been reviewed by and approved by all named authors. All authors have disclosed any financial or personal relationship with organizations that could potentially be perceived as influencing the described research.

References

  • 1. Huttenhower C, Gevers D, Knight R, et al. Structure, function and diversity of the healthy human microbiome. Nature. 2012;486(7402):207–214. doi: 10.1038/nature11234.
  • 2. Qin J, Li Y, Cai Z, et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature. 2012;490(7418):55–60. doi: 10.1038/nature11450.
  • 3. Turnbaugh PJ, Ley RE, Mahowald MA, Magrini V, Mardis ER, Gordon JI. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature. 2006;444(7122):1027–1031. doi: 10.1038/nature05414.
  • 4. Cox LM, Yamanishi S, Sohn J, et al. Altering the Intestinal Microbiota during a Critical Developmental Window Has Lasting Metabolic Consequences. Cell. 2014;158(4):705–721. doi: 10.1016/j.cell.2014.05.052.
  • 5. Smith MI, Yatsunenko T, Manary MJ, et al. Gut microbiomes of Malawian twin pairs discordant for kwashiorkor. Science. 2013;339(6119):548–554. doi: 10.1126/science.1229000.
  • 6. Yarza P, Yilmaz P, Pruesse E, et al. Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nat Rev Microbiol. 2014;12(9):635–645. doi: 10.1038/nrmicro3330.
  • 7. Shakya M, Quince C, Campbell JH, Yang ZK, Schadt CW, Podar M. Comparative metagenomic and rRNA microbial diversity characterization using archaeal and bacterial synthetic communities. Environ Microbiol. 2013;15(6):1882–1899. doi: 10.1111/1462-2920.12086.
  • 8. Mande SS, Mohammed MH, Ghosh TS. Classification of metagenomic sequences: methods and challenges. Brief Bioinform. 2012;13(6):669–681. doi: 10.1093/bib/bbs054.
  • 9. Manor O, Borenstein E. MUSiCC: a marker genes based framework for metagenomic normalization and accurate profiling of gene abundances in the microbiome. Genome Biol. 2015;16:53. doi: 10.1186/s13059-015-0610-8.
  • 10. Manor O, Levy R, Borenstein E. Mapping the inner workings of the microbiome: Genomic- and metagenomic-based study of metabolism and of metabolic interactions in the human gut microbiome. Cell Metab. 2014;20(5):742–745. doi: 10.1016/j.cmet.2014.07.021.
  • 11. Eren AM, Zozaya M, Taylor CM, Dowd SE, Martin DH, Ferris MJ. Ravel J, editor. Exploring the Diversity of Gardnerella vaginalis in the Genitourinary Tract Microbiota of Monogamous Couples Through Subtle Nucleotide Variation. PLoS One. 2011;6(10):e26732. doi: 10.1371/journal.pone.0026732.
  • 12. Fitz-Gibbon S, Tomida S, Chiu B-H, et al. Propionibacterium acnes Strain Populations in the Human Skin Microbiome Associated with Acne. J Invest Dermatol. 2013;133(9):2152–2160. doi: 10.1038/jid.2013.21.
  • 13. Busby B, Kristensen DM, Koonin EV. Contribution of phage-derived genomic islands to the virulence of facultative bacterial pathogens. Environ Microbiol. 2012;15(2):307–312. doi: 10.1111/j.1462-2920.2012.02886.x.
  • 14. Haiser HJ, Gootenberg DB, Chatman K, Sirasani G, Balskus EP, Turnbaugh PJ. Predicting and Manipulating Cardiac Drug Inactivation by the Human Gut Bacterium Eggerthella lenta. Science. 2013;341(6143):295–298. doi: 10.1126/science.1235872.
  • 15. Hajishengallis G, Liang S, Payne M, et al. Low-Abundance Biofilm Species Orchestrates Inflammatory Periodontal Disease through the Commensal Microbiota and Complement. Cell Host Microbe. 2011;10(5):497–506. doi: 10.1016/j.chom.2011.10.006.
  • 16. Charbonneau M, O'Donnell D, Blanton L, et al. Sialylated Milk Oligosaccharides Promote Microbiota-Dependent Growth in Models of Infant Undernutrition. Cell. 2016;164(5):859–871. doi: 10.1016/j.cell.2016.01.024.
  • 17. Goodrich JK, Di Rienzi SC, Poole AC, et al. Conducting a Microbiome Study. Cell. 2014;158(2):250–262. doi: 10.1016/j.cell.2014.06.037.
  • 18. Kostic AD, Gevers D, Pedamallu CS, et al. Genomic analysis identifies association of Fusobacterium with colorectal carcinoma. Genome Res. 2012;22(2):292–298. doi: 10.1101/gr.126573.111.
  • 19. Llopis M, Cassard AM, Wrzosek L, et al. Intestinal microbiota contributes to individual susceptibility to alcoholic liver disease. Gut. 2016;65:830–839. doi: 10.1136/gutjnl-2015-310585.
  • 20. Woese CR, Fox GE. Phylogenetic structure of the prokaryotic domain: The primary kingdoms. Proc Natl Acad Sci. 1977;74(11):5088–5090. doi: 10.1073/pnas.74.11.5088.
  • 21. Ju F, Zhang T. 16S rRNA gene high-throughput sequencing data mining of microbial diversity and interactions. Appl Microbiol Biotechnol. 2015;99(10):4119–4129. doi: 10.1007/s00253-015-6536-y.
  • 22. Kopylova E, Navas-Molina JA, Mercier C, et al. Open-Source Sequence Clustering Methods Improve the State Of the Art. mSystems. 2016;1(1):e00003–e00015. doi: 10.1128/mSystems.00003-15.
  • 23. Tikhonov M, Leach RW, Wingreen NS. Interpreting 16S metagenomic data without clustering to achieve sub-OTU resolution. ISME J. 2014;9(1):68–80. doi: 10.1038/ismej.2014.117.
  • 24. Wang X, Yao J, Sun Y, Mai V. M-pick a modularity-based method for OTU picking of 16S rRNA sequences. BMC Bioinformatics. 2013;14:43. doi: 10.1186/1471-2105-14-43.
  • 25. Forster D, Bittner L, Karkar S, et al. Testing ecological theories with sequence similarity networks: marine ciliates exhibit similar geographic dispersal patterns as multicellular organisms. BMC Biol. 2015;13:16. doi: 10.1186/s12915-015-0125-5.
  • 26. Mahé F, Rognes T, Quince C, de Vargas C, Dunthorn M. Swarm: robust and fast clustering method for amplicon-based studies. PeerJ. 2014;2:e593. doi: 10.7717/peerj.593.
  • 27. Mahé F, Rognes T, Quince C, de Vargas C, Dunthorn M. Swarm v2: highly-scalable and high-resolution amplicon clustering. PeerJ. 2015;3:e1420. doi: 10.7717/peerj.1420.
  • 28. De Vargas C, Audic S, Henry N, et al. Eukaryotic plankton diversity in the sunlit ocean. Science. 2015;348(6237):1261605. doi: 10.1126/science.1261605.
  • 29. Lima-Mendez G, Faust K, Henry N, et al. Determinants of community structure in the global plankton interactome. Science. 2015;348(6237):1262073. doi: 10.1126/science.1262073.
  • 30. Eren AM, Morrison HG, Lescault PJ, Reveillaud J, Vineis JH, Sogin ML. Minimum entropy decomposition: Unsupervised oligotyping for sensitive partitioning of high-throughput marker gene sequences. ISME J. 2014;9(4):968–979. doi: 10.1038/ismej.2014.195.
  • 31. Eren AM, Maignien L, Sul WJ, et al. Oligotyping: differentiating between closely related microbial taxa using 16S rRNA gene data. Methods Ecol Evol. 2013;4(12):1111–1119. doi: 10.1111/2041-210X.12114.
  • 32. Newton RJ, McLellan SL, Dila DK, et al. Sewage Reflects the Microbiomes of Human Populations. mBio. 2015;6(2):e02574–14. doi: 10.1128/mBio.02574-14.
  • 33. Franzén O, Hu J, Bao X, Itzkowitz SH, Peter I, Bashir A. Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering. Microbiome. 2015;3:43. doi: 10.1186/s40168-015-0105-6.
  • 34. Singer E, Bushnell B, Coleman-Derr D, et al. High-resolution phylogenetic microbial community profiling. ISME J. 2016 Feb; doi: 10.1038/ismej.2015.249.
  • 35. Westcott SL, Schloss PD. De novo clustering methods outperform reference-based methods for assigning 16S rRNA gene sequences to operational taxonomic units. PeerJ. 2015;3:e1487. doi: 10.7717/peerj.1487.
  • 36. Forster D, Dunthorn M, Stoeck T, Mahé F. Comparison of three clustering approaches for detecting novel environmental microbial diversity. PeerJ. 2016;4:e1692. doi: 10.7717/peerj.1692.
  • 37. Schmidt TSB, Rodrigues JFM, von Mering C. Ecological Consistency of SSU rRNA-Based Operational Taxonomic Units at a Global Scale. PLoS Comput Biol. 2014;10(4):e1003594. doi: 10.1371/journal.pcbi.1003594.
  • 38. Kembel SW, Wu M, Eisen JA, Green JL. Incorporating 16S Gene Copy Number Information Improves Estimates of Microbial Diversity and Abundance. PLoS Comput Biol. 2012;8(10):e1002743. doi: 10.1371/journal.pcbi.1002743.
  • 39. Langille MGI, Zaneveld J, Caporaso JG, et al. Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nat Biotechnol. 2013;31(9):814–821. doi: 10.1038/nbt.2676.
  • 40. Angly FE, Dennis PG, Skarshewski A, Vanwonterghem I, Hugenholtz P, Tyson GW. CopyRighter: a rapid tool for improving the accuracy of microbial community profiles through lineage-specific gene copy number correction. Microbiome. 2014;2:11. doi: 10.1186/2049-2618-2-11.
  • 41. Walker AW, Martin JC, Scott P, Parkhill J, Flint HJ, Scott KP. 16S rRNA gene-based profiling of the human infant gut microbiota is strongly influenced by sample processing and PCR primer choice. Microbiome. 2015;3:26. doi: 10.1186/s40168-015-0087-4.
  • 42. Paulson JN, Stine OC, Bravo HC, Pop M. Differential abundance analysis for microbial marker-gene surveys. Nat Methods. 2013;10(12):1200–1202. doi: 10.1038/nmeth.2658.
  • 43. McMurdie PJ, Holmes S. Waste Not Want Not: Why Rarefying Microbiome Data Is Inadmissible. PLoS Comput Biol. 2014;10(4):e1003531. doi: 10.1371/journal.pcbi.1003531.
  • 44. Sohn MB, Du R, An L. A robust approach for identifying differentially abundant features in metagenomic samples. Bioinformatics. 2015;31(14):2269–2275. doi: 10.1093/bioinformatics/btv165.
  • 45. Friedman J, Alm EJ. Mering C, editor. Inferring correlation networks from genomic survey data. PLoS Comput Biol. 2012;8(9):e1002687. doi: 10.1371/journal.pcbi.1002687.
  • 46. Weiss S, Van Treuren W, Lozupone C, et al. Correlation detection strategies in microbial data sets vary widely in sensitivity and precision. ISME J. 2016 Feb; doi: 10.1038/ismej.2015.235.
  • 47. Brooks JP, Edwards DJ, Harwich MD, et al. The truth about metagenomics: quantifying and counteracting bias in 16S rRNA studies. BMC Microbiol. 2015;15:66. doi: 10.1186/s12866-015-0351-6.
  • 48. Konstantinidis KT, Ramette A, Tiedje JM. The bacterial species definition in the genomic era. Philos Trans R Soc B Biol Sci. 2006;361(1475):1929–1940. doi: 10.1098/rstb.2006.1920.
  • 49. Lukjancenko O, Wassenaar TM, Ussery DW. Comparison of 61 sequenced Escherichia coli genomes. Microb Ecol. 2010;60(4):708–720. doi: 10.1007/s00248-010-9717-3.
  • 50. Clermont O, Bonacorsi S, Bingen E. Rapid and Simple Determination of the Escherichia coli Phylogenetic Group. Appl Environ Microbiol. 2000;66(10):4555–4558. doi: 10.1128/aem.66.10.4555-4558.2000.
  • 51.LeBlanc J. Implication of Virulence Factors in Escherichia coli O157:H7 Pathogenesis. Crit Rev Microbiol. 2003;29(4):277–296. doi: 10.1080/713608014. [DOI] [PubMed] [Google Scholar]
  • 52.Holt KE, Parkhill J, Mazzoni CJ, et al. High-throughput sequencing provides insights into genome variation and evolution in Salmonella Typhi. Nat Genet. 2008;40(8):987–993. doi: 10.1038/ng.195. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Gutacker MM, Smoot JC, Migliaccio CAL, et al. Genome-Wide Analysis of Synonymous Single Nucleotide Polymorphisms in Mycobacterium tuberculosis Complex Organisms: Resolution of Genetic Relationships Among Closely Related Microbial Strains. Genetics. 2002;162(4):1533–1543. doi: 10.1093/genetics/162.4.1533. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Manning SD, Motiwala AS, Springman AC, et al. Variation in virulence among clades of Escherichia coli O157:H7 associated with disease outbreaks. Proc Natl Acad Sci. 2008;105(12):4868–4873. doi: 10.1073/pnas.0710834105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Gill SR, Fouts DE, Archer GL, et al. Insights on Evolution of Virulence and Resistance from the Complete Genome Analysis of an Early Methicillin-Resistant Staphylococcus aureus Strain and a Biofilm-Producing Methicillin-Resistant Staphylococcus epidermidis Strain. J Bacteriol. 2005;187(7):2426–2438. doi: 10.1128/JB.187.7.2426-2438.2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Hansen EE, Lozupone CA, Rey FE, et al. Pan-genome of the dominant human gut-associated archaeon Methanobrevibacter smithii, studied in twins. Proc Natl Acad Sci. 2011;108(Supplement 1):4599–4606. doi: 10.1073/pnas.1000071108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Salama N, Guillemin K, McDaniel TK, Sherlock G, Tompkins L, Falkow S. A whole-genome microarray reveals genetic diversity among Helicobacter pylori strains. Proc Natl Acad Sci. 2000;97(26):14668–14673. doi: 10.1073/pnas.97.26.14668. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Siezen RJ, Tzeneva VA, Castioni A, et al. Phenotypic and genomic diversity of Lactobacillus plantarum strains isolated from various environmental niches. Environ Microbiol. 2010;12(3):758–773. doi: 10.1111/j.1462-2920.2009.02119.x. [DOI] [PubMed] [Google Scholar]
  • 59.Rappé MS, Giovannoni SJ. The uncultured microbial majority. Annu Rev Microbiol. 2003;57:369–394. doi: 10.1146/annurev.micro.57.030502.090759. [DOI] [PubMed] [Google Scholar]
  • 60.Hugenholtz P. Exploring prokaryotic diversity in the genomic era. Genome Biol. 2002;3:reviews0003.1. doi: 10.1186/gb-2002-3-2-reviews0003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Atarashi K, Tanoue T, Oshima K, et al. Treg induction by a rationally selected mixture of Clostridia strains from the human microbiota. Nature. 2013;500(7461):232–236. doi: 10.1038/nature12331. [DOI] [PubMed] [Google Scholar]
  • 62.Ahn T-H, Chai J, Pan C. Sigma: Strain-level inference of genomes from metagenomic analysis for biosurveillance. Bioinformatics. 2014;31(2):170–177. doi: 10.1093/bioinformatics/btu641. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Sahl JW, Schupp JM, Rasko DA, Colman RE, Foster JT, Keim P. Phylogenetically typing bacterial strains from partial SNP genotypes observed from direct sequencing of clinical specimen metagenomic data. Genome Med. 2015;7:52. doi: 10.1186/s13073-015-0176-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Luo C, Knight R, Siljander H, Knip M, Xavier RJ, Gevers D. ConStrains identifies microbial strains in metagenomic datasets. Nat Biotechnol. 2015;33(10):1045–1052. doi: 10.1038/nbt.3319. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Nayfach S, Pollard KS. Population Genetic Analyses of Metagenomes Reveal Extensive Strain-Level Variation in Prevalent Human-Associated Bacteria. Cold Spring Harbor Labs Journals. 2015 [Google Scholar]
  • 66.Greenblum S, Carr R, Borenstein E. Extensive Strain-Level Copy-Number Variation across Human Gut Microbiome Species. Cell. 2015;160(4):583–594. doi: 10.1016/j.cell.2014.12.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Schloissnig S, Arumugam M, Sunagawa S, et al. Genomic variation landscape of the human gut microbiome. Nature. 2013;493(7430):45–50. doi: 10.1038/nature11711. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Vatanen T, Kostic AD, d'Hennezel E, et al. Variation in Microbiome LPS Immunogenicity Contributes to Autoimmunity in Humans. Cell. 2016;165(4):842–853. doi: 10.1016/j.cell.2016.04.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Yassour M, Vatanen T, Siljander H, et al. Natural history of the infant gut microbiome and impact of antibiotic treatment on bacterial strain diversity and stability. Sci Transl Med. 2016;8(343):343ra81. doi: 10.1126/scitranslmed.aad0917. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Hong C, Manimaran S, Shen Y, et al. PathoScope 2.0: a complete computational framework for strain identification in environmental or clinical sequencing samples. Microbiome. 2014;2:33. doi: 10.1186/2049-2618-2-33. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Morowitz MJ, Denef VJ, Costello EK, et al. Strain-resolved community genomic analysis of gut microbial colonization in a premature infant. Proc Natl Acad Sci. 2010;108(3):1128–1133. doi: 10.1073/pnas.1010992108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Sharon I, Morowitz MJ, Thomas BC, Costello EK, Relman DA, Banfield JF. Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization. Genome Res. 2012;23(1):111–120. doi: 10.1101/gr.142315.112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Zhu A, Sunagawa S, Mende DR, Bork P. Inter-individual differences in the gene content of human gut bacterial species. Genome Biol. 2015;16:82. doi: 10.1186/s13059-015-0646-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Scholz M, Ward DV, Pasolli E, et al. Strain-level microbial epidemiology and population genomics from shotgun metagenomics. Nat Methods. 2016 doi: 10.1038/nmeth.3802. advance on. [DOI] [PubMed] [Google Scholar]
  • 75.Hug LA, Baker BJ, Anantharaman K, et al. A new view of the tree of life. Nat Microbiol. 2016:16048. doi: 10.1038/nmicrobiol.2016.48. [DOI] [PubMed] [Google Scholar]
  • 76.Zhang C, Yin A, Li H, et al. Dietary modulation of gut microbiota contributes to alleviation of both genetic and simple obesity in children. EBioMedicine. 2015;2(8):968–984. doi: 10.1016/j.ebiom.2015.07.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Sangwan N, Xia F, Gilbert JA. Recovering complete and draft population genomes from metagenome datasets. Microbiome. 2016;4:8. doi: 10.1186/s40168-016-0154-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31(10):1674–1676. doi: 10.1093/bioinformatics/btv033. [DOI] [PubMed] [Google Scholar]
  • 79.Afiahayati, Sato K, Sakakibara Y. MetaVelvet-SL: an extension of the Velvet assembler to a de novo metagenomic assembler utilizing supervised learning. DNA Res. 2014;22(1):69–77. doi: 10.1093/dnares/dsu041. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Boisvert S, Raymond F, Godzaridis É , Laviolette F, Corbeil J. Ray Meta: scalable de novo metagenome assembly and profiling. Genome Biol. 2012;13(12):R122. doi: 10.1186/gb-2012-13-12-r122. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Peng Y, Leung HCM, Yiu SM, Chin FYL. Meta-IDBA: a de Novo assembler for metagenomic data. Bioinformatics. 2011;27(13):i94–i101. doi: 10.1093/bioinformatics/btr216. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Pride DT. Evolutionary Implications of Microbial Genome Tetranucleotide Frequency Biases. Genome Res. 2003;13(2):145–158. doi: 10.1101/gr.335003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Strous M, Kraft B, Bisdorf R, Tegetmeyer HE. The Binning of Metagenomic Contigs for Microbial Physiology of Mixed Cultures. Front Microbio. 2012;3:410. doi: 10.3389/fmicb.2012.00410. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Saeed I, Tang S-L, Halgamuge SK. Unsupervised discovery of microbial population structure within metagenomes using nucleotide base composition. Nucleic Acids Res. 2011;40(5):e34–e34. doi: 10.1093/nar/gkr1204. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Albertsen M, Hugenholtz P, Skarshewski A, Nielsen KL, Tyson GW, Nielsen PH. Genome sequences of rare uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat Biotechnol. 2013;31(6):533–538. doi: 10.1038/nbt.2579. [DOI] [PubMed] [Google Scholar]
  • 86.Nielsen HB, Almeida M, Juncker AS, et al. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nat Biotechnol. 2014;32(8):822–828. doi: 10.1038/nbt.2939. [DOI] [PubMed] [Google Scholar]
  • 87.Imelfort M, Parks D, Woodcroft BJ, Dennis P, Hugenholtz P, Tyson GW. GroopM: an automated tool for the recovery of population genomes from related metagenomes. PeerJ. 2014;2:e603. doi: 10.7717/peerj.603. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Wu Y-W, Simmons BA, Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics. 2015;32(4):605–607. doi: 10.1093/bioinformatics/btv638. [DOI] [PubMed] [Google Scholar]
  • 89.Kang DD, Froula J, Egan R, Wang Z. MetaBAT an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ. 2015;3:e1165. doi: 10.7717/peerj.1165. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Alneberg J, Bjarnason BS, de Bruijn I, et al. Binning metagenomic contigs by coverage and composition. Nat Methods. 2014;11(11):1144–1146. doi: 10.1038/nmeth.3103. [DOI] [PubMed] [Google Scholar]
  • 91.Brown CT, Hug LA, Thomas BC, et al. Unusual biology across a group comprising more than 15% of domain Bacteria. Nature. 2015;523(7559):208–211. doi: 10.1038/nature14486. [DOI] [PubMed] [Google Scholar]
  • 92.Cleary B, Brito IL, Huang K, et al. Detection of low-abundance bacterial strains in metagenomic datasets by eigengenome partitioning. Nat Biotechnol. 2015;33(10):1053–1060. doi: 10.1038/nbt.3329. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Carr R, Shen-Orr SS, Borenstein E. Reconstructing the Genomic Content of Microbiome Taxa through Shotgun Metagenomic Deconvolution. PLoS Comput Biol. 2013;9(10):e1003292. doi: 10.1371/journal.pcbi.1003292. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates single cells, and metagenomes. Genome Res. 2015;25(7):1043–1055. doi: 10.1101/gr.186072.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Mikheenko A, Saveliev V, Gurevich A. MetaQUAST: evaluation of metagenome assemblies. Bioinformatics. 2016;32(7):1088–1090. doi: 10.1093/bioinformatics/btv697. [DOI] [PubMed] [Google Scholar]
  • 96.Burton J, Liachko I, Dunham M, Shendure J. Species-Level Deconvolution of Metagenome Assemblies with Hi-C-Based Contact Probability Maps. G3. 2014;4(7):1339–1346. doi: 10.1534/g3.114.011825. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97.Marbouty M, Cournac A, Flot J-F, Marie-Nelly H, Mozziconacci J, Koszul R. Metagenomic chromosome conformation capture (meta3C) unveils the diversity of chromosome organization in microorganisms. Elife. 2014;3:e03318. doi: 10.7554/eLife.03318. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Beitel CW, Froenicke L, Lang JM, et al. Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products. PeerJ. 2014;2:e415. doi: 10.7717/peerj.415. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99.Tsai Y-C, Conlan S, Deming C, et al. Resolving the Complexity of Human Skin Metagenomes Using Single-Molecule Sequencing. MBio. 2016;7(1):e01948–15. doi: 10.1128/mBio.01948-15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100.Sharon I, Kertesz M, Hug LA, et al. Accurate multi-kb reads resolve complex populations and detect rare microorganisms. Genome Res. 2015;25(4):534–543. doi: 10.1101/gr.183012.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101.Kuleshov V, Jiang C, Zhou W, Jahanbani F, Batzoglou S, Snyder M. Synthetic long-read sequencing reveals intraspecies diversity in the human microbiome. Nat Biotechnol. 2015;34(1):64–69. doi: 10.1038/nbt.3416. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102.Tringe SG, von Mering C, Kobayashi A, et al. Comparative metagenomics of microbial communities. Science. 2005;308(5721):554–557. doi: 10.1126/science.1107851. [DOI] [PubMed] [Google Scholar]
  • 103.Illeghems K, Weckx S, De Vuyst L. Applying meta-pathway analyses through metagenomics to identify the functional properties of the major bacterial communities of a single spontaneous cocoa bean fermentation process sample. Food Microbiol. 2015;50:54–63. doi: 10.1016/j.fm.2015.03.005. [DOI] [PubMed] [Google Scholar]
  • 104.White RA, Chan AM, Gavelis GS, et al. Metagenomic Analysis Suggests Modern Freshwater Microbialites Harbor a Distinct Core Microbial Community. Front Microbiol. 2016;6:1531. doi: 10.3389/fmicb.2015.01531. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105.Greenblum S, Turnbaugh PJ, Borenstein E. Metagenomic systems biology of the human gut microbiome reveals topological shifts associated with obesity and inflammatory bowel disease. Proc Natl Acad Sci U S A. 2012;109(2):594–599. doi: 10.1073/pnas.1116053109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106.Freedman ZB, Upchurch RA, Zak DR, Cline LC. Anthropogenic N Deposition Slows Decay by Favoring Bacterial Metabolism: Insights from Metagenomic Analyses. Front Microbiol. 2016;7:259. doi: 10.3389/fmicb.2016.00259. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 107.Koenig JE, Spor A, Scalfone N, et al. Succession of microbial consortia in the developing infant gut microbiome. Proc Natl Acad Sci U S A. 2011;108(Suppl (Supplement_1)):4578–4585. doi: 10.1073/pnas.1000081107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108.Gibson MK, Forsberg KJ, Dantas G. Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology. ISME J. 2015;9(1):207–216. doi: 10.1038/ismej.2014.106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109.Li B, Yang Y, Ma L, et al. Metagenomic and network analysis reveal wide distribution and co-occurrence of environmental antibiotic resistance genes. ISME J. 2015;9(11):2490–2502. doi: 10.1038/ismej.2015.59. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Lederberg J. Infectious history. Science. 2000;288(5464):287–293. doi: 10.1126/science.288.5464.287. [DOI] [PubMed] [Google Scholar]
  • 111.Gordon JI, Klaenhammer TR. A rendezvous with our microbes. Proc Natl Acad Sci U S A. 2011;108(Suppl):4513–4515. doi: 10.1073/pnas.1101958108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112.Borenstein E. Computational systems biology and in silico modeling of the human microbiome. Brief Bioinform. 2012;13(6):769–780. doi: 10.1093/bib/bbs022. [DOI] [PubMed] [Google Scholar]
  • 113.Glass EM, Meyer F. Handbook of Molecular Microbial Ecology I. Wiley-Blackwell; 2011. The Metagenomics RAST Server: A Public Resource for the Automatic Phylogenetic and Functional Analysis of Metagenomes. pp. 325–331. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 114.Wu S, Zhu Z, Fu L, Niu B, Li W. WebMGA: a customizable web server for fast metagenomic sequence analysis. BMC Genomics. 2011;12:444. doi: 10.1186/1471-2164-12-444. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 115.Arumugam M, Harrington ED, Foerstner KU, Raes J, Bork P. SmashCommunity: a metagenomic annotation and analysis tool. Bioinformatics. 2010;26(23):2977–2978. doi: 10.1093/bioinformatics/btq536. [DOI] [PubMed] [Google Scholar]
  • 116.Kanehisa M, Sato Y, Morishima K. BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J Mol Biol. 2015;428(4):726–731. doi: 10.1016/j.jmb.2015.11.006. [DOI] [PubMed] [Google Scholar]
  • 117.Bose T, Haque MM, Reddy C, Mande SS. COGNIZER: A Framework for Functional Annotation of Metagenomic Datasets. PLoS One. 2015;10(11):e0142102. doi: 10.1371/journal.pone.0142102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 118.Kultima JR, Coelho LP, Forslund K, et al. MOCAT2: a metagenomic assembly, annotation and profiling framework. Bioinformatics. 2016 Apr;:btw183. doi: 10.1093/bioinformatics/btw183. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 119.Prestat E, David MM, Hultman J, et al. FOAM (Functional Ontology Assignments for Metagenomes): a Hidden Markov Model (HMM) database with environmental focus. Nucleic Acids Res. 2014;42(19):e145–e145. doi: 10.1093/nar/gku702. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 120.Kanehisa M. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000;28(1):27–30. doi: 10.1093/nar/28.1.27. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 121.Yeoh YK, Paungfoo-Lonhienne C, Dennis PG, et al. The core root microbiome of sugarcanes cultivated under varying nitrogen fertilizer application. Environ Microbiol. 2016;18(5):1338–1351. doi: 10.1111/1462-2920.12925. [DOI] [PubMed] [Google Scholar]
  • 122.Nelson MB, Berlemont R, Martiny AC, Martiny JBH. Kostka JE, editor. Nitrogen Cycling Potential of a Grassland Litter Microbial Community. Appl Environ Microbiol. 2015;81(20):7012–7022. doi: 10.1128/AEM.02222-15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 123.Clemente JC, Pehrsson EC, Blaser MJ, et al. The microbiome of uncontacted Amerindians. Sci Adv. 2015;1(3) doi: 10.1126/sciadv.1500183. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 124.Rampelli S, Schnorr SL, Consolandi C, et al. Metagenome Sequencing of the Hadza Hunter-Gatherer Gut Microbiota. Curr Biol. 2015;25(13):1682–1693. doi: 10.1016/j.cub.2015.04.055. [DOI] [PubMed] [Google Scholar]
  • 125.Yin Y, Mao X, Yang J, Chen X, Mao F, Xu Y. dbCAN: a web resource for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 2012;40:W445–W451. doi: 10.1093/nar/gks479. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 126.Kaminski J, Gibson MK, Franzosa EA, Segata N, Dantas G, Huttenhower C. Noble WS, editor. High-Specificity Targeted Functional Profiling in Microbial Communities with ShortBRED. PLOS Comput Biol. 2015;11(12):e1004557. doi: 10.1371/journal.pcbi.1004557. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 127.Jones MB, Highlander SK, Anderson EL, et al. Library preparation methodology can influence genomic and functional predictions in human microbiome research. Proc Natl Acad Sci. 2015;112(45):14024–14029. doi: 10.1073/pnas.1519288112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 128.Carr R, Borenstein E. Comparative analysis of functional metagenomic annotation and the mappability of short reads. PLoS One. 2014;9(8):e105776. doi: 10.1371/journal.pone.0105776. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 129.Nayfach S, Pollard KS. Average genome size estimation improves comparative metagenomics and sheds light on the functional ecology of the human microbiome. Genome Biol. 2015;16:51. doi: 10.1186/s13059-015-0611-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 130.Huson DH, Auch AF, Qi J, Schuster SC. MEGAN analysis of metagenomic data. Genome Res. 2007;17(3):377–386. doi: 10.1101/gr.5969107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 131.Gori F, Folino G, Jetten MSM, Marchiori E. MTR: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks. Bioinformatics. 2010;27(2):196–203. doi: 10.1093/bioinformatics/btq649. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 132.Kunin V, Copeland A, Lapidus A, Mavromatis K, Hugenholtz P. A Bioinformatician's Guide to Metagenomics. Microbiol Mol Biol Rev. 2008;72(4):557–578. doi: 10.1128/MMBR.00009-08. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 133.Ames SK, Hysom DA, Gardner SN, Lloyd GS, Gokhale MB, Allen JE. Scalable metagenomic taxonomy classification using a reference genome database. Bioinformatics. 2013;29(18):2253–2260. doi: 10.1093/bioinformatics/btt389. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 134.Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 2014;15(3):R46. doi: 10.1186/gb-2014-15-3-r46. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 135.Rasheed Z, Rangwala H. Metagenomic Taxonomic Classification using Extreme Learning Machines. J Bioinform Comput Biol. 2012;10(5):1–19. doi: 10.1142/S0219720012500151. [DOI] [PubMed] [Google Scholar]
  • 136.MacDonald NJ, Parks DH, Beiko RG. Encyclopedia of Metagenomics. Springer Science \mathplus Business Media; 2013. RITA: Rapid Identification of High-Confidence Taxonomic Assignments for Metagenomic Data. pp. 1–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 137.Vervier K, Mahé P, Tournoud M, Veyrieras J-B, Vert J-P. Large-scale Machine Learning for Metagenomics Sequence Classification. Bioinformatics. 2016;32(7):1023–1032. doi: 10.1093/bioinformatics/btv683. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 138.Petrenko P, Lobb B, Kurtz DA, Neufeld JD, Doxey AC. MetAnnotate: function-specific taxonomic profiling and comparison of metagenomes. BMC Biol. 2015;13:92. doi: 10.1186/s12915-015-0195-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 139.Wang Y, Leung H, Yiu S, Chin F. MetaCluster-TA: taxonomic annotation for metagenomic data based on assembly-assisted binning. BMC Genomics. 2014;15(Suppl 1):S12. doi: 10.1186/1471-2164-15-S1-S12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 140.Van Le V, Van Tran L, Van Tran H. A novel semi-supervised algorithm for the taxonomic assignment of metagenomic reads. BMC Bioinformatics. 2016;17(1):22. doi: 10.1186/s12859-015-0872-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 141.Edlund A, Yang Y, Yooseph S, et al. Meta-omics uncover temporal regulation of pathways across oral microbiome genera during in vitro sugar metabolism. ISME J. 2015;9(12):2605–2619. doi: 10.1038/ismej.2015.72. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 142.Ferrer M, Ruiz A, Lanza F, et al. Microbiota from the distal guts of lean and obese adolescents exhibit partial functional redundancy besides clear differences in community structure. Environ Microbiol. 2013;15(1):211–226. doi: 10.1111/j.1462-2920.2012.02845.x. [DOI] [PubMed] [Google Scholar]
  • 143.Franzosa EA, Hsu T, Sirota-Madi A, et al. Sequencing and beyond: integrating molecular “omics” for microbial community profiling. Nat Rev Microbiol. 2015;13(6):360–372. doi: 10.1038/nrmicro3451. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 144.Lamendella R, VerBerkmoes N, Jansson JK. ‘Omics’ of the mammalian gut – new insights into function. Curr Opin Biotechnol. 2012;23(3):491–500. doi: 10.1016/j.copbio.2012.01.016. [DOI] [PubMed] [Google Scholar]
  • 145.Waldor MK, Tyson G, Borenstein E, et al. Where next for microbiome research? PLoS Biol. 2015;13(1):e1002050. doi: 10.1371/journal.pbio.1002050. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 146.Greenblum S, Chiu H, Levy R, Carr R, Borenstein E. Towards a predictive systems-level model of the human microbiome: progress, challenges, and opportunities. Curr Opin Biotechnol. 2013;24:810–820. doi: 10.1016/j.copbio.2013.04.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 147.Pérez-Cobas AE, Gosalbes MJ, Friedrichs A, et al. Gut microbiota disturbance during antibiotic therapy: a multi-omic approach. Gut. 2013;62(11):1591–1601. doi: 10.1136/gutjnl-2012-303184. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 148.David LA, Maurice CF, Carmody RN, et al. Diet rapidly and reproducibly alters the human gut microbiome. Nature. 2014;505(7484):559–563. doi: 10.1038/nature12820. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 149.De Filippis F, Genovese A, Ferranti P, Gilbert JA, Ercolini D. Metatranscriptomics reveals temperature-driven functional changes in microbiome impacting cheese maturation rate. Sci Rep. 2016;6:21871. doi: 10.1038/srep21871. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 150.Shi W, Moon CD, Leahy SC, et al. Methane yield phenotypes linked to differential gene expression in the sheep rumen microbiome. Genome Res. 2014;24(9):1517–1525. doi: 10.1101/gr.168245.113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 151.Jorth P, Turner KH, Gumus P, Nizam N, Buduneli N, Whiteley M. Metatranscriptomics of the Human Oral Microbiome during Health and Disease. MBio. 2014;5(2):e01012–e01014. doi: 10.1128/mBio.01012-14. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 152.Aylward FO, Eppley JM, Smith JM, Chavez FP, Scholin CA, DeLong EF. Microbial community transcriptional networks are conserved in three domains at ocean basin scales. Proc Natl Acad Sci. 2015;112(17):5443–5448. doi: 10.1073/pnas.1502883112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 153.Ye Y, Tang H. Utilizing de Bruijn graph of metagenome assembly for metatranscriptome analysis. Bioinformatics. 2016;32(7):1001–1008. doi: 10.1093/bioinformatics/btv510. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 154.Leimena MM, Ramiro-Garcia J, Davids M, et al. A comprehensive metatranscriptome analysis pipeline and its validation using human small intestine microbiota datasets. BMC Genomics. 2013;14:530. doi: 10.1186/1471-2164-14-530. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 155.Toseland A, Moxon S, Mock T, Moulton V. Metatranscriptomes from diverse microbial communities: assessment of data reduction techniques for rigorous annotation. BMC Genomics. 2014;15:901. doi: 10.1186/1471-2164-15-901. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 156.Eren AM, Esen ÖC, Quince C, et al. Anvi'o: an advanced analysis and visualization platform for `omics data. PeerJ. 2015;3:e1319. doi: 10.7717/peerj.1319. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 157.Dillies M-A, Rau A, Aubert J, et al. A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform. 2012;14(6):671–683. doi: 10.1093/bib/bbs046. [DOI] [PubMed] [Google Scholar]
  • 158.Qin L-X, Huang H-C, Niu Y. Differential Expression Analysis for RNA-Seq: An Overview of Statistical Methods and Computational Software. Cancer Inf. 2015;14(Suppl 1):57–67. doi: 10.4137/CIN.S21631. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 159.Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12) doi: 10.1186/s13059-014-0550-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 160.Muth T, Kolmeder CA, Salojärvi J, et al. Navigating through metaproteomics data: A logbook of database searching. Proteomics. 2015;15(20):3439–3453. doi: 10.1002/pmic.201400560. [DOI] [PubMed] [Google Scholar]
  • 161.Erickson AR, Cantarel BL, Lamendella R, et al. Integrated metagenomics/metaproteomics reveals human host-microbiota signatures of Crohn's disease. PLoS One. 2012;7(11):e49138. doi: 10.1371/journal.pone.0049138. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 162.Brooks B, Mueller RS, Young JC, Morowitz MJ, Hettich RL, Banfield JF. Strain-resolved microbial community proteomics reveals simultaneous aerobic and anaerobic function during gastrointestinal tract colonization of a preterm infant. Front Microbiol. 2015;6:654. doi: 10.3389/fmicb.2015.00654. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 163.Penzlin A, Lindner MS, Doellinger J, Dabrowski PW, Nitsche A, Renard BY. Pipasic: similarity and expression correction for strain-level identification and quantification in metaproteomics. Bioinformatics. 2014;30(12):i149–i156. doi: 10.1093/bioinformatics/btu267. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 164.Tobalina L, Bargiela R, Pey J, et al. Context-specific metabolic network reconstruction of a naphthalene-degrading bacterial community guided by metaproteomic data. Bioinformatics. 2015;31(11):1771–1779. doi: 10.1093/bioinformatics/btv036. [DOI] [PubMed] [Google Scholar]
  • 165.Ren S, Hinzman AA, Kang EL, Szczesniak RD, Lu LJ. Computational and statistical analysis of metabolomics data. Metabolomics. 2015;11(6):1492–1513. [Google Scholar]
  • 166.Jansson J, Willing B, Lucio M, et al. Metabolomics reveals metabolic biomarkers of Crohn's disease. PLoS One. 2009;4(7):e6386. doi: 10.1371/journal.pone.0006386. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 167.Weir TL, Manter DK, Sheflin AM, Barnett BA, Heuberger AL, Ryan EP. Stool microbiome and metabolome differences between colorectal cancer patients and healthy adults. PLoS One. 2013;8(8):e70803. doi: 10.1371/journal.pone.0070803. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 168.Theriot CM, Koenigsknecht MJ, Carlson PE, et al. Antibiotic-induced shifts in the mouse gut microbiome and metabolome increase susceptibility to Clostridium difficile infection. Nat Commun. 2014;5:3114. doi: 10.1038/ncomms4114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 169.Gomez A, Petrzelkova K, Yeoman CJ, et al. Gut microbiome composition and metabolomic profiles of wild western lowland gorillas (Gorilla gorilla gorilla) reflect host ecology. Mol Ecol. 2015;24(10):2551–2565. doi: 10.1111/mec.13181. [DOI] [PubMed] [Google Scholar]
  • 170.McHardy IH, Goudarzi M, Tong M, et al. Integrative analysis of the microbiome and metabolome of the human intestinal mucosal surface reveals exquisite interrelationships. Microbiome. 2013;1(1):17. doi: 10.1186/2049-2618-1-17. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 171.Sridharan GV, Choi K, Klemashevich C, et al. Prediction and quantification of bioactive microbiota metabolites in the mouse gut. Nat Commun. 2014;5:5492. doi: 10.1038/ncomms6492. [DOI] [PubMed] [Google Scholar]
  • 172.Noecker C, Eng A, Srinivasan S, et al. Metabolic Model-Based Integration of Microbiome Taxonomic and Metabolomic Profiles Elucidates Mechanistic Links between Ecological and Metabolic Variation. mSystems. 2016;1(1):e00013–e00015. doi: 10.1128/mSystems.00013-15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 173.Donaldson GP, Lee SM, Mazmanian SK. Gut biogeography of the bacterial microbiota. Nat Rev Microbiol. 2016;14(1):20–32. doi: 10.1038/nrmicro3552. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 174.Earle K, Billings G, Sigal M, et al. Quantitative Imaging of Gut Microbiota Spatial Organization. Cell Host Microbe. 2015;18(4):478–488. doi: 10.1016/j.chom.2015.09.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 175.Welch JLM, Rossetti BJ, Rieken CW, Dewhirst FE, Borisy GG. Biogeography of a human oral microbiome at the micron scale. Proc Natl Acad Sci. 2016;113(6):E791–E800. doi: 10.1073/pnas.1522149113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 176.Watrous JD, Phelan VV, Hsu C-C, et al. Microbial metabolic exchange in 3D. ISME J. 2013;7(4):770–780. doi: 10.1038/ismej.2012.155. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 177.Bouslimani A, Porto C, Rath CM, et al. Molecular cartography of the human skin surface in 3D. Proc Natl Acad Sci. 2015;112(17):E2120–E2129. doi: 10.1073/pnas.1424409112. [DOI] [PMC free article] [PubMed] [Google Scholar]
