Abstract
Advances in high-throughput sequencing (HTS) have fostered rapid developments in the field of microbiome research, and massive microbiome datasets are now being generated. However, the diversity of software tools and the complexity of analysis pipelines make it difficult to access this field. Here, we systematically summarize the advantages and limitations of microbiome methods. Then, we recommend specific pipelines for amplicon and metagenomic analyses, and describe commonly-used software and databases, to help researchers select the appropriate tools. Furthermore, we introduce statistical and visualization methods suitable for microbiome analysis, including alpha- and beta-diversity, taxonomic composition, difference comparisons, correlation, networks, machine learning, evolution, source tracing, and common visualization styles to help researchers make informed choices. Finally, a step-by-step reproducible analysis guide is introduced. We hope this review will allow researchers to carry out data analysis more effectively and to quickly select the appropriate tools in order to efficiently mine the biological significance behind the data.
Keywords: metagenome, marker genes, high-throughput sequencing, pipeline, reproducible analysis, visualization
Introduction
Microbiome refers to an entire microhabitat, including its microorganisms, their genomes, and the surrounding environment (Marchesi and Ravel, 2015). With the development of high-throughput sequencing (HTS) technology and data analysis methods, the roles of the microbiome in humans (Gao et al., 2018; Yang and Yu, 2018; Zhang et al., 2018a), animals (Liu et al., 2020), plants (Liu et al., 2019a; Wang et al., 2020a), and the environment (Mahnert et al., 2019; Zheng et al., 2019) have gradually become clearer in recent years. These findings have completely changed our understanding of the microbiome. Several countries have launched successful international microbiome projects, such as the NIH Human Microbiome Project (HMP) (Turnbaugh et al., 2007), the Metagenomics of the Human Intestinal Tract (MetaHIT) (Li et al., 2014), the integrative HMP (iHMP) (Proctor et al., 2019), and the Chinese Academy of Sciences Initiative of Microbiome (CAS-CMI) (Shi et al., 2019b). These projects have made remarkable achievements, which have pushed microbiome research into a golden era.
The framework for amplicon and metagenomic analysis was established in the last decade (Caporaso et al., 2010; Qin et al., 2010). However, microbiome analysis methods and standards have been evolving rapidly over the past few years (Knight et al., 2018). For example, there was a proposal to replace operational taxonomic units (OTUs) with amplicon sequence variants (ASVs) in marker gene-based amplicon data analysis (Callahan et al., 2016). QIIME 2, a next-generation microbiome analysis pipeline designed as a reproducible, interactive, efficient, community-supported platform, was recently published (Bolyen et al., 2019). In addition, new methods have recently been proposed for taxonomic classification (Ye et al., 2019), machine learning (Galkin et al., 2018), and multi-omics integrated analysis (Pedersen et al., 2018).
The development of HTS and analysis methods has provided new insights into the structures and functions of the microbiome (Jiang et al., 2019; Ning and Tong, 2019). However, these new developments have made it challenging for researchers, especially those without a bioinformatics background, to choose suitable software and pipelines. In this review, we discuss the widely used software packages for microbiome analyses, summarize their advantages and limitations, and provide sample code and suggestions for selecting and using these tools.
HTS methods of microbiome analysis
The first step in microbiome research is to understand the advantages and limitations of specific HTS methods. These methods are primarily used for three types of analysis: microbe-, DNA-, and mRNA-level analyses (Fig. 1A). The appropriate method(s) should be selected based on sample types and research goals.
Culturome is a high-throughput method for culturing and identifying microbes at the microbe-level (Fig. 1A). The microbial isolates are obtained as follows. First, the samples are crushed, empirically diluted in liquid medium, and distributed in 96-well microtiter plates or Petri dishes. Second, the plates are cultured for 20 days at room temperature. Third, the microbes in each well are subjected to amplicon sequencing, and wells with pure, non-redundant colonies are selected as candidates. Fourth, the candidates are purified and subjected to 16S rDNA full-length Sanger sequencing. Finally, the newly characterized pure isolates are preserved (Zhang et al., 2019). Culturome is the most effective method for obtaining bacterial stocks, but it is expensive and labor intensive (Fig. 1B). This method has been used for microbiome analysis in humans (Goodman et al., 2011; Zou et al., 2019), mouse (Liu et al., 2020), marine sediment (Mu et al., 2018), Arabidopsis thaliana (Bai et al., 2015), and rice (Zhang et al., 2019). These studies not only expanded the catalog of taxonomic and functional databases for metagenomic analyses, but also provided bacterial stocks for experimental verification. For further information, please see (Lagier et al., 2018; Liu et al., 2019a).
DNA is easy to extract, preserve, and sequence, which has allowed researchers to develop various HTS methods (Fig. 1A). The commonly used HTS methods for microbiome research are amplicon and metagenomic sequencing (Fig. 1B). Amplicon sequencing, the most widely used HTS method for microbiome analysis, can be applied to almost all sample types. The major marker genes used in amplicon sequencing include 16S ribosomal DNA (rDNA) for prokaryotes and 18S rDNA and internal transcribed spacers (ITS) for eukaryotes. 16S rDNA amplicon sequencing is the most commonly used method, but there is currently a confusing array of available primers. A good method for selecting primers is to evaluate their specificity and overall coverage using real samples or electronic PCR based on the SILVA database (Klindworth et al., 2012) and on host factors including the presence of chloroplasts, mitochondria, ribosomes, and other potential sources of non-specific amplification. Alternatively, researchers can refer to the primers used in published studies similar to their own, which saves time in method optimization and facilitates comparison of results among studies. Two-step PCR is typically used for amplification and to add barcodes and adaptors to each sample during library preparation (de Muinck et al., 2017). Sample sequencing is often performed on the Illumina MiSeq, HiSeq 2500, or NovaSeq 6000 platform in paired-end 250 bases (PE250) mode, which generates 50,000–100,000 reads per sample. Amplicon sequencing can be applied to low-biomass specimens or samples contaminated by host DNA. However, this technique can only reach genus-level resolution. Moreover, it is sensitive to the specific primers and number of PCR cycles chosen, which may lead to some false-positive or false-negative results in downstream analyses (Fig. 1B).
Metagenomic sequencing provides more information than amplicon sequencing, but it is more expensive. For ‘pure’ samples such as human feces, the accepted amount of sequencing data for each sample ranges from 6 to 9 gigabytes (GB) in a metagenomic project. The corresponding price for library construction and sequencing ranges from $100 to $300. For samples containing complex microbiota or contaminated with host-derived DNA, the required sequencing output ranges from 30 to 300 GB per sample (Xu et al., 2018). In brief, 16S rDNA amplicon sequencing could be used to study bacteria and/or archaea composition. Metagenomic sequencing is advisable for further analysis if higher taxonomic resolution and functional information are required (Arumugam et al., 2011; Smits et al., 2017). Of course, metagenomic sequencing could be used directly in studies with smaller sample sizes, assuming sufficient project funding is available (Carrión et al., 2019; Fresia et al., 2019).
Metatranscriptomic sequencing can profile mRNAs in a microbial community, quantify gene expression levels, and provide a snapshot for functional exploration of a microbial community in situ (Turner et al., 2013; Salazar et al., 2019). It is worth noting that host RNA and other rRNAs should be removed in order to obtain transcriptional information of microbiota (Fig. 1B).
Since viruses have either DNA or RNA as their genetic material, metavirome research technically involves a combination of metagenome and metatranscriptome analyses (Fig. 1A and 1B). Due to the low biomass of viruses in a sample, virus enrichment (Metsky et al., 2019) or the removal of host DNA (Charalampous et al., 2019) are essential steps for obtaining sufficient quantities of viral DNA or RNA for analysis (Fig. 1B).
The selection of sequencing methods depends on the scientific questions and sample types. The integration of different methods is advisable, as multi-omics provides insights into both the taxonomy and function of the microbiome. In practice, most researchers select only one or two HTS methods for analysis due to time and cost limitations. Although amplicon sequencing can provide only the taxonomic composition of microbiota, it is cost effective ($20–50 per sample) and can be applied to large-scale research. In addition, the amount of data generated from amplicon sequencing is relatively small, and the analysis is quick and easy to perform. For example, data analysis of 100 amplicon samples could be completed within a day using an ordinary laptop computer. Thus, amplicon sequencing is often used in pioneering research. In contrast to amplicon sequencing, metagenomic sequencing not only extends taxonomic resolution to the species- or strain-level but also provides potential functional information. Metagenomic sequencing also makes it possible to assemble microbial genomes from short reads. However, it does not perform well for low-biomass samples or those severely contaminated by the host genome (Fig. 1B).
Analysis pipelines
“Analysis pipeline” refers to a particular program or script that combines several or even dozens of software programs organically in a certain order to complete a complex analysis task. As of January 23, 2020, the words “amplicon” and “metagenome” were mentioned more than 200,000 and 40,000 times in Google Scholar, respectively. Due to their wide usage, we will discuss the current best-practice pipelines for amplicon and metagenomic analysis. Researchers should get acquainted with the Shell environment and R language, which we discussed in our previous review (Liu et al., 2019b).
Amplicon analysis
The first stage of amplicon analysis is to convert raw reads (typically in fastq format) into a feature table (Fig. 2A). The raw reads are usually in paired-end 250 bases (PE250) mode and generated from the Illumina platforms. Other platforms, including Ion Torrent, PacBio, and Nanopore, are not discussed in this review and may not be suitable for the analysis pipelines discussed below. First, raw amplicon paired-end reads are grouped based on their barcode sequences (demultiplexing). Then the paired reads are merged to obtain amplicon sequences, and barcodes and primers are removed. A quality-control step is normally needed to remove low-quality amplicon sequences. All of these steps can be completed using USEARCH (Edgar, 2010) or QIIME (Caporaso et al., 2010). Alternatively, clean amplicon data supplied by sequencing service providers can be used for subsequent analysis (Fig. 2A).
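The demultiplexing step can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that every read starts with a fixed-length barcode that must match exactly; the function name, barcodes, and reads are hypothetical, and real tools such as USEARCH or QIIME also handle quality scores, paired reads, and mismatches.

```python
from collections import defaultdict

def demultiplex(reads, barcode_to_sample):
    """Group reads by their leading barcode and strip the barcode.

    Assumption: all barcodes have the same length and must match exactly.
    """
    barcode_len = len(next(iter(barcode_to_sample)))
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode)
        if sample is not None:  # reads with unknown barcodes are discarded
            by_sample[sample].append(insert)
    return dict(by_sample)

# Toy input: four reads, two known barcodes (all values are made up).
reads = ["ACGTTTGACC", "TGCATTGACC", "ACGTGGGACC", "NNNNTTGACC"]
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
print(demultiplex(reads, barcodes))
```

The last read carries an unrecognized barcode and is therefore dropped rather than assigned to a sample.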
Picking the representative sequences as proxies of a species is a key step in amplicon analysis. Two major approaches for representative sequence selection are clustering to OTUs and denoising to ASVs. The UPARSE algorithm clusters sequences with 97% similarity into OTUs (Edgar, 2013). However, this method may fail to detect subtle differences among species or strains. DADA2 is a recently developed denoising algorithm that outputs ASVs as more exact representative sequences (Callahan et al., 2016). Denoising is available through the denoise-paired and denoise-single methods of DADA2 and the denoise-16S method of Deblur in QIIME 2 (Bolyen et al., 2019), and through the -unoise3 command in USEARCH (Edgar and Flyvbjerg, 2015). Finally, a feature table (OTU/ASV table) can be obtained by quantifying the frequency of the feature sequences in each sample. Simultaneously, the feature sequences can be assigned taxonomy, typically at the kingdom, phylum, class, order, family, genus, and species levels, providing a dimensionality reduction perspective on the microbiota.
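To make the idea of a feature table concrete, the sketch below counts how often each feature occurs in each sample. It assumes that every read has already been assigned to a feature (OTU or ASV); the function name and the feature and sample identifiers are hypothetical.

```python
from collections import Counter

def build_feature_table(sample_to_features):
    """Return (samples, table), where table maps each feature to its
    per-sample counts in the order given by `samples`."""
    samples = sorted(sample_to_features)
    features = sorted({f for feats in sample_to_features.values() for f in feats})
    counts = {s: Counter(sample_to_features[s]) for s in samples}
    # Counter returns 0 for features that are absent from a sample.
    table = {f: [counts[s][f] for s in samples] for f in features}
    return samples, table

# Toy assignments: one feature label per read (made-up data).
assignments = {
    "sample_A": ["ASV_1", "ASV_1", "ASV_2"],
    "sample_B": ["ASV_2", "ASV_3", "ASV_3", "ASV_3"],
}
samples, table = build_feature_table(assignments)
for feature, row in table.items():
    print(feature, dict(zip(samples, row)))
```

Rows are features and columns are samples, which is the layout most downstream tools expect.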
In general, 16S rDNA amplicon sequencing can only be used to obtain information about taxonomic composition. However, many available software packages have been developed to predict potential functional information. The principle behind this prediction is to link the 16S rDNA sequences or taxonomy information with functional descriptions in literature. PICRUSt (Langille et al., 2013), which is based on the OTU table of the Greengenes database (McDonald et al., 2011), could be used to predict the metagenomic functional composition (Zheng et al., 2019) of Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (Kanehisa and Goto, 2000). The newly developed PICRUSt2 software package (https://github.com/picrust/picrust2) can directly predict metagenomic functions based on an arbitrary OTU/ASV table. The R package Tax4Fun (Asshauer et al., 2015) can predict KEGG functional capabilities of microbiota based on the SILVA database (Quast et al., 2013). The functional annotation of prokaryotic taxa (FAPROTAX) pipeline performs functional annotation based on published metabolic and ecological functions such as nitrate respiration, iron respiration, plant pathogen, and animal parasites or symbionts, making it useful for environmental (Louca et al., 2016), agricultural (Zhang et al., 2019), and animal (Ross et al., 2018) microbiome research. BugBase is an extended database of Greengenes used to predict phenotypes such as oxygen tolerance, Gram staining, and pathogenic potential (Ward et al., 2017); this database is mainly used in medical research (Mahnert et al., 2019).
Metagenomic analysis
Compared to amplicon sequencing, shotgun metagenomic sequencing can provide functional gene profiles directly and reach a much higher resolution of taxonomic annotation. However, because of the large amount of data and the fact that most software is available only for Linux systems, substantial computing resources are needed to perform the analysis. To facilitate software installation and maintenance, we recommend using the package manager Conda with the BioConda channel (Grüning et al., 2018) to deploy metagenomic analysis pipelines. Since metagenomic analysis is computationally intensive, it is better to run multiple tasks/samples in parallel, which requires software such as GNU Parallel for queue management (Tange, 2018).
The Illumina HiSeqX/NovaSeq system often produces PE150 reads for metagenomic sequencing, whereas reads generated by BGI-Seq500 are in PE100 mode. The first crucial step in metagenomic analysis is quality control and the removal of host contamination from raw reads, which requires the KneadData pipeline (https://bitbucket.org/biobakery/kneaddata) or a combination of Trimmomatic (Bolger et al., 2014) and Bowtie 2 (Langmead and Salzberg, 2012). Trimmomatic is a flexible quality-control software package for Illumina sequencing data that can be used to trim low-quality sequences, library primers and adapters. Reads mapped to host genomes using Bowtie 2 are treated as contaminated reads and filtered out. KneadData is an integrated pipeline, including Trimmomatic, Bowtie 2, and related scripts that can be used for quality control, to remove host-derived reads, and to output clean reads (Fig. 2B).
The main step in metagenomic analysis is to convert clean data into taxonomic and functional tables using reads-based and/or assembly-based methods. The reads-based methods align clean reads to curated databases and output feature tables (Fig. 2B). MetaPhlAn2 is a commonly used taxonomic profiling tool that aligns metagenome reads to a pre-defined marker-gene database to perform taxonomic classification (Truong et al., 2015). Kraken 2 performs exact k-mer matching to sequences within the NCBI non-redundant database and uses lowest common ancestor (LCA) algorithms to perform taxonomic classification (Wood et al., 2019). For a review about benchmarking 20 tools of taxonomic classification, please see Ye et al. (2019). HUMAnN2 (Franzosa et al., 2018), the widely used functional profiling software, can also be used to explore within- and between-sample contributional diversity (species’ contributions to a specific function). MEGAN (Huson et al., 2016) is a cross-platform graphical user interface (GUI) software that performs taxonomic and functional analyses (Table 1). In addition, various metagenomic gene catalogs are available, including catalogs curated from the human gut (Li et al., 2014; Pasolli et al., 2019; Tierney et al., 2019), the mouse gut (Xiao et al., 2015), the chicken gut (Huang et al., 2018), the cow rumen (Stewart et al., 2018; Stewart et al., 2019), the ocean (Salazar et al., 2019), and the citrus rhizosphere (Xu et al., 2018). These customized databases can be used for taxonomic and functional annotation in the appropriate field of study, allowing efficient, precise, rapid analysis.
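The lowest common ancestor (LCA) idea mentioned above can be sketched in a few lines of Python. This is a simplified illustration only, not the implementation used by Kraken 2: it assumes each database hit is given as a root-to-leaf list of taxon names, and the function name and lineages are chosen for the example.

```python
def lowest_common_ancestor(lineages):
    """Return the longest prefix shared by all root-to-leaf lineages."""
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):  # compare the lineages rank by rank
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared

# Three hypothetical hits for the same read.
hits = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Escherichia"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Salmonella"],
    ["Bacteria", "Proteobacteria", "Alphaproteobacteria", "Rhizobium"],
]
print(lowest_common_ancestor(hits))  # ['Bacteria', 'Proteobacteria']
```

When hits disagree below a certain rank, the read is assigned to the deepest rank on which all hits agree, which trades resolution for fewer false assignments.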
Table 1.
Name | Link | Description and advantages | Reference |
---|---|---|---|
QIIME | http://qiime.org | The most highly cited and comprehensive amplicon analysis pipeline, providing hundreds of scripts for analyzing various data types and visualizations | (Caporaso et al., 2010) |
QIIME 2 | | This next-generation amplicon pipeline provides integrated command lines and GUI, and supports reproducible analysis and big data. Provides interactive visualization and Chinese tutorial documents and videos | (Bolyen et al., 2019) |
USEARCH | | Alignment tool includes more than 200 subcommands for amplicon analysis with a small size (1 Mb), cross-platform, high-speed calculation, and free 32-bit version. The 64-bit version is commercial ($1485) | (Edgar, 2010) |
VSEARCH | https://github.com/torognes/vsearch | A free USEARCH-like software tool. We recommend using it alone or in addition to USEARCH. Available as a plugin in QIIME 2 | (Rognes et al., 2016) |
Trimmomatic | http://www.usadellab.org/cms/index.php?page=trimmomatic | Java based software for quality control of metagenomic raw reads | (Bolger et al., 2014) |
Bowtie 2 | http://bowtie-bio.sourceforge.net/bowtie2 | Rapid alignment tool used to remove host contamination or for quantification | (Langmead and Salzberg, 2012) |
MetaPhlAn2 | https://bitbucket.org/biobakery/metaphlan2 | Taxonomic profiling tool with a marker gene database from more than 10,000 species. The output is relative abundance of strains | (Truong et al., 2015) |
Kraken 2 | https://ccb.jhu.edu/software/kraken2 | A taxonomic classification tool that uses exact k-mer matches to the NCBI database, high accuracy and rapid classification, and outputs reads counts for each species | (Wood et al., 2019) |
HUMAnN2 | https://bitbucket.org/biobakery/humann2 | Based on the UniRef protein database, calculates gene family abundance, pathway coverage, and pathway abundance from metagenomic or metatranscriptomic data. Provides species’ contributions to a specific function | (Franzosa et al., 2018) |
MEGAN | | A GUI, cross-platform software for taxonomic and functional analysis of metagenomic data. Supports many types of visualizations with metadata, including scatter plot, word clouds, Voronoi tree maps, clustering, and networks | (Huson et al., 2016) |
MEGAHIT | https://github.com/voutcn/megahit | Ultra-fast and memory-efficient metagenomic assembler | (Li et al., 2015) |
metaSPAdes | http://cab.spbu.ru/software/spades | High-quality metagenomic assembler but time-consuming and large memory requirement | (Nurk et al., 2017) |
MetaQUAST | http://quast.sourceforge.net/metaquast | Evaluates the quality of metagenomic assemblies, including N50 and misassemblies, and outputs PDF and interactive HTML reports | (Mikheenko et al., 2016) |
MetaGeneMark | http://exon.gatech.edu/GeneMark/ | Gene prediction in bacteria, archaea, metagenome and metatranscriptome. Supports Linux and Mac OS X systems. Provides webserver for online analysis | (Zhu et al., 2010) |
Prokka | http://www.vicbioinformatics.com/software.prokka.shtml | Provides rapid prokaryotic genome annotation, calls metaProdigal (Hyatt et al., 2012) for metagenomic gene prediction. Outputs nucleotide sequences, protein sequences, and annotation files of genes | (Seemann, 2014) |
CD-HIT | http://weizhongli-lab.org/cd-hit | Used to construct non-redundant gene catalogs | (Fu et al., 2012) |
Salmon | https://combine-lab.github.io/salmon | Provides ultra-fast quantification of reads counts of genes using a k-mer-based method | (Patro et al., 2017) |
metaWRAP | https://github.com/bxlab/metaWRAP | Binning pipeline includes 140 tools and supports conda install, default binning by MetaBAT, MaxBin, and CONCOCT. Provides refinement, quantification, taxonomic classification and visualization of bins | (Uritskiy et al., 2018) |
DAS Tool | https://github.com/cmks/DAS_Tool | Binning pipeline that integrates five binning software packages and performs refinement | (Sieber et al., 2018) |
Assembly-based methods assemble clean reads into contigs using tools such as MEGAHIT or metaSPAdes (Fig. 2B). MEGAHIT is used to assemble large, complex metagenome datasets quickly using little computer memory (Li et al., 2015), while metaSPAdes can generate longer contigs but requires more computational resources (Nurk et al., 2017). Genes present in assembled contigs are then identified using MetaGeneMark (Zhu et al., 2010) or Prokka (Seemann, 2014). Redundant genes from separately assembled contigs must be removed using tools such as CD-HIT (Fu et al., 2012). Finally, a gene abundance table can be generated using alignment-based tools such as Bowtie 2 or alignment-free methods such as Salmon (Patro et al., 2017). Millions of genes are normally present in a metagenomic dataset. These genes must be combined into functional annotations, such as KEGG Orthology (KO), modules and pathways, representing a form of dimensionality reduction (Kanehisa et al., 2016).
In addition, metagenomic data can be used to mine gene clusters or to assemble draft microbial genomes. The antiSMASH database is used to identify, annotate, and visualize gene clusters involved in secondary metabolite biosynthesis (Blin et al., 2018). Binning is a method that can be used to recover partial or complete bacterial genomes in metagenomic data. Available binning tools include CONCOCT (Alneberg et al., 2014), MaxBin 2 (Wu et al., 2015), and MetaBAT2 (Kang et al., 2015). Binning tools cluster contigs into different bins (draft genomes) based on tetra-nucleotide frequency and contig abundance. Reassembly is performed to obtain better bins. We recommend using a binning pipeline such as metaWRAP (Uritskiy et al., 2018) or DAS Tool (Sieber et al., 2018), which integrate several binning software packages to obtain refined binning results and more complete genomes with less contamination. These pipelines also supply useful scripts for evaluation and visualization. For a more comprehensive review on metagenomic experiments and analysis, we recommend Quince et al. (2017).
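Collapsing a gene abundance table into functional categories amounts to summing the abundances of all genes that share an annotation. The sketch below assumes a simple one-to-one mapping from gene to KO; the function name, gene names, abundances, and the "unannotated" label are hypothetical, and the KO identifiers serve only as placeholder labels.

```python
from collections import defaultdict

def sum_by_annotation(gene_abundance, gene_to_ko):
    """Sum gene abundances per KO; genes without a KO are pooled."""
    ko_abundance = defaultdict(float)
    for gene, abundance in gene_abundance.items():
        ko = gene_to_ko.get(gene, "unannotated")
        ko_abundance[ko] += abundance
    return dict(ko_abundance)

# Toy data: four genes, three of which carry a KO annotation.
gene_abundance = {"gene_1": 10.0, "gene_2": 5.0, "gene_3": 2.5, "gene_4": 1.0}
gene_to_ko = {"gene_1": "K00001", "gene_2": "K00001", "gene_3": "K00002"}
print(sum_by_annotation(gene_abundance, gene_to_ko))
```

In a real dataset a gene may map to several KOs or to none, so the choice of how to split or pool such genes should be reported along with the results.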
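As an illustration of the composition signal used in binning, the following sketch computes the tetranucleotide frequency vector of a single contig. It is a simplified assumption-laden example: the function name is hypothetical, windows containing ambiguous bases are ignored, and the reverse-complement merging, abundance information, and clustering steps used by actual binning tools are omitted.

```python
from collections import Counter
from itertools import product

def tetranucleotide_frequencies(sequence):
    """Return the relative frequency of each of the 256 possible 4-mers."""
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    seq = sequence.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Only windows made of A, C, G, and T contribute to the total.
    total = sum(counts[k] for k in kmers)
    return {k: (counts[k] / total if total else 0.0) for k in kmers}

freqs = tetranucleotide_frequencies("ACGTACGT")
print(len(freqs), freqs["ACGT"])
```

Contigs from the same genome tend to have similar frequency vectors, so distances between these vectors, combined with contig abundance across samples, can be used to group contigs into bins.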
Statistical analysis and visualization
The most important output files from amplicon and metagenomic analysis pipelines are taxonomic and functional tables (Figs. 2 and 3). The scientific questions that researchers could answer using these tables include the following: Which microbes are present in the microbiota? Do different experimental groups show significant differences in alpha- and beta-diversity? Which species, genes, or functional pathways are biomarkers of each group? To answer these questions, methods are needed for both overall and detailed statistical analysis and visualization. Overall visualization can be used to explore differences in alpha/beta-diversity and taxonomic composition in a feature table. Detailed analysis could involve identifying biomarkers via comparison, correlation analysis, network analysis, and machine learning (Fig. 3). We will discuss these methods below and provide examples and references to facilitate such studies (Fig. 3 and Table 2).
Table 2.
Method | Scientific question | Visualization | Description and example reference |
---|---|---|---|
Alpha diversity | Within-sample diversity | Boxplot | Distribution (Edwards et al., 2015) or significant difference (Zhang et al., 2019) of alpha diversity among groups (Fig. 3A) |
Alpha diversity | Within-sample diversity | Rarefaction curve | Sample diversity changes with sequencing depth or evaluation of sequencing saturation (Beckers et al., 2017) |
Alpha diversity | Within-sample diversity | Venn diagram | Common or unique taxa (Ren et al., 2019) |
Beta diversity | Distance among samples or groups | Unconstrained PCoA scatter plot | Major differences of samples showing group differences (Fig. 3B) or gradient changes with time (Zhang et al., 2018b) |
Beta diversity | Distance among samples or groups | Constrained PCoA scatter plot | Major differences among groups (Zgadzaj et al., 2016; Huang et al., 2019) |
Beta diversity | Distance among samples or groups | Dendrogram | Hierarchical clustering of samples (Chen et al., 2019) |
Taxonomic composition | Relative abundance of features | Stacked bar plot | Taxonomic composition of each sample (Beckers et al., 2017) or group (Jin et al., 2017) (Fig. 3C) |
Taxonomic composition | Relative abundance of features | Flow or alluvial diagram | Relative abundance (RA) of taxonomic changes among seasons (Smits et al., 2017) or time-series (Zhang et al., 2018b) |
Taxonomic composition | Relative abundance of features | Sankey diagram | A variety of Venn diagrams showing changes in RA and common or unique features among groups (Smits et al., 2017) |
Difference comparison | Significantly different biomarkers between groups | Volcano plot | A variety of scatter plots showing P-value, RA, fold change, and number of differences (Shi et al., 2019a) |
Difference comparison | Significantly different biomarkers between groups | Manhattan plot | A variety of scatter plots showing P-values, taxonomy, and highlighting significantly different biomarkers (Zgadzaj et al., 2016) (Fig. 3D) |
Difference comparison | Significantly different biomarkers between groups | Extended error bar plot | Bar plot of RA combined with difference and confidence intervals (Parks et al., 2014) |
Correlation analysis | Correlation between features and sample metadata | Scatter plot with linear fitting | Shows changes in features with time (Metcalf et al., 2016) or relationships with other numeric metadata (Fig. 3E) |
Correlation analysis | Correlation between features and sample metadata | Corrplot | Correlation coefficient or distance triangular matrix visualized by color and/or shape (Zhang et al., 2018b) |
Correlation analysis | Correlation between features and sample metadata | Heatmap | RA of features that change with time (Subramanian et al., 2014) |
Network analysis | Global view correlation of features | Colored based on taxonomy or modules | Finding correlation patterns of features based on taxonomy (Fig. 3F) and/or modules (Jiao et al., 2016) |
Network analysis | Global view correlation of features | Colors highlight important features | Highlighting important features and showing their positions and connections (Wang et al., 2018b) |
Machine learning | Classification groups or regression analysis for numeric metadata prediction | Heatmap | Colored block showing classification results (Fig. 3G) (Wilck et al., 2017) or feature patterns in a time series (Subramanian et al., 2014). |
Machine learning | Classification groups or regression analysis for numeric metadata prediction | Bar plot | Feature importance, RA (Zhang et al., 2019), and increase in mean squared error (Subramanian et al., 2014). |
Treemap | Phylogenetic tree or taxonomy hierarchy | Phylogenetic tree or cladogram | Phylogenetic tree (Fig. 3H) shows relationship of OTUs or species (Levy et al., 2018). Taxonomic cladogram highlighting interesting biomarkers (Segata et al., 2011). |
Treemap | Phylogenetic tree or taxonomy hierarchy | Circular tree map | Shows features in a hierarchy color bubble (Carrión et al., 2019) |
Alpha diversity evaluates the diversity within a sample, including richness and evenness measurements. Several software packages can be used to calculate alpha diversity, including QIIME, the R package vegan (Oksanen et al., 2007), and USEARCH. The alpha diversity values of samples in each group could be visually compared using boxplots (Fig. 3A). The differences in alpha diversity among or between groups could be statistically evaluated using Analysis of Variance (ANOVA), Mann-Whitney U test, or Kruskal-Wallis test. It is important to note that P-values should be adjusted for multiple testing when a group is involved in several pairwise comparisons. Other visualization methods for alpha diversity indices are described in Table 2.
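Richness and evenness can be computed directly from the counts of one sample. The sketch below uses observed richness, the Shannon index, and Pielou's evenness as examples; the choice of these particular indices and the function names are assumptions made for illustration, and the packages named above provide these and many other indices.

```python
import math

def richness(counts):
    """Number of features observed at least once."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over observed features."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)

def pielou_evenness(counts):
    """Shannon index divided by its maximum, ln(richness)."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0

even_sample = [10, 10, 10, 10]   # four equally abundant features
skewed_sample = [37, 1, 1, 1]    # same richness, one dominant feature
for name, counts in [("even", even_sample), ("skewed", skewed_sample)]:
    print(name, richness(counts), round(shannon(counts), 3),
          round(pielou_evenness(counts), 3))
```

Both toy samples have the same richness, but the skewed sample has a lower Shannon index and evenness, which is why richness and evenness are reported separately.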
Beta diversity evaluates differences in the microbiome among samples and is normally combined with dimensional reduction methods such as principal coordinate analysis (PCoA), non-metric multidimensional scaling (NMDS), or constrained principal coordinate analysis (CPCoA) to obtain visual representations. These analyses can be implemented in the R vegan package and visualized in scatter plots (Fig. 3B and Table 2). The statistical differences between these beta-diversity indices can be computed using permutational multivariate analysis of variance (PERMANOVA) with the adonis() function in vegan (Oksanen et al., 2007).
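Ordination methods such as PCoA start from a matrix of pairwise distances between samples. The sketch below computes such a matrix using the Bray-Curtis dissimilarity; this metric is chosen here only as a common example and is an assumption, since the text above does not prescribe one. Function names and the sample counts are made up.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0

def distance_matrix(samples):
    """Return (names, matrix) with all pairwise Bray-Curtis dissimilarities."""
    names = sorted(samples)
    matrix = [[bray_curtis(samples[a], samples[b]) for b in names] for a in names]
    return names, matrix

# Toy feature counts for three samples (same feature order in each vector).
samples = {"S1": [6, 4, 0], "S2": [2, 8, 0], "S3": [0, 0, 10]}
names, matrix = distance_matrix(samples)
for name, row in zip(names, matrix):
    print(name, [round(d, 2) for d in row])
```

A value of 0 means two samples have identical counts and a value of 1 means they share no features; the resulting matrix is the input for ordination and for PERMANOVA.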
Taxonomic composition describes the microbiota that are present in a microbial community, which is often visualized using a stacked bar plot (Fig. 3C and Table 2). For simplicity, the microbiota is often shown at the phylum or genus level in the plot.
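The values behind a stacked bar plot can be obtained by collapsing feature counts to the chosen taxonomic level and converting them to percentages. The following sketch assumes each feature has a lineage stored as a list ordered from kingdom downwards; the function names, the "Unassigned" label, and the data are hypothetical.

```python
from collections import defaultdict

def collapse_to_rank(feature_counts, feature_taxonomy, rank_index):
    """Sum feature counts by the taxon found at `rank_index` of each lineage."""
    collapsed = defaultdict(int)
    for feature, count in feature_counts.items():
        lineage = feature_taxonomy.get(feature, [])
        taxon = lineage[rank_index] if len(lineage) > rank_index else "Unassigned"
        collapsed[taxon] += count
    return dict(collapsed)

def relative_abundance(counts):
    """Convert counts to percentages of the sample total."""
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items()} if total else {}

# Toy sample: lineage index 0 is kingdom, index 1 is phylum.
counts = {"ASV_1": 60, "ASV_2": 30, "ASV_3": 10}
taxonomy = {
    "ASV_1": ["Bacteria", "Proteobacteria"],
    "ASV_2": ["Bacteria", "Firmicutes"],
    "ASV_3": ["Bacteria"],  # not classified at the phylum level
}
phylum_counts = collapse_to_rank(counts, taxonomy, rank_index=1)
print(relative_abundance(phylum_counts))
```

Features that lack a classification at the requested level are pooled rather than dropped, so the percentages of each sample still sum to 100.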
Difference comparison is used to identify features (such as species, genes, or pathways) with significantly different abundances between groups using Welch’s t-test, Mann-Whitney U test, Kruskal-Wallis test, or tools such as ALDEx2, edgeR (Robinson et al., 2010), STAMP (Parks et al., 2014), or LEfSe (Segata et al., 2011). The results of difference comparison can be visualized using a volcano plot, Manhattan plot (Fig. 3D), or extended error bar plot (Table 2). It is important to note that this type of analysis is prone to false positives because relative abundances are interdependent: an increase in the relative abundance of some features necessarily decreases that of others, even if their absolute abundance is unchanged. Several methods have been developed to obtain taxonomic absolute abundance in samples, such as the integration of HTS and flow cytometric enumeration (Vandeputte et al., 2017), and the integration of HTS with spike-in plasmid and quantitative PCR (Tkacz et al., 2018; Guo et al., 2020; Wang et al., 2020b).
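The false-positive problem caused by relative abundance can be shown with a small worked example. In the sketch below, which uses invented absolute counts for two hypothetical taxa, only taxon_A changes between conditions, yet the relative abundance of taxon_B appears to halve.

```python
def to_relative(absolute_counts):
    """Convert absolute counts to fractions of the community total."""
    total = sum(absolute_counts.values())
    return {taxon: count / total for taxon, count in absolute_counts.items()}

# Invented absolute abundances: taxon_B is identical in both conditions.
control = {"taxon_A": 100, "taxon_B": 100}
treated = {"taxon_A": 300, "taxon_B": 100}

for name, community in [("control", control), ("treated", treated)]:
    print(name, to_relative(community))
```

A test applied to the relative values would report taxon_B as depleted in the treated condition, although its absolute abundance did not change; this is the motivation for the absolute quantification methods cited above.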
Table 3.
Resource | Links | Description |
---|---|---|
GSA | http://gsa.big.ac.cn | HTS data deposition and sharing. Fast data transfer, interfaces in both Chinese and English, automated submission, technical support via email or QQ group, and widely recognized by international journals |
Qiita | https://qiita.ucsd.edu | Platform for amplicon data deposition, analysis, and cross-study comparisons |
MGnify | https://www.ebi.ac.uk/metagenomics | Webserver for amplicon and metagenomic data deposition, sharing, analysis, and cross-study comparisons |
gcMeta | https://gcmeta.wdcm.org | Webserver for amplicon and metagenomic data analysis, deposition, and sharing |
R Markdown | https://rmarkdown.rstudio.com | Uses a productive notebook interface to weave together narrative text and code to produce an elegantly formatted report in HTML or PDF format. Is becoming increasingly popular in microbiome research |
R Graph Gallery | https://www.r-graph-gallery.com | R code for 42 chart types |
GitHub | https://github.com | Online platform for saving and sharing code with version control. Supports searching |
Correlation analysis is used to reveal the associations between taxa and sample metadata (Fig. 3E). For example, it is used to identify associations between taxa and environmental factors, such as pH, longitude and latitude, and clinical indices, or to identify key environmental factors that affect microbiota and dynamic taxa in a time series (Edwards et al., 2018).
Network analysis explores the co-occurrence of features from a holistic perspective (Fig. 3F). The properties of a correlation network might represent potential interactions between co-occurring taxa or functional pathways. Correlation coefficients and significant P-values could be computed using the cor.test() function in R or more robust tools that are suitable for compositional data such as the SparCC (sparse correlations for compositional data) package (Kurtz et al., 2015). Networks could also be visualized and analyzed using R library igraph (Csardi and Nepusz, 2006), Cytoscape (Saito et al., 2012), or Gephi (Bastian et al., 2009). There are several good examples of network analysis, such as studies exploring the distribution of phylum or modules (Fan et al., 2019) or showing trends at different time points (Wang et al., 2019).
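A correlation network is essentially a list of feature pairs whose correlation exceeds a chosen cutoff. The sketch below builds such an edge list with Spearman's rank correlation written in plain Python; it is a toy illustration under stated assumptions: the function names, the abundance values, and the 0.8 threshold are arbitrary, no P-values are computed, and the compositional bias that tools such as SparCC address is ignored.

```python
import math
from itertools import combinations

def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def correlation_edges(abundance, threshold=0.8):
    """Return (feature_a, feature_b, rho) for pairs with |rho| >= threshold."""
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        rho = spearman(abundance[a], abundance[b])
        if abs(rho) >= threshold:
            edges.append((a, b, rho))
    return edges

# Invented abundances of four taxa across five samples.
abundance = {
    "taxon_A": [1, 2, 3, 4, 5],
    "taxon_B": [2, 4, 6, 8, 10],
    "taxon_C": [5, 4, 3, 2, 1],
    "taxon_D": [3, 1, 4, 1, 5],
}
for a, b, rho in correlation_edges(abundance):
    print(a, b, round(rho, 2))
```

The resulting edge list can be exported as a table and loaded into igraph, Cytoscape, or Gephi for layout and module detection.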
Machine learning is a branch of artificial intelligence that learns from data, identifies patterns, and makes decisions (Fig. 3G). In microbiome research, machine learning is used for taxonomic classification, beta-diversity analysis, binning, and compositional analysis of particular features. Commonly used methods include random forest (Vangay et al., 2019; Qian et al., 2020), AdaBoost (Wilck et al., 2017), and deep learning (Galkin et al., 2018). These methods are used either to classify groups by selecting biomarkers or, in regression mode, to show experimental condition-dependent changes in biomarker abundance (Table 2).
Treemap is widely used for phylogenetic tree construction and for taxonomic annotation and visualization of the microbiome (Fig. 3H). Representative amplicon sequences are readily used for phylogenetic analysis. We recommend using IQ-TREE (Nguyen et al., 2014) to quickly build high-confidence phylogenetic trees from large datasets and iTOL (Letunic and Bork, 2019) for online visualization. Annotation files for the tree can easily be generated using the R script table2itol (https://github.com/mgoeker/table2itol). In addition, we recommend using GraPhlAn (Asnicar et al., 2015) to visualize the phylogenetic tree or hierarchical taxonomy as an attractive cladogram.
In addition, researchers may be interested in examining microbial origin to address issues such as the origin of gut microbiota and river pollution, as well as for forensic testing. FEAST (Shenhav et al., 2019) and SourceTracker (Knights et al., 2011) were designed to unravel the origins of microbial communities. If researchers would like to focus on the regulatory relationship between genetic information from the host and microorganisms (Wang et al., 2018a), genome-wide association analysis (GWAS) might be a good choice (Wang et al., 2016).
Reproducible analysis
Reproducible analysis requires that researchers submit their data and code along with their publications instead of merely describing their methods. Reproducibility is critical for microbiome analysis because it is impossible to reproduce results without raw data, detailed sample metadata, and analysis code. If readers can run the code, they will better understand what has been done in the analyses. We recommend that researchers share their sequencing data, metadata, analysis code, and detailed statistical reports using the following steps:
Upload and share raw data and metadata in a data center
Amplicon or metagenomic sequencing generates a large volume of raw data. Normally, raw data must be uploaded to data centers such as NCBI, EBI, and DDBJ during publication. In recent years, several repositories have also been established in China to provide data storage and sharing services. For example, the Genome Sequence Archive (GSA) established by the Beijing Institute of Genomics, Chinese Academy of Sciences (Wang et al., 2017; Members, 2019) has many advantages (Table 3). We recommend that researchers upload raw data to one of these repositories, which not only provides a backup but also meets the requirements for publication. Several journals, such as Microbiome, require that raw data be deposited in a repository before the manuscript is submitted.
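Before uploading, it can help to record a checksum for each raw data file so that the integrity of the transferred copy can be verified. The Python sketch below computes MD5 checksums with the standard library; the function names and the manifest layout are illustrative assumptions, and whether a given repository asks for MD5 values at submission should be checked against its own submission guide.

```python
import hashlib


def md5_checksum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_lines(paths):
    """Build 'checksum  filename' lines for a list of files (hypothetical layout)."""
    return [f"{md5_checksum(p)}  {p}" for p in paths]
```

Reading in chunks keeps memory use constant, which matters for sequencing files that are often several gigabytes in size.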
Share pipeline scripts with other researchers
Pipeline scripts can help reviewers and readers evaluate the reproducibility of experimental results. We provide sample pipeline scripts for amplicon and metagenome analyses at https://github.com/YongxinLiu/Liu2020ProteinCell. The running environment and software versions used in the analysis should also be provided to help ensure reproducibility. If Conda is used to deploy software, the command “conda env export -n environment_name > environment.yaml” generates a file listing the software used and the corresponding versions, which can be used to recreate the environment. For users who are not familiar with the command line, webservers such as Qiita (Gonzalez et al., 2018), MGnify (Mitchell et al., 2020), and gcMeta (Shi et al., 2019b) can be used to perform the analysis. However, webservers are less flexible than the command line because they provide fewer adjustable steps and parameters.
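For analyses run in Python, a complementary way to record the running environment is to save the interpreter and package versions alongside the results. The sketch below uses only the standard library; the function name capture_environment and the package names in the usage example are assumptions made for illustration, and the approach supplements rather than replaces a Conda environment file.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment(packages):
    """Collect interpreter, operating system, and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": versions,
    }


if __name__ == "__main__":
    # Placeholder package names; list whatever the analysis actually imports.
    record = capture_environment(["numpy", "pandas"])
    print(json.dumps(record, indent=2))
```

The printed JSON can be redirected to a file and committed together with the analysis scripts.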
Provide detailed statistical and visualization reports
The tools used for statistical analysis and visualization of a feature table include Excel, GraphPad, and SigmaPlot, but these are commercial software tools, and results produced with them are difficult to reproduce quickly. We recommend using tools such as R Markdown or Python notebooks to trace all analysis code and parameters and storing them in a version control system such as GitHub (Table 3). These tools are free, open-source, cross-platform, and easy to use. We recommend that researchers record all scripts and results of statistical analysis and visualization in R Markdown files. An R Markdown document is a fully reproducible report that includes code, tables, and figures in HTML or PDF format. This working mode would greatly improve the efficiency of microbiome analysis and make the analysis process transparent and easier to understand. For R visualization code, refer to the R Graph Gallery (Table 3). The input files (feature tables and metadata), analysis notebook (*.Rmd), and output results (figures, tables, and HTML reports) of the analysis can be uploaded to GitHub, which allows peers to repeat the analyses or reuse the code. ImageGP (http://www.ehbio.com/ImageGP) provides more than 20 statistical and visualization methods, making it a good choice for researchers without a background in R.
Notes and perspectives
It is worth noting that experimental operations have a far greater impact on the results of a study than the pipeline chosen for analysis (Sinha et al., 2017). It is advisable to record detailed experimental processes as metadata, including sampling method, time, location, operators, DNA extraction kit, batch, primers, and barcodes. The metadata can be used for downstream analyses and can help researchers determine whether these operational differences contribute to false-positive results (Costea et al., 2017). Some specific experimental steps can provide a unique perspective on microbiome analysis. For example, methods that remove host DNA can effectively increase the proportion of microbial DNA in plant endophyte samples (Carrión et al., 2019) and human respiratory infection samples (Charalampous et al., 2019). A large amount of relic DNA in soil can be physically removed with propidium monoazide (Carini et al., 2016). In addition, when using samples with low microbial biomass, researchers must be particularly careful to avoid false-positive results due to contamination (de Goffau et al., 2019). In these situations, DNA-free water should be used as a negative control. In human microbiome studies, the major differences in microbiome composition among individuals are due to factors such as diet, lifestyle, and drug use, whereas the heritability is less than 2% (Rothschild et al., 2018). For recommendations about the information that should be collected, please refer to the minimum information about a marker gene sequence (MIMARKS) and the minimum information about a metagenome sequence (Field et al., 2008; Yilmaz et al., 2011), the minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea (Bowers et al., 2017), and the minimum information about an uncultivated virus genome (Roux et al., 2019).
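The metadata fields listed above can be checked programmatically before analysis begins. The Python sketch below reads a tab-separated metadata table and reports samples with empty required fields. The column names are assumed spellings of the fields named in the text, and the function name find_missing is hypothetical; neither comes from a published standard.

```python
import csv
import io

# Assumed column names for the fields recommended in the text.
REQUIRED_FIELDS = [
    "sample_id", "sampling_method", "time", "location", "operator",
    "dna_extraction_kit", "batch", "primers", "barcode",
]


def find_missing(metadata_text):
    """Return {sample_id: [empty fields]} for a tab-separated metadata table."""
    reader = csv.DictReader(io.StringIO(metadata_text), delimiter="\t")
    header = reader.fieldnames or []
    absent_columns = [f for f in REQUIRED_FIELDS if f not in header]
    if absent_columns:
        raise ValueError(f"missing columns: {absent_columns}")
    problems = {}
    for row in reader:
        # Short rows yield None for absent cells, so treat None as empty.
        empty = [f for f in REQUIRED_FIELDS if not (row[f] or "").strip()]
        if empty:
            problems[row["sample_id"]] = empty
    return problems
```

A check of this kind can be run each time the metadata table is updated, so that gaps are found while the experimental records are still at hand.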
In the early stage of microbiome research, data-driven studies provided the basic components and conceptual framework of the microbiome. However, with the development of experimental tools, more hypothesis-driven studies are needed to dissect the causal relationships between the microbiome and host phenotypes.
Shotgun metagenomic sequencing can provide insights into microbial community structure at the strain level, but it is difficult to recover high-quality genomes (Bishara et al., 2018). Single-cell genome sequencing shows very promising applications in microbiome research (Xu and Zhao, 2018). Based on flow cytometry and single-cell sequencing, MetaSort can recover high-quality genomes from sorted sub-metagenomes (Ji et al., 2017). Recently developed third-generation sequencing techniques have been used for metagenome analysis, including Pacific Biosciences (PacBio) single-molecule real-time sequencing and the Oxford Nanopore Technologies sequencing platform (Bertrand et al., 2019; Stewart et al., 2019; Moss et al., 2020). With improvements in sequencing data quality and decreasing costs, these techniques will lead to a technological revolution in the field of microbiome sequencing and bring microbiome research into a new era.
Conclusion
In this review, we discussed methods for analyzing amplicon and metagenomic data at all stages, from the selection of sequencing methods, analysis software/pipelines, statistical analysis and visualization to the implementation of reproducible analysis. Other methods such as metatranscriptome, metaproteome, and metabolome analysis may provide a better perspective on the dynamics of the microbiome, but these methods have not been widely accepted due to their high cost and the complex experimental and analysis methods required. With the further development of these technologies in the future, a more comprehensive view of the microbiome could be obtained.
Acknowledgements
This work was supported by grants from the Strategic Priority Research Program of the Chinese Academy of Sciences (Precision Seed Design and Breeding, XDA24020104), the Key Research Program of Frontier Sciences of the Chinese Academy of Sciences (grant no. QYZDB-SSW-SMC021), and the National Natural Science Foundation of China (grant no. 31772400).
Abbreviations
ANOVA, analysis of variance; ASV, amplicon sequence variant; CAS-CMI, Chinese Academy of Sciences Initiative of Microbiome; CPCoA, constrained principal coordinate analysis; FAPROTAX, Functional Annotation of Prokaryotic Taxa; GB, gigabyte; GSA, Genome Sequence Archive; GUI, graphical user interface; GWAS, genome-wide association analysis; HMP, Human Microbiome Project; HTS, high-throughput sequencing; iHMP, integrative HMP; KEGG, Kyoto Encyclopedia of Genes and Genomes; KO, KEGG Ortholog; LCA, lowest common ancestor; MetaHIT, Metagenomics of the Human Intestinal Tract; NMDS, non-metric multidimensional scaling; OTU, operational taxonomic unit; PacBio, Pacific Biosciences; PERMANOVA, permutational multivariate analysis of variance; PE250, paired-end 250 bp; PCoA, principal coordinate analysis; RA, relative abundance; rDNA, ribosomal DNA.
Compliance with ethics guidelines
Yong-Xin Liu, Xubo Qian and Yang Bai contributed to writing the paper. Yuan Qin designed and drew the figures. Tong Chen tested all the software mentioned in this review and shared the code. All authors read, revised and approved this paper. Yong-Xin Liu, Yuan Qin, Tong Chen, Xubo Qian, Meiping Lu, Xiaoxuan Guo and Yang Bai declare that they have no conflict of interest. This article does not contain any studies with human or animal subjects performed by any of the authors.
Footnotes
Yong-Xin Liu, Yuan Qin and Tong Chen have contributed equally to this work
Contributor Information
Yong-Xin Liu, Email: yxliu@genetics.ac.cn.
Yang Bai, Email: ybai@genetics.ac.cn.
References
- Alneberg J, Bjarnason BS, de Bruijn I, Schirmer M, Quick J, Ijaz UZ, Lahti L, Loman NJ, Andersson AF, Quince C. Binning metagenomic contigs by coverage and composition. Nat Methods. 2014;11:1144–1146. doi: 10.1038/nmeth.3103.
- Arumugam M, Raes J, Pelletier E, Le Paslier D, Yamada T, Mende DR, Fernandes GR, Tap J, Bruls T, Batto JM, et al. Enterotypes of the human gut microbiome. Nature. 2011;473:174–180. doi: 10.1038/nature09944.
- Asnicar F, Weingart G, Tickle TL, Huttenhower C, Segata N. Compact graphical representation of phylogenetic data and metadata with GraPhlAn. PeerJ. 2015;3:e1029. doi: 10.7717/peerj.1029.
- Asshauer KP, Wemheuer B, Daniel R, Meinicke P. Tax4Fun: predicting functional profiles from metagenomic 16S rRNA data. Bioinformatics. 2015;31:2882–2884. doi: 10.1093/bioinformatics/btv287.
- Bai Y, Müller DB, Srinivas G, Garrido-Oter R, Potthoff E, Rott M, Dombrowski N, Münch PC, Spaepen S, Remus-Emsermann M, et al. Functional overlap of the Arabidopsis leaf and root microbiota. Nature. 2015;528:364–369. doi: 10.1038/nature16192.
- Bastian M, Heymann S, Jacomy M. Gephi: an open source software for exploring and manipulating networks. In: Third international AAAI conference on weblogs and social media. 2009.
- Beckers B, Op De Beeck M, Weyens N, Boerjan W, Vangronsveld J. Structural variability and niche differentiation in the rhizosphere and endosphere bacterial microbiome of field-grown poplar trees. Microbiome. 2017;5:25. doi: 10.1186/s40168-017-0241-2.
- Bertrand D, Shaw J, Kalathiyappan M, Ng AHQ, Kumar MS, Li C, Dvornicic M, Soldo JP, Koh JY, Tong C, et al. Hybrid metagenomic assembly enables high-resolution analysis of resistance determinants and mobile elements in human microbiomes. Nat Biotechnol. 2019;37:937–944. doi: 10.1038/s41587-019-0191-2.
- Bishara A, Moss EL, Kolmogorov M, Parada AE, Weng Z, Sidow A, Dekas AE, Batzoglou S, Bhatt AS. High-quality genome sequences of uncultured microbes by assembly of read clouds. Nat Biotechnol. 2018;36:1067–1075. doi: 10.1038/nbt.4266.
- Blin K, Weber T, Lee SY, Medema MH, Pascal Andreu V, de los Santos ELC, Del Carratore F. The antiSMASH database version 2: a comprehensive resource on secondary metabolite biosynthetic gene clusters. Nucleic Acids Res. 2018;47:D625–D630. doi: 10.1093/nar/gky1060.
- Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
- Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F, et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat Biotechnol. 2019;37:852–857. doi: 10.1038/s41587-019-0209-9.
- Bowers RM, Kyrpides NC, Stepanauskas R, Harmon-Smith M, Doud D, Reddy TBK, Schulz F, Jarett J, Rivers AR, Eloe-Fadrosh EA, et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat Biotechnol. 2017;35:725–731. doi: 10.1038/nbt.3893.
- Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP. DADA2: high-resolution sample inference from Illumina amplicon data. Nat Methods. 2016;13:581–583. doi: 10.1038/nmeth.3869.
- Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Peña AG, Goodrich JK, Gordon JI, et al. QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 2010;7:335–336. doi: 10.1038/nmeth.f.303.
- Carini P, Marsden PJ, Leff JW, Morgan EE, Strickland MS, Fierer N. Relic DNA is abundant in soil and obscures estimates of soil microbial diversity. Nat Microbiol. 2016;2:16242. doi: 10.1038/nmicrobiol.2016.242.
- Carrión VJ, Perez-Jaramillo J, Cordovez V, Tracanna V, de Hollander M, Ruiz-Buck D, Mendes LW, van Ijcken WFJ, Gomez-Exposito R, Elsayed SS, et al. Pathogen-induced activation of disease-suppressive functions in the endophytic root microbiome. Science. 2019;366:606–612. doi: 10.1126/science.aaw9285.
- Charalampous T, Kay GL, Richardson H, Aydin A, Baldan R, Jeanes C, Rae D, Grundy S, Turner DJ, Wain J, et al. Nanopore metagenomics enables rapid clinical diagnosis of bacterial lower respiratory infection. Nat Biotechnol. 2019;37:783–792. doi: 10.1038/s41587-019-0156-5.
- Chen Q, Jiang T, Liu Y-X, Liu H, Zhao T, Liu Z, Gan X, Hallab A, Wang X, He J, et al. Recently duplicated sesterterpene (C25) gene clusters in Arabidopsis thaliana modulate root microbiota. Sci China Life Sci. 2019;62:947–958. doi: 10.1007/s11427-019-9521-2.
- Costea PI, Zeller G, Sunagawa S, Pelletier E, Alberti A, Levenez F, Tramontano M, Driessen M, Hercog R, Jung F-E, et al. Towards standards for human fecal sample processing in metagenomic studies. Nat Biotechnol. 2017;35:1069–1076. doi: 10.1038/nbt.3960.
- Csardi G, Nepusz T. The igraph software package for complex network research. InterJ Complex Syst. 2006;1695:1–9.
- de Goffau MC, Lager S, Sovio U, Gaccioli F, Cook E, Peacock SJ, Parkhill J, Charnock-Jones DS, Smith GCS. Human placenta has no microbiome but can contain potential pathogens. Nature. 2019;572:329–334. doi: 10.1038/s41586-019-1451-5.
- de Muinck EJ, Trosvik P, Gilfillan GD, Hov JR, Sundaram AYM. A novel ultra high-throughput 16S rRNA gene amplicon sequencing library preparation method for the Illumina HiSeq platform. Microbiome. 2017;5:68. doi: 10.1186/s40168-017-0279-1.
- Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010;26:2460–2461. doi: 10.1093/bioinformatics/btq461.
- Edgar RC. UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nat Methods. 2013;10:996–998. doi: 10.1038/nmeth.2604.
- Edgar RC, Flyvbjerg H. Error filtering, pair assembly and error correction for next-generation sequencing reads. Bioinformatics. 2015;31:3476–3482. doi: 10.1093/bioinformatics/btv401.
- Edwards J, Johnson C, Santos-Medellín C, Lurie E, Podishetty NK, Bhatnagar S, Eisen JA, Sundaresan V. Structure, variation, and assembly of the root-associated microbiomes of rice. Proc Natl Acad Sci USA. 2015;112:E911–E920. doi: 10.1073/pnas.1414592112.
- Edwards JA, Santos-Medellín CM, Liechty ZS, Nguyen B, Lurie E, Eason S, Phillips G, Sundaresan V. Compositional shifts in root-associated bacterial and archaeal microbiota track the plant life cycle in field-grown rice. PLoS Biol. 2018;16:e2003862. doi: 10.1371/journal.pbio.2003862.
- Fan K, Delgado-Baquerizo M, Guo X, Wang D, Wu Y, Zhu M, Yu W, Yao H, Zhu Y-g, Chu H. Suppressed N fixation and diazotrophs after four decades of fertilization. Microbiome. 2019;7:143. doi: 10.1186/s40168-019-0757-8.
- Field D, Garrity G, Gray T, Morrison N, Selengut J, Sterk P, Tatusova T, Thomson N, Allen MJ, Angiuoli SV, et al. The minimum information about a genome sequence (MIGS) specification. Nat Biotechnol. 2008;26:541–547. doi: 10.1038/nbt1360.
- Franzosa EA, McIver LJ, Rahnavard G, Thompson LR, Schirmer M, Weingart G, Lipson KS, Knight R, Caporaso JG, Segata N, et al. Species-level functional profiling of metagenomes and metatranscriptomes. Nat Methods. 2018;15:962–968. doi: 10.1038/s41592-018-0176-y.
- Fresia P, Antelo V, Salazar C, Giménez M, D’Alessandro B, Afshinnekoo E, Mason C, Gonnet GH, Iraola G. Urban metagenomics uncover antibiotic resistance reservoirs in coastal beach and sewage waters. Microbiome. 2019;7:35. doi: 10.1186/s40168-019-0648-z.
- Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28:3150–3152. doi: 10.1093/bioinformatics/bts565.
- Galkin F, Aliper A, Putin E, Kuznetsov I, Gladyshev VN, Zhavoronkov A. Human microbiome aging clocks based on deep learning and tandem of permutation feature importance and accumulated local effects. bioRxiv. 2018:507780.
- Gao L, Xu T, Huang G, Jiang S, Gu Y, Chen F. Oral microbiomes: more and more importance in oral cavity and whole body. Protein Cell. 2018;9:488–500. doi: 10.1007/s13238-018-0548-1.
- Gonzalez A, Navas-Molina JA, Kosciolek T, McDonald D, Vázquez-Baeza Y, Ackermann G, DeReus J, Janssen S, Swafford AD, Orchanian SB, et al. Qiita: rapid, web-enabled microbiome meta-analysis. Nat Methods. 2018;15:796–798. doi: 10.1038/s41592-018-0141-9.
- Goodman AL, Kallstrom G, Faith JJ, Reyes A, Moore A, Dantas G, Gordon JI. Extensive personal human gut microbiota culture collections characterized and manipulated in gnotobiotic mice. Proc Natl Acad Sci USA. 2011;108:6252–6257. doi: 10.1073/pnas.1102938108.
- Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J, The Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018;15:475–476. doi: 10.1038/s41592-018-0046-7.
- Guo X, Zhang X, Qin Y, Liu Y-X, Zhang J, Zhang N, Wu K, Qu B, He Z, Wang X, et al. Host-associated quantitative abundance profiling reveals the microbial load variation of root microbiome. Plant Commun. 2020;1:100003. doi: 10.1016/j.xplc.2019.100003.
- Huang AC, Jiang T, Liu Y-X, Bai Y-C, Reed J, Qu B, Goossens A, Nützmann H-W, Bai Y, Osbourn A. A specialized metabolic network selectively modulates Arabidopsis root microbiota. Science. 2019;364:eaau6389. doi: 10.1126/science.aau6389.
- Huang P, Zhang Y, Xiao K, Jiang F, Wang H, Tang D, Liu D, Liu B, Liu Y, He X, et al. The chicken gut metagenome and the modulatory effects of plant-derived benzylisoquinoline alkaloids. Microbiome. 2018;6:211. doi: 10.1186/s40168-018-0590-5.
- Huson DH, Beier S, Flade I, Górska A, El-Hadidi M, Mitra S, Ruscheweyh H-J, Tappu R. MEGAN community edition - interactive exploration and analysis of large-scale microbiome sequencing data. PLoS Comput Biol. 2016;12:e1004957. doi: 10.1371/journal.pcbi.1004957.
- Hyatt D, LoCascio PF, Hauser LJ, Uberbacher EC. Gene and translation initiation site prediction in metagenomic sequences. Bioinformatics. 2012;28:2223–2230. doi: 10.1093/bioinformatics/bts429.
- Ji P, Zhang Y, Wang J, Zhao F. MetaSort untangles metagenome assembly by reducing microbial community complexity. Nat Commun. 2017;8:14306. doi: 10.1038/ncomms14306.
- Jiang X, Li X, Yang L, Liu C, Wang Q, Chi W, Zhu H. How microbes shape their communities? A microbial community model based on functional genes. Genom Proteom Bioinf. 2019;17:91–105. doi: 10.1016/j.gpb.2018.09.003.
- Jiao S, Liu Z, Lin Y, Yang J, Chen W, Wei G. Bacterial communities in oil contaminated soils: biogeography and co-occurrence patterns. Soil Biol Biochem. 2016;98:64–73.
- Jin T, Wang Y, Huang Y, Xu J, Zhang P, Wang N, Liu X, Chu H, Liu G, Jiang H, et al. Taxonomic structure and functional association of foxtail millet root microbiome. Giga Sci. 2017;6:1–12. doi: 10.1093/gigascience/gix089.
- Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28:27–30. doi: 10.1093/nar/28.1.27.
- Kanehisa M, Sato Y, Morishima K. BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J Mol Biol. 2016;428:726–731. doi: 10.1016/j.jmb.2015.11.006.
- Kang DD, Froula J, Egan R, Wang Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ. 2015;3:e1165. doi: 10.7717/peerj.1165.
- Klindworth A, Pruesse E, Schweer T, Peplies J, Quast C, Horn M, Glöckner FO. Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies. Nucleic Acids Res. 2012;41:e1–e1. doi: 10.1093/nar/gks808.
- Knight R, Vrbanac A, Taylor BC, Aksenov A, Callewaert C, Debelius J, Gonzalez A, Kosciolek T, McCall L-I, McDonald D, et al. Best practices for analysing microbiomes. Nat Rev Microbiol. 2018;16:410–422. doi: 10.1038/s41579-018-0029-9.
- Knights D, Kuczynski J, Charlson ES, Zaneveld J, Mozer MC, Collman RG, Bushman FD, Knight R, Kelley ST. Bayesian community-wide culture-independent microbial source tracking. Nat Methods. 2011;8:761. doi: 10.1038/nmeth.1650.
- Kurtz ZD, Müller CL, Miraldi ER, Littman DR, Blaser MJ, Bonneau RA. Sparse and compositionally robust inference of microbial ecological networks. PLoS Comput Biol. 2015;11:e1004226. doi: 10.1371/journal.pcbi.1004226.
- Lagier J-C, Dubourg G, Million M, Cadoret F, Bilen M, Fenollar F, Levasseur A, Rolain J-M, Fournier P-E, Raoult D. Culturing the human microbiota and culturomics. Nat Rev Microbiol. 2018;16:540–550. doi: 10.1038/s41579-018-0041-0.
- Langille MGI, Zaneveld J, Caporaso JG, McDonald D, Knights D, Reyes JA, Clemente JC, Burkepile DE, Vega Thurber RL, Knight R, et al. Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nat Biotechnol. 2013;31:814. doi: 10.1038/nbt.2676.
- Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
- Letunic I, Bork P. Interactive tree of life (iTOL) v4: recent updates and new developments. Nucleic Acids Res. 2019;47:W256–W259. doi: 10.1093/nar/gkz239.
- Levy A, Salas Gonzalez I, Mittelviefhaus M, Clingenpeel S, Herrera Paredes S, Miao J, Wang K, Devescovi G, Stillman K, Monteiro F, et al. Genomic features of bacterial adaptation to plants. Nat Genet. 2018;50:138–150. doi: 10.1038/s41588-017-0012-9.
- Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31:1674–1676. doi: 10.1093/bioinformatics/btv033.
- Li J, Jia H, Cai X, Zhong H, Feng Q, Sunagawa S, Arumugam M, Kultima JR, Prifti E, Nielsen T, et al. An integrated catalog of reference genes in the human gut microbiome. Nat Biotechnol. 2014;32:834–841. doi: 10.1038/nbt.2942.
- Liu C, Zhou N, Du M-X, Sun Y-T, Wang K, Wang Y-J, Li D-H, Yu H-Y, Song Y, Bai B-B, et al. The mouse gut microbial Biobank expands the coverage of cultured bacteria. Nat Commun. 2020;11:79. doi: 10.1038/s41467-019-13836-5.
- Liu Y-X, Qin Y, Bai Y. Reductionist synthetic community approaches in root microbiome research. Curr Opin Microbiol. 2019;49:97–102. doi: 10.1016/j.mib.2019.10.010.
- Liu Y-X, Qin Y, Guo X, Bai Y. Methods and applications for microbiome data analysis. Hereditas (Beijing). 2019;41:1–18. doi: 10.16288/j.yczz.19-222.
- Louca S, Parfrey LW, Doebeli M. Decoupling function and taxonomy in the global ocean microbiome. Science. 2016;353:1272–1277. doi: 10.1126/science.aaf4507.
- Mahnert A, Moissl-Eichinger C, Zojer M, Bogumil D, Mizrahi I, Rattei T, Martinez JL, Berg G. Man-made microbial resistances in built environments. Nat Commun. 2019;10:968. doi: 10.1038/s41467-019-08864-0.
- Marchesi JR, Ravel J. The vocabulary of microbiome research: a proposal. Microbiome. 2015;3:31. doi: 10.1186/s40168-015-0094-5.
- McDonald D, Price MN, Goodrich J, Nawrocki EP, DeSantis TZ, Probst A, Andersen GL, Knight R, Hugenholtz P. An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. ISME J. 2011;6:610. doi: 10.1038/ismej.2011.139.
- Members BDC. Database resources of the BIG data center in 2019. Nucleic Acids Res. 2019;47:D8–D14. doi: 10.1093/nar/gky993.
- Metcalf JL, Xu ZZ, Weiss S, Lax S, Van Treuren W, Hyde ER, Song SJ, Amir A, Larsen P, Sangwan N, et al. Microbial community assembly and metabolic function during mammalian corpse decomposition. Science. 2016;351:158–162. doi: 10.1126/science.aad2646.
- Metsky HC, Siddle KJ, Gladden-Young A, Qu J, Yang DK, Brehio P, Goldfarb A, Piantadosi A, Wohl S, Carter A, et al. Capturing sequence diversity in metagenomes with comprehensive and scalable probe design. Nat Biotechnol. 2019;37:160–168. doi: 10.1038/s41587-018-0006-x.
- Mikheenko A, Saveliev V, Gurevich A. MetaQUAST: evaluation of metagenome assemblies. Bioinformatics. 2016;32:1088–1090. doi: 10.1093/bioinformatics/btv697.
- Mitchell AL, Almeida A, Beracochea M, Boland M, Burgin J, Cochrane G, Crusoe MR, Kale V, Potter SC, Richardson LJ, et al. MGnify: the microbiome analysis resource in 2020. Nucleic Acids Res. 2020;48:D570–D578. doi: 10.1093/nar/gkz1035.
- Moss EL, Maghini DG, Bhatt AS. Complete, closed bacterial genomes from microbiomes using nanopore sequencing. Nat Biotechnol. 2020.
- Mu D-S, Liang Q-Y, Wang X-M, Lu D-C, Shi M-J, Chen G-J, Du Z-J. Metatranscriptomic and comparative genomic insights into resuscitation mechanisms during enrichment culturing. Microbiome. 2018;6:230. doi: 10.1186/s40168-018-0613-2.
- Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2014;32:268–274. doi: 10.1093/molbev/msu300.
- Ning K, Tong Y. The fast track for microbiome research. Genom Proteom Bioinf. 2019;17:1–3. doi: 10.1016/j.gpb.2019.04.001.
- Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 2017;27:824–834. doi: 10.1101/gr.213959.116.
- Oksanen J, Kindt R, Legendre P, O’Hara B, Stevens MHH, Oksanen MJ, Suggests M. The vegan package. Commun Ecol Pack. 2007;10:631–637.
- Parks DH, Tyson GW, Hugenholtz P, Beiko RG. STAMP: statistical analysis of taxonomic and functional profiles. Bioinformatics. 2014;30:3123–3124. doi: 10.1093/bioinformatics/btu494.
- Pasolli E, Asnicar F, Manara S, Zolfo M, Karcher N, Armanini F, Beghini F, Manghi P, Tett A, Ghensi P, et al. Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle. Cell. 2019;176:649–662.e620. doi: 10.1016/j.cell.2019.01.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods. 2017;14:417–419. doi: 10.1038/nmeth.4197.
- Pedersen HK, Forslund SK, Gudmundsdottir V, Petersen AØ, Hildebrand F, Hyötyläinen T, Nielsen T, Hansen T, Bork P, Ehrlich SD, et al. A computational framework to integrate high-throughput ‘-omics’ datasets for the identification of potential mechanistic links. Nat Protoc. 2018;13:2781–2800. doi: 10.1038/s41596-018-0064-z.
- Proctor LM, Creasy HH, Fettweis JM, Lloyd-Price J, Mahurkar A, Zhou W, Buck GA, Snyder MP, Strauss JF, Weinstock GM, et al. The integrative human microbiome project. Nature. 2019;569:641–648. doi: 10.1038/s41586-019-1238-8.
- Qian X, Liu Y-X, Ye X, Zheng W, Lv S, Mo M, Lin J, Wang W, Wang W, Zhang X, et al. Gut microbiota in children with juvenile idiopathic arthritis: characteristics, biomarker identification, and usefulness in clinical prediction. BMC Genom. 2020;21:286. doi: 10.1186/s12864-020-6703-0.
- Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T, et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature. 2010;464:59–65. doi: 10.1038/nature08821.
- Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glockner FO. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013;41:D590–596. doi: 10.1093/nar/gks1219.
- Quince C, Walker AW, Simpson JT, Loman NJ, Segata N. Shotgun metagenomics, from sampling to analysis. Nat Biotechnol. 2017;35:833. doi: 10.1038/nbt.3935.
- Ren Z, Li A, Jiang J, Zhou L, Yu Z, Lu H, Xie H, Chen X, Shao L, Zhang R, et al. Gut microbiome analysis as a tool towards targeted non-invasive biomarkers for early hepatocellular carcinoma. Gut. 2019;68:1014–1023. doi: 10.1136/gutjnl-2017-315084.
- Robinson MD, McCarthy DJ, Smyth GK. edgeR: a bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–140. doi: 10.1093/bioinformatics/btp616.
- Rognes T, Flouri T, Nichols B, Quince C, Mahé F. VSEARCH: a versatile open source tool for metagenomics. PeerJ. 2016;4:e2584. doi: 10.7717/peerj.2584.
- Ross AA, Müller KM, Weese JS, Neufeld JD. Comprehensive skin microbiome analysis reveals the uniqueness of human skin and evidence for phylosymbiosis within the class mammalia. Proc Natl Acad Sci USA. 2018;115:E5786–E5795. doi: 10.1073/pnas.1801302115.
- Rothschild D, Weissbrod O, Barkan E, Kurilshikov A, Korem T, Zeevi D, Costea PI, Godneva A, Kalka IN, Bar N, et al. Environment dominates over host genetics in shaping human gut microbiota. Nature. 2018;555:210. doi: 10.1038/nature25973.
- Roux S, Adriaenssens EM, Dutilh BE, Koonin EV, Kropinski AM, Krupovic M, Kuhn JH, Lavigne R, Brister JR, Varsani A, et al. Minimum information about an uncultivated virus genome (MIUViG). Nat Biotechnol. 2019;37:29–37. doi: 10.1038/nbt.4306.
- Saito R, Smoot ME, Ono K, Ruscheinski J, Wang P-L, Lotia S, Pico AR, Bader GD, Ideker T. A travel guide to cytoscape plugins. Nat Methods. 2012;9:1069–1076. doi: 10.1038/nmeth.2212.
- Salazar G, Paoli L, Alberti A, Huerta-Cepas J, Ruscheweyh H-J, Cuenca M, Field CM, Coelho LP, Cruaud C, Engelen S, et al. Gene expression changes and community turnover differentially shape the global ocean metatranscriptome. Cell. 2019;179:1068–1083.e21. doi: 10.1016/j.cell.2019.10.014.
- Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30:2068–2069. doi: 10.1093/bioinformatics/btu153.
- Segata N, Izard J, Waldron L, Gevers D, Miropolsky L, Garrett WS, Huttenhower C. Metagenomic biomarker discovery and explanation. Genome Biol. 2011;12:R60. doi: 10.1186/gb-2011-12-6-r60.
- Shenhav L, Thompson M, Joseph TA, Briscoe L, Furman O, Bogumil D, Mizrahi I, Pe’er I, and Halperin E (2019) FEAST: fast expectation-maximization for microbial source tracking. Nat Methods
- Shi W, Li M, Wei G, Tian R, Li C, Wang B, Lin R, Shi C, Chi X, Zhou B, et al. The occurrence of potato common scab correlates with the community composition and function of the geocaulosphere soil microbiome. Microbiome. 2019;7:14. doi: 10.1186/s40168-019-0629-2.
- Shi W, Qi H, Sun Q, Fan G, Liu S, Wang J, Zhu B, Liu H, Zhao F, Wang X, et al. gcMeta: a global catalogue of metagenomics platform to support the archiving, standardization and analysis of microbiome data. Nucleic Acids Res. 2019;47:D637–D648. doi: 10.1093/nar/gky1008.
- Sieber CMK, Probst AJ, Sharrar A, Thomas BC, Hess M, Tringe SG, Banfield JF. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat Microbiol. 2018;3:836–843. doi: 10.1038/s41564-018-0171-1.
- Sinha R, Abu-Ali G, Vogtmann E, Fodor AA, Ren B, Amir A, Schwager E, Crabtree J, Ma S, Abnet CC, et al. Assessment of variation in microbial community amplicon sequencing by the microbiome quality control (MBQC) project consortium. Nat Biotechnol. 2017;35:1077–1086. doi: 10.1038/nbt.3981.
- Smits SA, Leach J, Sonnenburg ED, Gonzalez CG, Lichtman JS, Reid G, Knight R, Manjurano A, Changalucha J, Elias JE, et al. Seasonal cycling in the gut microbiome of the Hadza hunter-gatherers of Tanzania. Science. 2017;357:802–806. doi: 10.1126/science.aan4834.
- Stewart RD, Auffret MD, Warr A, Walker AW, Roehe R, Watson M. Compendium of 4,941 rumen metagenome-assembled genomes for rumen microbiome biology and enzyme discovery. Nat Biotechnol. 2019;37:953–961. doi: 10.1038/s41587-019-0202-3.
- Stewart RD, Auffret MD, Warr A, Wiser AH, Press MO, Langford KW, Liachko I, Snelling TJ, Dewhurst RJ, Walker AW, et al. Assembly of 913 microbial genomes from metagenomic sequencing of the cow rumen. Nat Commun. 2018;9:870. doi: 10.1038/s41467-018-03317-6.
- Subramanian S, Huq S, Yatsunenko T, Haque R, Mahfuz M, Alam MA, Benezra A, DeStefano J, Meier MF, Muegge BD, et al. Persistent gut microbiota immaturity in malnourished Bangladeshi children. Nature. 2014;510:417. doi: 10.1038/nature13421.
- Tange O (2018) GNU Parallel 2018. Lulu.com
- Tierney BT, Yang Z, Luber JM, Beaudin M, Wibowo MC, Baek C, Mehlenbacher E, Patel CJ, Kostic AD. The landscape of genetic content in the gut and oral human microbiome. Cell Host Microbe. 2019;26:283–295.e8. doi: 10.1016/j.chom.2019.07.008.
- Tkacz A, Hortala M, Poole PS. Absolute quantitation of microbiota abundance in environmental samples. Microbiome. 2018;6:110. doi: 10.1186/s40168-018-0491-7.
- Truong DT, Franzosa EA, Tickle TL, Scholz M, Weingart G, Pasolli E, Tett A, Huttenhower C, Segata N. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat Methods. 2015;12:902–903. doi: 10.1038/nmeth.3589.
- Turnbaugh PJ, Ley RE, Hamady M, Fraser-Liggett CM, Knight R, Gordon JI. The human microbiome project. Nature. 2007;449:804–810. doi: 10.1038/nature06244.
- Turner TR, Ramakrishnan K, Walshaw J, Heavens D, Alston M, Swarbreck D, Osbourn A, Grant A, Poole PS. Comparative metatranscriptomics reveals kingdom level changes in the rhizosphere microbiome of plants. ISME J. 2013;7:2248–2258. doi: 10.1038/ismej.2013.119.
- Uritskiy GV, DiRuggiero J, Taylor J. MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome. 2018;6:158. doi: 10.1186/s40168-018-0541-1.
- Vandeputte D, Kathagen G, D’hoe K, Vieira-Silva S, Valles-Colomer M, Sabino J, Wang J, Tito RY, De Commer L, Darzi Y, et al. Quantitative microbiome profiling links gut community variation to microbial load. Nature. 2017;551:507–511. doi: 10.1038/nature24460.
- Vangay P, Hillmann BM, Knights D. Microbiome Learning Repo (ML Repo): A public repository of microbiome regression and classification tasks. GigaScience. 2019;8:giz042. doi: 10.1093/gigascience/giz042.
- Wang J, Chen L, Zhao N, Xu X, Xu Y, Zhu B. Of genes and microbes: solving the intricacies in host genomes. Protein Cell. 2018;9:446–461. doi: 10.1007/s13238-018-0532-9.
- Wang J, Jia Z, Zhang B, Peng L, and Zhao F (2019) Tracing the accumulation of in vivo human oral microbiota elucidates microbial community dynamics at the gateway to the GI tract. Gut, gutjnl-2019-318977
- Wang J, Thingholm LB, Skiecevičienė J, Rausch P, Kummen M, Hov JR, Degenhardt F, Heinsen F-A, Rühlemann MC, Szymczak S, et al. Genome-wide association analysis identifies variation in vitamin D receptor and other host factors influencing the gut microbiota. Nat Genet. 2016;48:1396–1406. doi: 10.1038/ng.3695.
- Wang J, Zheng J, Shi W, Du N, Xu X, Zhang Y, Ji P, Zhang F, Jia Z, Wang Y, et al. Dysbiosis of maternal and neonatal microbiota associated with gestational diabetes mellitus. Gut. 2018;67:1614–1625. doi: 10.1136/gutjnl-2018-315988.
- Wang W, Yang J, Zhang J, Liu Y-X, Tian C, Qu B, Gao C, Xin P, Cheng S, Zhang W, et al. An Arabidopsis secondary metabolite directly targets expression of the bacterial type III secretion system to inhibit bacterial virulence. Cell Host Microbe. 2020;27:601–613.e7. doi: 10.1016/j.chom.2020.03.004.
- Wang X, Wang M, Xie X, Guo S, Zhou Y, Zhang X, Yu N, and Wang E (2020b) An amplification-selection model for quantified rhizosphere microbiota assembly. Sci Bull
- Wang Y, Song F, Zhu J, Zhang S, Yang Y, Chen T, Tang B, Dong L, Ding N, Zhang Q, et al. GSA: genome sequence archive. Genom Proteom Bioinf. 2017;15:14–18. doi: 10.1016/j.gpb.2017.01.001.
- Ward T, Larson J, Meulemans J, Hillmann B, Lynch J, Sidiropoulos D, Spear JR, Caporaso G, Blekhman R, Knight R et al (2017) BugBase predicts organism-level microbiome phenotypes. bioRxiv 133462
- Wilck N, Matus MG, Kearney SM, Olesen SW, Forslund K, Bartolomaeus H, Haase S, Mähler A, Balogh A, Markó L, et al. Salt-responsive gut commensal modulates TH17 axis and disease. Nature. 2017;551:585–589. doi: 10.1038/nature24628.
- Wood DE, Lu J, and Langmead B (2019) Improved metagenomic analysis with Kraken 2. bioRxiv 762302
- Wu Y-W, Simmons BA, Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics. 2015;32:605–607. doi: 10.1093/bioinformatics/btv638.
- Xiao L, Feng Q, Liang S, Sonne SB, Xia Z, Qiu X, Li X, Long H, Zhang J, Zhang D, et al. A catalog of the mouse gut metagenome. Nat Biotechnol. 2015;33:1103. doi: 10.1038/nbt.3353.
- Xu J, Zhang Y, Zhang P, Trivedi P, Riera N, Wang Y, Liu X, Fan G, Tang J, Coletta-Filho HD, et al. The structure and function of the global citrus rhizosphere microbiome. Nat Commun. 2018;9:4894. doi: 10.1038/s41467-018-07343-2.
- Xu Y, Zhao F. Single-cell metagenomics: challenges and applications. Protein Cell. 2018;9:501–510. doi: 10.1007/s13238-018-0544-5.
- Yang J, Yu J. The association of diet, gut microbiota and colorectal cancer: what we eat may imply what we get. Protein Cell. 2018;9:474–487. doi: 10.1007/s13238-018-0543-6.
- Ye SH, Siddle KJ, Park DJ, Sabeti PC. Benchmarking metagenomics tools for taxonomic classification. Cell. 2019;178:779–794. doi: 10.1016/j.cell.2019.07.010.
- Yilmaz P, Kottmann R, Field D, Knight R, Cole JR, Amaral-Zettler L, Gilbert JA, Karsch-Mizrachi I, Johnston A, Cochrane G, et al. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat Biotechnol. 2011;29:415–420. doi: 10.1038/nbt.1823.
- Zgadzaj R, Garrido-Oter R, Jensen DB, Koprivova A, Schulze-Lefert P, Radutoiu S. Root nodule symbiosis in Lotus japonicus drives the establishment of distinctive rhizosphere, root, and nodule bacterial communities. Proc Natl Acad Sci USA. 2016;113:E7996–E8005. doi: 10.1073/pnas.1616564113.
- Zhang F, Cui B, He X, Nie Y, Wu K, Fan D, Feng B, Chen D, Ren J, Deng M, et al. Microbiota transplantation: concept, methodology and strategy for its modernization. Protein Cell. 2018;9:462–473. doi: 10.1007/s13238-018-0541-8.
- Zhang J, Liu Y-X, Zhang N, Hu B, Jin T, Xu H, Qin Y, Yan P, Zhang X, Guo X, et al. NRT1.1B is associated with root microbiota composition and nitrogen use in field-grown rice. Nat Biotechnol. 2019;37:676–684. doi: 10.1038/s41587-019-0104-4.
- Zhang J, Zhang N, Liu Y-X, Zhang X, Hu B, Qin Y, Xu H, Wang H, Guo X, Qian J, et al. Root microbiota shift in rice correlates with resident time in the field and developmental stage. Sci China Life Sci. 2018;61:613–621. doi: 10.1007/s11427-018-9284-4.
- Zheng M, Zhou N, Liu S, Dang C, Liu Y-X, He S, Zhao Y, Liu W, Wang X. N2O and NO emission from a biological aerated filter treating coking wastewater: main source and microbial community. J Clean Prod. 2019;213:365–374.
- Zhu W, Lomsadze A, Borodovsky M. Ab initio gene identification in metagenomic sequences. Nucleic Acids Res. 2010;38:e132–e132. doi: 10.1093/nar/gkq275.
- Zou Y, Xue W, Luo G, Deng Z, Qin P, Guo R, Sun H, Xia Y, Liang S, Dai Y, et al. 1,520 reference genomes from cultivated human gut bacteria enable functional microbiome analyses. Nat Biotechnol. 2019;37:179–185. doi: 10.1038/s41587-018-0008-8.