Skip to main content
eLife logoLink to eLife
. 2020 Nov 20;9:e58999. doi: 10.7554/eLife.58999

Does diversity beget diversity in microbiomes?

Naïma Madi 1, Michiel Vos 2, Carmen Lia Murall 1, Pierre Legendre 1, B Jesse Shapiro 1,3,4,
Editors: Detlef Weigel5, Detlef Weigel6
PMCID: PMC7755399  PMID: 33215610

Abstract

Microbes are embedded in complex communities where they engage in a wide array of intra- and inter-specific interactions. The extent to which these interactions drive or impede microbiome diversity is not well understood. Historically, two contrasting hypotheses have been suggested to explain how species interactions could influence diversity. ‘Ecological Controls’ (EC) predicts a negative relationship, where the evolution or migration of novel types is constrained as niches become filled. In contrast, ‘Diversity Begets Diversity’ (DBD) predicts a positive relationship, with existing diversity promoting the accumulation of further diversity via niche construction and other interactions. Using high-throughput amplicon sequencing data from the Earth Microbiome Project, we provide evidence that DBD is strongest in low-diversity biomes, but weaker in more diverse biomes, consistent with biotic interactions initially favouring the accumulation of diversity (as predicted by DBD). However, as niches become increasingly filled, diversity hits a plateau (as predicted by EC).

Research organism: Other

Introduction

The majority of the genetic diversity on Earth is encoded by microbes (Hug et al., 2016; Lapierre and Gogarten, 2009; Tara Oceans coordinators et al., 2015) and the functioning of all Earth’s ecosystems is reliant on diverse microbial communities (Falkowski et al., 2008). High-throughput 16S rRNA gene amplicon sequencing studies continue to yield unprecedented insight into the taxonomic richness of microbiomes (e.g. Louca et al., 2019; Sogin et al., 2006), and abiotic drivers of community composition (e.g. pH; Lauber et al., 2009; Power et al., 2018) are increasingly characterized. Although it is known that biotic (microbe-microbe) interactions can also be important in determining community composition (Needham and Fuhrman, 2016), comparatively little is known about how such interactions, either positive (e.g. cross-feeding; Seth and Taga, 2014) or negative (e.g. toxin-mediated interference competition; Czárán et al., 2002; Hibbing et al., 2010), shape microbiome diversity as a whole.

The dearth of studies exploring how microbial interactions could influence diversity stands in marked contrast to a long research tradition on biotic controls of plant and animal diversity (Elton, 1946; Gause, 2003). In an early study of 49 animal (vertebrate and invertebrate) community samples, Elton, 1946 plotted the number of species versus the number of genera and observed a ~ 1:1 ratio in each individual sample, but a ~ 4:1 ratio when all samples were pooled. He took this observation as evidence for competitive exclusion preventing related species, more likely to overlap in niche space, to co-exist. This concept, more recently referred to as niche filling or Ecological Controls (EC) (Schluter and Pennell, 2017), predicts speciation (or, more generally, diversification) rates to decrease with increasing standing species diversity because less niche space is available (Rabosky and Hurlbert, 2015). In contrast, the Diversity Begets Diversity (DBD) model predicts that when species interactions create novel niches, standing biodiversity favours further diversification (Calcagno et al., 2017; Whittaker, 1972). For example, niche construction (i.e. the physical, chemical or biological alteration of the environment) could influence the evolution of the species constructing the niche, as well as that of co-occurring species (Laland et al., 1999; San Roman and Wagner, 2018). An alternative to either EC or DBD is The Neutral Theory of Biodiversity and Biogeography, in which all species are functionally equivalent and communities assemble via random sampling (Hubbell, 2001). Neutral Theory serves as a null hypothesis of community assembly in macrobes (Azaele et al., 2016; Gotelli and McGill, 2006), and more recently in microbiome research (Harris et al., 2017; Li and Ma, 2016).

Empirical evidence for the action of EC vs. DBD in natural plant and animal communities has been mixed (Calcagno et al., 2017; Emerson and Kolm, 2005; Palmer and Maurer, 1997; Price et al., 2014; Rabosky et al., 2018). Laboratory evolution experiments tracking the diversification of a focal bacterial lineage in communities of varying complexity have also yielded contradictory results, with support for EC, DBD, or intermediate scenarios (Brockhurst et al., 2007; Meyer and Kassen, 2007). For example, diversification of a focal Pseudomonas clone was favoured by increasing community diversity in the range of 0–20 other strains or species within the same genus (Calcagno et al., 2017; Jousset et al., 2016) but diversification was inhibited in highly diverse communities (e.g. hundreds or thousands of species in compost; Gómez and Buckling, 2013). These experiments are consistent with interspecific competition initially driving (Bailey et al., 2013), but eventually inhibiting diversification as niches are filled.

Most laboratory experiments are restricted to relatively short evolutionary time scales and include only a small number of taxa; it is therefore unclear if they can be generalized to natural communities consisting of many more taxa evolving and assembling over much longer periods, spanning more environmental change, greater evolutionary diversification, and frequent migration events. Although the absence of a substantial prokaryotic fossil record hinders deconvoluting speciation and extinction rates (Louca and Pennell, 2020; Marshall, 2017) Louca et al., 2018 recently estimated that bacterial diversity has mostly increased over the past billion years, with speciation rates slightly exceeding extinction rates. However, because many free-living microbes have high migration rates (‘everything is everywhere, but the environment selects’ [de Wit and Bouvier, 2006]), we expect that the majority of diversity present within a typical microbiome sample is selected from a pool of migrants rather than having evolved in situ. As such, here we broadly define ‘diversity begets diversity’ (DBD) to include the combined effects of community assembly from a migrant pool (‘ecological species sorting’) and in situ evolutionary diversification (Figure 1).

Figure 1. Contrasting the Diversity Begets Diversity (DBD) and Ecological Controls (EC) models.

Figure 1.

(A). In this hypothetical scenario, microbiome sample 1 contains one non-focal genus, and two amplicon sequence variants (ASVs) within the focal genus (point at x = 1, y = 2 in the plot). Sample 2 contains three non-focal genera, and four ASVs within the focal genus (point at x = 3, y = 4). Tracing a line through these points yields a positive diversity slope, supporting the DBD model (red). (B) Alternatively, a negative slope would support the Ecological Controls (EC) model (blue line). In the middle panel, we consider a community assembly model to explain the hypothetical data of the top panel, in which standing diversity (black points) in a community selects (for or against) new types (referred to here as ASVs) which arrive via migration (purple points and arrows). In the bottom panel, we consider an evolutionary diversification model of a focal lineage (genus) into ASVs as a function of initial genus-level community diversity present at the time of diversification.

To test whether patterns of diversity in natural communities conform to EC or DBD dynamics, we used 2000 microbiome samples from the Earth Microbiome Project (EMP), the largest available repository of biodiversity based on standardized sampling and sequencing protocols, with 16S rRNA gene amplicon sequence variants (ASVs) as the finest-grained taxonomic unit (Thompson et al., 2017). Following Elton, 1946, we use the equivalent of Species:Genus ratios, calculating a range of taxonomic diversity ratios (up to the Class:Phylum level) as proxies for diversity within a focal taxon, from shallow to deep evolutionary time. We then plot each ratio as a function of the number of non-focal taxa (Genera, Families, Orders, Classes, and Phyla, respectively) with which the focal taxon could interact. We refer to the slope of these plots as the ‘diversity slope’, with negative slopes supporting EC and positive slopes supporting DBD (Figure 1). As a null, we compare these slopes to the expectation under Neutral Theory. To avoid a trivially positive diversity slope due to variation in sequencing effort, all samples were rarefied to 5000 observations (counts of 16S rRNA gene sequences), as diversity estimates are highly sensitive to sampling effort (Gotelli and Colwell, 2001). As 16S evolves at a rate of roughly 1–2 substitutions per million years (Kuo and Ochman, 2009b), evolutionary diversification within individual EMP samples cannot be uncovered using this marker; rather our data represent mainly a record of community assembly.

Results

Quantifying the DBD-EC continuum in prokaryote communities compared to neutral null models

We used generalized linear mixed models (GLMMs) to estimate the diversity slope at each taxonomic level in the EMP data, which revealed a tendency towards positive slopes with significant variation explained by the random effects of lineage, environment, and their interaction (Table 1, Figure 2, Figure 2—figure supplement 116, Supplementary file 1 Section 1). All models reported here provide significantly better fits compared to models without the fixed effect of community diversity, and coefficients of determination (R2) are higher with the inclusion of random effects, showing their importance (Supplementary file 2). Examples of how the diversity slope varies across lineages and environments are shown in Figure 2 and Figure 2—figure supplement 216. To assess the significance of these slope estimates in light of potential sampling bias and data structure (Gotelli and Colwell, 2001; Jarvinen, 1982), we considered null models, all of which randomize the associations between ASVs within a sample, thus randomizing any true biotic interactions. Models 1 and 2 are based on draws from the zero-sum multinomial (ZSM) distribution, which arises from the standard Neutral Theory of Biodiversity (Materials and methods). Model 1, in which each microbiome sample is drawn from the same ZSM distribution, produces a significantly negative diversity slope (Figure 2—figure supplement 17; Table 2). Model 2, in which each environment draws from a separate distribution, is effectively a composite of Model 1 in which different environments, each with a negative slope, are 'stacked' to yield an overall positive slope (Figure 2—figure supplement 17). However, the Model 2 slope is not significant in a GLMM accounting for variation across environments (Table 2, Supplementary file 3 Section 1.2). In the real EMP data, most individual environments tend towards a positive slope (Figure 2—figure supplement 18). The tendency towards positive diversity slopes in the EMP is therefore not straightforwardly explained by neutral processes.

Table 1. Effects of community diversity on focal lineage diversity across taxonomic ratios.

The GLMMs show a statistically significant positive effect of community diversity on focal lineage diversity. Each row reports the effect of community diversity (Div) on focal lineage diversity, as well as its standard error, Wald z-statistic for its effect size and the corresponding P-value (left section), or standard deviation on the slope for the significant random effects (right section). SE = standard error, Env = environment type, Lin = lineage type, Lab = Principal Investigator ID, Sample = EMP Sample ID. Interactions are denoted as ‘*’. n.s. = not significant (likelihood-ratio test). All models provide a significantly better fit than null models without fixed effects (∆AIC > 10 and p<0.05; Supplementary file 2).

Slope (fixed effects) Standard deviation on the slope (random effects)
Div SE z P Env Lin Lin*Env Env*Lab Sample
ASV:Genus 0.091 0.016 5.792 6.95e-09 n.s. 0.074 0.142 0.114 0.067
Genus:Family 0.047 0.008 5.911 3.41e-09 n.s. 0.071 0.07 0.039 n.s.
Family:Order 0.119 0.017 7.001 2.54e-12 0.023 0.094 0.092 0.106 n.s.
Order:Class 0.109 0.020 5.447 5.13e-08 0.05 0.141 0.078 0.051 n.s.
Class:Phylum 0.272 0.043 6.341 2.29e-10 0.119 0.174 0.119 0.114 n.s.

Figure 2. Focal-lineage diversity as a function of community diversity in the top two most prevalent taxa at each taxonomic level.

As in Figure 1, the x-axes show community diversity in units of the number of non-focal taxa (e.g. the number of non-Proteobacteria phyla for the left-most column), and the y-axes show the taxonomic ratio within the focal taxon (e.g. the number of classes within Proteobacteria). Significant positive diversity slopes are shown in red, negative in blue (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey. Note that linear models are distinct from GLMMs, and are for illustrative purposes only. Four representative environments are shown (see Figure 2—figure supplement 216 for plots in all 17 environments).

Figure 2.

Figure 2—figure supplement 1. Distributions of diversity slope estimates across different random effects, from the GLMMs predicting focal lineage diversity as a function of community diversity.

Figure 2—figure supplement 1.

(A) Class:Phylum, (B) Order:Class, (C) Family:Order, (D) Genus:Family, and (E) ASV:Genus. Estimation of random effect coefficients from the GLMMs (Table S1), shows that the effect of diversity on focal lineage diversity (slope estimates) are generally positive but could be negative in some lineages or combinations of environment, lineage (Environment*Lineage), and the laboratory that submitted the dataset (Environment*Lab).Linear models are shown for the number of classes per phylum (y-axis) as a function of community diversity (number of non-focal phyla, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. P-values are Bonferroni corrected for 17 tests. Significant (p<0.05) models are shown with red trend lines.
Figure 2—figure supplement 2. Focal-lineage diversity as a function of community diversity across biomes in Proteobacteria.

Figure 2—figure supplement 2.

Linear models are shown for the number of classes per phylum (y-axis) as a function of community diversity (number of non-focal phyla, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. P-values are Bonferroni corrected for 17 tests. Significant (p<0.05) models are shown with red trend lines.
Figure 2—figure supplement 3. Focal-lineage diversity as a function of community diversity across biomes in Bacteroidetes.

Figure 2—figure supplement 3.

Linear models are shown for the number of classes per phylum (y-axis) as a function of community diversity (number of non-focal phyla, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. P-values are Bonferroni corrected for 17 tests. Significant (p<0.05) models are shown with red trend lines.
Figure 2—figure supplement 4. Focal-lineage diversity as a function of community diversity across biomes in Actinobacteria.

Figure 2—figure supplement 4.

Linear models are shown for the number of classes per phylum (y-axis) as a function of community diversity (number of non-focal phyla, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. P-values are Bonferroni corrected for 17 tests. Significant (p<0.05) models are shown with red trend lines.
Figure 2—figure supplement 5. Focal-lineage diversity as a function of community diversity across biomes in Gammaproteobacteria.

Figure 2—figure supplement 5.

Linear models are shown for the number of orders per class (y-axis) as a function of community diversity (non-focal classes, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.
Figure 2—figure supplement 6. Focal-lineage diversity as a function of community diversity across biomes in Alphaproteobacteria.

Figure 2—figure supplement 6.

Linear models are shown for the number of orders per class (y-axis) as a function of community diversity (non-focal classes, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red, negative in blue (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.
Figure 2—figure supplement 7. Focal-lineage diversity as a function of community diversity across biomes in Actinobacteria.

Figure 2—figure supplement 7.

Linear models are shown for the number of orders per class (y-axis) as a function of community diversity (non-focal classes, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.
Figure 2—figure supplement 8. Focal-lineage diversity as a function of community diversity across biomes in Actinomycetales.

Figure 2—figure supplement 8.

Linear models are shown for the number of families per order (y-axis) as a function of community diversity (non-focal orders, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.
Figure 2—figure supplement 9. Focal-lineage diversity as a function of community diversity across biomes in Flavobacteriales.

Figure 2—figure supplement 9.

Linear models are shown for the number of families per order (y-axis) as a function of community diversity (non-focal orders, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red, negative in blue (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.
Figure 2—figure supplement 10. Focal-lineage diversity as a function of community diversity across biomes in Rhizobiales.

Figure 2—figure supplement 10.

Linear models are shown for the number of families per order (y-axis) as a function of community diversity (non-focal orders, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red, negative in blue (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.
Figure 2—figure supplement 11. Focal-lineage diversity as a function of community diversity across biomes in Flavobacteriaceae.

Figure 2—figure supplement 11.

Linear models are shown for genera per family (y-axis) as a function of community diversity (non-focal families, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red, negative in blue (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.
Figure 2—figure supplement 12. Focal-lineage diversity as a function of community diversity across biomes in Sphingomonadaceae.

Figure 2—figure supplement 12.

Linear models are shown for genera per family (y-axis) as a function of community diversity (non-focal families, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.
Figure 2—figure supplement 13. Focal-lineage diversity as a function of community diversity across biomes in Verrucomicrobiaceae.

Figure 2—figure supplement 13.

Linear models are shown for genera per family (y-axis) as a function of community diversity (non-focal families, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.
Figure 2—figure supplement 14. Focal-lineage diversity as a function of community diversity across biomes in Pseudomonas.

Figure 2—figure supplement 14.

Linear models are shown for ASVs per genus (y-axis) as a function of community diversity (non-focal genera, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.
Figure 2—figure supplement 15. Focal-lineage diversity as a function of community diversity across biomes in Planctomyces.

Figure 2—figure supplement 15.

Linear models are shown for ASVs per genus (y-axis) as a function of community diversity (non-focal genera, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red, negative in blue (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.
Figure 2—figure supplement 16. Focal-lineage diversity as a function of community diversity across biomes in Clostridium.

Figure 2—figure supplement 16.

Linear models are shown for ASVs per genus (y-axis) as a function of community diversity (non-focal genera, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red, negative in blue (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.
Figure 2—figure supplement 17. Null models based on Neutral Theory.

Figure 2—figure supplement 17.

Results are shown from data simulated under (A) neutral Model 1, (B) neutral Model 2, or (C) neutral Model 3. Model 1 is sampled from the zero-sum multinomial distribution with a single distribution for the whole dataset, while Model 2 includes a separate distribution for each of the 17 different environments (EMPO 3 biomes). In Model 3 (C), the effect of DBD (top rows) or EC (bottom rows) are ‘spiked in’ at different levels, ranging from 0 to 100% of ASVs in a sample. Blue lines show a linear fit, with slopes (m) estimated by GLMM in selected panels. See Methods for model details, and Table 2 and Supplementary file 3, Section 1.2 for full GLMM results.
Figure 2—figure supplement 18. Lineage diversity (mean ASV:Genus ratio among all lineages) as a function of community diversity (number of genera) in the EMP data.

Figure 2—figure supplement 18.

Samples from different environments (EMPO level 3) are shown in different colours, each with their corresponding linear model fit.
Figure 2—figure supplement 19. Taxonomic ratios estimated from simulated rarefied sequence data.

Figure 2—figure supplement 19.

Each panel simulates a set of microbiome samples that differ in their diversity (number of genera in left panels A and B, number of phyla in right panels C and D) while maintaining a set true taxonomic ratio (horizontal black line). (A) True ratio set to 2 ASVs/genus, close to the per-sample mean and median in the real EMP data, in a range of samples between 1 and 1128 named genera, as observed in the real EMP data. (B) True ratio set to 20 ASVs/genus, equal to the overall mean of 22,014 named ASVs in 1128 named genera, and close to the maximum ratios observed in individual samples (Figure 2—figure supplement 5). Insets show the ranges of 1–50 and 51–150 genera, approximating observations from lower- or higher-diversity samples such as gut and soil, respectively (Figure 2—figure supplement 5). The insets only show the rarefaction to 5000 sequences, as used in the real EMP dataset. (C) True ratio set to three classes/phylum, close to the per-sample mean and median in the real EMP data, in a range of samples between 1 and 84 named phyla, as observed in the real EMP data. (D) True ratio set to 10 classes/phylum, close to the maximum ratios observed in individual samples (Figure 2—figure supplements 24). Different rarefaction levels are shown as different coloured lines.
Figure 2—figure supplement 20. Linear, quadratic, and cubic models for the relationship between focal-lineage diversity and community diversity for varying levels of % nucleotide identity.

Figure 2—figure supplement 20.

Community diversity was estimated as the number of clusters at a focal level (di) and focal-lineage diversity as the mean of the clusters at the rank above (di 1/di). All P-values are <0.001. Linear fit (grey); quadratic fit (blue), cubic fit (red); same colours for the associated adjusted R2. The x-axis (diversity) shows the number of clusters at the focal percent-identity level (di), and the y-axis (diversification) is the mean of the clusters at the rank above (di 1/di).
Figure 2—figure supplement 21. Focal clusters at 75% nucleotide identity.

Figure 2—figure supplement 21.

Community diversity was estimated as the number of clusters at a focal level (di) and focal lineage diversity as the mean of the clusters at the rank above (di 1/di). Linear (grey), quadratic (blue) and cubic (red), with corresponding adjusted R-squared values in the same colour. P-values are Bonferroni corrected for 17 tests. Significant, p<0.05 (solid lines), non-significant (dashed lines). The x-axis shows the number of clusters at the focal percent-identity level (di), and the y-axis is the mean of the clusters at the rank above (di 1/di).
Figure 2—figure supplement 22. Focal clusters at 80% nucleotide identity.

Figure 2—figure supplement 22.

Community diversity was estimated as the number of clusters at a focal level (di) and focal lineage diversity as the mean of the clusters at the rank above (di 1/di). Linear (grey), quadratic (blue) and cubic (red), with corresponding adjusted R-squared values in the same colour. P-values are Bonferroni corrected for 17 tests. Significant, p<0.05 (solid lines), non-significant (dashed lines). The x-axis shows the number of clusters at the focal percent-identity level (di), and the y-axis is the mean of the clusters at the rank above (di 1/di).
Figure 2—figure supplement 23. Focal clusters at 85% nucleotide identity.

Figure 2—figure supplement 23.

Community diversity was estimated as the number of clusters at a focal level (di) and focal lineage diversity as the mean of the clusters at the rank above (di 1/di). Linear (grey), quadratic (blue) and cubic (red), with corresponding adjusted R-squared values in the same colour. P-values are Bonferroni corrected for 17 tests. Significant, p<0.05 (solid lines), non-significant (dashed lines). The x-axis shows the number of clusters at the focal percent-identity level (di), and the y-axis is the mean of the clusters at the rank above (di 1/di).
Figure 2—figure supplement 24. Focal clusters at 90% nucleotide identity.

Figure 2—figure supplement 24.

Community diversity was estimated as the number of clusters at a focal level (di) and focal lineage diversity as the mean of the clusters at the rank above (di 1/di). Linear (grey), quadratic (blue) and cubic (red), with corresponding adjusted R-squared values in the same colour. P-values are Bonferroni corrected for 17 tests. Significant, p<0.05 (solid lines), non-significant (dashed lines). The x-axis shows the number of clusters at the focal percent-identity level (di), and the y-axis is the mean of the clusters at the rank above (di 1/di).
Figure 2—figure supplement 25. Focal clusters at 95% nucleotide identity.

Figure 2—figure supplement 25.

Community diversity was estimated as the number of clusters at a focal level (di) and focal lineage diversity as the mean of the clusters at the rank above (di 1/di). Linear (grey), quadratic (blue) and cubic (red), with corresponding adjusted R-squared values in the same colour. P-values are Bonferroni corrected for 17 tests. Significant, p<0.05 (solid lines), non-significant (dashed lines). The x-axis shows the number of clusters at the focal percent-identity level (di), and the y-axis is the mean of the clusters at the rank above (di 1/di).
Figure 2—figure supplement 26. Focal clusters at 97% nucleotide identity.

Figure 2—figure supplement 26.

Community diversity was estimated as the number of clusters at a focal level (di) and focal lineage diversity as the mean of the clusters at the rank above (di 1/di). Linear (grey), quadratic (blue) and cubic (red), with corresponding adjusted R-squared values in the same colour. P-values are Bonferroni corrected for 17 tests. Significant, p<0.05 (solid lines), non-significant (dashed lines). The x-axis shows the number of clusters at the focal percent-identity level (di), and the y-axis is the mean of the clusters at the rank above (di 1/di).

Table 2. GLMMs applied to data simulated under null models.

Null models 1 and 2 were generated under the ZSM distribution, with a single distribution for the whole dataset (Model 1) or one distribution per environment (Model 2). Model 3 is similar to Model 1, except with a single Poisson distribution for the whole dataset, and +DBD or +EC refer to adding these effects to all ASVs in each sample (see Materials and methods and Figure 2—figure supplement 17). Each row reports the effect of community diversity (Div) on focal lineage diversity, as well as its standard error, Wald z-statistic for its effect size and the corresponding P-value (Wald test) (left section), or standard deviation on the slope for the significant random effects (right section). SE = standard error, Env = environment type, Lin = lineage type, Sample = EMP Sample ID. n.s. = not significant (likelihood-ratio test), n.t. = not tested, because separate environments were not included in Models 1 or 3.

Slope (fixed effects) Stand dev on the slope (random effects)
Div SE z P Env Lin Lin*Env Sample
Model 1 −0.005 0.000 −9.807 <2 e −16 n.t. 0.639 n.t. n.s.
Model 2 n.s.
Model 3 −0.012 0.002 −6.552 5.69e-11 n.t. 0.021 n.t. n.s.
Model 3 + DBD 0.016 0.001 11.48 <2e-16 n.t. 0.008 n.t. n.s.
Model 3 + EC −0.011 0.002 −6.14 8.26e-10 n.t. ns n.t. n.s.

To estimate the power to detect either DBD or EC, we specifically added each of these effects to data simulated under a null model. As expected, adding DBD reversed the negative slope and rendered it positive (Table 2; Figure 2—figure supplement 17, Supplementary file 3 Section 2.1), suggesting reasonable power to detect DBD when truly present. In contrast, the addition of EC had little effect on the slope, suggesting low power to detect EC under some null models. Taken together, these modelling results suggest that positive diversity slopes observed in the EMP are more readily explained by DBD than by Neutral Theory, whereas negative slopes could be explained by EC, Neutral Theory, or some combination of the two.

Because taxonomic labels can be unavailable or inconsistent with phylogenetic relationships (Parks et al., 2018; Vos, 2011) we repeated the analyses using nucleotide sequence identity in the 16S rRNA gene instead of taxonomy, and again recovered generally positive diversity slopes (Materials and methods). As a final sensitivity analysis, we repeated the GLMMs using unrarefied community Shannon diversity instead of richness (Materials and methods) and obtained similar results, with generally positive diversity slopes that could in some cases be reversed depending on the lineage or environment (Table 3, Supplementary file 1 Section 2). The Shannon diversity metric is robust to sampling effort, suggesting that the results are not biased by undersampling in diverse biomes. Even if undersampling could bias the diversity slope downward in more diverse samples, the effect is unlikely to be large at a rarefaction to 5000 sequences, and only to occur at the extremes of diversity (e.g. very many genera and high ASV:genus ratios) and not at higher taxonomic levels (e.g. Class:Phylum) (Figure 2—figure supplement 19).

Table 3. GLMMs with community diversity measured using Shannon diversity.

Results are shown from GLMMs with Shannon diversity of non-focal taxa (Div) as a predictor of ASVs richness of focal taxa. Each row reports the estimate (Div), as well as its standard error, Wald z-statistic for its effect size and the corresponding P-value (Wald test) (left section), or standard deviation on the slope for the significant random effects (right section). SE = standard error, Env = environment type, Lin = lineage type, Lab = Principal Investigator ID, Sample = EMP Sample ID. n.s. = not significant (likelihood-ratio test).

Fixed effects Random effects
Div SE z p Env Lin Env*Lin Env*Lab Sample
Genus 0.055 0.013 4.33 1.49e-05 n.s. 0.08 0.15 0.085 0.054
Family 0.148 0227 6.491 8.51e-11 n.s. 0.184 0.268 0.16 0.134
Order 0.378 0.038 9.864 <2e-16 n.s. 0.34 0.417 0.258 0.202
Class 0.398 0.05 7.973 1.54e-15 n.s. 0.369 0.46 0.326 0.262
Phylum 0.319 0.088 3.614 0.0003 0.169 0.316 0.5 0.495 0.378

DBD reaches a plateau at high diversity

It is expected from theory and experimental studies that a positive DBD relationship should eventually reach a plateau, giving way to EC as niches become saturated (Brockhurst et al., 2007; Gómez and Buckling, 2013). This expectation is borne out in our dataset, particularly in the nucleotide sequence-based analyses which support quadratic or cubic relationships over linear diversity slopes (Figure 2—figure supplement 20). For example, in the animal distal gut, a relatively low-diversity biome, we observed a strong linear DBD relationship at most phylogenetic depths; in contrast, the much more diverse soil biome clearly reaches a plateau (Figure 2—figure supplements 2126).

To comprehensively test the hypothesis that more diverse microbiomes experience weaker DBD due to saturated niche space, we used a GLMM including the interaction between diversity and environment as a fixed effect. We considered this model only for taxonomic ratios with significant diversity slope variation by environment (Table 1): Family:Order, Order:Class, and Class:Phylum. Diversity slopes were significantly higher in less diverse (often host-associated) biomes, suggesting that niche filling leads to a plateau of DBD in more diverse biomes (Figure 3, Supplementary file 1 Section 3). The interaction observed in the real EMP data between community diversity and biome type in shaping focal lineage diversity was not observed under a neutral null (Model 2, in which each environment has its own characteristic level of diversity) (Supplementary file 3 Section 1.2). The DBD plateau observed in more diverse biomes is thus not readily explained by a neutral model, nor is rarefaction expected to bias the diversity slope estimates, particularly at the Class:Phylum level (Figure 2—figure supplement 19). This suggests that the plateau of DBD at higher levels of community diversity is not an artefact of data structure or sampling effort. Finally, we considered whether variation along the EC-DBD continuum could be explained by differential cell density across environments, which could affect both the frequency of cell-cell interactions (a biological effect) or the sampling depth (a technical artefact). Although precise estimates of cell densities in all EMP biomes are not available, we extracted plausible ranges for eight biomes from the literature (Kennedy and de Luna, 2005; Lindow and Brandl, 2003; Sender et al., 2016; Whitman et al., 1998) and annotated these in Figure 3. It is clear from this figure that relatively high- and low-density samples are found along the range of community taxonomic diversities, demonstrating that cell density is unlikely to drive the trend of decreasing diversity slopes with increasing community diversity.

Figure 3. The diversity slope of focal taxa is higher in low-diversity (often host-associated) microbiomes.

Figure 3.

The x-axis shows the mean number of non-focal taxa: (A) phyla, (B) classes, and (C) orders in each biome. On the y-axis, the diversity slope was estimated by a GLMM predicting focal lineage diversity as a function of the interaction between community diversity and environment type at the level of (A) Class:Phylum, (B) Order:Class, and (C) Family:Order ratios (Supplementary file 1 Section 3). The line represents a linear regression; the shaded area depicts 95% confidence limits of the fitted values. Adjusted R2 and P-values from the linear fits are shown at the top right of each panel. See Supplementary file 2 for model goodness of fit. Slopes not significantly different from zero are shown as empty circles. Estimates of bacterial cell density from the literature are indicated in grey text, in units of bacteria/mm3. For animal (skin) and plant surface, units of bacteria/mm2 were converted to mm3 assuming layers of bacteria one micron thick. For rhizosphere samples we assume a density of 1–2 g/cm(Kennedy and de Luna, 2005).

Abiotic drivers of diversity

Our results thus far suggest that community diversity is a major determinant of the EC-DBD continuum, and by extension that biotic interactions may override abiotic factors in determining where a community lies on the continuum. To formally test for the additional role abiotic drivers might play in generating the observed EC-DBD continuum, we analysed two data sets in more detail.

First, we analysed a subset of 192 EMP samples with measurements of four key abiotic factors shown to affect microbial diversity (pH, temperature, latitude, and elevation; Delgado-Baquerizo et al., 2018; Lauber et al., 2009; Power et al., 2018; Schluter and Pennell, 2017). We fitted a GLMM with focal lineage-specific diversity as the dependent variable, and with the number of non-focal lineages, the four abiotic factors and their interactions as predictors (fixed effects). As in the full EMP dataset (Table 1), focal lineage diversity was positively associated with community diversity at all taxonomic ratios in the EMP subset (Table 4). As expected, certain abiotic factors, alone or in combination with diversity, had significant effects on focal lineage diversity (Table 4). However, the effects of abiotic factors were always weaker than the effect of community diversity (Table 4; Supplementary file 1 Section 4).

Table 4. Community diversity has a stronger effect than abiotic factors on focal lineage diversity (EMP dataset).

Results are shown from GLMMs with community diversity (Div), four abiotic factors (temperature, elevation, pH, and latitude), and their interactions with community diversity, as predictors of focal lineage diversity. Random effects on the intercept included environment, lineage, lab ID and sample ID. Each row reports the taxonomic ratio, the predictors used in the GLMM (fixed effects only), their slope estimate (Est), standard error (SE) and P-value (P) (Wald test). Interactions are denoted as ‘*’. Random effects are not shown.

Predictor Est SE P
ASV:Genus Div 0.128 0.013 <2e-16
Temperature 0.04 0.014 0.00479
Div*Temperature 0.043 0.014 0.00175
Div*Latitude 0.031 0.013 0.02119
Div*Elevation −0.031 0.014 0.02829
Genus:Family Div 0.094 0.009 <2e-16
Temperature 0.026 0.009 0.00268
pH −0.042 0.009 5.88e-06
Family:Order Div 0.131 0.01 <2e-16
Order:Class Div 0.184 0.01 <2e-16
Div*Temperature 0.032 0.009 0.000827
Div*Latitude 0.023 0.008 0.005403
Class:Phylum Div 0.236 0.011 <2e-16
Div*Temperature 0.059 0.014 2.15e-05
Div*Latitude 0.03 0.011 0.00884

Second, we used a global 16S sequencing dataset of 237 soil samples associated with more detailed environmental metadata (Delgado-Baquerizo et al., 2018) which we reprocessed to yield ASVs comparable to those in the EMP (Materials and methods). This dataset revealed weaker evidence for DBD and stronger effects of abiotic variables on diversity. Community diversity generally had significant positive effects on focal-lineage diversity, but the effect was weak and not detectable at all taxonomic ratios (Table 5). Known abiotic drivers of soil bacterial diversity such as pH (Lauber et al., 2009) and latitude (Delgado-Baquerizo et al., 2018) had effects of similar or stronger magnitude compared to the effect of community diversity (Table 5, Supplementary file 4). The relatively weak effect of DBD and strong effect of abiotic drivers on diversity in this soil dataset can be explained by the fact that soils generally are highly diverse and have relatively low-diversity slopes (Figure 3).

Table 5. GLMMs applied to a soil dataset.

Each row reports the taxonomic ratio, the predictors used in the GLMM (fixed effects only), their estimate (Est), standard error (SE) and P-value (P) (Wald test). Left columns: GLMM with community diversity (Div) and all abiotic variables considered separately, as predictors of focal lineage diversity. Right columns: GLMM with community diversity (Div) and the three first principle components (PCs) representing abiotic variables, as predictors of focal lineage diversity. n.s., non-significant (LRT test). All models provide a significantly better fit than null models without fixed effects (∆AIC >10 and p<0.05; Supplementary file 2), except for the GLMM with abiotic factors at the Family:Order level, where latitude has a significant effect on focal lineage diversity but its effect is nearly null, with a ∆AIC between full and null model of 4 and a null marginal R2.

GLMMs with abiotic variables GLMMs with the 3 first PCs
Predictor Est SE P Predictor Est SE P
ASV:Genus Div n.s. Div 0.064 0.016 9.47e-05
Latitude 0.294 0.025 <2e-16 PC1 −0.065 0.007 <2e-16
UV_light −0.177 0.016 <2e-16 PC2 −0.03 0.006 1.98e-05
MDR 0.028 0.006 7.12e-06
NPP2003_2015 −0.066 0.005 <2e-16
Latitude^2 −0.3 0.029 <2e-16
Clay_silt^2 −0.012 0.004 0.003
Soil_N^2 −0.007 0.001 1.66e-06
Soil_C_N_ratio^2 0.003 0.001 0.004
PSEA^2 0.01 0.002 4.84e-06
MDR^2 0.017 0.003 2.40e-08
NPP2003_2015^2 −0.016 0.004 0.0001
Genus:Family Div 0.032 0.01 0.0011 Div 0.033 0.01 0.001
Latitude −0.035 0.006 2.04e-09 PC1 −0.016 0.006 0.02
PC2 0.02 0.006 0.00089
Family:Order Div n.s. Div n.s.
Latitude −0.0005 0.0002 0.0105 PC1 −0.026 0.007 0.00032
Div*PC1 0.04 0.006 2.14e-12
Div*PC3 0.023 0.005 1.68e-06
Order:Class Null model with no predictor was significant
Class:Phylum Div 0.032 0.01 0.00174 Div 0.032 0.01 0.003
pH 0.074 0.01 4.37e-13 PC1 −0.051 0.01 3.54e-07
PC2 −0.028 0.01 0.006

We note that it remains possible that unmeasured abiotic effects could explain some of the DBD effects observed in the EMP. Although only a small subset of abiotic factors was considered, the generally positive diversity slopes in the EMP are not likely to be driven by these factors in the abiotic environment (Table 4). Specifically, we consider it unlikely that unmeasured abiotic factors would always act similarly, and in the same direction across multiple different environments, to drive DBD. However, as demonstrated in soil (Table 5), abiotic factors may become increasingly important in highly diverse biomes with weak DBD.

DBD is more pronounced in resident taxa than in migrants or generalists

A recent meta-analysis of 16S sequence data from a variety of biomes suggests there is an important distinction between generalist lineages found in many environments, compared to specialists with a more restricted distribution (Sriswasdi et al., 2017). Generalists were inferred to have higher speciation rates, suggesting that the DBD-EC balance might differ between generalists and specialists (Sriswasdi et al., 2017). To further investigate this difference, we defined ‘resident’, taxa with a strong preference for a specific biome, in addition to generalists without a strong biome preference in the EMP dataset. We first clustered environmental samples by their genus-level community composition using fuzzy k-means clustering (Figure 4a), which identified three major clusters: ‘animal-associated’, ‘saline’, and ‘non-saline’. The clustering included some outliers (e.g. plant corpus grouping with animals), but was generally consistent with known distinctions between host-associated vs. free-living (Thompson et al., 2017), and saline vs. non-saline communities (Auguet et al., 2010; Lozupone and Knight, 2007). Resident genera were defined as those with a strong preference for a particular environment cluster (whether due to dispersal limitation or narrow niche breadth) using indicator species analysis (permutation test, p<0.05; Figure 4a; Figure 4—figure supplement 1; Supplementary file 5), and genera without a strong preference were considered generalists. When residents of one environmental cluster were (relatively infrequently) observed in a different cluster, we defined them as ‘migrants’ in that sample. For each environment cluster, we ran a GLMM with resident genus-level diversity (the number of non-focal genera) as a predictor of focal-lineage diversity (the ASV:Genus ratio) for residents, generalists, or migrants to that sample (Supplementary file 1 Section 5).

Figure 4. The DBD relationship varies between resident and non-resident genera.

(A) Ordination showing genera clustering into their preferred environment clusters. The matrix of 17 environments (rows) by 1128 genera (columns) by, with the matrix entries indicating the percentage of samples from a given environment in which each genus is present, was subjected to principal components analysis (PCA). Circles indicate genera and triangles indicate environments (EMPO 3 biomes). coloured circles are genera inferred by indicator species analysis to be residents of a certain environmental cluster, and grey circles are generalist genera. The three environment clusters identified by fuzzy k-means clustering are: Non-saline (NS, blue), saline (S, green) and animal-associated (purple). Triangles of the same colour indicate EMPO 3 biomes clustered into the same environmental cluster. (B) DBD in resident versus non-resident genera across environment clusters. Results of GLMMs modelling focal lineage diversity as a function of the interaction between community diversity and resident/migrant/generalist status. The x-axis shows the standardized number of non-focal resident genera (community diversity); the y-axis shows the number of ASVs per focal genus. Resident focal genera are shown in orange, migrant focal genera in red, and generalist focal genera in black. Red stars indicate a significantly positive or negative slope (Wald test, p<0.005). See Supplementary file 2 for model goodness of fit.

Figure 4.

Figure 4—figure supplement 1. Resident genera of environment clusters.

Figure 4—figure supplement 1.

Results from indicator species analysis illustrated as a heatmap. Only the 25 resident genera with the highest indval indices and p<0.05 (permutation test) are shown for every environment cluster (animal-associated, non-saline and saline free). For the full results see Supplementary file 5.

Resident community diversity had no significant effect on the diversity of generalists in animal-associated, saline and non-saline clusters (GLMM, Wald test, p>0.05), but was positively correlated with lineage-specific resident diversity (GLMM, Wald test, z = 7.1, p=1.25e-12; z = 3.316, p=0.0009; z = 7.109, p=1.17e-12, respectively). Resident community diversity significantly decreased migrant diversity in saline (GLMM, z = −3.194, p=0.0014) and non-saline environment clusters (GLMM, z = −2.840, p=0.0045), but had no significant effect in the animal-associated cluster (GLMM, p>0.05) (Figure 4b). These results suggest that, although generalist lineages may have higher speciation rates and colonize more habitats than specialists (Sriswasdi et al., 2017), they have lower diversity slopes. Migrants to the ‘wrong’ environment experience even less DBD, and are even subject to EC in two out of three environment types (Figure 4b). The accumulation of diversity via successful establishment of migrants may thus be limited, presumably because most niches are already occupied by residents.

Discussion

Using ~10 million individual marker sequences from the EMP, we demonstrate an overall trend for diversity in focal lineages to be positively associated with overall community diversity, albeit with significant variation across lineages and environments. The strength of the DBD relationship dissipates with increasing microbiome diversity, which we hypothesize is caused by niche saturation. In more diverse biomes such as soil, abiotic factors therefore may become relatively more important in driving focal-lineage diversity. The effect of DBD is strongest among habitat specialists (residents), suggesting that long-term niche adaptation tends to select against the establishment of migrant diversity.

While most of the DBD literature considers a model of evolutionary diversification (Schluter and Pennell, 2017; Whittaker, 1972), our results pertain mainly to ecological community assembly dynamics. At the limited resolution of 16S rRNA gene sequences, we do not expect measurable diversification within an individual microbiome sample (Kuo and Ochman, 2009b); however, community diversity could still select for (as in DBD) or against (as in EC) increasing diversity in a focal lineage, even if this lineage diversified before the sampled community assembled. Future work with higher resolution genomic or metagenomic data will enable testing if and how DBD arises in microbial communities via evolutionary diversification, and also how prokaryote diversification is affected by other community members including phages (Brockhurst et al., 2005), protists (Meyer and Kassen, 2007), and fungi (Kastman et al., 2016). Predator-prey, cross-feeding, and other biotic interactions with these non-prokaryotic community members could explain some of the unaccounted variation we observed in diversity slopes across environments.

Our dataset also provides an opportunity to explore how DBD relates to genome size evolution. Bacteria with larger repertoires of accessory genes, and thus larger genomes, are able to occupy a wider range of niches (Barberán et al., 2014). Taxa with larger genomes might therefore be hypothesized to better survive and thrive when they disperse into a new location, exhibiting stronger DBD. Although a comprehensive test of this hypothesis will require higher resolution genomic or metagenomic data, as a preliminary exploration we assigned genome sizes to 576 focal genera for which at least one whole genome sequence was available (using the largest recorded genome size for each genus) and added an interaction term between genome size and diversity as a fixed effect in the GLMM (Materials and methods). Consistent with our expectation, we observed a significant positive effect of genome size on the diversity slope (GLMM, Wald test, z = 2.5, p=0.01; Figure 5, Supplementary file 1 Section 6). This effect was not observed in null models, in which the interaction between community diversity and focal genus genome size was never significant (Supplementary file 3 Section 1.3 and 2.2) and so this effect of genome size cannot be trivially explained by data structure. The positive relationship between genome size and DBD is likely even stronger than estimated, because assigning genome sizes to entire genera is imprecise (i.e. there is variation in genome size within a genus, or even within species), therefore weakening the correlation.

Figure 5. Positive effect of genome size on DBD.

Figure 5.

Results are shown from a GLMM predicting focal lineage diversity as a function of the interaction between community diversity and genome size at the ASV:Genus ratio (Supplementary file 1 Section 6). The x-axis shows the standardized number of non-focal genera (community diversity); the y-axis shows the number of ASVs per focal genus. Variable diversity slopes corresponding to different genome sizes are shown in a blue colour gradient; the shaded area depicts 95% confidence limits of the fitted values. See Supplementary file 2 for model goodness of fit.

The positive correlation between genome size and DBD observed here could be driven by larger metabolic repertoires encoded by larger genomes (Barberán et al., 2014), potentially creating more opportunities to benefit from cross-feeding, niche construction (San Roman and Wagner, 2018), and other interspecies interactions. This tendency appears to be at odds with the Black Queen hypothesis, which predicts that social conflict between interacting species leads to the inactivation and loss of genes involved in shareable metabolites (public goods), eventually resulting in reduced genome size (Morris et al., 2012). Such a process would produce a negative correlation between the degree of species interactions (i.e. community diversity) and genome size (Morris et al., 2012). The interaction between genome size, biotic interactions and diversification thus deserves further study.

Alongside theory and experimental data, the EMP survey data provide a window into the biotic drivers of microbial diversity in nature. In particular, our correlational results support previous experiments and theory showing that DBD is strong when community diversity is low (Calcagno et al., 2017; Jousset et al., 2016), driving the accumulation of diversity in a positive feedback loop until niches are filled and EC starts to predominate (Bailey et al., 2013; Brockhurst et al., 2007; Gómez and Buckling, 2013; Meyer and Kassen, 2007). However, due to the correlational nature of the EMP data, it is not possible to test whether DBD is primarily due to the creation of novel niches via biotic interactions and niche construction (Laland et al., 1999), or due to increased competition leading to specialization on underexploited resources (Hibbing et al., 2010; Jousset et al., 2016). We hope future higher resolution genomic studies, and complementary experiments, will be able to elucidate the types of biotic interactions that promote microbiome diversity. Regardless of the underlying mechanisms, our results demonstrate a general scaling between different levels of community diversity, which has important implications for modelling and predicting community function and stability in response to perturbations (Coyte et al., 2015; Pennekamp et al., 2018). The answer to the question ‘why are microbiomes so diverse?’ might in a large part be because microbiomes are so diverse (Emerson and Kolm, 2005).

Data and materials availability

All data is available from the Earth Microbiome Project (ftp://ftp.microbio.me/), as detailed in the Methods. All computer code used for analysis are available at (https://github.com/Naima16/dbd.gitMadi, 2020; copy archived at swh:1:rev:ecb4f844264b72eaa8cbd708244ecd32d414c7dd).

Materials and methods

Earth Microbiome Project dataset

We used the EMP ‘2000 subset’ of 16S rRNA gene sequences, rarefied to 5000 sequences per sample. This subset contains 155,002 ASVs from 2000 samples with an even distribution across 17 natural environments (EMP Ontology level 3). Data were downloaded from the EMP FTP server (ftp://ftp.microbio.me/), on February 9, 2018.

Specifically, 16S rRNA-V4 region reads (90 bp, GreenGenes 13.8 taxonomy) along with environmental data and EMPO3 designations (http://press.igsb.anl.gov/earthmicrobiome/protocols-and-standards/empo/) were used. Sequence summaries were downloaded from: ftp://ftp.microbio.me/emp/release1/otu_distributions/otu_summary.emp_deblur_90bp.subset_2k.rare_5000.tsv, environmental data from: ftp://ftp.microbio.me/emp/release1/mapping_files/emp_qiime_mapping_release1.tsv, and EMPO3 designations from: ftp://ftp.microbio.me/emp/release1/mapping_files/emp_qiime_mapping_subset_2k.tsv.

The list of the associated 97 studies and 61 corresponding principal investigator identities were downloaded from https://www.nature.com/articles/nature24621#s1.

Based on the ASV annotations across samples, we estimated the taxonomic ratio for each focal lineage (ASV:Genus, Genus:Family, Family:Order, Order:Class and Class:Phylum), along with the number of non-focal lineages (dbd_analys_input.py, glmm_analys_input.py, Python Version 2.7). Unclassified ASVs were removed from the analyses.

Generalized linear mixed model (GLMM) analyses

We used GLMMs to determine how focal lineage diversity (e.g. its ASV:Genus ratio) is affected by community diversity (e.g. non-focal genera). The effects of environment (as defined by the EMP Ontology ‘level 3 biomes’) and the focal lineage identity were included as random effects on the slope and intercept. We also controlled for the submitting laboratory (identified by the principal investigator) and the EMP unique sample identifier (i.e. if two taxa were part of the same sample).

All models were fitted in Rstudio (Version 1.1.442, R Version 3.5.2) using the glmer function of the lme4 package (Bates et al., 2015). Data standardization (transformation to a mean of zero and a standard deviation of one) was applied to all predictors to get comparable estimates. In models with only one predictor, applying standardization resolved convergence warnings and considerably sped up the optimization. We first tested the significance of random effects, by using likelihood-ratio tests (LRTs, implemented in the anova function in the R stats package) on nested models where each random effect was dropped one at a time. We then assessed the significance of fixed effects using the drop1 function from the stats package with the likelihood-ratio test option (this function drops individual terms from the full model and compares models based on the AIC). We calculated the Akaike information criterion (AIC) of each significant model and a null model including all random effects but no fixed effects other than the intercept. We then report the difference in AIC between the full and null models (∆AIC), along with a likelihood-ratio test p-value to assess the significance of the full model relative to the null. Only significant models (p<0.05) are reported.

As an additional test of the goodness of fit for the significant models, we estimated the coefficient of determination (R2) using the r.squaredGLMM function from the MuMIn R package. This function implements a method developed by Nakagawa and Schielzeth and its extension for random slopes (Johnson, 2014; Nakagawa and Schielzeth, 2013). Two values were estimated: the marginal R2, as a measure of the variance explained only by fixed effects, and the conditional R2 as a measure of the variance explained by the entire model (both fixed effects and random effects). Only results from R2 estimation based on lognormal and trigamma methods were reported because they are specific to the logarithmic link function used in all GLMMs.

Diagnostic plots (plot and qqnorm R functions in base and stats packages) were checked for each model to ensure that residual homoscedasticity (homogeneity of variance) was fulfilled: no increase of the variance with fitted values and residuals were symmetrically distributed tending to cluster around the 0 of the ordinate, but with an expected pattern due to count data. Normality plots were imperfect, but they generally showed that the residuals were close to being normally distributed. The assumption of normality is often difficult to fulfill with high numbers of observations, as is the case in our models (https://www.statisticshowto.datasciencecentral.com/shapiro-wilk-test/), and non-normality is less of concern than heteroscedastic for the validity of GLMMs (https://bbolker.github.io/mixedmodels-misc/ecostats_chap.html#diagnostics).

We tested for overdispersion using the overdisp_fun R function available at https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html, and found that all the models were not overdispersed, but rather were underdispersed: the ratio of the sum of squared Pearson residuals to residual degrees of freedom was <1 and non-significant when tested with a chi-squared test. The only exception was Shannon diversity-based GLMMs. In case of underdispersion and given that underdispersion leads to more conservative results, we retained the GLMMs with Poisson error distribution, despite the underdispersion. (GLMM FAQ; Ben Bolker and others; 25 September 2018; https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#underdispersion). For Shannon diversity-based GLMMs, we accounted for overdispersion by adding an observation-level random effect to the GLMMs (Elston et al., 2001).

Taxonomy-based GLMMs

To test how focal lineage diversity (e.g. its ASV:Genus ratio) is affected by community diversity (e.g. non-focal genera richness), for different environment types and lineages across all taxonomic ratios, we used generalized linear mixed models (GLMMs) fitted on the EMP dataset. As the dependent variable (focal lineage diversity, defined as taxonomic ratios, ASV:Genus, Genus:Family, Family:Order, Order:Class, and Class:Phylum) was a count response, we used a Poisson error distribution with a log link function. Community diversity (number of non-focal lineages: non-focal Genera, Families, Orders, Classes, and Phyla), standardized to a mean of zero and a standard deviation of one, was specified as the predictor (fixed effect). We included the following random effects on the slope and intercept: lineage (Lin), environment (Env), environment nested within lineage (a lineage may be present in different environments) and lab (the principal investigator who conducted the EMP study) nested within environment (different labs sampled and sequenced a given environment) (as suggested in http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html). Defining random effects on the slope enabled us to test slope variation across groups of each categorical variable (e.g. slope variation between different environments or different lineages). We included the EMP unique sample ID as a random effect to control for dependencies between observations (if two taxa were part of the same sample) (Table 1, Supplementary file 1 section 1).

Shannon diversity-based GLMMs

We also tested whether ASV diversity in a focal taxon is dependent on the diversity of all other ASVs in that sample (rather than the diversity at only the focal taxonomic level, as in the taxonomy-based GLMMs above). We used the Shannon diversity index, which is robust to differences in sampling effort, and generally reaches a plateau at 5000 sequences or fewer. To do so, we fitted a GLMM with the number of ASVs per focal taxon as the response variable, and the Shannon diversity based on ASVs across all non-focal taxa (z-standardized) as the predictor (fixed effect) The random effects were kept as in the taxonomy-based GLMMs, but we added an observation-level random effect to account for overdispersion (Table 3, Supplementary file 1 section 2). To avoid dependence between the response and predictor variables, we used the rarefied ASV dataset (5,000 ASVs/sample as above) as the response variable, and the Shannon diversity calculated on unrarefied data from the same samples as the predictor.

Null models

We considered three null models, all of which randomize the associations between ASVs within a sample, thus breaking any true biotic interactions. These null models were randomly generated matrices of the same size as the real EMP dataset, but based on a distribution that arises from the Neutral Theory of Biodiversity. Neutral Theory postulates that the biodiversity of a metacommunity is governed by independent random population dynamics across species. The aggregate behaviour is quantified by the fundamental biodiversity number θ, such that θ=2JMυ, where JM is the size of the metacommunity and ν is the speciation rate. Parametrized by θ, the metacommunity zero-sum multinomial distribution (mZSM) was developed to obtain random samples of size J (Alonso and McKane, 2004). We used this mZSM distribution (implemented with the sads package in R; http://search.r-project.org/library/sads/html/dmzsm.html) to generate the counts of the ASVs for each dataset in models 1 and 2. Model 1 assumes that the whole dataset follows the same species abundance distribution (SAD), characterized by a mZSM with θ = 50. Model 2 assumes that each environment has its own SAD and thus all the samples of a single environment are assigned the same θ but are distinct across environments (θ was chosen uniformly between 1 and 100). The number of samples per environment were the same as the EMP dataset. To obtain similar mean counts as the real dataset, we set J = 1000 for both models 1 and 2, in order to vary θ from 1 to 100. These values are reasonable based on previous studies that estimated these parameters from microbiome data (Li and Ma, 2016). We included a down-sampling step to replicate the zero-inflated nature of the real dataset (on average there were only 96 ASVs per sample while there was a total of 22,014 ASVs in the entire EMP dataset). To replicate the sampling effect due to rarefaction, we first created a vector of all individuals from a single sample. We then selected 5000 individuals at random whose identities determined which ASVs were found in that sample. These neutrally-derived random matrices, null models 1 and 2, were plotted using the same plots (ASV:Genus vs number of genera) as the real EMP dataset and were then analysed using GLMMs with community diversity as a predictor of focal lineage diversity (fixed effect), with lineage identity and EMP sample ID as random effects. For Model 1, the slope was significantly negative (GLMM, Wald test, z=-9.807, P<2e-16). For Model 2, the null GLMM (including the intercept only) was significant, meaning that the community diversity has no significant effect on focal lineages diversity (Likelihood-ratio test between the model with the predictor and the intercept-only model, P=0.9399).

To generate a null model for a metacommunity assembled by niche processes, null model 3 was made by sampling from a single Poisson distribution (λ = 0.01) for each element of the data matrix. We used the Poisson distribution as a sensitivity analysis compared to the ZSM, and found the two behave quite similarly (i.e. Model 1 and 3 produce qualitatively similar results). The probability of size zero was sufficiently large that the down-sampling step was not needed for this model. Next, DBD and EC effects were added to null model 3 according to the following procedure. An element was chosen at random in a sample and tested if it is empty or full (i.e. check the presence/absence of a particular ASV). If the element is full then the DBD algorithm fills an empty element chosen at random in the same sample, while the EC algorithm empties a filled element in the same sample. This is to mimic the effect of DBD creating a niche for a new ASV, or EC removing a niche based on the existing diversity. The strength of DBD or EC effects were determined by the percent of elements tested. These data were analysed with GLMMs to test the power of our models to detect DBD or EC (Table 2, Supplementary file 3 Section 2.1).

Rarefaction simulation

We constructed a simple simulation in which each microbiome sample may differ in total diversity (e.g. in the observed range of genera) while maintaining a constant taxonomic ratio (e.g. ASV:genus ratio = 2). To mimic rarefaction, we then sampled a set number of sequencing reads from each synthetic community, assuming ASVs are sampled with equal probability and plotted the observed taxonomic ratio (Figure 2—figure supplement 19). This simple simulation is implemented in permute_ASVs_synthetic.pl.

Nucleotide sequence-based analysis

We clustered ASVs at decreasing levels of nucleotide identity, from 100% identical ASVs down to 75% identity (roughly equivalent to phyla [Konstantinidis and Tiedje, 2005]). We estimated focal cluster diversity as the mean number of descendants per cluster (e.g. number of 100% clusters per 97% cluster) and plotted this against the total number of clusters (97% identity in this example). This approach has the advantage of including sequences even if they come from unnamed taxa. For each of the six nucleotide divergence ratios tested, the relationship between total number of clusters and focal cluster diversity was positive (Figure 2—figure supplement 20), consistent with DBD and suggesting that the taxonomic analyses were qualitatively unbiased.

Fasta files with all ASVs per sample were produced by a python script (Construct_fasta_per_sample.py, Python Version 2.7) from the sequences summary file (otu_summary.emp_deblur_90 bp.subset_2 k.rare_5000 from EMP ftp server). We clustered sequences from each sample using USEARCH V9.2 and estimated sample diversity as the total number of clusters at a given level (e.g. 97% identity) and focal cluster diversity as the mean number of descendent clusters (e.g. number of 100% clusters per 97% cluster). To describe the putative DBD or EC relationships, we tested three models: linear, quadratic and cubic (lm function in R). Model comparisons were based on the adjusted R2 (Figure 2—figure supplement 20).

We note that diversity at level i (di) and at level i+1 (di+1/di) are not independent in this analysis because di+1 must be greater than or equal to di. To assess the effects of this non-independence on the results, we conducted permutation tests by randomizing the associations between di and di+1. Using 999 permutations, P-values were calculated based on how many times we observed a correlation greater than that seen in the real data (cor.test R function with kendall method). In each permutation, we recalculated the significance test (Wald z) for the correlation in the randomized data, and then computed the P-value based on how many times we observed a z value greater than that of the original data. At all six levels of nucleotide identity, the real data always showed a significantly stronger positive correlation when compared to permuted data (p=0.001), indicating that the DBD patterns was not an artefact of the dependence structure in the data.

The effect of community diversity on focal cluster diversity was also tested across different environments analysed separately. We modelled this relationship with linear, quadratic and cubic fits, and compared those models based on the adjusted R2 (Figure 2—figure supplements 2126).

DBD variation across environments

We tested the variation of focal lineage diversity slopes across different environments by including EMPO 3 biome type as a fixed effect. We fitted a GLMM with the interaction between community diversity and environment type as a predictor of focal lineage diversity. All other random effects on intercept and slope were kept as in the previous GLMMs (Figure 3, Supplementary file 1 Section 3). DBD variation across environments was tested for Family:Order, Order:Class and Class:Phylum taxonomic ratios, as diversity slope variation by environment was statistically significant (likelihood-ratio test, p<0.05) for these ratios in the taxonomy-based models (Table 1).

Abiotic effects

To test for the relative effect of biotic and abiotic environmental variables on focal lineage diversity across different taxonomic ratios, we used a separate GLMM, with Poisson error distribution and a log link function, for every ratio. We fitted the GLMM on a subset (~10%) of the whole dataset, 192 samples (from water: saline (19) and non-saline (44), surface: saline (42) and non-saline (19), sediment: saline (22) and non-saline (31), soil (8) and plant rhizosphere (7)), for which measurements of four key abiotic variables (temperature, pH, latitude and elevation) were available. As predictors of focal lineage diversity (fixed effects), we included non-focal community diversity and abiotic variables, as well as their interactions. All predictors were standardized to a mean of zero and a standard deviation of one to obtain comparable estimates. The GLMM had the same random effects as in the previous analysis, but only on the intercept for simplicity (Table 4, Supplementary file 1 section 4).

Soil dataset analysis

We used the Delgado-Baquerizo et al., 2018 soil microbiome survey (237 samples from 18 countries) to further test the relative impacts of biotic versus abiotic drivers of diversity. Raw data and abiotic measurements were downloaded from Figshare (https://figshare.com/s/82a2d3f5d38ace925492; DOI: 10.6084/m9.figshare.5611321). 16S bioinformatic processing was performed using QIIME2 and Deblur with the same protocol as in Thompson et al., 2017. Raw data 16S rRNA gene (V3-V4 region), were processed by trimming the primers (341F/805R primer set) with qiime cutadapt trim-paired, then merged using qiime vsearch join-pairs. Sequences were quality filtered and denoised using Deblur with a trimming length of 400 bp. The resulting 400 bp Deblur BIOM table was filtered to keep only ASVs with at least 25 reads total over all samples and rarefied to a depth of 5000. Taxonomy was assigned with a Naive Bayes classifier trained on the V4-V3 region of 99% OTU Greengenes 13.8 sequences with qiime feature-classifier. We obtained a final dataset of 186 samples and 24,252 ASVs which was used as input for all statistical analysis as in the EMP dataset analysis. This data set included 14 environmental factors: aridity index (Aridity_Index), minimum and maximum temperature (MINT and MAXT), precipitation seasonality (PSEA), mean diurnal temperature range (MDR), ultra-violet (UV) radiation (UV_Light), net primary productivity (NPP2003_2015), soil texture (Clay_silt), pH; total C (Soil_C), N (Soil_N) and P (Soil_P) concentrations, C:N ratio (Soil_C_N_ratio) and Latitude.

We used a separate GLMM with Poisson error distribution and a log link function to test for the effect of biotic (non-focal community diversity) and abiotic environmental variables on focal lineage diversity (e.g. the ASV:Genus ratio for a focal genus), across different taxonomic ratios. We defined non-focal taxa diversity and abiotic variables as predictors (fixed effects) and the lineage identity as a random effect.

We also fitted the same model but with the first three principal components (PCs) from the principal component analysis (PCA, rda function, vegan R package) of the abiotic variables (a matrix of 237 samples (rows) by 14 abiotic variables (columns)), as well as the interactions between diversity and each PC, and the interaction between PCs as predictors (fixed effects).

Because of possible non-linear relationships between abiotic variables and diversity, GLMMs were fitted with a linear and a quadratic term for every abiotic variable. The quadratic terms were not significant, except for the ASV:genus ratio (Table 5; likelihood-ratio test, p<2.2e-16). The interaction terms were not significant except the interaction between diversity and PCs at Family:Order ratio (likelihood-ratio test, p=2.182e-05; Table 5, Supplementary file 4).

Defining residents, generalists, and migrants

We defined a genus-level community composition matrix as a matrix of 17 environments (rows) by 1128 genera (columns), with the matrix entries indicating the percentage of samples from a given environment in which each genus is present. We clustered the environmental samples based on their genus-level community composition using fuzzy k-means clustering. The clustering (cmeans function, package e1071 in R) was done on the ‘hellinger’ transformed data (decostand function, vegan R package). To identify resident genera to each cluster, we used indicator species analysis (Dufrene and Legendre, 1997) as implemented in the indval function (labdsv R package). We defined residents as genera with indval indices between 0.4 and 0.9, with permutation test p<0.05. Genera not associated with any cluster were considered generalists. We used principal component analysis (PCA) on the community composition matrix to visualize the clustering and the indicator genera (rda function, vegan R package) (Figure 4). We then ran a separate GLMM for each environmental cluster, with resident genus-level diversity (number of non-focal genera) as a predictor of focal genus diversity (ASV:Genus ratio) for resident, migrant (residents of one cluster found in a different cluster) and generalist genera. The fixed effect was specified as the interaction between diversity and a factor defining the genus-cluster association (with three levels: resident, migrant and generalist). Random effects on intercept and slope were kept as in the GLMMs described above.

Genome size analysis

We chose a subset of genera represented by one or more sequenced genomes in the NCBI microbial genomes database (https://www.ncbi.nlm.nih.gov/genome/browse#!/prokaryotes/). For these genera, a representative genome size was assigned by selecting the genome with the lowest number of scaffolds (if no closed genomes were available) (Supplementary file 6). If multiple genomes were available with the same level of completion, the largest genome size was used, as smaller genomes could be artefacts of incomplete assembly which would bias the mean and median downward. Moreover, given the deletional bias in bacterial genomes (Kuo and Ochman, 2009a), the largest genome is likely more reflective of the ancestral genome size of the genus. Only genera with two or more ASVs in at least one sample were included in the analysis. Intracellular symbionts were excluded. We fitted a GLMM on the subset of data with known genome size (576 genera, ranging from ~1 to 15 Mbp) with the interaction between community diversity and genome size as a predictor of focal lineage diversity at the ASV:Genus level. All the other random effects on intercept and slope were kept as in the previous GLMMs (Supplementary file 1 section 6).

Acknowledgements

We thank Luke Thompson for assistance obtaining EMP data and Zofia Ecaterina Taranu, Vincent Fugère and Guillaume Larocque for advice on GLMMs. We are also grateful to Steven Kembel, Tom Battin, the reviewers Eric Kemen and Benjamin E Wolfe, and the editor Detlef Weigel for critical comments that improved the manuscript. Funding: This project was made possible by an NSERC Discovery Grant and Canada Research Chair to BJS.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

B Jesse Shapiro, Email: jesse.shapiro@mcgill.ca.

Detlef Weigel, Max Planck Institute for Developmental Biology, Germany.

Detlef Weigel, Max Planck Institute for Developmental Biology, Germany.

Funding Information

This paper was supported by the following grants:

  • Natural Sciences and Engineering Research Council of Canada to B Jesse Shapiro.

  • Canada Research Chairs to B Jesse Shapiro.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Data curation, Software, Formal analysis, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing.

Conceptualization, Data curation, Supervision, Investigation, Writing - original draft, Writing - review and editing.

Software, Formal analysis, Validation, Investigation, Methodology, Writing - review and editing.

Conceptualization, Methodology, Writing - review and editing.

Conceptualization, Formal analysis, Supervision, Funding acquisition, Investigation, Methodology, Writing - original draft, Project administration, Writing - review and editing.

Additional files

Supplementary file 1. Full GLMM outputs for the EMP data.
elife-58999-supp1.pdf (169.8KB, pdf)
Supplementary file 2. Goodness of fit for the GLMMs.
elife-58999-supp2.docx (19.4KB, docx)
Supplementary file 3. Full GLMM output for simulated data under Neutral Theory models.
elife-58999-supp3.pdf (176.6KB, pdf)
Supplementary file 4. Full GLMM output for soil data (Delgado-Baquerizo et al., 2018).
elife-58999-supp4.pdf (155.8KB, pdf)
Supplementary file 5. Indicator species analysis.

The table shows the assignment of each genus to one of three environment types.

elife-58999-supp5.xlsx (26.7KB, xlsx)
Supplementary file 6. Genome size assignment.

The table shows genome sizes assigned to each genus.

elife-58999-supp6.xlsx (29.7KB, xlsx)
Transparent reporting form

Data availability

All data is available from the Earth Microbiome Project (ftp.microbio.me), as detailed in the Methods. All computer code used for analysis are available at https://github.com/Naima16/dbd.git (copy archived at https://archive.softwareheritage.org/swh:1:rev:ecb4f844264b72eaa8cbd708244ecd32d414c7dd/).

The following previously published datasets were used:

Thompson LR, Sanders JG, McDonald D, Amir A, Ladau J, Locey K, Prill RJ, Tripathi A, Gibbons SM, Ackermann G, Navas-Molina JA, Janssen S, Kopylova E, Vázquez-Baeza Y, González A, Morton JT, Mirarab S, Xu ZZ, Jiang L, Haroon MF, Kanbar J, Zhu Q, Song SJ, Kosciolek T, Bokulich NA, Lefler J, Brislawn CJ, Humphrey G, Owens SM, Hampton-Marcell J, Berg-Lyons D, McKenzie V, Fierer N, Fuhrman JA, Clauset A, Stevens RL, Shade A, Pollard RS, Goodwin KD, Jansson JK, Gilbert JA, Knight R. 2017. Earth Microbiome Project. microbio.me. otu_summary.emp_deblur_90bp.subset_2k.rare_5000.tsv

Delgado-Baquerizo M, Oliverio AM, Brewer TE, Benavent-González A, Eldridge DJ, Bardgett RD, Maestre FT, Singh BK, Fierer N. 2018. Global Soil Dataset. figshare.

References

  1. Alonso D, McKane AJ. Sampling Hubbell’s neutral theory of biodiversity: Sampling neutral theory. Ecology Letters. 2004;7:901–910. doi: 10.1111/j.1461-0248.2004.00640.x. [DOI] [Google Scholar]
  2. Auguet JC, Barberan A, Casamayor EO. Global ecological patterns in uncultured archaea. The ISME Journal. 2010;4:182–190. doi: 10.1038/ismej.2009.109. [DOI] [PubMed] [Google Scholar]
  3. Azaele S, Suweis S, Grilli J, Volkov I, Banavar JR, Maritan A. Statistical mechanics of ecological systems: neutral theory and beyond. Reviews of Modern Physics. 2016;88:035003. doi: 10.1103/RevModPhys.88.035003. [DOI] [Google Scholar]
  4. Bailey SF, Dettman JR, Rainey PB, Kassen R. Competition both drives and impedes diversification in a model adaptive radiation. Proceedings of the Royal Society B: Biological Sciences. 2013;280:20131253. doi: 10.1098/rspb.2013.1253. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Barberán A, Ramirez KS, Leff JW, Bradford MA, Wall DH, Fierer N. Why are some microbes more ubiquitous than others? predicting the habitat breadth of soil Bacteria. Ecology Letters. 2014;17:794–802. doi: 10.1111/ele.12282. [DOI] [PubMed] [Google Scholar]
  6. Bates D, Mächler M, Bolker B, Walker S. Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software. 2015;67:1–48. doi: 10.18637/jss.v067.i01. [DOI] [Google Scholar]
  7. Brockhurst MA, Buckling A, Rainey PB. The effect of a bacteriophage on diversification of the opportunistic bacterial pathogen, Pseudomonas aeruginosa. Proceedings of the Royal Society B: Biological Sciences. 2005;272:1385–1391. doi: 10.1098/rspb.2005.3086. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Brockhurst MA, Colegrave N, Hodgson DJ, Buckling A. Niche occupation limits adaptive radiation in experimental microcosms. PLOS ONE. 2007;2:e193. doi: 10.1371/journal.pone.0000193. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Calcagno V, Jarne P, Loreau M, Mouquet N, David P. Diversity spurs diversification in ecological communities. Nature Communications. 2017;8:15810. doi: 10.1038/ncomms15810. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Coyte KZ, Schluter J, Foster KR. The ecology of the microbiome: networks, competition, and stability. Science. 2015;350:663–666. doi: 10.1126/science.aad2602. [DOI] [PubMed] [Google Scholar]
  11. Czárán TL, Hoekstra RF, Pagie L. Chemical warfare between microbes promotes biodiversity. PNAS. 2002;99:786–790. doi: 10.1073/pnas.012399899. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. de Wit R, Bouvier T. 'Everything is everywhere, but, the environment selects'; what did Baas Becking and Beijerinck really say? Environmental Microbiology. 2006;8:755–758. doi: 10.1111/j.1462-2920.2006.01017.x. [DOI] [PubMed] [Google Scholar]
  13. Delgado-Baquerizo M, Oliverio AM, Brewer TE, Benavent-González A, Eldridge DJ, Bardgett RD, Maestre FT, Singh BK, Fierer N. A global atlas of the dominant Bacteria found in soil. Science. 2018;359:320–325. doi: 10.1126/science.aap9516. [DOI] [PubMed] [Google Scholar]
  14. Dufrene M, Legendre P. Species assemblages and Indicator species: the need for a flexible asymmetrical approach. Ecological Monographs. 1997;67:345–366. doi: 10.2307/2963459. [DOI] [Google Scholar]
  15. Elston DA, Moss R, Boulinier T, Arrowsmith C, Lambin X. Analysis of aggregation, a worked example: numbers of ticks on red grouse chicks. Parasitology. 2001;122:563–569. doi: 10.1017/S0031182001007740. [DOI] [PubMed] [Google Scholar]
  16. Elton C. Competition and the structure of ecological communities. The Journal of Animal Ecology. 1946;15:54–68. doi: 10.2307/1625. [DOI] [Google Scholar]
  17. Emerson BC, Kolm N. Species diversity can drive speciation. Nature. 2005;434:1015–1017. doi: 10.1038/nature03450. [DOI] [PubMed] [Google Scholar]
  18. Falkowski PG, Fenchel T, Delong EF. The microbial engines that drive earth's biogeochemical cycles. Science. 2008;320:1034–1039. doi: 10.1126/science.1153213. [DOI] [PubMed] [Google Scholar]
  19. Gause GF. The Struggle for Existence. Baltimore, 1934: Williams & Wilkins; 2003. [Google Scholar]
  20. Gómez P, Buckling A. Real-time microbial adaptive diversification in soil. Ecology Letters. 2013;16:650–655. doi: 10.1111/ele.12093. [DOI] [PubMed] [Google Scholar]
  21. Gotelli NJ, Colwell RK. Quantifying biodiversity: procedures and pitfalls in the measurement and comparison of species richness. Ecology Letters. 2001;4:379–391. doi: 10.1046/j.1461-0248.2001.00230.x. [DOI] [Google Scholar]
  22. Gotelli NJ, McGill BJ. Null versus neutral models: what's the difference? Ecography. 2006;29:793–800. doi: 10.1111/j.2006.0906-7590.04714.x. [DOI] [Google Scholar]
  23. Harris K, Parsons TL, Ijaz UZ, Lahti L, Holmes I, Quince C. Linking statistical and ecological theory: hubbell's Unified Neutral Theory of Biodiversity as a Hierarchical Dirichlet Process. Proceedings of the IEEE. 2017;105:516–529. doi: 10.1109/JPROC.2015.2428213. [DOI] [Google Scholar]
  24. Hibbing ME, Fuqua C, Parsek MR, Peterson SB. Bacterial competition: surviving and thriving in the microbial jungle. Nature Reviews Microbiology. 2010;8:15–25. doi: 10.1038/nrmicro2259. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Hubbell SP. The Unified Neutral Theory of Biodiversity and Biogeography. Princeton University Press; 2001. [DOI] [PubMed] [Google Scholar]
  26. Hug LA, Baker BJ, Anantharaman K, Brown CT, Probst AJ, Castelle CJ, Butterfield CN, Hernsdorf AW, Amano Y, Ise K, Suzuki Y, Dudek N, Relman DA, Finstad KM, Amundson R, Thomas BC, Banfield JF. A new view of the tree of life. Nature Microbiology. 2016;1:16048. doi: 10.1038/nmicrobiol.2016.48. [DOI] [PubMed] [Google Scholar]
  27. Jarvinen O. Species-To-Genus ratios in biogeography: a historical note. Journal of Biogeography. 1982;9:363–370. doi: 10.2307/2844723. [DOI] [Google Scholar]
  28. Johnson PC. Extension of nakagawa & schielzeth's R2GLMM to random slopes models. Methods in Ecology and Evolution. 2014;5:944–946. doi: 10.1111/2041-210X.12225. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Jousset A, Eisenhauer N, Merker M, Mouquet N, Scheu S. High functional diversity stimulates diversification in experimental microbial communities. Science Advances. 2016;2:e1600124. doi: 10.1126/sciadv.1600124. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Kastman EK, Kamelamela N, Norville JW, Cosetta CM, Dutton RJ, Wolfe BE. Biotic interactions shape the ecological distributions of Staphylococcus species. mBio. 2016;7:e01157-16. doi: 10.1128/mBio.01157-16. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Kennedy AC, de Luna LZ. Rhizhosphere. In: Hillel D, editor. Encyclopedia of Soils in the Environment. Elsevier; 2005. pp. 399–406. [Google Scholar]
  32. Konstantinidis KT, Tiedje JM. Towards a genome-based taxonomy for prokaryotes. Journal of Bacteriology. 2005;187:6258–6264. doi: 10.1128/JB.187.18.6258-6264.2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Kuo CH, Ochman H. Deletional Bias across the three domains of life. Genome Biology and Evolution. 2009a;1:145–152. doi: 10.1093/gbe/evp016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Kuo CH, Ochman H. Inferring clocks when lacking rocks: the variable rates of molecular evolution in Bacteria. Biology Direct. 2009b;4:35. doi: 10.1186/1745-6150-4-35. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Laland KN, Odling-Smee FJ, Feldman MW. Evolutionary consequences of niche construction and their implications for ecology. PNAS. 1999;96:10242–10247. doi: 10.1073/pnas.96.18.10242. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Lapierre P, Gogarten JP. Estimating the size of the bacterial pan-genome. Trends in Genetics. 2009;25:107–110. doi: 10.1016/j.tig.2008.12.004. [DOI] [PubMed] [Google Scholar]
  37. Lauber CL, Hamady M, Knight R, Fierer N. Pyrosequencing-based assessment of soil pH as a predictor of soil bacterial community structure at the continental scale. Applied and Environmental Microbiology. 2009;75:5111–5120. doi: 10.1128/AEM.00335-09. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Li L, Ma ZS. Testing the neutral theory of biodiversity with human microbiome datasets. Scientific Reports. 2016;6:31448. doi: 10.1038/srep31448. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Lindow SE, Brandl MT. Microbiology of the phyllosphere. Applied and Environmental Microbiology. 2003;69:1875–1883. doi: 10.1128/AEM.69.4.1875-1883.2003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Louca S, Shih PM, Pennell MW, Fischer WW, Parfrey LW, Doebeli M. Bacterial diversification through geological time. Nature Ecology & Evolution. 2018;2:1458–1467. doi: 10.1038/s41559-018-0625-0. [DOI] [PubMed] [Google Scholar]
  41. Louca S, Mazel F, Doebeli M, Parfrey LW. A census-based estimate of earth's bacterial and archaeal diversity. PLOS Biology. 2019;17:e3000106. doi: 10.1371/journal.pbio.3000106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Louca S, Pennell MW. Extant timetrees are consistent with a myriad of diversification histories. Nature. 2020;580:502–505. doi: 10.1038/s41586-020-2176-1. [DOI] [PubMed] [Google Scholar]
  43. Lozupone CA, Knight R. Global patterns in bacterial diversity. PNAS. 2007;104:11436–11440. doi: 10.1073/pnas.0611525104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Madi N. dbd. swh:1:rev:ecb4f844264b72eaa8cbd708244ecd32d414c7ddSoftware Heritage. 2020 https://archive.softwareheritage.org/swh:1:rev:ecb4f844264b72eaa8cbd708244ecd32d414c7dd/
  45. Marshall CR. Five palaeobiological laws needed to understand the evolution of the living biota. Nature Ecology & Evolution. 2017;1:165. doi: 10.1038/s41559-017-0165. [DOI] [PubMed] [Google Scholar]
  46. Meyer JR, Kassen R. The effects of competition and predation on diversification in a model adaptive radiation. Nature. 2007;446:432–435. doi: 10.1038/nature05599. [DOI] [PubMed] [Google Scholar]
  47. Morris JJ, Lenski RE, Zinser ER. The black queen hypothesis: evolution of dependencies through adaptive gene loss. mBio. 2012;3:e00036–12. doi: 10.1128/mBio.00036-12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Nakagawa S, Schielzeth H. A general and simple method for obtaining R 2 from generalized linear mixed-effects models. Methods in Ecology and Evolution. 2013;4:133–142. doi: 10.1111/j.2041-210x.2012.00261.x. [DOI] [Google Scholar]
  49. Needham DM, Fuhrman JA. Pronounced daily succession of phytoplankton, archaea and Bacteria following a spring bloom. Nature Microbiology. 2016;1:16005. doi: 10.1038/nmicrobiol.2016.5. [DOI] [PubMed] [Google Scholar]
  50. Palmer MW, Maurer TA. Does diversity beget diversity? A case study of crops and weeds. Journal of Vegetation Science. 1997;8:235–240. doi: 10.2307/3237352. [DOI] [Google Scholar]
  51. Parks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil PA, Hugenholtz P. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology. 2018;36:996–1004. doi: 10.1038/nbt.4229. [DOI] [PubMed] [Google Scholar]
  52. Pennekamp F, Pontarp M, Tabi A, Altermatt F, Alther R, Choffat Y, Fronhofer EA, Ganesanandamoorthy P, Garnier A, Griffiths JI, Greene S, Horgan K, Massie TM, Mächler E, Palamara GM, Seymour M, Petchey OL. Biodiversity increases and decreases ecosystem stability. Nature. 2018;563:109–112. doi: 10.1038/s41586-018-0627-8. [DOI] [PubMed] [Google Scholar]
  53. Power JF, Carere CR, Lee CK, Wakerley GLJ, Evans DW, Button M, White D, Climo MD, Hinze AM, Morgan XC, McDonald IR, Cary SC, Stott MB. Microbial biogeography of 925 geothermal springs in New Zealand. Nature Communications. 2018;9:2876. doi: 10.1038/s41467-018-05020-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Price TD, Hooper DM, Buchanan CD, Johansson US, Tietze DT, Alström P, Olsson U, Ghosh-Harihar M, Ishtiaq F, Gupta SK, Martens J, Harr B, Singh P, Mohan D. Niche filling slows the diversification of himalayan songbirds. Nature. 2014;509:222–225. doi: 10.1038/nature13272. [DOI] [PubMed] [Google Scholar]
  55. Rabosky DL, Chang J, Title PO, Cowman PF, Sallan L, Friedman M, Kaschner K, Garilao C, Near TJ, Coll M, Alfaro ME. An inverse latitudinal gradient in speciation rate for marine fishes. Nature. 2018;559:392–395. doi: 10.1038/s41586-018-0273-1. [DOI] [PubMed] [Google Scholar]
  56. Rabosky DL, Hurlbert AH. Species richness at continental scales is dominated by ecological limits. The American Naturalist. 2015;185:572–583. doi: 10.1086/680850. [DOI] [PubMed] [Google Scholar]
  57. San Roman M, Wagner A. An enormous potential for niche construction through bacterial cross-feeding in a homogeneous environment. PLOS Computational Biology. 2018;14:e1006340. doi: 10.1371/journal.pcbi.1006340. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Schluter D, Pennell MW. Speciation gradients and the distribution of biodiversity. Nature. 2017;546:48–55. doi: 10.1038/nature22897. [DOI] [PubMed] [Google Scholar]
  59. Sender R, Fuchs S, Milo R. Revised estimates for the number of human and Bacteria cells in the body. PLOS Biology. 2016;14:e1002533. doi: 10.1371/journal.pbio.1002533. [DOI] [PMC free article] [PubMed] [Google Scholar]
  60. Seth EC, Taga ME. Nutrient cross-feeding in the microbial world. Frontiers in Microbiology. 2014;5:350. doi: 10.3389/fmicb.2014.00350. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Sogin ML, Morrison HG, Huber JA, Mark Welch D, Huse SM, Neal PR, Arrieta JM, Herndl GJ. Microbial diversity in the deep sea and the underexplored "rare biosphere". PNAS. 2006;103:12115–12120. doi: 10.1073/pnas.0605127103. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Sriswasdi S, Yang CC, Iwasaki W. Generalist species drive microbial dispersion and evolution. Nature Communications. 2017;8:1162. doi: 10.1038/s41467-017-01265-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Tara Oceans coordinators. Sunagawa S, Coelho LP, Chaffron S, Kultima JR, Labadie K, Salazar G, Djahanschiri B, Zeller G, Mende DR, Alberti A, Cornejo-Castillo FM, Costea PI, Cruaud C, d'Ovidio F, Engelen S, Ferrera I, Gasol JM, Guidi L, Hildebrand F, Kokoszka F, Lepoivre C, Lima-Mendez G, Poulain J, Poulos BT, Royo-Llonch M, Sarmento H, Vieira-Silva S, Dimier C, Picheral M, Searson S, Kandels-Lewis S, Bowler C, de Vargas C, Gorsky G, Grimsley N, Hingamp P, Iudicone D, Jaillon O, Not F, Ogata H, Pesant S, Speich S, Stemmann L, Sullivan MB, Weissenbach J, Wincker P, Karsenti E, Raes J, Acinas SG, Bork P. Structure and function of the global ocean microbiome. Science. 2015;348:1261359. doi: 10.1126/science.1261359. [DOI] [PubMed] [Google Scholar]
  64. Thompson LR, Sanders JG, McDonald D, Amir A, Ladau J, Locey KJ, Prill RJ, Tripathi A, Gibbons SM, Ackermann G, Navas-Molina JA, Janssen S, Kopylova E, Vázquez-Baeza Y, González A, Morton JT, Mirarab S, Zech Xu Z, Jiang L, Haroon MF, Kanbar J, Zhu Q, Jin Song S, Kosciolek T, Bokulich NA, Lefler J, Brislawn CJ, Humphrey G, Owens SM, Hampton-Marcell J, Berg-Lyons D, McKenzie V, Fierer N, Fuhrman JA, Clauset A, Stevens RL, Shade A, Pollard KS, Goodwin KD, Jansson JK, Gilbert JA, Knight R, Earth Microbiome Project Consortium A communal catalogue reveals earth's multiscale microbial diversity. Nature. 2017;551:457–463. doi: 10.1038/nature24621. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Vos M. A species concept for Bacteria based on adaptive divergence. Trends in Microbiology. 2011;19:1–7. doi: 10.1016/j.tim.2010.10.003. [DOI] [PubMed] [Google Scholar]
  66. Whitman WB, Coleman DC, Wiebe WJ. Prokaryotes: the unseen majority. PNAS. 1998;95:6578–6583. doi: 10.1073/pnas.95.12.6578. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. Whittaker RH. Evolution and measurement of species diversity. Taxon. 1972;21:213–251. doi: 10.2307/1218190. [DOI] [Google Scholar]

Decision letter

Editor: Detlef Weigel1
Reviewed by: Eric Kemen2, Benjamin E Wolfe3

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This study revisits an old debate in the ecology literature dating back to the 1940s: Does diversity promote or constrain diversity at different levels, and how is diversity maintained? These questions have been investigated in communities of macroorganisms, but this is probably the first high-level analysis of a large microbiome dataset to explore how species interactions influence microbial diversity. The authors found support for the "Diversity Begets Diversity" (DBD) hypothesis in habitats with lower overall diversity, while in habitats with higher overall diversity, the "Ecological Controls" (EC) hypothesis appeared to more closely describe patterns of microbial diversity. The paper has a great mix of modern tools, historical ideas, and interdisciplinary science. It is clearly written and the analyses are generally carefully executed and presented.

Decision letter after peer review:

Thank you for submitting your article "Does diversity beget diversity in microbiomes?" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by Detlef Weigel as the Reviewing and Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Eric Kemen (Reviewer #1); Benjamin E Wolfe (Reviewer #2).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, when editors judge that a submitted work as a whole belongs in eLife but that some conclusions require a modest amount of additional new data, as they do with your paper, we are asking that the manuscript be revised to either limit claims to those supported by data in hand, or to explicitly state that the relevant conclusions require additional supporting data.

Our expectation is that the authors will eventually carry out the additional experiments and report on how they affect the relevant conclusions either in a preprint on bioRxiv or medRxiv, or if appropriate, as a Research Advance in eLife, either of which would be linked to the original paper.

Summary:

Madi and colleagues revisit an old debate in the ecology literature dating back to the 1940s: Does diversity promote or constrain diversity at different levels, and how is diversity maintained? On one hand, having more species to interact with can create new niches and promote diversity. On the other hand, more species might mean more competition that would limit additional gains in diversity. While these hypotheses make opposite predictions, they are not mutually exclusive and likely act in tandem during community assembly. As the authors note, these questions have been asked in communities of macroorganisms, but this is probably the first high-level analysis of a large microbiome dataset that explores how species interactions influence microbial diversity.

The authors found support for the "Diversity Begets Diversity" (DBD) hypothesis in habitats with lower overall diversity, while in habitats with higher overall diversity, the "Ecological Controls" (EC) hypothesis appeared to more closely describe patterns of microbial diversity.

There are many caveats when inferring microbial interactions and other dimensions of microbial diversity from 16S rRNA amplicon data, but this work provides a great jumping point for many future studies to experimentally test some of the conclusions of this analysis.

The paper has a great mix of modern tools, historical ideas, and interdisciplinary science. It is clearly written and the analyses are generally carefully executed and presented.

However, there are also several significant concerns, which we hope the authors will be able to address.

Essential revisions:

1) A third significant hypothesis is missing that makes predictions about how diversity can affect net rates of diversification, The Neutral Theory of Biodiversity and Biogeography (NT). The pattern the authors observed – decreased diversity slopes within clades embedded within more diverse communities – can be explained by a neutral model devoid of ecological interactions. Let us consider a simple neutral model that has a rate of diversification intrinsic to each unique taxonomic unit (new taxa emerging per generation per unique taxonomic unit) and a rate of stochastic extinction that is proportional to the population size of each taxonomic unit. The probability that any taxa diversifies would be uniform throughout the community, however, as the diversity increases, this increases the number of unique units that can yield new taxa and thus the rate of influx of new taxa. At some tipping point, however, the community will become so diverse that the population size of any given taxa will be low and susceptible to stochastic loss. This will cause the net diversification rate to drop in more diverse communities. This is just a simple example and it does not consider migration, which is important for shaping microbial diversity; however, it demonstrates that other forces outside of the ecologically-based models of DBD and EC may be acting to produce the patterns under investigation. It is important to rule out NT's parsimonious predictions before moving on to testing sophisticated ecological theories.

2) Following from the NT discussion, there is likely an important role for extinction in shaping these patterns. The discussion of hypotheses focused on the production of diversity and not the net balance of production and loss. This omission is a major weakness of the paper; some discussion of the role of extinction in these patterns would strengthen the interpretations.

3) The analysis of the relationship between genome size and DBD in the context of the Black Queen Hypothesis is not convincing. As the authors acknowledge, there are serious limitations with assuming that the genome size from one strain/species within a genus can fully represent the genome size of that genus. The authors do not provide an estimate of how much error might result from this approach (for example, by looking at several genera where many genome sizes exist and seeing how that impacts their analyses). They also do not fully explain their rationale for some aspects of this analysis; for example, why was the largest genome size available used? Why would having more precise genome data make the positive relationship stronger?

We suggest that this part of the analysis is removed in the revision. You may refer to it in the Discussion as speculation.

If you decide to discuss genome size and how this could correlate with more metabolic functions, you should elaborate further on this. Generally genome size correlates to repeats rather than additional metabolite clusters. You could use the available genomes to annotate genome clusters (using e.g. antiSMASH).

4) One of the major findings is the evidence for DBD being strongest in less diverse biomes and weaker in more diverse biomes. However, some of these biomes also suggest a correlation with total microbial load. For example, the animal distal gut is a less diverse biome but an extremely densely populated biome, whereas the soil biome is exceptionally diverse but has a lower total microbial density than the distal animal gut. Do the authors have access to any data about the population density in these different biomes? Does population density also correlate with the DBD-EC continuum? If the authors do not think that total microbial load factors into the DBD-EC continuum, they should provide an explicit rationale why not. Note that we do not expect that microbial population size data are available for the entire data set, but perhaps there is a subset for which this is available and that could be analysed.

5) For model validation for GLMMs, it would be useful if the authors could report goodness of fit for all their models (differences between the observed values and the model's predicted) using the Akaike's Information Criterion (AIC) or the Bayesian Information Criterion (BIC). Further, using cross validation (dividing data to train and test) might be useful to estimate the error rate of the models and could be used to confirm the accuracy of the models.

Additional major comments:

6) The measures for diversity vary over the paper and it is not always clear when which measurement is used, which makes it difficult to compare the analyses. Where the measures differ, this should be justified.

For example, this is clear in Figures 1, 4 and 5, but not in Figure 2. Similarly, Figure 2—figure supplement 1, which show the number of focal taxa as a function of the number of non-focal taxa, but in Figure 3 the diversity slope was estimated from a GLMM using taxonomic ratio for the community diversity, without mentioning what diversity measure was used for focal lineage. More generally, are taxonomic ratios a good measures of lineage diversity? Please explain why you used these measures of diversity instead of others. You mention that you also tried others such as Shannon index that are also robust but did not follow up on this.

7) In the very beginning of the Results, the authors note that a null model was used to assess the slopes from their GLMMs (subsection “Quantifying the DBD-EC continuum in prokaryote communities”), but details are provided only later in the Materials and methods. We think the paper would be very much improved if the authors spent a paragraph explaining the null model early in the Results section and if a data figure (or figures) from the null model were moved to the main body of the text from the supplementary information. This would help the reader understand how support for significance of the slopes was determined as they move through the rest of the paper. More importantly, the paper would have greater readability and impact for a broader audience if the authors explained the development and rationale of their null model in very simple terms.

8) The term ASV is used throughout, but the downloaded databases contain mainly OTUs. This of course could affect the outcome of an analyses depending if OTUs and ASVs are mixed or either OTUs or ASVs are used. Please explain in more detail what was used and how it was calculated.

9) A major limitation that is not acknowledged is a sole focus on prokaryotic taxa. Many of the ecosystems sampled in the EMP dataset have diverse and abundant fungi, protists, and other types of microbes. It is likely that these other microbial taxa interact with the target bacteria studied in this work in diverse ways (as numerous previous studies have shown). The authors should acknowledge this major limitation and explore briefly how it may impact their findings in the Discussion of the text. For example, fungi may play disproportionate roles in some environments that explain some of the variation observed here (e.g. the rhizosphere).

10) Subsection “Abiotic drivers of diversity”, last two paragraphs: These two paragraphs contain two analyses that essentially contradict each other. These varying results are interesting – could you please expand a little more on why these two analyses might be showing different results? In particular, the last sentence suggests that diversity levels in soil and other communities with a DBD plateau are predominantly controlled by abiotic factors. However, this is the first mention of those specific biomes in this paragraph. Could you add a little more about this observation?

eLife. 2020 Nov 20;9:e58999. doi: 10.7554/eLife.58999.sa2

Author response


Essential revisions:

1) A third significant hypothesis is missing that makes predictions about how diversity can affect net rates of diversification, The Neutral Theory of Biodiversity and Biogeography (NT). The pattern the authors observed – decreased diversity slopes within clades embedded within more diverse communities – can be explained by a neutral model devoid of ecological interactions. Let us consider a simple neutral model that has a rate of diversification intrinsic to each unique taxonomic unit (new taxa emerging per generation per unique taxonomic unit) and a rate of stochastic extinction that is proportional to the population size of each taxonomic unit. The probability that any taxa diversifies would be uniform throughout the community, however, as the diversity increases, this increases the number of unique units that can yield new taxa and thus the rate of influx of new taxa. At some tipping point, however, the community will become so diverse that the population size of any given taxa will be low and susceptible to stochastic loss. This will cause the net diversification rate to drop in more diverse communities. This is just a simple example and it does not consider migration, which is important for shaping microbial diversity; however, it demonstrates that other forces outside of the ecologically-based models of DBD and EC may be acting to produce the patterns under investigation. It is important to rule out NT's parsimonious predictions before moving on to testing sophisticated ecological theories.

We thank the reviewers for pointing out this oversight on our part; the Neutral Theory (NT) is a natural concept to include, and we now consider it explicitly as a potential explanation for the diversity slopes observed in the EMP data. We mention the NT in the Introduction, and the Results now include simulated data sampled from the distribution that arises from the classical NT: the zero-sum multinomial (ZSM). These simulated data are generated in a very similar way to our previous null models, except that the Poisson distribution has been replaced with the ZSM to align with NT. As before, and as suggested by the reviewers, ours is a model of community assembly rather than de novo diversification, which is more appropriate for the 16S amplicon data from EMP (in which de novo speciation within a sample is highly unexpected).

Under neutral Model 1, in which all data are sampled from the same ZSM distribution, there is a significantly negative diversity slope, and the trend is visibly linear (updated Figure 2—figure supplement 7). Under neutral Model 2, in which each environment draws from a separate distribution (which we consider to be realistic given that environments have systematically different levels of community diversity), there is a visibly positive trend (due to ‘stacked’ negative slopes from different environments; Figure 2—figure supplement 7) but this is found to be non-significant in a GLMM that accounts for the effect of different environments. We note that our previous Model 2 (which was made with a Poisson distribution) contained an error: although we stated in the text that there was one distribution per environment, in fact the data were generated from a model with one distribution per sample. This error has now been corrected, and we consider the ‘environment-specific diversity’ model more parsimonious than the ‘sample-specific diversity’ model, especially given that sample-specific effects on the diversity slope are relatively small (Table 1). The neutral null models 1 and 2 assume no species interactions by definition and so are not included as ‘baselines’ for the DBD or EC “spike-in” simulations. We keep the old Model 1 (sampled from one Poisson distribution) as the baseline for the DBD and EC “spike-in” simulations, and it is now called Model 3.

Overall, the updated model results suggest that a positive diversity slope is unlikely to arise under NT, but that negative slopes can arise. Combined with the finding that adding DBD, but not EC, is detectable by our GLMMs, this suggests we are well-powered to detect DBD but likely underpowered to detect EC in the EMP data. Therefore, the positive slopes in the EMP data are likely real, but there may be undetectable EC effects as well.

Regarding the plateau of DBD at high levels of diversity: these results indicate that the ‘upward’ part of the slope is likely due to DBD, but the plateau could be due to a combination of EC and sampling from a NT model. We also note that the observed ‘plateau’ of DBD in more diverse biomes (shown in Figure 3) level is not observed under the neutral Model 2. Therefore, this DBD plateau cannot be easily explained by NT.

To summarize, we are able to rule out NT’s parsimonious predictions in the sense that they are unlikely to give rise to the positive DBD slopes observed. However, it is hard to tell whether negative slopes in the EMP data arise from NT, EC, or a mixture of the two. This is in line with previous studies that have found distinguishing competition and neutrality based on metacommunity data to be non-trivial. Further work and more sophisticated models will be needed to tease these apart.

2) Following from the NT discussion, there is likely an important role for extinction in shaping these patterns. The discussion of hypotheses focused on the production of diversity and not the net balance of production and loss. This omission is a major weakness of the paper; some discussion of the role of extinction in these patterns would strengthen the interpretations.

In the absence of a substantial fossil record, it is very difficult to accurately deconvolute speciation and extinction rates from phylogenies and instead we must rely on the net diversification rate (speciation minus extinction) (Louca and Pennell, 2020; Marshall, 2017). Nevertheless, Louca et al. recently estimated that bacterial diversity has mostly increased over the past billion years, with speciation rates slightly exceeding extinction rates (Louca et al., 2018). (We note that this is generally consistent with the widespread action of DBD, and less consistent with widespread EC). Thus, although bacterial lineages do go extinct, the precise rates of extinction and speciation are not straightforward to infer from phylogenies, and we do not attempt this here. More importantly, we do not think that speciation and extinction dynamics are at play at the scale of 16S rRNA gene sequencing in the EMP dataset. Although molecular clocks vary across bacterial lineages, the 16S gene typically evolves at a substitution rate of <0.1% per million years, equivalent to 1-2 substitutions (Kuo and Ochman, 2009b). Therefore, even at the resolution of ASVs, we do not expect new ASVs to evolve within EMP samples (and this is even more true for higher taxa like genera or phyla). We thus consider most of our observations to be consistent with an ‘ecological community assembly’ rather than ‘evolutionary diversification’ model of DBD. These points have now been added to the Introduction.

3) The analysis of the relationship between genome size and DBD in the context of the Black Queen Hypothesis is not convincing. As the authors acknowledge, there are serious limitations with assuming that the genome size from one strain/species within a genus can fully represent the genome size of that genus. The authors do not provide an estimate of how much error might result from this approach (for example, by looking at several genera where many genome sizes exist and seeing how that impacts their analyses). They also do not fully explain their rationale for some aspects of this analysis; for example, why was the largest genome size available used? Why would having more precise genome data make the positive relationship stronger?

We suggest that this part of the analysis is removed in the revision. You may refer to it in the Discussion as speculation.

If you decide to discuss genome size and how this could correlate with more metabolic functions, you should elaborate further on this. Generally genome size correlates to repeats rather than additional metabolite clusters. You could use the available genomes to annotate genome clusters (using e.g. antiSMASH).

We agree that this analysis was speculative and it has now been clearly flagged as​ such and moved to the Discussion. Although this is a preliminary analysis, the effects of biotic interactions on genome size evolution (including the Black Queen Hypothesis) is a subject of growing interest, and we believe our analysis will spur discussion and further study. The Black Queen Hypothesis is now only briefly mentioned in the Discussion. We plan to follow up on these genome size analyses using higher-resolution genomic or metagenomic data, and hopefully publish these as a Research Advance as suggested above by the editor (https://elifesciences.org/articles/57162). In the meantime, we attempt to clarify and justify the exploratory genome size analyses as much as possible.

First, we justify the choice of using the largest genome size in a genus as in the Materials and methods section as follows: “If multiple genomes were available with the same level of completion, the largest genome size was used, as smaller genomes could be artefacts of incomplete assembly which would bias the mean and median downward. Moreover, given the deletional bias in bacterial genomes (Kuo and Ochman, 2009a), the largest genome is likely more reflective of the ancestral genome size of the genus.”

Second, we further explain why having more precise genome size data would make the positive relationship stronger, in the Discussion as follows: “The positive relationship between genome size and DBD is likely even stronger than estimated, because assigning genome sizes to entire genera is imprecise (i.e. there is variation in genome size within a genus, or even within species), therefore weakening the correlation.”

In essence, we argue that we see a DBD-genome size relationship in spite of a noisy estimate of genome size. If the relationship is real, having less noisy estimates would improve the correlation.

Finally, given that estimating genome size at the genus level based on 16S taxonomic information is already noisy, we refrain from running additional analyses (e.g. antiSMASH) based on this dataset, and prefer to follow up on more appropriate genomic datasets. Nevertheless, we believe it is a reasonable assumption that the number of gene families (and thus metabolic potential, in a general sense) scale with genome size in bacteria. In bacteria and archaea, there is a linear (log-log) correlation between genome size and the number of protein-coding genes, and the linear relationship reaches a plateau in eukaryotes

(https://commons.wikimedia.org/wiki/File:Genome_size_vs_protein_count.svg). This is likely because larger eukaryotic genomes contain more repeats, transposons, and viral sequences, but this is not the case for bacteria, as shown for example in a study of soil bacteria in which genome size is correlated with the number of coding genes, including secondary metabolic pathways (Barberán et al., 2014). We also note that the range of bacterial genome size variation (1-15 Mbp) that we find affect diversity slope (in our Figure 5) is too great to be explained simply by repeat sequences, and must be largely driven by coding sequences. Therefore, the literature clearly supports a scaling of bacterial genome size with protein coding genes, and likely with metabolic potential. Nevertheless, the relevance of this to DBD remains speculative and deserves further study, as clearly stated in our revised Discussion section.

4) One of the major findings is the evidence for DBD being strongest in less diverse biomes and weaker in more diverse biomes. However, some of these biomes also suggest a correlation with total microbial load. For example, the animal distal gut is a less diverse biome but an extremely densely populated biome, whereas the soil biome is exceptionally diverse but has a lower total microbial density than the distal animal gut. Do the authors have access to any data about the population density in these different biomes? Does population density also correlate with the DBD-EC continuum? If the authors do not think that total microbial load factors into the DBD-EC continuum, they should provide an explicit rationale why not. Note that we do not expect that microbial population size data are available for the entire data set, but perhaps there is a subset for which this is available and that could be analysed.

We thank the reviewers for raising this excellent point. As the reviewers suspected,​ we do not have comprehensive estimates of microbial population sizes (cell counts) for the entire dataset, but a subset allows us to make some informed speculation. Indeed, the animal distal gut has a density ~1011 cells/mm33 and has relatively low diversity while soil is more diverse but less dense (~107 -109 cells/mm ). However, cell density does not appear to correlate with the DBD-EC continuum, as is now illustrated in the updated Figure 3. For example, saline water has an intermediate level of diversity compared to the distal gut or soil, but has lower cell density than either of these biomes (~106 cells/mm3 ). Non-saline water has a similar density, but tends to have higher diversity than saline water. Of course, these are rough estimates and a more systematic study could allow a more refined analysis, but we tentatively conclude that differential cell density across biomes is unlikely to explain variation along the DBD-EC continuum. We suspect this is because even relatively low-density microbiomes yield sufficient numbers of 16S amplicons for analysis. These findings are now reported in the Results describing Figure 3.

5) For model validation for GLMMs, it would be useful if the authors could report goodness of fit for all their models (differences between the observed values and the model's predicted) using the Akaike's Information Criterion (AIC) or the Bayesian Information Criterion (BIC). Further, using cross validation (dividing data to train and test) might be useful to estimate the error rate of the models and could be used to confirm the accuracy of the models.

As suggested, we have calculated AIC for full GLMM models, and compared these​ to models without any fixed effects save the intercept. The delta-AIC values for these comparisons, as well as likelihood-ratio test p-values are now reported in the new Supplementary file 1. As an additional measure of goodness of fit, we also report both marginal (including only fixed effects) and conditional (including both fixed and random effects) coefficients of variation (R2) in Supplementary file 1. In general, the conditional R2 values are higher, suggesting the importance of random effects. This is now mentioned briefly in the first Results paragraph, and explained in detail in the Materials and methods.

Additional major comments:

6) The measures for diversity vary over the paper and it is not always clear when which measurement is used, which makes it difficult to compare the analyses. Where the measures differ, this should be justified.

For example, this is clear in Figures 1, 4 and 5, but not in Figure 2. Similarly, Figure 2—figure supplement 1, which show the number of focal taxa as a function of the number of non-focal taxa, but in Figure 3 the diversity slope was estimated from a GLMM using taxonomic ratio for the community diversity, without mentioning what diversity measure was used for focal lineage. More generally, are taxonomic ratios a good measures of lineage diversity? Please explain why you used these measures of diversity instead of others. You mention that you also tried others such as Shannon index that are also robust but did not follow up on this.

We apologize for this confusion, and we have now clarified the metrics of focal lineage diversity and community diversity used throughout the manuscript. Briefly, focal lineage diversity is always measured as a taxonomic ratio in all the main text figures, and the axis labels have now been adjusted to clarify this. The only alternative metric of focal lineage diversity is when we use percent nucleotide identity instead of taxonomic ratios, as in Figure 2—figure supplements 10 and 11. In the main text figures, community diversity is always measured as the number of non-focal taxa. The only exception is a sensitivity analysis, in which we show that using Shannon diversity as an alternative metric of community diversity (Table 3) gives qualitatively similar results as using the number of non-focal taxa as in other analyses (summarized in Table 1). We have made clarifications throughout the manuscript, notably in the Figure 2 legend to specify that these are taxonomic ratios as in Figure 1.

We chose to focus the primary analyses on taxonomic ratios because they are simple, interpretable (i.e. they allow linking the results to well-studied organisms, e.g. Pseudomonas), and have a long history in the ecological literature dating back to Elton. We view the taxonomy-independent analyses (using percent nucleotide identity in the 16S rRNA gene) as supportive of the primary results based on taxonomic ratios, showing that these primary results are not overly biased by taxonomy. Similarly, we use Shannon diversity as an alternative measure of community diversity (which is otherwise measured simply as richness, or the number of non-focal taxa) to show that major findings are not sensitive to the choice of metric. We consider these controls using alternative diversity metrics to be sufficient for our broad-scale analyses, and leave it for future investigations to investigate details of why different metrics might differ slightly, tracking different aspects of the EC-DBD continuum. For the time being, we hope it is now clear which measures were used for which analyses in our manuscript, and for what reason.

7) In the very beginning of the Results, the authors note that a null model was used to assess the slopes from their GLMMs (subsection “Quantifying the DBD-EC continuum in prokaryote communities”), but details are provided only later in the Materials and methods. We think the paper would be very much improved if the authors spent a paragraph explaining the null model early in the Results section and if a data figure (or figures) from the null model were moved to the main body of the text from the supplementary information. This would help the reader understand how support for significance of the slopes was determined as they move through the rest of the paper. More importantly, the paper would have greater readability and impact for a broader audience if the authors explained the development and rationale of their null model in very simple terms.

We fully agree, and a description of the null models (now explicitly linked to Neutral Theory, as described in the response to Essential Revision #1 above) now appears prominently in the first two Results paragraphs.

8) The term ASV is used throughout, but the downloaded databases contain mainly OTUs. This of course could affect the outcome of an analyses depending if OTUs and ASVs are mixed or either OTUs or ASVs are used. Please explain in more detail what was used and how it was calculated.

We aimed to make all our analyses as standardized and comparable as possible,​ we used ASVs throughout – for both EMP and global soil datasets. To make the soil dataset comparable to the EMP, we downloaded the raw sequencing data and called ASVs using deblur, using the same pipeline as for the EMP. This is now clarified in the Results paragraph introducing the soil dataset analysis.

9) A major limitation that is not acknowledged is a sole focus on prokaryotic taxa. Many of the ecosystems sampled in the EMP dataset have diverse and abundant fungi, protists, and other types of microbes. It is likely that these other microbial taxa interact with the target bacteria studied in this work in diverse ways (as numerous previous studies have shown). The authors should acknowledge this major limitation and explore briefly how it may impact their findings in the Discussion of the text. For example, fungi may play disproportionate roles in some environments that explain some of the variation observed here (e.g. the rhizosphere).

We appreciate this suggestion, and fully agree. We now mention in the Discussion how interactions involving non-prokaryotes such as phages and eukaryotes could explain some of the variation along the DBD-EC continuum.

10) Subsection “Abiotic drivers of diversity”, last two paragraphs: These two paragraphs contain two analyses that essentially contradict each other. These varying results are interesting – could you please expand a little more on why these two analyses might be showing different results? In particular, the last sentence suggests that diversity levels in soil and other communities with a DBD plateau are predominantly controlled by abiotic factors. However, this is the first mention of those specific biomes in this paragraph. Could you add a little more about this observation?

We have changed the end of this paragraph to further reconcile the apparent contradiction between EMP and soil datasets. The result that DBD is weak and abiotic drivers of diversity are strong in soil can be reconciled with the finding of generally stronger DBD in the overall EMP dataset. We showed that soil has a relatively low diversity slope, even without considering abiotic factors (Figure 3). It is therefore not surprising to estimate a low diversity slope when abiotic factors are added to the model. This is now explained on in the subsection “Abiotic drivers of diversity”.

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Data Citations

    1. Thompson LR, Sanders JG, McDonald D, Amir A, Ladau J, Locey K, Prill RJ, Tripathi A, Gibbons SM, Ackermann G, Navas-Molina JA, Janssen S, Kopylova E, Vázquez-Baeza Y, González A, Morton JT, Mirarab S, Xu ZZ, Jiang L, Haroon MF, Kanbar J, Zhu Q, Song SJ, Kosciolek T, Bokulich NA, Lefler J, Brislawn CJ, Humphrey G, Owens SM, Hampton-Marcell J, Berg-Lyons D, McKenzie V, Fierer N, Fuhrman JA, Clauset A, Stevens RL, Shade A, Pollard RS, Goodwin KD, Jansson JK, Gilbert JA, Knight R. 2017. Earth Microbiome Project. microbio.me. otu_summary.emp_deblur_90bp.subset_2k.rare_5000.tsv
    2. Delgado-Baquerizo M, Oliverio AM, Brewer TE, Benavent-González A, Eldridge DJ, Bardgett RD, Maestre FT, Singh BK, Fierer N. 2018. Global Soil Dataset. figshare. [DOI] [PubMed]

    Supplementary Materials

    Supplementary file 1. Full GLMM outputs for the EMP data.
    elife-58999-supp1.pdf (169.8KB, pdf)
    Supplementary file 2. Goodness of fit for the GLMMs.
    elife-58999-supp2.docx (19.4KB, docx)
    Supplementary file 3. Full GLMM output for simulated data under Neutral Theory models.
    elife-58999-supp3.pdf (176.6KB, pdf)
    Supplementary file 4. Full GLMM output for soil data (Delgado-Baquerizo et al., 2018).
    elife-58999-supp4.pdf (155.8KB, pdf)
    Supplementary file 5. Indicator species analysis.

    The table shows the assignment of each genus to one of three environment types.

    elife-58999-supp5.xlsx (26.7KB, xlsx)
    Supplementary file 6. Genome size assignment.

    The table shows genome sizes assigned to each genus.

    elife-58999-supp6.xlsx (29.7KB, xlsx)
    Transparent reporting form

    Data Availability Statement

    All data is available from the Earth Microbiome Project (ftp.microbio.me), as detailed in the Methods. All computer code used for analysis are available at https://github.com/Naima16/dbd.git (copy archived at https://archive.softwareheritage.org/swh:1:rev:ecb4f844264b72eaa8cbd708244ecd32d414c7dd/).

    The following previously published datasets were used:

    Thompson LR, Sanders JG, McDonald D, Amir A, Ladau J, Locey K, Prill RJ, Tripathi A, Gibbons SM, Ackermann G, Navas-Molina JA, Janssen S, Kopylova E, Vázquez-Baeza Y, González A, Morton JT, Mirarab S, Xu ZZ, Jiang L, Haroon MF, Kanbar J, Zhu Q, Song SJ, Kosciolek T, Bokulich NA, Lefler J, Brislawn CJ, Humphrey G, Owens SM, Hampton-Marcell J, Berg-Lyons D, McKenzie V, Fierer N, Fuhrman JA, Clauset A, Stevens RL, Shade A, Pollard RS, Goodwin KD, Jansson JK, Gilbert JA, Knight R. 2017. Earth Microbiome Project. microbio.me. otu_summary.emp_deblur_90bp.subset_2k.rare_5000.tsv

    Delgado-Baquerizo M, Oliverio AM, Brewer TE, Benavent-González A, Eldridge DJ, Bardgett RD, Maestre FT, Singh BK, Fierer N. 2018. Global Soil Dataset. figshare.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd

    RESOURCES