eLife. 2024 Jan 22;12:RP89650. doi: 10.7554/eLife.89650

Investigating macroecological patterns in coarse-grained microbial communities using the stochastic logistic model of growth

William R Shoemaker 1, Jacopo Grilli 1
Editors: Bernhard Schmid 2, Meredith C Schuman 3
PMCID: PMC10945690  PMID: 38251984

Abstract

The structure and diversity of microbial communities are intrinsically hierarchical due to the shared evolutionary history of their constituents. This history is typically captured through taxonomic assignment and phylogenetic reconstruction, sources of information that are frequently used to group microbes into higher levels of organization in experimental and natural communities. Connecting community diversity to the joint ecological dynamics of the abundances of these groups is a central problem of community ecology. However, how microbial diversity depends on the scale of observation at which groups are defined has never been systematically examined. Here, we used a macroecological approach to quantitatively characterize the structure and diversity of microbial communities among disparate environments across taxonomic and phylogenetic scales. We found that measures of biodiversity at a given scale can be consistently predicted using a minimal model of ecology, the Stochastic Logistic Model of growth (SLM). This result suggests that the SLM is a more appropriate null-model for microbial biodiversity than alternatives such as the Unified Neutral Theory of Biodiversity. Extending these within-scale results, we examined the relationship between measures of biodiversity calculated at different scales (e.g. genus vs. family), an empirical pattern previously evaluated in the context of the Diversity Begets Diversity (DBD) hypothesis (Madi et al., 2020). We found that the relationship between richness estimates at different scales can be quantitatively predicted assuming independence among community members, demonstrating that the DBD can be sufficiently explained using the SLM as a null model of ecology. Contrastingly, only by including correlations between the abundances of community members (e.g. as the consequence of interactions) can we predict the relationship between estimates of diversity at different scales. 
The results of this study characterize novel microbial patterns across scales of organization and establish a sharp demarcation between recently proposed macroecological patterns that are affected by ecological interactions and those that are not.

Research organism: B. subtilis, E. coli

Introduction

An essential feature of microbial communities is their heterogeneous composition. A single environmental sample typically has a high richness, harboring hundreds to thousands of community members (Thompson et al., 2017; Shoemaker et al., 2017; Barberán et al., 2014). This high level of richness reaches an astronomical quantity at the global level, as scaling relationships and models of biodiversity predict upwards of one trillion ($\sim 10^{12}$) species on Earth (Locey and Lennon, 2016; Lennon and Locey, 2020). Even experimental communities in laboratory settings with a single carbon source can harbor ≥40 community members, culminating in a total richness numbering in the hundreds among replicate communities (e.g. Dal Bello et al., 2021). This richness contributes to the sheer diversity of microbial communities, a challenge for researchers attempting to identify the general principles that govern their dynamics and composition.

While richness estimates of microbial communities are undoubtedly high, the choice of assigning a community member to a given taxon remains intrinsically arbitrary. This arbitrariness exists regardless of whether the definition of a taxon is based on physiological attributes measured in the laboratory, entire genomes (i.e. metagenomics), or single-gene amplicon-based methods (i.e. 16S rRNA annotation). Despite their methodological differences, these approaches can all be viewed as different ways to cluster individuals within a community into groups. To contend with the sheer richness of microbial communities, researchers frequently rely on annotation-based approaches, i.e., by summing the abundances of community members that belong to the same group at a given taxonomic scale (e.g. genus, family, etc.). This approach pares down communities to a size that is amenable for the visualization of individual groups and allows for questions of scale-dependent community reproducibility to be addressed (Louca et al., 2016; Goldford et al., 2018; Estrela et al., 2022; Estrela et al., 2021; Dal Bello et al., 2021; Ho et al., 2022; Good and Rosenfeld, 2022; Tian et al., 2020).

This movement towards performing analyses of diversity at various taxonomic scales raises the question of how the composition of a community at one scale relates to that at another. To address these questions, researchers have examined the relationship between biodiversity measures at different scales in order to pare down the set of ecological mechanisms that plausibly govern community composition. Specifically, recent efforts have found that microbial richness/diversity within a given taxonomic group (e.g. genus) is typically positively correlated with the richness/diversity among the remaining groups (e.g. family) (Madi et al., 2020; Estrela et al., 2022), an empirical pattern that aligns with the predictions of the Diversity Begets Diversity (DBD) hypothesis (Whittaker, 1972; San Roman et al., 2018; Maynard et al., 2017). Evidence of the DBD hypothesis has historically been attributed to the construction of novel niches within a community through member interactions (Calcagno et al., 2017; Whittaker, 1972), with similar mechanisms having been proposed to explain the existence of a positive relationship in microbial communities (Madi et al., 2020). However, we still lack a quantitative understanding of how community composition at one scale should relate to that of another. Proceeding towards this goal requires two elements: (1) a systematic approach to grouping community members and (2) an appropriate null model for the composition of communities.

The operation of grouping the components of a system into a smaller number (e.g. merging read counts of OTUs to the family level in a community) is known in the physical sciences as coarse-graining. This formalism defines our systematic approach to grouping community members. While it is often not explicitly acknowledged as such, coarse-graining is a core concept in the microbial life sciences (Good and Hallatschek, 2018). By smoothing over microscopic details at a lower level of biological organization in order to make progress at a higher level, the concept of coarse-graining has contributed towards the development of effective models of physiological growth (Scott et al., 2014; Jun et al., 2018), evolutionary dynamics (Schweinsberg, 2003; Desai et al., 2013), and the dependence of ecosystem properties on the diversity of underlying communities (Moran and Tikhonov, 2022). Coarse-graining has even been used to glean insight into the question of whether ‘species’ as a unit has meaning for microorganisms, as modeling efforts have found that the operation permits the delimitation of distinct taxonomic groups when the resource preferences of community members are structured (Tikhonov, 2017). These theoretical and empirical efforts suggest that coarse-graining may provide an appropriate framework for investigating patterns of diversity and abundance within and between taxonomic scales of observation.

When evaluating the novelty of an empirical pattern it is useful to identify an appropriate null model for comparison (O’Dwyer et al., 2017; McGill, 2010; Harte, 2011). Prior research efforts have demonstrated the novelty of the fine vs. coarse-grained relationship by contrasting inferences from empirical data with predictions obtained from the Unified Neutral Theory of Biodiversity (UNTB) (Hubbell, 2011; Volkov et al., 2003; Azaele et al., 2016; Madi et al., 2020; Alonso and McKane, 2004; Azaele et al., 2006). These predictions generally failed to reproduce slopes inferred from empirical data (Madi et al., 2020), implying that the fine vs. coarse-grained relationship represents a novel macroecological pattern that cannot be quantitatively explained by existing null models of ecology. However, the task of identifying an appropriate null model for comparison is not straightforward. Rather, the question of what constitutes an appropriate null model remains a persistent topic of discussion in community ecology (Simberloff, 1983; Harvey et al., 1983; Gotelli and Graves, 1996; Gotelli and Ulrich, 2012; O’Dwyer et al., 2017). Here, we take the view that a null model is appropriate for examining the relationship between two observables (e.g. community diversity at different scales) if it was capable of quantitatively predicting each observable (e.g. community diversity at one scale). By this standard, the UNTB is an unsuitable choice as a null as it generally fails to capture basic patterns of microbial diversity and abundance at any scale (Li and Ma, 2016; Harris et al., 2017; Grilli, 2020). One relevant example is that the UNTB predicts that the distribution of mean abundances of community members across sites is extremely narrow (i.e. converging to a delta distribution as the number of sites increases), whereas empirical data tends to follow a broad lognormal distribution (Grilli, 2020). 
Contrastingly, recent efforts have determined that the predictions of a model of self-limiting growth with environmental noise, the SLM, are capable of quantitatively capturing multiple empirical macroecological patterns in observational and experimental microbial communities (Grilli, 2020; Zaoli and Grilli, 2021; Zaoli et al., 2022; Descheemaeker et al., 2021; Descheemaeker and de Buyl, 2020; Shoemaker et al., 2023c; Lim et al., 2023). The stationary solution of this model predicts that the abundance of a given community member across sites follows a gamma distribution (Grilli, 2020), a result that provides the foundation necessary to predict macroecological patterns among and between different taxonomic and phylogenetic scales.

In this study, we evaluated macroecological patterns of microbial communities across scales of evolutionary resolution. To limit potential biases that may result from taxonomic annotation errors and to use all available data, we investigated the macroecological consequences of coarse-graining by developing a procedure that groups community members using the underlying phylogeny in addition to relying on taxonomic assignment. We used data from the Earth Microbiome Project (EMP), a public catalog of microbial community barcode data, to ensure the generality of our findings and their commensurability with past research efforts. First, we assessed the extent to which microbial diversity varies as the abundances of community members are coarse-grained by phylogenetic distance and taxonomic rank. The results of these analyses led us to consider whether the predictive capacity of the gamma distribution remained robust under coarse-graining, a prediction that we quantitatively evaluated among community members and then extended to predict overall community richness and diversity. The accuracy of the gamma distribution provided the necessary motivation to test whether it was capable of predicting the relationship between fine and coarse-grained estimates, the empirical pattern that has been interpreted as evidence for the DBD hypothesis. Together, these analyses present evidence of the scale invariance of macroecological patterns in microbial communities as well as the applicability of the gamma distribution, the stationary distribution of the SLM, as a null model for evaluating the novelty of macroecological patterns of microbial biodiversity.

Results

The macroecological consequences of phylogenetic and taxonomic coarse-graining

While microbial communities are often coarse-grained into higher taxonomic scales, the effect of coarse-graining on measures of biodiversity and its relation to the underlying phylogeny are rarely examined. Before proceeding with the full analysis using public 16S rRNA amplicon data from the EMP, we elected to quantify the fraction of community members remaining across coarse-graining thresholds, a reflection of the extent to which coarse-graining reduces global richness and of the relation between taxonomic and phylogenetic coarse-graining. We first defined a coarse-grained group g, one of G groups, as the set of OTUs that have the same assigned label in a given taxonomic rank (e.g. Pseudomonas at the genus level) or that are collapsed when the phylogeny is truncated by a given root-to-tip distance (Figure 1, Figure 1—figure supplement 1). The relative abundance of group g in site j is defined as $x_{g,j} = \sum_{i \in g} x_{i,j}$.
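The grouping operation defined above (summing the relative abundances of all OTUs that share a group label) can be illustrated with a minimal sketch. The function name `coarse_grain`, the toy abundance table, and the genus labels are hypothetical and chosen only for illustration; this is not the authors' pipeline.

```python
import numpy as np

def coarse_grain(rel_abund, labels):
    """Sum OTU relative abundances within each group label.

    rel_abund: array of shape (n_otus, n_sites).
    labels: one group label per OTU (e.g. a genus name).
    Returns (groups, x_g), where x_g has shape (n_groups, n_sites).
    """
    rel_abund = np.asarray(rel_abund, dtype=float)
    groups, inverse = np.unique(labels, return_inverse=True)
    x_g = np.zeros((len(groups), rel_abund.shape[1]))
    # Accumulate each OTU row into the row of the group it belongs to.
    np.add.at(x_g, inverse, rel_abund)
    return groups, x_g

# Hypothetical toy data: 4 OTUs (rows) observed in 2 sites (columns).
x = np.array([[0.5, 0.1],
              [0.2, 0.3],
              [0.2, 0.4],
              [0.1, 0.2]])
genus = np.array(["Pseudomonas", "Bacillus", "Pseudomonas", "Bacillus"])

groups, x_g = coarse_grain(x, genus)
print(groups)
print(x_g)
```

Because the operation is a sum within each site, the coarse-grained relative abundances in a site still add up to the same total as the fine-grained ones. Phylogenetic coarse-graining would use the same summation, with labels obtained by truncating the tree rather than from taxonomy.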

Figure 1. The process of coarse-graining abundances using the phylogeny.

Taxonomic assignment in 16S rRNA amplicon sequence data provided the opportunity to investigate how properties of communities vary at different taxonomic scales. The most straightforward means of coarse-graining here is to sum the abundances of OTUs/ASVs that belong to the same taxonomic group. Amplicon data-based studies provide information about the shared evolutionary history of community constituents, information that can be leveraged by the construction of phylogenetic trees. A coarse-graining procedure can be defined that is analogous to one based on taxonomy, where a phylogenetic root-to-tip distance is chosen and terminal nodes are collapsed if their distance to a common ancestor is less than the prescribed distance.

Figure 1—figure supplement 1. The process of coarse-graining using taxonomic information.

Taxonomic assignment in 16S rRNA amplicon sequence data provides the opportunity to investigate how properties of communities vary at different taxonomic scales. A straightforward means of coarse-graining is to sum the abundances of OTUs/ASVs that belong to the same taxonomic group.
Figure 1—figure supplement 2. Examining the change in relative richness under coarse-graining.

To gain an intuition for how the number of community members changes in the face of coarse-graining, we can examine the fraction of OTUs that remain across scales of (a) taxonomic and (b) phylogenetic coarse-graining. In (b) the mean fraction across environments for a given taxonomic rank is plotted as a point of reference.

We found that even minor degrees of coarse-graining had a drastic effect on the total number of community members within an environment, reducing global richness by ∼90% even at just the genus level (Figure 1—figure supplement 2a). By coarse-graining over a range of phylogenetic distances, we found that the fraction of coarse-grained community members comparable to that of genus-level coarse-graining occurred at a root-to-tip phylogenetic distance of ∼0.1 (Figure 1—figure supplement 2b). This distance translated to only ∼3% of the total distance of the tree, meaning that the majority of OTUs were coarse-grained over a minority of the tree. This pattern was likely driven by the underlying structure of microbial phylogenetic trees, where most community members have short branch lengths (O’Dwyer et al., 2015). This result suggests that while coarse-graining communities to the genus or family level substantially reduces global richness, it does so without coarse-graining the majority of the evolutionary history captured by the phylogeny. Assuming that phylogenies capture ecological changes that occur over evolutionary time, this detail implies that ecological divergence that is captured by the phylogeny should be retained even when communities are considerably coarse-grained.

With our coarse-graining procedures established, we proceeded with our macroecological investigation. Recent efforts have found that the distribution of abundances of a given ASV/OTU maintains a statistically similar form across independent sites and time, a pattern known as the Abundance Fluctuation Distribution (AFD) (Grilli, 2020; Zaoli and Grilli, 2021; Zaoli et al., 2022; Shoemaker, 2023a; Wolff et al., 2023). By coarse-graining empirical AFDs and rescaling them by their mean and variance across sites (i.e. standard score), we found that AFDs from the human gut microbiome retained their shape across phylogenetic scales (Figure 2a). This pattern of invariance held across environments for both phylogenetic and taxonomic coarse-graining (Figure 2—figure supplement 1, Figure 2—figure supplement 2), suggesting that empirical AFDs can likely be described by a single probability distribution.

Figure 2. The shape of the AFD remained qualitatively invariant under coarse-graining.

(a) Under phylogenetic coarse-graining the general shape of the AFD for OTUs that were present in all sites (i.e. an occupancy of one) remained qualitatively invariant. (b) Similarly, the shape of the relationship between the mean coarse-grained abundance across hosts and occupancy across sites did not tend to vary. Predictions obtained from the gamma distribution are capable of capturing the relationship between the mean abundance and occupancy, suggesting that the gamma distribution remains a useful quantitative null model under coarse-graining. All data in this plot is from the human gut microbiome.

Figure 2—figure supplement 1. The AFD of all environments under taxonomic coarse-graining.

To control for the effect of sampling we only examined the AFDs of OTUs that were present in all sites (i.e. an occupancy of one). The single exception was the microbial mat, where we used a minimum occupancy of 0.75 as the environment harbored no OTUs that were present among all sites. Black lines represent fits of the gamma distribution obtained using SciPy.
Figure 2—figure supplement 2. The AFD of all environments under phylogenetic coarse-graining.

To control for the effect of sampling we only examined the AFDs of OTUs that were present in all sites (i.e. an occupancy of one). Black lines represent fits of the gamma distribution obtained using SciPy.
Figure 2—figure supplement 3. The predicted occupancy across sites for a gamma-distributed AFD under taxonomic coarse-graining for all environments.

Figure 2—figure supplement 4. The predicted occupancy across sites for a gamma distributed AFD under phylogenetic coarse-graining for all environments.

Figure 2—figure supplement 5. The relationship between the mean abundance across sites and the occupancy for various taxonomic coarse-graining scales.

The black line represents the prediction of the gamma distribution.
Figure 2—figure supplement 6. The relationship between the mean abundance across sites and the occupancy for various phylogenetic coarse-graining scales.

The black line represents the prediction of the gamma distribution.
Figure 2—figure supplement 7. Predictions of the variance of occupancy failed across taxonomic coarse-graining thresholds.

While the gamma distribution succeeded in predicting mean occupancy, it failed to predict the variance of occupancy.
Figure 2—figure supplement 8. Predictions of the variance of occupancy failed across phylogenetic coarse-graining thresholds.

Analogous plot of Figure 2—figure supplement 7 for phylogenetic coarse-graining.
Figure 2—figure supplement 9. Occupancy predictions of the gamma remained invariant despite coarse-graining.

Despite (a) taxonomic and (b) phylogenetic coarse-graining the mean relative error of occupancy predictions using the gamma did not increase. Instead, the error tended to decline over extended coarse-graining scales, only increasing for phylogenetic coarse-graining when communities were coarse-grained to ≲5 members.
Figure 2—figure supplement 10. The sum of the variances of OTUs was close to the value of the variance of a taxonomic coarse-grained group, implying that the contribution of covariance to the variance of a given coarse-grained group was low.

Figure 2—figure supplement 11. The analysis presented in Figure 2—figure supplement 10 but for phylogenetic coarse-graining.

Figure 2—figure supplement 12. The plot presented in Figure 2—figure supplement 10 but with the ratio of coarse and fine-grained variances plotted on the y-axis for the purpose of visualizing deviations from the 1:1 line.

Figure 2—figure supplement 13. The analysis presented in Figure 2—figure supplement 12 but for phylogenetic coarse-graining.

It has been previously demonstrated that empirical microbial AFDs are well-described by a gamma distribution that is parameterized by the mean relative abundance $\bar{x}_i$ and the shape parameter $\beta_i = \bar{x}_i^2 / \sigma_i^2$ (equal to the squared inverse of the coefficient of variation; Grilli, 2020). This distribution can be viewed as the stationary distribution of the SLM, a mathematical model that successfully captures macroecological patterns of microbial communities across both sites and time (Grilli, 2020; Descheemaeker and de Buyl, 2020; Zaoli and Grilli, 2021; Equation 5 in Materials and methods).
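As a minimal sketch of this parameterization, a gamma distribution with mean equal to the mean relative abundance and shape equal to the squared inverse coefficient of variation can be built with SciPy by setting the scale to the mean divided by the shape. The helper name `gamma_afd`, the parameter values, and the method-of-moments estimator below are illustrative assumptions, not the fitting procedure used in the study.

```python
import numpy as np
from scipy import stats

def gamma_afd(mean, beta):
    """Gamma distribution with mean `mean` and shape `beta` (scale = mean / beta)."""
    return stats.gamma(a=beta, scale=mean / beta)

# Hypothetical parameters for one community member.
true_mean, true_beta = 1e-3, 2.0
rng = np.random.default_rng(0)
x = gamma_afd(true_mean, true_beta).rvs(size=5000, random_state=rng)

# Method-of-moments estimates: beta is the squared mean over the variance.
mean_hat = x.mean()
beta_hat = mean_hat**2 / x.var(ddof=1)
print(mean_hat, beta_hat)
```

Under this parameterization the variance is the squared mean divided by the shape, so a small shape parameter corresponds to a community member whose abundance fluctuates strongly across sites.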

Using this result, we determined whether the gamma distribution sufficiently characterized coarse-grained AFDs. To accomplish this task, it is worth noting that we do not directly observe $x_i$. Rather, our ability to observe a community member is dependent on sampling effort (i.e. total number of reads for a given site). To account for sampling, one can derive a form of the gamma distribution that explicitly accounts for the sampling process, yielding the probability of obtaining n reads out of N total reads belonging to a community member (Materials and methods, Grilli, 2020). Given that n=0 for a community member that we do not observe, we defined the fraction of M sites where a community member was observed (i.e. occupancy, $o_i$) as

$o_i = 1 - \frac{1}{M} \sum_{m=1}^{M} P(0 \mid N_m, \bar{x}_i, \beta_i)$ (1)
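The occupancy prediction can be sketched numerically. The sampling form of the gamma distribution is given in the Materials and methods rather than in this section, so the sketch below assumes Poisson sampling of a gamma-distributed abundance, for which the probability of zero reads is $(1 + \bar{x} N / \beta)^{-\beta}$. The function names and the read depths are hypothetical.

```python
import numpy as np

def prob_absent(n_reads, mean, beta):
    """P(0 | N, mean, beta), assuming a gamma abundance sampled with Poisson noise."""
    return (1.0 + mean * n_reads / beta) ** (-beta)

def predicted_occupancy(n_reads_per_site, mean, beta):
    """One minus the average probability of absence over the M sites."""
    n_reads_per_site = np.asarray(n_reads_per_site, dtype=float)
    return 1.0 - prob_absent(n_reads_per_site, mean, beta).mean()

# Hypothetical sampling depths for M = 3 sites.
depths = np.array([1_000, 5_000, 20_000])

occ_rare = predicted_occupancy(depths, mean=1e-5, beta=1.0)
occ_common = predicted_occupancy(depths, mean=1e-2, beta=1.0)
print(occ_rare, occ_common)
```

The predicted occupancy rises with mean abundance and with sampling depth, which is the qualitative behavior behind an abundance-occupancy relationship.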

We then compared this prediction to observed estimates of occupancy to assess the accuracy of the gamma distribution across coarse-grained thresholds. We found that Equation 1 generally succeeded in predicting observed occupancy across phylogenetic and taxonomic scales for all environments (Figure 2—figure supplement 3, Figure 2—figure supplement 4). We then determined whether the gamma distribution was capable of predicting the relationship between macroecological quantities. One such relationship is that the occupancy of a community member should increase with its mean abundance, known as the abundance-occupancy relationship (Gaston et al., 2000). This pattern has been found across microbial systems (Shade et al., 2018; Sloan et al., 2007; Burns et al., 2016) and can be quantitatively predicted using the gamma distribution (Grilli, 2020). We see that this relationship is broadly captured across taxonomic and phylogenetic scales for all environments (Figure 2b, Figure 2—figure supplement 5, Figure 2—figure supplement 6). This result implies that the ability to observe a given taxonomic group was primarily determined by its mean abundance across sites and the sampling effort within a site, regardless of one’s scale of observation. In contrast, under the assumption of demographic indistinguishability under the UNTB we would expect the mean abundance distribution to be extremely narrow, following a delta distribution. Under the SLM, the variation in mean relative abundances we observed implies that the carrying capacities of community members vary over multiple orders of magnitude. We also note that at high mean abundances our predictions show slight variation, which is likely driven by variation in the shape parameter β (Figure 2b). 
In contrast with these results, the gamma distribution was unable to predict the variance of occupancy under both taxonomic and phylogenetic coarse-graining (Equation 11; Figure 2—figure supplement 7, Figure 2—figure supplement 8), the implications of which we will address in a later section.

To quantitatively assess the accuracy of the gamma distribution we calculated the relative error of our mean occupancy predictions (Equation 12) for all coarse-graining thresholds. We found that the mean logarithm of the error only slightly increased for the initial taxonomic and phylogenetic scales, where it then exhibited a sharp decrease across environments (Figure 2—figure supplement 9). The error only began to increase again once the community became highly coarse-grained, harboring a global richness (union of all community members in all sites for a given environment) <20. This result means that, if anything, the accuracy of the gamma distribution only improved with coarse-graining.

Reconciling coarse-graining and the predictions of the gamma distribution

The consistent predictive success of the gamma distribution under coarse-graining raises the question of why it remains a sufficient null model. The sum of independent gamma-distributed random variables is itself exactly gamma-distributed only if all random variables have identical rate parameters ($\beta_i / \bar{x}_i = \beta / \bar{x}$), a requirement that microbial communities clearly do not meet since they typically harbor broad mean abundance distributions. Given that a gamma AFD cannot predict the distribution of correlations between AFDs (Grilli, 2020), it is first worth examining whether the degree of dependence between AFDs shapes coarse-grained variables. We first consider the relation between the variance of the sum and the sum of variances.

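$\mathrm{Var}\left(\sum_{i}^{S_{\mathrm{tot}}} x_i\right) = \sum_{i}^{S_{\mathrm{tot}}} \mathrm{Var}(x_i) + 2 \sum_{i<j} \mathrm{Cov}(x_i, x_j)$ (2)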

By plotting $\mathrm{Var}\left(\sum_{i}^{S_{\mathrm{tot}}} x_i\right)$ against $\sum_{i}^{S_{\mathrm{tot}}} \mathrm{Var}(x_i)$ across coarse-grained thresholds, we found that the contribution of covariance to individual coarse-grained taxa was weak, suggesting that the statistical moments at higher scales can be approximated by those at lower scales (Figure 2—figure supplement 10, Figure 2—figure supplement 11). Similar conclusions can be drawn by plotting the variances as a ratio, with slight deviations above a ratio of one, suggesting that coarse-grained variance was slightly higher (Figure 2—figure supplement 12, Figure 2—figure supplement 13). These results are consistent with previous efforts demonstrating that the strongest correlations between AFDs are typically concentrated among pairs of closely related community members (i.e. low phylogenetic distance) (Sireci et al., 2023), implying that the effects of correlation should dissipate when communities are coarse-grained. Given that the variance of the sum can be approximated by the sum of the variances and that, by definition, the mean of a sum is the sum of the means, it is reasonable to propose that the statistical moments of coarse-grained AFDs are sufficient to characterize the distribution.
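The comparison between the variance of a coarse-grained group and the sum of the variances of its members can be sketched as follows. The abundances are simulated as independent gamma variables with hypothetical parameters, so the covariance term is sampling noise only; with empirical data the same quantities would be computed from the observed abundance table.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical abundances: 5 community members (rows) across 200 sites (columns).
x = rng.gamma(shape=2.0, scale=0.01, size=(5, 200))

cov = np.cov(x)                            # 5 x 5 sample covariance matrix (rows are members)
var_of_sum = x.sum(axis=0).var(ddof=1)     # variance of the coarse-grained group
sum_of_var = np.trace(cov)                 # sum of member variances
cov_term = 2.0 * np.triu(cov, k=1).sum()   # twice the sum of pairwise covariances

# The decomposition holds exactly for sample moments.
print(var_of_sum, sum_of_var + cov_term)

# A ratio near one indicates a weak contribution of covariance.
ratio = var_of_sum / sum_of_var
print(ratio)
```

A ratio above one would indicate net positive covariance among the members of a group, and a ratio below one net negative covariance.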

Finally, while we know of no general closed-form solution for the sum of independent gamma-distributed random variables with different rate parameters (equivalent to considering the convolution of many AFDs with different carrying capacities), progress has been made towards obtaining suitable approximations (Stewart et al., 2007; Murakami, 2015; Hu et al., 2020; Behme and Bondesson, 2017; Barnabani, 2017). This body of work includes an analysis demonstrating that a single gamma distribution can provide a suitable approximation to the distribution of the sum of many gamma random variables with different rate parameters (Covo and Elalouf, 2014). In summary, the gamma distribution appears to successfully capture patterns of biodiversity under taxonomic and phylogenetic coarse-graining because the sum of multiple gamma distributions can be approximated by a single gamma distribution.
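One simple way to see how a single gamma distribution can approximate a sum of gammas with different rates is to match the first two moments of the sum. This moment-matching scheme is an illustrative assumption and is not necessarily the approximation analyzed in the cited work; the means and shape parameters below are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical members of one coarse-grained group, with different rates.
means = np.array([1e-4, 1e-3, 5e-3, 2e-2])
betas = np.array([0.8, 1.5, 2.0, 3.0])
n_sites = 20_000

# Draw independent gamma abundances and sum them within each site.
samples = rng.gamma(shape=betas, scale=means / betas, size=(n_sites, len(means)))
x_g = samples.sum(axis=1)

# Moment matching: the mean and variance of the sum under independence.
mean_g = means.sum()
var_g = (means**2 / betas).sum()
beta_g = mean_g**2 / var_g
approx = stats.gamma(a=beta_g, scale=mean_g / beta_g)

# Kolmogorov-Smirnov distance between the simulated sum and the single gamma.
ks = stats.kstest(x_g, approx.cdf).statistic
print(beta_g, ks)
```

A small Kolmogorov-Smirnov distance indicates that the single gamma tracks the simulated sum closely for these parameters; the quality of the approximation will depend on how heterogeneous the rates are.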

Predicting measures of richness and diversity within a coarse-grained scale

Given that the presence or absence of a community member is used to estimate community richness, a measure previously used to make claims about patterns of microbial diversity across taxonomic scales (Madi et al., 2020), we can visualize the sufficiency of the gamma distribution by predicting the mean richness within an environment at a given coarse-grained scale (Equation 13a). Likewise, we can use the entirety of the distribution of read counts to predict the diversity within a site, a measure that reflects richness as well as the distribution of abundances within a community (Equation 14a), analytic predictions that we validated through simulations (Figure 3—figure supplement 1). We note that we observe consistent deviations between the analytic predictions of the variance of diversity and simulation results. These deviations are likely driven by small deviations in predictions of the second moment of diversity, which are slight for individual community members, but become considerable when terms are summed over hundreds or thousands of community members.

Focusing on the human gut microbiome as an example, we found that we can predict the typical richness of a community across phylogenetic scales using the gamma distribution (Figure 3a). Similar results were obtained when we repeated our analysis for predicted diversity (Figure 3b). By examining all nine environments we found that despite the dissimilarity in environments, we were able to predict mean richness and diversity in the face of coarse-graining (Figure 3c and d). In contrast, the UNTB failed to predict richness (Figure 3—figure supplement 3). The results of this analysis suggest that the composition of microbial communities remained largely invariant under coarse-graining and that the gamma distribution remained a suitable null model for predicting mean community measures across coarse-grained scales. Identical results were obtained for taxonomic coarse-graining (Figure 3—figure supplement 2).
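The logic of predicting mean richness from independent gamma AFDs can be sketched as a sum of per-member probabilities of presence, checked against a simulation. Equation 13a is not reproduced in this section, so the analytic form below assumes Poisson sampling of gamma-distributed abundances at a fixed read depth; the community size, depth, shape parameter, and lognormal mean abundances are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_members, n_sites, depth = 300, 400, 10_000

# Hypothetical broad distribution of mean relative abundances.
means = rng.lognormal(mean=-9.0, sigma=2.0, size=n_members)
means /= means.sum()
beta = 1.5

# Analytic expectation: sum over members of the probability of being observed.
p_present = 1.0 - (1.0 + means * depth / beta) ** (-beta)
expected_richness = p_present.sum()

# Simulation under the same assumptions: gamma abundances, Poisson read counts.
x = rng.gamma(shape=beta, scale=means / beta, size=(n_sites, n_members))
reads = rng.poisson(x * depth)
simulated_richness = (reads > 0).sum(axis=1).mean()

print(expected_richness, simulated_richness)
```

Because the expectation of a sum does not depend on correlations between members, a prediction of this kind can succeed for mean richness even when members are not independent, whereas the variance of richness is sensitive to covariance.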

Figure 3. The gamma distribution successfully predicted mean richness and diversity under phylogenetic coarse-graining.

(a) The expected richness derived from the gamma distribution (Equation 13a) was capable of predicting richness across phylogenetic coarse-graining scales, as illustrated by data from the human gut. (b) Predictions remained successful across all environments, suggesting that a minimal model of zero interactions was sufficient to predict observed properties of community composition. (c, d) Similarly, predictions of expected diversity (Equation 14a) also succeeded across coarse-graining scales for all environments. The shade of the color of a given datapoint represents the phylogenetic distance used for coarse-graining, with lighter colors representing finer scales and darker colors representing coarser scales.

Figure 3—figure supplement 1. The analytic predictions of the mean and variance of richness and diversity vs. the results of simulations that assume gamma-distributed AFDs and reads drawn from a multinomial distribution.

Figure 3—figure supplement 2. The gamma distribution successfully predicted mean richness and diversity under taxonomic coarse-graining.

An equivalent set of analyses is depicted in Figure 3 for taxonomic coarse-graining.
Figure 3—figure supplement 3. Unified Neutral Theory of Biodiversity (UNTB) failed to predict mean richness. UNTB consistently overpredicted richness under both taxonomic and phylogenetic coarse-graining.

Turning to higher-order moments, we examined the variance of richness and diversity across sites. Using a similar approach that was applied to the mean, we derived analytic predictions for the variance (Equation 17a). With the human gut as an example, we see that analytic predictions typically fail to capture estimates of variance obtained from empirical data for phylogenetic coarse-graining (Figure 4a and b). This lack of predictive success was consistent across environments (Figure 4c and d), implying that a model of independent community members with gamma-distributed abundances was insufficient to capture the variance of measures of biodiversity. A major assumption made in our derivation was that community members are independent, an assumption that is unjustified given that the gamma distribution has been previously shown to be unable to capture the empirical distribution of correlations in the AFDs of community members (Grilli, 2020). To attempt to remedy this failed prediction, we again turned to the law of total variance by estimating the covariance of richness and diversity from empirical data and adding the covariance to the predicted variance for each measure. We found that the addition of this empirical estimate was sufficient to predict the observed variance in the human gut (Figure 4a and b) as well as across environments (Figure 4e and f), implying that the underlying model is fundamentally correct for predicting the first moment of measures of biodiversity but cannot capture the correlations necessary to explain higher statistical moments such as the variance. Identical results were again obtained with taxonomic coarse-graining (Figure 4—figure supplement 1).

Figure 4. The gamma distribution only predicts the variance of richness and diversity under phylogenetic coarse-graining when covariance is included.

(a, b) In contrast with the mean, the variance of richness and diversity estimates predicted by the gamma distribution (Equation 17a) failed to capture empirical estimates from the human gut. Predictions are only comparable when empirical estimates of covariance are included in the predictions of the gamma distribution, meaning that dependence among community members is essential to describe the variation in measures of biodiversity across communities. (c, d) This lack of predictive success was consistent across environments, (e, f) though the addition of covariance consistently improved our analytic predictions. The color scale used here is identical to the color scale used in Figure 3.

Figure 4—figure supplement 1. A gamma AFD can only predict the variance of richness and diversity under taxonomic coarse-graining when covariance is included.

An equivalent set of analyses is depicted in Figure 4 for taxonomic coarse-graining.

Predicting patterns of richness and diversity between fine and coarse-grained scales

Our predictions of the statistical moments of richness and diversity using the gamma distribution provided the foundation necessary to investigate macroecological patterns between different taxonomic and phylogenetic scales. One such prominent pattern is the relationship between the fine-grained richness/diversity within a given coarse-grained group vs. the coarse-grained richness/diversity among all remaining groups (e.g. the number of classes within Firmicutes vs. the number of phyla excluding the phylum Firmicutes), a pattern that has been purported to demonstrate the existence of DBD processes in microbial systems. Before continuing, we note that the acronym DBD technically refers to the hypothesis that such positive relationships reflect the existence of ecological interactions through which coarse-grained diversity bolsters the accumulation of fine-grained diversity (e.g. niche construction; Laland et al., 1999; San Roman et al., 2018). Since we are primarily interested in the predictive power of an empirically validated null model of biodiversity, we distinguish between DBD as a hypothesis and DBD as an empirical pattern by referring to the empirical pattern as the fine vs. coarse-grained relationship throughout the remainder of this manuscript.

The fine vs. coarse-grained relationship can be quantified as the slope of the relationship between the fine-grained richness within a given coarse-grained group $g$ ($S_{g,m}$) and the richness in the remaining $G-1$ coarse-grained groups: $S_{g,m} \propto \alpha S_{G \setminus g,m}$, where $G \setminus g$ denotes the exclusion of group $g$ and $\alpha$ is the slope of the relationship. This formulation was proposed by Madi et al., and to ensure commensurability we adopted it here (Madi et al., 2020). Furthermore, in keeping with the approach used by Madi et al., fine and coarse-grained measures were compared across increasing taxonomic and phylogenetic scales (e.g. OTU vs. genus, genus vs. family, etc.; Madi et al., 2020). Using Equation 1, we then defined each of these estimators in terms of the sampling form of the gamma distribution, thereby accounting for sampling:

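$$\langle S_{g,m} \rangle = |g| - \sum_{i \in g} P(0 \,|\, N_m, \bar{x}_i, \beta_i) \tag{3a}$$
$$\langle S_{G \setminus g,m} \rangle = (|G| - 1) - \sum_{g' \in G \setminus g} P(0 \,|\, N_m, \bar{x}_{g'}, \beta_{g'}) \tag{3b}$$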

Similarly, we used Equation 14a to derive predictions for fine and coarse-grained diversity.

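$$\langle H_{g,m} \rangle = -\sum_{i \in g} \left\langle x \ln[x] \,|\, N_m, \bar{x}_i, \beta_i \right\rangle \tag{4a}$$
$$\langle H_{G \setminus g,m} \rangle = -\sum_{g' \in G \setminus g} \left\langle x \ln[x] \,|\, N_m, \bar{x}_{g'}, \beta_{g'} \right\rangle \tag{4b}$$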

By repeating this calculation for all M sites, we obtained vectors of coarse and fine-grained richness estimates for group g from which we inferred the slope of the fine vs. coarse-grained relationship through ordinary least squares regression. By repeating this process for all G groups we obtained a distribution of slopes that can be directly compared to those obtained from empirical data. We include a conceptual diagram visualizing this process as a supplement (Figure 5—figure supplement 1).
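
The leave-one-out procedure described above can be sketched in a few lines. This is an illustrative sketch for richness only, applied to observed counts rather than to predictions from the gamma distribution; the function name `fine_vs_coarse_slopes`, the toy count matrix, and the group labels are all hypothetical. A coarse-grained group is assumed to be present at a site if any of its members has a nonzero count.

```python
import numpy as np

def fine_vs_coarse_slopes(counts, groups):
    """Infer the fine vs. coarse-grained richness slope for every group.

    counts: (M sites x S members) array of read counts.
    groups: length-S sequence of coarse-grained group labels.
    Returns a dict mapping each group label to its OLS slope.
    """
    counts = np.asarray(counts)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    present = counts > 0
    # A group is present at a site if any of its members is present.
    group_present = np.stack(
        [present[:, groups == g].any(axis=1) for g in labels], axis=1
    )
    slopes = {}
    for k, g in enumerate(labels):
        fine = present[:, groups == g].sum(axis=1)                # richness within g
        coarse = np.delete(group_present, k, axis=1).sum(axis=1)  # richness excluding g
        if coarse.max() == coarse.min():
            continue  # the slope is undefined when coarse richness does not vary
        slope, _intercept = np.polyfit(coarse, fine, 1)           # ordinary least squares
        slopes[str(g)] = slope
    return slopes

# Toy example: three sites, group A has four members, groups B and C have one each.
counts = np.array([
    [0, 0, 0, 0, 0, 0],
    [3, 1, 0, 0, 5, 0],
    [2, 2, 1, 4, 1, 7],
])
groups = ["A", "A", "A", "A", "B", "C"]
slopes = fine_vs_coarse_slopes(counts, groups)
```

Replacing the observed presences with predicted values from Equations 3a and 3b would give the predicted slopes that are compared against the empirical ones.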

Before performing a direct comparison, we first note the features of the empirical slopes and how they pertain to the predictions we obtained. By examining the distribution of empirical slopes pooled over all coarse-graining thresholds for each environment, we found that they were rarely less than zero (Figure 5a, Figure 5—figure supplement 2a). The few negative slopes inferred from empirical data were extremely small, with absolute values $<10^{-4}$, and could be treated as zeros. Furthermore, the distribution of slopes follows the same form across environments, suggesting that the slope of the fine vs. coarse-grained relationship reflects a general feature of community sequence data rather than the ecology of specific environments. Like the empirical slopes, the gamma distribution virtually always predicted a positive slope for all environments for both taxonomic and phylogenetic coarse-graining. This paucity of negative slopes suggests that the prediction of the alternative to the DBD hypothesis, the Ecological Controls hypothesis (Schluter and Pennell, 2017), is virtually absent in empirical data and cannot be generated from an empirically validated null model of microbial biodiversity.

Figure 5. The slope of the fine vs. coarse-grained relationship for richness could be predicted by the gamma distribution, but was novel for estimates of diversity.

(a, b) The predictions of the gamma distribution (Equation 3a) successfully reproduced observed fine vs. coarse-grained richness slopes across scales of phylogenetic coarse-graining. (c, d) In contrast, the predictions of the gamma distribution failed to capture diversity slopes (Equation 4a). The color scale used here is identical to the color scale used in Figure 3. Squared Pearson correlation coefficients ($\rho^2$) are computed over all slopes for all taxa across all coarse-graining scales.

Figure 5—figure supplement 1. Conceptual diagram illustrating how fine vs. coarse-grained slopes are inferred.

(a) Slopes were inferred from empirical data by estimating fine and coarse-grained measures of biodiversity (for this diagram, richness) according to the leave-one-out procedure used by Madi et al., 2023. (b) The mean relative abundance and beta (squared inverse coefficient of variation) were estimated from the data and then used to predict fine and coarse-grained estimates of a given measure of biodiversity using the same leave-one-out procedure. Separate regressions were then fit to the empirical and predicted fine and coarse-grained measures of biodiversity.

Figure 5—figure supplement 2. The gamma distribution as a tool for investigating the novelty of fine vs. coarse-grained slopes.

An equivalent set of analyses is depicted in Figure 5 for taxonomic coarse-graining.
Figure 5—figure supplement 3. The predicted slopes of fine vs. coarse-grained richness from the sampling form of the gamma distribution under taxonomic coarse-graining.

Figure 5—figure supplement 4. The predicted slopes of fine vs. coarse-grained richness from the sampling form of the gamma distribution under phylogenetic coarse-graining.

Figure 5—figure supplement 5. The predicted slopes of fine vs. coarse-grained richness under taxonomic coarse-graining using the UNTB.

Figure 5—figure supplement 6. The predicted slopes of fine vs. coarse-grained richness under phylogenetic coarse-graining using the UNTB.

Figure 5—figure supplement 7. The mean predicted slopes of fine vs. coarse-grained richness under (a) taxonomic and (b) phylogenetic coarse-graining using the Unified Neutral Theory of Biodiversity (UNTB).

Figure 5—figure supplement 8. Comparisons of the relative error of fine vs. coarse-grained richness slope predictions between the Stochastic Logistic Model (SLM) and Unified Neutral Theory of Biodiversity (UNTB) for taxonomic coarse-graining.

Figure 5—figure supplement 9. Comparisons of the relative error of fine vs. coarse-grained richness slope predictions between the Stochastic Logistic Model (SLM) and Unified Neutral Theory of Biodiversity (UNTB) for phylogenetic coarse-graining.

Figure 5—figure supplement 10. The predicted slopes of fine vs. coarse-grained diversity from the sampling form of the gamma distribution under taxonomic coarse-graining.

Figure 5—figure supplement 11. The predicted slopes of fine vs. coarse-grained diversity from the sampling form of the gamma distribution under phylogenetic coarse-graining.

However, only observing positive slopes does not necessarily provide support for the DBD hypothesis. A direct comparison of slopes predicted from the gamma distribution to those inferred from empirical data is necessary to determine whether the predictions of DBD lie outside what can be reasonably captured by an interaction-free model such as the SLM. To evaluate the novelty of the slope of the fine vs. coarse-grained relationship we compared the values of observed slopes to those obtained from the interaction-free SLM. We found that the predictions of the gamma distribution closely matched the observed slopes across environments for both taxonomic and phylogenetic coarse-graining (Figure 5—figure supplement 3, Figure 5—figure supplement 4). We consolidated these results by taking the mean slope for a given coarse-grained scale, from which we see that the mean slope predicted by the gamma distribution does a reasonable job capturing empirical slopes across environments (Figure 5b, Figure 5—figure supplement 2b). These results indicate that we should expect to see a positive relationship between richness estimates at different scales and that the relationships we observe can be quantitatively captured by a gamma-distributed AFD. It is worth noting that the slope of the fine vs. coarse-grained relationship could be sufficiently predicted even though the gamma distribution only succeeded at predicting mean richness, suggesting that higher-order statistical moments, and by extension interactions between community members, are unnecessary to quantitatively capture the positive relationship observed between fine and coarse-grained estimates of richness.

As a point of comparison, we predicted the slope of the fine vs. coarse-grained relationship for richness using a UNTB model (Madi et al., 2023) (Supporting information). We found that the UNTB slopes generally deviated from those obtained from empirical data, exhibiting far greater bias and variation around the 1:1 line than was observed for the SLM (Figure 5—figure supplement 5, Figure 5—figure supplement 6). By examining the mean slope we found that predictions from the UNTB tended to systematically underpredict the observed slope under both taxonomic and phylogenetic coarse-graining (Figure 5—figure supplement 7). Directly comparing the mean relative error of the UNTB predictions to those of the SLM confirms these observations, as the UNTB predictions tended to have errors larger by an order of magnitude (Figure 5—figure supplement 8, Figure 5—figure supplement 9). To summarize, in contrast to the SLM, the UNTB cannot predict the slope of the fine vs. coarse-grained relationship for richness.

While richness is a widespread and versatile estimator that is commonly used in community ecology, it neglects considerable information by focusing on presences and absences instead of the entirety of the distribution of abundances. To rigorously test the predictive power of the gamma distribution it was necessary to evaluate the fine vs. coarse-grained relationship for diversity. We again found that disparate environments had similar distributions of slopes from empirical data (Figure 5c, Figure 5—figure supplement 2c), suggesting that the slope of the relationship is likely a general property of microbial communities rather than an environment-specific pattern. However, unlike richness, diversity predictions obtained from the gamma distribution generally failed to capture observed slopes, as the squared correlation between observed and predicted slopes can be less than that of richness by over an order of magnitude (Figure 5d, Figure 5—figure supplement 10, Figure 5—figure supplement 11, Figure 5—figure supplement 2d). Here, we see where the predictions of an interaction-free SLM succeeded and where they failed to predict observed macroecological patterns.

Given that the gamma distribution failed to predict the observed diversity slope, it is worth evaluating whether additional features could be incorporated to generate successful predictions. A notable omission is the absence of interactions between community members in the SLM, meaning that we were unable to predict correlations between community member abundances. However, while considerable progress has been made (e.g. Ho et al., 2022), predicting the observed distribution of correlation coefficients between community members while accounting for sampling remains a non-trivial task. Given that the gamma distribution succeeded at predicting other macroecological patterns, we elected to perform a simulation where a collection of sites was modeled as an ensemble of communities with correlated gamma-distributed AFDs with the means, variances, correlations, and total depth of sampling set by estimates from empirical data (Materials and methods). By including correlations between AFDs in the simulations, which represent the statistical outcome of ecological interactions between community members, we were able to largely capture observed fine vs. coarse-grained diversity slopes (Figure 6, Figure 6—figure supplement 1, Figure 6—figure supplement 2, Figure 6—figure supplement 3). These results suggest that rather than diversity at a fine-scale begetting diversity at a coarse-scale, the correlations that exist at a fine-scale (e.g. genus) contribute to measures of biodiversity at the nearest coarse-grained scale (e.g. family), resulting in a positive relationship between measures of diversity at different scales.
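
One way to generate correlated gamma-distributed abundances followed by read sampling is sketched below. The Gaussian copula used here is an assumption made for illustration and is not necessarily the procedure described in Materials and methods; the function name `simulate_correlated_gamma_counts` and all parameter values are hypothetical. Note that the correlation matrix is imposed on the latent Gaussian variables, so the resulting correlations between abundances only approximate the target values.

```python
import numpy as np
from scipy import stats

def simulate_correlated_gamma_counts(mean_abund, beta, corr, depths, rng):
    """Simulate abundances and read counts for M sites and S community members.

    mean_abund, beta: length-S arrays of mean relative abundance and shape parameter.
    corr: S x S correlation matrix applied to latent Gaussian variables (copula assumption).
    depths: length-M array of total read counts per site.
    """
    mean_abund = np.asarray(mean_abund, dtype=float)
    beta = np.asarray(beta, dtype=float)
    depths = np.asarray(depths)
    # Correlated standard normal draws, one row per site.
    z = rng.multivariate_normal(np.zeros(len(mean_abund)), corr, size=len(depths))
    # Map each margin to a gamma distribution with mean mean_abund and shape beta.
    u = stats.norm.cdf(z)
    x = stats.gamma.ppf(u, a=beta, scale=mean_abund / beta)
    # Poisson sampling of reads given the abundance and the depth of each site.
    counts = rng.poisson(depths[:, None] * x)
    return x, counts

rng = np.random.default_rng(2)
mean_abund = np.array([0.05, 0.02, 0.01])
beta = np.array([2.0, 1.5, 3.0])
corr = np.array([[1.0, 0.7, 0.0],
                 [0.7, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
depths = np.full(4000, 10_000)
x, counts = simulate_correlated_gamma_counts(mean_abund, beta, corr, depths, rng)
```

Fine and coarse-grained diversity can then be computed from `counts` in the same way as for empirical data.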

Figure 6. Including correlations allows the gamma distribution to capture observed diversity slopes.

Observed fine vs. coarse-grained diversity slopes could be quantitatively reproduced under phylogenetic coarse-graining by simulating correlated gamma-distributed AFDs at the OTU-level. The color scale used here is identical to the color scale used in Figure 3. Squared Pearson correlation coefficients ($\rho^2$) are computed over all slopes for all taxa across all coarse-graining scales.

Figure 6—figure supplement 1. The predicted slopes of fine vs. coarse-grained diversity from the sampling form of the gamma distribution with correlations between OTUs under taxonomic coarse-graining.

Figure 6—figure supplement 2. The predicted slopes of fine vs. coarse-grained diversity from the sampling form of the gamma distribution with correlations between OTUs under phylogenetic coarse-graining.

Figure 6—figure supplement 3. Gamma distribution simulations with correlations capture observed diversity slopes.

An equivalent set of analyses is depicted in Figure 6 for taxonomic coarse-graining.

Discussion

The results of this study demonstrate that macroecological patterns in microbial communities remain largely invariant across taxonomic and phylogenetic scales. By focusing on the predictions of the SLM, an interaction-free model of microbial growth under environmental fluctuations, we were able to evaluate the extent to which measures of biodiversity can be predicted under coarse-graining. We were largely able to predict said measures using the same model with parameters estimated from data across scales, implying that certain macroecological patterns of microbial communities remained self-similar across taxonomic and phylogenetic scales. Building on this result, we investigated the dependence of community measures between different degrees of coarse-graining, a pattern that has been formalized as the Diversity Begets Diversity hypothesis (Whittaker, 1972; Madi et al., 2020). The prediction derived from the sampling form of the gamma distribution quantitatively captured the observed slopes of the fine vs. coarse-grained relationship for richness, while it failed to capture the slope for diversity. However, introducing correlations between abundance fluctuation distributions permitted the recovery of the slope of the fine vs. coarse-grained diversity relationship.

Our richness results complement past work demonstrating that occupancy, the constituent of richness, is highly dependent on two parameters: sampling depth (i.e. total read count) and the mean abundance of a community member (Grilli, 2020). Our ability to predict the relationship between fine and coarse-grained measures of richness using the gamma distribution, despite our inability to predict the variance of richness, suggests that the correlation underlying the slope of the fine vs. coarse-grained relationship is primarily driven by the effects of finite sampling. This past work, and the relationships between the mean abundance and occupancy evaluated in this manuscript, demonstrate that occupancy alone is unlikely to contain ecological information that is not already captured by the distribution of abundances across sites (i.e. the AFD). Our analyses of the relationship between fine and coarse-grained richness support this conclusion, as predictions derived from a gamma distribution quantitatively captured the observed slope. The success of an interaction-free model in predicting the slope of the fine vs. coarse-grained relationship is an indictment of the appropriateness of estimators that rely solely on the presence of a community member for identifying novel macroecological patterns, a measure that has been used to bolster support for the DBD hypothesis at the level of 16S rRNA amplicons as well as strains (Madi et al., 2020; Madi et al., 2023). Rather, estimates of richness harbor little information about the dynamics of a community across taxonomic and phylogenetic scales that is not already captured by the sampling form of the gamma distribution. Contrasting with richness, the predictions of diversity from the gamma distribution were unable to capture fine vs. coarse-grained relationships in empirical data. Given that measures of diversity incorporate information about the richness and evenness of a community (Magurran, 2004), the comparative deficiency of our predictions for fine vs. coarse-grained diversity suggests that forms of the SLM that neglect interactions between community members cannot capture relationships between phylogenetic/taxonomic scales that depend on the evenness of the distribution of abundances.

Macroecological patterns are not imbued with mechanistic explanation (Warren et al., 2022). Rather, the onus is on the investigator to identify plausible mechanisms. Often in ecology this task is made easier by evaluating whether a model lacking a particular mechanism is capable of producing the observed pattern, that is, identifying an appropriate null. The novelty of the fine vs. coarse-grained relationship was previously assessed using a null model which assumed demographic equivalence among community members and community dynamics driven by demographic noise (i.e. the UNTB) (Madi et al., 2020; Alonso and McKane, 2004). Empirical patterns of microbial abundance cannot be reasonably captured by such models, making predictions obtained from the UNTB invalid for evaluating the novelty of microbial macroecological patterns. In contrast, models that combine self-limiting growth with environmental noise reproduce several empirical patterns, making the SLM an appropriate choice for evaluating the novelty of fine vs. coarse-grained relationships (Grilli, 2020; Descheemaeker and de Buyl, 2020). This is not a trivial detail, as there is historical precedent for the need to identify an appropriate null in order to investigate how fine and coarse-grained measures of biodiversity relate to one another: one of the earliest adoptions of null model analysis in ecology was done to investigate the ratio of species to genera in a community (Williams, 1947; Smith et al., 2014).

In this study, the predictions of the sampling form of the gamma distribution considerably improved when correlations between community members were included. This result suggests that rather than exclusively pointing to niche construction as previously suggested (Madi et al., 2020), any ecological mechanism that can capture the observed distribution of correlation coefficients is a plausible candidate. Given that models of consumer-resource dynamics have succeeded in capturing macroecological patterns (Chesson, 1990; Cui et al., 2021), including quantitatively predicting the distribution of correlation coefficients (Ho et al., 2022), it is reasonable to suggest that such mechanisms are ultimately responsible for the relationship between fine and coarse-grained measures of diversity and can be reduced to phenomenological models such as the SLM. Indeed, experimental investigations of the slopes evaluated here have found the existence of positive slopes in artificial communities maintained in a laboratory setting, where the strength of the correlation between fine and coarse-grained scales is driven by the secretion of secondary metabolites (Estrela et al., 2022). This mechanism, known as cross-feeding, can be viewed as compatible with the concept of niche construction (San Roman et al., 2018) as well as with the original interpretation of Madi et al., 2020.

In the interest of providing macroecological insight into the DBD hypothesis, we solely focused on coarse-graining procedures that relied on phylogenetic reconstruction and taxonomic assignment. However, it is worth noting that it is also possible to coarse-grain community members by the strength of their correlations (i.e. sum the abundances of each pair of community members with the strongest correlation in AFDs). This procedure has been named the phenomenological renormalization group method due to its ability to identify if and where a system is stable despite knowing little about the system’s dynamics (i.e. fixed points in nonlinear systems) (Nicoletti et al., 2020; Meshulam et al., 2019). However, given that the AFD correlation between two community members is often inversely related to their phylogenetic distance, such an analysis would likely be redundant, as coarse-graining based on the strength of correlation would effectively coarse-grain the most closely related community members (Sireci et al., 2023).

A major goal of this study was to evaluate the novelty of macroecological patterns that were used to bolster support for the DBD hypothesis. We used the same dataset in order to ensure generality and commensurability with past research efforts. However, it is worth inspecting how the use of a global survey dataset constrains the inferences one can make. Throughout this study, we implicitly assumed that an ensemble approach is valid, meaning that we viewed different sites/hosts as virtual copies of a given environment. This assumption can remain valid for time-series studies where the distribution of microbial abundances remains stationary with respect to time (Faith et al., 2013), as the stationary solution of the SLM has successfully characterized microbial community time-series at both the level of OTUs (Grilli, 2020) and strains (Wolff et al., 2023). Given these past results, we predict that the fine vs. coarse-grained relationship results presented here will remain valid in longitudinal studies where community members fluctuate around a single point with respect to time.

Materials and methods

Data acquisition and processing

To ensure that our analyses were generalizable across ecosystems and commensurate with prior DBD investigations, we used amplicon sequence data from the V4 region of the 16S rRNA gene generated and curated by the Earth Microbiome Project (Thompson et al., 2017; Madi et al., 2020). We restricted our analysis to the quality control (QC)-filtered subset of the EMP, which was annotated using the closed-reference database SILVA (Quast et al., 2013) and consists of 96 studies totaling 23,828 samples, with each processed sample having ≥10,000 reads. We downloaded the public SILVA reference tree for OTUs with 97% similarity (97_otus.tre) from the EMP database. We identified nine heavily sampled environments in the metadata file emp_qiime_mapping_qc_filtered.tsv and selected 100 random sites from each environment. Summary statistics for each environment are provided (Table 1, Table 2).

Table 1. Summary statistics for the 100 sites randomly selected for each environment.

These statistics reflect the data used for taxonomic coarse-graining, as OTUs lacking taxonomic labels were excluded from taxonomic coarse-graining analyses.

| Environment | Total # OTUs | Mean # OTUs | Mean # reads |
|---|---|---|---|
| Marine | 9,090 | 690.80 | 95,129.43 |
| Marine sediment | 16,110 | 1,393.67 | 40,440.38 |
| Human gut | 6,175 | 599.09 | 32,894.50 |
| Human oral | 4,716 | 537.45 | 44,271.22 |
| Human skin | 17,955 | 1,293.45 | 36,344.13 |
| Freshwater sediment | 12,231 | 1,080.95 | 18,979.56 |
| Microbial mat | 5,087 | 200.24 | 8,659.36 |
| Freshwater | 12,052 | 822.37 | 33,646.53 |
| Soil | 20,298 | 1,814.76 | 36,268.93 |

Table 2. Summary statistics for the 100 sites randomly selected for each environment.

These statistics reflect the data used for phylogenetic coarse-graining as all OTUs could be used.

| Environment | Total # OTUs | Mean # OTUs | Mean # reads |
|---|---|---|---|
| Marine | 18,173 | 1,356.37 | 168,520.66 |
| Marine sediment | 41,304 | 4,167.25 | 106,166.92 |
| Human gut | 10,190 | 862.73 | 44,031.12 |
| Human oral | 7,062 | 614.97 | 46,104.84 |
| Human skin | 29,448 | 1,817.12 | 48,285.68 |
| Freshwater sediment | 33,193 | 3,569.59 | 65,582.56 |
| Microbial mat | 11,869 | 431.42 | 23,216.02 |
| Freshwater | 26,645 | 1,775.89 | 74,298.17 |
| Soil | 45,273 | 4,730.74 | 106,578.45 |

We briefly note that our occupancy and richness predictions depend on the form of the gamma distribution that explicitly accounts for sampling as a multinomial process. The multinomial distribution describes the probability of sampling n reads given a relative abundance of x and total read count N with replacement, a process we can model as the Poisson limit of a binomial sampling process for individual community members. Given this choice and the past success of the gamma distribution, we deviated from past analyses by electing not to sub-sample read counts to the same depth, as sampling without replacement would bias the sampling distribution for rare community members (Madi et al., 2020).

Coarse-graining protocol

Taxonomic coarse-graining was performed as the summation of the abundances of all OTUs within a given taxonomic group. We removed taxa with indeterminate labels (e.g. ‘uncultured,’ ‘ambiguous taxa,’ ‘candidatus,’ ‘unclassified,’ etc.) to prevent potential biases due to taxonomic misassignment. Manual inspection of EMP taxonomic annotations revealed a low number of OTUs that had been assigned the taxonomic label of their host (e.g. Arachis hypogaea (peanut)). These marked OTUs were removed from all downstream analyses.
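
A minimal sketch of this summation step is given below. The function name `coarse_grain_taxonomic`, the substring-matching rule for indeterminate labels, and the toy data are assumptions made for illustration; the filtering applied in this study may differ in its details, and the removal of host-labeled OTUs is not shown.

```python
import numpy as np

# Hypothetical list of tags marking indeterminate labels, matched as substrings.
INDETERMINATE = ("uncultured", "ambiguous taxa", "candidatus", "unclassified")

def coarse_grain_taxonomic(counts, labels):
    """Sum the read counts of all OTUs sharing a taxonomic label.

    counts: (M sites x S OTUs) array of read counts.
    labels: length-S sequence with the taxonomic label of each OTU at the chosen rank.
    Returns the sorted group labels and an (M x G) array of summed counts.
    """
    counts = np.asarray(counts)
    labels = np.asarray([str(label) for label in labels])
    # Drop OTUs whose label contains an indeterminate tag.
    keep = np.array(
        [not any(tag in label.lower() for tag in INDETERMINATE) for label in labels]
    )
    kept_counts = counts[:, keep]
    groups, inverse = np.unique(labels[keep], return_inverse=True)
    coarse = np.zeros((counts.shape[0], len(groups)), dtype=counts.dtype)
    for k in range(len(groups)):
        coarse[:, k] = kept_counts[:, inverse == k].sum(axis=1)
    return groups, coarse

counts = np.array([[1, 2, 3, 4],
                   [0, 5, 0, 1]])
labels = ["Bacillus", "Bacillus", "uncultured bacterium", "Clostridium"]
groups, coarse = coarse_grain_taxonomic(counts, labels)
```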

Phylogenetic coarse-graining was performed using the phylogenetic tree provided by SILVA 123 (97_otus.tre) in the EMP release. Each internal node of the phylogenetic tree was collapsed if the mean branch length of its descendants was less than a given distance. All phylogenetic operations were performed using the Python package ETE3 (Huerta-Cepas et al., 2016).

Deriving biodiversity measure predictions

While the gamma distribution as the stationary solution of the SLM and the sampling form of the gamma distribution have been previously derived (Grilli, 2020), we briefly outline relevant derivations here for the convenience of the reader before deriving the predicted richness and diversity of a community. We define the SLM as the following Langevin equation

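$$\frac{dx_i}{dt} = \underbrace{\frac{x_i}{\tau_i}\left(1 - \frac{x_i}{K_i}\right)}_{\text{Self-limiting growth}} + \underbrace{\sqrt{\frac{\sigma_{\tau_i}}{\tau_i}}\, x_i\, \eta(t)}_{\text{Environmental noise}} \tag{5}$$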

Here $\tau_i$, $K_i$, and $\sigma_{\tau_i}$ represent the timescale of growth, the carrying capacity, and the coefficient of variation of growth rate fluctuations, respectively. Multiplicative environmental noise is captured by the product of a linear frequency term, the coefficient of variation of growth rate fluctuations, and a Brownian noise term $\eta(t)$ that introduces stochasticity into the equation. The expected value of $\eta(t)$ is $\langle \eta(t) \rangle = 0$ (Gardiner, 2009). The dependence of $\eta(t)$ at time $t$ on its value at an earlier time $t'$ is defined as $\langle \eta(t)\eta(t') \rangle = \delta(t - t')$ (Gardiner, 2009). This standard definition means that if the noise term is shifted in time, it has zero correlation with itself. We briefly note that because DBD patterns were originally investigated by Madi et al. using an ensemble of sites that belong to the same type of ecosystem rather than the time series of a single site (Madi et al., 2020), the gamma distribution alone does not prove the validity of the SLM, nor does it prove the validity of alternatively formulated stochastic differential equations of ecology that also predict a gamma distribution (e.g. George and O’Dwyer, 2022). However, given that the SLM has successfully characterized the temporal dynamics of microbial communities, we believe that this model is an appropriate formulation for investigating DBD patterns (Grilli, 2020; Wolff et al., 2023; Descheemaeker and de Buyl, 2020).
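
The dynamics of Equation 5 can be explored numerically with a simple Euler-Maruyama scheme. The sketch below is illustrative only and was not part of the analyses in this study: the function name `simulate_slm`, the parameter values, and the step size are hypothetical, and the Itô interpretation of the noise term is assumed. Under that assumption the long-run mean of the simulated ensemble should approach the stationary mean $K(1 - \sigma/2)$, where $\sigma$ denotes the coefficient of variation of growth rate fluctuations.

```python
import numpy as np

def simulate_slm(K, sigma, tau, x0, dt, n_steps, n_replicates, rng):
    """Euler-Maruyama integration of the SLM for independent replicates.

    K: carrying capacity; sigma: coefficient of variation of growth rate
    fluctuations; tau: timescale of growth. The Ito interpretation is assumed.
    Returns the abundances of all replicates after n_steps.
    """
    x = np.full(n_replicates, float(x0))
    for _ in range(n_steps):
        dW = rng.normal(size=n_replicates) * np.sqrt(dt)  # Wiener increments
        drift = (x / tau) * (1.0 - x / K) * dt            # self-limiting growth
        diffusion = np.sqrt(sigma / tau) * x * dW         # multiplicative environmental noise
        x = np.maximum(x + drift + diffusion, 1e-12)      # guard against non-positive values
    return x

rng = np.random.default_rng(1)
x_final = simulate_slm(K=1.0, sigma=0.5, tau=1.0, x0=0.5,
                       dt=0.01, n_steps=5000, n_replicates=2000, rng=rng)

# Compare the ensemble mean with K * (1 - sigma / 2) = 0.75.
print(x_final.mean())
```

Treating the replicates as an ensemble of sites mirrors the ensemble view adopted in this study, in which different sites are viewed as copies of a given environment.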

As a point of contrast with the SLM, macroecological predictions can also be derived from the UNTB. There are many forms of the UNTB, but the novelty of observed fine vs. coarse-grained relationships was assessed using a form of the UNTB that predicts that the distribution of community member abundances within a given site follows a zero-sum multinomial distribution (Alonso and McKane, 2004; Madi et al., 2020). For the convenience of the reader, the predicted richness using the form of the UNTB relevant to this study has been rederived (Supporting information).

The stationary distribution of the SLM can be derived using the Itô ↔ Fokker-Planck equivalence and solving for the stationary solution (Grilli, 2020; Engen and Lande, 1996), resulting in the gamma-distributed AFD. Through the SLM, we can define the mean relative abundance and its squared inverse coefficient of variation as $\bar{x}_i = K_i\left(1 - \frac{\sigma_{\tau_i}}{2}\right)$ and $\beta_i = \frac{2 - \sigma_{\tau_i}}{\sigma_{\tau_i}}$, respectively. These parameters were estimated from the empirical data and were used below to obtain predictions. Using these definitions and the stationary distribution of Equation 5, we obtained the gamma distribution

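$$P(x_i \,|\, \bar{x}_i, \beta_i) = \frac{1}{\Gamma(\beta_i)} \left(\frac{\beta_i}{\bar{x}_i}\right)^{\beta_i} \exp\left[-\frac{\beta_i}{\bar{x}_i} x_i\right] x_i^{\beta_i - 1} \tag{6}$$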
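
The step from the Langevin equation to the gamma distribution can be outlined as follows. This is a brief sketch rather than the full derivation given by Grilli (2020); it assumes the Itô interpretation, drops the index $i$, writes $\sigma$ for the coefficient of variation of growth rate fluctuations, and requires $\sigma < 2$ for the result to be normalizable.

```latex
\begin{align}
\frac{\partial p}{\partial t} &= -\frac{\partial}{\partial x}\left[\frac{x}{\tau}\left(1 - \frac{x}{K}\right) p\right]
  + \frac{\sigma}{2\tau}\frac{\partial^2}{\partial x^2}\left[x^2 p\right]
  && \text{(Fokker-Planck equation)} \\
0 &= \frac{x}{\tau}\left(1 - \frac{x}{K}\right) p - \frac{\sigma}{2\tau}\frac{d}{dx}\left[x^2 p\right]
  && \text{(stationarity with zero flux)} \\
\frac{d \ln\left(x^2 p\right)}{dx} &= \frac{2}{\sigma}\left(\frac{1}{x} - \frac{1}{K}\right) \\
p(x) &\propto x^{\frac{2}{\sigma} - 2} \exp\left[-\frac{2x}{\sigma K}\right]
\end{align}
```

Matching the exponent and the rate of this expression to a gamma distribution with shape $\beta$ and rate $\beta / \bar{x}$ gives $\beta = (2 - \sigma)/\sigma$ and $\bar{x} = K(1 - \sigma/2)$, in agreement with the definitions of the mean relative abundance and the squared inverse coefficient of variation.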

When microbial communities are sequenced, one obtains read counts rather than actual abundances. Therefore, it is necessary to account for sampling when applying the gamma distribution to empirical data. We can account for sampling by first assuming that the probability of observing a single community member can be modeled as a binomial sampling process. Given that the total number of reads is typically large ($N \gg 1$) and the typical relative abundance of a community member is much smaller than one ($x_i \ll 1$), the binomial can be approximated as a Poisson sampling process with the following probability of sampling $n$ reads

$$P(n \,|\, N, x_i) = \frac{(N x_i)^n e^{-N x_i}}{n!} \tag{7}$$

This formulation of the sampling process is convenient, as it can be used to obtain an analytic solution for the probability of observing $n$ reads given $\bar{x}_i$ and $\beta_i$, the parameters we estimate from the data. This distribution can be obtained by solving the convolution of the Poisson and the gamma distribution (Grilli, 2020). The resulting distribution can be considered a negative binomial distribution if sites have identical sampling depths (Fisher, 1941). Using this distribution, we calculated the probability of obtaining $n_m$ reads out of a total sampling depth of $N_m$ for the $i$th OTU in sample $m$ as

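$$P(n_m \mid N_m, \bar{x}_i, \beta_i) = \int_0^{\infty} P(x_i \mid \bar{x}_i, \beta_i)\, P(n_m \mid N_m, x_i)\, dx_i \tag{8a}$$
$${} = \frac{\Gamma(\beta_i + n_m)}{n_m!\, \Gamma(\beta_i)} \left( \frac{\bar{x}_i N_m}{\beta_i + \bar{x}_i N_m} \right)^{n_m} \left( \frac{\beta_i}{\beta_i + \bar{x}_i N_m} \right)^{\beta_i} \tag{8b}$$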
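
A minimal sketch of Equation 8b follows, written with log-gamma functions for numerical stability. The function name and the parameter values are hypothetical. As a sanity check, the probabilities should sum to one and the expected number of reads should equal $\bar{x}_i N_m$.

```python
import math

def sampling_pmf(n, N, mean, beta):
    """Equation 8b: probability of n reads at depth N (gamma-Poisson mixture)."""
    log_p = (math.lgamma(beta + n) - math.lgamma(n + 1) - math.lgamma(beta)
             + n * math.log(mean * N / (beta + mean * N))
             + beta * math.log(beta / (beta + mean * N)))
    return math.exp(log_p)

# Hypothetical depth, mean relative abundance, and beta.
N, mean, beta = 10_000, 5e-4, 1.5
probs = [sampling_pmf(n, N, mean, beta) for n in range(0, 2_000)]
total = sum(probs)                                      # close to 1
expected_reads = sum(n * p for n, p in enumerate(probs))  # close to mean * N
```

Setting `n = 0` gives the probability of absence, $(\beta_i / (\beta_i + \bar{x}_i N_m))^{\beta_i}$, which is the quantity used in the occupancy and richness predictions.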

This distribution requires two parameters that can be estimated from the data ($\bar{x}_i$ and $\beta_i$) and one parameter that is known (total number of reads, $N_m$). This equation will be used to obtain predictions of measures of biodiversity. First, noticing that the probability of a community member’s absence is the complement of its presence, we can define the expected occupancy of a community member across $M$ sites as

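$$\langle o_i \rangle = \frac{1}{M} \sum_{m=1}^{M} \left( 1 - P(0 \mid N_m, \bar{x}_i, \beta_i) \right) \tag{9}$$

and the second moment of occupancy as

$$\langle o_i^2 \rangle = \frac{1}{M} \sum_{m=1}^{M} \left( 1 - P(0 \mid N_m, \bar{x}_i, \beta_i) \right)^2 \tag{10}$$

from which we defined the predicted variance of occupancy

$$\mathrm{Var}(o_i) = \langle o_i^2 \rangle - \langle o_i \rangle^2 \tag{11}$$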
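
Equations 9 to 11 translate directly into a few lines of code. The sketch below is illustrative only: the function names and the read depths are hypothetical, and the probability of absence is taken from Equation 8b with zero reads.

```python
def prob_absent(N, mean, beta):
    """P(0 | N, mean, beta): Equation 8b evaluated at zero reads."""
    return (beta / (beta + mean * N)) ** beta

def occupancy_moments(depths, mean, beta):
    """Mean, second moment, and variance of occupancy (Equations 9 to 11)."""
    presence = [1.0 - prob_absent(N, mean, beta) for N in depths]
    first = sum(presence) / len(presence)
    second = sum(p ** 2 for p in presence) / len(presence)
    return first, second, second - first ** 2

# Hypothetical read depths for three sites.
depths = [5_000, 20_000, 80_000]
o_mean, o_second, o_var = occupancy_moments(depths, mean=1e-4, beta=0.8)
```

Note that under these definitions the predicted variance of occupancy is zero whenever all sites share the same sampling depth.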

The success of our predictions was assessed using the relative error.

$$\varepsilon = \left| \frac{\mathrm{Obs.} - \mathrm{Pred.}}{\mathrm{Obs.}} \right| \tag{12}$$

Using the definition of occupancy from the sampling form of the gamma distribution, we derived the expected richness of a community as

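$$\langle S \rangle = \sum_{i=1}^{S_{\mathrm{total}}} \langle o_i \rangle \tag{13a}$$
$${} = \frac{1}{M} \sum_{m=1}^{M} \sum_{i=1}^{S_{\mathrm{obs}}} \left( 1 - P(0 \mid N_m, \bar{x}_i, \beta_i) \right) \tag{13b}$$
$${} = S_{\mathrm{total}} - \frac{1}{M} \sum_{m=1}^{M} \sum_{i=1}^{S_{\mathrm{total}}} P(0 \mid N_m, \bar{x}_i, \beta_i) \tag{13c}$$

where $S_{\mathrm{total}}$ is the total number of observed community members. Similarly, we derived the expected value of Shannon’s diversity (Magurran, 2004).

$$\langle H \rangle = \frac{1}{M} \sum_{m=1}^{M} \langle H_m \rangle \tag{14a}$$
$${} = -\frac{1}{M} \sum_{m=1}^{M} \sum_{i=1}^{S_{\mathrm{obs}}} \langle x \ln[x] \mid N_m, \bar{x}_i, \beta_i \rangle \tag{14b}$$
$${} = -\frac{1}{M} \sum_{m=1}^{M} \sum_{i=1}^{S_{\mathrm{total}}} \int_0^{N_m} \frac{n}{N_m} \ln\left[ \frac{n}{N_m} \right] P(n \mid N_m, \bar{x}_i, \beta_i)\, dn \tag{14c}$$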

In physics parlance, predictions that neglect interactions between community members are known as mean-field predictions. We calculated the mean-field prediction of Equation 13a from empirical data. However, there is no known analytic solution for the integral inside the sum of Equation 14c. To calculate $\langle H \rangle$, we performed numerical integration on each integral for each taxon in each sample at a given coarse-grained resolution using the quad() function from SciPy.
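
The two mean-field predictions can be sketched as follows. This is not the authors' code: to keep the example dependency-free, SciPy's quad() is swapped for a composite Simpson rule, Equation 8b is evaluated at real-valued $n$ through log-gamma functions, and all function names and parameter values are hypothetical.

```python
import math

def sampling_pmf(n, N, mean, beta):
    """Equation 8b, evaluated for real-valued n via log-gamma functions."""
    log_p = (math.lgamma(beta + n) - math.lgamma(n + 1.0) - math.lgamma(beta)
             + n * math.log(mean * N / (beta + mean * N))
             + beta * math.log(beta / (beta + mean * N)))
    return math.exp(log_p)

def expected_richness(depths, means, betas):
    """Equation 13c: S_total minus the site-averaged summed probability of absence."""
    absent = sum(sampling_pmf(0.0, N, m, b)
                 for N in depths for m, b in zip(means, betas))
    return len(means) - absent / len(depths)

def diversity_term(N, mean, beta, steps=4000):
    """Simpson estimate of the integral in Equation 14c (steps must be even)."""
    def integrand(n):
        if n <= 0.0:
            return 0.0  # x ln x tends to 0 as x tends to 0
        x = n / N
        return x * math.log(x) * sampling_pmf(n, N, mean, beta)
    h = N / steps
    total = integrand(0.0) + integrand(float(N))
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0

def expected_diversity(depths, means, betas):
    """Equations 14a to 14c: mean-field Shannon diversity averaged over sites."""
    per_site = [-sum(diversity_term(N, m, b) for m, b in zip(means, betas))
                for N in depths]
    return sum(per_site) / len(per_site)

# Hypothetical depths and per-taxon parameters.
depths = [2_000, 4_000]
means = [0.3, 0.1, 0.01]
betas = [2.0, 1.0, 0.5]
S_pred = expected_richness(depths, means, betas)
H_pred = expected_diversity(depths, means, betas)
```

The number of Simpson steps is a tuning choice; an adaptive routine such as quad() would be preferable for production use, particularly at large sampling depths.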

To predict the variance of each measure we derived the expected value of the second moment, assuming independence among community members. We derived the second moments of richness and diversity.

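$$\langle S^2 \rangle = \left\langle \left( \sum_{i=1}^{S_{\mathrm{total}}} o_i \right)^2 \right\rangle \tag{15a}$$
$${} = \frac{1}{M} \sum_{m=1}^{M} \left( \sum_{i=1}^{S_{\mathrm{total}}} o_{i,m} \right)^2 \tag{15b}$$
$${} = \frac{1}{M} \sum_{m=1}^{M} \sum_{i,j} o_{i,m}\, o_{j,m} \underbrace{\int dn\, P(n \mid N, \bar{x}_i, \beta_i)\, \delta_{N,N_m}}_{=1} \tag{15c}$$
$${} = \int dn\, P(n \mid N, \bar{x}_i, \beta_i) \sum_{i,j} \frac{1}{M} \sum_{m=1}^{M} \delta_{N,N_m}\, o_{i,m}\, o_{j,m} \tag{15d}$$
$${} = \sum_{i=1}^{S_{\mathrm{total}}} \int dn\, P(n \mid N, \bar{x}_i, \beta_i)\, o_i^2\, \delta_{N,N_m} + \sum_{i \neq j} \int dn\, P(n \mid N, \bar{x}_i, \beta_i)\, o_i\, o_j\, \delta_{N,N_m} \tag{15e}$$
$${} = \sum_{i=1}^{S_{\mathrm{total}}} \langle o_i^2 \mid N_m, \bar{x}_i, \beta_i \rangle + \sum_{i \neq j} \langle o_i \mid N_m, \bar{x}_i, \beta_i \rangle \langle o_j \mid N_m, \bar{x}_j, \beta_j \rangle \tag{15f}$$

where $\delta_{i,j}$ is the Kronecker delta.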

By performing an analogous series of operations, we obtained the expected value of the second moment for diversity.

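$$\langle H^2 \rangle = \left\langle \left( \sum_{i=1}^{S_{\mathrm{total}}} x_i \ln[x_i] \right)^2 \right\rangle \tag{16a}$$
$${} = \frac{1}{M} \sum_{m=1}^{M} \left( \sum_{i=1}^{S_{\mathrm{total}}} x_i \ln[x_i] \right)^2 \tag{16b}$$
$${} = \frac{1}{M} \sum_{m=1}^{M} \sum_{i,j} (x_i \ln[x_i]) (x_j \ln[x_j]) \int dn\, P(n \mid N, \bar{x}_i, \beta_i)\, \delta_{N,N_m} \tag{16c}$$
$${} = \frac{1}{M} \sum_{m=1}^{M} \sum_{i=1}^{S_{\mathrm{total}}} \int dn\, P(n \mid N, \bar{x}_i, \beta_i)\, \delta_{N,N_m}\, (x_i \ln[x_i])^2 \tag{16d}$$
$${} + \frac{1}{M} \sum_{m=1}^{M} \sum_{i \neq j} \int dn\, P(n \mid N, \bar{x}_i, \beta_i)\, \delta_{N,N_m}\, (x_i \ln[x_i]) (x_j \ln[x_j]) \tag{16e}$$
$${} = \frac{1}{M} \sum_{m=1}^{M} \sum_{i=1}^{S_{\mathrm{total}}} \langle (x \ln[x])^2 \mid N_m, \bar{x}_i, \beta_i \rangle \tag{16f}$$
$${} + \frac{1}{M} \sum_{i \neq j} \sum_{m=1}^{M} \langle x \ln[x] \mid N_m, \bar{x}_i, \beta_i \rangle \langle x \ln[x] \mid N_m, \bar{x}_j, \beta_j \rangle \tag{16g}$$

where the expected value of the second moment of the diversity term is defined as $\langle (x \ln[x])^2 \mid N_m, \bar{x}_s, \beta_s \rangle = \int_0^{N_m} \left( \frac{n}{N_m} \ln\left[ \frac{n}{N_m} \right] \right)^2 P(n \mid N_m, \bar{x}_s, \beta_s)\, dn$. From these second moments we obtained the expected value of the variance

$$\mathrm{Var}(S) = \langle S^2 \rangle - \langle S \rangle^2 \tag{17a}$$
$$\mathrm{Var}(H) = \langle H^2 \rangle - \langle H \rangle^2 \tag{17b}$$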
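
For richness, the final line of Equation 15 and Equation 17a can be sketched in a few lines. The function names below are hypothetical and the code is an illustration under the independence assumption, not the authors' implementation; the cross term uses the identity $\sum_{i \neq j} a_i a_j = (\sum_i a_i)^2 - \sum_i a_i^2$.

```python
def prob_absent(N, mean, beta):
    """P(0 | N, mean, beta): Equation 8b evaluated at zero reads."""
    return (beta / (beta + mean * N)) ** beta

def richness_mean_and_variance(depths, means, betas):
    """Mean-field <S> and Var(S) from Equations 15f and 17a."""
    M = len(depths)
    first, second = [], []
    for mean, beta in zip(means, betas):
        presence = [1.0 - prob_absent(N, mean, beta) for N in depths]
        first.append(sum(presence) / M)                  # <o_i>, Equation 9
        second.append(sum(p * p for p in presence) / M)  # <o_i^2>, Equation 10
    s_mean = sum(first)
    cross = s_mean ** 2 - sum(f * f for f in first)  # sum over i != j of <o_i><o_j>
    s_second = sum(second) + cross                   # Equation 15f
    return s_mean, s_second - s_mean ** 2            # Equation 17a

# Hypothetical depths and per-taxon parameters.
S_mean, S_var = richness_mean_and_variance(
    depths=[1_000, 10_000, 100_000], means=[1e-3, 1e-4], betas=[1.0, 0.5])
```

In this sketch the predicted variance of richness algebraically reduces to the sum of the per-taxon occupancy variances, so it vanishes when all sites have identical sampling depths. The analogous calculation for diversity requires the numerical integrals of Equation 16.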

We predicted the mean and variance of richness and diversity separately at each coarse-grained scale. Specifically, we coarse-grained the empirical data, estimated $\bar{x}_s$ and $\beta_s$ for each coarse-grained community member, and used these estimates to obtain a prediction for each measure of biodiversity.

It is worth noting why the above functions constitute predictions. To obtain values that we can compare with empirical data we estimated the mean and variance of relative abundance across sites for each community member at a given scale. These parameters were used to obtain the expected value of a community-level measure (e.g. richness) using a function. These functions were derived under the assumption that a given probability distribution (i.e. the gamma) provided an appropriate description of the distribution of relative abundances across sites. We then compared the expected value of a community-level measure to the mean value from empirical data and assessed the similarity between the two values.

Fine vs. coarse-grained relationship slope inference

In order to predict the relationship between the measures within a coarse-grained group and that among all remaining groups, we calculated a vector of predicted richness or diversity estimates for all sites using Equation 3a or Equation 4a within a given coarse-grained group and Equation 3b or Equation 4b among the remaining groups. This ‘leave-one-out’ procedure was originally implemented by Madi et al., who examined the slope of fine vs. coarse-grained measures of diversity as a sliding window across taxonomic ranks, with both the fine and coarse scales increasing with each rank (e.g. genus:family, family:order, etc.) (Madi et al., 2020). To maintain consistency, we used the same definition for our predictions. We also extended the definition to the case of phylogenetic coarse-graining, where we compared fine and coarse scales using different phylogenetic distances while retaining the same ratio (e.g. 0.1:0.3, 0.3:0.5, etc.). Slopes were estimated using ordinary least squares regression with SciPy. Throughout the manuscript the success of a prediction was evaluated by calculating its relative error (Equation 12). We only inferred the slope if a fine-grained group had at least five members, and we only examined the slopes of a given coarse-grained threshold if at least three slopes could be inferred.
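
The filtering rules and slope estimate can be sketched as below. This is a hypothetical illustration: the data layout, the function names, and the choice of regressing the within-group measure on the measure among the remaining groups are our assumptions, and a hand-written least squares slope stands in for the SciPy routine used in the study.

```python
def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def infer_slopes(groups, min_members=5, min_slopes=3):
    """Return {group: slope}, or {} when fewer than `min_slopes` can be inferred.

    `groups` maps a coarse-grained group name to a dict with
    'n_members' (number of fine-grained members in the group),
    'within' (per-site measure inside the focal group), and
    'remaining' (per-site measure among all other groups).
    """
    slopes = {}
    for name, g in groups.items():
        if g["n_members"] < min_members:
            continue  # too few members to infer a slope
        slopes[name] = ols_slope(g["remaining"], g["within"])
    return slopes if len(slopes) >= min_slopes else {}

# Hypothetical per-site richness values for four coarse-grained groups.
example = {
    "A": {"n_members": 8, "within": [2, 3, 5, 6], "remaining": [10, 14, 22, 25]},
    "B": {"n_members": 6, "within": [1, 2, 2, 4], "remaining": [11, 15, 25, 27]},
    "C": {"n_members": 5, "within": [3, 3, 4, 7], "remaining": [9, 14, 23, 24]},
    "D": {"n_members": 2, "within": [1, 1, 2, 2], "remaining": [11, 16, 25, 29]},
}
slopes = infer_slopes(example)  # group "D" is skipped (fewer than five members)
```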

Simulating communities of correlated gamma-distributed AFDs

Correlated gamma-distributed AFDs were simulated by performing inverse transform sampling. For each environment with $M$ sites, an $M \times S_{\mathrm{obs}}$ matrix $Z$ was generated from the standard Gaussian distribution using the empirical $S_{\mathrm{obs}} \times S_{\mathrm{obs}}$ correlation matrix calculated from relative abundances. The cumulative distribution $U = \Phi_{\mathrm{Gaus.}}(Z)$ was calculated, and a matrix of the abundances of community members across sites was obtained using the percent point function (i.e. the inverse cumulative distribution function) of the gamma distribution together with the empirical distribution of mean relative abundances and the squared inverse coefficient of variation of abundances: $\bar{\mathbf{x}} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_{S_{\mathrm{obs}}})$, $\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_{S_{\mathrm{obs}}})$. To simulate the process of sampling, each community of the resulting $M \times S_{\mathrm{obs}}$ matrix of true relative abundances $X = \Phi^{-1}_{\mathrm{Gamma}}(U)$ was sampled using a multinomial distribution with the empirical distribution of total read counts.
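
The procedure can be sketched with the Python standard library. This is an illustration under stated assumptions rather than the authors' code: the correlation matrix is assumed to be positive definite, the gamma inverse CDF is obtained by bisection on a power-series CDF instead of a library routine, abundances are implicitly normalized within each site before multinomial sampling, and all names and values are hypothetical.

```python
import math
import random
from statistics import NormalDist

def cholesky(corr):
    """Lower-triangular L with L L^T = corr (corr must be positive definite)."""
    size = len(corr)
    L = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(corr[i][i] - s)
            else:
                L[i][j] = (corr[i][j] - s) / L[j][j]
    return L

def gamma_cdf(x, shape):
    """Regularized lower incomplete gamma function via its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / shape
    k = 1
    while term > total * 1e-15:
        term *= x / (shape + k)
        total += term
        k += 1
    return total * math.exp(-x + shape * math.log(x) - math.lgamma(shape))

def gamma_ppf(u, mean, beta):
    """Inverse CDF of the gamma AFD (shape beta, scale mean / beta) by bisection."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, beta) < u:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, beta) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) * mean / beta

def simulate_correlated_afds(corr, means, betas, depths, rng):
    """Correlated normals -> uniforms -> gamma abundances -> multinomial reads."""
    L = cholesky(corr)
    phi = NormalDist().cdf
    n_taxa = len(means)
    abundances, counts = [], []
    for N in depths:
        eps = [rng.gauss(0.0, 1.0) for _ in range(n_taxa)]
        z = [sum(L[i][k] * eps[k] for k in range(i + 1)) for i in range(n_taxa)]
        x = [gamma_ppf(phi(zi), m, b) for zi, m, b in zip(z, means, betas)]
        reads = [0] * n_taxa
        # choices() normalizes the weights, giving one multinomial draw of N reads
        for taxon in rng.choices(range(n_taxa), weights=x, k=N):
            reads[taxon] += 1
        abundances.append(x)
        counts.append(reads)
    return abundances, counts

# Hypothetical two-taxon example with 50 sites of 1000 reads each.
rng = random.Random(1)
abund, reads = simulate_correlated_afds(
    [[1.0, 0.8], [0.8, 1.0]], [0.2, 0.1], [2.0, 1.0], [1000] * 50, rng)
```

Because the gamma transform is monotonic, rank correlations among taxa are preserved exactly, whereas Pearson correlations of the resulting abundances will generally differ somewhat from those of the Gaussian draws.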

Acknowledgements

This work was supported by the NSF Postdoctoral Research Fellowships in Biology Program under Grant No. 2010885 (WRS).

Appendix 1

Supporting information: Investigating macroecological patterns in coarse-grained microbial communities using the stochastic logistic model of growth

UNTB richness predictions

Below we rederive a prediction for richness using the form of the UNTB used by Madi et al., as a point of comparison to the SLM predictions derived in the main manuscript (Madi et al., 2020; Alonso and McKane, 2004). When the size of a metacommunity tends towards an asymptotic limit, the stationary distribution for community members of relative abundance x approaches the following continuous distribution (Vallade and Houchmandzadeh, 2003).

$$P(x \mid \theta)\, dx = \frac{\theta}{x} (1 - x)^{\theta - 1}\, dx \tag{S1}$$

where $\theta$ is Hubbell’s biodiversity parameter (also known as Fisher’s $\alpha$; Fisher et al., 1943).

Using this distribution, we can obtain an expression for the expected number of community members with n sampled individuals out of a total sample size of N.

$$\langle S(n \mid N, m, \theta) \rangle = \theta \int_0^1 P(n \mid N, m, x)\, \frac{(1 - x)^{\theta - 1}}{x}\, dx \tag{S2}$$

where $P(n \mid N, m, x)$ is the probability of sampling $n$ individuals of relative abundance $x$ given a total sample size $N$

$$P(n \mid N, m, x) = \binom{N}{n} \frac{\Gamma(n + \gamma x)}{\Gamma(\gamma x)} \frac{\Gamma(N + \gamma (1 - x) - n)}{\Gamma(\gamma (1 - x))} \frac{\Gamma(\gamma)}{\Gamma(\gamma + N)} \tag{S3}$$

where $\gamma = \frac{m (N - 1)}{1 - m}$. Equation S2 is known as the migration-limited zero-sum multinomial distribution (ZSM) (Alonso and McKane, 2004). As $m \to 1$, Equation S2 approaches a limiting form known as the metacommunity zero-sum multinomial distribution (mZSM). The process of sampling community members under the mZSM can be represented as a binomial distribution.

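$$\langle S(n \mid N, \theta) \rangle = \theta \int_0^1 \binom{N}{n} x^{n} (1 - x)^{N - n}\, \frac{(1 - x)^{\theta - 1}}{x}\, dx \tag{S4}$$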

Similar to our analysis using the SLM, the binomial can be approximated as a Poisson distribution.

$$\langle S(n \mid N, \theta) \rangle = \theta \int_0^1 \frac{e^{-xN} (xN)^{n}}{n!}\, \frac{(1 - x)^{\theta - 1}}{x}\, dx \tag{S5}$$

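The resulting integral can be obtained by using the change of variable $y = xN$ and rearranging terms, then approximating the upper limit of integration as infinity (since $N \gg 1$).

$$\langle S(n \mid N, \theta) \rangle = \theta \int_0^{N} \frac{e^{-y} y^{n}}{n!} \left( 1 - \frac{y}{N} \right)^{\theta - 1} \frac{N}{y}\, \frac{dy}{N} \tag{S6a}$$
$${} = \frac{\theta}{n} \int_0^{N} \underbrace{\frac{e^{-y} y^{n - 1}}{(n - 1)!}}_{\text{Gamma distribution}} \left( 1 - \frac{y}{N} \right)^{\theta - 1} dy \tag{S6b}$$
$${} \approx \frac{\theta}{n} \int_0^{\infty} \underbrace{\frac{e^{-y} y^{n - 1}}{(n - 1)!}}_{\text{Gamma distribution}} \left( 1 - \frac{y}{N} \right)^{\theta - 1} dy \tag{S6c}$$
$${} = \frac{\theta}{n} \left\langle \left( 1 - \frac{Y}{N} \right)^{\theta - 1} \right\rangle \tag{S6d}$$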

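By rearranging terms, we obtain a gamma distribution with shape parameter $n$ and rate parameter 1. Because we integrated over a product with a gamma distribution, the variable $Y$ is a gamma-distributed random variable. We can then expand the term in the integral using a Taylor series around $Y = n$, noticing that $\langle Y \rangle = n$ and $\langle Y^2 \rangle = n^2 + n$ under a gamma distribution.

$$\langle S(n \mid N, \theta) \rangle = \frac{\theta}{n} \left( 1 - \frac{n}{N} \right)^{\theta - 1} + \frac{1}{2} \frac{\theta (\theta - 1)(\theta - 2)}{N^2} \left( 1 - \frac{n}{N} \right)^{\theta - 3} + \mathcal{O}(N^{-3}) \tag{S7}$$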

We can then predict the richness of a community by summing over all abundances $n$ from 1 to $N$.

$$\langle S(N, \theta) \rangle = \sum_{n=1}^{N} \langle S(n \mid N, \theta) \rangle \tag{S8}$$

This quantity represents the total observed richness of a sample from a panmictic infinite metacommunity after accounting for sampling. Predictions of mean richness over M sites can then be calculated as

$$\langle S(\theta) \rangle = \frac{1}{M} \sum_{m=1}^{M} \langle S(N_m, \theta) \rangle \tag{S9}$$
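
Equations S7 to S9 can be sketched as follows. The function names and read depths are hypothetical, and the sketch assumes $\theta > 3$ so that the second-order term of Equation S7 remains finite at $n = N$.

```python
def untb_species_with_n(n, N, theta):
    """Equation S7, truncated after the second-order term."""
    base = 1.0 - n / N
    first = (theta / n) * base ** (theta - 1.0)
    second = (0.5 * theta * (theta - 1.0) * (theta - 2.0) / N ** 2
              * base ** (theta - 3.0))
    return first + second

def untb_richness(N, theta):
    """Equation S8: sum over abundances n = 1, ..., N."""
    if theta <= 3.0:
        raise ValueError("this sketch assumes theta > 3")
    return sum(untb_species_with_n(n, N, theta) for n in range(1, N + 1))

def untb_mean_richness(depths, theta):
    """Equation S9: mean predicted richness over sites."""
    return sum(untb_richness(N, theta) for N in depths) / len(depths)

# Hypothetical read depths; theta = 50 as in the simulations described in this appendix.
mean_richness = untb_mean_richness([5_000, 20_000], theta=50.0)
```

Under this form of the UNTB the predicted richness depends only on $\theta$ and the sampling depth, increasing roughly logarithmically with the number of reads.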

Simulating fine vs. coarse-grained richness slopes under the UNTB

We followed the procedure of Madi et al. to obtain fine vs. coarse-grained slopes for richness so that they could be compared to predictions obtained from the SLM (Madi et al., 2020). We simulated SADs according to the mZSM model outlined above using the rmzsm() function from the R package sads v0.4.2. We simulated 100 SADs using the empirical distribution of total read counts and the total number of observed OTUs. We set the biodiversity parameter $\theta = 50$ for all environments. The SADs returned by rmzsm() contain no zeros, meaning that values of richness are identical for all UNTB SADs. In order to introduce zeros so that richness estimates could vary, we followed the procedure used by Madi et al., in which each simulated SAD was rarefied to 5000 individuals. We repeated this rarefaction procedure on the empirical SADs. We then performed taxonomic and phylogenetic coarse-graining and fine vs. coarse-grained slope inference using the procedure described in the Materials and methods.
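
The rarefaction step can be illustrated with a short sketch. We assume here that rarefaction means subsampling individuals without replacement to a fixed depth; the function name and the example SAD are hypothetical and do not come from the original analysis.

```python
import random

def rarefy(sad, depth, rng):
    """Subsample `depth` individuals without replacement from a list of counts."""
    individuals = [taxon for taxon, count in enumerate(sad) for _ in range(count)]
    if depth > len(individuals):
        raise ValueError("sample has fewer individuals than the rarefaction depth")
    rarefied = [0] * len(sad)
    for taxon in rng.sample(individuals, depth):
        rarefied[taxon] += 1
    return rarefied

# Hypothetical SAD with no zeros, rarefied to 5000 individuals.
rng = random.Random(42)
sad = [6000, 3000, 900, 80, 15, 4, 1]
rarefied = rarefy(sad, 5000, rng)
richness = sum(1 for count in rarefied if count > 0)
```

Rare community members can drop to zero counts after subsampling, which is what allows richness to vary among rarefied SADs.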

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

William R Shoemaker, Email: williamrshoemaker@gmail.com.

Bernhard Schmid, University of Zurich, Switzerland.

Meredith C Schuman, University of Zurich, Switzerland.

Funding Information

This paper was supported by the following grant:

  • National Science Foundation 2010885 to William R Shoemaker.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Resources, Data curation, Software, Formal analysis, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing.

Conceptualization, Formal analysis, Supervision, Methodology, Writing – original draft, Writing – review and editing.

Additional files

MDAR checklist

Data availability

All sequencing data used in this study was obtained from the Earth Microbiome Project (URL: https://ftp.microbio.me/emp/release1/). Processed data used to perform the analyses in this study are available on Zenodo, DOI: https://doi.org/10.5281/zenodo.7692046. All code written for this study is available on GitHub under a GNU General Public License: https://github.com/wrshoemaker/macroeco_phylo (copy archived at Shoemaker, 2023b).

The following dataset was generated:

Shoemaker WR. 2023. Macroecological patterns in coarse-grained microbial communities. Zenodo.

The following previously published datasets were used:

Thompson LR, Sanders JG, McDonald D, Amir A, Ladau J, Locey KJ, Prill RJ, Tripathi A, Gibbons SM, Ackermann G, Navas-Molina JA, Janssen S, Kopylova E, Vázquez-Baeza Y, González A, Morton JT, Mirarab S, Xu ZZ, Jiang L, Haroon MF, Kanbar J, Zhu Q, Song SJ, Kosciolek T, The Earth Microbiome Project Consortium 2017. Earth Microbiome Project mapping files. Earth Microbiome Project. release1/mapping_files

Thompson LR, Sanders JG, McDonald D, Amir A, Ladau J, Locey KJ, Prill RJ, Tripathi A, Gibbons SM, Ackermann G, Navas-Molina JA, Janssen S, Kopylova E, Vázquez-Baeza Y, González A, Morton JT, Mirarab S, Xu ZZ, Jiang L, Haroon MF, Kanbar J, Zhu Q, Song SJ, Kosciolek T, The Earth Microbiome Project Consortium 2017. Earth Microbiome Project phylogeny and taxonomy. Earth Microbiome Project. release1/otu_info/silva_123

Thompson LR, Sanders JG, McDonald D, Amir A, Ladau J, Locey KJ, Prill RJ, Tripathi A, Gibbons SM, Ackermann G, Navas-Molina JA, Janssen S, Kopylova E, Vázquez-Baeza Y, González A, Morton JT, Mirarab S, Xu ZZ, Jiang L, Haroon MF, Kanbar J, Zhu Q, Song SJ, Kosciolek T, The Earth Microbiome Project Consortium 2017. Earth Microbiome Project count data. Earth Microbiome Project. release1/otu_tables/closed_ref_silva

References

  1. Alonso D, McKane AJ. Sampling Hubbell’s neutral theory of biodiversity. Ecology Letters. 2004;7:901–910. doi: 10.1111/j.1461-0248.2004.00640.x.
  2. Azaele S, Pigolotti S, Banavar JR, Maritan A. Dynamical evolution of ecosystems. Nature. 2006;444:926–928. doi: 10.1038/nature05320.
  3. Azaele S, Suweis S, Grilli J, Volkov I, Banavar JR, Maritan A. Statistical mechanics of ecological systems: Neutral theory and beyond. Reviews of Modern Physics. 2016;88:035003. doi: 10.1103/RevModPhys.88.035003.
  4. Barberán A, Casamayor EO, Fierer N. The microbial contribution to macroecology. Frontiers in Microbiology. 2014;5:203. doi: 10.3389/fmicb.2014.00203.
  5. Barnabani M. An approximation to the convolution of gamma distributions. Communications in Statistics - Simulation and Computation. 2017;46:331–343. doi: 10.1080/03610918.2014.963612.
  6. Behme A, Bondesson L. A class of scale mixtures of $\operatorname{Gamma}(k)$-distributions that are generalized gamma convolutions. Bernoulli. 2017;23:773–787. doi: 10.3150/15-BEJ761.
  7. Burns AR, Stephens WZ, Stagaman K, Wong S, Rawls JF, Guillemin K, Bohannan BJ. Contribution of neutral processes to the assembly of gut microbial communities in the zebrafish over host development. The ISME Journal. 2016;10:655–664. doi: 10.1038/ismej.2015.142.
  8. Calcagno V, Jarne P, Loreau M, Mouquet N, David P. Diversity spurs diversification in ecological communities. Nature Communications. 2017;8:15810. doi: 10.1038/ncomms15810.
  9. Chesson P. MacArthur’s consumer-resource model. Theoretical Population Biology. 1990;37:26–38. doi: 10.1016/0040-5809(90)90025-Q.
  10. Covo S, Elalouf A. A novel single-gamma approximation to the sum of independent gamma variables, and A generalization to infinitely divisible distributions. Electronic Journal of Statistics. 2014;8:EJS914. doi: 10.1214/14-EJS914.
  11. Cui W, Marsland R, Mehta P. Diverse communities behave like typical random ecosystems. Physical Review. E. 2021;104:034416. doi: 10.1103/PhysRevE.104.034416.
  12. Dal Bello M, Lee H, Goyal A, Gore J. Resource–diversity relationships in bacterial communities reflect the network structure of microbial metabolism. Nature Ecology & Evolution. 2021;5:1424–1434. doi: 10.1038/s41559-021-01535-8.
  13. Desai MM, Walczak AM, Fisher DS. Genetic diversity and the structure of genealogies in rapidly adapting populations. Genetics. 2013;193:565–585. doi: 10.1534/genetics.112.147157.
  14. Descheemaeker L, de Buyl S. Stochastic logistic models reproduce experimental time series of microbial communities. eLife. 2020;9:e55650. doi: 10.7554/eLife.55650.
  15. Descheemaeker L, Grilli J, de Buyl S. Heavy-tailed abundance distributions from stochastic Lotka-Volterra models. Physical Review. E. 2021;104:034404. doi: 10.1103/PhysRevE.104.034404.
  16. Engen S, Lande R. Population dynamic models generating species abundance distributions of the gamma type. Journal of Theoretical Biology. 1996;178:325–331. doi: 10.1006/jtbi.1996.0028.
  17. Estrela S, Sanchez-Gorostiaga A, Vila JC, Sanchez A. Nutrient dominance governs the assembly of microbial communities in mixed nutrient environments. eLife. 2021;10:e65948. doi: 10.7554/eLife.65948.
  18. Estrela S, Diaz-Colunga J, Vila JCC, Sanchez-Gorostiaga A, Sanchez A. Diversity Begets Diversity under Microbial Niche Construction. bioRxiv. 2022. doi: 10.1101/2022.02.13.480281.
  19. Faith JJ, Guruge JL, Charbonneau M, Subramanian S, Seedorf H, Goodman AL, Clemente JC, Knight R, Heath AC, Leibel RL, Rosenbaum M, Gordon JI. The long-term stability of the human gut microbiota. Science. 2013;341:1237439. doi: 10.1126/science.1237439.
  20. Fisher RA. The negative binomial distribution. Annals of Eugenics. 1941;11:182–187. doi: 10.1111/j.1469-1809.1941.tb02284.x.
  21. Fisher RA, Corbet AS, Williams CB. The relation between the number of species and the number of individuals in a random sample of an animal population. The Journal of Animal Ecology. 1943;12:42. doi: 10.2307/1411.
  22. Gardiner CW. Stochastic Methods: A Handbook for the Natural and Social Sciences. Berlin Heidelberg: Springer; 2009.
  23. Gaston KJ, Blackburn TM, Greenwood JJD, Gregory RD, Quinn RM, Lawton JH. Abundance–occupancy relationships. Journal of Applied Ecology. 2000;37:39–59. doi: 10.1046/j.1365-2664.2000.00485.x.
  24. George AB, O’Dwyer J. Universal Abundance Fluctuations across Microbial Communities, Tropical Forests, and Urban Populations. bioRxiv. 2022. doi: 10.1101/2022.09.14.508016.
  25. Goldford JE, Lu N, Bajić D, Estrela S, Tikhonov M, Sanchez-Gorostiaga A, Segrè D, Mehta P, Sanchez A. Emergent simplicity in microbial community assembly. Science. 2018;361:469–474. doi: 10.1126/science.aat1168.
  26. Good BH, Hallatschek O. Effective models and the search for quantitative principles in microbial evolution. Current Opinion in Microbiology. 2018;45:203–212. doi: 10.1016/j.mib.2018.11.005.
  27. Good BH, Rosenfeld LB. Eco-Evolutionary Feedbacks in the Human Gut Microbiome. bioRxiv. 2022. doi: 10.1101/2022.01.26.477953.
  28. Gotelli NJ, Graves GR. Null models in ecology. 1996. [September 19, 2023]. http://repository.si.edu/xmlui/handle/10088/7782
  29. Gotelli NJ, Ulrich W. Statistical challenges in null model analysis. Oikos. 2012;121:171–180. doi: 10.1111/j.1600-0706.2011.20301.x.
  30. Grilli J. Macroecological laws describe variation and diversity in microbial communities. Nature Communications. 2020;11:4743. doi: 10.1038/s41467-020-18529-y.
  31. Harris K, Parsons TL, Ijaz UZ, Lahti L, Holmes I, Quince C. Linking statistical and ecological theory: Hubbell’s unified neutral theory of biodiversity as a hierarchical dirichlet process. Proceedings of the IEEE. 2017;105:516–529. doi: 10.1109/JPROC.2015.2428213.
  32. Harte J. Maximum Entropy and Ecology: A Theory of Abundance, Distribution, and Energetics. Oxford: Oxford University Press; 2011.
  33. Harvey PH, Colwell RK, Silvertown JW, May RM. Null models in ecology. Annual Review of Ecology and Systematics. 1983;14:189–211. doi: 10.1146/annurev.es.14.110183.001201.
  34. Ho PY, Good BH, Huang KC. Competition for fluctuating resources reproduces statistics of species abundance over time across wide-ranging microbiotas. eLife. 2022;11:e75168. doi: 10.7554/eLife.75168.
  35. Hu C, Pozdnyakov V, Yan J. Density and distribution evaluation for convolution of independent gamma variables. Computational Statistics. 2020;35:327–342. doi: 10.1007/s00180-019-00924-9.
  36. Hubbell SP. The Unified Neutral Theory of Biodiversity and Biogeography (MPB-32). Princeton University Press; 2011.
  37. Huerta-Cepas J, Serra F, Bork P. ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Molecular Biology and Evolution. 2016;33:1635–1638. doi: 10.1093/molbev/msw046.
  38. Jun S, Si F, Pugatch R, Scott M. Fundamental principles in bacterial physiology-history, recent progress, and the future with focus on cell size control: a review. Reports on Progress in Physics. Physical Society. 2018;81:056601. doi: 10.1088/1361-6633/aaa628.
  39. Laland KN, Odling-Smee FJ, Feldman MW. Evolutionary consequences of niche construction and their implications for ecology. PNAS. 1999;96:10242–10247. doi: 10.1073/pnas.96.18.10242.
  40. Lennon JT, Locey KJ. More support for Earth’s massive microbiome. Biology Direct. 2020;15:5. doi: 10.1186/s13062-020-00261-8.
  41. Li L, Ma ZS. Testing the neutral theory of biodiversity with human microbiome datasets. Scientific Reports. 2016;6:31448. doi: 10.1038/srep31448.
  42. Lim JJ, Diener C, Wilson J, Valenzuela JJ, Baliga NS, Gibbons SM. Growth phase estimation for abundant bacterial populations sampled longitudinally from human stool metagenomes. Nature Communications. 2023;14:5682. doi: 10.1038/s41467-023-41424-1.
  43. Locey KJ, Lennon JT. Scaling laws predict global microbial diversity. PNAS. 2016;113:5970–5975. doi: 10.1073/pnas.1521291113.
  44. Louca S, Parfrey LW, Doebeli M. Decoupling function and taxonomy in the global ocean microbiome. Science. 2016;353:1272–1277. doi: 10.1126/science.aaf4507.
  45. Madi N, Vos M, Murall CL, Legendre P, Shapiro BJ. Does diversity beget diversity in microbiomes? eLife. 2020;9:e58999. doi: 10.7554/eLife.58999.
  46. Madi NJ, Chen D, Wolff R, Shapiro BJ, Garud NR. Community diversity is associated with intra-species genetic diversity and gene loss in the human gut microbiome. eLife. 2023;12:e78530. doi: 10.7554/eLife.78530.
  47. Magurran AE. Measuring Biological Diversity. Malden, Ma: Blackwell Pub; 2004.
  48. Maynard DS, Bradford MA, Lindner DL, van Diepen LTA, Frey SD, Glaeser JA, Crowther TW. Diversity begets diversity in competition for space. Nature Ecology & Evolution. 2017;1:0156. doi: 10.1038/s41559-017-0156.
  49. McGill BJ. Towards a unification of unified theories of biodiversity. Ecology Letters. 2010;13:627–642. doi: 10.1111/j.1461-0248.2010.01449.x.
  50. Meshulam L, Gauthier JL, Brody CD, Tank DW, Bialek W. Coarse graining, fixed points, and scaling in a large population of neurons. Physical Review Letters. 2019;123:178103. doi: 10.1103/PhysRevLett.123.178103.
  51. Moran J, Tikhonov M. Defining coarse-grainability in a model of structured microbial ecosystems. Physical Review X. 2022;12:12.021038. doi: 10.1103/PhysRevX.12.021038.
  52. Murakami H. Approximations to the distribution of sum of independent non-identically gamma random variables. Mathematical Sciences. 2015;9:205–213. doi: 10.1007/s40096-015-0169-2.
  53. Nicoletti G, Suweis S, Maritan A. Scaling and criticality in a phenomenological renormalization group. Physical Review Research. 2020;2:023144. doi: 10.1103/PhysRevResearch.2.023144.
  54. O’Dwyer JP, Kembel SW, Sharpton TJ. Backbones of evolutionary history test biodiversity theory for microbes. PNAS. 2015;112:8356–8361. doi: 10.1073/pnas.1419341112.
  55. O’Dwyer JP, Rominger A, Xiao X. Reinterpreting maximum entropy in ecology: a null hypothesis constrained by ecological mechanism. Ecology Letters. 2017;20:832–841. doi: 10.1111/ele.12788.
  56. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Research. 2013;41:D590–D596. doi: 10.1093/nar/gks1219.
  57. San Roman M, Wagner A, Langille M. An enormous potential for niche construction through bacterial cross-feeding in a homogeneous environment. PLOS Computational Biology. 2018;14:e1006340. doi: 10.1371/journal.pcbi.1006340.
  58. Schluter D, Pennell MW. Speciation gradients and the distribution of biodiversity. Nature. 2017;546:48–55. doi: 10.1038/nature22897.
  59. Schweinsberg J. Coalescent processes obtained from supercritical Galton–Watson processes. Stochastic Processes and Their Applications. 2003;106:107–139. doi: 10.1016/S0304-4149(03)00028-0.
  60. Scott M, Klumpp S, Mateescu EM, Hwa T. Emergence of robust growth laws from optimal regulation of ribosome synthesis. Molecular Systems Biology. 2014;10:747. doi: 10.15252/msb.20145379.
  61. Shade A, Dunn RR, Blowes SA, Keil P, Bohannan BJM, Herrmann M, Küsel K, Lennon JT, Sanders NJ, Storch D, Chase J. Macroecology to unite all life, large and small. Trends in Ecology & Evolution. 2018;33:731–744. doi: 10.1016/j.tree.2018.08.005.
  62. Shoemaker WR, Locey KJ, Lennon JT. A macroecological theory of microbial biodiversity. Nature Ecology & Evolution. 2017;1:0107. doi: 10.1038/s41559-017-0107.
  63. Shoemaker WR. A macroecological perspective on genetic diversity in the human gut microbiome. PLOS ONE. 2023a;18:e0288926. doi: 10.1371/journal.pone.0288926.
  64. Shoemaker WR. Macroeco_Phylo. swh:1:rev:3d0eab340cb9fdf6f3399b8f4bc23cd3674c6a72. Software Heritage. 2023b. https://archive.softwareheritage.org/swh:1:dir:1900fe8b4b2c743f8863627e1edcf361bdc884d5;origin=https://github.com/wrshoemaker/macroeco_phylo;visit=swh:1:snp:d3d67c93d7d811c8e62ed8ac4500f03f1919a40f;anchor=swh:1:rev:3d0eab340cb9fdf6f3399b8f4bc23cd3674c6a72
  65. Shoemaker WR, Sánchez Á, Grilli J. Macroecological laws in experimental microbial communities. bioRxiv. 2023c. doi: 10.1101/2023.07.24.550281.
  66. Simberloff D. Competition theory, hypothesis-testing, and other community ecological buzzwords. The American Naturalist. 1983;122:626–635. doi: 10.1086/284163.
  67. Sireci M, Muñoz MA, Grilli J. Environmental fluctuations explain the universal decay of species-abundance correlations with phylogenetic distance. PNAS. 2023;120:2217144120. doi: 10.1073/pnas.2217144120.
  68. Sloan WT, Woodcock S, Lunn M, Head IM, Curtis TP. Modeling taxa-abundance distributions in microbial communities using environmental sequence data. Microbial Ecology. 2007;53:443–455. doi: 10.1007/s00248-006-9141-x.
  69. Smith FA, Gittleman JL, Brown JH. Foundations of Macroecology: Classic Papers with Commentaries. University of Chicago Press; 2014.
  70. Stewart T, Strijbosch LWG, Moors H, Batenburg van P. A Simple Approximation to the Convolution of Gamma Distributions. 2007. [September 19, 2023]. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=900109
  71. Thompson LR, Sanders JG, McDonald D, Amir A, Ladau J, Locey KJ, Prill RJ, Tripathi A, Gibbons SM, Ackermann G, Navas-Molina JA, Janssen S, Kopylova E, Vázquez-Baeza Y, González A, Morton JT, Mirarab S, Zech Xu Z, Jiang L, Haroon MF, Kanbar J, Zhu Q, Jin Song S, Kosciolek T, Bokulich NA, Lefler J, Brislawn CJ, Humphrey G, Owens SM, Hampton-Marcell J, Berg-Lyons D, McKenzie V, Fierer N, Fuhrman JA, Clauset A, Stevens RL, Shade A, Pollard KS, Goodwin KD, Jansson JK, Gilbert JA, Knight R, Earth Microbiome Project Consortium. A communal catalogue reveals Earth’s multiscale microbial diversity. Nature. 2017;551:457–463. doi: 10.1038/nature24621.
  72. Tian L, Wang XW, Wu AK, Fan Y, Friedman J, Dahlin A, Waldor MK, Weinstock GM, Weiss ST, Liu YY. Deciphering functional redundancy in the human microbiome. Nature Communications. 2020;11:6217. doi: 10.1038/s41467-020-19940-1.
  73. Tikhonov M. Theoretical microbial ecology without species. Physical Review. E. 2017;96:032410. doi: 10.1103/PhysRevE.96.032410.
  74. Vallade M, Houchmandzadeh B. Analytical solution of a neutral model of biodiversity. Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics. 2003;68:061902. doi: 10.1103/PhysRevE.68.061902.
  75. Volkov I, Banavar JR, Hubbell SP, Maritan A. Neutral theory and relative species abundance in ecology. Nature. 2003;424:1035–1037. doi: 10.1038/nature01883.
  76. Warren RJ, Costa JT, Bradford MA. Seeing shapes in clouds: the fallacy of deriving ecological hypotheses from statistical distributions. Oikos. 2022;2022:09315. doi: 10.1111/oik.09315.
  77. Whittaker RH. Evolution and measurement of species diversity. TAXON. 1972;21:213–251. doi: 10.2307/1218190.
  78. Williams CB. The generic relations of species in small ecological communities. The Journal of Animal Ecology. 1947;16:11. doi: 10.2307/1502.
  79. Wolff R, Shoemaker W, Garud N. Ecological stability emerges at the level of strains in the human gut microbiome. mBio. 2023;14:e0250222. doi: 10.1128/mbio.02502-22.
  80. Zaoli S, Grilli J. A macroecological description of alternative stable states reproduces intra- and inter-host variability of gut microbiome. Science Advances. 2021;7:abj2882. doi: 10.1126/sciadv.abj2882.
  81. Zaoli S, Grilli J, Bollenbach T. The stochastic logistic model with correlated carrying capacities reproduces beta-diversity metrics of microbial communities. PLOS Computational Biology. 2022;18:e1010043. doi: 10.1371/journal.pcbi.1010043.

eLife assessment

Bernhard Schmid 1

This valuable study considers empirical macroecological patterns in microbiome data across multiple taxonomic scales. The work convincingly shows that the Stochastic Logistic Growth model is a more appropriate choice of null model than the neutral theory of biodiversity. The work will be of particular interest to microbial ecologists.

Reviewer #1 (Public Review):

Anonymous

Shoemaker and Grilli analyze publicly available sequencing data to quantify how the microbial diversity of ecosystems changes with the taxonomic scale considered (e.g., diversity of genera vs diversity of families). This study builds directly on Grilli's 2020 paper which used this data to show that for many different microbial species, the distribution of abundances of the species across sampling sites belongs to a simple one-parameter family of gamma distributions. In this work, they show that the gamma distribution also describes the distribution of abundances of higher taxonomic levels. The distribution now requires two parameters, but the second parameter can be approximately derived by treating the distributions of lower-level taxonomic units as being independent. The difference between the species-level result and the result at higher taxonomic levels suggests that in some sense microbial species are ecologically meaningful units.

While the higher-level taxon abundance distributions can be well-approximated assuming independence of the constituent species, this approach substantially underestimates variation in community richness and diversity among sampling sites. Much of this extra variability appears to be driven by variability in sample size across sites. It is not clear to me how much this variation in sample size is itself due to variation in sampling effort versus variation in overall microbial densities. This variation in sample size also produces correlations between taxon richness at lower and higher taxonomic levels. For instance, sites with large samples are likely to have both many species within a genus and many genera. The authors also consider taxon diversity (Shannon index, i.e. entropy), which is constructed from frequencies and is therefore less sensitive to sample size. In this case, correlations between diversity across taxonomic scales instead appear to depend on the idiosyncratic correlations among species abundances.

This paper's results are presented in a fairly terse manner, even when they are describing summary statistics that require a lot of thought to interpret. I don't think it would make sense to try to understand it without having first worked through the 2020 paper. But everyone interested in a general understanding of microbial ecology should read the 2020 paper, and once one has done that, this paper is worth reading as well simply for seeing how the major pattern in that paper shifts as one moves up in taxonomic scale.

Reviewer #3 (Public Review):

Anonymous

Summary

In this research advance, the authors purport to show that the unified neutral theory of biodiversity (UNTB) is not a suitable null model for exploring the relationship between macroecological quantities, and additionally that the stochastic logistic growth model (SLM) is a viable replacement. They do this by citing other studies where UNTB was unable to capture individual macroecological quantities, and then demonstrating SLM's strength at predicting the same diversity metrics. They extend this analysis to show SLM's modeling capability at multiple scales of coarse graining, in addition to its failures at predicting these metrics' variances. Finally, authors conduct a similar analysis to Madi et al. (2020) by investigating the relationship between diversity measures within a group and across coarse-grained groups (e.g. genera diversity in one family compared to diversity of families). The authors show that choosing SLM as a null model reveals some previously reported relationships to be no longer "novel", in the sense that the patterns can be adequately captured by the null model. Authors also show that relationships not captured by the null model can be recovered by adding correlations, suggesting interactions are the driving force behind them.

Strengths

1. Authors make a strong argument that UNTB is not a good null model of macroecological observables and especially relationships between them. Authors convincingly argue that a SLM is a better null since the gamma distribution it predicts is a better description of the empirical Abundance Fluctuation Distributions (AFD).

2. Authors show that the gamma distribution predicted by SLM is a good fit for the AFD's at many different scales of coarse graining, not just the OTU level as was previously demonstrated. Authors show the same distribution predicted the mean diversity and richness at all scales of coarse graining.

3. Authors convincingly demonstrate how SLM can be used to test the relevance of interactions to macroecological relationships.

Weaknesses

This reviewer's concerns were convincingly addressed by the revisions.

Overall Impact

The authors present a convincing argument for the use of SLM as a better non-interacting null model for macroecological quantities and relationships.

eLife. 2024 Jan 22;12:RP89650. doi: 10.7554/eLife.89650.3.sa3

Author Response

William Randolph Shoemaker 1, Jacopo Grilli 2

The following is the authors’ response to the original reviews.

Reviewer #1:

In no particular order:

1. In Figs S3 and S4, can they also show gamma fit? (or rather corrected fit accounting for abundance conditioning?) The shapes look different, especially for the microbial mat.

Author response: We have added gamma distribution fits to the rescaled AFD plots (Figs. S3, S4).

1. Lines 170-176 seem like they should come before lines 164-166.

Author response: In lines 166-170 we discuss empirical patterns in the data that motivate the introduction of the SLM as a model in lines 170-175. We have clarified these points in the revision.

1. The wiggles in the gamma predictions in the occupancy-abundance plots are because occupancy depends not only on abundance but also on the shape parameter, right? Probably good to write a sentence or two explaining what's going on here.

Author response: We agree with the reviewer that the variation in the prediction could be driven in part by variation in the shape parameter across community members. We now include this observation in our revision (lines 209-211).

1. In the predicted vs observed occupancy plots, it would be nice to add curves showing predicted standard deviation or similar to give a sense of how well the model is predicting the variability.

Author response: In the revised manuscript we now include predictions for the variance of occupancy using the gamma distribution under both taxonomic and phylogenetic coarse-graining (Fig. S9; S10; lines 211-214).

1. Covariance between sister groups: Figs S9 and S10 look very nice, but it's hard to see much because they're log-log plots over multiple decades, while even a several-fold difference from y = x would indicate a strong effect of correlations. It would be clearer if the y-axis showed the ratio of the coarse-grained variance to the sum of OTU variances and we were looking at how well it fit y = 1.

Author response: We have included these plots in the revision (Fig. S14, S15).
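The ratio the reviewer proposes can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `variance_ratio` and the toy abundance lists are hypothetical. Because the variance of a sum equals the sum of the variances plus twice the sum of the pairwise covariances, the ratio is 1 when covariances vanish, exceeds 1 when members covary positively, and falls below 1 when they covary negatively.

```python
from statistics import pvariance


def variance_ratio(member_abundances):
    """Variance of the coarse-grained abundance over the sum of member variances.

    member_abundances: list of equal-length lists, one per fine-scale member
    (e.g. OTUs within a genus), each holding abundances across the same sites.
    """
    # The coarse-grained abundance at each site is the sum over members.
    coarse = [sum(site_values) for site_values in zip(*member_abundances)]
    sum_of_variances = sum(pvariance(member) for member in member_abundances)
    return pvariance(coarse) / sum_of_variances


# Hypothetical toy data: two members measured at three sites.
positively_correlated = variance_ratio([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
negatively_correlated = variance_ratio([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
```

In this toy case the perfectly correlated pair gives a ratio of 2 and the perfectly anti-correlated pair gives 0, which is the kind of several-fold departure from 1 that is hard to read off a log-log plot.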

1. If the sum of gammas can be well-approximated by a gamma, does that mean that the gamma is just a fairly flexible distribution and we shouldn't take the quality of the gamma fits in general as a very specific indication of what's going on?

Author response: While the sum of random variables that are drawn from gamma distributions with different parameters is often well-approximated by another gamma, this does not tell us why the gamma distribution holds for microbial communities at the finest-grain level (i.e., OTUs/ASVs). At present, the best explanation is that the gamma is a stationary distribution for certain stochastic differential equations which have ecological interpretations (Grilli, 2020; Shoemaker et al., 2023). Furthermore, alternative two-parameter distributions have been tested alongside the gamma and have done a comparatively poor job capturing observed macroecological patterns (Grilli, 2020). These results suggest that the utility of the gamma distribution is not simply an outcome of its flexible nature; it succeeds because it has captured core ecological properties of microbial communities. In the case of the SLM, gamma-like distributions arise when a community member is subject to self-limiting growth and environmental noise. On the other hand, the stability of the gamma distribution might explain why it can be detected as the shape of the AFD, as it does not fade out across coarse-graining levels.
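One way to see how a sum of gamma-distributed abundances can be approximated by a single gamma distribution is moment matching under independence. The sketch below is illustrative only: the function name `coarse_grained_gamma` is hypothetical, and the parameterization (shape = mean squared divided by variance) is an assumption borrowed from the convention in Grilli (2020). The approximation is exact only when all members share the same rate (shape divided by mean); otherwise it matches the first two moments and nothing more.

```python
def coarse_grained_gamma(means, shapes):
    """Moment-match a gamma distribution to a sum of independent gammas.

    means[i] and shapes[i] are the mean and shape of member i, where
    shape = mean**2 / variance (assumed parameterization).
    Returns the (mean, shape) of the approximating gamma distribution.
    """
    variances = [m * m / k for m, k in zip(means, shapes)]
    total_mean = sum(means)
    # Variances add only if the members are statistically independent.
    total_variance = sum(variances)
    return total_mean, total_mean ** 2 / total_variance


# Hypothetical members: mean relative abundances 0.01 and 0.02.
group_mean, group_shape = coarse_grained_gamma([0.01, 0.02], [2.0, 1.0])
```

Here the two member variances are 5e-5 and 4e-4, so the group has mean 0.03 and shape 0.03**2 / 4.5e-4 = 2.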

1. What's going on with the variance of diversity in Fig S12? Does this suggest that some of the problem in Figure 4 could be with the analytic approximation rather than the model? I had a hard time understanding the part of the Methods explaining the simulation details (lines 587-597). It would be worth expanding this. Is there some way to explain how the correlations were simulated in terms of the SLM, e.g., correlations in the noise term across OTUs?

Author response: We believe that deviations in the variance of diversity in Fig. S16g,h are driven by small deviations in our predictions of the second moment $\langle x \ln(x) \mid N_m, \bar{x}_i, \beta_i^2 \rangle$ (Eq. S16). Alone these deviations are slight, but their effects become noticeable when summed over hundreds or thousands of taxa. We have included this observation in the revised manuscript (lines 268-271). However, this deviation pales in comparison with the magnitude of covariance in the empirical data, suggesting that our inability to predict the variance of richness and diversity is primarily driven by our assumption of statistical independence.

Regarding the source of the correlations, under the SLM correlations in abundances can be introduced either by adding deterministic interaction terms or through correlated environmental noise. Determining which of these two options drives empirical correlations is an active area of research (e.g., Camacho-Mateu et al., 2023). For the purpose of this study, we remain agnostic on the cause of the correlations, opting instead to emphasize that the inclusion of correlations is necessary to reproduce observed slopes of the fine vs. coarse-grained relationship for diversity.

1. In Figure 5ab, is the idea that the correlation in richness is primarily driven by the number of samples from the environment? Line 390 seems to say so, but it would be good to make this explicit and put it right in that section of the Results.

Author response: Our results suggest that sampling effort (# reads) plays a larger role in determining the correlations between fine and coarse-grained measures of richness. We now clarify this point in the revised manuscript (lines 429-435).

1. I don't totally understand the contrast in lines 369-372. If fine-scale diversity within one group begets coarse-grained diversity in another group, couldn't that show up as correlations in the AFDs? Or is the argument that only including within-group correlations in AFDs is enough to reproduce the pattern? I'm not sure I see how that could be.

Author response: The term “begets” implies both causation and direction. If we see a positive relationship between diversity estimates at two different scales of observation, the causal mechanism cannot be determined solely from correlations between samples obtained once from different sites. So, mechanisms consistent with niche construction/"DBD" can produce correlations, though the existence of correlations does not necessarily imply DBD.

1. The discussion of niche construction on 429-431 doesn't match very well with 440-441. Basically, niche construction is a very broad concept, not a specific one, right?

Author response: In lines 472-576 (formerly 429-431) we discuss how the existence of correlations between fine and coarse-grained scales does not point to a single ecological mechanism. Alternatively stated, observing a non-zero slope does not mean that niche construction is driving the relationship.

In lines 476-487 (formerly 440-441) we discuss how the mechanism of cross-feeding has been shown to generate a positive relationship between fine and coarse-grained measures of diversity. This mechanism can be interpreted as a form of “niche construction”, so it is an instance of a tested ecological mechanism that aligns with the interpretation given in Madi et al. (2020).

1. Isn't (8) just the negative binomial distribution?

Author response: The convolution of the stationary solution of the SLM (i.e., a gamma distribution) and the Poisson limit of a multinomial sampling distribution returns a negative binomial distribution of read counts across hosts if samples have identical sampling depths. We now include this detail in the revision (lines 593-595). Note however that if different samples have different sampling depths, the distribution of reads across samples is not a negative binomial.
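The gamma-Poisson relationship described above can be checked numerically. The sketch below is a hedged illustration, not code from the study: all function names are hypothetical, the gamma distribution is parameterized by its mean and shape (rate = shape / mean), and a single sampling depth is assumed. It integrates the Poisson probability of observing n reads against the gamma density with a simple midpoint rule and compares the result to the closed-form negative binomial.

```python
import math


def gamma_pdf(x, mean, shape):
    """Gamma density with the given mean and shape (rate = shape / mean)."""
    rate = shape / mean
    log_density = (shape * math.log(rate) + (shape - 1.0) * math.log(x)
                   - rate * x - math.lgamma(shape))
    return math.exp(log_density)


def poisson_pmf(n, expected_reads):
    """Poisson probability of n reads given the expected number of reads."""
    return math.exp(n * math.log(expected_reads) - expected_reads
                    - math.lgamma(n + 1))


def negative_binomial_pmf(n, mean, shape, depth):
    """Closed form of the gamma-Poisson mixture at a fixed sampling depth."""
    p = mean * depth / (shape + mean * depth)
    log_pmf = (math.lgamma(shape + n) - math.lgamma(n + 1) - math.lgamma(shape)
               + n * math.log(p) + shape * math.log(1.0 - p))
    return math.exp(log_pmf)


def gamma_poisson_pmf(n, mean, shape, depth, steps=20000):
    """Midpoint-rule integral of Poisson(n | x * depth) over the gamma density."""
    upper = 40.0 * mean  # tail beyond this is negligible for the values used here
    width = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * width
        total += gamma_pdf(x, mean, shape) * poisson_pmf(n, x * depth)
    return total * width


# Hypothetical parameter values chosen only for illustration.
mean, shape, depth = 0.01, 2.0, 1000
differences = [abs(gamma_poisson_pmf(n, mean, shape, depth)
                   - negative_binomial_pmf(n, mean, shape, depth))
               for n in (0, 1, 5, 20)]
```

When depths differ among samples, the pooled read-count distribution is a mixture of negative binomials with different parameters, which is why it is no longer a single negative binomial.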

1. Missing 1/M in (9).

Author response: We have fixed this omission in the revision.

1. Schematic figures illustrating what the different statistics are intuitively capturing would really help this work be understandable to a broader audience, but they'd also be a ton of work.

Author response: Richness and diversity are used in ecology to such an extent that we do not see the benefit of a conceptual diagram. Furthermore, we have included a conceptual diagram about our pipeline in our revision at the request of Reviewer 2 (Fig. S20).

Reviewer #2:

Major Recommendations

If I were reviewing this manuscript for a regular journal, I believe the following issues would be important to address prior to publication.

1. From my reading, the main points of this advance are that

a. SLM models AFDs well at all levels of coarse-graining.

b. This makes SLM a better null-model than UNTB for macroecological relationships.

c. Using SLM on the EMP data, the richness slopes are well explained by SLM but not the diversity slopes. Therefore, any theory that hopes to explain the diversity slopes must include interactions. Argument B appears to be one of the key points yet is missing from the abstract, and should be made clearer. If these aren't the main points the authors intended, then other main points need to be highlighted more.

Author response: In the revision we now explicitly mention argument b in the Abstract.

1. The title should be more specific, so as to better reflect the content. (E.g. "UNTB is not a good null model for macroecological patterns" would seem more appropriate.)

Author response: We would prefer to focus on the success of the SLM rather than the limitations of the UNTB in the title of this work. Therefore, we have modified our title as follows: “Investigating macroecological patterns in coarse-grained microbial communities using the stochastic logistic model of growth”.

1. The manuscript would benefit from a clearer description of exactly what information the SLM retains about the data (perhaps even a cartoon panel in one of the figures). In particular, it is important to be explicit about the number of model parameters.

Author response: The number of model parameters for the gamma AFD are now explicitly stated in the revision (Lines 579-580).

1. The main point of Figures 2-4 seems to be that SLM is good at describing the data (and when it fails it is due to interactions) while UNTB fails to reproduce this behavior, in support of Argument B. This is not clear from the figure descriptions or titles, which focus on SLM's "predictive" power.

Author response: Fig. 2a demonstrates that the gamma distribution predicted by the SLM explains the empirical distribution of abundances. This result provides motivation to predict the fraction of sites harboring a given community member (i.e., occupancy, Fig. 2c) as well as general measures of community composition including mean richness (Fig. 3a,c) and mean diversity (Fig. 3b,d) using parameters estimated from the data (not free parameters).

This success led us to consider whether the gamma distribution could predict the variance of richness and diversity, which it could not because it does not capture covariance between community members (Fig. 4).

In the revision we have identified opportunities to make these points clear throughout the Results. Furthermore, we have added additional detail to the legends of Figs. 2-4.

1. The manuscript would benefit from clarifying the use of "prediction" related to the SLM. Since the gamma distributions predicted by SLM were fit to empirical data, it seems like the agreement between analytic means and empirical means (Fig. 3) is a statement on gamma distributions being a good fit for the AFD's more than SLM predicting richness and diversity. For example, from my reading, it seems like this analysis could be done numerically by shuffling species abundances across environments and seeing whether this changed the mean richness/diversity. I would not call this shuffling test a prediction, since it is more a statement on the relevance of interactions. SLM predicts gamma-distributed AFD's, but those distributions recovering the data they were trained on doesn't seem like a prediction.

Author response: In this manuscript we identified the gamma distribution as an appropriate probability distribution to describe the distribution of relative abundances across samples over a range of coarse-grained scales. Motivated by this result, we performed a separate analysis where at each scale we estimated the mean and variance of relative abundance across sites for each community member. We then used these parameters to obtain the expected value of a community-level measure using an equation we derived by assuming that the gamma distribution was appropriate (e.g., richness, Eq. 13). We then compared the expected value of richness to the mean value from empirical data and assessed the similarity between the two values.

The outcome of this procedure constitutes a prediction. While the mean and variance are parameters, estimating them from the empirical data has no connection with the operation of training a distribution on empirical data. We could have derived predictions such as Eq. 13 using any other probability distribution that can be parameterized using the mean and variance (e.g., Gaussian). Such a prediction would likely do a poor job even though it used the same means and variances used for our gamma predictions. This is because the choice of distribution would not have been a good descriptor of the distribution of abundances across hosts.

To better explain this last -- perhaps the most significant -- issue, I'd like to ask the authors if the following recasting would be an accurate reflection of their conclusions, or if something is missing.

1. "Focusing on the empirical relationship observed between diversity slopes by Madi 2020, we ask the question: does explaining these relationships require accounting for species-species correlations? Or could it be reproduced in a noninteracting model?" To address this question, one can perform a randomization test, shuffling abundances to preserve all single-OTU statistics but breaking any correlations. My reading of the authors' results is that (new result 1) the richness relationships would be preserved, while diversity relationships would not be preserved. [Note that this result 1 need not mention either SLM or UNTB.]

Author response: The question of whether correlations between species are necessary to explain the observed slope of the fine vs. coarse-grained relationship was only one component of our research goals. Our first question was whether the SLM would prove to be a more appropriate null for evaluating the novelty of observed slopes. We believe that our results support the conclusion that the SLM is an appropriate null for this question, as it was able to capture observed slopes of the fine vs. coarse-grained relationship for estimates of richness, determining that correlations and the interactions that are ultimately responsible are not necessary to explain this result.

We then find that the SLM as a null model fails to capture observed slopes of the fine vs. coarse-grained relationship for estimates of diversity and simulate the SLM with correlations to return reasonable estimates of the slope. However, here the question about correlations is a direct follow-up from our question about a null model that excludes interactions, so it is unclear how a randomization test would relate to this result.
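For readers unfamiliar with the randomization test the reviewer describes, a minimal sketch follows. It is not part of the authors' analysis: the function name `shuffle_within_members` and the toy table are hypothetical. Each member's abundances are permuted across sites independently, which leaves every single-member distribution untouched while destroying correlations between members.

```python
import random


def shuffle_within_members(abundance_table, seed=0):
    """Permute each member's abundances across sites independently.

    abundance_table: list of rows, one per community member (e.g. OTU),
    each row holding that member's abundances across the same sites.
    The input table is left unmodified.
    """
    rng = random.Random(seed)
    shuffled = []
    for row in abundance_table:
        permuted = list(row)  # copy so the original row is not altered
        rng.shuffle(permuted)
        shuffled.append(permuted)
    return shuffled


# Hypothetical toy table: three members across four sites.
table = [[5, 0, 2, 9], [1, 1, 0, 7], [0, 3, 3, 4]]
null_table = shuffle_within_members(table)
```

One caveat worth keeping in mind: shuffling raw read counts also changes per-site totals, so such a test would more naturally be applied to relative abundances or combined with a resampling step; that choice is left open here.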

1. Instead of doing a randomization test (resampling the empirical distribution), one might insist on instead fitting a model to the AFD distributions, and sampling from that distribution rather than the empirical one.

a. If doing it this way, one should of course ensure that the distribution being fit is a good description of the data.

b. UNTB is a bad fit. SLM is a better fit, and in fact (new result 2) continues to be a good empirical fit even at coarse-grained levels.

c. Can make statements on using SLM as a null model for these types of cross-scale relationships. Could try arguing that fitting an SLM model per-OTU (instead of resampling the empirical distribution) could offer some advantage if certain properties could be computed analytically from the fit parameters, instead of averaging over multiple computational rounds of resampling.

Do these two points accurately summarize the manuscript? If so, this presentation avoids the confusion with "prediction". If my summary is missing some important point, the presentation should be revised to clarify the points I appear to have missed.

Author response: In our manuscript we derive predictions from the gamma distribution, the stationary distribution of the SLM, that require parameters estimated from the data (i.e., mean and variance of relative abundance). These parameters are estimated from the data using normal procedures and then plugged into our predictions that assume the appropriateness of the gamma, returning values that are then compared to estimates from empirical data. Our estimation of the mean and variance does not assume that the empirical distribution follows a gamma distribution, but the value returned by our function derived from the gamma distribution (e.g., Eq. 13) does make that assumption.

To address the reviewer’s broader comment, we believe that the following points summarize our manuscript:

1. The gamma distribution as a stationary solution of the SLM captures macroecological patterns and predicts typical community-level properties (i.e., mean richness and diversity) across phylogenetic and taxonomic scales.

2. The gamma distribution fails to predict variation in community-level properties (i.e., variance of richness and diversity) across phylogenetic and taxonomic scales. This occurs because the SLM is a mean-field model that does not explicitly include interactions between community members.

3. Despite the inability to capture interactions, the gamma distribution succeeds at predicting the fine vs. coarse-grain slope for richness, a pattern that had previously been attributed to community member interactions. This result demonstrates that the novelty of a macroecological pattern hinges on one’s choice of null model.

4. However, the gamma cannot capture the same relationship for diversity. Simulations of the gamma distribution that incorporate correlations between community members are capable of generating reasonable estimates of the slope.

To address the reviewer’s comments regarding the appropriateness of the fitted gamma distributions, in our revision we have added fitted gamma distributions to plots of AFDs so that the reader can visually assess the ability of the gamma to describe empirical patterns (Fig. S3, S4).

We have also obtained predictions for the slope of the fine vs. coarse-grained relationship for community richness using the same form of UNTB used by Madi et al. (2020). In our revised manuscript we establish a procedure to infer the single parameter of this model, generate predictions of richness at fine and coarse-grained scales, and then evaluate whether the UNTB is capable of predicting the slope of the fine vs. coarse-grained relationship for richness (Supplementary Information; Figs. S18, S24-S28; lines 277-278; 370-380).

Other/minor comments

1. The manuscript would be improved with more consistent terminology ("fine vs. coarse-grained relationship"/"the relationship" vs. "diversity slope"). Also, many readers may be used to OTUs referring to the rather fine level of description, as opposed to any chosen level; and could interpret indexing over groups as being in contrast with indexing over OTU's (coarse vs fine). The authors' use is perfectly correct, but keeping a consistent terminology would help.

Author response: We have revised our manuscript to specify the “slope” as the “slope of the fine vs. coarse-grained relationship” (e.g., Line 318). We also specify in the Results and in the Methods that we use “fine” and “coarse” as relative terms, keeping with the sliding-scale approach used in Madi et al (2020).

1. While I appreciate this "slope" is something borrowed from other work, the clarity of the paper might benefit from a cartoon of how one goes from the raw data to the slopes at a particular coarse-graining level. (Optional).

Author response: We have added a conceptual diagram to the revision (Fig. S20).

1. The text often colloquially references "the gamma," "predictions of the gamma," etc. This phrasing comes across as sloppy, and the manuscript would be improved by being more specific.

Author response: We now specify “gamma” as the “gamma distribution” throughout the manuscript.

1. Equation 6 appears to be missing some subscripts on the x terms (included on the left of the equation).

Author response: We thank the reviewer for noticing this error and we have corrected it in the revision.

1. In "Simulating communities of correlated...AFDs", the acronym SAD is not defined.

Author response: We thank the reviewer for noticing this error and we have corrected it in the revision.

1. In Figure 2:

a. Invariant is probably the wrong word for the title, since all the AFD's were rescaled by mean and variance before being compared. Data does support that the gamma distributions are good at describing the AFD's, but as stated in the description it's the general shape that is preserved, not the distribution itself.

Author response: When we mention the invariance of the AFD we now specify that we mean that the shape of the distribution remained qualitatively invariant.

b. I'd recommend changing the color coding to something with more contrast, since currently it's impossible to assess the claim that the shape of the distribution collapses.

Author response: Our coarse-graining procedure is a sequential operation that has no intuitive point that would suggest the use of a contrasting colormap (e.g., if our scale ranged from -1 to 1 then there would be a natural point of contrast at zero).

c. The legend is missing relevant technical details: How many OTU's were used to make plot a? How many samples?

Author response: The number of samples was listed in the Materials and Methods (line 523). In the revision we now include a table with the average and total number of OTUs as well as the average number of reads for each environment (Table S1, S2).

d. In plot b, is the mean relative abundance referring to "mean abundance when observed" or "mean across all samples"?

Author response: The mean relative abundance is the mean abundance across all sites, as stated in the text (line 204) and in the legend of Fig. 2.

e. Since one argument here is that SLM fits these distributions better than UNTB, if possible it would be nice to see UNTB's failed fits here.

Author response: A major feature of the UNTB is that the demographic parameters of community members are indistinguishable. Under the SLM, the variation in the mean relative abundance we observe suggests that the carrying capacities of community members vary over multiple orders of magnitude, a result that is incompatible with most forms of the UNTB (x-axis of Fig. 2b). We now mention this point in the revised manuscript (lines 110; 229; 455-471).

1. In Figure 3:

a. It is not clear how coarse-graining is included in model fitting. The "Deriving biodiversity measure predictions" section would benefit from including how coarse-graining is incorporated.

Author response: We predict measures of biodiversity separately at each coarse-grained scale. We now clarify this detail in the revised manuscript (Lines 624-627).
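As a rough illustration of what computing biodiversity measures separately at each coarse-grained scale involves, the sketch below sums read counts within groups and then computes richness and Shannon diversity at both scales. The function names, OTU labels, and genus assignments are all hypothetical and are not taken from the study's code.

```python
import math
from collections import defaultdict


def coarse_grain(counts_by_member, group_of):
    """Sum read counts of members that share a group label (e.g. genus)."""
    grouped = defaultdict(int)
    for member, count in counts_by_member.items():
        grouped[group_of[member]] += count
    return dict(grouped)


def richness(counts):
    """Number of members with at least one read."""
    return sum(1 for c in counts.values() if c > 0)


def shannon_diversity(counts):
    """Shannon diversity (natural log) from counts; absent members contribute 0."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total)
                for c in counts.values() if c > 0)


# Hypothetical single sample with four OTUs assigned to two genera.
sample = {"otu_1": 10, "otu_2": 0, "otu_3": 30, "otu_4": 40}
genus_of = {"otu_1": "genus_a", "otu_2": "genus_a",
            "otu_3": "genus_b", "otu_4": "genus_b"}
genus_sample = coarse_grain(sample, genus_of)

fine_richness = richness(sample)
coarse_richness = richness(genus_sample)
fine_diversity = shannon_diversity(sample)
coarse_diversity = shannon_diversity(genus_sample)
```

Because coarse-graining only merges categories, both richness and Shannon diversity at the coarse scale can never exceed their fine-scale values for the same sample.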

b. Reference Shannon Diversity in Methods.

Author response: We now cite Shannon’s diversity.

c. What is the blue/white color coding in plots a & c? It doesn't have any color key.

Author response: Figs. 3-6 use a uniform light-to-dark scale for all environments, with each environment having its own color. For example, Fig. 3a contains data from the human gut microbiome. Human gut data were assigned the color aquamarine, so the shade of aquamarine for a given datapoint in Fig. 3a indicates the phylogenetic scale.

In the revision we now clarify the colorscale in the legend of Fig. 3 and specify that the same scale is used in all subsequent figure legends.

d. Re: earlier comments, why is richness considered a prediction? (Am I correct in my interpretation that panel b is almost a tautology - counting the number of zeros in the matrix either by rows or by columns - whereas panel d is nontrivial?)

Author response: Mean richness as a measure of biodiversity depends on the fraction of sites where a given community member is present (i.e., occupancy). The mean relative abundance of a community member and its variation across sites (beta) are clearly related to occupancy, but those two statistics do not give you a prediction of occupancy. Obtaining a prediction of occupancy and, subsequently, richness, requires 1) a probability distribution of abundances (i.e., the gamma) and 2) a probability distribution of sampling (i.e., the Poisson). Using these two pieces of information, we derived a prediction for mean richness (Eq. 13). We then compare the value of richness obtained by plugging in the mean relative abundances, betas, and known number of reads to the observed mean richness obtained from the data.
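The logic of this prediction can be sketched in a few lines. Eq. 13 is not reproduced in this response, so the closed form below is an assumption: it is the probability of observing at least one read under a gamma-Poisson mixture, averaged over sampling depths and summed over community members, with the shape parameter taken as mean squared divided by variance. The function names and parameter values are hypothetical.

```python
def predicted_occupancy(mean, shape, depths):
    """Expected fraction of sites where a member has at least one read.

    mean, shape: gamma parameters of the member's relative abundance
    (shape = mean**2 / variance, an assumed parameterization).
    depths: total read counts of the sampled sites.
    """
    # Probability of zero reads at each depth under a gamma-Poisson mixture.
    absent = [(1.0 + mean * depth / shape) ** (-shape) for depth in depths]
    return 1.0 - sum(absent) / len(depths)


def predicted_mean_richness(means, shapes, depths):
    """Expected richness: the sum of predicted occupancies over members."""
    return sum(predicted_occupancy(m, k, depths)
               for m, k in zip(means, shapes))


# Hypothetical values: two members, two sites with 1000 reads each.
depths = [1000, 1000]
occupancy = predicted_occupancy(0.01, 2.0, depths)
richness = predicted_mean_richness([0.01, 0.001], [2.0, 0.5], depths)
```

The inputs are only the per-member means, shape parameters, and the known read depths; the observed presences and absences never enter the calculation, which is why comparing the output to observed mean richness is a test of the distributional assumption rather than a tautology.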

e. The lettering of subplots in Figure 3 is not consistent with Figure 4. Figure 3 subplots are also cited incorrectly in paragraph two on page six (lines 251-254).

Author response: We thank the reviewer for noticing the error and we have corrected it in the revision.

f. Again, if possible show UNTB predictions in plots a & c.

Author response: In our revised manuscript we provide extensive descriptions and predictions of mean richness and the slope of the fine vs. coarse-grained relationship for richness using the form of the UNTB used in Madi et al. (2020; Figs. S18, S24 - S29; lines 277-282; 370-380). We then compare the error of these slope predictions to those obtained from the SLM, finding that the SLM generally outperforms UNTB (Figs. S27-S29).

1. In Figure 4:

a. What are the color codings in plots a & b?

Author response: The color scale used in Fig. 4 is identical to the color scale used in Fig. 3. This detail is now specified in the legend of Fig. 4.

b. What are the two lines of empirical data in plots a & b, and why is one of them dashed?

Author response: We now specify what the two lines mean in the key within the figure.

c. Same comment as earlier on predictions and richness.

Author response: We now specify what the two lines mean in the key within the figure.

1. In Figure 5:

a. It wasn't clear to me in the manuscript how the authors generated these plots from the raw data. The manuscript would benefit from a clear cartoon/description of the data pipeline, from raw data to empirical (and analytic) slopes.

Author response: We have added a conceptual diagram to the revised manuscript (Fig. S20).

b. Make the figure title more descriptive to better connect it to the figure's objective (the richness slopes relationship is not novel, but the diversity slopes relationship is).

Author response: We have revised the figure title.

References

Camacho-Mateu, J., Lampo, A., Sireci, M., Muñoz, M. Á., & Cuesta, J. A. (2023). Species interactions reproduce abundance correlations patterns in microbial communities (arXiv:2305.19154). arXiv. https://doi.org/10.48550/arXiv.2305.19154

Grilli, J. (2020). Macroecological laws describe variation and diversity in microbial communities. Nature Communications, 11(1), 4743. https://doi.org/10.1038/s41467-020-18529-y

Madi, N., Vos, M., Murall, C. L., Legendre, P., & Shapiro, B. J. (2020). Does diversity beget diversity in microbiomes? eLife, 9, e58999. https://doi.org/10.7554/eLife.58999

Shoemaker, W. R., Sánchez, Á., & Grilli, J. (2023). Macroecological laws in experimental microbial systems (p. 2023.07.24.550281). bioRxiv. https://doi.org/10.1101/2023.07.24.550281

Associated Data

    Data Citations

    1. Shoemaker WR. 2023. Macroecological patterns in coarse-grained microbial communities. Zenodo.
    2. Thompson LR, Sanders JG, McDonald D, Amir A, Ladau J, Locey KJ, Prill RJ, Tripathi A, Gibbons SM, Ackermann G, Navas-Molina JA, Janssen S, Kopylova E, Vázquez-Baeza Y, González A, Morton JT, Mirarab S, Xu ZZ, Jiang L, Haroon MF, Kanbar J, Zhu Q, Song SJ, Kosciolek T, The Earth Microbiome Project Consortium 2017. Earth Microbiome Project mapping files. Earth Microbiome Project. release1/mapping_files
    3. Thompson LR, Sanders JG, McDonald D, Amir A, Ladau J, Locey KJ, Prill RJ, Tripathi A, Gibbons SM, Ackermann G, Navas-Molina JA, Janssen S, Kopylova E, Vázquez-Baeza Y, González A, Morton JT, Mirarab S, Xu ZZ, Jiang L, Haroon MF, Kanbar J, Zhu Q, Song SJ, Kosciolek T, The Earth Microbiome Project Consortium 2017. Earth Microbiome Project phylogeny and taxonomy. Earth Microbiome Project. release1/otu_info/silva_123
    4. Thompson LR, Sanders JG, McDonald D, Amir A, Ladau J, Locey KJ, Prill RJ, Tripathi A, Gibbons SM, Ackermann G, Navas-Molina JA, Janssen S, Kopylova E, Vázquez-Baeza Y, González A, Morton JT, Mirarab S, Xu ZZ, Jiang L, Haroon MF, Kanbar J, Zhu Q, Song SJ, Kosciolek T, The Earth Microbiome Project Consortium 2017. Earth Microbiome Project count data. Earth Microbiome Project. release1/otu_tables/closed_ref_silva

    Supplementary Materials

    MDAR checklist

    Data Availability Statement

    All sequencing data used in this study were obtained from the Earth Microbiome Project (URL: https://ftp.microbio.me/emp/release1/). Processed data used to perform the analyses in this study are available on Zenodo, DOI: https://doi.org/10.5281/zenodo.7692046. All code written for this study is available on GitHub under a GNU General Public License: https://github.com/wrshoemaker/macroeco_phylo (copy archived at Shoemaker, 2023b).

    The following dataset was generated:

    Shoemaker WR. 2023. Macroecological patterns in coarse-grained microbial communities. Zenodo.

    The following previously published datasets were used: the Earth Microbiome Project mapping files, phylogeny and taxonomy, and count data (Thompson et al., 2017; entries 2 to 4 under Data Citations).


