Proceedings of the National Academy of Sciences of the United States of America. 2024 Jan 24;121(5):e2309575121. doi: 10.1073/pnas.2309575121

Sparse species interactions reproduce abundance correlation patterns in microbial communities

José Camacho-Mateu, Aniello Lampo, Matteo Sireci, Miguel A Muñoz, José A Cuesta
PMCID: PMC10853627  PMID: 38266051

Significance

Microbes—the most abundant organisms on Earth—form complex communities with a profound influence on many problems, from human health to ecosystem management. Understanding species interactions is of paramount importance in many different contexts, e.g., in medical applications. However, current models of microbial communities have not adequately captured the essential role of interactions. Here, we present population models implementing species interactions. These enable us to replicate macroecological patterns of species correlations not captured by existing models. These findings robustly support the importance of species interaction networks, highlighting their inherent sparsity as a distinctive structural property. This characteristic is associated with the prevalence of amensalistic and commensalistic relationships within the community.

Keywords: ecology, microbiome, Lotka–Volterra, correlations, interactions

Abstract

During the last decades, macroecology has identified broad-scale patterns of abundances and diversity of microbial communities and put forward some potential explanations for them. However, these advances are not paralleled by a full understanding of the dynamical processes behind them. In particular, abundance fluctuations of different species are found to be correlated, both across time and across communities in metagenomic samples. Reproducing such correlations through appropriate population models remains an open challenge. The present paper tackles this problem and points to sparse species interactions as a necessary mechanism to account for them. Specifically, we discuss several possibilities to include interactions in population models and recognize Lotka–Volterra constants as a successful ansatz. For this, we design a Bayesian inference algorithm to extract sets of interaction constants able to reproduce empirical probability distributions of pairwise correlations for diverse biomes. Importantly, the inferred models still reproduce well-known single-species macroecological patterns concerning abundance fluctuations across both species and communities. Endorsed by the agreement with the empirically observed phenomenology, our analyses provide insights into the properties of the networks of microbial interactions, revealing that sparsity is a crucial feature.


Our understanding of the microscopic living world has recently been challenged by the advent of metagenomics (1, 2). Indeed, DNA sequencing methods revealed that a large fraction of microbial diversity was missing from laboratory cultures (3–5). Moreover, the possibility of collecting genetic material directly from its natural environment introduced a new dimension, the set of samples, along which the properties of the biome may vary. This has given rise to the production of the largest datasets ever, allowing microbial communities to be investigated at a much greater scale and in much greater detail than before.

To approach this new profusion of data, macroecology (the quantitative analysis of emergent broad-scale patterns) emerged as a promising point of view (6–11). The framework paved the way for statistically assessing variation in abundance and diversity which, despite the complexity of the underlying microscopic behaviors, often follows distinctive distributions that can sometimes be explained in terms of basic ecological forces. Specifically, considerable progress has been made in identifying statistical regularities of taxon populations across time (12), across spatial samples (13), and in species abundance distributions (14).

Most remarkably, a recent paper by Grilli (15) provided an important step toward a macroecological study of microbial communities. Relying on the analysis of data from nine real biomes, the work characterizes some patterns of abundance variation in terms of (using the terminology of, e.g., refs. 12, 14, and 15) three macroecological laws (Fig. 1): i) the fluctuations in the abundance of any given species across samples follow a gamma distribution; ii) the variances of these distributions for different species are proportional to the square of their means (a particular case of Taylor’s law (16) with power-law exponent 2); and iii) the mean abundances across species follow a lognormal distribution. These macroecological patterns of species fluctuations and diversity have been parsimoniously explained using the stochastic logistic model (SLM), which endows the traditional logistic equation with a (multiplicative) stochastic term (17, 18) embodying information about environmental variability (15, 19–21).

Fig. 1.

Infographic of the population dynamics and the resulting macroecological patterns. Panel (A) portrays, as an illustrative example, the time courses of three individual species (color coded) at equally spaced times (longitudinal data), resulting from the integration of Eq. 1. The fluctuations of the abundances around their mean across samples (abundance fluctuation distribution, AFD) are well described by a gamma distribution, as shown in panel (B) (SI Appendix, Figs. S4 and S5). For each species, this distribution is characterized by its mean value $\bar{x}_i$ and its variance $\sigma_i^2$. These two magnitudes are linked by Taylor’s law, $\sigma_i^2 \propto \bar{x}_i^2$ [panel (B)]. The mean abundances of all species follow a lognormal distribution (mean abundance distribution, MAD), panel (B). Further details about Taylor’s law and the MAD are presented in SI Appendix, Figs. S6 and S7. Panel (C) illustrates the correlations between abundance fluctuations of pairs of species across samples (one point per sample/realization). The Top-Left plot illustrates the case of two uncorrelated species, whereas the Top-Right plot illustrates two positively correlated species. The Bottom plot shows the distribution of Pearson’s coefficients $\rho_{ij}$ of all pairs of species. Empirically, this distribution is found to generally cover the entire range $-1 \le \rho_{ij} \le 1$ and to exhibit a peak at negative values.

Besides the aforementioned patterns, the analysis of empirical data also reveals nontrivial pairwise correlations in species abundances (15). In particular, the Pearson’s correlation coefficients of all pairs of species in a biome display distributions ranging from anticorrelations to positive correlations, with a peak often located at negative values (SI Appendix, Table S1). These pairwise correlations are not accounted for by the SLM because it treats the dynamics of each species as independent of the others (22). Describing correlations in species abundances thus calls for introducing some sort of interaction between species.

The existence of species interactions in microbiomes is well documented in a wealth of experimental results that manage to observe and measure them (23–26). Indeed, microbial interactions are a key ingredient behind community stability (27, 28), necessary for, e.g., the maintenance of health in human biomes (29–31) or the control of medical disorders (32–35). From the modeling standpoint, microbial pairwise interactions have also received a lot of attention in the field of network inference, where researchers struggle with the problem of reconstructing species interaction networks from available empirical datasets (36–39). All these efforts make evident the current consensus on the crucial role of interactions in microbial ecosystems, justifying their inclusion in modeling approaches.

Interactions can be implemented in models in two main ways, or in a combination of both: i) indirectly, i.e., assuming the environmental noise terms of different species to be correlated with each other, or ii) directly, i.e., introducing a coupling between species abundances. The first route assumes that correlations in the abundance of two species arise from similar or opposite responses of both species to changes in the environment (variation of nutrients, presence of chemicals, changes in temperature or pH, etc.). As a matter of fact, this strategy has recently been explored in connection with phylogeny, under the rationale that genetically related species tend to respond alike to environmental cues (22). Coupling the noise terms has the added advantage of being a modification of the SLM that preserves, by construction, Grilli’s three empirical laws. However, as we will show later, environmental noise by itself is insufficient to fully capture the correlation patterns observed empirically, so it needs to be replaced, or at least supplemented, by direct species interactions.

In this paper, we propose a population model that includes direct Lotka–Volterra pairwise couplings between species abundances. This approach is suitable for modeling competition mechanisms (negative correlations) detected in real biomes and their interplay with cooperative ones, as well as other kinds of relationships. Unlike other approaches, we do not attempt to infer specific pairwise interactions but, rather, the ensemble of possible interaction networks able to reproduce the empirically observed correlation patterns. We will show that interactions are a necessary and sufficient ingredient to account for the observed correlation distributions, while also preserving Grilli’s three empirical laws. Our analysis identifies sparsity, i.e., the low density of species interactions, as a critical feature of microbial networks. This feature suggests a prevalence of amensalistic and commensalistic relationships within the community.

Modeling Microbiomes

Environmental Noise vs. Species Interactions.

A simple model that couples species in a parsimonious way is the Stochastic Lotka–Volterra model (SLVM)

$$\dot{x}_i = \frac{x_i}{\tau_i}\left(1 + \sum_{j=1}^{S} a_{ij}\,x_j\right) + x_i\,\xi_i, \qquad i = 1,\dots,S, \tag{1}$$

where $x_i(t)$ is the abundance of species $i$ at time $t$, $\tau_i$ is the time scale of its basal population growth, and $\xi_i$ is a zero-mean, multivariate Gaussian white noise (Itô interpretation) with correlations $\langle \xi_i(t)\,\xi_j(t')\rangle = w_{ij}\,\delta(t - t')$. The matrix $W = (w_{ij})$ accounts for environmental fluctuations, whereas the off-diagonal terms of the matrix $A = (a_{ij})$ describe direct, Lotka–Volterra-like interactions between species, and the diagonal terms $a_{ii} = -1/K_i$ incorporate the carrying capacity $K_i$ of each species $i$.
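Trajectories of Eq. 1 can be generated numerically in several ways. The sketch below assumes an Euler–Maruyama scheme applied to the log-abundances $y_i = \log x_i$: under the Itô interpretation, Itô’s lemma gives $dy_i = [\tau_i^{-1}(1 + \sum_j a_{ij}x_j) - w_{ii}/2]\,dt + dB_i$ with $\langle dB_i\,dB_j\rangle = w_{ij}\,dt$, which keeps abundances positive. The function name `simulate_slvm`, the step size, and the recording interval are illustrative choices of ours, not the authors’ code.

```python
import numpy as np


def simulate_slvm(A, tau, W, x0, dt=1e-3, n_steps=100_000, record_every=100, rng=None):
    """Euler-Maruyama integration of Eq. 1 (Ito) in log-abundance variables y = log x.

    A: (S, S) interaction matrix with a_ii = -1/K_i; tau: (S,) growth time scales;
    W: (S, S) symmetric positive-definite noise covariance; x0: (S,) initial abundances.
    Returns an (n_records, S) array of abundances recorded every `record_every` steps.
    """
    rng = np.random.default_rng() if rng is None else rng
    tau = np.asarray(tau, dtype=float)
    L = np.linalg.cholesky(W)              # raises LinAlgError if W is not positive definite
    ito_shift = 0.5 * np.diag(W)           # Ito correction entering d(log x_i)
    y = np.log(np.asarray(x0, dtype=float))
    records = []
    for step in range(n_steps):
        x = np.exp(y)
        drift = (1.0 + A @ x) / tau - ito_shift
        noise = (L @ rng.standard_normal(y.size)) * np.sqrt(dt)   # covariance W * dt
        y = y + drift * dt + noise
        if step % record_every == 0:
            records.append(np.exp(y))
    return np.array(records)


# SLM check (A diagonal, W = w I): time averages should be close to K_i (1 - tau_i w / 2).
K = np.array([1.0, 2.0, 3.0])
tau = np.full(3, 0.1)
w = 0.1
traj = simulate_slvm(-np.diag(1.0 / K), tau, w * np.eye(3), x0=K, n_steps=50_000,
                     rng=np.random.default_rng(0))
print(traj.mean(axis=0), K * (1 - tau * w / 2))
```

Integrating in log variables is one convenient design choice: the noise enters additively, so abundances can never become negative, which a naive Euler step on $x_i$ does not guarantee.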

When $W = wI$ and $A$ is a diagonal matrix, Eq. 1 becomes the SLM. By turning on the off-diagonal terms of the noise correlation matrix $W$ (indirect interactions) and/or of the Lotka–Volterra matrix $A$ (direct interactions), we can study the effect of correlated environmental noise and/or direct species interactions on species pairwise correlations. Ideally, though, the model should contain the right proportion of both terms.

Before proceeding, let us remark that both coupling terms A and W could have been derived, after some simplifying assumptions, from a more complex consumer–resource type of model describing explicitly the dynamics of environmental factors or resources (see, e.g., refs. 22, 40, and references therein). However, for the sake of simplicity and to allow for a straightforward comparison with the SLM, here, we build our model by proposing general forms for these two types of couplings independently.

At first sight, adding environmental noise correlations ($W$) has an advantage over adding interactions ($A$) in that Grilli’s first law is preserved by construction (SI Appendix, section 7A). The second law simply amounts to setting $w_{ii} = w$ for all $i$. As for the third law, it can be fulfilled by choosing, ad hoc, the carrying capacities $K_i$ as lognormally distributed random variables (15). Obviously, these choices do not explain the origin of the second and third laws, but at least they render a model that is compatible with them. On the downside, though, the fact that $W$, being a covariance matrix, must be symmetric and positive definite severely constrains the kind of abundance correlations that Eq. 1 can generate.
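To make the constraint on $W$ concrete, the sketch below draws a random noise covariance matrix that is symmetric, positive definite, and has constant diagonal $w_{ii} = w$, as the second law requires. The construction (normalizing a random Gram matrix $BB^\top$ to unit diagonal and scaling by $w$) and the name `random_noise_covariance` are hypothetical illustrative choices; the sampling procedure actually used for Fig. 2 is described in “Materials and Methods.”

```python
import numpy as np


def random_noise_covariance(S, w, n_factors=None, rng=None):
    """Random symmetric positive-definite W with every diagonal entry equal to w.

    Hypothetical construction: normalize a random Gram matrix B B^T to unit diagonal
    (a correlation matrix) and scale it by w.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = 2 * S if n_factors is None else n_factors   # m >= S gives full rank almost surely
    B = rng.standard_normal((S, m))
    G = B @ B.T                                       # symmetric positive (semi)definite
    d = np.sqrt(np.diag(G))
    R = G / np.outer(d, d)                            # unit diagonal, entries in [-1, 1]
    return w * R


W = random_noise_covariance(S=5, w=0.1, rng=np.random.default_rng(1))
print(np.round(W, 3))
print(np.linalg.eigvalsh(W).min())   # smallest eigenvalue; positive for a valid covariance
```

Because $W/w$ is a correlation matrix, positive semidefiniteness implies $\mathbf{1}^\top W \mathbf{1} \ge 0$, so the average off-diagonal noise correlation can be no lower than $-1/(S-1)$. The noise terms therefore cannot be strongly anticorrelated across many pairs at once, which is one concrete way in which this requirement restricts the noise correlations themselves.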

On the other hand, if one introduces interactions while keeping $W = wI$, in general the first and second laws do not hold exactly, although they may hold approximately. However, the presence of interactions strongly affects the average abundances of the species. While the SLM (with or without a noise correlation matrix) predicts a stationary population that fluctuates around its carrying capacity, in the presence of couplings the mean values are the solution of the linear system (SI Appendix, section 2)

$$\sum_{j=1}^{S} a_{ij}\,\bar{x}_j = \frac{\tau_i w}{2} - 1, \qquad i = 1,\dots,S, \tag{2}$$

where $\bar{x}_j$ denotes the average abundance of species $j$. Therefore, interactions shift these average abundances to the extent that, even if all carrying capacities were the same, the $\bar{x}_j$ would spread over a range of values. This may not yet be a full explanation of the third law, but it opens the possibility that its origin lies in a particular structure of the network of interactions. As a matter of fact, Descheemaeker et al. (41) were able to obtain a lognormal mean abundance distribution simply by introducing an indirect interaction between species through a global carrying capacity of the system.
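Equation 2 is a plain linear system, so the mean abundances can be computed directly. (One way to see where it comes from: setting the average drift of $\log x_i$, namely $\tau_i^{-1}(1 + \sum_j a_{ij}\bar{x}_j) - w/2$, to zero.) The sketch below, with the hypothetical function name `mean_abundances`, solves it with NumPy and checks two statements from the text: without interactions it returns $\bar{x}_i = K_i(1 - \tau_i w/2)$, and with interactions identical carrying capacities no longer give identical means. The interaction values in the example are arbitrary.

```python
import numpy as np


def mean_abundances(A, tau, w):
    """Solve Eq. 2, sum_j a_ij * xbar_j = tau_i * w / 2 - 1, for the mean abundances."""
    rhs = np.asarray(tau, dtype=float) * w / 2.0 - 1.0
    return np.linalg.solve(A, rhs)


K = np.ones(3)
tau = np.full(3, 0.1)
w = 0.1
A_slm = -np.diag(1.0 / K)                   # no interactions: a_ij = -delta_ij / K_i
print(mean_abundances(A_slm, tau, w))       # [0.995 0.995 0.995]

A_int = A_slm.copy()
A_int[0, 1], A_int[2, 0] = 0.2, -0.1        # two arbitrary weak interactions
print(mean_abundances(A_int, tau, w))       # means now differ despite equal K_i
```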

A quick test to decide which of the two strategies described above is more promising for modeling abundance correlations is to generate a large sample of random matrices (either $W$ or $A$) and, for each of them, simulate the stochastic process in Eq. 1, calculate the abundance correlations between pairs of species, and compare the resulting distributions with those obtained empirically from microbiome datasets with the same number of species (Fig. 1 A and C). Each of these two samples must fulfill some constraints: matrices $W$ must all be symmetric and positive definite, and matrices $A$ must all lead to a steady state that is feasible, i.e., $\bar{x}_i > 0$ for all $i$ (42), and asymptotically stable, i.e., small perturbations die out (43, 44) (SI Appendix, section 3).
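The admissibility test for a candidate $A$ can be sketched as follows. Feasibility is read off directly from Eq. 2. For stability we assume, as our own choice (the criterion actually used is given in SI Appendix, section 3), a linear-stability test of the log-abundance drift around $\bar{x}$: its Jacobian $\mathrm{diag}(1/\tau)\,A\,\mathrm{diag}(\bar{x})$ has the same eigenvalues as $\mathrm{diag}(\bar{x}/\tau)\,A$, and all of them must have negative real parts. The function name is hypothetical.

```python
import numpy as np


def is_admissible(A, tau, w):
    """True if Eq. 2 gives a feasible (all xbar_i > 0) and linearly stable steady state."""
    tau = np.asarray(tau, dtype=float)
    xbar = np.linalg.solve(A, tau * w / 2.0 - 1.0)
    if np.any(xbar <= 0):
        return False                                    # unfeasible
    J = (xbar / tau)[:, None] * A                        # diag(xbar / tau) @ A
    return bool(np.all(np.linalg.eigvals(J).real < 0))   # all perturbations decay


tau = np.full(2, 0.1)
w = 0.1
print(is_admissible(np.array([[-1.0, 0.1], [0.1, -1.0]]), tau, w))   # True
print(is_admissible(np.array([[-1.0, 2.0], [2.0, -1.0]]), tau, w))   # False: strong mutualism, xbar < 0
```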

Fig. 2 shows the distribution of Pearson’s abundance correlation coefficients for all $S(S-1)$ pairs of species $i \neq j$, as obtained from a typical dataset and from each of these two matrix ensembles. The empirical distribution decays exponentially to the left and to the right, is somewhat asymmetric (skewness coefficients close to or larger than 1 for all biomes; see SI Appendix, Table S1), and has a peak at slightly negative values of the correlation. The distributions obtained from the $W$ samples bear little resemblance to it: they exhibit very few negative correlations, are strongly asymmetric, and peak at zero. In contrast, the distributions obtained from the $A$ samples show wide sample-to-sample variability, and some of the realizations are very similar to the empirical data, often peaking at negative correlation values.
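The statistic compared in Fig. 2 can be computed from an abundance table in a few lines. The sketch assumes a NumPy array with one row per sample (or community) and one column per species. Because $\rho_{ij} = \rho_{ji}$, it keeps only the $S(S-1)/2$ distinct coefficients, which leaves the shape of the histogram unchanged. The skewness uses the standard moment estimator, which may differ slightly from the one behind SI Appendix, Table S1; function names and the bin count are our own choices.

```python
import numpy as np


def pairwise_correlations(samples):
    """Distinct Pearson coefficients rho_ij (i < j) from an (n_samples, S) abundance array."""
    R = np.corrcoef(samples, rowvar=False)        # S x S correlation matrix
    return R[np.triu_indices_from(R, k=1)]


def skewness(values):
    """Moment-based skewness E[(v - mean)^3] / Var(v)^(3/2)."""
    d = np.asarray(values, dtype=float) - np.mean(values)
    return np.mean(d**3) / np.mean(d**2) ** 1.5


def correlation_histogram(samples, n_bins=40):
    """Normalized histogram of rho_ij on [-1, 1]."""
    rho = pairwise_correlations(samples)
    density, edges = np.histogram(rho, bins=np.linspace(-1.0, 1.0, n_bins + 1), density=True)
    return 0.5 * (edges[:-1] + edges[1:]), density


base = np.array([1.0, 2.0, 4.0, 3.0])
demo = np.column_stack([base, 2 * base + 1, -base])   # perfectly (anti)correlated columns
print(pairwise_correlations(demo))                     # [ 1. -1. -1.]
```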

Fig. 2.

Distributions of Pearson’s abundance correlation coefficients as obtained in the model with (Left panel) a few samples of the noise correlation matrix $W$ (each with a different gray shade) or (Right panel) random samples of the Lotka–Volterra matrix $A$. In each case, the black solid lines portray the empirical distribution obtained from the Seawater microbiome (species that appear in less than 50% of the communities have been filtered out), while the blue ones represent the distribution of correlations obtained from the model without interactions. In the Left plot, colored circles show the results for a few samples of matrices $W$ (see “Materials and Methods” for details of the sampling procedure); Lotka–Volterra constants are chosen as $a_{ij} = -\delta_{ij}/K_i$, with carrying capacities $K_i$ sampled from a lognormal distribution with mean 0.1 and SD 0.5, as for the SLM (15). The results shown in this figure are typical (see SI Appendix, sections 7B and C, for a more thorough exploration). In the Right plot, colored circles represent correlations resulting from the SLVM with $W = wI$ and Lotka–Volterra constants $a_{ij}$ ($i \neq j$) sampled from a Gaussian distribution with zero mean and SD 0.03. A random selection of 60% of these constants is set to zero (i.e., the connectance of the interaction matrix is $C = 0.4$).

These analyses suggest that environmental noise by itself is incapable of generating correlations resembling those observed in real microbiomes, so interactions have to be included in the model. The presence of correlated noise cannot be ruled out by these analyses but, to keep things simple, we henceforth take $W = wI$ and focus on the effect of interactions in the model.

Grilli’s Laws in the Presence of Interactions.

The model described by Eq. 1 with $W = wI$ and a nontrivial interaction matrix $A$ is not guaranteed to satisfy any of the three macroecological laws found by Grilli (15), even if the carrying capacities $K_i$ are sampled from a lognormal distribution, as in the SLM. However, as long as the interactions are a “small” perturbation to the SLM, one can reasonably expect the laws to hold, at least approximately. In particular, if one sets all off-diagonal interaction coefficients to zero, except for a fraction $C$ (the “connectance”) that are randomly and independently drawn from a normal distribution $\mathcal{N}(0, \sigma)$, a criterion for the weakness of interactions is that the resulting system remains feasible and asymptotically stable; in other words, $\sigma\sqrt{SC}\,K_{\max} \lesssim 1$, where $K_{\max}$ is the maximum carrying capacity (SI Appendix, section 4). We will refer to this as the “weak-interaction regime.”
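The random ensemble just described can be sketched as follows: each off-diagonal entry is present with probability $C$ and, if present, drawn from $\mathcal{N}(0, \sigma)$, while the diagonal holds $-1/K_i$. A second helper returns the combination $\sigma\sqrt{SC}\,K_{\max}$ used to gauge whether a parameter set lies in the weak-interaction regime. Function names are hypothetical, and the lognormal parameters in the usage lines are illustrative, not the paper’s.

```python
import numpy as np


def random_interaction_matrix(K, C, sigma, rng=None):
    """Sparse random Lotka-Volterra matrix: a_ii = -1/K_i, a_ij ~ N(0, sigma) with probability C."""
    rng = np.random.default_rng() if rng is None else rng
    K = np.asarray(K, dtype=float)
    S = K.size
    values = rng.normal(0.0, sigma, size=(S, S))
    present = rng.random((S, S)) < C                 # each directed link independently
    A = np.where(present, values, 0.0)
    np.fill_diagonal(A, -1.0 / K)                     # self-regulation sets carrying capacities
    return A


def weakness_parameter(K, C, sigma):
    """sigma * sqrt(S C) * K_max; values below about 1 indicate weak interactions."""
    return sigma * np.sqrt(len(K) * C) * np.max(K)


rng = np.random.default_rng(2)
K = rng.lognormal(mean=0.0, sigma=0.5, size=50)       # illustrative lognormal K_i
A = random_interaction_matrix(K, C=0.4, sigma=0.03, rng=rng)
off = ~np.eye(len(K), dtype=bool)
print(np.mean(A[off] != 0), weakness_parameter(K, 0.4, 0.03))   # realized connectance, weakness
```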

Fig. 3 shows the compliance with the three macroecological laws for different combinations of parameters within the weak-interaction regime (see Fig. 1 A and B for the sampling procedure). The first row illustrates that fluctuations of the abundances around their mean values still follow a gamma distribution (first law); the second row reveals that $\mathrm{Var}(x_i) \propto \bar{x}_i^2$, in accordance with the second law; and the third row shows that the mean abundances follow a lognormal distribution very closely (third law). Particularly noteworthy is the compliance with the third law, given that the mean abundances in Eq. 2 are no longer fixed by the carrying capacities, which are the quantities that do follow a lognormal distribution.

Fig. 3.

Grilli’s three macroecological laws as a function of the interaction parameters. Specifically, the figure shows the abundance fluctuation distribution (AFD) (panels A–C), Taylor’s law (panels D–F), and the MAD (panels G–I) for different values of the number of species $S$ (panels A, D, and G), the connectance $C$ (panels B, E, and H), and the SD of the interaction constants $\sigma$ (panels C, F, and I). Results have been averaged over 100 realizations of the SLVM (Eq. 1), each with a different random interaction matrix. Results including all realizations are depicted as a cloud of gray points, whereas averages are shown as colored bullets. The AFD obtained for a given realization contains the results for all species, represented in terms of rescaled logarithmic abundances, $z = \mathrm{Var}(x)^{-1/2}\log(x/\bar{x})$. Solid black lines correspond to gamma distributions. The MAD plots (G–I) are obtained by properly rescaling the mean abundances and are fitted by a normalized (zero mean, unit SD) lognormal distribution (black solid line). Similarly, the black straight lines in panels D–F describe the relation $\mathrm{Var}(x_i) \propto \bar{x}_i^2$ on a logarithmic scale. Panels J–L illustrate the limits of the weak-interaction regime across the set of parameters that characterize species interactions. The plots quantify the compliance with (J) a gamma AFD, (K) Taylor’s law, and (L) a lognormal MAD, within the region where the system is stable and feasible. Each pixel corresponds to a combination of values of the network connectance $C$ (horizontal axis) and the SD $\sigma$ of the distribution of interactions (vertical axis). The color of the pixel quantifies the distance from the AFD to a gamma distribution (J), the value of the exponent $\gamma$ in the relationship $\mathrm{Var}(x_i) \propto \bar{x}_i^{\gamma}$ (K), and the distance from the MAD to a lognormal distribution (L), averaged over a sample of 100 realizations. Gray areas mark the region of the parameter space where the resulting systems are unstable or unfeasible.
In these plots $S = 50$, $\tau_i = 0.1$, $w = 0.1$, and the carrying capacities are sampled from a lognormal distribution (mean 0.1, SD 0.5).

Importantly, the gamma shape of the abundance fluctuation distribution (AFD) remains unaffected regardless of the values taken by the interaction parameters (Fig. 3J). Moving closer to the boundary of the weak-interaction regime, Taylor’s law still holds, but its exponent changes, $\mathrm{Var}(x_i) \propto \bar{x}_i^{\gamma}$. As this boundary is approached, the exponent decreases down to values around $\gamma \approx 1.4$ (Fig. 3K) and, likewise, the mean-square distance between the distribution of mean abundances and a lognormal increases (Fig. 3L), although it never becomes very large.
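The exponent $\gamma$ can be estimated by a linear fit of $\log \mathrm{Var}(x_i)$ against $\log \bar{x}_i$ across species. The sketch below assumes an (n_samples, S) abundance array with strictly positive means; ordinary least squares and the function name `taylor_exponent` are our choices and may differ from the estimator used for Fig. 3K. The synthetic check scales one common fluctuation profile by different means, which gives $\gamma = 2$ exactly.

```python
import numpy as np


def taylor_exponent(samples):
    """Slope gamma of log Var(x_i) versus log mean(x_i) across species (columns)."""
    means = samples.mean(axis=0)
    variances = samples.var(axis=0, ddof=1)
    gamma, _intercept = np.polyfit(np.log(means), np.log(variances), deg=1)
    return gamma


rng = np.random.default_rng(3)
z = rng.gamma(shape=5.0, scale=0.2, size=1000)      # one common fluctuation profile
demo = np.outer(z, [0.5, 1.0, 2.0, 8.0])            # column i = mean_i * z, so Var ~ mean^2
print(round(taylor_exponent(demo), 6))              # 2.0
```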

It is worth mentioning that the SLVM (Eq. 1) provides an alternative way to comply with the third law, other than sampling the carrying capacities from a lognormal distribution and remaining in the weak-interaction regime. Even if we choose constant carrying capacities ($K_i = K$ for all $i$), Eq. 2 allows us to seek interaction matrices that shift the mean abundances so that they follow a lognormal distribution. SI Appendix, section 5 shows that such matrices do exist and yield stable and feasible communities. This finding brings species interactions into the long-standing debate about the origin of heavy-tailed abundance distributions, a connection that, to the best of our knowledge, has scarcely been investigated (but see ref. 19). This issue goes beyond the aim of the current work and will be explored in a forthcoming publication. Therefore, hereafter, we focus on the weak-interaction regime with lognormally distributed carrying capacities.

Interactions Reproduce the Distribution of Correlations.

An analysis of empirical data selected from the EBI metagenomics platform (45) reveals that, on top of the three single-species macroecological laws discussed so far, microbiomes exhibit pairwise correlations. As a matter of fact, the distribution of all $S(S-1)$ Pearson’s coefficients of a microbial community has a characteristic pattern (Fig. 1C). For all the microbiomes that we have considered, this distribution approximately covers the whole range of values ($-1 \le \rho_{ij} \le 1$) and is very different from the narrow residual distribution peaked at zero that results from approaches with no species interactions (15, 19, 20) (Fig. 2, as well as Fig. 4A). Worth noticing are the almost exponential decay toward both ends of the interval and the location of the maximum at slightly negative values of $\rho_{ij}$.
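Without interactions, any nonzero sample correlation is a finite-sample effect, which is why the noninteracting prediction is narrow and centered at zero. The sketch below illustrates this under our own simplifying assumption that samples are independent draws from the stationary distribution of the SLM. For Eq. 1 with $A$ diagonal and $W = wI$, that distribution is a gamma with shape $2/(w\tau_i) - 1$ and mean $K_i(1 - \tau_i w/2)$, consistent with Eq. 2. For $n$ samples, the coefficients then cluster around zero with a spread of roughly $1/\sqrt{n}$. The function name, sample size, and parameter values are illustrative.

```python
import numpy as np


def slm_null_correlations(K, tau, w, n_samples, rng=None):
    """Pairwise correlations from independent SLM stationary (gamma) samples, no interactions."""
    rng = np.random.default_rng() if rng is None else rng
    K = np.asarray(K, dtype=float)
    shape = 2.0 / (w * tau) - 1.0                  # requires w * tau < 2
    scale = w * tau * K / 2.0                       # mean = shape * scale = K (1 - w tau / 2)
    samples = rng.gamma(shape, scale, size=(n_samples, K.size))
    R = np.corrcoef(samples, rowvar=False)
    return R[np.triu_indices_from(R, k=1)]


rng = np.random.default_rng(4)
K = rng.lognormal(mean=0.0, sigma=0.5, size=50)
rho = slm_null_correlations(K, tau=0.1, w=0.1, n_samples=200, rng=rng)
print(rho.mean(), rho.std(), 1 / np.sqrt(200))    # mean near 0, spread near 1/sqrt(n)
```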

Fig. 4.

Abundance correlation distributions for real and simulated communities. In (A), different colored bullets correspond to different biomes selected from the EBI metagenomics platform (45) (namely Seawater, River, Lake, Glacier, and Sludge communities). Black dashed lines portray the distribution of Pearson’s coefficients for the abundance correlation of all pairs of species resulting from the SLM. Gray curves show the same distributions as obtained from the SLVM (cf. Eq. 1), with the Lotka–Volterra interaction constants inferred using the Bayesian approach described in “Materials and Methods.” The Inset in the Bottom panel of (A) presents the quantile–quantile (QQ) plot comparing quantiles of the empirical and the synthetic distributions for the different biomes. The dots sit on the bisector line, indicating close alignment between the quantiles of both distributions for each biome. This test is consistent with the results of a Kolmogorov–Smirnov test (P-values of Seawater and River: 0.99; Lake and Glacier: 0.90; Sludge: 0.81). The Top panel of (B) shows the Euclidean distance between the logarithms of the empirical and the synthetic distributions (for Seawater, using only species appearing in at least 50% of the samples) vs. the iterations of the MCMC. In blue and gray dots, the Inset shows the synthetic distributions obtained at the iteration marked by a dashed line of the corresponding color. The empirical distribution is drawn with a solid black line. (C) shows the distribution of absolute values of the interaction constants ($|a_{ij}|$) for a collection of over 200 matrices generated through the MCMC method (green bullets). This distribution can be fitted by a convex combination of two Gaussian distributions (gray solid line). The black (blue) dashed line fits the broader (narrower) Gaussian. Practically all coefficients in the narrower Gaussian are negligible compared to those in the broader one.
Hence, the presence of a small fraction of large coefficients gives rise to an effective connectance ($C_{\mathrm{eff}}$) in the associated network. The Inset shows a histogram of the values of $C_{\mathrm{eff}}$. It peaks around 0.05, with tails extending approximately over the range $0.01 \lesssim C_{\mathrm{eff}} \lesssim 0.15$. This reveals the high sparsity of the interactions.

In order to find a set of interaction matrices A (or “ensemble”) that are capable of inducing the empirically observed patterns of correlations, while at the same time preserving Grilli’s three laws, we have adopted a Bayesian approach. We know from the previous analysis (Fig. 2, Right) that matrices inducing similar correlations do exist. Thus, we take the empirical correlation distributions, as well as Grilli’s second and third laws, as given—within a Gaussian error—and wonder about the posterior probability distribution of interaction matrices A. Needless to say, this distribution cannot be computed analytically, so in order to sample matrices A out of the ensemble of possible solutions, we need to perform a Markov chain Monte Carlo (MCMC) simulation (see “Materials and Methods” for the details).
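The logic of this Bayesian search can be illustrated with a generic random-walk Metropolis sketch. It is not the authors’ algorithm: their likelihood, priors, and move set are given in “Materials and Methods.” To keep the example cheap and self-contained we make assumptions of our own. Model correlations come from a linear-noise approximation: the stationary covariance $\Sigma$ of the log-abundances solves the Lyapunov equation $J\Sigma + \Sigma J^\top = -wI$, with $J = \mathrm{diag}(1/\tau)\,A\,\mathrm{diag}(\bar{x})$, instead of simulating Eq. 1. The only constraint is a Gaussian error $\epsilon$ on the Euclidean distance between log-histograms of correlations (the distance monitored in Fig. 4B), so the second- and third-law constraints are omitted. The target histogram is generated from a random matrix purely for illustration, not from real data, and all names are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov


def model_correlations(A, tau, w):
    """Linear-noise estimate of rho_ij (i < j); None if unfeasible or unstable."""
    xbar = np.linalg.solve(A, tau * w / 2.0 - 1.0)                  # Eq. 2
    if np.any(xbar <= 0):
        return None
    J = A * xbar[None, :] / tau[:, None]                             # diag(1/tau) A diag(xbar)
    if np.any(np.linalg.eigvals(J).real >= 0):
        return None
    Sigma = solve_continuous_lyapunov(J, -w * np.eye(len(xbar)))    # J S + S J^T = -w I
    d = np.sqrt(np.diag(Sigma))
    R = Sigma / np.outer(d, d)
    return R[np.triu_indices_from(R, k=1)]


def log_histogram(rho, bins, floor=1e-3):
    density, _ = np.histogram(rho, bins=bins, density=True)
    return np.log(density + floor)                                   # floor avoids log(0); arbitrary


def metropolis(A0, tau, w, target, bins, n_iter=500, step=0.02, eps=0.5, rng=None):
    """Random-walk Metropolis over off-diagonal a_ij (flat prior, diagonal kept fixed)."""
    rng = np.random.default_rng() if rng is None else rng
    A = A0.copy()
    rho = model_correlations(A, tau, w)
    assert rho is not None, "the initial matrix must be feasible and stable"
    energy = np.sum((log_histogram(rho, bins) - target) ** 2) / (2 * eps**2)
    for _ in range(n_iter):
        i, j = rng.choice(len(A), size=2, replace=False)            # one off-diagonal entry
        A_new = A.copy()
        A_new[i, j] += step * rng.standard_normal()
        rho_new = model_correlations(A_new, tau, w)
        if rho_new is None:
            continue                                                 # reject inadmissible proposals
        e_new = np.sum((log_histogram(rho_new, bins) - target) ** 2) / (2 * eps**2)
        if np.log(rng.random()) < energy - e_new:                    # Metropolis acceptance rule
            A, energy = A_new, e_new
    return A, energy


rng = np.random.default_rng(5)
S, tau, w = 10, np.full(10, 0.1), 0.1
bins = np.linspace(-1.0, 1.0, 21)
A_true = np.where(rng.random((S, S)) < 0.3, rng.normal(0.0, 0.05, (S, S)), 0.0)
np.fill_diagonal(A_true, -1.0)
target = log_histogram(model_correlations(A_true, tau, w), bins)    # synthetic stand-in for data
A_fit, energy = metropolis(-np.eye(S), tau, w, target, bins, rng=rng)
print(energy)
```

Rejecting inadmissible proposals is equivalent to a prior that vanishes outside the feasible and stable region, so the chain samples an ensemble of admissible matrices rather than converging to a single one.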

As an illustration of the results of this approach, Fig. 4A shows the distribution of Pearson’s coefficients obtained for five biomes (SI Appendix, Fig. S15 contains the results for all available biomes in our dataset), along with the empirical ones. The agreement is close in all cases. Remarkably, the interaction matrices $A$ to which this method converges follow an interesting statistical pattern. Most coefficients remain zero, and the nonzero ones are distributed as a combination of two zero-mean Gaussian distributions whose SDs differ by more than one order of magnitude (see Fig. 4C for a typical fit). This yields highly sparse matrices (sparsity is estimated from the contribution of the wider Gaussian; see “Materials and Methods”).
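One way to obtain such a two-Gaussian fit is expectation–maximization (EM) for a mixture of two zero-mean Gaussians, applied to the nonzero off-diagonal constants, as sketched below. We then estimate the effective connectance, under our own assumption about how the pieces combine, as the fraction of nonzero off-diagonal entries times the weight of the wider component; the authors’ exact definition is given in “Materials and Methods.” Function names and the synthetic matrix are hypothetical.

```python
import numpy as np


def fit_zero_mean_mixture(a, n_iter=200):
    """EM for p(a) = pi N(a; 0, s1^2) + (1 - pi) N(a; 0, s2^2); returns (pi, s1, s2), s1 >= s2."""
    a = np.asarray(a, dtype=float)
    s1, s2, pi = a.std(), a.std() / 10.0, 0.5       # start: one wide, one narrow component
    for _ in range(n_iter):
        # E-step: responsibility of the wide component (the 1/sqrt(2 pi) factor cancels)
        p1 = pi * np.exp(-a**2 / (2 * s1**2)) / s1
        p2 = (1 - pi) * np.exp(-a**2 / (2 * s2**2)) / s2
        r = p1 / (p1 + p2)
        # M-step: mixture weight and zero-mean variances
        pi = r.mean()
        s1 = np.sqrt(np.sum(r * a**2) / np.sum(r))
        s2 = np.sqrt(np.sum((1 - r) * a**2) / np.sum(1 - r))
    return (pi, s1, s2) if s1 >= s2 else (1 - pi, s2, s1)


def effective_connectance(A):
    """Fraction of off-diagonal entries attributed to the wider Gaussian (assumed definition)."""
    off = A[~np.eye(len(A), dtype=bool)]
    nonzero = off[off != 0]
    pi_wide, _, _ = fit_zero_mean_mixture(nonzero)
    return pi_wide * nonzero.size / off.size


rng = np.random.default_rng(6)
S = 60
A = np.zeros((S, S))
mask = rng.random((S, S)) < 0.5                      # about half the entries are nonzero ...
wide = rng.random((S, S)) < 0.1                      # ... and about 10% of those are large
n = mask.sum()
A[mask] = np.where(wide[mask], rng.normal(0.0, 0.1, n), rng.normal(0.0, 0.001, n))
np.fill_diagonal(A, -1.0)
print(effective_connectance(A))                      # roughly 0.5 * 0.1 = 0.05
```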

It is worth mentioning that, if the constraints on the second and third laws are removed, results similar to those of Fig. 4A can be obtained with more densely connected matrices. However, the exponents of Taylor’s law are then smaller than 2, and the MAD deviates from a lognormal distribution. This strongly suggests that loosely connected interaction networks (sparsity) might be an important feature of real microbiomes, which seems consistent with existing experimental evidence (23, 26).

Discussion

The existence of macroecological, single-species statistical patterns in microbial communities puts strong constraints on their mathematical modeling. A remarkable result is that a simple model that neglects interactions between species, the SLM (15, 19, 20), seems enough to reproduce such single-species patterns. This is somewhat unsettling because species interactions are well documented to play a fundamental role in the behavior of microbial communities. As a matter of fact, interactions may underlie critical features associated with, e.g., health disorders (32) such as Crohn’s disease (33) or some forms of inflammatory bowel syndrome (34), and many current treatments rely on competition among bacteria (31, 35).

However, not surprisingly, neither the SLM nor any other independent-species approach is able to account for the nontrivial patterns of species-abundance correlations that microbial communities exhibit. It is true that a correlation between a pair of species does not necessarily imply a direct interaction between them: It may be caused by common (similar or opposite) responses to environmental fluctuations or external driving forces. As a matter of fact, phylogenetic closeness can explain why some species respond alike to the same chemicals, which may account for much of the observed positive correlation (22). However, the widespread presence of negative correlations renders this explanation incomplete.

We have shown in this paper that environmental fluctuations by themselves are insufficient to reproduce the correlation patterns observed in real microbiomes, leaving direct interactions as a necessary ingredient in any sound dynamical model of microbial communities. This conclusion is also supported by a recent work (40) in which microbial behavior has been investigated within the framework of the consumer–resource model. In particular, the authors argue that resource competition could account for numerous statistical patterns observed in abundance fluctuations across diverse microbiotas, encompassing the human gut, saliva, vagina, mouse gut, and rice. Significantly, they also investigate the distribution of abundance correlations and find that the predictions derived from resource competition are much closer to empirical data than those of noninteracting models.

Furthermore, we have shown here that it is possible to introduce interactions into the SLM, thus generating a Lotka–Volterra type of model that complies simultaneously with the three single-species macroecological laws put forward in ref. 15 and with the abundance-correlation patterns. Interaction matrices satisfying such constraints have been generated through a Bayesian approach implemented by means of a Markov chain Monte Carlo method. These matrices (and their associated networks) are assumed to be representative of a larger ensemble, whose characterization in terms of topological or structural features is challenging.

We have found very robust evidence that these matrices need to be sparse, which is in agreement with the reported sparse nature of microbial interaction networks (23, 26). Network sparsity has been previously argued to be a crucial topological feature for the functionality of ecological (and other) networks. For instance, Busiello et al. (46) have shown that network sparsity is an emergent property resulting from the conflicting forces of optimizing both explorability and dynamical robustness, where explorability is a measure of the system’s ability to adapt to newly intervening changes, and dynamical robustness is the capacity of the system to remain stable after perturbations of the underlying dynamics. In other words, networks able to adapt in a flexible way to external changes, while keeping a robust dynamical regime, need to be sparse.

The high sparsity values that we obtain imply that more than 90% of the pairwise interactions emerging from our analysis are commensalistic or amensalistic (SI Appendix, Fig. S20). This is an interesting outcome because, in microbiotas, commensalism can be associated with cross-feeding and amensalism with poisoning by toxins, two types of interactions that are very common among microbes (47, 48) and are known to help stabilize diverse ecological communities (49).
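Classifying pairwise interactions by the signs of $a_{ij}$ and $a_{ji}$ is straightforward. The sketch below uses the standard ecological labels and treats $|a_{ij}|$ at or below a hypothetical threshold `tol` as no effect. It also hints at why sparsity alone favors commensalism and amensalism: if each directed link is present independently with probability $C$, the fraction of interacting pairs with exactly one link is $2C(1-C)/[1-(1-C)^2] = 2(1-C)/(2-C)$, about 0.97 for $C = 0.05$. This random-placement estimate is our own back-of-the-envelope argument, not the analysis behind SI Appendix, Fig. S20.

```python
from collections import Counter

import numpy as np

LABELS = {
    (0, 0): "none",
    (0, 1): "commensalism",     # one species benefits, the other is unaffected
    (-1, 0): "amensalism",      # one species is harmed, the other is unaffected
    (1, 1): "mutualism",
    (-1, -1): "competition",
    (-1, 1): "exploitation",    # predation or parasitism
}


def classify_pairs(A, tol=0.0):
    """Count pair types i < j from the signs of (a_ij, a_ji); |a| <= tol counts as no effect."""
    s = np.sign(np.where(np.abs(A) > tol, A, 0.0)).astype(int)
    counts = Counter()
    for i in range(len(A)):
        for j in range(i + 1, len(A)):
            counts[LABELS[tuple(sorted((int(s[i, j]), int(s[j, i]))))]] += 1
    return counts


A_demo = np.array([[-1.0, 0.5, 0.0],
                   [0.0, -1.0, -0.3],
                   [0.2, 0.4, -1.0]])
print(classify_pairs(A_demo))   # Counter({'commensalism': 2, 'exploitation': 1})

C = 0.05
print(round(2 * (1 - C) / (2 - C), 3))   # 0.974
```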

Besides network sparsity, we have so far not been able to identify other structural features that distinguish this ensemble of networks from random ones, in the way that previous work distilled properties such as nestedness, modularity, or trophic-level (hierarchical) organization from analyses of other (much smaller and less complex) ecological networks.

For example, the relative frequency of different types of interactions (competitive, mutualistic, neutral, and so on), the degree distribution, the number of directed loops of different lengths, and the network spectrum remain nearly indistinguishable from their counterparts in random networks (see SI Appendix for details). This does not mean that more subtle differences (besides network sparsity) do not exist between both ensembles. As a matter of fact, such nonrandom structural features must exist, because typical representatives of the ensemble of random sparse networks do not comply with the empirical macroecological laws; only those found through the Monte Carlo simulations do. Identifying such additional nonrandom structural features is not a trivial task because it most likely involves deciphering higher-order correlations in the way pairwise interactions are placed within the network. Thus, it remains a challenging goal for future research.

An aspect that makes the problem especially challenging is the large number of free parameters (interactions) to be determined, which scales with the square of the number of species or taxa, as well as the intrinsic difficulty of identifying their actual values (even with perfect information on correlations; see ref. 39).

Our solution to this formidable challenge is to construct not a single inferred network, specifying each possible pairwise interaction in detail, but rather a whole ensemble of interaction networks compatible with the observed distribution of correlations. In this approach, the question of which specific pairs interact becomes irrelevant: It can neither be answered nor does it make any difference when it comes to explaining macroecological data.

An alternative approach is that of Gibson et al. (50). The authors lump microbial taxa into groups (or modules), assuming that all taxa within a group share the same interactions with taxa outside their module (and respond to perturbations in the same way). This “dimensionality reduction” allows them to perform a (coarse-grained) network inference with far fewer parameters. Let us stress, however, that we observed very little modularity in the matrices emerging from our Monte Carlo method, which in principle rules out such a coarse-graining procedure.

We are aware that some of our modeling choices can be questioned, for instance, the use of pairwise interactions in a Lotka–Volterra-like fashion. Apart from its long tradition in theoretical ecology, recent works (51, 52) show that, within some limits, it is a reasonable choice. Nevertheless, it has been argued that higher-order interactions may be crucial for correctly assessing community stability and for understanding its conflicting relationship with species diversity (42, 53). Microbiomes are extraordinarily complex communities in which processes involving more than two species may be prevalent (54). In fact, effective dynamical models for species abundances derived from more detailed consumer–resource models generically include higher-order interactions (e.g., ref. 22). Including higher-order interaction terms in Eq. 1 is therefore a generalization worth exploring.

Reproducing abundance correlations with direct interactions alone is more a proof of concept than a claim that this is the only mechanism behind species correlations. As already mentioned, phylogeny may be behind much of the positive correlation among microbial species (22), so although environmental noise cannot by itself reproduce the observed correlation patterns, it should not be ruled out. Most likely, both terms must be kept in Eq. 1, and the truth lies in an appropriate combination of the two. This might, in fact, be the reason behind the asymmetry in the correlation patterns that the model with interactions alone struggles to capture.

Other important modeling ingredients, such as demographic noise or immigration, are left out as well. Demographic noise stems from intrinsic fluctuations in birth-and-death processes, and its amplitude is proportional to the square root of the abundance (unlike the environmental noise of the SLVM, which is proportional to the abundance itself). A very recent study (55) posits that a simple linear model with demographic noise can reproduce patterns not only of microbiomes but also of very different systems. It has the additional advantage of being analytically solvable. The authors claim that such simple models can do a better job at capturing general properties observed across very different systems. Be that as it may, environmental noise can explain all three of Grilli’s laws (which, as argued in ref. 15, demographic noise cannot) and can also describe other statistical patterns (SI Appendix, Figs. S21 and S22). For these reasons, we have favored it over demographic noise in our model to study species-abundance correlations. Nevertheless, future work to further discriminate the effects and/or interplay of both types of noise would be highly desirable.

As for immigration, let us remark that it precludes extinctions in open ecosystems such as ours (24), but its effects are more relevant when communities are far from steady states. In this respect, recent work (56) emphasizes that the gut microbiome is constantly bottlenecked, with large amounts of biomass continually lost in the stool. Abundances can therefore change through processes other than replication, and our framework will eventually need to be extended to account for these effects.

Aside from these considerations, our analyses open many possibilities for a deeper understanding of microbial communities and their emerging ecological patterns. For instance, whereas the origin of the gamma AFD (and of its cousin, Taylor’s law) is related to the multiplicative nature of the noise (the larger the abundance, the larger its fluctuations), we still lack a good explanation for the appearance of a lognormal MAD. In both the SLM and the present SLVM, it has been imposed by purposely tailoring the carrying capacities of the species. But through Eq. 2, the SLVM offers the possibility that a special choice of the interaction constants, away from what we have termed the weak-interaction regime, may induce a lognormal MAD in a self-organized way. Preliminary analyses show that this can actually happen (SI Appendix, section 6), placing the explanatory burden on the nature of the interaction networks. This brings the network theory of species interactions into the long-standing debate (57) about the mechanistic processes behind the emergence of heavy-tailed species-abundance distributions, something that, to the best of our knowledge, has scarcely been explored so far (see ref. 58 for an exception).

Perhaps the most important message of the present work is that direct interactions between species are as relevant in microbiomes as they are in more traditional ecosystems, such as animal–plant communities or food webs. In this regard, our analysis brings the study of microbes closer to the well-established framework of community ecology, where generalized Lotka–Volterra models play a central role. This paves the way to testing theoretical laws in ecology through experiments performed in microbial communities; the test of the stability–diversity relationship carried out in ref. 59 is an excellent example of this idea. Given the usual scarcity of data for traditional ecosystems, the overwhelming amount of microbial data provided by metagenomics opens an avenue of unprecedented possibilities for ecology.

Materials and Methods

Numerical Solution of the SLVM.

Eq. 1 was solved numerically using an Euler–Maruyama integration scheme (60). For each species, the solution represents a noisy logistic trajectory, with the stationary mean population determined by the interaction properties. Using the resulting time series, the population of a given species in different samples may be recovered—once the dynamics has reached the stationary state—by either selecting abundances at different times (longitudinal data) or considering the abundances of different realizations at the same time (cross-sectional data). Both ways lead to identical results (i.e., the system is ergodic). Further details are discussed in SI Appendix, section 1.
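To make the integration step concrete, here is a minimal Euler–Maruyama sketch. Eq. 1 is not restated in this section, so the drift used below, $(x_i/\tau)\,[1 - (x_i - \sum_j a_{ij}x_j)/K_i]$ with multiplicative environmental noise $\sqrt{\sigma/\tau}\,x_i\,\xi_i(t)$, is an assumed stand-in that reduces to the stochastic logistic model when all $a_{ij} = 0$; the noise is taken to be uncorrelated across species, and the names `tau`, `sigma`, and `n_real` are our own. One deliberate deviation: the scheme is applied to $y_i = \log x_i$ (including the Itô correction $-\sigma/(2\tau)$) so that abundances stay positive. Each row of the output is an independent realization at the final time, i.e., cross-sectional data.

```python
import numpy as np


def simulate_slvm(A, K, tau=1.0, sigma=0.2, dt=0.01, t_max=50.0, n_real=1000, seed=0):
    """Sketch: Euler-Maruyama for an ASSUMED stochastic Lotka-Volterra form.

    Assumed Ito SDE for species i (a stand-in for Eq. 1, not a transcription):
        dx_i = (x_i/tau) * (1 - (x_i - sum_j A[i, j] x_j) / K_i) dt
               + sqrt(sigma/tau) * x_i dW_i
    The scheme is applied to y_i = log x_i, which keeps abundances positive.
    Returns one row per independent realization at time t_max.
    """
    rng = np.random.default_rng(seed)
    K = np.asarray(K, dtype=float)
    A = np.asarray(A, dtype=float)
    y = np.tile(np.log(K), (n_real, 1))       # every realization starts at x_i = K_i
    amp = np.sqrt(sigma / tau)                # noise amplitude for log-abundance
    ito = 0.5 * sigma / tau                   # Ito correction from the change of variables
    for _ in range(int(round(t_max / dt))):
        x = np.exp(y)
        rate = (1.0 - (x - x @ A.T) / K) / tau   # deterministic per-capita growth rate
        dW = rng.normal(0.0, np.sqrt(dt), size=y.shape)
        y += (rate - ito) * dt + amp * dW
    return np.exp(y)
```

With $A = 0$ each species follows the stochastic logistic model, whose stationary distribution is a gamma distribution with mean $K_i(1 - \sigma/2)$; comparing the cross-sectional means with this value is a quick check of the integrator and of the ergodicity argument above.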

Environmental Noise Matrix Sampling.

To produce a random, positive definite, symmetric matrix $W$, we factor it as $W = U\Lambda U^{\mathsf T}$, where $U$ is an $S\times S$ orthogonal matrix ($UU^{\mathsf T} = U^{\mathsf T}U = I$) and $\Lambda$ is a diagonal matrix whose diagonal elements are random, nonnegative real numbers (strictly positive values are needed for $W$ to be positive definite rather than merely positive semidefinite). The matrix $U$ is sampled from the Haar distribution over orthogonal matrices using the function ortho_group (module scipy.stats) of the Python package SciPy (61). The diagonal elements of $\Lambda$ have been drawn from different probability distributions, all of which lead to similar results (see SI Appendix, section 5 for a full account).
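The factorization can be sketched in Python as follows. `scipy.stats.ortho_group` is the SciPy function named in the text; drawing the eigenvalues from an exponential distribution is only one illustrative choice (the text reports trying several), and the helper that turns $W$ into correlated Wiener increments through its Cholesky factor is an assumption about how $W$ would enter as a noise covariance, not a description of the authors' code. Exponential draws are positive with probability one, so the resulting $W$ is positive definite.

```python
import numpy as np
from scipy.stats import ortho_group


def sample_noise_matrix(S, rng, scale=1.0):
    """Random symmetric positive definite W = U diag(lam) U^T.

    U is Haar-distributed on the orthogonal group O(S);
    lam ~ Exponential(scale) is an illustrative (assumed) choice.
    """
    U = ortho_group.rvs(dim=S, random_state=rng)
    lam = rng.exponential(scale, size=S)
    return U @ np.diag(lam) @ U.T


def correlated_increments(W, dt, n, rng):
    """n Wiener increments with covariance W * dt, built from the Cholesky factor of W."""
    L = np.linalg.cholesky(W)
    z = rng.normal(size=(n, W.shape[0]))   # independent standard normals
    return np.sqrt(dt) * z @ L.T           # each row is L @ z_r, so Cov = L L^T dt = W dt
```

The Cholesky step fails if $W$ is not positive definite, which makes it a convenient guard against degenerate eigenvalue draws.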

Bayesian Approach.

The posterior distribution of matrix A, given the correlation distribution ρ, is obtained from Bayes’s formula:

$$P(A\mid\rho) = \frac{P(\rho\mid A)\,P(A)}{P(\rho)}. \tag{3}$$

In order to sample matrices $A$ from the posterior distribution $P(A\mid\rho)$, we apply a Metropolis–Hastings algorithm (60): rather than sampling the posterior directly, we draw samples from a purposely tailored Markov chain whose stationary distribution is the posterior. In particular, at each step $n$ of the chain a pair of species $(i,j)$ is randomly selected and its corresponding interaction constant is modified as $a_{ij}^{(n+1)} = a_{ij}^{(n)} + \eta$, where $\eta \sim U(-\epsilon,\epsilon)$. This change is accepted with probability $\min(1, H_n)$ and rejected otherwise, where the Hastings factor follows from Bayes’s formula, Eq. 3 (the evidence $P(\rho)$ cancels and, because the proposal is symmetric, no proposal ratio appears):

$$H_n = \frac{P(\rho\mid A^{(n+1)})\,P(A^{(n+1)})}{P(\rho\mid A^{(n)})\,P(A^{(n)})}.$$
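A compact sketch of the chain just described, assuming a flat prior restricted by a hypothetical `is_admissible` check (standing in for the stability and feasibility conditions) and a placeholder `log_likelihood`. Only off-diagonal entries are updated, which is our reading of “a pair of species”. Comparing $\log u$ with $\log H_n$ for $u \sim U(0,1)$ implements acceptance with probability $\min(1, H_n)$ while avoiding overflow.

```python
import numpy as np


def run_chain(A0, log_likelihood, is_admissible, eps, n_steps, rng, burn_in=0):
    """Random-walk Metropolis-Hastings over the off-diagonal entries of A (sketch).

    log_likelihood(A): placeholder for log P(rho | A).
    is_admissible(A): hypothetical stability/feasibility check; the prior is
    taken as constant when it returns True and zero otherwise.
    Returns sampled off-diagonal entries (one row per step after burn_in)
    and the overall acceptance ratio.
    """
    A = np.array(A0, dtype=float)
    S = A.shape[0]
    off_diag = ~np.eye(S, dtype=bool)
    logL = log_likelihood(A)
    accepted = 0
    samples = []
    for step in range(n_steps):
        i, j = rng.choice(S, size=2, replace=False)   # random ordered pair, i != j
        old = A[i, j]
        A[i, j] = old + rng.uniform(-eps, eps)        # symmetric proposal, eta ~ U(-eps, eps)
        keep = False
        if is_admissible(A):                          # zero prior => H_n = 0 => reject
            logL_new = log_likelihood(A)
            # Flat prior: log H_n is just the log-likelihood difference.
            if np.log(rng.random()) < logL_new - logL:
                logL, keep = logL_new, True
                accepted += 1
        if not keep:
            A[i, j] = old                             # rejected: restore previous value
        if step >= burn_in:
            samples.append(A[off_diag])               # boolean indexing returns a copy
    return np.array(samples), accepted / n_steps
```

In the actual inference, `log_likelihood` would evaluate $\log P(\rho\mid A)$ from simulations of the model with matrix $A$, which makes each step expensive; consecutive samples are also correlated, so thinning or monitoring the acceptance ratio is advisable.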

The log-likelihood is computed (up to a trivial additive constant) as:

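$$\log P(\rho\mid A) = -\frac{1}{2\Delta_1^2}\sum_i\left[\log\frac{\rho(x_i)}{\hat\rho(x_i)}\right]^2 - \frac{1}{2\Delta_2^2}\,\lvert 2-\gamma\rvert - \frac{1}{2\Delta_3^2}\sum_i\left[\log\frac{\mathcal{M}(\bar x_i)}{\hat{\mathcal{M}}(\bar x_i)}\right]^2,$$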

where $\Delta_1 = 2$, $\Delta_2 = 0.1$, and $\Delta_3 = 0.3$ are weights chosen to make all cost terms comparable (we have verified that the Monte Carlo procedure is fairly robust to their precise values); $\rho(x)$ is the empirical distribution of Pearson’s coefficients; $\hat\rho(x)$ is the one computed using matrix $A$; $\mathcal{M}(\bar x_i)$ is the rescaled MAD as obtained from simulations; and $\hat{\mathcal{M}}(\bar x_i)$ is a standardized lognormal distribution. (The AFD need not be imposed because it is hardly affected by interactions; see Fig. 3.) As for the prior $P(A)$, we take it to be zero if $A$ leads to an unstable or unfeasible community, and constant otherwise. Finally, $\epsilon$ is chosen so that the acceptance ratio at the start of the Markov chain is 30%. Note, though, that this acceptance ratio decreases as the Monte Carlo progresses.
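The cost can be written compactly in code. This sketch assumes that both pairs of distributions are supplied as density histograms on shared bins, skips bins that are empty in either histogram (an assumption, since $\log 0$ is undefined), and follows the three cost terms of the log-likelihood, including the $\lvert 2-\gamma\rvert$ term; the function names are hypothetical. `pearson_histogram` shows one way to obtain a binned $\rho(x)$ from an abundance table with samples as rows and species as columns.

```python
import numpy as np


def pearson_histogram(X, bins):
    """Density histogram of Pearson coefficients between all species pairs.

    X has one row per sample and one column per species.
    """
    R = np.corrcoef(X, rowvar=False)
    iu = np.triu_indices_from(R, k=1)          # each unordered pair once
    density, _ = np.histogram(R[iu], bins=bins, density=True)
    return density


def log_likelihood(rho_emp, rho_sim, gamma, mad_sim, mad_ref, deltas=(2.0, 0.1, 0.3)):
    """Cost-based log-likelihood, up to an additive constant.

    rho_emp, rho_sim: binned densities of Pearson coefficients (same bins).
    gamma: exponent entering the |2 - gamma| term.
    mad_sim, mad_ref: binned rescaled MAD and reference standardized lognormal.
    """
    d1, d2, d3 = deltas

    def sq_log_ratio(p, q):
        ok = (p > 0) & (q > 0)   # assumption: bins empty in either histogram are skipped
        return np.sum(np.log(p[ok] / q[ok]) ** 2)

    return -(sq_log_ratio(rho_emp, rho_sim) / (2 * d1**2)
             + abs(2.0 - gamma) / (2 * d2**2)
             + sq_log_ratio(mad_sim, mad_ref) / (2 * d3**2))
```

The small value of $\Delta_2$ makes the $\gamma$ term stiff: a deviation of 0.5 alone costs 25 log-units, which is how the weights balance terms of very different natural scales.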

Double Normal Distribution.

The nonzero fraction $f$ of nondiagonal elements of the interaction matrices $A$ generated through Monte Carlo simulations follows a distribution that is well described as a convex combination of two zero-mean normal distributions with SDs $\sigma_w$ (for “wide”) and $\sigma_n$ (for “narrow”). Typically, $\sigma_w$ is at least an order of magnitude larger than $\sigma_n$. Because this distribution spans such different scales, it is better visualized in log–log scale. As a function of $z = \log|x|$, the density of a zero-mean normal variable $x$ with SD $\sigma$ (counting both signs of $x$) has the expression:

$$\log g(z,\sigma) = \frac{1}{2}\log\frac{2}{\pi\sigma^2} - \frac{e^{2z}}{2\sigma^2} + z. \tag{4}$$

Thus, in log–log scale, the distribution of the absolute value of coefficients (signs are equally likely positive or negative) is given by:

$$G(z) = \chi\, g(z,\sigma_w) + (1-\chi)\, g(z,\sigma_n), \qquad z = \log|a_{ij}|. \tag{5}$$

Here, $\chi$ can be interpreted as the fraction of nonzero matrix elements that are “large,” i.e., drawn from the wide component. Hence, an “effective connectance” of the interaction matrix can be estimated as $C_{\mathrm{eff}} = f\chi$.
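A short sketch of Eqs. 4 and 5, useful for checking a fit numerically, written with $\sigma_w$ and $\sigma_n$ for the wide and narrow components. The parameter values are illustrative only, and `G` and `effective_connectance` are hypothetical helper names. Because Eq. 4 is the density of $z = \log|x|$ (the $+z$ term is the Jacobian $e^z$ of the change of variables), the mixture must integrate to one over $z$; the last lines check this with `scipy.integrate.quad`.

```python
import numpy as np
from scipy.integrate import quad


def log_g(z, sigma):
    """Eq. 4: log-density of z = log|x| when x ~ Normal(0, sigma^2), both signs counted."""
    return 0.5 * np.log(2.0 / (np.pi * sigma**2)) - np.exp(2.0 * z) / (2.0 * sigma**2) + z


def G(z, chi, sigma_w, sigma_n):
    """Eq. 5: two-component mixture in the variable z = log|a_ij|."""
    return chi * np.exp(log_g(z, sigma_w)) + (1.0 - chi) * np.exp(log_g(z, sigma_n))


def effective_connectance(f, chi):
    """C_eff = f * chi: nonzero fraction times share of 'large' elements."""
    return f * chi


# Normalization check over z with illustrative values chi=0.3, sigma_w=1, sigma_n=0.05;
# `points` tells quad where the two peaks (near z = log sigma) sit.
total, _ = quad(G, -40.0, 10.0, args=(0.3, 1.0, 0.05),
                points=[np.log(0.05), 0.0], limit=200)
```

Fitting $\chi$, $\sigma_w$, and $\sigma_n$ to the Monte Carlo output (for example by maximizing $\sum \log G(z)$ over the observed $z$ values) then yields $C_{\mathrm{eff}}$ directly.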

Supplementary Material

Appendix 01 (PDF)

Acknowledgments

This work has been supported by i) grants PGC2022-141802NB-I00 (BASIC), PID2021-128966NB-I00, and PID2020-113681GB-I00 funded by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe,” and ii) Consejería de Conocimiento, Investigación Universidad, Junta de Andalucía and Universidad de Granada under project B-FQM-366-UGR20 (ERDF). We also thank Jacopo Grilli for enlightening discussions.

Author contributions

J.C.-M., A.L., M.S., M.A.M., and J.A.C. designed research; J.C.-M. and A.L. performed research; J.C.-M., A.L., and J.A.C. analyzed data; and J.C.-M., A.L., M.A.M., and J.A.C. wrote the paper.

Competing interests

The authors declare no competing interest.

Footnotes

This article is a PNAS Direct Submission.

Data, Materials, and Software Availability

Previously published data were used for this work (15).

Supporting Information

References

1. Handelsman J., Rondon M. R., Brady S. F., Clardy J., Goodman R. M., Molecular biological access to the chemistry of unknown soil microbes: A new frontier for natural products. Chem. Biol. 5, R245–R249 (1998).
2. Steele H. L., Streit W. R., Metagenomics: Advances in ecology and biotechnology. FEMS Microbiol. Lett. 247, 105–111 (2005).
3. Whitman W. B., Coleman D. C., Wiebe W. J., Prokaryotes: The unseen majority. Proc. Natl. Acad. Sci. U.S.A. 95, 6578–6583 (1998).
4. Rappé M. S., Giovannoni S. J., The uncultured microbial majority. Annu. Rev. Microbiol. 57, 369–394 (2003).
5. Hug L. A., et al., A new view of the tree of life. Nat. Microbiol. 1, 16048 (2016).
6. Brown J. H., Maurer B. A., Macroecology: The division of food and space among species on continents. Science 243, 1145–1150 (1989).
7. Brown J. H., Macroecology (University of Chicago Press, 1995).
8. Field R., Macroecology takes wing. Glob. Ecol. Biogeogr. 11, 87–88 (2002).
9. Prosser J. I., et al., The role of ecological theory in microbial ecology. Nat. Rev. Microbiol. 5, 384–392 (2007).
10. Shade A., et al., Macroecology to unite all life, large and small. Trends Ecol. Evol. 33, 731–744 (2018).
11. McGill B. J., The what, how and why of doing macroecology. Glob. Ecol. Biogeogr. 28, 6–17 (2019).
12. Ji B. W., Sheth R. U., Dixit P. D., Tchourine K., Vitkup D., Macroecological dynamics of gut microbiota. Nat. Microbiol. 5, 768–775 (2020).
13. Zaoli S., Grilli J., A macroecological description of alternative stable states reproduces intra- and inter-host variability of gut microbiome. Sci. Adv. 7, eabj2882 (2021).
14. Shoemaker W. R., Locey K. J., Lennon J. T., A macroecological theory of microbial biodiversity ecology. Nat. Ecol. Evol. 5, 1–7 (2017).
15. Grilli J., Macroecological laws describe variation and diversity in microbial communities. Nat. Commun. 11, 1–11 (2020).
16. Taylor L. R., Aggregation, variance and the mean. Nature 189, 732–735 (1961).
17. Redner S., Random multiplicative processes: An elementary tutorial. Am. J. Phys. 58, 267–273 (1990).
18. Muñoz M. A., Colaiori F., Castellano C., Mean-field limit of systems with multiplicative noise. Phys. Rev. E 72, 056102 (2005).
19. Descheemaeker L., de Buyl S., Stochastic logistic models reproduce experimental time series of microbial communities. eLife 9, e55650 (2020).
20. Wolff R., Shoemaker W., Garud N., Ecological stability emerges at the level of strains in the human gut microbiome. mBio 14, e0250222 (2023).
21. Shoemaker W. R., A macroecological perspective on genetic diversity in the human gut microbiome. PLoS One 18, e0288926 (2023).
22. Sireci M., Muñoz M. A., Grilli J., Environmental fluctuations explain the universal decay of species-abundance correlations with phylogenetic distance. Proc. Natl. Acad. Sci. U.S.A. 120, e2217144120 (2023).
23. Kehe J., et al., Positive interactions are common among culturable bacteria. Sci. Adv. 7, eabi7159 (2021).
24. Hu J., Amor D. R., Barbier M., Bunin G., Gore J., Emergent phases of ecological diversity and dynamics mapped in microcosms. Science 378, 85–89 (2022).
25. Shetty S. A., et al., Inter-species metabolic interactions in an in-vitro minimal human gut microbiome of core bacteria. npj Biofilms Microbiomes 8, 21 (2022).
26. Weiss A. S., et al., In vitro interaction network of a synthetic gut bacterial community. ISME J. 16, 1095–1109 (2022).
27. Coyte K. Z., Schluter J., Foster K. R., The ecology of the microbiome: Networks, competition, and stability. Science 350, 663–666 (2015).
28. Butler S., O’Dwyer J. P., Stability criteria for complex microbial communities. Nat. Commun. 9, 2970 (2018).
29. Lozupone C. A., Stombaugh J. I., Gordon J. I., Jansson J. K., Knight R., Diversity, stability and resilience of the human gut microbiota. Nature 489, 220–230 (2012).
30. Relman D. A., The human microbiome: Ecosystem resilience and health. Nutr. Rev. 70, 8 (2012).
31. de Dios Caballero J., et al., Individual patterns of complexity in cystic fibrosis lung microbiota, including predator bacteria, over a 1-year period. mBio 8, e00959-17 (2017).
32. Newell P. D., Douglas A. E., Interspecies interactions determine the impact of the gut microbiota on nutrient allocation in Drosophila melanogaster. Appl. Env. Microbiol. 80, 788–796 (2014).
33. Khanna S., Raffals L. E., The microbiome in Crohn’s disease: Role in pathogenesis and role of microbiome replacement therapies. Gastroenterol. Clin. N. Am. 46, 481–492 (2017).
34. Frank D. N., et al., Molecular-phylogenetic characterization of microbial community imbalances in human inflammatory bowel diseases. Proc. Natl. Acad. Sci. U.S.A. 104, 13780–13785 (2007).
35. Palmer J. D., Foster K. R., Bacterial species rarely work together. Science 376, 581–582 (2022).
36. Xiao Y., et al., Mapping the ecological networks of microbial communities. Nat. Commun. 8, 2042 (2017).
37. Angulo M. T., Moog C. H., Liu Y.-Y., A theoretical framework for controlling complex microbial communities. Nat. Commun. 10, 1045 (2019).
38. Matchado M. S., et al., Network analysis methods for studying microbial communities: A mini review. Comput. Struc. Biotech. J. 19, 2687–2698 (2021).
39. Pinto S., Benincà E., van Nes E. H., Scheffer M., Bogaards J. A., Species abundance correlations carry limited information about microbial network interactions. PLoS Comput. Biol. 18, e1010491 (2022).
40. Ho P.-Y., Good B. H., Huang K. C., Competition for fluctuating resources reproduces statistics of species abundance over time across wide-ranging microbiotas. eLife 11, e75168 (2022).
41. Descheemaeker L., Grilli J., de Buyl S., Heavy-tailed abundance distributions from stochastic Lotka-Volterra models. Phys. Rev. E 104, 034404 (2021).
42. Grilli J., Barabás G., Michalska-Smith M. J., Allesina S., Higher-order interactions stabilize dynamics in competitive network models. Nature 548, 210–213 (2017).
43. May R., Will a large complex system be stable? Nature 238, 413–414 (1972).
44. Allesina S., Tang S., Stability criteria for complex ecosystems. Nature 483, 205 (2012).
45. Mitchell A. L., et al., EBI metagenomics in 2017: Enriching the analysis of microbial communities, from sequence reads to assemblies. Nucleic Acids Res. 46, D726–D735 (2018).
46. Busiello D. M., Suweis S., Hidalgo J., Maritan A., Explorability and the origin of network sparsity in living systems. Sci. Rep. 7, 12323 (2017).
47. D’Souza G., et al., Ecology and evolution of metabolic cross-feeding interactions in bacteria. Nat. Prod. Rep. 35, 455–488 (2018).
48. Goldford J. E., et al., Emergent simplicity in microbial community assembly. Science 361, 469–474 (2018).
49. Mougi A., The roles of amensalistic and commensalistic interactions in large ecological network stability. Sci. Rep. 6, 29929 (2016).
50. Gibson T. E., et al., Intrinsic instability of the dysbiotic microbiome revealed through dynamical systems inference at scale. bioRxiv [Preprint] (2021). https://www.biorxiv.org/content/10.1101/2021.12.14.469105v1 (Accessed 14 December 2023).
51. Joseph T. A., Shenhav L., Xavier J. B., Halperin E., Pe’er I., Compositional Lotka-Volterra describes microbial dynamics in the simplex. PLoS Comput. Biol. 16, e1007917 (2020).
52. Dedrick S., Warrier V., Lemon K. P., Momeni B., When does a Lotka-Volterra model represent microbial interactions? Insights from in vitro nasal bacterial communities. mSystems 8, e00757-22 (2023).
53. Bairey E., Kelsic E. D., Kishony R., High-order species interactions shape ecosystem diversity. Nat. Commun. 7, 12285 (2016).
54. Ludington W. B., Higher-order microbiome interactions and how to find them. Trends Microbiol. 30, 618–621 (2022).
55. George A. B., O’Dwyer J., Universal abundance fluctuations across microbial communities, tropical forests, and urban populations. Proc. Natl. Acad. Sci. U.S.A. 120, e2215832120 (2023).
56. Lim J. J., et al., Growth phase estimation for abundant bacterial populations sampled longitudinally from human stool metagenomes. Nat. Commun. 14, 5682 (2023).
57. McGill B. J., et al., Species abundance distributions: Moving beyond single prediction theories to integration within an ecological framework. Ecol. Lett. 10, 995–1015 (2007).
58. Wilson W. G., et al., Biodiversity and species interactions: Extending Lotka-Volterra community theory. Ecol. Lett. 6, 944–952 (2003).
59. Yonatan Y., Amit G., Friedman J., Bashan A., Complexity–stability trade-off in empirical microbial ecosystems. Nat. Ecol. Evol. 6, 693–700 (2022).
60. Toral R., Colet P., Stochastic Numerical Methods (Wiley-VCH, Weinheim, Germany, 2014).
61. Virtanen P., et al., SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
