Skip to main content
PLOS ONE logoLink to PLOS ONE
. 2017 Mar 13;12(3):e0173765. doi: 10.1371/journal.pone.0173765

Inferring interactions in complex microbial communities from nucleotide sequence data and environmental parameters

Yu Shang 1,*, Johannes Sikorski 1,6, Michael Bonkowski 2, Anna-Maria Fiore-Donno 2, Ellen Kandeler 3, Sven Marhan 3, Runa S Boeddinghaus 3, Emily F Solly 4, Marion Schrumpf 4, Ingo Schöning 4, Tesfaye Wubet 5,6, Francois Buscot 5,6, Jörg Overmann 1,6
Editor: Hauke Smidt7
PMCID: PMC5348048  PMID: 28288199

Abstract

Interactions occur between two or more organisms affecting each other. Interactions are decisive for the ecology of the organisms. Without direct experimental evidence the analysis of interactions is difficult. Correlation analyses that are based on co-occurrences are often used to approximate interaction. Here, we present a new mathematical model to estimate the interaction strengths between taxa, based on changes in their relative abundances across environmental gradients.

Introduction

The composition of microbial communities is a key driver of ecological processes [13]. Changes in the abundances of species can occur in response to abiotic selection pressures, neutral assembly processes and be affected by organismal interactions. Biotic interactions are multifarious (competition, mutualism, commensalism, amensalism, and antagonism including parasitism and predation) and may have positive, negative, or neutral consequences for either one or both interacting partners. In most of the cases, outcomes of interactions between two species may be asymmetric in terms of abundance, e.g. predators will negatively affect the abundance of prey, but prey consumption will increase the abundance of the predator. As another example, a strong competitor will decrease the abundance of a neighboring species, but the latter may have no net effect on the former. Moreover, the strength of the influence can also differ, e.g. a preferred prey receives stronger top-down control than a general prey. Conversely, the prey could exert only a weak positive influence on the predator if it is of minor food quality.

While biodiversity studies of higher organisms have yielded large sets of multitrophic data and enabled a comprehensive analysis of interactions [4], the peculiarities of microbial communities render the detection and study of interactions very challenging. Whereas potential pairwise interactions among microorganisms can be studied directly in the laboratory, the determination of interactions in large and complex biotic communities under natural conditions is limited. Soil ecosystems in particular harbor extremely diverse microbial communities. The diversity of bacteria may reach 104 species [5] and average cell numbers of 1010 per gram of soil [6]. This results in highly complex networks of coexisting microbes [7]. Secondly, only a minority (0.1%–0.001%) of the microbial diversity has been cultivated to date [8], which precludes experimental analysis of the majority of interactions in the laboratory. Thirdly, the heterogeneous structure of the soil habitat at the microscale represents an additional methodological and conceptual challenge. Due to the complexity of soil structure, microbial cells typically occur in a non-random fashion in clusters with cell-to-cell distances of only 1-10 μm, which is the distance at which interactions between microbes is assumed to take place by either direct cell to cell contact or by diffusion limit of chemical substances. However, at this spatial scale, it is impossible to study organismal interactions simply by passive observations as in macroorganisms. In contrast, most studies of microbes in complex habitats typically are conducted through destructive sampling methodology, precluding the consecutive analysis of the same microbial community over time, which would be a prerequisite to applying the established discrete-time Lotka-Volterra models [9]. As a result, the lack of knowledge on species interactions has remained one of the major shortfalls in understanding the drivers of ecosystem functions [10]

A major step forward in analyzing the composition of microbial communities in the environment has been the advent of high-throughput next-generation sequencing technology for microbial DNA or RNA extracted from the environment. This enables the determination of the relative abundance of many taxa per sample and the analysis of co-occurrence and correlation patterns, which have been suggested as proxies for species interactions [7, 11]. However, the latter approach may only reflect similar responses of different species towards environmental pressures rather than direct interaction and cannot resolve interactions that are asymmetric. Also, several important properties of interactions can not be captured by correlations. Firstly, interactions may be asymmetric such that taxon A may influence B negatively but B influences A positively. In contrast, the correlation of A with B is the same as B with A, hence symmetric. Secondly, the direction of interaction may be different, e.g., A influences B positively, but under some circumstances, A may influence B negatively. In contrast, the concept of correlation analysis requires that the abundance relationship of two taxa remains consistent throughout: if A increases, B always either increases too (positive correlation) or decreases (negative correlation).

Fisher et al. [9] established the LIMITS algorithm to infer the interaction among microbial species from the time series data based on the discrete time Lotka-Volterra Model. Bucci et al. [12] also suggested utilizing the generalized Lotka-Volterra equation with time-dependent perturbation to analyse the interaction from the time series data. However, both of these approaches can not be used for series of cross-sectional data which originate from samples along gradients in environmental parameters but do not have a temporal dimension. Therefore, here, we present a novel method which is based on generalized Lotka-Volterra models and allows to quantify interactions from high-throughput microbial sequence data derived from cross-sectional samples for which also larger datasets on environmental parameters are available. This method does not rely on data obtained during different time points, since the strength and direction of interactions between partners are also a function of gradients in abiotic environmental parameters, for example of temperature [13], nutrient conditions [14] or other factors such as soil moisture, mineral content, pH or osmolarity [1517]. Biswas et al. [18] developed a Poisson-multivariate normal hierarchical model to analyse interactions, taking into account also the environmental gradients. In this work, the environmental parameters are included into the parameters of the Poisson function which is used to express the species abundance distribution. But, this calculation still represents a type of correlation analysis. Hence, the interaction matrix is symmetric and can not account for asymmetric interactions. On the contrary, our method is fundamentally different from correlation analysis and can analyse asymmetric interactions. Sugihara et al. [19] developed a causality test (CCM) to infer the causal link between the time-series variables which are not correlated. CCM analysed locally the historical time trajectory of the different variables and measured the extent to which the historical record of one variable can reliably estimate states of another variable. CCM can infer whether these variables are strongly linked to each other without consideration of environmental gradients. However, CCM can not account for the asymmetry property in the interaction relationship. In contrast, our approach can address this issue in the sense of changing environmental gradients but does not take into account the time series data. The basic assumptions of our model are that (i) interactions lead to changes of abundances within individual species, and that (ii) the abundances of a species are a function of the species abundances of the remaining community as well as of environmental parameters. We first describe the theoretical foundation of our framework and then suggest a numerical implementation calculating the direction and strength of interactions between two pairs of taxa, which depend on the quality criteria assigned to the determination of environmental parameter values and relative taxon abundances retrieved from high-throughput sequences. We further present methods to calculate single representative interaction values and to determine their robustness by appropriate statistical tools such as random sampling. Finally, we discuss the potentials of our approach.

Methods

The aim of our approach is to determine the interaction coefficient βij, which quantifies the interaction strength of species j on species i. We firstly present the theoretical basis for our approach. Secondly, depending on the precision quality of the input data, we suggest numerical calculation approaches estimating βijαk for a given environmental parameter α and a given sample k. The result is a two-dimensional matrix of numerical βijαk values with m (m is the number of environmental parameters) rows and N (N is the number of samples) columns for each pair of species j having interaction influence on species i. Thirdly, we addressed the issue about data structure and the precision of the calculation. We fourthly present several strategies to summarize the βijαk values into a global βij value. Finally, we present an estimate for the robustness of βij based on random sampling and the addition of numerical noise to the input data.

Theoretical deductions

Species abundances change as a response to changing environmental conditions and abundances of interacting organisms. Therefore the information on the direction and strength of biotic interactions must be stored in the change of species abundances and hence can be extracted from that.

The basic idea of this methods is to analyze the influence from other interacting species abundance on the rate of change of specific species abundance in the sense of environmental gradients.

The Fig 1 shows the detailed conceptual idea of this approach and the sketch on the analysis workflow of obtaining βij interaction values from values of relative abundances of taxa and environmental parameters obtained from a set of sampling sites. The change of species abundance Ai and Aj with respect to one environmental parameter θ are presented as a curve in subplots (a) and (e), respectively. In subplot(a), the rate of change pi of Ai with respect to Θ was determined by calculating the slope of the tangent vector on each curve point. It reflects the influence of environmental parameter Θ on the change of abundance Ai across the environmental gradients. On the red triangular point, the slope is positive, means the rate change at this point (corresponding to one θ value) is positive, the abundance Ai has the increasing tendency at that gradient value of Θ. Similarly, the orange triangular point has a negative rate of change reflecting the decreasing tendency of Ai under these (larger) values of the gradient of Θ. Based on the calculated pi for each values of θ and the abundance curve of Aj, the relationship between pi and Aj is shown in subplot (b) (the red triangular has higher abundance values of Aj than the orange triangular). Then, the influence of Aj on the change of pi can be denoted by the tangent vector (light green triangular and dark green triangular are the two example points). The slope of this new tangent vector stand for the rate of change of pi with respect to Aj, and it is denoted as βij. βij is the interaction influence from species j on species i in the sense of θ. The relationship of βij and Aj is presented in subplot (c). Due to the relationship of Aj and θ in subplot(b), the relationship between βij and θ is shown in subplot (d). The subplot (d) inform us the change of interaction influence βij in different environmental conditions which are corresponding to the different value of θ. Similarly, we can do the same rate change analysis for Aj the corresponding results are shown in subplots (f), (g) and (h).

Fig 1. The sketch on the analysis workflow.

Fig 1

By analyzing the rate of change of species abundance, pi, and the change of pi with respect to other species, we can deduce the interaction influence between them. From this figure, βij is positive, suggesting that the species j has a positive interaction influence on species i. Conversely, βji is negative, suggesting that the species i has a negative interaction influence on species j. Although one pair of species shares the same interaction relationship, the effect of the interaction on both of them could be different, not only in the direction but also on the strength. Moreover, based on the subplot (a) and (e), there is no clear correlation between Ai and Aj. This indicates that the correlation is not equal to interaction, hence, the analysis of correlation between the species abundances is not suitable to infer the interaction relationship between them. The parameter βij is chosen in analogy to the Lotka-Volterra equation. The details are explained as follows.

Assume that the abundance Ai of species i is the smooth function of interacting species Aj(ji) and the environmental parameters Θα, α = 1, 2, ⋯, m (in case of soil, for example, soil moisture, pH, nutrient contents; where changes in time are available, even time t can be used). In a two species interaction system, the change in abundance of both species in response to the change of environmental parameters and biotic interactions are

dAidΘ=Si(Ai)+Iij(AiAj)dAjdΘ=Sj(Aj)+Iji(AjAi) (1)

where Si can be treated as the solitary part of species i, i.e. change of Ai independent of any influence from other species, it is also influenced by the environmental parameter. The derivative dAidΘ is the rate of change of Ai with respect to the change of values of Θ. Iij is the influence from species j on species i, which is a function of Ai, Aj, and also Θ. In different environmental conditions, Iij will be different. Accordingly, the interaction can be analysed based on the gradient of Θ and will demonstrate how the interaction levels change across the different environmental conditions. Note that this can be asymmetric, i.e., IijIji. Thus, the effects of the interaction between species i and j could be different with respect to the rate of change of Ai and of Aj. Note also that the exact mathematical form of Si, Sj is often unknown due to lack of suitable experimental data, but can be approximated to follow the Monod equation or logistic equation [2022]. To analyze the interaction, Iij and Iji need to be resolved. In order to calculate the change of dAidΘ with respect to Aj we remove the unknown part Si, as follows

d(AidΘ)dAj=dIij(Ai,Aj)dAj (2)

The interaction information is contained on the right-hand side of the equation. For calculating the interaction numerically, we need the concrete mathematical expression of Iij. For simplification, we assume:

Iij=βijAiAj (3)

Because Ai is a multivariate function of Θα, the rate of change of Ai with respect to Θα which is also a multivariate function of Θα can be expressed by using the partial derivative:

piα=AiΘα (4)

As the change of p is affected by βij, the information of βij is stored in the change of p.

With the approximation of Eq (3), βij can then be estimated as:

βijα=piαAj1Ai (5)

We define the interaction level βij as the rate of change of p with respect to the abundance Aj of species j. Thus, the interaction level βij will be the smooth functions of species abundances Ai, Aj and the environmental gradients stored in Θα.

The above concept of using changes in species abundance for the calculation of interaction values is analogous to time-dependent generalized Lotka-Volterra equations (predator-prey equations):

dA1dt=r1A1(1-A1K1+β12A2K1)dA2dt=r2A2(1-A2K2+β21A1K2) (6)

Here, the parameters r1, r2 are the growth rate, K1, K2 are the carrying capacity of the system [22]. Comparison of Eq (6) to Eq (1) demonstrates that: Θ is equivalent to the time parameter t, and I12=r1K1β12A1A2. Incorporation of r1K1 into β12 yields the:

I12=β12*A1A2 (7)

By using Eq (5), we can estimate β12*=d(A1dt)dA21A1, which represents the estimation of the interaction level from the species abundance change in the Lotka-Valterra equation.

Numerical determination of βijαk values

In microbial ecology, absolute abundances of individual cells can usually not be determined for all taxa at all taxonomic hierarchy levels. With high-throughput sequence data, the abundance of a given taxon in sample k is actually given as a relative abundance value, which is the number of sequences reads assigned to that taxon among all sequence reads in the respective sample k. The determination of the relative abundance value of a specific taxon by high-throughput sequencing is not error-free. Small but uncontrollable variations in nucleic acid extractions, cDNA synthesis (in case RNA is extracted), amplicon primer ligation, and sequencing runs on high-throughput sequencers add uncertainty to the estimated relative abundance value. In the case of abundant taxa, typically at class or phylum level, the uncertainty may encompass just a 1% to 10% error level [23]. However, for less abundant taxa at the level of genera or species (defined by 97% similarity of the 16S rRNA gene [24]), the error could be much larger (two–fold, own unpublished data).

Similarly, the determination of physicochemical environmental parameters from soil such as pH, soil moisture, carbon and nitrogen content, is accompanied by uncertainty errors mostly due to soil heterogeneity which may also be in the range of 1% to 15% (own unpublished data).

We refer to data with assumed low (1-10%) experimental error in the estimation of numerical input data, and describe how to numerically calculate AiΘα and piαAj from a data set derived from different samples using the Taylor expansion [25].

If the samples are denoted by using the index k = 1, 2, ⋯ N, we denote Aik, Θαk as the abundance of species Ai and environmental parameter Θα in sample k, respectively. The rate of change of Ai with respect to Θα in sample k is defined as the partial derivative:

piαk=AikΘαk (8)

and the interaction level βij as the rate of change of piαk with respect to species j abundance Ajk according to

βijαk=piαkAjk1Aik (9)

βijαk represents the interaction value characterizing the interaction influence of species j on species i accompanying the change in environmental parameter Θα in sample k. This allows analyzing the interaction of species j on species i for different environmental parameters, and its change across different environmental conditions.

Note that for this part of the analysis, the numerical calculation of the partial derivative AikΘαk and piαkAjk normally requires to fix the values of the environmental parameters Θα to be the same in the other samples as in sample k. This mathematical requirement can not be fulfilled as real world samples differ typically at the same time in both species abundances and values of environmental parameters. We, therefore, make use of the Taylor expansion of multivariate functions to obtain the accurate numerical calculation when both environmental parameter values α and species abundances A change across samples k simultaneously. As addressed above, Aik is a multivariate function of environmental parameters {Θα} and other interacting species Ajk.

Between two different samples, the abundance of species i can be expressed as Aik(Θk) and Ail(Θl). Here, Θk and Θl are the corresponding environmental parameters in sample k and sample l. Using the Taylor expansion, the difference between Aik(Θk) and Ail(Θl) can be expressed as

Ail=r1=0rd=0s1=0sm=0(A1l-A1k)r1(Adl-Adk)rd(Θ1l-Θ1k)s1(Θml-Θmk)smr1!rd!s1!sm!·(r1++rd+s1++smAilA1r1AdrdΘ1s1Θmsm)(A1k,,Adk,Θ1k,,Θmk) (10)

Using only the linear part of the approximation simplifies the formula as follows:

Ail(Θl)-Aik(Θk)=α(Θαl-Θαk)Aik(Θ)Θαk+ji(Ajl-Ajk)Aik(Θ)Ajk=α(Θαl-Θαk)piαk+ji(Ajl-Ajk)pijk (11)

The first part addresses the influence from the change of environmental parameters whereas the second part addresses the influence from the change of other species abundances. We fix the sample k, and let the sample l run across all remaining samples, in order to then estimate piαk from linear regression. Actually, the term pijk, which addresses the rate of change of Ai with respect to Aj in sample k, can also be estimated as the by-product.

Because pijk is not necessary in the following βijαk calculation, we reduce the model in Eq (11) to

Ail=s1=0sm=0(Θ1l-Θ1k)s1(Θml-Θmk)sms1!sm!·(s1++smAilΘ1s1Θmsm)(Θ1k,,Θmk) (12)

and then address the linear part of the approximation by

Ail(Θl)-Aik(Θk)=α(Θαl-Θαk)Aik(Θ)Θαk=α(Θαl-Θαk)piαk (13)

in order to estimate piαk from the linear regression using Eq (13).

After fixing the value of piαk, βijαk is calculated using the same strategy. Because piαk is the multivariate function of Ajk,j=1,2,,n and the environmental parameters Θα, the difference between pik(Θk) and pil(Θl) can be also expressed using the full version of Taylor expansion.

piαl=r1=0rd=0s1=0sm=0(A1l-A1k)r1(Adl-Adk)rd(Θ1l-Θ1k)s1(Θml-Θmk)smr1!rd!s1!sm!·(r1++rd+s1++smpiαlA1r1AdrdΘ1s1Θmsm)(A1k,,Adk,Θ1k,,Θmk) (14)

Hence, analogous to Eq (11), the linear part to describe βijαk based on two samples k and l is given by:

piαl-piαk=j(Ajl-Ajk)piαkAjk+α(Θαl-Θαk)piαkΘαk (15)

here, the first part in Eq (14) addresses the change of other species abundances, whereas the second part addresses the change of environmental parameters. Analogous to Eq (13), the linear regression can be used to estimate piαkAjk. Based on the Eq (9), the values of βijαk are calculated as

βijαk=piαkAjk1Aik (16)

The terms piαkΘαk can be estimated also from Eq (15). These by-products address the influence from the environmental parameter change on the change of piαk. Although they do not provide information on the interaction between species i and j, they may provide valuable information of the interaction value βijαk.

In case that piαkΘαk is of no interest, the model Eq (14) can be reduced to a simplified version by making use of only the first part in each Eqs (14) and (15) to then calculate βijαk using Eq (16). This simplified version represents a different model of interaction calculation.

All the numerical calculations above are based on the linear part of the Taylor expansion. In order to cope with potential nonlinear properties of the data, it is also possible to include the higher order terms in the linear regression model. For example, when the second order terms are added into the Eq (13)

Ail(Θl)-Aik(Θk)=α(Θαl-Θαk)Aik(Θ)Θαk+12αβ(Θαl-Θαk)(Θβl-Θβk)2Aik(Θ)ΘαkΘβk (17)

the resulting Eq (17) will allow performing numerical calculations by additionally including the nonlinear part. However, the numerical calculations will substantially increase in complexity.

The models introduced allow different levels of precisions (Eqs (10)–(17)). The user can choose any of these based on the defined preferences, e.g., numerical precision of the input data or also the availability of computational power.

Data structure and precision of calculation

Data sparsity is the first issue which needs to be taken into account. In Eq (16), term Aik is in the denominator. If the species has not been observed and hence has the abundance value 0 in one sample, this species cannot be included in the mathematical treatment. Therefore, abundance values of 0 have to be removed before interaction analysis.

The second important issue is the data type of abundance Ai. For several organismic groups such as most plants and animals absolute abundance values Ai can be determined for each taxon. In contrast, a taxon-wise determination of absolute abundances is technically hardly feasible for bacterial microbes. Microbial communities are typically assessed by metagenomic high-throughput sequence data which yield relative abundances of taxa. However, since the overall cell numbers of microorganism in many cases do not change to a large extent, changes in absolute abundances are expected to be less pronounced than those in relative composition.

The structure of relative abundance data is characterized by an intrinsic compositional effect. This could produce misleading results in correlation analysis and would not reflect the true correlation as would have been the case for absolute abundance values [2631]. As the sum of relative species abundance values by definition is constrained to 1, these values are not independent of each other. Fluctuations in the relative abundance of one species have an effect on the relative abundance of the rest of the community without that the rest of the community may actually have changed in absolute abundance. For example, if the abundance of a dominant species (e.g. 95% of all species) changes, relative abundances of all other species vary in the opposite direction. This creates artificial negative correlations with the dominant species which would not be the case with absolute abundances. Compositional effects may be severe in some data sets but mild in others. For example, compositional effects are most pronounced in communities with low species richness and/or pronounced dominance structure. The α diversity (of the samples in question is, therefore, a good predictor of the strength of compositional effects [28]. To decrease the influence of compositional effects, a series of methods based on the log-transformed techniques was developed [2629].

Our interaction approach is not only conceptually but also mathematically fundamentally different from a correlation analysis. For example, whereas correlation aims to maximize the recovery of covariance cov(Ai, Aj), our interaction approach aims to derive the correct partial derivative of abundance Aj with respect to the environmental parameters Θ and other species Aj. Thus, compositional effects have to be treated differently, as we show in detail below.

First, we discuss the relationship between the absolute abundance and the relative abundance, and the difference in results when we did the interaction analysis on both types of abundance data.

In the formalism of the interaction analysis, the terms Aik(Θ)Θαk, Aik(Θ)Ajk play the very important parts in the whole calculation. For generality, we suppose yi as the absolute abundance of species i, xi as the relative abundance. We need to deduce the relationship between yiΘα yiyj and xiΘα xixj.

The relationship between xi and yi is

xi=yiαyα (18)

calculate the derivative on both sides of Eq (18), we have

xi=yiαyα-yiαyα(αyα)2 (19)

Therefore, we have

xiΘ=yiΘαyα-yiαyαΘ(αyα)2=yiΘαyα-yiαyαΘ(αyα)2 (20)
xixj=yiαyα-yiαyαyjαyα-yjαyα=(αyα-yi)yiyj-yjαiyαyjαyα-yjyiyj-yjαiyαyj (21)

When ∑α yα > > yα, diversity of the species is high. The term yiαyαΘαyα approaches zero. Eq (19) reduce to xiΘyiΘαyα. This approximation reveals that the effect of the environmental parameter Θ on the rate of change of relative abundance is different from the corresponding rate of change of absolute abundance by a factor 1αyα. This factor has a positive value, and will keep the sign of xixj yiyj the same. Similarly, Eq (21) reduce to xixjyiyj. This approximation means that the effect of the species j on the rate of change of relative abundance of species i is roughly the same as the corresponding rate of change of species absolute abundance of the species. Therefore, the interaction analysis based on the relative abundance can be roughly approximate to the corresponding analysis based on the absolute abundance. However, in the case of a lower species diversity, the upper approximation will not exist anymore, and the relation between the relative abundance and absolute abundance will become complex. Since in most cases microbial communities are highly diverse, our interaction analysis is expected to yield reasonable results. Even for lower diversity, interaction analysis of relative abundance data can yield useful data for understanding ecosystem properties.

Actually, the compositional effect has its basis in the non-independence of the relative abundance. In order to decrease the effect of non-independence, a robust algorithm is needed for the numerical calculations. The precision and robustness of our numerical calculation depend on the linear regression estimates (Eqs (11)–(13), etc.). The least squares estimation (function stats::lm() in the R language) is not a robust algorithm since it is very sensitive to the initial input data. Therefore, we applied the more robust maximal likelihood estimation instead (R function MASS::rlm()). However, this algorithm has some requirements: input data should not have singularity, i.e. no linear relationship (collinearity) among the columns of the input data matrix [32, 33]. Therefore, the test for singularity on both relative abundance data and environmental parameters data needs to be performed before regression analysis. If the input data have collinearity, the algorithm will remove one species or one environmental parameter randomly, and repeat the test for singularity in the new data sets until all the collinearity relationships are removed. This pretreatment not only improves the robust numerical calculation but also decreases the compositional effects.

Another issue related to the precision of calculation is the relationship between the samples number and the number of variables. Sample number should be larger than the number of variables to avoid indeterminate equations or overfitting. Other suggestions to avoid overfitting which are not used in our methods are discussed in [3436].

Summarizing βijαk into the global interaction level βij

The interaction level βijαk has four indexes i, j, α, k, which refer to a specific pair of taxa i, j, a specific environmental parameter α and a specific sample k. For each pair of species j with interaction influence on species i, there is a two-dimensional matrix of numerical βijαk values with α rows and k columns. In either row or column, the values can be either positive, negative or zero. Positive values indicate a positive influence, negative values indicate a negative influence. Values may be non-normal distributed including extreme outlier values. To summarize these results into a more global interaction level value βij between species i and species j, we suggest the following different methods which can be chosen based on user preference.

Prior to any summarizing approach, users may decide to give different weight to βijαk values for different environmental parameters, based on some prior knowledge about Θα. We estimate βijk by performing a linear combination of βijαk across all the environmental parameters:

βijk=α(Cα×βijαk) (22)

where Cα is the applied weight of a given environmental parameter α. Prior knowledge on Cα can be obtained from, e.g. multivariate statistics. A Redundancy Analysis (RDA) allows determining those environmental parameters which significantly contribute to an observed community composition. The eigenvalue for each Θα in RDA analysis can be used as Cα weight. The derived weighted βijαk values can be summarized in the same way as the original βijαk values using the methods suggested below.

The most straightforward way is to summarize βijαk estimates by standard summarizing statistics (mean, median, maximum, minimum). This approach retains the strength and direction of the interaction.

It is not recommendable, to sum up to βijαk values, as positive and negative values could equal out each other resulting in a rather low βij value. In case the user is interested in a sum value, the standard norm definition could be applied. Note that this is possible only at the expense of losing information about the direction of interaction, as only positive values will be obtained. For example,

βijk=(βijαk)=α(βijαk)2 (23)

represents a summed βijαk interaction level between species i and species j across all environmental parameters α at sample k. Using the same formula as in Eq (23), several other quantities could be determined, e.g., βij would represent a summed βijk across all samples k. Similarly, βik represents the summed βijk across all cases where ji. Finally, βi represents the summed βik, which is the global influence of interaction from all the other species on species i across all the samples k.

Neither of the summarizing statistics addressed above captures the center of βijαk values for a given environmental parameter α across all samples k appropriately and might therefore not yield the necessary insight into the interaction structure. A curve fitting approach including bootstrapping on βijαk values with subsequent peak value extraction, as implemented in the eHOF R package [37] would be appropriate but could be computationally demanding with increasing taxa and environmental parameter numbers. As a compromise, we extract the median of those values which are represented in the peak from a βijαk density distribution. The peak values, one per environmental parameter α for each species pair ij or ji and denoted therefore as Mα, could be summarized using the above standard summarizing statistics but would also suffer from the same shortcomings.

We, therefore, propose a custom approach which focuses on the dominant patterns of direction and strength of βijαk values. The result will be a conservative estimate of direction and strength of βij values reflecting the dominant interactions between species i and species j. This procedure is based on two steps and may involve several user-based definitions of applied threshold values.

Firstly, the direction of interaction by categorizing βijαk values is determined for each α across all samples k as being either positive or negative. In case the majority (we use 80%) of all βijαk of a given α belongs to either category, the direction of interaction is classified either as positive or negative, respectively. In case that no preponderance can be identified, the given α does not contribute to a global βij determination and hence is ignored in the further analysis. The above peak determination approach yields a set of Mα values along with robust assignment of direction (either positive or negative).

Secondly, the set of Mα values per each taxon pair ij or ji is used to yield a global interaction value βij or βji, respectively. Note that Mα values are characterized by a direction (positive or negative) and by a certain strength (magnitude of the numerical value). Depending on the type of distribution of both direction and strength of value, two different ways for further evaluation can be taken into account. In case that the majority (we use 80%) of Mα values can be assigned to either direction, the respective Mα values are summarized by determining the median value, which represents then the global βij across all α parameter and k samples and is additionally characterized by a specific direction (positive or negative). Note that based on users interest, any other majority threshold value and summarizing statistic such as mean, minimum, or maximum can also be taken into account. However, in case that no preponderance can be identified, a decision based on the above majority rule on direction can not be taken and will be replaced by a decision based on the magnitude of Mα values. We determine for each group of positive or negative Mα values the respective median values Mα+ and Mα values. In case the ratio of absolute values of Mα+ or Mα value is larger than two, the direction of the interaction is assumed to be represented by the larger M value (either Mα+ or Mα), with the respective M value being the global βij across all α parameter and k samples. If Mα+ and Mα have comparable absolute values and also have equal proportions of either positive or negative direction, it is concluded that it is not possible to determine a global βij across all α parameter and k samples for the specific species pair of i and j.

In sum, we have suggested several workflows summarizing βijαk values into a global interaction value βij that also contains the information on the direction (positive or negative interaction). Note that the choice of methods and choice of settings of several threshold values for a decision on intermediate steps is dependent on user preferences.

Robustness estimation of βij values

The magnitude and sign of the βij value depend on variables of the experimental data, such as variability of sampling site choice and the reliability (degree of precision) with which numerical values such as relative abundances of taxa or environmental parameter were determined [38]. We, therefore, implemented several methods to explore the robustness of the βij value (strength and direction) with respect to sampling site choice and numerical precision uncertainties in determining relative abundances and values of environmental parameter. Firstly, the effect of samples, which include values for both the environmental parameter and the relative abundances of taxa, is accessed by random sampling on soil samples. Secondly, we add numerical noise to either the original data of relative abundances or the environmental parameter values by randomly adding or subtracting error terms using formula

Θαk˜=Θαk(1+ϵ) (24)

where Θαk is the original environmental parameters matrix, Θαk˜ is the perturbed data matrix, and ϵ is the error term. Values for ϵ are generated as follows:

ϵ=U(-1,1)e (25)

where U(−1, 1) is the uniform distribution in the range [−1, 1] and e is the error level. The height of the error level, given as the proportion of the original value, e.g. 0.01% to 50%, can be defined by the user.

Similarly, random perturbations can be added to the numerical values of the species abundances. Values of βij obtained in repeated runs of data perturbation are then summarized using monovariate statistics (mean, 95% confidence interval, null hypothesis testing).

Application and results

We tested our method on datasets obtained from grassland soils of the German Biodiversity Exploratories (http://www.biodiversity-exploratories.de; [39]). The sampling plots were located in three regions in Germany: Schorfheide-Chorin (Schorfheide Exploratory; SE) in Brandenburg, national park Hainich-Dün (HE) in Thuringia, and biosphere reserve Schwäbische Alb (AE) in Baden-Württemberg. In every region, 50 grassland sites with different land-use intensities were investigated in the year 2011, resulting in a total of 150 plots analyzed in this study. At each plot aboveground plant parts in grasslands were removed before fourteen soil cores (diameter, 5 cm) were taken from the upper 15 cm of the A horizon from a 20 × 20 m subarea. The 14 samples were combined, homogenized and 10g of the homogenized soil were frozen immediately in liquid nitrogen and stored until nucleic acid extraction for the determination of relative abundances of prokaryotic (RNA extraction) and fungal and protist (DNA extraction) communities by high-throughput sequencing.

The original test data encompasses 14 environmental parameters which were determined for each sieved (2 mm) soil sample. These are pH, soil moisture (%), nitrate (NO3-), ammonium (NH4+), mineral nitrogen (N), microbial C [all values as μg soil−1], organic and inorganic carbon (C) [all values as mgg soil−1], fine root biomass [gcm−3], total N and C in roots (%), C/N ratio in soil, microbial C/N ratio, and C/N ratio in roots. The test for singularity (QR decomposition [40]) finds a rank of the environmental parameters matrix of 13 due to the collinearity between NO3- and NH4+. This means that either NO3- or NH4+ should be removed. Therefore, the interaction analysis is done with only a set of 13 environmental parameters after removing NH4+. Details on nucleic acid extractions, high-throughput sequencing, taxonomic classification, and determination of physicochemical soil parameters have been published elsewhere [4145].

Calculations were performed for 17 taxonomic groups at the level of phylum or classes which represent abundant groups. The estimation of their relative abundances is generally of high precision (typically less than 5% deviation [23]). βijαk and βjiαk of each pair of species i and j were determined for each soil sample k and for each environmental parameter α.

Overall, 150 samples, 17 taxonomic groups, and 13 environmental parameters were included in the data analysis. This data structure satisfies the requirement of sample size which is discussed in section. We scaled the species abundance and environmental parameters data (variance equals 1, uncentralized) to avoid the problem of large difference scale in different variables.

The degree of interaction changes with the gradient of the environmental conditions

The Fig 2 exemplifies the change of βijαk estimates across the environmental gradient of three soil parameters for the interaction influence of acidobacterial subgroup Gp3 on acidobacterial subgroup Gp1. Y-axis range is the same for all three subfigures.

Fig 2. The distribution of βijαk along the environmental gradient.

Fig 2

Whereas for pH and organic carbon a strong positive interaction is predicted, soil moisture suggests a weak negative impact. Notably, the strength of interaction varies along the gradient of the environmental parameter and increases substantially above a pH value of 6.5. In contrast, the interaction strength along gradients of organic carbon and soil moisture appears to be larger at rather low values of organic carbon and soil moisture respectively. In sum, βijαk values may vary substantially across the gradient of environmental parameters.

Comparison between the global interaction matrix and the correlation matrix

The βijαk and βjiαk values were then summarized by global βij and βji values, respectively, to estimate (a) the direction of interaction, which can be either positive or negative, and (b) the strength of interaction. In addition, the global interaction values were compared to results of standard co-occurrence analyses calculated by means of a Spearman rank correlation matrix Cij based on relative abundances of the taxa. Both sets of results were also visualized as networks which displayed the dominant patterns (for values -0.1 > βij > 0.1; -0.4 > ρSpearman > 0.4) (Fig 3).

Fig 3. The comparison between the correlation and interaction analysis.

Fig 3

Fig 3 provides the heatmaps (a,b) and network figures (c,d) of Spearman rank correlation matrix (a,c) and interaction matrix (b,d). Taxa depicted on the x-axis (columns) have interaction influence on taxa on the y-axis (rows). The red and blue colors indicate the negative and positive effects, respectively. For example, acidobacterial subgroup Gp3 has a high positive interaction influence on acidobacterial subgroup Gp1, whereas the bacterial Planctomycetacia have a strong negative interaction influence on protist group of Myxomycetes. Note that the heatmap color code is the same for both βij and Spearman ρ values. In the network figures, low Spearman ρ and low βij are not shown (but displayed in the heatmap), whereas the remaining values were artificially grouped into different categories (see color legend networks). Note that the interaction network displays also the direction of interaction by arrows, whereas correlation analysis does not enable any statement of directionality.

Important characteristics of interaction estimates and their difference to co-occurrence estimates are highlighted in Fig 3.

Firstly, taxa involved in multiple co-occurrences are not necessarily involved in corresponding interactions and vice versa. For example, acidobacterial subgroups Gp3, Gp5, Gp3 and also Actinobacteria share numerous co-occurrences with other taxa but are far less involved in interactions. Similarly, Myxomycetes and acidobacterial subgroup Gp1 both show numerous interactions to other taxa, but are less, if at all, involved in correlations.

Secondly, in case that two taxa are characterized by both strong correlation and interaction, it is not possible to predict from the type of correlation on the direction of interaction and vice versa. For example, both acidobacterial subgroup Gp3 and the Chlamydiae show strong positive correlations with each other and with acidobacterial subgroup Gp1. A strong positive interaction is observed only from Gp3 to Gp1, whereas the interactions of Chlamydiae on Gp1 are weakly negative and on Gp3 only very weak (βij = 0.04).

Thirdly, whereas co-occurrences within the same taxon are always positive at ρ = 1 (see heatmap, but not depicted in the network), overall interactions of taxa with itself can be both negative and positive. For example, Myxomycetes and Chlamydiae appear to have a negative interaction on themselves, whereas acidobacterial subgroup Gp6 and Plancomycetacia appear to have a positive influence on its own. Mathematically, this can be explained by analogy to the species self-effect in logistic equations, in which the rate of change of species abundance has also an influence in itself. In other words, this interaction value can be treated as the leading order of the solitary part in Eq (1). When the rate of change pi decreases with the increase of its abundance Ai, the interaction influence from itself will be negative. In the converse situation, the influence will be positive. As a biological interpretation, taxa negatively interacting with each other (as implied here by negative βij values) have reached the carrying capacity within their ecological niche. Alternatively, these results could be a consequence of hierarchically nested taxa that are strongly interacting with each other, resulting in a cumulative positive or negative interaction of the higher level taxon on its self (e.g. Chlamydia).

Finally, it appears as if taxa are preferentially either exerting or experiencing interaction influence. For example, Myxomycetes share a lot of mostly negative interactions with other taxa, however, in all cases Myxomycetes are being influenced by others but are not exerting influence on others. Antibiotics production of bacteria could be a likely explanation [46]. The same is true for acidobacterial subgroup Gp1, which is, mostly positively, under interaction influence by other taxa. Only few taxa appear to both, experience as well as exert effects through influence (Verrumicrobia, Acidobacteria subgroup Gp5).

Estimation of robustness on the interaction influence calculation

In order to estimate the robustness of βij with respect to the numerical imprecision of the input data, we performed several perturbation assays. For this, we chose six examples of global βij interaction values from the Fig 3 which are representative of different strengths of interaction values with both a positive or a negative direction. Following the distribution of βij shown in the interaction heatmap of Fig 3, we tested larger, median, and low βij values for their robustness on data perturbations. The effect of variation in sample composition on βij is evaluated by 1000 iterations of randomly sampling 90% of the samples without replacement. We refrain from using the classical bootstrapping (sampling with replacement), as the deviation term (Eq (11)) will turn zero for twice or more of subsampled data and hence will be of no informative value in the downstream regression analysis (Eq (11)).

The effect of either numerical precision of environmental parameter values or relative abundances of taxa was evaluated by randomly adding or subtracting error terms (0.01%, 0.1%, 5%, 10%, 20% and 50%) to the original values. The effect of both numerical precision of environmental parameter values and relative abundances of taxa was evaluated by randomly adding or subtracting error terms (5%, 20%) to the original values.

Each 1000 iterations were performed for each error term and data type. We analyzed the data by means of comparison of 95% confidence interval, which provide information on effect sizes additionally to null hypothesis significance testing [47]. The robustness of βij estimations at different levels of data perturbation. are presented in Fig 4.

Fig 4. The test of robustness.

Fig 4

The plot (a) presents the distribution of βij as shown in the interaction heatmap in Fig 3. The plot (b) shows the robustness estimations on exemplary positive (upper row) and negative (lower row) βij values of decreasing strength (from left to right) taken from the interaction heatmap in Fig 3. The respective interactions from taxon j on i are listed as abbreviations in the panel header (Gp1 and Gp3: acidobacterial subgroups Gp1 and Gp3; Basid: Basidiomycetes; Myxomy: Myxomycetes; Planct: Planctomycetacia; gProt: γ-Proteobacteria; Sphing: Sphingobacteria). Only the strong interactions (left panels in (b) are depicted in the interaction network in Fig 3). The black horizontal line indicates the original βij values. Dots and vertical lines represent mean and 95% confidence interval bars from 1000 iterations of each type of data perturbation (see color legend). Horizontal dashed lines separate perturbations on environmental parameter values, relative taxon abundances, and sampling sites. Very small 95% CI values are are not visible as they are covered by the size of the point estimate dot (mean value of 1000 iterations).

Note, however, that in all cases where the 95% CI bar did not cross the zero line, p was < 0.01 in a two-sided one-sample t-test.

Typically, the 95% CIs are very small, suggesting the algorithm for numerical calculations to be robust. However, with increasing error level (from 0.01% to 10%) the 95% CIs become larger. This can be explained by the accumulated error in the numerical calculation and the nonlinear structure of the data. In our model, we use the linear part of the Taylor expansion as the approximation, and the numerical calculation is based on the linear regression. At small error level, the Taylor expansion can be reliably estimated by its linear part. At increasing error level, the potentially nonlinear structure of the data will become more relevant and therefore may generate increasing uncertainty in the estimation. Principally, this issue could be solved by extending the Taylor expansion to higher orders to take into account the nonlinear structure of the data.

The majority of βij values was very small for both positive and negative directions (plot a in Fig 4). This is the result of our conservative custom approach for βij summarizing, which is based on the peak values of βijαk density distributions and which is typically close to zero. However, Fig 2 indicates that individual βijαk values can be considerably larger than 1.

There is a substantial effect of increasing error term size on the reduction of the original βij, which appears to be much larger than the effect on the increase of 95% CI intervals with increasing error term (figure b in Fig 3). This finding is independent of direction and strength of the original βij value and suggests that conclusions on direction and strength of interactions, especially in comparison of different pairs of taxa to each other, appear to be stable in the light of moderate error rates (up to 10%). The overall effect of error term size on data perturbations is larger for environmental parameter values than for values for the relative abundances of taxa. As a result, at larger error rates of environmental parameter values, the direction of interaction may change, suggesting that biological interpretation of very low βij should be treated with caution.

βij resulting from random sampling on soil samples are at comparable levels to βij resulting from 5% to 10% error term data perturbations on relative abundance values. Obviously, variation in the composition of samples does not change the estimates and the algorithm remains robust.

Discussion

Our first application of the methods developed in the present study to real-world data allowed us to identify several biotic interactions that are likely to shape soil microbial communities but were previously not recognized. Notably, the novel approach can be used to resolve the full spectrum from intra-specific, over intra-phylum, to inter-domain interactions in complex microbial communities.

In our approach, environmental parameters which are not quantified in a study but which are nevertheless relevant for the interaction of species i on j will affect the results and hence the inference of biotic interactions. By comparison, lack of quantitative information on environmental parameters does not affect the results of co-occurrence patterns, where the correlation value of species i and j will stay the same irrespectively of any environmental parameter that has been determined or not. Yet, the relevance of unknown parameters that might control the correlation instead of direct biotic interactions will not be revealed by co-occurrence analysis. Increasing the number of organismic groups and environmental parameters in our type of analysis ultimately will require some data reduction approaches. The choice of which taxa and which environmental parameter should be included requires independent knowledge about their potential relevance. Such knowledge could be retrieved from, e.g., multivariate statistics.

The novel approach to estimate the strength and direction of biological interactions among taxa provides several advantages.

Firstly, our approach does not require repeated measurements at different time points. This is in contrast to approaches based on the discrete-time Lotka–Volterra model which requires concrete differential equation models [9, 4850]. Often, these interactions are analysed within a predator-prey framework. However, the sampling of microbial communities is typically conducted in a destructive manner, which renders reproducible sampling of heterogeneous soil environment rather difficult or even impossible [43]. Instead, many soil microbial studies use cross-sectional format by taking multiple samples from different sites in parallel at the same time [5153]. Our approach is far more general than the discrete-time Lotka–Volterra models employs derivations from the multivariate Taylor expansion function, and therefore allows to assess interactions using comparative cross-sectional datasets.

Secondly, our approach allows analysing interactions between two species i and j in the presence of other taxa x, y, z which may affect the interaction of species i on j [54, 55]. Analysing interactions between species i and j within a more complex community has already been addressed in discrete-time Lotka-Volterra models [9]. In our model, the user can deliberately remove species x, y, z to analyse the effect of their presence or absence on the interaction of species i on j or identify whether interactions appear stable despite varying community compositions. As an example, we observed that Acidobacteria subgroup Gp1 is apparently influenced by several other taxa (Fig 3). Using slightly different community compositions and slightly different sets of the environmental parameter, we observed that (i) Acidobacteria subgroup Gp1 remains being influenced by numerous other groups and that (ii) the observation of a strong positive interaction of Acidobacteria subgroup Gp3 on Gp1 remains (data not shown). The ecological function of Acidobacteria subgroup Gp1 is still largely unknown, but its involvement in several rather strong interactions suggests that they represent a keystone group of bacteria.

Thirdly, we can address interaction in the light of qualitatively and quantitatively different environmental parameters. Analogous to studying the effect of species composition, the user will be able to study the effect of a specific environmental parameter by removing it from the data set or by testing different combinations. More detailed analyses could be undertaken by determining βij separately for different ranges of environmental parameter values, e.g. at low versus high values of either pH, soil moisture, or land use intensities.

Fourthly, the results from our interaction calculations serve not only for biological interpretations but can be further used in statistical or modeling approaches. Quantities such as (piαk), (βijαk), (βijk), (βij), (βik) and (βi) could be used in multivariate statistics or to construct dynamical equations within the framework of system stability analysis [5658].

Finally, whereas in co-occurrence analysis it is not possible to distinguish between effects of true interaction and effects of similar response to environmental parameters without actually interacting, our approach enables to address separately the effects of biotic interactions and the abiotic response to the environmental parameter by using Eq (14).

At present, our model does not incorporate specific assumption about the mathematical expression of Si, Iij in Eq (1). In the future work, the Monod equation or logistic functions could be included to update our model to a concrete form. The numerical calculation strategies to do the interaction estimation would not necessarily be affected by this.

Based on the quantification of the strength of interaction and the prediction of its direction that is provided by our new approach, the underlying mechanisms of interaction will have to be determined by complementary experimental approaches. For example, a strong negative interaction could be exerted via direct predation [55], antibiotics [59], or other types of chemical warfare such as volatiles [60]. Here, strong βij that link taxa which previously were not under suspect to interact under natural conditions could serve as models for future investigations of the interaction mechanisms. This will require the availability of cultured isolates, however.

Supporting information

S1 File. R code and data sets.

“Abundscale.Rdata” is the original species relative abundance data. “Parascale.Rdata” is the original environmental parameters data. Due to the long computation time on the original data, two shorten data sets “Abundscale_short.RData” and “Parascale_short.RData” are also provided for the fast test. “test_git.R” is the code for the analysis workflow on both of original data and the shorten data. “InteractionAnalysis_git.R” contains all the developed functions which are used in “test_git.R”.

(7Z)

Acknowledgments

We thank the managers of the three Exploratories, Kirsten Reichel-Jung, Swen Renner, Katrin Hartwich, Sonja Gockel, Kerstin Wiesner, and Martin Gorke for their work in maintaining the plot and project infrastructure; Christiane Fischer and Simone Pfeiffer for giving support through the central office, Michael Owonibi for managing the central data base, and Markus Fischer, Eduard Linsenmair, Dominik Hessenmöller, Jens Nieschulze, Daniel Prati, Ernst-Detlef Schulze, Wolfgang W. Weisser and the late Elisabeth Kalko for their role in setting up the Biodiversity Exploratories project. We thank Doreen Berner for her assistance during sampling and laboratory analyses.

The work has been (partly) funded by the DFG Priority Program 1374 “Infrastructure-Biodiversity-Exploratories” (Grants No. OV 20/21-1, 22-1). Field work permits were issued by the responsible state environmental offices of Baden-Württemberg, Thüringen, and Brandenburg (according to §72 BbgNatSchG).

Data Availability

My code and test data are in the github website https://github.com/amssshangyu/interaction_analysis.

Funding Statement

The work has been (partly) funded by the Deutsche Forschungsgemeinschaft (DFG) Priority Program 1374 “Infrastructure-Biodiversity-Exploratories” (Grants No. OV 20/21-1, 22-1 http://gepris.dfg.de/gepris/projekt/252263987). J.O. received the funding. Field work permits were issued by the responsible state environmental offices of Baden-Württemberg, Thüringen, and Brandenburg (according to §72 BbgNatSchG).

References

  • 1. Bodelier P, Meima-Franke M, Hordijk C, et al. Microbial minorities modulate methane consumption through niche partitioning. ISME. 2013;J7:2214–2228. 10.1038/ismej.2013.99 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Levine U, Teal T, Robertson G, et al. Agriculture’s impact on microbial diversity and associated fluxes of carbon dioxide and methane. ISME. 2011;J5:1683–1691. 10.1038/ismej.2011.40 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Strickland M, Lauber C, Fierer N, et al. Testing the functional significance of microbial community composition. Ecology. 2009;90:441–451. 10.1890/08-0296.1 [DOI] [PubMed] [Google Scholar]
  • 4. Scherber C, Eisenhauer N, Weisser WW, et al. Bottom-up effects of plant diversity on multitrophic interactions in a biodiversity experiment. Nature. 2010;468(7323):553–556. 10.1038/nature09492 [DOI] [PubMed] [Google Scholar]
  • 5. Roesch LFW, Fulthorpe RR, Riva A, et al. Pyrosequencing enumerates and contrasts soil microbial diversity. ISME J. 2007;1(4):283–290. http://www.nature.com/ismej/journal/v1/n4/suppinfo/ismej200753s1.html 10.1038/ismej.2007.53 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Torsvik V, Ovreas L. Microbial diversity and function in soil: from genes to ecosystems. Current Opinion in Microbiology. 2002;5(3):240–245. 10.1016/S1369-5274(02)00324-7 [DOI] [PubMed] [Google Scholar]
  • 7. Chaffron S, Rehrauer H, Pernthaler J, et al. A global network of coexisting microbes from environmental and whole-genome sequence data. Genome Research. 2010;7(9):947–959. 10.1101/gr.104521.109 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Overmann J. 7. In: Principles of enrichment, isolation, cultivation, and preservation of prokaryotes. Springer; Berlin Heidelberg; 2013. p. 149–207. Available from: 10.1007/978-3-642-30194-0_7 [DOI] [Google Scholar]
  • 9. K FC, Pankaj M. Identifying Keystone Species in the Human Gut Microbiome from Metagenomic Timeseries Using Sparse Linear Regression. PLoS ONE. 2014;9(7):e102451 10.1371/journal.pone.0102451 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Hortal J, Bello Fd, Diniz-Filho JAF, et al. Seven Shortfalls that Beset Large-Scale Knowledge of Biodiversity. Annual Review of Ecology, Evolution, and Systematics. 2015;46(1):523–549. 10.1146/annurev-ecolsys-112414-054400 [DOI] [Google Scholar]
  • 11. Faust K, Raes J. Microbial interactions: from networks to models. Nat Rev Microbiol. 2012;10(8):538–550. 10.1038/nrmicro2832 [DOI] [PubMed] [Google Scholar]
  • 12. Bucci V, Xavier JB. Towards Predictive Models of the Human Gut Microbiome. Journal of Molecular Biology. 2014;426(23):3907–3916. 10.1016/j.jmb.2014.03.017 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Tveit AT, Urich T, Frenzel P, et al. Metabolic and trophic interactions modulate methane production by Arctic peat microbiota in response to warming. Proceedings of the National Academy of Sciences. 2015;112(19):E2507–E2516. 10.1073/pnas.1420797112 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Kuzyakov Y, Blagodatskaya E. Microbial hotspots and hot moments in soil: Concept and review. Soil Biology and Biochemistry. 2015;83(0):184–199. 10.1016/j.soilbio.2015.01.025 [DOI] [Google Scholar]
  • 15. Nadell CD, Xavier JB, Levin SA, et al. The evolution of quorum sensing in bacterial biofilms. PLoS Biology. 2008;6(1):e14 10.1371/journal.pbio.0060014 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. West SA, Griffin AS, Gardner A, et al. Social evolution theory for microorganisms. Nature Reviews Microbiology. 2006;4(8):597–607. 10.1038/nrmicro1461 [DOI] [PubMed] [Google Scholar]
  • 17. Williams P, Winzer K, Chan W, et al. Look who’s talking: communication and quorum sensing in the bacterial world. Philos Trans R Soc London Ser B. 2007;362(1483):1119–1134. 10.1098/rstb.2007.2039 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Biswas S, McDonald M, Lundberg DS, et al. In: Learning Microbial Interaction Networks from Metagenomic Count Data. Cham: Springer International Publishing; 2015. p. 32–43. Available from: 10.1007/978-3-319-16706-0_6 [DOI] [PubMed] [Google Scholar]
  • 19. Sugihara G, May R, Ye H, Hsieh Ch, Deyle E, Fogarty M, et al. Detecting Causality in Complex Ecosystems. Science. 2012;338(6106):496–500. 10.1126/science.1227079 [DOI] [PubMed] [Google Scholar]
  • 20. Merchuk JC, Asenjo JA. The Monod equation and mass transfer. Biotechnology and Bioengineering. 1995;45(1):91–94. 10.1002/bit.260450113 [DOI] [PubMed] [Google Scholar]
  • 21. Monod J. The Growth of Bacterial Cultures. Annual Review of Microbiology. 1949;3(1):371–394. 10.1146/annurev.mi.03.100149.002103 [DOI] [Google Scholar]
  • 22. Murray JD. Mathematical Biology: I. An introduction, Third Edition Springer; 2002. [Google Scholar]
  • 23. Uroz S, Buee M, Murat C, et al. Pyrosequencing reveals a contrasted bacterial diversity between oak rhizosphere and surrounding soil. Environ Microbiol Rep. 2010;2(2):281–288. 10.1111/j.1758-2229.2009.00117.x [DOI] [PubMed] [Google Scholar]
  • 24. Rossello-Mora R, Amann R. The species concept for prokaryotes. FEMS Microbiol Rev. 2001;25:39–67. 10.1111/j.1574-6976.2001.tb00571.x [DOI] [PubMed] [Google Scholar]
  • 25. Hazewinkel M. “Calculus”, Encyclopedia of Mathematics. Springer; 2001. [Google Scholar]
  • 26. Ban Y, An L, Jiang H. Investigating microbial co-occurrence patterns based on metagenomic compositional data. Bioinformatics. 2015;31(20). 10.1093/bioinformatics/btv364 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Fang H, Huang C, Zhao H, Deng M. CCLasso: Correlation Inference for Compositional Data through Lasso. Bioinformatics. 2015; 10.1093/bioinformatics/btv349 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Friedman J, Alm EJ. Inferring Correlation Networks from Genomic Survey Data. PLoS Comput Biol. 2012;8(9):1–11. 10.1371/journal.pcbi.1002687 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Kurtz ZD, Müller CL, Miraldi ER, Littman DR, Blaser MJ, Bonneau RA. Sparse and Compositionally Robust Inference of Microbial Ecological Networks. PLoS Comput Biol. 2015;11(5):1–25. 10.1371/journal.pcbi.1004226 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Pearson K. On a form of spurious correlation which may arise when indices are used in the measurement of organs. Proc R Soc London. 1897;60:489–502. 10.1098/rspl.1896.0076 [DOI] [Google Scholar]
  • 31. Weiss S, Treuren VW, Lozupone C, et al. Correlation detection strategies in microbial data sets vary widely in sensitivity and precision. The ISME Journal. 2016;10:1669–1981. 10.1038/ismej.2015.235 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Ellis SP. Singularity and outliers in linear regression with application to least squares, least squares linear regression. Metron—International Journal of Statistics. 2000;LVIII(1-2):121–129. [Google Scholar]
  • 33. Huber PJ, Ronchetti EM. Robust Statistics. Wiley; 2009. [Google Scholar]
  • 34. Draper NR. Applied regression analysis. New York: Wiley; 1998. [Google Scholar]
  • 35. Everitt BS. Cambridge Dictionary of Statistics. Cambridge University Press; 2002. [Google Scholar]
  • 36. Hawkins DM. The problem of overfitting. Journal of chemical information and computer sciences. 2004;44(1):1–12. 10.1021/ci0342472 [DOI] [PubMed] [Google Scholar]
  • 37. Jansen F, Oksanen J. How to model species responses along ecological gradients–Huisman–Olff–Fresco models revisited. Journal of Vegetation Science. 2013;24(6):1108–1117. 10.1111/jvs.12050 [DOI] [Google Scholar]
  • 38. Nayfach S, Pollard KS. Toward Accurate and Quantitative Comparative Metagenomicss. Cell. 2016;166(5):1103–1116. 10.1016/j.cell.2016.08.007 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Fischer M, Bossdorf O, Gockel S, et al. Implementing large-scale and long-term functional biodiversity research: The Biodiversity Exploratories. Basic and Applied Ecology. 2010;11(6):473–485. 10.1016/j.baae.2010.07.009 [DOI] [Google Scholar]
  • 40. Golub GH, Van Loan CF. Matrix Computations (3rd Ed.). Baltimore, MD, USA: Johns Hopkins University Press; 1996. [Google Scholar]
  • 41. Fiore-Donno AM, Weinert J, Wubet T, et al. Metacommunity analysis of amoeboid protists in grassland soils. Scientific Reports. 2016;6:19068 10.1038/srep19068 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Goldmann K, Schöning I, Buscot F, et al. Forest management type influences diversity and community composition of soil fungi across temperate forest ecosystems. Frontiers in Microbiology. 2015;6 10.3389/fmicb.2015.01300 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Regan KM, Nunan N, Boeddinghaus RS, et al. Seasonal controls on grassland microbial biogeography: Are they governed by plants, abiotic properties or both? Soil Biology and Biochemistry. 2014;71(0):21–30. 10.1016/j.soilbio.2013.12.024 [DOI] [Google Scholar]
  • 44. Solly EF, Schöning I, Boch S, et al. Factors controlling decomposition rates of fine root litter in temperate forests and grasslands. Plant and Soil. 2014;382(1):203–218. 10.1007/s11104-014-2151-4 [DOI] [Google Scholar]
  • 45.Wüst PK, Nacke H, Kaiser K, et al. Estimates of the bacterial ribosome content and diversity in soils are significantly affected by different nucleic acid extraction methods. Applied and Environmental Microbiology. 2016; [DOI] [PMC free article] [PubMed]
  • 46. Jousset A, Scheu S, Bonkowski M. Secondary metabolite production facilitates establishment of rhizobacteria by reducing both protozoan predation and the competitive effects of indigenous bacteria. Functional Ecology. 2008;22 10.1111/j.1365-2435.2008.01411.x [DOI] [Google Scholar]
  • 47. Halsey LG, Curran-Everett D, Vowler SL, et al. The fickle P value generates irreproducible results. Nat Meth. 2015;12:179–185. 10.1038/nmeth.3288 [DOI] [PubMed] [Google Scholar]
  • 48. Ives AR, Dennis B, Cottingham KL, et al. ESTIMATING COMMUNITY STABILITY AND ECOLOGICAL INTERACTIONS FROM TIME-SERIES DATA. Ecological Monographs. 2003;73 10.1890/0012-9615(2003)073[0301:ECSAEI]2.0.CO;2 [DOI] [Google Scholar]
  • 49.J HD. Estimating species interactions from observational data with Markov networks. bioRxiv. 2015;
  • 50. Power DA, Watson RA, Szathmáry E, et al. What can ecosystems learn? Expanding evolutionary ecology with learning theory. Biology direct. 2015;10 10.1186/s13062-015-0094-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. Crowther TW, Maynard DS, Leff JW, et al. Predicting the responsiveness of soil biodiversity to deforestation: a cross-biome study. Global Change Biology. 2014;20(9):2983–2994. 10.1111/gcb.12565 [DOI] [PubMed] [Google Scholar]
  • 52. Fierer N, Leff JW, Adams BJ, et al. Cross-biome metagenomic analyses of soil microbial communities and their functional attributes. Proc Natl Acad Sci U S A. 2012;109(52):21390–21395. 10.1073/pnas.1215210110 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Lozupone CA, Knight R. Global patterns in bacterial diversity. Proceedings of the National Academy of Sciences of the United States of America. 2007;104(27):11436–11440. 10.1073/pnas.0611525104 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Stenseth NC, Falck W, Bjørnstad ON, et al. Population regulation in snowshoe hare and Canadian lynx: Asymmetric food web configurations between hare and lynx. Proceedings of the National Academy of Sciences. 1997;94(10):5147–5152. 10.1073/pnas.94.10.5147 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55. Hoppener-Ogawa S, Leveau JHJ, van Veen JA, et al. Mycophagous growth of Collimonas bacteria in natural soils, impact on fungal biomass turnover and interactions with mycophagous Trichoderma fungi. ISME J. 2008;3(2):190–198. http://www.nature.com/ismej/journal/v3/n2/suppinfo/ismej200897s1.html 10.1038/ismej.2008.97 [DOI] [PubMed] [Google Scholar]
  • 56. Legendre P, Legendre L. Numerical Ecology. ELSEVIER; 2012. [Google Scholar]
  • 57. Daniel B, Francois G, Legendre P. Numerical Ecology with R. Springer; 2011. [Google Scholar]
  • 58. Coyte KZ, Schluter J, Foster KR. The ecology of the microbiome: Networks, competition, and stability. Science. 2015;350 10.1126/science.aad2602 [DOI] [PubMed] [Google Scholar]
  • 59. Ling LL, Schneider T, Peoples AJ, et al. A new antibiotic kills pathogens without detectable resistance. Nature. 2015;517:455–459. 10.1038/nature14098 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60. Schmidt R, Cordovez V, de Boer W, et al. Volatile affairs in microbial interactions. ISME J. 2015;9(11):2329–2335. 10.1038/ismej.2015.42 [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 File. R code and data sets.

“Abundscale.Rdata” is the original species relative abundance data. “Parascale.Rdata” is the original environmental parameters data. Due to the long computation time on the original data, two shorten data sets “Abundscale_short.RData” and “Parascale_short.RData” are also provided for the fast test. “test_git.R” is the code for the analysis workflow on both of original data and the shorten data. “InteractionAnalysis_git.R” contains all the developed functions which are used in “test_git.R”.

(7Z)

Data Availability Statement

My code and test data are in the github website https://github.com/amssshangyu/interaction_analysis.


Articles from PLoS ONE are provided here courtesy of PLOS

RESOURCES