Skip to main content
PLOS Computational Biology logoLink to PLOS Computational Biology
. 2025 Nov 7;21(11):e1013691. doi: 10.1371/journal.pcbi.1013691

Quantifying microbial interactions based on compositional data using an iterative approach for solving generalized Lotka-Volterra equations

Yue Huang 1, Tianqi Tang 2, Xiaowu Dai 3, Fengzhu Sun 1,*
Editor: Shanfeng Zhu4
PMCID: PMC12626337  PMID: 41202104

Abstract

Understanding microbial interactions is fundamental for exploring population dynamics, particularly in microbial communities where interactions affect stability and host health. Generalized Lotka-Volterra (gLV) models have been widely used to investigate system dynamics but depend on absolute abundance data, which are often unavailable in microbiome studies. To address this limitation, we introduce an iterative Lotka-Volterra (iLV) model, a novel framework tailored for compositional data that leverages relative abundances and iterative refinements for parameter estimation. The iLV model features two key innovations: an adaptation of the gLV framework to compositional constraints and an iterative optimization strategy combining linear approximations with nonlinear refinements to enhance parameter estimation accuracy. Using simulations and real-world datasets, we demonstrate that iLV surpasses existing methodologies, such as the compositional LV (cLV) and the generalized LV (gLV) model, in recovering interaction coefficients and predicting species trajectories under varying noise levels and temporal resolutions. Applications to the lynx-hare predator-prey, Stylonychia pustula-P. caudatum mixed culture, and cheese microbial systems revealed consistency between predicted and observed relative abundances showcasing its accuracy and robustness. In summary, the iLV model bridges theoretical gLV models and practical compositional data analysis, offering a robust framework to infer microbial interactions and predict community dynamics using relative abundance data, with significant potential for advancing microbial research.

Author summary

Microbes and animals often live in communities where species interact in complex ways. Understanding these interactions is essential for predicting population dynamics and how ecosystems respond to environmental changes. A widely used mathematical tool for modeling these interactions is the generalized Lotka-Volterra model. However, it requires information on absolute population sizes, which is rarely available in microbiome studies. We developed the iterative Lotka-Volterra (iLV) model to overcome this challenge. iLV adapts the classical framework to work directly with relative abundance data—the kind most often collected in microbiome studies. Through an iterative optimization process, iLV improves the accuracy of inferred species interactions and population trajectories. We validated our approach using simulated data, along with real-world systems including the snowshoe hare–Canadian lynx system, a microbial co-culture experiment, and a cheese microbial community. In each case, iLV provided more accurate results than existing methods. Our work offers a practical framework for studying species interactions using relative abundance data.

Introduction

Quantifying microbial interactions is essential for understanding the dynamics of microbial communities. This task has gained increased attention in microbiome research, as microbial interactions directly impact community stability and host health [13]. The Lotka-Volterra (LV) modeling framework was originally developed to describe predator-prey systems [4], based on which the generalized Lotka-Volterra (gLV) was proposed to model systems of more than two species. gLV has been widely adopted to infer these interactions due to its versatility in modeling nonlinear dynamics across diverse ecosystems, from microbial communities to macroscopic populations [57]. The generalized Lotka-Volterra (gLV) model remains a key tool for describing microbial interactions and forecasting community trajectories. Applications range from predicting microbiome responses to antibiotics and dietary shifts to understanding microbial coexistence in ecological contexts [8,9].

However, the widespread use of sequencing technologies in microbial studies introduces a unique challenge: most data are compositional, providing relative rather than absolute abundance profiles [10,11]. This poses significant limitations for traditional gLV models, which rely on absolute abundance inputs for accurate parameterization and prediction [12,13]. To address this problem, compositional adaptations of gLV models have been developed, such as the compositional Lotka-Volterra (cLV) framework, which maps dynamics onto a constrained simplex, accommodating the summation constraint of relative abundances [13].

These previous advancements have highlighted the potential of compositional data analysis, but they also have some major limitations, such as moderate accuracy due to their nature of linear approximations. For cLV, the coefficients in the gLV model cannot be fully recovered [13], making it harder to interpret the results. Gaps remain in integrating theoretical and empirical approaches for compositional data. To bridge the gap, we proposed a novel method (called iterative Lotka-Volterra or iLV) of compositional data analysis based on the traditional gLV model, emphasizing the compositional constraints and their implications for model parameterization. There are two major innovative components of iLV, including defining the classical generalized Lotka-Volterra model with relative abundances and inter-species sum of absolute abundances, and iterative linear approximations followed by non-linear optimizations. By leveraging computational simulations, we compared the performance of iLV with other existing methods including generalized Lotka–Volterra (gLV) and compositional Lotka-Volterra (cLV). We also applied iLV in three real-world datasets to estimate the coefficients of gLV model and accurately recover community relative abundance trajectories.

Results

Performance and stability of different non-linear optimization methods

When applying the iLV algorithm to parameter estimation, the choice of non-linear optimization methods in Subroutine 2 can influence both accuracy and numerical stability, particularly when the problem is ill-conditioned. Fig 1 presents an example where instability occurred for different non-linear optimization methods. We simulated data points using the following parameter setting that reflects periodic oscillations among three species: r1=0.31, r2=0.6, r3=0.29, b12=0.01, b13=0.011, b21=0.009, b23=0.01, b31=0.012, b32=0.015, x1(0)=0.3, x2(0)=0.5, x3(0)=0.2, Nsum(0)=100, with a time length of 20 and a time interval of 0.4. We executed the iLV Algorithm 20 times using the simulated data and estimated the parameters with three different non-linear optimization methods: leastsq(), least_squares(method = ‘lm’), and least_squares(method=’trf’). Nsum_initial_guess was set to 200. As shown in Fig 1, all three methods exhibited instabilities potentially due to rounding errors and the ill-conditioned nature of this dataset, and leastsq() achieved the lowest trajectory RMSE of 9.29×109. Therefore, to mitigate the effects of numerical instabilities and the variable performance of optimization methods across different datasets, we compared the trajectory RMSE returned by least_squares (method = “trf” or method = “lm”) and leastsq() in Subroutine 2, and retained the results with the lowest RMSE. Additionally, during benchmarking, we repeated the algorithm 20 times and reported the parameter that yielded the lowest RMSE among these 20 runs.

Fig 1. Trajectory Root Mean Square Error (RMSE) distributions of three optimization methods: (A) SQ refers to leastsq(), (B) LM refers to least_squares(method = ‘lm’), and (C) TRF refers to least_squares(method=’trf’).

Fig 1

Data points were simulated using the following parameter setting that reflects periodic oscillations among three species: 𝐫1=0.31, 𝐫2=0.6, 𝐫3=0.29, 𝐛12=0.01, 𝐛13=0.011, 𝐛21=0.009, 𝐛23=0.01, 𝐛31=0.012, 𝐛32=0.015, 𝐱1(0)=0.3, 𝐱2(0)=0.5, 𝐱3(0)=0.2, 𝐍sum(0)=100, with a time length of 20 and a time interval of 0.4. All three methods exhibited instabilities for this dataset, while leastsq() achieved the lowest trajectory RMSE of 9.29×109. The mean RMSEs across 20 runs for leastsq(), least_squares(method = ’lm’), and least_squares(method = ’trf’) were 0.0687, 0.135, and 0.103, respectively, while their corresponding median RMSEs were 0.0386, 0.0958, and 0.0945.

Two subroutines of the iLV Algorithm jointly improve parameter estimation

An accurate initial guess of parameters is crucial for the success of optimization functions like leastsq() or least_squares() in finding an optimal local minimum of the cost function. Subroutine 1 (the iterative subroutine) of the iLV Algorithm provides an iterative approach to generate such initial guesses effectively. To illustrate the impact of the iterative subroutine, we generated simulated data using the following parameter settings: r1=0.31, r2=0.6, r3=0.29, b12=0.01, b13=0.011, b21=0.009, b23=0.01, b31=0.012, b32=0.015, x1(0)=0.3, x2(0)=0.5, x3(0)=0.2, Nsum(0)=100, time range = 20, time interval = 1. Nsum_initial_guess was set to 200. Fig 2A shows the trajectory RMSE as a function of the number of iterations in Subroutine 1 (the iterative subroutine). The figure indicates that Subroutine 1 iteratively refined the initial guess, with the optimal guess achieved in the 13th iteration out of 100. Fig 2B and 2C illustrate how Subroutine 1 substantially improved the fit between predicted and observed relative abundances. Without applying Subroutine 1, using the parameter estimations derived from the gLV linear approximation as the starting point of Subroutine 2 (the least square estimation subroutine) resulted in poor optimization performance, with an RMSE of 0.122 for leastsq() (Fig 2C). In contrast, using the parameters from the iteration with the lowest trajectory RMSE within the first 100 iterations significantly improved optimization performance, yielding an RMSE of 2.11×1010 with leastsq() (Fig 2B). Under this parameter setting, least_squares(method = ‘lm’ or ‘trf’) did not perform as well as leastsq(). Notably, the iterative subroutine guarantees non-increasing trajectory RMSE values, as the first iteration corresponds to the result without applying the iterative subroutine, and only the iteration with the lowest trajectory RMSE (e.g., out of the first 100 iterations) is selected as the starting point of the least square estimation subroutine.

Fig 2. Two subroutines of the iLV Algorithm jointly improve parameter estimation.

Fig 2

Panel A depicts the trajectory RMSE values across 100 iterations of the iterative subroutine in the iLV Algorithm, illustrating its ability to iteratively identify parameter estimations with the lowest RMSE within the first 100 runs. The last three panels compare the predicted relative abundances (solid lines) to observed values (symbols) for each species, with both Subroutines (Panel B), without Subroutine 1 (Panel C), and without Subroutine 2 (Panel D), respectively, demonstrating significantly improved alignment after joint effects of both Subroutines in the iLV Algorithm.

Subroutine 2 helps to improve parameter estimation as leastsq() and least_squares() functions find a local minimum of the cost function F(x), near the initial guess returned by Subroutine 1. As shown in Fig 2D, the trajectory RMSE was 0.169 when only Subroutine 1 was applied without Subroutine 2.

Benchmarking with existing methods

Previously, Joseph et al. [13] developed a nonlinear dynamical system for modeling microbial dynamics using relative abundances, termed the “compositional” Lotka-Volterra (cLV) model, which unifies methodologies based on generalized Lotka-Volterra (gLV) equations with approaches from compositional data analysis. However, cLV cannot fully recover the original interaction coefficients in the generalized Lotka-Volterra model and it also assumes Nsum stays roughly constant [13]. There could also be other ways of dealing with relative abundance input, such as applying the generalized Lotka–Volterra (gLV) to relative abundance input directly (we referred to it as gLV_relative). Besides, we used Pearson or Spearman correlation coefficients between the relative abundance data of two microbial species i and j to assess the corresponding interaction coefficients bij of the generalized Lotka–Volterra model, as two other benchmarked methods. We also applied the gLV model to absolute abundance data for parameter estimation as a baseline for comparison.

To evaluate the performance of these methods, we generated simulated data using the generalized Lotka–Volterra model, represented by the ODE system (4), under two biologically relevant parameter settings without self-interactions. The first parameter setting (r1=0.31, r2=0.6, r3=0.29, b11= b22=b33= 0, b12=0.01, b13=0.011, b21=0.009, b23=0.01, b31=0.012, b32=0.015, x1(0)=0.3, x2(0)=0.5, x3(0)=0.2, Nsum(0)=100) reflects periodic oscillations among three species, which are similar to patterns observed in the lynx-hare dynamics [14]. The second parameter setting (r1=0.21, r2=0.4, r3=0.19, b11= b22=b33= 0, b12=0.02, b13=0.016, b21=0.01, b23=0.014, b31=0.017, b32=0.02, x1(0)=0.3, x2(0)=0.5, x3(0)=0.2, Nsum(0)=100) describes a scenario where one species eventually dominates while others decline, as observed in the cheese microbial dataset [15].

To test robustness, we added varying levels of noise (no noise, 5%, or 10% random Gaussian noise) to the simulated data. Relative abundance data were generated for different time lengths and intervals. For each noisy group (either 5% or 10% noise), 20 randomly noisy datasets of relative abundance were generated. iLV was used to estimate the interaction coefficients with Nsum_intial_guess= 200. The cosine similarities between the estimated and ground truth interaction coefficients (bij) are presented in Tables 1, 2, 4, and 5. For the cLV model, cosine similarities were calculated between the estimated and ground truth values of bijbDj (where species D has the lowest observed variance of relative abundance among all species and i ≠ j or D), due to the inherent limitations of the cLV framework of not being able to estimate bij [13]. Trajectory RMSE was not used as an evaluation metric since the system can be unstable growing exponentially if parameter estimation is too far from the ground truth. Moreover, cLV cannot fully recover bij, or ri either.

Table 1. Average cosine similarity of interaction coefficients (bij) where ij under varying time intervals (Δt) and noise levels for different methods, using simulated datasets that reflect periodic oscillations among three species.

Time segment iLV cLV gLV_relative Pearson’s r Spearman’s r gLV_absolute
No noise Δt = 1 1 0.813 0.789 0.00201 -0.00310 0.816
Δt = 0.5 1 0.888 0.883 -5.23e-05 0.00791 0.945
Δt = 0.1 1 0.924 0.909 -0.00283 0.000474 0.997
Noise level 0.05 Δt = 1 0.936
(0.0751)
0.811
(0.0107)
0.784
(0.0146)
0.00213
(0.00386)
-0.00585
(0.00852)
0.810
(0.0168)
Δt = 0.5 0.962
(0.0493)
0.889
(0.0111)
0.883
(0.00896)
-0.00178
(0.00246)
0.00269
(0.00333)
0.947
(0.00888)
Δt = 0.1 0.991
(0.0142)
0.914
(0.00524)
0.906
(0.00294)
-0.00237
(0.00143)
-5.71e-05
(0.00132)
0.995
(0.00267)
Noise level 0.10 Δt = 1 0.813
(0.210)
0.802
(0.0189)
0.774
(0.0273)
0.00187
(0.00684)
-0.00254
(0.0100)
0.807
(0.0465)
Δt = 0.5 0.886
(0.109)
0.875
(0.0260)
0.866
(0.0228)
-0.00113
(0.00543)
-2.38e-4
(0.00688)
0.937
(0.0268)
Δt = 0.1 0.961
(0.0431)
0.874
(0.0177)
0.868
(0.0147)
-0.00268
(0.00218)
-1.81e-5
(0.00255)
0.970
(0.0226)

For each noisy group, 20 randomly noisy datasets of relative abundance were generated, and the standard deviation was reported in parentheses following the mean cosine similarities of bij. Best performances (Friedman Test, followed by post-hoc one-sided Wilcoxon signed-rank test, with p value less than 0.05) are highlighted in bold. The data were simulated under the first parameter setting (r1=0.31, r2=0.6, r3=0.29, b11= b22=b33= 0, b12=0.01, b13=0.011, b21=0.009, b23=0.01, b31=0.012, b32=0.015, x1(0)=0.3, x2(0)=0.5, x3(0)=0.2, Nsum(0)=100). We set Nsum_intial_guess=200.

Table 2. Average cosine similarity of interaction coefficients (bij) where ij under varying time ranges (t) and noise levels for different methods, using simulated datasets that reflect periodic oscillations among three species.

Time range iLV cLV gLV_relative Pearson’s r Spearman’s r gLV_absolute
No noise t = 20 1 0.814 0.788 -0.00325 -0.00166 0.817
t = 15 1 0.843 0.822 0.0127 0.0108 0.820
t = 10 1 0.813 0.789 0.00201 -0.00310 0.816
Noise level 0.05 t = 20 0.980
(0.0398)
0.811
(0.0109)
0.784
(0.0135)
-0.00269
(0.00339)
-0.00161
(0.00366)
0.817
(0.0161)
t = 15 0.940
(0.138)
0.839
(0.0100)
0.816
(0.0116)
0.0133
(0.00265)
0.0121
(0.00437)
0.820
(0.0220)
t = 10 0.936
(0.0751)
0.811
(0.0107)
0.784
(0.0146)
0.00213
(0.00386)
-0.00585
(0.00852)
0.810
(0.0168)
Noise level 0.10 t = 20 0.958
(0.0569)
0.813
(0.0145)
0.788
(0.0171)
-0.00587
(0.00452)
-0.00435
(0.00478)
0.824
(0.0358)
t = 15 0.958
(0.0567)
0.832
(0.0161)
0.808
(0.0216)
0.0117
(0.00557)
0.00820
(0.00786)
0.841
(0.0320)
t = 10 0.813
(0.210)
0.802
(0.0189)
0.774
(0.0273)
0.00187
(0.00684)
-0.00254
(0.0100)
0.807
(0.0465)

For each noisy group, 20 randomly noisy datasets of relative abundance were generated, and the standard deviation was reported in parentheses following the mean cosine similarities of bij. Best performances (Friedman Test, followed by post-hoc one-sided Wilcoxon signed-rank test, with p value less than 0.05) are highlighted in bold. The data were simulated under the first parameter setting (r1=0.31, r2=0.6, r3=0.29, b11= b22=b33= 0, b12=0.01, b13=0.011, b21=0.009, b23=0.01, b31=0.012, b32=0.015, x1(0)=0.3, x2(0)=0.5, x3(0)=0.2, Nsum(0)=100). We set Nsum_intial_guess=200.

Table 4. Average cosine similarity of interaction coefficients (bij) where ij under varying time intervals (Δt) and noise levels for different methods, using simulated datasets that reflect a stabilizing system of three species.

Time segment iLV cLV gLV_relative Pearson’s r Spearman’s r gLV_absolute
No noise Δt = 1 1 0.856 0.743 0.0963 0.0965 0.903
Δt = 0.5 1 0.921 0.840 0.0923 0.100 0.971
Δt = 0.1 1 0.955 0.892 0.0880 0.0901 0.998
Noise level 0.05 Δt = 1 0.979
(0.0179)
0.852
(0.00911)
0.738
(0.0135)
0.0969
(0.00582)
0.0994
(0.0134)
0.900
(0.0165)
Δt = 0.5 0.990
(0.00911)
0.918
(0.00619)
0.835
(0.00783)
0.0914
(0.00365)
0.0969
(0.00495)
0.971
(0.0109)
Δt = 0.1 0.996
(0.00565)
0.947
(0.00352)
0.871
(0.00814)
0.0884
(0.00160)
0.0900
(0.00217)
0.994
(0.00413)
Noise level 0.10 Δt = 1 0.838
(0.159)
0.844
(0.0159)
0.725
(0.0209)
0.0963
(0.0122)
0.102
(0.0167)
0.875
(0.0350)
Δt = 0.5 0.890
(0.170)
0.904
(0.0121)
0.814
(0.0260)
0.0934
(0.00601)
0.0952
(0.0101)
0.953
(0.0356)
Δt = 0.1 0.960
(0.0852)
0.851
(0.0337)
0.744
(0.0349)
0.0875
(0.00353)
0.0890
(0.00388)
0.974
(0.0216)

For each noisy group, 20 randomly noisy datasets of relative abundance were generated, and the standard deviation was reported in parentheses following the mean cosine similarities of bij. Best performances (Friedman Test, followed by post-hoc one-sided Wilcoxon signed-rank test, with p value less than 0.05) are highlighted in bold. The data were simulated under the second parameter setting (r1=0.21, r2=0.4, r3=0.19, b11= b22=b33= 0, b12=0.02, b13=0.016, b21=0.01, b23=0.014, b31=0.017, b32=0.02, x1(0)=0.3, x2(0)=0.5, x3(0)=0.2, Nsum(0)=100). We set Nsum_intial_guess=200.

Table 5. Average cosine similarity of interaction coefficients (bij) where ij under varying time ranges (t) and noise levels for different methods, using simulated datasets that reflect a stabilizing system of three species.

Time range iLV cLV gLV_relative Pearson’s r Spearman’s r gLV_absolute
No noise t = 20 1 0.832 0.716 0.148 0.166 0.988
t = 15 1 0.853 0.738 0.102 0.106 0.977
t = 10 1 0.856 0.743 0.0963 0.0965 0.903
Noise level 0.05 t = 20 0.875
(0.276)
0.828
(0.00897)
0.711
(0.0102)
0.148
(0.00269)
0.167
(0.00294)
0.986
(0.00233)
t = 15 0.993
(0.00688)
0.848
(0.00893)
0.731
(0.0106)
0.101
(0.00470)
0.106
(0.00886)
0.974
(0.00537)
t = 10 0.979
(0.0179)
0.852
(0.00911)
0.738
(0.0135)
0.0969
(0.00582)
0.0994
(0.0134)
0.900
(0.0165)
Noise level 0.10 t = 20 0.868
(0.336)
0.819
(0.0129)
0.692
(0.0153)
0.146
(0.00447)
0.164
(0.00416)
0.981
(0.00768)
t = 15 0.951
(0.144)
0.835
(0.0196)
0.716
(0.0250)
0.101
(0.0110)
0.108
(0.0123)
0.969
(0.0136)
t = 10 0.838
(0.159)
0.844
(0.0159)
0.725
(0.0209)
0.0963
(0.0122)
0.102
(0.0167)
0.875
(0.0350)

For each noisy group, 20 randomly noisy datasets of relative abundance were generated, and the standard deviation was reported in parentheses following the mean cosine similarities of bij. Best performances (Friedman Test, followed by post-hoc one-sided Wilcoxon signed-rank test, with p value less than 0.05) are highlighted in bold. The data were simulated under the second parameter setting (r1=0.21, r2=0.4, r3=0.19, b11= b22=b33= 0, b12=0.02, b13=0.016, b21=0.01, b23=0.014, b31=0.017, b32=0.02, x1(0)=0.3, x2(0)=0.5, x3(0)=0.2, Nsum(0)=100). We set Nsum_intial_guess=200.

We also added different levels of self-interaction terms bii in simulated datasets. Positive self-interactions are rarely biologically meaningful, which usually lead to unstable dynamics [16,17]. Therefore, bii was randomly assigned from three negative ranges representing different magnitudes of self-regulations: -0.005 to 0 (low), -0.01 to -0.005 (medium), and -0.015 to -0.01 (high). The metric of cosine similarity remains the same as in Tables 1, 2, 4, and 5, and results are presented in Tables 3 and 6 for periodic-oscillation and stabilizing systems, respectively. We incorporated self-interaction terms (bii) into gLV_absolute modeling for results presented in Tables 3 and 6. However, this was not feasible for iLV, cLV, or gLV_relative due to the identifiability issue mentioned in the Methods section. While gLV_absolute generally achieved the best performance using absolute-abundance inputs, iLV consistently outperformed cLV and gLV_relative using relative-abundance inputs. Under all levels of self-interactions, iLV exhibited stable and reliable performance in capturing the relative interaction strengths among species, with cosine similarity consistently exceeding 0.7. By comparison, both cLV and gLV_relative yielded markedly lower cosine similarities (Tables 3 and 6).

Table 3. The cosine similarity of interaction coefficients (bij) where ij under varying time intervals (Δt) and self-interaction levels for different methods, using simulated datasets that reflect periodic oscillations among three species.

Time segment iLV cLV gLV_relative Pearson’s r Spearman’s r gLV_absolute
Low self-interaction Δt = 1 0.967 0.838 0.787 0.0613 0.0302 0.841
Δt = 0.5 0.966 0.889 0.845 0.0624 0.0178 0.963
Δt = 0.1 0.966 0.912 0.873 0.0629 0.0121 0.998
Medium self-interaction Δt = 1 0.811 0.639 0.545 0.0843 0.102 0.977
Δt = 0.5 0.815 0.683 0.574 0.0878 0.102 0.994
Δt = 0.1 0.817 0.710 0.592 0.0905 0.102 0.999
High self-interaction Δt = 1 0.754 0.432 0.365 0.0685 0.0978 0.958
Δt = 0.5 0.705 0.475 0.384 0.0776 0.0998 0.985
Δt = 0.1 0.715 0.503 0.398 0.0841 0.101 0.999

Best performances are highlighted in bold. The data were simulated under the first parameter setting (r1=0.31, r2=0.6, r3=0.29, b11=0.0017, 0.0065, or 0.012 (low, medium, or high self-interactions, respectively), b12=0.01, b13=0.011, b21=0.009, b22=0.0028, 0.0052, or 0.014 (low, medium, or high self-interactions, respectively), b23=0.01, b31=0.012, b32=0.015, b33=0.0042, 0.0076, or 0.011 (low, medium, or high self-interactions, respectively), x1(0)=0.3, x2(0)=0.5, x3(0)=0.2, Nsum(0)=100). We set Nsum_intial_guess=200.

Table 6. The cosine similarity of interaction coefficients (bij) where ij under varying time intervals (Δt) and self-interaction levels for different methods, using simulated datasets that reflect stabilizing systems of three species.

Time segment iLV cLV gLV_relative Pearson’s r Spearman’s r gLV_absolute
Low self-interaction Δt = 1 0.987 0.827 0.704 0.139 0.133 0.965
Δt = 0.5 0.988 0.873 0.787 0.137 0.115 0.991
Δt = 0.1 0.988 0.895 0.843 0.136 0.110 0.999
Medium self-interaction Δt = 1 0.812 0.727 0.567 0.164 0.176 0.973
Δt = 0.5 0.887 0.767 0.619 0.164 0.174 0.993
Δt = 0.1 0.819 0.790 0.656 0.165 0.174 0.999
High self-interaction Δt = 1 0.874 0.633 0.448 0.169 0.178 0.970
Δt = 0.5 0.874 0.671 0.484 0.169 0.176 0.992
Δt = 0.1 0.876 0.695 0.510 0.169 0.175 0.999

Best performances are highlighted in bold. The data were simulated under the first parameter setting (r1=0.21, r2=0.4, r3=0.19, b11=0.0011, 0.0064, or 0.01 (low, medium, or high self-interactions, respectively), b12=0.02, b13=0.016, b21=0.01, b22=0.0021, 0.0086, or 0.013 (low, medium, or high self-interactions, respectively), b23=0.014, b31=0.017, b32=0.02, b33=0.0035, 0.0053, or 0.011 (low, medium, or high self-interactions, respectively), x1(0)=0.3, x2(0)=0.5, x3(0)=0.2, Nsum(0)=100). We set Nsum_intial_guess=200.

Across all scenarios, iLV performed well against competing methods in recovering interaction coefficients (bij). In noise-free conditions, iLV achieved perfect or near-perfect cosine similarities between predicted and ground-truth coefficients, demonstrating its accuracy and reliability. While gLV-relative and cLV methods are limited by their reliance on linear approximations and the assumption that Nsum stays constant, iLV leverages iterative optimization to refine parameter estimates, resulting in significantly improved accuracy under most of the scenarios. Moreover, the Pearson and Spearman coefficients are generally very small in the four tables, indicating that they cannot be used to measure the interaction strength between different species.

The benchmarking results revealed that iLV’s performance is sensitive to temporal resolution, with smaller time intervals (Δt) yielding more accurate estimates. Longer temporal ranges had similar effects on iLV performance but with some exceptions (such as the second parameter setting when t = 20). In contrast, the performance of cLV and gLV-relative methods plateaued or declined under these conditions, underscoring iLV’s ability to exploit temporal information effectively. As noise levels increased, iLV showed higher variance compared to other simple linear approximation methods, since higher noise levels can make the non-linear optimization process harder by further complicating the non-convex objective function.

When comparing methods across all benchmark scenarios, gLV-absolute performed well with absolute abundance data, but its reliance on such input limits its applicability to microbiome studies, where relative abundance is often the only available measure. In contrast, iLV provides comparable accuracy while using relative abundance, demonstrating its versatility and practical relevance.

The snowshoe hare-Canadian lynx population dynamics and the Stylonychia pustula-P. caudatum mixed culture

In the northern boreal forests of North America, the snowshoe hare (Lepus americanus) serves as the primary food source for the Canadian lynx (Lynx canadensis). When hare populations are abundant, lynx consume approximately two hares every three days, primarily excluding other prey from their diet. As a result, the population dynamics of these two species are tightly interconnected. For over a century, it has been observed that their populations underwent significant fluctuations in cycles lasting approximately 8–11 years [14].

We utilized the absolute abundance data of lynx and hare from 1920 to 1935 [18], converting it into relative abundances. To analyze the data, we applied the iLV and gLV_relative models to the relative abundance dataset and gLV_absolute model to the absolute abundance data. The resulting trajectory RMSEs were 0.109, 0.318, and 0.425, respectively, indicating that iLV achieved the lowest RMSE and provided the best agreement between estimated and observed trajectories, as shown in Fig 3A, 3B, and 3C.

Fig 3. Comparison of predicted and observed relative abundance trajectories for snowshoe hare-Canadian lynx.

Fig 3

We set 𝐍sum_intial_guess=10 according to Table 7. Panels A, B, and C show the trajectories generated by iLV, gLV_relative, and gLV_absolute models using all 16 pairs of data points, respectively, with observed data overlaid for comparison. iLV aligns closely with observed trajectories (RMSE = 0.109), whereas gLV_relative (RMSE = 0.318) and gLV_absolute (RMSE = 0.425) display higher errors. Panel D illustrates the training and validation RMSE distribution across 100 runs, showing iLV significantly outperforming the other methods in training and validation accuracies. Significance is computed relative using the Friedman Test followed by a post-hoc one-sided Wilcoxon signed-rank test (****: p <104; ***:p < 0.001; **: p < 0.01; *: p < 0.05; ns: not significant).

To further evaluate model performance, we randomly split the relative abundance dataset into 10 pairs (hare and lynx) of data points for parameter training and 6 pairs of data points for validation across 100 runs. The performance of iLV, gLV_relative, and gLV_absolute was compared using training and validation trajectory RMSE as shown in Fig 3D. The cLV model was excluded from this analysis, as it can only estimate rirD and bijbDj, where species D has the lowest observed variance of relative abundance among all species. Without explicit estimates for ri and bij, the trajectories of relative abundances cannot be recovered, making trajectory RMSE computation infeasible for cLV.

The iLV model significantly outperformed gLV_relative and gLV_absolute in training RMSE, with p-values 1.72×1017 and 8.93×1015, respectively (Friedman Test followed by a post-hoc one-sided Wilcoxon signed-rank test). No significant difference in training RMSE was observed between gLV_relative and gLV_absolute. For validation RMSE, iLV significantly outperformed gLV_relative and gLV_absolute, with p-values of 2.32×105 and 3.81×104, respectively. No significant difference in validation RMSE was detected between gLV_relative and gLV_absolute. Overall, the iLV model consistently performed best among the three methods during training and validation, which underscores iLV’s reliability.

A dataset containing the abundances of Stylonychia pustulata grown in mixture with P. caudatum on Osterhout medium was recorded in The Struggle for Existence [19] and digitized by Mühlbauer et al. [20]. Absolute abundances of these two species were recorded in a 25-day period, and we converted them to relative abundances. Day 2 was skipped, while individual values were missing and only the mean was reported for days 20–25 in the original data. Besides, the system started to plateau after day 19, so we modeled mean abundances (of three replicates) before day 19. The fitted results are given in Fig 4. We compared the performances of iLV, gLV_relative, and gLV_absolute, and iLV had the lowest RMSE (0.0673) compared to the other two models (with RMSE of 0.163 and 0.383, respectively). iLV also captured the system oscillation between day 0 and day 7, while the other two models could not.

Fig 4. Comparison of predicted and observed relative abundance trajectories for Stylonychia pustulata and P. caudatum.

Fig 4

We set 𝐍sum_intial_guess=100 according to Table 7. Panels A, B, and C show the trajectories generated by iLV, gLV_relative, and gLV_absolute models using all 19 pairs of data points, respectively, with observed data overlaid for comparison. iLV aligns closely with observed trajectories (RMSE = 0.0673), whereas gLV_relative (RMSE = 0.163) and gLV_absolute (RMSE = 0.383) display higher errors. Panel D illustrates the training and validation RMSE distribution across 100 runs, showing iLV significantly outperforming the other methods in training accuracy and maintainin significantly better validation performance than gLV_absolute. Significance is computed relative using the Friedman Test followed by a post-hoc one-sided Wilcoxon signed-rank test (****: p <104; ***:p < 0.001; **: p < 0.01; *: p < 0.05; ns: not significant).

Similarly, we randomly split the whole dataset into 12 pairs of data points for training and 7 pairs of data points for validation across 100 runs, and the results are shown in Fig 4D. iLV performed significantly better than gLV_relative and gLV_absolute in training RMSE. As for validation, iLV performed significantly better than gLV_absolute, but there was no significant difference between iLV and gLV_relative in validation RMSE.

A cheese microbial community

Mounier et al. previously quantified cell counts of five microbial groups within a cheese microbial community over a 21-day period [15]. We converted these absolute counts to relative abundances for modeling purposes. These groups include D. hansenii, Y. lipolytica, G. candidum, Leucobacter sp., and a bacterial group composed of Arthrobacter arilaitensis, Hafnia alvei, Corynebacterium casei, Brevibacterium aurantiacum, and Staphylococcus xylosus. The system plateaued around day 10, and we modeled the dynamic phase between days 0 and 10. We applied our iLV model to this dataset with Nsum_intial_guess set to 0.001 (see Table 7), and the trajectory RMSE is 0.149. For comparison, we also applied the gLV_relative and gLV_absolute models, which yielded trajectory RMSEs of 0.327 and 0.296, respectively.

Table 7. Runtime and sensitivity to Nsum_intial_guess of the iLV Algorithm using real datasets.

Number of species Number of
time points
Nsum_intial_guess Runtime
(Seconds)
Trajectory
RMSE
Lynx-hare
2 16 0.001 2.11 0.118
2 16 0.01 2.01 0.122
2 16 0.1 2.12 0.114
2 16 1 2.38 0.110
2 16 10 2.37 0.109
2 16 100 2.40 0.112
2 16 1000 2.22 0.112
Stylonychia pustula-P. caudatum
2 19 0.001 1.95 0.0685
2 19 0.01 2.09 0.0673
2 19 0.1 2.10 0.0673
2 19 1 1.91 0.0684
2 19 10 2.10 0.0673
2 19 100 1.85 0.0673
2 19 1000 1.89 0.0673
Cheese
5 11 0.001 4.29 0.149
5 11 0.01 10.50 0.186
5 11 0.1 4.58 0.173
5 11 1 5.47 0.152
5 11 10 6.91 0.175
5 11 100 4.85 0.186
5 11 1000 3.69 0.186

We tested runtime and sensitivity to Nsum_intial_guess for all three real datasets used in this paper, which are the lynx-hare dataset, the Stylonychia pustula-P. caudatum dataset, and the cheese microbial dataset. The runtime was measured for a single run on a personal laptop, and the number of loops M in Subroutine 1 of the iLV Algorithm was set to 100. We set Nsum_intial_guess to 10, 100 and 0.001 for lynx-hare, Stylonychia pustula-P. caudatuma and cheese microbial datasets, respectively in our data analysis. The rows corresponding to the lowest trajectory RMSE are in bold.

Significant interactions among the microbial species, as predicted by the iLV model, are shown in Fig 5D. Some of these interactions are supported by previous studies [15]. For example, Y. lipolytica has been shown to reduce the viability of D. hansenii during the stationary phase [15]. However, not all predicted interactions align with prior findings. For instance, Y. lipolytica was reported to inhibit the mycelial growth of G. candidum, rather than promoting it. This inhibition disrupted the typical mold-like mycelial structure of G. candidum, leading to the formation of spaghetti-like morphologies [15]. It is important to note, however, that Mounier et al. investigated pairwise interactions through two-species co-culture experiments [15], which may not fully capture the complexity of interactions within a multispecies community.

Fig 5. Predicted abundance trajectory and interactions of the cheese microbial community.

Fig 5

We set 𝐍sum_intial_guess=0.001 according to Table 7. Panels A, B, and C show the trajectories generated by iLV, gLV_relative, and gLV_absolute models, respectively, with observed data overlaid for comparison. iLV aligns closely with observed trajectories (RMSE = 0.149), whereas gLV_relative (RMSE = 0.327) and gLV_absolute (RMSE = 0.296) display higher errors. Panel D shows the predicted interaction network. The green arrows mean promotion, and the red arrows mean inhibition. The thickness of the arrows is proportional to the effect size of promotion or inhibition, indicated by the magnitude of bij. Only interactions that are at least 50 times larger than the minimum predicted interaction were plotted. Dh is D. hansenii, Yl is Y. lipolytica, Gc is G. candidum, Ls is Leucobacter sp., and C is a bacterial group composed of Arthrobacter arilaitensis, Hafnia alvei, Corynebacterium casei, Brevibacterium aurantiacum, and Staphylococcus xylosus.

Runtime and sensitivity to Nsum_intial_guess of the iLV Algorithm

We tested the runtime of the iLV Algorithm in both real and simulated datasets under different values of Nsum_intial_guess (varying from 0.001 to 1000), and the results are shown in Tables 5 and 6, respectively. In the lynx-hare dataset, trajectory RMSE had minor fluctuations around 0.11 and achieved the minimum of 0.109 when Nsum_intial_guess=10. Similarly, trajectory RMSE stayed rather stable around 0.068 and achieved the minimum of 0.0673 when Nsum_intial_guess=100 in the Stylonychia pustula-P. caudatum dataset. The cheese microbial dataset exhibited slightly higher fluctuations and achieved the minimum RMSE of 0.149 when Nsum_intial_guess=0.001. For the simulated datasets, generated using the two parameter settings applied in benchmarking with 10% Gaussian noise added, the trajectory RMSE remained consistent across different values of Nsum_intial_guess. Based on these results, we adopted Nsum_intial_guess=200 in benchmarking analyses without loss of generality. Overall, the trajectory RMSE is not highly sensitive to the value of Nsum_intial_guess for small systems, but we noticed a small increase in sensitivity as the system becomes larger and more complicated.

The iLV model’s computational bottleneck lies in the nonlinear optimization step (Subroutine 2), which uses leastsq() and least_squares() functions from the SciPy library [21]. The time complexities of the Levenberg–Marquardt algorithm (implemented in both leastsq() and least_squares(method = ’lm’)) and the Trust Region Reflective algorithm (least_squares(method = ’trf’)), both scale approximately with O(k*(n2m+n3)) when the Jacobian of cost function F(x) is analytically solvable, where n is the number of parameters, m is the number of timepoints, and k is the number of iterations until convergence (depending on the problem, starting point, etc.) [22,23]. In our experiments, the runtime was acceptable for low-dimensional systems (see Tables 5 and 6). It generally increases with the number of species and the number of time points, while Nsum_intial_guess can also affect runtime by providing different starting points and thus affecting the number of iterations required for convergence.

Discussion

This study presents the iterative Lotka-Volterra (iLV) model as a novel approach for parameter estimation using relative abundance data, which integrates iterative refinement with non-linear optimization. The iLV model outperformed existing methodologies, such as the compositional Lotka-Volterra (cLV) model and generalized Lotka-Volterra (gLV_relative) approaches, in recovering interaction coefficients of simulation data under various noise levels and temporal resolutions. Moreover, compared with cLV and gLV_relative, iLV demonstrated more stable and reliable performance in capturing the relative interaction strengths across species under different magnitudes of self-interactions.

Real-world applications, such as Canadian lynx-snowshoe hare, Stylonychia pustula-P. caudatum, and the cheese microbial community datasets, further validated the practicality of iLV. In these cases, iLV successfully reconstructed system trajectory with low RMSE values, demonstrating its ability to handle compositional data without sacrificing model interpretability. For instance, in the lynx-hare system, iLV accurately captured population dynamics, aligning closely with observed data (RMSE = 0.109). The iterative design of iLV is a significant improvement over traditional linear regression methods. By iteratively solving ordinary differential equations and refining parameter estimates, iLV minimizes the root mean square error (RMSE) between observed and predicted relative abundances.

Despite its effectiveness, the iLV model’s computational bottleneck lies in the nonlinear optimization step (Subroutine 2), which uses leastsq() and least_squares() functions from the SciPy library [21]. These functions can be computationally intensive, particularly for systems with many species or dense time-series data. In our experiments, the runtime was acceptable for low-dimensional systems (see Tables 5 and 6), but challenges will occur for higher-dimensional systems. To address scalability concerns, several strategies can be considered in future work. One is the parallelization of key components, such as distributed optimization across multiple initializations [24]. Furthermore, using sparse Jacobians or approximate Hessians could potentially reduce computation without a major loss in accuracy, as has been shown in large-scale numerical optimization problems [25,26]. These approaches would enhance the scalability of iLV and facilitate its application to larger and more complex microbiome datasets, where compositional data with many taxa and high temporal resolution are increasingly common [10,11]. Another solution would be dimensionality reduction. For example, Phuongan et al. reduced more than 11,000 microbial Operational Taxonomic Units (OTUs) into 14 subgroups by clustering their blooming times, and they modeled on the subgroups instead of OTUs [27].

Besides, incorporating additional ecological constraints, such as carrying capacities or environmental perturbations, could improve the realism of the model. Moreover, an initial guess of the inter-species sum of absolute abundances Nsum_intial_guess is needed for iLV, and users may tweak it several times for better prediction accuracy. Sensitivity of Nsum_intial_guess is dataset-specific, and datasets involving larger and more complicated systems are more likely to be sensitive to Nsum_intial_guess in our experiments (see Tables 7 and 8). The number of data points also matters for iLV performance, and data interpolation may help with inadequate input data while also introducing additional errors.

Table 8. Runtime and sensitivity to Nsum_intial_guess of the iLV Algorithm using simulated datasets.

Number of species Number of
time points
Nsum_intial_guess Runtime
(Seconds)
Trajectory
RMSE
Periodic oscillation
3 11 0.001 2.56 0.0156
3 11 0.01 4.07 0.0156
3 11 0.1 4.30 0.0156
3 11 1 3.56 0.0156
3 11 10 3.66 0.0156
3 11 100 3.94 0.0156
3 11 200 4.06 0.0156
3 11 1000 3.55 0.0156
Stabilizing system
3 11 0.001 2.16 0.0102
3 11 0.01 2.46 0.0102
3 11 0.1 2.29 0.0102
3 11 1 2.24 0.0102
3 11 10 2.13 0.0102
3 11 100 10.59 0.0102
3 11 200 3.54 0.0102
3 11 1000 2.31 0.0102

We utilized two parameter settings from benchmarking. The first parameter setting represents periodic oscillations among three species: r1=0.31, r2=0.6, r3=0.29, b12=0.01, b13=0.011, b21=0.009, b23=0.01, b31=0.012, b32=0.015, x1(0)=0.3, x2(0)=0.5, x3(0)=0.2, Nsum(0)=100. The second parameter setting reflects a stabilizing system of three species: r1=0.21, r2=0.4, r3=0.19, b12=0.02, b13=0.016, b21=0.01, b23=0.014, b31=0.017, b32=0.02, x1(0)=0.3, x2(0)=0.5, x3(0)=0.2, Nsum(0)=100. The simulated time ranges were 0–10 with step size 1, and 10% random Gaussian noise was added. The runtime was measured for a single run on a personal laptop, and the number of loops M in Subroutine 1 of the iLV Algorithm was set to 100. Both simulation parameter settings were insensitive to Nsum_intial_guess, and we used Nsum_intial_guess=200 in all benchmarking without loss of generality that is highlighted in bold.

In summary, the iterative Lotka-Volterra model offers a robust framework for analyzing compositional data. By addressing limitations in existing methodologies and demonstrating superior performance across simulated and real-world datasets, iLV sets a new approach for species interaction modeling. Future efforts to expand data dimensionality and ecological assumptions (such as perturbations) will further cement its utility in ecological and microbiome research.

Methods

The generalized Lotka-Volterra model, defined with relative abundances

The Lotka-Volterra model was first proposed by Alfred James Lotka in 1910 for chemical reactions [28], and in 1925 he used the model to study predator-prey interactions [4]. In 1926, the same model was published by Vito Volterra [29]. While the Lotka-Volterra model only depicts systems of two species, the generalized Lotka-Volterra (gLV) was proposed to model systems of more than two species. The gLV model is widely used in describing the dynamics of a set of organisms, which takes the form of (where N1, …, Nm are absolute abundances of the different organisms in the system, and ri and bij are coefficients):

{dN1dt=r1N1+j=1mb1jN1NjdN2dt=r2N2+ j=1mb2jN2NjdNmdt=rmNm+j=1mbmjNmNj  (1)

Relative abundances (x1, …, xm) and absolute abundances can be converted mutually if the sum of absolute abundances of all species in the system is given:

Nsum=i=1mNi (2)
xi=NiNsum (3)

According to equations (1)(2)(3), the gLV model can be rewritten with x1, …, xm and Nsum:

{dNsumdt= i=1m(rixiNsum+j=1mbijxixjNsum2)dx1dt=(r1x1Nsum+j=1mb1jx1xjNsum2x1dNsumdt)/Nsumdx2dt=(r2x2Nsum+j=1mb2jx2xjNsum2x2dNsumdt)/Nsumdxmdt=(rmxmNsum+j=1mbmjxmxjNsum2xmdNsumdt)/Nsum  (4)

where the first sub-equation in (4) holds as the derivative of the sum equals the sum of derivatives and the rest of equations (4) holds as the following:

dxidt= d (NiNsum)dt= dNidtNsumNidNsumdtNsum2= dNidtxidNsumdtNsum= riNi+j=1mbijNiNjxidNsumdtNsum= rixiNsum+j=1mbijxixjNsum2xidNsumdtNsum

We used odeint() function in scipy package [21] to solve the ordinary differential equation system (4) numerically when the initial values of x1, …, xm and Nsum are given.

Parameter estimation by linear approximation and regression

Multiple investigators have used the following linear approximations for the generalized Lotka-Volterra model when absolute abundances are given [6,8,15,30]:

Δ(lnNi)Δtd(lnNi)dt=dNidtNi=ri+j=1mbijNj (5)

where Δ(lnNi)= lnNi(tk+1)lnNi(tk\) is the difference of lnNi  at two adjacent time points tk+1 and tk, and Δt= tk+1tk

By linear approximation (5), ri is the intercept and bij are slopes of the linear regression where Δ(lnNi)Δt is the dependent variable and Nj are independent variables. When absolute abundance data are available, parameters can be estimated as the least square solutions of such linear regressions. In benchmarking, we referred to such method as gLV_absolute.

Since absolute abundances and relative abundances can be mutually converted when the sum of absolute abundances is known, we were inspired by the linear approximation method and came up with a novel iterative method for estimating parameters using the relative abundance presentation of the generalized Lotka-Volterra model (as indicated by equation system (4)).

The iterative Lotka-Volterra (iLV) algorithm

The iterative Lotka-Volterra (iLV) algorithm has two subroutines: (1) The iterative subroutine, which utilizes linear optimization, and (2) The least square estimation subroutine, which is a non-linear optimization using the results from the iterative subroutine as the starting point of optimization.

When relative abundances xi_observed of the i-th organism at different time points are given and we have an initial guess Nsum_intial_guess  of Nsum, we propose the following iterative subroutine to improve parameter estimation to minimize the root-mean-square (RMSE) between estimated and observed relative abundances:

Subroutine 1: the iterative subroutine.
Step 1: Ni0=xi_observedNsum_intial_guess. Then, use linear approximation (5) and calculate the least square solution of linear regressions. Denote the solution as ri0, bij0.
Step 2:
For k from 0 to M (e.g., 99):
  1. Solve ordinary differential equation (ODE) system (4) numerically by odeint(rik, bijk, Nsum_initial_guess, xi_observed(0)) and calculate RMSE between estimated and observed relative abundances. Please note xi(0are estimated as the observed relative abundances at time point 0, and Nsum(0) is estimated as Nsum_intial_guess. Also, denote Nsumk+1 as the trajectory of the sum of absolute abundances returned by the odeint() function.

  2. Let Nik+1=xi_observedNsumk+1. Then, use linear approximation (5) and calculate the least square solution of linear regressions. Denote the solution as rik+1, bijk+1.


Return rik  and bijk with the minimal RMSE between estimated and observed relative abundances. xi(0) are estimated as the observed relative abundances at time point 0, and Nsum(0) is estimated as Nsum_intial_guess.

The motivations behind the iterative algorithm can be described as follows. The first step gives an initial estimate of the parameters (ri, bij) assuming constant total absolute abundance across time. The assumption of constant total absolute abundance across time is not correct in most situations and the total absolute abundance calculated based on the estimated parameters using the gLV model may be closer to the real values. Therefore, we re-estimate the parameters using the estimated total absolute abundance together with the observed relative abundance, resulting in a more accurate estimation of absolute abundances. We iterate this process multiple times and choose the parameters as the set having the lowest RMSE between the theoretical relative abundance and observed relative abundance.

Note that if we multiply Nsum_intial_guess by a constant such that Nsum_initial_guess(c)=c Nsum_intial_guess, the resulting ri estimate will not change, while the resulting interaction parameters will be changed to bij/c from equation (5) in the first step. Therefore, the estimated interaction coefficients will differ by a constant which will not impact the relative abundance across time [12].

We note that the iterative subroutine was based on intuition and intended to obtain an initial estimate of the parameters (ri, bij), which can be used as the starting point for the following least square estimation. There is no guarantee that the estimates from the iterative subroutine converge to the real parameter values.

Subroutine 2: the least square estimation subroutine.

We used least_squares() and leastsq() functions in scipy package [21] to find a local minimum of the cost function F(x), given initial guesses of the parameters:

F(x)=0.5j=1mfj(x)2
fj(x)=xj[odeint(ODEsystem(4),ri,bik,xi(0),Nsum(0))]j,wherei,k=1,2,..,m

Given different initial guesses of the parameters, least_squares() and leastsq() functions may return different estimated parameters corresponding to different local minima of the cost function F(x). Therefore, a close-enough initial guess of parameters is crucial. We used the results returned by Subroutine 1 as the initial guesses for ri, bij, observed relative abundances at time point 0 as the initial guesses for xi(0). There are three optimization methods implemented in least_squares() function for finding a nearby local minimum, which are Trust Region Reflective (TRF) algorithm [22], Levenberg-Marquardt (LM) algorithm [23] and dogbox algorithm [31]. The dogbox algorithm is suitable for a small number of variables and may exhibit slow convergence when the rank of Jacobian is less than the number of variables [31], and it had poor performance and stability in our trials, so we only used Trust Region Reflective and Levenberg-Marquardt optimizations. We also included leastsq() function that has a slightly different implementation of Levenberg-Marquardt (LM) algorithm, as compared to least_squares() function. In our data analysis, we compared the trajectory RMSE of least_squares(method = “trf” or method = “lm”) and leastsq(), and returned the results with the lowest trajectory RMSE.

Notably, while the growth coefficients ri in the gLV model are identifiable using relative abundance data, interaction coefficients bij are only identifiable up to a multiplicative constant [12]. Consequently, the interaction coefficients estimated by the iLV algorithm approximate the true interaction coefficients up to a constant. This property allows us to use the estimated parameters to compare the relative strengths of microbial interactions. Besides, for better performance of the iLV Algorithm, users have to tune Nsum_intial_guess that returned a local minimum of trajectory RMSE. Sensitivity to Nsum_intial_guess is dataset-specific, and datasets involving larger and more complicated systems are more likely to be sensitive to Nsum_intial_guess in our experiments (see Tables 5 and 6). We suggest that investigators carefully compare the trajectories of the relative abundances of the different microbes based on the gLV model under the estimated parameters with the observed relative abundances of the microbes. Their close approximation suggests that the estimated interactions are close to the true underlying interactions.

Identifiability issue of self-interaction terms

At each time point, only the relative abundances of the species are available, which satisfy the compositional constraint j=1mxj=1. Thus, the relative abundance of the first species can be represented as x1=1j=2mxj. This expression was substituted into the first step of Subroutine 1 in the iLV algorithm to obtain:

Δ(ln(xiNsum_initial_guess))Δt=Δ(lnxi)Δtd(lnxi)dt=dxidtxi=ri+bi1(1j=2mxj)+j=2mbijxj=ri+bi1+j=2m(bijbi1)xj (6)

In equation (6), while ri+bi1 and bijbi1 are identifiable, there are infinitely many solutions for ri and bij. Similar identifiability issues also arise in the cLV and gLV_relative models [13]. To overcome this issue, we assumed self-interaction terms bii to be 0 when modeling relative abundances with iLV, cLV, or gLV_relative. We also investigated how the presence of self-interactions impacted the estimation of bij (where i ≠ j) in our simulation studies.

Evaluation metrics

We used trajectory Root Mean Square Error (RMSE) to measure the performance of our algorithm (where m is the number of microbial species in the system and n + 1 is the number of sampled time points):

gj(tk)=xj(tk)[odeint(ODEsystem(4), ri, bil, xi(0), Nsum(0))(tk)]j
RMSE=k=0nj=1mgj2(tk)(n+1)m,wherei,l=1,2,..,m

Cosine similarities of bij between estimated and ground truth values were also used as an evaluation metric in benchmarking. We concatenated the estimated values of all bij (where i = 1,2,..,m, and j ≠ i) as vector A, and we concatenated the ground truth value of all bij vector B. Then, cosine similarity is the cosine of the angle between vector A and vector B.

Data Availability

All data are public; data and code to replicate experiments are available at https://github.com/willhahah/iLV.

Funding Statement

This research was supported in part by the US National Science Foundation (EF-2125142 to F.S.) and the US National Institute of Diabetes and Digestive and Kidney Diseases (R01DK142026 to X.D.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Coyte KZ, Schluter J, Foster KR. The ecology of the microbiome: Networks, competition, and stability. Science. 2015;350(6261):663–6. doi: 10.1126/science.aad2602 [DOI] [PubMed] [Google Scholar]
  • 2.Faust K, Raes J. Microbial interactions: from networks to models. Nat Rev Microbiol. 2012;10(8):538–50. doi: 10.1038/nrmicro2832 [DOI] [PubMed] [Google Scholar]
  • 3.Casadevall A, Pirofski LA. Host-pathogen interactions: basic concepts of microbial commensalism, colonization, infection, and disease. Infect Immun. 2000;68(12):6511–8. doi: 10.1128/IAI.68.12.6511-6518.2000 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Lotka AJ. Elements of physical biology. Williams & Wilkins. 1925. [Google Scholar]
  • 5.Bucci V, Tzen B, Li N, Simmons M, Tanoue T, Bogart E, et al. MDSINE: Microbial Dynamical Systems INference Engine for microbiome time-series analyses. Genome Biol. 2016;17(1):121. doi: 10.1186/s13059-016-0980-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Davis JD, Olivença DV, Brown SP, Voit EO. Methods of quantifying interactions among populations using Lotka-Volterra models. Front Syst Biol. 2022;2. doi: 10.3389/fsysb.2022.1021897 [DOI] [Google Scholar]
  • 7.Zhang G, Allaire D, McAdams DA, Shankar V. System evolution prediction and manipulation using a Lotka–Volterra ecosystem model. Design Studies. 2019;60:103–38. doi: 10.1016/j.destud.2018.11.001 [DOI] [Google Scholar]
  • 8.Stein RR, Bucci V, Toussaint NC, Buffie CG, Rätsch G, Pamer EG, et al. Ecological modeling from time-series inference: insight into dynamics and stability of intestinal microbiota. PLoS Comput Biol. 2013;9(12):e1003388. doi: 10.1371/journal.pcbi.1003388 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Dimas Martins A, Gjini E. Modeling Competitive Mixtures With the Lotka-Volterra Framework for More Complex Fitness Assessment Between Strains. Front Microbiol. 2020;11:572487. doi: 10.3389/fmicb.2020.572487 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ. Microbiome Datasets Are Compositional: And This Is Not Optional. Front Microbiol. 2017;8:2224. doi: 10.3389/fmicb.2017.02224 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Tsilimigras MCB, Fodor AA. Compositional data analysis of the microbiome: fundamentals, tools, and challenges. Ann Epidemiol. 2016;26(5):330–5. doi: 10.1016/j.annepidem.2016.03.002 [DOI] [PubMed] [Google Scholar]
  • 12.Remien CH, Eckwright MJ, Ridenhour BJ. Structural identifiability of the generalized Lotka-Volterra model for microbiome studies. R Soc Open Sci. 2021;8(7):201378. doi: 10.1098/rsos.201378 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Joseph TA, Shenhav L, Xavier JB, Halperin E, Pe’er I. Compositional Lotka-Volterra describes microbial dynamics in the simplex. PLoS Comput Biol. 2020;16(5):e1007917. doi: 10.1371/journal.pcbi.1007917 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Elton C, Nicholson M. The Ten-Year Cycle in Numbers of the Lynx in Canada. The Journal of Animal Ecology. 1942;11(2):215. doi: 10.2307/1358 [DOI] [Google Scholar]
  • 15.Mounier J, Monnet C, Vallaeys T, Arditi R, Sarthou A-S, Hélias A, et al. Microbial interactions within a cheese microbial community. Appl Environ Microbiol. 2008;74(1):172–81. doi: 10.1128/AEM.01338-07 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Hofbauer J, Sigmund K. Evolutionary games and population dynamics. Cambridge University Press. 1998. [Google Scholar]
  • 17.May RM. Stability and complexity in model ecosystems. Princeton university press. 2001. [Google Scholar]
  • 18.MacLulich DA. Fluctuations in the Numbers of the Varying Hare (Lepus Americanus). University of Toronto Press. 1937. doi: 10.3138/9781487583064 [DOI] [Google Scholar]
  • 19.Gause GF. Experimental Analysis of Vito Volterra’s Mathematical Theory of the Struggle for Existence. Science. 1934;79(2036):16–7. doi: 10.1126/science.79.2036.16.b [DOI] [PubMed] [Google Scholar]
  • 20.Mühlbauer LK, Schulze M, Harpole WS, Clark AT. gauseR: Simple methods for fitting Lotka-Volterra models describing Gause’s “Struggle for Existence”. Ecol Evol. 2020;10(23):13275–83. doi: 10.1002/ece3.6926 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020;17(3):261–72. doi: 10.1038/s41592-019-0686-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Conn AR, Gould NIM, Toint PL. Trust Region Methods. Society for Industrial and Applied Mathematics. 2000. doi: 10.1137/1.9780898719857 [DOI] [Google Scholar]
  • 23.Moré JJ. The Levenberg-Marquardt algorithm: Implementation and theory. Lecture Notes in Mathematics. Springer Berlin Heidelberg. 1978. 105–16. doi: 10.1007/bfb0067700 [DOI] [Google Scholar]
  • 24.Aoki Y, Hayami K, Toshimoto K, Sugiyama Y. Cluster Gauss–Newton method. Optim Eng. 2020;23(1):169–99. doi: 10.1007/s11081-020-09571-2 [DOI] [Google Scholar]
  • 25.Fowkes JM, Gould NIM, Scott JA. Approximating sparse Hessian matrices using large-scale linear least squares. Numer Algor. 2023;96(4):1675–98. doi: 10.1007/s11075-023-01681-z [DOI] [Google Scholar]
  • 26.Walther A. Computing sparse Hessians with automatic differentiation. ACM Trans Math Softw. 2008;34(1):1–15. doi: 10.1145/1322436.1322439 [DOI] [Google Scholar]
  • 27.Dam P, Rodriguez-R LM, Luo C, Hatt J, Tsementzi D, Konstantinidis KT, et al. Model-based Comparisons of the Abundance Dynamics of Bacterial Communities in Two Lakes. Sci Rep. 2020;10(1):2423. doi: 10.1038/s41598-020-58769-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Lotka AJ. Contribution to the Theory of Periodic Reactions. J Phys Chem. 1910;14(3):271–4. doi: 10.1021/j150111a004 [DOI] [Google Scholar]
  • 29.Volterra V. Variazioni e fluttuazioni del numero d’individui in specie animali conviventi. Società anonima tipografica “Leonardo da Vinci”. 1926. [Google Scholar]
  • 30.Venturelli OS, Carr AC, Fisher G, Hsu RH, Lau R, Bowen BP, et al. Deciphering microbial interactions in synthetic human gut microbiome communities. Mol Syst Biol. 2018;14(6):e8157. doi: 10.15252/msb.20178157 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Nocedal J, Wright SJ. Numerical Optimization. Springer-Verlag. 1999. doi: 10.1007/b98874 [DOI] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

All data are public; data and code to replicate experiments are available at https://github.com/willhahah/iLV.


Articles from PLOS Computational Biology are provided here courtesy of PLOS

RESOURCES