Abstract
Dynamic flux balance analysis uses a quasi-steady state assumption to calculate an organism’s metabolic activity at each time-step of a dynamic simulation, using the well-known technique of flux balance analysis. For microbial communities, this calculation is especially costly and involves solving a linear constrained optimization problem for each member of the community at each time step. However, this is unnecessary and inefficient, as prior solutions can be used to inform future time steps. Here, we show that a basis for the space of internal fluxes can be chosen for each microbe in a community and this basis can be used to simulate forward by solving a relatively inexpensive system of linear equations at most time steps. We can use this solution as long as the resulting metabolic activity remains within the optimization problem’s constraints (i.e. the solution to the linear system of equations remains a feasible to the linear program). As the solution becomes infeasible, it first becomes a feasible but degenerate solution to the optimization problem, and we can solve a different but related optimization problem to choose an appropriate basis to continue forward simulation. We demonstrate the efficiency and robustness of our method by comparing with currently used methods on a four species community, and show that our method requires at least 91% fewer optimizations to be solved. For reproducibility, we prototyped the method using Python. Source code is available at https://github.com/jdbrunner/surfin_fba.
Author summary
The standard methods in the field for dynamic flux balance analysis (FBA) carry a prohibitively high computational cost because it requires solving a linear optimization problem at each time-step. We have developed a novel method for producing solutions to this dynamical system which greatly reduces the number of optimization problems that must be solved. We prove mathematically that we can solve the optimization problem once and simulate the system forward as an ordinary differential equation (ODE) for some time interval, and solutions to this ODE provide solutions to the optimization problem. Eventually, the system reaches an easily check-able condition which implies that another optimization problem must be solved. We compare our method against typically used methods for dynamic FBA to validate that it provides equivalent solutions while requiring fewer linear-program solutions.
This is a PLOS Computational Biology Methods paper.
Introduction
Microbial communities and human health
The makeup of microbial communities is often complex, dynamic, and hard to predict. However, microbial community structure has a profound effect on human health and disease [1–7]. These two facts have led to significant interest in mathematical models which can predict relative abundances among microbes in a community. Various dynamical models have been proposed to explain and predict microbial community population dynamics [8–12]. Among these are models which propose that interactions between species are mediated by the metabolites that each species produces and consumes [13, 14], and there is significant evidence that these models perform better than models which depend on direct interaction between species [15, 16].
Recently, advances in genetic sequencing have allowed the creation of genome-scale models (GEMs) that reflect the internal network of cellular metabolism, and can therefore be used to predict metabolite use and production [17–19]. This technique can be extended to microbial community modeling by combining GEMs of different species. There has been significant interest in using GEMs to predict relative populations of stable microbial communities [20–26]. Community metabolic modeling can not only predict relative populations, but also holds the potential to predict and explain the community metabolite yield, which can have a profound effect on health [4]. Furthermore, model repositories such as the online bacterial bioinformatics resource PATRIC [27] or the BiGG model database [28] make it possible to build community models using information from individual species investigations.
GEMs can be used to predict microbial growth rates as well as metabolite consumption and production rates using a process called flux balance analysis (FBA). Because these predictions appear in the form of rates of change, they can be used to define a metabolite mediated dynamical model simply by taking as a vector field the rates of change predicted by FBA. We can therefore combine the techniques of metabolite mediated dynamic modeling and community metabolic modeling to produce dynamic predictions of microbial community population size and metabolite yield. This strategy is called dynamic FBA [29–31], and has recently been used to model microbial communities [32–34].
Dynamic FBA, when implemented naïvely, requires a linear optimization problem to be repeatedly solved, and carries a high computational cost for even small communities. Furthermore, in silico experiments may need to be repeated many times over various environmental conditions or using various parameter choices in order to make robust conclusions or to accurately fit model parameters. As a result, implementations of dynamic FBA which depend on optimization at every time-step carry a prohibitively high computational cost when used to simulate larger microbial communities. The implementation of dynamic FBA in the popular COBRA toolbox software package [17] is done in this way, and essentially all more efficient available tools for simulating dynamic FBA fundamentally use an ODE solver approach with optimization at each time-step [24, 31, 35–38]. Dynamic FBA can be improved by taking advantage of the linear structure of the optimization problem which provides a choice of basis for an optimal solution that may be reused at future time-steps [39, 40]. However, the optimizations that are required by this strategy involve solutions with non-unique bases. This means that a basis chosen at random may not provide an optimal solution to the linear program at future time-steps because it provides a solution that is non-optimal or infeasible.
In order to implement dynamic FBA without optimizing at each time step, we use an optimal basic set for the FBA linear optimization problem to create a system of linear equations whose solutions at future time-steps coincide with the solutions to the FBA optimization problem. To solve the problem of non-uniqueness among bases, we prove that there exists a choice of basis that allows forward simulation for a given optimal flux solution and provide a method to choose this basis. Note that this method does not choose among a set of non-unique optimal flux solutions, but instead chooses a basis for a single given optimum. To choose among multiple optimal flux solutions, biological, rather than mathematical, considerations should be used.
In this manuscript, we detail how dynamic FBA can be simulated forward without re-optimization for some time interval, and give a method for doing so. We propose conditions on an optimal basic set for the FBA linear optimization problem which allows for forward simulation, and we prove that such a choice exists. We then detail how to choose this basis set, and finally give examples of simulations which demonstrate the power of our method. For reproducibility, we make a prototype implementation of our method in the Python language available at https://github.com/jdbrunner/surfin_fba.
Background
Flux balance analysis
With the advent of genetic sequencing and the resulting genome scale reconstruction of metabolic pathways, methods have been developed to analyze and draw insight from such large scale models [18]. To enable computation of relevant model outcomes, constraint based reconstruction and analysis (COBRA) is used to model steady state fluxes vi through a microorganism’s internal metabolic reactions under physically relevant constraints [18]. One of the most basic COBRA methods, called flux balance analysis (FBA) optimizes some combination of reaction fluxes ∑γivi which correspond to increased cellular biomass, subject to the constraint that the cell’s internal metabolism is at equilibrium:
(1) |
where Γ is the stoichiometric matrix, a matrix describing the stoichiometry of the metabolic model.
This optimization is chosen because it reflects the optimization carried out by nature through evolution [18]. The vector γ = (γ1, γ2, …, γd) is an encoding of cellular objectives, reflecting the belief that the cell will be optimized to carry out these objectives. The constraint Eq (1) means that any optimal set of fluxes found by FBA corresponds to a steady state of the classical model of chemical reaction networks [41]. This reflects the assumption that the cell will approach an internal chemical equilibrium.
The optimization is done over a polytope of feasible solutions defined by the inequalities vi,min ≤ vi ≤ vi,max, or possibly more complicated linear constraints. See Fig 1 for a geometric representation of an example of the type of linear optimization problem that is carried out. By convention, forward and reverse reactions are not separated and so negative flux is allowed. Linear optimization problems like FBA often give rise to an infinite set of optimal flux vectors v = (v1, v2, …, vd). Geometrically, this set will correspond to some face of the polytope of feasible solutions. To draw conclusions despite this limitation, many methods have been developed to either characterize the set of optimal solutions, as with flux variability analysis (FVA), or enforce more constraints on the network to reduce the size of this set, as with loopless FVA [18].
Dynamic FBA
FBA provides a rate of increase of biomass which can be interpreted as a growth rate for a cell. Furthermore, a subset of the reactions of a GEM represent metabolite exchange between the cell and its environment. By interpreting constraints on nutrient exchange reactions within the metabolic network as functions of the available external metabolites and fluxes of exchange reactions as metabolite exchange rates between the cell and its environment, the coupled system can be modeled. The simplest way to do this is to use an Euler method, as in [30].
In addition to Euler’s method, more sophisticated ODE solvers may be used in the so-called “direct” method of simply recomputing the FBA optimization at every time-step. This can provide better solution accuracy and potentially larger time-steps, but may also require more than one FBA optimization at each time-step. For instance, the Runge-Kutta fourth order method [42] requires four FBA solutions at each time step. Direct methods are implemented in the COBRA toolbox [17] and are the central algorithm in many modern tools, including those of Zhuang et al. [31, 35], Harcombe et al. [36], Zomorrodi et al. [24], Louca and Doebeli [37], and Popp and Centler [38]. Notably, any direct method requires at least one complete recalculation of the network fluxes at each time-step.
However, resolving the system at each time step is not necessary, as the solution the optimization problem at some initial time can actually be used to compute future optimal solutions. Höffner et al., [40], used this observation to introduce a variable step-size method for dynamic FBA. In that method a basic index set is chosen by adding biological constraints to the optimization problem hierarchically until a unique optimal flux vector is found. The challenge of such an approach is in choosing the basis for the optimal solution, as the optimal basis is not guaranteed to be unique even for a unique optimal flux solution. In fact, due to the nature of the method of Höffner et al. and of our method, any optimization past the initial solution that must be carried out is guaranteed to have a solution with a non-unique basis. Furthermore, many choices of optimal basis will not provide a solution for future time-steps, so that choosing among these bases must be done intelligently. Unfortunately, Höffner et al. [40] do not provide a method for choosing among non-unique bases for a single linear program solution.
Our method seeks to solve this problem by choosing a basis which is most likely to remain optimal as simulation proceeds forward from the possibilities provided by an FBA solution. We therefore prioritize reducing the number of times the linear program must be solved, choosing our basis based on the mathematical properties of the system which gives the best chance of providing a solution at future time-steps.
Additionally, a method described as the “dynamic optimization approach” was introduced in Mahadevan et al., [29], however this method is computationally expensive. In particular, the method given in [29] involves optimizing over the entire time-course simulated, and so is formulated as a non-linear program which only needs to be solved once. While this method requires only one optimization, this optimization is itself prohibitively difficult due to the dimensionality of the problem growing with the fineness of time-discretization.
The dynamic FBA model for communities
We can write a metabolite mediated model for the population dynamics of a community of organisms x = (x1, …, xp) on a medium composed of nutrients y = (y1, …, ym):
(2) |
(3) |
where ψi is a vector of the fluxes of nutrient exchange reactions for organism xi as determined by FBA. Using FBA to determine ψi is therefore a quasi-steady state assumption on the internal metabolism of the organisms xi [43–45].
Recall that the basic assumption of flux balance analysis is that, given a matrix Γi that gives the stoichiometry of the network of reactions in a cell of organism xi that growth gi(y) is the maximum determined by solving the following linear program [18]:
(4) |
where is some vector of lower flux bounds while is some vector-valued function of the available metabolites which represents upper flux bounds. The key observation allowing dynamic FBA is that the optimal solution to this problem also determines ψi simply by taking ψij to be the value of the flux vij of the appropriate metabolite exchange reaction. For clarity, we will relabel the elements of vi so that ψik = vij if vij is the kth exchange flux, and ϕik = vij if vij is the kth internal flux. The objective vector γi indicates which reactions within the cell contribute directly to cellular biomass, and so is non-zero only in elements corresponding to internal fluxes. We can therefore rewrite this vector to include only elements corresponding to internal fluxes, so that the objective of the optimization is to maximize γi ⋅ ϕi.
The stoichiometry of metabolite exchange reactions is represented by standard basis vectors [18]. Therefore, we can partition Γi as
(5) |
where I is the identity matrix of appropriate size, and and contain the stoichiometry of the internal reactions [18, 46, 47]. Making this change in notation allows us to see that the optimization problem of flux balance analysis is essentially internal to the cell, with external reactions providing constraints.
We can see from Eq (5) that ker(Γi) is isomorphic to , and so we can maximize over this kernel. Then, the exchange reaction fluxes are determined by the internal fluxes according to the linear mapping . The maximization of FBA becomes a maximization problem over the internal fluxes. We rewrite Eq (4) using Eq (5) and combine with Eqs (2) and (3) to form the differential algebraic system
(6) |
(7) |
(8) |
where each ϕi is determined by the optimization Eq (8), all carried out separately. Note that this is a metabolite mediated model of community growth as defined in [15]. That is, the coupling of the growth of the separate microbes is due to the shared pool of metabolites y. Each separate optimization which determines ϕi at a single time-step depends on y, and each ϕi determines some change in y. Furthermore, each optimization is carried out in a manner that depends only the status of the metabolite pool and is independent from the optimizations of other organisms. There is therefore no shared “community objective”. Instead, each organism optimizes according to only its own internal objective.
We write, for full generality, upper and lower dynamic bounds on internal and exchange reactions, and assume that each function cij(y) ∈ C∞. We let
(9) |
so that we can rewrite the optimization problem Eq (8) as
(10) |
for ease of notation.
We now hope to select a basic index set for Eq (10) for each organism xi so that each ϕi(t) is a solution to the resulting linear system of equations.
Methods
Linear optimization preliminaries
In this manuscript, we will rewrite the FBA optimization problem in the form
(11) |
where the matrices A and Γ† are derived from the stoichiometric matrix and flux constraints. Such a problem is often referred to as a linear program (LP). We now recall some well known results from the study of linear programming (see, for example [40, 48]).
First, we note that Eq (11) can be rewritten in the so-called standard form with the addition of slack variables s = (s1, …, sn) which represent the distance each of the n constraints is from its bound as follows:
(12) |
Standard form requires that we rewrite and then define so that we require non-negativity of each variable, and the matrix , B = −A. We rewrite the problem in this form to make use of established results, and for ease of notation will write ϕ instead of when it is clear which form of the problem we are discussing.
We will make use of the well-known result that there exists an optimal basis or basic set for a bounded linear program [49]. To state this result, we first define the notation to be the matrix with columns of corresponding to some index set , and if is invertible we define the notation so that
(13) |
for any . We may now define a basic optimal solution and corresponding basic index set.
Definition 1 A basic optimal solution to a linear program is an optimal solution along with some index set such that , where c is the vector of constraints as in Eq (12). The variables are referred to as basic variables, and the index set is referred to as the basic index set.
Finally, if there exists a bounded, optimal solution to Eq (12), then there exists a basic optimal solution and corresponding basic index set.
For a given basic optimal solution vector w, there may be more than one basic index set such that . Such a solution is called degenerate. Clearly a necessary condition for such non-uniqueness is that there exists some such that wk = 0. This is also a sufficient condition as long as there is some column of which is not in the column space of .
Forward simulation without re-solving
Consider again Eq (10), the linear program that must be solved at each time point of the dynamical system for each microbial population. Information from prior solutions can inform future time-steps as long as the region of feasible solutions has not qualitatively changed. Thus, we may only need to solve the optimization problem a few times over the course of a simulation. The key observation making this possible is that the simplex method of solving a linear program provides an optimal basis for the solution. We may often re-use this basis for future time-steps within some time interval, and therefore find optimal solutions without re-solving the linear program.
In order to do this, we need to find a form of the solution which may be evolved in time. Thus, we turn the system of linear inequalities given in the linear program into a system of linear equations. Then, if this system has a unique solution we have reduced the task to solving a system of equations rather than optimizing over a system of inequalities. We can find such a system of equations by solving the linear program once, and using this solution to create a system of equations whose solution provides the optimal flux ϕi using a basic index set. We then use this same system to simulate forward without the need to re-solve the optimization problem until the solution to the system of equations is no longer a feasible solution to the linear program.
First, the linear program Eq (10) is transformed into standard form (Eq (12)). Then, a basic optimal solution is found with corresponding basic index set . The dynamical system Eqs (6), (7) and (10) can then be evolved in time using Eq (13). This evolution is accurate until some wij becomes negative (meaning that the solution is no longer a feasible solution to the linear program). At this point, a new basis must be chosen. That is, until becomes infeasible, we let and replace Eqs (6), (7) and (10) with
(14) |
(15) |
One major difficulty in this technique is that a unique wi does not guarantee a unique basis set . If we have some for , then there exists some alternate set such that . Such a solution is called degenerate. In a static implementation of a linear program, the choice of basis of a degenerate solution is not important, as one is interested in the optimal vector and optimal value. However, as we will demonstrate with 1, the choice of basis of a degenerate solution is important in a dynamic problem. In fact, if the system given in Eqs (14) and (15) is evolved forward until becomes infeasible, the time at which the system becomes infeasible is the time at which we have some for . Thus, we need to resolve Eq (10) whenever becomes degenerate, which will be the final time-point at which the is feasible.
Example 1 Consider the dynamic linear program
(16) |
In standard form at t = 0, this linear program becomes
(17) |
which has the unique solution w = (10, 10, 0, 0, 0). There are three choices of basic index sets: , , and . The resulting bases are
Computing Eq (13) at t > 0 for each, we have that yields , yields , and yields , shown in Fig 1 for t > 0. Thus, only solves the dynamic problem because is not optimal and is not feasible for t > 0. We may follow and be insured of remaining at an optimal solution to the linear program until t = 20 + ε, at which point , which is not a feasible solution to the linear program. At time t = 20, a re-optimization is required to choose a new basis.
Notice that the correct choice of basis fundamentally depends on the time-varying bound function c(t) = (10, 10, 30 − t). To see this, consider other possible time-varying bounds c(t) which have c(0) = (10, 10, 30). For example, if c(t) = (10 − t, 10 − t, 30), then only would give the correct w(c(t)) for t > 0.
A basis for the flux vector
We now provide a method to choose a basis for each organism xi in the case of a degenerate solution. Consider an optimal solution wi to the linear program Eq (12). To simulate forward according to Eqs (14) and (15), we need for each organism xi a basic index set such that
(18) |
so that the solution remains feasible, and furthermore that is optimal over the possible choice of basic index sets for wi. This is obviously a necessary condition for forward simulation within some non-empty time interval, and can be made sufficient (although no longer necessary) by making the inequality strict. We use the relaxed condition for more practical applicability.
In order to develop a method based on the above observation (i.e., Eq (18)), we must know that Eq (12) has such a solution. We therefore require the following lemma, which is proved in S1 Text:
Lemma 1 For a linear program with the form given in Eq (12) with a basic optimal solution w, there exists a basic index set such that Eq (18) holds and is optimal over the possible choice of basic index sets for w.
If Eq (12) has only a non-degenerate solution, the unique basis will satisfy this requirement. The challenge remains to choose from among the possible bases of a degenerate solution.
To do this, we form a second linear program analogous to Eq (18) in the following way. We first find all constraints aij (i.e. rows of Ai or ) such that aij ⋅ ϕi = cij(t), calling this set . Note that this set contains all the rows of , for which we regard cij(t) = 0 for all t > 0. Note that if the solution given is a basic optimal solution, the rank of the matrix whose rows are aij for is d, where again d is the number of internal fluxes. This is true because we include constraints of the type a < ϕij < b as rows of Ai.
Then, we solve the linear program
(19) |
We may then use any basis which solves Eq (19) as long as it has exactly d non-basic slack variables. Lemma 1 tells us that such a choice exists, although it may be necessary to manually pivot non-slack variables into the basis set given by the numerical solver. In testing the algorithm, this was necessary when using IBM ILOG CPLEX Optimization Studio to solve, but not when using The Gurobi Optimizer. Note that we do not need the entire basis , but instead only need the d × d submatrix formed by rows of Ai or which correspond to non-basic slack variables in the solution to Eq (19). These appear as rows (ai, 0) in , and so this sub-matrix uniquely determines ϕi. We call this smaller matrix Bi, and label the set of row indices as .
The chosen basis and corresponding constraints are used to simulate forward until that particular solution becomes infeasible. At that time, we have an optimal solution to Eq (10) simply by continuity. We therefore do not need to resolve Eq (10) but instead re-form and solve Eq (19).
Pseudo-Code of the method
Below, we present as pseudo-code an outline of the method. A practical implication may need to adaptively adjust the time-step Δt to insure that no resource is artificially over-depleted past 0.
Algorithm 1: Dynamic FBA algorithm following Lemma 1. Note that for numerical stability and speed, we may store the matrices Qi, Ri such that QiRi = Bi is the QR-factorization of Bi rather than either storing or solving completely during each time step of numerical integration.
Input: Final time T, initial microbial biomasses xi(0), initial nutrient concentrations yj(0), maximum inflow rates of nutrients αi, stoichiometric matrices Γi
OutPut: Timecourse simulation of biomass and nutrient concentrations
1 for each microbial population i do
2 Set wi(0) to be solution to Eq (13) which lies on a vertex of the feasible polytope.;
3 Solve Eq (21) to find initial basis Bi
4 end
5 while t < T do
6 Integrate Eqs (14) and (15) from t to t + Δt with ;
7 if is not a feasible solution then
8 reset xi = xi(t), yj = yj(t);
9 Solve Eq (21) to find new basis Bi, with additional constraints representing bounds violated by .
10 end
11 end
Results
Number of optimizations
We can compare the efficiency of Algorithm 1 with modern dynamic FBA methods by counting the number of times a large linear program must be carried out over the course of a simulation. At their core, state-of-art dynamic FBA tools such as d-OptCom [24] and COMETS [36] employ the direct method of calling an ODE-solving method with the linear program set as the right-hand-side. In the case of Euler’s method, the resulting ODE can be integrated by hand between time-steps. This last strategy is often referred to as the “static optimization approach” [40].
We compared simulation of various combinations of the organisms Escherichia coli str. K-12 substr. MG1655 (model iJR904), Saccharomyces cerevisiae S288C (model iND705), Pseudomonas putida KT2440 (model iJN746) and Mycobacterium tuberculosis H37Rv (model iEK1008), using models from the BiGG database [28] (see S2 Table for details). We counted the optimizations required for our model, as well as for direct methods using the numerical ODE solvers vode, zvode, lsoda, dopri5, and dop853 from the SciPy library. All of these numerical ODE solvers use adaptive step sizes for accuracy and stability, and so represent optimized choices of time-steps. Additionally, we compared the method of Höffner et al. as implemented in the MatLab package DFBAlab [39]. The number of re-optimizations required for each simulation is shown in Table 1 and the time-point of each re-optimization that was carried out is shown in Fig 2.
Table 1. Number of realizations required to simulate to time t = 5 with no cell death or metabolite flow, using M9 minimal medium.
Model Combination | Solution Method | ||||||
---|---|---|---|---|---|---|---|
Algorithm 1 | Höffner | vode | zvode | lsoda | dopri5 | dop853 | |
iJR904 | 7 | 1 | 62 | 62 | 116 | 3313 | 6228 |
iND750 | 4 | 1 | 91 | 91 | 85 | 3508 | 6514 |
iJN746 | 4 | 13 | 166 | 167 | 376 | 1176 | 2249 |
iEK1008 | 4 | 4 | 120 | 120 | 208 | 2768 | 5148 |
iJR904 + iND750 | 4 | 24 | 240 | 211 | 346 | 5586 | 10469 |
iJR904 + iJN746 | 30 | 479 | 420 | 420 | 744 | 2695 | 5579 |
iJR904 + iEK1008 | 20 | 136 | 216 | 216 | 454 | 3385 | 6411 |
iND750 + iEK1008 | 8 | 32 | 311 | 311 | 509 | 5284 | 9888 |
iJR904 + iND750 + iEK1008 | 18 | 32* | 451 | 451 | 1282 | 6225 | 11961 |
iJR904 + iND750 + iJN746 + iEK1008 | 56 | 672 | 1122 | 1122 | 2242 | 6837 | 13529 |
For our method and the direct method, we allowed exchange of every metabolite detailed in S1 Table with initial metabolite concentrations given by that same file, and with initial biomass of 0.3 for each species. The file sim_comm.py in the supplementary repository https://github.com/jdbrunner/surfin_fba contains complete simulation set-up.
To compare with the method of Höffner et al. [40], we use the newly available Python package from the research group of Dr. David Tourigny titled dynamic-fba [50] for single organisms. This package allows simulation without secondary optimizations, as our does, and so is more similar to our prototype tool for comparison. Unfortunately, this package is currently only able to simulate single organisms at the time of publishing. For microbial communities, we can compare with the MatLab package DFBAlab [39] which requires all dynamics variables to be optimized in a secondary optimization. For simulations with DFBAlab, we use only the low-concentration metabolites D-glucose, oxygen, and cob(I)alamin from the M9 medium detailed in S1 Table as dynamically varying metabolites. It is worth noting that these are the most favorable conditions we could find for the method of Höffner [39, 40] et al. which are still biologically equivalent to our other simulations.
Error estimation
Our method provides much less theoretical error in dynamic FBA solutions than traditional methods. In fact, Algorithm 1 implies that a simulation of a microbial community can be divided into time intervals on which the algorithm is exact. Of course, this assumes that the linear ODE solved in these intervals is solved exactly rather than numerically.
Precisely, there exits some sequence t0 = 0 < t1 < ⋯ < tn−1 < tn = T such that if we know the optimal flux vectors wi(tl) at time tl, then Lemma 1 implies the existence of a set of invertible matrices such that solutions to Eqs (14) and (15) are solutions to Eqs (6), (7) and (10) for t ∈ [tl, tl+1]. Therefore, if we are able to identify the tl exactly, then Algorithm 1 provides exact solutions to the dynamic FBA problem Eqs (6), (7) and (10). Of course, numerical limitations imply that we will not re-optimize precisely at each tl, and so we must investigate the impact of this error. However, once re-optimization is done, the method is again exact. The result is that we have no local truncation error for any time step taken between re-optimization after tl and the interval endpoint tl+1, except for error due to numerical integration. In comparison, direct methods provide some integration error at every time step. This error depends on the integration strategy used, and so for example the Euler’s method based static optimization approach carries first order local truncation error at each time step. This can easily lead to ODE overshoot and infeasible linear programs at future time-steps.
Assume that tl−1 is known exactly, and N is such that
so that there is some possible error in the interval [t1, t2]. We can estimate the accumulated error in this time interval using a power series expansion. Let x(t), y(t) be solutions to Eqs (6), (7) and (10) and be solutions given by Algorithm 1 for t ∈ [t1, t2). Furthermore, let be the invertible matrices derived by solving Eq (10) at tl−1 and those derived by solving at tl. Then, and . For each xi we expand, assuming some regularity of the functions c(y),
(20) |
and see that this method gives first order local error in time steps that require a re-optimization.
The local error, while first order, only appears at time steps in which a re-optimization occurred, and so global error will scale with the number of necessary re-optimizations. This is in contrast with the classical use of Euler’s method, which gives first order local error at every time-step, or any other direct ODE method, whose error is dependent on the solver used.
We may compare the solutions provided by direct methods with those provided by the method presented in Algorithm 1 and by the method of Höffner et al. [40]. The root-sum-square (l2) difference in results are shown in Table 2, and example simulations are shown in Fig 3. As we argue above, direct methods are less accurate in theory that the algorithm presented in Algorithm 1. Furthermore, direct simulations routinely failed to simulate to time t = 5 without encountering an infeasible linear program. This infeasibility is the result of numerical error accumulating throughout the simulation. The comparisons in Table 2 can be summarized by three distinct characteristics. First, in the case of S.cerevisiae, the direct methods agree well with the newly presented method. Secondly, in the case of E.coli and M.tuberculosis, error seems to begin accumulating immediately. Finally, in the case of P.putida, the simulations agree well up to some time-point at which the direct method fails and either quits entirely (as in the case of the dopri5 solver which returns small error) or continues at a constant value.
Table 2. l2 difference in solutions to single-organism simulations between direct methods and the method presented in Algorithm 1.
vode | zvode | lsoda | dopri5 | dop853 | Hoffner et al. | |
---|---|---|---|---|---|---|
E.coli | 5.09933 | 5.09933 | 4.61467 | 5.09928 | 5.09928 | 4.68578 |
M.tuberculosis | 1.45401 | 1.45401 | 1.45417 | 1.45415 | 1.45415 | 2.48691 |
S.cerevisiae | 0.00426 | 0.00426 | 0.00430 | 0.00429 | 0.00429 | 3.06105 |
P.putida | 15.29177 | 15.29177 | 0.07080 | 15.23826 | 15.26221 | 4.78751 |
We note that discrepancies in dynamic FBA simulation may not always be due to numerical error, but instead due to non-uniqueness in optimal flux solutions. Our method provides a strategy for choosing between non-unique representations (in the form of a basis) of a single optimal flux solution. The method of Höffner et al. [40] provides a lexicographic strategy for choosing between non-unique optimal flux solutions based on biological, rather than mathematical, considerations. We note that for complete reproducibility, our method should be integrated with some biologically based strategy for choosing between non-unique optima.
Examples & applications
There has been a recent surge in interest in modeling microbial communities using genome-scale metabolic models, much of which has focused on equilibrium methods [4, 21, 22, 26, 51]. In order to capture transient behavior and dynamic responses to stimuli, dynamic FBA has also been applied to microbial communities [24, 34, 52]. However, community dynamic FBA invariably leads to a large dynamical system with a high-dimensional parameter space, often with little to no knowledge of parameter values. Any parameter fitting therefore requires repeated numerical simulation of the system. Existing tools to do this are built around a direct simulation approach, requiring many linear program solutions. By drastically reducing the number of optimizations required for numerical simulation, our approach offers the promise of efficient numerical simulation of dynamic FBA which will make parameter fitting more tractable, and may even allow conclusions without well-fit parameters.
Below, we demonstrate that the problem of parameter fitting is an important one by showing that experimental outcome in even small communities is sensitive to changes in kinetic parameters. Precisely, the kinetic parameters governing the uptake rate of nutrients (i.e., the parameters of the functions in Eq (4)) have a profound effect on species competition.
Next, we show how repeated simulation with randomly sampled parameters can provide some insight into community structure even without a well-fit set of nutrient uptake parameters. These examples demonstrate the importance of efficient dynamic FBA to microbial community modeling.
Prediction dependence on nutrient uptake
The set of unknown functions in Eq (4) present a profound problem for dynamic FBA simulation. If the behavior of the system is sensitive to the functions chosen and parameters of those functions, a single simulation will be of little use in drawing biological conclusion. In order to demonstrate that such a sensitivity exists, we repeatedly simulated the same simple community with different randomly drawn parameters. While a more realistic choice of function may be saturating or sigmoidal (as with Hill or Michaelis-Menten kinetics), for the following experiment we take these functions to be linear:
(21) |
meaning that the maximum uptake rate of nutrient yj by organism xi is proportional to the concentration of yj. This choice minimizes the number of parameters that must be chosen for our analysis of parameter sensitivity, and is in line with an assumption of simple mass action kinetics [53, 54].
The choice of κij may have a profound effect on the outcome of a community simulation, as it represents how well an organism can sequester a resource when this will optimize the organism’s growth. In order study this effect in a small community, we sampled a three-species community model with κij ∈ (0, 1) chosen uniformly at random. We used models for E.coli, S.cerevisiae and M.tuberculosis downloaded from the BiGG model database [28].
We simulated with no dilution of metabolites or microbes, and no replenishment of nutrients. In every simulation, some critical metabolite was eventually depleted and the organisms stopped growing. We recorded the simulated final biomass of each organism from each simulation, and the results are shown in Fig 4.
Community growth effects
As we saw in previous section, community growth outcomes depend on the choice of nutrient uptake rates κij. Using Algorithm 1, we can perform Monte-Carlo sampling in order to understand the possible effects on some microorganism of growing in some community. To do this, we randomly sample the set of uptake rates κij and run simulations of various communities for the chosen uptake rates. Then, the correlation between communities of final simulated biomass of some organism can be interpreted as the effect of the community on the growth of that organism. A correlation less than 1 between growth of an organism in different communities indicates that the community is having some effect. To see the direction of this effect, we can fit a simple linear regression model (best fit line) to the final simulated biomasses. Then, the slope of this line tells us if the organism benefits or is harmed by being in one community over another.
We again simulated E.coli, S.cerevisiae and M.tuberculosis downloaded from the BiGG model database [28]. Simulations were run with the M9 medium described in S1 Table, with no replenishment of resources.
Each organism grew to a larger final simulated biomass when alone compared to when in a trio with the other two, which is unsurprising given the finite resources. This difference was the least pronounced for S.cerevisiae, suggesting that this organism is the least negatively effected by the competition. However, this can be seen as only a preliminary observation without better estimates of uptake parameters. Best-fit lines are shown in Fig 5. Efficient dynamic FBA allows repeated simulation with randomly sampled parameters, which gives an indication of likely behavior even without accurate parameter fitting.
Conclusion
Understanding, predicting, and manipulating the make-up of microbial communities requires understanding a complex dynamic process. Genome-scale metabolic models provide an approximation to this process through the quasi-steady state assumption which leads to dynamic flux balance analysis. However, this system is large and hard to simulate numerically, let alone analyze for qualitative behaviors. As a first step towards a thorough analysis of community of organisms modeled with dynamic FBA, an efficient method of numerical simulation would provide an essential tool. However, modern tools for simulating dynamic FBA rely on repeatedly solving an optimization problem at every time step [24, 31, 35–38].
Dynamic FBA simulation can be improved by considering the structure of these linear programs so that many fewer optimizations are required. As of now, the algorithm of Höffner et al. [40] is the only published method which takes advantage of this observation. However, that method does not account for the degeneracy of solutions to the relevant linear programs, meaning that it can choose a solution that cannot be carried forward in time. We present a method that chooses a basis for forward simulation. In contrast to the method of Höffner et al., we choose this basis in such a way that increases the likelihood that this forward simulation is actually possible.
Efficient dynamic FBA will allow better parameter fitting to time-longitudinal data. Furthermore, it allows for a search of parameter space which can help predict likely model outcomes or learn maps from parameter values to model outcomes.
Supporting information
Data Availability
The source code is available at https://github.com/jdbrunner/surfin_fba.
Funding Statement
This work was supported by funding from the DeWitt and Curtiss Family Foundation (JDB), National Cancer Institute grant R01 CA179243 (NC), and the Center for Individualized Medicine, Mayo Clinic (NC and JDB). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Braundmeier AG, Lenz KM, Inman KS, Chia N, Jeraldo P, Walther-António MRS, et al. Individualized medicine and the microbiome in reproductive tract. Frontiers in Physiology. 2015;6:97 10.3389/fphys.2015.00097 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2. Calcinotto A, Brevi A, Chesi M, Ferrarese R, Perez LG, Grioni M, et al. Microbiota-driven interleukin-17-producing cells and eosinophils synergize to accelerate multiple myeloma progression. Nature communications. 2018;9(1):4832 10.1038/s41467-018-07305-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. Flemer B, Lynch DB, Brown JM, Jeffery IB, Ryan FJ, Claesson MJ, et al. Tumour-associated and non-tumour-associated microbiota in colorectal cancer. Gut. 2017;66(4):633–643. 10.1136/gutjnl-2015-309595 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4. Hale VL, Jeraldo P, Chen J, Mundy M, Yao J, Priya S, et al. Distinct microbes, metabolites, and ecologies define the microbiome in deficient and proficient mismatch repair colorectal cancers. Genome Medicine. 2018;10(1):78 10.1186/s13073-018-0586-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5. Ng KM, Ferreyra JA, Higginbottom SK, Lynch JB, Kashyap PC, Gopinath S, et al. Microbiota-liberated host sugars facilitate post-antibiotic expansion of enteric pathogens. Nature. 2013;502:96 EP–. 10.1038/nature12503 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Round JL, Mazmanian SK. The gut microbiota shapes intestinal immune responses during health and disease. Nature Reviews Immunology. 2009;9:313 EP–. 10.1038/nri2515 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Walsh DM, Mert I, Chen J, Hou X, Weroha SJ, Chia N, et al. The Role of Microbiota in Human Reproductive Tract Cancers. In: American Journal of Physical Anthropology. vol. 168. Wiley 111 River St, Hoboken 07030-5774, NJ USA; 2019. p. 260–261.
- 8. Fisher CK, Mehta P. Identifying keystone species in the human gut microbiome from metagenomic timeseries using sparse linear regression. PloS one. 2014;9(7):e102451 10.1371/journal.pone.0102451 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Friedman J, Higgins LM, Gore J. Community structure follows simple assembly rules in microbial microcosms. Nature Ecology & Evolution. 2017;1:0109 EP–. 10.1038/s41559-017-0109 [DOI] [PubMed] [Google Scholar]
- 10. Goyal A, Maslov S. Diversity, Stability, and Reproducibility in Stochastically Assembled Microbial Ecosystems. Phys Rev Lett. 2018;120:158102 10.1103/PhysRevLett.120.158102 [DOI] [PubMed] [Google Scholar]
- 11. Stein RR, Bucci V, Toussaint NC, Buffie CG, Rätsch G, Pamer EG, et al. Ecological modeling from time-series inference: insight into dynamics and stability of intestinal microbiota. PLoS computational biology. 2013;9(12):e1003388 10.1371/journal.pcbi.1003388 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. Sung J, Kim S, Cabatbat JJT, Jang S, Jin YS, Jung GY, et al. Global metabolic interaction network of the human gut microbiota for context-specific community-scale analysis. Nature communications. 2017;8:15393; 15393–15393. 10.1038/ncomms15393 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. Niehaus L, Boland I, Liu M, Chen K, Fu D, Henckel C, et al. Microbial coexistence through chemical-mediated interactions. bioRxiv. 2018. 10.1038/s41467-019-10062-x [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14. Posfai A, Taillefumier T, Wingreen NS. Metabolic Trade-Offs Promote Diversity in a Model Ecosystem. Phys Rev Lett. 2017;118:028103 10.1103/PhysRevLett.118.028103 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15. Brunner JD, Chia N. Metabolite-mediated modelling of microbial community dynamics captures emergent behaviour more effectively than species–species modelling. Journal of the Royal Society Interface. 2019;16(159):20190423 10.1098/rsif.2019.0423 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16. Momeni B, Xie L, Shou W. Lotka-Volterra pairwise modeling fails to capture diverse pairwise microbial interactions. Elife. 2017;6:e25051 10.7554/eLife.25051 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17. Heirendt L, Arreckx S, Pfau T, Mendoza S, Richelle A, Heinken A, et al. Creation and analysis of biochemical constraint-based models: the COBRA toolbox v3. 0. arXiv. arXiv preprint arXiv:171004038. 2017;. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18. Lewis NE, Nagarajan H, Palsson BO. Constraining the metabolic genotype–phenotype relationship using a phylogeny of in silico methods. Nature Reviews Microbiology. 2012;10:291 EP–. 10.1038/nrmicro2737 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19. Lloyd CJ, Ebrahim A, Yang L, King ZA, Catoiu E, O’Brien EJ, et al. COBRAme: A computational framework for genome-scale models of metabolism and gene expression. PLOS Computational Biology. 2018;14(7):1–14. 10.1371/journal.pcbi.1006302 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20. Chan SHJ, Simons MN, Maranas CD. SteadyCom: Predicting microbial abundances while ensuring community stability. PLOS Computational Biology. 2017;13(5):1–25. 10.1371/journal.pcbi.1005539 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21. Diener C, Resendis-Antonio O. Micom: metagenome-scale modeling to infer metabolic interactions in the microbiota. bioRxiv. 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22. Gottstein W, Olivier BG, Bruggeman FJ, Teusink B. Constraint-based stoichiometric modelling from single organisms to microbial communities. Journal of the Royal Society Interface. 2016;13(124):20160627 10.1098/rsif.2016.0627 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23. Mendes-Soares H, Mundy M, Soares LM, Chia N. MMinte: an application for predicting metabolic interactions among the microbial species in a community. BMC Bioinformatics. 2016;17(1):343 10.1186/s12859-016-1230-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24. Zomorrodi AR, Islam MM, Maranas CD. d-OptCom: Dynamic Multi-level and Multi-objective Metabolic Modeling of Microbial Communities. ACS Synthetic Biology. 2014;3(4):247–257. 10.1021/sb4001307 [DOI] [PubMed] [Google Scholar]
- 25. Borer B, Ataman M, Hatzimanikatis V, Or D. Modeling metabolic networks of individual bacterial agents in heterogeneous and dynamic soil habitats (IndiMeSH). PLoS computational biology. 2019;15(6). 10.1371/journal.pcbi.1007127 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26. Koch S, Kohrs F, Lahmann P, Bissinger T, Wendschuh S, Benndorf D, et al. RedCom: A strategy for reduced metabolic modeling of complex microbial communities and its application for analyzing experimental datasets from anaerobic digestion. PLoS computational biology. 2019;15(2):e1006759 10.1371/journal.pcbi.1006759 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27. Wattam AR, Davis JJ, Assaf R, Boisvert S, Brettin T, Bun C, et al. Improvements to PATRIC, the all-bacterial bioinformatics database and analysis resource center. Nucleic acids research. 2017;45(D1):D535–D542. 10.1093/nar/gkw1017 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28. King ZA, Lu J, Dräger A, Miller P, Federowicz S, Lerman JA, et al. BiGG Models: A platform for integrating, standardizing and sharing genome-scale models. Nucleic acids research. 2016;44(D1):D515–D522. 10.1093/nar/gkv1049 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29. Mahadevan R, Edwards JS, Doyle FJ. Dynamic Flux Balance Analysis of Diauxic Growth in Escherichia coli. Biophysical Journal. 2002;83(3):1331—1340. 10.1016/S0006-3495(02)73903-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30. Varma A, Palsson BO. Stoichiometric flux balance models quantitatively predict growth and metabolic by-product secretion in wild-type Escherichia coli W3110. Applied and Environmental Microbiology. 1994;60(10):3724–3731. 10.1128/AEM.60.10.3724-3731.1994 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31. Zhuang K, Izallalen M, Mouser P, Richter H, Risso C, Mahadevan R, et al. Genome-scale dynamic modeling of the competition between Rhodoferax and Geobacter in anoxic subsurface environments. The ISME journal. 2011;5(2):305 10.1038/ismej.2010.117 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32. Henson MA, Hanly TJ. Dynamic flux balance analysis for synthetic microbial communities. IET systems biology. 2014;8(5):214–229. 10.1049/iet-syb.2013.0021 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33. Song HS, Cannon WR, Beliaev AS, Konopka A. Mathematical modeling of microbial community dynamics: a methodological review. Processes. 2014;2(4):711–752. 10.3390/pr2040711 [DOI] [Google Scholar]
- 34. Succurro A, Segrè D, Ebenhöh O. Emergent subpopulation behavior uncovered with a community dynamic metabolic model of Escherichia coli diauxic growth. Msystems. 2019;4(1). 10.1128/mSystems.00230-18 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35. Zhuang K, Ma E, Lovley DR, Mahadevan R. The design of long-term effective uranium bioremediation strategy using a community metabolic model. Biotechnology and bioengineering. 2012;109(10):2475–2483. 10.1002/bit.24528 [DOI] [PubMed] [Google Scholar]
- 36. Harcombe WR, Riehl WJ, Dukovski I, Granger BR, Betts A, Lang AH, et al. Metabolic Resource Allocation in Individual Microbes Determines Ecosystem Interactions and Spatial Dynamics. Cell Reports. 2014;7(4):1104—1115. 10.1016/j.celrep.2014.03.070. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37. Louca S, Doebeli M. Calibration and analysis of genome-based models for microbial ecology. Elife. 2015;4:e08208 10.7554/eLife.08208 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38. Popp D, Centler F. μbialSim: constraint-based dynamic simulation of complex microbiomes. BioRxiv. 2019; p. 716126. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39. Gomez JA, Höffner K, Barton PI. DFBAlab: a fast and reliable MATLAB code for dynamic flux balance analysis. BMC bioinformatics. 2014;15(1):409 10.1186/s12859-014-0409-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40. Höffner K, Harwood SM, Barton PI. A reliable simulator for dynamic flux balance analysis. Biotechnology and Bioengineering. 2012;110(3):792–802. [DOI] [PubMed] [Google Scholar]
- 41. Feinberg M, Horn F. Dynamics of open chemical systems and the algebraic structure of the underlying reaction network. Chemical Engineering Science. 1973;29:775–787. [Google Scholar]
- 42. Bradie B. A Friendly Introduction to Numerical Analysis. Pearson Education Inc.; 2006. [Google Scholar]
- 43. Baroukh C, Muñoz-Tamayo R, Steyer JP, Bernard O. DRUM: a new framework for metabolic modeling under non-balanced growth. Application to the carbon metabolism of unicellular microalgae. PloS one. 2014;9(8). 10.1371/journal.pone.0104499 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44. Øyås O, Stelling J. Genome-scale metabolic networks in time and space. Current Opinion in Systems Biology. 2018;8:51–58. 10.1016/j.coisb.2017.12.003 [DOI] [Google Scholar]
- 45. Zazueta CL, Bernard O, Gouzé JL. Reduction of Metabolic Networks keeping Core Dynamics. Discrete Applied Mathematics. 2018;157(10):2483–2493. [Google Scholar]
- 46. Kondo A, Ishii J, Hara KY, Hasunuma T, Matsuda F. Development of microbial cell factories for bio-refinery through synthetic bioengineering. Journal of biotechnology. 2013;163(2):204–216. 10.1016/j.jbiotec.2012.05.021 [DOI] [PubMed] [Google Scholar]
- 47. Bordbar A, Monk JM, King ZA, Palsson BO. Constraint-based models predict metabolic and associated cellular functions. Nature Reviews Genetics. 2014;15(2):107–120. 10.1038/nrg3643 [DOI] [PubMed] [Google Scholar]
- 48. Bertsimas D, Tsitsiklis JN. Introduction to linear optimization. vol. 6 Athena Scientific; Belmont, MA; 1997. [Google Scholar]
- 49. Tardella F. The fundamental theorem of linear programming: extensions and applications. Optimization. 2011;60(1-2):283–301. 10.1080/02331934.2010.506535 [DOI] [Google Scholar]
- 50.Tourigny DS, Muriel JC, Beber ME. dfba: Software for efficient simulation of dynamic flux-balance analysis models in Python; 2020. https://gitlab.com/davidtourigny/dynamic-fba.
- 51. Islam MM, Fernando SC, Saha R. Metabolic modeling elucidates the transactions in the rumen microbiome and the shifts upon virome interactions. Frontiers in microbiology. 2019;10:2412 10.3389/fmicb.2019.02412 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52. Xu X, Zarecki R, Medina S, Ofaim S, Liu X, Chen C, et al. Modeling microbial communities from atrazine contaminated soils promotes the development of biostimulation solutions. The ISME journal. 2019;13(2):494–508. 10.1038/s41396-018-0288-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53. Horn F, Jackson R. General mass action kinetics. Archive for Rational Mechanics and Analysis. 1972;47. [Google Scholar]
- 54.Feinberg M. Lectures on Chemical Reaction Networks; 1979. http://www.crnt.osu.edu/LecturesOnReactionNetworks.
- 55. Monk JM, Charusanti P, Aziz RK, Lerman JA, Premyodhin N, Orth JD, et al. Genome-scale metabolic reconstructions of multiple Escherichia coli strains highlight strain-specific adaptations to nutritional environments. Proceedings of the National Academy of Sciences. 2013;110(50):20338–20343. 10.1073/pnas.1307797110 [DOI] [PMC free article] [PubMed] [Google Scholar]