Abstract
Motivation
Most metabolic pathways contain more reactions than metabolites and therefore have a wide stoichiometric matrix. Such a matrix is compatible with infinitely many flux distributions that reproduce the dynamics of the metabolites in a given dataset equally well. This under-determinedness poses a challenge for the quantitative characterization of flux distributions from time series data and thus for the design of adequate, predictive models. Here we propose a method that reduces the degrees of freedom in a stepwise manner and leads to a dynamic flux distribution that is, in a statistical sense, likely to be close to the true distribution.
Results
We applied the proposed method to the lignin biosynthesis pathway in switchgrass. The system consists of 16 metabolites and 23 enzymatic reactions. It has seven degrees of freedom and therefore admits a large space of dynamic flux distributions that all fit a set of metabolic time series data equally well. The proposed method reduces this space in a systematic and biologically reasonable manner and converges to a likely dynamic flux distribution in just a few iterations. The estimated solution and the true flux distribution, which is known in this case, show excellent agreement and thereby lend support to the method.
Availability and Implementation
The computational model was implemented in MATLAB (version R2014a, The MathWorks, Natick, MA). The source code is available at https://github.gatech.edu/VoitLab/Stepwise-Inference-of-Likely-Dynamic-Flux-Distributions and www.bst.bme.gatech.edu/research.php.
Supplementary information
Supplementary data are available at Bioinformatics online.
1 Introduction
A key step in any computational modeling effort is the identification of an adequate mathematical representation of the phenomenon under study. Only with an appropriate representation of all processes involved in the phenomenon will the model have sufficient predictive capacity with respect to new experiments and data. While the importance of an adequate model choice is quite obvious, most modeling efforts begin by simply assuming mathematical formats for the process representations, even though these are often unproven and may be substantially wrong. For instance, many metabolic systems are modeled with Michaelis-Menten rate laws and their generalizations, even though it is not known to what degree these functions, which were developed for analyses in vitro, are valid in vivo (Savageau, 1992, 1995; Tummler et al., 2014).
A method that attempts to infer unbiased process representations from metabolic time series data is dynamic flux estimation (DFE), which was introduced in this journal a few years ago (Goel et al., 2008). DFE consists of two phases: a model-free and a model-based estimation. In phase 1, the dynamic flux profiles are estimated. This occurs through smoothing the time series of metabolite concentrations, for which numerous techniques are available (e.g. Dolatshahi et al., 2014; Eilers, 2003; Seatzu, 2000; Vilela et al., 2007; Whittaker, 1923). The slopes at many time points are then substituted for the time derivatives on the left-hand sides of the ordinary differential equations (ODEs) of the model, so that the system of ODEs is replaced, at each time point, by a system of algebraic equations that are linear in the fluxes (see below). If the stoichiometric matrix of the system is square and has full rank, these equations can be solved directly with methods of linear algebra. The result is a collection of flux profiles, which can be plotted against the appropriate variables and reveal the shape of each flux, although not its mathematical format. In phase 2, functional formats of the fluxes are chosen based on their shapes, and parameterization yields a fully characterized kinetic model. By separating flux profile estimation from the parameterization of the flux profiles, the method minimizes compensation errors, which are commonplace in simultaneous parameterizations of the entire system (Goel et al., 2008; Voit, 2011, 2012).
The drawback of DFE in phase 1 is that most stoichiometric matrices are ‘wide’, because the number of fluxes exceeds the number of metabolites in most pathways. Thus, the system is underdetermined, and linear algebra tells us that infinitely many solutions exist for such a system. As a consequence, a given time series of metabolite concentration data is theoretically consistent with infinitely many different dynamic flux distributions. Expressed differently, a wrong model may result not only from invalid biological assumptions or omissions of important features, but also from the fact that numerous equivalent models may represent the dynamics of this dataset with the same, often very reasonable, accuracy. However, these alternative models may diverge substantially for other datasets. This divergence implies that a model may be representative of the system within a limited vicinity of the provided dataset, but may easily fail to explain the pathway system in an expanded space.
Several approaches have been proposed to address this stoichiometric under-determinedness, ranging from ad hoc strategies to generic mathematical methods. Biologically, the most straightforward solution is the use of additional, independent information regarding a flux, which may come from knowledge of its kinetics. Also intuitive is the merging of fluxes where, for instance, a pair of forward and reverse reactions is replaced with a single net reaction, which reduces the degrees of freedom by one. Along the same lines, a cycle of metabolites may be condensed into a single pool (Voit, 2013). From a mathematical point of view, one could try to address underdetermined systems with pseudoinverse techniques (Albert, 1972; Moore, 1920; Penrose, 1955). However, the Moore-Penrose pseudoinverse typically leads to negative fluxes, which are often biologically infeasible (Dolatshahi and Voit, 2016). Because we assume that the directionality of all fluxes is known, we can filter out solutions that include negative fluxes, which we consider infeasible. The generic and interesting problem is thus not that there is no solution, but that there are too many solutions. Furthermore, if the problem is set up correctly, the true solution should be among them. Thus, appropriate constraints allow us to eliminate wrong solutions and to restrict the space of feasible solutions, which should still contain the true solution. Computational approaches may introduce upper and lower bounds for some or all of the fluxes, which constrain the feasible solution space. One could also limit the space of potential flux profiles by eliminating oscillatory solutions that might not be consistent with the biological system. A different strategy for systems at steady state is the optimization of the solution within the feasible space.
As a prominent example, stoichiometric and flux balance analyses (FBA) often define maximal growth as an objective function for microbial systems and determine the flux distribution that optimizes this objective function (Gavalas, 1968; Heinrich and Schuster, 1996; Palsson, 2006; Orth et al., 2010). For other systems, the choice of a suitable objective function is sometimes unclear.
In this work, we propose an essentially unbiased method for inferring statistically likely dynamic flux distributions in underdetermined metabolic models. The approach reduces the degrees of freedom of the stoichiometric matrix in a stepwise manner. In favorable situations, the method eliminates all degrees of freedom in a small number of steps and quickly yields a statistically likely distribution of fluxes.
2 Materials and methods
The dynamics of a metabolic pathway system is typically represented by the stoichiometric equation
$$\dot{X} = N\,V \qquad (1)$$
In this representation, the state variables, X, are the metabolites, and $\dot{X}$ is the corresponding vector of rates of change. V is the vector of fluxes, and N is the stoichiometric matrix, which describes the connectivity between the fluxes and the pools of metabolites. We assume that time series measurements of the metabolite concentrations are available at K time points. We expect these data to be relatively sparse in time and to contain moderate noise, which is the typical situation at present. Nevertheless, continuing improvements in experimental techniques make it likely that future time series will be generated in replicates and will be denser and less noisy than is feasible today. Even moderately noisy data contain rich information, and numerous good smoothers have been developed over the past decades. As long as the data represent the time trends adequately, these smoothers can be employed not only to approximate the true trends, but also to estimate the slopes at the K measurement time points. These slopes may be positive or negative, depending on the situation; for instance, a substrate that is being used up shows a decreasing trend. To avoid locally negative slopes that are entirely due to noise in the data, effective modern smoothers offer the option of predetermining a desired balance between data fit and roughness of the resulting trend curve (e.g. Campbell and Steele, 2012; Dolatshahi et al., 2014; Ramsay et al., 2007; Vilela et al., 2007). In our case, it is important to capture the trend rather smoothly. Of course, there will always be cases where the data are so noisy that they do not reflect the true trends. In such cases, this method, like many others, will not yield reliable results, and the data themselves are of little value.
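As a minimal sketch of the smoothing step, assuming SciPy's cubic smoothing spline `UnivariateSpline` as the smoother (only one of many possible choices, and not necessarily the one used in this work), the slopes at the measurement times can be read off the derivative of the fitted spline. The function name `smoothed_slopes` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def smoothed_slopes(t, x, s=None):
    """Fit a cubic smoothing spline to one metabolite time series and return
    the smoothed values and the slopes at the K measurement times t.

    The parameter s sets the balance between data fit and roughness:
    larger values give smoother trend curves."""
    spline = UnivariateSpline(t, x, k=3, s=s)
    return spline(t), spline.derivative()(t)


# Hypothetical substrate that is being used up, x(t) = 2 exp(-0.5 t),
# measured at K = 21 time points with mild noise (standard deviation 0.01).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 21)
x_true = 2.0 * np.exp(-0.5 * t)
x_obs = x_true + rng.normal(scale=0.01, size=t.size)

# Heuristic (an assumption here): s of about K * sigma**2 for noise level sigma.
x_fit, slopes = smoothed_slopes(t, x_obs, s=t.size * 0.01**2)
```

Increasing `s` moves the fit toward a smoother trend at the cost of a looser fit to the data points, which is the trade-off described above.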
Substituting the estimated slopes S(t_k) for the time derivatives at the K time points t_k, the stoichiometric equation (1) becomes
$$S(t_k) = N\,V(t_k), \qquad k = 1, \ldots, K \qquad (2)$$
which, at every time point, is a set of linear algebraic equations in the flux values V, because the left-hand side now contains numerical values (Varah, 1982; Voit and Almeida, 2004; Voit and Savageau, 1982; Voit and Savageau, 1982). Of course, the fluxes are functions of metabolites and thus of time, but at each time point, Eq. (2) is a regular algebraic matrix equation in flux values. If it is possible to solve these equations at every time point, the collective result is a complete set of flux values, for each process and for every time point. Expressed differently, a set of metabolic time series data can be converted into a complete set of fluxes over the measured time horizon, and this conversion is essentially assumption free and unbiased. The fluxes are numerically determined, and their functional formats are unknown. Appropriate formats may be inspired by plots where the values of a flux are plotted against the metabolites that affect this flux, one time point at a time. This procedure was introduced in this journal as Dynamic Flux Estimation (DFE) (Goel et al., 2008) and is also the basis for a novel manner of nonparametric dynamic modeling (Faraji and Voit, 2016).
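Once a flux has been estimated numerically at each time point, a candidate format can be fitted to it. As one hypothetical example (the power-law format is our assumption for illustration, not a format prescribed by DFE), a flux plotted against a single metabolite might be fitted as V ≈ αX^g by linear regression in log-log space. The helper `fit_power_law` and the data are illustrative only.

```python
import numpy as np


def fit_power_law(x, v):
    """Fit v ~ alpha * x**g by least squares on log(v) versus log(x).

    Requires strictly positive metabolite levels x and flux values v."""
    g, log_alpha = np.polyfit(np.log(x), np.log(v), 1)  # slope, intercept
    return np.exp(log_alpha), g


# Hypothetical flux values, one per time point, plotted against the
# metabolite that affects this flux.
x = np.linspace(0.5, 3.0, 12)
v = 1.5 * x**0.7
alpha, g = fit_power_law(x, v)
print(f"alpha = {alpha:.3f}, g = {g:.3f}")  # alpha = 1.500, g = 0.700
```

Other formats, such as Michaelis-Menten or sigmoidal functions, would require nonlinear fitting instead of a log-linear regression.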
DFE works very well if Eq. (2) can be solved uniquely. However, the number of fluxes in metabolic pathway systems is typically greater than the number of metabolites, which causes the system to be underdetermined. In other words, given the slopes at a set of specific time points, the system has infinitely many solutions for the flux vector V at these time points, which all match the data perfectly. This infinite set of flux solutions corresponds to an infinite set of models of the system, which all coincide perfectly on the observed dataset. One could surmise that more data points or time series would solve the problem, but that is not necessarily the case, because the redundancy is a structural feature of the stoichiometric matrix and, thus, of the connectivity of the pathway system. As mentioned earlier, numerous approaches have been proposed to address this issue, including general techniques and ad hoc strategies. All of these approaches require either additional biological information about the system or assumptions that may or may not be justified. Here, we propose a mathematical and computational tool for identifying flux distributions that are, in a statistical sense, most likely.
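The structural redundancy can be made concrete with a hypothetical toy pathway. Its topology is our assumption, chosen only to mirror a system with three metabolites, five fluxes and a known input. The number of degrees of freedom is the number of fluxes minus the rank of N. The sketch below uses NumPy and SciPy to show that the minimum-norm Moore-Penrose solution plus any vector from the null space of N satisfies Eq. (2) equally well at one time point.

```python
import numpy as np
from scipy.linalg import null_space

# Hypothetical toy pathway (assumed topology): input u -> X1; X1 -> X2 (V1),
# X1 -> out (V2); X2 -> X3 (V3), X2 -> out (V4); X3 -> out (V5).
# Rows: X1, X2, X3; columns: V1..V5. The known input u is moved to the left side.
N = np.array([[-1, -1,  0,  0,  0],
              [ 1,  0, -1, -1,  0],
              [ 0,  0,  1,  0, -1]], dtype=float)

dof = N.shape[1] - np.linalg.matrix_rank(N)  # number of fluxes minus rank(N)

u = 1.0
slopes = np.array([0.1, -0.05, 0.02])   # estimated dX/dt at one time point
rhs = slopes - np.array([u, 0.0, 0.0])  # N @ V = dX/dt - input term

V_min_norm = np.linalg.pinv(N) @ rhs    # Moore-Penrose (minimum-norm) solution
Z = null_space(N)                       # orthonormal basis, shape (5, dof)

# Any V_min_norm + Z @ c fits the same slopes exactly, for every c in R^dof.
V_other = V_min_norm + Z @ np.array([0.3, -0.2])
```

The columns of `Z` span the directions along which a flux distribution can move without changing the metabolite slopes. More time points do not remove these directions, because they depend only on N.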
2.1 Split ratios at branch points
Most metabolic pathways contain branch points where a compound is used as the starting substrate for two or more pathways with different end products. Branch points increase the degrees of freedom in Eq. (2), and the amounts of mass entering the diverging pathways are characterized by flux split ratios. Figure 1 depicts a very simple hypothetical pathway with two branch points. Its three metabolites and five fluxes lead to two degrees of freedom. At steady state, each split ratio is defined as the ratio of one efflux of a branch-point metabolite to the total influx into that metabolite. In general, both split ratios can take any real value between 0 and 1. Due to conservation of mass, if one efflux at a two-way branch point receives the fraction s of the influx, the other efflux receives the fraction 1 − s. Analogous considerations hold for more than two pathways diverging from a branch point.
Fig. 1.

A hypothetical pathway with two degrees of freedom. Split ratios determine the percentage of each flux leaving a metabolite pool toward different pathways, compared with the total influx to the pool
At a transient state, the rates of change of the metabolites must be taken into account as well. For the system in Figure 1, the system equations are
| (3) |
Given the known input and the rates of change of the metabolites, we can rewrite Eq. (3) such that the fluxes are functions of the input and the split ratios:
| (4) |
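As a sketch under an assumed topology, which need not match Fig. 1 exactly (input u → X1; X1 → X2 via V1 and X1 → out via V2; X2 → X3 via V3 and X2 → out via V4; X3 → out via V5), the fluxes follow from the input, the slopes and two split ratios in a single forward pass. Here each split ratio is taken as the fraction of a pool's total efflux sent down one branch. At a transient state, that efflux equals the influx minus the rate of change of the pool. The function name is hypothetical.

```python
import numpy as np


def fluxes_from_split_ratios(u, dX, a, b):
    """Fluxes of the hypothetical toy pathway at one time point.

    u  : known input into X1
    dX : estimated slopes (dX1/dt, dX2/dt, dX3/dt)
    a  : split ratio, fraction of the efflux of X1 that goes to X2 (V1)
    b  : split ratio, fraction of the efflux of X2 that goes to X3 (V3)
    """
    dX1, dX2, dX3 = dX
    efflux1 = u - dX1          # V1 + V2, from dX1/dt = u - V1 - V2
    V1, V2 = a * efflux1, (1 - a) * efflux1
    efflux2 = V1 - dX2         # V3 + V4, from dX2/dt = V1 - V3 - V4
    V3, V4 = b * efflux2, (1 - b) * efflux2
    V5 = V3 - dX3              # from dX3/dt = V3 - V5
    return np.array([V1, V2, V3, V4, V5])


V = fluxes_from_split_ratios(1.0, (0.1, -0.05, 0.02), a=0.6, b=0.5)
```

Every choice of `a` and `b` reproduces the same slopes exactly, which is why the split ratios parameterize the two degrees of freedom.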
In general, the numerical values of split ratios are unknown, although coarse information is available for many branch points. For instance, the amount of material branching off glycolysis into the pentose phosphate pathway is often 10% or lower (e.g. Gumaa and McLean, 1969; Loreck et al., 1987; Fonseca et al., 2011).
If reliable information is available, it can be used to define the numerical range for a split ratio. If not, the range is taken to be [0, 1]. Split ratios collectively form a vector in ℝn, where n denotes the number of degrees of freedom. To explore the repertoire of behaviors of a given system, a large-scale Monte Carlo simulation may be conducted in which the split ratios are sampled from a hypercube in ℝn. For each time point, the randomly generated split ratios are substituted into Eq. (4), and a set of flux vectors is computed. Some of these are likely infeasible due to biological constraints; for instance, they could contain negative values for some fluxes in the system. Other constraints, such as upper or lower bounds for some of the fluxes, or any other a priori knowledge regarding the split ratios, are very helpful, as they reduce the feasible solution set. For small numbers of branch points, grid sampling may be preferred over a Monte Carlo simulation.
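A minimal Monte Carlo sketch for a hypothetical toy pathway is shown below. The topology is described in the comments; the sampling box, which stands in for a priori knowledge, and the slope values are also assumptions. Split ratios are drawn uniformly, fluxes are computed for all time points at once, and only samples with non-negative fluxes at every time point are kept.

```python
import numpy as np


def toy_fluxes(u, dX, a, b):
    """Vectorized fluxes of a hypothetical toy pathway (assumed topology:
    u -> X1; X1 -> X2 (V1), X1 -> out (V2); X2 -> X3 (V3), X2 -> out (V4);
    X3 -> out (V5)). dX: slopes, shape (K, 3); a, b: split ratios, shape (M,).
    Returns an array of shape (M, K, 5)."""
    a, b = a[:, None], b[:, None]       # broadcast samples against time points
    e1 = u - dX[None, :, 0]             # total efflux of X1
    V1, V2 = a * e1, (1 - a) * e1
    e2 = V1 - dX[None, :, 1]            # total efflux of X2
    V3, V4 = b * e2, (1 - b) * e2
    V5 = V3 - dX[None, :, 2]
    return np.stack([V1, V2, V3, V4, V5], axis=-1)


rng = np.random.default_rng(1)
M = 10_000
a = rng.uniform(0.0, 1.0, M)   # no prior knowledge: full range [0, 1]
b = rng.uniform(0.2, 0.9, M)   # assumed prior knowledge narrows this range

# Hypothetical slopes at K = 3 time points (rows) for X1, X2, X3 (columns).
dX = np.array([[0.20, 0.10, 0.05],
               [0.05, 0.02, 0.01],
               [0.00, 0.00, 0.00]])   # last row: steady state
V = toy_fluxes(1.0, dX, a, b)

# Keep only samples whose fluxes are non-negative at every time point.
admissible = np.all(V >= 0, axis=(1, 2))
V_adm, a_adm, b_adm = V[admissible], a[admissible], b[admissible]
```

Replacing the uniform draws with evenly spaced values combined via `np.meshgrid` gives the grid-sampling variant.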
2.2 Metabolic energy assumption
Even if various constraints can be imposed, the solution set typically contains infinitely many solutions with diverse dynamic qualities, which are all equivalent in the sense that they fit the metabolic time series data. However, these equivalent solutions often differ quite substantially in the total amount of metabolic energy they incur, because some flux distributions contain much higher flux values overall than others, even though they result in exactly the same metabolic profiles. A measure of this total energy is the Euclidean norm of the flux vector, which we examine at each time point (Fig. 2).
Fig. 2.

Admissible solutions, matrix of the admissible fluxes and array of admissible flux norms. Using a grid search (here for an illustration pathway with two branch points) leads to a matrix representing the entire set of admissible fluxes of the pathway at every time point. To capture the entire time horizon, these matrices are stacked up. At any given time point, some grid points may be inadmissible, for instance, because they contain negative flux values. As a consequence, only a subset of grid points (gray circles) is admissible. Each admissible grid point corresponds to a vector of fluxes that constitutes a row of the matrix at each time slice, and different rows are admissible solutions that correspond to different grid points. It should be noted that different sets of grid points may be admissible at different time points. For each row in the matrix of the fluxes, the Euclidean norm is computed. This collection of norms forms a matrix whose columns correspond to different time points and whose rows correspond to the admissible solutions at each time point
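The norm matrix described in the caption of Fig. 2 can be sketched as follows. Marking inadmissible entries with NaN is a representation chosen for this sketch: it lets admissibility differ between time points while keeping one row per sampled split-ratio vector. The function name is hypothetical.

```python
import numpy as np


def flux_norm_matrix(V):
    """V: array (M, K, n) of flux vectors for M sampled split-ratio vectors,
    K time points and n fluxes. Returns an (M, K) matrix of Euclidean norms
    with NaN wherever a sample is inadmissible (any negative flux) at that
    time point."""
    norms = np.linalg.norm(V, axis=-1)
    admissible = np.all(V >= 0, axis=-1)
    return np.where(admissible, norms, np.nan)


# 2 samples, 2 time points, 2 fluxes (hypothetical values).
V = np.array([[[3.0, 4.0], [1.0, 0.0]],
              [[0.0, 2.0], [-1.0, 1.0]]])
norms = flux_norm_matrix(V)   # [[5, 1], [2, nan]]

# Frequency distribution of the admissible norms, one array per time point.
per_time = [col[~np.isnan(col)] for col in norms.T]
```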
Collectively, for a Monte-Carlo or grid sample of flux distributions, the values of these norms form a frequency distribution at each time point.
Thus, the initially uniform distribution of admissible split ratios is nonlinearly transformed into a non-uniform distribution of norms. The benefit of this transformation is that we can now zoom in on intervals with the most likely flux distributions, namely those intervals that result from many more combinations of split ratios than others, and disregard solutions outside these intervals. Specifically, we can estimate the most likely flux distribution norm by minimizing the sum of squared distances to the norms of all admissible solutions, weighted by their probabilities at each time point. Thus, the objective function takes the form
$$J(\hat{n}) = \sum_{i=1}^{r} p_i \left(n_i - \hat{n}\right)^2 \qquad (5)$$
where the n_i's are the norms of the admissible solutions and p_i is the probability of n_i according to the distribution of the r admissible solutions at each time point. Setting the first derivative of (5) with respect to the estimate equal to zero yields the estimator
$$\hat{n} = \sum_{i=1}^{r} p_i\, n_i \qquad (6)$$
which is the expected value of the norm distribution, and thus its mean, μ. Using this mean, as well as the standard deviation σ at each time point, we select those norms that fall within the range [μ − σ, μ + σ]. For a distribution resembling the normal distribution, this range covers roughly 68% of the data. For highly skewed or multimodal distributions, other selection strategies for likely ranges might be preferable; for instance, one could take the smallest range that contains 50% of the mass of the distribution.
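The selection step can be sketched as follows, assuming that every admissible solution at a time point carries equal weight, p_i = 1/r, so that Eq. (6) reduces to the ordinary mean. The second function sketches the alternative for skewed or multimodal distributions. Both are illustrative helpers with hypothetical names, not functions from the authors' MATLAB code.

```python
import numpy as np


def select_central_band(norms_t):
    """Indices of admissible solutions whose norm lies in [mu - sigma, mu + sigma]
    at one time point (equal weights p_i = 1/r assumed)."""
    mu, sigma = norms_t.mean(), norms_t.std()
    return np.flatnonzero(np.abs(norms_t - mu) <= sigma)


def shortest_interval(norms_t, mass=0.5):
    """Shortest interval containing the given fraction of the admissible norms."""
    x = np.sort(norms_t)
    m = int(np.ceil(mass * x.size))            # number of points inside the interval
    widths = x[m - 1:] - x[:x.size - m + 1]    # width of every window of m points
    i = int(np.argmin(widths))
    return x[i], x[i + m - 1]


norms_t = np.array([1.0, 2.0, 3.0, 4.0, 10.0, 20.0])  # hypothetical norms
idx = select_central_band(norms_t)   # indices 0..4; 20.0 lies outside the band
lo, hi = shortest_interval(norms_t)  # (1.0, 3.0)
```

With the long right tail in this example, the μ ± σ band reaches down to almost zero, whereas the shortest 50% interval concentrates on the dense cluster of small norms. This is the kind of difference that motivates the alternative for skewed distributions.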
2.3 Reducing the degrees of freedom
It is sometimes technically feasible to measure influxes and effluxes, and maybe even interior fluxes within the system. Such measurements automatically reduce the degrees of freedom (Sherry and Malloy, 2007; Voit et al., 2009). For instance, if flux V_j can be measured, the ith equation in (1) becomes
$$\dot{X}_i - N_{ij}\,V_j = \sum_{k \neq j} N_{ik}\,V_k \qquad (7)$$
where the left-hand side now consists only of known quantities, so that the entire system effectively contains one unknown less. Nonetheless, it is rare that the model can be reduced to a full-rank system in this manner. Thus, even though it is possible to constrain the underdetermined solution, the resulting system typically still permits infinitely many solutions.
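In a hypothetical toy pathway, a measured flux can be handled as a sketch in two steps. The assumed topology is: input u → X1; X1 → X2 via V1 and X1 → out via V2; X2 → X3 via V3 and X2 → out via V4; X3 → out via V5. First, the flux's column of N, multiplied by the measured value, is moved to the left-hand side. Second, that column is removed from the unknowns. The measured value 0.3 and the slopes are made up for illustration.

```python
import numpy as np

# Hypothetical toy pathway (assumed topology): rows X1, X2, X3; columns V1..V5.
N = np.array([[-1, -1,  0,  0,  0],
              [ 1,  0, -1, -1,  0],
              [ 0,  0,  1,  0, -1]], dtype=float)


def degrees_of_freedom(A):
    """Number of unknown fluxes minus the rank of the coefficient matrix."""
    return A.shape[1] - np.linalg.matrix_rank(A)


u = 1.0
slopes = np.array([0.1, -0.05, 0.02])   # estimated dX/dt at one time point
rhs = slopes - np.array([u, 0.0, 0.0])  # N @ V = dX/dt - input term

j, Vj_measured = 2, 0.3                      # hypothetical measurement of V3
rhs_reduced = rhs - N[:, j] * Vj_measured    # known term moved to the left side
N_reduced = np.delete(N, j, axis=1)          # one unknown flux fewer

print(degrees_of_freedom(N), degrees_of_freedom(N_reduced))  # 2 1
```

In this toy case one measurement removes one degree of freedom, but one remains, which mirrors the point that measurements alone rarely yield a full-rank system.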
To achieve further reduction, the sets of fluxes and split ratios are investigated individually. The generic argument is the following: If the vast majority of simulations identifies a particular split ratio that always falls within the same narrow range, then this range is assumed to be most likely, and the split ratio is subsequently fixed at the mean or median of this range. This setting reduces the degrees of freedom among fluxes by one. With the split ratio fixed, a new round of simulations is initiated, and further split ratios within narrow ranges are again fixed. This procedure eventually leads to a unique flux distribution that is in some sense most likely (Fig. 3).
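The iteration can be sketched end to end for a hypothetical two-branch toy pathway. The sketch simplifies the procedure in two labeled ways. First, the μ ± σ band is applied only at the steady-state time point. Second, the free split ratio with the smallest standard deviation among the retained solutions is fixed automatically at its mean, whereas the method described here relies on inspecting the plots over the whole time horizon. Function names, bounds and slopes are assumptions.

```python
import numpy as np


def toy_fluxes(u, dX, a, b):
    """Fluxes of a hypothetical toy pathway (u -> X1; X1 -> X2 (V1), X1 -> out (V2);
    X2 -> X3 (V3), X2 -> out (V4); X3 -> out (V5)).
    dX: slopes, shape (K, 3); a, b: split ratios, shape (M,). Returns (M, K, 5)."""
    a, b = a[:, None], b[:, None]
    e1 = u - dX[None, :, 0]             # total efflux of X1
    V1, V2 = a * e1, (1 - a) * e1
    e2 = V1 - dX[None, :, 1]            # total efflux of X2
    V3, V4 = b * e2, (1 - b) * e2
    V5 = V3 - dX[None, :, 2]
    return np.stack([V1, V2, V3, V4, V5], axis=-1)


def stepwise_fix(u, dX, bounds, n_samples=20_000, seed=0):
    """Fix one split ratio per iteration until no degree of freedom is left."""
    rng = np.random.default_rng(seed)
    fixed = {}
    while len(fixed) < len(bounds):
        # Free split ratios are sampled; fixed ones are held constant.
        s = {k: (np.full(n_samples, fixed[k]) if k in fixed
                 else rng.uniform(*bounds[k], n_samples))
             for k in bounds}
        V = toy_fluxes(u, dX, s["a"], s["b"])
        ok = np.all(V >= 0, axis=(1, 2))                  # admissible at all time points
        norms = np.linalg.norm(V[ok][:, -1, :], axis=-1)  # steady-state norms only
        keep = np.abs(norms - norms.mean()) <= norms.std()
        free = [k for k in bounds if k not in fixed]
        spread = {k: s[k][ok][keep].std() for k in free}
        best = min(spread, key=spread.get)                # narrowest split ratio
        fixed[best] = float(s[best][ok][keep].mean())     # fix it at its mean
    return fixed


dX = np.array([[0.20, 0.10, 0.05],
               [0.05, 0.02, 0.01],
               [0.00, 0.00, 0.00]])   # last row: steady state
ratios = stepwise_fix(1.0, dX, bounds={"a": (0.0, 1.0), "b": (0.2, 0.9)})
```

Each pass removes one degree of freedom, so the loop ends after as many iterations as there are split ratios.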
Fig. 3.

Flowchart of the proposed inference method for flux distributions. The input to the method consists of smoothed time series of metabolite concentrations, from which slopes are computed. A Monte-Carlo or grid simulation generates very many flux distributions corresponding to sampled split ratios. Non-negative (and possibly otherwise constrained) flux vectors are retained. Euclidean norms of the retained flux vectors are calculated at each time point, and all solutions within the range μ ± σ are kept. The time series of split ratios and the flux distributions of the selected solutions are plotted, and split ratios with relatively small variations are numerically fixed at their means or medians in order to decrease the degrees of freedom. The process is iterated until it yields a unique solution
A challenging situation occurs when all split ratios and fluxes exhibit wide variations, which would prevent the algorithm from proceeding. This case is addressed in the Discussion.
3 Results
Case study: Lignin biosynthesis in switchgrass
Switchgrass (Panicum virgatum) has been identified by the U.S. Department of Energy as a target source for bioethanol production (BioEnergy Science Center; http://bioenergycenter.org/besc/research/biomass.cfm). In previous work (Faraji et al., 2015), we analyzed the structure and regulation of its lignin biosynthesis pathway, with the ultimate goal of prescribing gene or enzyme modulations leading to more favorable lignin production. Specifically, we constructed a computational model of the pathway using steady-state data of the wild type and four transgenic strains of switchgrass (Fig. 4). The data included measured levels of H-, S- and G-lignin, which are the building blocks of the lignin heteropolymer. In this paper, we use this model as an illustration where we have full knowledge of the system and its features. Specifically, we generate virtual time series of concentrations of all metabolites in the pathway and pretend that they were smoothed experimental data. The aim is to infer the most likely flux distribution using the method proposed in the previous section.
Fig. 4.

Lignin biosynthesis pathway in switchgrass. Seven branch points give rise to as many degrees of freedom. At each branch point, the total efflux splits into two or three fluxes. The parametric split ratios are shown with capital letters. Conservation of mass dictates the sum of the split ratios at each branch point to be 1. Adapted from (Faraji et al., 2015)
The lignin biosynthesis pathway consists of 16 metabolites and 23 fluxes, which result in seven branch points and thus seven degrees of freedom; in Figure 4, the split ratios are marked with capital letters. The variables are
| (8) |
The system equations can be rewritten for the split ratio analysis as follows:
| (9) |
Here, the input is the known flux entering the pathway, and the split ratios form a vector of seven unknowns. For normalized comparisons, the input is set to 100%, so that the system receives a unit flux that is distributed throughout the pathway.
The seven degrees of freedom correspond to an initial sampling space in ℝ7. A priori knowledge about the pathway helps us to reduce the sampling space from a unit hypercube to a smaller volume. Namely, biological information about the pathway attests that the ratio between the production of S-lignin and G-lignin in switchgrass is in the vicinity of 1, which allows us to define a constraint for the ratio of . It is also known that H-lignin accounts for only about 3% of the total lignin. We use this information to constrain loosely to a value smaller than 10% of the flux that leaves the pool . Furthermore, consumes at most 10% of the efflux from . Thus, along with the constraint for , we may set a loose lower bound of 80% for . Similarly, it is reasonable to assume that accounts for roughly 10% of the efflux from , which allows us to constrain . Taken together, the sampling space for a grid search in the first round is
| (10) |
Regarding the S/G ratio, we set the range
$$0.8 \le \mathrm{S/G} \le 1.2 \qquad (11)$$
which allows for 20% deviation from 1.
Using the settings in (10) and (11), we generated a grid in ℝ7. Each grid point corresponds to a seven-dimensional vector that contains the values for the seven split ratios. We substituted these vectors back into Eq. (9) and computed the flux distributions for all grid points. Intriguingly, only 5% of these flux distributions had entirely positive values at all time points and were retained.
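A grid-search analogue of this step can be sketched for a hypothetical toy pathway rather than the lignin model. It combines the non-negativity screen with a ratio constraint on two end-product fluxes at steady state, which plays the role of the S/G constraint (11). The 20% window, the grid spacing and the slopes are assumptions.

```python
import itertools

import numpy as np


def toy_fluxes(u, dX, a, b):
    """Fluxes of a hypothetical toy pathway (u -> X1; X1 -> X2 (V1), X1 -> out (V2);
    X2 -> X3 (V3), X2 -> out (V4); X3 -> out (V5)) at K time points.
    dX has shape (K, 3); returns shape (K, 5)."""
    e1 = u - dX[:, 0]
    V1, V2 = a * e1, (1 - a) * e1
    e2 = V1 - dX[:, 1]
    V3, V4 = b * e2, (1 - b) * e2
    V5 = V3 - dX[:, 2]
    return np.column_stack([V1, V2, V3, V4, V5])


dX = np.array([[0.20, 0.10, 0.05],
               [0.05, 0.02, 0.01],
               [0.00, 0.00, 0.00]])   # last row: steady state
grid = np.linspace(0.0, 1.0, 21)      # step 0.05 for each split ratio

retained = []
for a, b in itertools.product(grid, grid):
    V = toy_fluxes(1.0, dX, a, b)
    if np.any(V < 0):
        continue                                  # negative flux: inadmissible
    if V[-1, 4] > 0 and 0.8 <= V[-1, 3] / V[-1, 4] <= 1.2:
        retained.append((a, b))                   # end-product ratio within 20% of 1

fraction = len(retained) / grid.size**2
```

As in the case study, only a small fraction of the grid survives both screens.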
As the next step, we computed the norms of the flux distributions. Figure 5 visualizes the time array of the norms. Each vertical slice corresponds to a column in the matrix of the norms and depicts the distribution of the norms of the admissible fluxes at a specific time point. The gray band exhibits the range μ ± σ. As described in the Methods section, we retained the solutions within the gray band and discarded the rest.
Fig. 5.

Distribution of flux norms in iteration 1. Left panel: The gray band depicts the range μ ± σ, which contains about two thirds of the solutions at each time point; the mean is shown in black. The thick dark blue boxes represent the second and third quartiles, and the white line is the median, which is similar to the mean. The thin blue lines are the first and fourth quartiles. Right panel: Histogram of norms in the left panel at the steady state, and the range μ ± σ (grey) (Color version of this figure is available at Bioinformatics online.)
All split ratios and flux distributions corresponding to the retained solutions (gray band in Fig. 5) are plotted in Figure 6. The plots indicate that two of the split ratios show small variations. The computed mean of one of them is 0.04. Therefore, for the next iteration, we fix this split ratio at 0.04, which decreases the degrees of freedom by one. At this point, we could have fixed the other split ratio as well, but for our illustration we will not take this shortcut. The same results, restricted to the steady state, are shown in Figure 7i for comparison with further iterations.
Fig. 6.

Split ratios and flux distributions within the range of admissible solutions in iteration 1. Similar to Figure 5, the boxplots at each time point reflect the quartiles. Panel (a) depicts the split ratios and panel (b) the flux distributions. Note that four of the fluxes already exhibit rather narrow ranges. The first three fluxes are independent of the split ratios, and therefore not shown
Fig. 7.

Steady-state split ratios within the range of admissible solutions. The subpanels correspond to the last time point (steady state) of each iteration. Each horizontal bar shows the median at the given iteration, and the boxplots represent the quartiles of split ratios
Fixing this split ratio reduces the sampling space for the second iteration to ℝ6. Because the split ratios at a branch point sum to 1, the fixed value of 0.04 and the upper bound of 10% that we considered for another split ratio at the same branch point allow us to refine the sampling interval of the third split ratio and set its lower bound to 86%. Computing the flux distributions using the refined sampling space and screening for nonnegative solutions determines the admissible solutions of iteration two. As in iteration 1, the norms of the admissible fluxes are computed, and the solutions within the range μ ± σ are retained. The time arrays of split ratios and flux distributions corresponding to the retained admissible solutions are shown in Supplementary Figure S1. Figure 7ii shows the split ratios corresponding to the retained admissible solutions at steady state. One split ratio exhibits the smallest variation and is therefore fixed at its mean value, 0.89, for the third iteration.
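The lower bounds of 80% in (10) and 86% in the second iteration both follow from conservation of mass at a three-way branch point. Let r_1, r_2 and r_3 denote the three split ratios at such a branch point. These are hypothetical labels, since the corresponding letters of Fig. 4 are not reproduced here. The derivation assumes that the ratio fixed at 0.04 and the ratio bounded by 10% leave the same branch point as the refined ratio:

$$
\begin{aligned}
r_1 + r_2 + r_3 &= 1,\\
r_1 \le 0.10,\; r_2 \le 0.10 \;&\Rightarrow\; r_3 \ge 1 - 0.10 - 0.10 = 0.80 \quad \text{(first iteration)},\\
r_1 = 0.04,\; r_2 \le 0.10 \;&\Rightarrow\; r_3 \ge 1 - 0.04 - 0.10 = 0.86 \quad \text{(second iteration)}.
\end{aligned}
$$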
For the third iteration, we sample the remaining five split ratios from ℝ5 and compute the flux distributions. The admissible fluxes within the range μ ± σ are retained (see Supplementary Fig. S2b). The corresponding split ratios at steady state are illustrated in Figure 7iii. The plot suggests the best candidate to be fixed in this iteration; we assign it its mean value and proceed to the fourth iteration.
Following the same procedure as in previous iterations, we obtain the split ratios and flux distributions of the retained admissible solutions for the four remaining degrees of freedom (see Supplementary Fig. S3). Although the distribution of one of these split ratios is not all that narrow, its standard deviation is relatively small and the distribution is dense at the lower bound. Therefore, for the next iteration, we fix this split ratio at its mean value, 0.66 (Fig. 7iv).
Figure 7v indicates that the distributions of the remaining split ratios in iteration five are relatively wider than before so that the next choice is ambiguous. However, the plot of in Supplementary Figure S4b indicates a very narrow range, independent of the split ratios. Thus, we can fix and compute for any given value of such that conservation of mass is preserved at . The split ratio is then automatically determined by (see Fig. 4) rather than by sampling from the grid, which reduces the degrees of freedom to two. Furthermore, fixing leads to the direct identification of and . One could simultaneously fix and . However, this strategy would not reduce the degrees of freedom further, but add an extra constraint, which would make the system overdetermined (see Fig. 4). A regression model could then determine the optimal flux distribution. We do not pursue this solution here.
In iteration six, only the split ratios C and E remain unknown (Fig. 7vi). Variation in the split ratio F is explained by the fact that, although V18 is fixed, V16 depends on V14, which is itself a function of the unknown split ratio E (see Supplementary Fig. S4 in Faraji and Voit, 2016). Therefore, sampling E from the grid space leads to variation in V14 and V16 and, consequently, in F as the ratio of the two. Thus, we set E to its mean value of 0.65 and proceed to the next iteration.
Figure 7vii shows that in this iteration the range of the norms of the admissible solutions (corresponding to the gray band) is not sensitive to the value of split ratio C, and the retained flux distributions (Supplementary Fig. S5a) are such that C can take values almost throughout the entire interval (0, 1). This observation leads to the conclusion that any error in C does not affect the distribution of flux norms much. At the same time, choosing the mean value statistically minimizes the error of the estimation, as we argued in the Methods section. Thus, we set C to its mean, 0.45.
With this setting, all degrees of freedom are eliminated, and the system is fully determined by the end of iteration seven. Figure 8 exhibits the likely split ratios and flux distributions, superimposed on the corresponding fluxes in the reference model. The match between the true and the estimated solution is very good. Supplementary Figure S7 depicts the time-trend of the norms of the likely flux distribution.
Fig. 8.

Inferred likely split ratios and flux distributions (dashed red) in comparison with the corresponding model features (green) (Color version of this figure is available at Bioinformatics online.)
4 Discussion
In this work, we propose a method for inferring likely dynamic flux distributions in metabolic pathways from time series of metabolite concentrations. The method utilizes customized computational techniques that explore the space of solutions given by the stoichiometric matrix. Importantly, the method readily allows the inclusion of a priori knowledge regarding the pathway, including diverse biological constraints. To compare flux distributions collectively, we employed the Euclidean norm, which in some sense is a metric of the total metabolic energy consumption. High values of this norm indicate high flux rates, which seem wasteful, since all flux distributions yield exactly the same metabolic profile. Very low values are presumably disadvantageous as well, as they do not provide enough robustness and bandwidth to respond to changes in metabolic demand.
Thus, we decided to focus on those combinations of split ratios that resulted in flux distributions in the center of the distribution of their norms, which we defined as the range μ ± σ and which contains about two thirds of the mass of the distribution if the distribution is approximately normal. This strategy led to some rather narrow and some wider bands of dynamic trends, which suggested which split ratios to fix in the next iteration. Ultimately, this process resulted in unique, time-dependent flux distributions.
For the illustrative example of lignin biosynthesis in switchgrass, the method converged within a few steps and yielded flux trends very similar to those in the model used to generate the data. This strong similarity between the inferred and actual fluxes certainly does not validate the method, but lends it support. The method performed similarly well when we applied it to a simplified model of purine metabolism; these results can be found in the Supplementary Material. We furthermore applied a much simpler version of this method to a fermentation pathway with two branch points (results not shown; but see Cascante et al., 1995; Curto et al., 1995; Faraji and Voit, 2016; Sorribas et al., 1995).
Like any other modeling framework, the proposed method has its own advantages and drawbacks. As we demonstrated, it works well under favorable conditions, but it has two limitations. First, it requires good, representative data, which at present are rare. Fortunately, the methodologies of molecular biology have advanced very rapidly in recent years (Bruggner et al., 2014; Cohen, 2009; Li et al., 2013; Neves et al., 2005). Whereas there were essentially no time series data two decades ago, they have become quite common. More, better and cheaper methods for measuring representative time series are expected to emerge. Second, the method emphasizes statistical likelihood, but it is of course possible that a pathway is naturally parameterized in a very specific, non-average fashion. The only guard against this situation is additional biological information that may be used to constrain some split ratios or fluxes.
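Constraining a split ratio adds one linear equation to the mass balances. The sketch below uses a hypothetical four-reaction network and a hypothetical ratio r = 0.7. It shows that appending such a row raises the rank of the constraint matrix by one and thus removes one degree of freedom, provided the new row is linearly independent of the stoichiometric rows. `degrees_of_freedom` is a hypothetical helper.

```python
import numpy as np

def degrees_of_freedom(A):
    """Number of reactions (columns) minus the rank of the constraint matrix."""
    return A.shape[1] - np.linalg.matrix_rank(A)

# Hypothetical network: v1 -> X1; X1 -> X2 (v2); X1 -> out (v3); X2 -> out (v4)
N = np.array([[1.0, -1.0, -1.0, 0.0],
              [0.0, 1.0, 0.0, -1.0]])

# Hypothetical prior knowledge: a fraction r of the X1 efflux goes through v2,
# i.e., v2 = r (v2 + v3), which is equivalent to (1 - r) v2 - r v3 = 0.
r = 0.7
split_row = np.array([[0.0, 1.0 - r, -r, 0.0]])
A = np.vstack([N, split_row])

print(degrees_of_freedom(N))  # prints 2
print(degrees_of_freedom(A))  # prints 1
```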
A technical issue may arise when none of the dynamic trends in split ratios or flux norms displays narrow bands. One possible reason is the existence of unrecognized relationships among fluxes, which may be identifiable with methods such as principal component analysis (PCA). For instance, it might be possible to identify a specific relationship between two fluxes that permits coupling them numerically. Such a coupling lowers the degrees of freedom by one and allows the algorithm to continue.
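A PCA check for a coupling between two fluxes might look as follows. This sketch rests on several assumptions:

- PCA is applied to paired values of two fluxes, collected across retained candidate solutions or time points.
- A coupling is declared when the first principal component explains at least 99% of the variance.
- The coupling is then expressed as a linear relation along that component.

The function `detect_linear_coupling`, the threshold and the synthetic data are hypothetical.

```python
import numpy as np

def detect_linear_coupling(va, vb, threshold=0.99):
    """PCA on paired samples of two fluxes. If the first principal component
    explains at least `threshold` of the variance, return (slope, intercept)
    of vb ~ slope * va + intercept along that component; otherwise None."""
    data = np.column_stack([va, vb])
    mean = data.mean(axis=0)
    _, s, vt = np.linalg.svd(data - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)      # variance fraction per component
    if explained[0] < threshold:
        return None, explained
    direction = vt[0]                    # first principal axis
    slope = direction[1] / direction[0]
    intercept = mean[1] - slope * mean[0]
    return (slope, intercept), explained

# Synthetic example: vb is (hypothetically) almost proportional to va
rng = np.random.default_rng(1)
va = rng.uniform(0.5, 2.0, 200)
vb = 2.0 * va + 0.1 + rng.normal(0.0, 0.01, 200)
coupling, explained = detect_linear_coupling(va, vb)
print(coupling, explained)
```

If a coupling is found, it can be imposed as an extra linear constraint (vb - slope * va = intercept), which removes one degree of freedom in the same way as fixing a split ratio.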
5 Conclusions
The choice of an adequate model representation is paramount when a model is to be used under new conditions. A considerable challenge in model selection is that compensation among model components may allow drastically different models to fit training data very well, while failing spectacularly for other data. Dynamic Flux Estimation (DFE) addresses this issue by characterizing the shapes of individual fluxes within pathway systems from metabolic time series data. These flux shapes can subsequently be converted into explicit, parameterized functions (Goel et al., 2008) or into libraries for nonparametric modeling (Faraji and Voit, 2016). Unfortunately, DFE suffers from the existence of infinitely many equivalent solutions in underdetermined stoichiometric systems, and there is little guidance regarding the best solutions within this set. The method introduced here reveals likely dynamic flux distributions, based on relatively general assumptions. Of course, the method is not fail-safe. However, with the rapid development of techniques for generating high-quality experimental time series data, we expect it to become increasingly powerful and to provide a straightforward and relatively unbiased approach to the computational characterization of metabolic pathways.
Supplementary Material
Acknowledgements
The authors are very grateful to Luis L. Fonseca for providing stimulating discussions and valuable feedback.
Funding
This work was supported in part by the following grants: National Science Foundation [MCB-0958172, MCB-0946595 and MCB-1517588; PI: EOV; MCB 1411672; PI: Diana Downs; DEB-1241046; PI: Kostas Konstantinides]; National Institutes of Health [1P30ES019776-01A1; PI: Gary W. Miller]; and DOE-BESC [DE-AC05-00OR22725; PI: Paul Gilna]. BESC, the BioEnergy Science Center, is a U.S. Department of Energy Bioenergy Research Center supported by the Office of Biological and Environmental Research in the DOE Office of Science. This project was furthermore supported in part by federal funds from the US National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services under contract # HHSN272201200031C, which supports the Malaria Host–Pathogen Interaction Center (MaHPIC). The funding agencies are not responsible for the content of this article.
Conflict of Interest: none declared.
References
- Albert A.E. (1972) Regression and the Moore-Penrose Pseudoinverse. Mathematics in Science and Engineering, vol. 94. Academic Press, New York.
- Bruggner R.V. et al. (2014) Automated identification of stratifying signatures in cellular subpopulations. Proc. Natl. Acad. Sci. U. S. A., 111, E2770–E2777.
- Campbell D., Steele R.J. (2012) Smooth functional tempering for nonlinear differential equation models. Stat. Comput., 22, 429–443.
- Cascante M. et al. (1995) Comparative characterization of the fermentation pathway of Saccharomyces cerevisiae using biochemical systems theory and metabolic control analysis: steady-state analysis. Math. Biosci., 130, 51–69.
- Cohen A. (2009) Mass spectrometry, review of the basics: electrospray, MALDI and commonly used mass analyzers (vol 44, p. 210, 2009). Appl. Spectrosc. Rev., 44, 362.
- Curto R. et al. (1995) Comparative characterization of the fermentation pathway of Saccharomyces cerevisiae using biochemical systems theory and metabolic control analysis: model definition and nomenclature. Math. Biosci., 130, 25–50.
- Dolatshahi S. et al. (2014) A constrained wavelet smoother for pathway identification tasks in systems biology. Comput. Chem. Eng., 71, 728–733.
- Dolatshahi S., Voit E.O. (2016) Identification of metabolic pathway systems. Front. Genet., 7, 6.
- Eilers P.H.C. (2003) A perfect smoother. Anal. Chem., 75, 3631–3636.
- Faraji M. et al. (2015) Computational inference of the structure and regulation of the lignin pathway in Panicum virgatum. Biotechnol. Biofuels, 8, 151.
- Faraji M., Voit E.O. (2016) Nonparametric dynamic modeling. Math. Biosci., pii: S0025-5564(16)30113-4.
- Fonseca L.L. et al. (2011) Complex coordination of multi-scale cellular responses to environmental stress. Mol. BioSyst., 7, 731–741.
- Gavalas G.R. (1968) Nonlinear Differential Equations of Chemically Reacting Systems. Springer-Verlag, Berlin.
- Goel G. et al. (2008) System estimation from metabolic time-series data. Bioinformatics, 24, 2505–2511.
- Gumaa K.A., McLean P. (1969) The pentose phosphate pathway of glucose metabolism. Enzyme profiles and transient and steady-state content of intermediates of alternative pathways of glucose metabolism in Krebs ascites cells. Biochem. J., 115, 1009–1029.
- Heinrich R., Schuster S. (1996) The Regulation of Cellular Systems. Chapman & Hall, New York.
- Li S.Z. et al. (2013) Predicting network activity from high throughput metabolomics. PLoS Comput. Biol., 9.
- Loreck D.J. et al. (1987) Regulation of the pentose phosphate pathway in human astrocytes and gliomas. Metab. Brain Dis., 2, 31–46.
- Moore E.H. (1920) On the reciprocal of the general algebraic matrix. Bull. Am. Math. Soc., 26, 394–395.
- Neves A.R. et al. (2005) Overview on sugar metabolism and its control in Lactococcus lactis: the input from in vivo NMR. FEMS Microbiol. Rev., 29, 531–554.
- Orth J.D. et al. (2010) What is flux balance analysis? Nat. Biotechnol., 28, 245–248.
- Palsson B. (2006) Systems Biology: Properties of Reconstructed Networks. Cambridge University Press, New York.
- Penrose R. (1955) A generalized inverse for matrices. Proc. Camb. Phil. Soc., 51, 406–413.
- Ramsay J.O. et al. (2007) Parameter estimation for differential equations: a generalized smoothing approach. J. R. Stat. Soc. Ser. B, 69, 741–796.
- Savageau M.A. (1992) Critique of the enzymologist's test tube. In: Bittar E.E. (ed.) Fundamentals of Medical Cell Biology, vol. 3A, pp. 45–108. JAI Press Inc., Greenwich, CT.
- Savageau M.A. (1995) Enzyme kinetics in vitro and in vivo: Michaelis–Menten revisited. In: Bittar E.E. (ed.) Principles of Medical Biology, vol. 4, pp. 93–146. JAI Press Inc., Greenwich, CT.
- Seatzu C. (2000) A fitting based method for parameter estimation in S-systems. Dyn. Syst. Appl., 9, 77–98.
- Sherry A.D., Malloy C.R. (2007) Integration of 13C isotopomer methods and hyperpolarization provides a comprehensive picture of metabolism. eMagRes, pp. 885–900. John Wiley & Sons, Ltd.
- Sorribas A. et al. (1995) Comparative characterization of the fermentation pathway of Saccharomyces cerevisiae using biochemical systems theory and metabolic control analysis: model validation and dynamic behavior. Math. Biosci., 130, 71–84.
- Tummler K. et al. (2014) New types of experimental data shape the use of enzyme kinetics for dynamic network modeling. FEBS J., 281, 549–571.
- Varah J.M. (1982) A spline least squares method for numerical parameter estimation in differential equations. SIAM J. Sci. Stat. Comput., 3, 28–46.
- Vilela M. et al. (2007) Automated smoother for the numerical decoupling of dynamics models. BMC Bioinformatics, 8, 305.
- Voit E.O. (2011) What if the fit is unfit? Criteria for biological systems estimation beyond residual errors. In: Dehmer M. et al. (eds.) Applied Statistics for Network Biology: Methods in Systems Biology, pp. 183–200. Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, Germany.
- Voit E.O. (2012) A First Course in Systems Biology. Garland Science, New York, NY.
- Voit E.O. (2013) Characterizability of metabolic pathway systems from time series data. Math. Biosci., 246, 315–325.
- Voit E.O., Almeida J. (2004) Decoupling dynamical systems for pathway identification from metabolic profiles. Bioinformatics, 20, 1670–1681.
- Voit E.O. et al. (2009) Estimation of metabolic pathway systems from different data sources. IET Syst. Biol., 3, 513–522.
- Voit E.O., Savageau M.A. (1982) Power-law approach to modeling biological systems; III. Methods of analysis. J. Ferment. Technol., 60, 223–241.
- Voit E.O., Savageau M.A. (1982) Power-law approach to modeling biological systems; II. Application to ethanol production. J. Ferment. Technol., 60, 229–232.
- Whittaker E. (1923) On a new method of graduation. Edinburgh Math. Soc., 63–75.
