Skip to main content
Bioinformatics logoLink to Bioinformatics
. 2021 Mar 23;37(18):2938–2945. doi: 10.1093/bioinformatics/btab194

Probabilistic thermodynamic analysis of metabolic networks

Mattia G Gollub 1, Hans-Michael Kaltenbach 2, Jörg Stelling 3,
Editor: Lenore Cowen
PMCID: PMC8479673  PMID: 33755125

Abstract

Motivation

Random sampling of metabolic fluxes can provide a comprehensive description of the capabilities of a metabolic network. However, current sampling approaches do not model thermodynamics explicitly, leading to inaccurate predictions of an organism’s potential or actual metabolic operations.

Results

We present a probabilistic framework combining thermodynamic quantities with steady-state flux constraints to analyze the properties of a metabolic network. It includes methods for probabilistic metabolic optimization and for joint sampling of thermodynamic and flux spaces. Applied to a model of Escherichia coli, we use the methods to reveal known and novel mechanisms of substrate channeling, and to accurately predict reaction directions and metabolite concentrations. Interestingly, predicted flux distributions are multimodal, leading to discrete hypotheses on E.coli’s metabolic capabilities.

Availability and implementation

Python and MATLAB packages available at https://gitlab.com/csb.ethz/pta.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Constraint-based models (CBMs) aim to characterize the metabolic capabilities of an organism, potentially at genome-scale, by predicting metabolic fluxes based on constraints arising, for example, from the structure of the metabolic network and the steady-state assumption (Orth et al., 2010). Because many degrees of freedom exist for the network’s (potential, if not practical) operation, a central object of the analysis is the flux space. Formally, the flux space F of a metabolic network with n reactions and m metabolites is the set of flux distributions vRn that satisfy steady state and additional flux constraints:

S·v=0,lbvub, (1)

where SRm×n is the stoichiometric matrix (encoding the network structure) and lbRn and ubRn are measured or assumed lower and upper bounds on fluxes (encoding, e.g. reaction reversibilities).

The most common approaches to analyzing F predict particular flux solutions, for example, by assuming cellular objectives in flux balance analysis (FBA) (Orth et al., 2010). To explore and characterize all possible solutions that lead to observed phenotypes or desired goals, uniform sampling (US) (Haraldsdóttir et al., 2017; Price et al., 2004) and its extensions in the form of loopless (Desouki et al., 2015; Saa and Nielsen, 2016) and non-uniform sampling (Binns et al., 2015; Heinonen et al., 2019; Keaty and Jensen, 2020) provide a comprehensive description of the flux space. Compared to characterizations by metabolic pathway analysis (Terzer and Stelling, 2008), sampling predictions directly relate to experimental flux distributions.

However, available constraints often restrict the flux space insufficiently. Different -omics data from high-throughput measurement technologies can in principle be used to derive additional constraints, but data integration is challenging and the resulting constraints are condition-specific (Ramon et al., 2018). In contrast, constraints based on thermodynamics derive from first principles; they are absolute and independent of specific kinetic properties or regulatory mechanisms. Therefore, modeling thermodynamic laws has the potential to reveal the options available to the cell.

In a metabolic model, thermodynamics quantifies the favorability of a set Γ of γ reactions (excluding unbalanced pseudo-reactions such as biomass and exchange reactions) with Gibbs reaction energies ΔrGRγ:

ΔrG=ΔrG°+RT·SΓ·ln c, (2)

where cRm are the compartment-specific metabolite concentrations, ΔrG°Rγ the standard Gibbs reaction energies [corrected for pH, ionic strength and transport between compartments (Haraldsdóttir et al., 2012)], R the gas constant, T the temperature and SΓ is the transpose of the stoichiometric sub-matrix corresponding to Γ. The second term of (2) (here called the activity term) maps the contribution of metabolite concentrations to the reactions they participate in. Reaction energies constrain fluxes via the second law of thermodynamics, which implies that positive (negative) net flux can only occur if the free energy of a reaction is negative (positive). This ensures that biochemical processes result in net energy dissipation.

How well thermodynamics constrain a model depends on the precision of the estimates of the variables in (2). Standard reaction energies can be estimated from measured equilibrium constants (Goldberg et al., 2004) by the group contribution method and its extensions (Du et al., 2018a; Jankowski et al., 2008; Noor et al., 2013). Because multiple reactions may affect the same chemical groups, their estimates are often linearly dependent and thus lie in a lower-dimensional subspace (Fig. 1A). Metabolomics data can provide precise estimates or approximate distributions (Park et al., 2016) for metabolite concentrations (Fig. 1B).

Fig. 1.

Fig. 1.

Integration of thermodynamics in CBMs. (A) Group contribution methods provide estimates of standard reaction energies. The uncertainties of multiple estimates may be correlated and lie in a lower dimensional subspace. (B) Probability distributions of metabolite concentrations can be constructed from measured or assumed data. (C) The space of reaction energies and metabolite concentrations defined by TMFA, with regions that are not feasible at steady-state removed (red). The space is defined by independent bounds on each variable of ln c and ΔrG°, leading to an overapproximation of the uncertainty. (D) PTA models the complete uncertainty information probabilistically, leading to a joint probability distribution over the reaction energies of the entire network. Steady-state flux constraints restrict it to feasible orthants

Thermodynamics-based Metabolic Flux Analysis (TMFA) (Henry et al., 2007) integrates thermodynamics into CBMs with (i) a set of linear constraints describing (2) and bounds on its quantities and (ii) integer constraints enforcing the second law of thermodynamics. TMFA has been used to predict metabolite concentrations (Henry et al., 2007; Park et al., 2016) and pathway favorability (Chiappino-Pepe et al., 2017; Du et al., 2018b). However, it describes the uncertainty by independent error bounds for each variable. This facilitates the construction of optimization problems, but also over-approximates the uncertainty. Figure 1C illustrates a space of thermodynamic variables in TMFA: independent constraints on ln c and ΔrG° constrain ΔrG as well. In addition, the steady-state condition (1) excludes orthants that imply directions without a feasible flux distribution in F. This approach discards any correlation between the ΔrGi° of different reactions and the probability of a particular solution.

The common construction of CBMs for pure flux-based analysis poses a further challenge. Model curators or automatic pipelines regularly need to prevent unrealistic internal or ATP-generating cycles (Schilling et al., 2000). Because the flux balance framework does not resolve this problem consistently, assumptions or condition-specific biological knowledge are employed to assign reaction reversibilities. Importantly, missing knowledge or inaccurate irreversibilities can impose false thermodynamic constraints that do not affect flux balance and thus remain undetected.

To address these challenges, here we propose Probabilistic Thermodynamic Analysis (PTA), a framework in which the uncertainty of free energies and concentrations is modeled with a joint probability distribution (Fig. 1D). We develop optimization and sampling methods based on PTA and show their application to predict substrate channeling, reaction directions and metabolite concentrations, as well as to explore the capabilities of the metabolic network.

2 Materials and methods

2.1 The steady-state thermodynamic space

To account for correlations between the uncertainties of reaction energies of different reactions, we assume that metabolite concentrations follow a log-normal distribution based on measurements or an assumed physiological distribution (Park et al., 2016)

ln cN(μc,Σc). (3)

From the estimates of a group contribution method (Noor et al., 2013), we obtain a normal distribution for the standard reaction energies

ΔrG°N(μ°,Σ°). (4)

From (2), it follows that the reaction energy estimates are also normally distributed as

ΔrGN(μr=R·μcμ°, Σr=R·[Σc00Σ°]·R), (5)

where R=[RT·SΓ,I]. For computational convenience, we define the vector t=[ln c,ΔrG°,ΔrG]. It is again normally distributed with mean μt=[μc,μ°,μr] and covariance matrix

Σt=[Σc0RT·Σc·SΓ0Σ°Σ°RT·SΓ·ΣcΣ°Σr]. (6)

We use this distribution to constrain t to values that are compatible with the estimated concentrations and standard free energies up to a confidence level α, such that

(tμt)·Σt+·(tμt)χq;α2, (7)

where Σt+ is a pseudo-inverse of Σt, of rank q=rank(Σt+)=rank(Σt) and χq;α2 is the α-quantile of the χ2-distribution with q degrees of freedom. We exploit that q is much smaller than the dimension of t by using a q-dimensional standard normal distribution to express t as

t=μt+Q·m, (8)

where Q=Σt1/2 is a matrix square root such that Σt=QQ.

Lastly, we enforce the second law of thermodynamics and require that all reactions in Γ have a well-defined direction, that is, non-zero flux. For each i, 1iγ:

vΓ(i)·ΔrGi<0. (9)

Note that this requires reactions that are structurally blocked (e.g. by involving dead-end metabolites) to be removed in advance; they would conflict with (9).

We define the steady-state thermodynamic space (or short: thermodynamic space) T as the set of t that fulfill (7) and satisfy the steady-state constraints (1), (9). In other words, T defines the possible reaction energies and metabolite concentrations at steady-state.

2.2 Probabilistic metabolic optimization (PMO)

The definitions of F and T allow us to extend the FBA framework to include thermodynamic quantities similar to TMFA. We translate (1) and (7) into linear and quadratic constraints and rewrite (9) with integer constraints in the big-M formulation (Williams, 2013). The resulting Mixed-Integer Quadratically Constrained Program (MIQCP) can be used to perform FBA-like analyses under thermodynamic constraints.

Since we know the probability distribution of t, we can also determine the most probable reaction energies and metabolite concentrations by setting a quadratic objective that maximizes the probability of t under steady-state constraints:

m*=argminm,vm22s.t.Eq.(1),Eq.(7),Eq.(9). (10)

We call t*=μt+Q·m* the mode of the distribution of t.

2.3 Thermodynamic assessment of metabolic models

Structural assessment aims to detect thermodynamic inconsistencies in a model, that is, groups of reactions that, given their irreversibility annotation, cannot satisfy thermodynamic constraints with any assignment of concentrations and standard free energies. For example, two stoichiometrically equivalent paths can be irreversible in opposite directions; then, all non-zero flux solutions contain a thermodynamically infeasible internal cycle.

To resolve such cases, we first block all reactions related to exchanges, biomass and ATP maintenance and enumerate all internal Elementary Flux Modes (EFMs) (Terzer and Stelling, 2008). Next, we select all the forced internal cycles, that is, the internal EFMs that must be active in all non-zero flux distributions. We find these cycles using linear programming to verify that changing any of the signs of an EFM in the original model always results in an unfeasible problem. We then inspect the reactions in the forced internal cycles to find and resolve inconsistent irreversibilities.

Quantitative assessment of a model aims to find features in PMO predictions indicative of knowledge gaps or model inaccuracies. First, we construct a condition-specific model with available flux and concentration constraints from a base model and use PMO to estimate t*. Next, we compute the z-score zi for each predicted value ti*, which measures the deviation between ti* and μi in units of standard deviation (see Supplementary Information). We focus on metabolite concentrations and classify predictions with |zi| greater than a threshold θ (or ci10mM for non-intracellular metabolites) as anomalies and investigate whether those indicate missing or unknown mechanisms or model inaccuracies. Finally, we curate the base model with the results of the search. The values in t* represent the unlikely case in which predicted standard free energies are as close as possible to the mean estimate and reactions can operate close to the thermodynamic equilibrium. Because the predicted deviations likely underestimate the true deviations, we use a conservative θ = 1. We repeat this process for a selected set of representative conditions. Note that curation actions are applied to the base model—further condition-specific models can be constructed without needing to curate each condition manually.

2.4 Thermodynamics and flux sampling (TFS)

We use sampling to characterize T in terms of probability distributions of metabolite concentrations as well as probabilities for the different flux modes (see algorithm 1). Geometrically, T is the intersection between an ellipsoid and the orthants with at least one steady-state flux distribution satisfying flux and thermodynamic constraints. This space cannot be defined explicitly, and is usually neither convex nor connected. In fact, an n-dimensional space has 2n orthants, each corresponding to a possible combination of reaction directions in a metabolic network. Clearly, the enumeration—let alone verification of the steady-state condition—of all orthants of T is infeasible with many reversible reactions. We therefore developed a Markov Chain Monte Carlo (MCMC) method based on the Hit-and-Run sampler for General Target Distributions (Bélisle et al., 1993) to sample T (Fig. 2).

graphic file with name btab194ilf1.jpg

Fig. 2.

Fig. 2.

Overview of TFS. (A) Example network, where left-to-right fluxes are positive. At steady state v2 = v1 and ΔrG3=ΔrG1+ΔrG2. (B) We sample the thermodynamic space using a modified Hit-and-Run algorithm. At each step, we limit sampling to portions of the space that allow steady-state flux distributions (orange). (C) As a result, we estimate the probability of each mode. (D) Finally, we sample the flux space drawing from each mode a number of samples proportional to its probability

Choice of initial points. For MCMC simulations it is generally advised to simulate multiple chains, choosing the starting points from an over-dispersed distribution, to detect if the chains explore the entire space. For each reversible reaction, we use PMO to find two sets of reaction directions over the network: one that maximizes the flux through the reaction, and one that minimizes it. Then, again using PMO, for each set of directions we search for a point si in T that satisfies the directionality constraints. We require each si to have maximal distance from the boundary of the respective space to reduce the risk of numerical errors. Finally, we start simulating one chain from each si.

Finding the intersection I. At each step of Hit-and-Run we first intersect the ray with the ellipsoid defined by (7) and a set of linear inequalities constraining the signs of free energies of irreversible reactions. Without a close-form description of the boundary of the space, finding the orthants that allow steady-state flux solutions is challenging. Despite the exponential number of orthants, a ray in an n-dimensional space only crosses at most n +1 of them. Thus, at each step, we identify all orthants intersected by the ray and discard those that do not allow a steady-state flux distribution. We verify the steady-state condition with a linear program that checks the existence of a flux distribution satisfying (1) and the directionality constraints of the orthant. For each feasible orthant, we add its intersection with the ray to I.

Sampling from I. We sample the next point along a ray proportionally to the target probability distribution, which is a Multivariate Normal (MVN). First, we restrict the MVN to a normal distribution along the 1-dimensional ray. We then compute the Cumulative Distribution Functions (CDFs) for each valid segment of I to determine its weight, sample a valid segment according to these weights and finally sample a point in the segments according to the associated truncated normal distribution on the segment. We optionally store the sample and the samples count for the selected orthant to estimate the orthant’s probability.

Reducing the dimensionality. Our method for sampling T so far neglects that only reaction free energies determine flux directions, and that free energies are determined only by concentration ratios, not absolute concentrations. Thus, we can reduce the dimensionality of the problem by not representing metabolite concentrations explicitly and solving (4) using μr and Σr instead of μt and Σt. This allows faster convergence of the chains. Afterwards, for each sample of ΔrGi, we sample the corresponding ln c and ΔrG° from the distribution of t conditioned on ΔrGi. This distribution can be constructed from the Schur complement of Σr in Σt and has the form of an MVN, which can be sampled efficiently:

ln cΔrG′° |ΔrGiNμcμ°+BΣr1(ΔrGiμr), [Σc00Σ°] BΣr1B, (11)

where B=[RTΣcSΓΣ°] is a submatrix of Σt.

Convergence. Since each sampled orthant is full-dimensional and all orthants are reachable, asymptotic convergence to arbitrary target distributions is guaranteed (Bélisle et al., 1993). However, there is no theoretical bound on the convergence rate. We therefore asses the quality of the sampled chains using recommended criteria on the chains’ Potential Scale Reduction Factor (PSRF) and Effective Sample Size (ESS) (Gelman et al., 2013). Several hit-and-run samplers reformulate the problem in isotropic parametrization to improve convergence (Haraldsdóttir et al., 2017). Since the distribution of m is a standard MVN, the parametrization already has a potentially high isotropy but this may not be optimal because flux constraints exclude parts of the solution space. To improve convergence, we initially run shorter chains and compute the PSRF of each reaction energy. We then adjust the parametrization such that the space is better explored along dimensions showing high PSRFs.

Flux sampling with thermodynamic prior. Finally, we sample the flux space using the estimated distributions of reaction directions as prior. We assume that the probability of an orthant is determined by its probability in the thermodynamic space, for which we already obtained estimates. For each direction sample di{1,1}γ, one can easily construct and sample the flux space defined by (1) and constrained by di using Coordinate Hit-and-Run with Rounding (CHRR) (Haraldsdóttir et al., 2017). However, this is computationally prohibitive for models with millions of thermodynamically realistic orthants. We therefore approximate the distribution in flux space by selecting no random orthants according to their relative probability. For each orthant, we then use CHRR to draw a number of flux samples proportional to its probability.

2.5 Data, models and implementation

We validated our methods on data for Escherichia coli growing on different carbon sources (Gerosa et al., 2015) with models based on iML1515 (Monk et al., 2017). The dataset contains measured growth rates, uptake and secretion rates for 10 metabolites, absolute concentrations for 42 metabolites and 13C estimates for 26 fluxes in core metabolism. Measurements were performed in minimal media with eight different carbon sources. iML1515 did not admit solutions satisfying the measured rates on galactose and gluconate, potentially due to its fixed biomass composition and ATP maintenance. Thus, we only performed simulations on acetate, fructose, glucose, glycerol, pyruvate and succinate.

We generated models in three steps: (i) We used NetworkReducer’s lossy reduction (Erdrich et al., 2015) to simplify the subsystems related to lipids and cofactors metabolism. (ii) We applied the results of the thermodynamic curation and integrated condition-specific constraints such as media composition and measured growth, uptake and secretion rates. The distribution of each intracellular metabolite was set to the log-normal distribution of all metabolites in the acetate and glucose conditions. We used the same mean, but a 5-fold larger standard deviation, for periplasmic and extracellular metabolites (except the ones provided in the media, for which we used the actual concentrations). Measured intracellular concentrations were added only for thermodynamic assessment and when specified in Results. (iii) We applied NetworkReducer’s lossless compression and manual simplifications to obtain a reduced model. We focused on keeping carbon, amino acid and nucleotide metabolism untouched; hence we will refer to the generated models as iML1515-CAN. The models used for TMFA and US simulations were generated with the same approach, but without integrating the results of thermodynamic curation. For PTA and TMFA simulations, the set of thermodynamically constrained reactions Γ consisted of all reactions except for biomass, exchange, water transport and export of all metabolites other than CO2. The dimensions of the models were 881n883,500m503 and 715γ721.

We implemented the TFS and CHRR (used also for comparisons against US) samplers in C++ and validated them against established approaches (see Supplementary Information). We provide Python and MATLAB (Mathworks, Natick, MA) interfaces for all methods, using COBRApy (Ebrahim et al., 2013) and the COBRA Toolbox (Heirendt et al., 2019) for model manipulation and the Gurobi Optimizer (Gurobi Optimization, Beaverton, OR) to solve PMO problems.

3 Results

3.1 Thermodynamic assessment of iML1515-CAN

First, we searched for inconsistent irreversibilities in our reduced version of the E.coli model iML1515 (Monk et al., 2017; see Section 2). Seven irreversible reactions had to be set to reversible to make the model thermodynamically feasible. Manual inspection of the standard reaction energies confirmed that all these reactions could be reversed at physiological conditions. For example, we identified a forced internal cycle in propionate metabolism (Fig. 3A), which we addressed by making acetate-CoA ligase (ACCOAL) and propanoyl-CoA: succinate CoA-transferase (PPCSCT) reversible and by allowing secretion of propionate, as it must be themodynamically favorable.

Fig. 3.

Fig. 3.

Thermodynamic assessment of iML1515-CAN. (A) Forced internal cycle in propionate metabolism (orange). The irreversibility of ACCOAL conflicts with the direction of phosphate acetyltransferase (PTA2) and propionate kinase (PPAKr). Moreover, due to the direction of PPCSCT and a shared pool of cofactors (green), it also incorrectly implies that conversion of succinyl-CoA to succinate in the TCA by succinyl-CoA synthetase (SUCOAS) is unfavorable. (B) After using PMO to find t* (red cross), we compute the z-scores of each metabolite and metabolites, i.e. the normalized distance from the mean (black cross). (C) Metabolites with high z-score (‘before’) are selected. After literature search, 29 out of the 36 flagged by the analysis led to improvements in the model, as shown by smaller z-scores in the curated model (‘after’). (D) A symptomatic case suggesting substrate channeling. An unfavorable reaction produces an intermediate (glutamate-5-phosphate) that must have low concentration to maintain thermodynamic feasibility and is followed by a favorable reaction. The total reaction energies with and without channeling are not identical because the estimated concentrations differ

After resolving the remaining reversibility conflicts, we searched for quantitative anomalies. We selected one glycolytic (glucose) and one gluconeogenic (actetate) growth condition and applied PMO to the corresponding models constrained by the measured concentrations.

We identified 36 unique anomalies in the two models, summarized in Table 1 and detailed in Supplementary Information. The most common reason for anomalies (16 metabolites) is the possible occurrence of substrate channeling. Thermodynamics-based modeling assumes a well-mixed system. If two or more enzymes catalyzing consecutive steps of a pathway assemble in a complex, reaction products may find themselves close to the binding pockets of the subsequent enzymes, which increases the reaction probability. In such cases, the intermediate is said to be channeled between the two enzymes, breaking the assumption of homogeneity. As a result, reactions that would require unusually high or low concentrations of a reactant can become feasible even at unfavorable intracellular concentrations. An example highlighted by PMO are the first two steps of proline synthesis, ProB and ProA (Fig. 3D). Constrained by the measured concentrations of glutamate and energy cofactors, PMO predicts a best-case maximum concentration of 14μM for the intermediate glutamate-5-phosphate, 1.4 standard deviations below the mean estimate. Thus, we hypothesized that glutamate-5-phosphate is channeled between ProB and ProA, relieving the thermodynamic constraint on its intracellular concentration, which is consistent with literature (Marco-Marín et al., 2007). Consequently, we updated the model by replacing the two steps with a single net reaction.

Table 1.

Summary of the anomalies in the concentrations predicted by PMO and the resulting modifications to the model in the curation (third column: number of reactions added, removed or converted to lumped reactions)

Reasons for anomalies Anomalies Reactions
Substrate channeling with literature support 9 16
Substrate channeling in gluconeogenesis 3 10
Substrate channeling without literature support 4 0
Incomplete stoichiometry 2 2
Inaccurate irreversibility annotations 6 9
Inaccurate group contribution estimates 4 3
Artifacts of steady-state constraints 5 5
False positives 3 0
Total 36(7.2%) 45(6.2%)

Note: Percentage values are relative to m and γ, respectively.

We predicted and found literature support for substrate channeling in several other pathways (see Supplementary Information for details). Most interestingly, with available metabolomics data, four steps of gluconeogenesis (phosphoglycerate mutase, phosphoglycerate kinase, triose-phosphate isomerase and fructose-biphosphate aldolase) appear unfavorable, suggesting enzyme complexes with more favorable steps (phosphoenolpyruvate carboxykinase and fructose-bisphosphatase) that promote substrate channeling. This is consistent with 13C and kinetic studies showing channeling of glycolytic compounds (Abernathy et al., 2019; Rakus et al., 2004).

In addition, two anomalies revealed that the first step of serine synthesis, phosphoglycerate dehydrogenase, is unfavorable with NAD as cofactor. It is known that a more complex mechanism couples the oxidation of 3-phosphoglycerate to the reduction of ubiquinone, which is more favorable (Zhang et al., 2017). We attributed 15 anomalies to inaccurate ΔrG° estimates, inaccurate irreversibility annotations and cases in which dilution effects likely dominate the steady-state constraint. Only three anomalies were false positives in which concentrations far from the average are expected, as for intracellular oxygen. Overall, 91.7% of the metabolites flagged after PMO reflected model inaccuracies or knowledge gaps, leading to the curation of 45 reactions, 6.2% of the reactions with thermodynamic constraints.

3.2 Sampling modes and directions in iML1515-CAN

We used TFS to sample the thermodynamic space of iML1515-CAN in different growth conditions, both with (M+, all conditions) and without (M, all conditions except glucose and acetate, where metabolomics data have been used to curate the model) integrating metabolomics data. Specifically, we collected 108 samples of orthants and 105 samples of ΔrG. In each condition, PMO successfully found initial points and the sampler reached convergence to the target distribution according to common metrics (see Section 2). Interestingly, TFS only needed 401 variables to represent ΔrG of the 717 reactions in Γ. In models constrained only with measured growth and exchange rates, we discovered 16.5×106 orthants, out of which 11.5×106 suffice to cover 95% of the thermodynamic space. These numbers decreased to 11.3×106 and 7.2×106 once we integrated measured metabolite concentrations. Overall, this represents a substantial reduction over the average theoretical maximum number of orthants (computed from the number of reversible reactions) in the TMFA-constrained models (M: 5.8×1022,M+: 3.1×1020). Hence, correlations in thermodynamic space may impose important constraints on possible combinations of flux directions.

We compared the predicted reaction directions of TFS, TMFA and US on TMFA-constrained models and validated them against 13C estimates. For TFS and US, a prediction was considered irreversible when at least 95% of the samples represented the same direction. On average, TMFA could only constrain a small fraction of the models’ 71–79 reversible reactions, in particular without metabolomics data (Fig. 4A). US yielded the most constrained predictions, while TFS predicted irreversibilities for about half of the 102–105 reversible reactions in the PMO-curated models. Despite having the highest precision, US also showed the lowest accuracy: it incorrectly predicted the direction of part of the 11–15 reversible fluxes with available 13C estimates. Incorrect predictions focus on two main pathways in gluconeogenic conditions: upper glycolysis, where US systematically prefers fructose 6-phosphate aldolase over fructose-bisphosphate aldolase, and the non-oxidative pentose phosphate pathway. In one condition, these choices result from TMFA incorrectly predicting fructose-bisphosphate aldolase to be irreversible. In all conditions, the irreversibilities predicted by TFS were consistent with the 13C estimates.

Fig. 4.

Fig. 4.

Reaction directions and fluxes. (A) Precision and accuracy of predicting directionalities with different methods. Predicted irreversibilities are given as percentage of the average number of reversible reactions in the different conditions. The percentage of incorrectly predicted irreversibilities is relative to the average number of fluxes that are reversible in the models and have 13C estimates. (B) Example of multimodal predictions of fluxes with TFS for the synthesis of purines and glycine. Orange: serine is converted to glycine, donating one carbon to form 10-formyltetrahydrofolate (10thf), a cofactor for purine synthesis. Blue: glycine is synthesized from threonine and is converted to CO2 and 10thf. Histograms show predicted flux distributions, clustered by the direction of GHMT2r. (C) Distribution of ΔrG° (left) and the activity term (right) of GHMT2r, clustered by the direction of the reaction. Si is the i-th row of SΓ. (D) Cohen’s d for the distributions of the standard reaction energies and of the activity terms in the two clusters for all reactions in Γ

Beyond the validated fluxes, we found cases (M: 11–18, M+: 9–13) where both US and TFS predicted a reaction to be irreversible, but in different directions, mostly for reactions that can participate in internal cycles and when alternative paths exist for the synthesis of the same compound. For example, 5-phospho-alpha-D-ribose 1-diphosphate is generally synthesized directly from ribose 5-phosphate (as predicted by TFS), but US predicts a conversion in the opposite direction and 5-phospho-alpha-D-ribose 1-diphosphate synthesis through a longer and stoichiometrically equivalent path via ribose 1-phosphate and ribose 1,5-diphosphate. While literature suggests that the directions predicted by TFS are correct (see Supplementary Information), the available data does not allow for a systematic validation. These differences highlight the effect of using different spaces to determine reaction directions. In US, the probability of a direction is determined by the fraction of flux space that allows it; this favors directions that impose minimal constraints on the rest of the network. In contrast, TFS predicts directions based on the probability of having a favorable reaction energy.

3.3 Exploration of metabolic modes

Using the samples of orthants and their weights we sampled the flux space of 104 orthants for each condition. TFS predicted millions of modes in iML1515-CAN, but focusing on the predicted flux distribution of individual reactions one observes fewer than 100 marginal modes, which have simpler interpretation. An example are the two modes defined by the direction of glycine hydroxymethyltransferase (GHMT2r) (Fig. 4B). Physiologically, serine is used to synthesize glycine and feed 1C-metabolism, required for purine synthesis. US only predicts this path. TFS predicts an additional mode where glycine and 1C are synthesized from threonine. This threonine bypass is not commonly used by E.coli, likely because of higher resource requirements. However, it has been successfully used to enhance PHB production (Lin et al., 2015). Interestingly, most reactions in glycolysis and the TCA display several marginal modes, depending on the direction of reactions in the rest of the network. While the effect of reversing a reaction on the fluxes can be analyzed with classical FBA or US, TFS provides additional information: the probability (and feasibility) of each direction and the correlation between the direction and the thermodynamic quantities. In the case of GHMT2r, the two modes can be better distinguished by the activity term—more precise estimates of ΔrG° would contribute little information about the reaction direction (Fig. 4C). Conversely, knowing the direction of the reaction would constrain the ratio of serine and glycine, and thus their concentrations. We computed Cohen’s d for the distribution of ΔrG° and of the activity term in the two modes for each reversible reaction (Fig. 4D). In most cases, the modes are better separated by the activity term, suggesting that uncertainties in the standard reaction energy estimates have limited impact. Additional measurements of absolute metabolite concentrations will likely constrain more reaction directions.

3.4 Prediction of metabolite concentrations

Finally, we compared the metabolite concentrations predicted by TFS and TMFA to the experimentally measured values in the four M conditions. TFS predictions generally agree with the measured values and highlight strong thermodynamic constraints in the concentrations of some metabolites, such as the NAD/NADH ratio and organic phosphate (Fig. 5A). The mean concentrations predicted by TFS correlate with the experimental measurements (r =0.32, Fig. 5B). For several metabolites, the predicted distribution was very close to the prior distribution [Kullback-Leibler divergence (KL) smaller than 0.2], suggesting that they participate in reactions that are always favorable. Reducing the coverage of the predictions to metabolites with KL 0.2 led to a higher correlation (r =0.63), showing that their concentration is thermodynamically constrained. When we focused on the 95% Confidence Interval (CI) of the samples and compared them to the ranges predicted by TMFA (with 95% CI on the distributions of ΔrG° and ln c), TFS always predicted narrower ranges (Fig. 5C), with an average 4.5-fold reduction (0.65 on the log10-scale). We quantified the overlap of the predicted distributions (uniform within the predicted interval for TMFA) with the distribution of the measurements using the Hellinger distance (Fig. 5D). In 66% of the cases, TFS agreed better with the measurements than TMFA, further supporting its overall higher predictive power for metabolite concentrations.

Fig. 5.

Fig. 5.

Metabolite concentrations. (A) Examples of distributions predicted by TFS (orange), the respective 95% confidence intervals (red), ranges predicted by TMFA (blue) and mean and 95% confidence intervals of the metabolomics measurements (black, range from literature in gray). (B) Mean concentrations predicted by TFS agree with measured values (Pearson’s r =0.32, Root Mean Square Error RMSE =0.87 in log10 scale). Reducing the coverage to metabolites with a KL between prior and posterior distribution above 0.2 (full circles) increases the agreement to r =0.63, RMSE =0.72. (C) Cumulative distribution of the reduction in the width of the predicted interval of TFS over TMFA for the validated metabolites. (D) Hellinger distances between the distribution of the predicted and measured metabolite concentrations for TMFA and TFS

4 Discussion

Realistic models must obey the laws of thermodynamics: quantitative observations inform on the model structure, and mechanistic knowledge constrains parameters. Here, we show how this connection can be exploited in both directions to analyze metabolism. Our PTA framework for thermodynamic-based modeling includes optimization and sampling methods for the evaluation and exploration of metabolic networks. As demonstrated for E.coli, it allows to combine metabolic models with experimental data to discover quantitative and structural model inaccuracies and generate hypotheses on mechanisms such as substrate channeling. We expect this capability to be particularly important for less well-characterized organisms. Conversely, we showed that the stoichiometry of a network, coupled with thermodynamic constraints and few physiological observations can significantly reduce the space of feasible metabolic modes and metabolite concentrations.

In contrast to other methods, PTA interprets uncertainties in the thermodynamic parameters probabilistically to account for correlations between uncertainties of multiple estimates. Moreover, we require all reactions to have a well-defined direction. The model cannot silently resolve thermodynamic inconsistencies by setting a flux to zero as in TMFA—problems are visible to the modeler, guiding model curation and the formulation of novel hypotheses. Using the thermodynamic space as prior for sampling the flux space changed the exploration principle from the one used by US. Instead of using the volume of the flux space only, TFS determines the probability of a mode based on the probability of having a set of reaction energies allowing it. This excludes thermodynamically infeasible cycles from the solution. It also prevents distributed thermodynamic bottlenecks (Mavrovouniotis, 1993) and highlights feasible modes that would be missed by US, such as the threonine bypass.

However, PTA comes at the cost of higher computational complexity. Enforcing non-zero flux for each reaction makes optimization problems more difficult to solve, and sampling the non-convex thermodynamic space requires long simulation times; the asymptotic complexity is potentially exponential. Here, we successfully analyzed models with up to 880 reactions. For each condition, enumeration of internal EFMs and solution of PMO required less than five minutes, while sampling took approximately one day. To make larger models tractable, PTA can trade speed for coverage by limiting thermodynamic constraints to compartments or subsystems of interest.

To discover new biology, PMO proved effective at identifying occurrences of substrate channeling. While TMFA suggested such occurrences (Chiappino-Pepe et al., 2017), predictions were limited to cases where predicted concentrations in an essential pathway exceed physiological ranges. Most of our predictions were supported by at least another source in E.coli or other organisms, and we also obtained novel hypotheses. That gluconeogenesis appears unfavorable is puzzling: several observations suggest channeling of glycolytic compounds. Yet, this observation could be attributed to inaccurate estimates of standard reaction energies, for example, because magnesium ions, which affect the thermodynamics of glycolysis (Vojinović and von Stockar, 2009), are insufficiently accounted for. Additionally, the analysis showed that the curation of reaction directions for FBA applications leads to thermodynamic inaccuracies. As coverage and precision of standard reaction energies and metabolite concentrations increases, we therefore encourage to model thermodynamics explicitly.

Finally, while TFS predicts significantly fewer modes than contained in the flux space, these modes cover a wide range of behaviors. Thermodynamics alone does not determine what a cell does, but it restricts what options are available to achieve a certain phenotype. The final choice probably follows from other constraints or principles such as enzyme cost and minimal resource usage, whose integration could further increase the predictive power of thermodynamically constrained models.

Supplementary Material

btab194_supplementary_data

Acknowledgements

The authors thank Elad Noor for the help using equilibrator-api, Axel Theorell for comments on the manuscript, and Fabian Rudolf for helpful discussions.

Funding

This work was supported by a Swiss National Science Foundation Sinergia project [177164].

Conflict of Interest: none declared.

Contributor Information

Mattia G Gollub, Department of Biosystems Science and Engineering and SIB Swiss Institute of Bioinformatics, ETH Zurich, Basel 4058, Switzerland.

Hans-Michael Kaltenbach, Department of Biosystems Science and Engineering and SIB Swiss Institute of Bioinformatics, ETH Zurich, Basel 4058, Switzerland.

Jörg Stelling, Department of Biosystems Science and Engineering and SIB Swiss Institute of Bioinformatics, ETH Zurich, Basel 4058, Switzerland.

References

  1. Abernathy M.H.  et al. (2019) Comparative studies of glycolytic pathways and channeling under in vitro and in vivo modes. AIChE J., 65, 483–490. [Google Scholar]
  2. Bélisle C.J.  et al. (1993) Hit-and-run algorithms for generating multivariate distributions. Math. Oper. Res., 18, 255–266. [Google Scholar]
  3. Binns M.  et al. (2015) Sampling with poling-based flux balance analysis: optimal versus sub-optimal flux space analysis of actinobacillus succinogenes. BMC Bioinformatics, 16, 49. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Chiappino-Pepe A.  et al. (2017) Bioenergetics-based modeling of plasmodium falciparum metabolism reveals its essential genes, nutritional requirements, and thermodynamic bottlenecks. PLoS Comput. Biol., 13, e1005397. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Desouki A.A.  et al. (2015) Cyclefreeflux: efficient removal of thermodynamically infeasible loops from flux distributions. Bioinformatics, 31, 2159–2165. [DOI] [PubMed] [Google Scholar]
  6. Du B.  et al. (2018a) Temperature-dependent estimation of Gibbs energies using an updated group-contribution method. Biophys. J., 114, 2691–2702. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Du B.  et al. (2018b) Thermodynamic favorability and pathway yield as evolutionary tradeoffs in biosynthetic pathway choice. Proc. Natl. Acad. Sci. USA, 115, 11339–11344. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Ebrahim A.  et al. (2013) Cobrapy: constraints-based reconstruction and analysis for python. BMC Syst. Biol., 7, 74–76. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Erdrich P.  et al. (2015) An algorithm for the reduction of genome-scale metabolic network models to meaningful core models. BMC Syst. Biol., 9, 48. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Gelman A.  et al. (2013) Bayesian Data Analysis. CRC Press, Boca Raton, FL. [Google Scholar]
  11. Gerosa L.  et al. (2015) Pseudo-transition analysis identifies the key regulators of dynamic metabolic adaptations from steady-state data. Cell Syst., 1, 270–282. [DOI] [PubMed] [Google Scholar]
  12. Goldberg R.N.  et al. (2004) Thermodynamics of enzyme-catalyzed reactions—a database for quantitative biochemistry. Bioinformatics, 20, 2874–2877. [DOI] [PubMed] [Google Scholar]
  13. Haraldsdóttir H.  et al. (2012) Quantitative assignment of reaction directionality in a multicompartmental human metabolic reconstruction. Biophys. J., 102, 1703–1711. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Haraldsdóttir H.S.  et al. (2017) CHRR: coordinate hit-and-run with rounding for uniform sampling of constraint-based models. Bioinformatics, 33, 1741–1743. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Heinonen M.  et al. (2019) Bayesian metabolic flux analysis reveals intracellular flux couplings. Bioinformatics, 35, i548–i557. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Heirendt L.  et al. (2019) Creation and analysis of biochemical constraint-based models using the cobra toolbox v. 3.0. Nat. Protoc., 14, 639–702. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Henry C.S.  et al. (2007) Thermodynamics-based metabolic flux analysis. Biophys. J., 92, 1792–1805. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Jankowski M.D.  et al. (2008) Group contribution method for thermodynamic analysis of complex metabolic networks. Biophys. J., 95, 1487–1499. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Keaty T.C., Jensen P.A. (2020) Gapsplit: efficient random sampling for non-convex constraint-based models. Bioinformatics, 36, 2623–2625. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Lin Z.  et al. (2015) Metabolic engineering of Escherichia coli for poly (3-hydroxybutyrate) production via threonine bypass. Microb. Cell Factories, 14, 185. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Marco-Marín C.  et al. (2007) A novel two-domain architecture within the amino acid kinase enzyme family revealed by the crystal structure of Escherichia coli glutamate 5-kinase. J. Mol. Biol., 367, 1431–1446. [DOI] [PubMed] [Google Scholar]
  22. Mavrovouniotis M.L. (1993) Identification of localized and distributed bottlenecks in metabolic pathways. ISMB, 93, 273–283. [PubMed] [Google Scholar]
  23. Monk J.M.  et al. (2017) iML1515, a knowledgebase that computes Escherichia coli traits. Nat. Biotechnol., 35, 904–908. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Noor E.  et al. (2013) Consistent estimation of Gibbs energy using component contributions. PLoS Comput. Biol., 9, e1003098. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Orth J.D.  et al. (2010) What is flux balance analysis?  Nat. Biotechnol., 28, 245–248. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Park J.O.  et al. (2016) Metabolite concentrations, fluxes and free energies imply efficient enzyme usage. Nat. Chem. Biol., 12, 482–489. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Price N.D.  et al. (2004) Uniform sampling of steady-state flux spaces: means to design experiments and to interpret enzymopathies. Biophys. J., 87, 2172–2186. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Rakus D.  et al. (2004) Interaction between muscle aldolase and muscle fructose 1, 6-bisphosphatase results in the substrate channeling. Biochemistry, 43, 14948–14957. [DOI] [PubMed] [Google Scholar]
  29. Ramon C.  et al. (2018) Integrating –omics data into genome-scale metabolic network models: principles and challenges. Essays Biochem., 62, 563–574. [DOI] [PubMed] [Google Scholar]
  30. Saa P.A., Nielsen L.K. (2016) ll-achrb: a scalable algorithm for sampling the feasible solution space of metabolic networks. Bioinformatics, 32, 2330–2337. [DOI] [PubMed] [Google Scholar]
  31. Schilling C.H.  et al. (2000) Theory for the systemic definition of metabolic pathways and their use in interpreting metabolic function from a pathway-oriented perspective. J. Theor. Biol., 203, 229–248. [DOI] [PubMed] [Google Scholar]
  32. Terzer M., Stelling J. (2008) Large-scale computation of elementary flux modes with bit pattern trees. Bioinformatics, 24, 2229–2235. [DOI] [PubMed] [Google Scholar]
  33. Vojinović V., von Stockar U. (2009) Influence of uncertainties in pH, pMg, activity coefficients, metabolite concentrations, and other factors on the analysis of the thermodynamic feasibility of metabolic pathways. Biotechnol. Bioeng., 103, 780–795. [DOI] [PubMed] [Google Scholar]
  34. Williams H.P. (2013) Model Building in Mathematical Programming. John Wiley & Sons, Hoboken, NJ. [Google Scholar]
  35. Zhang W.  et al. (2017) Coupling between d-3-phosphoglycerate dehydrogenase and d-2-hydroxyglutarate dehydrogenase drives bacterial l-serine synthesis. Proc. Natl. Acad. Sci. USA, 114, E7574–E7582. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

btab194_supplementary_data

Articles from Bioinformatics are provided here courtesy of Oxford University Press

RESOURCES