Rapid Prediction of Bacterial Heterotrophic Fluxomics Using Machine Learning and Constraint Programming

Stephen Gang Wu; Yuxuan Wang; Wu Jiang; Tolutola Oyetunde; Ruilian Yao; Xuehong Zhang; Kazuyuki Shimizu; Yinjie J Tang; Forrest Sheng Bao

doi:10.1371/journal.pcbi.1004838

2016 Apr 19;12(4):e1004838.

Editor: Christos A Ouzounis

PMCID: PMC4836714 PMID: 27092947

Abstract

¹³C metabolic flux analysis (¹³C-MFA) has been widely used to measure in vivo enzyme reaction rates (i.e., metabolic flux) in microorganisms. Mining the relationship between environmental and genetic factors and metabolic fluxes hidden in existing fluxomic data will lead to predictive models that can significantly accelerate flux quantification. In this paper, we present a web-based platform, MFlux (http://mflux.org), that predicts bacterial central metabolism via machine learning, leveraging data from approximately 100 ¹³C-MFA papers on heterotrophic bacterial metabolism. Three machine learning methods, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Decision Tree, were employed to study the sophisticated relationship between influential factors and metabolic fluxes. We performed a grid search for the best parameter set of each algorithm and verified their performance through 10-fold cross validation. SVM yields the highest accuracy among the three algorithms. Further, we employed quadratic programming to adjust flux profiles to satisfy stoichiometric constraints. Multiple case studies have shown that MFlux can reasonably predict fluxomes as a function of bacterial species, substrate types, growth rate, oxygen conditions, and cultivation methods. Because most studies focus on model organisms grown on particular carbon sources, the fluxome dataset is biased, which may limit the applicability of the machine learning models. This limitation can be eased as more ¹³C-MFA papers are published for non-model species.

Author Summary

Metabolic information is important for disease treatment, bioprocess optimization, environmental remediation, biogeochemical cycle regulation, and our understanding of life’s origin and evolution. ¹³C-MFA can quantify microbial physiology at the level of metabolic reaction rates. To speed up microbial characterizations and fluxomic studies, we hypothesize that genetic and environmental factors generate specific fluxome patterns that can be recognized by machine learning. Aided by constraint programming and quadratic optimization, our platform based on machine learning (ML) can predict meaningful metabolic information about bacterial species in their environments. Further, it can offer constraints to improve the accuracy of flux balance analysis. This study infers that the bacterial metabolic network has a certain degree of rigidity in allocating carbon fluxes, and different microbial species may share common regulatory strategies for balancing carbon and energy metabolisms. As a proof of concept, we demonstrate that the use of data-driven artificial intelligence (AI) approaches, e.g., ML, may assist mechanistic based models to elucidate the topology of microbial fluxomes.

Introduction

With the advent of systems biology tools, such as genomics, transcriptomics, proteomics, and metabolomics during the last decade, the understanding of intracellular metabolisms from genotype to phenotype has been dramatically boosted. Notably, ¹³C metabolic flux analysis (¹³C-MFA) enables the quantification of metabolic reaction rates in vivo [1]. It determines carbon metabolic fluxes using the mass isotopomer distribution (MID) of proteinogenic amino acids or free metabolites from ¹³C labeling experiments. ¹³C-MFA is considered as a reliable measurement of central metabolic reaction rates [2], which has demonstrated its power in discovering novel pathways [3, 4], validating gene functions [3], verifying engineered strains [5, 6], and revealing energy metabolism of host strains [7]. In the past decade, advanced parallel bioreactor systems, mass spectrometry, and computational tools resolving metabolic fluxes have been developed [8–11], which improved the precision of flux profiles [12] and extended ¹³C-MFA’s application to the non-stationary metabolic phase [13, 14]. On the other hand, broad applications of ¹³C-MFA are still hindered because ¹³C experiments, biomass analysis, and flux calculations are expensive and time-consuming [15]. Moreover, some microbial systems may not be amenable to ¹³C-MFA if they require complex nutrients or their genome annotation is incomplete [16]. Before performing ¹³C-MFA on non-model species, laborious work is needed to examine extracellular metabolites, to characterize unknown pathways, and to analyze biomass compositions.

This study aims to employ an artificial intelligence (AI) approach called machine learning (ML) to investigate bacterial fluxomics patterns. ML is a powerful tool in systems biology [17] and has demonstrated successes in omics studies [18, 19]. For example, the precision of genome annotation on the model species C. elegans has been significantly enhanced by employing a simplified Support Vector Machine (SVM) method. Researchers have reached an accuracy of 75% on controversial genes [20]. At the transcriptomics level, ML approaches have been frequently used for disease identification. For instance, SVM has successfully recognized the gene expression patterns of hepatocellular carcinoma (HCC) [21], diffuse large B-cell lymphoma (DLBCL) [22] and ovarian cancer [23]. At the proteomics level, Supek et al. have employed a combined approach by integrating the Principal Component Analysis (PCA) method with SVM, to enhance analytic power in identifying “fingerprint” proteins (i.e., unique proteins in each tissue) from different horseradish tissues (leaf, teratoma, and tumor) grown in vitro [24]. In metabolomics, an SVM method can resolve the NMR data of metabolites in urine samples from different groups of people (healthy vs. pneumonia) [25]. In metabolic modeling, Karp’s group have adopted ML algorithms to predict the existence of various pathways for metabolic network reconstruction in different organisms [19].

The general idea of ML is to statistically build a numerical predictive model, or estimator: a function f : X ↦ y that maps a vector of numbers called the feature vector to a vector of numbers called the target or the label. In many cases, the target is a 1-D vector, or a scalar. One may consider the feature vector as the input and the target as the output of the model. If the target takes discrete values, we call the ML model a classifier; otherwise, we call it a regressor. A commonly used classifier is the binary classifier, where the cardinality (size) of the target set y is 2, e.g., y = {+1,−1}. In this paper, we build a regressor $f : \mathbb{R}^{n} \mapsto \mathbb{R}$ for each flux, where $\mathbb{R}$ stands for the set of all real numbers. In supervised ML, a pair of a feature vector and a target forms a training sample. Given a finite set of N samples {(X₁, y₁), …, (X_N, y_N)}, an ML algorithm finds such a function, usually by solving a numerical optimization problem that minimizes the predictive error. Samples used to train a model form the training set, while those used to test its performance form the test set. Given a new piece of data, numerically represented as a vector X_new, the model f predicts the target f(X_new), e.g., in this paper, a flux value given the reaction parameters represented by the vector X_new. The models learned through ML are usually not analytical models that can be represented using equations. Rather, they are numerical operators. For example, an artificial neural network (ANN) model can be represented by a series of weight and bias matrices, one for each layer. A poor model can only predict well on the training set, as if it only “remembers” the training samples, while a good model can learn the patterns among data and still be accurate on samples it has never “seen”. Hence, researchers make the training and test sets mutually exclusive. A mechanism called cross validation is used to ensure the mutual exclusiveness of training and test sets while making full use of the dataset.

A distinct advantage of ML applications is that they can reduce the need for costly experimental supplies and time-consuming benchwork. Despite the progress in utilizing ML methods in systems biology, there is no similar application in the fluxomics field to predict the flux profile. Therefore, we conceived the idea of integrating ML strategies with fluxomics research. To employ ML methods efficiently, a dataset with a sufficient number of samples is a prerequisite. Recently, a ¹³C-MFA dataset named CeCaFDB has been constructed, which includes more than 100 papers, mostly on prokaryotic species [26]. Based on this dataset, five categorical and sixteen continuous features were defined to describe the environmental and genetic factors involved in ¹³C-MFA of bacterial species. Unlike most omics projects employing ML approaches, this work built regressors rather than classifiers: 29 lumped central metabolic fluxes were adopted as the outputs to describe the central carbon metabolism of bacterial species. A 10-fold cross validation evaluated the performance of different algorithms. Furthermore, we included a knowledge-based system to check whether user inputs were biologically meaningful. Lastly, quadratic programming was employed to adjust the fluxes predicted by ML to satisfy stoichiometric constraints. Our web-based platform MFlux provides reasonable predictions of central metabolic flux profiles for 30 bacterial species, and it can be accessed online at http://mflux.org along with the training data. Although our platform is still in an early phase, our attempt to integrate AI approaches with mechanistic models will have broad impacts on both the systems biology and metabolic engineering fields.

Methods

Data collection

The dataset used to build MFlux was constructed from the literature. The total uptake rate of carbon sources is normalized to 100; all other fluxes are normalized relative to the uptake rate of carbon sources. We obtained ¹³C-MFA information for bacterial species from the CeCaFDB dataset and added a few recent papers (approximately 120 papers in total, as of January 2015). ¹³C-MFA data on photosynthetic bacteria were excluded from the ML study because of their diverse CO₂ fixation pathways, light-sensitive fluxomes, and insufficient sample sizes for ML. For photosynthetic species, MFlux currently reports only a general description of their fluxomic features based on the corresponding references.

In heterotrophic microorganisms, interconversions between glycolysis metabolites (phosphoenolpyruvate and pyruvate) and TCA cycle metabolites (oxaloacetate and malate) involve a set of anaplerotic reactions (e.g., phosphoenolpyruvate carboxylase, phosphoenolpyruvate carboxykinase, pyruvate carboxylase, and malic enzyme) serving as a key switch point for carbon flux distribution in bacteria [27]. These reactions, which balance both carbon and cofactors, may be employed by different microbial species. For example, E. coli anaplerotic pathways involve phosphoenolpyruvate carboxylase and malic enzyme, while Bacillus species furnish pyruvate carboxylase (the pyruvate shunt). In the case of Corynebacterium, both phosphoenolpyruvate carboxylase and pyruvate carboxylase are functional [28, 29]. These anaplerotic pathways can re-route fluxes when a central pathway reaction such as pyruvate kinase is knocked out. To ease the ML efforts, the anaplerotic pathways were lumped into two routes that exchange fluxes between the TCA cycle and the glycolysis nodes (MAL → PYR + CO₂ and PEP + CO₂ → OAA). This simplification also reflects the fact that ¹³C-MFA has poor resolution on anaplerotic fluxes because various combinations of these reactions could generate similar labeling patterns in amino acids [30].

Feature vector for ML

As mentioned earlier, supervised ML builds models based on the samples, each of which is a pair of a feature vector and a target. Based on published ¹³C-MFA methodologies and microbial physiologies, we proposed five categorical features: species, nutrient types, oxygen conditions, engineering method, genetic background, and cultivation methods. There were two considerations when choosing those features. First, genetic modifications can significantly re-organize fluxomes. To improve the predictability on mutant strains, our platform allows toggling on or off certain central pathways (by manually setting the flux boundaries) in engineered strains. Second, the factor of cultivation method aims to reveal fluxome differences between shake flask cultures (a pseudo-steady state approach) and bioreactor cultures (a well-controlled fermentation or chemostat cultivation). Meanwhile, we introduced sixteen continuous features: growth rate, substrate uptake rate, and the ratio of multiple substrate uptakes (glucose, fructose, galactose, gluconate, glutamate, citrate, xylose, succinate, malate, lactate, pyruvate, glycerol, acetate and NaHCO₃, as shown in Fig 1). Since the features include both categorical and continuous ones, one-hot encoders were used to convert categorical feature values into real numbers. Each feature was then standardized into zero mean and unit variance as assumed by many ML approaches. For each predicted flux, or the target/label in ML terminology, we scaled it into the interval [0, 1] by the min-max method. In addition to the min-max method, we also tested unit-variance-zero-mean standardization for scaling flux values, and the result was quite similar.
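The three preprocessing steps described above (one-hot encoding, zero-mean/unit-variance standardization, and min-max scaling) can be sketched in plain Python as follows. This is only an illustrative sketch: the function names, the category list, and the numeric values are assumptions made for the example and are not taken from the MFlux code base.

```python
import math

def one_hot(value, categories):
    """Encode one categorical value as a 0/1 vector over a fixed category list."""
    return [1.0 if value == c else 0.0 for c in categories]

def standardize(column):
    """Rescale a feature column to zero mean and unit (population) variance."""
    mean = sum(column) / len(column)
    variance = sum((x - mean) ** 2 for x in column) / len(column)
    std = math.sqrt(variance) or 1.0  # a constant column is left at zero
    return [(x - mean) / std for x in column]

def min_max(column):
    """Rescale a target column (e.g., one flux) into the interval [0, 1]."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant column
    return [(x - lo) / span for x in column]

# Hypothetical example values, for illustration only.
oxygen_levels = ["aerobic", "anaerobic", "microaerobic"]
encoded = one_hot("anaerobic", oxygen_levels)
scaled_growth = standardize([0.2, 0.4, 0.6])
scaled_flux = min_max([10.0, 20.0, 30.0])
```

In this sketch the scaling statistics are computed over the whole column; in a cross-validation setting they would normally be computed on the training folds only.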

Fig 1 — The central carbon metabolic pathway is simplified into 29 fluxes, used as the outputs of our model.

Machine learning algorithms

The problem of predicting fluxes was modeled as a regression problem in ML, where a computer program learns from existing data to estimate continuous variables. Twenty-nine regressors were trained to predict the 29 fluxes. We tested three widely applied ML algorithms: k-nearest neighbors (k-NN), decision tree, and SVM. To ensure a fair comparison, we performed a grid search for the best parameter set of each algorithm. The detailed parameter sets for the 29 SVM-based regression models can be found on our web page. The programming language used for this project was Python 2.7, and the numpy and scikit-learn modules were utilized for machine learning [31]. Program files for training and testing the models are wrapped in S1 Program. The full version, including web-end code, is released under GNU GPL v3 at https://bitbucket.org/forrestbao/influx
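To make the regression setting concrete, the simplest of the three algorithms, k-NN, can be sketched in a few lines of plain Python. The sketch below is an illustration under assumptions (Euclidean distance, unweighted averaging, made-up training data); it is not the scikit-learn implementation used in the study. One such regressor would be trained per flux.

```python
def knn_predict(train_X, train_y, x, k):
    """Predict a continuous target as the mean target of the k nearest samples."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Rank training samples by distance to the query vector x.
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: squared_distance(pair[0], x))
    nearest_targets = [target for _, target in ranked[:k]]
    return sum(nearest_targets) / len(nearest_targets)

# Hypothetical one-feature training set, for illustration only.
train_X = [[0.0], [1.0], [2.0], [3.0]]
train_y = [0.0, 10.0, 20.0, 30.0]
prediction = knn_predict(train_X, train_y, [1.4], k=2)  # averages the targets at 1.0 and 2.0
```

The number of neighbors k is the kind of parameter that a grid search would tune.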

Model evaluation and cross validation

Considering the limited number of samples in the current dataset, we adopted a 10-fold cross validation. An N-fold cross validation works as follows. All samples in the dataset are split into N equal parts. In each iteration, N − 1 parts are used as the training set, while the remaining part serves as the test set. In the next iteration, the test set is rotated to another part of the data, and the training set consists of all other samples. This procedure stops when every part of the data has been used as the test set exactly once and as part of the training set exactly N − 1 times. Finally, the accuracy of the model is calculated by checking the prediction result for each sample. For each flux, the error in cross validation is computed as the Mean Squared Error (MSE).
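The rotation scheme described above can be sketched as follows. This is a minimal illustration in plain Python, not the scikit-learn routine used in the study; the helper names and the mean-predicting baseline model are assumptions made for the example, and the samples are assumed to have been shuffled beforehand.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # spread the remainder evenly
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def cross_val_mse(X, y, fit, predict, n_folds=10):
    """Mean squared error over all held-out predictions."""
    squared_errors = []
    for train_idx, test_idx in k_fold_indices(len(X), n_folds):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            squared_errors.append((predict(model, X[i]) - y[i]) ** 2)
    return sum(squared_errors) / len(squared_errors)

# Baseline stand-in for a real regressor: always predict the training mean.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)

def predict_mean(model, x):
    return model

X = [[float(i)] for i in range(20)]
y = [5.0] * 20
baseline_mse = cross_val_mse(X, y, fit_mean, predict_mean, n_folds=10)
```

Any regressor exposing a fit/predict pair could replace the baseline, which makes the same loop usable for comparing algorithms or parameter sets.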

Stoichiometric constraints and boundary

One unique feature of our method is incorporating the overall mass balance through central metabolic pathways. The stoichiometric equations in Fig 1 under steady state are summarized as follows:

$$\text{G6P}: \quad v_{1} = v_{2} + v_{10} + vbm_{g6p} \tag{1}$$

$$\text{F6P/FBP}: \quad v_{2} + v_{15} + v_{16} + 100 \cdot ratio_{fructose} = vbm_{f6p} + v_{3} \tag{2}$$

$$\text{DHAP}: \quad v_{3} + 100 \cdot ratio_{glycerol} = v_{4} \tag{3}$$

$$\text{GAP}: \quad v_{3} + v_{4} + v_{14} + v_{15} + v_{25} = v_{5} + v_{16} + vbm_{gap} \tag{4}$$

$$\text{3PG}: \quad v_{5} = v_{6} + vbm_{3pg} \tag{5}$$

$$\text{PEP}: \quad v_{6} = v_{7} + v_{28} + vbm_{pep} \tag{6}$$

$$\text{PYR}: \quad v_{7} + v_{25} + v_{29} + 100 \cdot ratio_{pyruvate} = v_{8} + v_{27} + vbm_{pyr} \tag{7}$$

$$\text{AceCoA}: \quad v_{9} + v_{17} + v_{24} + v_{26} + vbm_{accoa} = v_{8} \tag{8}$$

$$\text{Ru5P}: \quad v_{11} = v_{12} + v_{13} \tag{9}$$

$$\text{R5P}: \quad v_{13} = v_{14} + vbm_{r5p} \tag{10}$$

$$\text{E4P}: \quad v_{15} + vbm_{e4p} = v_{16} \tag{11}$$

$$\text{S7P}: \quad v_{14} = v_{16} \tag{12}$$

$$\text{X5P}: \quad v_{12} + 100 \cdot ratio_{xylose} = v_{14} + v_{15} \tag{13}$$

$$\text{6PG}: \quad v_{10} + 100 \cdot ratio_{gluconate} = v_{11} + v_{25} \tag{14}$$

$$\text{CIT}: \quad v_{17} + 100 \cdot ratio_{citrate} = v_{18} \tag{15}$$

$$\text{ICIT}: \quad v_{18} = v_{19} + v_{24} \tag{16}$$

$$\text{AKG}: \quad v_{19} + 100 \cdot ratio_{glutamate} = v_{20} + vbm_{akg} \tag{17}$$

$$\text{SUC}: \quad v_{20} + v_{24} + 100 \cdot ratio_{succinate} = v_{21} + vaa_{1} \tag{18}$$

$$\text{FUM}: \quad v_{21} + vaa_{2} = v_{22} \tag{19}$$

$$\text{MAL}: \quad v_{22} + v_{24} + 100 \cdot ratio_{malate} = v_{23} + v_{29} \tag{20}$$

$$\text{OAA}: \quad v_{23} + v_{28} = v_{17} + vbm_{oaa} \tag{21}$$

Specifically, v₁ represents the flux from the carbon substrate (either glucose or galactose) to G6P, since both glucose and galactose can be catabolized to G6P; vaa₁ and vaa₂ represent fluxes involved in biomass building block synthesis or extracellular products; and vbm represents carbon fluxes going to biomass from different precursors [32].

A series of linear constraints can be derived from the stoichiometric equations above and used to restrain fluxes predicted by the ML methods:

$$v_{1} - 100 \cdot (ratio_{glucose} + ratio_{galactose}) = 0 \tag{22}$$

$$v_{3} - v_{4} + 100 \cdot ratio_{glycerol} = 0 \tag{23}$$

$$v_{11} - v_{12} - v_{13} = 0 \tag{24}$$

$$v_{14} - v_{16} = 0 \tag{25}$$

$$v_{10} - v_{11} - v_{25} + 100 \cdot ratio_{gluconate} = 0 \tag{26}$$

$$- v_{17} + v_{18} - 100 \cdot ratio_{citrate} = 0 \tag{27}$$

$$- v_{12} + v_{14} + v_{15} - 100 \cdot ratio_{xylose} = 0 \tag{28}$$

$$- v_{18} + v_{19} + v_{24} = 0 \tag{29}$$

$$- v_{22} + v_{23} - v_{24} + v_{29} - 100 \cdot ratio_{malate} = 0 \tag{30}$$

Among equations listed above, Eq 22 indicates the case for co-metabolism of both C6 sugars. Meanwhile, a list of inequality constraints can be drawn, given that all biomass fluxes are non-negative:

$$v_{1} - v_{2} - v_{10} \geq 0 \tag{31}$$

$$v_{2} - v_{3} + v_{15} + v_{16} + 100 \cdot ratio_{fructose} \geq 0 \tag{32}$$

$$v_{3} + v_{4} - v_{5} + v_{14} + v_{15} - v_{16} + v_{25} \geq 0 \tag{33}$$

$$v_{5} - v_{6} \geq 0 \tag{34}$$

$$v_{6} - v_{7} - v_{28} \geq 0 \tag{35}$$

$$v_{7} - v_{8} + v_{25} - v_{27} + v_{29} + 100 \cdot ratio_{pyruvate} \geq 0 \tag{36}$$

$$v_{8} - v_{9} - v_{17} - v_{24} - v_{26} \geq 0 \tag{37}$$

$$v_{13} - v_{14} \geq 0 \tag{38}$$

$$- v_{15} + v_{16} \geq 0 \tag{39}$$

$$v_{19} - v_{20} + 100 \cdot ratio_{glutamate} \geq 0 \tag{40}$$

$$- v_{17} + v_{23} + v_{28} \geq 0 \tag{41}$$

$$- v_{21} + v_{22} \geq 0 \tag{42}$$

Among all inequality constraints, Eq 39 works well except for the case of zwf knockout, where the direction of Eq 39 could be reversed [33].
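As an illustration of how such constraints can be checked against a predicted flux profile, the sketch below encodes a few of the constraints that involve no substrate ratios (Eqs 24, 25, and 29 as equalities; Eqs 34, 38, and 39 as inequalities) and reports which ones a flux set violates. The data layout, function name, tolerance, and flux values are assumptions made for this example only.

```python
# Each constraint maps flux index -> coefficient on the left-hand side.
EQUALITIES = {
    "Eq 24 (Ru5P)": {11: 1.0, 12: -1.0, 13: -1.0},
    "Eq 25 (S7P)": {14: 1.0, 16: -1.0},
    "Eq 29 (ICIT)": {18: -1.0, 19: 1.0, 24: 1.0},
}
INEQUALITIES = {  # left-hand side must be >= 0
    "Eq 34 (3PG)": {5: 1.0, 6: -1.0},
    "Eq 38 (R5P)": {13: 1.0, 14: -1.0},
    "Eq 39 (E4P)": {15: -1.0, 16: 1.0},
}

def violated_constraints(fluxes, tol=1e-6):
    """Return the names of constraints that the flux dict does not satisfy."""
    def lhs(coeffs):
        return sum(c * fluxes[i] for i, c in coeffs.items())

    violations = [name for name, coeffs in EQUALITIES.items()
                  if abs(lhs(coeffs)) > tol]
    violations += [name for name, coeffs in INEQUALITIES.items()
                   if lhs(coeffs) < -tol]
    return violations

# Made-up flux values (normalized to a substrate uptake of 100).
balanced = {5: 160.0, 6: 150.0, 11: 30.0, 12: 18.0, 13: 12.0, 14: 9.0,
            15: 8.0, 16: 9.0, 18: 60.0, 19: 55.0, 24: 5.0}
unbalanced = dict(balanced)
unbalanced[11] = 33.0  # breaks the Ru5P balance only
```

A raw ML prediction would typically resemble the unbalanced case, which is what motivates the flux adjustment step.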

Flux adjustment using stoichiometric constraints

We adopted a quadratic programming method similar to minimization of metabolic adjustment (MOMA) [34], to tune fluxes to satisfy the stoichiometric constraints. The CVXOPT package for Python was employed here for quadratic programming [35]. The optimization problem is modeled as

$$\begin{aligned} \text{Minimize} \quad & f(v) = \sum_{i = 1}^{29} \left( Scaled(v_{i}) - Scaled(\hat{v}_{i}) \right)^{2} \\ \text{subject to} \quad & S \cdot v = 0, \\ & A \cdot v \geq 0, \end{aligned} \tag{43}$$

where the vector $\hat{v} = [\hat{v}_{1}, \dots, \hat{v}_{29}]$ contains the flux values predicted by ML, the vector v = [v₁, …, v₂₉] contains the flux values to be solved for in this optimization problem, the function Scaled(⋅) uses min-max scaling to scale all fluxes into the range [0, 1], the matrix S is obtained from all equality constraints (Eq 22 to Eq 30), and the matrix A is obtained from all inequality constraints (Eq 31 to Eq 42). Notably, the biomass composition of the same species varies significantly under different conditions. Therefore, the quadratic programming relaxes the mass balance constraints on biomass synthesis. Fluxes are scaled into the same range to avoid the bias that would otherwise arise because fluxes have different dynamic ranges. The objective function f(v) can be rewritten into a standard quadratic programming problem using the following steps:

$$\begin{aligned} f(v) &= \sum_{i = 1}^{29} \left( Scaled(v_{i}) - Scaled(\hat{v}_{i}) \right)^{2} = \sum_{i = 1}^{29} \left( \frac{v_{i} - Min_{i}}{Max_{i} - Min_{i}} - \frac{\hat{v}_{i} - Min_{i}}{Max_{i} - Min_{i}} \right)^{2} \\ &= 2 \cdot \sum_{i = 1}^{29} \left( \frac{1}{2} \frac{v_{i}^{2}}{(Max_{i} - Min_{i})^{2}} + \frac{- 1 \cdot v_{i} \cdot \hat{v}_{i}}{(Max_{i} - Min_{i})^{2}} + \frac{1}{2} \frac{\hat{v}_{i}^{2}}{(Max_{i} - Min_{i})^{2}} \right) \end{aligned} \tag{44}$$

where Min_i and Max_i are the lower and upper limits of the range of the i-th flux. Since the last term $\frac{1}{2} \left( \frac{\hat{v}_{i}}{Max_{i} - Min_{i}} \right)^{2}$ and the coefficient 2 are constants that do not depend on v, they can be omitted from the objective function without changing the minimizer. Hence, Eq 43 can be rewritten in standard quadratic programming form as

$$\begin{aligned} \text{Minimize} \quad & f(v) = \frac{1}{2} \sum_{i = 1}^{29} \frac{(v_{i})^{2}}{(Max_{i} - Min_{i})^{2}} + \sum_{i = 1}^{29} \frac{- 1 \cdot v_{i} \cdot \hat{v}_{i}}{(Max_{i} - Min_{i})^{2}} \\ \text{subject to} \quad & S \cdot v = 0, \\ & A \cdot v \geq 0 . \end{aligned} \tag{45}$$
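For readers who want to pass Eq 45 to a generic solver, it may help to write the objective in the matrix form that quadratic programming packages commonly expect. The symbols P, q, G, and h below are introduced here for illustration and do not appear in the original formulation; the mapping is a sketch read off from Eq 45.

```latex
\begin{aligned}
f(v) &= \tfrac{1}{2}\, v^{\top} P\, v + q^{\top} v, \\
P &= \operatorname{diag}\!\left( \frac{1}{(Max_{1} - Min_{1})^{2}}, \dots, \frac{1}{(Max_{29} - Min_{29})^{2}} \right), \\
q_{i} &= \frac{-\hat{v}_{i}}{(Max_{i} - Min_{i})^{2}}, \qquad i = 1, \dots, 29 .
\end{aligned}
```

Solvers that state inequalities as $G v \leq h$ (the convention used by CVXOPT's `solvers.qp`) would take $G = -A$ and $h = 0$ to express $A \cdot v \geq 0$.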

For the upper and lower boundaries of each flux, i.e., Max_i and Min_i, we used the maximal and minimal values observed across multiple datasets as the default values. Users can manually set desired values for the upper/lower bound of any specific flux on the MFlux webpage, or they can opt not to use any boundaries. For instance, users can simply set the boundary of a certain flux to zero if the corresponding gene is knocked out.
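The effect of the scaled objective can be seen in a deliberately reduced case. The sketch below adjusts predicted fluxes to satisfy a single equality constraint and ignores inequality constraints and flux bounds, so the minimizer has a closed form obtained with a Lagrange multiplier. This is a simplification for illustration, not the CVXOPT-based procedure used in MFlux; the function name and all numeric values are assumptions.

```python
def adjust_to_balance(v_hat, coeffs, rhs, flux_min, flux_max):
    """Minimize sum(((v_i - v_hat_i) / (max_i - min_i))**2) s.t. coeffs . v = rhs.

    Closed-form solution for one linear equality constraint:
    v_i = v_hat_i - mu * a_i / w_i, with w_i = 1 / (max_i - min_i)**2.
    """
    weights = [1.0 / (hi - lo) ** 2 for lo, hi in zip(flux_min, flux_max)]
    residual = sum(a * v for a, v in zip(coeffs, v_hat)) - rhs
    denom = sum(a * a / w for a, w in zip(coeffs, weights))
    mu = residual / denom
    return [v - mu * a / w for v, a, w in zip(v_hat, coeffs, weights)]

# Hypothetical ML predictions for v11, v12, v13, which should obey Eq 24:
# v11 - v12 - v13 = 0. The predicted values leave a residual of 3.
v_hat = [30.0, 12.0, 15.0]
adjusted = adjust_to_balance(
    v_hat,
    coeffs=[1.0, -1.0, -1.0],
    rhs=0.0,
    flux_min=[0.0, 0.0, 0.0],
    flux_max=[100.0, 50.0, 50.0],
)
# adjusted is approximately [28.0, 12.5, 15.5]
```

Because the flux with the widest range (v11 here) carries the smallest weight, it absorbs most of the correction, which is the intended consequence of scaling every flux by its range.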

Constraint programming and input checking

To ensure that user inputs, e.g., growth rates, oxygen usage, and substrate uptake rates, are biologically meaningful, MFlux first checks the satisfiability of the input values (e.g., whether the cell growth rate is realistic) [36]. Biological meaningfulness is represented using constraint programming [37], where each input is treated as a variable over a given domain. A set of inputs lacking biological meaning will leave those constraints unsatisfied, and MFlux will report an error message to warn the user. The Python module python-constraint [38] is used as the constraint solver.
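The kind of check described above can be sketched with plain Python predicates. The rules and thresholds below (a maximum growth rate of 3.0 per hour, substrate ratios summing to 1) are hypothetical placeholders chosen for illustration; they are not the rules implemented in MFlux, and plain Python stands in here for the python-constraint solver.

```python
def check_inputs(growth_rate, substrate_ratios, max_growth_rate=3.0):
    """Return a list of problems found; an empty list means the inputs pass.

    All thresholds are illustrative assumptions, not MFlux's actual rules.
    """
    problems = []
    if not 0.0 < growth_rate <= max_growth_rate:
        problems.append("growth rate outside the accepted range")
    if any(ratio < 0.0 for ratio in substrate_ratios.values()):
        problems.append("negative substrate ratio")
    if abs(sum(substrate_ratios.values()) - 1.0) > 1e-6:
        problems.append("substrate ratios do not sum to 1")
    return problems

ok = check_inputs(0.6, {"glucose": 1.0})
bad = check_inputs(12.0, {"glucose": 0.7, "xylose": 0.2})
```

A dedicated constraint solver becomes useful once the rules couple several inputs, for example when the admissible growth rate depends on the oxygen condition.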

Overall system design

Different parts of MFlux mentioned above are put together as illustrated in Fig 2. The prediction on 29 fluxes is done via an RBF-kernel SVM, whose outcome will be finalized by quadratic programming. Users can set boundary constraints to represent information about genes that are knocked out on the species, and such information will be used in quadratic programming. If parameters set by the user are not biologically meaningful, a warning message will be displayed. In the future, users will also have the option to enter flux constraints and settings of their own experiment to improve the prediction accuracy of MFlux.

Results

Pathway map and statistical analysis

The core metabolism of bacteria is summarized into a pathway map in Fig 1. Considering the availability of information, 29 major fluxes with 14 potential substrates were used to represent a universal heterotrophic carbon metabolism for non-photosynthetic bacterial species, which includes glycolysis, the tricarboxylic acid (TCA) cycle, the pentose phosphate (PP) pathway, the Entner–Doudoroff (ED) pathway, the glyoxylate shunt, and the anaplerotic pathway. It is difficult for ¹³C-MFA to precisely resolve the anaplerotic pathway fluxes [39]. Information on the anaplerotic pathway is either incomplete or imprecise in many publications in our dataset. Consequently, we simplified the anaplerotic pathway into two reversible fluxes. Similarly, we ignored several overflow fluxes that occasionally appear in ¹³C-MFA studies of anaerobic metabolism (e.g., the secretion of formate, butyrate, or pyruvate), because there were too few samples for machine learning. The omission of those fluxes can also partially explain the high prediction error in some fluxes (e.g., v₈: Pyruvate → Acetyl-CoA).

By statistical analysis, we determined the variation between each flux profile and the average flux profile from our ¹³C-MFA dataset. The average value, the range, and the 95% confidence interval for each flux are shown in Fig 3. The most conserved fluxes in our dataset include the non-oxidative pentose phosphate pathway and the glyoxylate shunt. The former pathway supplies precursors for the biosynthesis of amino acids (i.e., histidine, phenylalanine, and tyrosine) and nucleotides. The latter acts as an alternative carbon-conserving path to the TCA cycle and is inhibited by the presence of glucose (most ¹³C-MFA is based on glucose metabolism). All 29 fluxes are found to have a relatively narrow confidence interval compared to the possible flux ranges, suggesting that the fluxes of different bacterial species vary within a relatively small range. This is because most ¹³C-MFA studies focus on model species (e.g., E. coli and B. subtilis) and glucose-based metabolism, while far fewer MFA efforts study non-model species or the metabolism of carbon substrates other than sugars (i.e., fluxome research is biased toward model systems).

Fig 3 — “Flux range” represents the variation of each flux in the ¹³C-MFA dataset. “95% confidence interval” indicates that 95% of flux data were within a small range. “Average flux value” is the average value in each flux based on all data in our ¹³C-MFA dataset.

Optimization of algorithms and parameters

To decide on the most suitable ML algorithm, we first performed a grid search in the parameter space, using a dataset of wild type (WT) samples only. The best results of the three algorithms (for SVM, linear kernel only at this stage) are presented in Fig 4. SVM makes better predictions than either the decision tree or k-NN on most fluxes. After this step, we carried out a second round of grid search to optimize parameters and improve the performance of SVM on the whole phenotype (WP) dataset (both WT and engineered strains). Both the linear kernel and the radial basis function (RBF) kernel were included in this round of grid search.
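For reference, the two kernels compared in the second grid search have simple closed forms: the linear kernel is the dot product of two feature vectors, and the RBF kernel is exp(−γ‖x − z‖²). The sketch below writes both in plain Python; the function names and example vectors are illustrative, and γ is the width parameter that a grid search would typically tune alongside the SVM regularization parameter.

```python
import math

def linear_kernel(x, z):
    """Dot product of two feature vectors."""
    return sum(xi * zi for xi, zi in zip(x, z))

def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * squared_distance)

same_point = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)   # identical vectors give 1.0
far_apart = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)    # exp(-1)
dot = linear_kernel([1.0, 2.0], [3.0, 4.0])
```

Unlike the linear kernel, the RBF kernel depends only on the distance between samples, so similarity decays smoothly as two experimental conditions move apart in feature space.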

Fig 4 — The best cross-validation results on 29 fluxes are compared. All tests in this step were performed on the WT dataset only.

We expected better cross-validation results from the SVM models trained on the WT dataset than from those trained on the WP dataset, because the WT dataset does not include sophisticated genetic variations. However, the cross-validation results refuted this expectation: the models trained on the WP dataset performed significantly better than those trained on the WT dataset (Fig 5). This suggests that the size of the training set is a major factor affecting model quality, especially when the training set is relatively small (the WT and WP datasets contain about 150 and 450 samples, respectively). We also compared the SVM results using the linear kernel with those using the RBF kernel, and the RBF kernel showed slightly better performance (Fig 6). The parameter set producing the most accurate cross-validation result was used to configure MFlux. Notably, the predictions of v₁₁ (the second step of the oxidative PP pathway) and v₂₄ (the glyoxylate shunt) have relatively large variations. Two factors may contribute to this. First, both v₁₁ and v₂₄ have relatively narrow ranges (see Fig 1), so even small numerical variations generate large relative errors for both fluxes. Second, genetic modifications may significantly influence both v₁₁ (e.g., zwf knockout [40]) and v₂₄ (e.g., ppc knockout [41]). For instance, knocking out zwf in E. coli will cause a zero flux in v₁₀ (the oxidative pentose phosphate pathway, OPP pathway) [42]. However, the lack of sufficient information on flux re-organization mechanisms in engineered microbes reduces ML predictability, because most fluxomics studies of engineered microbes focus on a few model species such as E. coli. To address this problem, the MFlux platform allows users to manually set the boundaries of central fluxes to improve prediction quality (e.g., setting a zero flux through the OPP pathway for an E. coli zwf mutant).

Fig 5 — Grid searches are performed on both linear and RBF kernels. The results from WP dataset are much better than those from the WT dataset. The result indicated that the size of the dataset is an important factor affecting the predictive power of machine learning models.

Fig 6 — The best cross-validation results of linear kernel and RBF kernel after grid searches on WP dataset are very similar. The RBF kernel is employed in the final model for flux prediction.

Flux correction by quadratic programming

After parameter optimization, the SVM models with the best parameter sets can predict fluxes with relatively small error. However, the flux profile predicted by the ML method does not necessarily satisfy the inherent stoichiometric constraints of metabolic networks, because the dataset available to the ML methods at this stage is not large enough for them to learn these constraints. The situation can be even worse when specific fluxes predicted by the ML algorithm fall outside a biologically meaningful range (e.g., the predicted glyoxylate shunt flux v₂₄ may have a negative value). To address those issues, we employed quadratic programming for flux correction as described in the Methods section. More rational results with improved accuracy are expected after flux correction. An essential assumption of this step is that the ML predictions are relatively close to the real values reported in the literature. This assumption is backed by our cross-validation results and further validated in the following case studies.

Case studies

To demonstrate the functionality of MFlux, we carried out tests on 20 cases, and the results are illustrated in Fig 7. Brief information for each case is listed in Table 1, and comprehensive results are included in S1 and S2 Tables. In general, MFlux can achieve decent flux predictions. Here we demonstrate two cases which are Cases 8 and 16.

Table 1. Summary of 20 cases of study.

Glc, glucose; Xyl, xylose; Lac, lactate; Ace, acetate; KO, knockout.

Species	Carbon source	Oxygen condition	Reactor	Genetic background	Case
E. coli	Glc	aerobic	tube	WT	1 [12]
E. coli	Glc	aerobic	baffled shake flask	ppc KO	2–4 [41]
B. subtilis	Glc	aerobic	shake flask, CSTR	WT, spo0A KO	5–7 [43]
B. subtilis	Multiple	aerobic	shake flask	mutant	8–11 [44]
C. glutamicum	Glc	aerobic	shake flask	WT	12 [45]
C. glutamicum	Glc	aerobic	shake flask	mutant	13 [46]
P. denitrificans	Glc	aerobic, microaerobic	fermentor	WT	14, 15 [47]
G. thermoglucosidasius	Glc	microaerobic	shake flask	WT	16 [28]
Thermoanaerobacter sp.	Xyl	anaerobic	batch (closed)	WT	17, 18 [48]
D. vulgaris	Lac	anaerobic	batch (closed)	WT	19 [49]
G. metallireducens	Ace	anaerobic	batch (closed)	WT	20 [3]

In Case 8, a B. subtilis strain takes up the mixed substrates succinate and glutamate. To illustrate co-metabolism of mixed substrates, we tested MFlux with the ¹³C-MFA data of B. subtilis reported by Chubukov et al. [44]. Microbial fermentation fed with multiple low-cost substrates is promising for the biotechnology industry. However, there are very few quantitative analyses of this topic. In this test, we adopted the same set of parameters found in the literature (S1 Table, Case 8) as the inputs of MFlux. For flux correction, we directly took the default boundary settings for quadratic programming. A comparison of flux profiles reported by ¹³C-MFA, predicted by ML only, and predicted by MFlux (i.e., ML + quadratic programming) is illustrated in Fig 8. Both the ML-only approach and MFlux predict most fluxes accurately, closely matching the ¹³C-MFA flux profiles with a Root Mean Squared Error (RMSE) under 5. The ML-only predictions show large variation on specific fluxes (e.g., v₁₁, oxidative PP pathway, and v₁₉, TCA cycle). Quadratic programming can further adjust flux profiles and reduce deviations of flux predictions. The corrected flux profiles also meet the basic stoichiometric relationships of the metabolic network. The final prediction from MFlux shows an improvement, with the RMSE reduced to 3.2.
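The RMSE used to compare a predicted flux profile with the ¹³C-MFA profile is the square root of the mean squared difference over all fluxes. A minimal sketch follows; the function name and the three-flux example values are made up for illustration and are not data from the case studies.

```python
import math

def rmse(predicted, measured):
    """Root mean squared error between two flux profiles of equal length."""
    if len(predicted) != len(measured):
        raise ValueError("flux profiles must have the same length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up three-flux example: squared errors are 9, 16, and 0.
example_rmse = rmse([10.0, 20.0, 30.0], [13.0, 16.0, 30.0])
```

Because all fluxes are normalized to a substrate uptake rate of 100, RMSE values are on that same scale and are comparable across cases.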

Fig 8. *B. subtilis* was incubated in a shake flask (37°C, 300 rpm, aerobic condition) and supplied with labeled succinate and glutamate as carbon sources in M9 minimal medium. Detailed information is in S1 Table.
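The RMSE values quoted for Case 8 compare two flux vectors reaction by reaction. The sketch below shows that calculation under simple assumptions: the function name `rmse` and the four-element flux vectors are hypothetical placeholders for illustration, not values taken from S1 Table.

```python
import math


def rmse(predicted, measured):
    """Root mean squared error between two equal-length flux vectors."""
    if len(predicted) != len(measured):
        raise ValueError("flux vectors must have the same length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical flux values (illustration only, not data from the paper).
measured = [100.0, 60.0, 40.0, 35.0]   # stand-in for a 13C-MFA profile
ml_only = [100.0, 66.0, 34.0, 35.0]    # stand-in for an ML-only prediction

print(round(rmse(ml_only, measured), 3))
```

A lower value means the predicted profile lies closer to the ¹³C-MFA reference; the same function can be applied to the ML-only and the corrected profiles to compare them.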

In Case 16, G. thermoglucosidasius strain M10EXG grows under microaerobic conditions. G. thermoglucosidasius is a thermophilic and ethanol-tolerant bacterium that can convert both hexose and pentose into ethanol [28]. To predict its central fluxome, we used the parameter set listed in S1 Table, along with the default boundary settings for flux correction. A heat map (Fig 9) compares the ¹³C-MFA fluxes with the ML-only fluxes and the MFlux results. The results are encouraging: ML-only prediction gives an RMSE of 4.0, while MFlux, which combines ML and quadratic programming, improves the prediction to an RMSE of 3.0. By comparison, among the 20 case studies, the average flux set deviates strongly (RMSE of 33.5) from the actual ¹³C-MFA fluxes (S2 Table). In this case, MFlux reduces the deviations of the predicted fluxes from the ¹³C-MFA values.

Fig 9. *G. thermoglucosidasius* M10EXG was incubated in sealed bottles (microaerobic condition) and supplied with glucose as a carbon source. Detailed information is in S2 Table.

For species with genetic modifications in major pathways (Cases 2, 3, 4, 12, and 13, E. coli and C. glutamicum), MFlux predictions have an RMSE between 5 and 10, higher than the RMSE for predictions of wild-type strains. Since MFlux is currently unable to capture the complex regulatory mechanisms of flux reorganization, users can manually tune the boundary values of certain fluxes to improve the quality of flux prediction. For example, knocking out ppc in E. coli may activate the glyoxylate shunt [41, 42]. Users can assign a non-zero lower bound to the glyoxylate shunt flux when running MFlux.
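To make the effect of a user-supplied lower bound concrete, the sketch below corrects a toy ML prediction around a single branch point (citrate synthase flux = isocitrate dehydrogenase flux + glyoxylate shunt flux). It is an illustration under stated assumptions, not the MFlux solver: the function `correct_fluxes`, the three-reaction network, and all numbers are hypothetical, and the least-squares objective is one plausible reading of "quadratic programming" here. With only one balance equation, the bounded least-squares problem can be solved exactly by bisection on a single Lagrange multiplier, so no external solver is needed.

```python
def correct_fluxes(v_ml, coeffs, lower, upper, tol=1e-10):
    """Minimise sum((v - v_ml)^2) s.t. sum(coeffs * v) = 0 and lower <= v <= upper.

    For a single balance equation the optimum has the form
    v_i = clip(v_ml_i - lam * coeffs_i, lower_i, upper_i),
    so only the scalar multiplier lam has to be found (by bisection).
    """
    def candidate(lam):
        return [max(lo, min(hi, v - lam * a))
                for v, a, lo, hi in zip(v_ml, coeffs, lower, upper)]

    def residual(lam):
        # Mass-balance residual; non-increasing in lam.
        return sum(a * v for a, v in zip(coeffs, candidate(lam)))

    lam_lo, lam_hi = -1e6, 1e6
    if residual(lam_lo) < 0 or residual(lam_hi) > 0:
        raise ValueError("bounds are incompatible with the balance equation")
    while lam_hi - lam_lo > tol:
        mid = 0.5 * (lam_lo + lam_hi)
        if residual(mid) > 0:
            lam_lo = mid
        else:
            lam_hi = mid
    return candidate(lam_hi)


# Hypothetical fluxes: [citrate synthase, isocitrate dehydrogenase, glyoxylate shunt]
v_ml = [80.0, 74.0, 0.0]       # ML-only prediction (violates the balance by 6)
coeffs = [1.0, -1.0, -1.0]     # v_cs - v_icd - v_glyoxylate = 0
upper = [200.0, 200.0, 200.0]

default = correct_fluxes(v_ml, coeffs, [0.0, 0.0, 0.0], upper)
tuned = correct_fluxes(v_ml, coeffs, [0.0, 0.0, 10.0], upper)  # shunt forced active
print([round(v, 3) for v in default])
print([round(v, 3) for v in tuned])
```

In this toy case, raising the lower bound of the shunt redistributes the correction across the other two fluxes while the balance equation stays satisfied, which is the kind of manual intervention described for the ppc knockout.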

Improving flux balance analysis of microbial metabolism via MFlux

Stoichiometry-based flux balance analysis (FBA) is an important mechanistic tool to predict unknown cell metabolism [50]. Accurate FBA prediction relies heavily on setting the objective function and the flux constraints appropriately (based on thermodynamics or experimental analysis). Here, we compare FBA with MFlux for predicting E. coli metabolism. The latest version of the E. coli iJO1366 genome-scale model (2583 fluxes) was used [51]. Two comparative case studies were performed on E. coli fluxomes: one for glucose-based ¹³C-MFA via parallel labeling experiments [12] and the other for glucose and glycerol co-utilization (unpublished data from the Shimizu Group). Neither of the test cases was included in the training set of MFlux. With the ¹³C-MFA results as the control, MFlux results have smaller RMSEs than the FBA predictions. In the first case, FBA has an RMSE of 11.3, while MFlux has an RMSE of 6.5 (Fig 10A). In the second case, FBA has an RMSE of 22.5, while MFlux has an RMSE of 5.1 (Fig 10B). To circumvent variations caused by alternative solutions in FBA, we also employed pFBA and geometricFBA for both cases [52, 53] (S2 Table). In general, pFBA does not give better results than FBA for either case, while geometricFBA did not converge in our calculation.

Fig 10. FBA was simulated with an *E. coli* *i*JO1366 model (latest version) with default boundary settings from the reference [54]. The default values of growth associated maintenance energy (GAM) and non-growth associated maintenance energy (NGAM) were adopted. A) *E. coli* fluxome of glucose metabolism was precisely measured via parallel labeling experiments (a recent paper not in our dataset) [12]. B) *E. coli* fluxome of glycerol and glucose co-metabolism as measured by Drs. Yao and Shimizu (unpublished data). The *E. coli* strain was cultured in a chemostat fermentor with a working volume of 1 L (37°C). The dilution rate in the continuous culture was 0.35 h⁻¹. [1-¹³C] glucose and [1,3-¹³C] glycerol were used for tracer experiments. The flux calculation is based on a previous method [42]. The RMSE from FBA is 22.5, while the RMSE from MFlux (this work) is 5.1. The COBRA toolbox running on MATLAB R2012b was employed for FBA/pFBA/geometricFBA simulation, and Gurobi 5.5 was used for linear programming. Detailed information is included in S2 Table.
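For readers less familiar with the two FBA variants compared here, their textbook formulations can be written as follows. The symbols are introduced only for this sketch and do not appear in the original text: S is the stoichiometric matrix, v the flux vector, c the objective coefficients (typically selecting the biomass reaction), l and u the lower and upper flux bounds, and Z* the optimal objective value found by FBA.

```latex
% Standard FBA: maximize a linear objective over steady-state fluxes.
% pFBA: among FBA-optimal solutions, minimize the total absolute flux.
\begin{aligned}
\text{FBA:}\quad  & \max_{v}\; c^{\top} v
   && \text{s.t.}\quad S v = 0,\;\; l \le v \le u \\
\text{pFBA:}\quad & \min_{v}\; \sum_{i} \lvert v_i \rvert
   && \text{s.t.}\quad S v = 0,\;\; l \le v \le u,\;\; c^{\top} v = Z^{*}
\end{aligned}
```

Because the FBA optimum is generally not unique, different solvers may return different intracellular flux distributions for the same Z*; pFBA and geometricFBA are two ways of selecting a single representative solution.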

FBA alone has given good predictions of growth rate as well as input and output fluxes, but not of intracellular fluxes. It is difficult to obtain actual P/O ratios, the ATP maintenance cost, the oxygen flux, and the transhydrogenase activities [55]. These energy/cofactor variables strongly affect the fluxes in the oxidative PP pathway (NADPH generation) and the TCA cycle (NADH, NADPH, and FADH₂ generation). Without proper flux constraints and objective functions, it is challenging for FBA to narrowly determine intracellular fluxomes in suboptimal metabolism, especially for the co-metabolism of dual substrates, because the solution space within which the cell can optimize biomass growth using two substrates is large. As a complementary tool, MFlux may offer a quick metabolic overview and provide biologically meaningful flux boundaries to reduce FBA solution spaces when proper constraints for FBA are unavailable.
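One way to picture how predicted fluxes could narrow an FBA solution space is to intersect the model's default bounds with a window around each predicted flux. The sketch below is only an illustration of that idea: the function `tighten_bounds`, the reaction labels, the tolerance, and all numbers are hypothetical, and it assumes the predicted fluxes have already been converted to the same units as the FBA model.

```python
def tighten_bounds(default_bounds, predicted, tolerance):
    """Intersect default FBA bounds with a window around predicted fluxes.

    default_bounds: dict mapping reaction id -> (lower, upper)
    predicted:      dict mapping reaction id -> predicted flux (subset of reactions)
    tolerance:      half-width of the window placed around each prediction
    """
    tightened = dict(default_bounds)
    for rxn, flux in predicted.items():
        lo, hi = default_bounds[rxn]
        new_lo = max(lo, flux - tolerance)
        new_hi = min(hi, flux + tolerance)
        if new_lo > new_hi:
            # The window falls outside the default range: keep the defaults.
            continue
        tightened[rxn] = (new_lo, new_hi)
    return tightened


# Hypothetical reaction labels and values, for illustration only.
default_bounds = {
    "PGI": (-1000.0, 1000.0),
    "G6PDH": (0.0, 1000.0),
    "CS": (0.0, 1000.0),
}
predicted = {"G6PDH": 25.0, "CS": 60.0}

bounds = tighten_bounds(default_bounds, predicted, tolerance=10.0)
for rxn, (lo, hi) in bounds.items():
    print(rxn, lo, hi)
```

The tightened bounds would then be passed to whichever FBA solver is in use; reactions without a prediction keep their default range.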

Discussion

Metabolic robustness of fluxome patterns among microbial species

“Robustness” was originally defined in the control field as closed-loop process stability under perturbations. This definition is applicable to biochemical networks. To maintain the physiological output (i.e., the fluxome) within a desired range, microorganisms employ sophisticated control strategies at different architectural levels, from the genome to the phenotype. In contrast to chaotic transcriptional profiles, the microbial fluxome shows robustness so that cells can survive in constantly changing environments or in response to genetic mutations [56–58]. Metabolic rigidity at the flux level was first reported by Stephanopoulos in the early 1990s [59, 60]: NADPH is important for anabolism in the exponential growth phase, and the flux ratio around the glucose-6-P node is rigid to form NADPH [60]. Moreover, 12 precursors from the central metabolism are required for biomass formation, all of which have relatively small variations that mainly depend on biomass composition. Due to both thermodynamic and mass balance constraints, cell metabolism aims to minimize variations in flux ratios under environmental perturbations. This rule also holds for engineered microbes with moderately overexpressed pathways and for strains with random mutations or deletions of non-essential genes. This metabolic robustness facilitates ML applications.

Flux pattern recognition enables MFlux to predict the metabolism of new species by learning from a small set of fluxome information from the same genus. For example, the metabolisms of P. aeruginosa, P. fluorescens, and P. putida have been studied by ¹³C-MFA in the past decade [61–65]. The results show that different Pseudomonas species employ remarkably similar fluxome patterns: they employ a highly active ED pathway for glycolytic metabolism and keep a low flux through the PP pathway for biomass synthesis, due to the lack of the pfk gene [66]. The ED pathway has a lower protein cost than the Embden–Meyerhof–Parnas (EMP) pathway, yet it yields only one ATP per glucose [67, 68]. Pseudomonas species have slow cell growth rates, and their aerobic metabolisms do not yield by-products. They also demonstrate a very active pyruvate shunt (MAL → PYR) and an NADPH overproduction flux (a benefit for counteracting oxidative stress). On the other hand, the TCA cycle in Pseudomonas species shows plasticity under genetic and environmental variations [69], and can respond to increased ATP and NADH demands under stress conditions [70].

For different bacterial species (e.g., E. coli and Bacillus), the fluxomes (e.g., for glucose metabolism) can be similar, because central catabolic fluxes are regulated by energy and building block requirements that vary much less than genomes or transcriptional profiles. On the other hand, a change of carbon substrates may alter flux distributions. For example, co-utilization of glucose and glycerol in E. coli causes significant reorganization of the fluxome. Within the same microbial strain, different fluxome patterns can be employed for metabolizing different substrates (e.g., glucose-based vs. acetate-based fluxomes). Recognizing these metabolic patterns allows the use of a relatively small training set to obtain a decent prediction for diverse metabolic types. Consequently, these common principles of certain classes of microorganisms can be captured by machine learning for fluxome predictions.
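The idea that cases with similar species, substrate, and oxygen condition tend to share flux patterns can be illustrated with a nearest-neighbor lookup, one of the three methods named in the abstract. The sketch below is not the MFlux implementation: the helper names, the three categorical features, the Hamming distance, and the tiny training set with three fluxes per case are all hypothetical choices made to keep the example short.

```python
def hamming(a, b):
    """Number of categorical features on which two cases differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(training, query, k=3):
    """Average the flux vectors of the k training cases closest to the query."""
    ranked = sorted(training, key=lambda case: hamming(case[0], query))
    nearest = ranked[:k]
    n_flux = len(nearest[0][1])
    return [sum(fluxes[i] for _, fluxes in nearest) / len(nearest)
            for i in range(n_flux)]


# Hypothetical training cases: (species, substrate, oxygen) -> three fluxes.
training = [
    (("E. coli", "Glc", "aerobic"), [70.0, 28.0, 55.0]),
    (("E. coli", "Glc", "anaerobic"), [85.0, 14.0, 10.0]),
    (("B. subtilis", "Glc", "aerobic"), [62.0, 36.0, 50.0]),
    (("E. coli", "Ace", "aerobic"), [-20.0, 5.0, 90.0]),
]

query = ("B. subtilis", "Glc", "aerobic")
prediction = knn_predict(training, query, k=2)
print(prediction)
```

In this toy set, the two aerobic glucose cases dominate the prediction even though they come from different species, while the acetate and anaerobic cases are ranked further away, mirroring the substrate- and condition-dependent patterns described above.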

Limitations of machine learning

There are several major challenges regarding MFlux. First, the ¹³C-MFA fluxes in the literature may contain errors and biases, which would be included in the learning/training process of MFlux and lead to further variations. For example, current ¹³C-MFA studies are not evenly distributed among a broad range of microbial genera. Most reported MFAs are concentrated on a few model microbial species or on the metabolism of only a few substrates (mainly glucose), and thus our current ML cannot predict fluxomes well in certain cases. Such a problem (model bias) can be resolved after more ¹³C-MFA papers on non-model species are included in the database and more constraints are implemented in our platform.

Second, the predictive power of ML is limited to species and pathways that are already included in learning. More information and effort are required to deal with genetically modified strains with engineered pathways that hijack flux for the synthesis of diverse commodity chemicals [13]. Currently, ¹³C-MFA has not yet been widely used by the synthetic biology community. In future versions of MFlux, new metabolic knowledge and rules should be applied for flux corrections.

Third, it is still difficult to incorporate regulatory mechanisms into the current model. For instance, various catabolite repression mechanisms regulate the cell fluxome in the presence of multiple substrates (e.g., glucose shows catabolite repression for fast-growing E. coli when both glucose and glycerol are available, Fig 10) [71]. This hierarchical regulation of substrate utilization can depend on growth rates or can differ among microbial species (E. coli, Bacillus, and Corynebacterium).

Fourth, when oxygen is not available, fast bacterial sugar utilization will activate mixed acid fermentation (e.g., by utilizing lactate dehydrogenase and pyruvate formate lyase) to produce complicated overflow metabolites [13]. This mechanism is also present in yeast and mammalian cells. However, ¹³C-MFA studies on anaerobic metabolism are much less frequent than those on aerobic metabolism. MFlux cannot predict the complicated patterns of overflow fluxes at this stage.

Fifth, our current dataset is still unable to support ML studies on phototrophic bacterial fluxomes. In phototrophic metabolism, energy generation (ATP, NADH, and NADPH) may not be controlled by substrate catabolism. Some phototrophic bacteria (e.g., cyanobacteria) have versatile autotrophic and photomixotrophic metabolism that is highly sensitive to light and substrate availability. Other phototrophs may even use alternative CO₂ fixation pathways (such as the reverse TCA cycle). Therefore, our MFlux platform cannot make ML predictions for these species and only reports a general description of their metabolic features.

Lastly, ML cannot directly estimate fluxes for carbon sources which are not part of the learning dataset. To predict fluxomes for new substrates, users need to assume that similar entry-points of carbon sources into the central metabolic network may cause similar flux distributions (e.g., sucrose has to be treated as a combination of glucose and fructose).
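The sucrose example can be expressed as a small preprocessing step that rewrites an uptake profile in terms of substrates covered by the training data. This is a sketch under stated assumptions: the function `to_known_substrates`, the set of "known" substrates, and the uptake rates are hypothetical, and the only biochemical fact relied on is that hydrolysis of one sucrose yields one glucose and one fructose.

```python
# Hypothetical set of substrates assumed to be covered by the training data.
KNOWN_SUBSTRATES = {"glucose", "fructose", "glycerol", "acetate"}

# Moles of known substrate released per mole of the new substrate.
DECOMPOSITION = {"sucrose": {"glucose": 1.0, "fructose": 1.0}}


def to_known_substrates(uptake):
    """Rewrite an uptake profile using only substrates in KNOWN_SUBSTRATES."""
    result = {}
    for substrate, rate in uptake.items():
        if substrate in KNOWN_SUBSTRATES:
            parts = {substrate: 1.0}
        elif substrate in DECOMPOSITION:
            parts = DECOMPOSITION[substrate]
        else:
            raise KeyError(f"no mapping for substrate: {substrate}")
        for name, ratio in parts.items():
            result[name] = result.get(name, 0.0) + rate * ratio
    return result


# Hypothetical uptake rates (arbitrary units).
mapped = to_known_substrates({"sucrose": 5.0, "glucose": 2.0})
print(mapped)
```

Substrates with no known decomposition raise an error rather than being silently dropped, since a prediction for them would rest on an unsupported assumption about their entry point into central metabolism.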

Conclusion

This proof-of-concept study demonstrates that AI methods can facilitate fluxomics research with reasonable precision. ¹³C-MFA is a very small field, with just hundreds of MFA research papers on microbial species published in the past two decades. In the long term, ML methods may overcome this limitation: with a large and reliable fluxomics dataset and more input from ¹³C-MFA and AI scientists, future MFlux models can make broad-scope metabolism predictions. To sum up, MFlux is the first platform to introduce ML into the field of fluxomics, and it will be continuously updated and improved. It will inspire the development of similar computational tools to advance the omics and metabolic engineering fields [72].

Supporting Information

S1 Program. MFlux Computer Program (Source code).

Python scripts in a ZIP file.

(ZIP)

S1 Table. Results of 20 case studies.

Detailed information for the 20 case studies using MFlux, including literature references, input conditions, ¹³C-MFA fluxes, the flux profiles predicted by ML, and the flux profiles predicted by MFlux with additional constraints.

(XLSX)

S2 Table. Detailed information of the comparison with FBA/pFBA.

The information of constraints, objective function, and simulation results.

(XLSX)

Acknowledgments

The authors would like to thank Dr. Eric You Xu (Fitbit Inc.) and Dr. Lian He (Washington University) for their helpful suggestions.

Data Availability

The dataset used to train MFlux is continuously updated and can be downloaded from our project website http://MFlux.org. All other relevant data are in the manuscript and Supporting Information.

Funding Statement

This work was funded by NSF DBI 1356669 http://www.nsf.gov/awardsearch/showAward?AWD_ID=1356669. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Winter G, Krömer JO. Fluxomics–connecting ’omics analysis and phenotypes. Environmental Microbiology. 2013;15(7):1901–1916. doi:10.1111/1462-2920.12064
2. Chen X, Alonso AP, Allen DK, Reed JL, Shachar-Hill Y. Synergy between ¹³C-metabolic flux analysis and flux balance analysis for understanding metabolic adaption to anaerobiosis in E. coli. Metabolic Engineering. 2011;13(1):38–48. doi:10.1016/j.ymben.2010.11.004
3. Tang YJ, Chakraborty R, Martín HG, Chu J, Hazen TC, Keasling JD. Flux analysis of central metabolic pathways in Geobacter metallireducens during reduction of soluble Fe(III)-nitrilotriacetic acid. Applied and Environmental Microbiology. 2007;73(12):3859–3864. doi:10.1128/AEM.02986-06
4. Tang JKH, You L, Blankenship RE, Tang YJ. Recent advances in mapping environmental microbial metabolisms through ¹³C isotopic fingerprints. Journal of The Royal Society Interface. 2012;9(76):2767–2780. doi:10.1098/rsif.2012.0396
5. Yim H, Haselbeck R, Niu W, Pujol-Baxley C, Burgard A, Boldt J, et al. Metabolic engineering of Escherichia coli for direct production of 1,4-butanediol. Nature Chemical Biology. 2011;7(7):445–452. doi:10.1038/nchembio.580
6. Becker J, Zelder O, Häfner S, Schröder H, Wittmann C. From zero to hero–Design-based systems metabolic engineering of Corynebacterium glutamicum for L-lysine production. Metabolic Engineering. 2011;13(2):159–168. doi:10.1016/j.ymben.2011.01.003
7. He L, Xiao Y, Gebreselassie N, Zhang F, Antoniewicz MR, Tang YJ, et al. Central metabolic responses to the overproduction of fatty acids in Escherichia coli based on ¹³C-metabolic flux analysis. Biotechnology and Bioengineering. 2014;111(3):575–585. doi:10.1002/bit.25124
8. Antoniewicz MR, Kelleher JK, Stephanopoulos G. Elementary metabolite units (EMU): a novel framework for modeling isotopic distributions. Metabolic Engineering. 2007;9(1):68–86. doi:10.1016/j.ymben.2006.09.001
9. Weitzel M, Nöh K, Dalman T, Niedenführ S, Stute B, Wiechert W. ¹³CFLUX2–high-performance software suite for ¹³C-metabolic flux analysis. Bioinformatics. 2013;29(1):143–145. doi:10.1093/bioinformatics/bts646
10. Quek LE, Wittmann C, Nielsen LK, Krömer JO. OpenFLUX: efficient modelling software for ¹³C-based metabolic flux analysis. Microbial Cell Factories. 2009;8:25. doi:10.1186/1475-2859-8-25
11. Zamboni N, Fischer E, Sauer U. FiatFlux–a software for metabolic flux analysis from ¹³C-glucose experiments. BMC Bioinformatics. 2005;6(1):209. doi:10.1186/1471-2105-6-209
12. Crown SB, Long CP, Antoniewicz MR. Integrated ¹³C-metabolic flux analysis of 14 parallel labeling experiments in Escherichia coli. Metabolic Engineering. 2015;28:151–158. doi:10.1016/j.ymben.2015.01.001
13. Antoniewicz MR, Kraynie DF, Laffend LA, González-Lergier J, Kelleher JK, Stephanopoulos G. Metabolic flux analysis in a nonstationary system: fed-batch fermentation of a high yielding strain of E. coli producing 1,3-propanediol. Metabolic Engineering. 2007;9(3):277–292. doi:10.1016/j.ymben.2007.01.003
14. Nöh K, Grönke K, Luo B, Takors R, Oldiges M, Wiechert W. Metabolic flux analysis at ultra short time scale: isotopically non-stationary ¹³C labeling experiments. Journal of Biotechnology. 2007;129(2):249–267. doi:10.1016/j.jbiotec.2006.11.015
15. Tang YJ, Martin HG, Myers S, Rodriguez S, Baidoo EE, Keasling JD. Advances in analysis of microbial metabolic fluxes via ¹³C isotopic labeling. Mass Spectrometry Reviews. 2009;28(2):362–375. doi:10.1002/mas.20191
16. Zhuang WQ, Yi S, Bill M, Brisson VL, Feng X, Men Y, et al. Incomplete Wood–Ljungdahl pathway facilitates one-carbon metabolism in organohalide-respiring Dehalococcoides mccartyi. Proceedings of the National Academy of Sciences. 2015;111(17):6419–6424. doi:10.1073/pnas.1321542111
17. Tarca AL, Carey VJ, Chen X, Romero R, Draghici S. Machine learning and its applications to biology. PLoS Computational Biology. 2007;3(6):e116. doi:10.1371/journal.pcbi.0030116
18. Kell DB. Metabolomics, modelling and machine learning in systems biology–towards an understanding of the languages of cells. FEBS Journal. 2006;273(5):873–894.
19. Dale JM, Popescu L, Karp PD. Machine learning methods for metabolic pathway prediction. BMC Bioinformatics. 2010;11(1):15. doi:10.1186/1471-2105-11-15
20. Rätsch G, Sonnenburg S, Srinivasan J, Witte H, Müller KR, Sommer RJ, et al. Improving the Caenorhabditis elegans genome annotation using machine learning. PLoS Computational Biology. 2007;3(2):e20. doi:10.1371/journal.pcbi.0030020
21. Ye QH, Qin LX, Forgues M, He P, Kim JW, Peng AC, et al. Predicting hepatitis B virus–positive metastatic hepatocellular carcinomas using gene expression profiling and supervised machine learning. Nature Medicine. 2003;9(4):416–423. doi:10.1038/nm843
22. Shipp MA, Ross KN, Tamayo P, Weng AP, Kutok JL, Aguiar RC, et al. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nature Medicine. 2002;8(1):68–74. doi:10.1038/nm0102-68
23. Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics. 2000;16(10):906–914. doi:10.1093/bioinformatics/16.10.906
24. Supek F, Peharec P, Krsnik-Rasol M, Šmuc T. Enhanced analytical power of SDS-PAGE using machine learning algorithms. Proteomics. 2008;8(1):28–31. doi:10.1002/pmic.200700555
25. Mahadevan S, Shah SL, Marrie TJ, Slupsky CM. Analysis of metabolomic data using support vector machines. Analytical Chemistry. 2008;80(19):7562–7570. doi:10.1021/ac800954c
26. Zhang Z, Shen T, Rui B, Zhou W, Zhou X, Shang C, et al. CeCaFDB: a curated database for the documentation, visualization and comparative analysis of central carbon metabolic flux distributions explored by ¹³C-fluxomics. Nucleic Acids Research. 2014;43:D549–D557. doi:10.1093/nar/gku1137
27. Sauer U, Eikmanns BJ. The PEP–pyruvate–oxaloacetate node as the switch point for carbon flux distribution in bacteria. FEMS Microbiology Reviews. 2005;29(4):765–794. doi:10.1016/j.femsre.2004.11.002
28. Tang YJ, Sapra R, Joyner D, Hazen TC, Myers S, Reichmuth D, et al. Analysis of metabolic pathways and fluxes in a newly discovered thermophilic and ethanol-tolerant Geobacillus strain. Biotechnology and Bioengineering. 2009;102(5):1377–1386. doi:10.1002/bit.22181
29. Peters-Wendisch PG, Kreutzer C, Kalinowski J, Pátek M, Sahm H, Eikmanns BJ. Pyruvate carboxylase from Corynebacterium glutamicum: characterization, expression and inactivation of the pyc gene. Microbiology. 1998;144(4):915–927. doi:10.1099/00221287-144-4-915
30. Toya Y, Ishii N, Nakahigashi K, Hirasawa T, Soga T, Tomita M, et al. ¹³C-metabolic flux analysis for batch culture of Escherichia coli and its pyk and pgi gene knockout mutants based on mass isotopomer distribution of intracellular metabolites. Biotechnology Progress. 2010;26(4):975–992.
31. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research. 2011;12:2825–2830.
32. Leighty RW, Antoniewicz MR. COMPLETE-MFA: Complementary parallel labeling experiments technique for metabolic flux analysis. Metabolic Engineering. 2013;20:49–55. doi:10.1016/j.ymben.2013.08.006
33. Zhao J, Baba T, Mori H, Shimizu K. Effect of zwf gene knockout on the metabolism of Escherichia coli grown on glucose or acetate. Metabolic Engineering. 2004;6(2):164–174. doi:10.1016/j.ymben.2004.02.004
34. Segre D, Vitkup D, Church GM. Analysis of optimality in natural and perturbed metabolic networks. Proceedings of the National Academy of Sciences. 2002;99(23):15112–15117. doi:10.1073/pnas.232349399
35. Andersen M, Dahl J, Liu Z, Vandenberghe L. Interior-point methods for large-scale cone programming. Optimization for Machine Learning. 2011;p. 55–83.
36. Towell GG, Shavlik JW. Knowledge-based artificial neural networks. Artificial Intelligence. 1994;70(1):119–165. doi:10.1016/0004-3702(94)90105-8
37. Marriott K, Stuckey P. Programming with Constraints: An Introduction. MIT Press; 1998.
38. Niemeyer G. python-constraint: Constraint Solving Problem solver for Python. Available from https://labix.org/python-constraint.
39. Fischer E, Zamboni N, Sauer U. High-throughput metabolic flux analysis based on gas chromatography–mass spectrometry derived ¹³C constraints. Analytical Biochemistry. 2004;325(2):308–316. doi:10.1016/j.ab.2003.10.036
40. Zhao J, Baba T, Mori H, Shimizu K. Global metabolic response of Escherichia coli to gnd or zwf gene-knockout, based on ¹³C-labeling experiments and the measurement of enzyme activities. Applied Microbiology and Biotechnology. 2004;64(1):91–98. doi:10.1007/s00253-003-1458-5
41. Fong SS, Nanchen A, Palsson BO, Sauer U. Latent pathway activation and increased pathway capacity enable Escherichia coli adaptation to loss of key metabolic enzymes. Journal of Biological Chemistry. 2006;281(12):8024–8033. doi:10.1074/jbc.M510016200
42. Peng L, Arauzo-Bravo MJ, Shimizu K. Metabolic flux analysis for a ppc mutant Escherichia coli based on ¹³C-labelling experiments together with enzyme activity assays and intracellular metabolite measurements. FEMS Microbiology Letters. 2004;235(1):17–23. doi:10.1111/j.1574-6968.2004.tb09562.x
43. Tännler S, Decasper S, Sauer U. Maintenance metabolism and carbon fluxes in Bacillus species. Microbial Cell Factories. 2008;7(1):19. doi:10.1186/1475-2859-7-19
44. Chubukov V, Uhr M, Le Chat L, Kleijn RJ, Jules M, Link H, et al. Transcriptional regulation is insufficient to explain substrate-induced flux changes in Bacillus subtilis. Molecular Systems Biology. 2013;9(1):709. doi:10.1038/msb.2013.66
45. van Ooyen J, Noack S, Bott M, Reth A, Eggeling L. Improved L-lysine production with Corynebacterium glutamicum and systemic insight into citrate synthase flux and activity. Biotechnology and Bioengineering. 2012;109(8):2070–2081. doi:10.1002/bit.24486
46. Bommareddy RR, Chen Z, Rappert S, Zeng AP. A de novo NADPH generation pathway for improving lysine production of Corynebacterium glutamicum by rational design of the coenzyme specificity of glyceraldehyde 3-phosphate dehydrogenase. Metabolic Engineering. 2014;25:30–37. doi:10.1016/j.ymben.2014.06.005
47. Wang ZJ, Wang P, Liu YW, Zhang YM, Chu J, Huang Mz, et al. Metabolic flux analysis of the central carbon metabolism of the industrial vitamin B12 producing strain Pseudomonas denitrificans using ¹³C-labeled glucose. Journal of the Taiwan Institute of Chemical Engineers. 2012;43(2):181–187. doi:10.1016/j.jtice.2011.09.002
48. Hemme CL, Fields MW, He Q, Deng Y, Lin L, Tu Q, et al. Correlation of genomic and physiological traits of Thermoanaerobacter species with biofuel yields. Applied and Environmental Microbiology. 2011;77(22):7998–8008. doi:10.1128/AEM.05677-11
49. Tang Y, Pingitore F, Mukhopadhyay A, Phan R, Hazen TC, Keasling JD. Pathway confirmation and flux analysis of central metabolic pathways in Desulfovibrio vulgaris Hildenborough using gas chromatography-mass spectrometry and Fourier transform-ion cyclotron resonance mass spectrometry. Journal of Bacteriology. 2007;189(3):940–949. doi:10.1128/JB.00948-06
50. Orth JD, Thiele I, Palsson BØ. What is flux balance analysis? Nature Biotechnology. 2010;28(3):245–248. doi:10.1038/nbt.1614
51. Orth JD, Conrad TM, Na J, Lerman JA, Nam H, Feist AM, et al. A comprehensive genome-scale reconstruction of Escherichia coli metabolism–2011. Molecular Systems Biology. 2011;7(1):535. doi:10.1038/msb.2011.65
52. Lewis NE, Hixson KK, Conrad TM, Lerman JA, Charusanti P, Polpitiya AD, et al. Omic data from evolved E. coli are consistent with computed optimal growth from genome-scale models. Molecular Systems Biology. 2010;6(1):390. doi:10.1038/msb.2010.47
53. Smallbone K, Simeonidis E. Flux balance analysis: a geometric perspective. Journal of Theoretical Biology. 2009;258(2):311–315. doi:10.1016/j.jtbi.2009.01.027
54. Orth JD, Palsson B. Gap-filling analysis of the iJO1366 Escherichia coli metabolic network reconstruction for discovery of metabolic functions. BMC Systems Biology. 2012;6(1):30. doi:10.1186/1752-0509-6-30
55. Wu SG, He L, Wang Q, Tang YJ. An ancient Chinese wisdom for metabolic engineering: Yin-Yang. Microbial Cell Factories. 2015;14(1):39. doi:10.1186/s12934-015-0219-3
56. Fischer E, Sauer U. Large-scale in vivo flux analysis shows rigidity and suboptimal performance of Bacillus subtilis metabolism. Nature Genetics. 2005;37(6):636–640. doi:10.1038/ng1555
57. Schuetz R, Kuepfer L, Sauer U. Systematic evaluation of objective functions for predicting intracellular fluxes in Escherichia coli. Molecular Systems Biology. 2007;3(1):119. doi:10.1038/msb4100162
58. Tang YJ, Martin HG, Deutschbauer A, Feng X, Huang R, Llora X, et al. Invariability of central metabolic flux distribution in Shewanella oneidensis MR-1 under environmental or genetic perturbations. Biotechnology Progress. 2009;25(5):1254–1259. doi:10.1002/btpr.227
59. Stephanopoulos G. Metabolic fluxes and metabolic engineering. Metabolic Engineering. 1999;1(1):1–11. doi:10.1006/mben.1998.0101
60. Stephanopoulos G, Vallino JJ. Network rigidity and metabolic engineering in metabolite overproduction. Science. 1991;252(5013):1675–1681. doi:10.1126/science.1904627
61. Lien SK, Niedenführ S, Sletta H, Nöh K, Bruheim P. Fluxome study of Pseudomonas fluorescens reveals major reorganisation of carbon flux through central metabolic pathways in response to inactivation of the anti-sigma factor MucA. BMC Systems Biology. 2015;9(1):6. doi:10.1186/s12918-015-0148-0
62. Fuhrer T, Fischer E, Sauer U. Experimental identification and quantification of glucose metabolism in seven bacterial species. Journal of Bacteriology. 2005;187(5):1581–1590. doi:10.1128/JB.187.5.1581-1590.2005
63. Wierckx N, Ruijssenaars HJ, de Winde JH, Schmid A, Blank LM. Metabolic flux analysis of a phenol producing mutant of Pseudomonas putida S12: verification and complementation of hypotheses derived from transcriptomics. Journal of Biotechnology. 2009;143(2):124–129. doi:10.1016/j.jbiotec.2009.06.023
64. del Castillo T, Ramos JL, Rodríguez-Herva JJ, Fuhrer T, Sauer U, Duque E. Convergent peripheral pathways catalyze initial glucose catabolism in Pseudomonas putida: genomic and flux analysis. Journal of Bacteriology. 2007;189(14):5142–5152. doi:10.1128/JB.00203-07
65. Blank LM, Ionidis G, Ebert BE, Bühler B, Schmid A. Metabolic response of Pseudomonas putida during redox biocatalysis in the presence of a second octanol phase. FEBS Journal. 2008;275(20):5173–5190. doi:10.1111/j.1742-4658.2008.06648.x
66. Conway T. The Entner-Doudoroff pathway: history, physiology and molecular biology. FEMS Microbiology Reviews. 1992;103(1):1–28. doi:10.1111/j.1574-6968.1992.tb05822.x
67. Bar-Even A, Flamholz A, Noor E, Milo R. Rethinking glycolysis: on the biochemical logic of metabolic pathways. Nature Chemical Biology. 2012;8(6):509–517. doi:10.1038/nchembio.971
68. Flamholz A, Noor E, Bar-Even A, Liebermeister W, Milo R. Glycolytic strategy as a tradeoff between energy yield and protein cost. Proceedings of the National Academy of Sciences. 2013;110(24):10039–10044. doi:10.1073/pnas.1215283110
69. Berger A, Dohnt K, Tielen P, Jahn D, Becker J, Wittmann C. Robustness and plasticity of metabolic pathway flux among uropathogenic isolates of Pseudomonas aeruginosa. PLoS One. 2014;9(4). doi:10.1371/journal.pone.0088368
70. Ebert BE, Kurth F, Grund M, Blank LM, Schmid A. Response of Pseudomonas putida KT2440 to increased NADH and ATP demand. Applied and Environmental Microbiology. 2011;77(18):6597–6605. doi:10.1128/AEM.05588-11
71. Yao R, Hirose Y, Sarkar D, Nakahigashi K, Ye Q, Shimizu K. Catabolic regulation analysis of Escherichia coli and its crp, mlc, mgsA, pgi and ptsG mutants. Microbial Cell Factories. 2011;10:67. doi:10.1186/1475-2859-10-67
72. Wu SG, Shimizu K, Tang JKH, Tang YJ. Facilitate Collaborations among Synthetic Biology, Metabolic Engineering and Machine Learning. ChemBioEng Reviews. 2016;3(2):1–11. doi:10.1002/cben.201500024


Supplementary Materials

S1 Program. MFlux Computer Program (Source code). Python scripts in a ZIP file. (ZIP, 17.2KB)

S1 Table. Results of 20 case studies. (XLSX, 269.7KB)

S2 Table. Detailed information of the comparison with FBA/pFBA. Constraints, objective function, and simulation results. (XLSX, 49.9KB)



