PLOS ONE. 2020 Sep 9;15(9):e0238689. doi: 10.1371/journal.pone.0238689

Assessment of transcriptomic constraint-based methods for central carbon flux inference

Siddharth Bhadra-Lobo 1,*, Min Kyung Kim 1, Desmond S Lun 1,2,3
Editor: Rajagopal Subramanyam 4
PMCID: PMC7480874  PMID: 32903284

Abstract

Motivation

Determining intracellular metabolic flux through isotope labeling techniques such as 13C metabolic flux analysis (13C-MFA) incurs significant cost and effort. Previous studies have shown transcriptomic data coupled with constraint-based metabolic modeling can determine intracellular fluxes that correlate highly with 13C-MFA measured fluxes and can achieve higher accuracy than constraint-based metabolic modeling alone. These studies, however, used validation data limited to E. coli and S. cerevisiae grown on glucose, with significantly similar flux distribution for central metabolism. It is unclear whether those results apply to more diverse metabolisms, and therefore further, extensive validation is needed.

Results

In this paper, we formed a dataset of transcriptomic data coupled with corresponding 13C-MFA flux data for 21 experimental conditions in different unicellular organisms grown on varying carbon substrates and conditions. Three computational flux-balance analysis (FBA) methods were comparatively assessed. The results show when uptake rates of carbon sources and key metabolites are known, transcriptomic data provides no significant advantage over constraint-based metabolic modeling (average correlation coefficients, transcriptomic E-Flux2 0.725 and SPOT 0.650 vs non-transcriptomic pFBA 0.768). When uptake rates are unknown, however, predictions obtained utilizing transcriptomic data are generally good and significantly better than those obtained using constraint-based metabolic modeling alone (E-Flux2 0.385 and SPOT 0.583 vs pFBA 0.237). Thus, transcriptomic data coupled with constraint-based metabolic modeling is a promising method to obtain intracellular flux estimates in microorganisms, particularly in cases where uptake rates of key metabolites cannot be easily determined, such as for growth in complex media or in vivo conditions.

Introduction

Computational tools integrating transcriptomic data into genome-scale metabolic models can predict system-level and condition-specific metabolic flux distributions. Many methods for inferring metabolic fluxes from gene expression data have been, and continue to be, developed [1–3]. However, comparative assessment of these methods has lacked diverse experimental flux data for validation. Existing validation was performed exclusively against flux data generated from E. coli and S. cerevisiae (yeast) cultures grown on glucose as the sole carbon source [3, 4]. Cells cultured on identical substrates utilize highly similar metabolic pathways [5]. Because of this carbon source bias, the measured metabolic flux distributions in previous validation datasets are highly similar to one another, so those datasets may have been inadequate for assessing predictive performance.

Carbon source availability and relative uptake rates influence cellular metabolism. In nature, heterotrophic microorganisms can encounter a wide set of possible carbon sources to support growth, including sugars, polyols, alcohols, organic acids, and amino acids [6]. Heterotrophs such as E. coli and Bacillus subtilis have been widely studied and cultured on a variety of substrates including monosaccharides (e.g. glucose, fructose, galactose), disaccharides (e.g. sucrose), and two-carbon compounds (e.g. acetate) [7–11]. Thus, under a multitude of possible carbon sources, an incorrectly constrained heterotrophic model can reduce the predictive accuracy of central carbon fluxes from conventional FBA methods. Gene expression may be useful to impute model constraints based on transcript abundance in the absence of specific carbon source and uptake rate data.

Growth condition encompasses the availability of metabolic state-determining metabolites, both organic and inorganic (e.g. glucose, CO2, photons, NO3). Missing or incorrect growth condition information can shift flux predictions toward alternate metabolic states of the cell. Photoautotrophic unicellular metabolic models are generally well characterized and therefore simpler to constrain with respect to carbon source. However, the depletion of non-carbon metabolites may cause the cell to adapt to an alternate metabolic state. For example, light inhibition can shift metabolism among autotrophic, heterotrophic, and mixotrophic (a combination of both) states in Synechocystis sp. PCC 6803 [12]. A substrate void of nitrate can induce replenishment of nitrogen from metabolic sinks such as amino acids in Synechococcus sp. PCC 7002 [13]. Where environmental conditions are not specified, this informational deficit may be overcome with gene expression data, for example by allocating flux to key pathways based on the upregulation of associated transcripts.

Previous studies [2, 3] have extensively evaluated the predictive capability of in silico flux prediction using measured extracellular and intracellular fluxes in multiple experimental conditions, but under single carbon source bias (glucose) in two organisms. To address the limitations of the previous dataset, we have compiled an additional 21 experimental conditions of transcriptome measurements coupled with corresponding central carbon metabolic intracellular 13C flux measurements in 4 organisms (8 in E. coli, 8 in Bacillus subtilis, 3 in Synechocystis sp. PCC 6803, and 2 in Synechococcus sp. PCC 7002). These conditions were applied to models run using two transcriptomic methods (E-Flux2 and SPOT) [4] and the non-transcriptomic method parsimonious FBA (pFBA) [14]. E-Flux2 and SPOT were chosen as representative transcriptomic methods, and similarly pFBA was chosen as the representative non-transcriptomic method, because prior publications suggest they are among the best in their respective method classes [3, 4]. In this study, the generality of E-Flux2 and SPOT has been validated against pFBA using this new dataset of diverse carbon sources and conditional constraints.

In the absence of carbon source and growth condition data, transcriptomic coupled constraint-based modeling is useful in bridging this information gap. Even where it is feasible in the experimental condition of interest, the extraction of 13C-labeled isotopes is costly and laborious. Additionally, many published 13C-MFA studies cannot be reproduced due to missing information: in a review by Crown et al., “only 30% of the studies examined were found to be acceptable” [15]. The 13C-labeled data also conveys minimal growth condition information as it cannot be directly applied to non-carbon metabolites [16]. In contrast, gene expression data is relatively simple to gather and is obtained from cell culturing experiments regularly. With transcriptomic FBA methods, researchers can utilize their gathered expression data to estimate intracellular metabolism.

Materials and methods

Gene expression, flux datasets, and metabolic models

Gene expression measurements were used as the instrument-processed signal, with no further normalization. Any log-transformed data were transformed back to their original scale by exponentiation.
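The back-transformation described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function name `back_transform` and the example values are hypothetical, and the logarithm base (2 is assumed as the default here) must be taken from each dataset's documentation, since the text does not state it.

```python
import math

def back_transform(log_values, base=2.0):
    """Undo a log transform by exponentiation: x = base ** log_x.

    The base is an assumption; check how each dataset was transformed.
    """
    return [base ** value for value in log_values]

# Made-up log2 signals for three genes (illustrative only).
log2_signal = [3.0, 10.0, 0.0]
raw_signal = back_transform(log2_signal)

# A natural-log dataset would instead be restored with base=math.e.
ln_signal = [1.0]
raw_from_ln = back_transform(ln_signal, base=math.e)
```

Keeping the base as an explicit parameter avoids silently mixing log2 and natural-log datasets when several sources are combined.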

Data and model for E. coli

For E. coli, both the measured gene expression (single color microarray) and 13C flux data were obtained from a previous study by Gerosa et al. [17]. In this study, data were measured from E. coli wild type BW25113 cells growing exponentially on eight different carbon sources: glucose, galactose, gluconate, fructose, glycerol, pyruvate, acetate, and succinate. We used iJO1366 [18] as the genome-scale metabolic model.

Data and model for B. subtilis

For B. subtilis, we used transcriptomic (single color microarray) and 13C flux data published in [19] and [20], respectively. Data were obtained from B. subtilis BSB168 cells grown under eight conditions defined by different carbon sources: glucose, fructose, gluconate, succinate + glutamate, glycerol, malate, malate + glucose, and pyruvate. For the genome-scale metabolic model of B. subtilis, the model published by Oh et al. [21] was used.

Data and model for Synechocystis sp. PCC 6803

For Synechocystis sp. PCC 6803, transcriptomic (RNA-seq) data was graciously provided by Dr. Le You (University of California San Diego, USA) and Dr. Yinjie Tang (Washington University in St. Louis, USA) [12]. The 13C flux data was compiled from three different publications [12, 22, 23]. Data were measured from the strain Synechocystis sp. PCC 6803 grown under three different conditions: photoautotrophic (i.e. HCO3- (bicarbonate) as the main carbon source) [23], photomixotrophic (i.e. open air CO2 + glucose) [22], and heterotrophic (i.e. open air CO2 + glucose, constrained photons) [12], respectively. We used the genome-scale metabolic model of Synechocystis sp. PCC 6803 developed by Knoop et al. [24]. An external pseudo-compartment was added to the model through which metabolites can be exchanged with the external environment via cellular transport reactions.

Data and model for Synechococcus sp. PCC 7002

For Synechococcus sp. PCC 7002, the transcriptomic (RNA-seq) data was obtained from a previous publication by Ludwig and Bryant [25]. The 13C flux data for this model was gathered from Qian et al. [26]. Data were measured from Synechococcus sp. PCC 7002 cells grown photoautotrophically (i.e. CO2 carbon source and photon uptake) with 10 mM nitrate and with no other nitrogen source. iSyp821 was used as the organism's genome-scale metabolic model [13].

Computational prediction and correlation

Computational metabolic flux prediction

In this study, E-Flux2, SPOT [4], and pFBA [14] were used to predict metabolic flux distributions. Biomass production was set as the objective function for E-Flux2 and pFBA. All FBA methods used in this study are referenced from their original publications [4, 14]. Computations were carried out on the macOS Mojave platform using a personal computer with a 3.1 GHz Intel Core i5 processor with 8GB of RAM. E-Flux2, SPOT and pFBA methods are implemented in MOST (Metabolic Optimization and Simulation Tool) which is available at http://most.ccib.rutgers.edu [27].

Correlation calculations

Validation of the predictive accuracy of the methods used in this study was done by calculating the uncentered Pearson product-moment correlation between in silico fluxes and corresponding 13C determined intracellular fluxes as previously described in [4]. A value of the correlation coefficient close to +1 or -1 indicates a strong relationship via a positive or negative scale factor, respectively, between experimentally measured fluxes and computationally predicted fluxes; a value of 0 indicates no such relationship [28]. If a measured reaction corresponds to a set of consecutive reactions in the model that are linked with intermediate metabolites (AND relationship), then the minimum flux value among the predicted fluxes was used. If a measured flux corresponds to multiple identical reactions (OR relationship), the sum of those predicted fluxes was used to calculate the correlation.
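The correlation and the AND/OR aggregation rules above can be sketched in a few lines. This is a hedged illustration that assumes the standard definition of the uncentered Pearson correlation (no mean subtraction); the exact formulation used in [4] and in the authors' MATLAB scripts is not reproduced here. The function names and flux values are hypothetical, and for negative fluxes "minimum" is read here as the plain signed minimum, which is an assumption.

```python
import math

def uncentered_pearson(predicted, measured):
    """Uncentered Pearson correlation: sum(x*y) / sqrt(sum(x^2) * sum(y^2))."""
    if len(predicted) != len(measured):
        raise ValueError("flux vectors must have the same length")
    dot = sum(p * m for p, m in zip(predicted, measured))
    norm = math.sqrt(sum(p * p for p in predicted) * sum(m * m for m in measured))
    if norm == 0.0:
        raise ValueError("correlation is undefined for an all-zero flux vector")
    return dot / norm

def and_flux(fluxes):
    """Consecutive model reactions (AND): use the minimum predicted flux."""
    return min(fluxes)

def or_flux(fluxes):
    """Multiple identical model reactions (OR): use the sum of predicted fluxes."""
    return sum(fluxes)

# Made-up example: three measured reactions mapped to model reactions.
measured = [10.0, 4.0, 6.0]
predicted = [
    and_flux([5.0, 7.0]),  # two consecutive steps map to one measured flux
    or_flux([1.0, 1.0]),   # two parallel identical reactions
    3.0,                   # one-to-one mapping
]
# predicted is exactly 0.5 * measured, so r equals 1 up to rounding.
r = uncentered_pearson(predicted, measured)
```

Because the predicted vector here is a positive multiple of the measured one, the correlation is 1 even though the absolute values differ by a factor of two.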

Correlations were calculated between the measured and predicted fluxes per carbon source in MATLAB R2018b (The MathWorks Inc., Natick, Mass., USA). The predicted fluxes for the transcriptomic methods (E-Flux2 and SPOT) were generated using the respective carbon source and/or growth condition gene expression profile. pFBA does not use gene expression and was run in two scenarios, one where the carbon source flux was not specified (i.e. maximal uptake allowed) and one where the carbon source flux was specified (for uptake rates used see S1 Table in S1 File). Carbon source fluxes were taken from the uptake rates in the respective 13C flux experiments (mmol/g DCW/h).

Results and discussion

To test generality of E-Flux2 and SPOT, we evaluated predictive accuracy by calculating the uncentered Pearson correlation (Section Methods) between experimentally measured and computationally predicted intracellular fluxes using transcriptomic data, for the compiled 21 experimental conditions. The dataset consists of 8, 8, 3, and 2 conditions of E. coli, B. subtilis, Synechocystis sp. PCC 6803, and Synechococcus sp. PCC 7002, respectively (Section Methods provides carbon source information). We expect model choice affects the transcriptional FBA methods more than the non-transcriptional. A less complete model may have reduced constraint mapping from the relevant gene expression data.

We have chosen the uncentered Pearson correlation as the goodness-of-fit metric because transcriptomic flux inference, in general, assumes that a high transcript count corresponds to high flux, but does not estimate the actual flux value. Therefore, the predicted flux values are in arbitrary units. This type of correlation captures predictive accuracy irrespective of the scaling introduced by the gene expression data.
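The scale independence noted above can be made explicit with a short derivation. It assumes the standard definition of the uncentered Pearson correlation; the symbols x (predicted fluxes), y (measured fluxes), and α (an arbitrary nonzero scale factor) are introduced here for illustration only.

```latex
r_u(x, y) = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}},
\qquad
r_u(\alpha x, y)
  = \frac{\alpha \sum_i x_i y_i}{|\alpha|\,\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}}
  = \operatorname{sgn}(\alpha)\, r_u(x, y),
\quad \alpha \neq 0 .
```

Under this definition, rescaling the predicted fluxes by any positive factor leaves the correlation unchanged, while reversing the sign of every predicted flux flips the sign of the correlation.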

To test flux prediction under known and unknown carbon sources, E. coli and B. subtilis fluxes were simulated under different carbon source availabilities, at three different stages labeled DC, AC, and Full AC.

  • DC: Known carbon source and uptake rate information available, uptake rate is only supplied to the non-transcriptomic method, pFBA.

  • AC: Unknown carbon source and uptake rate, only eight speculative carbon sources without uptake rate data are available to the model.

  • Full AC: No carbon source information available, all possible carbon sources (and any other extracellular metabolites) opened for exchange into the model.
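The three stages above can be sketched as different settings of the exchange-reaction bounds. This is an illustrative sketch only, not the MOST configuration used in the study: the function name, the reaction identifiers, the default maximum uptake of 1000, and the example uptake rate are all hypothetical, and it assumes the common FBA sign convention in which a negative exchange flux denotes uptake. Only the lower bound is set here; fixing a flux exactly would also require setting the upper bound.

```python
def exchange_lower_bounds(scenario, all_sources, candidates=(), true_source=None,
                          uptake_rate=None, max_uptake=1000.0):
    """Return {exchange_id: lower_bound} for the DC, AC, or Full AC stage.

    A lower bound of 0.0 closes uptake; a negative value allows uptake
    (assumed sign convention).
    """
    bounds = {rxn: 0.0 for rxn in all_sources}  # every source closed by default
    if scenario == "DC":
        if true_source not in bounds:
            raise ValueError("DC needs a known carbon source present in the model")
        # The measured rate is supplied for pFBA only; transcriptomic methods
        # would leave uptake_rate as None and rely on expression data.
        rate = max_uptake if uptake_rate is None else uptake_rate
        bounds[true_source] = -rate
    elif scenario == "AC":
        for rxn in candidates:  # the speculative set of carbon sources
            bounds[rxn] = -max_uptake
    elif scenario == "Full AC":
        for rxn in all_sources:  # everything opened for exchange
            bounds[rxn] = -max_uptake
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    return bounds

# Hypothetical identifiers and a made-up uptake rate (not from S1 Table).
sources = ["EX_glucose", "EX_acetate", "EX_succinate"]
dc = exchange_lower_bounds("DC", sources, true_source="EX_glucose", uptake_rate=8.5)
ac = exchange_lower_bounds("AC", sources, candidates=["EX_glucose", "EX_acetate"])
full_ac = exchange_lower_bounds("Full AC", sources)
```

Expressing the stages as bound settings makes it clear that the three stages differ only in which exchanges are opened and whether a rate is supplied.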

To test flux prediction under different growth conditions in PCC 6803 and PCC 7002, fluxes were simulated based on the organism’s possible metabolic states, at two different stages. Carbon sources are fewer and simpler to constrain in these photoautotrophic organisms; therefore, AC here is equivalent to Full AC for the heterotrophic organisms above.

  • DC: Growth condition information and metabolic state are known and uniquely applied to simulate each respective organism’s metabolic states. Carbon uptake rate data only supplied to pFBA.

  • AC: No growth condition information is available, all possible carbon and inorganic metabolites available for simulating the mixotrophic condition.

Unknown growth condition was used to represent cases of complex media or in vivo growth of cultures. An example would be studying the metabolism of enteric bacteria, for which the growth medium is complex and the culture is grown in vivo. An example enteric model is Mycobacterium tuberculosis, in which using conventional 13C-MFA to measure latent bacteria would not be feasible due to additional constraints such as tissue specificity and slow in vivo growth rates, but extraction of RNA expression data has been shown to be possible [29, 30]. In the case of the gut microbiome, the distribution of bacterial species in the gut has been shown to vary based on diet [31]. With improved RNA extraction techniques, it may be possible to use transcriptomic flux prediction to detect microbial metabolic shifts in species that persist in the gut during dietary changes. A conjectured experiment would be to sample RNA from the gut during a period of one type of host diet, then sample RNA again after a period of time on another diet, although the outcome would be highly dependent on the quality of the expression profiles and metabolic models.

Correlations for known and unknown carbon source

Central carbon flux correlations in E. coli

Under direct carbon source (DC) the E. coli models were supplied with only one carbon source each (Fig 1A). With complete carbon source information supplied, correlations for the transcriptomic and non-transcriptomic methods are similarly good. pFBA was provided an additional constraint to improve prediction: the experimentally measured carbon source uptake flux (uptake rate) was set within the pFBA runs only (Fig 1A). For a speculative set of possible carbon sources, Fig 1B shows the measured fluxes of E. coli grown on a single carbon source correlated with the predicted fluxes when each model was supplied with all 8 carbon sources (AC) used in the measurements (i.e. glucose, galactose, gluconate, fructose, glycerol, pyruvate, acetate, and succinate). Fig 1C simulates the absence of any carbon source and uptake rate information, with the model fully open for exchange with the extracellular environment. Here overall predictive accuracy drops across all methods as the number of available carbon sources increases. E-Flux2 on average performs comparably to SPOT, with slightly worse correlation. Models run with all 294 available carbon sources (Full AC) and 30 ion sources show that on average E-Flux2 and SPOT generate reasonable correlations (Fig 1C). All three methods produce lower correlations for carbon sources found in the TCA cycle (Fig 1C Full AC acetate, pyruvate, and succinate). These low correlations were investigated and determined to be due to predicting flux opposite in direction to the measured flux (S3 Fig in S1 File). The measured fluxes for glycolysis are negative in reaction direction and the predictions are positive, while the measured fluxes for TCA cycle reactions are positive, and the predicted fluxes are negative. SPOT maintains higher correlations compared to E-Flux2 and pFBA due to predicting the TCA cycle reactions in the correct direction.

Fig 1. E. coli predictions.

Correlations between measured and predicted flux of E. coli grown on 8 different carbon sources for the three FBA methods E-Flux2 (red), SPOT (gray), pFBA (blue). Horizontal dashed lines are the respectively colored mean correlations per method. The mean (μ) is the average prediction correlation per method. The standard deviation (σ) is the spread of prediction correlation above and below the mean, denoted by the error bars. (A) Respective direct carbon source (DC) supplied. pFBA was given the additional constraint of known uptake rate in the single carbon source, while E-Flux2 and SPOT were not. All methods perform consistently across the individual carbon sources. (B) All 8 carbon sources supplied and correlated with measured flux from single carbon growth (AC). Correlations drop in all methods, particularly in the TCA cycle carbon sources (Acetate, Pyruvate, and Succinate). (C) All possible carbon sources in the model supplied (Full AC). All methods again lose performance, but the transcriptomic methods retain reasonable correlations. See Supplementary S1 Table for uptake rates used.

Central carbon flux correlations in B. subtilis

The B. subtilis measured fluxes cover 8 different carbon source conditions, including two double carbon source experiments (Fig 2 glutamate + succinate and malate + glucose). Fig 2A shows that the DC correlations from E-Flux2 and pFBA are comparable, with known carbon flux giving the best correlations on average. With speculative carbon sources (Fig 2B), all three methods perform similarly on average for AC. pFBA performs similarly poorly to the other methods for the double carbon cases and only marginally better for glutamate + succinate (see Discussion). SPOT performs the best for the TCA cycle single carbon source cases (AC malate, AC pyruvate). The same can be seen in the Full AC models (269 carbon sources, 25 ion sources) (Fig 2C) but pFBA on average performs worse, most notably in the TCA cycle single carbon sources.

Fig 2. B. subtilis predictions.

Correlations between measured and predicted flux of heterotrophic B. subtilis grown on 8 different carbon sources for the three FBA methods E-Flux2 (red), SPOT (gray), pFBA (blue). Double carbon sources are glutamate plus succinate (Glut + Succ) and malate plus glucose (Mal + Glcs). Horizontal dashed lines are the respectively colored mean correlations per method. The mean (μ) is the average prediction correlation per method. The standard deviation (σ) is the spread of prediction correlation above and below the mean, denoted by the error bars. (A) Respective direct carbon source (DC) supplied. pFBA was given the additional constraint of known uptake rate in the single carbon source, while E-Flux2 and SPOT were not. All methods perform consistently across the individual carbon sources, with minor drops in correlation for double carbon sources (Glut + Succ and Mal + Glcs). (B) All 8 carbon sources supplied and correlated with measured flux from single carbon growth (AC). Correlations drop in all methods, particularly for Malate. (C) All possible carbon sources in the model supplied (Full AC). All methods again lose performance, but the transcriptomic methods retain reasonable correlations. See Supplementary S1 Table for uptake rates used.

Discussion of known and unknown carbon source

For the E. coli and B. subtilis models, if carbon source and uptake rates are known, the directly provided carbon source and uptake rate information (DC) produces flux predictions in non-transcriptomic pFBA that are comparable to transcriptomic E-Flux2 (Figs 1A and 2A). SPOT provides a reasonable, but lower average prediction for both DC cases. E-Flux2 predicts flux similarly to pFBA except E-Flux2 was not provided any uptake rate information. The effect of gene expression derived reaction bounds predicts central carbon flux well, even without providing respective carbon uptake rates. This suggests that gene expression can serve as a substitute for measured carbon source uptake information, if the carbon source is known.

If carbon source is speculatively known, uptake rate is unknown, and presented with a relatively small set of 8 possible carbon sources (AC), pFBA predictive power drops significantly (Fig 2A and 2B). Without transcriptomic data, pFBA sets fixed proportion uptake rates of the available metabolites in the model across multiple cases. This affects the subsequent central carbon flux prediction as a single flux pattern is being predicted across all conditions. In contrast, the transcriptome coupled methods do not have the same uptake of carbon source per condition, as the gene expression dictates the proportions of carbon source flux for cellular uptake. This suggests that with unknown uptake rates and speculatively known carbon sources, gene expression can still serve as a substitute for measured uptake rate data.

Under both unknown carbon source and unknown uptake rates (Full AC), where the models are allowed uptake of all possible carbon sources present in the model, the pFBA average prediction score drops further while E-Flux2 and SPOT remain similar to their AC correlations (Figs 1C and 2C). The E-Flux2 and SPOT average correlations even increase slightly from the B. subtilis AC to Full AC cases. A possible explanation is that with an overabundance of carbon sources, gene expression can mediate the allocation of flux feeding into central carbon metabolism from multiple metabolic network entry points and thereby predict reaction directionality better (see S3–S5 Figs in S1 File). This is in contrast to when flux directionality is set based on a small set of carbon sources, such as the TCA cycle or glycolysis relevant metabolites.

Additionally, in the Full AC model, all ion uptake reactions were open, suggesting the transcriptome can also facilitate ion flux prediction, where 13C data generally does not provide information. For both unknown carbon source and unknown uptake rate conditions (Figs 1C and 2C), SPOT performs the best on average. This is likely due to SPOT maximizing correlation with flux prediction and the gene expression set, rather than setting expression-based reaction bounds (as in E-Flux2) which can set a large flux window that can affect predicted directionality in subsequent reactions (see S3–S5 Figs in S1 File). The generally higher prediction correlations for E-Flux2 and SPOT suggest that under both unknown carbon source and unknown uptake rates, gene expression data can substitute for carbon source and uptake rate information for central carbon flux prediction.
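The "large flux window" effect of expression-based bounds can be illustrated with a rough sketch. This is not the E-Flux2 implementation in MOST: it assumes a simple mapping in which each reaction's expression level becomes its upper bound, and its negated value becomes the lower bound if the reaction is reversible. The function name and the expression values are made up for illustration.

```python
def expression_bounds(expression, reversible):
    """Map per-reaction expression levels to (lower, upper) flux bounds.

    Assumed mapping: upper = expression level; lower = -level for
    reversible reactions and 0.0 otherwise.
    """
    bounds = {}
    for rxn, level in expression.items():
        if level < 0:
            raise ValueError(f"expression for {rxn} must be non-negative")
        lower = -level if reversible.get(rxn, False) else 0.0
        bounds[rxn] = (lower, level)
    return bounds

# Made-up expression levels for two glycolytic reactions.
expression = {"PGI": 120.0, "PFK": 40.0}
reversible = {"PGI": True, "PFK": False}
bounds = expression_bounds(expression, reversible)
window_width = {rxn: upper - lower for rxn, (lower, upper) in bounds.items()}
```

In this sketch a highly expressed reversible reaction receives a wide, symmetric window, so the bounds alone do not fix its direction; the direction is then decided by the rest of the optimization, which is consistent with the directionality effect discussed above.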

On average the transcriptomic methods perform better than pFBA under unknown carbon source and uptake, but in one exception of the double carbon source conditions, pFBA predicts central carbon flux with higher accuracy than either transcriptomic method across the DC, AC, and Full AC cases (Fig 2A, 2B and 2C glutamate + succinate). This is potentially due to pFBA predicting low flux correctly for a subset of the measured flux values for B. subtilis, while E-Flux2 and SPOT allocated different fluxes for these reactions based on the presence of the associated transcripts (see reaction directions in S5 Fig in S1 File). Hence, when a measured reaction has low flux, but some transcript abundance, the transcriptomic methods may attribute more flux to these reactions.

Additionally, carbon source similarity affects flux predictions. On a carbon source basis, glutamate + succinate measured fluxes are similar to glycerol and pyruvate measured fluxes. The other double carbon source (malate + glucose) exhibits a measured flux distribution very close to the single carbon malate measured distribution (see S1A, S5 Figs in S1 File). This suggests that some carbon sources produce similar flux distributions to others, both experimentally and in silico. This is supported by the clustering of pFBA flux patterns across all constraints and conditions (S1B, S5 Figs in S1 File), which shows similarity between the predicted overall glutamate + succinate distribution and the glycerol and pyruvate predicted distributions. This effect has also been shown to shift flux predictions away from the measured distribution. In one case, the predicted distribution for malate + glucose more closely resembles the predicted glucose distribution, but the measured malate + glucose flux distribution more closely resembles the measured malate flux distribution.

Correlations for known and unknown growth condition

Central carbon flux correlations in PCC 6803

Of the autotrophic, mixotrophic, and heterotrophic conditions in Synechocystis sp. PCC 6803 (Fig 3A), E-Flux2 and pFBA produce very similar central carbon flux distributions under the autotrophic condition. These predictions correlate well with the autotrophic measured fluxes, suggesting that both methods are producing nearly identical flux distributions. In the mixotrophic condition, pFBA, with known carbon source and flux, produces a higher correlation than the other methods. All methods predict heterotrophic central carbon metabolism poorly, with SPOT producing the only positive correlation between measured and predicted fluxes. SPOT produces similar correlation values with the three measured flux distributions, and is the only method with non-negative correlations for all three conditions. Fig 3B shows the correlations of fluxes predicted using the three conditional gene expression sets (expression data collected from autotrophic, mixotrophic, and heterotrophic cultures) while under mixotrophic constraints, simulating how predicted fluxes correlate under unknown conditions and guided by transcriptomic data. E-Flux2 and pFBA produce negative correlations for all mixotrophically constrained predictions. SPOT again provides the only positive correlations.

Fig 3. Synechocystis sp. PCC 6803 predictions.

Correlations between measured and predicted flux of multitrophic Synechocystis sp. PCC 6803 grown in 3 different environment conditions for the three FBA methods E-Flux2 (red), SPOT (gray), pFBA (blue). Horizontal dashed lines are the respectively colored mean correlations per method. The mean (μ) is the average prediction correlation per method. The standard deviation (σ) is the spread of prediction correlation above and below the mean, denoted by the error bars. (A) Autotrophic, mixotrophic, and heterotrophic conditional constraints applied and correlated with respective measured fluxes. pFBA was given the additional constraint of known uptake rate in the single carbon source, while E-Flux2 and SPOT were not. (B) Mixotrophic condition constraints applied and correlated with all three conditional measured fluxes. See Section Methods for condition specific model constraints and see Supplementary S1 Table for uptake rates used.

Central carbon flux correlations in PCC 7002

Measured fluxes from Synechococcus sp. PCC 7002 in nitrogen-replete (10 mM nitrate) and nitrogen-deprived (N-deprived, no nitrogen source) conditions correlated well with predicted fluxes under autotrophic constraints. SPOT produced significantly better central carbon flux predictions for the N-deprived condition, while the other methods performed similarly across both nitrogen conditions (Fig 4, N-replete). Fig 4B shows PCC 7002 in an AC mixotrophic condition not naturally exhibited in PCC 7002 (see Results and discussion, Unknown carbon source and growth condition). Both sets of predicted fluxes are allowed open uptake of all carbon sources as well as NO3 uptake, simulating unknown carbon source and unknown nitrate availability. SPOT performs well under the set of unknown conditions, while the other methods perform poorly.

Fig 4. Synechococcus sp. PCC 7002 predictions.

Correlations between measured and predicted flux of multitrophic Synechococcus sp. PCC 7002 grown in autotrophic conditions for the three FBA methods E-Flux2 (red), SPOT (gray), pFBA (blue). Horizontal dashed lines are the respectively colored mean correlations per method. The mean (μ) is the average prediction correlation per method. The standard deviation (σ) is the spread of prediction correlation above and below the mean, denoted by the error bars. (A) Autotrophic conditional constraints applied and correlated with N-replete and N-deprived measured fluxes. (B) PCC 7002 in AC autotrophic condition (mixotrophic, see Discussion) and unconstrained NO3 uptake. See Section Methods for condition specific model constraints and see Supplementary S1 Table for uptake rates used.

Discussion of known and unknown growth condition

For the cyanobacteria models (PCC 6803 and PCC 7002), carbon source is relatively easy to choose and constrain. The models we assessed are known to fix a single source of inorganic carbon under autotrophic conditions, which pFBA can predict well with known carbon source and uptake rates (Fig 3A). In the autotrophic growth condition, uptake rate of the inorganic carbon source does not significantly affect central carbon flux prediction (see S2 Fig in S1 File). But if an organism can increase biomass in multiple possible growth conditions (PCC 6803), then information pertaining to the presence and uptake rates of inorganic carbon source versus glucose is much more useful.

With unknown growth condition information for PCC 6803, a reasonable approach is to model the flux distribution under mixotrophic conditions. That is, allow uptake of both inorganic and organic carbon as well as photon flux and use the associated gene expression to dictate how fluxes should be allocated. Fig 3B shows that under such conditions, pFBA and E-Flux2 predict central carbon flux similarly poorly. However, SPOT consistently produces positive correlations between the predicted and measured fluxes, across growth conditions. This suggests that with SPOT, gathered gene expression and the genome-scale metabolic model can give some indication of the condition under which an organism is growing. A possible explanation for the lower predictive accuracy of both E-Flux2 and pFBA compared to SPOT is that under glucose availability the typical glycolysis flux distribution is not always found in nature (S6 Fig in S1 File). In PCC 6803, we found that the pentose phosphate pathway (see S2 Table in S1 File), which is an alternative metabolic route to glycolysis, has significantly higher flux predicted through it by SPOT than by the other methods. This is further supported by information suggesting that PCC 6803 is merely a facultative heterotroph and therefore only metabolizes exogenous organic carbon when given no other choice [32].

In PCC 7002, the growth condition is only partially known. PCC 7002 is modeled under photoautotrophic conditions, but the uptake rates of key secondary metabolites (NO3 exchange) are unknown. Here pFBA predicts central carbon flux poorly. By applying different uptake rates of non-carbon metabolites, it is possible to determine whether an organism is in one metabolic state versus another. For example, constraining the uptake of oxygen can produce a flux distribution for anaerobic metabolism [33]. Similarly, in PCC 7002 the presence or depletion of nitrate in the system can lead to different intracellular carbon utilization.

PCC 7002 is known to be an obligate photoautotroph [34]. Therefore, non-transcriptomic methods should be able to perform well in predicting central carbon metabolism, but Fig 4 shows that pFBA, given a known carbon source and uptake rate, performs worse than the transcriptomic methods in both N-replete and N-deprived cases. In Fig 4A, SPOT predicts N-deprived central carbon flux better than the other methods. This is likely due to the drawing of flux from nitrogen sinks such as amino acids to compensate for the lack of extracellular nitrate.

Extended analysis: Artificial conditional information deficit

As an extension of our findings, PCC 7002 was constrained under a second, artificial growth condition set to mimic mixotrophic conditions. We attempted to predict flux using this second condition’s set of incorrect constraints to see how gene expression might help reduce prediction error. This condition allows uptake of carbon sources other than CO2 as well as unconstrained NO3 uptake for both the N-replete and N-deprived cases (Fig 4B). This mixotrophic state is not found in nature, and therefore the PCC 7002 central carbon flux distribution correlation was expected to be poor [34]. With the nitrate growth condition unspecified in the model, NO3 was allowed into the cell freely for both conditions. The correlations for E-Flux2 and pFBA were indeed poor, but SPOT produced strong correlations. This suggests that even in incorrectly constrained models supplied with unrealistic carbon sources and no secondary metabolite information, gene expression can still be used to predict central carbon flux well (S7 Fig in S1 File).

Conclusion

In this study, we compiled 21 experimental conditions and corresponding transcriptomic data for cells grown on various carbon sources and conditions. The predicted fluxes were correlated against experimentally measured fluxes to evaluate the predictive power of E-Flux2 and SPOT compared with the non-transcriptomic method, pFBA. pFBA is a representative method for comparison as it has been shown to give good predictions, was used in the previous two validation studies, and does not use transcriptomic data [2, 3].
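To make the evaluation metric concrete, the sketch below computes a correlation coefficient between a vector of predicted fluxes and a vector of measured fluxes. It assumes the ordinary (centered) Pearson coefficient; the scripts in the Supporting Information may use a different variant, and the function name is hypothetical.

```python
import math


def pearson_r(predicted, measured):
    """Centered Pearson correlation between two equal-length flux vectors."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two vectors of equal length with at least 2 entries")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Sum of cross-products and sums of squares of the mean-centered values
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_p * var_m)
```

Because the coefficient is invariant to scaling, a prediction that is proportional to the measured fluxes scores 1.0 even if every value is off by a constant factor, so it measures agreement in pattern rather than in absolute magnitude.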

If carbon source and uptake rate information are accurately known for microorganisms and gene expression data is unavailable, pFBA is a suitable method for central carbon flux prediction (Figs 1A and 2A). Even with well-defined carbon source, uptake rate, and growth condition information (growth condition meaning other factors affecting the cell’s metabolism, such as light intensity), E-Flux2 performed better than pFBA in 13 of the 21 models. In all of these cases E-Flux2 was not provided any measured uptake rate data, while pFBA was.

If a carbon source or growth condition informational deficit is encountered, then SPOT is the method of choice, as it consistently produced good correlations and can account for noncanonical internal metabolism (see Results and discussion, Known and unknown growth condition). Although pFBA can give good predictions, any uncertainty in carbon source or growth condition carries the risk of generating very poor predictions. Even with accurate carbon source and growth condition information, pFBA can still produce negative correlations (Fig 3A). Gene expression can produce better central carbon flux predictions, as the expression data can account for other unknowns in the model beyond just the carbon source (Fig 4A, N-deprived).

Based on the findings in this study, we propose a general decision tree to be used in constraint-based modeling for central carbon flux prediction in microorganisms (Fig 5). In this figure, if no expression data is available, then pFBA is the default method of choice. If any expression data is available, then a transcriptomic method is suggested as gene expression has been shown to account for additional informational deficits beyond carbon source such as ion exchanges.

Fig 5. General methods decision tree.

The three FBA methods are shown as E-Flux2 (red), SPOT (gray), pFBA (blue). Left branches on the tree indicate a YES decision, right branches indicate a NO decision. Growth condition refers to the availability of inorganic and organic metabolites that can shift metabolism between different states (e.g. photons s, NO3, CO2, glucose).
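The recommendations in the Conclusion can be written as a small selection function. This is only a sketch reconstructed from the text (pFBA without expression data, SPOT when carbon source or growth condition information is missing, E-Flux2 otherwise); the exact branching order in Fig 5 may differ, and the function and argument names are hypothetical.

```python
def choose_method(has_expression_data, carbon_source_known, growth_condition_known):
    """Suggest a flux-prediction method from the information available.

    Assumed reading of the Conclusion text, not a transcription of Fig 5.
    """
    if not has_expression_data:
        # No transcriptomic data: fall back to the non-transcriptomic method
        return "pFBA"
    if carbon_source_known and growth_condition_known:
        # Expression data plus well-defined conditions
        return "E-Flux2"
    # Expression data with a carbon source or growth condition deficit
    return "SPOT"
```

For example, `choose_method(True, False, True)` returns `"SPOT"`, reflecting the statement that SPOT is the method of choice when the carbon source is uncertain.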

Using validated methods like SPOT can minimize the risk of predicting incorrect central carbon flux distributions in the absence of accurate carbon source and growth condition data. Not only does SPOT consistently produce positive correlations in all 21 samples, it also produces low, if not the overall lowest, standard deviations in predictive accuracy (Figs 1–5, legend σ-values). For future work, developing a method that better determines flux directionality from gene expression values should improve transcriptomic flux prediction.
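The mean and standard deviation of a method's correlations across conditions can be summarized as in the sketch below. The method labels and numbers are placeholders invented for illustration, not results from this study, and the use of the sample (n − 1) standard deviation is an assumption.

```python
import statistics


def summarize_correlations(correlations_by_method):
    """Map each method name to (mean, sample standard deviation) of its correlations."""
    return {
        method: (statistics.mean(values), statistics.stdev(values))
        for method, values in correlations_by_method.items()
    }


# Placeholder values only: one erratic method and one consistent method.
example = {
    "method_a": [0.9, 0.1, -0.4],
    "method_b": [0.5, 0.6, 0.4],
}
summary = summarize_correlations(example)
```

In this made-up example the second method has both the higher mean and the much smaller spread, which is the kind of consistency the σ-values in the figure legends are meant to convey.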

In cells grown on well-defined media, it is relatively easy to determine carbon sources and uptake rates. The carbon source is generally known, while the uptake rate is determined by measuring how fast a culture consumes it. For cells grown in vivo or on complex media, where the growth condition cannot be fully defined, 13C-labeling may not even be feasible, let alone affordable. Additionally, specifics pertaining to the growth conditions, such as inorganic compound exchange, may not be available or easily measured. In such cases, gene expression data can nevertheless be gathered simply and cheaply, and methods to infer intracellular metabolic flux from transcriptomic data (such as E-Flux2 and SPOT) have great utility.
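As a rough illustration of how an uptake rate follows from culture measurements, the sketch below estimates an average specific uptake rate from substrate and biomass concentrations at two time points. The formula, units, and function name are assumptions made for illustration (substrate in mmol/L, biomass in gDW/L, time in h, with the arithmetic mean of biomass used as an approximation); this is not the procedure used in the studies cited here.

```python
def specific_uptake_rate(s_start, s_end, x_start, x_end, hours):
    """Approximate specific uptake rate in mmol per gDW per hour.

    s_start, s_end: substrate concentration (mmol/L) at the two time points
    x_start, x_end: biomass concentration (gDW/L) at the two time points
    hours: elapsed time between the measurements
    """
    if hours <= 0:
        raise ValueError("elapsed time must be positive")
    mean_biomass = (x_start + x_end) / 2.0  # crude average over the interval
    if mean_biomass <= 0:
        raise ValueError("biomass must be positive")
    consumed = s_start - s_end  # mmol/L of substrate taken up
    return consumed / (mean_biomass * hours)


rate = specific_uptake_rate(20.0, 10.0, 0.5, 1.5, 2.0)
```

A value obtained this way is what would typically be imposed as the uptake bound in a constraint-based model; in complex media or in vivo, the inputs to this calculation are exactly what cannot be measured.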

Supporting information

S1 Dataset. E. coli model, data, and scripts used in this study.

The genome-scale metabolic model of E. coli iJO1366 [18]. Individual models in SBML format (.xml) with the constraint sets used (AC, DC, etc.) are included. Transcriptomic data (.csv) from the study by Gerosa et al. [17]. Predicted fluxes generated using these data. MATLAB scripts used for calculating correlations (.m).

(GZ)

S2 Dataset. Bacillus subtilis model, data, and scripts used in this study.

The genome-scale metabolic model of B. subtilis from Oh et al. [21]. Individual models in SBML format (.xml) with the constraint sets used (AC, DC, etc.) are included. Transcriptomic data (.csv) from Nicolas et al. [19]. Predicted fluxes generated using these data. MATLAB scripts used for calculating correlations (.m).

(GZ)

S3 Dataset. Synechocystis sp. PCC 6803 model, data, and scripts used in this study.

The genome-scale metabolic model of PCC 6803 developed by Knoop et al. [24]. Individual models in SBML format (.xml) with the constraint sets used (AC, DC, etc.) are included. Transcriptomic data (.csv) provided by Dr. Le You (University of California San Diego, USA) and Dr. Yinjie Tang (Washington University in St. Louis, USA) [12]. Predicted fluxes generated using these data. MATLAB scripts used for calculating correlations (.m).

(GZ)

S4 Dataset. Synechococcus sp. PCC 7002 model, data, and scripts used in this study.

The genome-scale metabolic model of PCC 7002 from Qian et al. [26]. Individual models in SBML format (.xml) with the constraint sets used (AC, DC, etc.) are included. Transcriptomic data (.csv) from Ludwig and Bryant [25]. Predicted fluxes generated using these data. MATLAB scripts used for calculating correlations (.m).

(GZ)

S5 Dataset. Python code and tables used for correlations and plotting.

Python code used in plotting and analysis (.ipynb), and tab-delimited tables of correlations generated. See README for details.

(GZ)

S1 File. Supporting figures and tables.

Supporting figures (S1–S8 Figs) and tables (S1 and S2 Tables) with associated captions.

(PDF)

Acknowledgments

The authors thank James J. Kelley (Rutgers University, USA) for his work on Metabolic Optimization and Simulation Tool (MOST).

Data Availability

All relevant data are within the manuscript and its Supporting Information files.

Funding Statement

This work was supported in part by NSF award no. 1515511 (MKK).

References

  • 1. Beale DJ, Karpe AV, Ahmed W. Beyond Metabolomics: A Review of Multi-Omics-Based Approaches. In: Beale DJ, Kouremenos KA, Palombo EA, editors. Microbial Metabolomics: Applications in Clinical, Environmental, and Industrial Microbiology. Cham: Springer International Publishing; 2016. p. 289–312.
  • 2. Kim MK, Lun DS. Methods for integration of transcriptomic data in genome-scale metabolic models. Computational and structural biotechnology journal. 2014;11(18):59–65.
  • 3. Machado D, Herrgard M. Systematic evaluation of methods for integration of transcriptomic data into constraint-based models of metabolism. PLoS Comput Biol. 2014;10(4):e1003580.
  • 4. Kim MK, Lane A, Kelley JJ, Lun DS. E-Flux2 and SPOT: Validated Methods for Inferring Intracellular Metabolic Flux Distributions from Transcriptomic Data. PLoS One. 2016;11(6):e0157101.
  • 5. Peregrin-Alvarez JM, Sanford C, Parkinson J. The conservation and evolutionary modularity of metabolism. Genome Biol. 2009;10(6):R63.
  • 6. Rodrigues F, Ludovico P, Leão C. Sugar Metabolism in Yeasts: an Overview of Aerobic and Anaerobic Glucose Catabolism. In: Péter G, Rosa C, editors. Biodiversity and Ecophysiology of Yeasts. Berlin, Heidelberg: Springer Berlin Heidelberg; 2006. p. 101–21.
  • 7. Borkowski O, Goelzer A, Schaffer M, Calabre M, Mader U, Aymerich S, et al. Translation elicits a growth rate-dependent, genome-wide, differential protein production in Bacillus subtilis. Mol Syst Biol. 2016;12(5):870.
  • 8. Brunk E, Chang RL, Xia J, Hefzi H, Yurkovich JT, Kim D, et al. Systemic post-translational control of bacterial metabolism regulates adaptation in dynamic environments. bioRxiv. 2017:180646.
  • 9. Liu M, Durfee T, Cabrera JE, Zhao K, Jin DJ, Blattner FR. Global transcriptional programs reveal a carbon source foraging strategy by Escherichia coli. J Biol Chem. 2005;280(16):15921–7.
  • 10. Otto A, Bernhardt J, Meyer H, Schaffer M, Herbst FA, Siebourg J, et al. Systems-wide temporal proteomic profiling in glucose-starved Bacillus subtilis. Nat Commun. 2010;1:137.
  • 11. Tiwari DP, Chatterjee PM, Rotti H, Chand B, Raval R, Dubey AK. Expression dynamics of the poly-γ-glutamic acid biosynthesis genes of Bacillus subtilis in response to glucose and glutamic acid—a pilot study. FEMS Microbiology Letters. 2018;365(22).
  • 12. You L, He L, Tang YJ. Photoheterotrophic Fluxome in Synechocystis sp. Strain PCC 6803 and Its Implications for Cyanobacterial Bioenergetics. Journal of Bacteriology. 2015;197(5):943–50.
  • 13. Qian X, Kim MK, Kumaraswamy GK, Agarwal A, Lun DS, Dismukes GC. Flux balance analysis of photoautotrophic metabolism: Uncovering new biological details of subsystems involved in cyanobacterial photosynthesis. Biochimica et biophysica acta Bioenergetics. 2017;1858(4):276–87.
  • 14. Lewis NE, Hixson KK, Conrad TM, Lerman JA, Charusanti P, Polpitiya AD, et al. Omic data from evolved E. coli are consistent with computed optimal growth from genome‐scale models. Molecular Systems Biology. 2010;6(1):390.
  • 15. Crown SB, Antoniewicz MR. Publishing 13C metabolic flux analysis studies: a review and future perspectives. Metabolic engineering. 2013;20:42–8.
  • 16. Basler G, Fernie AR, Nikoloski Z. Advances in metabolic flux analysis toward genome-scale profiling of higher organisms. Biosci Rep. 2018;38(6).
  • 17. Gerosa L, Haverkorn van Rijsewijk BR, Christodoulou D, Kochanowski K, Schmidt TS, Noor E, et al. Pseudo-transition Analysis Identifies the Key Regulators of Dynamic Metabolic Adaptations from Steady-State Data. Cell systems. 2015;1(4):270–82.
  • 18. Orth JD, Conrad TM, Na J, Lerman JA, Nam H, Feist AM, et al. A comprehensive genome‐scale reconstruction of Escherichia coli metabolism—2011. Molecular Systems Biology. 2011;7(1):535.
  • 19. Nicolas P, Mäder U, Dervyn E, Rochat T, Leduc A, Pigeonneau N, et al. Condition-Dependent Transcriptome Reveals High-Level Regulatory Architecture in Bacillus subtilis. Science. 2012;335(6072):1103–6.
  • 20. Chubukov V, Uhr M, Le Chat L, Kleijn RJ, Jules M, Link H, et al. Transcriptional regulation is insufficient to explain substrate‐induced flux changes in Bacillus subtilis. Molecular Systems Biology. 2013;9(1):709.
  • 21. Oh Y-K, Palsson BO, Park SM, Schilling CH, Mahadevan R. Genome-scale Reconstruction of Metabolic Network in Bacillus subtilis Based on High-throughput Phenotyping and Gene Essentiality Data. Journal of Biological Chemistry. 2007;282(39):28791–9.
  • 22. You L, Berla B, He L, Pakrasi HB, Tang YJ. 13C-MFA delineates the photomixotrophic metabolism of Synechocystis sp. PCC 6803 under light- and carbon-sufficient conditions. Biotechnology journal. 2014;9(5):684–92.
  • 23. Young JD, Shastri AA, Stephanopoulos G, Morgan JA. Mapping photoautotrophic metabolism with isotopically nonstationary (13)C flux analysis. Metabolic engineering. 2011;13(6):656–65.
  • 24. Knoop H, Gründel M, Zilliges Y, Lehmann R, Hoffmann S, Lockau W, et al. Flux Balance Analysis of Cyanobacterial Metabolism: The Metabolic Network of Synechocystis sp. PCC 6803. PLOS Computational Biology. 2013;9(6):e1003081.
  • 25. Ludwig M, Bryant D. Acclimation of the Global Transcriptome of the Cyanobacterium Synechococcus sp. Strain PCC 7002 to Nutrient Limitations and Different Nitrogen Sources. Frontiers in Microbiology. 2012;3(145).
  • 26. Qian X, Zhang Y, Lun DS, Dismukes GC. Rerouting of Metabolism into Desired Cellular Products by Nutrient Stress: Fluxes Reveal the Selected Pathways in Cyanobacterial Photosynthesis. ACS Synthetic Biology. 2018;7(5):1465–76.
  • 27. Kelley JJ, Lane A, Li X, Mutthoju B, Maor S, Egen D, et al. MOST: a software environment for constraint-based metabolic modeling and strain design. Bioinformatics. 2014;31(4):610–1.
  • 28. Bewick V, Cheek L, Ball J. Statistics review 7: Correlation and regression. Crit Care. 2003;7(6):451–9.
  • 29. Bordbar A, Lewis NE, Schellenberger J, Palsson BØ, Jamshidi N. Insight into human alveolar macrophage and M. tuberculosis interactions via metabolic reconstructions. Molecular Systems Biology. 2010;6(1):422.
  • 30. Talaat AM, Lyons R, Howard ST, Johnston SA. The temporal expression profile of Mycobacterium tuberculosis infection in mice. Proceedings of the National Academy of Sciences of the United States of America. 2004;101(13):4602–7.
  • 31. Singh RK, Chang H-W, Yan D, Lee KM, Ucmak D, Wong K, et al. Influence of diet on the gut microbiome and implications for human health. Journal of Translational Medicine. 2017;15(1):73.
  • 32. Reyes JC, Crespo JL, Garcia-Dominguez M, Florencio FJ. Electron Transport Controls Glutamine Synthetase Activity in the Facultative Heterotrophic Cyanobacterium Synechocystis sp. PCC 6803. Plant Physiology. 1995;109(3):899–905.
  • 33. Jouhten P, Rintala E, Huuskonen A, Tamminen A, Toivari M, Wiebe M, et al. Oxygen dependence of metabolic fluxes and energy generation of Saccharomyces cerevisiae CEN.PK113-1A. BMC Syst Biol. 2008;2:60.
  • 34. Rippka R, Deruelles J, Waterbury JB, Herdman M, Stanier RY. Generic Assignments, Strain Histories and Properties of Pure Cultures of Cyanobacteria. Microbiology. 1979;111(1):1–61.

Decision Letter 0

Christopher N Boddy

18 Oct 2019

PONE-D-19-22421

Assessment of transcriptomic constraint-based methods for central carbon flux inference

PLOS ONE

Dear Mr. Bhadra-Lobo,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please ensure all key points raised by the reviewers are addressed. In particular, both reviewers are concerned about variance and its impact on the comparison between experimental and simulation data. Reviewer 1 highlights that the statistical analysis of the data is insufficient and that the data does not support some of the conclusions made (particularly as it relates to the performance of pFBA for the double carbon cases). These are key concerns that must be clearly addressed in your revised manuscript.

We would appreciate receiving your revised manuscript by Dec 02 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Christopher N. Boddy, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Assessment of transcriptomic constraint-based methods for central carbon flux inference

Summary: In this paper, the authors sought to understand which method (E-Flux2, SPOT or pFBA) best predicts intracellular flux when certain data types for an experiment are missing (e.g. uptake rates, primary carbon usage etc). By systematically comparing the results of the methods with experimental C-13 MFA, they were able to determine the power of each method to correctly predict intracellular metabolic flux given the data constraints. Furthermore, they also analyzed data from non-model organisms of genus Synechococcus to ensure that their assessment was not model organism specific. Based on these thorough comparisons, the authors propose a decision tree that helps modellers choose the best method given the available data (Figure 5).

While the authors were thorough in their data collection, they were not as thorough in their analysis. The paper can be greatly strengthened by presenting variance in correlations between the model prediction and the experimental result. Without this information, it is difficult to determine the significance of difference between the different methods. The paper could also be strengthened by benchmarking it against other algorithms for integrating transcriptomics data (e.g. GIMME and iMAT). Overall, the exhaustive data collected/ analyzed and the decision tree produced by the paper would be a very useful guide to modellers working with incomplete dataset.

Major Comments:

In cases of low correlations between predicted and experimentally measured fluxes, it would be interesting to know if any particular pathway or subsystem contributes disproportionately to low correlation. These may point to failure points in the model and direct future improvements. This is briefly addressed on Line 172, but an expanded analysis could strengthen the paper.

Lines 177-178: “In speculative carbon sources, Fig 2B, all three methods perform similarly on average for AC. pFBA performs the best for the double carbon cases..”

These conclusions are difficult to make for these conditions. pFBA’s performance on double carbon sources for AC is only slightly better (if at all). On ‘mal + glcs’ pFBA is on par with E-Flux2.

Figure 3B: Though the correlations are negative, there is a strong correlation between the pFBA flux prediction and the measured flux prediction (r = ~0.64). It is not clear from the text why the correlation is so strong and what this implies about the pFBA prediction in this sample/condition.

The crux of the paper relies on comparing correlations between experimental and simulation data. However, almost all correlations are presented as the mean with little information about variance. This makes it difficult to understand how significant the differences in correlations between the different methods are (see #3 above). We suggest the authors calculate total variance in correlation.

The authors should consider adding analysis with GIMME and iMAT. We understand the authors have previously done similar comparisons between GIMME, iMAT and E-Flux2 + SPOT (Kim et al. 2016). However, with the addition of new organisms in this paper, it would be of interest to know how these different methods perform relative to each other. If E-Flux2 and SPOT still outperform GIMME and iMAT in these new conditions, it may also lead to greater usage of their methods in the future.

Minor Comments:

If there are future rounds of reviews, please provide higher quality figures (especially figures 1 and 2). Currently they are difficult to read.

Line 165: Change “supplied with 8 the carbon sources” to “supplied with all 8 carbon sources.”

Figure 1B: The actual results should be described in the text beyond just description of what the analysis was (e.g. “Fluxes during glucose showed the highest correlation with….”).

Reviewer #2: This paper addresses a limitation of the subfield of metabolic network analysis concerned with predicting flux distributions under specific growth conditions (specified by key nutrient sources and/or maximum uptake rates). It collates 21 settings that have both Carbon-13 flux measurements (considered to be a gold standard here) as well as transcriptomic measurements that can be used to constrain a metabolic model. The models and the transcriptomic data are then used to predict fluxes, and these are then compared to the gold standard. The authors find that transcriptomic-based methods can be advantageous over standard methods when the transcriptomic data is obtained in a setting with incomplete information about the cell’s growth conditions.

Major comments:

1) I would have liked to see a more extensive discussion of the experimental error involved in measuring fluxes via Carbon-13; for instance, what is known about the repeatability of these measurements?

2) The correlation measure is very crude and hides a lot of nuances; for instance, not all fluxes are equally important to get right (the growth rate is typically the key quantity in model analysis, but also larger-scale pathway utilization may be of interest). Furthermore, it is very possible to have a perfect correlation, but be off by a factor of 10 in every single prediction - thus, not only should the correlation coefficient be reported, but also the equation of the line of best fit. An additional question could be about the similarities of the rankings between fluxes (e.g. Spearman correlation) because sometimes it is preferable to determine the relative importance of the reactions or pathways, rather than their absolute importance, to the cell’s internal state.

In addition, the use of the correlation as the main metric is not ideal because the typical goal of a metabolic study is to obtain qualitative information about the internal state of the cell (often relative to a baseline), whereas the correlation tries to make a quantitative statement about it, which is often (though not always) an over-interpretation of the metabolic model’s predictions.

3) It is unclear what the two growth conditions used for PCC 7002 were - only one of them is mentioned in the paper in the relevant section.

4) The figures (in particular, figures 1 and 2) are very grainy (low resolution) and do not use a consistent color scheme. They should be substantially improved. I recommend using ggplot2 in R for the plotting (alternatively, matplotlib in Python could work). You should also provide error bars on the plot, rather than having a mean and standard deviation specified in the legend.

5) The organization of the Results and Discussion section needs to be improved - the information is there, but hard to digest in the way it is currently organized. I would recommend having the sections represent an organism and the subsections represent the setting, or vice versa; right now it is a complicated mixture of both, which prevents an easy overview of the results.

6) I am not entirely convinced that the situation where the information about a cell’s growth conditions is incomplete is a realistic one. I would like to see in the discussion at least one clear setting where that is likely to happen (or, barring that, an acknowledgment by the authors that the utility of the proposed approach is limited to a somewhat hypothetical scenario).

7) Can you learn something by combining the outputs of all three methods? For instance, is the average of the three predictions (or the two predictions by the methods using transcriptomics data) better correlated with the gold standard than the individual one? This could be an interesting finding in and of itself.

Minor comments:

Lines 43-44: Cells cultured on identical substrates produce highly conserved glucose metabolism pathways -> utilize highly similar pathways?

Line 66: To this end, we have […] -> specify to what end?

Line 83: Any log2 normalized data was exponentiated. -> I would say “any log-transformed data was transformed back to real scale by exponentiation” as normally, “exponentiated” suggests that the base of exponentiation was e, rather than 2.

Line 89: We used iJO1366 as the genome-scale metabolic model. -> I would have liked to see a discussion of the effect of the model choice on the results, though I recognize that this might be outside of the scope of the current article.

Line 101: remove the extra “H” before the formula for bicarbonate

Line 104: metabolites can exchange -> metabolites can be exchanged

Line 116: All methods used in this study are referenced with their original publications -> from their original publications? The sentence is very confusing in any case.

Line 118: When I follow the link I get the following error (please fix it): You don't have permission to access / on this server.

Line 139: see Methods 2.1 -> Section Methods 2.1

Line 310: incorrect hyphenation on both ends

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: Assessment of transcriptomic constraint-based methods for central carbon flux.pdf

PLoS One. 2020 Sep 9;15(9):e0238689. doi: 10.1371/journal.pone.0238689.r002

Author response to Decision Letter 0


6 Dec 2019

Please refer to the attached file under the description "Response to Reviewers." and filename: Sid_PLosONE_rebuttal1_final.docx

We thank the reviewers for their time.

Attachment

Submitted filename: Sid_PLosONE_rebuttal1_final.docx

Decision Letter 1

Christopher N Boddy

22 Jan 2020

PONE-D-19-22421R1

Assessment of transcriptomic constraint-based methods for central carbon flux inference

PLOS ONE

Dear Mr. Bhadra-Lobo,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Both reviewers agree that your revised manuscript addresses the major concerns from the first review.  They identify minor corrections that are needed for the manuscript to be acceptable. In particular, Reviewer 2's concern regarding M. tuberculosis must be addressed.  

We would appreciate receiving your revised manuscript by Mar 07 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Christopher N. Boddy, Ph.D.

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: (No Response)

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: I thank the authors for addressing my concerns; however, their modifications have introduced some additional concerns, as detailed below. I do not believe these to be a major obstacle to publication, however, and am therefore recommending minor revisions.

Major comments:

1) “enteric bacteria both pathogenic and commensal/mutualistic, in which the growth medium is complex, and the culture is grown in vivo. An example of a pathogenic model is Mycobacterium tuberculosis.”

This incorrectly suggests that Mycobacterium tuberculosis is an enteric pathogen, which it is not (it is primarily, though not exclusively, found in the lungs).

“In tuberculosis, the bacterium live inside of scar tissue of the lung and their metabolism is uncertain.”

Technically, the bacteria (plural!) live inside a granuloma, formed by macrophages, rather than “scar tissue”.

Also, please provide references; in particular, the work by Palsson’s group on modeling Mycobacterium tuberculosis inside a macrophage seems relevant to cite (other models may have been published since as well).

“A hypothetical experiment would be to sample RNA from the gut during a period of one type of host diet, then sample RNA again after a period of time on another diet.”

Unfortunately, it is currently somewhat challenging to even determine the composition of a microbiome from RNA-Seq data at a particular timepoint with high accuracy; I think that expecting to detect changes in metabolism from this kind of data would be very ambitious (though some startups, such as Gusto, claim to be able to do that).

Grammar and minor comments:

simulated under based -> simulated based

Although this is dependent on the expression profiles and metabolic models to be complete enough for prediction of central carbon metabolism. - confusing!

decent correlations -> reasonable correlations

Double carbon sources are denoted as glutamate with succinate (Glut + Succ) and malate with glucose (Mal + Glcs). -> edit for grammar

If carbon source is speculatively known, and uptake rate is unknown, as presented with a relatively small set of 8 possible carbon sources (AC), pFBA predictive power drops significantly -> same

where more closely resembles -> same

S3 – S5 Fig -> Fig. S3 – S5 Fig

constrained under a second artificial growth condition was set to mimic -> edit for grammar

be used predict -> same

flux transcriptomic flux prediction. -> repeated word

S1, S2, and S3, S4, S5, S6, S7, and S8 Figs -> same comment as above

Reviewer #3: The authors present here the use of non-transcriptomic-based and transcriptomic-based methods to predict metabolic fluxes in cases where nutrient sources and uptake rates are or are not available. The authors show that non-transcriptomic-based and transcriptomic-based methods perform similarly when information on carbon sources and uptake rates is available. However, when that information is unavailable, transcriptomic-based methods are better at predicting flux distributions than the constraint-based method alone, across various organisms and conditions.

The authors have sufficiently addressed the reviewer comments, and the overall organization and structure of the manuscript have improved. I do not have further concerns except for a few minor comments.

The wording “nitrogen (or N) deprived” appears throughout the manuscript. It should be hyphenated as “nitrogen-deprived” or “N-deprived”.

The terms DC, AC and full AC appear before their full names are introduced.

Line 253 effects -> affects

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: No


Decision Letter 2

Rajagopal Subramanyam

24 Aug 2020

Assessment of transcriptomic constraint-based methods for central carbon flux inference

PONE-D-19-22421R2

Dear Dr. Bhadra-Lobo,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Rajagopal Subramanyam

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Rajagopal Subramanyam

25 Aug 2020

PONE-D-19-22421R2

Assessment of transcriptomic constraint-based methods for central carbon flux inference

Dear Dr. Bhadra-Lobo:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Prof. Rajagopal Subramanyam

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Dataset. E. coli model, data, and scripts used in this study.

    The genome-scale metabolic model of E. coli, iJO1366 [18]. Individual models in SBML format (.xml) with the constraint sets used (AC, DC, etc.) are included. Transcriptomic data (.csv) from the study by Gerosa et al. [17]. Predicted fluxes generated using these data. MATLAB scripts used for calculating correlations (.m).

    (GZ)

    S2 Dataset. Bacillus subtilis model, data, and scripts used in this study.

    The genome-scale metabolic model of B. subtilis from Oh et al. [21]. Individual models in SBML format (.xml) with the constraint sets used (AC, DC, etc.) are included. Transcriptomic data (.csv) from Oh et al. [19]. Predicted fluxes generated using these data. MATLAB scripts used for calculating correlations (.m).

    (GZ)

    S3 Dataset. Synechocystis sp. PCC 6803 model, data, and scripts used in this study.

    The genome-scale metabolic model of PCC 6803 developed by Knoop et al. [24]. Individual models in SBML format (.xml) with the constraint sets used (AC, DC, etc.) are included. Transcriptomic data (.csv) from Dr. Le You (University of California San Diego, USA) and Dr. Yinjie Tang (Washington University in St. Louis, USA) [12]. Predicted fluxes generated using these data. MATLAB scripts used for calculating correlations (.m).

    (GZ)

    S4 Dataset. Synechococcus sp. PCC 7002 model, data, and scripts used in this study.

    The genome-scale metabolic model of PCC 7002 from Qian et al. [26]. Individual models in SBML format (.xml) with the constraint sets used (AC, DC, etc.) are included. Transcriptomic data (.csv) from Ludwig and Bryant [25]. Predicted fluxes generated using these data. MATLAB scripts used for calculating correlations (.m).

    (GZ)

    S5 Dataset. Python code and tables used for correlations and plotting.

    Python code used in plotting and analysis (.ipynb), and tab-delimited tables of correlations generated. See README for details.

    (GZ)

    S1 File. Supporting figures and tables.

    Supporting figures (S1–S8 Figs) and tables (S1 and S2 Tables) with associated captions.

    (PDF)

    Attachment

    Submitted filename: Assessment of transcriptomic constraint-based methods for central carbon flux.pdf

    Attachment

    Submitted filename: Sid_PLosONE_rebuttal1_final.docx

    Attachment

    Submitted filename: Sid_PLosONE_rebuttal2.docx

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting Information files.


    Articles from PLoS ONE are provided here courtesy of PLOS
