Guidelines for extracting biologically relevant context-specific metabolic models using gene expression data

Saratram Gopalakrishnan; Chintan J Joshi; Miguel Valderrama Gomez; Elcin Icten; Pablo Rolandi; William Johnson; Cleo Kontoravdi; Nathan E Lewis

doi:10.1016/j.ymben.2022.12.003

Author manuscript; available in PMC: 2024 Jan 1.

Published in final edited form as: Metab Eng. 2022 Dec 22;75:181–191. doi: 10.1016/j.ymben.2022.12.003

PMCID: PMC10258867 NIHMSID: NIHMS1904206 PMID: 36566974

Abstract

Genome-scale metabolic models comprehensively describe an organism’s metabolism and can be tailored using omics data to model condition-specific physiology. The quality of context-specific models is impacted by (i) choice of algorithm and parameters and (ii) alternate context-specific models that equally explain the -omics data. Here we quantify the influence of alternate optima on microbial and mammalian model extraction using GIMME, iMAT, MBA, and mCADRE. We find that metabolic tasks defining an organism’s phenotype must be explicitly and quantitatively protected. The scope of alternate models is strongly influenced by algorithm choice and the topological properties of the parent genome-scale model with fatty acid metabolism and intracellular metabolite transport contributing much to alternate solutions in all models. mCADRE extracted the most reproducible context-specific models and models generated using MBA had the most alternate solutions. There were fewer qualitatively different solutions generated by GIMME in E. coli, but these increased substantially in the mammalian models. Screening ensembles using a receiver operating characteristic plot identified the best-performing models. A comprehensive evaluation of models extracted using combinations of extraction methods and expression thresholds revealed that GIMME generated the best-performing models in E. coli, whereas mCADRE is better suited for complex mammalian models. These findings suggest guidelines for benchmarking -omics integration algorithms and motivate the development of a systematic workflow to enumerate alternate models and extract biologically relevant context-specific models.

Keywords: Systems biology, Metabolic modeling, Constraint-based models, Context-specific models, Model extraction methods

1. INTRODUCTION

The physiological state of a cell is mediated by an intricate network of signaling pathways, gene regulatory networks and metabolic reactions. Gene expression data provide functional insights into the modulation of cellular phenotype (Manzoni et al., 2018), biological features of disease states (Borrageiro et al., 2018; Dickson, 2021; Kori and Yalcin Arga, 2018; Pedrotty et al., 2012), cellular differentiation and tissue-specific functions (Burke et al., 2020; Uhlen et al., 2016; Watcham et al., 2019), and cellular responses to environmental perturbations (Kochanowski et al., 2017). Although many tools extend the coverage of gene expression data analysis to yield more functional insights into the modulation of cell state (Nguyen et al., 2019), quantitative assessments using genome-scale models (GEMs) can provide rich mechanistic insights.

GEMs are a comprehensive repository of biochemical reactions encoded within the genome of an organism (Gu et al., 2019) that reflect its metabolic capabilities. The sheer size (e.g., number of reactions) of eukaryotic genome-scale models introduces computational and data availability bottlenecks to parameterize quantitative integration techniques such as whole-cell modeling (Macklin et al., 2020), ME-Models (O’Brien et al., 2013), or kinetic models (Gopalakrishnan et al., 2020; Khodayari and Maranas, 2016). The integration of transcriptomics with GEMs has been invaluable to the scientific community for nearly two decades (Blazier and Papin, 2012; Robaina Estevez and Nikoloski, 2014). For example, transcriptomics data can be integrated with eukaryotic models through binarization of enzyme abundance levels to “ON” or “OFF” states after thresholding associated gene expression levels and evaluating gene-protein-reaction (GPR) relationships to yield context-specific models that represent the condition-specific metabolism of the organism. However, inactivating reactions based on thresholding alone leads to fragmented metabolic networks that are incapable of predicting any meaningful flux distributions (hereafter known as flux inconsistent networks) (Åkesson et al., 2004). Flux consistency must be restored using gap-filling algorithms, which seek to preserve the validity of the model. Several algorithms have been developed over the past decade, each with its own unique approach for extracting flux-consistent sub-networks. 
Context-specific models generated using various model extraction methods have been previously applied to study human tissue-specific metabolism (Jerby et al., 2010), identify biomarkers in NAFLD (Mardinoglu et al., 2014), cancer (Zielinski et al., 2017), and diabetes (Bordbar et al., 2011; Kumar et al., 2014), propose potential anti-cancer drug targets (Pacheco et al., 2019), and optimize bioprocessing for drug manufacturing (Fouladiha et al., 2020; Schinn et al., 2021a).
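The binarization step described above (thresholding gene expression, then evaluating GPR rules to switch reactions "ON" or "OFF") can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the gene identifiers, expression values, nested-tuple GPR encoding, and the nearest-rank percentile helper are all hypothetical. The sketch also assumes the common convention that AND rules (enzyme complexes) take the minimum subunit expression and OR rules (isozymes) take the maximum.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile (q in 0..100) of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def gpr_score(rule, expression):
    """Score a GPR rule given as a gene id or ("and"|"or", subrule, ...)."""
    if isinstance(rule, str):
        return expression[rule]
    op, *parts = rule
    scores = [gpr_score(part, expression) for part in parts]
    # Assumed convention: a complex is limited by its least expressed
    # subunit (min); isozymes are represented by the most expressed one (max).
    return min(scores) if op == "and" else max(scores)

# Hypothetical expression values (arbitrary units) and GPR rules.
expression = {"g1": 50.0, "g2": 2.0, "g3": 30.0, "g4": 1.0, "g5": 8.0}
gprs = {
    "R_complex": ("and", "g1", "g2"),
    "R_isozyme": ("or", "g2", "g3"),
    "R_single": "g4",
    "R_mixed": ("or", ("and", "g1", "g3"), "g5"),
}

# Global 60th percentile threshold over all measured genes.
threshold = percentile(list(expression.values()), 60)

# A reaction is "ON" when its GPR score meets the threshold (assumed rule).
reaction_state = {
    rxn: "ON" if gpr_score(rule, expression) >= threshold else "OFF"
    for rxn, rule in gprs.items()
}
print(threshold, reaction_state)
```

Switching off "R_complex" and "R_single" on this basis alone is what can fragment a network, which is why the extraction methods discussed here follow thresholding with a step that restores flux consistency.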

Model extraction methods are broadly classified into optimization-based and pruning-based methods. Optimization-based methods comprise the GIMME-like family of methods (Becker and Palsson, 2008) and the iMAT-like family of methods (including iMAT (Zur et al., 2010), INIT (Agren et al., 2012), and tINIT (Agren et al., 2014)), and rely on solving a linear or mixed-integer programming problem to extract context-specific models. The objective varies based on the method and generally maximizes removal of poorly expressed genes (as in the GIMME-like methods) or inclusion of highly expressed genes (as in iMAT and INIT) and may enforce minimum flux through certain required phenotype-defining pathways (also known as required metabolic functions (RMFs)) as implemented in tINIT. On the other hand, pruning-based methods like MBA (Jerby et al., 2010), FASTCORE (Vlassis et al., 2014), mCADRE (Wang et al., 2012), and CORDA (Schultz and Qutub, 2016) extract context-specific models by first identifying a candidate list of reactions to be removed and then pruning the genome-scale models one reaction at a time, until no more reactions can be removed without losing information about the cell’s phenotype. While optimization-based methods are faster and better at protecting flux through known metabolic functions, pruning-based methods allow evidence-based retention of reactions, thereby generating models that are more representative of the physiological state being investigated (Robaina Estevez and Nikoloski, 2014).

The content and quality of an extracted model depend on the choice of model extraction method, the threshold applied to gene expression data to identify active and inactive reactions, and the coverage of data. Previous studies (Opdam et al., 2017; Richelle et al., 2019b) revealed that the choice of method and the threshold strongly influence model content. However, an overlooked factor influencing model content is whether model extraction methods yield a unique context-specific model. Alternate optimal solutions arise when there are multiple combinations of reactions associated with poorly expressed genes that can be retained to restore flux consistency of the metabolic network but cannot be effectively resolved using the available gene expression data. Typically, these include isozymes utilizing different cofactors (e.g., NAD vs NADP) and alternate biosynthetic routes. The scope and disparity of alternate optimal solutions is a measure of reproducibility of each model extraction algorithm and sufficiency of data. To account for alternate optimal solutions, the algorithm EXAMO first identifies all fluxes that are active in all alternate solutions generated by iMAT and uses this set of reactions as high-confidence reactions in MBA (Rossell et al., 2013). Robaina-Estevez and Nikoloski (2017) developed a framework to quantify alternate optima in flux-centric extraction methods such as RegrEx and CORDA and revealed that the variability in extracted model topology stemmed from different combinations of 58% of the reactions that were flagged for removal. Therefore, it is necessary to identify and quantify the variability in extracted context-specific models and screen potential alternate solutions using appropriate data (gene knockout data, fluxomics, endo-metabolomics, etc.) so that extracted models are sufficiently accurate to identify meaningful intervention strategies for therapeutic design or metabolic engineering applications of interest. 
In addition, a framework to enumerate and screen the space of alternate solutions will provide insights into the reproducibility of existing model extraction algorithms and establish a platform to benchmark future omics-integration algorithms.

This study comprehensively assesses the importance of quantitatively protecting flux through RMF reactions (the biomass production reaction, in this case) and the effect of choice of threshold and extraction method on the scope of alternate optimal solutions during transcriptomics-based model extraction in E. coli, CHO-S, and a renal cancer cell line (786O). Ensembles of 100 context-specific models were extracted using combinations of parameters selected from five thresholding approaches (global 80th percentile, global 75th percentile, global 60th percentile, StanDep, and local T2), four model extraction methods (GIMME, iMAT, MBA, and mCADRE), and quantitative protection of metabolic functions (i.e., growth rate). First, we define a method to generate the ensemble of alternate solutions for each case. Next, we evaluate the growth rate predicted by all extracted context-specific models and determine that qualitatively protecting the biomass reaction (as previously suggested (Richelle et al., 2019a)) is not sufficient to accurately predict the experimentally measured growth rate. Following this, we quantify the variability in content of context-specific models in each ensemble in terms of conserved and variable pathways to assess the reproducibility of each method. Across all organisms and expression thresholds evaluated in this study, mCADRE generated the most reproducible models, whereas models generated by MBA showed the largest variance in reaction content. We also find that the size and content of models extracted using GIMME were the least sensitive to the applied expression threshold in all organisms evaluated in this study. 
We then demonstrate the utility of the receiver-operating-characteristic (ROC) plot in visualizing the performance of extracted context-specific models and propose a metric to select the model which best represents the biological system in the context of the application, using gene knockout data reserved from the model extraction dataset. Using a Euclidean distance metric, we quantified the proximity of the extracted models to the ideal model and found that GIMME generated the best-performing models for fast growing prokaryotes such as E. coli, whereas models extracted using mCADRE fared better in mammalian systems such as 786O. Finally, we establish a set of guidelines that an extracted model should satisfy for reliable hypothesis generation in biomedical and metabolic engineering applications.

2. RESULTS

2.1. Flux through required metabolic functions must be explicitly protected during model extraction

Model extraction methods aim to generate models that predict biologically relevant fluxes and accurately capture the sensitivity of the fluxome to genetic and environmental perturbations. Therefore, biologically relevant models must accurately recapitulate experimentally measured metabolite uptake and secretion rates and fluxes through required metabolic function (RMF) reactions. In this study, we consider the biomass formation reaction as an RMF reaction. Because the biomass reaction may not necessarily be retained in the extracted models, it should be protected as a core reaction to ensure retention (Richelle et al., 2019a). This was sufficient in optimization-based methods (GIMME and iMAT), in which fluxes were protected using lower and upper bounds in the metabolic model. However, protecting the biomass reaction was insufficient to ensure a biologically relevant growth rate in models extracted using MBA and mCADRE (Figure 1). Only 34 MBA models for E. coli generated using the 80th percentile expression threshold predicted a growth rate greater than 90% of the experimentally measured growth rate (Supplementary Figure S1A). For 786O, only 36 of 500 models generated using MBA supported a growth rate within 10% of the maximum rate predicted by Recon2.2 (Supplementary Figure S1B). For CHO-S, only 9 of 500 generated MBA models predicted a growth rate within 10% of the maximum growth rate predicted by iCHO1766 (Supplementary Figure S1C). No model extracted using mCADRE for any organism correctly predicted biologically relevant growth rates despite protecting the biomass formation reaction itself as a core reaction. Core reactions in MBA and mCADRE are considered active if they can carry a flux of at least 10⁻⁴ mmol/gDW-h for E. coli or 10⁻⁴ mmol/gDW-day for 786O and CHO-S, which is several orders of magnitude less than the experimentally measured growth rate of all three organisms.

Figure 1:

Retention of required metabolic functions. Box-and-whisker plots show the distribution of the maximum growth rate predicted by extracted models relative to the maximum growth rate predicted by the genome-scale model for E. coli, 786O, and CHO-S using GIMME, iMAT, MBA, and mCADRE.

In E. coli, reactions from the electron transport chain (complexes I, II and III) and succinate dehydrogenase from the TCA cycle were necessary for ATP production but were inactivated because the associated transcript abundances were below the cutoff threshold. The resulting models therefore relied on the lower-yield substrate-level phosphorylation reactions for ATP generation and yielded lower growth rates compared to iJO1366. In 786O and CHO-S, reactions supporting cysteine and lysine uptake were removed based on transcriptomic evidence. Thus, the resulting models relied on de novo cysteine biosynthesis pathways and biocytin catabolism to meet the biosynthetic cysteine and lysine demands. The low abundance of biocytin in cell culture media limited lysine availability for protein synthesis, resulting in a considerably lower growth rate prediction compared to the respective parent genome-scale models. Ranking of non-core reactions based on expression scores prior to model pruning in mCADRE ensured that reactions required to sustain an experimentally measured growth rate were always removed due to low or missing gene expression values. However, very few MBA models fortuitously retained these reactions because MBA randomizes the removal order for reactions with low expression scores. Upon enforcing a mandatory minimum flux of 90% of the maximum growth rate predicted by the parent genome-scale model as a pruning criterion, all models generated by MBA and mCADRE predicted a biologically relevant growth rate for each of E. coli, 786O, and CHO-S (Figure 1). These findings suggest that even the most lenient threshold approaches such as StanDep and the Local T2 threshold can filter out reactions necessary to support key phenotypes and therefore, flux through RMF reactions must be explicitly protected during model extraction.

2.2. Choice of extraction method determines the scope of alternate solutions

Analysis of model sizes in each ensemble provided insights into the reproducibility and internal variability of model extraction methods. The ensemble generated using mCADRE showed the least dispersion in model sizes (average range = 2 for E. coli, 10 for 786O, and 14 for CHO-S), while models generated using MBA showed the largest dispersion in model sizes for E. coli (average range = 37) and CHO-S (average range = 280) (Figure 2, Supplementary Tables ST4, ST5, and ST6). For 786O, models generated using iMAT showed the largest size dispersion (average range = 128). Upon increasing the global expression threshold from the 60th percentile to the 80th percentile, the dispersion of model sizes from iMAT and MBA increased by up to 50%. However, ensembles generated using iMAT and MBA with StanDep or local T2 thresholding had lower size dispersion compared to models using global thresholding. The size dispersion correlated with the size of the core reaction set. For larger core reaction sets, model extraction methods choose pathways from a smaller set of non-core reactions for gap-filling, resulting in ensembles with smaller dispersions for thresholds with more core reactions. Interestingly, model size dispersion in ensembles generated using GIMME remained relatively unchanged in response to changes in threshold. On the other hand, rank-ordering of non-core reactions by mCADRE limits variability in removal order, and therefore, generated ensembles with the smallest size dispersion.

Figure 2:

Size distribution of models in the ensemble generated using GIMME, iMAT, MBA, and mCADRE for E. coli, 786O and CHO-S with the global 60th percentile threshold, global 75th percentile threshold, global 80th percentile threshold, StanDep, and the local T2 threshold.

Because a low size dispersion within an ensemble does not necessarily imply fewer alternate solutions, conserved and variable reactions in the ensemble must be identified and analyzed. During model extraction, we classified all reactions in the parent genome-scale models into one of four classes: conserved reactions (always retained in the ensemble), inactivated reactions (always removed in all models), variable reactions (retained in some models when certain criteria are met), and no data reactions (reactions lacking data in favor of retention or removal). The Jaccard index highlights the prevalence of each of these reaction classes and therefore quantifies the diversity of models within an ensemble.
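The reaction classes and the Jaccard index described above can be illustrated with a small sketch. It assumes each extracted model is represented as a set of reaction identifiers and that ensemble diversity is summarized as the mean Jaccard index over all model pairs; the reaction names and the pairwise-mean summary are assumptions made for the example. The no data class is omitted because assigning it requires the expression data, not just reaction content.

```python
from itertools import combinations

def classify_reactions(parent_reactions, ensemble):
    """Split parent reactions into conserved, inactivated, and variable sets."""
    conserved = set.intersection(*ensemble)      # retained in every model
    retained_somewhere = set.union(*ensemble)    # retained in at least one model
    inactivated = set(parent_reactions) - retained_somewhere
    variable = retained_somewhere - conserved
    return conserved, inactivated, variable

def jaccard(a, b):
    """Jaccard index of two reaction sets: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

def mean_pairwise_jaccard(ensemble):
    """Average Jaccard index over all distinct pairs of models."""
    pairs = list(combinations(ensemble, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical parent model and a three-member ensemble.
parent = {"R1", "R2", "R3", "R4", "R5", "R6"}
ensemble = [
    {"R1", "R2", "R3"},
    {"R1", "R2", "R4"},
    {"R1", "R2", "R3", "R4"},
]

conserved, inactivated, variable = classify_reactions(parent, ensemble)
diversity = mean_pairwise_jaccard(ensemble)
print(sorted(conserved), sorted(inactivated), sorted(variable), round(diversity, 3))
```

In this toy ensemble the three models differ in size by at most one reaction, yet two of the four retained reactions are variable, which is the distinction between size dispersion and content diversity drawn in the text.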

The average Jaccard index for ensembles from mCADRE was 0.99, 0.99, and 0.98 in E. coli, 786O, and CHO-S, respectively. Over 98% of reactions in the extracted models were conserved reactions (Figure 3A). Upon varying the applied threshold, the number of conserved reactions in E. coli ranged from 872 to 1,426 reactions. The corresponding ranges were 1,722 to 3,199 reactions in 786O, and 1,161 to 2,249 reactions in CHO-S. Reactions were conserved in an ensemble because they were either core reactions, stoichiometrically coupled to core reactions, or stoichiometrically coupled to the biomass formation reaction. 434, 286, and 332 growth-coupled reactions were conserved in E. coli, 786O, and CHO-S, respectively. While only 315 reactions in E. coli were retained to activate blocked core reactions, this number increased up to 541 reactions in CHO-S and 1,019 reactions in 786O. This suggests that reaction retention in E. coli was primarily driven by biomass coupling, whereas gene expression data were the primary cause of reaction retention in the eukaryotic models. 27 reactions in E. coli, 303 reactions in 786O, and 259 reactions in CHO-S constituted alternate solutions (Figure 3B). In E. coli, these 27 reactions (21 reactions from glycerophospholipid metabolism, 3 metabolite transport reactions, and 3 reactions from lipopolysaccharide biosynthesis) were included to ensure flux consistency of seven core reactions (five transport reactions, and one reaction each from lipopolysaccharide and glycerophospholipid biosynthesis). In 786O, alternate solutions resulted from variability in 203 transport reactions, 34 glycosylation reactions, 22 reactions from fatty acid metabolism, 8 reactions from nucleotide metabolism, 10 reactions from amino acid metabolism, and 23 reactions from central metabolism. 
These reactions were retained in the extracted models to activate 195 core reactions, primarily from fatty acid metabolism, all of which have four alternate pathways on average activating them. In CHO-S, 187 transport reactions, 25 reactions from fatty acid metabolism, 15 glycosylation reactions, 11 reactions from nucleotide metabolism, and 21 reactions from central and amino acid metabolism make up all identified alternate solutions. Similar to 786O, the core reactions activated by these non-conserved reactions are predominantly from fatty acid metabolism. Since mCADRE attempts to remove all non-core reactions, none of the reactions in the model were classified as no data reactions.

Figure 3:

(A) Fraction of conserved reactions in models extracted using GIMME, iMAT, MBA, and mCADRE for *E. coli*, 786O, and CHO-S with various thresholds.

(B) Fraction of reactions from various pathways (0 representing no variable reactions and 1 representing all variable reactions) contributing to alternate solutions in models extracted using GIMME, iMAT, MBA, and mCADRE for *E. coli*, 786O, and CHO-S with various thresholds

Compared to mCADRE, MBA ensembles had greater size dispersion and lower Jaccard index values (averaging 0.95 in E. coli, 0.86 in 786O, and 0.82 in CHO-S). Although MBA used more core reactions than mCADRE, an average 10% reduction in conserved reactions was observed in all three organisms. Unlike mCADRE, MBA permits removing core reactions if at least twice as many non-core reactions are removed. In addition, conserved reactions accounted for only 91%, 84%, and 83% of the extracted models for E. coli, 786O, and CHO-S, respectively. This contrasted with mCADRE, in which >99% of the reactions in all extracted models were conserved. The variable fraction of the models was considerably higher in MBA models compared to mCADRE models (Figure 4A), accounting for 247 reactions in E. coli, 1,436 reactions in 786O, and 1,579 in CHO-S, of which, 23 reactions in E. coli, 49 reactions in 786O, and 91 reactions in CHO-S were rendered growth-coupled by mCADRE. The variable reactions in extracted models were predominantly from fatty acid metabolism in E. coli and from metabolite transport pathways in 786O and CHO-S (Figure 3B). Of these variable reactions, 171 reactions in E. coli, 1,114 reactions in 786O, and 1,222 reactions in CHO-S were always removed in ensembles generated using mCADRE. This is because MBA randomizes the removal order of non-core reactions whereas mCADRE sorts non-core reactions based on expression and connectivity evidence prior to removal. Thus, certain non-core reactions are always eliminated by mCADRE because their low gene expression increases their removal priority, while MBA may retain them if competing non-core reactions are removed earlier. This implementation difference contributed to the larger variation in size and content in models extracted using MBA compared to other methods.
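The effect of removal order on pruning outcomes can be sketched with a toy network. The example below is not MBA or mCADRE: the four-reaction network, the evidence scores, and the `is_consistent` check are hypothetical stand-ins. It only shows that a fixed, evidence-ranked order yields one model, whereas order-dependent pruning can yield several equally valid models.

```python
import random
from itertools import permutations

def prune(reactions, removal_order, is_consistent):
    """Drop each candidate in turn if the remaining model stays consistent."""
    model = set(reactions)
    for rxn in removal_order:
        trial = model - {rxn}
        if is_consistent(trial):
            model = trial
    return frozenset(model)

def is_consistent(model):
    """Toy check: core reaction C needs either A or B to carry flux."""
    return "C" in model and ("A" in model or "B" in model)

reactions = {"A", "B", "C", "D"}          # C is the only core reaction
scores = {"A": 0.1, "B": 0.4, "D": 0.2}   # hypothetical evidence for non-core reactions
noncore = sorted(scores)

# Ranked pruning: reactions with the weakest evidence are tried first.
ranked_order = sorted(noncore, key=scores.get)
ranked_model = prune(reactions, ranked_order, is_consistent)

# Randomized pruning: one arbitrary removal order.
rng = random.Random(0)
random_order = rng.sample(noncore, k=len(noncore))
random_model = prune(reactions, random_order, is_consistent)

# Every model that some removal order can produce.
all_outcomes = {prune(reactions, order, is_consistent) for order in permutations(noncore)}
print(sorted(ranked_model), [sorted(m) for m in all_outcomes])
```

The ranked order always removes the poorly supported reaction A and keeps B, while the set of all orders produces both {A, C} and {B, C}. In a real implementation the consistency check is also where a quantitative RMF requirement, such as the 90% minimum growth rate used in Section 2.1, would be enforced.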

Figure 4:

(A) Improvement in quality of models extracted using GIMME, iMAT, MBA, and mCADRE for *E. coli*, 786O, and CHO-S compared to the parent genome-scale models. The ideal model correctly classifies all essential and non-essential reactions and therefore, has a specificity and sensitivity equal to 1. The distance from the ideal model is calculated as $\sqrt{(1 - \text{sensitivity})^{2} + (1 - \text{specificity})^{2}}$.

(B) Receiver Operating Characteristic (ROC) plot showing the improvement in model performance of the best models extracted using GIMME, iMAT, MBA, and mCADRE relative to the parent genome-scale model in *E. coli*, 786O, and CHO-S.

Compared to MBA, iMAT models had fewer reactions, lower dispersion, and lower variability in model content with a Jaccard index of 0.96, 0.86, and 0.8 in E. coli, 786O, and CHO-S, respectively. Ensembles generated using iMAT for E. coli had the smallest fraction of conserved reactions (88%). For 786O and CHO-S, this fraction was 74% and 55%, respectively, considerably lower than mCADRE despite having the same number of core reactions. Unlike mCADRE, iMAT does not remove all reactions below the high expression threshold but attempts to inactivate only those reactions whose expression score is below the specified lower threshold. Moreover, iMAT permits removing core reactions if an equal number of low expression reactions were inactivated. Reactions from transport pathways and fatty acid metabolism accounted for 65% of all variable reactions in the E. coli ensembles (Figure 4B). Meanwhile, reactions from fatty acid metabolism, cofactor biosynthesis, and transport pathways accounted for 88% of the variable reactions in 786O, whereas reactions from metabolite transport pathways alone accounted for 70% of the variable reactions in CHO-S.

Although the GIMME ensembles had low size dispersions relative to iMAT and MBA, a pairwise comparison of models based on reaction content revealed that the scope of alternate solutions varied based on the topological features of the parent GSM model. Ensembles extracted using GIMME for E. coli had an average Jaccard index of 0.99 with 426 conserved reactions across the ensemble, 1,815 reactions always removed in all models, and 342 reactions contributing to alternate solutions. Of the 426 conserved reactions, 375 reactions were growth-coupled in iJO1366, 43 reactions were growth-coupled in the extracted models but not in iJO1366, one reaction (ATP maintenance) was retained based on pre-specified flux bounds, and six reactions from central metabolism were retained as alternatives to low-expression reactions. Of the 342 variable reactions, 224 reactions from metabolite transport, fatty acid metabolism, tryptophan biosynthesis and nucleotide phosphorylation pathways were growth-coupled when retained in the extracted models. Ensembles for both eukaryotic models had more diverse alternate solutions with an average Jaccard index of 0.72 for CHO-S and 0.64 for 786O. The number of conserved reactions was also reduced to 170 reactions in CHO-S and 83 reactions in 786O with only 127 and 44 reactions coupled to biomass formation in iCHO1766 and Recon2.2, respectively. 4,757 reactions in CHO-S and 5,861 reactions in 786O were inactivated in every extracted model. However, the number of variable reactions in each case increased to 1,736 reactions in CHO-S and 1,841 reactions in 786O, which is much greater than E. coli, despite similarities in model sizes in all three ensembles. 70% of these variable reactions were inter-compartment metabolite transport reactions, 10% from amino acid metabolism, 6% from fatty acid metabolism, and the remaining from cofactor biosynthesis and nucleotide biosynthesis and salvage. 
The primary objective of GIMME is to inactivate reactions with genes expressed below the threshold while ensuring that RMF reactions are retained and fully operational. Thus, we classify reactions as: (i) growth-coupled, (ii) low-expression, and (iii) maybe-on. All growth-coupled reactions are always retained in every extracted model. Low-expression reactions are always removed unless coupled to the RMF reaction. The inactivation of low-expression reactions forces flux through alternate pathways, when available, to meet the demands of the RMF reaction. Pathways that are the sole alternatives to low-expression reactions are retained in every extracted model. However, when alternate pathways exist, variable reactions can be retained, resulting in alternate solutions. Reactions with no available data have no reason for retention or removal and therefore contribute to alternate pathways. As such, alternate solutions from GIMME are determined predominantly by the topological features of the parent GSM. In E. coli, a much larger fraction of metabolism is growth-coupled leading to less diverse alternate solutions. However, models relying on more complex media, such as 786O and CHO-S have a more diverse set of alternate solutions.

2.3. ROC plots help evaluate the quality of extracted models

Diverse ensembles of context-specific models can be generated, but it is often unclear which models are most biologically relevant. To validate extracted models, gene dispensability data, flux redirections, and fluxomics datasets can be used (Opdam et al., 2017). Here we rely on gene knockout data to evaluate the quality of alternate optimal models. The ideal model would correctly identify all essential and non-essential genes. Integrating transcriptomics data deactivates pathways that are inactive in the context of interest and is therefore expected to reconcile false predictions by the genome-scale model. Here we evaluate the specificity and sensitivity using receiver operating characteristic (ROC) plots (see Methods section for the definition of specificity and sensitivity and Supplementary Figure S2 ROC plots for E. coli, 786O, and CHO-S). After computing the specificity and sensitivity for each model, the distance from the ideal model was computed and then compared with the parent genome-scale model.
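The screening metric can be sketched as follows. The example assumes the standard definitions sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), with essential genes treated as the positive class; the Methods section defines these terms for the study, so the class assignment here is an assumption, and the gene sets are invented for illustration.

```python
import math

def sensitivity_specificity(predicted_essential, observed_essential, all_genes):
    """Return (sensitivity, specificity), treating essential genes as positives."""
    tp = len(predicted_essential & observed_essential)
    fp = len(predicted_essential - observed_essential)
    fn = len(observed_essential - predicted_essential)
    tn = len(all_genes) - tp - fp - fn
    return tp / (tp + fn), tn / (tn + fp)

def distance_from_ideal(sensitivity, specificity):
    """Euclidean distance to the ideal point (sensitivity = specificity = 1)."""
    return math.sqrt((1 - sensitivity) ** 2 + (1 - specificity) ** 2)

# Hypothetical knockout data for ten genes.
all_genes = {f"g{i}" for i in range(1, 11)}
observed = {"g1", "g2", "g3", "g4"}            # experimentally essential
parent_pred = {"g1"}                           # parent model predictions
extracted_pred = {"g1", "g2", "g3", "g5"}      # extracted model predictions

parent_sens, parent_spec = sensitivity_specificity(parent_pred, observed, all_genes)
model_sens, model_spec = sensitivity_specificity(extracted_pred, observed, all_genes)

parent_dist = distance_from_ideal(parent_sens, parent_spec)
model_dist = distance_from_ideal(model_sens, model_spec)
print(parent_dist, model_dist)
```

A model improves on its parent when its distance is smaller. Because the distance is symmetric in its two arguments, choosing non-essential genes as the positive class instead swaps sensitivity and specificity but leaves the distance unchanged.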

All extracted models outperformed their respective parent GEMs in predicting gene dispensability. This is because model pruning removes alternate routes that compensate for the loss of function of essential reactions, which reconciles false-positive predictions in the genome-scale model. We find that GIMME models had the highest specificity for E. coli and CHO-S with an average sensitivity of 0.87 and 0.71, respectively. mCADRE generated the highest specificity models for 786O with an average specificity of 0.14. The best models generated for E. coli and CHO-S using GIMME showed a 29% and 55% improvement in gene essentiality predictions compared to iJO1366 and iCHO1766, respectively. On the other hand, the best model for 786O generated using mCADRE only showed a 13% improvement compared to Recon2.2.

The essentiality of 203 genes was reconciled in the best performing model generated using GIMME for E. coli, including 30 genes from fatty acid biosynthesis, nucleotide biosynthesis, and glycolysis. Compared to other models in the ensemble, the best performing model failed to reconcile the essentiality of the b1638 gene, which is associated with the PDX5POi reaction involved in pyridoxal phosphate biosynthesis. The PDX5PO2 reaction serves as an alternate route to pyridoxal phosphate synthesis when PDX5POi is inactivated. Because PDX5PO2 is not associated with any gene, it is not preferentially removed or retained in models generated using GIMME and iMAT, and b1638 is therefore not always reconciled in these ensembles. In contrast, PDX5PO2 is treated as a low confidence reaction by MBA and mCADRE, leading to prioritized removal. As a result, MBA and mCADRE can reconcile the essentiality of b1638.

The essentiality of 62 genes, predominantly from fatty acid metabolism and transport pathways, was reconciled in the best performing model for 786O generated using mCADRE. In the best model for CHO-S constructed using GIMME, the essentiality of 18 genes from fatty acid metabolism and the TCA cycle was reconciled. The best models generated for 786O and CHO-S reconciled all essential genes reconciled in their respective ensembles.

The difference in gene essentiality reconciliation among the three models is attributable to differences in the metabolism of E. coli and mammalian cells, which are reflected in the topological features of iJO1366, Recon2.2, and iCHO1766. Because E. coli grows in minimal media, a large fraction of its metabolism is biosynthetic, leading to a higher number of growth-coupled pathways. Protection of flux through the biomass reaction leads to removal of only dispensable pathways supported by low gene expression in models extracted using GIMME. This gave rise to models with the largest increase in specificity compared to the parent genome-scale model in E. coli. On the other hand, because a much smaller fraction of Recon2.2 and iCHO1766 is coupled to biomass production, removal of reactions without evidence-based prioritization leads to erroneous removal of essential reactions. This resulted in models with low specificity in 786O and CHO-S. In contrast, mCADRE prioritizes removal of reactions that are poorly expressed and weakly connected to highly expressed reactions. This systematic removal protects against the removal of highly expressed reactions in potentially essential pathways, thereby generating models with higher specificity than those extracted using GIMME for 786O. In comparison, models generated by iMAT and MBA did not perform as well as those generated by GIMME as suggested by their proximity to the parent genome-scale model (Figure 4 and Supplementary Figure S2). Models generated by iMAT were much closer to the parent genome-scale model for E. coli and 786O, but performed considerably better in CHO-S.

3. DISCUSSION

This study evaluates key parameters influencing the quality of context-specific models extracted with various methods using gene expression data. While the choice of model extraction method and the threshold for gene expression remain the most important factors affecting model size, our analysis reveals that depending on the choice of model extraction method, the exploration of alternate solutions can lead to drastically different models. These findings suggest the need for a set of guidelines for extracting the most meaningful and biologically relevant context-specific models, to supplement guidelines on model construction (Thiele and Palsson, 2010), model annotation (Ebrahim et al., 2015), and model parameterization (Schinn et al., 2021b). Key guidelines are presented in Table 1, a workflow incorporating the proposed guidelines is shown in Figure 5, and the steps to implement the workflow are listed in Table 2. Three steps (Figure 5) are involved in the extraction of context-specific models from genome-scale models: (i) pre-processing, (ii) ensemble generation, and (iii) ensemble screening. The pre-processing step transforms the raw model and transcriptomic data into a format compatible with model extraction methods.

Table 1:

Guidelines for extracting meaningful metabolic models using transcriptomics data

#	Guideline
1	Limit nutrient uptake to media components only
2	Enforce minimum fluxes through known metabolic functions
3	Generate and screen ensembles of alternate solutions using other omics data
4	Draw inferences from conserved reactions only

Figure 5: Generalized workflow pipeline for extracting context-specific models using gene-expression data

Table 2:

Implementation of the workflow depicted in Figure 5.

STEP	DESCRIPTION
	Model preprocessing
STEP 1A:	Impose the lower and upper bounds for the uptake and secretion of all measured metabolites as well as the growth rate. For metabolites in the growth medium that are not measured, an arbitrary bound limiting their uptake can be imposed. Identify all reactions incapable of carrying flux using Flux Variability Analysis and remove them. The resultant pre-processed model should be flux consistent.
	Data preprocessing
STEP 1B:	Compute reaction expression scores from gene expression data using defined reaction-specific Gene-Protein-Reaction (GPR) rules. Generate multiple core reaction sets by applying different thresholds to the computed reaction expression scores. Local thresholding methods are often preferred due to their ability to retain lowly expressed housekeeping genes.
	Identify metabolic tasks that define the cell’s phenotype
STEP 2:	Generate a list of metabolic tasks that must be retained in extracted models. Metabolic tasks with available experimental measurements must be quantitatively protected. Other identified metabolic tasks should be added to the sets of core reactions.
	Generate ensembles of context-specific models
STEP 3:	Using the preprocessed model from Step 1A, the preprocessed reaction expression scores from Step 1B, and the metabolic tasks from Step 2 as inputs, generate ensembles of at least 50 models using any model extraction method.
	Screen and select the best-performing models
STEP 4:	For each model in the generated ensemble, compute the specificity and sensitivity using validation data (gene knockout, flux prediction, etc.). Compute the distance from the ideal model using the expression: $\sqrt{(1 - \text{sensitivity})^{2} + (1 - \text{specificity})^{2}}$. The top-performing models have the lowest distance metric.

Preprocessing of transcriptomic data involves applying a threshold to determine which reactions are likely active. To this end, transcriptomic data are log-transformed and mapped to reactions via gene-protein-reaction (GPR) relationships. A threshold (top 25^th percentile, top 50^th percentile, etc.) is applied to reaction expression scores to extract lists of reactions based on the requirements of model extraction methods. Here we investigated combinations of five thresholds (global 60^th percentile, global 75^th percentile, global 80^th percentile, StanDep, and local T2 threshold) and four model extraction methods (GIMME, iMAT, MBA, and mCADRE). GIMME and mCADRE require the lists of reactions with expression scores below and above the specified threshold, respectively. iMAT and MBA require two thresholds to classify reactions into highly expressed and weakly expressed sets. Incorporating media information identifies and eliminates inconsistent core reactions, which protects the workflow from extraction failures (see Supplementary Results). After preprocessing, gap-filling of metabolic networks is performed using model extraction methods to ensure flux consistency of the core reaction set.
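
As an illustration of the global thresholding step, the following minimal Python sketch splits reactions into sets above and below a percentile cutoff. It is not the authors' MATLAB implementation. The function names, the nearest-rank percentile definition, the use of `>=` for the "above" set, and the exclusion of unscored reactions (negative scores) from the percentile calculation are all assumptions made here for illustration.

```python
import math

def percentile_threshold(scores, pct):
    """Nearest-rank percentile over reactions that have expression evidence.

    Assumption: reactions without evidence carry a negative score and are
    left out of the percentile calculation.
    """
    valid = sorted(s for s in scores.values() if s >= 0)
    if not valid:
        raise ValueError("no reactions with expression evidence")
    rank = max(1, math.ceil(pct / 100 * len(valid)))
    return valid[rank - 1]

def split_by_threshold(scores, pct):
    """Return the cutoff and the reactions above it, below it, and unscored."""
    cutoff = percentile_threshold(scores, pct)
    above = {r for r, s in scores.items() if s >= cutoff}
    below = {r for r, s in scores.items() if 0 <= s < cutoff}
    unscored = {r for r, s in scores.items() if s < 0}
    return cutoff, above, below, unscored

# Hypothetical reaction expression scores (log-transformed); R5 has no evidence.
toy_scores = {"R1": 1.0, "R2": 2.0, "R3": 3.0, "R4": 4.0, "R5": -1.0}
cutoff, above, below, unscored = split_by_threshold(toy_scores, 75)
print(cutoff, sorted(above), sorted(below), sorted(unscored))
```

In this sketch the "above" set would feed a method that protects core reactions, and the "below" set would feed a method that penalizes weakly expressed reactions. How unscored reactions are treated is left to the caller.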

During model extraction, it is mandatory to retain and protect the flux through known metabolic functions in the conditions being investigated. Indeed, required metabolic functions are not always retained in extracted models (Opdam et al., 2017) and protecting metabolic functions reduces the variability in model content between models extracted using different extraction methods (Richelle et al., 2019a). This study, however, finds that merely protecting these tasks is insufficient to ensure the required flux through the metabolic task. For example, the predicted growth rate in E. coli drops by over 99% in models generated using mCADRE when a minimum growth rate is not enforced. This suggests that while gene expression data provides insights into pathway activity, it alone is insufficient to distinguish between the various metabolic states underpinning the metabolic task. Although a comprehensive list of condition-specific metabolic tasks may be obtained through a literature search, sets of known metabolic tasks in rat and human tissues have been published (Blais et al., 2017; Richelle et al., 2019b; Thiele et al., 2013). Furthermore, context-specific metabolic tasks can be predicted from transcriptomic data to inform which of all tasks should be protected when extracting a model for the desired conditions or cell type (Masson et al., 2022; Richelle et al., 2019a; Richelle et al., 2021). The inability to consistently retain and predict a required flux through essential metabolic functions implies that flux constraints on these reactions complement gene expression data and improve the biological relevance of extracted models.

The size, content, and predictive capabilities of the model are strongly influenced by the choice of model extraction method and the applied threshold for gene expression, as seen in previous studies (Opdam et al., 2017; Richelle et al., 2019b). Therefore, the choice of the right combination of parameters is crucial for extracting a meaningful model. Here we demonstrated that ROC plots can be used to identify the best performing models. While models generated using individual gene-specific local thresholds (Uhlen et al., 2015) or thresholds derived from hierarchical clustering (Joshi et al., 2020) were generally better, these thresholding methods can only be applied when multiple gene expression data samples are available. In addition to gene knockout data used for screening in this study, other types of biological data such as metabolomics and fluxomics data can be used for validation so long as the model’s recapitulation of the validation dataset can be represented using a confusion matrix. While metabolomics data reveals which metabolites actively participate in the condition being investigated, fluxomics data elucidates pathway utilization to validate generated models. Furthermore, the quality of models extracted using different algorithms varied based on the biology of the organism in question. Using available gene knockout data, we found that GIMME generated the best performing models in fast-growing prokaryotes such as E. coli, whereas the corresponding models generated for a function-oriented cell such as 786O were sub-par. These differences suggest the need for a careful assessment of thresholds and methods while constructing context-specific models for targeted applications.

The impact of alternate solutions must be assessed while extracting and/or developing tools to extract context-specific models. Alternate optima provide meaningful insights into the reproducibility of the algorithm and highlight the variable parts of the extracted metabolic networks (Rossell et al., 2013). This arises from the insufficiency of available gene expression data to resolve pathway usage in those parts of metabolism. Thus, any inferences drawn from flux distributions involving those pathways are potentially ambiguous and would require additional validation. Furthermore, for algorithms of lower reproducibility such as MBA, generation of an ensemble of models increases the likelihood of identifying better performing models that may be more relevant to the condition being investigated.

An important factor affecting the performance of extracted models is the quality of the parent genome-scale model. While curated models such as those for E. coli benefit from a wealth of available literature, thereby leading to models with very high specificity and sensitivity, less studied and more complex organisms do not enjoy the same luxury. For example, the parent genome-scale model for 786O, Recon2.2, has a very low sensitivity of 0.02. This indicates a need for developing algorithms that leverage gene knockout data in addition to gene expression data for extracting accurate context-specific models. Better model extraction algorithms that can accurately capture the biological state of the cell will simplify the model reduction step commonly performed before computationally intensive analyses such as 13C-MFA (Sacco and Young, 2021), kinetic modeling (Islam et al., 2021), hybrid models (Khaleghi et al., 2021), and models integrating other cell processes with metabolism, such as signaling pathways, protein secretion, and many other processes (Elsemman et al., 2022; Gutierrez et al., 2020; Karr et al., 2012). This will expand the coverage of biological data that can be integrated with metabolic models to gain novel insights into the biology of the organism, study the progression of diseases, identify novel therapeutics, and inform metabolic engineering strategies in production hosts.

4. METHODS

4.1. Models and Data Sources

The metabolic models iJO1366 (Orth et al., 2011), Recon 2.2 (Swainston et al., 2016), and iCHO1766 (Hefzi et al., 2016) for E. coli, human metabolism, and Chinese hamster ovary (CHO-S) cells were used as parent genome-scale models for extraction of context-specific models. Published glucose uptake rate, growth rate, and acetate secretion rate for E. coli grown in M9 Minimal Medium were used (Leighty and Antoniewicz, 2013). Glucose uptake rate, lactate secretion rate, growth rate, and uptake and secretion rates for amino acids were obtained from the NCI-60 database for the 786O renal cancer cell line (Jain et al., 2012; Opdam et al., 2017) and from literature for the CHO-S cell line (Hefzi et al., 2016). Gene expression data for E. coli grown in M9 minimal medium, 786O, and CHO-S were obtained from previously published data by Monk et al. (2016), the NCI-60 database (Klijn et al., 2015), and previously published data by Hefzi et al. (2016), respectively.

4.2. Model and Data Preprocessing

Gene expression data were converted to reaction expression scores using a gene-protein-reaction (GPR) relationship. A GPR relationship is a Boolean expression that relates gene products to enzymes catalyzing a reaction. An OR relationship indicates that a reaction can be catalyzed by multiple isozymes. In this case, the reaction expression score is computed as the maximum expression of the genes encoding the different isozymes. Association of multiple subunits is modeled using the AND relationship. The reaction expression score for an AND relationship is evaluated as the minimum expression of the genes encoding the various subunits. Reactions without GPR relationships or with missing gene expression data were assigned an expression score of −1. These scores were used to determine the global thresholds. Expression scores using StanDep were computed as described by Joshi et al. (2020), whereas local T2 thresholding was performed as described by Richelle et al. (2019b). These approaches enable better retention of lowly expressed housekeeping genes and reactions (Joshi et al., 2022). Flux variability analysis (Mahadevan and Schilling, 2003) was performed to identify and remove inactive reactions so that all reactions in the parent models used for transcriptomics-based model extraction are flux consistent.
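
The GPR scoring rule described above (maximum over OR, minimum over AND, −1 when no evidence is available) can be sketched in Python as follows. This is an illustrative sketch rather than the COBRA Toolbox routine used in the study. The function name `gpr_score`, the lowercase `and`/`or` rule syntax, and the choice to ignore unmeasured genes inside a rule (returning −1 only when no gene in the rule is measured) are assumptions made here.

```python
import re

MISSING = -1.0  # score for reactions without a GPR rule or without any measured gene

def gpr_score(rule, expression):
    """Evaluate a GPR rule: OR -> max (isozymes), AND -> min (subunits).

    Assumption: genes absent from `expression` are skipped; the reaction
    receives MISSING only if none of its genes are measured.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", rule)
    if not tokens:
        return MISSING
    pos = 0

    def parse_or():
        nonlocal pos
        values = [parse_and()]
        while pos < len(tokens) and tokens[pos].lower() == "or":
            pos += 1
            values.append(parse_and())
        known = [v for v in values if v is not None]
        return max(known) if known else None

    def parse_and():
        nonlocal pos
        values = [parse_atom()]
        while pos < len(tokens) and tokens[pos].lower() == "and":
            pos += 1
            values.append(parse_atom())
        known = [v for v in values if v is not None]
        return min(known) if known else None

    def parse_atom():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            pos += 1  # skip the closing parenthesis
            return value
        return expression.get(token)  # None when the gene is not measured

    score = parse_or()
    return MISSING if score is None else score

# Hypothetical gene expression values.
toy_expression = {"g1": 5.0, "g2": 2.0, "g3": 7.0}
print(gpr_score("(g1 and g2) or g3", toy_expression))
```

Here the complex `g1 and g2` is limited by its least expressed subunit (2.0), and the isozyme `g3` (7.0) then sets the reaction score through the OR relationship.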

4.3. Model Extraction Methods

GIMME (Becker and Palsson, 2008) requires as inputs one expression threshold and assignment of a reaction as the required metabolic function (RMF). Values corresponding to the 60^th, 75^th, and 80^th percentile in the reaction expression scores were applied as thresholds to determine which reactions must be removed. For expression scores computed using StanDep and the local T2 approach, thresholds of 0 and 5*ln(2), respectively, were applied. The biomass reaction was selected as the RMF reaction for all three organisms and a mandatory minimum of 90% of the maximum growth rate was enforced during model extraction. Since GIMME solves a linear programming problem to identify context-specific models, alternate solutions were identified by imposing an integer cut that eliminates previously identified solutions (Maranas and Zomorrodi, 2016).
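
For reference, an integer cut in its usual textbook form can be written as follows. The notation is an assumption made here for illustration and is not taken from the authors' implementation: $y_j \in \{0, 1\}$ indicates whether reaction $j$ is retained, and $y_j^{(k)}$ is its value in the $k$-th previously identified solution.

$$\sum_{j:\, y_j^{(k)} = 1} \left(1 - y_j\right) + \sum_{j:\, y_j^{(k)} = 0} y_j \ge 1 \qquad \text{for every previously identified solution } k$$

The left-hand side counts the reactions whose status differs from solution $k$, so the constraint excludes that solution alone and leaves every other combination feasible. Adding one cut per solution found and re-solving enumerates alternate models one at a time.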

iMAT (Zur et al., 2010) requires one threshold for high expression reactions and one for low expression reactions. For the global thresholding cases, expression scores corresponding to the 60^th, 75^th, and 80^th percentile were used to identify core reactions that must be included in the extracted model, whereas scores corresponding to the 20^th percentile were considered inactive reactions for removal. For StanDep and the local T2 cases, equal upper and lower thresholds of 1 and 5*ln(2), respectively, were applied. Because iMAT does not inherently protect flux through the RMF reaction, a lower bound of 90% of the maximum biomass flux was enforced in the MILP formulation of the iMAT case. As with GIMME, alternate solutions were identified using integer cuts.

MBA (Jerby et al., 2010) requires two sets of reactions be provided as inputs: one set corresponding to high confidence reactions that must be included in the extracted model and a medium confidence set that is maximally retained. For the global thresholding cases, reactions with scores above the 60^th, 75^th, and 80^th percentile were considered high confidence reactions whereas those with scores above the 40^th percentile but not part of the high confidence set were included in the medium confidence set. For StanDep, reactions with expression score greater than 110% of that method’s cluster threshold were considered high confidence reactions and reactions with expression scores between 90% and 110% were considered medium confidence reactions (Joshi et al., 2020). For the local T2 case, reactions with scores above the 75^th percentile were high confidence reactions and those with scores greater than 5*ln(2) and below the 75^th percentile were included in the medium confidence set. Alternate solutions were generated by permuting the removal order of low confidence reactions. In addition to ensuring flux consistency of the high expression reaction set, a minimum flux of 90% of the maximum growth rate was enforced as a criterion for removing reactions to ensure that all models in the ensemble can predict a biologically meaningful growth rate. A separate ensemble was also generated using the conventional implementation of MBA in which the biomass formation reaction is added to the set of high confidence reactions.
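
The idea of generating alternate solutions by permuting the removal order can be sketched with a simplified greedy pruning loop in Python. This is not MBA itself: the real algorithm also weighs medium confidence reactions and checks flux consistency with linear programs. The names `prune_once`, `prune_ensemble`, and `is_acceptable` are hypothetical, and the acceptance test is supplied by the caller (in practice it would verify flux consistency of the protected set and the 90% minimum growth rate).

```python
import random

def prune_once(all_reactions, protected, order, is_acceptable):
    """Greedy pruning: drop each candidate if the remaining network stays acceptable."""
    model = set(all_reactions)
    for rxn in order:
        if rxn in protected or rxn not in model:
            continue
        trial = model - {rxn}
        if is_acceptable(trial):
            model = trial
    return frozenset(model)

def prune_ensemble(all_reactions, protected, is_acceptable, n_models=50, seed=0):
    """Build an ensemble by repeating the pruning with shuffled removal orders."""
    rng = random.Random(seed)
    candidates = sorted(set(all_reactions) - set(protected))
    ensemble = []
    for _ in range(n_models):
        order = candidates[:]
        rng.shuffle(order)
        ensemble.append(prune_once(all_reactions, protected, order, is_acceptable))
    return ensemble

# Toy network: A and C are protected, B1 and B2 are redundant routes between them.
toy_reactions = {"A", "B1", "B2", "C"}
toy_protected = {"A", "C"}

def toy_acceptable(model):
    # Stand-in for an LP-based check: the protected reactions need one route.
    return {"A", "C"} <= model and bool({"B1", "B2"} & model)

toy_ensemble = prune_ensemble(toy_reactions, toy_protected, toy_acceptable)
print(len(toy_ensemble), {tuple(sorted(m)) for m in toy_ensemble})
```

In the toy network, whichever redundant route is visited first is removed, so the removal order alone decides which of the two equally valid models is returned. This is the source of alternate solutions that the ensemble is meant to expose.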

mCADRE (Wang et al., 2012) requires ubiquity scores to be provided as an input. Ubiquity scores for the global threshold cases were computed by normalizing reaction expression scores by the applied global threshold. Ubiquity scores for StanDep were computed as previously described by Joshi et al. (2020). For the local T2 case, ubiquity scores were calculated by normalizing expression scores to 5*ln(2) after applying appropriate local thresholds. Reactions with a ubiquity score greater than 1 were flagged as core reactions to be protected during model extraction. Because mCADRE ranks non-core reactions based on expression and connectivity evidence, only a subset of non-core reactions of equal rank can be permuted. Alternate solutions were identified by permuting the removal order of this subset of reactions. As with MBA, a minimum of 90% of the maximum growth rate was enforced as an additional criterion for model pruning. An ensemble was also generated using conventional mCADRE with the biomass formation reaction added to the set of core reactions.
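
Two of the steps described for mCADRE, normalizing scores into ubiquity scores for the global threshold case and permuting only reactions of equal rank, can be sketched in Python as follows. This is a simplified illustration, not the mCADRE implementation. The function names are hypothetical, the rank values are invented, and the assumption that lower-ranked (less supported) reactions are considered for removal first is stated here rather than taken from the text.

```python
import random
from itertools import groupby

def ubiquity_scores(scores, threshold):
    """Normalize reaction expression scores by the applied (positive) global threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return {rxn: s / threshold for rxn, s in scores.items()}

def core_reactions(ubiquity):
    """Reactions with a ubiquity score greater than 1 are flagged as core."""
    return {rxn for rxn, u in ubiquity.items() if u > 1}

def tie_permuted_order(rank, rng):
    """Order non-core reactions by ascending rank, shuffling only within ties."""
    ordered = sorted(rank.items(), key=lambda kv: kv[1])
    order = []
    for _, group in groupby(ordered, key=lambda kv: kv[1]):
        tied = [rxn for rxn, _ in group]
        rng.shuffle(tied)  # only reactions of equal rank change places
        order.extend(tied)
    return order

# Hypothetical scores and ranks.
toy_ubiquity = ubiquity_scores({"R1": 2.0, "R2": 6.0, "R3": -1.0}, threshold=4.0)
toy_core = core_reactions(toy_ubiquity)
toy_order = tie_permuted_order(
    {"R4": 0.1, "R5": 0.1, "R6": 0.5, "R7": 0.9}, random.Random(0)
)
print(sorted(toy_core), toy_order)
```

Because only tied reactions can swap positions, the number of distinct removal orders is much smaller than for a full permutation. This is consistent with the higher reproducibility reported for mCADRE.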

All algorithms were implemented in the COBRA Toolbox (Heirendt et al., 2019) in MATLAB®.

4.4. Analysis of Ensembles

The similarity of two models (model_i and model_j) in any ensemble is quantified using the Jaccard Index defined as follows:

$$J_{ij} = \frac{|\text{Reactions in model}_i \cap \text{Reactions in model}_j|}{|\text{Reactions in model}_i \cup \text{Reactions in model}_j|}$$
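
A minimal Python sketch of this similarity measure, applied to every pair of models in an ensemble, is given below. Models are represented simply as sets of reaction identifiers. The function names and the convention that two empty models have similarity 1 are assumptions made here.

```python
from itertools import combinations

def jaccard_index(reactions_i, reactions_j):
    """Shared reactions divided by the reactions present in either model."""
    a, b = set(reactions_i), set(reactions_j)
    union = a | b
    if not union:
        return 1.0  # assumption: two empty models are treated as identical
    return len(a & b) / len(union)

def pairwise_jaccard(ensemble):
    """Jaccard index for every unordered pair of models in the ensemble."""
    return {
        (i, j): jaccard_index(ensemble[i], ensemble[j])
        for i, j in combinations(range(len(ensemble)), 2)
    }

# Hypothetical ensemble of three small models.
toy_models = [{"R1", "R2", "R3"}, {"R2", "R3", "R4"}, {"R1", "R2", "R3"}]
toy_similarity = pairwise_jaccard(toy_models)
print(toy_similarity)
```

A value of 1 indicates identical reaction content, so an ensemble whose pairwise values are all close to 1 reflects a reproducible extraction.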

4.5. Validation of Extracted Models

Gene essentiality data inferred from gene knockout studies were used to screen ensembles of context-specific models. In silico gene essentiality was determined by computing the reduction in the growth rate upon inactivating one gene at a time in every extracted context-specific model. Genes were considered in silico essential if the predicted growth rate in the knockout model fell below 5% of the growth rate predicted by the original context-specific model. The quality of extracted context-specific models was evaluated by comparing model predictions of gene essentiality with experimentally determined gene essentiality. Gene essentiality data for WT E. coli grown in M9 minimal medium were obtained from the Keio collection (Baba et al., 2006). For the 786O cell line, gene essentiality was determined based on the CERES scores published in the NCI-60 database (Meyers et al., 2017). Genes with a CERES score less than zero were considered essential. The list of essential genes in CHO was obtained from Xiong et al. (2021). Genes correctly predicted as non-essential were classified as true positive (TP) predictions, those incorrectly predicted as essential were classified as false negative (FN) predictions, those correctly predicted as essential were classified as true negative (TN) predictions, and those incorrectly predicted as non-essential were classified as false positive (FP) predictions. The specificity and sensitivity of the models were computed using the following expressions.

$$\text{specificity} = \frac{\#\text{ of TN genes}}{\#\text{ of TN genes} + \#\text{ of FP genes}} \tag{1}$$

$$\text{sensitivity} = \frac{\#\text{ of TP genes}}{\#\text{ of TP genes} + \#\text{ of FN genes}} \tag{2}$$
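
The screening calculation can be sketched in Python as follows, using the convention stated above in which non-essential genes are the positive class. The sketch combines the 5% growth cutoff, Equations (1) and (2), and the distance from the ideal model given in Table 2. The function names and the toy growth rates are hypothetical, and the knockout growth rates are assumed to come from separate simulations.

```python
import math

def classify_essential(ko_growth, wt_growth, cutoff=0.05):
    """Genes whose knockout growth rate falls below cutoff * unperturbed growth rate."""
    return {g for g, mu in ko_growth.items() if mu < cutoff * wt_growth}

def screening_metrics(predicted_essential, experimental_essential, all_genes):
    """Specificity, sensitivity, and distance from the ideal model (1, 1)."""
    tp = fn = tn = fp = 0
    for gene in all_genes:
        pred_ess = gene in predicted_essential
        true_ess = gene in experimental_essential
        if not pred_ess and not true_ess:
            tp += 1  # correctly predicted non-essential
        elif pred_ess and not true_ess:
            fn += 1  # incorrectly predicted essential
        elif pred_ess and true_ess:
            tn += 1  # correctly predicted essential
        else:
            fp += 1  # incorrectly predicted non-essential
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    distance = math.sqrt((1 - sensitivity) ** 2 + (1 - specificity) ** 2)
    return specificity, sensitivity, distance

# Hypothetical knockout growth rates for a model with unperturbed growth rate 1.0.
toy_ko_growth = {"g1": 0.0, "g2": 0.98, "g3": 0.02, "g4": 1.0, "g5": 0.5}
toy_predicted = classify_essential(toy_ko_growth, wt_growth=1.0)
toy_experimental = {"g1", "g5"}
spec, sens, dist = screening_metrics(toy_predicted, toy_experimental, toy_ko_growth)
print(sorted(toy_predicted), spec, sens, dist)
```

Ranking the models of an ensemble by `dist` in ascending order then places the best-performing models first.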

All extracted models and gene dispensability predictions are reported in the supplementary material.

Highlights:

Phenotype must be protected during model extraction using gene expression data.
Choice of algorithm influences scope of alternate solutions.
ROC plots are effective tools to screen and select best-performing models.
Proposed workflow guides the extraction of biologically meaningful models.

Acknowledgements:

This work was supported by funding generously provided by Amgen, the Novo Nordisk Foundation (NNF20SA0066621) and NIGMS (R35 GM119850).

References

Agren R, Bordel S, Mardinoglu A, Pornputtapong N, Nookaew I, and Nielsen J (2012). Reconstruction of genome-scale active metabolic networks for 69 human cell types and 16 cancer types using INIT. PLoS Comput Biol 8, e1002518. [DOI] [PMC free article] [PubMed] [Google Scholar]
Agren R, Mardinoglu A, Asplund A, Kampf C, Uhlen M, and Nielsen J (2014). Identification of anticancer drugs for hepatocellular carcinoma through personalized genome-scale metabolic modeling. Mol Syst Biol 10, 721. [DOI] [PMC free article] [PubMed] [Google Scholar]
Åkesson M, Förster J, and Nielsen J (2004). Integration of gene expression data into genome-scale metabolic models. Metabolic Engineering 6, 285–293. [DOI] [PubMed] [Google Scholar]
Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL, and Mori H (2006). Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol 2, 2006 0008. [DOI] [PMC free article] [PubMed] [Google Scholar]
Becker SA, and Palsson BO (2008). Context-specific metabolic networks are consistent with experiments. PLoS Comput Biol 4, e1000082. [DOI] [PMC free article] [PubMed] [Google Scholar]
Blais EM, Rawls KD, Dougherty BV, Li ZI, Kolling GL, Ye P, Wallqvist A, and Papin JA (2017). Reconciled rat and human metabolic networks for comparative toxicogenomics and biomarker predictions. Nat Commun 8, 14250. [DOI] [PMC free article] [PubMed] [Google Scholar]
Blazier AS, and Papin JA (2012). Integration of expression data in genome-scale metabolic network reconstructions. Front Physiol 3, 299. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bordbar A, Feist AM, Usaite-Black R, Woodcock J, Palsson BO, and Famili I (2011). A multi-tissue type genome-scale metabolic network for analysis of whole-body systems physiology. BMC Syst Biol 5, 180. [DOI] [PMC free article] [PubMed] [Google Scholar]
Borrageiro G, Haylett W, Seedat S, Kuivaniemi H, and Bardien S (2018). A review of genome-wide transcriptomics studies in Parkinson’s disease. Eur J Neurosci 47, 1–16. [DOI] [PubMed] [Google Scholar]
Burke EE, Chenoweth JG, Shin JH, Collado-Torres L, Kim SK, Micali N, Wang Y, Colantuoni C, Straub RE, Hoeppner DJ, et al. (2020). Dissecting transcriptomic signatures of neuronal differentiation and maturation using iPSCs. Nat Commun 11, 462. [DOI] [PMC free article] [PubMed] [Google Scholar]
Dickson I (2021). Full-spectrum transcriptomics in NAFLD. Nat Rev Gastroenterol Hepatol 18, 82. [DOI] [PubMed] [Google Scholar]
Ebrahim A, Almaas E, Bauer E, Bordbar A, Burgard AP, Chang RL, Drager A, Famili I, Feist AM, Fleming RM, et al. (2015). Do genome-scale models need exact solvers or clearer standards? Mol Syst Biol 11, 831. [DOI] [PMC free article] [PubMed] [Google Scholar]
Elsemman IE, Rodriguez Prado A, Grigaitis P, Garcia Albornoz M, Harman V, Holman SW, van Heerden J, Bruggeman FJ, Bisschops MMM, Sonnenschein N, et al. (2022). Whole-cell modeling in yeast predicts compartment-specific proteome constraints that drive metabolic strategies. Nat Commun 13, 801. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fouladiha H, Marashi SA, Torkashvand F, Mahboudi F, Lewis NE, and Vaziri B (2020). A metabolic network-based approach for developing feeding strategies for CHO cells to increase monoclonal antibody production. Bioprocess Biosyst Eng 43, 1381–1389. [DOI] [PubMed] [Google Scholar]
Gopalakrishnan S, Dash S, and Maranas C (2020). K-FIT: An accelerated kinetic parameterization algorithm using steady-state fluxomic data. Metab Eng 61, 197–205. [DOI] [PubMed] [Google Scholar]
Gu C, Kim GB, Kim WJ, Kim HU, and Lee SY (2019). Current status and applications of genome-scale metabolic models. Genome Biol 20, 121. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gutierrez JM, Feizi A, Li S, Kallehauge TB, Hefzi H, Grav LM, Ley D, Baycin Hizal D, Betenbaugh MJ, Voldborg B, et al. (2020). Genome-scale reconstructions of the mammalian secretory pathway predict metabolic costs and limitations of protein secretion. Nat Commun 11, 68. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hefzi H, Ang KS, Hanscho M, Bordbar A, Ruckerbauer D, Lakshmanan M, Orellana CA, Baycin-Hizal D, Huang Y, Ley D, et al. (2016). A Consensus Genome-scale Reconstruction of Chinese Hamster Ovary Cell Metabolism. Cell Syst 3, 434–443 e438. [DOI] [PMC free article] [PubMed] [Google Scholar]
Heirendt L, Arreckx S, Pfau T, Mendoza SN, Richelle A, Heinken A, Haraldsdottir HS, Wachowiak J, Keating SM, Vlasov V, et al. (2019). Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0. Nat Protoc 14, 639–702. [DOI] [PMC free article] [PubMed] [Google Scholar]
Islam MM, Schroeder WL, and Saha R (2021). Kinetic modeling of metabolism: Present and future. Current Opinion in Systems Biology 26, 72–78. [Google Scholar]
Jain M, Nilsson R, Sharma S, Madhusudhan N, Kitami T, Souza AL, Kafri R, Kirschner MW, Clish CB, and Mootha VK (2012). Metabolite profiling identifies a key role for glycine in rapid cancer cell proliferation. Science 336, 1040–1044. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jerby L, Shlomi T, and Ruppin E (2010). Computational reconstruction of tissue-specific metabolic models: application to human liver metabolism. Mol Syst Biol 6, 401. [DOI] [PMC free article] [PubMed] [Google Scholar]
Joshi CJ, Ke W, Drangowska-Way A, O’Rourke EJ, and Lewis NE (2022). What are housekeeping genes? PLoS Comput Biol 18, e1010295. [DOI] [PMC free article] [PubMed] [Google Scholar]
Joshi CJ, Schinn SM, Richelle A, Shamie I, O’Rourke EJ, and Lewis NE (2020). StanDep: Capturing transcriptomic variability improves context-specific metabolic models. PLoS Comput Biol 16, e1007764. [DOI] [PMC free article] [PubMed] [Google Scholar]
Karr JR, Sanghvi JC, Macklin DN, Gutschow MV, Jacobs JM, Bolival B Jr., Assad-Garcia N, Glass JI, and Covert MW (2012). A whole-cell computational model predicts phenotype from genotype. Cell 150, 389–401. [DOI] [PMC free article] [PubMed] [Google Scholar]
Khaleghi MK, Savizi ISP, Lewis NE, and Shojaosadati SA (2021). Synergisms of machine learning and constraint-based modeling of metabolism for analysis and optimization of fermentation parameters. Biotechnol J 16, e2100212. [DOI] [PubMed] [Google Scholar]
Khodayari A, and Maranas CD (2016). A genome-scale Escherichia coli kinetic metabolic model k-ecoli457 satisfying flux data for multiple mutant strains. Nat Commun 7, 13806. [DOI] [PMC free article] [PubMed] [Google Scholar]
Klijn C, Durinck S, Stawiski EW, Haverty PM, Jiang Z, Liu H, Degenhardt J, Mayba O, Gnad F, Liu J, et al. (2015). A comprehensive transcriptional portrait of human cancer cell lines. Nat Biotechnol 33, 306–312. [DOI] [PubMed] [Google Scholar]
Kochanowski K, Gerosa L, Brunner SF, Christodoulou D, Nikolaev YV, and Sauer U (2017). Few regulatory metabolites coordinate expression of central metabolic genes in Escherichia coli. Mol Syst Biol 13, 903. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kori M, and Yalcin Arga K (2018). Potential biomarkers and therapeutic targets in cervical cancer: Insights from the meta-analysis of transcriptomics data within network biomedicine perspective. PLoS One 13, e0200717. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kumar A, Harrelson T, Lewis NE, Gallagher EJ, LeRoith D, Shiloach J, and Betenbaugh MJ (2014). Multi-Tissue Computational Modeling Analyzes Pathophysiology of Type 2 Diabetes in MKR Mice. PLOS ONE 9, e102319. [DOI] [PMC free article] [PubMed] [Google Scholar]
Leighty RW, and Antoniewicz MR (2013). COMPLETE-MFA: complementary parallel labeling experiments technique for metabolic flux analysis. Metab Eng 20, 49–55. [DOI] [PubMed] [Google Scholar]
Macklin DN, Ahn-Horst TA, Choi H, Ruggero NA, Carrera J, Mason JC, Sun G, Agmon E, DeFelice MM, Maayan I, et al. (2020). Simultaneous cross-evaluation of heterogeneous E. coli datasets via mechanistic simulation. Science 369. [DOI] [PMC free article] [PubMed] [Google Scholar]
Mahadevan R, and Schilling CH (2003). The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab Eng 5, 264–276. [DOI] [PubMed] [Google Scholar]
Manzoni C, Kia DA, Vandrovcova J, Hardy J, Wood NW, Lewis PA, and Ferrari R (2018). Genome, transcriptome and proteome: the rise of omics data and their integration in biomedical sciences. Brief Bioinform 19, 286–302. [DOI] [PMC free article] [PubMed] [Google Scholar]
Maranas CD, and Zomorrodi AR (2016). Modeling with Binary Variables and MILP Fundamentals. In Optimization Methods in Metabolic Networks, pp. 81–106. [Google Scholar]
Mardinoglu A, Agren R, Kampf C, Asplund A, Uhlen M, and Nielsen J (2014). Genome-scale metabolic modelling of hepatocytes reveals serine deficiency in patients with non-alcoholic fatty liver disease. Nat Commun 5, 3083. [DOI] [PubMed] [Google Scholar]
Masson HO, Borland D, Reilly J, Telleria A, Shrivastava S, Watson M, Bustillo L, Li Z, Capps L, Kellman BP, et al. (2022). Inferring a cell’s capabilities from omics data with ImmCellFie. bioRxiv, 2022.2011.2016.516672. [Google Scholar]
Meyers RM, Bryan JG, McFarland JM, Weir BA, Sizemore AE, Xu H, Dharia NV, Montgomery PG, Cowley GS, Pantel S, et al. (2017). Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nat Genet 49, 1779–1784. [DOI] [PMC free article] [PubMed] [Google Scholar]
Monk JM, Koza A, Campodonico MA, Machado D, Seoane JM, Palsson BO, Herrgard MJ, and Feist AM (2016). Multi-omics Quantification of Species Variation of Escherichia coli Links Molecular Features with Strain Phenotypes. Cell Syst 3, 238–251 e212. [DOI] [PMC free article] [PubMed] [Google Scholar]
Nguyen T-M, Shafi A, Nguyen T, and Draghici S (2019). Identifying significantly impacted pathways: a comprehensive review and assessment. Genome Biology 20, 203. [DOI] [PMC free article] [PubMed] [Google Scholar]
O’Brien EJ, Lerman JA, Chang RL, Hyduke DR, and Palsson BO (2013). Genome-scale models of metabolism and gene expression extend and refine growth phenotype prediction. Mol Syst Biol 9, 693. [DOI] [PMC free article] [PubMed] [Google Scholar]
Opdam S, Richelle A, Kellman B, Li S, Zielinski DC, and Lewis NE (2017). A Systematic Evaluation of Methods for Tailoring Genome-Scale Metabolic Models. Cell Syst 4, 318–329 e316. [DOI] [PMC free article] [PubMed] [Google Scholar]
Orth JD, Conrad TM, Na J, Lerman JA, Nam H, Feist AM, and Palsson BO (2011). A comprehensive genome-scale reconstruction of Escherichia coli metabolism−−2011. Mol Syst Biol 7, 535. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pacheco MP, Bintener T, Ternes D, Kulms D, Haan S, Letellier E, and Sauter T (2019). Identifying and targeting cancer-specific metabolism with network-based drug target prediction. EBioMedicine 43, 98–106.
Pedrotty DM, Morley MP, and Cappola TP (2012). Transcriptomic biomarkers of cardiovascular disease. Prog Cardiovasc Dis 55, 64–69.
Richelle A, Chiang AWT, Kuo CC, and Lewis NE (2019a). Increasing consensus of context-specific metabolic models by integrating data-inferred cell functions. PLoS Comput Biol 15, e1006867.
Richelle A, Joshi C, and Lewis NE (2019b). Assessing key decisions for transcriptomic data integration in biochemical networks. PLoS Comput Biol 15, e1007185.
Richelle A, Kellman BP, Wenzel AT, Chiang AWT, Reagan T, Gutierrez JM, Joshi C, Li S, Liu JK, Masson H, et al. (2021). Model-based assessment of mammalian cell metabolic functionalities using omics data. Cell Reports Methods 1, 100040.
Robaina-Estevez S, and Nikoloski Z (2017). On the effects of alternative optima in context-specific metabolic model predictions. PLoS Comput Biol 13, e1005568.
Robaina Estevez S, and Nikoloski Z (2014). Generalized framework for context-specific metabolic model extraction methods. Front Plant Sci 5, 491.
Rossell S, Huynen MA, and Notebaart RA (2013). Inferring metabolic states in uncharacterized environments using gene-expression measurements. PLoS Comput Biol 9, e1002988.
Sacco SA, and Young JD (2021). 13C metabolic flux analysis in cell line and bioprocess development. Current Opinion in Chemical Engineering 34, 100718.
Schinn SM, Morrison C, Wei W, Zhang L, and Lewis NE (2021a). A genome-scale metabolic network model and machine learning predict amino acid concentrations in Chinese Hamster Ovary cell cultures. Biotechnol Bioeng 118, 2118–2123.
Schinn SM, Morrison C, Wei W, Zhang L, and Lewis NE (2021b). Systematic evaluation of parameters for genome-scale metabolic models of cultured mammalian cells. Metab Eng 66, 21–30.
Schultz A, and Qutub AA (2016). Reconstruction of Tissue-Specific Metabolic Networks Using CORDA. PLoS Comput Biol 12, e1004808.
Swainston N, Smallbone K, Hefzi H, Dobson PD, Brewer J, Hanscho M, Zielinski DC, Ang KS, Gardiner NJ, Gutierrez JM, et al. (2016). Recon 2.2: from reconstruction to model of human metabolism. Metabolomics 12, 109.
Thiele I, and Palsson BO (2010). A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat Protoc 5, 93–121.
Thiele I, Swainston N, Fleming RM, Hoppe A, Sahoo S, Aurich MK, Haraldsdottir H, Mo ML, Rolfsson O, Stobbe MD, et al. (2013). A community-driven global reconstruction of human metabolism. Nat Biotechnol 31, 419–425.
Uhlen M, Fagerberg L, Hallstrom BM, Lindskog C, Oksvold P, Mardinoglu A, Sivertsson A, Kampf C, Sjostedt E, Asplund A, et al. (2015). Proteomics. Tissue-based map of the human proteome. Science 347, 1260419.
Uhlen M, Hallstrom BM, Lindskog C, Mardinoglu A, Ponten F, and Nielsen J (2016). Transcriptomics resources of human tissues and organs. Mol Syst Biol 12, 862.
Vlassis N, Pacheco MP, and Sauter T (2014). Fast reconstruction of compact context-specific metabolic network models. PLoS Comput Biol 10, e1003424.
Wang Y, Eddy JA, and Price ND (2012). Reconstruction of genome-scale metabolic models for 126 human tissues using mCADRE. BMC Syst Biol 6, 153.
Watcham S, Kucinski I, and Gottgens B (2019). New insights into hematopoietic differentiation landscapes from single-cell RNA sequencing. Blood 133, 1415–1426.
Xiong K, la Cour Karottki KJ, Hefzi H, Li S, Grav LM, Li S, Spahn P, Lee JS, Ventina I, Lee GM, et al. (2021). An optimized genome-wide, virus-free CRISPR screen for mammalian cells. Cell Reports Methods 1, 100062.
Zielinski DC, Jamshidi N, Corbett AJ, Bordbar A, Thomas A, and Palsson BO (2017). Systems biology analysis of drivers underlying hallmarks of cancer cell metabolism. Sci Rep 7, 41241.
Zur H, Ruppin E, and Shlomi T (2010). iMAT: an integrative metabolic analysis tool. Bioinformatics 26, 3140–3142.