Abstract
In recent years, the number of large-scale metabolomics studies on various cellular processes in different organisms has increased drastically. However, it remains a major challenge to perform a systematic identification of mechanistic regulatory events that mediate the observed changes in metabolite levels, due to complex interdependencies within metabolic networks. We present the metabolic network segmentation (MNS) algorithm, a probabilistic graphical modeling approach that enables genome-scale, automated prediction of regulated metabolic reactions from differential or serial metabolomics data. The algorithm sections the metabolic network into modules of metabolites with consistent changes. Metabolic reactions that connect different modules are the most likely sites of metabolic regulation. In contrast to most state-of-the-art methods, the MNS algorithm is independent of arbitrary pathway definitions, and its probabilistic nature facilitates assessments of noisy and incomplete measurements. With serial (i.e., time-resolved) data, the MNS algorithm also indicates the sequential order of metabolic regulation. We demonstrated the power and flexibility of the MNS algorithm with three realistic case studies with bacterial and human cells. Thus, this approach enables the identification of mechanistic regulatory events from large-scale metabolomics data, and contributes to the understanding of metabolic processes and their interplay with cellular signaling and regulation processes.
Author summary
Reciprocal crosstalk between metabolism and cellular signaling pathways plays a crucial role in cellular decision-making. In recent years, this premise has motivated several metabolomics studies that aimed to gain a mechanistic understanding of metabolic phenotypes. However, due to complex interactions within metabolic networks, it remains a challenge to infer mechanisms that underlie metabolome changes. We present the metabolic network segmentation approach, a novel method that aims to identify the sites and sequential order of metabolic regulatory events, based merely on steady state or dynamic metabolomics data. This method employs probabilistic graphical models to partition the entire metabolic network into modules with correlated metabolites. It identifies fractures between modules (i.e., reactions that connect non-correlated metabolites) as sites of regulation. We performed validation and benchmark analyses in hundreds of E. coli knockout mutants deficient in enzymes and transcription factors. Moreover, we verified the capability of our method for identifying the sequential order of metabolic regulatory events by testing it on fibroblasts exposed to oxidative stress. Our metabolic network segmentation algorithm is widely applicable, and thus, it will enhance our mechanistic understanding of metabolic phenotypes and their connections to cellular signaling and regulatory processes.
This is a PLoS Computational Biology Methods paper.
Introduction
The consolidated notion that metabolites can provide feedback to cellular signaling and alter metabolism is a hallmark of several diseases. This notion has boosted interest in gaining a mechanistic and quantitative understanding of metabolic phenotypes [1–3]. Consequently, researchers in the field of metabolomics have established a battery of methods for large-scale analyses of metabolite levels in all sample types [4]. Continual advances in instrumentation and protocols for metabolomics have led to an exponential production of high-quality data on the metabolome. The quality of the data is reflected in the high number of detectable metabolites and the high experimental reproducibility. Due to these advances, it has become common that every study discovers multiple significant metabolite changes. This positive trend, however, has exacerbated the problem of interpreting metabolome changes.
In the context of systems or cellular biology, interpretations of metabolome data aim to generate testable hypotheses about the molecular events that might have led to the observed metabolome pattern. The quest for mechanisms underlying metabolome changes is complicated by the complex relationship between metabolite levels, enzyme properties, and metabolic fluxes [5, 6]; by the dozens of pathways connected to key metabolites; and by our partial understanding of cellular regulation [7]. This complexity is a non-trivial problem that cannot be addressed with traditional uni- and multivariate statistical techniques, which are generally used to identify markers or classify samples [7]. Instead, prior knowledge of the metabolic or regulatory networks must be embedded into the analysis for an efficient inference of the links that gave rise to the observed metabolome changes.
Mechanistic model-based approaches make it possible to model the dynamics of the metabolic network and its interactions at the molecular level. Therefore, these modelling approaches have commonly been considered the methods of choice for inferring mechanisms and regulatory events [8]. However, such an in-depth mechanistic description of metabolism requires detailed knowledge of the model structure and the kinetic parameters, which in many cases limits model size and applicability (reviewed in [8]). Recent ensemble modelling approaches employed high-dimensional fluxomics datasets to generate large-scale kinetic metabolic models of the well-studied model organism Escherichia coli. These approaches have been successful in, for example, estimating flux changes and predicting metabolite yields in engineered E. coli enzyme mutant strains [9, 10]. Yet in many studies, only metabolite levels, and not metabolic fluxes, are available. Therefore, mechanistic modelling approaches that rely exclusively on metabolomics data have been limited to well-defined, small metabolic models with known reaction stoichiometry and only a few dozen reactions and metabolites. Nevertheless, these models have been applied very successfully, for example, to identify regulation from absolutely quantified metabolomics data and dynamic modeling [11–13]. However, modern, large-scale metabolomics tools enable one to profile hundreds to thousands of metabolites throughout the metabolic network [14]. For such large metabolomics datasets, in most cases only less well-defined models with at least ten-fold more features, reactions, and parameters are available; thus, current mechanistic-based approaches are prone to failure.
Consequently, an ongoing effort to develop alternative, simpler computational tools has capitalized on current knowledge of metabolic networks to facilitate detailed interpretations of large-scale metabolomics data. These tools include metabolic pathway enrichment analyses and visualizations of metabolite changes on maps of metabolic networks (reviewed in [7]). These approaches can simplify the analysis and interpretation of metabolomics data, but they can also be limited by their reliance on fixed significance cutoffs for grouping metabolites, fixed pathway definitions, and highly user-biased interpretations. These limitations can preclude discovery of unexpected regulatory events. Two approaches, namely reporter reactions and mass action ratios, were developed to make automatic predictions of regulatory sites, based on metabolomics data [15, 16]. Both approaches are based on the assumption that a given metabolic perturbation, such as inhibiting a metabolic enzyme, will induce the strongest, most significant metabolic alterations in the levels of substrates and products. To date, these approaches have been solely applied to small metabolic subnetworks with a few dozen metabolites, of which almost all were measured. However, these methods tend to give misleading results in regions that are sparsely covered in an analytical metabolome, like peripheral pathways. Thus, randomly sampled data would have to replace undetected metabolites, and presumably, model performance would decline [16].
To overcome this issue, we developed the metabolic network segmentation (MNS) algorithm. This generalized approach aimed to identify the regulated metabolic reactions responsible for the end point or dynamics of large-scale metabolomics experiments. In contrast to other approaches, which aim to reconstruct networks based on correlations between metabolites [17–19], we sought to identify reactions in the metabolic network that exhibited broken symmetry; i.e., metabolic changes, where substrates and products were not correlated. Our algorithm partitioned the metabolic network into regions of neighboring metabolites with consistent changes, and it identified reactions between two regions—so called fractures—as potential sites of metabolic regulation (Fig 1a).
Our algorithm is based on the undirected subclass of probabilistic graphical models, known as Markov random fields (MRFs), which were introduced by Lenz and Ising in the 1920s to describe ferromagnetic materials [20]. MRFs consist of nodes that represent random variables and arcs that describe the probabilistic dependencies between connected random variables [21]. Therefore, MRFs have been successfully applied in diverse machine learning tasks that include noisy and incomplete data with underlying sequential or spatial structure [21, 22]. For example, in computer vision, these models are used for image segmentation or image reconstruction, where they employ dependency assumptions between neighboring pixels [22]. Likewise, the metabolic network defines an underlying spatial structure, where the substrates and products of a reaction are dependent. For example, given an unperturbed reaction, one would assume that changes in substrate and product levels should correlate. Given the structured nature of metabolism, MRFs are an ideal approach for pattern recognition in biochemical networks.
Here, we demonstrate that our approach outperformed current state-of-the-art algorithms in the identification of known regulatory sites and the prediction of novel regulatory sites. Moreover, we extended the algorithm for metabolomics data with serial structure, such as time courses. This unique extension enabled the model both to identify sites and also to determine the sequential order of events, i.e., the timing of metabolic regulatory steps.
Design and implementation of the MNS algorithm
Inference of metabolic regulation sites from univariate metabolomics data
The goal of our approach was to predict sites of metabolic regulation, such as activated or inhibited enzymes, from differential metabolomics data. These data were obtained from a univariate comparison of metabolite levels across conditions. Supported by experimental data [15, 16], we designed the MNS algorithm based on the assumption that the strongest metabolic changes occur in close proximity of a perturbed enzyme (Fig 1a). For example, when a flux-carrying enzyme is inhibited by a drug, we expected that the substrate and its precursors would accumulate, and the product levels would decrease. In contrast to existing methods, we did not focus solely on the direct substrate and products for the inference; instead, we aimed to segment the entire metabolic network into regions of consistent metabolite changes, so-called metabolic modules, where fractures between two modules are potential sites of metabolic regulation (Fig 1a). To deal with sparse and noisy metabolite measurements, our algorithm considered interdependencies between neighboring metabolites for predictions; i.e., we assumed that reactant levels of an unperturbed reaction would correlate.
Our algorithm was designed to build on organism-specific, genome-wide, metabolic network reconstructions. It employs main reactant-pair models, obtained from the KEGG database with a modified version of the MetaboNetworks toolbox [23, 24]. For each metabolite in the network, we introduced a discrete hidden variable, yi ∈ y, into an MRF model (Fig 1b). The discrete values of the hidden variables represented the labels of the metabolic modules to which a certain metabolite belonged. Specifically, metabolites with similarly decreasing, increasing, or constant metabolite levels were likely to have the same label. The module labels of the hidden variables were not known a priori (hidden), but they were inferred by the algorithm through optimization of the conditional likelihood.
To account for network structure and proximity, we defined probabilistic dependencies between hidden variables that represented the reactants of the main reactant pairs (Fig 1b). Assuming that, in an unperturbed system, neighboring metabolites should be in the same module, we enforced a local homogeneity between neighboring hidden variables by introducing for each maximal clique (cj) in the hidden layer, a neighborhood factor potential, ψN,j(cj). The maximal clique was defined as a maximal subset of nodes (i.e., metabolites), which were all connected to each other. The neighborhood factor potential ψN,j(cj) was a hidden state, label dependent, exponential decay function, defined by:
where the unique(cj) equals the number of different labels of hidden variables involved in a maximal clique, cj; the size(cj) is the number of hidden variables in the cj; and λ1 is a weighting factor that controls the strength of the neighborhood influence (Fig 1c).
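The display equation for the neighborhood factor potential is not reproduced in this text. The sketch below illustrates only the behavior described above: a potential that equals 1 when all hidden variables in a clique share one label and that decays exponentially with the number of distinct labels, scaled by λ1. The specific exponent, (unique - 1) / size, and the function name are assumptions made for illustration, not the published formula.

```python
import math

def neighborhood_potential(labels, lam1):
    """Illustrative neighborhood factor for one maximal clique.

    labels: module labels of the hidden variables in the clique.
    lam1:   weight of the neighborhood influence (lambda_1 in the text).
    ASSUMPTION: the exponent -(lam1 * (unique - 1) / size) is a plausible
    decay form; the exact published expression is not shown in this text.
    """
    size = len(labels)
    if size == 0:
        raise ValueError("a clique must contain at least one hidden variable")
    unique = len(set(labels))  # number of different labels in the clique
    return math.exp(-lam1 * (unique - 1) / size)
```

Under this assumed form, a homogeneous clique is never penalized, and raising λ1 makes mixed-label cliques progressively less likely, which is the qualitative behavior the scanning procedure relies on.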
Because the states of the hidden variables were required to be dependent on the metabolite data, for each detected metabolite of a given dataset, we introduced a continuous observed variable, xi ∈ x, which represented the metabolic change, based on a univariate comparison, e.g., the log2(fold-change) or the z-scores (Fig 1b). The dependency between hidden and observed states was described by an observation factor potential, ψO,i(xi,yi), which described their relationship with hidden state-dependent Gaussian distributions, as follows
where μ(yi) and σ(yi) are the hidden state-dependent mean values and standard deviations, respectively (Fig 1c). The hidden state-dependent mean values were defined as either the mean values of the clusters obtained from a k-means clustering (k-means) analysis, or as the values equally distributed between the 0.001 and 0.999 quantiles of the complete dataset (quantile, S1 Table). The standard deviations can be set to a constant user-defined value (fixed); or they can be individually obtained from the standard deviations of the k-means clusters (k-means); or they can be set to the standard deviation of the complete dataset (all data, S1 Table). Together, the neighborhood and observation factor potentials of all metabolites, i ∈ M, and all maximal cliques, j ∈ C, describe the conditional likelihood of the MRF model, defined as
By maximizing the conditional likelihood, the optimal hidden state label distribution can now be determined as follows:
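The display equations for the observation potential, the conditional likelihood, and its maximization do not survive in this text. Under the definitions given above, a standard MRF formulation would read as follows. The normalization term Z(x) and the hat notation for the optimal labeling are assumptions of this sketch, and the observation factor is taken to exist only for detected metabolites.

```latex
% Gaussian observation potential with hidden state-dependent mean and standard deviation
\psi_{O,i}(x_i, y_i) = \frac{1}{\sigma(y_i)\sqrt{2\pi}}
  \exp\!\left( -\frac{\bigl(x_i - \mu(y_i)\bigr)^2}{2\,\sigma(y_i)^2} \right)

% Conditional likelihood as a normalized product of all factor potentials
p(y \mid x) = \frac{1}{Z(x)} \prod_{i \in M} \psi_{O,i}(x_i, y_i) \prod_{j \in C} \psi_{N,j}(c_j)

% Most likely module labeling (assumed notation)
\hat{y} = \arg\max_{y} \, p(y \mid x)
```

Because Z(x) does not depend on y, it can be ignored during the maximization.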
Due to the complexity of our network, we employed an approximation algorithm called LazyFlipper, available in the OpenGM toolbox, to infer the best hidden states [25, 26].
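To make the inference step concrete, the following toy sketch finds the most likely labeling by exhaustive search on a five-metabolite chain. It is not LazyFlipper and does not use OpenGM; brute force is only feasible for tiny networks. The network, the measurements, the cluster means, and the assumed neighborhood decay term are all hypothetical.

```python
import itertools
import math

def gaussian(x, mu, sigma):
    """Gaussian density used as the observation potential."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def log_score(labels, obs, cliques, mus, sigma, lam1):
    """Unnormalized log-likelihood of one labeling."""
    total = 0.0
    for i, x in obs.items():  # only measured metabolites have an observation factor
        total += math.log(gaussian(x, mus[labels[i]], sigma))
    for clique in cliques:
        unique = len({labels[i] for i in clique})
        total += -lam1 * (unique - 1) / len(clique)  # ASSUMED decay form
    return total

def map_labels(n, obs, cliques, mus, sigma, lam1):
    """Exhaustive MAP search over all label assignments (toy networks only)."""
    candidates = itertools.product(range(len(mus)), repeat=n)
    return list(max(candidates, key=lambda lab: log_score(lab, obs, cliques, mus, sigma, lam1)))

def fractures(labels, cliques):
    """Cliques (here: reactant pairs) whose members carry different labels."""
    return [c for c in cliques if len({labels[i] for i in c}) > 1]

# Hypothetical chain 0-1-2-3-4; metabolite 2 is not measured.
obs = {0: 1.0, 1: 0.9, 3: -1.0, 4: -1.1}
cliques = [(0, 1), (1, 2), (2, 3), (3, 4)]
mus = [-1.0, 0.0, 1.0]  # hidden state-dependent means (decreased, constant, increased)

labels_low = map_labels(5, obs, cliques, mus, sigma=0.5, lam1=1.0)
labels_high = map_labels(5, obs, cliques, mus, sigma=0.5, lam1=100.0)
```

At low neighborhood influence, the chain splits into an increased and a decreased module with a single fracture around the unmeasured metabolite; at very high influence, all metabolites collapse into one module and the fracture disappears.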
To infer regulatory sites, our algorithm repetitively segmented the metabolic network by estimating the most likely hidden states with increasing neighborhood influence, λ1 (S1 Fig). The fractures between individual modules represented potential regulatory sites. This procedure was inspired by the watershed algorithm for image segmentation [27]; it assumes that the fractures identified at a high neighborhood influence, λ1, are most likely biologically meaningful. Sequential scanning through λ1 values was performed in a two-step process. In the initial step, λ1 was exponentially increased until all metabolites were assigned to the same module, and no more fractures were found. Then, in the second step, an extensive search within a linear range of λ1 was performed, which resulted in a list of fractures for each given λ1. The likelihood that each fracture represented a site of metabolic regulation was quantified with two entities. One entity was the fracture count (#fractures), i.e., the number of λ1 values at which a given reaction was identified as a fracture between two modules. The second entity was the maximum λ1 value (max(λ1)) at which a reaction remained classified as a fracture (Fig 1d, S1 and S2 Figs). The significance of the fractures could then be determined with a permutation test, by comparing the real outcome with the outcome from 1000 repetitions of the analysis performed with permuted metabolite labels (S1 Fig). The p-value for each reaction, with max(λ1) = λ* and #fractures = n, can then be calculated as follows:
and
To combine the inference results of individual predictors with different parameterizations, the max(λ1) and #fracture rankings were integrated with the rank product. The most likely regulated reactions were ordered by ascending rank products. Significance for the rank products was calculated, as described previously [28].
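The p-value expressions themselves are not reproduced in this text. As a hedged illustration, the sketch below computes an empirical permutation p-value as the fraction of permuted runs that reach at least the observed statistic, and then combines two rankings with a rank product. The add-one correction, the geometric-mean form of the rank product, and all function and reaction names are assumptions of this sketch; the significance calculation for rank products from [28] is not implemented here.

```python
import math

def empirical_p(observed, null_values):
    """Fraction of permutation outcomes >= observed (with add-one correction).

    ASSUMPTION: the add-one correction is a common convention, not
    necessarily the one used in the original analysis.
    """
    hits = sum(1 for v in null_values if v >= observed)
    return (1 + hits) / (1 + len(null_values))

def ranks_descending(scores):
    """Rank reactions so that the highest score gets rank 1 (ties broken by name)."""
    ordered = sorted(scores, key=lambda r: (-scores[r], r))
    return {r: pos + 1 for pos, r in enumerate(ordered)}

def rank_product(rank_dicts):
    """Geometric mean of the ranks of each reaction across predictors."""
    k = len(rank_dicts)
    return {r: math.prod(d[r] for d in rank_dicts) ** (1.0 / k) for r in rank_dicts[0]}

# Hypothetical scores for three reactions from two predictors.
by_max_lambda = ranks_descending({"r1": 3.0, "r2": 1.0, "r3": 2.0})
by_fracture_count = ranks_descending({"r1": 30, "r2": 10, "r3": 20})
combined = rank_product([by_max_lambda, by_fracture_count])
most_likely_first = sorted(combined, key=combined.get)
```

Sorting by ascending rank product places the reactions that rank highly in both predictors at the top of the candidate list.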
Identification of the sequential order of metabolic regulation steps from dynamic metabolomics data
Metabolism provides the molecular building blocks, energy, and redox equivalents to fulfill the physiological needs of a cell. Therefore, a given metabolic perturbation must be compensated by other metabolic branches to sustain the cell’s physiological requirements. This means that a metabolic perturbation can cause primary regulation events, i.e., the enzymatic target of a perturbation, and secondary regulation events, i.e., the metabolic regulation events required to compensate for the effects of the perturbation. Thus, in metabolomics research, it is often crucial to know which enzymes are regulated, and in addition, the causal order of these regulation events. Differences in the sequence of regulation events in dynamic data, referred to here as the sequential order, can give a first indication of causal regulatory interactions. For example, the timing of regulation events can distinguish between primary and secondary effects of a drug. However, to date, no method is available that can automatically infer the sequence of regulation events. To enable such analyses, we extended our approach for identifying sites to include identifications of the sequential order of metabolic regulation steps, based on dynamic metabolomics data, such as time courses or compound dilution series (Fig 2a). Specifically, the sequential data were split into individual frames, and each frame was represented by an MRF model for univariate data, as described previously (Figs 1b and 2b). We assumed that, without perturbation, the levels of a given metabolite at two consecutive data points were invariant. Therefore, we introduced a dependency on sequential data points with a sequence factor potential function ψS(yi,s-1,yi,s), which connected hidden variables of neighboring frames (Fig 2b), as follows:
where λ2 is a weighting factor that quantifies the influences of neighboring sequence frames on each other. Given this sequence dependency, the conditional likelihood was defined as:
where S is the total number of sequence frames, yi,s and xi,s are the hidden and observed variables, respectively, and cj,s is the maximal neighborhood clique for each sequence frame, s. Similar to the univariate approach, we aimed to enable automatic identifications of sequences and neighborhood fractures with a scanning process (S3 Fig). First, the ranges of the neighborhood and the sequence weights, λ1 and λ2, were defined individually by exponentially increasing λ1 with constant λ2 = 0, until no more neighborhood fractures were identifiable, and by exponentially increasing λ2 with constant λ1 = 0 until no more fractures were found. Second, the neighborhood and sequence fractures were extracted by inference of the hidden state labels for every parameter combination, during a scan through all combinations of the defined ranges of λ1 and λ2.
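The two-stage scan described above can be sketched as follows. Each weight is first increased exponentially until a callback reports that no fractures remain, and the resulting ranges are then combined into a grid of (λ1, λ2) pairs. The callbacks stand in for a full inference run; the start value, growth factor, and use of an evenly spaced grid are assumptions of this sketch.

```python
import itertools

def exponential_upper_bound(has_fractures, start=0.1, factor=2.0, max_steps=60):
    """Increase a weight exponentially until has_fractures(weight) is False."""
    lam = start
    for _ in range(max_steps):
        if not has_fractures(lam):
            return lam
        lam *= factor
    raise RuntimeError("fractures persisted over the whole search range")

def linear_grid(upper, n):
    """n evenly spaced values from 0 to upper (inclusive)."""
    if n < 2:
        raise ValueError("need at least two grid points")
    return [upper * k / (n - 1) for k in range(n)]

def scan_grid(has_neighborhood_fractures, has_sequence_fractures, n=5):
    """All (lambda_1, lambda_2) combinations within the individually found ranges.

    has_neighborhood_fractures(lam1) is evaluated with lambda_2 = 0 and
    has_sequence_fractures(lam2) with lambda_1 = 0, mirroring the text.
    """
    upper1 = exponential_upper_bound(has_neighborhood_fractures)
    upper2 = exponential_upper_bound(has_sequence_fractures)
    return list(itertools.product(linear_grid(upper1, n), linear_grid(upper2, n)))

# Hypothetical stand-ins for inference: fractures vanish above a threshold.
grid = scan_grid(lambda lam1: lam1 < 1.0, lambda lam2: lam2 < 0.3, n=3)
```

In a real run, each grid point would trigger one inference of the hidden state labels, from which the neighborhood and sequence fractures are extracted.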
Similar to the identification of the sites of metabolic regulation, we expected sequence fractures to be more relevant when they persisted with high sequential influence (i.e., high values of λ2). In the identification of significant fractures, we balanced the interdependency of the neighborhood, observation, and sequence factor potential functions, with a score function:
where the optimal module label distribution is derived by inference, given λ1, λ2, and the observations x; #fracturessequence and #fracturesneighborhood are the sequence and neighborhood fracture counts, respectively, given the inference solution; max(#fracturessequence) and max(#fracturesneighborhood) represent the total counts of possible fractures, given the model structure; and ws and wn are the weights that determine the influence of the numbers of sequence and neighborhood fractures in the score function. This score function served to balance the fracture frequency, to ensure that it was comparable between different experiments. By maximizing the score function, the best combination of λ1 and λ2 could be derived as follows:
Parameterizations were excluded from the maximization, when they gave a zero value for either the observation potential or the number of fractures. Thus, the sites and the sequential order (i.e., timing or sensitivity) of metabolic regulation steps that were represented by the most stable sequence and neighborhood fractures could be extracted by scanning through the increasing weights, ws and wn.
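Once module labels have been inferred for every frame, sequence fractures and their order can be read directly from the label trajectories. The sketch below assumes a simple dictionary layout and uses hypothetical metabolite names and labels; it illustrates the bookkeeping only, not the inference or the score function.

```python
def sequence_fractures(labels_by_frame):
    """Map each metabolite to the frames s where its label differs from frame s-1."""
    return {
        metabolite: [s for s in range(1, len(seq)) if seq[s] != seq[s - 1]]
        for metabolite, seq in labels_by_frame.items()
    }

def onset_order(labels_by_frame):
    """Metabolites with at least one sequence fracture, ordered by first fracture frame."""
    first = {m: fr[0] for m, fr in sequence_fractures(labels_by_frame).items() if fr}
    return sorted(first, key=lambda m: (first[m], m))

# Hypothetical inferred labels over four frames (e.g., increasing dose).
inferred = {
    "metabolite_A": [1, 2, 2, 2],  # changes module early
    "metabolite_B": [1, 1, 1, 2],  # changes module late
    "metabolite_C": [1, 1, 1, 1],  # never changes
}
order = onset_order(inferred)
```

The frame index of the first fracture corresponds to the earliest time point or lowest dose at which the metabolite leaves its initial module.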
Results
Parameter optimization for automatic identification of sites of metabolic regulation
We first optimized the parameters used by the MNS algorithm (S1 Table), based on a non-targeted metabolomics dataset of 62 Escherichia coli single enzyme knockout and overexpression mutants, which exhibited a wide variety of metabolome phenotypes (S2 Table, details in S1 Text). The optimization results demonstrated that, for most parameter combinations, the MNS algorithm achieved inferences of regulatory sites that were comparable to those achieved by an expert scientist who manually identified the perturbed enzymes (S3 Table, S4 Fig). Furthermore, we used a permutation test to evaluate whether the algorithm predicted the perturbed enzyme significantly better than a random guess. The results demonstrated that the MNS algorithm with the best parameterization (P3, S1 Table) identified 11 (maximum λ1) and 13 (#fractures) of the 62 perturbed enzymes, significantly better than the permutation test (Fig 3a, S3 Table).
Interestingly, when we performed a comparison of the performance of our approach with different individual parameter combinations, we found that, for some enzymatic perturbations (e.g., the sdhC knockout), a different parameter set was preferable (Fig 3b). Inspired by the notion that a combination of multiple predictors might improve the prediction results [29–32], we combined independent predictions obtained with different parameters, by integrating them with the rank product. When we tested the performance of these pairwise combinations, we found that three individual combinations of the parameterizations slightly improved the prediction results, and 14 (22.6%) reactions were identified significantly better with the MNS algorithm than with the permutation test (p <0.05, permutation test; Fig 3c, S3 Table).
Performance evaluation and comparison to existing methods
We compared the performance of the MNS algorithm with optimized parameters against two state-of-the-art, but conceptually simpler, methods: reporter reactions [16] and mass-action ratios [15]. These comparisons employed independent metabolomics data from a genome-wide screen of 647 E. coli single enzyme knockout mutants [33]. We evaluated the prediction performance, based on the rankings of exact identifications of the perturbed reactions and of near misses, i.e., a prediction of the first-neighbor reaction. To evaluate significance, we performed a permutation test. The reporter reaction algorithm predicted 48 (7.4%) knocked-out enzymes significantly, and thereby surpassed the mass-action ratio algorithm, which identified 31 (4.8%) knocked-out enzymes significantly (p <0.05, permutation test; Table 1, S5 Fig). With the best single set of parameters derived from the previous optimization (P3), our MNS algorithm predicted 55 (8.5%) of the perturbed enzymes significantly; moreover, with the best combined parameterization (P2/P3/#fractures), our algorithm identified 74 (11.4%) of the knocked-out enzymes significantly (p <0.05, permutation test; Table 1, S5 Fig). Thus, our method showed a more than 50% improvement in prediction performance compared to current state-of-the-art-methods (Table 1, S5 Fig).
Table 1. Comparison of algorithms in the identification of the experimentally perturbed reactions in 647 E. coli enzyme knockout mutants.
Algorithm | Genes found in TOP10 ranks, exact [%] | Genes found in TOP10 ranks, total [%] | #Significantly identified reactions, exact | #Significantly identified reactions, total | Significantly identified reactions, exact [%] | Significantly identified reactions, total [%]
---|---|---|---|---|---|---
MNS, P3, max(λ1) | 4.0 | 38.6 | 48 | 55 | 7.4 | 8.5
MNS, P3, #fractures | 4.2 | 38.8 | 48 | 55 | 7.4 | 8.5
MNS, P2 & P11, max(λ1) | 4.3 | 38.9 | 66 | 71 | 10.2 | 11.0
MNS, P5 & P8, max(λ1) | 3.6 | 28.4 | 62 | 67 | 9.6 | 10.4
MNS, P2 & P3, #fractures | 5.6 | 42.2 | 69 | 74 | 10.7 | 11.4
reporter reactions | 2.8 | 43.6 | 37 | 48 | 5.7 | 7.4
mass action ratio | 1.5 | 38.5 | 26 | 31 | 4.0 | 4.8
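As a quick arithmetic check, the percentages in Table 1 and the stated improvement of more than 50% follow directly from the counts and the 647 knockout mutants. The short sketch below recomputes them from the "total" counts; the dictionary keys are shorthand labels chosen here for illustration.

```python
N_MUTANTS = 647  # number of E. coli single enzyme knockout mutants

# Total numbers of significantly identified reactions, taken from Table 1.
significant_total = {
    "MNS P3": 55,
    "MNS P2 & P3 #fractures": 74,
    "reporter reactions": 48,
    "mass action ratio": 31,
}

# Percentage of mutants with a significant identification, rounded as in the table.
percent = {name: round(100 * count / N_MUTANTS, 1) for name, count in significant_total.items()}

# Relative improvement of the best MNS combination over the best existing method.
improvement = significant_total["MNS P2 & P3 #fractures"] / significant_total["reporter reactions"] - 1
```

The best combined MNS parameterization identifies 74 reactions versus 48 for reporter reactions, a relative gain of roughly 54%.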
Inference of regulatory sites in E. coli transcription factor knock-out mutants
To illustrate further the potential of the MNS approach, we applied it to three cases with widespread, complex metabolome changes, namely the E. coli transcription factor knockout mutants, Crp, MetR, and ArgR. For Crp, the reactant pairs predicted by the algorithm significantly overlapped with known targets of transcription factors (38% overlap, p = 0.023, hypergeometric test, S6 Fig) [34]. In contrast, for ArgR and MetR, no overlap was detected between the predicted and known targets (S6 Fig). The ArgR knockout should induce active arginine biosynthesis, even when arginine was present in the media [35]; however, we did not expect a significant overlap between known and predicted targets in MetR mutants, because methionine in the medium could inhibit MetR activity in wild-type E. coli [36].
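The overlap significance reported above comes from a hypergeometric test. As a minimal sketch of that test, the function below computes the probability of observing at least k known targets among n predicted reactant pairs when K of N candidates are known targets. The parameter names are chosen here for illustration; the counts behind the reported 38% overlap are not given in this text, so no attempt is made to reproduce p = 0.023.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    k: observed overlap between predicted and known targets
    N: number of candidates in the network
    K: number of known targets
    n: number of predictions
    """
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)

# Tiny hypothetical example: all 5 predictions hit the 5 known targets among 10 candidates.
p_perfect = hypergeom_upper_tail(5, N=10, K=5, n=5)
```

A small upper-tail probability indicates that the observed overlap is unlikely to arise from a random selection of the same number of reactant pairs.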
Surprisingly, results for the metR knockout mutant repeatedly identified cyaA and cpdB among the top predicted regulated genes (S4 Table). Both the CyaA and CpdB enzymes are involved in the homeostasis of cyclic nucleotide monophosphates (cNMPs), and they are known to be regulated by Crp, but not by MetR [37, 38]. Consistent with MNS predictions, the raw metabolite data revealed increased metabolite levels of all cNMPs and nucleotide monophosphates (NMPs), but nucleotide triphosphate (NTP) levels remained constant, and nucleoside levels declined (Fig 4a). This pattern suggested an activation of the adenylate cyclase, CyaA, or the inhibition of the 2',3'-cyclic-nucleotide 2'-phosphodiesterase/3'-nucleotidase, CpdB, by MetR. A non-specific effect on cNMP production rates could be excluded (S7a Fig). To verify the predicted regulation of CyaA and CpdB activity by MetR, we performed CyaA and CpdB enzyme assays in crude protein extracts of E. coli metR knockout and overexpression mutants. In extracts from E. coli with metR gene knockouts, we found elevated CyaA activity and unaltered CpdB activity (Fig 4b, S7b and S7c Fig). In metR overexpressing strains, we also found no significant change in enzymatic CpdB activity, but a strong reduction in enzymatic CyaA activity. These results confirmed that MetR directly or indirectly negatively regulated CyaA activity. In contrast, the predicted regulation of CpdB could not be confirmed. We speculated that either the prediction of CpdB could be considered a near miss in CyaA regulation or that in vivo regulation of CpdB activity could not be reproduced with in vitro enzyme assays. This latter possibility could occur, for example, when the regulation is mediated by allosteric interactions or post-translational modifications. Nevertheless, in summary, this example also demonstrated that novel sites of metabolic regulation could be identified by MNS, even in complex situations, like transcription factor knock-out mutants.
Identification of the sequential order of oxidative stress-induced metabolic regulation in human fibroblasts treated with increasing concentrations of H2O2
As mentioned before, it is often crucial that, in addition to identifying sites, a model can also determine the sequential order of metabolic regulatory steps. For example, it is often important to distinguish between the primary targets and the secondary effects of a metabolic drug. Therefore, we investigated the potential of our MNS algorithm in predicting both the sites and sequential order of metabolic regulation, based on dynamic metabolite data. We applied the algorithm to a previously published metabolomics dataset from fibroblasts treated with increasing concentrations of H2O2 [39]. We ran the algorithm with three clusters that had fixed mean values (μ1 = -0.1, μ2 = 0, μ3 = 0.1), with data-dependent standard deviation values for the observation function, and with a fixed scanning range, for λ1 and λ2, which was determined with a prior, automatic coarse-grained scan through the parameters (S2 Text). The score distributions, which depended on sequence and neighborhood weights, ws and wn, indicated that the influence of the metabolic neighborhood, λ1, and of the sequential hidden variables, λ2, could be balanced to filter out equal degrees of non-significant neighborhood and sequence fractures (S8 Fig). The application of the algorithm smoothed the heterogeneous module label distributions with increasing neighborhood and sequential influences (λ1 and λ2), which resulted in a continuous reduction of sequence and neighborhood fractures, and a focus on a small subset of metabolites and reactions (S8, S9 and S10 Figs). For certain ranges of the neighborhood and sequential weights, the analysis resulted in a pseudo steady-state distribution of the module labels and fractures (grey area, S8b Fig). We expected that these pseudo steady-states would be of special interest, because they represented module label configurations with a certain stability, and thus, they might be biologically meaningful.
Next, we further investigated the biological relevance of the inferred metabolic regulatory steps. At an intermediate level of neighborhood and sequential influence, we identified oxidative stress-induced regulatory steps in the citric acid cycle, glycolysis, and pentose phosphate pathway (Fig 5b). Most predicted enzymes in the citric acid cycle, including succinate, α-ketoglutarate, and malate dehydrogenase, were previously reported to be sensitive to oxidative stress [40–42]. Isocitrate dehydrogenase was not previously reported to be influenced by oxidative stress, but the upstream enzyme, aconitase, was known to be inhibited by oxidation [40]; however, our algorithm did not predict that aconitase was regulated. We hypothesized that this discrepancy was due to the inability of the non-targeted metabolomics method to distinguish between isomers like citrate and isocitrate, which are the substrate and product, respectively, in the aconitase-catalyzed reaction [43]. To test this hypothesis, follow-up analyses must be performed with targeted mass spectrometry methods that enable a distinction between the two isomers.
At high sequential and neighborhood influences, the only inferred metabolic regulations were the major, known regulators of the metabolic response to oxidative stress in upper glycolysis and the pentose phosphate pathway (Fig 5c, S9 and S10 Figs). These inferred regulations included oxidative stress-mediated activation of glucose 6-phosphate dehydrogenase [39], inhibition of glycolytic flux by oxidation of glyceraldehyde-3-phosphate dehydrogenase [39, 44, 45], and changes in the directions of net fluxes for transketolase and transaldolase [39]. Importantly, our algorithm revealed that the activation of glucose 6-phosphate dehydrogenase and the inhibition of glycolytic flux occurred at equally low H2O2 concentrations, and that the accumulation of non-oxidative pentose phosphate pathway metabolites was observed only at high H2O2 concentrations (Fig 5c). Because the accumulation of pentose phosphates is a consequence of increased glucose 6-phosphate dehydrogenase flux [39], our method correctly identified the sequential order of these regulatory events.
Discussion
Here, we presented the MNS algorithm, a novel computational method that employs probabilistic graphical models to infer the sites and sequential order of metabolic regulatory events, based on relative metabolomics data and metabolic network topology (Fig 6). Exploiting probabilistic dependencies specified by the metabolic model, our approach successfully coped with large-scale, noisy, and sparse datasets. In addition, our approach outperformed existing methods, like reporter reactions [16] and mass-action ratios [15], in the identification of regulatory sites. Furthermore, it could infer the sequential order of regulatory events, which is often essential for identifying direct and indirect metabolic regulatory events.
Constraints on the application of the method
In the parameter optimization and method comparison sections, we demonstrated that, although the MNS algorithm inferred many genetically perturbed enzymes correctly, a substantial portion of perturbed enzymes was not identified. However, the algorithm’s prediction performance was comparable to a manual analysis by an expert. Low prediction performance might arise because some genetic perturbations of enzymes have little effect on metabolism. For example, inhibition of an enzyme that carries no metabolic flux will not influence the metabolic phenotype. Likewise, when a flux-carrying enzyme is perturbed, concomitant regulation of the involved isoenzyme can maintain the flux of the perturbed reaction. In both cases, the metabolite data will not indicate any changes, and the perturbed reaction will not be identifiable.
Moreover, the perturbation of a single metabolic enzyme can cause global rearrangements in metabolism, because metabolism supplies the molecular building blocks, energy, and redox equivalents that the cell needs to meet its physiological requirements. Therefore, perturbations of required enzymes can induce secondary regulation to stimulate other metabolic branches that can compensate for the changed flux. Given steady state data, these secondary regulation events might not be distinguishable from the regulation of the experimentally perturbed enzyme, or in some cases, secondary events might occlude the perturbation. Therefore, it is highly important to identify the causal chain of regulatory events that occur after a metabolic perturbation. The extended version of the MNS algorithm for sequential metabolomics data makes it possible to identify differences in the sequence of regulations, e.g., a temporal delay in time course experiments. These differences in the sequential order of regulations can reflect causality, but they can also arise without any causal dependency. For example, a drug can perturb the activity of several enzymes in parallel, but due to differences in the turnover rates of the enzymes, the regulations might be observed with time delays. In that case, the sequential order would suggest secondary effects, although all enzymes are regulated directly by the drug. Additional analyses are therefore required to increase certainty about causality in the chain of regulations. For example, dynamic simulations of detailed mechanistic models of biological subnetworks can indicate whether the predicted dependencies among regulations can explain the observed data. Furthermore, additional experiments with enzyme knockouts make it possible to investigate whether potential secondary regulations depend on primary ones.
Moreover, it is necessary to ensure that the sequential samples, for example dilution steps in a drug dilution series or the sampling frequency in time course experiments, have sufficient resolution to distinguish the regulatory events. Notably, modern sampling approaches reach a sampling frequency in the range of seconds. Although some metabolic regulations can occur even with delay times below seconds, these approaches meet the requirements for distinguishing the sequence of most metabolic regulatory events [11, 46].
Plans for further development of the algorithm
Due to the modularity of the underlying MRF models, the presented MNS algorithm can be extended and used for other applications. One problem in metabolomics research is the annotation of metabolites in non-targeted datasets. Recently, different network-based approaches have demonstrated that information on the metabolic environment improved the annotation of metabolites [18, 47, 48]. We think that our method might also be extended to include uncertainty in metabolite annotation. This extension would require that we add to the MRF models a third node-type, which represents the raw data features. The raw-data nodes would be connected to potential metabolites with a given probability, which would describe how likely the connection was between the raw-data feature and the metabolite. During the segmentation process, this probability could be continuously optimized to identify sites of regulation steps and to identify likely connections between metabolites and raw-data features.
Furthermore, we are currently working on methods for focusing on the extraction of co-regulated enzyme-metabolite modules from integrated ‘omics’ data. Recently, different subgraph extraction approaches have achieved great success in identifying areas in the metabolic network that showed changing activity [49–51]. These subgraphs have simplified the complex data structure, which has enabled manual interpretations within a network context. We aim to design an algorithm that can identify co-regulated metabolite-enzyme modules in metabolomics data integrated into metabolic models, together with multiple other ‘omics’ layers that might influence enzymatic activity; for example, data from transcriptomics, proteomics, or post-translational modification analyses. This new algorithm would provide a means to make direct inferences from the extracted modules to determine which type of regulation was most likely to cause changes in metabolic activity.
One further promising extension we are planning to implement is a multidimensional clustering approach for multivariate datasets. This approach would aim to form clusters of metabolite levels, datasets, and the metabolic network, in parallel. In contrast to current approaches (e.g., correlation networks [17, 52]), this algorithm would enable the simultaneous identification of metabolic modules of metabolites and their corresponding datasets. Due to the high metabolic heterogeneity in cancers [53–55], such a method would be highly advantageous for identifying similarly regulated metabolic branches in different cancer cell lines or tumor samples. This could facilitate tumor categorizations and improve the development of tumor-specific treatments that target the metabolism.
Conclusion
In its current state, the MNS method is broadly applicable to different metabolic setups, where it can facilitate automated interpretations of large-scale metabolomics data. Moreover, due to the modularity of the underlying MRF models, more advanced metabolomics data analysis approaches can be developed, based on the MNS algorithm. Thus, the presented MNS approach will enhance our understanding of metabolism and its interactions with cellular signaling and regulatory processes.
Materials and methods
Manual identification of perturbed reactions in E. coli enzyme mutants using visual integration
To obtain a gold standard for how well experts can identify the perturbed reactions from the metabolomics data of E. coli enzyme knockout and overexpression mutants, we extracted for each mutant a local network comprising metabolites and reactions within a maximal distance of three reactions around the perturbed reaction. We then visualized the log2(fold-changes) comparing metabolome data from knockout and wildtype strains on the individual metabolites using Cytoscape v3.0 [56]. Based on this subnetwork, two experts individually guessed by eye which of the reactions was perturbed, without knowing the enzyme and metabolite names. The datasets were then classified into three classes based on the distance between the manual guess and the genetically perturbed enzyme. The classes are exact (distance = 0), first neighbor (distance = 1) and not identifiable (distance > 1).
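The three-class scheme above can be illustrated with a minimal sketch. It assumes, hypothetically, that the local network is stored as a reaction adjacency map in which two reactions are neighbors when they share a metabolite; the toy network, the reaction names, and the function names are illustrative and are not taken from the MNS toolbox.

```python
from collections import deque

def reaction_distance(adjacency, start, target):
    """Breadth-first search distance (in reaction steps) between two reactions."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # target not reachable from start

def classify_guess(adjacency, guessed, perturbed):
    """Map the distance between guess and perturbed reaction to the three classes."""
    distance = reaction_distance(adjacency, guessed, perturbed)
    if distance == 0:
        return "exact"
    if distance == 1:
        return "first neighbor"
    return "not identifiable"  # distance > 1 or unreachable

# Hypothetical linear toy network R1 - R2 - R3 - R4
toy_network = {
    "R1": {"R2"},
    "R2": {"R1", "R3"},
    "R3": {"R2", "R4"},
    "R4": {"R3"},
}

print(classify_guess(toy_network, "R2", "R1"))
```

In this sketch an unreachable guess is treated like any guess with distance > 1; whether such cases occurred in the manual analysis is not stated in the text.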
Parameter settings of the univariate MRF model for the parameter optimization for the identification of sites of metabolic regulation
In the parameter optimization for the automatic inference of sites of metabolic regulation, we optimized three parameters: the number of hidden state labels, and the mean values and standard deviations of the observation potential functions. The number of hidden state labels was set to 3, 4, or 5. The hidden state label-dependent mean values were either set to the mean values of the clusters derived by k-means clustering with Euclidean distance (k-means) or set to values equally distributed between the 0.001 and 0.999 percentile of the input data to guarantee homogeneously distributed centers of the observation potential functions (quantile). The hidden state label-dependent standard deviation was set identically for each hidden state label, either to 1 (fix), to the average standard deviation of the k-means clusters (k-means), or to the standard deviation of the complete data set. In total, this resulted in 18 parameter combinations (S1 Table).
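As a rough illustration of this parameter grid, the following sketch enumerates the 3 × 2 × 3 = 18 combinations and computes the "quantile" option for the mean values. It assumes that the stated bounds are meant as the 0.001 and 0.999 quantiles and uses linear interpolation between data points; the option label "data" for the third standard deviation setting and all function names are hypothetical.

```python
from itertools import product

N_LABELS = (3, 4, 5)
MEAN_MODES = ("k-means", "quantile")
SD_MODES = ("fix", "k-means", "data")  # "data" is a hypothetical label

# All parameter combinations tested in the optimization
parameter_grid = list(product(N_LABELS, MEAN_MODES, SD_MODES))

def quantile(sorted_values, q):
    """Quantile by linear interpolation between neighboring sorted values."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac

def quantile_means(data, n_labels, lower=0.001, upper=0.999):
    """Centers equally spaced between the lower and upper quantile of the data."""
    values = sorted(data)
    lo, hi = quantile(values, lower), quantile(values, upper)
    step = (hi - lo) / (n_labels - 1)
    return [lo + i * step for i in range(n_labels)]

print(len(parameter_grid))
print(quantile_means(range(1001), 3))
```

The k-means options would additionally require a clustering step, which is omitted here.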
To identify the best Markov random field model parameterization, we inferred for each parameter combination 62 known regulatory sites using metabolomics data from 62 E. coli enzyme knock-out and overexpression mutants. We determined significantly identified reactions by comparing the actual inferred rank of the perturbed reaction and its first neighbors with the rank distribution of the perturbed reaction and its first neighbors given 1000 permutations of the reaction labels, which yielded p-values for each perturbed reaction and each method. In this way, we determined how many known perturbed reactions were identified correctly.
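A label-permutation test of this kind can be sketched as follows. The sketch makes several assumptions that are not specified in the text: the test statistic is the best (lowest) rank within the target set, which holds either the perturbed reaction alone or its first neighbors; the p-value uses an add-one correction; and the function name and toy ranking are hypothetical.

```python
import random

def permutation_p_value(ranks, targets, n_permutations=1000, seed=0):
    """Estimate how often a random relabeling ranks the targets at least as well.

    ranks   -- dict mapping reaction label to rank (1 = most likely regulated)
    targets -- reaction labels to evaluate (perturbed reaction or its neighbors)
    """
    rng = random.Random(seed)
    labels = list(ranks)
    rank_values = [ranks[label] for label in labels]
    observed = min(ranks[t] for t in targets)
    hits = 0
    for _ in range(n_permutations):
        shuffled = rank_values[:]
        rng.shuffle(shuffled)  # permute the reaction labels
        permuted = dict(zip(labels, shuffled))
        if min(permuted[t] for t in targets) <= observed:
            hits += 1
    # Add-one correction (an assumption) avoids p-values of exactly zero
    return (hits + 1) / (n_permutations + 1)

# Hypothetical ranking of 100 reactions; R0 is ranked first, R99 last
toy_ranks = {f"R{i}": i + 1 for i in range(100)}
p_top = permutation_p_value(toy_ranks, {"R0"})
p_bottom = permutation_p_value(toy_ranks, {"R99"})
print(p_top, p_bottom)
```

A top-ranked perturbed reaction yields a small p-value, whereas a reaction ranked last yields a p-value of 1, because every permutation ranks it at least as well.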
Method comparison for the inference of sites of metabolic regulation
We compared the potential of our algorithm to identify sites of metabolic regulation with reporter reactions [16] and mass-action ratio [15]. We implemented the reporter reaction and the mass-action ratio as described previously, considered only the direct substrates and products of a reaction and replaced undetected metabolites by random sampling of not annotated peaks [15, 16]. For both methods all reactions in the model were ranked according to descending absolute z-scores (reporter reactions) or descending absolute mass-action ratio. For all methods including ours we determined significantly identified reactions by comparing the real rank of the perturbed reaction and its first neighbor reactions with the rank distribution of the perturbed reaction and its first neighbors given permutations of the reaction labels resulting in p-values for each perturbed reaction and each method. The p-values for reactions identified exactly or to the first neighbor were calculated individually. A reaction was considered significantly identified for p-values < 0.05 of the exact and/or first neighbor identification.
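The ranking and decision rule shared by all compared methods can be sketched in a few lines. The scores stand in for z-scores or mass-action ratios; the function names and example values are hypothetical, and the calculation of the scores themselves follows the cited publications and is not reproduced here.

```python
def rank_reactions(scores):
    """Rank reactions by descending absolute score (rank 1 = largest |score|)."""
    ordered = sorted(scores, key=lambda reaction: abs(scores[reaction]), reverse=True)
    return {reaction: position + 1 for position, reaction in enumerate(ordered)}

def significantly_identified(p_exact, p_first_neighbor, alpha=0.05):
    """A reaction counts as identified if either p-value falls below alpha."""
    return p_exact < alpha or p_first_neighbor < alpha

# Hypothetical scores for three reactions
example_ranks = rank_reactions({"R1": -3.2, "R2": 0.5, "R3": 2.1})
print(example_ranks)
print(significantly_identified(0.20, 0.01))
```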
Strains and overnight pre-culture preparation
All E. coli knock-out strains used were part of the KEIO knock-out library [57], and all overexpression strains used were part of the ASKA overexpression library [58]. KEIO strains were compared against E. coli K12 BW25113, denoted as KEIO WT. Induced ASKA strains were compared against non-induced ASKA strains, denoted as ASKA WT, or against the average over all induced ASKA strains per plate, denoted as others. For overnight precultures, strains were inoculated from LB rich medium agar plates (10 g/L Bacto peptone, 5 g/L Bacto yeast extract, 5 g/L NaCl, with additional 15 g/L agar-agar for solidification) into 5 mL of LB liquid medium in 15 mL culture tubes and incubated for 16 h at 37°C and 300 rpm. LB medium for KEIO strains was supplemented with 50 μg/mL kanamycin as resistance marker. KEIO WT was cultured in pure LB medium. ASKA strains were supplemented with 20 μg/mL chloramphenicol as resistance marker. Overexpression in ASKA strains was induced by adding 100 μg/mL IPTG. Overnight preculture preparation was conducted identically for all experiments.
Growth phenotyping of enzyme mutant strains
To define optical density at 600 nm (OD) values corresponding to mid-log phase for KEIO and ASKA strains, growth experiments were conducted. KEIO and ASKA strains were inoculated at OD 0.05 from overnight precultures and subsequently cultured in 1 mL M9 medium (7.52 g Na2HPO4·2H2O, 3 g KH2PO4, 0.5 g NaCl, 2.5 g (NH4)2SO4, 14.7 mg CaCl2·2H2O, 246.5 mg MgSO4·7H2O, 16.2 mg Fe(III)Cl3·6H2O, 180 μg ZnSO4·7H2O, 120 μg CuCl2·2H2O, 120 μg MnSO4·H2O, 180 μg CoCl2·6H2O, 1 mg thiamine hydrochloride per liter of deionized water) supplemented with 4 g/L glucose and 2 g/L N-Z casein hydrolysate in 96-deep-well plates at 37°C and 300 rpm. To induce overexpression in ASKA strains, the medium was supplemented with 100 μg/mL IPTG; all other strains were cultured in unsupplemented medium. For each mutant, four technical replicates were used. OD was measured at intervals of 45 min for a total of 10 time points using a TECAN Sunrise spectrophotometer. Based on the observed growth rates, strains for metabolomics experiments were selected (S2 Table).
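The text does not state how growth rates were derived from the OD readings. One common approach, shown here purely as an assumption, is a least-squares fit of ln(OD) against time over the exponential phase. The synthetic data below mimic 10 readings at 45 min intervals starting at OD 0.05; the growth rate of 0.6 per hour and the function name are hypothetical.

```python
import math

def growth_rate(times_h, od_values):
    """Slope of ln(OD) versus time (per hour) from an ordinary least-squares fit."""
    log_od = [math.log(od) for od in od_values]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(log_od) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, log_od))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    return sxy / sxx

# Synthetic example: 10 time points, 45 min (0.75 h) apart, purely exponential growth
times = [i * 0.75 for i in range(10)]
ods = [0.05 * math.exp(0.6 * t) for t in times]
print(growth_rate(times, ods))
```

With real data, the fit would have to be restricted to the time points that lie within the exponential phase.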
Extraction of the E. coli enzyme knockout and overexpression metabolome
Strains for metabolomics measurements were prepared identically to those for the growth experiments of enzyme mutant strains. In total, 42 KEIO enzyme knockout strains plus one KEIO WT per plate, and 20 ASKA strains plus three non-induced ASKA WT strains per plate, were inoculated. Each strain was grown to mid-exponential phase, i.e., OD 1.60 for KEIO strains and OD 0.80 for ASKA strains. Harvested cells were centrifuged at 4000 rpm for 10 min at 0°C. Cell pellets were immediately extracted with 150 μL preheated ddH2O for 10 min at 80°C. The extraction broth was centrifuged for 10 min at 4000 rpm and 0°C. The supernatants were diluted 1:5 in ddH2O for non-targeted metabolomics.
Generation of crude protein extracts for enzyme assays
For the generation of crude protein extracts, ASKA MetR induced and non-induced strains were inoculated in 1:25 dilutions and KEIO WT and metR KO strains in 1:50 dilutions in M9 medium with N-Z casein plus 2 g/L, for induced strains additionally supplemented with 100 μg/mL IPTG, and grown to a final OD of 0.6 (ASKA) or 1.2 (KEIO). The cell suspensions were transferred into 50 mL Falcon tubes and centrifuged at 4000 rpm for 10 min at 0°C. For protein extraction, supernatants were removed and pellets were resuspended in 3.5 mL ice-cold Tris-HCl buffer (pH 7.5, 5 mM MgCl2, 2 mM DTT and 4 mM PMSF). Cells were subsequently lysed with a French press to obtain crude cell lysates. Enzyme concentrations were normalized using a colorimetric Bradford assay [59]. Extracts from KEIO strains were normalized to a total protein concentration of 2.69 mg/mL and extracts from ASKA strains to a total protein concentration of 1.47 mg/mL.
CyaA and CpdB enzyme assays
To determine the enzyme activity of CyaA and CpdB as a function of MetR activity, enzyme assays with the crude protein extracts were conducted with a final concentration of 1 mM or 10 mM of AMP, cAMP or ATP as substrates. Enzyme assays were conducted at 37°C with two biological replicates. For each assay, 100 μL of pre-warmed crude cell extract was incubated with 50 μL of substrate. To study the enzymatic reaction dynamics, 10 μL samples were taken at various time points (15 s, 45 s, 75 s, 120 s, 180 s, 300 s, 450 s, 600 s, 900 s, 1500 s, 2100 s, 2700 s) and immediately quenched in ice-cold 100% methanol. To remove precipitated protein, samples were centrifuged at 4000 rpm at 0°C for 10 min, and reactant concentrations in supernatants were measured by non-targeted metabolomics using a previously published method [43].
Non-targeted metabolomics by flow injection—Time of flight mass spectrometry
Non-targeted analysis of metabolite extracts was performed by flow injection-time-of-flight mass spectrometry on an Agilent 6550 ion funnel QTOF instrument (Agilent, Santa Clara, CA) in negative mode at 4 GHz, high resolution, in an m/z range of 50–1000, as described previously [43]. A 60:40 mixture of isopropanol:water supplemented with NH4F at pH 9.0 was used, together with 10 nM hexakis(1H, 1H, 3H-tetrafluoropropoxy)phosphazine and 80 nM taurocholic acid for online mass calibration. Ions were annotated to metabolites based on exact mass, considering [M-H]- and [M+F]- and 0.001 Da mass accuracy, using the KEGG eco database [24]. All metabolomics data analysis was performed using Matlab 2014b (The Mathworks, Natick, MA).
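The exact-mass annotation step can be sketched as a simple tolerance match. The analyses in this study were performed in Matlab; the Python sketch below is only an illustration and not the authors' implementation. The three-entry metabolite table with rounded monoisotopic masses is a hypothetical stand-in for the KEGG eco database, and the function name is illustrative.

```python
PROTON_MASS = 1.007276     # Da
FLUORINE_MASS = 18.998403  # Da, monoisotopic mass of the F atom
ELECTRON_MASS = 0.000549   # Da

# m/z shift relative to the neutral monoisotopic mass for each considered ion
ADDUCT_SHIFTS = {
    "[M-H]-": -PROTON_MASS,
    "[M+F]-": FLUORINE_MASS + ELECTRON_MASS,
}

# Hypothetical mini-database of neutral monoisotopic masses (Da)
METABOLITES = {
    "glucose": 180.06339,
    "citrate": 192.02700,
    "isocitrate": 192.02700,
}

def annotate(mz, metabolites=METABOLITES, tolerance=0.001):
    """Return all (metabolite, adduct) pairs within the mass tolerance of mz."""
    hits = []
    for name, mass in metabolites.items():
        for adduct, shift in ADDUCT_SHIFTS.items():
            if abs(mz - (mass + shift)) <= tolerance:
                hits.append((name, adduct))
    return hits

print(annotate(191.0197))
print(annotate(199.0623))
```

Because citrate and isocitrate share the same mass, the first query returns both, which illustrates why isomers cannot be distinguished by this type of annotation.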
Supporting information
Acknowledgments
We thank Tobias Fuhrer for providing the metabolomics data for the KEIO knock-out library.
Data Availability
The MNS toolbox is available at http://www.imsb.ethz.ch/research/zamboni/resources.html or https://github.com/kuehnean/MNS_toolbox/ under GPLv3 license. Source data and code to reproduce the analyses included in the paper are available at http://www.imsb.ethz.ch/research/zamboni/resources.html.
Funding Statement
This work was partially funded by SystemsX.ch (IPhD grant to AK; http://www.systemsx.ch/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Hsu PP, Sabatini DM. Cancer cell metabolism: Warburg and beyond. Cell. 2008;134(5):703–7. doi: 10.1016/j.cell.2008.08.021
- 2. DeBerardinis RJ, Thompson CB. Cellular metabolism and disease: what do metabolic outliers teach us? Cell. 2012;148(6):1132–44. doi: 10.1016/j.cell.2012.02.032
- 3. McKnight SL. On getting there from here. Science (Washington). 2010;330(6009):1338–9.
- 4. Patti GJ, Yanes O, Siuzdak G. Innovation: Metabolomics: the apogee of the omics trilogy. Nature reviews Molecular cell biology. 2012;13(4):263–9. doi: 10.1038/nrm3314
- 5. Gerosa L, Sauer U. Regulation and control of metabolic fluxes in microbes. Curr Opin Biotechnol. 2011;22(4):566–75. doi: 10.1016/j.copbio.2011.04.016
- 6. Sauer U. Metabolic networks in motion: 13C-based flux analysis. Molecular systems biology. 2006;2:62. doi: 10.1038/msb4100109
- 7. Booth SC, Weljie AM, Turner RJ. Computational tools for the secondary analysis of metabolomics experiments. Computational and structural biotechnology journal. 2013;4:e201301003. doi: 10.5936/csbj.201301003
- 8. Link H, Christodoulou D, Sauer U. Advancing metabolic models with kinetic information. Current opinion in biotechnology. 2014;29:8–14. doi: 10.1016/j.copbio.2014.01.015
- 9. Khodayari A, Zomorrodi AR, Liao JC, Maranas CD. A kinetic model of Escherichia coli core metabolism satisfying multiple sets of mutant flux data. Metabolic engineering. 2014;25:50–62. doi: 10.1016/j.ymben.2014.05.014
- 10. Khodayari A, Maranas CD. A genome-scale Escherichia coli kinetic metabolic model k-ecoli457 satisfying flux data for multiple mutant strains. Nat Commun. 2016;7:13806. doi: 10.1038/ncomms13806
- 11. Link H, Kochanowski K, Sauer U. Systematic identification of allosteric protein-metabolite interactions that control enzyme activity in vivo. Nature biotechnology. 2013;31(4):357–61. doi: 10.1038/nbt.2489
- 12. Zampar GG, Kümmel A, Ewald J, Jol S, Niebel B, Picotti P, et al. Temporal system‐level organization of the switch from glycolytic to gluconeogenic operation in yeast. Molecular systems biology. 2013;9(1):651.
- 13. Noguchi R, Kubota H, Yugi K, Toyoshima Y, Komori Y, Soga T, et al. The selective control of glycolysis, gluconeogenesis and glycogenesis by temporal insulin patterns. Molecular systems biology. 2013;9(1):664.
- 14. Fuhrer T, Zamboni N. High-throughput discovery metabolomics. Curr Opin Biotechnol. 2015;31:73–8. doi: 10.1016/j.copbio.2014.08.006
- 15. Ewald JC, Matt T, Zamboni N. The integrated response of primary metabolites to gene deletions and the environment. Mol Biosyst. 2013;9(3):440–6. doi: 10.1039/c2mb25423a
- 16. Cakir T, Patil KR, Onsan Z, Ulgen KO, Kirdar B, Nielsen J. Integration of metabolome data with metabolic networks reveals reporter reactions. Molecular systems biology. 2006;2:50. Epub 2006/10/04. doi: 10.1038/msb4100085
- 17. Steuer R, Kurths J, Fiehn O, Weckwerth W. Observing and interpreting correlations in metabolomic networks. Bioinformatics. 2003;19(8):1019–26.
- 18. Li S, Park Y, Duraisingham S, Strobel FH, Khan N, Soltow QA, et al. Predicting network activity from high throughput metabolomics. PLoS Comput Biol. 2013;9(7):e1003123. doi: 10.1371/journal.pcbi.1003123
- 19. Kotze HL, Armitage EG, Sharkey KJ, Allwood JW, Dunn WB, Williams KJ, et al. A novel untargeted metabolomics correlation-based network analysis incorporating human metabolic reconstructions. BMC systems biology. 2013;7:107. doi: 10.1186/1752-0509-7-107
- 20. Brush SG. History of the Lenz-Ising model. Reviews of modern physics. 1967;39(4):883.
- 21. Bishop CM. Graphical Models Pattern Recognition and Machine Learning: Springer; 2006.
- 22. Sutton C. An Introduction to Conditional Random Fields. Foundations and Trends® in Machine Learning. 2012;4(4):267–373. doi: 10.1561/2200000013
- 23. Posma JM, Robinette SL, Holmes E, Nicholson JK. MetaboNetworks, an interactive Matlab-based toolbox for creating, customizing and exploring sub-networks from KEGG. Bioinformatics. 2014;30(6):893–5. doi: 10.1093/bioinformatics/btt612
- 24. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research. 2000;28(1):27–30.
- 25. Andres B, Koethe U, Kroeger T, Helmstaedter M, Briggman KL, Denk W, et al. 3D segmentation of SBFSEM images of neuropil by a graphical model over supervoxel boundaries. Med Image Anal. 2012;16(4):796–805. doi: 10.1016/j.media.2011.11.004
- 26. Andres B, Beier T, Kappes JH. OpenGM: A C++ Library for Discrete Graphical Models. ArXiv e-prints. 2012;1206.0111.
- 27. Beucher S, Lantuéjoul C, editors. Use of watersheds in contour detection. International workshop on image processing, real-time edge and motion detection; 1979.
- 28. Heskes T, Eisinga R, Breitling R. A fast algorithm for determining bounds and accurate approximate p-values of the rank product statistic for replicate experiments. BMC bioinformatics. 2014;15:367. doi: 10.1186/s12859-014-0367-1
- 29. Eduati F, Mangravite LM, Wang T, Tang H, Bare JC, Huang R, et al. Prediction of human population responses to toxic compounds by a collaborative competition. Nature biotechnology. 2015;33(9):933–40. doi: 10.1038/nbt.3299
- 30. Costello JC, Heiser LM, Georgii E, Gonen M, Menden MP, Wang NJ, et al. A community effort to assess and improve drug sensitivity prediction algorithms. Nature biotechnology. 2014;32(12):1202–12. doi: 10.1038/nbt.2877
- 31. Marbach D, Costello JC, Kuffner R, Vega NM, Prill RJ, Camacho DM, et al. Wisdom of crowds for robust gene network inference. Nat Methods. 2012;9(8):796–804. doi: 10.1038/nmeth.2016
- 32. Meyer P, Cokelaer T, Chandran D, Kim KH, Loh PR, Tucker G, et al. Network topology and parameter estimation: from experimental design methods to gene regulatory network kinetics using a community based approach. BMC systems biology. 2014;8:13. doi: 10.1186/1752-0509-8-13
- 33. Fuhrer T, Zampieri M, Sévin DC, Sauer U, Zamboni N. Genome-wide landscape of gene-metabolome associations in Escherichia coli. 2016.
- 34. Gama-Castro S, Salgado H, Peralta-Gil M, Santos-Zavaleta A, Muniz-Rascado L, Solano-Lira H, et al. RegulonDB version 7.0: transcriptional regulation of Escherichia coli K-12 integrated within genetic sensory response units (Gensor Units). Nucleic acids research. 2011;39(Database issue):D98–105. doi: 10.1093/nar/gkq1110
- 35. Maas WK. The arginine repressor of Escherichia coli. Microbiological reviews. 1994;58(4):631–40.
- 36. Cai XY, Maxon ME, Redfield B, Glass R, Brot N, Weissbach H. Methionine synthesis in Escherichia coli: effect of the MetR protein on metE and metH expression. Proceedings of the National Academy of Sciences of the United States of America. 1989;86(12):4407–11.
- 37. Liu J, Beacham IR. Transcription and regulation of the cpdB gene in Escherichia coli K12 and Salmonella typhimurium LT2: evidence for modulation of constitutive promoters by cyclic AMP-CRP complex. Molecular & general genetics: MGG. 1990;222(1):161–5.
- 38. Mori K, Aiba H. Evidence for negative control of cya transcription by cAMP and cAMP receptor protein in intact Escherichia coli cells. The Journal of biological chemistry. 1985;260(27):14838–43.
- 39. Kuehne A, Emmert H, Soehle J, Winnefeld M, Fischer F, Wenck H, et al. Acute Activation of Oxidative Pentose Phosphate Pathway as First-Line Response to Oxidative Stress in Human Skin Cells. Mol Cell. 2015;59(3):359–71. doi: 10.1016/j.molcel.2015.06.017
- 40. Tretter L, Adam-Vizi V. Inhibition of Krebs cycle enzymes by hydrogen peroxide: A key role of [alpha]-ketoglutarate dehydrogenase in limiting NADH production under oxidative stress. The Journal of neuroscience: the official journal of the Society for Neuroscience. 2000;20(24):8972–9.
- 41. Tretter L, Adam-Vizi V. Alpha-ketoglutarate dehydrogenase: a target and generator of oxidative stress. Philosophical transactions of the Royal Society of London Series B, Biological sciences. 2005;360(1464):2335–45. doi: 10.1098/rstb.2005.1764
- 42. Shi Q, Gibson GE. Up-regulation of the mitochondrial malate dehydrogenase by oxidative stress is mediated by miR-743a. Journal of neurochemistry. 2011;118(3):440–8. doi: 10.1111/j.1471-4159.2011.07333.x
- 43. Fuhrer T, Heer D, Begemann B, Zamboni N. High-throughput, accurate mass metabolome profiling of cellular extracts by flow injection-time-of-flight mass spectrometry. Analytical chemistry. 2011;83(18):7074–80. Epub 2011/08/13. doi: 10.1021/ac201267k
- 44. Ralser M, Wamelink MM, Kowald A, Gerisch B, Heeren G, Struys EA, et al. Dynamic rerouting of the carbohydrate flux is key to counteracting oxidative stress. J Biol. 2007;6(4):10. doi: 10.1186/jbiol61
- 45. Ralser M, Wamelink MM, Latkolik S, Jansen EE, Lehrach H, Jakobs C. Metabolic reconfiguration precedes transcriptional regulation in the antioxidant response. Nat Biotechnol. 2009;27(7):604–5. doi: 10.1038/nbt0709-604
- 46. Link H, Fuhrer T, Gerosa L, Zamboni N, Sauer U. Real-time metabolome profiling of the metabolic switch between starvation and growth. Nature Methods. 2015.
- 47. Pirhaji L, Milani P, Leidl M, Curran T, Avila-Pacheco J, Clish CB, et al. Revealing disease-associated pathways by network integration of untargeted metabolomics. Nat Methods. 2016;13(9):770–6. doi: 10.1038/nmeth.3940
- 48. Krumsiek J, Suhre K, Evans AM, Mitchell MW, Mohney RP, Milburn MV, et al. Mining the unknown: a systems approach to metabolite identification combining genetic and metabolic information. PLoS Genet. 2012;8(10):e1003005. doi: 10.1371/journal.pgen.1003005
- 49. Jha AK, Huang SC, Sergushichev A, Lampropoulou V, Ivanova Y, Loginicheva E, et al. Network integration of parallel metabolic and transcriptional data reveals metabolic modules that regulate macrophage polarization. Immunity. 2015;42(3):419–30. doi: 10.1016/j.immuni.2015.02.005
- 50. Beisser D, Grohme MA, Kopka J, Frohme M, Schill RO, Hengherr S, et al. Integrated pathway modules using time-course metabolic profiles and EST data from Milnesium tardigradum. BMC Syst Biol. 2012;6:72. doi: 10.1186/1752-0509-6-72
- 51. Zhu J, Sova P, Xu Q, Dombek KM, Xu EY, Vu H, et al. Stitching together multiple data dimensions reveals interacting metabolomic and transcriptomic networks that modulate cell regulation. PLoS Biol. 2012;10(4):e1001301. doi: 10.1371/journal.pbio.1001301
- 52. Bartel J, Krumsiek J, Theis FJ. Statistical methods for the analysis of high-throughput metabolomics data. Comput Struct Biotechnol J. 2013;4:e201301009. doi: 10.5936/csbj.201301009
- 53. Hensley CT, Faubert B, Yuan Q, Lev-Cohain N, Jin E, Kim J, et al. Metabolic Heterogeneity in Human Lung Tumors. Cell. 2016;164(4):681–94. doi: 10.1016/j.cell.2015.12.034
- 54. Robertson-Tessi M, Gillies RJ, Gatenby RA, Anderson AR. Impact of metabolic heterogeneity on tumor growth, invasion, and treatment outcomes. Cancer Res. 2015;75(8):1567–79. doi: 10.1158/0008-5472.CAN-14-1428
- 55. Sengupta D, Pratx G. Imaging metabolic heterogeneity in cancer. Mol Cancer. 2016;15(1):4. doi: 10.1186/s12943-015-0481-3
- 56. Lopes CT, Franz M, Kazi F, Donaldson SL, Morris Q, Bader GD. Cytoscape Web: an interactive web-based network browser. Bioinformatics. 2010;26(18):2347–8. doi: 10.1093/bioinformatics/btq430
- 57. Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Molecular systems biology. 2006;2:2006 0008. doi: 10.1038/msb4100050
- 58. Kitagawa M, Ara T, Arifuzzaman M, Ioka-Nakamichi T, Inamoto E, Toyonaga H, et al. Complete set of ORF clones of Escherichia coli ASKA library (a complete set of E. coli K-12 ORF archive): unique resources for biological research. DNA research: an international journal for rapid publication of reports on genes and genomes. 2005;12(5):291–9. doi: 10.1093/dnares/dsi012
- 59. Bradford MM. A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Analytical biochemistry. 1976;72:248–54.