Proceedings of the National Academy of Sciences of the United States of America. 2013 Aug 1;110(34):14006–14011. doi: 10.1073/pnas.1222569110

Heterogeneity in protein expression induces metabolic variability in a modeled Escherichia coli population

Piyush Labhsetwar a,1, John Andrew Cole b,1, Elijah Roberts c, Nathan D Price d, Zaida A Luthey-Schulten a,b,c,2
PMCID: PMC3752265  PMID: 23908403

Abstract

Stochastic gene expression can lead to phenotypic differences among cells even in isogenic populations growing under macroscopically identical conditions. Here, we apply flux balance analysis in investigating the effects of single-cell proteomics data on the metabolic behavior of an in silico Escherichia coli population. We use the latest metabolic reconstruction integrated with transcriptional regulatory data to model realistic cells growing in a glucose minimal medium under aerobic conditions. The modeled population exhibits a broad distribution of growth rates, and principal component analysis was used to identify well-defined subpopulations that differ in terms of their pathway use. The cells differentiate into slow-growing acetate-secreting cells and fast-growing CO2-secreting cells, and a large population growing at intermediate rates shifts from glycolysis to Entner–Doudoroff pathway use. Constraints imposed by integrating regulatory data have a large impact on NADH oxidizing pathway use within the cell. Finally, we find that stochasticity in the expression of only a few genes may be sufficient to capture most of the metabolic variability of the entire population.

Keywords: protein noise, pathway utilization, metabolism, aerobic regulation, systems biology


The stochastic nature of life imparts to each cell a certain uniqueness in the form of small—and sometimes not-so-small—deviations from mean behavior, manifested ultimately in measurable cell-to-cell variability. Stochasticity in gene expression in particular has been proposed to be an important factor in giving rise to rich phenotypic variability exhibited among clonal populations. Sources of intrinsic and extrinsic noise affect each cell differently so that each cell will in turn have its own unique set of protein copy numbers, and thereby its own physiological properties.

Gene expression noise has been studied at both the experimental and theoretical levels for several years (for a review, see ref. 1). Models of protein production and regulation have been developed that are both amenable to mathematical analysis and capable of describing a range of biologically relevant phenomena. In particular, several analyses converge on the gamma distribution in describing steady-state protein copy number distributions and their resulting effects on enzyme kinetics (2–5). A recent system-wide determination of protein and mRNA copy numbers in single Escherichia coli cells has shown that this distribution is an excellent fit to experimental measurement across a broad range of expression levels (6).

Although mathematical analysis performs admirably at the level of individual proteins and simple regulatory schemes, moving beyond this scope to larger reaction networks necessitates a different modeling paradigm. Dynamical simulations are capable of elucidating microscopic descriptions of cellular phenomena (7), but as yet are difficult to extend to large reaction networks due to incomplete knowledge of many reaction rate constants. Steady-state fluxes through a large system of reactions can be determined by flux balance analysis (FBA), which requires only knowledge of stoichiometry and reaction bounds, making it applicable to these larger systems (8–10) (for an excellent introduction to FBA, see ref. 11). Models of metabolism, transcription, and translation have been painstakingly developed and refined for E. coli and several other model organisms, and are capable of elucidating detailed descriptions of cellular behavior (12–14). Although these models contain imperfections such as dead ends stemming from insufficient information, they have nonetheless been shown to be highly predictive under a wide range of experimental conditions.
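For orientation, the optimization problem that FBA solves can be written in its standard textbook form. The notation below is an assumption introduced here and does not appear in the source: S is the stoichiometric matrix, v the vector of reaction fluxes, c a vector that selects the objective (here, the biomass reaction), and the lb and ub superscripts mark the lower and upper reaction bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\mathsf T} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\mathrm{lb}}_{j} \le v_{j} \le v^{\mathrm{ub}}_{j} \quad \text{for every reaction } j .
\end{aligned}
```

The steady-state condition S v = 0 encodes the stoichiometry, and the inequalities encode the reaction bounds, which are the only two kinds of input the text says FBA requires.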

FBA allows a large space of possible solutions. By imposing realistic constraints on reaction fluxes, this space can be pared down to a small subset that most accurately reflects the behavior of real cells. Several sophisticated techniques have been developed to constrain metabolic models using experimental data. Transcriptional microarray data, for example, have been used to build integrated metabolic and regulatory models to study cells in differing states of gene regulation (15–17). As systems-level models become more complete, an increasingly large amount of experimental data is required to parameterize them. A recent pioneering study describing an integrated multicomponent model of Mycoplasma genitalium incorporated over 1,900 experimentally determined parameters (18).

In the present work, the effects of gene expression noise on growth and metabolic pathway use in isogenic E. coli cells are studied by imposing flux constraints based on experimentally determined protein distributions onto a metabolic model. As summarized in Fig. 1, a population of 1 million cells is modeled, each of which is defined by independently sampling the copy number distributions of 352 metabolic proteins; FBA on an integrated regulatory and metabolic reconstruction then determines each cell’s metabolic behavior. A broad distribution of specific growth rates and a surprisingly rich set of metabolic phenotypes among the population are observed. This observed specific growth rate distribution can be almost completely characterized by approximately a dozen genes.

Fig. 1.

Protein copy number distributions were obtained from experimental data (6). Distributions are sampled to obtain a set of protein counts that define the state of a unique cell in the population of 1 million. This set of protein counts is used to impose constraints on fluxes in the integrated metabolic and regulatory reconstruction. FBA is used to obtain optimal specific growth rates for every in silico cell.
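The sampling step summarized in Fig. 1 can be illustrated with a minimal sketch. It assumes each protein's distribution is a gamma distribution specified by a mean and a variance; the protein names and numbers below are hypothetical placeholders, not values from ref. 6, and the samples are left as continuous values rather than rounded to integers.

```python
import random

def gamma_params(mean, variance):
    """Shape a and scale b of a gamma distribution with the given mean and variance.

    For a gamma distribution, mean = a * b and variance = a * b**2.
    """
    return mean * mean / variance, variance / mean

def sample_cell(distributions, rng):
    """Draw one copy number per protein.

    `distributions` maps a protein name to a (mean, variance) pair.
    """
    cell = {}
    for name, (mean, variance) in distributions.items():
        a, b = gamma_params(mean, variance)
        cell[name] = rng.gammavariate(a, b)  # second argument is the scale
    return cell

# Hypothetical proteins and moments, for illustration only.
dists = {"enzA": (200.0, 8000.0), "enzB": (12.0, 60.0)}
rng = random.Random(0)
population = [sample_cell(dists, rng) for _ in range(20000)]
```

Each dictionary in `population` plays the role of one in silico cell; in the workflow described above, its entries would next be converted into flux bounds.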

Model

The sampled enzyme copy numbers are used to impose flux constraints on all associated reactions in the E. coli metabolic reconstruction iJO1366 (Fig. S1). This reconstruction is the latest and most comprehensive to date; it contains roughly 1,100 metabolites involved in over 2,200 reactions catalyzed by the products of 1,366 genes (12). Assuming Michaelis–Menten kinetics, the maximum enzyme-catalyzed reaction rate is given by $V_{\max} = K_{cat}[E]$, where Kcat and [E] are the enzyme turnover rate and concentration, respectively. The copy numbers reported in ref. 6 were normalized to an average cell volume; sampling out of each distribution therefore yields a copy number for a cell of average size. Because FBA is independent of cell size (e.g., all inputs and outputs are in units per gram dry weight), each sampled copy number must also be used in a size-independent manner. Normalizing the enzyme copy numbers by the average cell dry weight (19) accomplishes this and avoids complicating cell size effects.

Of the 1,018 proteins measured (6), 389 catalyze reactions in the metabolic network. Several of these enzymes were reported with unrealistically low copy numbers, possibly due to labeling difficulties. An example is the ε subunit of ATP synthase, which was counted on average less than once in every six cells, whereas other subunits were measured in the hundreds. For this reason, enzymes counted on average less than once per cell are considered uncounted, leaving a total of 352 proteins to be used in setting metabolic constraints. Copy numbers sampled from the measured distributions are paired with the most appropriate Kcat value from the BRENDA database (20) to set upper bounds on metabolic reaction fluxes (Dataset S1). This approach has been used in the past, perhaps most notably in ref. 18. For a reaction catalyzed by a single protein, the product of the sampled copy number and its associated Kcat is imposed as the upper bound on the reaction flux. In the event that multiple proteins form a complex catalyzing a given reaction, the product of the lowest sampled copy number and its associated Kcat is used to set the reaction bound. In cases where multiple proteins can catalyze the same reaction independently and all proteins have known copy number distributions, the sum of the products of the sampled copy numbers and Kcat values is used to set the reaction bound. In this case, if any of the copy number distributions are unknown, no constraint is applied. This basic paradigm can be scaled up straightforwardly for more complex enzyme–reaction relationships.
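The three bounding rules above (single enzyme, complex, and independent isozymes) can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the function names and data layout are hypothetical, the unit conversion to mmol per gram dry weight per hour is the conventional FBA unit, and the cell dry weight is passed in as a parameter because its value is not reproduced in this text. Treating a complex with an unmeasured subunit as unconstrained is also an assumption; the source states that rule only for isozymes.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def to_flux_bound(copies, kcat_per_s, cell_dry_weight_g):
    """Convert copies per cell and a turnover rate (s^-1) into mmol gDW^-1 h^-1."""
    mol_per_hour = copies * kcat_per_s * 3600.0 / AVOGADRO
    return mol_per_hour * 1000.0 / cell_dry_weight_g

def reaction_upper_bound(enzymes, copies, kcat, cell_dry_weight_g):
    """Upper flux bound for one reaction, or None if no constraint applies.

    `enzymes` is a list of alternatives (isozymes); each alternative is a
    list of subunit names that together form one complex.
    """
    if not enzymes:
        return None
    total = 0.0
    for subunits in enzymes:
        # Any unmeasured protein leaves the reaction unconstrained
        # (assumed here for complexes as well as isozymes).
        if any(s not in copies or s not in kcat for s in subunits):
            return None
        # Complex: the least abundant subunit sets the capacity.
        limiting = min(subunits, key=lambda s: copies[s])
        # Isozymes: capacities of independent alternatives add up.
        total += to_flux_bound(copies[limiting], kcat[limiting], cell_dry_weight_g)
    return total
```

A single-enzyme reaction is the special case of one alternative with one subunit, for example `reaction_upper_bound([["A"]], copies, kcat, w)`.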

In the absence of regulatory information, FBA can erroneously predict flux through reactions that are down-regulated under aerobic conditions. By incorporating transcriptional regulatory data, flux through reactions catalyzed by genes known to be strongly down-regulated under aerobic growth conditions can be prevented. A conservative approach was pursued whereby the upper and lower bounds on reactions that depend on genes known to be at least fourfold down-regulated under aerobic conditions were set to zero. In all, reactions catalyzed by 31 gene products were prevented from carrying flux under this criterion. Significant consistency between the set of strongly down-regulated proteins and protein count data (6) was observed. The set of measured protein counts was found to be correlated with strongly regulated genes (strongly down-regulated genes tend to be counted in smaller numbers, whereas strongly up-regulated genes tend to be counted in larger numbers), and a statistically significant number of strongly down-regulated genes could not be counted at all (SI Text, section 1.1, Fig. S2, and Table S1). Because transcriptional regulation results from chemical reaction networks of varying size and complexity, which should be subject to stochastic variability, drawing a distinction between aerobic and anaerobic states at the single-cell level is somewhat problematic. The high threshold chosen for disallowing flux through a reaction is meant to ensure that only strongly regulated enzymes are prevented from carrying flux, and the effects of copy number variability in the more moderately or ambiguously regulated enzymes may still be explored through sampling.
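The fourfold cutoff described above amounts to a simple filter on reaction bounds. The sketch below is a simplified illustration: the data layout, function name, and gene labels are hypothetical, and a reaction is treated as dependent on a gene whenever that gene appears in its gene list, which ignores the possibility of unregulated isozymes.

```python
def apply_aerobic_regulation(bounds, reaction_genes, fold_down, threshold=4.0):
    """Return a copy of `bounds` with strongly down-regulated reactions blocked.

    bounds:         reaction name -> (lower, upper)
    reaction_genes: reaction name -> list of gene names it depends on
    fold_down:      gene name -> fold down-regulation under aerobic conditions
    """
    regulated = dict(bounds)
    for rxn, genes in reaction_genes.items():
        # Genes without expression data are treated as unregulated (assumption).
        if any(fold_down.get(g, 1.0) >= threshold for g in genes):
            regulated[rxn] = (0.0, 0.0)  # both bounds set to zero
    return regulated

# Illustrative inputs: the 2.6-fold value for PFL is quoted later in the text;
# the 5.0-fold value and the gene labels are hypothetical.
bounds = {"FRD": (0.0, 1000.0), "PFL": (0.0, 1000.0)}
reaction_genes = {"FRD": ["frd_gene"], "PFL": ["pfl_gene"]}
fold_down = {"frd_gene": 5.0, "pfl_gene": 2.6}
regulated = apply_aerobic_regulation(bounds, reaction_genes, fold_down)
```

With these inputs the fumarate reductase reaction is blocked while PFL, which falls below the cutoff, keeps its original bounds and remains subject to sampling.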

The solution space of the metabolic network is explored using flux variability analysis (FVA). For a given set of growth conditions, the growth rate is held constant and the minimum and maximum fluxes possible through each metabolic reaction are calculated, giving the range of flux values that each reaction can carry. Large parts of the metabolic network—on the order of 650 reactions—are predicted by FVA to be incapable of carrying any flux (both their minimum and maximum flux values are zero). Roughly 200 of these reactions can be attributed to known gaps in the metabolic network, mainly due to missing reactions or metabolites upstream or downstream of a given pathway or reaction (21). Interestingly, however, 27 enzymes associated with the remaining zero-flux reactions are measured to have nonzero protein counts; nine of these have significant average copy numbers above 20 (Table S2). These nine enzymes are associated with the metabolism of alternate sugars like galactose, maltose, mannose, and fructose, indicating either residual expression from earlier growth in LB medium or that in vivo cells may maintain significant levels of these proteins regardless of their primary carbon source to quickly respond to changes in food availability.
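In the same assumed notation as the standard FBA problem (S the stoichiometric matrix, v the flux vector, c the objective vector selecting the biomass reaction), the FVA calculation described above can be written as a pair of linear programs per reaction. The symbol for the fixed growth rate, written here as a starred mu, is likewise an assumption introduced for illustration.

```latex
\begin{aligned}
v_{j}^{\min} &= \min_{v}\; v_{j}, \qquad v_{j}^{\max} = \max_{v}\; v_{j}, \\
\text{subject to}\quad & S\,v = 0, \qquad c^{\mathsf T} v = \mu^{*}, \qquad
v^{\mathrm{lb}}_{k} \le v_{k} \le v^{\mathrm{ub}}_{k} \quad \text{for every reaction } k .
\end{aligned}
```

A reaction is incapable of carrying flux in the sense used above when both of its optima are zero.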

Results and Discussion

Modeling the Metabolic Response to Gene Expression Noise.

Cells were observed growing across a range of specific growth rates spanning from 0 to 0.55 h−1 with mean near the bulk growth rate of 0.37 h−1 measured for cells growing in a glucose minimal medium (Fig. 2). The coefficient of variation for this distribution is 0.30. Data from a recent study on single-cell growth behavior in a rich medium confirm that the growth rate of individual E. coli varies both in time as they progress through the cell cycle and across the population from one cell to another (22). Cells in that study were measured growing at rates ranging from roughly 0.1 up to 1.5 h−1, with a peak in the vicinity of 0.6 h−1 and a coefficient of variation of roughly 0.42; they grew significantly faster than the cells used in the proteome study (6) to which our model is tuned (Fig. S3).

Fig. 2.

Distributions of specific growth rates predicted by uncorrelated protein sampling (blue), by imposing correlations of correlation coefficient 0.66 among proteins in the extrinsic noise regime (red), and by sampling only the 15 proteins whose copy numbers are most likely to constrain the growth of modeled cells (green). The vertical black line represents the experimentally determined bulk specific growth rate.

The slow-growth cells in our modeled population arise from the probabilistic nature of gene expression; proteins essential for growth are sampled low by chance, limiting these cells’ potential for growth. These cells grow too slowly for their glucose demand to reach the upper bound on glucose uptake. The fast-growing cells do reach this upper bound, and their growth is fundamentally limited by this constraint. In our model, the glucose uptake upper bound is constant across the population, which leads to decreased overall variability in growth rates among the fast-growing cells. This is reflected in the distinctive peak in the distribution at high growth rates. Higher upper bounds on the glucose uptake rate tend to spread the distribution to the right (toward higher growth rates), smoothing out this peak (Fig. S4). This behavior highlights the need for data on single-cell uptake rates to increase the accuracy of our predictions.

Variability in high copy number enzyme counts is known to be dominated by extrinsic fluctuations, and enzymes in this regime have been shown to exhibit considerable copy number correlation (6, 23, 24). In addition to the populations simulated through uncorrelated sampling of protein distributions, populations were also simulated through correlated sampling of enzymes with mean copy number greater than 10 (the approximate extrinsic noise limit; SI Text, section 1.2). Correlated protein sampling resulted in a shift of the growth rate distribution toward faster growth, i.e., shorter doubling times (Fig. 2 and Fig. S5B). Colocalization of genes within operons was investigated as another possible source of protein copy number correlation, but this resulted in no appreciable effect on the modeled population (Fig. S5A).
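One generic way to impose a common pairwise correlation is to give every protein a latent normal score built from a shared, cell-wide component and an independent component. The sketch below illustrates only that construction; it is an assumption introduced here, and the procedure actually used in this work (SI Text, section 1.2) may differ. Mapping the scores onto each measured copy number distribution, for example by matching quantiles, is left out.

```python
import math
import random

def correlated_normals(n_proteins, rho, rng):
    """Standard-normal scores for one cell with pairwise correlation rho."""
    shared = rng.gauss(0.0, 1.0)  # cell-wide (extrinsic-like) component
    return [
        math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        for _ in range(n_proteins)
    ]

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
cells = [correlated_normals(2, 0.66, rng) for _ in range(20000)]
r = pearson([c[0] for c in cells], [c[1] for c in cells])
```

Each score has unit variance and any two scores share covariance rho, so the empirical coefficient `r` should land close to the target of 0.66 for a large population.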

Principal Component Analysis of Flux Distributions.

Subpopulations whose pathway use differs from that of the population average can be distinguished as outliers along some unique direction in flux space. Principal component analysis (PCA) of sets of flux distributions coupled with varimax rotation has been used in the past to elucidate key ways in which metabolic pathway use can differ among modeled cells (25). PCA was performed on the growth rate-normalized flux distributions of a set of 1,000 cells uniformly sampled from across the range of growth rates. A rotation among the first 20 principal components (accounting for over 99.9% of the variability of our transcriptionally regulated aerobic cells) is then performed. Several rotation schemes were investigated including the well-known varimax, quartimax, equimax, and parsimax rotations, as well as a data-dependent rotation designed to find directions along which relatively small populations of cells extend away from a larger bulk behavior (see SI Text, section 1.3, for details). Among these, varimax, quartimax, and our own data-dependent scheme were found empirically to provide the most biologically meaningful basis rotations.

The first few components resulting from the three best rotations (varimax, quartimax, and our data-dependent scheme) show enhanced loadings on the acetate overflow pathway, cytochrome oxidases bo3 and bd, the glycolysis and the Entner–Doudoroff (ED) pathways, and ATP synthase and maintenance. The varimax and quartimax rotations both isolate malate oxidase among their first five basis vectors, and our own scheme picks out pyruvate-lactate redox cycling among its first five. Upon further investigation, no distinctive phenotypic behavior was observed for malate oxidase or ATP synthase use at any growth rate. The first five basis vectors returned by varimax and quartimax rotation account for around 84% of the total variability of the data, whereas our approach accounts for around 94% (Figs. S6–S9). The pathways highlighted by this analysis were investigated further to determine the roles that nutrient availability, gene expression, and regulation play in giving rise to population heterogeneity.

Acetogenesis and tricarboxylic acid cycling phenotypic differentiation.

Cells differentiate into either primarily acetate-secreting or CO2-secreting phenotypes. The shift in pathway use occurs abruptly at a growth rate near 0.38 h−1, the rate at which a cell’s glucose uptake rate tends to reach its upper bound (Fig. 3A). This differentiation is driven by a trade-off between two key forms of metabolic efficiency: enzyme use efficiency, and energy (ATP) production efficiency. Under the assumption of parsimony of enzyme use (see SI Text, section 1.4, for details), the model predicts acetate secretion to be greater than CO2 secretion among slow-growing cells. Although less energy efficient, the acetate pathway requires lower total enzyme-mediated flux than does the tricarboxylic acid (TCA) cycle, making acetate overflow optimal in this regime. In the case of the faster growing glucose-limited cells, the relative importance of enzyme use efficiency and energy production efficiency reverses; the model predicts cells in the glucose-limited regime favor use of the TCA cycle. These cells feel the effects of limited sugar availability twofold: with increasing growth rate comes an increasing ATP requirement to run the cell’s molecular machinery, while at the same time come increasing requirements for biomass “building blocks” like amino acids, lipids, etc., which require diverting glucose metabolic products from ATP generation. The fastest growing cells survive using increasingly less glucose for energy by being increasingly more efficient (see SI Text, section 1.5, and Fig. S10 for details on carbon economy).

Fig. 3.

Use of key pathways by 1,000 cells sampled uniformly across the entire range of predicted growth rates. (A) The population separates into slow-growing acetogenic and fast-growing TCA-cycling phenotypes. This shift occurs at the growth rate at which cells begin to reach the upper bound on the glucose uptake rate (Fig. S10). (B) Cells growing in the range of 0.2–0.5 h−1 show a tendency to use the cytochrome oxidase bd reaction. The sampled cytochrome bo3 counts impose flux constraints below what these cells require, forcing them to make up the difference via cytochrome bd. (C) Enolase enzyme counts impose constraints on the amount of glucose products that can be metabolized through glycolysis. Although glycolysis is preferable among slow-growing cells for its substrate-level ATP generation, the ED pathway quickly dominates sugar metabolism as it requires one-half the enolase flux that glycolysis does. (D) Shifts in cellular behavior also arise from imposed regulation. The model without regulation shows a slow-growth phenotype that uses PDH, whereas in the regulated model this behavior is strongly suppressed. A regulated model with PFL and LDH gene deletions shows that PDH is the dominant path from pyruvate to acetyl-CoA. All data were produced with correlated protein sampling.

The prediction of a subpopulation of slow-growing acetogenic cells under aerobic conditions is somewhat unexpected. Aerobic acetate secretion is associated with fast-growing cells in excess glucose, such as those in fed batch experiments (26). Because the modeled slow-growing subpopulation does not reach its glucose uptake upper bound, glucose availability is never a limiting factor for growth, and these cells therefore behave like cells in excess glucose. To test this prediction, the supernatant acetate concentration was measured in a batch culture of E. coli using the same strain and media used in the single-cell proteomics study. These cells, whose doubling times were measured at nearly 2 h, showed significant acetogenesis. Comparison with simulated acetogenesis by a colony sampled from our modeled population (Fig. 4; Figs. S11 and S12; and SI Text, sections 1.6 and 1.7) showed order of magnitude agreement overall.

Fig. 4.

Comparison of experimentally determined acetate concentration in batch culture supernatant with simulated acetate production by growing colonies of modeled cells. Experimental data are represented as ○, ×, and +. The colored lines represent simulated acetate produced by a population modeled with varying degrees of generational growth rate correlation. For these simulations, the modeled cells were generated with imposed protein count correlation of 0.66 among proteins in the extrinsic noise regime.

Cytochrome oxidase phenotypic differentiation.

Overall, cells tend to use the highly efficient cytochrome bo3 (Fig. 3B). For cells at low growth rates, this is the dominant pathway, as it is very rarely constrained by its upper bound. Cells at intermediate growth rates when glucose is still relatively abundant are more prone to reach the upper bound on this reaction, and begin to use the less efficient cytochrome bd pathway—which pumps one-half as many protons across the membrane per ubiquinone molecule—to maintain the necessary proton gradient. As partial utilization of cytochrome bd makes the overall energy production of a cell less efficient, the ability of cells to grow at higher growth rates by taking advantage of this pathway reaches an upper limit. Only those cells with an extremely high (>95th percentile) cytochrome bo3 count can attain the maximum theoretical growth rate given the constraints on the glucose uptake rate.

Glycolysis–ED phenotypic differentiation.

Subpopulations of both fast- and slow-growing cells exhibit distinct differences in their utilization of the glycolysis and the ED pathways. Cells growing at rates below roughly 0.1 h−1 tend to favor glycolysis, as do those growing very fast (at growth rates of roughly 0.55 h−1), whereas cells growing at intermediate rates tend to favor the ED pathway (Figs. 3C and 5). This twofold shift in pathway use across our population is unique among the pathways studied here. It arises from the interplay between enolase copy numbers—which represent a primary glycolytic bottleneck—and the difference between enolase flux required by the two alternative pathways. Slow-growing cells take up such a small amount of glucose that they can easily process its products via glycolysis without reaching the upper bound on enolase flux. Faster-growing cells need to move more glucose products through the enolase reaction step, but cannot do it entirely via glycolysis; instead the ED pathway, which requires one-half the enolase flux that glycolysis does, offers the cell an avenue to faster glucose metabolism at the cost of substrate-level ATP generation. The fastest-growing cells—already taxed by limited glucose availability and therefore requiring efficient substrate-level ATP generation—must rely again on glycolysis, and can only maintain a high growth rate by having a correspondingly high enolase copy number. A recent analysis of enzyme kinetics in the glycolysis and ED pathways suggests that the ED pathway may be significantly more favorable in terms of enzymatic protein requirement (27). This could explain why the enolase copy number distribution appears to have evolved toward requiring the use of both pathways rather than exclusive use of glycolysis.
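The enolase argument above can be made concrete with a small worked calculation. It is a deliberately simplified sketch: it assumes all glucose taken up is carried through to pyruvate (ignoring biomass drains and other branches), that glycolysis sends two units of flux through enolase per glucose while the ED pathway sends one, and it uses the textbook net substrate-level yields of 2 ATP per glucose for glycolysis and 1 for ED, which are not values taken from this work. The function names are hypothetical.

```python
def min_ed_fraction(glucose_uptake, enolase_cap):
    """Smallest fraction of glucose that must use ED to respect the enolase bound.

    Enolase flux is 2*g*(1 - f) + 1*g*f = g*(2 - f) for uptake g and ED
    fraction f. Returns None when even pure ED use (f = 1) exceeds the bound.
    """
    if glucose_uptake <= 0.0:
        return 0.0
    f = 2.0 - enolase_cap / glucose_uptake
    if f > 1.0:
        return None  # uptake cannot be processed at this enolase capacity
    return max(0.0, f)

def substrate_level_atp(glucose_uptake, ed_fraction):
    """Net substrate-level ATP flux, using 2 ATP (glycolysis) and 1 ATP (ED)."""
    return glucose_uptake * (2.0 * (1.0 - ed_fraction) + 1.0 * ed_fraction)
```

For example, with an uptake of 4 and an enolase bound of 6 (same flux units), half of the glucose must be routed through ED, and the substrate-level ATP flux drops from 8 to 6. A slow cell with an uptake of 2 and the same bound needs no ED flux at all, consistent with the preference for glycolysis at low growth rates.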

Fig. 5.

Example of differences in use between glycolysis and the ED pathway by representative cells in our modeled population. The ED pathway requires one-half as much enolase flux to metabolize the same amount of glucose as glycolysis, but at the cost of substrate-level ATP generation. Slow-growing cells tend to use glycolysis (A), whereas intermediate to fast-growing cells tend to use the ED pathway (B).

Regulation significantly impacts NADH oxidation behavior.

Comparison of simulations performed with and without transcriptional regulation shows differences in NADH oxidation and the use of pyruvate dehydrogenase (PDH). Without regulation, a well-defined phenotype of slow-growing cells using PDH was observed. These cells use a complex set of reactions to oxidize the NADH resulting from PDH, which involves the menaquinone and demethylmenaquinone plus 3H+ NADH dehydrogenases and tandem succinate–fumarate redox cycling (Table S3). Among the genes known to be strongly down-regulated under aerobic conditions are the menaquinone and demethylmenaquinone fumarate reductases that take part in this redox cycling. Because these are essential components of this NADH oxidizing machinery, this pathway is not used in our regulated model; rather, the regulated model predicts that PDH use is suppressed in favor of pyruvate formate lyase (PFL), which does not produce NADH. This suppression leads to an almost complete loss of the PDH phenotype under aerobic regulation (Fig. 3D).

Commensurate with the loss of the PDH phenotype, aerobic regulation leads to an increase in use of an NADH-oxidizing pyruvate–lactate redox cycle (Figs. S6E and S7). Without the NADH oxidizing pathways detailed above, regulated cells are predicted to run NADH-dependent lactate dehydrogenase backward to convert pyruvate to lactate, and then a ubiquinone-dependent lactate dehydrogenase to convert lactate back to pyruvate, in the process oxidizing NADH and reducing ubiquinone. Although this behavior did exist in the unregulated modeled population as a relatively small set of cells at intermediate to fast growth rates, the number of cells predicted to engage in this behavior is significantly expanded under regulation due to the lack of alternate avenues for NADH oxidation.

Interestingly, both PFL and LDH use are known to be associated with anaerobic growth (28, 29). The microarray data used in imposing regulatory constraints indicate that PFL is down-regulated a modest 2.6-fold (well below our cutoff for strong down-regulation) despite an average measured copy number in excess of 200 per cell. One explanation for this discrepancy is that PFL is known to be primarily down-regulated by O2 and PFL deactivase rather than transcriptionally. LDH is only very mildly regulated at the transcriptional level and could not be counted. A final model was created using the aerobically regulated model with both PFL and LDH flux disallowed. This model uses PDH only in producing acetyl-CoA, and oxidizes excess NADH via malate and oxaloacetate redox cycling rather than via lactate and pyruvate or succinate and fumarate cycling (Table S4). Microarray data show that the two malate dehydrogenases that catalyze this redox cycling (which are distinct from the malate oxidase isolated by varimax and quartimax PCA basis rotation) are over fourfold up-regulated under aerobic conditions (17). These examples highlight that realistic cell-scale models must draw on many different sources of data, including literature searches; in this case, transcriptional regulation alone is not sufficient to predict the realistic behavior of the cell.

A Few Genes Are Predicted to Account for Most of the Metabolic Variability.

Over 350 protein distributions were sampled to model each in silico cell, but many enzymes were found not to have an appreciable effect on metabolic behavior. The likelihood that a sampled enzyme count will impact metabolism in a modeled cell was investigated by studying whether a change in the copy number of that enzyme results in a significant change in the cell’s growth rate (SI Text, section 1.8). Only 28 of the sampled enzymes constrain the growth of at least one cell in a modeled population of 10,000, and of those, only 15 represent a constraint in more than 2% of the population (Fig. 6 and Fig. S13). For the sake of comparison, a parallel analysis using enzyme counts drawn from uniform distributions on the interval from 1 to 1,000 showed that 51 of the sampled enzymes constrain the growth of at least one cell, and 20 represent a constraint in more than 2% of the cells. Although by no means exhaustive, these results do hint that the particular enzyme copy number distributions observed in vivo may have evolved such that most enzymes do not hinder growth most of the time.
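The test for whether an enzyme constrains growth can be sketched as a perturbation loop around any growth predictor. Everything below is illustrative: `growth_fn` stands in for a full FBA calculation, the 10% perturbation and the tolerance are assumed values (the criterion actually used is described in SI Text, section 1.8), and the toy capacities are hypothetical numbers chosen only to make the example run.

```python
def limiting_enzymes(cell, growth_fn, factor=1.1, tol=1e-6):
    """Names of enzymes whose increased copy number raises the growth rate.

    cell:      enzyme name -> sampled copy number
    growth_fn: callable mapping such a dict to a growth rate
    """
    base = growth_fn(cell)
    limiting = []
    for name in cell:
        perturbed = dict(cell)
        perturbed[name] = cell[name] * factor  # relax one constraint at a time
        if growth_fn(perturbed) - base > tol:
            limiting.append(name)
    return limiting

def toy_growth(cell):
    """Toy stand-in for FBA: growth is capped by uptake and by two enzymes."""
    return min(0.55, cell["eno"] / 1000.0, cell["cyoC"] / 500.0)

slow_cell = {"eno": 300.0, "cyoC": 400.0, "other": 5.0}
fast_cell = {"eno": 800.0, "cyoC": 400.0, "other": 5.0}
```

Here `limiting_enzymes(slow_cell, toy_growth)` flags only the enolase stand-in, while `fast_cell` is capped by the fixed uptake ceiling and so no enzyme is flagged. Repeating the test over a sampled population and counting flags per enzyme gives the kind of tally shown in Fig. 6.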

Fig. 6.

Bar graph indicating the number of cells whose growth is directly limited by a given protein. Only 28 proteins sampled from the experimentally measured protein distributions (shown in red) limit the growth rate of at least one cell in a population of 10,000. For reference, over 50 proteins would be expected to limit the growth rate of at least one cell, had all enzyme counts been sampled from a uniform distribution from 1 to 1,000 (shown in blue).

A few enzymes were found to be especially likely to impact the metabolism of a modeled cell. FadB, FabD, FadJ, Ppk, Eno, and CyoC all had probabilities greater than 0.25 of being a direct limitation on cellular growth rate. Of these, FadB, FabD, and FadJ (all associated with lipid biosynthesis) were measured to have small mean copy numbers (2.0, 5.3, and 2.2 per cell, respectively) (6). Recent mass spectrometry studies have found that FadA—which is cotranscribed with FadB in the fadBA operon—is strongly expressed in the presence of oleic acid, but that it and FadJ were undetectable under glucose culturing conditions (30, 31), which supports the copy number data for FadB and FadJ. The same mass spectrometry studies, however, detected FabD (considered an essential enzyme) in significant numbers among cells growing on glucose. Further study of this enzyme’s expression may be necessary to resolve this discrepancy. Ppk (oxidative phosphorylation), Eno (glycolysis), and CyoC (oxidative phosphorylation) were all measured in significant numbers and had reasonable associated Kcat values (Table S5).

In general, a large portion of the overall cell-to-cell variability in metabolic behavior can be attained by sampling only the enzymes most likely to constrain cell growth. The growth rate distribution that results from sampling the 15 enzymes most likely to constrain growth shows outstanding agreement with the fully sampled population (Fig. 2 and Fig. S14). As a note of caution, the metabolic model remains a work in progress, and there exist inconsistencies in the data that make the identification of artificial “bottlenecks” (Methods) difficult. Nevertheless, the ability of so few proteins to characterize the steady-state behavior of the population under the given environmental conditions is an important result. This reduction suggests that future experiments may be able to focus on just a few enzymes and features of the network and kinetic parameters to capture the behavior of the entire population.

Methods

Computational Methods.

Implementations of FBA and parsimonious FBA from the freely available COBRA toolbox (9) were used, and the metabolic reconstruction iJO1366 is available as part of the supporting information of ref. 12. Details on these methods and model can be found in SI Text, sections 1.4 and 1.9.

Selecting Kcat Values.

A conservative approach in selecting the Kcat values to be imposed on our model was used to ensure that, where there are unknowns, systemic limitations result only from the experimental data that have been obtained. Because Kcat data exist for relatively few E. coli reactions in the BRENDA database (20), the highest value listed for each reaction, regardless of species or growth conditions, is used. In the event that no Kcat data are available for a given reaction, a high turnover rate of 20,000 s−1 is used, this being one of the highest turnover numbers listed in the BRENDA database for a wild-type enzyme.
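The selection rule above reduces to taking a maximum with a fallback. The sketch below assumes the reported turnover rates for a reaction have already been collected into a list; the function name and the example values are hypothetical, and no particular database interface is implied.

```python
DEFAULT_KCAT = 20000.0  # s^-1, the fallback turnover rate stated in the text

def select_kcat(reported_values):
    """Pick the turnover rate (s^-1) to impose on a reaction.

    Uses the highest reported value regardless of species or growth
    conditions, and the high default when nothing has been reported.
    """
    values = [v for v in reported_values if v is not None]
    return max(values) if values else DEFAULT_KCAT

kcat_known = select_kcat([12.5, 340.0, 88.0])  # hypothetical database entries
kcat_missing = select_kcat([])                 # no data: falls back to default
```

Choosing the largest available value keeps each bound as loose as the data allow, so that any limitation the model shows comes from the measured copy numbers rather than from an underestimated turnover rate.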

Experimental Protein Distributions, Doubling Time, and Modeled Uptake Rates.

The protein distributions and the method for sampling from them, the experimental growth rate and glucose uptake measurements, and the assignment of glucose, amino acid, and vitamin uptake rates for our model are described in SI Text, sections 1.6 and 1.10, Figs. S11 and S15, and Table S6.

Identifying Artificial Metabolic Bottlenecks.

Several flux constraints were found to limit growth to rates well below the experimental value. An iterative process was developed to identify and release these “bottlenecks.” At each iteration, a population of 40,000 cells is generated by protein sampling; if the resulting mean growth rate is not larger than 0.38 h−1 (corresponding to the growth rate measured by OD600; SI Text, section 1.6), then these population data are used to identify the protein with the highest correlation with growth. The turnover rate of this protein is first raised to 20,000 s−1; if it is already 20,000 s−1, or if a later round of sampling shows this protein to still be a bottleneck, then the constraint is released entirely. This process is repeated until the predicted growth rate resulting from the mean values of the remaining constraints matches the experimental value. In all, 20 turnover rates were raised and two constraints were lifted entirely (SI Text, section 1.11, and Table S7).
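The iteration described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure's actual code. The function `toy_growth` is a hypothetical stand-in for the FBA calculation: growth is capped by the weakest enzyme capacity. Copy numbers are assumed to be gamma distributed, all enzyme names and parameters are invented, and the stopping test uses the sampled population mean rather than the mean-value prediction mentioned in the text.

```python
import random

DEFAULT_KCAT = 20000.0  # s^-1, ceiling named in the text
TARGET_GROWTH = 0.38    # h^-1, growth rate measured by OD600

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def toy_growth(copies, kcats, mu_max=0.6, scale=1e-4):
    """Stand-in for FBA (assumption): the weakest enzyme capacity caps growth."""
    caps = [k * copies[e] * scale for e, k in kcats.items() if k is not None]
    return min([mu_max] + caps)

def release_bottlenecks(gamma_params, kcats, n_cells=2000, seed=0):
    rng = random.Random(seed)
    kcats = dict(kcats)  # work on a copy; None marks a released constraint
    raised, released = [], []
    while True:
        cells = [{e: rng.gammavariate(a, b) for e, (a, b) in gamma_params.items()}
                 for _ in range(n_cells)]
        growth = [toy_growth(c, kcats) for c in cells]
        mean_growth = sum(growth) / n_cells
        active = [e for e, k in kcats.items() if k is not None]
        if mean_growth > TARGET_GROWTH or not active:
            return kcats, raised, released, mean_growth
        # Enzyme whose copy number correlates most strongly with growth.
        worst = max(active, key=lambda e: pearson([c[e] for c in cells], growth))
        if kcats[worst] < DEFAULT_KCAT:
            kcats[worst] = DEFAULT_KCAT  # first attempt: raise the turnover rate
            raised.append(worst)
        else:
            kcats[worst] = None          # still limiting: release entirely
            released.append(worst)

# Hypothetical enzymes: (gamma shape, gamma scale) and initial Kcat values.
gamma_params = {"A": (0.5, 4.0), "B": (8.0, 50.0), "C": (4.0, 30.0)}
initial_kcats = {"A": 10.0, "B": 50.0, "C": 20000.0}
final_kcats, raised, released, mu = release_bottlenecks(gamma_params, initial_kcats)
```

Because every pass either raises one turnover rate to the ceiling or removes one constraint, the loop is guaranteed to terminate after a finite number of iterations.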

Example Script and Data.

An example MATLAB script for generating a population of 1,000 cells by protein copy number sampling, together with all of the required data, is freely available at www.scs.illinois.edu/schulten/software/index.html. A detailed description of the script can be found in SI Text, section 2, and example output is shown in Fig. S16.

Supplementary Material

Supporting Information

Acknowledgments

We thank Dr. C. M. Schroeder, Dr. W. W. Metcalf, Arnab Mukherjee, Dr. Nicolai Müeller, and Huiyi Chen for providing the materials, space, equipment, and advice that made our experiments possible. We also thank our reviewers, whose thoughtful suggestions greatly enhanced this work. This research was supported by the Office of Science (Office of Biological and Environmental Research), Department of Energy Grant DE-FG02-10ER6510 and National Science Foundation Grants MCB-08-44670 and MCB 12-44570.

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1222569110/-/DCSupplemental.

References

  • 1. Munsky B, Neuert G, van Oudenaarden A. Using gene expression noise to understand gene regulation. Science. 2012;336(6078):183–187. doi: 10.1126/science.1216379.
  • 2. Paulsson J, Ehrenberg M. Random signal fluctuations can reduce random fluctuations in regulated components of chemical regulatory networks. Phys Rev Lett. 2000;84(23):5447–5450. doi: 10.1103/PhysRevLett.84.5447.
  • 3. Friedman N, Cai L, Xie XS. Linking stochastic dynamics to population distribution: An analytical framework of gene expression. Phys Rev Lett. 2006;97(16):168302. doi: 10.1103/PhysRevLett.97.168302.
  • 4. Shahrezaei V, Swain PS. Analytical distributions for stochastic gene expression. Proc Natl Acad Sci USA. 2008;105(45):17256–17261. doi: 10.1073/pnas.0803850105.
  • 5. Kim PJ, Price ND. Macroscopic kinetic effect of cell-to-cell variation in biochemical reactions. Phys Rev Lett. 2010;104(14):148103. doi: 10.1103/PhysRevLett.104.148103.
  • 6. Taniguchi Y, et al. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science. 2010;329(5991):533–538. doi: 10.1126/science.1188308.
  • 7. Roberts E, Magis A, Ortiz JO, Baumeister W, Luthey-Schulten Z. Noise contributions in an inducible genetic switch: A whole-cell simulation study. PLoS Comput Biol. 2011;7(3):e1002010. doi: 10.1371/journal.pcbi.1002010.
  • 8. Lewis NE, Nagarajan H, Palsson BO. Constraining the metabolic genotype-phenotype relationship using a phylogeny of in silico methods. Nat Rev Microbiol. 2012;10(4):291–305. doi: 10.1038/nrmicro2737.
  • 9. Schellenberger J, et al. Quantitative prediction of cellular metabolism with constraint-based models: The COBRA Toolbox v2.0. Nat Protoc. 2011;6(9):1290–1307. doi: 10.1038/nprot.2011.308.
  • 10. Price ND, Reed JL, Palsson BØ. Genome-scale models of microbial cells: Evaluating the consequences of constraints. Nat Rev Microbiol. 2004;2(11):886–897. doi: 10.1038/nrmicro1023.
  • 11. Orth JD, Thiele I, Palsson BO. What is flux balance analysis? Nat Biotechnol. 2010;28(3):245–248. doi: 10.1038/nbt.1614.
  • 12. Orth JD, et al. A comprehensive genome-scale reconstruction of Escherichia coli metabolism—2011. Mol Syst Biol. 2011;7:535. doi: 10.1038/msb.2011.65.
  • 13. Thiele I, Jamshidi N, Fleming R, Palsson B. Genome-scale reconstruction of Escherichia coli’s transcriptional and translational machinery: A knowledge base, its mathematical formulation, and its functional characterization. PLoS Comput Biol. 2009;5(3):e1000312. doi: 10.1371/journal.pcbi.1000312.
  • 14. Wu M, Chan C. Human metabolic network: Reconstruction, simulation, and applications in systems biology. Metabolites. 2012;2(1):242–253. doi: 10.3390/metabo2010242.
  • 15. Chandrasekaran S, Price ND. Probabilistic integrative modeling of genome-scale metabolic and regulatory networks in Escherichia coli and Mycobacterium tuberculosis. Proc Natl Acad Sci USA. 2010;107(41):17845–17850. doi: 10.1073/pnas.1005139107.
  • 16. Wang Y, Eddy JA, Price ND. Reconstruction of genome-scale metabolic models for 126 human tissues using mCADRE. BMC Syst Biol. 2012;6:153. doi: 10.1186/1752-0509-6-153.
  • 17. Covert MW, Knight EM, Reed JL, Herrgard MJ, Palsson BO. Integrating high-throughput and computational data elucidates bacterial networks. Nature. 2004;429(6987):92–96. doi: 10.1038/nature02456.
  • 18. Karr JR, et al. A whole-cell computational model predicts phenotype from genotype. Cell. 2012;150(2):389–401. doi: 10.1016/j.cell.2012.05.044.
  • 19. Bremer H, Dennis PP. In: Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology. Neidhardt FC, editor. Washington, DC: American Society for Microbiology; 1987. p. 1530.
  • 20. Scheer M, et al. BRENDA, the enzyme information system in 2011. Nucleic Acids Res. 2011;39(Database issue):D670–D676. doi: 10.1093/nar/gkq1089.
  • 21. Orth JD, Palsson B. Gap-filling analysis of the iJO1366 Escherichia coli metabolic network reconstruction for discovery of metabolic functions. BMC Syst Biol. 2012;6:30. doi: 10.1186/1752-0509-6-30.
  • 22. Mir M, et al. Optical measurement of cycle-dependent cell growth. Proc Natl Acad Sci USA. 2011;108(32):13124–13129. doi: 10.1073/pnas.1100506108.
  • 23. Elowitz MB, Levine AJ, Siggia ED, Swain PS. Stochastic gene expression in a single cell. Science. 2002;297(5584):1183–1186. doi: 10.1126/science.1070919.
  • 24. Swain PS, Elowitz MB, Siggia ED. Intrinsic and extrinsic contributions to stochasticity in gene expression. Proc Natl Acad Sci USA. 2002;99(20):12795–12800. doi: 10.1073/pnas.162041399.
  • 25. Barrett CL, Herrgard MJ, Palsson B. Decomposing complex reaction networks using random sampling, principal component analysis and basis rotation. BMC Syst Biol. 2009;3:30. doi: 10.1186/1752-0509-3-30.
  • 26. Luli GW, Strohl WR. Comparison of growth, acetate production, and acetate inhibition of Escherichia coli strains in batch and fed-batch fermentations. Appl Environ Microbiol. 1990;56(4):1004–1011. doi: 10.1128/aem.56.4.1004-1011.1990.
  • 27. Flamholz A, Noor E, Bar-Even A, Liebermeister W, Milo R. Glycolytic strategy as a tradeoff between energy yield and protein cost. Proc Natl Acad Sci USA. 2013;110(24):10039–10044. doi: 10.1073/pnas.1215283110.
  • 28. Knappe J, Sawers G. A radical-chemical route to acetyl-CoA: The anaerobically induced pyruvate formate-lyase system of Escherichia coli. FEMS Microbiol Rev. 1990;6(4):383–398. doi: 10.1111/j.1574-6968.1990.tb04108.x.
  • 29. Garvie EI. Bacterial lactate dehydrogenases. Microbiol Rev. 1980;44(1):106–139. doi: 10.1128/mr.44.1.106-139.1980.
  • 30. Han MJ, Lee JW, Lee SY, Yoo JS. Proteome-level responses of Escherichia coli to long-chain fatty acids and use of fatty acid inducible promoter in protein production. J Biomed Biotechnol. 2008;2008:735101. doi: 10.1155/2008/735101.
  • 31. Ishihama Y, et al. Protein abundance profiling of the Escherichia coli cytosol. BMC Genomics. 2008;9:102. doi: 10.1186/1471-2164-9-102.

Supplementary Materials

Supporting Information
1222569110_sd01.xlsx (190.8KB, xlsx)
