Abstract
The fitness landscape is a concept commonly used to describe evolution towards optimal phenotypes. It can be reduced to mechanistic detail using genome-scale models (GEMs) from systems biology. We use recently developed GEMs of Metabolism and protein Expression (ME-models) to study the distribution of Escherichia coli phenotypes on the rate-yield plane. We found that the measured phenotypes distribute non-uniformly to form a highly stratified fitness landscape. Systems analysis of the ME-model simulations suggests that this stratification results from discrete ATP generation strategies. Accordingly, we define “aero-types”, a phenotypic trait that characterizes how a balanced proteome can achieve a given growth rate by modulating 1) the relative utilization of oxidative phosphorylation, glycolysis, and fermentation pathways; and 2) the differential employment of electron-transport-chain enzymes. This global, quantitative, and mechanistic systems biology interpretation of the fitness landscape formed upon proteome allocation offers a fundamental understanding of bacterial physiology and evolution dynamics.
Author summary
Genome-scale models enable quantitative prediction of bacterial phenotypes and a fine-grained description of the underlying optimal proteome allocation. Thus, we can now analyze the phenotypic potential of a large number of Escherichia coli genotypes grown under different conditions, which leads to the discovery of a stratified distribution of phenotypes. The observed distribution is determined by distinct ATP generation strategies, defined as “aero-types”, associated with optimal proteome allocation modulated upon differential usage of the electron-transport-chain enzymes. This mechanistic approach offers us a genome-scale understanding of the fitness landscape, and a fundamental interpretation of bacterial physiology and evolution dynamics.
Introduction
Sewall Wright’s fitness landscape [1] represented an early attempt to illustrate the complex genotype-fitness relationship in a graphical manner that allows an easy conceptualization of evolutionary dynamics. Technology developments in diverse fields including mutagenesis, microbial evolution experiments, and high-throughput DNA sequencing methods have now turned this concept from a metaphor into real data that allows for the reconstruction of the empirical fitness landscapes [2–4]. Two classes of such landscapes are usually studied to examine how natural selection may drive a population to the top of a fitness peak. The first one is constructed based on discreteness of the protein sequences, where evolution is modeled as movement through the evolutionary intermediates along feasible mutational pathways. This discrete representation is useful for estimating the probability of evolutionary outcomes [5] and demonstrating how molecular and epistatic interactions limit the number of accessible evolutionary paths [6–9]. However, experimental exploration and subsequent mathematical modeling of the fitness landscape is limited to a well-characterized posterior selection of mutations in an intrinsically high-dimensional genotype space. The second model specifies the phenotype-fitness relationship in a continuous and multivariate phenotypic space. It is capable of fitting variation in landscape structure across many species and environments [10–12], but with an impaired ability to relate fitness change directly to a specific genetic and molecular mechanism.
These well-studied fitness landscape models, whether discrete or continuous, address evolutionary dynamics towards an optimal phenotype based on the rare beneficial mutations that arise historically or in the course of microbial evolution experiments [13]. Directed evolution expedites the search for beneficial mutations in the high-dimensional sequence space by enforcing selection in the desired function and discarding those variants with no improvement. This powerful technique is capable of elucidating the molecular mechanisms of adaptation and evolutionary tradeoff in protein properties [14] under diverse environments [15], therefore greatly enriching our understanding of the adaptive trajectory. However, the fitness effects of the majority of mutations that arise in nature are neutral, slightly deleterious, or slightly beneficial [16]. The distribution of the fitness effects of these spontaneous mutations in natural bacterial populations remains unclear.
An alternative approach to explore the fitness landscape and phenotypic distribution comes from the solution space of a genome-scale metabolic model (M-model) [17, 18]. Genome-scale models explicitly compute how the system-level optimization of organismal fitness is achieved through natural evolution while considering the constraints on as many factors as possible. These include the metabolic burden, resource allocation, and the interactions between gene and cellular environment [19]. The models’ ability to predict phenotypes and rapidly screen millions of genotypes allows for the exploration of the change in an optimal solution space upon gene deletion, providing valuable insight into the impact of gene essentiality [20] under diverse conditions [21], plasticity and robustness of metabolic networks [22], and the effect of epistasis interactions on the fitness distribution [23, 24].
Expansion of the M-models to include constraints on the cost of protein biosynthesis has been improving the accuracy of phenotypic predictions for different organisms under various environments [25–28]. The genome-scale models of metabolism and protein expression (ME-models) for E. coli, in particular, explicitly incorporate the full reconstruction of transcription and translation pathways to allow for quantitative predictions of proteome allocation at the gene level [29–31] and the ability to predict evolutionary outcomes [32]. A more recent development further takes into account the temperature-dependent catalytic efficiency and thermostability of all enzymes in the ME-model (FoldME-model [33]), enabling an explicit formulation of the effect of a gene mutation in contrast to a direct gene deletion. This final improvement provides us with the opportunity to evaluate the phenotypic distribution of natural E. coli populations on a fitness landscape.
Here, we assemble and analyze large amounts of E. coli phenotypic growth data in the rate-yield plane and find consistent non-uniformity in the fitness distribution. Both computationally and experimentally determined phenotypes display multiple distinct phenotypic categories that distribute in stripes on the rate-yield plane and form a landscape with a “stratified” topology. We then show, by detailed analysis of metabolic fluxes and protein expression, that the stratified topography of this phenotypic fitness landscape can be fully described by the energy production strategy, which in turn is determined by a balance between proteome allocation cost and the metabolic efficiency of ATP production. Interestingly, we find that a simple quantity—the fraction of total ATP that is generated by the ATP synthase (fATPS)—is capable of outlining the stratification. Consequently, we define E. coli “aero-types” based on the multimodal distribution of fATPS modulated through the discrete usage of electron-transport-chain enzymes. An aero-type not only describes the cellular respiratory behavior, but also indicates the associated metabolic state and proteomic compositions. Finally, we discuss how the aero-type, as an effective fitness descriptor, can be used to address important biological questions such as the predictability of microbial evolution and the interpretation of the rate-yield tradeoff.
Results
A stratified structure in the E. coli phenotypic fitness landscape defined on the rate-yield plane
We used the most fundamental bacterial growth parameters, the biomass yield (Y) and substrate uptake rate (q), to span the phenotypic space for E. coli (Materials and methods). To gain a comprehensive view of the fitness distribution, we first compiled a compendium of experimental growth phenotypes from the literature, augmented with measurements obtained from our adaptive laboratory evolution (ALE) experiments (Materials and methods and references therein). This data set (n = 199) includes characterizations of different naturally occurring E. coli strains, evolved gene knock-out mutants, and growth under various nutrient conditions. It is immediately noticeable that both the high and low yield regions are densely populated, yet the region in between (0.2 < Y < 0.3 gDW/g) is almost empty (S1 Fig).
Is the observed non-uniform distribution of the rate-yield phenotype a result of insufficient sampling from experimental data, or a fundamental property determined by the design of a cell’s genome and metabolic network? To answer this question, we used the FoldME model [33] to compute the phenotypic fitness for a large number of in silico strains that sample the genetic variations of the naturally occurring E. coli genomes (Materials and methods). To implement such strain sampling, we first selected genes for mutation according to the calculated frequency of fixed mutations for each gene (S2 Fig). Then, we determined the molecular effect of the selected mutation by varying the selected enzyme’s catalytic efficiency (keff) and thermal stability (ΔG) by a random but small amount (see Materials and Methods for more details). Finally, growth of the sampled strains was simulated under glucose minimal media with temperature perturbations from 25°C to 46°C to take into account the effect of both genetic mutations and environmental changes.
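The strain sampling procedure described above can be illustrated with a minimal sketch. It is not the FoldME implementation: the function name, the gene frequencies, the number of mutations per strain, and the perturbation ranges are all hypothetical placeholders, since the text specifies only that genes are drawn according to their mutation frequency and that keff and ΔG are varied by a random but small amount.

```python
import random

def sample_strain(mutation_freq, n_mutations=3, max_rel_keff=0.1, max_ddg=1.0, seed=None):
    """Return {gene: (keff_scale, ddg)} for one hypothetical in silico strain.

    mutation_freq: dict gene -> relative frequency of fixed mutations.
    max_rel_keff and max_ddg are placeholder magnitudes (assumptions),
    standing in for the "random but small amount" used in the paper.
    """
    rng = random.Random(seed)
    genes = list(mutation_freq)
    weights = [mutation_freq[g] for g in genes]
    strain = {}
    # Draw genes with probability proportional to their mutation frequency.
    # Draws are with replacement, so a gene hit twice keeps its last perturbation.
    for gene in rng.choices(genes, weights=weights, k=n_mutations):
        keff_scale = 1.0 + rng.uniform(-max_rel_keff, max_rel_keff)  # multiplicative change in keff
        ddg = rng.uniform(-max_ddg, max_ddg)                         # additive change in stability
        strain[gene] = (keff_scale, ddg)
    return strain

# Made-up frequencies for illustration only.
example_freq = {"pykF": 0.4, "gltA": 0.3, "atpA": 0.2, "cyoB": 0.1}
example_strain = sample_strain(example_freq, seed=1)
```

Each sampled strain would then be passed to the model and simulated at the chosen temperature; that step depends on the model software and is not shown here.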
The calculated fitness effects for the in silico strains were projected onto the rate-yield plane. The contour plot of a total of 2,200 sampled E. coli strains (Fig 1A) nicely confirms the non-uniform distribution observed from the experimental data. More importantly, it offers a characteristic representation for the “phenotypic fitness landscape”, in which growth phenotypes densely cluster along a few hyperbolic lines on the rate-yield plane (indicated by the blue arrows) but rarely fall in between these stratified density peaks.
The metabolic location of ATP production stratifies the phenotypic fitness landscape
To explain the observed stratification in phenotype distribution, we first examined the metabolic features characterizing the simulated samples within each populated region on the rate-yield plane. Interestingly, solutions along the densely populated hyperbolic lines (blue arrows in Fig 1A), where q and Y are positively correlated, share similar features in their flux distributions in central metabolism (S3 Fig). On the contrary, samples along the constant growth rate lines (μ-isoclines, red solid lines in Fig 1A) show consistent variation in the metabolic states that correlate with shifts in the rate-yield phenotype.
Specifically, as Y decreases along a μ-isocline, the following changes in the metabolic state can be identified through principal component analysis (Fig 1B and S3 Fig): 1) the amount of ATP produced by ATP synthase decreases; 2) flux through the tricarboxylic acid (TCA) cycle decreases; 3) total flux through the glycolysis pathway increases; 4) acetate secretion increases; and 5) the overall metabolic complexity, measured by the number of active reactions, decreases. We analyzed the expression data from 17 E. coli strains evolved under glucose minimal medium at 37°C [34] and 42°C [35], and confirmed the first two calculated trends with the positive correlation between Y and the total mass fraction of genes involved in TCA cycle and oxidative phosphorylation (S4 Fig).
We noticed that flux change of the energy production reactions correlated well with the shift in metabolic state and phenotypic location. Hence, we computed the fraction of total ATP produced by eight ATP-producing reactions: 1) ATP production by the ATP synthase (ATPS4rpp), and reactions catalyzed by the polyphosphate kinase (PPKr and PPK2r) in oxidative phosphorylation; 2) reactions catalyzed by the phosphoglycerate kinase (PGK) and the pyruvate kinase (PYK) in the lower glycolysis pathway; 3) the reaction catalyzed by the acetate kinase (ACKr) in mixed acid fermentation; 4) the reaction catalyzed by the succinyl-CoA synthetase (SUCOAS) in the TCA cycle; and 5) the reaction catalyzed by the ribose-phosphate diphosphokinase (PRPPS) in nucleotide biosynthesis. These quantities (fATPS, fPGK, fACKr, etc.) formed an eight-element vector that we used as the explanatory variables in a stepwise linear regression analysis. The results showed that six of the ATP production fractions could explain 89.5% of the variation in phenotypic distance (Materials and methods, S5 Fig), confirming the predictable mapping relationships between the metabolic state of ATP production and the phenotype.
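As an illustration of how the ATP production fractions could be computed from a simulated flux distribution, consider the minimal sketch below. The reaction identifiers are those listed above; the function name, the input format, and the example values are assumptions. In particular, the input is assumed to already hold the ATP production rate of each reaction with the sign corrected for reaction directionality, and reactions that consume ATP in a given solution are treated as contributing nothing.

```python
ATP_REACTIONS = ["ATPS4rpp", "PPKr", "PPK2r", "PGK", "PYK", "ACKr", "SUCOAS", "PRPPS"]

def atp_fractions(atp_production):
    """Return {reaction: fraction of total ATP produced} for the eight reactions.

    atp_production: dict reaction id -> ATP production rate (mmol/gDW/h),
    assumed to be sign-corrected so that positive means ATP is produced.
    """
    # Missing reactions and net ATP consumers count as zero production (assumption).
    produced = {r: max(atp_production.get(r, 0.0), 0.0) for r in ATP_REACTIONS}
    total = sum(produced.values())
    if total == 0.0:
        raise ValueError("no ATP production in the supplied fluxes")
    return {r: v / total for r, v in produced.items()}

# Made-up rates for illustration only.
example = {"ATPS4rpp": 40.0, "PGK": 15.0, "PYK": 5.0, "ACKr": 2.0, "SUCOAS": 3.0}
fractions = atp_fractions(example)
f_atps = fractions["ATPS4rpp"]  # the quantity called fATPS in the text
```

The resulting eight-element vector is what would enter the regression analysis as explanatory variables.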
Among these ATP production fractions, fATPS appears to be of particular importance. The fact that fATPS is positively correlated with Y and negatively correlated with qglc at each specific growth rate (S6 Fig) identifies it as the metabolic origin for the observed relationship between a metabolic state and the rate-yield phenotype. The observed correlation is rooted fundamentally in the cell’s energetic and metabolic network, rather than being just a simple function of the expression of the ATP synthase (S7 Fig). Interestingly, fATPS displays a multimodal distribution that is highly correlated with the distribution of the rate-yield phenotypes. Solutions with higher fATPS values (e.g., with averages 0.71 or 0.64) are located within the top two hyperbolic bands on the rate-yield plane (Fig 1C). For these high-yield phenotypes, high-resolution 13C-metabolic flux data is available to estimate their fATPS values experimentally. We calculated fATPS to be ∼0.65 for E. coli MG1655 evolved under glucose minimal medium and ∼0.706 for E. coli BL21 [36], both within 1.5% difference of the peak values predicted by our simulations.
To further confirm the critical role of fATPS, we tested whether the discreteness of fATPS directly gave rise to the stratified structure of the phenotypic fitness landscape. We performed strain sampling simulations where fATPS was constrained at the five predicted peak values: 0, 0.37, 0.53, 0.64, and 0.71 (Materials and methods). The results showed clearly that optimal solutions obtained at a particular fATPS were constrained within a thin hyperbolic band, where q and Y were positively correlated (S8(A) Fig). Under the same substrate supply, a higher fATPS allows a higher biomass yield to be achieved, consistent with the correlations shown in S6(B) Fig. This reconstructed fitness landscape fully reproduced the observed stratified phenotypic distribution.
In summary, we introduced the fraction of total ATP produced by the ATP synthase (fATPS) as a simple, yet effective, quantification for the cell’s metabolic state, and key determining factor for the stratified phenotypic distribution on the rate-yield plane.
Multimodal distribution of fATPS is constrained by proteome complexity of the ATP production pathways
The quantitative relationship between fATPS and a cell’s metabolic and phenotypic state motivated an investigation of the underlying constraints imposed on the ATP production reactions. To deduce the source of these constraints, we looked for systematic differences in protein expression profiles between solutions with different fATPS values. First, we generated sampling simulations constrained to six defined growth rates at 30°C to limit uncontrolled biases from temperature-induced differences in growth rate (Materials and methods, S9(A) Fig). The result confirmed the observed relationships by reproducing the multimodal distribution centered at the same fATPS values (S9(B) Fig).
Next, we order the expression profiles of the simulated strains by their computed fATPS values (Fig 2A). We find that an increase in fATPS is accompanied by a shift to a more complex proteome. The increase in proteome complexity is manifested in two ways. First, the number of genes expressed increases (Fig 2B left). For example, the pentose phosphate pathway and the multi-gene protein complexes in oxidative phosphorylation are only extensively used when aerobic respiration is turned on (fATPS > 0). Second, the average number of subunits per enzyme increases (Fig 2B right). In other words, as ATP synthase becomes responsible for a larger fraction of ATP production, the cell tends to use larger multi-domain protein complexes instead of single-gene enzymes with low molecular mass.
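The two complexity measures described above can be made concrete with a small sketch. This is an illustration under stated assumptions rather than the analysis used for Fig 2B: the function name and input format are hypothetical, and "subunits per enzyme" is taken here to mean the number of distinct genes encoding each expressed enzyme, which may differ from the definition used in the paper.

```python
def proteome_complexity(expressed_enzymes):
    """Return (number of distinct expressed genes, mean distinct genes per enzyme).

    expressed_enzymes: dict enzyme name -> list of gene ids encoding its subunits,
    restricted to enzymes with non-zero expression in a given solution.
    """
    if not expressed_enzymes:
        raise ValueError("no expressed enzymes supplied")
    all_genes = set()
    for subunit_genes in expressed_enzymes.values():
        all_genes.update(subunit_genes)
    # Count each gene once per enzyme, ignoring subunit stoichiometry (assumption).
    mean_subunits = sum(len(set(g)) for g in expressed_enzymes.values()) / len(expressed_enzymes)
    return len(all_genes), mean_subunits

# Illustrative inputs: a single-gene dehydrogenase with the bd-I oxidase,
# versus NADH dehydrogenase I with the bo oxidase.
low_f_atps = {"Ndh": ["ndh"], "CydABX": ["cydA", "cydB", "cydX"]}
high_f_atps = {
    "Nuo": ["nuoA", "nuoB", "nuoC", "nuoE", "nuoF", "nuoG", "nuoH",
            "nuoI", "nuoJ", "nuoK", "nuoL", "nuoM", "nuoN"],
    "CyoABCD": ["cyoA", "cyoB", "cyoC", "cyoD"],
}
low_metrics = proteome_complexity(low_f_atps)
high_metrics = proteome_complexity(high_f_atps)
```

In this toy comparison both measures are larger for the second enzyme set, mirroring the direction of the trend reported for increasing fATPS.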
The switch between single-gene and multi-domain enzymes is the most obvious in oxidative phosphorylation pathways, and particularly electron transport chain (ETC) reactions (Fig 2C). For example, reduction of the quinone pool is mainly performed by the NADH dehydrogenase II Ndh at low fATPS, but switches to larger protein complexes, such as the formate dehydrogenase and the NADH:quinone oxidoreductase, as fATPS increases. In the subsequent oxidation of quinol and transport of protons across the inner membrane, the smaller oxidase complex CydABX is used at low fATPS, and the larger alternative CyoABCD takes over at higher fATPS. We note that approximately 60% of the reactions in oxidative phosphorylation rely on one or multiple protein complexes for catalysis (S1 Table). Compared to other metabolic pathways, this high level of protein complexity is likely an evolutionary result to provide more flexibility and fine-tuning for the discrete selection of energy production strategies.
These results reveal an intricate balance between proteome complexity and the energy requirements for cell growth. As the energy demand increases, more and more enzyme complexes are necessary to achieve higher ATP yield. However, larger complexes also require significantly more metabolic resources for their biosynthesis. Thus, once activated, these enzyme complexes should be used as much as possible, inducing necessary rewiring of the metabolic network for optimal balance in proteome allocation, and shifting the ATP production strategy to the next discrete state.
Introduction of the “aero-type” as a phenotypic trait defined based on fATPS
We have shown that aerobic respiration through ATP synthase determines the cell’s metabolic state and its phenotypic location on the rate-yield plane. Accordingly, we define “aero-types” i to v to describe the five populated phenotypes represented by the five peak values of fATPS (from fATPS = 0 to fATPS ∼ 0.71) observed in the strain sampling simulations. Computationally, we compare aero-type with the P/O ratio, a commonly used parameter that describes the cellular respiratory behavior. We show that the P/O ratio outlines only the local stoichiometry of the oxidative phosphorylation pathways. Aero-type offers a more global description of cellular fitness by representing the metabolic and phenotypic state, and the proteome complexity associated with a specific energy production scheme (S1 Text and S10 Fig). Nevertheless, experimental evidence is necessary to establish the computationally defined aero-type as a practical proxy measure for the bacterial fitness.
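For illustration, a computed or measured fATPS value could be mapped to an aero-type by finding the closest of the five peak values reported in the strain sampling simulations (0, 0.37, 0.53, 0.64, and 0.71). The nearest-peak rule and the function name below are assumptions made for this sketch; the paper defines aero-types by the peaks of the multimodal distribution and does not prescribe a specific assignment rule.

```python
# Peak fATPS values from the strain sampling simulations, ordered from aero-type i to v.
AERO_TYPE_PEAKS = {"i": 0.0, "ii": 0.37, "iii": 0.53, "iv": 0.64, "v": 0.71}

def assign_aero_type(f_atps):
    """Return the aero-type label whose peak fATPS is nearest to f_atps (assumed rule)."""
    if not 0.0 <= f_atps <= 1.0:
        raise ValueError("fATPS is a fraction and must lie in [0, 1]")
    return min(AERO_TYPE_PEAKS, key=lambda label: abs(AERO_TYPE_PEAKS[label] - f_atps))

# The experimentally estimated values quoted in the text.
mg1655_type = assign_aero_type(0.65)   # nearest peak is 0.64
bl21_type = assign_aero_type(0.706)    # nearest peak is 0.71
```

Under this rule the two experimentally estimated fATPS values fall into aero-types iv and v, respectively.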
We resorted to the characterization of genetic mutations that may trigger a switch in the aero-type. According to the comprehensive decomposition of the ETC enzyme usage shown in Fig 2 and S10 Fig, we selected two genes from the dehydrogenase (nuoB from the NADH dehydrogenase I and the NADH dehydrogenase II gene ndh) and two from the cytochrome oxidase (cyoB from the cytochrome bo oxidase and cydB from the cytochrome bd-I oxidase) for genetic manipulation (Materials and methods). We would expect that the removal of ndh would most likely switch the cell to aero-type iv or v, which have the highest Y and the lowest qglc on the rate-yield plane. Removing cyoB (regardless of which NADH dehydrogenase is present) would most likely leave the cell in aero-type i or ii, with lower Y and higher qglc. The mutants depleted of cydB and/or nuoB are, in principle, still accessible to all aero-types. However, it is less likely for the ΔnuoB mutant to have higher Y and lower qglc, because the NADH dehydrogenase I is almost always activated for aero-types iv and v.
We constructed the single (Δndh, ΔnuoB, ΔcydB, and ΔcyoB) and double (ΔndhΔcydB, ΔndhΔcyoB, ΔnuoBΔcydB, and ΔnuoBΔcyoB) knockout strains to test the predicted phenotypic effect experimentally (Materials and methods). Phenotype characterization of the eight mutants qualitatively captured the computationally predicted trends (Fig 3A and S2 Table), and showed that the designed removal of the ETC genes was able to restrain the mutant within the corresponding aero-type at different temperatures (S11 Fig). Additional evidence came from Portnoy et al. [37], where all terminal cytochrome oxidase genes (cydAB, cyoABCD, and appBC) and a quinol monooxygenase gene, ygiN, were removed from the E. coli genome. This mutant strain was characterized by the lowest possible Y and highest qglc, corresponding to aero-type i (fATPS = 0).
Next, we confirmed the correlation between aero-type and the proteomic state of the mutant strains using RNA-Seq analysis (Materials and methods). Hierarchical clustering of the expression profile showed groupings consistent with the aero-type assigned on the rate-yield plane (Fig 3B). For example, the ΔcyoB mutants grouped together in lower aero-type regardless of their large difference in growth rate and glucose uptake rate. Genes involved in central metabolism were also clustered in two main groups (Fig 3B). Consistent with the metabolic state shift shown in Fig 1B, aerobic respiration and metabolic activity decrease, while anaerobic respiration increases as the assigned aero-type goes down from v (yellow) to ii (purple).
In short, we design mutant strains where the major ETC enzymes are removed combinatorially to perturb the cell’s respiratory potential and ATP production strategy. We show that the phenotypic outcome, proteome re-allocation, and the phenotypic aero-type switch of these strains are consistent with the computational predictions.
Stratification of the anaerobic phenotypes using nitrate as the electron acceptor
As a facultative anaerobe, E. coli is able to thrive under a variety of environmental conditions, from highly oxic to completely anoxic, with its amazingly versatile pool of fifteen primary dehydrogenases and ten terminal reductases [38]. So far, we have discussed how the differential usage of approximately one third of these enzymes gives rise to a stratified phenotypic distribution during aerobic growth when oxygen is used as a terminal electron acceptor. How do optimal phenotypes distribute on the rate-yield plane under anaerobic condition if alternative dehydrogenases and terminal reductases are activated?
To answer this question, we created an in silico strain where the expression of all terminal cytochrome oxidase genes (cydAB, cyoABCD, and appBC) and a quinol monooxygenase gene (ygiN) were set to zero. This mutant strain was shown to produce a phenotype that was almost incapable of oxygen utilization and presented fermentative behavior under oxic condition [37]. Considering that nitrate represses other anaerobic pathways in E. coli under anoxic conditions [38], we supplemented nitrate to be utilized as the preferred electron acceptor instead of oxygen, performed strain sampling simulations, and examined the fitness distribution.
Three discrete anaerobic phenotypes were found that distributed in a stratified fashion on the rate-yield plane (Fig 4A). Consistent with the aero-type analysis, each observed phenotype can be characterized by a particular fATPS value (Fig 4B), usage of a different combination of the respiratory enzymes (Fig 4C), and different proteome complexity (Fig 4C and 4D). By analogy but not to be confused with the “aero-type” where oxygen is used as the terminal electron acceptor, we denoted these anaerobic phenotypes “nitro-type” i ∼ iii. Nitro-type i with the lowest biomass yield expressed the siroheme NADH-nitrite reductase NirAB in addition to the high expression of the nitrate reductase A or Z. This small-molecular-weight enzyme likely helped to reduce proteome complexity through either detoxifying nitrite generated by the nitrate reductases, or by carrying out fermentative ammonification that balanced between maximizing ATP production and maintaining the NAD+ levels [39]. These results again emphasized the importance of proteome allocation to the energy production pathways in determining the phenotypic distribution.
Discussion
In this study, we develop a systems biology definition of the phenotypic fitness landscape based on the solution space of the E. coli genome-scale FoldME model [33]. Simulations that sample thousands of E. coli strains across many temperatures lead to the discovery of a stratified geometry of phenotypic distribution, which is consistent with observations from a compendium of experimental phenotypic data. FoldME’s capability to reveal quantitative multi-level relationships between a cell’s genotype, metabolic state, proteomic allocation, energy production strategy, and the phenotype provides us with the opportunity to interpret the observed topography of the phenotypic fitness landscape [19, 40]. We find that: 1) the stratification is due to the discreteness of the ATP production strategy; 2) the fraction of the ATP produced by the ATP synthase (fATPS) is a governing parameter describing the discretization; and 3) the discretization is rooted in a balance between the modularity of proteome composition and metabolic functions underlying optimal growth.
The direct correlation between a cell’s energy production strategy and the phenotypic landscape topography inspires the definition of the E. coli “aero-type” to summarize the complex relationships between genotype, metabolic state, proteome allocation, and phenotype. We reason that a switch in the aero-type may occur if differential usage of the ETC enzymes is imposed by genetic mutations or environmental stresses. To confirm the hypothesis, we experimentally construct the mutant strains where major ETC enzymes are removed combinatorially, and show that the measured aero-types of the mutants are consistent with computational prediction.
With the aero-type defined as a key phenotypic descriptor, it is worth pointing out that discretization of ATP production through other reactions (represented by fPGK, fPYK, and fACKr) within each aero-type is also observed (S8(B)–S8(D) Fig). Based on these results, we propose a multi-level regulation that the cell uses to adjust its energy production strategy in adaptation to genetic and environmental perturbations (S12(A) Fig). A cell first partitions its cellular resources between the ATP synthase and enzymes that catalyze other ATP-production reactions to meet the minimal ATP requirement for growth. Thus, an aero-type is determined. Next, within each aero-type, two types of reactions further fine-tune the ratio between the proteome dedicated to ATP and biomass precursor production, respectively: 1) those that produce both ATP and biomass precursors, such as PGK and PYK, and 2) ACKr that contributes to ATP production alone. The final result optimizes the ratio between ATP and biomass precursors to maximize biomass production in adaptation to a particular condition that the cell encounters.
This two-level regulation is consistent with the underlying physical principles of the respiration-fermentation tradeoff on the top level, and thermodynamic tradeoff between biomass and ATP yield on the second [41, 42]. Moreover, our formulation offers critical mechanistic details compared to similar efforts that model energy metabolism as a partition between the parallel pathways of the high-yield, low-yield ATP producers and the biomass producer [43, 44], and therefore extends the explanatory power beyond the constrained boundaries on the rate-yield plane to the full scope of a fitness landscape.
The proposed hierarchical energy production strategy may find applications in diverse fields such as metabolic biochemistry, cellular physiology, and evolutionary dynamics. For example, rate-yield tradeoff is one of the long-standing questions in understanding bacterial physiology [45], yet controversy as to whether a positive or negative relationship should be seen still exists [46, 47]. On top of what relationship should result, mechanistic interpretations also come in a variety of forms: proteome investment tradeoff between the metabolic enzymes and the uptake system of the limiting nutrient [48, 49], efficiency tradeoff between the fermentation and respiration enzymes [50, 51], or tradeoff between membrane efficiency and ATP yield [52], to name a few. Our results help put forward a generalized yet straightforward reconciliation of these different points of view. If the energy production strategy (or aero-type) remains the same, a positive rate-yield correlation should be seen. When the current energy plan is not capable of supporting growth and a switch to another aero-type must occur, phenotypic tradeoffs result.
The phenotypic landscape defined based on aero-type also offers an alternative perspective to understand bacterial adaptation towards optimal fitness. Instead of “climbing up the fitness peak”, mutations that arise during evolution could move the phenotype in two directions: one towards higher growth rate, biomass yield, and nutrient uptake rate where the cells remain in the same aero-type; and the other in an orthogonal direction where an aero-type switch is anticipated under constant growth rate. The fitness effect of a particular mutation can then be analyzed through its influence on the metabolic network and proteome re-allocation, which is governed by the fundamental physicochemical principles regarding fermentation-respiration and thermodynamic tradeoffs. We present an initial attempt to contextualize this perspective on bacterial evolutionary dynamics (S2 Text and S12(B) and S12(C) Fig), and expect subsequent studies to investigate how this framework may help us understand the convergence and divergence, predictability and stochasticity of bacterial evolution.
The concept of a fitness landscape has shaped thinking in evolutionary biology since the 1930s when it was first articulated. Here, we put forward a low-dimensional representation of the fitness landscape by quantifying the metabolic and proteomic state using the relative contributions of a few key ATP-producing reactions. Our analysis suggests that the topology of this fitness landscape is encoded in the energy allocation strategy underlying an organism’s metabolic network and proteome complexity. The influence of environmental fluctuations (e.g., temperature change, the presence and absence of oxygen) and genetic perturbations (e.g., different sampling strategies on enzyme efficiency and protein stability) on the fitness landscape can be rationally evaluated based on how the cell’s energy production is regulated. In principle, such a fitness landscape should be a general and effective framework with which to understand adaptation and evolution of different cell types in a variety of organisms (e.g., Crabtree effect for yeast and Warburg effect for cancer cells) under diverse conditions.
Materials and methods
Literature compendium of E. coli phenotypes
Rate and yield are the most fundamental quantities used to describe bacterial ecology and physiology. The rate can be measured as growth rate, or moles/grams of substrate, ATP, or biomass production per unit time. Yield is usually measured by moles/grams of biomass or ATP per unit of substrate. Regardless of which definition of rate and yield to use, these two physiological parameters are tightly correlated with each other. However, the exact form of the relationship is context-dependent, which may vary according to different experimental procedures and conditions. Here, we aim to resolve the controversy and provide a unified explanation for the condition-dependent rate-yield correlation. Therefore, the particular definition should not affect our investigation and discussion. Without loss of generality, and to compare with the genome-scale model simulations using glucose as carbon source, we choose to use the substrate (glucose) uptake rate (q (qglc), mmol/gDW/h) and the biomass yield (gDW/g) to denote the E. coli phenotypic space.
Substrate uptake rate (q) and growth rate (μ) are collected from two main types of experimental measurements (S1(B) Fig, top left): 1) growth in nutrient chemostats [27, 53–57], and 2) characterization of the ALE end-point strains [32, 34, 35, 37, 58–64]. Biomass yield is then calculated as Y = μ/(q·m), where m is the molecular mass of the substrate (expressed in g/mmol so that Y is in gDW/g). A total of 199 data points make up the phenotypic space spanned by substrate uptake rate and biomass yield, including measurements taken for wild-type E. coli and gene knockout strains (S1(B) Fig, top right), under different nutrient conditions (S1(B) Fig, bottom left), and with different E. coli strains (S1(B) Fig, bottom right). Despite the broad differences in data sources, the phenotypic characterizations of E. coli seem to occupy a common space with an interesting structure that is discussed in the Results.
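The yield calculation can be illustrated with a minimal sketch. It assumes that yield is the growth rate divided by the mass of substrate taken up per gram dry weight per hour; the function name and the example values of μ and q are hypothetical and are not taken from the compendium.

```python
# Minimal sketch: biomass yield from growth rate and substrate uptake rate.
# Assumption: Y = mu / (q * m), with m converted from g/mol to g/mmol.

GLUCOSE_MOLAR_MASS = 180.156  # g/mol


def biomass_yield(mu, q, molar_mass=GLUCOSE_MOLAR_MASS):
    """Return biomass yield in gDW per g of substrate.

    mu: growth rate (1/h)
    q: substrate uptake rate (mmol/gDW/h)
    molar_mass: molecular mass of the substrate (g/mol)
    """
    if q <= 0:
        raise ValueError("uptake rate must be positive")
    grams_per_mmol = molar_mass / 1000.0
    return mu / (q * grams_per_mmol)


# Illustrative values only (not a measurement from the paper).
example_yield = biomass_yield(mu=0.7, q=8.5)
print(f"Y = {example_yield:.3f} gDW/g")
```

With these illustrative inputs the yield falls within the 0.3 to 0.55 gDW/g range that the text describes as the most populated experimental region.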
An overview of the FoldME model
All sampling simulations in this paper are performed using the recently developed genome-scale model for metabolism and protein expression enhanced with the chaperone network, FoldME [33]. The reconstruction of FoldME started with associating all biochemical reactions in the E. coli genome-scale ME-Model iOL1650 [31] with the sequences and structures of their catalytic enzymes [65]. Then we computed the temperature-dependent folding properties for every modeled protein, with which the protein’s condition-specific chaperone requirement was formulated. Next, we coupled the folding state of the cell into its metabolic network by allowing three folding pathways (spontaneous, DnaK-assisted, and GroEL/ES-mediated) to compete for folding of any protein based on the calculated chaperone requirement. As such, the model was capable of adjusting the in vivo folding pathway of each protein to minimize the global cost invested in chaperone biosynthesis and the energy requirement for folding.
The choice of parameters is critical for applying genome-scale models to understand biological phenomena on the systems level. The FoldME model is constructed based on three basic categories of parameters: 1) the global physiological parameters, 2) the in vivo turnover rate of metabolic enzymes, and 3) the protein-specific thermodynamic parameters. The first two categories of parameters are common to all ME-models, and thus are set to the default values as first developed in O’Brien et al. [31]. Protein-specific thermodynamic parameters, including the kinetic folding rate, free energy of unfolding, and aggregation propensity, are unique to the FoldME model. These parameters are calculated using protein sequences and structures with empirical prediction algorithms that are well established in literature. More details of model formulation, parameter calculation, and sensitivity analysis can be found in Chen et al. [33].
We showed that the FoldME model improved the precision and scope of prediction for the optimal proteome composition over a wide variety of perturbations, including temperature, nutrient availability, and genetic mutations, and is therefore suitable for the study of phenotypic distribution presented in this paper.
E. coli strain sampling simulations
The purpose of our sampling method is to evaluate the phenotypic distribution of E. coli using in silico strains reconstructed to represent the diversity of naturally occurring strains. We assume that adaptation is achieved through the gradual accumulation of a large number of mutations that emerge independently, each with a small random effect on the affected gene. To simulate the genome-scale consequence of this long-term evolutionary process and estimate its fitness effect, we designed a two-step process: 1) select genes for mutation according to the probability of observing a mutation in each gene, and 2) determine the molecular effect of the mutation on the corresponding gene.
In the first step, we analyzed the genetic variations of 1,765 E. coli strains collected from 1) the PATRIC database [66], 2) the Ecoref strain panel [67], and 3) a manually curated set of adherent-invasive E. coli (AIEC) strains. We compiled protein sequences for the 1,566 protein-coding genes present in the FoldME model, and performed pairwise sequence alignments of the protein from each strain against its homologous sequence in E. coli K12 MG1655 [68]. We found a total of 266,940 coding region mutations, including 245,635 non-synonymous SNPs, 16,591 deletions, and 4,714 insertions. We then defined the probability of observing a mutation in a gene as the number of all observed mutations in that gene divided by the total number of coding region mutations. Next, to determine which genes harbor a mutation in each sample, we generated a random number between 0 and 1 for each gene. If the random number was smaller than the gene’s mutation frequency, the gene was mutated; otherwise, the gene was left in its wild-type form. In this way, we reproduced the probability of observing a mutation in a gene in the naturally occurring E. coli strains (S2 Fig).
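The gene selection step can be sketched as follows. This is only an illustration of the rule described above (probability equal to a gene's share of all observed mutations, followed by one uniform draw per gene); the gene names and mutation counts are hypothetical.

```python
import random


def mutation_probabilities(mutation_counts):
    """Probability of observing a mutation in each gene:
    mutations observed in the gene / total observed mutations."""
    total = sum(mutation_counts.values())
    return {gene: n / total for gene, n in mutation_counts.items()}


def select_mutated_genes(probabilities, rng):
    """Draw one uniform number in [0, 1) per gene; the gene is mutated
    when the draw is smaller than its mutation probability."""
    return [gene for gene, p in probabilities.items() if rng.random() < p]


# Hypothetical counts for three genes (not data from the strain collection).
counts = {"geneA": 50, "geneB": 30, "geneC": 20}
probs = mutation_probabilities(counts)

rng = random.Random(0)  # fixed seed so that the sketch is reproducible
sample = select_mutated_genes(probs, rng)
print(probs, sample)
```

Each gene is drawn independently, so the number of mutated genes varies from sample to sample.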
In the second step, we perturbed the catalytic efficiency and the thermo-stability of the enzymes selected for mutation in the first step. The beneficial and deleterious effects of mutations are known to be exponentially distributed, with many small-effect mutations and fewer large-effect ones [69, 70]. To reflect the exponential distribution of beneficial effects at the gene level, we scaled the in vivo turnover rate (keff) of the affected enzyme by an exponentially distributed random number between 0.5 and 2. At the same time, we perturbed the enzyme’s thermo-stability (free energy of unfolding, ΔG) by a random amount between -2 and 2 kcal/mol to account for (de)stabilizing mutations with a small effect. The changes in enzyme efficiency and stability were assumed to have opposite directions, considering the pleiotropic effects of mutations [71], such that if the enzyme’s efficiency increased, its stability decreased, and vice versa.
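One possible reading of this perturbation step is sketched below. The text does not specify how the exponentially distributed scaling factor was generated, so the construction here is an assumption: an effect size is drawn from an exponential distribution truncated to [0, 1], and the scaling factor is 2 raised to plus or minus that effect, which keeps it between 0.5 and 2 with small effects most common. The rate parameter, the uniform draw for the stability change, and the function name are also hypothetical.

```python
import random


def sample_perturbation(rng, rate=3.0, max_ddg=2.0):
    """Return (keff_scale, ddg) for one mutated enzyme.

    keff_scale: multiplicative change of keff, within [0.5, 2]
    ddg: change in the free energy of unfolding (kcal/mol), within [-2, 2];
         a negative value means a less stable enzyme.
    """
    # Assumption: exponential effect size truncated to [0, 1] by rejection.
    while True:
        effect = rng.expovariate(rate)
        if effect <= 1.0:
            break
    more_efficient = rng.random() < 0.5
    keff_scale = 2.0 ** (effect if more_efficient else -effect)
    # Assumption: the magnitude of the stability change is uniform.
    ddg_magnitude = rng.uniform(0.0, max_ddg)
    # Opposite directions: higher efficiency comes with lower stability.
    ddg = -ddg_magnitude if more_efficient else ddg_magnitude
    return keff_scale, ddg


rng = random.Random(1)
for _ in range(3):
    print(sample_perturbation(rng))
```

Coupling the sign of the stability change to the direction of the efficiency change is what encodes the assumed trade-off; the two magnitudes are otherwise drawn independently.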
Finally, to introduce environmental stresses, we simulated 100 strain samples at each temperature from 25 to 46°C, resulting in a total of 2,200 simulations.
To test the robustness of this sampling process, we performed additional sets of simulations with the following modifications: 1) fixed the number of mutated genes to 10%, 20%, or 30% of the total number of modeled genes, and selected mutations assuming a uniform mutation fixation frequency for all genes; 2) perturbed only the keff or only the stability of the enzymes selected for mutation; 3) used a different wild-type keff profile taken from a recent machine learning study [72]; and 4) perturbed the keff of the mutated enzyme with larger scaling factors. None of these modifications to the sampling procedure changed our main conclusion regarding a stratified phenotypic landscape determined by the multimodal distribution of fATPS. As an example, we show the results for sampling simulations in which keff was scaled between 0.1- and 10-fold, and between 0.01- and 100-fold (modification #4, S9(C) Fig). In both cases, fATPS was distributed around the same locations as shown in Fig 1C and S9(B) Fig, with differences only in the relative amplitude of the fitness peaks. Therefore, we considered our current choice of sampling procedure and parameters capable of generating robust phenotypic predictions with evolutionarily meaningful genotypes.
Sampling simulations with fixed fATPS
To confirm the relationship between the multimodal distribution of fATPS and the stratified structure of the phenotypic fitness landscape, we performed sampling simulations in which fATPS was constrained to its five most likely values: 0, 0.37, 0.53, 0.64, and 0.71. The constraint was formulated as follows:
(1) |
where Vreaction_name denoted the flux of the corresponding reaction and p was the value to which fATPS was constrained. For every fATPS value, 24 sampling simulations were performed at each temperature from 25 to 46°C. However, this strong constraint was often incompatible with the sampled genotype, leaving a final set of 2,237 feasible and optimal solutions, which are reported in S8 Fig.
Sampling simulations at fixed growth rate
Differences in growth temperature gave rise to systematic changes in protein stability and in the in vivo turnover rates of the enzymes, and consequently to different growth rates [33]. To rule out the possibility that the multimodal distribution of fATPS resulted from a bias induced by growth rate differences at different temperatures, we performed additional sampling simulations at one particular temperature. At the same time, it was desirable to cover as much of the rate-yield space as possible. Thus, we examined the accessible range of qglc and Y (i.e., values that render feasible solutions for cell growth) at each temperature in the previously described 2,200 sampling simulations (S9(A) Fig). In general, below the optimal growth temperature, the accessible ranges of qglc and Y both decreased as temperature increased, favoring the choice of a lower temperature. Then, we considered the overlap with the most populated experimental phenotype range (S1(A) Fig), where qglc varied approximately between 5 and 15 mmol/gDW/h and Y between 0.3 and 0.55 gDW/g. Combining both criteria, we fixed the second set of sampling simulations at 30°C (S9(A) Fig, red).
Next, to maximize instances in every discrete fATPS regime and enable direct comparison of metabolic fluxes, we focused sampling along a few μ-isoclines. We chose six growth rates (values reported relative to the WT growth rate at 37°C): three around the average growth rate at 30°C (0.36, 0.44, 0.47), one close to the upper limit for growth at 30°C (0.65), and two slightly lower than the lower bound (0.18, 0.22). Simulations at higher fixed growth rates generated a large number of infeasible solutions and were therefore not included. The results confirmed that at each simulated growth rate, fATPS showed a similar multi-Gaussian distribution that differed only in the relative weight of each Gaussian. Because the number of peaks and their mean values were the same, we report in S9(B) Fig the collective result for all six growth rates together.
Fitting the multimodal distribution of fATPS
We assumed the fATPS value for each aero-type to be normally distributed. It followed that fATPS calculated from the sampling simulations should be fitted to a mixture of multiple Gaussian distributions, each representing one aero-type. The number of Gaussian distributions (peaks) was chosen as the number of aero-types determined from the distinguishable metabolic (Fig 1) and proteomic states (Fig 2). Therefore, we considered fATPS = 0 (the fully fermentative phenotype) as one “peak”, and fitted the remaining data with four Gaussians using Matlab.
To check whether our choice for the number of peaks was consistent, we compared distributions generated from many separate sets of sampling simulations, including those using different sampling strategies for sensitivity analysis. The fATPS distribution consistently showed five peaks, although the heights of the peaks varied. Peaks around 0.0, 0.37, and 0.53 were clearly present in all data sets, whereas peaks around 0.64 and 0.71 could be blurred under certain conditions. This remaining uncertainty likely came from unresolved proteome complexity of the highly respiratory phenotypes, and should not impair the validity of the fitting and the sampling process.
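The fitting strategy (treat fATPS = 0 as a separate peak, then fit a Gaussian mixture to the remaining values) can be sketched as below. The fits in this work were done in Matlab; this sketch swaps in a plain expectation-maximization routine written in Python, and the synthetic data, initial means, and function names are hypothetical. Only two components are simulated to keep the example short, whereas the text uses four.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_gaussian_mixture(values, initial_means, n_iter=200, min_sd=1e-3):
    """Fit a 1D Gaussian mixture by expectation-maximization."""
    k = len(initial_means)
    means = list(initial_means)
    sds = [max(min_sd, (max(values) - min(values)) / (2.0 * k))] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in values:
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update weight, mean and standard deviation per component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, values)) / nj
            sds[j] = max(min_sd, math.sqrt(var))
    return weights, means, sds


def fit_f_atps(f_values, initial_means):
    """Separate the fATPS = 0 peak, then fit Gaussians to the rest."""
    positive = [f for f in f_values if f > 0.0]
    zero_fraction = 1.0 - len(positive) / len(f_values)
    return zero_fraction, fit_gaussian_mixture(positive, initial_means)


# Synthetic data for illustration only: a zero peak plus two Gaussian peaks.
rng = random.Random(1)
data = ([0.0] * 50
        + [rng.gauss(0.37, 0.02) for _ in range(200)]
        + [rng.gauss(0.53, 0.02) for _ in range(200)])
zero_fraction, (weights, means, sds) = fit_f_atps(data, [0.35, 0.55])
print(zero_fraction, means)
```

Supplying the initial means explicitly mirrors the choice described above, in which the number of peaks is fixed in advance from the distinguishable metabolic and proteomic states and is not inferred from the fit.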
Bacterial strains
The E. coli electron transport chain contains two types of enzymes: a dehydrogenase that oxidizes an electron donor and a cytochrome oxidase that reduces the electron acceptor (S10(A) Fig). To create mutant strains that are constrained to a particular aero-type, we chose two enzymes from each category to be removed from the genome: NADH dehydrogenase I (NuoABCDEFGHIJKLMN) and NADH dehydrogenase II (Ndh) for the dehydrogenase; cytochrome bo oxidase (CyoABCD) and cytochrome bd-I oxidase (CydAB) for the cytochrome oxidase.
Three of the chosen ETC enzymes are multi-protein complexes, and we aimed to choose the gene whose deletion maximally disrupts the function of the whole enzyme. For NADH dehydrogenase I, all subunits are required for the assembly or stability of a functional enzyme [73]. The subunit encoded by the gene nuoB contains the N2 4Fe-4S cluster, which may play a role in the proton translocation activity of the enzyme [74]. For cytochrome bd-I oxidase, although both subunits are required for binding of the heme b595 and heme d components of cytochrome bd-I, subunit II encoded by gene cydB binds a structural ubiquinone-8 cofactor that may have a role in the dimer assembly [75]. Similarly, deletion of each gene in the cyo operon results in a nonfunctional enzyme, yet we chose to disrupt cyoB because it encodes subunit I, which is involved in proton translocation [76].
The four single-ETC-gene-knockout and four double-ETC-gene-knockout strains were constructed with the P1 phage transduction method [77], using E. coli K-12 MG1655 (ATCC 700926) as the recipient strain. Keio collection strains were used as donor strain for the generation of gene knockout cassettes containing a kanamycin resistance marker [78]. Knock-outs were confirmed by PCR and DNA resequencing (S3 Table).
E. coli phenotype characterization
Characterizations were performed fully aerated, at 37°C, in 15 mL working volume tubes containing M9 glucose medium, as described in LaCroix et al. [34]. Cultures were initially inoculated from frozen glycerol stocks, and grown overnight. Physiological adaptation was achieved by growing exponentially over 2 passages for 5 to 10 generations. Cultures were then passaged to a fresh tube, and spectrophotometer optical density (OD) readings were periodically taken at a wavelength of 600 nanometers (Thermo Fisher Scientific, Waltham, MA) until stationary phase was reached.
Samples were filtered through a 0.22 micrometer filter (MilliporeSigma, Burlington, MA) at the same time OD measurements were taken, and the filtrate was analyzed for glucose concentrations using a high-performance liquid chromatography system (Agilent Technologies, Santa Clara, CA) with an Aminex HPX-87H column (Bio-Rad Laboratories, Hercules, CA). Glucose uptake rates in exponential growth were determined by best-fit linear regression of glucose concentrations versus cell dry weights, multiplied by growth rates over the same sample range.
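The uptake rate calculation can be sketched as follows. It assumes that OD readings have already been converted to cell dry weight, and that the growth rate is taken as the slope of ln(biomass) against time; neither step is spelled out in the text. The time course below is synthetic and the function names are hypothetical.

```python
import math


def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def glucose_uptake_rate(times_h, biomass_gdw_per_l, glucose_mmol_per_l):
    """Return (growth rate in 1/h, glucose uptake rate in mmol/gDW/h)."""
    # Assumption: growth rate from a log-linear fit of biomass over time.
    mu = linear_slope(times_h, [math.log(b) for b in biomass_gdw_per_l])
    # Slope of glucose concentration versus cell dry weight (mmol/gDW);
    # it is negative because glucose is consumed as biomass accumulates.
    slope = linear_slope(biomass_gdw_per_l, glucose_mmol_per_l)
    return mu, -slope * mu


# Synthetic exponential-phase data generated with mu = 0.7 and q = 8.0.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
x0 = 0.01
biomass = [x0 * math.exp(0.7 * t) for t in times]
glucose = [22.2 - (8.0 / 0.7) * (x - x0) for x in biomass]

mu_fit, q_fit = glucose_uptake_rate(times, biomass, glucose)
print(mu_fit, q_fit)
```

Because the synthetic data are noise-free, the regression recovers the growth rate and uptake rate that were used to generate them.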
The oxygen uptake rate of each aerobic culture was determined by measuring the rate of dissolved oxygen depletion in an enclosed respirometer chamber using a YSI 5300A Biological Oxygen Monitor System with Clark-type polarographic oxygen probes (Cole-Parmer Instruments, Vernon Hills, IL).
DNA resequencing
To determine the mutations that emerged during adaptive laboratory evolution of the pgi-deficient E. coli strain, growth-improved clones along the ALE trajectory were isolated and grown in M9 minimal medium supplemented with 4 g/L glucose. Cells were then harvested during exponential growth, and genomic DNA was extracted using a KingFisher Flex Purification system previously validated for the high-throughput platform mentioned below [79]. Shotgun sequencing libraries were prepared using a miniaturized version of the Kapa HyperPlus Illumina-compatible library prep kit (Kapa Biosystems). DNA extracts were normalized to 5 ng total input per sample using an Echo 550 acoustic liquid handling robot (Labcyte Inc), and 1/10 scale enzymatic fragmentation, end-repair, and adapter-ligation reactions were carried out using a Mosquito HTS liquid-handling robot (TTP Labtech Inc). Sequencing adapters were based on the iTru protocol [80], in which short universal adapter stubs were ligated first and sample-specific barcoded sequences were then added in a subsequent PCR step. Amplified and barcoded libraries were quantified using a PicoGreen assay and pooled in approximately equimolar ratios before being sequenced on an Illumina HiSeq 4000 instrument.
RNA-Seq data acquisition and analysis
Total RNA was sampled from duplicate cultures. All strains were grown in M9 minimal medium supplemented with 4 g/L glucose. 3 mL of cell broth (taken at OD600 ∼ 0.6) was immediately added to 2 volumes of Qiagen RNAprotect Bacteria Reagent (6 mL), vortexed for 5 seconds, incubated at room temperature for 5 min, and immediately centrifuged for 10 min at 17,500 RPM. The supernatant was decanted, and the cell pellet was stored at -80°C. Cell pellets were thawed and incubated with Readylyse Lysozyme, SuperaseIn, Protease K, and 20% SDS for 20 minutes at 37°C. Total RNA was isolated and purified using RNeasy Plus Mini Kit (Qiagen) columns following vendor procedures. An on-column DNase treatment was performed for 30 minutes at room temperature. RNA was quantified using a spectrophotometer (NanoDrop 1000, Thermo Fisher Scientific, Waltham MA), and quality was assessed by running RNA electrophoresis on the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara CA). The rRNA was removed using the Illumina Ribo-Zero rRNA Removal Kit (Gram-Negative Bacteria). The Stranded RNA-Seq Kit (Kapa Biosystems) was used following the manufacturer’s protocol to create sequencing libraries with an average insert length of approximately 300 bp. Libraries were sequenced on an Illumina HiSeq 4000 instrument.
Raw sequencing reads were obtained as described above, and mapped to the reference genome NC_000913.3 using Bowtie 2.3.4.3 [81] with the following options “-X 1000 -N 1 -3 3”. Transcript abundance was quantified using summarizeOverlaps from the R GenomicAlignments package, with the following options “mode=“IntersectionStrict”, singleEnd = FALSE, ignore.strand = FALSE, preprocess.reads = invertStrand” [82]. Transcripts per Million (TPM) was calculated by DESeq2, and log-transformed TPM (log2(TPM + 1)), referred to as log-TPM, was used for the downstream analysis. The log-TPM values of the two biological replicates were highly correlated (R2 > 0.97), except for the ΔndhΔcydB mutant (R2 = 0.91). The uncertainty for the ΔndhΔcydB mutant might come from a partial knockout in one of the replicates, which showed relatively high expression of the cydB gene. We considered the aero-type assignment and other quantifications for this mutant to be less reliable than those for the other strains.
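For readers unfamiliar with the log-TPM transformation, a minimal sketch is given below. It uses the standard definition of TPM (length-normalized counts rescaled to sum to one million) written in plain Python, and is not the R-based route used in this work; the gene names, read counts, and gene lengths are hypothetical.

```python
import math


def tpm(counts, lengths_bp):
    """Transcripts per million from read counts and gene lengths."""
    # Reads per base for every gene.
    rates = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    total = sum(rates.values())
    # Rescale so that the values sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


def log_tpm(tpm_values):
    """log2(TPM + 1); the pseudocount keeps unexpressed genes at 0."""
    return {gene: math.log2(v + 1.0) for gene, v in tpm_values.items()}


# Hypothetical read counts and gene lengths for three genes.
counts = {"geneA": 100, "geneB": 300, "geneC": 0}
lengths = {"geneA": 1000, "geneB": 1500, "geneC": 900}

tpm_values = tpm(counts, lengths)
log_values = log_tpm(tpm_values)
print(tpm_values)
print(log_values)
```

The pseudocount of 1 is what allows genes with zero reads to be retained in the correlation and clustering analyses.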
Principal component analysis (PCA) of the log-TPM showed that the first four components explained 84% of the variation in the expression profile. The first principal component was highly correlated with exchange rates such as the glucose and oxygen uptake rates and the acetate production rate, and the second with growth rate. These components, although highly explanatory, were enriched in gene clusters (e.g., chemotaxis, flagellum biosynthesis, amino acid metabolism, and sugar transport) that were not specific to the conditions and phenotypes of interest. In contrast, the fourth component, explaining 5.4% of the overall variation, was highly enriched in genes involved in the TCA cycle, anaerobic respiration, and ETC activity. Consequently, we selected the genes most representative of aero-type for subsequent hierarchical clustering analysis. First, we calculated the Spearman correlation between five phenotypic parameters (μ, Y, qglc, , qac) and our aero-type definition in the sampling simulations. Only qac (Spearman correlation = -0.9; P-value = 4e-143) and Y (Spearman correlation = 0.87; P-value = 4e-120) showed significant correlation. Then, we computed the Pearson correlation between log-TPM and experimentally measured phenotypic parameters, and selected genes that were highly correlated with qac and Y (P-value < 0.01), but not with μ. This process resulted in a set of 391 genes, which were used to generate the clustering diagram shown in Fig 3B. As expected, this set was enriched in genes involved in oxidative phosphorylation (17 out of 94, one-sided binomial test P-value = 0.004) and the TCA cycle (7 out of 27, one-sided binomial test P-value = 0.009). The clustering pattern qualitatively resembled that generated using all genes in the expression profile (4314 genes in total), yet it maximized the signal of interest for easier analysis and interpretation.
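The enrichment statistics can be illustrated with a one-sided binomial test. The text does not state the null proportion that was used; this sketch assumes it is the fraction of all profiled genes that ended up in the selected set (391 out of 4314), so the resulting values are indicative only and need not reproduce the reported P-values exactly. The function name is hypothetical.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): one-sided enrichment P-value."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
               for i in range(k, n + 1))


# Assumption: background probability that any gene is in the selected set.
background = 391 / 4314

# Oxidative phosphorylation: 17 of 94 pathway genes are in the set.
p_oxphos = binomial_upper_tail(17, 94, background)
# TCA cycle: 7 of 27 pathway genes are in the set.
p_tca = binomial_upper_tail(7, 27, background)

print(f"oxidative phosphorylation: P = {p_oxphos:.4f}")
print(f"TCA cycle: P = {p_tca:.4f}")
```

Under this null, roughly 9% of the genes in any pathway would be expected in the selected set by chance, which is well below the observed fractions for both pathways.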
Supporting information
Acknowledgments
We thank Zachary King and David Heckman for helpful discussions. This research used resources of the National Energy Research Scientific Computing Center, supported by the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
Data Availability
All simulations in this paper are performed (and can be reproduced) using the FoldME model, which is constructed using the COBRApy toolbox for constraint-based modeling and its extensions for ME-models, COBRAme, ECOLIme, and solveME, all publicly available on GitHub (https://github.com/SBRG/cobrame/, https://github.com/SBRG/ecolime, https://github.com/SBRG/solvemepy). All data processed from the simulations are within the manuscript and its Supporting information. The RNA-Seq data used in the manuscript have been deposited in the public database GEO under accession number GSE164236 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE164236).
Funding Statement
This work was funded by the Novo Nordisk Foundation through the Center for Biosustainability at the Technical University of Denmark (NNF10CC1016517, https://www.novonordisk.com/about-novo-nordisk/corporate-governance/foundation.html). BOP received funding from NIH National Institute of General Medical Sciences (NIH R01 GM057089, https://www.nigms.nih.gov/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Wright S. The roles of mutation, inbreeding, crossbreeding, and selection in evolution. Proc Sixth Int Congr Genet. 1932;1.
- 2. Lobkovsky AE, Koonin EV. Replaying the tape of life: quantification of the predictability of evolution. Front Genet. 2012;3:246.
- 3. Achaz G, Rodriguez-Verdugo A, Gaut BS, Tenaillon O. The reproducibility of adaptation in the light of experimental evolution with whole genome sequencing. Adv Exp Med Biol. 2014;781:211–231. doi:10.1007/978-94-007-7347-9_11
- 4. De Visser JAG, Krug J. Empirical fitness landscapes and the predictability of evolution. Nat Rev Genet. 2014;15(7):480–490. doi:10.1038/nrg3744
- 5. Nichol D, Jeavons P, Fletcher AG, Bonomo RA, Maini PK, Paul JL, et al. Steering evolution with sequential therapy to prevent the emergence of bacterial antibiotic resistance. PLoS Comput Biol. 2015;11(9):e1004493. doi:10.1371/journal.pcbi.1004493
- 6. Weinreich DM, Delaney NF, DePristo MA, Hartl DL. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science. 2006;312(5770):111–114. doi:10.1126/science.1123539
- 7. Poelwijk FJ, Kiviet DJ, Weinreich DM, Tans SJ. Empirical fitness landscapes reveal accessible evolutionary paths. Nature. 2007;445(7126):383–386. doi:10.1038/nature05451
- 8. Moradigaravand D, Engelstädter J. The effect of bacterial recombination on adaptation on fitness landscapes with limited peak accessibility. PLoS Comput Biol. 2012;8(10):e1002735. doi:10.1371/journal.pcbi.1002735
- 9. Aguilar-Rodríguez J, Payne JL, Wagner A. A thousand empirical adaptive landscapes and their navigability. Nat Ecol Evol. 2017;1:0045. doi:10.1038/s41559-016-0045
- 10. Blanquart F, Achaz G, Bataillon T, Tenaillon O. Properties of selected mutations and genotypic landscapes under Fisher’s geometric model. Evolution. 2014;68(12):3537–3554. doi:10.1111/evo.12545
- 11. Blanquart F, Bataillon T. Epistasis and the structure of fitness landscapes: are experimental fitness landscapes compatible with Fisher’s geometric model? Genetics. 2016;203(2):847–862.
- 12. Hwang S, Park SC, Krug J. Genotypic complexity of Fisher’s geometric model. Genetics. 2017;206(2):1049–1079. doi:10.1534/genetics.116.199497
- 13. Orr HA. Fitness and its role in evolutionary genetics. Nat Rev Genet. 2009;10(8):531–539. doi:10.1038/nrg2603
- 14. Romero PA, Arnold FH. Exploring protein fitness landscapes by directed evolution. Nat Rev Mol Cell Biol. 2009;10(12):866–876. doi:10.1038/nrm2805
- 15. Li C, Zhang J. Multi-environment fitness landscapes of a tRNA gene. Nat Ecol Evol. 2018;2(6):1025–1032. doi:10.1038/s41559-018-0549-8
- 16. Barrick JE, Lenski RE. Genome dynamics during experimental evolution. Nat Rev Genet. 2013;14(12):827–839. doi:10.1038/nrg3564
- 17. Ibarra R, Fu P, Palsson B, DiTonno J, Edwards J. Quantitative analysis of Escherichia coli metabolic phenotypes within the context of phenotypic phase planes. J Mol Microbiol Biotechnol. 2003;6(2):101–108. doi:10.1159/000076740
- 18. Ndifon W, Plotkin JB, Dushoff J. On the accessibility of adaptive phenotypes of a bacterial metabolic network. PLoS Comput Biol. 2009;5(8):e1000472. doi:10.1371/journal.pcbi.1000472
- 19. O’Brien EJ, Monk JM, Palsson BO. Using genome-scale models to predict biological capabilities. Cell. 2015;161(5):971–987. doi:10.1016/j.cell.2015.05.019
- 20. Edwards J, Palsson B. The Escherichia coli MG1655 in silico metabolic genotype: its definition, characteristics, and capabilities. Proc Natl Acad Sci USA. 2000;97(10):5528–5533. doi:10.1073/pnas.97.10.5528
- 21. Snitkin ES, Dudley AM, Janse DM, Wong K, Church GM, Segrè D. Model-driven analysis of experimentally determined growth phenotypes for 465 yeast gene deletion mutants under 16 different conditions. Genome Biol. 2008;9(9):R140. doi:10.1186/gb-2008-9-9-r140
- 22. Rodrigues JFM, Wagner A. Evolutionary plasticity and innovations in complex metabolic reaction networks. PLoS Comput Biol. 2009;5(12):e1000613. doi:10.1371/journal.pcbi.1000613
- 23. Papp B, Notebaart RA, Pál C. Systems-biology approaches for predicting genomic evolution. Nat Rev Genet. 2011;12(9):591–602. doi:10.1038/nrg3033
- 24. Harrison R, Papp B, Pál C, Oliver SG, Delneri D. Plasticity of genetic interactions in metabolic networks of yeast. Proc Natl Acad Sci USA. 2007;104(7):2307–2312. doi:10.1073/pnas.0607153104
- 25. Beg QK, Vazquez A, Ernst J, de Menezes MA, Bar-Joseph Z, Barabási AL, et al. Intracellular crowding defines the mode and sequence of substrate uptake by Escherichia coli and constrains its metabolic activity. Proc Natl Acad Sci USA. 2007;104(31):12663–12668. doi:10.1073/pnas.0609845104
- 26. Goelzer A, Fromion V, Scorletti G. Cell design in bacteria as a convex optimization problem. Automatica. 2011;47(6):1210–1218. doi:10.1016/j.automatica.2011.02.038
- 27. Mori M, Hwa T, Martin OC, De Martino A, Marinari E. Constrained allocation flux balance analysis. PLoS Comput Biol. 2016;12(6):e1004913. doi:10.1371/journal.pcbi.1004913
- 28. Sánchez BJ, Zhang C, Nilsson A, Lahtvee PJ, Kerkhoven EJ, Nielsen J. Improving the phenotype predictions of a yeast genome-scale metabolic model by incorporating enzymatic constraints. Mol Syst Biol. 2017;13(8):935. doi:10.15252/msb.20167411
- 29. Lerman JA, Hyduke DR, Latif H, Portnoy VA, Lewis NE, Orth JD, et al. In silico method for modelling metabolism and gene product expression at genome scale. Nat Commun. 2012;3(1):929. doi:10.1038/ncomms1928
- 30. Thiele I, Fleming RM, Que R, Bordbar A, Diep D, Palsson BO. Multiscale modeling of metabolism and macromolecular synthesis in E. coli and its application to the evolution of codon usage. PloS One. 2012;7(9):e45635. doi:10.1371/journal.pone.0045635
- 31. O’Brien EJ, Lerman JA, Chang RL, Hyduke DR, Palsson BØ. Genome-scale models of metabolism and gene expression extend and refine growth phenotype prediction. Mol Syst Biol. 2013;9(1):693. doi:10.1038/msb.2013.52
- 32. Sandberg TE, Lloyd CJ, Palsson BO, Feist AM. Laboratory evolution to alternating substrate environments yields distinct phenotypic and genetic adaptive strategies. Appl Environ Microbiol. 2017;83(13):e00410–17.
- 33. Chen K, Gao Y, Mih N, O’Brien EJ, Yang L, Palsson BO. Thermosensitivity of growth is determined by chaperone-mediated proteome reallocation. Proc Natl Acad Sci USA. 2017;114(43):11548–11553. doi:10.1073/pnas.1705524114
- 34. LaCroix RA, Sandberg TE, O’Brien EJ, Utrilla J, Ebrahim A, Guzman GI, et al. Use of adaptive laboratory evolution to discover key mutations enabling rapid growth of Escherichia coli K-12 MG1655 on glucose minimal medium. Appl Environ Microbiol. 2015;81(1):17–30. doi:10.1128/AEM.02246-14
- 35. Sandberg TE, Pedersen M, LaCroix RA, Ebrahim A, Bonde M, Herrgard MJ, et al. Evolution of Escherichia coli to 42°C and subsequent genetic engineering reveals adaptive mechanisms and novel mutations. Mol Biol Evol. 2014;31(10):2647–2662. doi:10.1093/molbev/msu209
- 36. Long CP, Gonzalez JE, Feist AM, Palsson BO, Antoniewicz MR. Fast growth phenotype of E. coli K-12 from adaptive laboratory evolution does not require intracellular flux rewiring. Metab Eng. 2017;44:100–107. doi:10.1016/j.ymben.2017.09.012
- 37. Portnoy VA, Scott DA, Lewis NE, Tarasova Y, Osterman AL, Palsson BØ. Deletion of genes encoding cytochrome oxidases and quinol monooxygenase blocks the aerobic-anaerobic shift in Escherichia coli K-12 MG1655. Appl Environ Microbiol. 2010;76(19):6529–6540. doi:10.1128/AEM.01178-10
- 38. Unden G, Bongaerts J. Alternative respiratory pathways of Escherichia coli: energetics and transcriptional regulation in response to electron acceptors. Biochim Biophys Acta-Bioenergetics. 1997;1320(3):217–234. doi:10.1016/S0005-2728(97)00034-0
- 39. Wang X, Tamiev D, Alagurajan J, DiSpirito AA, Phillips GJ, Hargrove MS. The role of the NADH-dependent nitrite reductase, Nir, from Escherichia coli in fermentative ammonification. Arch Microbiol. 2019;201(4):519–530. doi:10.1007/s00203-018-1590-3
- 40. Bordbar A, Monk JM, King ZA, Palsson BO. Constraint-based models predict metabolic and associated cellular functions. Nat Rev Genet. 2014;15(2):107–120. 10.1038/nrg3643 [DOI] [PubMed] [Google Scholar]
- 41. Pfeiffer T, Schuster S, Bonhoeffer S. Cooperation and competition in the evolution of ATP-producing pathways. Science. 2001;292(5516):504–507. 10.1126/science.1058079 [DOI] [PubMed] [Google Scholar]
- 42. Pfeiffer T, Bonhoeffer S. Evolutionary consequences of tradeoffs between yield and rate of ATP production. Z Phys Chem. 2002;216(1):51–63. [Google Scholar]
- 43. Chen Y, Nielsen J. Energy metabolism controls phenotypes by protein efficiency and allocation. Proc Natl Acad Sci USA. 2019;116(35):17592–17597. 10.1073/pnas.1906569116 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44. Cheng C, O’brien EJ, McCloskey D, Utrilla J, Olson C, LaCroix RA, et al. Laboratory evolution reveals a two-dimensional rate-yield tradeoff in microbial metabolism. PLoS Comput Biol. 2019;15(6):e1007066 10.1371/journal.pcbi.1007066 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45. Monod J. The growth of bacterial cultures. Ann Rev Microbiol. 1949;3(1):371–394. 10.1146/annurev.mi.03.100149.002103 [DOI] [Google Scholar]
- 46. Lele U, Watve M. Bacterial growth rate and growth yield: is there a relationship. In: Proc. Indian Natn. Sci. Acad. vol. 80; 2014. p. 537–546. 10.16943/ptinsa/2014/v80i3/55129 [DOI] [Google Scholar]
- 47. Lipson DA. The complex relationship between microbial growth rate and yield and its implications for ecosystem processes. Front Microbiol. 2015;6:615.
- 48. Molenaar D, Van Berlo R, De Ridder D, Teusink B. Shifts in growth strategies reflect tradeoffs in cellular economics. Mol Syst Biol. 2009;5(1):323. 10.1038/msb.2009.82
- 49. Mori M, Marinari E, De Martino A. A yield-cost tradeoff governs Escherichia coli’s decision between fermentation and respiration in carbon-limited growth. NPJ Syst Biol Appl. 2019;5(1):16. 10.1038/s41540-019-0093-4
- 50. Flamholz A, Noor E, Bar-Even A, Liebermeister W, Milo R. Glycolytic strategy as a tradeoff between energy yield and protein cost. Proc Natl Acad Sci USA. 2013;110(24):10039–10044. 10.1073/pnas.1215283110
- 51. Basan M, Hui S, Okano H, Zhang Z, Shen Y, Williamson JR, et al. Overflow metabolism in Escherichia coli results from efficient proteome allocation. Nature. 2015;528(7580):99–104. 10.1038/nature15765
- 52. Szenk M, Dill KA, de Graff AM. Why do fast-growing bacteria enter overflow metabolism? Testing the membrane real estate hypothesis. Cell Syst. 2017;5(2):95–104. 10.1016/j.cels.2017.06.005
- 53. Holms H. Flux analysis and control of the central metabolic pathways in Escherichia coli. FEMS Microbiol Rev. 1996;19(2):85–116. 10.1111/j.1574-6976.1996.tb00255.x
- 54. Vemuri GN, Altman E, Sangurdekar D, Khodursky AB, Eiteman MA. Overflow metabolism in Escherichia coli during steady-state growth: transcriptional regulation and effect of the redox ratio. Appl Environ Microbiol. 2006;72(5):3653–3661. 10.1128/AEM.72.5.3653-3661.2006
- 55. Nanchen A, Schicker A, Sauer U. Nonlinear dependency of intracellular fluxes on growth rate in miniaturized continuous cultures of Escherichia coli. Appl Environ Microbiol. 2006;72(2):1164–1172. 10.1128/AEM.72.2.1164-1172.2006
- 56. Valgepea K, Adamberg K, Nahku R, Lahtvee PJ, Arike L, Vilu R. Systems biology approach reveals that overflow metabolism of acetate in Escherichia coli is triggered by carbon catabolite repression of acetyl-CoA synthetase. BMC Syst Biol. 2010;4(1):166–178. 10.1186/1752-0509-4-166
- 57. Renilla S, Bernal V, Fuhrer T, Castaño-Cerezo S, Pastor JM, Iborra JL, et al. Acetate scavenging activity in Escherichia coli: interplay of acetyl–CoA synthetase and the PEP–glyoxylate cycle in chemostat cultures. Appl Microbiol Biotechnol. 2012;93(5):2109–2124. 10.1007/s00253-011-3536-4
- 58. Ibarra RU, Edwards JS, Palsson BO. Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature. 2002;420(6912):186–189. 10.1038/nature01149
- 59. Fong SS, Marciniak JY, Palsson BØ. Description and interpretation of adaptive evolution of Escherichia coli K-12 MG1655 by using a genome-scale in silico metabolic model. J Bacteriol. 2003;185(21):6400–6408. 10.1128/JB.185.21.6400-6408.2003
- 60. Fong SS, Palsson BØ. Metabolic gene–deletion strains of Escherichia coli evolve to computationally predicted growth phenotypes. Nat Genet. 2004;36(10):1056–1058. 10.1038/ng1432
- 61. Fong SS, Burgard AP, Herring CD, Knight EM, Blattner FR, Maranas CD, et al. In silico design and adaptive evolution of Escherichia coli for production of lactic acid. Biotechnol Bioeng. 2005;91(5):643–648. 10.1002/bit.20542
- 62. Fong SS, Nanchen A, Palsson BO, Sauer U. Latent pathway activation and increased pathway capacity enable Escherichia coli adaptation to loss of key metabolic enzymes. J Biol Chem. 2006;281(12):8024–8033. 10.1074/jbc.M510016200
- 63. Latif H, Sahin M, Tarasova J, Tarasova Y, Portnoy VA, Nogales J, et al. Adaptive evolution of Thermotoga maritima reveals plasticity of the ABC transporter network. Appl Environ Microbiol. 2015;81(16):5477–5485. 10.1128/AEM.01365-15
- 64. Sandberg TE, Long CP, Gonzalez JE, Feist AM, Antoniewicz MR, Palsson BO. Evolution of E. coli on [U-13C] glucose reveals a negligible isotopic influence on metabolism and physiology. PLoS One. 2016;11(3):e0151130. 10.1371/journal.pone.0151130
- 65. Brunk E, Mih N, Monk J, Zhang Z, O’Brien EJ, Bliven SE, et al. Systems biology of the structural proteome. BMC Syst Biol. 2016;10(1):26. 10.1186/s12918-016-0271-6
- 66. Wattam AR, Davis JJ, Assaf R, Boisvert S, Brettin T, Bun C, et al. Improvements to PATRIC, the all-bacterial bioinformatics database and analysis resource center. Nucleic Acids Res. 2017;45(D1):D535–D542. 10.1093/nar/gkw1017
- 67. Galardini M, Koumoutsi A, Herrera-Dominguez L, Varela JAC, Telzerow A, Wagih O, et al. Phenotype inference in an Escherichia coli strain panel. eLife. 2017;6:e31035. 10.7554/eLife.31035
- 68. Monk JM, Lloyd CJ, Brunk E, Mih N, Sastry A, King Z, et al. iML1515, a knowledgebase that computes Escherichia coli traits. Nat Biotechnol. 2017;35(10):904–908. 10.1038/nbt.3956
- 69. Orr HA. The distribution of fitness effects among beneficial mutations. Genetics. 2003;163(4):1519–1526.
- 70. Brajesh R, Dutta D, Saini S. Distribution of fitness effects of mutations obtained from a simple genetic regulatory network model. Sci Rep. 2019;9(1):1–11.
- 71. Soskine M, Tawfik DS. Mutational effects and the evolution of new protein functions. Nat Rev Genet. 2010;11(8):572–582. 10.1038/nrg2808
- 72. Heckmann D, Lloyd CJ, Mih N, Ha Y, Zielinski DC, Haiman ZB, et al. Machine learning applied to enzyme turnover numbers reveals protein structural correlates and improves metabolic models. Nat Commun. 2018;9(1):5252. 10.1038/s41467-018-07652-6
- 73. Schneider D, Pohl T, Walter J, Dörner K, Kohlstädt M, Berger A, et al. Assembly of the Escherichia coli NADH: ubiquinone oxidoreductase (complex I). Biochim Biophys Acta-Bioenergetics. 2008;1777(7-8):735–739. 10.1016/j.bbabio.2008.03.003
- 74. Hellwig P, Scheide D, Bungert S, Mäntele W, Friedrich T. FT-IR spectroscopic characterization of NADH: ubiquinone oxidoreductase (complex I) from Escherichia coli: oxidation of FeS cluster N2 is coupled with the protonation of an aspartate or glutamate side chain. Biochemistry. 2000;39(35):10884–10891. 10.1021/bi000842a
- 75. Theßeling A, Rasmussen T, Burschel S, Wohlwend D, Kägi J, Müller R, et al. Homologous bd oxidases share the same architecture but differ in mechanism. Nat Commun. 2019;10(1):1–7. 10.1038/s41467-019-13122-4
- 76. Thomas JW, Puustinen A, Alben JO, Gennis RB, Wikstrom M. Substitution of asparagine for aspartate-135 in subunit I of the cytochrome bo ubiquinol oxidase of Escherichia coli eliminates proton-pumping activity. Biochemistry. 1993;32(40):10923–10928. 10.1021/bi00091a048
- 77. Thomason LC, Costantino N, Court DL. E. coli genome manipulation by P1 transduction. Curr Protoc Mol Biol. 2007;79(1):1.17.1–1.17.8.
- 78. Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol. 2006;2(1):2006.0008. 10.1038/msb4100050
- 79. Marotz C, Amir A, Humphrey G, Gaffney J, Gogul G, Knight R. DNA extraction for streamlined metagenomics of diverse environmental samples. BioTechniques. 2017;62(6):290–293.
- 80. Glenn TC, Nilsen R, Kieran TJ, Finger JW, Pierson TW, Bentley KE, et al. Adapterama I: universal stubs and primers for thousands of dual-indexed Illumina libraries (iTru & iNext). BioRxiv. 2016; p. 049114.
- 81. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–359. 10.1038/nmeth.1923
- 82. Lawrence M, Huber W, Pages H, Aboyoun P, Carlson M, Gentleman R, et al. Software for computing and annotating genomic ranges. PLoS Comput Biol. 2013;9(8):e1003118. 10.1371/journal.pcbi.1003118
Data Availability Statement
All simulations in this paper were performed, and can be reproduced, using the FoldME model, which is built with the COBRApy toolbox for constraint-based modeling and its ME-model extensions COBRAme, ECOLIme, and solveME, all publicly available on GitHub (https://github.com/SBRG/cobrame/, https://github.com/SBRG/ecolime, https://github.com/SBRG/solvemepy). All data processed from the simulations are within the manuscript and its Supporting Information. The RNA-seq data used in the manuscript have been deposited in the public GEO database under accession number GSE164236 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE164236).