Significance
Bacteria often evolve by copying genes from other strains, a process termed horizontal gene transfer. As a consequence, different strains of the bacterial species Escherichia coli differ substantially in the sets of genes they possess. Here, we use the inferred gene sets of all recent ancestors of 53 E. coli strains to reconstruct the ancestors’ abilities to grow in different nutritional environments. This allows us to infer over 3,000 metabolic innovations in E. coli’s evolutionary history. All innovations arose through the copying (transfer) of only one small piece of DNA from another strain, demonstrating an amazing capacity of E. coli to quickly adapt to new environments.
Keywords: horizontal gene transfer, lateral gene transfer, Escherichia coli, metabolic adaptation, flux balance analysis
Abstract
Even closely related prokaryotes often show an astounding diversity in their ability to grow in different nutritional environments. It has been hypothesized that complex metabolic adaptations—those requiring the independent acquisition of multiple new genes—can evolve via selectively neutral intermediates. However, it is unclear whether this neutral exploration of phenotype space occurs in nature, or what fraction of metabolic adaptations is indeed complex. Here, we reconstruct metabolic models for the ancestors of a phylogeny of 53 Escherichia coli strains, linking genotypes to phenotypes on a genome-wide, macroevolutionary scale. Based on the ancestral and extant metabolic models, we identify 3,323 phenotypic innovations in the history of the E. coli clade that arose through changes in accessory genome content. Of these innovations, 1,998 allow growth in previously inaccessible environments, while 1,325 increase biomass yield. Strikingly, every observed innovation arose through the horizontal acquisition of a single DNA segment less than 30 kb long. Although we found no evidence for the contribution of selectively neutral processes, 10.6% of metabolic innovations were facilitated by horizontal gene transfers on earlier phylogenetic branches, consistent with a stepwise adaptation to successive environments. Ninety-eight percent of metabolic phenotypes accessible to the combined E. coli pangenome can be bestowed on any individual strain by transferring a single DNA segment from one of the extant strains. These results demonstrate an amazing ability of the E. coli lineage to adapt to novel environments through single horizontal gene transfers (followed by regulatory adaptations), an ability likely mirrored in other clades of generalist bacteria.
In many ways, homologous recombination between the strains of a prokaryotic species is analogous to meiotic recombination in eukaryotes: It contributes to the efficient purging of deleterious mutations (1, 2) and brings together beneficial mutations that arose in different genetic backgrounds (i.e., it counters clonal interference) (3). Similar to recombination in eukaryotes, prokaryotic recombination may sometimes break up beneficial combinations of epistatically interacting sequences (1). Crucially, prokaryotic recombination of genomic regions that are only partially homologous facilitates horizontal gene transfer (HGT) between strains, a phenomenon contributing to prokaryotic adaptation (4). The role of recombination in the evolution of Escherichia coli and its relationship to HGT has been studied extensively over the past 70 y (4–11).
With the advent of high-throughput DNA sequencing, comparative genomics all but replaced the comparison of phenotypes as the basis for understanding evolution and natural selection. However, it is the phenotype that natural selection acts upon; to fully appreciate the patterns and driving forces of adaptation, we need to link genotypes to phenotypes on both the genomic scale and an evolutionary timescale. Bacterial metabolism is arguably the most promising model system for such an endeavor. The ability to efficiently metabolize nutrient sources is an essential determinant of bacterial fitness (12), and flux balance analysis (FBA) has been established as a robust and reliable modeling framework for the prediction of this ability (13, 14).
A computational analysis of approximate metabolic models generated automatically from genome sequences suggested that within-species phenotypic divergence is almost instantaneous, whereas divergence between genera is gradual or “clock-like” (12). Accordingly, the genetic distance calculated from multilocus sequence typing data is a weak indicator of how similar two E. coli strains are in terms of the carbon sources they can metabolize (15). How can within-species divergence be so much faster than between-species divergence? The answer likely lies in frequent recombination and the HGT events it facilitates between bacterial strains belonging to the same species (16): A small set of new genes acquired through HGT can potentially lead to drastic phenotypic changes (4, 12).
Horizontally transferred genes that do not provide fitness benefits are likely to be lost quickly, not least because of a mutational bias toward deletions in bacterial genomes (17). This logic suggests that successful HGTs—that is, those events that left their traces in extant genomes—were individually adaptive. A requirement for individually adaptive DNA acquisitions would impose a strong barrier on the emergence of complex phenotypes that require multiple gene acquisitions, because the size of horizontally transferred DNA segments is limited by the mechanisms of cellular DNA uptake (18, 19). For example, DNA transfers by phages (transduction), a major mechanism of HGT in E. coli (20), are limited by the carrying capacity of the phage capsid (18, 21).
Did ancient strains of the E. coli lineage find a way to circumvent the barrier to complex adaptations imposed by the size limit on HGTs? It has been proposed that complex metabolic adaptations may evolve via a neutral exploration of phenotype space (22, 23), hypothesizing that “many additions of individual reactions to a metabolic network will not change a metabolic phenotype until a second added reaction connects the first reaction to an already existing metabolic pathway” (23). However, no empirical data from bacterial metabolism supports this scenario (24); bacterial genomes appear compact and almost devoid of nonfunctional DNA sequences (17).
An alternative explanation for the emergence of complex adaptations was put forward by Szappanos et al. (24), who suggested that metabolic complexity may arise through successive noncomplex adaptations to changing environments. However, the relative roles in bacterial evolution of simple adaptations (proceeding through individually adaptive DNA acquisitions) vs. complex adaptations are currently unknown. What proportion of metabolic innovations in a given bacterial clade was complex (i.e., required multiple independent HGT events)? Did such multiple DNA acquisitions occur in quick succession, or were they spread over long evolutionary time spans, suggesting a string of successive adaptations to stepping-stone environments (24)? And, more fundamentally, how adaptable are generalist bacteria such as E. coli (i.e., how many independent DNA acquisitions are typically required to allow a strain to grow in an environment where it was initially unviable)?
Results
The E. coli Dataset.
We examine these questions by linking genomic and phenotypic evolution in the E. coli clade. Our analyses focus exclusively on changes in the accessory genomes that arose via HGT and gene losses; see the Discussion section for a review of the contribution of other genomic changes. Our dataset consists of 53 E. coli and Shigella strains (SI Appendix, Fig. S1 and Dataset S1), encompassing commensal as well as intestinal and extraintestinal pathogenic strains (25). These strains had been chosen to form a representative sample of the E. coli species (25) and cover the major recognized clades of E. coli (A, B1, B2, D, and E; see SI Appendix, Fig. S2). Because the Shigella strains are nested within the E. coli phylogeny (18, 26), we subsume them hereafter under the term E. coli. Genome-scale metabolic models have previously been reconstructed for these extant strains based on gene annotations, manually curated genome-scale metabolic models of the E. coli K-12 strain and several close relatives, and information obtained from public databases (25).
In a previous publication (18), we identified orthologous groups of genes between the 53 strains and 17 outgroup genomes based on amino acid sequence similarities. We then reconstructed a well-supported maximum-likelihood phylogeny of vertical descent based on the concatenated alignments of 1,334 universal one-to-one orthologs (SI Appendix, Fig. S1). Based on the presence and absence patterns of gene family members across the phylogeny, we inferred the gene content of the 52 ancestral genomes, using the maximum-likelihood algorithm implemented in the web server GLOOME (27). A gene acquisition via HGT was inferred if a gene was present in the derived, but not the ancestral, node of a phylogenetic branch (18).
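The branch-wise inference of gene acquisitions described above can be sketched as a set difference between the gene contents of the two nodes of each branch. This is a minimal illustration only: it assumes the presence/absence calls for ancestral nodes have already been made (in the study, by the maximum-likelihood reconstruction in GLOOME, which this sketch does not reproduce), and all node and gene names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Branch = Tuple[str, str]  # (ancestral node, derived node)


def infer_gains(presence: Dict[str, Set[str]], branches: List[Branch]) -> Dict[Branch, Set[str]]:
    """Gene families inferred as gained on each branch.

    A gene counts as acquired on a branch if it is present in the derived
    node but absent from the ancestral node of that branch.
    """
    return {
        (ancestor, derived): presence[derived] - presence[ancestor]
        for ancestor, derived in branches
    }


# Hypothetical toy tree: root -> anc1 -> {strainA, strainB}
presence = {
    "root": {"g1", "g2"},
    "anc1": {"g1", "g2", "g3"},
    "strainA": {"g1", "g2", "g3", "g4"},
    "strainB": {"g1", "g3"},  # g2 was lost on this branch, nothing gained
}
branches = [("root", "anc1"), ("anc1", "strainA"), ("anc1", "strainB")]
gains = infer_gains(presence, branches)
print(gains[("anc1", "strainA")])  # {'g4'}
```

Gene losses follow symmetrically as `presence[ancestor] - presence[derived]`.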
In the same publication, we found 1,790 gene pairs whose members were repeatedly coacquired via HGT on the same branches of the phylogeny. The genomic distance between such cotransferred genes rarely exceeds 30 kb, indicating that individual genomic acquisitions by the E. coli strains in the present study are restricted to DNA segments of <30 kb. This 30-kb size limit is highly consistent with the size distribution of clusters of cofunctioning E. coli genes (18) and agrees with the size distribution of domesticated prophages in E. coli genomes (21). Although a comparative analysis of E. coli genomes identified “hot” genomic regions with elevated rates of homologous recombination that exceeded 100 kb in size, these appear to result from the superposition of multiple smaller HGT events (28).
Reconstruction of Ancestral E. coli Metabolic Systems.
Here, to track phenotypic innovations in the evolutionary history of the E. coli clade, we first reconstructed the metabolic networks of the 52 ancestral strains based on a consensus annotation of the extant metabolic networks published by Monk et al. (25) and the gene presence and absence data inferred by Pang and Lercher (18) (see Fig. 1 for an overview and SI Appendix, Detailed Materials and Methods for details). We then performed FBA on the ancestral and extant networks, testing their ability to grow in 200,000 randomly generated nutritional environments as well as in 2,418 environments used in previous simulations of E. coli K-12 metabolic adaptation (24) (Dataset S2). Thirty extant and 46 ancestral networks were each able to grow in more than 20% of the environments (SI Appendix, Fig. S3 and Dataset S1). Due to auxotrophies or gaps in essential pathways, the remaining models produced biomass in a small minority of the tested environments (≤0.5%) and were excluded from further analyses.
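For readers unfamiliar with FBA, the core computation can be sketched as a linear program: maximize the biomass flux subject to the steady-state constraint S·v = 0 and to flux bounds that encode which enzymes a genome carries and which nutrients the environment supplies. The three-metabolite network, the uptake limit of 10, and the function name `max_biomass` are hypothetical; the study used genome-scale models and a dedicated FBA implementation (42), not this toy setup. The sketch uses SciPy's `linprog` with the HiGHS backend.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (not one of the E. coli models used in the study).
REACTIONS = ["EX_glc", "EX_ac", "R_glc", "R_ac", "BIOMASS"]
METABOLITES = ["glc", "ac", "precursor"]
S = np.array([
    # EX_glc EX_ac  R_glc  R_ac  BIOMASS
    [1.0,    0.0,  -1.0,   0.0,   0.0],  # glc
    [0.0,    1.0,   0.0,  -1.0,   0.0],  # ac
    [0.0,    0.0,   1.0,   0.5,  -1.0],  # precursor (acetate gives half the yield)
])


def max_biomass(enzymes, environment, uptake_max=10.0, vmax=1000.0):
    """Maximal biomass flux under steady state (S v = 0) and flux bounds.

    enzymes: internal reactions encoded by the genome (all others are blocked).
    environment: nutrients available for uptake via exchange reactions.
    """
    bounds = []
    for rxn in REACTIONS:
        if rxn.startswith("EX_"):
            ub = uptake_max if rxn[3:] in environment else 0.0
        elif rxn == "BIOMASS":
            ub = vmax
        else:
            ub = vmax if rxn in enzymes else 0.0
        bounds.append((0.0, ub))
    c = np.zeros(len(REACTIONS))
    c[REACTIONS.index("BIOMASS")] = -1.0  # linprog minimizes, so negate biomass
    res = linprog(c, A_eq=S, b_eq=np.zeros(len(METABOLITES)),
                  bounds=bounds, method="highs")
    return max(0.0, -res.fun) if res.success else 0.0


print(round(max_biomass({"R_glc", "R_ac"}, {"glc"}), 6))  # 10.0
print(round(max_biomass({"R_glc", "R_ac"}, {"ac"}), 6))   # 5.0
print(round(max_biomass({"R_glc"}, {"ac"}), 6))           # 0.0, no acetate route
```

Evaluating such a function for the enzyme sets of an ancestor and its descendant in each environment yields the viability and yield comparisons on which the innovation calls rest.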
Fig. 1.
Overview of the methodology and main result. We started from the phylogeny of the E. coli genomes studied in refs. 18 and 25. (Left) Based on the genomes of the ancestral strains (18), we reconstructed their metabolic networks ①. For each strain and its immediate ancestor, we performed FBA to estimate viability and biomass yield across a wide range of nutritional environments ②. If the derived strain could grow in a given environment but the ancestor was unviable or produced biomass with much lower yield, we inferred a phenotypic innovation. (Right) For each such innovation, the newly acquired genes responsible for the innovation (red bars) lie within <30 kb of each other on the genome of at least one descendant of the innovating strain. We therefore conclude that these genes were cogained through the horizontal transfer of a single DNA segment (indicated by the single gene-donating phage). In contrast, we found no case in which multiple independent HGT events (e.g., multiple phages) contributed to the same innovation.
As expected, strains that diverged very recently (amino acid sequence divergence <0.01%) tend to grow in the same environments. Beyond those nearly identical strains, however, we find that phenotypic similarity is practically independent of the amino acid sequence divergence of two strains (Spearman’s ρ = 0.0052, P = 0.88; SI Appendix, Fig. S4A), consistent with earlier observations of strong within-species diversification (12, 15). Thus, in contrast to longer evolutionary timescales (12), phenotypic divergence within species does not show clock-like evolution. Given that phenotypic divergence has been proposed to be driven by HGT (4), this independence is consistent with evidence for high rates of within-species recombination relative to point mutations (9).
Most Potential Phenotypic Innovations Require only a Single DNA Transfer.
We first asked how easily potential phenotypic innovations could, in principle, arise via HGT within the E. coli pangenome. To this end, we developed a model of functional HGT based on the extant metabolic networks. We merged the metabolic networks of all E. coli strains into a supermodel. We performed FBA on this supermodel to identify all metabolic phenotypes accessible to the E. coli pangenome. The pangenome-scale supermodel was able to produce biomass in many environments inaccessible to any of the extant strains. By comparing the viability and biomass yield of each extant genome with the supermodel, we identified potential phenotype changes that could be bestowed on a given strain through the addition of reactions from the pangenome. We distinguished “new phenotypes”, defined as the ability of a given strain to produce biomass in an environment where it was previously unable to grow, and “yield-improved phenotypes”, defined as at least a doubling in biomass yield.
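The two innovation types defined above can be sketched as a comparison of FBA-predicted biomass yields between a reference (an extant strain, or an ancestor) and a candidate (the supermodel, or the derived strain). The viability cutoff and all function and environment names are hypothetical; only the "new phenotype" and "at least a doubling" rules come from the text.

```python
from typing import Dict, Optional

VIABILITY_THRESHOLD = 1e-6  # hypothetical cutoff for "produces biomass"


def classify_innovation(reference_yield: float, candidate_yield: float) -> Optional[str]:
    """Label one environment by comparing candidate and reference biomass yields."""
    if candidate_yield <= VIABILITY_THRESHOLD:
        return None                  # candidate cannot grow: no innovation
    if reference_yield <= VIABILITY_THRESHOLD:
        return "new phenotype"       # growth where the reference could not grow
    if candidate_yield >= 2.0 * reference_yield:
        return "yield-improved"      # at least a doubling of biomass yield
    return None


def potential_innovations(strain: Dict[str, float], supermodel: Dict[str, float]) -> Dict[str, str]:
    """Map each environment with an innovation to its type."""
    result = {}
    for env, super_yield in supermodel.items():
        label = classify_innovation(strain.get(env, 0.0), super_yield)
        if label is not None:
            result[env] = label
    return result


strain = {"env1": 0.0, "env2": 0.4, "env3": 0.8}
supermodel = {"env1": 0.6, "env2": 0.9, "env3": 1.0}
print(potential_innovations(strain, supermodel))
# {'env1': 'new phenotype', 'env2': 'yield-improved'}
```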
Approximately 30 kb is the upper size limit for DNA segments successfully acquired via HGT by E. coli strains (18). We thus assumed that a set of metabolic genes could be transferred in a single HGT event from one of the extant genomes (the donor) if the genes reside within a segment of <30 kb in that genome. For each potential new or yield-improved phenotype, we then identified the minimal number of such 30-kb segments from other extant genomes that would have to be jointly transferred to bestow this phenotype on the recipient. We found that only 2.4% of potential new phenotypes and 7.4% of potential yield improvements require the acquisition of multiple DNA segments (i.e., they should be classified as complex adaptations) (Fig. 2).
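For a single donor genome, the minimal number of <30-kb segments needed to contain a given set of genes can be computed greedily: the uncovered gene with the leftmost start opens a new segment, which absorbs every uncovered gene ending within 30 kb of that start. A standard exchange argument shows this greedy choice is optimal. This is a sketch under stated assumptions: the chromosome is treated as linear (real E. coli chromosomes are circular), gene coordinates are hypothetical, and combining segments from several donor genomes, as the full analysis allows, is a harder set-cover-like problem that this sketch does not address.

```python
from typing import List, Tuple

SEGMENT_LIMIT_BP = 30_000  # size limit on a single HGT event, from the text


def min_segments(genes: List[Tuple[int, int]], limit: int = SEGMENT_LIMIT_BP) -> int:
    """Minimal number of segments of span < limit that together contain all genes.

    genes: (start, end) coordinates of the required genes on one donor genome,
    assumed linear.
    """
    remaining = sorted(genes)  # sorted by start coordinate
    count = 0
    while remaining:
        seg_start = remaining[0][0]
        if remaining[0][1] - seg_start >= limit:
            raise ValueError("a single gene exceeds the segment size limit")
        # Keep only genes that do not fit into the segment opened at seg_start.
        remaining = [g for g in remaining if g[1] - seg_start >= limit]
        count += 1
    return count


print(min_segments([(0, 1_000), (5_000, 6_000), (20_000, 21_000)]))    # 1
print(min_segments([(0, 1_000), (10_000, 11_000), (29_500, 30_500)]))  # 2
```

A phenotype would count as complex for a given donor whenever this number exceeds 1.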
Fig. 2.
A small minority of potential phenotypic innovations accessible within the E. coli pangenome require the acquisition of several distinct DNA segments via HGT (i.e., these innovations are complex).
Cross-Strain Phenotype Transfer Depends only Weakly on Sequence Divergence.
If a strain can grow in a certain environment, how likely is it that it can confer this ability to another strain through the transfer of a single DNA segment? To test this, we examined all environments in which one extant strain (the donor) can grow while another (the recipient) cannot. For most donor–recipient pairs, all previously lacking phenotypes can be bestowed on the recipient through the transfer of genes found on a single 30-kb DNA segment of the donor (i.e., through one HGT event); on average, 98.8% of growth abilities of one strain can be transferred to another strain in this way (SI Appendix, Fig. S4B). Although there is a weak negative correlation between phenotype transferability and amino acid divergence between strains (Spearman’s ρ = −0.24, P < 10⁻¹²), the vast majority of phenotypes can be transferred with a single DNA segment between even the most-divergent E. coli strains.
The probability that a DNA segment that confers a given phenotype to one strain confers the same phenotype to another strain decreases slightly with the sequence divergence between the two recipients (Spearman’s ρ = −0.20, P < 10⁻⁸) but remains above 90% regardless of sequence divergence (SI Appendix, Fig. S4C). The potential utility of HGT, however, does not depend on sequence divergence: The probability that the transfer of the genes found on a random 30-kb segment of the donor genome leads to any phenotypic innovation remains at around 24% per donor–recipient pair (considering only segments that contain at least one metabolic gene not present in the recipient; Spearman’s ρ = −0.0064, P = 0.85; SI Appendix, Fig. S4D).
Phenotypic Innovations in the History of E. coli Each Required only a Single DNA Transfer.
How does this picture of possible phenotypic innovations compare with the phenotypic changes actually observed throughout E. coli’s evolutionary history? From FBA simulations on the ancestral and the derived node of each phylogenetic branch, we identified 3,323 phenotypic innovations that arose on individual branches of the E. coli phylogeny (Fig. 1). Of these phenotypic innovations, 1,998 are new phenotypes, whereas 1,325 are yield improvements. For each phenotypic innovation on a given phylogenetic branch, we identified the genes that contributed to the innovation by performing parsimonious FBA (29). As the considered phenotype arose somewhere along the branch, one or more of the contributing genes must have been acquired there. We inferred that these genes were cotransferred on a single DNA segment if they are found within 30 kb of each other on the genome of at least one of the extant descendants of the branch (SI Appendix, Detailed Materials and Methods).
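The cotransfer criterion used above can be sketched as follows: the genes contributing to an innovation are taken as one DNA segment if, on at least one extant descendant of the branch, all of them are present and span <30 kb. The genome names and coordinates are hypothetical, and the chromosome is again treated as linear.

```python
from typing import Dict, Iterable, Tuple

Coords = Dict[str, Tuple[int, int]]  # gene -> (start, end) in bp on one genome


def cotransferred(genes: Iterable[str], descendant_genomes: Dict[str, Coords],
                  limit: int = 30_000) -> bool:
    """True if all genes lie within a span < limit on at least one descendant genome."""
    genes = list(genes)
    if not genes:
        return False
    for coords in descendant_genomes.values():
        if not all(g in coords for g in genes):
            continue  # this descendant no longer carries all contributing genes
        span = max(coords[g][1] for g in genes) - min(coords[g][0] for g in genes)
        if span < limit:
            return True
    return False


descendants = {
    "strainA": {"geneX": (100_000, 101_200), "geneY": (118_000, 119_500)},
    "strainB": {"geneX": (5_000, 6_200), "geneY": (900_000, 901_500)},
}
print(cotransferred(["geneX", "geneY"], descendants))  # True, via strainA
```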
Strikingly, all metabolic innovations that we observed for the E. coli lineages were achieved through the horizontal acquisition of a single DNA segment. Whereas 71% of these segments facilitated the innovation through a single gene, <3% contained more than five relevant genes. Thus, the E. coli clade seems to be completely devoid of complex adaptations, at least on the timescale of individual phylogenetic branches. Our simulations identified at least one new phenotype for 55.8% and one yield improvement for 58.9% of the observed DNA segment acquisitions that involve metabolic genes of known molecular function. Thus, we find potential adaptations behind a majority of HGT events amenable to analysis via FBA, indicating that the simulated set of hypothetical environments overlaps substantially with the environmental landscape encountered by E. coli strains throughout their evolution. Moreover, we identified many more new phenotypes and yield improvements per HGT than expected had randomly chosen DNA segments been transferred (SI Appendix, Fig. S5; binomial tests P < 10⁻¹⁵), indicating that the phenotypic innovations inferred via FBA are indeed correlated with meaningful evolutionary events.
While complex innovations drawn from the E. coli pangenome are expected to be rare (Fig. 2), they are still possible, either as alternatives to single-segment adaptations or in the rare cases in which multiple DNA segments are indeed necessary to evolve a phenotypic innovation. That we did not find a complex HGT scenario behind any of the 3,323 phenotypic innovations we observed indicates that metabolic evolution is unlikely to be a predominantly neutral process in E. coli [P < 10⁻¹⁵ for each type of innovation; one-sided binomial tests comparing the number of observed complex innovations (0) with the number expected if complex and noncomplex innovations were equally likely to be observed (2.4% of 1,998 new phenotypes and 7.4% of 1,325 yield-improved phenotypes, respectively)]. We thus conclude that only DNA segments that are individually adaptive are likely to spread through E. coli populations and that complex E. coli phenotypes rarely or never evolve through the neutral exploration of phenotype space.
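Because zero complex innovations were observed, each one-sided binomial test above reduces to simple arithmetic: if each of n innovations were complex with probability p, the chance of observing none is (1 − p)^n. A minimal sketch using the counts and proportions from the text (the function name is hypothetical):

```python
def p_no_complex(n_innovations: int, p_complex: float) -> float:
    """One-sided binomial tail P(X <= 0) = (1 - p)^n for X ~ Binomial(n, p)."""
    return (1.0 - p_complex) ** n_innovations


p_new = p_no_complex(1998, 0.024)    # new phenotypes
p_yield = p_no_complex(1325, 0.074)  # yield-improved phenotypes
print(f"new phenotypes: {p_new:.1e}; yield improvements: {p_yield:.1e}")
```

Both values fall far below 10⁻¹⁵; `scipy.stats.binomtest(0, n, p, alternative="less").pvalue` computes the same tail probability.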
Complex Innovations Through Stepwise Niche Expansion.
It has been suggested that metabolic evolution may often proceed through adaptations to intermediate environments that act as evolutionary stepping stones, providing reactions that can later be exapted for additional adaptations (24). Over the timescale of individual branches of the E. coli phylogeny, no such successive adaptations are observable. What about larger evolutionary timescales? For each phenotypic innovation observed on a branch of the E. coli phylogeny, we identified the number of DNA segments contributing to this phenotype that were acquired on earlier branches. We found that 10.6% of new phenotypes and 19.0% of yield improvements indeed relied on combining multiple DNA segment transfers since the last common ancestor of the E. coli strains studied here, using gene acquisitions on earlier branches as evolutionary stepping stones (Fig. 3). Examples of such complex innovations are shown in SI Appendix, Figs. S6–S8. Note, however, that each of these apparently complex innovations could alternatively have been bestowed on the ancestor of the successive gene acquisitions in a noncomplex fashion: in every case, at least one of the extant genomes contains a genomic segment of <30 kb that carries all of the genes required for the later innovation.
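The stepping-stone count described above can be sketched as a walk along the lineage leading to the innovating branch: a contributing segment counts as a stepping stone if it was acquired on an earlier branch of that lineage. The branch and segment identifiers, and the convention that segments without a recorded acquisition are ancestral, are assumptions of this illustration.

```python
from typing import Dict, Iterable, List


def earlier_segments(contributing_segments: Iterable[str],
                     acquisition_branch: Dict[str, str],
                     lineage: List[str]) -> int:
    """Count contributing DNA segments acquired before the innovating branch.

    lineage: branches from the root to the innovating branch (last entry).
    Segments missing from acquisition_branch are treated as ancestral
    (present at the root) and are not counted.
    """
    earlier = set(lineage[:-1])
    return sum(1 for seg in contributing_segments
               if acquisition_branch.get(seg) in earlier)


lineage = ["b_root_anc1", "b_anc1_anc2", "b_anc2_strainC"]  # hypothetical
acquired_on = {"seg1": "b_root_anc1", "seg2": "b_anc2_strainC"}
n = earlier_segments(["seg1", "seg2"], acquired_on, lineage)
print(n, "stepwise" if n > 0 else "single-branch")  # 1 stepwise
```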
Fig. 3.
In the E. coli lineage, 10.6% of new phenotypes and 19.0% of yield improvements evolved through two to four successive horizontal DNA acquisitions on distinct branches of the phylogeny. Note that every single one of these apparently complex phenotypic innovations could instead have been bestowed on the immediate ancestor of the successive DNA acquisitions through a single <30-kb DNA segment presently found in one of the other extant E. coli strains.
Discussion
Based on combining comparative genomics and genome-scale metabolic modeling, we conclude that every single phenotypic innovation identified by our methodology was achieved through the acquisition of a single short segment of DNA via HGT. Such a far-reaching statement warrants some discussion.
Our analysis of phenotypic evolution in E. coli focuses on observed phenotypic innovations rather than on observed HGT events. While we additionally found that the majority of HGT events contribute to at least one metabolic phenotypic innovation, we emphasize that we cannot draw any conclusions about the adaptiveness of individual HGTs.
We analyzed only 53 extant genomes, a tiny subset of the total genomic diversity in the E. coli clade. While our strains represent the major E. coli clades (SI Appendix, Fig. S2), the analysis of additional strains would undoubtedly have unearthed additional phenotypic innovations. We cannot know what fraction of these unseen phenotypic innovations were complex (i.e., required the acquisition of multiple DNA segments). We can conclude, however, that complex metabolic innovations in E. coli are exceedingly rare: Our best estimate indicates that fewer than 1 in 3,323 (0.03%) metabolic innovations in E. coli were complex.
Of the 202,418 tested environments, 200,000 represent a largely unbiased sample of minimal nutritional environments, each constructed by choosing a random source each of carbon, nitrogen, phosphorus, and sulfur. Thus, our set of environments is obviously biased against more nutrient-rich environments. However, the main type of evolutionary innovation examined in our manuscript is the emergence of new phenotypes, defined as the ability to grow in a previously inaccessible environment. If an E. coli strain can produce biomass on a subset of the nutrients available in a given nutrient-rich environment, it can also grow in the full environment. Thus, acquiring the ability to grow in a nutrient-rich environment can never be more complex (in the sense of requiring more DNA acquisitions) than adaptation to a minimal subset of the corresponding nutrients. If anything, the restriction to minimal environments biases our analysis toward complex adaptations and is, in that sense, conservative.
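The construction of the random minimal environments can be sketched as drawing one source each of carbon, nitrogen, phosphorus, and sulfur. The candidate nutrient lists below are hypothetical placeholders (the study's actual lists are not given here), and the sketch samples with replacement, so duplicate environments are possible.

```python
import random
from typing import Dict, List

# Hypothetical candidate sources; not the lists used in the study.
SOURCES: Dict[str, List[str]] = {
    "carbon": ["glucose", "acetate", "succinate", "glycerol"],
    "nitrogen": ["ammonium", "glutamine", "urea"],
    "phosphorus": ["phosphate", "glycerol-3-phosphate"],
    "sulfur": ["sulfate", "cysteine"],
}


def random_minimal_environments(n: int, seed: int = 0) -> List[Dict[str, str]]:
    """Draw n environments, each with one randomly chosen source per element."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [
        {element: rng.choice(options) for element, options in SOURCES.items()}
        for _ in range(n)
    ]


envs = random_minimal_environments(5, seed=42)
```

In FBA terms, adding nutrients to an environment only loosens uptake bounds and thus enlarges the feasible flux space, so maximal biomass on a superset environment can never be lower; this is the monotonicity on which the argument above relies.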
We emphasize that we base our analysis on the known or predicted molecular functions of individual genes; if a gene family is known to have multiple molecular functions, we also consider those. While our analysis is, for obvious reasons, biased against phenotypic innovations that rely on currently unknown molecular gene functions, this should not create a bias toward noncomplex innovations. Even if the majority of horizontally acquired genes with unknown function were metabolic, there is no reason to think that they differ systematically in some fundamental way from genes of known molecular functions.
Gene functions are frequently discovered based on the effects of single-gene knockouts, suggesting a possible bias of known functions toward noncomplex phenotypes. However, the functions of metabolic genes are, by definition, “noncomplex”: Complex phenotypes arise not from individual enzymes or transporters, but from their interactions. Importantly, the gene-protein-reaction (GPR) associations in E. coli metabolism are well known also for protein complexes, as evidenced by the accurate prediction of gene knockout effects for genes that require protein complex formation for their function (30, 31). Accordingly, the experimental bias toward the study of individual genes appears unproblematic in the context of our study.
Importantly, we make no assumptions about the processes or physiological functions in which genes or sets of genes are involved. FBA is agnostic of a gene’s or an operon’s physiological role and simply tests all possible ways in which a molecular function can benefit the production of biomass, given the organism’s overall network of metabolic reactions. Thus, when we test for biomass production in our simulations, the physiological function of genes and operons emerges from the molecular interaction of their gene products with other metabolic proteins encoded anywhere in the genome.
FBA identifies a set of reactions that, if activated together, would provide maximal growth. Thus, when identifying a phenotypic innovation, FBA implicitly assumes that regulation is optimally adapted to the environment considered. When a phenotypic innovation occurs—that is, when a given strain first acquires the enzymes and/or transporters required for growth in a new environment—protein expression is unlikely to be initially optimal; on the contrary, genomic DNA acquisitions may disrupt existing regulatory programs. If the strain finds itself in an environment where the innovation is of adaptive value, the FBA-predicted phenotype will be realized only after a period of regulatory adaptation. Laboratory evolution experiments have demonstrated that adaptive regulatory changes arise rapidly in evolving E. coli populations through point mutations or copy number changes (32–35). Thus, additional gene acquisitions are typically not required for regulatory adaptation.
We cannot test whether an appropriate protein expression pattern evolved in reality, whether the innovation was adaptive, or, indeed, whether the strain even experienced any environment in which the phenotype was of relevance. We find that every potentially adaptive phenotypic innovation arose through the acquisition of a single stretch of DNA of <30 kb in length. While only an unknown subset of these phenotypic innovations were truly adaptive, we can still conclude that all adaptive phenotypic innovations, as far as they are discernible from the genomic data and current metabolic modeling technology, arose through individual—and thus individually adaptive—HGT events.
In our analysis, we focused exclusively on HGT as the source of phenotypic innovations. What about the role of homologous sequence changes induced by genomic mutations or homologous recombination (5–8, 10, 11, 36)? Such sequence changes are likely to affect metabolic phenotypes, especially through changes in gene regulation (discussed above), in enzyme kinetics, or in substrate specificity. While changes in enzyme kinetics may affect growth rates in a given environment, they do not influence a strain’s ability to grow or its maximal biomass yield and are thus irrelevant to the phenotypic innovations analyzed here. Our analysis assumes that all members of an orthologous enzyme family in the E. coli strains catalyze the same reaction(s) on the same substrates (i.e., have identical biochemical functions). This assumption is justified because homologous enzymes with different biochemical functions generally show much higher sequence divergence than is observed between the orthologous E. coli sequences analyzed here (37). Thus, it appears likely that the majority of metabolic phenotypic innovations observable within the E. coli clade indeed arose via HGT, even if their full fitness benefits required subsequent regulatory adaptations through genomic mutations.
It would of course be desirable to test the predicted phenotypic innovations experimentally, adding the DNA segment inferred to be responsible for the innovation to an ancestral genome and observing the growth of the ancestral and engineered strains in the corresponding environment (after allowing enough time for the strains to adapt their gene regulation). Although such experimental gene additions are infeasible with the scale of our analysis, the removal (knockout) of genes provides very similar information on model reliability. FBA predictions for metabolic gene essentiality in E. coli have been shown to be extremely accurate, with correct predictions for between 91% and 95% of genes (31), justifying our reliance on this methodology.
In sum, our results provide a comprehensive picture of metabolic evolution across the E. coli species. Because two strains belonging to the E. coli species can easily recombine (36, 38), sequence divergence does not impose a barrier to phenotype transfer within the E. coli pangenome. Quite the opposite: even for the most diverged E. coli genomes in our dataset, there is still a 99% chance that a given phenotype of a donor strain can be transferred through the genes contained on a single DNA segment in the donor’s genome of <30 kb. Thus, if one E. coli strain is already adapted to a given environment, another strain “stranded” in this environment can almost always acquire the necessary metabolic reactions from its relative in a single HGT event. Contrary to earlier suggestions (22, 23), neutral metabolic evolution does not appear necessary, nor is it observable across E. coli strains.
E. coli’s adaptive efficiency is largely a consequence of its well-filled metabolic “toolbox” (39, 40). Based on the mathematical and computational analysis of abstract metabolic network topologies (40), the toolbox model posits that the number of phenotypes supported by a metabolic network, F, scales approximately quadratically with the number of enzymes, N, across different bacterial species: F ∝ N² (see figure 2 in ref. 40). This means that, on average, the number of new enzymes that must be acquired for each new phenotype scales approximately as 1/N (40). In their toolbox analogy, Maslov and colleagues (39, 40) liken enzymes to human tools: The larger one’s toolbox, the more tools one can repurpose for a new task and the fewer additional tools one needs to acquire. For the relatively large E. coli metabolic network [1,336 enzymes for the MG1655 strain (41)], the toolbox model predicts 1.3 enzymes for each new phenotype. For comparison, the endosymbiont Buchnera aphidicola, a species with a much smaller metabolic network [288 enzymes for the APS strain (41)] is predicted to need, on average, 5.2 enzymes. The already high metabolic versatility of E. coli strains means that a large selection of enzymes can be repurposed (exapted) for new phenotypes and that only a few additional genes need to be acquired via HGT. It appears likely that the efficiency of E. coli metabolic evolution through individual DNA transfers within a versatile pangenome is at least in part responsible for the frequent emergence of new pathogenic strains in this clade.
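The inverse scaling of enzymes per phenotype follows from differentiating the toolbox relation. The proportionality constant c is not given in the text and is introduced here only for this short derivation, which assumes an exactly quadratic law:

```latex
F \approx c\,N^{2}
\;\Longrightarrow\;
\frac{\mathrm{d}F}{\mathrm{d}N} \approx 2c\,N
\;\Longrightarrow\;
\frac{\Delta N}{\Delta F} \approx \frac{1}{2c\,N} \propto \frac{1}{N}
```

Under an exactly quadratic law, the enzymes needed per new phenotype would differ between B. aphidicola and E. coli by the factor 1,336/288 ≈ 4.6; the quoted values, 5.2 and 1.3, differ by a factor of 4.0, consistent with the scaling being only approximately quadratic.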
Materials and Methods
A detailed account of the materials and methods used can be found in SI Appendix, Detailed Materials and Methods; for a graphical summary, see Fig. 1. Briefly, we obtained a reliable phylogeny as well as reconstructions of the gene content of ancestral genomes and of HGT events from Pang and Lercher (18). Based on genomic neighborhoods of the extant strains and the previous observation that horizontally cotransferred sets of genes in E. coli genomes almost always lie within 30 kb of each other (18), we identified sets of genes likely to be cotransferred in a single HGT event.
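The following minimal sketch shows one way acquired genes might be grouped into candidate cotransfer sets using the 30 kb criterion. The Python below sorts genes by their coordinate on one extant genome. It then greedily opens a new group whenever a gene lies more than 30 kb from the first gene of the current group. This greedy grouping on linear coordinates is an illustrative assumption, not the procedure described in SI Appendix. Gene names and coordinates are hypothetical.

```python
def cotransfer_groups(genes: dict[str, int], window: int = 30_000) -> list[list[str]]:
    """Greedily partition genes into groups whose coordinates span at most `window` bp.

    `genes` maps a gene name to its coordinate on one extant genome
    (linear coordinates; wrap-around at the origin is ignored here).
    """
    ordered = sorted(genes.items(), key=lambda kv: kv[1])
    groups: list[list[str]] = []
    start = None  # coordinate of the first gene in the current group
    for name, pos in ordered:
        if start is None or pos - start > window:
            groups.append([name])  # too far from the group start: open a new group
            start = pos
        else:
            groups[-1].append(name)
    return groups


acquired = {"geneA": 1_000, "geneB": 18_000, "geneC": 29_000,
            "geneD": 45_000, "geneE": 200_000}
print(cotransfer_groups(acquired))  # [['geneA', 'geneB', 'geneC'], ['geneD'], ['geneE']]
```

A greedy left-to-right pass keeps every group within the window, but it is not guaranteed to find the fewest possible groups on a circular chromosome.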
We assembled a set of universal GPR associations across the metabolic networks of the extant E. coli strains reconstructed by Monk et al. (25). We reconstructed the metabolic networks of the ancestral strains by applying the universal GPR associations to the inferred genomes. An efficient implementation (42) of FBA (13, 14) was used to calculate maximal biomass production rates of the ancestral and extant strains across 202,418 nutritional environments. We defined phenotypic distances as 1 − J, where J is the Jaccard index of the subsets of environments in which each of the two compared strains can grow. We defined phenotypic innovations as cases in which a strain can produce biomass in an environment where its immediate ancestor could not (new phenotypes) or where its biomass yield is at least double that of its immediate ancestor (yield-improved phenotypes). In each case, the innovation occurred through the acquisition of one or more genes via HGT on the phylogenetic branch immediately preceding the innovation; if these genes lie within 30 kb of each other on the genome of one of the innovator’s descendants, we concluded that they were coacquired in a single HGT event.
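The definitions above lend themselves to a short sketch. The Python below makes several assumptions. Maximal biomass yields per environment have already been computed (for example by FBA), and the toy numbers are made up. GPR rules are lowercase `and`/`or` strings of gene IDs. Any yield above a hypothetical cutoff `eps` counts as growth. The function names are illustrative and are not part of the sybil package (42) used in the study.

```python
import ast
from collections.abc import Mapping


def gpr_active(rule: str, genome: set[str]) -> bool:
    """Evaluate a GPR rule such as 'b0001 and (b0002 or b0003)' against a gene set.

    Assumptions: an empty rule marks a reaction without gene association (kept
    active), and gene IDs are valid Python identifiers so ast.parse accepts them.
    """
    if not rule.strip():
        return True

    def ev(node: ast.AST) -> bool:
        if isinstance(node, ast.BoolOp):
            values = [ev(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Name):
            return node.id in genome
        raise ValueError(f"unsupported GPR syntax: {ast.dump(node)}")

    return ev(ast.parse(rule, mode="eval").body)


def growth_set(yields: Mapping[str, float], eps: float = 1e-6) -> set[str]:
    """Environments in which the maximal biomass yield exceeds eps (assumed cutoff)."""
    return {env for env, y in yields.items() if y > eps}


def phenotypic_distance(yields_a: Mapping[str, float], yields_b: Mapping[str, float]) -> float:
    """1 - J, with J the Jaccard index of the two strains' growth environments."""
    a, b = growth_set(yields_a), growth_set(yields_b)
    union = a | b
    if not union:
        return 0.0  # assumed convention when neither strain grows anywhere
    return 1.0 - len(a & b) / len(union)


def classify_innovations(ancestor: Mapping[str, float], strain: Mapping[str, float],
                         eps: float = 1e-6) -> dict[str, list[str]]:
    """Split a strain's environments into new and yield-improved phenotypes."""
    new, improved = [], []
    for env, y in strain.items():
        y_anc = ancestor.get(env, 0.0)
        if y <= eps:
            continue                  # strain does not grow here
        if y_anc <= eps:
            new.append(env)           # immediate ancestor could not grow: new phenotype
        elif y >= 2.0 * y_anc:
            improved.append(env)      # at least double the ancestor's yield
    return {"new": sorted(new), "yield_improved": sorted(improved)}


# Toy maximal biomass yields (arbitrary units) for three hypothetical environments.
ancestor = {"glc": 0.80, "lac": 0.00, "xyl": 0.20}
strain = {"glc": 0.85, "lac": 0.50, "xyl": 0.45}
print(gpr_active("b0001 and (b0002 or b0003)", {"b0001", "b0003"}))  # True
print(phenotypic_distance(ancestor, strain))   # 1 - 2/3
print(classify_innovations(ancestor, strain))  # {'new': ['lac'], 'yield_improved': ['xyl']}
```

Walking the parsed syntax tree, rather than calling `eval`, restricts GPR rules to `and`/`or` combinations of gene IDs and rejects anything else.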
Acknowledgments
We thank Esther Sundermann for preparing Fig. 1 and Balázs Papp, Csaba Pál, and Bill Martin for helpful discussions. This work was supported by DFG Grants CRC 680 (to M.J.L.) and CRC 1310 (to T.Y.P. and M.J.L.) and by Volkswagen Foundation Grant Life 93 043 (to M.J.L.).
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1718997115/-/DCSupplemental.
References
- 1. Felsenstein J. The evolutionary advantage of recombination. Genetics. 1974;78:737–756. doi: 10.1093/genetics/78.2.737.
- 2. Moran NA. Accelerated evolution and Muller’s rachet in endosymbiotic bacteria. Proc Natl Acad Sci USA. 1996;93:2873–2878. doi: 10.1073/pnas.93.7.2873.
- 3. Cooper TF. Recombination speeds adaptation by reducing competition between beneficial mutations in populations of Escherichia coli. PLoS Biol. 2007;5:e225. doi: 10.1371/journal.pbio.0050225.
- 4. Pál C, Papp B, Lercher MJ. Adaptive evolution of bacterial metabolic networks by horizontal gene transfer. Nat Genet. 2005;37:1372–1375. doi: 10.1038/ng1686.
- 5. Tatum EL, Lederberg J. Gene recombination in the bacterium Escherichia coli. J Bacteriol. 1947;53:673–684. doi: 10.1128/jb.53.6.673-684.1947.
- 6. Dykhuizen DE, Green L. Recombination in Escherichia coli and the definition of biological species. J Bacteriol. 1991;173:7257–7268. doi: 10.1128/jb.173.22.7257-7268.1991.
- 7. Guttman DS, Dykhuizen DE. Clonal divergence in Escherichia coli as a result of recombination, not mutation. Science. 1994;266:1380–1383. doi: 10.1126/science.7973728.
- 8. Kowalczykowski SC, Dixon DA, Eggleston AK, Lauder SD, Rehrauer WM. Biochemistry of homologous recombination in Escherichia coli. Microbiol Rev. 1994;58:401–465. doi: 10.1128/mr.58.3.401-465.1994.
- 9. Spratt BG, Hanage WP, Feil EJ. The relative contributions of recombination and point mutation to the diversification of bacterial clones. Curr Opin Microbiol. 2001;4:602–606. doi: 10.1016/s1369-5274(00)00257-5.
- 10. Wirth T, et al. Sex and virulence in Escherichia coli: An evolutionary perspective. Mol Microbiol. 2006;60:1136–1151. doi: 10.1111/j.1365-2958.2006.05172.x.
- 11. Tenaillon O, Skurnik D, Picard B, Denamur E. The population genetics of commensal Escherichia coli. Nat Rev Microbiol. 2010;8:207–217. doi: 10.1038/nrmicro2298.
- 12. Plata G, Henry CS, Vitkup D. Long-term phenotypic evolution of bacteria. Nature. 2015;517:369–372. doi: 10.1038/nature13827.
- 13. Watson MR. Metabolic maps for the Apple-II. Biochem Soc Trans. 1984;12:1093–1094.
- 14. Orth JD, Thiele I, Palsson BO. What is flux balance analysis? Nat Biotechnol. 2010;28:245–248. doi: 10.1038/nbt.1614.
- 15. Sabarly V, et al. The decoupling between genetic structure and metabolic phenotypes in Escherichia coli leads to continuous phenotypic diversity. J Evol Biol. 2011;24:1559–1571. doi: 10.1111/j.1420-9101.2011.02287.x.
- 16. Soucy SM, Huang J, Gogarten JP. Horizontal gene transfer: Building the web of life. Nat Rev Genet. 2015;16:472–482. doi: 10.1038/nrg3962.
- 17. Mira A, Ochman H, Moran NA. Deletional bias and the evolution of bacterial genomes. Trends Genet. 2001;17:589–596. doi: 10.1016/s0168-9525(01)02447-7.
- 18. Pang TY, Lercher MJ. Supra-operonic clusters of functionally related genes (SOCs) are a source of horizontal gene co-transfers. Sci Rep. 2017;7:40294. doi: 10.1038/srep40294.
- 19. Junier I, Rivoire O. Conserved units of co-expression in bacterial genomes: An evolutionary insight into transcriptional regulation. PLoS One. 2016;11:e0155740. doi: 10.1371/journal.pone.0155740.
- 20. Golomidova A, Kulikov E, Isaeva A, Manykin A, Letarov A. The diversity of coliphages and coliforms in horse feces reveals a complex pattern of ecological interactions. Appl Environ Microbiol. 2007;73:5975–5981. doi: 10.1128/AEM.01145-07.
- 21. Bobay LM, Touchon M, Rocha EPC. Pervasive domestication of defective prophages by bacteria. Proc Natl Acad Sci USA. 2014;111:12127–12132. doi: 10.1073/pnas.1405336111.
- 22. Wagner A. Neutralism and selectionism: A network-based reconciliation. Nat Rev Genet. 2008;9:965–974. doi: 10.1038/nrg2473.
- 23. Wagner A. The Origins of Evolutionary Innovations: A Theory of Transformative Change in Living Systems. Oxford Univ Press; Oxford: 2011.
- 24. Szappanos B, et al. Adaptive evolution of complex innovations through stepwise metabolic niche expansion. Nat Commun. 2016;7:11607. doi: 10.1038/ncomms11607.
- 25. Monk JM, et al. Genome-scale metabolic reconstructions of multiple Escherichia coli strains highlight strain-specific adaptations to nutritional environments. Proc Natl Acad Sci USA. 2013;110:20338–20343. doi: 10.1073/pnas.1307797110.
- 26. Pupo GM, Lan R, Reeves PR. Multiple independent origins of Shigella clones of Escherichia coli and convergent evolution of many of their characteristics. Proc Natl Acad Sci USA. 2000;97:10567–10572. doi: 10.1073/pnas.180094797.
- 27. Cohen O, Ashkenazy H, Belinky F, Huchon D, Pupko T. GLOOME: Gain loss mapping engine. Bioinformatics. 2010;26:2914–2915. doi: 10.1093/bioinformatics/btq549.
- 28. Yahara K, et al. The landscape of realized homologous recombination in pathogenic bacteria. Mol Biol Evol. 2016;33:456–471. doi: 10.1093/molbev/msv237.
- 29. Holzhütter HG. The principle of flux minimization and its application to estimate stationary fluxes in metabolic networks. Eur J Biochem. 2004;271:2905–2922. doi: 10.1111/j.1432-1033.2004.04213.x.
- 30. Orth JD, et al. A comprehensive genome-scale reconstruction of Escherichia coli metabolism–2011. Mol Syst Biol. 2011;7:535. doi: 10.1038/msb.2011.65.
- 31. Hartleb D, Jarre F, Lercher MJ. Improved metabolic models for E. coli and Mycoplasma genitalium from GlobalFit, an algorithm that simultaneously matches growth and non-growth data sets. PLoS Comput Biol. 2016;12:e1005036. doi: 10.1371/journal.pcbi.1005036.
- 32. Fong SS, Palsson BO. Metabolic gene-deletion strains of Escherichia coli evolve to computationally predicted growth phenotypes. Nat Genet. 2004;36:1056–1058. doi: 10.1038/ng1432.
- 33. Fong SS, Joyce AR, Palsson BO. Parallel adaptive evolution cultures of Escherichia coli lead to convergent growth phenotypes with different gene expression states. Genome Res. 2005;15:1365–1372. doi: 10.1101/gr.3832305.
- 34. Blount ZD, Barrick JE, Davidson CJ, Lenski RE. Genomic analysis of a key innovation in an experimental Escherichia coli population. Nature. 2012;489:513–518. doi: 10.1038/nature11514.
- 35. Tenaillon O, et al. Tempo and mode of genome evolution in a 50,000-generation experiment. Nature. 2016;536:165–170. doi: 10.1038/nature18959.
- 36. Dixit PD, Pang TY, Studier FW, Maslov S. Recombinant transfer in the basic genome of Escherichia coli. Proc Natl Acad Sci USA. 2015;112:9070–9075. doi: 10.1073/pnas.1510839112.
- 37. Whisstock JC, Lesk AM. Prediction of protein function from protein sequence and structure. Q Rev Biophys. 2003;36:307–340. doi: 10.1017/s0033583503003901.
- 38. Dixit PD, Pang TY, Maslov S. Recombination-driven genome evolution and stability of bacterial species. Genetics. 2017;207:281–295. doi: 10.1534/genetics.117.300061.
- 39. Maslov S, Krishna S, Pang TY, Sneppen K. Toolbox model of evolution of prokaryotic metabolic networks and their regulation. Proc Natl Acad Sci USA. 2009;106:9743–9748. doi: 10.1073/pnas.0903206106.
- 40. Pang TY, Maslov S. A toolbox model of evolution of metabolic pathways on networks of arbitrary topology. PLoS Comput Biol. 2011;7:e1001137. doi: 10.1371/journal.pcbi.1001137.
- 41. Caspi R, et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 2016;44:D471–D480. doi: 10.1093/nar/gkv1164.
- 42. Gelius-Dietrich G, Desouki AA, Fritzemeier CJ, Lercher MJ. Sybil–Efficient constraint-based modelling in R. BMC Syst Biol. 2013;7:125. doi: 10.1186/1752-0509-7-125.