Abstract
A major obstacle to the mapping of genotype-phenotype relationships is pleiotropy, the tendency of mutations to affect seemingly unrelated traits. Pleiotropy has major implications for evolution, development, ageing, and disease. Except for disease data, pleiotropy is almost exclusively estimated from full gene knockouts. However, most deleterious alleles segregating in natural populations do not fully abolish gene function, and the degree to which a polymorphism reduces protein function may influence the number of traits it affects. Utilizing genome-scale metabolic models for Escherichia coli and Saccharomyces cerevisiae, we show that most fitness-reducing full gene knockouts of metabolic genes in these fast-growing microbes have pleiotropic effects, i.e., they compromise the production of multiple biomass components. Alleles of the same metabolic enzyme-encoding gene with increasingly reduced enzymatic function typically affect an increasing number of biomass components. This increasing pleiotropy is often mediated through effects on the generation of currency metabolites such as ATP or NADPH. We conclude that the physiological effects observed in full gene knockouts of metabolic genes will in most cases not be representative for alleles with only partially reduced enzyme capacity or expression level.
Introduction
A gene is pleiotropic if it affects more than one phenotypic trait1,2. A classic example is phenylketonuria, a human disease that is caused by a single gene defect but which affects multiple systems, with symptoms ranging from lighter skin color to mental disorders3. Pleiotropic effects can cause alleles to affect fitness differentially at different ages, a phenomenon believed to be a major cause of aging4–6; indeed, alleles contributing to increased longevity often show reduced fertility and stress tolerance7. Similar antagonistic epistasis may underlie other important biological phenomena such as speciation8 and cooperation9. Understanding the factors that contribute to pleiotropy is of fundamental importance in genetics10–12, evolution13–16, development17,18, as well as in disease19,20 and ageing4. In comparison to its fundamental importance, empirical knowledge of the prevalence and especially on the causal mechanisms of pleiotropy is scarce2,21.
Pleiotropy may be classified according to the types of traits considered22. Molecular gene pleiotropy refers to the number of functions of a gene and its products, e.g., the number of reactions catalyzed by a single enzyme. Developmental pleiotropy describes the genetic and evolutionary interdependence of phenotypic aspects. Finally, selectional pleiotropy refers to the number of separate fitness components affected by mutations to a gene. In this study, we focus on the latter type of pleiotropy.
Experimental studies generally assess pleiotropy through the analysis of gene knockouts23–25. The degree of pleiotropy is then defined as the number of traits affected when a gene becomes fully non-functional. Wang et al.25 analyzed phenotypes of large numbers of yeast, nematode, and mouse mutants. They found that pleiotropy is widespread: on average, yeast gene knockouts affect 8% of the examined traits; for the nematode, the corresponding number is 10%, for the mouse 3% (see also24). The distributions of the degree of pleiotropy appear rather similar across very different study systems, from the skeletal features of mice25 to metabolic systems26–28. Moreover, pleiotropy was found to be modular, such that sets of genes tend to affect the same sets of traits25.
In E. coli, 36% of metabolic reactions are catalyzed by enzymes also involved in other reactions; the same is true for 27% of metabolic reactions in the yeast Saccharomyces cerevisiae28. Pleiotropic effects of mutations that affect enzyme activity can be simulated from genome-scale metabolic models using constraint-based modeling techniques such as flux balance analysis (FBA)29,30. The functional pleiotropy of a metabolic gene can then be defined as the number of biomass components whose maximal production is affected by the gene’s knockout26. Previous studies using this definition found that a metabolic gene’s functional pleiotropy is related to its propensity to form negative epistatic interactions with other metabolic genes26,27.
While full gene knockouts are easily examined experimentally, they may not be representative of the effects of deleterious alleles segregating in natural populations: individual mutations may affect only a subset of all traits influenced by the gene31. Thus, it is important to distinguish between the pleiotropy of the gene and the pleiotropy of individual mutations, especially in evolutionary and clinical contexts. For example, while 4.6% of human SNPs implicated in complex non-Mendelian phenotypes show pleiotropic effects, most of these do not fully abolish protein function32. Experimental studies indicate that mutational pleiotropy tends to be smaller than gene pleiotropy31.
Genome-scale metabolic models allow us to dissect the relationship between gene and mutational pleiotropy in quantitative detail, without being hampered by the detection limits of experimental assays. Does the degree of pleiotropy depend on how severely a given allele of a metabolic gene reduces protein activity, i.e., are the same number of functions affected when protein function or expression is reduced only partially? How modular is metabolic pleiotropy? Currency metabolites, such as ATP and NADPH, are used as cofactors in many otherwise unrelated reactions; it thus appears highly likely that a substantial fraction of metabolic pleiotropy is due to effects on the production of currency metabolites. Is such an effect of currency metabolites on patterns of pleiotropy confirmed by simulated data?
Below, we address these questions by analyzing the metabolic networks of a representative bacterial model system, Escherichia coli, and a corresponding eukaryotic system, the baker’s yeast Saccharomyces cerevisiae. We find that most gene knockouts that impact fitness do so by affecting the production of multiple biomass components, and that the number of affected biomass components typically increases with increasing mutation severity. Pleiotropy is rarely a consequence of multiple molecular gene functions, but is an emergent property of the metabolic network. For many genes, pleiotropy is indeed mediated through their involvement in the generation of currency metabolites.
Results
Estimating pleiotropy from contributions to biomass components within the wildtype flux distribution
We first estimated wildtype flux distributions in the default growth condition for the genome-scale metabolic model of E. coli33 and the yeast S. cerevisiae34 (obtained from https://sourceforge.net/projects/yeast/files/). The maximal biomass production rates were estimated using flux balance analysis (FBA)29,30. For both model systems, we identified the flux distribution compatible with maximal biomass production that had the smallest sum of absolute fluxes, a strategy often termed parsimonious FBA (pFBA), which approximates optimal utilization of limited cellular protein resources35.
To simulate mutations that cause different reductions of protein function or expression and correspond to different deleterious alleles of a metabolic gene, we restricted the maximal flux through all reactions requiring this gene to a fixed percentage of the estimated wildtype flux36, starting from 100% (the wildtype) down to 0% (a full gene knockout) in steps of 0.5%. For each flux reduction, we defined the degree of pleiotropy (referred to simply as “pleiotropy” below) as the number of biomass components whose production was reduced by at least 0.01% compared to the maximal (wildtype) production. Note that with this definition, only genes with pleiotropy ≥2 are pleiotropic, while genes with pleiotropy 0 (no affected biomass component) or pleiotropy 1 (one affected biomass component) are non-pleiotropic.
Flux distributions at maximal biomass production rate are usually not unique35, and so in many cases a flux restriction through one reaction may be compensated by a rerouting of fluxes through alternative pathways. Such rerouting would require the upregulation of the corresponding genes. While it has been observed experimentally that cells can survive many gene deletions in central metabolism without drastic changes in gene expression37, the necessary upregulation of protein expression will not occur spontaneously at least for some pathways38. More importantly, if we are interested in the de facto contribution of a given gene to the production of biomass components, then it is of no consequence if alternative pathways could take over part of this functionality. Thus, when calculating the maximal (wildtype) production rate of individual biomass components as well as when simulating the effects of mutations to a given metabolic gene, we did not allow the redistribution of fluxes to alternative pathways: we allowed only decreases, not increases, of the absolute value of any flux compared to the wildtype flux distribution obtained with pFBA.
Note that experimental studies often employ a pragmatic working definition of pleiotropy that lies somewhere between the definitions of pleiotropy proposed here based on the wild type flux distribution on the one hand and a quantitative measure of essentiality based on an analogous calculation that allows the free redistribution of fluxes. In these studies, pleiotropy is typically estimated as the number of traits with observable phenotypic changes after the gene knockout, but before allowing the strain to adapt to its new genotype. In this case, some fluxes may be rerouted due to enzymes and transporters that are expressed either spuriously or because of other roles they play in wildtype physiology, while other fluxes that require the upregulation of the corresponding enzymes and transporters will not yet be active. Thus, our definition of pleiotropy describes a “worst case scenario”, providing an upper limit on experimentally measured pleiotropy.
Many genes affect the production of multiple biomass components
Pleiotropy varies widely between different genes. Mutations to the majority of genes affect no biomass components in the minimal growth medium assayed, independent of mutation severity (E. coli: 1,067 genes or 78.1%; S. cerevisiae: 687 genes or 75.6%). Among genes contributing to biomass production—and thus fitness—in the wildtype, non-pleiotropic cases are rare: in E. coli, only 54 full-gene knockouts (out of 299 knockouts with fitness contributions, 18.1%) affect exactly one biomass component, while the same is true for only 12 knockouts (out of 222 knockouts with fitness contributions, 5.4%) in S. cerevisiae. Conversely, knockouts of 32 genes in E. coli (10.7% of knockouts with fitness contributions) and 40 genes in S. cerevisiae (18.1% of knockouts with fitness contributions) affect the production of all biomass components. Many of the remaining genes show low degrees of pleiotropy, affecting the production of only a few biomass components; on average, full gene knockouts of fitness-relevant genes affect the production of 20% of biomass components in E. coli and 34% of biomass components in S. cerevisiae (Fig. 1, Table 1).
Table 1.
E. coli c | S. cerevisiae c | ||
---|---|---|---|
Number of biomass components | 50 | 35 | |
Standard model | Pleiotropya | 10.0 (3) | 12.1 (5) |
Essentialitya | 4.6 (2) | 9.3 (2) | |
Free NADPHb | Pleiotropya | 9.9 (3) | 11.1 (4.5) |
Essentialitya | 4.3 (2) | 8.0 (1) |
aIn the FBA calculations, fluxes are either constrained to not exceed the wildtype (WT) fluxes to estimate the de facto contribution of gene products to biomass production (Pleiotropy), or they are allowed to vary freely to assess the number of biomass components for which gene products are essential even after allowing the mutant strain to adapt (Essentiality).
bSolution when allowing unlimited conversion of NADPH to NADP+ cMean (median) number of affected biomass components.
These percentages reflect functional pleiotropy, the de facto contribution of gene products to biomass component production. If, instead, we are interested in the phenotypic effects of gene knockouts after allowing the mutant strain to adapt its physiology to its altered gene content, we must allow free redistributions of fluxes after the gene knockouts. Corresponding simulations show that after adaptation, genes with fitness contributions are, on average, essential for the production of 9.2% of E. coli biomass components and of 26.6% of S. cerevisiae biomass components (Table 1, Supplementary Figure S1). The degree of gene pleiotropy for yeast is substantially higher than previous experimental estimates, which are around 2 (corresponding to 2–11% of considered traits depending on the types of traits analyzed)25; however, experimental estimates of gene pleiotropy tend be downwardly biased due to experimental detection limits2,22.
Pleiotropy is an emergent property of the metabolic network
Pleiotropy can be classified by its origin into type I pleiotropy, caused by multiple molecular functions of a gene product, and type II pleiotropy, caused by multiple physiological consequences of a single molecular function2. Similar distinctions have been made previously using the terms “horizontal” vs. “vertical”10 and “mosaic” vs. “relational”39 pleiotropy. Our model allows us to quantify the relative contributions of these two pleiotropy types. 41.7% of E.coli genes and 40.5% of yeast genes in our metabolic models catalyze multiple reactions. To what extent does this functional diversity cause functional pleiotropy as measured in the number of biomass components affected by a gene knockout? To answer this question, we compared the gene pleiotropy (Fig. 1) to the pleiotropy of individual reactions catalyzed by the gene product. For example, fully abolishing all functions of the purB (b1131) gene, whose gene product catalyzes two distinct biochemical reactions, reduced the production of 18 biomass components. In contrast, blocking only one of the catalyzed reactions results in a pleiotropy estimate of 16, while blocking only the other reaction results in a pleiotropy of 10. Thus, the pleiotropy of the b1131 gene is largely of type II, and is only in small part due to its multiple molecular functions.
This pattern is typical: the maximal pleiotropy arising from blocking only a single out of several reactions catalyzed by the same protein accounts for over 97% of the gene pleiotropy (E. coli 97.4%, yeast 97.6%). These numbers drop only marginally when we consider only gene products that are essential for multiple reactions, to 92.2% in E. coli and to 94.5% in yeast (Supplementary Figure S3). We conclude that the vast majority of metabolic epistasis is of type II, i.e., is an emergent property of the metabolic network rather than a consequence of multiple molecular functions. This finding is consistent with the previous observation that the degree of pleiotropy in yeast is not significantly correlated with the number of molecular gene functions40.
Metabolic networks show significant but low modularity
The relationship between genes and biomass components (traits) can be represented as a bipartite graph, with links connecting genes with affected biomass components. Modules are defined as sets of genes and traits with significantly more within-module than between-module links25. A high degree of modularity thus indicates that pleiotropic genes tend to affect groups of related traits (e.g., chemically related biomass components) rather than random sets of traits. Supplementary Figure S4 shows heatmaps that illustrate the modularity of both metabolic pleiotropy networks. To quantitatively assess the modularity, we used the LP&BRIM algorithm41, resulting in raw modularities of Q = 0.235 for E. coli and Q = 0.197 for S. cerevisiae. Both networks show highly statistically significant modularity: in each case, the modularity of 10,000 randomly rewired networks was always lower than observed for the real pleiotropy network (i.e., P < 0.0001; Supplementary Figure S5).
Following ref.25, we then defined a z-score for modularity (or “scaled modularity”)42 as the difference between the observed modularity and the mean modularity of randomly rewired networks, measured in number of standard deviations. The E. coli pleiotropy network exhibits a scaled modularity of 9.1, while the S. cerevisiae network has a scaled modularity of 4.9, i.e., the modularity of metabolic pleiotropy is about 9 and 5 standard deviations higher than for corresponding random gene-trait networks. These values are surprisingly low: for five different experimental study systems and trait definitions, Wang et al. found a median scaled modularity of 37 (range 34–238). Thus, metabolic pleiotropy networks are less modular than other pleiotropy networks, suggesting that the underlying metabolic network shows more interconnections between the pathways producing different sets of biomass components than the genetic networks underlying other types of traits. Our findings on modularity may be related to the role of currency metabolites, which crosslink the diverse metabolic pathways (see below).
Pleiotropy typically increases with increasing mutation severity
We next examined the pleiotropy of alleles with small-effect mutations, i.e., mutations that reduce enzyme capacity without fully abolishing enzyme function. About 20% of E. coli genes with fitness contributions have constant pleiotropy: small-effect mutations of these genes affect the same number of biomass components as full gene knockouts. In comparison, only 7.7% of yeast genes contributing to fitness exhibit constant pleiotropy.
All other genes contributing to fitness affect an increasing number of biomass components for increasingly deleterious alleles. Figure 2 shows this stepwise increase in pleiotropy for the example of Lipoamide dehydrogenase (gene names: E. coli b0116, S. cerevisiae YFL018C; for additional examples, see Supplementary Figure S6. In both organisms, pleiotropy typically increases in about a dozen steps from weakly to strongly deleterious alleles (Fig. 3; mean number of steps: E. coli 11.6, S. cerevisiae 12.6).
The pleiotropy of the full gene knockout constitutes an upper limit to the number of stepwise increases in pleiotropy. If there was otherwise no systematic relationship between maximal pleiotropy and the number of steps, we would expect the numbers of steps to be uniformly distributed between zero and the pleiotropy of the full knockout. However, the correlation between the number of steps and pleiotropy at full knockout was much stronger than expected from such a relationship (Supplementary Figure S7, Spearman’s ρ = 0.926 (E. coli) and ρ = 0.986 (S. cerevisiae), P < 10−6 in each case from randomizations; see Methods). Thus, genes whose full knockout showed higher metabolic pleiotropy also showed more stepwise increases in pleiotropy for increasingly debilitating mutations.
All genes whose mutations affect the production of at least one biomass component must also affect the overall production of biomass (i.e., in the common interpretation of FBA, fitness). The reverse is not true: a mutation to a gene may affect the maximal production of biomass, but not the production of any individual biomass component. This is a consequence of the algorithm employed to estimate production capabilities for individual biomass components. If we maximize the production of a single compound, then pathways usually concerned with the production of other biomass components can be diverted to the production of this compound. While we find no such genes for S. cerevisiae, this is indeed the case for 3 essential E. coli genes, which encode transporters for acetate (b4067), magnesium/nickel/cobalt (b3816), and calcium/sodium (b3196, an antiporter).
The pleiotropy of most genes is mediated by currency metabolites
We can conceptually partition internal metabolites into currency metabolites—those involved in many reactions, e.g., to provide energy or redox equivalents43—and primary metabolites. A deleterious allele may affect the production of a given biomass component because the mutated gene catalyzes a reaction in a pathway of primary metabolites that directly leads to the component’s production. Conversely, a deleterious allele may affect not the primary metabolites, but the currency metabolites utilized in the component’s production. A list of 14 currency metabolites was obtained from ref.43. Excluding exchange reactions, 753 out of a total of 2,251 reactions (33.5%) in E. coli and 310 out of 3,324 reactions (9.3%) in S. cerevisiae involved at least one of these metabolites.
A substantial fraction of pleiotropy is indeed associated with the generation of currency metabolites: 87.4% of previously pleiotropic genes show reduced pleiotropy when we make metabolites such as ATP, UTP, or NADPH freely available in yeast (Fig. 4). The free availability of ATP alone reduces the degree of pleiotropy of over half of pleiotropic yeast genes. The influence of currency metabolite production on pleiotropy is weaker, yet still substantial in E. coli: here, 55.3% of pleiotropic genes are affected, with NADH making the biggest contribution (over 40%) (Fig. 4).
Involvement in currency metabolite production is an important determinant of the number of biomass components for which a gene knockout is essential even after allowing the mutant strain to adapt its protein expression to the altered gene content of its genome. This contribution is particularly striking in yeast: for over half of the tested currency metabolites, free availability reduces the number of biomass components for which a gene is essential for almost half of the genes (Supplementary Figure S8).
Discussion
Using constraint-based simulations of the metabolic models for E. coli and the yeast S. cerevisiae, we have characterized the distributions of pleiotropy. Consistent with earlier computational26–28 and experimental23–25 studies, we found that the knockout of a majority of genes that contribute to fitness has pleiotropic effects. The vast majority of this gene pleiotropy is not caused by multiple molecular functions of the gene product (type I), but is an emergent property of the metabolic network (type II). Pleiotropy is modular, but to a lower degree than estimated experimentally for non-metabolic systems25.
For most pleiotropic genes, pleiotropy increases strongly for alleles with increasingly debilitating effects. Thus, standard measures of pleiotropy based on gene knockout studies are more likely to reflect the maximal degree of mutational pleiotropy of a given gene2,31. Alleles that only knock down protein activity (by reducing enzyme/transporter function or expression level) often affect only a subset of phenotypic traits, with additional traits affected progressively as alleles become more deleterious. Thus, the physiological effect of the full gene knockout will in most cases not be representative for the effects of deleterious alleles that retain some level of enzyme function. This type of effect is also evident from individual medical observations of pleiotropy. For example, some small-effect mutations affecting human SOX9 expression lead to minor skeletal malformations, while the consequences of large-effect mutations can include sex reversals44.
How can we understand the dependence of pleiotropy on the degree to which an allele reduces protein activity? For increasingly deleterious alleles, more and more metabolic resources must be channeled into the compensation of the compromised pathway; as a consequence of this increasing drain of resources, more and more other pathways are affected. Not surprisingly27, we found that the pleiotropy of many genes is mediated through the generation of currency metabolites such as ATP, NADPH, or FADH2. This is true for more than half of the pleiotropic genes in E. coli, and for 87% of pleiotropic genes in yeast.
While the overall patterns of pleiotropy appear qualitatively similar between E. coli and yeast, we found a number of quantitative differences. Compared to E. coli genes, yeast genes (i) showed generally higher pleiotropy and were rarely of pleiotropy 1; (ii) were less likely to have constant pleiotropy; and (iii) were more likely to show reduced pleiotropy when supplied with currency metabolites. Moreover, (iv) the yeast pleiotropy network exhibited lower modularity. In part, these differences may be related to network size. The E. coli metabolic network encompasses substantially more genes overall than the yeast network. However, we constrained network usage to reactions active in the wildtype. In contrast to total network sizes, the active metabolic network of yeast (755 reactions and 184 metabolites) is substantially larger than the active metabolic network of E. coli (462 reactions and 103 metabolites); this difference is consistent with the notion that yeast metabolism is more complex, yet less flexible than E. coli metabolism. While the average number of reactions per metabolite is similar between E. coli (4.49) and yeast (4.10), the lower yeast modularity indicates that reactions more often connect otherwise distant network parts. A role of currency metabolites in such connecting reactions would be consistent with the larger effect of currency metabolite supply on pleiotropy in yeast. In sum, the higher interconnectedness of the yeast pleiotropic network, combined with the larger active metabolic network size, appears to provide more potential for pleiotropic effects.
Pleiotropy is a complex phenomenon: it is not constant, but varies between different alleles of the same gene, and its causes are often indirect. Thus, experimental as well as computational analyses of pleiotropy should move away from focusing on full gene knockouts, and instead consider explicitly the degree to which mutations reduce protein activity. The necessity of a corresponding nuanced view of pleiotropy may be particularly evident in studies of medically relevant mutations, where full knockouts are often lethal, while small-effect mutations may segregate at appreciable frequencies in the human population45.
Materials and Methods
Metabolic models
To simulate Escherichia coli metabolism, we used the metabolic reconstruction iJO136633, encompassing 1,366 metabolic genes associated with 2,251 reactions. For the yeast S. cerevisiae, we used the yeast7.6 model (https://sourceforge.net/projects/yeast)34, accounting for 909 metabolic genes associated with 3,324 reactions. The published models were used without any modifications. The E. coli model contains a growth-independent maintenance energy consumption term (the ATPM reaction), which enforcies a minimal ATPase activity of 3.15 mmol/gDW/h. We utilized the default biomass reactions for E. coli (Ec_biomass_iJO1366_core_53p95M), which comprises 50 essential biomass components (Supplementary Table S1), considering only “substrates” of the biomass reactions and excluding inorganic ions and H2O. For S. cerevisiae, we used the “yeast 5 biomass pseudoreaction”, which comprises 35 essential biomass components (Supplementary Table S2), again considering only “substrates” and excluding inorganic ions and H2O.
Flux distribution constraints derived from wildtype simulations
In order to approximate the de facto contribution of individual metabolic proteins to the production of individual biomass components in vivo, we should only consider flux distributions that are naturally active during growth (biomass production) in the nutritional environment studied, and fluxes should not exceed these wildtype fluxes. We thus first estimate the wildtype flux distribution vWT, by running a flux balance analysis (FBA) with the biomass reaction as the objective function, followed by a minimization of the sum of absolute fluxes at the previously determined maximal biomass production rate (parsimonious FBA35).
When simulating the production of individual biomass components, we constrained all fluxes vi to values between zero and the wildtype flux viWT for this reaction, i.e.,
1 |
Estimating pleiotropy
For each essential biomass component (Supplementary Tables S1 and S2, respectively), we added a new exchange reaction representing its secretion46. As some biomass components may be coupled through the biomass reaction, we allowed the free excretion of all other biomass components when maximizing the production of one selected biomass component (i.e., vj ≥ 0 for all added exchange reactions j).
We then calculated the maximum production of each biomass component by maximizing its exchange reaction flux while enforcing the wildtype flux distribution constraints (Eq. 1). For each metabolic gene, we compared this unperturbed maximal production with the maximal production rate of alleles with increasingly reduced protein activity, simulated by restricting the flux through all reactions catalyzed by the gene to a fixed fraction of the wildtype flux36, which we reduced from 100% to 0% in steps of 0.5%. The flux through a specific reaction was constrained in this way only if the gene-protein-reaction (GPR) mapping contained the affected gene either alone or only in an “AND” relationship (i.e., as an essential part of a protein complex); if the GPR listed the affected gene in an “OR” relationship (i.e., as one of multiple isoenzymes or alternative transporters), the reaction was not affected.
We defined pleiotropy as the number of biomass components whose maximal production was reduced by at least 0.01% compared to the unperturbed state (WT) for the allele considered26. Thus, an allele not involved in the maximal production of any essential biomass component is considered to have pleiotropy 0; an allele that affects the production of exactly one essential biomass component has pleiotropy 1.
Our estimate of pleiotropy reflects the actual contribution of a gene product to biomass formation, based on estimated enzyme and transporter activities in the wildtype. If instead, one is interested in a quantitative measure of essentiality, defined as the number of biomass components affected by a deleterious allele after the mutant strain has been allowed to adapt its physiology to the gene deletion, a different algorithm is more appropriate. In this case, one needs to allow the free redistribution of fluxes after the simulated activity reduction of the protein encoded by the gene in question.
Statistical test for the relationship between pleiotropy at full gene knockout and number of steps
The full knockout (maximal) pleiotropy sets an upper limit to the possible number of pleiotropy steps at decreasing enzyme activity. The null hypothesis is that the number of steps is uniformly distributed between zero and maximal pleiotropy; i.e., the null hypothesis assumes that apart from the upper limit, there is no systematic relationship between maximal pleiotropy and number of steps. We tested this through a randomization protocol, where we constructed 106 datasets with the same maximal pleiotropies, but with step numbers drawn from the corresponding uniform distributions. All random datasets for both the E. coli and the yeast data had Spearman rank correlation coefficients between pleiotropy at full knockout and number of steps that were lower than the observed correlation coefficients. Thus, the empirical P-value was <10−6 for both data sets.
Currency metabolites
In additional analyses, we made several cofactors freely available to study how pleiotropy is associated with the generation of currency metabolites. We did this by adding a balanced biochemical reaction that interconverts the activated and inactivated versions of the cofactor and allowing unlimited flux of this reaction in both directions. For example, to simulate free NADPH, we added the following reversible reaction:
A list of currency metabolites was obtained from ref.43. Supplementary Table S3 lists the currency metabolites and the corresponding exchange reactions as well as the number of reactions utilizing each currency metabolite.
Software used
All simulations were performed in R47 using sybil, a computer library optimized for efficient constraint-based modeling of metabolic networks48. We used IBM ILOG CPLEX as the linear solver, connected to sybil via the cplexAPI R package.
To calculate network modularities, we used the LP&BRIM algorithm (Label Propagation with Bipartite Recursively Induced Modules) implemented in Matlab49.
Electronic supplementary material
Acknowledgements
We thank Balazs Papp and Jonathan Fritzemeier for helpful discussions. This work was supported financially through fellowships by the German Academic Exchange Service (DAAD) to DA and AAD, and through German Research Foundation (DFG) grants to MJL (CRC 680 and CRC 1310).
Author Contributions
D.A. and M.J.L.: conceptualization, methodology, writing. D.A.: investigation, software, validation. A.A.D.: assistance with metabolic modeling and software implementation. M.J.L.: funding acquisition, supervision.
Data Availability
All input files, R scripts, and raw data used to generate the results and figures can be found on github at https://github.com/deyazoubi/pleiotropy-.git. An overview over the individual files is given in the Readme file.
Competing Interests
The authors declare no competing interests.
Footnotes
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Supplementary information accompanies this paper at 10.1038/s41598-018-35092-1.
References
- 1.Stearns FW. One Hundred Years of Pleiotropy: A Retrospective. Genetics. 2011;187:355–355. doi: 10.1534/genetics.110.125021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Wagner GP, Zhang J. The pleiotropic structure of the genotype-phenotype map: the evolvability of complex organisms. Nat Rev Genet. 2011;12:204–213. doi: 10.1038/nrg2949. [DOI] [PubMed] [Google Scholar]
- 3.Paul D. A double-edged sword. Nature. 2000;405:515–515. doi: 10.1038/35014676. [DOI] [PubMed] [Google Scholar]
- 4.Williams GC. Pleiotropy, Natural-Selection, and the Evolution of Senescence. Evolution. 1957;11:398–411. doi: 10.1111/j.1558-5646.1957.tb02911.x. [DOI] [Google Scholar]
- 5.Zwaan BJ. The evolutionary genetics of ageing and longevity. Heredity. 1999;82:589–597. doi: 10.1046/j.1365-2540.1999.00544.x. [DOI] [PubMed] [Google Scholar]
- 6.Moorad JA, Promislow DEL. What can genetic variation tell us about the evolution of senescence? Proceedings of the Royal Society B-Biological Sciences. 2009;276:2271–2278. doi: 10.1098/rspb.2009.0183. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Paaby AB, Schmidt PS. Dissecting the genetics of longevity in Drosophila melanogaster. Fly (Austin) 2009;3:29–38. doi: 10.4161/fly.3.1.7771. [DOI] [PubMed] [Google Scholar]
- 8.Slatkin M. Pleiotropy and Parapatric Speciation. Evolution. 1982;36:263–270. doi: 10.1111/j.1558-5646.1982.tb05040.x. [DOI] [PubMed] [Google Scholar]
- 9.Foster KR, Shaulsky G, Strassmann JE, Queller DC, Thompson CR. Pleiotropy as a mechanism to stabilize cooperation. Nature. 2004;431:693–696. doi: 10.1038/nature02894. [DOI] [PubMed] [Google Scholar]
- 10.Tyler AL, Asselbergs FW, Williams SM, Moore JH. Shadows of complexity: what biological networks reveal about epistasis and pleiotropy. Bioessays. 2009;31:220–227. doi: 10.1002/bies.200800022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Wright, S. Evolution and the genetics of populations: a treatise in four volumes. (University of Chicago Press, 1968).
- 12.Barton NH. Pleiotropic Models of Quantitative Variation. Genetics. 1990;124:773–782. doi: 10.1093/genetics/124.3.773. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Fisher, R. A. The genetical theory of natural selection. (Clarendon Press, 1930).
- 14.Orr HA. Adaptation and the cost of complexity. Evolution. 2000;54:13–20. doi: 10.1111/j.0014-3820.2000.tb00002.x. [DOI] [PubMed] [Google Scholar]
- 15.Waxman D, Peck JR. Pleiotropy and the preservation of perfection. Science (New York, N.Y.) 1998;279:1210–1213. doi: 10.1126/science.279.5354.1210. [DOI] [PubMed] [Google Scholar]
- 16.Otto SP. Two steps forward, one step back: the pleiotropic effects of favoured alleles. Proc Biol Sci. 2004;271:705–714. doi: 10.1098/rspb.2003.2635. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Hodgkin J. Seven types of pleiotropy. Int J Dev Biol. 1998;42:501–505. [PubMed] [Google Scholar]
- 18.Carroll SB. Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell. 2008;134:25–36. doi: 10.1016/j.cell.2008.06.030. [DOI] [PubMed] [Google Scholar]
- 19.Albin RL. Antagonistic pleiotropy, mutation accumulation, and human genetic disease. Genetica. 1993;91:279–286. doi: 10.1007/BF01436004. [DOI] [PubMed] [Google Scholar]
- 20.Brunner HG, van Driel MA. From syndrome families to functional genomics. Nat Rev Genet. 2004;5:545–551. doi: 10.1038/nrg1383. [DOI] [PubMed] [Google Scholar]
- 21.Zhang J, Wagner GP. On the definition and measurement of pleiotropy. Trends Genet. 2013;29:383–384. doi: 10.1016/j.tig.2013.05.002. [DOI] [PubMed] [Google Scholar]
- 22.Paaby AB, Rockman MV. The many faces of pleiotropy. Trends Genet. 2013;29:66–73. doi: 10.1016/j.tig.2012.10.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Dudley Aimée Marie, Janse Daniel Maarten, Tanay Amos, Shamir Ron, Church George McDonald. A global view of pleiotropy and phenotypically derived gene function in yeast. Molecular Systems Biology. 2005;1(1):E1–E11. doi: 10.1038/msb4100004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Wagner GP, et al. Pleiotropic scaling of gene effects and the ‘cost of complexity’. Nature. 2008;452:470–472. doi: 10.1038/nature06756. [DOI] [PubMed] [Google Scholar]
- 25.Wang Z, Liao BY, Zhang JZ. Genomic patterns of pleiotropy and the evolution of complexity. Proceedings of the National Academy of Sciences of the United States of America. 2010;107:18034–18039. doi: 10.1073/pnas.1004666107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Szappanos B, et al. An integrated approach to characterize genetic interaction networks in yeast metabolism. Nat Genet. 2011;43:656–662. doi: 10.1038/ng.846. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Bajic D, Moreno-Fenoll C, Poyatos JF. Rewiring of genetic networks in response to modification of genetic background. Genome Biol Evol. 2014;6:3267–3280. doi: 10.1093/gbe/evu255. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Wang Z, Zhang JZ. Abundant Indispensable Redundancies in Cellular Metabolic Networks. Genome Biology and Evolution. 2009;1:23–33. doi: 10.1093/gbe/evp002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Watson MR. Metabolic Maps for the Apple-II. Biochem Soc T. 1984;12:1093–1094. doi: 10.1042/bst0121093. [DOI] [Google Scholar]
- 30.Orth JD, Thiele I, Palsson BO. What is flux balance analysis? Nature Biotechnology. 2010;28:245–248. doi: 10.1038/nbt.1614. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Stern DL. Evolutionary developmental biology and the problem of variation. Evolution. 2000;54:1079–1091. doi: 10.1111/j.0014-3820.2000.tb00544.x. [DOI] [PubMed] [Google Scholar]
- 32.Sivakumaran S, et al. Abundant Pleiotropy in Human Complex Diseases and Traits. American Journal of Human Genetics. 2011;89:607–618. doi: 10.1016/j.ajhg.2011.10.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Orth JD, et al. A comprehensive genome-scale reconstruction of Escherichia coli metabolism–2011. Mol Syst Biol. 2011;7:535. doi: 10.1038/msb.2011.65. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Aung HW, Henry SA, Walker LP. Revising the Representation of Fatty Acid, Glycerolipid, and Glycerophospholipid Metabolism in the Consensus Model of Yeast Metabolism. Ind Biotechnol (New Rochelle N Y) 2013;9:215–228. doi: 10.1089/ind.2013.0013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Holzhutter HG. The principle of flux minimization and its application to estimate stationary fluxes in metabolic networks. European Journal of Biochemistry. 2004;271:2905–2922. doi: 10.1111/j.1432-1033.2004.04213.x. [DOI] [PubMed] [Google Scholar]
- 36.Xu L, Barker B, Gu ZL. Dynamic epistasis for different alleles of the same gene. Proceedings of the National Academy of Sciences of the United States of America. 2012;109:10420–10425. doi: 10.1073/pnas.1121507109. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Ishii N, et al. Multiple high-throughput analyses monitor the response of E. coli to perturbations. Science (New York, N.Y.) 2007;316:593–597. doi: 10.1126/science.1132067. [DOI] [PubMed] [Google Scholar]
- 38.Shlomi T, Berkman O, Ruppin E. Regulatory on/off minimization of metabolic flux changes after genetic perturbations. Proc Natl Acad Sci USA. 2005;102:7695–7700. doi: 10.1073/pnas.0406346102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Hadorn, E. Developmental genetics and lethal factors. (Methuen; Wiley, 1961).
- 40.He X, Zhang J. Toward a molecular understanding of pleiotropy. Genetics. 2006;173:1885–1891. doi: 10.1534/genetics.106.060269. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Liu, X. & Murata, T. In 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (IEEE, Milan, Italy, 2009).
- 42.Wang Z, Zhang JZ. In search of the biological significance of modular structures in protein networks. Plos Computational Biology. 2007;3:1011–1021. doi: 10.1371/journal.pcbi.0030107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Fritzemeier CJ, Hartleb D, Szappanos B, Papp B, Lercher MJ. Erroneous energy-generating cycles in published genome scale metabolic networks: Identification and removal. PLoS Comput Biol. 2017;13:e1005494. doi: 10.1371/journal.pcbi.1005494. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Cameron FJ, Sinclair AH. Mutations in SRY and SOX9: Testis-determining genes. Human Mutation. 1997;9:388–395. doi: 10.1002/(SICI)1098-1004(1997)9:5<388::AID-HUMU2>3.0.CO;2-0. [DOI] [PubMed] [Google Scholar]
- 45.McKusick-Nathans Institute of Genetic Medicine - Johns Hopkins University (Baltimore MD). Online Mendelian Inheritance in Man (OMIM), http://www.ncbi.nlm.nih.gov/omim/.
- 46.Shlomi T, et al. Systematic condition-dependent annotation of metabolic genes. Genome Research. 2007;17:1626–1633. doi: 10.1101/gr.6678707. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.The R Foundation. The R Project for Statistical Computing, https://www.r-project.org.
- 48.Gelius-Dietrich G, Desouki AA, Fritzemeier CJ, Lercher MJ. Sybil–efficient constraint-based modelling in R. BMC Syst Biol. 2013;7:125. doi: 10.1186/1752-0509-7-125. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Flores CO, Poisot T, Valverde S, Weitz JS. BiMat: a MATLAB package to facilitate the analysis of bipartite networks. Methods Ecol Evol. 2016;7:127–132. doi: 10.1111/2041-210X.12458. [DOI] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
All input files, R scripts, and raw data used to generate the results and figures can be found on github at https://github.com/deyazoubi/pleiotropy-.git. An overview over the individual files is given in the Readme file.