Abstract
System-level metabolic network models enable the computation of growth and metabolic phenotypes from an organism’s genome. In particular, flux balance approaches have been used to estimate the contribution of individual metabolic genes to organismal fitness, offering the opportunity to test whether such contributions carry information about the evolutionary pressure on the corresponding genes. Previous failure to identify the expected negative correlation between such computed gene-loss cost and sequence-derived evolutionary rates in Saccharomyces cerevisiae has been ascribed to a real biological gap between a gene’s fitness contribution to an organism “here and now” and the same gene’s historical importance as evidenced by its accumulated mutations over millions of years of evolution. Here we show that this negative correlation does exist, and can be exposed by revisiting a broadly employed assumption of flux balance models. In particular, we introduce a new metric that we call “function-loss cost”, which estimates the cost of a gene loss event as the total potential functional impairment caused by that loss. This new metric displays significant negative correlation with evolutionary rate, across several thousand minimal environments. We demonstrate that the improvement gained using function-loss cost over gene-loss cost is explained by replacing the base assumption that isoenzymes provide unlimited capacity for backup with the assumption that isoenzymes are completely non-redundant. We further show that this change of the assumption regarding isoenzymes increases the recall of epistatic interactions predicted by the flux balance model at the cost of a reduction in the precision of the predictions. In addition to suggesting that the gene-to-reaction mapping in genome-scale flux balance models should be used with caution, our analysis provides new evidence that evolutionary gene importance captures much more than strict essentiality.
Introduction
Quantitatively assessing the contribution of each gene to the overall fitness of an organism is an ongoing challenge in evolutionary and systems biology [1]. A classical, bioinformatics estimate of this contribution has been the evolutionary rate of the gene in question, which is based on genetic sequence conservation patterns amongst phylogenetically related genes [2–5]. This evolutionary rate metric serves as a historical record, providing a retrospective cumulative quantification of the importance of a gene. In contrast, systems biology methods are able to specifically quantify, for each gene, its current contribution to overall organism fitness by directly measuring [6,7] or estimating [8,9] the fitness defect caused by the removal of that gene. The natural question arises of whether the current contribution of a given gene to organism fitness, i.e. its dispensability, correlates with its historical importance. It is non-trivial whether such a relationship should exist, because the dispensability of any one gene at any set time point may be influenced by many complex factors, including the environmental condition(s) and its interactions with other genes within the genome, whose effects cannot be discerned from evolutionary rate. This question has been previously addressed in the model organism Saccharomyces cerevisiae (budding yeast) [10,11], for which fitness defect scores upon gene deletion have been experimentally measured in a systematic and comprehensive way [6,7,12,13]. Interestingly, a significant negative correlation between gene evolutionary rate and gene dispensability is detectable, although the signal is weak (Spearman’s ρ approx. −0.2).
In addition to the high-throughput experimental techniques used to quantify gene dispensability at the genome scale, constraint-based modeling techniques—such as flux balance analysis (FBA) [14]–may be used to efficiently generate such data in silico [15]. Flux balance models have been shown to successfully recapitulate several experimental observations, including growth phenotypes under various environmental conditions and gene essentiality in select lab conditions [16–18]. However, one of the puzzling failures of FBA techniques has been precisely the lack of even moderate correlation between predicted gene dispensability and evolutionary rate [11]. This lack of correlation has been ascribed to a number of possible reasons, including lack of knowledge about the most relevant environmental conditions to be used in simulations, and the complex condition-dependence of gene essentiality [8,11,13].
Here we present an alternative metric for measuring gene dispensability using FBA, which we call “function-loss cost” (Fig 1, light green arrows). As opposed to the standard “gene-loss cost” (Fig 1, dark orange arrows), our new metric estimates the total cost of a gene’s deletion by integrating the fitness costs of removing each enzymatic function associated with that gene from the FBA model, even if alternative isoenzymes exist for a given reaction. This is in contrast to the standard assumption in FBA models that isoenzymes associated with the same reaction act as completely redundant backups of each other. Using function-loss cost as our measure of gene dispensability, we are able to observe a negative correlation between the impact of gene deletion and gene evolutionary rate, that is significantly stronger than the same correlation calculated using gene-loss cost (Fig 2).
Furthermore, we find that our new treatment of isoenzymes in the model can also be informative in the study of genetic interactions, or epistasis [19]. The ability of FBA to predict the experimentally observed epistatic interaction [20] between any two metabolic gene deletions is changed when treating isoenzymes in this new way, with an increase in the true positive rate but also an increase in the false positive rate. Thus, function-loss cost provides novel insight about enzyme gene dispensability, while simultaneously suggesting that some standard assumptions used in genome scale modeling may not be universally applicable.
Results
Gene-loss cost and evolutionary rate correlate weakly in minimal environments
In prior work, it was established that gene-loss cost, as estimated by flux balance genome-scale models of metabolism, correlates poorly with gene evolutionary rate [11]. These prior calculations had been performed for a large number (approx. 104) of randomly generated combinations of environmentally available metabolites, and using different variants of the FBA objective function (including the standard maximization of biomass production flux [14] and the minimization of metabolic adjustment upon gene deletion [21]). We started by revisiting these results, using a recently updated stoichiometric reconstruction [22,23], a different strategy for choosing a large number of environmental conditions, and independently computed evolutionary rates.
In particular, to impose environmental constraints in our FBA calculations, we generated 1,632 minimal media, each containing a nitrogen and a carbon source, in all possible combinations (see Materials and Methods for details and [24] for use of a similar strategy). Gene-loss costs were calculated across all metabolic enzyme genes and environments, using the standard FBA protocol for gene knockouts (see Material and Methods and [25]). Evolutionary rates for S. cerevisiae metabolic genes were calculated using a modified version of dN/dS from orthologs in five related species spanning a phylogenetic timetable of roughly 10–100 million years (see Materials and Methods and [26]). Our results (Fig 2A, dark orange distribution) show that gene-loss costs weakly anticorrelate with gene evolutionary rate (Spearman’s ρ ranging between −0.28 and −0.03). This anticorrelation is both markedly stronger and closer to experimental results as compared to the FBA calculations in [11].
Notably, in contrast to FBA calculations previously used for this type of analysis, we limit each minimal environment to a single source of carbon and a single source of nitrogen. At the model level, such minimal media strictly enforce a kind of metabolic resource scarcity. In the absence of this scarcity, the FBA model can reroute metabolic fluxes to use alternate resources at zero cost, masking the effect of blocking individual pathways with a deletion. We also took advantage of these minimal environments to test whether or not a particular carbon or nitrogen substrate significantly influenced the anticorrelation between gene evolutionary rate and gene-loss cost. However, we do not observe that any one specific carbon or nitrogen source produces significantly stronger correlations than the other sources (p = 0.07, Wilcoxon rank-sum test adjusted for multiple comparisons). The strongest average anticorrelation for an individual carbon or nitrogen source is ρ = −0.21 for pyruvate which has a standard deviation of 0.03, compared to a mean of ρ = −0.19 and standard deviation of 0.04 across all combinations of carbon and nitrogen sources.
A newly defined function-loss cost has stronger anticorrelation with evolutionary rate
Given the weakness of the correlation observed between FBA-computed gene-loss cost and gene evolutionary rate, we asked ourselves whether any step in the FBA calculation could potentially distort the estimation of the cost of gene deletion. We ended up focusing our attention on the gene-to-reaction mapping, which, in the FBA knockout calculation, translates the deletion of a gene into the corresponding flux constraints that block (potentially multiple) reactions associated with that gene (Fig 1). This mapping, expressed using simple Boolean logic, plays a particularly important role for reactions that are catalyzed by multiple enzymes (isoenzymes) or by enzyme protein complexes (Fig 1). For two isoenzymes catalyzing the same reaction, for example, deletion of one the two enzymes has no effect on the corresponding flux in a traditional FBA knockout calculation, because the other enzyme is assumed to provide full backup functionality (Fig 1B). However, abundant experimental evidence suggests that this backup effect is often limited, or condition-dependent [27–29]. The cumulative effect of this discrepancy in genome scale calculations could be quite significant, given that more than one third of the metabolic enzymes in S. cerevisiae are members of isoenzyme sets (and thus would end up incurring no cost whatsoever under standard FBA knockout calculations). We thus hypothesize that fixing this oversimplification in the assessment of gene-loss cost could have a non-negligible effect on the above-mentioned correlation estimate.
In defining a new score for the functional cost incurred upon gene deletions, we also wanted to take into account the fact that multi-functional enzymes (i.e., enzymes that catalyze more multiple distinct reactions, Fig 1C) may be under more evolutionary pressure to maintain their function(s) than genes performing only a single function, especially if all such functions are essential.
These considerations led us to define a new metric predicting the impact of gene deletions in genome-scale models. In particular, we define the function-loss cost of a gene as the sum of all costs incurred by removing each individual reaction catalyzed by the gene from the network (see also Materials and Methods and Fig 1), with the assumption of zero backup capacity by isoenzymes. The distribution of the newly introduced function-loss cost is substantially different from the distribution of gene-loss cost computed before (S1 and S2 Figs). Notably, for any gene that does not belong to the set of isoenzymes or to the set of multi-functional enzymes, the function-loss cost is identical to the gene-loss cost.
Interestingly, using our new function-loss cost metric as the measure of gene dispensability, we obtain a significantly stronger negative correlation between this measure and gene evolutionary rate than using gene-loss cost (Fig 2A, Wilcoxon signed-rank test p = 2 x 10−240). In fact, the mean anticorrelation between these data (ρ = −0.27) is even stronger than the anticorrelation observed between gene evolutionary rate and experimentally-measured gene essentialities, even though strict gene essentiality prediction accuracy obtained using function-loss cost is reduced relative to the accuracy obtained using gene-loss cost (odds ratio drops from 30 to 7). Note that the distribution of correlations between function-loss costs and evolutionary rates across different environments is similar in shape and with a similarly narrow standard deviation to the distribution previously obtained for gene-loss cost, indicating the recovery of anticorrelation obtained with the function-loss cost is not strongly dependent on nutrient choice.
Isoenzymes play a special role in determining the anticorrelation with evolutionary rate
As a next step in our analysis, we set out to examine the contributions from isoenzymes and multi-functional enzymes to the improved negative correlations. Recalculating the correlation distributions using only the isoenzymes (Fig 2C) shows much weaker correlations between gene-loss cost and evolutionary rate than the same correlations calculated using the whole gene set (Fig 2A). For function-loss cost, restricting to isoenzymes has comparatively little effect. Conversely, for the correlation distributions excluding the isoenzymes (Fig 2B), the distribution using the gene-loss cost is significantly more negative than when using all genes in the model (Wilcoxon’s signed-rank test p = 3 x 10−241). When the correlation distributions are recalculated using only the subset of genes in the model which are multifunctional and are not isoenzymes (Fig 2D) we observe that the function-loss cost correlations are significantly weaker than gene-loss cost (Wilcoxon’s signed-rank test p = 1 x 10−198). From these observations we can conclude that it is the treatment of isoenzymes that is the cause of the stronger anticorrelation with evolutionary rate seen when using function-loss cost compared with gene-loss cost.
As a further investigation of whether the improvement gained from function-loss cost is due to a better accounting of isoenzyme deletion cost, we recomputed the impact of gene deletion in two variant “hybrid-loss cost” ways. First, we computed the gene deletion impact using the gene-loss score for all genes except the multi-functional enzyme genes, for which we use the new function-loss score (we will refer to this schema as hybrid-loss cost 1). Conversely, in a separate calculation (hybrid-loss score 2), we computed the gene deletion impact using the gene-loss score for all genes except the isoenzyme-associated genes, for which we use the new functional-loss score. We find that hybrid-loss score 2 displays a negative correlation very similar to the one observed with the full function-loss score (whereas hybrid-loss score 1 displays a negative correlation similar to gene-loss cost) (S3 Fig). Taken together with the results from Fig 2, this indicates that incorrectly accounting for the effect of isoenzyme deletion has a prominent role in the capacity to discern the relationship between the impact of gene deletion and evolutionary rate. In turn, this suggests that deletion of an isoenzyme is costly, corroborating previous arguments that true redundant functional backup is not evolutionarily sustainable [30].
In order to gather further insight into the relationship between different enzymes in an isoenzyme set, we tested the correlation between function-loss cost and evolutionary rate for different specific choices of enzymes within each set. Specifically, for each isoenzyme set, we identified the enzyme which is most conserved (slowest evolutionary rate), and the one that is least conserved (fastest evolutionary rate). Thus, across all isoenzyme pairs, we could collect a subset of all fast evolving and slow evolving isoenzymes.
Notably, when computing the correlation between function-loss cost and evolutionary rate with the inclusion of slow-evolving isoenzymes only (Fig 3B, light green distribution), we found an average correlation of ρ = −0.38. This correlation is even more negative (Wilcoxon signed-rank test p = 1 x 10−236) than the mean correlation found using the whole gene set (Fig 2A, light green distribution). When similarly selecting for the fastest-evolving isoenzymes from each isoenzyme set (Fig 3A), the mean correlation of ρ = −0.30 is significantly less negative than that when using the slowest-evolving isoenzymes (Wilcoxon signed-rank test p = 4 x 10−230). Prior work had established that different isoenzymes catalyzing the same reaction evolve at different rates, and that this could be interpreted as a signal of subfunctionalization [31,32]. Our analysis reveals for the first time that, in computing the anticorrelation between function-loss score and evolutionary importance, excluding the fast-evolving isoenzymes results in a stronger anticorrelation than excluding the slow-evolving isoenzymes. This suggests that the historical (long-term evolutionary) importance of slow-evolving (i.e. highly conserved) genes carries more information about their experimentally measurable essentiality relative to fast-evolving counterparts.
Modeling isoenzymes as non-redundant increases the number of predicted epistatic interactions
Given that changing how isoenzymes map to reactions in the flux balance model significantly affects the predicted cost of single gene deletions, it is interesting to ask how this change would affect the predicted cost of multiple simultaneous gene deletions. Whether the combined effect of pairs of genetic perturbations is predictable from knowledge of each individual effect constitutes a question with broad implications. In fact, deviations from simple expectations (i.e. epistasis) can significantly affect evolutionary processes [33], and can provide valuable functional information about the underlying system [34]. Previous work has investigated the capacity of FBA models to predict epistatic interactions between pairs of metabolic enzyme genes [35], also motivated by the availability of extensive experimental datasets of genetic interactions in S. cerevisiae [20]. Here, we test the ability of FBA to predict these experimentally derived interactions for two gene-to-reaction mappings. The first mapping—the default in FBA—assumes complete isoenzyme redundancy (gene-loss cost-like), while the second assumes that isoenzymes are completely non-redundant (function-loss cost-like). Our results are scored based on the ability of the model to predict the type of interaction between pairs of genes correctly: synergistic (the double knockout combines to limit cell growth more than expected from the single knockouts, e.g. synthetic lethal interactions), antagonistic (the double knockout has better cell growth than expected), or non-interacting.
Interestingly, by assuming that isoenzymes are completely non-redundant, the flux balance model correctly predicts more experimentally verified genetic interactions between pairs of genes than when assuming that isoenzymes are completely redundant (Fig 4). This increase is not limited to a particular interaction type (synergistic or antagonistic). Using either gene-to-reaction mapping assumption, the predictions are significantly better than random, for both categories of interaction (Fisher’s exact test, p < 0.01). The fact that changing the isoenzyme assumption increases the number of predicted epistatic interactions is perhaps unsurprising, given that, under standard gene-loss cost protocols, an isoenzyme may only be predicted to exhibit epistasis under very narrow circumstances. Namely, (1) the isoenzyme must be one of an isoenzyme pair, (2) the other deletion must be of the partner isoenzyme, and (3) the reaction they catalyze together must incur a cost penalty when blocked. However, although the sensitivity increases after changing the isoenzyme to reaction mapping, there is also a large rise in the number of false positives. Although the recall is improved, the precision is reduced (S5 Fig).
Discussion
We have introduced function-loss cost, a new metric for quantifying the impact of the deletion of a gene based on genome-scale models of metabolism. This metric is similar to previously estimated gene-loss impacts, except for the modification of some of the basic assumptions on how the deletion of a gene translates into reaction flux constraints. The modification that ends up being responsible for recovering the expected correlation between gene deletion impact and evolutionary rate is the assumption on how isoenzyme deletion affects the corresponding reaction flux. While previous calculations assume that each enzyme in a set of isoenzymes can unconditionally perform the function in the absence of the other isoenzymes, the algorithm we use here assumes that deletion of each isoenzyme causes a complete loss of function for the cell. Based also on multiple types of analyses and observations [27,29,36], one would obviously expect the reality to be a complex combination of the above assumptions: different isoenzymes may respond differently to different environmental perturbations, and provide backup to each other to varying degrees. What our results indicate, however, is that—on average—the assumption that each isoenzyme fulfills an essential metabolic role is more consistent with the evolutionary record than the opposite assumption of isoenzymes being unconditionally, individually dispensable. On the other hand, the assumption that function-loss cost makes with respect to how the cost of multifunctional enzymes should be estimated, appears to reduce the strength of the relationship between evolutionary rate and the model predicted loss cost. The approach of taking the sum of the costs of the individual reactions was chosen to try and capture the maximum cost of a gene deletion. In the future it could be worthwhile to investigate other formulations of the cost for the loss of multifunctional enzymes, such as taking the maximum cost of the deletion of a single reaction rather than the sum of the costs of each reaction.
From the perspective of flux balance modeling, our analysis suggests that extra caution should be used when applying the classical gene-to-reaction mapping relationships to estimate the effect of gene loss, especially when using these models to understand evolutionary aspects of metabolism. As to whether our newly suggested way to deal with isoenzyme deletion will be helpful in comparisons with experimental gene deletion studies, this requires additional evaluations.
With respect to epistasis, we have shown that this modification to the assumption of how isoenzymes map to reactions results in more correctly predicted genetic interactions but also in more false positives. In prior calculations, using the standard gene-to-reaction mapping (Fig 1B), it would have been possible to detect such interactions only between two isoenzymes that are the only two catalysts for a given reaction. In any other case (e.g. an interaction involving a single isoenzyme and another arbitrary enzyme), the complete backup assumption of isoenzyme sets would completely mask any possible interaction. In the future, by integrating high throughput experimental data (such as epistasis measurements) and network structure information, it may be possible to rewrite reaction-specific gene-to-reaction relationships (using AND or OR) in order to further improve model prediction capacity.
This could prove to be a very important development for the use of constraint-based models as tools in the future study of genetics, especially in the area of biomedicine. Double gene deletions that result in cell death (synthetic lethal deletions) are an important avenue of cancer research, where the ability to induce lethality only within subpopulation of cells that carry specific mutations by inducing a perturbation to the entire population is of obvious benefit. Similarly, research into other metabolic diseases, such as fructose intolerance, could benefit from increased ability to predict unexpected changes of metabolic phenotypes caused by double gene perturbation events.
Materials and Methods
Yeast metabolic model and genes used in this study
This study was conducted using the Yeast 7 metabolic model of Saccharomyces cerevisiae metabolism [22,23], which may be obtained from http://yeast.sf.net (specifically, version 7.6). This model specifies a metabolic network consisting of 3493 reactions between 2220 metabolites, a set of 909 enzyme-encoding genes, and a set of Boolean expressions associating reactions to all possible subsets of genes that are required for catalysis (the gene-to-reaction mapping, also known as the gene-protein-reaction expression map or GPR). We identified blocked reactions in the model (reactions incapable of carrying flux) using a previously established method [37] and subsequently purged all genes associated only with these reactions from our analyses. All subsequent analyses presented in this section and the results presented in this paper were made using this subset of 792 genes. For the specific cases of correlating gene-loss cost and function-loss cost with evolutionary rate, restricting our analyses to a subset of metabolic genes did not significantly impact the outcome (S4 Fig).
Calculation of gene evolutionary rates
The evolutionary rates of all metabolic genes included in this study were derived following the procedure described in [26]. First, dN/dS ratios [2] (hereafter referred to simply as k) were obtained from [26] for each Saccharomyces cerevisiae model gene from its corresponding ortholog in five related yeast species: Saccharomyces bayanus, Saccharomyces castellii, Lachancea kluyveri (formerly S. kluyveri [38]), Saccharomyces mikatae, and Saccharomyces paradoxus. This provided, for each gene g, five separate strain-dependent measures of evolutionary rates (, , etc.). To obtain a single representative rate for each gene , we first grouped all values of k by strain, converted these sets to rank order, and then took the average rank of each gene across these sets; that is, where ḱg is the rank order of kg within the set and y is the yeast comparison strain. The use of this measure of evolutionary rate allows more proteins to be analyzed than standard dN/dS, since this method does not require that a protein have orthologs in all other species used. There were 13 genes for which none of the five related species carried an appropriate ortholog. These genes were excluded from the analysis of evolutionary rates. Note that throughout the paper we refer to the averaged evolutionary rate rank score () as the evolutionary rate. Importantly, since all our correlations involving evolutionary rates are rank-based measures, this does not affect the outcome of these calculations.
Prediction of gene-loss costs for S. cerevisiae metabolic genes
The gene-loss cost of each gene is calculated as the relative loss in predicted fitness of the gene-knockout mutant as compared to the predicted fitness of the wild-type yeast. Fitness predictions for the wild type and all mutants were obtained using standard flux balance analysis (FBA), which has been previously described in [14]. Briefly, FBA calculates the rate of flow (i.e. flux) of metabolites through each reaction (vi) in the metabolic network in such a way as to maximize the flux through a pseudo-reaction describing organism growth (w = vbiomass), while at the same time satisfying the major constraint of steady state mass balance: , where S is the stoichiometric matrix. Additional constraints may be imposed on each reaction, such that the minimum and/or maximum flux allowed through it is bounded (αi ≤ vi ≤ βi). In our FBA model we impose three such types of additional constraints on reactions: (1) reaction irreversibility constraints (vi > 0), as defined by the original model; (2) constraints pertaining to environmental nutrient availability, available in S1 Table; and (3) constraints imposed by gene deletions, described here in detail. Gene deletions are translated, through the gene-to-reaction mapping, to constraints on some number of reactions (possibly zero) which limit the flux through these reactions to zero (αi = 0 ≤ vi ≤ βi = 0). With fitness taken to be the flux through the biomass reaction, the normalized gene-loss cost of any gene g can be expressed as:
where ww.t. is the fitness of the wild type and wΔR is the fitness of the mutant with the set of reactions R blocked. For a reaction r to be in the set R, the gene in question (g) must be a necessary prerequisite for that reaction, as determined by the GPR:
Prediction of function-loss costs for S. cerevisiae metabolic genes
The function-loss cost for each gene is calculated as the sum total of the individual costs of removing each function (reaction) the gene is responsible for from the model one-by-one, where an individual cost is represented by the fitness loss of the single-reaction knockout mutant relative to the wild type as predicted by FBA. For this purpose, a gene g is said to be responsible for a reaction r if the gene appears anywhere in that reaction’s associated GPR expression. This translates to a fairly simple adaptation of the gene-loss cost metric, which can be expressed as:
Generation of environmental conditions for gene-deletion impact simulations
Environmental conditions for flux balance simulations were generated by modifying a previously defined heuristic for determining minimal media that support growth [24]. First, an initial minimal medium was manually defined for the model, such that each primary nutrient (e.g. carbon and nitrogen) was provided by only a single metabolite. Our initial medium consisted of glucose, ammonium (NH4+), inorganic phosphate and sulfate, oxygen, and minerals (S1 Table). We then identified alternative carbon-providing metabolites by removing glucose from this initial medium and exhaustively testing all other metabolites for growth. Similarly, nitrogen-providing metabolites were identified by the removal of ammonium and subsequent testing of metabolites. Our final set of minimal media was constructed by taking all pair-wise combinations of carbon-providing and nitrogen-providing metabolites, together with the secondary metabolites listed previously, for which the wild-type model predicted positive growth (S1 Table).
Simulations were also conducted on several non-minimal environments representing common lab-growth media. Such so-called “rich media” were defined manually for YPD, YPLactate (both D- and L-Lactate), SD and SD−His (S1 Table). The SD-His settings were used in the epistasis investigation in order to mimic the experimental setup in [20]. Maximum import rates were restricted based on the measured uptake rate of glucose by S. cerevisiae grown in YPD where this rate is limiting.
Calculation of epistasis
Epistatic interaction scores were calculated for each possible pair-wise interaction between genes using a standard method [39]. Epistasis (ε) between any pair of genes i and j, is defined as
where wi and wj represent the relative fitness of each single-gene deletion mutant and wij is the relative fitness of the double-gene deletion mutant. The relative fitness of any mutant wi may be derived from the loss cost (ci) as wi = 1 –ci. Predicted values of ε were generated using gene-loss cost and using a modified version of gene-loss cost where the ORs were replaced with ANDs for the isoenzyme gene to reaction mapping rules.
Comparison of predicted epistasis with experimental data
In order to assess the validity of our predicted epistatic interaction scores, we compared our predictions against a data set for which these scores have been computed from experimentally observed fitness [20]. We limited our comparisons to genes for which the experimentally observed fitness of deletion mutant was no greater than two standard deviations above the fitness of the wild type, because FBA is incapable of predicting increases in fitness due to gene deletions (in the absence of other types of perturbations). We used the intermediate criteria [20] of |ε| > 0.08, p < 0.05 to classify pairs of genes in the experimental data, into synergistic interactions (negative ε), antagonistic interactions (positive ε), or non-interactions. We used a cutoff of |ε| > 0.0001 for the FBA predictions in Fig 4. Performance was measured by testing whether or not the epistatic classification predicted using FBA techniques matched the experimental classification.
Data and scripts availability
All data and scripts used to generate the results presented in this work are freely available at github.com/llambourne/isoenzymes_flux_balance (doi:10.5281/zenodo.231284) and at datadryad.org (doi:10.5061/dryad.6ht2c).
Supporting Information
Data Availability
All relevant data are within the paper and its Supporting Information files. Data and script are also available at Dryad (DOI: doi:10.5061/dryad.6ht2c) and Github (http://github.com/llambourne/isoenzymes_flux_balance).
Funding Statement
This work was supported by the National Science Foundation, grant CCF-1219007 to YX; the Natural Sciences and Engineering Research Council of Canada, grant RGPIN-2014-03892 to YX; the National Institute of Health, grants 5R01GM089978 and 5R01GM103502 to DS; the Army Research Office - Multidisciplinary University Research Initiative, grant W911NF-12-1-0390 to DS; the US Department of Energy, grant DE-SC0012627 to DS; and by the Canada Research Chairs Program (YX). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1.Orr HA. Fitness and its role in evolutionary genetics. Nat Rev Genet. 2009. August;10(8):531–9. 10.1038/nrg2603 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Hurst LD. The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends Genet. 2002. September;18(9):486 [DOI] [PubMed] [Google Scholar]
- 3.Yang Z, Bielawski JP. Statistical methods for detecting molecular adaptation. Trends Ecol Evol (Amst). 2000. December 1;15(12):496–503. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Yang Z, Nielsen R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol. 2000. January;17(1):32–43. [DOI] [PubMed] [Google Scholar]
- 5.Hirsh AE, Fraser HB, Wall DP. Adjusting for selection on synonymous sites in estimates of evolutionary distance. Mol Biol Evol. 2005. January;22(1):174–7. 10.1093/molbev/msh265 [DOI] [PubMed] [Google Scholar]
- 6.Winzeler EA, Shoemaker DD, Astromoff A, Liang H, Anderson K, Andre B, et al. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science. 1999. August 6;285(5429):901–6. [DOI] [PubMed] [Google Scholar]
- 7.Giaever G, Chu AM, Ni L, Connelly C, Riles L, Véronneau S, et al. Functional profiling of the Saccharomyces cerevisiae genome. Nature. 2002. July 25;418(6896):387–91. 10.1038/nature00935 [DOI] [PubMed] [Google Scholar]
- 8.Papp B, Pál C, Hurst LD. Metabolic network analysis of the causes and evolution of enzyme dispensability in yeast. Nature. 2004. June 10;429(6992):661–4. 10.1038/nature02636 [DOI] [PubMed] [Google Scholar]
- 9.Harrison R, Papp B, Pál C, Oliver SG, Delneri D. Plasticity of genetic interactions in metabolic networks of yeast. Proc Natl Acad Sci. 2007. February 13;104(7):2307–12. 10.1073/pnas.0607153104 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Hirsh AE, Fraser HB. Protein dispensability and rate of evolution. Nature. 2001. June 28;411(6841):1046–9. 10.1038/35082561 [DOI] [PubMed] [Google Scholar]
- 11.Wang Z, Zhang J. Why is the correlation between gene importance and gene evolutionary rate so weak? PLoS Genet. 2009. January;5(1):e1000329 10.1371/journal.pgen.1000329 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Steinmetz LM, Scharfe C, Deutschbauer AM, Mokranjac D, Herman ZS, Jones T, et al. Systematic screen for human disease genes in yeast. Nat Genet. 2002. August;31(4):400–4. 10.1038/ng929 [DOI] [PubMed] [Google Scholar]
- 13.Hillenmeyer ME, Fung E, Wildenhain J, Pierce SE, Hoon S, Lee W, et al. The chemical genomic portrait of yeast: uncovering a phenotype for all genes. Science. 2008. April 18;320(5874):362–5. 10.1126/science.1150021 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Orth JD, Thiele I, Palsson BØ. What is flux balance analysis? Nat Biotechnol. 2010. March;28(3):245–8. 10.1038/nbt.1614 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Raman K, Chandra N. Flux balance analysis of biological systems: applications and challenges. Brief Bioinformatics. 2009. July;10(4):435–49. 10.1093/bib/bbp011 [DOI] [PubMed] [Google Scholar]
- 16.Kuepfer L, Sauer U, Blank LM. Metabolic functions of duplicate genes in Saccharomyces cerevisiae. Genome Res. 2005. October;15(10):1421–30. 10.1101/gr.3992505 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Mo ML, Palsson BØ, Herrgård MJ. Connecting extracellular metabolomic measurements to intracellular flux states in yeast. BMC Syst Biol. 2009;3:37 10.1186/1752-0509-3-37 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Heavner BD, Smallbone K, Price ND, Walker LP. Version 6 of the consensus yeast metabolic network refines biochemical coverage and improves model performance. Database. 2013;2013:bat059 10.1093/database/bat059 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Phillips PC. Epistasis—the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 2008. November;9(11):855–67. 10.1038/nrg2452 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, et al. The genetic landscape of a cell. Science. 2010. January 22;327(5964):425–31. 10.1126/science.1180823 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Segrè D, Vitkup D, Church GM. Analysis of optimality in natural and perturbed metabolic networks. Proc Natl Acad Sci. 2002. November 12;99(23):15112–7. 10.1073/pnas.232349399 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Heavener, BD. yeast_7.6: 09 Jul 2015 [Internet]. Yeastnet [cited 11 Nov 2016]. Available from: https://sourceforge.net/projects/yeast/files/.
- 23.Aung H, Henry S, Walker L. Revising the Representation of Fatty Acid, Glycerolipid, and Glycerophospholipid Metabolism in the Consensus Model of Yeast Metabolism. Ind Biotechnol. 2013. August;9(4):215–228. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Klitgord N, Segrè D. Environments that induce synthetic microbial ecosystems. PLoS Comput Biol. 2010;6(11):e1001002 10.1371/journal.pcbi.1001002 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Covert MW, Schilling CH, Palsson BØ. Regulation of gene expression in flux balance models of metabolism. J Theor Biol. 2001. November 7;213(1):73–88. 10.1006/jtbi.2001.2405 [DOI] [PubMed] [Google Scholar]
- 26.Xia Y, Franzosa EA, Gerstein MB. Integrated assessment of genomic correlates of protein evolutionary rate. PLoS Comput Biol. 2009. June;5(6):e1000413 10.1371/journal.pcbi.1000413 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Ihmels J, Levy R, Barkai N. Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nat Biotechnol. 2004. January;22(1):86–92. 10.1038/nbt918 [DOI] [PubMed] [Google Scholar]
- 28.Kafri R, Bar-Even A, Pilpel Y. Transcription control reprogramming in genetic backup circuits. Nat Genet. 2005. March;37(3):295–9. 10.1038/ng1523 [DOI] [PubMed] [Google Scholar]
- 29.DeLuna A, Springer M, Kirschner MW, Kishony R. Need-based up-regulation of protein levels in response to deletion of their duplicate genes. PLoS Biol. 2010. March;8(3):e1000347 10.1371/journal.pbio.1000347 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Ihmels J, Collins SR, Shuldiner M, Krogan NJ, Weissman JS. Backup without redundancy: genetic interactions reveal the cost of duplicate gene loss. Mol Syst Biol. 2007;3:86 10.1038/msb4100127 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Kellis M, Birren BW, Lander ES. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature. 2004. April;428(6983):617–624. 10.1038/nature02424 [DOI] [PubMed] [Google Scholar]
- 32.He X, Zhang J. Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics. 2005. February;169(2):1157–1164. 10.1534/genetics.104.037051 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Poelwijk FJ, Tănase-Nicola S, Kiviet DJ, Tans SJ. Reciprocal sign epistasis is a necessary condition for multi-peaked fitness landscapes. J Theor Biol. 2011. March 7;272(1):141–4. 10.1016/j.jtbi.2010.12.015 [DOI] [PubMed] [Google Scholar]
- 34.Jacobs C, Segrè D. Organization principles in genetic interaction networks In: Soyer OS, editor. Evolutionary systems biology. New York: Springer New York; 2012. p. 53–78. [DOI] [PubMed] [Google Scholar]
- 35.Szappanos B, Kovács K, Szamecz B, Honti F, Costanzo M, Baryshnikova A, et al. An integrated approach to characterize genetic interaction networks in yeast metabolism. Nat Genet. 2011. July;43(7):656–62. 10.1038/ng.846 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Shlomi T, Herrgård MJ, Portnoy V, Naim E, Palsson BØ, Sharan R, et al. Systematic condition-dependent annotation of metabolic genes. Genome Res. 2007. November;17(11):1626–33. 10.1101/gr.6678707 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Burgard AP, Nikolaev EV, Schilling CH, Maranas CD. Flux coupling analysis of genome-scale metabolic network reconstructions. Genome Res. 2004. February;14(2):301–12. 10.1101/gr.1926504 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Kurtzman CP. Phylogenetic circumscription of Saccharomyces, Kluyveromyces and other members of the Saccharomycetaceae, and the proposal of the new genera Lachancea, Nakaseomyces, Naumovia, Vanderwaltozyma and Zygotorulaspora. FEMS Yeast Res. 2003. December;4(3):233–45. [DOI] [PubMed] [Google Scholar]
- 39.Segrè D, Deluna A, Church GM, Kishony R. Modular epistasis in yeast metabolism. Nat Genet. 2005. January;37(1):77–83. 10.1038/ng1489 [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
All relevant data are within the paper and its Supporting Information files. Data and script are also available at Dryad (DOI: doi:10.5061/dryad.6ht2c) and Github (http://github.com/llambourne/isoenzymes_flux_balance).