Abstract
Intense experimental and theoretical efforts have been made to globally map genetic interactions, yet we still do not understand how gene-gene interactions arise from the operation of biomolecular networks. To bridge the gap between empirical and computational studies, we: i) quantitatively measure genetic interactions between ~185,000 metabolic gene pairs in Saccharomyces cerevisiae, ii) superpose the data on a detailed systems biology model of metabolism, and iii) introduce a machine-learning method to reconcile empirical interaction data with model predictions. We systematically investigate the relative impacts of functional modularity and metabolic flux coupling on the distribution of negative and positive genetic interactions. We also provide a mechanistic explanation for the link between the degree of genetic interaction, pleiotropy, and gene dispensability. Last, we demonstrate the feasibility of automated metabolic model refinement by correcting misannotations in NAD biosynthesis and confirming them by in vivo experiments.
Recent large-scale genetic analyses of yeast have enabled the systematic screening of pairwise genetic interactions and provided valuable insights into the functional organisation of a eukaryotic cell1 as well as genetic networks underlying specific biological processes2,3. Despite the rapid growth in quantitative data on genetic interactions, we still have only a limited understanding of the molecular mechanisms through which one mutation modifies the phenotypic effect of another. Furthermore, while the general properties of genetic interaction networks have been explored phenomenologically1,4, we often lack a mechanistic understanding of these patterns. For example, a recent large-scale study reported that single mutants with severe fitness defects tend to exhibit numerous genetic interactions1, a phenomenon that still awaits explanation. Finally, the systematic generation of novel biological hypotheses from the welter of phenotypic data produced by interaction screens remains a major challenge. By examining how cellular phenotypes arise from the operation of molecular networks, systems biology offers great promise for meeting these challenges.
Metabolism is one of the best-characterized cellular subsystems and is especially suited for system-level studies of the genotype–phenotype relationship, and hence of genetic interactions. First, high-quality metabolic network reconstructions are available that specify the chemical reactions catalysed by hundreds of enzymes and cover the molecular function of a significant fraction of the genome (e.g. 15% in yeast)5. Second, these reconstructions can be converted into computational models to calculate the phenotype of both wild-type and mutant cells using constraint-based analysis tools6, such as flux balance analysis (FBA). FBA imposes mass-balance and capacity constraints to define the space of feasible steady-state flux distributions of the network and then identifies optimal network states that maximise biomass yield, a proxy for growth. Despite its simplicity and low data requirements, this modelling framework has shown great predictive power and has been successfully applied to various research problems7, including predicting the viability of single-gene deletants8 and model-driven analysis of high-throughput data8-10. Although some properties of genetic interaction networks have also been addressed using FBA, these earlier studies were exclusively11,12 or mainly13,14 theoretical owing to the lack of large-scale genetic interaction data for metabolic genes.
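The FBA optimisation described above is conventionally written as a linear programme. The notation below (S, v, c and the flux bounds) is the standard textbook form and is our own shorthand, not taken from the paper's supplementary material:

```latex
\begin{aligned}
\max_{\mathbf{v}} \quad & Z = \mathbf{c}^{\top}\mathbf{v} \\
\text{s.t.} \quad & S\,\mathbf{v} = \mathbf{0} \\
& \mathbf{v}_{\min} \le \mathbf{v} \le \mathbf{v}_{\max}
\end{aligned}
```

Here S is the stoichiometric matrix (metabolites × reactions), v is the vector of reaction fluxes and c selects the biomass reaction, so Z is the biomass yield. In this standard setup, a gene deletion is simulated by setting both bounds of every reaction that can no longer be catalysed to zero and re-solving the programme.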
To bridge the gap between theory and experiment, we have systematically measured genetic interactions between pairs of metabolic genes in yeast and combined these data with a detailed metabolic network reconstruction. Quantitative measurement of the fitness of single and double mutants has enabled us to detect both negative (aggravating) and positive (alleviating) interactions (i.e. the double mutant has a lower or higher fitness, respectively, than would be expected from the product of the single-mutant fitnesses). Our integrated approach has three major goals. First, we investigate the distribution of genetic interactions within and across functional modules as defined by classical annotation groups and network-based mathematical methods. Second, we perform constraint-based analysis of the network to simulate mutational effects and predict interactions in silico. We then employ our in vivo interaction data to test the model's ability to capture the general properties of genetic interaction networks and to assess the validity of its specific predictions. Third, we automate the reconciliation of empirical interaction data with model predictions and use discrepancies to update the metabolic network and direct biological discovery.
Results
Constructing a genetic interaction map of yeast metabolism
We selected genes for our genetic interaction map based on an updated reconstruction of the S. cerevisiae metabolic network, which consists of 1412 reactions and accounts for 904 genes10. Genetic interaction data were generated using large-scale synthetic genetic array (SGA) technology15. First, we performed new screens to construct a map that covers all major metabolic subsystems except tRNA aminoacylation. The screens involved constructing high-density arrays of double mutants by crossing 613 query mutants, including 78 hypomorphic alleles of essential genes, against an array of 470 null mutants, producing double mutants for 184,624 unique gene pairs. The fitness of single and double mutants was assessed quantitatively by measuring colony size16. Interaction scores (ε) were calculated as the deviation of the double-mutant fitness (f12) from the product of the corresponding single-mutant fitnesses (ε = f12 – f1·f2)17. Second, we supplemented our measurements with data from our recent large-scale genetic interaction screen1, which employed the same experimental procedure as the present study but covered genes in all functional categories, including metabolism.
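As a minimal sketch of the multiplicative scoring model (assuming fitness values are already normalised so that wild type = 1; the function names are hypothetical and not part of any SGA software), the interaction score and its sign can be computed as:

```python
def interaction_score(f_double: float, f_a: float, f_b: float) -> float:
    """Multiplicative-model genetic interaction score: eps = f_ab - f_a * f_b."""
    return f_double - f_a * f_b


def interaction_sign(eps: float) -> str:
    """Classify by sign only; significance cut-offs are applied separately."""
    if eps < 0:
        return "negative (aggravating)"
    if eps > 0:
        return "positive (alleviating)"
    return "no interaction"


# Hypothetical single mutants with fitness 0.9 and 0.8 -> expected double 0.72
print(interaction_score(0.45, 0.9, 0.8))  # about -0.27
print(interaction_score(0.90, 0.9, 0.8))  # about +0.18
```

For instance, hypothetical single mutants with fitness 0.9 and 0.8 imply an expected double-mutant fitness of 0.72; an observed value of 0.45 gives ε ≈ -0.27 (negative), while 0.90 gives ε ≈ 0.18 (positive). Significance thresholds are applied separately, as described in Online Methods.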
Overall, our combined dataset covers more than 80% of metabolic network genes, including 82 essential genes, and provides interaction scores for 215,907 pairs, 57% of which have been independently screened more than once. Applying a previously defined confidence threshold that proved informative in functional analyses1, we detected 3,572 negative and 1,901 positive interactions (Online Methods). We focussed on interactions between null mutations of non-essential genes (176,821 pairs) because of their better coverage and easier interpretation; data on essential genes were used only for specific analyses. We also defined a high-confidence interaction set based on the reproducibility of replicate experiments and used it when very low false-positive rates were required.
Genetic interactions are widespread between different functional modules
We took advantage of our quantitative genetic interaction map to empirically test earlier predictions about the distribution of interactions within and between metabolic functional modules. Specifically, a computational study based on FBA suggested that: i) genetic interactions are enriched within metabolic annotation groups, and ii) interactions between different functional groups tend to be either exclusively negative or exclusively positive, a property termed ‘monochromaticity’11.
First, we report a modest but significant enrichment of both negative (1.6-fold) and positive (2.5-fold) interactions within classically defined functional modules. For example, lipid metabolism is especially enriched in genetic interactions: sterol metabolism and fatty acid biosynthesis are primarily enriched in positive interactions, while both forms of interaction are overrepresented in sphingolipid metabolism (Fig. 1). Importantly, the enrichments persist after controlling for potential confounding variables, such as paralogy18, physical interaction3, or single-mutant fitness1 (Online Methods) and become more pronounced when using the high-confidence interaction set (3.8-fold and 8.7-fold enrichment of negative and positive interactions, respectively). However, as Figure 1 demonstrates, the majority of genetic interactions occur between genes assigned to different metabolic functions (93% of negative and 90% of positive interactions, or 86% and 73%, respectively, when using high-confidence interactions). The fact that even strongly enriched functional groups, such as fatty acid biosynthesis, exhibit numerous interactions with other groups indicates widespread pleiotropy across metabolic subsystems.
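One simple way to express a within-module fold enrichment is the interaction rate among same-module pairs divided by the interaction rate among all screened pairs. The sketch below assumes that definition and hypothetical input structures; it does not implement the confounder controls described in Online Methods:

```python
def within_module_enrichment(screened_pairs, module_of, interacting):
    """Fold enrichment of interactions among same-module gene pairs.

    screened_pairs: iterable of (gene_a, gene_b) pairs that were tested
    module_of: dict mapping gene -> functional module (hypothetical labels)
    interacting: set of frozenset({gene_a, gene_b}) called as interacting
    """
    within = within_hits = total = hits = 0
    for a, b in screened_pairs:
        hit = frozenset((a, b)) in interacting
        total += 1
        hits += hit
        if module_of[a] == module_of[b]:
            within += 1
            within_hits += hit
    if within == 0 or hits == 0:
        return float("nan")  # enrichment undefined without data
    # (rate within modules) / (rate over all screened pairs)
    return (within_hits / within) / (hits / total)
```

The same function applies unchanged to flux-coupled versus uncoupled pairs if `module_of` is replaced by any pairwise relatedness test.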
Next, we asked whether interactions between different functional groups tend to be either exclusively negative or exclusively positive. In agreement with theoretical predictions, we found a statistically significant excess of monochromaticity among pairs of functional groups in the real data compared with randomized interaction maps (P<10-4). For example, while sterol metabolism displays almost purely negative interactions with tyrosine, tryptophan, and phenylalanine metabolism, it interacts predominantly positively with fatty acid biosynthesis (Fig. 1). Nevertheless, monochromaticity in our genetic interaction map is modest: only ~24−34% more monochromatic pairs were found than expected by chance, a conclusion that remained qualitatively the same when using high-confidence interactions (Supplementary Table 1).
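The monochromaticity comparison can be sketched as a permutation test. For illustration only, we assume that a pair of modules counts as monochromatic when it is linked by at least `min_links` interactions that all share one sign, and that randomized maps are generated by shuffling interaction signs while keeping the network topology fixed; the paper's exact definition and randomization scheme may differ:

```python
import random
from collections import defaultdict


def count_monochromatic(edges, module_of, min_links=2):
    """Count module pairs whose between-module interactions all share one sign.

    edges: list of (gene_a, gene_b, sign) with sign in {-1, +1}
    module_of: dict mapping gene -> functional module (hypothetical labels)
    """
    signs_by_pair = defaultdict(list)
    for a, b, sign in edges:
        ma, mb = module_of[a], module_of[b]
        if ma != mb:  # only interactions spanning two different modules
            signs_by_pair[frozenset((ma, mb))].append(sign)
    return sum(
        1
        for signs in signs_by_pair.values()
        if len(signs) >= min_links and len(set(signs)) == 1
    )


def monochromaticity_test(edges, module_of, n_perm=1000, seed=0, min_links=2):
    """Observed count, mean count under sign shuffling, one-sided P-value."""
    rng = random.Random(seed)
    observed = count_monochromatic(edges, module_of, min_links)
    pairs = [(a, b) for a, b, _ in edges]
    signs = [s for _, _, s in edges]
    null_counts = []
    for _ in range(n_perm):
        rng.shuffle(signs)  # keep topology, randomize which edges are +/-
        shuffled = [(a, b, s) for (a, b), s in zip(pairs, signs)]
        null_counts.append(count_monochromatic(shuffled, module_of, min_links))
    expected = sum(null_counts) / n_perm
    p_value = (1 + sum(c >= observed for c in null_counts)) / (1 + n_perm)
    return observed, expected, p_value
```

The ratio of `observed` to `expected` corresponds to the "more monochromatic pairs than expected by chance" figure quoted above.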
As an alternative to functional groups defined based on classical biochemical pathways, unbiased mathematical methods have been developed to measure functional relatedness based on coherent usage of reactions in the metabolic network6,19. In particular, flux coupling20 provides a biochemically sound definition of functional relatedness and has strong physiological and evolutionary significance21-23. To further investigate the distribution of genetic interactions within and between functional modules, we identified flux-coupled gene pairs computationally (i.e. pairs of reactions where the activity of one reaction implies the activity of the other, either reciprocally or in one direction; Online Methods). In agreement with results obtained using annotation groups, while we find that both negative (2-fold) and positive (2.7-fold) interactions are enriched in flux-coupled pairs (P<10-6 and P<10-8, respectively), the overwhelming majority (>97%) of both forms of interactions occur between uncoupled genes, even when only high-confidence interactions are investigated (>93%).
In conclusion, both definitions of functional relatedness reveal that most genetic interactions connect across distinct functional modules, extending an earlier estimate that synthetic lethal interactions are 3.5 times more likely to span pairs of protein-protein interaction pathways than to occur within such pathways24. Furthermore, our finding that both negative and positive interactions tend to occur between metabolic modules is consistent with recent observations that both forms of interactions primarily connect genes belonging to different protein complexes1,16.
A metabolic model elucidates the degree distribution of genetic interaction networks
To further explore the organizational principles of the genetic interaction network, we next investigated its degree distribution using a computational model of metabolism. A prominent attribute of genetic interaction networks, also shared by other biological networks25, is that the majority of genes display few interactions, while a minority of “hub” genes are highly connected1,4. Furthermore, a recent study uncovered a strong correlation between the number of genetic interactions a gene exhibits and the fitness defect associated with its deletion (dispensability)1, a pattern also confirmed by our empirical metabolic interaction map (Supplementary Fig. 1). Nevertheless, the tendency of ‘sick’ single mutants to engage in an especially high number of both negative and positive interactions remains unexplained. Intuitively, one expects that a strongly deleterious single mutation can mask a large number of mildly deleterious mutations in other genes, and hence display numerous positive interactions. However, a similar logic would imply a paucity of negative interactions for sick mutants (i.e. a sick deletant is less likely to be made worse by other mutations), an expectation that is inconsistent with observations1.
To probe whether a simple structural model of metabolism is able to capture the above properties of genetic interaction networks, we computed in silico interaction degrees and single-mutant fitness employing FBA. Similar to the empirical data, in silico genetic interaction degree is also unevenly distributed, with only ~12% of genes accounting for the majority (~85%) of interactions. Most remarkably, the model predicted a strong negative correlation between single-mutant fitness and genetic interaction degree for both positive and negative interactions, confirming the trend observed in the experimentally-derived genetic interaction network (Spearman's ρ= -0.89 and ρ= -0.66, respectively). Importantly, these trends remained when genes without any in silico fitness contribution were excluded from the analysis (ρ= -0.59, P<10-3 for positive; ρ= -0.47, P=0.005 for negative interactions, Fig. 2a), demonstrating that the associations are not simply due to the presence of silent reactions in the metabolic model.
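Spearman's ρ, used above to relate single-mutant fitness to interaction degree, is the Pearson correlation of the ranks, with tied values given their average rank. A dependency-free sketch follows (in practice a library routine such as `scipy.stats.spearmanr` would normally be used):

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Restricting the inputs to genes whose in silico deletion fitness is below 1 corresponds to the filtering used above to exclude genes without any fitness contribution (Fig. 2a).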
Having established that the model captures the high genetic interaction connectivity of sick mutants, we used it to seek mechanistic explanations. One reason why a gene might exhibit numerous genetic interactions is that it contributes to multiple biological processes (i.e. it is highly pleiotropic); the phenotypic effect of its deletion may then be modulated by a large number of other genes, each of them negatively or positively affecting a different aspect of its functionality. Indeed, it has been reported that genetic interaction hubs often display multifunctionality1. If highly pleiotropic genes also have (on average) a large fitness contribution, then we would expect a negative correlation between single-mutant fitness and interaction degree. Although pleiotropy is difficult to define empirically, the FBA framework offers a rigorous way to compute pleiotropy and test this idea. To do this, we determined the number of key metabolites (so-called biomass components, including amino acids, nucleotides, etc.) whose maximal production is affected by the absence of each gene (see Online Methods and ref. 26). In accordance with our hypothesis, we found a strong association between the number of biosynthetic processes to which a gene contributes and the predicted fitness of its deletant (ρ=-0.83, P<10-9 on raw data for genes with a non-zero deletion effect, see also Fig. 2b). Moreover, pleiotropy correlates with both in silico and in vivo genetic interaction degrees (negative degree: ρ=0.55 and ρ=0.24; positive degree: ρ=0.62 and ρ=0.25, respectively; P<10-8 in all cases). Given the close association between computationally derived single-mutant fitness and pleiotropy, we next performed partial correlation analyses to disentangle the effects of these factors on in silico interaction degrees.
Our multivariate analyses revealed that, while positive interaction degree is determined by single-mutant fitness (a finding consistent with the idea that severe mutations can mask numerous milder mutations), negative interaction degree is driven by pleiotropy (Supplementary Table 2).
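The first-order partial correlation formula below shows how the association between two variables can be separated from a third; applied to Spearman coefficients it gives a rank-based partial correlation. The text does not state exactly which estimator was used, so treat this as an illustrative sketch:

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z.

    r_xy, r_xz, r_yz: pairwise (Pearson or Spearman) correlation coefficients;
    requires |r_xz| < 1 and |r_yz| < 1.
    """
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denom
```

If, for example, negative interaction degree correlated with fitness only because both track pleiotropy, the partial correlation between degree and fitness controlling for pleiotropy would shrink towards zero, while the partial correlation between degree and pleiotropy controlling for fitness would remain large.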
Taken together, these computational results suggest that the structure of the metabolic network dictates both the fitness contribution (and hence positive interaction degree) and the functional pleiotropy (and hence negative interaction degree) of genes. Future empirical studies of pleiotropy will help to clarify whether these mechanisms also adequately explain in vivo genetic interaction degrees.
No empirical evidence for prevalent positive interactions in essential genes
A recent FBA study suggested that non-lethal mutations in essential metabolic genes exhibit strikingly different interaction patterns from null mutations of non-essential genes14. Specifically, it was predicted that essential metabolic genes frequently display positive interactions with other metabolic genes, regardless of their function or the latter's essentiality, strongly skewing the ratio of positive to negative interactions. While a small-scale empirical analysis was consistent with this prediction14, it remained to be seen whether it was supported by large-scale experiments. Accordingly, we mapped genetic interactions between hypomorphic alleles2 of a set of essential genes and null mutants of non-essential genes, screening 39,086 pairs. If positive interactions were indeed highly abundant between gene pairs involving an essential reaction, then we should observe a strong bias toward positive interactions for essential genes. Although essential genes have an increased number of positive interactions, they also display more negative interactions; their ratio of positive to negative interactions is therefore virtually identical to that of non-essential genes (Wilcoxon test: P=0.89, Fig. 2c). In sum, we failed to find empirical evidence for the predicted high prevalence of positive genetic interactions for essential metabolic genes. Given that the only experimental study reporting abundant positive interactions investigated only a handful of non-metabolic essential genes14, we speculate that the discrepancy between the small-scale study14 and our results could partly be due to sampling bias in the former.
Fine-scale evaluation of predicted genetic interactions
Our comprehensive genetic interaction map provides an unprecedented opportunity to assess the FBA framework's ability to predict individual interactions. To rigorously estimate the fraction of true predicted interactions (precision) and the fraction of experimentally observed interactions that are captured by the model (recall or true-positive rate), we selected a set of high-confidence empirical interactions between non-essential genes (Online Methods) and excluded genes that are associated with poorly characterized network parts (i.e. blocked reactions20). This resulted in 325 negative and 116 positive interactions among 67,517 non-essential gene pairs. We found that experimentally identified interactions are highly over-represented among predicted strong interactions, with up to 100-fold and 60-fold enrichment for negative and positive interactions, respectively (i.e. precision values of 50% and 11%, respectively, see Fig. 3). Although this confirms that the highest predicted interaction scores have high physiological relevance13, only a minority of empirical interactions are captured by the model at the same cut-off points (recall values are 2.8% and 12.9% for negative and positive interactions, respectively), a conclusion that remained unchanged when an alternative algorithm27, an alternative interaction score11, or a less compartmentalized metabolic model28 was employed to compute interactions (Supplementary Fig. 2a-c). Importantly, only a minority of gene pairs that show negative (7.6%) or positive (3%) interactions in vivo display non-zero interaction scores of the opposite sign in silico, indicating that the low recall of the model stems from missed genetic interactions, not from misclassification of the two forms of interaction.
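Precision, recall and fold enrichment at a given score cut-off can be computed as in the sketch below (hypothetical names; pairs can be any hashable identifiers). As an arithmetic check on the figures above, the base rates are 325/67,517 ≈ 0.48% for negative and 116/67,517 ≈ 0.17% for positive interactions, so precisions of 50% and 11% correspond to roughly 100-fold and 64-fold enrichment:

```python
def evaluate_predictions(pred_scores, observed, all_pairs, cutoff, sign=-1):
    """Precision, recall and fold enrichment of in silico interaction calls.

    pred_scores: dict pair -> in silico epsilon (missing pairs treated as 0)
    observed: set of pairs with an in vivo interaction of the given sign
    all_pairs: set of all evaluated gene pairs
    cutoff: positive magnitude threshold; a pair is called if sign*score >= cutoff
    sign: -1 to evaluate negative interactions, +1 for positive ones
    """
    called = {p for p in all_pairs if sign * pred_scores.get(p, 0.0) >= cutoff}
    true_pos = len(called & observed)
    precision = true_pos / len(called) if called else 0.0
    recall = true_pos / len(observed) if observed else 0.0
    base_rate = len(observed) / len(all_pairs)  # precision of a random guess
    enrichment = precision / base_rate if base_rate else float("nan")
    return precision, recall, enrichment
```

Sweeping `cutoff` over the range of predicted scores traces out the precision-recall trade-off summarised in Fig. 3.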
Why are so many genetic interactions missed by the model? First, as single-mutant fitness predictions are far from perfect8,10, one might expect that an interaction between two non-essential genes could be missed simply because one or the other gene is essential in the model. Indeed, ~24% of negative and ~22% of positive interactions are missed because of mispredicted single-mutant viability. Although the true-positive rate of genetic interaction predictions improves slightly when genes falsely predicted to be essential are excluded, the majority of empirical interactions are still not captured by the model. In particular, FBA predicts strong negative interaction scores for only 3.7% of in vivo negative interactions, indicating that it over-predicts double-mutant fitness for the majority of these gene pairs. Second, weak in vivo genetic interactions might be inherently less reproducible by the metabolic model. While this idea is supported by an improved true-positive rate for strong in vivo interactions (~17% for ε ≤ -0.5 and 25% for ε ≥ 0.15), we conclude that even the strongest interactions are frequently missed by the model. Third, FBA predicts optimal metabolic behaviour without incorporating regulatory mechanisms. Consequently, reactions that are down-regulated in vivo could nevertheless compensate for deletions in other parts of the network in silico, so the model probably underestimates mutational effects. To address this possibility, we used published quantitative transcriptome data29 to identify non-expressed metabolic genes and constrained the corresponding reaction activities to zero in the simulations30. Imposing transcriptional constraints did not noticeably improve predictions (Supplementary Fig. 2d), suggesting that detailed information on other layers of regulation31 (e.g. metabolic regulation32), data on toxic intermediates and more sophisticated modelling frameworks (e.g. regulatory FBA33) are needed to probe the performance limits of genome-scale models. Finally, aside from the limitations of FBA, some false predictions probably indicate incomplete knowledge or annotation errors in the metabolic network.
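Constraining non-expressed genes to zero flux relies on Boolean gene-protein-reaction (GPR) rules. The sketch below assumes a simple nested-tuple rule format and hypothetical gene and reaction identifiers; genome-scale modelling toolboxes use their own rule syntax:

```python
def rule_active(rule, present):
    """Evaluate a GPR rule against the set of present (expressed) genes.

    rule: a gene name, or a tuple ("and" | "or", [sub-rules])
    """
    if isinstance(rule, str):
        return rule in present
    op, args = rule
    if op == "and":  # enzyme complex: every subunit is needed
        return all(rule_active(r, present) for r in args)
    if op == "or":  # isoenzymes: any one suffices
        return any(rule_active(r, present) for r in args)
    raise ValueError(f"unknown operator {op!r}")


def constrain_bounds(bounds, gpr, present):
    """Copy of bounds with reactions whose GPR is inactive fixed to (0, 0).

    bounds: dict reaction -> (lower, upper)
    gpr: dict reaction -> rule, or None for reactions without annotated genes
    """
    new_bounds = dict(bounds)
    for rxn, rule in gpr.items():
        if rule is not None and not rule_active(rule, present):
            new_bounds[rxn] = (0.0, 0.0)
    return new_bounds
```

The same logic simulates deletions: removing a gene from the set of present genes disables every reaction whose rule then evaluates to false.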
Numerous statistical methods have been proposed to predict genetic interactions by combining heterogeneous sources of genomic and functional data (e.g. sequence homology, physical interaction, co-expression, etc.)34,35. These statistical approaches complement FBA: while biochemical modelling has the advantage of easy interpretability and offers direct mechanistic insights, statistical models may reveal how much information large-scale datasets contain for predicting genetic interactions. We therefore asked whether such methods may substantially improve our knowledge of genetic interactions in the metabolic network.
To assess the performance of statistical modelling, we first compiled a dataset of gene-pair characteristics (following earlier studies34,35 and based on metabolic network features, but omitting any information on genetic interactions; see Supplementary Note) and employed data-mining methods (random forest36 and logistic regression) to classify genetic interactions based on these features. Although an increased fraction of in vivo interactions can be retrieved, ~70% of negative and ~75% of positive interactions are still predicted with very low (<10%) precision (Supplementary Fig. 3). Thus, we conclude that the majority of genetic interactions are not well understood either in terms of biochemical processes or statistical associations. Importantly, incorporating FBA-derived fitness and genetic interaction scores into statistical models boosts the precision of negative interaction predictions (Supplementary Fig. 3), indicating that biochemical modelling provides unique information that is not captured by purely statistical data integration.
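Random forests are normally taken from a library, but the logistic-regression half of the comparison can be sketched without dependencies. The sketch assumes numeric gene-pair features (for example hypothetical co-expression, shared-pathway or FBA-derived scores) and fits the model by plain batch gradient descent on the mean log-loss; it is not the authors' implementation:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000, l2=0.0):
    """Batch gradient descent on the mean log-loss; returns (weights, bias).

    X: list of feature vectors (one per gene pair), y: list of 0/1 labels
    """
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # derivative of the log-loss w.r.t. the linear score
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that a gene pair interacts."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Adding an FBA-derived interaction score as an extra column of `X` is the kind of feature augmentation reported above to improve negative-interaction precision.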
Automated model refinement using genetic interaction data
To reconcile discrepancies between empirical and computational genetic interaction maps, we developed a machine-learning method that automatically generates hypotheses to explain in vivo compensation (negative interaction) between genes. In contrast to a previously proposed approach37 that reconciled experimental and computational growth data mutant by mutant, we sought to minimize model mispredictions globally (i.e. using all available data) by employing a two-stage genetic algorithm (Fig. 4a and Supplementary Note). The following types of changes to the model were allowed37: i) modifying reaction reversibility, ii) removing reactions, and iii) altering the list of biomass compounds required for growth (Supplementary Note).
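The paper's two-stage genetic algorithm is described in its Supplementary Note and is not reproduced here. As a generic, single-stage sketch of the idea, each candidate solution below is a bit-vector saying which candidate model modifications (reversibility changes, reaction removals, biomass changes) are applied, and a user-supplied `cost` function, hypothetical here, would re-run the model and count mispredicted interactions plus a parsimony penalty:

```python
import random


def _tournament(pop, cost, rng):
    """Pick the cheaper of two randomly chosen individuals."""
    a, b = rng.sample(pop, 2)
    return a if cost(a) <= cost(b) else b


def genetic_search(n_mods, cost, pop_size=30, generations=60, p_mut=0.05, seed=1):
    """Minimise cost(bits) over bit-vectors of length n_mods (pop_size >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.1 for _ in range(n_mods)] for _ in range(pop_size)]
    pop[0] = [False] * n_mods  # include the unmodified model
    best = min(pop, key=cost)[:]
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best solution forward
        while len(children) < pop_size:
            p1 = _tournament(pop, cost, rng)
            p2 = _tournament(pop, cost, rng)
            cut = rng.randrange(1, n_mods) if n_mods > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            # bit-flip mutation
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        candidate = min(pop, key=cost)
        if cost(candidate) < cost(best):
            best = candidate[:]
    return best, cost(best)
```

Seeding the population with the unmodified model and always keeping the best individual (elitism) guarantees that the returned solution is never worse than the starting network under the chosen cost.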
Our automated method suggested several modifications (Supplementary Table 3) that, together, considerably improved the fit of the model to our genetic interaction map (100 – 267% increase in recall and 44 – 59% increase in precision, Fig. 4b). Importantly, cross-validation confirmed that our method also significantly improves the model's ability to predict genetic interactions that were not used in model refinement (with recall increased by ~87% on average, P<0.002; Supplementary Note).
For example, our method showed that omitting glycogen from the set of essential biomass components corrects two falsely predicted genetic interactions. This is congruent with glycogen's role as a reserve carbohydrate, which becomes important in nutrient-depleted or stress conditions38. Remarkably, our algorithm also revealed that removal of only one or two reactions from the network corrects the prediction of 4 negative interactions between alternative NAD biosynthesis pathways. In particular, the published network reconstruction10 contains three biosynthetic routes for NAD, and removing the two-step path from aspartate to quinolinate uncovers pairwise compensation between the other two pathways (Fig. 5a). Importantly, while de novo NAD synthesis from aspartate is present in E. coli39, it has no genes annotated in the yeast network and bioinformatics analyses failed to find yeast homologs of the E. coli enzymes (Supplementary Note). To further investigate whether quinolinate formation from aspartate might be wrongly included in the yeast reconstruction, we interrogated the metabolic model to deduce specific predictions for experimental testing. We found that only the refined model predicts the essentiality of genes in the kynurenine pathway (BNA1, BNA2, BNA4, and BNA5) when nicotinic acid is absent from the medium. Next, we tested these predictions experimentally and confirmed that deletants of all four genes were nicotinic acid auxotrophs (Fig. 5b). Together, these results strongly suggest that the aspartate to NAD pathway is not present in yeast40.
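The NAD argument can be illustrated with a deliberately simplified sketch that uses Boolean reachability instead of FBA and lumps several steps together (some intermediates and enzymes are omitted, and the reaction names are hypothetical). Deleting a kynurenine-pathway gene leaves NAD producible while the gene-less aspartate route is present, but not once that route is removed, unless nicotinic acid is supplied:

```python
# reaction -> (substrates, products, genes required; empty tuple = no gene annotated)
REACTIONS = {
    "trp_to_kyn": ({"tryptophan"}, {"kynurenine"}, ("BNA2",)),  # lumped steps
    "kyn_to_3hk": ({"kynurenine"}, {"3-hydroxykynurenine"}, ("BNA4",)),
    "3hk_to_3haa": ({"3-hydroxykynurenine"}, {"3-hydroxyanthranilate"}, ("BNA5",)),
    "3haa_to_qa": ({"3-hydroxyanthranilate"}, {"quinolinate"}, ("BNA1",)),  # lumped
    "asp_to_ia": ({"aspartate"}, {"iminoaspartate"}, ()),  # E. coli-like route
    "ia_to_qa": ({"iminoaspartate"}, {"quinolinate"}, ()),
    "qa_to_namn": ({"quinolinate"}, {"NaMN"}, ("BNA6",)),
    "na_to_namn": ({"nicotinic acid"}, {"NaMN"}, ("NPT1",)),
    "namn_to_nad": ({"NaMN"}, {"NAD"}, ()),  # downstream steps lumped
}


def producible(target, medium, reactions, deleted=frozenset()):
    """Forward reachability: can `target` be made from `medium` metabolites?"""
    available = set(medium)
    changed = True
    while changed:
        changed = False
        for subs, prods, genes in reactions.values():
            if any(g in deleted for g in genes):
                continue  # reaction disabled by a deleted gene
            if subs <= available and not prods <= available:
                available |= prods
                changed = True
    return target in available


def without(reactions, names):
    """Copy of the network with the named reactions removed."""
    return {k: v for k, v in reactions.items() if k not in names}
```

In the same toy network, a double deletion that blocks both the kynurenine and nicotinic acid routes (e.g. BNA2 together with NPT1) is lethal only after the aspartate route is removed, which is the kind of pairwise compensation described above.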
Our automated procedure identified additional erroneous predictions between NAD pathway genes and suggested further modifications (Supplementary Table 3), prompting us to thoroughly revise NAD biosynthesis in the published reconstruction. Based on inspection of interaction data, single-mutant phenotypes, and literature information, we propose a number of changes including modifications of gene-reaction associations and reaction reversibilities (Supplementary Fig. 4). The revised model is not only consistent with literature data, but also improves both interaction (12 corrections) and gene essentiality (1 correction) predictions.
Discussion
A system-level understanding of genetic interactions requires the integration of experimental and theoretical approaches. To progress towards this goal, we experimentally mapped interactions in yeast metabolism and systematically compared empirical data with predictions from a biochemical model. Our approach provides the first glimpse of genetic interactions in small-molecule metabolism and establishes the performance limits of a genome-scale metabolic model. We revealed that a simple structural model of metabolism captures several organizational properties of genetic interaction networks and suggests mechanistic hypotheses.
Importantly, the computational model sheds new light on the relationship between the severity of mutational effects and genetic interactions. The FBA model not only captures the hitherto unexplained relationship between fitness effect and genetic interaction degree, but also suggests a novel mechanistic link between negative interaction degree and functional pleiotropy: the effect of mutations in pleiotropic genes may be modulated by mutations in a large number of other genes, each of them compensating a different aspect of the first gene's functionality.
Although we reported a coarse-grained consistency between model predictions and experiments, evaluation of individual interaction predictions revealed abundant discrepancies. In particular, FBA fails to capture the majority of experimentally determined genetic interactions, an attribute shared with statistical models built via data integration. Furthermore, interaction patterns of hypomorphic alleles of essential genes are grossly mispredicted, resulting in a discrepancy between our empirical data and a previous theoretical expectation about the high prevalence of positive interactions14.
We can draw several conclusions from these inconsistencies. First, the quality and completeness of the metabolic reconstruction should be improved. Second, while null mutations can easily be represented in the FBA framework, simulation of hypomorphic alleles is inherently problematic as it hinges on assumptions about the relationship of enzyme activity to flux41. Third, the fact that a large number of in vivo instances of genetic interactions are not explained by the structure of the metabolic network suggests that regulation at both the gene expression and metabolite-enzyme levels should be taken into account in future attempts to realistically model metabolic behavior in genetically perturbed cells42.
Most significantly, the comprehensive interaction map can be used to refine the metabolic model. Indeed, reconciling discrepancies between predicted and observed phenotypes is of central importance in developing systems biology models43,44. We demonstrated the feasibility of an automated method to refine the metabolic model. We anticipate that similar approaches, coupled with high-throughput experimentation, have the potential to close the iterative cycles of generating and testing novel hypotheses, leading to at least partial automation of biological discoveries45,46.
Acknowledgements
This work was supported by grants from The International Human Frontier Science Program Organization, the Hungarian Scientific Research Fund (OTKA PD 75261) and the ‘Lendület Program’ of the Hungarian Academy of Sciences (B.P.), European Research Council (202591), Wellcome Trust and EMBO Young Investigator Programme (C.P.), FEBS Long-Term Fellowship (B.Szam.), Biotechnology & Biological Sciences Research Council (Grant BB/C505140/1) and the UNICELLSYS Collaborative Project (No. 201142) of the European Commission (S.G.O.), the National Institutes of Health (1R01HG005084-01A1) and a seed grant from the University of Minnesota Biomedical Informatics and Computational Biology program (C.L.M.), the Canadian Institutes of Health Research (MOP-102629) (C.B., B.J.A.) and the National Institutes of Health (NIH) (1R01HG005853-01) (C.B., B.J.A., C.L.M.).
Online Methods
Experimental mapping of genetic interactions
We employed synthetic genetic array (SGA) methodology, an automated form of genetic analysis, to construct high-density arrays of double mutants (for details, see Refs.4,15). Quantitative assessment of genetic interactions requires measurements of single- and double-mutant fitness, and an estimate of the double-mutant fitness that would be expected based on the single-mutant phenotypes. Mutant fitnesses were derived from colony sizes after correcting systematic experimental biases (including positional effects, spatial effects, nutrient competition and screen batch effects)16. Single-mutant fitness was estimated using a set of control SGA screens, where the queries carried a mutation in a neutral genomic locus1. Double-mutant fitness was estimated by employing the regular SGA protocol. We used the obtained single- (fi and fj) and double-mutant fitnesses (fij) to derive genetic interaction measures as ε = fij – fi·fj. A statistical confidence measure (P-value) was assigned to each interaction based on a combination of the observed variation of each double mutant across four experimental replicates and estimates of the background log-normal error distributions for the corresponding query and array mutants1,16.
To explore the general properties of the metabolic genetic interaction map, we applied a previously suggested1 confidence threshold of $|\varepsilon| > 0.08$ and $P < 0.05$ to define significantly interacting gene pairs. This threshold was previously shown1 to give a good balance between coverage and precision: the interactions it defines cover at least ~35% of the negative and ~18% of the positive interactions deposited in BioGRID49, with estimated precisions of ~63% and ~59%, respectively. For replicate screens (e.g. when both the AB and BA orientations of a pair were screened), we applied the following procedure: if the replicate screens showed opposite interaction signs and at least one of them was significant, both pairs were removed; if they showed the same interaction sign (both positive or both negative), the interaction with the lowest P-value was retained and both pairs were reported with that interaction. Interaction scores identified as significant both in the present study and in a full-genome study1 were well correlated between the two studies (r = 0.76). This high cross-study correlation allowed us to merge the interaction data from the present study with interaction data on metabolic gene pairs from the genome-scale screens1.
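The significance threshold and the replicate-merging rule can be sketched as follows. Each screen is assumed to be summarised as an `(eps, p)` tuple; the function names are hypothetical. The text does not say what happens when replicates have opposite signs but neither is significant, so keeping the lower-P measurement in that case is an assumption of this sketch.

```python
EPS_CUTOFF = 0.08  # |epsilon| threshold
P_CUTOFF = 0.05    # P-value threshold


def is_significant(eps: float, p: float) -> bool:
    """Significant interaction: |eps| > 0.08 and P < 0.05 (strict inequalities)."""
    return abs(eps) > EPS_CUTOFF and p < P_CUTOFF


def merge_replicates(ab, ba):
    """Reconcile the AB and BA screens of one gene pair.

    Each argument is an (eps, p) tuple. Returns the (eps, p) reported for
    both orientations, or None if the pair is discarded.
    """
    (e1, p1), (e2, p2) = ab, ba
    opposite_signs = e1 * e2 < 0
    if opposite_signs and (is_significant(e1, p1) or is_significant(e2, p2)):
        return None  # conflicting replicates: remove both orientations
    # Same sign: keep the measurement with the lowest P-value.
    # Opposite signs with neither screen significant is not covered by the
    # text; keeping the lower-P measurement there is an assumption.
    return ab if p1 <= p2 else ba
```

For example, `merge_replicates((-0.2, 0.01), (0.1, 0.2))` discards the pair, whereas `merge_replicates((-0.2, 0.01), (-0.1, 0.03))` reports `(-0.2, 0.01)` for both orientations.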
Additionally, we defined a smaller high-confidence dataset in which every gene pair was independently screened at least twice, to minimize false interactions. Here, two genes were considered to interact if at least one screen showed $|\varepsilon| > 0.08$ and $P < 0.05$ and another screen showed $P < 0.05$ with the same interaction sign, whereas non-interacting pairs were defined as those not showing $|\varepsilon| > 0.08$ and $P < 0.05$ in any screen. All other gene pairs were removed from the high-confidence set. This yielded 529 negative and 194 positive interactions among 122,875 gene pairs.
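The high-confidence classification rule can be expressed as a small function. This is a sketch under the assumption that each screen of a pair is available as an `(eps, p)` tuple; the function name is hypothetical, and excluding pairs confirmed in both directions is an assumption for a case the text does not address.

```python
EPS_CUTOFF = 0.08
P_CUTOFF = 0.05


def classify_high_confidence(screens):
    """Classify one gene pair from its independent screens.

    screens: list of (eps, p) tuples, one per screen.
    Returns "negative", "positive", "none" (non-interacting) or None (excluded).
    """
    if len(screens) < 2:
        return None  # the high-confidence set requires at least two screens
    strict = [i for i, (e, p) in enumerate(screens)
              if abs(e) > EPS_CUTOFF and p < P_CUTOFF]
    if not strict:
        return "none"  # no screen passes |eps| > 0.08 and P < 0.05
    confirmed = set()
    for i in strict:
        positive = screens[i][0] > 0
        # Another screen must reach P < 0.05 with the same interaction sign.
        if any(j != i and p < P_CUTOFF and (e > 0) == positive
               for j, (e, p) in enumerate(screens)):
            confirmed.add(positive)
    if len(confirmed) == 1:
        return "positive" if confirmed.pop() else "negative"
    return None  # unconfirmed, or (assumption) confirmed in both directions
```

Note that a pair whose strongest screen passes the full threshold but is not confirmed by a second screen is excluded rather than counted as non-interacting, matching the text.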
Interaction data can be downloaded from http://www.utoronto.ca/boonelab/data/szappanos/.
Analysis of the effect of functional relatedness, paralogy and protein-protein interactions on genetic interactions
We used logistic regression analysis to test the association between genetic interaction and various categorical and continuous features (e.g. paralogy, co-functionality and single-mutant fitness). Functional annotation groups were as defined in the published metabolic reconstruction10, and information on physical interactions between proteins was extracted from the BioGRID 2.0.58 database49. Paralogous gene pairs were identified by all-against-all BLASTP similarity searches50 of yeast ORFs. We defined two genes as paralogs if: i) the BLAST hit had an expected value $E < 10^{-8}$, ii) the alignment length exceeded 100 residues, iii) sequence similarity was >30%, and iv) neither gene was part of a transposon.
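The four paralog criteria can be applied to parsed BLASTP hits as sketched below. The `BlastHit` record, its field names and the ORF labels are hypothetical; the sketch assumes the hits have already been parsed into these fields and that a set of transposon-associated ORFs is available.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    query: str
    subject: str
    evalue: float
    align_len: int      # alignment length in residues
    similarity: float   # percent sequence similarity


def is_paralog_pair(hit: BlastHit, transposon_genes: set) -> bool:
    """Apply criteria i) to iv): E < 1e-8, length > 100, similarity > 30%, no transposons."""
    return (
        hit.query != hit.subject            # ignore self-hits
        and hit.evalue < 1e-8
        and hit.align_len > 100
        and hit.similarity > 30.0
        and hit.query not in transposon_genes
        and hit.subject not in transposon_genes
    )


def paralog_pairs(hits, transposon_genes=frozenset()):
    """Unordered paralog pairs; reciprocal hits collapse to a single pair."""
    return {tuple(sorted((h.query, h.subject)))
            for h in hits if is_paralog_pair(h, transposon_genes)}
```

Collapsing reciprocal hits with `sorted` avoids counting a pair twice, since an all-against-all search reports both A against B and B against A.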
Monochromaticity analysis
To examine the monochromaticity of genetic interactions between pairs of functional annotation groups, we defined a monochromatic score (MC) as follows. Let $pr_{ij}$ denote the ratio of positive to all genetic interactions between groups $i$ and $j$, and let $bpr$ denote the background ratio of positive to all interactions:
A pair of groups showing purely positive (negative) genetic interactions with each other has an MC-score of +1 (-1), while pairs reflecting the background ratio ($bpr$) have an MC-score of 0. We computed MC-scores using only genes assigned to a single functional group. A pair of functional groups was considered monochromatic if $|MC_{ij}| > 0.5$.
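The MC formula itself is not reproduced in this version of the text. One piecewise-linear normalisation that satisfies the stated properties (+1 for purely positive pairs, -1 for purely negative pairs, 0 at the background ratio) is sketched below; it is an assumption, not necessarily the authors' exact definition, and the function names are hypothetical.

```python
def mc_score(n_pos: int, n_neg: int, bpr: float) -> float:
    """Monochromatic score of one group pair (assumed piecewise-linear form)."""
    if not 0.0 < bpr < 1.0:
        raise ValueError("background ratio must lie strictly between 0 and 1")
    total = n_pos + n_neg
    if total == 0:
        raise ValueError("no interactions between the two groups")
    pr = n_pos / total
    if pr >= bpr:
        return (pr - bpr) / (1.0 - bpr)  # maps [bpr, 1] onto [0, +1]
    return (pr - bpr) / bpr              # maps [0, bpr) onto [-1, 0)


def is_monochromatic(n_pos: int, n_neg: int, bpr: float, cutoff: float = 0.5) -> bool:
    return abs(mc_score(n_pos, n_neg, bpr)) > cutoff
```

Scaling the two halves separately keeps the score symmetric in meaning even when the background ratio is far from 0.5, which matters because negative interactions typically outnumber positive ones.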
To assess the significance of monochromaticity, we compared the monochromatic scores of the experimentally determined genetic interaction network with those of 10,000 randomized interaction maps, constructed by randomizing the sign of each genetic interaction while keeping constant the total numbers of negative and positive interactions and conserving the annotation groups (see ref. 11). We restricted our analysis to functional group pairs showing at least 2 or 3 interactions with each other (Supplementary Table 1).
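The sign-randomization test can be sketched as a permutation test. Shuffling the list of signs keeps the numbers of positive and negative interactions fixed and leaves every gene in its annotation group. Assumptions of this sketch: the summary statistic (the number of monochromatic group pairs), the piecewise-linear MC form and all function names are illustrative choices, not taken from the original analysis.

```python
import random
from collections import Counter


def mc(pr, bpr):
    # Assumed piecewise-linear MC normalisation (+1 pure positive, -1 pure negative).
    return (pr - bpr) / (1.0 - bpr) if pr >= bpr else (pr - bpr) / bpr


def count_monochromatic(interactions, group_of, bpr, min_interactions=2, cutoff=0.5):
    """Number of group pairs with |MC| > cutoff among pairs with enough interactions.

    interactions: list of (gene_a, gene_b, sign), sign being +1 or -1.
    group_of: gene -> its single functional annotation group.
    """
    n_pos, n_all = Counter(), Counter()
    for a, b, sign in interactions:
        key = tuple(sorted((group_of[a], group_of[b])))
        n_all[key] += 1
        n_pos[key] += sign > 0
    return sum(1 for key, n in n_all.items()
               if n >= min_interactions and abs(mc(n_pos[key] / n, bpr)) > cutoff)


def sign_permutation_test(interactions, group_of, n_perm=10_000, seed=0, **kwargs):
    signs = [s for _, _, s in interactions]
    bpr = sum(s > 0 for s in signs) / len(signs)
    if not 0.0 < bpr < 1.0:
        raise ValueError("need both positive and negative interactions")
    observed = count_monochromatic(interactions, group_of, bpr, **kwargs)
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(signs)  # permuting signs keeps the totals of + and - fixed
        shuffled = [(a, b, s) for (a, b, _), s in zip(interactions, signs)]
        if count_monochromatic(shuffled, group_of, bpr, **kwargs) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the empirical P-value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)
```

Because the background ratio is unchanged by the shuffle, it is computed once from the observed data and reused for every randomized map.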
Computing the impact of mutations and genetic interactions by flux balance analysis
The recently reconstructed metabolic network of Saccharomyces cerevisiae (iMM904)10 was used to simulate gene deletions. The reconstruction includes 904 genes and 1,412 reactions, and specifies the stoichiometry and direction of biochemical reactions, their assignment to subcellular compartments and their associations with protein-coding genes (including information on isoenzymes and enzyme complexes). Details of flux balance analysis (FBA) have been described elsewhere6. The simulated growth medium was set up to mimic the one used in the experiments (see Supplementary Note for details). Genes CAN1, LYP1, URA3, LEU2 and MET17 were removed from the iMM904 reconstruction to mimic the strain background used in the experiments.
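Deciding which reaction fluxes to constrain to zero after a deletion requires evaluating the gene-reaction associations, where enzyme complexes need all subunits and isoenzymes need any one gene. The nested-tuple rule format and the reaction and gene IDs below are hypothetical illustrations, not the iMM904 file format; the same function could also apply the strain-background deletions (CAN1, LYP1, and so on) before any knockout.

```python
from typing import Iterable, Union

# A rule is a gene ID, ("and", rule, ...) for an enzyme complex
# (all subunits required) or ("or", rule, ...) for isoenzymes (any one suffices).
Rule = Union[str, tuple]


def rule_active(rule: Rule, deleted: set) -> bool:
    if isinstance(rule, str):
        return rule not in deleted
    op, *args = rule
    if op == "and":
        return all(rule_active(r, deleted) for r in args)
    if op == "or":
        return any(rule_active(r, deleted) for r in args)
    raise ValueError(f"unknown rule operator: {op!r}")


def disabled_reactions(gpr: dict, deleted: Iterable[str]) -> set:
    """Reactions whose flux must be constrained to zero after deleting `deleted`.

    gpr maps reaction ID -> Rule, or None for reactions without a gene association.
    """
    deleted = set(deleted)
    return {rxn for rxn, rule in gpr.items()
            if rule is not None and not rule_active(rule, deleted)}


# Hypothetical associations, for illustration only.
gpr = {
    "R_complex": ("and", "geneA", "geneB"),
    "R_iso": ("or", "geneC", "geneD"),
    "R_single": "geneE",
    "R_spontaneous": None,
}
```

With these rules, deleting `geneA` disables `R_complex`, whereas deleting only `geneC` disables nothing because its isoenzyme `geneD` remains.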
We used linear programming to identify the maximum biomass yield of the wild-type network. The impact of gene deletions (null mutations) was calculated by constraining the corresponding reaction fluxes to zero and using either FBA or a linearized version of MOMA27 to compute the biomass yields of the mutant networks. Mutant fitness was defined as biomass yield relative to the wild type, and the interaction between two mutations was calculated as $\varepsilon = f_{12} - f_1 f_2$, where $f_1$ and $f_2$ are the single-mutant fitnesses and $f_{12}$ is the double-mutant fitness. To compute the effect of a partial (non-null) mutation in a gene, we constrained the flux of its corresponding reaction to ≤50% of its wild-type level14.
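The FBA knockout procedure can be illustrated on a hypothetical four-reaction toy network (not iMM904) using SciPy's `linprog` with the HiGHS solver in place of the Sybil/GLPK/CPLEX setup described below. The sketch covers FBA only, not linearized MOMA or partial mutations, and the gene-to-reaction mapping is an assumption.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network. Rows: metabolites A, B.
# Columns: uptake (-> A), R1 (A -> B, gene g1), R2 (A -> 0.5 B, gene g2), biomass (B ->).
S = np.array([
    [1.0, -1.0, -1.0, 0.0],   # A
    [0.0, 1.0, 0.5, -1.0],    # B
])
BIOMASS = 3
WT_BOUNDS = [(0.0, 10.0), (0.0, None), (0.0, None), (0.0, None)]  # uptake capped at 10
GENE_REACTIONS = {"g1": [1], "g2": [2]}


def max_biomass(deleted=()):
    """FBA: maximise biomass flux subject to S v = 0 and flux bounds."""
    bounds = list(WT_BOUNDS)
    for gene in deleted:
        for r in GENE_REACTIONS[gene]:
            bounds[r] = (0.0, 0.0)  # null mutation: reaction flux forced to zero
    c = np.zeros(S.shape[1])
    c[BIOMASS] = -1.0  # linprog minimises, so minimise -v_biomass
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return -res.fun


def fitness(deleted=()):
    """Biomass yield of the mutant relative to the wild type."""
    return max_biomass(deleted) / max_biomass()


f1, f2, f12 = fitness(["g1"]), fitness(["g2"]), fitness(["g1", "g2"])
eps = f12 - f1 * f2  # negative: the low-yield R2 route buffers the loss of g1
```

Here $f_1 = 0.5$, $f_2 = 1$ and $f_{12} = 0$, so $\varepsilon = -0.5$: deleting g2 alone is neutral, but it removes the only backup route once g1 is gone.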
All calculations were carried out in the custom software package Sybil (Gabriel Gelius-Dietrich and Martin J. Lercher, unpublished), developed in the R statistical environment51, using the GLPK and CPLEX solvers.
Exploring the general properties of the FBA-derived genetic interaction map
To generate an in silico genetic interaction map based on FBA, we computed interaction scores between all pairs of non-essential metabolic genes and considered two genes to interact if the predicted $|\varepsilon| > 10^{-4}$ (using a more stringent cut-off does not qualitatively affect our results). To investigate the relationship between in silico single-deletion fitness and other computed network properties (i.e. in silico genetic interaction degree and pleiotropy), we focused only on genes i) whose reactions are not blocked (i.e. can carry a flux in some steady state of the network) and ii) whose removal affects the reaction content of the network (i.e. that do not have isoenzymes), thereby excluding genes that cannot have any single-deletion effect in the model. Furthermore, some sets of genes always produce identical phenotypes in the model simulations and cannot be treated as independent data points in statistical analyses (e.g. genes encoding flux-coupled reactions or subunits of the same protein complex). To avoid this bias, we represented each such set by one randomly chosen gene. These filtering procedures left 193 genes.
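The in silico interaction call and the gene-filtering steps can be sketched as below. The inputs (sets of blocked genes and of genes with isoenzymes, and a mapping from each gene to its set of phenotypically identical genes) are assumed to have been computed beforehand; the function names and gene IDs are hypothetical, and a fixed random seed is used only for reproducibility.

```python
import random


def is_in_silico_interaction(eps: float, cutoff: float = 1e-4) -> bool:
    """Predicted interaction if |eps| exceeds the cut-off."""
    return abs(eps) > cutoff


def select_independent_genes(genes, blocked, has_isoenzyme, identical_set, seed=0):
    """Drop genes without possible single-deletion effect; keep one per identical set.

    blocked, has_isoenzyme: sets of gene IDs to exclude.
    identical_set: gene -> ID of its set of genes with identical in silico
    phenotypes (e.g. flux-coupled reactions, subunits of one complex);
    genes absent from the mapping form singleton sets.
    """
    rng = random.Random(seed)
    sets = {}
    for g in sorted(genes):
        if g in blocked or g in has_isoenzyme:
            continue
        key = ("set", identical_set[g]) if g in identical_set else ("gene", g)
        sets.setdefault(key, []).append(g)
    return sorted(rng.choice(members) for members in sets.values())
```

Tagging keys as `("set", ...)` or `("gene", ...)` prevents a set ID from accidentally colliding with a gene ID.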
Computing system-level functional pleiotropy
We used the metabolic model to derive a measure of functional pleiotropy for each metabolic gene. The model specifies a list of 54 metabolites that are essential for biomass formation and therefore for in silico growth (e.g. amino acids, carbohydrates and fatty acids). We computed the maximum production yield of each biomass compound in the wild-type network by maximising the flux through a pseudo-reaction representing its secretion26. Next, we deleted each gene and examined whether the knockout reduced the maximum production of a given compound (a flux reduction of at least $10^{-4}$ was considered significant). Finally, for each gene, we counted the number of biomass compounds whose maximal production is affected by its deletion. This number reflects the network-level multifunctionality, and hence the pleiotropy, of a gene.
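The counting step can be sketched as follows, assuming the maximal production yields have already been obtained by maximising each secretion pseudo-reaction (for example with an LP as in FBA) in the wild type and in the knockout. The function name and the numbers are hypothetical.

```python
def pleiotropy(wt_max: dict, ko_max: dict, tol: float = 1e-4) -> int:
    """Number of biomass compounds whose maximal production drops by at least tol.

    wt_max: compound -> maximal production in the wild-type network
    ko_max: compound -> maximal production in the knockout network
    """
    return sum(1 for compound, wt in wt_max.items() if wt - ko_max[compound] >= tol)


# Hypothetical yields, for illustration only.
wt = {"ala": 1.0, "trp": 0.5, "ergosterol": 0.2, "leu": 0.3}
ko = {"ala": 1.0, "trp": 0.0, "ergosterol": 0.1, "leu": 0.29995}
# trp and ergosterol drop by >= 1e-4; the leu drop (5e-5) is below the cut-off.
```

The tolerance keeps numerical noise from the LP solver from being counted as a genuine loss of function.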
Identifying flux coupled genes in the network
Coupled genes were identified by applying the flux coupling finder algorithm20 to the metabolic network. We distinguished between two kinds of relationship between reaction pairs: i) coupled (fully or directionally coupled): activity of one reaction fixes the activity of the other and vice versa (full coupling), or activity of one reaction implies activity of the other but not the reverse (directional coupling); and ii) uncoupled: activity of one reaction does not imply activity of the other and vice versa, indicating that the reactions are not required to operate together. Coupling relationships were calculated without assuming a fixed biomass composition, to avoid coupling a large set of fluxes to the biomass reaction20.
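The coupled/uncoupled distinction can be illustrated with a simpler blocking test instead of reimplementing the flux coupling finder20, which bounds flux ratios with linear programs. Here reaction i implies reaction j if blocking j leaves i unable to carry flux. This sketch therefore detects implication in either direction but does not separate full from other coupling types. The toy network (irreversible reactions, no biomass reaction) and the function names are hypothetical; SciPy's `linprog` is assumed as the LP solver.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network without a biomass reaction. Columns:
# 0 uptake (-> A), 1 R1 (A -> B), 2 R2 (B -> C), 3 R3 (A -> C), 4 export (C ->).
S = np.array([
    [1, -1, 0, -1, 0],   # A
    [0, 1, -1, 0, 0],    # B
    [0, 0, 1, 1, -1],    # C
], dtype=float)
UPPER = [10.0, 1000.0, 1000.0, 1000.0, 1000.0]  # all reactions irreversible


def can_carry_flux(i, blocked=None):
    """Can reaction i carry nonzero steady-state flux (optionally with one reaction blocked)?"""
    bounds = [(0.0, ub) for ub in UPPER]
    if blocked is not None:
        bounds[blocked] = (0.0, 0.0)
    c = np.zeros(len(UPPER))
    c[i] = -1.0  # maximise v_i
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    return res.status == 0 and -res.fun > 1e-9


def implies(i, j):
    """True if every steady state with v_i > 0 also has v_j > 0."""
    return can_carry_flux(i) and not can_carry_flux(i, blocked=j)


def coupling(i, j):
    return "coupled" if implies(i, j) or implies(j, i) else "uncoupled"
```

In this network R1 and R2 imply each other (B is made only by R1 and consumed only by R2), R3 implies uptake but not the reverse, and R2 and R3 are uncoupled because the bypass lets each operate without the other.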
Footnotes
URLs. Interaction data and modified metabolic reconstruction: http://www.utoronto.ca/boonelab/data/szappanos/
The authors declare no competing financial interests.
References
- 1. Costanzo M, et al. The genetic landscape of a cell. Science. 2010;327:425–31. doi: 10.1126/science.1180823.
- 2. Schuldiner M, et al. Exploration of the function and organization of the yeast early secretory pathway through an epistatic miniarray profile. Cell. 2005;123:507–19. doi: 10.1016/j.cell.2005.08.031.
- 3. Collins SR, et al. Functional dissection of protein complexes involved in yeast chromosome biology using a genetic interaction map. Nature. 2007;446:806–10. doi: 10.1038/nature05649.
- 4. Tong AH, et al. Global mapping of the yeast genetic interaction network. Science. 2004;303:808–13. doi: 10.1126/science.1091317.
- 5. Feist AM, Herrgard MJ, Thiele I, Reed JL, Palsson BO. Reconstruction of biochemical networks in microorganisms. Nat Rev Microbiol. 2009;7:129–43. doi: 10.1038/nrmicro1949.
- 6. Price ND, Reed JL, Palsson BO. Genome-scale models of microbial cells: evaluating the consequences of constraints. Nat Rev Microbiol. 2004;2:886–97. doi: 10.1038/nrmicro1023.
- 7. Oberhardt MA, Palsson BO, Papin JA. Applications of genome-scale metabolic reconstructions. Mol Syst Biol. 2009;5:320. doi: 10.1038/msb.2009.77.
- 8. Snitkin ES, et al. Model-driven analysis of experimentally determined growth phenotypes for 465 yeast gene deletion mutants under 16 different conditions. Genome Biol. 2008;9:R140. doi: 10.1186/gb-2008-9-9-r140.
- 9. Shlomi T, Cabili MN, Herrgard MJ, Palsson BO, Ruppin E. Network-based prediction of human tissue-specific metabolism. Nat Biotechnol. 2008;26:1003–10. doi: 10.1038/nbt.1487.
- 10. Mo ML, Palsson BO, Herrgard MJ. Connecting extracellular metabolomic measurements to intracellular flux states in yeast. BMC Syst Biol. 2009;3:37. doi: 10.1186/1752-0509-3-37.
- 11. Segrè D, Deluna A, Church GM, Kishony R. Modular epistasis in yeast metabolism. Nat Genet. 2005;37:77–83. doi: 10.1038/ng1489.
- 12. Deutscher D, Meilijson I, Kupiec M, Ruppin E. Multiple knockout analysis of genetic robustness in the yeast metabolic network. Nat Genet. 2006;38:993–8. doi: 10.1038/ng1856.
- 13. Harrison R, Papp B, Pal C, Oliver SG, Delneri D. Plasticity of genetic interactions in metabolic networks of yeast. Proc Natl Acad Sci U S A. 2007;104:2307–12. doi: 10.1073/pnas.0607153104.
- 14. He X, Qian W, Wang Z, Li Y, Zhang J. Prevalent positive epistasis in Escherichia coli and Saccharomyces cerevisiae metabolic networks. Nat Genet. 2010;42:272–6. doi: 10.1038/ng.524.
- 15. Tong AH, et al. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science. 2001;294:2364–8. doi: 10.1126/science.1065810.
- 16. Baryshnikova A, et al. Quantitative analysis of fitness and genetic interactions in yeast on a genome scale. Nat Methods. 2010;7:1017–24. doi: 10.1038/nmeth.1534.
- 17. Mani R, St Onge RP, Hartman JL 4th, Giaever G, Roth FP. Defining genetic interaction. Proc Natl Acad Sci U S A. 2008;105:3461–6. doi: 10.1073/pnas.0712255105.
- 18. DeLuna A, et al. Exposing the fitness contribution of duplicated genes. Nat Genet. 2008;40:676–81. doi: 10.1038/ng.123.
- 19. Papin JA, Reed JL, Palsson BO. Hierarchical thinking in network biology: the unbiased modularization of biochemical networks. Trends Biochem Sci. 2004;29:641–7. doi: 10.1016/j.tibs.2004.10.001.
- 20. Burgard AP, Nikolaev EV, Schilling CH, Maranas CD. Flux coupling analysis of genome-scale metabolic network reconstructions. Genome Res. 2004;14:301–12. doi: 10.1101/gr.1926504.
- 21. Pál C, Papp B, Lercher MJ. Adaptive evolution of bacterial metabolic networks by horizontal gene transfer. Nat Genet. 2005;37:1372–5. doi: 10.1038/ng1686.
- 22. Bundy JG, et al. Evaluation of predicted network modules in yeast metabolism using NMR-based metabolite profiling. Genome Res. 2007;17:510–9. doi: 10.1101/gr.5662207.
- 23. Notebaart RA, Teusink B, Siezen RJ, Papp B. Co-regulation of metabolic genes is better explained by flux coupling than by network distance. PLoS Comput Biol. 2008;4:e26. doi: 10.1371/journal.pcbi.0040026.
- 24. Kelley R, Ideker T. Systematic interpretation of genetic interactions using protein networks. Nat Biotechnol. 2005;23:561–6. doi: 10.1038/nbt1096.
- 25. Barabasi AL, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004;5:101–13. doi: 10.1038/nrg1272.
- 26. Shlomi T, et al. Systematic condition-dependent annotation of metabolic genes. Genome Res. 2007;17:1626–33. doi: 10.1101/gr.6678707.
- 27. Segrè D, Vitkup D, Church GM. Analysis of optimality in natural and perturbed metabolic networks. Proc Natl Acad Sci U S A. 2002;99:15112–7. doi: 10.1073/pnas.232349399.
- 28. Kuepfer L, Sauer U, Blank LM. Metabolic functions of duplicate genes in Saccharomyces cerevisiae. Genome Res. 2005;15:1421–30. doi: 10.1101/gr.3992505.
- 29. Nagalakshmi U, et al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008;320:1344–9. doi: 10.1126/science.1158441.
- 30. Akesson M, Forster J, Nielsen J. Integration of gene expression data into genome-scale metabolic models. Metab Eng. 2004;6:285–93. doi: 10.1016/j.ymben.2003.12.002.
- 31. Daran-Lapujade P, et al. The fluxes through glycolytic enzymes in Saccharomyces cerevisiae are predominantly regulated at posttranscriptional levels. Proc Natl Acad Sci U S A. 2007;104:15753–8. doi: 10.1073/pnas.0707476104.
- 32. Bouwman J, et al. Metabolic regulation rather than de novo enzyme synthesis dominates the osmo-adaptation of yeast. Yeast. 2011;28:43–53. doi: 10.1002/yea.1819.
- 33. Shlomi T, Eisenberg Y, Sharan R, Ruppin E. A genome-scale computational study of the interplay between transcriptional regulation and metabolism. Mol Syst Biol. 2007;3:101. doi: 10.1038/msb4100141.
- 34. Wong SL, et al. Combining biological networks to predict genetic interactions. Proc Natl Acad Sci U S A. 2004;101:15682–7. doi: 10.1073/pnas.0406614101.
- 35. Ulitsky I, Krogan NJ, Shamir R. Towards accurate imputation of quantitative genetic interactions. Genome Biol. 2009;10:R140. doi: 10.1186/gb-2009-10-12-r140.
- 36. Breiman L. Random forests. Machine Learning. 2001;45:5–32.
- 37. Kumar VS, Maranas CD. GrowMatch: an automated method for reconciling in silico/in vivo growth predictions. PLoS Comput Biol. 2009;5:e1000308. doi: 10.1371/journal.pcbi.1000308.
- 38. Francois J, Parrou JL. Reserve carbohydrates metabolism in the yeast Saccharomyces cerevisiae. FEMS Microbiol Rev. 2001;25:125–45. doi: 10.1111/j.1574-6976.2001.tb00574.x.
- 39. Flachmann R, et al. Molecular biology of pyridine nucleotide biosynthesis in Escherichia coli. Cloning and characterization of quinolinate synthesis genes nadA and nadB. Eur J Biochem. 1988;175:221–8. doi: 10.1111/j.1432-1033.1988.tb14187.x.
- 40. Panozzo C, et al. Aerobic and anaerobic NAD+ metabolism in Saccharomyces cerevisiae. FEBS Lett. 2002;517:97–102. doi: 10.1016/s0014-5793(02)02585-1.
- 41. Kacser H, Burns JA. The control of flux. Symp Soc Exp Biol. 1973;27:65–104.
- 42. Heinemann M, Sauer U. Systems biology of microbial metabolism. Curr Opin Microbiol. doi: 10.1016/j.mib.2010.02.005.
- 43. Ideker T, Galitski T, Hood L. A new approach to decoding life: systems biology. Annu Rev Genomics Hum Genet. 2001;2:343–72. doi: 10.1146/annurev.genom.2.1.343.
- 44. Kell DB, Oliver SG. Here is the evidence, now what is the hypothesis? The complementary roles of inductive and hypothesis-driven science in the post-genomic era. Bioessays. 2004;26:99–105. doi: 10.1002/bies.10385.
- 45. Covert MW, Knight EM, Reed JL, Herrgard MJ, Palsson BO. Integrating high-throughput and computational data elucidates bacterial networks. Nature. 2004;429:92–6. doi: 10.1038/nature02456.
- 46. King RD, et al. The automation of science. Science. 2009;324:85–9. doi: 10.1126/science.1165620.
- 47. Winzeler EA, et al. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science. 1999;285:901–6. doi: 10.1126/science.285.5429.901.
- 48. Giaever G, et al. Functional profiling of the Saccharomyces cerevisiae genome. Nature. 2002;418:387–91. doi: 10.1038/nature00935.
- 49. Breitkreutz BJ, et al. The BioGRID Interaction Database: 2008 update. Nucleic Acids Res. 2008;36:D637–40. doi: 10.1093/nar/gkm1001.
- 50. Altschul SF, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402. doi: 10.1093/nar/25.17.3389.
- 51. R: A Language and Environment for Statistical Computing. Vienna, Austria: 2007.