Proceedings of the National Academy of Sciences of the United States of America. 2012 Nov 5;109(47):E3251–E3259. doi: 10.1073/pnas.1216733109

Bioinformatic identification of genes suppressing genome instability

Christopher D Putnam a,b, Stephanie R Allen-Soltero a,c, Sandra L Martinez a, Jason E Chan a,b, Tikvah K Hayes a,1, Richard D Kolodner a,b,c,d,e,2
PMCID: PMC3511103  PMID: 23129647

Abstract

Unbiased forward genetic screens for mutations causing increased gross chromosomal rearrangement (GCR) rates in Saccharomyces cerevisiae are hampered by the difficulty in reliably using qualitative GCR assays to detect mutants with small but significantly increased GCR rates. We therefore developed a bioinformatic procedure using genome-wide functional genomics screens to identify and prioritize candidate GCR-suppressing genes on the basis of the shared drug sensitivity suppression and similar genetic interactions as known GCR suppressors. The number of known suppressors was increased from 75 to 110 by testing 87 predicted genes, which identified unanticipated pathways in this process. This analysis explicitly dealt with the lack of concordance among high-throughput datasets to increase the reliability of phenotypic predictions. Additionally, shared phenotypes in one assay were imperfect predictors for shared phenotypes in other assays, indicating that although genome-wide datasets can be useful in aggregate, caution and validation methods are required when deciphering biological functions via surrogate measures, including growth-based genetic interactions.

Keywords: DNA damage, DNA repair, systems biology


Genetic instability is a characteristic of most cancers (1) that may play a critical role in driving the accumulation of genetic changes that underlie tumorigenesis (2). A number of observations are consistent with this view, including the following: a number of cancer predisposition syndromes have been identified that are associated with inherited defects in genes involved in suppressing genome instability, and inactivation of some of these genes has been observed in sporadic cancers (3, 4); p53, which promotes cell cycle arrest or apoptosis in response to DNA damage, is inactivated in roughly 50% of human cancers, and p53 defects allow cells to tolerate the accumulation of genome rearrangements (5); and genomic instability has been observed to precede the transition to the carcinogenic state or to be associated with the development of cancers in mouse model systems (6).

The investigation of model systems in the study of genome instability has the potential to identify and understand novel genes and pathways relevant to human cancer. A genetic assay developed in the yeast Saccharomyces cerevisiae has been used to identify genes and pathways that suppress gross chromosomal rearrangements (GCRs) mediated by single-copy DNA sequences (7). In this assay, selection against two genetic markers, CAN1 and URA3, placed on a nonessential end of the left arm of chromosome V selects for the loss of these two genes that results as a consequence of the formation of GCRs that delete the left arm of chromosome V. The types of GCRs that have been observed with this assay include terminal deletions healed by de novo telomere addition, translocations, isoduplications and other types of dicentric translocation chromosomes, interstitial deletions, circular chromosomes, and complex GCRs resulting from multiple cycles of rearrangement, usually as a result of the formation of unstable dicentric translocations (8–11). Using this assay, oxidative defense pathways, the replication machinery, DNA repair pathways, cell cycle checkpoint pathways, telomere maintenance pathways, and chromatin modification and assembly pathways have been shown to function in concert to prevent genome rearrangements (reviewed in 12). Modifications of the original GCR assay demonstrated that suppression of GCRs mediated by segmental duplications and Ty elements involves additional genes and pathways that do not suppress single-copy sequence-mediated GCRs (13–15). Interestingly, homologs of some GCR-suppressing genes and pathways suppress the development of cancer in mammals (16). Most of the genes that suppress GCRs have been identified through a candidate gene approach. Some studies have screened collections of arrayed S. cerevisiae mutants for mutations that cause increased GCR rates and have identified additional genes of interest (17–20), although the mutations identified in each screen only had a small overlap with each other. Consequently, it is probable that not all the genes and pathways that suppress GCRs have been identified.

The promise of genome-wide protein–protein interaction, genetic interaction, and drug sensitivity datasets developed using S. cerevisiae is that these data can be used for predicting gene and gene product functions (e.g., ref. 21). Despite the fact that these datasets contain useful information, high-throughput methods are prone to both false-positive and false-negative errors. Consequently, different datasets generated using similar approaches to screen the same mutant collection show a substantial lack of concordance (22). Here, we show that combining these types of data identified additional genes involved in suppressing genome instability based on the hypothesis that these additional genes will share aspects of their phenotypes with known genes. Using these data, we have generated a set of 1,041 gene deletion mutations that have genetic interactions and drug sensitivity profiles matching those mutations known to affect the rate of accumulating GCRs; 787 of them are characterized by dense genetic interactions, and the remaining 254 have limited genetic interactions. To validate this approach, we investigated a subset of the predicted genes and found that deletions of 35 of the 87 genes selected from clusters containing known GCR-suppressing genes for analysis increased the rate of accumulating GCRs, which represents a 200-fold higher efficiency for identifying new GCR-suppressing genes compared with that seen in genome-wide screens. This experimental validation identified genes that had not been previously implicated in suppressing GCRs and demonstrated that components of the nuclear pore, the proteasome, and the morphogenesis and septin checkpoint, as well as proper control of the anaphase-promoting complex/cyclosome (APC/C), play roles in suppressing GCRs. Thus, the resulting gene lists are enriched for genes that function to suppress genome instability. Importantly, our results indicate that identification of genes based on analysis of DNA damaging agent sensitivity and growth-based genetic interaction patterns was an imperfect predictor for identifying genes that suppress GCRs, which has important implications for attempts to reconstruct pathways by computationally combining data from systematic genetic and physical interaction studies.

Results

Bioinformatic Identification of Candidate Genome Stability Genes.

Genes identified as suppressing genome rearrangements.

To identify candidate genes that suppress GCRs (Fig. 1), we first analyzed over 700 published GCR rates of strains with single or multiple mutations. This identified 75 mutations that increased GCR rates by fivefold or more as single mutants and/or caused synergistic increases in rate in combination with other mutations (SI Appendix, Table S1 and Dataset S1) and 40 mutations that did not increase GCR rates (SI Appendix, Table S2). The analysis considered the effect of all pair-wise interactions; for example, the GCR rate of the mre11 lig4 tlc1 triple-mutant strain was compared with the GCR rates of the pairs of strains mre11 and lig4 tlc1, lig4 and mre11 tlc1, and tlc1 and mre11 lig4. Interestingly, many of the mutations that increased the GCR rates also increased GCR rates synergistically in combination with other mutations, whereas many of the mutations tested that did not increase GCR rates suppressed the increased GCR rates caused by other mutations.
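The pair-wise comparison scheme described above can be illustrated with a short Python sketch. This is not the authors' code: the function name `mutant_partitions` is hypothetical, and the sketch only enumerates which pairs of strains would be compared for a given multiple mutant.

```python
from itertools import combinations

def mutant_partitions(mutations):
    """Return each unordered split of a multiple mutant into two sub-genotypes."""
    muts = sorted(mutations)
    seen = set()
    splits = []
    for k in range(1, len(muts) // 2 + 1):
        for subset in combinations(muts, k):
            rest = tuple(m for m in muts if m not in subset)
            key = frozenset((subset, rest))
            if key not in seen:  # avoid listing (A, B) and (B, A) twice
                seen.add(key)
                splits.append((subset, rest))
    return splits

# The triple mutant used as the example in the text.
triple_splits = mutant_partitions(["mre11", "lig4", "tlc1"])
for part, rest in triple_splits:
    print(" ".join(part), "vs", " ".join(rest))
```

For the mre11 lig4 tlc1 triple mutant, this yields the three comparisons named in the text: each single mutant against the complementary double mutant.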

Fig. 1.


Schematic of the bioinformatic scheme to enrich for genome stability genes. (A) Number of genes identified at each step is indicated. Venn diagrams contain gene counts and indicate merging steps. (B) Breakdown of genes that suppress GCRs and genes that have no effect on GCRs, according to whether these genes were present in the list of genes suppressing GCRs, the list of genes suppressing sensitivity to DNA damaging agents, or both, or were from the list of 10 related genes. Dark bars indicate genes whose roles in GCRs were tested here, and white bars indicate genes whose GCR status was previously known.

Genes identified as suppressing sensitivity to DNA damaging agents.

Mutations in many of the 75 known GCR-suppressing genes caused increased sensitivity to DNA damaging agents (SI Appendix, Table S1). Therefore, we analyzed the results of 155 screens of the S. cerevisiae gene deletion collection against DNA damaging agents (SI Appendix, Table S3). Combined, 4,414 mutations (affecting over 90% of the nonessential genes in the S. cerevisiae genome) were reported to cause some level of increased sensitivity in at least one screen; this number was reduced to 4,143 mutations by treating deletions of dubious ORFs that overlapped verified genes as alleles of the verified genes (Fig. 2A and Dataset S2). The large number of reported mutations causing sensitivity to DNA damaging agents reflected the low reproducibility of different screens of the same damaging agents (Fig. 2E); hierarchical agglomerative clustering analysis grouped screens by laboratory rather than by DNA damaging agent, indicative of “batch effects” in these high-throughput datasets (22). Regardless, the most commonly identified mutations affected genes known to be involved in DNA repair (Fig. 2B). For example, mms4Δ, rad5Δ, mus81Δ, rad59Δ, and rad10Δ were identified in 119, 118, 106, 96, and 91 screens, respectively (Dataset S2). Over 60% of all mutations were identified in 5 or fewer screens, and 16% were observed in only 1 screen. Using random computer simulations (Materials and Methods), we calculated pnhit P values, the statistical significance of identifying a gene n times, and found that mutations identified eight times (n = 8) were significant (pnhit < 0.01).
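A minimal sketch of how a random simulation of this kind might estimate pnhit is given below. The authors' actual procedure is described in Materials and Methods and is not reproduced here; this version assumes that each screen draws its reported number of hits uniformly at random from the deletion collection. The function name and the toy screen sizes are hypothetical.

```python
import random

def simulate_p_nhit(hits_per_screen, n_genes, n, n_trials=200, seed=0):
    """Estimate P(a gene is hit in >= n screens) if every screen picked hits at random."""
    rng = random.Random(seed)
    at_least_n = 0
    total = 0
    for _ in range(n_trials):
        counts = [0] * n_genes
        for k in hits_per_screen:
            # Each screen reports k distinct genes chosen uniformly at random.
            for g in rng.sample(range(n_genes), k):
                counts[g] += 1
        at_least_n += sum(1 for c in counts if c >= n)
        total += n_genes
    return at_least_n / total

# Toy input (assumed values): 20 screens that each report 50 of 1,000 genes.
toy_screens = [50] * 20
p_ge_1 = simulate_p_nhit(toy_screens, 1000, n=1)
p_ge_5 = simulate_p_nhit(toy_screens, 1000, n=5)
print(p_ge_1, p_ge_5)
```

Under this null model, the smallest n for which the estimate falls below 0.01 would serve as the significance cutoff for the number of screens.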

Fig. 2.


Analysis of DNA damaging agent treatments. (A) Histogram of the number of DNA damaging agent sensitivity mutations as a function of the number of screens in which each mutation was identified. (B) View of the histogram in A plotting all mutations above the axis and only those mutations known to affect the DNA damage response below the axis. (C) View of the histogram in A after filtering out agent- and laboratory-specific mutations. (D) View of the histogram in C after filtering out nonsignificant genes. (E) Summary table of treatments that have been screened multiple times and the number of mutations found in any screen, in a statistically significant number of screens (pnhit < 0.01), and in all screens. UVC, ultraviolet light in band C.

We also analyzed mutations identified a statistically significant number of times (pnhit < 0.01) that caused sensitivity to specific DNA damaging treatments using the program GOstat (23) to identify statistically significant gene ontology terms (24). This analysis primarily identified terms related to DNA repair, DNA damage signaling, chromatin, and chromosome organization and biogenesis (Dataset S3). Some unexpected pathways were also identified: ubiquitin-dependent protein catabolism of the multivesicular body pathway [2-dimethylaminoethyl chloride (DMAEC), hydroxyurea (HU), mitomycin c, and tirapazamine]; osmotic stress (cisplatin and mitomycin c); vesicle-mediated transport (bleomycin, HU, and oxaliplatin); peroxisome function [methylmethane sulfonate (MMS)]; and secretory pathways, membrane invagination, and glycoprotein biosynthesis (bleomycin). In contrast, genes associated with UV light and ionizing radiation (IR) resistance were predominantly associated with DNA repair, damage signaling, and chromatin remodeling. One implication of these results is that some pathways involved in resistance to chronic drug treatments but not UV or IR treatment might function by means of drug detoxification, drug export, or amelioration of damage to cellular components other than DNA.

Because we were interested in common DNA damage responses, we developed a statistical test to identify mutations with biased distributions to screens of specific treatments and specific laboratories (Materials and Methods). We applied this test to all 4,143 mutations, which reduced the number to 1,446 mutations. Most mutations eliminated were observed in four or fewer screens (Fig. 2C). In addition, the test eliminated frequently observed mutations that were specific to a particular laboratory, such as yll032cΔ, rpl15bΔ, gal1Δ, and tma46Δ, which were observed 52, 50, 49, and 49 times, respectively, in a single laboratory, or were specific to a particular damaging agent, including hxk2Δ, ybr242wΔ, ald6Δ, atg12Δ, and ylr064wΔ, which were observed 10, 9, 9, 8, and 8 times, respectively, almost exclusively in cisplatin sensitivity screens. Although the eliminated mutations had no obvious role in the DNA damage response, we tested 45 of these mutations, including the laboratory-specific examples cited above, for their effect on chronic exposure to HU, MMS, 4-nitroquinoline 1-oxide (4NQO), and/or camptothecin. Forty-four of the 45 mutations caused no drug sensitivity (P < 0.0001, hypergeometric test), whereas yll032cΔ caused weak MMS sensitivity. Retaining only those mutations identified in a significant number of screens (pnhit < 0.01) from the 1,446 mutations resulted in 928 mutations, which included 44 of 75 mutations increasing GCR rates (Fig. 2D and Dataset S1).

Genes identified by genetic congruence.

To find genes with related functions that had been missed, we scored all the genes in the genome on the basis of their genetic similarity or “congruence” (25) with previously identified genes using reported growth-based genetic interactions. The growth-based genetic data also have imperfect concordance; the mean overlap for a reported subset of genetic interactions in S. cerevisiae by different groups has been estimated at less than 50% (26), potentially due to errors in scoring growth phenotypes, escape of diploids during haploid selection (27, 28), additional mutations present in strains in the deletion collection (29, 30), and/or the presence of an incorrect mutation due to cross-contamination, which we have tested for and corrected in our copy of the genome deletion collection. Therefore, our strategy to improve the robustness of this step was to score the genetic congruence of each candidate mutation using the combined genetic signature of the interactions of the 75 mutations causing increased GCR rates and the 928 mutations causing DNA damaging agent sensitivity with each gene in the rest of the genome (∼6,000 genes; SI Appendix, SI Materials and Methods). Congruence scores could range from 0 (no congruence) to 1 (complete congruence), and random simulations were performed to identify statistically significant congruence score cutoffs.
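The scoring formula itself is given in SI Materials and Methods and is not reproduced in this section. As a purely illustrative stand-in with the same 0-to-1 range, the Python sketch below uses the Jaccard index of two genes' sets of genetic interaction partners. The choice of the Jaccard index, the function name, and the gene labels are all assumptions and may differ from the score actually used.

```python
def jaccard_congruence(partners_a, partners_b):
    """Overlap of two interaction-partner sets: 0 = nothing shared, 1 = identical."""
    a, b = set(partners_a), set(partners_b)
    union = a | b
    if not union:
        return 0.0  # two genes without any reported interactions share nothing
    return len(a & b) / len(union)

# Hypothetical interaction partners, for illustration only.
seed_partners = {"geneA", "geneB", "geneC"}
candidate_partners = {"geneB", "geneC", "geneD"}
score = jaccard_congruence(seed_partners, candidate_partners)
print(score)
```

A significance cutoff could then be chosen, as the text describes, by recomputing such scores on randomized inputs and taking the value exceeded by fewer than 1% of the random scores.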

Using the 75 GCR genes, the maximum congruence score was 0.115 for SRS2 (Dataset S4), and the cutoff of 0.040 (P < 0.01, random simulation) selected 227 genes, which included 44 starting genes and 183 new genes. Forty-two of the 44 recovered starting genes were reidentified by congruence selection even when the gene was removed from the initial list. Of the 227 GCR congruent genes, 71 suppressed GCR formation (61% of those tested) and 46 did not when including the experimental results described below, suggesting an enrichment for GCR-suppressing genes (P = 1 × 10^−102 if the 110 GCR-suppressing genes identified previously and below are the only ones that exist and P = 2 × 10^−20 if all 1,041 candidates identified here suppress GCRs, hypergeometric test). Merging the 75 starting genes and 227 congruent genes produced a merged GCR list of 258 genes (Fig. 1).
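For readers unfamiliar with the hypergeometric test, the following Python sketch computes an upper-tail probability with the standard library only. It does not attempt to reproduce the reported P values, whose exact inputs are not given in this section. The example call uses an assumed population of 6,000 genes (from the approximate genome size given in the text) with 110 suppressors, 227 selected genes, and 71 observed suppressors, purely for illustration.

```python
from math import comb

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when `draws` items are sampled without replacement from a
    population that contains `successes` items of interest."""
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, min(successes, draws) + 1)
    )
    return numerator / comb(population, draws)

# Small sanity check: drawing all 5 marked items in 5 draws from 10 items.
p_small = hypergeom_upper_tail(5, 10, 5, 5)

# Illustrative call with assumed inputs (see lead-in); not the paper's exact test.
p_example = hypergeom_upper_tail(71, 6000, 110, 227)
print(p_small, p_example)
```

Exact integer arithmetic is used for the binomial coefficients, so very small tail probabilities do not suffer from intermediate floating-point overflow.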

Using the 928 DNA damaging agent genes, the maximum congruence score was 0.063 for SWR1 (Dataset S4) and the cutoff of 0.046 (P < 0.01, random simulation) selected 148 genes, which included 114 starting genes and 34 new genes. One hundred five of the 114 starting genes were identified even when removed from the initial list. Thirty-two of the 34 new genes were nonessential, 31 were previously identified in at least one screen, and deletion of 20 of these 32 nonessential new genes caused at least some sensitivity when tested against chronic exposure to HU, MMS, 4NQO, and/or camptothecin (P < 1 × 10^−7, hypergeometric test; Dataset S5). Because most newly identified genes suppressed drug sensitivity, we generated a merged list of 962 genes from the starting genes and the genetically congruent genes (Fig. 1).

Merging the 258 GCR genes and 962 DNA damaging agent genes implicated in this study generated a merged list of 1,031 genes (Fig. 1 and Dataset S1). One hundred eighty-nine genes were shared between the merged GCR gene and merged DNA damaging agent lists; 69 were unique to the merged GCR gene list, and 773 were unique to the merged DNA damaging agent list. Additionally, we noted 10 genes (RTT105, IRC15, IRC3, DOT1, DPB3, MLH1, NAS6, PAP2, UMP1, and VAC7) that fell below statistical cutoffs in our analysis but were related to and clustered with bona fide GCR-suppressing genes (see below), and we added them to the final list, resulting in a total of 1,041 genes.
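The gene counts reported for the merging steps can be checked with simple inclusion-exclusion arithmetic. The Python sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
# GCR branch: 75 starting genes plus 227 congruent genes, 44 of which overlap.
gcr_start, gcr_congruent, gcr_overlap = 75, 227, 44
merged_gcr = gcr_start + gcr_congruent - gcr_overlap

# DNA damaging agent branch: 928 starting genes plus 34 new congruent genes.
drug_start, drug_new = 928, 34
merged_drug = drug_start + drug_new

# Union of the two merged lists, which share 189 genes.
shared = 189
union = merged_gcr + merged_drug - shared
gcr_only = merged_gcr - shared
drug_only = merged_drug - shared

# Ten related genes that fell below the cutoffs were added at the end.
final_total = union + 10
print(merged_gcr, merged_drug, union, gcr_only, drug_only, final_total)
```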

Robustness of the method.

A computational test of the robustness of our method was performed by determining if the method could identify genes found in three different systematic screens using modified GCR assays to identify genes that suppress GCRs (17–19) when those genes were removed from the original list of GCR-suppressing genes that anchored the analysis. This analysis recovered 7 of 8 (P < 0.0001, hypergeometric test), 8 of 11 (P < 0.0004), and 13 of 16 (P < 7 × 10^−7) of the genes reported in these screens, respectively (SI Appendix, Table S4), although it should be noted that some of these genes that were not identified by our analysis only played small roles in suppressing GCRs and that many of the genes experimentally verified here were not identified by these screens (see below). Despite these differences, the robustness with which these genes from these screens were identified suggests that the final list is enriched in genes involved in preventing genome instability.

Computational Analysis and Prioritization of Candidate Genome Stability Genes.

The list of 1,041 genes implicated by this analysis was large enough to be problematic for gene-by-gene validation. In addition, the identification of potential drug detoxification mechanisms suggested that not all these genes directly suppress genome instability. Thus, to prioritize the final list of 1,041 genes for subsequent experimental analysis, the genes were subjected to agglomerative hierarchical clustering analysis (Fig. 3 and Dataset S6) using congruence scores calculated from reported growth-based genetic interactions (SI Appendix, SI Materials and Methods). This analysis divided the list into 74 clusters (comprising 787 genes) with an additional “unclustered” group (comprising 254 genes) that contained those genes that did not cluster with other genes due to lack of shared genetic interactions (Fig. 4).
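As a rough illustration of agglomerative hierarchical clustering on pair-wise similarity scores, the Python sketch below repeatedly merges the two most similar clusters until no pair exceeds a threshold. The linkage criterion (average linkage), the stopping threshold, and the toy gene names and scores are assumptions; the settings actually used are described in SI Materials and Methods.

```python
def agglomerate(genes, similarity, threshold):
    """Naive average-linkage agglomerative clustering.

    similarity maps frozenset({a, b}) to a score; missing pairs count as 0.
    Merging stops when the best remaining average similarity is below threshold.
    """
    clusters = [[g] for g in genes]

    def avg_sim(c1, c2):
        total = sum(similarity.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = avg_sim(clusters[i], clusters[j])
                if s > best:
                    best, best_pair = s, (i, j)
        if best < threshold:
            break
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical genes and congruence scores, for illustration only.
toy_genes = ["g1", "g2", "g3", "g4", "g5"]
toy_scores = {
    frozenset(("g1", "g2")): 0.90,
    frozenset(("g1", "g3")): 0.80,
    frozenset(("g2", "g3")): 0.85,
    frozenset(("g4", "g5")): 0.10,
}
result = agglomerate(toy_genes, toy_scores, threshold=0.5)
clustered = [c for c in result if len(c) > 1]
unclustered = [c[0] for c in result if len(c) == 1]
print(clustered, unclustered)
```

In this toy case, genes that never reach the threshold remain singletons, which mirrors the "unclustered" group of genes that lack shared genetic interactions.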

Fig. 3.


Annotated genes from cluster 4. The GCR rate column identifies mutations tested in the GCR assay: Circles were previously tested, squares were tested in this study, crosses were essential genes, solid symbols increased GCR rates as single mutants, half filled-in symbols only synergistically increased GCR rates in combination with other mutants, and open symbols did not increase GCR rates. “Inclusion” indicates if a gene was identified in the GCR rate (GCR Rate), genetic congruence to GCR genes (GCR Similar), DNA damaging agent (Drug), or genetic congruence to DNA damaging agent genes (Drug Similar) stage of the bioinformatics analysis. “IRC” indicates those genes causing increased recombination centers (48). “TL” indicates mutations identified in two telomere-length screens by Askree et al. (60) and Gatbonton et al. (61), with decreased (A−, G−) or increased (A+, G+) telomere lengths. “Ty” indicates mutations causing decreased (Ty1−, Ty3−) or increased (Ty1+, Ty3+) transposition (49, 62, 63). “CST” indicates mutations identified as affecting chromosome stability by several assays (64, 65). “LOH” indicates mutations increasing loss of heterozygosity by several assays (66). Sensitivity to each DNA damaging agent is indicated by vertical bars, with different treatments having alternate colors.

Fig. 4.


Overview of the clustering of the bioinformatically identified genes. (A) Binary interaction map showing the presence (black) or absence (white) of genetic interactions (Materials and Methods) between all 1,041 genes in the 74 clusters and the nonclustered group (horizontal) and 787 genes in the 74 clusters (vertical). (B) Genetic congruence score for each of the 1,041 genes with the GCR genes. Boundaries for clusters 4 and 32 are shown as vertical lines. (C) Genetic congruence score with the genes suppressing sensitivity to DNA damaging agents. (D) Number of DNA damaging agent screens in which different deletions of the 1,041 genes were identified. (E) GCR rates of single-gene deletion mutants. Genes with rates listed as “Low” in SI Appendix, Table S5 were arbitrarily assigned the WT GCR rate (3.5 × 10^−10) for display purposes.

Many clusters were enriched in genes involved in specific cellular functions. For example, cluster 1 was enriched in polarity determination and vesicle-mediated transport; cluster 2 was enriched in mitotic nuclear and chromosome migration; cluster 3 was enriched in chromatin modification and transcription; and cluster 4 was enriched in the DNA damage response, particularly those genes involved in double-strand break (DSB) repair (Dataset S6). Within each cluster, genes encoding protein complexes or belonging to known pathways tended to group together and to have few interactions with each other, consistent with these genes belonging to single epistasis groups. Furthermore, genetic interactions between genes within an individual cluster (Fig. 4A) were consistent with the presence of multiple epistasis groups. Together, these observations indicated that the clustering captured important aspects of at least some of the functions of these genes.

Some biological functions were divided between multiple clusters. DNA damage response genes were divided between cluster 4 (Fig. 3) and cluster 32 (SI Appendix, Fig. S1), as well as the smaller clusters 53, 55, 59, and 60 (Dataset S6). Clusters 4 and 32 have high GCR congruence scores (Fig. 4B) and moderate DNA damaging agent congruence scores (Fig. 4C), and they contain many of the genes implicated in suppressing sensitivity to many different DNA damaging agents (Fig. 4D) and in playing important roles in suppressing GCRs (Fig. 4E). Examination of the interactions of these two clusters suggests that the major reason the clustering algorithm split these genes into two clusters was that cluster 32 had fewer interactions with cluster 3 (chromatin modification) than cluster 4 did. In contrast, genes biased toward interactions with cluster 32 but not with cluster 4 did not define clear pathways or groups.

Remarkably, genes involved in a number of well-characterized pathways, such as base-excision repair, nucleotide-excision repair, and mismatch repair, tended not to be present in either cluster 4 or 32 (Fig. 3 and SI Appendix, Fig. S1), and genes from these pathways were frequently divided between multiple clusters. The lack of clustering of these genes is consistent with their general paucity of genetic interactions relative to DSB repair genes in the absence of DNA damaging agents (Fig. 4 A and B and Dataset S5). However, the importance of these genes in the presence of DNA damaging agents is emphasized by the number of screens in which these genes were identified (Datasets S2 and S6) and is consistent with the known roles of these gene products. We note that the inability of unperturbed growth to capture the roles of these types of genes can be anticipated from decades of classic genetic studies as well as a recent report of changes in genetic interactions measured in a high-throughput manner due to the presence of MMS (31).

Experimental Validation of the Enrichment of Genome Stability Genes.

We selected a subset of 87 genes from the final list of 1,041 genes to analyze their potential roles in suppressing GCRs. Given that some clusters might be more important for drug detoxification or export than genome stability per se, these genes were primarily selected from clusters that contained known GCR-suppressing genes. None of these genes had been tested for a role in suppressing GCRs at the time this analysis was initiated, although, subsequently, the results of studies of some of the selected genes have been reported by others. We also surveyed genes from a number of other clusters. Overall, we tested the effects of 87 different single-gene deletion mutations in our standard GCR assay that measures GCRs mediated by single-copy sequences (Table 1 and SI Appendix, Table S5) and found that 35 (40%) caused at least a modest but significant threefold or higher increase in the spontaneous GCR rate, which suggests a substantial enrichment for genes that suppress genome instability in the higher scoring clusters generated by the bioinformatic analysis. The presence of a newly tested gene in clusters 4, 32, 53, 55, 59, and 60 did not have a statistically significant bias for suppressing GCRs (P = 0.2, Fisher exact probability), and the presence of new genes in these clusters did not correlate with a higher GCR rate (P = 0.3, Mann–Whitney U test).

Table 1.

GCR rates of genome instability mutants implicated by bioinformatic analysis

| Genotype* | Systematic name | Strain | Cluster | No. of DNA damaging agent screens | Rate |
|---|---|---|---|---|---|
| WT | | RDKY3615 | n.a. | n.a. | 3.5 × 10^−10 (1) |
| esc2::HIS3 | ydr363w | RDKY7030 | 32 | 11 | 9.0 × 10^−8 (257) |
| rmi1::HIS3 | ypl024w | RDKY6242 | 32 | 6 | 6.0 × 10^−8 (189) |
| mrc1::TRP1, tof1::HIS3 | ycl061c, ynl273w | RDKY7032 | 4, 4 | 37, 33 | 2.6 × 10^−8 (75) |
| slx8::HIS3 | yer116c | RDKY7527 | 4 | 11 | 2.6 × 10^−8 (75) |
| slx5::HIS3 | ydl013w | RDKY7524 | 4 | 9 | 2.3 × 10^−8 (66) |
| cdh1::HIS3 | ygl003c | RDKY6485 | 21 | 4 | 2.1 × 10^−8 (58) |
| nup84::HIS3 | ydl116w | RDKY6195 | 32 | 9 | 1.6 × 10^−8 (44) |
| rtt107::HIS3 | yhr154w | RDKY7031 | 4 | 23 | 9.4 × 10^−9 (27) |
| rpn10::HIS3 | yhr200w | RDKY6216 | 3 | 16 | 9.0 × 10^−9 (26) |
| rsc2::G418 | ylr357w | RDKY6006 | 11 | 11 | 4.8 × 10^−9 (13) |
| mms1::HIS3 | ypr164w | RDKY6206 | 4 | 32 | 3.9 × 10^−9 (11) |
| nup133::HIS3 | ykr082w | RDKY6476 | 7 | 9 | 3.7 × 10^−9 (10) |
| rtt105::HIS3 | yer104w | RDKY6673 | 32 | 0 | 3.3 × 10^−9 (9.4) |
| dst1::G418 | ygl043w | RDKY7023 | 3 | 8 | 3.0 × 10^−9 (8.6) |
| arp8::HIS3 | yor141c | RDKY5949 | 32 | 21 | 2.9 × 10^−9 (8.4) |
| irc15::HIS3 | ypl017c | RDKY7024 | 29 | 0 | 2.8 × 10^−9 (8.0) |
| nup60::HIS3 | yar002w | RDKY6489 | 7 | 11 | 2.6 × 10^−9 (7.4) |
| irc3::HIS3 | ydr332w | RDKY7467 | 29 | 0 | 2.4 × 10^−9 (6.9) |
| csm3::G418 | ymr048w | RDKY5708 | 4 | 46 | 2.2 × 10^−9 (6.3) |
| clb5::G418 | ypr120c | RDKY7458 | 32 | 17 | 2.2 × 10^−9 (6.3) |
| rml2::HIS3 | yel050c | RDKY7069 | 12 | 8 | 2.2 × 10^−9 (6.3) |
| hsl1::HIS3 | ykl101w | RDKY6487 | 29 | 20 | 1.9 × 10^−9 (5.4) |
| tof1::HIS3 | ynl273w | RDKY5135 | 4 | 33 | 1.6 × 10^−9 (4.6) |
| mph1::G418 | yir002c | RDKY7026 | 34 | 60 | 1.6 × 10^−9 (4.6) |
| pin4::G418 | ybl051c | RDKY7476 | 15 | 16 | 1.6 × 10^−9 (4.6) |
| rev7::G418 | yil139c | RDKY7483 | 60 | 28 | 1.5 × 10^−9 (4.3) |
| sae2::HIS3 | ygl175c | RDKY6234 | 55 | 62 | 1.4 × 10^−9 (4.0) |
| hst3::HIS3 | yor025w | RDKY6060 | 14 | 26 | 1.4 × 10^−9 (4.0) |
| ctf4::G418 | ypr135w | RDKY6018 | 4 | 11 | 1.4 × 10^−9 (4.0) |
| dot1::HPH | ydr440w | RDKY7021 | 16 | 2 | 1.4 × 10^−9 (4.0) |
| hta1::HIS3 | ydr225w | RDKY6490 | 15 | 13 | 1.4 × 10^−9 (4.0) |
| rtt109::HIS3 | yll002w | RDKY6226 | 4 | 16 | 1.4 × 10^−9 (4.0) |
| cdc73::HIS3 | ylr418c | RDKY6410 | 3 | 9 | 1.3 × 10^−9 (3.7) |
| lrs4::G418 | ydr439w | RDKY7470 | 2 | 29 | 1.2 × 10^−9 (3.4) |
| clb2::G418 | ypr119w | RDKY7456 | 29 | 45 | 1.2 × 10^−9 (3.4) |

n.a., not applicable.

*Deletions constructed in RDKY3615 [MATα leu2Δ1 his3Δ200 trp1Δ63 ura3-52 ade2Δ1 ade8 lys2ΔBgl hom3-10 hxt13::URA3].

Number in parentheses corresponds to fold increase in rate over the wild-type rate.
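The fold increases in Table 1 are the mutant GCR rate divided by the wild-type rate. The Python sketch below recomputes this for a few rows and applies the threefold cutoff mentioned in the text. Because the tabulated rates are rounded to two significant figures, recomputed values can differ slightly from the published fold increases; the dictionary holds only a hand-picked subset of rows.

```python
WT_RATE = 3.5e-10  # wild-type GCR rate from Table 1

# A few mutant rates copied from Table 1 (subset chosen for illustration).
rates = {
    "esc2": 9.0e-8,
    "rtt107": 9.4e-9,
    "rpn10": 9.0e-9,
    "csm3": 2.2e-9,
}

# Fold increase over wild type for each mutant.
fold = {gene: rate / WT_RATE for gene, rate in rates.items()}

# Mutants meeting the threefold-or-higher criterion used in the text.
FOLD_CUTOFF = 3.0
increased = sorted(gene for gene, f in fold.items() if f >= FOLD_CUTOFF)

for gene in increased:
    print(gene, round(fold[gene], 1))
```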

The newly identified genes that function in the suppression of genome instability could be divided into three classes. The first class of genes encoded subunits of complexes or components of pathways already known to be involved in maintaining genome stability. These included RMI1, which encodes a subunit of the Sgs1/Rmi1/Top3 complex (34); SAE2, which encodes a factor that acts in conjunction with the Mre11/Rad50/Xrs2 complex (35); RTT109, which is involved in the ASF1-dependent acetylation of K56 of histone H3 (36); and HST3, ARP8, RSC2, and HTA1, which function in chromatin assembly and remodeling pathways, processes known to prevent genome instability (37). The second class of genes has been implicated by other analyses as suppressing genome instability but had never been analyzed in the GCR assay at the time this analysis was initiated. This group of genes included CDC73, CLB2, CLB5, CSM3, DOT1, ESC2, MMS1, MRC1, MPH1, NUP84, NUP133, NUP60, REV7, RTT107 (ESC4), SLX5, SLX8, and TOF1. The third class of genes lacks known functions or was not previously known to play a role in maintaining genome stability. These genes include CDH1, CTF4, DST1, HSL1, IRC3, IRC15, LRS4, PIN4, RTT105, RML2, and RPN10. Unlike the case of RPN10, which encodes a proteasome subunit (38), deletion of the nonessential proteasome-related genes NAS6, RPN4, and UMP1 did not increase the GCR rate (SI Appendix, Table S5).

In contrast, 52 identified genes did not increase rates in our standard GCR assay when mutated (SI Appendix, Table S5). A number of these genes were from clusters 4 and 32, which contained many genes that caused substantially increased GCR rates when mutated (Fig. 4E). Defects in some of these genes were previously reported to show significant numbers of genetic interactions with defects in DNA repair, including the genes encoding the Ard1-Nat1 N-terminal acetyltransferase complex (39) and the Get1–Get2 complex involved in transporting tail-anchored proteins to the endoplasmic reticulum (40). Moreover, a number of other genes that were previously implicated as functioning in DNA repair and DNA damage responses, including CCR4, WSS1, PPH3, DOA1, and CSM1, did not appear to act in suppressing GCRs. Although it is possible that these genes play no role in maintaining genome stability, it is also possible that they suppress GCRs not detected by our standard GCR assay. Multiple genes, including RAD6, do not suppress GCRs in the standard assay but do in other GCR assays (13, 14). Other genes, such as MRC1 and TEL1, play redundant roles in suppressing GCRs that can only be observed when mutations in these genes are combined with other mutations (41) (Table 1), whereas other genes do not increase GCRs because they are required for producing GCRs.

Reiterating the Analysis with Newly Identified GCR Suppressors.

The 35 newly validated GCR-suppressing genes from this analysis (Table 1) were combined with the initial list of 75 GCR-suppressing genes to generate a starting list of 110 GCR-suppressing genes. This newly identified set of starting genes was then reanalyzed by our bioinformatics pipeline. Two hundred twenty-three genes, rather than 227 genes from the original analysis, were identified as having statistically significant genetic congruence scores (score >0.046; P < 0.01), which included 67 of the 110 starting genes. Compared with the original analysis, 18 genes were omitted (CDC6, MSH2, PBY1, PEP3, PPM1, RDH54, RFA1, RNH201, RPN6, SAE2, SGO1, SLK19, SPT4, SUM1, THP2, UBP14, ULP1, and VPS36), and 14 genes were included (ASC1, CSE2, EAF5, HOS2, MNN10, NUP188, PHO23, RTT103, SEC22, SIF1, SIN3, SNF4, SRC1, and YKE2). Mutations in PBY1, SGO1, and SPT4, which were eliminated from the list, do not increase the GCR rate (SI Appendix, Table S5). Mutations in MSH2, RDH54, RFA1, and SAE2, which were also eliminated, cause only modest increases in the GCR rate as single mutations (msh2Δ, rdh54Δ, and sae2Δ; Table 1 and SI Appendix, Table S1), are complicated by their causing increased rates of point mutations in addition to GCRs (msh2Δ) (32), or are complicated by the existence of different hypomorphic alleles that cause different phenotypes (rfa1) (33). However, all these mutations were retained in this second analysis through their presence in the initial GCR list and/or by effects on sensitivity to DNA damaging agents. Thus, these results suggest that adding more data will further refine the results.

Discussion

Systematic genetics using the S. cerevisiae deletion and hypomorphic allele collections has been well established. However, the ability to screen these mutants readily for complex phenotypes or phenotypes requiring involved quantitative assays, such as GCR assays, can be difficult and subject to significant error. Thus, we designed a bioinformatic protocol for identifying unanticipated genes involved in suppressing GCRs, which involved handling numerous genome-wide datasets affected by both false-positive and false-negative errors. This analysis identified genes that were successfully enriched for genes involved in genome stability, as evidenced by independent identification of most previously known genes (17–19) and by the experimental validation of 40% of 87 identified genes that were tested for a role in suppressing GCRs (Table 1 and SI Appendix, Table S5), including genes in unexpected pathways. Our analysis of these 87 genes resulted in the identification of more GCR-suppressing genes than resulted from three genome-wide screens involving the analysis of more than 14,000 mutants, which is consistent with a 200-fold enrichment in GCR-suppressing genes relative to the whole-genome screens. In addition, the observed gene validation frequency is likely to be higher than reported here because many mutations only cause increased GCR rates in conjunction with other mutations, such as tel1, or in segmental duplication GCR assays, such as rad6 (13, 41). A critical next step in our analysis is to test these mutations in multiple GCR assays that probe different chromosomal features and combine these mutations with other mutations.
Importantly, this approach is generally applicable; these methods allowed the analysis to be performed multiple times as more data have become available, and the nature of the starting set of well-characterized genes and genome-wide screens need not be tied to the problem of genome stability and can readily accommodate RNAi data generated in mammalian systems.

Experimental verification of the implicated genes revealed a number of interesting pathways. Genome instability was increased by deletion of genes involved in synchronizing multiple phases of the cell cycle, including CLB2, CLB5, HSL1, PIN4, and particularly CDH1, which encodes a subunit of the APC/C complex that degrades proteins during mitosis and G1; this role for CDH1 is consistent with observations in vertebrates (42). Genes encoding two different subcomplexes of the nuclear pore, the Nup84 complex (NUP84, NUP133, NUP120, NUP145, NUP85, SEH1, and SEC13) (43) and NUP60, suppressed genome instability; studies performed while this work was in progress suggest a role in suppressing accumulation of DSBs via sumoylation of DNA repair enzymes (44) and direct recruitment of DSBs to the nuclear pore (45). We also identified genes that may suppress GCRs by indirectly aiding DNA replication, including CTF4, which may link DNA synthesis to sister chromatid cohesion, and DST1, which potentially reduces collisions between RNA and DNA polymerases. In addition, we found a role for RPN10, which encodes a non-ATPase base subunit of the 19S regulatory particle of the 26S proteasome, in suppressing genome instability, suggesting that the proteasome may play roles in genome instability outside of nucleotide excision repair (38), consistent with recent reports linking the proteasome to DSB repair in S. cerevisiae (46) and vertebrates (47). Interestingly, deletion of other genes related to the proteasome, including DOA1, NAS6, UMP1, UBP6, and especially RPN4, which encodes a transcription factor that stimulates proteasome gene expression and has a similar genetic interaction profile to RPN10, did not cause increased GCR rates. Thus, the defect in rpn10Δ strains might involve a specific feature or function of the proteasome (or the regulatory particle) that is not affected by eliminating other nonessential proteasome components.
Taken together, this bioinformatics procedure has successfully identified tested components of genome stability pathways, untested components of tested genome stability pathways, untested genome stability pathways, and genes in other pathways that are beginning to be implicated in suppressing genome instability. These successes encourage further characterization of genes whose roles in suppressing genome instability might currently be less clear, including IRC3, IRC15, RML2, and RTT105 (48, 49).

This bioinformatic scheme rested on three assumptions: (i) Systematically generated genome-wide data are of sufficient quality to be useful, (ii) novel genes that suppress GCRs share some phenotypes with known genes that suppress GCRs, and (iii) genetic interactions reported on the basis of change in nonperturbed growth provide a reasonable surrogate for other biological processes. The above assumptions are sufficiently true that combining these independent sources of information yielded unexpected genes of interest that were validated at high frequency. The most problematic assumption, however, was that genetic interactions based on growth phenotypes were a reasonable measure of similarity for roles in suppressing genome instability. One of the stronger counterexamples that can be cited is the observation that deletion of 12 of 31 tested genes in a high-scoring DNA damage cluster (cluster 4) did not cause increased GCR rates as single mutations. Sufficient genetic data exist for genes in cluster 4 to suggest that nonperturbed growth-based genetic interactions are only a crude surrogate for measuring similarity in suppressing GCRs, which is consistent with the substantial changes in synthetic lethal interactions between deletion mutations caused by DNA damaging agents (31). Additionally, because only pair-wise interactions are typically identified, other kinds of important genetic results cannot be identified, such as suppression of the lethality of srs2Δ sgs1Δ double mutants by mutations causing homologous recombination defects (50), and more complex genetic redundancies, which are particularly important in higher eukaryotes, cannot be handled. Together these factors argue that although these data can be extraordinarily useful in aggregate as we have demonstrated here, caution is called for in any attempt to use these kinds of data exclusively to derive biological pathways de novo.
This is particularly true when growth is used as a surrogate marker for measuring a specific phenotype, because growth defects may not be directly related to the phenotype of interest. We are presently implementing an approach in which systematically generated double-mutant strains designed to query the enriched gene lists described here will be analyzed using multiple GCR assays to define better the pathways that suppress GCRs implied by the bioinformatic analysis presented here. We anticipate that human orthologs of verified GCR genes identified here will also play roles in suppressing genome instability and may be important for suppressing cancer initiation and progression.

Materials and Methods

Bioinformatic Analysis.

The bioinformatic analysis described here has been implemented in the integration of multiple orthogonal datasets (IMOD) program package. IMOD and associated documentation and data files are available at http://sourceforge.net/projects/imod-gene. IMOD consists of command-line programs and shell scripts. IMOD readily compiles and runs in UNIX-like operating systems. A detailed description of the methods is provided in SI Appendix, SI Materials and Methods.

Analysis of DNA damaging agent sensitivities.

Mutations deemed as causing sensitivity to different DNA damaging agent treatments were included based on the recommendations of the authors of the individual studies (SI Appendix, Table S3). Deletions of genes deemed “dubious ORFs” by the Saccharomyces Genome Database that overlapped validated genes were treated as mutant alleles of the validated genes; for example, ybr099cΔ was treated as an mms4 mutation. The full list of overlaps used is available as part of the data distributed with the IMOD software. The pnhit P values for observing a mutation in n of the N DNA damaging screens were calculated using probabilities from 1,000,000 random simulations (SI Appendix, SI Materials and Methods). Whether the distribution of any particular mutation was significantly biased toward a group of screens, such as those belonging to a specific laboratory or a specific DNA damaging agent, was determined by a likelihood ratio test (SI Appendix, SI Materials and Methods).
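
As an illustration of how a simulation-based P value for "n hits in N screens" can be estimated, the following Python sketch may be useful. It is not the IMOD implementation: the null model (each screen flags a fixed number of genes uniformly at random), the function name `simulate_hit_pvalue`, and all numbers in the usage lines are assumptions made for this example; the actual procedure is described in SI Appendix, SI Materials and Methods.

```python
import random

def simulate_hit_pvalue(n_observed, hits_per_screen, n_genes,
                        n_sims=20000, seed=0):
    """Estimate P(a given gene is a hit in >= n_observed screens).

    Assumed null model (hypothetical, for illustration only): screen i
    flags hits_per_screen[i] of n_genes genes uniformly at random, so a
    fixed gene is a hit in screen i with probability k_i / n_genes.
    """
    rng = random.Random(seed)
    probs = [k / n_genes for k in hits_per_screen]
    at_least = 0
    for _ in range(n_sims):
        # Count the screens in which the gene is flagged in this simulation.
        count = sum(1 for p in probs if rng.random() < p)
        if count >= n_observed:
            at_least += 1
    return at_least / n_sims

# Hypothetical usage: 10 screens, each flagging 200 of 4,800 genes.
p_three_or_more = simulate_hit_pvalue(3, [200] * 10, 4800)
print(p_three_or_more)
```

More simulations give finer resolution for small P values, which is presumably why a large number of simulations (1,000,000 in the analysis above) is needed when a stringent cutoff is applied.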

Calculation of genetic distance and genetic congruence.

Growth-based genetic interactions were measured using a modified BioGRID database derived from version 2.0.60 (51), including the interaction categories “synthetic lethality,” “synthetic growth defect,” and “haploinsufficiency,” as well as “phenotypic enhancement” data specifically derived from E-MAP studies (52–55). We also added 8,102 and 191,890 E-MAP interactions from additional studies published during the course of this analysis (56, 57). The interaction data were used to calculate genetic distances via the composite angle distance, which is similar to the Jaccard distance (58) but has a number of advantages for analysis of multiple genes (SI Appendix, SI Materials and Methods). We scored genetic congruence of each gene in the genome against the list of genes of interest using the composite angle distance and performed over 100,000 random simulations to calculate P values (SI Appendix, SI Materials and Methods).
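
The general idea of scoring a gene against a reference list and assessing significance by random simulation can be sketched as follows. This sketch substitutes the plain Jaccard distance for the composite angle distance, whose definition is given only in the SI Appendix; the scoring rule (mean similarity to the reference genes), the function names, and the toy interaction data are all assumptions made for illustration and do not reproduce IMOD.

```python
import random

def jaccard_distance(a, b):
    """Jaccard distance between two sets of interaction partners."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: no data, no evidence of similarity
    return 1.0 - len(a & b) / len(union)

def congruence_score(gene, reference, interactions):
    """Mean Jaccard similarity of `gene` to the reference genes (assumed rule)."""
    others = [r for r in reference if r != gene]
    if not others:
        return 0.0
    sims = [1.0 - jaccard_distance(interactions.get(gene, ()),
                                   interactions.get(r, ()))
            for r in others]
    return sum(sims) / len(sims)

def congruence_pvalue(gene, reference, interactions, n_sims=2000, seed=0):
    """Compare the observed score with scores against random gene lists."""
    rng = random.Random(seed)
    observed = congruence_score(gene, reference, interactions)
    pool = sorted(g for g in interactions if g != gene)
    k = min(len([r for r in reference if r != gene]), len(pool))
    exceed = 0
    for _ in range(n_sims):
        sample = rng.sample(pool, k)
        if congruence_score(gene, sample, interactions) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (exceed + 1) / (n_sims + 1)

# Toy data with hypothetical gene and partner names.
toy_interactions = {
    "x": {"p", "q"}, "r1": {"p", "q"}, "r2": {"p"},
    "u1": {"z"}, "u2": {"y"}, "u3": {"w"},
}
score, p_value = congruence_pvalue("x", ["r1", "r2"], toy_interactions)
print(score, p_value)
```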

Clustering.

Genes were clustered on the basis of their genetic congruence using agglomerative hierarchical clustering (59) (SI Appendix, SI Materials and Methods).
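
For readers unfamiliar with agglomerative hierarchical clustering, a minimal pure-Python sketch is given below. The linkage rule (average linkage), the distance threshold used to stop merging, and the function name are assumptions made for this example; the choices actually used in the analysis are described in SI Appendix, SI Materials and Methods.

```python
def average_linkage_clusters(items, distance, threshold):
    """Agglomerative clustering with average linkage (assumed linkage rule).

    Starts with one cluster per item and repeatedly merges the two closest
    clusters until the closest pair is farther apart than `threshold`.
    """
    clusters = [[item] for item in items]

    def cluster_distance(c1, c2):
        # Average of all pairwise distances between members of c1 and c2.
        total = sum(distance(a, b) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

# Toy usage with points on a line standing in for genes.
demo_clusters = average_linkage_clusters(
    [0.0, 0.1, 5.0, 5.2], lambda a, b: abs(a - b), threshold=1.0)
print(demo_clusters)
```

In practice, the `distance` argument would be a genetic distance between two genes, such as the one computed from their interaction profiles.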

Yeast Genetics.

S. cerevisiae strains were constructed in the RDKY3615 background (MATa leu2Δ1 his3Δ200 trp1Δ63 lys2ΔBgl hom3-10 ade2Δ1 ade8 ura3-52 hxt13::URA3) using standard PCR-based mutagenesis methods. The media and protocol for strain propagation and measuring GCR rates were essentially as described previously (7).

Supplementary Material

Supporting Information

Acknowledgments

We thank Hans Hombauer, Jorritt Ensernik, Vincent Pennaneach, Ellen Kats, and Kyungjae Myung for the generous gift of S. cerevisiae strains. This work was supported by National Institutes of Health Grants GM26017 and GM085764.

Footnotes

The authors declare no conflict of interest.

See Author Summary on page 19055 (volume 109, number 47).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1216733109/-/DCSupplemental.

References

  • 1. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 2000;100(1):57–70. doi: 10.1016/s0092-8674(00)81683-9.
  • 2. Loeb LA. A mutator phenotype in cancer. Cancer Res. 2001;61(8):3230–3239.
  • 3. Hoeijmakers JH. Genome maintenance mechanisms for preventing cancer. Nature. 2001;411(6835):366–374. doi: 10.1038/35077232.
  • 4. Vessey CJ, Norbury CJ, Hickson ID. Genetic disorders associated with cancer predisposition and genomic instability. Prog Nucleic Acid Res Mol Biol. 1999;63:189–221. doi: 10.1016/s0079-6603(08)60723-0.
  • 5. Soussi T, Ishioka C, Claustres M, Béroud C. Locus-specific mutation databases: Pitfalls and good practice based on the p53 experience. Nat Rev Cancer. 2006;6(1):83–90. doi: 10.1038/nrc1783.
  • 6. van de Wetering CI, Horne MC, Knudson CM. Chromosomal instability and supernumerary centrosomes represent precursor defects in a mouse model of T-cell lymphoma. Cancer Res. 2007;67(17):8081–8088. doi: 10.1158/0008-5472.CAN-07-1666.
  • 7. Chen C, Kolodner RD. Gross chromosomal rearrangements in Saccharomyces cerevisiae replication and recombination defective mutants. Nat Genet. 1999;23(1):81–85. doi: 10.1038/12687.
  • 8. Pennaneach V, Kolodner RD. Recombination and the Tel1 and Mec1 checkpoints differentially effect genome rearrangements driven by telomere dysfunction in yeast. Nat Genet. 2004;36(6):612–617. doi: 10.1038/ng1359.
  • 9. Putnam CD, Pennaneach V, Kolodner RD. Chromosome healing through terminal deletions generated by de novo telomere additions in Saccharomyces cerevisiae. Proc Natl Acad Sci USA. 2004;101(36):13262–13267. doi: 10.1073/pnas.0405443101.
  • 10. Putnam CD, Pennaneach V, Kolodner RD. Saccharomyces cerevisiae as a model system to define the chromosomal instability phenotype. Mol Cell Biol. 2005;25(16):7226–7238. doi: 10.1128/MCB.25.16.7226-7238.2005.
  • 11. Pennaneach V, Kolodner RD. Stabilization of dicentric translocations through secondary rearrangements mediated by multiple mechanisms in S. cerevisiae. PLoS ONE. 2009;4(7):e6389. doi: 10.1371/journal.pone.0006389.
  • 12. Kolodner RD, Putnam CD, Myung K. Maintenance of genome stability in Saccharomyces cerevisiae. Science. 2002;297(5581):552–557. doi: 10.1126/science.1075277.
  • 13. Putnam CD, Hayes TK, Kolodner RD. Specific pathways prevent duplication-mediated genome rearrangements. Nature. 2009;460(7258):984–989. doi: 10.1038/nature08217.
  • 14. Putnam CD, Hayes TK, Kolodner RD. Post-replication repair suppresses duplication-mediated genome instability. PLoS Genet. 2010;6(5):e1000933. doi: 10.1371/journal.pgen.1000933.
  • 15. Chan JE, Kolodner RD. A genetic and structural study of genome rearrangements mediated by high copy repeat Ty1 elements. PLoS Genet. 2011;7(5):e1002089. doi: 10.1371/journal.pgen.1002089.
  • 16. Wang Y, et al. Mutation in Rpa1 results in defective DNA double-strand break repair, chromosomal instability and cancer in mice. Nat Genet. 2005;37(7):750–755. doi: 10.1038/ng1587.
  • 17. Huang ME, Rio AG, Nicolas A, Kolodner RD. A genomewide screen in Saccharomyces cerevisiae for genes that suppress the accumulation of mutations. Proc Natl Acad Sci USA. 2003;100(20):11529–11534. doi: 10.1073/pnas.2035018100.
  • 18. Smith S, et al. Mutator genes for suppression of gross chromosomal rearrangements identified by a genome-wide screening in Saccharomyces cerevisiae. Proc Natl Acad Sci USA. 2004;101(24):9039–9044. doi: 10.1073/pnas.0403093101.
  • 19. Kanellis P, et al. A screen for suppressors of gross chromosomal rearrangements identifies a conserved role for PLP in preventing DNA lesions. PLoS Genet. 2007;3(8):e134. doi: 10.1371/journal.pgen.0030134.
  • 20. Stirling PC, et al. The complete spectrum of yeast chromosome instability genes identifies candidate CIN cancer genes and functional roles for ASTRA complex components. PLoS Genet. 2011;7(4):e1002057. doi: 10.1371/journal.pgen.1002057.
  • 21. Jordan PW, Klein F, Leach DR. Novel roles for selected genes in meiotic DNA processing. PLoS Genet. 2007;3(12):e222. doi: 10.1371/journal.pgen.0030222.
  • 22. Leek JT, et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet. 2010;11(10):733–739. doi: 10.1038/nrg2825.
  • 23. Beissbarth T, Speed TP. GOstat: Find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics. 2004;20(9):1464–1465. doi: 10.1093/bioinformatics/bth088.
  • 24. Ashburner M, et al.; The Gene Ontology Consortium. Gene ontology: Tool for the unification of biology. Nat Genet. 2000;25(1):25–29. doi: 10.1038/75556.
  • 25. Ye P, et al. Gene function prediction from congruent synthetic lethal interactions in yeast. Mol Syst Biol. 2005;1:2005.0026. doi: 10.1038/msb4100034.
  • 26. Tischler J, Lehner B, Fraser AG. Evolutionary plasticity of genetic interaction networks. Nat Genet. 2008;40(4):390–391. doi: 10.1038/ng.114.
  • 27. Daniel JA, Yoo J, Bettinger BT, Amberg DC, Burke DJ. Eliminating gene conversion improves high-throughput genetics in Saccharomyces cerevisiae. Genetics. 2006;172(1):709–711. doi: 10.1534/genetics.105.047662.
  • 28. Singh I, Pass R, Togay SO, Rodgers JW, Hartman JL 4th. Stringent mating-type-regulated auxotrophy increases the accuracy of systematic genetic interaction screens with Saccharomyces cerevisiae mutant arrays. Genetics. 2009;181(1):289–300. doi: 10.1534/genetics.108.092981.
  • 29. Lehner KR, Stone MM, Farber RA, Petes TD. Ninety-six haploid yeast strains with individual disruptions of open reading frames between YOR097C and YOR192C, constructed for the Saccharomyces genome deletion project, have an additional mutation in the mismatch repair gene MSH3. Genetics. 2007;177(3):1951–1953. doi: 10.1534/genetics.107.079368.
  • 30. Game JC, et al. Use of a genome-wide approach to identify new genes that control resistance of Saccharomyces cerevisiae to ionizing radiation. Radiat Res. 2003;160(1):14–24. doi: 10.1667/rr3019.
  • 31. Bandyopadhyay S, et al. Rewiring of genetic networks in response to DNA damage. Science. 2010;330(6009):1385–1389. doi: 10.1126/science.1195618.
  • 32. Myung K, Datta A, Chen C, Kolodner RD. SGS1, the Saccharomyces cerevisiae homologue of BLM and WRN, suppresses genome instability and homeologous recombination. Nat Genet. 2001;27(1):113–116. doi: 10.1038/83673.
  • 33. Chen C, Umezu K, Kolodner RD. Chromosomal rearrangements occur in S. cerevisiae rfa1 mutator mutants due to mutagenic lesions processed by double-strand-break repair. Mol Cell. 1998;2(1):9–22. doi: 10.1016/s1097-2765(00)80109-4.
  • 34. Chang M, et al. RMI1/NCE4, a suppressor of genome instability, encodes a member of the RecQ helicase/Topo III complex. EMBO J. 2005;24(11):2024–2033. doi: 10.1038/sj.emboj.7600684.
  • 35. Lengsfeld BM, Rattray AJ, Bhaskara V, Ghirlando R, Paull TT. Sae2 is an endonuclease that processes hairpin DNA cooperatively with the Mre11/Rad50/Xrs2 complex. Mol Cell. 2007;28(4):638–651. doi: 10.1016/j.molcel.2007.11.001.
  • 36. Marmorstein R, Trievel RC. Histone modifying enzymes: Structures, mechanisms, and specificities. Biochim Biophys Acta. 2009;1789(1):58–68. doi: 10.1016/j.bbagrm.2008.07.009.
  • 37. Myung K, Pennaneach V, Kats ES, Kolodner RD. Saccharomyces cerevisiae chromatin-assembly factors that act during DNA replication function in the maintenance of genome stability. Proc Natl Acad Sci USA. 2003;100(11):6640–6645. doi: 10.1073/pnas.1232239100.
  • 38. Reed SH, Gillette TG. Nucleotide excision repair and the ubiquitin proteasome pathway—Do all roads lead to Rome? DNA Repair (Amst). 2007;6(2):149–156. doi: 10.1016/j.dnarep.2006.10.026.
  • 39. Park EC, Szostak JW. ARD1 and NAT1 proteins form a complex that has N-terminal acetyltransferase activity. EMBO J. 1992;11(6):2087–2093. doi: 10.1002/j.1460-2075.1992.tb05267.x.
  • 40. Schuldiner M, et al. The GET complex mediates insertion of tail-anchored proteins into the ER membrane. Cell. 2008;134(4):634–645. doi: 10.1016/j.cell.2008.06.025.
  • 41. Myung K, Datta A, Kolodner RD. Suppression of spontaneous chromosomal rearrangements by S phase checkpoint functions in Saccharomyces cerevisiae. Cell. 2001;104(3):397–408. doi: 10.1016/s0092-8674(01)00227-6.
  • 42. García-Higuera I, et al. Genomic stability and tumour suppression by the APC/C cofactor Cdh1. Nat Cell Biol. 2008;10(7):802–811. doi: 10.1038/ncb1742.
  • 43. Lutzmann M, Kunze R, Buerer A, Aebi U, Hurt E. Modular self-assembly of a Y-shaped multiprotein complex from seven nucleoporins. EMBO J. 2002;21(3):387–397. doi: 10.1093/emboj/21.3.387.
  • 44. Palancade B, et al. Nucleoporins prevent DNA damage accumulation by modulating Ulp1-dependent sumoylation processes. Mol Biol Cell. 2007;18(8):2912–2923. doi: 10.1091/mbc.E07-02-0123.
  • 45. Nagai S, et al. Functional targeting of DNA damage to a nuclear pore-associated SUMO-dependent ubiquitin ligase. Science. 2008;322(5901):597–602. doi: 10.1126/science.1162790.
  • 46. Ben-Aroya S, et al. Proteasome nuclear activity affects chromosome stability by controlling the turnover of Mms22, a protein important for DNA repair. PLoS Genet. 2010;6(2):e1000852. doi: 10.1371/journal.pgen.1000852.
  • 47. Motegi A, Murakawa Y, Takeda S. The vital link between the ubiquitin-proteasome pathway and DNA repair: impact on cancer therapy. Cancer Lett. 2009;283(1):1–9. doi: 10.1016/j.canlet.2008.12.030.
  • 48. Alvaro D, Lisby M, Rothstein R. Genome-wide analysis of Rad52 foci reveals diverse mechanisms impacting recombination. PLoS Genet. 2007;3(12):e228. doi: 10.1371/journal.pgen.0030228.
  • 49. Scholes DT, Banerjee M, Bowen B, Curcio MJ. Multiple regulators of Ty1 transposition in Saccharomyces cerevisiae have conserved roles in genome maintenance. Genetics. 2001;159(4):1449–1465. doi: 10.1093/genetics/159.4.1449.
  • 50. Gangloff S, Soustelle C, Fabre F. Homologous recombination is responsible for cell death in the absence of the Sgs1 and Srs2 helicases. Nat Genet. 2000;25(2):192–194. doi: 10.1038/76055.
  • 51. Stark C, et al. BioGRID: A general repository for interaction datasets. Nucleic Acids Res. 2006;34(Database issue):D535–D539. doi: 10.1093/nar/gkj109.
  • 52. Collins SR, et al. Functional dissection of protein complexes involved in yeast chromosome biology using a genetic interaction map. Nature. 2007;446(7137):806–810. doi: 10.1038/nature05649.
  • 53. Jessulat M, et al. Interacting proteins Rtt109 and Vps75 affect the efficiency of non-homologous end-joining in Saccharomyces cerevisiae. Arch Biochem Biophys. 2008;469(2):157–164. doi: 10.1016/j.abb.2007.11.001.
  • 54. Schuldiner M, et al. Exploration of the function and organization of the yeast early secretory pathway through an epistatic miniarray profile. Cell. 2005;123(3):507–519. doi: 10.1016/j.cell.2005.08.031.
  • 55. Wilmes GM, et al. A genetic interaction map of RNA-processing factors reveals links between Sem1/Dss1-containing complexes and mRNA export and splicing. Mol Cell. 2008;32(5):735–746. doi: 10.1016/j.molcel.2008.11.012.
  • 56. Fiedler D, et al. Functional organization of the S. cerevisiae phosphorylation network. Cell. 2009;136(5):952–963. doi: 10.1016/j.cell.2008.12.039.
  • 57. Costanzo M, et al. The genetic landscape of a cell. Science. 2010;327(5964):425–431. doi: 10.1126/science.1180823.
  • 58. Jaccard P. Distribution de la flore alpine dans le bassin des Dranses et dans quelques régions voisines. Bull Soc Vaud Sci Nat. 1901;37:241–272. French.
  • 59. Xu R, Wunsch D 2nd. Survey of clustering algorithms. IEEE Trans Neural Netw. 2005;16(3):645–678. doi: 10.1109/TNN.2005.845141.
  • 60. Askree SH, et al. A genome-wide screen for Saccharomyces cerevisiae deletion mutants that affect telomere length. Proc Natl Acad Sci USA. 2004;101(23):8658–8663. doi: 10.1073/pnas.0401263101.
  • 61. Gatbonton T, et al. Telomere length as a quantitative trait: Genome-wide survey and genetic mapping of telomere length-control genes in yeast. PLoS Genet. 2006;2(3):e35. doi: 10.1371/journal.pgen.0020035.
  • 62. Griffith JL, et al. Functional genomics reveals relationships between the retrovirus-like Ty1 element and its host Saccharomyces cerevisiae. Genetics. 2003;164(3):867–879. doi: 10.1093/genetics/164.3.867.
  • 63. Irwin B, et al. Retroviruses and yeast retrotransposons use overlapping sets of host genes. Genome Res. 2005;15(5):641–654. doi: 10.1101/gr.3739005.
  • 64. Ouspenski II, Elledge SJ, Brinkley BR. New yeast genes important for chromosome integrity and segregation identified by dosage effects on genome stability. Nucleic Acids Res. 1999;27(15):3001–3008. doi: 10.1093/nar/27.15.3001.
  • 65. Yuen KW, et al. Systematic genome instability screens in yeast and their potential relevance to cancer. Proc Natl Acad Sci USA. 2007;104(10):3925–3930. doi: 10.1073/pnas.0610642104.
  • 66. Andersen MP, Nelson ZW, Hetrick ED, Gottschling DE. A genetic screen for increased loss of heterozygosity in Saccharomyces cerevisiae. Genetics. 2008;179(3):1179–1195. doi: 10.1534/genetics.108.089250.
Proc Natl Acad Sci U S A. 2012 Nov 20;109(47):19055–19056.

Author Summary

The promise of genome-wide datasets is that they can be used to identify biological pathways and their interactions. Confounding the analyses of these genome-wide datasets are noisy and missing data and the problem of integrating diverse datasets. Here, we developed bioinformatics tools to integrate genetic assay data, genetic interaction data, and drug sensitivity data, and we applied these tools to genes and pathways that suppress genome instability in the yeast Saccharomyces cerevisiae. Genomic instability is a characteristic of most tumors and may be crucial in the initiation or progression of many cancers (1, 2), and identification of pathways that prevent genome instability in S. cerevisiae should aid in the study of conserved pathways in human cancers. A quantitative genetic assay for gross chromosomal rearrangements (GCRs) in S. cerevisiae has identified many genes that suppress GCRs (3) (Fig. P1A). Use of modified GCR assays in genome-wide screens identified only partially overlapping sets of genes, suggesting that important genes may still be unknown. Further, mutations causing low but significant signals are difficult to identify in high-throughput screens. We therefore used bioinformatics tools to identify candidate genes that share genetic properties with genes known to maintain genome stability. Experimental validation of a subset of candidates indicated a 200-fold higher efficiency in identifying GCR-suppressing genes compared with direct genome-wide screens, establishing the gene list described here as a resource for use in studying the genetic control of genome stability in S. cerevisiae and in human cancers.

Fig. P1.

Identifying candidate genome instability genes using bioinformatics. (A) GCR assay uses the toxic compounds canavanine and 5-fluoroorotic acid to select for progeny containing chromosomal rearrangements that have deleted the URA3 and CAN1 genes from the nonessential (solid lines) end of the left arm of chromosome V. The major types of rearrangements are chromosome breaks, which must be distal from the most telomeric essential gene, PCM1, and are healed by the addition of novel telomeres (de novo telomere additions); by fusion to telomeric fragments of chromosome V (interstitial deletions); and by various forms of translocations of the breaks to other chromosomes, which can be simple translocations or complex translocations involving dicentric intermediates (translocations). (B) Bioinformatic scheme to identify candidate genome instability genes shows the number of candidate genes identified at each step of the analysis of GCR rates, DNA-damaging agent sensitivity, and genetic similarity.

We first identified 75 genes that, when mutated, were previously reported to increase the GCR rate by at least fivefold and/or cause synergistic increases in GCR rates when combined with other mutations (Fig. P1B). Many of these genes suppress sensitivity to DNA-damaging agents. Analysis of 155 previously published genome-wide screens for mutations causing sensitivity to DNA-damaging agents, such as hydroxyurea and UV light, implicated 4,414 mutations (over 90% of the genes screened in these assays). Most mutations were only identified a few times, and data generated in different screens with the same agent typically showed a substantial lack of concordance. However, the most commonly identified genes were previously known DNA repair or DNA-damage response genes. Therefore, we identified those mutations that were observed a statistically significant number of times and excluded those mutations only identified with a particular damaging agent or only identified by an individual laboratory, which resulted in 928 genes associated with drug resistance, 44 of which were known to suppress GCRs (Fig. P1B).

In addition to genes identified by screening, we sought to identify genes that were genetically similar. Using databases of genetic interactions, which indicated when simultaneous mutation of two genes caused defects in growth under normal conditions, we looked for genes with shared patterns of genetic interactions, termed “genetic congruence.” We identified 227 genes that had statistically significant genetic congruence with the 75 genes suppressing genome instability and 148 genes with statistically significant genetic congruence with the 928 genes suppressing DNA-damaging agent sensitivity. Merging all the genes identified, as well as 10 genes that were related to GCR-suppressing genes but did not reach statistical significance for inclusion, identified 1,041 genes that saturated entire key pathways, including double-strand break repair and DNA damage signaling (Fig. P1B).

We then clustered the 1,041 genes using their genetic congruence with each other to prioritize genes for experimental validation. Thirty-five of the 87 genes selected for validation suppressed the formation of single-copy sequence-mediated GCRs. Thus, genes identified in this analysis were enriched in those that suppressed GCRs, which increased the number of known GCR-suppressing genes from 75 to 110. The GCR-suppressing genes identified belonged to three classes. The first class included untested components of complexes and pathways already identified in suppressing GCRs. Members of the second class were implicated in other analyses as maintaining genome stability but had not been tested in the GCR assay. Genes in the third class were of unknown function or belonged to pathways not previously implicated in genome stability. Together, these results indicate that synchronization of the cell cycle, two distinct subcomplexes of the nuclear pore, and the regulatory particle of the proteasome play roles in suppressing GCRs. The remaining 52 genes did not suppress GCRs in this assay, including those previously implicated in DNA damage responses. However, our experimental validation could overlook genes that specifically suppress GCRs mediated by duplicated sequences and not single-copy sequences (4) or suppress GCRs that occur when other genes are inactivated (5). Thus, additional experimentation may identify additional genes that maintain genome stability.

Our present bioinformatics analysis relied on three underlying hypotheses: (i) Known and previously unidentified GCR-suppressing genes share some phenotypic properties, (ii) systematic and random errors will not obscure useful information in genome-wide datasets, and (iii) genetic interactions determined by synergistic growth defects under permissive growth conditions will provide useful information about other biological processes. Our results suggest that each hypothesis is true; however, the third hypothesis is the most problematic, because genes with important roles in suppressing GCRs often cluster with genes that do not, even in cases in which the genetic interactions are among the most likely to be saturated. Thus, our results suggest caution in interpreting pathways and networks constructed when growth interactions are used as a surrogate for biological processes that may not have an impact on growth rates, even though our analysis demonstrated that construction of such pathways and networks can identify large numbers of previously unidentified genes that function in the process investigated. In the case of genes that function in the control of genome stability, these results greatly expand the number of candidate genes available for study in S. cerevisiae and, ultimately, human cancers.

Footnotes

The authors declare no conflict of interest.

This is a Contributed submission.

See full research article on page E3251 of www.pnas.org.

Cite this Author Summary as: PNAS 10.1073/pnas.1216733109.

References

  • 1. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 2000;100(1):57–70. doi: 10.1016/s0092-8674(00)81683-9.
  • 2. Loeb LA. A mutator phenotype in cancer. Cancer Res. 2001;61(8):3230–3239.
  • 3. Chen C, Kolodner RD. Gross chromosomal rearrangements in Saccharomyces cerevisiae replication and recombination defective mutants. Nat Genet. 1999;23(1):81–85. doi: 10.1038/12687.
  • 4. Putnam CD, Hayes TK, Kolodner RD. Specific pathways prevent duplication-mediated genome rearrangements. Nature. 2009;460(7258):984–989. doi: 10.1038/nature08217.
  • 5. Myung K, Chen C, Kolodner RD. Multiple pathways cooperate in the suppression of genome instability in Saccharomyces cerevisiae. Nature. 2001;411(6841):1073–1076. doi: 10.1038/35082608.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
