Abstract
The evolutionary rates of functionally related genes often covary. We present a gene coevolution network inferred from examining nearly 3 million orthologous gene pairs from 332 budding yeast species spanning ~400 million years of evolution. Network modules provide insight into cellular and genomic structure and function. Examination of the phenotypic impact of network perturbation using deletion mutant data from the baker’s yeast Saccharomyces cerevisiae, which were obtained from previously published studies, suggests that fitness in diverse environments is affected by orthologous gene neighborhood and connectivity. Mapping the network onto the chromosomes of S. cerevisiae and Candida albicans revealed that coevolving orthologous genes are not physically clustered in either species; rather, they are often located on different chromosomes or far apart on the same chromosome. The coevolution network captures the hierarchy of cellular structure and function, provides a roadmap for genotype-to-phenotype discovery, and portrays the genome as a linked ensemble of genes.
The budding yeast coevolution network captures cellular structure and function in the absence of functional data.
INTRODUCTION
Genetic networks—diagrams wherein nodes represent genes and edges represent measured functional relationships between nodes—can elucidate how genes are organized into pathways and contribute to cellular functions, shedding light onto the relationship between genotype and phenotype (1–4). Given the rich information contained in or derived from genetic networks, numerous approaches that aim to capture some aspect(s) of functional relationships among genes in a genome (e.g., gene coexpression and genetic interaction) have been developed (5–7). While these networks are highly informative, their availability and applicability are typically limited to select model organisms and single extant species or strains. Application of information from the genetic network of one organism to understand the biology of another requires assuming that the networks of the two organisms are conserved, which is not always the case (8–18).
One complementary, but poorly studied, method for constructing genetic networks is by measuring the coevolution of orthologous genes, which can be done by calculating the covariation of relative evolutionary rates among orthologous genes (19–22). Briefly, by estimating an orthologous gene’s phylogeny, one infers the rate (and changes in rate) of its evolution across the phylogeny; if the evolutionary rate values estimated for each branch of an orthologous gene’s phylogeny are significantly correlated with those of another gene’s phylogeny, the two orthologs are said to be coevolving. Note that coevolution of orthologous genes is distinct from organismal coevolution in which reciprocal evolutionary changes occur between interacting lineages—for example, insect pollinators impacting flowering plant diversification (23, 24). By estimating coevolution for all pairs of orthologous genes in a clade, one can infer the clade’s orthologous gene coevolution network, where nodes correspond to orthologs and edges correspond to the degree to which two orthologs coevolve (22). Genetic networks based on gene coevolution leverage evolutionary information, whereas standard genetic networks rely on the correlation of functional data such as gene expression or the presence of genetic interactions among genes within a single extant species or strain.
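The core calculation described above, correlating per-branch rate estimates between two gene trees, can be sketched in a few lines. This is a minimal illustration only: the rate values are hypothetical, the function name `pearson_r` is ours, and the sketch omits any branch-length normalization or filtering that a dedicated tool would apply before computing the correlation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of rates."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical relative rates for the same five branches in two gene trees.
gene_a_rates = [0.8, 1.1, 1.9, 0.4, 1.3]
gene_b_rates = [0.7, 1.2, 2.1, 0.5, 1.2]
r = pearson_r(gene_a_rates, gene_b_rates)
```

Because both vectors are indexed by the same branches of a shared topology, a high `r` indicates that the two genes sped up and slowed down on the same lineages.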
Orthologous gene coevolution is often observed among genes that share functions, are coexpressed, or whose protein products are subunits in a multimeric protein structure, and can yield insights into the genotype-to-phenotype map (25, 26). For example, screening for genes that have coevolved with genes in known DNA repair pathways across 33 mammals led to the identification of DDIAS, whose involvement in DNA repair was subsequently functionally validated (26). Furthermore, among 918 pairs of interacting proteins in the protein structural interactome map, a database of structural domain-domain interactions in the protein data bank (www.rcsb.org/), four of five proteins exhibit signatures of gene coevolution (27). Although these and other studies have demonstrated that signatures of coevolution provide a powerful means of detecting functional associations among genes in the absence of functional data (20, 25, 26, 28–30), the network biology principles of gene coevolution, especially between genes that have coevolved for hundreds of millions of years, remain unexplored.
To unravel the general principles of orthologous gene coevolutionary networks, we constructed the coevolution network of a densely sampled set of orthologs from one-third of known budding yeast species (332 species) that diversified over ~400 million years. The inferred network provides a hierarchical view of cellular function from broad bioprocesses to specific pathways. Integration of the gene coevolution network with fitness assay data from single and digenic Saccharomyces cerevisiae mutants (1, 2, 31, 32) provides insight into subnetwork- and ortholog-specific potential to buffer genetic perturbations. Unexpectedly, comparisons of genetic networks inferred from gene coevolution and genetic interactions yield similar functional insights; for example, hubs of genes tend to be functionally related and gene essentiality affects gene connectivity, wherein essential genes are more densely connected than nonessential genes. Unlike genetic interaction networks, gene coevolution networks can also provide evolutionary insights; for example, mapping the orthologous gene coevolution network onto the chromosomes of two model yeast genomes uncovers extensive interchromosomal and long-range intrachromosomal associations, providing an “entangled” view of the genome across evolutionary time scales. We anticipate that these results will facilitate the generation, interpretation, and utility of these networks among other lineages in the tree of life.
RESULTS
A gene coevolution network
We examined 2,898,028 pairs of orthologous genes from a dataset of 2408 orthologous genes in 332 budding yeast species. Broad network properties were stable across a range of thresholds for “significant” orthologous gene coevolution (fig. S2). To conservatively define “significant” coevolution and therefore examine orthologous gene pairs with only robust signatures of coevolution, we implemented a high correlation coefficient threshold for significant orthologous gene coevolution (r ≥ 0.825; Pearson correlation among relative evolutionary rates). This resulted in 60,305 significant signatures of orthologous gene coevolution (Fig. 1, A and B, and fig. S1), which were used to construct a network where nodes are orthologous genes and edges connect orthologous genes that are significantly coevolving (Fig. 1C).
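The pair count and the thresholding step can be checked with a short sketch. The pair count follows directly from the number of orthologs; the `build_edges` helper and the toy correlation values are hypothetical and only illustrate how a cutoff of r ≥ 0.825 turns pairwise correlations into network edges.

```python
from math import comb

# 2408 orthologs give C(2408, 2) unordered pairs, matching the count in the text.
assert comb(2408, 2) == 2_898_028

def build_edges(correlations, threshold=0.825):
    """Keep gene pairs whose correlation meets or exceeds the threshold.

    `correlations` maps a (gene_a, gene_b) tuple to a Pearson r value.
    """
    return {pair for pair, r in correlations.items() if r >= threshold}

# Hypothetical correlation values for three genes (illustration only).
toy = {("A", "B"): 0.91, ("A", "C"): 0.40, ("B", "C"): 0.825}
edges = build_edges(toy)
```

With the reported numbers, 60,305 of 2,898,028 pairs pass the cutoff, so roughly 2% of all possible edges are present in the network.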
Fig. 1. Constructing the budding yeast orthologous gene coevolution network.
(A) We determined coevolution in a set of 2408 single gene trees in which branch lengths were inferred along the species tree topology. (B) Coevolution of orthologous genes was evaluated across all pairwise combinations of orthologous genes using the CovER function in PhyKIT, v0.1 (22). (C) Significantly coevolving pairs of orthologous genes were used to construct a global network of orthologous gene coevolution where nodes correspond to orthologous genes and edges connect orthologous genes that are significantly coevolving. The “ring” of nodes corresponds to the orthologous genes found to be coevolving with very few or no other (i.e., singletons) orthologous genes in our dataset.
To determine how orthologous gene connectivity varied in the network, we examined patterns of dense and sparse connections for individual orthologous genes. Individual orthologous genes coevolved with a median of eight other orthologous genes, but connectivity varied substantially across the network (fig. S3). For example, 1091 orthologous genes have signatures of coevolution with five or fewer other orthologous genes, and 601 orthologous genes are singletons, which we define as orthologous genes that are not significantly coevolving with any other orthologous genes in the dataset. In contrast, 420 orthologous genes have signatures of coevolution with 100 or more other orthologous genes, and 21 orthologous genes coevolve with 400 or more others.
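Connectivity summaries of this kind (degree per ortholog, median degree, and singletons) can be derived from an edge list as follows. This is a sketch under the assumption that the network is stored as a set of unordered gene pairs; the function name and toy data are hypothetical.

```python
from collections import Counter
from statistics import median

def degree_summary(genes, edges):
    """Return per-gene degree, the median degree, and the sorted singleton list."""
    degree = Counter({g: 0 for g in genes})
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Singletons are genes with no significant coevolution partner.
    singletons = sorted(g for g in genes if degree[g] == 0)
    return dict(degree), median(degree[g] for g in genes), singletons

# Toy network: four genes, two edges, one singleton.
genes = ["A", "B", "C", "D"]
edges = {("A", "B"), ("A", "C")}
degree, median_degree, singletons = degree_summary(genes, edges)
```

Initializing every gene at zero matters here: genes without edges would otherwise be missing from the degree table and the median would be inflated.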
Coevolving orthologous genes in the network tend to be functionally related. For example, PEX1 and PEX6 are among the gene pairs with the highest observed correlation coefficient in evolutionary rates (fig. S4). In S. cerevisiae, the two orthologous genes encode a heterohexameric complex responsible for protein transport across peroxisomal membranes (33), and mutations in either gene can lead to severe peroxisomal disorders in humans (34). Functional enrichment among densely connected orthologous genes revealed that complex bioprocesses that require coordination among polygenic protein products are overrepresented (fig. S5 and table S1). For example, CHD1, INO80, and ARP5, which encode proteins responsible for chromatin remodeling processes such as nucleosome sliding and spacing (35), are coevolving with 400 or more other orthologous genes (fig. S5 and table S1). Together, these findings highlight that coevolution may be observed among orthologous genes that physically interact (e.g., PEX1 and PEX6) or contribute to highly intricate biological processes (e.g., INO80). More broadly, these data support the hypothesis that coevolving orthologous genes tend to have similar functions.
To determine how connectivity varied within the network, we examined the properties of subnetworks across orthologous genes considered essential and nonessential in the model yeast S. cerevisiae or the opportunistic pathogen Candida albicans (36, 37). Essential genes are densely connected in the orthologous gene coevolutionary network, whereas nonessential genes exhibit sparser connections (Fig. 2, A to D). To infer network orthologous gene communities (clusters of orthologous genes that have more connections between them than between orthologous genes of different clusters), we used a hierarchical agglomeration algorithm (Fig. 2E). Five large orthologous gene communities (clusters of more than 10 orthologous genes) were identified. Each orthologous gene community varied in size, orthologous gene community–to–orthologous gene community connectivity, and essential/nonessential orthologous gene composition. Specifically, the two largest orthologous gene communities, communities 1 and 2, share the most connections and belong to a higher-order cluster with the next two largest orthologous gene communities, communities 3 and 4 (Fig. 2E and fig. S6). In contrast, the smallest orthologous gene community, community 5, does not cluster with the other orthologous gene communities. Moreover, essential genes are overrepresented in orthologous gene community 1 but are underrepresented in orthologous gene communities 2 and 3 and in smaller communities of 10 or fewer orthologous genes (Fig. 2F; P < 0.01 for all tests; Fisher’s exact test). The result that S. cerevisiae and C. albicans essential genes are central hubs in a coevolution network constructed from orthologous genes that represent 400 million years of budding yeast evolution mirrors the finding that essential genes are central hubs in the S. cerevisiae genetic interaction network (2).
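The overrepresentation of essential genes in a community can be illustrated with a one-sided Fisher's exact test, which reduces to a hypergeometric tail probability. This is a sketch under stated assumptions: the text does not say whether one- or two-sided tests were used, the function name `fisher_overrep_p` is ours, and the counts below are invented for illustration.

```python
from math import comb

def fisher_overrep_p(k, n, K, N):
    """One-sided P(X >= k) for overrepresentation.

    N: genes in the dataset, K: essential genes in the dataset,
    n: genes in the community, k: essential genes in the community.
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum hypergeometric probabilities for k, k+1, ..., upper essential genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical counts: 20 genes, 5 essential; a 4-gene community holds 3 of them.
p = fisher_overrep_p(k=3, n=4, K=5, N=20)
```

Testing for underrepresentation would use the lower tail, P(X ≤ k), instead.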
Fig. 2. Network modules reflect modules of bioprocesses.
(A) Global network of orthologous gene coevolution and essential (B) and nonessential (C) orthologous gene networks in S. cerevisiae and C. albicans. The “ring” of nodes in each plot is composed of orthologous genes that coevolve with very few or no other genes. (D) The essential gene subnetwork has higher transitivity and edge density values. The nonessential gene network has higher mean distance and diameter values. (E) There are five major subnetworks or orthologous gene communities illustrated by different colors; small communities (≤10 orthologous genes) are in gray. Edge width: number of coevolving orthologous gene pairs between communities; node size: number of orthologous genes in a community. Orthologous gene communities 1 to 4 cluster together; community 5 is a singleton. (F) Essential orthologous genes are overrepresented in orthologous gene community 1. (G to I) Orthologous gene communities differ in enriched terms. MF, molecular functions; BP, biological processes. Circles: enriched GO terms; colors: −log10 P values; size of circles: GO term uniqueness. Enrichment results for each orthologous gene community are reported in table S3. The figure legend is to the right of (F).
From processes to pathways: The budding yeast coevolution network captures the hierarchy of cellular function
To gain insight into the functional neighborhoods of the orthologous gene coevolution network, we examined via Gene Ontology (GO) enrichment analysis (38) the composition of each orthologous gene community. Among the highest-order cluster of orthologous gene communities (i.e., communities 1 through 4), we found that higher-order cellular processes including nucleic acid metabolism [P = 0.040; Fisher’s exact test multitest corrected using false discovery rate correction with Benjamini-Hochberg (FDR-BH)] and cellular anatomical entities (P = 0.020; Fisher’s exact test multitest corrected using FDR-BH) are enriched. At the individual orthologous gene community level, we found that orthologous gene community 1 is enriched in orthologous genes with helicase activity (P = 0.005; Fisher’s exact test multitest corrected using FDR-BH), ligase activity (P = 0.004; Fisher’s exact test multitest corrected using FDR-BH), and translation initiation factors (P = 0.024; Fisher’s exact test multitest corrected using FDR-BH); orthologous gene community 2 is enriched in Golgi vesicle transport orthologous genes (P = 0.009; Fisher’s exact test multitest corrected using FDR-BH); whereas singletons are enriched in guanosine triphosphatase (GTPase) activity (P = 0.016; Fisher’s exact test multitest corrected using FDR-BH) and peroxiredoxin activity (P = 0.036; Fisher’s exact test multitest corrected using FDR-BH) (Fig. 2, G to I, and table S3).
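The FDR-BH correction applied to these enrichment P values follows the standard Benjamini-Hochberg step-up procedure. A minimal pure-Python sketch is shown below; the input P values are made up for illustration and are not taken from the enrichment results.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values from four enrichment tests.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

Each adjusted value is the raw P value scaled by (number of tests / rank), capped by the adjusted value of the next-larger P value, so the ordering of the tests is preserved.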
Functional neighborhoods of coevolving orthologous genes within and between biological functions, as well as cellular compartments and complex categories, are also captured by the network. For example, orthologous genes involved in the biological functions of ribosome biogenesis, ribosomal RNA (rRNA) processing, and translation, which represent different functional categories, are extensively coevolving with one another (fig. S7A). This finding suggests that the complexity of protein biosynthesis, a process that requires coordination among diverse biochemical functions, is captured in the coevolution of the underlying orthologous genes. Similarly, orthologous genes involved in nuclear processes or located in the cytoplasm tend to coevolve with orthologous genes in the same cellular compartment; however, substantial signatures of coevolution between orthologous genes from different cellular compartments are also observed (fig. S7B).
Last, our network captures functional neighborhoods of coevolving orthologous genes at the level of pathways and complexes. We found strong signatures of coevolution among orthologous genes from specific pathways and complexes. For example, orthologous genes that encode proteins responsible for DNA replication coevolve with a larger number of other DNA replication orthologous genes than expected by random chance (P < 0.001; permutation test) (fig. S8). Orthologous genes involved in DNA mismatch repair and nucleotide excision repair pathways, which participate in the repair of DNA lesions, have more signatures of coevolution than expected by random chance (P < 0.001 for each pathway; permutation test). Orthologous genes in the phosphatidylcholine biosynthesis pathway, which is responsible for the biosynthesis of the major phospholipid in organelle membranes, and orthologous genes in the tricarboxylic acid cycle (also known as the Krebs cycle or citric acid cycle), a key component of aerobic respiration (fig. S9), also have more signatures of coevolution than expected by random chance (P < 0.001 for each pathway; permutation test). Among complexes, orthologous genes that encode the minichromosome maintenance protein complex that functions as a DNA helicase, the DNA polymerase α-primase complex that assembles RNA-DNA primers required for replication, and DNA polymerase ε that serves as a leading strand DNA polymerase (Fig. 3) also coevolve with larger numbers of orthologs from the same complex than expected by random chance (P < 0.001 for each multimeric complex; permutation test). Note that certain gene categories (e.g., transposons and hexose transporters) are not represented in our dataset of orthologous genes and could not be examined (see Methods).
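A permutation test of this kind asks whether the genes in a pathway or complex share more edges with one another than equally sized random gene sets do. The sketch below assumes a null model of random gene sets drawn without replacement from all genes in the dataset; the exact null used in the analysis is not spelled out in this section, and the function names and toy network are hypothetical.

```python
import random

def within_set_edges(gene_set, edges):
    """Count edges whose two endpoints both belong to gene_set."""
    members = set(gene_set)
    return sum(1 for a, b in edges if a in members and b in members)

def permutation_p(gene_set, all_genes, edges, n_perm=1000, seed=0):
    """Empirical P value for the observed number of within-set edges."""
    rng = random.Random(seed)
    observed = within_set_edges(gene_set, edges)
    size = len(set(gene_set))
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, size)
        if within_set_edges(sample, edges) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Toy data: 50 genes; the first five form a fully connected "pathway".
all_genes = [f"g{i}" for i in range(50)]
pathway = all_genes[:5]
edges = {(a, b) for i, a in enumerate(pathway) for b in pathway[i + 1:]}
p = permutation_p(pathway, all_genes, edges)
```

With 1000 permutations, the smallest attainable value is 1/1001, which is consistent with reporting results as P < 0.001.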
Fig. 3. Extensive coevolution in DNA replication genes.
(A) Cartoon representation of DNA replication. Exemplary complex-specific subnetworks are depicted in (B) to (D). (B) Extensive coevolution between orthologous genes that encode the minichromosome maintenance (MCM) complex, which functions as a helicase. (C) Coevolution in the orthologous genes that encode the DNA polymerase α-primase complex and (D) DNA polymerase ε complex, which are responsible for RNA primer synthesis and leading strand DNA synthesis, respectively. Edges in blue connect orthologous genes that are significantly coevolving. Orthologous genes and complexes in bold have signatures of coevolution. Orthologous genes and complexes are colored according to orthologous gene community assignment. Complexes, such as the DNA polymerase α-primase complex, are depicted in multiple colors reflecting the multiple orthologous gene communities represented within the complex. There is significant coevolution across all DNA replication orthologous genes (P < 0.001; permutation test) as well as within multimeric complexes such as the MCM complex (P < 0.001 for each complex; permutation test).
In summary, these findings reveal that functional aspects of the network can be viewed with varying degrees of specificity. For example, the highest-order insights (i.e., GO enrichment across orthologous gene communities 1, 2, 3, and 4) revealed coevolution among cellular anatomical entities, whereas greater specificity—such as coevolution among orthologous genes responsible for Golgi vesicle transport—can be obtained by examining lower-order hubs of genes (e.g., GO enrichment in orthologous gene community 2). Furthermore, coevolutionary signatures can bridge distinct but related functional categories such as cellular compartments and complexes, highlighting the complex interplay of distinct functional modules over evolutionary time. Thus, the budding yeast coevolution network captures the hierarchy of cellular function from broad bioprocesses to specific pathways or multimeric complexes.
The coevolution network constructed from budding yeast orthologous genes is distinct from, but complementary to, the S. cerevisiae genetic interaction network
To determine similarities and differences between our coevolution network inferred from orthologous genes in the budding yeast subphylum and the genetic interaction network inferred from digenic null mutants in the model organism S. cerevisiae (1, 31), both data types were integrated into a single supernetwork (figs. S10 and S11). In the genetic interaction network, nodes represent genes and edges represent nonadditive genetic interactions between genes; in the supernetwork, nodes represent genes and edges connect two genes that have a significant signature of coevolution, genetic interaction, or both. We hypothesized that there would be broad similarities between the networks because they both capture functional associations; however, we also hypothesized that the connectivity of individual nodes between the networks would sometimes differ because one network is built from ~400 million years of orthologous gene coevolution, whereas the other is built from genetic interactions in a single extant species.
Supporting this hypothesis, the orthologous gene community clustering observed in the gene coevolution network was also evident in the supernetwork, and the two networks were found to be more similar for all metrics examined (i.e., mean distance and transitivity) than expected by random chance (P < 0.001 for both tests; permutation test); however, gene-/ortholog-wise connectivity at times differed, suggesting each network harbors distinct and complementary insights (fig. S10). For example, connectivity is similar for the gene/ortholog CDC6, which is required for DNA replication (39), between the two networks. Specifically, CDC6 is connected to 96 genes/orthologs in both networks, and 56 of the genes/orthologs are the same. This result suggests that the connectivity of the CDC6 gene in S. cerevisiae is broadly conserved across species from the budding yeast subphylum. In contrast, different gene-/ortholog-wise connectivity was observed for the choline kinase CKI1 (40, 41); CKI1 is coevolving with 87 orthologs and has a significant genetic interaction with 10 genes, and 7 of these genes/orthologs are shared by both networks. This result suggests that the connectivity of the CKI1 gene observed in S. cerevisiae is not broadly conserved across species from the budding yeast subphylum. This difference may be partially explained by the fact that CKI1 has a paralog, EKI1, which arose from an ancient whole-genome duplication event that affected some, but not all, species in the subphylum (42, 43). These results reveal that orthologous gene coevolution networks inferred over macroevolutionary time scales and networks inferred from genetic interactions in single organisms offer complementary insights into functional relationships between genes.
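Gene-wise comparisons such as the CDC6 example amount to comparing a gene's neighbor sets in the two networks. The sketch below computes the number of shared neighbors and the Jaccard index; the helper name and placeholder gene identifiers are hypothetical, and only the set sizes (96, 96, and 56 shared) are taken from the text.

```python
def neighbor_overlap(neighbors_a, neighbors_b):
    """Return (number of shared neighbors, Jaccard index) for two neighbor sets."""
    a, b = set(neighbors_a), set(neighbors_b)
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return len(shared), jaccard

# Placeholder IDs sized to mirror the CDC6 example: 96 neighbors in each
# network, 56 of which are the same.
coevolution_neighbors = range(0, 96)
interaction_neighbors = range(40, 136)
n_shared, jaccard = neighbor_overlap(coevolution_neighbors, interaction_neighbors)
```

For these sizes, the union holds 96 + 96 − 56 = 136 genes, giving a Jaccard index of 56/136 ≈ 0.41.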
Orthologous gene communities differ in capacity to compensate for perturbation
Examinations of gene dispensability in the model budding yeast S. cerevisiae and the opportunistic pathogen C. albicans (36, 37) suggest that single-organism genetic networks can buffer single gene losses as evidenced by the ability to maintain organismal viability. Thus, we sought to determine whether a gene’s dispensability varies in an orthologous gene community–dependent manner. To address this, we integrated information from the budding yeast orthologous gene coevolution network and genome-wide single-gene deletion fitness assays (or, in the case of essential genes, expression suppression) of S. cerevisiae in 14 diverse environments (32) (figs. S12 and S13). Here, single-gene deletion fitness assays serve as a proxy for network perturbation in which deletion of a single gene is analogous to removing a node from the network. We found that fitness of S. cerevisiae gene knockouts in different environments was significantly dependent on orthologous gene community and the number of coevolving genes per gene [Fig. 4; P < 0.001 for both interaction terms, orthologous gene community:environment and environment:number of coevolving genes; multifactor analysis of variance (ANOVA)]. We also observed significant fixed effects of orthologous gene community and environment (P < 0.001, multifactor ANOVA). These observations highlight the importance and role of the environment and the architecture of the underlying genetic network when evaluating the consequences of single-gene deletions on organismal fitness.
Fig. 4. The impact of perturbing the orthologous gene coevolutionary network through single-gene deletion in diverse environments is dependent on orthologous gene community and gene connectivity.
(A) Multifactor ANOVA results indicate orthologous gene community, environment, the interaction between orthologous gene community and environment, and the interaction between environment and the number of coevolving orthologous genes per orthologous gene are significantly associated with the fitness of a single-gene deletion strain (relative to the wild-type strain). (B) Fitness of single-gene deletion strains in diverse environments is affected by orthologous gene community. Here, the y axis indicates mean fitness across all genes in a community (x axis) regardless of node degree. (C) Fitness of single-gene deletion strains in diverse environments is affected by the number of coevolving orthologous genes the deleted node is connected to. Here, the y axis indicates fitness of all genes with a given node degree (x axis) regardless of community status. These results indicate that fitness in diverse environments is affected by orthologous gene neighborhood and connectivity in the network. In both panels, colors correspond to different environments that fitness was measured in. Df, degrees of freedom; Sum of Sq., sum of squares; Mean of Sq., mean of squares.
To further investigate the relationship between S. cerevisiae gene dispensability and structure of the coevolution network, we integrated S. cerevisiae genetic interaction data from double-gene or digenic deletion fitness assays, wherein positive and negative genetic interactions refer to positive and negative fitness effects in the digenic deletion mutants relative to those expected from the combined effects of the individual single-gene deletion mutants, respectively (1, 2, 31). We found that most gene pairs were associated with negative genetic interactions (fig. S14). Furthermore, genetic interaction scores among different orthologous gene community combinations were not significantly different (P > 0.05; Kruskal-Wallis rank sum test), suggesting digenic losses negatively affected fitness irrespective of orthologous gene community.
Last, to examine evolutionary gene loss in the context of the gene coevolution network, we investigated orthologous gene community-wide patterns of gene losses among genes lost in a lineage of budding yeasts previously reported to have undergone extensive gene losses (44). These analyses revealed orthologous gene community 2 and singleton orthologs are more likely to be lost (fig. S14B), which supports the hypothesis that gene losses do not occur stochastically (45). In summary, the architecture of the coevolution network is associated with a gene’s dispensability.
An entangled genome: Extensive interchromosomal and long-range intrachromosomal coevolution
Gene order is not random among eukaryotes, and physically linked genes tend to be involved in the same metabolic pathway or protein-protein complex (46, 47). Thus, we hypothesized that coevolving orthologous genes would likely be physically linked or clustered on yeast chromosomes. To test this hypothesis, we projected the budding yeast gene coevolution network onto the one-dimensional genome structure of S. cerevisiae and C. albicans, which diverged ~235 million years ago (48). We chose the genomes of these two organisms because they both have complete and high-quality chromosome-level assemblies. The two organisms also have distinct evolutionary histories; the lineage that includes S. cerevisiae underwent whole-genome duplication, whereas C. albicans underwent intraspecies hybridization (42, 49). These processes have contributed to differences in chromosome number (16 in S. cerevisiae versus 8 in C. albicans) and a lack of macrosynteny (50–54) (Fig. 5, A and B, and figs. S15 and S16).
Fig. 5. Extensive long-range and interchromosomal gene coevolution.
(A) S. cerevisiae and (B) C. albicans differ in chromosome number and size. (C and D) Numbers of genes with interchromosomal orthologous gene coevolution (blue), intrachromosomal (green), or both (orange). (E and F) Intrachromosomal signatures of orthologous gene coevolution corrected by number of genes on chromosome (x axis) and number of interchromosomal signatures of orthologous gene coevolution corrected by number of genes on other chromosomes (y axis). Colors represent different chromosomes, and the regression line of all chromosomes is in black. (G and H) Distances among intrachromosomal signatures of orthologous gene coevolution. (I and J) INO80, an example of how orthologous genes can coevolve with others across the genome. Outermost track: chromosomes of either yeast with chromosome 1 at the 12 o’clock position; second track: genes on plus/minus strand; third track: genes colored according to orthologous gene community. Scatter plot shows the number of coevolving orthologous genes per orthologous gene; size reflects higher values. Links depict orthologous genes coevolving with INO80 and are colored according to chromosomal location of the other orthologous gene. Colors in (E) to (H) and ideogram and link colors in (J) correspond to chromosomes [see (A) and (B)].
Contrary to our hypothesis, we observed extensive interchromosomal and long-range intrachromosomal orthologous gene coevolution (Fig. 5 and figs. S17 to S23). Specifically, coevolving orthologous gene pairs were commonly located on different chromosomes (Fig. 5, C and D, and table S4). There was a near-perfect correlation between the number of intrachromosomal signatures of coevolution (corrected by the number of genes on that chromosome in the dataset) and the number of interchromosomal signatures of coevolution (corrected by the number of genes on all other chromosomes in the dataset) (r = 0.95, P < 0.001 for S. cerevisiae; r = 0.98, P < 0.001 for C. albicans; Spearman correlation). This result suggests that orthologous genes located on the same or different chromosomes are equally likely to be coevolving. Given the extensive coevolution among orthologous genes in the same or similar functional categories, these results support the notion that function, not chromosome structure, is the primary driver of coevolution over macroevolutionary time scales.
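The per-chromosome normalization described above can be sketched as follows. This is one reading of the correction, stated as an assumption: intrachromosomal edge counts are divided by the number of dataset genes on that chromosome, and interchromosomal edge counts by the number of dataset genes on all other chromosomes. The function name and toy gene-to-chromosome map are hypothetical.

```python
from collections import Counter

def per_chromosome_rates(gene_chrom, edges):
    """Map each chromosome to (normalized intra count, normalized inter count)."""
    genes_per_chrom = Counter(gene_chrom.values())
    total = len(gene_chrom)
    intra, inter = Counter(), Counter()
    for a, b in edges:
        ca, cb = gene_chrom[a], gene_chrom[b]
        if ca == cb:
            intra[ca] += 1
        else:
            # An interchromosomal edge counts once for each chromosome involved.
            inter[ca] += 1
            inter[cb] += 1
    rates = {}
    for chrom, n in genes_per_chrom.items():
        others = total - n
        inter_rate = (inter[chrom] / others) if others > 0 else 0.0
        rates[chrom] = (intra[chrom] / n, inter_rate)
    return rates

# Toy genome: two chromosomes with two genes each.
gene_chrom = {"A": "chr1", "B": "chr1", "C": "chr2", "D": "chr2"}
edges = {("A", "B"), ("A", "C"), ("B", "D")}
rates = per_chromosome_rates(gene_chrom, edges)
```

The resulting pairs of values, one per chromosome, are what would then be compared with a rank correlation across chromosomes.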
Examination of intrachromosomal coevolution revealed variation in orthologous gene pair distances along the genome. Two coevolving orthologous genes on the same chromosome can be kilobase-to-megabase distances from one another (Fig. 5, G and H). The distribution of the closest distance between an orthologous gene and its coevolving partners revealed a positively skewed distribution with a similar range of kilobase-to-megabase associations (fig. S23). In S. cerevisiae, the number of intrachromosomal signatures of coevolution is correlated with the number of genes on a chromosome represented in the dataset, whereas in C. albicans, the number of intrachromosomal signatures of coevolution is correlated both with chromosome length and with the number of genes on a chromosome represented in the dataset (fig. S24). Examination of the distances between orthologous genes in our dataset and their coevolving partners revealed that long-range intrachromosomal coevolution was not an artifact of gene sampling (fig. S25). Investigation of the interplay between orthologous gene coevolution and chromosomal contacts using a three-dimensional model of the S. cerevisiae genome (55) revealed that signatures of coevolution occur independent of chromosomal contacts (fig. S26).
Extensive inter- and intrachromosomal associations are exemplified by INO80, which encodes a chromatin remodeler and has coevolved with 591 orthologous genes on all other chromosomes in both S. cerevisiae and C. albicans (Fig. 5, I and J). To date, few examples of interchromosomal associations between loci are known. One example is concerted copy number variation between 45S and 5S ribosomal DNA loci in humans; imbalance in copy number is thought to be associated with disease (56, 57). Our observations suggest that extensive interchromosomal and long-range intrachromosomal functional associations may be more common than previously appreciated.
DISCUSSION
We constructed a genetic network based on orthologous gene coevolution from a densely sampled set of orthologs across the budding yeast subphylum. These analyses are distinct from genetic interaction– and gene expression–based genetic networks in that they leverage evolutionary, rather than functional, data. Thus, coevolution networks infer functionally conserved relationships among orthologous genes across entire lineages, whereas genetic networks infer functional relationships among genes in a single extant species or strain (irrespective of whether these relationships are conserved in other species or not). Gene coevolution networks are also distinct from networks constructed from correlated presence and absence patterns of orthologs across a lineage [an approach known as phylogenetic profiling; (58, 59)] in that coevolutionary networks depict relationships among orthologs conserved in the majority of taxa. Examination of the global coevolution network, orthologous gene communities therein, and signatures of orthologous gene coevolution among bioprocesses, complexes, and pathways reveals that the network reflects the hierarchy of cellular function. Moreover, the integration of network-based approaches provides new insights into coevolution among orthologous genes—for example, orthologous genes coevolving with hundreds of other orthologous genes, such as INO80 (Fig. 5, I and J), are enriched in nucleosome mobilization (fig. S5).
Comparison of the budding yeast coevolution network to the genetic interaction–based network of S. cerevisiae revealed numerous notable similarities and differences. For example, both methods found that gene essentiality substantially affects connectivity: essential genes/orthologous genes are more densely connected than nonessential genes/orthologous genes (Fig. 2). This finding suggests that genes with more essential cellular functions are more likely to be central hubs in the coevolution network (1, 2, 5, 32, 60). Similarities were also observed among genes with broadly conserved functions. For example, the majority of genes/orthologs connected to CDC6, a gene required for the fundamental and widely conserved process of DNA replication (39), were the same in the orthologous gene coevolution network and the genetic interaction–based network (1, 31).
Similarities between genetic interaction and gene coevolution networks were also observed when examining the impact of gene deletion(s) on fitness in diverse environments. For example, integrating fitness data with data from the orthologous gene coevolution network revealed significant interactions between community and environment, environment and the number of coevolving genes, as well as fixed effects of community and environment (Fig. 4). These results suggest that phenotype can be affected by genes coevolving with other genes and the environment—a finding that, to our knowledge, represents the first integration of orthologous gene coevolution information and cellular fitness across diverse environments. A similar observation was made in the genetic interaction network wherein phenotype was affected by genes interacting with other genes and the environment, a phenomenon known as differential genetic interaction (32). Together with insights discussed in the previous paragraph, these striking similarities suggest that, despite using different data types to infer genetic interaction networks and gene coevolutionary networks (i.e., functional and evolutionary data, respectively), functional associations between genes, even those affected by environmental contexts, can be encoded in their coevolutionary histories; thus, functional insights can be inferred from gene coevolution networks. We find this observation particularly exciting because compared to genetic interaction analysis, which requires generating and phenotyping single and digenic knockouts for all pairwise gene combinations, orthologous gene coevolution analysis is potentially far less challenging technically and requires fewer resources. Notwithstanding these benefits, orthologous gene coevolution analysis does require the availability of well-annotated genome sequences of multiple species and knowledge of orthology relationships of their genes. 
Nonetheless, in the absence of physical interaction and genetic interaction data, coevolution networks can provide similar insights into functional relationships among genes.
In contrast, differences between the two networks are likely driven by the fact that not all parts of the genetic interaction–based network of any single organism are conserved across an entire lineage (8–18). The more distinct the evolutionary histories of genes or pathways of species used to construct an orthologous gene coevolution network, the more divergent the topologies of the genetic interaction–based network of a species in that lineage will be from the coevolution network of the entire lineage. For example, the connectivity of CKI1, which encodes a choline kinase, differed substantially between the two networks. This may in part be driven by an ancient whole-genome duplication event and retention of the duplicate gene copy in some, but not all, budding yeast species (42, 43). Together, these results indicate that similarities and differences between networks inferred using orthologous gene coevolution from a lineage and networks inferred based on genetic interactions from a single organism are driven by divergence in individual organisms’ genetic networks; thus, these methods offer distinct insights into functional associations among genes.
Another difference between the two networks is that the budding yeast coevolution network offers novel evolutionary insights, which cannot be inferred from genetic interaction networks in a single species. For example, hubs of genes represent not only functionally related genes but also genes whose function has been maintained across long evolutionary time scales. Furthermore, integration of the gene coevolution network with one- and three-dimensional chromosome structure offers novel insights into the interplay of chromosome structure and coevolution. Despite there being few known examples of interchromosomal gene associations (56), we find extensive signatures of interchromosomal and long-range intrachromosomal coevolution (Fig. 5 and figs. S21 and S22), which suggests that gene function, not location, drives orthologous gene coevolution over macroevolutionary time scales. These results uncover a previously underappreciated degree of genome-wide coevolution that has been maintained over millions of years of budding yeast evolution, suggesting that eukaryotic genomes, in both their evolution and function, are best viewed as extensively linked ensembles of genes.
The analyses presented here enabled us to synthesize information from orthologous gene coevolution, genetic interactions, and cellular fitness among digenic knockout strains in a diverse panel of environments. This data-rich case study of orthologous gene coevolution can be thought of as a proof-of-principle report that sets the stage for numerous exciting research opportunities and questions—such as comparisons of orthologous gene coevolutionary networks between lineages that exhibit key evolutionary differences. For example, in budding yeasts, these comparisons of orthologous gene coevolutionary networks could be performed for lineages that differ in their evolutionary rates (44), levels of horizontally acquired genes (48, 61, 62), genetic code (63, 64), whole-genome duplication (43), or ecological niche (65). This approach may also be particularly powerful in lineages where genetically tractable models have yet to be established or in emerging model organisms that are ripe for functional examination.
In summary, we highlight complementary and novel insights that can be inferred using coevolutionary networks compared to other methods to infer genetic networks. Insights and methods used here will facilitate the generation, interpretation, and utility of these networks for other lineages in the tree of life.
METHODS
Inferring gene coevolution
To infer gene coevolution across ~400 million years of budding yeast evolution, we first obtained 2408 orthologous sets of genes (hereafter referred to as OGs) from 332 species (48). These 2408 orthologous genes are from diverse GO bioprocesses but are underrepresented for gene functions known to be present in multiple copies, such as transposons and hexose transporters (table S5). Thus, we conclude that the 2408 orthologous sets of genes span a broad range of cellular and molecular functions. Examination of over- and underrepresentation of genes from the various chromosomes of S. cerevisiae and C. albicans revealed that no chromosome was over- or underrepresented in the 2408 orthologs (table S6), suggesting that each chromosome is equally represented in our dataset.
Next, we calculated covariation of relative evolutionary rates of all 2,898,028 pairs from the 2408 orthologous sets of genes. To do so, we developed the CovER (Covarying Evolutionary Rates) pipeline for high-throughput genome-scale analyses of orthologous gene covariation based on the mirror tree principle (Fig. 1). The mirror tree principle is conceptually similar to phylogenetic profiling—wherein correlations in gene presence/absence patterns across a phylogeny are used to identify functionally related genes (66)—but instead uses correlations in orthologous genes’ relative evolutionary rates (20, 67, 68).
To implement the CovER pipeline, single gene trees constrained to the species topology were first inferred using IQ-TREE, v1.6.11 (69) (Fig. 1). Thereafter, all pairwise combinations of gene trees were examined for significant signatures of coevolution (Fig. 1B). Differences in taxon occupancy between gene trees are accounted for by pruning both phylogenies to the set of maximally shared taxa. To mitigate the influence of factors that can lead to high false-positive rates, such as time since speciation and mutation rate, and increase the statistical power of calculating gene coevolution, branch lengths were transformed into relative rates by correcting the gene tree branch length by the corresponding branch length in the species phylogeny (19, 20, 70). Single data point outliers (defined as having corrected branch lengths greater than five) are known to cause false-positive correlations and were removed (20). Branch lengths were then Z-transformed, and a Pearson correlation coefficient was calculated for each pair of orthologs. The CovER algorithm has been integrated into PhyKIT, a UNIX toolkit for phylogenomic analysis (22).
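The rate-correction and correlation steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PhyKIT implementation: it assumes that both gene trees have already been pruned to their shared taxa and that branches are matched by position, that "correcting" means dividing each gene tree branch length by the corresponding species tree branch length, and that an outlier branch is dropped for both orthologs. All function names and the toy branch lengths are hypothetical.

```python
from math import sqrt

def relative_rates(gene_branches, species_branches):
    """Divide each gene tree branch length by the matching species tree branch length."""
    return [g / s for g, s in zip(gene_branches, species_branches)]

def z_transform(values):
    """Center values on their mean and scale by the sample standard deviation."""
    mean = sum(values) / len(values)
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def covarying_rates(gene_a, gene_b, species, outlier_cutoff=5.0):
    """Correlate relative evolutionary rates of two orthologs across shared branches."""
    rates_a = relative_rates(gene_a, species)
    rates_b = relative_rates(gene_b, species)
    # Assumption: a branch is discarded if either corrected length exceeds the cutoff.
    kept = [(a, b) for a, b in zip(rates_a, rates_b)
            if a <= outlier_cutoff and b <= outlier_cutoff]
    z_a = z_transform([a for a, _ in kept])
    z_b = z_transform([b for _, b in kept])
    return pearson(z_a, z_b)

# Toy branch lengths (hypothetical); the last branch of gene_a is an outlier (ratio 10).
species = [0.1, 0.2, 0.3, 0.4, 0.5, 0.1]
gene_a = [0.1, 0.4, 0.3, 0.8, 0.5, 1.0]
gene_b = [0.2, 0.8, 0.6, 1.6, 1.0, 0.1]
r = covarying_rates(gene_a, gene_b, species)
```

In this toy case the two orthologs speed up and slow down on the same branches, so the coefficient is close to 1 once the outlier branch is removed. Because the Pearson coefficient is unchanged by linear rescaling, the Z-transformation here serves to put rates on a common scale rather than to alter the correlation itself.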
Network construction
Complex interactions between orthologous gene pairs were further examined using a network wherein nodes represent orthologs and edges connect orthologs that are coevolving. Following our previous work (22), we considered orthologous gene pairs with a covariation coefficient of 0.825 or greater to have a significant signature of coevolution. This threshold resulted in 60,305 of 2,898,028 (2.08%) significant signatures of coevolution (fig. S1). To explore the impact of our choice of a covariation coefficient threshold, we examined two measures that describe how densely the network is connected: edge density (the proportion of present edges out of all possible edges) and transitivity (the ratio of triangles to connected triples), as well as two measures that describe how diffuse the network is: mean distance (average path length among pairs of nodes) and diameter (the longest geodesic distance). Across a wide range of thresholds of significant orthologous gene coevolution (Pearson correlation coefficient range of 0.600 to 0.900 with a step of 0.005), we found that the choice of threshold had little impact on network structure (fig. S2).
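As a rough illustration of the thresholding step and two of the connectivity measures named above, the following Python sketch builds an edge set from pairwise coefficients and computes edge density and transitivity. The function names and the four-ortholog toy data are hypothetical; the only values taken from the text are the 0.825 threshold, the 2408 orthologs, and the 60,305 significant pairs. Mean distance and diameter are omitted for brevity.

```python
from itertools import combinations
from math import comb

def build_edges(coefficients, threshold=0.825):
    """Keep ortholog pairs whose covariation coefficient meets the threshold."""
    return {pair for pair, r in coefficients.items() if r >= threshold}

def edge_density(n_nodes, edges):
    """Proportion of present edges out of all possible edges."""
    return len(edges) / comb(n_nodes, 2)

def transitivity(edges):
    """Closed triples (three per triangle) divided by all connected triples."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    triples = sum(comb(len(nb), 2) for nb in neighbors.values())
    closed = sum(1 for nb in neighbors.values()
                 for a, b in combinations(sorted(nb), 2) if b in neighbors[a])
    return closed / triples if triples else 0.0

# Toy coefficients for four hypothetical orthologs A-D.
coefficients = {
    ("A", "B"): 0.90, ("A", "C"): 0.85, ("B", "C"): 0.83,
    ("C", "D"): 0.90, ("A", "D"): 0.50, ("B", "D"): 0.10,
}
edges = build_edges(coefficients)
toy_density = edge_density(4, edges)    # 4 of 6 possible edges
toy_transitivity = transitivity(edges)  # one triangle, five connected triples

# Edge density implied by the counts reported in the text.
all_pairs = comb(2408, 2)
reported_density = 60305 / all_pairs
```

With 2408 orthologs there are 2,898,028 possible pairs, so 60,305 coevolving pairs correspond to an edge density of about 2.08%. Repeating `build_edges` over a grid of thresholds (for example 0.600 to 0.900 in steps of 0.005) and recording these measures would mirror the sensitivity analysis described above.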
Network substructure is commonly referred to as orthologous gene community structure and describes a set of orthologs that are more densely connected with each other but more sparsely connected with other sets (or orthologous gene communities) of orthologs. To identify the orthologous gene community structure of our global orthologous gene coevolution network, a hierarchical agglomeration algorithm that conducts greedy optimization of modularity was implemented (71).
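To make the idea of greedy modularity optimization concrete, here is a deliberately naive Python sketch. It is not the optimized algorithm cited above: it starts with every ortholog in its own community, repeatedly applies the single merge that raises modularity the most, and stops when no merge helps. It recomputes modularity from scratch at each step, so it is only suitable for toy networks. The function names and the six-node example are hypothetical.

```python
from itertools import combinations

def modularity(edges, communities):
    """Newman modularity of a partition of an unweighted, undirected network."""
    m = len(edges)
    label = {node: i for i, c in enumerate(communities) for node in c}
    internal = [0] * len(communities)      # edges inside each community
    total_degree = [0] * len(communities)  # summed degree of each community
    for u, v in edges:
        total_degree[label[u]] += 1
        total_degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(internal[i] / m - (total_degree[i] / (2 * m)) ** 2
               for i in range(len(communities)))

def greedy_communities(edges):
    """Agglomerate communities greedily until modularity stops improving."""
    nodes = sorted({n for edge in edges for n in edge})
    communities = [frozenset([n]) for n in nodes]
    best_q = modularity(edges, communities)
    while len(communities) > 1:
        candidates = []
        for a, b in combinations(communities, 2):
            merged = [c for c in communities if c not in (a, b)] + [a | b]
            candidates.append((modularity(edges, merged), merged))
        q, merged = max(candidates, key=lambda item: item[0])
        if q <= best_q:
            break
        best_q, communities = q, merged
    return communities, best_q

# Two triangles joined by a single bridge edge (C-D).
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"),
             ("D", "E"), ("D", "F"), ("E", "F"), ("C", "D")]
communities, q = greedy_communities(toy_edges)
```

On this toy network the procedure recovers the two triangles as separate communities, which matches the intuition that communities are densely connected internally and sparsely connected to each other.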
To determine whether the orthologous gene coevolutionary network and genetic interaction network were more similar than expected by random chance, we conducted a permutation test. To do so, we generated a null expectation of similarity between the orthologous gene coevolutionary network and 10,000 random networks. Random networks were generated by shuffling the edges of the genetic interaction network. In this way, the edge density (the ratio of the number of edges to the number of possible edges) is the same between the randomly generated network and the genetic interaction network. This is more conservative than using a completely random (null) network, which would also alter edge density. Next, we took the absolute values of the differences between the descriptive statistics of the orthologous coevolutionary network and the 10,000 random networks to generate the null distribution. The absolute difference between the descriptive statistics of the orthologous coevolutionary network and the observed genetic interaction network was then compared against the null distribution to determine a P value.
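The permutation scheme can be sketched as follows. Several details are assumptions made for illustration rather than facts from the text: random networks are drawn by sampling the same number of edges uniformly from all possible node pairs, transitivity stands in for "descriptive statistics", and the P value is taken as the fraction of random networks whose statistic is at least as close to the coevolution network as the observed genetic interaction network is. All names and the toy networks are hypothetical.

```python
import random
from itertools import combinations
from math import comb

def transitivity(edges):
    """Closed triples divided by connected triples."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    triples = sum(comb(len(nb), 2) for nb in neighbors.values())
    closed = sum(1 for nb in neighbors.values()
                 for a, b in combinations(sorted(nb), 2) if b in neighbors[a])
    return closed / triples if triples else 0.0

def random_network(nodes, n_edges, rng):
    """Sample n_edges distinct node pairs, preserving edge density."""
    return set(rng.sample(list(combinations(nodes, 2)), n_edges))

def similarity_p_value(coevolution_edges, interaction_edges, nodes,
                       statistic=transitivity, n_random=1000, seed=0):
    rng = random.Random(seed)
    coevolution_stat = statistic(coevolution_edges)
    observed = abs(coevolution_stat - statistic(interaction_edges))
    null = [abs(coevolution_stat
                - statistic(random_network(nodes, len(interaction_edges), rng)))
            for _ in range(n_random)]
    # Assumption: smaller differences mean greater similarity.
    at_least_as_similar = sum(1 for d in null if d <= observed)
    return (at_least_as_similar + 1) / (n_random + 1)

nodes = ["A", "B", "C", "D", "E", "F"]
coevolution_edges = {("A", "B"), ("A", "C"), ("B", "C"),
                     ("D", "E"), ("D", "F"), ("E", "F"), ("C", "D")}
interaction_edges = set(coevolution_edges)  # identical toy network
p_value = similarity_p_value(coevolution_edges, interaction_edges, nodes)
```

Adding one to the numerator and denominator keeps the estimated P value above zero, a common convention for permutation tests that the text does not specify.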
Enrichment analysis
To determine functional category enrichment among sets of orthologs, GO enrichment analysis was conducted. To do so, a background set of GO annotations was curated from the 2408 orthologous genes (48). Specifically, for an orthologous group of genes, GO associations were mapped from the representative gene from S. cerevisiae (72). If an S. cerevisiae gene was not present, the annotation from the representative gene from C. albicans was chosen (73). When neither species was represented in an orthologous group, we considered the function of the orthologous group to be uncertain and did not assign a GO term. Significance in functional enrichment was assessed using a Fisher’s exact test with Benjamini-Hochberg multitest correction (α = 0.05) using goatools, v1.0.11 (74). GO annotations were obtained from the GO Consortium (http://geneontology.org/; release date: 19 October 2020). Higher-order summaries of GO term lists were constructed using GO slim annotations and REVIGO (75). Over- and underrepresentation of essential genes across orthologous gene communities and genes on the various chromosomes were examined using the same approach in R, v4.0.2 (https://cran.r-project.org/).
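The statistical core of this step can be illustrated without goatools. The sketch below computes a one-sided (over-representation) Fisher's exact test from the hypergeometric distribution and applies the Benjamini-Hochberg adjustment. The one-sided formulation, the function names, and the example counts are assumptions for illustration; only the background size of 2408 orthologs and α = 0.05 come from the text.

```python
from math import comb

def enrichment_p_value(k, n, K, N):
    """P(X >= k) when n study genes are drawn from N background genes,
    K of which carry the GO term, and k study genes carry it."""
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, upper + 1)) / comb(N, n)

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical counts: (study genes with term, background genes with term).
background_size = 2408
study_size = 100
term_counts = {"term_1": (10, 40), "term_2": (3, 60), "term_3": (1, 5)}

raw = {term: enrichment_p_value(k, study_size, K, background_size)
       for term, (k, K) in term_counts.items()}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
significant = [term for term, p in adjusted.items() if p <= 0.05]
```

A two-sided test would additionally flag underrepresented terms; which variant a given tool reports should be checked in its documentation.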
Pathway analysis
To examine coevolution between genes in pathways, we first determined the genes belonging to pathways of interest. To do so, we leveraged pathway information in the Kyoto Encyclopedia of Genes and Genomes database (76) and the Saccharomyces Genome Database (www.yeastgenome.org/). To determine whether there are more signatures of coevolution within a pathway than expected by random chance, we conducted permutation tests. The null distribution was generated by randomly shuffling coevolution coefficients across all ~3 million orthologous gene pairs 10,000 times and then determining the number of coevolving pairs among the pairs of the pathway of interest for each iteration.
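A minimal Python sketch of this pathway-level permutation test is shown below. It shuffles coefficients across all ortholog pairs and counts, for each shuffle, how many pairs within the pathway meet the coevolution threshold. The function names, the six-gene toy data, and the add-one P value convention are assumptions for illustration; the 0.825 threshold and the default of 10,000 permutations follow the text.

```python
import random
from itertools import combinations

def count_coevolving(coefficients, pairs, threshold=0.825):
    """Number of pairs whose coefficient meets the coevolution threshold."""
    return sum(1 for pair in pairs if coefficients[pair] >= threshold)

def pathway_p_value(coefficients, pathway_genes, n_permutations=10000,
                    threshold=0.825, seed=0):
    rng = random.Random(seed)
    all_pairs = sorted(coefficients)
    pathway_pairs = list(combinations(sorted(pathway_genes), 2))
    observed = count_coevolving(coefficients, pathway_pairs, threshold)
    values = [coefficients[pair] for pair in all_pairs]
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(values)  # reassign coefficients to pairs at random
        shuffled = dict(zip(all_pairs, values))
        if count_coevolving(shuffled, pathway_pairs, threshold) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_permutations + 1)

# Toy data: six hypothetical genes; only pairs within the pathway {A, B, C} coevolve.
genes = "ABCDEF"
coefficients = {pair: 0.1 for pair in combinations(genes, 2)}
for pair in combinations("ABC", 2):
    coefficients[pair] = 0.9

observed, p_value = pathway_p_value(coefficients, {"A", "B", "C"},
                                    n_permutations=2000)
```

Pair keys are stored as alphabetically sorted tuples so that lookups for pathway pairs always match the keys of the coefficient table.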
Integrating gene loss information
To estimate the impact of network perturbation, fitness of single-gene deletions and genetic interaction scores inferred from digenic deletions were combined with information from the orthologous gene coevolution network (1, 2, 31, 32). For example, the relationship between gene-/ortholog-wise community, connectivity, and fitness in diverse environments was evaluated. To determine whether genes/orthologs were equally likely to be lost across orthologous gene communities, we examined patterns of gene losses in Hanseniaspora spp., which have undergone extensive gene loss compared with other budding yeasts (44).
Projecting the network onto genome structure and organization
To gain insight into the relationship between genome structure and the orthologous gene coevolution network, we projected the network onto the complete chromosome genome assemblies of S. cerevisiae and C. albicans (72, 73, 77, 78). Before mapping the network onto the genome assemblies, we investigated genome-wide synteny using orthology information from the Candida Gene Order Browser (50). Thereafter, the network was projected onto each genome assembly using Circos, v0.69 (79). Examination of the distance between coevolving orthologous genes and chromosomal contacts was conducted using a three-dimensional model of the S. cerevisiae genome (55).
Acknowledgments
Funding: J.L.S. and A.R. were supported by the Howard Hughes Medical Institute through the James H. Gilliam Fellowships for Advanced Study program. T.R.G. was supported by the NIH (1R01GM118452). M.A.P. was partially supported by the Vanderbilt Undergraduate Summer Research Program and the Goldberg Family Immersion Fund. A.R.’s laboratory received additional support from the Burroughs Wellcome Fund, the National Science Foundation (DEB-1442113 and DEB-2110404), and the National Institutes of Health/National Institute of Allergy and Infectious Diseases (R56 AI146096 and R01 AI153356). This material is based upon work supported by the NSF under grant no. DEB-1442148 and DEB-2110403, in part by the DOE Great Lakes Bioenergy Research Center (DOE BER Office of Science DE-SC0018409), and the USDA National Institute of Food and Agriculture (Hatch Project 1020204). C.T.H. is a Pew Scholar in the Biomedical Sciences and an H. I. Romnes Faculty Fellow, supported by the Pew Charitable Trusts and Office of the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation, respectively. J.B. was supported by the European Research Foundation Synergy Fungal Tolerance 951475.
Author contributions: J.L.S. and A.R. designed the research. J.L.S., M.A.P., F.Y., and S.S.D. performed the analyses. T.R.G. and C.T.H. contributed materials. J.L.S. prepared the figures with input from J.B. and A.R. J.L.S. and A.R. wrote the paper. All authors contributed to the interpretation of the results and provided comments and input on the figures and manuscript.
Competing interests: A.R. is a scientific consultant for LifeMine Therapeutics Inc. J.L.S. is a scientific consultant for Latch AI Inc. The authors declare that they have no other competing interests.
Data and materials availability: To help other researchers explore the gene coevolution information, we created a web application, The budding yeast coevolution network (https://github.com/JLSteenwyk/budding_yeast_coevolution_network), written in the R programming language (https://cran.r-project.org/). All data (including single gene phylogenies used to examine coevolution and Pearson covariation coefficients among relative evolutionary rates for all pairwise combinations of orthologous groups of genes) and code (including the code of the web application The budding yeast coevolution network) needed to evaluate the conclusions in the paper and replicate our study are provided in the figshare repository (doi: 10.6084/m9.figshare.14501964).
Supplementary Materials
This PDF file includes:
Figs. S1 to S26
Other Supplementary Material for this manuscript includes the following:
Tables S1 to S6
REFERENCES AND NOTES
- 1. Costanzo M., Baryshnikova A., Bellay J., Kim Y., Spear E. D., Sevier C. S., Ding H., Koh J. L. Y., Toufighi K., Mostafavi S., Prinz J., St. Onge R. P., VanderSluis B., Makhnevych T., Vizeacoumar F. J., Alizadeh S., Bahr S., Brost R. L., Chen Y., Cokol M., Deshpande R., Li Z., Lin Z.-Y., Liang W., Marback M., Paw J., Luis B.-J. S., Shuteriqi E., Tong A. H. Y., van Dyk N., Wallace I. M., Whitney J. A., Weirauch M. T., Zhong G., Zhu H., Houry W. A., Brudno M., Ragibizadeh S., Papp B., Pal C., Roth F. P., Giaever G., Nislow C., Troyanskaya O. G., Bussey H., Bader G. D., Gingras A.-C., Morris Q. D., Kim P. M., Kaiser C. A., Myers C. L., Andrews B. J., Boone C., The genetic landscape of a cell. Science 327, 425–431 (2010).
- 2. Costanzo M., VanderSluis B., Koch E. N., Baryshnikova A., Pons C., Tan G., Wang W., Usaj M., Hanchard J., Lee S. D., Pelechano V., Styles E. B., Billmann M., van Leeuwen J., van Dyk N., Lin Z.-Y., Kuzmin E., Nelson J., Piotrowski J. S., Srikumar T., Bahr S., Chen Y., Deshpande R., Kurat C. F., Li S. C., Li Z., Usaj M. M., Okada H., Pascoe N., Luis B.-J. S., Sharifpoor S., Shuteriqi E., Simpkins S. W., Snider J., Suresh H. G., Tan Y., Zhu H., Malod-Dognin N., Janjic V., Przulj N., Troyanskaya O. G., Stagljar I., Xia T., Ohya Y., Gingras A.-C., Raught B., Boutros M., Steinmetz L. M., Moore C. L., Rosebrock A. P., Caudy A. A., Myers C. L., Andrews B., Boone C., A global genetic interaction network maps a wiring diagram of cellular function. Science 353, aaf1420 (2016).
- 3. Kuzmin E., VanderSluis B., Wang W., Tan G., Deshpande R., Chen Y., Usaj M., Balint A., Mattiazzi Usaj M., van Leeuwen J., Koch E. N., Pons C., Dagilis A. J., Pryszlak M., Wang J. Z. Y., Hanchard J., Riggi M., Xu K., Heydari H., San Luis B.-J., Shuteriqi E., Zhu H., Van Dyk N., Sharifpoor S., Costanzo M., Loewith R., Caudy A., Bolnick D., Brown G. W., Andrews B. J., Boone C., Myers C. L., Systematic analysis of complex genetic interactions. Science 360, eaao1729 (2018).
- 4. Costanzo M., Kuzmin E., van Leeuwen J., Mair B., Moffat J., Boone C., Andrews B., Global genetic networks and the genotype-to-phenotype relationship. Cell 177, 85–100 (2019).
- 5. Wisecaver J. H., Borowsky A. T., Tzin V., Jander G., Kliebenstein D. J., Rokas A., A global coexpression network approach for connecting genes to specialized metabolic pathways in plants. Plant Cell 29, 944–959 (2017).
- 6. Baryshnikova A., Costanzo M., Myers C. L., Andrews B., Boone C., Genetic interaction networks: Toward an understanding of heritability. Annu. Rev. Genomics Hum. Genet. 14, 111–133 (2013).
- 7. Lezon T. R., Banavar J. R., Cieplak M., Maritan A., Fedoroff N. V., Using the principle of entropy maximization to infer genetic interaction networks from gene expression patterns. Proc. Natl. Acad. Sci. U.S.A. 103, 19033–19038 (2006).
- 8. Tong A. H. Y., Evangelista M., Parsons A. B., Xu H., Bader G. D., Pagé N., Robinson M., Raghibizadeh S., Hogue C. W. V., Bussey H., Andrews B., Tyers M., Boone C., Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294, 2364–2368 (2001).
- 9. Boucher B., Jenna S., Genetic interaction networks: Better understand to better predict. Front. Genet. 4, 290 (2013).
- 10. Monaco G., van Dam S., Ribeiro J. L. C. N., Larbi A., de Magalhães J. P., A comparison of human and mouse gene co-expression networks reveals conservation and divergence at the tissue, pathway and disease levels. BMC Evol. Biol. 15, 259 (2015).
- 11. St. Onge R. P., Mani R., Oh J., Proctor M., Fung E., Davis R. W., Nislow C., Roth F. P., Giaever G., Systematic pathway analysis using high-resolution fitness profiling of combinatorial gene deletions. Nat. Genet. 39, 199–206 (2007).
- 12. Mani R., St. Onge R. P., Hartman J. L., Giaever G., Roth F. P., Defining genetic interaction. Proc. Natl. Acad. Sci. U.S.A. 105, 3461–3466 (2008).
- 13. Lehner B., Molecular mechanisms of epistasis within and between genes. Trends Genet. 27, 323–331 (2011).
- 14. Pan X., Yuan D. S., Xiang D., Wang X., Sookhai-Mahadeo S., Bader J. S., Hieter P., Spencer F., Boeke J. D., A robust toolkit for functional profiling of the yeast genome. Mol. Cell 16, 487–496 (2004).
- 15. Dixon S. J., Fedyshyn Y., Koh J. L. Y., Prasad T. S. K., Chahwan C., Chua G., Toufighi K., Baryshnikova A., Hayles J., Hoe K.-L., Kim D.-U., Park H.-O., Myers C. L., Pandey A., Durocher D., Andrews B. J., Boone C., Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes. Proc. Natl. Acad. Sci. U.S.A. 105, 16653–16658 (2008).
- 16. Sorrells T. R., Johnson A. D., Making sense of transcription networks. Cell 161, 714–723 (2015).
- 17. Lind A. L., Wisecaver J. H., Smith T. D., Feng X., Calvo A. M., Rokas A., Examining the evolution of the regulatory circuit controlling secondary metabolism and development in the fungal genus Aspergillus. PLOS Genet. 11, e1005096 (2015).
- 18. Yang B., Wittkopp P. J., Structure of the transcriptional regulatory network correlates with regulatory divergence in Drosophila. Mol. Biol. Evol. 34, 1352–1362 (2017).
- 19. Sato T., Yamanishi Y., Kanehisa M., Toh H., The inference of protein-protein interactions by co-evolutionary analysis is improved by excluding the information about the phylogenetic relationships. Bioinformatics 21, 3482–3489 (2005).
- 20. Clark N. L., Alani E., Aquadro C. F., Evolutionary rate covariation reveals shared functionality and coexpression of genes. Genome Res. 22, 714–720 (2012).
- 21. Goh C.-S., Bogan A. A., Joachimiak M., Walther D., Cohen F. E., Co-evolution of proteins with their interaction partners. J. Mol. Biol. 299, 283–293 (2000).
- 22. Steenwyk J. L., Buida T. J., Labella A. L., Li Y., Shen X.-X., Rokas A., PhyKIT: A broadly applicable UNIX shell toolkit for processing and analyzing phylogenomic data. Bioinformatics 37, 2325–2331 (2021).
- 23. Medina M., Baker D. M., Baltrus D. A., Bennett G. M., Cardini U., Correa A. M. S., Degnan S. M., Christa G., Kim E., Li J., Nash D. R., Marzinelli E., Nishiguchi M., Prada C., Roth M. S., Saha M., Smith C. I., Theis K. R., Zaneveld J., Grand challenges in coevolution. Front. Ecol. Evol. 9, (2022).
- 24. Kay K. M., Sargent R. D., The role of animal pollination in plant speciation: Integrating ecology, geography, and genetics. Annu. Rev. Ecol. Evol. Syst. 40, 637–656 (2009).
- 25. Findlay G. D., Sitnik J. L., Wang W., Aquadro C. F., Clark N. L., Wolfner M. F., Evolutionary rate covariation identifies new members of a protein network required for Drosophila melanogaster female post-mating responses. PLOS Genet. 10, e1004108 (2014).
- 26. Brunette G. J., Jamalruddin M. A., Baldock R. A., Clark N. L., Bernstein K. A., Evolution-based screening enables genome-wide prioritization and discovery of DNA repair genes. Proc. Natl. Acad. Sci. U.S.A. 116, 19593–19599 (2019).
- 27. Kim W. K., Bolser D. M., Park J. H., Large-scale co-evolution analysis of protein structural interlogues using the global protein structural interactome map (PSIMAP). Bioinformatics 20, 1138–1150 (2004).
- 28. Talsness D. M., Owings K. G., Coelho E., Mercenne G., Pleinis J. M., Partha R., Hope K. A., Zuberi A. R., Clark N. L., Lutz C. M., Rodan A. R., Chow C. Y., A Drosophila screen identifies NKCC1 as a modifier of NGLY1 deficiency. eLife 9, e57831 (2020).
- 29. Huang J.-W., Acharya A., Taglialatela A., Nambiar T. S., Cuella-Martin R., Leuzzi G., Hayward S. B., Joseph S. A., Brunette G. J., Anand R., Soni R. K., Clark N. L., Bernstein K. A., Cejka P., Ciccia A., MCM8IP activates the MCM8-9 helicase to promote DNA synthesis and homologous recombination upon DNA damage. Nat. Commun. 11, 2948 (2020).
- 30. Raza Q., Choi J. Y., Li Y., O’Dowd R. M., Watkins S. C., Chikina M., Hong Y., Clark N. L., Kwiatkowski A. V., Evolutionary rate covariation analysis of E-cadherin identifies Raskol as a regulator of cell adhesion and actin dynamics in Drosophila. PLOS Genet. 15, e1007720 (2019).
- 31. Usaj M., Tan Y., Wang W., VanderSluis B., Zou A., Myers C. L., Costanzo M., Andrews B., Boone C., TheCellMap.org: A web-accessible database for visualizing and mining the global yeast genetic interaction network. G3 7, 1539–1549 (2017).
- 32. Costanzo M., Hou J., Messier V., Nelson J., Rahman M., VanderSluis B., Wang W., Pons C., Ross C., Ušaj M., San Luis B.-J., Shuteriqi E., Koch E. N., Aloy P., Myers C. L., Boone C., Andrews B., Environmental robustness of the global yeast genetic interaction network. Science 372, eabf8424 (2021).
- 33. Ciniawsky S., Grimm I., Saffian D., Girzalsky W., Erdmann R., Wendler P., Molecular snapshots of the Pex1/6 AAA+ complex in action. Nat. Commun. 6, 7331 (2015).
- 34. Reuber B. E., Germain-Lee E., Collins C. S., Morrell J. C., Ameritunga R., Moser H. W., Valle D., Gould S. J., Mutations in PEX1 are the most common cause of peroxisome biogenesis disorders. Nat. Genet. 17, 445–448 (1997).
- 35. Ayala R., Willhoft O., Aramayo R. J., Wilkinson M., McCormack E. A., Ocloo L., Wigley D. B., Zhang X., Structure and regulation of the human INO80–nucleosome complex. Nature 556, 391–395 (2018).
- 36. Segal E. S., Gritsenko V., Levitan A., Yadav B., Dror N., Steenwyk J. L., Silberberg Y., Mielich K., Rokas A., Gow N. A. R., Kunze R., Sharan R., Berman J., Gene essentiality analyzed by in vivo transposon mutagenesis and machine learning in a stable haploid isolate of Candida albicans. MBio 9, 5 (2018).
- 37. Winzeler E. A., Shoemaker D. D., Astromoff A., Liang H., Anderson K., Andre B., Bangham R., Benito R., Boeke J. D., Bussey H., Chu A. M., Connelly C., Davis K., Dietrich F., Dow S. W., El Bakkoury M., Foury F., Friend S. H., Gentalen E., Giaever G., Hegemann J. H., Jones T., Laub M., Liao H., Liebundguth N., Lockhart D. J., Lucau-Danila A., Lussier M., M’Rabet N., Menard P., Mittmann M., Pai C., Rebischung C., Revuelta J. L., Riles L., Roberts C. J., Ross-MacDonald P., Scherens B., Snyder M., Sookhai-Mahadeo S., Storms R. K., Veronneau S., Voet M., Volckaert G., Ward T. R., Wysocki R., Yen G. S., Yu K. X., Zimmermann K., Philippsen P., Johnston M., Davis R. W., Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285, 901–906 (1999).
- 38. Gene Ontology Consortium, The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32, D258–D261 (2004).
- 39. Hartwell L. H., Culotti J., Reid B., Genetic control of the cell-division cycle in yeast, I. Detection of mutants. Proc. Natl. Acad. Sci. U.S.A. 66, 352–359 (1970).
- 40. Hosaka K., Kodaki T., Yamashita S., Cloning and characterization of the yeast CKI gene encoding choline kinase and its expression in Escherichia coli. J. Biol. Chem. 264, 2053–2059 (1989).
- 41. Kim K., Kim K. H., Storey M. K., Voelker D. R., Carman G. M., Isolation and characterization of the Saccharomyces cerevisiae EKI1 gene encoding ethanolamine kinase. J. Biol. Chem. 274, 14857–14866 (1999).
- 42. Marcet-Houben M., Gabaldón T., Beyond the whole-genome duplication: Phylogenetic evidence for an ancient interspecies hybridization in the Baker’s yeast lineage. PLoS Biol. 13, e1002220 (2015).
- 43.Wolfe K. H., Origin of the yeast whole-genome duplication. PLoS Biol. 13, e1002221 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Steenwyk J. L., Opulente D. A., Kominek J., Shen X.-X., Zhou X., Labella A. L., Bradley N. P., Eichman B. F., Čadež N., Libkind D., DeVirgilio J., Hulfachor A. B., Kurtzman C. P., Hittinger C. T., Rokas A., Extensive loss of cell-cycle and DNA repair genes in an ancient lineage of bipolar budding yeasts. PLoS Biol. 17, e3000255 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Albalat R., Cañestro C., Evolution by gene loss. Nat. Rev. Genet. 17, 379–391 (2016). [DOI] [PubMed] [Google Scholar]
- 46.Hurst L. D., Pál C., Lercher M. J., The evolutionary dynamics of eukaryotic gene order. Nat. Rev. Genet. 5, 299–310 (2004). [DOI] [PubMed] [Google Scholar]
- 47.Rokas A., Wisecaver J. H., Lind A. L., The birth, evolution and death of metabolic gene clusters in fungi. Nat. Rev. Microbiol. 16, 731–744 (2018). [DOI] [PubMed] [Google Scholar]
- 48.Shen X.-X., Opulente D. A., Kominek J., Zhou X., Steenwyk J. L., Buh K. V., Haase M. A. B., Wisecaver J. H., Wang M., Doering D. T., Boudouris J. T., Schneider R. M., Langdon Q. K., Ohkuma M., Endoh R., Takashima M., Manabe R., Čadež N., Libkind D., Rosa C. A., DeVirgilio J., Hulfachor A. B., Groenewald M., Kurtzman C. P., Hittinger C. T., Rokas A., Tempo and mode of genome evolution in the budding yeast subphylum. Cell 175, 1533–1545.e20 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49. Mixão V., Gabaldón T., Genomic evidence for a hybrid origin of the yeast opportunistic pathogen Candida albicans. BMC Biol. 18, 48 (2020).
- 50. Fitzpatrick D. A., O’Gaora P., Byrne K. P., Butler G., Analysis of gene evolution and metabolic pathways using the Candida Gene Order Browser. BMC Genomics 11, 290 (2010).
- 51. Wolfe K. H., Comparative genomics and genome evolution in yeasts. Philos. Trans. R. Soc. B Biol. Sci. 361, 403–412 (2006).
- 52. Dujon B., Yeast evolutionary genomics. Nat. Rev. Genet. 11, 512–524 (2010).
- 53. Chibana H., Oka N., Nakayama H., Aoyama T., Magee B. B., Magee P. T., Mikami Y., Sequence finishing and gene mapping for Candida albicans chromosome 7 and syntenic analysis against the Saccharomyces cerevisiae genome. Genetics 170, 1525–1537 (2005).
- 54. Seoighe C., Federspiel N., Jones T., Hansen N., Bivolarovic V., Surzycki R., Tamse R., Komp C., Huizar L., Davis R. W., Scherer S., Tait E., Shaw D. J., Harris D., Murphy L., Oliver K., Taylor K., Rajandream M.-A., Barrell B. G., Wolfe K. H., Prevalence of small inversions in yeast gene order evolution. Proc. Natl. Acad. Sci. U.S.A. 97, 14433–14437 (2000).
- 55. Duan Z., Andronescu M., Schutz K., McIlwain S., Kim Y. J., Lee C., Shendure J., Fields S., Blau C. A., Noble W. S., A three-dimensional model of the yeast genome. Nature 465, 363–367 (2010).
- 56. Gibbons J. G., Branco A. T., Godinho S. A., Yu S., Lemos B., Concerted copy number variation balances ribosomal DNA dosage in human and mouse genomes. Proc. Natl. Acad. Sci. U.S.A. 112, 2485–2490 (2015).
- 57. Gibbons J. G., Branco A. T., Yu S., Lemos B., Ribosomal DNA copy number is coupled with gene expression variation and mitochondrial abundance in humans. Nat. Commun. 5, 4850 (2014).
- 58. M. Pellegrini, (2012), pp. 167–177.
- 59. Cokus S., Mizutani S., Pellegrini M., An improved method for identifying functionally linked proteins using phylogenetic profiles. BMC Bioinformatics 8, S7 (2007).
- 60. Mnaimneh S., Davierwala A. P., Haynes J., Moffat J., Peng W.-T., Zhang W., Yang X., Pootoolal J., Chua G., Lopez A., Trochesset M., Morse D., Krogan N. J., Hiley S. L., Li Z., Morris Q., Grigull J., Mitsakakis N., Roberts C. J., Greenblatt J. F., Boone C., Kaiser C. A., Andrews B. J., Hughes T. R., Exploration of essential gene functions via titratable promoter alleles. Cell 118, 31–44 (2004).
- 61. Gonçalves P., Gonçalves C., Brito P. H., Sampaio J. P., The Wickerhamiella/Starmerella clade: A treasure trove for the study of the evolution of yeast metabolism. Yeast 37, 313–320 (2020).
- 62. Gonçalves C., Gonçalves P., Multilayered horizontal operon transfers from bacteria reconstruct a thiamine salvage pathway in yeasts. Proc. Natl. Acad. Sci. U.S.A. 116, 22219–22228 (2019).
- 63. Krassowski T., Coughlan A. Y., Shen X.-X., Zhou X., Kominek J., Opulente D. A., Riley R., Grigoriev I. V., Maheshwari N., Shields D. C., Kurtzman C. P., Hittinger C. T., Rokas A., Wolfe K. H., Evolutionary instability of CUG-Leu in the genetic code of budding yeasts. Nat. Commun. 9, 1887 (2018).
- 64. LaBella A. L., Opulente D. A., Steenwyk J. L., Hittinger C. T., Rokas A., Variation and selection on codon usage bias across an entire subphylum. PLOS Genet. 15, e1008304 (2019).
- 65. Opulente D. A., Rollinson E. J., Bernick-Roehr C., Hulfachor A. B., Rokas A., Kurtzman C. P., Hittinger C. T., Factors driving metabolic diversity in the budding yeast subphylum. BMC Biol. 16, 26 (2018).
- 66. Pellegrini M., Marcotte E. M., Thompson M. J., Eisenberg D., Yeates T. O., Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. Proc. Natl. Acad. Sci. U.S.A. 96, 4285–4288 (1999).
- 67. Pazos F., Valencia A., Similarity of phylogenetic trees as indicator of protein–protein interaction. Protein Eng. Des. Sel. 14, 609–614 (2001).
- 68. de Juan D., Pazos F., Valencia A., Emerging methods in protein co-evolution. Nat. Rev. Genet. 14, 249–261 (2013).
- 69. Nguyen L.-T., Schmidt H. A., von Haeseler A., Minh B. Q., IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
- 70. Chikina M., Robinson J. D., Clark N. L., Hundreds of genes experienced convergent shifts in selective pressure in marine mammals. Mol. Biol. Evol. 33, 2182–2192 (2016).
- 71. Newman M. E. J., Fast algorithm for detecting community structure in networks. Phys. Rev. E 69, 066133 (2004).
- 72. Goffeau A., Barrell B. G., Bussey H., Davis R. W., Dujon B., Feldmann H., Galibert F., Hoheisel J. D., Jacq C., Johnston M., Louis E. J., Mewes H. W., Murakami Y., Philippsen P., Tettelin H., Oliver S. G., Life with 6000 Genes. Science 274, 546–567 (1996).
- 73. Jones T., Federspiel N. A., Chibana H., Dungan J., Kalman S., Magee B. B., Newport G., Thorstenson Y. R., Agabian N., Magee P. T., Davis R. W., Scherer S., The diploid genome sequence of Candida albicans. Proc. Natl. Acad. Sci. U.S.A. 101, 7329–7334 (2004).
- 74. Klopfenstein D. V., Zhang L., Pedersen B. S., Ramírez F., Warwick Vesztrocy A., Naldi A., Mungall C. J., Yunes J. M., Botvinnik O., Weigel M., Dampier W., Dessimoz C., Flick P., Tang H., GOATOOLS: A Python library for Gene Ontology analyses. Sci. Rep. 8, 10872 (2018).
- 75. Supek F., Bosnjak M., Skunca N., Smuc T., REVIGO summarizes and visualizes long lists of gene ontology terms. PLOS ONE 6, e21800 (2011).
- 76. Kanehisa M., Sato Y., Kawashima M., Furumichi M., Tanabe M., KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).
- 77. van het Hoog M., Rast T. J., Martchenko M., Grindle S., Dignard D., Hogues H., Cuomo C., Berriman M., Scherer S., Magee B., Whiteway M., Chibana H., Nantel A., Magee P., Assembly of the Candida albicans genome into sixteen supercontigs aligned on the eight chromosomes. Genome Biol. 8, R52 (2007).
- 78. Muzzey D., Schwartz K., Weissman J. S., Sherlock G., Assembly of a phased diploid Candida albicans genome facilitates allele-specific measurements and provides a simple model for repeat and indel structure. Genome Biol. 14, R97 (2013).
- 79. Krzywinski M., Schein J., Birol I., Connors J., Gascoyne R., Horsman D., Jones S. J., Marra M. A., Circos: An information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).
Supplementary Materials
Figs. S1 to S26
Tables S1 to S6





