Abstract
To determine whether genes retain ancestral functions over a billion years of evolution and to identify principles of deep evolutionary divergence, we replaced 414 essential yeast genes with their human orthologs, assaying for complementation of lethal growth defects upon loss of the yeast genes. Nearly half (47%) of the yeast genes could be successfully humanized. Sequence similarity and expression only partly predicted replaceability. Instead, replaceability depended strongly on gene modules: genes in the same process tended to be similarly replaceable (e.g., sterol biosynthesis) or not (e.g., DNA replication initiation). Simulations confirmed selection for specific function can maintain replaceability despite extensive sequence divergence. Critical ancestral functions of many essential genes are thus retained in a pathway-specific manner, robust to drift in sequences, splicing, and protein interfaces.
The ortholog-function conjecture posits that orthologous genes in diverged species perform similar or identical functions (1). The conjecture is supported by comparative analyses of gene-expression patterns, genetic interaction maps, and chemogenomic profiling (2-6), and it is widely used to predict gene function across species. However, even if two genes perform similar functions in different organisms, it may not be possible to substitute one for the other, in particular if the organisms are widely diverged. To what extent deeply divergent orthologs can stand in for each other, and which principles govern such functional equivalence across species, is largely unknown.
Here, we systematically addressed these questions by replacing a large number of yeast genes with their human orthologs. Humans and the baker’s yeast Saccharomyces cerevisiae diverged from a common ancestor approximately one billion years ago (7). They share several thousand orthologous genes, accounting for more than 1/3 of the yeast genome (8). Yeast and human orthologs tend to be recognizable but often highly diverged; amino-acid identity ranges from 9% to 92%, with a genome-wide average of 32%. While we know of individual examples of human genes capable of replacing their fungal orthologs (9-12), the extent and specific conditions under which human genes can substitute for their yeast orthologs are generally not known.
We focused on the set of genes essential for yeast cell growth under standard laboratory conditions (13, 14) and for which the yeast-human orthology is 1:1, i.e., genes without lineage-specific duplicates that might mask the effects. Based on availability of full-length human cDNA recombinant clones (15, 16) and matched yeast strains with conditionally null alleles of the test genes (17-19), we selected 469 human genes to study (Fig. 1A).
We first sub-cloned and sequence-verified each human protein coding sequence into a single-copy, centromeric yeast plasmid under the transcriptional control of either an inducible (GAL) or constitutively active (GPD) promoter. We assembled a matched set of yeast strains in which each orthologous yeast gene could be conditionally down-regulated (via a tetracycline-repressible promoter (17)), inactivated (via a temperature sensitive allele (18)), or segregated away genetically (following sporulation of a heterozygous diploid deletion strain (13, 19)) (Fig. 1A; Fig. S1). After verifying that loss of the relevant yeast gene conferred a strong growth defect, we tested whether expression of the human ortholog could complement the growth defect, as illustrated for several examples in Fig. 1B (also Figs. S2-4). 73 of the human genes exhibited toxicity when expressed in the permissive condition; reducing the genes’ expression levels allowed us to assay replacement in 66 cases (Table S1).
Overall, we performed 652 informative growth assays surveying 414 human/yeast orthologs (Figs. 1A, C). In total, 176 yeast genes (43%) could be replaced by their human orthologs in at least one of the three strain backgrounds, while 238 (57%) could not (Table S1). We collated previously published reports of yeast gene complementation by human genes; our assays recapitulated these cases with 90% precision and 72% recall (Table S1), and incorporating the literature data for subsequent analyses brought the observed complementation rate to 47% (Fig. 1C). For randomly selected subsets of strains, we additionally validated the assays by sub-cloning the yeast test genes into the assay vectors and confirming positive complementation assays (Table S2), by confirming human protein expression using Western blot analysis (Fig. S5), and by confirming complementation by tetrad dissection (Table S1).
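The percentages quoted above follow directly from the reported counts, and the precision and recall figures use the standard definitions. The short Python sketch below reproduces the arithmetic. The replaceable and non-replaceable counts come from the text; the true-positive, false-positive, and false-negative counts are hypothetical values chosen only so that the ratios equal 90% and 72%, since the actual literature counts are given in Table S1 and not in the main text.

```python
# Counts stated in the text.
REPLACEABLE = 176
NOT_REPLACEABLE = 238
TOTAL = REPLACEABLE + NOT_REPLACEABLE  # 414 ortholog pairs


def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to the nearest integer."""
    return round(100 * part / whole)


def precision(tp, fp):
    """Fraction of positive assay calls that agree with the literature."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of literature-positive cases that the assay recovers."""
    return tp / (tp + fn)


# HYPOTHETICAL counts, for illustration only (not taken from Table S1).
hyp_tp, hyp_fp, hyp_fn = 18, 2, 7

print(TOTAL, percent(REPLACEABLE, TOTAL), percent(NOT_REPLACEABLE, TOTAL))
print(precision(hyp_tp, hyp_fp), recall(hyp_tp, hyp_fn))
```

The 47% rate after adding literature cases cannot be recomputed this way, because the number of added literature cases is not stated in the main text.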
Given that roughly half of the tested human genes successfully replaced and half did not, we next investigated factors determining replaceability. We assembled 104 quantitative features of the genes or ortholog pairs, including calculated properties of the genes’ sequences (e.g., gene and protein lengths, sequence similarities, codon usage, and predicted protein aggregation potential) and properties such as protein interactions, mRNA and protein abundances, transcription and translation rates, and mRNA splicing features (Table S3). We then quantified how well each feature predicted replaceability (Fig. 2A, Table S3).
Notably, sequence similarity only partly predicted replaceability. Its predictive power was strongest for highly similar (>50% amino acid identity) or dissimilar (<20%) ortholog pairs. However, most pairs fell into an intermediate range of 20-50% sequence identity, which predicted replaceability only poorly (Fig. 2B). Instead, replaceability was best predicted by properties of specific gene modules. In particular, proteins in the same pathway or complex tended to be similarly replaceable (Fig. 2A). Replaceable genes also tended to be shorter and more highly expressed. Using these features in a supervised Bayesian network classification algorithm (Fig. S6), we achieved a high overall cross-validated prediction rate (area under the receiver operating characteristic curve of 0.825, Fig. 2A) and correct prediction of 8 of 10 literature cases withheld from all computational analyses (Table S4). Properties such as human-gene splice-form counts, yeast 5′ and 3′ UTR lengths, codon adaptation indices, and yeast mRNA half-lives showed little relationship with replaceability (Fig. 2A, Table S3).
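For readers unfamiliar with the metric, the area under the ROC curve (AUC) for a single feature equals the probability that a randomly chosen replaceable gene scores higher on that feature than a randomly chosen non-replaceable gene, with ties counted as one half. The Python sketch below computes it by direct pairwise comparison. It illustrates the metric only; it is not the Bayesian network classifier or the cross-validation procedure used in the study, and the feature values and labels are made-up toy numbers.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: feature values, where higher should indicate "replaceable".
    labels: 1 for replaceable, 0 for non-replaceable.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Toy, hypothetical feature values (not data from the study).
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(toy_scores, toy_labels))
```

An AUC of 0.5 corresponds to a feature with no predictive value and 1.0 to perfect separation, which gives a scale for reading the reported value of 0.825.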
The strong association between replaceability and gene modules led us to investigate this phenomenon in more depth, examining replaceability as a function of specific protein complexes and pathways. Broad KEGG (20) pathway classes showed highly differential replaceability: metabolic enzymes (e.g., enzymes participating in lipid, amino-acid, and carbohydrate metabolism) tended to be replaceable, while proteins involved in DNA replication and repair or in cell growth tended not to be replaceable (Fig. 2C).
Among large protein complexes and pathways, we observed both extremes of replaceability. Some were entirely non-replaceable: for example, we did not observe a single successful replacement among 13 tested members of the TriC chaperone complex, the DNA replication initiation origin recognition complex, or its interacting MCM complex (Figs. 3A, B). In contrast, some pathways were almost entirely replaceable: among 19 components of the sterol biosynthesis pathway (which catalyzes the conversion of acetyl-CoA to cholesterol in humans and ergosterol in yeast) only the human farnesyl-diphosphate farnesyltransferase 1 enzyme (FDFT1) and farnesyl diphosphate synthase (FDPS) failed to replace their yeast orthologs. All other tested components were replaceable, suggesting that yeast and humans both retain the same essential complement of ancestral sterol biosynthesis functionality (Figs. 3C, S7).
The modular nature of replaceability was particularly evident in the case of the 26S proteasome complex. Of 28 tested subunits, 21 human genes replaced their yeast counterparts (Fig. 4A). However, the non-replaceable subunits were not randomly distributed; rather, they clustered in two physically interacting groups: one consisting of the 19S lid components Rpn3 and Rpn12 and one consisting of the 20S inner core heptameric beta ring subunits β1, β2, β5, β6, and β7. Thus, of the two central heteroheptameric rings, all testable components of the alpha ring replaced, while most of the beta ring did not.
An examination of the alpha and beta subunit structures showed that subunit-subunit interfacial amino acids were conserved to similar degrees between yeast and human subunits (Fig. S8A), although beta subunits exhibited elevated rates of non-synonymous substitutions compared to alpha subunits (Fig. S8B). Even when interfacial amino acids were only partly conserved, modeling human alpha subunits into the known structure of the yeast proteasome (21) revealed that human proteins could be sterically accommodated into the yeast intersubunit interface, as shown for human α6 (Fig. 4B) packing against yeast β6, in spite of sharing only 50% identical amino acids at the interface (Fig. S8A). Only orthologous alpha subunits replaced; non-orthologs failed (Fig. S9).
We further confirmed this trend across alpha and beta proteasome subunits by cloning and assaying subunits from additional organisms, including another yeast (Saccharomyces kluyveri), the nematode C. elegans, and several beta subunits from the frog X. laevis. In all cases, alpha subunits complemented loss of the yeast orthologs, while beta subunits generally failed to complement (Fig. 4C). The pattern of replaceability across species suggests that alpha and beta subunits experienced different evolutionary pressures, in each case operating at the level of the system of genes (the alpha or beta heteroheptamer).
To determine further why proteasome alpha subunits were replaceable while beta subunits were not, we isolated human β2 subunit mutants that complemented the yeast defect (Figs. S10-12). A single serine to glycine substitution (S214G) was sufficient to rescue growth (Fig. S11). β2 subunits act as proteases, but yeast β2 catalytic activity is dispensable if the proteasome assembles with other functioning protease subunits (22). Notably, a catalytically dead (T44A) human β2 failed to complement, while an S214G, T44A double mutant complemented successfully (Fig. S11). We conclude the S214G mutant is competent to assemble an intact proteasome, although the subunit may not be catalytically active. Thus, native human β2 needs only one amino acid change to pack within the yeast proteasome.
Theory predicts that evolutionary divergence creates Dobzhansky-Muller incompatibilities, since novel mutations in one species are untested in the other species’ genetic background and may be deleterious there (23, 24). To better understand how proteins retain the ability to interact with their ortholog’s interaction partners, even when they have diverged substantially, we developed a biochemically realistic divergence model in which we simulated the evolution of two physically interacting proteins, which both diverge over time. We considered three distinct scenarios: (i) both thermodynamic stability and binding to the extant partner were selected at ancestral levels; (ii) binding was selected at ancestral levels but stability was not; (iii) stability was selected at ancestral levels but binding was not. Thermodynamic stability (ΔGfolding) and binding (ΔGinteraction) were calculated using the empirical FoldX energy function (25). Under all scenarios, we evaluated whether an evolved member of the pair could still bind to its ancestral partner, for which binding was not enforced. We found that ancestral binding decayed rapidly under scenario (iii) but much more slowly under the other two scenarios (Figs. 4D, S13-15). Natural selection for a protein interaction thus preserves the interaction interface in a manner consistent with binding to the ancestral partner (Figs. S16-17), even though many lineages will eventually accumulate mutations that cause incompatibilities with the ancestral interactor.
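The logic of the simulation can be illustrated with a deliberately simplified toy model in Python. This is a sketch under strong assumptions and not the model used in the study: interface residues interact through a random, hypothetical contact-energy table instead of the FoldX energy function, there is no stability term, and only two regimes are compared (binding to the extant partner selected, or no selection at all). All function names and parameters are illustrative.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def make_contact_energies(rng):
    """Random symmetric contact energies (hypothetical; lower is better)."""
    energies = {}
    for i, a in enumerate(ALPHABET):
        for b in ALPHABET[i:]:
            value = rng.uniform(-1.0, 1.0)
            energies[(a, b)] = value
            energies[(b, a)] = value
    return energies


def binding_energy(seq_a, seq_b, energies):
    """Sum of pairwise contact energies across aligned interface positions."""
    return sum(energies[(x, y)] for x, y in zip(seq_a, seq_b))


def simulate(n_steps, select_binding, seed=0, length=30, slack=0.1):
    """Co-evolve two interacting interfaces by single-residue mutations.

    Returns (threshold, history); history holds one tuple per step:
    (energy of evolved A with evolved B, energy of evolved A with ancestral B).
    """
    rng = random.Random(seed)
    energies = make_contact_energies(rng)
    anc_a = [rng.choice(ALPHABET) for _ in range(length)]
    # Ancestral partner: the most favourable residue at each position.
    anc_b = [min(ALPHABET, key=lambda y: energies[(x, y)]) for x in anc_a]
    ancestral = binding_energy(anc_a, anc_b, energies)
    threshold = ancestral + slack * abs(ancestral)  # tolerated loss of binding
    a, b = list(anc_a), list(anc_b)
    history = [(ancestral, ancestral)]
    for _ in range(n_steps):
        new_a, new_b = list(a), list(b)
        target = new_a if rng.random() < 0.5 else new_b
        target[rng.randrange(length)] = rng.choice(ALPHABET)
        # With selection, reject mutations that weaken extant binding too much.
        if not select_binding or binding_energy(new_a, new_b, energies) <= threshold:
            a, b = new_a, new_b
        history.append((binding_energy(a, b, energies),
                        binding_energy(a, anc_b, energies)))
    return threshold, history


threshold, selected = simulate(200, select_binding=True, seed=1)
_, neutral = simulate(200, select_binding=False, seed=1)
```

Comparing the second element of each tuple between the `selected` and `neutral` runs gives a rough analogue of the ancestral-binding curves described above. Binding to the ancestral partner is never enforced in either run; it is only measured.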
Our data demonstrate that a substantial portion of conserved yeast and human genes perform much the same roles in both organisms—to an extent that the protein-coding DNA of a human gene can actually substitute for that of the yeast. The strong pathway-specific pattern of individual replacements suggests that group-wise replacement of the genes should be feasible, raising the possibility of humanizing entire cellular processes in yeast. Such strains would simplify drug discovery against human proteins, enable studies of the consequences of human genetic polymorphisms (as in (26) and Fig. S7), and empower functional studies of entire human cellular processes in a simplified organism.
Supplementary Material
Acknowledgments
We thank Megan Minnix and Ariel Royall for assistance with cloning and assays, Kevin Drew for structural modeling assistance, Mark Tsechansky for TANGO assistance, and Charlie Boone for providing the temperature-sensitive yeast strain collection. This work was supported by CPRIT research fellowships to A.H.K. and J.M.L., NIH grant R01 GM088344, DTRA grant HDTRA1-12-C-0007, and NSF STC BEACON funds (DBI-0939454) to C.O.W., and grants from the NIH, NSF, CPRIT, and Welch Foundation (F-1515) to E.M.M.
References and Notes
- 1. Gabaldon T, Koonin EV. Nat. Rev. Genet. 2013;14:360–366. doi: 10.1038/nrg3456.
- 2. Nehrt NL, Clark WT, Radivojac P, Hahn MW. PLoS Comput. Biol. 2011;7:e1002073. doi: 10.1371/journal.pcbi.1002073.
- 3. Chen X, Zhang J. PLoS Comput. Biol. 2012;8:e1002784. doi: 10.1371/journal.pcbi.1002784.
- 4. Kapitzky L, et al. Mol. Syst. Biol. 2010;6:451. doi: 10.1038/msb.2010.107.
- 5. Ryan CJ, et al. Mol. Cell. 2012;46:691–704. doi: 10.1016/j.molcel.2012.05.028.
- 6. Frost A, et al. Cell. 2012;149:1339–1352. doi: 10.1016/j.cell.2012.04.028.
- 7. Douzery EJP, Snell EA, Bapteste E, Delsuc F, Philippe H. Proc. Natl. Acad. Sci. U. S. A. 2004;101:15386–15391. doi: 10.1073/pnas.0403984101.
- 8. O’Brien KP, Remm M, Sonnhammer ELL. Nucleic Acids Res. 2005;33:D476–80. doi: 10.1093/nar/gki107.
- 9. Dolinski K, Botstein D. Annu. Rev. Genet. 2007;41:465–507. doi: 10.1146/annurev.genet.40.110405.090439.
- 10. Basson ME, Thorsness M, Finer-Moore J, Stroud RM, Rine J. Mol. Cell. Biol. 1988;8:3797–3808. doi: 10.1128/mcb.8.9.3797.
- 11. Cormack BP, Strubin M, Stargell LA, Struhl K. Genes Dev. 1994;8:1335–1343. doi: 10.1101/gad.8.11.1335.
- 12. Osborn MJ, Miller JR. Brief. Funct. Genomic. Proteomic. 2007;6:104–111. doi: 10.1093/bfgp/elm017.
- 13. Tong AH, et al. Science. 2001;294:2364–2368. doi: 10.1126/science.1065810.
- 14. Winzeler EA, et al. Science. 1999;285:901–906. doi: 10.1126/science.285.5429.901.
- 15. MGC Project Team et al. Genome Res. 2009;19:2324–2333. doi: 10.1101/gr.095976.109.
- 16. Rual J-F, et al. Nature. 2005;437:1173–1178. doi: 10.1038/nature04209.
- 17. Mnaimneh S, et al. Cell. 2004;118:31–44. doi: 10.1016/j.cell.2004.06.013.
- 18. Li Z, et al. Nat. Biotechnol. 2011;29:361–367. doi: 10.1038/nbt.1832.
- 19. Pan X, et al. Mol. Cell. 2004;16:487–496. doi: 10.1016/j.molcel.2004.09.035.
- 20. Kanehisa M, Goto S. Nucleic Acids Res. 2000;28:27–30. doi: 10.1093/nar/28.1.27.
- 21. Groll M, et al. Nature. 1997;386:463–471. doi: 10.1038/386463a0.
- 22. Groll M, et al. Proc. Natl. Acad. Sci. U. S. A. 1999;96:10976–10983. doi: 10.1073/pnas.96.20.10976.
- 23. Orr HA. Genetics. 1995;139:1805–1813. doi: 10.1093/genetics/139.4.1805.
- 24. Welch JJ. Evol. Int. J. Org. Evol. 2004;58:1145–1156. doi: 10.1111/j.0014-3820.2004.tb01695.x.
- 25. Guerois R, Nielsen JE, Serrano L. J. Mol. Biol. 2002;320:369–387. doi: 10.1016/S0022-2836(02)00442-4.
- 26. Marini NJ, Thomas PD, Rine J. PLoS Genet. 2010;6:e1000968. doi: 10.1371/journal.pgen.1000968.
- 27. Houten SM, Waterham HR. Mol. Genet. Metab. 2001;72:273–276. doi: 10.1006/mgme.2000.3133.
- 28. Alberti S, Gitler AD, Lindquist S. Yeast. 2007;24:913–919. doi: 10.1002/yea.1502.
- 29. Liang X, Peng L, Baek C-H, Katzen F. BioTechniques. 2013;55:265–268. doi: 10.2144/000114101.
- 30. Burke D, Cold Spring Harbor Laboratory. Methods in yeast genetics: a Cold Spring Harbor Laboratory course manual. 2000 Cold Spring Harbor Laboratory Press; Plainview, N.Y: 2000.
- 31. Kaiser C, Cold Spring Harbor Laboratory. Methods in yeast genetics: a Cold Spring Harbor Laboratory course manual. 1994 Cold Spring Harbor Laboratory Press; Cold Spring Harbor, NY: 1994.
- 32. Remm M, Storm CE, Sonnhammer EL. J. Mol. Biol. 2001;314:1041–1052. doi: 10.1006/jmbi.2000.5197.
- 33. Vavouri T, Semple JI, Garcia-Verdugo R, Lehner B. Cell. 2009;138:198–208. doi: 10.1016/j.cell.2009.04.029.
- 34. Fernandez-Escamilla A-M, Rousseau F, Schymkowitz J, Serrano L. Nat. Biotechnol. 2004;22:1302–1306. doi: 10.1038/nbt1012.
- 35. Stark C, et al. Nucleic Acids Res. 2006;34:D535–539. doi: 10.1093/nar/gkj109.
- 36. Ruepp A, et al. Nucleic Acids Res. 2008;36:D646–650. doi: 10.1093/nar/gkm936.
- 37. Hart GT, Lee I, Marcotte EM. BMC Bioinformatics. 2007;8:236. doi: 10.1186/1471-2105-8-236.
- 38. Ogata H, et al. Nucleic Acids Res. 1999;27:29–34. doi: 10.1093/nar/27.1.29.
- 39. Lee I, Blom UM, Wang PI, Shim JE, Marcotte EM. Genome Res. 2011;21:1109–1121. doi: 10.1101/gr.118992.110.
- 40. Lee I, Date SV, Adai AT, Marcotte EM. Science. 2004;306:1555–1558. doi: 10.1126/science.1099511.
- 41. Costanzo M, et al. Science. 2010;327:425–431. doi: 10.1126/science.1180823.
- 42. Kulak NA, Pichler G, Paron I, Nagaraj N, Mann M. Nat. Methods. 2014;11:319–324. doi: 10.1038/nmeth.2834.
- 43. Guo H, Ingolia NT, Weissman JS, Bartel DP. Nature. 2010;466:835–840. doi: 10.1038/nature09267.
- 44. Ingolia NT, Ghaemmaghami S, Newman JRS, Weissman JS. Science. 2009;324:218–223. doi: 10.1126/science.1168978.
- 45. Frank E, Hall M, Trigg L, Holmes G, Witten IH. Bioinforma. Oxf. Engl. 2004;20:2479–2481. doi: 10.1093/bioinformatics/bth261.
- 46. Escalante-Chong R, et al. Proc. Natl. Acad. Sci. U. S. A. 2015;112:1636–1641. doi: 10.1073/pnas.1418058112.
- 47. Sella G, Hirsh AE. Proc. Natl. Acad. Sci. U. S. A. 2005;102:9541–9546. doi: 10.1073/pnas.0501865102.
- 48. Katoh K, Standley DM. Mol. Biol. Evol. 2013;30:772–780. doi: 10.1093/molbev/mst010.
- 49. Houten SM, van Woerden CS, Wijburg FA, Wanders RJA, Waterham HR. Eur. J. Hum. Genet. EJHG. 2003;11:196–200. doi: 10.1038/sj.ejhg.5200933.
- 50. D’Osualdo A, et al. Eur. J. Hum. Genet. EJHG. 2005;13:314–320. doi: 10.1038/sj.ejhg.5201323.
- 51. Leyva-Vega M, et al. Am. J. Med. Genet. A. 2011;155A:1461–1464. doi: 10.1002/ajmg.a.33915.
- 52. Kone-Paut I, Sanchez E, Le Quellec A, Manna R, Touitou I. Ann. Rheum. Dis. 2007;66:832–834. doi: 10.1136/ard.2006.068841.
- 53. Nevyjel M, et al. Pediatrics. 2007;119:e523–527. doi: 10.1542/peds.2006-2015.
- 54. Bayes A, et al. Nat. Neurosci. 2011;14:19–21. doi: 10.1038/nn.2719.
- 55. Unno M, et al. Struct. Lond. Engl. 1993. 2002;10:609–618. doi: 10.1016/s0969-2126(02)00748-7.