Abstract
Gene duplication is thought to play a key role in phenotypic innovation. While several processes have been hypothesized to drive the retention and functional evolution of duplicate genes, their genomic contributions have never been determined. We recently developed the first genome-wide method to classify these processes by comparing distances between expression profiles of duplicate genes and their ancestral single-copy orthologs. Application of our approach to spatial gene expression profiles in two Drosophila species revealed that a majority of young duplicate genes possess new functions, and that new functions are acquired rapidly—often within a few million years. Surprisingly, new functions tend to arise in younger copies of duplicate gene pairs. Moreover, we found that young duplicates are often specifically expressed in testes, whereas old duplicates are broadly expressed across several tissues, providing strong support for the hypothetical “out-of-testes” origin of new genes. In this Extra View, I discuss our findings in the context of theoretical predictions about gene duplication, with a particular emphasis on the importance of natural selection in the evolution of novel phenotypes.
Keywords: gene duplication, neofunctionalization, subfunctionalization, expression divergence
In 1936, Muller and colleagues described one of the earliest observations of gene duplication, in which doubling of the Bar locus resulted in an extreme reduction of eye size in Drosophila melanogaster.1 This direct link between gene duplication and phenotypic variation fueled proposals of several hypotheses about the role of gene duplication in evolution.2-4 However, it was not until 1970, with the release of Susumu Ohno’s book Evolution by Gene Duplication,5 that gene duplication became widely recognized as a major mechanism of evolutionary change. Even so, most work at the time was theoretical, as the field was limited by scarce genomic data and computational resources.
According to traditional evolutionary theory, gene duplication produces two identical copies of a gene. These copies are designated as the parent, which is orthologous to the ancestral single-copy gene, and the child, which is the product of the duplication event. Due to their redundancy, one of these copies is expected to be free from selective constraint and evolve neutrally.5 However, because most mutations are deleterious, the neutrally evolving copy is typically pseudogenized within a few million years.6 Thus, under these assumptions, few duplicate genes survive for long enough to contribute to evolutionary change.
Advancement of genome sequencing technology during the late 1990s enabled empirical testing of theoretical predictions about gene duplication. As more genomes, and eventually transcriptomes, were analyzed, it became evident that large proportions of animal genomes consist of functional duplicate genes. Surprisingly, many of these duplicate genes are much older than predicted by evolutionary theory, with the origin of some duplicates dating back hundreds of millions of years. Moreover, expression divergence between duplicate genes occurs rapidly,7-14 and duplicates in Drosophila often evolve essential functions during the first few million years of evolution.15 These findings suggest that, contrary to theoretical predictions, many duplicates are retained over long periods of evolutionary time and make important contributions to phenotypic change.
Different processes may drive the long-term retention of duplicate genes: conservation, neofunctionalization, subfunctionalization, or specialization. Under conservation, parent and child copies each maintain the ancestral function, resulting in increased gene dosage.5 Under neofunctionalization, one copy maintains the ancestral function, and the other acquires a new function.5 Under subfunctionalization, each copy loses a different part of the ancestral function, such that both copies must be retained to preserve the ancestral gene function.16,17 Finally, under specialization, both copies acquire new mutually exclusive functions over time.18
Determining the genome-wide roles of conservation, neofunctionalization, subfunctionalization, and specialization can shed light on the evolutionary forces driving the functional evolution of duplicate genes. In particular, there exists a strong controversy about whether positive selection plays an important role in fixation of duplicate genes. This debate focuses on the contributions of neofunctionalization, for which a new function is fixed by positive selection, and subfunctionalization, which can occur in the absence of selection. However, until recently, this debate was not directly addressed because identification of the evolutionary processes underlying the retention of duplicates was limited to isolated studies.18-24
To assess the genome-wide roles of these evolutionary processes, we developed a classification method that utilizes phylogenetic comparisons of distances between gene expression profiles of duplicate gene copies and their ancestral single-copy ortholog in a sister species.25 Application of our method to spatial gene expression data from duplicate genes in D. melanogaster and D. pseudoobscura yielded several interesting findings. First, 65% of all lineage-specific duplicate genes underwent neofunctionalization, whereas only 1% underwent subfunctionalization.25 Second, the contributions of different evolutionary processes are similar among duplicates of varying ages, suggesting that the fates of duplicates may be rapidly attained.25 Third, in 91% of neofunctionalized pairs, a new function arose in the child copy.25 Finally, young duplicates typically have testis-specific functions, whereas old duplicates are often broadly expressed across several tissues.25
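As an illustration only, the decision logic of a distance-based classifier of this kind can be sketched as follows. The sketch assumes that expression profiles are converted to relative expression across tissues, that Euclidean distances are computed between each copy (and the summed parent-plus-child profile) and the ancestral single-copy ortholog, and that a copy counts as diverged when its distance exceeds a cutoff. The function names, the example expression values, and the cutoff of 0.2 are hypothetical; in practice the cutoff would be calibrated from distances between single-copy orthologs, and the published method25 should be consulted for the exact procedure.

```python
import math


def relative_profile(levels):
    """Scale expression levels so that they sum to 1 across tissues."""
    total = sum(levels)
    if total <= 0:
        raise ValueError("gene is not expressed in any tissue")
    return [x / total for x in levels]


def euclidean(p, q):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def classify_pair(parent, child, ancestral, cutoff):
    """Assign a duplicate pair to one of four retention processes.

    parent, child, ancestral: expression levels in the same tissue order.
    cutoff: hypothetical divergence threshold (an assumption of this sketch).
    """
    anc = relative_profile(ancestral)
    d_parent = euclidean(relative_profile(parent), anc)
    d_child = euclidean(relative_profile(child), anc)
    # Combined profile: summed expression of both copies in each tissue.
    combined = [p + c for p, c in zip(parent, child)]
    d_combined = euclidean(relative_profile(combined), anc)

    parent_diverged = d_parent > cutoff
    child_diverged = d_child > cutoff

    if not parent_diverged and not child_diverged:
        return "conservation"
    if parent_diverged != child_diverged:
        return "neofunctionalization"
    # Both copies diverged: check whether together they recover the ancestor.
    if d_combined <= cutoff:
        return "subfunctionalization"
    return "specialization"


if __name__ == "__main__":
    ancestral = [10, 10, 10, 10]  # hypothetical levels in four tissues
    print(classify_pair([9, 11, 10, 10], [0, 0, 0, 40], ancestral, 0.2))
    print(classify_pair([20, 20, 0, 0], [0, 0, 20, 20], ancestral, 0.2))
```

In this toy example, the first pair is classified as neofunctionalization because only the child profile departs from the ancestral profile, and the second as subfunctionalization because each copy has diverged but their summed profile matches the ancestral one.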
One of the most striking conclusions from this study was that positive selection plays a key role in the retention and functional evolution of young duplicate genes in Drosophila. Not only is positive selection widespread, as indicated by the observation of frequent neofunctionalization and rare subfunctionalization, but it also acts rapidly, resulting in neofunctionalization of a majority of duplicates that arose after the divergence of D. melanogaster and D. simulans approximately two million years ago.25 We hypothesize that the efficiency of positive selection acting on duplicate gene functions is due to the large effective population size of Drosophila species, which is similarly believed to drive rapid evolution of protein-coding sequences in Drosophila.26,27 Because our classification method can be applied to any organism with transcriptome data for multiple conditions, this hypothesis can be directly tested via comparisons of the contributions of different evolutionary processes in species with varying effective population sizes.
A second major conclusion was that, rather than acquisition of new functions occurring in either duplicate with equal probability, there was a strong tendency for new functions to arise in child copies.25 However, this contrast to theoretical predictions may be due to a violation of one of the basic assumptions of evolutionary theory—that parent and child copies are identical after duplication. Rather, neofunctionalized child copies arise more often than expected via RNA-mediated duplication,25 which produces copies that lack introns and regulatory sequences of their parent/ancestral genes. Thus, in many cases, the duplication process itself introduces a novel function. In such cases, positive selection may play a more dominant role, because it must act immediately on the new child copy if its function is beneficial, rather than waiting for a redundant copy to acquire a beneficial function. Indeed, comparisons of nonsynonymous-to-synonymous substitution rates (Ka/Ks) and expression profile distances between copies that arose via DNA- and RNA-mediated duplication support this prediction (Fig. 1). In particular, while duplicate genes produced by either mechanism evolve faster than single-copy genes, RNA-mediated duplicates evolve substantially faster than DNA-mediated duplicates. This result prompts questions about how strengths of positive selection differ between duplicates produced via DNA- and RNA-mediated mechanisms, as well as about how these differences may influence the evolutionary trajectories and phenotypic outcomes of duplicate genes.

Figure 1. Sequence and expression evolution of DNA- and RNA-mediated duplicate genes. (A) Distributions of Ka/Ks between D. melanogaster and D. pseudoobscura for copies produced by DNA- and RNA-mediated duplication. Horizontal dashed line indicates median Ka/Ks for single-copy genes. Asterisks above boxplots show significance relative to the distribution for single-copy genes, and asterisks above bars connecting distributions show significance between indicated groups. (B) Distributions of Euclidean distances between spatial gene expression profiles of D. melanogaster and D. pseudoobscura copies produced by DNA- and RNA-mediated duplication. Horizontal dashed line indicates median Euclidean distance for single-copy genes. Asterisks above boxplots show significance relative to the distribution for single-copy genes, and asterisks above bars connecting distributions show significance between indicated groups. * P < 0.001 (Mann-Whitney U tests).
A third key result was the strong support for the hypothetical “out-of-testes” origin of new genes.25,28 Initial localization in testes enables the transcription of young duplicates with relatively simple promoters, and possibly even of those lacking regulatory elements, such as copies produced by RNA-mediated duplication. Additionally, strong selective pressures on testes facilitate the evolution of young duplicates, likely resulting in their faster functional diversification. Furthermore, the observation that old duplicates are expressed in multiple tissues suggests that young testis-specific duplicates eventually acquire broad housekeeping functions.25 As transcriptome data become available for more species, it will be interesting to study precisely when and how duplicate genes leave the testes and become incorporated into diverse tissue networks.
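The contrast between testis-specific and broadly expressed genes can be made quantitative with a tissue-specificity index. The sketch below uses the widely used index tau, which ranges from 0 for uniform expression across tissues to 1 for expression confined to a single tissue. Its use here is an assumption made for illustration and is not necessarily the measure applied in the study discussed above; the function name and the expression values are hypothetical.

```python
def tissue_specificity_tau(levels):
    """Tissue-specificity index tau for non-negative expression levels.

    Returns 0.0 for uniform expression across tissues and 1.0 for
    expression confined to a single tissue.
    """
    n = len(levels)
    peak = max(levels)
    if n < 2 or peak <= 0:
        raise ValueError("need at least two tissues and nonzero expression")
    # Sum over tissues of (1 - expression relative to the peak tissue).
    return sum(1 - x / peak for x in levels) / (n - 1)


if __name__ == "__main__":
    # Hypothetical levels, tissue order: head, gut, ovary, testis.
    young_duplicate = [0, 0, 0, 40]    # testis-specific
    old_duplicate = [10, 10, 10, 10]   # broadly expressed
    print(tissue_specificity_tau(young_duplicate))
    print(tissue_specificity_tau(old_duplicate))
```

Under this index, a shift from testis-specific to broad expression over evolutionary time would appear as a decline in tau with duplicate age.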
Though the functional evolution of duplicate genes has been studied in a number of species, an interesting comparison can be made between our findings for Drosophila and those for Daphnia pulex, a water flea.29 What is particularly distinctive about D. pulex is that more than 36% of its genes lack homology to those of any other species examined.29 Analyses of this excess of lineage-specific genes in D. pulex revealed that it is due both to extremely rapid gene duplication, which occurs approximately three times faster than in Drosophila species, and to increased retention of duplicates.29 Comparison of expression profiles among D. pulex duplicates showed that, as in Drosophila, a majority diverged in function at or near the time of duplication.29 Moreover, the nearly identical protein-coding sequences of some duplicates suggest that new regulatory programs may have been created by the gene duplication process itself.29 Interestingly, the authors also found that young duplicates are most responsive to ecological challenges, which may facilitate their retention.29 Unfortunately, because the investigators did not measure gene expression levels across different tissues, the functions of young duplicates in D. pulex cannot be compared with those of Drosophila. Future collection of these data in D. pulex and a closely related sister species will enable application of our method to explicitly classify young duplicates in Daphnia and directly compare their functions to those of young duplicates in Drosophila.
In sum, our recent manuscript advanced the study of gene duplication by introducing the first genome-scale method for classifying evolutionary processes driving the retention of duplicates, and by using this method to determine the genomic contributions of each of these processes in Drosophila.25 Moreover, our analysis enabled us to test theoretical predictions about gene duplication and, specifically, to interrogate the role of positive selection in the evolution of Drosophila duplicate genes. Together, our findings indicate that strong positive selection drives the rapid acquisition of new functions by Drosophila duplicate genes, and that many factors may contribute to this process, including large effective population size, RNA-mediated duplication, and initial localization of duplicates to the quickly evolving testes.25 Future studies can utilize both our new method and our classifications in Drosophila to probe the strengths and targets of natural selection driving the evolution of novel duplicate gene functions.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
References
1. Muller HJ, Prokofjeva-Belgovskaja AA, Kossikov KV. Unequal crossing-over in the bar mutant as a result of duplication of a minute chromosome section. C R Acad Sci USSR. 1936;2:87–8.
2. Stephens SG. Possible significances of duplication in evolution. Adv Genet. 1951;4:247–65. doi: 10.1016/S0065-2660(08)60237-0.
3. Ohno S. Sex chromosomes and sex-linked genes. Springer; 1967.
4. Nei M. Gene duplication and nucleotide substitution in evolution. Nature. 1969;221:40–2. doi: 10.1038/221040a0.
5. Ohno S. Evolution by gene duplication. Springer; 1970.
6. Lynch M, Conery JS. The evolutionary fate and consequences of duplicate genes. Science. 2000;290:1151–5. doi: 10.1126/science.290.5494.1151.
7. Gu Z, Nicolae D, Lu HH, Li WH. Rapid divergence in expression between duplicate genes inferred from microarray data. Trends Genet. 2002;18:609–13. doi: 10.1016/S0168-9525(02)02837-8.
8. Wagner A. Asymmetric functional divergence of duplicate genes in yeast. Mol Biol Evol. 2002;19:1760–8. doi: 10.1093/oxfordjournals.molbev.a003998.
9. Makova KD, Li WH. Divergence in the spatial pattern of gene expression between human duplicate genes. Genome Res. 2003;13:1638–45. doi: 10.1101/gr.1133803.
10. Gu Z, Rifkin SA, White KP, Li WH. Duplicate genes increase gene expression diversity within and between species. Nat Genet. 2004;36:577–9. doi: 10.1038/ng1355.
11. Gu X, Zhang Z, Huang W. Rapid evolution of expression and regulatory divergences after yeast gene duplication. Proc Natl Acad Sci U S A. 2005;102:707–12. doi: 10.1073/pnas.0409186102.
12. Li WH, Yang J, Gu X. Expression divergence between duplicate genes. Trends Genet. 2005;21:602–7. doi: 10.1016/j.tig.2005.08.006.
13. Casneuf T, De Bodt S, Raes J, Maere S, Van de Peer Y. Nonrandom divergence of gene expression following gene and genome duplications in the flowering plant Arabidopsis thaliana. Genome Biol. 2006;7:R13. doi: 10.1186/gb-2006-7-2-r13.
14. Ganko EW, Meyers BC, Vision TJ. Divergence in expression between duplicated genes in Arabidopsis. Mol Biol Evol. 2007;24:2298–309. doi: 10.1093/molbev/msm158.
15. Chen S, Zhang YE, Long M. New genes in Drosophila quickly become essential. Science. 2010;330:1682–5. doi: 10.1126/science.1196380.
16. Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J. Preservation of duplicate genes by complementary, degenerative mutations. Genetics. 1999;151:1531–45. doi: 10.1093/genetics/151.4.1531.
17. Stoltzfus A. On the possibility of constructive neutral evolution. J Mol Evol. 1999;49:169–81. doi: 10.1007/PL00006540.
18. He X, Zhang J. Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics. 2005;169:1157–64. doi: 10.1534/genetics.104.037051.
19. Duarte JM, Cui L, Wall PK, Zhang Q, Zhang X, Leebens-Mack J, Ma H, Altman N, dePamphilis CW. Expression pattern shifts following duplication indicative of subfunctionalization and neofunctionalization in regulatory genes of Arabidopsis. Mol Biol Evol. 2006;23:469–78. doi: 10.1093/molbev/msj051.
20. Escriva H, Bertrand S, Germain P, Robinson-Rechavi M, Umbhauer M, Cartry J, Duffraisse M, Holland L, Gronemeyer H, Laudet V. Neofunctionalization in vertebrates: the example of retinoic acid receptors. PLoS Genet. 2006;2:e102. doi: 10.1371/journal.pgen.0020102.
21. Perry GH, Dominy NJ, Claw KG, Lee AS, Fiegler H, Redon R, Werner J, Villanea FA, Mountain JL, Misra R, et al. Diet and the evolution of human amylase gene copy number variation. Nat Genet. 2007;39:1256–60. doi: 10.1038/ng2123.
22. Sackton TB, Lazzaro BP, Schlenke TA, Evans JD, Hultmark D, Clark AG. Dynamic evolution of the innate immune system in Drosophila. Nat Genet. 2007;39:1461–8. doi: 10.1038/ng.2007.60.
23. Kleinjan DA, Bancewicz RM, Gautier P, Dahm R, Schonthaler HB, Damante G, Seawright A, Hever AM, Yeyati PL, van Heyningen V, et al. Subfunctionalization of duplicated zebrafish pax6 genes by cis-regulatory divergence. PLoS Genet. 2008;4:e29. doi: 10.1371/journal.pgen.0040029.
24. Shapiro JA, Huang W, Zhang C, Hubisz MJ, Lu J, Turissini DA, Fang S, Wang HY, Hudson RR, Nielsen R, et al. Adaptive genic evolution in the Drosophila genomes. Proc Natl Acad Sci U S A. 2007;104:2271–6. doi: 10.1073/pnas.0610385104.
25. Assis R, Bachtrog D. Neofunctionalization of young duplicate genes in Drosophila. Proc Natl Acad Sci U S A. 2013;110:17409–14. doi: 10.1073/pnas.1313759110.
26. Britten RJ. Rates of DNA sequence evolution differ between taxonomic groups. Science. 1986;231:1393–8. doi: 10.1126/science.3082006.
27. Moriyama EN. Higher rates of nucleotide substitution in Drosophila than in mammals. Jpn J Genet. 1987;62:139–47. doi: 10.1266/jjg.62.139.
28. Kaessmann H. Origins, evolution, and phenotypic impact of new genes. Genome Res. 2010;20:1313–26. doi: 10.1101/gr.101386.109.
29. Colbourne JK, Pfrender ME, Gilbert D, Thomas WK, Tucker A, Oakley TH, Tokishita S, Aerts A, Arnold GJ, Basu MK, et al. The ecoresponsive genome of Daphnia pulex. Science. 2011;331:555–61. doi: 10.1126/science.1197761.
