Role of positive selection in the retention of duplicate genes in mammalian genomes

Shin-Han Shiu; Jake K Byrnes; Runsun Pan; Peng Zhang; Wen-Hsiung Li

doi:10.1073/pnas.0510388103

2006 Feb 6;103(7):2232–2236.

PMCID: PMC1413713 PMID: 16461903

Abstract

The question of how duplicate genes are retained in a population remains controversial. The duplication-degeneration-complementation model, which involves no positive selection, stipulates a higher retention rate of duplicate genes in a small population than in a large one. This model has been accepted by many evolutionists. However, we found considerably more retentions and fewer losses of duplicate genes in the mouse genome than in the human genome, although the population size of rodents is in general larger than that of primates. Indeed, in nearly every interval of synonymous divergence between duplicate genes, the number of gene retentions in mouse is larger than that in human. Our findings suggest a more important role of positive selection in duplicate retention than duplication–degeneration–complementation. In addition, certain functional categories show a higher tendency of lineage-specific expansion than expected, suggesting lineage-specific selection or functional bias in retained duplicates.

Keywords: functional bias, gene duplication, lineage-specific expansion, duplicability

A major question in molecular evolution is how duplicate genes are retained in a genome (1–3). If a gene duplicate is neutral or slightly deleterious, the probability of its fixation in a population increases with decreasing effective population size (N_e), as predicted by the nearly neutral theory (4, 5). Moreover, it has been proposed that a small population size is favorable for the retention of duplicate genes by the duplication–degeneration–complementation (DDC) model, in which complementary degenerate mutations occur in the duplicated genes, so that both genes are needed to maintain the original function(s) (6–8). These arguments lead to the proposition that the gene number increase from lower to higher eukaryotes may not have been driven by adaptive processes but may instead have arisen as a passive response to reduced population sizes (7). Note that DDC is a selectively neutral process, requiring no positive selection. An alternative mechanism for duplicate retention is positive selection, which can arise in the several situations described below. It should be emphasized that although DDC is now commonly referred to as the subfunctionalization model, it is actually different from the classical subfunctionalization model, in which the two genes become specialized in different tissues or at different developmental stages (9, 10). In contrast to DDC, the classical model involves positive selection because tissue (or developmental) specialization improves the function(s). Another subfunctionalization model is that the ancestral gene had two or more functions and the two duplicate genes become specialized in different functions, improving some or all of the functions (2, 11). This is also a positive selection model. In addition, a well-known positive selection model is neofunctionalization, in which one gene maintains the original function, whereas the other gains a new function (1).

The purpose of this study is to compare the relative roles of DDC and positive selection in the retention of duplicate genes by examining lineage-specific gains in two well-annotated mammalian genomes, human and mouse. If positive selection is more important than DDC for the evolutionary success of duplicate genes, the mouse genome should have retained more duplicate genes than the human genome because the larger population size of mouse implies more effective natural selection. On the other hand, if DDC is more important, the opposite should be true.

Results and Discussion

Lineage-Specific Gains in the Human and Mouse Genomes.

We obtained human and mouse protein sequences and their gene family annotations from Ensembl (12). To reduce the influence of transposable elements and pseudogenes in our data set, we excluded LINE1s, reverse transcriptases, ribosomal protein families, and genes that are at least 99% identical to reported human or mouse pseudogenes (13–15). Most human and mouse genes are classified into families with members from both organisms (shared families; Table 1 and Data Set 1, which is published as supporting information on the PNAS web site). For each shared family, we constructed a family phylogeny and then identified its “orthologous groups” (OGs). Each OG represents a single ancestral gene from the human–mouse progenitor and all lineage-specific duplicates of this ancestral gene (Fig. 1A). Eight different OG types are defined based on the presence of genes in mouse, human, or both and the extent of lineage-specific expansion (Fig. 1B and Data Set 1).

Table 1. Family statistics

Category	Human: no. of units	Human: no. of genes	Mouse: no. of units	Mouse: no. of genes
Singletons*	2,249	2,249	2,103	2,103
Unique families*	159	903	154	820
Shared families*	9,118	18,802	9,118	21,118
Total	11,526	21,954	11,375	24,041

*Based on Ensembl gene family classification. The counts are different from the original Ensembl classification because potential pseudogenes, ribosomal proteins, and potential transposable elements are removed.

Fig. 1. — Types of orthologous groups. (A) Inference of orthologous groups. (*Left*) A family tree with five genes: two genes are from human (Hs1a, Hs1b) and three are from mouse (Mm1, Mm2a, Mm2b). According to the parsimony principle, we assume that a bifurcating clade with one branch leading to human gene(s) and the other leading to mouse gene(s) indicates the presence of one ancestral gene. These bifurcating clades are defined as “orthologous groups” (OGs). Because the gene duplication event that led to mouse Mm1 on the one hand and mouse Mm2 on the other occurred before the mouse–human split, at the time of speciation the human genome must have contained the two duplicate genes as well. Therefore, the absence of the human gene corresponding to the ancestor of Mm2a and 2b indicates that this gene was lost in the human lineage. (*Right*) The reconciled tree with two orthologous groups: OG1 and OG2. The ancestral node (filled circle) denotes the speciation point, whereas the open circles indicate lineage-specific expansion (LSE) events. The absence of a human gene (Hs2) in OG2 is regarded as a lineage-specific loss (LSL) event (dotted line). (B) Orthologous group types and associated statistics. OGs are classified into eight types according to A. In the topology for each type, the top branch contains only human (Hs) genes and the bottom branch contains only mouse (Mm) genes. The dotted branches indicate LSLs. For example, the 4:3 type contains four human and three mouse genes, and we scored this OG as having three human and two mouse lineage-specific gains, respectively.

There are 20,679 OGs in the ≈9,000 gene families shared between human and mouse. For each OG, the number of lineage-specific gains is the number of genes minus one. There are 2,189 genes in the 744 OGs that show human lineage-specific expansion (OG types x:1, x:y, and x:0 with x > 1 in Fig. 1B), so the number of lineage-specific gains is 1,445 (2,189 − 744) in human. In contrast, mouse has 2,292 lineage-specific gains. In addition, we estimated 2,896 losses in the human lineage, but only 1,552 losses in the mouse lineage (Fig. 1B). Note that these losses refer to losses of genes that were duplicated before the human–mouse split.
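The scoring rule in this paragraph can be written out as a short Python sketch. The function name and the floor at zero (so that a lineage with no gene in an OG counts as a loss rather than a negative gain) are illustrative assumptions; the 4:3 example is taken from the Fig. 1B legend, and the totals 2,189 and 744 are taken from the text.

```python
def lineage_gains(n_genes: int) -> int:
    """Lineage-specific gains for one lineage in one OG.

    One gene is the ancestral copy, so gains = genes - 1.
    The floor at zero is an assumption for lineages with no gene (a loss).
    """
    return max(n_genes - 1, 0)


# The 4:3 OG type from the Fig. 1B legend: four human and three mouse genes.
human_gains_4_3 = lineage_gains(4)
mouse_gains_4_3 = lineage_gains(3)

# Genome-wide, summing (genes - 1) over all expanded OGs equals
# total genes in expanded OGs minus the number of expanded OGs.
human_genes_in_expanded_ogs = 2189
human_expanded_ogs = 744
total_human_gains = human_genes_in_expanded_ogs - human_expanded_ogs

print(human_gains_4_3, mouse_gains_4_3, total_human_gains)
```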

Our approach only allows us to estimate the minimal numbers of gain and loss events because within-lineage duplications that were not retained and losses that occurred in both lineages cannot be detected. There are other factors that may affect our estimates: (i) the presence of pseudogenes may inflate the number of gene gains, and (ii) gene conversion events can affect the tree topologies that are the basis for inferring gains. Because we have taken a liberal approach to exclude potential pseudogenes (see Methods), they should not significantly affect our estimates. Gene conversion occurs only between sequences with very high sequence identity, and even a few mismatches reduce the frequency significantly (16, 17). Therefore, most gene conversion events that obscure gene gain inference are those that occurred at the early stage of divergence between these two lineages. Because mouse and human diverged ≈80 million years ago, most gene conversion events that occurred after their divergence should not significantly affect our OG inference.

Lineage-Specific Gains in Different Ks Intervals.

The lineage-specific gains in a genome can be divided into bins of Ks intervals (Fig. 2A), where Ks is the number of substitutions per synonymous site between two duplicate genes. Interestingly, there are more lineage-specific gains in the mouse lineage than in the human lineage in nearly all Ks bins. Although Ks is not a good proxy for divergence time because of a higher synonymous rate in the mouse lineage than in the human lineage (18), it may be considered the mutational distance between two duplicate genes. Thus, the fact that in nearly all Ks bins there are more lineage-specific gains in mouse than in human suggests that more mouse duplicate genes have been retained following similar numbers of mutations in these two lineages. There are similar numbers of gains in the bin with Ks ≤ 0.005 in the human (217 gains) and mouse (215 gains) lineages. Because this bin is likely to contain the majority of cases where two alleles have been misclassified as duplicates, this similarity in bin size suggests that the higher number of gains in the mouse lineage is not simply due to less accurate annotation of the mouse genome. Note that the higher mutation rate in rodents than in primates should lead to an underestimate of the number of retained mouse duplicates in a bin. If the synonymous substitution rate is two times higher in the rodent lineage than in the primate lineage, normalization of Ks values results in even larger mouse bins compared to corresponding human bins (Fig. 2B). We also examined the Ks distribution of all duplicates in human and mouse, respectively (Fig. 2C). We found more mouse gains than human ones in all bins with Ks ≤ 1, suggesting that the higher retention rate in mouse than in human is true for duplications that occurred prior to or after the divergence between the human and mouse lineages.

The L-shaped distribution (i.e., higher numbers of gains for smaller Ks values) may be in part due to the homogenization effect of gene conversion between highly similar duplicate genes (19), but is mainly because young duplicate genes have been subject to fewer gene loss events than older duplicates.
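The binning and rate normalization described in this section can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the bin edges (apart from the Ks ≤ 0.005 bin named in the text), the toy Ks values, and the reading of "normalization" as dividing mouse Ks by the assumed twofold rate difference. It is not the procedure used to produce Fig. 2.

```python
from bisect import bisect_left


def bin_counts(ks_values, upper_edges):
    """Count values per bin; bin i holds upper_edges[i-1] < Ks <= upper_edges[i].

    Values above the last edge are ignored.
    """
    counts = [0] * len(upper_edges)
    for ks in ks_values:
        i = bisect_left(upper_edges, ks)
        if i < len(upper_edges):
            counts[i] += 1
    return counts


# Hypothetical bin edges and mouse Ks values (not data from the paper).
upper_edges = [0.005, 0.1, 0.2, 0.3]
mouse_ks = [0.004, 0.08, 0.15, 0.25, 0.28]

# Assumed normalization: a twofold higher rodent rate means a mouse Ks value
# corresponds to half that mutational distance on the primate scale.
rate_ratio = 2.0
mouse_ks_normalized = [ks / rate_ratio for ks in mouse_ks]

raw_counts = bin_counts(mouse_ks, upper_edges)
normalized_counts = bin_counts(mouse_ks_normalized, upper_edges)
print(raw_counts, normalized_counts)
```

In this toy case the normalized values shift toward the lower bins, which is the direction of the effect described for Fig. 2B.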

Mechanisms of Duplicate Gene Retention.

The finding of more lineage-specific gains in mouse (n = 2,292) than in human (n = 1,445) is in favor of positive selection rather than the DDC model. However, the observation can also be explained by assuming a higher birth rate and a lower death rate of duplicate genes in the mouse lineage than in the human lineage. These rates over the long history since the human–mouse split are extremely difficult to estimate but the birth rate in a lineage should be reflected in the sum of functional and nonfunctional (nonprocessed pseudogene) duplicate genes found in the genome, whereas the death rate is reflected in the number of nonprocessed pseudogenes.

Having considered functional duplicates above, we now consider nonprocessed pseudogenes. Zhang et al. (15) found more nonprocessed pseudogenes in human (n = 3,015) than in mouse (n = 735). The lower number in mouse may be partly due to difficulties in detecting nonprocessed pseudogenes in mouse because its genome is not as well annotated as the human genome. It may also be partly due to a higher mutation rate in rodents than in primates (18). However, the chance is low that a pseudogene that arose in the mouse lineage after the human–mouse split has become so degenerate in sequence that it is no longer recognizable. Another factor that can contribute to the difficulty in identifying mouse pseudogenes is that the nucleotide deletion rate is higher in rodents than in primates. Graur et al. (20) found that processed pseudogenes, when compared to their functional counterparts, are shortened more in mouse (by 2.3%) than in human (by 1.2%). However, these estimates refer to nucleotide deletion, not deletion of a complete pseudogene sequence. We note that as long as a substantial part of a pseudogene sequence remains, it is likely to be detected by a similarity-based search. Thus, although there are several factors that may lead to an underestimate of the number of nonprocessed pseudogenes in the mouse genome, they may not account for the large difference between the estimated numbers in human (n = 3,015) and mouse (n = 735).

To simplify the arguments, let us assume that the actual number of mouse nonprocessed pseudogenes is as large as that of human (n = 3,015). Then, the total number of gene duplications since the human–mouse split is estimated to be 2,292 + 3,015 = 5,307 in the mouse lineage and 1,445 + 3,015 = 4,460 in the human lineage. It follows that the proportion of retained duplicate genes is 2,292/5,307 = 0.43 in the mouse genome, whereas it is 1,445/4,460 = 0.32 in the human genome. Because the N_e in rodents is likely substantially higher than that in higher primates, the considerably higher proportion of retained duplicate genes in the mouse lineage than in the human lineage suggests that positive selection has played a more important role than DDC in the retention of duplicate genes in these two mammalian lineages.
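The arithmetic in this paragraph can be checked with a few lines of Python. The function name is an illustrative assumption; the counts are those given in the text, including the simplifying assumption that mouse has as many nonprocessed pseudogenes as human.

```python
def retained_fraction(retained_gains: int, nonprocessed_pseudogenes: int) -> float:
    """Proportion of duplications retained as functional genes.

    Total duplications are estimated as retained gains plus
    nonprocessed pseudogenes, as in the text.
    """
    total_duplications = retained_gains + nonprocessed_pseudogenes
    return retained_gains / total_duplications


# Simplifying assumption from the text: mouse pseudogene count set equal to human.
assumed_pseudogenes = 3015

mouse_fraction = retained_fraction(2292, assumed_pseudogenes)
human_fraction = retained_fraction(1445, assumed_pseudogenes)

print(f"mouse: {mouse_fraction:.2f}, human: {human_fraction:.2f}")
# prints "mouse: 0.43, human: 0.32"
```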

Functional Bias of Retained Duplicates.

To determine whether genes of certain functions tend to expand more than others, we examined gene ontology (GO) (21) annotation of human and mouse genes and compared the numbers of genes in expanded and unexpanded OGs in each functional category against the average numbers of the whole data set with χ² tests (see Methods). Expanded OGs are those with more than one human or mouse gene (Fig. 1B). We uncovered a large number of functional categories with significantly over- or underrepresented numbers of genes in expanded OGs (for statistical analyses and results, see Data Set 2, which is published as supporting information on the PNAS web site). Biological process categories related to defense, degradation of foreign substances, environmental sensors, RNA catabolism, transcription, reproduction, and cell–cell adhesion are overrepresented in human (Fig. 3A). The overrepresented molecular function categories in general recapitulate the categories in biological processes (Fig. 3B). On the other hand, genes involved in metabolism, development, transport, phosphorylation, and transcription with RNA polymerase II are underrepresented (see Data Set 2). These categories contain mostly genes involved in housekeeping functions and developmental processes, and they were either duplicated at a much lower rate or their duplicates were not retained. Our findings imply functional biases in duplicate retention and suggest that gene function is an important determinant of gene duplicability. We have also examined the Ka/Ks values of duplicates and found that the lineage-specific duplicates have significantly higher Ka/Ks values than those of 1:1 orthologs (data not shown). Both positive selection and relaxation of selection can contribute to an elevated evolutionary rate. In addition, because most duplicates were generated millions of years ago, purifying selection may have erased the signature of positive selection.

Nonetheless, genes in several overrepresented categories have been found to be positively selected, such as (i) defense: Ig heavy chain (22), major histocompatibility complexes (23); (ii) cell–cell adhesion: sialic acid-binding Ig-like lectins (24); (iii) RNA catabolism: eosinophil cationic proteins (25); (iv) reproduction: zona pellucida glycoprotein (26); and (v) environmental sensor: olfactory receptors (27). Some functional categories are over- or underrepresented in both human and mouse (see Data Set 3, which is published as supporting information on the PNAS web site, for comparison of mouse and human categories). The strong functional bias among retained duplicates and the fact that a number of these genes have been found to be positively selected are in agreement with our conclusion that positive selection is important in duplicate retention in these mammals.

Fig. 3. — Relationships between GO categories with overrepresentation in expanded orthologous groups in human. Biological process (A) and molecular function (B) categories are shown. For clarity, the brief GO category descriptions are shown (see Data Set 2 for the original GO terms and their statistical significance in overrepresentation). The arrowheads point to subcategories. Categories with significantly more genes in expanded orthologous groups are in black circles (P < 0.05).

Concluding Remarks.

Our findings of more gains and fewer losses of duplicate genes in the mouse lineage than in the human lineage do not support the DDC model as the major mechanism for the retention of duplicate genes in mammals. Rather, they suggest that positive selection plays a more important role than DDC for duplicate retention. Positive selection can arise from gain of new functions, specialization in different tissues or at different developmental stages, or subdivision of functions. Because we have considered only the human and mouse genomes and there are assumptions involved in our analysis, future research is needed to draw a more definite conclusion on the relative roles of positive selection and DDC in duplicate retention.

Methods

Sequence Selection and Identification of Shared Families.

The human and mouse protein sequences used in this study were obtained from Ensembl (www.ensembl.org; ref. 12; version 16.33 for human and version 16.30 for mouse). For genes with multiple protein sequences representing alternatively spliced forms, we selected the longest protein as the representative for further analysis. To reduce the effect of pseudogenes present in the Ensembl data set, we consolidated the pseudogenes available from Ensembl and those reported in three studies (13–15). The protein sequences from human (mouse) were queried against human (mouse) pseudogene sequences. A gene was regarded as a potential pseudogene and excluded from further analyses if its amino acid identity to a known pseudogene was at least 99%, the longest aligned segment was at least 50 aa, and the aligned segments covered at least 80% of the sequence length of the gene in question. Based on these criteria, 716 human and 1,063 mouse putative pseudogenes in shared families were excluded from further analysis. We also excluded ribosomal protein genes and glyceraldehyde-3-phosphate dehydrogenase genes because they contain a large number of pseudogenes (28, 29). The gene family assignments for human and mouse proteins were obtained from Ensembl. We found that 14% of human genes and 11% of mouse genes are singletons or belong to families that are unique to either human or mouse. These organism-specific singletons and families were likely the consequences of lineage-specific inventions or losses, or of failure in orthology identification owing to high sequence divergence, and they were excluded from our analyses.
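The three exclusion criteria above can be expressed as a single predicate. The following Python sketch is illustrative only: the `PseudogeneHit` record, its field names, and the example values are assumptions, and how the alignment statistics were actually computed is not specified here.

```python
from dataclasses import dataclass


@dataclass
class PseudogeneHit:
    """Hypothetical summary of one gene's best match to a known pseudogene."""
    identity_pct: float        # amino acid identity, in percent
    longest_segment_aa: int    # length of the longest aligned segment
    aligned_aa: int            # total residues of the gene covered by aligned segments
    gene_length_aa: int        # full length of the gene's protein


def is_potential_pseudogene(hit: PseudogeneHit) -> bool:
    """Apply the three thresholds stated in the text; all must hold."""
    coverage = hit.aligned_aa / hit.gene_length_aa
    return (
        hit.identity_pct >= 99.0
        and hit.longest_segment_aa >= 50
        and coverage >= 0.80
    )


# Made-up examples: only the first satisfies all three criteria.
excluded = is_potential_pseudogene(PseudogeneHit(99.5, 120, 270, 300))
short_segment = is_potential_pseudogene(PseudogeneHit(99.5, 40, 270, 300))
low_identity = is_potential_pseudogene(PseudogeneHit(98.9, 120, 270, 300))
print(excluded, short_segment, low_identity)
```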

Inference of Phylogeny, Ancestral Gene Numbers, Gene Gains, and Gene Losses.

The mouse and human protein sequences of each shared family were aligned with clustalw (30), and the alignments were used for generating neighbor-joining trees (31). Each family phylogeny was rooted at the midpoint. It should be noted that: (i) the average divergence between human and mouse proteins is ≈20%, so unsupervised alignments are straightforward; (ii) in a random sampling of 100 orthologous groups with the closest gene as an outgroup, all have 100% bootstrap support using neighbor-joining with 1,000 bootstrap replicates; and (iii) midpoint rooting is conducted at the family level, and the average identity among family members is ≈40%, much lower than the average identity between human and mouse orthologous proteins (≈80%) (based on Ensembl synteny-based ortholog assignment). These findings indicate that our phylogenetic approach is unlikely to introduce bias in orthologous group inference. According to the parsimony principle, a bifurcating clade with one branch leading to human gene(s) and the other to mouse gene(s) indicates the existence of one ancestral gene present before the mouse–human split (see Fig. 1). The number of such clades in a family (human–mouse or HM clades) is regarded as the OG count in that family, representing the minimal family size before the human–mouse split. Deviations from a human-to-mouse ratio of 1:1 in an OG indicate lineage-specific gains. The remaining clades that form sister group relationships to HM OGs but have either no human or no mouse genes are also regarded as OGs, with gene loss(es) in either the human or the mouse lineage.
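The parsimony rule for identifying OGs can be sketched on a toy rooted tree. This Python example is a minimal illustration under stated assumptions, not the pipeline used in the study: the tree is a hand-written nested tuple rather than a midpoint-rooted neighbor-joining tree, species are read from the hypothetical gene-name prefixes "Hs" and "Mm" used in Fig. 1A, and the function names are invented here.

```python
def species_of(gene: str) -> str:
    """Assumed naming convention: 'Hs...' is human, anything else is mouse."""
    return "human" if gene.startswith("Hs") else "mouse"


def infer_ogs(tree):
    """Return a list of OGs (lists of gene names) from a rooted binary tree.

    A node whose two branches are single-species clades of different species
    is an HM clade (one OG). A single-species clade that is sister to a
    subtree already containing an OG is an OG with a loss in the other lineage.
    """
    ogs = []

    def visit(node):
        # Returns (genes below node, whether the subtree already contains an OG).
        # Subtrees without an OG are always single-species clades.
        if isinstance(node, str):
            return [node], False
        (left, left_done), (right, right_done) = visit(node[0]), visit(node[1])
        genes = left + right
        if not left_done and not right_done:
            if species_of(left[0]) == species_of(right[0]):
                return genes, False      # lineage-specific expansion, keep climbing
            ogs.append(genes)            # human branch vs. mouse branch: HM clade
            return genes, True
        if not left_done:
            ogs.append(left)             # single-species sister clade: OG with loss
        if not right_done:
            ogs.append(right)
        return genes, True

    visit(tree)
    return ogs


def og_type(genes) -> str:
    """Format an OG as 'human:mouse' gene counts, e.g. '2:1'."""
    n_human = sum(species_of(g) == "human" for g in genes)
    return f"{n_human}:{len(genes) - n_human}"


# The five-gene family described in the Fig. 1A legend.
family_tree = ((("Hs1a", "Hs1b"), "Mm1"), ("Mm2a", "Mm2b"))
ogs = infer_ogs(family_tree)
types = [og_type(genes) for genes in ogs]
print(ogs, types)
```

On this tree the sketch recovers the two groups described in the legend: OG1 with two human genes and one mouse gene, and OG2 with two mouse genes and an inferred human loss.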

GO and Statistical Tests for Over- and Underrepresented Categories.

GO (21) assignments for human genes were obtained from Ensembl. Two top GO categories, molecular functions and biological processes, were analyzed independently. For each top category, only subcategories with at least 10 genes were analyzed further to ensure sufficient data points for statistical analyses. Among these qualified categories, we obtained the numbers of unique human genes residing in expanded (x:1, x:0, and x:y, where x > 1 and y > 1; see Fig. 1B) and unexpanded (1:1 and 1:0) OGs. These two values were used to calculate the expected number of genes in expanded OGs and in unexpanded OGs for each qualified GO category. The expected values were then compared to the observed numbers of genes in expanded and unexpanded OGs with a χ² test to determine whether the observed values were significantly different from expected values. To correct for multiple tests, the P values from the χ² tests were used to estimate the false discovery rate (q) with the q-value software (32). The q value measures the proportion of false positives incurred when a test is called significant; the null hypothesis that the observed counts were similar to the expected counts was rejected if q < 0.05.
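The per-category test can be illustrated with a small Python sketch. It assumes that expected counts are obtained by applying the whole-data-set fraction of genes in expanded OGs to the category size, which is one plausible reading of the description above. The function name and the example counts are made up, and the q-value step (32) is not reproduced.

```python
import math


def chi_square_expanded(obs_expanded, obs_unexpanded, background_expanded_fraction):
    """Chi-square goodness-of-fit test (1 degree of freedom) for one GO category.

    Expected counts assume the category has the same expanded fraction
    as the whole data set (an assumption of this sketch).
    """
    n = obs_expanded + obs_unexpanded
    exp_expanded = n * background_expanded_fraction
    exp_unexpanded = n - exp_expanded
    chi2 = ((obs_expanded - exp_expanded) ** 2 / exp_expanded
            + (obs_unexpanded - exp_unexpanded) ** 2 / exp_unexpanded)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical category: 30 of 100 genes in expanded OGs,
# against a hypothetical background where 10% of genes are in expanded OGs.
chi2, p_value = chi_square_expanded(30, 70, 0.10)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3g}")
```

With many categories tested, the resulting P values would then be passed to a false discovery rate procedure before any category is called over- or underrepresented.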

Supplementary Material

Supporting Data Sets

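pnas_0510388103_index.html (852 B, HTML)
pnas_0510388103_10388DataSet1.xls (3.8 MB, XLS)
pnas_0510388103_1.pdf (147 KB, PDF)
pnas_0510388103_2.pdf (23.2 KB, PDF)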

Acknowledgments

We thank J. J. Emerson, Melissa D. Lehti-Shiu, Geoffrey Morris, Todd Oakley, and Arnar Palsson for reading the manuscript and for discussion and the reviewers for valuable comments. This work was funded by a National Institutes of Health fellowship (to S.-H.S.) and National Institutes of Health grants (to W.-H.L.).

Glossary

Abbreviations:

DDC: duplication–degeneration–complementation
OG: orthologous group
GO: gene ontology.

Footnotes

Conflict of interest statement: No conflicts declared.

References

1. Ohno S. Evolution by Gene Duplication. New York: Springer; 1971.
2. Hughes A. L. Proc. R. Soc. London Ser. B; 1994. pp. 119–124.
3. Zhang J. Trends Ecol. Evol. 2003;18:292–298.
4. Ohta T. Nature. 1973;246:96–98. doi: 10.1038/246096a0.
5. Ohta T. Annu. Rev. Ecol. Syst. 1992;23:263–286.
6. Force A., Lynch M., Pickett F. B., Amores A., Yan Y. L., Postlethwait J. Genetics. 1999;151:1531–1545. doi: 10.1093/genetics/151.4.1531.
7. Lynch M., O’Hely M., Walsh B., Force A. Genetics. 2001;159:1789–1804. doi: 10.1093/genetics/159.4.1789.
8. Lynch M., Conery J. S. Science. 2003;302:1401–1404. doi: 10.1126/science.1089370.
9. Markert C. L. Congenital Malformations. New York: International Medical Congress; 1964. pp. 163–174.
10. Ferris S. D., Whitt G. S. J. Mol. Evol. 1979;12:267–317. doi: 10.1007/BF01732026.
11. Piatigorsky J., Wistow G. Science. 1991;252:1078–1079. doi: 10.1126/science.252.5009.1078.
12. Hubbard T., Barker D., Birney E., Cameron G., Chen Y., Clark L., Cox T., Cuff J., Curwen V., Down T., et al. Nucleic Acids Res. 2002;30:38–41. doi: 10.1093/nar/30.1.38.
13. Torrents D., Suyama M., Zdobnov E., Bork P. Genome Res. 2003;13:2559–2567. doi: 10.1101/gr.1455503.
14. Zhang Z., Harrison P. M., Liu Y., Gerstein M. Genome Res. 2003;13:2541–2558. doi: 10.1101/gr.1429003.
15. Zhang Z., Carriero N., Gerstein M. Trends Genet. 2004;20:62–67. doi: 10.1016/j.tig.2003.12.005.
16. Modrich P., Lahue R. Annu. Rev. Biochem. 1996;65:101–133. doi: 10.1146/annurev.bi.65.070196.000533.
17. Datta A., Hendrix M., Lipsitch M., Jinks-Robertson S. Proc. Natl. Acad. Sci. USA. 1997;94:9757–9762. doi: 10.1073/pnas.94.18.9757.
18. Li W.-H. Molecular Evolution. Sunderland, MA: Sinauer; 1997.
19. Gao L. Z., Innan H. Science. 2004;306:1367–1370. doi: 10.1126/science.1102033.
20. Graur D., Shuali Y., Li W. H. J. Mol. Evol. 1989;28:279–285. doi: 10.1007/BF02103423.
21. Harris M. A., Clark J., Ireland A., Lomax J., Ashburner M., Foulger R., Eilbeck K., Lewis S., Marshall B., Mungall C., et al. Nucleic Acids Res. 2004;32:D258–D261. doi: 10.1093/nar/gkh036.
22. Tanaka T., Nei M. Mol. Biol. Evol. 1989;6:447–459. doi: 10.1093/oxfordjournals.molbev.a040569.
23. Hughes A. L., Nei M. Nature. 1988;335:167–170. doi: 10.1038/335167a0.
24. Sonnenburg J. L., Altheide T. K., Varki A. Glycobiology. 2004;14:339–346. doi: 10.1093/glycob/cwh039.
25. Zhang J., Rosenberg H. F., Nei M. Proc. Natl. Acad. Sci. USA. 1998;95:3708–3713. doi: 10.1073/pnas.95.7.3708.
26. Swanson W. J., Yang Z., Wolfner M. F., Aquadro C. F. Proc. Natl. Acad. Sci. USA. 2001;98:2509–2514. doi: 10.1073/pnas.051605998.
27. Sharon D., Glusman G., Pilpel Y., Khen M., Gruetzner F., Haaf T., Lancet D. Genomics. 1999;61:24–36. doi: 10.1006/geno.1999.5900.
28. Dudov K. P., Perry R. P. Cell. 1984;37:457–468. doi: 10.1016/0092-8674(84)90376-3.
29. Piechaczyk M., Blanchard J. M., Riaad-El Sabouty S., Dani C., Marty L., Jeanteur P. Nature. 1984;312:469–471. doi: 10.1038/312469a0.
30. Higgins D. G., Thompson J. D., Gibson T. J. Methods Enzymol. 1996;266:383–402. doi: 10.1016/s0076-6879(96)66024-8.
31. Saitou N., Nei M. Mol. Biol. Evol. 1987;4:406–425. doi: 10.1093/oxfordjournals.molbev.a040454.
32. Storey J. D., Tibshirani R. Proc. Natl. Acad. Sci. USA. 2003;100:9440–9445. doi: 10.1073/pnas.1530509100.
