Abstract
A two-locus gene conversion model with selection is developed. Under the joint action of selection, mutation, gene conversion, recombination, and random genetic drift, approximate formulas for the expectations of the moments of allele frequencies and the expected amounts of variation within and between two loci are obtained by a diffusion method assuming relatively strong selection. It is shown that the pattern of allelic variation is mainly determined by the balance between gene conversion and selection, because these two mechanisms act in opposite directions. As an application of the theoretical results, the human RHCE and RHD genes are considered. The very high level of amino acid divergence between the two genes is observed only in a short region around exon 7. It is known that exon 7 encodes amino acids that characterize the difference between the RHCE and RHD antigens. The observed pattern of DNA variation in this region is consistent with the selection model developed in this article, suggesting that strong selection might be working to maintain the RHCE/RHD antigen variation in the two-locus system. The selection intensity is estimated on the basis of the theoretical result.
Recent genomic sequencing projects have confirmed earlier studies showing that the eukaryotic genome contains a number of duplicated genes or chromosome segments (1–4). Gene duplication has been considered an important mechanism for adaptive genome evolution, because an advantageous mutation can give one of the duplicated genes a new function (5–7). However, the fates of duplicated genes, and how often adaptive functional diversification occurs, remain much debated. Despite many demonstrations of adaptive evolution in duplicated genes (e.g., refs. 8–11), theoretical studies indicate that one of the duplicated genes is likely to be silenced relatively quickly after duplication (e.g., refs. 12–16).
To understand the mechanism underlying the acquisition of a new function by duplicated genes, this article considers the evolutionary process within a relatively short period after gene duplication. Walsh (17) suggested that functional diversification does not occur frequently, because gene conversion homogenizes variation between duplicated genes (i.e., concerted evolution of multigene families; see refs. 18–24). He considered a neutral model, in which a duplicated gene can acquire a new function when it has successfully “escaped” from conversion due to accumulation of neutral mutations. The conversion rate is assumed to decrease as genes diverge. Here an alternative model is proposed, in which strong selection results in evolution of a new function under the pressure of gene conversion.
A simple two-locus gene conversion model with two alleles, A and B, is considered in a finite population. It is assumed that A and B have slightly different functions, so that haplotypes with the two different alleles (A-B and B-A) are advantageous over haplotypes with the same alleles (A-A and B-B). Let us suppose that a new allele (B) is introduced by mutation in a population in which A-A is fixed. The frequency of B might increase by selection, and the frequencies of advantageous haplotypes (A-B and B-A) might also increase, while gene conversion changes advantageous haplotypes to deleterious haplotypes (A-A and B-B). Thus, selection and gene conversion act in opposite directions. If the effect of gene conversion is larger than that of selection, the four haplotypes might coexist, but eventually one of the deleterious haplotypes could fix in the population by genetic drift. With very strong selection, on the other hand, one of the advantageous haplotypes is likely to fix, and deleterious haplotypes created by gene conversion are eliminated immediately from the population. This state gives a great opportunity for further functional divergence. The purpose of this article is to consider how strong selection must be to maintain the state where an advantageous haplotype is nearly fixed under the pressure of gene conversion. The model considers the joint action of selection, mutation, gene conversion, recombination, and random genetic drift as factors that determine the pattern of haplotype polymorphism in the duplicated genes. Because this study considers a relatively short-time evolutionary process (i.e., polymorphism) in a small multigene family, other mechanisms such as unequal crossing over and the birth-and-death process, which might play important roles in middle- or large-size multigene families, are ignored (25, 26).
Under the model described above, the pattern of polymorphism is summarized by hw and hb, where hw is the heterozygosity within a locus, and hb is the probability that a pair of alleles randomly chosen from different loci is not identical (27). When an advantageous haplotype is nearly fixed by selection, it is expected that hb is almost 1 and that hw is very small. On the other hand, when selection is not strong, hw might be relatively large and hb much smaller than 1. In this article, approximate equations for the expectations of hw and hb are obtained by using a diffusion method when selection is relatively strong. The theoretical result is very different from that under neutrality (21, 27, 28).
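The two summary statistics can be made concrete with a short sketch. It assumes, in line with the definitions given in the Theory section, that p and q are the frequencies of allele A at loci I and II, that hw is the single-locus heterozygosity 2p(1 − p) averaged over the two loci, and that hb = p(1 − q) + q(1 − p). The function names are illustrative and do not come from the article.

```python
def allele_frequencies(x1, x2, x3, x4):
    """Return (p, q), the frequency of allele A at locus I and at locus II.

    Haplotype order is A-A, A-B, B-A, B-B, so A sits at locus I in A-A and
    A-B, and at locus II in A-A and B-A.
    """
    if abs(x1 + x2 + x3 + x4 - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    return x1 + x2, x1 + x3


def h_within(p, q):
    """Heterozygosity within a locus, averaged over both loci (assumed form)."""
    return 0.5 * (2.0 * p * (1.0 - p) + 2.0 * q * (1.0 - q))


def h_between(p, q):
    """Probability that two alleles drawn from different loci differ."""
    return p * (1.0 - q) + q * (1.0 - p)


# Case 1: the advantageous haplotype A-B is fixed.
p, q = allele_frequencies(0.0, 1.0, 0.0, 0.0)
fixed = (h_within(p, q), h_between(p, q))

# Case 2: A-B and B-A coexist at equal frequency.
p, q = allele_frequencies(0.0, 0.5, 0.5, 0.0)
balanced = (h_within(p, q), h_between(p, q))

print(fixed, balanced)
```

With A-B fixed the sketch gives hw = 0 and hb = 1, and with A-B and B-A at equal frequency it gives hw = hb = 0.5, matching the two limiting cases described in the text.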
As an application of the theory, human rhesus (RH) genes are considered. On the short arm of chromosome 1, there are two closely linked RH genes, RHCE and RHD, which encode the CcEe and D blood group antigens (29). The DNA sequence identity between the two genes is high (≈97%), and their exon–intron structures are very similar (30), indicating they were created by a tandem gene duplication event (for review, see ref. 31). It is estimated that the gene duplication occurred 5–12 million years ago (32). The observed high level of amino acid replacement variation between the two genes in a short region around exon 7 might be explained by the model developed in this article, suggesting that strong selection might be working to maintain the RHCE/RHD antigen variation in the human population.
Theory
Consider two linked loci, I and II, in a random mating population with 2N haploids or N diploids. We consider two alleles, A and B, so there are four haplotypes, A-A, A-B, B-A, and B-B. The fitnesses of these haplotypes are given by 1 – s, 1, 1 and 1 – s, respectively. It is assumed that the symmetric mutation rate between the two alleles is μ per locus per generation. The recombination rate between the two loci is assumed to be r per generation. Intrachromosomal gene conversion occurs at rate c per locus per generation, e.g., A-B changes into A-A with probability c and into B-B with the same probability. This is a simple case of Ohta's model (33). Let the frequencies of A-A, A-B, B-A, and B-B be x1, x2, x3, and x4 (x1 + x2 + x3 + x4 = 1), respectively. Given x1, x2, x3, and x4, their expectations in the next generation might be given by the following recursion equations:
![]() |
[1a] |
![]() |
[1b] |
![]() |
[1c] |
![]() |
[1d] |
where D = x1x4 – x2x3. These recursions treat mutation, recombination, gene conversion, and selection independently, which (especially recombination and gene conversion) may not be biologically independent. However, it should be noted that these events can be treated independently in a sufficiently large population (i.e., continuous time approximation). In a diploid population, Eqs. 1a–d might have a problem in the treatment of selection, which will be discussed later.
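Because the recursion equations themselves are not reproduced here, the following is only a minimal sketch of one form consistent with the verbal description: haploid selection with fitnesses 1 − s, 1, 1, and 1 − s, followed by first-order changes due to symmetric mutation (μ), gene conversion (c), and recombination (r). The order of events, the first-order approximation, and the function name are assumptions, so this should not be read as a transcription of Eqs. 1a–d.

```python
def next_generation(x, s, mu, c, r):
    """Expected haplotype frequencies after one generation.

    x = (x1, x2, x3, x4) for haplotypes A-A, A-B, B-A, B-B.
    Assumed order of events: selection first, then mutation, gene
    conversion, and recombination added as independent first-order terms.
    """
    x1, x2, x3, x4 = x

    # Selection: A-A and B-B have fitness 1 - s, A-B and B-A have fitness 1.
    w_bar = 1.0 - s * (x1 + x4)
    y1 = x1 * (1.0 - s) / w_bar
    y2 = x2 / w_bar
    y3 = x3 / w_bar
    y4 = x4 * (1.0 - s) / w_bar

    # Linkage disequilibrium after selection.
    d = y1 * y4 - y2 * y3

    # Mutation: each haplotype mutates at either locus at rate mu.
    # Conversion: A-B and B-A each turn into A-A or B-B at rate c apiece.
    # Recombination: moves frequency so as to reduce D.
    z1 = y1 + mu * (y2 + y3 - 2.0 * y1) + c * (y2 + y3) - r * d
    z2 = y2 + mu * (y1 + y4 - 2.0 * y2) - 2.0 * c * y2 + r * d
    z3 = y3 + mu * (y1 + y4 - 2.0 * y3) - 2.0 * c * y3 + r * d
    z4 = y4 + mu * (y2 + y3 - 2.0 * y4) + c * (y2 + y3) - r * d
    return (z1, z2, z3, z4)


# Starting from a population fixed for A-B, conversion alone feeds A-A and B-B.
after_one = next_generation((0.0, 1.0, 0.0, 0.0), s=0.1, mu=0.0, c=0.01, r=0.0)
print(after_one)
```

Each of the three first-order terms sums to zero over the four haplotypes, so the frequencies continue to sum to 1 after every step.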
The goal of this section is to obtain the expectations of hw and hb at equilibrium. In this model, their expectations are given by
![]() |
[2] |
where p (= x1 + x2) and q (= x1 + x3) are the frequencies of A at loci I and II, respectively. Note that this symmetric model predicts E(p) = E(q) = 0.5 and E(p²) = E(q²). To obtain E(hw) and E(hb), we consider the expectations of the moments of allele frequencies by using a diffusion method (34). At equilibrium, it is known that a function, g(x1, x2, x3), satisfies the following equation:
![]() |
[3] |
where L is the differential operator of the Kolmogorov backward equation (34–36). In this model, L(g) is given by
![]() |
[4] |
Transforming the three variables, x1, x2, and x3 into p, q and D, Eq. 4 becomes
![]() |
[5] |
and
![]() |
[6] |
where θ = 4Nμ, C = 4Nc and R = 4Nr.
From Eqs. 3, 5, and 6, we consider approximate solutions for the moments of p and q under the assumption of relatively strong selection, because the diffusion equation can be solved exactly only when Ns = 0 (27). Because the model is symmetrical, it is obvious that
![]() |
[7] |
When the effect of selection on allelic variation is large, the frequencies of deleterious haplotypes, A-A and B-B, should be very small in the population. Therefore, it may be possible to assume that the sum of p and q is approximately 1. Under this assumption, we have the following approximations:
![]() |
[8] |
Because the amount of linkage disequilibrium (D) is much smaller than E(p²), E(p³), and E(p⁴), it is assumed that
![]() |
[9] |
Then, letting g = p, pq and D with Eqs. 7–9, Eq. 5 gives the following three equations:
![]() |
[10] |
![]() |
[11] |
![]() |
[12] |
Solving these equations, we have
![]() |
[13] |
![]() |
[14] |
and
![]() |
[15] |
It should be noted that these expectations are independent of the recombination rate. From Eq. 13, the expectations of hw and hb are given by
![]() |
[16] |
and
![]() |
[17] |
To check the theoretical results, Monte Carlo simulations were carried out with N = 1,000 and θ = 0.01. For each parameter set, a simulation was run for 100,000N generations, in which the pseudosampling method (37) was used to determine the haplotype frequencies generation by generation, and p², p³, p⁴, hw, and hb were calculated every N generations. The averages of p², p³, p⁴, hw, and hb are almost independent of R, as expected, and in very good agreement with the approximate Eqs. 13–17 when selection is relatively strong. Part of the results is shown in Fig. 1, in which the averages of hw and hb for R = 0 and 100 are plotted against C. Eqs. 16 and 17 are good approximations for hw and hb when C is smaller than Ns/5. If C is larger than Ns/5, hw from the simulation is smaller than predicted by Eq. 16, and hb from the simulation is larger than predicted by Eq. 17. The deviation from the theory is larger when R is small. Eqs. 16 and 17 and Fig. 1 demonstrate that hw and hb approach 0 and 1, respectively, as selection intensity increases, indicating that selection works to keep one of the advantageous haplotypes (A-B or B-A) at a very high frequency in a population. That is, because of genetic drift, this model does not predict a state in which both advantageous haplotypes coexist at intermediate frequencies (e.g., x2 ≈ x3 ≈ 0.5, so that hw ≈ hb ≈ 0.5).
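A simulation of this kind can be sketched in a few lines. The version below is not the pseudosampling method of ref. 37. It substitutes plain Wright–Fisher multinomial sampling of 2N haplotypes each generation, uses an assumed first-order form of the expected-frequency recursion, starts from a population fixed for A-B to skip the waiting time for an advantageous mutation, and uses a much smaller N and far fewer generations than the article so that it runs quickly. All function and parameter names are illustrative.

```python
import random


def expected_next(x, s, mu, c, r):
    """Expected frequencies of A-A, A-B, B-A, B-B after one generation
    (assumed form: haploid selection, then first-order mutation,
    gene conversion, and recombination)."""
    x1, x2, x3, x4 = x
    w_bar = 1.0 - s * (x1 + x4)                  # mean fitness
    y1, y4 = x1 * (1.0 - s) / w_bar, x4 * (1.0 - s) / w_bar
    y2, y3 = x2 / w_bar, x3 / w_bar
    d = y1 * y4 - y2 * y3                        # linkage disequilibrium
    gain_same = (mu + c) * (y2 + y3)             # input into A-A and B-B
    gain_diff = mu * (y1 + y4)                   # input into A-B and B-A
    return (
        y1 * (1.0 - 2.0 * mu) + gain_same - r * d,
        y2 * (1.0 - 2.0 * mu - 2.0 * c) + gain_diff + r * d,
        y3 * (1.0 - 2.0 * mu - 2.0 * c) + gain_diff + r * d,
        y4 * (1.0 - 2.0 * mu) + gain_same - r * d,
    )


def simulate(n_diploid, ns, theta, big_c, big_r, generations, burn_in, seed=1):
    """Return the time averages of hw and hb after the burn-in period."""
    rng = random.Random(seed)
    two_n = 2 * n_diploid
    s = ns / n_diploid
    mu = theta / (4.0 * n_diploid)               # theta = 4N*mu
    c = big_c / (4.0 * n_diploid)                # C = 4N*c
    r = big_r / (4.0 * n_diploid)                # R = 4N*r
    x = (0.0, 1.0, 0.0, 0.0)                     # start fixed for A-B
    sum_hw = sum_hb = 0.0
    kept = 0
    for generation in range(generations):
        weights = expected_next(x, s, mu, c, r)
        # Random genetic drift: multinomial sampling of 2N haplotypes.
        draws = rng.choices(range(4), weights=weights, k=two_n)
        counts = [0, 0, 0, 0]
        for haplotype in draws:
            counts[haplotype] += 1
        x = tuple(n / two_n for n in counts)
        if generation >= burn_in:
            p, q = x[0] + x[1], x[0] + x[2]
            sum_hw += p * (1.0 - p) + q * (1.0 - q)
            sum_hb += p * (1.0 - q) + q * (1.0 - p)
            kept += 1
    return sum_hw / kept, sum_hb / kept


mean_hw, mean_hb = simulate(n_diploid=100, ns=20.0, theta=0.01, big_c=1.0,
                            big_r=0.0, generations=2000, burn_in=200)
print(mean_hw, mean_hb)
```

With Ns = 20 and C = 1 (so that C is well below Ns/5), the averages should show the qualitative pattern described above, with hw near 0 and hb near 1. A quantitative comparison with Eqs. 16 and 17 would require the article's population size and run length.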
Fig. 1.
Results of simulations for hw and hb. The lines represent the theoretical results from Eqs. 16 and 17.
The simulations also demonstrate that a population reaches its equilibrium state very quickly after advantageous mutations are introduced (data not shown). The time from the appearance of an advantageous mutation to equilibrium is similar to the fixation time of an advantageous allele with fitness 1 + s in a single-locus system [≈ −2 ln(1/(2N))/s; see ref. 17].
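To give a sense of scale, the approximation −2 ln(1/(2N))/s can be evaluated directly. The sketch below uses N = 1,000, as in the simulations, together with an arbitrary illustrative value s = 0.01 that is not taken from the article. The result is in generations.

```python
import math


def approx_time_to_equilibrium(n_diploid, s):
    """Approximate fixation time, -2 ln(1/(2N)) / s, in generations."""
    return -2.0 * math.log(1.0 / (2.0 * n_diploid)) / s


# s = 0.01 is an illustrative choice (Ns = 10), not a value from the article.
t_generations = approx_time_to_equilibrium(1000, 0.01)
print(round(t_generations))  # roughly 1,520 generations
```

For these values the time is on the order of 1.5N generations, and it shortens in proportion to 1/s as selection becomes stronger.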
The theoretical results (Eqs. 13–17) might hold in a diploid population with size N, even though there is a problem in the recursion equations (Eqs. 1a–d). The recursions can be used in a diploid population only when the fitness effects of haplotypes are additive. For example, the fitness of a diploid with A-A and B-B is 1 – 2s, the fitness of a diploid with A-B and B-A is 1, and so on. This additive assumption is not consistent with the model considered here, in which diploids with genotype AABB should be most advantageous. That is, the fitness of a diploid with A-A and B-B is 1, not 1 – 2s. Nevertheless, Eqs. 13–17 might hold in a diploid population for the following reason. Because strong selection is assumed, most individuals are homozygotes of one of the two advantageous haplotypes, say A-B, and others could be heterozygotes of A-B and one of the two deleterious haplotypes (A-A or B-B), as indicated by the result that hw ≈ 0 and hb ≈ 1. This means that the effect of the fitness of the heterozygotes of A-A and B-B may be negligible because they appear only with frequency 2x1x4, which is extremely small when x1 and x4 are much smaller than 1. In other words, whatever the fitness of a heterozygote of A-A and B-B, x1 and x4 cannot increase in a diploid population with the assumption of strong selection, because deleterious individuals with genotypes AAAA and BBBB appear
with frequency
.
Nucleotide Polymorphism in RHCE and RHD Genes
As an application of the theoretical results, the human RH genes, RHCE and RHD, are considered. Twenty-two complete coding sequences (five RHCE and 17 RHD) were obtained from GenBank. These sequences are aligned together, and the summary of the amounts of nucleotide variation is shown in Table 1. Forty-one amino acid replacement polymorphic sites are detected in a total of 50 segregating sites. As expected from other reports (e.g., refs. 31 and 39), the number of replacement polymorphic sites is very large. The average numbers of pairwise differences per site (πw) in the RHCE and RHD genes are 0.00526 and 0.00719, much higher than the genome average (0.0007–0.001; refs. 2, 40, and 41). These observations might be a signature of selection favoring amino acid changes (e.g., ref. 42).
Table 1. Summary of the amounts of nucleotide variation in human RH genes.

| | L | SW: Rep | SW: Syn | πw (×100): Rep | πw (×100): Syn | πw (×100): Total |
|---|---|---|---|---|---|---|
| Within RHCE (n = 5) | | | | | | |
| Exons 1-5 | 801 | 10 | 2 | 0.897 | 0.604 | 0.824 |
| Exons 6-10 | 453 | 0 | 0 | 0 | 0 | 0 |
| Total | 1254 | 10 | 2 | 0.576 | 0.380 | 0.526 |
| Within RHD (n = 17) | | | | | | |
| Exons 1-5 | 801 | 25 | 5 | 1.183 | 0.506 | 1.016 |
| Exons 6-10 | 453 | 3 | 2 | 0.106 | 0.445 | 0.195 |
| Total | 1254 | 28 | 7 | 0.799 | 0.483 | 0.719 |

| | L | SS: Rep | SS: Syn | SF: Rep | SF: Syn | πb (×100): Rep | πb (×100): Syn | πb (×100): Total |
|---|---|---|---|---|---|---|---|---|
| Between RHCE and RHD | | | | | | | | |
| Exons 1-5 | 801 | 9 | 2 | 0 | 0 | 2.383 | 0.742 | 1.977 |
| Exons 6-10 | 453 | 0 | 0 | 13 | 2 | 4.724 | 2.592 | 4.168 |
| Total | 1254 | 9 | 2 | 13 | 2 | 3.219 | 1.432 | 2.769 |

SW, number of polymorphic sites within the gene; SS, number of shared polymorphic sites; SF, number of fixed polymorphic sites; n, sample size; L, sequence length.
Following ref. 43, the 50 segregating sites are classified into three groups: “specific polymorphic sites,” where polymorphism is observed in either of the two genes; “shared polymorphic sites,” where two nucleotides are segregating in both genes; and “fixed polymorphic sites,” where each gene has a different fixed nucleotide. The observed numbers of these three types of polymorphic sites are 24, 11, and 15, respectively. The existence of a relatively large number of shared polymorphic sites indicates that DNA variation in the two RH genes has been frequently homogenized, probably by gene conversion (43, 44). As illustrated in Fig. 2, the distributions of shared and fixed polymorphic sites are not uniform. All 11 shared polymorphic sites are in the first half of the coding region (exons 1–5), whereas all 15 fixed sites are in the remaining region (exons 6–10). This striking difference in the numbers of the two classes of polymorphic sites is highly significant (P < 10⁻⁶; Fisher's exact test), even though the test is conservative due to the nonindependence of the two regions (43). This indicates that the mechanisms maintaining DNA variation in the two regions are very different. In the following analysis, therefore, the two regions are considered separately.
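The reported significance level can be checked with a small two-sided Fisher's exact test on the 2 × 2 table of site class (shared or fixed) by region (exons 1–5 or exons 6–10). The sketch below is a generic stdlib implementation written for illustration, with the table layout chosen here. The counts 11 and 15 are from the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    low = max(0, col1 - row2)
    high = min(row1, col1)
    return sum(prob(k) for k in range(low, high + 1)
               if prob(k) <= p_obs * (1.0 + 1e-9))


# Rows: shared sites, fixed sites. Columns: exons 1-5, exons 6-10.
p_value = fisher_exact_two_sided(11, 0, 0, 15)
print(p_value)
```

Only the observed table is as extreme as itself under these margins, so the P value equals 1/C(26, 11), which is about 1.3 × 10⁻⁷ and is consistent with the reported P < 10⁻⁶.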
Fig. 2.
Distributions of the number of shared and fixed polymorphic sites. A window analysis is conducted, in which a 100-bp window is moved at 20-bp increments. The gene structures of the human RHCE and RHD genes (Upper) are according to refs. 30 and 49.
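The window analysis in Fig. 2 can be sketched as a simple sliding count. The window width (100 bp) and step (20 bp) follow the legend. The half-open window convention, the function name, and the site positions in the usage example are hypothetical, because the actual site coordinates are not listed in the text.

```python
def window_counts(site_positions, seq_length, window=100, step=20):
    """Count sites falling in each window [start, start + window).

    Returns a list of (window_start, count) pairs, with positions and
    window starts given as 0-based coordinates.
    """
    counts = []
    for start in range(0, seq_length - window + 1, step):
        end = start + window
        n_sites = sum(1 for pos in site_positions if start <= pos < end)
        counts.append((start, n_sites))
    return counts


# Hypothetical positions in a 200-bp sequence, for illustration only.
example = window_counts([10, 50, 150], 200)
print(example)
```

Applying the same function separately to the shared-site and fixed-site positions along the 1,254-bp coding sequence would give two profiles of the kind plotted in Fig. 2.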
First, consider whether a neutral model can explain the observation. Assuming no selection, the mutation and gene conversion parameters can be estimated by a method of ref. 27. This method uses πw, πb and linkage disequilibrium to estimate θ, C, and R. Because the sequences obtained from GenBank are from independent chromosomes, linkage disequilibrium cannot be calculated. Therefore, θ and C are estimated from πw and πb, assuming free recombination (R = ∞). This assumption is not unreasonable, because the distance between the two genes is ≈80 kb, so that R may not be small when the recombination rate is on the same order as the mutation rate (45, 46). The effect of recombination on πw and πb is very small unless R is low (27). Given πw = 0.00920 and πb = 0.01977 in exons 1–5, θ and C are estimated to be 0.0047 and 0.423, respectively. The estimate of the gene conversion rate is ≈90 times larger than the mutation rate. This ratio is in the range of that in the Amy loci in Drosophila melanogaster, where C is estimated to be 60–165 times larger than θ (27). On the other hand, in exons 6–10, C is estimated to be 0.011, which is about 1/40 of the estimate in exons 1–5. Thus, a neutral model with very different levels of gene conversion might explain the data.
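The ratios quoted in this paragraph follow from the reported estimates by simple arithmetic, as the short check below shows. It only recomputes numbers given in the text and Table 1 and does not implement the estimation method of ref. 27. The variable names are illustrative.

```python
# Total pi_w in exons 1-5 from Table 1 (values there are multiplied by 100).
pi_w_rhce = 0.824 / 100
pi_w_rhd = 1.016 / 100
pi_w_mean = (pi_w_rhce + pi_w_rhd) / 2       # average over the two genes

# Estimates reported in the text under the neutral model.
theta_hat = 0.0047
c_hat_exons_1_5 = 0.423
c_hat_exons_6_10 = 0.011

conversion_to_mutation = c_hat_exons_1_5 / theta_hat    # about 90
fold_reduction = c_hat_exons_1_5 / c_hat_exons_6_10     # about 40

print(pi_w_mean, conversion_to_mutation, fold_reduction)
```

The mean of the two within-gene values reproduces πw = 0.00920, the ratio C/θ is 90, and the estimate for exons 6–10 is roughly 1/40 of that for exons 1–5.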
However, it is important to notice that the two genes were identical when gene duplication occurred. That is, it is not unreasonable to consider that gene conversion used to occur in exons 6–10 as frequently as in exons 1–5 in the initial stages of the duplicated genes. It is suggested that some kind of mechanism worked to dramatically reduce the level of gene conversion in exons 6–10. A relatively high level of divergence between the two genes in exons 6–10 (4%) might contribute to the reduction in the gene conversion rate. A drastic change in the DNA sequence caused by an indel could be a barrier to restrict gene conversion (e.g., ref. 18), although no such big indels are found in the flanking region of exon 7 (30).
Another mechanism is selection, which could reduce the level of gene conversion effectively (i.e., selection does not favor gene conversion to maintain the variation between the two genes, as demonstrated in the previous section). The very high level of amino acid differences between the two genes might support the selection hypothesis. As shown in Table 1, there are 15 fixed sites in this region, of which 13 are amino acid replacement changes. The ratio of the rate of nonsynonymous substitution (Ka) to the rate of synonymous substitution (Ks) is ω = Ka/Ks = 3.25. Unfortunately, ω is not significantly larger than 1, but because this test is extremely conservative, ω > 1 is sometimes considered as evidence for positive selection. The spatial distribution of the 15 fixed sites might also support the selection hypothesis. All 15 sites are in a relatively short region around exon 7. The length of this cluster is 121 bp, significantly shorter than expected (P < 10⁻⁵; permutation test). It should be noted that these amino acid changes characterize the difference between the RHCE and RHD antigens. Exon 7 encodes amino acids in the transmembrane and cytoplasmic domains and those on the exofacial surface. It is suggested that the RHCE/RHD antigen variation might be maintained in the human population by strong selection.
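The article does not describe how the permutation test was set up, so the following is only one plausible sketch: 15 sites are placed uniformly at random without replacement along the 1,254-bp coding sequence, and the proportion of permutations whose span is no longer than the observed 121 bp is recorded. The null model, the choice of the whole coding sequence as the sampling region, and the function names are all assumptions.

```python
import random


def cluster_span(positions):
    """Length in bp of the shortest segment containing all positions."""
    return max(positions) - min(positions) + 1


def permutation_p_value(n_sites, region_length, observed_span, n_perm, seed=0):
    """Estimate P(span <= observed_span) under uniform random placement.

    Uses the (hits + 1) / (n_perm + 1) estimator, so the result can never
    fall below 1 / (n_perm + 1).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(range(region_length), n_sites)
        if cluster_span(sample) <= observed_span:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# 15 fixed sites, 1,254-bp coding sequence, observed cluster of 121 bp.
p_estimate = permutation_p_value(15, 1254, 121, n_perm=20000)
print(p_estimate)
```

Because of the estimator's lower bound, 20,000 permutations can only show that the P value is below about 5 × 10⁻⁵. Resolving a value below 10⁻⁵, as reported, would need well over 100,000 permutations.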
If this is the case, it might be possible to apply the two-locus model developed in this article to the target nucleotide site of selection in the RHCE and RHD genes. As discussed above, if strong selection works to maintain two different alleles (nucleotides) in the two-locus (site) system, the theory predicts that hb is close to 1 and hw is very small at the target site of selection. Although we do not know the exact position(s) of the target site(s) of selection, the pattern of polymorphism at all 16 replacement polymorphic sites in exons 6–10 is compatible with theoretical prediction. Here, we attempt to estimate the selection intensity for these sites. At the target site of selection, from Eqs. 16 and 17, the selection intensity is given by
![]() |
[18] |
when θ is very small. Because this equation applies only to a single pair of target sites of selection, whereas only candidate sites are known (see above), some assumptions are needed to estimate the selection intensity. First, we assume that selection is working at all replacement polymorphic sites at equal intensity. The average hw values for the 16 sites are 0 in RHCE and 0.022 in RHD (mean 0.011), and the average hb is 0.989, so that the observed amounts of variation within and between the two genes can be explained when Ns is ≈45 times larger than C. If we use the estimate of C from exons 1–5 (C = 0.423), Ns is ≈20.
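The numbers in this paragraph can be checked with a back-of-envelope calculation. The sketch assumes a simple deterministic balance in which each deleterious haplotype is held near frequency c/s, which gives hw ≈ C/(2Ns) and hb ≈ 1 − C/(2Ns) when θ is negligible. This heuristic is an assumption introduced here for illustration and is not a transcription of Eq. 18, although it reproduces the reported ratio.

```python
# Observed averages for the 16 replacement sites in exons 6-10 (from the text).
hw_rhce, hw_rhd = 0.0, 0.022
hw_mean = (hw_rhce + hw_rhd) / 2          # 0.011
hb_mean = 0.989

# Heuristic balance (assumed): hw ~ C / (2 Ns) and 1 - hb ~ C / (2 Ns).
ratio_from_hw = 1.0 / (2.0 * hw_mean)             # Ns / C implied by hw
ratio_from_hb = 1.0 / (2.0 * (1.0 - hb_mean))     # Ns / C implied by hb

# Using the reported ratio of about 45 and C estimated from exons 1-5.
c_hat_exons_1_5 = 0.423
ns_estimate = 45 * c_hat_exons_1_5

print(ratio_from_hw, ratio_from_hb, ns_estimate)
```

Both routes give Ns/C close to 45, and multiplying by C = 0.423 gives Ns of about 19, in line with the rounded value of ≈20 quoted above.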
The selection intensity might be underestimated because of the assumption that selection is acting at all 16 replacement polymorphic sites at equal intensity. Because the peak of the fixed polymorphic sites is in the middle of exon 7 (Fig. 2), it might be reasonable to consider that the target site(s) is in exon 7. Because all 13 replacement variations in exon 7 are fixed sites, hw = 0 and hb = 1, and an estimate of Ns is infinity. A bigger sample size is needed to obtain a correct estimate of Ns. Using C = 0.423 for the gene conversion parameter in exons 6–10 might overestimate Ns if the gene conversion rate in exons 6–10 is lower than that in exons 1–5, as discussed above.
Discussion
Model and Theory. A two-locus gene conversion model with selection is developed. Under the joint action of selection, mutation, gene conversion, recombination, and random genetic drift, the pattern of allelic polymorphism is investigated by a diffusion method. Approximate formulas for the expectations of the moments of allele frequencies and the expected amounts of variation within and between two loci are obtained by assuming relatively strong selection. These expectations are given by functions of Ns, θ, and C, indicating they are nearly independent of the recombination rate. The approximate formulas for the expectations of p², p³, p⁴, hw, and hb are in excellent agreement with the results of simulations when Ns is more than five times larger than C. The theoretical results demonstrate that hw and hb approach 0 and 1, respectively, as selection intensity increases. This indicates that selection works to keep one of the advantageous haplotypes (A-B or B-A) at a very high frequency instead of maintaining both at intermediate frequencies. This behavior may be similar to that under Kimura's model of compensatory evolution (see refs. 47 and 48).
Under this model, selection and gene conversion act in opposite directions; that is, gene conversion produces deleterious haplotypes, and selection works to eliminate them. Therefore, the pattern of allelic variation is determined mainly by the balance between gene conversion and selection, as shown by Eq. 18. This equation indicates that very strong selection is needed to keep two different alleles in a population when gene conversion is active. In a population in which two alleles are nearly fixed (e.g., hw < 0.01 and hb > 0.99), Ns/C may be >50, indicating that successful evolution of new gene function might not occur without strong selection unless C is very small. This result is compatible with other theoretical studies, which indicate that one of the duplicated genes is silenced relatively quickly after duplication (e.g., refs. 12–16). Recently, Lynch and Conery (1) demonstrated that the majority of duplicated genes become pseudogenes within a few million years, based on a survey of the genomic databases of several model species.
The model does not include null mutations by which genes are silenced. Although this is one of the important fates of duplicated genes, it might be possible to ignore such mutations in this model with strong selection, because selection might eliminate them immediately from the population.
Evolution of the Human RHCE and RHD Genes. DNA polymorphism in human RHCE and RHD genes is analyzed. Because the pattern of DNA polymorphism in exons 1–5 is completely different from that in exons 6–10, the two regions are analyzed separately. It is shown that ≈35% of segregating sites in exons 1–5 are shared polymorphic sites, indicating frequent gene conversion in this region. On the other hand, there is no shared polymorphism in the remaining coding region (exons 6–10). Instead, a large proportion of segregating sites (15 of 20) are fixed sites, and most of them (13 of 15) are amino acid replacement changes. Because all fixed sites are around exon 7, it is suggested that some kind of mechanism might be working in this region to accelerate the sequence divergence between the two genes. Although a reduction in the gene conversion rate due to a drastic change of the DNA sequences caused by indels might be one explanation, no such big indels are found in the flanking region of exon 7. An alternative and more likely explanation is that selection is acting to maintain the high level of amino acid differences between the two genes, and many aspects of the observed pattern of DNA variation could support this hypothesis. Assuming this is the case, the two-locus selection model developed in this article is applied to the data to estimate selection intensity. It is suggested that very strong selection (Ns is at least 45 times larger than C) is needed to explain the observed pattern of polymorphism.
Under the selection hypothesis, the evolutionary history of the human RHCE and RHD genes is inferred. The history of the duplicated genes started with two identical sequences created by tandem duplication. In the initial stages, gene conversion occurred quite frequently. The gene conversion parameter, C, might have been 0.4 or so, as estimated from the present-day polymorphism data in exons 1–5. This level of gene conversion is high enough to keep the two genes nearly identical. Then an advantageous mutation was introduced (probably in exon 7) and fixed in one gene. Selection might have been so strong that this fixation state was nearly stable and continued for quite a long time. During this state, additional mutations were accumulated near the target site of selection, creating the high level of sequence divergence between the two genes around exon 7. In exons 1–5, which are at least 3–4 kb upstream of exon 7, the sequence identity between the two genes has been maintained high by frequent gene conversion. The present human RHCE and RHD genes might be in their initial stage of further functional divergence, starting in the short region around exon 7. This region of high divergence might spread if the divergence itself reduces the gene conversion rate.
Although the application of the two-locus selection model to the human RHCE and RHD genes seems successful, there are a few caveats. The first concerns the well known RHD-negative chromosomes on which the RHD gene is either absent or silenced. Although the frequency of chromosomes with no RHD gene might be relatively low, such chromosomes may have some effect on the model. There might also be other minor duplication and deletion polymorphisms in this region. The second is the possibility that selection is working on exons 1–5. It is known that the RHCE gene encodes four types of antigens, CE, Ce, cE, and ce, which are characterized by the two amino acid positions 103 (exon 2) and 226 (exon 5) (reviewed in ref. 31). These antigen polymorphisms within RHCE might be under selection. The unusually high proportion of nonsynonymous polymorphic sites (80%) in exons 1–5 might also be a signature of selection. Such selection might cause θ and C to be overestimated. The last is the theoretical problem in the treatment of selection in a diploid population. This might not greatly affect the results under the assumption of strong selection, as discussed above, but when selection is weak, the problem should be considered seriously.
Acknowledgments
I thank D. Hewett-Emmett, M. Nordborg, T. Ohta, N. Rosenberg, F. Tajima, K. Teshima, B. Walsh, and two anonymous reviewers for comments and discussions.
Abbreviation: RH, rhesus.
References
- 1. Lynch, M. & Conery, J. S. (2000) Science 290, 1151–1155.
- 2. The International Human Genome Sequencing Consortium (2001) Nature 409, 860–921.
- 3. Bailey, J. A., Gu, Z., Clark, R. A., Reinert, K., Samonte, R. V., Schwartz, S., Adams, M. D., Myers, E. W., Li, P. W. & Eichler, E. E. (2002) Science 297, 1003–1007.
- 4. Samonte, R. V. & Eichler, E. E. (2002) Nat. Rev. Genet. 3, 65–72.
- 5. Haldane, J. B. (1932) The Causes of Evolution (Longmans Green, London).
- 6. Muller, H. J. (1936) Science 83, 528–530.
- 7. Ohno, S. (1970) Evolution by Gene Duplication (Springer, New York).
- 8. Goodman, M., Moore, G. W. & Matsuda, G. (1975) Nature 253, 603–608.
- 9. Li, W.-H. & Gojobori, T. (1983) Mol. Biol. Evol. 1, 94–108.
- 10. Hughes, A. L. (1999) Adaptive Evolution of Genes and Genomes (Oxford Univ. Press, New York).
- 11. Zhang, J., Zhang, Y.-P. & Rosenberg, H. F. (2002) Nat. Genet. 30, 411–415.
- 12. Li, W.-H. (1980) Genetics 95, 237–258.
- 13. Takahata, N. (1982) in Molecular Evolution, Protein Polymorphism and the Neutral Theory, ed. Kimura, M. (Springer, Berlin), pp. 169–190.
- 14. Watterson, G. A. (1983) Genetics 105, 745–766.
- 15. Walsh, J. B. (1995) Genetics 139, 421–428.
- 16. Walsh, J. B. (2003) Genetica 118, 279–294.
- 17. Walsh, J. B. (1987) Genetics 117, 543–557.
- 18. Slightom, J. L., Blechl, A. E. & Smithies, O. (1980) Cell 21, 627–638.
- 19. Hewett-Emmett, D., Venta, P. J. & Tashian, R. E. (1982) in Macromolecular Sequences in Systematic and Evolutionary Biology (Plenum, New York), pp. 357–405.
- 20. Ohta, T. (1981) Genet. Res. 37, 133–149.
- 21. Ohta, T. (1982) Proc. Natl. Acad. Sci. USA 79, 3251–3254.
- 22. Ohta, T. (1983) Theor. Popul. Biol. 23, 216–240.
- 23. Nagylaki, T. & Petes, T. D. (1982) Genetics 100, 315–337.
- 24. Arnheim, N. (1983) in Evolution of Genes and Proteins (Sinauer, Sunderland, MA), pp. 38–61.
- 25. Li, W.-H. (1997) Molecular Evolution (Sinauer, Sunderland, MA).
- 26. Nei, M., Gu, X. & Sitnikova, T. (1997) Proc. Natl. Acad. Sci. USA 94, 7799–7806.
- 27. Innan, H. (2002) Genetics 161, 865–872.
- 28. Nagylaki, T. (1984) Genetics 106, 529–548.
- 29. Chérif-Zahar, B., Mattéi, M. G., Le Van Kim, C., Bailly, P., Cartron, J.-P. & Colin, Y. (1991) Hum. Genet. 86, 398–400.
- 30. Okuda, H., Suganuma, H., Kamesaki, T., Kumada, M., Tsudo, N., Omi, T., Iwamoto, S. & Kajii, E. (2000) Biochem. Biophys. Res. Commun. 274, 670–683.
- 31. Avent, N. D. & Reid, M. E. (2000) Blood 95, 375–387.
- 32. Matassi, G., Chérif-Zahar, B., Pesole, G., Raynal, V. & Cartron, J.-P. (1999) J. Mol. Evol. 48, 151–159.
- 33. Ohta, T. (1991) Proc. Natl. Acad. Sci. USA 88, 6716–6720.
- 34. Kimura, M. (1964) J. Appl. Prob. 1, 117–232.
- 35. Ohta, T. & Kimura, M. (1969) Genet. Res. 13, 47–55.
- 36. Ohta, T. & Kimura, M. (1969) Genetics 63, 229–238.
- 37. Kimura, M. & Takahata, N. (1983) Proc. Natl. Acad. Sci. USA 80, 1048–1052.
- 38. Kurtz, T. G. (1970) J. Appl. Prob. 7, 49–58.
- 39. Mouro, I., Colin, Y., Chérif-Zahar, B., Cartron, J.-P. & Le Van Kim, C. (1993) Nat. Genet. 5, 62–65.
- 40. Li, W.-H. & Sadler, L. A. (1991) Genetics 129, 513–523.
- 41. Patil, N., Berno, A. J., Hinds, D. A., Barrett, W. A., Doshi, J. M., Hacker, C. R., Kautzer, C. R., Lee, D. H., Marjoribanks, C., McDonough, D. P., et al. (2001) Science 294, 1719–1723.
- 42. Hughes, A. L. & Nei, M. (1988) Nature 335, 167–170.
- 43. Innan, H. (2003) Genetics 163, 803–810.
- 44. Kitano, T. & Saitou, N. (1999) J. Mol. Evol. 49, 615–626.
- 45. Pritchard, J. K. & Przeworski, M. (2001) Am. J. Hum. Genet. 69, 1–14.
- 46. Innan, H., Padhukasahasram, B. & Nordborg, M. (2003) Genome Res. 13, 1158–1168.
- 47. Kimura, M. (1985) J. Genet. 64, 7–19.
- 48. Innan, H. & Stephan, W. (2001) Genetics 159, 389–399.
- 49. Okuda, H., Suganuma, H., Tsudo, N., Omi, T., Iwamoto, S. & Kajii, E. (1999) Biochem. Biophys. Res. Commun. 263, 378–383.























