Abstract
Population genomic studies have shown that genetic draft and background selection can profoundly affect the genome-wide patterns of molecular variation. We performed forward simulations under realistic gene-structure and selection scenarios to investigate whether such linkage effects impinge on the ability of the McDonald–Kreitman (MK) test to infer the rate of positive selection (α) from polymorphism and divergence data. We find that in the presence of slightly deleterious mutations, MK estimates of α severely underestimate the true rate of adaptation even if all polymorphisms with population frequencies under 50% are excluded. Furthermore, already under intermediate rates of adaptation, genetic draft substantially distorts the site frequency spectra at neutral and functional sites from the expectations under mutation–selection–drift balance. MK-type approaches that first infer demography from synonymous sites and then use the inferred demography to correct the estimation of α obtain almost the correct α in our simulations. However, these approaches typically infer a severe past population expansion although there was no such expansion in the simulations, casting doubt on the accuracy of methods that infer demography from synonymous polymorphism data. We propose a simple asymptotic extension of the MK test that yields accurate estimates of α in our simulations and should provide a fruitful direction for future studies.
The relative importance of natural selection and random genetic drift in shaping molecular evolution is a matter of a longstanding dispute. Whereas the neo-Darwinian synthesis placed natural selection as the dominant force (1), from the late 1960s on it became popular to assume that the bulk of molecular variation is selectively neutral or at most weakly selected (2). The “neutral theory” of molecular evolution enabled development of analytical approaches, based on the diffusion approximation, for calculating the expected frequency spectra and fixation probabilities of polymorphisms of varying selective effect. Most of the currently available approaches for estimating selection and demography from population genetic data rest upon these results.
Recent studies have strongly challenged key assumption of the neutral theory. First, in many species the rate of adaptation appears to be very high with, for example, in Drosophila melanogaster more than 50% of the amino acid changing substitutions, and similarly large proportions of noncoding substitutions, driven to fixation by positive selection (3). Importantly, it appears that frequent adaptation strongly affects the genome-wide patterns of polymorphism (3–6). These results imply that the dynamics of a given polymorphism is not only affected by genetic drift and purifying selection acting at its particular site, but also by the so-called genetic draft (7), which describes the stochastic effects generated by recurrent selective sweeps at closely linked sites. Second, there is accumulating evidence that many polymorphisms in natural populations are slightly deleterious (8–11), and such polymorphisms are expected to generate another kind of interference among linked sites, known as background selection (12, 13). It is becoming increasingly clear that the assumption of independence between sites is violated in most cases in one way or another. What we do not yet fully understand is the extent to which these violations affect population genetic methods.
Here, we focus on the investigation of one of the primary methods to test the neutral theory and to estimate the rate of adaptation at the molecular level, introduced by McDonald and Kreitman in 1991 (14). The McDonald–Kreitman (MK) test contrasts levels of polymorphism and divergence at neutral and functional sites and uses this contrast to estimate the fraction of substitutions at the functional sites that were driven to fixation by positive selection. The MK test has been applied in many organisms with estimates of the rate of adaptation varying from extremely high in Drosophila (3) and Escherichia coli (15), to virtually zero in yeast (16) and humans (8, 17). These differences might reflect true variation in the rate of adaptation in different lineages or indicate that the test is biased to different extent, and possibly in different direction, in those lineages (18).
By using closely interdigitated sites, the MK test is robust to many sources of error, such as variation of mutation rate across the genome and variation in coalescent histories at different genomic locations. It can be confounded, however, by slightly deleterious mutations and demography (18). Much work has thus gone into the development of sophisticated extensions of the MK test that use the frequency distribution of polymorphisms to estimate the demographic history of the organism in question, to assess the distribution of deleterious effects at the functional sites, and to correct for both in estimating the rate of adaptation (8, 16, 19–26). However, all of these extensions are still based on the assumption that evolutionary dynamics at different sites can be modeled independently of each other. In light of the recent findings that genetic draft and background selection might often be important, it is essential to verify that these methods are robust to the linkage effects from advantageous and weakly deleterious polymorphisms and their interactions.
Results
The MK test compares the levels of diversity at neutral sites and potentially functional sites (p) with the respective levels of divergence ( and d) to evaluate whether neutral evolution can be rejected at the functional sites (14). An extension of the test can be used to estimate the fraction (α) of substitutions driven to fixation by positive selection at the functional sites (18, 27) (SI Text):
A known problem of this approach is slightly deleterious mutations. To minimize their impact, it has been proposed to exclude polymorphisms that are below a certain cutoff frequency (24, 28). More sophisticated extensions of the MK test attempt to infer the actual distribution of fitness effects (DFE) of new mutations at functional sites from the site frequency spectrum (SFS) of polymorphisms at those sites, and then correct the estimates of α accordingly (8, 9, 19–22, 25).
To study the effects of linkage and selection on MK-type approaches, we conducted forward population genetic simulations of a 10-Mb–long chromosome with realistic gene structure, evolving under mutation, recombination, and selection (Materials and Methods). The simulated chromosome resembles a moderately gene-rich region of the human genome with ∼4% of its sites assumed to be functional. Note that functional density varies strongly across eukaryotes, from a few percent of constrained sites in humans to upward of 50% in Drosophila, and the effects of linked selection should become more pronounced with higher functional density. Thus, if we find strong linkage effects in our scenario with only 4% functional density, we would then expect even stronger effects in the functionally denser genomes such as those found in flies. In this way, our scenario should be conservative for many eukaryotic species.
Mutations occurring at functional sites had their selection coefficients (s) drawn from a specified DFE, whereas every fourth site in exons represented a neutral, synonymous site. We assumed a mutation rate of per site and generation, a recombination rate of , and a panmictic diploid population of size (29, 30). These parameters are compatible with standard estimates for human evolution, such as heterozygosity at synonymous sites: . Note, however, that rather than the absolute values of μ, r, N, and s, primarily the products Nμ (specifying the overall rate at which new mutations arise in the population), Ns (specifying the effective strength of selection), and the ratio s/r (determining the region over which a selective sweep affects the genome) should matter in our analysis. We further required that the ratio of the substitution rate at functional sites versus synonymous sites be , the value found in humans and similar to that of many other species. This condition sets bounds on the amount of purifying selection at functional sites. In our simulations, we estimated divergence from the mutations that fixed during a simulation run. Polymorphism levels and frequency distributions were estimated from population samples of 100 randomly drawn chromosomes, taken every N generations throughout a run.
The key observables in MK-type approaches are the levels of polymorphism and divergence at neutral and functional sites. Some approaches additionally take the SFS of polymorphism into account. In the following sections we study the effects of linkage and selection on these quantities individually, and the resulting effects on MK estimates.
Linkage Effects on Levels of Neutral Polymorphism.
It is well known that genetic draft and background selection can reduce the levels of polymorphism at linked neutral sites (13, 31). Analytical approximations have been derived for calculating the expected reduction due to background selection caused by strongly deleterious mutations (32, 33), as well as genetic draft (5, 34) (SI Text). To assess the accuracy of these results, we compared the level of heterozygosity at synonymous sites in our simulation with the analytically predicted values. Functional mutations were of four types in our simulations: neutral, beneficial, deleterious, and strongly deleterious. Each type had a specific selection coefficient: , , , and , respectively. We assumed that 40% of functional mutations are always strongly deleterious (20, 22) and we set . We chose , , and α as our free parameters, which allowed us to assess how different strengths of purifying selection (by varying the value of ), positive selection (by varying ), and rate of adaptation (by varying α) affect the results. Values of α in our simulations ranged from 0 to 0.5, from 0.001 to 0.05, and from −1 to −100 (Table S1).
Fig. 1A shows that inferred and predicted levels of neutral heterozygosity are generally in good agreement. The amount by which linkage effects reduce is primarily determined by the product of rate and strength of adaptation (Fig. 1A, Inset). The contribution of background selection is typically less severe and appears most pronounced for the very weakly deleterious selection coefficients, as indicated by the observation that for the same value of , the simulation runs with the weaker deleterious selection coefficients (, darker points in the inset) yield stronger reduction.
Linkage Effects on the SFS at Functional and Synonymous Sites.
Some heuristic extensions of the MK test simply eliminate low-frequency variants. Other, more sophisticated, extensions try to infer the actual DFE at functional sites from the SFS, based on the assumption of the mutation–selection–drift balance (SI Text). It is well known that genetic draft and background selection can distort the SFS from this expectation (6, 10, 34–39). What is not clear is whether the deviations are substantial under realistic evolutionary scenarios and whether this might affect methods based on the assumption of mutation–selection–drift balance. We measured the SFS at functional and synonymous sites in our simulations and compared it with the prediction under mutation–selection–drift balance given the DFE of the particular simulation run. In an attempt to account for the reduction in overall levels of diversity and reduced effectiveness of selection due to genetic draft and background selection, we replaced N in the formulas for mutation–selection–drift balance by an effective population size , inferred from the level of heterozygosity at synonymous sites in each particular simulation run.
Fig. 1B (Left) shows the observed and expected SFS at functional and synonymous sites in our simulations for a scenario with no adaptation but high levels of background selection . Expected and observed spectra are in good agreement, suggesting that for the chosen recombination rate and functional density the effects of background selection alone are well approximated by mutation–selection–drift balance with being adjusted to the value obtained from the level of neutral heterozygosity.
However, already under moderately frequent adaptation substantial deviations emerge between observed and expected spectra (Fig. 1B, Middle and Right). These distortions do not fit any model of mutation–selection–drift balance with a constant effective population size. Methods based on mutation–selection–drift balance might therefore run into severe biases in the presence of even moderate levels of adaptation.
Linkage Effects on Fixation Probabilities of Deleterious Mutations.
Levels of divergence at functional and neutral sites are the other key parameters in MK-type approaches. Linked selection cannot affect the rate of neutral divergence, as it is always equal to the rate of mutation at neutral sites. The rate of divergence at functional sites, however, could be affected substantially. In the Wright–Fisher model under free recombination, a new mutation with selection coefficient s eventually fixes with probability:
Genetic draft and background selection are expected to increase the fixation probabilities of deleterious mutations: Under recurrent selective sweeps, deleterious mutations can hitchhike to frequencies they are unlikely to reach under mutation–selection–drift balance alone, increasing their chance of fixation over that expected without linkage (11, 40). Similarly, background selection renders purifying selection less effective by reducing the number of successfully reproducing individuals, thereby also increasing the fixation probabilities of deleterious mutations (13, 40, 41).
One common approach for addressing these issues is to assume that [2] can still be used but that N has to be replaced by a lower, effective population size . It is not clear though whether a single scalar applies over a range of selection coefficients. We tested this in our simulations by measuring the fixation probabilities of deleterious mutations with different selection coefficients and then inferring the corresponding values of according to [2] for the different selection coefficients in the same run independently. Every run had a particular rate (α) and strength of adaptation; deleterious functional mutations had selection coefficients , , , and , with all four classes being of equal proportion. The fraction of neutral mutations at functional sites was again tuned to .
Fig. 1C shows the inferred values of according to [2] as a function of . Our results confirm that genetic draft and background selection generally increase fixation probabilities of deleterious mutations, as indicated by the fact that the inferred is always smaller than the actual . However, the fixation probabilities of mutations of different selection coefficients correspond to very different values of . For example, in the simulation run with and , the mutations with fix with a probability that corresponds to , whereas the mutations with yield . For stronger sweeps and higher α the discrepancies become even more pronounced. In none of the investigated scenarios we did we find a scalar that works for all four deleterious selection coefficients. Note that because N enters [2] exponentially, small differences in N can yield substantial differences in the actual fixation probabilities.
These results indicate that there is no scalar transformation of that would allow us to estimate fixation probabilities across multiple fitness classes. Thus, even if we were to know the true DFE at functional sites, it would still be impossible to use mutation–selection–drift methods to predict the rate of fixation of deleterious mutations under scenarios that include even moderate amounts of genetic draft.
MK Estimates of the Rate of Adaptation.
In the previous sections we have shown that linked selection can affect the key quantities in the MK test in complex ways that do not fit the predictions under mutation–selection–drift balance. However, some of the errors partially compensate for each other in the context of the MK test. For example, genetic draft might cause deleterious mutations to appear virtually neutral in the polymorphism data (they could be present at unexpectedly high frequencies) but would also elevate their probabilities of fixation close to that of neutral mutations. It is thus possible that the effects we described above might generally not affect MK estimates of α strongly. Our simulations allow us to explicitly test the accuracy of MK estimates of α inferred from [1]. Fig. 2 shows the comparison of true values and MK estimates for all simulation runs from Table S1. To minimize the bias generated by slightly deleterious polymorphisms, we considered only polymorphisms with a derived allele frequency of (Fig. 2, Left) or (Fig. 2, Right) in the samples. Our results demonstrate that MK estimates of α under both cutoffs still tend to underestimate α, often substantially. For example, when the true α equals 0.4, the MK estimate using a cutoff yields a negative value of −0.2 for a scenario where and . Increasing the cutoff from to reduces this discrepancy, but substantial errors remain. In the above scenario with the MK estimate still yields only .
The underestimation of α is generally more pronounced when deleterious mutations are only weakly deleterious than when they are strongly deleterious. This is consistent with weakly deleterious mutations having a higher chance of contributing to polymorphism than strongly deleterious mutations, but still having low probabilities of fixation. Strongly deleterious mutations contribute to neither polymorphism nor divergence and thus do not bias estimates of α. As strength of positive selection increases, the biases due to weakly deleterious mutations can be mitigated to some extent because now they become effectively neutral and contribute to both polymorphism and divergence.
DFE-Based Extensions of the MK Approach.
Several methods for correcting possible biases in MK estimates have been proposed that attempt to first estimate the DFE at functional sites and then calculate how many nonadaptive mutations are expected to become fixed given the inferred DFE (8, 9, 19–22, 25). Any excess of substitutions should be attributable to adaptation. Some approaches additionally aim to correct for possible effects of demography, which is first inferred from the SFS at synonymous sites and then used for correcting the SFS at functional sites (21, 22, 42).
One particularly popular such method is DFE-alpha by Eyre-Walker and Keightley (25). We investigated the performance of this method as a representative of the class of methods based on the same paradigm. DFE-alpha models the DFE at functional sites by a gamma distribution, specified by the mean strength of selection , and a shape parameter β, allowing the distribution to take on a variety of shapes ranging from leptokurtic to platykurtic. DFE-alpha incorporates two simple demographic models: (i) constant population size and (ii) a single, instantaneous change in population size from an ancestral size to a present-day size having occurred t generations ago. Provided the SFS at both neutral and functional sites and the respective levels of divergence, DFE-alpha infers , and α at functional sites.
We applied DFE-alpha to polymorphism and divergence data from our simulations (SI Materials and Methods). For this analysis, we modified our simulations such that the selection coefficients of the nonadaptive mutations at functional sites were drawn from a gamma distribution and thus the same distribution was used in the simulations as was assumed by DFE-alpha. We chose a shape parameter of , resembling empirical estimates from polymorphism data at nonsynonymous sites in humans (9, 21, 22). We varied α from 0 to 0.5 and investigated two scenarios with or . The mean of the DFE was tuned for each scenario such that . Throughout our simulations population size was always kept constant at individuals.
Table 1 shows the performance of DFE-alpha under its two demographic models. When using the correct model of constant population size, DFE-alpha systematically overestimates α and underestimates the strength of selection against deleterious mutations. The shape parameter β of the gamma distribution is overestimated by up to twofold. These biases are generally more pronounced for the scenarios with stronger sweeps than for those with weaker sweeps. Under the model with a population size change, the estimates of α and β become more accurate but the mean strength of selection against deleterious mutations is now overestimated. Strikingly, under this model DFE-alpha always infers a population size expansion although there was no such expansion in our simulation.
Table 1.
Simulation parameters |
DFE-alpha (constant) |
DFE-alpha (step-change) |
|||||||||
sb | α | γ | β | α | γ | β | α | γ | β | ξ | t |
— | 0.00 | 448 | 0.2 | 0.12 | 297 | 0.26 | 0.00 | 703 | 0.21 | 5.0 | 6.2 |
0.001 | 0.05 | 434 | 0.2 | 0.20 | 264 | 0.27 | 0.07 | 676 | 0.21 | 5.0 | 5.4 |
0.001 | 0.09 | 437 | 0.2 | 0.22 | 288 | 0.26 | 0.09 | 914 | 0.20 | 8.8 | 5.2 |
0.001 | 0.18 | 441 | 0.2 | 0.27 | 265 | 0.27 | 0.15 | 754 | 0.21 | 8.8 | 5.4 |
0.001 | 0.28 | 422 | 0.2 | 0.40 | 276 | 0.27 | 0.29 | 1,055 | 0.20 | 10.0 | 4.6 |
0.001 | 0.37 | 836 | 0.2 | 0.50 | 354 | 0.28 | 0.41 | 1,250 | 0.21 | 10.0 | 4.7 |
0.001 | 0.49 | 1,638 | 0.2 | 0.57 | 532 | 0.29 | 0.48 | 2438 | 0.21 | 10.0 | 4.2 |
0.01 | 0.06 | 424 | 0.2 | 0.24 | 233 | 0.27 | 0.11 | 635 | 0.21 | 5.0 | 4.9 |
0.01 | 0.09 | 424 | 0.2 | 0.26 | 217 | 0.29 | 0.12 | 675 | 0.22 | 10.0 | 4.7 |
0.01 | 0.18 | 381 | 0.2 | 0.40 | 152 | 0.31 | 0.24 | 654 | 0.21 | 10.0 | 3.5 |
0.01 | 0.27 | 339 | 0.2 | 0.49 | 109 | 0.34 | 0.31 | 618 | 0.21 | 10.0 | 2.8 |
0.01 | 0.36 | 652 | 0.2 | 0.58 | 158 | 0.35 | 0.43 | 1,113 | 0.22 | 10.0 | 2.6 |
0.01 | 0.47 | 1,154 | 0.2 | 0.68 | 182 | 0.38 | 0.53 | 1,802 | 0.22 | 10.0 | 2.1 |
Each row denotes a particular simulation run with the parameters specified in the left four columns. The average strength of purifying selection was calculated from the mean of the DFE used in the simulation and inferred from heterozygosity at synonymous sites. The middle three columns show the estimates from DFE-alpha under the demographic model with constant population size. The last five columns show the estimates under the demographic model with a single population size change. is the inferred ratio between present and ancient population size; t is the estimated time since the population size change in units of generations.
This behavior of DFE-alpha is consistent with the fact that genetic draft leaves signatures in the SFS similar to those observed under a recent population size expansion, namely a skew toward low-frequency polymorphisms. The extent of this effect, however, is alarming, given that even for a scenario where α is only about 0.1, an almost 10-fold population size expansion is already inferred (reflecting a built-in limit of DFE-alpha as currently implemented). Note that even in the scenario with no adaptation, DFE-alpha still infers a fivefold expansion, implying that background selection alone can already bias demographic inference.
Discussion
In this study, we have used forward simulations that explicitly incorporate linkage and selection on a chromosome-wide scale to investigate quantitatively how linked selection can bias the MK test and its extensions to infer the rate of adaptation. Consistently with previous results (24), we found that MK estimates of the rate of adaptation can be severely biased in the presence of slightly deleterious mutations and generally underestimate α. Unfortunately, the standard approaches to address this known problem do not typically resolve it:
i) Excluding low-frequency polymorphisms from the analysis renders MK estimates more accurate but substantial biases remain. The reason for this is that the dynamics of slightly deleterious polymorphisms under recurrent selective sweeps can be very different from the expectation under the diffusion model, which predicts that frequent mutations should have a realistic chance of eventually reaching fixation. However, under recurrent selective sweeps, a slightly deleterious mutation can easily hitchhike to substantial population frequencies, yet become unlinked during the late phase of a sweep. This deleterious mutation can then spend substantial time as a frequent polymorphism in the population while it slowly declines in frequency. At every stage of this process, the frequency of the mutation overestimates its fixation probability. Such mutations are not effectively removed from a population sample by excluding low-frequency polymorphisms.
ii) Some methods aim to address the problem of slightly deleterious mutations by estimating the actual DFE of new mutations at functional sites. We found that these methods misestimate the mean and the shape of the DFE and, as a result, tend to overestimate α. This is not surprising given that such approaches infer the DFE by fitting the observed SFS to that predicted under mutation–selection–drift balance, which can be substantially distorted by linkage effects.
iii) The most sophisticated extensions of the MK test available today additionally attempt to correct for demography. Interestingly, we found that such methods obtain accurate estimates of the rate of adaptation while inferring erroneous demography and also inaccurate estimates of the mean strength of purifying selection. This seeming contradiction reflects the fact that the distortions of the SFS at synonymous sites, which these methods interpret to be due to demography, can in fact be due to genetic draft. As we have shown in Fig. 1B, these distortions are very similar at synonymous and functional sites. Thus, by imposing a demographic scenario that corrects for distortions of the SFS at synonymous sites, the methods effectively also correct the SFS at functional sites.
This observation suggests a simple heuristic extension of the standard MK test that might already provide reasonable estimates without having to invoke demography. To illustrate such an approach, let us define as a function of the frequency of the derived mutations:
Here, and are the levels of polymorphism at functional and synonymous sites, respectively, for the specific derived allele frequency x. Because depends only on the ratio , any biases affecting the SFS at functional and synonymous sites in the same way, regardless whether due to demography or genetic draft, effectively cancel out. Furthermore, we can extrapolate to , where it should converge close to the true α, assuming that adaptive mutations do not significantly contribute to polymorphism and that purifying selection has been sufficiently stable over time. As a proof of principle, we show in Fig. 3A and Fig. S1 that this simple heuristic approach indeed converges asymptotically to the true value of α in our simulations, even in a scenario with a high rate of adaptation , strong sweeps , and slightly deleterious mutations .
To obtain the asymptotic value of in the limit , we fitted an exponential function of the form to the data. This makes intuitive sense for the case where deleterious mutations all have the same selection coefficient, and levels of functional polymorphisms should thus decay approximately exponentially over the respective levels of neutral polymorphisms with increasing frequency. However, it is not clear which functional form should be fitted in scenarios where selection coefficients are drawn from a broader distribution, unless the specific DFE were known.
Fig. 3B shows that the simple exponential fit still works reasonably well even for the scenarios where deleterious selection coefficients were drawn from a gamma distribution. The asymptotic MK estimates obtained this way no longer suffer from a systematic downward bias due to deleterious mutations and are much more accurate than standard MK estimates obtained using a cutoff frequency of , as well as estimates from DFE-alpha without the “demographic correction.” They are comparable in accuracy to estimates from DFE-alpha with the correction. Clearly, future analyses need to verify whether the asymptotic approach also works for a broader class of DFEs and complex demographic scenarios. However, neither is it clear whether DFE-alpha with the demographic correction would always work in such scenarios. One advantage of the asymptotic MK approach is that it provides an easy way to evaluate the goodness of fit of its estimates, as we have shown in Fig. 3 A and C.
We applied asymptotic MK to previously analyzed polymorphism and divergence data from D. melanogaster and humans (SI Materials and Methods). For the human data we obtained an asymptotic MK estimate of (0.09, 0.19) (Fig. 3C), which is consistent with the range of estimated in ref. 22. Note that the standard MK estimate for these data when excluding all polymorphisms with sample frequencies below 10% yields a negative value . For D. melanogaster, we obtained an estimate of (0.54, 0.60). This estimate is similar, although somewhat higher, than previously estimated values obtained from earlier polymorphism data sets in this species (3, 43).
The results presented in this study have important ramifications for the inference of evolutionary parameters from polymorphism and divergence data: The standard MK approach, with or without excluding rare polymorphisms, can produce severely biased estimates under many scenarios and even when adaptation is not pervasive. However, it appears that despite the complexity of the process we do have means of estimating the rate of adaptive evolution by using DFE-alpha like approaches with the demographic correction, or using the simple asymptotic MK approach we suggested above.
Unfortunately, estimation of the DFE, and especially of demography, tends to be severely affected by already moderate amounts of genetic draft and background selection. Estimating demography from neutral sites that are close to functional ones (such as synonymous sites) should in general lead to erroneous inference of population expansions.
Our analysis suggests that in the presence of genetic draft and background selection the evolutionary interactions among linked polymorphisms of different selective effects are complex and consequential. It is clear that the standard diffusion approximation that attempts to model evolution at different sites independently and wrap the complexity of linkage effects among sites into effective parameters, such as , can introduce massive errors into the estimation of key population genetic parameters. We thus believe that new analytics need to be developed that correct for linkage effects. At the very least, one has to verify with forward simulations, such as the one presented here or similar programs (44), that commonly used heuristic and analytic methods in population genetics are robust to linkage effects.
Materials and Methods
Our simulations model the population dynamics of a 10-Mb–long chromosome on which genes are placed equidistantly with a density of 1 gene per 40 kb. Each gene consists of eight exons of length 150 bp each, separated by introns of length 1.5 kb. Genes are flanked by a 550-bp–long 5′ UTR and a 250-bp–long 3′ UTR. We assume that three out of four sites in exons and UTRs are functional sites. Every fourth site in exons and UTRs is used to model synonymous sites. We assume that mutations are codominant and that fitness effects at different sites in the genome are additive. A full description of the simulation is provided in SI Materials and Methods.
Supplementary Material
Acknowledgments
We thank David Lawrie for processing the polymorphism and divergence data used for the asymptotic MK estimation in Fig. 3C. We also thank Daniel Fisher, Peter Keightley, Adam Eyre-Walker, David Enard, Nandita Garud, members of the D.A.P. laboratory, and four anonymous reviewers for helpful discussions and comments on the manuscript. This work was supported by the National Institutes of Health Grants R01GM100366, R01GM097415, and R01GM089926 (to D.A.P.).
Footnotes
The authors declare no conflict of interest.
*This Direct Submission article had a prearranged editor.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1220835110/-/DCSupplemental.
References
- 1.Lewontin RC. 1974. The Genetic Basis of Evolutionary Change (Columbia Univ Press, New York)
- 2.Kimura M. 1983. The Neutral Theory of Molecular Evolution (Cambridge Univ Press, Cambridge, UK)
- 3.Sella G, Petrov DA, Przeworski M, Andolfatto P. Pervasive natural selection in the Drosophila genome? PLoS Genet. 2009;5(6):e1000495. doi: 10.1371/journal.pgen.1000495. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Andolfatto P. Hitchhiking effects of recurrent beneficial amino acid substitutions in the Drosophila melanogaster genome. Genome Res. 2007;17(12):1755–1762. doi: 10.1101/gr.6691007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Macpherson JM, Sella G, Davis JC, Petrov DA. Genomewide spatial correspondence between nonsynonymous divergence and neutral polymorphism reveals extensive adaptation in Drosophila. Genetics. 2007;177(4):2083–2099. doi: 10.1534/genetics.107.080226. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Sattath S, Elyashiv E, Kolodny O, Rinott Y, Sella G. Pervasive adaptive protein evolution apparent in diversity patterns around amino acid substitutions in Drosophila simulans. PLoS Genet. 2011;7(2):e1001302. doi: 10.1371/journal.pgen.1001302. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Gillespie JH. Genetic drift in an infinite population. The pseudohitchhiking model. Genetics. 2000;155(2):909–919. doi: 10.1093/genetics/155.2.909. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Bustamante CD, et al. Natural selection on protein-coding genes in the human genome. Nature. 2005;437(7062):1153–1157. doi: 10.1038/nature04240. [DOI] [PubMed] [Google Scholar]
- 9.Eyre-Walker A, Keightley PD. The distribution of fitness effects of new mutations. Nat Rev Genet. 2007;8(8):610–618. doi: 10.1038/nrg2146. [DOI] [PubMed] [Google Scholar]
- 10.Lohmueller KE, et al. Natural selection affects multiple aspects of genetic variation at putatively neutral sites across the human genome. PLoS Genet. 2011;7(10):e1002326. doi: 10.1371/journal.pgen.1002326. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Chun S, Fay JC. Evidence for hitchhiking of deleterious mutations within the human genome. PLoS Genet. 2011;7(8):e1002240. doi: 10.1371/journal.pgen.1002240. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Charlesworth B, Morgan MT, Charlesworth D. The effect of deleterious mutations on neutral molecular variation. Genetics. 1993;134(4):1289–1303. doi: 10.1093/genetics/134.4.1289. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Charlesworth B. The effects of deleterious mutations on evolution at linked sites. Genetics. 2012;190(1):5–22. doi: 10.1534/genetics.111.134288. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.McDonald JH, Kreitman M. Adaptive protein evolution at the Adh locus in Drosophila. Nature. 1991;351(6328):652–654. doi: 10.1038/351652a0. [DOI] [PubMed] [Google Scholar]
- 15.Charlesworth J, Eyre-Walker A. The rate of adaptive evolution in enteric bacteria. Mol Biol Evol. 2006;23(7):1348–1356. doi: 10.1093/molbev/msk025. [DOI] [PubMed] [Google Scholar]
- 16.Elyashiv E, et al. Shifts in the intensity of purifying selection: An analysis of genome-wide polymorphism data from two closely related yeast species. Genome Res. 2010;20(11):1558–1573. doi: 10.1101/gr.108993.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Zhang L, Li WH. Human SNPs reveal no evidence of frequent positive selection. Mol Biol Evol. 2005;22(12):2504–2507. doi: 10.1093/molbev/msi240. [DOI] [PubMed] [Google Scholar]
- 18.Fay JC. Weighing the evidence for adaptation at the molecular level. Trends Genet. 2011;27(9):343–349. doi: 10.1016/j.tig.2011.06.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Bustamante CD, et al. The cost of inbreeding in Arabidopsis. Nature. 2002;416(6880):531–534. doi: 10.1038/416531a. [DOI] [PubMed] [Google Scholar]
- 20.Eyre-Walker A, Woolfit M, Phelps T. The distribution of fitness effects of new deleterious amino acid mutations in humans. Genetics. 2006;173(2):891–900. doi: 10.1534/genetics.106.057570. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Keightley PD, Eyre-Walker A. Joint inference of the distribution of fitness effects of deleterious mutations and population demography based on nucleotide polymorphism frequencies. Genetics. 2007;177(4):2251–2261. doi: 10.1534/genetics.107.080663. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Boyko AR, et al. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet. 2008;4(5):e1000083. doi: 10.1371/journal.pgen.1000083. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Andolfatto P. Controlling type-I error of the McDonald-Kreitman test in genomewide scans for selection on noncoding DNA. Genetics. 2008;180(3):1767–1771. doi: 10.1534/genetics.108.091850. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Charlesworth J, Eyre-Walker A. The McDonald-Kreitman test and slightly deleterious mutations. Mol Biol Evol. 2008;25(6):1007–1015. doi: 10.1093/molbev/msn005. [DOI] [PubMed] [Google Scholar]
- 25.Eyre-Walker A, Keightley PD. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Mol Biol Evol. 2009;26(9):2097–2108. doi: 10.1093/molbev/msp119. [DOI] [PubMed] [Google Scholar]
- 26.Wilson DJ, Hernandez RD, Andolfatto P, Przeworski M. A population genetics-phylogenetics approach to inferring natural selection in coding sequences. PLoS Genet. 2011;7(12):e1002395. doi: 10.1371/journal.pgen.1002395. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Eyre-Walker A. The genomic rate of adaptive evolution. Trends Ecol Evol. 2006;21(10):569–575. doi: 10.1016/j.tree.2006.06.015. [DOI] [PubMed] [Google Scholar]
- 28.Fay JC, Wyckoff GJ, Wu CI. Positive and negative selection on the human genome. Genetics. 2001;158(3):1227–1234. doi: 10.1093/genetics/158.3.1227. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Nachman MW, Crowell SL. Estimate of the mutation rate per nucleotide in humans. Genetics. 2000;156(1):297–304. doi: 10.1093/genetics/156.1.297. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.McVean GA, et al. The fine-scale structure of recombination rate variation in the human genome. Science. 2004;304(5670):581–584. doi: 10.1126/science.1092500. [DOI] [PubMed] [Google Scholar]
- 31.Stephan W. Genetic hitchhiking versus background selection: The controversy and its implications. Philos Trans R Soc Lond B Biol Sci. 2010;365(1544):1245–1253. doi: 10.1098/rstb.2009.0278. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Hudson RR, Kaplan NL. Deleterious background selection with recombination. Genetics. 1995;141(4):1605–1617. doi: 10.1093/genetics/141.4.1605. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Stephan W, Charlesworth B, McVean G. The effect of background selection at a single locus on weakly selected, partially linked variants. Genet Res. 1999;73:133–146. [Google Scholar]
- 34.Wiehe TH, Stephan W. Analysis of a genetic hitchhiking model, and its application to DNA polymorphism data from Drosophila melanogaster. Mol Biol Evol. 1993;10(4):842–854. doi: 10.1093/oxfordjournals.molbev.a040046. [DOI] [PubMed] [Google Scholar]
- 35.Smith JM, Haigh J. The hitch-hiking effect of a favourable gene. Genet Res. 1974;23(1):23–35. [PubMed] [Google Scholar]
- 36.Braverman JM, Hudson RR, Kaplan NL, Langley CH, Stephan W. The hitchhiking effect on the site frequency spectrum of DNA polymorphisms. Genetics. 1995;140(2):783–796. doi: 10.1093/genetics/140.2.783. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Fay JC, Wu CI. Hitchhiking under positive Darwinian selection. Genetics. 2000;155(3):1405–1413. doi: 10.1093/genetics/155.3.1405. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.McVean GA, Charlesworth B. The effects of Hill-Robertson interference between weakly selected mutations on patterns of molecular evolution and variation. Genetics. 2000;155(2):929–944. doi: 10.1093/genetics/155.2.929. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Bustamante CD, Wakeley J, Sawyer S, Hartl DL. Directional selection and the site-frequency spectrum. Genetics. 2001;159(4):1779–1788. doi: 10.1093/genetics/159.4.1779. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Hartfield M, Otto SP. Recombination and hitchhiking of deleterious alleles. Evolution. 2011;65(9):2421–2434. doi: 10.1111/j.1558-5646.2011.01311.x. [DOI] [PubMed] [Google Scholar]
- 41.Barton NH. Linkage and the limits to natural selection. Genetics. 1995;140(2):821–841. doi: 10.1093/genetics/140.2.821. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Williamson SH, et al. Simultaneous inference of selection and population growth from patterns of variation in the human genome. Proc Natl Acad Sci USA. 2005;102(22):7882–7887. doi: 10.1073/pnas.0502300102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Mackay TF, et al. The Drosophila melanogaster Genetic Reference Panel. Nature. 2012;482(7384):173–178. doi: 10.1038/nature10811. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Hernandez RD. A flexible forward simulator for populations subject to selection and demography. Bioinformatics. 2008;24(23):2786–2787. doi: 10.1093/bioinformatics/btn522. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.