Abstract
Simulations of positive directional selection, under parameter values appropriate for approximating human genetic diversity and rates of recombination, reveal that the effects of strong selective sweeps on patterns of linkage disequilibrium (LD) mimic the pattern expected with recombinant hotspots.
In several cases, the local distribution of meiotic recombination in humans is nonuniform and concentrated into small regions, on the order of 1 kbp in size, termed recombinant hotspots (for reviews see Petes 2001; Arnheim et al. 2003; de Massy 2003; Wall and Pritchard 2003; Kauppi et al. 2004). These recombinant hotspots appear to be a common feature in the human genome (Crawford et al. 2004; McVean et al. 2004) and are rarely shared between humans and closely related species (Wall et al. 2003; Ptak et al. 2004, 2005; Winckler et al. 2005). This suggests recombinant hotspots may be rapidly evolving and species specific. These hotspots contribute to the “haplotype block” pattern of genetic regions with high linkage disequilibrium (LD; nonindependent associations between alleles at different positions) separated by boundaries of low LD, which is actively being characterized to optimize marker choice for association mapping studies (International HapMap Consortium 2003, 2005; Tishkoff and Verrelli 2003). Knowledge of the distribution of LD is critical to mapping the genetic basis of complex phenotypes (Weiss and Clark 2002). Methods have been developed to detect recombinant hotspots from DNA sequence data, which utilize patterns of LD to infer the existence of these hotspots (e.g., Chakravarti et al. 1984; Li and Stephens 2003; Zhang et al. 2004; for review see Stumpf and McVean 2003). In particular, Li and Stephens (2003) developed a coalescent-based “product of approximate conditionals” model, which uses the distribution of haplotypes to estimate the likelihood of the underlying recombination rate.
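To make the notion of LD used above concrete, the following is a minimal sketch of how the pairwise disequilibrium coefficient D and the squared correlation r² could be computed from phased haplotypes. It is an illustration only and is not the statistic used by the hotspot detection methods cited above; the function name `ld_stats` and the four-haplotype `toy_sample` are hypothetical and not taken from any dataset.

```python
from typing import Sequence, Tuple


def ld_stats(haplotypes: Sequence[str], i: int, j: int) -> Tuple[float, float]:
    """Return (D, r_squared) between biallelic sites i and j.

    Each haplotype is a string of '0' and '1' alleles of equal length.
    """
    n = len(haplotypes)
    # Frequencies of the '1' allele at each site and of the '1'-'1' haplotype.
    p_i = sum(h[i] == "1" for h in haplotypes) / n
    p_j = sum(h[j] == "1" for h in haplotypes) / n
    p_ij = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    # D measures the departure from independent association of the two alleles.
    d = p_ij - p_i * p_j
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    return d, d * d / denom


# Hypothetical toy sample: sites 0 and 1 are perfectly associated,
# while site 2 is only weakly associated with site 0.
toy_sample = ["110", "110", "001", "000"]
d_01, r2_01 = ld_stats(toy_sample, 0, 1)
d_02, r2_02 = ld_stats(toy_sample, 0, 2)
```

In this toy sample the first pair of sites gives r² = 1 (complete LD), whereas the second pair gives r² = 1/3, the kind of contrast that a haplotype block boundary would produce between neighboring markers.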
Positive directional selection, in which a new mutant rises in frequency and quickly fixes in a population (i.e., a selective sweep), can be rapid on an evolutionary timescale and/or population specific (for reviews see Andolfatto 2001; Aquadro et al. 2001; Schlötterer 2002a; Bamshad and Wooding 2003). One predicted effect of this type of selection on a sample of DNA sequences is an increase in LD in regions flanking the site undergoing selection (Kim and Stephan 2002; Przeworski 2002), but a reduction of LD across the site of selection (Kim and Nielsen 2004). This dual pattern may not be intuitive at first, so consider that reducing the genealogical history of a sequence reduces the number of recombination events, thus generally increasing LD in the region. However, for linked genetic variation to be present immediately after a selective sweep, it must either be a new mutation, and therefore rare and contribute little to overall LD, or have experienced recombination during the sweep with the target of selection. This suggests that haplotypes with LD between polymorphic alleles that span the target of selection will not persist beyond the fixation of the selected allele. Here we do not address gene conversion, which could preserve LD over the site of selection. This pattern of two regions of high LD, separated by low LD, is similar to the pattern of LD expected with a recombinant hotspot. Furthermore, the speed and species specificity of selective sweeps may also mimic the species-specific distribution of recombinant hotspots.
To explore the possible effect of selective sweeps on the inference of recombinant hotspots, we simulated positive directional selection of varying intensities (using SelSim 2.1, Spencer and Coop 2004) and applied hotspot detection software (Hotspotter 1.0, Li and Stephens 2003) to the resulting simulated sequences. The parameter values for the simulations were picked to approximate a 10-kbp sequence of DNA from a human population sample. We found that for strong positive selection (σ ≈ 100, where σ ≡ 2Ns, N is the diploid population size, and s is the relative strength of selection), a locally elevated recombination rate can falsely be inferred in the region of selection with statistical significance 22% of the time, corresponding to an excess of 16 percentage points over the false positive rate (FPR) from neutral simulations (Figure 1). This selective elevation over the neutral FPR is highly significant (P = 1 × 10⁻⁷; see Figure 1 legend). However, as selection becomes even stronger (σ ≥ 300), there is a rapid drop in the FPR, probably due to a loss in power associated with a paucity of genetic variation remaining immediately after a strong selective sweep. The patterns of LD resulting from positive selection can produce locally elevated estimated rates of recombination that are similar to the relative rates reported in the literature (e.g., Figure 2; cf. Jeffreys and Neumann 2002; Wall et al. 2003; McVean et al. 2004; Ptak et al. 2004; Verrelli and Tishkoff 2004; Winckler et al. 2005). The geometric mean of estimated recombination rates at the site of selection, from 100 replicates at σ = 100, is 13.25 times higher than the background rate. Furthermore, this elevated FPR can persist for up to N generations (0.5 × 2N generations) after the selective sweep has ended (Figure 3). Assuming an effective human population size of 10,000 and an average generation time of 25 years, this corresponds to a maximum persistence time of ∼250,000 years.
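The scaled parameters and the persistence-time arithmetic above can be unpacked with a short sketch. It uses the definitions σ ≡ 2Ns, ρ ≡ 4Nr, and θ ≡ 4Nμ together with the values ρ = θ = 10 from the Figure 1 legend. Applying the effective size of 10,000 to the conversion of σ, ρ, and θ, and spreading ρ and θ evenly over 10,000 bp, are assumptions made here for illustration; the function names are hypothetical.

```python
N_EFFECTIVE = 10_000      # effective population size assumed in the text
GENERATION_YEARS = 25     # average generation time assumed in the text
REGION_BP = 10_000        # length of the simulated region (10 kbp)


def unscale(sigma: float, rho: float, theta: float,
            n: int = N_EFFECTIVE, region_bp: int = REGION_BP) -> dict:
    """Convert scaled parameters to per-generation values.

    Uses sigma = 2Ns, rho = 4Nr, theta = 4N*mu; r and mu are divided by the
    region length to give per-base-pair rates (a uniform-rate assumption).
    """
    return {
        "s": sigma / (2 * n),
        "r_per_bp": rho / (4 * n) / region_bp,
        "mu_per_bp": theta / (4 * n) / region_bp,
    }


def persistence_years(time_in_2n_units: float, n: int = N_EFFECTIVE,
                      generation_years: float = GENERATION_YEARS) -> float:
    """Convert a time measured in units of 2N generations into years."""
    return time_in_2n_units * 2 * n * generation_years


params = unscale(sigma=100, rho=10, theta=10)
years = persistence_years(0.5)   # 0.5 x 2N generations = N generations
```

Under these assumptions σ = 100 corresponds to s = 0.005, ρ = θ = 10 correspond to per-base-pair rates of 2.5 × 10⁻⁸ per generation, and 0.5 × 2N generations corresponds to 250,000 years, matching the figure quoted above.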
Figure 1.
A plot of the relative FPR of inferring significantly elevated local recombination rates along a simulated sequence of recombining DNA. The increase (in percentage) under a selective sweep scenario is plotted relative to the FPR from neutral simulations. The position of the site under positive selection was fixed at 0.45. Each population sample consists of 100 sequences. The recombination parameter ρ ≡ 4Nr (for a diploid, where r is the per-generation recombination rate) was uniform and set to a value of 10 over the total region. The mutation parameter θ ≡ 4Nμ (where μ is the per-generation mutation rate) was also set to 10 for the region. A stochastic model of positive selection was used and the fixation of the selected allele was modeled as just completing in the sampled generation. One hundred replicates at each value of the selection parameter, σ, were generated and analyzed assuming a single hotspot 0.1 units wide of fixed location (0–0.1, 0.1–0.2,…, 0.9–1). A significantly elevated recombination rate was called when the lower bound of the 95% confidence interval of the local estimated recombination rate was higher than the background recombination rate estimate (this includes both hotspots and “warm” spots). In general, FPR excesses higher than 4 (counts out of 100) are significantly elevated (P < 0.05), assuming a Poisson distribution of false positives with a mean equal to the neutral rate, which is an average of six false positives.
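The significance threshold described in the legend can be checked with a small calculation. The sketch below assumes, as the legend does, that the number of false positives out of 100 replicates is Poisson distributed with a mean of six, and uses a one-sided upper-tail test; the function names are hypothetical and this is not the authors' analysis code.

```python
import math


def poisson_upper_tail(k: int, mean: float) -> float:
    """Return P(X >= k) for X ~ Poisson(mean)."""
    cdf_below_k = sum(
        math.exp(-mean) * mean**i / math.factorial(i) for i in range(k)
    )
    return 1.0 - cdf_below_k


def smallest_significant_count(mean: float, alpha: float = 0.05) -> int:
    """Smallest count whose upper-tail probability falls below alpha."""
    k = 0
    while poisson_upper_tail(k, mean) >= alpha:
        k += 1
    return k


NEUTRAL_MEAN = 6.0   # average number of neutral false positives per 100
threshold = smallest_significant_count(NEUTRAL_MEAN)
excess_needed = threshold - int(NEUTRAL_MEAN)
```

With a neutral mean of six, the smallest significant count is 11, that is, an excess of five over the neutral expectation, which agrees with the statement that excesses higher than 4 are significant at P < 0.05. The same `poisson_upper_tail` function could be applied to any observed count to obtain its P-value under this model.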
Figure 2.
An example of the inferred relative recombination rate from a sample simulated under the conditions described in Figure 1 with σ = 100. The recombination rate of each window, with a width 1/10th of the total sequence, was estimated relative to that of the remaining region. This example illustrates how a false hotspot of recombination may be inferred. In this case, the “hotspot” is at position 0.45 and has a local recombination rate estimate 49 times higher than that of the surrounding sequence. The upper and lower 95% confidence limits are also plotted and a significantly elevated point estimate is represented by a solid circle.
Figure 3.
A plot of the excess, over neutrality, of the FPR of inferring a significantly elevated recombination rate at the position of selection vs. time in units of 2N generations. Solid circles denote a significantly elevated FPR relative to neutral simulations (see Figure 1 legend).
We do not mean to imply that true recombinant hotspots do not exist in humans; they have certainly been verified by experimental means (e.g., Hubert et al. 1994; Cullen et al. 1995; Smith et al. 1998; Yip et al. 1999; Jeffreys et al. 2001). But we do suggest caution when inferring the existence of hotspots solely on the basis of patterns of LD. The transient nature of positive selection, both over time and between populations, may easily mimic the rapidly evolving nature of recombination in primates. When a hotspot is inferred, it may be useful to also address the relative levels of genetic variation compared to levels of divergence (Ptak et al. 2004) to help rule out past positive selection—particularly since recombination may be associated with a mutagenic process (Rattray et al. 2002; Hellmann et al. 2003) and selective sweeps can quickly remove genetic variation. However, recombinant hotspots and selective sweeps may be linked at a more basic level. There is evidence that hotspot crossover asymmetry can result in a form of meiotic drive (Jeffreys and Neumann 2002), which itself is a “selfish” form of positive selection (for review see Reed et al. 2005 and references therein). This crossover asymmetry predicts that a derived recombination-suppressing allele will eventually fix in the population (Jeffreys and Neumann 2002), resulting in the co-occurrence of both a recombinant hotspot and a progressing selective sweep.
It is possible that inferred recombinant hotspots in gene regions that also appear to have undergone positive selection (e.g., Verrelli and Tishkoff 2004) are not due to nonuniform densities of meiotic recombination, but are simply a by-product of positive selection. In the same vein, if recent positive selection plays a significant role, the rate of hotspot sharing between species estimated from LD analysis (Ptak et al. 2005) may be too low, and short-scale LD may be lower than expected (e.g., Pritchard and Przeworski 2001). If selective sweeps do make a significant contribution to the patterns of LD in the human genome, then a better understanding of the effects of positive selection may have important implications for projects that characterize LD for association studies, particularly to the extent that selective pressures may have varied among human populations (e.g., Hamblin and Di Rienzo 2000; Schlötterer 2002b; Akey et al. 2004; Storz et al. 2004).
Acknowledgments
We thank Yuseob Kim and Michael Li for helpful suggestions. We also thank two anonymous reviewers for their feedback on a previous version of this manuscript. This work was supported by a Burroughs Wellcome Fund and David and Lucile Packard Career Awards to S.A.T. F.A.R. was partially supported by the Center for Bioinformatics and Computational Biology, University of Maryland.
References
- Akey, J. M., M. A. Eberle, M. J. Rieder, C. S. Carlson, M. D. Shriver et al., 2004. Population history and natural selection shape patterns of genetic variation in 132 genes. PLoS Biol. 2: 1591–1599.
- Andolfatto, P., 2001. Adaptive hitchhiking effects on genome variability. Curr. Opin. Genet. Dev. 11: 635–641.
- Aquadro, C. F., V. Bauer Dumont and F. A. Reed, 2001. Genome-wide variation in the human and fruitfly: a comparison. Curr. Opin. Genet. Dev. 11: 627–634.
- Arnheim, N., P. Calabrese and M. Nordborg, 2003. Hot and cold spots of recombination in the human genome: the reason we should find them and how this can be achieved. Am. J. Hum. Genet. 73: 5–16.
- Bamshad, M., and S. P. Wooding, 2003. Signatures of natural selection in the human genome. Nat. Rev. Genet. 4: 99–111.
- Chakravarti, A., K. H. Buetow, S. E. Antonarakis, P. G. Waber, C. D. Boehm et al., 1984. Nonuniform recombination within the human β-globin gene cluster. Am. J. Hum. Genet. 36: 1239–1258.
- Crawford, D. C., T. Bhangale, N. Li, G. Hellenthal, M. J. Rieder et al., 2004. Evidence for substantial fine-scale variation in recombination rates across the human genome. Nat. Genet. 36: 700–706.
- Cullen, M., H. Erlich, W. Klitz and M. Carrington, 1995. Molecular mapping of a recombination hotspot located in the second intron of the human TAP2 locus. Am. J. Hum. Genet. 56: 1350–1358.
- de Massy, B., 2003. Distribution of meiotic recombination sites. Trends Genet. 19: 514–522.
- Hamblin, M. T., and A. Di Rienzo, 2000. Detection of the signature of natural selection in humans: evidence from the Duffy blood group locus. Am. J. Hum. Genet. 66: 1669–1679.
- Hellmann, I., I. Ebersberger, S. Ptak, S. Pääbo and M. Przeworski, 2003. A neutral explanation for the correlation of diversity with recombination in humans. Am. J. Hum. Genet. 72: 1527–1535.
- Hubert, R., M. MacDonald, J. Gusella and N. Arnheim, 1994. High resolution localization of recombination hot spots using sperm typing. Nat. Genet. 7: 420–424.
- International HapMap Consortium, 2003. The International HapMap Project. Nature 426: 789–796.
- International HapMap Consortium, 2005. A haplotype map of the human genome. Nature 437: 1299–1320.
- Jeffreys, A., and R. Neumann, 2002. Reciprocal crossover asymmetry and meiotic drive in a human recombination hot spot. Nat. Genet. 31: 267–271.
- Jeffreys, A. J., L. Kauppi and R. Neumann, 2001. Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat. Genet. 29: 217–222.
- Kauppi, L., A. J. Jeffreys and S. Keeney, 2004. Where the crossovers are: recombination distributions in mammals. Nat. Rev. Genet. 5: 413–424.
- Kim, Y., and R. Nielsen, 2004. Linkage disequilibrium as a signature of selective sweeps. Genetics 167: 1513–1524.
- Kim, Y., and W. Stephan, 2002. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160: 765–777.
- Li, N., and M. Stephens, 2003. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165: 2213–2233.
- McVean, G. A. T., S. R. Myers, S. Hunt, P. Deloukas, D. R. Bentley et al., 2004. The fine-scale structure of recombination variation in the human genome. Science 304: 581–584.
- Petes, T. D., 2001. Meiotic recombination hot spots and cold spots. Nat. Rev. Genet. 2: 360–369.
- Pritchard, J. K., and M. Przeworski, 2001. Linkage disequilibrium in humans: models and data. Am. J. Hum. Genet. 69: 1–14.
- Przeworski, M., 2002. The signature of positive selection at randomly chosen loci. Genetics 160: 1179–1189.
- Ptak, S. E., A. D. Roeder, M. Stephens, Y. Gilad, S. Pääbo et al., 2004. Absence of the TAP2 human recombination hotspot in chimpanzees. PLoS Biol. 2: 849–855.
- Ptak, S. E., D. A. Hinds, K. Koehler, B. Nickel, N. Patil et al., 2005. Fine-scale recombination patterns differ between chimpanzees and humans. Nat. Genet. 37: 429–434.
- Rattray, A. J., B. K. Shafer, C. B. McGill and J. N. Strathern, 2002. The roles of REV3 and RAD57 in double-strand-break-repair-induced mutagenesis in Saccharomyces cerevisiae. Genetics 162: 1063–1077.
- Reed, F. A., R. G. Reeves and C. F. Aquadro, 2005. Evidence of susceptibility and resistance to cryptic X-linked meiotic drive in natural populations of Drosophila melanogaster. Evolution 59: 1280–1291.
- Schlötterer, C., 2002a. Towards a molecular characterization of adaptation in local populations. Curr. Opin. Genet. Dev. 12: 683–687.
- Schlötterer, C., 2002b. A microsatellite-based multilocus screen for the identification of local selective sweeps. Genetics 160: 753–763.
- Smith, R. A., P. J. Ho, J. B. Clegg, J. R. Kidd and S. L. Thein, 1998. Recombination breakpoints in the human β-globin gene cluster. Blood 92: 4415–4421.
- Spencer, C. C. A., and G. Coop, 2004. SelSim: a program to simulate population genetic data with natural selection and recombination. Bioinformatics 20: 3673–3675.
- Storz, J. F., B. A. Payseur and M. W. Nachman, 2004. Genome scans of DNA variability in humans reveal evidence for selective sweeps outside of Africa. Mol. Biol. Evol. 21: 1800–1811.
- Stumpf, M. P., and G. A. McVean, 2003. Estimating recombination rates from population-genetic data. Nat. Rev. Genet. 4: 959–968.
- Tishkoff, S. A., and B. C. Verrelli, 2003. Role of evolutionary history on haplotype block structure in the human genome: implications for disease mapping. Curr. Opin. Genet. Dev. 13: 569–575.
- Verrelli, B. C., and S. A. Tishkoff, 2004. Signatures of selection and gene conversion associated with human color vision variation. Am. J. Hum. Genet. 75: 363–375.
- Wall, J. D., and J. K. Pritchard, 2003. Haplotype blocks and the structure of linkage disequilibrium in the human genome. Nat. Rev. Genet. 4: 587–597.
- Wall, J. D., L. A. Frisse, R. R. Hudson and A. Di Rienzo, 2003. Comparative linkage disequilibrium analysis of the β-globin hotspot in primates. Am. J. Hum. Genet. 73: 1330–1340.
- Weiss, K. M., and A. G. Clark, 2002. Linkage disequilibrium and the mapping of complex human traits. Trends Genet. 18: 19–24.
- Winckler, W., S. R. Myers, D. J. Richter, R. C. Onofrio, G. J. McDonald et al., 2005. Comparison of fine-scale recombination rates in humans and chimpanzees. Science 308: 107–111.
- Yip, S. P., J. U. Lovegrove, N. A. Rana, D. A. Hopkinson and D. B. Whitehouse, 1999. Mapping recombination hot spots in human phosphoglucomutase (PGM1). Hum. Mol. Genet. 8: 1699–1706.
- Zhang, J., F. Li, J. Li, M. Q. Zhang and X. Zhang, 2004. Evidence and characteristics of putative human α recombination hotspots. Hum. Mol. Genet. 13: 2823–2828.