Fraction of Informative Recombinations: A Heuristic Approach to Analyze Recombination Rates

J-F Lefebvre; D Labuda

Genetics. 2008 Apr;178(4):2069–2079. doi: 10.1534/genetics.107.082255



PMCID: PMC2323797 PMID: 18430934

Abstract

In this article we present a new heuristic approach (informative recombinations, InfRec) to analyze recombination density at the sequence level. InfRec is intuitive and easy to use, and it combines previously developed methods that (i) resolve genotypes into haplotypes, (ii) estimate the minimum number of recombinations, and (iii) evaluate the fraction of informative recombinations. We tested this approach in its sliding-window version on 117 genes from the SeattleSNPs program, resequenced in 24 African-Americans (AAs) and 23 European-Americans (EAs). We obtained population recombination rate estimates (ρ_obs) of 0.85 and 0.37 kb⁻¹ in AAs and EAs, respectively. Coalescence simulations indicated that these values account for both the recombinations and the gene conversions in the history of the sample. The intensity of ρ_obs varied considerably along the sequence, revealing the presence of recombination hotspots. Overall, we observed ∼80% of recombinations in one-third of the sequence and ∼50% in only 10% of it. InfRec performance, tested on published simulated and additional experimental data sets, was similar to that of other hotspot detection methods. Fast, intuitive, and visual, InfRec is not constrained by sample size limitations. It facilitates understanding of the data and provides a simple and flexible tool to analyze recombination intensity along the sequence.

CHARACTERIZING the partitioning of genetic diversity across human populations and along the genome is of fundamental interest and practical importance in genetic epidemiological studies. Mutations contribute to this diversity by creating new alleles, whereas meiotic recombinations redistribute them among homologous chromosomal segments. A reliable map of recombination intensity is important for understanding historical processes that affect linkage disequilibria among polymorphic sites and, consequently, the haplotype structure of the genome. Substantial variation in the recombination rate among and along chromosomes was first observed in pedigree studies (Broman et al. 1998). On a finer scale, at the sequence level, the rate differences are even more dramatic. In fact, it has been proposed that most meiotic recombinations occur in small genomic segments, called hotspots (Chakravarti et al. 1984). The presence of hotspots was further shown by single-sperm genotyping (Hubert et al. 1994; Jeffreys et al. 2001) and population diversity data analyses (Stumpf and McVean 2003). Hotspots were not found to be conserved across species (Ptak et al. 2005; Winckler et al. 2005), and it is also not clear to what extent they are shared among individuals and/or populations (Tiemann-Boege et al. 2006).

Although analyses of the genomewide distribution of recombination density at the sequence level (De Massy 2003; Kauppi et al. 2004) have focused mainly on recombination hotspots, the extended regions of linkage disequilibria or haplotype blocks (Reich et al. 2001) are also interesting, as they are likely to reflect the existence of coldspots or extended areas of poor recombination (Petes 2001). Given the limitations of experimental approaches with respect to measuring recombination rates at the sequence level, genomewide studies rely on computational methods using population diversity data (Stumpf and McVean 2003). However, contrary to mutations, recombinations are not directly visible and must be inferred from the underlying haplotypes. We can, therefore, only estimate their number and density. The existing likelihood-based methods are computationally demanding, although this has become less of an issue with ever-increasing computational power (Fearnhead and Donnelly 2001; Hudson 2001; McVean et al. 2002, 2004; Li and Stephens 2003; Wall 2004; Fearnhead and Smith 2005). Here we present a novel heuristic approach to study recombinations along DNA sequences, which we refer to as informative recombinations (InfRec). This approach combines various existing methods developed for the analysis of recombinations and haplotypes, namely (i) “PHASE,” which resolves genotypes into haplotypes (Stephens et al. 2001), (ii) “RecMin,” which estimates the minimum number of recombinations in a haplotype sample (Myers and Griffiths 2003), and (iii) the estimation of the fraction of informative recombinations (FIR) and the fraction of novel recombinations (FNR) (Zietkiewicz et al. 2003). InfRec is convenient to use, provides a transparent relation between the resulting estimates and the observed data, is not limited by sample size, and works within reasonable computational time.
It compares well with other methods and provides a realistic picture of the crossover distribution along the genome. We used InfRec to examine the variation in recombination density along 117 genes that were previously resequenced in 24 Americans of African descent (African-Americans, AAs) and 23 Americans of European descent (European-Americans, EAs) at the University of Washington (the University of Washington FHCRC Variation Discovery Resource, SeattleSNPs, http://pga.gs.washington.edu). We tested InfRec performance in detecting hotspots using four separate data sets: two genomic segments for which recombination hotspots were characterized by sperm typing (Jeffreys et al. 2001, 2005), the published simulated data sets of Fearnhead and Smith (2005) and Li et al. (2006), and our own coalescence simulations by msHot (Hellenthal and Stephens 2007). The analysis of InfRec performance clearly shows that, in addition to simple recombinations, gene conversions also influence the overall estimate of the population recombination rate from sequence diversity data.

MATERIALS AND METHODS

Data:

Resequencing data from Crawford et al. (2004) were kindly provided by D. C. Crawford and M. Stephens from the University of Washington, Seattle. Data on the class II region of the major histocompatibility complex (Jeffreys et al. 2001) and the 200-kb segment of chromosome 1 (Jeffreys et al. 2005) were obtained from A. Jeffreys's laboratory site (http://www.le.ac.uk/ge/ajj/). The simulated data of Fearnhead and Smith (2005) were obtained from http://www.maths.lancs.ac.uk/∼fearnhea/Hotspot/. This data set mimics the SeattleSNPs data and consists of four sets, each representing 100 25-kb segments simulated separately for 23 EAs and for 24 AAs, with and without a single hotspot. The background recombination rate was set to 1.2 × 10⁻⁸/bp (1.2 cM/Mb, varying between 0.1 and 5 cM/Mb), and a hotspot recombination rate between 50 × 10⁻⁸ and 75 × 10⁻⁸/bp was “added” on top. The HapMap Encode simulated data set of Li et al. (2006) covers three populations, European, Asian, and African, each represented by 90 individuals. Each subset consists of 100 200-kb regions containing a random number of hotspots (mean spacing of 50 kb between hotspots), with 90% of the recombinations occurring within the hotspots. In contrast to Fearnhead and Smith's (2005) simulations, the overall recombination rate was set to 1.2 × 10⁻⁸/bp and hotspot intensities were defined by a “hotspot quotient” (HQ), the proportion of recombination events expected to happen within hotspots [HQ = 90% (Li et al. 2006), obtained from http://bioinfo.au.tsinghua.edu.cn/member/∼lijun]. See the Coalescent simulations section below for our own simulated data set.

Informative recombinations:

Recombinations that left an imprint on the sampled chromosomes can be detected by the four-gamete test and retraced within genotypes spanning several polymorphic sites (Hudson and Kaplan 1985; Myers and Griffiths 2003). However, in contrast to mutations, not all genomic segments are equally informative when it comes to recording crossovers. Hence, evaluating recombinations from population diversity data first requires an estimation of the extent of informativeness of the analyzed genomic segment (Stephens 1986).
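The four-gamete test itself is simple enough to sketch. The Python below (function names are hypothetical) flags pairs of biallelic sites at which all four two-locus gametes occur, then computes the Hudson and Kaplan (1985) lower bound R_m as the minimum number of breakpoints needed so that every incompatible interval contains one. It assumes phased haplotypes coded as strings of 0/1 alleles. RecMin (Myers and Griffiths 2003), which was used in this study, computes tighter bounds, so this is an illustration of the principle rather than a substitute.

```python
from itertools import combinations


def incompatible_pairs(haplotypes):
    """Return index pairs (a, b) of sites that fail the four-gamete test.

    haplotypes: equal-length strings, one allele character per biallelic site.
    """
    n_sites = len(haplotypes[0])
    pairs = []
    for a, b in combinations(range(n_sites), 2):
        gametes = {(h[a], h[b]) for h in haplotypes}
        if len(gametes) == 4:  # all four two-locus haplotypes are present
            pairs.append((a, b))
    return pairs


def hudson_kaplan_rm(haplotypes):
    """Minimum number of breakpoints hitting every incompatible interval.

    Greedy sweep by right endpoint: a breakpoint placed just left of site b
    (at b - 0.5) resolves every later interval (a, b') with a < b - 0.5.
    """
    intervals = sorted(incompatible_pairs(haplotypes), key=lambda p: p[1])
    count = 0
    last_break = float("-inf")
    for a, b in intervals:
        if a < last_break:  # interval already contains the last breakpoint
            continue
        last_break = b - 0.5
        count += 1
    return count


# All four gametes at two sites: at least one recombination is required.
print(hudson_kaplan_rm(["00", "01", "10", "11"]))  # prints 1
```

Sorting by right endpoint is the standard greedy for hitting a set of intervals with the fewest points, which is why the count equals the number of nonoverlapping incompatible intervals.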

Considering a pool of haplotypes observed in a population sample, we let all possible haplotype pairs undergo reciprocal recombination. One potential outcome is a haplotype that does not differ from the recombining parental pair. This outcome is observed for all recombinations that occur within homozygous genotypes, within genotypes heterozygous at only one site, or within flanking segments that are delimited by a heterozygous site on only one side. Alternatively, recombination can lead to a daughter haplotype that is distinct from the parental haplotypes, as is the case for all crossovers within a segment flanked by heterozygous sites at both ends. Furthermore, the resulting haplotype (recombination produces two daughter haplotypes, of which only one is retained in humans) can represent either a truly novel variant or a haplotype identical to one already present in the analyzed pool. In other words, recombinations can be either noninformative, producing haplotypes identical to the parental ones, or potentially informative. The fraction of potentially informative recombinations (FIR) comprises the fraction of recombinations leading to novel recombinant haplotypes (FNR) and the fraction of back recombinations (FBR). FBR represents recurrent crossovers, which are noninformative when considering the sample of haplotypes drawn from a population.

In a pool of haplotypes randomly participating in meiotic recombination, and assuming an equal probability of recombination between any two adjacent base pairs along the sequence, FIR can be evaluated as

$$\mathrm{FIR} = \sum_{i=1}^{k}\sum_{j=1}^{k} f_i f_j \,\frac{D_{\max,ij}}{L} \qquad (1)$$

where f_i and f_j denote the frequencies of the ith and jth haplotypes and k denotes the number of haplotypes, so that $\sum_{i=1}^{k} f_i = 1$; D_max,ij is the distance between the two most widely separated heterozygous sites of a genotype composed of haplotypes i and j; and L is the haplotype length (Zietkiewicz et al. 2003). While computing FIR, we kept track of all crossover outcomes that produce haplotypes identical to any of the haplotypes already present in the pool. Their proportion corresponds to FBR and should be subtracted from FIR to obtain FNR. Therefore, FNR represents the proportion of crossovers creating recombinants that would be seen as novel haplotypes in the analyzed population sample.
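Equation 1 and the bookkeeping of back recombinations can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Java implementation: the helper name `fir_fnr` is hypothetical; haplotypes are strings of alleles at the window's segregating sites; a crossover falls in each interval between adjacent sites with probability proportional to the interval's length; and the daughter kept is the left part of haplotype i joined to the right part of haplotype j (summing over ordered pairs covers both daughters). Any crossover between the two outermost heterozygous sites yields a haplotype distinct from both parents, so summing those interval lengths recovers D_max,ij.

```python
from itertools import product


def fir_fnr(haplotypes, freqs, positions, length):
    """Return (FIR, FNR) for a pool of distinct haplotypes (hypothetical helper).

    haplotypes: equal-length strings of alleles at the window's segregating sites
    freqs:      haplotype frequencies f_i, summing to 1
    positions:  bp coordinates of the segregating sites, in increasing order
    length:     window length L in bp
    """
    pool = set(haplotypes)
    fir = 0.0
    fbr = 0.0  # crossovers that recreate a haplotype already in the pool
    for (h_i, f_i), (h_j, f_j) in product(zip(haplotypes, freqs), repeat=2):
        het = [s for s in range(len(positions)) if h_i[s] != h_j[s]]
        if len(het) < 2:
            continue  # homozygous or single-site heterozygous genotype
        # Intervals between the outermost heterozygous sites; their lengths
        # sum to D_max,ij, so these terms add up to Equation 1.
        for a in range(het[0], het[-1]):
            weight = f_i * f_j * (positions[a + 1] - positions[a]) / length
            fir += weight
            daughter = h_i[: a + 1] + h_j[a + 1 :]
            if daughter in pool:
                fbr += weight
    return fir, fir - fbr


# Two complementary haplotypes: every informative crossover creates a new one.
print(fir_fnr(["00", "11"], [0.5, 0.5], [0, 100], 200))  # prints (0.25, 0.25)
```

If the pool already contains both recombinants ("01" and "10"), the same crossovers still count toward FIR but are all back recombinations, so FNR drops to zero.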

Expectations:

The theoretical expectation of the frequency of undetectable recombination events, E(I), was calculated by Stephens (1986) as a function of the population parameter Θ = 4Nμ, with N being the effective population size and μ the mutation rate per DNA segment per generation. Stephens's E(I) can be used to evaluate the expectation of FIR, E(FIR), because E(FIR) = 1 − E(I). The parameter Θ, which itself can be estimated from the average number of pairwise differences in a sample of haplotypes (Tajima 1989), can be used here as a measure of the average number of polymorphic sites in a genotype. Genotypes that differ at k sites, where k = 0, 1, 2, … , are expected to occur with Poisson probabilities $P(k) = e^{-\Theta}\Theta^{k}/k!$, and those that are informative have k ≥ 2. Informative recombinations occur among these sites, which correspond to the portion (k − 1)/(k + 1) of the sequence (Stephens 1986). Hence

$$E(\mathrm{FIR}) = \sum_{k=2}^{S} \frac{e^{-\Theta}\,\Theta^{k}}{k!}\,\frac{k-1}{k+1} \qquad (2)$$

where S is the total number of segregating sites in a sample [when S is large, this sum corresponds to Stephens's estimate 1 − E(I)]. For gene conversion, the informative events occur within genotypes with k ≥ 2 and have to include a polymorphic site within the converted tract, which, in turn, is expected to occur at a frequency that increases with t, the average length of the converted tract. The relevant proportion of gene conversions is (k − 1)/k, because a gene conversion that includes only one of the flanking polymorphisms is seen as a simple recombination (i.e., a “half” gene conversion). The resulting estimate of the fraction of informative gene conversions is expressed thus:

(3)
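Equation 2 is easy to evaluate numerically. The sketch below (function names hypothetical) sums the Poisson series term by term, updating each term from the previous one to avoid overflowing large factorials. The closed form in `expected_fir_limit` is our own derivation of the infinite Poisson sum, obtained by writing (k − 1)/(k + 1) = 1 − 2/(k + 1); it is not quoted from Stephens (1986) and is included only as a check that the finite sum converges.

```python
import math


def expected_fir(theta, s):
    """Equation 2: sum over k = 2..s of Poisson(k; theta) * (k - 1) / (k + 1)."""
    term = math.exp(-theta)  # Poisson probability for k = 0
    total = 0.0
    for k in range(1, s + 1):
        term *= theta / k  # Poisson probability for k, updated iteratively
        if k >= 2:
            total += term * (k - 1) / (k + 1)
    return total


def expected_fir_limit(theta):
    """Value of the same sum as s grows without bound (Poisson model)."""
    return 1.0 + math.exp(-theta) - 2.0 * (1.0 - math.exp(-theta)) / theta


for theta in (0.5, 1.0, 2.0, 5.0):
    print(theta, round(expected_fir(theta, 8), 3), round(expected_fir_limit(theta), 3))
```

For small Θ the sum is dominated by k = 2 and behaves like Θ²/6, which makes explicit why short or weakly polymorphic segments carry almost no information about recombination.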

Estimation of recombination density:

We used the PHASE program v.2.1.1 (Stephens et al. 2001) to infer haplotypes. They were inferred for the whole gene or its contiguous sequenced fragments and were not reevaluated for every window separately. The RecMin program by Myers and Griffiths (2003) was used to evaluate the minimum number of recombinations, R_min, from the haplotype data. The total number of inferred past recombinations, R_obs, was obtained by correcting R_min for the fraction of unseen and recurrent recombinations, i.e., by dividing R_min by FNR. From $E(R) = 4Nr\sum_{i=1}^{n-1} 1/i$, where N denotes the effective population size, r the recombination rate per segment per generation, and n the sample size in number of chromosomes (Hein et al. 2005), the population recombination rate (ρ = 4Nr) was estimated as

$$\rho_{\mathrm{obs}} = \frac{R_{\mathrm{obs}}}{L\sum_{i=1}^{n-1} 1/i} = \frac{R_{\min}}{\mathrm{FNR}\cdot L\sum_{i=1}^{n-1} 1/i} \qquad (4)$$
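Equation 4 amounts to a few lines of arithmetic. In the sketch below (function names hypothetical), `harmonic(n)` is the sum of 1/i for i = 1 to n − 1 and lengths are in kilobases, so ρ_obs comes out per kilobase. As a rough consistency check, dividing the total R_obs of Table 1 by the informative sequence coverage gives about 0.86 and 0.38 kb⁻¹, close to the reported length-weighted averages of 0.85 and 0.37 kb⁻¹.

```python
def harmonic(n):
    """a_n = sum of 1/i for i = 1 .. n - 1, for a sample of n chromosomes."""
    return sum(1.0 / i for i in range(1, n))


def rho_obs(r_min, fnr, length_kb, n):
    """Equation 4: correct R_min by FNR, then scale by length and a_n."""
    r_obs = r_min / fnr
    return r_obs / (length_kb * harmonic(n))


# Consistency check on Table 1 totals (total R_obs over informative coverage).
aa = 6142 / (1605 * harmonic(48))  # African-Americans, 48 chromosomes
ea = 2166 / (1300 * harmonic(46))  # European-Americans, 46 chromosomes
print(round(aa, 2), round(ea, 2))
```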

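Because gene conversions also contribute to the overall count of recombinations (e.g., Padhukasahasram et al. 2006), taking their rate (γ) and the fraction of informative gene conversions (FIG) into account yields $E(R_{\min}) = (\rho\,\mathrm{FIR} + \gamma\,\mathrm{FIG})\,L\sum_{i=1}^{n-1} 1/i$, and thus $\rho_{\mathrm{obs}} = R_{\min}/(\mathrm{FIR}\cdot L\sum_{i=1}^{n-1} 1/i) = \rho + \gamma\,\mathrm{FIG}/\mathrm{FIR}$. Substituting FNR for the first FIR, and replacing FIG and FIR with their expectations, we obtain

$$\rho_{\mathrm{obs}} = \rho\left[1 + f\,\frac{E(\mathrm{FIG})}{E(\mathrm{FIR})}\right] \qquad (5)$$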

where f corresponds to the ratio of the gene conversion rate to the recombination rate (f = γ/ρ), and E(FIR) and E(FIG) can be calculated, knowing Θ, from Equations 2 and 3. In other words, the ρ_obs inferred by InfRec can be interpreted in terms of either Equation 4 or Equation 5.
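Equation 5 predicts that, at fixed ρ and Θ, ρ_obs grows linearly with f. A quick least-squares fit to the AA simulations of Table 2 (input ρ = 0.6/kb; the runs without gene conversion are read as f = 0) illustrates this. The fit is our own illustration, not an analysis from the study. The fitted intercept is only about one-third of the input ρ, whereas Equation 5 as written would place it at ρ itself; this reflects the conservative nature of R_min discussed in the Results.

```python
def least_squares(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Table 2, AA simulations with rho = 0.6/kb; no gene conversion read as f = 0.
f_values = [0.0, 0.5, 1.0, 2.0, 2.4]
rho_obs_values = [0.22, 0.34, 0.47, 0.74, 0.85]
intercept, slope = least_squares(f_values, rho_obs_values)
residuals = [y - (intercept + slope * x) for x, y in zip(f_values, rho_obs_values)]
print(round(intercept, 3), round(slope, 3), round(max(abs(r) for r in residuals), 4))
```

All residuals stay below 0.01/kb, consistent with the linear dependence on f implied by Equation 5.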

The recombination density profiles obtained by InfRec were compared with those obtained using RecSlider (Wall et al. 2003), a sliding-window version of MaxDip, a program that implements the composite-likelihood approach of Hudson (2001), available online from the Di Rienzo laboratory (http://genapps.uchicago.edu/labweb/index.html).

Methodological considerations:

InfRec can be used with DNA haplotypes of any size, defined either by the number of segregating sites or by the sequence length. We used a sliding-window approach, appropriate for studying variation of the recombination rate along DNA sequences. The window size was determined by the number of segregating sites, independently of their density within the sequence. However, each window's sequence coverage was taken into account to express the recombination rate estimates in units of sequence length. The calculation time of FIR and FNR is of the order O(k² · W · E), where k is the number of haplotypes, W is the window size in number of polymorphic sites, and E is the number of windows; for a sequence with S such sites and windows sliding one polymorphic site at a time, E = S − W + 1. The window length (L) is in base pairs and corresponds to the sequence between the first and the last site, plus half the distance between the first and the preceding site and half the distance between the last and the following site. This L is used in the estimation of FIR, FNR, and ρ_obs. In practice, however, we use an approximation: a window of W sites spans W − 1 intervals, so to correct for the “missing” flanking sequence we add one interval. Hence, the window length L is obtained by multiplying the length between the first and the last polymorphic site by W/(W − 1). Because the window length varies with the density of polymorphic sites, the average ρ_obs of the whole segment or of a spot is calculated as a weighted average, the weight being the length of each window in base pairs. The result for each window is “reported” as a point at its median position (Figure 1), so that the sequence covered in the analysis in Table 1 is less than the total sequence (2050 kb in AAs and 1822 kb in EAs), summed over 175 and 142 noncontiguous segments containing 6628 and 4500 segregating sites, respectively.
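The window bookkeeping described above can be sketched as follows, assuming site positions are given in base pairs in increasing order (function names and example coordinates are hypothetical). The first function applies the W/(W − 1) approximation to every window sliding one site at a time; the second computes the length-weighted average of per-window ρ_obs.

```python
def window_lengths(positions, w):
    """Approximate length L of every window of w consecutive segregating sites.

    The span from the first to the last site covers w - 1 intervals; one more
    average interval stands in for the two flanking half-intervals, so
    L = span * w / (w - 1).
    """
    n_windows = len(positions) - w + 1  # E = S - W + 1, sliding by one site
    return [
        (positions[i + w - 1] - positions[i]) * w / (w - 1)
        for i in range(n_windows)
    ]


def length_weighted_mean(values, lengths):
    """Average of per-window rho_obs, weighted by window length in bp."""
    return sum(v * l for v, l in zip(values, lengths)) / sum(lengths)


# Hypothetical site coordinates (bp) and a window size of 8 sites.
positions = [0, 100, 250, 400, 1000, 1100, 1150, 1300, 1500, 1550]
lengths = window_lengths(positions, 8)
print(len(lengths), [round(l, 1) for l in lengths])
```

Weighting by length keeps windows in SNP-dense regions, which are short, from dominating the segment average.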

Figure 1.—Distribution of the ρ_obs-density along the IL10b gene estimated using InfRec (solid blue lines) with a window size of eight polymorphic sites in AAs and EAs, as indicated. The dotted green lines denote the distribution of R_min found using RecMin. The solid red and green lines (top) represent the corresponding FIR and FNR values (right axis). The black marks in the middle represent positions of the polymorphic sites.

TABLE 1.

InfRec analysis of resequencing data of Crawford et al. (2004) in two population samples

	African-Americans	European-Americans
Mean values (per window)
FIR	0.33 (0.01)	0.35 (0.01)
FNR	0.25 (0.01)	0.3 (0.01)
R_min	1.76 (3.76)	1.15 (2.2)
Sequence coverage (kb)	2.47 (1.57)	3.12 (3.91)
Sample characteristics
ρ_obs (kb⁻¹)	0.85 (2.2)	0.37 (1.33)
Chromosomes	48	46
Genes (contiguous segments)	117 (175)	109 (142)
Windows	5228	3364
Informative segment coverage (kb)	1605	1300
Total R_obs	6142	2166
Hotspots
Threshold ρ_obs (kb⁻¹) ≥	5.0	2.5
ρ_obs (kb⁻¹)	7.27 (9.84)	4.75 (18.1)
No. per gene	1.26 (0.35)	1.25 (0.21)
Width above threshold (kb)	0.82 (0.5)	0.94 (0.48)
Genes with hotspots	27	18
Total no.	34	22
Coldspots
Threshold ρ_obs (kb⁻¹) =	0.0	0.0
No. per gene	2.58 (4.43)	2.16 (2.7)
Length (kb)	1.58 (2.82)	2.48 (8.42)
Genes with coldspots	100	91
Total no.	258	197
Coverage (kb)	408	489


Values in parentheses represent variances.

The choice of window size is important. A small window size decreases FNR, which makes the ρ_obs values sensitive to small fluctuations in the data and the resulting estimates less reliable. A large window size, on the other hand, reduces the resolution of the recombination rate variation along the sequence, which can mask the presence of recombination hotspots. For these reasons, we evaluated the effect of window size on ρ_obs variance and segment informativeness. Starting with a window size of 5, the average ρ_obs and its variance rapidly decreased, stabilizing between window sizes 7 and 9 for the EA sample and between 8 and 10 for the AA sample (supplemental Figure S1). In addition, the variance in ρ_obs-estimates was substantially reduced by defining an FNR threshold below which ρ_obs results are not counted (supplemental Figure S2). Indeed, windows characterized by a very low FNR (<0.05), i.e., those that are effectively noninformative, are responsible for a large portion of the variance. A single crossover fortuitously observed within a window with a small FNR would lead to an artificially high rate, erroneously suggesting the presence of a hotspot. To avoid such false positives and reduce the variance of ρ_obs-estimates, it is important to discard results below a certain FNR threshold. With a well-chosen threshold, the cost is minimal, because only a small proportion of the data, from regions of very low informativity, is discarded. For example, in our case, an FNR cutoff of 0.10 excludes <2.2% and a cutoff of 0.05 excludes <0.14% of all windows analyzed (supplemental Figure S2), so that the overall results obtained with both thresholds remain practically identical. Only the results obtained using an FNR threshold of 0.05 and a window size of 8 polymorphic sites are reported below.
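The false-positive mechanism and the FNR filter can be illustrated with hypothetical windows (the `Window` record and helper names are our own). With 48 chromosomes and a 2.5-kb window, a single crossover in a window with FNR = 0.01 yields a ρ_obs of about 9/kb by Equation 4, well above the 5/kb hotspot threshold used for AAs; the filter simply drops such windows before any summary is computed.

```python
from dataclasses import dataclass


@dataclass
class Window:
    r_min: int        # minimum number of recombinations (e.g., from RecMin)
    fnr: float        # fraction of novel recombinations
    length_kb: float  # window length L


def harmonic(n):
    """Sum of 1/i for i = 1 .. n - 1."""
    return sum(1.0 / i for i in range(1, n))


def rho_obs(win, n):
    """Equation 4 for a single window."""
    return win.r_min / (win.fnr * win.length_kb * harmonic(n))


def informative(windows, fnr_threshold=0.05):
    """Keep only windows whose FNR reaches the threshold."""
    return [w for w in windows if w.fnr >= fnr_threshold]


# Hypothetical windows: one crossover in a nearly noninformative window.
windows = [Window(2, 0.30, 2.5), Window(1, 0.01, 2.5), Window(0, 0.25, 2.5)]
print([round(rho_obs(w, 48), 2) for w in windows])
print([round(rho_obs(w, 48), 2) for w in informative(windows)])
```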

Coalescent simulations:

Simulations were carried out using msHot (Hellenthal and Stephens 2007) (http://home.uchicago.edu/∼rhudson1/source/mksamples.html), a modification of the ms program (Hudson 1990), under a simple version of the standard neutral model with constant population size. Each simulated data set consisted of 1000 200-kb genomic segments, each sampled for 100 sequences (i.e., corresponding to 50 diploid individuals). While we explored a variety of starting parameters, the reported results were obtained using Θ of 0.73 and 0.59/kb, estimated from the density of segregating sites (Watterson 1975) in AAs and EAs, respectively (Table 1). The average length of the converted tract was set to t = 500 bp, whereas the gene conversion rates (γ) were expressed as a fraction of the ρ-values (f = γ/ρ). The recombination rate, or jointly the recombination and gene conversion rates, was either kept constant or varied across the sequence, concentrating in recombination hotspots of 1 or 2 kb. This corresponds to the size range reported in sperm-typing experiments (Jeffreys et al. 2001; Tiemann-Boege et al. 2006) and inferred by our computational methods (see results) and by others (Hellenthal and Stephens 2007). In DNA fragments of such size, one expects to observe on average four to eight polymorphic sites in a population sample of 40 individuals (Labuda et al. 2007). Hotspot intensities were defined by the proportion of recombination events expected to happen within hotspots, set at 90 or 70% (HQ = 90% and HQ = 70%). In other words, an HQ of 90% combined with an overall ρ of 0.6/kb sets the background ρ to ∼0.06/kb. Hotspot densities were set at 4/200 kb and 3/200 kb in AAs and EAs, respectively. The effect of haplotype phasing on the estimates of ρ_obs by InfRec was assessed by randomly assigning 100 simulated haplotypes to 50 individuals, resolving the resulting genotypes back into haplotypes using PHASE, and reanalyzing them with InfRec.
This resulted in up to a 16% increase in the overall ρ_obs as compared with its direct estimate from the simulated haplotypes. There is no fully reliable computational method to infer haplotypes from diploid genotypes (Andrés et al. 2007). The related uncertainty must therefore be taken into account in estimates and procedures that make use of computationally phased haplotypes. As in the analysis of the experimental SeattleSNPs data, recombination intensity peaks >2.5/kb in EAs and >5/kb in AAs were considered hotspots, unless stated otherwise (e.g., Table 3). The hotspot false discovery rate was evaluated as the number of false hotspot predictions, FP, per megabase of simulated sequence. The power to detect hotspots was calculated as the proportion of simulated hotspots found by InfRec; a simulated hotspot was considered found if a hotspot was inferred within 1 kb of its simulated position. Coalescent simulations were additionally carried out using the cosi package and its “best-fit” model parameters, as described by Schaffner et al. (2005), to test the effect of demographic history on the overall ρ (non-African vs. African populations).
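The two performance measures can be computed as below. The matching rule is our reading of the text and should be taken as an assumption: a simulated hotspot counts as found if any inferred hotspot lies within 1 kb of it, and an inferred hotspot counts as a false positive if no simulated hotspot lies within 1 kb. The helper name is hypothetical.

```python
def hotspot_performance(inferred_bp, simulated_bp, total_mb, tolerance_bp=1000):
    """Return (power, false positives per Mb) for lists of hotspot positions."""
    found = sum(
        1 for s in simulated_bp
        if any(abs(p - s) <= tolerance_bp for p in inferred_bp)
    )
    false_pos = sum(
        1 for p in inferred_bp
        if all(abs(p - s) > tolerance_bp for s in simulated_bp)
    )
    power = found / len(simulated_bp) if simulated_bp else float("nan")
    return power, false_pos / total_mb


# One hypothetical 200-kb segment (0.2 Mb): one hit, one miss, one false call.
print(hotspot_performance([10_400, 90_000], [10_000, 60_000], 0.2))  # prints (0.5, 5.0)
```

In practice, the counts would be accumulated over all simulated segments before dividing, so that FP/Mb refers to the total simulated sequence length.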

TABLE 3.

InfRec comparison with Fearnhead's SequenceLDhot (SLD), penalized-likelihood (PL), and likelihood-ratio (LR) methods (Fearnhead et al. 2004; Fearnhead and Smith 2005; Fearnhead 2006) and with Li's HotspotFisher program (Li et al. 2006), using two simulated data sets (SeattleSNPs and HapMap Encode) as described by Fearnhead (2006) (see materials and methods)

SeattleSNPs simulated data
Method	InfRec, hotspot ≥7 · ρ_obs	InfRec, hotspot ≥7 · ρ_obs	InfRec, hotspot ≥5 · ρ_obs	InfRec, hotspot ≥5 · ρ_obs	LR^a	LR^a	PL^a	PL^a	SLD^b	SLD^b
Population	EA	AA	EA	AA	EA	AA	EA	AA	EA	AA
ρ_obs/kb	0.58	0.93	0.58	0.93
Power (%)	47	58	63	73	56	44	63	67	73	65
FP/Mb	0.6	2.4	1.4	3.4	0.8	1.6	0.8	2.0	1.2	0.4

HapMap Encode simulated data
Method	InfRec, hotspot ≥7 · ρ_obs	InfRec, hotspot ≥7 · ρ_obs	InfRec, hotspot ≥7 · ρ_obs	InfRec, hotspot ≥5 · ρ_obs	InfRec, hotspot ≥5 · ρ_obs	InfRec, hotspot ≥5 · ρ_obs	SLD^a	SLD^a	SLD^a	HF^c	HF^c	HF^c
Population	Eur	Asi	Afr	Eur	Asi	Afr	Eur	Asi	Afr	Eur	Asi	Afr
ρ_obs/kb	0.30	0.26	0.60	0.30	0.26	0.60
Power (%)	62	61	71	78	74	83	77	75	86	69	66	66
FP/Mb	0.6	0.5	1.2	0.9	0.9	2.6	0.2	0.3	0.2	0.4	0.3	0.1


InfRec was used with a window size of 8 and hotspots were defined as intensity peaks of five or seven times the average ρ_obs. Eur, Europeans; Asi, Asians; Afr, Africans.

^a Results taken from Fearnhead and Smith (2005).

^b Results taken from Fearnhead (2006).

^c Results taken from Li et al. (2006).

URLs:

RecSlider web page: http://genapps.uchicago.edu/recslider2/index.html
RecMin software: http://www.stats.ox.ac.uk/∼myers/RecMin.html
PHASE 2.1.1 software: http://www.stat.washington.edu/stephens/software.html
SeattleSNPs Programs for Genomic Applications: http://pga.gs.washington.edu/
SeattleSNPs simulated data: http://www.maths.lancs.ac.uk/∼fearnhea/Hotspot/
HapMap Encode simulated data: http://bioinfo.au.tsinghua.edu.cn/member/∼lijun
Alec Jeffreys's laboratory typing data: http://www.le.ac.uk/ge/ajj/labhome.html
cosi homepage: http://www.broad.mit.edu/personal/sfs/cosi/
msHot homepage: http://home.uchicago.edu/∼rhudson1/source/mksamples.html

Program availability:

InfRec is written in Java and is available from the authors upon request.

RESULTS

Analysis of recombination densities:

Figure 1 illustrates InfRec analysis along 36 kb of the IL10b gene, using a window size of eight polymorphic sites. It presents the distribution of the ρ_obs-estimates and the underlying numbers of observed recombinations R_min (bottom part and left axis) as well as the associated values of FNR and FIR (top part and right axis). Two peaks of recombination density were observed, at ∼14.5 and 29 kb, separated by regions of low or no recombination activity. The pattern was similar in both population samples, but the recombination signal was weaker in EAs (note the expanded left scale in EAs). On the other hand, the overall levels of FIR and FNR were similar in both populations. A depression of FNR relative to FIR coincided with peaks of ρ_obs. This seems to be a characteristic of regions of high recombination activity, due to a greater number of recurrent (back) recombinations, since FNR = FIR − FBR. InfRec analysis of 117 genes in the AA sample and 109 in the EA sample (8 genes were excluded from the EA sample because they had too few polymorphisms) revealed great variation in ρ_obs along the genome, characterized by peaks of high recombination density and by more extended regions without recombination (see supplemental Figure S3 for recombination density profiles of all genes investigated). As summarized in Table 1, we analyzed 5228 windows within 1605 kb of the sequence in AAs and 3364 windows in 1300 kb in EAs. The average sequence coverages per window were 2.47 and 3.12 kb and the corresponding average FNRs were 0.25 and 0.30, respectively. This suggests similar power to detect recombinations per sequence length in both populations. However, because more recombinations were observed in AAs, more than twice as many past recombinations (R_obs) were inferred in AAs as in EAs, which is reflected in the resulting genomic ρ_obs estimates of 0.85 and 0.37 kb⁻¹, respectively (Table 1).
Altogether we inferred 6142 historical recombinations in AAs, or 3.8 such events per kilobase, and 2166 in EAs or 1.7/kb.

A summary of the genomic distribution of recombinations and the proportion of the sequence involved is provided in Figure 2. Bars depict the relative proportions of the number of recombinations and the corresponding proportions of the sequence in regions characterized by different intensities of ρ_obs (at ρ_obs = 0 and in intervals of ρ_obs from >0 to 1, from >1 to 2, etc.). In turn, lines represent the cumulative proportions of the recombinations and of the sequence as a function of increasing ρ_obs. In EAs, we found no recombinations in 38% of the sequence (ρ_obs = 0); 57% of all recombinations were found in 54% of the sequence characterized by ρ_obs ≤ 1, but >0, and the remaining recombinations (43%) were found in 8% of the sequence where ρ_obs-intensities were >1. In AAs, 25% of the sequence showed no trace of recombination (ρ_obs = 0); in 48% of the sequence ρ_obs was <1 but >0 and accounted for 32% of all recombination events. In an additional 17% of the sequence, with ρ_obs between 1 and 2, we found 27% of all recombinations, yielding a total of 59% of all recombinations detected in the AA sample at ρ_obs < 2. The remaining 41% of recombinations occurred in the remaining 10% of the sequence where ρ_obs intensities were >2. Looking at it from a different angle, we found that in both population samples 50–60% of the sequence accounted for only 10% of the recombination events. Most recombinations, from 60 to 80%, took place in 10–30% of the sequence. These recombinations occurred in regions with ρ_obs-intensities greater than the population average.
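The cumulative summaries above (for example, that most recombinations fall in a small fraction of the sequence) amount to ranking sequence by ρ_obs and accumulating length and recombinations in parallel. The sketch below assumes that recombinations in a segment are proportional to ρ_obs times its length; the segment profile and the helper name are hypothetical.

```python
def sequence_share_for(segments, target_share):
    """Smallest fraction of sequence, taken from the highest-rho_obs segments
    downward, that contains at least target_share of all recombinations.

    segments: (length_kb, rho_obs) pairs; recombinations in a segment are
    taken as proportional to rho_obs * length.
    """
    total_len = sum(length for length, _ in segments)
    total_events = sum(length * rho for length, rho in segments)
    acc_len = acc_events = 0.0
    for length, rho in sorted(segments, key=lambda s: s[1], reverse=True):
        acc_len += length
        acc_events += length * rho
        if acc_events >= target_share * total_events:
            return acc_len / total_len
    return 1.0


# Hypothetical profile: one intense 1-kb peak, a moderate stretch, a coldspot.
profile = [(1.0, 10.0), (4.0, 1.0), (5.0, 0.0)]
print(sequence_share_for(profile, 0.5), sequence_share_for(profile, 0.9))  # prints 0.1 0.5
```

The answer is resolved only at the granularity of whole segments; finer ρ_obs profiles give smoother cumulative curves.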

The picture of a quasi-continuous transition from low- to high-ρ_obs sequence regions in Figure 2 was obtained by pooling data from many genomic segments. Such pooling conceals recombination hotspots, i.e., peaks of ρ_obs-intensity that often occur against a background of low or no recombination signal, as seen in Figure 1. To further test the performance of InfRec when detecting recombination hotspots in the experimental data, we used it to infer recombination density profiles in two chromosomal regions, MHCII and MS32, where recombination hotspots were characterized experimentally by sperm typing and for which population diversity data also existed (Jeffreys et al. 2001, 2005). As demonstrated in Figure 3, InfRec analysis does a good job of capturing the variation in recombination intensity in these regions. The peaks of ρ_obs coincide with the experimental hotspots indicated by the arrows. In MHCII, InfRec detected two other peaks in addition to those detected experimentally (see Figure 3): one to the left of the DMB1 hotspot, in the middle of the sequence, and another to the right of the DMB2 peak, shown earlier by others (Li et al. 2006).

Figure 3.—InfRec analysis of recombination hotspots in the MHCII region and in the MS32 region, using population diversity data obtained for 50 and 80 individuals, respectively (http://www.le.ac.uk/ge/ajj/). The solid blue lines indicate the distribution of ρ_obs-density along the sequence and the dotted green lines denote the distribution of R_min found using RecMin. The black marks in the middle represent positions of the polymorphic sites and the arrows indicate the positions of “experimental” hotspots found by sperm typing (from left to right: DNA1, DNA2, DNA3, DMB1, and DMB2 in the MHCII locus and NID3, double-peak NID2a–b, NID1, MS32, MSTM1a, MSTM1b, and MSTM2 in the MS32 locus).

In our analysis, we defined hotspots as sequence segments with crossover rates >5 kb⁻¹ in AAs and >2.5 kb⁻¹ in EAs, i.e., as recombination intensity peaks exceeding six to seven times the population average ρ_obs. InfRec detected 34 such hotspots in AAs and 22 in EAs (see Table 1). Their average widths above the threshold intensity were 0.82 and 0.94 kb, respectively. They were found in <25% of the analyzed genes, on average once every 47 kb in AAs and once every 59 kb in EAs, corresponding to hotspot densities of 68,000 and 54,000 hotspots per genome, respectively. In turn, recombination coldspots were defined as segments with no observed recombination (R_min = 0). These coldspots were seen in 85% of the genes, with average lengths of 1.58 kb in AAs and 2.48 kb in EAs. In total, they covered 410 kb in AAs and 490 kb in EAs, corresponding to the 25 and 38% of the sequence at ρ_obs = 0 in Figure 2. They occurred at a density of about once every 6.5 kb in both populations. Because coldspots are about one-third shorter in AAs than in EAs, their size seems to be a better predictor than hotspot density of the linkage-disequilibrium (LD) landscape, with fewer extended LD regions in Africans than in non-Africans (Reich et al. 2001).
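Calling hotspots from a sliding-window profile, and scaling their density to a whole genome, can be sketched as follows. The helper names are hypothetical, and the 3.2-Gb genome size is our assumption; with it, 34 hotspots in 1605 kb and 22 in 1300 kb reproduce the ∼68,000 and ∼54,000 hotspots per genome quoted above.

```python
def call_hotspots(positions_kb, rho, threshold):
    """Runs of consecutive window points with rho_obs above threshold.

    Returns (start_kb, end_kb) for each run; end - start is a rough width
    above the threshold.
    """
    spots, start, prev = [], None, None
    for x, r in zip(positions_kb, rho):
        if r > threshold:
            if start is None:
                start = x
        elif start is not None:
            spots.append((start, prev))
            start = None
        prev = x
    if start is not None:
        spots.append((start, prev))
    return spots


def hotspots_per_genome(n_hotspots, coverage_kb, genome_kb=3.2e6):
    """Scale an observed hotspot density to a whole genome (size assumed)."""
    return genome_kb * n_hotspots / coverage_kb


print(call_hotspots([0, 1, 2, 3, 4, 5], [0, 6, 7, 0, 8, 0], 5.0))  # prints [(1, 2), (4, 4)]
print(round(hotspots_per_genome(34, 1605), -3), round(hotspots_per_genome(22, 1300), -3))
```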

InfRec performance:

Using average ρ_obs-estimates (Table 1) and the recombination rate r = 1.2 × 10⁻⁸/bp/generation (Kong et al. 2002), we obtained effective population size estimates of 17,700 for AAs and 7700 for EAs. These estimates were in good agreement with N_e estimates from other measures of genetic diversity, which were typically in the range of 10,000. However, because RecMin estimates represent the minimum number of recombinations that can be inferred from the data (Myers and Griffiths 2003), our estimates of ρ_obs were also expected to be conservative. Moreover, when InfRec was applied to simulated data, the resulting overall ρ_obs typically represented only about one-third of the input ρ (Table 2). With ρ of 0.48/kb (given N_e of 10,000 and r as above), one would therefore expect a ρ_obs of ∼0.16/kb from InfRec. Thus, one may ask where the apparent excess of ρ_obs in the InfRec estimates came from. Recombination artifacts due to erroneous phasing of the haplotypes by PHASE cannot explain it: on the basis of the simulations, they could account for no more than a 16% increase in ρ_obs. Correcting for this effect left ρ_obs at 0.73 in AAs and 0.32 in EAs, still inflated with respect to the value calculated above. Over short genomic distances, however, gene conversions may contribute significantly to the overall count of crossovers (Frisse et al. 2001; Przeworski and Wall 2001; Jeffreys and May 2004; Padhukasahasram et al. 2004).
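The arithmetic in this paragraph follows from inverting ρ = 4Nr after converting ρ from per kilobase to per base pair. A minimal sketch (helper names hypothetical):

```python
R_PER_BP = 1.2e-8  # recombination rate per bp per generation (Kong et al. 2002)


def effective_size(rho_per_kb, r_per_bp=R_PER_BP):
    """Invert rho = 4 N r, converting rho from per kb to per bp."""
    return (rho_per_kb / 1000.0) / (4.0 * r_per_bp)


def expected_rho_per_kb(n_e, r_per_bp=R_PER_BP):
    """Forward direction: rho = 4 N r, expressed per kb."""
    return 4.0 * n_e * r_per_bp * 1000.0


print(round(effective_size(0.85)), round(effective_size(0.37)))  # AAs, EAs
print(round(expected_rho_per_kb(10_000), 2))  # rho for N_e = 10,000
print(round(0.85 / 1.16, 2), round(0.37 / 1.16, 2))  # after removing a 16% phasing excess
```

The same forward relation gives the ρ = 0.6/kb used for the AA simulations when N_e = 12,500.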

TABLE 2.

InfRec results for msHot simulated data based on 1000 simulations of a 200-kb segment for 100 chromosomes using θ = 0.73/kb for AAs and θ = 0.59/kb for EAs

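Top half
Population	Simulated ρ/kb	Simulated γ/ρ (none = no gene conversion)	InfRec ρ_obs/kb
AAs	2.03	none	0.83
AAs	0.6	none	0.22
AAs	0.6	0.5	0.34
AAs	0.6	1	0.47
AAs	0.6	2	0.74
AAs	0.6	2.4	0.85
EAs	0.91	none	0.35
EAs	0.3	none	0.11
EAs	0.3	0.5	0.15
EAs	0.3	1	0.21
EAs	0.3	2	0.32
EAs	0.3	2.4	0.37

Bottom half
Population	Simulated ρ/kb	Simulated γ/ρ	Hotspot size	HQ (%)	InfRec ρ_obs/kb	Power (%)	FP/Mb
AAs	0.6	2.4	1 kb	70	0.79	79	0.48
AAs	0.6	2.4	1 kb	90	0.68	80	0.08
AAs	0.6	2.4	2 kb	70	0.90	85	0.6
AAs	0.6	2.4	2 kb	90	0.84	87	0.06
EAs	0.3	2.4	1 kb	70	0.39	84	1.67
EAs	0.3	2.4	1 kb	90	0.37	85	0.27
EAs	0.3	2.4	2 kb	70	0.46	87	1.41
EAs	0.3	2.4	2 kb	90	0.44	89	0.32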


The distribution of gene conversion intensity (γ) follows that of reciprocal recombination (ρ). Window size was 8 segregating sites, and hotspots were defined as intensity peaks of ρ_obs > 5.0/kb in AAs and > 2.5/kb in EAs. FP/Mb, false positives per megabase.

We evaluated the contribution of gene conversions to ρ_obs by InfRec analysis of the msHot simulated data (Table 2). For AA simulations, the overall ρ was set at 0.6/kb, assuming an effective population size of 12,500 and a recombination rate of 1.2 × 10⁻⁸/bp/generation. Half of this value, ρ = 0.3/kb, was used for EAs. This EA/AA ratio is lower than the ratio of ∼0.8 between the respective Θ's (Table 1). Halving the African ρ in non-Africans was justified on three grounds:
(i) the experimental data (Stumpf and McVean 2003; Ptak et al. 2004; Fearnhead and Smith 2005; Serre et al. 2005; Table 1);
(ii) cosi best-fit simulations (Schaffner et al. 2005) modeling the out-of-Africa bottleneck (data not shown; see also Table 3);
(iii) a recent analysis by Thornton and Andolfatto (2006) showing that a population bottleneck causes a more dramatic reduction in ρ than in nucleotide diversity. After a bottleneck, the difference in ρ's is therefore expected to be much greater than that in the corresponding Θ's.
The choice of the remaining simulation parameters was guided by Equation 5. Note that in the sliding-window approach, the window size determines the number of segregating sites per analyzed segment. Therefore, Watterson's estimate of Θ (Watterson 1975) can be used to compute, for a given window size, the expectations of FIG and FIR (Equations 2 and 3) and their ratio E(FIG)/E(FIR). We assume that this ratio is a good reflection of the ratio of the fraction of new gene conversions (FNG) to FNR. This window-size-based Θ is related to L, the average window length (Table 1), through the nucleotide diversity π = Θ/L. The FIG/FIR ratio is sensitive to the converted tract length t (set to 500 bp). As a result, the relative contribution of gene conversions to the overall count of recombination events is expected to be higher in populations with higher π (e.g., in AAs compared with EAs).
As predicted, using Equation 5 with the parameters above and f = γ/ρ ≈ 2.4, we obtained a very good correspondence between the simulated and the experimental data for both population samples (Table 2). However, this value was obtained using a simple population model and an arbitrary (though realistic) t = 500 bp, and thus it should be considered with caution (Ptak et al. 2004). An equivalent increase in ρ_obs could also be obtained without gene conversion by raising the input ρ alone (to 2.03/kb in AAs and 0.91/kb in EAs; Table 2), but such values are not realistic in light of the experimental data. In conclusion, coalescent simulations that account for gene conversions explain well the apparent excess of ρ_obs in AAs and EAs described above.
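The link between window size, Watterson's Θ and average window length can be sketched as follows. This is an illustration under stated assumptions, not the InfRec code:

- Equations 2, 3 and 5 are not reproduced.
- The sample sizes assume 48 chromosomes for the 24 AAs and 46 for the 23 EAs.
- The per-kb θ values from the Table 2 caption stand in for π. Under the standard neutral model, θ and π share the same expectation.

```python
def watterson_a(n_sequences: int) -> float:
    """Watterson's constant a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n_sequences))


def watterson_theta(segregating_sites: int, n_sequences: int) -> float:
    """Watterson's (1975) estimator of Theta for a segment with S segregating sites."""
    return segregating_sites / watterson_a(n_sequences)


def mean_window_length_kb(window_sites: int, n_sequences: int, pi_per_kb: float) -> float:
    """Average window length L obtained from pi = Theta / L, i.e., L = Theta / pi."""
    return watterson_theta(window_sites, n_sequences) / pi_per_kb


WINDOW_SITES = 8  # window size used in this study (segregating sites per window)
# ASSUMPTIONS: two chromosomes per individual; per-kb theta (Table 2 caption) used as pi
samples = {"AAs": (48, 0.73), "EAs": (46, 0.59)}

for pop, (n, pi) in samples.items():
    theta_w = watterson_theta(WINDOW_SITES, n)
    length = mean_window_length_kb(WINDOW_SITES, n, pi)
    print(f"{pop}: Theta per window ~ {theta_w:.2f}, mean window length ~ {length:.2f} kb")
```

Each window holds a fixed number of segregating sites. As a result, the more diverse AA sample yields physically shorter windows, about 2.5 kb versus about 3.1 kb here.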

Hotspots:

The same starting parameters were used to simulate data with recombination hotspots, using HQs of 90 and 70% and hotspot widths of 1 and 2 kb. As reported in Table 2 (bottom half), the power to detect hotspots was between 79 and 89%, with a false-positive (FP) rate of 0.06–1.67/Mb. In addition, we compared the performance of InfRec in detecting hotspots with the methods described by Fearnhead and Smith (2005), Fearnhead (2006) and Li et al. (2006), using the same simulated data sets as these authors. Table 3 (bottom half) shows that InfRec was as efficient in detecting hotspots in a simulated HapMap ENCODE data set as the SLD method of Fearnhead (2006) and the HF method of Li et al. (2006). This was also true for the simulated SeattleSNPs data (Table 3, top half). However, both the detection power and the FP rate clearly depend on the hotspot definition. Power increases when hotspots are defined as intensity peaks of five or more times the overall ρ_obs, as compared with the more stringent definition of seven or more times, without a significant increase in the FP rate. Interestingly, in the SeattleSNPs data sets simulated by Fearnhead and Smith (2005), the background ρ was set at the genomic average recombination rate (Kong et al. 2002), with hotspots superimposed on it. These data sets therefore mimic genomes with a higher overall recombination rate than the human genome.
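Power and FP rate can be computed from simulated data once a rule for matching detections to true hotspots is fixed. The sketch below uses a simple overlap rule. That rule is our assumption, not the matching criterion behind Tables 2 and 3. Intervals are (start, end) positions in kb, and all names and positions are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_kb, end_kb), half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share any sequence."""
    return a[0] < b[1] and b[0] < a[1]


def power_and_fp_per_mb(true_hotspots: List[Interval],
                        detected: List[Interval],
                        analyzed_kb: float) -> Tuple[float, float]:
    """Power = share of true hotspots hit by >= 1 detection.
    FP/Mb = detections that hit no true hotspot, per Mb analyzed.
    ASSUMPTION: 'hit' means any overlap; the published matching rule may differ."""
    hits = sum(any(overlaps(t, d) for d in detected) for t in true_hotspots)
    false_pos = sum(not any(overlaps(d, t) for t in true_hotspots) for d in detected)
    power = hits / len(true_hotspots) if true_hotspots else float("nan")
    return power, false_pos / (analyzed_kb / 1_000)


# Hypothetical 200-kb simulated segment with two true hotspots
truth = [(50.0, 51.0), (120.0, 122.0)]
calls = [(50.5, 51.5), (150.0, 151.0)]
power, fp = power_and_fp_per_mb(truth, calls, analyzed_kb=200.0)
print(f"power = {power:.0%}, FP = {fp:.1f}/Mb")
```

In this toy case, one of two hotspots is found and one call lands outside any hotspot, giving 50% power and 5 FP/Mb. FP/Mb can swing widely on short segments, which is why it is averaged over many simulations.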

We also compared InfRec determinations with those of RecSlider, at the same window size of 8, in a subset of 66 genes common to the studies by Crawford et al. (2004) and De Silva et al. (2004). As shown in the supplemental material (supplemental Figure S4), both approaches revealed the same pattern of variation in recombination intensity along the sequence. At the same time, however, ρ_obs-determinations by RecSlider were much greater, especially at high-intensity peaks. This can be partly explained by two features of InfRec. First, its estimates are conservative because they depend on RecMin. Second, its FNR cutoff eliminated regions of minimal informativeness. In spite of these quantitative differences, InfRec and RecSlider revealed the same landscape of recombination density variation at the sequence level.

The number and location of hotspots so defined were also compared with the results of two other studies (Crawford et al. 2004; De Silva et al. 2004) for a subset of 66 overlapping genes. Crawford et al. (2004) used a method implemented in the program PHASE 2.1.1 (Li and Stephens 2003). They defined a hotspot by its λ-intensity (10 < λ < 100) relative to the surrounding recombination intensity. They then used the product of approximate conditionals to evaluate the probability that a hotspot is present in the segment under study. De Silva et al. (2004) used the method of McVean et al. (2004). This method implements the composite-likelihood approach of Hudson (2001), with recombination rates varying between pairs of segregating sites and other adjustments that encourage smoothness of the ρ-estimates. Here again, hotspots were defined as positions in the genome with evidence of a greater than fourfold increase in the local recombination rate compared with the flanking regions. Both of these studies and InfRec detected more hotspots in the AA than in the EA sample (Table 4). However, De Silva et al. (2004) found 29 genes with hotspots in AAs and Crawford et al. (2004) found 28, and only 16 of these were in common. This amounts to 55% (16/29) when comparing the first study with the second, or 57% (16/28) when comparing the second with the first. In the EA sample, the concordance between these two methods was better, 76 and 73%, respectively. The comparison with InfRec is “one-sided”: because InfRec revealed fewer genes with hotspots, concordance is higher from its side. Of the InfRec hotspot genes, 61–87% were also seen by De Silva et al. (2004) and 70–80% by Crawford et al. (2004).
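The one-sided concordance percentages are ratios of shared hotspot genes to the genes flagged by the reference method. The sketch below reproduces the 55 and 57% figures for AAs. The gene identifiers (g0, g1, ...) are hypothetical placeholders; only the set sizes (29, 28 and 16 shared) come from the text.

```python
def concordance(reference: set, other: set) -> float:
    """Share of genes flagged by `reference` that are also flagged by `other`."""
    return len(reference & other) / len(reference)


# Hypothetical gene IDs; only the set sizes (29, 28, 16 shared) come from the text
de_silva_aa = {f"g{i}" for i in range(0, 29)}    # 29 genes
crawford_aa = {f"g{i}" for i in range(13, 41)}   # 28 genes, 16 shared (g13..g28)

print(f"{concordance(de_silva_aa, crawford_aa):.0%}")  # De Silva vs. Crawford
print(f"{concordance(crawford_aa, de_silva_aa):.0%}")  # Crawford vs. De Silva
```

Because the denominator is the reference set, the measure is asymmetric. A method that flags fewer genes, as InfRec does, tends to score higher when it is used as the reference.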

TABLE 4.

Comparison of different methods of hotspot inference in the same sample of 66 genes

No. of hotspot genes	De Silva et al. (2004)	Crawford et al. (2004)	InfRec
AA exclusively	14	9	11
EA exclusively	6	3	3
Both populations	15	19	12
Total out of 66	35	31	26


DISCUSSION

It has been shown by various means that the recombination rate is not constant along the human genome. Nonrecurrent mutations are always revealed by the presence of a new allele, whereas recombinations are detected only indirectly. They can be seen in the “historical record” of population diversity data only if they occur against an informative background. Our method provides a way to correct for the extent of this informativeness. The inference of recombination intensity relies on the minimum number of historical recombinations (R_min), evaluated from the four-gamete test (Myers and Griffiths 2003) and weighted by FNR. The ascertainment of R_min depends on the density of the surrounding segregating sites. FNR corrects for this dependence and provides a means to evaluate and control the reliability of ρ-estimates. Here, on the basis of an empirical analysis (supplemental Figure S2), we decided to discard windows below a certain FNR threshold. The importance of FNR is apparent in the comparative analysis with RecSlider, where unusually high peaks of ρ-intensity were associated with very low FNR values (supplemental Figure S4). Using LDhat, Tiemann-Boege et al. (2006) also “detected” recombination peaks in segments of low polymorphic content that were subsequently rejected by sperm typing (see region 40–60 kb in Figure 2B of Tiemann-Boege et al. 2006). In conclusion, ρ-estimates in regions of low informativeness are likely to represent false-positive peaks or at least to be highly unreliable. The FNR parameter describes the informativeness of the data. It could thus be used in other algorithms as well, to identify genomic regions where there is little ground for recombination inference. We also tested whether ρ_obs-estimates by InfRec were inversely correlated with the underlying FNR. There was indeed a weak tendency toward higher ρ at lower FNR (R² = 0.074 in AAs and R² = 0.036 in EAs, supplemental Figure S5). This can be ascribed to the depression of FNR relative to FIR observed in regions of high crossover activity (Figure 1). The effect is better seen in a plot of ρ_obs as a function of the FNR/FIR ratio (R² = 0.21 in AAs and R² = 0.13 in EAs, supplemental Figure S6). At the same time, there was no correlation between ρ_obs values and FIR (R² < 0.005, supplemental Figure S6). With an appropriate FNR threshold, InfRec estimates of the recombination rate therefore do not appear to be significantly biased by the informativeness of the segment.
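For readers unfamiliar with the four-gamete test, the sketch below computes the classic Hudson and Kaplan (1985) lower bound R_m. Every pair of sites showing all four gametes (00, 01, 10, 11) requires at least one recombination between them. A greedy sweep then counts the minimum number of breakpoints that fall within every such incompatible interval.

This is an illustration only. RecMin (Myers and Griffiths 2003) computes tighter bounds than this, and the FNR weighting used by InfRec is not shown. Haplotypes are assumed to be equal-length strings of biallelic 0/1 alleles.

```python
from itertools import combinations
from typing import List, Tuple


def incompatible(haplotypes: List[str], i: int, j: int) -> bool:
    """Four-gamete test: True if sites i and j carry all of 00, 01, 10 and 11."""
    return len({(h[i], h[j]) for h in haplotypes}) == 4


def hudson_kaplan_rm(haplotypes: List[str]) -> int:
    """Hudson-Kaplan (1985) lower bound on the number of recombinations.

    ASSUMPTION: haplotypes are equal-length strings of biallelic '0'/'1' sites."""
    n_sites = len(haplotypes[0])
    intervals: List[Tuple[int, int]] = [
        (i, j) for i, j in combinations(range(n_sites), 2)
        if incompatible(haplotypes, i, j)
    ]
    # Greedy sweep by right end: place a breakpoint just left of site `end`;
    # an interval starting at or beyond that site needs a new breakpoint.
    count, last_end = 0, -1
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:
            count += 1
            last_end = end
    return count


print(hudson_kaplan_rm(["00", "01", "10", "11"]))      # 1
print(hudson_kaplan_rm(["000", "011", "101", "110"]))  # 2
```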

Finally, the yield of InfRec was evaluated by the analysis of simulated data. Given that yield, accounting for the total R_obs (or overall genomic ρ_obs, Table 1) required the inclusion of gene conversion events in the count of historical recombinations. On the basis of simulations using a simple population model, we obtained an average ratio of gene conversions to simple crossovers (f) of 2.4. This ratio correctly reproduced the data for both analyzed population samples. The value lies within the range of f estimated by other authors (Ardlie et al. 2001; Frisse et al. 2001; Padhukasahasram et al. 2004; Ptak et al. 2004), but it certainly also depends on the assumed fixed parameters (such as t) and on the population model. Other authors have reported, using different approaches, that the AA/EA ratio of ρ_obs exceeds the AA/EA ratio of the corresponding Θ's (De Silva et al. 2004; Myers et al. 2005; Serre et al. 2005). We observed the same. As shown by Thornton and Andolfatto (2006), this can be explained by the out-of-Africa bottleneck, which reduced average genomic ρ much more drastically than genetic diversity as measured by Θ.
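The population contrast in the last point can be made concrete with values reported in this article. The short calculation below uses the overall ρ_obs estimates and the θ values from the Table 2 caption. It is only arithmetic on those numbers.

```python
rho_obs = {"AAs": 0.85, "EAs": 0.37}  # overall rho_obs per kb (this study)
theta = {"AAs": 0.73, "EAs": 0.59}    # per-kb theta used for the Table 2 simulations

rho_ratio = rho_obs["AAs"] / rho_obs["EAs"]
theta_ratio = theta["AAs"] / theta["EAs"]
print(f"AA/EA rho_obs ratio: {rho_ratio:.2f}; AA/EA theta ratio: {theta_ratio:.2f}")
```

The AA/EA ρ_obs ratio is about 2.3, whereas the Θ ratio is only about 1.24.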

The terms recombination “hotspot” and “coldspot” are used to describe regions of higher and lower crossover activity. However, these terms can be elusive, as they are sensitive to the underlying assumptions about what constitutes a hotspot (Hellenthal and Stephens 2006). When hotspots are defined relative to local recombination rates, a segment of DNA that recombines at the genome average rate can be considered a hotspot if it is embedded in a very “cold” region of the genome (De Massy 2003). Using the surrounding region as a reference to identify a hotspot may thus explain certain discordances among studies and algorithms (e.g., Figure 2B in Tiemann-Boege et al. 2006). In contrast, our study characterized the recombination rate in absolute terms, comparable throughout the genome (Kong et al. 2002), rather than relative to local fluctuations. We therefore used the genomic average as the reference to define hotspot regions. This may partly explain differences between InfRec hotspot identifications and those of methods whose definitions are based on relative rate intensity. Nevertheless, the results of De Silva et al. (2004), Crawford et al. (2004) and this study show an overall concordance of >50%. All three approaches, while differing in detail, therefore detected the underlying genomic recombination pattern, with most crossovers taking place in narrow sequence segments. At the same time, this shows that the definitions of hotspots and coldspots are arbitrary and subjective, which makes comparisons difficult, as discussed by Hellenthal and Stephens (2006). Characterizing the distribution of the whole spectrum of ρ-intensities along the sequence, as proposed here, may therefore be more practical. It is also more directly related to the extent of LD and haplotype coverage.
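The difference between relative and absolute hotspot definitions can be illustrated with a toy rate profile. In the hypothetical sketch below, one window recombines at the genome-average rate inside a cold region. A fourfold-over-flanks rule, in the spirit of De Silva et al. (2004), flags it as a hotspot. A threshold tied to the genome average, as used here, does not. The fold values, flank width and rates are illustrative assumptions.

```python
from typing import List


def absolute_hotspots(rates: List[float], genome_mean: float, fold: float) -> List[int]:
    """Indices of windows whose rate exceeds `fold` x a genome-wide average."""
    return [i for i, r in enumerate(rates) if r > fold * genome_mean]


def relative_hotspots(rates: List[float], fold: float, flank: int = 2) -> List[int]:
    """Indices of windows whose rate exceeds `fold` x the mean of up to
    `flank` neighboring windows on each side."""
    hits = []
    for i, r in enumerate(rates):
        neighbors = rates[max(0, i - flank):i] + rates[i + 1:i + 1 + flank]
        if neighbors and r > fold * sum(neighbors) / len(neighbors):
            hits.append(i)
    return hits


# Hypothetical profile (rho per kb): one window at the genome average in a cold region
profile = [0.1, 0.1, 1.0, 0.1, 0.1]
print(absolute_hotspots(profile, genome_mean=1.0, fold=6))  # []
print(relative_hotspots(profile, fold=4))                   # [2]
```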

In a recent analysis of a 100-kb segment of chromosome 21 by sperm typing, Tiemann-Boege et al. (2006) estimated that 71% of recombinations occur in ∼12% of the sequence. Likewise, McVean et al. (2004) estimated that ∼50% of recombinations occur in 10% of the sequence. This is not far from the more recent estimate by Myers et al. (2005) of ∼80% of recombinations in 10–20% of the sequence. The InfRec analysis indicates that about half of recombinations take place in 10% of the sequence and 80% in one-third of it, in good agreement with these estimates.
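Statements such as “half of recombinations in 10% of the sequence” can be computed by ranking segments by intensity and accumulating their expected recombinations. The sketch below assumes non-overlapping segments described by (length in kb, ρ per kb). The sliding windows used by InfRec overlap, so this is a simplification. The toy data are hypothetical.

```python
from typing import List, Tuple


def sequence_fraction_for(segments: List[Tuple[float, float]], target: float) -> float:
    """Smallest fraction of total length that, taking the most intense segments
    first, holds at least `target` of all expected recombinations.

    segments: (length_kb, rho_per_kb) pairs, assumed non-overlapping."""
    total_len = sum(length for length, _ in segments)
    total_rec = sum(length * rho for length, rho in segments)
    covered_len = covered_rec = 0.0
    for length, rho in sorted(segments, key=lambda s: s[1], reverse=True):
        covered_len += length
        covered_rec += length * rho
        if covered_rec >= target * total_rec:
            break
    return covered_len / total_len


# Hypothetical profile: ten 1-kb segments, one hot (rho = 9/kb) and nine at 1/kb
toy = [(1.0, 9.0)] + [(1.0, 1.0)] * 9
print(sequence_fraction_for(toy, 0.5))  # 0.1
print(sequence_fraction_for(toy, 0.8))  # 0.7
```

The result is only as fine as the segments. A whole segment is counted as soon as it is needed, so finer segmentation gives smaller, more precise fractions.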

Overall, InfRec estimates were conservative, certainly because of the conservative lower-bound estimate of R_min (Myers and Griffiths 2003; Hein et al. 2005). InfRec performance is defined as the ratio of ρ_obs to the input ρ. This performance, and the relative contribution of gene conversions, can easily be evaluated and even “calibrated” by simulations such as those reported in Table 2. Such simulations can vary the Θ-value (i.e., polymorphic site density), the window size, the population sample and the demographic history. However, InfRec implicitly assumes, as do other similar methods, that locus diversity in the past (and thus its capacity to record recombinations) did not differ substantially from what is observed at present. InfRec has the advantage of being fast, simple, conservative and free of sample or data size limitations. It is fast because no simulations are done and no MCMC models are used to recreate the whole genetic history of the sequence. It is simple because the theory is intuitive, the calculations are straightforward, and the results are transparent and can easily be related to the underlying diversity data. It also avoids extrapolation when no information is available. Finally, theoretical expectations of FIR and FIG can be calculated to help analyze a particular data set. They could also guide further developments that include an explicit analysis of the balance between simple recombinations and gene conversions.
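As an example of such calibration, the top half of Table 2 can be used to read off two quantities. The first is the gene conversion ratio f that reproduces an observed ρ_obs. The second is the InfRec yield without gene conversion. The sketch below uses plain linear interpolation and treats the no-gene-conversion simulations as f = 0. Both choices are our assumptions, and the interpolation is meaningful only within the simulated range.

```python
from typing import Dict, List

# Top half of Table 2: InfRec rho_obs (per kb) at fixed input rho and increasing gamma/rho.
# ASSUMPTION: the simulations without gene conversion are entered as gamma/rho = 0.
F_GRID = [0.0, 0.5, 1.0, 2.0, 2.4]
SIM_RHO_OBS: Dict[str, List[float]] = {
    "AAs": [0.22, 0.34, 0.47, 0.74, 0.85],  # input rho = 0.6/kb
    "EAs": [0.11, 0.15, 0.21, 0.32, 0.37],  # input rho = 0.3/kb
}
INPUT_RHO = {"AAs": 0.6, "EAs": 0.3}
OBSERVED_RHO_OBS = {"AAs": 0.85, "EAs": 0.37}


def interp(x: float, xs: List[float], ys: List[float]) -> float:
    """Piecewise-linear interpolation; xs must be increasing. Clamps outside the range."""
    if x <= xs[0]:
        return ys[0]
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return ys[-1]


for pop in ("AAs", "EAs"):
    f_hat = interp(OBSERVED_RHO_OBS[pop], SIM_RHO_OBS[pop], F_GRID)
    yield_no_gc = SIM_RHO_OBS[pop][0] / INPUT_RHO[pop]
    print(f"{pop}: f ~ {f_hat:.1f}; InfRec yield without gene conversion ~ {yield_no_gc:.2f}")
```

The yield of about 0.37 in both samples is the “about one-third” of the input ρ noted earlier.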

Acknowledgments

We are grateful to Eric De Silva and Matthew Stephens for providing their results, David Witonsky for help with the RecSlider program, Stephen F. Schaffner for the cosi package, and anonymous reviewers for critical comments and encouragement. This work was supported by Genome Quebec and Genome Canada as well as by the Canadian Institutes of Health Research (MOP-67150).

References

Andrés, A. M., A. G. Clark, L. Shimmin, E. Boerwinkle, C. F. Sing et al., 2007. Understanding the accuracy of statistical haplotype inference with sequence data of known phase. Genet. Epidemiol. 31 659–671.
Ardlie, K., S. N. Liu-Cordero, M. A. Eberle, M. Daly, J. Barrett et al., 2001. Lower-than-expected linkage disequilibrium between tightly linked markers in humans suggests a role for gene conversion. Am. J. Hum. Genet. 69 582–589.
Broman, K. W., J. C. Murray, V. C. Sheffield, R. L. White and J. L. Weber, 1998. Comprehensive human genetic maps: individual and sex-specific variation in recombination. Am. J. Hum. Genet. 63 861–869.
Chakravarti, A., K. H. Buetow, S. E. Antonarakis, P. G. Waber, C. D. Boehm et al., 1984. Nonuniform recombination within the human beta-globin gene cluster. Am. J. Hum. Genet. 36 1239–1258.
Crawford, D. C., T. Bhangale, N. Li, G. Hellenthal, M. J. Rieder et al., 2004. Evidence for substantial fine-scale variation in recombination rates across the human genome. Nat. Genet. 36 700–706.
De Massy, B., 2003. Distribution of meiotic recombination sites. Trends Genet. 19 514–522.
De Silva, E., L. A. Kelley and M. P. Stumpf, 2004. The extent and importance of intragenic recombination. Hum. Genomics 1 410–420.
Fearnhead, P., 2006. SequenceLDhot: detecting recombination hotspots. Bioinformatics 22 3061–3066.
Fearnhead, P., and P. Donnelly, 2001. Estimating recombination rates from population genetic data. Genetics 159 1299–1318.
Fearnhead, P., and N. G. Smith, 2005. A novel method with improved power to detect recombination hotspots from polymorphism data reveals multiple hotspots in human genes. Am. J. Hum. Genet. 77 781–794.
Fearnhead, P., R. M. Harding, J. A. Schneider, S. Myers and P. Donnelly, 2004. Application of coalescent methods to reveal fine-scale rate variation and recombination hotspots. Genetics 167 2067–2081.
Frisse, L., R. R. Hudson, A. Bartoszewicz, J. D. Wall, J. Donfack et al., 2001. Gene conversion and different population histories may explain the contrast between polymorphism and linkage disequilibrium levels. Am. J. Hum. Genet. 69 831–843.
Hein, J., M. Schierup and C. Wiuf, 2005. Gene Genealogies, Variation and Evolution. A Primer in Coalescent Theory. Oxford University Press, Oxford.
Hellenthal, G., and M. Stephens, 2006. Insights into recombination from population genetic variation. Curr. Opin. Genet. Dev. 16 565–572.
Hellenthal, G., and M. Stephens, 2007. msHOT: modifying Hudson's ms simulator to incorporate crossover and gene conversion hotspots. Bioinformatics 23 520–521.
Hubert, R., M. MacDonald, J. Gusella and N. Arnheim, 1994. High resolution localization of recombination hot spots using sperm typing. Nat. Genet. 7 420–424.
Hudson, R. R., 1990. Gene genealogies and the coalescent process. Oxf. Surv. Evol. Biol. 7 1–44.
Hudson, R. R., 2001. Two-locus sampling distributions and their application. Genetics 159 1805–1817.
Hudson, R. R., and N. L. Kaplan, 1985. Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111 147–164.
Jeffreys, A. J., and C. A. May, 2004. Intense and highly localized gene conversion activity in human meiotic crossover hot spots. Nat. Genet. 36 151–156.
Jeffreys, A. J., L. Kauppi and R. Neumann, 2001. Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat. Genet. 29 217–222.
Jeffreys, A. J., R. Neumann, M. Panayi, S. Myers and P. Donnelly, 2005. Human recombination hot spots hidden in regions of strong marker association. Nat. Genet. 37 601–606.
Kauppi, L., A. J. Jeffreys and S. Keeney, 2004. Where the crossovers are: recombination distributions in mammals. Nat. Rev. Genet. 5 413–424.
Kong, A., D. F. Gudbjartsson, J. Sainz, G. M. Jonsdottir, S. A. Gudjonsson et al., 2002. A high-resolution recombination map of the human genome. Nat. Genet. 31 241–247.
Labuda, D., C. Labbe, S. Langlois, J.-F. Lefebvre, V. Freytag et al., 2007. Patterns of variation in DNA segments upstream of transcription start sites. Hum. Mutat. 28 441–450.
Li, J., M. Q. Zhang and X. Zhang, 2006. A new method for detecting human recombination hotspots and its applications to the HapMap ENCODE data. Am. J. Hum. Genet. 79 628–639.
Li, N., and M. Stephens, 2003. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165 2213–2233.
McVean, G., P. Awadalla and P. Fearnhead, 2002. A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 160 1231–1241.
McVean, G. A., S. R. Myers, S. Hunt, P. Deloukas, D. R. Bentley et al., 2004. The fine-scale structure of recombination rate variation in the human genome. Science 304 581–584.
Myers, S., L. Bottolo, C. Freeman, G. McVean and P. Donnelly, 2005. A fine-scale map of recombination rates and hotspots across the human genome. Science 310 321–324.
Myers, S. R., and R. C. Griffiths, 2003. Bounds on the minimum number of recombination events in a sample history. Genetics 163 375–394.
Padhukasahasram, B., P. Marjoram and M. Nordborg, 2004. Estimating the rate of gene conversion on human chromosome 21. Am. J. Hum. Genet. 75 386–397.
Padhukasahasram, B., J. D. Wall, P. Marjoram and M. Nordborg, 2006. Estimating recombination rates from single-nucleotide polymorphisms using summary statistics. Genetics 174 1517–1528.
Petes, T. D., 2001. Meiotic recombination hot spots and cold spots. Nat. Rev. Genet. 2 360–369.
Przeworski, M., and J. D. Wall, 2001. Why is there so little intragenic linkage disequilibrium in humans? Genet. Res. 77 143–151.
Ptak, S. E., K. Voelpel and M. Przeworski, 2004. Insights into recombination from patterns of linkage disequilibrium in humans. Genetics 167 387–397.
Ptak, S. E., D. A. Hinds, K. Koehler, B. Nickel, N. Patil et al., 2005. Fine-scale recombination patterns differ between chimpanzees and humans. Nat. Genet. 37 429–434.
Reich, D. E., M. Cargill, S. Bolk, J. Ireland, P. C. Sabeti et al., 2001. Linkage disequilibrium in the human genome. Nature 411 199–204.
Schaffner, S. F., C. Foo, S. Gabriel, D. Reich, M. J. Daly et al., 2005. Calibrating a coalescent simulation of human genome sequence variation. Genome Res. 15 1576–1583.
Serre, D., R. Nadon and T. J. Hudson, 2005. Large-scale recombination rate patterns are conserved among human populations. Genome Res. 15 1547–1552.
Stephens, J. C., 1986. On the frequency of undetectable recombination events. Genetics 112 923–926.
Stephens, M., N. J. Smith and P. Donnelly, 2001. A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet. 68 978–989.
Stumpf, M. P., and G. A. McVean, 2003. Estimating recombination rates from population-genetic data. Nat. Rev. Genet. 4 959–968.
Tajima, F., 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123 585–595.
Thornton, K., and P. Andolfatto, 2006. Approximate Bayesian inference reveals evidence for a recent, severe bottleneck in a Netherlands population of Drosophila melanogaster. Genetics 172 1607–1619.
Tiemann-Boege, I., P. Calabrese, D. Cochran, R. Sokol and N. Arnheim, 2006. High resolution recombination patterns in a region of human chromosome 21 measured by sperm typing. PLoS Genet. 2 e70.
Wall, J. D., 2004. Estimating recombination rates using three-site likelihoods. Genetics 167 1461–1473.
Wall, J. D., L. A. Frisse, R. R. Hudson and A. Di Rienzo, 2003. Comparative linkage-disequilibrium analysis of the beta-globin hotspot in primates. Am. J. Hum. Genet. 73 1330–1340.
Watterson, G. A., 1975. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7 256–276.
Winckler, W., S. R. Myers, D. J. Richter, R. C. Onofrio, G. J. McDonald et al., 2005. Comparison of fine-scale recombination rates in humans and chimpanzees. Science 308 107–111.
Zietkiewicz, E., V. Yotova, D. Gehl, T. Wambach, I. Arrieta et al., 2003. Haplotypes in the dystrophin DNA segment point to a mosaic origin of modern humans' diversity. Am. J. Hum. Genet. 73 994–1015.

[bib1] Andrés, A. M., A. G. Clark, L. Shimmin, E. Boerwinkle, C. F. Sing et al., 2007. Understanding the accuracy of statistical haplotype inference with sequence data of known phase. Genet. Epidemiol. 31 659–671. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib2] Ardlie, K., S. N. Liu-Cordero, M. A. Eberle, M. Daly, J. Barrett et al., 2001. Lower-than-expected linkage disequilibrium between tightly linked markers in humans suggests a role for gene conversion. Am. J. Hum. Genet. 69 582–589. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib3] Broman, K. W., J. C. Murray, V. C. Sheffield, R. L. White and J. L. Weber, 1998. Comprehensive human genetic maps: individual and sex-specific variation in recombination. Am. J. Hum. Genet. 63 861–869. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib4] Chakravarti, A., K. H. Buetow, S. E. Antonarakis, P. G. Waber, C. D. Boehm et al., 1984. Nonuniform recombination within the human beta-globin gene cluster. Am. J. Hum. Genet. 36 1239–1258. [PMC free article] [PubMed] [Google Scholar]

[bib5] Crawford, D. C., T. Bhangale, N. Li, G. Hellenthal, M. J. Rieder et al., 2004. Evidence for substantial fine-scale variation in recombination rates across the human genome. Nat. Genet. 36 700–706. [DOI] [PubMed] [Google Scholar]

[bib6] De Massy, B., 2003. Distribution of meiotic recombination sites. Trends Genet. 19 514–522. [DOI] [PubMed] [Google Scholar]

[bib7] De Silva, E., L. A. Kelley and M. P. Stumpf, 2004. The extent and importance of intragenic recombination. Hum. Genomics 1 410–420. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib8] Fearnhead, P., 2006. SequenceLDhot: detecting recombination hotspots. Bioinformatics 22 3061–3066. [DOI] [PubMed] [Google Scholar]

[bib9] Fearnhead, P., and P. Donnelly, 2001. Estimating recombination rates from population genetic data. Genetics 159 1299–1318. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib10] Fearnhead, P., and N. G. Smith, 2005. A novel method with improved power to detect recombination hotspots from polymorphism data reveals multiple hotspots in human genes. Am. J. Hum. Genet. 77 781–794. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib51] Fearnhead, P., R. M. Harding, J. A. Schneider, S. Myers and P. Donnelly, 2004. Application for coalescent methods to reveal fine-scale rate variation and recombination hotspots. Genetics 167 2067–2081. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib11] Frisse, L., R. R. Hudson, A. Bartoszewicz, J. D. Wall, J. Donfack et al., 2001. Gene conversion and different population histories may explain the contrast between polymorphism and linkage disequilibrium levels. Am. J. Hum. Genet. 69 831–843. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib12] Hein, J., M. Schierup and C. Wiuf, 2005. Gene Genealogies, Variation and Evolution. A Primer in Coalescent Theory. Oxford University Press, Oxford.

[bib13] Hellenthal, G., and M. Stephens, 2006. Insights into recombination from population genetic variation. Curr. Opin. Genet. Dev. 16 565–572. [DOI] [PubMed] [Google Scholar]

[bib14] Hellenthal, G., and M. Stephens, 2007. msHOT: modifying Hudson's ms simulator to incorporate crossover and gene conversion hotspots. Bioinformatics 23 520–521. [DOI] [PubMed] [Google Scholar]

[bib15] Hubert, R., M. MacDonald, J. Gusella and N. Arnheim, 1994. High resolution localization of recombination hot spots using sperm typing. Nat. Genet. 7 420–424. [DOI] [PubMed] [Google Scholar]

[bib16] Hudson, R. R., 1990. Gene genealogies and the coalescent process. Oxf. Surv. Evol. Biol. 7 1–44. [Google Scholar]

[bib17] Hudson, R. R., 2001. Two-locus sampling distributions and their application. Genetics 159 1805–1817. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib18] Hudson, R. R., and N. L. Kaplan, 1985. Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111 147–164. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib19] Jeffreys, A. J., and C. A. May, 2004. Intense and highly localized gene conversion activity in human meiotic crossover hot spots. Nat. Genet. 36 151–156. [DOI] [PubMed] [Google Scholar]

[bib20] Jeffreys, A. J., L. Kauppi and R. Neumann, 2001. Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat. Genet. 29 217–222. [DOI] [PubMed] [Google Scholar]

[bib21] Jeffreys, A. J., R. Neumann, M. Panayi, S. Myers and P. Donnelly, 2005. Human recombination hot spots hidden in regions of strong marker association. Nat. Genet. 37 601–606. [DOI] [PubMed] [Google Scholar]

[bib22] Kauppi, L., A. J. Jeffreys and S. Keeney, 2004. Where the crossovers are: recombination distributions in mammals. Nat. Rev. Genet. 5 413–424. [DOI] [PubMed] [Google Scholar]

[bib23] Kong, A., D. F. Gudbjartsson, J. Sainz, G. M. Jonsdottir, S. A. Gudjonsson et al., 2002. A high-resolution recombination map of the human genome. Nat. Genet. 31 241–247. [DOI] [PubMed] [Google Scholar]

[bib24] Labuda, D., C. Labbe, S. Langlois, J.-F. Lefebvre, V. Freytag et al., 2007. Patterns of variation in DNA segments upstream of transcription start sites. Hum. Mutat. 28 441–450. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib25] Li, J., M. Q. Zhang and X. Zhang, 2006. A new method for detecting human recombination hotspots and its applications to the HapMap ENCODE data. Am. J. Hum. Genet. 79 628–639.

[bib26] Li, N., and M. Stephens, 2003. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165 2213–2233.

[bib27] McVean, G., P. Awadalla and P. Fearnhead, 2002. A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 160 1231–1241.

[bib28] McVean, G. A., S. R. Myers, S. Hunt, P. Deloukas, D. R. Bentley et al., 2004. The fine-scale structure of recombination rate variation in the human genome. Science 304 581–584.

[bib29] Myers, S., L. Bottolo, C. Freeman, G. McVean and P. Donnelly, 2005. A fine-scale map of recombination rates and hotspots across the human genome. Science 310 321–324.

[bib30] Myers, S. R., and R. C. Griffiths, 2003. Bounds on the minimum number of recombination events in a sample history. Genetics 163 375–394.

[bib31] Padhukasahasram, B., P. Marjoram and M. Nordborg, 2004. Estimating the rate of gene conversion on human chromosome 21. Am. J. Hum. Genet. 75 386–397.

[bib32] Padhukasahasram, B., J. D. Wall, P. Marjoram and M. Nordborg, 2006. Estimating recombination rates from single-nucleotide polymorphisms using summary statistics. Genetics 174 1517–1528.

[bib33] Petes, T. D., 2001. Meiotic recombination hot spots and cold spots. Nat. Rev. Genet. 2 360–369.

[bib34] Przeworski, M., and J. D. Wall, 2001. Why is there so little intragenic linkage disequilibrium in humans? Genet. Res. 77 143–151.

[bib35] Ptak, S. E., K. Voelpel and M. Przeworski, 2004. Insights into recombination from patterns of linkage disequilibrium in humans. Genetics 167 387–397.

[bib36] Ptak, S. E., D. A. Hinds, K. Koehler, B. Nickel, N. Patil et al., 2005. Fine-scale recombination patterns differ between chimpanzees and humans. Nat. Genet. 37 429–434.

[bib37] Reich, D. E., M. Cargill, S. Bolk, J. Ireland, P. C. Sabeti et al., 2001. Linkage disequilibrium in the human genome. Nature 411 199–204.

[bib38] Schaffner, S. F., C. Foo, S. Gabriel, D. Reich, M. J. Daly et al., 2005. Calibrating a coalescent simulation of human genome sequence variation. Genome Res. 15 1576–1583.

[bib39] Serre, D., R. Nadon and T. J. Hudson, 2005. Large-scale recombination rate patterns are conserved among human populations. Genome Res. 15 1547–1552.

[bib40] Stephens, J. C., 1986. On the frequency of undetectable recombination events. Genetics 112 923–926.

[bib41] Stephens, M., N. J. Smith and P. Donnelly, 2001. A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet. 68 978–989.

[bib42] Stumpf, M. P., and G. A. McVean, 2003. Estimating recombination rates from population-genetic data. Nat. Rev. Genet. 4 959–968.

[bib43] Tajima, F., 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123 585–595.

[bib44] Thornton, K., and P. Andolfatto, 2006. Approximate Bayesian inference reveals evidence for a recent, severe bottleneck in a Netherlands population of Drosophila melanogaster. Genetics 172 1607–1619.

[bib45] Tiemann-Boege, I., P. Calabrese, D. Cochran, R. Sokol and N. Arnheim, 2006. High resolution recombination patterns in a region of human chromosome 21 measured by sperm typing. PLoS Genet. 2 e70.

[bib46] Wall, J. D., 2004. Estimating recombination rates using three-site likelihoods. Genetics 167 1461–1473.

[bib47] Wall, J. D., L. A. Frisse, R. R. Hudson and A. Di Rienzo, 2003. Comparative linkage-disequilibrium analysis of the beta-globin hotspot in primates. Am. J. Hum. Genet. 73 1330–1340.

[bib48] Watterson, G. A., 1975. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7 256–276.

[bib49] Winckler, W., S. R. Myers, D. J. Richter, R. C. Onofrio, G. J. McDonald et al., 2005. Comparison of fine-scale recombination rates in humans and chimpanzees. Science 308 107–111.

[bib50] Zietkiewicz, E., V. Yotova, D. Gehl, T. Wambach, I. Arrieta et al., 2003. Haplotypes in the dystrophin DNA segment point to a mosaic origin of modern humans' diversity. Am. J. Hum. Genet. 73 994–1015.
