The Influence of Gene Conversion on Linkage Disequilibrium Around a Selective Sweep

Danielle A Jones; John Wakeley

doi:10.1534/genetics.108.092270

Genetics. 2008 Oct;180(2):1251–1259. doi: 10.1534/genetics.108.092270


PMCID: PMC2567372 PMID: 18757941

Abstract

In a 2007 article, McVean studied the effect of recombination on linkage disequilibrium (LD) between two neutral loci located near a third locus that has undergone a selective sweep. The results demonstrated that two loci on the same side of a selected locus might show substantial LD, whereas the expected LD for two loci on opposite sides of a selected locus is zero. In this article, we extend McVean's model to include gene conversion. We show that one of the conclusions is strongly affected by gene conversion: when gene conversion is present, there may be substantial LD between two loci on opposite sides of a selective sweep.

McVean (2002) showed that predictions for r², a commonly used measure of LD introduced by Hill and Robertson (1968), depend on the correlations in coalescence times for a pair of loci, which in turn depend on the recombination rate between the loci. In applying this result to LD near a locus that has undergone a selective sweep, McVean (2007) developed a new model that features two neutral loci partially linked to the selected locus. He assumed that recombination could occur between each pair of loci and that the sweep had a particularly simple structure: a star tree. Figure 1 depicts the model, in which a sample at the two neutral loci is taken at the present time 0, just after the sweep has finished. The sweep is assumed to have occurred quickly and to have begun at time t_M in the past, measured in units of 2N_e generations, where N_e is the (diploid) coalescent effective population size (Sjödin et al. 2005). On the basis of the work of Maynard Smith and Haigh (1974) and others (Kaplan et al. 1989; Stephan et al. 1992; Durrett and Schweinsberg 2004), McVean (2007) used t_M = 0.1, and we adopt this value below. McVean (2007) tested the validity of this approximate model by comparing its predictions to the results of fully stochastic simulations of a sweep and found them to be largely accurate.

Figure 1.— This is an adaptation of Figure 1 of McVean (2007). (A) The three sample configurations: A, two neutral loci are sampled from two chromosomes; B, two neutral loci are sampled from three chromosomes; and C, two neutral loci are sampled from four chromosomes. (B) The selection event occurs as a rapid selective sweep during which only crossing-over events can occur. During the neutral phase, coalescent events can also occur.

McVean (2007) allowed for recombination (reciprocal exchange of genetic material as in a single crossover event) but not for gene conversion (nonreciprocal exchange of short tracts of genetic material). However, there is a growing body of evidence for the importance of gene conversion in shaping genetic variation in humans (Frisse et al. 2001; Jeffreys and May 2004; Padhukasahasram et al. 2004; Chen et al. 2007; Gay et al. 2007). Models that omit gene conversion therefore do not completely capture the biological causes, or the expected patterns, of genetic variability. Our aim here is to incorporate gene conversion into the model and to ask whether this changes the results. We focus on the case in which variation at the two neutral loci is due to mutations that occurred during the neutral phase shown in Figure 1B. In this case, the two neutral loci can be polymorphic only if they do not coalesce along with the selected allele at time t_M in Figure 1B. Without recombination or gene conversion, present-day samples at the two neutral loci always remain linked to the selected allele and are certain to coalesce with it. Recombination and gene conversion allow the loci to “escape” the sweep with some probability and to coalesce during the neutral phase, where they might also experience mutations.

To make a prediction for r² (specifically, σ_d² of Ohta and Kimura 1971), it is necessary to compute the expectation of the product of the coalescence times at two loci for each of the three sample configurations (A, B, and C) in Figure 1A (McVean 2002). Briefly, in the three-locus model of McVean (2007), for each of these three sampling configurations we must compute the probability that the two neutral loci are in configuration A, B, or C at the start of the neutral phase looking back (i.e., at time t_M), at which point all chromosomes in the population have the ancestral type, or wild type, at the selected locus. There are nine such probabilities in total, one for each pair of configurations. They are denoted by φ, with subscripts giving the sampled and the ancestral configuration. The expected product of coalescence times at the two neutral loci, sampled at present when all chromosomes possess the selected allele (denoted by the subscript S), is the average over the three ancestral configurations. The expected coalescence time for two chromosomes, i and j, sampled at locus X is written as E[t_ij], and for chromosomes k and l at locus Y it is written as E[t_kl]; in a product, the first factor refers to locus X and the second to locus Y. For configuration A, we have

E[t_ij t_ij]_S = φ_AA E[t_ij t_ij]_W + φ_AB E[t_ij t_ik]_W + φ_AC E[t_ij t_kl]_W,

which is Equation 9 in McVean (2007). The probabilities φ_AA, φ_AB, and so on, depend on t_M and on the rates of recombination and gene conversion. The expected values on the right-hand side above are those expected during the neutral phase (W stands for wild-type allele) and are given in Equation 10 of McVean (2007).
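As an illustration of how expected products of coalescence times translate into σ_d², the following Python sketch uses the standard neutral two-locus covariances of pairwise coalescence times, with time in units of 2N_e generations so that E[t] = 1. These covariance formulas are assumed here from the general two-locus coalescent literature rather than taken from this article, and all function names are hypothetical. Under these assumptions the neutral prediction reduces to (10 + R)/(22 + 13R + R²).

```python
def neutral_products(R):
    """Expected products E[t t]_W for configurations A, B, C under neutrality.

    Assumes E[t] = 1 and the standard two-locus covariances for a scaled
    recombination rate R = 4*N_e*r (an assumption, not from this article).
    """
    d = R * R + 13.0 * R + 18.0
    e_a = 1.0 + (R + 18.0) / d  # E[t_ij t_ij]: same pair at both loci
    e_b = 1.0 + 6.0 / d         # E[t_ij t_ik]: one chromosome shared
    e_c = 1.0 + 4.0 / d         # E[t_ij t_kl]: no chromosome shared
    return e_a, e_b, e_c


def sigma_d2(e_a, e_b, e_c):
    """sigma_d^2 written in terms of the three expected products."""
    return (e_a - 2.0 * e_b + e_c) / e_c


def swept_product(phi_row, neutral):
    """Weighted average of neutral-phase products for one sample configuration.

    phi_row holds the three probabilities (to A, B, C) for that configuration.
    """
    return sum(p * e for p, e in zip(phi_row, neutral))


if __name__ == "__main__":
    for R in (0.0, 1.0, 5.0, 20.0):
        print(R, sigma_d2(*neutral_products(R)))
```

The same `sigma_d2` function can be applied after a sweep by first passing each row of φ through `swept_product`.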

The predictions about LD depend strongly on the relative position of the selected locus compared to the neutral loci. McVean considered two cases: (1) the selected locus is located halfway between the two neutral loci (NSN) and (2) the selected locus is located on one side of the two neutral loci (SNN). All of the derivations described above are done separately for these two cases. Considering only recombination, McVean predicted substantial LD for SNN, because in this case both neutral loci can escape the sweep yet remain linked to each other at the beginning of the neutral phase. For the NSN case, the model without gene conversion predicts no LD between the two neutral loci. As McVean notes, this reflects the symmetric nature of the recombination process for the NSN case: the probabilities of each of the three configurations at the beginning of the neutral phase (A, B, or C) are the same for each of the three sample configurations, so that σ_d² = 0 (see Equation 14 of McVean 2007). This symmetry breaks down when gene conversion is included in the model because gene conversion at the selected locus allows both neutral loci to escape the sweep yet remain linked at the beginning of the neutral phase.
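The symmetry argument can be checked numerically. In the sketch below, the matrix phi holds the nine probabilities (rows: sampled configuration A, B, C; columns: configuration at the start of the neutral phase), and the neutral-phase expected products are supplied as three numbers. All numerical values are hypothetical and chosen only for illustration, and the expression for σ_d² in terms of the three expected products is assumed from McVean (2002). Rows need not sum to 1 because some probability goes to states in which a locus has already coalesced.

```python
def sigma_d2_after_sweep(phi, neutral):
    """sigma_d^2 from a 3x3 matrix of phi values and neutral expected products."""
    # One weighted average per sampled configuration (A, B, C).
    e_a, e_b, e_c = (sum(p * n for p, n in zip(row, neutral)) for row in phi)
    return (e_a - 2.0 * e_b + e_c) / e_c


# Hypothetical neutral-phase expected products for configurations A, B, C.
neutral = (1.5, 1.2, 1.1)

# Identical rows: every sampled configuration reaches A, B, C with the same
# probabilities (the NSN, recombination-only situation described above).
symmetric_phi = [(0.0, 0.1, 0.6)] * 3

# Hypothetical asymmetric case: sampled configuration A can stay in A.
linked_phi = [(0.05, 0.1, 0.55), (0.0, 0.1, 0.6), (0.0, 0.1, 0.6)]

if __name__ == "__main__":
    print(sigma_d2_after_sweep(symmetric_phi, neutral))
    print(sigma_d2_after_sweep(linked_phi, neutral))
```

With identical rows the numerator cancels exactly, whatever the neutral-phase values are; any extra weight on configuration A for the sampled configuration A makes the result positive.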

Figure 2 gives a graphical representation of the model for SNN and NSN, with and without gene conversion. We assume that when gene conversion occurs, it copies a tract of m nucleotides, which is longer than each locus and shorter than the distance between the loci. Thus, gene conversion acts on single loci independently. During the neutral phase, on the coalescent timescale, recombination events occur with rates R_x and R_y, and gene conversion events occur at rates κ_s, κ_x, and κ_y. Following McVean (2007), during the selection phase with duration t_M there are two probabilities of escape via recombination: p_x = 1 − q_x = 1 − e^(−R_x t_M/2) and p_y = 1 − q_y = 1 − e^(−R_y t_M/2). To these we add three probabilities of escape by gene conversion: g_s = 1 − e^(−κ_s t_M/2), g_x = 1 − e^(−κ_x t_M/2), and g_y = 1 − e^(−κ_y t_M/2).
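A minimal sketch of how an escape probability during the selection phase could be computed from a coalescent-scale rate is given below. It assumes the exponential form 1 − e^(−rate × t_M/2), in which the factor 1/2 is inferred from the worked value 1 − e^−0.0375 quoted in the concluding paragraph of this article (rate 0.75, t_M = 0.1); the function names are hypothetical.

```python
import math


def escape_probability(rate, t_m=0.1):
    """Probability that a lineage escapes the sweep by an event of this rate.

    rate: coalescent-scale rate (R_x, R_y, kappa_s, kappa_x or kappa_y).
    t_m:  duration of the selection phase in units of 2*N_e generations.
    Assumed form: 1 - exp(-rate * t_m / 2).
    """
    return 1.0 - math.exp(-rate * t_m / 2.0)


def no_escape_probability(rate, t_m=0.1):
    """Complementary probability (q for recombination, 1 - g for conversion)."""
    return math.exp(-rate * t_m / 2.0)


if __name__ == "__main__":
    # Illustrative rates: recombination over 1000 bp and conversion at one locus.
    print("p_x:", escape_probability(0.5))
    print("g_s:", escape_probability(0.75))
```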

Figure 2.— There are four cases: SNN without and with gene conversion, and NSN without and with gene conversion. In all cases, during the selective sweep phase, two parameters describe the probability of escape via recombination: p_x and p_y. During the neutral phase, the recombination rates are R_x = 4N_er and R_y = 4N_er, where r is the per generation probability of recombination for the corresponding interval. The recombination distance between the two neutral loci is kept constant regardless of whether they are in the SNN or the NSN case. Thus for NSN, where the selected locus lies halfway between the two neutral loci, R_x and R_y are each half of that distance. When gene conversion is added to the model for both SNN and NSN, it is restricted to the individual loci. The overall rate of gene conversion is κ_x = κ_s = κ_y = 4N_eγ for each locus, where γ is the per generation probability of gene conversion.

Tables A1–A6 in the appendix give all the terms needed to compute the probabilities φ_AA, φ_AB, and so on, for both SNN and NSN. We follow McVean (2007) in computing transitions between the present and the start of the neutral phase, using the intermediate configuration at the end of the selection phase; our Tables A1–A3 correspond to each of the three columns of Appendix A in McVean (2007) and our Tables A4–A6 correspond to each of the three columns of Appendix B in McVean (2007). Again, the key difference between the models is that in the NSN case gene conversion allows present-day configuration A to remain in configuration A at the start of the neutral phase, whereas this is not possible by recombination alone. Figure 3 shows the configurations that can be reached from sampling configuration A at the present; it is analogous to Figure 2 in McVean (2007) and shows five of the six novel configurations (the bottom five transitions encompassed in a box) that can be reached when gene conversion is included in the model.

TABLE A1.

Transition probabilities for NSN for the starting configuration A to the four states, A, B, C, or O, present at the beginning of the neutral phase

Configuration at the end of selection phase	Probability given starting configuration [i^Sj^S, i^Sj^S]	Configuration at the start of the neutral phase
[i^Sj^S, i^Sj^S]	((1 – g_s))²	O
[i^Sj^S, i^Sk^W]	2((1 – g_s))((1 – g_s))	O
[i^Sj^W, i^Sk^S]	2((1 – g_s))((1 – g_s))	O
[i^Sj^W, i^Sk^W]	2((1 – g_s))( + g_s + g_s)	B
[i^Sj^S, k^Wl^W]	(((1 – g_s)))²	O
[i^Wj^W, k^Sl^S]	((1 – g_s))²	O
[i^Sj^W, k^Sl^W]	2((1 – g_s))((1 – g_s))	B
[i^Sj^W, k^Wl^W]	2((1 – g_s))( + g_s + g_s)	C
[i^Wj^W, k^Sl^W]	2( + g_s + g_s)((1 – g_s))	C
[i^Wj^W, k^Wl^W]	( + g_s + g_s)²	C
[i^Sj^S, i^Sk^S]	0	O
[i^Sj^S, k^Sl^W]	0	O
[i^Sj^W, k^Sl^S]	0	O
[i^Sj^S, k^Sl^S]	0	O
[i^Sj^W, i^Sj^W]	2((1 – g_s))(g_s)	A
[i^Wj^W, i^Wk^S]	2(g_s)((1 – g_s))	B
[i^Wj^W, i^Wj^W]	(g_s)²	A
[i^Wj^S, i^Wk^W]	2(g_s)((1 – g_s))	B
[i^Wj^W, i^Wk^W]	2(g_s)( + g_s + g_s)	B
[i^Wj^S, i^Wk^S]	0	A

O corresponds to a state in which at least one of the two neutral loci has coalesced. The addition of gene conversion creates six new states that do not arise when recombination is the only mechanism of exchange; these are the last six rows of the table. This table corresponds to the transition probabilities in the second column of Appendix A of McVean (2007) and uses the same notation, explained in detail in Figure 3.

TABLE A2.

Transition probabilities for NSN for the starting configuration B

Configuration at the end of selection phase	Probability given starting configuration [i^Sj^S, i^Sk^S]	Configuration at start of neutral phase
[i^Sj^S, i^Sj^S]	0	O
[i^Sj^S, i^Sk^W]	(1 – g_s)(1 – g_s)( + g_s)	O
[i^Sj^W, i^Sk^S]	(1 – g_s)( + g_s)(1 – g_s)	O
[i^Sj^W, i^Sk^W]	(1 – g_s)(( + g_s))( + g_s)	B
[i^Sj^S, k^Wl^W]	(1 – g_s)(1 – g_s)( + g_s)	O
[i^Wj^W, k^Sl^S]	(1 – g_s)( + g_s)(1 – g_s)	O
[i^Sj^W, k^Sl^W]	((1 – g_s)( + g_s)(1 – g_s) + (1 – g_s)(1 – g_s)( + g_s) + ( + g_s + g_s)(1 – g_s)²)	B
[i^Sj^W, k^Wl^W]	((1 – g_s)( + g_s)( + g_s) + ( + g_s + g_s)(1 – g_s)( + g_s))	C
[i^Wj^W, k^Sl^W]	((1 – g_s)( + g_s)( + g_s) + ( + g_s + g_s)(+ g_s)(1 – g_s)	C
[i^Wj^W, k^Wl^W]	( + g_s + g_s)( + g_s)( + g_s)	C
[i^Sj^S, i^Sk^S]	(1 – g_s)(1 – g_s)²	O
[i^Sj^S, k^Sl^W]	(1 – g_s)(1 – g_s)²	O
[i^Sj^W, k^Sl^S]	(1 – g_s)(1 – g_s)²	O
[i^Sj^S, k^Sl^S]	0	O
[i^Sj^W, i^Sj^W]	0	A
[i^Wj^W, i^Wk^S]	g_s(+ g_s)(1 – g_s)	B
[i^Wj^W, i^Wj^W]	0	A
[i^Wj^S, i^Wk^W]	g_s(1 – g_s)( + g_s)	B
[i^Wj^W, i^Wk^W]	g_s( + g_s)( + g_s)	B
[i^Wj^S, i^Wk^S]	g_s(1 – g_s)²	A

The six new states created by the addition of gene conversion are the last six rows of the table. This corresponds to the transition probabilities in the third column of Appendix A of McVean (2007).

TABLE A3.

Transition probabilities for NSN for the starting configuration C

Configuration at the end of selection phase	Probability given starting configuration [i^Sj^S, k^Sl^S]	Configuration at the start of neutral phase
[i^Sj^S, i^Sj^S]	0	O
[i^Sj^S, i^Sk^W]	0	O
[i^Sj^W, i^Sk^S]	0	O
[i^Sj^W, i^Sk^W]	0	B
[i^Sj^S, k^Wl^W]	((1 – g_s)( + g_s))²	O
[i^Wj^W, k^Sl^S]	(( + g_s)(1 – g_s))²	O
[i^Sj^W, k^Sl^W]	2((1 – g_s)²( + g_s)( + g_s) + (1 – g_s)( + g_s)( + g_s)(1 – g_s))	B
[i^Sj^W, k^Wl^W]	2(1 – g_s)( + g_s)( + g_s)( + g_s)	C
[i^Wj^W, k^Sl^W]	2( + g_s)(1 – g_s)( + g_s)( + g_s)	C
[i^Wj^W, k^Wl^W]	(( + g_s)( + g_s))²	C
[i^Sj^S, i^Sk^S]	0	O
[i^Sj^S, k^Sl^W]	2(1 – g_s)²(1 – g_s)( + g_s)	O
[i^Sj^W, k^Sl^S]	2(1 – g_s)²( + g_s)(1 – g_s)	O
[i^Sj^S, k^Sl^S]	((1 – g_s)²)²	O
[i^Sj^W, i^Sj^W]	0	A
[i^Wj^W, i^Wk^S]	0	B
[i^Wj^W, i^Wj^W]	0	A
[i^Wj^S, i^Wk^W]	0	B
[i^Wj^W, i^Wk^W]	0	B
[i^Wj^S, i^Wk^S]	0	A

There are six new states created by the addition of gene conversion; these are described in the last six rows of the table. This corresponds to the transition probabilities in the fourth column of Appendix A of McVean (2007).

TABLE A4.

Transition probabilities for SNN when the starting configuration is A

Configuration at the end of selection phase	Probability given starting configuration [i^Sj^S, i^Sj^S]	Configuration at the start of neutral phase
[i^Sj^S, i^Sj^S]	((1 – g_s))²	O
[i^Sj^W, i^Sj^W]	2(1 – g_s)(p_x + g_s – p_xg_s)(1 – g_x)	A
[i^Sj^S, i^Sk^W]	2((1 – g_s))((1 – g_s))	O
[i^Sj^W, i^Sk^W]	2(1 – g_s)((g_s + p_x – p_xg_s)( + g_x) + q_x(1 – g_s)g_x)	B
[i^Wj^W, i^Wj^W]	((p_x + g_s – p_xg_s)(1 – g_x))²	A
[i^Wj^S, i^Wk^W]	2(p_x + g_s – p_xg_s)(1 – g_x)(1 – g_s)	B
[i^Sj^S, k^Wl^W]	((1 – g_s))²	O
[i^Sj^W, k^Wl^W]	2(1 – g_s)((g_s + p_x – p_xg_s)( + g_x) + q_x(1 – g_s)g_x)	C
[i^Wj^W, i^Wk^W]	2(p_x + g_s – p_xg_s)(1 – g_x)((g_s + p_x – p_xg_s)( + g_x) + q_x(1 – g_s)g_x)	B
[i^Wj^W, k^Wl^W]	(((g_s + p_x – p_xg_s)( + g_x) + q_x(1 – g_s)g_x))²	C
[i^Sj^S, i^Sk^S]	0	O
[i^Sj^W, i^Sk^S]	2((1 – g_s))(g_x)	O
[i^Sj^S, k^Sl^W]	0	O
[i^Wj^S, i^Wk^S]	0	A
[i^Sj^W, k^Sl^W]	2g_x(1 – g_s)	B
[i^Wj^W, i^Wk^S]	2(p_x + g_s – p_xg_s)(1 – g_x)g_x	B
[i^Wj^W, k^Sl^W]	2g_x((g_s + p_x – p_xg_s)( + g_x) + q_x(1 – g_s)g_x)	C
[i^Sj^W, k^Sl^S]	0	O
[i^Wj^W, k^Sl^S]	(g_x)²	O
[i^Sj^S, k^Sl^S]	0	O

There are no new states created with the addition of gene conversion but there are new transition probabilities. This table corresponds to the transition probabilities in the second column of Appendix B of McVean (2007).

TABLE A5.

Transition probabilities for SNN when the starting configuration is B

Configuration at the end of selection phase	Probability given starting configuration [i^Sj^S, i^Sk^S]	Configuration at the start of neutral phase
[i^Sj^S, i^Sj^S]	0	O
[i^Sj^W, i^Sj^W]	0	A
[i^Sj^S, i^Sk^W]	((1 – g_s))((1 – g_s))(q_xg_s + q_xq_yg_y + p_xq_y + q_xp_y + p_xp_y)	O
[i^Sj^W, i^Sk^W]	(1 – g_s)( + (1 – g_x)g_s(1 – p_x))(q_xg_s + q_xq_yg_y + p_xq_y + q_xp_y + p_xp_y)	B
[i^Wj^W, i^Wj^W]	0	A
[i^Wj^S, i^Wk^W]	(p_x + g_s – p_xg_s)(1 – g_x)((1 – g_s))(q_xg_s + q_xq_yg_y + p_xq_y + q_xp_y + p_xp_y)	B
[i^Sj^S, k^Wl^W]	(1 – g_s)((1 – g_s))(q_xg_s + q_xq_yg_y + p_xq_y + q_xp_y + p_xp_y)	O
[i^Sj^W, k^Wl^W]	(1 – g_s)(+ (1 – g_x)g_s(1 – p_x))(q_xg_s + q_xq_yg_y + p_xq_y + q_xp_y + p_xp_y) + ((g_s + p_x – p_xg_s)( + g_x) + q_x(1 – g_s)g_x)((1 – g_s))(q_xg_s + q_xq_yg_y + p_xq_y + q_xp_y + p_xp_y)	C
[i^Wj^W, i^Wk^W]	(p_x + g_s – p_xg_s)(1 – g_x)( + (1 – g_x)g_s(1 – p_x)) (q_xg_s + q_xq_yg_y + p_xq_y + q_xp_y + p_xp_y)	B
[i^Wj^W, k^Wl^W]	((g_s + p_x – p_xg_s)( + g_x)+ q_x(1 – g_s)g_x)( + (1 – g_x)g_s(1 – p_x))(q_xg_s + q_xq_yg_y + p_xq_y + q_xp_y + p_xp_y)	C
[i^Sj^S, i^Sk^S]	(1 – g_s)q_x(1 – g_s)²	O
[i^Sj^W, i^Sk^S]	((1 – g_s))( + g_s)(q_x(1 – g_s))	O
[i^Sj^S, k^Sl^W]	(1 – g_s)q_x(1 – g_s)²	O
[i^Wj^S, i^Wk^S]	(p_x + g_s – p_xg_s)(1 – g_x)q_x(1 – g_s)²	A
[i^Sj^W, k^Sl^W]	g_x((1 – g_s))(q_xg_s + q_xq_yg_y + p_xq_y + q_xp_y + p_xp_y)+ (1 – g_s) ( + g_s)(q_x(1 – g_s))+ ((g_s + p_x – p_xg_s)( + g_x) + q_x(1 – g_s)g_x)q_x(1 – g_s)²	B
[i^Wj^W, i^Wk^S]	(p_x + g_s – p_xg_s)(1 – g_x)( + g_s)(q_x(1 – g_s))	B
[i^Wj^W, k^Sl^W]	g_x( + (1 – g_x)g_s(1 – p_x))(q_xg_s + q_xq_yg_y + p_xq_y + q_xp_y + p_xp_y)+ ((g_s + p_x – p_xg_s)( + g_x) + q_x(1 – g_s)g_x)( + g_s)(q_x(1 – g_s))	C
[i^Sj^W, k^Sl^S]	g_xq_x(1 – g_s)²	O
[i^Wj^W, k^Sl^S]	g_x( + g_s)(q_x(1 – g_s))	O
[i^Sj^S, k^Sl^S]	0	O

There are no new states created with the addition of gene conversion but there are new transition probabilities. This corresponds to the transition probabilities in the third column of Appendix B of McVean (2007).

TABLE A6.

Transition probabilities for SNN when the starting configuration is C

Configuration at the end of selection phase	Probability given starting configuration [i^Sj^S, k^Sl^S]	Configuration at the start of neutral phase
[i^Sj^S, i^Sj^S]	0	O
[i^Sj^W, i^Sj^W]	0	A
[i^Sj^S, i^Sk^W]	0	O
[i^Sj^W, i^Sk^W]	0	B
[i^Wj^W, i^Wj^W]	0	A
[i^Wj^S, i^Wk^W]	0	B
[i^Sj^S, k^Wl^W]	(((1 – g_s))(q_xg_s + q_xq_yg_y + p_xq_y + q_xp_y + p_xp_y))²	O
[i^Sj^W, k^Wl^W]	2((1 – g_s))(q_xg_s + q_xq_yg_y + p_xq_y + q_xp_y + p_xp_y)( + (1 – g_x) g_s(1 – p_x))(q_xg_s + q_xq_yg_y + p_xq_y + q_xp_y + p_xp_y)	C
[i^Wj^W, i^Wk^W]	0	B
[i^Wj^W, k^Wl^W]	( + (1 – g_x)g_s(1 – p_x))(q_xg_s + q_xq_yg_y + p_xq_y + q_xp_y + p_xp_y))²	C
[i^Sj^S, i^Sk^S]	0	O
[i^Sj^W, i^Sk^S]	0	O
[i^Sj^S, k^Sl^W]	2q_x(1 – g_s)²((1 – g_s))(q_xg_s + q_xq_yg_y + p_xq_y + q_xp_y + p_xp_y)	O
[i^Wj^S, i^Wk^S]	0	A
[i^Sj^W, k^Sl^W]	2((1 – g_s))(q_xg_s + q_xq_yg_y + p_xq_y + q_xp_y + p_xp_y)( + g_s) (q_x(1 – g_s))+ 2q_x(1 – g_s)²( + (1 – g_x)g_s(1 – p_x)) (q_xg_s + q_xq_yg_y + p_xq_y + q_xp_y + p_xp_y)	B
[i^Wj^W, i^Wk^S]	0	B
[i^Wj^W, k^Sl^W]	2( + g_s)(q_x(1 – g_s))( + (1 – g_x)g_s(1 – p_x))(q_xg_s + q_xq_yg_y + p_xq_y + q_xp_y + p_xp_y)	C
[i^Sj^W, k^Sl^S]	2q_x(1 – g_s)²( + g_s)(q_x(1 – g_s))	O
[i^Wj^W, k^Sl^S]	(( + g_s)(q_x(1 – g_s)))²	O
[i^Sj^S, k^Sl^S]	(q_x(1 – g_s)²)²	O

There are no new states created with the addition of gene conversion but there are new transition probabilities. This corresponds to the transition probabilities in the fourth column of Appendix B of McVean (2007).

Figure 3.— This is an adaptation of Figure 2 of McVean (2007) but, importantly, it includes five novel transitions from configuration A to configurations A and B at the start of the neutral phase that are not possible when only recombination is present. The transitions represented correspond to the transition probabilities in Table A1 and are in the same order as in Table A1, excluding the five cases where the transition probability is zero. Also shown in brackets is the notation used for each configuration (McVean 2007). The notation for configuration A is [i^Sj^S, i^Sj^S], which indicates that two chromosomes, i and j, were sampled at locus X and locus Y, and both chromosomes possess the selected allele. The sampling configurations at locus X and locus Y are separated by a comma.

Following McVean (2007), to generate predictions applicable to molecular data, we assume that the rates of recombination, R_x and R_y, depend linearly on the distance between the loci. For example, if locus X is n nucleotides away from locus S, then R_x = nρ, where ρ is the rate of recombination between adjacent nucleotides on the coalescent timescale. In addition, we assume that each locus is a single-nucleotide site, so that in the neutral phase all three loci have the same rate of conversion: κ_s = κ_x = κ_y = mκ, where κ is the rate of initiation of a gene conversion event between two adjacent nucleotides on a coalescent timescale. Finally, we include a parameter, f = κ/ρ, which is the ratio of gene conversion to recombination. Note that, with this parameterization, the per generation probabilities of recombination and gene conversion in Figure 2 are given by r = nρ/(4N_e) and γ = mκ/(4N_e).

To predict the likely effect of gene conversion on LD in human populations, we substituted plausible genetic parameters from humans: ρ ∼ 0.0005/bp (Frisse et al. 2001); the ratio of gene conversion to recombination, f, has been estimated to be between ∼1.5 and 14 (Frisse et al. 2001; Jeffreys and May 2004; Padhukasahasram et al. 2004; Chen et al. 2007; Gay et al. 2007); and typical gene conversion tract lengths range from 50 to 500 bp (Jeffreys and May 2004). To simplify our analysis, we assume a fixed tract length of 300 bp (Jeffreys and Neumann 2002). Repeating our analysis with a 50-bp tract length led to slightly higher levels of LD (results not shown).
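A minimal sketch of this parameterization, with hypothetical function and argument names, converts physical distances into the coalescent-scale rates used in the model. The default ρ and the example values of f and m are those quoted above; the distance of 1000 bp is an arbitrary illustration.

```python
def recombination_rate(n_bp, rho=0.0005):
    """R = n * rho for two loci separated by n_bp nucleotides."""
    return n_bp * rho


def conversion_rate(m_bp, f, rho=0.0005):
    """Per-locus conversion rate m * kappa, with kappa = f * rho."""
    kappa = f * rho
    return m_bp * kappa


if __name__ == "__main__":
    print("R for 1000 bp:", recombination_rate(1000))
    # Per-locus conversion rate for a 300-bp tract at several ratios f.
    for f in (0, 1, 5, 15):
        print("f =", f, "m*kappa =", conversion_rate(300, f))
```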

As Figures 4 and 5 show, adding gene conversion affects LD in both the NSN and the SNN case. In the SNN case, the effect of gene conversion is similar to the effect of recombination, so that increasing f decreases LD (Figure 5, B–D). Adding gene conversion in the SNN case creates more opportunities for the two neutral loci to escape the sweep independently, so LD between them is reduced. In the NSN case, gene conversion increases LD (Figure 4, B–D). In this case, gene conversion has a qualitatively different effect. Gene conversion events at the middle (S) locus allow the two neutral loci to escape the sweep together. Present-day samples of this type may then be samples of an ancestral haplotype that would otherwise have been lost during the sweep. Looking forward in time, in the NSN case gene conversion can preserve the preexisting correlated genealogical structure between the outer loci.

Figure 4.— Expected LD for the NSN case. The y-axis is the amount of LD as predicted by σ_d². The x-axis is the distance, in base pairs, between the two neutral loci, starting from a distance of 2 × tract length, m, between them and increasing to 10,000 bp. In this example, m = 300 bp. The distance between either neutral locus and the selected site must be at least a tract length, so that any given gene conversion event converts only one locus; the two neutral loci are separated by a minimum of two tract lengths, 600 bp. For this distance range, R_Neutral = nρ + 2mκ. (A) f = 0, there is no gene conversion. (B) f = 1, there is the same amount of gene conversion as there is recombination. (C) f = 5, there is 5 times as much gene conversion as recombination. This is a reasonable ratio for human data. (D) f = 15, there is 15 times as much gene conversion as recombination.

Figure 5.— Expected LD for the SNN case. The y-axis is the amount of LD as predicted by σ_d². The x-axis is the distance, in base pairs, between the two loci, starting from a distance of at least two tract lengths between them, to allow for comparison to the NSN case, and increasing to 10,000 bp. In this example, m = 300 bp; therefore, the starting distance between the two neutral loci is 600 bp. For this distance range, R_Neutral = nρ + 2mκ. (A) f = 0, there is no gene conversion. (B) f = 1, there is the same amount of gene conversion as there is recombination. (C) f = 5, there is 5 times as much gene conversion as recombination. (D) f = 15, there is 15 times as much gene conversion as recombination.

By incorporating gene conversion into the three-locus model of McVean (2007), we have shown that LD is expected between two loci on opposite sides of a selected locus that has undergone a sweep. Although we have focused only on a single pair of neutral loci, our results have implications for genomic scans for selective sweeps using extended haplotype homozygosity (Sabeti et al. 2002), integrated extended haplotype homozygosity (Voight et al. 2006), and long-range haplotypes (Sabeti et al. 2007). In particular, gene conversion at the selected site will cause some fraction of present-day chromosomes to carry the selected allele while sitting on an ancestral haplotype. Using t_M = 0.1 as in McVean (2007), and assuming a tract length of m = 300 and a ratio of gene conversion to recombination of f = 5, the probability of sampling such a chromosome is 1 – e^−0.0375 ≈ 0.037. Then, among 100 chromosomes that all possess the selected allele, we would expect to see about four of these aberrant haplotypes, and the chance that all 100 chromosomes would show the classic, recombination-only sweep pattern would be (e^−0.0375)¹⁰⁰ = e^−3.75 ≈ 0.024. Thus, it is possible that many selected loci have been missed in the recent genomic scans for selection.
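The arithmetic in this paragraph can be reproduced with a few lines of Python. The sketch assumes ρ = 0.0005/bp as quoted earlier and treats the 100 sampled chromosomes as independent; the variable names are hypothetical.

```python
import math

t_m = 0.1      # duration of the selection phase (units of 2*N_e generations)
m = 300        # gene conversion tract length in bp
f = 5          # ratio of gene conversion to recombination
rho = 0.0005   # recombination rate per bp on the coalescent timescale

kappa_s = m * f * rho                        # conversion rate at the selected site
g_s = 1.0 - math.exp(-kappa_s * t_m / 2.0)   # escape by conversion at locus S

n_chromosomes = 100
expected_count = n_chromosomes * g_s             # expected aberrant haplotypes
p_all_classic = (1.0 - g_s) ** n_chromosomes     # none of the 100 is aberrant

if __name__ == "__main__":
    print("g_s =", round(g_s, 4))
    print("expected aberrant haplotypes =", round(expected_count, 2))
    print("P(all classic) =", round(p_all_classic, 4))
```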

APPENDIX

Notation

The transition equations used in Tables A1–A6 are complicated by the addition of gene conversion events. In an attempt to simplify the equations used to build the tables, a new notation, which incorporates the notation of McVean (2007), is used. The new notation is outlined and compared to McVean's below. All of the symbols used refer to escape probabilities during the selection phase that result from either recombination or gene conversion.

For NSN (Tables A1–A3):

McVean (2007):

q_x is used to indicate that no recombination event has occurred between locus X and locus S. p_x is used to indicate that a recombination event has occurred between locus X and locus S. q_y is used to indicate that no recombination event has occurred between locus S and locus Y. p_y is used to indicate that a recombination event has occurred between locus S and locus Y.

Notation with gene conversion:

The terminology is the same as that in McVean (2007) but with the additional consideration that escape from the selection sweep can also come from gene conversion.

Counterpart of q_x: there is no recombination between locus X and locus S and no gene conversion at locus X.
Counterpart of p_x: there is either at least one recombination event between locus X and locus S or at least one gene conversion at locus X.
Counterpart of q_y: there is no recombination between locus S and locus Y and no gene conversion at locus Y.
Counterpart of p_y: there is either at least one recombination event between locus S and locus Y or at least one gene conversion at locus Y.

For SNN (Tables A4–A6):

McVean (2007):

q_x and p_x have the same meaning as their NSN counterparts. q_y is used to indicate that no recombination event has occurred between locus X and locus Y. p_y is used to indicate that at least one recombination event has occurred between locus X and locus Y.

Notation with gene conversion:

The counterparts of q_x and p_x have the same meaning as in the NSN case.
Counterpart of q_y: there is no recombination between locus X and locus Y and no gene conversion at locus Y.
Counterpart of p_y (the complement of the previous quantity): there is either at least one recombination event between locus X and locus Y or at least one gene conversion at locus Y.

There are two additional probabilities present in the case of SNN:

q_s = (1 − p_x)(1 − g_s) is the probability that no recombination event has occurred between locus S and locus X and no gene conversion event has occurred at locus S.
p_s = [p_x(1 − g_s) + (1 − p_x)g_s + p_xg_s] = [p_x + g_s − p_xg_s] is the probability that a recombination event between locus S and locus X or a gene conversion event at locus S or both has occurred.
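The equivalence of the two expressions for p_s, and the fact that p_s and q_s are complementary, can be checked numerically. The sketch below treats recombination and gene conversion as independent events, as the product form of q_s implies; the function names and test values are hypothetical.

```python
def q_s(p_x, g_s):
    """No recombination between S and X and no gene conversion at S."""
    return (1.0 - p_x) * (1.0 - g_s)


def p_s_expanded(p_x, g_s):
    """Recombination only, conversion only, or both."""
    return p_x * (1.0 - g_s) + (1.0 - p_x) * g_s + p_x * g_s


def p_s_compact(p_x, g_s):
    """Inclusion-exclusion form of the same probability."""
    return p_x + g_s - p_x * g_s


if __name__ == "__main__":
    for p_x, g_s in [(0.0, 0.0), (0.025, 0.037), (0.3, 0.6), (1.0, 0.2)]:
        print(p_x, g_s, p_s_expanded(p_x, g_s), p_s_compact(p_x, g_s),
              1.0 - q_s(p_x, g_s))
```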

References

Chen, J.-M., D. N. Cooper, N. Chuzhanova, C. Ferec and G. P. Patrinos, 2007. Gene conversion: mechanisms, evolution and human disease. Nat. Rev. Genet. 8: 762–775.
Durrett, R., and J. Schweinsberg, 2004. Approximating selective sweeps. Theor. Popul. Biol. 66: 129–138.
Frisse, L., R. R. Hudson, A. Bartoszewicz, J. D. Wall, J. Donfack et al., 2001. Gene conversion and different population histories may explain the contrast between polymorphism and linkage disequilibrium. Am. J. Hum. Genet. 69: 831–843.
Gay, J., S. Myers and G. McVean, 2007. Estimating meiotic gene conversion rates from population genetic data. Genetics 177: 881–894.
Hill, W. G., and A. Robertson, 1968. Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38: 226–231.
Jeffreys, A. J., and C. May, 2004. Intense and highly localized gene conversion activity in human meiotic crossover hot spots. Nat. Genet. 36: 151–156.
Jeffreys, A. J., and R. Neumann, 2002. Reciprocal crossover asymmetry and meiotic drive in a human recombination hot spot. Nat. Genet. 31: 267–271.
Kaplan, N. L., R. R. Hudson and C. H. Langley, 1989. The hitchhiking effect revisited. Genetics 123: 887–899.
Maynard Smith, J., and J. Haigh, 1974. The hitchhiking effect of a favourable gene. Genet. Res. 23: 23–35.
McVean, G., 2002. A genealogical interpretation of linkage disequilibrium. Genetics 162: 987–991.
McVean, G., 2007. The structure of linkage disequilibrium around a selective sweep. Genetics 175: 1395–1406.
Ohta, T., and M. Kimura, 1971. Linkage disequilibrium between two segregating nucleotide sites under steady flux of mutations in a finite population. Genetics 68: 571–580.
Padhukasahasram, B., P. Marjoram and M. Nordborg, 2004. Estimating the rate of gene conversion on human chromosome 21. Am. J. Hum. Genet. 75: 386–397.
Sabeti, P. C., D. E. Reich, J. M. Higgins, H. Z. P. Levine, D. J. Richter et al., 2002. Detecting recent positive selection in the human genome from haplotype structure. Nature 419: 832–837.
Sabeti, P. C., P. Varilly, B. Fry, J. Lohmueller, E. Hostetter et al., 2007. Genome-wide detection and characterization of positive selection in human populations. Nature 449: 913–918.
Sjödin, P., I. Kaj, S. Krone, M. Lascoux and M. Nordborg, 2005. On the meaning and existence of an effective population size. Genetics 169: 1061–1070.
Stephan, W., T. H. E. Wiehe and M. W. Lenz, 1992. The effect of strongly selected substitutions on neutral polymorphisms: analytical results based on diffusion theory. Theor. Popul. Biol. 41: 237–254.
Voight, B. F., S. Kudaravalli, X. Wen and J. K. Pritchard, 2006. A map of recent positive selection in the human genome. PLoS Biol. 4(3): e72.
