Distribution of Parental Genome Blocks in Recombinant Inbred Lines

Olivier C Martin; Frédéric Hospital

doi:10.1534/genetics.111.129700

2011 Oct;189(2):645–654.

Editor: I Hoeschele

PMCID: PMC3189807 PMID: 21840856

Abstract

We consider recombinant inbred lines obtained by crossing two given homozygous parents and then applying multiple generations of self-crossings or full-sib matings. The chromosomal content of any such line forms a mosaic of blocks, each alternatively inherited identically by descent from one of the parents. Quantifying the statistical properties of such mosaic genomes has remained an open challenge for many years. Here, we solve this problem by taking a continuous chromosome picture and assuming crossovers to be noninterfering. Using a continuous-time random walk framework and Markov chain theory, we determine the statistical properties of these identical-by-descent blocks. We find that successive block lengths are only very slightly correlated. Furthermore, the blocks on the ends of chromosomes are larger on average than the others, a feature understandable from the nonexponential distribution of block lengths.

WITH the advent of dense genomic maps, in particular based on single-nucleotide polymorphism (SNP) data, the study of haplotypes has become central for modern analyses in population genetics (Buckler and Gore 2007; Carlton 2007; Frazer et al. 2007; Mott 2007; Jakobsson et al. 2008; Bryc et al. 2010). Here, the term haplotype refers to the series of alleles that an individual carries on a chromosome pair at a collection of (possibly many) loci and contrasts with single-locus genotypes that were the objects of many past studies. Haplotypic information can be used for association studies (Gold et al. 2008), for diversity studies (Lindblad-Toh et al. 2005), or for recognizing signals of positive selection using various measures of haplotype homozygosity (Sabeti et al. 2002; Zhang et al. 2006; Lencz et al. 2007; Tang et al. 2007; Curtis et al. 2008). Many approaches capitalize on the apparent “block” structure of haplotypes (Stumpf 2002; Cardon and Abecasis 2003; Wall and Pritchard 2003; Altshuler et al. 2005; Zheng and McPeek 2007). Various causes can be called upon to explain the apparent structuration of genomes in haplotype blocks (Tishkoff and Verrelli 2003; Zondervan and Cardon 2004; Pe’er et al. 2006), among which are recombination hotspots (Goldstein 2001; Jeffreys et al. 2001) and population structure (Pritchard et al. 2000; Grote 2007; Slate and Pemberton 2007). However, the situation is often complicated (Shifman et al. 2003; Yalcin et al. 2004; Cuppen 2005; Kauppi et al. 2005; Greenawalt et al. 2006; Moore et al. 2008). In particular, the theoretical properties of many of the objects mentioned above, e.g., haplotype block lengths, remain largely unknown. Often, the distribution of blocks is declared “nonrandom” (Curtis et al. 2008) although the null hypothesis is not clearly specified.

The task of determining statistical properties of chromosomal block structures has arisen in many different contexts. These can be classified into two types according to the kind of populations considered and lead to different mathematical techniques. In the first class, one asks how the genome of one or more parents in a population gets broken up into blocks at successive generations and how different descendants may share identical-by-descent (IBD) blocks. The framework most generally taken allows for random mating between individuals, a stochastic number of offspring for each individual, and possibly population growth; because of this stochasticity, the mathematical theory of branching processes plays a key role. The second class, on the contrary, assumes complete knowledge of all genealogies and so is relevant only for controlled crosses. But because the corresponding framework is thus constrained, Markov chains can be used to follow the statistics of IBD blocks and even how the genomes of all founding parents get shared among descendants.

The mathematical treatments in the first class typically build on Fisher’s “theory of junctions” (Fisher 1949). Fisher defined a junction in a chromosome as a boundary point between segments descended by different routes from the founders. Once formed by crossovers, junctions can be inherited, just like point mutations. Fisher (1949, 1954, 1959) and Bennett (1953) investigated the expected number of chromosomal regions separated by junctions for different systems of inbreeding (repeated selfing, repeated sib mating, repeated parent–offspring mating, etc.). Stam (1980) extended Fisher’s theory of junctions to a random mating population of constant size and any number of generations. Furthermore, he was able to derive the “probability distribution of the heterogenic part of the genome” (and not just the expected number of fragments) by assuming that the fragments were exponentially distributed—a critical hypothesis that was justified by numerical simulations. Chapman and Thompson (2003) extended Stam’s work to the case of a subdivided population. They were also able to relax the hypothesis of an exponential distribution and showed that the IBD tracts of chromosomes followed a distribution not quite exponential, having in fact a fat tail. They also determined how these properties were affected by the population size. Analogous work by Baird et al. (2003) focused on the case of a formally infinite population; this simplifies the problem because related individuals never mate with one another. Furthermore, they worked in the approximation of allowing only 0 or 1 crossover at each meiosis, a case sometimes referred to as complete interference; then each individual can carry at most just one block from the reference founding parent. Within this framework, they were able to treat the problem exactly, deriving in particular the distribution of the number of descendants containing blocks and the first two moments of these block sizes.

The mathematical treatment of the second class was initiated by Donelly (1983). Numerous studies since have derived exact mathematical results on different kinds of pedigree systems (Slatkin 1972; Franklin 1977; Donelly 1983; Guo 1994; Bickeboller and Thompson 1996a,b; Stefanov 2000; Browning and Browning 2002; Cannings 2003; Dimitropoulou and Cannings 2003; Ball and Stefanov 2005; Walters and Cannings 2005; Rodolphe et al. 2008). Such studies map the IBD problem to that of a random walk on a pedigree-dependent graph. It is that Markovian framework which we use here in the context of recombinant inbred lines, a particular kind of pedigree that has the additional complication of allowing for an infinite number of generations. We provide a description that is mathematically rigorous but also of practical use.

Recombinant inbred lines (RILs) can be derived by either self-fertilization (plants) or brother–sister matings (animals). RILs have become a tool of choice for animal and plant studies [genetic maps, QTL detection, and association studies (Churchill et al. 2004; Churchill 2007; Crow 2007; Keurentjes et al. 2007; Yu et al. 2008)]. Moreover, such lines are fixed and provide everlasting replicable reference homozygous genomes; these are very useful to dissect complex traits and estimate epistatic effects or genotype × environment interactions (Bergland et al. 2008; Maccaferri et al. 2008; Alcazar et al. 2009). To produce a RIL, one typically starts with F₁ hybrids derived from the cross of two homozygous parents, say P_A and P_a. Offspring are generated from these F₁ and the process is repeated for many generations; this can be done by selfing [single-seed descent (SSD)] or by full-sib mating (hereafter referred to as “SIB”). At each generation mean heterozygosity decreases and in fact the process tends toward homozygosity at all loci. Due to the formation of crossovers during meiosis, the genomes at each generation are mixtures of the two parental ones, in which closer loci have a higher probability of descending from the same parent P_A or P_a. The fixed genomes then form successions of blocks, each block being IBD to one of the two parents. In effect, we have a mosaic genome for the RIL, patching together pieces from each parent P_A or P_a. What is the mean length of blocks? We shall see that it is 0.5 M in SSD and 0.25 M in SIB if the chromosome genetic length is large. But one may also ask what is the block length distribution, what is the mean number of blocks on a finite chromosome, or even what are the analogous statistics before all loci are fixed. As genome coverage becomes dense or as one approaches a nucleotide-level description of genomes [be it for association studies or genomic selection (Meuwissen et al. 2001)], one is inevitably driven toward a continuous chromosome picture, requiring block-like descriptions. Here, we address the need to work at this level where blocks are the elements of interest. In particular, we show how to calculate block statistics in RILs, using a mix of combinatorial analysis and probability theory.

Model and Methods

Junctions

For all our work, we deal with diploid organisms and are concerned with the construction of RILs in both SSD and SIB. Each chromosome pair is subject to independent dynamics, so without loss of generality we can focus on the case of a single chromosome pair. Furthermore, the objects of study are the IBD blocks, hereafter simply referred to as blocks. The F₁ is considered to be the zeroth generation (g = 0). To go from one generation to the next, we produce one offspring in the case of SSD and a brother–sister pair in the case of SIB. An offspring individual is the union of two gametes, each of which is produced by a parent through meiosis, during which there can be crossovers. (In the case of SSD, there is just one parent.) Once a locus is fixed, it stays so forever. In SSD this simply requires the two homologous chromosomes to have the same allele at that locus; in SIB, it requires all four chromosomes to have the same allele.

We measure continuous positions on the chromosome in morgans, with the leftmost end of the chromosome corresponding to the origin of our axis; i.e., x = 0. Following the work of Fisher (1949, 1954, 1959), Bennett (1953), and Donelly (1983), a crossover is referred to as a “junction” and is identified with an arbitrarily precise point on the chromosome. We assume that crossovers arise without interference; then the production or not of a junction in the interval [x, x + dx] is independent of occurrences of junctions in any other interval. Here x denotes the genetic position, dx is infinitesimal, and junctions arise with density 1 along the chromosome. Figure 1 illustrates the use of these junctions when following successive generations under SSD. At one generation, consider the pair (H, H′) of homologous chromosomes. A meiosis takes place and results in an offspring chromosome (gamete) that is a mosaic of chromosome segments coming from either H or H′. A junction separates two adjacent segments. We use a binary label (0 or 1) to specify the origin (H or H′) of each segment. For any position x, having the list of its labelings for all generations g allows us to determine the IBD content at x as shown on the right-hand side of Figure 1. Note that the numbering of the junctions is done from left to right, not as a function of its occurrence in generations. The successive steps of the procedure are shown in Supporting Information, Figure S1: first one lays out the junctions and their numbering, then one introduces the binary labels across each junction, and finally one reconstructs the haplotypes (see File S1). As illustrated in Figure S2, the case of SIB mating is analogous, and again at each generation a chromosome consists of a mosaic of the founding parents’ chromosomes (see File S1).

Figure 1. Labeling in an SSD RIL. At each generation the homologs are called H and H′. To keep track of the IBD property, for each point on the continuous chromosome under consideration we specify the origin (H or H′ in the parent) using a 0–1 label, covering zones separated by junctions. The genotype at any generation g can be reconstructed from these binary numbers as shown on the right. Note that a junction need not separate two blocks.

So far, the introduction of junctions can be formulated for any pedigree system. Many previous studies have done this and mapped the IBD problem to that of a continuous-time random walk on a pedigree-dependent graph (Slatkin 1972; Franklin 1977; Donelly 1983; Guo 1994; Bickeboller and Thompson 1996a,b; Stefanov 2000; Browning and Browning 2002; Cannings 2003; Dimitropoulou and Cannings 2003; Ball and Stefanov 2005; Walters and Cannings 2005; Rodolphe et al. 2008). Here we consider RILs and then the Markovian framework’s pedigree-dependent graph is a hypercube. The sequence of binary labels at any given locus x specifies a unique vertex of the hypercube, and how this vertex changes as one moves along the chromosome determines the block structure. Note that some of this mathematical framework is close in spirit to that used for studying the coalescent in the presence of recombination; there the central object is the so-called ancestral recombination graph, and the problem (Wiuf and Hein 1999; McVean and Cardin 2005) is to describe how this graph changes with position along a continuous chromosome. This is a very difficult problem and so the authors of those studies derived relatively few exact results.

The application of this Markovian framework for SSD and SIB RILs requires considering all possible continuous-time walks on a high-dimensional hypercube. We do this in two steps. First, we enumerate by computer all possible discrete time random walks on that hypercube. Then we tackle the continuous waiting times of the original walks by analytical techniques. Finally, the numerical treatment of these analytical expressions is performed using Mathematica (Wolfram 1991). The C and Mathematica codes for these different tasks are provided in File S2.

The continuous-time Markov process

When creating the generations 1 to g, 2g gametes are produced in the SSD mating scheme, and 4g gametes are produced in the SIB mating scheme. Denote this number by N_c as it is also the number of (new) chromosomes produced in the RIL construction (remember we follow only one chromosome pair). Rather than follow each gamete from one generation to the next, it is (more) useful to consider all N_c gametes simultaneously. This can be visualized by stacking the pairs of chromosomes for all generations on top of each other and then scanning the chromosome stack from left to right to see where the junctions appear in order of increasing x.

Of great importance is the fact that the junctions on these N_c gametes are independent in all respects: having a junction on one gamete in [x, x + dx] does not affect the probability of having another junction elsewhere, be it on the same gamete or on any other gamete. Because of this independence, one can think of the production of junctions among the whole set of gametes as being a “continuous-time” Markov process (Feller 1950), where x plays the role of time. For the interval [x, x + dx], a junction arises with probability N_c dx, and then if such an event is realized the junction is assigned randomly to one of the N_c gametes (each with probability 1/N_c). The operation is then repeated for the interval [x + dx, x + 2dx], and so forth. We thus have a Markov process where interevent intervals are independent and distributed as

ρ (Δ x) = N_{c} \exp (- N_{c} Δ x)

(1)

while junction assignments to chromosomes are done equiprobably.
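As a minimal sketch of this process (not the authors' File S2 code), the junctions on the whole stack of gametes can be sampled by drawing exponential gaps with rate N_c, as in Equation 1, and assigning each junction to a gamete uniformly at random. The function name, the chromosome length of 50 M, and the choice g = 3 are illustrative assumptions.

```python
import random

def sample_stacked_junctions(n_gametes, length, rng):
    """Return (position, gamete_index) pairs in order of increasing position."""
    junctions = []
    x = 0.0
    while True:
        # Gap to the next junction anywhere on the stack: exponential, rate N_c (Equation 1).
        x += rng.expovariate(n_gametes)
        if x >= length:
            return junctions
        # Each junction lands on one of the N_c gametes with probability 1/N_c.
        junctions.append((x, rng.randrange(n_gametes)))

rng = random.Random(1)
g = 3                     # number of SSD generations (illustrative)
n_c = 2 * g               # number of gametes produced in SSD
junctions = sample_stacked_junctions(n_c, 50.0, rng)
per_gamete = [sum(1 for _, i in junctions if i == k) for k in range(n_c)]
print(len(junctions), per_gamete)
```

Each gamete then carries junctions at density 1 on average, so the per-gamete counts should each be close to the chromosome length in morgans.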

Discrete and continuous-time random walks on the hypercube

We initialize the binary labels at x = 0 randomly and uniformly because segregation is unbiased. The continuous-time Markov process extends these labels from x = 0 toward increasing x. At any given x, and using a {0, 1} notation for each binary label, we call ℳ the map from the N_c-dimensional hypercube ℋ = $\{0, 1\}^{N_{c}}$ to the genotypes at generation g; this map can be thought of as a coloring of the vertices of the N_c-dimensional hypercube. There are as many colors as there are one-locus genotypes at generation g: 4 for SSD and 16 for SIB. Then the block pattern at generation g can be “read off” by examining the succession of vertex colors visited by the Markov walk on the hypercube. Note that having a junction appear at x corresponds to hopping to a random neighboring vertex on ℋ at that time, while the residence time on each vertex is exponentially distributed (cf. Equation 1).
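The coloring and the walk can be illustrated for SSD with a short simulation. This is a sketch under an assumed bit layout (two labels per generation, 0 for H and 1 for H′), not the authors' implementation; the names `ssd_genotype` and `count_colour_changes` and the values g = 8 and L = 2000 M are chosen for illustration.

```python
import random

def ssd_genotype(bits, g):
    """Map a hypercube vertex (2g binary labels) to the ordered genotype at generation g.

    bits[2*j] and bits[2*j + 1] say, for the two gametes forming generation j + 1,
    which homolog of the parent (0 for H, 1 for H') was copied at this locus.
    """
    pair = ("A", "a")  # the F1 (generation 0) is heterozygous at every locus
    for j in range(g):
        pair = (pair[bits[2 * j]], pair[bits[2 * j + 1]])
    return pair

def count_colour_changes(g, length, rng):
    """Run the walk along [0, length] and count genotype changes at generation g."""
    n_c = 2 * g
    bits = [rng.randrange(2) for _ in range(n_c)]  # unbiased labels at x = 0
    colour = ssd_genotype(bits, g)
    x, changes = 0.0, 0
    while True:
        x += rng.expovariate(n_c)       # residence "time" on a vertex (Equation 1)
        if x >= length:
            return changes
        bits[rng.randrange(n_c)] ^= 1   # a junction flips one label: hop to a neighbor
        new_colour = ssd_genotype(bits, g)
        if new_colour != colour:        # only some hops change the color
            changes += 1
            colour = new_colour

rng = random.Random(0)
density = count_colour_changes(8, 2000.0, rng) / 2000.0
print(density)
```

Many hops leave the color unchanged, which is why a junction need not separate two blocks. For g = 8 the printed density of color changes should come out close to 2 per morgan, with a small excess from residual heterozygous segments.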

For the block statistics, we want to find the probability that the walk on ℋ leads to a given pattern of successive colors. For this, we sum over all possible walks compatible with the desired pattern. The crucial point is that the continuous variables of the interjunction values affect the lengths of the blocks, but not the pattern of successive blocks. This allows us to decompose the problem of block statistics into two parts. The first comes from the discrete set of possibilities for the sequence of vertices visited on ℋ (the “topology” of the junctions); we use the master equation of the discrete-time random walk on the hypercube to track these sequences. The second is associated with the continuous nature of the junction–junction intervals, which involves summing over known probability distributions derived from Equation 1.

Extracting block length distributions

Consider the simplest observable: the length of the first block along the chromosome. If the block is heterozygous, then its length distribution in SSD is that of the distance of the first junction and is given by Equation 1. Indeed, the locus x = 0 must be heterozygous at generation g, but in SSD, at the very first hop of our random walk on the hypercube, the heterozygous block will end, starting a fixed block. The situation is more instructive if the first block is homozygous (fixed). To calculate the length distribution of that first block, we first consider all possibilities for the different walks from the starting vertex (defined from the situation at x = 0). A walk will maintain the homozygous structure at generation g for perhaps a few hops and then one hop will change that. If k is the first hop that ends the block, we can collect together all the discrete time walks that have the same k. We thus define P⁽¹⁾(k) as the probability to perform k – 1 hops while staying at generation g in the same fixed state as that of x = 0 and to then terminate the block at the kth hop. P⁽¹⁾(k) is the sum of the probabilities of all discrete time walks on ℋ that are compatible with staying in the first fixed state during exactly k – 1 hops. Because the number of such walks grows exponentially with k, it is best to determine this quantity by recursions rather than by enumerations. This is precisely what is done when using the associated master equation. Each iteration of that equation updates a vector on the hypercube and generates the successive P⁽¹⁾(j). To obtain P⁽¹⁾(k) one has to perform k iterations of the master equation. File S1 specifies this master equation, the initialization of the vector iterated, and the relation between the iterated vector and P⁽¹⁾(j); the C programs for implementing these iterations are also provided (see File S2).
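The recursion for P⁽¹⁾(k) can be sketched as follows for SSD with a first block fixed as AA. This is a simplified illustration, not the master equation as written in File S1: it assumes a bit layout of two labels per generation (0 for H, 1 for H′) and stores the iterated vector in a dictionary, which is only practical for small g.

```python
from itertools import product

def ssd_genotype(bits, g):
    """Ordered genotype at generation g for a vertex given as 2g binary labels."""
    pair = ("A", "a")  # the F1 is heterozygous
    for j in range(g):
        pair = (pair[bits[2 * j]], pair[bits[2 * j + 1]])
    return pair

def first_block_hop_probabilities(g, max_hops):
    """Return [P(1), ..., P(max_hops)] for a first block fixed as AA at generation g."""
    n_c = 2 * g
    inside = [v for v in product((0, 1), repeat=n_c) if ssd_genotype(v, g) == ("A", "A")]
    # Labels at x = 0 are uniform, so conditioning on AA gives a uniform start on `inside`.
    weight = {v: 1.0 / len(inside) for v in inside}
    probs = []
    for _ in range(max_hops):
        new_weight = dict.fromkeys(inside, 0.0)
        for v, w in weight.items():
            for i in range(n_c):
                u = v[:i] + (1 - v[i],) + v[i + 1:]   # neighbor reached by flipping label i
                if u in new_weight:                    # this hop keeps the genotype AA
                    new_weight[u] += w / n_c
        # The mass that left the AA vertices at this hop terminates the block.
        probs.append(sum(weight.values()) - sum(new_weight.values()))
        weight = new_weight
    return probs

print(first_block_hop_probabilities(2, 5))
```

Each pass through the outer loop is one iteration of the recursion, so k passes yield P⁽¹⁾(1), …, P⁽¹⁾(k). For g = 1 every hop ends the block, so P⁽¹⁾(1) = 1; for g = 2 a direct count over the six AA vertices gives P⁽¹⁾(1) = 1/2.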

Given the P⁽¹⁾(k) probabilities, we can reintroduce the continuous times spent on each vertex of the hypercube to get the distribution of the length of the first block. Indeed, for SSD as well as for SIB, for all walks that contribute to this situation, x will go from 0 to x_k with x_k distributed as a rescaled Gamma distribution,

ρ_{k} (x_{k}) = \frac{N_{c}^{k} x_{k}^{k - 1} \exp (- N_{c} x_{k})}{(k - 1)!}

(2)

as this is the distribution of the sum of k independent exponentially distributed variables. The distribution of ℓ₁, the length of the first block (assuming it is fixed), is then given by

μ^{(1)} (ℓ_{1}) = \sum_{k = 1}^{\infty} P^{(1)} (k) ρ_{k} (ℓ_{1}) .

(3)

This result holds for an infinite chromosome. For a finite chromosome of length L, we note that if ℓ₁ > L, we have “stepped off” the finite chromosome. Thus, if the value of ℓ₁ (distributed as in Equation 3) is greater than L, we see that on the finite chromosome the block is actually only L long. Thus to adapt Equation 3 to a finite chromosome, we simply keep the distribution as is when ℓ₁ < L while for all those values ℓ₁ ≥ L we set ℓ₁ = L. Mathematically, this generates a delta function at that point of weight given by the probability that ℓ₁ ≥ L. This derivation corresponds to a simple truncation of a distribution, and it can be extended to other observables. These include the length of the nth block, which requires calculating the probabilities P⁽ⁿ⁾(k) of stepping off the nth block after k steps, and the joint distribution of lengths for different blocks. Details on such derivations are given in File S1. The Mathematica codes for performing sums like those in Equation 3 are also provided in File S2.
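Equations 2 and 3 and the truncation to a finite chromosome can be evaluated directly once the hop probabilities are known. The sketch below is in Python rather than the authors' Mathematica; the list `hop_probs` holds made-up illustrative values, not results from the paper, and the survival function uses the standard closed form for a Gamma distribution with integer shape.

```python
import math

def gamma_density(k, n_c, x):
    """Equation 2: density of the sum of k independent exponential gaps of rate n_c."""
    if x <= 0.0:
        return float(n_c) if (k == 1 and x == 0.0) else 0.0
    log_density = k * math.log(n_c) + (k - 1) * math.log(x) - n_c * x - math.lgamma(k)
    return math.exp(log_density)

def gamma_survival(k, n_c, x):
    """Probability that the sum of k exponential gaps of rate n_c is at least x."""
    term, total = math.exp(-n_c * x), 0.0
    for j in range(k):
        total += term                 # adds (n_c x)^j exp(-n_c x) / j!
        term *= n_c * x / (j + 1)
    return total

def first_block_density(hop_probs, n_c, length):
    """Equation 3, with hop_probs[k - 1] = P(k)."""
    return sum(p * gamma_density(k, n_c, length) for k, p in enumerate(hop_probs, start=1))

def weight_at_chromosome_end(hop_probs, n_c, chrom_length):
    """Weight of the delta function at L: probability that the untruncated block reaches L."""
    return sum(p * gamma_survival(k, n_c, chrom_length) for k, p in enumerate(hop_probs, start=1))

hop_probs = [0.5, 0.3, 0.2]   # hypothetical P(1), P(2), P(3), for illustration only
print(first_block_density(hop_probs, 4, 0.25), weight_at_chromosome_end(hop_probs, 4, 1.0))
```

Working with the logarithm of the Gamma density avoids overflow when k is large, which matters because the sum in Equation 3 may need many terms before P⁽¹⁾(k) becomes negligible.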

Results

Infinite and semi-infinite chromosomes

Mean block lengths:

As the number g of generations increases, alleles become fixed and since nothing changes thereafter the statistics of the blocks must have a limit at large g. In that situation, there is only alternation of IBD blocks homozygous of type P_A or P_a. We focus on the statistics of such blocks, either at arbitrary g or in the g → ∞ limit, approximated by taking g large enough. However, our computational framework applies to arbitrary blocks, homozygous or heterozygous. From the practical point of view, we are limited computationally by the part of the algorithm that follows occupation probabilities on the hypercube ℋ: executing this master equation on the computer uses $O (N_{J} N_{c} 2^{N_{c}})$ operations where N_J is the maximum number of hops of the walks; this restricts our study to 14 generations in SSD and 7 generations in SIB.

A simple statistic of blocks is their mean length 〈ℓ〉. This quantity is related to the density η of block extremities: on a very large chromosome of size L, the number of blocks n will satisfy n/L ≈ η while 〈ℓ〉 ≈ L/n. For SSD RILs the density η is known to be 2 while it is 4 in SIB. Such a result follows by considering a small interval [x₁, x₂] and asking that the interval be recombinant. In SSD this occurs with probability R = 2r/(1 + 2r), where r is the recombination rate per meiosis between x₁ and x₂ (Haldane and Waddington 1931). Taking x₂ – x₁ to be infinitesimal, we get r ≈ (x₂ – x₁) (Haldane 1919) and so R ≈ 2(x₂ – x₁); noting that recombination then implies the presence of a block extremity in this interval, we see that the density of block extremities is 2. Setting η = 2, one obtains directly 〈ℓ〉 = $\frac{1}{2}$, valid at large g and for large chromosomes. An identical reasoning gives 〈ℓ〉 = $\frac{1}{4}$ for SIB since in this case R = 4r/(1 + 6r).
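The limit can be checked numerically. The sketch below assumes Haldane's no-interference map function r = (1 − e^{−2d})/2 for the per-meiosis recombination rate at map distance d, and evaluates R/d for a very small d; the function names are illustrative.

```python
import math

def haldane_r(d):
    """Recombination rate per meiosis at map distance d (morgans), no interference."""
    return -0.5 * math.expm1(-2.0 * d)   # equals (1 - exp(-2d)) / 2, accurate for small d

def ril_recombinant_fraction(d, scheme):
    """Probability that two loci d morgans apart are recombinant in the fixed RIL."""
    r = haldane_r(d)
    if scheme == "SSD":
        return 2.0 * r / (1.0 + 2.0 * r)
    if scheme == "SIB":
        return 4.0 * r / (1.0 + 6.0 * r)
    raise ValueError("scheme must be 'SSD' or 'SIB'")

d = 1e-6
for scheme in ("SSD", "SIB"):
    eta = ril_recombinant_fraction(d, scheme) / d   # density of block extremities
    print(scheme, round(eta, 3), round(1.0 / eta, 3))
```

The ratio R/d tends to 2 for SSD and 4 for SIB as d shrinks, giving mean block lengths of 1/2 and 1/4 morgan.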

Distribution of block lengths:

Getting the distribution of a block length cannot be achieved by such shortcuts. Instead, we generalize Equation 3, again at very large g and for a very long (semi-infinite) chromosome. Starting from x = 0, the successive block lengths are ℓ₁, ℓ₂, … ; as the block number n increases, the distributions of ℓ_n tend toward a limiting distribution μ*(ℓ) that has no memory of the state at the chromosome’s origin. The computation of the distribution at any given n, in direct analogy with what was done for ℓ₁, requires calculating the probabilities P⁽ⁿ⁾(k) of staying on the nth block during k – 1 hops and stepping off at the kth one. Again, we use the master equation to compute these quantities iteratively; cf. File S1. Then the distribution of ℓ_n is obtained by replacing the P⁽¹⁾(k) in Equation 3 by P⁽ⁿ⁾(k). These successive distributions are displayed for SSD in Figure 2. Note that the convergence in n is very rapid; only the first block is visibly different from the others. In File S1, we provide a parameter-free approximation to μ*(ℓ) that works quite well as shown in Figure S3. In the case of SIB RILs, the convergence with n is much slower as shown in Figure S4.

Figure 2. Block length distribution. Displayed is the probability density of homozygous block length for block number 1, 2, . . . , 10 in SSD. The chromosome is semi-infinite, and the number of generations is large enough for fixation to be reached; except for the first block, the curves seem to superpose but in fact are distinct.

A log–log plot of these distributions shows that they are not exponential, though in the tail they all are well approximated by an exponential; such a form in the tail is a consequence of the spectral decomposition of the Markov process (see File S1). Note that these distributions for ℓ₁, ℓ₂, … must all decay at the same asymptotic rate. Another important point is that the distribution of ℓ₂ is slightly different from that of ℓ₃, proving that the successive lengths are not independent: the block lengths are not generated by a stationary renewal process.

Although we were not able to derive the analytic form of μ*, we nevertheless have

μ^{*} (ℓ \to 0) = 3 in SSD and μ^{*} (ℓ \to 0) = 7 in SIB .

(4)

This can be proved by relating μ*(ℓ → 0) to double-recombinant frequencies as follows. We take two successive intervals I_{1,2} and I_{2,3}, each of length dx (infinitesimal), and ask what the probability is that the intervals are both recombinant. There is a probability 2dx in SSD and 4dx in SIB that the first interval is recombinant (recall that the densities of block extremities are respectively 2 and 4). Then given this first extremity, the probability that there is another extremity in the second interval is μ*(ℓ → 0) dx, leading to a total probability of 2dx · μ*(ℓ → 0) dx in SSD and 4dx · μ*(ℓ → 0) dx in SIB. However, this double-recombinant probability can also be computed (Martin and Hospital 2006) in terms of the three recombination frequencies R_{1,2}, R_{2,3}, and R_{1,3} associated with the locus pairs (1, 2), (2, 3), and (1, 3). In the limit of small dx, it is (2dx)² × 3/2 for SSD and (4dx)² × 7/4 for SIB. Identifying the different expressions, we get the claimed results.

Analogous studies can be performed at given values of g, including or not heterozygous blocks. Recalling that fixation arises rather rapidly in SSD, it is no surprise that the statistics of homozygous blocks converge quickly to a large g limit. For example, if one computes in SSD the length distribution of the first block when it is homozygous, one finds that it does not vary much for g ≥ 3 as illustrated in Figure S5. For completeness, we show the analogous result in SIB in Figure S6.

The first block is longer than the following ones:

The system does not follow a stationary renewal process. Nevertheless, it turns out that the large difference we see between the first and the remaining blocks is not so much due to the memory from one block to the next but to the nonexponential distribution of block lengths. Even in the presence of memory from block to block, one has the following general relation on a semi-infinite chromosome at large g,

〈 ℓ_{1} 〉 = \frac{〈 ℓ_{\infty}^{2} 〉}{2 〈 ℓ_{\infty} 〉},

(5)

where ℓ₁ denotes the length of the first block and ℓ_∞ denotes that of faraway blocks. The proof boils down to considering the blocks on the infinite line and taking the origin of the (semi-infinite) chromosome at random. It falls inside a block of length ℓ_∞ with probability density proportional to ℓ_∞ itself. Denoting as before by μ⁽¹⁾(ℓ₁) the probability density of the length of the first block, we have

μ^{(1)} (ℓ_{1}) = \frac{\int_{ℓ_{1}}^{\infty} μ^{*} (ℓ_{\infty}) d ℓ_{\infty}}{〈 ℓ_{\infty} 〉} .

(6)

(The reader can check that this is a normalized probability density.) Using this density, the computation of the first moment of ℓ₁ leads directly to Equation 5. To interpret this result, note that Equation 5 implies that the relative difference (〈ℓ₁〉 – 〈ℓ_∞〉)/〈ℓ_∞〉 is equal to [relative variance of μ*(ℓ_∞) − 1]/2, where the relative variance is the variance divided by the squared mean. When μ*(ℓ_∞) is a pure exponential, this quantity vanishes and then 〈ℓ₁〉 = 〈ℓ_∞〉. Thus we have 〈ℓ₁〉 > 〈ℓ_∞〉 if and only if the relative variance of μ*(ℓ_∞) > 1, which is what we find to happen in this system. For instance in the SSD case, we have 〈ℓ₁〉 = 0.595, to be compared with 〈ℓ_∞〉 = $\frac{1}{2}$. The first block is thus on average substantially larger than the others.
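For readers who want the intermediate steps, one way to go from Equation 6 to Equation 5 is to exchange the order of integration; the symbol σ²_{ℓ_∞} for the variance of ℓ_∞ is introduced here for convenience and is not used elsewhere in the text.

```latex
\begin{align*}
\langle \ell_1 \rangle
  &= \int_0^\infty \ell_1 \, \mu^{(1)}(\ell_1) \, d\ell_1
   = \frac{1}{\langle \ell_\infty \rangle}
     \int_0^\infty \ell_1 \int_{\ell_1}^{\infty} \mu^{*}(\ell_\infty) \, d\ell_\infty \, d\ell_1 \\
  &= \frac{1}{\langle \ell_\infty \rangle}
     \int_0^\infty \mu^{*}(\ell_\infty) \left( \int_0^{\ell_\infty} \ell_1 \, d\ell_1 \right) d\ell_\infty
   = \frac{1}{\langle \ell_\infty \rangle}
     \int_0^\infty \frac{\ell_\infty^{2}}{2} \, \mu^{*}(\ell_\infty) \, d\ell_\infty
   = \frac{\langle \ell_\infty^{2} \rangle}{2 \langle \ell_\infty \rangle}, \\
\frac{\langle \ell_1 \rangle - \langle \ell_\infty \rangle}{\langle \ell_\infty \rangle}
  &= \frac{\langle \ell_\infty^{2} \rangle}{2 \langle \ell_\infty \rangle^{2}} - 1
   = \frac{1}{2} \left( \frac{\sigma_{\ell_\infty}^{2}}{\langle \ell_\infty \rangle^{2}} - 1 \right).
\end{align*}
```

The last equality uses 〈ℓ_∞²〉 = σ²_{ℓ_∞} + 〈ℓ_∞〉². With the quoted values 〈ℓ₁〉 = 0.595 and 〈ℓ_∞〉 = 1/2, the relative difference is 0.19, which corresponds to a relative variance of about 1.38.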

From Figure 2 one can see that μ⁽¹⁾ is more spread out than μ*; μ* gives μ⁽¹⁾ from Equation 6 so that μ⁽¹⁾(0) = 1/〈ℓ_∞〉, i.e., 2 in SSD and 4 in SIB. Note that μ⁽¹⁾(0)dx can also be interpreted as the probability of having the first block end between x = 0 and x = dx; since that is the same as the density of block extremities times dx, i.e., 2dx or 4dx, we indeed recover the result μ⁽¹⁾(0) = 2 for SSD and μ⁽¹⁾(0) = 4 for SIB.

The lengths of successive blocks are slightly correlated:

Even though junctions are independent, each junction affects the IBD property in its neighborhood. Two positions will have nearly independent IBD only when they are distant along the chromosome because only in that case will there be many crossover events separating them. It thus seems natural to expect that the successive block lengths will not be independent in contrast to the underlying Δx separating junctions. In fact this must be the case given that we found earlier that on a semi-infinite chromosome the distribution of lengths is different for the second and the third block.

Our framework allows one to compute joint distributions and thus the linear correlation coefficients

C (ℓ_{n}, ℓ_{n + 1}) = \frac{〈 ℓ_{n} ℓ_{n + 1} 〉 - 〈 ℓ_{n} 〉 〈 ℓ_{n + 1} 〉}{σ_{ℓ_{n}} σ_{ℓ_{n + 1}}},

(7)

where $σ_{ℓ_{n}}$ (resp. $σ_{ℓ_{n + 1}}$ ) is the standard deviation of ℓ_n (resp. ℓ_n₊₁). The joint distribution of ℓ_n and ℓ_n₊₁ is given by

μ^{(n, n + 1)} (ℓ_{n}, ℓ_{n + 1}) = \sum_{k_{n} = 1}^{\infty} \sum_{k_{n + 1} = 1}^{\infty} P^{(n, n + 1)} (k_{n}, k_{n + 1}) ρ_{k_{n}} (ℓ_{n}) ρ_{k_{n + 1}} (ℓ_{n + 1}),

where P⁽ⁿ,ⁿ⁺¹⁾(k_n, k_n₊₁) is the probability that the nth block ends after k_n hops and the (n + 1)th after k_n₊₁. From this distribution, the mean of the product ℓ_nℓ_n₊₁ is the sum over (k_n, k_n₊₁) of the probabilities P⁽ⁿ,ⁿ⁺¹⁾(k_n, k_n₊₁) times the average of ℓ_n times the average of ℓ_n₊₁ at the given numbers of hops (factorization), each of which is obtained from Equation 2. The linear correlation coefficient then reduces to

C (ℓ_{n}, ℓ_{n + 1}) = \frac{〈 k_{n} k_{n + 1} 〉 - 〈 k_{n} 〉 〈 k_{n + 1} 〉}{{[(σ_{k_{n}}^{2} + 〈 k_{n} 〉) (σ_{k_{n + 1}}^{2} + 〈 k_{n + 1} 〉)]}^{1 / 2}},

(8)

where $σ_{k_{n}}$ (resp. $σ_{k_{n + 1}}$) is the standard deviation of k_n (resp. k_n₊₁). These quantities are directly obtainable from the probability P⁽ⁿ,ⁿ⁺¹⁾(k_n, k_n₊₁), as long as the number of generations is not too large. We have computed these quantities in SSD for a semi-infinite chromosome, for different g’s and choices of block numbers. For instance, if we consider the first and second blocks, assumed to be fixed, the value of C(ℓ₁, ℓ₂) is –0.0197 at g = 2, –0.0125 at g = 3, –0.00841 at g = 4, … , with a trend that is compatible with a vanishing limit at large g. We find the same trend for the following blocks too. Furthermore, at given g, C(ℓ_n, ℓ_n₊₁) rapidly converges to a limiting value as n increases. This is illustrated in Table S1. The computer programs for obtaining these correlation coefficients are also provided (see File S2).
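Equation 8 needs only the moments of the joint hop-count distribution. The following sketch evaluates it from a table of probabilities; the dictionary format and the two toy tables are assumptions made for illustration and do not come from the paper's data.

```python
import math

def block_length_correlation(joint):
    """Equation 8 from a table {(k_n, k_next): probability} of joint hop counts."""
    total = sum(joint.values())   # renormalize in case the table was truncated
    mean_a = sum(p * a for (a, _), p in joint.items()) / total
    mean_b = sum(p * b for (_, b), p in joint.items()) / total
    var_a = sum(p * a * a for (a, _), p in joint.items()) / total - mean_a ** 2
    var_b = sum(p * b * b for (_, b), p in joint.items()) / total - mean_b ** 2
    cov = sum(p * a * b for (a, b), p in joint.items()) / total - mean_a * mean_b
    # Each block length adds variance <k> / N_c^2 from the exponential gaps,
    # which is why <k> appears next to the variance of k; N_c itself cancels.
    return cov / math.sqrt((var_a + mean_a) * (var_b + mean_b))

independent = {(1, 1): 0.25, (1, 2): 0.25, (2, 1): 0.25, (2, 2): 0.25}
locked = {(1, 1): 0.5, (3, 3): 0.5}
print(block_length_correlation(independent), block_length_correlation(locked))
```

Even when the two hop counts are identical, as in the second toy table, the correlation of the lengths stays below 1 (here 1/3) because the continuous gaps within each block are drawn independently.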

Case of finite chromosomes

Length distributions:

Clearly on a finite chromosome of length L the distributions of block lengths are modified. The computations are more complicated, but remain feasible. As an illustration, consider the case where the chromosome has just two blocks. We use the P⁽¹,²⁾(k₁, k₂) probabilities, and for each (k₁, k₂) we impose the filter that ℓ₁ < L while ℓ₁ + ℓ₂ > L. By left–right symmetry the two blocks, of lengths ℓ₁ and L – ℓ₁, then have the same probability density, and thus the distribution must be symmetric about L/2, and we also have 〈ℓ₁〉 = 〈ℓ₂〉 = L/2. The explicit expression for this density, up to normalization by the probability of having exactly two blocks, is

\mu^{(1)}(\ell_{1}) = \sum_{k_{1}=1}^{\infty} \sum_{k_{2}=1}^{\infty} P^{(1,2)}(k_{1}, k_{2}) \, \rho_{k_{1}}(\ell_{1}) \int_{L - \ell_{1}}^{\infty} \rho_{k_{2}}(\ell_{2}) \, d\ell_{2} .

(9)

From this it turns out that μ⁽¹⁾(ℓ₁) has a minimum at ℓ₁ = L/2: it is more likely to have one rather short and one rather long block than to have two blocks of approximately the same size.

A comparison with experimental RIL data:

It is appropriate to compare our theoretical computations with block statistics measured in experimental RILs. Since the block sizes are random variables, it is best to work with RIL data sets where (i) the block structure has been determined precisely, requiring high-density genotyping, and (ii) there are many lines, an easier task for SSD than for SIB crosses. Such a data set has been produced within the species Arabidopsis thaliana by Singer et al. (2006). These authors genotyped several hundred thousand loci via hybridization arrays, from which they determined block extremities in 100 SSD recombinant inbred lines derived from the crossing of Columbia and Landsberg homozygotes. Singer et al. provide the genetic map of their cross and the physical positions of the block extremities. From these data we determined the block lengths for each RIL and each of the five chromosomes. We display in Figure S7 the distributions of the first block length, for all five chromosomes. The solid line is the theoretical curve, corresponding to the infinite chromosome case but truncated to the genetic length of each chromosome. As was explained previously, if the (infinite chromosome) random variable ℓ₁ is larger than the length L of the chromosome, one sees in practice a block of length ℓ₁ = L; this happens with a finite probability that is represented in Figure S7, using a solid dot. The histograms are for the experimental data, and we have included the 95% confidence intervals for each bin. We see that the theoretical predictions agree well with the experimental values except for the last bin of chromosome 1 and chromosome 4. Interestingly, for both of these the segregation data exhibit significant distortion; such distortion, typically caused by loci under selection pressures, can affect recombination rates. It is thus satisfying that the agreement between theory and experiment is as good as it is in spite of this distortion.

Number of blocks:

Clearly the typical number of blocks will grow with the total length L of the chromosome. Furthermore, for large g, the density of block extremities is 2 in SSD (4 in SIB); thus at large L the mean number of blocks should grow as 2L in SSD (as 4L in SIB). It is also of interest to determine the distribution of the number of blocks.

Consider first the probability that the whole chromosome at generation g is in one single homozygous block. Starting at the left end of the chromosome, we must be in a fixed state: we choose it to be, for instance, the P_A genotype. This constraint is used to set the occupation probabilities of the random walks on ℋ before the first hop. Explicitly, at x = 0 we introduce V_i⁽⁰⁾ = 0 on vertex i if its color is incompatible with the P_A genotype; otherwise V_i⁽⁰⁾ is a site-independent constant such that the total probabilities sum to 1. At each junction (hop), the master equation is used to update the vector of probabilities on the hypercube, and so after K hops we have the vector $\{V_{i}^{(K)}\}_{i = 1, \ldots, 2^{N_{c}}}$. We iterate up to a given total of N_J junctions. In practice we cannot take N_J = ∞ because of the numerical nature of the algorithm; instead we take N_J sufficiently large that only negligible probabilities are dropped in the truncation. As a ballpark estimate, N_J must be large enough to have N_J/N_c ≫ L, so that each gamete can have many junctions per morgan. During the application of the master equation to the vector, hops terminating the block are stored and we keep in a file the probabilities Π⁽¹⁾(K) that the first block ends after K hops, 1 ≤ K ≤ N_J. Then the probability p₁ that there will be just one block in 0 ≤ x ≤ L is given by

p_{1} = \sum_{K=1}^{\infty} \Pi^{(1)}(K) \int_{L}^{\infty} \rho_{K}(x) \, dx

(10)

times the probability of fixing at x = 0 (in SSD this is 1 – 1/2^g; in SIB it is given by a recurrence relation). Note that these integrals correspond to incomplete Gamma functions, allowing for a relatively efficient computation (see File S2). As mentioned before, in practice the sum over K is truncated to K ≤ N_J and one must check that the error induced by this truncation is small enough. We find that the probability of having a single block decreases exponentially as L grows. For instance in SSD, at large g, the probability is 0.417 for L = 0.5 M, 0.189 for L = 1.0 M, and 0.088 for L = 1.5 M.
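The truncated sum of Equation 10 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ρ_K is taken to be a Gamma density with integer shape K and a single hop rate `rate` (the actual density is fixed by Equation 2), the names `gamma_tail` and `prob_single_block` are hypothetical, and the geometric Π⁽¹⁾(K) used in the example is a toy input rather than the output of the master equation.

```python
import math


def gamma_tail(shape, x):
    """Upper tail of a Gamma(shape, rate 1) variable for integer shape >= 1:
    exp(-x) * sum_{j < shape} x**j / j!  (regularized incomplete Gamma)."""
    term = math.exp(-x)
    total = term
    for j in range(1, shape):
        term *= x / j
        total += term
    return total


def prob_single_block(pi_first, rate, length):
    """Truncated Equation 10: sum over K of Pi1(K) * P(length of K hops > L).

    pi_first[K - 1] holds Pi^(1)(K) for K = 1..N_J; `rate` is the assumed
    hop rate per morgan and `length` is L in morgans.
    """
    return sum(
        prob * gamma_tail(k, rate * length)
        for k, prob in enumerate(pi_first, start=1)
    )


# Toy input (an assumption): the first block ends at each hop with probability q.
q, rate, n_j = 0.5, 4.0, 200
pi_first = [q * (1.0 - q) ** (k - 1) for k in range(1, n_j + 1)]
for length in (0.5, 1.0, 1.5):
    print(length, prob_single_block(pi_first, rate, length))
```

For this geometric toy input the sum collapses to exp(–q · rate · L), which gives a convenient check that N_J has been chosen large enough for the truncation error to be negligible.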

The previous approach can be extended to compute the probability that the chromosome consists of n blocks as follows. For simplicity, consider directly the large g limit so we can ignore hops on ℋ leading to heterozygous genotypes at g. We use the master equation framework to keep track of the joint probabilities of being on a vertex of ℋ and of being in the mth block, for the number of hops K going from 0 to N_J. At each hop, there are transitions from vertex to vertex and potentially from block to block. If one hops to a vertex incompatible with the desired block pattern, the probability is set to 0. We keep track of the probabilities Π⁽ⁿ⁾(K) that the nth block terminates at the Kth hop. The probability of having at most n blocks when 0 ≤ x ≤ L is then given by the same formula as for one block (Equation 10) but replacing Π⁽¹⁾(K) by Π⁽ⁿ⁾(K). (Note that K is the sum of the number of hops for blocks 1, 2, … , n.) Repeating the computation for n – 1, we obtain the probability of having at most n – 1 blocks in the region 0 ≤ x ≤ L, and taking the difference of the two results we obtain the probability of having exactly n blocks (see File S2). As an illustrative example, Figure 3 gives the probability distribution of n at large g in the case of SSD for a chromosome of length L = 1.5 M.

Figure 3. Distribution of block number in SSD RIL. The histogram shows the frequencies of having 1, 2, . . . blocks in SSD at large number of generations; here L = 1.5 M.
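The differencing step used to obtain the probability of exactly n blocks can be sketched as follows. As before, this is a minimal illustration and not the code of File S2: ρ_K is assumed to be a Gamma density with integer shape K and hop rate `rate`, the tables Π⁽ⁿ⁾(K) are passed as hypothetical dictionaries, and the toy input (every block ending after exactly one hop) is chosen only because it reduces to a Poisson process that is easy to verify.

```python
import math


def gamma_tail(shape, x):
    """P(Gamma(shape, rate 1) > x) for integer shape >= 1."""
    term = math.exp(-x)
    total = term
    for j in range(1, shape):
        term *= x / j
        total += term
    return total


def prob_at_most(pi_n, rate, length):
    """Equation 10 with Pi^(1) replaced by Pi^(n).

    pi_n maps the cumulative hop number K to Pi^(n)(K), the probability
    that the nth block terminates at the Kth hop.
    """
    return sum(p * gamma_tail(k, rate * length) for k, p in pi_n.items())


def block_count_distribution(pi_by_block, rate, length):
    """Return [P(exactly 1 block), P(exactly 2 blocks), ...].

    pi_by_block[n - 1] is the table Pi^(n); successive 'at most n'
    probabilities are differenced, as described in the text.
    """
    exact = []
    previous = 0.0
    for pi_n in pi_by_block:
        current = prob_at_most(pi_n, rate, length)
        exact.append(current - previous)
        previous = current
    return exact


# Toy input (an assumption): the nth block always ends at hop K = n.
tables = [{n: 1.0} for n in range(1, 13)]
distribution = block_count_distribution(tables, rate=2.0, length=1.5)
print(distribution)
```

In this toy case the block extremities form a Poisson process, so the probability of exactly n blocks is exp(–rate · L)(rate · L)^(n–1)/(n – 1)!; the true RIL distribution departs from this because block lengths are not exactly exponential.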

Discussion

In the dense marker or continuous chromosome picture, genotypes appearing in a RIL form IBD blocks. The block structure is nontrivial in part because the consequence of a crossover at generation g depends on crossovers arising at previous generations. To study the statistics of such blocks, we used a labeling procedure that allows for a mapping onto a Markov process. Such a process reduces our problem of blocks to linear operations (associated with a master equation that we implemented in a C code), followed by relatively standard analysis involving sums and integrals (that we treated via Mathematica). The sources for these computations are provided with this work (see File S2).

We illustrated a number of properties of block statistics, highlighting in particular several closed-form results. Although the joint statistics of blocks are quite complex, we found that block-to-block correlations were very weak and thus genotype frequencies are well approximated by a stationary renewal process. In addition, the distributions of block lengths are not too far from exponential. Thus approximating the RIL case using exponential distributions will lead to qualitatively correct results with an accuracy of ∼20%. This level of accuracy should hold for other systems of crosses such as randomly mating populations, justifying the use of the exponential approximation in several previous studies (Stam 1980; Chapman and Thompson 2003).

We mainly stressed cases with complete fixation because the construction of RILs aims to have homozygous genotypes, but our formalism is applicable to both homozygous and heterozygous blocks. Note that within SSD RILs, as shown in Results, heterozygous blocks have an exponential distribution for their lengths; furthermore, heterozygous blocks interrupt the memory of the process in SSD; that is, two blocks separated by a heterozygous segment are independent. The reason is that there is only one way to be heterozygous in SSD (up to irrelevant exchanges of gametes at the same generation).

Clearly it is possible to extend our formalism to cases involving more than two parents; this may be of use when dealing with multiparental RILs that are being developed currently to have greater power in association studies (Churchill et al. 2004). We hope our results will stimulate work in this direction.

Acknowledgments

We thank F. Rodolphe for discussing his results with us. This work has been supported by grants from the Agence Nationale de la Recherche: ANR-09-GENM-022-003 SingleMeiosis (to O.C.M.) and ANR-O6-BLANC-0128 (to F.H.).

Literature Cited

Alcazar R., Garcia A. V., Parker J. E., Reymond M., 2009. Incremental steps toward incompatibility revealed by Arabidopsis epistatic interactions modulating salicylic acid pathway activation. Proc. Natl. Acad. Sci. USA 106: 334–339
Altshuler D., Brooks L. D., Chakravarti A., Collins F. S., Daly M. J., et al., 2005. A haplotype map of the human genome. Nature 437: 1299–1320
Baird S., Barton N., Etheridge A., 2003. The distribution of surviving blocks of an ancestral genome. Theor. Popul. Biol. 64: 451–471
Ball F., Stefanov V., 2005. Evaluation of identity-by-descent probabilities for half-sibs on continuous genome. Math. Biosci. 196: 215–225
Bennett J., 1953. Junctions in inbreeding. Genetica 26: 392–406
Bergland A. O., Genissel A., Nuzhdin S. V., Tatar M., 2008. Quantitative trait loci affecting phenotypic plasticity and the allometric relationship of ovariole number and thorax length in Drosophila melanogaster. Genetics 180: 567–582
Bickeboller H., Thompson E., 1996a. Distribution of genome shared IBD by half-sibs: approximation by the Poisson clumping heuristic. Theor. Popul. Biol. 50: 66–90
Bickeboller H., Thompson E., 1996b. The probability distribution of the amount of an individual’s genome surviving to the following generation. Genetics 143: 1043–1049
Browning S., Browning B., 2002. On reducing the statespace of hidden Markov models for the identity by descent process. Theor. Popul. Biol. 62: 1–8
Bryc K., Auton A., Nelson M. R., Oksenberg J. R., Hauser S. L., et al., 2010. Genome-wide patterns of population structure and admixture in West Africans and African Americans. Proc. Natl. Acad. Sci. USA 107: 786–791
Buckler E., Gore M., 2007. An Arabidopsis haplotype map takes root. Nat. Genet. 39: 1056–1057
Cannings C., 2003. The identity by descent process along the chromosome. Hum. Hered. 56: 126–130
Cardon L. R., Abecasis G. R., 2003. Using haplotype blocks to map human complex trait loci. Trends Genet. 19: 135–140
Carlton J. M., 2007. Toward a malaria haplotype map. Nat. Genet. 39: 5–6
Chapman N. H., Thompson E. A., 2003. A model for the length of tracts of identity by descent in finite random mating populations. Theor. Popul. Biol. 64: 141–150
Churchill G. A., 2007. Recombinant inbred strain panels: a tool for systems genetics. Physiol. Genomics 31: 174–175
Churchill G., Airey D. C., Allayee H., Angel J. M., Attie A. D., et al., 2004. The collaborative cross, a community resource for the genetic analysis of complex traits. Nat. Genet. 36: 1133–1137
Crow J. F., 2007. Haldane, Bailey, Taylor and recombinant-inbred lines. Genetics 176: 729–732
Cuppen E., 2005. Haplotype-based genetics in mice and rats. Trends Genet. 21: 318–322
Curtis D., Vine A. E., Knight J., 2008. Study of regions of extended homozygosity provides a powerful method to explore haplotype structure of human populations. Ann. Hum. Genet. 72: 261–278
Dimitropoulou P., Cannings C., 2003. Recsim and indstats: probabilities of identity in general genealogies. Bioinformatics 19: 790–791
Donelly K. P., 1983. The probability that related individuals share some section of genome identical by descent. Theor. Popul. Biol. 23: 34–63
Feller W., 1950. An Introduction to Probability Theory and Its Applications. John Wiley & Sons, New York
Fisher R., 1949. The Theory of Inbreeding. Oliver & Boyd, Edinburgh
Fisher R., 1954. A fuller theory of “junctions” in inbreeding. Heredity 8: 187–197
Fisher R., 1959. An algebraically exact examination of junction formation and transmission in parent-offspring inbreeding. Heredity 13: 523–542
Franklin I., 1977. The distribution of the proportion of the genome which is homozygous by descent in inbred individuals. Theor. Popul. Biol. 11: 60–80
Frazer K. A., Ballinger D. G., Cox D. R., Hinds D. A., Stuve L. L., et al., 2007. A second generation human haplotype map of over 3.1 million SNPs. Nature 449: 851–861
Gold B., Kirchhoff T., Stefanov S., Lautenberger J., Viale A., et al., 2008. Genome-wide association study provides evidence for a breast cancer risk locus at 6q22–33. Proc. Natl. Acad. Sci. USA 105: 4340–4345
Goldstein D. B., 2001. Islands of linkage disequilibrium. Nat. Genet. 29: 109–111
Greenawalt D. M., Cui X. F., Wu Y. J., Lin Y., Wang H. Y., et al., 2006. Strong correlation between meiotic crossovers and haplotype structure in a 2.5-mb region on the long arm of chromosome 21. Genome Res. 16: 208–214
Grote M. N., 2007. A covariance structure model for the admixture of binary genetic variation. Genetics 176: 2405–2420
Guo S., 1994. Computation of identity by descent proportions shared by two siblings. Am. J. Hum. Genet. 54: 1104–1109
Haldane J. B. S., 1919. The combination of linkage values, and the calculation of distances between the loci of linked factors. J. Genet. 8: 299–309
Haldane J. B. S., Waddington C. H., 1931. Inbreeding and linkage. Genetics 16: 357–374
Jakobsson M., Scholz S. W., Scheet P., Gibbs J. R., VanLiere J. M., et al., 2008. Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451: 998–1003
Jeffreys A. J., Kauppi L., Neumann R., 2001. Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat. Genet. 29: 217–222
Kauppi L., Stumpf M. P. H., Jeffreys A. J., 2005. Localized breakdown in linkage disequilibrium does not always predict sperm crossover hot spots in the human MHC class II region. Genomics 86: 13–24
Keurentjes J. J. B., Fu J. Y., Terpstra I. R., Garcia J. M., van den Ackerveken G., et al., 2007. Regulatory network construction in Arabidopsis by using genome-wide gene expression quantitative trait loci. Proc. Natl. Acad. Sci. USA 104: 1708–1713
Lencz T., Lambert C., DeRosse P., Burdick K. E., Morgan T. V., et al., 2007. Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia. Proc. Natl. Acad. Sci. USA 104: 19942–19947
Lindblad-Toh K., Wade C. M., Mikkelsen T. S., Karlsson E. K., Jaffe D. B., et al., 2005. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature 438: 803–819
Maccaferri M., Sanguineti M. C., Corneti S., Ortega J. L. A., Ben Salem M., et al., 2008. Quantitative trait loci for grain yield and adaptation of durum wheat (Triticum durum desf.) across a wide range of water availability. Genetics 178: 489–511
Martin O. C., Hospital F., 2006. Two- and three-locus tests for linkage analysis using recombinant inbred lines. Genetics 173: 451–459
McVean G. A. T., Cardin N. J., 2005. Approximating the coalescent with recombination. Philos. Trans. R. Soc. B Biol. Sci. 360: 1387–1393
Meuwissen T. H. E., Hayes B. J., Goddard M. E., 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819–1829
Moore R. C., Henry M., Stevens H., 2008. Local patterns of nucleotide polymorphism are highly variable in the selfing species Arabidopsis thaliana. J. Mol. Evol. 66: 116–129
Mott R., 2007. A haplotype map for the laboratory mouse. Nat. Genet. 39: 1054–1056
Pe’er I., Chretien Y. R., de Bakker P. I. W., Barrett J. C., Daly M. J., et al., 2006. Biases and reconciliation in estimates of linkage disequilibrium in the human genome. Am. J. Hum. Genet. 78: 588–603
Pritchard J. K., Stephens M., Donnelly P., 2000. Inference of population structure using multilocus genotype data. Genetics 155: 945–959
Rodolphe F., Martin J., Della-Chiesa E., 2008. Theoretical description of chromosome architecture after multiple back-crossing. Theor. Popul. Biol. 73: 289–299
Sabeti P. C., Reich D. E., Higgins J. M., Levine H. Z. P., Richter D. J., et al., 2002. Detecting recent positive selection in the human genome from haplotype structure. Nature 419: 832–837
Shifman S., Kuypers J., Kokoris M., Yakir B., Darvasi A., 2003. Linkage disequilibrium patterns of the human genome across populations. Hum. Mol. Genet. 12: 771–776
Singer T., Fan Y. P., Chang H. S., Zhu T., Hazen S. P., et al., 2006. A high-resolution map of Arabidopsis recombinant inbred lines by whole-genome exon array hybridization. PLoS Genet. 2: e144
Slate J., Pemberton J. M., 2007. Admixture and patterns of linkage disequilibrium in a free-living vertebrate population. J. Evol. Biol. 20: 1415–1427
Slatkin M., 1972. On treating the chromosome as the unit of selection. Genetics 72: 157–168
Stam P., 1980. The distribution of the fraction of the genome identical by descent in finite random mating populations. Genet. Res. 35: 131–155
Stefanov V. T., 2000. Distribution of genome shared identical by descent by two individuals in grandparent-type relationship. Genetics 156: 1403–1410
Stumpf M. P. H., 2002. Haplotype diversity and the block structure of linkage disequilibrium. Trends Genet. 18: 226–228
Tang K., Thornton K. R., Stoneking M., 2007. A new approach for using genome scans to detect recent positive selection in the human genome. PLoS Biol. 5: e071
Tishkoff S. A., Verrelli B. C., 2003. Role of evolutionary history on haplotype block structure in the human genome: implications for disease mapping. Curr. Opin. Genet. Dev. 13: 569–575
Wall J. D., Pritchard J. K., 2003. Haplotype blocks and linkage disequilibrium in the human genome. Nat. Rev. Genet. 4: 587–597
Walters K., Cannings C., 2005. The probability density of the total IBD length over a single autosome in unilineal relationships. Theor. Popul. Biol. 68: 55–63
Wiuf C., Hein J., 1999. Recombination as a point process along sequences. Theor. Popul. Biol. 55: 248–259
Wolfram S., 1991. Mathematica: A System for Doing Mathematics by Computer, Ed. 2. Addison Wesley Longman Publishing, Redwood City, CA
Yalcin B., Fullerton J., Miller S., Keays D. A., Brady S., et al., 2004. Unexpected complexity in the haplotypes of commonly used inbred strains of laboratory mice. Proc. Natl. Acad. Sci. USA 101: 9734–9739
Yu J., Holland J. B., McMullen M. D., Buckler E. S., 2008. Genetic design and statistical power of nested association mapping in maize. Genetics 178: 539–551
Zhang C., Bailey D. K., Awad T., Liu G. Y., Xing G. L., et al., 2006. A whole genome long-range haplotype (wglrh) test for detecting imprints of positive selection in human populations. Bioinformatics 22: 2122–2128
Zheng M. X., McPeek M. S., 2007. Multipoint linkage-disequilibrium mapping with haplotype-block structure. Am. J. Hum. Genet. 80: 112–125
Zondervan K. T., Cardon L. R., 2004. The complex interplay among factors that influence allelic association. Nat. Rev. Genet. 5: 89–100

[bib1] Alcazar R., Garcia A. V., Parker J. E., Reymond M., 2009. Incremental steps toward incompatibility revealed by Arabidopsis epistatic interactions modulating salicylic acid pathway activation. Proc. Natl. Acad. Sci. USA 106: 334–339 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib2] Altshuler D., Brooks L. D., Chakravarti A., Collins F. S., Daly M. J., et al. , 2005. A haplotype map of the human genome. Nature 437: 1299–1320 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib3] Baird S., Barton N., Etheridge A., 2003. The distribution of surviving blocks of an ancestral genome. Theor. Popul. Biol. 64: 451–471 [DOI] [PubMed] [Google Scholar]

[bib4] Ball F., Stefanov V., 2005. Evaluation of identity-by-descent probabilities for half-sibs on continuous genome. Math. Biosci. 196: 215–225 [DOI] [PubMed] [Google Scholar]

[bib5] Bennett J., 1953. Junctions in inbreeding. Genetica 26: 392–406 [DOI] [PubMed] [Google Scholar]

[bib6] Bergland A. O., Genissel A., Nuzhdin S. V., Tatar M., 2008. Quantitative trait loci affecting phenotypic plasticity and the allometric relationship of ovariole number and thorax length in Drosophila melanogaster. Genetics 180: 567–582 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib7] Bickeboller H., Thompson E., 1996a Distribution of genome shared IBD by half-sibs: approximation by the Poisson clumping heuristic. Theor. Popul. Biol. 50: 66–90 [DOI] [PubMed] [Google Scholar]

[bib8] Bickeboller H., Thompson E., 1996b The probability distribution of the amount of an individual’s genome surviving to the following generation. Genetics 143: 1043–1049 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib9] Browning S., Browning B., 2002. On reducing the statespace of hidden Markov models for the identity by descent process. Theor. Popul. Biol. 62: 1–8 [DOI] [PubMed] [Google Scholar]

[bib10] Bryc K., Auton A., Nelson M. R., Oksenberg J. R., Hauser S. L., et al. , 2010. Genome-wide patterns of population structure and admixture in West Africans and African Americans. Proc. Natl. Acad. Sci. USA 107: 786–791 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib11] Buckler E., Gore M., 2007. An Arabidopsis haplotype map takes root. Nat. Genet. 39: 1056–1057 [DOI] [PubMed] [Google Scholar]

[bib12] Cannings C., 2003. The identity by descent process along the chromosome. Hum. Hered. 56: 126–130 [DOI] [PubMed] [Google Scholar]

[bib13] Cardon L. R., Abecasis G. R., 2003. Using haplotype blocks to map human complex trait loci. Trends Genet. 19: 135–140 [DOI] [PubMed] [Google Scholar]

[bib14] Carlton J. M., 2007. Toward a malaria haplotype map. Nat. Genet. 39: 5–6 [DOI] [PubMed] [Google Scholar]

[bib15] Chapman N. H., Thompson E. A., 2003. A model for the length of tracts of identity by descent in finite random mating populations. Theor. Popul. Biol. 64: 141–150 [DOI] [PubMed] [Google Scholar]

[bib16] Churchill G. A., 2007. Recombinant inbred strain panels: a tool for systems genetics. Physiol. Genomics 31: 174–175 [DOI] [PubMed] [Google Scholar]

[bib17] Churchill G., Airey D. C., Allayee H., Angel J. M., Attie A. D., et al. , 2004. The collaborative cross, a community resource for the genetic analysis of complex traits. Nat. Genet. 36: 1133–1137 [DOI] [PubMed] [Google Scholar]

[bib18] Crow J. F., 2007. Haldane, Bailey, Taylor and recombinant-inbred lines. Genetics 176: 729–732 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib19] Cuppen E., 2005. Haplotype-based genetics in mice and rats. Trends Genet. 21: 318–322 [DOI] [PubMed] [Google Scholar]

[bib20] Curtis D., Vine A. E., Knight J., 2008. Study of regions of extended homozygosity provides a powerful method to explore haplotype structure of human populations. Ann. Hum. Genet. 72: 261–278 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib21] Dimitropoulou P., Cannings C., 2003. Recsim and indstats: probabilities of identity in general genealogies. Bioinformatics 19: 790–791 [DOI] [PubMed] [Google Scholar]

[bib22] Donelly K. P., 1983. The probability that related individuals share some section of genome identical by descent. Theor. Popul. Biol. 23: 34–63 [DOI] [PubMed] [Google Scholar]

[bib23] Feller W., 1950. An Introduction to Probability Theory and Its Applications. John Wiley & Sons, New York [Google Scholar]

[bib24] Fisher R., 1949. The Theory of Inbreeding. Oliver & Boyd, Edinburgh [Google Scholar]

[bib25] Fisher R., 1954. A fuller theory of “junctions” in inbreeding. Heredity 8: 187–197 [Google Scholar]

[bib26] Fisher R., 1959. An algebraically exact examination of junction formation and transmission in parent-offspring inbreeding. Heredity 13: 523–542 [Google Scholar]

[bib27] Franklin I., 1977. The distribution of the proportion of the genome which is homozygous by descent in inbred individuals. Theor. Popul. Biol. 11: 60–80 [DOI] [PubMed] [Google Scholar]

[bib28] Frazer K. A., Ballinger D. G., Cox D. R., Hinds D. A., Stuve L. L., et al. , 2007. A second generation human haplotype map of over 3.1 million SNPs. Nature 449: 851–861 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib29] Gold B., Kirchhoff T., Stefanov S., Lautenberger J., Viale A., et al. , 2008. Genome-wide association study provides evidence for a breast cancer risk locus at 6q22–33. Proc. Natl. Acad. Sci. USA 105: 4340–4345 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib30] Goldstein D. B., 2001. Islands of linkage disequilibrium. Nat. Genet. 29: 109–111 [DOI] [PubMed] [Google Scholar]

[bib31] Greenawalt D. M., Cui X. F., Wu Y. J., Lin Y., Wang H. Y., et al. , 2006. Strong correlation between meiotic crossovers and haplotype structure in a 2.5-mb region on the long arm of chromosome 21. Genome Res. 16: 208–214 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib32] Grote M. N., 2007. A covariance structure model for the admixture of binary genetic variation. Genetics 176: 2405–2420 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib33] Guo S., 1994. Computation of identity by descent proportions shared by two siblings. Am. J. Hum. Genet. 54: 1104–1109 [PMC free article] [PubMed] [Google Scholar]

[bib34] Haldane J. B. S., 1919. The combination of linkage values, and the calculation of distances between the loci of linked factors. J. Genet. 8: 299–309 [Google Scholar]

[bib35] Haldane J. B. S., Waddington C. H., 1931. Inbreeding and linkage. Genetics 16: 357–374 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib36] Jakobsson M., Scholz S. W., Scheet P., Gibbs J. R., VanLiere J. M., et al. , 2008. Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451: 998–1003 [DOI] [PubMed] [Google Scholar]

[bib37] Jeffreys A. J., Kauppi L., Neumann R., 2001. Intensely punctate meiotic recombination in the class ii region of the major histocompatibility complex. Nat. Genet. 29: 217–222 [DOI] [PubMed] [Google Scholar]

[bib38] Kauppi L., Stumpf M. P. H., Jeffreys A. J., 2005. Localized breakdown in linkage disequilibrium does not always predict sperm crossover hot spots in the human MHC class II region. Genomics 86: 13–24 [DOI] [PubMed] [Google Scholar]

[bib39] Keurentjes J. J. B., Fu J. Y., Terpstra I. R., Garcia J. M., van den Ackerveken G., et al. , 2007. Regulatory network construction in Arabidopsis by using genome-wide gene expression quantitative trait loci. Proc. Natl. Acad. Sci. USA 104: 1708–1713 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib40] Lencz T., Lambert C., DeRosse P., Burdick K. E., Morgan T. V., et al. , 2007. Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia. Proc. Natl. Acad. Sci. USA 104: 19942–19947 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib41] Lindblad-Toh K., Wade C. M., Mikkelsen T. S., Karlsson E. K., Jaffe D. B., et al. , 2005. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature 438: 803–819 [DOI] [PubMed] [Google Scholar]

[bib42] Maccaferri M., Sanguineti M. C., Corneti S., Ortega J. L. A., Ben Salem M., et al. , 2008. Quantitative trait loci for grain yield and adaptation of durum wheat (Triticum durum desf.) across a wide range of water availability. Genetics 178: 489–511 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib43] Martin O. C., Hospital F., 2006. Two- and three-locus tests for linkage analysis using recombinant inbred lines. Genetics 173: 451–459 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib44] McVean G. A. T., Cardin N. J., 2005. Approximating the coalescent with recombination. Philos. Trans. R. Soc. B Biol. Sci. 360: 1387–1393 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib45] Meuwissen T. H. E., Hayes B. J., Goddard M. E., 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819–1829 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib46] Moore R. C., Henry M., Stevens H., 2008. Local patterns of nucleotide polymorphism are highly variable in the selfing species Arabidopsis thaliana. J. Mol. Evol. 66: 116–129 [DOI] [PubMed] [Google Scholar]

[bib47] Mott R., 2007. A haplotype map for the laboratory mouse. Nat. Genet. 39: 1054–1056 [DOI] [PubMed] [Google Scholar]

[bib48] Pe’er I., Chretien Y. R., de Bakker P. I. W., Barrett J. C., Daly M. J., et al. , 2006. Biases and reconciliation in estimates of linkage disequilibrium in the human genome. Am. J. Hum. Genet. 78: 588–603 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib49] Pritchard J. K., Stephens M., Donnelly P., 2000. Inference of population structure using multilocus genotype data. Genetics 155: 945–959

[bib50] Rodolphe F., Martin J., Della-Chiesa E., 2008. Theoretical description of chromosome architecture after multiple back-crossing. Theor. Popul. Biol. 73: 289–299

[bib51] Sabeti P. C., Reich D. E., Higgins J. M., Levine H. Z. P., Richter D. J., et al., 2002. Detecting recent positive selection in the human genome from haplotype structure. Nature 419: 832–837

[bib52] Shifman S., Kuypers J., Kokoris M., Yakir B., Darvasi A., 2003. Linkage disequilibrium patterns of the human genome across populations. Hum. Mol. Genet. 12: 771–776

[bib53] Singer T., Fan Y. P., Chang H. S., Zhu T., Hazen S. P., et al., 2006. A high-resolution map of Arabidopsis recombinant inbred lines by whole-genome exon array hybridization. PLoS Genet. 2: e144

[bib54] Slate J., Pemberton J. M., 2007. Admixture and patterns of linkage disequilibrium in a free-living vertebrate population. J. Evol. Biol. 20: 1415–1427

[bib55] Slatkin M., 1972. On treating the chromosome as the unit of selection. Genetics 72: 157–168

[bib56] Stam P., 1980. The distribution of the fraction of the genome identical by descent in finite random mating populations. Genet. Res. 35: 131–155

[bib57] Stefanov V. T., 2000. Distribution of genome shared identical by descent by two individuals in grandparent-type relationship. Genetics 156: 1403–1410

[bib58] Stumpf M. P. H., 2002. Haplotype diversity and the block structure of linkage disequilibrium. Trends Genet. 18: 226–228

[bib59] Tang K., Thornton K. R., Stoneking M., 2007. A new approach for using genome scans to detect recent positive selection in the human genome. PLoS Biol. 5: e071

[bib60] Tishkoff S. A., Verrelli B. C., 2003. Role of evolutionary history on haplotype block structure in the human genome: implications for disease mapping. Curr. Opin. Genet. Dev. 13: 569–575

[bib61] Wall J. D., Pritchard J. K., 2003. Haplotype blocks and linkage disequilibrium in the human genome. Nat. Rev. Genet. 4: 587–597

[bib62] Walters K., Cannings C., 2005. The probability density of the total IBD length over a single autosome in unilineal relationships. Theor. Popul. Biol. 68: 55–63

[bib63] Wiuf C., Hein J., 1999. Recombination as a point process along sequences. Theor. Popul. Biol. 55: 248–259

[bib64] Wolfram S., 1991. Mathematica: A System for Doing Mathematics by Computer, Ed. 2. Addison Wesley Longman Publishing, Redwood City, CA

[bib65] Yalcin B., Fullerton J., Miller S., Keays D. A., Brady S., et al., 2004. Unexpected complexity in the haplotypes of commonly used inbred strains of laboratory mice. Proc. Natl. Acad. Sci. USA 101: 9734–9739

[bib66] Yu J., Holland J. B., McMullen M. D., Buckler E. S., 2008. Genetic design and statistical power of nested association mapping in maize. Genetics 178: 539–551

[bib67] Zhang C., Bailey D. K., Awad T., Liu G. Y., Xing G. L., et al., 2006. A whole genome long-range haplotype (WGLRH) test for detecting imprints of positive selection in human populations. Bioinformatics 22: 2122–2128

[bib68] Zheng M. X., McPeek M. S., 2007. Multipoint linkage-disequilibrium mapping with haplotype-block structure. Am. J. Hum. Genet. 80: 112–125

[bib69] Zondervan K. T., Cardon L. R., 2004. The complex interplay among factors that influence allelic association. Nat. Rev. Genet. 5: 89–100
