A Renewal Theory Approach to IBD Sharing

Shai Carmi; Peter R Wilton; John Wakeley; Itsik Pe’er

doi:10.1016/j.tpb.2014.08.002

Author manuscript; available in PMC: 2015 Nov 1.

Published in final edited form as: Theor Popul Biol. 2014 Aug 18;0:35–48. doi: 10.1016/j.tpb.2014.08.002


PMCID: PMC4179929 NIHMSID: NIHMS625072 PMID: 25149691

Abstract

A long genomic segment inherited by a pair of individuals from a single, recent common ancestor is said to be identical-by-descent (IBD). Shared IBD segments have numerous applications in genetics, from demographic inference to phasing, imputation, pedigree reconstruction, and disease mapping. Here, we provide a theoretical analysis of IBD sharing under Markovian approximations of the coalescent with recombination. We describe a general framework for the IBD process along the chromosome under the Markovian models (SMC/SMC′), as well as introduce and justify a new model, which we term the renewal approximation, under which lengths of successive segments are independent. Then, considering the infinite-chromosome limit of the IBD process, we recover previous results (for SMC) and derive new results (for SMC′) for the mean number of shared segments longer than a cutoff and the fraction of the chromosome found in such segments. We then use renewal theory to derive an expression (in Laplace space) for the distribution of the number of shared segments and demonstrate implications for demographic inference. We also compute (again, in Laplace space) the distribution of the fraction of the chromosome in shared segments, from which we obtain explicit expressions for the first two moments. Finally, we generalize all results to populations with a variable effective size.

Keywords: IBD sharing, coalescent theory, recombination, renewal theory, SMC, SMC′

1. Introduction

IBD sharing of a genomic segment between a pair of individuals is traditionally defined in terms of recent co-ancestry, no more remote than some time depth t (Thompson, 2013). In population samples, the time of the common ancestor is unknown, and in practice, IBD segments are often identified as long stretches that are nearly or fully identical-by-state (IBS), to an extent distinguishable from population-level LD. The decision whether a segment is called IBD is either rule-based (e.g., using a certain length cutoff) or model-based, using an underlying HMM for the IBD state (Thompson, 2013). In this paper, we define an IBD segment shared between two chromosomes as the maximal sequence over which the chromosomes have the same most recent common ancestor (MRCA). Recent mutations (or genotyping errors) separating the two sequences do not disqualify the segment from being IBD. On the other hand, we require the segment to be longer than an (arbitrary) cutoff m. This definition enables a theoretical treatment, while largely capturing the way in which some methods (and, for sufficiently large m, virtually all methods) discover IBD segments in real data.

Much attention has recently been devoted to efficient algorithms for IBD detection in large samples (e.g., Purcell et al. (2007); Gusev et al. (2009); Browning and Browning (2011); Brown et al. (2012); Browning and Browning (2013a)). Detected segments have found numerous applications, for example, characterization of relationships between populations (Atzmon et al., 2010; Bray et al., 2010; Moorjani et al., 2013; Gauvin et al., 2013; Botigué et al., 2013; Ralph and Coop, 2013), detection of positive selection (Han and Abney, 2013), estimation of heritability (Browning and Browning, 2013b), mapping haplotypes associated with a trait (Gusev et al., 2011; Browning and Thompson, 2012; Lin et al., 2013), phasing and imputation (Kong et al., 2008; Palin et al., 2011), and pedigree reconstruction (Huff et al., 2011; Henn et al., 2012). See Browning and Browning (2012) and Thompson (2013) for up-to-date reviews.

In parallel, theory has been developed for the expected amount of IBD sharing in model populations, with implications for demographic inference. Palamara et al. (2012) and Palamara and Pe’er (2013) computed, under the coalescent and for complex demographies, the moments of the fraction of the chromosome found in shared segments of a given length. Palamara et al. (2012) and Carmi et al. (2013) then approximated the distribution of this quantity, assuming a Poisson distribution for the number of segments (see also Huff et al. (2011)). Ralph and Coop (2013) computed the expected number of shared segments of a certain length given an arbitrary demographic history. However, certain theoretical problems of interest have remained open.

Here, we introduce a general framework for the analysis of the IBD process along the chromosome, based on a renewal approximation. Renewal theory is the study of processes in which events are separated by independent waiting times, and where each waiting period or event may be associated with a value (Karlin and Taylor, 1975). Under certain conditions, consecutive shared segments along the chromosome can be approximated as independent. Then, interpreting segments with shared ancestry as waiting times, renewal theory can be applied to compute, for example, the distribution of the number of and the total amount of genetic material covered by segments of a certain length.

A renewal approach to the IBD process has been considered in the past (e.g., Stam (1980); Chapman and Thompson (2003), with initial contributions already by Fisher (1954)), in a model where the population has been recently founded by individuals of heterogeneous genetic types. Alternatively, in those works, IBD is defined with respect to a given time depth (Thompson, 2013). The IBD segment lengths were either assumed exponential or fitted. In contrast, we consider a model that can be applied without reference to a particular time point. In our model, two chromosomes can trace their common ancestor, at each locus, to any time in the past, and IBD segments are defined with respect to a length cutoff.

According to our renewal approximation for a pair of chromosomes, the time to the common ancestor is drawn, at a recombination event, independently of the previous time and from a position-independent stationary distribution. The distribution has been derived for the pairwise Sequentially Markov Coalescent (SMC) by Li and Durbin (2011), and we derive it here for the more accurate, yet tractable SMC′ model (Marjoram and Wall, 2006). Under this approximation, the distribution of segment lengths emerges naturally. Using renewal theory, we are then able to derive new results, such as the distribution of the number of shared segments, as well as recover previous results as special cases.

Our results are organized as follows. In section 2, we introduce the renewal approximation in the context of successively simplified approximations of the coalescent with recombination. We then describe the IBD process under the different models and present numerical evidence to justify the renewal approach. In section 3, we show how simple quantities, such as the mean number of shared segments and the mean fraction of the chromosome in shared segments, emerge naturally from our definition of the IBD process by taking the infinite-chromosome limit. Specifically, we recover previously derived results for SMC and obtain new results for SMC′. In section 4, we derive results for finite chromosomes. Specifically, we derive an expression, in Laplace space, for the distribution of the number of shared segments and consider implications for demographic inference. Additionally, we derive, again in Laplace space, the distribution of the fraction of the chromosome found in shared segments, from which we obtain explicit expressions for the first two moments, recovering and extending previous results. Finally, in section 5, we generalize our results to populations with variable size. We summarize and discuss the results in section 6.

2. The IBD process

2.1. Overview of the coalescent with recombination and its Markovian approximations

We consider a sample of two chromosomes of length L (Morgans) in a population of a constant effective size N (haploid chromosomes) and with recombination modeled as a Poisson process along the chromosome. The ancestral process can be described by the coalescent with recombination (Hudson, 1983; Griffiths and Marjoram, 1997). In that model, looking backwards in time, lineages can either coalesce (at rate 1 per pair of lineages, when the time is scaled by N) or recombine at a random position along the chromosome (split into two, at rate ρ = 2Nr, where r is the recombination probability per generation). The resulting structure is called the ancestral recombination graph (ARG). Wiuf and Hein (1999) described an alternative but equivalent formulation, where the ARG is obtained by walking along the chromosome. In that model, a coalescent tree is first formed at the leftmost end of the chromosome (Figure 1A). Recombination then occurs at a genetic distance distributed exponentially with rate equal to the total tree size; the position of the breakpoint (t_r) is randomly and uniformly distributed along the tree (Figure 1B). The branching lineage then coalesces with any of the existing branches of the ARG, and the process is repeated until reaching the end of the chromosome (Figure 1C). The model is non-Markovian, in the sense that the tree formed at a given position depends on all preceding trees.

Figure 1. An illustration of the coalescent with recombination for two chromosomes, and the associated Markovian approximations. Part A shows the coalescent tree at a random site. The two extant chromosomes are denoted a and b. Part B shows a recombination event occurring at time t_r. The old branch connecting the breakpoint and the MRCA is colored red, and the branching lineage is shown as a dashed line. Under the full model of the coalescent with recombination (Wiuf and Hein (1999); C), the new lineage can coalesce with any branch in the existing tree (in this example, earlier than the previous TMRCA), and both the old lineage (which is not ancestral to the sample anymore) and the new lineage are carried over to the next site. The ‘marginal’ tree at the new site is shown in solid lines; the remainder of the ARG is in dashed lines. The Markovian approximations are presented in parts D–G, where the current TMRCA is denoted as s and the new as t. In SMC (McVean and Cardin (2005); D), the old branch (red in B) is deleted, and the branching lineage can coalesce only with the lineage corresponding to the other chromosome (either earlier or later than the previous TMRCA; corresponding to the two dashed lines). In SMC′ (Marjoram and Wall (2006); E), the branching lineage can coalesce with the old branch (blue), but that branch is deleted once the new tree is formed. Under the renewal approximation (F), the new tree height is drawn independently of the previous tree height. In all Markovian approximations, the new tree (G) contains only the lineages ancestral to the sample at that position.

McVean and Cardin (2005) proposed a Markovian approximation to the coalescent with recombination (the Sequentially Markov Coalescent, or SMC). At each recombination event in SMC, the branch leading from the breakpoint to the most recent common ancestor (MRCA) is deleted, and the branching lineage is allowed to coalesce only with the lineage ancestral to the other individual (Figure 1D,G). Once the MRCA is reached, the process is continued with the newly formed tree. Marjoram and Wall (2006) suggested a more accurate approximation, called SMC′, in which the branching lineage is allowed to coalesce with the branch it had split from, but once the tree has formed, any branch not ancestral to the sample is again deleted (Figure 1E,G). See Hobolth and Jensen (2014) for the joint distribution of tree heights for two sequences at two loci under the ARG and the Markovian approximations.

We propose the renewal approximation, which is a further simplification of SMC. According to our approximation, at a recombination event, the new tree height is drawn, independently of the previous tree height, from the stationary distribution of tree heights under SMC (Figure 1F,G). The stationary distribution was derived by Li and Durbin (2011) (see the next section). While the independence assumption is strong, the fact that we use the SMC stationary distribution guarantees that for sufficiently long sequences (see simulations in section 2.5), the statistical properties of SMC and the renewal process are similar.

In the following subsections, we define the IBD process according to the three models: SMC, renewal, and SMC′ (Tables 1–3, respectively).

Table 1.

The IBD process under SMC

1:	Initialize
2:	x ← 0 ▹ The position along the chromosome
3:	n_m ← 0 ▹ The number of shared segments longer than m
4:	f_m ← 0 ▹ The fraction of the chromosome in shared segments longer than m
5:	Draw TMRCA: t ~ Exp(1)
6:	while x < L
7:	Draw segment length: ℓ ~ Exp(2Nt)
8:	if (x + ℓ) > L ▹ If the new position exceeds the chromosome length
9:	ℓ ← (L − x)
10:	if ℓ > m ▹ The segment is longer than the cutoff
11:	n_m ← n_m + 1
12:	f_m ← f_m + ℓ/L
13:	s ← t
14:	Draw new TMRCA t with PDF q_SMC(t|s) (Eq. (1))
15:	x ← x + ℓ


Table 3.

The IBD process under SMC′

1:	Initialize
2:	As in Table 1
3:	while x < L
4:	ℓ ← 0 ▹ The current total segment length
5:	repeat
6:	▹ Draw distance to next recombination; not necessarily a new segment
7:	Draw ℓ₀ ~ Exp(2Nt)
8:	ℓ ← ℓ + ℓ₀
9:	s ← t
10:	Draw new TMRCA t with PDF q_SMC′(t|s) (Eq. (6))
11:	until t ≠ s
12:	if (x + ℓ) > L
13:	ℓ ← (L − x)
14:	if ℓ > m
15:	n_m ← n_m + 1
16:	f_m ← f_m + ℓ/L
17:	x ← x + ℓ


2.2. The IBD process under SMC

Recently, Li and Durbin (2011) derived the probability density function (PDF) of the tree height for a pair of chromosomes (equivalently, time to MRCA or TMRCA; and scaled by N) at a recombination site, given the TMRCA of the preceding tree. The result is given in their supplementary Eq. (6),

q_{SMC} (t ∣ s) = \begin{cases} \frac{1}{s} (1 - e^{- t}) & t < s, \\ \frac{1}{s} e^{- (t - s)} (1 - e^{- s}) & t > s, \end{cases}

(1)

where s and t are the previous and new TMRCA, respectively. Note that t ≠ s by definition and that q_SMC(t|s) is normalized. At a recombination site, and for a given new tree height t, the sequence length to the next recombination event is distributed exponentially with rate 2Nt, the total branch length of the tree (in generations; Wiuf and Hein (1999)). The sequence between recombination sites is a shared segment, because the common ancestor of the two chromosomes is fixed throughout the segment. In SMC, the MRCA necessarily changes at recombination sites; therefore, segments are terminated by recombination events. With these preliminaries, and imposing a minimal segment length cutoff, m, we define in Table 1 the IBD process along the chromosome (see also Figure 2).

Figure 2. An illustration of the IBD process along the chromosome under SMC. Segments are broken by recombination events (vertical bars). The TMRCA is shown on top of each segment. Given a TMRCA t_i at segment i, the segment length, ℓ_i, is distributed exponentially with rate 2Nt_i, and the TMRCA at the next segment, t_{i+1}, is distributed according to Eq. (1). The minimal segment length, m, is shown as a horizontal bar under the chromosome. Segments longer than m are shown in dark pink. In this example, there are three such segments; hence n_m = 3 and the fraction of the chromosome in shared segments is f_m = (ℓ₁ + ℓ₅ + ℓ₉)/L. Segments shorter than m are in light pink. The last segment exceeds the chromosome length; the excess length (yellow) is ignored.

Steps 8 and 9 are needed in case the new position exceeds the chromosome length. In simulations, step 14 is implemented by drawing a random recombination time, t_r, uniform in [0, s], and then a random coalescence time t_c, exponential with rate 1. The new TMRCA is then set to t ← t_r + t_c (Figure 1D).
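As an illustration, the process of Table 1 can be sketched in Python. This is a minimal sketch rather than the code used for the simulations reported here; the function names `simulate_ibd_smc` and `mean_fraction_smc` are hypothetical, and the new TMRCA is drawn as t_r + t_c, as described above for step 14.

```python
import random


def simulate_ibd_smc(N, L, m, rng):
    """One realization of the Table 1 process; returns (n_m, f_m)."""
    x = 0.0    # position along the chromosome (Morgans)
    n_m = 0    # number of shared segments longer than m
    f_m = 0.0  # fraction of the chromosome in such segments
    t = rng.expovariate(1.0)  # step 5: initial TMRCA, in units of N generations
    while x < L:
        length = rng.expovariate(2.0 * N * t)  # step 7: rate 2Nt per Morgan
        end = x + length
        if end > L:  # steps 8-9: truncate the last segment at the chromosome end
            end = L
            length = L - x
        if length > m:  # steps 10-12
            n_m += 1
            f_m += length / L
        s = t  # step 13
        t_r = rng.uniform(0.0, s)   # recombination time, uniform on [0, s]
        t_c = rng.expovariate(1.0)  # waiting time to coalescence, rate 1
        t = t_r + t_c               # step 14: new TMRCA, distributed as Eq. (1)
        x = end  # step 15
    return n_m, f_m


def mean_fraction_smc(N, L, m, n_runs, seed=0):
    """Average f_m over n_runs independent realizations (hypothetical helper)."""
    rng = random.Random(seed)
    return sum(simulate_ibd_smc(N, L, m, rng)[1] for _ in range(n_runs)) / n_runs
```

For example, `mean_fraction_smc(500, 2.0, 0.01, 400)` averages f_m over 400 realizations with N = 500, L = 2 and m = 0.01 Morgans, and can be compared with the infinite-chromosome mean derived in section 3.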

2.3. The IBD process under the renewal approximation to SMC

Eq. (1) for q_SMC(t|s) can be interpreted as the transition probability for a Markov chain whose states are the tree heights at consecutive recombination sites. Li and Durbin (2011), who derived Eq. (1), further computed the stationary distribution of the chain,

π_{\infty}^{SMC} (t) = t e^{- t} .

(2)

Note that this stationary distribution is not the same as the ‘marginal’ coalescence distribution, P_c(t) = e^{−t}, which would apply to the tree height at a pre-specified site, such as the end of a chromosome (Wiuf and Hein, 1999), or to a randomly chosen site. In fact, $π_{\infty}^{SMC} (t)$ is identical to the distribution at a site conditional on a recombination event having occurred at that site when the recombination rate per site is very small. It thus has mean equal to 2 (Griffiths and Marjoram (1996), Eq. (9)), as for example is the case for tree heights around rare insertions in the human genome (Huff et al., 2010). In other words, $π_{\infty}^{SMC} (t)$ may be interpreted as the PDF of the TMRCA of a randomly chosen segment (rather than site).

To test the convergence to the stationary distribution, we numerically computed the PDFs of successive tree heights, as follows,

\begin{array}{l} π_{1}^{SMC} (t) = e^{- t}, \\ π_{n + 1}^{SMC} (t) = \int_{0}^{\infty} q_{SMC} (t ∣ s) π_{n}^{SMC} (s) d s; \quad n \geq 1. \end{array}

(3)

The resulting PDFs for the first 10 trees are shown in Figure 3, demonstrating fast convergence to the stationary PDF (Eq. (2)). For typical (human) parameters (N ≈ 10⁴, L ≈ 1 Morgan), the average number of recombination events along the chromosome is 2NL ~ 10⁴ ≫ 1 (Griffiths and Marjoram, 1997). Therefore, the vast majority of trees are expected to have the stationary PDF.
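The iteration of Eq. (3) can be sketched numerically as follows. This is an illustrative sketch only: the midpoint grid, the truncation at `t_max`, and the function names are assumptions made here, not the numerical scheme used for Figure 3. A useful sanity check is that, since t = t_r + t_c with t_r uniform on [0, s] and t_c of mean 1, the mean tree height obeys E[t | s] = s/2 + 1, so the mean of the n-th tree is 2 − 2^{−(n−1)} and approaches the stationary mean of 2 geometrically.

```python
import math


def q_smc(t, s):
    """Eq. (1): PDF of the new TMRCA t given the previous TMRCA s."""
    if t < s:
        return (1.0 - math.exp(-t)) / s
    return math.exp(-(t - s)) * (1.0 - math.exp(-s)) / s


def tree_height_pdfs(n_trees, t_max=25.0, n_grid=400):
    """Iterate Eq. (3) on a midpoint grid; returns (grid, [pi_1, ..., pi_n_trees])."""
    h = t_max / n_grid
    grid = [(i + 0.5) * h for i in range(n_grid)]
    pdfs = [[math.exp(-t) for t in grid]]  # pi_1: standard coalescent
    for _ in range(n_trees - 1):
        prev = pdfs[-1]
        pdfs.append([h * sum(q_smc(t, s) * p for s, p in zip(grid, prev))
                     for t in grid])
    return grid, pdfs


def max_distance_to_stationary(grid, pdf):
    """Largest pointwise gap between a PDF and t*exp(-t) (Eq. (2))."""
    return max(abs(p - t * math.exp(-t)) for t, p in zip(grid, pdf))
```

Calling `tree_height_pdfs(10)` and applying `max_distance_to_stationary` to each returned PDF gives a simple quantitative view of the convergence, up to the discretization error of the grid.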

Figure 3. Convergence of the distribution of tree heights in the SMC model. The first tree is distributed as e^{−t}, according to the standard coalescent. Subsequent trees are distributed according to Eqs. (1) and (3). The integrals were solved numerically. The stationary PDF (dashed line; Eq. (2)) is reached quickly.

Using the stationary PDF, segment lengths are therefore distributed as (see also Li and Durbin (2011) and Palamara et al. (2012))

ψ_{SMC} (ℓ) = \int_{0}^{\infty} π_{\infty}^{SMC} (t) \cdot 2 N t e^{- 2 N t ℓ} d t = \frac{4 N}{{(1 + 2 N ℓ)}^{3}} .

(4)

The mean segment length is 〈 ℓ 〉_SMC = 1/2N, but no higher moments exist. The distribution of ρ = 2Nℓ, the scaled recombination rate, is ψ(ρ) = 2/(1 + ρ)³, which is, as expected, independent of N (a property that holds generally; see section 5).

Having the distribution of segment lengths, we can now invoke the assumption of independence between successive segments and define the IBD process in the renewal approximation (Table 2). To generate numbers from ψ_SMC(ℓ), we used the transformation method: let u be uniform in [0, 1]; we set $ℓ = (1 - \sqrt{u}) / (2 N \sqrt{u})$ .
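For completeness, the transformation used to sample from ψ_SMC(ℓ) follows from inverting the cumulative distribution function of Eq. (4). The steps are spelled out below; F denotes the CDF of ℓ, a symbol introduced here only for this derivation, and we use the fact that 1 − u is uniform on [0, 1] whenever u is.

```latex
\begin{aligned}
F(\ell) &= \int_{0}^{\ell} \frac{4N}{(1 + 2Nx)^{3}}\,dx
         = 1 - \frac{1}{(1 + 2N\ell)^{2}}, \\
u &= 1 - F(\ell) = \frac{1}{(1 + 2N\ell)^{2}}
\;\Longrightarrow\;
\ell = \frac{u^{-1/2} - 1}{2N} = \frac{1 - \sqrt{u}}{2N\sqrt{u}} .
\end{aligned}
```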

Table 2.

The IBD process under the renewal approximation to SMC

1:	Initialize
2:	x ← 0 ▹ The position along the chromosome
3:	n_m ← 0 ▹ The number of shared segments longer than m
4:	f_m ← 0 ▹ The fraction of the chromosome in shared segments longer than m
5:	while x < L
6:	Draw segment length ℓ with PDF ψ_SMC(ℓ) (Eq. (4))
7:	if (x + ℓ) > L
8:	ℓ ← (L − x)
9:	if ℓ > m
10:	n_m ← n_m + 1
11:	f_m ← f_m + ℓ/L
12:	x ← x + ℓ
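The renewal process of Table 2 can be sketched in Python along the following lines. This is a minimal illustration under the stated model, not the original simulation code; the names `draw_segment_length_smc` and `simulate_ibd_renewal` are hypothetical. The uniform variate is taken on (0, 1] to avoid a division by zero.

```python
import math
import random


def draw_segment_length_smc(N, rng):
    """Inverse-transform draw from psi_SMC (Eq. (4))."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids division by zero
    root = math.sqrt(u)
    return (1.0 - root) / (2.0 * N * root)


def simulate_ibd_renewal(N, L, m, rng):
    """One realization of the Table 2 process; returns (n_m, f_m)."""
    x, n_m, f_m = 0.0, 0, 0.0
    while x < L:
        length = draw_segment_length_smc(N, rng)  # independent of all earlier segments
        end = x + length
        if end > L:  # truncate the last segment at the chromosome end
            end = L
            length = L - x
        if length > m:
            n_m += 1
            f_m += length / L
        x = end
    return n_m, f_m
```

Because successive segment lengths are drawn independently, no tree heights need to be tracked, which is what makes the renewal-theory treatment of section 4 possible.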


2.4. The IBD process under SMC′

In SMC′, the PDF of the new TMRCA, t, given the previous TMRCA, s, is given by (see also Zheng et al. (2014))

q_{{SMC}^{'}} (t ∣ s) = \begin{cases} \int_{0}^{s} \frac{1}{s} \left[ \int_{t_{r}}^{s} e^{- 2 (t_{c} - t_{r})} d t_{c} \right] d t_{r} & t = s, \\ \int_{0}^{t} \frac{1}{s} e^{- 2 (t - t_{r})} d t_{r} & t < s, \\ \left[ \int_{0}^{s} \frac{1}{s} e^{- 2 (s - t_{r})} d t_{r} \right] e^{- (t - s)} & t > s . \end{cases}

(5)

To understand Eq. (5), consider how the new TMRCA, t, is drawn in simulations. First, a random recombination time, t_r, is drawn uniformly in [0, s], as in SMC. But then, the random coalescence time, t_c, is drawn from an exponential distribution with rate 2, since the branching lineage can coalesce with either the other chromosome or the lineage it had branched from (Figure 1E). If t_r + t_c < s, the new TMRCA is set to either t ← s (coalescence with the lineage it had branched from) or t ← t_r + t_c (coalescence with the other chromosome) with probability 1/2 each. If t_r + t_c > s, a new coalescence time, τ_c, is drawn from an exponential distribution with rate 1 (since after time s, there is only one other lineage), and the new TMRCA is set to t ← s + τ_c. The upper limit of the integral for t < s is t, not s, since the recombination time, t_r, cannot be greater than the new coalescence time, t. For the case t = s, the density is implicitly assumed to be multiplied by Dirac’s delta function (δ(t − s)), omitted for notational simplicity. The integrals in Eq. (5) can be solved, yielding

q_{{SMC}^{'}} (t ∣ s) = \begin{cases} \frac{2 t + e^{- 2 t} - 1}{4 t} & t = s, \\ \frac{1 - e^{- 2 t}}{2 s} & t < s, \\ \frac{e^{- (t - s)} - e^{- (t + s)}}{2 s} & t > s . \end{cases}

(6)

Note that q_SMC′(t|s) is normalized. Curiously, the stationary distribution of the chain is $π_{\infty}^{{SMC}^{'}} (t) = t e^{- t}$ , exactly as in SMC (Eq. (2)). This can be proved by validating the detailed balance equation, $π_{\infty}^{{SMC}^{'}} (t) q_{{SMC}^{'}} (s ∣ t) = π_{\infty}^{{SMC}^{'}} (s) q_{{SMC}^{'}} (t ∣ s)$ , which also shows that SMC′ is reversible (Zheng et al., 2014).

To define the IBD process (Table 3), we note that in the case t = s, the common ancestor of the two chromosomes does not change, and therefore, the shared segment extends until (at least) the next recombination event.
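The sampling scheme described above for SMC′, together with the process of Table 3, can be sketched as follows. This is an illustrative sketch, not the original simulation code; the function names are hypothetical. The helper `prob_unchanged` evaluates the t = s case of Eq. (6), so that the empirical frequency of unchanged TMRCAs can be compared with it.

```python
import math
import random


def draw_tmrca_smc_prime(s, rng):
    """Draw the new TMRCA given the previous TMRCA s, following section 2.4."""
    t_r = rng.uniform(0.0, s)    # recombination time, uniform on [0, s]
    t_c = rng.expovariate(2.0)   # two lineages available, so rate 2
    if t_r + t_c < s:
        if rng.random() < 0.5:
            return s             # rejoined its own lineage: TMRCA unchanged
        return t_r + t_c         # coalesced with the other chromosome
    return s + rng.expovariate(1.0)  # beyond s only one lineage remains


def prob_unchanged(s):
    """The t = s case of Eq. (6)."""
    return (2.0 * s + math.exp(-2.0 * s) - 1.0) / (4.0 * s)


def simulate_ibd_smc_prime(N, L, m, rng):
    """One realization of the Table 3 process; returns (n_m, f_m)."""
    x, n_m, f_m = 0.0, 0, 0.0
    t = rng.expovariate(1.0)
    while x < L:
        length = 0.0
        while True:  # repeat ... until t != s (steps 5-11)
            length += rng.expovariate(2.0 * N * t)
            s = t
            t = draw_tmrca_smc_prime(s, rng)
            if t != s:
                break
        end = x + length
        if end > L:  # truncate the last segment at the chromosome end
            end = L
            length = L - x
        if length > m:
            n_m += 1
            f_m += length / L
        x = end
    return n_m, f_m
```

Returning exactly `s` when the branching lineage rejoins its own lineage lets the inner loop test `t != s` directly, mirroring step 11 of Table 3.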

We now derive the stationary distribution of segment lengths. Given the TMRCA t at the beginning of a segment, the rate at which the segment terminates is the product of the recombination rate (2Nt) and the probability that the segment does not extend beyond the recombination site (1 − q_SMC′(t|t)). Therefore, given t, segment lengths are exponential with rate

λ (t) = 2 N t [1 - q_{{SMC}^{'}} (t ∣ t)] = \frac{N}{2} (2 t + 1 - e^{- 2 t}) .

(7)

Note that this also implies that for two loci a distance ℓ apart (with ρ = 2Nℓ), and given t at the left locus, the probability that the right TMRCA remains t is exp[−λ(t)ℓ] = exp{−ρt[1 − q_SMC′(t|t)]}, as in the small ρ limit of Eq. (30) in Harris and Nielsen (2013).

To obtain the unconditional distribution of segment lengths, we cannot use $π_{\infty}^{{SMC}^{'}} (t)$ , because we need the distribution of tree heights at segment ends, not at recombination sites. We therefore define a new Markov chain with transition probability

q_{{SMC}^{'}, seg} (t ∣ s) = \frac{q_{{SMC}^{'}} (t ∣ s)}{1 - q_{{SMC}^{'}} (s ∣ s)} = \frac{q_{{SMC}^{'}} (t ∣ s)}{1 - \frac{2 s + e^{- 2 s} - 1}{4 s}},

(8)

which is the conditional probability of the new tree height, given that it has changed (i.e., a new segment began). By construction, the stationary distribution of the chain, $π_{\infty}^{{SMC}^{'}, seg} (t)$ , is the desired distribution of tree heights at the beginning of segments. It is easy to verify by detailed balance that $π_{\infty}^{{SMC}^{'}, seg} (t) \propto t e^{- t} [1 - q_{{SMC}^{'}} (t ∣ t)] \propto e^{- t} λ (t)$ , and then, by normalization,

π_{\infty}^{{SMC}^{'}, seg} (t) = \frac{e^{- t} λ (t)}{\int_{0}^{\infty} e^{- t^{'}} λ (t^{'}) d t^{'}} = \frac{3}{8} e^{- t} (2 t + 1 - e^{- 2 t}) .

(9)

To obtain the distribution of segment lengths, ψ_SMC′(ℓ), we integrate over all t (as in Eq. (4)),

ψ_{{SMC}^{'}} (ℓ) = \int_{0}^{\infty} π_{\infty}^{{SMC}^{'}, seg} (t) λ (t) e^{- λ (t) ℓ} d t = \frac{\int_{0}^{\infty} λ^{2} (t) e^{- t - λ (t) ℓ} d t}{\int_{0}^{\infty} e^{- t} λ (t) d t} .

(10)

The integrals in Eq. (10) can be solved in terms of special functions; the final expression is given in Appendix A (Eq. (A.1)). Note that setting λ(t) = 2Nt (i.e., setting the probability of t = s to zero) reduces Eq. (10) to the SMC distribution (Eq. (4)). Using the representation of Eq. (10), it is easy to see that ψ_SMC′(ℓ) is normalized and that the mean segment length is

{〈 ℓ 〉}_{{SMC}^{'}} = \frac{1}{\int_{0}^{\infty} e^{- t} λ (t) d t} = \frac{3}{4 N} .

(11)

Segments in SMC′ are, by definition, longer than in SMC, and in SMC, ψ_SMC(ℓ) had no moments higher than the first. Therefore, ψ_SMC′(ℓ) also has no second or higher moments.
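Although the closed form of ψ_SMC′(ℓ) involves special functions, Eqs. (7), (9), and (10) are straightforward to evaluate by quadrature. The sketch below uses a composite Simpson rule; the truncation of the t-integrals at `T`, the number of subintervals, and the function names are assumptions made for this illustration.

```python
import math


def lam(t, N):
    """Eq. (7): rate at which a segment with TMRCA t terminates under SMC'."""
    return 0.5 * N * (2.0 * t + 1.0 - math.exp(-2.0 * t))


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def pi_seg(t, N, T=60.0):
    """Eq. (9), obtained by explicitly normalizing exp(-t) * lambda(t)."""
    z = simpson(lambda u: math.exp(-u) * lam(u, N), 0.0, T)
    return math.exp(-t) * lam(t, N) / z


def psi_smc_prime(ell, N, T=60.0):
    """Eq. (10): segment-length PDF under SMC', by quadrature over t."""
    num = simpson(lambda t: lam(t, N) ** 2 * math.exp(-t - lam(t, N) * ell), 0.0, T)
    den = simpson(lambda t: math.exp(-t) * lam(t, N), 0.0, T)
    return num / den
```

The normalizing integral evaluates to 4N/3, whose reciprocal is the mean segment length of Eq. (11). For large values of Nℓ the integrand of `psi_smc_prime` becomes sharply peaked near t = 0, and a finer grid would be needed.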

It is possible, using Eq. (10), to define a renewal process for SMC′ analogous to the process defined in Table 2. However, with the exception of the infinite-chromosome results (section 3), we do not further investigate the properties of such a model.

2.5. Simulations

To demonstrate the IBD process under SMC and SMC′, as well as provide empirical justification for the renewal approximation, we show simulation results for the distribution of the fraction of the chromosome found in shared segments longer than m, P(f_m), (Figure 4) and the distribution of segment lengths, ψ(ℓ) (Figure 5). Simulations were performed precisely as described in Tables 1, 2, and 3 above. For all values of N tested, simulation results for P(f_m) were identical between SMC and its renewal approximation. For small values of N (or more precisely, as 1/2N, the average distance between recombination sites, approaches m), there is more sharing in SMC′ than in SMC/renewal. This is because in SMC′, short segments may extend beyond the first recombination event, thereby exceeding the length cutoff. Simulation results for the distribution of segment lengths in SMC and SMC′ (Figure 5) agree well with Eqs. (4) and (10), respectively. As expected, the SMC′ distribution has a heavier tail than in SMC and interestingly, is indistinguishable from that of the ARG, reinforcing the importance of the SMC′ model.

Figure 4. The distribution of the fraction of the chromosome found in shared segments longer than m, f_m. We simulated the IBD process for three values of the population size (N = 500, 1000, 5000), for L = 2 and m = 0.01 (Morgans), for SMC (the process defined in Table 1, section 2.2), the renewal approximation (Table 2, section 2.3), and SMC′ (Table 3, section 2.4), and for 10⁶ realizations for each setting. The distribution for N = 5000 was divided by 3 for visibility. For all population sizes, SMC and the renewal approximation produced identical results, which also agree well with the renewal theory result (numerical inversion (Hollenbeck, 1998; de Hoog et al., 1982) of Eq. (B.3)). SMC′ and the Poisson approximation (Eq. (47)) deviate from SMC/renewal, increasingly for smaller values of N. The fluctuations for N = 5000 are due to the sharing of exactly 0,1,2,… segments of length very close to m, and were previously described (Carmi et al., 2013).

Figure 5. The distribution of segment lengths, ψ(ℓ), under SMC, SMC′, and the ARG. Simulations for SMC and SMC′ were as described in Figure 4, but with N = 1000 and L = 0.5 (Morgan). ARG simulations were performed in ms, by outputting the marginal trees and extracting segment lengths. We ran 5000 realizations for each model. Theory for SMC is from Eq. (4) and theory for SMC′ is from Eq. (10) (equivalently (A.1)). Interestingly, simulation results for the ARG are indistinguishable from those of SMC′.

3. The infinite-chromosome limit of the IBD process

In this section, we derive the mean number of shared segments and the mean fraction of the chromosome in shared segments at the infinite-chromosome limit, under the renewal approximation to SMC and SMC′. Let us first derive some general, model-independent results. Given a segment length distribution ψ(ℓ) and using the elementary renewal theorem (Karlin and Taylor (1975), Theorem 4.2), the mean total number of segments (of any length) for L → ∞ is

〈 n_{0} 〉 = \frac{L}{〈 ℓ 〉} = \frac{L}{\int_{0}^{\infty} ℓ ψ (ℓ) d ℓ} .

(12)

Using the elementary renewal theorem for reward processes (Karlin and Taylor (1975), chapter 5, section 7.C.II), the mean number of segments longer than m is, for L → ∞,

〈 n_{m} 〉 = 〈 n_{0} 〉 \int_{m}^{\infty} ψ (ℓ) d ℓ .

(13)

Similarly, the mean fraction of the chromosome found in segments longer than m is

〈 f_{m} 〉 = \frac{〈 n_{0} 〉}{L} \int_{m}^{\infty} ℓ ψ (ℓ) d ℓ .

(14)

We now turn to specific models, recovering previous results for SMC (Palamara et al., 2012) and obtaining new results for SMC′.

3.1. The SMC model

Under SMC, the distribution of segment lengths is given by Eq. (4). The mean total number of segments is

{〈 n_{0} 〉}_{SMC} = \frac{L}{\int_{0}^{\infty} \frac{4 N ℓ}{{(1 + 2 N ℓ)}^{3}} d ℓ} = 2 N L .

(15)

The mean number of shared segments longer than m is

{〈 n_{m} 〉}_{SMC} = 2 N L \int_{m}^{\infty} \frac{4 N}{{(1 + 2 N ℓ)}^{3}} d ℓ = \frac{2 N L}{{(1 + 2 m N)}^{2}} .

(16)

The mean fraction of the chromosome in segments longer than m is

{〈 f_{m} 〉}_{SMC} = 2 N \int_{m}^{\infty} \frac{4 N ℓ}{{(1 + 2 N ℓ)}^{3}} d ℓ = \frac{1 + 4 m N}{{(1 + 2 m N)}^{2}} .

(17)

Eq. (17) has been previously derived by Palamara et al. (2012), by studying the distribution of segment lengths surrounding a randomly chosen site. Simulation results for 〈 f_m 〉_SMC (Figure 6) agree well with Eq. (17). While simulations were shown before (Palamara et al., 2012; Carmi et al., 2013), here we are able to observe perfect agreement even for very small values of N. Eq. (16) was derived by Palamara et al. (2012) using the relation 〈 n_m 〉 = L 〈 f_m 〉/〈 ℓ_m 〉, where 〈 ℓ_m 〉 is the mean length of segments longer than m.
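The closed-form SMC results of Eqs. (15)–(17) are simple enough to evaluate directly. The following sketch collects them as functions (the function names are hypothetical) and also computes the mean length of segments longer than m from the relation 〈 n_m 〉 = L 〈 f_m 〉/〈 ℓ_m 〉 quoted above.

```python
def mean_total_segments_smc(N, L):
    """Eq. (15): mean number of segments of any length."""
    return 2.0 * N * L


def mean_n_m_smc(N, L, m):
    """Eq. (16): mean number of segments longer than m."""
    return 2.0 * N * L / (1.0 + 2.0 * m * N) ** 2


def mean_f_m_smc(N, m):
    """Eq. (17): mean fraction of the chromosome in segments longer than m."""
    return (1.0 + 4.0 * m * N) / (1.0 + 2.0 * m * N) ** 2


def mean_long_segment_length_smc(N, L, m):
    """<l_m>, from the relation <n_m> = L <f_m> / <l_m>."""
    return L * mean_f_m_smc(N, m) / mean_n_m_smc(N, L, m)


if __name__ == "__main__":
    # Human-like parameters quoted in section 2.3, with a 1 cM cutoff.
    N, L, m = 10_000, 1.0, 0.01
    print(mean_n_m_smc(N, L, m), mean_f_m_smc(N, m),
          mean_long_segment_length_smc(N, L, m))
```

Setting m = 0 recovers 〈 n_0 〉 = 2NL and 〈 f_0 〉 = 1, as it should, since every segment then counts as shared.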

Figure 6. The mean fraction of the chromosome found in shared segments longer than m, 〈 f_m 〉. Simulation details are as in Figure 4. Simulation results and theory for SMC and the renewal approximation coincide. The renewal theory curve was obtained by numerically inverting Eq. (41). Theory for SMC and SMC′ (infinite-chromosome limits) is from Eqs. (17) and (A.3), respectively.

Eq. (16) can be derived in yet another way, using a result from Ralph and Coop (2013), who showed that for a fixed TMRCA t, the mean number of segments longer than m is K(t, m) = e^{−2mNt}[2Nt(L − m) + 1]. Integrating over all t using P_c(t) = e^{−t}, we have $〈 n_{m} 〉 = \int_{0}^{\infty} K (t, m) P_{c} (t) d t = (1 + 2 N L) / {(1 + 2 m N)}^{2}$ . For L ≫ 1/2N, we recover Eq. (16). Also note that for a fixed t, the mean number of segments of length in [ℓ, ℓ + dℓ] is −∂K(t, ℓ)/∂ℓ dℓ. Integrating over all t as before, this gives 4N(1 + 2NL)/(1 + 2Nℓ)³ dℓ. Since the mean total number of segments (of all lengths), obtained by integrating K(t, 0) over t, is (1 + 2NL), the probability of a random segment to be of length in [ℓ, ℓ + dℓ] is ψ(ℓ)dℓ = 4N/(1 + 2Nℓ)³ dℓ, exactly as in our Eq. (4).
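The integration over t in this alternative derivation can be written out explicitly. The steps below use only K(t, m) and P_c(t) as defined in the text, together with the elementary integrals of e^{−at} and t e^{−at} over [0, ∞), with a = 1 + 2mN.

```latex
\begin{aligned}
\int_{0}^{\infty} K(t, m)\, P_{c}(t)\, dt
  &= \int_{0}^{\infty} e^{-(1 + 2mN)t} \left[ 2N(L - m)\,t + 1 \right] dt \\
  &= \frac{2N(L - m)}{(1 + 2mN)^{2}} + \frac{1}{1 + 2mN} \\
  &= \frac{2N(L - m) + 1 + 2mN}{(1 + 2mN)^{2}}
   = \frac{1 + 2NL}{(1 + 2mN)^{2}}, \\[4pt]
-\frac{\partial}{\partial \ell} \frac{1 + 2NL}{(1 + 2N\ell)^{2}}
  &= \frac{4N(1 + 2NL)}{(1 + 2N\ell)^{3}} .
\end{aligned}
```

Dividing the last expression by the mean total number of segments, 1 + 2NL, gives 4N/(1 + 2Nℓ)³, the density of Eq. (4).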

3.2. The SMC′ model

Under SMC′, the distribution of segment lengths is given by Eq. (10). The mean total number of segments is (using Eq. (11))

{〈 n_{0} 〉}_{{SMC}^{'}} = \frac{L}{\int_{0}^{\infty} ℓ ψ_{{SMC}^{'}} (ℓ) d ℓ} = \frac{4 N L}{3} .

(18)

Eq. (18) represents a surprisingly simple result, stating that for long chromosomes, the mean number of segments in SMC′ is precisely 2/3 of the total number of recombination events (2NL). To provide an intuitive explanation, we recall (section 2.4) that the stationary distribution of tree heights at recombination sites in SMC′ is $π_{\infty}^{{SMC}^{'}} (t) = t e^{- t}$ (as in SMC). At a recombination site, there is probability 1 − q_SMC′(t|t) for the TMRCA to change and consequently, for the segment to terminate. Integrating over all t,

\int_{0}^{\infty} t e^{- t} [1 - q_{{SMC}^{'}} (t ∣ t)] d t = \int_{0}^{\infty} t e^{- t} \frac{2 t + 1 - e^{- 2 t}}{4 t} d t = \frac{2}{3} .

(19)

In fact, it can be shown that at stationarity, the new tree has equal probability to be larger than, smaller than, or equal to the previous tree. Also note that the probability to change the MRCA at a recombination site is also 2/3 for the ARG (Griffiths and Marjoram (1997), Theorem 2.4).
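The equal split between larger, smaller, and unchanged trees can be checked by a short Monte Carlo experiment. In the sketch below (function names hypothetical), the previous tree height s is drawn from the stationary PDF t e^{−t}, which is a gamma distribution with shape 2 and scale 1, and the new height is drawn by the SMC′ sampling scheme of section 2.4.

```python
import random


def draw_tmrca_smc_prime(s, rng):
    """Draw the new TMRCA given the previous TMRCA s (SMC' scheme, section 2.4)."""
    t_r = rng.uniform(0.0, s)    # recombination time, uniform on [0, s]
    t_c = rng.expovariate(2.0)   # coalescence waiting time, rate 2
    if t_r + t_c < s:
        if rng.random() < 0.5:
            return s             # TMRCA unchanged
        return t_r + t_c         # new, smaller TMRCA
    return s + rng.expovariate(1.0)  # new, larger TMRCA


def transition_frequencies(n_draws, seed=0):
    """Fractions of transitions to a smaller, equal, or larger tree at stationarity."""
    rng = random.Random(seed)
    counts = {"smaller": 0, "equal": 0, "larger": 0}
    for _ in range(n_draws):
        s = rng.gammavariate(2.0, 1.0)  # stationary PDF t * exp(-t)
        t = draw_tmrca_smc_prime(s, rng)
        if t < s:
            counts["smaller"] += 1
        elif t > s:
            counts["larger"] += 1
        else:
            counts["equal"] += 1
    return {key: value / n_draws for key, value in counts.items()}
```

With a few hundred thousand draws, each of the three frequencies should be close to 1/3, so that the TMRCA changes at about 2/3 of the recombination sites, in line with Eq. (19).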

Next, using Eqs. (10), (11), and (12), it can be seen that

ψ_{{SMC}^{'}} (ℓ) = \frac{\int_{0}^{\infty} λ^{2} (t) e^{- t - λ (t) ℓ} d t}{{〈 n_{0} 〉}_{{SMC}^{'}} / L} .

(20)

Using Eqs. (13) and (20), the mean number of segments longer than m is

{〈 n_{m} 〉}_{{SMC}^{'}} = {〈 n_{0} 〉}_{{SMC}^{'}} \int_{m}^{\infty} ψ_{{SMC}^{'}} (ℓ) d ℓ = L \int_{0}^{\infty} λ (t) e^{- t - λ (t) m} d t .

(21)

The final result, which we obtained using Mathematica (Wolfram Research, 2012), is given in Appendix A (Eq. (A.2)).

Finally, using Eqs. (14) and (20), we have

{〈 f_{m} 〉}_{{SMC}^{'}} = \frac{{〈 n_{0} 〉}_{{SMC}^{'}}}{L} \int_{m}^{\infty} ℓ ψ_{{SMC}^{'}} (ℓ) d ℓ = \int_{0}^{\infty} e^{- t - λ (t) m} [1 + λ (t) m] d t .

(22)

The result of the integral is given in Appendix A (Eq. (A.3)). Numerical evaluation shows perfect agreement with simulation results, for all values of N (Figure 6).
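Eqs. (21) and (22) can also be evaluated by quadrature instead of the closed forms of Appendix A. The sketch below is a hedged illustration: it assumes the constant-size rate λ(t) = 2Nt[1 − q_SMC′(t|t)] = N(2t + 1 − e^{−2t})/2, which is what Eq. (60) and the integrand of Eq. (19) imply for N₀ = N. The function names and the integration cutoff are assumptions. The SMC expression (1 + 4mN)/(1 + 2mN)² used for comparison follows from Eq. (56) with h(t) = 1.

```python
import math


def lam(t, N):
    """Assumed constant-size rate: lambda(t) = N * (2t + 1 - exp(-2t)) / 2."""
    return 0.5 * N * (2.0 * t - math.expm1(-2.0 * t))


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b]; n is rounded up to an even number."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def smc_prime_means(N, L, m, t_max=40.0):
    """Return (<n_m>, <f_m>) under SMC' from Eqs. (21) and (22)."""
    n_m = L * simpson(lambda t: lam(t, N) * math.exp(-t - lam(t, N) * m), 0.0, t_max)
    f_m = simpson(lambda t: math.exp(-t - lam(t, N) * m) * (1.0 + lam(t, N) * m),
                  0.0, t_max)
    return n_m, f_m


def smc_f_m(N, m):
    """SMC counterpart of <f_m>: (1 + 4mN) / (1 + 2mN)^2."""
    return (1.0 + 4.0 * m * N) / (1.0 + 2.0 * m * N) ** 2


if __name__ == "__main__":
    N, L, m = 1000, 2.0, 0.01  # illustrative values
    print(smc_prime_means(N, L, m), smc_f_m(N, m))
```

Setting m = 0 recovers 〈n₀〉 = 4NL/3 (Eq. (18)) and 〈f₀〉 = 1. Because λ(t) ≤ 2Nt, the SMC′ fraction is never smaller than the SMC one.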

4. Renewal theory results for finite chromosomes

In this section, we use renewal theory to derive the complete distribution of our quantities of interest: the number of segments longer than m (section 4.1) and the fraction of the chromosome in segments longer than m (section 4.2), for a chromosome of a finite size L. In both cases, we derive an expression in Laplace space for the distribution (Eq. (32) for the number of segments and Eq. (38) for the fraction of the chromosome). Those expressions are general for any segment length distribution. We then substitute the specific SMC form, to obtain explicit expressions (Appendix B). As we show, the distributions can be numerically inverted and compared to simulations or be used for demographic inference. Using standard techniques, we also obtain the first two moments (in real space) for long (but finite) chromosomes. Our method in this section is adapted from the physics literature (Godrèche and Luck, 2001).

4.1. The distribution of the number of segments longer than m under the renewal approximation

4.1.1. Theory

Define P(n_m = k, L) as the probability that two chromosomes share exactly k segments longer than m over a sequence of length L, under the renewal IBD process defined in Table 2 (section 2.3). We will obtain P̃(n_m = k, s), the Laplace transform of P(n_m = k, L) with respect to L: $\tilde{P} (n_{m} = k, s) = \int_{0}^{\infty} e^{- s L} P (n_{m} = k, L) d L$ . Let us first define an auxiliary function, η_m(L)dL, which is the probability that, conditional on recombination at position 0 in the sequence, a) recombination occurred at a position in [L, L + dL]; and b) all intermediate recombination events in [0, L] had terminated segments that were shorter than m. Note that η_m(L), as well as Q_m(k, L) below (Eq. (26)), are not PDFs. Then, η_m(L) satisfies

η_{m} (L) = δ (L) + \int_{0}^{min (m, L)} ψ (ℓ) η_{m} (L - ℓ) d ℓ .

(23)

In Eq. (23), δ(x) is Dirac’s delta function and ψ(ℓ) is the PDF of segment lengths. The derivation will proceed with a general ψ(ℓ); we will substitute the explicit SMC form (Eq. (4)) only at the final result. Eq. (23) is explained as follows. The first term (δ(L)) accounts for the case L = 0. Otherwise, we condition on the length of the last segment, ℓ, which cannot exceed either m or L. Given ℓ, we require the recombination at L − ℓ to end a series of short segments, which happens with probability η_m(L − ℓ). Note that we made use of the renewal property, namely the independence of successive segment lengths.

We now apply the Laplace transform (L → s) to both sides of Eq. (23),

\begin{array}{l} {\tilde{η}}_{m} (s) = 1 + \int_{0}^{\infty} e^{- s L} [\int_{0}^{min (m, L)} ψ (ℓ) η_{m} (L - ℓ) d ℓ] d L \\ = 1 + \int_{0}^{m} [\int_{ℓ}^{\infty} e^{- s L} ψ (ℓ) η_{m} (L - ℓ) d L] d ℓ \\ = 1 + \int_{0}^{m} e^{- s ℓ} ψ (ℓ) [\int_{ℓ}^{\infty} e^{- s (L - ℓ)} η_{m} (L - ℓ) d L] d ℓ \\ = 1 + \int_{0}^{m} e^{- s ℓ} ψ (ℓ) d ℓ \int_{0}^{\infty} e^{- s L^{'}} η_{m} (L^{'}) d L^{'} \\ = 1 + {\tilde{ψ}}_{< m} (s) {\tilde{η}}_{m} (s), \end{array}

(24)

where we defined ${\tilde{ψ}}_{< m} (s) \equiv \int_{0}^{m} e^{- s ℓ} ψ (ℓ) d ℓ$ . We thus obtained an algebraic equation for η̃_m(s), whose solution is

{\tilde{η}}_{m} (s) = {[1 - {\tilde{ψ}}_{< m} (s)]}^{- 1} .

(25)

Next, we define another auxiliary function, Q_m(k, L)dL, which is the probability that a) recombination occurred at a position in [L, L + dL]; and b) this recombination event ended the kth segment longer than m. For k = 0, Q_m(0, L) = δ(L). For k > 0, we have the following recursion equation,

Q_{m} (k, L) = \int_{m}^{L} ψ (ℓ) [\int_{0}^{L - ℓ} η_{m} (ℓ^{'}) Q_{m} (k - 1, L - ℓ - ℓ^{'}) d ℓ^{'}] d ℓ .

(26)

Eq. (26) is explained similarly to Eq. (23). We condition on the length of the last segment, ℓ, which must be longer than m (but shorter than L). Given the preceding recombination at L − ℓ, we condition on the length of the rightmost stretch of short segments, ℓ′, which has probability η_m(ℓ′). Note that η_m(L) does not depend on the absolute position along the sequence, again, due to the renewal property. Finally, given ℓ and ℓ′, there must have been a recombination event at L − ℓ − ℓ′ ending the (k − 1)th segment longer than m, with probability Q_m(k − 1, L − ℓ − ℓ′). We now apply the Laplace transform to Eq. (26),

\begin{array}{l} {\tilde{Q}}_{m} (k, s) = \int_{m}^{\infty} e^{- s L} \{ \int_{m}^{L} ψ (ℓ) [\int_{0}^{L - ℓ} η_{m} (ℓ^{'}) Q_{m} (k - 1, L - ℓ - ℓ^{'}) d ℓ^{'}] d ℓ \} d L \\ = \int_{m}^{\infty} e^{- s ℓ} ψ (ℓ) d ℓ \int_{ℓ}^{\infty} e^{- s (L - ℓ)} [\int_{0}^{L - ℓ} η_{m} (ℓ^{'}) Q_{m} (k - 1, L - ℓ - ℓ^{'}) d ℓ^{'}] d L \\ = \int_{m}^{\infty} e^{- s ℓ} ψ (ℓ) d ℓ \int_{0}^{\infty} e^{- s L^{'}} [\int_{0}^{L^{'}} η_{m} (ℓ^{'}) Q_{m} (k - 1, L^{'} - ℓ^{'}) d ℓ^{'}] d L^{'} \\ = {\tilde{ψ}}_{> m} (s) {\tilde{η}}_{m} (s) {\tilde{Q}}_{m} (k - 1, s) = \frac{{\tilde{ψ}}_{> m} (s)}{1 - {\tilde{ψ}}_{< m} (s)} {\tilde{Q}}_{m} (k - 1, s), \end{array}

(27)

where ${\tilde{ψ}}_{> m} (s) \equiv \int_{m}^{\infty} e^{- s ℓ} ψ (ℓ) d ℓ$ , we used the fact that Q_m(k > 0, L < m) = 0, and in the last line, we used the convolution theorem and Eq. (25). Using Eq. (27) and the initial condition, Q̃_m(0, s) = 1, we have

{\tilde{Q}}_{m} (k, s) = {(\frac{{\tilde{ψ}}_{> m} (s)}{1 - {\tilde{ψ}}_{< m} (s)})}^{k} .

(28)

We next define $ϕ (ℓ) \equiv 1 - \int_{0}^{ℓ} ψ (ℓ^{'}) d ℓ^{'} = \int_{ℓ}^{\infty} ψ (ℓ^{'}) d ℓ^{'}$ , the probability that a segment extends for sequence length greater than ℓ. We are now in a position to compute P (n_m = k, L). For k > 0,

P (n_{m} = k, L) = \int_{0}^{m} ϕ (ℓ) [\int_{0}^{L - ℓ} η_{m} (ℓ^{'}) Q_{m} (k, L - ℓ - ℓ^{'}) d ℓ^{'}] d ℓ + \int_{m}^{L} ϕ (ℓ) [\int_{0}^{L - ℓ} η_{m} (ℓ^{'}) Q_{m} (k - 1, L - ℓ - ℓ^{'}) d ℓ^{'}] d ℓ .

(29)

For P(n_m = k, L), we do not require recombination at L. Therefore, we condition on the sequence length ℓ since the rightmost recombination event, with the probability of no recombination since then being ϕ(ℓ). Then, if ℓ < m, we require k segments longer than m to be seen by position L − ℓ, possibly followed by any number of short segments. If ℓ > m, then the sequence [L − ℓ, L] will form a segment longer than m on its own, and we only require k − 1 previous segments longer than m. Eq. (29) can be transformed similarly to Eqs. (24) and (27), yielding

\tilde{P} (n_{m} = k, s) = {\tilde{η}}_{m} (s) [{\tilde{ϕ}}_{< m} (s) {\tilde{Q}}_{m} (k, s) + {\tilde{ϕ}}_{> m} (s) {\tilde{Q}}_{m} (k - 1, s)],

(30)

where ${\tilde{ϕ}}_{< m} (s) = \int_{0}^{m} e^{- s ℓ} ϕ (ℓ) d ℓ$ and ${\tilde{ϕ}}_{> m} (s) = \int_{m}^{\infty} e^{- s ℓ} ϕ (ℓ) d ℓ$ . For k = 0, we have $P (n_{m} = 0, L) = \int_{0}^{min (m, L)} ϕ (ℓ) η_{m} (L - ℓ) d ℓ$ . Applying the Laplace transform gives

\tilde{P} (n_{m} = 0, s) = {\tilde{ϕ}}_{< m} (s) {\tilde{η}}_{m} (s) .

(31)

Combining Eqs. (25), (28), (30), and (31), and using ϕ̃_{<m}(s) + ϕ̃_{>m}(s) = ϕ̃(s) = [1 − ψ̃(s)]/s and ψ̃_{<m}(s) + ψ̃_{>m}(s) = ψ̃(s), we finally obtain

\tilde{P} (n_{m} = k, s) = {\begin{cases} \frac{{\tilde{ϕ}}_{< m} (s)}{1 - {\tilde{ψ}}_{< m} (s)} & k = 0, \\ \frac{[1 - \tilde{ψ} (s)] [{\tilde{ψ}}_{> m} (s) + s {\tilde{ϕ}}_{> m} (s)]}{s {[1 - {\tilde{ψ}}_{< m} (s)]}^{2}} {[\frac{{\tilde{ψ}}_{> m} (s)}{1 - {\tilde{ψ}}_{< m} (s)}]}^{k - 1} & k > 0. \end{cases}

(32)

Eq. (32) is our main result, and is valid for any distribution of segment lengths, ψ(ℓ). Due to normalization, we expect $\sum_{k = 0}^{\infty} \tilde{P} (n_{m} = k, s) = \sum_{k = 0}^{\infty} \int_{0}^{\infty} e^{- s L} P (n_{m} = k, L) d L = \int_{0}^{\infty} e^{- s L} [\sum_{k = 0}^{\infty} P (n_{m} = k, L)] d L = \int_{0}^{\infty} e^{- s L} d L = 1 / s$ , as can be verified, after some algebra, from Eq. (32).
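Since Eq. (32) holds for any ψ(ℓ), the normalization can be checked on a case where every transform has a closed form. The sketch below assumes, purely for illustration, exponential segment lengths ψ(ℓ) = r e^{−rℓ} (this is not the SMC form); the function names and the values of r, m, and s are hypothetical. It sums Eq. (32) over k and compares with 1/s, and also compares the sum of k P̃ with Eq. (33), which for this ψ simplifies to e^{−(r+s)m}(r + s)/s².

```python
import math


def transforms_exponential(r, m, s):
    """Truncated Laplace transforms for psi(l) = r*exp(-r*l), phi(l) = exp(-r*l)."""
    a = r + s
    tail = math.exp(-a * m)
    psi_lt = r * (1.0 - tail) / a   # int_0^m exp(-s l) psi(l) dl
    psi_gt = r * tail / a           # int_m^inf exp(-s l) psi(l) dl
    phi_lt = (1.0 - tail) / a       # int_0^m exp(-s l) phi(l) dl
    phi_gt = tail / a               # int_m^inf exp(-s l) phi(l) dl
    return psi_lt, psi_gt, phi_lt, phi_gt


def p_tilde(k, r, m, s):
    """Eq. (32): Laplace transform (L -> s) of P(n_m = k, L)."""
    psi_lt, psi_gt, phi_lt, phi_gt = transforms_exponential(r, m, s)
    if k == 0:
        return phi_lt / (1.0 - psi_lt)
    psi = psi_lt + psi_gt
    prefactor = (1.0 - psi) * (psi_gt + s * phi_gt) / (s * (1.0 - psi_lt) ** 2)
    ratio = psi_gt / (1.0 - psi_lt)
    return prefactor * ratio ** (k - 1)


def mean_transform_exponential(r, m, s):
    """Eq. (33) specialised to the exponential psi: exp(-(r+s)m) * (r+s) / s^2."""
    return math.exp(-(r + s) * m) * (r + s) / s ** 2


if __name__ == "__main__":
    r, m, s = 20.0, 0.1, 1.0  # illustrative values
    print(sum(p_tilde(k, r, m, s) for k in range(400)), 1.0 / s)
```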

Our results have so far been general and could apply to any ‘IBD process’. We now substitute the SMC segment length PDF, ψ(ℓ) = 4N/(1 + 2Nℓ)³ (Eq. (4)). The distribution of the number of segments longer than m (Eq. (32)) under SMC is given in Eq. (B.1) (Appendix B). This can be numerically inverted (Hollenbeck, 1998; de Hoog et al., 1982) for each k, to obtain, for a given L, the distribution P (n_m = k). The theoretical prediction compares perfectly to simulation results for both SMC and the renewal approximation (Figure 7).
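To make the inversion step concrete, here is a minimal sketch of one simple one-dimensional inversion scheme, the Gaver-Stehfest algorithm. This is a swapped-in technique chosen for brevity; it is not the method of Hollenbeck (1998) or de Hoog et al. (1982) cited above, and the function names are hypothetical. The sketch is exercised only on textbook transform pairs. Applying it to Eq. (B.1) would additionally require an implementation of the exponential integral.

```python
import math


def stehfest_coefficients(n=12):
    """Gaver-Stehfest weights V_1..V_n; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    half = n // 2
    coeffs = []
    for k in range(1, n + 1):
        total = 0.0
        for j in range((k + 1) // 2, min(k, half) + 1):
            total += (j ** half * math.factorial(2 * j)) / (
                math.factorial(half - j) * math.factorial(j)
                * math.factorial(j - 1) * math.factorial(k - j)
                * math.factorial(2 * j - k))
        coeffs.append((-1) ** (k + half) * total)
    return coeffs


def invert_laplace(F, L, n=12):
    """Approximate f(L) from its Laplace transform F(s), sampled on the real axis."""
    ln2 = math.log(2.0)
    weights = stehfest_coefficients(n)
    return (ln2 / L) * sum(w * F(k * ln2 / L)
                           for k, w in enumerate(weights, start=1))


if __name__ == "__main__":
    # Textbook pairs: 1/(s+1) <-> exp(-L) and 1/s^2 <-> L
    print(invert_laplace(lambda s: 1.0 / (s + 1.0), 1.0), math.exp(-1.0))
    print(invert_laplace(lambda s: 1.0 / s ** 2, 2.0))
```

Stehfest weights alternate in sign and grow quickly with n, so in double precision n is usually kept around 10 to 16. The scheme is also known to be less reliable for functions with discontinuities.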

The distribution of the number of shared segments longer than m, *n_m*. Simulation details are as in Figure 4 (specifically, L = 2 and m = 0.01 (Morgans)). Theory for the renewal approximation was obtained by numerically inverting Eq. (B.1). The Poisson distribution has mean 2N L/(1 + 2mN)² (Eq. (16)).

4.1.2. The mean

The mean number of segments longer than m is $〈 n_{m} 〉 = \sum_{k = 0}^{\infty} k P (n_{m} = k, L)$ . Taking the Laplace transform of 〈 n_m 〉, using Eq. (32) and the relation $\sum_{k = 0}^{\infty} k x^{k} = x / {(1 - x)}^{2}$ , we obtain, after some algebra,

〈 \tilde{n_{m}} 〉 (s) = \frac{{\tilde{ψ}}_{> m} (s) + s {\tilde{ϕ}}_{> m} (s)}{s [1 - \tilde{ψ} (s)]} .

(33)

For SMC, we obtain, using Eq. (33) and Mathematica,

{〈 \tilde{n_{m}} 〉}_{SMC} (s) = \frac{4 N^{2} e^{- m s}}{s^{2} {(1 + 2 m N)}^{2} [s e^{\frac{s}{2 N}} Ei (- \frac{s}{2 N}) + 2 N]},

(34)

where Ei is the exponential integral function. Noting that ${lim}_{s \to 0} s^{2} {〈 \tilde{n_{m}} 〉}_{SMC} (s) = 2 N / {(1 + 2 m N)}^{2}$ , we have lim_{L→∞} 〈n_m〉_SMC /L = 2N/(1 + 2mN)², exactly as in Eq. (16).
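As a consistency check on Eq. (33), consider the illustrative (non-SMC) case of exponentially distributed segment lengths with rate r, for which all transforms are elementary. The following worked derivation is a sketch under that assumption; the symbol r is introduced here only for the example.

```latex
\begin{align*}
\psi(\ell) &= r e^{-r\ell}, \qquad \phi(\ell) = e^{-r\ell}, \\
\tilde{\psi}_{>m}(s) &= \frac{r\, e^{-(r+s)m}}{r+s}, \qquad
\tilde{\phi}_{>m}(s) = \frac{e^{-(r+s)m}}{r+s}, \qquad
1 - \tilde{\psi}(s) = \frac{s}{r+s}, \\
\langle \tilde{n}_m \rangle (s)
  &= \frac{\tilde{\psi}_{>m}(s) + s\,\tilde{\phi}_{>m}(s)}{s\,[1 - \tilde{\psi}(s)]}
   = \frac{(r+s)\, e^{-(r+s)m}}{s^{2}}
   = e^{-rm}\, e^{-sm} \left( \frac{r}{s^{2}} + \frac{1}{s} \right), \\
\langle n_m \rangle (L) &= e^{-rm} \left[ r (L - m) + 1 \right], \qquad L > m .
\end{align*}
```

With r = 2Nt, the last line is the fixed-TMRCA result K(t, m) of Ralph and Coop (2013) quoted in section 3.1.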

4.1.3. The variance

The second moment of the number of segments longer than m can be computed using $〈 \tilde{n_{m}^{2}} 〉 (s) = \sum_{k = 0}^{\infty} k^{2} \tilde{P} (n_{m} = k, s)$ , from which the variance can be obtained. For SMC and for large L,

Var {[n_{m}]}_{SMC} = \frac{2 N L}{{(1 + 2 m N)}^{4}} [2 ln (2 N L) + 4 m N (m N - 1) - 5] + O ({ln}^{2} L) .

(35)

4.1.4. The Poisson approximation

Palamara et al. (2012), following Huff et al. (2011), proposed that the number of shared segments longer than m is Poisson distributed, with the infinite-chromosome mean, 〈n_m〉_SMC = 2NL/(1 + 2mN)² (Eq. (16)). The Poisson distribution fits the simulation results reasonably (Figure 7; see also section 4.2.4). Indeed, for large values of N and L, Eq. (35) gives $Var {[n_{m}]}_{SMC} \approx {〈 n_{m} 〉}_{SMC} \approx \frac{L}{2 m^{2} N}$ , as expected from a Poisson variable. Deviations appear for small values of N (Figure 7).
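The comparison with the Poisson approximation can be explored with a direct simulation of the renewal process. The sketch below is an illustration and not the simulation code used for the figures; the function names, seed, number of runs, and parameter values are assumptions. Segment lengths are drawn by inverting ϕ(ℓ) = 1/(1 + 2Nℓ)², the tail probability implied by the SMC form ψ(ℓ) = 4N/(1 + 2Nℓ)³ (Eq. (4)). The last segment is truncated at the chromosome end and counted if it is longer than m, as in Eq. (29).

```python
import math
import random


def sample_segment_length(rng, N):
    """Inverse-CDF draw: solve phi(l) = 1/(1 + 2Nl)^2 = u for l."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids division by zero
    return (1.0 / math.sqrt(u) - 1.0) / (2.0 * N)


def count_long_segments(rng, N, L, m):
    """Number of segments longer than m on a chromosome of length L."""
    pos, count = 0.0, 0
    while pos < L:
        length = sample_segment_length(rng, N)
        end = min(pos + length, L)  # truncate the last segment at L
        if end - pos > m:
            count += 1
        pos += length
    return count


def simulate_counts(N, L, m, runs, seed=1):
    """Independent replicates of n_m under the renewal approximation."""
    rng = random.Random(seed)
    return [count_long_segments(rng, N, L, m) for _ in range(runs)]


def poisson_mean(N, L, m):
    """Infinite-chromosome mean 2NL / (1 + 2mN)^2 (Eq. (16))."""
    return 2.0 * N * L / (1.0 + 2.0 * m * N) ** 2


if __name__ == "__main__":
    N, L, m = 500, 2.0, 0.01  # illustrative values
    counts = simulate_counts(N, L, m, runs=400)
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    print(mean, var, poisson_mean(N, L, m))
```

A Poisson variable has equal mean and variance, so comparing the two printed sample moments gives a quick impression of how far the renewal process departs from the Poisson approximation for a given N.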

4.1.5. Demographic inference

The results of section 4.1.1 have attractive implications for demographic inference. While this is not our main focus here, we provide a simple demonstration. For a given population size N (and for L = 2 and m = 0.01 (Morgans)), we simulated the SMC IBD process R = 5000 times and recorded, for each run, the number of shared segments longer than m, n_m. This corresponds, roughly, to the information that will be available from sampling a single chromosome in 50 (diploid) individuals, although we note that in reality, pairs of chromosomes in a sample are weakly dependent (see Carmi et al. (2013) and the Discussion). Additionally, the underlying ancestral process is neither SMC nor even the coalescent with recombination, but there is rather a shared underlying pedigree (Wakeley et al., 2012); however, we leave investigation of more complex models to future studies. Given N, m, and L, the log-likelihood of the sample { $n_{m}^{(i)}$ }, i = 1, …, R, is

LL (N) = \sum_{i = 1}^{R} log P_{N, m} (n_{m} = n_{m}^{(i)}, L),

(36)

where P(n_m = k, L) is given by numerically inverting, s → L, Eq. (B.1). We then computed the maximum likelihood estimator,

\hat{N} = \underset{N}{arg max} LL (N) .

(37)

Simulation results (Figure 8) show that the estimator performs excellently, with standard deviation ≈ 0.01N or lower. The performance of the estimator deteriorates for large values of N, since the number of shared segments longer than m approaches zero (Figure 7; Eq. (16)). Under our “noise-free” simulations, even the simple-minded estimator, N̂ = 1/(m 〈f_m〉) − 3/(4m) (Carmi et al., 2013), performs well, although with bias (〈N̂〉/N ≈ 1.02; see Carmi et al. (2013)) and with ≈ 60% larger standard deviation than the maximum likelihood estimator.
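The structure of Eqs. (36) and (37) can be sketched in a few lines. The example below is deliberately simplified and differs from the procedure described above: instead of inverting Eq. (B.1), it plugs the Poisson approximation of section 4.1.4 into the likelihood as a stand-in, and it maximizes over a fixed grid of N values. The function names, the grid, and the toy counts are hypothetical.

```python
import math


def poisson_mean(N, L, m):
    """Infinite-chromosome mean 2NL / (1 + 2mN)^2 (Eq. (16))."""
    return 2.0 * N * L / (1.0 + 2.0 * m * N) ** 2


def log_likelihood(N, counts, L, m):
    """Eq. (36), with a Poisson pmf standing in for the inverted Eq. (B.1)."""
    mu = poisson_mean(N, L, m)
    return sum(k * math.log(mu) - mu - math.lgamma(k + 1) for k in counts)


def estimate_N(counts, L, m, grid):
    """Eq. (37): grid value of N that maximizes the log-likelihood."""
    return max(grid, key=lambda N: log_likelihood(N, counts, L, m))


if __name__ == "__main__":
    # Toy data: counts whose average is close to the expected value at N = 1000
    counts = [9] * 93 + [10] * 7
    print(estimate_N(counts, L=2.0, m=0.01, grid=range(500, 5001, 10)))
```

Replacing the Poisson pmf by a numerically inverted P(n_m = k, L) would turn this into the estimator of Eq. (37); the grid search itself would be unchanged.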

Inference of the effective population size using the distribution of the number of shared segments longer than m. Simulations for N = 500, 1000, …, 5000 were performed as in Figure 4 and for R = 5000 pairs of chromosomes, and Eq. (37) was used to compute $\hat{N}$ , the estimator of the population size. We then repeated 100 times for each N, and each ratio $\hat{N}$ /N is shown as a dot. The dotted red line represents $\hat{N}$ = N and the blue line shows 〈 $\hat{N}$ 〉/N. The estimator is unbiased, with standard deviation as low as 0.003N for N = 500 and 0.011N for N = 5000.

4.2. The distribution of the fraction of the chromosome found in segments longer than m

4.2.1. Theory

Denote P(f_m) as the density of the fraction of the chromosome found in shared segments longer than m. The derivation of P(f_m) uses techniques similar to those used in section 4.1.1 and is tedious. We therefore omit the details and skip to the analysis of the final result. Let P (L_m, L) be the density of L_m ≡ Lf_m, the total sequence length found in shared segments longer than m, given a chromosome of length L, and let P̃_{L_m} (u, s) be its Laplace transform. This is a double Laplace transform: L → s and L_m → u, or ${\tilde{P}}_{L_{m}} (u, s) = \int_{0}^{\infty} \int_{0}^{\infty} e^{- u L_{m} - s L} P (L_{m}, L) d L_{m} d L$ . For the renewal IBD process defined in section 2.3 and with segment length PDF ψ(ℓ), it can be shown that

{\tilde{P}}_{L_{m}} (u, s) = \frac{\frac{1}{s} - \frac{1}{s} {\tilde{ψ}}_{< m} (s) + ϕ (m) [\frac{e^{- m (s + u)}}{s + u} - \frac{e^{- m s}}{s}] - \frac{{\tilde{ψ}}_{> m} (s + u)}{s + u}}{1 - {\tilde{ψ}}_{< m} (s) - {\tilde{ψ}}_{> m} (s + u)},

(38)

where, as in section 4.1.1, $ϕ (ℓ) = 1 - \int_{0}^{ℓ} ψ (ℓ^{'}) d ℓ^{'}, {\tilde{ψ}}_{< m} (z) = \int_{0}^{m} e^{- z ℓ} ψ (ℓ) d ℓ$ , and ${\tilde{ψ}}_{> m} (z) = \int_{m}^{\infty} e^{- z ℓ} ψ (ℓ) d ℓ$ . For u = 0, we expect, due to normalization, ${\tilde{P}}_{L_{m}} (u = 0, s) = \int_{0}^{\infty} e^{- s L} \int_{0}^{\infty} P (L_{m}, L) d L_{m} d L = \int_{0}^{\infty} e^{- s L} d L = 1 / s$ , as can be verified from Eq. (38).

We then substituted the SMC form, ψ(ℓ) = 4N/(1 + 2Nℓ)³ (Eq. (4)), and evaluated Eq. (38) in Mathematica. The final result is given in Appendix B, Eq. (B.3). Eq. (B.3) can be numerically inverted with respect to both u and s (Brančík, 2011) to give P (L_m, L), from which we have P (f_m = L_m/L) = LP (L_m, L). The theoretical prediction agrees well with simulations (Figure 4). Very small deviations may be due to numerical errors in the two-dimensional inversion.

4.2.2. The mean

The mean sequence length in segments longer than m, 〈L_m〉, can be obtained (in s space) from P̃_{L_m} (u, s) by

〈 \tilde{L_{m}} 〉 (s) = {- \frac{\partial {\tilde{P}}_{L_{m}} (u, s)}{\partial u} |}_{u = 0} .

(39)

For a general ψ(ℓ), we obtain from Eq. (38),

〈 \tilde{L_{m}} 〉 (s) = \frac{ϕ (m) e^{- m s} (1 + m s) - {\tilde{ψ}}_{> m} (s)}{s^{2} [1 - \tilde{ψ} (s)]} .

(40)

For the SMC form of ψ(ℓ) (Eq. (4)), this gives

{〈 \tilde{L_{m}} 〉}_{SMC} (s) = e^{- m s} \frac{s C^{2} e^{\frac{s C}{2 N}} Ei (- \frac{s C}{2 N}) + 2 N (1 + 4 m N)}{s^{2} C^{2} [s e^{\frac{s}{2 N}} Ei (- \frac{s}{2 N}) + 2 N]},

(41)

where C = 1 + 2mN. The prediction of Eq. (41) turns out to be virtually identical (Figure 6) to the infinite-chromosome SMC expression (Eq. (17)), which can also be obtained by taking the limit s → 0 (corresponding to L → ∞) of Eq. (41).

4.2.3. The variance

The second moment of L_m is given by

〈 \tilde{L_{m}^{2}} 〉 (s) = {\frac{\partial^{2} {\tilde{P}}_{L_{m}} (u, s)}{\partial u^{2}} |}_{u = 0} .

(42)

Assuming the SMC form of ψ(ℓ) (Eq. (4)), the derivatives can be taken. While the resulting expression (in s space) can be numerically inverted, more insight is gained by looking at the large L limit. Considering only the first order expansion in s and inverting, we obtain ${lim}_{L \to \infty} 〈 f_{m}^{2} 〉 = {lim}_{L \to \infty} {〈 f_{m} 〉}^{2}$ or lim_{L→∞} Var [f_m] = 0, as expected. Expanding to the next order in s and inverting, we find

\begin{array}{l} Var {[f_{m}]}_{SMC} = \frac{ln (1 + 2 m N) [8 m N (1 + 2 m N - 2 m^{3} N^{3}) + 1]}{N L {(1 + 2 m N)}^{4}} \\ + \frac{2 m N \{ 8 m^{3} N^{3} ln N + m N [4 m N [m N (ln 4 - 1) - 2] - 7] - 1 \}}{N L {(1 + 2 m N)}^{4}} \\ + \frac{16 m^{4} N^{4}}{{(1 + 2 m N)}^{4}} \frac{ln L}{N L} + O (\frac{{ln}^{2} L}{L^{2}}) . \end{array}

(43)

Eq. (43) is compared to simulations in Figure 9, showing excellent agreement with the renewal process. For large N, Var [f_m]_SMC ≈[ln (L/m) − 1/2]/(NL).

The variance of the fraction of the chromosome found in shared segments longer than m, Var [*f_m*]. Simulation details are as in Figure 4. The inset zooms in on the small N region. The renewal theory curve is the large L expansion given in Eq. (43). The line representing Carmi et al. (2013) is from Eq. (44), and the Poisson expression is from Eq. (50).

Carmi et al. (2013) computed the variance by approximating, for large N, the probability that two sites lie on shared segments, obtaining

Var [f_{m}] \approx \frac{ln (\frac{L}{m}) - 1}{N L} .

(44)

For ln(L/m) ≫ 1, Eq. (44) has the same limit as Eq. (43). Eq. (44) agrees well with simulations for large values of N (Figure 9); however, the approximation breaks down for small values of N.
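The relation between Eq. (43), its large-N limit, and Eq. (44) is easy to tabulate. The sketch below simply transcribes the three expressions (dropping the O(ln² L/L²) remainder of Eq. (43)); the function names and parameter values are illustrative assumptions.

```python
import math


def var_fm_renewal(N, L, m):
    """Leading terms of Eq. (43), with x = mN and the remainder dropped."""
    x = m * N
    c4 = (1.0 + 2.0 * x) ** 4
    term1 = math.log(1.0 + 2.0 * x) * (8.0 * x * (1.0 + 2.0 * x - 2.0 * x ** 3) + 1.0)
    term2 = 2.0 * x * (8.0 * x ** 3 * math.log(N)
                       + x * (4.0 * x * (x * (math.log(4.0) - 1.0) - 2.0) - 7.0)
                       - 1.0)
    term3 = 16.0 * x ** 4 * math.log(L)
    return (term1 + term2 + term3) / (N * L * c4)


def var_fm_large_N(N, L, m):
    """Large-N limit of Eq. (43): [ln(L/m) - 1/2] / (NL)."""
    return (math.log(L / m) - 0.5) / (N * L)


def var_fm_carmi2013(N, L, m):
    """Eq. (44): [ln(L/m) - 1] / (NL)."""
    return (math.log(L / m) - 1.0) / (N * L)


if __name__ == "__main__":
    L, m = 2.0, 0.01  # illustrative values (Morgans)
    for N in (500, 5000, 50000):
        print(N, var_fm_renewal(N, L, m), var_fm_large_N(N, L, m),
              var_fm_carmi2013(N, L, m))
```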

4.2.4. The Poisson approximation

Palamara et al. (2012) approximated the number of shared segments longer than m, n_m, as a Poisson with mean 〈n_m〉_SMC = 2NL/(1 + 2mN)² (Eq. (16); see also section 4.1.4). According to that approximation, L_m can be written as a sum of n_m independent random variables, each of which is distributed as $ψ_{m} (ℓ) = ψ_{SMC} (ℓ) / \int_{m}^{\infty} ψ_{SMC} (ℓ^{'}) d ℓ^{'}$ for ℓ > m. To compute the distribution of L_m under the Poisson approximation, P_Poisson(L_m, L), it is again convenient to work in Laplace space (see also Carmi et al. (2013)). Define ${\tilde{ψ}}_{m} (u) = \int_{0}^{\infty} e^{- u ℓ} ψ_{m} (ℓ) d ℓ$ , the Laplace transform (ℓ → u) of ψ_m(ℓ) and denote by P̃_{L_m, Poisson}(u, L) the Laplace transform, L_m → u, of P_Poisson(L_m, L). Using the convolution theorem, given n_m,

{\tilde{P}}_{L_{m}, Poisson} (u, L ∣ n_{m}) = {[{\tilde{ψ}}_{m} (u)]}^{n_{m}} .

(45)

Since n_m is assumed to be Poisson,

\begin{array}{l} {\tilde{P}}_{L_{m}, Poisson} (u, L) = \sum_{n = 0}^{\infty} e^{- {〈 n_{m} 〉}_{SMC}} \frac{{〈 n_{m} 〉}_{SMC}^{n} {[{\tilde{ψ}}_{m} (u)]}^{n}}{n!} \\ = exp [- {〈 n_{m} 〉}_{SMC} (1 - {\tilde{ψ}}_{m} (u))] . \end{array}

(46)

This gives

- ln [{\tilde{P}}_{L_{m}, Poisson} (u, L)] / L = \frac{u^{2} e^{\frac{u}{2 N}} Ei [- \frac{u (1 + 2 m N)}{2 N}]}{2 N} + \frac{e^{- m u} [2 N (e^{m u} + m u - 1) + u]}{{(1 + 2 m N)}^{2}},

(47)

where Ei is the exponential integral function. Using Eq. (47), P̃_{L_m, Poisson}(u, L) can be numerically inverted (Hollenbeck, 1998; de Hoog et al., 1982), showing (Figure 4) reasonable agreement with simulation results, albeit with deviations for small values of N.

To compute the variance under the Poisson approximation, we redefine ψ_m(ℓ) as

ψ_{m} (ℓ) = \frac{ψ_{SMC} (ℓ)}{\int_{m}^{L} ψ_{SMC} (ℓ) d ℓ}; m < ℓ < L,

(48)

imposing an upper limit at L, since otherwise $〈 ℓ_{m}^{2} 〉 \to \infty$ . Using the law of total variance,

\begin{array}{l} Var {[L_{m}]}_{Poisson} = 〈 Var [L_{m} ∣ n_{m}] 〉 + Var [〈 L_{m} ∣ n_{m} 〉] \\ = 〈 n_{m} 〉 Var [ℓ_{m}] + Var [n_{m}] {〈 ℓ_{m} 〉}^{2} = {〈 n_{m} 〉}_{SMC} 〈 ℓ_{m}^{2} 〉, \end{array}

(49)

where we used the fact that a Poisson variable has equal mean and variance. Using Eqs. (13), (15), and (48),

\begin{array}{l} Var {[f_{m}]}_{Poisson} = \frac{{〈 n_{m} 〉}_{SMC}}{L^{2}} \frac{\int_{m}^{L} ℓ^{2} ψ_{SMC} (ℓ) d ℓ}{\int_{m}^{L} ψ_{SMC} (ℓ) d ℓ} = \frac{2 N}{L} \int_{m}^{L} ℓ^{2} ψ (ℓ) d ℓ \\ = \frac{\frac{2 N (m - L) [m N (8 N L + 3) + 3 N L + 1]}{{(1 + 2 m N)}^{2} {(1 + 2 N L)}^{2}} + ln (\frac{1 + 2 N L}{1 + 2 m N})}{N L} . \end{array}

(50)

Here too, for large N, Var [f_m]_Poisson ≈ ln (L/m) /(NL), which is the same (for ln(L/m) ≫ 1) as the renewal theory limit (Eq. (43)). Eq. (50) agrees well with simulations for large values of N, but breaks down already for N ≲ 5000.
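The closed form in Eq. (50) can be cross-checked against a direct quadrature of (2N/L)∫ℓ²ψ(ℓ)dℓ over [m, L]. The sketch below does this for the SMC form of ψ(ℓ) (Eq. (4)); the function names, the Simpson integrator, and the parameter values are illustrative assumptions.

```python
import math


def psi_smc(l, N):
    """SMC segment-length density, Eq. (4): 4N / (1 + 2Nl)^3."""
    return 4.0 * N / (1.0 + 2.0 * N * l) ** 3


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b]; n is rounded up to an even number."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def var_fm_poisson_numeric(N, L, m):
    """(2N/L) * int_m^L l^2 psi(l) dl, the middle expression of Eq. (50)."""
    return (2.0 * N / L) * simpson(lambda l: l * l * psi_smc(l, N), m, L)


def var_fm_poisson_closed(N, L, m):
    """Final closed form of Eq. (50)."""
    c = 1.0 + 2.0 * m * N
    d = 1.0 + 2.0 * N * L
    first = 2.0 * N * (m - L) * (m * N * (8.0 * N * L + 3.0) + 3.0 * N * L + 1.0)
    return (first / (c * c * d * d) + math.log(d / c)) / (N * L)


if __name__ == "__main__":
    N, L, m = 1000, 2.0, 0.01  # illustrative values
    print(var_fm_poisson_numeric(N, L, m), var_fm_poisson_closed(N, L, m),
          math.log(L / m) / (N * L))
```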

5. Variable population size

Many natural populations (including humans) did not maintain a constant population size throughout their history. As we show in this section, our results are generalizable to any arbitrary variable population size, N(t) = N₀ν(t). The key insight is that all results depend on a single quantity, the PDF of segment lengths, ψ(ℓ). This can be seen from Eqs. (12)–(14) (the infinite-chromosome results; section 3), Eq. (32) (the distribution of the number of shared segments longer than m; section 4.1), and Eq. (38) (the distribution of the fraction of the chromosome in segments longer than m; section 4.2). Therefore, we need only show how to compute ψ(ℓ) for an arbitrary ν(t). In sections 5.1 and 5.2, we compute ψ(ℓ) for SMC and SMC′, respectively, as well as derive the infinite-chromosome means.

5.1. The SMC model

Define h(t) = 1/ν(t). Li and Durbin (2011) derived the stationary distribution of tree heights at a recombination site (their supplementary Eq. (7)),

π_{\infty}^{SMC} (t) = \frac{t h (t) e^{- \int_{0}^{t} h (τ) d τ}}{\int_{0}^{\infty} e^{- \int_{0}^{t^{'}} h (τ) d τ} d t^{'}} .

(51)

Eq. (51) reduces to $π_{\infty}^{SMC} (t) = t e^{- t}$ (Eq. (2)) for a constant population size, where h(t) = 1. For a given tree height t, the sequence length between recombination events is distributed exponentially with rate 2N₀t. Therefore (see also Eq. (4)),

\begin{array}{l} ψ_{SMC} (ℓ) = \int_{0}^{\infty} π_{\infty}^{SMC} (t) \cdot 2 N_{0} t e^{- 2 N_{0} t ℓ} d t \\ = 2 N_{0} \frac{\int_{0}^{\infty} t^{2} h (t) e^{- \int_{0}^{t} h (τ) d τ - 2 N_{0} t ℓ} d t}{\int_{0}^{\infty} e^{- \int_{0}^{t} h (τ) d τ} d t} . \end{array}

(52)

We can now evaluate Eqs. (12)–(14) for the infinite-chromosome means. The mean segment length is

\begin{array}{l} {〈 ℓ 〉}_{SMC} = 2 N_{0} \frac{\int_{0}^{\infty} t^{2} h (t) e^{- \int_{0}^{t} h (τ) d τ} [\int_{0}^{\infty} ℓ e^{- 2 N_{0} t ℓ} d ℓ] d t}{\int_{0}^{\infty} e^{- \int_{0}^{t} h (τ) d τ} d t} \\ = \frac{\int_{0}^{\infty} h (t) e^{- \int_{0}^{t} h (τ) d τ} d t}{2 N_{0} \int_{0}^{\infty} e^{- \int_{0}^{t} h (τ) d τ} d t} = \frac{1}{2 N_{0} \int_{0}^{\infty} e^{- \int_{0}^{t} h (τ) d τ} d t} . \end{array}

(53)

Hence (see Eq. (12)),

{〈 n_{0} 〉}_{SMC} = 2 N_{0} L \int_{0}^{\infty} e^{- \int_{0}^{t} h (τ) d τ} d t .

(54)

Note that we implicitly assumed that lim_{t→∞} ν(t) < ∞. Eq. (54) can also be obtained using Corollary 3 in Li and Durbin (2011). For the mean number of segments longer than m, we obtain, using techniques similar to those used in Eq. (53),

{〈 n_{m} 〉}_{SMC} = 2 N_{0} L \int_{0}^{\infty} t h (t) e^{- \int_{0}^{t} h (τ) d τ - 2 N_{0} m t} d t .

(55)

Finally,

{〈 f_{m} 〉}_{SMC} = \int_{0}^{\infty} h (t) e^{- \int_{0}^{t} h (τ) d τ - 2 N_{0} m t} (1 + 2 N_{0} m t) d t .

(56)

Eq. (56) was also derived by Palamara et al. (2012). It can be verified that substituting h(t) = 1 in Eqs. (54), (55), and (56), we recover the results of section 3.1 (Eqs. (15), (16), and (17), respectively).
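Eqs. (54)–(56) are one-dimensional integrals and are straightforward to evaluate for any demographic history. The following sketch is a hedged illustration: the function name, the trapezoid rule, the grid spacing dt, the cutoff t_max, and the two-epoch history in the usage example are all assumptions. It accumulates the running integral of h(t) on a grid and evaluates the three means.

```python
import math


def smc_means_variable_size(h, N0, L, m, t_max=30.0, dt=1e-3):
    """Trapezoid-rule evaluation of Eqs. (54)-(56) for a callable h(t) = 1/nu(t).

    Returns (<n_0>, <n_m>, <f_m>) under SMC in the infinite-chromosome limit.
    The integrals are truncated at t_max, which must be large enough for
    exp(-int_0^t h) to be negligible there.
    """
    steps = int(round(t_max / dt))
    H = 0.0                     # running value of int_0^t h(tau) d tau
    h_prev = h(0.0)
    prev = (1.0, 0.0, h_prev)   # integrands of Eqs. (54), (55), (56) at t = 0
    z = i_n = i_f = 0.0
    for i in range(1, steps + 1):
        t = i * dt
        h_t = h(t)
        H += 0.5 * (h_prev + h_t) * dt
        surv = math.exp(-H)                  # exp(-int_0^t h)
        damp = math.exp(-2.0 * N0 * m * t)   # exp(-2 N0 m t)
        cur = (surv,
               t * h_t * surv * damp,
               h_t * surv * damp * (1.0 + 2.0 * N0 * m * t))
        z += 0.5 * (prev[0] + cur[0]) * dt
        i_n += 0.5 * (prev[1] + cur[1]) * dt
        i_f += 0.5 * (prev[2] + cur[2]) * dt
        h_prev, prev = h_t, cur
    return 2.0 * N0 * L * z, 2.0 * N0 * L * i_n, i_f


if __name__ == "__main__":
    # Hypothetical two-epoch history: size N0 until t = 0.05, half of N0 before that
    two_epoch = lambda t: 1.0 if t < 0.05 else 2.0
    print(smc_means_variable_size(lambda t: 1.0, 1000, 2.0, 0.01))
    print(smc_means_variable_size(two_epoch, 1000, 2.0, 0.01))
```

With h(t) = 1 the three outputs approach 2N₀L, 2N₀L/(1 + 2mN₀)², and (1 + 4mN₀)/(1 + 2mN₀)², the values obtained by carrying out the integrals in Eqs. (54)–(56) directly.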

5.2. The SMC′ model

For SMC′, we need to recompute q_SMC′(t|s), the probability that the new tree height at a recombination site is t, given that the previous height was s (see Eq. (5)),

q_{{SMC}^{'}} (t ∣ s) = {\begin{cases} \int_{0}^{s} \frac{1}{s} [\int_{t_{r}}^{s} h (t_{c}) e^{- 2 \int_{t_{r}}^{t_{c}} h (τ) d τ} d t_{c}] d t_{r} & t = s, \\ \int_{0}^{t} \frac{1}{s} h (t) e^{- 2 \int_{t_{r}}^{t} h (τ) d τ} d t_{r} & t < s, \\ [\int_{0}^{s} \frac{1}{s} e^{- 2 \int_{t_{r}}^{s} h (τ) d τ} d t_{r}] h (t) e^{- \int_{s}^{t} h (τ) d τ} & t > s . \end{cases}

(57)

Eq. (57) is explained similarly to Eq. (5), once we recognize that coalescence occurs at (absolute) time t at rate h(t), and that the probability of no coalescence between [s, t] is $e^{- \int_{s}^{t} h (τ) d τ}$ (Griffiths and Tavare, 1994). It can be shown that Eq. (57) is normalized, $\int_{0}^{\infty} q_{{SMC}^{'}} (t ∣ s) d t = 1$ , and that, as in the case of a constant population size (section 2.4), the stationary distribution of tree heights, $π_{\infty}^{{SMC}^{'}} (t)$ , is identical to that of SMC and is given by Eq. (51). It can also be shown that at stationarity, the new tree has equal probabilities to be either taller, shorter, or equal to the previous tree, as we have seen for a constant population size (section 3.2).

As in section 3.2, we define a chain with probabilities q_SMC′,seg(t|s) = q_SMC′(t|s)/[1− q_SMC′(s|s)] (as in Eq. (8)), whose stationary distribution, $π_{\infty}^{{SMC}^{'}, seg} (t)$ , is the distribution of tree heights at segment ends. Using the marginal distribution of tree heights at random sites (Griffiths and Tavare, 1994),

P_{c} (t) = h (t) e^{- \int_{0}^{t} h (τ) d τ},

(58)

and a detailed balance argument, it can be shown that

π_{\infty}^{{SMC}^{'}, seg} (t) = \frac{P_{c} (t) λ (t)}{\int_{0}^{\infty} P_{c} (t) λ (t) d t},

(59)

where

λ (t) = 2 N_{0} t [1 - q_{{SMC}^{'}} (t ∣ t)]

(60)

\begin{array}{l} = 2 N_{0} t [1 - \frac{1}{t} \int_{0}^{t} \int_{t_{r}}^{t} h (t_{c}) e^{- 2 \int_{t_{r}}^{t_{c}} h (τ) d τ} d t_{c} d t_{r}] \\ = N_{0} [t + e^{- 2 \int_{0}^{t} h (τ) d τ} \int_{0}^{t} e^{2 \int_{0}^{t^{'}} h (τ) d τ} d t^{'}] . \end{array}

(61)

The distribution of segment lengths is then given by (see also Eq. (10); section 2.4)

\begin{array}{l} ψ_{{SMC}^{'}} (ℓ) = \int_{0}^{\infty} π_{\infty}^{{SMC}^{'}, seg} (t) λ (t) e^{- λ (t) ℓ} d t \\ = \frac{\int_{0}^{\infty} P_{c} (t) λ^{2} (t) e^{- λ (t) ℓ} d t}{\int_{0}^{\infty} P_{c} (t) λ (t) d t} . \end{array}

(62)

Note that Eq. (62) depends solely on N(t), and as expected, the distribution of ρ = 2N₀ℓ is independent of N₀.

We now derive the infinite-chromosome means (section 3). The mean segment length is

{〈 ℓ 〉}_{{SMC}^{'}} = \int_{0}^{\infty} ℓ ψ_{{SMC}^{'}} (ℓ) d ℓ = {[\int_{0}^{\infty} P_{c} (t) λ (t) d t]}^{- 1},

(63)

where we used the fact that $\int_{0}^{\infty} P_{c} (t) d t = 1$ . Using Eq. (12) and after some algebra,

\begin{array}{l} {〈 n_{0} 〉}_{{SMC}^{'}} = L \int_{0}^{\infty} P_{c} (t) λ (t) d t \\ = N_{0} L \int_{0}^{\infty} h (t) e^{- \int_{0}^{t} h (τ) d τ} [t + e^{- 2 \int_{0}^{t} h (τ) d τ} \int_{0}^{t} e^{2 \int_{0}^{t^{'}} h (τ) d τ} d t^{'}] d t \\ = \frac{4 N_{0} L}{3} \int_{0}^{\infty} e^{- \int_{0}^{t} h (τ) d τ} d t . \end{array}

(64)

This is, as expected, exactly 2/3 of the number of recombination events (Eq. (54)).

Using Eqs. (62) and (64), we can write

\begin{array}{l} ψ_{{SMC}^{'}} (ℓ) = \frac{\int_{0}^{\infty} P_{c} (t) λ^{2} (t) e^{- λ (t) ℓ} d t}{{〈 n_{0} 〉}_{{SMC}^{'}} / L} \\ = \frac{\int_{0}^{\infty} h (t) λ^{2} (t) e^{- \int_{0}^{t} h (τ) d τ - λ (t) ℓ} d t}{\frac{4 N_{0}}{3} \int_{0}^{\infty} e^{- \int_{0}^{t} h (τ) d τ} d t} . \end{array}

(65)

The mean number of segments longer than m is

\begin{array}{l} {〈 n_{m} 〉}_{{SMC}^{'}} = {〈 n_{0} 〉}_{{SMC}^{'}} \int_{m}^{\infty} ψ_{{SMC}^{'}} (ℓ) d ℓ \\ = L \int_{0}^{\infty} P_{c} (t) λ (t) e^{- λ (t) m} d t . \end{array}

(66)

Finally, the mean fraction of the chromosome in segments longer than m is

\begin{array}{l} {〈 f_{m} 〉}_{{SMC}^{'}} = \frac{{〈 n_{0} 〉}_{{SMC}^{'}}}{L} \int_{m}^{\infty} ℓ ψ_{{SMC}^{'}} (ℓ) d ℓ \\ = \int_{0}^{\infty} P_{c} (t) e^{- λ (t) m} [1 + λ (t) m] d t . \end{array}

(67)

It can be verified that all the results of this section reduce to the SMC results (section 5.1) for λ(t) = 2N₀t and to the constant population size results (section 3.2) for h(t) = 1.
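The identity in Eq. (64) can be checked numerically for a non-constant history. The sketch below is illustrative only: the function name, the trapezoid rule, the grid parameters, and the two-epoch history are assumptions. It builds λ(t) from the second line of Eq. (61) and compares the integral of P_c(t)λ(t) with (4N₀/3) times the integral of exp(−∫h); these are the two sides of Eq. (64) divided by L.

```python
import math


def smc_prime_eq64_sides(h, N0, t_max=20.0, dt=1e-3):
    """Return (int P_c(t) lambda(t) dt, (4 N0 / 3) int exp(-H(t)) dt).

    H(t) = int_0^t h and lambda(t) = N0 * [t + exp(-2H) * int_0^t exp(2H)],
    as in Eq. (61). Note that exp(2H) overflows if 2 * H(t_max) exceeds ~700.
    """
    steps = int(round(t_max / dt))
    H = 0.0          # int_0^t h(tau) d tau
    J = 0.0          # int_0^t exp(2 H(t')) d t'
    h_prev = h(0.0)
    e2h_prev = 1.0   # exp(2 H(0))
    f_prev = 0.0     # P_c(0) * lambda(0) = 0 because lambda(0) = 0
    z_prev = 1.0     # exp(-H(0))
    lhs = z = 0.0
    for i in range(1, steps + 1):
        t = i * dt
        h_t = h(t)
        H += 0.5 * (h_prev + h_t) * dt
        e2h = math.exp(2.0 * H)
        J += 0.5 * (e2h_prev + e2h) * dt
        lam = N0 * (t + J / e2h)
        surv = math.exp(-H)
        f_cur = h_t * surv * lam   # P_c(t) * lambda(t), with P_c from Eq. (58)
        lhs += 0.5 * (f_prev + f_cur) * dt
        z += 0.5 * (z_prev + surv) * dt
        h_prev, e2h_prev, f_prev, z_prev = h_t, e2h, f_cur, surv
    return lhs, 4.0 * N0 * z / 3.0


if __name__ == "__main__":
    # Hypothetical history: coalescence rate doubled for t < 0.1, then constant
    bottleneck = lambda t: 2.0 if t < 0.1 else 1.0
    print(smc_prime_eq64_sides(lambda t: 1.0, 1000.0))
    print(smc_prime_eq64_sides(bottleneck, 1000.0))
```

Multiplying either returned value by L gives 〈n₀〉 under SMC′; for h(t) = 1 both reduce to 4N₀/3, in line with Eq. (18).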

6. Summary and discussion

In summary, we introduced a general framework for the IBD process in Markovian approximations of the coalescent with recombination (SMC and SMC′), as well as a new renewal approximation, in which tree heights on both sides of a recombination site are independent (section 2). We showed how previous results for the mean number of segments and the mean shared sequence length in SMC emerge naturally from our framework in the infinite-chromosome limit; we then derived these quantities under SMC′ (section 3). Using renewal theory, we derived expressions for the distributions of the number of shared segments (section 4.1) and the fraction of the chromosome in shared segments (section 4.2). Finally, we generalized our results to populations with variable size (section 5).

Our main contributions are a) providing a unified framework for the IBD process, depending exclusively on a single distribution (that of segment lengths), in which previous and new results are coherently derived and easily generalized; b) new results for SMC′: the distribution of tree heights at recombination sites (both conditional on the previous tree and at stationarity), the stationary distribution of tree heights at segment ends, the mean number of shared segments, and the mean fraction of the chromosome in shared segments; and c) introducing a novel renewal approximation, under which distributions of key quantities were obtained.

Our results rely on a number of simplifying assumptions, beyond the standard postulates of coalescent theory. First, our model considers segments shared between haploid chromosomes and does not incorporate any model for shared segments detection errors. In reality, genotyping errors, recent mutations, and phase uncertainty do not allow the confident detection of short segments, although this is partly remedied by our theory being entirely specified in terms of a length cutoff (m), which can be tuned for the quality of the data under examination. Next, when computing distributions, we assumed that sharing between each pair of chromosomes is independent, whereas in practice, scans for IBD search for shared segments between all pairs in a cohort. Indeed, as studied in detail by Carmi et al. (2013), while sharing between two pairs in a cohort is only weakly dependent, the cumulative effect increases the observed variance of the amount of overall sharing. Therefore, more work will be needed to understand the distribution of IBD sharing within a cohort. Finally, we derived all results for a single chromosome. To apply the results genome-wide, we must assume inter-chromosome independence, which may not be well justified for the very recent past (Wakeley et al., 2012).

Turning to the quality of the renewal approximation itself, we verified using simulations that for chromosome-wide properties (e.g. the total number of segments), the renewal results are indistinguishable from SMC. We do, however, expect small deviations for very short chromosomes and for very small populations (e.g., see Figure 9), when segments are few and long compared to the chromosome length and the distribution of tree heights does not reach stationarity. We also note that as opposed to SMC and SMC′ (Marjoram and Wall, 2006), the renewal approximation introduces an asymmetry between the two ends of the chromosome: while the segment at the left end has distribution ψ(ℓ), the segment at the right end has the distribution of the ‘age’ of the process (see Karlin and Taylor (1975) for more details). As we explained in section 2.2, the number of segments is typically so large that this has a negligible effect. However, one can also formulate a stationary renewal process, which begins at coordinate −∞, while observations begin at the origin (Karlin and Taylor, 1975). With some effort, we could rederive all results under the stationary process (not shown).

Our results have consequences for demographic inference. Current approaches rely on the assumption that recombination events terminate shared segments, as in SMC (Palamara et al., 2012; Ralph and Coop, 2013). Using our results, the more accurate SMC′ can now be used, particularly for small populations. The distribution of the number of shared segments is also expected to be useful, as we briefly demonstrated (Figure 8). The case we studied is simple, and would have been easily solved by other methods (e.g., Palamara et al. (2012); Carmi et al. (2013)). Nevertheless, our approach has the attractive feature of providing a maximum-likelihood estimator (under the assumptions discussed above). Of course, for either large populations or for the very remote past, long IBD segments are scarce and our method, like any other IBD-based estimator, will have limited power.

Another drawback of our method is that it requires a numerical Laplace transform inversion, and for complex demographies, even the Laplace space solution will have to be numerically computed. Nevertheless, computationally, this is not very different from any method based on results specified as integrals or sums. The inverse transform (at least for the distribution of the number of shared segments) was simple to compute and reliable, as we validated by simulations (e.g., Figure 7), as well as by comparing a number of inversion methods (not shown). Running time was reasonably short, at ≈ 2.5 seconds for each N on a standard machine. We anticipate that using the results for the fraction of the chromosome in shared segments (section 4.2) will have more limited applications, due to the need for a double Laplace transform inversion. But we also note that, as we showed in sections 4.1.2, 4.1.3, 4.2.2, and 4.2.3, standard Laplace transform techniques allow insight into the moments of the examined distributions. The Laplace transform method is ideal for problems of Markovian evolution in time or sequence space that are otherwise difficult (e.g., Lohse et al. (2011)), and is therefore expected to be of future interest in population genetics.
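To give a sense of what a numerical inversion involves, the sketch below implements the Gaver-Stehfest scheme in pure Python. This scheme is chosen here only because it fits in a few lines; it is not claimed to be one of the inversion methods used for the results above, and the function names are hypothetical. It approximates f(t) from samples of its transform F(s) on the positive real axis.

```python
import math

def stehfest_coefficients(n_terms):
    """Gaver-Stehfest weights V_1..V_n for an even number of terms."""
    if n_terms % 2:
        raise ValueError("n_terms must be even")
    half = n_terms // 2
    weights = []
    for k in range(1, n_terms + 1):
        total = 0.0
        for j in range((k + 1) // 2, min(k, half) + 1):
            total += (j ** half * math.factorial(2 * j)
                      / (math.factorial(half - j) * math.factorial(j)
                         * math.factorial(j - 1) * math.factorial(k - j)
                         * math.factorial(2 * j - k)))
        weights.append((-1) ** (k + half) * total)
    return weights

def invert_laplace(F, t, n_terms=12):
    """Approximate f(t), t > 0, from its Laplace transform F evaluated at real s."""
    ln2_over_t = math.log(2.0) / t
    weights = stehfest_coefficients(n_terms)
    return ln2_over_t * sum(w * F(k * ln2_over_t)
                            for k, w in enumerate(weights, start=1))

if __name__ == "__main__":
    # Sanity check on a transform with a known inverse: 1/(s + 1) <-> exp(-t).
    print(invert_laplace(lambda s: 1.0 / (s + 1.0), 1.0), math.exp(-1.0))
```

The weights alternate in sign and grow quickly with the number of terms, so in double precision the number of terms cannot be increased freely; this is one reason to compare several inversion methods, as was done here.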

We foresee a number of future directions and potential extensions. First, it would be useful (e.g., for demographic inference) to have analytical forms for simple non-constant demographies, such as exponential expansions and bottlenecks. Second, while we provided an equation for ψ_SMC′ (ℓ), the PDF of segment lengths in SMC′ (Eq. (10)), we did not investigate the corresponding renewal approximation, which should be feasible, since all of our renewal-based results are given in terms of a general segment length distribution. This is expected to rise in importance with the increasing popularity of SMC′ (e.g., Harris and Nielsen (2013)) and the emerging understanding that it provides a much better approximation to the coalescent with recombination than SMC (e.g., Hobolth and Jensen (2014)). Another potential future application is pedigree reconstruction using IBD segments (Huff et al., 2011; Henn et al., 2012). For example, for (half-) cousins separated by 2k meioses, the segment length distribution will be a superposition of an exponential with rate 2k, with probability 2^{−2k}, and ψ_SMC(ℓ) or ψ_SMC′ (ℓ) otherwise (Eqs. (4) and (10), respectively). A more challenging extension is to sharing among more than two chromosomes. Potentially interesting applications await, as methods for the detection of such segments have been developed (Gusev et al., 2011; Moltke et al., 2011; He, 2013), and the resulting information is expected to improve the accuracy of demographic inference, natural selection detection, and disease mapping.

Acknowledgments

We thank Asger Hobolth for commenting on the manuscript and for pointing out a number of arguments regarding Markov chain reversibility and detailed balance. S. C. thanks Eli Barkai, whose lecture notes on renewal theory were heavily consulted, and the Human Frontier Science Program for financial support. I. P. thanks NIH grant 1R01MH095458-01A1.

Appendix A. Full expressions for SMC′ results

In this section, we provide full expressions for a number of SMC′ quantities that were expressed as integrals in the main text. The full expression for the distribution of segment lengths (Eq. (10), section 2.4) is

\begin{array}{l} ψ_{{SMC}^{'}} (ℓ) = \frac{\int_{0}^{\infty} e^{- t} λ^{2} (t) e^{- λ (t) ℓ} d t}{\int_{0}^{\infty} e^{- t} λ (t) d t} \\ = 3 e^{- \frac{q}{2} (1 + 2 π i)} q^{- \frac{q + 1}{2}} {[64 ℓ q Γ {(\frac{1 - q}{2})}^{2}]}^{- 1} \times \\ \{ π^{2} q^{2} e^{i π q} q^{\frac{q + 1}{2}} \sec^{2} (\frac{π q}{2}) \times \\ [4 {}_{2}{\tilde{F}}_{2} (\frac{q + 1}{2}, \frac{q + 1}{2}; \frac{q + 3}{2}, \frac{q + 3}{2}; \frac{q}{2}) \\ - {(q + 1)}^{2} {}_{2}{\tilde{F}}_{2} (\frac{q + 3}{2}, \frac{q + 3}{2}; \frac{q + 5}{2}, \frac{q + 5}{2}; \frac{q}{2}) \\ + 4 Γ (\frac{q + 1}{2}) {}_{3}{\tilde{F}}_{3} (\frac{q + 1}{2}, \frac{q + 1}{2}, \frac{q + 1}{2}; \frac{q + 3}{2}, \frac{q + 3}{2}, \frac{q + 3}{2}; \frac{q}{2})] \\ + i 2^{\frac{q + 3}{2}} e^{\frac{i π q}{2}} Γ (\frac{1 - q}{2}) \times \\ [Γ (\frac{1 - q}{2}) [q^{2} Γ (\frac{q + 1}{2}, - \frac{q}{2}) + 4 q Γ (\frac{q + 3}{2}, - \frac{q}{2}) + 4 Γ (\frac{q + 5}{2}, - \frac{q}{2})] \\ - π \sec (\frac{π q}{2}) (4 q^{2} + 6 q + 3)] \}, \end{array}

(A.1)

where q = N ℓ and _aF̃_b is the regularized generalized hypergeometric function (Weisstein, 2014). See simulation results in Figure 5. Note that ψ_SMC′ (ℓ) is necessarily real, although the expression contains complex terms (and similarly below).

The full expression for the mean number of shared segments longer than m (Eq. (21), section 3.2) is

\begin{array}{l} {\langle n_{m} \rangle}_{{SMC}^{'}} = L \int_{0}^{\infty} λ (t) e^{- t - λ (t) m} d t \\ = L N i \frac{{(\frac{- e M}{2})}^{- \frac{M}{2}}}{2 \sqrt{2 M}} \times \left\{ Γ (\frac{M + 1}{2}, - \frac{M}{2}) + \frac{2}{M} Γ (\frac{M + 3}{2}, - \frac{M}{2}) + Γ (\frac{M + 1}{2}) [ψ^{0} (\frac{M + 1}{2}) - 2 - i π - \ln \frac{M}{2} - \frac{1}{M}] - G_{2, 3}^{3, 0} (- \frac{M}{2} | \begin{matrix} 1, 1 \\ 0, 0, \frac{M + 1}{2} \end{matrix}) \right\}, \end{array}

(A.2)

where M = mN, G is the Meijer G-function (Weisstein, 2014), Γ (with two arguments) is the incomplete Gamma function, and ψ⁰ is the digamma function.
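Because the closed form involves special functions of complex arguments, it can be convenient to cross-check it against direct numerical quadrature of the integral in the first line of Eq. (A.2). The sketch below is one hypothetical way to do so with a composite Simpson rule. The rate function λ(t) is passed in as a callable, since the SMC′ form of λ(t) is given in the main text and is not reproduced here. The usage example takes the linear rate λ(t) = 2Nt purely as an illustrative assumption, for which the integral evaluates to 2NL/(1 + 2Nm)².

```python
import math

def mean_segments_longer_than(lam, m, L, t_max=60.0, n_steps=60000):
    """Composite Simpson estimate of L * int_0^t_max lam(t) * exp(-t - lam(t)*m) dt."""
    if n_steps % 2:
        raise ValueError("n_steps must be even")
    h = t_max / n_steps

    def integrand(t):
        rate = lam(t)
        return rate * math.exp(-t - rate * m)

    total = integrand(0.0) + integrand(t_max)
    for i in range(1, n_steps):
        # Simpson weights: 4 on odd interior nodes, 2 on even interior nodes.
        total += (4 if i % 2 else 2) * integrand(i * h)
    return L * total * h / 3.0

if __name__ == "__main__":
    N, m, L = 1000.0, 0.001, 1.0  # illustrative values only
    numeric = mean_segments_longer_than(lambda t: 2.0 * N * t, m, L)
    exact = 2.0 * N * L / (1.0 + 2.0 * N * m) ** 2  # closed form for the linear rate
    print(numeric, exact)
```

The truncation point t_max and the grid spacing are assumptions that must be adapted to the problem: the integrand decays on a scale set by both e^{−t} and e^{−λ(t)m}, so large values of mN call for a finer grid near t = 0.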

The full expression for the mean fraction of the chromosome in segments longer than m (Eq. (22), section 3.2) is

\begin{array}{l} {\langle f_{m} \rangle}_{{SMC}^{'}} = \frac{{(\frac{- e M}{2})}^{- \frac{M}{2}}}{2 \sqrt{2 M}} \{ \frac{M^{3 / 2}}{\sqrt{2}} G_{2, 3}^{3, 0} (- \frac{M}{2} | \begin{matrix} \frac{1}{2}, \frac{1}{2} \\ - \frac{1}{2}, - \frac{1}{2}, \frac{M}{2} \end{matrix}) \\ + i (M + 2) Γ (\frac{M + 1}{2}, - \frac{M}{2}) + 2 i Γ (\frac{M + 3}{2}, - \frac{M}{2}) \\ + i M Γ (\frac{M + 1}{2}) [ψ^{0} (\frac{M + 1}{2}) - \ln \frac{M}{2} - 2 - i π - \frac{3}{M}] \}, \end{array}

(A.3)

where M = mN. See simulation results in Figure 6.

Appendix B. Full expression for the renewal theory results

In the renewal approximation to SMC, the distribution of the number of segments longer than m, in Laplace space (Eq. (32); section 4.1.1), is

\begin{array}{l} \tilde{P} (n_{m} = k, s) = C^{- 2} 2^{3 - k} N^{2} e^{- m s} [s e^{\frac{s}{2 N}} Ei (- \frac{s}{2 N}) + 2 N] \\ \times \left\{ s^{2} e^{\frac{s}{2 N}} [E_{1} (\frac{s}{2 N}) - E_{1} (\frac{s C}{2 N})] + 2 D - 2 N s \right\}^{- 2} \\ \times \left\{ \frac{s^{2} e^{\frac{s}{2 N}} Ei (- \frac{s C}{2 N}) + 2 D}{\frac{s^{2}}{2} e^{\frac{s}{2 N}} [Γ (0, \frac{s}{2 N}) - Γ (0, \frac{s C}{2 N})] - N s + D} \right\}^{k - 1} \end{array}

(B.1)

for k > 0 and

\tilde{P} (n_{m} = 0, s) = \left\{ \frac{4 N^{2} C^{- 1}}{e^{m s} C [2 N - s e^{\frac{s}{2 N}} [E_{1} (\frac{s}{2 N}) - E_{1} (\frac{s C}{2 N})]] - 2 N} + s \right\}^{- 1}

(B.2)

for k = 0, where C = 1 + 2mN, D = N e^{−ms}(sC − 2N)/C², and E₁ is related to the exponential integral function Ei by E₁(x) = −Ei(−x) (Weisstein, 2014). For real x > 0, E₁(x) also equals the incomplete Gamma function Γ(0, x).
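As a quick consistency check on Eq. (B.2), one can examine its two limits in the cutoff m. This is a sketch rather than part of the original derivation, and it assumes that s is the Laplace variable conjugate to the chromosome length L. It uses only C = 1 + 2mN and the bound x e^{x} E₁(x) < 1 for x > 0.

```latex
% m -> 0: C -> 1, so E_1(s/2N) - E_1(sC/2N) -> 0 and the denominator
% e^{ms} C [2N - ...] - 2N -> 2N - 2N = 0; the fraction diverges.
\lim_{m \to 0} \tilde{P}(n_m = 0, s) = 0,
% m -> infinity: E_1(sC/2N) -> 0 while e^{ms} C -> infinity; the bracket
% 2N - s e^{s/2N} E_1(s/2N) stays positive, so the fraction vanishes.
\lim_{m \to \infty} \tilde{P}(n_m = 0, s) = \frac{1}{s}.
```

Both limits are as expected: with no cutoff every segment is counted, so the probability of observing none is zero, whereas 1/s is the transform of the constant 1, meaning that for a very large cutoff no segment qualifies, whatever the chromosome length.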

The distribution of the fraction of the chromosome in segments longer than m, in Laplace space (Eq. (38), section 4.2.1), is

{\tilde{P}}_{L_{m}} (u, s) = A / (1 - B),

(B.3)

where A is given by

\begin{array}{l} 4 C^{2} A = \frac{4 e^{- m r}}{r} - \frac{4 e^{- m s}}{s} + \frac{4}{s} + \frac{2 (1 - e^{- m s})}{N} \\ + \frac{e^{- m r}}{N^{2} r} [C^{2} r^{2} e^{\frac{C r}{2 N}} Ei (- \frac{C r}{2 N}) + 2 N (C r - 2 N)] \\ + \frac{4 e^{- m s}}{N s} \left\{ \frac{s}{2} (e^{m s} - 1) + 2 m^{2} N^{2} s e^{m s} + N [e^{m s} (2 m s - 1) + 1 - m s] \right\} \\ + \frac{s C^{2} e^{\frac{s}{2 N}}}{N^{2}} [Γ (0, \frac{s C}{2 N}) - Γ (0, \frac{s}{2 N})], \end{array}

(B.4)

B is given by

\begin{array}{l} 4 N^{2} C^{2} B = 4 N^{2} [4 m^{2} N^{2} + 2 m N (2 - m s) + 1 - 2 m s + e^{- m s} (m s - 1)] \\ + 2 N s (e^{- m s} - 1) + C^{2} s^{2} e^{\frac{s}{2 N}} [Γ (0, \frac{s}{2 N}) - Γ (0, \frac{s C}{2 N})] \\ - e^{- m r} [C^{2} r^{2} e^{\frac{C r}{2 N}} Ei (- \frac{C r}{2 N}) + 2 N (C r - 2 N)], \end{array}

(B.5)

C = 1 + 2mN, and r = s + u.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

Atzmon G, Hao L, Pe’er I, Velez C, Pearlman A, Palamara PF, Morrow B, Friedman E, Oddoux C, Burns E, Ostrer H. Abraham’s children in the genome era: Major Jewish diaspora populations comprise distinct genetic clusters with shared Middle Eastern ancestry. Am J Hum Genet. 2010;86:850–859. doi: 10.1016/j.ajhg.2010.04.015.
Botigué LR, Henn BM, Gravel S, Maples BK, Gignoux CR, Corona E, Atzmon G, Burns E, Ostrer H, Flores C, Bertranpetit J, Comas D, Bustamante CD. Gene flow from North Africa contributes to differential human genetic diversity in Southern Europe. Proc Natl Acad Sci USA. 2013;110:11791–11796. doi: 10.1073/pnas.1306223110.
Brančík L. Numerical Inverse Laplace Transforms for Electrical Engineering Simulation, MATLAB for Engineers - Applications in Control, Electrical Engineering, IT and Robotics. InTech; 2011.
Bray SM, Mulle JG, Dodd AF, Pulver AE, Wooding S, Warren ST. Signatures of founder effects, admixture, and selection in the Ashkenazi Jewish population. P Natl Acad Sci USA. 2010;107:16222–16227. doi: 10.1073/pnas.1004381107.
Brown MD, Glazner CG, Zheng C, Thompson EA. Inferring coancestry in population samples in the presence of linkage disequilibrium. Genetics. 2012;190:1447–1460. doi: 10.1534/genetics.111.137570.
Browning BL, Browning SR. A fast, powerful method for detecting identity by descent. Am J Hum Genet. 2011;88:173–182. doi: 10.1016/j.ajhg.2011.01.010.
Browning BL, Browning SR. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics. 2013a;194:459–471. doi: 10.1534/genetics.113.150029.
Browning SR, Browning BL. Identity by descent between distant relatives: Detection and applications. Annu Rev Genet. 2012;46:615–631. doi: 10.1146/annurev-genet-110711-155534.
Browning SR, Browning BL. Identity-by-descent-based heritability analysis in the Northern Finland Birth Cohort. Hum Genet. 2013b;132:129–138. doi: 10.1007/s00439-012-1230-y.
Browning SR, Thompson EA. Detecting rare variant associations by identity-by-descent mapping in case-control studies. Genetics. 2012;190:1521–1531. doi: 10.1534/genetics.111.136937.
Carmi S, Palamara PF, Vacic V, Lencz T, Darvasi A, Pe’er I. The variance of identity-by-descent sharing in the Wright-Fisher model. Genetics. 2013;193:911–928. doi: 10.1534/genetics.112.147215.
Chapman NH, Thompson EA. A model for the length of tracts of identity by descent in finite random mating populations. Theor Pop Biol. 2003;64:141–150. doi: 10.1016/s0040-5809(03)00071-6.
de Hoog FR, Knight JH, Stokes AN. An improved method for numerical inversion of Laplace transforms. SIAM J Sci and Stat Comput. 1982;3:357–366. Code by K. J. Hollenbeck, INVLAP.M: A matlab function for numerical inversion of Laplace transforms by the de Hoog algorithm, 1998.
Fisher RA. A fuller theory of “junctions” in inbreeding. Heredity. 1954;8:187–197.
Gauvin H, Moreau C, Lefebvre JF, Laprise C, Vezina H, Labuda D, Roy-Gagnon MH. Genome-wide patterns of identity-by-descent sharing in the French Canadian founder population. Eur J Hum Genet. 2013. doi: 10.1038/ejhg.2013.227.
Godréche C, Luck JM. Statistics of the occupation time of renewal processes. J Stat Phys. 2001;104:489.
Griffiths RC, Marjoram P. Ancestral inference from samples of DNA sequences with recombination. J Comput Biol. 1996;3:479–502. doi: 10.1089/cmb.1996.3.479.
Griffiths RC, Marjoram P. An ancestral recombination graph. In: Donnelly P, Tavaré S, editors. Progress in Population Genetics and Human Evolution (IMA Volumes in Mathematics and its Applications) Vol. 87. Springer-Verlag; Berlin: 1997. pp. 257–270.
Griffiths RC, Tavare S. Sampling theory for neutral alleles in a varying environment. Philos Trans R Soc Lond B Biol Sci. 1994;344:403–410. doi: 10.1098/rstb.1994.0079.
Gusev A, Kenny EE, Lowe JK, Salit J, Saxena R, Kathiresan S, Altshuler DM, Friedman JM, Breslow JL, Pe’er I. DASH: a method for identical-by-descent haplotype mapping uncovers association with recent variation. Am J Hum Genet. 2011;88:706–717. doi: 10.1016/j.ajhg.2011.04.023.
Gusev A, Lowe JK, Stoffel M, Daly MJ, Altshuler D, Breslow JL, Friedman JM, Pe’er I. Whole population, genome-wide mapping of hidden relatedness. Genome Res. 2009;19:318–326. doi: 10.1101/gr.081398.108.
Han L, Abney M. Using identity by descent estimation with dense genotype data to detect positive selection. Eur J Hum Genet. 2013;21:205–211. doi: 10.1038/ejhg.2012.148.
Harris K, Nielsen R. Inferring demographic history from a spectrum of shared haplotype lengths. PLoS Genet. 2013;9:e1003521. doi: 10.1371/journal.pgen.1003521.
He D. IBD-Groupon: an efficient method for detecting group-wise identity-by-descent regions simultaneously in multiple individuals based on pairwise IBD relationships. Bioinformatics. 2013;29:i162–170. doi: 10.1093/bioinformatics/btt237.
Henn BM, Hon L, Macpherson JM, Eriksson N, Saxonov S, Pe’er I, Mountain JL. Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples. PLoS One. 2012;7:e34267. doi: 10.1371/journal.pone.0034267.
Hobolth A, Jensen JL. Markovian approximation to the finite loci coalescent with recombination along multiple sequences. Theor Popul Biol. 2014. doi: 10.1016/j.tpb.2014.01.002.
Hollenbeck KJ. INVLAP.M: A matlab function for numerical inversion of Laplace transforms by the de Hoog algorithm. 1998.
Hudson RR. Properties of a neutral allele model with intragenic recombination. Theor Popul Biol. 1983;23:183–201. doi: 10.1016/0040-5809(83)90013-8.
Huff CD, Witherspoon DJ, Simonson TS, Xing J, Watkins WS, Zhang Y, Tuohy TM, Neklason DW, Burt RW, Guthery SL, Woodward SR, Jorde LB. Maximum-likelihood estimation of recent shared ancestry (ERSA). Genome Res. 2011;21:768–774. doi: 10.1101/gr.115972.110.
Huff CD, Xing J, Rogers AR, Witherspoon D, Jorde LB. Mobile elements reveal small population size in the ancient ancestors of Homo sapiens. Proc Natl Acad Sci, USA. 2010;107:2147–2152. doi: 10.1073/pnas.0909000107.
Karlin S, Taylor HM. A First Course in Stochastic Processes. 2. Academic Press; 1975.
Kong A, Masson G, Frigge ML, Gylfason A, Zusmanovich P, Thorleifsson G, Olason PI, Ingason A, Steinberg S, Rafnar T, Sulem P, Mouy M, Jonsson F, Thorsteinsdottir U, Gudbjartsson DF, Stefansson H, Stefansson K. Detection of sharing by descent, long-range phasing and haplotype imputation. Nat Genet. 2008;9:1068–1075. doi: 10.1038/ng.216.
Li H, Durbin R. Inference of human population history from individual whole-genome sequences. Nature. 2011;475:493–496. doi: 10.1038/nature10231.
Lin R, Charlesworth J, Stankovich J, Perreau VM, Brown MA, Taylor BV. Identity-by-descent mapping to detect rare variants conferring susceptibility to multiple sclerosis. PloS One. 2013;8:e56379. doi: 10.1371/journal.pone.0056379.
Lohse K, Harrison RJ, Barton NH. A general method for calculating likelihoods under the coalescent process. Genetics. 2011;189:977–987. doi: 10.1534/genetics.111.129569.
Marjoram P, Wall JD. Fast “coalescent” simulation. BMC Genetics. 2006;7:16. doi: 10.1186/1471-2156-7-16.
McVean GAT, Cardin NJ. Approximating the coalescent with recombination. Phil Trans R Soc B. 2005;360:1387–1393. doi: 10.1098/rstb.2005.1673.
Moltke I, Albrechtsen A, Hansen TV, Nielsen FC, Nielsen R. A method for detecting IBD regions simultaneously in multiple individuals–with applications to disease genetics. Genome Res. 2011;21:1168–1180. doi: 10.1101/gr.115360.110.
Moorjani P, Patterson N, Loh PR, Lipson M, Kisfali P, Melegh BI, Bonin M, Kadasi L, Riess O, Berger B, Reich D, Melegh B. Reconstructing Roma history from genome-wide data. PloS One. 2013;8:e58633. doi: 10.1371/journal.pone.0058633.
Palamara PF, Lencz T, Darvasi A, Pe’er I. Length distributions of identity by descent reveal fine-scale demographic history. Am J Hum Genet. 2012;91:809–822. doi: 10.1016/j.ajhg.2012.08.030.
Palamara PF, Pe’er I. Inference of historical migration rates via haplotype sharing. Bioinformatics. 2013;29:i180–188. doi: 10.1093/bioinformatics/btt239.
Palin K, Campbell H, Wright AF, Wilson JF, Durbin R. Identity-by-descent-based phasing and imputation in founder populations using graphical models. Genet Epidemiol. 2011;35:853–860. doi: 10.1002/gepi.20635.
Purcell S, Neale N, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–575. doi: 10.1086/519795.
Ralph P, Coop G. The geography of recent genetic ancestry across Europe. PLoS Biol. 2013;11:e1001555. doi: 10.1371/journal.pbio.1001555.
Stam P. The distribution of the fraction of the genome identical by descent in finite random mating populations. Genet Res. 1980;35:131–155.
Thompson EA. Identity by descent: variation in meiosis, across genomes, and in populations. Genetics. 2013;194:301–326. doi: 10.1534/genetics.112.148825.
Wakeley J, King L, Low BS, Ramachandran S. Gene genealogies within a fixed pedigree, and the robustness of Kingman’s coalescent. Genetics. 2012;190:1433–1445. doi: 10.1534/genetics.111.135574.
Weisstein EW. MathWorld–A Wolfram web resource. 2014. URL http://mathworld.wolfram.com.
Wiuf C, Hein J. Recombination as a point process along sequences. Theor Popul Biol. 1999;55:248–259. doi: 10.1006/tpbi.1998.1403.
Wolfram Research I. Mathematica, version 9.0 Edition. Wolfram Research, Inc; Champaign, Illinois: 2012.
Zheng C, Kuhner MK, Thompson EA. Bayesian inference of local trees along chromosomes by the sequential Markov coalescent. J Mol Evol. 2014;78:279–292. doi: 10.1007/s00239-014-9620-5.

[R1] Atzmon G, Hao L, Pe’er I, Velez C, Pearlman A, Palamara PF, Morrow B, Friedman E, Oddoux C, Burns E, Ostrer H. Abraham’s children in the genome era: Major Jewish diaspora populations comprise distinct genetic clusters with shared middle eastern ancestry. Am J Hum Genet. 2010;86:850–859. doi: 10.1016/j.ajhg.2010.04.015. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] Botigué LR, Henn BM, Gravel S, Maples BK, Gignoux CR, Corona E, Atzmon G, Burns E, Ostrer H, Flores C, Bertranpetit J, Comas D, Bustamante CD. Gene flow from North Africa contributes to differential human genetic diversity in Southern Europe. Proc Natl Acad Sci USA. 2013;110:11791–11796. doi: 10.1073/pnas.1306223110. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R3] Brančík L. Numerical Inverse Laplace Transforms for Electrical Engineering Simulation, MATLAB for Engineers - Applications in Control, Electrical Engineering, IT and Robotics. InTech; 2011. [Google Scholar]

[R4] Bray SM, Mulle JG, Dodd AF, Pulver AE, Wooding S, Warren ST. Signatures of founder effects, admixture, and selection in the Ashkenazi Jewish population. P Natl Acad Sci USA. 2010;107:16222–16227. doi: 10.1073/pnas.1004381107. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] Brown MD, Glazner CG, Zheng C, Thompson EA. Inferring coancestry in population samples in the presence of linkage disequilibrium. Genetics. 2012;190:1447–1460. doi: 10.1534/genetics.111.137570. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R6] Browning BL, Browning SR. A fast, powerful method for detecting identity by descent. Am J Hum Genet. 2011;88:173–182. doi: 10.1016/j.ajhg.2011.01.010. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R7] Browning BL, Browning SR. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics. 2013a;194:459–471. doi: 10.1534/genetics.113.150029. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R8] Browning SR, Browning BL. Identity by descent between distant relatives: Detection and applications. Annu Rev Genet. 2012;46:615–631. doi: 10.1146/annurev-genet-110711-155534. [DOI] [PubMed] [Google Scholar]

[R9] Browning SR, Browning BL. Identity-by-descent-based heritability analysis in the Northern Finland Birth Cohort. Hum Genet. 2013b;132:129–138. doi: 10.1007/s00439-012-1230-y. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] Browning SR, Thompson EA. Detecting rare variant associations by identity-by-descent mapping in case-control studies. Genetics. 2012;190:1521–1531. doi: 10.1534/genetics.111.136937. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] Carmi S, Palamara PF, Vacic V, Lencz T, Darvasi A, Pe’er I. The variance of identity-by-descent sharing in the wright-fisher model. Genetics. 2013;193:911–928. doi: 10.1534/genetics.112.147215. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R12] Chapman NH, Thompson EA. A model for the length of tracts of identity by descent in finite random mating populations. Theor Pop Biol. 2003;64:141–150. doi: 10.1016/s0040-5809(03)00071-6. [DOI] [PubMed] [Google Scholar]

[R13] de Hoog FR, Knight JH, Stokes AN. An improved method for numerical inversion of Laplace transforms. SIAM J Sci and Stat Comput. 1982;3:357–366. code by K. J. Hollenbeck, INVLAP.M: A matlab function for numerical inversion of Laplace transforms by the de Hoog algorithm, 1998. [Google Scholar]

[R14] Fisher RA. A fuller theory of “junctions” in inbreeding. Heredity. 1954;8:187–197. [Google Scholar]

[R15] Gauvin H, Moreau C, Lefebvre JF, Laprise C, Vezina H, Labuda D, Roy-Gagnon MH. Genome-wide patterns of identity-by-descent sharing in the French Canadian founder population. Eur J Hum Genet. 2013 doi: 10.1038/ejhg.2013.227. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R16] Godréche C, Luck JM. Statistics of the occupation time of renewal processes. J Stat Phys. 2001;104:489. [Google Scholar]

[R17] Griffiths RC, Marjoram P. Ancestral inference from samples of dna sequences with recombination. J Comput Biol. 1996;3:479–502. doi: 10.1089/cmb.1996.3.479. [DOI] [PubMed] [Google Scholar]

[R18] Griffiths RC, Marjoram P. An ancestral recombination graph. In: Donnelly P, Tavaré S, editors. Progress in Population Genetics and Human Evolution (IMA Volumes in Mathematics and its Applications) Vol. 87. Springer-Verlag; Berlin: 1997. pp. 257–270. [Google Scholar]

[R19] Griffiths RC, Tavare S. Sampling theory for neutral alleles in a varying environment. Philos Trans R Soc Lond B Biol Sci. 1994;344:403–410. doi: 10.1098/rstb.1994.0079. [DOI] [PubMed] [Google Scholar]

[R20] Gusev A, Kenny EE, Lowe JK, Salit J, Saxena R, Kathiresan S, Altshuler DM, Friedman JM, Breslow JL, Pe’er I. DASH: a method for identical-by-descent haplotype mapping uncovers association with recent variation. Am J Hum Genet. 2011;88:706–717. doi: 10.1016/j.ajhg.2011.04.023. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R21] Gusev A, Lowe JK, Stoffel M, Daly MJ, Altshuler D, Breslow JL, Friedman JM, Pe’er I. Whole population, genome-wide mapping of hidden relatedness. Genome Res. 2009;19:318–326. doi: 10.1101/gr.081398.108. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R22] Han L, Abney M. Using identity by descent estimation with dense genotype data to detect positive selection. Eur J Hum Genet. 2013;21:205–211. doi: 10.1038/ejhg.2012.148. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R23] Harris K, Nielsen R. Inferring demographic history from a spectrum of shared haplotype lengths. PLoS Genet. 2013;9:e1003521. doi: 10.1371/journal.pgen.1003521. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R24] He D. IBD-Groupon: an efficient method for detecting group-wise identity-by-descent regions simultaneously in multiple individuals based on pairwise IBD relationships. Bioinformatics. 2013;29:i162–170. doi: 10.1093/bioinformatics/btt237. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R25] Henn BM, Hon L, Macpherson JM, Eriksson N, Saxonov S, Pe’er I, Mountain JL. Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples. PLoS One. 2012;7:e34267. doi: 10.1371/journal.pone.0034267. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R26] Hobolth A, Jensen JL. Markovian approximation to the finite loci coalescent with recombination along multiple sequences. Theor Popul Biol. 2014 doi: 10.1016/j.tpb.2014.01.002. [DOI] [PubMed] [Google Scholar]

[R27] Hollenbeck KJ. INVLAP.M: A matlab function for numerical inversion of Laplace transforms by the de Hoog algorithm 1998 [Google Scholar]

[R28] Hudson RR. Properties of a neutral allele model with intragenic recombination. Theor Popul Biol. 1983;23:183–201. doi: 10.1016/0040-5809(83)90013-8. [DOI] [PubMed] [Google Scholar]

[R29] Huff CD, Witherspoon DJ, Simonson TS, Xing J, Watkins WS, Zhang Y, Tuohy TM, Neklason DW, Burt RW, Guthery SL, Woodward SR, Jorde LB. Maximum-likelihood estimation of recent shared ancestry (ERSA) Genome Res. 2011;21:768–774. doi: 10.1101/gr.115972.110. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R30] Huff CD, Xing J, Rogers AR, Witherspoon D, Jorde LB. Mobile elements reveal small population size in the ancient ancestors of Homo sapiens. Proc Natl Acad Sci, USA. 2010;107:2147–2152. doi: 10.1073/pnas.0909000107. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R31] Karlin S, Taylor HM. A First Course in Stochastic Processes. 2. Academic Press; 1975. [Google Scholar]

[R32] Kong A, Masson G, Frigge ML, Gylfason A, Zusmanovich P, Thorleifsson G, Olason PI, Ingason A, Steinberg S, Rafnar T, Sulem P, Mouy M, Jonsson F, Thorsteinsdottir U, Gudbjartsson DF, Stefansson H, Stefansson K. Detection of sharing by descent, long-range phasing and haplotype imputation. Nat Genet. 2008;9:1068–1075. doi: 10.1038/ng.216. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R33] Li H, Durbin R. Inference of human population history from individual whole-genome sequences. Nature. 2011;475:493–496. doi: 10.1038/nature10231. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R34] Lin R, Charlesworth J, Stankovich J, Perreau VM, Brown MA, Taylor BV. Identity-by-descent mapping to detect rare variants conferring susceptibility to multiple sclerosis. PloS One. 2013;8:e56379. doi: 10.1371/journal.pone.0056379. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R35] Lohse K, Harrison RJ, Barton NH. A general method for calculating likelihoods under the coalescent process. Genetics. 2011;189:977–987. doi: 10.1534/genetics.111.129569. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R36] Marjoram P, Wall JD. Fast “coalescent” simulation. BMC Genetics. 2006;7:16. doi: 10.1186/1471-2156-7-16. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R37] McVean GAT, Cardin NJ. Approximating the coalescent with recombination. Phil Trans R Soc B. 2005;360:1387–1393. doi: 10.1098/rstb.2005.1673. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R38] Moltke I, Albrechtsen A, Hansen TV, Nielsen FC, Nielsen R. A method for detecting IBD regions simultaneously in multiple individuals–with applications to disease genetics. Genome Res. 2011;21:1168–1180. doi: 10.1101/gr.115360.110. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R39] Moorjani P, Patterson N, Loh PR, Lipson M, Kisfali P, Melegh BI, Bonin M, Kadasi L, Riess O, Berger B, Reich D, Melegh B. Reconstructing Roma history from genome-wide data. PloS One. 2013;8:e58633. doi: 10.1371/journal.pone.0058633. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R40] Palamara PF, Lencz T, Darvasi A, Pe’er I. Length distributions of identity by descent reveal fine-scale demographic history. Am J Hum Genet. 2012;91:809–822. doi: 10.1016/j.ajhg.2012.08.030. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R41] Palamara PF, Pe’er I. Inference of historical migration rates via hap-lotype sharing. Bioinformatics. 2013;29:i180–188. doi: 10.1093/bioinformatics/btt239. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R42] Palin K, Campbell H, Wright AF, Wilson JF, Durbin R. Identity-by-descent-based phasing and imputation in founder populations using graphical models. Genet Epidemiol. 2011;35:853–860. doi: 10.1002/gepi.20635. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R43] Purcell S, Neale N, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–575. doi: 10.1086/519795. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R44] Ralph P, Coop G. The geography of recent genetic ancestry across europe. PLoS Biol. 2013;11:e1001555. doi: 10.1371/journal.pbio.1001555. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R45] Stam P. The distribution of the fraction of the genome identical by descent in finite random mating populations. Genet Res. 1980;35:131–155. [Google Scholar]

[R46] Thompson EA. Identity by descent: variation in meiosis, across genomes, and in populations. Genetics. 2013;194:301–326. doi: 10.1534/genetics.112.148825. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R47] Wakeley J, King L, Low BS, Ramachandran S. Gene genealogies within a fixed pedigree, and the robustness of Kingman’s coalescent. Genetics. 2012;190:1433–1445. doi: 10.1534/genetics.111.135574.

[R48] Weisstein EW. MathWorld–A Wolfram web resource. 2014. URL http://mathworld.wolfram.com.

[R49] Wiuf C, Hein J. Recombination as a point process along sequences. Theor Popul Biol. 1999;55:248–259. doi: 10.1006/tpbi.1998.1403.

[R50] Wolfram Research, Inc. Mathematica, Version 9.0. Wolfram Research, Inc; Champaign, Illinois: 2012.

[R51] Zheng C, Kuhner MK, Thompson EA. Bayesian inference of local trees along chromosomes by the sequential Markov coalescent. J Mol Evol. 2014;78:279–292. doi: 10.1007/s00239-014-9620-5.
