Author manuscript; available in PMC: 2009 Aug 20.
Published in final edited form as: Theor Popul Biol. 2007 May 16;72(2):179–185. doi: 10.1016/j.tpb.2006.05.006

Prediction of multi-locus inbreeding coefficients and relation to linkage disequilibrium in random mating populations

William G Hill 1, Bruce S Weir 2
PMCID: PMC2729754  NIHMSID: NIHMS29705  PMID: 17575994

Abstract

An algorithm to predict the level of identity by descent simultaneously at multiple loci is presented, which can in principle be extended to any number of loci. The model assumes a random mating population, with random association of haplotypes. The relationship is shown between coefficients of multi-locus identity or non-identity by descent and moments of multi-locus linkage disequilibrium. Thus these moments can be computed from the multilocus identity or, using algorithms derived previously to predict the disequilibria moments, vice-versa. The results can be applied to predict multi-locus identity in, for example, gene mapping.

Keywords: Identity by descent, Multiple loci, Linkage disequilibrium, Random drift, Random mating

INTRODUCTION

The probability of identity by descent simultaneously at two or more neutral loci is a generalization of Wright’s inbreeding coefficient, F. Such probabilities are clearly functions of both the population size and the breeding structure, as is F, but they also depend on the linkage relationships between the loci. For example, in a closed random mating population without mutation, the probability of double identity is approximately equal to $F^2$ for unlinked loci, but increases to F for a completely linked pair. The multi-locus identity is a useful parameter in predicting the joint ancestry of pairs of loci, for example in mapping studies, in inferences about historic population structure from current data (Hayes et al., 2003), and also in computing variances and covariances of quantitative traits in finite populations. Whereas contributions to variance in the absence of epistasis depend only on two-locus identities or disequilibria, with epistasis multi-locus terms may be involved.

Weir and Cockerham (1969, 1974) derived algorithms for the computation of identity by descent at two loci. Recently Hernández-Sánchez et al. (2004) have provided approximate methods for predicting identity at three and four (and, in principle, more) loci. They use the results of Weir and Cockerham for two loci, and then predict by regression the conditional probability of identity at the third (and fourth) locus conditional on identity at the pair. Here we provide algorithms for computing exact values of multi-locus inbreeding for random mating populations with random selfing.

Just as there is a linear relationship between F (as identity by descent) and heterozygosity within populations, so there is between the moments for linkage disequilibrium and multi-locus identity. It is therefore possible to compute multi-locus identities from linkage disequilibrium and vice versa. Weir and Cockerham (1974) show the relationship for a pair of loci. Algorithms for the computation of multi-locus linkage disequilibrium have been given previously (Hill, 1974; Hill and Weir, 1988). We show here how these relate to multi-locus identity coefficients. For populations undergoing random mating and random selfing as assumed here, haplotype frequencies are a sufficient descriptor of the population. There is also assumed to be no selection at the loci of interest, or at loci linked to them.

DEFINITIONS

Identity and Non-identity Coefficients

To define identity coefficients, we need to consider the number of haplotypes sampled and we adopt a notation that is more easily extended to multiple loci than that of Weir and Cockerham (1969). Let Fm,h:hap denote the probability of identity by descent at each of m loci, A, B, … in a sample of h haplotypes (2 ≤ h ≤ 2m), where ‘hap’ denotes the haplotype structure of the sample. Thus F1,2:A,A = FA and F1,2:B,B = FB are the probabilities of identity of two alleles sampled at locus A and locus B respectively, and are the usual single-locus inbreeding coefficients. In the absence of mutation, or with equal mutation rates at each locus, FA = FB = …. Similarly F2,2:AB,AB denotes the probability of identity at loci A and B in a pair of haplotypes, and is the two-locus inbreeding coefficient FAB (denoted F11 by Weir and Cockerham, 1969, for two haplotypes within one individual). The magnitude of this two-locus identity in successive generations also depends on the probability of identity of genes sampled from three and four haplotypes at two loci, F2,3:A,B,AB and F2,4:A,A,B,B, respectively. Weir and Cockerham (1974 and elsewhere) denote two, three and four haplotype non-identities at two loci as Θ, Γ and Δ.

Similarly, let $X_{m,h:\mathrm{hap}}$ be the equivalent probability of non-identity by descent at each of the m loci. Thus $X_{1,2:A,A} = 1 - F_{1,2:A,A} = 1 - F_A$ and, for example, $X_{2,2:AB,AB} = 1 - F_A - F_B + F_{2,2:AB,AB}$ and $X_{2,3:A,B,AB} = 1 - F_A - F_B + F_{2,3:A,B,AB}$, as can be seen by writing the four two-locus probabilities of identity or non-identity for a pair of loci in a 2 × 2 table with single-locus probabilities as marginal totals. Let the vector x define the non-identity coefficients,

$$\mathbf{x} = \begin{bmatrix} X_{2,2:AB,AB} \\ X_{2,3:A,B,AB} \\ X_{2,4:A,A,B,B} \end{bmatrix}. \tag{1}$$

For three loci F3,2:ABC,ABC is the probability of identity at each of three loci for alleles on two haplotypes, with

$$X_{3,2:ABC,ABC} = 1 - F_A - F_B - F_C + F_{2,2:AB,AB} + F_{2,2:AC,AC} + F_{2,2:BC,BC} - F_{3,2:ABC,ABC}. \tag{2}$$

Relationships between the 16 different three-locus identity and non-identity coefficients are shown in Appendix 1. These follow from writing the eight joint identity/non-identity probabilities in a 2 × 2 × 2 table with appropriate two-locus marginals. The number of coefficients rises rapidly with the number of loci, e.g. 139 for four loci (not listed!).
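The inclusion-exclusion structure of Equation 2 can be illustrated with a minimal sketch. The function name and the numerical values are hypothetical; the check uses the special case of independent loci, where every joint identity is a product of single-locus identities and the non-identity must then factorise as well.

```python
def non_identity_three_loci(f_single, f_pair, f_triple):
    """X_{3,2:ABC,ABC} from Equation 2.

    f_single: (F_A, F_B, F_C)
    f_pair:   (F_{2,2:AB,AB}, F_{2,2:AC,AC}, F_{2,2:BC,BC})
    f_triple: F_{3,2:ABC,ABC}
    """
    return 1.0 - sum(f_single) + sum(f_pair) - f_triple


# Illustrative values only: independent loci, so joint identities are products.
f_a, f_b, f_c = 0.1, 0.2, 0.3
x_abc = non_identity_three_loci(
    (f_a, f_b, f_c),
    (f_a * f_b, f_a * f_c, f_b * f_c),
    f_a * f_b * f_c,
)
```

With these values the result equals (1 − F_A)(1 − F_B)(1 − F_C), as expected when the loci are independent.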

Linkage Disequilibrium Coefficients and Moments

It is sufficient to assume in the present analysis of disequilibrium calculations that there are two alleles at each locus. At locus A, for example, let the frequency of allele A be $p_A$ and the frequency of the alternative allele, a, be $p_a = 1 - p_A$. To simplify later formulae for moments of disequilibria, let $\pi_A = p_A(1 - p_A) = p_A p_a$ and $\tau_A = (1 - 2p_A)/2 = (p_a - p_A)/2$. To simplify relations between identities and disequilibria, this last quantity is one-half of the τ used by Hill and Weir (1988). The frequency of the two-locus haplotype AB, for example, is $p_{AB}$, and the corresponding disequilibrium coefficient is $D_{AB} = p_{AB} - p_A p_B$. Note that $D_{AB} = -D_{Ab} = -D_{aB} = D_{ab}$. For three loci $D_{ABC} = p_{ABC} - p_A p_{BC} - p_B p_{AC} - p_C p_{AB} + 2p_A p_B p_C$, and, for example, $D_{ABC} = -D_{ABc} = D_{Abc} = -D_{abc}$. For four loci $D_{ABCD} = p_{ABCD} - \Sigma_4 p_A p_{BCD} - \Sigma_3 p_{AB} p_{CD} + 2\Sigma_6 p_A p_B p_{CD} - 6p_A p_B p_C p_D$, where, for example, $\Sigma_4 p_A p_{BCD}$ denotes the sum of the four terms found by permuting A, B, C, D: $\Sigma_4 p_A p_{BCD} = p_A p_{BCD} + p_B p_{ACD} + p_C p_{ABD} + p_D p_{ABC}$; and $D_{ABCD} = -D_{ABCd} = D_{ABcd} = -D_{Abcd} = D_{abcd}$.
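As an illustration of these definitions, the following sketch computes two- and three-locus disequilibria from a table of haplotype frequencies and lets the sign relations be checked numerically. The function names, the array layout (index 0 for the first allele at a locus, index 1 for the alternative) and the random frequencies are assumptions made for the example only.

```python
import numpy as np


def d_two(p, i, j):
    """Two-locus disequilibrium for alleles i, j from a 2 x 2 haplotype table p."""
    return p[i, j] - p[i, :].sum() * p[:, j].sum()


def d_three(p, i, j, k):
    """Three-locus disequilibrium for alleles i, j, k from a 2 x 2 x 2 table p."""
    p_a, p_b, p_c = p[i].sum(), p[:, j].sum(), p[:, :, k].sum()
    p_ab, p_ac, p_bc = p[i, j].sum(), p[i, :, k].sum(), p[:, j, k].sum()
    return p[i, j, k] - p_a * p_bc - p_b * p_ac - p_c * p_ab + 2 * p_a * p_b * p_c


# Arbitrary haplotype frequencies, normalised to sum to one.
rng = np.random.default_rng(1)
p3 = rng.random((2, 2, 2))
p3 /= p3.sum()
p2 = p3.sum(axis=2)  # marginal table for loci A and B
```

Switching the allele at one locus changes the sign of the coefficient, so `d_three(p3, 0, 0, 0)` and `d_three(p3, 0, 0, 1)` differ only in sign.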

Expected changes in variances and covariances of linkage disequilibria involve changes in interrelated moments (Hill and Robertson, 1968; Hill, 1974; Hill and Weir, 1988). The quantities used here, defined by the vector v, facilitate comparisons between moments and the measures of identity. For two loci

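$$\mathbf{v} = \begin{bmatrix} E(D_{AB}^2) \\ E(\tau_A \tau_B D_{AB}) \\ E(\pi_A \pi_B) \end{bmatrix} = \begin{bmatrix} E(D_{AB} D_{ab}) \\ \tfrac{1}{4}E(p_A p_B D_{ab} + p_A p_b D_{aB} + p_a p_B D_{Ab} + p_a p_b D_{AB}) \\ E(p_A p_a p_B p_b) \end{bmatrix}. \tag{3}$$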

Expectation in this context refers to the average over all finite populations that descend from a specified initial population. The elements of v for three loci are given in Table 1. Note that there is the same number (e.g. 16 for three loci) of disequilibrium moments and non-identity terms, i.e. x and v have the same dimension.

Table 1.

Non-identity coefficients and moments for three loci, elements of the upper triangular matrix Q defining their relationship, and diagonal elements of R, defining non-recombination* among moments. Rows of Q should be multiplied by the coefficient κ.

Row | Moment | κ | Elements of Q, columns 1 to 16 | Non-identity | Element of R
1 | D_ABC^2 | −1 | 1 −2 −2 −2 0 4 1 1 1 2 2 2 −4 −4 −4 4 | X3,2:ABC,ABC | (1−rABC)^2
2 | τA DBC DABC | 1 | 0 1 0 0 0 −1 −1 0 0 0 −1 −1 3 1 1 −2 | X3,3:A,BC,ABC | (1−rBC)(1−rABC)
3 | τB DAC DABC | 1 | 0 0 1 0 0 −1 0 −1 0 −1 0 −1 1 3 1 −2 | X3,3:B,AC,ABC | (1−rAC)(1−rABC)
4 | τC DAB DABC | 1 | 0 0 0 1 0 −1 0 0 −1 −1 −1 0 1 1 3 −2 | X3,3:C,AB,ABC | (1−rAB)(1−rABC)
5 | DAB DAC DBC | −1 | 0 0 0 0 1 0 0 0 0 −1 −1 −1 1 1 1 −1 | X3,3:AB,AC,BC | (1−rAB)(1−rAC)(1−rBC)
6 | τA τB τC DABC | 1 | 0 0 0 0 0 1 0 0 0 0 0 0 −1 −1 −1 2 | X3,4:A,B,C,ABC | 1−rABC
7 | πA DBC^2 | 1 | 0 0 0 0 0 0 1 0 0 0 0 0 −2 0 0 1 | X3,4:A,A,BC,BC | (1−rBC)^2
8 | πB DAC^2 | 1 | 0 0 0 0 0 0 0 1 0 0 0 0 0 −2 0 1 | X3,4:B,B,AC,AC | (1−rAC)^2
9 | πC DAB^2 | 1 | 0 0 0 0 0 0 0 0 1 0 0 0 0 0 −2 1 | X3,4:C,C,AB,AB | (1−rAB)^2
10 | τB τC DAB DAC | −1 | 0 0 0 0 0 0 0 0 0 1 0 0 0 −1 −1 1 | X3,4:B,C,AB,AC | (1−rAB)(1−rAC)
11 | τA τC DAB DBC | −1 | 0 0 0 0 0 0 0 0 0 0 1 0 −1 0 −1 1 | X3,4:A,C,AB,BC | (1−rAB)(1−rBC)
12 | τA τB DAC DBC | −1 | 0 0 0 0 0 0 0 0 0 0 0 1 −1 −1 0 1 | X3,4:A,B,AC,BC | (1−rAC)(1−rBC)
13 | πA τB τC DBC | 1 | 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 −1 | X3,5:A,A,B,C,BC | 1−rBC
14 | πB τA τC DAC | 1 | 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 −1 | X3,5:B,B,A,C,AC | 1−rAC
15 | πC τA τB DAB | 1 | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 −1 | X3,5:C,C,A,B,AB | 1−rAB
16 | πA πB πC | 1 | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 | X3,6:A,A,B,B,C,C | 1

* If loci A, B and C have that order on the chromosome and there is no interference, (1 − rAC) = (1 − rAB)(1 − rBC) + rAB rBC and (1 − rABC) = (1 − rAB)(1 − rBC).

CORRESPONDENCE BETWEEN IDENTITY AND DISEQUILIBRIUM COEFFICIENTS

There is a linear relationship between moments and identity coefficients for a closed random mating finite population (Weir and Cockerham, 1974). It is simpler in the present model because here we assume that genotype frequencies equal products of haplotype frequencies, i.e. a haploid model with frequencies of each single locus genotype in Hardy-Weinberg proportions. Although not required for computation of the identities, we assume that the population was initially in linkage equilibrium, i.e. $\mathbf{v}_0' = [0, 0, 1](\pi_A \pi_B)_0$, and note that $E(\pi_A \pi_B)_t$ is the probability that, at generation t, from a sample of four haplotypes, the alleles sampled at locus A from two haplotypes are different and those at locus B sampled from two other haplotypes are also different. Hence $E(\pi_A \pi_B)_t = X_{2,4:A,B,A,B,t}(\pi_A \pi_B)_0$. (To simplify notation, subsequently in this section generation t is implied in the formulae.) Further, using the fact that for a population initially in linkage equilibrium, $E(D_{AB}) = E(p_a D_{AB}) = 0$ (Hill, 1974) so $E(\tau_A \tau_B D_{AB}) = E(p_a p_b p_{AB} - p_A p_a p_B p_b)$, and noting that $E(D_{AB}^2) = E(D_{AB} D_{ab}) = E([p_{AB} - p_A p_B][p_{ab} - p_a p_b])$, we obtain the relationships:

$$\begin{aligned} E(D_{AB}^2) &= (X_{2,2:AB,AB} - 2X_{2,3:A,B,AB} + X_{2,4:A,B,A,B})(\pi_A \pi_B)_0 \\ E(\tau_A \tau_B D_{AB}) &= (X_{2,3:A,B,AB} - X_{2,4:A,B,A,B})(\pi_A \pi_B)_0 \\ E(\pi_A \pi_B) &= X_{2,4:A,B,A,B}(\pi_A \pi_B)_0. \end{aligned}$$

In matrix terms

$$\mathbf{v} = Q\mathbf{x}(\pi_A \pi_B)_0 \quad \text{and} \quad \mathbf{x} = Q^{-1}\mathbf{v}[(\pi_A \pi_B)_0]^{-1}, \tag{4}$$

where

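$$Q = \begin{bmatrix} 1 & -2 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad Q^{-1} = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}. \tag{5}$$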

From (3) and (5), it can be shown that

$$Q^{-1}\mathbf{v} = \frac{1}{4}\begin{bmatrix} E[2(p_{AB}p_{ab} + p_{Ab}p_{aB})] \\ E[p_{AB}p_a p_b + p_{Ab}p_a p_B + p_{aB}p_A p_b + p_{ab}p_A p_B] \\ E[4p_A p_a p_B p_b] \end{bmatrix}$$

The first element of $4Q^{-1}\mathbf{v}$ is the expected frequency of double heterozygotes, i.e. of genotypes at which both loci on two haplotypes are non-identical in state and thus also non-identical by descent. Similarly the second and third elements are the expected frequencies of doubly heterozygous genotypes (i.e. non-identical in state and by descent) formed from, respectively, three haplotypes, taking two loci at one and one each from the others, and four haplotypes, taking one allele from each. Equivalent relationships apply with more loci.
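The conversion in Equations 4 and 5 can be sketched numerically as follows. The function names and the numerical inputs are illustrative assumptions: the non-identity vector used is the one expected after a single generation of sampling from 2N haplotypes without recombination, and the initial value of πAπB is arbitrary.

```python
import numpy as np

# Q and its inverse for two loci (Equation 5).
Q = np.array([[1.0, -2.0, 1.0], [0.0, 1.0, -1.0], [0.0, 0.0, 1.0]])
Q_INV = np.array([[1.0, 2.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]])


def moments_from_non_identity(x, pi0):
    """v = Q x (pi_A pi_B)_0, Equation 4."""
    return Q @ np.asarray(x, dtype=float) * pi0


def non_identity_from_moments(v, pi0):
    """x = Q^{-1} v / (pi_A pi_B)_0, Equation 4."""
    return Q_INV @ np.asarray(v, dtype=float) / pi0


# Illustrative inputs (assumed): N = 50, initial allele frequencies of one half.
N = 50
xi1 = 1.0 - 1.0 / (2 * N)
pi0 = 0.25 * 0.25
x1 = np.array([xi1, xi1**2, xi1**2])
v1 = moments_from_non_identity(x1, pi0)
```

Here the first element of `v1` is the expected squared disequilibrium generated by one round of sampling, and applying `non_identity_from_moments` to `v1` recovers `x1`.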

The elements of Q relating non-identity coefficients x and moments v for three loci are given in Table 1. These follow from the definitions of the disequilibrium components, and for clarity the elements of each row are scaled such that the corresponding diagonal elements are all +1. The coefficients (κ) are +1 or −1, with the latter whenever an odd number of loci have to be represented by the alternative allele in the disequilibria, e.g. $D_{ABC}^2 = -D_{ABC}D_{abc}$ and $D_{AB}D_{AC}D_{BC} = -D_{AB}D_{aC}D_{bc}$, but $\tau_C D_{AB}D_{ABC} = \tau_C D_{AB}D_{abC}$. For example, as $D_{ABC}^2 = -D_{ABC}D_{abc}$ and $D_{ABC} = p_{ABC} - \Sigma_3 p_A p_{BC} + 2p_A p_B p_C$,

$$E(D_{ABC}^2) = -[X_{3,2:ABC,ABC} - 2\Sigma_3 X_{3,3:A,BC,ABC} + 4X_{3,4:A,B,C,ABC} + \Sigma_3 X_{3,4:A,A,BC,BC} + 2\Sigma_3 X_{3,4:B,C,AB,AC} - 4\Sigma_3 X_{3,5:A,A,B,C,BC} + 4X_{3,6:A,A,B,B,C,C}](\pi_A \pi_B \pi_C)_0. \tag{6}$$

TRANSITION EQUATIONS FOR RECOMBINATION AND DRIFT

The population is assumed to be random mating including random selfing, with no selection and effective population size N diploids. The recombination fraction between pairs of loci is given by, for example, rAB for loci A and B. The term rABC denotes the probability of any recombination between the three loci, i.e. (1 − rABC) is the probability the haplotype ABC is transmitted intact, such that, in the absence of interference and if A, B and C are so ordered on the chromosome, (1 − rABC) = (1 − rAB)(1 − rBC). A simple haploid model is assumed for the random sampling of gametes after recombination, such that, for example, E(DAB,t+1) = [1 − 1/(2N)](1 − rAB)E(DAB,t) where t denotes generation. The expectation process E is over all finite populations that descend from a specified initial population. As Weir and Cockerham (1974) point out, this is an approximation for a model in which recombination occurs only between haplotypes carried by an individual, but otherwise the analysis becomes impracticable for more than two loci and is an increasingly better approximation for increasing population size and decreasing recombination rate.

We obtain the recurrence relationships both for non-identity coefficients and for moments of disequilibria in order to show the processes involved although, as shown above, one set can be expressed in terms of the other. Details of the relevant matrices are given for two loci; results are summarised for three loci and the matrix equations are general.

Identity Coefficients

Assuming random mating and random selfing, with recombination preceding sampling, the expected non-identities among the gametes at generation t +1 in terms of those at t are

$$\mathbf{x}_{t+1} = T\mathbf{x}_t = LU\mathbf{x}_t, \tag{7}$$

where U denotes transition probabilities due to recombination, L those due to sampling and T = LU the complete transition. Thus if all alleles are non-identical in the base or reference population,

$$\mathbf{x}_0 = \mathbf{1} \quad \text{and} \quad \mathbf{x}_t = T^t\mathbf{1}, \tag{8}$$

where 1 is the unit vector.

For two loci,

$$U = \begin{bmatrix} (1 - r_{AB})^2 & 2r_{AB}(1 - r_{AB}) & r_{AB}^2 \\ 0 & 1 - r_{AB} & r_{AB} \\ 0 & 0 & 1 \end{bmatrix} \tag{9}$$

For example, the first row of U defines the probabilities that two pairs of non-identical alleles on two haplotypes, $x_1 = X_{2,2:AB,AB}$, can have arisen from no recombination on a pair of non-identical haplotypes ($U_{11}$), no recombination on one and a recombination generating one haplotype on the other ($U_{12}$), or recombination on four haplotypes to create a non-identical pair ($U_{13}$).

The subsequent sampling of haplotypes with only non-identical alleles is given by

$$L = \left(1 - \frac{1}{2N}\right)\begin{bmatrix} 1 & 0 & 0 \\ \frac{1}{2N} & 1 - \frac{1}{N} & 0 \\ \frac{1}{2N^2} & \frac{2}{N}\left(1 - \frac{1}{N}\right) & \left(1 - \frac{1}{N}\right)\left(1 - \frac{3}{2N}\right) \end{bmatrix} \tag{10}$$

(see also Weir and Cockerham, 1974, p. 349, setting recombination fractions to zero, λ = 1 in ΩEC in their notation). The diagonal elements of L, which correspond to sampling of h haplotypes in offspring from h different haplotypes in parents, are simply $[1 - 1/(2N)] \ldots [1 - (h-1)/(2N)]$, where h = 2, 3 and 4 for rows 1, 2 and 3 of L. For sub-diagonal elements, where h′ offspring haplotypes are sampled from h (h < h′) parental haplotypes, the coefficients are $k[1/(2N)]^{(h'-h)}[1 - 1/(2N)] \ldots [1 - (h-1)/(2N)]$, where the number k of possible sampling sequences depends on the haplotypic designations of parents and offspring. We illustrate by example as no closed formula has been obtained, but an algorithm is given in Appendix 2.

Consider the element L32, where four offspring haplotypes defined by alleles A,A,B,B are sampled from three parental haplotypes defined by (i) AB, (ii) A, (iii) B. The four possible sequences are: 1: A(i), A(ii), B(i), B(iii) (shorthand for A from (i), A from (ii), B from (i), B from (iii)); 2: A(i), A(ii), B(iii), B(i); 3: A(ii), A(i), B(i), B(iii); and 4: A(ii), A(i), B(iii), B(i). Thus h = 3, h′ = 4, k = 4 and L32 = 4[1/(2N)][1−1/(2N)][1−2/(2N)]. Note that when two parental haplotypes are the same, e.g. in element L31, ordering of these is irrelevant. For example, if the parental haplotypes are defined by (i) AB and (ii) AB, there are only two relevant orderings (i.e. k = 2), because A(i), A(ii), B(i), B(ii) ≡ A(ii), A(i), B(ii), B(i) and A(i), A(ii), B(ii), B(i) ≡ A(ii), A(i), B(i), B(ii).
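A minimal numerical sketch of the recurrence in Equations 7 to 10 for two loci follows. The function names and parameter values are hypothetical. The two-locus identity is recovered from the first non-identity coefficient through $F_{AB} = X_{2,2:AB,AB} - 1 + F_A + F_B$, with the single-locus identity taken as $1 - [1 - 1/(2N)]^t$ for this model without mutation.

```python
import numpy as np


def two_locus_transition(N, r):
    """T = L U for two loci (Equations 9 and 10)."""
    e = 1.0 / (2 * N)
    U = np.array([
        [(1 - r) ** 2, 2 * r * (1 - r), r ** 2],
        [0.0, 1 - r, r],
        [0.0, 0.0, 1.0],
    ])
    L = (1 - e) * np.array([
        [1.0, 0.0, 0.0],
        [e, 1 - 2 * e, 0.0],
        [2 * e ** 2, 4 * e * (1 - 2 * e), (1 - 2 * e) * (1 - 3 * e)],
    ])
    return L @ U


def double_identity(N, r, t):
    """Return (F_AB, F) after t generations, starting from x_0 = 1."""
    x = np.linalg.matrix_power(two_locus_transition(N, r), t) @ np.ones(3)
    f_single = 1.0 - (1.0 - 1.0 / (2 * N)) ** t
    return x[0] - 1.0 + 2.0 * f_single, f_single


# Illustrative parameters (assumed): N = 50 diploids, 20 generations.
f_ab_linked, f = double_identity(50, 0.0, 20)
f_ab_free, _ = double_identity(50, 0.5, 20)
```

With complete linkage the two-locus identity equals the single-locus value F, and free recombination gives a smaller value, in line with the limits described in the Introduction.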

Moments of Disequilibria

For two loci, the moments needed to describe transitions in squared linkage disequilibrium are given by the vector v, for which a recurrence relation was given by Hill and Robertson (1968), assuming as above that recombination preceded sampling (but they used (1 − 2pA)(1 − 2pB)DAB = 4τAτBDAB so coefficients differ):

$$\mathbf{v}_{t+1} = M\mathbf{v}_t = KR\mathbf{v}_t \tag{11}$$

where

$$R = \begin{bmatrix} (1 - r_{AB})^2 & 0 & 0 \\ 0 & 1 - r_{AB} & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{12}$$

denotes transitions due solely to recombination, and

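$$K = \left(1 - \frac{1}{2N}\right)\begin{bmatrix} \frac{1}{4N^2} + \left(1 - \frac{1}{2N}\right)^2 & \frac{2}{N}\left(1 - \frac{1}{2N}\right) & \frac{1}{2N} \\ \frac{1}{2N}\left(1 - \frac{1}{N}\right) & \left(1 - \frac{1}{N}\right)^2 & 0 \\ \frac{1}{2N^2} & \frac{2}{N}\left(1 - \frac{1}{2N}\right) & 1 - \frac{1}{2N} \end{bmatrix} \tag{13}$$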

those due solely to sampling.

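Using the relationship $\mathbf{v}_t = Q\mathbf{x}_t$ (see Equation 4), the recurrence relations for v and x can therefore also be written as

$$\mathbf{x}_t = Q^{-1}MQ\mathbf{x}_{t-1} \quad \text{and} \quad \mathbf{v}_t = QTQ^{-1}\mathbf{v}_{t-1}, \tag{14}$$
$$\mathbf{x}_t = T^t\mathbf{x}_0 = Q^{-1}M^tQ\mathbf{x}_0 \quad \text{and} \quad \mathbf{v}_t = M^t\mathbf{v}_0 = QT^tQ^{-1}\mathbf{v}_0. \tag{15}$$

In the absence of recombination (i.e. $r_{AB}$, $r_{AC}$, etc. → 0), $U \to I$ and $R \to I$ (Equations 9, 12), and hence $M \to QLQ^{-1}$. In the absence of drift (i.e. 1/N → 0), $L \to I$ and $K \to I$ (Equations 10, 13), so that $U = Q^{-1}RQ$. As R is diagonal, the relationship $U = Q^{-1}RQ$ is a convenient route for constructing U in the multi-locus case; indeed the most simply obtained matrices are R, Q and L.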
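These similarity relations can be checked numerically for two loci. The sketch below builds the matrices of Equations 5, 9, 10, 12 and 13 for arbitrary (assumed) values of N and the recombination fraction; the function name is hypothetical.

```python
import numpy as np


def two_locus_matrices(N, r):
    """Return Q, U, R, L, K for two loci (Equations 5, 9, 12, 10, 13)."""
    e = 1.0 / (2 * N)
    Q = np.array([[1.0, -2.0, 1.0], [0.0, 1.0, -1.0], [0.0, 0.0, 1.0]])
    U = np.array([
        [(1 - r) ** 2, 2 * r * (1 - r), r ** 2],
        [0.0, 1 - r, r],
        [0.0, 0.0, 1.0],
    ])
    R = np.diag([(1 - r) ** 2, 1 - r, 1.0])
    L = (1 - e) * np.array([
        [1.0, 0.0, 0.0],
        [e, 1 - 2 * e, 0.0],
        [2 * e ** 2, 4 * e * (1 - 2 * e), (1 - 2 * e) * (1 - 3 * e)],
    ])
    K = (1 - e) * np.array([
        [e ** 2 + (1 - e) ** 2, 4 * e * (1 - e), e],
        [e * (1 - 2 * e), (1 - 2 * e) ** 2, 0.0],
        [2 * e ** 2, 4 * e * (1 - e), 1 - e],
    ])
    return Q, U, R, L, K


# Illustrative parameters (assumed).
Q, U, R, L, K = two_locus_matrices(N=25, r=0.1)
Q_inv = np.linalg.inv(Q)
```

For any such parameter values, `Q_inv @ R @ Q` reproduces U, `Q @ L @ Q_inv` reproduces K, and consequently `Q @ (L @ U) @ Q_inv` equals `K @ R`.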

Analysis for Three (or More) Loci

The elements of the diagonal matrix R, defining changes in disequilibrium terms due to recombination, are also given in Table 1. Each is the product of the non-recombination probabilities of the disequilibria appearing in the corresponding moment. The elements of L given in Table 2 are obtained by extending the arguments used to derive L for two loci (Equation 10 and subsequent text, and Appendix 2).

Table 2.

Elements of L for three loci. To reduce expressions, ν = 2N. All elements should be multiplied by ξ1 = [1 − 1/(2N)]. Other terms are ξi = [1 − i/(2N)], i = 2, …, 5.

Row | Non-identity | Elements of L, columns 1 to 16
1 | X3,2:ABC,ABC | 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 | X3,3:A,BC,ABC | 1/ν ξ2 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 | X3,3:B,AC,ABC | 1/ν 0 ξ2 0 0 0 0 0 0 0 0 0 0 0 0 0
4 | X3,3:C,AB,ABC | 1/ν 0 0 ξ2 0 0 0 0 0 0 0 0 0 0 0 0
5 | X3,3:AB,AC,BC | 0 0 0 0 ξ2 0 0 0 0 0 0 0 0 0 0 0
6 | X3,4:A,B,C,ABC | 1/ν^2 ξ2/ν ξ2/ν ξ2/ν 0 ξ2ξ3 0 0 0 0 0 0 0 0 0 0
7 | X3,4:A,A,BC,BC | 2/ν^2 4ξ2/ν 0 0 0 0 ξ2ξ3 0 0 0 0 0 0 0 0 0
8 | X3,4:B,B,AC,AC | 2/ν^2 0 4ξ2/ν 0 0 0 0 ξ2ξ3 0 0 0 0 0 0 0 0
9 | X3,4:C,C,AB,AB | 2/ν^2 0 0 4ξ2/ν 0 0 0 0 ξ2ξ3 0 0 0 0 0 0 0
10 | X3,4:B,C,AB,AC | 1/ν^2 0 ξ2/ν ξ2/ν ξ2/ν 0 0 0 0 ξ2ξ3 0 0 0 0 0 0
11 | X3,4:A,C,AB,BC | 1/ν^2 ξ2/ν 0 ξ2/ν ξ2/ν 0 0 0 0 0 ξ2ξ3 0 0 0 0 0
12 | X3,4:A,B,AC,BC | 1/ν^2 ξ2/ν ξ2/ν 0 ξ2/ν 0 0 0 0 0 0 ξ2ξ3 0 0 0 0
13 | X3,5:A,A,B,C,BC | 2/ν^3 4ξ2/ν^2 2ξ2/ν^2 2ξ2/ν^2 2ξ2/ν^2 2ξ2ξ3/ν ξ2ξ3/ν 0 0 0 2ξ2ξ3/ν 2ξ2ξ3/ν ξ2ξ3ξ4 0 0 0
14 | X3,5:B,B,A,C,AC | 2/ν^3 2ξ2/ν^2 4ξ2/ν^2 2ξ2/ν^2 2ξ2/ν^2 2ξ2ξ3/ν 0 ξ2ξ3/ν 0 2ξ2ξ3/ν 0 2ξ2ξ3/ν 0 ξ2ξ3ξ4 0 0
15 | X3,5:C,C,A,B,AB | 2/ν^3 2ξ2/ν^2 2ξ2/ν^2 4ξ2/ν^2 2ξ2/ν^2 2ξ2ξ3/ν 0 0 ξ2ξ3/ν 2ξ2ξ3/ν 2ξ2ξ3/ν 0 0 0 ξ2ξ3ξ4 0
16 | X3,6:A,A,B,B,C,C | 4/ν^4 8ξ2/ν^3 8ξ2/ν^3 8ξ2/ν^3 8ξ2/ν^3 8ξ2ξ3/ν^2 2ξ2ξ3/ν^2 2ξ2ξ3/ν^2 2ξ2ξ3/ν^2 8ξ2ξ3/ν^2 8ξ2ξ3/ν^2 8ξ2ξ3/ν^2 4ξ2ξ3ξ4/ν 4ξ2ξ3ξ4/ν 4ξ2ξ3ξ4/ν ξ2ξ3ξ4ξ5

The matrix U for three loci, which defines recombination among identity coefficients, can be derived directly by considering how recombination among haplotypes can provide the relevant reduced set. For example, for element (12,16) the haplotypes for $X_{3,4:A,B,AC,BC}$ can arise from six haplotypes corresponding to $X_{3,6:A,A,B,B,C,C}$ by recombination both between A - - and - - C and between - B - and - - C haplotypes, with probability $r_{AC}r_{BC}$. It is simpler, however, to compute $U = Q^{-1}RQ$.
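The construction of U from Q and R can be sketched as below, with the elements of Q and R transcribed from Table 1. The function and variable names are hypothetical, the recombination fractions are illustrative, and the loci are assumed to be in the order A, B, C with no interference, as in the footnote to Table 1. The multipliers κ are not needed here because they form a diagonal matrix that commutes with R and cancels in the product.

```python
import numpy as np

# Rows of the upper triangular matrix Q from Table 1; row i lists columns i..16.
Q_UPPER = [
    [1, -2, -2, -2, 0, 4, 1, 1, 1, 2, 2, 2, -4, -4, -4, 4],
    [1, 0, 0, 0, -1, -1, 0, 0, 0, -1, -1, 3, 1, 1, -2],
    [1, 0, 0, -1, 0, -1, 0, -1, 0, -1, 1, 3, 1, -2],
    [1, 0, -1, 0, 0, -1, -1, -1, 0, 1, 1, 3, -2],
    [1, 0, 0, 0, 0, -1, -1, -1, 1, 1, 1, -1],
    [1, 0, 0, 0, 0, 0, 0, -1, -1, -1, 2],
    [1, 0, 0, 0, 0, 0, -2, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, -2, 0, 1],
    [1, 0, 0, 0, 0, 0, -2, 1],
    [1, 0, 0, 0, -1, -1, 1],
    [1, 0, -1, 0, -1, 1],
    [1, -1, -1, 0, 1],
    [1, 0, 0, -1],
    [1, 0, -1],
    [1, -1],
    [1],
]


def three_locus_U(r_ab, r_bc):
    """Return (U, r_AC) with U = Q^{-1} R Q for loci ordered A, B, C."""
    s_ab, s_bc = 1.0 - r_ab, 1.0 - r_bc
    s_ac = s_ab * s_bc + r_ab * r_bc  # 1 - r_AC without interference
    s_abc = s_ab * s_bc               # 1 - r_ABC without interference
    R = np.diag([
        s_abc ** 2, s_bc * s_abc, s_ac * s_abc, s_ab * s_abc, s_ab * s_ac * s_bc,
        s_abc, s_bc ** 2, s_ac ** 2, s_ab ** 2,
        s_ab * s_ac, s_ab * s_bc, s_ac * s_bc,
        s_bc, s_ac, s_ab, 1.0,
    ])
    Q = np.zeros((16, 16))
    for i, row in enumerate(Q_UPPER):
        Q[i, i:] = row
    # Solving Q U = R Q gives U = Q^{-1} R Q without forming the inverse.
    return np.linalg.solve(Q, R @ Q), 1.0 - s_ac


U3, r_ac = three_locus_U(r_ab=0.1, r_bc=0.2)
```

Element (12,16) of the result, `U3[11, 15]` with zero-based indexing, equals r_AC r_BC as derived directly in the text, and each row of `U3` sums to one.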

Mutation

Consider the infinite alleles model of Kimura and Crow (1964), such that mutation leads to a new allele, and also assume that alleles that are not identical in state as a consequence of mutation are not identical by descent, even though their ancestry coalesces. Hence the probabilities of multiple-locus heterozygosity and identity asymptote at values dependent on population size, mutation rates and recombination fractions. Let $u_A$ denote the mutation rate at locus A, and assume it is sufficiently small that all terms such as $u_A^2$, $u_A/N$, $u_A r_{AB}$ and similarly for other loci can be ignored. Then, in the present notation,

$$X_{1,2:A,A,t+1} = [1 - 1/(2N)]X_{1,2:A,A,t} + 2u_A(1 - X_{1,2:A,A,t}), \tag{16}$$

which is simply $1 - F_{A,t+1} = [1 - 1/(2N)](1 - F_{A,t}) + 2u_A F_{A,t}$, a version of the standard equation of Kimura and Crow (1964). Similarly, allowing for mutation among haplotypes which are identical at one locus and not at the other, for two loci

$$\mathbf{x}_{t+1} = T\mathbf{x}_t + 2u_A(X_{1,2:B,B,t}\mathbf{1} - \mathbf{x}_t) + 2u_B(X_{1,2:A,A,t}\mathbf{1} - \mathbf{x}_t) \tag{17}$$

where 1 is the unit vector. For three loci, an example (element 2) of the relevant expressions, letting t2′ denote the second row of T, is:

$$X_{3,3:A,BC,ABC,t+1} = \mathbf{t}_2'\mathbf{x}_t + 2u_A(X_{2,2:BC,BC,t} - X_{3,3:A,BC,ABC,t}) + 2u_B(X_{2,3:A,C,AC,t} - X_{3,3:A,BC,ABC,t}) + 2u_C(X_{2,3:A,B,AB,t} - X_{3,3:A,BC,ABC,t}). \tag{18}$$
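The single-locus recurrence in Equation 16 can be iterated to show how non-identity approaches an asymptote. In the sketch below the function names and parameter values are assumptions; the equilibrium expression is obtained by setting the value at generation t + 1 equal to that at generation t in Equation 16, which gives 4Nu/(1 + 4Nu).

```python
def non_identity_with_mutation(N, u, generations, x0=1.0):
    """Iterate Equation 16 for one locus, starting from non-identity x0."""
    x = x0
    for _ in range(generations):
        x = (1.0 - 1.0 / (2 * N)) * x + 2.0 * u * (1.0 - x)
    return x


def equilibrium_non_identity(N, u):
    """Fixed point of Equation 16: 2u / (1/(2N) + 2u) = 4Nu / (1 + 4Nu)."""
    return 4.0 * N * u / (1.0 + 4.0 * N * u)


# Illustrative parameters (assumed): N = 100 diploids, u = 0.001 per generation.
x_late = non_identity_with_mutation(100, 1e-3, 5000)
x_eq = equilibrium_non_identity(100, 1e-3)
```

After several thousand generations the iterated value is indistinguishable from the fixed point, which depends only on the product Nu.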

DISCUSSION

The main aim of this analysis was to derive and show clearly the relationship between multi-locus identities and moments of linkage disequilibrium and hence how to compute each most simply. The analysis here is restricted solely to the haploid model, but in a random mating population it becomes an increasingly accurate predictor for the diploid case for small recombination fractions and large population size (see Weir and Cockerham, 1974, for two-locus examples). The assumption of a random mating population is more critical than just in the use of the haploid model. In the absence of random mating, the changes in identity cannot be described solely in terms of two-gene measures at each locus, and it is necessary to include more terms in the transition calculations (Weir and Cockerham, 1974). Further, the correspondence between identities and moments of linkage disequilibria also becomes more complicated.

Predictions of multi-locus identity are needed in, for example, prediction of the origins of haplotypes in pedigrees and of associating them with particular traits or disease conditions. While the method developed here is feasible for four loci it becomes unwieldy with more. Hernández-Sánchez et al. (2004) showed how to predict probabilities of identity at three and four loci from those on two loci by a regression analysis. They used the exact results for two loci of Weir and Cockerham (1974) to obtain $F_{AB}$, $F_{AC}$ and $F_{BC}$ (e.g. $F_{AB} = F_{2,2:AB,AB}$) using the appropriate recombination fractions, and from these the covariances and regression coefficients of identity, e.g. $\mathrm{cov}(F_A, F_C) = F_{AC} - F_A F_C$ and $\beta_{C.A} = \mathrm{cov}(F_A, F_C)/\mathrm{Var}(F_A) = (F_{AC} - F_A F_C)/[F_A(1 - F_A)]$. With these they computed the expected probability of identity at locus C given identity at A and B from a partial regression equation, and thus the three locus identity $F_{ABC}$. Hernández-Sánchez et al. extended their regression method to predict identity at four loci in a two-step process from that predicted for three loci. The procedure gave good predictions as checked by simulation for three and four locus identity, although predictions were better for three loci and when the fitted loci were between the two reference loci. The results given here for three or four loci could be extended by standard multiple regression methods to make predictions for five or more loci, and thereby obtain accurate predictions of multi-locus identity for substantial lengths of the chromosome. They cannot however be used directly for other mating systems, whereas two-locus equations for different systems, including avoidance of selfing, are given by Weir and Cockerham (1974) and can be incorporated in the regression method. In a subsequent paper we shall consider other ways to predict multi-locus from two-locus identity.

Whilst it is usual to describe association between loci in terms of linkage disequilibria or derived statistics such as the correlation of gene frequencies, it can, for example, also be described in terms of observed multi-locus heterozygosities. For example, the ratio of non-identities X2,2:AB,AB/XAXB is also equal to the ratio of expected heterozygosity at a pair of loci relative to that of independent loci. The numbers of heterozygous loci have, for example, been used by Brown et al. (1980) to describe multi-locus association and by Weir et al. (2004), albeit less directly, as the number of multi-locus genotypes to identify regions of differential recombination rates in the human genome. Further, an important use of data on genetic markers is to compute relationships among pairs of individuals in non-pedigreed populations. This has been based on independent loci, but increasingly as we get more information on closely linked markers, relationships among linked loci will provide further information (Chapman and Thompson, 2002; Hu, 2005).

Analyses of population structure from samples of loci are based on the assumption that the genetic distance between them or from their ancestral population (F) is the same throughout the genome, apart from local sampling. Analyses of potential selective sweeps of human HapMap data show, however, that there are substantial differences among regions of the genome in the levels of population dispersion (Weir et al., 2005), which should be quantified in multi-locus terms. The relationships between them enable multi-locus identity to be inferred from moments of linkage disequilibria which have known sampling properties.

Acknowledgments

This work was supported in part by NIH Grant GM 45344. We are grateful to Jules Hernández-Sánchez for helpful comments on an earlier draft.

APPENDIX 1: Definitions of Vector x of Non-identity Coefficients for Three Loci, and Correspondence to Identity Coefficients

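To shorten the equations, let $G = 1 - F_A - F_B - F_C$.

$$\begin{aligned}
x_1 &= X_{3,2:ABC,ABC} = G + F_{2,2:AB,AB} + F_{2,2:AC,AC} + F_{2,2:BC,BC} - F_{3,2:ABC,ABC} \\
x_2 &= X_{3,3:A,BC,ABC} = G + F_{2,3:A,B,AB} + F_{2,3:A,C,AC} + F_{2,2:BC,BC} - F_{3,3:A,BC,ABC} \\
x_3 &= X_{3,3:B,AC,ABC} = G + F_{2,3:A,B,AB} + F_{2,3:B,C,BC} + F_{2,2:AC,AC} - F_{3,3:B,AC,ABC} \\
x_4 &= X_{3,3:C,AB,ABC} = G + F_{2,3:A,C,AC} + F_{2,3:B,C,BC} + F_{2,2:AB,AB} - F_{3,3:C,AB,ABC} \\
x_5 &= X_{3,3:AB,AC,BC} = G + F_{2,3:A,B,AB} + F_{2,3:A,C,AC} + F_{2,3:B,C,BC} - F_{3,3:AB,AC,BC} \\
x_6 &= X_{3,4:A,B,C,ABC} = G + F_{2,3:A,B,AB} + F_{2,3:A,C,AC} + F_{2,3:B,C,BC} - F_{3,4:A,B,C,ABC} \\
x_7 &= X_{3,4:A,A,BC,BC} = G + F_{2,4:A,A,B,B} + F_{2,4:A,A,C,C} + F_{2,2:BC,BC} - F_{3,4:A,A,BC,BC} \\
x_8 &= X_{3,4:B,B,AC,AC} = G + F_{2,4:A,A,B,B} + F_{2,4:B,B,C,C} + F_{2,2:AC,AC} - F_{3,4:B,B,AC,AC} \\
x_9 &= X_{3,4:C,C,AB,AB} = G + F_{2,4:A,A,C,C} + F_{2,4:B,B,C,C} + F_{2,2:AB,AB} - F_{3,4:C,C,AB,AB} \\
x_{10} &= X_{3,4:B,C,AB,AC} = G + F_{2,4:B,B,C,C} + F_{2,3:A,B,AB} + F_{2,3:A,C,AC} - F_{3,4:B,C,AB,AC} \\
x_{11} &= X_{3,4:A,C,AB,BC} = G + F_{2,4:A,A,C,C} + F_{2,3:A,B,AB} + F_{2,3:B,C,BC} - F_{3,4:A,C,AB,BC} \\
x_{12} &= X_{3,4:A,B,AC,BC} = G + F_{2,4:A,A,B,B} + F_{2,3:A,C,AC} + F_{2,3:B,C,BC} - F_{3,4:A,B,AC,BC} \\
x_{13} &= X_{3,5:A,A,B,C,BC} = G + F_{2,4:A,A,B,B} + F_{2,4:A,A,C,C} + F_{2,3:B,C,BC} - F_{3,5:A,A,B,C,BC} \\
x_{14} &= X_{3,5:B,B,A,C,AC} = G + F_{2,4:A,A,B,B} + F_{2,4:B,B,C,C} + F_{2,3:A,C,AC} - F_{3,5:B,B,A,C,AC} \\
x_{15} &= X_{3,5:C,C,A,B,AB} = G + F_{2,4:A,A,C,C} + F_{2,4:B,B,C,C} + F_{2,3:A,B,AB} - F_{3,5:C,C,A,B,AB} \\
x_{16} &= X_{3,6:A,A,B,B,C,C} = G + F_{2,4:A,A,B,B} + F_{2,4:A,A,C,C} + F_{2,4:B,B,C,C} - F_{3,6:A,A,B,B,C,C}
\end{aligned}$$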

APPENDIX 2: Algorithm to Compute Coefficients of Elements of L

  1. Define vector y of dimension h as the haplotypes sampled in the parental array, with elements expressed in binary form, e.g. as follows for two loci: AB = 11, A = 10, B = 01. For example, if the parental haplotypes are AB, AB, y′ = (11, 11).

  2. Similarly define vector z of dimension h′ as the haplotypes sampled in the progeny array. For example, if the progeny haplotypes are A, B, AB, z′ = (10, 01, 11).

  3. Construct matrix Λ, dimension h′ by h, with elements which define whether the progeny haplotype can be obtained from sampling the parental haplotype:

    If zi & yj = zi, where & denotes the logical “and” operation, then the element is λij = zi; otherwise λij = 0.

  4. Consider all possible matrices Δ, dimension h′ by h, with elements δij = 0 or 1 and $\sum_{j=1}^{h}\delta_{ij} = 1$ for all i, i.e. one element in each row takes the value 1, the rest 0, with the non-zero element defining the parental haplotype as source of the progeny haplotype.

  5. If $\sum_{i=1}^{h'}\delta_{ij}\lambda_{ij} = y_j$ for all j, this implies that the sampling leads to non-identical haplotypes. Count the number, K, of all matrices Δ that satisfy this criterion.

    Count the number of pairs, c, of identical haplotypes among the parents; for example (see text following Equation 10) if y′ = (11, 11), then c = 1. Compute coefficient k in L as $K/2^c$.
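The steps above can be sketched as a brute-force enumeration. The function name is hypothetical and haplotypes are encoded as integer bit masks (for two loci AB = 0b11, A = 0b10, B = 0b01). An assignment that places a progeny haplotype on an incompatible parent is rejected at once, which is equivalent to setting λij = 0 because the column sums could then no longer all match y. The integer sum is used as in step 5, which is adequate when each locus occurs at most twice in the sample, as it does for the coefficients considered here.

```python
from collections import Counter
from itertools import product


def sampling_coefficient(parents, progeny):
    """Coefficient k of an element of L, following Appendix 2.

    parents: bit masks of the h parental haplotypes (vector y)
    progeny: bit masks of the h' progeny haplotypes (vector z)
    """
    h = len(parents)
    count = 0  # K: number of assignment matrices satisfying step 5
    # Each tuple assigns one source parent to every progeny haplotype (a matrix Delta).
    for sources in product(range(h), repeat=len(progeny)):
        totals = [0] * h
        compatible = True
        for z, j in zip(progeny, sources):
            if z & parents[j] != z:  # progeny haplotype not contained in parent j
                compatible = False
                break
            totals[j] += z
        if compatible and totals == list(parents):
            count += 1
    # c: number of pairs of identical parental haplotypes.
    c = sum(n * (n - 1) // 2 for n in Counter(parents).values())
    return count // 2 ** c


AB, A, B = 0b11, 0b10, 0b01
k_32 = sampling_coefficient([AB, A, B], [A, A, B, B])
k_31 = sampling_coefficient([AB, AB], [A, A, B, B])
```

These calls return k = 4 for element L32 and k = 2 for element L31, matching the worked examples given in the text for two loci.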


Contributor Information

William G. Hill, Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, West Mains Road, Edinburgh, EH9 3JT, UK

Bruce S. Weir, Department of Biostatistics, University of Washington, Seattle, WA 98195-7232, USA

References

  1. Brown AHD, Feldman MW, Nevo E. Multilocus structure of natural populations of Hordeum spontaneum. Genetics. 1980;96:523–536. doi: 10.1093/genetics/96.2.523.
  2. Chapman NH, Thompson EA. The effect of population history on the lengths of ancestral chromosome segments. Genetics. 2002;162:449–458. doi: 10.1093/genetics/162.1.449.
  3. Hernández-Sánchez J, Haley CS, Woolliams JA. On the prediction of simultaneous inbreeding coefficients at multiple loci. Genet Res. 2004;83:113–120. doi: 10.1017/s0016672303006633.
  4. Hayes BJ, Visscher PM, McPartlan HC, Goddard ME. Novel multilocus measure of linkage disequilibrium to estimate past effective population size. Genome Res. 2003;13:635–643. doi: 10.1101/gr.387103.
  5. Hill WG. Disequilibrium among several linked neutral genes in finite population. II Variances and covariances of disequilibria. Theor Pop Biol. 1974;6:184–198. doi: 10.1016/0040-5809(74)90023-9.
  6. Hill WG, Robertson A. Linkage disequilibrium in finite populations. Theor Appl Genet. 1968;38:226–231. doi: 10.1007/BF01245622.
  7. Hill WG, Weir BS. Variances and covariances of squared linkage disequilibria in finite populations. Theor Pop Biol. 1988;33:54–78. doi: 10.1016/0040-5809(88)90004-4.
  8. Hu XS. Estimating the correlation of pairwise relatedness along chromosomes. Heredity. 2005;94:338–346. doi: 10.1038/sj.hdy.6800586.
  9. Kimura M, Crow JF. The number of alleles that can be maintained in a finite population. Genetics. 1964;49:725–738. doi: 10.1093/genetics/49.4.725.
  10. Weir BS, Cardon LR, Anderson AD, Nielsen DM, Hill WG. Measures of human population structure show heterogeneity among genomic regions. Genome Res. 2005;15:1468–1476. doi: 10.1101/gr.4398405.
  11. Weir BS, Cockerham CC. Group inbreeding with two linked loci. Genetics. 1969;63:711–742. doi: 10.1093/genetics/63.3.711.
  12. Weir BS, Cockerham CC. Behavior of pairs of loci in finite monoecious populations. Theor Pop Biol. 1974;6:323–354. doi: 10.1016/0040-5809(74)90015-x.
  13. Weir BS, Hill WG, Cardon LR. Allelic association patterns for a dense SNP map. Genetic Epidemiology. 2004;24:1–9. doi: 10.1002/gepi.20038.
