Author manuscript; available in PMC: 2012 Aug 1.
Published in final edited form as: Theor Popul Biol. 2011 May 6;80(1):29–37. doi: 10.1016/j.tpb.2011.04.004

Genetic Diversity of Microsatellite Loci in Hierarchically Structured Populations

Seongho Song *,1, Dipak K Dey **, Kent E Holsinger ***
PMCID: PMC3124608  NIHMSID: NIHMS300091  PMID: 21575649

Abstract

Microsatellite loci are widely used for investigating patterns of genetic variation within and among populations. Those patterns are in turn determined by population sizes, migration rates, and mutation rates. We provide exact expressions for the first two moments of the allele frequency distribution in a stochastic model appropriate for studying microsatellite evolution with migration, mutation, and drift under the assumption that the range of allele sizes is bounded. Using these results we study the behavior of several measures related to Wright's FST, including Slatkin's RST. Our analytical approximations for FST and RST show that familiar relationships between Nem and FST or RST hold when migration and mutation rates are small. Using the exact expressions for FST and RST, our numerical results show that when migration and mutation rates are large, these relationships no longer hold. Our numerical results also show that the diversity measures most closely related to FST depend on mutation rates, mutational models (stepwise versus two-phase), migration rates, and population sizes. Surprisingly, RST is relatively insensitive to mutation rates and mutational models. The differing behaviors of RST and FST suggest that properties of the among-population distribution of allele frequencies may allow the roles of mutation and migration in producing patterns of diversity to be distinguished, a topic of continuing investigation.

Keywords: genetic diversity, genetic drift, microsatellite loci, mutation, migration, RST, FST

1. Introduction

In the last decade microsatellite markers have become a standard tool for genetic analysis. Because of the relative ease with which they can be isolated and the large allelic diversity commonly present at each locus, they are widely used in the construction of genetic maps (e.g., Chistiakov et al., 2005; Kai et al., 2005), the identification of quantitative trait loci (e.g., Allan et al., 2005; Minvielle et al., 2005), and the analysis of gene flow and other evolutionary processes (e.g., Hall and Willis, 2005; Kretzer et al., 2005). In particular, many evolutionary applications use measures of population divergence derived from microsatellite markers as an indicator of the evolutionary distance among populations or the degree of evolutionary connection among them (e.g., (δμ)²: Goldstein et al., 1995; RST: Slatkin, 1995). These measures are inspired by Wright's (1951) observation that the proportion of genetic diversity due to among-population differentiation can be a useful index of the degree to which populations are evolutionarily connected by gene flow. For more than fifty years Wright's F-statistics have been the most widely used index for describing the genetic structure of populations.

Useful as they are, Wright's F-statistics are implicitly based on the assumption that all alleles are mutationally equidistant from one another. Indeed, the widely adopted Weir and Cockerham framework for evolutionary inference from F-statistics (Weir and Cockerham, 1984; Weir and Hill, 2002) is based on co-ancestry or probabilities of identity by descent. Thus, an “infinite alleles” model of mutation underlies typical approaches to evolutionary inference from Wright's F-statistics (compare Rousset, 1996). As Slatkin (1995) and Goldstein et al. (1995) pointed out, however, an infinite alleles model of mutation may not be appropriate for microsatellite loci. The combination of high mutation rates (see, for example, Xu et al., 2005) and predominantly stepwise mutation among adjacent allele classes (see, for example, Calabrese et al., 2001) implies that alleles of the same size may have different mutational histories (i.e., homoplasy, see Estoup et al., 2002) and that alleles of similar size will tend to covary in frequency.

Several authors have studied the evolutionary dynamics of microsatellite loci either in models of isolated populations (for example, Feldman et al., 1997) or in models where migration occurs according to a finite-island model (for example, Rousset, 1996). For a stepwise mutation model (Ohta and Kimura, 1973; Wehrhahn, 1975), Rousset shows that RST has the familiar relationship with the migration rate and population size when migration and mutation are rare. Specifically,

$$R_{ST} = \frac{1}{4Nm\alpha + 1}, \tag{1}$$

where N is the local population size, m is the (backwards) migration rate, α = k/(k − 1), and k is the number of populations. Similarly, simulations in Feldman et al. (1997) show that (δμ)² increases roughly linearly with time since divergence for isolated populations. Nonetheless, the relationship in (1) is only approximate. The magnitude of among-population differentiation tends to be smaller than predicted from (1) because of mutation-induced homoplasy (Rousset, 1996).
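As a quick numerical illustration of relationship (1), the following minimal sketch evaluates it for the parameter combinations used later in the tables. The function name `rst_island_approx` and the choice to pass 2N rather than N are conventions adopted here for illustration only.

```python
def rst_island_approx(two_n: int, m: float, k: int) -> float:
    """Approximation (1): R_ST = 1 / (4 N m alpha + 1), with alpha = k / (k - 1)."""
    n = two_n / 2.0          # local population size N (2N allele copies in diploids)
    alpha = k / (k - 1.0)    # finite-island factor
    return 1.0 / (4.0 * n * m * alpha + 1.0)


if __name__ == "__main__":
    for k in (5, 100):
        for m in (0.1, 0.01, 0.001):
            value = rst_island_approx(100, m, k)
            print(f"k={k:3d}  m={m:<6}  R_ST ~ {value:.5f}")
```

With 2N = 100, k = 5 and m = 0.1 this gives 1/26, the value listed as RST,L in Table 3 for that parameter combination.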

In this paper, we use the modeling framework introduced in Fu et al. (2003) to provide the allele frequency distribution in a stochastic evolutionary model appropriate for investigating the evolutionary dynamics of microsatellite loci, under the assumption that the range of allele sizes is bounded (compare Feldman et al., 1997; Pollock et al., 1998). We use our results to study the behavior of RST and of two measures more directly related to Wright's FST as functions of local population size, migration rate, and mutation rate both for the usual stepwise model of mutation and for a more realistic two-phase model proposed by Di Rienzo et al. (1994) and studied numerically by Rousset (1996). Although all of the results we present assume that the drift-mutation-migration process has reached stationarity, the close relationship between RST and coalescence times (Slatkin, 1995) suggests that RST may be a useful index of gene flow among populations, so long as we remember that “gene flow” may refer either to recent common ancestry, continuing migration, or any combination of these two. Finally, we discuss implications of our results for analysis and interpretation of data derived from microsatellite loci.

2. Process Model and Results

Fu et al. (2003) described a general stochastic framework for the study of drift, migration, and mutation. Here we consider models developed within that framework that are designed to illuminate the evolutionary dynamics of microsatellite alleles in which the range of allele sizes is bounded (compare Feldman et al., 1997; Pollock et al., 1998). Specifically, we focus on a single locus with A alleles, $b_1, b_2, \ldots, b_A$. In the context of microsatellite variation, allele $b_{j+1}$ has one more repeat unit than allele $b_j$. Correspondingly, $b_1$ is the allele with the smallest number of repeat units and $b_A$ the allele with the largest number of repeat units. Let V, of dimension A × A, be a general mutation matrix with elements $v_{rs}$, the probability of mutation from allele type $b_r$ to allele type $b_s$, r = 1, …, A and s = 1, …, A.

2.1. Main Results

Assume that there are k populations indexed by i. Let M, of dimension k × k, be a general (backward) migration matrix, i.e., $M = ((m_{ij}))$, where $m_{ij}$ is the probability that an allele in population i came from population j (compare Nagylaki, 1982; Rousset, 1999, 2001). Let $p_i(t)$ be the A × 1 vector of allele frequencies in population i at generation t, with $\mathbf{1}'p_i(t) = 1$. Concatenate the $p_i(t)$ into a kA × 1 vector p(t) and define

$$p^*(t) = (M \otimes V')\,p(t), \tag{2}$$

where ⊗ denotes the Kronecker product between two matrices. Let $B = B(M, V') = M \otimes V'$, so that we can write $p^*(t) = B(M, V')\,p(t)$ for convenience. Let $N_i$ be the number of individuals in the ith population and let N be the vector of population sizes. For diploid organisms, the number of allele copies is $2N_i$. Given $p^*(t)$ and $N_i$, the $p_i(t+1)$ are conditionally independent with

$$2N_i\,p_i(t+1) \sim \mathcal{M}\bigl(2N_i,\; p_i^*(t)\bigr), \tag{3}$$

where $\mathcal{M}$ denotes a multinomial distribution. Through (2) and (3), we pass from $p(t) \to p^*(t) \to p(t+1)$.
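The two-step update in (2) and (3) can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' code: allele frequencies are stored as a k × A array whose row i is the frequency vector of population i, so that the Kronecker form of (2) becomes the matrix product M P V; the function names and the small example matrices are hypothetical.

```python
import numpy as np


def migrate_mutate(p, M, V):
    """Equation (2): p*(t) = (M kron V') p(t), written for a (k, A) frequency array."""
    # Row i of the result is sum_j m_ij * p_j' V, i.e. block i of (M kron V') p.
    return M @ p @ V


def next_generation(p, M, V, two_n, rng):
    """One generation: deterministic step (2) followed by multinomial drift (3)."""
    p_star = migrate_mutate(p, M, V)
    p_star = p_star / p_star.sum(axis=1, keepdims=True)   # guard against round-off
    counts = np.vstack([rng.multinomial(n, row) for n, row in zip(two_n, p_star)])
    return counts / np.asarray(two_n, dtype=float)[:, None]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    M = np.array([[0.9, 0.1], [0.1, 0.9]])                 # k = 2 populations
    V = np.array([[0.95, 0.05, 0.0],
                  [0.025, 0.95, 0.025],
                  [0.0, 0.05, 0.95]])                      # A = 3 alleles
    p = np.array([[0.2, 0.3, 0.5], [0.6, 0.3, 0.1]])
    for _ in range(5):
        p = next_generation(p, M, V, [100, 100], rng)
    print(p)
```

Storing the frequencies as a matrix avoids forming the kA × kA matrix B explicitly, which matters only for convenience at these sizes.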

Fu et al. (2003) showed that the stationary mean satisfies

$$u = B(M, V')\,u, \tag{4}$$

where u = E(p(t)|M, V, N) as t tends to infinity. Additional analysis shows that the stationary mean is identical across populations and corresponds to the left eigenvector of V associated with the leading eigenvalue of unity.

If we assume that Ni = N and that the entries in M do not depend on population indices, then a common distribution for all pi(t) will arise, regardless of V. Thus, we can describe the stationary covariance structure for the entire set of k populations in terms of a single covariance matrix within populations, Σ11, and another between populations, Σ12. Given that the entries in M do not depend on population indices, we denote the diagonal elements of M by (1 − m) and the off-diagonal elements as m/(k − 1). This migration model corresponds to the finite-island model studied by Crow and Aoki (1984) and Cockerham and Weir (1987).
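A minimal sketch of the finite-island migration matrix just described may help fix notation. It also checks two identities that connect M to the quantity r_k = 2m − m²k/(k − 1) used in the stationary covariance equations of this section: the sum of squared entries in a row of M equals 1 − r_k, and the inner product of two different rows equals r_k/(k − 1). The function names are chosen here for illustration.

```python
import numpy as np


def island_migration(k: int, m: float) -> np.ndarray:
    """Backward migration matrix of the finite-island model."""
    M = np.full((k, k), m / (k - 1.0))   # off-diagonal entries: m / (k - 1)
    np.fill_diagonal(M, 1.0 - m)         # diagonal entries: 1 - m
    return M


def r_k(m: float, k: int) -> float:
    """r_k = 2m - m^2 k / (k - 1)."""
    return 2.0 * m - m * m * k / (k - 1.0)


if __name__ == "__main__":
    k, m = 5, 0.1
    M = island_migration(k, m)
    same = float(M[0] @ M[0])     # sum_j m_1j^2, weight on the within-population covariance
    cross = float(M[0] @ M[1])    # sum_j m_1j m_2j
    print(1.0 - same, r_k(m, k))
    print(cross, r_k(m, k) / (k - 1))
```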

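The stationary equations for covariances are

$$(B\Sigma B')_{11} = V'\left\{(1 - r_k)\Sigma_{11} + r_k\Sigma_{12}\right\}V$$

and

$$(B\Sigma B')_{12} = V'\left\{\frac{r_k}{k-1}\Sigma_{11} + \left(1 - \frac{r_k}{k-1}\right)\Sigma_{12}\right\}V,$$

where $r_k = r(m, k) = 2m - m^2 k/(k-1)$. Solving for $\Sigma_{11}$ and $\Sigma_{12}$,

$$\Sigma_{11} = \left(1 - \frac{1}{2N}\right)V'\left\{(1 - r_k)\Sigma_{11} + r_k\Sigma_{12}\right\}V + \frac{1}{2N}\left(\frac{1}{A}I_A - \frac{1}{A^2}\mathbf{1}_A\mathbf{1}_A'\right) \tag{5}$$

and

$$\Sigma_{12} = V'\left\{\frac{r_k}{k-1}\Sigma_{11} + \left(1 - \frac{r_k}{k-1}\right)\Sigma_{12}\right\}V \tag{6}$$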

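(compare Fu et al. 2003). Further analysis shows that

$$\Sigma_{11} = Q\Phi_{11}Q'\left(\frac{1}{A}I_A - \frac{1}{A^2}\mathbf{1}_A\mathbf{1}_A'\right) \tag{7}$$

and

$$\Sigma_{12} = Q\Phi_{12}Q'\left(\frac{1}{A}I_A - \frac{1}{A^2}\mathbf{1}_A\mathbf{1}_A'\right), \tag{8}$$

where $\Phi_{11} = \mathrm{diag}(\phi_{11,1}, \ldots, \phi_{11,A})$ and $\Phi_{12} = \mathrm{diag}(\phi_{12,1}, \ldots, \phi_{12,A})$ with

$$\phi_{11,j} = \frac{\dfrac{1}{2N}\left\{1 - \left(1 - \dfrac{r_k}{k-1}\right)\lambda_j^2\right\}}{\left[1 - \left(1 - \dfrac{1}{2N}\right)\lambda_j^2\right]\left[1 - \left(1 - \dfrac{k}{k-1}r_k\right)\lambda_j^2\right] - \dfrac{r_k}{2N}\lambda_j^2}$$

and

$$\phi_{12,j} = \frac{\dfrac{1}{2N}\,\dfrac{r_k}{k-1}\,\lambda_j^2}{\left[1 - \left(1 - \dfrac{1}{2N}\right)\lambda_j^2\right]\left[1 - \left(1 - \dfrac{k}{k-1}r_k\right)\lambda_j^2\right] - \dfrac{r_k}{2N}\lambda_j^2}$$

for j = 2, …, A (see Appendix A for details). Further, we note that, as k tends to infinity, $\phi_{12,j}$ tends to zero for all j = 2, …, A.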
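The closed forms for ϕ11,j and ϕ12,j can be checked numerically. The sketch below assumes, as in Appendix A, that V is symmetric so that equations (5) and (6) reduce, eigenvalue by eigenvalue, to a pair of scalar equations; it evaluates the closed forms and returns the residuals of those scalar equations. The function names and the shorthand `c`, `r`, `rp` are introduced here for illustration.

```python
import numpy as np


def phi_pair(lam, two_n, m, k):
    """Closed-form (phi_11, phi_12) for eigenvalue(s) lam of V under the island model."""
    lam2 = np.asarray(lam, dtype=float) ** 2
    c = 1.0 / two_n                          # 1 / (2N)
    r = 2.0 * m - m * m * k / (k - 1.0)      # r_k
    rp = r / (k - 1.0)                       # r_k / (k - 1); note k r_k / (k - 1) = r + rp
    denom = (1.0 - (1.0 - c) * lam2) * (1.0 - (1.0 - r - rp) * lam2) - r * c * lam2
    phi11 = c * (1.0 - (1.0 - rp) * lam2) / denom
    phi12 = c * rp * lam2 / denom
    return phi11, phi12


def stationary_residuals(lam, two_n, m, k):
    """Residuals of the scalar (per-eigenvalue) versions of equations (5) and (6)."""
    lam2 = np.asarray(lam, dtype=float) ** 2
    c = 1.0 / two_n
    r = 2.0 * m - m * m * k / (k - 1.0)
    rp = r / (k - 1.0)
    p11, p12 = phi_pair(lam, two_n, m, k)
    res11 = p11 - ((1.0 - c) * lam2 * ((1.0 - r) * p11 + r * p12) + c)
    res12 = p12 - lam2 * (rp * p11 + (1.0 - rp) * p12)
    return res11, res12


if __name__ == "__main__":
    lam = np.linspace(0.9, 0.9999, 5)
    print(phi_pair(lam, 100, 0.01, 5))
    print(stationary_residuals(lam, 100, 0.01, 5))   # both arrays should be ~0
```

Increasing k in `phi_pair` while holding the other arguments fixed shows ϕ12,j shrinking toward zero, in line with the remark above.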

2.2. Stepwise Mutation Model for microsatellite loci

Microsatellite loci consist of short (2–6) nucleotide sequences repeated as many as 100 times (Tautz, 1993). Differences among alleles correspond to differences in the number of repeat units. Because mutations occur predominantly through mispairing or slippage, the stepwise mutation model originally developed for the study of charge-state variation in isozyme alleles (Ohta and Kimura, 1973; Wehrhahn, 1975) is a widely used approximation to the mutational process. The mutation matrix V in this case is of size A × A and is given by

$$V = \begin{pmatrix}
1-\frac{\mu}{2} & \frac{\mu}{2} & 0 & \cdots & 0 & 0 & 0 \\
\frac{\mu}{2} & 1-\mu & \frac{\mu}{2} & \cdots & 0 & 0 & 0 \\
0 & \frac{\mu}{2} & 1-\mu & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & \frac{\mu}{2} & 1-\mu & \frac{\mu}{2} \\
0 & 0 & 0 & \cdots & 0 & \frac{\mu}{2} & 1-\frac{\mu}{2}
\end{pmatrix},$$

where the allele sizes lie in the discrete space (1, 2, …, A), with allele size 1 corresponding to the smallest number of repeat units and allele size A corresponding to the largest number of repeat units.

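The eigenvalues of V are $\lambda_j = 1 - \mu + \mu\cos\!\left(\frac{(j-1)\pi}{A}\right)$ for j = 1, …, A. The corresponding eigenvectors do not depend on the mutation rate μ and are $q_j = \tilde{q}_j / \lVert\tilde{q}_j\rVert$ for j = 1, …, A, where $\tilde{q}_j = (\tilde{q}_{j1}, \ldots, \tilde{q}_{jA})'$ with $\tilde{q}_{jl} = \cos\bigl((2l-1)\alpha_j\bigr)/\cos(\alpha_j)$ for l = 1, 2, …, A and $\alpha_j = \frac{(j-1)\pi}{2A}$. Details of the analysis are provided in Appendix B.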
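The eigen-decomposition stated above is easy to confirm numerically. The following sketch builds the bounded stepwise mutation matrix (μ/2 on the off-diagonals, 1 − μ on the interior diagonal, 1 − μ/2 at the two boundary states) and the stated eigenvalues and normalized cosine eigenvectors, then prints the largest deviation of V Q from Q Λ. Function names are chosen here for illustration; the scalar factor 1/cos(α_j) is dropped because it cancels on normalization.

```python
import numpy as np


def stepwise_mutation(A: int, mu: float) -> np.ndarray:
    """Bounded stepwise mutation matrix V with reflecting boundaries."""
    V = np.zeros((A, A))
    idx = np.arange(A)
    V[idx, idx] = 1.0 - mu
    V[idx[:-1], idx[:-1] + 1] = mu / 2.0     # one repeat larger
    V[idx[1:], idx[1:] - 1] = mu / 2.0       # one repeat smaller
    V[0, 0] = V[-1, -1] = 1.0 - mu / 2.0     # boundary alleles mutate in one direction only
    return V


def smm_eigensystem(A: int, mu: float):
    """Eigenvalues lambda_j and unit-norm eigenvectors q_j (columns of Q)."""
    j = np.arange(1, A + 1)
    l = np.arange(1, A + 1)
    lam = 1.0 - mu + mu * np.cos((j - 1) * np.pi / A)
    alpha = (j - 1) * np.pi / (2 * A)
    Q = np.cos(np.outer(2 * l - 1, alpha))   # Q[l-1, j-1] = cos((2l - 1) alpha_j)
    Q /= np.linalg.norm(Q, axis=0)           # normalize each column
    return lam, Q


if __name__ == "__main__":
    A, mu = 10, 0.01
    V = stepwise_mutation(A, mu)
    lam, Q = smm_eigensystem(A, mu)
    print(np.max(np.abs(V @ Q - Q * lam)))   # should be at round-off level
```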

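Now we see that $q_1 = \frac{1}{\sqrt{A}}\mathbf{1}_A$ and $q_j'\mathbf{1}_A = 0$ for all j = 2, …, A. Furthermore, writing $J_A = \mathbf{1}_A\mathbf{1}_A'$, it turns out that

$$\begin{aligned}
\Sigma_{11} &= Q\Phi_{11}Q'\left(\frac{1}{A}I_A - \frac{1}{A^2}\mathbf{1}_A\mathbf{1}_A'\right)
= \sum_{j=1}^{A}\phi_{11,j}\,q_jq_j'\left(\frac{1}{A}I_A - \frac{1}{A^2}\mathbf{1}_A\mathbf{1}_A'\right) \\
&= \frac{1}{A}\sum_{j=1}^{A}\phi_{11,j}\,q_jq_j' - \frac{1}{A^2}\sum_{j=1}^{A}\phi_{11,j}\,q_jq_j'\mathbf{1}_A\mathbf{1}_A' \\
&= \frac{1}{A}\phi_{11,1}\,q_1q_1'\left(I_A - \frac{1}{A}J_A\right) + \frac{1}{A}\sum_{j=2}^{A}\phi_{11,j}\,q_jq_j' \\
&= \frac{1}{A^2}\phi_{11,1}\,J_A\left(I_A - \frac{1}{A}J_A\right) + \frac{1}{A}\sum_{j=2}^{A}\phi_{11,j}\,q_jq_j' \\
&= \frac{1}{A}\sum_{j=2}^{A}\phi_{11,j}\,q_jq_j'
\end{aligned}$$

and similarly

$$\Sigma_{12} = Q\Phi_{12}Q'\left(\frac{1}{A}I_A - \frac{1}{A^2}\mathbf{1}_A\mathbf{1}_A'\right) = \frac{1}{A}\sum_{j=2}^{A}\phi_{12,j}\,q_jq_j'.$$

Notice that $\Sigma_{11}$ and $\Sigma_{12}$ do not depend on the largest eigenvalue, $\lambda_1 = 1$, or its corresponding eigenvector, $q_1 = \frac{1}{\sqrt{A}}\mathbf{1}_A$.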

3. FST Analysis for Microsatellite Loci

Wright (1951) and Malécot (1948) introduced F-statistics to describe hierarchical structure in genetic data for one locus with two alleles, defining FST as a scaled variance

$$F_{ST} = \frac{\sigma_p^2}{\mu_p(1 - \mu_p)}, \tag{9}$$

where $\mu_p$ is the mean allele frequency across populations and $\sigma_p^2$ is the variance in allele frequency among populations. Equivalently, FST can be regarded as the intraclass correlation coefficient between pairs of alleles arising from a random-effects model of population sampling, as in the widely adopted Weir and Cockerham framework for population structure analysis (Cockerham, 1969; Weir and Cockerham, 1984; Weir, 1996; Weir and Hill, 2002). Fu et al. (2003) and Song et al. (2004) point out that in an evolutionary context there are two statistics related to equation (9) that might be of interest:

$$\theta^{(I)} = \frac{\sigma_{p(t)}^2}{\mu_p(1 - \mu_p)},$$

which corresponds to the scaled temporal variance in allele frequency, and

$$\theta_{(p_1(t), \ldots, p_k(t))} = E\left(\frac{\frac{1}{k}\sum_i \bigl(p_i(t) - \mu_{p(t)}\bigr)^2}{\mu_{p(t)}\bigl(1 - \mu_{p(t)}\bigr)}\right)$$

with $\mu_{p(t)} = \frac{1}{k}\sum_i p_i(t)$, which corresponds to the scaled geographical variance in allele frequency. When the number of populations exchanging genes is even moderately large, say 10 or more,

$$\theta^{(II)} = \frac{E\left(\frac{1}{k}\sum_i \bigl(p_i(t) - \mu_{p(t)}\bigr)^2\right)}{E\left(\mu_{p(t)}\bigl(1 - \mu_{p(t)}\bigr)\right)}$$

provides a satisfactory approximation to $\theta_{(p_1(t), \ldots, p_k(t))}$ by the Central Limit Theorem and Slutsky's theorem.

In addition to using the analytical results above to study the behavior of θ(I) and θ(II) as a function of mutation rates, migration rates, and population size, we will also consider the behavior of RST, an analogue of FST that is sensitive not only to allele frequency differences among populations but also to repeat-size differences among those alleles. Slatkin (1995) introduced RST, defining it as

$$R_{ST} = \frac{S - S_W}{S},$$

where

$$S = \frac{2N - 1}{2Nk - 1}\,S_W + \frac{2N(k - 1)}{2Nk - 1}\,S_B.$$

$S_W$ is the average sum of squares of the differences in allele size within each population, which is equivalent to $D_0$ of Goldstein et al. (1995), and $S_B$ is the average sum of squares of the differences in allele size between populations, which is equivalent to $D_1$ of Goldstein et al. (1995). Because $S_W$ and $S$ are proportional to the within-population and total variances, RST is just the proportion of the total allele-size variance accounted for by differences among populations. Therefore, RST has an interpretation similar to that of Weir and Cockerham's (1984) θ, which is also defined as a ratio of among-population to total variances. Moreover, Slatkin (1995) points out that for a stepwise mutation model RST is related to the excess coalescence time for alleles found in different populations. Specifically, $R_{ST} \approx (t - t_w)/t$, where $t$ is the average coalescence time for alleles drawn at random without respect to population and $t_w$ is the average coalescence time for alleles drawn at random within populations.

3.1. Asymptotic results for θ statistics

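Suppose that $(p_1(t), \ldots, p_k(t))$ arise under (2) and (3). At stationarity, it can be shown that

$$\theta^{(I)} = \frac{\mathrm{tr}(\Sigma_{11})}{A\,\frac{1}{A}\left(1 - \frac{1}{A}\right)} \tag{10}$$

and

$$\theta^{(II)} = \frac{1}{A}\sum_{j=1}^{A}\frac{\frac{k-1}{k}\left(\sigma_{11,j}^2 - \sigma_{12,j}^2\right)}{\frac{1}{A}\left(1 - \frac{1}{A}\right) - \frac{1}{k}\left(\sigma_{11,j}^2 + (k-1)\sigma_{12,j}^2\right)}, \tag{11}$$

where $\sigma_{11,j}^2$ and $\sigma_{12,j}^2$ are the jth diagonal elements of $\Sigma_{11}$ and $\Sigma_{12}$, respectively, and tr(·) denotes the trace of a matrix. Notice that when k is moderate to large

$$\theta^{(II)} \approx \sum_{j=1}^{A}\frac{\sigma_{11,j}^2 - \sigma_{12,j}^2}{A\left[\frac{1}{A}\left(1 - \frac{1}{A}\right) - \sigma_{12,j}^2\right]}.$$

Thus, θ(I) > θ(II) unless $\sigma_{12,j}^2 = 0$. Since $\sigma_{12,j}^2 \to 0$ as k → ∞, θ(II) → θ(I) as k → ∞. Moreover, as k tends to infinity, $r_k \to 2m - m^2$ and

$$\theta^{(I)} = \frac{1}{A-1}\sum_{j=2}^{A}\phi_{11,j} = \frac{1}{A-1}\sum_{j=2}^{A}\left[2N - (2N-1)(1 - r_k)\lambda_j^2\right]^{-1}.$$

Further, we observe that, once we ignore terms of order $O(Nm^2)$ and $O(N\mu^2)$,

$$\theta^{(I)} = \frac{1}{A-1}\sum_{j=2}^{A}\psi(j), \tag{12}$$

where $\psi(j) = \left[1 - 2m + 4Nm + 2(2N - 1)(1 - 2m)\mu(1 - \cos\alpha_j)\right]^{-1}$ for j = 2, …, A. Thus, θ(I) is the simple average of ψ(j). Moreover, if $O(N\mu)$ is negligible and m is negligible with respect to Nm, we find that $\theta^{(I)} \approx (1 + 4Nm)^{-1}$. Further, we observe that θ(I) has the lower bound $\left[1 - 2m + 4Nm + 4(2N - 1)(1 - 2m)\mu\right]^{-1}$ and the upper bound $(1 + 4Nm)^{-1}$.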
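The large-k expression for θ(I) given in this subsection can be evaluated directly from the eigenvalues λ_j of the stepwise mutation matrix and compared with the familiar value (1 + 4Nm)^−1. The sketch below does only that; it assumes the k → ∞ form with r_k = 2m − m², and the function name is hypothetical.

```python
import numpy as np


def theta_I_large_k(A: int, two_n: int, m: float, mu: float) -> float:
    """theta(I) as k -> infinity: mean over j = 2..A of [2N - (2N - 1)(1 - r)lambda_j^2]^-1."""
    j = np.arange(2, A + 1)
    lam = 1.0 - mu + mu * np.cos((j - 1) * np.pi / A)   # stepwise-model eigenvalues
    r = 2.0 * m - m * m                                 # limit of r_k as k -> infinity
    return float(np.mean(1.0 / (two_n - (two_n - 1.0) * (1.0 - r) * lam ** 2)))


if __name__ == "__main__":
    two_n, m, A = 100, 0.001, 10
    island = 1.0 / (1.0 + 2.0 * two_n * m)              # (1 + 4Nm)^-1 with 4N = 2 * (2N)
    for mu in (1e-2, 1e-3, 1e-4, 1e-8):
        print(f"mu={mu:g}  theta(I)={theta_I_large_k(A, two_n, m, mu):.5f}  (1+4Nm)^-1={island:.5f}")
```

Running the loop shows θ(I) falling as μ increases and approaching the neighborhood of (1 + 4Nm)^−1 only when μ is very small, which is the qualitative pattern discussed in Section 3.3.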

3.2. Asymptotic results for RST

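Rousset (1996) pointed out that with stepwise mutation and unbounded allele sizes, $R_{ST} \approx (1 + 4\alpha Nm)^{-1}$ where α = k/(k − 1). Our approach provides similar results. Specifically, when the terms $O(Nm^2)$ and $O(N\mu^2)$ are negligible,

$$R_{ST} \approx \sum_{j=2}^{A} w_j\,\psi(j),$$

where

$$w_j = -\,\frac{\sum_{l=1}^{A}\sum_{l'=1}^{A}(l - l')^2\,q_{jl}\,q_{jl'}}{\sum_{l=1}^{A}\sum_{l'=1}^{A}(l - l')^2 / A}$$

and ψ(j) is as before. Thus, RST is the weighted average of ψ(j), where the weights depend only on the differences in allele sizes and the number of alleles since

$$\sum_{j=2}^{A} q_{jl}\,q_{jl'} = \begin{cases} 1 - \dfrac{1}{A}, & l = l' \\[1ex] -\dfrac{1}{A}, & l \neq l'. \end{cases}$$

If m and μ are negligible and k tends to infinity, then

$$R_{ST} \approx \frac{1}{1 + 4Nm}\cdot\frac{-\frac{1}{A}\sum_{j=2}^{A}\sum_{l=1}^{A}\sum_{l'=1}^{A}(l - l')^2\,q_{jl}\,q_{jl'}}{\sum_{l=1}^{A}\sum_{l'=1}^{A}(l - l')^2 / A^2} = \frac{1}{1 + 4Nm}\sum_{j=2}^{A} w_j = (1 + 4Nm)^{-1},$$

showing that in the limit of a large number of populations, small mutation rates, and small migration rates θ(I), θ(II), and RST have equivalent values.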
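The claim that the weights w_j sum to one can be checked numerically from the cosine eigenvectors of the stepwise mutation matrix. The sketch below assumes the sign convention under which the weights are non-negative, that is, w_j is minus the double sum of (l − l′)² q_jl q_jl′ divided by the double sum of (l − l′)² over A; that convention, and the function name, are assumptions made here for illustration.

```python
import numpy as np


def rst_weights(A: int) -> np.ndarray:
    """Weights w_j, j = 2..A, built from the stepwise-model eigenvectors."""
    l = np.arange(1, A + 1)
    j = np.arange(2, A + 1)
    alpha = (j - 1) * np.pi / (2 * A)
    Q = np.cos(np.outer(2 * l - 1, alpha))                  # columns are q_2, ..., q_A
    Q /= np.linalg.norm(Q, axis=0)                          # unit-norm eigenvectors
    D2 = ((l[:, None] - l[None, :]) ** 2).astype(float)     # (l - l')^2
    num = -np.einsum("lj,lm,mj->j", Q, D2, Q)               # -sum_l sum_l' (l - l')^2 q_jl q_jl'
    return num / (D2.sum() / A)


if __name__ == "__main__":
    w = rst_weights(10)
    print(np.round(w, 5))
    print(w.sum())
```

Because each q_j with j ≥ 2 sums to zero, the numerator reduces to twice the squared first moment of q_j over allele size, so eigenvectors that are symmetric about the middle of the size range receive zero weight.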

3.3. Exact results from numerical studies

While the asymptotic results just presented provide some insight into patterns of population differentiation expected at microsatellite loci, they are limited in two respects. First, the degree to which asymptotic results apply when the number of populations is moderate or small and when mutation or migration rates are moderate is unknown. Second, they depend on a highly simplified model of mutation, namely the stepwise mutation model. In this section we use the exact results in (4)–(8) to study the behavior of θ(I), θ(II), and RST over a broad range of mutation rates and migration rates. In addition, we explore the sensitivity of these parameters to the details of the mutational process by comparing results from the stepwise mutation model with an extreme case of the two-phase model suggested by Di Rienzo et al. (1994). In the two-phase model mutations may increase or decrease microsatellite size by more than one repeat. Specifically, with probability ϕ, mutation increases or decreases allele size difference by one repeat, and with probability 1 − ϕ it increases or decreases allele size difference by j repeats, where j follows some probability distribution. Di Rienzo et al. (1994) considered a truncated geometric distribution where Pr(j) ∝ α^j for j ≥ 1. We restrict our attention to the case where ϕ = 0, which results in more multistep mutations than any other choice of ϕ.

Figure 1 displays the behavior of θ(I) and θ(II) as a function of μ and m for two combinations of k and A and 2N = 100. As expected, both parameters decrease towards zero as the migration rate increases. Similarly, both parameters decrease towards zero as the mutation rate increases, because mutation-induced homoplasy causes similarity among populations when mutation rates are high, unless mutation matrices are population dependent. The high values of θ(I) with small k, high m, and low μ may be initially surprising, but notice that θ(I) depends on the variance of allele frequencies over time, not over populations. Under these conditions populations will be nearly fixed for one allele or another at all times, causing the variance of allele frequency over time and θ(I) to be near their maxima. Thus, populations with similar allele frequencies at high mutation rate loci may be similar either because they exchange alleles frequently or because the mutation rate is large enough to swamp the effects of genetic drift (or because the populations have only recently diverged from one another; Felsenstein, 1982).

Figure 1. Plots of θ(I) and θ(II) versus m and μ with 2N = 100.

Tables 1 and 2 provide more details on the behavior of θ(I) and θ(II) for two extremes of local population size, 2N = 100 and 2N = 10,000, and for a variety of realistic migration and mutation parameters. Several additional observations emerge from examining these tables. First, equation (12) and the discussion that follows suggest that the expected value of θ(I) at stationarity should not depend on mutation rate when Nμ is small. Evaluation of the exact expression (10), on the other hand, shows that θ(I) is strongly dependent on mutation rates in the range of 10^−2 to 10^−4, which may be characteristic of microsatellite loci. Second, although the values of both θ(I) and θ(II) are influenced by the particular mutational model chosen, values associated with the stepwise mutation model typically differ from those associated with the two-phase model by only a few percent, and in no case are the differences greater than about 10%. Third, θ(I) and θ(II) are only weakly dependent on the size range (number) of alleles. Together these observations suggest that θ(I) and θ(II) are strongly influenced by the overall rate of mutation, but only weakly influenced by details of the mutational process. Finally, notice that values of θ(II) are smaller than corresponding values of θ(I), and that the differences can be substantial when k is small.

Table 1.

Behavior of θ(I) and θ(II) as a function of k, A, m, and mutational parameters when 2N = 100. The subscript SMM refers to results for the stepwise mutation model. The subscript 10 refers to results from the two-phase model with ϕ = 0 and α = 0.1. The subscript 90 refers to results from the two-phase model with ϕ = 0 and α = 0.9.

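| k | A | m | μ | θ(I), SMM | θ(I), 10 | θ(I), 90 | θ(II), SMM | θ(II), 10 | θ(II), 90 |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 10 | 0.1 | 0.01 | 0.20719 | 0.25370 | 0.18154 | 0.03055 | 0.03139 | 0.03150 |
| 5 | 10 | 0.1 | 0.001 | 0.57706 | 0.67914 | 0.65222 | 0.03232 | 0.03244 | 0.03246 |
| 5 | 10 | 0.1 | 0.0001 | 0.91358 | 0.94881 | 0.94804 | 0.03255 | 0.03256 | 0.03256 |
| 5 | 10 | 0.01 | 0.01 | 0.29395 | 0.34922 | 0.29359 | 0.15761 | 0.18282 | 0.18464 |
| 5 | 10 | 0.01 | 0.001 | 0.62695 | 0.70975 | 0.68321 | 0.22827 | 0.23543 | 0.23707 |
| 5 | 10 | 0.01 | 0.0001 | 0.91647 | 0.94977 | 0.94883 | 0.24304 | 0.24397 | 0.24419 |
| 5 | 10 | 0.001 | 0.01 | 0.39728 | 0.48408 | 0.44163 | 0.31972 | 0.39721 | 0.37635 |
| 5 | 10 | 0.001 | 0.001 | 0.76294 | 0.81812 | 0.80356 | 0.63171 | 0.68162 | 0.69078 |
| 5 | 10 | 0.001 | 0.0001 | 0.93523 | 0.95725 | 0.95532 | 0.74525 | 0.75367 | 0.75567 |
| 5 | 50 | 0.1 | 0.01 | 0.23408 | 0.29433 | 0.21945 | 0.03046 | 0.03143 | 0.03154 |
| 5 | 50 | 0.1 | 0.001 | 0.58506 | 0.70155 | 0.67689 | 0.03231 | 0.03244 | 0.03246 |
| 5 | 50 | 0.1 | 0.0001 | 0.91392 | 0.95289 | 0.95220 | 0.03254 | 0.03256 | 0.03256 |
| 5 | 50 | 0.01 | 0.01 | 0.31682 | 0.38499 | 0.32580 | 0.15500 | 0.18443 | 0.18703 |
| 5 | 50 | 0.01 | 0.001 | 0.63455 | 0.72958 | 0.70513 | 0.22743 | 0.23573 | 0.23732 |
| 5 | 50 | 0.01 | 0.0001 | 0.91688 | 0.95376 | 0.95291 | 0.24293 | 0.24400 | 0.24421 |
| 5 | 50 | 0.001 | 0.01 | 0.41302 | 0.51493 | 0.47196 | 0.31307 | 0.40536 | 0.39321 |
| 5 | 50 | 0.001 | 0.001 | 0.76745 | 0.83014 | 0.81614 | 0.62620 | 0.68408 | 0.69331 |
| 5 | 50 | 0.001 | 0.0001 | 0.93590 | 0.96049 | 0.95874 | 0.74424 | 0.75398 | 0.75592 |
| 100 | 10 | 0.1 | 0.01 | 0.06174 | 0.06767 | 0.05600 | 0.04593 | 0.04740 | 0.04740 |
| 100 | 10 | 0.1 | 0.001 | 0.15748 | 0.18610 | 0.12794 | 0.04908 | 0.04926 | 0.04928 |
| 100 | 10 | 0.1 | 0.0001 | 0.45111 | 0.54384 | 0.49488 | 0.04945 | 0.04947 | 0.04948 |
| 100 | 10 | 0.01 | 0.01 | 0.22611 | 0.26144 | 0.25009 | 0.21637 | 0.24937 | 0.24466 |
| 100 | 10 | 0.01 | 0.001 | 0.37258 | 0.39648 | 0.36177 | 0.31132 | 0.31999 | 0.32059 |
| 100 | 10 | 0.01 | 0.0001 | 0.57028 | 0.62966 | 0.58784 | 0.32964 | 0.33077 | 0.33099 |
| 100 | 10 | 0.001 | 0.01 | 0.39137 | 0.47652 | 0.43990 | 0.38762 | 0.47221 | 0.43679 |
| 100 | 10 | 0.001 | 0.001 | 0.72735 | 0.77196 | 0.76708 | 0.71809 | 0.76200 | 0.76183 |
| 100 | 10 | 0.001 | 0.0001 | 0.84835 | 0.86139 | 0.84932 | 0.81713 | 0.82355 | 0.82435 |
| 100 | 50 | 0.1 | 0.01 | 0.08378 | 0.09584 | 0.06601 | 0.04586 | 0.04751 | 0.04755 |
| 100 | 50 | 0.1 | 0.001 | 0.18772 | 0.22744 | 0.15971 | 0.04906 | 0.04927 | 0.04929 |
| 100 | 50 | 0.1 | 0.0001 | 0.46463 | 0.57335 | 0.52768 | 0.04945 | 0.04947 | 0.04948 |
| 100 | 50 | 0.01 | 0.01 | 0.24108 | 0.28438 | 0.26194 | 0.21575 | 0.25343 | 0.25055 |
| 100 | 50 | 0.01 | 0.001 | 0.39449 | 0.42630 | 0.38248 | 0.31048 | 0.32042 | 0.32122 |
| 100 | 50 | 0.01 | 0.0001 | 0.58191 | 0.65268 | 0.61317 | 0.32951 | 0.33081 | 0.33103 |
| 100 | 50 | 0.001 | 0.01 | 0.40289 | 0.50221 | 0.46659 | 0.39405 | 0.49237 | 0.46289 |
| 100 | 50 | 0.001 | 0.001 | 0.73267 | 0.78320 | 0.77510 | 0.71524 | 0.76512 | 0.76649 |
| 100 | 50 | 0.001 | 0.0001 | 0.85313 | 0.86904 | 0.85627 | 0.81644 | 0.82384 | 0.82469 |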

Table 2.

Behavior of θ(I) and θ(II) as a function of k, A, m, and mutational parameters when 2N = 10,000. Refer to Table 1 for an explanation of the subscripts.

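| k | A | m | μ | θ(I), SMM | θ(I), 10 | θ(I), 90 | θ(II), SMM | θ(II), 10 | θ(II), 90 |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 10 | 0.1 | 0.01 | 0.00393 | 0.00499 | 0.00221 | 0.00031 | 0.00032 | 0.00032 |
| 5 | 10 | 0.1 | 0.001 | 0.03273 | 0.04174 | 0.01884 | 0.00033 | 0.00033 | 0.00033 |
| 5 | 10 | 0.1 | 0.0001 | 0.18458 | 0.23256 | 0.15770 | 0.00033 | 0.00033 | 0.00033 |
| 5 | 10 | 0.01 | 0.01 | 0.00558 | 0.00696 | 0.00412 | 0.00198 | 0.00231 | 0.00224 |
| 5 | 10 | 0.01 | 0.001 | 0.03521 | 0.04426 | 0.02148 | 0.00297 | 0.00307 | 0.00307 |
| 5 | 10 | 0.01 | 0.0001 | 0.18658 | 0.23435 | 0.15973 | 0.00318 | 0.00319 | 0.00319 |
| 5 | 10 | 0.001 | 0.01 | 0.01018 | 0.01293 | 0.00790 | 0.00665 | 0.00836 | 0.00603 |
| 5 | 10 | 0.001 | 0.001 | 0.05002 | 0.06175 | 0.03949 | 0.01898 | 0.02219 | 0.02176 |
| 5 | 10 | 0.001 | 0.0001 | 0.20440 | 0.25092 | 0.17870 | 0.02863 | 0.02961 | 0.02974 |
| 5 | 50 | 0.1 | 0.01 | 0.01344 | 0.01782 | 0.00228 | 0.00031 | 0.00032 | 0.00032 |
| 5 | 50 | 0.1 | 0.001 | 0.06203 | 0.07874 | 0.03543 | 0.00033 | 0.00033 | 0.00033 |
| 5 | 50 | 0.1 | 0.0001 | 0.21222 | 0.27440 | 0.19694 | 0.00033 | 0.00033 | 0.00033 |
| 5 | 50 | 0.01 | 0.01 | 0.01506 | 0.01979 | 0.00433 | 0.00199 | 0.00236 | 0.00231 |
| 5 | 50 | 0.01 | 0.001 | 0.06440 | 0.08114 | 0.03801 | 0.00297 | 0.00307 | 0.00308 |
| 5 | 50 | 0.01 | 0.0001 | 0.21416 | 0.27608 | 0.19885 | 0.00317 | 0.00319 | 0.00319 |
| 5 | 50 | 0.001 | 0.01 | 0.01999 | 0.02637 | 0.00977 | 0.00723 | 0.00727 | 0.00937 |
| 5 | 50 | 0.001 | 0.001 | 0.07839 | 0.09796 | 0.05595 | 0.01880 | 0.02251 | 0.02230 |
| 5 | 50 | 0.001 | 0.0001 | 0.23140 | 0.29167 | 0.21666 | 0.02852 | 0.02965 | 0.02979 |
| 100 | 10 | 0.1 | 0.01 | 0.00066 | 0.00073 | 0.00059 | 0.00048 | 0.00049 | 0.00049 |
| 100 | 10 | 0.1 | 0.001 | 0.00233 | 0.00286 | 0.00146 | 0.00051 | 0.00051 | 0.00051 |
| 100 | 10 | 0.1 | 0.0001 | 0.01768 | 0.02256 | 0.00986 | 0.00051 | 0.00052 | 0.00052 |
| 100 | 10 | 0.01 | 0.01 | 0.00300 | 0.00357 | 0.00330 | 0.00282 | 0.00333 | 0.00320 |
| 100 | 10 | 0.01 | 0.001 | 0.00629 | 0.00699 | 0.00559 | 0.00448 | 0.00465 | 0.00465 |
| 100 | 10 | 0.01 | 0.0001 | 0.02187 | 0.02673 | 0.01414 | 0.00485 | 0.00487 | 0.00487 |
| 100 | 10 | 0.001 | 0.01 | 0.00913 | 0.01150 | 0.00784 | 0.00895 | 0.01126 | 0.00774 |
| 100 | 10 | 0.001 | 0.001 | 0.02877 | 0.03417 | 0.03170 | 0.02707 | 0.03197 | 0.03080 |
| 100 | 10 | 0.001 | 0.0001 | 0.05855 | 0.06460 | 0.05296 | 0.04282 | 0.04442 | 0.04442 |
| 100 | 50 | 0.1 | 0.01 | 0.00131 | 0.00168 | 0.00072 | 0.00048 | 0.00049 | 0.00049 |
| 100 | 50 | 0.1 | 0.001 | 0.00786 | 0.01055 | 0.00260 | 0.00051 | 0.00051 | 0.00051 |
| 100 | 50 | 0.1 | 0.0001 | 0.04122 | 0.05245 | 0.02050 | 0.00051 | 0.00052 | 0.00052 |
| 100 | 50 | 0.01 | 0.01 | 0.00368 | 0.00462 | 0.00355 | 0.00285 | 0.00344 | 0.00332 |
| 100 | 50 | 0.01 | 0.001 | 0.01177 | 0.01463 | 0.00675 | 0.00490 | 0.00467 | 0.00467 |
| 100 | 50 | 0.01 | 0.0001 | 0.04527 | 0.05645 | 0.02470 | 0.00485 | 0.00488 | 0.00488 |
| 100 | 50 | 0.001 | 0.01 | 0.01099 | 0.01431 | 0.00981 | 0.01020 | 0.01319 | 0.00959 |
| 100 | 50 | 0.001 | 0.001 | 0.03419 | 0.04227 | 0.03389 | 0.02732 | 0.03292 | 0.03193 |
| 100 | 50 | 0.001 | 0.0001 | 0.08060 | 0.09279 | 0.06297 | 0.04274 | 0.04454 | 0.04459 |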

As the results in Tables 3 and 4 show, there is one striking difference between the behavior of RST as a function of local population sizes, migration rate, and mutational parameters and the behavior of θ(I) and θ(II) as functions of those same parameters: RST is not only relatively insensitive to the choice of mutational model (stepwise versus two-phase), it is also relatively insensitive to the overall rate of mutation. Moreover, the expected value of RST at stationarity is relatively close to the value predicted for a finite-island model when the range of allele sizes is unbounded (Rousset, 1996). As Table 5 shows, however, the relatively small differences in RST may mask larger differences in the value of Nmα that would be inferred from them (compare Rousset, 1996), especially when the number of populations exchanging genes is small.
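The inference of Nmα from RST referred to here amounts to inverting relationship (1). A minimal sketch follows; the function name is hypothetical, and the two input values are stepwise-model entries of Table 3 for A = 10, μ = 0.001, m = 0.001, and 2N = 100.

```python
def nm_alpha_from_rst(rst: float) -> float:
    """Invert R_ST = 1 / (4 N m alpha + 1) to obtain N m alpha."""
    if not 0.0 < rst <= 1.0:
        raise ValueError("R_ST must lie in (0, 1]")
    return (1.0 / rst - 1.0) / 4.0


if __name__ == "__main__":
    # Stationary R_ST under the stepwise model, taken from Table 3.
    print(f"k = 100: {nm_alpha_from_rst(0.82450):.4f}")   # true N m alpha = 0.0505
    print(f"k = 5:   {nm_alpha_from_rst(0.75457):.4f}")   # true N m alpha = 0.0625
```

The inferred values exceed the true ones, and by a wider margin for k = 5 than for k = 100, which matches the corresponding entries of Table 5.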

Table 3.

Behavior of RST under different mutational models when 2N = 100. RST,L = 1/(4Nmα + 1), where α = k/(k − 1). The subscript SMM refers to the stepwise mutation model. The numerical subscripts, K, refer to the two-phase model with ϕ = 0 and α = K/100.

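| k | A | m | μ | RST,L | RST,SMM | RST,10 | RST,50 | RST,90 |
|---|---|---|---|---|---|---|---|---|
| 5 | 10 | 0.1 | 0.01 | 0.03846 | 0.03251 | 0.03254 | 0.03238 | 0.03210 |
| 5 | 10 | 0.1 | 0.001 | 0.03846 | 0.03262 | 0.03262 | 0.03261 | 0.03259 |
| 5 | 10 | 0.1 | 0.0001 | 0.03846 | 0.03263 | 0.03264 | 0.03263 | 0.03263 |
| 5 | 10 | 0.01 | 0.01 | 0.28571 | 0.23721 | 0.23879 | 0.22814 | 0.21107 |
| 5 | 10 | 0.01 | 0.001 | 0.28571 | 0.24418 | 0.24433 | 0.24356 | 0.24167 |
| 5 | 10 | 0.01 | 0.0001 | 0.28571 | 0.24523 | 0.24523 | 0.24520 | 0.24503 |
| 5 | 10 | 0.001 | 0.01 | 0.80000 | 0.69443 | 0.70790 | 0.61446 | 0.50037 |
| 5 | 10 | 0.001 | 0.001 | 0.80000 | 0.75457 | 0.75614 | 0.74634 | 0.72728 |
| 5 | 10 | 0.001 | 0.0001 | 0.80000 | 0.76277 | 0.76293 | 0.76232 | 0.76052 |
| 5 | 50 | 0.1 | 0.01 | 0.03846 | 0.03262 | 0.03262 | 0.03261 | 0.03259 |
| 5 | 50 | 0.1 | 0.001 | 0.03846 | 0.03263 | 0.03263 | 0.03263 | 0.03263 |
| 5 | 50 | 0.1 | 0.0001 | 0.03846 | 0.03264 | 0.03264 | 0.03264 | 0.03264 |
| 5 | 50 | 0.01 | 0.01 | 0.28571 | 0.24442 | 0.24455 | 0.24381 | 0.24268 |
| 5 | 50 | 0.01 | 0.001 | 0.28571 | 0.24518 | 0.24519 | 0.24510 | 0.24480 |
| 5 | 50 | 0.01 | 0.0001 | 0.28571 | 0.24536 | 0.24536 | 0.24536 | 0.24531 |
| 5 | 50 | 0.001 | 0.01 | 0.80000 | 0.75816 | 0.75924 | 0.75384 | 0.74623 |
| 5 | 50 | 0.001 | 0.001 | 0.80000 | 0.76276 | 0.76289 | 0.76198 | 0.76016 |
| 5 | 50 | 0.001 | 0.0001 | 0.80000 | 0.76391 | 0.76392 | 0.76383 | 0.76346 |
| 100 | 10 | 0.1 | 0.01 | 0.04717 | 0.04929 | 0.04933 | 0.04900 | 0.04844 |
| 100 | 10 | 0.1 | 0.001 | 0.04717 | 0.04948 | 0.04949 | 0.04945 | 0.04940 |
| 100 | 10 | 0.1 | 0.0001 | 0.04717 | 0.04950 | 0.04950 | 0.04950 | 0.04949 |
| 100 | 10 | 0.01 | 0.01 | 0.33110 | 0.32106 | 0.32327 | 0.30641 | 0.28134 |
| 100 | 10 | 0.01 | 0.001 | 0.33110 | 0.33099 | 0.33124 | 0.32947 | 0.32641 |
| 100 | 10 | 0.01 | 0.0001 | 0.33110 | 0.33206 | 0.33209 | 0.33195 | 0.33165 |
| 100 | 10 | 0.001 | 0.01 | 0.83193 | 0.76564 | 0.77831 | 0.68440 | 0.56754 |
| 100 | 10 | 0.001 | 0.001 | 0.83193 | 0.82450 | 0.82602 | 0.81454 | 0.79562 |
| 100 | 10 | 0.001 | 0.0001 | 0.83193 | 0.83116 | 0.83133 | 0.83025 | 0.82828 |
| 100 | 50 | 0.1 | 0.01 | 0.04717 | 0.04949 | 0.04950 | 0.04948 | 0.04946 |
| 100 | 50 | 0.1 | 0.001 | 0.04717 | 0.04950 | 0.04950 | 0.04950 | 0.04950 |
| 100 | 50 | 0.1 | 0.0001 | 0.04717 | 0.04950 | 0.04950 | 0.04950 | 0.04950 |
| 100 | 50 | 0.01 | 0.01 | 0.33110 | 0.33162 | 0.33177 | 0.33101 | 0.32977 |
| 100 | 50 | 0.01 | 0.001 | 0.33110 | 0.33211 | 0.33213 | 0.33203 | 0.33188 |
| 100 | 50 | 0.01 | 0.0001 | 0.33110 | 0.33220 | 0.33220 | 0.33219 | 0.33215 |
| 100 | 50 | 0.001 | 0.01 | 0.83193 | 0.82866 | 0.82961 | 0.82479 | 0.82097 |
| 100 | 50 | 0.001 | 0.001 | 0.83193 | 0.83154 | 0.83164 | 0.83113 | 0.83033 |
| 100 | 50 | 0.001 | 0.0001 | 0.83193 | 0.83191 | 0.83193 | 0.83185 | 0.83172 |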

Table 4.

Behavior of RST under different mutational models when 2N = 10,000. RST,L = 1/(4Nmα + 1), where α = k/(k − 1). The subscript SMM refers to the stepwise mutation model. The numerical subscripts, K, refer to the two-phase model with ϕ = 0 and α = K/100.

k A m μ RST,L RST RST,10 RST,50 RST,90
5 10 0.1 0.01 0.00040 0.00033 0.00033 0.00033 0.00033
0.001 0.00033 0.00033 0.00033 0.00033
0.0001 0.00033 0.00033 0.00033 0.00033
0.01 0.01 0.00398 0.00308 0.00310 0.00291 0.00263
0.001 0.00319 0.00319 0.00317 0.00314
0.0001 0.00320 0.00320 0.00320 0.00320
0.001 0.01 0.03846 0.02230 0.02363 0.01539 0.00969
0.001 0.02979 0.03003 0.02820 0.02553
0.0001 0.03088 0.03090 0.03072 0.03039
50 0.1 0.01 0.00040 0.00033 0.00033 0.00033 0.00033
0.001 0.00033 0.00033 0.00033 0.00033
0.0001 0.00033 0.00033 0.00033 0.00033
0.01 0.01 0.00398 0.00320 0.00320 0.00319 0.00319
0.001 0.00320 0.00320 0.00320 0.00320
0.0001 0.00320 0.00320 0.00320 0.00320
0.001 0.01 0.03846 0.03047 0.03062 0.02986 0.02961
0.001 0.03094 0.03096 0.03087 0.03074
0.0001 0.03101 0.03101 0.03099 0.03097

100 10 0.1 0.01 0.00049 0.00051 0.00051 0.00051 0.00050
0.001 0.00052 0.00052 0.00051 0.00051
0.0001 0.00052 0.00052 0.00052 0.00052
0.01 0.01 0.00493 0.00466 0.00471 0.00435 0.00386
0.001 0.00488 0.00488 0.00484 0.00477
0.0001 0.00490 0.00490 0.00490 0.00489
0.001 0.01 0.04717 0.03170 0.03386 0.02087 0.01268
0.001 0.04453 0.04496 0.04165 0.03700
0.0001 0.04650 0.04655 0.04618 0.04556
50 0.1 0.01 0.00049 0.00052 0.00052 0.00052 0.00051
0.001 0.00052 0.00052 0.00052 0.00052
0.0001 0.00052 0.00052 0.00052 0.00052
0.01 0.01 0.00493 0.00489 0.00489 0.00488 0.00486
0.001 0.00490 0.00490 0.00490 0.00490
0.0001 0.00490 0.00490 0.00490 0.00490
0.001 0.01 0.04716 0.04578 0.04606 0.04477 0.04308
0.001 0.04664 0.04667 0.04651 0.04633
0.0001 0.04673 0.04673 0.04672 0.04669

Table 5.

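Comparison of the value of Nmα that would be inferred from stationary values of RST (the estimate, written $\widehat{Nm\alpha}$) with the actual Nmα for several combinations of N, m, and k with A = 10, μ = 0.001, and γ = (0.1, 0.5, 0.9). SMM in subscripts refers to the stepwise mutation model. The numerical subscripts, K, refer to the two-phase model with ϕ = 0 and α = K/100.

| 2N | m | k | Nmα | est. Nmα, SMM | est. Nmα, 10 | est. Nmα, 50 | est. Nmα, 90 |
|---|---|---|---|---|---|---|---|
| 100 | 0.001 | 100 | 0.0505 | 0.0532 | 0.0527 | 0.0569 | 0.0642 |
| 100 | 0.001 | 5 | 0.0625 | 0.0813 | 0.0806 | 0.0850 | 0.0937 |
| 100 | 0.01 | 100 | 0.505 | 0.505 | 0.505 | 0.509 | 0.516 |
| 100 | 0.01 | 5 | 0.625 | 0.787 | 0.773 | 0.776 | 0.784 |
| 100 | 0.1 | 100 | 5.05 | 4.80 | 4.80 | 4.81 | 4.81 |
| 100 | 0.1 | 5 | 6.25 | 7.41 | 7.41 | 7.42 | 7.42 |
| 10000 | 0.001 | 100 | 5.05 | 5.36 | 5.31 | 5.75 | 6.51 |
| 10000 | 0.001 | 5 | 6.25 | 8.14 | 8.08 | 8.62 | 9.54 |
| 10000 | 0.01 | 100 | 50.5 | 50.6 | 50.6 | 51.4 | 52.2 |
| 10000 | 0.01 | 5 | 62.5 | 78.1 | 78.1 | 78.6 | 79.4 |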

4. Discussion

The results presented above lead to several important conclusions regarding evolutionary analysis of microsatellite data. First, our results show that RST is sensitive to demographic parameters that determine the importance of gene flow (local population size, migration rate, and the number of populations in a metapopulation), but it is relatively insensitive to mutational parameters (mutation rates and stepwise versus two-phase mutational models). Thus, it provides a useful index of the degree to which populations are genetically isolated from one another.

Second, our results reinforce previous observations that the amount of genetic differentiation among contemporaneous populations is substantially less than the amount of genetic variation expected within any one population over evolutionary time (compare Fu et al. 2003; Holsinger in press). Because populations connected through gene flow tend to “drift” together, allele frequencies among contemporaneous populations are correlated with one another. Methods that ignore this correlation may substantially underestimate the extent of stochastic variation in allele frequencies (compare Song et al., 2004; Fu et al., 2005; Holsinger, in press). The effect of the among-population correlation is particularly pronounced when the number of populations exchanging genes (not the number of populations from which samples are available) is small.

Finally, our results show that while RST is relatively insensitive to mutational parameters, measures of among-population genetic differentiation that depend only on allele frequency, namely θ(I) and θ(II), depend quite sensitively on the overall mutation rate. This observation suggests that by taking into account the special mutational properties of microsatellite data, we may be able to develop inferential methods that allow us to make separate estimates of the contribution of mutation and migration to similarities and differences among populations that are geographically structured. Clearly, coalescent methods like those described in Beerli and Felsenstein (2001) allow such distinctions, but the differing properties of RST and of θ(I) and θ(II) suggest that it may be possible to estimate Neμ and Nem directly from RST and θ, a topic of continuing investigation.

Of course, all of the results we present in this paper depend on the assumption that populations have reached stationarity with respect to mutation, migration, and drift. In real populations the assumption of stationarity will never be satisfied exactly, and in many cases it may not even be approximately correct. Nonetheless, the relationship between RST and coalescence times (Slatkin, 1995) suggests that it remains a useful index of population differentiation and gene flow for microsatellite loci, provided we remember that “gene flow” may reflect continuing migration of individuals among distinct populations, recent divergence of those populations from one another, or any combination of the two.

Highlights.

We provide exact expressions for the moments of the allele frequency distribution in a stochastic model appropriate for studying microsatellite evolution with migration, mutation, and drift under the assumption that the range of allele sizes is bounded.

We study the behavior of several measures related to Wright's FST, including Slatkin's RST. Our results show that familiar relationships between Nem and FST or RST hold when migration and mutation rates are small.

Acknowledgements

This research was supported in part by a grant from the U.S. National Institutes of Health, 1 R01 GM068449-01A1.

Appendix A

We begin with the observation that the stationary covariances are given by

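$$\Sigma_{11} = \left(1 - \frac{1}{2N}\right) V\left\{(1 - r_k)\Sigma_{11} + r_k\Sigma_{12}\right\}V + \frac{1}{2N}\left(\frac{1}{A} I_A - \frac{1}{A^2}\mathbf{1}_A\mathbf{1}_A'\right)$$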

and

$$\Sigma_{12} = V\left\{\frac{r_k}{k-1}\Sigma_{11} + \left(1 - \frac{r_k}{k-1}\right)\Sigma_{12}\right\}V,$$

where Σ11, Σ12, and V are symmetric matrices. Rearranging we obtain

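$$\Sigma_{11} = \left(1 - \frac{1}{2N}\right)\left\{(1 - r_k)V^2\Sigma_{11} + r_k V^2\Sigma_{12}\right\} + \frac{1}{2N}\left(\frac{1}{A} I_A - \frac{1}{A^2}\mathbf{1}_A\mathbf{1}_A'\right)$$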

and

$$\Sigma_{12} = \frac{r_k}{k-1}V^2\Sigma_{11} + \left(1 - \frac{r_k}{k-1}\right)V^2\Sigma_{12}.$$
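
The elimination that leads from the rearranged pair of equations to the closed forms can be sketched as follows. The shorthand symbols ρ, a, and M are introduced here for brevity and are not the paper's notation, and the sketch assumes, as the rearrangement itself does, that V commutes with Σ11 and Σ12.

```latex
% Shorthand (not from the original text):
%   \rho = r_k/(k-1), \quad a = 1 - 1/(2N), \quad
%   M = \tfrac{1}{A} I_A - \tfrac{1}{A^2}\mathbf{1}_A\mathbf{1}_A'.
\begin{align*}
% Step 1: solve the second equation for \Sigma_{12}.
\left[I - (1-\rho)V^2\right]\Sigma_{12} &= \rho V^2 \Sigma_{11}, \\
% Step 2: multiply the first equation by I - (1-\rho)V^2 and substitute.
\left\{\left[I - a(1-r_k)V^2\right]\left[I - (1-\rho)V^2\right] - a\, r_k\, \rho\, V^4\right\}\Sigma_{11}
  &= \frac{1}{2N}\left[I - (1-\rho)V^2\right] M, \\
% Step 3: expand the braces, using r_k + \rho = k r_k/(k-1).
\left[I - a(1-r_k)V^2\right]\left[I - (1-\rho)V^2\right] - a\, r_k\, \rho\, V^4
  &= I - \left(1 + a - \frac{k}{k-1} r_k + \frac{r_k}{2N}\right)V^2
     + a\left(1 - \frac{k}{k-1} r_k\right)V^4 \\
  &= \left[I - aV^2\right]\left[I - \left(1 - \frac{k}{k-1} r_k\right)V^2\right] - \frac{r_k}{2N}V^2 .
\end{align*}
```

The last line is the matrix D1 written in terms of V rather than Λ; substituting V = QΛQ′ diagonalizes it and gives the spectral forms.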

Thus,

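$$\begin{aligned}
\Sigma_{11} &= \frac{1}{2N}\left[I - \left(1 - \frac{r_k}{k-1}\right)V^2\right]D_1^{-1}\left(\frac{1}{A} I_A - \frac{1}{A^2}\mathbf{1}_A\mathbf{1}_A'\right) \\
&= \frac{1}{2N}\,Q\left[I - \left(1 - \frac{r_k}{k-1}\right)\Lambda^2\right]D_1^{-1}Q'\left(\frac{1}{A} I_A - \frac{1}{A^2}\mathbf{1}_A\mathbf{1}_A'\right)
\end{aligned}$$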

and

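$$\begin{aligned}
\Sigma_{12} &= \frac{1}{2N}\,\frac{r_k}{k-1}\,V^2 D_1^{-1}\left(\frac{1}{A} I_A - \frac{1}{A^2}\mathbf{1}_A\mathbf{1}_A'\right) \\
&= \frac{1}{2N}\,\frac{r_k}{k-1}\,Q\Lambda^2 D_1^{-1}Q'\left(\frac{1}{A} I_A - \frac{1}{A^2}\mathbf{1}_A\mathbf{1}_A'\right),
\end{aligned}$$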

where

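$$D_1 = \left[I - \left(1 - \frac{1}{2N}\right)\Lambda^2\right]\left[I - \left(1 - \frac{k}{k-1}\,r_k\right)\Lambda^2\right] - \frac{r_k}{2N}\Lambda^2,$$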

Λ = diag{λ1, …, λA}, and Q = (q1, …, qA) is the A × A matrix whose jth column qj is the A × 1 eigenvector of V corresponding to its jth eigenvalue λj. Further algebraic simplification yields equations (7) and (8).
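
As a numerical sanity check on the closed forms above, the following minimal sketch builds Σ11 and Σ12 from the spectral expressions and substitutes them back into the two stationary equations. It is an illustration rather than the authors' code. The function names, the parameter values, and the particular stepwise form of V (1 − μ on the diagonal, μ/2 on the off-diagonals, reflecting boundaries) are assumptions made here, and r_k is treated as a free number because its definition lies outside this appendix.

```python
import numpy as np


def stepwise_mutation_matrix(A, mu):
    """Symmetric stepwise mutation matrix on A allele sizes (assumed form)."""
    V = (1.0 - mu) * np.eye(A)
    i = np.arange(A - 1)
    V[i, i + 1] = mu / 2.0
    V[i + 1, i] = mu / 2.0
    # Reflecting boundaries: end states keep the mass that would leave the range.
    V[0, 0] = V[-1, -1] = 1.0 - mu / 2.0
    return V


def stationary_covariances(V, N, k, r_k):
    """Sigma_11 and Sigma_12 from the spectral expressions of Appendix A."""
    A = V.shape[0]
    lam, Q = np.linalg.eigh(V)          # V = Q diag(lam) Q'
    lam2 = lam ** 2
    rho = r_k / (k - 1)
    a = 1.0 - 1.0 / (2 * N)
    # Diagonal entries of D_1.
    d1 = (1 - a * lam2) * (1 - (1 - k * r_k / (k - 1)) * lam2) - r_k / (2 * N) * lam2
    M = np.eye(A) / A - np.ones((A, A)) / A ** 2
    # Q * v scales the columns of Q, i.e. it equals Q @ diag(v).
    S11 = (Q * ((1 - (1 - rho) * lam2) / d1)) @ Q.T @ M / (2 * N)
    S12 = (Q * (rho * lam2 / d1)) @ Q.T @ M / (2 * N)
    return S11, S12, M


# Hypothetical parameter values chosen only for illustration.
A, mu, N, k, r_k = 10, 0.01, 100, 5, 0.05
V = stepwise_mutation_matrix(A, mu)
S11, S12, M = stationary_covariances(V, N, k, r_k)

a, rho = 1 - 1 / (2 * N), r_k / (k - 1)
# Residuals of the two stationary equations; both should vanish up to rounding.
res11 = a * V @ ((1 - r_k) * S11 + r_k * S12) @ V + M / (2 * N) - S11
res12 = V @ (rho * S11 + (1 - rho) * S12) @ V - S12
print(np.abs(res11).max(), np.abs(res12).max())
```

The check uses the original equations with V on both sides of the braces, so it also exercises the commutation of V with Σ11 and Σ12 that the rearrangement relies on.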

Appendix B

To establish the results in the text it is sufficient to show that $\lambda_j = 1 - \mu + \mu\cos\left(\frac{(j-1)\pi}{A}\right)$ and $q_j = \tilde{q}_j / \lVert\tilde{q}_j\rVert$ satisfy the characteristic equation of V, $Vq_j = \lambda_j q_j$. Following Gregory and Karney (1969) and Barnett (1990), this condition is equivalent to

$$q_{j1} + q_{j2} - 2\cos(2\alpha)\,q_{j1} = 0, \tag{B.1}$$
$$q_{jk} + q_{j,k+2} - 2\cos(2\alpha)\,q_{j,k+1} = 0, \tag{B.2}$$

for k = 1, …, A − 2 and

$$q_{j,A-1} + q_{jA} - 2\cos(2\alpha)\,q_{jA} = 0, \tag{B.3}$$

for j = 1, …, A. For j = 1 the conditions hold trivially. Next, we verify (B.1), (B.2), and (B.3) for j = 2, 3, …, A.

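$$\begin{aligned}
\text{l.h.s. of (B.1)} &= \sin(2\alpha_j) + 2\cos(3\alpha_j)\sin(\alpha_j) - 2\cos(2\alpha_j)\sin(2\alpha_j) \\
&= \sin(2\alpha_j) + \sin(\alpha_j - 3\alpha_j) + \sin(\alpha_j + 3\alpha_j) - \sin(4\alpha_j) \\
&= \sin(2\alpha_j) + \sin(-2\alpha_j) = 0.
\end{aligned}$$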

Now, for k = 1, we have

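$$\begin{aligned}
\text{l.h.s. of (B.2)} &= \sin(2\alpha_j) + 2\cos(5\alpha_j)\sin(\alpha_j) - 4\cos(2\alpha_j)\cos(3\alpha_j)\sin(\alpha_j) \\
&= \sin(2\alpha_j) - \sin(4\alpha_j) + \sin(6\alpha_j) + 2\cos(2\alpha_j)\left(\sin(2\alpha_j) - \sin(4\alpha_j)\right) \\
&= \sin(2\alpha_j) - \sin(4\alpha_j) + \sin(6\alpha_j) + \sin(0) + \sin(4\alpha_j) - \sin(2\alpha_j) - \sin(6\alpha_j) = 0.
\end{aligned}$$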

The final steps in (B.2) and the identity in (B.3) follow from the standard trigonometric identity

$$2\cos z_1\cos z_2 = \cos(z_1 - z_2) + \cos(z_1 + z_2).$$
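
The eigenvalue and eigenvector expressions can also be checked numerically. The sketch below is a hedged illustration under stated assumptions, not part of the original derivation. The band structure of V (1 − μ on the diagonal, μ/2 on the off-diagonals, 1 − μ/2 in the two corner diagonal entries) is inferred from conditions (B.1) to (B.3). The angle α_j = (j − 1)π/(2A) is inferred from the eigenvalue formula. The eigenvector components are taken as cos((2k − 1)α_j), which for j ≥ 2 is proportional to the sine-product form used in the verification above. All function names are hypothetical.

```python
import numpy as np


def stepwise_mutation_matrix(A, mu):
    """Tridiagonal stepwise mutation matrix with reflecting boundaries (assumed form)."""
    V = (1.0 - mu) * np.eye(A)
    i = np.arange(A - 1)
    V[i, i + 1] = mu / 2.0
    V[i + 1, i] = mu / 2.0
    V[0, 0] = V[-1, -1] = 1.0 - mu / 2.0
    return V


def closed_form_eigensystem(A, mu):
    """Eigenvalues lambda_j and unit-length eigenvectors q_j in closed form."""
    j = np.arange(1, A + 1)
    alpha = (j - 1) * np.pi / (2 * A)            # assumed definition of alpha_j
    lam = 1.0 - mu + mu * np.cos(2 * alpha)      # lambda_j = 1 - mu + mu cos((j-1) pi / A)
    k = np.arange(1, A + 1)
    # Q[k-1, j-1] = cos((2k - 1) alpha_j); column j is the unnormalized eigenvector.
    Q = np.cos(np.outer(2 * k - 1, alpha))
    Q /= np.linalg.norm(Q, axis=0)               # normalize each column to unit length
    return lam, Q


# Hypothetical values chosen only for illustration.
A, mu = 8, 0.05
V = stepwise_mutation_matrix(A, mu)
lam, Q = closed_form_eigensystem(A, mu)

# Q * lam scales column j by lambda_j, so this is the residual of V q_j = lambda_j q_j.
residual = np.abs(V @ Q - Q * lam).max()
orthonormal = np.allclose(Q.T @ Q, np.eye(A))
matches_numeric = np.allclose(np.sort(lam), np.linalg.eigvalsh(V))
print(residual, orthonormal, matches_numeric)
```

Because the A eigenvalues are distinct and V is symmetric, the normalized columns of Q are orthonormal, which is what permits writing Q′ in place of the inverse of Q in Appendix A.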

References

  1. Allan M, Eisen G, Pomp D. Genomic mapping of direct and correlated responses to long-term selection for rapid weight gain in mice. Genetics. 2005. doi: 10.1534/genetics.105.041319.
  2. Barnett S. Matrices: Methods and Applications. Oxford University Press; Oxford: 1990.
  3. Beerli P, Felsenstein J. Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach. Proceedings of the National Academy of Sciences of the United States of America. 2001;98(8):4563–4568. doi: 10.1073/pnas.081068098.
  4. Calabrese PP, Durrett RT, Aquadro CF. Dynamics of microsatellite divergence under stepwise mutation and proportional slippage point mutation models. Genetics. 2001;159:839–852. doi: 10.1093/genetics/159.2.839.
  5. Chistiakov DA, Hellemans B, Haley CS, Law AS, Tsigeneopoulos CS, Kotoulas G, Bertotto D, Libertini A, Volckaert FAM. A microsatellite linkage map of the European seabass Dicentrarchus labrax L. Genetics. 2005. doi: 10.1534/genetics.104.039719.
  6. Cockerham CC. Variance of gene frequencies. Evolution. 1969;23:72–84. doi: 10.1111/j.1558-5646.1969.tb03496.x.
  7. Cockerham CC, Weir BS. Correlations, descent measures: Drift with migration and mutation. Proceedings of the National Academy of Sciences USA. 1987;84:8512–8514. doi: 10.1073/pnas.84.23.8512.
  8. Crow JF, Aoki K. Group selection for a polygenic behavioral trait: estimating the degree of population subdivision. Proceedings of the National Academy of Sciences USA. 1984;81:6073–6077. doi: 10.1073/pnas.81.19.6073.
  9. Di Rienzo A, Peterson AC, Garza JC, Valdes AM, Slatkin M, Freimer NB. Mutational processes of simple-sequence repeat loci in human populations. Proceedings of the National Academy of Sciences USA. 1994;91:3166–3170. doi: 10.1073/pnas.91.8.3166.
  10. Estoup A, Jarne P, Cornuet J-M. Homoplasy and mutation model at microsatellite loci and their consequences for population genetics analysis. Molecular Ecology. 2002;11:1591–1604. doi: 10.1046/j.1365-294x.2002.01576.x.
  11. Feldman MW, Bergman A, Pollock DD, Goldstein DB. Microsatellite genetic distances with range constraints: analytic description and problems of estimation. Genetics. 1997;145:207–216. doi: 10.1093/genetics/145.1.207.
  12. Felsenstein J. How can we infer geography and history from gene frequencies? Journal of Theoretical Biology. 1982;96:9–20. doi: 10.1016/0022-5193(82)90152-7.
  13. Fu R, Gelfand AE, Holsinger KE. Exact moment calculations for genetic models with migration, mutation, and drift. Theoretical Population Biology. 2003;63:231–243. doi: 10.1016/s0040-5809(03)00003-0.
  14. Goldstein DB, Linares AR, Cavalli-Sforza LL, Feldman MW. An evaluation of genetic distances for use with microsatellite loci. Genetics. 1995;139:463–471. doi: 10.1093/genetics/139.1.463.
  15. Gregory RT, Karney DL. A Collection of Matrices for Testing Computational Algorithms. Wiley-Interscience; 1969.
  16. Hall MC, Willis JH. Transmission ratio distortion in intraspecific hybrids of Mimulus guttatus: implications for genomic divergence. Genetics. 2005;170:375–386. doi: 10.1534/genetics.104.038653.
  17. Holsinger KE. Bayesian hierarchical models in geographical genetics. In: Clark JS, Gelfand AE, editors. Applications of Computational Statistics in the Environmental Sciences. Oxford University Press; New York, NY: in press.
  18. Kretzer AM, Dunham S, Molina R, Spatafora JW. Patterns of vegetative growth and gene flow in Rhizopogon vinicolor and R. vesiculosus (Boletales, Basidiomycota). Molecular Ecology. 2005;14:2259–2268. doi: 10.1111/j.1365-294X.2005.02547.x.
  19. Kai W, Kikuchi K, Fujita M, Suetake H, Fujiwara A, Yoshiura Y, Ototake M, Venkatesh B, Miyaki K, Suzuki Y. A genetic linkage map for the tiger pufferfish, Takifugu rubripes. Genetics. 2005. doi: 10.1534/genetics.105.042051.
  20. Malécot G. Les Mathématiques de l'Hérédité. Masson et Cie; Paris: 1948.
  21. Minvielle F, Kayang BB, Inoue-Murayama M, Miwa M, Vignal A, Gourichon D, Neau A, Monvoisin JL, Ito S. Microsatellite mapping of QTL affecting growth, feed consumption, egg production, tonic immobility and body temperature of Japanese quail. BMC Genomics. 2005;6:87. doi: 10.1186/1471-2164-6-87.
  22. Nagylaki T. Geographical invariance in population genetics. Journal of Theoretical Biology. 1983;99:159–172. doi: 10.1016/0022-5193(82)90396-4.
  23. Ohta T, Kimura M. A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population. Genetical Research. 1973;22:201–204. doi: 10.1017/s0016672300012994.
  24. Pollock DD, Bergman A, Feldman MW, Goldstein DB. Microsatellite behavior with range constraints: parameter estimation and improved distances for use in phylogenetic reconstruction. Theoretical Population Biology. 1998;53:256–271. doi: 10.1006/tpbi.1998.1363.
  25. Rousset F. Equilibrium values of measures of population subdivision for stepwise mutation processes. Genetics. 1996;142:1357–1362. doi: 10.1093/genetics/142.4.1357.
  26. Rousset F. Genetic differentiation in populations with different classes of individuals. Theoretical Population Biology. 1999;55:297–308. doi: 10.1006/tpbi.1998.1406.
  27. Rousset F. Inferences from spatial population genetics. In: Balding DJ, Bishop M, Cannings C, editors. Handbook of Statistical Genetics. John Wiley & Sons; Chichester: 2001. pp. 239–269.
  28. Slatkin M. A measure of population subdivision based on microsatellite allele frequencies. Genetics. 1995;139:457–462. doi: 10.1093/genetics/139.1.457.
  29. Song S, Dey DK, Holsinger KE. Hierarchical models with migration, mutation, and drift: implications for genetic inference. Evolution. 2006;60:1–12.
  30. Tautz D. Note on the definition and nomenclature of tandemly repetitive DNA sequences. In: Pena SDJ, Eplen JT, Jeffreys AJ, editors. DNA Fingerprinting: State of the Science. Birkhauser Verlag; Basel: 1993. pp. 21–28.
  31. Wehrhahn CF. The evolution of selectively similar electrophoretically detectable alleles in finite natural populations. Genetics. 1975;80:375–394. doi: 10.1093/genetics/80.2.375.
  32. Weir BS. Genetic Data Analysis II. Sinauer Associates; Sunderland, MA: 1996.
  33. Weir BS, Cockerham CC. Estimating F-statistics for the analysis of population structure. Evolution. 1984;38:1358–1370. doi: 10.1111/j.1558-5646.1984.tb05657.x.
  34. Weir BS, Hill WG. Estimating F-statistics. Annual Review of Genetics. 2002;36:721–750. doi: 10.1146/annurev.genet.36.050802.093940.
  35. Wright S. The genetical structure of populations. Annals of Eugenics. 1951;15:323–354. doi: 10.1111/j.1469-1809.1949.tb02451.x.
  36. Xu H, Chakraborty R, Fu Y-X. Mutation rate variation at human dinucleotide microsatellites. Genetics. 2005;170:305–312. doi: 10.1534/genetics.104.036855.
