Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2016 Aug 1.
Published in final edited form as: Stud Hist Philos Biol Biomed Sci. 2015 Feb 9;52:32–45. doi: 10.1016/j.shpsc.2014.12.005

Implications of the apportionment of human genetic diversity for the apportionment of human phenotypic diversity

Michael D Edge 1,*, Noah A Rosenberg 1
PMCID: PMC4516610  NIHMSID: NIHMS655366  PMID: 25677859

Abstract

Researchers in many fields have considered the meaning of two results about genetic variation for concepts of “race.” First, at most genetic loci, apportionments of human genetic diversity find that worldwide populations are genetically similar. Second, when multiple genetic loci are examined, it is possible to distinguish people with ancestry from different geographical regions. These two results raise an important question about human phenotypic diversity: To what extent do populations typically differ on phenotypes determined by multiple genetic loci? It might be expected that such phenotypes follow the pattern of similarity observed at individual loci. Alternatively, because they have a multilocus genetic architecture, they might follow the pattern of greater differentiation suggested by multilocus ancestry inference. To address the question, we extend a well-known classification model of Edwards (2003) by adding a selectively neutral quantitative trait. Using the extended model, we show, in line with previous work in quantitative genetics, that regardless of how many genetic loci influence the trait, one neutral trait is approximately as informative about ancestry as a single genetic locus. The results support the relevance of single-locus genetic-diversity partitioning for predictions about phenotypic diversity.

Keywords: Genetic differentiation, population genetics, quantitative genetics, race

1. Introduction

Going back to Lewontin’s 1972 study of human genetic diversity, many investigators have reported that at typical genetic loci, most of the allelic variation in statistical partitions of human genetic variation is “within,” rather than “between,” populations (e.g. Barbujani et al., 1997; Brown & Armelagos, 2001; Rosenberg et al., 2002, 2003b; Li et al., 2008). Many of these studies presented their results as estimates of FST, a quantity that can be interpreted as the proportion of allelic variance—that is, variance in a binary random variable representing the presence or absence of a specific allele—attributable to differences in allele frequencies between populations (Holsinger & Weir, 2009). Estimates of worldwide human FST and FST-like quantities have ranged from ~0.05 (e.g. Rosenberg et al., 2002) to ~0.15 (e.g. Barbujani et al., 1997), meaning that 5–15% of allelic variance at a representative locus is due to between-population differences in allele frequencies—or, equivalently, that 85–95% lies in the within-population variance component.

In spite of this result, which shows that human groups have similar allele frequencies at most variable loci, it is possible to infer the continental ancestry of individual people using genetic data alone (e.g. Bowcock et al., 1994; Mountain & Cavalli-Sforza, 1997; Rosenberg et al., 2002; Bamshad et al., 2003; Tang et al., 2005). Ancestry inference is performed by pooling information from many loci. Each locus provides only a small amount of information about population membership, but when many loci are used, their information can be combined to distinguish among potential ancestries.

In 2003, A.W.F. Edwards provided a particularly clear explanation of the way in which multiple loci can be used to classify accurately even when each individual locus is only weakly informative (Edwards, 2003). Edwards’ point was not new—it appeared in earlier arguments about allelic-variance partitioning and classification (e.g., Mitton, 1977; Neel, 1981; Smouse et al., 1982)—but he used an accessible model that clarified the result.

What do single-locus variance partitioning and multilocus classification studies lead us to expect about phenotypic differences between human populations? The finding that human groups have similar allele frequencies at most genetic loci has been used to support arguments that most large, genetically-based phenotypic differences between groups are exceptions to the genomic rule (e.g., Goodman, 2000; Brown & Armelagos, 2001; Feldman & Lewontin, 2008). Indeed, single-locus partitioning studies do suggest that human populations will not differ widely on most traits controlled by a single genetic locus. But the fact that classification is possible using many loci seems to suggest that human groups might differ more substantially on traits influenced by many loci. If populations can be distinguished with multilocus genotypes, then it is possible that phenotypes controlled by multilocus genotypes could differ markedly between populations. Should we expect to see larger differences between human populations for traits influenced by many loci than for traits influenced by a single locus?

Here, we extend Edwards’ model—which has already proven to be an effective framework for describing results about allelic-variance partitioning and classification—to study the expected level of between-population difference for a selectively neutral quantitative trait. Other researchers have studied this question in other contexts (e.g. Felsenstein, 1973; Lande, 1976, 1992; Chakraborty & Nei, 1982; Rogers & Harpending, 1983; Whitlock, 1999; Berg & Coop, 2014), but by basing our analysis in Edwards’ model, we explicitly connect questions about trait differences to questions about multilocus ancestry inference. We show that for a random quantitative trait under the extended Edwards model, two groups are not unduly likely to differ on traits that are determined by many loci, even when the loci influencing the trait would provide a sufficient basis for accurate classification. In particular, the expected level of difference between the populations’ mean trait values is, in two senses made more precise below, approximately equal to the magnitude of single-locus genetic difference between the populations. Similarly, a typical multilocus trait contributes approximately as much information for classification as does a single genetic locus.

2. The Edwards Model

Risch et al. (2002, box 1), Edwards (2003), and Tal (2012) have examined related classification models involving accumulations of information across loci; here, we consider the simplest of these models, that of Edwards (2003). We first describe key features of the model, and we then introduce a quantitative trait.

Suppose we have two haploid populations of equal size, labeled A and B. At one genetic locus, the probability that an individual from population A has an allele we label “1” is p, with p ∈ (0, 1/2), and the probability of allele “0” is q = 1 − p. In population B, the allele frequencies are switched: The probability of “1” is q and the probability of “0” is p. Table 1 shows the allele frequencies by population.

Table 1.

The frequencies of the “0” and “1” alleles in each population, with p + q = 1.

Population Allele
“0” “1”
A graphic file with name nihms655366t1.jpg
B

We can represent the genotype of an individual at the locus as a random variable L that takes values of 0 and 1, and we can represent population membership of an individual as a random variable M that takes values A and B. Within each population, the allelic variance at the locus—that is, the variance of L—is Var(L|M = A) = Var(L|M = B) = pq. This result follows from the status of L|M as a Bernoulli random variable with probability either p or q.

When not conditioning on population membership, the genotype at the locus is still a Bernoulli random variable, but now, because populations A and B are equal in size, the probability of observing a “1” is 1/2:

P(L=1)=P(M=A)P(L=1M=A)+P(M=B)P(L=1M=B)=12p+12q=12[p+(1-p)]=12.

The total unconditional variance of L is therefore Var(L) = P(L = 0)P(L = 1) = 1/4.

The proportion of the total allelic variance that is “within populations”—that is, the proportion of the total variance that remains after conditioning on an individual’s population membership—is the conditional variance of L given M divided by the total variance of L:

Var(LM)/Var(L)=pq/(1/4)=4pq.

Because the total allelic variance is the sum of within- and between-population components, the proportion of the total variance in allelic types that is “between populations,” or FST, is

FST=(1/4-pq)/(1/4)=1-4pq. (1)

Mimicking estimates for the between-region and between-population proportion of genetic diversity from Lewontin (1972) and subsequent studies, if we assume p < q, then we might take p between 0.3 and 0.4—an interval that produces within-population variance proportions from 0.84 to 0.96—as approximately reflecting differences between human groups at a typical locus.

Suppose we want to classify individuals into populations using the genotype at the locus. That is, we wish to predict population membership M after observing an individual’s allele. If p < q, then the decision rule with the greatest prediction accuracy is to assign individuals with a “0” allele to population A and individuals with allele “1” to population B (Rosenberg et al., 2003a). That is, we assign an individual to the population in which its allele is most common. Misclassification occurs for individuals from population A with a “1” allele and individuals from population B with a “0” allele. The total misclassification probability is

P(L=1M=A)P(M=A)+P(L=0M=B)P(M=B)=12p+12p=p.

Thus, if we use a single locus for classification, then the misclassification rate is p.

Suppose now that instead of being limited to one locus, we use k loci to classify. We represent the genotypes of a random individual at the k loci as random variables L1, …, Lk, denoting the total number of “1” alleles at the loci by the random variable S=i=1kLi. Assume that for all loci, allele frequencies in each population are the same as at the single locus described above, and that conditional on population membership, alleles at separate loci are independent. In other words, conditional on population membership, the sum S of “1” alleles is the sum of k independent Bernoulli trials—a binomial random variable. For population A, (S|M = A)~ Binomial(k, p), and in population B, (S|M = B)~ Binomial(k, q). Because q > p and q = 1 − p, we set a rule that if S < k/2, then the individual is assigned to population A, and if S > k/2, then the individual is assigned to population B. If S = k/2, then the individual is assigned to population A or B randomly with probability 1/2 for each population. We represent the event that an individual is misclassified on the basis of S with the random variable WS. If the individual is misclassified, then WS = 1, and WS = 0 otherwise.

Following our classification rule, for odd k, the probability that an individual from population B is misclassified into population A is the probability that S < k/2:

P(WS=1M=B)=P(S<k/2M=B)=i=0(k-1)/2(ki)qipk-i. (2)

For even k, we modify the expression slightly to accommodate the possibility that S = k/2, in which case misclassification occurs with probability 1/2:

P(WS=1M=B)=12(kk/2)pk/2qk/2+i=0k/2-1(ki)qipk-i. (3)

Because we have assumed that p = 1 − q, the misclassification probability is the same irrespective of the population from which the individual is drawn, so we can drop the condition on population membership. Moreover, we show in Appendix A that Eq. 2 evaluated at k = 2h + 1, where h is a non-negative integer, is equal to Eq. 3 evaluated at k = 2h + 2. Applying this identity yields an expression for P(WS = 1) for both odd and even k:

P(WS=1)=i=0(k-1)/2(2k/2-1i)qip2k/2-1-i. (4)

Figure 1 shows the decline in misclassification rates for several values of p, illustrating that the misclassification probability decreases as the number of loci used for classification grows.

Figure 1.

Figure 1

Under the Edwards model, the misclassification rate approaches zero as the number of loci increases, as long as the two source populations differ in their allele frequencies. Misclassification rates are computed from Eq. 4. A similar figure appears in Edwards (2003).

To better understand why the misclassification rate falls as the number of loci increases, we can approximate the distribution of S in each population. By the central limit theorem, as k increases, the distribution of the binomial random variable S in each population approaches a normal distribution. Using the properties of binomial random variables, the expected sum is E(S|M = A) = kp for an individual from population A and E(S|M = B) = kq for an individual from population B. The variance of the sum in each group is Var(S|M) = kpq, and the standard deviation is kpq.

Under the normal approximation, discarding the (1/2)P(S = k/2) term, which is negligible for large k, the probability of misclassifying an individual from population A into population B is the probability that S > k/2. For individuals from population A, the difference between k/2 and the expected value of S in units of the standard deviation of S is

k/2-kpkpq=kq-p2pq.

Eq. 4 is then approximated by

P(WS=1)1-Φ(kq-p2pq), (5)

where Φ is the cumulative distribution function for the standard normal distribution. Φ increases to 1 monotonically as its argument approaches infinity. In fact, the argument need not be too large for Φ to take values close to 1. Φ(c) is the probability that a normal random variable is more than c standard deviations above its expectation. Normal random variables are unlikely to be more than 3 standard deviations above their expectation, Φ(3) ≈ 1 − 10−3, and they are less likely still to fall more than 6 standard deviations above their expectation, Φ(6) ≈ 1 − 10−9. In our case, the argument of Φ grows with k—for example, with p = 0.35, setting k = 90 gives a misclassification rate P(WS = 1) ≈ 10−3, and setting k = 360 gives P(WS = 1) ≈ 10−9. As the number of loci k grows large, the misclassification rate approaches 0.

The Edwards model demonstrates that as long as there is a nonzero difference in populations’ allele frequencies and there are enough conditionally independent loci on which to base the classification, it is possible to classify individuals into populations with arbitrarily high accuracy.

3. Adding a quantitative trait

Next, consider a quantitative trait that is completely determined by the alleles at k loci that have the properties described above. The trait is not influenced by variation in the environment, by gene-environment interaction, by gene-gene interaction, or by epigenetic effects. In quantitative genetics terms, its narrow-sense heritability is 1. We assume that each of the k loci contributes equally to the trait. Specifically, at each locus, we label one allele “+” and the other “−”, where we have not yet specified whether the “+” allele is allele “0” or allele “1.” Because each individual’s value for the trait—which we model as the random variable T—is determined entirely by the equal additive effects of the k loci, T is equal to the number of “+” alleles that the individual carries. That is,

T=i=1kVi, (6)

where Vi = 1 if the individual carries a “+” allele at the ith locus and Vi = 0 otherwise. In other words, whereas we counted the number of “1” alleles to build the random variable S—which was useful for classifying individuals into populations—we now count “+” alleles to construct T, which gives the value of a quantitative trait.

Each of our k loci has two alleles, and each allele now has two labels. One label carries information about population membership: as described in Section 2, the allele that is more common in population A is labeled “0”, and the allele that is more common in population B is labeled “1”. The other label tells us about a trait—the allele that leads to larger values of the trait is labeled “+”, and the allele that leads to smaller values of the trait is labeled “−”. We assume that whether an allele is labeled “1” (or “0”) is independent of whether it is labeled “+” (or “−”). Thus, the allele that is more common in population A is as likely to be associated with larger values of the trait as it is to be associated with smaller values. This choice amounts to an assumption that the trait has been selectively neutral during the divergence of populations A and B, that the allele frequencies in the two populations have reached their current status without any influence of the effect of the loci on the trait. We can express the point with the random variable Xi, which equals 0 if the “0” allele is the “+” allele at the ith locus and 1 if the “1” allele is the “+” allele at the ith locus. For each of the k loci, P(Xi = 0) = P(Xi = 1) = 1/2, independently of the other loci. We denote the sum of the Xi as Z,

Z=i=1kXi. (7)

Because each Xi is a Bernoulli random variable with success probability 1/2 and is independent of the other Xi, Z is binomially distributed with k trials and success probability 1/2.

The Xi and the individual’s values for the Li determine the individual’s values for the Vi, and thus for T. That is, if we know the individual’s set of “0” and “1” alleles and we know which alleles are the “+” alleles, then we can calculate the individual’s value for the trait. In particular,

Vi=Li+(1-2Li)(1-Xi). (8)

Thus, if Li = Xi, then Vi = 1, and if LiXi, then Vi = 0. Because T is the sum of the Vi (Eq. 6), we can rewrite T=i=1kLi+(1-2Li)(1-Xi). The relationships between L, X, and V appear in Table 2. Figure 2 shows a schematic of one realization of our model.

Table 2.

The relationships between the random variables L, X, and V.

Situation L X V
Allele is more common in population A and produces smaller T 0 1 0
Allele is more common in population A and produces larger T 0 0 1
Allele is more common in population B and produces smaller T 1 0 0
Allele is more common in population B and produces larger T 1 1 1

Note. L indicates an allele, either 0 or 1. X represents the randomized labeling of the alleles, indicating whether having L = 1 contributes to larger (X = 1) or smaller (X = 0) values of the trait. V indicates whether the individual’s allele contributes to larger (V = 1) or smaller (V = 1) values of the trait. Any one of these variables can be constructed from the other two—if the other two variables have the same value, then the third variable equals 1; otherwise it equals 0.

Figure 2.

Figure 2

A schematic of one realization of our quantitative trait model with 10 loci. For a given trait, loci are labeled according to whether the “1” or the “0” allele is the “+” allele. These labels are the Xi, and their sum is Z. The Xi are independent Bernoulli random variables with probability 1/2, so Z is a binomial random variable. For every individual, we draw alleles at each locus according to the allele frequencies in the individual’s population—these are the Li, and their sum is S. The Li are independent Bernoulli random variables with probability p in population A and q in population B, so S is also binomial in each population. If Xi = Li, then the individual has a “+” allele at the ith locus (Vi = 1). The number of “+” alleles for an individual is the trait value, T.

4. Properties of the quantitative trait conditional on the labeling of the alleles

We can use this quantitative trait model to study the distribution of group differences for traits. In particular, we can ask how the distribution depends on k. If a trait is highly polygenic, are the populations more likely to differ considerably on the trait than if the trait is determined by a single locus?

Our model contains two randomizations. The first randomization, which we call “labeling,” determines which allele at each locus contributes to larger values of the trait. As mentioned above, independently at each locus, either the “0” or the “1” allele is randomly labeled “+”. Labeling happens once per trait.

In the second randomization, we generate individuals from each population by randomly choosing alleles at each locus according to the allele frequencies in the individual’s population. For example, an individual from population A is generated by drawing “1” alleles with probability p and “0” alleles with probability q, independently at each locus.

We first study properties of the trait conditional on labeling—that is, we study the second randomization conditional on the outcome of the first randomization. More specifically, we examine the distribution, expectation, and variance of the trait value T of an individual, and the misclassification probability of an individual on the basis of T, all conditional on the labels of the alleles being known. These computations tell us how the groups differ on a specific trait with known allelic labels. Later, in Section 5, to learn about how the group differences vary across traits, we study the ways in which these expressions vary across different labelings.

4.1. Distribution of T in each population given the labeling of the alleles

We start by considering the distribution of the trait value T in population A given the labeling. That is, we seek P(T = t|M = A, {X1, …, Xk} = {x1, …, xk}), where T is the sum of the Vi (Eq. 6). Applying Eq. 8 and conditioning on Xi gives either Vi = 1 − Li if Xi = 0 or Vi = Li if Xi = 1. Because the Li are independent and identically distributed in each population, the order of the xi does not affect the computation. Thus, we can forgo conditioning on the Xi and simply condition on their sum, Z (Eq. 7). For ease of representation, if Z = z, then we order the xi so that the first z entries are equal to 1 and the remaining kz entries are equal to 0.

Given that Z = z, we can rewrite T|M, Z as

(TM=A,Z=z)=i=1z(LiM=A,Z=z)+i=z+1k[1-(LiM=A,Z=z)]. (9)

We denote the two sums on the right, with the subscript indicating the population (A or B) and the value of Xi for the corresponding values of Li being summed (1 or 0), by

TA1=i=1z(LiM=A,Z=z) (10)
TA0=i=z+1k[1-(LiM=A,Z=z)]. (11)

In population A, the first z of the Vi are independent Bernoulli random variables with parameter p. Thus, TA1~Binomial(z, p). Similarly, TA0~Binomial(kz, q).

Viewing T as the sum of two binomial distributions leads to distributions, expectations, and variances of T within each population. Conditional on Z = z and membership in population A, the probability that an individual has T = t is

P(T=tM=A,Z=z)=l=0t(zl)plqz-l(k-zt-l)pk-z-t+lqt-l. (12)

This expression sums the probabilities of the ways that t “+” alleles can be drawn from the z loci for which the “1” allele is the “+” allele and the kz loci for which the “0” allele is the “+” allele. A useful alternative statement of Eq. 12 is

P(T=tM=A,Z=z)=pk-zqzl=0t(zl)(k-zt-l)(pq)2l-t. (13)

Similarly, transposing the roles of p and q, in population B,

P(T=tM=B,Z=z)=pzqk-zl=0t(zl)(k-zt-l)(qp)2l-t. (14)

Before considering the expectation and variance of T in each population, we need three more facts about the distribution of T in populations A and B (eqs. 1517). Analogously to Eqs. 911, in population B, T can be viewed as the sum TB1 + TB0, where TB1~Binomial(z, q) and TB0~Binomial(kz, p). Thus, TB1 has the same distribution as zTA1, and TB0 has the same distribution as kzTA0. Then TB0 + TB1 has the same distribution as k − (TA0 + TA1), meaning that

P(T=tM=B,Z=z)=P(T=k-tM=A,Z=z). (15)

If Z = k/2, then TA0~Binomial(k/2, p) and TA1~Binomial(k/2, q). Similarly, TB0~Binomial(k/2, q) and TB1~Binomial(k/2, p). Thus, in both populations, T is the sum of two independent binomial random variables with k/2 trials, one with probability p and one with probability q. The distribution of T is therefore the same in the two populations if Z = k/2:

P(T=tM=A,Z=k/2)=P(T=tM=B,Z=k/2). (16)

In combination, Eqs. 15 and 16 guarantee that if z = k/2, then the conditional probability mass function in each population is symmetric around k/2. That is,

P(T=tM=A,Z=k/2)=P(T=k-tM=A,Z=k/2)P(T=tM=B,Z=k/2)=P(T=k-tM=B,Z=k/2). (17)

The symmetries in Eqs. 1517 result from our assumption that q = 1 − p, and they assist in our analysis of misclassification rates obtained when using T for classification.

4.2 Expectation and variance of T in each population given the labeling of the alleles

The expectation of T in population A conditional on Z is the sum of the expectations of TA1 and TA0 (Eqs. 10, 11), which follow from their status as binomial random variables:

E(TM=A,Z=z)=E(TA1)+E(TA0)=zp+(k-z)q=kq+(p-q)z. (18)

In population B, the expectation of T is

E(TM=B,Z=z)=kp+(q-p)z. (19)

Because TA1 and TA0 are independent,

Var(TM=A,Z=z)=Var(TA1)+Var(TA0)=zpq+(k-z)pq=kpq.

Similarly, Var(T|M = B, Z = z) = kpq. Noticing that z does not appear in the expression, we can remove the condition on Z. The variance σw2 of T in either population is

σw2=Var(TM)=kpq. (20)

One convenient summary of the extent to which the two populations differ on the trait is the standardized group difference, DT. This quantity is the difference between the trait means in the two populations divided by the within-population standard deviation. Conditional on Z = z,

(DTZ=z)=E(TM=A,Z=z)-E(TM=B,Z=z)σw=(q-p)(k-2z)kpq. (21)

DT is an instance of Cohen’s d (Cohen, 1988), a measurement of effect size and a special case of the Mahalanobis distance (Mahalanobis, 1936). Its absolute value is the number of within-population standard deviations separating the population means. For fixed k and z, the value of DT decreases as p increases from 0 to 1/2, so that DT is larger for populations with a greater allele frequency difference qp.

4.3 The total expectation and variance of T given the labeling of the alleles

The expectation of T, removing the condition on population membership, is

E(TZ=z)=EM[E(TM,Z=z)],

where the subscript M indicates that the expectation is with respect to randomness in population membership. Each individual has a probability of 1/2 of being from either population. Thus,

E(TZ=z)=E(TM=A,Z=z)2+E(TM=B,Z=z)2.

Using Eqs. 18 and 19 and remembering that p + q = 1 gives

E(TZ=z)=kq+(p-q)z+kp+(q-p)z2=k2. (22)

This expression does not depend on z, so E(T) = k/2 for all Z.

By the law of total variance, the variance in T can be decomposed into within- and between-population components:

Var(TZ=z)=EM[Var(TM,Z=z)]+VarM[E(TM,Z=z)]. (23)

We already have the first term: Var(T|M, Z = z) = kpq (Eq. 20), and because the conditional variance is constant across populations, EM[Var(T|M, Z = z)] = kpq.

The second term in Eq. 23 is the “between-population” variance of T. Note that

E(TM,Z=z)=E(TM=A,Z=z)+[E(TM=B,Z=z)-E(TM=A,Z=z)]IM=B,

where IM=B =1 if an individual is in population B and IM=B = 0 otherwise. Conditional on Z, the only random variable in this expression is IM=B. IM=B~Bernoulli(1/2), so Var(IM=B) = 1/4. Using Eqs. 18 and 19, the between-population variance of T, which we term σb2, is

σb2=VarM[E(TM,Z=z)]=[E(TM=B,Z=z)-E(TM=A,Z=z)]2/4=(k-2z)2(1-4pq/4,) (24)

and the total variance of T conditional on the labeling of the alleles is then

Var(TZ=z)=kpq+(k-2z)2(1-4pq)/4. (25)

In quantitative genetics, QST is a conceptual analogue of FST for a quantitative trait. For haploids, it is the proportion of heritable variance in a quantitative trait attributable to genetic differences between populations (Whitlock, 2008). In our case, all the variance of T is heritable, so conditional on Z, we define QST as

(QSTZ=z)=σb2σw2+σb2=[(k-2z)(q-p)]2/4kpq+[(k-2z)(q-p)]2/4. (26)

4.4 Probability of misclassification using T given the labeling of the alleles

We represent the event that an individual is misclassified on the basis of its trait value T with the random variable WT, which equals 1 if the individual is misclassified and 0 otherwise. The probability of misclassifying an individual on the basis of T is thus P(WT = 1).

We set a classification rule for the trait value T analogous to the rule used for the genotypic statistic S in Section 2. In particular, we classify each individual into the population to whose trait mean the individual’s trait value T is closest. Thus, conditional on Z = z, we classify the individual into population A if

T-E(TM=A,Z=z)<T-E(TM=B,Z=z), (27)

and into population B if

T-E(TM=A,Z=z)>T-E(TM=B,Z=z). (28)

We randomly classify the individual into either population A or population B, with probability 1/2 for each population, if

T-E(TM=A,Z=z)=T-E(TM=B,Z=z). (29)

We use the properties of the conditional expectation of T to translate this rule into a statement about the distribution of T. By Eq. 22, the two population means of T are symmetric around k/2. Thus, we can translate Eqs. 2729 into statements about the relationship of T to k/2. In particular, because we assume that p < q, Eqs. 18 and 19 show that if z < k/2, then T > k/2 satisfies Eq. 27, and we classify the individual into population A; T < k/2 satisfies Eq. 28, and we classify the individual into population B; and T = k/2 satisfies Eq. 29, and we randomly classify into either population with probability 1/2. Thus, for z < k/2, the probability of misclassifying an individual from population A into population B is

P(WT=1M=A,Z=z)=P(T<k/2M=A,Z=z)+12P(T=k/2M=A,Z=z). (30)

Using the distribution of T in population A (Eq. 13), we have

P(WT=1M=A,Z=z)=12γ+pk-zqzt=0k/2-1l=0t(zl)(k-zt-l)(pq)2l-t, (31)

where γ = P(T = k/2|Z = z) is 0 if k is odd and pk-zqzl=0k/2(zl)(k-zk/2-l)(pq)2l-k/2 if k is even.

Retaining the assumption that z < k/2, the probability of misclassifying an individual from population B into population A is

P(WT=1M=B,Z=z)=P(T>k/2M=B,Z=z)+12P(T=k/2M=B,Z=z). (32)

Because the probability mass function of T in population B is the reflection across k/2 of the probability mass function of T in population A (Eq. 15), the right sides of Eqs. 30 and 32 are equal. Thus, we can use Eq. 31 to calculate the probability of misclassifying an individual from population B on the basis of its trait value when z < k/2.

For z > k/2, applying similar reasoning, the misclassification probability in either population is given by switching the roles of z and kz in Eq. 31. If z = k/2, then Eq. 31 continues to provide the correct probability of misclassification. Eq. 29 is satisfied for all T if z = k/2, so the misclassification probability is P(WT = 1|M = A, Z = k/2) = P(WT = 1|M = B, Z = k/2) = 1/2. To see that Eq. 31 is equal to 1/2 if z = k/2, notice that by the definition of a probability mass function,

P(T<k/2M=A,Z=k/2)+P(T=k/2M=A,Z=k/2)+P(T>k/2M=A,Z=k/2)=1.

By Eq. 17, we can substitute P(T < k/2|M = A, Z = k/2) = P(T > k/2|M = A, Z = k/2) to give 2P(T < k/2|M = A, Z = k/2) + P(T = k/2|M = A, Z = k/2) = 1. Dividing both sides by 2 shows that Eq. 30 is equal to 1/2 if z = k/2. In turn, Eq. 31 is equal to Eq. 30.

We can modify Eq. 31 by replacing z with min(z, kz}and kz with max(z, kz} to get the misclassification probability given Z = z in either population, for any z:

P(WT=1Z=z)=12γ+pmax{z,k-z}qmin{z,k-z}t=0k/2-1l=0t(min{z,k-z}l)(max{z,k-z}t-l)(pq)2l-t. (33)

As we did with P(WS = 1) (Eq. 5), we can approximate P(WT = 1|Z = z) using a normal distribution. In population A, if k is large, then T is approximately normal with expectation zp + (kz)q and variance kpq (Deheuvels et al., 1989, theorem 1.1). The probability of observing a value of T leading to a misclassification—that is, of observing T < k/2 if z < k/2 or T > k/2 if z > k/2—is approximated by

P(WT=1Z=z)1-Φ[k-2z(q-p)2kpq]. (34)

Because the standard normal cumulative distribution function Φ increases monotonically, the approximation of P(WT = 1|Z = z) decreases monotonically as z approaches k/2 from either direction. Holding p, q, and k constant, it achieves its upper bound in z if z = k/2 and P(WT = 1|Z = z) ≈ 1/2. It achieves its lower bound when z = 0 or z = k and P(WT = 1|Z = z) is approximated by the same expression that approximates P(WS = 1) (Eq. 5). The approximate misclassification probability is lowest if the k loci all have the same labeling, so that loci with opposite labelings do not “undo” each other’s contributions to separating the populations. In Appendix B, we prove analogous results for the exact misclassification probability.

5. Properties of T across different labelings of the alleles

Conditional on the labelings of the alleles {X1, X2, …, Xk}, we have obtained the conditional expectation and variance of T given group membership, the expectation and variance of T in the absence of information on group membership, and the probability of misclassifying an individual on the basis of T. We defined two summaries of group difference—DT, the standardized difference in group mean trait values (Eq. 21), and QST, the quantitative-trait analogue of FST (Eq. 26). The overall expectation of T and the variance of T in each population were constant across labelings of the alleles, and the conditional expectation of T given population membership (Eqs. 18, 19), the variance of T (Eq. 25), DT (Eq. 21), and QST (Eq. 26) depended only on Z=i=1kXi, the number of “1” alleles that are labeled as “+” alleles.

In this section, we consider how group differences in the trait vary across labelings of the alleles. That is, we consider the distributions of DT, QST, and the misclassification rate across different traits T, which can have different values of Z. We address three questions. First, how does DT change as k, the number of loci that influence the trait, increases? Second, what is the expected proportion of variance in the trait that is accounted for by genetic differences between the populations—that is, what is E(QST)? Third, does the trait become increasingly useful for classification as the number of loci grows, as the genotypic statistic S did?

5.1. Question 1: How does the standardized difference DT change as k, the number of loci that influence the trait, increases?

Eq. 21 gives the standardized difference DT conditional on Z = z. Removing the condition on Z gives the random variable

DT=E(TM=A)-E(TM=B)σw=(q-p)(k-2Z)kpq. (35)

DT is linear in Z. Recall that Z~Binomial(k, 1/2), so E(Z) = k/2 and Var(Z) = k/4. Thus,

E(DT)=0 (36)
Var(DT)=4(q-p)2kpqVar(Z)=1-4pqpq. (37)

Because E(DT) = 0,

E(DT2)=Var(DT)=1-4pqpq. (38)

The distribution of DT across traits comes from solving Eq. 35 for Z, remembering that Z~Binomial(k, 1/2):

P(DT=d)=12k(kk2-dkpq2(q-p)). (39)

Because Z takes values in (0, 1, 2, …, k), DT takes values in

{-k(q-p)kpq,(-k+2)(q-p)kpq,(-k+4)(q-p)kpq,,k(q-p)kpq}.

The distribution of DT is symmetric around 0 because the distribution of Z is symmetric around k/2. Applying the central limit theorem, for large k, the distribution of DT is approximated by a normal distribution with expectation 0 (Eq. 36) and variance (1 − 4pq)/(pq) (Eq. 37). Figure 3 shows the distribution of DT for p = 0.35 and several values of k. As seen in the figure, increasing k increases the smoothness of the distribution of DT and makes larger values of DT possible, but it does not change the location or spread of the distribution.

Figure 3.

Figure 3

The distribution of the standardized mean group difference DT for a trait additively determined by k biallelic loci of equal effect. Here, p = 0.35. As the number of loci grows, the distribution approaches a normal distribution with expectation 0 and variance (1 − 4pq)/(pq). The expectation and variance of DT do not change with k (Eqs. 36, 37). The plot was produced using histograms of the probabilities in Eq. 39, scaled to have total area 1.

What can we conclude from these results? First, the expectation of the difference in population trait means, E(DT), is zero (Eq. 36). This result reflects the symmetry of the distribution of Z, which, in combination with Eqs. 18 and 19, implies that for a randomly chosen trait, the larger mean value of T is as likely to come from population A as it is to come from population B.

Second, the variance across traits of the standardized mean difference between populations, Var(DT), does not depend on the number of loci that determine the trait (Eq. 37). If the variance of DT grew with k, then the probability of observing large absolute values of DT would also grow with k. Instead, by the central limit theorem, as k increases, the probability of observing absolute values of DT larger than a positive constant C approaches P(DT>C)2Φ[-Cpq/(q-p)]. This value does not depend on k.

Finally, we note that E(DT2) is equal to the squared standardized difference in allelic values at a single locus, a quantity we define analogously to DT (Eq. 21) as

DL=E(LM=A)-E(LM=B)Var(LM).

Recall that L is a Bernoulli random variable with probability p in population A and probability q in population B. Plugging in E(L|M = A) = p, E(L|M = B) = q, and Var(LM)=pq gives

DL=p-qpq,

and squaring gives

DL2=1-4pqpq=E(DT2),

where E(DT2) comes from Eq. 38. Thus, the average squared standardized difference in the population means for the trait, E(DT2), is the same as the squared standardized genetic difference between the populations at a single genetic locus, DL2, regardless of how many loci determine the trait. One answer to question 1, then, is that the expected absolute size of the standardized difference in the two populations’ trait means does not grow as the number of loci influencing the trait increases.

5.2. Question 2: What is the expected proportion of variance in the trait that is accounted for by genetic differences between the populations?

In Eq. 26, we defined QST conditional on Z = z, where QST is, for haploids, the proportion of heritable variance in the trait attributable to genetic differences between the populations. Removing the condition on Z in Eq. 26 gives the random variable

QST=σb2σw2+σb2=(k-2Z)2(1-4pq)/4kpq+(k-2Z)2(1-4pq/4). (40)

Because P(Z=z)=(kz)/2k, the expectation across traits of QST is

E(QST)=12kz=0k(k-2z)2(1-4pq)/4kpq+(k-2z)2(1-4pq)/4(kz). (41)

Figure 4 shows E(QST) for p = 0.35 and k ranging from 1 to 100. As seen in the figure, the expected value of QST is nearly constant in k.

Figure 4.

Figure 4

The proportion of variance that is “between groups” in a typical neutral trait, in allelic values at a single locus, and in the genotypic statistic S designed for classification. Here, p = 0.35. QST is the proportion of variance of a neutral trait attributable to differences between groups (Eq. 40). QST varies for different traits according the labeling of alleles, and E(QST) is the expectation of QST across traits (Eq. 41). FST is the proportion of allelic variance at a single locus attributable to differences between groups (Eq. 1). We define SST analogously to QST and FST as the proportion of variance in S, the sum of “1” alleles, attributable to differences between populations: SST = Var[EM(S|M)}/Var(S) = k(1 − 4pq)/[4pq + k(1 − 4pq)]. As the number of loci increases, this quantity grows to 1. By contrast, E(QST) is approximately the same as FST, regardless of how many loci influence the trait. Because QST is increasing in |Zk/2| (Eq. 40), we can obtain quantiles of QST across traits by plugging the corresponding quantiles of the distribution of |Zk/2| (Eq. B.15) into Eq. 40. The gray region, representing variability in QST, extends from the 5th to the 95th percentile.

To obtain more insight into the behavior of QST across different traits, we approximate E(QST) by replacing (k − 2Z)2(1 − 4pq)/4 in Eq. 40 with its expectation. This replacement is justified as a first-order Taylor approximation. If we define a random variable Y = (k − 2Z)2(1 − 4pq)/4, then by Eq. 40,

QST=Ykpq+Y=g(Y).

Defining μY = E(Y), a first-order Taylor series expansion then gives

QST=g(Y)g(μY)+g(μY)(Y-μY),

and applying the expectation operator gives

E(QST)=E[g(Y)]g(μY)+g(μY)E(Y-μY)=g(μY).

By Eq. 35, (k-2Z)2(1-4pq)/4=kpqDT2/4. By Eq. 38, E(DT2)=(1-4pq)/(pq). Therefore,

E(QST)k(1-4pq)/4k(1-4pq)/4+kpq=1-4pq. (42)

Because QST is a concave function of Y, it follows from Jensen’s inequality—which holds that for a concave function g and a random variable X, E[g(X)] ≤ g[E(X)]—that E(QST) ≤ 1 − 4pq. Thus, the approximation in Eq. 42 is an upper bound on E(QST). In the Edwards model, the proportion of variance in allelic types attributable to differences between the populations, or FST, is also 1 − 4pq (Eq. 1), so we have

E(QST)FST. (43)

These results support the idea that for a neutral trait, QSTFST, regardless of how many neutral loci influence the trait. The answer to question 2, then, is that we expect the proportion of the variance of a neutral trait attributable to between-population differences to be about the same as the proportion of the allelic variance at a single locus that is attributable to between-population differences.

5.3 Question 3: Does the trait become increasingly useful for classification as the number of loci grows?

We saw that using genotypic data, it is possible to pool information across genetic loci to classify accurately. Can the information about ancestry contained in a large collection of loci be extracted from a neutral trait they influence? To answer this question, we consider the expected misclassification rate across traits influenced by k loci.

The expected misclassification rate across traits is

E[P(WT=1)]=12kz=0kP(WT=1Z=z)(kz). (44)

Plugging in the expression for P(WT = 1|Z = z) from Eq. 33 gives

E[P(WT=1)]=12kz=0k(kz)[12γ+pmax{z,k-z}qmin{z,k-z}t=0k/2-1l=0t(min{z,k-2}l)(max{z,k-z}t-l)(pq)2l-t], (45)

where γ = P(T = k/2|Z = z). Figure 5 shows the expected misclassification rate across traits when p = 0.35, with the misclassification rate obtained using S, the sum of the individual’s “1” alleles (Eq. 4), for comparison. In the figure, the expected misclassification rate obtained using T, the value of the neutral trait, does not systematically decrease as k increases.

Figure 5.

Figure 5

Expected misclassification rates obtained when using T, the individual’s value for a neutral trait (Eq. 45). These values are shown with misclassification rates obtained when using L, the individual’s genotype at a single locus (p), and S, the multilocus sum of the individual’s “1” alleles (Eq. 2). Here, p = 0.35. When one uses S to classify, the misclassification rate declines as the number of loci increases. When the neutral trait—constructed from the same alleles as S, but with different labelings—is used instead, the expected misclassification rate stays approximately the same as when a single genetic locus is used, regardless of how many loci influence the trait. Traits vary in classification accuracy depending on the labeling of alleles, and the gray region indicates this variability. Because the misclassification rate using T decreases monotonically with |Zk/2| (Appendix B, Theorem 1), we can obtain quantiles of the distribution of the misclassification rate across traits by plugging values of z corresponding to quantiles of |Zk/2| (Eq. B.15) into Eq. 33. The gray region extends from the 5th to the 95th percentile.

In Section 2, considering the genotypic statistic S, we showed that the normal approximation of the misclassification rate P(WS = 1) (Eq. 5) approaches zero as k increases. We now show that in contrast, the normal approximation to the expected misclassification rate across traits E[P(WT = 1)] is at least as large as the corresponding approximate misclassification rate from a single locus, regardless of the number of loci k that affect the trait.

For large k, the misclassification probability on the basis of the trait obeys

1-P(WT=1)Φ[k-2Z(q-p)2kpq],

where Φ is the standard normal distribution function (Eq. 34, removing the condition on Z). Taking the expectation across traits of both sides gives

1-E[P(WT=1)]E(Φ[k-2Z(q-p)2kpq]).

On the right side, the argument to Φ is nonnegative. Φ(x) is concave for x > 0 because for x > 0, the standard normal density Φ′(x) is strictly decreasing in x, implying that Φ″(x) < 0. Applying Jensen’s inequality gives

1-E[P(WT=1)]E(Φ[k-2Z(q-p)2kpq])Φ[E(k-2Z)(q-p)2kpq]. (46)

E(|k − 2Z|) = 2E(|k/2 − Z|), twice the mean absolute deviation of Z. For all random variables, by Jensen’s inequality, the mean absolute deviation is no larger than the standard deviation. The standard deviation of Z is k/2. Replacing E(|k − 2Z|) with 2Var(Z)=k therefore does not decrease the value on the right side of Eq. 46, and

1-E[P(WT=1)]E(Φ[k-2Z(q-p)2kpq])Φ(q-p2pq).

The lower bound on the approximate expected misclassification rate is then

E[P(WT=1)]1-E(Φ[k-2Z(q-p)2kpq])1-Φ(q-p2pq). (47)

The approximation of the expected trait-based misclassification rate is no smaller than the normal approximation of the genetic misclassification rate with one locus (Eq. 5). Thus, the answer to question 3 is that unlike the probability of misclassification obtained when using genotypes, the expected misclassification rate for a neutral trait does not decrease as k increases.

6. Discussion

Population-genetic studies have found that even if differences between populations are small at each locus, accurate classification of individuals into source populations is possible when many loci are considered. Our results extend an important model for the demonstration of this result by examining the distribution of population differences in quantitative traits. Specifically, we examined the situation in which the k loci of the Edwards model additively determine the value of a neutral trait. We found that such traits differ between populations to about the same degree as do individual genetic loci, in that the expected proportion of variance in the trait attributable to genetic differences between populations (Eq. 41) is approximately the same as, and no larger than, the single-locus proportion of allelic variance attributable to between-population differences (Eqs. 1, 42, 43). Unlike multilocus genotypes, multilocus neutral traits do not become more useful for classification as the number of underlying loci increases (Eq. 47). For accurate classification, many traits are required. These results (Table 3) emphasize that for neutral, genetically controlled traits, phenotypic-diversity partitioning typically reflects single-locus genetic-diversity partitioning.

Table 3.

Three questions about selectively neutral, polygenic phenotypes and their answers under the extended Edwards model.

Question Answer References
1: How does the standardized difference in population means for the trait (DT) change as k, the number of loci influencing the trait, increases? As k grows, the typical absolute size of DT does not change, but the distribution of DT approaches normality. Eq. 38, Figure 3.
2: What is the expected proportion of variance in the trait that is accounted for by genetic differences between the populations, E(QST)? It is approximately equal to, and no larger than, the proportion of allelic variance at a single locus attributable to genetic differences, FST. Eqs. 42, 43, Figure 4.
3: Does the trait become increasingly useful for classification as the number of loci grows? No. Eq. 46, Figure 5.

Why are the results for a selectively neutral quantitative trait so different from the results for multilocus genetic classification? The power of multilocus classification comes from the aggregation of information across loci—an individual from population A is more likely to have a “1” allele not just at one locus but at all of them. Small differences in allele frequency at each locus accumulate to separate the populations clearly. But when we assume that these differences determine a trait that has not been under selection, we impede the accumulation of information across loci. Suppose we have a single locus at which the allele that is more common in population A contributes to larger values of the trait. The influence of this locus on the trait gives us a hint about population membership; that hint, however, is likely to be masked by the influence of another locus at which the allele more common in population A reduces trait values.

Although we have used a simple haploid model, our results are similar to conclusions from other models with different assumptions. The result that the expected proportion of variance in a trait attributable to between-group differences is approximately the proportion of allelic variance attributable to between-group differences (Eqs. 1, 42) is related to previous work in population and quantitative genetics arguing that in several contexts, both haploid and diploid, QST for selectively neutral traits is on average equal to FST (e.g. Felsenstein, 1973, 1986; Rogers & Harpending, 1983; Lande, 1992; Spitze, 1993; Lynch & Spitze, 1994; Whitlock, 1999; Berg & Coop, 2014). We use this close analogy between our analysis and previous work to discuss how relaxation of our model’s assumptions might affect our results.

In our model, the genetic architecture of the trait is additive, with no interactions between genes and no dominance. In the presence of dominance and epistasis—that is, gene-gene interaction—QST tends to be somewhat smaller than FST (Whitlock, 2008). Thus, modifying our model by adding dominance and epistasis would not likely affect our main claim that the partition of neutral phenotypic diversity mirrors the single-locus partition of genetic diversity.

To keep our model simple, we considered haploids rather than diploids. For diploids, the analysis would proceed similarly, but because diploids have two alleles at each locus, comparable information for distinguishing populations is achieved in a diploid model with half as many loci as in a haploid model. A slightly modified expression for QST in diploids takes this difference into account (Whitlock, 2008, Leinonen et al., 2013).

Another assumption is that the trait is entirely genetically determined. In our model, the environment either does not affect the trait, or individuals in a population live in identical environments, at least in relation to aspects of the environment that could affect the trait. If the trait is influenced by environmental variation, then environmental differences between populations can either cause or erase large group differences in the trait, regardless of genetic differentiation or selection history (Pujol et al., 2008).

Finally, we assumed that the trait has not been under selection. The behavior of QST under divergent selection is a major source of interest in QST. When a trait has been under divergent selection in groups under consideration—that is, when the selective pressure on the trait has varied in strength or direction across groups—QST tends to exceed FST, and when the trait has been under uniform selection, QST tends to be smaller than FST (Whitlock, 2008).

Considering the phenomena that affect QST and FST can help us make sense of past work on human phenotypic variation. For example, in Relethford’s (2002) study of worldwide variation in craniometric traits and skin color, the proportion of variance in craniometric traits attributable to between-population differences was roughly equal to the proportion of allelic variance at a single locus attributable to between-population differences. This finding accords with the results for our extension of the Edwards model, and it is consistent with a hypothesis that selection on many craniometric traits has been weak or absent.

In the same groups studied by Relethford (2002), the proportion of variation in skin color attributable to between-population differences was much larger. When confronted by trait differences between groups that are much larger than genetic differences at typical loci, we have recourse to several possible explanations (Whitlock, 2008; Leinonen et al., 2013). First, the trait differences might stem from an unlikely realization of drift. Second, the trait differences might be due to environmental differences between the groups. Third, the trait differences might be due to differences in selection operating on the groups’ ancestors. Fourth, our measurements may be incorrect in a way that magnifies group differences. These possibilities are not mutually exclusive, nor are they exhaustive. Concluding definitively in favor of any of these explanations requires more investigation. For skin pigmentation, genetic evidence supports natural selection as part of the explanation of group differences (Berg & Coop, 2014).

The finding that genomic analyses enable ancestry inference has led many authors to consider the relevance of both single-locus genetic-diversity partitioning and multilocus classification for philosophical ideas of “race” (e.g. Glasgow, 2003; Andreasen, 2004; Kitcher, 2007; Gannett, 2010; Sesardic, 2010; Hardimon, 2012; Hochman, 2013; Kaplan & Winther, 2014; Spencer, 2014; Kopec, forthcoming; Ludwig, forthcoming; Donovan, forthcoming; Spencer, forthcoming). Our results contribute to these discussions by clarifying the phenotypic consequences of single-locus genetic diversity partitioning and multilocus classification results. Both sets of results are valid, and they are mutually compatible, but they have different implications and uses (Neel, 1981; Rosenberg, 2011; Barbujani et al., 2013; Winther, 2014). Multilocus methods allow us to detect genome-wide patterns of variation and to investigate the ancestry of individual people. But it is single-locus allelic-variance partitioning that informs our expectations about selectively neutral phenotypic diversity. In particular, population genetics and quantitative genetics lead us to expect that differences between human populations in neutral phenotypes will mirror differences between human populations at single neutral loci.

Highlights.

  • We extend a simple model for studying genetic differences between groups.

  • We study selectively neutral traits controlled by k genetic loci.

  • Traits are about as informative about group membership as single genetic loci.

  • The expected size of the between-group trait difference does not increase with k.

Acknowledgments

We thank Graham Coop, Brian Donovan, Aaron Hirsh, and Rasmus Winther for stimulating discussions. Joe Felsenstein and two anonymous reviewers provided helpful comments on the manuscript. Support was provided by NIH grant R01 HG005855, the Stanford Center for Computational, Evolutionary, and Human Genomics, and a Stanford Graduate Fellowship.

Appendix A

In this appendix, we prove that Eq. 2 evaluated at k = 2h + 1 with h a non-negative integer is equal to Eq. 3 evaluated at k = 2h + 2. That is, we show that for odd k ≥ 1 and q = 1 − p, with 0 < p,q < 1.

i=0(k-1)/2(ki)piqk-i=12(k+1k+12)p(k+1)/2q(k+1)/2+i=0(k-1)/2(k+1i)piqk+1-i. (A.1)

Because k is odd, we write k = 2h + 1, with h ≥ 0 an integer. Because q ≠ 0, we let u = p/q, so q = 1/(u + 1). The desired identity is then equivalent to

(u+1)i=0h(2h+1i)ui=12(2h+2h+1)uh+1+i=0h(2h+2i)ui.

Applying the binomial identity (2h+1i)=[1-i2h+2](2h+2i), we obtain

i=0h[u(1-i2h+2)-i2h+2](2h+2i)ui=12(2h+2h+1)uh+1.

Next, noting that i2h+2(2h+2i)=(2h+1i-1), Eq. A.1 is equivalent to

u+i=1h[((2h+2i)-(2h+1i-1))ui+1-(2h+1i-1)ui]=12(2h+2h+1)uh+1. (A.2)

The sum telescopes: by Pascal’s rule, the ((2h+2i)-(2h+1i-1))ui+1 term with index i is canceled by the -(2h+1(i+1)-1)u(i+1) term with index i + 1. Thus, the left-hand side of Eq. A.2 evaluates to

u+[-u+((2h+2h)-(2h+1h-1))uh+1]=((2h+2h)-(2h+1h-1))uh+1.

Applying Pascal’s rule twice more, (2h+2h)-(2h+1h)=(2h+1h) and (2h+1h)+(2h+1h+1)=(2h+2h+1); because (2h+1h)=(2h+1h+1), the left-hand side of Eq. A.2 reduces to 12(2h+2h+1)uh+1, completing the proof of Eq. A.2 and hence of Eq. A.1.

Appendix B

In this appendix, we show that the conditional probability of misclassifying an individual on the basis of its trait value, P(WT = 1|Z = z), increases as z gets closer to k/2. We introduce and prove three claims used to obtain the result. We then give upper and lower bounds on P(WT = 1|Z = z) in z, and we outline a method for obtaining quantiles of P(WT = 1).

Theorem 1

Define P(WT = 1|Z = z)as in Eq. 33 and fix k ≥ 1 an integer. For integers z1 and z2 in {0,1, …, k}, if |z1k/2| > |z2k/2|, then P(WT = 1|Z = z1) < P(WT = 1|Z = z2).

B.1. Claim 1: For even k, P(WT = 1|M = A,Z = k/2) = 1/2

This claim has been shown in the main text, but we restate the proof for completeness.

Proof of Claim 1

For even k, the claim follows from the classification rule in the main text (Eqs. 2729). By Eqs. 18 and 19, if Z = k/2, then Eq. 29 holds for all T, and we classify each individual randomly into either population, each with probability 1/2.

B.2. Claim 2: For odd k, P(WT = 1|M = A,Z = (k − 1)/2) < 1/2

We start by introducing two lemmas.

Lemma 1 (Samuels, 1965, eq. 15)

Let Y be a sum of independent Bernoulli random variables with probabilities that are not necessarily identical, and let p1 be the smallest probability associated with any of these Bernoulli trials. If y < E(Y) + p1, then P(Y = y − 1) < P(Y = y).

Lemma 2

Let R be a random variable taking values in {0,1, …, 2h} for an integer h ≥ 1. Assume that for each r in {0,1, …, 2h},

P(R=r)=P(R=2h-r), (B.1)

and that

0<P(R=0)<P(R=1)<<P(R=h). (B.2)

Let Bq be an independent Bernoulli random variable with parameter q > 1/2, and let p = 1 − q. Then for l ∈ {0,1, …, h},

P(R+Bq=l)<P(R+Bq=2h+1-l). (B.3)
Proof of Lemma 2

For l ∈ {1, …, h}, probabilities for the sum R + Bq satisfy

P(R+Bq=l)=P(R=l)p+P(R=l-1)q (B.4)
P(R+Bq=2h+1-l)=P(R=2h+1-l)p+P(R=2h-1)q. (B.5)

Note that Eqs. B.4 and B.5 also hold for l = 0 because P(R = −1) = P(R = 2h + 1) = 0. Because of the symmetry of the distribution of R (Eq. B.1), Eq. B.5 is equivalent to

P(R+Bq=2h+1-l)=P(R=l-1)p+P(R=l)q.

Because q > p, Eq. B.3 is then equivalent to

P(R=l-1)<P(R=l).

This last inequality is true for l ∈ {1, …, h} by the assumption in Eq. B.2 and, for l = 0, by the fact that P(R = −1) = 0.

Proof of Claim 2

In population A, for h a positive integer, the random variable (T|M = A,Z = h,k = 2h), including the extra condition on k to indicate the number of loci contributing to the trait, satisfies the hypotheses of Lemma 2. The symmetry in Eq. B.1 comes from Eq. 17, and the monotonically increasing probabilities in Eq. B.2 come from applying Lemma 1 to the independent Bernoulli trials that sum to produce the random variable (Eqs. 911) and noting that P(T = 0|M = A,Z = h,k = 2h) = phqh > 0.

We can view Bq as an additional locus with X = 0, meaning that the probability in population A that the locus increases the trait value T is q (Table 1). The sum (T|M = A,Z = h,k = 2h) + Bq is therefore equal in distribution to (T|M = A,Z = h,k = 2h + 1). Applying Lemma 2 with (T|M = A,Z = h,k = 2h + 1) as R + Bq gives, for l ∈ {0,1, …, (k − 1)/2},

P(T=lM=A,Z=(k-1)/2)<P(T=k-lM=A,Z=(k-1)/2). (B.6)

Eq. B.6 guarantees that

l=0(k-1)/2P(T=lM=A,Z=(k-1)/2)<l=(k+1)/2kP(T=lM=A,Z=(k-1)/2),

meaning that

P(T(k-1)/2M=A,Z=(k-1)/2)<P(T(k+1)/2M=A,Z=(k-1)/2). (B.7)

Applying Eq. 30 and noting P(T ≤ (k − 1)/2) + P(T ≥ (k + 1)/2) = 1, for odd k ≥ 3,

P(WT=1M=A,Z=(k-1)/2)<1/2. (B.8)

Because Lemma 2 assumes h ≥ 1, our argument applies to odd k ≥ 3. If k = 1, then Eq. B.8 holds because q > p.

B.3. Claim 3: If zk/2 − 1, then P(WT = 1|M = A,Z = z) < P(WT = 1|M = A,Z = z + 1)

To prove this claim, we use a lemma.

Lemma 3

Consider a random variable R = R1 + R2 that is the sum of two independent binomial random variables, R1 with an integer zk/2 − 1 trials and probability p and R2 with kz − 1 trials and probability q. That is, R1~Binomial(z,p), and independently, R2~Binomial(kz − 1,q). Define two independent random variables, Bq~Bernoulli(q) and Bp~Bernoulli(p) with q > p. Then for 0 ≤ jk/2,

P(R+Bq=j)<P(R+Bp=j). (B.9)
Proof of Lemma 3

For 1 ≤ jk/2, the random variables R + Bq and R + Bp satisfy

P(R+Bq=j)=P(R=j)p+P(R=j-1)qP(R+Bp=j)=P(R=j)q+P(R=j-1)p.

For j = 0, these equations follow from the fact that P(R = −1) = 0. Eq. B.9 therefore holds if

P(R=j)p+P(R=j-1)q<P(R=j)q+P(R=j-1)p.

Because q > p, the inequality is satisfied if

P(R=j-1)<P(R=j). (B.10)

R is the sum of k − 1 independent Bernoulli random variables. Lemma 1 guarantees that Eq. B.10 is satisfied if j < E(R) + p Because zk/2 − 1 and q > p,

k/2=(p+q)k/2<(z+1)p+(k-z-1)q=E(R)+p.

Thus, Eq. B.10 is satisfied for jk/2, which shows that Eq. B.9 holds for jk/2.

Proof of Claim 3

The random variable (T|M = A,Z = z), which, with k loci, is the sum of zk/2 − 1 independent Bernoulli trials with probability p and kz independent Bernoulli trials with probability q, has the properties required for R + Bq in Lemma 3. Similarly, (T|M = A,Z = z + 1) has the properties of R + Bp. Applying Lemma 3, for zk/2 − 1 and tk/2,

P(T=tM=A,Z=z)<P(T=tM=A,Z=z+1).

It follows that

P(T<k/2M=A,Z=z)+P(T=k/2M=A,Z=z)/2<P(T<k/2M=A,Z=z+1)+P(T=k/2M=A,Z=z+1)/2.

Applying Eq. 30 to this last inequality gives, for zk/2 − 1,

P(WT=1M=A,Z=z)<P(WT=1M=A,Z=z+1), (B.11)

demonstrating the desired result.

B.4. Completing the proof of Theorem 1: both populations and all z

Claims 1–3 prove Theorem 1 for population A and z1,z2k/2. If Z = k/2, then P(WT = 1|M = A,Z = z) = 1/2) (Claim 1); for odd k, if Z = (k − 1)/2, then P(WT = 1|M = A,Z = z) < 1/2) (Claim 2); and as Z decreases from k/2 − 1, P(WT = 1|M = A,Z = z) decreases (Claim 3). We have thus proven that for z1,z2 ∈ {0,1, …, ⌊k/2⌋}, if z1 < z2, then

P(WT=1M=A,Z=z1)<P(WT=1M=A,Z=z2). (B.12)

It remains to examine z > k/2 and to remove the condition on M.

By Eq. 33, the misclassification probability on the basis of T does not depend on population membership, so we can drop the condition on M = A in Eq. B.12. That is, for z1,z2 ∈ {0,1, …, ⌊k/2⌋}, if z1 < z2, we now have

P(WT=1Z=z1)<P(WT=1Z=z2). (B.13)

Further, Eq. 33 shows that P(WT = 1|Z = z) = P(WT = 1|Z = kz), so that Eq. B.13 holds for z1,z2 ∈ {0,1, …, k} with min{z1,kz1} < min{z2,kz2}. But min{z1,kz1} < min{z2,kz2} if and only if |z1k/2| > |z2k/2|, completing the proof of Theorem 1.

B.5. Applying Theorem 1

By Theorem 1, the upper bound in z of P(WT = 1|Z = z), achieved when z = k/2, is (Section B.1)

P(WT=1Z=z)1/2.

The lower bound in z of P(WT = 1|Z = z), achieved when when z = 0 or z = k, is

P(WT=1Z=z)P(WS=1),

taking P(WS = 1) from Eq. 4. The lower bound is calculated using z = 0 or z = k in Eq. 33.

Because P(WT = 1|Z = z) decreases with |zk/2|, quantiles of the distribution of P(WT = 1) are obtained by identifying the corresponding quantiles of |Zk/2|. We define J = |Zk/2|. If k is even, then J takes values in {0,1, …, k/2}; if k is odd, then J takes values in {1/2,3/2, …, k/2}.

Because Z~Binomial(k, 1,2),

P(J=j)={12k(kk/2),j=012k-1(kk/2-j),j0 (B.14)

The cumulative distribution function of J is

FJ(j)=i=0jP(J=i).

The qth quantile of J is

FJ-1(q)=min{j:FJ(j)q}. (B.15)

The two values of z corresponding to the qth quantile of J are k/2-FJ-1(q) and k/2+FJ-1(q). Plugging either of these values into Eq. 33 gives the qth quantile of P(WT = 1).

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  1. Andreasen RO. The cladistic race concept: a defense. Biology and Philosophy. 2004;19:425–442. [Google Scholar]
  2. Bamshad MJ, Wooding S, Watkins WS, Ostler CT, Batzer MA, Jorde LB. Human population genetic structure and inference of group membership. American Journal of Human Genetics. 2003;72:578–589. doi: 10.1086/368061. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Barbujani G, Ghirotto S, Tassi F. Nine things to remember about human genome diversity. Tissue Antigens. 2013;82:155–164. doi: 10.1111/tan.12165. [DOI] [PubMed] [Google Scholar]
  4. Barbujani G, Magagni A, Minch E, Cavalli-Sforza LL. An apportionment of human DNA diversity. Proceedings of the National Academy of Sciences of the United States of America. 1997;94:4516–4519. doi: 10.1073/pnas.94.9.4516. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Berg JJ, Coop G. A population genetic signal of polygenic adaptation. PLoS Genetics. 2014;10:e1004412. doi: 10.1371/journal.pgen.1004412. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Bowcock AM, Ruiz-Linares A, Tomfohrde J, Minch E, Kidd JR, Cavalli-Sforza LL. High resolution of human evolutionary trees with polymorphic microsatellites. Nature. 1994;368:455–457. doi: 10.1038/368455a0. [DOI] [PubMed] [Google Scholar]
  7. Brown RA, Armelagos GJ. Apportionment of racial diversity: a review. Evolutionary Anthropology. 2001;10:34–40. [Google Scholar]
  8. Chakraborty R, Nei M. Genetic differentiation of quantitative characters between populations or species: I. Mutation and random genetic drift. Genetical Research. 1982;39:303–314. [Google Scholar]
  9. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2. Hillsdale, New Jersey: Lawrence Erlbaum Associates; 1988. [Google Scholar]
  10. Deheuvels P, Puri ML, Ralescu SS. Asymptotic expansions for sums of nonidentically distributed Bernoulli random variables. Journal of Multivariate Analysis. 1989;28:282–303. [Google Scholar]
  11. Donovan BM. Studies in History and Philosophy of Biological and Biomedical Sciences. Putting humanity back into the teaching of human biology. (forthcoming) [DOI] [PubMed] [Google Scholar]
  12. Edwards AWF. Human genetic diversity: Lewontin’s fallacy. Bioessays. 2003;25:798–801. doi: 10.1002/bies.10315. [DOI] [PubMed] [Google Scholar]
  13. Feldman MW, Lewontin RC. Race, ancestry, and medicine. In: Koenig BA, Lee SS-J, Richardson SS, editors. Revisiting Race in a Genomic Age. Piscataway, New Jersey: Rutgers; 2008. pp. 89–101. [Google Scholar]
  14. Felsenstein J. Maximum-likelihood estimation of evolutionary trees from continuous characters. American Journal of Human Genetics. 1973;25:471–492. [PMC free article] [PubMed] [Google Scholar]
  15. Felsenstein J. Population differences in quantitative characters and gene frequencies: a comment on papers by Lewontin and Rogers. American Naturalist. 1986;127:731–732. [Google Scholar]
  16. Gannett L. Questions asked and unasked: how by worrying less about the ‘really real’ philosophers of science might better contribute to debates about genetics and race. Synthese. 2010;177:363–385. [Google Scholar]
  17. Glasgow JM. On the new biology of race. The Journal of Philosophy. 2003;100:456–474. [Google Scholar]
  18. Goodman AH. Why genes don’t count (for racial differences in health) American Journal of Public Health. 2000;90:1699–1702. doi: 10.2105/ajph.90.11.1699. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Hardimon MO. The idea of a scientific concept of race. Journal of Philosophical Research. 2012;37:249–282. [Google Scholar]
  20. Hochman A. Against the new racial naturalism. The Journal of Philosophy. 2013;110:331–351. [Google Scholar]
  21. Holsinger KE, Weir BS. Genetics in geographically structured populations: defining, estimating and interpreting FST. Nature Reviews Genetics. 2009;10:639–650. doi: 10.1038/nrg2611. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Kaplan JM, Winther RG. Realism, antirealism, and conventionalism about race. Philosophy of Science. 2014;81:1039–1052. [Google Scholar]
  23. Kitcher P. Does ‘race’ have a future? Philosophy & Public Affairs. 2007;35:293–317. [Google Scholar]
  24. Kopec M. Philosophy of Science. Clines, clusters, and clades in the race debate. (forthcoming) [Google Scholar]
  25. Leinonen T, McCairns RJS, O’Hara RB, Merilä J. QST-FST comparisons: evolutionary and ecological insights from genomic heterogeneity. Nature Reviews Genetics. 2013;14:179–190. doi: 10.1038/nrg3395. [DOI] [PubMed] [Google Scholar]
  26. Lande R. Natural selection and random genetic drift in phenotypic evolution. Evolution. 1976;30:314–334. doi: 10.1111/j.1558-5646.1976.tb00911.x. [DOI] [PubMed] [Google Scholar]
  27. Lande R. Neutral theory of quantitative genetic variance in an island model with local extinction and colonization. Evolution. 1992;46:381–389. doi: 10.1111/j.1558-5646.1992.tb02046.x. [DOI] [PubMed] [Google Scholar]
  28. Lewontin RC. The apportionment of human diversity. In: Dobzhansky T, Hecht MK, Steere WC, editors. Evolutionary Biology. Vol. 6. New York: Appleton-Century-Crofts; 1972. pp. 381–398. [Google Scholar]
  29. Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, et al. Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008;319:1100–1104. doi: 10.1126/science.1153717. [DOI] [PubMed] [Google Scholar]
  30. Ludwig D. Philosophy of Science. Against the new metaphysics of race. (forthcoming) [Google Scholar]
  31. Lynch M, Spitze K. Evolutionary genetics of Daphnia. In: Real L, editor. Ecological Genetics. Princeton: Princeton University Press; 1994. pp. 109–128. [Google Scholar]
  32. Mahalanobis PC. On the generalized distance in statistics. Proceedings of the National Institute of Sciences of India. 1936;2:49–55. [Google Scholar]
  33. Mitton JB. Genetic differentiation of races of man as judged by single-locus and multilocus analyses. The American Naturalist. 1977;111:203–212. [Google Scholar]
  34. Mountain JL, Cavalli-Sforza LL. Multilocus genotypes, a tree of individuals, and human evolutionary history. American Journal of Human Genetics. 1997;61:705–718. doi: 10.1086/515510. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Neel JV. The major ethnic groups: Diversity in the midst of similarity. American Naturalist. 1981;117:83–87. [Google Scholar]
  36. Pujol B, Wilson AJ, Ross RIC, Pannell JR. Are QST-FST comparisons for natural populations meaningful? Molecular Ecology. 2008;17:4782–4785. doi: 10.1111/j.1365-294X.2008.03958.x. [DOI] [PubMed] [Google Scholar]
  37. Relethford JH. Apportionment of global human genetic diversity based on craniometrics and skin color. American Journal of Physical Anthropology. 2002;118:393–398. doi: 10.1002/ajpa.10079. [DOI] [PubMed] [Google Scholar]
  38. Risch N, Burchard E, Ziv E, Tang H. Categorization of humans in biomedical research: genes, race and disease. Genome Biology. 2002;3 doi: 10.1186/gb-2002-3-7-comment2007. comment 2007.1–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Rogers AR, Harpending HC. Population structure and quantitative characters. Genetics. 1983;105:985–1002. doi: 10.1093/genetics/105.4.985. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Rosenberg NA. A population-genetic perspective on the similarities and differences among worldwide human populations. Human Biology. 2011;83:659–684. doi: 10.3378/027.083.0601. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Rosenberg NA, Li LM, Ward R, Pritchard JK. Informativeness of genetic markers for inference of ancestry. American Journal of Human Genetics. 2003a;73:1402–1422. doi: 10.1086/380416. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA, et al. Response to Comment on “Genetic Structure of Human Populations”. Science. 2003b;300:1877. doi: 10.1126/science.1078311. [DOI] [PubMed] [Google Scholar]
  43. Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA, et al. Genetic structure of human populations. Science. 2002;298:2381–2385. doi: 10.1126/science.1078311. [DOI] [PubMed] [Google Scholar]
  44. Samuels SM. On the number of successes in independent trials. Annals of Mathematical Statistics. 1965;36:1272–1278. [Google Scholar]
  45. Sesardic N. Race: a social destruction of a biological concept. Biology & Philosophy. 2010;25:143–162. [Google Scholar]
  46. Smouse PE, Spielman RS, Park MH. Multiple-locus allocation of individuals to groups as a function of the genetic variation within and differences among human populations. American Naturalist. 1982;119:445–460. [Google Scholar]
  47. Spencer Q. A radical solution to the race problem. Philosophy of Science. 2014;81:1025–1038. [Google Scholar]
  48. Spencer Q. Studies in History and Philosophy of Biological and Biomedical Sciences. Philosophy of race meets population genetics. (forthcoming) [DOI] [PubMed] [Google Scholar]
  49. Spitze K. Population structure in Daphnia obtusa: quantitative genetic and allozymic variation. Genetics. 1993;135:367–374. doi: 10.1093/genetics/135.2.367. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Tal O. The cumulative effect of genetic markers on classification performance: Insights from simple models. Journal of Theoretical Biology. 2012;293:206–218. doi: 10.1016/j.jtbi.2011.10.005. [DOI] [PubMed] [Google Scholar]
  51. Tang H, Quertermous T, Rodriguez B, Kardia SLR, Zhu X, Brown A, et al. Genetic structure, self-identified race/ethnicity, and confounding in case-control association studies. American Journal of Human Genetics. 2005;76:268–275. doi: 10.1086/427888. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Whitlock MC. Neutral additive genetic variance in a metapopulation. Genetical Research. 1999;74:215–221. doi: 10.1017/s0016672399004127. [DOI] [PubMed] [Google Scholar]
  53. Whitlock MC. Evolutionary inference from QST. Molecular Ecology. 2008;17:1885–1896. doi: 10.1111/j.1365-294X.2008.03712.x. [DOI] [PubMed] [Google Scholar]
  54. Winther RG. The genetic reification of ‘race’? A story of two mathematical methods. Critical Philosophy of Race. 2014;2:204–223. [Google Scholar]

RESOURCES