Abstract
Human genome-wide association studies (GWASs) are revealing the genetic architecture of anthropometric and biomedical traits, i.e., the frequencies and effect sizes of variants that contribute to heritable variation in a trait. To interpret these findings, we need to understand how genetic architecture is shaped by basic population genetics processes, notably mutation, natural selection, and genetic drift. Because many quantitative traits are subject to stabilizing selection and because genetic variation that affects one trait often affects many others, we model the genetic architecture of a focal trait that arises under stabilizing selection in a multidimensional trait space. We solve the model for the phenotypic distribution and allelic dynamics at steady state and derive robust, closed-form solutions for summary statistics of the genetic architecture. Our results provide a simple interpretation for missing heritability and why it varies among traits. They predict that the distribution of variances contributed by loci identified in GWASs is well approximated by a simple functional form that depends on a single parameter: the expected contribution to genetic variance of a strongly selected site affecting the trait. We test this prediction against the results of GWASs for height and body mass index (BMI) and find that it fits the data well, allowing us to make inferences about the degree of pleiotropy and mutational target size for these traits. Our findings help to explain why the GWAS for height explains more of the heritable variance than the similarly sized GWAS for BMI and to predict the increase in explained heritability with study sample size. Considering the demographic history of European populations, in which these GWASs were performed, we further find that most of the associations they identified likely involve mutations that arose shortly before or during the Out-of-Africa bottleneck at sites with selection coefficients around s = 10⁻³.
Author summary
One of the central goals of evolutionary genetics is to understand the processes that give rise to phenotypic variation in humans and other taxa. Genome-wide association studies (GWASs) in humans provide an unprecedented opportunity in that regard, revealing the genetic basis of variation in numerous traits. However, exploiting this opportunity requires models that relate genetic and population genetic processes with the discoveries emerging from GWASs. We present such a model and show that it can help explain the results of GWASs for height and body mass index. More generally, our results offer a simple interpretation of the findings emerging from GWASs and suggest how they relate to the evolutionary and genetic forces that give rise to phenotypic variation.
Introduction
Much of the phenotypic variation in human populations, including variation in morphological, life history, and biomedical traits, is “complex” or “quantitative”, in the sense that heritable variation in the trait is largely due to small contributions from many genetic variants segregating in the population [1,2]. Quantitative traits have been studied since the birth of biometrics over a century ago [1–3], but only in the past decade have technological advances made it possible to systematically dissect their genetic basis [4–6]. Notably, since 2007, genome-wide association studies (GWASs) in humans have led to the identification of many thousands of variants reproducibly associated with hundreds of quantitative traits, including susceptibility to a wide variety of diseases [4]. While still ongoing, these studies already provide important insights into the genetic architecture of quantitative traits, i.e., the number of variants that contribute to heritable variation, as well as their frequencies and effect sizes.
Perhaps the most striking observation to emerge from these studies is that, despite the large sample size of many GWASs, all variants significantly associated with a given trait typically account for less (often much less) than 25% of the narrow sense heritability ([4,7,8], but see [9]). (Henceforth, we use “heritability” to refer to narrow sense heritability.) While many factors have been hypothesized to contribute to the “missing heritability” [7,8,10–14], the most straightforward explanation and the emerging consensus is that much of the heritable variation derives from variants with frequencies that are too low or effect sizes that are too small for current studies to detect. Comparisons among traits also suggest that there are substantial differences in architectures. For example, recent meta-analysis GWASs uncovered 7 times as many variants for height (697) as for body mass index (97), and together, the variants for height account for more than 4 times as much of the heritable variance as the variants for body mass index do (approximately 20% versus approximately 3%–5%, respectively), despite comparable sample sizes [15,16].
These first glimpses underscore the need for theory that relates the findings emerging from GWASs with the evolutionary processes that shape genetic architectures. Such theory would help to interpret the “missing heritability” [17–20] and to explain why architecture differs among traits. It may also allow us to use GWAS findings to make inferences about underlying evolutionary parameters, helping to answer enduring questions about the processes that maintain phenotypic variation in quantitative traits [5,21].
Development of such theory can be guided by empirical observations and first-principles considerations. New mutations affecting a trait arise at a rate that depends on its “mutational target size” (i.e., the number of sites at which a mutation would affect the trait). Once they arise, the trajectories of variants through the population are determined by the interplay between genetic drift, demographic processes, and natural selection acting on them. These processes determine the number and frequencies of segregating variants underlying variation in the trait. The genetic architecture further depends on the relationship between the selection on variants and their effects on the trait. Notably, selection on variants depends not only on their effect on the focal trait but also on their pleiotropic effects on other traits. We therefore expect both direct and pleiotropic selection to shape the joint distribution of allele frequencies and effect sizes.
Multiple lines of evidence suggest that many quantitative traits are subject to stabilizing selection, i.e., selection favoring an intermediate trait value [5,22–27]. For instance, a decline in fitness components (e.g., viability and fecundity) is observed with displacement from mean values for a variety of traits in human populations [28–30], in other species in the wild [31,32], and in experimental manipulations [31,33]. While less is known about complex diseases, they may often reflect large deviations of an underlying continuous trait from an optimal value [1], with these continuous traits subject to directional (purifying) selection in some cases and to stabilizing selection in others. What remains unclear is the extent to which stabilizing selection is acting directly on variation in a given trait or is “apparent”, i.e., results from pleiotropic effects of this variation on other traits.
Other lines of evidence suggest that pleiotropy is pervasive. For one, theoretical considerations about the variance in fitness in natural populations and its accompanying genetic load suggest that only a moderate number of independent traits can be effectively selected on at once [34]. Thus, the aforementioned relationships between the value of a focal trait and fitness are likely heavily affected by the pleiotropic effects of genetic variation on other traits [25,34–36]. Second, many of the variants detected in human GWASs have been found to be associated with more than one trait [37–41]. For example, a recent analysis of GWASs revealed that variants that delay the age of menarche in women tend to delay the age of voice drop in men, decrease body mass index, increase adult height, and decrease risk of male pattern baldness [37]. More generally, the extent of pleiotropy revealed by GWASs appears to be increasing rapidly with improvements in power and methodology [37,42–45]. These considerations and others [45,46] point to the general importance of pleiotropic selection on quantitative genetic variation.
The discoveries emerging from human GWASs further suggest that genetic variance is dominated by additive contributions from numerous variants with small effect sizes. Dominance and epistasis may be common among newly arising mutations of large effect (e.g., [47–51]), but both theory and data suggest that they play a minor role in shaping quantitative genetic variation within populations (e.g., [9,52–56]). Indeed, for many traits, most or all of the heritability explained in GWASs arises from the additive contribution of variants with squared effect sizes that are substantially smaller than the total genetic variance (e.g., [15,16,57,58]). Moreover, statistical quantifications of the total genetic variance tagged by genotyping (i.e., not only due to the genome-wide significant associations) suggest that such contributions may account for most of the heritable variance in many traits (e.g., [9,59–61]). Finally, considerable efforts to detect epistatic interactions in human GWASs have, by and large, come up empty-handed [9,56,62], with few counterexamples, mostly involving variants in the major histocompatibility complex region ([53,56,63,64], but see [65]). Thus, while the discovery of epistatic interactions may be somewhat limited by statistical power [56], theory and current evidence suggest that nonadditive interactions play a minor role in shaping human quantitative genetic variation. Motivated by these considerations, we model how direct and pleiotropic stabilizing selection shape the genetic architecture of continuous, quantitative traits by considering additive variants with small effects and assuming that together they account for most of the heritable variance.
To date, there has been relatively little theoretical work relating population genetics processes with the results emerging from GWASs. Moreover, the few existing models have reached divergent predictions about genetic architecture, largely because they make different assumptions about the effects of pleiotropy. Focusing on disease susceptibility, Pritchard [19] considered the “purely pleiotropic” extreme, in which selection on variants is independent of their effect on the trait being considered. In this case, we expect the largest contribution to genetic variance in a trait to come from mutations that have large effect sizes but are also weakly selected or neutral, allowing them to ascend to relatively high frequencies. Other studies considered the opposite extreme, in which selection on variants stems entirely from their effect on the trait under consideration [26,66–70], and have shown that the greatest contribution to genetic variance would arise from strongly selected mutations [67,68] (we return to this case below).
In practice, we expect most traits to fall somewhere in between these extremes. While there are compelling reasons to believe that quantitative genetic variation is highly pleiotropic, the effects of variants on different traits are likely to be correlated. Thus, even if a given trait is not subject to selection, variants that have a large effect on it will also tend to have larger effects on traits that are under selection (e.g., by causing large perturbation to pathways that affect multiple traits [36,45]). Motivated by such considerations, Eyre-Walker (2010) [20], Keightley and Hill (1990) [18], and Caballero et al. (2015) [71] considered models in which the correlation between the strength of selection on an allele and its effect size can vary between the purely pleiotropic and direct selection extremes. These models diverge in their predictions about architecture, however. Assuming, as seems plausible, an intermediate correlation between the strength of selection and effect size, Eyre-Walker finds that genetic variance should be dominated by strongly selected mutations [20], whereas Keightley and Hill and Caballero et al. conclude that the greatest contribution should arise from weakly selected ones [18,71]. Their conclusions differ because of how they chose to model the relationship between selection and effect size, a choice based largely on mathematical convenience. We approach this problem by explicitly modeling stabilizing selection on multiple traits, thereby learning, rather than assuming, the relationship between selection and effect sizes.
The model
We model stabilizing selection in a multidimensional phenotype space, akin to Fisher’s geometric model [72]. An individual’s phenotype is a vector in an n-dimensional Euclidean space, in which each dimension corresponds to a continuous quantitative trait. We focus on the architecture of one of these traits (say, the first dimension), where the total number of traits parameterizes pleiotropy. Fitness is assumed to decline with distance from the optimal phenotype positioned at the origin, thereby introducing stabilizing selection. Specifically, we assume that absolute fitness takes the form
$$W(\vec{r}) = \exp\left(-\frac{r^2}{2w^2}\right) \tag{1}$$
where $\vec{r}$ is the (n-dimensional) phenotype, $r = \lVert\vec{r}\rVert$ is the distance from the origin, and w parameterizes the strength of stabilizing selection. However, we later show that the specific form of the fitness function does not matter. Moreover, the additive environmental contribution to the phenotype can be absorbed into w ([73]; Section 1.1 in S1 Text); we therefore consider only the genetic contribution.
The genetic contribution to the phenotype follows from the multidimensional additive model [74]. Specifically, we assume that the number of genomic sites affecting the phenotype (the target size) is very large, L ≫ 1, and that allelic effects on the phenotype at these sites are vectors in the n-dimensional trait space. An individual’s phenotype then follows from adding up the effects of her or his alleles, i.e.,
$$\vec{r} = \sum_{l=1}^{L} \left(\vec{a}_l + \vec{a}'_l\right) \tag{2}$$
where $\vec{a}_l$ and $\vec{a}'_l$ are the phenotypic effects of the parents’ alleles at site l.
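As an illustration of the two ingredients just described, the following minimal sketch computes an individual's phenotype by summing allelic effect vectors over sites and evaluates its fitness. It assumes a Gaussian fitness function around an optimum at the origin; the function names and the list-of-lists data layout are hypothetical choices made for this example.

```python
import math

def phenotype(paternal_effects, maternal_effects):
    """Additive model: sum the effect vectors of both parental alleles over all sites."""
    n = len(paternal_effects[0])
    r = [0.0] * n
    for a_pat, a_mat in zip(paternal_effects, maternal_effects):
        for i in range(n):
            r[i] += a_pat[i] + a_mat[i]
    return r

def fitness(r, w):
    """Assumed Gaussian stabilizing selection: fitness declines with squared distance from the origin."""
    r2 = sum(x * x for x in r)
    return math.exp(-r2 / (2.0 * w * w))

# Two sites, two traits; effects are illustrative numbers only.
example = phenotype([[0.1, 0.0], [0.0, -0.2]], [[0.0, 0.0], [0.1, 0.1]])
```

Here an allele that is absent (the ancestral state) is represented by a zero vector, so that only derived alleles contribute to the sum.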
The population dynamics follows from the standard model of a diploid, panmictic population of constant size N, with nonoverlapping generations. In each generation, parents are randomly chosen to reproduce with probabilities proportional to their fitness (i.e., Wright-Fisher sampling with viability selection), followed by mutation, free recombination (i.e., no linkage), and Mendelian segregation. We further assume that the mutation rate per site, u, and the population size are sufficiently small such that no more than 2 alleles segregate at any time at each site (i.e., that θ = 4Nu ≪ 1) and therefore an infinite sites approximation applies. The number of mutations per gamete per generation therefore follows a Poisson distribution with mean U = Lu; based on biological considerations (see Sections 4.1 and 4.2 in S1 Text), we also assume that 1 ≫ U ≫ 1/2N. The size of mutations in the n-dimensional trait space, $a = \lVert\vec{a}\rVert$, is drawn from some distribution, assuming only that a² ≪ w². We later show that this requirement is equivalent to the standard assumption about selection coefficients satisfying s ≪ 1 (also see Section 4.3 in S1 Text). The directions of mutations are assumed to be isotropic, i.e., uniformly distributed on the hypersphere in n-dimensions defined by their size, although we later show that our results are robust to relaxing this assumption as well.
Results
The phenotypic distribution
In the first 3 sections, we develop the tools that we later use to study genetic architecture. We start by considering the equilibrium distribution of phenotypes in the population and generalizing previous results for the case with a single trait [26,66,67,70]. Under biologically sensible conditions, this distribution is well approximated by a tight multivariate normal centered at the optimum. Namely, the distribution of n-dimensional phenotypes, $\vec{r}$, in the population is well approximated by the probability density function:
$$f(\vec{r}) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{r^2}{2\sigma^2}\right) \tag{3}$$
where σ² is the genetic variance of the phenotypic distances from the optimum (see Eq A25 in S1 Text for closed form); and under plausible assumptions about the rate and size of mutations (i.e., when 1 ≫ U ≫ 1/2N and a² ≪ w²), it satisfies σ² ≪ w², implying small variance in fitness in the population (Section 4.2 in S1 Text). Intuitively, the phenotypic distribution is normal because it derives from additive and (approximately) independently and identically distributed contributions from many segregating sites. Moreover, the population mean remains extremely close to the optimum because stabilizing selection becomes increasingly stronger with the displacement from it and because any displacement is rapidly offset by minor changes to allele frequencies at many segregating sites.
With phenotypes close to the optimum, only the curvature of the fitness function at the optimum (i.e., the multidimensional second derivative) affects the selection acting on individuals. In addition, it is always possible to choose an orthonormal coordinate system centered at the optimum, in which the trait under consideration varies along the first coordinate and a unit change in other traits (along other coordinates) near the optimum has the same effect on fitness. These considerations suggest that the equilibrium behavior is insensitive to our choice of fitness function around the optimum. Moreover, in S1 Text (Section 5), we show that the rapid offset of perturbations of the population mean from the optimum (by minor changes to allele frequencies at numerous sites) lends robustness to the equilibrium dynamics with respect to the presence of major loci, moderate changes in the optimal phenotype over time, and moderate asymmetries in the mutational distribution.
Allelic dynamic
Next, we consider the dynamic at a segregating site and generalize previous results for the case with a single trait [68–70]. This dynamic can be described in terms of the first 2 moments of change in allele frequency in a single generation (see, e.g., [75]). To calculate these moments for an allele with phenotypic effect $\vec{a}$ and frequency q (= 1 − p), we note that the phenotypic distribution can be well approximated as a sum of the expected contribution of the allele to the phenotype, $2q\vec{a}$, and the distribution of contributions to the phenotype from all other sites, $\vec{R}$. From Eq 3, it then follows that the distribution of background contributions is well approximated by probability density:
$$f(\vec{R} \mid \vec{a}, q) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{\lVert\vec{R} + 2q\vec{a}\rVert^2}{2\sigma^2}\right) \tag{4}$$
By averaging the fitness of the 3 genotypes at the focal site over the distribution of genetic backgrounds, we find that the first moment is well approximated by
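$$E(\Delta q) \approx -\frac{a^2}{w^2}\, pq \left(\tfrac{1}{2} - q\right) \tag{5}$$
assuming that a² and σ² ≪ w² (Section 4 in S1 Text). By the same token, we find that
$$V(\Delta q) \approx \frac{pq}{2N} \tag{6}$$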
which is the standard second moment with genetic drift.
The functional form of the first moment is equivalent to that of the standard viability selection model with underdominance. This result is a hallmark of stabilizing selection on (additive) quantitative traits: with the population mean at the optimum, the dynamics at different sites are decoupled, and selection at a given site acts to reduce its contribution to the phenotypic variance (2a²pq), thereby pushing rare alleles to loss. Comparison with the standard viability selection model shows that the selection coefficient in our model is s = a²/w², or S = 2Ns = 2Na²/w² in scaled units. In other words, the selection acting on an allele is proportional to its size squared in the n-dimensional trait space (where w translates effect size into units of fitness).
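The mapping from effect size to selection coefficient, and the resulting single-generation moments, can be sketched as follows. This is an illustrative transcription of the underdominant first moment with s = a²/w² and the standard drift variance; the function names are hypothetical.

```python
def selection_coefficient(a, w):
    """s = a^2 / w^2: selection scales with the squared size of the mutation."""
    return (a * a) / (w * w)

def scaled_selection_coefficient(a, w, N):
    """S = 2Ns, the selection coefficient in scaled units."""
    return 2.0 * N * selection_coefficient(a, w)

def mean_change(q, s):
    """First moment of the change in allele frequency (underdominant form)."""
    p = 1.0 - q
    return -s * p * q * (0.5 - q)

def drift_variance(q, N):
    """Second moment: binomial sampling variance in a diploid population of size N."""
    return q * (1.0 - q) / (2.0 * N)
```

The sign of `mean_change` flips at q = ½: alleles below that frequency are pushed toward loss and alleles above it toward fixation, which is the underdominance-like behavior described in the text.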
The relationship between selection and effect size
The statistical relationship between the strength of selection acting on mutations and their effect on a given trait follows from the aforementioned geometric interpretation of selection. Specifically, all mutations with a given selection coefficient, s, lie on a hypersphere in n-dimensions with radius $a = w\sqrt{s}$, and any given mutation satisfies
$$a^2 = \sum_{i=1}^{n} a_i^2 \tag{7}$$
where ai is the allele’s effect on the i-th trait (Fig 1A). Our assumption that mutation is isotropic then implies that the probability density of mutations on the hypersphere is uniform.
The distribution of effect sizes on a focal trait, a1, corresponding to a given selection coefficient, s, follows. Given that mutation is symmetric in any given trait, E(a1|s) = 0, and given that it is symmetric among traits,
$$E(a_1^2 \mid s) = \frac{a^2}{n} = \frac{w^2}{n}\, s \tag{8}$$
More generally, the probability density corresponding to an effect size a1 is proportional to the volume of the (n − 2)–dimensional cross section of the hypersphere with projection a1 (Fig 1A). For a single trait, this implies that a1 = ±a with probability ½, and for n > 1, it implies the probability density
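$$\varphi_n(a_1 \mid a) = \frac{1}{a}\, \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi}\, \Gamma\left(\frac{n-1}{2}\right)} \left(1 - \frac{a_1^2}{a^2}\right)^{\frac{n-3}{2}}, \qquad |a_1| \le a \tag{9}$$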
(Section 1.2 in S1 Text). Intriguingly, when the number of traits n increases, this density approaches a normal distribution, i.e.,
$$a_1 \mid a \;\sim\; N\left(0, \frac{a^2}{n}\right) \tag{10}$$
implying that the distribution of effect sizes given the selection coefficient becomes
$$a_1 \mid s \;\sim\; N\left(0, \frac{w^2}{n}\, s\right) \tag{11}$$
This limit is already well approximated for a moderate number of traits (e.g., n = 10; Fig 1B).
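The geometric argument can be checked numerically. The sketch below draws isotropic mutations of fixed size a in n dimensions (by normalizing a vector of independent standard normal deviates, a standard way of sampling uniformly on a hypersphere) and estimates the first two moments of the effect on the focal trait; under the symmetry arguments above, these should be close to 0 and a²/n. The function names, sample size, and seed are arbitrary choices for illustration.

```python
import math
import random

def isotropic_effect(a, n, rng):
    """Draw an n-dimensional effect vector of length a with a uniformly random direction."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    return [a * x / norm for x in g]

def focal_trait_moments(a, n, draws=20000, seed=1):
    """Monte Carlo estimates of E(a1) and E(a1^2) for mutations of size a."""
    rng = random.Random(seed)
    first = 0.0
    second = 0.0
    for _ in range(draws):
        a1 = isotropic_effect(a, n, rng)[0]
        first += a1
        second += a1 * a1
    return first / draws, second / draws

mean_a1, mean_a1_sq = focal_trait_moments(a=1.0, n=10)
```

With a = 1 and n = 10, `mean_a1_sq` should fall near 0.1, and a histogram of the sampled a1 values would be expected to look approximately normal, in line with the limit described above.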
The limit behavior also holds when we relax the assumption of isotropic mutation. This generalization is important because, having chosen a parameterization of traits in which the fitness function near the optimum is isotropic, we can no longer assume that the distribution of mutations is also isotropic [76]. Specifically, mutations might tend to have larger effects on some traits than on others, and their effects on different traits might be correlated. In Section 5.4 in S1 Text, we show that the limit distribution (Eq 11) also holds for anisotropic mutation (excluding pathological cases). To this end, we introduce the concept of an effective number of traits, ne, which can take any real value ≥1 and is defined as the number of equivalent traits required to generate the same relationship between the strength of selection on mutations and their expected effects on the trait under consideration (i.e., replacing n in Eq 11). The robustness of our model, along with mounting evidence that genetic variation is highly pleiotropic (see “Introduction”), suggests that the limit form may apply quite generally. In that regard, we note that even in this limit, the strength of selection on mutations and their effects on the focal trait are correlated, implying that the kind of “purely pleiotropic” extreme postulated in previous works cannot arise [18–20].
Genetic architecture
We can now derive closed forms for summary statistics of the genetic architecture (see Section 2.3 in S1 Text). For mutations with a given selection coefficient, the frequency distribution follows from the diffusion approximation based on the first 2 moments of change in allele frequency (Eqs 5 and 6; [75]), and the distribution of effect sizes follows from the geometric considerations of the previous section. Conditional on the selection coefficient, these distributions are independent, and therefore, the joint distribution of frequency and effect size equals their product. Summaries of architecture can be expressed as expectations over the joint distribution of frequencies and effect sizes for a given selection coefficient and then weighted according to the distribution of selection coefficients. While we know little about the distribution of selection coefficients of mutations affecting quantitative traits, we can draw general conclusions from examining how summaries of architecture depend on the strength of selection.
Expected variance per site
We focus on the distribution of additive genetic variances among sites, a central feature of architecture that is key to connecting our model with GWAS results. We start by considering how selection affects the expected contribution of a site to additive genetic variance in a focal trait. We include monomorphic sites in the expectation, such that the expected total variance is given by the product of the expectation per-site and the population mutation rate, 2NU. Under the infinite sites assumption, sites are monomorphic or biallelic, and their expected contribution to variance is
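$$E\left(2a_1^2\, q(1-q) \mid S\right) = \frac{2w^2}{nN}\, \sqrt{S}\; D\!\left(\frac{\sqrt{S}}{2}\right), \qquad D(y) = e^{-y^2}\int_0^y e^{t^2}\,dt \tag{12}$$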
(expressed in terms of the scaled selection coefficient S). Thus, the degree of pleiotropy only affects the expectation through a multiplicative constant.
This multiplicative factor would have a discernable effect in generalizations of our model in which the degree of pleiotropy varies among sites. For example, if the degree of pleiotropy of one set of sites was k and of another set was l > k, and both sets were subject to the same strength of selection, then the expected contribution to genetic variance of sites in the first set would be l/k times greater than in the second (from Eq 12). While such generalizations may prove interesting in the future, here we focus on the model in which the degree of pleiotropy is constant. In this case, the multiplicative factor introduced by pleiotropy is not identifiable from data, because even if we could measure genetic variance in units of fitness (e.g., rather than in units of the total phenotypic variance), we still would not be able to distinguish between the effects of w and n on the genetic variance per site. We therefore focus on the effect of selection on the relative contribution to variance, which is insensitive to the degree of pleiotropy in our model.
The effect of selection on the relative contribution to genetic variance was described by Keightley and Hill (in the one-dimensional case [68]) and is depicted in Fig 2A. When selection is strong (roughly corresponding to S > 30), its effect on allele frequency (which scales with 1/S) is canceled out by its relationship with the effect size (Eq 8), yielding a constant contribution to genetic variance per site, vS = 2w²/(nN), regardless of the selection coefficient (Section 3.1 in S1 Text; Fig 2A and Fig A1b in S1 Text). Henceforth, we measure genetic variance in units of vS. When selection is effectively neutral (roughly corresponding to S < 1) and thus too weak to affect allele frequency, the expected contribution of a site to genetic variance scales with the squared effect size (and hence with S); it equals ½S·vS and therefore is lower than under strong selection (Section 3.1 in S1 Text; Fig 2A and Fig A1a in S1 Text). In between these selection regimes, selection effects on allele frequency are more complex and are influenced by underdominance (Section 3.1 in S1 Text). As the selection coefficient increases, the expected contribution to variance reaches vS at S ≈ 3 and continues to increase until it reaches a maximal contribution that is approximately 30% greater at S ≈ 10 (Fig 2A), after which it slowly declines to the asymptotic value of vS (Fig 2A and Fig A1b in S1 Text). Henceforth, we refer to this selection regime as intermediate (not to be confused with the nearly neutral range, which is much narrower and does not include selection coefficients with S > 10). These results suggest that effectively neutral sites should contribute much less to genetic variance than intermediate and strongly selected ones [67,68].
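The dependence of the expected contribution on S can be explored numerically. The sketch below is not taken from the source: it assumes that, under the diffusion approximation with the underdominant first moment and the drift second moment, the expected contribution per site in units of vS reduces to (S/2)·∫₀¹ exp(−S q(1−q)) dq, and it evaluates this integral with Simpson's rule. The function name and step count are hypothetical.

```python
import math

def expected_variance(S, steps=2000):
    """Assumed form of E(v)/v_S = (S/2) * integral_0^1 exp(-S q (1-q)) dq.

    The integral is evaluated with composite Simpson's rule (steps must be even).
    """
    h = 1.0 / steps

    def integrand(q):
        return math.exp(-S * q * (1.0 - q))

    total = integrand(0.0) + integrand(1.0)
    for k in range(1, steps):
        weight = 4.0 if k % 2 else 2.0
        total += weight * integrand(k * h)
    return 0.5 * S * total * h / 3.0

# Relative contribution across the three selection regimes discussed in the text.
profile = {S: expected_variance(S) for S in (0.1, 1.0, 3.0, 10.0, 30.0, 100.0)}
```

Under this assumed form, the values should be close to ½S for S well below 1, rise above 1 at intermediate S (peaking near S ≈ 10), and approach 1 for large S, matching the qualitative behavior described for Fig 2A.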
While intermediate and strongly selected sites contribute similarly to variance, their minor allele frequencies (MAFs) can differ markedly (Fig 2B). As an illustration, segregating sites with MAF > 0.1 account for approximately 72% and approximately 49% of the additive genetic variance for intermediate selection coefficients of S = 3 and 10, respectively, when almost no segregating sites would be found at such high MAF for a strong selection coefficient of S = 100 (Fig 2B). Thus, within the wide range of selection coefficients characterized as intermediate and strong, genetic variance arises from sites segregating at a wide range of MAFs ranging from common to exceedingly rare.
Distribution of variances among sites
Next, we consider how genetic variance is distributed among sites with a given selection coefficient. We focus on the distribution among segregating sites (including monomorphic effects would just add a point mass at 0). This distribution is especially relevant to interpreting the results of GWASs, because, to a first approximation, a study will detect only sites with contributions to variance exceeding a certain threshold, v*, which decreases as the study size increases (see “Discussion”). We therefore depict the distribution in terms of the proportion of genetic variance, G(v), arising from sites whose contribution to genetic variance exceeds a threshold v.
We begin with the case without pleiotropy (n = 1), in which selection on an allele determines its effect size (Fig 3A). When selection is strong (S > 30), the proportion of genetic variance exceeding a threshold v is also insensitive to the selection coefficient and takes a simple form, with
$$G(v) = \exp(-2v) \tag{13}$$
(Fig 3A; Section 3.2 in S1 Text). In contrast, in the effectively neutral range (S < 1),
$$G(v) = \sqrt{1 - \frac{v}{v_{\max}}} \tag{14}$$
where the dependency on the selection coefficient enters through $v_{\max} = S/8$ (in units of vS), which is the maximal contribution to variance and corresponds to an allele frequency of ½ (Fig A4a; Section 3.2 in S1 Text). In the intermediate selection regime, G(v) is also intermediate and takes a more elaborate functional form (Section 3.2 in S1 Text). These results suggest how genetic variance would be distributed among sites given any distribution of selection coefficients (Fig 3A): starting from sites that contribute the most, the distribution would at first be dominated by strongly selected sites, and then the intermediate selected sites would begin to contribute, whereas effectively neutral sites would enter only for v ≪ vS.
Pleiotropy causes sites with a given selection coefficient to have a distribution of effect sizes on the focal trait, thereby increasing the contribution to genetic variance of some sites and decreasing it for others. In Section 3.2 of S1 Text, we show that increasing the degree of pleiotropy, n, increases the proportion of genetic variance, G(v), for any threshold, v, regardless of the distribution of selection coefficients (Fig A5 in S1 Text). When variation in a trait is sufficiently pleiotropic for the distribution of effect sizes to attain the limit form (Eq 11)
$$G(v) = \left(1 + 2\sqrt{v}\right) \exp\left(-2\sqrt{v}\right) \tag{15}$$
for strongly selected sites and
$$G(v) = \exp\left(-\frac{4v}{S}\right) \tag{16}$$
for effectively neutral ones (Fig 3B and Fig A4b in S1 Text; Section 3.2 in S1 Text). The intermediate selection range is split between these behaviors: on the weaker end, roughly corresponding to S < 5, G(v) is similar to the effectively neutral case (Fig A4b and Section 3.2 in S1 Text); and on the stronger end, roughly corresponding to S > 5, G(v) is similar to the case of strong selection, with measurable differences only when v ≫ vs (inset in Fig 3B and Section 3.2 in S1 Text). We would therefore expect that as the sample size of a GWAS increases and the threshold contribution to variance decreases, intermediate and strongly selected sites (more precisely, sites with S > 5) will be discovered first, and effectively neutral sites will be discovered much later. In S1 Text (Section 3.2 and Fig A3 in S1 Text), we also derive corollaries for the distribution of numbers of segregating sites that make a given contribution to genetic variance.
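For strongly selected, highly pleiotropic sites, the proportion of variance above a threshold can be approximated by simulation. The sketch below rests on assumptions that are not quoted from the source: alleles are rare, so that a site's variance in units of vS is (Sq/2)·z², where z is a standard normal deviate representing the focal-trait effect in the high-pleiotropy limit; and the variance-weighted distribution of the scaled frequency Sq is exponential with mean 1. The closed form used for comparison, (1 + 2√v)·exp(−2√v), is likewise an assumption of this sketch. All names are hypothetical.

```python
import math
import random

def g_closed_form(v):
    """Assumed closed form for G(v) under strong selection and high pleiotropy (v in units of v_S)."""
    root = math.sqrt(v)
    return (1.0 + 2.0 * root) * math.exp(-2.0 * root)

def g_monte_carlo(v, draws=100000, seed=7):
    """Variance-weighted Monte Carlo estimate of the proportion of variance from sites exceeding v."""
    rng = random.Random(seed)
    above = 0.0
    total = 0.0
    for _ in range(draws):
        y = rng.expovariate(1.0)          # scaled frequency S*q, variance-weighted
        z2 = rng.gauss(0.0, 1.0) ** 2     # squared focal-trait effect relative to its mean
        total += z2                       # each draw is weighted by z^2
        if 0.5 * y * z2 > v:
            above += z2
    return above / total
```

Comparing `g_monte_carlo(v)` with `g_closed_form(v)` over a range of thresholds gives a quick consistency check; agreement within Monte Carlo error is expected only if the rare-allele and normal-limit assumptions stated above hold.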
Discussion
Interpreting the results of human GWASs
In humans, GWASs for many traits display a similar behavior: when sample sizes are small, the studies discover almost nothing, but once they exceed a threshold sample size, both the number of associations discovered and the heritability explained begin to increase rapidly [4,77]. Intriguingly though, both the threshold study size and rate of increase vary among traits. These observations raise several questions: How is the threshold study size determined? How should the number of associations and explained heritability increase with study size once this threshold is exceeded? With an order of magnitude increase in study sizes into the millions imminent, how much more of the genetic variance in traits should we expect to explain? The theory that we developed provides tentative answers to these questions.
To relate the theory to GWASs, we must first account for the power to detect loci that contribute to quantitative genetic variation. In studies of continuous traits, the power can be approximated by a step function, where loci that contribute more than a threshold value v* to additive genetic variance will be detected and those that contribute less will not (Section 6.1 in S1 Text). The threshold depends on the study size, m, and on the total phenotypic variance in the trait, VP, where v* ∝ VP/m (Section 6.1 in S1 Text; [77]); conversely, the study size m needed to detect loci with contributions above v* is proportional to VP/v*. Given a trait and study size, the number of associations discovered and heritability explained then follow from our predictions for the distribution of variances among sites.
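The step-function approximation and the scaling v* ∝ VP/m can be illustrated with a minimal Python sketch. The proportionality constant `c` is a hypothetical placeholder (its actual value depends on details of the study that are not given here), and the function names are ours rather than part of the published code.

```python
def detection_threshold(v_p, m, c=1.0):
    """Threshold contribution to variance, v* = c * V_P / m.

    c is a hypothetical proportionality constant (an assumption, not a value
    taken from the text).
    """
    if m <= 0:
        raise ValueError("study size m must be positive")
    return c * v_p / m


def is_detected(v, v_star):
    """Step-function power: a locus is detected only if it contributes more than v*."""
    return v > v_star


def required_study_size(v_p, v_star, c=1.0):
    """Study size needed to detect loci contributing more than v*: m = c * V_P / v*."""
    if v_star <= 0:
        raise ValueError("threshold v* must be positive")
    return c * v_p / v_star


if __name__ == "__main__":
    v_p = 1.0  # phenotypic variance in arbitrary units (illustrative)
    for m in (10_000, 100_000, 1_000_000):
        # A tenfold larger study lowers the threshold tenfold.
        print(m, detection_threshold(v_p, m))
```

Under this approximation, the only way in which study size enters the predictions is through v*, so a tenfold increase in m exposes loci whose contributions to variance are tenfold smaller.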
When genetic variation in a trait is sufficiently pleiotropic, our results suggest that the first loci to be discovered in GWASs will be intermediate or strongly selected, with correspondingly large effect sizes. The functional form of the distribution of variances among these loci (Eq 15 and Fig 3B) implies that for GWASs to capture a substantial proportion of the genetic variance, their threshold variance for detection v* has to be on the order of the expected variance contributed by strongly selected sites, vs, or smaller. We therefore expect the threshold study size for the discovery of intermediate and strongly selected loci to be proportional to VP/vs. When the study size exceeds this threshold, the number of associations detected and proportion of variance explained depend on the study size measured in units of VP/vs (Fig 4) and follow from the functional forms that we derived (Eq 15 and Table A1 in S1 Text). The dependence on VP/vs makes intuitive sense, as the total phenotypic variance VP is background noise for the discovery of individual loci whose contributions to variance are on the order of vs. Some results are modified when variation in a trait is only weakly pleiotropic, which is probably less common: notably, the threshold study size for strongly selected loci would be higher, and loci under intermediate selection would begin to be discovered only after the strongly selected ones (Fig A22 in S1 Text, Eq 13, and Eq A35 in S1 Text). Regardless of the degree of pleiotropy, effectively neutral loci would only begin to be discovered at much larger study sizes, after the bulk of intermediate and strongly selected variance has been mapped (Fig 4 and Fig A22 in S1 Text).
Thus, the dependence of the explained heritability on study size is largely determined by VP/vs and by the proportion of heritable variance arising from intermediate and strongly selected loci, whereas the number of associations also depends on the mutational target size, providing a tentative explanation for why the performance of GWASs varies among traits.
Inference and prediction
Importantly, these theoretical predictions can be tested. As an illustration, we consider height and body mass index (BMI) in Europeans, 2 traits for which GWASs have discovered a sufficiently large number of genome-wide significant (GWS) associations (697 for height [16] and 97 for BMI [15]) for our test to be well powered. We fit our theoretical predictions to the distributions of variances among GWS associations reported for each of these traits, assuming that these distributions faithfully reflect what they would look like for the true causal loci (see Section 6.3 in S1 Text). We further assume that these loci are under intermediate or strong selection (as our predictions suggest) and that they are highly pleiotropic (see "Introduction"; [37, 42, 45]). Under these assumptions, we expect the distribution of variances to be well approximated by a simple form (Eq A89 in S1 Text), which depends on a single parameter, vs. We find that the theoretical distribution with the estimated vs fits the data for both traits well (Fig 5A): we cannot reject our model based on the data for either trait (by a Kolmogorov-Smirnov test, p = 0.14 for height and p = 0.54 for BMI; Section 7.5 in S1 Text). By comparison, without pleiotropy (n = 1), our predictions provide a poor fit to these data (by a Kolmogorov-Smirnov test, p < 10^−5 for height and p = 0.05 for BMI; Fig A14 in S1 Text).
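The goodness-of-fit comparison rests on the one-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical distribution of the observed variances and a theoretical cumulative distribution function. A minimal sketch of that statistic follows. The exponential CDF is only a stand-in, because the predicted form (Eq A89 in S1 Text) is not reproduced here; the function names and sample values are illustrative assumptions, not GWAS data.

```python
import math


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def exponential_cdf(scale):
    """Placeholder CDF (assumption): NOT the form predicted by the model."""
    def cdf(x):
        return 1.0 - math.exp(-x / scale) if x > 0 else 0.0
    return cdf


if __name__ == "__main__":
    # Synthetic values for illustration only.
    sample = [0.2, 0.5, 0.9, 1.4, 2.3]
    print(ks_statistic(sample, exponential_cdf(1.0)))
```

A small D (and correspondingly large p-value) means that the data give no grounds to reject the theoretical distribution, which is the sense in which the model "fits" above; it is not by itself evidence that the model is correct.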
Fitting the model to GWAS results further allows us to make inferences about evolutionary parameters (Sections 7.1 and 7.3 in S1 Text). By including the degree of pleiotropy (n) as an additional parameter, we find that for both height and BMI, n is sufficiently large for it to be indistinguishable from the high pleiotropy limit. Based on the shape of the distributions in this limit and on scaling the threshold values of v* in units of our estimates for vs, we estimate that the proportion of variance arising from mutations within the range of detectable selection effects is approximately 50% for height and approximately 15% for BMI. Further relying on the number of associations that fall above the thresholds, we infer that, within this range, height has a mutational target size of approximately 5 Mb, whereas BMI has a target size of approximately 1 Mb (Table A2 in S1 Text).
These parameter estimates can help to interpret GWAS results. They suggest that, despite their comparable sample sizes, the GWAS for height succeeded in mapping a substantially greater proportion of the heritable variance than the GWAS for BMI (approximately 20% compared to approximately 3%–5%) primarily because the proportion of variance arising from mutations within the range of detectable selection effects for height is much greater than for BMI. Moreover, the estimates of target sizes and the relationship between sample size and threshold contribution to variance can be used to predict how the explained heritability and number of associations should increase with sample size (Fig 5B and 5C). These predictions are likely underestimates as the range of detectable selection effects itself should also increase with sample size.
We can also examine to what extent our inferences are consistent with data and estimates from earlier studies. For example, the distribution of variances that we inferred for height fits those obtained in a recent GWAS of height based on exome genotyping (Kolmogorov-Smirnov test, p = 0.99; Fig A15b and Section 8.1 in S1 Text). In addition, the proportion of variance that we estimate to arise from the range of selection effects detectable in existing GWASs for height and BMI is consistent with estimates of the heritable variance tagged by all single-nucleotide polymorphisms (SNPs) with MAF > 1% ([60, 61]; Section 8.2 in S1 Text).
The effect of polygenic adaptation
While we have assumed that quantitative traits have been subject to long-term stabilizing selection, recent studies indicate that some traits, and height in particular, have also been subject to recent directional selection [78–82]. Under plausible evolutionary scenarios, recent directional selection can induce large changes to the mean phenotype through the collective response at many segregating loci while having a negligible effect on allele frequencies at individual loci [21,83]. This very subtle effect on allele frequencies is likely one reason why polygenic adaptation is so difficult to detect and why studies have to pool faint signals across many loci to do so [78–82]. In Section 5.1 of S1 Text, we show that the distribution of allele frequencies on which our results rely is insensitive to sizable recent changes to the optimal phenotype. Importantly then, even when recent directional selection has occurred and its effects are discernable, the genetic architecture of a trait is nonetheless likely to be dominated by the effects of longer-term stabilizing selection.
The effect of demography
In contrast, recent changes in the effective population size are likely to have had a dramatic effect on allele frequencies and thus on the genetic architecture of quantitative traits [84,85]. In particular, European populations in which the GWASs for height and BMI were performed are known to have experienced dramatic changes in population size, including an Out-of-Africa (OoA) bottleneck about 100 KYA and explosive growth over the past 5 KY [86–89]. To study how these changes would have affected genetic architecture, we simulated allelic trajectories under our model and historical changes in population sizes in Europeans (relying on the model of [89]; Section 9 in S1 Text).
Our results suggest that the individual segregating sites with the greatest contribution to the extant genetic variance have selection coefficients around s = 10^−3 and are due to mutations that originated shortly before or during the OoA bottleneck (Fig 6A and Section 9 in S1 Text). These mutations ascended to relatively high frequencies during the bottleneck and minimally decreased in frequency during subsequent, recent increases in population size, thereby resulting in large contributions to current genetic variance. Segregating sites under weaker selection contribute much less to variance because of their smaller effect sizes (i.e., for the same reason that applied in the case with a constant population size). Finally, and in contrast to the case with a constant population size, individual segregating sites under stronger selection (e.g., s ≥ 10^−2.5) contribute much less to current variance than those with s ≈ 10^−3. Mutations at these sites are younger and arose after the bottleneck, when the population size was considerably larger, resulting in much lower initial and current frequencies and therefore a lower per (segregating) site contribution to variance (as distinct from the proportion of strongly selected sites that are currently segregating, which will have greatly increased, resulting in the same total contribution to variance; [84, 85]). In Section 10 in S1 Text, we discuss one implication of these demographic effects: that the reliance on genotyping rather than resequencing in GWASs had a minimal effect on the discovery of associations.
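The role of allele frequency in these comparisons can be made concrete with the standard expression for the additive variance contributed by a biallelic site, v = 2a²x(1 − x), where a is the allelic effect on the trait and x is the allele frequency. The sketch below assumes this standard form; the function name and the numerical values are illustrative and are not estimates from the data.

```python
def site_variance(a, x):
    """Additive genetic variance from one biallelic site: v = 2 * a^2 * x * (1 - x)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("allele frequency x must lie in [0, 1]")
    return 2.0 * a * a * x * (1.0 - x)


if __name__ == "__main__":
    a = 1.0  # hypothetical effect size, arbitrary units
    for x in (0.001, 0.01, 0.1):  # hypothetical allele frequencies
        print(x, site_variance(a, x))
```

For a fixed effect size, an allele at 10% frequency contributes roughly 90 times more variance than one at 0.1%, which illustrates why mutations that reached appreciable frequencies during the bottleneck can dominate per-site contributions relative to younger, rarer mutations under stronger selection.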
Segregating loci with s ≈ 10^−3 not only make the largest contributions to the current variance but also are likely to account for most of the GWS associations in the GWASs of height and BMI (Section 9 in S1 Text). When we account for the discovery thresholds of these studies, the expected distribution of variances for loci with s ≈ 10^−3 closely matches the distribution observed among GWS associations (Fig 6B and Fig A20b in S1 Text). Moreover, these distributions closely match our theoretical predictions for s ≈ 10^−3 and an Ne ≈ 5,000 (Fig 6B), roughly the effective population size experienced by mutations that originated shortly before or during the bottleneck. This match likely explains why the results predicated on a constant population size fit the data well nonetheless. Our interpretation of GWAS findings is supported by other aspects of the data (Section 9 in S1 Text).
Our conclusions about the high degree of pleiotropy of genetic variation for height and BMI and the differences between these traits are likely robust to demographic effects, given how well our model fits the distributions of variances among loci, once we account for European demographic history. However, we might be underestimating the mutational target sizes and total heritable variances associated with the selection effects currently visible in GWASs, as simulations with European demographic history indicate that the proportion of variance arising from loci with s ≈ 10^−3 explained by current GWASs is lower than our equilibrium estimates (approximately 42% compared to approximately 53% for height and approximately 29% compared to approximately 38% for BMI). By the same token, we likely underestimated the future increases in explained heritability with increases in study sizes (Fig 5B and 5C).
Conclusion
In summary, a ground-up model of stabilizing selection and pleiotropy can go a long way toward explaining the findings emerging from GWASs. Important next steps involve explicitly using more information from GWASs in the inferences. In particular, we can learn more about the selection acting on quantitative genetic variation by explicitly incorporating information about frequency and effect size (rather than their combination in terms of variance) and by including information from associations that do not attain genome-wide significance. Doing so will further require directly incorporating the effects of recent demographic history on genetic architecture [84,85]. An extended version of the inference, applied to the myriad traits now subject to GWASs, should allow us to learn about differences in the genetic architectures of traits and answer long-standing questions about the evolutionary forces that shape quantitative genetic variation.
Supporting information
Acknowledgments
We have benefited hugely from discussions with and comments from Guy Amster, Nick Barton, Jeremy Berg, Graham Coop, Laura Hayward, David Murphy, Joe Pickrell, Jonathan Pritchard, and Molly Przeworski.
Abbreviations
- BMI: body mass index
- GWAS: genome-wide association study
- GWS: genome-wide significant
- MAF: minor allele frequency
- OoA: Out of Africa
- SNP: single-nucleotide polymorphism
Data Availability
A Mathematica notebook for calculating the main functions and reproducing the main figures is available at https://github.com/sellalab/GenArchitecture. Simulation code (in Python and C++) is also available at https://github.com/sellalab/GenArchitecture. See S1 Text for details.
Funding Statement
NIH (grant number GM115889). Received by GS. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Falconer DS, Mackay TFC. Introduction to quantitative genetics. 4th ed. Essex, England: Benjamin Cummings; 1996.
- 2. Lynch M, Walsh B. Genetics and analysis of quantitative traits. Sunderland, MA: Sinauer; 1998.
- 3. Provine WB. The origins of theoretical population genetics: With a new afterword. University of Chicago Press; 2001.
- 4. Visscher PM, Brown MA, McCarthy MI, Yang J. Five years of GWAS discovery. Am J Hum Genet. 2012;90:7–24. doi: 10.1016/j.ajhg.2011.11.029
- 5. Barton NH, Keightley PD. Understanding quantitative genetic variation. Nat Rev Genet. 2002;3(1):11–21. doi: 10.1038/nrg700
- 6. Manolio TA. Genomewide association studies and assessment of the risk of disease. New Engl J Med. 2010;363(2):166–76. doi: 10.1056/NEJMra0905980
- 7. Bloom JS, Ehrenreich IM, Loo WT, Lite T-LV, Kruglyak L. Finding the sources of missing heritability in a yeast cross. Nature. 2013;494(7436):234–7. doi: 10.1038/nature11867
- 8. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature. 2009;461:747–53. doi: 10.1038/nature08494
- 9. Zaitlen N, Kraft P, Patterson N, Pasaniuc B, Bhatia G, Pollack S, et al. Using extended genealogy to estimate components of heritability for 23 quantitative and dichotomous traits. PLoS Genet. 2013;9(5):e1003520. doi: 10.1371/journal.pgen.1003520
- 10. Gibson G. Rare and common variants: Twenty arguments. Nat Rev Genet. 2012;13(2):135–45. doi: 10.1038/nrg3118
- 11. Eichler EE, Flint J, Gibson G, Kong A, Leal SM, Moore JH, et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet. 2010;11(6):446–50. doi: 10.1038/nrg2809
- 12. Zuk O, Hechter E, Sunyaev SR, Lander ES. The mystery of missing heritability: Genetic interactions create phantom heritability. Proc Natl Acad Sci USA. 2012;109(4):1193–8. doi: 10.1073/pnas.1119675109
- 13. Lee SH, Wray NR, Goddard ME, Visscher PM. Estimating missing heritability for disease from genome-wide association studies. Am J Hum Genet. 2011;88(3):294–305. doi: 10.1016/j.ajhg.2011.02.002
- 14. Zuk O, Schaffner SF, Samocha K, Do R, Hechter E, Kathiresan S, et al. Searching for missing heritability: Designing rare variant association studies. Proc Natl Acad Sci USA. 2014;111:E455–E64. doi: 10.1073/pnas.1322563111
- 15. Locke AE, Kahali B, Berndt SI, Justice AE, Pers TH, Day FR, et al. Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518:197–206. doi: 10.1038/nature14177
- 16. Wood AR, Esko T, Yang J, Vedantam S, Pers TH, Gustafsson S, et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat Genet. 2014;46(11):1173–86. doi: 10.1038/ng.3097
- 17. Agarwala V, Flannick J, Sunyaev S, Consortium GD, Altshuler D. Evaluating empirical bounds on complex disease genetic architecture. Nat Genet. 2013;45:1418–27. doi: 10.1038/ng.2804
- 18. Caballero A, Tenesa A, Keightley PD. The nature of genetic variation for complex traits revealed by GWAS and regional heritability mapping analyses. Genetics. 2015;201(4):1601–13. doi: 10.1534/genetics.115.177220
- 19. Pritchard JK. Are rare variants responsible for susceptibility to complex diseases? Am J Hum Genet. 2001;69:124–37. doi: 10.1086/321272
- 20. Eyre-Walker A. Genetic architecture of a complex trait and its implications for fitness and genome-wide association studies. Proc Natl Acad Sci USA. 2010;107:1752–6. doi: 10.1073/pnas.0906182107
- 21. De Vladar HP, Barton N. Stability and response of polygenic traits to stabilizing selection and mutation. Genetics. 2014;197(2):749–67. doi: 10.1534/genetics.113.159111
- 22. Turelli M. Heritable genetic variation via mutation-selection balance: Lerch’s zeta meets the abdominal bristle. Theor Popul Biol. 1984;25(2):138–93.
- 23. Barton NH, Turelli M. Evolutionary quantitative genetics: How little do we know? Annu Rev Genet. 1989;23:337–70. doi: 10.1146/annurev.ge.23.120189.002005
- 24. Hill WG, Kirkpatrick M. What animal breeding has taught us about evolution. Annu Rev Ecol Evol Syst. 2010;41:1–19.
- 25. Johnson T, Barton N. Theoretical models of selection and mutation on quantitative traits. Phil Trans R Soc Lon B. 2005;360(1459):1411–25. doi: 10.1098/rstb.2005.1667
- 26. Lande R. Natural selection and random genetic drift in phenotypic evolution. Evolution. 1976;30(2):314–34. doi: 10.1111/j.1558-5646.1976.tb00911.x
- 27. Hodgins-Davis A, Rice DP, Townsend JP. Gene expression evolves under a house-of-cards model of stabilizing selection. Mol Biol Evol. 2015;32(8):2130–40. doi: 10.1093/molbev/msv094
- 28. Byars SG, Ewbank D, Govindaraju DR, Stearns SC. Natural selection in a contemporary human population. Proc Natl Acad Sci USA. 2010;107(suppl 1):1787–92.
- 29. Stulp G, Pollet TV, Verhulst S, Buunk AP. A curvilinear effect of height on reproductive success in human males. Behav Ecol Sociobiol. 2012;66(3):375–84. doi: 10.1007/s00265-011-1283-2
- 30. Frederick DA, Haselton MG. Why is muscularity sexy? Tests of the fitness indicator hypothesis. Pers Soc Psychol Bull. 2007;33(8):1167–83. doi: 10.1177/0146167207303022
- 31. Endler JA. Natural selection in the wild. Princeton University Press; 1986.
- 32. Kingsolver JG, Hoekstra HE, Hoekstra JM, Berrigan D, Vignieri SN, Hill CE, et al. The strength of phenotypic selection in natural populations. Am Nat. 2001;157(3):245–61. doi: 10.1086/319193
- 33. Charlesworth B, Lande R, Slatkin M. A neo-Darwinian commentary on macroevolution. Evolution. 1982;36(3):474–98. doi: 10.1111/j.1558-5646.1982.tb05068.x
- 34. Barton NH. Pleiotropic models of quantitative variation. Genetics. 1990;124(3):773–82.
- 35. Kondrashov AS, Turelli M. Deleterious mutations, apparent stabilizing selection and the maintenance of quantitative variation. Genetics. 1992;132(2):603–18.
- 36. Wagner GP. Apparent stabilizing selection and the maintenance of neutral genetic variation. Genetics. 1996;143(1):617–9.
- 37. Pickrell JK, Berisa T, Liu JZ, Segurel L, Tung JY, Hinds DA. Detection and interpretation of shared genetic influences on 42 human traits. Nat Genet. 2016;48(7):709–17. doi: 10.1038/ng.3570
- 38. Sivakumaran S, Agakov F, Theodoratou E, Prendergast JG, Zgaga L, Manolio T, et al. Abundant pleiotropy in human complex diseases and traits. Am J Hum Genet. 2011;89(5):607–18. doi: 10.1016/j.ajhg.2011.10.004
- 39. Solovieff N, Cotsapas C, Lee PH, Purcell SM, Smoller JW. Pleiotropy in complex traits: Challenges and strategies. Nat Rev Genet. 2013;14(7):483–95. doi: 10.1038/nrg3461
- 40. Cotsapas C, Voight BF, Rossin E, Lage K, Neale BM, Wallace C, et al. Pervasive sharing of genetic effects in autoimmune disease. PLoS Genet. 2011;7(8):e1002254. doi: 10.1371/journal.pgen.1002254
- 41. Andreassen OA, Thompson WK, Schork AJ, Ripke S, Mattingsdal M, Kelsoe JR, et al. Improved detection of common variants associated with schizophrenia and bipolar disorder using pleiotropy-informed conditional false discovery rate. PLoS Genet. 2013;9(4):e1003455. doi: 10.1371/journal.pgen.1003455
- 42. Bulik-Sullivan B, Finucane HK, Anttila V, Gusev A, Day FR, Loh P-R, et al. An atlas of genetic correlations across human diseases and traits. Nat Genet. 2015;47(11):1236–41. doi: 10.1038/ng.3406
- 43. Lee SH, Yang J, Goddard ME, Visscher PM, Wray NR. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics. 2012;28(19):2540–2. doi: 10.1093/bioinformatics/bts474
- 44. Cross-Disorder Group of the Psychiatric Genomics Consortium. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat Genet. 2013;45(9):984–94. doi: 10.1038/ng.2711
- 45. Boyle EA, Li YI, Pritchard JK. An expanded view of complex traits: From polygenic to omnigenic. Cell. 2017;169(7):1177–86. doi: 10.1016/j.cell.2017.05.038
- 46. Visscher PM, Yang J. A plethora of pleiotropy across complex traits. Nat Genet. 2016;48(7):707–8. doi: 10.1038/ng.3604
- 47. Charlesworth B. Evidence against Fisher’s theory of dominance. Nature. 1979;278(5707):848–9.
- 48. Segrè D, DeLuna A, Church GM, Kishony R. Modular epistasis in yeast metabolism. Nat Genet. 2004;37:77. doi: 10.1038/ng1489
- 49. Phadnis N, Fry JD. Widespread correlations between dominance and homozygous effects of mutations: Implications for theories of dominance. Genetics. 2005;171(1):385–92. doi: 10.1534/genetics.104.039016
- 50. Wright S. Fisher’s theory of dominance. Am Nat. 1929;63(686):274–9.
- 51. Agrawal AF, Whitlock MC. Inferences about the distribution of dominance drawn from yeast gene knockout data. Genetics. 2011;187(2):553–66. doi: 10.1534/genetics.110.124560
- 52. Hill WG, Goddard ME, Visscher PM. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet. 2008;4(2):e1000008. doi: 10.1371/journal.pgen.1000008
- 53. Clayton DG. Prediction and interaction in complex disease genetics: Experience in type 1 diabetes. PLoS Genet. 2009;5(7):e1000540. doi: 10.1371/journal.pgen.1000540
- 54. Crow JF. On epistasis: Why it is unimportant in polygenic directional selection. Phil Trans R Soc Lon B. 2010;365(1544):1241–4. doi: 10.1098/rstb.2009.0275
- 55. Ávila V, Pérez-Figueroa A, Caballero A, Hill WG, García-Dorado A, López-Fanjul C. The action of stabilizing selection, mutation, and drift on epistatic quantitative traits. Evolution. 2014;68(7):1974–87. doi: 10.1111/evo.12413
- 56. Wei W-H, Hemani G, Haley CS. Detecting epistasis in human complex traits. Nat Rev Genet. 2014;15(11):722–33. doi: 10.1038/nrg3747
- 57. Perry JRB, Day F, Elks CE, Sulem P, Thompson DJ, Ferreira T, et al. Parent-of-origin-specific allelic associations among 106 genomic loci for age at menarche. Nature. 2014;514(7520):92–7. doi: 10.1038/nature13545
- 58. Scott RA, Lagou V, Welch RP, Wheeler E, Montasser ME, Luan JA, et al. Large-scale association analyses identify new loci influencing glycemic traits and provide insight into the underlying biological pathways. Nat Genet. 2012;44(9):991–1005. doi: 10.1038/ng.2385
- 59. Shi H, Kichaev G, Pasaniuc B. Contrasting the genetic architecture of 30 complex traits from summary association data. Am J Hum Genet. 2016;99(1):139–53. doi: 10.1016/j.ajhg.2016.05.013
- 60. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42(7):565–9. doi: 10.1038/ng.608
- 61. Yang J, Bakshi A, Zhu Z, Hemani G, Vinkhuyzen AAE, Lee SH, et al. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat Genet. 2015;47(10):1114–20. doi: 10.1038/ng.3390
- 62. Wood AR, Tuke MA, Nalls MA, Hernandez DG, Bandinelli S, Singleton AB, et al. Another explanation for apparent epistasis. Nature. 2014;514(7520):E3–E5. doi: 10.1038/nature13691
- 63. Evans DM, Spencer CC, Pointon JJ, Su Z, Harvey D, Kochan G, et al. Interaction between ERAP1 and HLA-B27 in ankylosing spondylitis implicates peptide handling in the mechanism for HLA-B27 in disease susceptibility. Nat Genet. 2011;43(8):761–7. doi: 10.1038/ng.873
- 64. The International Multiple Sclerosis Genetics Consortium. Class II HLA interactions modulate genetic risk for multiple sclerosis. Nat Genet. 2015;47(10):1107–13. doi: 10.1038/ng.3395
- 65. Moutsianas L, Jostins L, Beecham AH, Dilthey AT, Xifara DK, Ban M, et al. Class II HLA interactions modulate genetic risk for multiple sclerosis. Nat Genet. 2015;47(10):1107–13. doi: 10.1038/ng.3395
- 66. Barton N, Turelli M. Adaptive landscapes, genetic distance and the evolution of quantitative characters. Genet Res. 1987;49(02):157–73.
- 67. Bulmer MG. The genetic variability of polygenic characters under optimizing selection, mutation and drift. Genet Res. 1972;19(01):17–25.
- 68. Keightley PD, Hill WG. Quantitative genetic variability maintained by mutation-stabilizing selection balance in finite populations. Genet Res. 1988;52(01):33–43.
- 69. Robertson A. The effect of selection against extreme deviants based on deviation or on homozygosis. J Genet. 1956;54(2):236–48.
- 70. Wright S. The analysis of variance and the correlations between relatives with respect to deviations from an optimum. J Genet. 1935;30(2):243–56.
- 71. Keightley PD, Hill WG. Variation maintained in quantitative traits with mutation-selection balance: Pleiotropic side-effects on fitness traits. Proc R Soc Lond B Biol Sci. 1990;242(1304):95–100.
- 72. Fisher RA. The genetical theory of natural selection. Oxford, England: Clarendon Press; 1930.
- 73. Lande R. The maintenance of genetic variability by mutation in a polygenic character with linked loci. Genet Res. 1975;26(3):221–35.
- 74. Lande R. The genetic covariance between characters maintained by pleiotropic mutations. Genetics. 1980;94(1):203–15.
- 75. Ewens WJ. Mathematical population genetics 1: I. Theoretical introduction. Springer Science & Business Media; 2004.
- 76. Martin G, Lenormand T. A general multivariate extension of Fisher’s geometrical model and the distribution of mutation fitness effects across species. Evolution. 2006;60(5):893–907.
- 77. Sham PC, Purcell SM. Statistical power and significance testing in large-scale genetic studies. Nat Rev Genet. 2014;15(5):335–46. doi: 10.1038/nrg3706
- 78. Berg JJ, Coop G. A population genetic signal of polygenic adaptation. PLoS Genet. 2014;10(8):e1004412. doi: 10.1371/journal.pgen.1004412
- 79. Field Y, Boyle EA, Telis N, Gao Z, Gaulton KJ, Golan D, et al. Detection of human adaptation during the past 2000 years. Science. 2016;354(6313):760–4. doi: 10.1126/science.aag0776
- 80. Turchin MC, Chiang CW, Palmer CD, Sankararaman S, Reich D, Hirschhorn JN, et al. Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nat Genet. 2012;44(9):1015–9. doi: 10.1038/ng.2368
- 81. Berg JJ, Zhang X, Coop G. Polygenic adaptation has impacted multiple anthropometric traits. bioRxiv. 2017.
- 82. Robinson MR, Hemani G, Medina-Gomez C, Mezzavilla M, Esko T, Shakhbazov K, et al. Population genetic differentiation of height and body mass index across Europe. Nat Genet. 2015;47(11):1357–62. doi: 10.1038/ng.3401
- 83. Jain K, Stephan W. Rapid adaptation of a polygenic trait after a sudden environmental shift. Genetics. 2017;207(3).
- 84. Simons YB, Turchin MC, Pritchard JK, Sella G. The deleterious mutation load is insensitive to recent population history. Nat Genet. 2014;46(3):220–4. doi: 10.1038/ng.2896
- 85. Lohmueller KE. The impact of population demography and selection on the genetic architecture of complex traits. PLoS Genet. 2014;10(5):e1004379. doi: 10.1371/journal.pgen.1004379
- 86. Wall JD, Przeworski M. When did the human population size start increasing? Genetics. 2000;155(4):1865–74.
- 87. Coventry A, Bull-Otterson LM, Liu X, Clark AG, Maxwell TJ, Crosby J, et al. Deep resequencing reveals excess rare recent variants consistent with explosive population growth. Nat Commun. 2010;1:131. doi: 10.1038/ncomms1130
- 88. Tennessen JA, Bigham AW, O’Connor TD, Fu W, Kenny EE, Gravel S, et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science. 2012;337(6090):64–9. doi: 10.1126/science.1219240
- 89. Schiffels S, Durbin R. Inferring human population size and separation history from multiple genome sequences. Nat Genet. 2014;46(8):919–25. doi: 10.1038/ng.3015