Summary
Phenotype prediction is a key goal for medical genetics. Unfortunately, most genome-wide association studies are done in European populations, which reduces the accuracy of predictions via polygenic scores in non-European populations. Here, we use population genetic models to show that human demographic history and negative selection on complex traits can result in population-specific genetic architectures. For traits where alleles with the largest effect on the trait are under the strongest negative selection, approximately half of the heritability can be accounted for by variants in Europe that are absent from Africa, leading to poor performance in phenotype prediction across these populations. Further, under such a model, individuals in the tails of the genetic risk distribution may not be identified via polygenic scores generated in another population. We empirically test these predictions by building a model to stratify heritability between European-specific and shared variants and applying it to 37 traits and diseases in the UK Biobank. Across these phenotypes, ∼30% of the heritability comes from European-specific variants. We conclude that genetic association studies need to include more diverse populations to make phenotype prediction useful in all populations.
Keywords: negative selection, complex traits, polygenic scores, risk prediction, population history, population genetics, simulations
Introduction
The past decade of genome-wide association studies (GWASs) has uncovered a plethora of trait-associated loci scattered across the genome.1, 2, 3, 4 Geneticists have devoted many resources to turning these associations into phenotype prediction models that aggregate variants across the genome into a polygenic score. Such scores can be used to guide healthcare decisions for a variety of traits and diseases,5 and recent work has suggested these polygenic scores may be ready for clinical use.6,7 While individuals with high polygenic risk for diseases have been found via these scores, for example in atherosclerosis8 and breast cancer,9 challenges remain in applying these polygenic scores uniformly across populations. Recent analyses have suggested that because many of the largest studies are concentrated on European populations, polygenic scores may be biased and less informative in non-European populations.10, 11, 12, 13, 14, 15 There are several reasons why polygenic scores may not transfer well across populations. One possibility is that alleles have different effect sizes in different populations, owing to differences in interactions with the environment.16 Another possibility is that differences in linkage disequilibrium (LD) between variants across populations mean that causal variants may be tagged differently in non-European populations, leading to differences in effect sizes.11,17 Finally, the original polygenic score performance in Europeans may be inflated because of population stratification.18,19
Here, we propose that an additional reason for the lack of transferability of polygenic scores is that each population has its own genetic architecture, owing to the evolutionary processes that give rise to traits. Under this reasoning, a population’s demographic history influences the number of causal variants and their frequencies, resulting in some phenotypic variance coming from causal variants that are population specific. For example, work on the genetic architecture of skin color in African populations has uncovered distinct loci affecting the trait in each population, suggesting that populations with independent demographic histories can end up with different genetic architectures and causal variants for the same traits.20 Indeed, modeling work suggests that genetic architecture is an outcome of the evolutionary process rather than a trait-specific property.21
Recent exponential growth in human populations has created an excess of new variants that tend to be low frequency and population specific (private variation22, 23, 24). Population genetic models of genetic architecture that include negative selection suggest that, in aggregate, low-frequency variants could contribute substantially to traits.25, 26, 27 Application of these models to large-scale genetic datasets has found that many traits, ranging from anthropometric traits to molecular phenotypes, are under apparent negative selection.28, 29, 30, 31, 32, 33 Depending on the interplay between allele frequency and effect size, these variants could make up a large portion of the heritability for many traits, as demonstrated by a recent GWAS of height and BMI using whole-genome sequencing data.34,35 Because narrow-sense heritability is the proportion of phenotypic variance explained by additive genetic factors, it is directly related to the accuracy of phenotype prediction, measured as the proportion of phenotypic variance explained by the polygenic score.36 If private variants contribute substantially to heritability, they cannot be useful for phenotype prediction between populations, because they are absent from the other populations. The proportion of narrow-sense heritability that private variants explain therefore places an upper bound on the accuracy of polygenic scores between populations.
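The bound can be made explicit under simplifying assumptions that we introduce here only for illustration: the score uses the true causal variants and effect sizes, effects are identical across populations, and shared and private variants are uncorrelated with each other and with the environment. Writing $V_{A,\text{priv}}$ for the additive variance from private variants and $h^2_{\text{private}} = V_{A,\text{priv}}/V_A$, a score built only from shared variants explains the following fraction of phenotypic variance; the numbers on the right are illustrative (a heritability of ∼0.4, as in the simulations below, and a private fraction of 30%).

```latex
R^2_{\text{shared}}
  = \frac{V_{A,\text{shared}}}{V_P}
  = \frac{V_A - V_{A,\text{priv}}}{V_A}\cdot\frac{V_A}{V_P}
  = h^2\left(1 - h^2_{\text{private}}\right),
\qquad
h^2 = 0.4,\; h^2_{\text{private}} = 0.3
\;\Rightarrow\; R^2_{\text{shared}} = 0.4 \times 0.7 = 0.28 .
```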
In this study, we use simulations under demographic scenarios of recent explosive population growth with varying amounts of negative selection as well as analyses of empirical data to test the role of private variants in complex traits.
Material and methods
Population genetic modeling and simulations
We performed forward simulations by using SLiM (version 3).37 We simulated a demographic history for a European and an African population according to the demographic model fit by Gravel et al.38 (including migration). The African population size expanded to 14,474 individuals, and the European population began at a size of 1,032 individuals after splitting from Africa and grew exponentially at a rate of 0.38% per generation for 920 generations. We simulated a mutational target size of 5 Mb with a mutation rate of 1.2 × 10⁻⁸ per base pair (bp) and a recombination rate of 1 × 10⁻⁸ per bp. To simulate selection across the entire region, we drew selection coefficients for new mutations from a gamma distribution with parameters fit by Kim et al.39 (mean = −0.01026, shape = 0.186). We sampled 10,000 haploid genomes from each population. To simulate a quantitative trait, we followed the model described by Eyre-Walker25 and the framework set by Lohmueller,21 where a SNP’s effect on a trait, $\beta_i$, is given by

$$\beta_i = c\,\delta\,|s_i|^{\tau}\,(1 + \epsilon),$$

where $\delta \in \{-1, 1\}$ with equal probability, $\epsilon$ is a normally distributed noise term with mean 0, and $s_i$ is the selection coefficient of variant $i$ segregating in the population at the end of the simulation. The constant $c$ is a scaling factor for effect sizes and controls the heritability for a given mutational target size. In these simulations, $c$ was set to obtain a heritability of ∼0.4 (see Table S1). Finally, $\tau$ reflects the relationship between a SNP’s effect on fitness and its effect on the trait. $\tau = 0$ indicates no relationship between fitness and the trait, while $\tau > 0$ indicates that mutations that are more evolutionarily deleterious have larger effects on the trait. In this model, when $\tau > 0$, the trait itself may be under direct selection or it may be correlated with a trait under selection. We classify variants as private or shared on the basis of their allele frequency in a sample of 10,000 chromosomes from each population (see below).
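The effect-size model above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the simulation code used in this study: the function name `effect_sizes`, the noise standard deviation, and the default scaling constant `c` are hypothetical. Selection coefficients are drawn from the gamma distribution stated above (mean magnitude 0.01026, shape 0.186, so scale = mean/shape) and negated so that they are deleterious.

```python
import numpy as np


def effect_sizes(s, tau, c=1.0, noise_sd=0.5, rng=None):
    """Map selection coefficients s (negative = deleterious) to trait effects.

    beta_i = c * delta_i * |s_i|**tau * (1 + eps_i), where delta_i is -1 or +1
    with equal probability and eps_i ~ Normal(0, noise_sd**2).
    c and noise_sd are hypothetical defaults; in practice c is tuned so that the
    simulated trait reaches a target heritability.
    """
    rng = np.random.default_rng(rng)
    s = np.asarray(s, dtype=float)
    delta = rng.choice([-1.0, 1.0], size=s.shape)  # random direction of effect
    eps = rng.normal(0.0, noise_sd, size=s.shape)  # multiplicative noise term
    return c * delta * np.abs(s) ** tau * (1.0 + eps)


# Selection coefficients from a gamma DFE (mean magnitude 0.01026, shape 0.186)
rng = np.random.default_rng(1)
s = -rng.gamma(shape=0.186, scale=0.01026 / 0.186, size=1000)

beta_uncoupled = effect_sizes(s, tau=0.0, rng=2)  # |s|**0 == 1: fitness ignored
beta_coupled = effect_sizes(s, tau=0.5, rng=2)    # more deleterious -> larger |beta|
```

Reusing the same seed gives both calls identical signs and noise, so the two effect vectors differ only by the factor $|s_i|^{\tau}$, which isolates the role of $\tau$.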
To compare our simulation results to the empirical data from the Exome Aggregation Consortium (ExAC), which includes African American individuals, we computed the expected allele frequency for SNP $i$ in simulated admixed African American individuals as

$$p_i^{\mathrm{AA}} = \alpha_i\, p_i^{\mathrm{AFR}} + (1 - \alpha_i)\, p_i^{\mathrm{EUR}},$$

where $p_i^{\mathrm{EUR}}$ and $p_i^{\mathrm{AFR}}$ denote the allele frequencies in Europe and Africa, respectively, and $\alpha_i$ is the proportion of African ancestry. For each SNP, we drew an admixture proportion $\alpha_i$ from a beta distribution in order to incorporate variance in the admixture proportion along the genome. The parameters of the beta distribution were chosen to match the observed variation in admixture proportion in African American individuals40 and result in a mean proportion of African ancestry of 80%.
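A minimal sketch of this admixture step follows. The function name `admixed_frequency` is hypothetical, and so are the beta parameters in the example: Beta(8, 2) is used only because its mean, 8/(8 + 2) = 0.8, matches the stated mean African ancestry; the actual parameters were fit to data.

```python
import numpy as np


def admixed_frequency(p_eur, p_afr, a, b, rng=None):
    """Expected allele frequency in admixed African American genomes.

    For each SNP i, alpha_i (proportion of African ancestry) ~ Beta(a, b), and
    p_AA = alpha_i * p_AFR + (1 - alpha_i) * p_EUR.
    """
    rng = np.random.default_rng(rng)
    p_eur = np.asarray(p_eur, dtype=float)
    p_afr = np.asarray(p_afr, dtype=float)
    alpha = rng.beta(a, b, size=p_eur.shape)  # per-SNP ancestry proportion
    return alpha * p_afr + (1.0 - alpha) * p_eur


# Hypothetical Beta(8, 2): mean African ancestry 8 / (8 + 2) = 0.8
p_eur = np.array([0.05, 0.20, 0.00])
p_afr = np.array([0.00, 0.10, 0.30])
p_aa = admixed_frequency(p_eur, p_afr, a=8, b=2, rng=0)
```

The first SNP is absent from Africa yet receives a nonzero frequency in the admixed sample, which is how admixture can make European-private variants appear shared in an “African” sample.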
Defining the proportion of heritability from private variants: h2private
We begin by describing a model in which an individual, $j$, in population 1 has a phenotype, $y_j$, that is a linear combination of genotypes ($g_{ij} \in \{0, 1, 2\}$ for variants $i = 1, \ldots, M$), effect sizes, $\beta_i$, and a normally distributed term describing the effect of the environment, $e_j$:

$$y_j = \sum_{i=1}^{M} g_{ij}\, \beta_i + e_j.$$

The narrow-sense heritability, $h^2$, of the phenotype, $y$, in population 1 is given by

$$h^2 = \frac{V_A}{V_P},$$

where the variance of the phenotype can be decomposed into additive, dominance, interaction, and environmental terms: $V_P = V_A + V_D + V_I + V_E$. The additive genetic variance is $V_A = \sum_{i=1}^{M} 2 p_i (1 - p_i)\, \beta_i^2$ when there are $M$ variants, where $p_i$ is the allele frequency for variant $i$ and $\beta_i$ is the effect size of variant $i$.

We wish to examine the proportion of heritability that comes from a particular class of variants. Consider a sister population, population 2, that diverged from population 1. Variants in population 1 can be partitioned into those that appear only in population 1 (private variants) and those that appear in both populations (shared variants). The total number of variants is the sum of the numbers of shared and private variants, $M = M_{\text{shared}} + M_{\text{priv}}$. We wish to partition the heritability into these two classes, $h^2_{\text{shared}}$ and $h^2_{\text{priv}}$, which make up the total heritability: $h^2 = h^2_{\text{shared}} + h^2_{\text{priv}}$. Define $h^2_{\text{private}}$ to be the proportion of the heritability accounted for by the private variants.

The quantity of interest, then, is

$$h^2_{\text{private}} = \frac{h^2_{\text{priv}}}{h^2} = \frac{V_{A,\text{priv}}/V_P}{V_A/V_P} = \frac{V_{A,\text{priv}}}{V_A},$$

so the phenotypic variance cancels and only additive genetic variances are needed.

The additive genetic variance from private variants is $V_{A,\text{priv}} = \sum_{i=1}^{M} 2 p_i (1 - p_i)\, \beta_i^2\, I_i$, where $I_i$ is an indicator function that is 1 when variant $i$ is private to population 1 (which occurs with probability $P(I_i = 1)$) and 0 otherwise. We describe how $P(I_i = 1)$ is estimated below when analyzing empirical data (see model to identify private variants).
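The ratio $V_{A,\text{priv}}/V_A$ can be computed directly from allele frequencies and effect sizes. The sketch below uses hypothetical function names; passing probabilities $P(I_i = 1)$ instead of 0/1 indicators yields the expected private additive variance divided by $V_A$.

```python
import numpy as np


def additive_variance(p, beta, weight=None):
    """V_A = sum_i 2 p_i (1 - p_i) beta_i**2, optionally weighted per variant."""
    p = np.asarray(p, dtype=float)
    beta = np.asarray(beta, dtype=float)
    contrib = 2.0 * p * (1.0 - p) * beta ** 2
    if weight is not None:
        contrib = contrib * np.asarray(weight, dtype=float)
    return float(contrib.sum())


def h2_private(p, beta, private):
    """Proportion of additive variance from private variants, V_A,priv / V_A.

    `private` is either the indicator I_i (0/1) or the probability P(I_i = 1).
    """
    return additive_variance(p, beta, weight=private) / additive_variance(p, beta)


p = np.array([0.001, 0.01, 0.30])    # allele frequencies
beta = np.array([1.0, 0.5, 0.05])    # effect sizes
is_private = np.array([1, 1, 0])     # I_i
ratio = h2_private(p, beta, is_private)
```

In this toy example the two rare, large-effect private variants account for about 87% of $V_A$, even though the common shared variant is by far the most frequent.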
Polygenic score calculation
We compute three sets of polygenic scores on the simulated individuals: (1) using all variants, (2) using variants private to the simulated population of interest, and (3) using variants shared between the simulated European and African populations. For each haploid genome, we sum the effect sizes, $\beta_i$, of the derived alleles it carries within each class of variants, resulting in three scores for each genome. We standardize all three scores by subtracting the mean of the true polygenic score (class 1) and dividing by the standard deviation of the true polygenic score (class 1). We compute the Pearson correlation between classes 1 and 2 and between classes 1 and 3 and report the r² value as a percentage.
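The three scores and their agreement with the true score can be sketched as follows, assuming a 0/1 matrix of derived alleles per haploid genome. The function name `pgs_r2` and the input layout are hypothetical.

```python
import numpy as np


def pgs_r2(genotypes, beta, is_private, is_shared):
    """r^2 (in %) of private-only and shared-only scores against the true score.

    genotypes : (n_genomes, n_variants) 0/1 matrix of derived alleles
    beta      : (n_variants,) true effect sizes
    is_private, is_shared : boolean masks over variants
    """
    G = np.asarray(genotypes, dtype=float)
    beta = np.asarray(beta, dtype=float)
    is_private = np.asarray(is_private, dtype=bool)
    is_shared = np.asarray(is_shared, dtype=bool)

    true = G @ beta                                # class 1: all variants
    private = G[:, is_private] @ beta[is_private]  # class 2: private variants
    shared = G[:, is_shared] @ beta[is_shared]     # class 3: shared variants

    # Standardize every score with the mean and SD of the true score
    mu, sd = true.mean(), true.std()
    true, private, shared = ((x - mu) / sd for x in (true, private, shared))

    def r2_percent(score):
        return 100.0 * np.corrcoef(true, score)[0, 1] ** 2

    return r2_percent(private), r2_percent(shared)
```

Because Pearson correlation is invariant to shifting and rescaling, the standardization does not change r²; it only places all three scores on the scale of the true score, which is convenient when plotting them against each other.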
Model to identify private variants
When analyzing the empirical UK Biobank data, it is challenging to assess whether a particular variant is private or shared. If a variant is seen in only one population, it may be truly private to that population, or it may instead be shared but at too low a frequency to have been discovered with the number of individuals sampled from the other population. To address this issue, we built a probabilistic model to evaluate the probability that a variant is private to a population given the number of copies of the allele in that population (that is, the allele frequency).
We begin with the intuition that rare alleles tend to be private and common alleles tend to be shared between populations, even in the presence of migration. Migration can be thought of as sampling alleles from one population and placing them in the other population. Because a rare allele is carried by few individuals, it is unlikely to be included in this sample, so rare alleles will tend to stay within a population and not transfer between populations. This suggests that allele frequency is informative in determining whether an allele is private.
Wakeley and Hey41 use coalescent theory to determine the frequency spectrum of private variants. An application of Bayes’ rule allows us to calculate the following probability:

$$P(I = 1 \mid k) = \frac{P(k \mid I = 1)\, P(I = 1)}{P(k)},$$

where $k$ is the number of copies of the allele in the sample and $I$ is 1 if the allele is private and 0 if not. $P(k \mid I = 1)$ is the site frequency spectrum of private variants, and $P(k)$ is given by the full site frequency spectrum. For example, in a constant-sized equilibrium population, $P(k) = \dfrac{1/k}{\sum_{j=1}^{n-1} 1/j}$ for a sample of $n$ chromosomes. $P(I = 1)$ is the probability of a variant’s being private to a population.
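Bayes’ rule above translates into an element-wise operation on frequency spectra. In the sketch below, the function names are hypothetical and the spectra are toy values, chosen so that the implied spectrum of shared variants, (0.4, 0.4, 0.2), is itself a valid distribution; `neutral_sfs` implements the equilibrium spectrum given above.

```python
import numpy as np


def posterior_private(sfs_private, sfs_all, p_private):
    """P(I = 1 | k) for each allele count k = 1, ..., n - 1 via Bayes' rule.

    sfs_private : P(k | I = 1), proportional SFS of private variants
    sfs_all     : P(k), proportional SFS of all variants
    p_private   : P(I = 1), fraction of variants that are private
    """
    sfs_private = np.asarray(sfs_private, dtype=float)
    sfs_all = np.asarray(sfs_all, dtype=float)
    post = sfs_private * p_private / sfs_all
    # Monte Carlo noise in simulated spectra can push the ratio slightly above 1
    return np.clip(post, 0.0, 1.0)


def neutral_sfs(n):
    """Equilibrium neutral SFS for n chromosomes: P(k) = (1/k) / sum_{j<n} 1/j."""
    k = np.arange(1, n)
    weights = 1.0 / k
    return weights / weights.sum()


sfs_all = np.array([0.6, 0.3, 0.1])      # P(k) for k = 1, 2, 3 (toy values)
sfs_private = np.array([0.8, 0.2, 0.0])  # P(k | I = 1) (toy values)
post = posterior_private(sfs_private, sfs_all, p_private=0.5)
```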
Wakeley and Hey41 provide expressions to obtain these quantities in a constant-sized equilibrium population without natural selection. However, here we are concerned with populations that are not in equilibrium and with variants under negative selection, so we obtain these probabilities via simulation under a particular demographic model and distribution of fitness effects.
In the results presented here, we use the demographic model from Gravel et al.38 that relates European and African populations. We use a distribution of fitness effects from Kim et al.,39 assuming that mutations are additive (that is, the dominance coefficient $h = 0.5$) and that selection coefficients, $s$, are drawn from a gamma distribution with mean = −0.01026 and shape = 0.186. Using these parameters, we simulate data for 10,000 European chromosomes by using SLiM37 and compute (1) the proportional site frequency spectrum for private variants, $P(k \mid I = 1)$, (2) the proportional site frequency spectrum for all variants, $P(k)$, and (3) the proportion of private variants, $P(I = 1)$. We defined private variants in the simulation as those that appear in the simulated European population but not the simulated African population.
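When all three quantities are estimated from the same simulated sample, Bayes’ rule simplifies. Writing $n_{\text{priv}}(k)$ and $n(k)$ for the numbers of private and of all simulated variants observed in $k$ copies, and $n_{\text{priv}}$ and $n$ for their totals (notation introduced here for illustration), the posterior reduces to the fraction of variants at count $k$ that are private, which is why it can be tabulated directly by allele count:

```latex
P(I = 1 \mid k)
  = \frac{P(k \mid I = 1)\, P(I = 1)}{P(k)}
  = \frac{\dfrac{n_{\text{priv}}(k)}{n_{\text{priv}}} \cdot \dfrac{n_{\text{priv}}}{n}}
         {\dfrac{n(k)}{n}}
  = \frac{n_{\text{priv}}(k)}{n(k)} .
```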
Next, we store these quantities in a lookup table and use them to compute the probability that a variant is private given the number of copies of the allele in the empirical data. The lowest allele frequency that can be observed in the simulations is 1/10,000 (a single copy in 10,000 chromosomes), but some alleles in the UK Biobank dataset are present at lower frequencies. For alleles below this frequency, we set the probability equal to the probability for alleles at a frequency of 1 in 10,000.
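A lookup table of this kind, together with the floor for very rare alleles, might look like the sketch below. The function names are hypothetical, and mapping an empirical frequency onto the simulated count scale by rounding to the nearest count is an assumption of this sketch.

```python
import numpy as np


def build_lookup(private_counts, all_counts):
    """Lookup table P(I = 1 | k) from simulated variant counts.

    private_counts[k - 1] and all_counts[k - 1] are the numbers of private and
    of all simulated variants seen in k copies (k = 1, ..., n - 1).
    Allele counts with no simulated variants are set to NaN.
    """
    private_counts = np.asarray(private_counts, dtype=float)
    all_counts = np.asarray(all_counts, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(all_counts > 0, private_counts / all_counts, np.nan)


def prob_private(freq, table, n_sim=10_000):
    """Look up P(I = 1 | k) for empirical allele frequencies.

    Frequencies below 1/n_sim round to zero copies and are floored to k = 1,
    i.e., they receive the probability for a frequency of 1 in n_sim.
    """
    k = np.rint(np.asarray(freq, dtype=float) * n_sim).astype(int)
    k = np.clip(k, 1, len(table))
    return table[k - 1]
```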
Testing our probabilistic model to infer private variants
We evaluated the ability of our model to distinguish between private and shared variants by simulating new data and performing binary classification, calling a variant private if $P(I = 1 \mid k)$ exceeded some threshold, $t$. We varied this threshold and computed the numbers of true positives (variants called private that are truly private), false positives (variants called private that are truly shared), false negatives (variants called shared that are truly private), and true negatives (variants called shared that are truly shared). We summarized these counts by using receiver operating characteristic and precision-recall curves (Figure S1; Tables S2 and S3).
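For a single threshold, the four counts and the summaries derived from them can be computed as sketched below; sweeping $t$ traces out the precision-recall and ROC curves. The function name and toy data are hypothetical.

```python
import numpy as np


def classification_metrics(posterior, truly_private, t):
    """Confusion-matrix summaries for calling a variant private when posterior > t."""
    called = np.asarray(posterior, dtype=float) > t
    truth = np.asarray(truly_private, dtype=bool)
    tp = int(np.sum(called & truth))
    fp = int(np.sum(called & ~truth))
    fn = int(np.sum(~called & truth))
    tn = int(np.sum(~called & ~truth))
    nan = float("nan")
    precision = tp / (tp + fp) if tp + fp else nan
    recall = tp / (tp + fn) if tp + fn else nan  # true-positive rate (ROC y axis)
    fpr = fp / (fp + tn) if fp + tn else nan     # false-positive rate (ROC x axis)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "fpr": fpr,
            "fdr": 1.0 - precision}


posterior = np.array([0.9, 0.8, 0.6, 0.2, 0.1])
truth = np.array([1, 1, 0, 1, 0])
curve = [classification_metrics(posterior, truth, t) for t in (0.05, 0.5, 0.95)]
```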
We also validated our model by using data from ExAC.42 For each variant in ExAC, we used our model to compute the probability that the variant is private to the non-Finnish European population on the basis of the allele frequency in that population. Then, we checked whether these variants were observed in a sample of 10,406 chromosomes from African and African American individuals.
Partitioning heritability
We applied our Bayesian model for predicting which variants are private to GWAS summary statistics from 37 traits in the UK Biobank released by the Neale lab (see web resources). We computed the additive genetic variance for variants with a high posterior probability of being private to the British cohort and divided it by the total additive genetic variance explained by SNPs to obtain our estimate of h2private (Note S1). We also performed the inference by using a randomized algorithm to correct for the effects of LD, misestimated effect sizes, and population stratification (Notes S2, S3, S4, and S5; Figures S3, S4, S5, S6, S7, S8, and S9). Finally, we independently replicated the results on BMI by using data from the GIANT consortium43 (Note S1). Importantly, this partitioning of the heritability into shared and private components does not make use of the τ model25 that relates a mutation’s effect on fitness to its effect on the trait.
Results
The distribution of European-specific variants in data and models
We begin by precisely defining private variants in the datasets and models that we consider. Studies of genomic variation point to the out-of-Africa bottleneck and subsequent explosive growth in population size as a key driver of the distribution of genomic variation. We focus on a simplified model of this history (Figure 1A; Gravel et al.38). We define private variants as those that are found in Europe but are absent from Africa and shared variants as those that are found in both populations. Note that by our definition, private variants may be shared between other out-of-Africa populations (e.g., between Europe and East Asia) because of shared recent history.
Figure 1.
Human population history generates population-specific variants
(A) Model for variants that are shared (common to Europe [EUR] and Africa [AFR]) and private (occurring only in EUR and absent from AFR). Bottom, examples of private and shared variants from ExAC.42
(B) The number of non-synonymous variants that are private to European populations and absent from African populations (blue bars) and the number that are shared between the two populations (orange bars) in the 1KG exome dataset and the ExAC dataset.
(C) The proportion of non-synonymous alleles above a given frequency that are private to Europe and absent from Africa in the ExAC dataset and in simulations based on human history. Note that because the ExAC dataset contains admixed African American individuals, the proportion of private variants is reduced compared with the original simulation (black dots). Modeling this admixture (red dots) shows a better fit to this dataset. Error bars denote standard deviation across simulation replicates.
One potential concern with this definition of whether a variant is private to Europe is that it may depend on the sample size of the African population used in the comparison. We examined this possibility by computing the probability of not observing an allele in a sample of 10,000 African chromosomes across a range of minor allele frequencies (MAFs). This sample size is comparable to that of the ExAC dataset (Lek et al.42). We find that variants with a frequency as low as 10⁻³ in the African population have a nearly 100% probability of being sampled in ExAC (Figure S2). Thus, we would correctly classify variants segregating at low frequency in Africa as being shared.
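Under simple binomial sampling of chromosomes, an assumption of this sketch (the calculation behind Figure S2 may differ in detail), an allele at frequency $p$ is missed in a sample of $n$ chromosomes with probability $(1 - p)^n$. The function name is hypothetical.

```python
def prob_missed(p, n=10_000):
    """Probability that an allele at frequency p is absent from n sampled chromosomes."""
    return (1.0 - p) ** n


missed = {p: prob_missed(p) for p in (1e-5, 1e-4, 1e-3)}
```

At $p = 10^{-3}$ and $n = 10{,}000$, this gives roughly $e^{-10} \approx 4.5 \times 10^{-5}$, so such an allele is almost always sampled, whereas an allele at $p = 10^{-5}$ is missed about 90% of the time.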
Next, we examined the number of private variants in European populations compared to African populations in two datasets: the 1000 Genomes (1KG) data and the ExAC data. In order to meaningfully compare the two datasets, we focused on variants contained in the exome. For both datasets, there are many more private variants in the European population compared to shared variants (Figure 1B). This is expected under models of human history where many shared alleles were lost during the out-of-Africa bottleneck and new mutations accumulated independently in the out-of-Africa population. Because of the small population size, some of these mutations could drift to a higher frequency than they would have in a larger population.
We next conducted simulations under this model of human evolution, where an ancestral population splits into a group that underwent a genetic bottleneck out of Africa (representing a European population) and a group that stayed within Africa without a bottleneck (representing an African population; Figure 1A; Gravel et al.38), coupled with varying levels of negative selection on traits (including no negative selection). We include negative selection by modifying the relationship between a mutation’s effect on the trait and its effect on reproductive fitness, using the 2010 model of Eyre-Walker25 (see material and methods). This model includes a parameter, τ, which ties the selection coefficient of a mutation to its effect on a trait.25 Larger values of τ imply that more evolutionarily deleterious mutations have larger effects on the trait. Importantly, our model includes exponential growth in the out-of-Africa population, which creates an excess of private variants, as well as low levels of migration between the European and African populations, which can turn some private variants into shared variants. We compared our simulations to data from ExAC and found that our simulations predicted more higher-frequency private alleles than are observed in the data (Figure 1C). However, the ExAC data contain admixed African American individuals, and admixture can introduce variants that are private to Europe into the sample labeled “African.” We simulated this admixture process (see material and methods) and found that the resulting simulation matches the data closely, suggesting that our model is a reasonable approximation of human demography and selection (Figure 1C).
Population genetic models predict population-specific variants account for heritability and impact polygenic scores
We reasoned that because there are many private causal variants in our simulations, they may account in aggregate for a substantial proportion of the heritability. We examined the contribution of private variants to heritability and found that when traits are not tied to fitness (τ = 0), private variants account for ∼30% of the heritability (Figure 2A). However, when the coupling between trait effects and fitness effects is moderate (τ = 0.25) or strong (τ = 0.5), private variants account for over half of the heritability, reaching a maximum of ∼79% under strong coupling (Figures 2B and 2C). These results suggest that many causal variants, which jointly explain much of the heritability, tend to be population specific. This effect is a consequence of both how the trait relates to fitness and the demographic history of the population.
Figure 2.
The effect of natural selection on the relationship between heritability and allele frequency
(A–C) Cumulative fraction of heritability explained by private and shared variants under (A) no relation between a mutation’s effect on fitness and the trait (τ = 0), (B) moderate coupling between a mutation’s effect on fitness and the trait (τ = 0.25), and (C) strong coupling between a mutation’s effect on fitness and the trait (τ = 0.5). Note that the x axis is on a log scale. As τ increases, a greater fraction of heritability comes from variation that is found only within Europe.
The fact that many of the variants that affect the trait are not shared across populations may limit the applicability of polygenic scores derived from European populations to other populations. This effect would be distinct from imperfect tagging of causal variants due to differences in LD patterns between populations. To test for this effect in simulated data, we calculated true polygenic scores for individuals in the simulated European and African populations and asked how well polygenic scores derived from only private variants and only shared variants correlated with the true polygenic scores. Polygenic scores derived from only shared variants represent the case where a polygenic score is transferred from Europe to another population. Even if shared variants do not contribute the majority of the additive genetic variance, polygenic scores built from them may still be accurate across populations if they correlate well with the true scores. We note that these simulations assume that the true causal SNPs and their effect sizes are known; as such, the resulting accuracies are much higher than polygenic score accuracies reported elsewhere,13 and they represent the best-case scenario for polygenic scores. We found that when traits are independent of fitness, the shared polygenic score has a 91% correlation in Europe and a 96% correlation in Africa with the true polygenic score, suggesting that polygenic scores can be applied between populations (Figure 3A). However, when trait effects are tied to fitness effects, the correlation between shared polygenic scores and the true polygenic scores decreases (Figures 3B and 3C), and the correlation between private polygenic scores and true polygenic scores increases (Figures 3D, 3E, and 3F). Note that in the analysis with private polygenic scores, each population’s score uses only the variants private to that population and none from the other population; that is, the African private polygenic score uses variants private to Africa.
This suggests that the reduction in accuracy does not depend on the specific demography of either population, because the same pattern is present in both the European and African populations. For traits with strong coupling between trait effects and fitness effects (τ = 0.5), the correlation between the true polygenic scores and the polygenic scores derived from shared variants drops to 62% in Europe and 57% in Africa (Table S4). These findings suggest that polygenic scores based solely on shared variants may be substantially less accurate than polygenic scores using all variants and may not transfer well between populations when the variants with the greatest effects on the trait are those under the most negative selection.
Figure 3.
The relationship between polygenic scores and natural selection
(A–F) Polygenic score accuracy for shared variants only (top row) and private variants only (bottom row) in Europe and Africa on simulated data with different degrees of negative selection. In the bottom row, each score uses private variants from within the population being considered (e.g., for Africa, we use variants private to Africa) but not from the other population. The black line shows the 1:1 line. (A and D) No relationship between a mutation’s effect on fitness and its effect on the trait (τ = 0). (B and E) Moderate coupling between fitness and trait effects (τ = 0.25). (C and F) Strong coupling between fitness and trait effects (τ = 0.5). As the strength of coupling increases, polygenic scores computed from shared variation become less correlated with the true polygenic score. However, at the same time, polygenic scores computed from private variation become more correlated with the true polygenic scores.
While shared variants do not capture the full distribution of polygenic scores, we asked whether individuals in the tail of the true polygenic score distribution remained in the tail when examining shared variants only. When there is no coupling between fitness and trait effects (τ = 0), shared variants capture 35% of the tail correctly in Europe and 28% in Africa (Table 1). However, when there is moderate coupling (τ = 0.25), this number drops to 11% in Europe and 7% in Africa. When there is strong coupling (τ = 0.5), the polygenic score based on shared variants identifies none of the individuals in the tails of the distribution. If the trait under consideration is a disease, this analysis suggests that a polygenic score based on shared variation cannot identify individuals at the highest risk for that disease. In contrast, under strong coupling, a polygenic score based only on private variants correctly identifies 44%–46% of individuals who are at the extremes of the distribution. These results suggest that when using scores derived from European populations, individuals who are truly in the tails of the polygenic score distribution will not be identified via shared variants alone, corresponding to a high false-negative error rate. In addition, the low recall of both of these polygenic scores suggests that many individuals who are in the tails of the distribution will be missed.
Table 1.
The effect of natural selection on identifying high-risk individuals
τ | Shared (Europe) | Private (Europe) | Shared (Africa) | Private (Africa) |
---|---|---|---|---|
0 | 35% | 22% | 28% | 11% |
0.25 | 11% | 20% | 7% | 18% |
0.5 | 0% | 46% | 0% | 44% |
Percentage of individuals in the extreme 5% tail of the true polygenic score distribution that are recovered when using only private variants and shared variants in simulated European and African populations. Overall, the percentage of individuals correctly classified is low, suggesting that there will be many false negatives when using polygenic scores to identify individuals in the tails of the risk distribution. Further, as the degree of coupling between fitness effects and trait effects increases, shared variants correctly classify fewer individuals, while private variants classify more individuals correctly.
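The tail-recovery percentages in Table 1 amount to a recall of tail membership. A minimal sketch, with a hypothetical function name, that scores the upper 5% tail only (the same idea applies to the lower tail or to both tails) is:

```python
import numpy as np


def tail_recovery(true_score, approx_score, tail=0.05):
    """Fraction of individuals in the top `tail` of the true score who are also
    in the top `tail` of an approximate (e.g., shared-variant-only) score."""
    true_score = np.asarray(true_score, dtype=float)
    approx_score = np.asarray(approx_score, dtype=float)
    n_tail = max(1, int(round(tail * true_score.size)))
    top_true = set(np.argsort(true_score)[-n_tail:].tolist())
    top_approx = set(np.argsort(approx_score)[-n_tail:].tolist())
    return len(top_true & top_approx) / n_tail
```

A perfect approximate score recovers the whole tail (1.0); a score that ranks individuals in the opposite order recovers none of it (0.0).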
While our simulations suggest that private variants may be an important component of the heritability and may limit phenotype prediction across populations, their precise role depends on the extent of negative selection acting on traits (either directly or through pleiotropy), which remains an open question.28, 29, 30,32,33 Thus, we next tested how much of the heritability private variants account for in real GWAS data from European populations, where GWAS data are abundant.
A model for private variation
We built a Bayesian model to classify variants segregating in the UK Biobank as private or shared by using the allele frequency conditional on a demographic model and distribution of fitness effects inferred for a European population (see material and methods). To validate our model, we simulated a new dataset under the same European demographic model and recorded whether each allele was observed in both populations. Then, we calculated the probability of each allele’s being private to the European population. We classified variants as private if the probability $P(I = 1 \mid k) > t$, where $I$ is 1 if the allele is private and 0 if not, $k$ is the number of copies of the allele in the sample, and $t$ is some probability cutoff. For each cutoff, we calculated (1) the number of variants that we predict are private and are truly private (true positives), (2) the number of variants that we predict are private and are truly not private (false positives), (3) the number of variants that we predict are not private and are truly private (false negatives), and (4) the number of variants that we predict are not private and are truly not private (true negatives).
We summarize these numbers by using two curves: a precision-recall curve (Figure S1A) and a receiver operating characteristic (ROC) curve (Figure S1B). We find that at a precision of 94%, we have a recall of 99% and that the area under the ROC curve is 0.80, suggesting that our model is able to distinguish between private and shared variants on the basis of allele frequency alone (Table S3). We also tested the model on a simulated dataset including five times as many individuals as the 10,000 used in the initial simulation. Importantly, for this comparison, we used the same lookup table, based on 10,000 individuals, as before, which allows us to test how sample size affects our inferences. We find that the precision-recall curve is largely the same, but the area under the ROC curve decreases (AUROC = 0.70).
In addition, examining $P(I = 1 \mid k)$ versus the allele frequency in the simulated independent dataset (Figure S1C), we find that alleles above ∼10% frequency have a negligible probability of being private. This is consistent with the intuition that common alleles are unlikely to be private.
We also examined several posterior probability thresholds, $t$, in detail (Table S3). Across these thresholds, we find that the false discovery rate (FDR) from simulations is ∼5%, suggesting that the model is relatively robust to the threshold used.
Next, we empirically validated the performance of our model to infer whether variants are private. Using data from ExAC,42 we used the framework described above to calculate the probability that each variant is private on the basis of its allele frequency in the non-Finnish Europeans (NFE). In Figure S1D, we plot this probability for a random subset of 10,000 variants. Variants above 10% frequency have a very low probability of being private, and below that frequency, the probability of being private increases as frequency decreases.
In addition, we classified variants in ExAC as private to “EUR” by using the threshold corresponding to a simulation-based FDR of 5% and checked whether those variants were present in the “AFR” subset of samples (Table S2). We see that 83% of the variants we call private are not observed in “AFR” in a sample of 10,406 chromosomes. This suggests that our empirically based FDR is 17%, which is higher than the simulation-based FDR. However, the “AFR” sample in ExAC is a mixture of African American and African samples. Importantly, African American samples are admixed between European and African populations.42 This introduces European variants into the “AFR” samples, making variants we expect to be private to “EUR” appear shared. Therefore, this estimate of the accuracy is most likely an underestimate.
Nonetheless, these simulations and empirical evaluations suggest that our model is able to distinguish between private and shared variants on the basis of allele frequency alone. Additionally, out of an abundance of caution, we apply two different corrections based on this empirically based FDR of 17% in downstream inferences, as described below. Importantly, our determination of whether a variant is private or shared is expected to hold regardless of the sample size taken from either population (Table S2).
Inference of heritability accounted for by private variants: h2private
We used summary statistics for 37 different traits and diseases from the UK Biobank relating to anthropometric and blood-related traits as well as cancer-related and non-cancer-related diseases (see web resources) to infer the proportion of the SNP-based heritability attributable to private variants, h2private. Using these data and our probabilistic method to determine whether a variant is private or not, we find that the average h2private = 31%, with substantial variation across traits (standard deviation: 11%; Figure 4). Examining categories of diseases, we find that cancer-related diseases have h2private = 12%, while non-cancer-related diseases have h2private = 32%. Similarly, private variants account for ∼30% of the heritability in blood-related and anthropometric traits. We also observe substantial variability across traits within a category. Two blood pressure-related traits have h2private of nearly 50%, while other blood-related traits have a lower proportion.
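To make the quantity concrete: assuming each SNP contributes the standard additive variance 2p(1−p)β² (allele frequency p, per-allele effect β, Hardy-Weinberg proportions, and no LD between SNPs, an assumption revisited below), h2private is the share of the summed contributions that comes from SNPs classified as private. This is a sketch of that ratio under those assumptions, not the published implementation; the function names are hypothetical.

```python
from typing import Sequence


def snp_variance(freq: float, beta: float) -> float:
    """Additive genetic variance from one biallelic SNP under Hardy-Weinberg: 2p(1-p)beta^2."""
    return 2.0 * freq * (1.0 - freq) * beta ** 2


def h2_private(freqs: Sequence[float], betas: Sequence[float], is_private: Sequence[bool]) -> float:
    """Proportion of the summed SNP variance attributable to SNPs classified as private."""
    contrib = [snp_variance(p, b) for p, b in zip(freqs, betas)]
    private = sum(v for v, priv in zip(contrib, is_private) if priv)
    return private / sum(contrib)


# A common shared SNP and a rare private SNP with a larger effect contribute about equally.
print(h2_private([0.5, 0.01], [0.1, 0.5], [False, True]))  # about 0.497
```

The example shows why rare private variants can matter: a fivefold larger effect roughly offsets a fiftyfold lower frequency.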
Figure 4.
Estimates of the amount of heritability from private variants
The expected reduction in accuracy when transferring a polygenic score from Europe to Africa (expressed as the percentage of heritability explained by private variants) across 37 traits and diseases in the UK Biobank. We only include SNPs with an MAF > 10⁻³. The mean reduction is 26.7% (SD across traits is 14.7%). “17% FDR correction” refers to randomly reclassifying 17% of the SNPs that we call private as shared. “Max 17% FDR correction” refers to reclassifying as shared the 17% of private SNPs that explain the most heritability. Lines indicate standard errors obtained via a 1 Mb block jackknife.
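The standard errors in Figure 4 come from a 1 Mb block jackknife. A minimal delete-one-block version for the h2private ratio is sketched below. It assumes blocks are defined by chromosome and floor(position / 1 Mb), weights all blocks equally, and uses the 2p(1−p)β² per-SNP variance; these are illustrative choices and the function name is hypothetical, not necessarily matching the published analysis.

```python
import math
from collections import defaultdict
from typing import Sequence


def jackknife_se_h2_private(
    chroms: Sequence[str], positions: Sequence[int], freqs: Sequence[float],
    betas: Sequence[float], is_private: Sequence[bool], block_size: int = 1_000_000,
) -> float:
    """Delete-one-block jackknife standard error of h2private."""
    blocks = defaultdict(list)  # (chromosome, block index) -> SNP indices
    for i, (c, pos) in enumerate(zip(chroms, positions)):
        blocks[(c, pos // block_size)].append(i)
    contrib = [2.0 * p * (1.0 - p) * b * b for p, b in zip(freqs, betas)]
    total = sum(contrib)
    total_priv = sum(v for v, priv in zip(contrib, is_private) if priv)
    estimates = []
    for idx in blocks.values():
        # Recompute the ratio with this block's SNPs left out.
        rest = total - sum(contrib[i] for i in idx)
        rest_priv = total_priv - sum(contrib[i] for i in idx if is_private[i])
        estimates.append(rest_priv / rest)
    n = len(estimates)
    mean = sum(estimates) / n
    return math.sqrt((n - 1) / n * sum((e - mean) ** 2 for e in estimates))


# Two 1 Mb blocks, one holding a shared SNP and one a private SNP of equal variance.
print(jackknife_se_h2_private(["1", "1"], [100, 2_500_000], [0.5, 0.5], [0.1, 0.1], [False, True]))  # 0.5
```

Removing whole blocks, rather than single SNPs, keeps correlated neighboring SNPs together so the resulting standard error is not understated by LD.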
The effect of falsely identified private variants on our inference of h2private
To ensure that our results from the UK Biobank data described above were not driven by shared SNPs that we mistakenly classified as private, we adjusted for an empirically based FDR. At the threshold used for classifying variants as private, validation in the empirical data suggests that the FDR is ∼17% (see above). In other words, approximately 17% of SNPs that we identify as private may actually be shared. Thus, we adjusted our estimates of h2private by randomly reclassifying 17% of the private SNPs as shared and recomputing h2private (“17% FDR correction” in Figure 4). Despite the extremely conservative nature of this correction (the empirical FDR is based on an admixed sample), we find that a sizeable proportion of the heritability (about 22%) still comes from private variants (Figure 4).
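The random correction can be sketched as follows, assuming `contrib` holds per-SNP variance contributions (for example, 2p(1−p)β²) and `is_private` holds the private/shared calls. The function names and the fixed seed are hypothetical, and a single random draw is shown.

```python
import random
from typing import List, Sequence


def h2_private_from_contrib(contrib: Sequence[float], is_private: Sequence[bool]) -> float:
    """Share of the summed per-SNP variance carried by SNPs flagged as private."""
    return sum(v for v, p in zip(contrib, is_private) if p) / sum(contrib)


def random_fdr_correction(
    contrib: Sequence[float], is_private: Sequence[bool], fdr: float = 0.17, seed: int = 0
) -> float:
    """Randomly reclassify a fraction `fdr` of private SNPs as shared, then recompute h2private."""
    rng = random.Random(seed)
    private_idx = [i for i, p in enumerate(is_private) if p]
    n_flip = round(fdr * len(private_idx))
    flipped = set(rng.sample(private_idx, n_flip))
    corrected: List[bool] = [p and i not in flipped for i, p in enumerate(is_private)]
    return h2_private_from_contrib(contrib, corrected)


# 200 SNPs of equal variance, half called private: 0.5 before correction.
contrib = [1.0] * 200
is_private = [True] * 100 + [False] * 100
print(random_fdr_correction(contrib, is_private))  # 0.415
```

When all SNPs contribute equally, the result does not depend on which SNPs are drawn; with real effect sizes it does, so repeating the draw with different seeds shows how much the corrected estimate varies.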
In addition to this conservative correction, we also performed an even more stringent correction in which we sorted the SNPs we call private by the heritability they explain and reclassified as shared the 17% that explain the most heritability. As expected, the amount of heritability from private variants goes down, but for most traits, the heritability explained by private variants is still greater than 10% (“Max 17% FDR correction” in Figure 4). This suggests that our central claim, that private variants contribute to heritability, remains true even if our classification method is imperfect.
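The worst-case variant of the correction differs only in which private SNPs are reclassified: the ones with the largest variance contributions rather than a random subset. A minimal sketch, with `contrib` again assumed to hold per-SNP variance contributions and a hypothetical function name:

```python
from typing import Sequence


def max_fdr_correction(contrib: Sequence[float], is_private: Sequence[bool], fdr: float = 0.17) -> float:
    """Reclassify as shared the fraction `fdr` of private SNPs with the largest variance
    contributions, then recompute h2private."""
    private_idx = sorted(
        (i for i, p in enumerate(is_private) if p), key=lambda i: contrib[i], reverse=True
    )
    n_flip = round(fdr * len(private_idx))
    kept_private = private_idx[n_flip:]  # drop the n_flip largest contributors
    return sum(contrib[i] for i in kept_private) / sum(contrib)


contrib = [5.0, 1.0, 1.0, 1.0, 1.0, 1.0, 10.0]
is_private = [True] * 6 + [False]
print(max_fdr_correction(contrib, is_private))  # 0.25 (uncorrected: 0.5)
```

Because the largest private contributors are removed first, this correction lowers h2private by at least as much as any random reclassification of the same number of SNPs.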
The effect of population stratification
Recent studies have highlighted the effects of stratification on polygenic scores.18,19 We considered whether stratification could affect our analyses. To test this, we repeated our analyses by using only those SNPs showing stronger associations with the trait. Specifically, we applied p value cutoffs, using only SNPs with a p value lower than the cutoff (Figure S3). Broadly, for quantitative traits, we observe that as the p value threshold becomes stricter, the proportion of the heritability attributable to private variants decreases. This is expected: private variants tend to have lower allele frequencies than shared variants, so GWASs have less power to detect their associations. Therefore, as the p value cutoff decreases, we expect a lower proportion of heritability to come from private variants. For dichotomous traits, the total variance explained by SNPs was much lower than for quantitative traits, which produced a statistical artifact in which the heritability from private variants tended to be very high (Figure S4).
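The power argument can be made quantitative with a standard approximation. For a standardized quantitative trait and a small per-allele effect β, the 1-degree-of-freedom association test has noncentrality of roughly n·2p(1−p)β², so power falls sharply with frequency at a fixed effect size. The sketch below assumes this approximation, a two-sided genome-wide threshold of 5×10⁻⁸, and a hypothetical sample size; it is illustrative only.

```python
from statistics import NormalDist


def gwas_power(freq: float, beta: float, n: int, alpha: float = 5e-8) -> float:
    """Approximate power of a 1-df additive test for a standardized quantitative trait.

    Uses the small-effect approximation NCP = n * 2p(1-p) * beta^2, so the z statistic
    is roughly Normal(sqrt(NCP), 1) under the alternative.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = (n * 2.0 * freq * (1.0 - freq) * beta ** 2) ** 0.5
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


n = 300_000  # hypothetical GWAS sample size
for f in (0.3, 0.01, 0.001):
    print(f, round(gwas_power(f, beta=0.05, n=n), 4))
```

With the same effect size, the common variant is detected with near certainty while the variant at 0.1% frequency almost never passes the threshold, so stricter p value cutoffs preferentially drop the low-frequency SNPs that are most likely to be private.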
In addition to this analysis, we were also concerned about differential confounding of rare variants by population structure.35,44 Therefore, we checked the robustness of our results to allele frequency filters. We computed h2private for atrial fibrillation, BMI, standing height, diastolic blood pressure, and type 2 diabetes with MAF cutoffs of 10⁻⁵, 10⁻⁴, 10⁻³, and 10⁻² (Figure 5). We find that although h2private decreases, it remains substantial up to a cutoff of 10⁻². Although this analysis removes both real and spurious signals, it suggests that private variants do indeed explain a non-negligible proportion of heritability.
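The MAF sweep simply recomputes h2private on progressively more restricted SNP sets. A minimal sketch, assuming the 2p(1−p)β² per-SNP variance and a strict "MAF > cutoff" rule (mirroring the “MAF > 10⁻³” wording of Figure 4); the function name and inputs are hypothetical.

```python
from typing import Dict, Sequence


def h2_private_by_maf_cutoff(
    freqs: Sequence[float], betas: Sequence[float], is_private: Sequence[bool],
    cutoffs: Sequence[float] = (1e-5, 1e-4, 1e-3, 1e-2),
) -> Dict[float, float]:
    """Recompute h2private keeping only SNPs whose minor allele frequency exceeds each cutoff."""
    results = {}
    for cutoff in cutoffs:
        keep = [i for i, p in enumerate(freqs) if min(p, 1.0 - p) > cutoff]
        contrib = {i: 2.0 * freqs[i] * (1.0 - freqs[i]) * betas[i] ** 2 for i in keep}
        total = sum(contrib.values())
        private = sum(v for i, v in contrib.items() if is_private[i])
        results[cutoff] = private / total if total > 0 else float("nan")
    return results


sweep = h2_private_by_maf_cutoff(
    [0.5, 0.005, 0.0005, 0.00005], [0.1, 0.2, 0.5, 1.0], [False, True, True, True]
)
for cutoff, value in sweep.items():
    print(cutoff, round(value, 3))
```

In this toy input every private SNP is rare, so each stricter cutoff removes private variance and h2private falls to zero at 10⁻²; in real data, as the text notes, the filter removes both real and spurious signal.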
Figure 5.
The effect of MAF cutoffs on heritability from private variants
“Filter” refers to a quality and allele frequency filter that removes variants with a frequency below 10⁻³, a Hardy-Weinberg equilibrium p value < 10⁻¹⁰, or SNP information score < 0.8. We show results for five commonly studied traits. For BMI, diastolic blood pressure, and standing height, the “filter yes” lines are hidden behind the “filter no” lines, indicating that the variant filter has no visible effect on the estimated heritability for these traits. Lines indicate standard errors obtained via a 1 Mb block jackknife.
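The variant filter described in the caption can be written as a single predicate. In this sketch the record type and field names are hypothetical, and treating “frequency” as minor allele frequency is an assumption.

```python
from dataclasses import dataclass


@dataclass
class VariantQC:
    maf: float    # minor allele frequency (assumed; the caption says "frequency")
    hwe_p: float  # Hardy-Weinberg equilibrium test p value
    info: float   # SNP information (imputation quality) score


def passes_filter(
    v: VariantQC, min_maf: float = 1e-3, min_hwe_p: float = 1e-10, min_info: float = 0.8
) -> bool:
    """Keep a variant unless its frequency is below min_maf, its HWE p value is below
    min_hwe_p, or its information score is below min_info."""
    return v.maf >= min_maf and v.hwe_p >= min_hwe_p and v.info >= min_info


print(passes_filter(VariantQC(maf=0.01, hwe_p=0.5, info=0.95)))  # True
print(passes_filter(VariantQC(maf=0.01, hwe_p=0.5, info=0.7)))   # False
```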
Across different p value thresholds, a non-negligible proportion of heritability comes from private variants. However, this analysis does not alleviate all concerns about population stratification, as at a large enough sample size, an association due to stratification can be arbitrarily strong. Similarly, stratification could still occur when using variants at different MAF cutoffs. While these analyses provide evidence that our results are not primarily driven by stratification, they cannot completely rule it out. Further advances in controlling for stratification of rare variants will be crucial to understand the full contribution of private variants to heritability.
The effect of unmodeled LD on our inferences
Our inferences of h2private assume that the estimated effect sizes of the GWAS SNPs are the true effect sizes of the causal variants. Further, we assumed that the variants were all independent of each other. In truth, these assumptions are violated for several reasons. First, because of LD, SNPs may be correlated with one another. Second, some GWAS SNPs may have non-zero effect sizes only because they tag (are in LD with) an untyped causal variant and are not themselves causal. Third, even if the GWAS variants analyzed in our study are the true causal variants, their estimated effect sizes may be distorted by the effects of nearby SNPs in LD with them. Given these challenges, we carefully considered the effect that unmodeled LD may have on our inferences (see Note S2).
First, we developed an estimator of the SNP-based heritability that downsamples SNPs so that the retained SNPs are approximately independent of each other. We did this by randomly selecting a single SNP in each window and computing the proportion of heritability from private variants by using only these randomly selected SNPs. Selecting one SNP per window avoids counting nearby SNPs that are in LD with each other. We selected SNPs at random, rather than with more sophisticated LD-based fine-mapping methods, because such methods may perform differently at different allele frequencies and thereby introduce bias. We find similar results with our LD-pruned estimator and with the full data (Note S3; Figures S5, S6, and S7). We also checked that our estimates are sensible by estimating the proportion of additive genetic variance from variants we infer to be shared (Figure S8). If the inference procedure works correctly, this number should be 1 − h2private. In Figure S8, we see that this is indeed the case.
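The LD-pruned estimator can be sketched as "pick one SNP at random per window, then compute h2private on the picks". The window size used in the study is not stated here, so the 1 Mb default below is a hypothetical choice, as are the function name and the 2p(1−p)β² per-SNP variance.

```python
import random
from collections import defaultdict
from typing import Sequence


def pruned_h2_private(
    chroms: Sequence[str], positions: Sequence[int], freqs: Sequence[float],
    betas: Sequence[float], is_private: Sequence[bool],
    window: int = 1_000_000, seed: int = 0,
) -> float:
    """Pick one SNP uniformly at random per window, then compute h2private on the picks."""
    rng = random.Random(seed)
    windows = defaultdict(list)  # (chromosome, window index) -> SNP indices
    for i, (c, pos) in enumerate(zip(chroms, positions)):
        windows[(c, pos // window)].append(i)
    chosen = [rng.choice(idx) for idx in windows.values()]
    contrib = {i: 2.0 * freqs[i] * (1.0 - freqs[i]) * betas[i] ** 2 for i in chosen}
    total = sum(contrib.values())
    return sum(v for i, v in contrib.items() if is_private[i]) / total


# Three SNPs in three different windows, so no pruning randomness: one private of three equal.
print(pruned_h2_private(["1", "1", "2"], [10, 5_000_000, 10], [0.5] * 3, [0.1] * 3,
                        [True, False, False]))  # about 0.333
```

Uniform random choice within a window, rather than choosing the SNP with the strongest signal, avoids favoring SNPs of any particular frequency, which is the concern raised above about fine-mapping methods.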
Second, based on first principles, our estimates of h2private most likely underestimate the true proportion due to LD between tagging and causal variants (see Note S2). Because shared variants tend to be more common, they will tend to be in LD with more (and therefore tag more) variants. Because of this effect, shared variants could have inflated marginal effect sizes compared to private variants. This would lead to overestimating the heritability from shared variants compared to private variants, making our inferences conservative. We tested for this effect in real data by testing the correlation between marginal effect sizes and recombination rate for variants we predict to be private and variants we predict to be shared (Note S4). We found that variants we predict to be private have lower correlation than variants we predict to be shared, consistent with the idea that shared variants tag more variants than private variants (we also note that this reasoning is the motivation for LD score regression45). In addition, coalescent simulations show that our estimator of h2private is indeed slightly downwardly biased (Note S5; Figure S9).
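A toy numerical version of the first-principles argument: for standardized genotypes and ignoring sampling noise, the expected marginal GWAS effect at SNP j is the LD-weighted sum of causal effects, Σ_k r_jk β_k (the same relationship that motivates LD score regression). A shared SNP in LD with another causal SNP therefore absorbs part of that SNP's effect, and summing squared marginal effects overstates the shared contribution. All values below are hypothetical, and squared standardized effects stand in for per-SNP variance.

```python
from typing import List, Sequence


def expected_marginal_effects(ld: Sequence[Sequence[float]], causal: Sequence[float]) -> List[float]:
    """Expected marginal effects for standardized genotypes: beta_marg_j = sum_k r_jk * beta_k."""
    m = len(causal)
    return [sum(ld[j][k] * causal[k] for k in range(m)) for j in range(m)]


def private_share(effects: Sequence[float], is_private: Sequence[bool]) -> float:
    """Share of summed squared (standardized) effects carried by private SNPs."""
    sq = [e * e for e in effects]
    return sum(v for v, p in zip(sq, is_private) if p) / sum(sq)


# SNPs 0 and 2: common shared variants in LD (r = 0.5). SNP 1: rare private variant, no LD.
ld = [[1.0, 0.0, 0.5],
      [0.0, 1.0, 0.0],
      [0.5, 0.0, 1.0]]
causal = [0.1, 0.1, 0.1]
is_private = [False, True, False]
marginal = expected_marginal_effects(ld, causal)
print(private_share(causal, is_private))    # true share, about 0.333
print(private_share(marginal, is_private))  # naive share from marginal effects, about 0.18
```

The naive estimate is lower than the truth, matching the direction of bias described above: unmodeled LD makes h2private conservative.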
Discussion
In this work, we have shown that recent population growth and negative selection create population-specific genetic architectures for phenotypes, which directly reduces the accuracy of polygenic scores when they are applied across populations. The reduction in accuracy will depend on how differentiated the populations are, with accuracy decreasing as differentiation increases. Another case to consider is admixed populations, in which admixture can introduce causal variants from one population into another, making them shared. In these cases, we expect the utility of polygenic scores to be higher, but this will depend on how recent the admixture was and how many causal variants are transferred between populations, which can vary between individuals.
In our simulation results, we found that when there was no coupling between trait effects and fitness, approximately 30% of the heritability comes from private variants and that this proportion increases as the coupling increases. Although we expect this general pattern to hold, the specific values will depend on the distribution of fitness effects for causal alleles, the mutation target size, and the demographic history of the populations under study. We have used a distribution of fitness effects that was fit to non-synonymous variants39 and note that the estimates of selection on causal alleles could be revised in future studies. In addition, our model with admixture fits the observed data better than a model without admixture (Figure 1C), but we may still be underestimating the number of private alleles, which would cause our estimates to be a lower bound. Nevertheless, our results suggest that a non-negligible proportion of the heritability comes from private alleles.
We find that phenotypes with a majority of heritability explained by private variants are not likely to be predicted well in non-European populations, even if effect sizes are accurately inferred. Our analysis of the UK Biobank data suggests that most traits examined here have at least 20% of the heritability explained by private variants (h2private > 20%), indicating that cross-population polygenic scores are limited in accuracy and many population-specific causal variants remain to be discovered. We note that our inferences on the empirical data do not make use of the Eyre-Walker model.25 As such, our inferences from empirical data do not make any assumptions about the relationship between a mutation’s effect on fitness and the trait.
At first glance, our result that many traits have a population-specific genetic component seems at odds with recently reported results suggesting that the genetic correlation between traits in European and East Asian populations is very high.46,47 However, we note that both of these studies examined common variants (MAF > 5%), which are more likely to be shared. Our study explicitly considers a larger range of allele frequencies, which is more likely to include population-specific variants.
Our results have several implications for users of polygenic scores. First, we show that the transferability of polygenic scores depends on the particular trait being examined. For traits with larger values of h2private (such as diastolic and systolic blood pressure), the transferability would be lower because we find these traits derive more of their heritability from variants that are more likely to be private (h2private: ≈48% for both). In contrast, we find that traits with lower values of h2private, such as white blood cell count, can be more easily transferred because the heritability is spread more evenly across the spectrum of MAFs (h2private: 28%). Although we include standard errors estimated via a jackknife, this procedure may not account for all the uncertainty. Therefore, specific differences across traits should be interpreted cautiously. In addition, our inferences, like those in Lam et al.46 and Liu et al.,47 focus on the SNP heritability rather than the total heritability of particular traits.
Several recent reviews and commentaries have pointed out the potential for misuse of polygenic scores to justify racism and white supremacy, especially when comparing polygenic scores across populations.16,48, 49, 50, 51 Importantly, although our study indicates that population-specific variants play a role in complex traits, it is incorrect to conclude that population-specific variants lead to differences in traits between populations. Previous simulation studies have suggested that the interplay between demography and negative selection will not lead to large differences in trait heritability between populations.21,27 Instead, these evolutionary forces can change how the heritability is accounted for. For example, as we show here, population growth and negative selection can lead to heritability’s being accounted for by lower-frequency variants that are population specific instead of common variants shared across populations. Further, non-genetic factors most likely play an important role in differences in phenotype between populations.52
We also highlight a crucial issue in identifying individuals in the tails of the phenotype distribution. If polygenic scores are to be used more commonly in the clinic, false-negative rates must be more closely examined across populations and phenotypes. Our work suggests that many causal variants may not be shared between populations, indicating that variants ascertained in European populations may not be informative in other populations. This could occur because, on average, European-specific variants have more often been directly genotyped or imputed in GWASs than variants specific to non-European populations. To ensure equal predictive power of polygenic scores across populations, whole-genome sequencing-based association studies must be undertaken in non-European populations. Such studies would allow for unbiased discovery of private variants accounting for much of the heritability, resulting in improved polygenic prediction in non-European populations. Finally, large imputation panels from the relevant population of interest are necessary to include variation that is not present in Europe.
Declaration of interests
The authors declare no competing interests.
Acknowledgments
We thank Sriram Sankararaman, James Boocock, Alec Chiu, and Ruth Johnson for helpful discussions and Bogdan Pasaniuc, Nelson Freimer, and members of the Lohmueller lab for helpful comments on a draft of this manuscript. A.D. is funded by NSF Graduate Research Fellowship DGE-1650604 and K.E.L. is funded by NIH grant R35GM119856.
Published: March 9, 2021
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2021.02.013.
Data and code availability
The scripts required to carry out the inference of heritability from private variants can be found at https://github.com/LohmuellerLab/PRS.
Web resources
GWAS summary statistics, http://www.nealelab.is/uk-biobank
References
1. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–678. doi: 10.1038/nature05911.
2. Walter K., Min J.L., Huang J., Crooks L., Memari Y., McCarthy S., Perry J.R., Xu C., Futema M., Lawson D. The UK10K project identifies rare variants in health and disease. Nature. 2015;526:82–90. doi: 10.1038/nature14962.
3. Visscher P.M., Wray N.R., Zhang Q., Sklar P., McCarthy M.I., Brown M.A., Yang J. 10 Years of GWAS discovery: Biology, function, and translation. Am. J. Hum. Genet. 2017;101:5–22. doi: 10.1016/j.ajhg.2017.06.005.
4. Bycroft C., Freeman C., Petkova D., Band G., Elliott L.T., Sharp K., Motyer A., Vukcevic D., Delaneau O., O’Connell J. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z.
5. Vilhjálmsson B.J., Yang J., Finucane H.K., Gusev A., Lindström S., Ripke S., Genovese G., Loh P.R., Bhatia G., Do R. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 2015;97:576–592. doi: 10.1016/j.ajhg.2015.09.001.
6. Khera A.V., Chaffin M., Aragam K.G., Haas M.E., Roselli C., Choi S.H., Natarajan P., Lander E.S., Lubitz S.A., Ellinor P.T., Kathiresan S. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 2018;50:1219–1224. doi: 10.1038/s41588-018-0183-z.
7. Khera A.V., Chaffin M., Wade K.H., Zahid S., Brancale J., Xia R., Distefano M., Senol-Cosar O., Haas M.E., Bick A. Polygenic prediction of weight and obesity trajectories from birth to adulthood. Cell. 2019;177:587–596.e9. doi: 10.1016/j.cell.2019.03.028.
8. Natarajan P., Young R., Stitziel N.O., Padmanabhan S., Baber U., Mehran R. Polygenic risk score identifies subgroup with higher burden of atherosclerosis and greater relative benefit from statin therapy in the primary prevention setting. Circulation. 2017;135:2091–2101. doi: 10.1161/CIRCULATIONAHA.116.024436.
9. Maas P., Barrdahl M., Joshi A.D., Auer P.L., Gaudet M.M., Milne R.L., Schumacher F.R., Anderson W.F., Check D., Chattopadhyay S. Breast cancer risk from modifiable and nonmodifiable risk factors among white women in the United States. JAMA Oncol. 2016;2:1295–1302. doi: 10.1001/jamaoncol.2016.1025.
10. Scutari M., Mackay I., Balding D. Using genetic distance to infer the accuracy of genomic prediction. PLoS Genet. 2016;12:e1006288. doi: 10.1371/journal.pgen.1006288.
11. Martin A.R., Gignoux C.R., Walters R.K., Wojcik G.L., Neale B.M., Gravel S., Daly M.J., Bustamante C.D., Kenny E.E. Human demographic history impacts genetic risk prediction across diverse populations. Am. J. Hum. Genet. 2017;100:635–649. doi: 10.1016/j.ajhg.2017.03.004.
12. Kim M.S., Patel K.P., Teng A.K., Berens A.J., Lachance J. Genetic disease risks can be misestimated across global populations. Genome Biol. 2018;19:179. doi: 10.1186/s13059-018-1561-7.
13. Martin A.R., Kanai M., Kamatani Y., Okada Y., Neale B.M., Daly M.J. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 2019;51:584–591. doi: 10.1038/s41588-019-0379-x.
14. Mostafavi H., Harpak A., Agarwal I., Conley D., Pritchard J.K., Przeworski M. Variable prediction accuracy of polygenic scores within an ancestry group. eLife. 2020;9:e48376. doi: 10.7554/eLife.48376.
15. Ragsdale A.P., Nelson D., Gravel S., Kelleher J. Lessons learned from bugs in models of human history. Am. J. Hum. Genet. 2020;107:583–588. doi: 10.1016/j.ajhg.2020.08.017.
16. Novembre J., Barton N.H. Tread lightly interpreting polygenic tests of selection. Genetics. 2018;208:1351–1355. doi: 10.1534/genetics.118.300786.
17. Wojcik G.L., Graff M., Nishimura K.K., Tao R., Haessler J., Gignoux C.R., Highland H.M., Patel Y.M., Sorokin E.P., Avery C.L. Genetic analyses of diverse populations improves discovery for complex traits. Nature. 2019;570:514–518. doi: 10.1038/s41586-019-1310-4.
18. Berg J.J., Harpak A., Sinnott-Armstrong N., Joergensen A.M., Mostafavi H., Field Y., Boyle E.A., Zhang X., Racimo F., Pritchard J.K., Coop G. Reduced signal for polygenic adaptation of height in UK Biobank. eLife. 2019;8:e39725. doi: 10.7554/eLife.39725.
19. Sohail M., Maier R.M., Ganna A., Bloemendal A., Martin A.R., Turchin M.C., Chiang C.W., Hirschhorn J., Daly M.J., Patterson N. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife. 2019;8:e39702. doi: 10.7554/eLife.39702.
20. Martin A.R., Lin M., Granka J.M., Myrick J.W., Liu X., Sockell A., Atkinson E.G., Werely C.J., Möller M., Sandhu M.S. An unexpectedly complex architecture for skin pigmentation in Africans. Cell. 2017;171:1340–1353.e14. doi: 10.1016/j.cell.2017.11.015.
21. Lohmueller K.E. The impact of population demography and selection on the genetic architecture of complex traits. PLoS Genet. 2014;10:e1004379. doi: 10.1371/journal.pgen.1004379.
22. Keinan A., Clark A.G. Recent explosive human population growth has resulted in an excess of rare genetic variants. Science. 2012;336:740–743. doi: 10.1126/science.1217283.
23. Tennessen J.A., Bigham A.W., O’Connor T.D., Fu W., Kenny E.E., Gravel S., McGee S., Do R., Liu X., Jun G. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science. 2012;337:64–69. doi: 10.1126/science.1219240.
24. Gao F., Keinan A. High burden of private mutations due to explosive human population growth and purifying selection. BMC Genomics. 2014;15(Suppl 4):S3. doi: 10.1186/1471-2164-15-S4-S3.
25. Eyre-Walker A. Evolution in health and medicine Sackler colloquium: Genetic architecture of a complex trait and its implications for fitness and genome-wide association studies. Proc. Natl. Acad. Sci. USA. 2010;107(Suppl 1):1752–1756. doi: 10.1073/pnas.0906182107.
26. Sanjak J.S., Long A.D., Thornton K.R. A Model of compound heterozygous, loss-of-function alleles is broadly consistent with observations from complex-disease GWAS datasets. PLoS Genet. 2017;13:e1006573. doi: 10.1371/journal.pgen.1006573.
27. Uricchio L.H. Evolutionary perspectives on polygenic selection, missing heritability, and GWAS. Hum. Genet. 2020;139:5–21. doi: 10.1007/s00439-019-02040-6.
28. Hernandez R.D., Uricchio L.H., Hartman K., Ye C., Dahl A., Zaitlen N. Ultrarare variants drive substantial cis heritability of human gene expression. Nat. Genet. 2019;51:1349–1355. doi: 10.1038/s41588-019-0487-7.
29. Gazal S., Finucane H.K., Furlotte N.A., Loh P.R., Palamara P.F., Liu X., Schoech A., Bulik-Sullivan B., Neale B.M., Gusev A., Price A.L. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 2017;49:1421–1427. doi: 10.1038/ng.3954.
30. Gazal S., Loh P.R., Finucane H.K., Ganna A., Schoech A., Sunyaev S., Price A.L. Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations. Nat. Genet. 2018;50:1600–1607. doi: 10.1038/s41588-018-0231-8.
31. Zeng J., de Vlaming R., Wu Y., Robinson M.R., Lloyd-Jones L.R., Yengo L., Yap C.X., Xue A., Sidorenko J., McRae A.F. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. 2018;50:746–753. doi: 10.1038/s41588-018-0101-4.
32. Schoech A.P., Jordan D.M., Loh P.R., Gazal S., O’Connor L.J., Balick D.J., Palamara P.F., Finucane H.K., Sunyaev S.R., Price A.L. Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection. Nat. Commun. 2019;10:790. doi: 10.1038/s41467-019-08424-6.
33. Uricchio L.H., Kitano H.C., Gusev A., Zaitlen N.A. An evolutionary compass for detecting signals of polygenic selection and mutational bias. Evol. Lett. 2019;3:69–79. doi: 10.1002/evl3.97.
34. Wainschtein P., Jain D.P., Yengo L., Zheng Z., Cupples L.A., Shadyab A.H., McKnight B., Shoemaker B.M., Mitchell B.D., Psaty B.M. Recovery of trait heritability from whole genome sequence data. bioRxiv. 2019. doi: 10.1101/588020.
35. Young A.I. Solving the missing heritability problem. PLoS Genet. 2019;15:e1008222. doi: 10.1371/journal.pgen.1008222.
36. de los Campos G., Gianola D., Allison D.B. Predicting genetic predisposition in humans: the promise of whole-genome markers. Nat. Rev. Genet. 2010;11:880–886. doi: 10.1038/nrg2898.
37. Haller B.C., Messer P.W. SLiM 3: Forward genetic simulations beyond the Wright–Fisher model. Mol. Biol. Evol. 2019;36:632–637. doi: 10.1093/molbev/msy228.
38. Gravel S., Henn B.M., Gutenkunst R.N., Indap A.R., Marth G.T., Clark A.G., Yu F., Gibbs R.A., Bustamante C.D., 1000 Genomes Project. Demographic history and rare allele sharing among human populations. Proc. Natl. Acad. Sci. USA. 2011;108:11983–11988. doi: 10.1073/pnas.1019276108.
39. Kim B.Y., Huber C.D., Lohmueller K.E. Inference of the distribution of selection coefficients for new nonsynonymous mutations using large samples. Genetics. 2017;206:345–361. doi: 10.1534/genetics.116.197145.
40. Bryc K., Durand E.Y., Macpherson J.M., Reich D., Mountain J.L. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. Am. J. Hum. Genet. 2015;96:37–53. doi: 10.1016/j.ajhg.2014.11.010.
41. Wakeley J., Hey J. Estimating ancestral population parameters. Genetics. 1997;145:847–855. doi: 10.1093/genetics/145.3.847.
42. Lek M., Karczewski K.J., Minikel E.V., Samocha K.E., Banks E., Fennell T., O’Donnell-Luria A.H., Ware J.S., Hill A.J., Cummings B.B. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285–291. doi: 10.1038/nature19057.
43. Turcot V., Lu Y., Highland H.M., Schurmann C., Justice A.E., Fine R.S., Bradfield J.P., Esko T., Giri A., Graff M. Protein-altering variants associated with body mass index implicate pathways that control energy intake and expenditure in obesity. Nat. Genet. 2018;50:26–41. doi: 10.1038/s41588-017-0011-x.
44. Mathieson I., McVean G. Differential confounding of rare and common variants in spatially structured populations. Nat. Genet. 2012;44:243–246. doi: 10.1038/ng.1074.
45. Bulik-Sullivan B.K., Loh P.R., Finucane H.K., Ripke S., Yang J., Patterson N., Daly M.J., Price A.L., Neale B.M., Schizophrenia Working Group of the Psychiatric Genomics Consortium. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 2015;47:291–295. doi: 10.1038/ng.3211.
46. Lam M., Chen C.Y., Li Z., Martin A.R., Bryois J., Ma X., Gaspar H., Ikeda M., Benyamin B., Brown B.C. Comparative genetic architectures of schizophrenia in East Asian and European populations. Nat. Genet. 2019;51:1670–1678. doi: 10.1038/s41588-019-0512-x.
47. Liu J.Z., van Sommeren S., Huang H., Ng S.C., Alberts R., Takahashi A., Ripke S., Lee J.C., Jostins L., Shah T. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat. Genet. 2015;47:979–986. doi: 10.1038/ng.3359.
48. Rosenberg N.A., Edge M.D., Pritchard J.K., Feldman M.W. Interpreting polygenic scores, polygenic adaptation, and human phenotypic differences. Evol. Med. Public Health. 2018;2019:26–34. doi: 10.1093/emph/eoy036.
49. Harmon A. 2018. Why white supremacists are chugging milk (and why geneticists are alarmed). The New York Times, October 17, 2018. https://www.nytimes.com/2018/10/17/us/white-supremacists-science-dna.html
50. Fuentes A., Ackermann R.R., Athreya S., Bolnick D., Lasisi T., Lee S.H., McLean S.A., Nelson R. AAPA statement on race and racism. Am. J. Phys. Anthropol. 2019;169:400–402. doi: 10.1002/ajpa.23882.
51. Saini A. Superior: The Return of Race Science. Beacon Press; 2019.
52. Coop G. Reading tea leaves? Polygenic scores and differences in traits among groups. arXiv. 2019. arXiv:1909.00892. https://arxiv.org/abs/1909.00892