eLife. 2019 Mar 21;8:e39702. doi: 10.7554/eLife.39702

Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies

Mashaal Sohail 1,2,3,†,§, Robert M Maier 3,4,5,†, Andrea Ganna 3,4,5,6,7, Alex Bloemendal 3,4,5, Alicia R Martin 3,4,5, Michael C Turchin 8,9, Charleston WK Chiang 10, Joel Hirschhorn 3,11,12, Mark J Daly 3,4,5,7, Nick Patterson 3,13, Benjamin Neale 3,4,5,‡, Iain Mathieson 14,‡, David Reich 3,13,15,‡, Shamil R Sunyaev 2,3,16,‡
Editors: Magnus Nordborg17, Mark I McCarthy18
PMCID: PMC6428571  PMID: 30895926

Abstract

Genetic predictions of height differ among human populations and these differences have been interpreted as evidence of polygenic adaptation. These differences were first detected using SNPs genome-wide significantly associated with height, and shown to grow stronger when large numbers of sub-significant SNPs were included, leading to excitement about the prospect of analyzing large fractions of the genome to detect polygenic adaptation for multiple traits. Previous studies of height have been based on SNP effect size measurements in the GIANT Consortium meta-analysis. Here we repeat the analyses in the UK Biobank, a much more homogeneously designed study. We show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population stratification. More generally, our results imply that typical constructions of polygenic scores are sensitive to population stratification and that population-level differences should be interpreted with caution.

Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).

Research organism: Human

Introduction

Most human complex traits are highly polygenic (Yang et al., 2010; Boyle et al., 2017). For example, height has been estimated to be modulated by as much as 4% of human allelic variation (Boyle et al., 2017; Zeng et al., 2018). Polygenic traits are expected to evolve differently from monogenic ones, through slight but coordinated shifts in the frequencies of a large number of alleles, each with mostly small effect. In recent years, multiple methods have sought to detect selection on polygenic traits by evaluating whether shifts in the frequency of trait-associated alleles are correlated with the signed effects of the alleles estimated by genome-wide association studies (GWAS) (Turchin et al., 2012; Berg and Coop, 2014; Mathieson et al., 2015; Robinson et al., 2015; Berg et al., 2017; Racimo et al., 2018; Guo et al., 2018).

Here we focus on a series of recent studies—some involving co-authors of the present manuscript—that have reported evidence of polygenic adaptation at alleles associated with height in Europeans. One set of studies observed that height-increasing alleles are systematically elevated in frequency in northern compared to southern European populations, a result that has subsequently been extended to ancient DNA (Turchin et al., 2012; Berg and Coop, 2014; Mathieson et al., 2015; Robinson et al., 2015; Berg et al., 2017; Racimo et al., 2018; Guo et al., 2018; Simonti et al., 2017). Another study using a very different methodology (singleton density scores, SDS) found that height-increasing alleles have systematically more recent coalescence times in the United Kingdom (UK) consistent with selection for increased height in the last few thousand years (Field et al., 2016a). In the present work, we assess polygenic adaptation on human height as a particular case of the effects that uncorrected population structure in GWAS can have on studies of complex traits.

Most of these previous studies have been based on SNP associations and effect sizes (summary statistics) reported by the GIANT Consortium, which most recently combined 79 individual GWAS through meta-analysis, including a total of 253,288 individuals (Lango Allen et al., 2010; Wood et al., 2014). Here, we show that the selection effects described in these studies are severely attenuated and in some cases no longer significant when using summary statistics derived from the UK Biobank, an independent and larger study that includes 336,474 genetically unrelated individuals who derive their recent ancestry almost entirely from the British Isles (identified as ‘white British ancestry’ by the UK Biobank) (Supplementary file 1). The UK Biobank analysis is based on a single cohort drawn from a relatively homogeneous population enabling better control of population stratification. Both datasets have high concordance even for low P value SNPs which do not reach genome-wide significance (Figure 1—figure supplement 1; genetic correlation between the two height studies is 0.94 [se = 0.0078]). Despite this concordance, we observe that small but systematic biases lead to the two datasets yielding qualitatively different conclusions with respect to signals of polygenic adaptation.

Results

Discrepancies in GWAS: population-level differences in height

To study population-level differences among ancient and present-day European samples, we began by estimating ‘polygenic height scores’ as sums of allele frequencies at independent SNPs weighted by their effect sizes from GIANT. We used a set of different significance thresholds and strategies to correct for linkage disequilibrium as employed by previous studies, and replicated their signals for significant differences in genetic height across populations (Turchin et al., 2012; Berg and Coop, 2014; Mathieson et al., 2015; Robinson et al., 2015; Berg et al., 2017; Racimo et al., 2018; Guo et al., 2018; Simonti et al., 2017) (Figure 1a, Figure 1—figure supplement 2). We then repeated the analysis using summary statistics from a GWAS for height in the UK Biobank restricting to individuals of British Isles ancestry (hereafter referred to as the ‘white British’ (WB) subset) and correcting for population stratification based on the first ten principal components (UK Biobank [UKB]; also referred to as ‘UKB Neale’ in the supplementary figures) (Churchhouse et al., 2017). This analysis resulted in a dramatic attenuation of differences in polygenic height scores (Figure 1a, Figure 1—figure supplements 2–4). The differences between ancient European populations were also greatly attenuated (Figure 1a, Figure 1—figure supplement 5). Strikingly, the ordering of the scores for populations also changed depending on which GWAS was used to estimate genetic height both within Europe (Figure 1a, Figure 1—figure supplements 2–5) and globally (Figure 1—figure supplement 6), consistent with reports from a recent simulation study (Martin et al., 2017). The height scores were qualitatively similar only when we restricted to independent genome-wide significant SNPs in GIANT and the UK Biobank (p<5×10^−8) (Figure 1—figure supplement 2b). This replicates the originally reported significant north-south difference in the allele frequency of the height-increasing allele (Turchin et al., 2012) or in genetic height (Berg and Coop, 2014) across Europe, as well as the finding of greater genetic height in ancient European steppe pastoralists than in ancient European farmers (Mathieson et al., 2015), although the signals are attenuated even here. Our observations suggest that tests of polygenic adaptation based on genome-wide significant SNPs are relatively consistent across different GWAS (Figure 1—figure supplement 2b) and that our concern is primarily directed towards the use of sub-significant SNPs in polygenic scores (Figure 1a, Figure 1—figure supplement 2a).
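The score construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `polygenic_scores`, the factor of 2 for diploid genomes, and the textbook additive variance 2β²p(1−p) evaluated at the mean allele frequency are all assumptions made here; the exact normalization used in the paper is given in its Materials and methods.

```python
import math

def polygenic_scores(freqs, betas):
    """Centered, standardized polygenic scores per population (illustrative).

    freqs: dict mapping population name -> list of effect-allele frequencies
    betas: list of GWAS effect sizes, one per SNP, in the same SNP order
    """
    pops = list(freqs)
    # Raw score: frequency-weighted sum of effect sizes (x2 for diploids; assumption)
    raw = {pop: 2.0 * sum(b * p for b, p in zip(betas, freqs[pop])) for pop in pops}
    # Center by the average score across populations
    mean_raw = sum(raw.values()) / len(pops)
    # Additive variance at the mean frequency across populations (assumption)
    pbar = [sum(freqs[pop][i] for pop in pops) / len(pops) for i in range(len(betas))]
    v_a = sum(2.0 * b * b * p * (1.0 - p) for b, p in zip(betas, pbar))
    # Standardize by the square root of the additive variance
    return {pop: (raw[pop] - mean_raw) / math.sqrt(v_a) for pop in pops}
```

Because every SNP contributes its estimated effect size, a small bias shared across many sub-significant SNPs accumulates in the sum, which is why the choice of GWAS matters more as the significance threshold is relaxed.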

Figure 1. Polygenic height scores and tSDS scores based on GIANT and UK Biobank GWAS.

(a) Polygenic scores in present-day and ancient European populations are shown, centered by the average score across populations and standardized by the square root of the additive variance. Independent SNPs for the polygenic score from both GIANT (red) and the UK Biobank [UKB] (blue) were selected by picking the SNP with the lowest P value in each of 1700 independent LD blocks similarly to refs. (Berg et al., 2017; Racimo et al., 2018) (see Materials and methods). Present-day populations are shown from Northern Europe (CEU, GBR) and Southern Europe (IBS, TSI) from the 1000 Genomes Project; ancient populations are shown in three meta-populations (HG = Hunter Gatherer (n = 162 individuals), EF = Early Farmer (n = 485 individuals), and SP = Steppe Ancestry (n = 465 individuals)) (see Supplementary file 2). Error bars are drawn at 95% credible intervals. See Figure 1—figure supplement 1 for analyses of concordance of effect size estimates between GIANT and UKB. See Figure 1—figure supplements 2–6 for polygenic height scores computed using other linkage disequilibrium pruning procedures, significance thresholds, summary statistics and populations. (b) tSDS for height-increasing allele in GIANT (left) and UK Biobank (right). The tSDS method was applied using pre-computed Singleton Density Scores for 4,451,435 autosomal SNPs obtained from 3195 individuals from the UK10K project (Field et al., 2016a; Field et al., 2016b) for SNPs associated with height in GIANT and the UK Biobank. SNPs were ordered by GWAS P value and grouped into bins of 1000 SNPs each. The mean tSDS score within each P value bin is shown on the y-axis. The Spearman correlation coefficient between the tSDS scores and GWAS P values, as well as the correlation standard errors and P values, were computed on the un-binned data. The gray line indicates the null-expectation, and the colored lines are the linear regression fit. The correlation is significant for GIANT (Spearman r = 0.078, p=1.55×10^−65) but not for UK Biobank (Spearman r = −0.009, p=0.077). See Figure 1—source data 1 for figure data.

Figure 1—source data 1. Polygenic height scores and tSDS scores based on GIANT and UK Biobank GWAS.
elife-39702-fig1-data1.xlsx (220.2KB, xlsx)
DOI: 10.7554/eLife.39702.009

Figure 1—figure supplement 1. Beta concordance between GIANT and UK Biobank by P value bin.

SNPs intersecting between GIANT and UKB were LD-pruned (using PLINK 1.9 with parameters r^2 = 0.1, window size = 1 Mb, step size 5) and grouped into P value bins of 500 SNPs each, for P values from GIANT (left) and UKB (right). Color is based on the smallest P value in each bin. (a) Absolute beta difference. As expected, absolute beta and thus the absolute beta difference increases across P value bins. (b) Absolute beta difference, scaled by the sum of absolute betas. The relative difference of absolute betas decreases for lower P values. (c) Pearson correlation among betas approaches one for the lowest P values. (d) Correlation between beta (left GIANT, right UK Biobank) and GBR-TSI allele frequency difference. (e) Correlation between the GIANT - UK Biobank beta difference and GBR-TSI allele frequency difference.
Figure 1—figure supplement 2. Polygenic height scores based on GIANT and UK Biobank GWAS for clumped SNPs in present-day and ancient Europeans.

Scores are shown, centered by the average score across modern and ancient populations respectively and standardized by the square root of the additive variance. SNPs were LD-pruned with PLINK’s clumping procedure for parameters: (a) r^2 < 0.1, 1 Mb, p<0.01 (81,941 SNPs in UKB, 22,561 SNPs in GIANT), and (b) r^2 < 0.1, 1 Mb, p<5×10^−8 (4478 SNPs in UKB, 1442 SNPs in GIANT). Modern populations are shown from Northern Europe (CEU, GBR) and Southern Europe (IBS, TSI) from the 1000 Genomes Project. Ancient populations are shown in three meta-populations (HG = Hunter Gatherer (n = 162 individuals), EF = Early Farmer (n = 485 individuals), and SP = Steppe Ancestry (n = 465 individuals)). Error bars are drawn at 95% credible intervals.
Figure 1—figure supplement 3. Polygenic height scores in 1000 genomes European populations using clumped SNPs and effect sizes from different summary statistics.

Polygenic scores in modern European populations are shown using SNPs LD-pruned with PLINK’s clumping procedure with parameters: (a) r^2 < 0.1, 1 Mb, p<0.01, and (b) r^2 < 0.1, 1 Mb, p<5×10^−8. Scores are centered by the average score across populations and standardized by the square root of the additive variance. Modern populations are shown from Northern Europe (CEU, GBR) and Southern Europe (IBS, TSI) from the 1000 Genomes Project. Error bars are drawn at 95% credible intervals.
Figure 1—figure supplement 4. Polygenic height scores in 1000 Genomes Project European populations using ~1700 independent SNPs and effect sizes from different summary statistics.

Polygenic scores in modern European populations are shown using SNPs LD-pruned by picking the SNP with the lowest P value in each of ~1700 LD-independent blocks genome-wide. Scores are centered by the average score across populations and standardized by the square root of the additive variance. Modern populations are shown from Northern Europe (CEU, GBR) and Southern Europe (IBS, TSI) from the 1000 Genomes Project. Error bars are drawn at 95% credible intervals.
Figure 1—figure supplement 5. Polygenic height scores in ancient populations using ~1700 independent SNPs and effect sizes from different summary statistics.

Polygenic scores in ancient meta-populations are shown using SNPs LD-pruned by picking the SNP with the lowest P value in each of ~1700 LD-independent blocks genome-wide. Scores are centered by the average score across populations and standardized by the square root of the additive variance. Error bars are drawn at 95% credible intervals. Ancient populations are shown in three meta-populations (HG = Hunter Gatherer (n = 162 individuals), EF = Early Farmer (n = 485 individuals), and SP = Steppe Ancestry (n = 465 individuals)). The y-axis is truncated at (−1.5, 1.5) for all panels – this omits two points in the NG2015 sibs panel: HG [3.86 (CI: 3.60, 4.12)], EF [−2.18(CI: −2.34,–2.02)].
Figure 1—figure supplement 6. Polygenic height scores in ancient and global modern populations using three different GWAS.

All scores are centered by the average score across all populations (μ_GIANT = 0.645, μ_LOH = 0.219, μ_NEALELAB = 0.259) and standardized by the square root of the additive variance. Error bars are drawn at 95% credible intervals. Modern populations are shown from Northern Europe (CEU, GBR), Southern Europe (IBS, TSI), South Asia (PJL, BEB), East Asia (CHB, JPT) and Africa (YRI, LWK). Ancient populations are shown in three meta-populations (HG = Hunter-Gatherer (n = 162 individuals), EF = Early Farmer (n = 485 individuals), and SP = Steppe Ancestry (n = 465 individuals)). Pseudo-haploid genotype calls were made for modern populations before computing polygenic scores to allow fair comparison with ancient DNA. SNPs were LD-pruned by picking the SNP with the lowest P value in each of ~1700 LD-independent blocks genome-wide.

Discrepancies in GWAS: height evolution within a single population

Next, we assessed whether an independent measure, the ‘singleton density score (SDS)’, which uses a coalescent approach to infer adaptation within a population, is equally susceptible to biases in GWAS (Field et al., 2016a; Field et al., 2016b). SDS can be combined with GWAS effect size estimates to infer polygenic adaptation on complex traits (generating a ‘tSDS score’ by aligning the SDS sign to the trait-increasing allele). A tSDS score larger than zero for height-increasing alleles implies that these alleles have been increasing in frequency in a population over time due to natural selection. We replicate the original finding that SDS scores of the height-increasing allele computed in the UK population (using the UK10K dataset) increase with stronger association of the alleles to height as inferred by GIANT (Field et al., 2016a) across the entire P value spectrum (Spearman’s ρ = 0.078, p=1.55×10^−65, Figure 1b). However, we observed that this signal of polygenic adaptation in the UK, measured using a Spearman correlation across all GWAS SNPs, disappeared when we used the UK Biobank height effect size estimates (ρ = 0.009, p=0.077, Figure 1b). These observations suggest that concerns about sub-significant SNPs should not only be directed towards population-level differences using polygenic scores but also to analyses of adaptation within a single population.
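The tSDS construction and the rank correlation with association strength can be illustrated as follows. This is a sketch under stated assumptions: SDS and the GWAS effect size are taken to be polarized to the same allele, the helper names (`tsds`, `tsds_spearman`) are hypothetical, and ranking on −P so that a positive ρ means "tSDS rises with stronger association" is a sign convention chosen here, not one confirmed by the text.

```python
def rank(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def tsds(sds, beta):
    """Flip the SDS sign so it refers to the trait-increasing allele."""
    return [s if b > 0 else -s for s, b in zip(sds, beta)]

def tsds_spearman(sds, beta, pvalues):
    """Spearman correlation between tSDS and association strength (-P)."""
    return pearson(rank(tsds(sds, beta)), rank([-p for p in pvalues]))
```

Note that the sign of each tSDS value depends entirely on the sign of the estimated effect, so any systematic bias in effect-size signs among sub-significant SNPs carries straight into the correlation.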

Population structure underlying discrepancies in GWAS

Discrepancies between GIANT and UK Biobank

We propose that the qualitative difference between the polygenic adaptation signals in GIANT and the UK Biobank is due to the cumulative effect of subtle biases in each of the SNPs estimated in GIANT. This bias can arise due to incomplete control of the population structure in GWAS (Novembre and Barton, 2018). For example, if height were differentiated along a north-south axis because of differences in environment, any variant that is differentiated in frequency along the same axis would have an artificially large effect size estimated in the GWAS. Population structure is substantially less well controlled for in the GIANT study than in the UK Biobank study. This is both because the GIANT study population is more heterogeneous than that in the UK Biobank, and because population structure in the GIANT meta-analysis may not have been well controlled in some component cohorts due to their relatively small sizes (i.e., the ability to detect and correct population structure is dependent on sample size [Patterson et al., 2006; Price et al., 2006]). The GIANT meta-analysis also found that such stratification effects worsen as SNPs below genome-wide significance are used to estimate height scores (Wood et al., 2014), consistent with our finding that the differences in genetic height among populations increase when including these SNPs.
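The north-south example above can be made concrete with a toy simulation. Everything in this sketch is hypothetical: two populations with allele frequencies 0.6 and 0.4, a purely environmental 5 cm difference in mean height, and no genetic effect at all. It compares a naive regression of height on genotype with one that removes population means first, a simple stand-in for stratification correction rather than the principal-component adjustment used in the studies discussed here.

```python
import random

def ols_slope(x, y):
    """Slope of the least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx

def simulate(n_per_pop=20000, seed=1):
    """Genotypes and heights for two populations; the SNP has NO true effect."""
    rng = random.Random(seed)
    geno, height, pop = [], [], []
    # (allele frequency, environmental mean height offset in cm): hypothetical values
    settings = {"north": (0.6, 5.0), "south": (0.4, 0.0)}
    for label, (freq, env_mean) in settings.items():
        for _ in range(n_per_pop):
            geno.append((rng.random() < freq) + (rng.random() < freq))
            height.append(env_mean + rng.gauss(0.0, 6.0))
            pop.append(label)
    return geno, height, pop

def center_within(values, groups):
    """Subtract each group's mean from its members."""
    means = {}
    for g in set(groups):
        vals = [v for v, gg in zip(values, groups) if gg == g]
        means[g] = sum(vals) / len(vals)
    return [v - means[g] for v, g in zip(values, groups)]

geno, height, pop = simulate()
naive_beta = ols_slope(geno, height)  # confounded by population
adjusted_beta = ols_slope(center_within(geno, pop), center_within(height, pop))
```

Under these settings the naive estimate is clearly positive even though the variant does nothing, while the within-population estimate sits near zero. A per-SNP bias of this kind is small, but it points in the same direction for every variant that is differentiated along the same axis.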

We obtained direct confirmation that population structure is more correlated with effect size estimates in GIANT than with those in the UK Biobank. Figure 2a shows that the effect sizes estimated in GIANT, in contrast to those in the UKB, are highly correlated with the SNP loadings of several principal components of population structure (PC loadings). We also find that the UK Biobank estimates including individuals of diverse ancestry and not correcting for population structure (UKB all no PCs) show the same stratification effects as GIANT (Figure 2—figure supplements 1–3). Further, in line with our intuition regarding the effects of residual stratification on GWAS effect size estimates, we find that alleles that are more common in the Great Britain population (1000 genomes GBR) than in the Tuscan population from Italy (1000 genomes TSI) tend to be preferentially estimated as height-increasing according to the GIANT study but not according to the UKB study (Figure 2c, Figure 2—figure supplements 2–3).
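A correlation between effect sizes and PC loadings, with a block jackknife standard error of the kind mentioned in the Figure 2 legend, might be computed along these lines. This is a sketch, not the authors' code: it assumes the SNPs are already sorted by genomic position so that contiguous, roughly equal-sized blocks approximate independent regions, and the function name `block_jackknife_corr` is made up for illustration.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def block_jackknife_corr(betas, loadings, n_blocks):
    """Correlation of betas with PC loadings and its delete-one-block jackknife SE.

    Both inputs are lists ordered by genomic position (assumption).
    """
    n = len(betas)
    full = pearson(betas, loadings)
    bounds = [round(i * n / n_blocks) for i in range(n_blocks + 1)]
    leave_one_out = []
    for b in range(n_blocks):
        lo, hi = bounds[b], bounds[b + 1]
        # Recompute the correlation with block b removed
        leave_one_out.append(
            pearson(betas[:lo] + betas[hi:], loadings[:lo] + loadings[hi:])
        )
    mean_loo = sum(leave_one_out) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((v - mean_loo) ** 2 for v in leave_one_out)
    return full, var ** 0.5
```

Deleting whole blocks rather than single SNPs is what lets the standard error account for linkage disequilibrium between neighbouring markers.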

Figure 2. Evidence of stratification in height summary statistics.

Top row: Pearson Correlation coefficients of (a) PC loadings and height beta coefficients from GIANT and UKB, and (b) PC loadings and SDS (pre-computed in the UK10K) across all SNPs. PCs were computed in all 1000 genomes phase one samples (Abecasis et al., 2012). Colors indicate the correlation of each PC loading with the allele frequency difference between GBR and TSI, a proxy for the European North-South genetic differentiation. PC 4 and 11 are most highly correlated with the GBR - TSI allele frequency difference. Confidence intervals and P values are based on Jackknife standard errors (1000 blocks). Open circles indicate correlations significant at alpha = 0.05, stars indicate correlations significant after Bonferroni correction in 20 PCs (p<0.0025). Bottom row: Heat map after binning all SNPs by GBR and TSI minor allele frequency of (c) mean beta coefficients from GIANT and UKB, and (d) SDS scores for all SNPs. Only bins with at least 300 SNPs are shown. While the stratification effect in SDS is not unexpected, it can lead to false conclusions when applied to summary statistics that exhibit similar stratification effects. See Figure 2—figure supplements 1–3 for analyses of stratification effects in different summary statistics, and Supplementary file 3 for further description of stratification effects. UKB height betas exhibit stratification effects that are weaker, and in the opposite direction of the stratification effects in GIANT (see Figure 2—figure supplement 4 for a possible explanation). See Figure 2—source data 1 for figure data.

Figure 2—source data 1. Evidence of stratification in height summary statistics.
elife-39702-fig2-data1.xlsx (196.2KB, xlsx)
DOI: 10.7554/eLife.39702.015

Figure 2—figure supplement 1. Pearson Correlation coefficients of PC loadings and height beta coefficients for different summary statistics.

PCs were computed in all 1000 genomes phase one samples. Colors indicate the correlation of each PC loading with the allele frequency difference between GBR and TSI, a proxy for the European North-South genetic differentiation. PC 4 and 11 are most highly correlated with the GBR - TSI allele frequency difference. Error bars indicate 95% confidence interval of the correlation coefficient, assuming 60,000 independent genetic markers. We confirmed that the resulting standard error estimates are similar to block jackknife standard error estimates. Open circles indicate correlations significant at alpha = 0.05, stars indicate correlations significant after Bonferroni correction in 20 PCs (p<0.0025).
Figure 2—figure supplement 2. Heat map of mean beta coefficients for different summary statistics.

All SNPs are binned by GBR and TSI minor allele frequency. Only bins with at least 300 SNPs are shown. Panel 7 (as well as 2, 3 and 4) shows stratification effects in opposite direction to those in GIANT. Figure 2—figure supplement 4 illustrates how these opposite-direction stratification effects can arise.
Figure 2—figure supplement 3. Effect of GBR-TSI allele frequency difference on beta estimates and P values.

SNPs with MAF >0.2 (based on mean between TSI and GBR) were grouped into GBR-TSI allele frequency difference deciles, with the first decile representing SNPs less common in GBR and the last decile representing SNPs more common in GBR. (a) Fraction of height-increasing (yellow dots) vs. height-decreasing SNPs (purple dots) in each decile. In GIANT, 59% of SNPs in the highest decile are estimated to be height-increasing, and 41% are estimated to be height-decreasing. In the UK Biobank, this ratio is close to 50–50. (b) Lambda-GC in each decile for height-increasing (yellow dots) vs. height-decreasing SNPs (purple dots). In GIANT, the lambda-GC of SNPs in the highest decile is 2.78 for SNPs estimated to be height-increasing and 1.83 for SNPs estimated to be height-decreasing (a difference of 52%). In the UK Biobank, the lambda-GC of SNPs in the highest decile is 2.65 for SNPs estimated to be height-increasing and 2.89 for SNPs estimated to be height-decreasing (a difference of only 9%, going in the opposite direction).
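The decile summary in panel (a) can be sketched as below. This is an illustrative reconstruction only: the function name is hypothetical, SNPs are assumed to have been filtered on MAF beforehand, and at least as many SNPs as bins are required so that no bin is empty.

```python
def fraction_increasing_by_decile(freq_diff, beta, n_bins=10):
    """Fraction of SNPs with a positive effect estimate in each frequency-difference bin.

    freq_diff: GBR minus TSI allele frequency per SNP
    beta: estimated effect of the same allele on height
    """
    # Sort SNP indices from "less common in GBR" to "more common in GBR"
    order = sorted(range(len(freq_diff)), key=lambda i: freq_diff[i])
    n = len(order)
    fractions = []
    for d in range(n_bins):
        idx = order[d * n // n_bins:(d + 1) * n // n_bins]
        fractions.append(sum(beta[i] > 0 for i in idx) / len(idx))
    return fractions
```

In the absence of stratification, each bin would be expected to sit near 0.5; a trend across bins indicates that effect-size signs track the frequency difference.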
Figure 2—figure supplement 4. Height (cm) in the UKB as a function of GBR-TSI score.

We computed the relative number of GBR to TSI related alleles in each sample by multiplying the allele frequency difference by the number of alternative alleles in each sample in the UKB (GBR-TSI score). Vertical lines indicate 5th and 95th percentile of among-white British samples, showing that there is a significant negative relationship between the GBR-TSI allele sharing score and height (in cm). Among all other broadly European samples, this relationship is significantly positive across the whole range, but again significantly negative in the white British range. This can explain why stratification effects go in opposite directions in a UKB height GWAS of white British samples and a UKB height GWAS of all samples. Here, other European samples were defined as those that lie within the mean ±24 standard deviations along the first six principal components.
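One possible reading of the GBR-TSI score is a per-sample sum, over SNPs, of the allele frequency difference times the alternative-allele count. The summation over SNPs and the function name are assumptions made for this sketch; the legend does not spell out the aggregation.

```python
def gbr_tsi_score(alt_counts, freq_gbr, freq_tsi):
    """Per-sample allele sharing score (illustrative).

    alt_counts: number of alternative alleles (0, 1 or 2) the sample carries per SNP
    freq_gbr, freq_tsi: alternative-allele frequencies in GBR and TSI per SNP
    """
    # Alleles more common in GBR push the score up; TSI-enriched alleles push it down
    return sum((g - t) * c for c, g, t in zip(alt_counts, freq_gbr, freq_tsi))
```

Height can then be regressed on this score separately within the white British subset and within the other European samples to compare the direction of the relationship.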

Effect size estimates from previously published family-based height GWAS

We analyzed previously released family-based effect size estimates based on an approach of Robinson et al. (2015) (NG2015 sibs). Surprisingly, we found that while these summary statistics produced significant polygenic adaptation signals, they were also correlated with PC loadings as well as with GBR-TSI allele frequency differences (Figure 2—figure supplements 1–3). This suggests that these estimates are also affected by population structure despite being computed within families and, therefore, in principle, robust to structure. Our own family-based estimates in the UK Biobank (UKB sibs all, UKB sibs WB) appear unconfounded and do not produce significant adaptation signals across the spectrum of associated SNPs (Figure 2—figure supplements 1–3). The residual structure in the original NG2015 sibs dataset is likely to reflect a technical artifact (personal communication from Peter Visscher, and note on their website [Program in Complex Trait Genomics, 2018]). Berg and colleagues (Berg et al., 2019) show that the updated NG2015 sibs summary statistics (posted in the public domain [Program in Complex Trait Genomics, 2018] in November 2018 during the revision of this manuscript) do not show significant signals of polygenic adaptation using either polygenic score differences in Europe or the tSDS metric in the UK.

Population structure within the UK Biobank

We also note that the white British subset of the UKB data is not completely free of population stratification (as shown previously [Haworth et al., 2019]), although the magnitude of the potential confounding is much smaller than in the Continental European population (Figure 2—figure supplements 1–2). Interestingly, the north-south genetic cline in the UK tracks the height gradient in the opposite direction to that in Continental Europe (Figure 2—figure supplements 2 and 4), and after correcting with principal components, we do not observe any evidence of residual stratification in comparison with the 1000 genomes data (Figure 2a,c). However, we cannot exclude the possibility of uncorrected population stratification, even in the UK Biobank, along axes not captured by the principal components of the 1000 genomes project data. For example, even for genome-wide significant SNPs (Figure 1—figure supplement 2b), polygenic scores for both modern and ancient individuals change when UKB summary statistics (WB ancestry controlling for 10 PCs) are used instead of GIANT. This shift, for example, for the ancient European hunter-gatherer polygenic score is troubling as different European populations are shown to have variable amounts of genetic ancestry from ancient ‘hunter-gatherer’ vs. ‘early farmer’ vs. ‘steppe ancestry’ populations (Haak et al., 2015; Galinsky et al., 2016), and could reflect residual stratification in the UKB GWAS not captured by the 1000 genomes PCs.

Effects of population structure on within-population adaptation inference

We proceeded to investigate the effects of uncontrolled population stratification in GWAS discussed above on a coalescent approach such as tSDS that relies on singleton density (Field et al., 2016a). In principle, this approach is robust to the type of population stratification that affects the allele-frequency based tests. However, there is a north-south cline in singleton density in Europe due to lower genetic diversity in northern than in southern Europeans, leading to singleton density being lower in northern than in southern regions (Sohail et al., 2017). As a consequence, SDS tends to be higher (corresponding to fewer singletons) in alleles more common in GBR than in TSI (Figure 2d). This cline in singleton density coincidentally parallels the phenotypic cline in height and the major axis of genome-wide genetic variation. Therefore, when we perform the tSDS test using GIANT, we find a higher SDS around the inferred height-increasing alleles, which tend, due to the uncontrolled population stratification in GIANT, to be at high frequency in northern Europe (Figure 2c). This effect does not appear when we use UK Biobank summary statistics because of the much lower level of population stratification and more modest variation in height. We find that SDS is not only correlated with GBR-TSI allele frequency differences, but with several principal component loadings across all SNPs (Figure 2b), and that these SDS-PC correlations often coincide with correlations between GIANT-estimated effect sizes and PC loadings (Figure 2a). We further find that the tSDS signal which is observed across the whole range of P values in some GWAS summary statistics can be mimicked by replacing SDS with GBR-TSI allele frequency differences (Figure 3a and c, Figure 3—figure supplements 1–4), suggesting that the tSDS signal at non-significant SNPs may be driven in part by residual population stratification.

Figure 3. Height tSDS results for different summary statistics.

(a) Mean tSDS of the height-increasing allele in each P value bin for six different summary statistics. The first two panels are computed analogously to Figure 4A and Figure S22 of Field et al. (2016a). In contrast to those Figures and to Figure 1b, the displayed betas and P values correspond to the slope and P value of the linear regression across all un-binned SNPs (rather than the Spearman correlation coefficient and Jackknife P values). The y-axis has been truncated at 0.75, and does not show the top bin for UKB all no PCs, which has a mean tSDS of 1.5. (b) tSDS distribution of the height-increasing allele in 506 LD-independent SNPs which are genome-wide significant in a UKB height GWAS, where the beta coefficient is taken from a within sibling analysis in the UKB. The gray curve represents the standard normal null distribution, and we observe a significant shift. (c) Allele frequency difference between GBR and TSI of the height-increasing allele in each P value bin for six different summary statistics. Betas and P values correspond to the slope and P value of the linear regression across all un-binned SNPs. The lowest P value bin in UKB all no PCs with a y-axis value of 0.06 has been omitted. (d) Allele frequency difference between GBR and TSI of the height-increasing allele in 329 LD-independent SNPs which are genome-wide significant in a UKB height GWAS and were intersected with our set of 1000 genomes SNPs. There is no significant difference in frequency in these two populations, suggesting that the tSDS shift at the genome-wide significant SNPs is not driven by population stratification, at least along this particular axis. The patterns shown here suggest that the positive tSDS values across the whole range of P values are a consequence of residual stratification. At the same time, the increase in tSDS at genome-wide significant, LD-independent SNPs in (b) cannot be explained by GBR - TSI allele frequency differences as shown in (d). See Figure 3—figure supplements 1–4 for other GWAS summary statistics for unpruned and LD-pruned SNPs. Binning SNPs by P value without LD-pruning can lead to unpredictable patterns at the low P value end, as the SNPs at the low P value end are less independent of each other than higher P value SNPs (Figure 3—figure supplement 5). See Figure 3—source data 1 for figure data.

Figure 3—source data 1. Height tSDS results for different summary statistics.
elife-39702-fig3-data1.xlsx (404.6KB, xlsx)
DOI: 10.7554/eLife.39702.022

Figure 3—figure supplement 1. tSDS for height-increasing alleles using effect sizes from different summary statistics.

SNPs were ordered by GWAS P value and grouped into bins of 1000 SNPs each. The mean tSDS score within each P value bin is shown on the y-axis. In contrast to Figure 3, where Spearman correlation coefficients and Jackknife standard errors were computed, here we show the regression slope and P value, which were computed on the un-binned data. The gray line indicates the null-expectation, and the colored lines are the linear regression fit. The lowest P value bin in panel five with a y-axis value of 1.5 has been omitted.
Figure 3—figure supplement 2. Allele frequency difference for height-increasing alleles using different summary statistics.

SNPs were ordered by GWAS P value and grouped into bins of 1000 SNPs each. The gray line indicates the null-expectation, and the colored lines are the linear regression fit. The lowest P value bin in panel five with a y-axis value of 0.06 has been omitted.
Figure 3—figure supplement 3. tSDS for LD-pruned height-increasing alleles using effect sizes from different summary statistics.

Binning SNPs by P value can lead to spurious results at the low P value bins when SNPs are in LD (Figure 3—figure supplement 5). Here, LD-pruned SNPs were ordered by GWAS P value and grouped into bins of 100 SNPs each. The mean tSDS score within each P value bin is shown on the y-axis. In contrast to Figure 3, where Spearman correlation coefficients and Jackknife standard errors were computed, here we show the regression slope and P value, which were computed on the un-binned data. The gray line indicates the null-expectation, and the colored lines are the linear regression fit.
Figure 3—figure supplement 4. Allele frequency difference for LD-pruned height-increasing alleles using different summary statistics.

Binning SNPs by P value can lead to spurious results at the low P value bins when SNPs are in LD (Figure 3—figure supplement 5). Here, LD-pruned SNPs were ordered by GWAS P value and grouped into bins of 100 SNPs each. The gray line indicates the null-expectation, and the colored lines are the linear regression fit.
Figure 3—figure supplement 5. Number of independent regions per GWAS P value bin in the UK Biobank.

SDS results in Field et al. as well as in Figure 3 in this article are visualized by grouping non-independent SNPs into bins according to their P value. This may lead to unpredictable patterns at the low end of the P value distribution, because the lowest P value bins do not represent independent signals. This is demonstrated here, by grouping all UKB SNPs into bins of 1000 SNPs each, as in the SDS plots in Figure 1b and Figure 3. Left: The number of independent SNPs per P value bin is much lower at lower P values. Right: Neighboring P value bins share a large fraction of 1 Mb regions at lower P values. This demonstrates that the lowest P value bins do not represent independent signals if SNPs are not LD-pruned and can exhibit patterns that are dominated by one or a few LD-regions.

A residual signal of polygenic adaptation on height?

For polygenic adaptation within a population, a small but significant tSDS signal is observed in the UK when we restrict to genome-wide significant SNPs (p<5×10^−8). This effect persists when using UK Biobank family-based estimates (UKB sibs WB) for genome-wide significant SNPs (Figure 3b), and is not driven by allele frequency differences between GBR and TSI (Figure 3d), suggesting an attenuated signal of polygenic adaptation in the UK that is driven by a much smaller number of SNPs than previously thought. Indeed, under most genetic architectures, a tSDS signal which is driven by natural selection is not expected to lead to an almost linear increase over the whole P value range in a well-powered GWAS. Instead, we would expect to see a greater difference between highly significant SNPs and non-significant SNPs, similar to the pattern observed in the UK Biobank (Figure 3a).

For population-level differences in height, we assessed whether any remaining variation in height polygenic scores among populations is driven by polygenic adaptation by testing against a null model of genetic drift (Berg and Coop, 2014). We re-computed polygenic height scores in the POPRES dataset to increase power for this analysis as it has larger sample sizes of northern and southern Europeans than the 1000 Genomes project (Nelson et al., 2008). We computed height scores using independent SNPs that are 1) genome-wide significant in the UK Biobank (‘gw-sig’, p<5×10^−8) and 2) sub-significantly associated with height (‘sub-sig’, p<0.01) in different GWAS datasets. For each of these, we tested if population differences were significant due to an overall overdispersion (P_Qx), and if they were significant along a north-south cline (P_lat) (Figure 4, Figure 4—figure supplements 1–2). Both gw-sig and sub-sig SNP-based scores computed using GIANT effect sizes showed significant overdispersion of height scores overall and along a latitude cline, consistent with previous results (Figure 4, Figure 4—figure supplements 1–2). However, the signal attenuated dramatically between sub-sig (Qx = 1100, P_Qx = 1×10^−220) and gw-sig (Qx = 48, P_Qx = 2×10^−4) height scores. In comparison, scores that were computed using the UK Biobank (UKB) effect sizes showed substantially attenuated differences using both sub-sig (Qx = 64, P_Qx = 5×10^−7) and gw-sig (Qx = 33, P_Qx = 0.02) SNPs, and a smaller difference between the two scores. This suggests that the attenuation of the signal in GIANT is not only driven by a loss of power when using fewer gw-sig SNPs, but also reflects a decrease in stratification effects. The overdispersion signal disappeared entirely when the UK Biobank family based effect sizes were used (Figure 4, Figure 4—figure supplements 1–2).
Moreover, Qx P values based on randomly ascertained SNPs and UK Biobank summary statistics are not uniformly distributed as would be expected if the theoretical null model is valid and if population structure is absent (Figure 4—figure supplement 3). The possibility of residual stratification effects even in the UK Biobank is also supported by a recent study (Haworth et al., 2019). Therefore, we remain cautious about interpreting any residual signals as ‘real’ signals of polygenic adaptation.

Figure 4. Polygenic height scores in POPRES populations show a residual albeit attenuated signal of polygenic adaptation for height.

Standardized polygenic height scores from four summary statistics for 19 POPRES populations with at least 10 samples per population, ordered by latitude (see Supplementary file 4). The grey line is the linear regression fit to the mean polygenic scores per population. Error bars represent 95% confidence intervals and are calculated in the same way as in Figure 1. SNPs which were overlapping between each set of the summary statistics and the POPRES SNPs were clumped using PLINK 1.9 with parameters r^2 < 0.1, 1 Mb distance, p<1. (Top) A number of independent SNPs was chosen for each summary statistic to match the number of SNPs which remained when clumping UKB at p<0.01. (Bottom) A set of independent SNPs with p<5×10^−8 in the UK Biobank was selected and used to compute polygenic scores along with effect size estimates from each of the different summary statistics. The numbers on each plot show the Qx P value and the latitude covariance P value respectively for each summary statistic. See Figure 4—figure supplements 1–4 for other clumping strategies and GWAS summary statistics. See Figure 4—source data 1 for figure data.

Figure 4—source data 1. Polygenic height scores in POPRES populations show a residual albeit attenuated signal of polygenic adaptation for height.
This reference was updated from its bioRxiv version to its now published version.
DOI: 10.7554/eLife.39702.028

Figure 4—figure supplement 1. Polygenic height scores in POPRES for different summary statistics.

Standardized polygenic height score from diverse summary statistics for 19 POPRES populations with at least 10 samples per population, ordered by latitude (see Supplementary file 4). Confidence intervals and clumping procedure are the same as in (a). The gray line is the linear regression fit to the mean polygenic height score per population. The numbers on each plot show the Qx P value, the latitude covariance P value and the number of SNPs respectively for each summary statistic. Each column shows a different selection of SNPs. clump all: clumped SNPs with no P value threshold; clump 0.01: clumped SNPs with p<0.01 in UKB and the same number of SNPs in other summary statistics (same as Figure 4); clumpwindow 1.5M: genome was split into blocks of 1.5 Mb, lowest P-value SNP was picked in each bin, similar to the 1700 blocks; ldwindow 1.5 Mb: genome was split into blocks of 1.5 Mb, random SNP was picked in each bin; UKB sig: LD-pruned SNPs with p<5×10^−8 in UKB.
Figure 4—figure supplement 2. Test statistics for Qx (left) and latitude correlation (right) in the POPRES dataset for different summary statistics.

The numbers indicate P values and the number of SNPs, and numbers in bold highlight nominal significance (p<0.05).
Figure 4—figure supplement 3. P value calibration in the POPRES dataset for Qx and latitude covariance tests.

Random sets of around 1700 independent markers were drawn in 100 repetitions for four summary statistics and Qx and latitude P values were computed. In UK Biobank sibling estimates this resulted in a uniform P value distribution (non-significant Kolmogorov–Smirnov test), while an inflation was observed for UK Biobank GWAS summary statistics.
Figure 4—figure supplement 4. Spearman correlations between polygenic height scores in the POPRES dataset computed from different summary statistics.

Spearman correlation coefficients of mean population polygenic score ranking for all pairs of summary statistics at different SNP selections. Polygenic scores from independent SNPs which are genome-wide significant in UKB lead to more consistent rankings than PRS from other sets of SNPs, despite having lower prediction power. Each column shows a different selection of SNPs. clump all: clumped SNPs with no P value threshold; clump 0.01: clumped SNPs with p<0.01 in UKB and the same number of SNPs in other summary statistics (same as Figure 4); clumpwindow 1.5M: genome was split into blocks of 1.5 Mb, lowest P-value SNP was picked in each bin, similar to the 1700 blocks; ldwindow 1.5 Mb: genome was split into blocks of 1.5 Mb, random SNP was picked in each bin; UKB sig: LD-pruned SNPs with p<5×10^−8 in UKB.

Discussion

We have shown, by conducting a detailed analysis of human height, that estimates of population differences in polygenic scores are reduced when using the UK Biobank GWAS data relative to claims of previous studies that used GWAS meta-analyses such as GIANT. We find some evidence for population-level differences in genetic height, but it can only be robustly seen at highly significant SNPs, because any signal at less significant P values is dominated by the effect of residual population stratification. Even genome-wide significant SNPs in these analyses may be subtly affected by population structure, leading to continued overestimation of the effect. Thus, it is difficult to arrive at any quantitative conclusion regarding the proportion of the population differences that are due to statistical biases vs. population stratification of genetic height. Further, estimates of the number of independent genetic loci contributing to complex trait variation are sensitive to and likely confounded by residual population stratification.

We conclude that while effect estimates are highly concordant between GIANT and the UK Biobank when measured individually (Supplementary files 5–7, Figure 1—figure supplement 1), they are also influenced by residual population stratification that can mislead comparisons of complex traits across populations and inferences about polygenic adaptation. Although these biases are subtle, in the context of tests for polygenic adaptation, which are driven by small systematic shifts in allele frequency, they can create highly significant artificial signals, especially when SNPs that are not genome-wide significant are used to estimate genetic height. Our results do not question the reliability of the genome-wide significant associations discovered in the GIANT cohort. However, we urge caution in the interpretation of signals of polygenic adaptation or between-population differences that are based on large numbers of sub-significant SNPs, particularly when using effect sizes derived from meta-analysis of heterogeneous cohorts which may be unable to fully control for population structure.

Our results have implications in other areas of human genetics research. For example, there is growing interest in polygenic scores that predict complex phenotypes from the aggregate effects of all allelic variants (Wray et al., 2007; Purcell et al., 2009; Vilhjálmsson et al., 2015; Chun et al., 2018). The observation that individuals with extreme values of polygenic scores exhibit many-fold elevated risk of common diseases raises hopes for their potential clinical utility (Ganna et al., 2013; Khera et al., 2018), and use for sociogenomics applications (Lee et al., 2018; Savage et al., 2018; Nagel et al., 2018). It is already clear that polygenic scores derived from European populations do not translate across populations on a global scale (Martin et al., 2017). Our analysis further suggests that subtle population structure, especially in GWAS that are meta-analyses of independent cohorts, could be an additional source of error in polygenic scores and affect their applicability even within populations. We also note that other factors such as gene by environment interactions can be an alternative confounding factor for GWAS effect sizes and polygenic scores.

Materials and methods

Genome-wide association studies (GWAS)

We analyzed height using publicly available summary statistics that were obtained either by meta-analysis of multiple GWAS or by a GWAS performed on a single large population. We used results from the GIANT Consortium (N = 253,288) (Wood et al., 2014) and a GWAS performed on individuals of the UK Biobank (‘UKB Neale’ or simply ‘UK Biobank (UKB)', N = 336,474) (Churchhouse et al., 2017) who derive their ancestry almost entirely from the British Isles (identified as ‘white British ancestry (WB)’ by the UK Biobank). The Neale lab’s GWAS uses a linear model with sex and 10 principal components as covariates. We also used an independent GWAS that included all UK Biobank European samples, allowing related individuals as well as population structure (‘UKB Loh’, N = 459,327) (Loh et al., 2018). Loh et al.’s GWAS uses a BOLT-LMM Bayesian mixed model (Loh et al., 2018). Association signals from the three studies are generally correlated for SNPs that are genome-wide significant in GIANT (see Yengo et al., 2018).

We also used previously published family-based effect size estimates (Robinson et al., 2015) (‘NG2015 sibs’) as well as a number of test summary statistics on the UK Biobank that we generated to study the effects of population stratification. These are: ‘UKB Neale new’ (Similar to UKB Neale, with less stringent ancestry definition and 20 PCs calculated within sample), ‘UKB all no PCs’ (All UK Biobank samples included in the GWAS without correction by principal components), ‘UKB all 10 PCs’ (All UK Biobank samples included in the GWAS with correction by 10 principal components), ‘UKB WB no PCs’ (Only ‘white British ancestry’ samples included in the GWAS without correction by principal components), ‘UKB WB 10 PCs’ (Only ‘white British ancestry’ samples included in the GWAS with correction by 10 principal components), ‘UKB sibs all’ (All UK Biobank siblings included in the GWAS), ‘UKB sibs WB’ (Only UK Biobank ‘white British ancestry’ siblings included in the GWAS) (Please see Supplementary file 1 for sample sizes and other details).

Population genetic data for ancient and modern samples

We analyzed ancient and modern populations for which genotype data are publicly available. For ancient samples (Haak et al., 2015; Mathieson et al., 2018), we computed scores after dividing populations into three previously described broad ancestry labels (HG = Hunter Gatherer (n = 162 individuals), EF = Early Farmer (n = 485 individuals), and SP = Steppe Ancestry (n = 465 individuals)). For modern samples available through the 1000 genomes phase three release (Auton et al., 2015), we computed scores in two populations each from Northern Europe (GBR, CEU), Southern Europe (IBS, TSI), Africa (YRI, LWK), South Asia (PJL, BEB) and East Asia (CHB, JPT) (Figure 1a). In total, we analyzed 1112 ancient individuals, and 1005 modern individuals from 10 different populations in the 1000 genomes project (Supplementary file 2). We used the allele frequency differences between the GBR and TSI populations for a number of analyses to study population stratification (Figures 2–3). We also analyzed 19 European populations from the POPRES (Nelson et al., 2008) dataset with at least 10 samples per population (Figure 4—figure supplement 4).

All ancient samples had ‘pseudo-haploid’ genotype calls at 1240k sites generated by selecting a single sequence randomly for each individual at each SNP (Mathieson et al., 2018). Thus, there is only a single allele from each individual at each site, but adjacent alleles might come from either of the two haplotypes of the individual. We also re-computed scores in present-day 1000 genomes individuals using only pseudo-haploid calls at 1240k sites to allow for a fair comparison between ancient and modern samples (Figure 1—figure supplement 6).
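As an illustration of the pseudo-haploid idea, the following is a minimal sketch of how one allele per individual per site could be drawn from diploid genotypes in the modern samples. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) with NaN for missing data; the function names and the coding are hypothetical and are not taken from the authors' pipeline.

```python
import numpy as np

def pseudo_haploid(genotypes, rng):
    """Draw one allele per individual per site from diploid allele counts.

    genotypes: array (individuals x sites) of alt-allele counts 0/1/2,
               NaN where missing (assumed coding).
    Returns an array of 0/1 calls, NaN where the genotype was missing.
    """
    g = np.asarray(genotypes, dtype=float)
    # Heterozygotes yield the alt allele with probability 0.5;
    # homozygotes always yield their only allele.
    draws = rng.random(g.shape) < (g / 2.0)
    calls = draws.astype(float)
    calls[np.isnan(g)] = np.nan
    return calls

def allele_frequency(calls):
    """Alt-allele frequency per site, counting one allele per individual."""
    return np.nanmean(calls, axis=0)

rng = np.random.default_rng(0)
geno = np.array([[0, 2, 1, np.nan],
                 [0, 2, 1, 1],
                 [0, 2, 2, 0]])
calls = pseudo_haploid(geno, rng)
freqs = allele_frequency(calls)
```

Because each individual contributes a single allele, frequencies estimated this way are noisier than diploid estimates, which is why applying the same sampling to ancient and modern samples makes the comparison like-for-like.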

Polygenic scores

The polygenic scores, confidence intervals and test statistics (against the null model of genetic drift) were computed based on the methodology developed in references Berg and Coop, 2014 and Berg et al., 2017. We computed the polygenic score ($Z$) for a trait in a population by taking the sum of allele frequencies in that population across all $L$ sites associated with the trait, weighting each allele’s frequency ($p_l$) by its effect on the trait ($\beta_l$).

$$Z = \sum_{l=1}^{L} \beta_l p_l$$

All polygenic scores are plotted in centered standardized form, $(Z - \mu)/\sqrt{V_A}$,

where $\mu = \sum_{l} \beta_l \bar{p}_l$, $V_A = \sum_{l} \beta_l^2 \bar{p}_l (1 - \bar{p}_l)$, and $\bar{p}_l$ is the mean allele frequency across all populations analyzed. Source code repositories for the polygenic score analysis and computing scripts and source data for all the main figures have been made available at https://github.com/msohail88/polygenic_selection (Sohail, 2018; copy archived at https://github.com/elifesciences-publications/polygenic_selection) and https://github.com/uqrmaie1/sohail_maier_2019 (Sohail, 2019; copy archived at https://github.com/elifesciences-publications/sohail_maier_2019).
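The score definition and its standardization can be sketched in a few lines. This is an illustrative example rather than the code in the repositories above; the function names are hypothetical, and the mean allele frequency is assumed to be the unweighted mean over populations.

```python
import numpy as np

def polygenic_scores(freqs, betas):
    """Z_m = sum_l beta_l * p_{l,m} for each population m.

    freqs: (populations x sites) allele frequencies.
    betas: (sites,) effect sizes for the same alleles.
    """
    return np.asarray(freqs, dtype=float) @ np.asarray(betas, dtype=float)

def standardize(z, freqs, betas):
    """Return (Z - mu) / sqrt(V_A) using the across-population mean frequency."""
    freqs = np.asarray(freqs, dtype=float)
    betas = np.asarray(betas, dtype=float)
    p_bar = freqs.mean(axis=0)  # assumed: unweighted mean over populations
    mu = np.sum(betas * p_bar)
    v_a = np.sum(betas**2 * p_bar * (1.0 - p_bar))
    return (z - mu) / np.sqrt(v_a)

# Toy example: two populations, two SNPs.
freqs = np.array([[0.2, 0.5],
                  [0.4, 0.5]])
betas = np.array([1.0, -0.5])
z = polygenic_scores(freqs, betas)
z_std = standardize(z, freqs, betas)
```

With an unweighted mean frequency, the standardized scores are centered on zero across populations, so only relative differences between populations are interpretable.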

Polygenic scores were computed using independent GWAS SNPs associated with height in three main ways: (1) The genome was divided into ~1700 non-overlapping linkage disequilibrium (LD) blocks (using the approximately independent linkage disequilibrium blocks in the EUR population computed in Berisa and Pickrell, 2015), and the SNP with the lowest P value within each block was picked to give a set of ~1700 independent SNPs for each height GWAS used (all SNPs for which effect sizes are available were considered) similar to the analysis in Berg et al., 2017. In (2) and (3), Plink’s (Chang et al., 2015; Purcell and Chang, 2015) clumping procedure was used to make independent ‘clumps’ of SNPs for each GWAS at different P value thresholds. This procedure selects SNPs below a given P value threshold as index SNPs to start clumps around, and then reduces all SNPs below a given P value threshold that are in LD with these index SNPs (above an r^2 threshold, 0.1) and within a physical distance of them (1 Mb) into clumps with them. Clumps are preferentially formed around index SNPs with the lowest P value in a greedy manner. The index SNP from each clump is then picked for further polygenic score analyses. The algorithm is also greedy such that each SNP will only appear in one clump if at all. We clumped each GWAS to obtain (2) a set of independent sub-significant SNPs associated with height (p<0.01) similarly to Robinson et al. (2015), and (3) a set of independent genome-wide significant SNPs associated with height (p<5×10^−8). The 1000 genomes phase three dataset was used as the reference panel for computing LD for the clumping procedure.
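The greedy logic of clumping can be illustrated with a short sketch. This is not PLINK's implementation: it assumes a single chromosome, a precomputed pairwise r^2 matrix from a reference panel, and one P value threshold for both index and clumped SNPs; the function name and arguments are hypothetical.

```python
import numpy as np

def greedy_clump(pvals, positions, r2, p_index=0.01, r2_min=0.1,
                 max_dist=1_000_000):
    """Return indices of index SNPs, one per clump (single chromosome)."""
    pvals = np.asarray(pvals, dtype=float)
    positions = np.asarray(positions)
    r2 = np.asarray(r2, dtype=float)
    assigned = np.zeros(len(pvals), dtype=bool)
    index_snps = []
    # Visit SNPs from most to least significant so that clumps form
    # around the lowest P values first.
    for i in np.argsort(pvals, kind="stable"):
        if assigned[i] or pvals[i] >= p_index:
            continue
        index_snps.append(int(i))
        in_clump = (
            (~assigned)
            & (pvals < p_index)
            & (r2[i] > r2_min)
            & (np.abs(positions - positions[i]) <= max_dist)
        )
        in_clump[i] = True
        assigned |= in_clump  # each SNP joins at most one clump
    return index_snps

# Toy example: SNP 1 is in strong LD with SNP 0; SNP 4 is far away.
pvals = [1e-10, 1e-6, 0.005, 0.5, 1e-9]
positions = [100_000, 150_000, 200_000, 250_000, 5_000_000]
r2 = np.eye(5)
r2[0, 1] = r2[1, 0] = 0.8
r2[0, 2] = r2[2, 0] = 0.05
r2[1, 2] = r2[2, 1] = 0.5
sub_sig = greedy_clump(pvals, positions, r2, p_index=0.01)
gw_sig = greedy_clump(pvals, positions, r2, p_index=5e-8)
```

In the toy data, SNP 1 is absorbed into the clump of SNP 0, while SNP 2 starts its own clump because its r^2 with SNP 0 falls below the threshold.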

The estimated effect sizes for these three sets of SNPs from each GWAS were used to compute scores. Only autosomal SNPs were used for all analyses to avoid creating artificial mean differences between populations with different numbers of males and females.

The 95% credible intervals were constructed by assuming that the posterior of the underlying population allele frequency is independent across loci and populations and follows a beta distribution. We updated a Uniform prior distribution with allele counts from ancient and modern populations to obtain the posterior distribution at each locus in each population. We estimated the variance of the polygenic score $V_Z$ using the variance of the posterior distribution at each locus, and computed the width of 95% credible intervals as $1.96\sqrt{V_Z}$ for each population.
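A minimal sketch of this interval calculation follows. It assumes that a Uniform prior corresponds to Beta(1, 1), so that observing k copies of an allele among n sampled alleles gives a Beta(1 + k, 1 + n − k) posterior, and that independence across loci lets the per-locus variances add after weighting by squared effect sizes. Function names are hypothetical.

```python
import numpy as np

def beta_posterior_variance(alt_counts, total_counts):
    """Variance of Beta(1 + k, 1 + n - k), the posterior under a Uniform prior."""
    k = np.asarray(alt_counts, dtype=float)
    n = np.asarray(total_counts, dtype=float)
    a = 1.0 + k
    b = 1.0 + n - k
    return a * b / ((a + b) ** 2 * (a + b + 1.0))

def score_interval_width(betas, alt_counts, total_counts):
    """1.96 * sqrt(V_Z), with V_Z = sum_l beta_l^2 * Var(p_l | data)."""
    betas = np.asarray(betas, dtype=float)
    v_z = np.sum(betas**2 * beta_posterior_variance(alt_counts, total_counts))
    return 1.96 * np.sqrt(v_z)

# Same sample allele frequencies, small versus large sample of alleles.
betas = np.array([0.5, -0.2])
wide = score_interval_width(betas, [4, 10], [20, 20])
narrow = score_interval_width(betas, [400, 1000], [2000, 2000])
```

The comparison of `wide` and `narrow` shows why populations with few sampled alleles, such as some ancient groups, carry much larger intervals than well-sampled modern populations.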

The Qx test statistic measures the degree of overdispersion of the mean population polygenic score compared to a null model of genetic drift. It assumes that the vector of mean centered mean population polygenic score follows a multivariate normal distribution: $Z \sim \mathrm{MVN}(0, 2 V_A F)$, where $V_A$ is the additive genetic variance of the ancestral population and $F$ is a square matrix describing the population structure. This is equivalent to the univariate case of the test statistic used in Robinson et al. (2015). The latitude test statistic assumes that $Y'Z \sim N(0, 2 V_A Y'FY)$, where $Y$ is a mean centered vector of latitudes for each population (Berg et al., 2019).
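Under the stated null model, both test statistics follow from standard properties of the multivariate normal distribution, which the sketch below makes explicit. It assumes that F is invertible for the vector Z supplied (for example after the mean centering and dimension reduction described in Berg and Coop, 2014), in which case the quadratic form is compared with a chi-squared distribution with len(Z) degrees of freedom. Function names are hypothetical.

```python
import math
import numpy as np

def qx_statistic(z, f_matrix, v_a):
    """Qx = Z' F^{-1} Z / (2 V_A).

    If Z ~ MVN(0, 2 V_A F) with F invertible, Qx is chi-squared
    distributed with len(z) degrees of freedom under the null.
    """
    z = np.asarray(z, dtype=float)
    f = np.asarray(f_matrix, dtype=float)
    return float(z @ np.linalg.solve(f, z) / (2.0 * v_a))

def latitude_test(z, f_matrix, v_a, latitudes):
    """Z-score and two-sided P value for Y'Z ~ N(0, 2 V_A Y'FY)."""
    z = np.asarray(z, dtype=float)
    f = np.asarray(f_matrix, dtype=float)
    y = np.asarray(latitudes, dtype=float)
    y = y - y.mean()  # mean centered latitudes
    zscore = float(y @ z / math.sqrt(2.0 * v_a * (y @ f @ y)))
    p_value = math.erfc(abs(zscore) / math.sqrt(2.0))
    return zscore, p_value

# Toy example with an identity structure matrix (unrelated populations).
z = [1.0, 2.0, 2.0]
f_matrix = np.eye(3)
qx = qx_statistic(z, f_matrix, v_a=0.5)
lat_z, lat_p = latitude_test(z, f_matrix, 0.5, [40.0, 50.0, 60.0])
```

Large values of Qx indicate more between-population variance in scores than drift alone would produce, while the latitude test asks whether that variance aligns with a specific geographic axis.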

tSDS analysis

The Singleton Density Score (SDS) method identifies signatures of recent positive selection based on a maximum likelihood estimate of the log-ratio of the mean tip-branch length of the derived vs. the ancestral allele at a given SNP. The tip-branch lengths are inferred from the average distance of each allele to the nearest singleton SNP across all individuals in a sequencing panel. When the sign of the SDS scores is aligned with the trait-increasing or trait-decreasing allele in the effect estimates of a GWAS, the Spearman correlation between the resulting tSDS scores and the GWAS P values has been proposed as an estimate of recent positive selection on polygenic traits.

Here, we applied the tSDS method using pre-computed Singleton Density Scores for 4,451,435 autosomal SNPs obtained from 3195 individuals from the UK10K project (Field et al., 2016a; Field et al., 2016b) for SNPs associated with height in GIANT and the UK Biobank (Figure 1b) and in different summary statistics (Figure 3). After normalizing SDS scores in each 1% allele frequency bin to mean zero and unit variance, excluding SNPs from the MHC region on chromosome 6 and aligning the sign of the SDS scores to the height increasing alleles (resulting in tSDS scores), we computed the Spearman correlation coefficient between the tSDS score and the GWAS P value. The tSDS Spearman correlation standard errors and P values were computed using a block-jackknife approach, where each block of 1% of all SNPs ordered by genomic location was left out and the Spearman correlation coefficient was computed on the remaining SNPs. We also compared the tSDS score distributions for only genome-wide significant SNPs (Figure 3b).
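The sign alignment, Spearman correlation and block jackknife can be sketched as follows. The example assumes that SDS scores have already been normalized within allele frequency bins, that MHC SNPs have been removed, that effect sizes refer to the derived allele, and that SNPs are sorted by genomic position. Ranks are computed without tie correction, which is adequate for continuous simulated data; all names and the simulated values are hypothetical.

```python
import numpy as np

def to_tsds(sds, beta_derived):
    """Flip the SDS sign wherever the derived allele decreases height."""
    sds = np.asarray(sds, dtype=float)
    return np.where(np.asarray(beta_derived) < 0, -sds, sds)

def ranks(x):
    """Ordinal ranks (no tie correction)."""
    order = np.argsort(x, kind="stable")
    r = np.empty(len(order), dtype=float)
    r[order] = np.arange(len(order))
    return r

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])

def block_jackknife(tsds_scores, pvals, n_blocks=100):
    """Leave out each contiguous block (~1% of SNPs) and recompute rho."""
    tsds_scores = np.asarray(tsds_scores, dtype=float)
    pvals = np.asarray(pvals, dtype=float)
    rho = spearman(tsds_scores, pvals)
    blocks = np.array_split(np.arange(len(pvals)), n_blocks)
    loo = np.array([
        spearman(np.delete(tsds_scores, b), np.delete(pvals, b))
        for b in blocks
    ])
    se = np.sqrt((n_blocks - 1) / n_blocks * np.sum((loo - loo.mean()) ** 2))
    return rho, float(se)

# Simulated data in which tSDS is higher at lower GWAS P values.
rng = np.random.default_rng(1)
n = 2000
pvals = rng.uniform(size=n)
beta = rng.normal(size=n)
true_tsds = rng.normal(size=n) + 0.5 * (1.0 - pvals)
sds = np.where(beta < 0, -true_tsds, true_tsds)
rho, se = block_jackknife(to_tsds(sds, beta), pvals)
```

A negative `rho` means that more significant SNPs have higher tSDS, and `rho / se` gives an approximate z-score for the correlation.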

Population structure analysis

To compute SNP loadings of the principal components of population structure (PC loadings) in the 1000 genomes data (Figure 2), we first computed PC scores for each individual. We used SNPs that had matching alleles in 1000 genomes, GIANT and UK Biobank, that had minor allele frequency >5% in 1000 genomes, and that were not located in the MHC locus, the chromosome 8 inversion region, or regions of long LD. After LD pruning to SNPs with r^2 < 0.2 relative to each other, PCA was performed in PLINK on the 187,160 remaining SNPs. In order to get SNP PC loadings for more SNPs than those that were used to compute PC scores, we performed linear regressions of the PC scores on the genotype allele count of each SNP (after controlling for sex) and used the resulting regression coefficients as the SNP PC loading estimates. The 1000 genomes phase one dataset (Abecasis et al., 2012) was used to compute the PC loadings.
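The regression step used to extend PC loadings to additional SNPs can be sketched with ordinary least squares. This is an illustrative example on simulated data, not the authors' code; the function name, the 0/1 sex coding and the simulated effect sizes are assumptions.

```python
import numpy as np

def snp_pc_loading(pc_scores, genotypes, sex):
    """Slope of PC score on allele count, with sex as a covariate.

    pc_scores: (individuals,) scores on one principal component.
    genotypes: (individuals,) allele counts (0/1/2) at one SNP.
    sex:       (individuals,) covariate, coded 0/1 here by assumption.
    """
    pc_scores = np.asarray(pc_scores, dtype=float)
    design = np.column_stack([
        np.ones(len(pc_scores)),           # intercept
        np.asarray(genotypes, dtype=float),
        np.asarray(sex, dtype=float),
    ])
    coef, *_ = np.linalg.lstsq(design, pc_scores, rcond=None)
    return float(coef[1])

# Simulated check: the PC score depends on genotype with slope 0.3.
rng = np.random.default_rng(2)
n = 500
genotypes = rng.integers(0, 3, size=n)
sex = rng.integers(0, 2, size=n)
pc_scores = 0.3 * genotypes + 0.1 * sex + rng.normal(scale=0.01, size=n)
loading = snp_pc_loading(pc_scores, genotypes, sex)
```

Running the function once per SNP and per principal component yields a loading for every genotyped SNP, including those excluded from the PCA by LD pruning.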

Acknowledgements

We thank Alkes Price, Jeremy Berg, Graham Coop, Jonathan Pritchard, Matthew Robinson, Jian Yang, Peter Visscher, Hilary Finucane, John Novembre and Raymond Walters for useful discussions and comments that significantly improved the manuscript. The study was supported by National Institutes of Health grants HG009088, MH101244 (MS, RM, BN and SS) and GM127131 (SS). DR was supported by National Institutes of Health grants GM100233 and HG006399, an Allen Discovery Center grant from the Paul Allen Foundation, and the Howard Hughes Medical Institute. IM was supported by a Sloan Research Fellowship and a New Investigator Research Grant from the Charles E Kaufman foundation.

This research was conducted using the UK Biobank Resource applications 18597, 11898 and 31063.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Mashaal Sohail, Email: mashaal33@gmail.com.

Robert M Maier, Email: rmaier@broadinstitute.org.

Benjamin Neale, Email: bneale@broadinstitute.org.

Iain Mathieson, Email: mathi@pennmedicine.upenn.edu.

David Reich, Email: reich@genetics.med.harvard.edu.

Shamil R Sunyaev, Email: ssunyaev@rics.bwh.harvard.edu.

Magnus Nordborg, Austrian Academy of Sciences, Austria.

Mark I McCarthy, University of Oxford, United Kingdom.

Funding Information

This paper was supported by the following grants:

  • National Institutes of Health HG009088 to Mashaal Sohail, Robert M Maier, Benjamin Neale, Shamil R Sunyaev.

  • National Institutes of Health MH101244 to Mashaal Sohail, Robert M Maier, Benjamin Neale, Shamil R Sunyaev.

  • Alfred P. Sloan Foundation Sloan Research Fellowship to Iain Mathieson.

  • Charles E Kaufman Foundation New Investigator Research Grant to Iain Mathieson.

  • Paul Allen Foundation Allen Discovery Center to David Reich.

  • National Institutes of Health GM100233 to David Reich.

  • National Institutes of Health HG006399 to David Reich.

  • Howard Hughes Medical Institute Investigator to David Reich.

  • National Institutes of Health GM127131 to Shamil R Sunyaev.

Additional information

Competing interests

No competing interests declared.

Ben Neale is a member and on the scientific advisory board of Deep Genomics, a consultant for Camp4 Therapeutics Corporation, a consultant for Merck & Co., a consultant for Takeda Pharmaceutical, and a consultant for Avanir Pharmaceuticals. None of these entities played a role in determining the content of this paper.

Author contributions

Conceptualization, Formal analysis, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing.

Conceptualization, Formal analysis, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing.

Formal analysis, Writing—review and editing.

Methodology, Writing—review and editing.

Data curation, Writing—review and editing.

Validation, Writing—review and editing.

Validation, Writing—review and editing.

Validation, Methodology, Writing—review and editing.

Methodology, Writing—review and editing.

Methodology, Writing—review and editing.

Supervision, Visualization, Methodology, Writing—review and editing.

Data curation, Supervision, Investigation, Methodology, Writing—review and editing.

Conceptualization, Supervision, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing.

Conceptualization, Supervision, Methodology, Writing—original draft, Project administration, Writing—review and editing.

Additional files

Supplementary file 1. Description of 11 GWAS summary statistics.
elife-39702-supp1.xlsx (42.9KB, xlsx)
DOI: 10.7554/eLife.39702.029
Supplementary file 2. Table of ancient and 1000 genomes modern populations used with sample sizes.
elife-39702-supp2.xlsx (36.8KB, xlsx)
DOI: 10.7554/eLife.39702.030
Supplementary file 3. Supplementary note on characterization of stratification effects in GIANT and UK Biobank.
elife-39702-supp3.docx (120.4KB, docx)
DOI: 10.7554/eLife.39702.031
Supplementary file 4. Table of POPRES populations used with sample sizes and latitude.
elife-39702-supp4.xlsx (40.6KB, xlsx)
DOI: 10.7554/eLife.39702.032
Supplementary file 5. LD Score regression estimates for 11 different summary statistics.

LD score regression can be used to detect residual stratification effects in summary statistics, and so we tested whether LDSC confirms our hypothesis of residual stratification. We detect a greatly inflated intercept estimate of 9.42 in UKB all no PCs, but only a moderately increased intercept value in GIANT and an intercept less than one in NG2015 sibs. The relatively small GIANT intercept can be explained by cohort-wise lambda-GC correction, while the low intercept in NG2015 sibs is possibly caused by the adaptive permutation procedure which does not compute precise p-values for non-significant associations. In both cases LDSC cannot be expected to pick up stratification effects, since the generation of summary statistics is not in line with the LDSC model.

elife-39702-supp5.xlsx (51KB, xlsx)
DOI: 10.7554/eLife.39702.033
Supplementary file 6. Correlation of beta estimates at all 86,153 shared SNPs.
elife-39702-supp6.xlsx (45.5KB, xlsx)
DOI: 10.7554/eLife.39702.034
Supplementary file 7. Correlation of beta estimates at 2251 shared SNPs which are significant in the UK Biobank.
elife-39702-supp7.xlsx (47.5KB, xlsx)
DOI: 10.7554/eLife.39702.035
Transparent reporting form
DOI: 10.7554/eLife.39702.036

Data availability

All newly generated UK Biobank height GWAS summary statistics have been made available at http://dx.doi.org/10.5061/dryad.8g5g6j4. Results from the GIANT Consortium (GWAS Anthropometric 2014 Height) were downloaded from https://portals.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files#GWAS_Anthropometric_2014_Height. GWAS results from the UK Biobank ("UKB" or "UKB Neale") were downloaded from http://www.nealelab.is/uk-biobank. The previously published family-based effect size estimates ("NG2015 sibs") can be accessed here http://cnsgenomics.com/data/robinson_et_al_2015_ng/withinfam_summary_ht_bmi_release_March2016.tar.gz. The independent mixed model association analysis that included all UK Biobank individuals of European ancestry ("UKB Loh") was downloaded from https://data.broadinstitute.org/alkesgroup/UKBB/body_HEIGHTz.sumstats.gz. Approximately independent linkage disequilibrium blocks in human populations were downloaded for the EUR population from https://bitbucket.org/nygcresearch/ldetect-data/overview. Source code repositories for the polygenic score analysis in this manuscript and computing scripts and source data for all the main figures have been made available at https://github.com/msohail88/polygenic_selection and https://github.com/uqrmaie1/sohail_maier_2019 (copies archived at https://github.com/elifesciences-publications/polygenic_selection and https://github.com/elifesciences-publications/sohail_maier_2019, respectively).

The following dataset was generated:

Sohail M, Maier RM, Ganna A. 2018. Data from: Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. Dryad Digital Repository.

References

  1. Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA, 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632.
  2. Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR, 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393.
  3. Berg JJ, Zhang X, Coop G. Polygenic adaptation has impacted multiple anthropometric traits. BioRxiv. 2017 doi: 10.1101/167551. [DOI]
  4. Berg JJ, Harpak A, Sinnott-Armstrong N, Joergensen AM, Mostafavi H, Field Y, Boyle EA, Zhang X, Racimo F, Pritchard JK, Coop G. Reduced signal for polygenic adaptation of height in UK Biobank. eLife. 2019;8:e39725. doi: 10.7554/eLife.39725. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Berg JJ, Coop G. A population genetic signal of polygenic adaptation. PLOS Genetics. 2014;10:e1004412. doi: 10.1371/journal.pgen.1004412. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Berisa T, Pickrell JK. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics. 2015;32:btv546. doi: 10.1093/bioinformatics/btv546. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Boyle EA, Li YI, Pritchard JK. An expanded view of complex traits: from polygenic to omnigenic. Cell. 2017;169:1177–1186. doi: 10.1016/j.cell.2017.05.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4:1–16. doi: 10.1186/s13742-015-0047-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Chun S, Imakaev M, Stitziel NO, Sunyaev SR. Non-parametric polygenic risk prediction using partitioned GWAS summary statistics. BioRxiv. 2018 doi: 10.1101/370064. [DOI] [PMC free article] [PubMed]
  10. Churchhouse C, Neale BM, Abbott L, Anttila V, Aragam K, Baumann A, Bloom J, Bryant S, Churchhouse C, Cole J, Daly MJ, Damian R, Ganna A, Goldstein J, Haas M, Hirschhorn J, Howrigan D, Jones E, King D. Rapid GWAS of thousands of phenotypes for 337,000 samples in the UK Biobank. [February 11, 2018];2017 https://sites.google.com/broadinstitute.org/ukbbgwasresults/home?authuser=0
  11. Field Y, Boyle EA, Telis N, Gao Z, Gaulton KJ, Golan D, Yengo L, Rocheleau G, Froguel P, McCarthy MI, Pritchard JK. Detection of human adaptation during the past 2000 years. Science. 2016a;354:760–764. doi: 10.1126/science.aag0776. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Field Y, Boyle E, Telis N, Gao Z, Gaulton K, Golan D, Yengo L, Rocheleau G, Froguel P, McCarthy M, Pritchard J. 2016b. Data from: detection of human adaptation during the past 2000 years. Dryad Digital Repository.
  13. Galinsky KJ, Loh PR, Mallick S, Patterson NJ, Price AL. Population structure of UK Biobank and ancient Eurasians reveals adaptation at genes influencing blood pressure. The American Journal of Human Genetics. 2016;99:1130–1139. doi: 10.1016/j.ajhg.2016.09.014.
  14. Ganna A, Magnusson PK, Pedersen NL, de Faire U, Reilly M, Arnlöv J, Sundström J, Hamsten A, Ingelsson E. Multilocus genetic risk scores for coronary heart disease prediction. Arteriosclerosis, Thrombosis, and Vascular Biology. 2013;33:2267–2272. doi: 10.1161/ATVBAHA.113.301218. [DOI] [PubMed] [Google Scholar]
  15. Guo J, Wu Y, Zhu Z, Zheng Z, Trzaskowski M, Zeng J, Robinson MR, Visscher PM, Yang J. Global genetic differentiation of complex traits shaped by natural selection in humans. Nature Communications. 2018;9:1–9. doi: 10.1038/s41467-018-04191-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Haak W, Lazaridis I, Patterson N, Rohland N, Mallick S, Llamas B, Brandt G, Nordenfelt S, Harney E, Stewardson K, Fu Q, Mittnik A, Bánffy E, Economou C, Francken M, Friederich S, Pena RG, Hallgren F, Khartanovich V, Khokhlov A, Kunst M, Kuznetsov P, Meller H, Mochalov O, Moiseyev V, Nicklisch N, Pichler SL, Risch R, Rojo Guerra MA, Roth C, Szécsényi-Nagy A, Wahl J, Meyer M, Krause J, Brown D, Anthony D, Cooper A, Alt KW, Reich D. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature. 2015;522:207–211. doi: 10.1038/nature14317. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Haworth S, Mitchell R, Corbin L, Wade KH, Dudding T, Budu-Aggrey A, Carslake D, Hemani G, Paternoster L, Smith GD, Davies N, Lawson DJ, Timpson NJ. Apparent latent structure within the UK Biobank sample has implications for epidemiological analysis. Nature Communications. 2019;10 doi: 10.1038/s41467-018-08219-1.
  18. Khera AV, Chaffin M, Aragam KG, Haas ME, Roselli C, Choi SH, Natarajan P, Lander ES, Lubitz SA, Ellinor PT, Kathiresan S. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nature Genetics. 2018;50:1219–1224. doi: 10.1038/s41588-018-0183-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Lango Allen H, Estrada K, Lettre G, Berndt SI, Weedon MN, Rivadeneira F, Willer CJ, Jackson AU, Vedantam S, Raychaudhuri S, Ferreira T, Wood AR, Weyant RJ, Segrè AV, Speliotes EK, Wheeler E, Soranzo N, Park JH, Yang J, Gudbjartsson D, Heard-Costa NL, Randall JC, Qi L, Vernon Smith A, Mägi R, Pastinen T, Liang L, Heid IM, Luan J, Thorleifsson G, Winkler TW, Goddard ME, Sin Lo K, Palmer C, Workalemahu T, Aulchenko YS, Johansson A, Zillikens MC, Feitosa MF, Esko T, Johnson T, Ketkar S, Kraft P, Mangino M, Prokopenko I, Absher D, Albrecht E, Ernst F, Glazer NL, Hayward C, Hottenga JJ, Jacobs KB, Knowles JW, Kutalik Z, Monda KL, Polasek O, Preuss M, Rayner NW, Robertson NR, Steinthorsdottir V, Tyrer JP, Voight BF, Wiklund F, Xu J, Zhao JH, Nyholt DR, Pellikka N, Perola M, Perry JR, Surakka I, Tammesoo ML, Altmaier EL, Amin N, Aspelund T, Bhangale T, Boucher G, Chasman DI, Chen C, Coin L, Cooper MN, Dixon AL, Gibson Q, Grundberg E, Hao K, Juhani Junttila M, Kaplan LM, Kettunen J, König IR, Kwan T, Lawrence RW, Levinson DF, Lorentzon M, McKnight B, Morris AP, Müller M, Suh Ngwa J, Purcell S, Rafelt S, Salem RM, Salvi E, Sanna S, Shi J, Sovio U, Thompson JR, Turchin MC, Vandenput L, Verlaan DJ, Vitart V, White CC, Ziegler A, Almgren P, Balmforth AJ, Campbell H, Citterio L, De Grandi A, Dominiczak A, Duan J, Elliott P, Elosua R, Eriksson JG, Freimer NB, Geus EJ, Glorioso N, Haiqing S, Hartikainen AL, Havulinna AS, Hicks AA, Hui J, Igl W, Illig T, Jula A, Kajantie E, Kilpeläinen TO, Koiranen M, Kolcic I, Koskinen S, Kovacs P, Laitinen J, Liu J, Lokki ML, Marusic A, Maschio A, Meitinger T, Mulas A, Paré G, Parker AN, Peden JF, Petersmann A, Pichler I, Pietiläinen KH, Pouta A, Ridderstråle M, Rotter JI, Sambrook JG, Sanders AR, Schmidt CO, Sinisalo J, Smit JH, Stringham HM, Bragi Walters G, Widen E, Wild SH, Willemsen G, Zagato L, Zgaga L, Zitting P, Alavere H, Farrall M, McArdle WL, Nelis M, Peters MJ, Ripatti S, van Meurs JB, Aben KK, Ardlie KG, Beckmann JS, Beilby JP, 
Bergman RN, Bergmann S, Collins FS, Cusi D, den Heijer M, Eiriksdottir G, Gejman PV, Hall AS, Hamsten A, Huikuri HV, Iribarren C, Kähönen M, Kaprio J, Kathiresan S, Kiemeney L, Kocher T, Launer LJ, Lehtimäki T, Melander O, Mosley TH, Musk AW, Nieminen MS, O'Donnell CJ, Ohlsson C, Oostra B, Palmer LJ, Raitakari O, Ridker PM, Rioux JD, Rissanen A, Rivolta C, Schunkert H, Shuldiner AR, Siscovick DS, Stumvoll M, Tönjes A, Tuomilehto J, van Ommen GJ, Viikari J, Heath AC, Martin NG, Montgomery GW, Province MA, Kayser M, Arnold AM, Atwood LD, Boerwinkle E, Chanock SJ, Deloukas P, Gieger C, Grönberg H, Hall P, Hattersley AT, Hengstenberg C, Hoffman W, Lathrop GM, Salomaa V, Schreiber S, Uda M, Waterworth D, Wright AF, Assimes TL, Barroso I, Hofman A, Mohlke KL, Boomsma DI, Caulfield MJ, Cupples LA, Erdmann J, Fox CS, Gudnason V, Gyllensten U, Harris TB, Hayes RB, Jarvelin MR, Mooser V, Munroe PB, Ouwehand WH, Penninx BW, Pramstaller PP, Quertermous T, Rudan I, Samani NJ, Spector TD, Völzke H, Watkins H, Wilson JF, Groop LC, Haritunians T, Hu FB, Kaplan RC, Metspalu A, North KE, Schlessinger D, Wareham NJ, Hunter DJ, O'Connell JR, Strachan DP, Wichmann HE, Borecki IB, van Duijn CM, Schadt EE, Thorsteinsdottir U, Peltonen L, Uitterlinden AG, Visscher PM, Chatterjee N, Loos RJ, Boehnke M, McCarthy MI, Ingelsson E, Lindgren CM, Abecasis GR, Stefansson K, Frayling TM, Hirschhorn JN. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010;467:832–838. doi: 10.1038/nature09410. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Lee JJ, Wedow R, Okbay A, Kong E, Maghzian O, Zacher M, Nguyen-Viet TA, Bowers P, Sidorenko J, Karlsson Linnér R, Fontana MA, Kundu T, Lee C, Li H, Li R, Royer R, Timshel PN, Walters RK, Willoughby EA, Yengo L, Alver M, Bao Y, Clark DW, Day FR, Furlotte NA, Joshi PK, Kemper KE, Kleinman A, Langenberg C, Mägi R, Trampush JW, Verma SS, Wu Y, Lam M, Zhao JH, Zheng Z, Boardman JD, Campbell H, Freese J, Harris KM, Hayward C, Herd P, Kumari M, Lencz T, Luan J, Malhotra AK, Metspalu A, Milani L, Ong KK, Perry JRB, Porteous DJ, Ritchie MD, Smart MC, Smith BH, Tung JY, Wareham NJ, Wilson JF, Beauchamp JP, Conley DC, Esko T, Lehrer SF, Magnusson PKE, Oskarsson S, Pers TH, Robinson MR, Thom K, Watson C, Chabris CF, Meyer MN, Laibson DI, Yang J, Johannesson M, Koellinger PD, Turley P, Visscher PM, Benjamin DJ, Cesarini D, 23andMe Research Team, COGENT (Cognitive Genomics Consortium), Social Science Genetic Association Consortium. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nature Genetics. 2018;50:1112–1121. doi: 10.1038/s41588-018-0147-3.
  21. Loh PR, Kichaev G, Gazal S, Schoech AP, Price AL. Mixed-model association for biobank-scale datasets. Nature Genetics. 2018;50:906–908. doi: 10.1038/s41588-018-0144-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Martin AR, Gignoux CR, Walters RK, Wojcik GL, Neale BM, Gravel S, Daly MJ, Bustamante CD, Kenny EE. Human demographic history impacts genetic risk prediction across diverse populations. The American Journal of Human Genetics. 2017;100:635–649. doi: 10.1016/j.ajhg.2017.03.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Mathieson I, Lazaridis I, Rohland N, Mallick S, Patterson N, Roodenberg SA, Harney E, Stewardson K, Fernandes D, Novak M, Sirak K, Gamba C, Jones ER, Llamas B, Dryomov S, Pickrell J, Arsuaga JL, de Castro JM, Carbonell E, Gerritsen F, Khokhlov A, Kuznetsov P, Lozano M, Meller H, Mochalov O, Moiseyev V, Guerra MA, Roodenberg J, Vergès JM, Krause J, Cooper A, Alt KW, Brown D, Anthony D, Lalueza-Fox C, Haak W, Pinhasi R, Reich D. Genome-wide patterns of selection in 230 ancient eurasians. Nature. 2015;528:499–503. doi: 10.1038/nature16152. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Mathieson I, Alpaslan-Roodenberg S, Posth C, Szécsényi-Nagy A, Rohland N, Mallick S, Olalde I, Broomandkhoshbacht N, Candilio F, Cheronet O, Fernandes D, Ferry M, Gamarra B, Fortes GG, Haak W, Harney E, Jones E, Keating D, Krause-Kyora B, Kucukkalipci I, Michel M, Mittnik A, Nägele K, Novak M, Oppenheimer J, Patterson N, Pfrengle S, Sirak K, Stewardson K, Vai S, Alexandrov S, Alt KW, Andreescu R, Antonović D, Ash A, Atanassova N, Bacvarov K, Gusztáv MB, Bocherens H, Bolus M, Boroneanţ A, Boyadzhiev Y, Budnik A, Burmaz J, Chohadzhiev S, Conard NJ, Cottiaux R, Čuka M, Cupillard C, Drucker DG, Elenski N, Francken M, Galabova B, Ganetsovski G, Gély B, Hajdu T, Handzhyiska V, Harvati K, Higham T, Iliev S, Janković I, Karavanić I, Kennett DJ, Komšo D, Kozak A, Labuda D, Lari M, Lazar C, Leppek M, Leshtakov K, Vetro DL, Los D, Lozanov I, Malina M, Martini F, McSweeney K, Meller H, Menđušić M, Mirea P, Moiseyev V, Petrova V, Price TD, Simalcsik A, Sineo L, Šlaus M, Slavchev V, Stanev P, Starović A, Szeniczey T, Talamo S, Teschler-Nicola M, Thevenet C, Valchev I, Valentin F, Vasilyev S, Veljanovska F, Venelinova S, Veselovskaya E, Viola B, Virag C, Zaninović J, Zäuner S, Stockhammer PW, Catalano G, Krauß R, Caramelli D, Zariņa G, Gaydarska B, Lillie M, Nikitin AG, Potekhina I, Papathanasiou A, Borić D, Bonsall C, Krause J, Pinhasi R, Reich D. The genomic history of southeastern europe. Nature. 2018;555:197–203. doi: 10.1038/nature25778. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Nagel M, Jansen PR, Stringer S, Watanabe K, de Leeuw CA, Bryois J, Savage JE, Hammerschlag AR, Skene NG, Muñoz-Manchado AB, White T, Tiemeier H, Linnarsson S, Hjerling-Leffler J, Polderman TJC, Sullivan PF, van der Sluis S, Posthuma D, 23andMe Research Team. Meta-analysis of genome-wide association studies for neuroticism in 449,484 individuals identifies novel genetic loci and pathways. Nature Genetics. 2018;50:920–927. doi: 10.1038/s41588-018-0151-7.
  26. Nelson MR, Bryc K, King KS, Indap A, Boyko AR, Novembre J, Briley LP, Maruyama Y, Waterworth DM, Waeber G, Vollenweider P, Oksenberg JR, Hauser SL, Stirnadel HA, Kooner JS, Chambers JC, Jones B, Mooser V, Bustamante CD, Roses AD, Burns DK, Ehm MG, Lai EH. The population reference sample, POPRES: a resource for population, disease, and pharmacological genetics research. The American Journal of Human Genetics. 2008;83:347–358. doi: 10.1016/j.ajhg.2008.08.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Novembre J, Barton NH. Tread lightly interpreting polygenic tests of selection. Genetics. 2018;208:1351–1355. doi: 10.1534/genetics.118.300786. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLOS Genetics. 2006;2:e190. doi: 10.1371/journal.pgen.0020190. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics. 2006;38:904–909. doi: 10.1038/ng1847. [DOI] [PubMed] [Google Scholar]
  30. Program in Complex Trait Genomics. Program in complex trait genomics. [December 2, 2018];2018 http://cnsgenomics.com/data.html
  31. Purcell SM, Wray NR, Stone JL, Visscher PM, O'Donovan MC, Sullivan PF, Sklar P, International Schizophrenia Consortium. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009;460:748. doi: 10.1038/nature08185.
  32. Purcell S, Chang C. PLINK 1. GigaScience 2015
  33. Racimo F, Berg JJ, Pickrell JK. Detecting polygenic adaptation in admixture graphs. Genetics. 2018;208:1565–1584. doi: 10.1534/genetics.117.300489. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Robinson MR, Hemani G, Medina-Gomez C, Mezzavilla M, Esko T, Shakhbazov K, Powell JE, Vinkhuyzen A, Berndt SI, Gustafsson S, Justice AE, Kahali B, Locke AE, Pers TH, Vedantam S, Wood AR, van Rheenen W, Andreassen OA, Gasparini P, Metspalu A, Berg LH, Veldink JH, Rivadeneira F, Werge TM, Abecasis GR, Boomsma DI, Chasman DI, de Geus EJ, Frayling TM, Hirschhorn JN, Hottenga JJ, Ingelsson E, Loos RJ, Magnusson PK, Martin NG, Montgomery GW, North KE, Pedersen NL, Spector TD, Speliotes EK, Goddard ME, Yang J, Visscher PM. Population genetic differentiation of height and body mass index across Europe. Nature Genetics. 2015;47:1357–1362. doi: 10.1038/ng.3401. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Savage JE, Jansen PR, Stringer S, Watanabe K, Bryois J, de Leeuw CA, Nagel M, Awasthi S, Barr PB, Coleman JRI, Grasby KL, Hammerschlag AR, Kaminski JA, Karlsson R, Krapohl E, Lam M, Nygaard M, Reynolds CA, Trampush JW, Young H, Zabaneh D, Hägg S, Hansell NK, Karlsson IK, Linnarsson S, Montgomery GW, Muñoz-Manchado AB, Quinlan EB, Schumann G, Skene NG, Webb BT, White T, Arking DE, Avramopoulos D, Bilder RM, Bitsios P, Burdick KE, Cannon TD, Chiba-Falek O, Christoforou A, Cirulli ET, Congdon E, Corvin A, Davies G, Deary IJ, DeRosse P, Dickinson D, Djurovic S, Donohoe G, Conley ED, Eriksson JG, Espeseth T, Freimer NA, Giakoumaki S, Giegling I, Gill M, Glahn DC, Hariri AR, Hatzimanolis A, Keller MC, Knowles E, Koltai D, Konte B, Lahti J, Le Hellard S, Lencz T, Liewald DC, London E, Lundervold AJ, Malhotra AK, Melle I, Morris D, Need AC, Ollier W, Palotie A, Payton A, Pendleton N, Poldrack RA, Räikkönen K, Reinvang I, Roussos P, Rujescu D, Sabb FW, Scult MA, Smeland OB, Smyrnis N, Starr JM, Steen VM, Stefanis NC, Straub RE, Sundet K, Tiemeier H, Voineskos AN, Weinberger DR, Widen E, Yu J, Abecasis G, Andreassen OA, Breen G, Christiansen L, Debrabant B, Dick DM, Heinz A, Hjerling-Leffler J, Ikram MA, Kendler KS, Martin NG, Medland SE, Pedersen NL, Plomin R, Polderman TJC, Ripke S, van der Sluis S, Sullivan PF, Vrieze SI, Wright MJ, Posthuma D. Genome-wide association meta-analysis in 269,867 individuals identifies new genetic and functional links to intelligence. Nature Genetics. 2018;50:912–919. doi: 10.1038/s41588-018-0152-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Simonti C, Stein J, Thompson P, Fisher SE, Dan J. Polygenic selection underlies evolution of human brain structure and behavioral traits. BioRxiv. 2017 doi: 10.1101/164707. [DOI]
  37. Sohail M, Vakhrusheva OA, Sul JH, Pulit SL, Francioli LC, van den Berg LH, Veldink JH, de Bakker PIW, Bazykin GA, Kondrashov AS, Sunyaev SR, Genome of the Netherlands Consortium, Alzheimer’s Disease Neuroimaging Initiative. Negative selection in humans and fruit flies involves synergistic epistasis. Science. 2017;356:539–542. doi: 10.1126/science.aah5238.
  38. Sohail M. GitHub; 2018. https://github.com/msohail88/polygenic_selection
  39. Sohail M. sohail_maier_2019. 7e84c66. GitHub. 2019 https://github.com/uqrmaie1/sohail_maier_2019
  40. Turchin MC, Chiang CW, Palmer CD, Sankararaman S, Reich D, Hirschhorn JN, Genetic Investigation of ANthropometric Traits (GIANT) Consortium. Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nature Genetics. 2012;44:1015–1019. doi: 10.1038/ng.2368.
  41. Vilhjálmsson BJ, Yang J, Finucane HK, Gusev A, Lindström S, Ripke S, Genovese G, Loh PR, Bhatia G, Do R, Hayeck T, Won HH, Kathiresan S, Pato M, Pato C, Tamimi R, Stahl E, Zaitlen N, Pasaniuc B, Belbin G, Kenny EE, Schierup MH, De Jager P, Patsopoulos NA, McCarroll S, Daly M, Purcell S, Chasman D, Neale B, Goddard M, Visscher PM, Kraft P, Patterson N, Price AL, Schizophrenia Working Group of the Psychiatric Genomics Consortium, Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. The American Journal of Human Genetics. 2015;97:576–592. doi: 10.1016/j.ajhg.2015.09.001.
  42. Wood AR, Esko T, Yang J, Vedantam S, Pers TH, Gustafsson S, Chu AY, Estrada K, Luan J, Kutalik Z, Amin N, Buchkovich ML, Croteau-Chonka DC, Day FR, Duan Y, Fall T, Fehrmann R, Ferreira T, Jackson AU, Karjalainen J, Lo KS, Locke AE, Mägi R, Mihailov E, Porcu E, Randall JC, Scherag A, Vinkhuyzen AA, Westra HJ, Winkler TW, Workalemahu T, Zhao JH, Absher D, Albrecht E, Anderson D, Baron J, Beekman M, Demirkan A, Ehret GB, Feenstra B, Feitosa MF, Fischer K, Fraser RM, Goel A, Gong J, Justice AE, Kanoni S, Kleber ME, Kristiansson K, Lim U, Lotay V, Lui JC, Mangino M, Mateo Leach I, Medina-Gomez C, Nalls MA, Nyholt DR, Palmer CD, Pasko D, Pechlivanis S, Prokopenko I, Ried JS, Ripke S, Shungin D, Stancáková A, Strawbridge RJ, Sung YJ, Tanaka T, Teumer A, Trompet S, van der Laan SW, van Setten J, Van Vliet-Ostaptchouk JV, Wang Z, Yengo L, Zhang W, Afzal U, Arnlöv J, Arscott GM, Bandinelli S, Barrett A, Bellis C, Bennett AJ, Berne C, Blüher M, Bolton JL, Böttcher Y, Boyd HA, Bruinenberg M, Buckley BM, Buyske S, Caspersen IH, Chines PS, Clarke R, Claudi-Boehm S, Cooper M, Daw EW, De Jong PA, Deelen J, Delgado G, Denny JC, Dhonukshe-Rutten R, Dimitriou M, Doney AS, Dörr M, Eklund N, Eury E, Folkersen L, Garcia ME, Geller F, Giedraitis V, Go AS, Grallert H, Grammer TB, Gräßler J, Grönberg H, de Groot LC, Groves CJ, Haessler J, Hall P, Haller T, Hallmans G, Hannemann A, Hartman CA, Hassinen M, Hayward C, Heard-Costa NL, Helmer Q, Hemani G, Henders AK, Hillege HL, Hlatky MA, Hoffmann W, Hoffmann P, Holmen O, Houwing-Duistermaat JJ, Illig T, Isaacs A, James AL, Jeff J, Johansen B, Johansson Å, Jolley J, Juliusdottir T, Junttila J, Kho AN, Kinnunen L, Klopp N, Kocher T, Kratzer W, Lichtner P, Lind L, Lindström J, Lobbens S, Lorentzon M, Lu Y, Lyssenko V, Magnusson PK, Mahajan A, Maillard M, McArdle WL, McKenzie CA, McLachlan S, McLaren PJ, Menni C, Merger S, Milani L, Moayyeri A, Monda KL, Morken MA, Müller G, Müller-Nurasyid M, Musk AW, Narisu N, Nauck M, Nolte IM, Nöthen 
MM, Oozageer L, Pilz S, Rayner NW, Renstrom F, Robertson NR, Rose LM, Roussel R, Sanna S, Scharnagl H, Scholtens S, Schumacher FR, Schunkert H, Scott RA, Sehmi J, Seufferlein T, Shi J, Silventoinen K, Smit JH, Smith AV, Smolonska J, Stanton AV, Stirrups K, Stott DJ, Stringham HM, Sundström J, Swertz MA, Syvänen AC, Tayo BO, Thorleifsson G, Tyrer JP, van Dijk S, van Schoor NM, van der Velde N, van Heemst D, van Oort FV, Vermeulen SH, Verweij N, Vonk JM, Waite LL, Waldenberger M, Wennauer R, Wilkens LR, Willenborg C, Wilsgaard T, Wojczynski MK, Wong A, Wright AF, Zhang Q, Arveiler D, Bakker SJ, Beilby J, Bergman RN, Bergmann S, Biffar R, Blangero J, Boomsma DI, Bornstein SR, Bovet P, Brambilla P, Brown MJ, Campbell H, Caulfield MJ, Chakravarti A, Collins R, Collins FS, Crawford DC, Cupples LA, Danesh J, de Faire U, den Ruijter HM, Erbel R, Erdmann J, Eriksson JG, Farrall M, Ferrannini E, Ferrières J, Ford I, Forouhi NG, Forrester T, Gansevoort RT, Gejman PV, Gieger C, Golay A, Gottesman O, Gudnason V, Gyllensten U, Haas DW, Hall AS, Harris TB, Hattersley AT, Heath AC, Hengstenberg C, Hicks AA, Hindorff LA, Hingorani AD, Hofman A, Hovingh GK, Humphries SE, Hunt SC, Hypponen E, Jacobs KB, Jarvelin MR, Jousilahti P, Jula AM, Kaprio J, Kastelein JJ, Kayser M, Kee F, Keinanen-Kiukaanniemi SM, Kiemeney LA, Kooner JS, Kooperberg C, Koskinen S, Kovacs P, Kraja AT, Kumari M, Kuusisto J, Lakka TA, Langenberg C, Le Marchand L, Lehtimäki T, Lupoli S, Madden PA, Männistö S, Manunta P, Marette A, Matise TC, McKnight B, Meitinger T, Moll FL, Montgomery GW, Morris AD, Morris AP, Murray JC, Nelis M, Ohlsson C, Oldehinkel AJ, Ong KK, Ouwehand WH, Pasterkamp G, Peters A, Pramstaller PP, Price JF, Qi L, Raitakari OT, Rankinen T, Rao DC, Rice TK, Ritchie M, Rudan I, Salomaa V, Samani NJ, Saramies J, Sarzynski MA, Schwarz PE, Sebert S, Sever P, Shuldiner AR, Sinisalo J, Steinthorsdottir V, Stolk RP, Tardif JC, Tönjes A, Tremblay A, Tremoli E, Virtamo J, Vohl MC, Amouyel P, Asselbergs FW, 
Assimes TL, Bochud M, Boehm BO, Boerwinkle E, Bottinger EP, Bouchard C, Cauchi S, Chambers JC, Chanock SJ, Cooper RS, de Bakker PI, Dedoussis G, Ferrucci L, Franks PW, Froguel P, Groop LC, Haiman CA, Hamsten A, Hayes MG, Hui J, Hunter DJ, Hveem K, Jukema JW, Kaplan RC, Kivimaki M, Kuh D, Laakso M, Liu Y, Martin NG, März W, Melbye M, Moebus S, Munroe PB, Njølstad I, Oostra BA, Palmer CN, Pedersen NL, Perola M, Pérusse L, Peters U, Powell JE, Power C, Quertermous T, Rauramaa R, Reinmaa E, Ridker PM, Rivadeneira F, Rotter JI, Saaristo TE, Saleheen D, Schlessinger D, Slagboom PE, Snieder H, Spector TD, Strauch K, Stumvoll M, Tuomilehto J, Uusitupa M, van der Harst P, Völzke H, Walker M, Wareham NJ, Watkins H, Wichmann HE, Wilson JF, Zanen P, Deloukas P, Heid IM, Lindgren CM, Mohlke KL, Speliotes EK, Thorsteinsdottir U, Barroso I, Fox CS, North KE, Strachan DP, Beckmann JS, Berndt SI, Boehnke M, Borecki IB, McCarthy MI, Metspalu A, Stefansson K, Uitterlinden AG, van Duijn CM, Franke L, Willer CJ, Price AL, Lettre G, Loos RJ, Weedon MN, Ingelsson E, O'Connell JR, Abecasis GR, Chasman DI, Goddard ME, Visscher PM, Hirschhorn JN, Frayling TM, Electronic Medical Records and Genomics (eMERGE) Consortium, MIGen Consortium, PAGE Consortium, LifeLines Cohort Study. Defining the role of common variation in the genomic and biological architecture of adult human height. Nature Genetics. 2014;46:1173–1186. doi: 10.1038/ng.3097.
  43. Wray NR, Goddard ME, Visscher PM. Prediction of individual genetic risk to disease from genome-wide association studies. Genome Research. 2007;17:1520–1528. doi: 10.1101/gr.6665407. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW, Goddard ME, Visscher PM. Common SNPs explain a large proportion of the heritability for human height. Nature Genetics. 2010;42:565–569. doi: 10.1038/ng.608. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Yengo L, Sidorenko J, Kemper KE, Zheng Z, Wood AR, Weedon MN, Frayling TM, Hirschhorn J, Yang J, Visscher PM. Meta-analysis of genome-wide association studies for height and body mass index in ~700,000 individuals of European ancestry. BioRxiv. 2018 doi: 10.1101/274654.
  46. Zeng J, de Vlaming R, Wu Y, Robinson MR, Lloyd-Jones LR, Yengo L, Yap CX, Xue A, Sidorenko J, McRae AF, Powell JE, Montgomery GW, Metspalu A, Esko T, Gibson G, Wray NR, Visscher PM, Yang J. Signatures of negative selection in the genetic architecture of human complex traits. Nature Genetics. 2018;50:746–753. doi: 10.1038/s41588-018-0101-4. [DOI] [PubMed] [Google Scholar]

Decision letter

Editor: Magnus Nordborg1
Reviewed by: Magnus Nordborg2, Nicholas H Barton3, Joachim Hermisson4

In the interests of transparency, eLife includes the editorial decision letter, peer reviews, and accompanying author responses.

[Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed.]

Thank you for submitting your article "Signals of polygenic adaptation on height have been overestimated due to uncorrected population structure in genome-wide association studies" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Magnus Nordborg as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Mark McCarthy as the Senior Editor. The following individuals involved in review of your submission have also agreed to reveal their identity: Nicholas H Barton (Reviewer #2); Joachim Hermisson (Reviewer #3).

The Reviewing Editor has summarized the major concerns shared by all reviewers, and we have also included the separate reviews below for your consideration.

If you have any questions, please do not hesitate to contact us.

Summary:

This is one of two papers demonstrating that published signals of selection on human height cannot be replicated in the recently released UK Biobank data, apparently because these signals were caused by confounding population structure that is absent in UK Biobank data.

Major concerns:

We were struck by how both papers focus on spurious signals of selection rather than the underlying cause, which is that the GWAS effect-size estimates are confounded. The former is a somewhat esoteric question, but the latter may have enormous implications for much of human genetics, and these papers are likely to be heavily cited because of this. However, the papers seem to go out of their way to avoid discussing this topic. Of course we are not the authors, but, for the record, it looks odd.

Furthermore, the papers seem to suggest that confounding is not present in the UK Biobank data, but isn't it more likely that the magnitude is simply smaller?

Both papers also present evidence that a sib-based study by Robinson et al., 2015, that was meant to eliminate confounding did no such thing. This is disturbing, and while we understand that identifying the reason may be beyond the present papers, the general implications should again probably be discussed.

Finally, this paper often seems stream-of-consciousness: it lacks detailed explanations as well as a coherent outline, making it very difficult to follow unless you are a specialist in the field. We urge the authors to explain better for a general audience.

Separate reviews (please respond to each point):

Reviewer #1:

This is one of at least two papers appearing simultaneously and reaching exactly the same conclusion. It is well written.

The only thing that surprises me about this paper is that it, as well as the other one I have seen, focuses on the relatively obscure issue of whether height has been under selection, tiptoeing around the much bigger issue (the elephant in the room) that the reason the claims for selection do not stand is that the GWAS estimates of effect sizes are biased because of population structure. It is not just the selection signals that do not replicate, but the polygenic scores. I'm not surprised, but, as you know, there are probably at least a hundred papers out there that are based on the infallibility of LD score regression and genomic prediction. I understand the need for caution before attacking this edifice, but I nonetheless think some clarification is unavoidable.

Reviewer #2:

This paper identifies a discrepancy between signs of selection estimated from UK Biobank data, compared with previous studies, and suggests that those earlier signals were caused by subtle stratification in the data. This is a useful contribution to an important question. I have only minor comments (below), but overall, urge the authors to try to rewrite the text to make it more accessible to those not immersed in the field. I find it hard to make specific suggestions, but it comes across as a list of statistical tests, without enough flow to carry the reader along with the argument. Admittedly, given the quite intricate arguments, this is not easy to do.

Minor Comments:

Why should one believe that using the first 15 PCs corrects for stratification? Even if this is somehow traditional in the field, it needs explanation, since the failure of the correction is the key point of the paper.

Figure 1B: – The x axis needs to be labelled. More important, Spearman's correlation seems far too small, given that by eye, the points follow the linear regression rather well. This may be related to the large values seen at the right of each figure; fitting a single regression is clearly inappropriate. There needs to be a test which separates these two sets of points in some way: as it stands the significance test is just not appropriate.

Figure 3: – b in the figure should be β. Also, there is a paragraph break before "The patterns" which makes it hard to work out what is main text and what is caption.

Figure 4: – I do not understand what the "six summary statistics" are here.

Discussion section: – The concluding paragraph seems too weak, especially the sentence "In no way..". Surely the point of the paper is to "question the statistical methodology … in polygenic tests for adaptation", since that methodology seems to give spurious results? It is also not at all clear how much the stratification implied here influences effect size estimates in GWAS.

Paragraph five of subsection “Polygenic scores”: Does the distribution in fact follow a β?

Reviewer #3:

Both manuscripts by Berg et al. and Sohail et al. present thorough and insightful analyses with highly relevant results for current and future GWAS studies. Even prior to publication, the manuscripts have considerable impact. They will be widely read and cited. I do not think that further analyses are needed, with the potential exception of the third point below. All other points concern the discussion, in particular the guidance for further research that will surely emerge from these studies.

How safe are results based on the UK Biobank data?

This refers to the weak signals reported (with much caution) in the present studies, but also to potential future results on other traits. You recommend using data "such as UKB" and we will certainly see many more studies based on this resource. I would therefore appreciate a more specific discussion of risks connected to this particular data set.

1) Stratification even within the UKB-GB data: It is well known that height and socioeconomic status are correlated in modern societies (e.g. BMJ 2016; 352:i582), and social status correlates with descent. In the UK, both factors are also geographically stratified, with people living in the north of the country having lower socioeconomic status and shorter stature, on average, than those in the south. Furthermore, the percentage of Anglo-Saxon admixture varies across the UK. How could these factors influence results based on UKB data, both here and otherwise?

2) Potential influence of GxE interactions: The manuscripts focus (for good reason) on issues connected with stratification. However, if polygenic scores depend on the environment (e.g., due to countergradient variation), GxE interactions are an alternative confounding factor. Importantly, use of a homogeneous detection panel (to avoid stratification), such as UKB-GB, could increase these effects. Maybe this should be briefly discussed in the context of the present results and mentioned as a necessary caveat also for future studies that use detection panels from narrow geographic regions.

What, exactly, causes the problems with the previous data?

3) There seem to be two relevant differences of the GIANT data relative to UKB: 1) UKB is much more homogeneous and 2) GIANT is a meta-study, collecting summary statistics from many sources that are individually corrected for stratification. One would like to know better which factor is decisive. This could be further addressed by combining summaries from sub-samples of the "UKB-all" data in an artificial meta-study.

4) The Robinson et al., 2015 GWAS: Sib-based studies are done to avoid / minimize stratification effects and the Robinson 2015 data have been used as a proof of robustness in several previous studies. The fact that you find clear signs of stratification is sobering and one would like to know what has gone wrong. You may not currently have any explanation and this is fair enough. However, the discussion should be clearer and say upfront that results based on these data cannot be trusted until we understand the issues.

Minor Comments:

a) You use 11 different summary statistics, with partly inconsistent naming strategy. I had to look up names in the methods part a number of times. I think this can be improved. Maybe even use the same names as Berg et al. where the summaries are identical.

b) The switch from 1000 genomes to POPRES complicates comparison between figures. If there are advantages of POPRES, why not use it throughout? This holds, in particular, for the test of the latitudinal slope, which would be more convincing with many populations rather than just 4 from the 1000 genomes data.

c) Figure 4: "The overdispersion signal disappeared entirely when the UK Biobank family based effect sizes were used": Is this due to the smaller sample size of the sib data or due to residual stratification issues in UKB? This could be tested using a sub-sample from UKB of the same size as the sib data.

d) Figure 3 legend: "suggesting that tSDS shift at the gw-significant SNPs is not driven by population stratification": only true for stratification due to this particular axis.

Additional data files and statistical comments:

All necessary information is provided and the UKB sib data is on Dryad. I think the other newly generated GWAS data should go there, too.

eLife. 2019 Mar 21;8:e39702. doi: 10.7554/eLife.39702.042

Author response


Major concerns:

We were struck by how both papers focus on spurious signals of selection rather than the underlying cause, which is that the GWAS effect-size estimates are confounded. The former is a somewhat esoteric question, but the latter may have enormous implications for much of human genetics, and these papers are likely to be heavily cited because of this. However, the papers seem to go out of their way to avoid discussing this topic. Of course we are not the authors, but, for the record, it looks odd.

We agree that our analysis raises broader issues beyond the detection of polygenic adaptation. However, we do not think that our results call the whole GWAS enterprise into question. Almost all genome-wide significant signals identified by the GIANT consortium replicate in the UK Biobank, and overall the effect size estimates from the two studies are highly correlated. Nevertheless, certain aspects of current human genetics research outside evolutionary biology are clearly affected. The prime example is the transferability of polygenic risk scores between populations. We have added a detailed discussion of this to the revised manuscript.

Furthermore, the papers seem to suggest that confounding is not present in the UK Biobank data, but isn't it more likely that the magnitude is simply smaller?

The revised version of the manuscript discusses the confounding in the UK Biobank data. We demonstrate that uncorrected summary statistics of the UK Biobank GWAS show signals of stratification even if the analysis is restricted to White British individuals. Interestingly, in the UK the north-south genetic cline tracks the height gradient in the opposite direction to that in continental Europe. The magnitude of the confounding is much smaller than the confounding that can arise from residual stratification in a trans-European sample. When principal components are included in the UK Biobank GWAS, we do not find any evidence of residual stratification when testing for a correlation between effect size estimates and twenty 1000 Genomes principal components (Figure 2). However, this does not preclude the possibility of residual stratification along axes that are not captured by these principal components.
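
The logic of this check can be illustrated with a minimal simulation sketch. It is not the authors' pipeline: the two-population design, the sample sizes, and the helper names `marginal_betas` and `residualize` are assumptions made for illustration, with a population label standing in for a principal component and the allele frequency difference standing in for PC loadings. The simulated phenotype has no genetic component, so any correlation between estimated effects and the frequency difference reflects stratification alone.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical design: two equally sized populations, 1,000 SNPs.
n, m = 4000, 1000
pop = np.repeat([0.0, 1.0], n // 2)          # population label (stand-in for a PC)
p0 = rng.uniform(0.2, 0.8, m)                # allele frequencies in population 0
p1 = np.clip(p0 + rng.normal(0, 0.05, m), 0.05, 0.95)
freq_diff = p1 - p0                          # stand-in for PC loadings

# Genotypes (0/1/2) drawn with population-specific frequencies.
freqs = np.where(pop[:, None] == 0, p0, p1)
G = rng.binomial(2, freqs).astype(float)

# Phenotype with NO genetic effect, only an environmental population shift.
y = 1.0 * pop + rng.normal(0, 1, n)

def marginal_betas(G, y):
    """Per-SNP simple-regression slopes of y on each genotype column."""
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    return Gc.T @ yc / np.sum(Gc**2, axis=0)

def residualize(a, covar):
    """Remove an intercept and one covariate from a vector or matrix."""
    X = np.column_stack([np.ones_like(covar), covar])
    coef, *_ = np.linalg.lstsq(X, a, rcond=None)
    return a - X @ coef

# GWAS without and with the covariate (Frisch-Waugh residualization).
beta_raw = marginal_betas(G, y)
beta_adj = marginal_betas(residualize(G, pop), residualize(y, pop))

r_raw = np.corrcoef(beta_raw, freq_diff)[0, 1]
r_adj = np.corrcoef(beta_adj, freq_diff)[0, 1]
print(f"uncorrected r = {r_raw:.2f}, corrected r = {r_adj:.2f}")
```

In this toy setting the covariate is the true axis of stratification, so the correction works by construction; the caveat in the response is that real data may be stratified along axes that the chosen principal components do not capture.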

Both papers also present evidence that a sib-based study by Robinson et al., 2015, that was meant to eliminate confounding did no such thing. This is disturbing, and while we understand that identifying the reason may be beyond the present papers, the general implications should again probably be discussed.

In the revised manuscript, we clarify that we agree with the conceptual approach of Robinson et al. but that the discrepancy is likely to be due to a technical error in the computations of Robinson et al. We have in fact now confirmed this through correspondence with the authors of Robinson et al., and they are currently preparing a manuscript revisiting these analyses and correcting the technical issues. We emphasize that family-based effect size estimates computed in the UK Biobank following the Robinson et al. methodology behave as expected.

Finally, this paper often seems stream-of-consciousness: it lacks detailed explanations as well as a coherent outline, making it very difficult to follow unless you are a specialist in the field. We urge the authors to explain better for a general audience.

We have edited the text to make it more accessible.

Separate reviews (please respond to each point):

Reviewer #1:

This is one of at least two papers appearing simultaneously and reaching exactly the same conclusion. It is well written.

The only thing that surprises me about this paper is that it, as well as the other one I have seen, focuses on the relatively obscure issue of whether height has been under selection, tiptoeing around the much bigger issue (the elephant in the room) that the reason the claims for selection do not stand is that the GWAS estimates of effect sizes are biased because of population structure. It is not just the selection signals that do not replicate, but the polygenic scores. I'm not surprised, but, as you know, there are probably at least a hundred papers out there that are based on the infallibility of LD score regression and genomic prediction. I understand the need for caution before attacking this edifice, but I nonetheless think some clarification is unavoidable.

As noted above, we added a discussion of the potential impact of our findings on the debate about the transferability of polygenic scores between populations. In this work we have focused on the effects of residual population stratification on tests of selection because they appear to be particularly sensitive. While we cannot exclude the possibility that some other methods are also sensitive to residual stratification, neither our analyses nor previous publications provide evidence that this is a widespread problem for other applications of GWAS data, although our results certainly do highlight the importance of revisiting population stratification in all analyses of polygenic predictors. As for polygenic scores, it has been demonstrated previously that results can be unreliable when predicting across populations (Martin et al., 2017). The fact that polygenic scores from the UK Biobank tend to have a higher out-of-sample prediction accuracy than polygenic scores from GIANT is just one of many pieces of evidence showing that polygenic scores do not just pick up residual stratification, but rather a signal from the trait of interest.

Generally, any method that uses genetic data from multiple populations is susceptible to bias from residual stratification. However, our SDS results show that even methods that do not use data from distinct populations can be affected by residual stratification. In the case of SDS, the problem is that both the singleton density scores and the GIANT height summary statistics are stratified along the European north-south cline. We cannot exclude the possibility that similar biases exist in other methods, but we have not found any other examples. LD score regression is unlikely to be affected, since residual (environmental) stratification should affect high and low LD score SNPs to a similar extent, and thus should not have a large impact on the parameter estimates. To confirm the robustness of LD score regression, we compared bivariate LD score regression estimates from GIANT to estimates from the UK Biobank. In each case, we used LD Hub to obtain genetic correlation estimates between height and 832 other traits. We found very high concordance between the estimates in GIANT and UK Biobank, which further supports the view that bivariate LD score regression results are in fact robust to population stratification.

Author response image 1.


While it is outside the scope of this work to provide a complete characterization of the extent to which residual population stratification affects the plethora of methods that make use of GWAS summary data, we hope that it provides a useful case study and stimulates more research in this area, as well as more careful study design in general.
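
The argument that stratification unrelated to LD should affect high and low LD score SNPs to a similar extent can be illustrated with a small simulation. This is a sketch under stated assumptions, not the LDSC software: it uses plain unweighted least squares rather than the weighted regression of the real method, and the sample size, heritability, LD score distribution, and the function name `simulate_and_fit` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# All values below are hypothetical and chosen only for illustration.
n_snps = 200_000       # number of SNPs (M)
n_samples = 20_000     # GWAS sample size (N)
h2 = 0.5               # SNP heritability
ld_scores = rng.gamma(shape=2.0, scale=50.0, size=n_snps)

def simulate_and_fit(confounding):
    """Simulate chi-square statistics and regress them on LD scores.

    Under the LD score regression model the expected chi-square of SNP j is
    1 + confounding + N * h2 * l_j / M, so confounding that is unrelated to
    LD raises the intercept but leaves the slope unchanged.
    """
    expected = 1.0 + confounding + n_samples * h2 * ld_scores / n_snps
    chi2 = expected * rng.chisquare(df=1, size=n_snps)
    design = np.column_stack([np.ones(n_snps), ld_scores])
    (intercept, slope), *_ = np.linalg.lstsq(design, chi2, rcond=None)
    return intercept, slope

int_clean, slope_clean = simulate_and_fit(confounding=0.0)
int_strat, slope_strat = simulate_and_fit(confounding=1.0)
print(f"no confounding:   intercept {int_clean:.2f}, slope {slope_clean:.4f}")
print(f"with confounding: intercept {int_strat:.2f}, slope {slope_strat:.4f}")
```

In this idealized setting the added confounding shows up in the intercept, while the slope, which carries the heritability information, is essentially unchanged.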

Reviewer #2:

This paper identifies a discrepancy between signs of selection estimated from UK Biobank data, compared with previous studies, and suggest that those earlier signals were caused by subtle stratification in the data. This is a useful contribution to an important question. I have only minor comments (below), but overall, urge the authors to try to rewrite the text to make it more accessible to those not immersed in the field. I find it hard to make specific suggestions, but it comes across as a list of statistical tests, without enough flow to carry the reader along with the argument. Admittedly, given the quite intricate arguments, this is not easy to do.

We have substantially edited the text and hope that it has become more accessible.

Minor Comments:

Why should one believe that using the first 15 PCs corrects for stratification? Even if this is somewhat traditional in the field, it needs explanation, since the failure of the correction is the key point of the paper.

One can certainly envision a scenario where very few samples in the dataset have a substantially different ancestry, or where there is an axis of very mild stratification affecting many samples. Neither of these would be captured by the top principal components. However, investigation of these effects is not a focus of this manuscript, which draws attention to a problem rather than suggesting new analytical standards.

Figure 1B: The x axis needs to be labelled. More important, Spearman's correlation seems far too small, given that by eye, the points follow the linear regression rather well. This may be related to the large values seen at the right of each figure; fitting a single regression is clearly inappropriate. There needs to be a test which separates these two sets of points in some way: as it stands the significance test is just not appropriate.

We have labeled the x-axis for Figure 1B. We agree with the reviewer and, in our work, present an analysis that studies the tSDS distribution for genome-wide significant SNPs only (Figure 3). We use the Spearman correlation coefficient to recapitulate the original analysis. The points and the linear slope in the figure are for visualization only. The Spearman correlation coefficient is computed on the raw data and is unrelated to the dots, which correspond to binned data.
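
The distinction between a correlation computed on raw data and the appearance of binned points can be shown with a short sketch. The data here are simulated and purely hypothetical (they are not tSDS values), and the number of SNPs, the effect size, and the bin count are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical data: 50,000 SNPs with a weak monotone trend plus large noise.
n_snps = 50_000
x = rng.uniform(0.0, 1.0, n_snps)            # stand-in for a GWAS ranking
y = 0.2 * x + rng.normal(0.0, 1.0, n_snps)   # stand-in for a per-SNP score

# Correlation on the raw per-SNP values (what a test on raw data would use).
rho_raw, p_raw = spearmanr(x, y)

# Sort by x, split into 50 equal-sized bins, and average within each bin
# (what a plot of binned data would display).
order = np.argsort(x)
x_binned = x[order].reshape(50, -1).mean(axis=1)
y_binned = y[order].reshape(50, -1).mean(axis=1)
rho_binned, _ = spearmanr(x_binned, y_binned)

print(f"raw rho = {rho_raw:.3f}, binned rho = {rho_binned:.3f}")
```

Averaging within bins removes most of the per-SNP noise, so binned points can follow a line closely even when the raw correlation coefficient is small.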

Figure 3: b in the figure should be β. Also, there is a paragraph break before "The patterns" which makes it hard to work out what is main text and what is caption.

We have updated the figure and removed the paragraph break.

Figure 4: I do not understand what the "six summary statistics" are here.

We have corrected the caption to say “four” summary statistics.

Discussion section: The concluding paragraph seems too weak, especially the sentence "In no way..". Surely the point of the paper is to "question the statistical methodology … in polygenic tests for adaptation", since that methodology seems to give spurious results? It is also not at all clear how much the stratification implied here influences effect size estimates in GWAS.

We have edited this sentence as well as the concluding paragraphs in general.

Paragraph five of subsection “Polygenic scores”: Does the distribution in fact follow a β?

Again, we have re-implemented the existing approach used by previous work (Berg et al., 2017). Although not shown in the paper, we have investigated the issue and found that different ways to compute confidence intervals yield very similar results (including methods not making specific assumptions about the shape of allele frequency distribution).

Reviewer #3:

Both manuscripts by Berg et al. and Sohail et al. present thorough and insightful analyses with highly relevant results for current and future GWAS studies. Even prior to publication, the manuscripts have considerable impact. They will be widely read and cited. I do not think that further analyses are needed, with the potential exception of the third point below. All other points concern the discussion, in particular the guidance for further research that will surely emerge from these studies.

How safe are results based on the UK Biobank data?

This refers to the weak signals reported (with much caution) in the present studies, but also to potential future results on other traits. You recommend using data "such as UKB" and we will certainly see many more studies based on this resource. I would therefore appreciate a more specific discussion of risks connected to this particular data set.

1) Stratification even within the UKB-GB data: It is well known that height and socioeconomic status are correlated in modern societies (e.g. BMJ 2016; 352:i582), and social status correlates with descent. In the UK, both factors are also geographically stratified, with people living in the north of the country having lower socioeconomic status and shorter stature, on average, than those in the south. Furthermore, the percentage of Anglo-Saxon admixture varies across the UK. How could these factors influence results based on UKB data, both here and otherwise?

This is an important point, which we took into account. We repeat here the same response as above (the Major concerns section). The revised version of the manuscript discusses the confounding in the UK Biobank data. We demonstrate that uncorrected summary statistics of the UK Biobank GWAS show signals of stratification even if the analysis is restricted to White British individuals. Interestingly, in the UK the north-south genetic cline tracks the height gradient in the opposite direction to that in continental Europe. The magnitude of the confounding is much smaller than the confounding that can arise from residual stratification in a trans-European sample. It seems likely that both genetic and environmental stratification are present along similar dimensions in the UK Biobank population, which could explain this finding. Principal components can correct for stratification effects, regardless of whether they are of genetic or environmental origin. We have therefore made no attempt to disentangle the causes of stratification in the UK Biobank, but other researchers have (Haworth et al.).

2) Potential influence of GxE interactions: The manuscripts focus (for good reason) on issues connected with stratification. However, if polygenic scores depend on the environment (e.g., due to countergradient variation), GxE interactions are an alternative confounding factor. Importantly, use of a homogeneous detection panel (to avoid stratification), such as UKB-GB, could increase these effects. Maybe this should be briefly discussed in the context of the present results and mentioned as a necessary caveat also for future studies that use detection panels from narrow geographic regions.

We thank the reviewer for bringing this up and agree that GxE is another potential confounder affecting the transferability of polygenic scores. At the same time, the presence of GxE interactions without selection is not expected to generate a non-zero covariance between effect size estimates and allele frequency differences (the basis of Qx) or between effect size estimates and allelic ages (the basis of tSDS), although such interactions are an important caveat for how adaptation signals should be interpreted. We think that a countergradient due to stabilizing selection in a changing environment (with or without GxE) can indeed lead to a signal of adaptation. Whether this should be considered a false signal or a true adaptation to the same phenotypic value may be a matter of terminology.

We briefly mention the potential effect of GxE on the transferability of polygenic scores in the Discussion. We decided not to include the technical points above in the manuscript because we are trying to make it more accessible.

What, exactly, causes the problems with the previous data?

3) There seem to be two relevant differences of the GIANT data relative to UKB: 1) UKB is much more homogeneous and 2) GIANT is a meta-study, collecting summary statistics from many sources that are individually corrected for stratification. One would like to know better which factor is decisive. This could be further addressed by combining summaries from sub-samples of the "UKB-all" data in an artificial meta-study.

We believe that we have a good understanding of the factors that can lead to residual population stratification in the UK Biobank. The first is not including PCs as covariates in the estimation of effect sizes. The second is extending the studied samples from the very homogeneous set of white British individuals to a wider range of samples with more diverse ancestry.

Conducting a meta-analysis per se should not be a factor that compromises stratification correction, as long as the principal components were computed on the whole sample. However, as the reviewer has pointed out, correcting for stratification within each cohort individually can lead to ineffective stratification correction, if cohort sizes are too small to allow PCA to capture the underlying population structure.

The distributed nature of a meta-analysis, and the difficulty of balancing transparency with data privacy concerns, make it easier for inconsistencies or mistakes to remain undetected and to have an impact on stratification correction. As we do not currently have access to cohort level data in GIANT, we are unable to comment on the causes of residual population stratification in the GIANT meta-analysis.

It would be possible to conduct an artificial meta-analysis in the UK Biobank and to induce residual stratification effects by computing PCs in very small cohorts or by omitting some PCs as covariates in some cohorts. However, as there are several potential ways for stratification effects to enter into a meta-analysis, we would be unable to draw strong conclusions about the origin of the observed differences between GIANT and the UK Biobank.
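
To make concrete why a bias that survives cohort-level correction matters in a meta-analysis, the following minimal sketch pools per-cohort estimates with fixed-effects inverse-variance weighting. This weighting scheme is an assumption chosen for illustration (the text above does not specify how cohort summaries are combined), and the function name `ivw_meta` and all numbers are hypothetical.

```python
import numpy as np

def ivw_meta(betas, ses):
    """Fixed-effects inverse-variance-weighted pooling of cohort estimates."""
    betas = np.asarray(betas, dtype=float)
    ses = np.asarray(ses, dtype=float)
    weights = 1.0 / ses**2
    beta = np.sum(weights * betas) / np.sum(weights)
    se = np.sqrt(1.0 / np.sum(weights))
    return beta, se

# Hypothetical SNP with a true effect of zero. Every cohort estimate carries
# the same small stratification bias (+0.01) and the same standard error.
bias, cohort_se = 0.01, 0.02

beta_3, se_3 = ivw_meta([bias] * 3, [cohort_se] * 3)
beta_30, se_30 = ivw_meta([bias] * 30, [cohort_se] * 30)

z_3 = beta_3 / se_3
z_30 = beta_30 / se_30
print(f"3 cohorts:  beta = {beta_3:.3f}, se = {se_3:.4f}, z = {z_3:.2f}")
print(f"30 cohorts: beta = {beta_30:.3f}, se = {se_30:.4f}, z = {z_30:.2f}")
```

Pooling shrinks the standard error but leaves a shared bias untouched, so in this toy example a bias that is unremarkable in any single cohort becomes nominally significant once enough cohorts are combined.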

4) The Robinson et al., 2015 GWAS: Sib-based studies are done to avoid / minimize stratification effects and the Robinson 2015 data have been used as a proof of robustness in several previous studies. The fact that you find clear signs of stratification is sobering and one would like to know what has gone wrong. You may not currently have any explanation and this is fair enough. However, the discussion should be clearer and say upfront that results based on these data cannot be trusted until we understand the issues.

Again, we repeat our response from above (the Major concerns section). In the revised manuscript, we clarify that we agree with the conceptual approach of Robinson et al. but that the discrepancy is likely to be due to a technical error. We have in fact now confirmed this through correspondence with the authors of Robinson et al., and they are currently preparing a manuscript revisiting these analyses and correcting the technical issues. We emphasize that family-based effect size estimates computed in the UK Biobank following the Robinson et al. methodology behave as expected.

Minor Comments:

a) You use 11 different summary statistics, with partly inconsistent naming strategy. I had to look up names in the methods part a number of times. I think this can be improved. Maybe even use the same names as Berg et al. where the summaries are identical.

We have made sure the names of the 11 summary statistics are consistent throughout the paper and figures. The main "UKB" dataset is referred to as "UKB Neale" in the supplemental figures to distinguish it from "UKB Neale new" and from the UKB LMM summary statistics "UKB Loh".

b) The switch from 1000 genomes to POPRES complicates comparison between figures. If there are advantages of POPRES, why not use it throughout? This holds, in particular, for the test of the latitudinal slope, which would be more convincing with many populations rather than just 4 from the 1000 genomes data.

We use 1000 Genomes as it is publicly available and several of the previous studies we analyze in this paper use 1000 Genomes populations for claims of polygenic adaptation (Mathieson et al., Berg et al.). The switch to POPRES for the test of latitudinal slope and overall overdispersion is to ensure that we do not see a nonsignificant P-value simply due to a lack of power in 1000 Genomes. We realize this makes comparing Figures 1 and 4 difficult, but we believe our paper is stronger for analyzing both datasets, because of their different strengths and because both have been used by previous studies of polygenic adaptation.

c) Figure 4: "The overdispersion signal disappeared entirely when the UK Biobank family based effect sizes were used": Is this due to the smaller sample size of the sib data or due to residual stratification issues in UKB? This could be tested using a sub-sample from UKB of the same size as the sib data.

Even without running another GWAS on a subset of the UK Biobank, we have some reasons to believe that the lack of overdispersion signal in the family-based estimates reflects a lack of power rather than residual stratification in the UK Biobank. Row 7 in Figure 4—figure supplement 1 shows that in the UK Biobank GWAS of white British samples without PC correction, no latitude signal is detectable unless only genome-wide significant SNPs are used, which are least affected by stratification effects. We therefore think that the latitude signal at the genome-wide significant SNPs is likely real, rather than driven by stratification in the UK Biobank, as the stratification effects in UKB WB no PCs tend to go in the opposite direction. This leads us to believe that, given enough power, a real overdispersion signal should probably be detected. However, we remain agnostic, because even for genome-wide significant SNPs (Figure 1—figure supplement 2B), polygenic scores for both modern and ancient individuals change when the main UKB summary statistics (WB ancestry, controlling for 10 PCs) are used instead of GIANT. This shift in, for example, the hunter-gatherer (HG) polygenic score is troubling, as we know that different European populations have variable amounts of ancient HG vs. EF vs. SP ancestry, and it could reflect residual stratification in the UKB GWAS not captured by our PCs.

d) Figure 3 legend: "suggesting that tSDS shift at the gw-significant SNPs is not driven by population stratification": only true for stratification due to this particular axis.

We agree with the reviewer and have changed this sentence to “There is no significant difference in frequency in these two populations, suggesting that tSDS shift at the gw-significant SNPs is not driven by population stratification at least due to this particular axis.”

Additional data files and statistical comments:

All necessary information is provided and the UKB sib data is on Dryad. I think the other newly generated GWAS data should go there, too.

We have placed all newly generated GWAS data on Dryad.

Associated Data


    Data Citations

    1. Sohail M, Maier RM, Ganna A. 2018. Data from: Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. Dryad Digital Repository.
    2. Field Y, Boyle E, Telis N, Gao Z, Gaulton K, Golan D, Yengo L, Rocheleau G, Froguel P, McCarthy M, Pritchard J. 2016b. Data from: detection of human adaptation during the past 2000 years. Dryad Digital Repository.

    Supplementary Materials

    Figure 1—source data 1. Polygenic height scores and tSDS scores based on GIANT and UK Biobank GWAS.
    elife-39702-fig1-data1.xlsx (220.2KB, xlsx)
    DOI: 10.7554/eLife.39702.009
    Figure 2—source data 1. Evidence of stratification in height summary statistics.
    elife-39702-fig2-data1.xlsx (196.2KB, xlsx)
    DOI: 10.7554/eLife.39702.015
    Figure 3—source data 1. Height tSDS results for different summary statistics.
    elife-39702-fig3-data1.xlsx (404.6KB, xlsx)
    DOI: 10.7554/eLife.39702.022
    Figure 4—source data 1. Polygenic height scores in POPRES populations show a residual albeit attenuated signal of polygenic adaptation for height.

    This reference was updated from its bioRxiv version to its now published version.

    DOI: 10.7554/eLife.39702.028
    Supplementary file 1. Description of 11 GWAS summary statistics.
    elife-39702-supp1.xlsx (42.9KB, xlsx)
    DOI: 10.7554/eLife.39702.029
    Supplementary file 2. Table of ancient and 1000 genomes modern populations used with sample sizes.
    elife-39702-supp2.xlsx (36.8KB, xlsx)
    DOI: 10.7554/eLife.39702.030
    Supplementary file 3. Supplementary note on characterization of stratification effects in GIANT and UK Biobank.
    elife-39702-supp3.docx (120.4KB, docx)
    DOI: 10.7554/eLife.39702.031
    Supplementary file 4. Table of POPRES populations used with sample sizes and latitude.
    elife-39702-supp4.xlsx (40.6KB, xlsx)
    DOI: 10.7554/eLife.39702.032
    Supplementary file 5. LD Score regression estimates for 11 different summary statistics.

    LD score regression can be used to detect residual stratification effects in summary statistics, and so we tested whether LDSC confirms our hypothesis of residual stratification. We detect a greatly inflated intercept estimate of 9.42 in UKB all no PCs, but only a moderately increased intercept value in GIANT and an intercept less than one in NG2015 sibs. The relatively small GIANT intercept can be explained by cohort-wise lambda-GC correction, while the low intercept in NG2015 sibs is possibly caused by the adaptive permutation procedure which does not compute precise p-values for non-significant associations. In both cases LDSC cannot be expected to pick up stratification effects, since the generation of summary statistics is not in line with the LDSC model.

    elife-39702-supp5.xlsx (51KB, xlsx)
    DOI: 10.7554/eLife.39702.033
    Supplementary file 6. Correlation of beta estimates at all 86,153 shared SNPs.
    elife-39702-supp6.xlsx (45.5KB, xlsx)
    DOI: 10.7554/eLife.39702.034
    Supplementary file 7. Correlation of beta estimates at 2251 shared SNPs which are significant in the UK Biobank.
    elife-39702-supp7.xlsx (47.5KB, xlsx)
    DOI: 10.7554/eLife.39702.035
    Transparent reporting form
    DOI: 10.7554/eLife.39702.036

    Data Availability Statement

    All newly generated UK Biobank height GWAS summary statistics have been made available at http://dx.doi.org/10.5061/dryad.8g5g6j4. Results from the GIANT Consortium (GWAS Anthropometric 2014 Height) were downloaded from https://portals.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files#GWAS_Anthropometric_2014_Height. GWAS results from the UK Biobank ("UKB" or "UKB Neale") were downloaded from http://www.nealelab.is/uk-biobank. The previously published family-based effect size estimates ("NG2015 sibs") can be accessed here http://cnsgenomics.com/data/robinson_et_al_2015_ng/withinfam_summary_ht_bmi_release_March2016.tar.gz. The independent mixed model association analysis that included all UK Biobank individuals of European ancestry ("UKB Loh") was downloaded from https://data.broadinstitute.org/alkesgroup/UKBB/body_HEIGHTz.sumstats.gz. Approximately independent linkage disequilibrium blocks in human populations were downloaded for the EUR population from https://bitbucket.org/nygcresearch/ldetect-data/overview. Source code repositories for the polygenic score analysis in this manuscript and computing scripts and source data for all the main figures have been made available at https://github.com/msohail88/polygenic_selection and https://github.com/uqrmaie1/sohail_maier_2019 (copies archived at https://github.com/elifesciences-publications/polygenic_selection and https://github.com/elifesciences-publications/sohail_maier_2019, respectively).

    The following dataset was generated:

    Sohail M, Maier RM, Ganna A. 2018. Data from: Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. Dryad Digital Repository.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
