American Journal of Human Genetics. 2020 Jun 12;107(1):60–71. doi: 10.1016/j.ajhg.2020.05.014

Evidence of Polygenic Adaptation in Sardinia at Height-Associated Loci Ascertained from the Biobank Japan

Minhui Chen 1, Carlo Sidore 2, Masato Akiyama 3,4, Kazuyoshi Ishigaki 3, Yoichiro Kamatani 3,5, David Schlessinger 6, Francesco Cucca 2, Yukinori Okada 3,7, Charleston WK Chiang 1,8,∗∗
PMCID: PMC7332648  PMID: 32533944

Abstract

Adult height is one of the earliest putative examples of polygenic adaptation in humans. However, this conclusion was recently challenged because residual uncorrected stratification from large-scale consortium studies was considered responsible for the previously noted genetic difference. It thus remains an open question whether height loci exhibit signals of polygenic adaptation in any human population. We re-examined this question, focusing on one of the shortest European populations, the Sardinians, in addition to mainland European populations. We utilized height-associated loci from the Biobank Japan (BBJ) dataset to further alleviate concerns of biased ascertainment of GWAS loci and showed that the Sardinians remain significantly shorter than expected under neutrality (∼0.22 standard deviation shorter than Utah residents with ancestry from northern and western Europe [CEU] on the basis of polygenic height scores, p = 3.89 × 10⁻⁴). We also found the trajectory of polygenic height scores between the Sardinian and the British populations diverged over at least the last 10,000 years (p = 0.0082), consistent with a signature of polygenic adaptation driven primarily by the Sardinian population. Although the polygenic score-based analysis showed a much subtler signature in mainland European populations, we found a clear and robust adaptive signature in the UK population by using a haplotype-based statistic, the trait singleton density score (tSDS), driven by the height-increasing alleles (p = 9.1 × 10⁻⁴). In summary, by ascertaining height loci in a distant East Asian population, we further supported the evidence of polygenic adaptation at height-associated loci among the Sardinians. In mainland Europeans, the adaptive signature was detected in haplotype-based analysis but not in polygenic score-based analysis.

Keywords: height, polygenic adaptation, population stratification

Introduction

Because of the highly polygenic nature of many human complex traits, polygenic adaptation was thought to be an important mechanism of phenotypic evolution in humans. Because each genetic locus contributes only a small effect to complex traits, polygenic adaptation is expected to differ from the classic selective sweep, where a beneficial allele is driven to near-fixation in a population because of strong positive selection.1 In polygenic adaptation, only a subtle but coordinated allele frequency shift across loci underlying the selected trait is expected. In human beings, height is one of the earliest putative examples of polygenic adaptation. Northern Europeans are known to be taller than southern Europeans on average.2,3 By evaluating the changes in allele frequencies at height-associated loci, either weighted or unweighted by the effect sizes on height, multiple studies have suggested polygenic adaptation as the reason for differences in human height in European and other populations.4, 5, 6, 7, 8, 9, 10 It is important to note that the signals of adaptation were inferred at height-associated loci; height itself might not be the target of selection, because the signal could instead reflect selection on a trait that shares genetic architecture with height. Nevertheless, these inferred signals of adaptation suggest that natural selection contributed to the differentiation of height between human populations.

However, the adaptive signature at height-associated loci was recently called into question by two papers.11,12 The authors of both papers found that the adaptive signature disappeared if genome-wide association study (GWAS) summary statistics based on the UK Biobank (UKB) individuals were used in the analysis. This suggested that previous studies of adaptation might have been confounded by the ascertainment of a set of height-associated loci with biased estimates of effect sizes, the aggregation of which across a large number of height-associated loci led to the apparent difference in genetic height scores between northern and southern Europeans. It was suggested that the biased effect sizes were due to residual uncorrected stratification from large-scale consortium studies of human height, such as that by the GIANT (Genetic Investigation of ANthropometric Traits) consortium,13 where the control for population stratification implemented at the level of smaller individual studies was insufficient. In contrast, large-scale biobank-level studies where individual data were available enabled much more effective control for stratification either through principal component analysis (PCA) or linear mixed models.11,12

Although these studies and others14 investigated the degree to which overestimated effect sizes in GWASs led to unrealistic polygenic height scores and differences between populations, it remains an open question whether height-associated loci exhibit signals of polygenic adaptation in any human population. For one, the original report of polygenic adaptation on height in Europe relied solely on frequency changes between populations and the direction of association among alleles most associated with height.4 Because it does not take effect sizes into account, this approach should be more robust to uncorrected stratification in GWASs. Moreover, loci most strongly associated with height still appear to exhibit a strong signal in a haplotype-based trait singleton density score (tSDS) analysis.11,12 Furthermore, the estimated temporal trajectory of polygenic height scores also showed a small but significant uptick in recent history.15 Finally, pygmies from the Indonesian island of Flores exhibited lower genetic height than expected on the basis of height loci ascertained in the distantly related UKB population.10

In the present study, we re-examined whether height-associated loci exhibit signs of adaptation in Europe. In light of reported stratification16 and polygenic selection for height17 in the UKB population, and our finding here that height-associated loci ascertained from the UKB dataset are still significantly associated with structure in Europe, we chose to conduct our analysis by using height-associated SNPs ascertained from summary statistics based on the Biobank Japan (BBJ) individuals. Because BBJ samples a population distant from Europe, differences in allele frequencies and patterns of linkage disequilibrium (LD) could decrease the accuracy of polygenic score predictions of a trait18 and thus lower the power of polygenic scores to detect polygenic selection. However, we reasoned and demonstrated that height-associated loci ascertained in BBJ were still significantly predictive of height in European populations and that, in exchange for the decreased accuracy in prediction, these loci were much less associated with the structure in Europe. As such, in the absence of a very large-scale family-based analysis to ascertain height-associated loci, our approach of ascertaining height-associated SNPs from BBJ is, among the available options, the least likely to be impacted by any cryptic covariances due to population stratification. Using this approach, we found that the Sardinians, one of the shortest populations in Europe, have significantly lower polygenic height scores than expected given their genetic relatedness to other European populations, consistent with previous reports.6 In mainland Europe, however, the adaptive signature based on allele frequencies was much subtler, although we observed a strong haplotype-based signature by using the tSDS. Together, the findings of our study provide additional evidence of polygenic adaptation at height-associated loci in some European populations.

Material and Methods

GWAS Panels

To calculate polygenic height scores, we obtained GWAS summary statistics from three studies.

The first study is GIANT,13 a meta-analysis of 79 separate GWASs for height using a total of ∼253,000 individuals of European ancestry with ∼2.5 M variants. Each study imputed its genetic data to HapMap Phase II CEU (Utah residents with ancestry from northern and western Europe) genotypes and then tested for association with sex-standardized height, assuming an additive inheritance model and adjusting for age and other study-specific covariates (including principal components [PCs]). Studies with related samples used variance-component or other linear mixed-effects modeling to account for relatedness in the regression, and studies with unrelated individuals tested for association under a linear regression framework. Meta-analysis was performed via an inverse-variance fixed-effect method. On the basis of all variants with a minor allele frequency (MAF) > 1% in the summary statistics, the genomic control parameter, λGC, is 2.00.

The second study is UKB, a GWAS based on ∼361,000 individuals of white British ancestry in the UK Biobank. These individuals were genotyped with either the Affymetrix UK BiLEVE Axiom Array or the Affymetrix UK Biobank Axiom Array and imputed to whole-genome sequencing data from the Haplotype Reference Consortium (HRC), UK10K (coverage = 7×), and 1000 Genomes for ∼13.8 M variants. Association testing was done on standardized height correcting for age, age², sex, age × sex, and age² × sex interactions. Population structure was adjusted by including 20 PCs. On the basis of all variants with an MAF > 1% in the summary statistics, λGC = 2.25.

The third study is BBJ,19 a GWAS based on ∼159,000 individuals of Japanese ancestry from Biobank Japan.20,21 These individuals were genotyped on either the Illumina HumanOmniExpressExome BeadChip or a combination of the Illumina HumanOmniExpress and HumanExome BeadChips and imputed to combined whole-genome sequencing data from BBJ1K (coverage = 30×)22 and 1000 Genomes for ∼27.9 M variants. Individuals not of Japanese origin were excluded by self-report or PCA. Using standardized residuals of height after adjusting for age, age², and sex, a GWAS was conducted with a linear mixed model implemented in the software BOLT-LMM to control for cryptic relatedness and population structure.23 On the basis of all variants with an MAF > 1% in the summary statistics, λGC = 1.69.
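All three λGC values follow the standard genomic-control definition: the median of the per-variant association χ² statistics divided by the median of a χ² distribution with 1 degree of freedom (≈0.4549). A minimal Python sketch, assuming per-variant effect estimates, standard errors, and MAFs are available as arrays; the function names are illustrative and not taken from any GWAS software:

```python
import numpy as np

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.454936423119572


def genomic_control_lambda(z_scores):
    """lambda_GC = median(z^2) / median of chi2 with 1 df."""
    chi2_stats = np.asarray(z_scores, dtype=float) ** 2
    return float(np.median(chi2_stats) / CHI2_1DF_MEDIAN)


def lambda_from_summary_stats(beta, se, maf, min_maf=0.01):
    """Compute lambda_GC from summary statistics, keeping variants with MAF > min_maf."""
    beta, se, maf = (np.asarray(x, dtype=float) for x in (beta, se, maf))
    keep = maf > min_maf
    return genomic_control_lambda(beta[keep] / se[keep])


# Usage with null z-scores: lambda_GC should be close to 1.
rng = np.random.default_rng(0)
z_null = rng.standard_normal(200_000)
print(round(genomic_control_lambda(z_null), 2))
```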

Population Genetic Data

We separately evaluated polygenic selection on height-associated loci in mainland Europeans and in Sardinians. For mainland Europeans, we analyzed two populations with northern European ancestry, i.e., the GBR (British in England and Scotland) and the CEU, and two populations from southern Europe, i.e., the IBS (Iberian population in Spain) and the TSI (Toscani in Italia), by using data from the 1000 Genomes phase 3 release.24 We did not include the FIN (Finnish in Finland) population because of its known unique demographic history;25 excluding it also achieved a better balance of sample sizes between the northern and southern European groups. For Sardinians, we included frequency estimates from 615 unrelated Sardinian individuals whole-genome sequenced (coverage = 4×) in the SardiNIA study.26,27 All Sardinian participants gave informed consent; protocols were approved by the institutional review boards of the University of Cagliari, the National Institute on Aging, and the University of Michigan.

Population Structure Analysis

We first conducted PCA on the four mainland European populations (CEU, GBR, IBS, and TSI) from 1000 Genomes. We used variants that were present in all three GWAS panels and that had an MAF > 5% in the four European populations. To remove correlated variants, we pruned SNPs in windows of 50 SNPs, moving in steps of 5, such that no two SNPs had r² > 0.2 (via the option “--indep-pairwise 50 5 0.2” in PLINK version 1.9).28 We further removed SNPs in regions of long-range LD.29 PCA was performed on the remaining variants via Eigensoft version 7.2.1. We also conducted PCA in the same manner by using the four mainland European populations plus 91 Sardinian individuals randomly selected for the polygenic height score trajectory analysis (below). We used 91 Sardinian individuals instead of all 615 so that each population in the PCA had an approximately equal sample size.
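For readers unfamiliar with genotype PCA, the core computation can be sketched in a few lines of Python. This is a simplified illustration, not a reimplementation of Eigensoft's smartpca: it assumes an already LD-pruned matrix of 0/1/2 allele counts with no missing data, centers and scales each SNP by its estimated binomial standard deviation, and takes PC scores from a singular value decomposition.

```python
import numpy as np


def standardize_genotypes(G):
    """G: (n_individuals, n_snps) allele counts (0/1/2), no missing data.

    Each SNP is centered by 2p and scaled by sqrt(2p(1-p)); monomorphic SNPs are dropped.
    """
    G = np.asarray(G, dtype=float)
    p = G.mean(axis=0) / 2.0
    keep = (p > 0.0) & (p < 1.0)
    return (G[:, keep] - 2.0 * p[keep]) / np.sqrt(2.0 * p[keep] * (1.0 - p[keep]))


def pc_scores(G, k=10):
    """Return the top-k principal component scores for each individual."""
    X = standardize_genotypes(G)
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * S[:k]


# Toy example: two populations with independently drawn allele frequencies.
rng = np.random.default_rng(1)
n_per_pop, n_snps = 100, 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = rng.uniform(0.1, 0.9, n_snps)
G = np.vstack([rng.binomial(2, freq_a, size=(n_per_pop, n_snps)),
               rng.binomial(2, freq_b, size=(n_per_pop, n_snps))])
pcs = pc_scores(G, k=2)
```

In this toy example the first PC separates the two simulated populations, in the same way that PC1 in the real data tracks the north-south axis in mainland Europe.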

To measure the impact of uncorrected stratification on estimated effect sizes for a set of ascertained height-associated variants, we computed the correlation between PC loadings and SNP effect sizes estimated from GWASs. We performed linear regressions of the PC value on the allelic genotype count for each polymorphic variant in the four mainland European populations and Sardinia, and we used the resulting regression coefficients as the variant’s PC loading estimates. For each PC, we then computed Pearson correlation coefficients of PC loadings and effect sizes (of variants with an MAF > 0.01) from each GWAS panel (GIANT, UKB, and BBJ). We estimated p values on the basis of jackknife standard errors by splitting the genome into 1,000 blocks with an equal number of variants.
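The loading-versus-effect-size correlation and its block-jackknife p value can be sketched as follows. Assumptions: SNPs are supplied in genomic order so that contiguous chunks of equal size approximate the genomic blocks, all SNPs are polymorphic, the allele counted in the genotype matrix is the same as the GWAS effect allele, and a normal approximation turns the jackknife z score into a two-sided p value. Function names are hypothetical.

```python
import math

import numpy as np


def pc_loadings(G, pc):
    """Per-SNP slope from regressing the PC value on the allele count (0/1/2)."""
    G = np.asarray(G, dtype=float)
    pc = np.asarray(pc, dtype=float)
    Gc = G - G.mean(axis=0)
    pcc = pc - pc.mean()
    return (Gc * pcc[:, None]).sum(axis=0) / (Gc ** 2).sum(axis=0)


def jackknife_correlation(x, y, n_blocks=1000):
    """Pearson r with a delete-one-block jackknife standard error and two-sided p."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_full = np.corrcoef(x, y)[0, 1]
    blocks = np.array_split(np.arange(x.size), n_blocks)  # ~equal numbers of SNPs
    r_loo = np.empty(len(blocks))
    for i, idx in enumerate(blocks):
        mask = np.ones(x.size, dtype=bool)
        mask[idx] = False  # drop one block, recompute r on the rest
        r_loo[i] = np.corrcoef(x[mask], y[mask])[0, 1]
    g = len(blocks)
    se = math.sqrt((g - 1) / g * np.sum((r_loo - r_loo.mean()) ** 2))
    z = r_full / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return float(r_full), se, p
```

In use, one would compute `pc_loadings(G, pcs[:, k])` for each PC and pass the result together with the aligned GWAS effect sizes to `jackknife_correlation`.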

Population-Level Polygenic Height Score Calculation

To compute polygenic scores, we ascertained independent GWAS variants associated with height by selecting genome-wide significant variants (p < 5 × 10⁻⁸) that had an MAF > 1% in a GWAS panel and that were polymorphic in the test populations. To obtain independent height loci, we first pruned variants such that no two variants were within 1 Mb of each other. We then further pruned by LD, using 1000 Genomes as the reference, such that no two variants had an r² > 0.1. In both pruning steps, we preferentially retained the variant with the lower p value. We used the CEU, GBR, and JPT (Japanese in Tokyo, Japan) populations to compute reference LD for pruning the GIANT, UKB, and BBJ summary statistics, respectively. In total, 26,593, 227,794, and 65,291 variants reached genome-wide significance in the GIANT, UKB, and BBJ summary statistics, respectively. Using this pruning approach, we identified 457, 774, and 371 independent height-associated variants from GIANT, UKB, and BBJ summary statistics, respectively.
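The two-step pruning can be sketched as a greedy procedure that visits variants from the lowest p value upward and keeps a variant only if it is far enough from, and weakly enough correlated with, every variant already kept. The `Variant` record and the `r2` callback (for example, squared genotype correlation in a 1000 Genomes reference population) are hypothetical stand-ins, and this quadratic-time version is for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Variant:
    vid: str
    chrom: str
    pos: int
    p: float


def prune_by_distance(variants: List[Variant], window_bp: int = 1_000_000) -> List[Variant]:
    """Keep variants (lowest p first) so that no two kept variants lie within window_bp."""
    kept: List[Variant] = []
    for v in sorted(variants, key=lambda v: v.p):
        if all(k.chrom != v.chrom or abs(k.pos - v.pos) >= window_bp for k in kept):
            kept.append(v)
    return kept


def prune_by_ld(variants: List[Variant], r2: Callable[[Variant, Variant], float],
                max_r2: float = 0.1) -> List[Variant]:
    """Keep variants (lowest p first) so that no two kept variants have r2 > max_r2."""
    kept: List[Variant] = []
    for v in sorted(variants, key=lambda v: v.p):
        if all(r2(v, k) <= max_r2 for k in kept):
            kept.append(v)
    return kept
```

Applied in sequence, `prune_by_ld(prune_by_distance(hits), r2)` mirrors the two steps described above; the `r2` callback is where a population-specific reference (CEU, GBR, or JPT) would enter.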

We additionally ascertained height-associated variants by using an alternative approach. Previous studies11,12 also ascertained height-associated variants from approximately independent LD blocks across the genome. To allow comparison with these studies, we similarly divided the genome into approximately independent LD blocks computed by Berisa and Pickrell30 (∼1,700 blocks in the European population for GIANT and UKB panels; ∼1,400 blocks in the Asian population for the BBJ panel) and retained within each LD block the variant (MAF > 0.01) with the lowest p value for association with height, regardless of whether the variant reached the genome-wide significance level. Out of the 1,702 variants found in GIANT, 474 (28%) were genome-wide significant. Out of the 1,703 variants found in UKB, 812 (48%) were genome-wide significant. Finally, out of 1,444 variants found in BBJ, 380 (26%) were genome-wide significant.
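Selecting one lead variant per approximately independent LD block can be sketched as below. Blocks are assumed to be given as (chromosome, start, end) half-open, non-overlapping intervals, as in a BED-style block file, and variants as dictionaries with hypothetical keys `chrom`, `pos`, `p`, and `maf`; no significance threshold is applied, matching the description above.

```python
from bisect import bisect_right
from collections import defaultdict


def lead_snp_per_block(variants, blocks, min_maf=0.01):
    """Return, for each LD block, the variant with the lowest p value (MAF > min_maf).

    variants: iterable of dicts with keys "chrom", "pos", "p", "maf".
    blocks: list of (chrom, start, end) half-open intervals, assumed non-overlapping.
    """
    by_chrom = defaultdict(list)
    for block_id, (chrom, start, end) in enumerate(blocks):
        by_chrom[chrom].append((start, end, block_id))
    for intervals in by_chrom.values():
        intervals.sort()
    starts = {chrom: [s for s, _, _ in iv] for chrom, iv in by_chrom.items()}

    best = {}
    for v in variants:
        if v["maf"] <= min_maf or v["chrom"] not in by_chrom:
            continue
        j = bisect_right(starts[v["chrom"]], v["pos"]) - 1  # last block starting at or before pos
        if j < 0:
            continue
        start, end, block_id = by_chrom[v["chrom"]][j]
        if not start <= v["pos"] < end:
            continue  # position falls in a gap between blocks
        if block_id not in best or v["p"] < best[block_id]["p"]:
            best[block_id] = v
    return [best[k] for k in sorted(best)]
```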

Given a set of L height-associated SNPs, the estimated effect sizes from each GWAS panel were then used to compute polygenic height scores for each population by

$$Z = \sum_{l=1}^{L} 2 \beta_l p_l,$$

where $p_l$ and $\beta_l$ were the allele frequency and effect size at SNP $l$.
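The population-level score is twice the effect-size-weighted sum of allele frequencies. A sketch that scores several populations at once, assuming each frequency refers to the same allele whose effect is β_l; the small harmonization helper (hypothetical name) flips a biallelic frequency reported for the other allele:

```python
import numpy as np


def population_scores(freqs, betas):
    """freqs: (M, L) effect-allele frequencies in M populations; betas: (L,) effects.

    Returns Z_m = sum_l 2 * beta_l * p_{m,l} for each population m.
    """
    freqs = np.asarray(freqs, dtype=float)
    betas = np.asarray(betas, dtype=float)
    return 2.0 * freqs @ betas


def harmonize_frequency(freq_of_allele, allele, effect_allele):
    """Convert a biallelic frequency reported for `allele` into the effect-allele frequency."""
    return freq_of_allele if allele == effect_allele else 1.0 - freq_of_allele
```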

Signature of Selection at Height-Associated Loci

To evaluate the evidence of selection at height-associated loci, we applied the following three methods: excess variance test, polygenic height score trajectory, and tSDS analysis.

Excess Variance Test

We conducted the QX test5 to determine whether the estimated polygenic scores exhibited more variance among populations than null expectation under genetic drift:

$$Q_X = \frac{Z'^{\top} F^{-1} Z'}{2 V_A},$$

where $Z$ was a vector of estimated genetic values (i.e., a sum of sample allele frequencies weighted by effect size) for the $M$ test populations, $Z'$ was $Z$ after mean-centering across populations and dropping one population (leaving $M - 1$ entries), $F$ was the matrix describing the correlation structure of allele frequencies across populations (transformed in the same way), and $V_A$ was the additive genetic variance of the ancestral (global) population. To construct the $F$ matrix, we sampled 20,000 variants from the same GWAS panels, matched to the height-associated SNPs by MAF, recombination rate, and background selection as measured by B values.31 Specifically, we partitioned variants into a three-way contingency table in each GWAS panel with 25 bins for MAF (i.e., a bin size of 0.02), 100 bins for recombination rate, and 10 bins for B value. For recombination rate, we used the CEU, GBR, and JPT genetic maps generated from the 1000 Genomes phased OMNI data for GWAS panels GIANT, UKB, and BBJ, respectively. Under neutrality, the QX statistic follows a χ² distribution with $M - 1$ degrees of freedom, from which an asymptotic p value was estimated. A significant excess of variance among populations would be consistent with the differential action of natural selection among populations.
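The QX statistic and its asymptotic p value can be sketched in Python as follows. In this sketch, F is estimated from matched null SNPs by mean-centering frequencies across populations, dropping one population, and scaling by ε(1 − ε), where ε is a SNP's mean frequency across the test populations; V_A is taken as Σ 2β²ε(1 − ε). These estimator details, the choice to drop the last population (QX does not depend on which one is dropped), and all function names are assumptions of this sketch rather than a transcription of the authors' scripts. The χ² tail probability is computed in closed form for integer degrees of freedom to avoid a SciPy dependency.

```python
import math

import numpy as np


def chi2_sf(x, df):
    """Upper tail of a chi-square distribution with integer df (closed form)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    tail = math.erfc(math.sqrt(half))
    tail += math.exp(-half) * sum(half ** (i - 0.5) / math.gamma(i + 0.5)
                                  for i in range(1, (df - 1) // 2 + 1))
    return tail


def centering_matrix(M):
    """(M-1) x M matrix that mean-centers an M-vector and drops its last entry."""
    return (np.eye(M) - np.full((M, M), 1.0 / M))[:-1, :]


def estimate_F(null_freqs):
    """null_freqs: (K, M) frequencies of K matched null SNPs in M populations."""
    P = np.asarray(null_freqs, dtype=float)
    T = centering_matrix(P.shape[1])
    eps = P.mean(axis=1)
    keep = (eps > 0.0) & (eps < 1.0)  # skip SNPs fixed in every population
    X = (P[keep] @ T.T) / np.sqrt(eps[keep] * (1.0 - eps[keep]))[:, None]
    return X.T @ X / keep.sum()


def qx_test(Z, trait_freqs, betas, F):
    """Z: (M,) genetic values; trait_freqs: (L, M); betas: (L,); F from estimate_F."""
    Z = np.asarray(Z, dtype=float)
    M = Z.size
    eps = np.asarray(trait_freqs, dtype=float).mean(axis=1)
    VA = float(np.sum(2.0 * np.asarray(betas, dtype=float) ** 2 * eps * (1.0 - eps)))
    Zp = centering_matrix(M) @ Z
    qx = float(Zp @ np.linalg.solve(F, Zp) / (2.0 * VA))
    return qx, chi2_sf(qx, M - 1)
```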

To identify outlier populations that contributed to the excess of variance, we further estimated the conditional Z score proposed by Berg and Coop.5 Specifically, we excluded one population at a time and then calculated the expected mean and variance of genetic value in the excluded population given the values observed in the remaining populations and the covariance matrix relating them. Using this conditional mean and variance, we calculated a Z score to describe the fit of the estimated genetic value of the excluded population by the drift model conditioned on the values in the remaining populations. An extreme Z score would suggest that the excluded population had experienced directional selection on the trait of interest that was not experienced by the conditioned populations in the analysis.
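The leave-one-out conditioning is the standard conditional distribution of a multivariate normal. A generic sketch, assuming the drift covariance of the M genetic values (for example, 2V_A times a full M × M population F matrix) and an ancestral mean μ are supplied as inputs; Berg and Coop's own implementation may handle the unknown ancestral mean differently, so treat this as an illustration of the conditioning step only:

```python
import math

import numpy as np


def conditional_z(Z, cov, mu, m):
    """Leave-one-out conditional Z score for population m.

    Z: (M,) genetic values; cov: (M, M) covariance of Z under drift (e.g. 2 * V_A * F);
    mu: assumed ancestral mean genetic value (an input here, not estimated).
    """
    Z = np.asarray(Z, dtype=float)
    cov = np.asarray(cov, dtype=float)
    others = [i for i in range(Z.size) if i != m]
    S_oo = cov[np.ix_(others, others)]
    S_mo = cov[m, others]
    w = np.linalg.solve(S_oo, S_mo)           # S_oo^{-1} S_om (S_oo is symmetric)
    cond_mean = mu + w @ (Z[others] - mu)     # E[Z_m | Z_others]
    cond_var = cov[m, m] - S_mo @ w           # Var[Z_m | Z_others]
    z = (Z[m] - cond_mean) / math.sqrt(cond_var)
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal p value
    return float(z), p
```

With independent populations the conditional mean reduces to μ; with correlated populations, the remaining populations pull the expectation toward their own values, which is what makes the score specific to the excluded population.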

In practice, we also generated the empirical null distributions of the QX statistic and conditional Z scores by calculating 10,000 null genetic values from SNPs resampled genome-wide and matched by MAF, recombination rate, and B value, in the same way as for the F matrix. The empirical p values for conditional Z scores tended to match the asymptotic p values well (data not shown). Therefore, throughout the study, we used the asymptotic p values for the QX statistic and conditional Z score. The scripts we used to implement these analyses are available on GitHub (see Web Resources).
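Matched resampling can be sketched by assigning every SNP to one cell of the MAF × recombination-rate × B-value table and drawing replacement SNPs from the same cell. The default bin counts match the text (25 × 100 × 10, with 0.02-wide MAF bins), but using genome-wide quantiles for the recombination-rate bins and B values rescaled to [0, 1] are assumptions, as is the choice that each null replicate keeps the trait SNPs' effect sizes while swapping in matched SNPs' frequencies. Function names are hypothetical.

```python
import numpy as np


def matching_bins(maf, recomb, bval, n_maf=25, n_rec=100, n_b=10):
    """Assign each SNP to one cell of a MAF x recombination x B-value table.

    MAF bins have width 0.5 / n_maf (0.02 for 25 bins). Assumptions: recombination
    bins are genome-wide quantiles, and B values are rescaled to [0, 1].
    """
    maf = np.asarray(maf, dtype=float)
    recomb = np.asarray(recomb, dtype=float)
    bval = np.asarray(bval, dtype=float)
    maf_bin = np.minimum((maf / 0.5 * n_maf).astype(int), n_maf - 1)
    edges = np.quantile(recomb, np.linspace(0.0, 1.0, n_rec + 1)[1:-1])
    rec_bin = np.searchsorted(edges, recomb, side="right")
    b_bin = np.minimum((bval * n_b).astype(int), n_b - 1)
    return (maf_bin * n_rec + rec_bin) * n_b + b_bin


def sample_matched(trait_bins, genome_bins, n_draws, rng):
    """For each trait SNP, draw n_draws genome-wide SNP indices from the same bin."""
    genome_bins = np.asarray(genome_bins)
    order = np.argsort(genome_bins, kind="stable")
    sorted_bins = genome_bins[order]
    draws = np.empty((len(trait_bins), n_draws), dtype=int)
    for i, b in enumerate(trait_bins):
        lo = np.searchsorted(sorted_bins, b, side="left")
        hi = np.searchsorted(sorted_bins, b, side="right")
        if hi == lo:
            raise ValueError(f"no genome-wide SNP in bin {b}")
        draws[i] = order[rng.integers(lo, hi, size=n_draws)]
    return draws


def null_genetic_values(freqs, betas, draws):
    """freqs: (n_genome_snps, M); betas: (L,); draws: (L, n_draws).

    Returns an (n_draws, M) array of null genetic values 2 * sum_l beta_l * p.
    """
    freqs = np.asarray(freqs, dtype=float)
    return 2.0 * np.einsum("l,ldm->dm", np.asarray(betas, dtype=float), freqs[draws])


def empirical_p(observed, null):
    """One-sided upper-tail p value with the usual +1 correction."""
    null = np.asarray(null, dtype=float)
    return (1 + np.sum(null >= observed)) / (1 + null.size)
```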

Polygenic Height Score Trajectory

Using the framework proposed by Edge and Coop,15 we constructed the history of polygenic height scores in the GBR and Sardinian populations. Using a genetic map from HapMap as reference, we first phased 91 Sardinian individuals from the SardiNIA study26 together with 503 Europeans from 1000 Genomes via Eagle v2.4.1.32 Extracting 91 individuals each from the GBR and Sardinian populations, we then used the software RELATE v1.0.833 to reconstruct ancestral recombination graphs in these two populations together. We only included bi-allelic SNPs that are found in the genomic mask provided with the 1000 Genomes Project dataset. We used an estimate of the human ancestral genome to identify the most likely ancestral allele for each SNP. We initially estimated branch lengths by using a constant effective population size of 11,314 and a mutation rate of 1.25 × 10⁻⁸ per base per generation. We then calculated mutation rate and coalescent rate through time given the branch lengths by using default parameters (30 bins between 1,000 and 10,000,000 years before present and 28 years per generation). By averaging coalescence rates over all pairs of haplotypes and taking the inverse, we obtained a population-wide estimate of effective population size. We then used this population size estimate to re-estimate branch lengths. We iterated these two steps five times to convergence, as suggested by Speidel et al.,33 and then obtained a final estimate of branch lengths and the effective population size. On the basis of the output ancestral recombination graphs, we estimated the time courses of polygenic height scores, as the estimated sum of allele frequencies weighted by effect sizes, for the GBR and Sardinian populations separately by using the three estimators proposed by Edge and Coop:15 (1) the proportion-of-lineages estimator, (2) the waiting-time estimator, and (3) the lineages-remaining estimator.
The first estimator estimated allele frequency at a specified time in the past as the proportion of lineages that carry the allele of interest. The other two estimators estimated allele frequency from the relative sizes of the two subpopulations carrying the ancestral and derived alleles: the waiting-time estimator used waiting times between coalescent events to estimate subpopulation sizes, whereas the lineages-remaining estimator used the number of coalescence events that occur between specified time points. The same set of SNPs (genome-wide significantly associated variants in BBJ or UKB, after pruning by distance and LD) was used to compute the polygenic height score in both the GBR and Sardinian populations. We focused on the proportion-of-lineages estimator because it had been shown to be the most powerful at detecting selection owing to its improved precision,15 but all three estimators are provided for completeness.
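To make the proportion-of-lineages idea concrete, the toy sketch below computes a derived-allele frequency at time t from a single SNP's genealogy summary, assuming infinite sites so that the derived-allele carriers form one clade. The inputs (numbers of sampled derived and ancestral haplotypes, coalescence times within the derived clade, all other coalescence times, and the mutation time) are a hypothetical simplification; Edge and Coop's code works directly from RELATE output. Times are measured backward from the present.

```python
import numpy as np


def derived_freq_at(t, n_derived, n_ancestral, derived_coal, other_coal, t_mut):
    """Proportion-of-lineages estimate of derived-allele frequency at time t.

    derived_coal: coalescence times within the derived-allele clade (n_derived - 1 values).
    other_coal: all remaining coalescence times (n_ancestral values).
    t_mut: time of the mutation on the branch above the derived clade.
    """
    if t >= t_mut:
        return 0.0  # before the mutation arose, no lineage carries the derived allele
    d = n_derived - int(np.sum(np.asarray(derived_coal) <= t))
    # More recently than t_mut, ancestral lineages only coalesce among themselves.
    a = n_ancestral - int(np.sum(np.asarray(other_coal) <= t))
    return d / (d + a)


def score_trajectory(times, snps, betas):
    """Polygenic score 2 * sum_l beta_l * p_l(t) at each requested time.

    snps: list of dicts with the keyword arguments of derived_freq_at (except t);
    betas: effect sizes of the derived alleles.
    """
    return np.array([2.0 * sum(b * derived_freq_at(t, **s) for s, b in zip(snps, betas))
                     for t in times])
```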

We tested for significant differences in polygenic height score trajectory between the GBR and Sardinian populations over time by performing 10,000 permutations of the signs of effect sizes across these SNPs. We specifically tested whether the polygenic height score in the Sardinian population changed relative to that in the GBR population over two time intervals: between 20,000 and 10,000 years ago, and between 10,000 years ago and the present. The 20,000-year time point was chosen because it approximately matches the first evidence of human inhabitation on Sardinia (up to 18,000 years ago).34,35 The 10,000-year time point was chosen because it is around the beginning of the Neolithic period (∼8,000 years ago), when Sardinia became isolated and genetically diverged from mainland Europeans (∼7,000 years ago).26 We also conducted post hoc tests of directional changes in individual population trajectories at 10,000, 5,000, and 1,000 years before the present to see whether and when the polygenic height score trajectory in the GBR or Sardinian population individually deviated from the null. These time points were chosen on the basis of visual inspection of the inferred trajectory. To estimate courses of polygenic height scores and conduct the significance test, we adopted code from Edge and Coop15 available at GitHub (see Web Resources).
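Because each trajectory is an effect-size-weighted sum, the change in the Sardinian-minus-GBR score difference between two time points is linear in the effect sizes, so randomly flipping effect signs yields a null distribution with no coordinated direction across loci. A sketch under stated assumptions: per-SNP frequencies at both time points are already estimated for both populations, and a two-sided test is assumed. Names are hypothetical.

```python
import numpy as np


def trajectory_contrib(p_sdi_recent, p_sdi_old, p_gbr_recent, p_gbr_old):
    """Per-SNP weight c_l such that sum_l beta_l * c_l equals the change in the
    Sardinian-minus-GBR score difference between the old and recent time points."""
    p_sdi_recent, p_sdi_old, p_gbr_recent, p_gbr_old = (
        np.asarray(x, dtype=float) for x in (p_sdi_recent, p_sdi_old, p_gbr_recent, p_gbr_old))
    return 2.0 * ((p_sdi_recent - p_sdi_old) - (p_gbr_recent - p_gbr_old))


def sign_permutation_test(betas, contrib, n_perm=10_000, rng=None):
    """Observed statistic and two-sided p value from random sign flips of the effects."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.asarray(betas, dtype=float) * np.asarray(contrib, dtype=float)
    observed = w.sum()
    signs = rng.choice([-1.0, 1.0], size=(n_perm, w.size))
    null = signs @ w
    p = (1 + np.sum(np.abs(null) >= abs(observed))) / (1 + n_perm)
    return float(observed), float(p)
```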

tSDS Analysis

We tested whether height-associated loci are under recent selection in a mainland European population by examining the distribution of tSDSs. Recent selection results in shorter tip branches for the favored allele. The SDS8 leveraged the average distance between the nearest singletons on either side of a test SNP across all individuals to estimate the mean tip-branch length of the derived and ancestral alleles and used this measure to infer evidence of selection. The sign of an SDS can be polarized such that positive scores indicate increased frequency of the trait-increasing (or trait-decreasing) allele instead of the derived allele; this metric is referred to as a tSDS. We obtained pre-computed SDSs for 4,451,435 autosomal SNPs from 3,195 white British individuals from the UK10K project. In each GWAS panel (UKB and BBJ), we included only SNPs with a reported SDS prior to distance and LD pruning to obtain a set of genome-wide significant SNPs. For each variant, we looked up the effect size of the derived allele and then set the tSDS equal to the SDS if the derived allele is height-increasing, or to its negative if the derived allele is height-decreasing. Therefore, a positive tSDS indicates that a height-increasing allele has risen in frequency in the recent past; a negative tSDS indicates that a height-increasing allele has dropped in frequency in the recent past. To estimate whether an observed mean tSDS across a set of height-associated SNPs was significantly different from the null expectation, we performed 100,000 permutations of the sign of the effect size of derived alleles across these SNPs and reported the empirical p value.
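The polarization and permutation steps can be sketched as follows, assuming SDS values are oriented so that positive means the derived allele rose in frequency. Flipping the sign of a derived-allele effect flips the sign of its tSDS, so the permutation draws random signs for the tSDS values directly; permutations are processed in chunks to bound memory, and a two-sided comparison is assumed. Function names are hypothetical.

```python
import numpy as np


def polarize_tsds(sds, derived_effect):
    """tSDS: keep the SDS sign if the derived allele is height-increasing, flip it otherwise.

    SNPs with a derived-allele effect of exactly zero get tSDS = 0.
    """
    return np.sign(np.asarray(derived_effect, dtype=float)) * np.asarray(sds, dtype=float)


def mean_tsds_test(sds, derived_effect, n_perm=100_000, rng=None, chunk=10_000):
    """Mean tSDS and a two-sided sign-permutation p value (chunked to bound memory)."""
    rng = np.random.default_rng() if rng is None else rng
    t = polarize_tsds(sds, derived_effect)
    observed = t.mean()
    n_extreme, done = 0, 0
    while done < n_perm:
        m = min(chunk, n_perm - done)
        # Flipping the sign of a derived-allele effect flips the sign of its tSDS.
        signs = rng.choice([-1.0, 1.0], size=(m, t.size))
        null_means = signs @ t / t.size
        n_extreme += int(np.sum(np.abs(null_means) >= abs(observed)))
        done += m
    return float(observed), (1 + n_extreme) / (1 + n_perm)
```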

Results

European Population Structure Underlying GWAS Summary Statistics

Incomplete control of population structure could lead to biases in the estimated effect sizes in GWASs. As a result, polygenic scores constructed on the basis of these GWASs would show elevated population differentiation relative to neutral genetic drift.7 For example, because the primary feature of genetic differentiation in mainland Europe is along the north-south axis, if human height is differentiated along this axis because of non-genetic effects, any variant that is also differentiated along this axis would have an overestimated effect size if the population structure is not well controlled in GWASs.36 Using GWAS summary statistics from a geographically and genetically distant population should alleviate this issue because the effect of stratification in the GWAS panel would be independent from that of the test populations for polygenic adaptation.
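The stratification mechanism described above can be illustrated with a toy simulation in which all numbers are hypothetical: two groups differ in mean height for purely non-genetic reasons and also differ in the frequency of a SNP that has no effect on height. Regressing height on genotype alone yields a spurious positive effect size, whereas adding the group label as a covariate (a stand-in for PCs) removes it.

```python
import numpy as np


def simulate_stratified_gwas(n=20_000, seed=1):
    """Return (naive, adjusted) effect-size estimates for a SNP with no true effect."""
    rng = np.random.default_rng(seed)
    group = rng.integers(0, 2, size=n)                  # 0 = "south", 1 = "north" (hypothetical)
    freq = np.where(group == 1, 0.6, 0.4)               # allele frequency differs by group
    genotype = rng.binomial(2, freq)                    # no causal effect on height
    height = 0.5 * group + rng.standard_normal(n)       # non-genetic group difference in height
    naive = np.polyfit(genotype, height, 1)[0]          # slope ignoring structure
    X = np.column_stack([np.ones(n), genotype, group])  # intercept, SNP, group covariate
    adjusted = np.linalg.lstsq(X, height, rcond=None)[0][1]
    return float(naive), float(adjusted)


naive, adjusted = simulate_stratified_gwas()
```

Summed over many SNPs that share the same north-south frequency gradient, such small spurious effects accumulate into an apparent genetic height difference between populations.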

We first evaluated the impact of population stratification on height-associated variants ascertained from the GWAS panels available to us: the GIANT consortium, the UKB, and the BBJ datasets. Specifically, we examined the correlation between effect sizes estimated from each GWAS panel and the PC loadings from a PCA conducted in four 1000 Genomes European populations (Figure S1). The first three PCs reflected geographical or population structure in mainland Europe: the first two described the north-south and southeast-southwest axes of variation, whereas PC3 reflected variation within the GBR population (Figure S1). We found that the effect sizes estimated in GIANT were significantly correlated with the loading of the first PC of population structure (rho = 0.124, p = 1.57 × 10⁻⁹² for PC1) (Figure S2). Compared to GIANT, the correlations were smaller in UKB, and even more so in BBJ (e.g., rho = −0.0049, p = 0.258 in BBJ versus rho = 0.014, p = 0.0063 in UKB for PC1). Neither correlation was significant after correcting for the 20 PCs tested, although the correlation in UKB (p = 0.0063) would have been significant if only the three PC axes associated with geographical structure, PCs 1–3, were accounted for (Figure S2).

We also conducted a similar PCA that included Sardinians (Figure S3). In this case, up to the first five PCs showed evidence of geographical or population structure: the first three PCs reflected variation attributable to the population groups, whereas PC4 and PC5 reflected variation within the GBR and Sardinian populations, respectively (Figure S3). As was the case in the mainland-only analysis, effect sizes from GIANT were highly correlated with the loading of the first two PCs (Figures 1 and S2) and, to a lesser but significant extent, with the loading of PC3 and PC5 (rho = −0.026, p = 1.45 × 10⁻⁷ for PC3; rho = 0.021, p = 2.26 × 10⁻⁵ for PC5). Compared to GIANT, the correlations in UKB were again much smaller. Most worrying, however, is the significant correlation with PC1 (rho = 0.020, p = 6.27 × 10⁻⁵), which was driven by the northern Europe-southern Europe-Sardinia axis of variation (Figures 1, S2, and S3). This correlation could in principle be driven by selection, or alternatively, the effect sizes on height estimated from UKB were not completely free from stratification. On the other hand, effect sizes from BBJ were generally not significantly associated with any population structure in Europeans. More importantly, the magnitude of the correlation was at least an order of magnitude lower for PC1 (e.g., rho = −0.0007 in BBJ versus rho = 0.0202 in UKB for PC1) (Figures 1 and S2), the axis most likely to confound an analysis of height in Europe. Even though we cannot strictly rule out that the smaller sample size of BBJ provides less power to detect a genome-wide correlation of effect size estimates with PC loadings, we believe the conservative approach is to use height-associated SNPs ascertained from BBJ as the set of SNPs in the primary analysis.

Figure 1.


Evidence of Stratification in GWAS Summary Statistics

Pearson correlation coefficients of PC loadings and SNP effects from GIANT, UKB, and BBJ. PCs were computed in four 1000 Genomes European populations and Sardinians. p values are based on jackknife standard errors (1,000 blocks). p values lower than 0.05/20 (for testing 20 PCs; Figures S1–S3) are indicated on each bar.

Signals of Polygenic Adaptation in Sardinians

In order to evaluate the signal of polygenic adaptation in Sardinians by using height loci ascertained from BBJ, we first needed to demonstrate that these height loci, despite being ascertained from a geographically and genetically distant population, are predictive of height. On the basis of independent SNPs associated with height with p < 5 × 10⁻⁸ in GIANT (457 SNPs), UKB (774 SNPs), and BBJ (371 SNPs), we constructed a polygenic height score (Methods) and tested its correlation with height in 572 unrelated Sardinians with available height information. As expected, we found that polygenic scores constructed from all three GWAS panels were significantly correlated with sex-standardized height (Figure S4), although the correlation was smallest in BBJ compared to GIANT or UKB (rho = 0.21, R² = 0.043 in BBJ; rho = 0.35, R² = 0.122 in GIANT; rho = 0.38, R² = 0.142 in UKB; Figure S4).

We then calculated the polygenic scores for Sardinians and the four mainland European populations (CEU, TSI, GBR, and IBS) on the basis of height loci ascertained from each of the three GWAS panels and used Berg and Coop’s Qx and conditional Z score framework to evaluate the significance of differences in polygenic scores across populations. Qualitatively, we found that across all GWAS panels the estimated polygenic height scores in Sardinians remain significantly lower than would be expected on the basis of their genetic relatedness to other European populations (Figure 2). Although a direct comparison across the three GWAS panels is complicated by differences in GWAS populations, estimated effect sizes, and study power, among other factors, the degree to which Sardinians were genetically shorter was attenuated when using summary statistics derived from UKB and BBJ, relative to GIANT (Sardinians were 0.22, 0.31, and 0.72 units of SD shorter than the CEU population when we used polygenic scores computed from BBJ, UKB, and GIANT, respectively).

Figure 2.


Excess Variance Tests in Sardinia

(A–C) The polygenic score was constructed on the basis of independent genome-wide significant SNPs from GIANT (A), UKB (B), and BBJ (C) GWAS summary statistics. Pval(Qx) denotes p values for Qx tests. The p values for conditional Z scores are represented by the size of each circle, and those lower than 0.01 are shown in the plot. The following abbreviations are used for each population: SDI, Sardinians; IBS, Iberian Population in Spain; TSI, Toscani in Italia; GBR, British in England and Scotland; and CEU, Utah residents with ancestry from northern and western Europe.

We constructed polygenic height scores by using only variants surpassing the genome-wide significance threshold to be better protected from uncorrected stratification, particularly when using summary statistics from European populations. Previous reports11,12 also used an alternative approach to identify height-associated variants, namely selecting the lowest-p-value SNP from each approximately independent LD block across the genome.30 This yields approximately 1,700 variants via the GIANT or UKB GWAS panels, or 1,400 variants via the BBJ panel. We reasoned that including more sub-threshold variants in the construction of polygenic scores, particularly from a GWAS panel less effective in controlling for population structure, could spuriously strengthen the statistical evidence consistent with polygenic adaptation. We thus examined the impact of this alternative ascertainment scheme in our study. When using SNPs from approximately independent LD blocks instead of genome-wide significant SNPs, we observed that the difference in polygenic height scores between Sardinians and the CEU population increased from 0.72 to 1.61 units of SD when we used summary statistics from GIANT. More strikingly, the statistical evidence for adaptation, based on Qx and the conditional Z score for Sardinians, became much stronger (the p value for the conditional Z score decreased from 6.48 × 10⁻⁹, Figure 2, to 3.33 × 10⁻¹⁵, Figure S5). These results suggest that the exaggerated signature of polygenic adaptation using GIANT was at least partly due to the practice of ascertaining SNPs from approximately independent LD blocks, a practice that yields SNP sets enriched for loci that escaped statistical control of stratification. We observed a similar trend, although to a lesser extent, when using summary statistics from UKB (Figure S5).
In contrast, when using summary statistics from BBJ, the LD-block-based ascertainment scheme actually decreased the statistical evidence of adaptation (p increased from 3.89 × 10−4 to 9.03 × 10−4, Figure S5). These results are consistent with the observation that the strongest correlation between effect sizes and PC loadings is found in GIANT, followed by UKB and then BBJ (Figure 1), and they suggest that a better analytical practice is to analyze a set of independent, top-ranked variants (such as genome-wide significant variants), in which true height-associated variants are highly enriched.
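The two ascertainment schemes compared above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: each SNP is assumed to be already labeled with an approximately independent LD block (for example, from the Berisa and Pickrell partition30), independence among genome-wide significant SNPs is approximated here by keeping one SNP per block, and the conventional 5 × 10−8 threshold is assumed. The SNP identifiers, block labels, and function names are hypothetical.

```python
def lowest_p_per_block(snps):
    """Keep the single lowest-p SNP in every LD block, whether or not it is significant."""
    best = {}
    for snp_id, block, p in snps:
        if block not in best or p < best[block][2]:
            best[block] = (snp_id, block, p)
    return sorted(best.values(), key=lambda s: s[2])


def genome_wide_significant(snps, threshold=5e-8):
    """Keep a block's lead SNP only if it passes the (assumed) genome-wide threshold."""
    return [s for s in lowest_p_per_block(snps) if s[2] < threshold]


# Hypothetical SNPs: (id, LD block, GWAS p value).
snps = [("rs1", "b1", 1e-10), ("rs2", "b1", 3e-4), ("rs3", "b2", 0.02),
        ("rs4", "b3", 1e-3), ("rs5", "b3", 4e-9)]
print([s[0] for s in lowest_p_per_block(snps)])       # ['rs1', 'rs5', 'rs3']
print([s[0] for s in genome_wide_significant(snps)])  # ['rs1', 'rs5']
```

The LD-block scheme retains a SNP from every block, including blocks whose best p value is far from significant (rs3 here), so any residual stratification carried by such weak signals enters the score.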

The interpretation of the conditional Z score results (Figure 2) implicitly assumes that all other populations tested in this framework are evolving neutrally. Our finding of a significant conditional Z score in Sardinians could also be explained if height-associated loci were evolving neutrally in Sardinia but the height-increasing alleles were collectively increasing in frequency in all mainland European populations. To further investigate whether selection is acting on height-associated loci in Sardinia, we compared the trajectory of polygenic height scores in the Sardinian population to that in the GBR population. Using the proportion-of-lineages estimator from Edge and Coop,15 we observed that the population-mean polygenic height scores of the Sardinian and GBR populations have been diverging for at least the past ∼7–10 ky (thousand years) (Figure 3). We tested whether the difference in polygenic score trajectories between the Sardinian and GBR populations was significant for two time points: 20 ky and 10 ky before present. The former is approximately the time of the first evidence of human habitation on Sardinia.34,35 The latter is approximately the beginning of the Neolithic period and the estimated divergence time between the Sardinian and mainland northern European populations.26 We found that the difference in mean polygenic height score changed significantly between 10 kya (thousand years ago) and the present (p = 0.0082), but not between 20 kya and 10 kya (p = 0.7406). When we used the other two estimators of polygenic score trajectory presented in Edge and Coop, the pattern was much less obvious (Figure S6); these estimators are known to be considerably noisier and thus have much less power to detect selection.15 When using height-associated loci ascertained from UKB, we observed similar results (Figure S7; p = 0.017 and p = 0.6641 for the changes from 10 kya to the present and from 20 kya to 10 kya, respectively).
Because visual inspections of the polygenic score trajectory in GBR and Sardinian populations suggested that the scores diverged in opposite directions in the recent past, we tested for changes in the trajectory in each population at three time points: 10 kya, 5 kya, and 1 kya (Table 1). We found that polygenic height scores appeared to be increasing (though only marginally significant, if at all) in the GBR population since at least 10 kya (Table 1). Over the same time, the polygenic height scores appeared to be decreasing in the Sardinian population (Table 1). Although this test is post hoc and the p values generally would not be significant after multiple testing correction, these results were consistent with a stronger signal of decreasing height scores in Sardinia, but also potentially consistent with natural selection acting in opposing directions between the two populations.
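The conditional Z score discussed above asks whether one population's mean polygenic score departs from the value predicted from all other populations, given their shared ancestry. A minimal sketch of that conditioning step follows, assuming the population-mean scores are multivariate normal under neutrality with a known mean vector and covariance matrix (in Berg and Coop's framework the covariance is proportional to a population relatedness matrix); it omits the mean-centering details of the original method, and the function name and toy numbers are hypothetical. Because the prediction for population m is built from the other populations' observed scores, a coordinated shift in those populations moves the prediction too, which is why the test presumes that the other populations evolved neutrally.

```python
import numpy as np


def conditional_z(z, mu, cov, m):
    """Z score of population m's mean polygenic score, conditional on all other populations.

    z, mu: observed and neutral-expected mean scores for M populations.
    cov:   M x M covariance of the scores under neutrality.
    """
    z, mu, cov = (np.asarray(a, float) for a in (z, mu, cov))
    others = [i for i in range(len(z)) if i != m]
    s_mo = cov[m, others]                       # covariance between m and the others
    s_oo = cov[np.ix_(others, others)]          # covariance among the others
    w = np.linalg.solve(s_oo, s_mo)             # regression weights of m on the others
    cond_mean = mu[m] + w @ (z[others] - mu[others])
    cond_var = cov[m, m] - s_mo @ w
    return float((z[m] - cond_mean) / np.sqrt(cond_var))


# Toy example: two related populations (covariance 0.5), neutral expectation 0.
cov = [[1.0, 0.5], [0.5, 1.0]]
print(conditional_z([1.0, 1.0], [0.0, 0.0], cov, m=0))  # about 0.577 (unconditional Z = 1.0)
print(conditional_z([1.0, 3.0], [0.0, 0.0], cov, m=0))  # about -0.577: only the other population moved
```

In the second call, population 0's own score is unchanged, yet its conditional Z changes sign because the other population's shift raised the prediction.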

Figure 3.

The Trajectory of Mean Polygenic Height Scores in British (GBR) and Sardinian (SDI) Populations over the Past 25 ky

The past polygenic scores are estimated by the proportion-of-lineages estimator from Edge and Coop with height loci and effect sizes ascertained from BBJ. The left panel shows the mean polygenic scores in the GBR and SDI populations; the right panel shows the difference between the GBR and SDI populations in mean polygenic score. Shaded areas denote the 95% confidence interval. The dashed lines mark the two time points (10 kya and 20 kya) at which we tested whether the polygenic height score in the Sardinian population changed relative to that in the GBR population. Relative to the GBR population, the mean polygenic height score in the Sardinian population has decreased significantly since at least 10 kya (p = 0.0082), whereas the change between 20 kya and 10 kya was not significant (p = 0.7406).

Table 1.

Changes in Historical Polygenic Scores in British and Sardinian Populations

Estimated polygenic height scores at different time points (p value)

| GWAS panel | Test population | Present | 1 kya | 5 kya | 10 kya |
| --- | --- | --- | --- | --- | --- |
| UKB | GBR | −0.713 | −0.769 (p = 0.006) | −0.730 (p = 0.707) | −0.801 (p = 0.166) |
| UKB | SDI | −0.975 | −0.932 (p = 0.067) | −0.874 (p = 0.063) | −0.846 (p = 0.091) |
| BBJ | GBR | −0.904 | −0.944 (p = 0.045) | −0.997 (p = 0.049) | −0.970 (p = 0.367) |
| BBJ | SDI | −1.153 | −1.094 (p = 0.015) | −1.041 (p = 0.044) | −0.971 (p = 0.031) |

Tests were conducted in the British (GBR) and Sardinian (SDI) populations at three time points (10 kya, 5 kya, and 1 kya); each p value refers to the change in polygenic height score between that time point and the present. Polygenic height score trajectories estimated from variants ascertained from UKB and BBJ are shown in Figure S7 and Figure 3, respectively.
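The directions of change summarized in Table 1 can be checked with simple arithmetic on the reported point estimates. The sketch below only recomputes differences between table entries; it does not reproduce the p values, which come from the trajectory test, and the helper names are hypothetical.

```python
# Point estimates from Table 1, ordered present, 1 kya, 5 kya, 10 kya.
TIMES = ["present", "1 kya", "5 kya", "10 kya"]
TABLE1 = {
    "UKB": {"GBR": [-0.713, -0.769, -0.730, -0.801],
            "SDI": [-0.975, -0.932, -0.874, -0.846]},
    "BBJ": {"GBR": [-0.904, -0.944, -0.997, -0.970],
            "SDI": [-1.153, -1.094, -1.041, -0.971]},
}


def change_since(panel, pop, time):
    """Present score minus the score at `time`; positive means the score has risen since then."""
    scores = TABLE1[panel][pop]
    return round(scores[0] - scores[TIMES.index(time)], 3)


def sdi_minus_gbr(panel, time):
    """Gap between the Sardinian and GBR point estimates at one time point."""
    i = TIMES.index(time)
    return round(TABLE1[panel]["SDI"][i] - TABLE1[panel]["GBR"][i], 3)


for panel in TABLE1:
    print(panel, "SDI - GBR gap:", {t: sdi_minus_gbr(panel, t) for t in TIMES})
    print(panel, "change since 10 kya:",
          {pop: change_since(panel, pop, "10 kya") for pop in ("GBR", "SDI")})
```

With BBJ-ascertained loci, for example, the Sardinian minus GBR gap is −0.001 at 10 kya and −0.249 at present, with the GBR score rising by 0.066 and the Sardinian score falling by 0.182 over that interval.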

Signals of Polygenic Adaptation in Mainland Europeans

We then focused on evaluating whether there is a signal of polygenic adaptation in mainland Europeans, in whom the original findings of polygenic adaptation4,5,7,8 were recently challenged and attributed to uncorrected stratification.11,12 Using Berg and Coop’s Qx and conditional Z score, we observed a clear signal of adaptation when we ascertained height loci from GIANT, but not when we ascertained them from BBJ (Figure S8), suggesting that the signal from the Qx analysis may be largely driven by uncorrected stratification in the GIANT data. Height loci ascertained from BBJ tended to be rarer in Europe than those ascertained from UKB (Figure S9), and rare BBJ-ascertained loci were less associated with height in Europe than common BBJ-ascertained loci (Figure S10). For example, of the 77 BBJ-ascertained variants with an MAF < 0.1 in Europeans and present in the UKB, 32 (41.56%) were not associated with height in UKB (p > 0.05), suggesting either that differences in LD between UKB and BBJ dissociated the causal SNP from the proxy we selected or that the effect on height was specific to BBJ. We investigated whether these factors affected the power of the Qx statistic. Restricting the analysis to BBJ-ascertained variants that are common or significantly associated with height in the UKB did not qualitatively change the results (Figure S11). Furthermore, direct comparisons of frequency changes of BBJ-ascertained variants in 1000 Genomes European populations (Table S1) or gnomAD populations (Table S2 and Figure S12) did not show any difference. Taken together, we concur with previous authors that the Qx analyses do not support a signal of polygenic adaptation in mainland Europe. Any signal of adaptation in mainland Europe, if it exists, is undoubtedly weaker than that in Sardinia on the basis of our Qx analysis.
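For readers less familiar with the Qx statistic, the sketch below follows the general form of Berg and Coop’s test:5 population-mean scores Z_m = 2 Σ_l β_l p_lm are mean-centered, one population is dropped, and Qx = Z′ᵀ F⁻¹ Z′ / V_A is compared with a χ² distribution with M − 1 degrees of freedom. It is a simplified sketch under stated assumptions, not the implementation used in this study: F is estimated from putatively neutral SNPs as the standardized covariance of allele frequencies, the ancestral frequency ε is approximated by the mean across populations, V_A = 4 Σ β² ε(1 − ε) is scaled so that Cov(Z′) ≈ V_A F, and all frequencies are simulated toy data. The χ² tail is computed with the standard recurrence so that only NumPy is needed; function names are hypothetical.

```python
import math
import numpy as np


def chi2_sf(x, df):
    """Upper-tail chi-square probability, built up two degrees of freedom at a time."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        sf, k = math.exp(-x / 2), 2                 # closed form for df = 2
    else:
        sf, k = math.erfc(math.sqrt(x / 2)), 1      # closed form for df = 1
    while k < df:
        sf += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return sf


def estimate_F(null_freqs):
    """Standardized allele-frequency covariance among M populations (rows = neutral SNPs).

    Deviations from the across-population mean sum to zero, so the last
    population is dropped to keep the (M-1) x (M-1) matrix invertible."""
    p = np.asarray(null_freqs, float)
    eps = p.mean(axis=1, keepdims=True)
    dev = (p - eps) / np.sqrt(eps * (1 - eps))
    return (dev.T @ dev / p.shape[0])[:-1, :-1]


def qx_test(trait_freqs, betas, F):
    """Return (Qx, p value) for trait-increasing allele frequencies (L x M) and effects (L)."""
    p = np.asarray(trait_freqs, float)
    b = np.asarray(betas, float)
    eps = p.mean(axis=1)                         # proxy for the ancestral frequency
    z = 2 * b @ p                                # population-mean polygenic scores, length M
    z_c = (z - z.mean())[:-1]                    # mean-center, then drop the last population
    v_a = 4 * np.sum(b ** 2 * eps * (1 - eps))   # scales F so that Cov(z_c) is about v_a * F
    q = float(z_c @ np.linalg.solve(F, z_c) / v_a)
    return q, chi2_sf(q, p.shape[1] - 1)


# Toy data: 4 populations drifting neutrally around shared ancestral frequencies.
rng = np.random.default_rng(0)
anc = rng.uniform(0.1, 0.9, size=(5000, 1))
null = np.clip(anc + rng.normal(0, 0.02, size=(5000, 4)), 0.01, 0.99)
F = estimate_F(null)

trait_anc = rng.uniform(0.1, 0.9, size=(200, 1))
trait = np.clip(trait_anc + rng.normal(0, 0.02, size=(200, 4)), 0.01, 0.99)
betas = rng.normal(0, 0.05, size=200)
q, pval = qx_test(trait, betas, F)
print(f"Qx = {q:.2f}, p = {pval:.3f}")
```

Dividing the quadratic form by F accounts for shared drift among related populations, so a large Qx reflects more between-population variance in scores than genome-wide relatedness predicts, which is also why uncorrected stratification in the effect sizes can inflate it.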

Haplotype-based analyses might be more sensitive for detecting recent adaptation. For example, the singleton density scores (SDSs) previously computed for a UK population might be better powered than our Qx analyses to detect adaptation over the last 2,000–3,000 years.8 We thus evaluated the signal of polygenic adaptation in 3,195 white British individuals from the UK10K dataset by using SDSs polarized to the height-increasing allele (tSDSs; Methods). Using independent variants surpassing genome-wide significance in UKB, we found that tSDSs for height-increasing alleles were significantly elevated (p = 3.4 × 10−4; Figure 4), consistent with previous reports for the most strongly associated height loci.11,12 More reassuringly, when we examined the significant height loci ascertained from BBJ, we observed a similar pattern (p = 9.1 × 10−4; Figure 4). This suggests that the height-increasing alleles, compared to the height-decreasing alleles at the same sites, were more likely to be found on longer haplotypes, consistent with positive selection in the recent past in the British UK10K population. The tSDS result also corroborated our observation of an upward but marginally insignificant trend in the polygenic score trajectory of the 1000 Genomes GBR population (Figure 3 and Table 1). Together, our results imply that, outside of the Sardinian population, height differences among some populations in mainland Europe might be driven by polygenic selection in the recent past, although this conclusion should be taken with caution because it is not supported by the Qx analyses above.
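The tSDS polarization and an empirical permutation test can be sketched as follows. Two assumptions are made here and are not taken from this section: SDS values are taken to be signed relative to the derived allele, so tSDS equals SDS when the derived allele increases height and −SDS otherwise; and the null distribution is generated by randomly flipping tSDS signs, which is equivalent to randomly choosing which allele is height-increasing. The permutation scheme behind Figure 4 may differ, and the SDS values and function names below are hypothetical.

```python
import random


def tsds(sds, derived_increases_height):
    """Polarize SDS (assumed signed relative to the derived allele) to the height-increasing allele."""
    return [s if up else -s for s, up in zip(sds, derived_increases_height)]


def sign_flip_test(values, n_perm=100_000, seed=1):
    """Mean tSDS and a one-sided empirical p value under random sign flips (illustrative null)."""
    rng = random.Random(seed)
    n = len(values)
    observed = sum(values) / n
    hits = 0
    for _ in range(n_perm):
        perm = sum(v if rng.random() < 0.5 else -v for v in values) / n
        if perm >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical SDS values at five loci and whether the derived allele raises height.
sds = [1.2, -0.8, 0.5, -1.5, 0.9]
derived_up = [True, False, True, False, False]
t = tsds(sds, derived_up)          # [1.2, 0.8, 0.5, 1.5, -0.9]
mean_t, p = sign_flip_test(t, n_perm=10_000)
print(f"mean tSDS = {mean_t:.2f}, empirical p = {p:.4f}")
```

The default of 100,000 permutations mirrors the number reported for Figure 4; the add-one correction keeps the empirical p value above zero.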

Figure 4.

The Average of tSDSs in Height-Associated SNPs Ascertained from UKB and BBJ

The histogram is the null distribution of average tSDSs from 100,000 permutations, from which we estimate the empirical p value shown. The dashed line indicates the observed average of tSDSs.

Discussion

By ascertaining height-associated alleles from the BBJ GWAS panel, we showed that the frequencies of these alleles were not impacted by population structure in Europe, as represented by the first 20 PCs (Figure 1 and Figure S2). Using this set of height loci, our study suggests that height alleles have been under selection in some European populations. Our observation among the Sardinians is qualitatively consistent with previous reports in this population,6 although the signal we find is more attenuated (Figure 2). Moreover, using a recently developed method to infer the trajectory of polygenic height scores, we showed a decrease in polygenic height scores in the Sardinians over roughly the last 10,000 years (Table 1). Using BBJ-ascertained alleles, we could not detect a signature of polygenic adaptation in mainland Europe via frequency-based methods, such as the Qx test (Figures S8 and S11 and Tables S1 and S2). This is consistent with previous suggestions11,12 that selection signals, if any, would be much more attenuated after controlling for population stratification. However, we did find an upward, although marginally insignificant, trend in polygenic height scores in the British population within the last 5,000 years (Figure 3 and Table 1). We also observed a robust signal of selection for height-increasing alleles in an independent British population by using tSDSs (Figure 4), which replicated previous findings based on height loci ascertained from the UKB GWAS panel.11,12 This tSDS result, although not corroborated by our Qx test in 1000 Genomes populations, implies that any robust selection signal in the British population most likely reflects selection in the recent past.

A major consideration in detecting signals of polygenic selection is examining the causal SNPs for the trait of interest. As such, we would advocate focusing on genome-wide significant variants, which could be more robust to confounding by population stratification (although the effect sizes might still reflect some residual stratification in a polygenic score style of analysis). On the other hand, trait-associated variants found in GWASs using genotyping data are only proxies for causal variation. Differences in LD across populations could thus lead to a decrease in the accuracy of polygenic score prediction and lowered power in detecting polygenic selection. Fine-mapping could help identify the causal or the best-tagging variants at associated loci. We elected not to conduct a fine-mapping analysis because currently the largest GWAS datasets are in Europeans and East Asians; fine-mapping approaches using Europeans might cause residual stratification to seep into the ascertainment scheme. As larger non-European GWAS datasets become available, fine-mapping studies outside of Europe might provide a better set of causal alleles associated with height to address the question of adaptation in Europe.

A second consideration is that the lack of signals in the analysis of mainland Europeans using direct comparison of allele frequencies (Table S1) might be partly due to the small sample sizes in publicly available, geographically indexed, whole-genome sequences. Because only a subtle allele frequency shift would be expected in mainland Europe, the imprecision in allele frequency estimates can mask the signal of adaptation (subtle coordinated allele frequency shifts between populations). We note that Berg and Coop’s framework of excess variance tests and conditional Z scores5 could also possibly be impacted by this issue with precision of allele frequency estimates.
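How easily sampling noise can swamp a subtle frequency shift follows from the binomial standard error of an allele-frequency estimate, SE = √(p(1 − p)/2n) for n diploid individuals. The numbers below are hypothetical: sample sizes of 100, 1,000, and 10,000 individuals (100 is roughly the size of a single 1000 Genomes population sample) and a one percentage-point shift at a locus with frequency 0.5.

```python
import math


def freq_se(p, n_individuals):
    """Binomial standard error of an allele frequency estimated from n diploid individuals (2n chromosomes)."""
    return math.sqrt(p * (1 - p) / (2 * n_individuals))


# Hypothetical example: a 0.01 shift at a locus with frequency 0.5.
shift = 0.01
for n in (100, 1000, 10000):
    se = freq_se(0.5, n)
    print(f"n = {n:>5}: SE = {se:.4f}, shift / SE = {shift / se:.2f}")
```

With about 100 individuals, such a shift is under a third of one standard error at a single locus, so per-locus frequency comparisons carry little information about subtle coordinated shifts.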

Concerns about both the imprecise ascertainment of causal alleles and imprecise allele frequency estimates could be partly overcome by haplotype-based methods. Using height-associated loci ascertained from BBJ, for which population stratification should no longer be an issue, we observed a robust signal of recent selection for height-increasing alleles by using the tSDS, which was calculated on a sample of > 3,000 UK individuals8 (Figure 4). This finding was also corroborated by our polygenic score trajectory analysis, which also showed a potential uptick in polygenic height scores over the last 5,000 years or so (Figure 3). Even though the statistical evidence was marginal (Table 1), our observation was consistent with previous findings in the same population15 (although we estimated the ancestral recombination graphs by using a more recent method). The polygenic score trajectory analysis is a hybrid approach based on an inferred ancestral recombination graph that combines both haplotype and genotype information, so the shortcomings of frequency-based approaches could also apply here. Moreover, this approach is currently computationally limited to smaller sample sizes, which might also limit our resolution in the recent past. Therefore, future scalable inference of genome-wide genealogies in an independent northern European population might help address this discrepancy.

Taken together, our findings remain consistent with natural selection leading to shorter stature in the Sardinians. Taking into account recent evidence of selection for shorter stature on the island of Flores, these observations might suggest a general island effect, akin to what has been observed in some island mammals that became adaptively smaller than their mainland counterparts37 (Model 1 in Figure 5). However, because the power of the polygenic score trajectory to detect between-population differences decreases further back in time,15 we cannot definitively infer the onset of adaptation toward shorter height on the island of Sardinia. It is possible that adaptation occurred in the ancestors of modern Sardinians; because of the relative isolation of Sardinia,26 the Sardinian population would then be expected to exhibit the strongest effect among European populations today (Model 2 in Figure 5). This might be consistent with the recent observation that Neolithic European populations were shorter than both their predecessors and their successors in Europe in both genetic height scores and skeletal stature;38 Sardinians retain the largest amount of Neolithic ancestry among the extant European populations tested.26,39 Furthermore, the two models are not mutually exclusive and might act along with non-additive components of height variation6,40 to produce the large difference in height observed between Sardinians and mainland Europeans. Further exploration to differentiate these two models will most likely rely on examining a large number of ancient specimens from Sardinia,41,42 as well as studying other isolated island populations across the world.

Figure 5.

Possible Models of Natural Selection for Human Height

Assuming that natural selection has occurred and can be detected by the framework used in this paper, we illustrate two scenarios that we cannot distinguish, because we cannot infer the timing and location of selection. Each figure represents a modern or ancestral population and is labeled with a pseudo-score representing genetic height in a circle whose size corresponds to this score. Model 1 (left) illustrates the island effect, in which an island population, possibly like the one in Sardinia, was selected for shorter stature after arriving on the island. The non-island population was assumed not to be under selection. Model 2 (right) illustrates an alternative model in which selection occurred much more anciently, such that differentiation between populations had already occurred. The direction of selection in this scenario is unclear, and selection might have acted in opposite directions in different populations because of differential interactions with the geography or environment. Thereafter, height no longer needs to be selected, but subsequent migration between populations establishes the pattern of height variability. Note that if natural selection did not occur, alternative mechanisms, such as genetic drift or GxE interaction, could also lead to differences in genetic and/or phenotypic height. Also note that instead of height itself, a trait or collection of traits proxied by height could be under selection, although we focus on height in this illustration.

When considering Europe at large, our tSDS and polygenic score trajectory findings in the British population suggest that selection for taller height might have occurred in the more recent past, within the last 5,000 years and more likely within the last 1,000–2,000 years. However, this finding could still be consistent with previous suggestions that the post-Neolithic Eurasian Steppe populations might have been selected for increased height38,43 and that admixture with these populations, in different proportions across mainland Europe, produced the tSDS signal and contributed to the pattern of height variation across Europe (Model 2 in Figure 5). Because the signal of height adaptation is weaker in mainland European populations, we cannot rule out higher-order confounding factors, such as more subtle substructure within the British population (e.g., varying degrees of Steppe admixture), or gene-by-environment (GxE) interaction effects among the populations used to ascertain height alleles and test for selection. Large-scale family-based ascertainment of height-associated loci might ultimately be illuminating. What does seem clear is that if selection did occur in present-day mainland European populations, it would have been independent of selection for shorter height in Sardinians or their ancestors, because we observed a diverging trend in polygenic height scores between Sardinians and individuals of British ancestry (Figure 3).

In summary, although the timing and geographical location of selection for height (or alternatively, for a set of traits correlated with height) remain elusive, it seems evident that human height differences in Europe have been driven by selection in at least some instances. Multiple episodes of adaptation might have occurred and influenced the height of past populations. Signatures of these adaptive events might have stemmed from outside of Sardinia, but today they are much more obscured, or have even changed direction, because of recent population migrations and admixture. Furthermore, much of the literature characterizing polygenic score models has focused on their prediction accuracy in the population from which the GWAS summary statistics were derived and on the poor transferability of these models to other populations.18 Our results demonstrate that although polygenic score estimators derived from a distant population might have reduced prediction accuracy (Figure S4), they are also less biased by fine-scale population structure in the population of interest (Figure 1) and can greatly help in addressing important population genetic questions.

Declaration of Interests

The authors declare no competing interests.

Acknowledgments

We gratefully thank Graham Coop, Michael D. Edge, and Michael C. Turchin for helpful comments and discussions. This work is supported in part by the Intramural Research Program of the US National Institutes of Health, National Institute on Aging (N01-AG-1-2109 and HHSN271201100005C) and by start-up funds provided by the Center for Genetic Epidemiology at the Keck School of Medicine of the University of Southern California (USC) (to C.W.K.C.). Computation for this work is supported by USC’s Center for High-Performance Computing (https://hpcc.usc.edu).

Published: June 12, 2020

Footnotes

Supplemental Data can be found online at https://doi.org/10.1016/j.ajhg.2020.05.014.

Contributor Information

Minhui Chen, Email: minhuic@usc.edu.

Charleston W.K. Chiang, Email: charleston.chiang@med.usc.edu.

Web Resources

Supplemental Data

Document S1. Figures S1–S12 and Tables S1 and S2
mmc1.pdf (7.8MB, pdf)
Document S2. Article plus Supplemental Information
mmc2.pdf (8.5MB, pdf)

References

1. Pritchard J.K., Pickrell J.K., Coop G. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr. Biol. 2010;20:R208–R215. doi: 10.1016/j.cub.2009.11.055.
2. Bentham J., Di Cesare M., Stevens G.A., Zhou B., Bixby H., Cowan M., Fortunato L., Bennett J.E., Danaei G., Hajifathalian K., NCD Risk Factor Collaboration (NCD-RisC). A century of trends in adult human height. eLife. 2016;5:e13410. doi: 10.7554/eLife.13410.
3. Grasgruber P., Cacek J., Kalina T., Sebera M. The role of nutrition and genetics as key determinants of the positive height trend. Econ. Hum. Biol. 2014;15:81–100. doi: 10.1016/j.ehb.2014.07.002.
4. Turchin M.C., Chiang C.W., Palmer C.D., Sankararaman S., Reich D., Hirschhorn J.N., Genetic Investigation of ANthropometric Traits (GIANT) Consortium. Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nat. Genet. 2012;44:1015–1019. doi: 10.1038/ng.2368.
5. Berg J.J., Coop G. A population genetic signal of polygenic adaptation. PLoS Genet. 2014;10:e1004412. doi: 10.1371/journal.pgen.1004412.
6. Zoledziewska M., Sidore C., Chiang C.W.K., Sanna S., Mulas A., Steri M., Busonero F., Marcus J.H., Marongiu M., Maschio A., UK10K consortium; Understanding Society Scientific Group. Height-reducing variants and selection for short stature in Sardinia. Nat. Genet. 2015;47:1352–1356. doi: 10.1038/ng.3403.
7. Robinson M.R., Hemani G., Medina-Gomez C., Mezzavilla M., Esko T., Shakhbazov K., Powell J.E., Vinkhuyzen A., Berndt S.I., Gustafsson S. Population genetic differentiation of height and body mass index across Europe. Nat. Genet. 2015;47:1357–1362. doi: 10.1038/ng.3401.
8. Field Y., Boyle E.A., Telis N., Gao Z., Gaulton K.J., Golan D., Yengo L., Rocheleau G., Froguel P., McCarthy M.I. Detection of human adaptation during the past 2000 years. Science. 2016;354:760–764. doi: 10.1126/science.aag0776.
9. Guo J., Wu Y., Zhu Z., Zheng Z., Trzaskowski M., Zeng J., Robinson M.R., Visscher P.M., Yang J. Global genetic differentiation of complex traits shaped by natural selection in humans. Nat. Commun. 2018;9:1865. doi: 10.1038/s41467-018-04191-y.
10. Tucci S., Vohr S.H., McCoy R.C., Vernot B., Robinson M.R., Barbieri C., Nelson B.J., Fu W., Purnomo G.A., Sudoyo H. Evolutionary history and adaptation of a human pygmy population of Flores Island, Indonesia. Science. 2018;361:511–516. doi: 10.1126/science.aar8486.
11. Sohail M., Maier R.M., Ganna A., Bloemendal A., Martin A.R., Turchin M.C., Chiang C.W., Hirschhorn J., Daly M.J., Patterson N. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife. 2019;8:e39702. doi: 10.7554/eLife.39702.
12. Berg J.J., Harpak A., Sinnott-Armstrong N., Joergensen A.M., Mostafavi H., Field Y., Boyle E.A., Zhang X., Racimo F., Pritchard J.K., Coop G. Reduced signal for polygenic adaptation of height in UK Biobank. eLife. 2019;8:e39725. doi: 10.7554/eLife.39725.
13. Wood A.R., Esko T., Yang J., Vedantam S., Pers T.H., Gustafsson S., Chu A.Y., Estrada K., Luan J., Kutalik Z., Electronic Medical Records and Genomics (eMERGE) Consortium; MIGen Consortium; PAGE Consortium; LifeLines Cohort Study. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 2014;46:1173–1186. doi: 10.1038/ng.3097.
14. Kerminen S., Martin A.R., Koskela J., Ruotsalainen S.E., Havulinna A.S., Surakka I., Palotie A., Perola M., Salomaa V., Daly M.J. Geographic Variation and Bias in the Polygenic Scores of Complex Diseases and Traits in Finland. Am. J. Hum. Genet. 2019;104:1169–1181. doi: 10.1016/j.ajhg.2019.05.001.
15. Edge M.D., Coop G. Reconstructing the History of Polygenic Scores Using Coalescent Trees. Genetics. 2019;211:235–262. doi: 10.1534/genetics.118.301687.
16. Haworth S., Mitchell R., Corbin L., Wade K.H., Dudding T., Budu-Aggrey A., Carslake D., Hemani G., Paternoster L., Smith G.D. Apparent latent structure within the UK Biobank sample has implications for epidemiological analysis. Nat. Commun. 2019;10:333. doi: 10.1038/s41467-018-08219-1.
17. Liu X., Loh P.-R., O’Connor L.J., Gazal S., Schoech A., Maier R.M., Patterson N., Price A.L. Quantification of genetic components of population differentiation in UK Biobank traits reveals signals of polygenic selection. bioRxiv. 2018. doi: 10.1101/357483.
18. Martin A.R., Kanai M., Kamatani Y., Okada Y., Neale B.M., Daly M.J. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 2019;51:584–591. doi: 10.1038/s41588-019-0379-x.
19. Akiyama M., Ishigaki K., Sakaue S., Momozawa Y., Horikoshi M., Hirata M., Matsuda K., Ikegawa S., Takahashi A., Kanai M. Characterizing rare and low-frequency height-associated variants in the Japanese population. Nat. Commun. 2019;10:4393. doi: 10.1038/s41467-019-12276-5.
20. Nagai A., Hirata M., Kamatani Y., Muto K., Matsuda K., Kiyohara Y., Ninomiya T., Tamakoshi A., Yamagata Z., Mushiroda T. Overview of the BioBank Japan Project: Study design and profile. J. Epidemiol. 2017;27(3S):S2–S8. doi: 10.1016/j.je.2016.12.005.
21. Hirata M., Kamatani Y., Nagai A., Kiyohara Y., Ninomiya T., Tamakoshi A., Yamagata Z., Kubo M., Muto K., Mushiroda T., BioBank Japan Cooperative Hospital Group. Cross-sectional analysis of BioBank Japan clinical data: A large cohort of 200,000 patients with 47 common diseases. J. Epidemiol. 2017;27(3S):S9–S21. doi: 10.1016/j.je.2016.12.003.
22. Okada Y., Momozawa Y., Sakaue S., Kanai M., Ishigaki K., Akiyama M., Kishikawa T., Arai Y., Sasaki T., Kosaki K. Deep whole-genome sequencing reveals recent selection signatures linked to evolution and disease risk of Japanese. Nat. Commun. 2018;9:1631. doi: 10.1038/s41467-018-03274-0.
23. Loh P.-R., Tucker G., Bulik-Sullivan B.K., Vilhjálmsson B.J., Finucane H.K., Salem R.M., Chasman D.I., Ridker P.M., Neale B.M., Berger B. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 2015;47:284–290. doi: 10.1038/ng.3190.
24. 1000 Genomes Project Consortium. Auton A., Brooks L.D., Durbin R.M., Garrison E.P., Kang H.M., Korbel J.O., Marchini J.L., McCarthy S., McVean G.A. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393.
25. Locke A.E., Steinberg K.M., Chiang C.W.K., Service S.K., Havulinna A.S., Stell L., Pirinen M., Abel H.J., Chiang C.C., Fulton R.S., FinnGen Project. Exome sequencing of Finnish isolates enhances rare-variant association power. Nature. 2019;572:323–328. doi: 10.1038/s41586-019-1457-z.
26. Chiang C.W.K., Marcus J.H., Sidore C., Biddanda A., Al-Asadi H., Zoledziewska M., Pitzalis M., Busonero F., Maschio A., Pistis G. Genomic history of the Sardinian population. Nat. Genet. 2018;50:1426–1434. doi: 10.1038/s41588-018-0215-8.
27. Sidore C., Busonero F., Maschio A., Porcu E., Naitza S., Zoledziewska M., Mulas A., Pistis G., Steri M., Danjou F. Genome sequencing elucidates Sardinian genetic architecture and augments association analyses for lipid and blood inflammatory markers. Nat. Genet. 2015;47:1272–1281. doi: 10.1038/ng.3368.
28. Chang C.C., Chow C.C., Tellier L.C., Vattikuti S., Purcell S.M., Lee J.J. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7. doi: 10.1186/s13742-015-0047-8.
29. Price A.L., Weale M.E., Patterson N., Myers S.R., Need A.C., Shianna K.V., Ge D., Rotter J.I., Torres E., Taylor K.D.D. Long-range LD can confound genome scans in admixed populations. Am. J. Hum. Genet. 2008;83:132–135; author reply 135–139. doi: 10.1016/j.ajhg.2008.06.005.
30. Berisa T., Pickrell J.K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics. 2016;32:283–285. doi: 10.1093/bioinformatics/btv546.
31. McVicker G., Gordon D., Davis C., Green P. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 2009;5:e1000471. doi: 10.1371/journal.pgen.1000471.
32. Loh P.-R., Danecek P., Palamara P.F., Fuchsberger C., Reshef Y.A., Finucane H.K., Schoenherr S., Forer L., McCarthy S., Abecasis G.R. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 2016;48:1443–1448. doi: 10.1038/ng.3679.
33. Speidel L., Forest M., Shi S., Myers S.R. A method for genome-wide genealogy estimation for thousands of samples. Nat. Genet. 2019;51:1321–1329. doi: 10.1038/s41588-019-0484-x.
34. Calò C., Melis A., Vona G., Piras I. Review Synthetic Article: Sardinian Population (Italy): a Genetic Review. Int. J. Mod. Anthropol. 2008;1:39–64.
35. Vona G. The peopling of Sardinia (Italy): history and effects. Int. J. Anthropol. 1997;12:71–87.
36. Campbell C.D., Ogburn E.L., Lunetta K.L., Lyon H.N., Freedman M.L., Groop L.C., Altshuler D., Ardlie K.G., Hirschhorn J.N. Demonstrating stratification in a European American population. Nat. Genet. 2005;37:868–872. doi: 10.1038/ng1607.
37. Millien V. Morphological evolution is accelerated among island mammals. PLoS Biol. 2006;4:e321. doi: 10.1371/journal.pbio.0040321.
38. Cox S.L., Ruff C.B., Maier R.M., Mathieson I. Genetic contributions to variation in human stature in prehistoric Europe. Proc. Natl. Acad. Sci. USA. 2019;116:21484–21492. doi: 10.1073/pnas.1910606116.
39. Haak W., Lazaridis I., Patterson N., Rohland N., Mallick S., Llamas B., Brandt G., Nordenfelt S., Harney E., Stewardson K. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature. 2015;522:207–211. doi: 10.1038/nature14317.
40. Joshi P.K., Esko T., Mattsson H., Eklund N., Gandin I., Nutile T., Jackson A.U., Schurmann C., Smith A.V., Zhang W. Directional dominance on stature and cognition in diverse human populations. Nature. 2015;523:459–462. doi: 10.1038/nature14618.
41. Marcus J.H., Posth C., Ringbauer H., Lai L., Skeates R., Sidore C., Beckett J., Furtwängler A., Olivieri A., Chiang C.W.K. Genetic history from the Middle Neolithic to present on the Mediterranean island of Sardinia. Nat. Commun. 2020;11:939. doi: 10.1038/s41467-020-14523-6.
42. Fernandes D.M., Mittnik A., Olalde I., Lazaridis I., Cheronet O., Rohland N., Mallick S., Bernardos R., Broomandkhoshbacht N., Carlsson J. The spread of steppe and Iranian-related ancestry in the islands of the western Mediterranean. Nat. Ecol. Evol. 2020;4:334–345. doi: 10.1038/s41559-020-1102-0.
43. Martiniano R., Cassidy L.M., Ó’Maoldúin R., McLaughlin R., Silva N.M., Manco L., Fidalgo D., Pereira T., Coelho M.J., Serra M. The population genomics of archaeological transition in west Iberia: Investigation of ancient substructure using imputation and haplotype-based methods. PLoS Genet. 2017;13:e1006852. doi: 10.1371/journal.pgen.1006852.

Associated Data

Supplementary Materials

Document S1. Figures S1–S12 and Tables S1 and S2
mmc1.pdf (7.8MB, pdf)
Document S2. Article plus Supplemental Information
mmc2.pdf (8.5MB, pdf)

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics