Variable prediction accuracy of polygenic scores within an ancestry group

Hakhamanesh Mostafavi; Arbel Harpak; Ipsita Agarwal; Dalton Conley; Jonathan K Pritchard; Molly Przeworski

doi:10.7554/eLife.48376

eLife. 2020 Jan 30;9:e48376. doi: 10.7554/eLife.48376

Editors: Ruth Loos, Michael B Eisen

PMCID: PMC7067566 PMID: 31999256

Abstract

Fields as diverse as human genetics and sociology are increasingly using polygenic scores based on genome-wide association studies (GWAS) for phenotypic prediction. However, recent work has shown that polygenic scores have limited portability across groups of different genetic ancestries, restricting the contexts in which they can be used reliably and potentially creating serious inequities in future clinical applications. Using the UK Biobank data, we demonstrate that even within a single ancestry group (i.e., when there are negligible differences in linkage disequilibrium or in causal allele frequencies), the prediction accuracy of polygenic scores can depend on characteristics such as the socio-economic status, age or sex of the individuals in which the GWAS and the prediction were conducted, as well as on the GWAS design. Our findings highlight both the complexities of interpreting polygenic scores and underappreciated obstacles to their broad use.

Research organism: Human

eLife digest

Complex diseases like cancer and heart disease are caused by the interplay of many factors: the variants of genes we inherit, the lifestyles we lead and the environments we inhabit, plus the interaction of all these factors. In fact, almost every trait, even how many years we will spend studying, is influenced both by our environment and our genes.

To identify some of the genetic factors at play, scientists perform analyses known as genome-wide association studies, or GWAS for short. In these studies, the genomes from many different people are scanned to look for genetic differences associated with differences in traits. By summing up all the small genetic differences, so-called “polygenic scores” can be calculated. When there is a large genetic component to a trait, polygenic scores can be useful predictive tools.

But there is a catch: polygenic scores make less accurate predictions for individuals of a different ancestry than those involved in the GWAS, which limits the use of these tools around the world. Mostafavi, Harpak et al. set out to understand if there are other factors in addition to ancestry that could influence the performance of polygenic scores.

Using data from the UK Biobank, an international health resource that pairs genomic data and clinical information, Mostafavi, Harpak et al. examined polygenic scores among individuals that share a single, common ancestry. These polygenic scores were used to predict three traits (blood pressure, body mass index and educational attainment) in individuals and the predictions were then compared to the actual trait values to see how accurate they were. The analysis revealed that even within a group of people with similar ancestry, the accuracy of polygenic scores can vary, depending on characteristics such as the sex, age or socioeconomic status of the individuals.

This analysis emphasises how variable GWAS and their predictive value can be even within seemingly similar population groups. It further highlights both the complexities of interpreting polygenic scores and underappreciated obstacles to their broad use in medical and social sciences.

Introduction

Genome-wide association studies (GWAS) have now been conducted for thousands of human complex traits, revealing that the genetic architecture is almost always highly polygenic, that is, the bulk of the heritable variation is due to thousands of genetic variants, each with tiny marginal effects (Boyle et al., 2017; Bulik-Sullivan et al., 2015). These findings make it difficult to interpret the molecular basis for variation in a trait, but they lend themselves more immediately to another use: phenotypic prediction. Under the assumption that alleles act additively, a 'polygenic score' (PGS) can be created by summing the effects of the alleles carried by an individual; this score can then be used to predict that individual’s phenotype (Henderson, 1984; Meuwissen et al., 2001; Kathiresan et al., 2008; Lynch and Walsh, 1998). For highly heritable traits, such scores already provide informative predictions in some contexts: for example, prediction accuracies are 24.4% for height (using $R^{2}$ as a measure) (Yengo et al., 2018) and up to 13% for educational attainment (using incremental $R^{2}$) (Lee et al., 2018).
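As a minimal sketch of the additive score just described, the following Python snippet sums estimated per-allele effects weighted by the number of effect alleles an individual carries. The function name, variant identifiers, effect sizes and genotype are hypothetical values chosen for illustration and are not estimates from any GWAS.

```python
def polygenic_score(genotype, effects):
    """Additive PGS: sum over variants of (effect-allele count x estimated effect).

    genotype: dict mapping variant ID -> count of the effect allele (0, 1 or 2)
    effects:  dict mapping variant ID -> estimated per-allele effect from a GWAS
    Variants absent from the genotype are skipped (an illustrative choice).
    """
    return sum(beta * genotype[snp] for snp, beta in effects.items() if snp in genotype)


# Hypothetical effect estimates and one individual's genotype (illustrative only).
effects = {"snp1": 0.5, "snp2": -0.25, "snp3": 0.125}
genotype = {"snp1": 2, "snp2": 1, "snp3": 0}

score = polygenic_score(genotype, effects)
print(score)
```

In practice the effects would come from GWAS summary statistics for a selected set of SNPs, and the score is typically standardized before being used as a predictor.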

This genomic approach to phenotypic prediction has been rapidly adopted in three distinct fields. In human genetics, PGS have been shown to help identify individuals that are more likely to be at risk of diseases such as breast cancer and cardiovascular disease (Khera et al., 2018; Inouye et al., 2018; Mavaddat et al., 2019; Khera et al., 2019). Based on these findings, a number of papers have advocated that PGS be adopted in designing clinical studies, and by clinicians as additional risk factors to consider in treating patients (Torkamani et al., 2018; Khera et al., 2018). In human evolutionary genetics, several lines of evidence suggest that adaptation may often take the form of shifts in the optimum of a polygenic phenotype and hence act jointly on the many variants that influence the phenotype (Pritchard and Di Rienzo, 2010; Berg and Coop, 2014; Höllinger et al., 2019; Sella and Barton, 2019). In this context, the goal is to test whether the set of variants that influence a trait are rapidly evolving across populations or over time (Field et al., 2016; Berg et al., 2019; Uricchio et al., 2019; Edge and Coop, 2019; Racimo et al., 2018; Berg and Coop, 2014). Finally, in various disciplines of the social sciences, PGS are increasingly used to distinguish environmental from genetic sources of variability (Conley, 2016), as well as to understand how genetic variation among individuals may cause heterogeneous treatment effects when studying how an environmental influence (e.g., a schooling reform) affects an outcome (such as BMI) (Barcellos et al., 2018; Davies et al., 2018). In all these applications, the premise is that PGS will ‘port’ well across groups—that is that they remain predictive not only in samples very similar to the ones in which the GWAS was conducted, but also in other sets of individuals (henceforth ‘prediction sets’).

As recent papers have highlighted, however, PGS are not as predictive in individuals whose genetic ancestry differs substantially from the ancestry of individuals in the original GWAS (reviewed in Martin et al., 2019). As one illustration, PGS calculated in the UK Biobank predict phenotypes of individuals sampled in the UK Biobank better than those of individuals sampled in the BioBank Japan Project: for instance, the incremental $R^{2}$ for height is approximately 11% in the UK versus 3% in Japan (Martin et al., 2019). Similarly, using PGS based on Europeans and European-Americans, the largest educational attainment GWAS to date ('EA3') reported an incremental $R^{2}$ of 10.6% for European-Americans but only 1.6% for African-Americans (Lee et al., 2018).

To date, such observations have been discussed mainly in terms of population genetic factors that reduce portability (Martin et al., 2017; Kim et al., 2018; Duncan et al., 2018; De La Vega and Bustamante, 2018; Sirugo et al., 2019; Martin et al., 2019). Notably, GWAS does not pinpoint causal variants, but instead implicates a set of possible causal variants that lie in close physical proximity in the genome. The estimated effect of a given SNP depends on the extent of linkage disequilibrium (LD) with the causal sites (Pritchard and Przeworski, 2001; Bulik-Sullivan et al., 2015). LD differences between populations that arose from their distinct demographic and recombination histories will lead to variation in the estimated effect sizes and hence to variable phenotypic prediction accuracies (Rosenberg et al., 2019). Populations will also differ in the allele frequencies of causal variants. This problem is particularly acute for alleles that are rare in the population in which the GWAS was conducted but common in the population in which the trait is being predicted. Such variants are likely to have noisy effect size estimates in the estimation sample or may not be included in the PGS at all, and yet they contribute substantially to heritability in the target population. Furthermore, causal loci or effect sizes may differ among populations, for instance if the effect of an allele depends on the genetic background on which it arises (e.g., Adhikari et al., 2019). For all these reasons, we should expect PGS to be less predictive across ancestries.

In practice, given that most individuals (about 80%) included in current GWAS are of European ancestry (Popejoy and Fullerton, 2016; Martin et al., 2019), PGS are systematically more predictive in European-ancestry individuals than among other people. As a consequence, the clinical applications and scientific understanding to be gained from PGS will predominantly and unfairly benefit a small subset of humanity. A number of papers have therefore highlighted the importance of expanding GWAS efforts to include more diverse ancestries (Martin et al., 2018; Bien et al., 2019; Wojcik et al., 2019; Martin et al., 2019; Sirugo et al., 2019).

Importantly, factors other than ancestry could also impact the accuracy and portability of PGS. For example, the educational attainment of an individual depends not only on their own genotype, but on the genotypes of their parents, due to nurturing effects (Kong et al., 2018), and of their peers, due to social genetic effects (Domingue et al., 2018), and of course on non-genetic factors. Also, traits such as height and educational attainment show strong patterns of assortative mating, which can distort effect size estimates in GWAS (Domingue et al., 2014; Robinson et al., 2017; Ruby et al., 2018). To what extent these effects remain the same across cultures and environments is unknown, but if they differ, so will the prediction accuracy. More generally, while we still know little about genotype-environment interactions (GxE) in humans, they are well-documented in other species—notably in experimental settings—and would further reduce the portability of PGS across environments (Gibson, 2008; Tropf et al., 2017; Mills and Rahal, 2019; Lynch and Walsh, 1998). In addition, the extent of environmental variability could differ between GWAS and prediction groups, which would change the proportion of the variance in the trait explained by a PGS (i.e., the prediction accuracy). PGS for some traits may also include a component of environmental or cultural confounding with population structure (Sohail et al., 2019; Haworth et al., 2019; Lawson et al., 2020; Kerminen et al., 2018; Berg et al., 2019); this source of confounding can increase or decrease prediction accuracy, depending on the structure in the prediction samples.

Given these considerations, it is important to ask to what extent PGS are portable among groups within the same ancestry. To explore this question, we stratified the subset of UK Biobank samples designated as ‘White British’ (WB) according to some of the standard sample characteristics of GWAS studies: the ages of the individuals, their sex, and socio-economic status. We chose to focus on these particular characteristics because they vary among GWAS samples depending on sample ascertainment procedures. Furthermore, these characteristics have been shown to influence heritability for some traits in a study of a subset of the UK Biobank (Ge et al., 2017), raising the possibility that these choices also influence prediction accuracy. Indeed, for three example traits, we show that there exist major differences in the prediction accuracy of the PGS among these groups, even though they share highly similar genetic ancestries. We further demonstrate for a variety of traits that prediction accuracy differs markedly depending on whether the GWAS is conducted in unrelated individuals or in pairs of siblings, even when controlling for the precision of the estimates. This finding is again unexpected under standard GWAS assumptions; it underscores the importance of genetic effects that are included in estimates from some study designs and not others and highlights underappreciated challenges with GWAS-based phenotypic prediction.

At present, it is difficult to determine the reasons why we see such variable prediction accuracy across these strata and study designs. Contributing factors probably include indirect genetic effects from relatives, assortative mating, varying levels of genetic and environmental variance, GxE interaction effects and perhaps undetected confounding. Nonetheless, our results make clear that the prediction accuracy of PGS can be affected in unpredictable ways by known—and presumably unknown—factors in addition to genetic ancestry.

Results

Sample characteristics of the GWAS and prediction set can influence prediction accuracy even within a single ancestry

We examined how PGS for a few example traits port across samples that are of similar genetic ancestry but differ in terms of some common study characteristics, such as the male:female ratio (henceforth ‘sex ratio’), age distribution, or socio-economic status (SES). To this end, we limited our analysis to the largest subset of individuals in the UKB with a relatively homogeneous ancestry: 337,536 unrelated individuals that were characterized by the UKB, based on self-reported ethnicities as well as genetic analysis, as ‘White British’ (WB) (Bycroft et al., 2018). In all analyses, we further adjusted for the first 20 principal components of the genotype data, to account for population structure within this set of individuals (Materials and methods).

In all analyses, we randomly selected a subset of individuals to be the prediction set; we then conducted GWAS using the remaining individuals and built a PGS model by LD-based clumping of the associations (Materials and methods). To examine the reliability of the prediction, we considered the incremental $R^{2}$, that is, the $R^{2}$ increment obtained when adding the PGS to a model with other covariates (referred to as 'prediction accuracy' henceforth). Whether this measure is appropriate depends on how PGS are to be used; it is not always the most obvious choice in human genetics, where the goal is often to identify individuals at high risk of developing a particular disease (i.e., in the tail of the polygenic score distribution). Nonetheless, because it has been widely reported in discussions of portability across genetic ancestries (e.g., Lee et al., 2018; Martin et al., 2019), we also used it here; later, we also present some results on binary traits using the incremental area under the receiver operating characteristic curve (AUC).
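The incremental $R^{2}$ can be illustrated with a short, self-contained Python sketch: fit the phenotype on covariates alone, fit it again with the PGS added, and take the difference in $R^{2}$. The helper names, the single simulated covariate and the simulated effect sizes are assumptions made for illustration; this is not the analysis code used in the study.

```python
import random


def ols_r2(y, predictors):
    """R^2 of an ordinary least-squares fit of y on the predictors plus an intercept.

    predictors is a list of columns (each a list as long as y).
    """
    n = len(y)
    cols = [[1.0] * n] + [list(c) for c in predictors]
    k = len(cols)
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    a = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols]
         + [sum(ci[t] * y[t] for t in range(n))] for ci in cols]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[p] = a[p], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [x - f * z for x, z in zip(a[r], a[i])]
    b = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(b[j] * cols[j][t] for j in range(k)) for t in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y)
    return 1.0 - ss_res / ss_tot


def incremental_r2(y, covariates, pgs):
    """R^2 gained by adding the PGS to a covariates-only model."""
    return ols_r2(y, covariates + [pgs]) - ols_r2(y, covariates)


# Simulated toy data (assumed values): phenotype = covariate + PGS + noise.
rng = random.Random(0)
n = 2000
age = [rng.gauss(0, 1) for _ in range(n)]
pgs = [rng.gauss(0, 1) for _ in range(n)]
y = [0.5 * a + 0.3 * g + rng.gauss(0, 1) for a, g in zip(age, pgs)]

inc = incremental_r2(y, [age], pgs)
print(round(inc, 3))
```

In this simulation the PGS contributes a variance of 0.09 out of a total of about 1.34, so the printed value should be in the neighborhood of 0.07, up to sampling noise.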

As a first case, we considered the prediction accuracy of a PGS for diastolic blood pressure in prediction sets stratified by sex, motivated by reports that variation in this trait may arise for somewhat distinct reasons in the two sexes (Reckelhoff, 2001; Zhou et al., 2017). We randomly selected males and females as prediction sets (20K individuals each), and used a subset of the rest of the individuals for GWAS, matching the numbers of females and males in the GWAS set (total sample size 122,774); we refer to this mixed set, somewhat loosely, as the 'diverse GWAS.' Adjusting for mean sex effects and medication use (see Materials and methods), the prediction accuracy is about 1.15-fold higher for females than for males (Mann-Whitney $p = 1.1 \times 10^{-5}$; Figure 1A). Thus, despite equal representation of males and females in the GWAS set, the prediction accuracy varies depending on the sex ratio of prediction samples. To examine this further, we repeated the same analysis but performed the GWAS in only one sex (which we refer to as 'stratified GWAS'), using the same sample size as in the diverse GWAS. [Note that the diverse GWAS sample is not a merge of the stratified GWAS samples but a mixed-sex sample of equal sample size to that used in the women-only and the men-only GWAS, to allow for direct comparison between GWASs. Results for the merged GWAS (with a much larger sample size) are presented in Appendix 1—figure 1A.] When the GWAS is conducted only in females, the prediction accuracy is about 1.35-fold higher for females than for males; in turn, when the GWAS is conducted only in males, the prediction accuracy is similar in both sexes and somewhat decreased (Figure 1A).

Figure 1. — Shown are incremental $R^{2}$ values (i.e., the increment in $R^{2}$ obtained by adding a polygenic score predictor to a model with covariates alone) in different prediction sets. Each box-and-whisker plot is computed based on 20 iterations of resampling GWAS and prediction sets. Thick horizontal lines denote the medians. The polygenic scores were estimated in samples of unrelated WB individuals. Phenotypes were then predicted in distinct samples of unrelated WB individuals, stratified by sex (A), age (B) or Townsend deprivation index, a measure of SES (C). In red and green cases, polygenic scores are based on a GWAS in a sample limited to one sex, age or SES group (a 'stratum'). In blue, polygenic scores are based on a GWAS in a diverse sample matching the number of individuals in each stratum. GWAS sample sizes are: 122,774 for all three diastolic blood pressure GWAS samples, 72,328 for all three BMI GWAS samples, 73,280 for years of schooling GWAS in the diverse sample and 73,283 for GWAS in the low SES and high SES samples.

We then considered two other cases, evaluating prediction accuracy in groups stratified by age for BMI and by adult SES for years of schooling, using the Townsend deprivation index as a measure of SES. (Since the UK Biobank participants were enrolled within about a five-year span, differences in age could in principle also be reflective of cohort effects.) Our choices were motivated by prior evidence suggesting that these characteristics of the GWAS influence estimates of SNP-heritability (Branigan et al., 2013; Conley et al., 2015; Belsky et al., 2018; Elks et al., 2012; Ge et al., 2017). We withheld a random set of 10K individuals in each quartile of age and SES for prediction and performed GWAS using a subset of the remaining individuals, matching the sample sizes across quartiles in the GWAS set (total sample sizes of 72,328 and 73,280 for BMI and years of schooling GWAS, respectively). Similar to our observation for diastolic blood pressure, the prediction accuracy varies across prediction sets: it is 1.4-fold higher for BMI in the youngest quartile compared to the oldest (Mann-Whitney $p = 1.1 \times 10^{-5}$; Figure 1B), and 2-fold higher for years of schooling in the lowest SES quartile compared to the highest (Mann-Whitney $p = 2.9 \times 10^{-6}$; Figure 1C). Furthermore, the differences across groups are again sensitive to the choice of the GWAS set: the differences are marked when GWAS is restricted to the youngest quartile for BMI and the lowest SES quartile for years of schooling, but diminished when the GWAS is performed in the oldest and the highest SES quartiles for BMI and years of schooling, respectively (Figure 1B, C). These results remained qualitatively unchanged when we used $R^{2}$ instead of incremental $R^{2}$ to measure prediction accuracy (Appendix 1—figure 2).

In these analyses, we used a p-value threshold of $10^{-4}$ for inclusion of a SNP in the PGS. The choice of how stringent to make the GWAS p-value threshold is important but somewhat arbitrary, with approaches ranging from requiring genome-wide significance to including all SNPs (Weedon et al., 2008; Pharoah et al., 2008; Euesden et al., 2015; Vilhjálmsson et al., 2015; Ware et al., 2017; Mostafavi et al., 2017; Speidel et al., 2019). Often, this threshold is chosen to maximize prediction accuracy in an independent validation set. When the goal is to compare prediction performance across different groups, there is no obvious optimal choice of the p-value threshold. [The optimal p-value in this context will differ across studies, as it depends not only on the genetic architecture and heritability of the trait, but also on the GWAS sample size, that is power (Dudbridge, 2013).] As we show, however, the qualitative trends reported in Figure 1 do not depend on the p-value threshold choice (Appendix 1—figure 3); moreover, the qualitative trends remain when LDpred is used (with a prior probability of 1 on loci being causal; Vilhjálmsson et al., 2015) instead of pruning approaches (Appendix 1—figure 3).
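The combination of LD-based clumping and p-value thresholding can be sketched as follows. This is a simplified greedy version written for illustration, not the pipeline used in the study: the function name, the LD cutoff of $r^{2} = 0.1$, the toy p-values and the absence of a physical-distance window are all assumptions, and only the p-value threshold of $10^{-4}$ comes from the text.

```python
def clump_and_threshold(pvalues, r2, p_threshold=1e-4, r2_threshold=0.1):
    """Greedy LD clumping combined with p-value thresholding (simplified).

    pvalues: dict SNP -> GWAS p-value
    r2:      dict frozenset({snp_a, snp_b}) -> squared correlation; pairs that
             are not listed are treated as unlinked (r2 = 0)
    Returns the index SNPs retained for the score, most significant first.
    """
    index_snps = []
    for snp in sorted(pvalues, key=pvalues.get):
        if pvalues[snp] > p_threshold:
            break  # every remaining SNP is less significant
        # Keep the SNP only if it is not in LD with a more significant kept SNP.
        if all(r2.get(frozenset((snp, k)), 0.0) < r2_threshold for k in index_snps):
            index_snps.append(snp)
    return index_snps


# Toy inputs (assumed): "b" is in strong LD with the more significant "a",
# and "d" does not pass the p-value threshold.
pvalues = {"a": 1e-8, "b": 1e-6, "c": 1e-5, "d": 0.01}
r2 = {frozenset(("a", "b")): 0.8}

kept = clump_and_threshold(pvalues, r2)
print(kept)
```

Raising `p_threshold` admits more SNPs into the score, which is the knob varied in the sensitivity analysis referred to above.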

These results pertain to three exemplar traits and do not speak to the prevalence of this phenomenon. Nonetheless, they demonstrate that the prediction accuracy of a polygenic score can vary markedly depending on sample characteristics of both the original GWAS and the prediction set, even within a single ancestry, and that this variation in prediction accuracy can be substantial—on the same order as reported for different continental ancestries within the UK Biobank (Martin et al., 2019). As one example, the prediction accuracy in East Asian samples, averaged across a number of traits, is about half of that in European samples when GWAS was European-based; when the GWAS is done in the lowest SES group for years of schooling, prediction accuracy in the highest SES group is less than half of that in the lowest SES (Figure 1C). Moreover, whereas for these traits, we had prior information about which characteristics may be relevant, other aspects that vary across sets of individuals are undoubtedly important as well (e.g., smoking behavior and diet may modify genetic effects on lipid traits; Bentley et al., 2019; Telkar et al., 2019), and for other traits of interest, much less may be known a priori.

Possible explanations for the variable prediction accuracy

Our goal in this paper is to highlight that prediction accuracies can vary across groups of highly similar ancestry, rather than to investigate the likely causes for any particular phenotype. Nonetheless, we provide some observations that may cast light on these results. We first note that in these three examples, the prediction accuracies track SNP heritability differences across strata (Figure 2A,B,C). This relationship should be expected, given that the estimation noise decreases with heritability (Appendix 1), and potentially underlies the observation that prediction accuracies using the diverse GWAS sample are often intermediate between those obtained from stratified GWAS samples of equal sample size (Figure 1).

Figure 2. — (A,B,C) The x-axes show heritability estimates (± SE) based on LD score regression in each set. The y-axes show incremental $R^{2}$ values obtained using the procedure described in Figure 1, with GWAS performed in a pooled sample of all strata and testing in stratified prediction sets (see Materials and methods); points and bars show mean and central 80% range computed based on 20 iterations of resampling GWAS and prediction sets. ‘Q’ denotes quartile of age and SES in (B,E) and (C,F), respectively. (D,E,F) The x-axes show phenotypic variance estimates (± SE) across strata after adjusting for covariates (sex, age and 20 PCs). If the heritability differences across strata are due to differences in environmental variance alone, with genetic variance constant, then heritability should be inversely proportional to phenotypic variance. The best-fitting model for this inverse proportionality (dashed line, simple linear regression) provides a poor fit to the observations.

Perhaps the simplest explanation for these findings would be that heritabilities, and hence prediction accuracies, vary only because of differences in the extent of environmental variance across strata, while the genetic variance is the same. We can test this hypothesis by examining whether the heritability decreases with increasing phenotypic variance (more precisely whether it is inversely proportional to it), as expected if the genetic variance is fixed across strata. What we find instead is that the estimated SNP heritabilities for all three traits increase or remain the same with increasing phenotypic variance (Figure 2D,E,F). Thus, for these traits at least, the variable prediction accuracy is not simply the result of differences in the extent of environmental heterogeneity across strata.
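The expectation being tested can be written out explicitly. Here $V_G$ denotes the genetic variance and $V_{E,s}$ and $V_{P,s}$ the environmental and phenotypic variances in stratum $s$; this notation is introduced for illustration, and the decomposition assumes no covariance between genetic and environmental contributions.

$$
h^{2}_{s} = \frac{V_G}{V_{P,s}}, \qquad V_{P,s} = V_G + V_{E,s}.
$$

If $V_G$ is the same in every stratum, then $h^{2}_{s} \propto 1/V_{P,s}$, so strata with larger phenotypic variance should show lower heritability. Heritability that stays flat or rises with phenotypic variance therefore implies that $V_G$ itself differs across strata.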

Another possibility is that there is an interaction between genetic effects and sample characteristics, for instance that different sets of genetic variants contribute to blood pressure levels in males and females or to BMI across different stages of life. [Although such interactions could in some contexts be thought of as reflecting GxE, we use the term ‘sample characteristic’ rather than ‘environment’, as environment has different meaning across disciplines, referring in some contexts only to factors that are exogenous to genetics. Viewed in this lens, SES in adulthood cannot be interpreted as exogenous, because it is in part determined by educational achievement, which is itself influenced by genetic factors, and similarly it is questionable whether age or sex are environments.] This explanation is not supported by bivariate LD score regression, which indicates that the genetic correlations across strata are close to 1 (Appendix 1—table 2; Materials and methods). Yet when we re-estimate individual SNP effects in the prediction sets for SNPs ascertained in the original GWAS, the estimated effects of trait-increasing alleles are larger in the groups with higher prediction accuracy (Appendix 1—figure 4; Materials and methods).

One simple model that could reconcile these findings is if effect sizes are highly correlated across the groups, but systematically larger in those groups with higher prediction accuracy. This explanation is reminiscent of the ‘amplification’ model of genetic influences on cognition during development (Briley and Tucker-Drob, 2013).
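To see why rescaled effects are compatible with both observations, consider a sketch in which the per-allele effects in stratum 2 are a constant multiple of those in stratum 1. The scaling factor $c$ and the superscript notation are assumptions made for illustration, as is the idealization of exact proportionality.

$$
\beta_{j}^{(2)} = c\,\beta_{j}^{(1)} \ \ (c > 0) \quad\Rightarrow\quad r_g = 1, \qquad V_G^{(2)} = c^{2}\, V_G^{(1)}.
$$

With $c > 1$, the genetic correlation between strata is 1, yet the genetic variance is $c^{2}$-fold larger in stratum 2. Heritability, and with it the expected prediction accuracy, is then higher in stratum 2, provided the non-genetic variance does not grow by as large a factor.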

Other factors complicate interpretation, however, and may also contribute to our observations. In particular, for the case of years of schooling, conditioning on adult SES induces a form of range restriction, which could contribute to variable prediction accuracy across strata. We note, however, that we see highly variable prediction accuracies across SES strata even when the GWAS is conducted in a diverse sample (i.e., including individuals from all strata) (Figure 1C); in that regard, our approach mimics what happens in practice when polygenic scores are used to predict phenotypes in a sample with a smaller range of SES (e.g., Rimfeld et al., 2018). More generally, although this type of range restriction is artificially amplified in our example, SES differences may often be a problem for GWAS in which the sample is not representative of the population; for instance, the most recent major GWAS of educational attainment (Lee et al., 2018) included numerous medical data sets and the 23andMe data set, which are not representative of the national population.

Another potentially important factor is that the adjustment for PCs may not be a sufficient control for the different ways in which population structure can confound GWAS results (Vilhjálmsson and Nordborg, 2013), leading to variable prediction accuracy across strata if they differ in their population structure. To examine this possibility, we repeated the analysis in Figure 1 but using a linear mixed model (LMM) approach (including PCs among other covariates; see Materials and methods), and obtained qualitatively similar results (Appendix 1—figure 5). Although not a perfect fix (Listgarten et al., 2013; Mathieson and McVean, 2013), the fact that we obtain similar results using PCs and LMM suggests that confounding due to population stratification in the UK Biobank alone does not explain the variable prediction accuracies across strata.

Obstacles to portability explored through a comparison of standard and family-based GWAS

Beyond sample characteristics such as age or sex, a number of other factors may shape the portability of scores across groups of similar ancestry. Standard GWAS is done in samples of individuals that deliberately exclude close relatives; as implemented, it detects direct effects of the genetic variants, but also any indirect genetic effects of parents, siblings, or peers, effects of assortative mating among parents, and potentially environmental differences associated with fine-scale population structure (Young et al., 2018; Trejo and Benjamin, 2019; Kong et al., 2018; Lee et al., 2018; Berg et al., 2019). Given that many of these effects are likely to be culturally mediated (Stulp et al., 2017; Selzam et al., 2019), it seems plausible that they may vary within as well as across groups of individuals with different ancestries. If culturally-contingent effects contribute to GWAS estimates (and hence to PGS), they may lead to differences in the prediction accuracy in samples unlike the original GWAS.

To demonstrate that these considerations are not just hypothetical, we compared the prediction accuracy when the PGS is trained on ‘unrelated’ individuals such as those used in a standard GWAS to one obtained from a sibling-based (or ‘sib-based’) GWAS (Materials and methods). In the latter, genotype differences between sibs, a result of random Mendelian segregation in the parents, are tested for association with the phenotypic differences between them. Because the tests depend on phenotypic differences between siblings who, of course, have the same parents, these tests are conditioned on the parental genotypes and hence exclude many of the indirect effects signals that may be picked up in standard GWAS (Appendix 1). Differences between standard and sib-based GWAS are thus informative about the presence of factors other than direct genetic effects (Wood et al., 2014; Trejo and Benjamin, 2019; Lee et al., 2018; Berg et al., 2019; Selzam et al., 2019).
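The logic of the sib-based test can be illustrated with a small simulation, given here as a sketch under strong simplifying assumptions rather than as the study's method: a single SNP, random mating, and an indirect effect that depends on the summed parental genotypes. All parameter values and function names are hypothetical. Under these assumptions, a standard regression in unrelated individuals recovers roughly the sum of the direct and indirect effects, whereas regressing sibling phenotypic differences on genotypic differences recovers roughly the direct effect alone.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(1)
DIRECT, INDIRECT, FREQ = 0.2, 0.1, 0.5   # assumed toy parameters
N_FAMILIES = 20000


def parent():
    """Two alleles drawn at random: 1 = effect allele, 0 = other allele."""
    return [int(rng.random() < FREQ), int(rng.random() < FREQ)]


def child(mother, father):
    """Effect-allele count after Mendelian segregation in each parent."""
    return rng.choice(mother) + rng.choice(father)


g1, g2, y1, y2 = [], [], [], []
for _ in range(N_FAMILIES):
    m, f = parent(), parent()
    family_effect = INDIRECT * (sum(m) + sum(f))   # shared by both sibs
    for g_list, y_list in ((g1, y1), (g2, y2)):
        g = child(m, f)
        g_list.append(g)
        y_list.append(DIRECT * g + family_effect + rng.gauss(0, 1))

# Standard design: one individual per family, phenotype on genotype.
standard_est = slope(g1, y1)
# Sib design: phenotypic difference on genotypic difference within pairs.
sib_est = slope([a - b for a, b in zip(g1, g2)],
                [a - b for a, b in zip(y1, y2)])
print(round(standard_est, 2), round(sib_est, 2))
```

Because the family-level term is identical for the two siblings, it cancels in the within-pair differences, which is why the sib-based estimate is insensitive to it.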

A challenge in this comparison is that the UKB contains only ~22K sibling pairs, ~19K of whom are labeled as ‘White British’ (WB). The siblings are similar to the unrelated individuals in terms of ages, SES distributions and genetic ancestries (Appendix 1—figures 6 and 7) but include a higher proportion of females; this difference is unlikely to influence our analyses (see below). Although 19K pairs is a large number, it still provides too little power to discover trait-associated SNPs when compared to a standard GWAS using the much larger sample of unrelated WB individuals (~340K).

To increase power and enable a direct comparison between the two designs, we split the SNP ascertainment and effect estimation steps as follows (Figure 3A): we identified SNPs using a standard GWAS with a large sample size (median ~270K across the traits considered) (see Materials and methods). We then estimated the effect of each significant SNP using (i) a sib-based association test and (ii) a standard association test. We chose the size of the estimation set in (ii) such that the median standard error of effect estimates in (i) and (ii) is approximately equal. We then compared the prediction accuracy of the two PGS obtained in this way (‘standard PGS’ and ‘sib-based PGS’) in an independent prediction set of unrelated individuals; as we show in Appendix 1, our approach leads to highly similar prediction accuracies of the two approaches under a model with direct effects only (see Materials and methods for details). A further advantage is that the two scores are compared for the same set of SNPs, such that LD patterns and allele frequency differences do not come into play.
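One simple way to obtain a starting value for the size of the standard estimation set is to use the approximation that the standard error of a regression coefficient scales as $1/\sqrt{n}$. The sketch below is an assumption about how such matching could be approached, not a description of the procedure in Materials and methods; the function name and all numbers are hypothetical.

```python
import math


def matched_sample_size(pilot_n, pilot_median_se, target_median_se):
    """Sample size at which the median SE is expected to equal the target.

    Relies on the approximation that the standard error of a regression
    coefficient scales as 1 / sqrt(n), all else being equal.
    """
    return math.ceil(pilot_n * (pilot_median_se / target_median_se) ** 2)


# Hypothetical numbers: a pilot standard GWAS in 100,000 unrelated individuals
# gives a median SE of 0.004; the sib-based estimates have a median SE of 0.012.
n_standard = matched_sample_size(100_000, 0.004, 0.012)
print(n_standard)
```

The resulting size could then be refined empirically, for example by subsampling unrelated individuals and recomputing the median standard error until the two designs agree.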

Figure 3. — (A) After ascertaining SNPs in a large sample of unrelated individuals, we estimated the effects of these SNPs with a standard regression using unrelated individuals and, independently, using sib-regression. We then used the polygenic scores for prediction in a third sample of unrelated individuals. We chose the sample size of the standard PGS estimation set such that median effect estimate SEs are equal in the two designs, thereby ensuring equal prediction accuracy under a vanilla model with no indirect effects or assortative mating. Numbers in parentheses are median sample size in each set across 20 traits (see Materials and methods and Appendix 1—table 1 for the definition of each trait, and Appendix 1—table 3 for sample sizes for each trait). (B) Ratio of prediction accuracy in the two designs across 20 traits. For each trait, we performed 10 resampling iterations of unrelated individuals into three sets for discovery, estimation and prediction (small points). Large points show median values. (C-F) We repeated this procedure with different discovery-set p-value thresholds for including a SNP in the polygenic score. The higher the p-value threshold is, the more SNPs are included. For each p-value threshold, small points show 10 iterations as described and large points show median values. Shown are a subset of traits, with traits appearing in (B) but not shown here presented in Appendix 1—figure 12.

We applied the approach to 20 traits, focusing on traits with relatively high heritability estimates as well as social and behavioral traits that have been the focus of recent attention in social sciences. For the majority of the traits, such as diastolic blood pressure, BMI, and hair color, the prediction accuracies of standard and sib-based PGS were similar (Figure 3B), as expected under standard GWAS assumptions and as observed for traits simulated under these assumptions (Appendix 1—figure 8). However, for height and for a range of social and behavioral traits, such as years of schooling, pack years of smoking and household income, the prediction accuracy of the sib-based PGS was substantially lower than that of the standard PGS (Figure 3B). [We caution that, because the first step of our study design is to identify SNPs that are associated with the trait in a large set of unrelated individuals and we subsequently match the sampling variances of sib- and standard GWAS, rather than identify distinct sets of SNPs separately in the two designs, the ratio of prediction accuracies that we obtain cannot be directly compared to those reported in other studies.]

A number of factors could contribute to the differences between prediction accuracies for PGS based on sibs versus unrelated individuals, including confounding effects of population stratification, indirect genetic effects from parents and assortative mating. The relative importance of each factor will vary across traits (Rosenberg et al., 2019; Kong et al., 2018; Haworth et al., 2019; Ruby et al., 2018; Selzam et al., 2019). For educational attainment, this gap is likely to reflect at least in part the documented contribution of indirect genetic effects to the standard PGS (Lee et al., 2018; Kong et al., 2018; Young et al., 2018). We show in Appendix 1 that in the presence of indirect genetic effects mediated through parents, standard PGS outperforms sib-based PGS unless direct and indirect effects are strongly anticorrelated (Appendix 1—figure 9), which seems unlikely to be the case for years of schooling. The difference in the performance of sib-based and standard PGS observed for other social and behavioral outcomes, such as household income and age at first sexual intercourse (Figure 3B), may reflect a similar phenomenon. An additional contribution to divergent prediction accuracies could come from indirect effects among siblings, which would also contribute differentially to standard and sibling-based PGS. For height, there may be an important contribution of assortative mating to the difference in prediction accuracies (Wood et al., 2014; Robinson et al., 2017; Lee et al., 2018). In Appendix 1, we show that under a simple model of positive assortative mating, the prediction accuracy based on a standard PGS is higher than that of a sib-based PGS (Appendix 1—figure 10). We further confirmed that the difference in the sex ratio of the siblings and unrelated individuals, mentioned earlier, has a negligible effect on these differences, though it may underlie the slightly lower prediction accuracy of the standard PGS for pulse rate (Appendix 1—figure 11).

The lower prediction accuracies for PGS based on sib-based GWAS indicate that complications such as assortative mating or indirect effects contribute to the standard GWAS estimates. In the absence of these complications, we ensure that prediction accuracies are comparable by matching the sampling errors of the two approaches (Figure 3A). In the presence of these complications, the magnitude of the ratio of prediction accuracies should reflect the strength of assortative mating, the relative contribution of indirect genetic effects compared to direct effects, and so forth. However, interpreting the magnitude of the deviation from 1 is far from straightforward: as we show in Appendix 1, the relative difference in prediction accuracies between the two approaches stems in part from the noise-to-signal ratio for the effect estimates in sib-based versus standard GWAS (Appendix 1, Appendix 1—figures 9 and 10), and as a result also depends on features of the comparison like the sample sizes used and the PGS model.

Motivated by these considerations, we examined how the prediction accuracy varies when progressively relaxing the GWAS p-value threshold for inclusion of SNPs, that is when including more weakly associated SNPs in the PGS. [In Figure 3B, results are shown for the p-value threshold that maximizes the prediction accuracy of the standard PGS, replicating the practice when comparing populations of different ancestry; Martin et al., 2019.] For hair color and diastolic blood pressure, there is little to no difference in prediction accuracy between the two estimation methods, regardless of the number of SNPs included in the score (Figure 3C,D). In contrast, for height, standard and sib-based PGS perform similarly when based on the most significantly associated SNPs, but standard PGS progressively outperforms sib-based PGS when more SNPs are included (Figure 3E). Similarly, the difference in prediction accuracy between sib-based and standard PGS changes markedly for years of schooling, household income and other social and behavioral traits (Figure 3F and Appendix 1—figure 12). The growing gap in performance with increasing p-value threshold likely reflects a combination of an increasing noise-to-signal ratio for the effect estimates in sib-based versus standard GWAS (see Appendix 1) and changes in the relative importance of direct effects versus other factors such as indirect parental effects and assortative mating.

In summary, the differences between the prediction accuracies of standard and sib-based PGS seen for a number of traits (Figure 3B), notably social and behavioral ones, demonstrate that standard GWAS estimates often include a substantial contribution of factors other than direct effects. In these cases, even if the power to detect direct effects were comparable, standard GWAS would lead to higher prediction accuracy than sib-GWAS. In some contexts that may be a sufficient reason to rely on PGS derived from standard GWAS. However, that gain stems from the inclusion of factors such as indirect effects and assortative mating that are likely to be modulated by SES, environment and culture (e.g., Selzam et al., 2019; Stulp et al., 2017). Thus, the increased prediction accuracy likely comes at a cost of not always porting well across groups, even of the same ancestry, in ways that may be difficult to anticipate.

Discussion

Although the conversation around the portability of PGS has largely focused on genetic ancestries, our results show that prediction accuracy can also differ, in some cases substantially, across groups of similar ancestry—even due to basic study design differences such as age, sex or SES composition. When due only to increased environmental variance, such decreased accuracy may not pose a problem, at least for certain applications. But as we have shown, differences in the degree of environmental variance are not the primary explanation for the patterns we report (Figure 2), and other factors, including differences in the magnitude of genetic effects among groups, indirect effects and assortative mating, also lead to differences in the prediction accuracy of PGS, in ways that may make applications of phenotypic prediction less reliable, even within a single ancestry group. For some traits, there is prior information about which factors are likely to be important, but not always, and even for well-studied traits, it may be difficult to enumerate all the influential factors. As an example, we considered the accuracy of the polygenic score for years of schooling and found that it also varies somewhat depending on whether individuals have no sibling or one sibling in the prediction sets (Materials and methods; Appendix 1—figure 13).

Following the discussion of portability across ancestries, we have focused on incremental $R^{2}$ as a measure of portability. This measure is less directly informative when the goal is to use PGS to reliably identify individuals in the tails of the distribution, that is, those at elevated risk of developing a disease, which is the main application of PGS in human genetics, as distinct from social science or evolutionary biology. Nonetheless, the same concerns raised here are likely to apply. To illustrate that point, we considered binary versions of the traits in Figure 1, namely 'hypertension' (defined as diastolic blood pressure > 110 mmHg), 'obesity' (defined as BMI > 35 kg/m²), and 'college completion', and evaluated the prediction accuracy as measured by incremental AUC (Appendix 1—figure 14). The qualitative results are the same as in Figure 1. We also examined how incremental AUC varies by sex for five binary disease traits that we chose because they have relatively high heritability. For three of them, hypothyroidism and two cardiovascular outcomes, prediction accuracy varies depending on both the GWAS and prediction sets (Appendix 1—figure 15).

Thus, for both quantitative and binary traits, the question of the domain over which a PGS applies is not just about LD patterns, allele frequencies or GxG effects but also about the extent of environmental and genetic variance, GxE, as well as the contribution of direct effects versus indirect effects, assortative mating and environmental confounding. An important implication is that differences in prediction accuracies among groups with distinct ancestries cannot be interpreted exclusively or even primarily in terms of population genetic parameters when these groups differ dramatically in their SES (Chetty and Hendren, 2018; Conley, 2010; Nuru-Jeter et al., 2018; Reich, 2017) and other factors that may affect portability—especially when the relative contribution of these factors to GWAS signals remains unknown (Young et al., 2019; Mills and Rahal, 2019). Thus, efforts to conduct GWAS in groups that vary in ancestry and geographic locations will need to be accompanied by a careful examination of variation in portability along other dimensions.

While these results raise the question of how to best construct a PGS, the answer is not obvious, and likely depends on the specific trait and samples. For example, for the three cases shown in Figure 1, considering a fixed GWAS sample size, the highest prediction accuracy is attained with a GWAS sample limited to some stratum (e.g., women for diastolic blood pressure). Yet a much larger merged data set containing the union of strata generates the most predictive PGS (Appendix 1—figure 1). Together, these observations suggest a trade-off between the factors that are shared among strata and lead to increased power with sample size and those that differ across strata and underlie the variable prediction accuracy. In principle then, if influential factors were known, the composition of the GWAS sample could be optimized to yield the highest accuracy in a given prediction set, but how much each stratum should be weighted will depend on a number of factors such as the genetic and environmental variance in each stratum, genetic correlation across strata, and sample sizes. Moreover, factors such as assortative mating and indirect effects are soaked up into the GWAS estimates—and critically also into the SNP heritability estimates. Thus, the choice of a GWAS sample is about more than power; it is implicitly making a choice about all sorts of sample characteristics that may or may not hold true of the prediction set.

In that regard, it is worth noting that while classical twin studies were often constituted to be representative of a reference population (often national in nature) (Polderman et al., 2015; Branigan et al., 2013), the same is not true of most contemporary human genetic datasets, which are skewed towards medical case-control studies, biobanks that are opt-in (and thus tend to include individuals who are wealthier and better educated than the population average) or direct-to-consumer proprietary genetic databases (which are even more skewed along these dimensions) (Lee et al., 2018). For instance, individuals in UK Biobank have higher SES than the rest of the British population (Fry et al., 2017) and are presumably self-selected for a certain level of interest in biomedical research. These factors alone raise challenges as to the broad portability of PGS derived from them. More generally, it seems plausible that individuals included in a GWAS differ from those that, for myriad reasons, do not end up participating (Taylor et al., 2018), in ways that make it difficult to predict the domain over which GWAS-based estimates can be reliably generalized.

One fruitful way forward may be to study data from related individuals, in which it should be possible to decompose the components of the signals identified in GWAS into direct and indirect effects, the degree of assortative mating and the contribution of residual stratification (Zhang et al., 2015; Young et al., 2018; Kong et al., 2018). Not only will this decomposition help us to better interpret the results of GWAS and the resulting PGS, it will make it possible to examine under which circumstances, and for which phenotypes, components port more reliably to other sets of individuals, both unrelated and related. Ultimately, we envisage that in order to be broadly applicable, GWAS-based phenotypic prediction models will need to include not only a PGS but some study characteristics, other social and environmental measures and, perhaps crucially, their interactions.

Materials and methods

UK Biobank

The UK Biobank (UKB) is a large study of about half a million United Kingdom residents, recruited between 2006 and 2010 (Bycroft et al., 2018). In addition to genetic data, hundreds of phenotypes were collected through measurements and questionnaires at assessment centers, and by accessing medical records of the participants.

Inclusion criteria

In this study, we focused on 408,434 participants who passed quality control (QC) measures provided by UKB; specifically, for whom the reported sex (QC parameter ‘Submitted.Gender’) matched their inferred sex from genotype data (QC parameter ‘Inferred.Gender’); who were not identified as outliers based on heterozygosity and missing rate (QC parameter ‘het.missing.outliers’==0); and did not have an excessive number of relatives in the database (QC parameter ‘excess.relatives’==0). We further selected individuals identified by UKB to be of ‘White British’ (WB) ancestry (QC parameter ‘in.white.British.ancestry.subset’==1), which is a label that refers to those who, when given a set of choices, self-reported to be of ‘White’ and ‘British’ ethnic backgrounds and, in addition, were tightly clustered in a principal component analysis of the genotype data, as detailed in Bycroft et al. (2018). We excluded individuals that had withdrawn from the UK Biobank by the time of the analyses here. For a given trait, we further conditioned on individuals for whom the trait value was reported.
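The inclusion criteria above amount to a conjunction of simple per-sample checks. The following is a minimal sketch of that logic, assuming each sample is held as a dictionary keyed by the UKB QC parameter names quoted in the text; the `withdrawn` key and the function names are hypothetical and are used only for illustration.

```python
from typing import Dict, List


def passes_inclusion(sample: Dict[str, object]) -> bool:
    """Return True if a sample meets the QC and ancestry criteria in the text."""
    return (
        # Reported sex must match the sex inferred from genotype data.
        sample["Submitted.Gender"] == sample["Inferred.Gender"]
        # Not an outlier for heterozygosity and missing rate.
        and sample["het.missing.outliers"] == 0
        # No excessive number of relatives in the database.
        and sample["excess.relatives"] == 0
        # Member of the 'White British' ancestry subset.
        and sample["in.white.British.ancestry.subset"] == 1
        # Hypothetical flag: participant has not withdrawn from the study.
        and not sample.get("withdrawn", False)
    )


def select_for_trait(samples: List[Dict[str, object]], trait: str) -> List[Dict[str, object]]:
    """Keep included samples for which the trait value was reported."""
    return [s for s in samples if passes_inclusion(s) and s.get(trait) is not None]
```

The trait-specific filter is kept separate from the QC filter because, as described, the set of analyzed individuals differs from trait to trait.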

Phenotype data

We focused on 25 traits, including traits with relatively high heritability estimates as well as social and behavioral traits that have been the focus of recent attention in social sciences (see Appendix 1—table 1 for a complete list of phenotype data used in this work, and their corresponding numeric field codes in the UKB data showcase). We calculated the phenotype ‘years of schooling’ by converting the maximal educational qualification of the participants to years following Okbay et al. (2016) (Appendix 1—table 4). For diastolic blood pressure, pulse rate, and forced vital capacity, we took the average of the first two rounds of measurement taken during the same examination at UKB assessment centers. We adjusted the diastolic blood pressure levels for blood pressure lowering medication following Evangelou et al. (2018) by shifting the values upward by 10 mmHg for individuals taking medication. For hand grip strength, we took the average of the measurements for the two hands. For categorical phenotypes, we assigned integer values to each category (Appendix 1—table 1). For hair color, individuals who reported hair color variable ‘Other’ were excluded from the analyses. We considered binary traits, ‘hypertension’ defined as diastolic blood pressure >110 mmHg, ‘obesity’ defined as BMI >35 kg/m², and ‘college completion’ defined based on attainment of a college or a university degree. Disease outcomes were ascertained using self-reported information and/or using the hospital inpatient main and secondary diagnoses coded according to the International Classification of Diseases (ICD-9 and ICD-10). Hypothyroidism, type 2 diabetes, and rheumatoid arthritis were ascertained based on ICD-10 codes of E03.X, E11.X and M06.X, respectively. Myocardial infarction was ascertained based on ICD-9 codes of 410.9, 411.9, 412.9, or ICD-10 codes of I21.X, I22.X, I23.X, I24.1, I25.2 following Khera et al. (2018), or based on myocardial infarction outcome data among the UK Biobank’s algorithmically-defined outcomes. We also considered the binary outcome of ever being diagnosed to have had a heart attack, angina or stroke. For a subset of individuals, multiple measurements of a phenotype were provided, corresponding to multiple visits to UKB assessment centers; in those cases, we used the measurements during the first visit.
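As a small worked illustration of the blood pressure processing described above, the sketch below averages the first two readings, shifts the value upward by 10 mmHg for individuals on blood pressure lowering medication, and applies the stated hypertension threshold. The function and argument names are hypothetical; only the 10 mmHg shift and the 110 mmHg cutoff are taken from the text.

```python
def adjusted_dbp(first_reading: float, second_reading: float,
                 on_medication: bool, shift_mmhg: float = 10.0) -> float:
    """Average two diastolic readings and adjust for medication use."""
    mean_reading = (first_reading + second_reading) / 2.0
    # Medicated individuals have their value shifted upward by a fixed amount.
    return mean_reading + shift_mmhg if on_medication else mean_reading


def is_hypertensive(dbp_mmhg: float, threshold_mmhg: float = 110.0) -> bool:
    """Binary 'hypertension' outcome: diastolic blood pressure above the threshold."""
    return dbp_mmhg > threshold_mmhg
```

For example, readings of 80 and 84 mmHg give 82 mmHg for an unmedicated individual and 92 mmHg for a medicated one.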

Genotype data

UKB participants were genotyped on either of two similar genotyping arrays, UK Biobank Axiom and UK BiLEVE arrays, at a total of ~850K markers. We focused on autosomal bi-allelic SNPs shared between both arrays, and used plink v. 1.90b5 (Chang et al., 2015) to retain SNPs with call rate >0.95, minor allele frequency >10⁻³, and Hardy-Weinberg equilibrium test p-value >10⁻¹⁰ among the WB samples, resulting in 616,323 SNPs.

GWAS and trait prediction methods

GWAS by sample characteristics

We focused on a set of 337,488 WB samples that were identified by the UKB to be ‘unrelated’ (sample QC parameter ‘used.in.pca.calculation’==1 as provided by UKB), defined such that no pairs of individuals are inferred to be 3rd-degree relatives or closer. We split the sample into non-overlapping sets of individuals by one of the following factors: age at recruitment (in years), sex, and Townsend deprivation index at recruitment (used as a proxy for socio-economic status or SES; specifically, we take the negative of the Townsend deprivation index as a measure of SES). For SES and age, we divided the sample into four sets: Q1 [minimum value, first quartile], Q2 (first quartile, second quartile], Q3 (second quartile, third quartile], and Q4 (third quartile, maximum value]. We randomly selected 10K samples in each SES and age group, and 20K males and 20K females as held-out prediction sets, and performed GWAS using the remaining samples, matching sample sizes across groups in the GWAS set. We performed nine GWASs: for years of schooling in SES Q1 and SES Q4 (sample size 73,283 for each), and in a diverse sample with equal number of individuals from all four groups (sample size 73,280); for body mass index (BMI) in Q1, Q4, and in a diverse sample with equal number of individuals from all four groups (sample size 72,328 for each); and for diastolic blood pressure in males, females, and in a diverse sample with equal number of males and females (sample size 122,774 for each). We performed all GWASs using plink v. 2.0 (with the flag --linear), adjusting for sex, age (at recruitment) and first 20 PCs as covariates. PCs are principal components of the genotype data, as provided by UKB, calculated using the entire cohort (not just WB individuals). For a subset of cases (where GWAS was performed in samples restricted by characteristics described above), we additionally performed association tests using a linear mixed model (LMM) as implemented in BOLT-LMM v. 2.3.2 (Loh et al., 2015), using LD scores computed from 1000 Genomes European-ancestry samples, with sex, age and first 20 PCs as covariates. The GWAS summary statistics were used to construct PGS for the samples in the prediction sets.
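The quartile stratification described above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the quartile cut points are computed with Python's `statistics.quantiles` using the "inclusive" method, which is an assumption and not necessarily the rule used in the study.

```python
import statistics
from typing import List, Sequence


def ses_from_townsend(townsend_index: float) -> float:
    """SES proxy: the negative of the Townsend deprivation index."""
    return -townsend_index


def quartile_labels(values: Sequence[float]) -> List[str]:
    """Assign Q1..Q4 with intervals [min, q1], (q1, q2], (q2, q3], (q3, max]."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    labels = []
    for value in values:
        if value <= q1:
            labels.append("Q1")
        elif value <= q2:
            labels.append("Q2")
        elif value <= q3:
            labels.append("Q3")
        else:
            labels.append("Q4")
    return labels
```

Because the upper bound of each interval is closed, a value exactly equal to a cut point falls in the lower of the two adjacent groups.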

To better understand the performance of PGS across the strata (see ‘Possible explanations for the variable prediction accuracy’), we estimated the mean effect sizes of significant SNPs in each of the strata. To avoid overfitting, we first performed an association test in the pooled sample of all strata excluding individuals in the prediction sets and matching the number of individuals per stratum; sample size 293,132 for years of schooling, 272,456 for BMI, and 245,548 for diastolic blood pressure. Then for significantly associated SNPs (LD pruned as described in ‘Polygenic score construction and trait prediction’), we re-estimated the effect sizes in each of the strata in the prediction sets (see Appendix 1—figure 4). We also used these pooled GWASs to explore the relationship between prediction accuracy and SNP heritability (as shown in Figure 2) and with GWAS sample size (Appendix 1—figure 1). We performed 20 iterations of all above steps.

In addition to the above examples, we explored the prediction accuracy for years of schooling when GWAS and prediction sets are stratified based on the participants’ number of full siblings. Specifically, we performed GWAS using individuals who had exactly one sibling (sample size 90,417), and evaluated prediction in two independent samples of individuals who reported having no siblings or having one sibling (sample size 20K for each) (see Appendix 1—figure 13).

We also considered five binary disease outcomes stratified by sex. Specifically, we performed GWAS in equally sized samples of males and females for hypothyroidism (sample size 135,526), type 2 diabetes (sample size 136,061), rheumatoid arthritis (sample size 136,039), myocardial infarction (sample size 136,061) and having been diagnosed with a heart attack or angina or stroke (sample size 135,833), leaving out 20K samples of males and females for prediction (see Appendix 1—figures 14 and 15). For these traits we used a logistic regression model for GWAS (using plink v. 2.0 with the flag --logistic). An important caveat to analyses of disease outcomes recorded during multiple follow-ups is that for ‘age’, we could only consider the age at recruitment in the GWAS; that approach is not ideal, considering that a fraction of individuals died during the course of the study (about 20K individuals in the full cohort).

Standard versus sibling-based polygenic score

We used the genetic relatedness information provided by UKB to infer sibling pairs among the WB samples. Following Bycroft et al. (2018), we marked pairs with $\frac{1}{2^{5 / 2}} < \phi < \frac{1}{2^{3 / 2}}$ and IBS0 > 0.0012 as siblings, where $\phi$ is the estimated kinship coefficient and IBS0 is the fraction of loci at which individuals share no alleles. By this approach, we identified 19,329 sibling pairs including 35,634 individuals across 17,328 families. For a given trait, we included pairs with the property that trait values for both individuals were reported. We then formed two sets of individuals: a 'Siblings' set, including the sibling pairs randomly sampled to include only one pair per family, and an 'Unrelateds' set, including the unrelated individuals identified by the UKB (see section 'GWAS by sample characteristics' above), but excluding the Siblings and 6,911 individuals that were related to the Siblings (3rd degree or closer).
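The sibling classification rule above is a pair of threshold checks, sketched below. The kinship bounds and the IBS0 cutoff are taken from the text; the function and constant names are hypothetical.

```python
# Kinship bounds for first-degree relatives: 1/2^(5/2) < phi < 1/2^(3/2).
KINSHIP_LOWER = 1.0 / 2.0 ** 2.5  # about 0.177
KINSHIP_UPPER = 1.0 / 2.0 ** 1.5  # about 0.354
IBS0_MIN = 0.0012


def is_sibling_pair(kinship: float, ibs0: float) -> bool:
    """Classify a pair as full siblings from estimated kinship and IBS0.

    The IBS0 condition separates siblings from parent-offspring pairs,
    who share at least one allele at essentially every locus.
    """
    return KINSHIP_LOWER < kinship < KINSHIP_UPPER and ibs0 > IBS0_MIN
```

A pair with kinship 0.25 and IBS0 of 0.01 would be marked as siblings, whereas the same kinship with IBS0 near zero would not.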

We focused on 20 quantitative traits (see Figure 3B for the list of traits considered in this analysis) and a number of simulated traits (see below). For each trait, we first downsampled the Unrelateds set to a sample size $n^{*}$ such that the median standard error of effect estimates roughly matched the median standard error in the sibling-based regression (see 'Estimating $n^{*}$ ' below). We then divided the Unrelateds set into three non-overlapping sets: after sampling $n^{*}$ individuals (Unrelateds- $n^{*}$ set), we randomly split the rest of the Unrelateds set into an Unrelateds-prediction set (10% of the samples) to be used as a sample for trait prediction ('prediction set'), and an Unrelateds-discovery set (90% of the samples) to be used for the discovery of trait associated variants (see Appendix 1—table 3 for sample sizes in each set). For each trait, we performed standard GWAS in the Unrelateds-discovery set, and ascertained SNPs by thresholding on association p-values. We then estimated the effect sizes for these ascertained SNPs in two ways: by a sibling-based association test in the Siblings set (using plink v. 1.90b5’s QFAM procedure with the flag --qfam), and by a standard association test in the Unrelateds- $n^{*}$ set (using plink v. 2.0). Subsequently, for each set of ascertained SNPs in the Unrelateds-discovery set, two PGS were constructed for the samples in the Unrelateds-prediction set (see Figure 3A for overview of the pipeline). We performed 10 iterations of the above sampling, ascertainment and estimation steps, except for simulated traits where we performed 30 iterations.
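The three-way partition of the Unrelateds set can be sketched as below, assuming sample identifiers are held in a list. The function name, the seed argument and the rounding of the 10% share are hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def split_unrelateds(sample_ids: Sequence[int], n_star: int,
                     prediction_fraction: float = 0.1,
                     seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Split samples into estimation (n*), prediction and discovery sets."""
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    # First n* individuals form the standard-GWAS estimation set.
    estimation = shuffled[:n_star]
    remainder = shuffled[n_star:]
    # The remainder is split roughly 10% / 90% into prediction and discovery.
    n_prediction = round(len(remainder) * prediction_fraction)
    prediction = remainder[:n_prediction]
    discovery = remainder[n_prediction:]
    return estimation, prediction, discovery
```

Repeating the call with a different seed corresponds to one resampling iteration of the kind described in the text.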

Estimating $n^{*}$

In order to compare the performance of sibling-based and standard GWAS designs, we wanted to match both analyses to have similar prediction accuracy under a vanilla model of no assortative mating, population stratification or indirect effects. In Appendix 1, we show that this could be achieved by matching median effect estimate standard errors. For each trait, we therefore calculated $n^{*}$ , the sample size of a standard GWAS that yields roughly equal standard errors in the standard and sibling-based regressions. Specifically, for each trait, we first performed sibling-based GWAS in the Siblings using plink’s QFAM procedure (with the flag --qfam mperm=100000 emp-se). We then randomly sampled a range of sample sizes from the set of Unrelateds, from 5K to 20K in 1K increments. Following Wood et al. (2014), for each sample size, we performed a standard GWAS, and investigated the linear relationship between the square root of the sample size and the inverse of the median standard error of the effect size estimates. We then used this linear relationship to estimate the sample size of a standard GWAS that corresponds to the inverse of the median standard error of the effect size estimates in the sibling-based GWAS.
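The calibration step above can be sketched as a one-dimensional least-squares fit followed by an inversion. The sketch below assumes that the median standard errors for each subsample size have already been computed; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_n_star(sample_sizes: Sequence[int],
                    median_standard_errors: Sequence[float],
                    sib_median_standard_error: float) -> int:
    """Sample size of a standard GWAS matching the sib-based median SE."""
    # Fit 1 / median SE as a linear function of sqrt(sample size).
    root_n = [math.sqrt(n) for n in sample_sizes]
    inverse_se = [1.0 / se for se in median_standard_errors]
    slope, intercept = fit_line(root_n, inverse_se)
    # Invert the fitted line at the sib-based value of 1 / median SE.
    matched_root_n = (1.0 / sib_median_standard_error - intercept) / slope
    return round(matched_root_n ** 2)
```

The fit is linear because, for a fixed trait and set of SNPs, the standard error of an effect estimate is expected to scale roughly as one over the square root of the sample size.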

All standard association tests were performed using plink v. 2.0 (with the flag --linear), adjusting for sex, age and first 20 PCs as covariates. For sibling-based association tests we first residualized the phenotypic values on age and sex, and then regressed the sibling differences in residuals on sibling genotypic differences using plink’s QFAM procedure as described above.
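To make the logic of the sibling-based test concrete, the sketch below regresses sibling differences in (residualized) phenotype on sibling differences in genotype, using ordinary least squares through the origin. This is a simplified stand-in and not plink's QFAM procedure, which obtains significance by permutation; the function name and input layout are hypothetical.

```python
import math
from typing import Sequence, Tuple

# One sibling: (allele count at the SNP, residualized phenotype).
Sibling = Tuple[float, float]


def sib_difference_effect(pairs: Sequence[Tuple[Sibling, Sibling]]) -> Tuple[float, float]:
    """Estimate a SNP effect and its standard error from sibling differences."""
    genotype_diffs = [first[0] - second[0] for first, second in pairs]
    phenotype_diffs = [first[1] - second[1] for first, second in pairs]
    sxx = sum(d * d for d in genotype_diffs)
    if sxx == 0:
        raise ValueError("no sibling pair differs in genotype at this SNP")
    # Regression through the origin: the sign of each difference is arbitrary.
    beta = sum(g * y for g, y in zip(genotype_diffs, phenotype_diffs)) / sxx
    residuals = [y - beta * g for g, y in zip(genotype_diffs, phenotype_diffs)]
    degrees_of_freedom = len(pairs) - 1
    residual_variance = sum(r * r for r in residuals) / degrees_of_freedom
    return beta, math.sqrt(residual_variance / sxx)
```

Any component of the phenotype shared by both siblings, such as a family-level environmental or parental effect, cancels in the difference and does not contribute to the estimate.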

We also considered a version of the analysis described above, in which we first residualized the phenotypes on covariates in the pooled sample of all WB individuals, and then ran the pipeline on the residuals without further adjustment for covariates in the GWAS or prediction evaluation. As shown in Appendix 1—figure 16, this approach produced results that are qualitatively the same as what we present in Figure 3.

Simulated traits

We wanted to check that given the study design described above, sibling-based and standard PGS perform similarly with respect to trait prediction, under the vanilla model of no population stratification, assortative mating or indirect genetic effects (Figure 3). To this end, we simulated traits with heritability $h^{2} =$ 0.1 or 0.5 and either 10K or 100K causal SNPs. For each set of parameters, we simulated three replicates giving a total of 12 simulated traits.

We randomly selected the causal SNPs from a set of 10,879,183 imputed SNPs, considering that most causal variants are plausibly not directly genotyped on SNP arrays. We used a set of SNPs that passed quality control procedures by the Neale lab (http://www.nealelab.is/uk-biobank), namely autosomal SNPs, imputed using the Haplotype Reference Consortium (HRC) panel, which have INFO score > 0.8 and have minor allele frequency > 10⁻⁴; we further limited the SNP set to ones that were bi-allelic in the WB sample. As in Martin et al. (2017), we randomly assigned effect sizes to these $m$ causal SNPs as $\beta \sim N (0, \frac{h^{2}}{m})$ , and zero for non-causal SNPs. We then calculated the genetic component of the trait, $g$ , for all WB samples under an additive model by summing the allelic counts weighted by their effect sizes using plink (with the flag --score). Allelic counts were determined by converting imputation dosages to genotype calls with no hard calling threshold. We also assigned environmental contributions as $\varepsilon \sim N (0,1 - h^{2})$ , and then constructed the PGS for each individual,

$$g = \sum_{i = 1}^{m} \beta_{i} X_{i},$$

where $X_{i}$ is the number of minor alleles at SNP $i$ carried by the individual, and the trait value for the individual is calculated as the sum of genetic and environmental contributions:

$$y = \sqrt{h^{2}} \left(\frac{g - \bar{g}}{\sigma_{g}}\right) + \sqrt{1 - h^{2}} \left(\frac{\varepsilon - \bar{\varepsilon}}{\sigma_{\varepsilon}}\right)$$

where bars represent averages, $\sigma_{g}$ is the standard deviation of PGS across individuals and $\sigma_{\varepsilon}$ is the standard deviation of environmental contributions across individuals. These simulated traits were then analyzed using the same pipelines as the other traits (e.g., adjusting for covariates). Importantly, SNP discovery and effect size estimations in GWAS were performed without knowledge of the causal SNPs.
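The trait simulation can be sketched on a toy scale as follows. The sketch assumes a small in-memory genotype matrix (individuals by SNPs) of minor allele counts and a heritability strictly between 0 and 1; the study itself used imputed UKB genotypes and plink, and the function names here are hypothetical.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def standardize(values: Sequence[float]) -> List[float]:
    """Center to mean 0 and scale to standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def simulate_trait(genotypes: Sequence[Sequence[int]], n_causal: int,
                   h2: float, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Simulate trait values with heritability h2; returns (y, standardized g)."""
    rng = random.Random(seed)
    n_snps = len(genotypes[0])
    # Causal SNPs are chosen at random; all other SNPs have zero effect.
    causal = rng.sample(range(n_snps), n_causal)
    # beta ~ N(0, h2 / m): random.gauss takes a standard deviation.
    effects = {i: rng.gauss(0.0, math.sqrt(h2 / n_causal)) for i in causal}
    # Genetic component: allele counts weighted by effect sizes.
    g = [sum(beta * row[i] for i, beta in effects.items()) for row in genotypes]
    # Environmental contribution: epsilon ~ N(0, 1 - h2).
    eps = [rng.gauss(0.0, math.sqrt(1.0 - h2)) for _ in genotypes]
    z_g, z_eps = standardize(g), standardize(eps)
    y = [math.sqrt(h2) * a + math.sqrt(1.0 - h2) * b for a, b in zip(z_g, z_eps)]
    return y, z_g
```

Standardizing both components before mixing them fixes the share of variance attributed to the genetic component at `h2`, up to the sample correlation between the two components.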

Polygenic score construction and trait prediction

For all GWAS designs described above, we used p-value thresholding followed by clumping to choose sets of roughly independent SNPs to build PGS. We considered a logarithmically-spaced range of p-values: 10⁻⁸, 10⁻⁷, 10⁻⁶, 10⁻⁵, 10⁻⁴, 10⁻³, and 10⁻² (or a subset if no SNP reached that significance level). We then used plink’s clumping procedure (with the flag --clump) with LD threshold $r^{2}$ < 0.1 (using 10,000 randomly selected unrelated WB samples as a reference for LD structure) and physical distance threshold of >1 Mb. The selected SNPs were then used to calculate PGS for individuals in the prediction sets, by summing the allelic counts weighted by their estimated effect sizes (log of the odds ratios in the case of binary traits) using plink (with the flag --score). In a subset of cases, we also calculated polygenic scores using LDpred assuming all loci are causal (Vilhjálmsson et al., 2015). To evaluate prediction accuracy, we calculated the incremental $R^{2}$ : we first determined $R^{2}$ in a regression of the phenotype on the covariates, and then calculated the change in $R^{2}$ when including the PGS as a predictor. For binary traits, we calculated the incremental area under the receiver operating characteristic curve (AUC).
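The scoring and evaluation steps for quantitative traits can be sketched as below: a PGS is a weighted sum of allele counts, and the incremental $R^{2}$ is the gain in $R^{2}$ when the PGS is added to a covariates-only regression. This is a minimal pure-Python sketch with hypothetical function names; it solves the normal equations directly, which is adequate for a handful of covariates but is not how a production pipeline would fit the model.

```python
from typing import Dict, List, Sequence


def polygenic_score(allele_counts: Dict[str, float], effects: Dict[str, float]) -> float:
    """Sum of allele counts weighted by estimated effect sizes."""
    return sum(effects[snp] * count for snp, count in allele_counts.items() if snp in effects)


def solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def r_squared(predictors: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """R^2 of an ordinary least-squares fit of y on the predictors plus an intercept."""
    design = [[1.0] + list(row) for row in predictors]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def incremental_r2(covariates: Sequence[Sequence[float]], pgs: Sequence[float],
                   y: Sequence[float]) -> float:
    """Change in R^2 when the PGS is added to the covariates-only model."""
    with_pgs = [list(row) + [score] for row, score in zip(covariates, pgs)]
    return r_squared(with_pgs, y) - r_squared(covariates, y)
```

In the study, the covariates would be sex, age and the first 20 PCs; the sketch accepts any number of covariate columns.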

Estimating heritability and genetic correlation

We calculated SNP heritability across sex, age and SES groups for diastolic blood pressure, BMI and years of schooling, respectively (as described in the section ‘GWAS by sample characteristics’) as well as genetic correlations across pairs of groups: we first performed GWAS using all unrelated WB individuals in each group. We then used the GWAS summary statistics to perform LD score regression with LD scores computed from the 1000 Genomes European-ancestry samples (Bulik-Sullivan et al., 2015).

Acknowledgements

This study has been conducted using the UK Biobank resource under application Number 11138, as approved by Columbia University Institutional Review Board, protocol AAAS2914. We are grateful to Daniel Belsky, Jeremy Berg, Graham Coop, Peter Donnelly, Doc Edge, Iain Mathieson, Augustine Kong, Magnus Nordborg, Vincent Plagnol, Guy Sella, Alex Young and members of the Przeworski and Sella labs for valuable discussions and to Doc Edge, Guy Sella, Graham Coop and Magnus Nordborg for comments on a draft of the manuscript. This work was funded by NIH GM121372 to MP, NIH HG008140 to JKP, a Robert Wood Johnson Foundation Pioneer Award (grant number 84337817) to DC and a Junior Fellowship from the Simons Society of Fellows (number 633313) to AH.

Appendix 1

1 Prediction accuracies of polygenic scores based on standard and sib-GWAS

1.1 Overview of derived results

In the main text, we compare the prediction accuracies of polygenic scores (PGS) based on a standard GWAS of unrelated individuals and a GWAS based on sibling differences, for a number of traits. Here, we describe how this comparison is implemented, and how indirect effects and assortative mating manifest in this comparison.

Matching standard and sib-based prediction accuracies

Current standard GWAS are based on huge sample sizes, leading to less noisy estimates than are afforded by family association studies such as those based on sib-differences, which are typically much smaller. This difference in precision needs to be taken into account in making comparisons between the prediction accuracy of scores derived from the two approaches. We show that under a vanilla additive model with no assortative mating, indirect effects, population structure (or other complications), and if the standard GWAS is subsampled to a sample size

n^{*} \approx \frac{1}{1 + (1 - h^{2}) (1 - 2 ρ_{s i b})} n^{p a i r s},

where $n^{p a i r s}$ is the number of sib pairs, $h^{2}$ is the heritability and $ρ_{s i b}$ is the correlation in environmental effects experienced by siblings, the two study designs are expected to have the same (out-of-sample) prediction accuracy (see Section 1.2). This analytic result is of limited practical use, however, because it requires prior knowledge about the extent to which environmental effects correlate among siblings. Instead, we took an empirical approach to match the prediction accuracies in the two approaches: following Wood et al. (2014), we subsampled the regular GWAS to match the median standard errors of the sib-GWAS. As we show in Section 1.2.3, under our vanilla model, we then expect equal out-of-sample prediction accuracies for polygenic scores derived from the two study designs.

Indirect parental effects

In the presence of indirect parental effects, the out-of-sample prediction accuracy takes a simple form. For a polygenic score based on a standard GWAS, we obtain

E [R_{u r}^{2}] = τ^{2} \frac{1}{1 + c},

where $τ^{2}$ is the ratio of the variance in the trait due to both direct effects and indirect effects of transmitted parental alleles over the total phenotypic variance; and $c$ is a term representing the noise-to-signal ratio in a standard GWAS. For the polygenic score based on sib-GWAS, we obtain

E [R_{s i b}^{2}] = {(1 + ρ \frac{σ_{η}}{σ_{β}})}^{2} h_{β}^{2} \frac{1}{1 + c τ^{2} / h_{β}^{2}} .

where $σ_{β}^{2}$ and $σ_{η}^{2}$ are the variances of random direct and indirect effects, respectively, $ρ$ is the correlation between direct and indirect effects, and $h_{β}^{2}$ is the proportion of the phenotypic variance explained by direct effects. Our results suggest that under plausible conditions, the presence of indirect effects would lead to higher prediction accuracy in a standard GWAS. This result holds whether direct and indirect effects are positively correlated, uncorrelated or even somewhat negatively correlated (Appendix 1—figure 9).

Assortative mating

We investigated several models of assortative mating by simulation. Standard GWAS-based polygenic scores have greater prediction accuracies than those based on sib-GWAS when the parental phenotypes are positively correlated, and the reverse is true if they are negatively correlated (Appendix 1—figure 10 A,B). The relative difference in prediction accuracies of the two study designs grows with the inclusion of more SNPs in the polygenic score model (Appendix 1—figure 10 D,F).

In our analytic model, we ignored the ascertainment step of our study design, in which it is decided which SNPs to include in the polygenic score. We assumed that SNPs are pre-ascertained and that the set of ascertained SNPs includes all causal ones. In a subset of simulations, we implemented the ascertainment step based on an independent simulated GWAS (see below). In both settings, we refer (somewhat loosely therefore) to the regression on ascertained SNPs in a sample of unrelated individuals as ‘standard GWAS’ and the regression of the difference in phenotypes on the difference in sib genotypes as ‘sib-GWAS.’

1.2 Picking the sample size of the standard GWAS to match the prediction accuracy of the score based on the sib-GWAS

We look for the sample size $n^{*}$ of a standard GWAS performed on a sample of unrelated individuals such that, under our vanilla model, the resulting polygenic score has the same (out-of-sample) prediction accuracy as the polygenic score obtained from a sib-GWAS with sample size $n_{p a i r s}$ . We begin by assuming that all causal sites $i$ are known; that they are unlinked; that they have only additive, direct effects on the phenotype; and that there is no population stratification or assortative mating. We first find the sampling variance of the effect size estimate for a single site obtained from each of the two study designs. We then examine (and ultimately match) the prediction accuracy of the polygenic scores obtained from effect sizes estimated in the estimation sets, ${\hat{β}}_{u r}, {\hat{β}}_{s i b}$ , on a new, independent prediction sample of unrelated individuals ${(x^{'}, y^{'})}$ .

1.2.1 Sampling error of the estimated effect size at a single site

Our model for the phenotypic value $y$ is

y = g + e

where $e$ is a Normally distributed environmental effect (which includes all sources of random noise) and

g = β_{0}^{u r} + \sum_{i} β_{i} x_{i}

where $x_{i} \in \{0, 1, 2\}$ are random genotypes. The genotype is coded as the number of alleles with effect $β_{i}$ carried by the individual at site $i$ . Effect sizes $β = \{β_{i}\}$ are treated as fixed parameters throughout (except when noted otherwise in the very last step leading to Equation 23). We can rewrite our model to focus on the effect size at a single site $i$ :

y = β_{0} + β_{i} x_{i} + ϵ_{i},

(1)

where

ϵ_{i} = g - β_{i} x_{i} + e,

with variance

V a r [ϵ_{i}] = V a r [g - β_{i} x_{i}] + V a r [e] = V a r [y] - β_{i}^{2} V a r [x_{i}]

In an OLS regression, the sampling variance of the estimated effect of an allele at site $i$ is

V a r [{\hat{β}}_{i}^{u r}] = \frac{V a r [ϵ_{i}]}{(n - 1) V a r [x_{i}]} = \frac{V a r [y] - β_{i}^{2} V a r [x_{i}]}{(n - 1) V a r [x_{i}]},

(2)

where $n$ is the sample size and ${\hat{β}}^{u r}$ denotes that the estimate was obtained using a sample of unrelated individuals. In sib-GWAS, our model for site $i$ is

Δ y = β_{0}^{s i b} + β_{i} Δ x_{i} + Δ ϵ_{i},

with variance

V a r [Δ ϵ_{i}] = V a r [Δ g - β_{i} Δ x_{i}] + V a r [Δ e] =

V a r [Δ g] + β_{i}^{2} V a r [Δ x_{i}] - 2 β_{i}^{2} V a r [Δ x_{i}] + V a r [Δ e] .

Recall that for siblings (denoted with subscripts $A$ and $B$ ), we expect

C o v [x_{i, A}, x_{i, B}] = \frac{1}{2} V a r [x_{i}],

C o v [g_{A}, g_{B}] = \frac{1}{2} V a r [g] .

Plugging these back in, we obtain

V a r [Δ ϵ_{i}] = V a r [g] - β_{i}^{2} V a r [x_{i}] + 2 V a r [e] (1 - ρ_{s i b})

where $ρ_{s i b} = C o r [e_{A}, e_{B}]$ is the correlation in environmental effects between sibs. The variance of the estimated effect size in sib-GWAS is therefore

V a r [{\hat{β}}_{i}^{s i b}] = \frac{V a r [Δ ϵ_{i}]}{(n_{p a i r s} - 1) V a r [Δ x_{i}]} = \frac{V a r [y] - β_{i}^{2} V a r [x_{i}] + V a r [e] (1 - 2 ρ_{s i b})}{(n_{p a i r s} - 1) V a r [x_{i}]} .

(3)

1.2.2 Sample size required for equal prediction accuracy

We measure prediction accuracy as the expected squared correlation between polygenic scores $\hat{g}$ and phenotypic values in an independent prediction set of unrelated individuals, denoted ${(x^{'}, y^{'})}$ ,

E_{{(x^{'}, y^{'})}} [R^{2}] = \frac{C o v^{2} [\hat{g} (x^{'}), y^{'}]}{V a r [y^{'}] V a r [\hat{g} (x^{'})]},

To incorporate randomness both in the estimation set (summarized by the Multivariate Normal distribution of $\hat{β}$ ) and the prediction set ${(x^{'}, y^{'})}$ , we will require

E_{{\hat{β}}^{u r} (n^{*})} [E_{{(x^{'}, y^{'})}} [R^{2}]] \overset{!}{=} E_{{\hat{β}}^{s i b} (n^{p a i r s})} [E_{{(x^{'}, y^{'})}} [R^{2}]]

where $\hat{β} (n)$ is the set $\{{\hat{β}}_{i}\}$ estimated in a GWAS with sample size $n$ . Equivalently,

E_{{\hat{β}}^{s i b}} [\frac{C o v^{2} [{\hat{g}}_{s i b} (x^{'}), y^{'}]}{V a r [{\hat{g}}_{s i b} (x^{'})]}] \overset{!}{=} E_{{\hat{β}}^{u r}} [\frac{C o v^{2} [{\hat{g}}_{u r} (x^{'}), y^{'}]}{V a r [{\hat{g}}_{u r} (x^{'})]}],

(4)

where we left out the sample sizes for brevity, and $V a r [y^{'}]$ was cancelled out. Finally, we can replace Equation 4 by its first order Taylor approximation to get the requirement

\frac{E_{\hat{β}} [C o v_{{(x^{'}, y^{'})}} [{\hat{g}}_{s i b} (x^{'}), y^{'}]]^{2}}{E_{\hat{β}} [V a r_{{(x^{'}, y^{'})}} [{\hat{g}}_{s i b} (x^{'})]]} \overset{!}{=} \frac{E_{\hat{β}} [C o v_{{(x^{'}, y^{'})}} [{\hat{g}}_{u r} (x^{'}), y^{'}]]^{2}}{E_{\hat{β}} [V a r_{{(x^{'}, y^{'})}} [{\hat{g}}_{u r} (x^{'})]]} .

(5)

We solve Equation 5 for the sample size $n^{*}$ to be used for estimation of the polygenic score in a standard GWAS. We note that if the vector of estimates $\hat{β}$ is given, then

\begin{array}{ll} C o v_{{(x^{'}, y^{'})}} [y^{'}, \hat{g} (x^{'}) | \hat{β}] = C o v_{{(x^{'}, y^{'})}} [g (x^{'}), g (x^{'}) + \sum_{i}^{m} x_{i}^{'} ({\hat{β}}_{i} - β_{i}) | \hat{β}] = \\ V a r_{{(x^{'}, y^{'})}} [g (x^{'}) | \hat{β}] + \sum_{i}^{m} C o v_{{(x^{'}, y^{'})}} [β_{i} x_{i}^{'}, ({\hat{β}}_{i} - β_{i}) x_{i}^{'} | \hat{β}] = \sum_{i}^{m} V a r [x_{i}^{'}] β_{i} {\hat{β}}_{i} . \end{array}

(6)

Since for every $i$ , we have

E [{\hat{β}}_{i}^{u r}] = E [{\hat{β}}_{i}^{s i b}] = β_{i},

we obtain

E_{{\hat{β}}^{s i b}} [C o v [y^{'}, {\hat{g}}_{s i b} (x^{'}) | {\hat{β}}^{s i b}]] = \sum_{i}^{m} V a r [x_{i}^{'}] β_{i}^{2} = E_{{\hat{β}}^{u r}} [C o v [y^{'}, {\hat{g}}_{u r} (x^{'}) | {\hat{β}}^{u r}]],

which turns the requirement of Equation 5 into

E_{{\hat{β}}^{s i b}} [V a r_{{(x^{'}, y^{'})}} [{\hat{g}}_{s i b} (x^{'})]] \overset{!}{=} E_{{\hat{β}}^{u r}} [V a r_{{(x^{'}, y^{'})}} [{\hat{g}}_{u r} (x^{'})]],

or simply

\sum_{i}^{m} V a r [x_{i}] V a r [{\hat{β}}_{i}^{u r}] \overset{!}{=} \sum_{i}^{m} V a r [x_{i}] V a r [{\hat{β}}_{i}^{s i b}] .

(7)

Plugging the sampling variance results from Equation 2 and Equation 3 into Equation 7 and reordering, we obtain

\frac{n^{*} - 1}{n_{p a i r s} - 1} = \frac{\sum_{i}^{m} (V a r [y] - β_{i}^{2} V a r [x_{i}])}{\sum_{i}^{m} (V a r [y] - β_{i}^{2} V a r [x_{i}] + V a r [e] (1 - 2 ρ_{s i b}))},

or, assuming that the trait is polygenic such that $m ≫ 1$ ,

\frac{n^{*}}{n^{p a i r s}} \approx \frac{1}{1 + (1 - h^{2}) (1 - 2 ρ_{s i b})} .

(8)

Equation 8 can in principle be applied to the estimation of $ρ_{s i b}$ for a given trait, under our model assumptions, and given an independent estimate of $h^{2} .$
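As a worked illustration of Equation 8 and of its inversion for $ρ_{s i b}$, consider the following Python sketch. The function names are hypothetical, the numerical inputs are arbitrary, and the inversion assumes $h^{2} < 1$ and the vanilla model described above.

```python
def matched_sample_size(n_pairs, h2, rho_sib):
    """Equation 8: standard-GWAS sample size n* matching a sib-GWAS of n_pairs sib pairs."""
    return n_pairs / (1.0 + (1.0 - h2) * (1.0 - 2.0 * rho_sib))

def rho_sib_from_sizes(n_star, n_pairs, h2):
    """Invert Equation 8 for rho_sib, given n*, n_pairs and an independent h2 estimate."""
    return 0.5 * (1.0 - (n_pairs / n_star - 1.0) / (1.0 - h2))

# Arbitrary example values: 20,000 sib pairs and h2 = 0.5.
n_uncorrelated = matched_sample_size(20000, 0.5, 0.0)  # 20000 / 1.5
n_half = matched_sample_size(20000, 0.5, 0.5)          # equals n_pairs
rho_back = rho_sib_from_sizes(n_uncorrelated, 20000, 0.5)
```

When $ρ_{s i b} = 0.5$, the factor $(1 - 2 ρ_{s i b})$ vanishes and the matched sample size equals the number of sib pairs. For smaller $ρ_{s i b}$, fewer unrelated individuals than sib pairs are needed.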

1.2.3 Empirical matching of standard errors

The result of Equation 8 is the same as we would obtain if we required

\forall i V a r [{\hat{β}}_{i}^{s i b} (x_{i})] \overset{!}{=} V a r [{\hat{β}}_{i}^{u r} (x_{i}^{s i b})]

(9)

without taking into account randomness in the prediction set. In practice (and in the results shown in the main text), we have no prior knowledge about $ρ_{s i b}$ and instead we find a sample size $n^{*}$ for the standard GWAS such that

\mathrm{median}_{\text{sites } i} (V a r [{\hat{β}}_{i}^{s i b} (x)]) \overset{!}{=} \mathrm{median}_{\text{sites } i} (V a r [{\hat{β}}_{i}^{u r} (x)])

(10)

We note that the condition in Equation 9 is approximately met because, if we assume that $y$ is a highly polygenic trait where

\forall i β_{i}^{2} V a r [x_{i}] \ll V a r [y],

then, if for one site $j$ , $n^{*}$ satisfies

V a r [{\hat{β}}_{j}^{s i b} (x)] = V a r [{\hat{β}}_{j}^{u r} (x)] = \frac{D (n^{*})}{V a r [x_{j}]}

such that $D (n^{*})$ is the same for sib-GWAS and standard GWAS, then for all sites $D (n^{*}) = \frac{V a r [y]}{n^{*} - 1}$ is the same, namely,

\forall i V a r [{\hat{β}}_{i}^{s i b} (x)] = V a r [{\hat{β}}_{i}^{u r} (x)] = \frac{D (n^{*})}{V a r [x_{i}]}

Equation 10 can therefore be thought of as using a weighted-median to estimate $n^{*}$ where each site $i$ is weighted by $\frac{1}{V a r [x_{i}]}$ . In conclusion, the requirement of Equation 10 leads to equal prediction accuracy of standard and sib-GWAS under the vanilla model assumptions. We note further that in the main text (Figure 3), to follow common practice, we use incremental $R^{2}$ throughout rather than $R^{2}$ . However, as we show in Appendix 1—figure 16, using $R^{2}$ instead gives highly similar qualitative results.

1.3 Indirect parental effects

1.3.1 Distribution of the effect size estimate at a single site

We consider an additive model with direct effects as well as indirect parental effects, assuming no interaction between the parents and the polygenic score of the children and ignoring possible indirect effects of siblings on each other. The other assumptions from the previous section—for example independent segregation of alleles across sites—remain. We start by considering the model

y = β_{0} + g + n + e

where $g$ is the sum of direct effects in an individual with genotype (effect-allele count) $x_{i}$ at each site $i$ ,

g = \sum_{i}^{m} β_{i} x_{i},

and

n = \sum_{i}^{m} η_{i} (x_{i} + {\tilde{x}}_{i}^{m} + {\tilde{x}}_{i}^{p})

is the sum of parental indirect effects, with overall parental effect allele count $x_{i} + {\tilde{x}}_{i}^{p} + {\tilde{x}}_{i}^{m}$ at each site, where ${\tilde{x}}_{i}^{m}$ is the untransmitted maternal effect allele count, and ${\tilde{x}}_{i}^{p}$ the untransmitted paternal effect allele count, with ${\tilde{x}}_{i}^{m}, {\tilde{x}}_{i}^{p} \in \{0, 1\}$ . As we show, when we choose the standard GWAS sample size $n^{*}$ such that the sampling error of the effect size estimates matches that of the sib-GWAS, the prediction accuracies of the two polygenic scores differ in an independent sample: unless there is a large, negative correlation between indirect and direct effects, the polygenic score from standard GWAS is expected to outperform the one based on sib-GWAS.

We first examine the distribution of an estimated effect size of $x_{i}$ on the phenotype. The OLS regression for a single site in a standard GWAS follows Equation 1 and can be rewritten as

y = β_{0} + (β_{i} + η_{i}) x_{i} + η_{i} ({\tilde{x}}_{i}^{p} + {\tilde{x}}_{i}^{m}) + ϵ_{i}

(11)

with

ϵ_{i} = g + n + e - (β_{i} + η_{i}) x_{i} - η_{i} ({\tilde{x}}_{i}^{p} + {\tilde{x}}_{i}^{m}) .

By the assumption of no assortative mating or other population structure,

C o v [{\tilde{x}}_{i}^{p}, {\tilde{x}}_{i}^{m}] = C o v [x_{i}, {\tilde{x}}_{i}^{m}] = C o v [x_{i}, {\tilde{x}}_{i}^{p}] = 0 .

(12)

It directly follows that under the generative model specified by Equation 11, the OLS regression of $y$ on $x_{i}$ and ${\tilde{x}}_{i}^{p} + {\tilde{x}}_{i}^{m}$ is a regression involving two independent variables. Therefore, ${\hat{β}}_{i}^{u r}$ is Normally distributed with expectation

E [{\hat{β}}_{i}^{u r}] = β_{i} + η_{i} .

We next calculate the variance of ${\hat{β}}_{i}^{u r}$ . From Equation 12 and

V a r [{\tilde{x}}_{i}^{m} + {\tilde{x}}_{i}^{p}] = V a r [x_{i}],

we obtain

V a r [ϵ_{i}] = V a r [y] + {(β_{i} + η_{i})}^{2} V a r [x_{i}] + η_{i}^{2} V a r [x_{i}] - 2 C o v [g + n, (β_{i} + η_{i}) x_{i}] - 2 C o v [n, η_{i} ({\tilde{x}}_{i}^{m} + {\tilde{x}}_{i}^{p})] =

= V a r [y] - V a r [x_{i}] (β_{i}^{2} + 2 β_{i} η_{i} + 2 η_{i}^{2}) .

Finally,

V a r [{\hat{β}}_{i}^{u r}] = \frac{V a r [ϵ_{i}]}{(n - 1) V a r [x_{i}]} = \frac{V a r [y] - V a r [x_{i}] (β_{i}^{2} + 2 β_{i} η_{i} + 2 η_{i}^{2})}{(n - 1) V a r [x_{i}]} .

(13)

In sib regression, we have

Δ y = Δ g + Δ e

since indirect parental effects cancel out when taking the difference between siblings (as siblings have the same parental effect allele count). Thus, the expected estimate is the same as it was in the absence of indirect effects. Using the same considerations as in Section 1.2 for the variance in sib differences, we obtain

{\hat{β}}_{i}^{s i b} \sim N (β_{i}, \frac{V a r [g] - β_{i}^{2} V a r [x_{i}] + 2 V a r [e] (1 - ρ_{s i b})}{(n_{p a i r s} - 1) V a r [x_{i}]}),

where $ρ_{s i b}$ is again the correlation in environmental effects between siblings.

1.3.2 Polygenic score prediction accuracy

We now examine the difference in prediction accuracies of ${\hat{g}}^{u r}$ and ${\hat{g}}^{s i b}$ after matching

V a r [{\hat{β}}_{i}^{u r}] \overset{!}{=} V a r [{\hat{β}}_{i}^{s i b}]

(14)

by choosing a standard GWAS sample size $n^{*}$ that empirically satisfies the condition, as we do in the main text (see also Section 1.2.3).

We can derive the expected prediction accuracy by averaging over both the estimation set (which we again shorthand as the distribution of $\hat{β}$ ) and the prediction set ${(x^{'}, y^{'})}$ . By the law of total expectation,

E [R^{2}] = E_{\hat{β}} [E_{{(x^{'}, y^{'})}} [R^{2}]] = E_{\hat{β}} [\frac{C o v_{{(x^{'}, y^{'})}}^{2} [\hat{g} (x^{'}), y^{'} | \hat{β}]}{V a r_{{(x^{'}, y^{'})}} [y^{'} | \hat{β}] V a r_{{(x^{'}, y^{'})}} [\hat{g} (x^{'}) | \hat{β}]}] \approx

\approx \frac{E_{\hat{β}} {[C o v_{{(x^{'}, y^{'})}} [\hat{g} (x^{'}), y^{'} | \hat{β}]]}^{2}}{V a r_{{(x^{'}, y^{'})}} [y^{'} | \hat{β}] E_{\hat{β}} [V a r_{{(x^{'}, y^{'})}} [\hat{g} (x^{'}) | \hat{β}]]},

(15)

where the last step is an approximation of the expectation of ratio by its first-order Taylor expansion, a ratio of expectations. The numerator of Equation 15 is

E_{\hat{β}} {[C o v_{{(x^{'}, y^{'})}} [\hat{g} (x^{'}), y^{'} | \hat{β}]]}^{2} = E_{\hat{β}} {[\sum_{i}^{m} (β_{i} + η_{i}) {\hat{β}}_{i} C o v_{{(x^{'}, y^{'})}} [x_{i}^{'}, x_{i}^{'} | \hat{β}]]}^{2} =

= E_{\hat{β}} {[\sum_{i}^{m} V a r [x_{i}] (β_{i} + η_{i}) {\hat{β}}_{i}]}^{2} =

= {(\sum_{i}^{m} V a r [x_{i}] (β_{i} + η_{i}) E [{\hat{β}}_{i}])}^{2} .

(16)

The terms in the denominator of Equation 15 are

V a r_{{(x^{'}, y^{'})}} [y^{'} | \hat{β}] = V a r [y]

(17)

and

E_{\hat{β}} [V a r_{{(x^{'}, y^{'})}} [\hat{g} (x^{'}) | \hat{β}]] = E_{\hat{β}} [\sum_{i}^{m} V a r [x_{i}] {\hat{β}}_{i}^{2}] = \sum_{i}^{m} V a r [x_{i}] (E {[{\hat{β}}_{i}]}^{2} + V a r [{\hat{β}}_{i}]) .

(18)

Plugging Equations 16,17,18 back into Equation 15, we obtain

E [R^{2}] \approx \frac{{(\sum_{i}^{m} V a r [x_{i}] (β_{i} + η_{i}) E [{\hat{β}}_{i}])}^{2}}{V a r [y] (\sum_{i}^{m} V a r [x_{i}] V a r [{\hat{β}}_{i}] + \sum_{i}^{m} V a r [x_{i}] E {[{\hat{β}}_{i}]}^{2})} .

(19)

We note that

\tilde{C} := V a r [y] \sum_{i}^{m} V a r [x_{i}] V a r [{\hat{β}}_{i}]

is the same for sib-GWAS and standard GWAS under the requirement of Equation 14. We therefore have

E [R_{u r}^{2}] \approx \frac{{(\sum_{i}^{m} V a r [x_{i}] {(β_{i} + η_{i})}^{2})}^{2}}{\tilde{C} + V a r [y] \sum_{i}^{m} V a r [x_{i}] {(β_{i} + η_{i})}^{2}},

(20)

and

E [R_{s i b}^{2}] \approx \frac{{(\sum_{i}^{m} V a r [x_{i}] (β_{i} + η_{i}) β_{i})}^{2}}{\tilde{C} + V a r [y] \sum_{i}^{m} V a r [x_{i}] β_{i}^{2}} .

(21)

If we denote the proportion of the phenotypic variance explained by direct effects by

h_{β}^{2} := \frac{\sum_{i}^{m} V a r [x_{i}] β_{i}^{2}}{V a r [y]},

the proportion of the phenotypic variance explained by indirect effects of transmitted parental alleles by

τ_{η}^{2} := \frac{\sum_{i}^{m} V a r [x_{i}] η_{i}^{2}}{V a r [y]},

and the proportion of phenotypic variance explained by both direct and indirect effects of transmitted alleles by

τ^{2} := \frac{\sum_{i}^{m} V a r [x_{i}] {(β_{i} + η_{i})}^{2}}{V a r [y]}

then Equation 20 can be written as

E [R_{u r}^{2}] \approx τ^{2} \frac{1}{1 + c},

(22)

where we defined

c := \frac{\sum_{i}^{m} V a r [x_{i}] V a r [{\hat{β}}_{i}]}{\sum_{i}^{m} V a r [x_{i}] {(β_{i} + η_{i})}^{2}} .

Here, $c$ can be thought of as a summary of the noise-to-signal ratio, with respect to the signal coming from both direct and indirect effects of transmitted alleles. If we consider effects $β$ and $η$ as random, treating results obtained thus far as conditional on $β$ and $η$ , and further assume that effects are i.i.d. across sites (implying, in particular, that effect sizes and allele frequencies are independent),

(\begin{array}{ll} β_{i} \\ η_{i} \end{array}) \sim ((\begin{array}{ll} 0 \\ 0 \end{array}), (\begin{array}{ll} σ_{β}^{2} & ρ σ_{β} σ_{η} \\ ρ σ_{β} σ_{η} & σ_{η}^{2} \end{array})),

the expectation of the numerator of Equation 21 is

E_{β, η} [\sum_{i}^{m} V a r [x_{i}] β_{i} (β_{i} + η_{i}) | β, η] = \sum_{i}^{m} V a r [x_{i}] E_{β_{i}, η_{i}} [β_{i}^{2} + β_{i} η_{i}] = \sum_{i}^{m} V a r [x_{i}] (σ_{β}^{2} + ρ σ_{β} σ_{η})

and thus Equation 21, in expectation, is:

E [R_{s i b}^{2}] \approx E_{β, η} [E [R_{s i b}^{2} | β, η]] = {(1 + ρ \frac{σ_{η}}{σ_{β}})}^{2} h_{β}^{2} \frac{1}{1 + c / α} .

(23)

where

α := h_{β}^{2} / τ^{2} = \frac{\sum_{i}^{m} V a r [x_{i}] β_{i}^{2}}{\sum_{i}^{m} V a r [x_{i}] {(β_{i} + η_{i})}^{2}} .

We examined the fit of this prediction to simulated data. Specifically, we ran simulations to estimate effect sizes in a sib-GWAS and in a standard GWAS, after choosing $n^{*}$ to match their sampling variances. Finally, we used the polygenic scores to predict phenotypic values in a sample of unrelated individuals (see Section 1.3.3 for further detail).

Appendix 1—figure 9 A,C,D show the analytic result alongside simulation results, for different correlation coefficients between indirect and direct effect sizes. Even in the absence of a correlation between indirect and direct effect sizes, the polygenic score based on standard GWAS outperforms the polygenic score based on sib-GWAS.

To understand this behavior and dependency of the $\frac{R_{s i b}^{2}}{R_{u r}^{2}}$ ratio on other parameters, we divide Equation 23 by Equation 22 and obtain

E [\frac{R_{s i b}^{2}}{R_{u r}^{2}}] \approx \frac{E [R_{s i b}^{2}]}{E [R_{u r}^{2}]} \approx {(1 + ρ \frac{σ_{η}}{σ_{β}})}^{2} α \frac{1 + c}{1 + c / α} .

Noting further that

{(1 + ρ \frac{σ_{η}}{σ_{β}})}^{2} α = {(\frac{σ_{β} + ρ σ_{η}}{σ_{β}})}^{2} \frac{σ_{β}^{2}}{σ_{β}^{2} + 2 ρ σ_{β} σ_{η} + σ_{η}^{2}} = 1 - (1 - ρ^{2}) \frac{τ_{η}^{2}}{τ^{2}},

we obtain

E [\frac{R_{s i b}^{2}}{R_{u r}^{2}}] \approx [1 - (1 - ρ^{2}) \frac{τ_{η}^{2}}{τ^{2}}] \frac{1 + c}{1 + c \frac{τ^{2}}{h_{β}^{2}}} .

(24)

A few conclusions emerge from Equation 24 and the accompanying simulations. First, the sib-GWAS based polygenic score will outperform the standard GWAS-based polygenic score only if direct and indirect effects are strongly negatively correlated (see Appendix 1—figure 9A-D for illustration). Second, the term

\frac{1 + c}{1 + c \frac{τ^{2}}{h_{β}^{2}}} = \frac{1 + \frac{\sum_{i}^{m} V a r [{\hat{β}}_{i}] V a r [x_{i}]}{τ^{2}}}{1 + \frac{\sum_{i}^{m} V a r [{\hat{β}}_{i}] V a r [x_{i}]}{h_{β}^{2}}}

(25)

can be interpreted as the dependence on the noise-to-signal ratio (where the signals are the proportions of phenotypic variance explained by direct and indirect effects of transmitted alleles). For a given sampling variance (matched across the two study designs), the extent of the signal will differ between standard GWAS and sib-GWAS. Importantly, the sampling variance influences the ratio of prediction accuracies. If indirect effects do not exist or make negligible contributions to the trait in question, then the ratio of prediction accuracies is expected to be close to one. In the presence of indirect effects, however, the magnitude of the deviation from one depends on the relationship between direct and indirect effects (and their covariance) as well as on the (matched) sampling variance. Simulations of several parameter combinations suggest that the overall effect of this dependence on the noise-to-signal ratio is a decrease in $R_{s i b}^{2} / R_{u r}^{2}$ as noise increases; as more SNPs are included in the polygenic scores, the advantage of the standard GWAS-based polygenic score over that of the sib-GWAS grows larger (Appendix 1—figure 9 E-H). These considerations inform the interpretation of patterns observed in Figure 3C–F of the main text.
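To make the dependence on the parameters concrete, Equations 22 and 23 and their ratio can be evaluated numerically. The Python sketch below is illustrative only. The function name `expected_r2` and the argument `noise` (standing for $\sum_{i} V a r [x_{i}] V a r [{\hat{β}}_{i}] / V a r [y]$) are assumptions, as is the replacement of $τ^{2}$ by its expectation under the random-effects model, $h_{β}^{2} + 2 ρ \sqrt{h_{β}^{2} τ_{η}^{2}} + τ_{η}^{2}$. It requires $h_{β}^{2} > 0$.

```python
import math

def expected_r2(h2_beta, tau2_eta, rho, noise):
    """Return (E[R2_ur], E[R2_sib], their ratio) following Equations 22 and 23.

    h2_beta:  variance proportion explained by direct effects (must be > 0)
    tau2_eta: variance proportion explained by indirect effects of transmitted alleles
    rho:      correlation between direct and indirect effect sizes
    noise:    sum_i Var[x_i] Var[beta_hat_i] / Var[y], matched across the two designs
    """
    sd_ratio = math.sqrt(tau2_eta / h2_beta)  # sigma_eta / sigma_beta
    # Assumption: tau^2 replaced by its expectation under the random-effects model.
    tau2 = h2_beta + 2.0 * rho * math.sqrt(h2_beta * tau2_eta) + tau2_eta
    c = noise / tau2                          # noise-to-signal ratio
    alpha = h2_beta / tau2
    r2_ur = tau2 / (1.0 + c)                                               # Equation 22
    r2_sib = (1.0 + rho * sd_ratio) ** 2 * h2_beta / (1.0 + c / alpha)     # Equation 23
    return r2_ur, r2_sib, r2_sib / r2_ur

# Arbitrary example: uncorrelated effects with h2_beta / tau2_eta = 5.
r2_ur, r2_sib, ratio = expected_r2(h2_beta=0.4, tau2_eta=0.08, rho=0.0, noise=0.1)
```

With these arbitrary inputs the ratio is below one, in line with the observation that the standard GWAS-based score is expected to outperform the sib-GWAS-based score even when direct and indirect effects are uncorrelated. Setting `tau2_eta` to zero returns a ratio of one.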

1.3.3 Simulations of indirect effects

For each set of simulated individuals (discovery, estimation and prediction sets), we first simulated mother-father pairs, assigning parental alleles from $B e r n o u l l i (p_{i})$ , where $p_{i}$ denotes the allele frequency at site $i$ . We then sampled the parental alleles at random to generate offspring (one offspring per mother-father pair to simulate a sample of unrelated individuals and two offspring to generate sibling pairs). Phenotypes of the offspring were assigned under an additive model, sampling from a Normal distribution with mean

\sum_{i}^{m} (β_{i} x_{i} + η_{i} (x_{i}^{p} + x_{i}^{m}))

(where $x_{i}^{m}$ and $x_{i}^{p}$ are the maternal and paternal effect allele counts, respectively) and variance $σ_{e}^{2},$ representing the total variance of environmental effects. When there is no correlation between direct and indirect effects, $σ_{e}^{2} = 1 - h_{β}^{2} - 2 τ_{η}^{2}$ . Using this approach, we generated a set of sibling pairs and estimated SNP effect sizes from these simulated data using a sib-GWAS. We calculated $n^{*}$ as follows: we simulated sets of unrelated individuals with a range of sample sizes. In each set, we performed a simple linear regression of the phenotypic values on the genotypes. We then estimated a linear relationship between the inverse of the median standard error of effect size estimates (as a dependent variable) and the square root of the sample size. Using this linear relationship, we predicted the sample size for the unrelated set that gives a median standard error equal to the median standard error of sib-GWAS effect size estimates ( $n^{*}$ ). Finally, we simulated a set of unrelated individuals with sample size $n^{*}$ and compared the prediction accuracy ( $R^{2}$ ) of the polygenic score based on standard GWAS on this sample with the one obtained from sib-GWAS.
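The calibration of $n^{*}$ described above (a linear fit of the inverse median standard error on the square root of the sample size, followed by prediction at the sib-GWAS median standard error) can be sketched as follows. This is a hypothetical Python illustration. The function name and the idealized input values, in which standard errors scale exactly as $1 / \sqrt{n}$, are assumptions.

```python
import numpy as np

def estimate_n_star(sample_sizes, median_se_unrelated, median_se_sib):
    """Fit 1/median_SE = a + b*sqrt(n), then solve for the n that gives median_se_sib."""
    root_n = np.sqrt(np.asarray(sample_sizes, dtype=float))
    inv_se = 1.0 / np.asarray(median_se_unrelated, dtype=float)
    slope, intercept = np.polyfit(root_n, inv_se, deg=1)  # highest-degree coefficient first
    return ((1.0 / median_se_sib - intercept) / slope) ** 2

# Idealized inputs: median SE = 2 / sqrt(n) in the unrelated sets.
sizes = [1000, 2000, 4000, 8000]
ses = [2.0 / np.sqrt(s) for s in sizes]
n_star = estimate_n_star(sizes, ses, median_se_sib=2.0 / np.sqrt(3000))
```

In this idealized setting the fit recovers a sample size of 3000 up to floating-point error. With simulated or real standard errors, the fitted line only approximates the relationship.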

We additionally investigated the effect of the number of SNPs included in the polygenic scores. For this analysis, we sorted the SNPs based on the association p-value obtained in an independent simulated set of unrelated individuals.

In these simulations, we used the following parameter values:

The ratio of the phenotypic variance accounted for by direct effects versus by indirect effects ( $h_{β}^{2} / τ_{η}^{2}$ ): 5
The phenotypic variance explained by offspring and parental alleles, given no correlation between direct and indirect effects ( $h_{β}^{2} + 2 τ_{η}^{2}$ ): 0.25 or 0.5
The ratio of the variance of direct effects to the variance of indirect effects ( $σ_{β}^{2} / σ_{η}^{2}$ ): 5
Allele frequencies, $p$ , drawn from a truncated exponential distribution, truncated on the left such that the minimum allele frequency is 1%.
The number of loci, assumed independent (i.e., in linkage equilibrium): 100 (all causal), or 10,000 (all causal) or 10,000 (20% causal)
SNP effect sizes drawn as
- $(\begin{array}{ll} β_{i} \\ η_{i} \end{array}) \sim N ((\begin{array}{ll} 0 \\ 0 \end{array}), (\begin{array}{ll} σ_{β}^{2} & ρ σ_{β} σ_{η} \\ ρ σ_{β} σ_{η} & σ_{η}^{2} \end{array})),$

where $ρ$ is the correlation between direct and indirect effect sizes. Effects sizes were then re-scaled to satisfy $\sum_{i}^{m} 2 β_{i}^{2} p_{i} (1 - p_{i}) = h_{β}^{2}$ and $\sum_{i}^{m} 2 η_{i}^{2} p_{i} (1 - p_{i}) = τ_{η}^{2}$ . Effects were set to 0 for non-causal loci.

The number of sibling pairs for sib-GWAS: 10,000
The number of unrelated individuals for prediction: 10,000
The number of unrelated individuals for discovery GWAS (i.e., to decide which SNPs to include): 20,000
Number of iterations used to estimate $n^{*}$ and $R^{2}$ for a given set of parameters: 10
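The effect-size sampling and re-scaling steps listed above can be sketched in Python as follows. This is a minimal illustration rather than the simulation code itself. The function name `draw_effects` is hypothetical, allele frequencies are drawn uniformly as a stand-in for the truncated exponential distribution, and all loci are treated as causal.

```python
import numpy as np

def draw_effects(p, h2_beta, tau2_eta, rho, rng):
    """Draw correlated (beta_i, eta_i) pairs and rescale them to target variance proportions."""
    cov = [[1.0, rho], [rho, 1.0]]  # unit variances; the scale is fixed by the rescaling below
    draws = rng.multivariate_normal([0.0, 0.0], cov, size=len(p))
    beta = draws[:, 0].copy()
    eta = draws[:, 1].copy()
    het = 2.0 * p * (1.0 - p)       # Var[x_i] under Hardy-Weinberg proportions
    beta *= np.sqrt(h2_beta / np.sum(het * beta**2))  # sum_i 2 beta_i^2 p_i (1 - p_i) = h2_beta
    eta *= np.sqrt(tau2_eta / np.sum(het * eta**2))   # sum_i 2 eta_i^2 p_i (1 - p_i) = tau2_eta
    return beta, eta

rng = np.random.default_rng(3)
p = rng.uniform(0.01, 0.5, size=5000)  # stand-in allele frequencies (assumption)
beta, eta = draw_effects(p, h2_beta=0.3, tau2_eta=0.06, rho=0.5, rng=rng)
```

Rescaling each effect vector by a positive constant leaves the correlation between direct and indirect effects unchanged, so the target $ρ$ is preserved up to sampling noise.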

1.4 Assortative mating

We consider assortative mating with regard to a phenotype, whereby the parents of individuals were more likely to mate if they were similar with respect to that phenotype. This process generates a correlation between genetic variants that contribute to the phenotype (i.e., linkage disequilibrium). Consequently, in a standard GWAS, the effect sizes of causal SNPs will partially capture the effect of other causal SNPs as well. Estimated effect sizes are thus expected to be inflated under positive assortative mating (mating of similar individuals) and deflated under negative assortative mating (mating of dissimilar individuals). In turn, in a sib-GWAS, the estimates are in expectation unaffected by assortative mating, because genetic differences between siblings arise from random Mendelian segregation in the parents.

1.4.1 Simulations of assortative mating

We used simulations to examine the phenotypic prediction accuracies of polygenic scores based on sib- and standard GWAS under a model with assortative mating (assuming no indirect effects or population stratification beyond assortative mating); to this end, we considered a sample of unrelated individuals, varying the degree of correlation between parental phenotypes $ρ_{a}$ . Similar to our simulations for indirect effects (Section 1.3.3), we first simulated the estimation procedure in a sibling-based and in a standard GWAS (with sample size $n^{*}$ ). We then computed the prediction accuracy $R^{2}$ in an independent sample of unrelated individuals (see ‘Further simulation details’ below).

We first considered the simple case of a single generation of assortative mating. In the presence of positive assortative mating ( $ρ_{a} > 0$ ), polygenic scores based on standard GWAS outperform those based on sib-GWAS, whereas the opposite is true in the case of negative assortative mating ( $ρ_{a} < 0$ ) (Appendix 1—figure 10 A). In simulations of two generations of assortative mating, the gap between the prediction accuracies of scores based on standard and sib-GWAS (Appendix 1—figure 10 B) widens, suggesting that our qualitative findings apply to scenarios of sustained assortative mating as well.

We further investigated prediction accuracy as a function of the number of SNPs included in the polygenic scores, by progressively increasing the p-value threshold, using p-values obtained from an independent GWAS in unrelated samples (similar to our analysis in Figure 3). We considered two genetic architecture scenarios: (i) all SNPs are causal and (ii) 20% of SNPs are causal (leading polygenic scores to include non-causal SNPs). Under both scenarios, the gap in prediction accuracies between standard and sib-GWAS grows with the number of SNPs (Appendix 1—figure 10 C-F).

Further simulation details

We simulated parental and offspring alleles as described for indirect effects in Section 1.3.3. To mimic assortative mating between parents, we first simulated i.i.d. genotypes (with effect allele counts $x_{i}$ at each SNP $i$ ) and randomly assigned ‘mother’ and ‘father’ labels to each individual. We then generated corresponding parental phenotypes under an additive model as

$$N \left( \sum_{i}^{m} β_{i} x_{i}, \sqrt{1 - h^{2}} \right)$$

where $β_{i}$ is the effect size of SNP $i$ , and $h^{2}$ is the heritability. The same model was used to generate offspring phenotypes.
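As a hedged illustration of this additive model, the sketch below draws phenotypes with mean $\sum_{i} β_{i} x_{i}$ and standard deviation $\sqrt{1 - h^{2}}$. The function name `simulate_phenotypes`, the toy sample sizes, and the uniform allele frequency distribution are assumptions chosen for brevity; the effect sizes are rescaled so that $\sum_{i} 2 β_{i}^{2} p_{i} (1 - p_{i}) = h^{2}$.

```python
import numpy as np

def simulate_phenotypes(genotypes, betas, h2, rng):
    """Draw one phenotype per individual: mean sum_i beta_i * x_i, SD sqrt(1 - h2)."""
    genetic_value = genotypes @ betas
    noise = rng.normal(0.0, np.sqrt(1.0 - h2), size=genotypes.shape[0])
    return genetic_value + noise

rng = np.random.default_rng(0)
n, m, h2 = 2000, 500, 0.5                       # toy sizes, not the paper's values
freqs = rng.uniform(0.01, 0.5, size=m)          # assumption: uniform allele frequencies
genotypes = rng.binomial(2, freqs, size=(n, m)).astype(float)  # effect allele counts
betas = rng.normal(0.0, 1.0, size=m)
# Rescale effects so that the expected genetic variance equals h2.
betas *= np.sqrt(h2 / np.sum(2 * betas**2 * freqs * (1 - freqs)))
phenotypes = simulate_phenotypes(genotypes, betas, h2, rng)
print(phenotypes.var())                         # should be close to 1 under this scaling
```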

To mimic the assortative mating process, we induced a given correlation between parental phenotypes, $ρ_{a}$, by pairing mothers and fathers as follows: we first generated a random matrix

$$\begin{pmatrix} u_{m, i} \\ u_{p, i} \end{pmatrix} \sim N \left( \begin{pmatrix} {\bar{y}}_{m} \\ {\bar{y}}_{p} \end{pmatrix}, \begin{pmatrix} σ_{y_{m}}^{2} & ρ_{a} σ_{y_{m}} σ_{y_{p}} \\ ρ_{a} σ_{y_{m}} σ_{y_{p}} & σ_{y_{p}}^{2} \end{pmatrix} \right),$$

where ${\bar{y}}_{m}$ and ${\bar{y}}_{p}$ are the average phenotypes of mothers and fathers, respectively, and $σ_{y_{m}}$ and $σ_{y_{p}}$ are the standard deviations of the phenotypes of mothers and fathers, respectively. We then sorted the sets of mothers and fathers such that the ranks of values in $y_{m}$ and $y_{p}$ match the ranks of values in $u_{m}$ and $u_{p}$, respectively, to obtain $\mathrm{cor}(y_{m}, y_{p}) \approx \mathrm{cor}(u_{m}, u_{p}) = ρ_{a}$. In the case of two generations of assortative mating, we simulated the generation of the grandparents similarly. We compared the performance of polygenic scores based on standard and sib-GWAS as described in Section 1.3.3. In the simulations, we used the following parameter values:

Heritability under random mating ( $h^{2}$ ): 0.5
The number of loci, assumed independent (i.e., in linkage equilibrium) under random mating: 10,000 (all causal) or 10,000 (20% causal)
Allele frequencies, $p$ , drawn from a truncated exponential distribution, truncated on the left such that the minimum allele frequency is 1%.
SNP effect sizes set to 0 for non-causal loci and drawn as $β_{i} \sim N (0, σ^{2})$ , choosing $σ^{2}$ to satisfy $\sum_{i}^{m} 2 β_{i}^{2} p_{i} (1 - p_{i}) = h^{2}$ for causal loci.
The number of sibling pairs for sib-GWAS: 10,000
The number of unrelated individuals for prediction: 10,000
The number of unrelated individuals for discovery GWAS (i.e., to decide which SNPs to include in the polygenic score): 20,000
The number of iterations used to estimate $n^{*}$ and $R^{2}$ for a given set of parameters: 10
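The rank-matching step used to pair mothers and fathers can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `pair_by_rank` is hypothetical, and the toy phenotypes are drawn as standard normal values rather than from the genetic model.

```python
import numpy as np

def pair_by_rank(y_m, y_p, rho_a, rng):
    """Return index arrays (mothers, fathers); couple k is (mothers[k], fathers[k]).

    The phenotype ranks of the paired parents match the ranks of a bivariate
    normal sample with correlation rho_a, so cor(y_m, y_p) is roughly rho_a.
    """
    n = len(y_m)
    sd_m, sd_p = y_m.std(), y_p.std()
    mean = [y_m.mean(), y_p.mean()]
    cov = [[sd_m**2, rho_a * sd_m * sd_p],
           [rho_a * sd_m * sd_p, sd_p**2]]
    u = rng.multivariate_normal(mean, cov, size=n)   # rows are (u_m, u_p)
    rank_m = u[:, 0].argsort().argsort()             # rank of each u_m value
    rank_p = u[:, 1].argsort().argsort()             # rank of each u_p value
    # argsort(y)[r] is the individual whose phenotype has rank r.
    mothers = np.argsort(y_m)[rank_m]
    fathers = np.argsort(y_p)[rank_p]
    return mothers, fathers

rng = np.random.default_rng(2)
n = 5000                                             # toy number of couples
y_m = rng.normal(size=n)                             # toy maternal phenotypes
y_p = rng.normal(size=n)                             # toy paternal phenotypes
mothers, fathers = pair_by_rank(y_m, y_p, 0.5, rng)
achieved = np.corrcoef(y_m[mothers], y_p[fathers])[0, 1]
print(achieved)
```

Because only ranks are matched, the achieved correlation is approximate, which is consistent with $ρ_{a}$ being described as the approximate correlation between parental phenotypes.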

Appendix 1—figure 1. — This figure extends Figure 1 of the main text, showing prediction accuracies based on large-scale diverse GWAS that are the union of all strata matching the number of individuals in each stratum. The numbers in parentheses show GWAS sample sizes (see Materials and methods for details). Each box and whiskers plot was computed based on 20 iterations of resampling estimation and prediction sets. Thick horizontal lines denote the medians. The polygenic scores were estimated in samples of unrelated WB individuals. Phenotypes were then predicted in distinct samples of unrelated WB individuals, stratified by sex (A), age (B) or Townsend deprivation index, a measure of SES (C). In red and green cases, polygenic scores are based on a GWAS in a sample limited to one sex, age or SES group (a 'stratum’). In black, polygenic scores are based on a diverse GWAS in a pooled sample of all strata. In blue, polygenic scores are based on a diverse GWAS in a pooled sample of all strata but downsampled to match the size of the stratified GWAS.

Appendix 1—figure 2. — This figure mirrors Figure 1 of the main text, except for the y-axis showing $R^{2}$ values (squared correlation between polygenic score and phenotype residualized on covariates), rather than incremental $R^{2}$ . Each box and whiskers plot was computed based on 20 iterations of resampling estimation and prediction sets. Thick horizontal lines denote the medians. The polygenic scores were estimated in samples of unrelated WB individuals. Phenotypes were then predicted in distinct samples of unrelated WB individuals, stratified by sex (A), age (B) or Townsend deprivation index, a measure of SES (C). In red and green cases, polygenic scores are based on a GWAS in a sample limited to one sex, age or SES group (a 'stratum’). In blue, polygenic scores are based on a GWAS in a diverse sample of all strata downsampled to match the size of the stratified GWAS.

Appendix 1—figure 3. — This figure extends Figure 1 of the main text, showing the prediction accuracies as a function of the p-value threshold for inclusion of a SNP in the polygenic score when based on a pruning and thresholding approach. The higher the p-value threshold is, the more SNPs are included. Last points on the x-axis correspond to a polygenic score model based on the LDpred approach (Vilhjálmsson et al., 2015) with a prior probability of 1 on loci being causal. Shown are incremental $R^{2}$ values in different prediction sets. Points and error bars are mean and central 80% range computed based on 20 iterations of resampling estimation and prediction sets. (**A–C**) The polygenic scores were estimated in samples of unrelated WB individuals. Phenotypes were then predicted in distinct samples of unrelated WB individuals, stratified by sex (A), age (B) or Townsend deprivation index, a measure of SES (C). (**D–I**) Same as in **A-C**, but here the polygenic scores are based on a GWAS in a sample limited to one sex, age or SES group. The trends shown in Figure 1 of the main text are for p-value threshold of 10⁻⁴, and are qualitatively similar to the trends for other choices of the polygenic score model. For each trait, sample sizes are matched across all GWAS sets.

Appendix 1—figure 4. — SNPs were ascertained in large samples of unrelated WB individuals. The effects of trait-increasing alleles were then re-estimated in an independent set of unrelated WB individuals (that were excluded from the original GWAS) stratified by sex for diastolic blood pressure (A), by age for BMI (B) and by Townsend deprivation index, a measure of SES for years of schooling (C). Points and error bars are mean and central 80% range computed based on 20 iterations of resampling ascertainment and estimation sets, plotted as a function of the p-value threshold (for p-values obtained in the discovery GWAS).

Appendix 1—figure 5. — This figure mirrors the last two columns in Appendix 1—figure 3, except that here, the GWAS estimates were obtained from a linear mixed model (LMM) (Loh et al., 2015). Shown are the prediction accuracies, measured as incremental $R^{2}$ , as a function of the p-value threshold for inclusion of a SNP in the polygenic score. Points and error bars are mean and central 80% range computed based on 20 iterations of resampling estimation and prediction sets. The polygenic scores are based on a GWAS in a sample limited to one sex, age or SES group. Phenotypes are then predicted in distinct samples of unrelated individuals, stratified by sex (**A,B**), age (**C,D**) or Townsend deprivation index, as a measure of SES (**E,F**). The qualitative trends are similar to those in Appendix 1—figure 3, which uses a standard linear regression with PCs (principal components of the genotype data) as a control for population structure when testing for an association between the phenotypes and genotypes. The similarity suggests that the observed differences in prediction accuracies across strata are not driven to a large degree by population structure confounding.

Appendix 1—figure 6. — Panels show the distribution of Townsend deprivation index, a measure of SES (A), the age distribution (B), and the proportion of males (C) for the siblings and unrelated sets used in the analysis described for Figure 3 of the main text. For each sibling pair, one sibling was randomly selected for these comparisons. The asterisk symbol marks a significant difference at the 1% level between siblings and unrelated individuals, as assessed by a Mann-Whitney test. SES and age distributions are quite similar in siblings and unrelated sets, whereas the proportion of males is significantly smaller in the siblings.

Appendix 1—figure 7. — Panels show the distribution of PCs (principal components of the genotype data) for the siblings and unrelated sets used in the analysis described for Figure 3 of the main text. For each sibling pair, one sibling was randomly selected for these comparisons. The asterisk symbol marks a significant difference at the 1% level between siblings and unrelated individuals, as assessed by a Mann-Whitney test. Despite slight but significant differences, siblings and unrelated sets are broadly similar with respect to their genetic ancestries.

Appendix 1—figure 8. — This figure mirrors Figure 3B of the main text, but here plotted for 12 simulated traits. The numbers in parentheses are the heritability, the number of causal loci considered, and the simulation replicate number, respectively. Three traits were simulated for each pair of heritability and number of causal loci parameters (see Materials and methods for simulation details). Small points show the ratio of the prediction accuracies in the two designs across 30 iterations; in each iteration, we resample sets of unrelated individuals to constitute three sets for discovery, estimation and prediction. Larger points show median values.

Appendix 1—figure 9. — (**A,B**) Simulation results as a function of the correlation between direct and indirect effects, $ρ$. Simulations were performed with $h_{β}^{2} = 0.5$, $τ_{η}^{2} = 0.1$, and $σ_{β}^{2} / σ_{η}^{2} = 5$. The size of the estimation set in the sib-GWAS is 10,000, and the size of the estimation set in the standard GWAS is chosen to match sampling variances between the two study designs. The polygenic score is based on 10,000 causal loci; its performance was evaluated in an independent set of 10,000 unrelated individuals. As long as direct and indirect effects are not strongly negatively correlated, the out-of-sample prediction accuracy is higher for the polygenic scores based on standard GWAS. (C) Same as (A) but with three-fold greater environmental noise. (D) Same as (A) but with 100 causal loci. In (**A–D**) points are mean ± 2 SD in 10 simulation iterations. Solid lines are values based on analytic expressions derived in Section 1.3.2. (**E–H**) Simulation results, with the same parameters as in (A) but $ρ = 0.5$, as a function of the number of SNPs included in the polygenic scores, with all loci being causal (**E,F**), or with 20% of loci being causal (**G,H**). SNPs are added in increasing order of their association p-value in an independent set of 20,000 unrelated individuals. In both cases, the ratio of prediction accuracies of polygenic scores based on sib- versus standard GWAS becomes smaller with the inclusion of more weakly associated SNPs, a behavior qualitatively similar to observations in Figure 3 in the main text. Points are mean ± 2 SD in 10 simulations. See Section 1.3.3 for simulation details.

Appendix 1—figure 10. — (A) Simulation results as a function of the approximate correlation between parental phenotypes, $ρ_{a}$. Simulations were performed with $h^{2} = 0.5$ under random mating. The size of the estimation set in the sib-GWAS is 10,000, and the size of the estimation set in the standard GWAS is chosen to match sampling variances between the two study designs. The polygenic score is based on 10,000 causal loci; its performance was evaluated in an independent set of 10,000 unrelated individuals. Polygenic scores based on standard GWAS outperform (underperform) those based on sib-GWAS under positive (negative) assortative mating. (B) Ratio of prediction accuracies of the polygenic scores based on sib- versus standard GWAS, as a function of $ρ_{a}$, for two sets of simulations with one or two generations of assortative mating, with the same parameters as in (A). (**C–F**) Simulation results, with the same parameters as in (A) but $ρ_{a} = 0.5$, as a function of the number of SNPs included in the polygenic score, with all loci being causal (**C,D**), or with 20% of loci being causal (**E,F**). SNPs are added in increasing order of their association p-value in an independent set of 20,000 unrelated individuals. In both cases, the ratio of prediction accuracies for scores based on sib-GWAS versus standard GWAS becomes smaller with the inclusion of more weakly associated SNPs, a behavior that is qualitatively similar to observations in Figure 3 in the main text. Points are mean ± 2 SD in 10 simulation iterations. See Section 1.4.1 for simulation details.

Appendix 1—figure 11. — This figure mirrors Figure 3B of the main text, but here the samples of siblings and unrelated individuals used in the analysis are matched for their sex ratios. Results are shown for diastolic blood pressure, as the prediction accuracy differed between sexes (Figure 1); the related phenotype of pulse rate; and a subset of the traits for which the prediction accuracy varied by GWAS design (Figure 3B). Small points show the ratio of the prediction accuracies in the two designs across 10 iterations; in each iteration, we resample sets of unrelated individuals to constitute three sets for discovery, estimation and prediction. Larger points show median values. We note that pulse rate is now similarly predicted by the two GWAS approaches, suggesting that the slightly higher prediction accuracy of the sib-GWAS shown in Figure 3B of the main text may be due to the sex ratio difference; for other traits, results are qualitatively unchanged.

Appendix 1—figure 12. — This figure complements Figure 3C–F of the main text, showing the results of the study design depicted in Figure 3A for all traits presented in Figure 3. As described for Figure 3, we randomly divided unrelated individuals to constitute three non-overlapping sets for discovery, estimation and prediction. Small points correspond to 10 iterations of resampling these three sets. The prediction accuracy is plotted as a function of the p-value threshold, where p-values come from the discovery GWAS. Lines show median values.

Appendix 1—figure 13. — (A) The y-axis shows the prediction accuracy, measured as incremental $R^{2}$, in prediction sets stratified by participants’ number of siblings, using a polygenic score for years of schooling based on a GWAS performed using individuals who reported having exactly 1 sibling. The x-axis shows the p-value threshold for inclusion of a SNP in the polygenic score when based on a pruning and thresholding approach. Last points on the x-axis correspond to a polygenic score model based on the LDpred approach (Vilhjálmsson et al., 2015) with a prior probability of 1 on loci being causal. Points are values based on 10 iterations of resampling estimation and prediction sets. Thick horizontal lines denote the mean values. (**B–E**) Comparison of the distribution of Townsend deprivation index (B), the age distribution (C), the proportion of males (D), and mean years of schooling (± 2 SD) (E) between individuals who reported having no sibling and those who reported having 1 sibling. The two sets have somewhat different distributions of ages (or possibly come from somewhat different birth cohorts), a feature that could contribute to the patterns seen in panel A, but are otherwise similar with respect to the other features considered.

Appendix 1—figure 14. — This figure is analogous to the one shown in Figure 1 of the main text, but considering dichotomized versions of the traits presented in Figure 1 in the prediction sets, and with the y-axis showing incremental AUC values rather than incremental $R^{2}$ . The polygenic scores are based on GWAS using the quantitative trait values as in Figure 1. The traits are (A) diastolic blood pressure of over 110 mmHg, (B) BMI of over 35 Kg/m², and (C) completing a college or a university degree. Each box and whiskers plot was computed based on 20 iterations of resampling estimation and prediction sets. Thick horizontal lines denote the medians.

Appendix 1—figure 15. — This figure is analogous to the one shown in Figure 1 of the main text, but looking at disease traits, and with the y-axis showing incremental AUC rather than incremental $R^{2}$ . Each box and whiskers plot was computed based on 20 iterations of resampling estimation and prediction sets. Thick horizontal lines denote the medians. The variable prediction accuracy of PGS based on GWAS in men only versus women only could be driven in part by the differences in ratios of cases to controls (and hence by differences in the precision of the effect size estimates). However, we also observe that the prediction accuracy can vary depending on the sex composition of the prediction set (e.g., for cardiovascular outcomes), an observation that cannot be attributed to differences in case:control ratios of the GWAS.

Appendix 1—figure 16. — This figure mirrors Figure 3B of the main text, but here we first residualized the phenotypes on covariates, and then ran the same pipeline as that used to generate Figure 3B on the residuals, without further adjustment for covariates in the GWAS or prediction evaluation. Thus, this figure relates more directly to the analytical derivation in Section 1.2. The results here and in Figure 3B are nonetheless qualitatively similar. Small points show the ratio of the prediction accuracies in the two designs across 10 iterations; in each iteration, we resample sets of unrelated individuals to constitute three sets for discovery, estimation and prediction. Larger points show median values.

Appendix 1—table 1. UK Biobank phenotype data used in this study and their corresponding data fields.

In parentheses are the units of measurements.

Trait	Description	UKB data field
Age	Age when attended assessment center (years)	21003
Age at first sex	Self-reported age at first sexual intercourse (years)	2139
Alcohol intake frequency	Self-reported category, encoded as an integer: 1, 'Daily or almost daily’; 2, 'Three or four times a week’; 3, 'Once or twice a week’; 4, 'One to three times a month’; 5, 'Special occasions only’; 6, 'Never’	1558
Basal metabolic rate	Estimated from body composition impedance measurements (KJ)	23105
Birth weight	Self-reported birth weight (Kg)	20022
Body mass index	Constructed from height and weight measurements (Kg/m²)	21001
Diastolic blood pressure	Measured using automated devices (mmHg); values are adjusted for medicine use (see Materials and methods)	4079, 6153, 6177
Fluid intelligence	Unweighted sum of the number of correct answers given to 13 fluid intelligence questions	20016
Forced vital capacity	Calculated from breath spirometry (liters)	3062
Hair color	Self-reported category, encoded as an integer: 1, 'Blonde’; 2, 'Red’; 3, 'Light brown’; 4, 'Dark brown’; 5, 'Black’; none, 'Other’	1747
Hand grip strength	Measured right and left hand isometric grip strength (Kg)	46, 47
Height	Measured standing height (cm)	50
Hip circumference	Measured hip circumference (cm)	49
Hospital inpatient diagnoses	Diagnoses made during hospital inpatient admissions, coded according to the International Classification of Diseases (ICD-9 and ICD-10)	41202, 41203, 41204, 41205, 41270, 41271
Household income	Self-reported average total annual household income before tax category, encoded as an integer: 1, 'Less than £18,000'; 2, '£18,000 to £30,999'; 3, '£31,000 to £51,999'; 4, '£52,000 to £100,000'; 5, 'Greater than £100,000'	738
Myocardial infarction outcomes	Algorithmically-defined myocardial infarction outcomes obtained through combinations of UK Biobank's assessment data collection (e.g., self-reported conditions), and data from hospital admissions	42000, 42001
Neuroticism score	Derived summary score, based on participants’ responses to 12 neurotic behaviour-related questions	20127
Number of full siblings	Sum of self-reported number of full brothers and full sisters	1873, 1883
Overall health rating	Self-reported category, encoded as an integer: 1, 'Excellent'; 2, 'Good'; 3, 'Fair'; 4, 'Poor’	2178
Pack years of smoking	Calculated for individuals who have ever smoked as the number of cigarettes smoked per day, divided by twenty, multiplied by the number of years of smoking (years)	20161
Pulse rate	Measured during the automated blood pressure readings (bpm)	102
Qualifications	Self-reported educational or professional qualifications, selected from: 'College or University degree', 'NVQ or HND or HNC or equivalent', 'Other professional qualifications eg: nursing, teaching', 'A levels/AS levels or equivalent', 'O levels/GCSEs or equivalent', 'CSEs or equivalent', or 'None of the above'	6138
Sex	Self-reported sex and as determined from genotyping analysis	31, 22001
Skin color	Self-reported category, encoded as an integer: 1, 'Very fair'; 2, 'Fair'; 3, 'Light olive'; 4, 'Dark olive'; 5, 'Brown'; 6, 'Black'	1717
Townsend deprivation index	Townsend deprivation index at recruitment	189
Vascular/heart problems	Self-reported vascular/heart problems diagnosed by doctor selected from the categories: 'Heart attack’, 'Angina', 'Stroke', 'High blood pressure', and 'None of the above'	6150
Waist circumference	Measured waist circumference (cm)	48


Appendix 1—table 2. Genetic correlations across samples that vary by a study characteristic.

Numbers are genetic correlations estimated using LD score regression for BMI, years of schooling and diastolic blood pressure, across samples stratified by age, Townsend deprivation index (a measure of socioeconomic status, SES), and sex, respectively. ‘Q’ denotes quartile of age or SES.

Trait/characteristic	Pair of strata	Genetic correlation (s.e.)
BMI/Age	(Q1,Q2)	0.93 (0.036)
	(Q1,Q3)	0.95 (0.035)
	(Q1,Q4)	0.95 (0.038)
	(Q2,Q3)	0.89 (0.032)
	(Q2,Q4)	0.91 (0.036)
	(Q3,Q4)	1.00 (0.040)
Years of schooling/SES	(Q1,Q2)	0.98 (0.054)
	(Q1,Q3)	1.00 (0.067)
	(Q1,Q4)	0.93 (0.068)
	(Q2,Q3)	0.97 (0.064)
	(Q2,Q4)	1.09 (0.074)
	(Q3,Q4)	1.04 (0.074)
Diastolic blood pressure/Sex	(male,female)	0.93 (0.031)


Appendix 1—table 3. Sample sizes used for siblings and unrelated sets.

As described in Figure 3A, for the comparison of prediction accuracies of polygenic scores based on standard and sib-GWAS, we first ascertain SNPs in a large sample of unrelated individuals (‘Unrelated-discovery’) and then estimate the effect of these SNPs with a standard regression using unrelated individuals (‘Unrelated-n*') and, independently, using sib-regression (in the ‘Siblings’ set). Finally, we used the polygenic scores for prediction in a third sample of unrelated individuals (‘Unrelated-prediction’). This table shows sample sizes used for each set across the traits analyzed. For simulated traits, the numbers in parentheses are heritability, number of causal loci, and simulation replicate, respectively (three traits were simulated for each pair of heritability and number of causal loci parameters, see Materials and methods for simulation details).

Trait	Siblings (pairs)	Unrelated-n*	Unrelated-discovery	Unrelated-prediction
Age at first sex	13675	8746	244988	27220
Alcohol intake frequency	17282	10923	276885	30764
Basal metabolic rate	16802	13467	269750	29972
Birth weight	6750	5766	159074	17674
BMI	17217	12359	274868	30540
Diastolic blood pressure	14791	9514	253227	28136
Fluid intelligence	3889	2979	101016	11223
Forced vital capacity	14605	10009	252576	28064
Hair color	16859	11763	272209	30245
Hand grip strength	17070	10832	275117	30568
Height	17242	18147	269973	29997
Hip circumference	17254	11648	275930	30658
Household income	13240	8704	239326	26591
Neuroticism score	11756	6909	227010	25223
Overall health rating	17189	10378	276581	30731
Pack years of smoking	2307	1682	85544	9504
Pulse rate	14791	8812	253859	28206
Skin color	16903	10334	274159	30462
Waist circumference	17257	11749	275873	30652
Years of schooling	17037	11885	273553	30394
Simulated trait (0.5,10K,1)	17299	11685	276404	30711
Simulated trait (0.5,10K,2)	17299	11505	276566	30729
Simulated trait (0.5,10K,3)	17299	11422	276641	30737
Simulated trait (0.5,100K,1)	17299	11814	276288	30698
Simulated trait (0.5,100K,2)	17299	11833	276271	30696
Simulated trait (0.5,100K,3)	17299	11490	276579	30731
Simulated trait (0.1,10K,1)	17299	9072	278756	30972
Simulated trait (0.1,10K,2)	17299	9158	278678	30964
Simulated trait (0.1,10K,3)	17299	9111	278721	30968
Simulated trait (0.1,100K,1)	17299	9133	278701	30966
Simulated trait (0.1,100K,2)	17299	9069	278758	30973
Simulated trait (0.1,100K,3)	17299	9108	278723	30969


Appendix 1—table 4. Qualifications to years of schooling conversion table.

Educational or professional qualifications were converted to years of schooling following Okbay et al. (2016).

Qualifications (UKB data field 6138)	Years of schooling
College or University degree	20
NVQ or HND or HNC or equivalent	19
Other professional qualifications eg: nursing, teaching	15
A levels/AS levels or equivalent	13
O levels/GCSEs or equivalent	10
CSEs or equivalent	10
None of the above	7
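The conversion table above can be expressed as a simple lookup, sketched below. The mapping values come directly from the table; the function name `years_of_schooling` and the rule of taking the highest value when a participant lists several qualifications are assumptions for illustration, since the table itself does not state how multiple selections are resolved.

```python
# Years of schooling per qualification, as listed in the conversion table.
YEARS_OF_SCHOOLING = {
    "College or University degree": 20,
    "NVQ or HND or HNC or equivalent": 19,
    "Other professional qualifications eg: nursing, teaching": 15,
    "A levels/AS levels or equivalent": 13,
    "O levels/GCSEs or equivalent": 10,
    "CSEs or equivalent": 10,
    "None of the above": 7,
}

def years_of_schooling(qualifications):
    """Convert a participant's reported qualifications to years of schooling.

    Assumption: when several qualifications are reported, the largest value is
    used. Returns None if no recognized qualification is present.
    """
    years = [YEARS_OF_SCHOOLING[q] for q in qualifications if q in YEARS_OF_SCHOOLING]
    return max(years) if years else None

example = years_of_schooling(
    ["A levels/AS levels or equivalent", "College or University degree"]
)
print(example)
```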


Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Hakhamanesh Mostafavi, Email: hsm2137@columbia.edu.

Arbel Harpak, Email: ah3586@columbia.edu.

Molly Przeworski, Email: mp3284@columbia.edu.

Ruth Loos, The Icahn School of Medicine at Mount Sinai, United States.

Michael B Eisen, HHMI, University of California, Berkeley, United States.

Funding Information

This paper was supported by the following grants:

National Institute of General Medical Sciences GM121372 to Molly Przeworski.
National Human Genome Research Institute HG008140 to Jonathan K Pritchard.
Robert Wood Johnson Foundation 84337817 to Dalton Conley.
Simons Foundation 633313 to Arbel Harpak.

Additional information

Competing interests

Reviewing editor, eLife.

No competing interests declared.

Author contributions

Conceptualization, Data curation, Software, Formal analysis, Investigation, Visualization, Methodology.

Formal analysis, Investigation, Visualization.

Conceptualization.

Conceptualization, Resources, Supervision, Methodology, Project administration.

Ethics

Human subjects: This study has been conducted using the UK Biobank resource under application Number 11138, as approved by Columbia University Institutional Review Board, protocol AAAS2914.

Additional files

Transparent reporting form

elife-48376-transrepform.pdf (276.5 KB, PDF)

Data availability

The GWAS summary statistics generated in this study have been uploaded to Dryad.

The following dataset was generated:

Mostafavi H, Harpak A, Agarwal I, Conley D, Pritchard JK, Przeworski M. 2019. Variable prediction accuracy of polygenic scores within an ancestry group. Dryad Digital Repository.

References

Adhikari K, Mendoza-Revilla J, Sohail A, Fuentes-Guajardo M, Lampert J, Chacón-Duque JC, Hurtado M, Villegas V, Granja V, Acuña-Alonzo V, Jaramillo C, Arias W, Lozano RB, Everardo P, Gómez-Valdés J, Villamil-Ramírez H, Silva de Cerqueira CC, Hunemeier T, Ramallo V, Schuler-Faccini L, Salzano FM, Gonzalez-José R, Bortolini M-C, Canizales-Quinteros S, Gallo C, Poletti G, Bedoya G, Rothhammer F, Tobin DJ, Fumagalli M, Balding D, Ruiz-Linares A. A GWAS in Latin Americans highlights the convergent evolution of lighter skin pigmentation in Eurasia. Nature Communications. 2019;10:385. doi: 10.1038/s41467-018-08147-0.
Barcellos SH, Carvalho LS, Turley P. Education can reduce health disparities related to genetic risk of obesity: evidence from a British reform. bioRxiv. 2018. doi: 10.1101/260463.
Belsky DW, Domingue BW, Wedow R, Arseneault L, Boardman JD, Caspi A, Conley D, Fletcher JM, Freese J, Herd P, Moffitt TE, Poulton R, Sicinski K, Wertz J, Harris KM. Genetic analysis of social-class mobility in five longitudinal studies. PNAS. 2018;115:E7275–E7284. doi: 10.1073/pnas.1801238115.
Bentley AR, Sung YJ, Brown MR, Winkler TW, Kraja AT, Ntalla I, Schwander K, Chasman DI, Lim E, Deng X, Guo X, Liu J, Lu Y, Cheng CY, Sim X, Vojinovic D, Huffman JE, Musani SK, Li C, Feitosa MF, Richard MA, Noordam R, Baker J, Chen G, Aschard H, Bartz TM, Ding J, Dorajoo R, Manning AK, Rankinen T, Smith AV, Tajuddin SM, Zhao W, Graff M, Alver M, Boissel M, Chai JF, Chen X, Divers J, Evangelou E, Gao C, Goel A, Hagemeijer Y, Harris SE, Hartwig FP, He M, Horimoto A, Hsu FC, Hung YJ, Jackson AU, Kasturiratne A, Komulainen P, Kühnel B, Leander K, Lin KH, Luan J, Lyytikäinen LP, Matoba N, Nolte IM, Pietzner M, Prins B, Riaz M, Robino A, Said MA, Schupf N, Scott RA, Sofer T, Stancáková A, Takeuchi F, Tayo BO, van der Most PJ, Varga TV, Wang TD, Wang Y, Ware EB, Wen W, Xiang YB, Yanek LR, Zhang W, Zhao JH, Adeyemo A, Afaq S, Amin N, Amini M, Arking DE, Arzumanyan Z, Aung T, Ballantyne C, Barr RG, Bielak LF, Boerwinkle E, Bottinger EP, Broeckel U, Brown M, Cade BE, Campbell A, Canouil M, Charumathi S, Chen YI, Christensen K, Concas MP, Connell JM, de Las Fuentes L, de Silva HJ, de Vries PS, Doumatey A, Duan Q, Eaton CB, Eppinga RN, Faul JD, Floyd JS, Forouhi NG, Forrester T, Friedlander Y, Gandin I, Gao H, Ghanbari M, Gharib SA, Gigante B, Giulianini F, Grabe HJ, Gu CC, Harris TB, Heikkinen S, Heng CK, Hirata M, Hixson JE, Ikram MA, Jia Y, Joehanes R, Johnson C, Jonas JB, Justice AE, Katsuya T, Khor CC, Kilpeläinen TO, Koh WP, Kolcic I, Kooperberg C, Krieger JE, Kritchevsky SB, Kubo M, Kuusisto J, Lakka TA, Langefeld CD, Langenberg C, Launer LJ, Lehne B, Lewis CE, Li Y, Liang J, Lin S, Liu CT, Liu J, Liu K, Loh M, Lohman KK, Louie T, Luzzi A, Mägi R, Mahajan A, Manichaikul AW, McKenzie CA, Meitinger T, Metspalu A, Milaneschi Y, Milani L, Mohlke KL, Momozawa Y, Morris AP, Murray AD, Nalls MA, Nauck M, Nelson CP, North KE, O'Connell JR, Palmer ND, Papanicolau GJ, Pedersen NL, Peters A, Peyser PA, Polasek O, Poulter N, Raitakari OT, Reiner AP, Renström F, Rice TK, Rich SS, 
Robinson JG, Rose LM, Rosendaal FR, Rudan I, Schmidt CO, Schreiner PJ, Scott WR, Sever P, Shi Y, Sidney S, Sims M, Smith JA, Snieder H, Starr JM, Strauch K, Stringham HM, Tan NYQ, Tang H, Taylor KD, Teo YY, Tham YC, Tiemeier H, Turner ST, Uitterlinden AG, van Heemst D, Waldenberger M, Wang H, Wang L, Wang L, Wei WB, Williams CA, Wilson G, Wojczynski MK, Yao J, Young K, Yu C, Yuan JM, Zhou J, Zonderman AB, Becker DM, Boehnke M, Bowden DW, Chambers JC, Cooper RS, de Faire U, Deary IJ, Elliott P, Esko T, Farrall M, Franks PW, Freedman BI, Froguel P, Gasparini P, Gieger C, Horta BL, Juang JJ, Kamatani Y, Kammerer CM, Kato N, Kooner JS, Laakso M, Laurie CC, Lee IT, Lehtimäki T, Magnusson PKE, Oldehinkel AJ, Penninx B, Pereira AC, Rauramaa R, Redline S, Samani NJ, Scott J, Shu XO, van der Harst P, Wagenknecht LE, Wang JS, Wang YX, Wareham NJ, Watkins H, Weir DR, Wickremasinghe AR, Wu T, Zeggini E, Zheng W, Bouchard C, Evans MK, Gudnason V, Kardia SLR, Liu Y, Psaty BM, Ridker PM, van Dam RM, Mook-Kanamori DO, Fornage M, Province MA, Kelly TN, Fox ER, Hayward C, van Duijn CM, Tai ES, Wong TY, Loos RJF, Franceschini N, Rotter JI, Zhu X, Bierut LJ, Gauderman WJ, Rice K, Munroe PB, Morrison AC, Rao DC, Rotimi CN, Cupples LA, COGENT-Kidney Consortium, EPIC-InterAct Consortium, Understanding Society Scientific Group, Lifelines Cohort. Multi-ancestry genome-wide gene-smoking interaction study of 387,272 individuals identifies new loci associated with serum lipids. Nature Genetics. 2019;51:636–648. doi: 10.1038/s41588-019-0378-y.
Berg JJ, Harpak A, Sinnott-Armstrong N, Joergensen AM, Mostafavi H, Field Y, Boyle EA, Zhang X, Racimo F, Pritchard JK, Coop G. Reduced signal for polygenic adaptation of height in UK biobank. eLife. 2019;8:e39725. doi: 10.7554/eLife.39725. [DOI] [PMC free article] [PubMed] [Google Scholar]
Berg JJ, Coop G. A population genetic signal of polygenic adaptation. PLOS Genetics. 2014;10:e1004412. doi: 10.1371/journal.pgen.1004412. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bien SA, Wojcik GL, Hodonsky CJ, Gignoux CR, Cheng I, Matise TC, Peters U, Kenny EE, North KE. The future of genomic studies must be globally representative: perspectives from PAGE. Annual Review of Genomics and Human Genetics. 2019;20:181–200. doi: 10.1146/annurev-genom-091416-035517. [DOI] [PMC free article] [PubMed] [Google Scholar]
Boyle EA, Li YI, Pritchard JK. An expanded view of complex traits: from polygenic to omnigenic. Cell. 2017;169:1177–1186. doi: 10.1016/j.cell.2017.05.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
Branigan AR, McCallum KJ, Freese J. Variation in the heritability of educational attainment: an international meta-analysis. Social Forces. 2013;92:109–140. doi: 10.1093/sf/sot076. [DOI] [Google Scholar]
Briley DA, Tucker-Drob EM. Explaining the increasing heritability of cognitive ability across development: a meta-analysis of longitudinal twin and adoption studies. Psychological Science. 2013;24:1704–1713. doi: 10.1177/0956797613478618. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bulik-Sullivan BK, Loh PR, Finucane HK, Ripke S, Yang J, Patterson N, Daly MJ, Price AL, Neale BM, Schizophrenia Working Group of the Psychiatric Genomics Consortium LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature Genetics. 2015;47:291–295. doi: 10.1038/ng.3211. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, Motyer A, Vukcevic D, Delaneau O, O'Connell J, Cortes A, Welsh S, Young A, Effingham M, McVean G, Leslie S, Allen N, Donnelly P, Marchini J. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4:7. doi: 10.1186/s13742-015-0047-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chetty R, Hendren N. Race and Economic Opportunity in the United States: An Intergenerational Perspective. National Bureau of Economic Research; 2018. [Google Scholar]
Conley D. Being Black, Living in the Red: Race, Wealth, and Social Policy in America. University of California Press; 2010. [Google Scholar]
Conley D, Domingue BW, Cesarini D, Dawes C, Rietveld CA, Boardman JD. Is the effect of parental education on offspring biased or moderated by genotype? Sociological Science. 2015;2:82–105. doi: 10.15195/v2.a6. [DOI] [PMC free article] [PubMed] [Google Scholar]
Conley D. Socio-genomic research using genome-wide molecular data. Annual Review of Sociology. 2016;42:275–299. doi: 10.1146/annurev-soc-081715-074316. [DOI] [Google Scholar]
Davies NM, Dickson M, Davey Smith G, van den Berg GJ, Windmeijer F. The causal effects of education on health outcomes in the UK Biobank. Nature Human Behaviour. 2018;2:117–125. doi: 10.1038/s41562-017-0279-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
De La Vega FM, Bustamante CD. Polygenic risk scores: a biased prediction? Genome Medicine. 2018;10:100. doi: 10.1186/s13073-018-0610-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Domingue BW, Fletcher J, Conley D, Boardman JD. Genetic and educational assortative mating among US adults. PNAS. 2014;111:7996–8000. doi: 10.1073/pnas.1321426111. [DOI] [PMC free article] [PubMed] [Google Scholar]
Domingue BW, Belsky DW, Fletcher JM, Conley D, Boardman JD, Harris KM. The social genome of friends and schoolmates in the National Longitudinal Study of Adolescent to Adult Health. PNAS. 2018;115:702–707. doi: 10.1073/pnas.1711803115. [DOI] [PMC free article] [PubMed] [Google Scholar]
Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLOS Genetics. 2013;9:e1003348. doi: 10.1371/journal.pgen.1003348. [DOI] [PMC free article] [PubMed] [Google Scholar]
Duncan L, Shen H, Gelaye B, Ressler K, Feldman M, Peterson R, Domingue B. Analysis of polygenic score usage and performance across diverse human populations. bioRxiv. 2018 doi: 10.1101/398396. [DOI] [PMC free article] [PubMed]
Edge MD, Coop G. Reconstructing the history of polygenic scores using coalescent trees. Genetics. 2019;211:235–262. doi: 10.1534/genetics.118.301687. [DOI] [PMC free article] [PubMed] [Google Scholar]
Elks CE, den Hoed M, Zhao JH, Sharp SJ, Wareham NJ, Loos RJ, Ong KK. Variability in the heritability of body mass index: a systematic review and meta-regression. Frontiers in Endocrinology. 2012;3:29. doi: 10.3389/fendo.2012.00029. [DOI] [PMC free article] [PubMed] [Google Scholar]
Euesden J, Lewis CM, O'Reilly PF. PRSice: polygenic risk score software. Bioinformatics. 2015;31:1466–1468. doi: 10.1093/bioinformatics/btu848. [DOI] [PMC free article] [PubMed] [Google Scholar]
Evangelou E, Warren HR, Mosen-Ansorena D, Mifsud B, Pazoki R, Gao H, Ntritsos G, Dimou N, Cabrera CP, Karaman I, Ng FL, Evangelou M, Witkowska K, Tzanis E, Hellwege JN, Giri A, Velez Edwards DR, Sun YV, Cho K, Gaziano JM, Wilson PWF, Tsao PS, Kovesdy CP, Esko T, Mägi R, Milani L, Almgren P, Boutin T, Debette S, Ding J, Giulianini F, Holliday EG, Jackson AU, Li-Gao R, Lin WY, Luan J, Mangino M, Oldmeadow C, Prins BP, Qian Y, Sargurupremraj M, Shah N, Surendran P, Thériault S, Verweij N, Willems SM, Zhao JH, Amouyel P, Connell J, de Mutsert R, Doney ASF, Farrall M, Menni C, Morris AD, Noordam R, Paré G, Poulter NR, Shields DC, Stanton A, Thom S, Abecasis G, Amin N, Arking DE, Ayers KL, Barbieri CM, Batini C, Bis JC, Blake T, Bochud M, Boehnke M, Boerwinkle E, Boomsma DI, Bottinger EP, Braund PS, Brumat M, Campbell A, Campbell H, Chakravarti A, Chambers JC, Chauhan G, Ciullo M, Cocca M, Collins F, Cordell HJ, Davies G, de Borst MH, de Geus EJ, Deary IJ, Deelen J, Del Greco M F, Demirkale CY, Dörr M, Ehret GB, Elosua R, Enroth S, Erzurumluoglu AM, Ferreira T, Frånberg M, Franco OH, Gandin I, Gasparini P, Giedraitis V, Gieger C, Girotto G, Goel A, Gow AJ, Gudnason V, Guo X, Gyllensten U, Hamsten A, Harris TB, Harris SE, Hartman CA, Havulinna AS, Hicks AA, Hofer E, Hofman A, Hottenga JJ, Huffman JE, Hwang SJ, Ingelsson E, James A, Jansen R, Jarvelin MR, Joehanes R, Johansson Å, Johnson AD, Joshi PK, Jousilahti P, Jukema JW, Jula A, Kähönen M, Kathiresan S, Keavney BD, Khaw KT, Knekt P, Knight J, Kolcic I, Kooner JS, Koskinen S, Kristiansson K, Kutalik Z, Laan M, Larson M, Launer LJ, Lehne B, Lehtimäki T, Liewald DCM, Lin L, Lind L, Lindgren CM, Liu Y, Loos RJF, Lopez LM, Lu Y, Lyytikäinen LP, Mahajan A, Mamasoula C, Marrugat J, Marten J, Milaneschi Y, Morgan A, Morris AP, Morrison AC, Munson PJ, Nalls MA, Nandakumar P, Nelson CP, Niiranen T, Nolte IM, Nutile T, Oldehinkel AJ, Oostra BA, O'Reilly PF, Org E, Padmanabhan S, Palmas W, Palotie A, Pattie A, Penninx B, Perola 
M, Peters A, Polasek O, Pramstaller PP, Nguyen QT, Raitakari OT, Ren M, Rettig R, Rice K, Ridker PM, Ried JS, Riese H, Ripatti S, Robino A, Rose LM, Rotter JI, Rudan I, Ruggiero D, Saba Y, Sala CF, Salomaa V, Samani NJ, Sarin AP, Schmidt R, Schmidt H, Shrine N, Siscovick D, Smith AV, Snieder H, Sõber S, Sorice R, Starr JM, Stott DJ, Strachan DP, Strawbridge RJ, Sundström J, Swertz MA, Taylor KD, Teumer A, Tobin MD, Tomaszewski M, Toniolo D, Traglia M, Trompet S, Tuomilehto J, Tzourio C, Uitterlinden AG, Vaez A, van der Most PJ, van Duijn CM, Vergnaud AC, Verwoert GC, Vitart V, Völker U, Vollenweider P, Vuckovic D, Watkins H, Wild SH, Willemsen G, Wilson JF, Wright AF, Yao J, Zemunik T, Zhang W, Attia JR, Butterworth AS, Chasman DI, Conen D, Cucca F, Danesh J, Hayward C, Howson JMM, Laakso M, Lakatta EG, Langenberg C, Melander O, Mook-Kanamori DO, Palmer CNA, Risch L, Scott RA, Scott RJ, Sever P, Spector TD, van der Harst P, Wareham NJ, Zeggini E, Levy D, Munroe PB, Newton-Cheh C, Brown MJ, Metspalu A, Hung AM, O'Donnell CJ, Edwards TL, Psaty BM, Tzoulaki I, Barnes MR, Wain LV, Elliott P, Caulfield MJ, Million Veteran Program Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits. Nature Genetics. 2018;50:1412–1425. doi: 10.1038/s41588-018-0205-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Field Y, Boyle EA, Telis N, Gao Z, Gaulton KJ, Golan D, Yengo L, Rocheleau G, Froguel P, McCarthy MI, Pritchard JK. Detection of human adaptation during the past 2000 years. Science. 2016;354:760–764. doi: 10.1126/science.aag0776. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fry A, Littlejohns TJ, Sudlow C, Doherty N, Adamska L, Sprosen T, Collins R, Allen NE. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. American Journal of Epidemiology. 2017;186:1026–1034. doi: 10.1093/aje/kwx246. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ge T, Chen CY, Neale BM, Sabuncu MR, Smoller JW. Phenome-wide heritability analysis of the UK Biobank. PLOS Genetics. 2017;13:e1006711. doi: 10.1371/journal.pgen.1006711. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gibson G. The environmental contribution to gene expression profiles. Nature Reviews Genetics. 2008;9:575–581. doi: 10.1038/nrg2383. [DOI] [PubMed] [Google Scholar]
Haworth S, Mitchell R, Corbin L, Wade KH, Dudding T, Budu-Aggrey A, Carslake D, Hemani G, Paternoster L, Smith GD, Davies N, Lawson DJ, Timpson NJ. Apparent latent structure within the UK Biobank sample has implications for epidemiological analysis. Nature Communications. 2019;10:333. doi: 10.1038/s41467-018-08219-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
Henderson CR. Applications of Linear Models in Animal Breeding. Vol. 462. University of Guelph; 1984. [Google Scholar]
Höllinger I, Pennings PS, Hermisson J. Polygenic adaptation: From sweeps to subtle frequency shifts. PLOS Genetics. 2019;15:e1008035. doi: 10.1371/journal.pgen.1008035. [DOI] [PMC free article] [PubMed] [Google Scholar]
Inouye M, Abraham G, Nelson CP, Wood AM, Sweeting MJ, Dudbridge F, Lai FY, Kaptoge S, Brozynska M, Wang T, Ye S, Webb TR, Rutter MK, Tzoulaki I, Patel RS, Loos RJF, Keavney B, Hemingway H, Thompson J, Watkins H, Deloukas P, Di Angelantonio E, Butterworth AS, Danesh J, Samani NJ, UK Biobank CardioMetabolic Consortium CHD Working Group Genomic risk prediction of coronary artery disease in 480,000 adults: implications for primary prevention. Journal of the American College of Cardiology. 2018;72:1883–1893. doi: 10.1016/j.jacc.2018.07.079. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kathiresan S, Melander O, Anevski D, Guiducci C, Burtt NP, Roos C, Hirschhorn JN, Berglund G, Hedblad B, Groop L, Altshuler DM, Newton-Cheh C, Orho-Melander M. Polymorphisms associated with cholesterol and risk of cardiovascular events. New England Journal of Medicine. 2008;358:1240–1249. doi: 10.1056/NEJMoa0706728. [DOI] [PubMed] [Google Scholar]
Kerminen S, Martin AR, Koskela J, Ruotsalainen SE, Havulinna AS, Surakka I, Palotie A. Geographic variation and bias in polygenic scores of complex diseases and traits in Finland. bioRxiv. 2018 doi: 10.1101/485441. [DOI] [PMC free article] [PubMed]
Khera AV, Chaffin M, Aragam KG, Haas ME, Roselli C, Choi SH, Natarajan P, Lander ES, Lubitz SA, Ellinor PT, Kathiresan S. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nature Genetics. 2018;50:1219–1224. doi: 10.1038/s41588-018-0183-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
Khera AV, Chaffin M, Wade KH, Zahid S, Brancale J, Xia R, Distefano M, Senol-Cosar O, Haas ME, Bick A, Aragam KG, Lander ES, Smith GD, Mason-Suares H, Fornage M, Lebo M, Timpson NJ, Kaplan LM, Kathiresan S. Polygenic prediction of weight and obesity trajectories from birth to adulthood. Cell. 2019;177:587–596. doi: 10.1016/j.cell.2019.03.028. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kim MS, Patel KP, Teng AK, Berens AJ, Lachance J. Genetic disease risks can be misestimated across global populations. Genome Biology. 2018;19:179. doi: 10.1186/s13059-018-1561-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kong A, Thorleifsson G, Frigge ML, Vilhjalmsson BJ, Young AI, Thorgeirsson TE, Benonisdottir S, Oddsson A, Halldorsson BV, Masson G, Gudbjartsson DF, Helgason A, Bjornsdottir G, Thorsteinsdottir U, Stefansson K. The nature of nurture: effects of parental genotypes. Science. 2018;359:424–428. doi: 10.1126/science.aan6877. [DOI] [PubMed] [Google Scholar]
Lawson DJ, Davies NM, Haworth S, Ashraf B, Howe L, Crawford A, Hemani G, Davey Smith G, Timpson NJ. Is population structure in the genetic biobank era irrelevant, a challenge, or an opportunity? Human Genetics. 2020;139:1–2. doi: 10.1007/s00439-019-02014-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lee JJ, Wedow R, Okbay A, Kong E, Maghzian O, Zacher M, Nguyen-Viet TA, Bowers P, Sidorenko J, Karlsson Linnér R, Fontana MA, Kundu T, Lee C, Li H, Li R, Royer R, Timshel PN, Walters RK, Willoughby EA, Yengo L, Alver M, Bao Y, Clark DW, Day FR, Furlotte NA, Joshi PK, Kemper KE, Kleinman A, Langenberg C, Mägi R, Trampush JW, Verma SS, Wu Y, Lam M, Zhao JH, Zheng Z, Boardman JD, Campbell H, Freese J, Harris KM, Hayward C, Herd P, Kumari M, Lencz T, Luan J, Malhotra AK, Metspalu A, Milani L, Ong KK, Perry JRB, Porteous DJ, Ritchie MD, Smart MC, Smith BH, Tung JY, Wareham NJ, Wilson JF, Beauchamp JP, Conley DC, Esko T, Lehrer SF, Magnusson PKE, Oskarsson S, Pers TH, Robinson MR, Thom K, Watson C, Chabris CF, Meyer MN, Laibson DI, Yang J, Johannesson M, Koellinger PD, Turley P, Visscher PM, Benjamin DJ, Cesarini D, 23andMe Research Team, COGENT (Cognitive Genomics Consortium), Social Science Genetic Association Consortium Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nature Genetics. 2018;50:1112–1121. doi: 10.1038/s41588-018-0147-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
Listgarten J, Lippert C, Heckerman D. FaST-LMM-Select for addressing confounding from spatial structure and rare variants. Nature Genetics. 2013;45:470–471. doi: 10.1038/ng.2620. [DOI] [PubMed] [Google Scholar]
Loh PR, Tucker G, Bulik-Sullivan BK, Vilhjálmsson BJ, Finucane HK, Salem RM, Chasman DI, Ridker PM, Neale BM, Berger B, Patterson N, Price AL. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nature Genetics. 2015;47:284–290. doi: 10.1038/ng.3190. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lynch M, Walsh B. Genetics and Analysis of Quantitative Traits. Vol. 1. Sunderland, MA: Sinauer; 1998. [Google Scholar]
Martin AR, Gignoux CR, Walters RK, Wojcik GL, Neale BM, Gravel S, Daly MJ, Bustamante CD, Kenny EE. Human demographic history impacts genetic risk prediction across diverse populations. The American Journal of Human Genetics. 2017;100:635–649. doi: 10.1016/j.ajhg.2017.03.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
Martin AR, Teferra S, Möller M, Hoal EG, Daly MJ. The critical needs and challenges for genetic architecture studies in Africa. Current Opinion in Genetics & Development. 2018;53:113–120. doi: 10.1016/j.gde.2018.08.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nature Genetics. 2019;51:584–591. doi: 10.1038/s41588-019-0379-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Mathieson I, McVean G. Reply to: "FaST-LMM-Select for addressing confounding from spatial structure and rare variants". Nature Genetics. 2013;45:471. doi: 10.1038/ng.2619. [DOI] [PubMed] [Google Scholar]
Mavaddat N, Michailidou K, Dennis J, Lush M, Fachal L, Lee A, Tyrer JP, Chen TH, Wang Q, Bolla MK, Yang X, Adank MA, Ahearn T, Aittomäki K, Allen J, Andrulis IL, Anton-Culver H, Antonenkova NN, Arndt V, Aronson KJ, Auer PL, Auvinen P, Barrdahl M, Beane Freeman LE, Beckmann MW, Behrens S, Benitez J, Bermisheva M, Bernstein L, Blomqvist C, Bogdanova NV, Bojesen SE, Bonanni B, Børresen-Dale AL, Brauch H, Bremer M, Brenner H, Brentnall A, Brock IW, Brooks-Wilson A, Brucker SY, Brüning T, Burwinkel B, Campa D, Carter BD, Castelao JE, Chanock SJ, Chlebowski R, Christiansen H, Clarke CL, Collée JM, Cordina-Duverger E, Cornelissen S, Couch FJ, Cox A, Cross SS, Czene K, Daly MB, Devilee P, Dörk T, Dos-Santos-Silva I, Dumont M, Durcan L, Dwek M, Eccles DM, Ekici AB, Eliassen AH, Ellberg C, Engel C, Eriksson M, Evans DG, Fasching PA, Figueroa J, Fletcher O, Flyger H, Försti A, Fritschi L, Gabrielson M, Gago-Dominguez M, Gapstur SM, García-Sáenz JA, Gaudet MM, Georgoulias V, Giles GG, Gilyazova IR, Glendon G, Goldberg MS, Goldgar DE, González-Neira A, Grenaker Alnæs GI, Grip M, Gronwald J, Grundy A, Guénel P, Haeberle L, Hahnen E, Haiman CA, Håkansson N, Hamann U, Hankinson SE, Harkness EF, Hart SN, He W, Hein A, Heyworth J, Hillemanns P, Hollestelle A, Hooning MJ, Hoover RN, Hopper JL, Howell A, Huang G, Humphreys K, Hunter DJ, Jakimovska M, Jakubowska A, Janni W, John EM, Johnson N, Jones ME, Jukkola-Vuorinen A, Jung A, Kaaks R, Kaczmarek K, Kataja V, Keeman R, Kerin MJ, Khusnutdinova E, Kiiski JI, Knight JA, Ko YD, Kosma VM, Koutros S, Kristensen VN, Krüger U, Kühl T, Lambrechts D, Le Marchand L, Lee E, Lejbkowicz F, Lilyquist J, Lindblom A, Lindström S, Lissowska J, Lo WY, Loibl S, Long J, Lubiński J, Lux MP, MacInnis RJ, Maishman T, Makalic E, Maleva Kostovska I, Mannermaa A, Manoukian S, Margolin S, Martens JWM, Martinez ME, Mavroudis D, McLean C, Meindl A, Menon U, Middha P, Miller N, Moreno F, Mulligan AM, Mulot C, Muñoz-Garzon VM, Neuhausen SL, Nevanlinna H, Neven P, 
Newman WG, Nielsen SF, Nordestgaard BG, Norman A, Offit K, Olson JE, Olsson H, Orr N, Pankratz VS, Park-Simon TW, Perez JIA, Pérez-Barrios C, Peterlongo P, Peto J, Pinchev M, Plaseska-Karanfilska D, Polley EC, Prentice R, Presneau N, Prokofyeva D, Purrington K, Pylkäs K, Rack B, Radice P, Rau-Murthy R, Rennert G, Rennert HS, Rhenius V, Robson M, Romero A, Ruddy KJ, Ruebner M, Saloustros E, Sandler DP, Sawyer EJ, Schmidt DF, Schmutzler RK, Schneeweiss A, Schoemaker MJ, Schumacher F, Schürmann P, Schwentner L, Scott C, Scott RJ, Seynaeve C, Shah M, Sherman ME, Shrubsole MJ, Shu XO, Slager S, Smeets A, Sohn C, Soucy P, Southey MC, Spinelli JJ, Stegmaier C, Stone J, Swerdlow AJ, Tamimi RM, Tapper WJ, Taylor JA, Terry MB, Thöne K, Tollenaar R, Tomlinson I, Truong T, Tzardi M, Ulmer HU, Untch M, Vachon CM, van Veen EM, Vijai J, Weinberg CR, Wendt C, Whittemore AS, Wildiers H, Willett W, Winqvist R, Wolk A, Yang XR, Yannoukakos D, Zhang Y, Zheng W, Ziogas A, Dunning AM, Thompson DJ, Chenevix-Trench G, Chang-Claude J, Schmidt MK, Hall P, Milne RL, Pharoah PDP, Antoniou AC, Chatterjee N, Kraft P, García-Closas M, Simard J, Easton DF, ABCTB Investigators, kConFab/AOCS Investigators, NBCS Collaborators Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. The American Journal of Human Genetics. 2019;104:21–34. doi: 10.1016/j.ajhg.2018.11.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
Meuwissen TH, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–1829. doi: 10.1093/genetics/157.4.1819. [DOI] [PMC free article] [PubMed] [Google Scholar]
Mills MC, Rahal C. A scientometric review of genome-wide association studies. Communications Biology. 2019;2:9. doi: 10.1038/s42003-018-0261-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Mostafavi H, Berisa T, Day FR, Perry JRB, Przeworski M, Pickrell JK. Identifying genetic variants that affect viability in large cohorts. PLOS Biology. 2017;15:e2002458. doi: 10.1371/journal.pbio.2002458. [DOI] [PMC free article] [PubMed] [Google Scholar]
Nuru-Jeter AM, Michaels EK, Thomas MD, Reeves AN, Thorpe RJ, LaVeist TA. Relative roles of race versus socioeconomic position in studies of health inequalities: a matter of interpretation. Annual Review of Public Health. 2018;39:169–188. doi: 10.1146/annurev-publhealth-040617-014230. [DOI] [PMC free article] [PubMed] [Google Scholar]
Okbay A, Beauchamp JP, Fontana MA, Lee JJ, Pers TH, Rietveld CA, Turley P, Chen GB, Emilsson V, Meddens SF, Oskarsson S, Pickrell JK, Thom K, Timshel P, de Vlaming R, Abdellaoui A, Ahluwalia TS, Bacelis J, Baumbach C, Bjornsdottir G, Brandsma JH, Pina Concas M, Derringer J, Furlotte NA, Galesloot TE, Girotto G, Gupta R, Hall LM, Harris SE, Hofer E, Horikoshi M, Huffman JE, Kaasik K, Kalafati IP, Karlsson R, Kong A, Lahti J, van der Lee SJ, deLeeuw C, Lind PA, Lindgren KO, Liu T, Mangino M, Marten J, Mihailov E, Miller MB, van der Most PJ, Oldmeadow C, Payton A, Pervjakova N, Peyrot WJ, Qian Y, Raitakari O, Rueedi R, Salvi E, Schmidt B, Schraut KE, Shi J, Smith AV, Poot RA, St Pourcain B, Teumer A, Thorleifsson G, Verweij N, Vuckovic D, Wellmann J, Westra HJ, Yang J, Zhao W, Zhu Z, Alizadeh BZ, Amin N, Bakshi A, Baumeister SE, Biino G, Bønnelykke K, Boyle PA, Campbell H, Cappuccio FP, Davies G, De Neve JE, Deloukas P, Demuth I, Ding J, Eibich P, Eisele L, Eklund N, Evans DM, Faul JD, Feitosa MF, Forstner AJ, Gandin I, Gunnarsson B, Halldórsson BV, Harris TB, Heath AC, Hocking LJ, Holliday EG, Homuth G, Horan MA, Hottenga JJ, de Jager PL, Joshi PK, Jugessur A, Kaakinen MA, Kähönen M, Kanoni S, Keltigangas-Järvinen L, Kiemeney LA, Kolcic I, Koskinen S, Kraja AT, Kroh M, Kutalik Z, Latvala A, Launer LJ, Lebreton MP, Levinson DF, Lichtenstein P, Lichtner P, Liewald DC, Loukola A, Madden PA, Mägi R, Mäki-Opas T, Marioni RE, Marques-Vidal P, Meddens GA, McMahon G, Meisinger C, Meitinger T, Milaneschi Y, Milani L, Montgomery GW, Myhre R, Nelson CP, Nyholt DR, Ollier WE, Palotie A, Paternoster L, Pedersen NL, Petrovic KE, Porteous DJ, Räikkönen K, Ring SM, Robino A, Rostapshova O, Rudan I, Rustichini A, Salomaa V, Sanders AR, Sarin AP, Schmidt H, Scott RJ, Smith BH, Smith JA, Staessen JA, Steinhagen-Thiessen E, Strauch K, Terracciano A, Tobin MD, Ulivi S, Vaccargiu S, Quaye L, van Rooij FJ, Venturini C, Vinkhuyzen AA, Völker U, Völzke H, Vonk JM, Vozzi D, Waage J, Ware EB, 
Willemsen G, Attia JR, Bennett DA, Berger K, Bertram L, Bisgaard H, Boomsma DI, Borecki IB, Bültmann U, Chabris CF, Cucca F, Cusi D, Deary IJ, Dedoussis GV, van Duijn CM, Eriksson JG, Franke B, Franke L, Gasparini P, Gejman PV, Gieger C, Grabe HJ, Gratten J, Groenen PJ, Gudnason V, van der Harst P, Hayward C, Hinds DA, Hoffmann W, Hyppönen E, Iacono WG, Jacobsson B, Järvelin MR, Jöckel KH, Kaprio J, Kardia SL, Lehtimäki T, Lehrer SF, Magnusson PK, Martin NG, McGue M, Metspalu A, Pendleton N, Penninx BW, Perola M, Pirastu N, Pirastu M, Polasek O, Posthuma D, Power C, Province MA, Samani NJ, Schlessinger D, Schmidt R, Sørensen TI, Spector TD, Stefansson K, Thorsteinsdottir U, Thurik AR, Timpson NJ, Tiemeier H, Tung JY, Uitterlinden AG, Vitart V, Vollenweider P, Weir DR, Wilson JF, Wright AF, Conley DC, Krueger RF, Davey Smith G, Hofman A, Laibson DI, Medland SE, Meyer MN, Yang J, Johannesson M, Visscher PM, Esko T, Koellinger PD, Cesarini D, Benjamin DJ, LifeLines Cohort Study Genome-wide association study identifies 74 loci associated with educational attainment. Nature. 2016;533:539–542. doi: 10.1038/nature17671. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pharoah PD, Antoniou AC, Easton DF, Ponder BA. Polygenes, risk prediction, and targeted prevention of breast cancer. New England Journal of Medicine. 2008;358:2796–2803. doi: 10.1056/NEJMsa0708739. [DOI] [PubMed] [Google Scholar]
Polderman TJ, Benyamin B, de Leeuw CA, Sullivan PF, van Bochoven A, Visscher PM, Posthuma D. Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nature Genetics. 2015;47:702–709. doi: 10.1038/ng.3285. [DOI] [PubMed] [Google Scholar]
Popejoy AB, Fullerton SM. Genomics is failing on diversity. Nature. 2016;538:161–164. doi: 10.1038/538161a. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pritchard JK, Di Rienzo A. Adaptation - not by sweeps alone. Nature Reviews Genetics. 2010;11:665–667. doi: 10.1038/nrg2880. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pritchard JK, Przeworski M. Linkage disequilibrium in humans: models and data. The American Journal of Human Genetics. 2001;69:1–14. doi: 10.1086/321275. [DOI] [PMC free article] [PubMed] [Google Scholar]
Racimo F, Berg JJ, Pickrell JK. Detecting polygenic adaptation in admixture graphs. Genetics. 2018;208:1565–1584. doi: 10.1534/genetics.117.300489. [DOI] [PMC free article] [PubMed] [Google Scholar]
Reckelhoff JF. Gender differences in the regulation of blood pressure. Hypertension. 2001;37:1199–1208. doi: 10.1161/01.HYP.37.5.1199. [DOI] [PubMed] [Google Scholar]
Reich M. Racial Inequality: A Political-Economic Analysis. Princeton University Press; 2017. [Google Scholar]
Rimfeld K, Krapohl E, Trzaskowski M, Coleman JRI, Selzam S, Dale PS, Esko T, Metspalu A, Plomin R. Genetic influence on social outcomes during and after the Soviet era in Estonia. Nature Human Behaviour. 2018;2:269–275. doi: 10.1038/s41562-018-0332-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
Robinson MR, Kleinman A, Graff M, Vinkhuyzen AAE, Couper D, Miller MB, Peyrot WJ, Abdellaoui A, Zietsch BP, Nolte IM, van Vliet-Ostaptchouk JV, Snieder H, Medland SE, Martin NG, Magnusson PKE, Iacono WG, McGue M, North KE, Yang J, Visscher PM. Genetic evidence of assortative mating in humans. Nature Human Behaviour. 2017;1:16. doi: 10.1038/s41562-016-0016. [DOI] [Google Scholar]
Rosenberg NA, Edge MD, Pritchard JK, Feldman MW. Interpreting polygenic scores, polygenic adaptation, and human phenotypic differences. Evolution, Medicine, and Public Health. 2019;2019:26–34. doi: 10.1093/emph/eoy036. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ruby JG, Wright KM, Rand KA, Kermany A, Noto K, Curtis D, Varner N, Garrigan D, Slinkov D, Dorfman I, Granka JM, Byrnes J, Myres N, Ball C. Estimates of the heritability of human longevity are substantially inflated due to assortative mating. Genetics. 2018;210:1109–1124. doi: 10.1534/genetics.118.301613. [DOI] [PMC free article] [PubMed] [Google Scholar]
Sella G, Barton NH. Thinking about the evolution of complex traits in the era of genome-wide association studies. Annual Review of Genomics and Human Genetics. 2019;20:461–493. doi: 10.1146/annurev-genom-083115-022316. [DOI] [PubMed] [Google Scholar]
Selzam S, Ritchie SJ, Pingault JB, Reynolds CA, O'Reilly PF, Plomin R. Comparing within- and between-family polygenic score prediction. The American Journal of Human Genetics. 2019;105:351–363. doi: 10.1016/j.ajhg.2019.06.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
Sirugo G, Williams SM, Tishkoff SA. The missing diversity in human genetic studies. Cell. 2019;177:26–31. doi: 10.1016/j.cell.2019.02.048. [DOI] [PMC free article] [PubMed] [Google Scholar]
Sohail M, Maier RM, Ganna A, Bloemendal A, Martin AR, Turchin MC, Chiang CW, Hirschhorn J, Daly MJ, Patterson N, Neale B, Mathieson I, Reich D, Sunyaev SR. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife. 2019;8:e39702. doi: 10.7554/eLife.39702. [DOI] [PMC free article] [PubMed] [Google Scholar]
Speidel L, Forest M, Shi S, Myers S. A method for genome-wide genealogy estimation for thousands of samples. bioRxiv. 2019 doi: 10.1101/550558. [DOI] [PMC free article] [PubMed]
Stulp G, Simons MJP, Grasman S, Pollet TV. Assortative mating for human height: a meta-analysis. American Journal of Human Biology. 2017;29:e22917. doi: 10.1002/ajhb.22917. [DOI] [PMC free article] [PubMed] [Google Scholar]
Taylor AE, Jones HJ, Sallis H, Euesden J, Stergiakouli E, Davies NM, Zammit S, Lawlor DA, Munafò MR, Davey Smith G, Tilling K. Exploring the association of genetic factors with participation in the Avon Longitudinal Study of Parents and Children. International Journal of Epidemiology. 2018;47:1207–1216. doi: 10.1093/ije/dyy060. [DOI] [PMC free article] [PubMed] [Google Scholar]
Telkar N, Reiker T, Walters RG, Lin K, Eriksson A, Gurdasani D, Gilly A. The transferability of lipid loci across African, Asian and European cohorts. bioRxiv. 2019 doi: 10.1101/525170. [DOI] [PMC free article] [PubMed]
Torkamani A, Wineinger NE, Topol EJ. The personal and clinical utility of polygenic risk scores. Nature Reviews Genetics. 2018;19:581–590. doi: 10.1038/s41576-018-0018-x. [DOI] [PubMed] [Google Scholar]
Trejo S, Domingue BW. Genetic nature or genetic nurture? Quantifying bias in analyses using polygenic scores. bioRxiv. 2019 doi: 10.1101/524850. [DOI]
Tropf FC, Lee SH, Verweij RM, Stulp G, van der Most PJ, de Vlaming R, Bakshi A, Briley DA, Rahal C, Hellpap R, Iliadou AN, Esko T, Metspalu A, Medland SE, Martin NG, Barban N, Snieder H, Robinson MR, Mills MC. Hidden heritability due to heterogeneity across seven populations. Nature Human Behaviour. 2017;1:757–765. doi: 10.1038/s41562-017-0195-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
Uricchio LH, Kitano HC, Gusev A, Zaitlen NA. An evolutionary compass for detecting signals of polygenic selection and mutational bias. Evolution Letters. 2019;3:69–79. doi: 10.1002/evl3.97. [DOI] [PMC free article] [PubMed] [Google Scholar]
Vilhjálmsson BJ, Yang J, Finucane HK, Gusev A, Lindström S, Ripke S, Genovese G, Loh PR, Bhatia G, Do R, Hayeck T, Won HH, Kathiresan S, Pato M, Pato C, Tamimi R, Stahl E, Zaitlen N, Pasaniuc B, Belbin G, Kenny EE, Schierup MH, De Jager P, Patsopoulos NA, McCarroll S, Daly M, Purcell S, Chasman D, Neale B, Goddard M, Visscher PM, Kraft P, Patterson N, Price AL, Schizophrenia Working Group of the Psychiatric Genomics Consortium, Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study Modeling linkage disequilibrium increases accuracy of polygenic risk scores. The American Journal of Human Genetics. 2015;97:576–592. doi: 10.1016/j.ajhg.2015.09.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
Vilhjálmsson BJ, Nordborg M. The nature of confounding in genome-wide association studies. Nature Reviews Genetics. 2013;14:1–2. doi: 10.1038/nrg3382. [DOI] [PubMed] [Google Scholar]
Ware EB, Schmitz LL, Faul JD, Gard A, Mitchell C, Smith JA, Zhao W, Weir D, Kardia SLR. Heterogeneity in polygenic scores for common human traits. bioRxiv. 2017 doi: 10.1101/106062. [DOI]
Weedon MN, Lango H, Lindgren CM, Wallace C, Evans DM, Mangino M, Freathy RM, Perry JR, Stevens S, Hall AS, Samani NJ, Shields B, Prokopenko I, Farrall M, Dominiczak A, Johnson T, Bergmann S, Beckmann JS, Vollenweider P, Waterworth DM, Mooser V, Palmer CN, Morris AD, Ouwehand WH, Zhao JH, Li S, Loos RJ, Barroso I, Deloukas P, Sandhu MS, Wheeler E, Soranzo N, Inouye M, Wareham NJ, Caulfield M, Munroe PB, Hattersley AT, McCarthy MI, Frayling TM, Diabetes Genetics Initiative, Wellcome Trust Case Control Consortium, Cambridge GEM Consortium Genome-wide association analysis identifies 20 loci that influence adult height. Nature Genetics. 2008;40:575–583. doi: 10.1038/ng.121. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wojcik GL, Graff M, Nishimura KK, Tao R, Haessler J, Gignoux CR, Highland HM, Patel YM, Sorokin EP, Avery CL, Belbin GM, Bien SA, Cheng I, Cullina S, Hodonsky CJ, Hu Y, Huckins LM, Jeff J, Justice AE, Kocarnik JM, Lim U, Lin BM, Lu Y, Nelson SC, Park SL, Poisner H, Preuss MH, Richard MA, Schurmann C, Setiawan VW, Sockell A, Vahi K, Verbanck M, Vishnu A, Walker RW, Young KL, Zubair N, Acuña-Alonso V, Ambite JL, Barnes KC, Boerwinkle E, Bottinger EP, Bustamante CD, Caberto C, Canizales-Quinteros S, Conomos MP, Deelman E, Do R, Doheny K, Fernández-Rhodes L, Fornage M, Hailu B, Heiss G, Henn BM, Hindorff LA, Jackson RD, Laurie CA, Laurie CC, Li Y, Lin DY, Moreno-Estrada A, Nadkarni G, Norman PJ, Pooler LC, Reiner AP, Romm J, Sabatti C, Sandoval K, Sheng X, Stahl EA, Stram DO, Thornton TA, Wassel CL, Wilkens LR, Winkler CA, Yoneyama S, Buyske S, Haiman CA, Kooperberg C, Le Marchand L, Loos RJF, Matise TC, North KE, Peters U, Kenny EE, Carlson CS. Genetic analyses of diverse populations improves discovery for complex traits. Nature. 2019;570:514–518. doi: 10.1038/s41586-019-1310-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wood AR, Esko T, Yang J, Vedantam S, Pers TH, Gustafsson S, Chu AY, Estrada K, Luan J, Kutalik Z, Amin N, Buchkovich ML, Croteau-Chonka DC, Day FR, Duan Y, Fall T, Fehrmann R, Ferreira T, Jackson AU, Karjalainen J, Lo KS, Locke AE, Mägi R, Mihailov E, Porcu E, Randall JC, Scherag A, Vinkhuyzen AA, Westra HJ, Winkler TW, Workalemahu T, Zhao JH, Absher D, Albrecht E, Anderson D, Baron J, Beekman M, Demirkan A, Ehret GB, Feenstra B, Feitosa MF, Fischer K, Fraser RM, Goel A, Gong J, Justice AE, Kanoni S, Kleber ME, Kristiansson K, Lim U, Lotay V, Lui JC, Mangino M, Mateo Leach I, Medina-Gomez C, Nalls MA, Nyholt DR, Palmer CD, Pasko D, Pechlivanis S, Prokopenko I, Ried JS, Ripke S, Shungin D, Stancáková A, Strawbridge RJ, Sung YJ, Tanaka T, Teumer A, Trompet S, van der Laan SW, van Setten J, Van Vliet-Ostaptchouk JV, Wang Z, Yengo L, Zhang W, Afzal U, Arnlöv J, Arscott GM, Bandinelli S, Barrett A, Bellis C, Bennett AJ, Berne C, Blüher M, Bolton JL, Böttcher Y, Boyd HA, Bruinenberg M, Buckley BM, Buyske S, Caspersen IH, Chines PS, Clarke R, Claudi-Boehm S, Cooper M, Daw EW, De Jong PA, Deelen J, Delgado G, Denny JC, Dhonukshe-Rutten R, Dimitriou M, Doney AS, Dörr M, Eklund N, Eury E, Folkersen L, Garcia ME, Geller F, Giedraitis V, Go AS, Grallert H, Grammer TB, Gräßler J, Grönberg H, de Groot LC, Groves CJ, Haessler J, Hall P, Haller T, Hallmans G, Hannemann A, Hartman CA, Hassinen M, Hayward C, Heard-Costa NL, Helmer Q, Hemani G, Henders AK, Hillege HL, Hlatky MA, Hoffmann W, Hoffmann P, Holmen O, Houwing-Duistermaat JJ, Illig T, Isaacs A, James AL, Jeff J, Johansen B, Johansson Å, Jolley J, Juliusdottir T, Junttila J, Kho AN, Kinnunen L, Klopp N, Kocher T, Kratzer W, Lichtner P, Lind L, Lindström J, Lobbens S, Lorentzon M, Lu Y, Lyssenko V, Magnusson PK, Mahajan A, Maillard M, McArdle WL, McKenzie CA, McLachlan S, McLaren PJ, Menni C, Merger S, Milani L, Moayyeri A, Monda KL, Morken MA, Müller G, Müller-Nurasyid M, Musk AW, Narisu N, Nauck M, Nolte IM, Nöthen MM, 
Oozageer L, Pilz S, Rayner NW, Renstrom F, Robertson NR, Rose LM, Roussel R, Sanna S, Scharnagl H, Scholtens S, Schumacher FR, Schunkert H, Scott RA, Sehmi J, Seufferlein T, Shi J, Silventoinen K, Smit JH, Smith AV, Smolonska J, Stanton AV, Stirrups K, Stott DJ, Stringham HM, Sundström J, Swertz MA, Syvänen AC, Tayo BO, Thorleifsson G, Tyrer JP, van Dijk S, van Schoor NM, van der Velde N, van Heemst D, van Oort FV, Vermeulen SH, Verweij N, Vonk JM, Waite LL, Waldenberger M, Wennauer R, Wilkens LR, Willenborg C, Wilsgaard T, Wojczynski MK, Wong A, Wright AF, Zhang Q, Arveiler D, Bakker SJ, Beilby J, Bergman RN, Bergmann S, Biffar R, Blangero J, Boomsma DI, Bornstein SR, Bovet P, Brambilla P, Brown MJ, Campbell H, Caulfield MJ, Chakravarti A, Collins R, Collins FS, Crawford DC, Cupples LA, Danesh J, de Faire U, den Ruijter HM, Erbel R, Erdmann J, Eriksson JG, Farrall M, Ferrannini E, Ferrières J, Ford I, Forouhi NG, Forrester T, Gansevoort RT, Gejman PV, Gieger C, Golay A, Gottesman O, Gudnason V, Gyllensten U, Haas DW, Hall AS, Harris TB, Hattersley AT, Heath AC, Hengstenberg C, Hicks AA, Hindorff LA, Hingorani AD, Hofman A, Hovingh GK, Humphries SE, Hunt SC, Hypponen E, Jacobs KB, Jarvelin MR, Jousilahti P, Jula AM, Kaprio J, Kastelein JJ, Kayser M, Kee F, Keinanen-Kiukaanniemi SM, Kiemeney LA, Kooner JS, Kooperberg C, Koskinen S, Kovacs P, Kraja AT, Kumari M, Kuusisto J, Lakka TA, Langenberg C, Le Marchand L, Lehtimäki T, Lupoli S, Madden PA, Männistö S, Manunta P, Marette A, Matise TC, McKnight B, Meitinger T, Moll FL, Montgomery GW, Morris AD, Morris AP, Murray JC, Nelis M, Ohlsson C, Oldehinkel AJ, Ong KK, Ouwehand WH, Pasterkamp G, Peters A, Pramstaller PP, Price JF, Qi L, Raitakari OT, Rankinen T, Rao DC, Rice TK, Ritchie M, Rudan I, Salomaa V, Samani NJ, Saramies J, Sarzynski MA, Schwarz PE, Sebert S, Sever P, Shuldiner AR, Sinisalo J, Steinthorsdottir V, Stolk RP, Tardif JC, Tönjes A, Tremblay A, Tremoli E, Virtamo J, Vohl MC, Amouyel P, Asselbergs FW, 
Assimes TL, Bochud M, Boehm BO, Boerwinkle E, Bottinger EP, Bouchard C, Cauchi S, Chambers JC, Chanock SJ, Cooper RS, de Bakker PI, Dedoussis G, Ferrucci L, Franks PW, Froguel P, Groop LC, Haiman CA, Hamsten A, Hayes MG, Hui J, Hunter DJ, Hveem K, Jukema JW, Kaplan RC, Kivimaki M, Kuh D, Laakso M, Liu Y, Martin NG, März W, Melbye M, Moebus S, Munroe PB, Njølstad I, Oostra BA, Palmer CN, Pedersen NL, Perola M, Pérusse L, Peters U, Powell JE, Power C, Quertermous T, Rauramaa R, Reinmaa E, Ridker PM, Rivadeneira F, Rotter JI, Saaristo TE, Saleheen D, Schlessinger D, Slagboom PE, Snieder H, Spector TD, Strauch K, Stumvoll M, Tuomilehto J, Uusitupa M, van der Harst P, Völzke H, Walker M, Wareham NJ, Watkins H, Wichmann HE, Wilson JF, Zanen P, Deloukas P, Heid IM, Lindgren CM, Mohlke KL, Speliotes EK, Thorsteinsdottir U, Barroso I, Fox CS, North KE, Strachan DP, Beckmann JS, Berndt SI, Boehnke M, Borecki IB, McCarthy MI, Metspalu A, Stefansson K, Uitterlinden AG, van Duijn CM, Franke L, Willer CJ, Price AL, Lettre G, Loos RJ, Weedon MN, Ingelsson E, O'Connell JR, Abecasis GR, Chasman DI, Goddard ME, Visscher PM, Hirschhorn JN, Frayling TM, Electronic Medical Records and Genomics (eMERGE) Consortium, MIGen Consortium, PAGE Consortium, LifeLines Cohort Study. Defining the role of common variation in the genomic and biological architecture of adult human height. Nature Genetics. 2014;46:1173–1186. doi: 10.1038/ng.3097. [DOI] [PMC free article] [PubMed] [Google Scholar]
Yengo L, Sidorenko J, Kemper KE, Zheng Z, Wood AR, Weedon MN, Frayling TM, Hirschhorn J, Yang J, Visscher PM, GIANT Consortium. Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of European ancestry. Human Molecular Genetics. 2018;27:3641–3649. doi: 10.1093/hmg/ddy271. [DOI] [PMC free article] [PubMed] [Google Scholar]
Young AI, Frigge ML, Gudbjartsson DF, Thorleifsson G, Bjornsdottir G, Sulem P, Masson G, Thorsteinsdottir U, Stefansson K, Kong A. Relatedness disequilibrium regression estimates heritability without environmental bias. Nature Genetics. 2018;50:1304–1310. doi: 10.1038/s41588-018-0178-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
Young AI, Benonisdottir S, Przeworski M, Kong A. Deconstructing the sources of genotype-phenotype associations in humans. Science. 2019;365:1396–1400. doi: 10.1126/science.aax3710. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhang G, Bacelis J, Lengyel C, Teramo K, Hallman M, Helgeland Ø, Johansson S, Myhre R, Sengpiel V, Njølstad PR, Jacobsson B, Muglia L. Assessing the causal relationship of maternal height on birth size and gestational age at birth: a mendelian randomization analysis. PLOS Medicine. 2015;12:e1001865. doi: 10.1371/journal.pmed.1001865. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhou B, Bentham J, Di Cesare M, Bixby H, Danaei G, Cowan MJ, Paciorek CJ, Singh G, Hajifathalian K, Bennett JE, Taddei C, Bilano V, Carrillo-Larco RM, Djalalinia S, Khatibzadeh S, Lugero C, Peykari N, Zhang WZ, Lu Y, Stevens GA, Riley LM, Bovet P, Elliott P, Gu D, Ikeda N, Jackson RT, Joffres M, Kengne AP, Laatikainen T, Lam TH, Laxmaiah A, Liu J, Miranda JJ, Mondo CK, Neuhauser HK, Sundström J, Smeeth L, Soric M, Woodward M, Ezzati M, Abarca-Gómez L, Abdeen ZA, Rahim HA, Abu-Rmeileh NM, Acosta-Cazares B, Adams R, Aekplakorn W, Afsana K, Aguilar-Salinas CA, Agyemang C, Ahmadvand A, Ahrens W, Al Raddadi R, Al Woyatan R, Ali MM, Alkerwi Ala'a, Aly E, Amouyel P, Amuzu A, Andersen LB, Anderssen SA, Ängquist L, Anjana RM, Ansong D, Aounallah-Skhiri H, Araújo J, Ariansen I, Aris T, Arlappa N, Aryal K, Arveiler D, Assah FK, Assunção MCF, Avdicová M, Azevedo A, Azizi F, Babu BV, Bahijri S, Balakrishna N, Bandosz P, Banegas JR, Barbagallo CM, Barceló A, Barkat A, Barros AJD, Barros MV, Bata I, Batieha AM, Baur LA, Beaglehole R, Romdhane HB, Benet M, Benson LS, Bernabe-Ortiz A, Bernotiene G, Bettiol H, Bhagyalaxmi A, Bharadwaj S, Bhargava SK, Bi Y, Bikbov M, Bjerregaard P, Bjertness E, Björkelund C, Blokstra A, Bo S, Bobak M, Boeing H, Boggia JG, Boissonnet CP, Bongard V, Braeckman L, Brajkovich I, Branca F, Breckenkamp J, Brenner H, Brewster LM, Bruno G, Bueno-de-Mesquita H B(as), Bugge A, Burns C, Bursztyn M, de León AC, Cacciottolo J, Cameron C, Can G, Cândido APC, Capuano V, Cardoso VC, Carlsson AC, Carvalho MJ, Casanueva FF, Casas J-P, Caserta CA, Chamukuttan S, Chan AW, Chan Q, Chaturvedi HK, Chaturvedi N, Chen C-J, Chen F, Chen H, Chen S, Chen Z, Cheng C-Y, Dekkaki IC, Chetrit A, Chiolero A, Chiou S-T, Chirita-Emandi A, Cho B, Cho Y, Chudek J, Cifkova R, Claessens F, Clays E, Concin H, Cooper C, Cooper R, Coppinger TC, Costanzo S, Cottel D, Cowell C, Craig CL, Crujeiras AB, Cruz JJ, D'Arrigo G, d'Orsi E, Dallongeville J, Damasceno A, Dankner R, Dantoft TM, Dauchet L, 
De Backer G, De Bacquer D, de Gaetano G, De Henauw S, De Smedt D, Deepa M, Dehghan A, Delisle H, Deschamps V, Dhana K, Di Castelnuovo AF, Dias-da-Costa JS, Diaz A, Dickerson TT, Do HTP, Dobson AJ, Donfrancesco C, Donoso SP, Döring A, Doua K, Drygas W, Dulskiene V, Džakula A, Dzerve V, Dziankowska-Zaborszczyk E, Eggertsen R, Ekelund U, El Ati J, Ellert U, Elliott P, Elosua R, Erasmus RT, Erem C, Eriksen L, de la Peña JE, Evans A, Faeh D, Fall CH, Farzadfar F, Felix-Redondo FJ, Ferguson TS, Fernández-Bergés D, Ferrante D, Ferrari M, Ferreccio C, Ferrieres J, Finn JD, Fischer K, Föger B, Foo LH, Forslund A-S, Forsner M, Fortmann SP, Fouad HM, Francis DK, Franco MdoC, Franco OH, Frontera G, Fuchs FD, Fuchs SC, Fujita Y, Furusawa T, Gaciong Z, Gareta D, Garnett SP, Gaspoz J-M, Gasull M, Gates L, Gavrila D, Geleijnse JM, Ghasemian A, Ghimire A, Giampaoli S, Gianfagna F, Giovannelli J, Goldsmith RA, Gonçalves H, Gross MG, Rivas JPG, Gottrand F, Graff-Iversen S, Grafnetter D, Grajda A, Gregor RD, Grodzicki T, Grøntved A, Gruden G, Grujic V, Gu D, Guan OP, Gudnason V, Guerrero R, Guessous I, Guimaraes AL, Gulliford MC, Gunnlaugsdottir J, Gunter M, Gupta PC, Gureje O, Gurzkowska B, Gutierrez L, Gutzwiller F, Hadaegh F, Halkjær J, Hambleton IR, Hardy R, Harikumar R, Hata J, Hayes AJ, He J, Hendriks ME, Henriques A, Cadena LH, Herrala S, Heshmat R, Hihtaniemi IT, Ho SY, Ho SC, Hobbs M, Hofman A, Dinc GH, Hormiga CM, Horta BL, Houti L, Howitt C, Htay TT, Htet AS, Hu Y, Huerta JM, Husseini AS, Huybrechts I, Hwalla N, Iacoviello L, Iannone AG, Ibrahim MM, Ikram MA, Irazola VE, Islam M, Ivkovic V, Iwasaki M, Jackson RT, Jacobs JM, Jafar T, Jamrozik K, Janszky I, Jasienska G, Jelakovic B, Jiang CQ, Joffres M, Johansson M, Jonas JB, Jørgensen T, Joshi P, Juolevi A, Jurak G, Jureša V, Kaaks R, Kafatos A, Kalter-Leibovici O, Kamaruddin NA, Kasaeian A, Katz J, Kauhanen J, Kaur P, Kavousi M, Kazakbaeva G, Keil U, Boker LK, Keinänen-Kiukaanniemi S, Kelishadi R, Kemper HCG, Kengne AP, 
Kersting M, Key T, Khader YS, Khalili D, Khang Y-H, Khaw K-T, Kiechl S, Killewo J, Kim J, Klumbiene J, Kolle E, Kolsteren P, Korrovits P, Koskinen S, Kouda K, Koziel S, Kristensen PL, Krokstad S, Kromhout D, Kruger HS, Kubinova R, Kuciene R, Kuh D, Kujala UM, Kula K, Kulaga Z, Kumar RK, Kurjata P, Kusuma YS, Kuulasmaa K, Kyobutungi C, Laatikainen T, Lachat C, Lam TH, Landrove O, Lanska V, Lappas G, Larijani B, Laugsand LE, Laxmaiah A, Bao KLN, Le TD, Leclercq C, Lee J, Lee J, Lehtimäki T, Lekhraj R, León-Muñoz LM, Levitt NS, Li Y, Lilly CL, Lim W-Y, Lima-Costa MF, Lin H-H, Lin X, Linneberg A, Lissner L, Litwin M, Lorbeer R, Lotufo PA, Lozano JE, Luksiene D, Lundqvist A, Lunet N, Lytsy P, Ma G, Ma J, Machado-Coelho GLL, Machi S, Maggi S, Magliano DJ, Majer M, Makdisse M, Malekzadeh R, Malhotra R, Rao KM, Malyutina S, Manios Y, Mann JI, Manzato E, Margozzini P, Marques-Vidal P, Marrugat J, Martorell R, Mathiesen EB, Matijasevich A, Matsha TE, Mbanya JCN, Posso AJMD, McFarlane SR, McGarvey ST, McLachlan S, McLean RM, McNulty BA, Khir ASM, Mediene-Benchekor S, Medzioniene J, Meirhaeghe A, Meisinger C, Menezes AMB, Menon GR, Meshram II, Metspalu A, Mi J, Mikkel K, Miller JC, Miquel JF, Mišigoj-Durakovic M, Mohamed MK, Mohammad K, Mohammadifard N, Mohan V, Yusoff MFM, Møller NC, Molnár D, Momenan A, Mondo CK, Monyeki KDK, Moreira LB, Morejon A, Moreno LA, Morgan K, Moschonis G, Mossakowska M, Mostafa A, Mota J, Motlagh ME, Motta J, Muiesan ML, Müller-Nurasyid M, Murphy N, Mursu J, Musil V, Nagel G, Naidu BM, Nakamura H, Námešná J, Nang EEK, Nangia VB, Narake S, Navarrete-Muñoz EM, Ndiaye NC, Neal WA, Nenko I, Nervi F, Nguyen ND, Nguyen QN, Nieto-Martínez RE, Niiranen TJ, Ning G, Ninomiya T, Nishtar S, Noale M, Noboa OA, Noorbala AA, Noorbala T, Noto D, Al Nsour M, O'Reilly D, Oh K, Olinto MTA, Oliveira IO, Omar MA, Onat A, Ordunez P, Osmond C, Ostojic SM, Otero JA, Overvad K, Owusu-Dabo E, Paccaud FM, Padez C, Pahomova E, Pajak A, Palli D, Palmieri L, Panda-Jonas S, 
Panza F, Papandreou D, Parnell WR, Parsaeian M, Pecin I, Pednekar MS, Peer N, Peeters PH, Peixoto SV, Pelletier C, Peltonen M, Pereira AC, Pérez RM, Peters A, Petkeviciene J, Pham ST, Pigeot I, Pikhart H, Pilav A, Pilotto L, Pitakaka F, Plans-Rubió P, Polakowska M, Polašek O, Porta M, Portegies MLP, Pourshams A, Pradeepa R, Prashant M, Price JF, Puiu M, Punab M, Qasrawi RF, Qorbani M, Radic I, Radisauskas R, Rahman M, Raitakari O, Raj M, Rao SR, Ramachandran A, Ramos E, Rampal S, Reina DAR, Rasmussen F, Redon J, Reganit PFM, Ribeiro R, Riboli E, Rigo F, de Wit TFR, Ritti-Dias RM, Robinson SM, Robitaille C, Rodríguez-Artalejo F, Rodriguez-Perez del Cristo M, Rodríguez-Villamizar LA, Rojas-Martinez R, Rosengren A, Rubinstein A, Rui O, Ruiz-Betancourt BS, Horimoto ARVR, Rutkowski M, Sabanayagam C, Sachdev HS, Saidi O, Sakarya S, Salanave B, Salazar Martinez E, Salmerón D, Salomaa V, Salonen JT, Salvetti M, Sánchez-Abanto J, Sans S, Santos D, Santos IS, dos Santos RN, Santos R, Saramies JL, Sardinha LB, Margolis GS, Sarrafzadegan N, Saum K-U, Savva SC, Scazufca M, Schargrodsky H, Schneider IJ, Schultsz C, Schutte AE, Sen A, Senbanjo IO, Sepanlou SG, Sharma SK, Shaw JE, Shibuya K, Shin DW, Shin Y, Siantar R, Sibai AM, Silva DAS, Simon M, Simons J, Simons LA, Sjöström M, Skovbjerg S, Slowikowska-Hilczer J, Slusarczyk P, Smeeth L, Smith MC, Snijder MB, So H-K, Sobngwi E, Söderberg S, Solfrizzi V, Sonestedt E, Song Y, Sørensen TIA, Jérome CS, Soumare A, Staessen JA, Starc G, Stathopoulou MG, Stavreski B, Steene-Johannessen J, Stehle P, Stein AD, Stergiou GS, Stessman J, Stieber J, Stöckl D, Stocks T, Stokwiszewski J, Stronks K, Strufaldi MW, Sun C-A, Sundström J, Sung Y-T, Suriyawongpaisal P, Sy RG, Tai ES, Tammesoo M-L, Tamosiunas A, Tang L, Tang X, Tanser F, Tao Y, Tarawneh MR, Tarqui-Mamani CB, Taylor A, Theobald H, Thijs L, Thuesen BH, Tjonneland A, Tolonen HK, Tolstrup JS, Topbas M, Topór-Madry R, Tormo MJ, Torrent M, Traissac P, Trichopoulos D, Trichopoulou A, Trinh 
OTH, Trivedi A, Tshepo L, Tulloch-Reid MK, Tuomainen T-P, Tuomilehto J, Turley ML, Tynelius P, Tzourio C, Ueda P, Ugel E, Ulmer H, Uusitalo HMT, Valdivia G, Valvi D, van der Schouw YT, Van Herck K, van Rossem L, van Valkengoed IGM, Vanderschueren D, Vanuzzo D, Vatten L, Vega T, Velasquez-Melendez G, Veronesi G, Verschuren WMM, Verstraeten R, Victora CG, Viet L, Viikari-Juntura E, Vineis P, Vioque J, Virtanen JK, Visvikis-Siest S, Viswanathan B, Vollenweider P, Voutilainen S, Vrdoljak A, Vrijheid M, Wade AN, Wagner A, Walton J, Mohamud WNW, Wang M-D, Wang Q, Wang YX, Wannamethee SG, Wareham N, Wederkopp N, Weerasekera D, Whincup PH, Widhalm K, Widyahening IS, Wiecek A, Wijga AH, Wilks RJ, Willeit J, Willeit P, Williams EA, Wilsgaard T, Wojtyniak B, Wong TY, Wong-McClure RA, Woo J, Woodward M, Wu AG, Wu FC, Wu SL, Xu H, Yan W, Yang X, Ye X, Yiallouros PK, Yoshihara A, Younger-Coleman NO, Yusoff AF, Yusoff MFM, Zambon S, Zdrojewski T, Zeng Y, Zhao D, Zhao W, Zheng Y, Zhu D, Zimmermann E, Zuñiga Cisneros J. Worldwide trends in blood pressure from 1975 to 2015: a pooled analysis of 1479 population-based measurement studies with 19·1 million participants. The Lancet. 2017;389:37–55. doi: 10.1016/S0140-6736(16)31919-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

eLife. doi: 10.7554/eLife.48376.sa1

Decision letter

Editor: Ruth Loos¹

Reviewed by: Paul O'Reilly²

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

Polygenic scores are being generated for an increasingly large number of traits, and are beginning to be used outside of a narrow research context. However, many questions remain about how and when it is appropriate to use them. One oft-discussed concern is whether polygenic scores are "portable" between ancestry groups. But this study demonstrates that polygenic scores are often not even portable within ancestry groups when stratified on the basis of common demographic parameters. This and other results discussed in the paper raise important questions about the use of polygenic scores, and should be read and considered carefully by anyone designing, carrying out or applying the results of large-scale human genetic analyses.

Decision letter after peer review:

Thank you for submitting your article "Variable prediction accuracy of polygenic scores within an ancestry group" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Mark McCarthy as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Paul O'Reilly (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

The manuscript in its current form would not, even with minor revision, be suitable for the journal. However, we would consider an extensively revised manuscript that addresses the following key concerns of the reviewers.

1) The paper would benefit from a clearer focus. While the authors' aim was simply to establish the variability of portability between strata, they went well beyond this into topics such as assortative mating, sex balance and parental effects. Achieving the main aim alone would make this paper very worthwhile, as it points to issues that those already considering the use of PRS in clinical practice urgently need to appreciate.

2) In focusing on the main aim, a deeper exploration of the causes of the lack of portability would increase the informativeness of the study. It was pointed out that the difference in portability across strata might be due to differences in power, sample size, heritability, phenotypic variance, phenotype distribution or the influence of environment; insights into how these, and potentially other factors, affect portability would be helpful. This should also include a discussion of how different strata can generate even higher predictive power than using the same stratum (in base and target), and of why the PRS based on the full sample size, in the provided examples, always has better "predictive ability" (highest R²) than a PRS based on any of the strata.

3) Analysis of disease outcomes, e.g. (extreme) obesity and hypertension, would be useful for translating the findings to more clinical settings, including calculation of ROC-AUC, a standard, easy-to-interpret predictive statistic.

Reviewer #1:

In this study, the authors examine the portability of polygenic scores within a homogeneous, generally healthy population of white British unrelated adults living across the UK. They show that PGS performance is influenced by age, sex, socio-economic status and study design. The reduced accuracy of PGS is not (only) caused by environmental influences; the authors show that the magnitude of the genetic effects among groups and the indirect effects of assortative mating also affect the performance of the PGS.

While these findings raise important considerations regarding the use of PGSs, it would have been interesting if the authors had gone a little deeper with their analyses, e.g. to also identify the features (distribution, prevalence of disease, effects of environment) of outcomes and covariates that impact prediction performance the most.

It seems that for all traits, even though there are differences in R² for sex-specific, age-specific and SES-specific PGSs, the highest R² (still low) is achieved when the PGS is generated in the full population. Thus, isn't the conclusion then simply that stratification lowers the PGS performance and that a (blunt) PGS, based on the most comprehensive possible population performs best?

The authors only considered continuous outcomes for "prediction"; it would be informative to also include dichotomous/disease outcomes (related to the continuous risk factors), such as (extreme) obesity, hypertension, etc. That would also allow calculation of prediction statistics such as AUC-ROC (sensitivity, specificity, PPV, NPV), which are easier to interpret in the context of clinical relevance.

The PGS was generated based on LD-based clumping, but it seems only one threshold was used: r²<0.1. From experience, a higher LD threshold seems to result in better performance of the PGS. Thus, it might be worth testing higher LD thresholds for better-performing PGSs.

Reviewer #2:

The manuscript by Mostafavi et al. addresses a timely and important question: how accurate are polygenic risk scores (PRS), derived from GWAS and first validated in large but single cohorts, when predicting risk in populations different from those used in their derivation? Recently, GWAS have achieved sample sizes of a million subjects. In addition, the UK Biobank has released rich datasets on close to half a million people who have been genotyped, extensively phenotyped, and followed so far for about 10 years. These resources have spurred the development of PRS for many traits, ranging from complex disease onset and quantitative phenotypes to sociological predictions. Furthermore, a number of direct-to-consumer companies have started to provide their customers with PRS for medical traits. Therefore, the urgent question is how reliable these scores are and how well they port to other populations and people around the world.

The work of Mostafavi et al. starts from recent publications suggesting that PRS port poorly to populations other than those of European descent (the population that is the source of most of the available data), in particular to African populations. There have been recent calls to expand GWAS and biobank-style cohorts to include people of other ancestries around the world. However, the analyses of Mostafavi et al. now show that problems with portability are evident not just across ancestries, but also across strata of samples from participants of European descent. If this is correct, it raises the urgency of holding off on implementing these scores in healthcare settings until we truly understand how and when they can be applied across groups of people.

The authors provide a meticulous dissection of the situations in which PRS are not portable among people of the same ancestry. They perform careful QC of the UK Biobank data, use reasonable methods to adjust the sample strata, perform the GWAS, and construct and test the PRS. The authors state that their main goal was to demonstrate how differences in the composition of the GWAS cohorts affect portability for different traits of interest within the same ancestry, and they achieve this. Moreover, they do good work exploring the possible factors involved in the reduced portability and leave us with important lessons on how our assumptions about GWAS data may be wrong at times. Their data and simulations add important information on how sex ratios, indirect effects, GxE, assortative mating and other factors confound the data and contribute to poor performance in subsets of the UK Biobank subjects. The manuscript is well written and extensive detail is provided on the methods and simulations. The main text is accompanied by extensive and important supplemental results, which are referred to at appropriate places in the text.

I have a number of questions that I would like the authors to consider and, where appropriate, clarify in their final manuscript:

1) Throughout the manuscript, the term "portability" is used, and metrics that measure the performance of the PRS are used to judge whether portability is feasible or reduced. While this term has been used in prior literature with respect to the reduced performance of a PRS developed in cohorts of European descent when applied to populations of different ancestry, it is jargon. In statistics and machine learning, the term that describes this poor performance is overfitting, and the ability of a model, or in this case a PRS, to perform equally well in other samples is termed generalization. Would "overfitting" not be the term that more accurately describes the lack of "portability"?

2) The authors use R² to measure the reduced generalization of the PRS to the different cohort strata, and they concede that it is not clear what the best metric for this analysis would be. However, many publications use the AUC as the metric to decide on the best model and SNP threshold, and to describe how the PRS augments other predictors based on covariates or other factors. It is not intuitive how R² relates to AUC, and it would therefore be best if the authors compared R² to AUC in at least one analysis.

3) Recent publications have used the LDpred method, which takes summary statistics rather than genotypes as input, to develop PRS based on UK Biobank data. Some of these have shown generalization to other cohorts. Can the authors comment on whether features of LDpred distinguish it from the methods used in their analysis and confer an advantage? PRS derived from LDpred also tend to include many more SNPs to achieve a greater AUC. The authors explore P-value cut-offs (and hence numbers of SNPs) in some analyses, but it was not clear whether this suggested that more SNPs increase overfitting. Do the authors' results suggest that including more SNPs increases overfitting?
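To make the relationship between R² and AUC raised in question 2 more concrete, the following is a minimal sketch under a liability-threshold model. It is not the authors' analysis: the function names, the prevalence of 0.2, the sample size and the seed are all illustrative assumptions. A standardized score is assumed to explain a fraction `r2_liability` of the variance of a normally distributed liability, cases are individuals whose liability exceeds a threshold set by the prevalence, and AUC is computed with the rank-based (Mann-Whitney) estimator.

```python
import math
import random
import statistics


def auc(scores, is_case):
    """Mann-Whitney estimate of AUC: P(random case scores above random control).

    Ties are ignored, which is reasonable for continuous scores.
    """
    order = sorted(range(len(scores)), key=scores.__getitem__)
    rank_sum_cases = sum(rank + 1 for rank, i in enumerate(order) if is_case[i])
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    return (rank_sum_cases - n_case * (n_case + 1) / 2) / (n_case * n_ctrl)


def simulate_auc(r2_liability, prevalence, n=20000, seed=1):
    """Simulate a score explaining r2_liability of liability variance; return its AUC.

    All parameter values are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    r = math.sqrt(r2_liability)
    # Liability threshold chosen so that P(liability > threshold) = prevalence.
    threshold = statistics.NormalDist().inv_cdf(1 - prevalence)
    scores, is_case = [], []
    for _ in range(n):
        score = rng.gauss(0, 1)  # standardized polygenic score
        liability = r * score + math.sqrt(1 - r2_liability) * rng.gauss(0, 1)
        scores.append(score)
        is_case.append(liability > threshold)
    return auc(scores, is_case)


if __name__ == "__main__":
    for r2 in (0.01, 0.05, 0.10):
        print(f"R2 = {r2:.2f} -> AUC = {simulate_auc(r2, prevalence=0.2):.3f}")
```

In this toy setting AUC rises with R² but on a compressed scale, and the mapping also depends on the assumed prevalence, which is one reason the two statistics are not interchangeable.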

Reviewer #3:

The generalisability of polygenic scores is an important issue given the possibility of them being incorporated into healthcare soon. This paper makes excellent use of the UK Biobank resource to present two sets of interesting results relating to the predictive power of PRS. However, I'm not sure if the way in which the primary results are framed in relation to portability is quite right.

The first set of results appears to highlight how GWAS are better powered in certain subgroups of the population (e.g. GWAS of DBP are better powered in women, BMI in those in middle age, years of schooling in lower SES groups). Years of schooling is better predicted in SES 4 by GWAS on SES 1 than by SES 4 itself – this shows hyper-portability, not limited portability as discussed throughout the article. This is also true for BMI and DBP (to a lesser extent DBP). The bivariate LD Score results showing genetic correlations close to 1 seem to support this being principally a power (rather than portability) issue, in which there are different relative contributions of genetics and the environment across the groups, making some better-powered to capture – largely the same – genetics. Based on the results (incl. those of Figure 2), I'd have thought that this is most likely due to the environment resulting in convergence and genetics resulting in divergence (at least in the case of BMI and education) – e.g. BMI converges to low values in old age, irrespective of genetic predisposition (relatively), while education levels of those in the top SES are forced to be similar due to the qualification requirements of professional jobs. Thus, the highest phenotypic variance, heritability, and thus PRS predictive power are found in the lower age group for BMI and in the lowest SES for schooling. How specific this pattern of results is to these particular traits is unclear, and certainly there seems to be a somewhat different cause for the DBP results. However, in general I see these results as being analogous to, e.g., how a GWAS on Alzheimer's is perhaps best-powered to detect common genetic variants when applied to individuals in their 70s, a GWAS on blood pressure best-powered when applied to individuals in their 30s, and a lung cancer GWAS optimised when applied to non-smokers.
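The power argument above can be illustrated with a small summary-level simulation. This is a toy sketch, not the authors' analysis: it assumes independent, standardized SNPs, identical true effects in every stratum (genetic correlation of 1), equal GWAS sample sizes, and strata that differ only in environmental variance. The stratum names, variances, SNP count and function name are hypothetical. GWAS estimation is mimicked by adding sampling noise with variance of roughly Var(y)/N to each true effect, and the expected R² of the resulting score is then computed for each base/target pair.

```python
import math
import random


def simulate_cross_stratum_r2(env_var, n_gwas=20000, n_snps=2000, seed=7):
    """Return {(base, target): expected R^2} for strata sharing identical SNP effects.

    env_var maps a stratum name to its environmental variance; the genetic
    variance is approximately 1 in every stratum.
    """
    rng = random.Random(seed)
    # True effects shared by all strata (genetic correlation of 1 by construction).
    beta = [rng.gauss(0, math.sqrt(1 / n_snps)) for _ in range(n_snps)]
    var_g = sum(b * b for b in beta)
    var_y = {name: var_g + ve for name, ve in env_var.items()}

    r2 = {}
    for base in env_var:
        # Marginal estimate = true effect + noise with variance ~ Var(y_base) / N.
        se = math.sqrt(var_y[base] / n_gwas)
        beta_hat = [b + rng.gauss(0, se) for b in beta]
        cov_pgs_y = sum(bh * b for bh, b in zip(beta_hat, beta))
        var_pgs = sum(bh * bh for bh in beta_hat)
        for target in env_var:
            r2[(base, target)] = cov_pgs_y ** 2 / (var_pgs * var_y[target])
    return r2


if __name__ == "__main__":
    result = simulate_cross_stratum_r2({"low_noise": 1.0, "high_noise": 9.0})
    for (base, target), value in sorted(result.items()):
        print(f"GWAS in {base:>10}, prediction in {target:>10}: R2 = {value:.3f}")
```

Under these assumptions, the score trained in the less noisy stratum predicts better in both targets, including the noisier one, even though the genetics are identical. That is the "hyper-portability" pattern described above, arising purely from differential GWAS power.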

While the downstream impact of variable GWAS power in different sub-groups on PRS prediction is extremely valuable to see here, I think the implications of these results may be missed unless they are discussed in the context of differential GWAS power in different sub-groups. For example, in relation to ancestry, the consensus is that portability will be optimised by expanding GWAS sampling to non-European populations, but the analogy does not hold here: the consequence of these results (at least for these traits) is that portability will be optimised if GWAS sampling is restricted to those sub-groups for which heritability is highest (since, otherwise, the portability in fact seems high).

The second set of results expands on similar analyses by Kong et al., 2018 and Selzam et al., 2019, showing that the indirect effects of familial genetics, and potentially assortative mating, could contribute to standard PRS predictive power. Any impact on portability is not explicitly demonstrated here, but the point is made that differences in culture (family structures etc.) could lead to significant differential portability given the strong influence of indirect effects on standard PRS for some traits. These are great results to see, and will add nicely to the literature on the topic of indirect genetic effects.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Variable prediction accuracy of polygenic scores within an ancestry group" for further consideration by eLife. Your revised article has been evaluated by Mark McCarthy (Senior Editor) and a Reviewing Editor.

The manuscript has been improved and reviewers were generally satisfied with the changes made. However, while discussing the revisions some remaining issues were noted that need to be addressed before acceptance, as outlined below:

1) As sample size is an important determinant of the accuracy of the association effect sizes, and thus also of the prediction accuracy, the sample size of each stratum needs to be clearly stated. Currently, the use of "unstratified" and "all" is misleading, as it suggests that these include the data of the strata combined. However, this seems not to be the case; the "unstratified" or "all" samples have now been ("artificially") reduced to match the sample size of the strata, which is not intuitive. After all, when data for multiple strata are available, one would be better off combining all data rather than using the individual strata. We suggest clearly reporting sample sizes and choosing more precise terminology to describe the "unstratified" or "all" group. Including an "all" group that combines strata, as before, would seem informative to illustrate that sample size trumps population-specificity.

2) One of the reviewers noted a difference between the original Figure 1 and the new Figure 1, and wondered whether the data had changed (slightly) compared to the previous version, because the results are slightly different (easiest to see by comparing the medians/boxes of young/old in the BMI plot between the original and new versions). It would be worth checking that this is just updated UKB data rather than an error. Furthermore, the R² for DBP in the "mixed GWAS" is just less than half that for the "total GWAS" (again comparing original vs new figures), while for education there is a 10-fold difference (i.e. doubling GWAS N has led to 10x greater R²). This can be the case ('inflection points' and all), but it would be worth checking that there is no error here because it seems a little surprising.
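As a rough reference point for why a 10-fold gain from doubling N looks surprising, the sketch below uses a standard quantitative-genetics approximation for the expected out-of-sample R² of a score built from all of m independent SNPs, R² ≈ h² · N·h² / (N·h² + m). This is an assumption introduced here for illustration, not a formula from the paper; the function names and parameter values are hypothetical, and the approximation does not cover scores built by p-value thresholding.

```python
def expected_r2(n, h2, m):
    """Approximate expected R^2 of a score using all m independent SNPs.

    n  : GWAS sample size
    h2 : heritability captured by the m SNPs
    m  : effective number of independent SNPs (an assumed quantity)
    """
    return h2 * (n * h2) / (n * h2 + m)


def gain_from_doubling(n, h2, m):
    """Ratio of expected R^2 at sample size 2n to expected R^2 at sample size n."""
    return expected_r2(2 * n, h2, m) / expected_r2(n, h2, m)


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        ratio = gain_from_doubling(n, h2=0.2, m=60_000)
        print(f"N = {n:>9,}: R2 = {expected_r2(n, 0.2, 60_000):.4f}, "
              f"gain from doubling N = {ratio:.2f}x")
```

Algebraically the ratio equals 2(N·h² + m) / (2N·h² + m), which lies strictly between 1 and 2. Under this approximation, doubling N can therefore at most double R², so a 10-fold gain would have to come from something outside it, such as p-value thresholding, where the number of SNPs passing the threshold itself grows with N.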

Reviewer #1:

The authors have addressed all concerns and the revised version is more focused and conveys a clearer message.

While they now present the "all" GWAS at the same sample size as the stratified GWAS, this has not been presented very clearly. In the text (L179), they state that they use the same sample size, but it would be good to report what this sample size is. Intuitively, one would not expect an "all" GWAS to have the same sample size as a stratified GWAS.

I suggest clarifying this in the legends of the figures, and also more clearly in the text.

Reviewer #2:

Mostafavi et al. have updated their manuscript based on the reviewers' feedback and answered all the questions and comments posed during the review of the first submission.

In my opinion, their rebuttal and responses are satisfactory, and their new and improved manuscript reflects their responses to the reviewers. The authors maintained the focus of their manuscript on demonstrating that, within an ancestry, a number of possible confounders and study design considerations impact the prediction accuracy of PGS and call into question their immediate implementation in the clinic, as some propose. In addition, and without going overboard by hugely expanding the manuscript to explore all possible avenues to explain these effects, they perform a number of analyses that reject the simple idea that environmental variance among strata is solely responsible for these effects; in fact, more complex phenomena are at play, including indirect effects, assortative mating, etc. These analyses should be a starting point for future studies by the authors and others in the field. I thank the authors for indulging us by adding several interesting analyses suggested by the reviewers, such as the inclusion of results from LDpred, a method gaining traction, and showing that portability may become more difficult as more markers are included in the PGS. The comprehensiveness of the supplementary material and the clarity of the explanations are superb.

After reading the new material in its entirety (apologies that it took me some time to go over this again), I find it a significant contribution to the field. I have no other significant comments to offer and recommend its acceptance for publication.

Reviewer #3:

I am satisfied with the revisions that the authors have made and have no further comments to make on the manuscript. Nice work!

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for submitting your article "Variable prediction accuracy of polygenic scores within an ancestry group" for consideration by eLife. Your article has been discussed by Ruth Loos, the Reviewing Editor, and Mark McCarthy as the Senior Editor.

While you have been very responsive to the reviewers' comments and concerns, we remain concerned that some of the presentation of findings remains counter-intuitive and could lead to mistaken inference. We would like you to consider one final set of revisions to your paper to include the following:

1) Include strata-combined analyses; i.e. a "full" analysis that combines the stratum-specific sample sizes into one large population. The current study shows that the stratum-specific PRSs perform better than strata-combined PRS, but only if the strata-combined (e.g. men+women) PRS is based on GWAS of the same size as the individual strata (e.g. men, women).

In the very first version of the paper, it was clear that the strata-combined PRS (based on combined stratum-specific data) performs much better than any of the stratum-specific PRS: it seems that the larger sample size overcomes any advantage of stratum-specific analyses (in these examples at least). In response to the reviewers' comments, the authors "reduced" the sample size of the strata-combined analysis to that of the individual strata. However, reducing a strata-combined analysis to the same sample size as the stratum-specific analyses is not an intuitive comparison (i.e. a combined analysis indicates that samples are merged into a bigger sample), as stratum-specific implies that these are a subset of the strata-combined. The current version does not address this issue: the labeling of the (sub)groups has been changed, but that has the potential to mislead the reader to infer that stratum-specific analyses will provide the best prediction, whereas sample size is also a strong predictor.

2) Discuss the role of sample size in prediction in the Discussion section;

i.e. put the impact of stratum-specificity on prediction in comparison to the impact of sample size (this relates to the comment above). It is important that readers understand that both contribute to prediction, and that possibly sample size is even more important than stratum-specificity. It would be good if the authors can speculate on any circumstances in which the strata-combined analysis would not provide better overall prediction than the stratum-specific subsets.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

The manuscript has been improved but there are some remaining issues that need to be addressed before we move to final acceptance. These relate to the discussion about the trade-off between sample size and stratum-specific effects on PRS performance that has been the subject of the recent revision requests. Thank you for adding additional information and figures that address this issue. However, as per the last round of revision requests, we do not feel that this question has been adequately addressed in the Discussion, and we feel that some explicit text on this matter forms an essential part of the paper. As you will have seen from the to and fro over the revisions, this is an issue that has exercised several of the editors and reviewers, and I can’t imagine that it will be any less important to the readers of the paper. Put simply, the question is this: based on the data you have generated, are there any real-world situations where researchers would be better off splitting a data set into its component strata (e.g. by gender) and proceeding on the basis of stratum-specific risk scores, rather than using the whole data set? The data in the additional figure suggest not (at least in the models and situations you tested), which is why this is important to place into context.

eLife. 2020 Jan 30;9:e48376. doi: 10.7554/eLife.48376.sa2

Author response

The manuscript in its current form would not, even with minor revision, be suitable for the journal. However, we would consider an extensively revised manuscript that addresses the following key concerns of the reviewers.

1) The paper will benefit from a clearer focus; while the authors' aim was simply to establish the variability of portability between strata, they went considerably further than this with the topics of assortative mating, sex balance, parents' effects, etc. Just achieving their main aim would make this paper very worthy, as it points to issues that those already thinking of applying PRS in clinical practice right now need to urgently appreciate.

We agree that establishing variable prediction accuracy--in clinical settings as well as in other areas in which polygenic scores are increasingly deployed--is the most important result in the paper. Assortative mating, indirect effects, GxE and stratification are examined only through the prism of portability. We have tried to emphasize our main focus in the revisions. In particular, to make the relevance to human diseases more explicit, we have added disease traits to our analysis, as well as considering three binary traits (Appendix—figure 13 and 14, and more detail in the response to comment #3).

2) In focusing on the main aim, a deeper exploration of the causes that drive the lack of portability would increase the informativeness of the study. It was pointed out that the difference in portability across strata might be due to differences in power, sample size, heritability, phenotypic variance, phenotype distribution, or the influence of environment; insights into how these and potentially other factors affect portability would be helpful. This should also include a discussion of how different strata can generate even higher predictive power than using the same stratum (in base and target) and why the PRS based on the full sample size, in the provided examples, always has a better "predictive ability" (highest R²) than PRS based on any of the strata.

Following the helpful suggestions of reviewers, we realized that one of our analyses was misleading, in that the GWAS sample size in “all” was larger than that in the stratified examples, providing better power for the “all” GWAS set but in an unfair comparison. We now changed the “all” GWAS to have the same sample size as the GWAS conducted within each stratum. As we show, once sample sizes are matched, prediction accuracy for the “all”-based PGS is intermediate between the PGS based on the stratified samples (Figure 1).

In addition, we have expanded the Discussion of how a different stratum can generate even higher predictive power than using the same stratum in the GWAS and prediction sets. In short, the higher prediction accuracy results from the increased power of the GWAS in the stratum with the largest h².

Beyond this discussion, we feel that an exploration of the precise factors driving variability in prediction accuracy for the traits we present is both beyond the scope of this paper and somewhat counterproductive to our main goal. In highlighting these particular examples, we wished to highlight the problem of generalizing polygenic scores from GWAS sets to prediction sets within a given ancestry group, rather than to explain the behavior of specific traits. In that regard, a detailed examination of the examples may draw attention away from our desired focus.

Having said that, we do discuss several factors that we believe are of general importance for portability across many traits, including for these examples. Notably, Figure 2A-C shows that many of the trends in Figure 1 can be explained by differences in (SNP) heritabilities. In Figure 2D-F, we further show that these results cannot be explained by differences in the extent of environmental variance across strata alone. This conclusion stands in contrast to the common notion in the field (including ours, prior to performing this analysis) that h² would differ across groups primarily because of differences in the extent of environmental variance. We appreciate that this point could have been easily missed by readers as originally framed and we have therefore elaborated upon it in our revision and hopefully clarified our thinking.

3) Analysis of disease outcomes, here e.g. (extreme) obesity and hypertension, will be useful to translate findings to more clinical settings, including calculation of ROC-AUC, a standard easy-to-interpret predictive statistic.

We have added analyses on extreme obesity and hypertension, using ROC-AUC as a measure for prediction accuracy (Appendix—figure 13). We find that the (incremental) AUC trends for obesity, for example, qualitatively mirror those observed for the prediction of BMI as a continuous trait. We have also added five examples of disease traits with relatively high heritability to the Appendix (Appendix—figure 14); in three of these cases, we again see differences in prediction accuracy depending on the GWAS set.

Reviewer #1:

In this study, the authors examine the portability of polygenic scores within a homogeneous, generally healthy population of white British unrelated adults, living across the UK. They show that the PGS performance is influenced by age, sex, socio-economic status and study designs. The reduced accuracy of PGS is not (only) caused by environmental influences; the authors show that the magnitude of the genetic effects among groups and the indirect effects of assortative mating also affect the performance of the PGS.

While these findings raise important considerations with reference to the use of PGSs, it would have been interesting if they had gone a little deeper with their analyses; e.g. to also identify the features (distribution, prevalence of disease, effects of environment) of outcomes and covariates that impact the prediction performance the most.

We agree that a dissection of different factors that influence portability would be of great utility; at the same time such a breakdown is beyond our scope in this paper and might require a different approach. For instance, we use incremental R² throughout the paper mainly to facilitate comparison with key papers published previously in this area (e.g., Lee et al., 2018; Martin et al., 2019). However, it is not an ideal measure for breaking down these factors because it confounds biases and estimation noise. Furthermore, as noted in our response to the editor, we worry that focusing on the breakdown for the three specific examples may draw attention away from the generality of the concerns that we wish to raise, as these contributions likely vary in nature and magnitude across traits.

We do however discuss several factors that are likely to be generally important for portability across traits, including these examples. Notably, Figure 2A-C show that much of the trends in Figure 1 can be explained by differences in the heritability across strata.

We had initially hypothesized that these differences in h² are mostly due to differences in environmental variance across strata, but as we show in Figure 2D-F, a model with constant genetic variance and changing environmental variance provides a poor fit to the data. As noted in response to the editor, we now clarify this point in revisions of the main text. Given these findings, we hypothesize that there is an interaction between genetic effects and sample characteristics, such that genetic effects are systematically larger in the groups with higher prediction accuracy, even though the effect sizes are highly correlated among strata (Appendix—table 2, Appendix—figure 3) (i.e., that we are seeing what is sometimes called “genetic amplification” in some strata).

It seems that for all traits, even though there are differences in R² for sex-specific, age-specific and SES-specific PGSs, the highest R² (still low) is achieved when the PGS is generated in the full population. Thus, isn't the conclusion then simply that stratification lowers the PGS performance and that a (blunt) PGS, based on the most comprehensive possible population performs best?

We apologize for the previous presentation, which led to confusion. As we detail in the response to the editor, we have modified Figure 1 to address this point (see above).

The authors only considered continuous outcomes for "prediction"; it would be informative to also include dichotomous/disease outcomes (related to the continuous risk factors), such as (extreme) obesity, hypertension, etc. That would also allow calculation of prediction statistics such as AUC-ROC (sensitivity, specificity, PPV, NPV), which are easier to interpret in the context of clinical relevance.

We have now done so, both by dichotomizing continuous traits such as BMI and blood pressure and considering them as binary, and by adding five additional disease traits. Following the reviewers’ suggestions, we have used (incremental) AUC to measure prediction accuracy in these binary traits. As can be seen (Appendix—figures 13 and 14), the qualitative conclusions remain.

The PGS was generated based on LD-based clumping, but it seems only one threshold was used: r²<0.1. From experience, it seems that a higher LD threshold results in better performance of the PGS. Thus, it might be worth testing higher LD thresholds for better performing PGSs.

Our goal here was not to maximize prediction accuracy, but to show that at a given threshold, it is variable across groups of the same ancestry. Nonetheless, to evaluate the sensitivity of our analysis to the choice of the PGS model, we added an analysis repeating the three examples in Figure 1 with LDpred. We find that LDpred generally achieves higher prediction accuracy than clumping approaches. Most importantly for this paper, we find that the trends across strata remain qualitatively unchanged and are often accentuated when LDpred is used instead of clumping (Appendix—figure 2).

Reviewer #2:

The manuscript by Mostafavi et al. discusses a timely and important question: how accurate are polygenic risk scores (PRS) derived from GWAS studies and first validated in large, but single cohorts, when predicting risk in populations different from those used in their derivation? Recently, GWAS studies have achieved sample sizes of a million subjects. In addition, the UK Biobank has released rich datasets on close to half a million people who have been genotyped, extensively phenotyped, and followed so far for about 10 years. These resources have spurred the development of PRS for many traits ranging from complex disease onset and quantitative phenotypes to sociological predictions. Furthermore, a number of direct-to-consumer companies have started to provide their customers with PRS for medical traits. Therefore, the urgent question is how reliable these scores are and how well they port to other populations and people around the world.

The work of Mostafavi et al. starts with the recent publications suggesting that PRS port poorly to populations other than those of European descent (the population that is the object of most of the data available), in particular to African populations. There have been recent calls to expand GWAS and biobank-style cohorts to include people from other ancestries around the world. However, the analyses of Mostafavi et al. now show that problems with portability are not just evident across different ancestries, but also across different strata of samples from participants of European descent. If this is correct, it raises the urgency of holding off implementations of these scores in healthcare settings until we truly understand how and when these scores can be applied across groups of people.

The authors provide a meticulous dissection of the situations in which PRS are not portable among people of the same ancestry. They perform careful QC of the UK Biobank data, use reasonable methods to adjust the sample strata, perform the GWAS, and construct and test the PRS. The authors state that their main goal was to demonstrate how differences in the composition of the GWAS cohorts affect portability for different traits of interest within the same ancestry, and they achieve this. Beyond that, they do good work exploring the possible factors involved in the reduced portability and leave us with important lessons on how our assumptions about GWAS data may be wrong at times. Their data and simulations add important information on how sex ratios, indirect effects, GxE, assortative mating and other factors confound the data and contribute to poor performance in subsets of the UK Biobank subjects. The manuscript is well written and extensive detail is provided on the methods and simulations. The main text is accompanied by extensive supplemental results of importance, which are referred to at appropriate places in the text.

I have a number of questions that I would like the authors to consider and, when appropriate, clarify in their final manuscript:

1) Throughout the manuscript, the term "portability" is used, and metrics that measure the performance of the PRS are used to judge whether portability is feasible or reduced. While this term has been used in prior literature with respect to the reduced performance of a PRS developed in cohorts of people of European descent when applied to populations of different ancestry, this term is jargon. In statistics and machine learning, the term that describes this poor performance is overfitting, and the ability of a model, or in this case a PRS, to perform equally well in other samples is termed generalization. Would "overfitting" not be the term that more accurately describes the lack of "portability"?

We agree that the terminology is important here, but precisely because the problem of portability has to date largely been reduced to a question of population genetics alone (i.e., discussed in terms of allele frequencies and LD differences), we think it is a good idea to use the same terminology in highlighting the numerous other factors that may affect prediction in samples that differ from the GWAS sample. As we discuss in the text, these factors (e.g., SES composition, indirect effects, assortative mating patterns, stratification) differ among continental groups or different genetic ancestry groups in the same country. Therefore, our within-ancestry group analyses pertain to cross-ancestry portability as well.

2) The authors use R² to measure the reduced generalization of the PRS to the different cohorts' strata, and they concede that it is not clear what would be the best metric to use for this analysis. However, many publications use the AUC as a metric to decide the best model and threshold of SNPs to pick, and to describe how the PRS augments other predictors based on covariates or other factors. It is not intuitive how R² correlates with AUC, and therefore it would be best if the authors tried, in at least one analysis, to compare R² to AUC.

As the reviewer notes, we use R² not because we believe it is in any sense an optimal measure, but to allow for more ready comparisons to previous work. Following the reviewer’s suggestion, we have now added analyses with AUC instead of R² (Appendix—figures 13 and 14) for a total of eight traits, including five disease traits. The qualitative conclusions are the same; a more in depth examination of how AUC relates to R² is, in our opinion, beyond the scope of this paper.
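For readers less familiar with the measure discussed above, incremental R² is the gain in R² obtained when the polygenic score is added to a regression model that already contains the covariates. The following is a minimal sketch on simulated data, not the authors' code; the function names, the covariates and the effect sizes are all assumptions chosen for illustration.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (an intercept is added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def incremental_r2(pgs, covariates, y):
    """R^2 of (covariates + PGS) minus R^2 of covariates alone."""
    base = r_squared(covariates, y)
    full = r_squared(np.column_stack([covariates, pgs]), y)
    return full - base

# Simulated example (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
n = 5000
covariates = rng.normal(size=(n, 2))   # stand-ins for covariates such as age or PCs
pgs = rng.normal(size=n)               # stand-in for a standardized polygenic score
y = 0.5 * covariates[:, 0] + 0.3 * pgs + rng.normal(size=n)

inc = incremental_r2(pgs, covariates, y)
print(f"incremental R^2 = {inc:.3f}")
```

Because the two models are nested, the increment cannot be negative in the sample used for fitting; how it is evaluated in a separate prediction set is a design choice not covered by this sketch.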

3) Recent publications have used the LDpred method, which uses summary statistics rather than genotypes as input, to develop PRS based on UK Biobank data. Some of these have shown generalization to other cohorts. Can the authors comment on whether features of LDpred distinguish it from the methods used in their analysis and confer an advantage? PRS derived from LDpred also tend to include many more SNPs to achieve a greater AUC. The authors explore P-value cut-offs (and hence number of SNPs) in some analyses, but it was not clear whether this suggested that more SNPs increase overfitting. Do the authors' results suggest that including more SNPs increases overfitting?

As noted in our response to reviewer #1, we added an analysis repeating the three examples in Figure 1 with LDpred. We find that LDpred (with a prior probability of 1 on the loci being causal) generally achieves higher prediction accuracy than clumping methods. Most importantly for this paper, we find that the trends across strata remain qualitatively unchanged and are often accentuated when LDpred is used instead of clumping (Appendix—figure 2).

Reviewer #3:

The generalisability of polygenic scores is an important issue given the possibility of them being incorporated into healthcare soon. This paper makes excellent use of the UK Biobank resource to present two sets of interesting results relating to the predictive power of PRS. However, I'm not sure if the way in which the primary results are framed in relation to portability is quite right.

The first set of results appear to highlight how GWAS are better powered in certain subgroups of the population (eg. GWAS of DBP are better powered in women, BMI in those in middle age, years of schooling in lower SES groups). Years of schooling is better predicted in SES 4 by GWAS on SES 1 than by SES 4 itself – this shows hyper-portability, not limited portability as discussed throughout the article. This is also true for BMI and DBP (to a lesser extent DBP). The bivariate LD Score results showing genetic correlations close to 1 seems to support this being principally a power (rather than portability) issue, in which there are different relative contributions of genetics and the environment across the groups, making some better-powered to capture – largely the same – genetics. Based on the results (inc. those of Figure 2), I'd have thought that this is most likely due to the environment resulting in convergence and genetics resulting in divergence (at least in the case of BMI and education) – eg. BMI converges to low values in old age, irrespective of genetic predisposition (relatively), while education levels of those in the top SES are forced to be similar due to the qualification requirements of professional jobs. Thus, the highest phenotypic variance, heritability, and thus PRS predictive power for BMI is from the lower age group and schooling from the lowest SES. How specific this pattern of results is to these particular traits is unclear, and certainly there seems to be a somewhat different cause for the DBP results. However, in general I see these results as being analogous to eg. how GWAS on Alzheimer's is perhaps being best-powered to detect common genetic variants when applied to individuals in their 70s, GWAS on blood pressure best-powered when applied to individuals in their 30s, and lung cancer GWAS being optimised when applied to non-smokers.

We agree that a central factor driving the trends of Figure 1 is likely the differential power (and the precision of the estimates) across GWAS strata. We now further clarify this point in the text. We note further that for the three examples in Figure 1 and Figure 2, the data are poorly explained by assuming that the environmental variance differs across strata but the genetic variance remains the same (see Figure 2D-F). We therefore hypothesize that genetic variance is also substantially variable across strata; in other words, that there is an interaction between the genetics and the strata characteristics (Figure 2, Appendix—figure 3).

While the downstream impact of variable GWAS power in different sub-groups on PRS prediction is extremely valuable to see here, I think the implications of these results may be missed unless discussed in the context of differential GWAS power in different sub-groups – eg. in relation to ancestry, the consensus is that portability will be optimised by expanding GWAS sampling to non-European populations, but the analogy is not true here – the consequence of these results (at least for these traits) is that portability will be optimised if GWAS sampling is restricted to those sub-groups for which heritability is highest (since, otherwise, the portability in fact seems high).

The reviewer brings up a very interesting point. We have expanded the Discussion of how a different stratum can generate higher predictive power than using the same stratum in the GWAS and prediction sets. As the reviewer notes, the higher prediction accuracy is largely a result of the increased power of the GWAS in the stratum with the largest (SNP) h² estimates.

The implications for how best to conduct a GWAS are not obvious to us, for a number of reasons. First, the three examples we present were “cherry-picked” in the sense that we had prior information about what sample characteristics may matter for the trait (e.g., sex for diastolic blood pressure). We used them to show that such characteristics can matter, and matter just as much as factors like ancestry. But these are just some characteristics about which we had prior knowledge, and it seems likely that there exist many more, unknown characteristics that affect heritability for these traits, and that for some other traits, it will be hard to guess a priori what may matter. We now show an additional example in the Appendix: when we focus on a trait of interest in the social sciences (educational attainment) and follow the study design used in most studies, the prediction accuracy of educational attainment is somewhat higher for individuals with a (single) sibling than for individuals without a sibling, when the GWAS is done in individuals with one sibling (see Appendix—figure 12).

Second, as we discuss, factors such as GxE, assortative mating and indirect effects are soaked up into the GWAS estimates, and critically also into the SNP heritability estimates. Thus, the choice of a GWAS sample is about more than detection power; it is implicitly making a choice about all sorts of sample characteristics that may or may not hold true of the prediction set.

Given these considerations, it is not obvious to us how to choose an optimal GWAS sample a priori. Notably, as mentioned by the reviewer, the most “diverse” set (with respect to the sample characteristic) is not necessarily the optimal set in which to conduct the GWAS: in the revised version of Figure 1, we now show that after matching the sample size, the performance of a GWAS done on an unstratified sample (“all”) is intermediate between the performances of GWAS on stratified samples (Figure 1).

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

The manuscript has been improved and reviewers were generally satisfied with the changes made. However, while discussing the revisions some remaining issues were noted that need to be addressed before acceptance, as outlined below:

1) As sample size is an important determinant of the accuracy of the association effect sizes and thus also of the prediction accuracy, the sample size of each stratum needs to be clearly stated. Currently, the use of "unstratified" and "all" is misleading, as it suggests that these include the data of the strata combined. However, this seems not to be the case; the "unstratified" or "all" samples have now been ("artificially") reduced to match the sample size of the strata, which is not intuitive. After all, when data for multiple strata are available, one would be better off combining all data, rather than using the strata. We suggest clearly reporting sample sizes and choosing a more precise terminology to describe the "unstratified" or "all" group. Including an "all" group that combines strata, as before, would seem informative to illustrate that sample size trumps population-specificity.

We agree that the term “all” is unclear, and thank the editors and reviewer #1 for their suggestion. We now use the term “diverse” to refer to the sample that consists of individuals from all strata. We also report the sample sizes used for GWAS in Figure 1 in the main text and the figure caption, in addition to the Materials and methods.

2) One of the reviewers noted a difference between the original Figure 1 and the new Figure 1, and wondered whether the data changed (slightly) compared to the previous version because the results are slightly different (easiest to see by comparing median/boxes of young/old in BMI plot between original and new versions). It would be worth checking that this is just updated UKB data rather than an error. Furthermore, the R² for DBP in the "mixed GWAS" is just less than half of that for the "total GWAS" (again comparing original vs new figures), while for education there's a 10-fold difference (i.e. doubling GWAS N has led to 10x greater R²). This can be the case ('inflection points' and all), but it would be worth checking that there's no error here because it seems a little surprising.

For our analysis of DBP by sex we downsampled the “diverse” GWAS by a factor of ~2 (given two sex strata), whereas for BMI by age and years of schooling by SES we downsampled by a factor of ~4 (given four age and SES strata). As a result, the drop in R² between our initial submission (without downsampling) and our resubmission (with downsampling) is more pronounced for BMI and years of schooling compared to DBP.

In addition, since our initial submission we removed 60 UKB participants that withdrew from the UKB, and repeated our analyses. Therefore, slight differences are expected due to random sampling.
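The sample-size matching described above (down-sampling the pooled set by a factor of about 2 for two strata, or about 4 for four) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the stratum labels and the choice to draw the “diverse” set uniformly at random from the pooled individuals are hypothetical, and the authors' actual sampling scheme may differ.

```python
import numpy as np

def matched_gwas_samples(strata, n_per_gwas, seed=0):
    """Draw one GWAS sample of size n_per_gwas per stratum, plus a
    'diverse' sample of the same size drawn from all strata pooled.

    strata: dict mapping a stratum label to an array of individual IDs.
    """
    rng = np.random.default_rng(seed)
    samples = {
        label: rng.choice(ids, size=n_per_gwas, replace=False)
        for label, ids in strata.items()
    }
    pooled = np.concatenate(list(strata.values()))
    # Assumption: the matched "diverse" set is a uniform random subset of the pool.
    samples["diverse"] = rng.choice(pooled, size=n_per_gwas, replace=False)
    return samples

# Two hypothetical strata of 1,000 individuals each: the pooled set of 2,000
# is down-sampled by a factor of 2 to match the per-stratum GWAS size.
strata = {
    "stratum_A": np.arange(0, 1000),
    "stratum_B": np.arange(1000, 2000),
}
samples = matched_gwas_samples(strata, n_per_gwas=1000)
downsampling_factor = 2000 / len(samples["diverse"])
```

Matching in this way keeps GWAS power comparable across the sets being compared, so that differences in prediction accuracy are not driven by sample size alone.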

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

While you have been very responsive to the reviewers' comments and concerns, we remain concerned that some of the presentation of findings remains counter-intuitive and could lead to mistaken inference. We would like you to consider one final set of revisions to your paper to include the following:

1) Include strata-combined analyses; i.e. a "full" analysis that combines the stratum-specific sample sizes into one large population. The current study shows that the stratum-specific PRSs perform better than strata-combined PRS, but only if the strata-combined (e.g. men+women) PRS is based on GWAS of the same size as the individual strata (e.g. men, women).

In the very first version of the paper, it was clear that the strata-combined PRS (based on combined stratum-specific data) performs much better than any of the stratum-specific PRS: it seems that the larger sample size overcomes any advantage of stratum-specific analyses (in these examples at least). In response to the reviewers' comments, the authors "reduced" the sample size of the strata-combined analysis to that of the individual strata. However, reducing a strata-combined analysis to the same sample size as the stratum-specific analyses is not an intuitive comparison (i.e. a combined analysis indicates that samples are merged into a bigger sample), as stratum-specific implies that these are a subset of the strata-combined. The current version does not address this issue: the labeling of the (sub)groups has been changed, but that has the potential to mislead the reader to infer that stratum-specific analyses will provide the best prediction, whereas sample size is also a strong predictor.

2) Discuss the role of sample size in prediction in the Discussion section;

i.e. put the impact of stratum-specificity on prediction in comparison to the impact of sample size (this relates to the comment above). It is important that readers understand that both contribute to prediction, and that possibly sample size is even more important than stratum-specificity. It would be good if the authors can speculate on any circumstances in which the strata-combined analysis would not provide better overall prediction than the stratum-specific subsets.

We thank the editors for their consideration and input. We have added a figure (Appendix—figure 1) to reflect the point raised by the editors. We have also added to the main text explicit comments on the higher prediction accuracy for the much larger sample size of a GWAS combining all strata (i.e., without matching sample sizes).

Having said that, we agree with previous reviewers’ comments (and revised our manuscript accordingly) on the potential for misinterpretation in including a GWAS with a larger sample size in a main-text figure. Further, in matching sample sizes we mimic the methodology in previous research on polygenic score portability, to highlight the factors other than allele frequencies and linkage disequilibrium that affect polygenic prediction. Not matching would confound the effect of these factors with the effect of sample size.

We agree with the editors that the interplay of prediction accuracy with sample size is very important, but see it as orthogonal to the points discussed in the paper. In fact, our analyses explicitly control for sample size effect throughout. We therefore suspect that adding such a discussion might only act to confuse the reader.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

The manuscript has been improved but there are some remaining issues that need to be addressed before we move to final acceptance. These relate to the discussion about the trade-off between sample size and stratum-specific effects on PRS performance that have been the subject of the recent revision requests. Thank you for adding additional information and figures that address this issue. However, as per the last round of revision requests, we do not feel that this question has been adequately addressed in the Discussion, and we consider some explicit text on this matter an essential part of the paper. As you will have seen from the to and fro over the revisions, this is an issue that has exercised several of the editors and reviewers, and I can’t imagine that it will be any less important to the readers of the paper. Put simply, the question is this: based on the data you have generated, are there any real-world situations where researchers would be better off splitting a data set into its component strata (e.g., by gender) and proceeding on the basis of stratum-specific risk scores, rather than using the whole data set? The data in the additional figure suggest not (at least in the models and situations you tested), which is why this is important to place into context.

We have added a paragraph to the Discussion section discussing the implications of our findings for PGS construction, notably on the choice between GWAS samples limited to one stratum versus the union of all strata, and removed the previous discussion of this point from the Results section.
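The trade-off at issue in this exchange, between GWAS sample size and stratum-specificity, can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the paper's analyses. It assumes a deliberately simple model: a hypothetical number of independent, standardized causal loci (`n_loci`), a phenotypic variance of 1, the same heritability `h2` in each stratum, and a pooled GWAS that draws equally from two strata whose per-locus effects have correlation `rg`. The function name and all numerical values are illustrative assumptions.

```python
def expected_r2(n_gwas, h2, n_loci, rg=1.0):
    """Approximate R^2 of a polygenic score in one target stratum.

    Illustrative assumptions (not from the paper): n_loci independent,
    standardized causal loci; phenotypic variance 1; heritability h2 in
    each stratum; the GWAS pools two equal-sized strata whose per-locus
    effects have correlation rg. rg=1.0 corresponds to a GWAS run in the
    target stratum alone.
    """
    # Genetic variance in the target stratum that is captured by the
    # pooled (mean) effects the GWAS actually estimates.
    shared = h2 * (1.0 + rg) / 2.0
    # Estimation noise shrinks as n_gwas grows relative to n_loci.
    return shared * (n_gwas * shared) / (n_gwas * shared + n_loci)


# Hypothetical values chosen only to show the qualitative pattern.
h2, n_loci, n_per_stratum, rg = 0.5, 50_000, 100_000, 0.9

print("stratum-specific GWAS:   ", expected_r2(n_per_stratum, h2, n_loci))
print("pooled GWAS, size-matched:", expected_r2(n_per_stratum, h2, n_loci, rg))
print("pooled GWAS, full size:   ", expected_r2(2 * n_per_stratum, h2, n_loci, rg))
```

Under these hypothetical values, the size-matched pooled score falls below the stratum-specific score, while the full-size pooled score exceeds it, which is the pattern the editors and authors describe. In this toy model the ordering between the stratum-specific and full-size pooled scores can reverse when `rg` is low or when the GWAS is already very large relative to `n_loci`.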

Associated Data

Data Citations

Mostafavi H, Harpak A, Agarwal I, Conley D, Pritchard JK, Przeworski M. 2019. Variable prediction accuracy of polygenic scores within an ancestry group. Dryad Digital Repository.

Supplementary Materials

Transparent reporting form

elife-48376-transrepform.pdf (276.5 KB, pdf)

Data Availability Statement

The GWAS summary statistics generated in this study have been uploaded to Dryad.

The following dataset was generated:

Mostafavi H, Harpak A, Agarwal I, Conley D, Pritchard JK, Przeworski M. 2019. Variable prediction accuracy of polygenic scores within an ancestry group. Dryad Digital Repository.


PERMALINK

Variable prediction accuracy of polygenic scores within an ancestry group

Hakhamanesh Mostafavi

Arbel Harpak

Ipsita Agarwal

Dalton Conley

Jonathan K Pritchard

Molly Przeworski

Roles

Abstract

eLife digest

Introduction

Results

Sample characteristics of the GWAS and prediction set can influence prediction accuracy even within a single ancestry

Figure 1. Variable prediction accuracy of polygenic scores within an ancestry group.

Possible explanations for the variable prediction accuracy

Figure 2. Differences in environmental variance alone do not explain the variable prediction accuracy.

Obstacles to portability explored through a comparison of standard and family-based GWAS

Figure 3. Comparison of prediction accuracy of standard and sib-based polygenic scores.

Discussion

Materials and methods

UK biobank

Inclusion criteria

Phenotype data

Genotype data

GWAS and trait prediction methods

GWAS by sample characteristics

Standard versus sibling-based polygenic score

Estimating n*

Simulated traits

Polygenic score construction and trait prediction

Estimating heritability and genetic correlation

Acknowledgements

Appendix 1

1 Prediction accuracies of polygenic scores based on standard and sib-GWAS

1.1 Overview of derived results

Matching standard and sib-based prediction accuracies

Indirect parental effects

Assortative mating

1.2 Picking the sample size of the standard GWAS to match the prediction accuracy of the score based on the sib-GWAS

1.2.1 Sampling error of the estimated effect size at a single site

1.2.2 Sample size required for equal prediction accuracy

1.2.3 Empirical matching of standard errors

1.3 Indirect parental effects

1.3.1 Distribution of the effect size estimate at a single site

1.3.2 Polygenic score prediction accuracy

1.3.3 Simulations of indirect effects

1.4 Assortative mating

1.4.1 Simulations of assortative mating

Further simulation details

Appendix 1—figure 1. Variable prediction accuracy within an ancestry group.

Appendix 1—figure 2. Variable prediction accuracy (measured as R2) within an ancestry group.

Appendix 1—figure 3. Dependence on the polygenic score model.

Appendix 1—figure 4. Estimating mean effect size across strata.

Appendix 1—figure 5. Variable prediction accuracy within an ancestry also seen using a linear mixed model.

Appendix 1—figure 6. Comparison of siblings and unrelated individuals in the UK Biobank with respect to age, SES, and sex ratio.

Appendix 1—figure 7. Comparison of siblings and unrelated individuals in the UK Biobank with respect to population structure.

Appendix 1—figure 8. Comparison of prediction accuracies of polygenic scores based on standard and sib-GWAS for simulated traits.

Appendix 1—figure 9. Simulation results for polygenic scores based on standard GWAS and sib-GWAS in the presence of indirect effects.

Appendix 1—figure 10. Simulation results for polygenic scores based on standard GWAS and sib-GWAS in the presence of assortative mating.

Appendix 1—figure 11. Comparison of prediction accuracies of polygenic scores based on standard and sib-GWAS matched for sex ratio.

Appendix 1—figure 12. Prediction accuracy of polygenic scores based on sib- and standard GWAS, for a range of traits.

Appendix 1—figure 13. Prediction accuracy for years of schooling, for individuals with 0 or 1 full sibling.

Appendix 1—figure 14. Variable prediction accuracy for binary traits, when measured as incremental AUC.

Appendix 1—figure 15. Variable prediction accuracy for binary disease phenotypes, measured as incremental AUC, in men versus women.

Appendix 1—figure 16. Comparison of prediction accuracies of polygenic scores (measured as $R^{2}$) based on standard and sib-GWAS.

Appendix 1—table 1. UK Biobank phenotype data used in this study and their corresponding data fields.

Appendix 1—table 2. Genetic correlations across samples that vary by a study characteristic.

Appendix 1—table 3. Sample sizes used for siblings and unrelated sets.

Appendix 1—table 4. Qualifications to years of schooling conversion table.

Funding Statement

Contributor Information

Funding Information

Additional information

Competing interests

Author contributions

Ethics

Additional files

Data availability

References
