Genetics. 2022 Oct 17;222(4):iyac158. doi: 10.1093/genetics/iyac158

Genetic determinants of polygenic prediction accuracy within a population

Tianyuan Lu, Vincenzo Forgetta, John Brent Richards, Celia M T Greenwood
Editor: E Hauser
PMCID: PMC9713421  PMID: 36250789

Abstract

Genomic risk prediction is on the emerging path toward personalized medicine. However, the accuracy of polygenic prediction varies considerably among individuals. Based on up to 352,277 European ancestry participants in the UK Biobank, we constructed polygenic risk scores for 15 physiological and biochemical quantitative traits. We identified a total of 185 polygenic prediction variability quantitative trait loci for 11 traits by Levene’s test among 254,376 unrelated individuals. We validated the effects of prediction variability quantitative trait loci using an independent test set of 58,927 individuals. For instance, a score aggregating 51 prediction variability quantitative trait locus variants for triglycerides had the strongest Spearman correlation of 0.185 (P-value <1.0 × 10−300) with the squared prediction errors. We found a strong enrichment of complex genetic effects conferred by prediction variability quantitative trait loci compared to risk loci identified in genome-wide association studies, including 89 prediction variability quantitative trait loci exhibiting dominance effects. Incorporation of dominance effects into polygenic risk scores significantly improved polygenic prediction for triglycerides, low-density lipoprotein cholesterol, vitamin D, and platelet count. In conclusion, we have discovered and profiled genetic determinants of polygenic prediction variability for 11 quantitative biomarkers. These findings may assist interpretation of genomic risk prediction in various contexts and encourage novel approaches for constructing polygenic risk scores with complex genetic effects.

Keywords: polygenic risk score, prediction accuracy, dominance, gene-by-environment interaction, quantitative trait loci, genome-wide association study, Genomic Prediction, GenPred, Shared Data Resource

Introduction

In the past decade, large-scale genome-wide association studies (GWASs) have begun to reveal the genetic architecture of complex traits (Visscher et al. 2017). Accurately estimated effects of genetic determinants have made the construction of polygenic risk scores possible (Khera et al. 2018; Wand et al. 2021). These polygenic risk scores aggregate multiple genetic variants associated with the target traits across the genome and may be able to capture a significant proportion of trait variance (Khera et al. 2018, 2019). It has been recognized that polygenic risk scores can contribute importantly both clinically and in research, by enabling risk stratification in large populations (Khera et al. 2018; Inouye et al. 2018; Lu, Forgetta, Wu, et al. 2021; Lu, Forgetta, Keller-Baruch, et al. 2021; Lu et al. 2022a), informing on risk factors (Lu et al. 2020; Lewis and Vassos 2020), assisting diagnosis for complex diseases (Lu, Zhou, et al. 2021; Lu et al. 2022b), and suggesting potential therapeutic targets (Ritchie et al. 2021).

Despite its ever-increasing efficiency, polygenic prediction has important shortcomings. Interpopulation heterogeneity, especially genetic ancestry discrepancies, may lead to substantial attenuation in the predictive performance of polygenic risk scores (Martin et al. 2017, 2019). However, even within the same population, the prediction accuracy of polygenic risk scores can be highly variable (Mostafavi et al. 2020). For instance, among European ancestry participants in the UK Biobank (Bycroft et al. 2018), polygenic risk scores have been demonstrated to predict body mass index (BMI) more accurately in middle-aged adults than in older adults and to predict years of schooling more accurately among individuals in lower socioeconomic status groups (Mostafavi et al. 2020). Notably, to date, existing polygenic risk scores mainly include linear additive effects of common genetic variants. Therefore, the attenuation of prediction accuracy associated with population characteristics may reflect poorly modeled gene-by-environment interaction effects, which could render the estimated effects of genetic predictors in polygenic risk scores inaccurate, in the presence of specific environmental or lifestyle exposures.

In addition, more complex genetic effects, such as dominance effects, exist for many complex traits (Aschard et al. 2012; Varona et al. 2018; Huber et al. 2018; Kerin and Marchini 2020), yet they are difficult to rigorously model in polygenic risk scores due to their relatively weak effect sizes (Zhu et al. 2015; Sulc et al. 2020). Therefore, profiling the genetic determinants of prediction variability within a population may help better understand intrapopulation heterogeneity, interpret results of genomic risk predictions, and further improve methods for developing polygenic risk scores.

In this study, leveraging resources from the UK Biobank (Bycroft et al. 2018), we construct polygenic risk scores for 15 vital physiological and biochemical quantitative traits. We then systematically search for polygenic prediction variability quantitative trait loci (PVLs) associated with the residuals after correcting for the standard linear polygenic risk score and other known covariates and validate their effects on the accuracy of polygenic prediction. We assess dominance effects and interaction effects with environmental and lifestyle exposures underlying these PVLs. Lastly, we seek to improve polygenic prediction by incorporating dominance effects into polygenic risk scores.

Methods

Study cohort

We utilized the UK Biobank (Bycroft et al. 2018), one of the largest genotyped cohorts, to ensure statistical power. Between 2006 and 2010, the UK Biobank recruited and genotyped approximately 500,000 participants at multiple assessment centers located in the United Kingdom. Although participants in the UK Biobank were healthier, less obese, and less likely to smoke and consume alcohol compared to the general population (Fry et al. 2017), this cohort has facilitated extensive investigations of the associations among genetics, environmental and lifestyle exposures, and health outcomes. The UK Biobank performed genome-wide genotyping using Affymetrix arrays based on DNA extracted from blood samples provided by the participants. The genotypes were imputed to the Haplotype Reference Consortium reference panel (McCarthy et al. 2016).

Deep phenotyping of the UK Biobank participants was conducted upon the initial assessment visit (Bycroft et al. 2018), including a wide variety of anthropometric measurements, blood and urine biomarkers, etc. We hereby focused on 15 quantitative traits that were important biomarkers in health care practice or research while having a complex genetic architecture. These traits were BMI (data field 21001), waist-to-hip ratio (WHR, derived based on data fields 48 and 49), standing height (data field 50), heel bone mineral density (BMD, data field 3148), plasma total calcium (data field 30680), plasma vitamin D (data field 30890), ratio of the forced expiratory volume in the first 1 s to the forced vital capacity of the lungs (FEV1/FVC ratio, derived based on data fields 3062 and 3063), plasma glucose (data field 30740), hemoglobin A1c (HbA1c, data field 30750), plasma low-density lipoprotein (LDL) cholesterol (data field 30780), plasma triglycerides (data field 30870), diastolic blood pressure (DBP, data field 4079), systolic blood pressure (SBP, data field 4080), nucleated red blood cell (nRBC) count (data field 30170), and platelet count (data field 30080).

Because the genetic architectures of complex traits might differ across populations of different genetic ancestries (Martin et al. 2017, 2019), we only included 440,346 genotyped European ancestry participants defined based on a consensus of self-reported ancestral background and clustering of the first 6 genetic principal components, as described previously (Bycroft et al. 2018). We randomly split this cohort into 3 datasets: a discovery set (80.0%), a linkage disequilibrium (LD) reference set (1.5%), and a test set (18.5%). The sample size of the LD reference set was chosen to ensure approximately 5,000 individuals could be included for LD calculation (Yang, Ferreira, et al. 2012). Furthermore, to reduce the confounding effects of genetic relatedness, we first derived pairwise genetic relationship based on autosomal variants using the Genome-wide Complex Trait Analysis (GCTA) software (Yang et al. 2011) with default settings. We then randomly removed 1 individual in each pair of third-degree or closer relatives that had a kinship >0.0442 (Wang et al. 2019). In total, 318,071 largely unrelated individuals were retained (Supplementary Table 1), including 254,376 in the discovery set, 4,768 in the LD reference set, and 58,927 in the test set. Apart from the linear mixed model-based GWASs performed on the discovery set, all downstream analyses were conducted using the datasets that had excluded related individuals.
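The relatedness filter described above can be illustrated with a minimal Python sketch. It assumes the pairwise kinship estimates are already available as (individual, individual, kinship) tuples; the function name, the sample IDs, and the toy kinship values are hypothetical, and the greedy pass over pairs is only one simple way to realize "randomly remove 1 individual in each pair", not necessarily the procedure the authors ran.

```python
import random

KINSHIP_THRESHOLD = 0.0442  # third-degree or closer, as stated in the text

def prune_related(individuals, kinship_pairs, seed=0):
    """Randomly drop one member of each related pair that is still intact."""
    rng = random.Random(seed)
    kept = set(individuals)
    for a, b, kinship in kinship_pairs:
        # Only act on pairs above the threshold where both members remain.
        if kinship > KINSHIP_THRESHOLD and a in kept and b in kept:
            kept.discard(rng.choice((a, b)))
    return kept

# Toy data: hypothetical IDs and kinship values.
ids = ["s1", "s2", "s3", "s4", "s5"]
pairs = [("s1", "s2", 0.25), ("s2", "s3", 0.06), ("s4", "s5", 0.01)]
unrelated = prune_related(ids, pairs)
```

After pruning, no pair above the threshold has both members retained, while pairs below the threshold (here s4 and s5) are left untouched.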

Genome-wide association studies and construction of polygenic risk scores

On the discovery set that included related individuals, we retained 6,708,723 common single-nucleotide polymorphisms (SNPs, minor allele frequency >0.05) that were genotyped or imputed with an imputation quality score (INFO) >0.8. We next conducted GWASs for each of the 15 quantitative traits using the linear mixed model implemented in fastGWA (Jiang et al. 2019), adjusting for age, sex, recruitment center, genotyping array, as well as the first 20 genetic principal components. The linear mixed model was designed to account for genetic relatedness and population stratification, while substantially increasing statistical power over restricting association testing to unrelated individuals (Loh et al. 2015; Jiang et al. 2019). The genetic principal components provided by the UK Biobank were calculated using genotyped SNPs of high quality after LD pruning (Bycroft et al. 2018). For each trait, we excluded individuals whose phenotypes were more than 5 standard deviations (SD) away from the phenotypic mean after adjusting for these covariates.

We then performed conditional and joint multiple SNP (COJO) analysis (Yang, Ferreira, et al. 2012) of the GWAS summary statistics of each trait to identify COJO-independent SNPs (P-value < 5 × 10−8) using the GCTA software (Yang et al. 2011) with default settings. The LD reference set was used as the LD reference panel to account for LD at each given locus. For each trait, following Yengo et al. (2018), we constructed polygenic risk scores as the sum of allele dosage of the COJO-independent SNPs, weighted by their effect sizes on the corresponding trait estimated in GWAS. We evaluated the proportion of total phenotypic variance explained by each polygenic risk score in the discovery, LD reference, and test sets separately.
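As a rough illustration of the score construction described above, the following Python sketch computes a polygenic risk score as an effect-size-weighted sum of allele dosages and the proportion of phenotypic variance it explains (taken here as the squared Pearson correlation). The function names and all numbers are hypothetical toy values; imputed dosages could equally be fractional.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Sum of allele dosages weighted by per-allele GWAS effect sizes."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("one effect size is needed per SNP")
    return sum(d * b for d, b in zip(dosages, effect_sizes))

def variance_explained(phenotypes, scores):
    """Squared Pearson correlation between phenotype and score."""
    n = len(phenotypes)
    mean_y = sum(phenotypes) / n
    mean_s = sum(scores) / n
    cov = sum((y - mean_y) * (s - mean_s) for y, s in zip(phenotypes, scores))
    var_y = sum((y - mean_y) ** 2 for y in phenotypes)
    var_s = sum((s - mean_s) ** 2 for s in scores)
    return cov * cov / (var_y * var_s)

# Toy example: 3 COJO-independent SNPs, 4 individuals (hypothetical values).
betas = [0.10, -0.05, 0.20]
genotypes = [[0, 1, 2], [2, 0, 1], [1, 1, 0], [2, 2, 2]]
scores = [polygenic_risk_score(g, betas) for g in genotypes]
```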

Levene’s test for identifying prediction variability quantitative trait loci

In the discovery set excluding related individuals, for each trait, we regressed out the effects of age, sex, recruitment center, genotyping array, the first 20 genetic principal components, and the polygenic effect summarized by a polygenic risk score, using linear regression. With the regression residuals (prediction errors) as responses, we performed median-based Levene’s test on a per-SNP basis, as implemented in the OmicS-data-based Complex trait Analysis (OSCA) software (Zhang et al. 2019). Again, we excluded individuals whose residualized phenotypes were more than 5 SD away from the mean.
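To make the per-SNP test concrete, here is a minimal Python sketch of the median-based Levene (Brown-Forsythe) statistic, with residuals grouped by genotype. This is not the OSCA implementation; the function name and toy residuals are hypothetical. Only the W statistic is computed here; a P-value would come from an F distribution with (k − 1, N − k) degrees of freedom, for which a statistics library such as SciPy (`scipy.stats.levene` with `center='median'`) could be used.

```python
from statistics import mean, median

def levene_median_statistic(groups):
    """Brown-Forsythe (median-centred Levene) W statistic for k groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations of each residual from its own group's median.
    z = []
    for g in groups:
        m = median(g)
        z.append([abs(y - m) for y in g])
    z_bar_group = [mean(zi) for zi in z]
    z_bar = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zb - z_bar) ** 2 for zi, zb in zip(z, z_bar_group))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar_group) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Toy residuals (prediction errors) grouped by genotype 0, 1, and 2.
residuals_by_genotype = [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-4.0, 0.0, 4.0]]
w = levene_median_statistic(residuals_by_genotype)
```

In the toy data the spread of the residuals grows with the allele count, which is the genotype-dependent heteroscedasticity the test is designed to detect.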

Unlike SNPs in GWAS-identified risk loci, which were associated with phenotypic mean values, the PVL SNPs obtained from Levene’s test were associated with the squared prediction errors. The per-allele effect of each SNP on the squared prediction errors was derived from z-statistics (Zhu et al. 2016; Zhang et al. 2019). Notably, we did not impose any transformation of the phenotypes in either the GWASs or the PVL discovery, because (1) Levene’s test is robust to the distribution of the phenotypes and (2) it has been shown that nonlinear transformations, such as the logarithm transformation or the rank-based inverse normal transformation, may lead to an inflated false positive rate in Levene’s test (Wang et al. 2019).

COJO analysis (Yang, Ferreira, et al. 2012) was performed on the PVL summary statistics of each trait to identify COJO-independent PVL SNPs using the GCTA software (Yang et al. 2011) with default settings and with the UK Biobank LD reference set as the LD reference panel.

To account for testing multiple correlated traits, we determined the effective number of traits by an eigen-decomposition approach (Wang et al. 2019). Specifically, we calculated the variance–covariance matrix of all 15 traits based on the UK Biobank discovery set excluding related individuals. The eigen-decomposition of this variance–covariance matrix yielded 15 ordered eigenvalues (λ1, …, λ15). The effective number was estimated as

$$\frac{\left(\sum_{k=1}^{15}\lambda_k\right)^2}{\sum_{k=1}^{15}\lambda_k^2}\approx 10.8$$

Therefore, we set the corrected genome-wide significance threshold as P-value <5 × 10−8/10.8 ≈ 4.6 × 10−9. All identified PVLs were annotated to their nearest genes based on the hg19 genome assembly.
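The effective number of traits, taken here as the squared sum of the eigenvalues divided by the sum of the squared eigenvalues, and the resulting threshold can be sketched in Python as follows. This sketch avoids an explicit eigen-decomposition by using two identities that hold for any symmetric matrix: the eigenvalues sum to the trace, and their squares sum to the sum of squared matrix entries. The function name and the toy matrices are hypothetical; the authors' matrix is not reproduced here.

```python
def effective_number_of_traits(cov):
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues), symmetric input.

    For a symmetric matrix, sum(lambda_k) is the trace and sum(lambda_k^2)
    is the sum of squared entries, so no eigen-solver is needed here.
    """
    trace = sum(cov[i][i] for i in range(len(cov)))
    sum_squares = sum(v * v for row in cov for v in row)
    return trace * trace / sum_squares

# Three uncorrelated traits behave like 3 independent tests ...
independent = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# ... while three perfectly correlated traits behave like a single test.
identical = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

# Corrected genome-wide threshold using the value reported in the text.
threshold = 5e-8 / 10.8
```

The two toy matrices bracket the possible range: the estimate equals the number of traits when they are uncorrelated and shrinks toward 1 as they become redundant, which is why 15 correlated traits yield an effective number of about 10.8.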

Estimating heritability and cross-trait genetic correlation of polygenic prediction variability

We next estimated the heritability of polygenic prediction variability by performing LD score regression (Bulik-Sullivan, Loh, et al. 2015) leveraging the PVL summary statistics for each trait. We also performed cross-trait LD score regression (Bulik-Sullivan, Finucane, et al. 2015) for each pair of the 15 traits to quantify the genetic correlation of polygenic prediction variability. Genetic correlation with a false discovery rate (FDR, i.e. Benjamini–Hochberg-corrected P-value) <0.05, accounting for 105 pairs of traits, was considered significant. LD score regression and cross-trait LD score regression were implemented in the LDSC software with default settings based on HapMap3 SNPs (Bulik-Sullivan, Loh, et al. 2015). At least 1,063,290 out of 1,217,312 HapMap3 SNPs curated by the LDSC software were included in our analyses and used for heritability and genetic correlation estimation (Data Availability). LD scores were provided by the LDSC software and were obtained from the participants of the 1000 Genomes Project of European ancestry (Abecasis et al. 2012; Auton et al. 2015).
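For readers unfamiliar with the Benjamini–Hochberg correction mentioned above, a minimal Python sketch is given below. The function name and the example P-values are hypothetical; only the count of 105 trait pairs (15 choose 2) comes from the text.

```python
from math import comb

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg-adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

n_pairs = comb(15, 2)  # number of trait pairs tested for genetic correlation

# Hypothetical P-values for 4 tests.
p = [0.001, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(p)
significant = [a < 0.05 for a in adj]
```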

Predicting the variability of prediction errors

To estimate the PVL effects on polygenic prediction variability, we constructed PVL scores based on the per-allele effect of each COJO-independent PVL SNP on the squared prediction errors estimated above. On the test set, for traits with identified PVLs, we first regressed out the effects of the corresponding polygenic risk scores, age, sex, recruitment center, genotyping array, and the first 20 genetic principal components. We examined whether the squared residuals (squared prediction errors) were associated with the PVL scores, by estimating the Spearman’s rank correlation coefficient (ρ).

Colocalization analyses

We subsequently evaluated whether the risk loci identified in GWAS and used for constructing standard polygenic risk scores had an impact on the prediction variability by performing colocalization analyses using the eCAVIAR software (Hormozdiari et al. 2016). Following Hormozdiari et al. (2016), we retrieved GWAS summary statistics and PVL summary statistics for all SNPs in a 100-SNP window centered around each COJO-independent SNP (50 SNPs upstream and 50 SNPs downstream) identified in GWASs. The UK Biobank LD reference set was used as the LD reference panel. Evidence for colocalization was assessed by the colocalization posterior probability (CLPP) (Hormozdiari et al. 2016).

Detecting dominance effects

In the discovery set, but excluding related individuals, we tested whether the identified PVLs had significant dominance effects. Significance of the dominance effects was assessed for each COJO-independent PVL SNP by comparing the following 2 linear regression models using a likelihood ratio test:

$$Y \sim \beta_0 + D\beta_D + Q\gamma$$

and

$$Y \sim \beta_0 + Q\gamma$$

where Y represents the trait with phenotypic mean β0; D is an indicator of carriers of the chosen allele with effect βD (the 2 alleles of each PVL SNP were tested in turn for dominance effects); and Q represents covariates (age, sex, recruitment center, genotyping array, the first 20 genetic principal components, as well as the polygenic risk score) with effects γ.
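A simplified Python sketch of this likelihood ratio test follows. It makes a loud assumption that differs from the models above: the covariates Q (including the polygenic risk score) are omitted, so Y is treated as if it had already been residualized, which only approximates the joint fit. Under Gaussian errors the test statistic is n·ln(RSS_null/RSS_full), referred to a chi-square distribution with 1 degree of freedom. Function names and toy values are hypothetical.

```python
from math import erfc, log, sqrt

def residual_ss(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def dominance_lrt(y, dosage, counted_allele=True):
    """LRT of Y ~ b0 + D*bD against Y ~ b0, covariates omitted (assumption)."""
    # D flags carriers of the chosen allele; dosage counts one allele (0, 1, 2).
    if counted_allele:
        d = [1 if g >= 1 else 0 for g in dosage]
    else:
        d = [1 if g <= 1 else 0 for g in dosage]
    rss_null = residual_ss(y)
    # With an intercept and one binary regressor, OLS fits the two group means.
    rss_full = (residual_ss([v for v, di in zip(y, d) if di == 1])
                + residual_ss([v for v, di in zip(y, d) if di == 0]))
    stat = max(len(y) * log(rss_null / rss_full), 0.0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return stat, erfc(sqrt(stat / 2))

# Toy data: carriers of the counted allele share a shifted mean (dominant pattern).
dosage = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y = [0.1, -0.1, 0.0, 1.0, 1.1, 0.9, 1.0, 0.9, 1.1]
stat_a, p_a = dominance_lrt(y, dosage, counted_allele=True)
stat_b, p_b = dominance_lrt(y, dosage, counted_allele=False)
```

Testing both alleles in turn, as the text describes, matters because the carrier indicator differs depending on which allele is taken as dominant; in the toy data only one of the two codings fits well.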

Detecting SNP–exposure interaction effects

Furthermore, we examined whether the PVL SNPs showed evidence for interaction with measured environmental or lifestyle exposures, using the discovery set. Seven common exposures were investigated, including age, sex, smoking, self-reported alcohol intake frequency, use of cholesterol-lowering drug, use of antihypertensive drug, and sedentary activity duration.

Smoking status was determined according to self-reported smoking history, and included ever-smokers and never-smokers. Cholesterol-lowering drugs reported by UK Biobank participants included statins (atorvastatin, fluvastatin, pravastatin, rosuvastatin, simvastatin, etc.), cholesterol absorption inhibitors (ezetimibe, etc.), and others. These included medication codes in data field 20003 (Elliott et al. 2020): 1140861954, 1140861958, 1140888594, 1140888648, 1141146234, 1141192410, and 1141192736. Antihypertensive drugs included diuretics (amiloride, bumetanide, etc.), calcium channel blockers (amlodipine, nifedipine, diltiazem, verapamil, etc.), angiotensin-converting enzyme inhibitors (enalapril, lisinopril, ramipril, etc.), angiotensin II receptor antagonists (losartan, valsartan, etc.), adrenergic receptor antagonists (atenolol, metoprolol, nadolol, etc.), and others. These included records in data fields 6153 and 6177, as well as medication codes in data field 20003 (Elliott et al. 2020): 1140860192, 1140860292, 1140860696, 1140860728, 1140860750, 1140860806, 1140860882, 1140860904, 1140861088, 1140861190, 1140861276, 1140866072, 1140866078, 1140866090, 1140866102, 1140866108, 1140866122, 1140866138, 1140866156, 1140866162, 1140866724, 1140866738, 1140868618, 1140872568, 1140874706, 1140874744, 1140875808, 1140879758, 1140879760, 1140879762, 1140879802, 1140879806, 1140879810, 1140879818, 1140879822, 1140879826, 1140879830, 1140879834, 1140879842, 1140879866, 1140884298, 1140888552, 1140888556, 1140888560, 1140888646, 1140909706, 1140910442, 1140910614, 1140916356, 1140923272, 1140923336, 1140923404, 1140923712, 1140926778, 1140928226, 1141145660, 1141146126, 1141152998, 1141153026, 1141164276, 1141165470, 1141166006, 1141169516, 1141171336, 1141180592, 1141180772, 1141180778, 1141184722, 1141193282, 1141194794, and 1141194810.
Sedentary activity duration was the sum of self-reported number of hours spent on driving, using computer, or watching television (Wang et al. 2019). Summaries of these environmental factors are provided in Supplementary Table 2.

Significance of the SNP–exposure interaction effects was assessed for each pair of COJO-independent PVL SNP and candidate exposure by comparing the 2 following linear regression models using a likelihood ratio test:

$$Y \sim \beta_0 + G\beta_G + E\beta_E + (G \times E)\beta_{G \times E} + Q\gamma$$

and

$$Y \sim \beta_0 + G\beta_G + E\beta_E + Q\gamma$$

where Y represents the phenotype with phenotypic mean β0; G and E represent the genotype at an identified PVL SNP and one of the 7 candidate exposures, with effects βG and βE, respectively; G×E stands for the SNP–exposure interaction with effect βG×E; and Q represents other covariates (age, sex, recruitment center, genotyping array, the first 20 genetic principal components, as well as the polygenic risk score) with effects γ.
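The nested-model comparison for a SNP–exposure interaction can be sketched in pure Python as below. This is an illustrative implementation under stated assumptions, not the authors' code: ordinary least squares is solved through the normal equations with Gauss-Jordan elimination (adequate for a handful of well-conditioned columns), the covariates Q are passed as optional extra columns and left empty in the toy example, and the likelihood ratio statistic n·ln(RSS_null/RSS_full) is referred to a chi-square distribution with 1 degree of freedom. All names and data are hypothetical.

```python
from math import erfc, log, sqrt

def ols_rss(columns, y):
    """Residual sum of squares from least squares of y on the given columns."""
    p, n = len(columns), len(y)
    # Normal equations (X'X) b = X'y stored as an augmented matrix.
    a = [[sum(columns[i][k] * columns[j][k] for k in range(n)) for j in range(p)]
         + [sum(columns[i][k] * y[k] for k in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(p):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    fitted = [sum(b * col[k] for b, col in zip(beta, columns)) for k in range(n)]
    return sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))

def interaction_lrt(y, g, e, covariates=()):
    """LRT for the G x E term; `covariates` holds optional columns for Q."""
    n = len(y)
    base = [[1.0] * n, list(g), list(e)] + [list(c) for c in covariates]
    gxe = [gi * ei for gi, ei in zip(g, e)]
    rss_null = ols_rss(base, y)
    rss_full = ols_rss(base + [gxe], y)
    stat = max(n * log(rss_null / rss_full), 0.0)
    return stat, erfc(sqrt(stat / 2))  # chi-square survival function, 1 df

# Toy data: genotype effect is larger among exposed individuals (e = 1).
g = [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2]
e = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
noise = [0.05, -0.05] * 6
y_inter = [0.5 * gi + 0.3 * ei + 1.0 * gi * ei + ni for gi, ei, ni in zip(g, e, noise)]
y_plain = [0.5 * gi + 0.3 * ei + ni for gi, ei, ni in zip(g, e, noise)]
stat_inter, p_inter = interaction_lrt(y_inter, g, e)
stat_plain, p_plain = interaction_lrt(y_plain, g, e)
```

When the exposure is itself one of the covariates in Q (age or sex, for example), it should be supplied only once, since duplicated columns make the normal equations singular.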

We repeated the analyses of dominance effects and SNP–exposure interaction effects for all COJO-independent SNPs identified in GWASs.

Incorporating dominance effects into polygenic risk scores

To potentially improve polygenic predictions, for each trait, we built a multivariate linear regression model on the discovery set to combine the polygenic risk score with all PVLs demonstrating significant dominance effects, adjusting for age, sex, recruitment center, genotyping array, and the first 20 genetic principal components. To account for multiple testing (185 PVLs × 2 alleles = 370 tests), we experimented with both an FDR threshold and a Bonferroni threshold for calling significant dominance effects. Using the estimated regression coefficients and the test set, we then compared the predictive performance of a polygenic risk score including dominance effects with the original polygenic risk score. We bootstrapped the test set 1,000 times to obtain a confidence interval for the predictions.
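The bootstrap comparison can be illustrated with a short Python sketch. Several details are assumptions rather than the authors' procedure: plain R² (1 − RSS/TSS) of fixed predictions is used instead of an adjusted R² that includes covariate effects, the interval is a simple percentile interval, and the function names, seed, and toy predictions are hypothetical.

```python
import random

def r_squared(y, pred):
    """1 - RSS/TSS for fixed predictions (no refitting on the resample)."""
    mean_y = sum(y) / len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, pred))
    tss = sum((a - mean_y) ** 2 for a in y)
    return 1 - rss / tss

def bootstrap_relative_gain(y, pred_base, pred_dom, n_boot=1000, seed=0):
    """95% percentile interval for the relative R^2 gain of the second model."""
    rng = random.Random(seed)
    n = len(y)
    gains = []
    for _ in range(n_boot):
        # Resample individuals with replacement, keeping pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        r2_base = r_squared(yb, [pred_base[i] for i in idx])
        r2_dom = r_squared(yb, [pred_dom[i] for i in idx])
        gains.append((r2_dom - r2_base) / r2_base)
    gains.sort()
    return gains[n_boot * 25 // 1000], gains[n_boot * 975 // 1000 - 1]

# Toy test set: the second predictor has uniformly smaller errors.
y_test = [float(i) for i in range(50)]
pred_base = [v + (5.0 if i % 2 else -5.0) for i, v in enumerate(y_test)]
pred_dom = [v + (2.0 if i % 2 else -2.0) for i, v in enumerate(y_test)]
ci_low, ci_high = bootstrap_relative_gain(y_test, pred_base, pred_dom)
```

Resampling the same individuals for both models keeps the comparison paired, which is what allows a paired test on the bootstrap replicates.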

We refrained from incorporating SNP–exposure interaction effects into polygenic risk scores due to potential confounding and the possibility of reverse causation.

Results

Characterization of PVLs

An overview of this study is provided in Fig. 1 and detailed in Methods. Typically, GWASs estimate the effects of genetic variants using a regression framework assuming linearity and additivity. Most genetic variants would not be associated with the squared prediction error if a genetic risk predictor derived from a GWAS accurately captures or approximates the overall genetic effects (Fig. 1a). However, PVLs featuring genetic associations with squared prediction error may exist when the genetic risk predictor fails to capture strong dominance effects or strong interaction effects between genetic variants and nongenetic exposures (Fig. 1a).

Fig. 1.

Study overview. a) Illustration of possible sources of genetic associations with genetic prediction accuracy. A variant will not be associated with squared prediction error if it only confers a linear and additive effect. Dominance effects and interaction effects between genetic variants and an unmeasured exposure can lead to genotype-dependent heteroscedasticity, and genetic associations with squared prediction error. Boxes denote interquartile ranges with the solid lines indicating median values. Widths of boxes are proportionate to sample sizes. Genetic predictors are derived from linear regression. b) The European ancestry participants in the UK Biobank were randomly split into 3 disjoint subsets. Third-degree or more closely related individuals were removed from the LD reference set and the test set. Genome-wide association studies were performed on the discovery set including related individuals, using linear mixed model regression. Identification of PVLs was conducted on the discovery set excluding related individuals.

To identify PVLs and characterize the nature of PVL effects, European ancestry participants in the UK Biobank were randomly assigned into a discovery set (80%), an LD reference set (1.5%), and a test set (18.5%) (Fig. 1b and Supplementary Tables 1 and 2). On the UK Biobank discovery set, GWAS and COJO analyses identified risk loci for 15 physiological and biochemical traits. These traits demonstrated high polygenicity, with glucose having the smallest number of COJO-independent SNPs (60), and height having the largest number (1,317) (Supplementary Tables 3 and 4). Although less powered than the largest meta-analyses of GWASs (Yengo et al. 2018), these in-sample GWASs should ensure homogeneity of estimated genetic effects in the discovery and test sets. Polygenic risk scores constructed using these COJO-independent SNPs captured a nontrivial proportion of the total phenotypic variance, up to 16.93% for platelet counts (Supplementary Table 3).

By performing the median-based Levene’s test, we then identified genetic variants significantly associated with polygenic prediction variability (Methods). A total of 185 COJO-independent PVLs were identified for 11 out of the 15 quantitative traits under investigation (Fig. 2a and Supplementary Table 5), including BMI (12 PVLs), FEV1/FVC ratio (2 PVLs), glucose (12 PVLs), HbA1c (34 PVLs), heel BMD (1 PVL), LDL cholesterol (52 PVLs), triglycerides (51 PVLs), SBP (1 PVL), platelet (9 PVLs), calcium (1 PVL), and vitamin D (10 PVLs). No PVL was identified for DBP, height, WHR, and nRBC count. Based on genomic annotations, 108 of these 185 PVLs were located in 81 known genes (Supplementary Table 6). Furthermore, we found 51 PVLs that seemed to be located in upstream regulatory regions of known genes, and 26 PVLs in downstream regulatory regions (Supplementary Table 6).

Fig. 2.

Identification of PVLs for 15 quantitative traits. a) Manhattan plots demonstrate genetic associations with polygenic prediction variability. Dashed lines indicate a corrected genome-wide significance threshold of P-value <4.6 × 10−9. b) Comparison of estimated trait heritability based on GWASs and estimated heritability of polygenic prediction variability based on PVL studies. Heritability estimates and 95% confidence intervals are indicated for each trait.

Subsequently, we estimated the heritability of the polygenic prediction variability of each trait. Although a considerable proportion of trait variance is explained by GWAS SNPs (Fig. 2b; also shown previously by many authors, see http://www.nealelab.is/uk-biobank/), none of these 15 traits exhibited a strong heritability of prediction variability. Traits that had more PVLs in general demonstrated higher heritability of prediction variability, though the highest heritability was estimated to be only 5.73% (SD = 1.21%) for triglycerides, followed by 4.46% (SD = 0.30%) for BMI and 3.83% (SD = 1.73%) for vitamin D (Fig. 2c). In contrast, the 4 traits that did not have identified PVLs (DBP, height, WHR, and nRBC count) and calcium, which had 1 PVL, had the lowest prediction variability heritability estimates from LD score regression, all below 0.70% (Fig. 2c).

Our observations of genetic correlations in phenotypic variability were limited to traits that were biologically related (Supplementary Fig. 1 and Supplementary Table 7). As expected, LDL, triglycerides, and glucose had pairwise significant genetically correlated prediction variability, with the strongest correlation of 0.38 (SD = 0.09; P-value = 1.2 × 10−5) between LDL and triglycerides. Meanwhile, prediction variability of BMI was genetically correlated with prediction variability of WHR (correlation = 0.35; SD = 0.07; P-value = 7.4 × 10−7), heel BMD (correlation = 0.33; SD = 0.08; P-value = 8.9 × 10−6), plasma glucose (correlation = 0.38; SD = 0.07; P-value = 2.9 × 10−8), HbA1c (correlation = 0.29; SD = 0.05; P-value = 2.4 × 10−9), and LDL (correlation = 0.14; SD = 0.05; P-value = 7.1 × 10−3).

Validation of PVL effects on polygenic prediction

We examined whether the identified PVLs had effects on polygenic prediction (Methods). For 8 traits that had more than 1 PVL, on the test set, a PVL score constructed based on the discovery set demonstrated significant association with the squared prediction errors of the polygenic risk scores (Fig. 3). For example, the PVL score for triglycerides based on 51 PVL SNPs had a Spearman correlation of 0.185 (P-value <1.0 × 10−300) with the squared prediction errors; the PVL score for vitamin D based on 10 PVL SNPs had a Spearman correlation of 0.109 (P-value = 4.1 × 10−141) with the squared prediction errors (Fig. 3). Although only 1 PVL was identified for heel BMD, this PVL SNP also demonstrated significant association with the squared prediction errors (Spearman correlation = 0.036; P-value = 7.2 × 10−11; Fig. 3). On the other hand, the single PVL for SBP had the weakest association with the squared prediction errors, which was not statistically significant (Fig. 3).

Fig. 3.

PVLs predict polygenic prediction errors on an out-of-sample test set. PVL scores were derived for BMI, triglycerides, LDL, glucose, HbA1c, platelet, and vitamin D, which had at least 9 PVLs. Median squared prediction errors (dots) and interquartile ranges (error bars) are compared across PVL score deciles. FEV1/FVC ratio had 2 PVLs, resulting in 9 PVL score values. Median squared prediction errors and interquartile ranges are summarized according to the ranks. Calcium, heel BMD, and SBP had 1 PVL each; thus, median squared prediction errors and interquartile ranges are summarized with respect to genotypes of the corresponding PVL SNP. Spearman correlation estimates are indicated for each trait.

Enrichment of dominance effects and SNP–exposure interaction effects among PVLs

The majority of GWAS-identified risk loci did not demonstrate impact on polygenic prediction variability, since there was little evidence of colocalization with PVL signals [CLPP < 0.01 (Hormozdiari et al. 2016); Fig. 4a and Supplementary Table 8]. Compared to GWAS-identified risk loci, PVLs were more likely to have dominance effects on the corresponding traits (Fig. 4b), as 89 (48.1%) of the 185 PVL SNPs had at least 1 allele with a P-value <1.4 × 10−4 (Bonferroni threshold accounting for 370 tests) for dominance effect (Methods). Interestingly, GWAS-identified risk loci with a higher CLPP, suggesting a stronger effect on prediction variability, were also more likely to have a detectable dominance effect. Specifically, 92 (20.2% out of 455) GWAS-identified risk loci with a CLPP of >0.1 had a P-value <1.4 × 10−4 for dominance effect, compared to 62 (7.9% out of 787) among those with a CLPP between 0.01 and 0.1, 53 (4.0% out of 1,328) among those with a CLPP between 0.001 and 0.01, and 10 (4.5% out of 223) among those with a CLPP of ≤0.001 (Fig. 4b).

Fig. 4.

Enrichment of complex genetic effects among PVLs. a) Number of PVLs or GWAS risk loci identified for each trait. b) Comparison of P-value distribution for dominance effect tests between PVLs and GWAS risk loci. For each genetic variant, the smaller P-value obtained from 2 dominance effect tests conducted using 2 different alleles was used to represent the evidence of dominance effect for the corresponding PVL or GWAS risk locus. c) Comparison of P-value distribution for SNP–exposure interaction effect tests between PVLs and GWAS risk loci. All GWAS risk loci were categorized with respect to evidence of colocalization with PVL signals. GWAS risk loci with a higher CLPP were more likely to have effects on polygenic prediction variability.

On the discovery set, at least 1 significant PVL SNP–exposure interaction effect with a P-value <3.9 × 10−5 (Bonferroni threshold accounting for 7 exposures × 185 PVL SNPs = 1,295 tests) was detected for each of the 7 exposures, while a total of 105 (56.8%) PVL SNPs had interaction effects with at least 1 exposure (Supplementary Fig. 2 and Supplementary Table 9). Notably, PVL SNPs for LDL, triglycerides, glucose, and HbA1c had interaction effects with the use of cholesterol-lowering drug or antihypertensive drug (Supplementary Fig. 2). As expected, these interaction effects also appeared to be more enriched among PVLs than GWAS-identified risk loci (Fig. 4c). Full summary statistics of dominance effects and SNP–exposure interaction effects are provided in Supplementary Tables 8 and 9.

Improved polygenic prediction by incorporating dominance effects

By modeling dominance effects on the discovery set (Methods), polygenic risk scores incorporating additional dominance effects may have improved predictive performance on the test set (Fig. 5 and Supplementary Table 10). For instance, after incorporating 87 dominance effects with an FDR of <0.05 for triglycerides, the adjusted R2 for the polygenic risk score increased from 0.1193 to 0.1290 (8.1% relative increment) including covariate effects; with 64 additional dominance effects in an LDL polygenic risk score, the adjusted R2 increased from 0.0950 to 0.1024 (7.8% relative increment; Fig. 5). Significant improvements in predictive performance were also observed for polygenic risk scores for vitamin D (1.1% relative increment) and platelet (0.5% relative increment), despite their smaller magnitude (Fig. 5). These improvements were consistent if a more stringent Bonferroni threshold was implemented to preselect dominance effects to be incorporated into the polygenic risk scores (Fig. 5 and Supplementary Table 11). However, predictive performance of polygenic risk scores for other traits did not exhibit evident improvements.

Fig. 5.

Improving polygenic prediction by modeling dominance effects. An FDR threshold and a Bonferroni threshold were implemented separately to preselect dominance effects to be incorporated into the polygenic risk scores. Likelihood ratio tests were performed on the discovery set to assess the significance of the jointly added dominance effects (ANOVA P-value). Adjusted R2 metrics including covariate effects, based on the baseline models without dominance effects and the complex models with dominance effects, respectively, were derived on the test set. Distributions of the relative predictive performance of the 2 models were obtained based on 1,000 bootstrap samples. Paired t-tests were performed to evaluate whether adding dominance effects significantly improved the predictive performance.
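
The bootstrap comparison described in the caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the data are simulated, predictions are taken as given rather than refitted on each resample, and the function names `adjusted_r2` and `bootstrap_compare` are hypothetical.

```python
import random
import statistics


def adjusted_r2(y, y_hat, n_predictors):
    """Adjusted R2 of predictions y_hat for outcomes y."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


def bootstrap_compare(y, pred_base, pred_complex, p_base, p_complex,
                      n_boot=1000, seed=0):
    """Resample individuals with replacement and score both models each time."""
    rng = random.Random(seed)
    n = len(y)
    base_scores, complex_scores = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        y_b = [y[i] for i in idx]
        base_scores.append(
            adjusted_r2(y_b, [pred_base[i] for i in idx], p_base))
        complex_scores.append(
            adjusted_r2(y_b, [pred_complex[i] for i in idx], p_complex))
    # Paired t statistic on the per-resample differences.
    diffs = [c - b for b, c in zip(base_scores, complex_scores)]
    t_stat = statistics.mean(diffs) / (
        statistics.stdev(diffs) / len(diffs) ** 0.5)
    return base_scores, complex_scores, t_stat


# Simulated toy data (not from the study): the "complex" prediction adds a
# heterozygote-specific term that the baseline prediction omits.
rng = random.Random(1)
score = [rng.gauss(0, 1) for _ in range(300)]
het = [rng.randrange(2) for _ in range(300)]
y = [s + h + rng.gauss(0, 1) for s, h in zip(score, het)]
pred_base = score
pred_complex = [s + h for s, h in zip(score, het)]

base_scores, complex_scores, t_stat = bootstrap_compare(
    y, pred_base, pred_complex, p_base=1, p_complex=2, n_boot=1000)
print(statistics.mean(base_scores), statistics.mean(complex_scores), t_stat)
```

Pairing the two models on the same resampled individuals removes the between-resample noise that both scores share, which is why a paired rather than an unpaired test is the natural choice here.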

Discussion

In the past decade, discovery of genetic determinants for complex traits enabled by well-established cohorts has empowered polygenic risk scores to achieve considerable prediction accuracy (Khera et al. 2018; Lambert et al. 2019; Escribe et al. 2021; Wand et al. 2021). However, one of the major obstacles yet to be overcome before polygenic risk scores can be universally utilized in health care is the highly variable predictive performance among different groups of individuals (Mostafavi et al. 2020). Following one line of investigation as to why these performance differences occur, in this work, we sought to identify and characterize PVLs for 15 key physiological and biochemical quantitative traits based on European ancestry participants in the UK Biobank.

In total, 185 PVLs were identified for 11 out of the 15 traits under investigation. The overall genetic contributions to polygenic prediction variability were small in number and magnitude. This is expected for polygenic traits, such as all traits included in this study, in which many genetic variants contribute to the trait yet each variant confers a small effect. For such traits, it has been shown both by experimental data and in theory that nonadditive genetic effects can be well approximated by linear and additive models (Hill et al. 2008; Hivert et al. 2021). As a result, in our study, most genetic components should have been captured, either directly or as a surrogate, in polygenic risk scores derived from GWASs.

Nonetheless, the identified PVLs affected pivotal genes in the biological pathways pertaining to the target traits. For instance, PVLs were identified in the APOB, LDLR, and PCSK9 genes for LDL, which are 3 clinically actionable genes for familial hyperlipidemia (Rehm et al. 2013; Richards et al. 2015; Miller et al. 2021). Meanwhile, as expected, some PVLs were previously known to be associated with both the phenotypic mean and phenotypic variance of the target traits, including but not limited to PVLs identified in the FTO gene for BMI (Yang, Loos, et al. 2012), in the CHRNA3 gene for FEV1/FVC ratio (Wang et al. 2019), and in the TCF7L2 gene for glucose and HbA1c levels (Wang et al. 2019). Apart from SBP, the aggregated effects of PVLs for 10 traits were verified to be significantly associated with squared polygenic prediction errors on an independent test set. These identified PVLs hence provide important resources for characterizing intrapopulation heterogeneity and are candidates for explaining complex genetic effects.

Dominance effects were found to be strongly enriched among PVLs compared to GWAS-identified risk loci. By incorporating dominance effects conferred by PVLs, polygenic prediction accuracy was improved for LDL, triglycerides, vitamin D, and platelets, but not for other traits where PVLs showed more limited effects. Although the improvements in accuracy were not large, our results encourage analysts to explicitly model complex genetic effects in addition to linear additive effects for common variants when developing polygenic risk scores. This may be particularly important for the equitable clinical use of risk prediction tools when polygenic prediction accuracy varies widely between subpopulations that have different demographic characteristics or are exposed to different risk factors. Genome-wide scans for dominance effects have been difficult in traditional GWASs due to limited statistical power. Yet, loci harboring dominance effects have sometimes been identified in large cohorts, such as KSR2 and ZNF507-LOC400684 for coronary artery disease (Nikpay et al. 2015). We thus also anticipate that our analytical scheme could be generalized to identify dominance effects with a significantly reduced number of association tests and could open a new avenue toward understanding the genetics of complex traits.
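
To make the notion of a dominance effect concrete, the sketch below uses the textbook genotype-mean definition: the additive effect is half the difference between the two homozygote means, and the dominance deviation is the departure of the heterozygote mean from the homozygote midpoint. This is not necessarily the regression model used in this study, and the function name and toy values are hypothetical.

```python
from statistics import mean


def additive_and_dominance(genotypes, trait):
    """Additive effect and dominance deviation from genotype-group means.

    genotypes: allele counts coded 0, 1, or 2
    trait:     quantitative trait values, one per individual
    """
    groups = {0: [], 1: [], 2: []}
    for g, value in zip(genotypes, trait):
        groups[g].append(value)
    m0, m1, m2 = (mean(groups[k]) for k in (0, 1, 2))
    additive = (m2 - m0) / 2          # half the homozygote difference
    dominance = m1 - (m0 + m2) / 2    # heterozygote departure from midpoint
    return additive, dominance


# Toy values: heterozygotes sit above the homozygote midpoint.
genotypes = [0, 0, 1, 1, 2, 2]
trait = [1.0, 1.2, 2.4, 2.6, 2.9, 3.1]
a, d = additive_and_dominance(genotypes, trait)
print(a, d)
```

A purely additive score assigns heterozygotes the midpoint value, so a nonzero dominance deviation leaves genotype-dependent prediction error that an extra heterozygote term can absorb.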

SNP–exposure interactions are another important source of genotype-dependent phenotypic variance heterogeneity (Paré et al. 2010; Franks and Pare 2016; Wang et al. 2019). Among the identified PVLs, nearly 60% interacted with at least one of the 7 environmental or lifestyle exposures that represented crucial clinical conditions. These interaction effects implied that the same genetic variant could have condition-specific effects on the corresponding traits. For instance, the FEV1/FVC ratio PVL [rs56077333 in the CHRNA3 gene, which is associated with lung function and smoking behaviors (Stevens et al. 2008; Kaur-Knudsen et al. 2012)] had an effect on FEV1/FVC ratio specific to ever-smokers; use of cholesterol-lowering and antihypertensive drugs broadly altered the effects of PVL SNPs on blood lipids and glycemic traits. These results suggest that clinical interpretation of polygenic risk prediction could also consider a patient’s relevant environmental and/or lifestyle exposures and, possibly, infer to what extent prediction errors should be expected. Notably, although we did not attempt to incorporate these interaction effects into polygenic risk scores in order to avoid potential confounding effects, constructing condition-specific polygenic risk scores to improve their clinical utility may also be desirable in future prospective studies with larger sample sizes.
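
A condition-specific effect of this kind can be illustrated with a binary exposure. In a linear model with genotype, exposure, and their product, the interaction coefficient equals the difference between the per-allele slopes in the exposed and unexposed strata. The sketch below computes that difference on toy data; it omits covariates, and the function names and values are hypothetical rather than taken from the study.

```python
def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def stratified_slopes(genotypes, exposure, trait):
    """Per-allele effect of the SNP within each exposure stratum (0 or 1)."""
    slopes = {}
    for level in (0, 1):
        g = [gi for gi, e in zip(genotypes, exposure) if e == level]
        t = [ti for ti, e in zip(trait, exposure) if e == level]
        slopes[level] = slope(g, t)
    return slopes


# Toy data: the SNP has no effect in never-smokers and an effect of 0.5
# per allele in ever-smokers.
genotypes = [0, 1, 2, 0, 1, 2]
ever_smoker = [0, 0, 0, 1, 1, 1]
trait = [1.0, 1.0, 1.0, 1.0, 1.5, 2.0]

slopes = stratified_slopes(genotypes, ever_smoker, trait)
interaction = slopes[1] - slopes[0]
print(slopes, interaction)
```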

Our study has important limitations. First of all, our findings should be considered population-specific. Multiple studies have shown that genetic, and particularly polygenic, effects on complex traits may be subject to alterations in the underlying genetic architectures (Martin et al. 2017, 2019), including but not limited to changes in minor allele frequencies, LD patterns, interactions with population-specific environmental exposures, and trait heritability. We posit that these interpopulation differences also have strong impacts on PVL effects. Due to the limited sample sizes of non-European ancestry populations present in the UK Biobank, we refrained from screening for PVLs in these populations. We expect future work in emerging non-European cohorts to extend our findings, potentially leveraging population-specific polygenic risk scores. Second, following Wang et al. (2019), we adopted Levene’s test for PVL discovery. Compared to other classical approaches, such as Bartlett’s test (Bartlett 1937) and the Fligner–Killeen test (Fligner and Killeen 1976), which also assess violation of the variance homogeneity assumption in linear regression, Levene’s test has been shown to have a considerably lower false positive rate, especially when the effects on variance are weak (Wang et al. 2019). However, because the model assumptions of Levene’s test are violated in the presence of genetic relatedness, related individuals were discarded from our analyses. A recently developed double generalized linear model that performs a dispersion effect test is able to incorporate random effect modeling to account for genetic relatedness (Young et al. 2018), while achieving performance comparable to that of Levene’s test on the same samples (Wang et al. 2019). This method could probably have enhanced power by including related individuals, but it was not used in this study for genome-wide scanning for PVLs due to its high computational cost (Young et al. 2018; Wang et al. 2019). It is also worth noting that Levene’s test targets variance heterogeneity in quantitative traits. It remains challenging to efficiently identify and characterize PVLs for binary traits, such as disease outcomes. In addition, our analyses were restricted to common SNPs to ensure statistical power. More rigorous methods, such as region-based tests (Soave and Sun 2017), may allow for integration of rare variants that also have strong effects on phenotypic variability. Lastly, although we identified significant dominance effects and SNP–exposure interaction effects underlying some PVLs, there were still many PVLs that did not display dominance effects and did not interact with any of the exposures examined in this study. While arguably these PVLs could interact with exposures not tested in the current study, we note that they may also arise from other mechanisms (Paré et al. 2010), such as genetically controlled homeostatic regulation and epistasis. More investigations are needed to evaluate whether these mechanisms are plausible.
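
For readers unfamiliar with Levene's test, the following sketch computes its W statistic across genotype groups. Two choices here are assumptions rather than details taken from this section: deviations are centered on the group median (the Brown-Forsythe variant), and the inputs are toy residuals. Under the null hypothesis, W is compared against an F distribution with k - 1 and N - k degrees of freedom; the P-value is omitted because the Python standard library has no F distribution.

```python
from statistics import mean, median


def levene_w(groups, center=median):
    """Levene's W statistic for equality of variances across groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its group's center.
    z = [[abs(v - center(g)) for v in g] for g in groups]
    z_group_means = [mean(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - z_grand_mean) ** 2
                  for zi, m in zip(z, z_group_means))
    within = sum((v - m) ** 2
                 for zi, m in zip(z, z_group_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Toy residuals grouped by genotype (0, 1, or 2 copies of an allele);
# the spread widens with allele count.
residuals_by_genotype = {
    0: [-0.2, 0.1, 0.0, 0.3, -0.1],
    1: [-0.6, 0.5, 0.1, -0.4, 0.7],
    2: [-1.5, 1.2, 0.2, -0.9, 1.4],
}
w = levene_w(list(residuals_by_genotype.values()))
print(w)
```

Because the statistic is an analysis of variance on absolute deviations, it treats observations as independent, which is why related individuals violate its assumptions.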

In summary, we have discovered and validated genetic variants associated with the polygenic prediction variability of 11 vital physiological and biochemical traits. We have elucidated the nature of some of the PVL effects and the added value of incorporating complex genetic effects into polygenic risk scores. These findings warrant future investigations into the interpretation of polygenic risk prediction in different contexts, as well as novel approaches for developing polygenic risk scores for improved personalized medicine.

Supplementary Material

iyac158_Supplemental_Table_S1-S3
iyac158_Supplemental_Figures_S1-S2
iyac158_Supplemental_Table_S4
iyac158_Supplemental_Table_S5
iyac158_Supplemental_Table_S6
iyac158_Supplemental_Table_S7
iyac158_Supplemental_Table_S8
iyac158_Supplemental_Table_S9
iyac158_Supplemental_Table_S10
iyac158_Supplemental_Table_S11
iyac158_Supplemental_Material_Legends

Contributor Information

Tianyuan Lu, Centre for Clinical Epidemiology, Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC H3T 1E2, Canada; Quantitative Life Sciences Program, McGill University, Montreal, QC H3A 0G4, Canada.

Vincenzo Forgetta, Centre for Clinical Epidemiology, Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC H3T 1E2, Canada.

John Brent Richards, Centre for Clinical Epidemiology, Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC H3T 1E2, Canada; Department of Human Genetics, McGill University, Montreal, QC H3A 0G4, Canada; Department of Twin Research and Genetic Epidemiology, King’s College London, London WC2R 2LS, UK.

Celia M T Greenwood, Centre for Clinical Epidemiology, Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC H3T 1E2, Canada; Department of Human Genetics, McGill University, Montreal, QC H3A 0G4, Canada; Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC H3A 0G4, Canada; Gerald Bronfman Department of Oncology, McGill University, Montreal, QC H3A 0G4, Canada.

Data Availability

Full summary statistics of genome-wide discovery of PVLs, computational scripts, and supplementary tables are deposited in a figshare repository https://doi.org/10.6084/m9.figshare.17131472.v1. TL had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Restrictions apply to the availability of individual-level data from the UK Biobank to preserve patient confidentiality. These data are available from UK Biobank upon successful project application to the research committee.

Supplemental material is available at GENETICS online.

Funding

This research has been conducted using the UK Biobank resource under Application Numbers 27449 and 60755. This study was enabled in part by support provided by Calcul Québec and Compute Canada. CMTG is supported by a Canadian Institutes of Health Research grant (CIHR; PJT-148620). The Richards research group is supported by the Canadian Institutes of Health Research (365825; 409511), the Lady Davis Institute of the Jewish General Hospital, the Canadian Foundation for Innovation, the NIH Foundation, Cancer Research UK, Genome Québec, the Public Health Agency of Canada, and the Fonds de Recherche Québec Santé (FRQS). JBR is supported by an FRQS Clinical Research Scholarship Merite. TL has been supported by a Vanier Canada Graduate Scholarship, an FRQS Doctoral Training Fellowship, and a McGill University Faculty of Medicine Scholarship.

Author contributions

TL, CMTG, and JBR designed and directed the study. TL and CMTG designed the analytical framework. VF managed the data and computational software. TL performed the analyses and wrote the initial article. All authors revised and approved the article.

Conflicts of interest

JBR is the founder of 5 Prime Sciences and has served as a consultant to GlaxoSmithKline and Deerfield Capital for their genetics programs. The other authors have no relevant disclosures.

Literature cited

  1. Aschard H, Chen J, Cornelis MC, Chibnik LB, Karlson EW, Kraft P. Inclusion of gene-gene and gene-environment interactions unlikely to dramatically improve risk prediction for complex diseases. Am J Hum Genet. 2012;90(6):962–972. doi: 10.1016/j.ajhg.2012.04.017.
  2. Bartlett MS. Properties of sufficiency and statistical tests. Proc R Soc Lond Ser A: Math Phys Sci. 1937;160(901):268–282.
  3. Bulik-Sullivan B, Finucane HK, Anttila V, Gusev A, Day FR, Loh P-R, Duncan L, Perry JRB, Patterson N, Robinson EB, et al.; Genetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3. An atlas of genetic correlations across human diseases and traits. Nat Genet. 2015;47(11):1236–1241. doi: 10.1038/ng.3406.
  4. Bulik-Sullivan BK, Loh PR, Finucane HK, Ripke S, Yang J, Patterson N, Daly MJ, Price AL, Neale BM; Schizophrenia Working Group of the Psychiatric Genomics Consortium. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet. 2015;47(3):291–295. doi: 10.1038/ng.3211.
  5. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, Motyer A, Vukcevic D, Delaneau O, O'Connell J, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562(7726):203–209. doi: 10.1038/s41586-018-0579-z.
  6. Elliott J, Bodinier B, Bond TA, Chadeau-Hyam M, Evangelou E, Moons KGM, Dehghan A, Muller DC, Elliott P, Tzoulaki I. Predictive accuracy of a polygenic risk score-enhanced prediction model vs a clinical risk score for coronary artery disease. JAMA. 2020;323(7):636–645. doi: 10.1001/jama.2019.22241.
  7. Escribe C, Lu T, Keller-Baruch J, Forgetta V, Xiao B, Richards JB, Bhatnagar S, Oualkacha K, Greenwood CMT. Block coordinate descent algorithm improves variable selection and estimation in error-in-variables regression. Genet Epidemiol. 2021;45(8):874–890. doi: 10.1002/gepi.22430.
  8. Fligner MA, Killeen TJ. Distribution-free two-sample tests for scale. J Am Stat Assoc. 1976;71(353):210–213.
  9. Franks PW, Pare G. Putting the genome in context: gene-environment interactions in type 2 diabetes. Curr Diab Rep. 2016;16(7):57. doi: 10.1007/s11892-016-0758-y.
  10. Fry A, Littlejohns TJ, Sudlow C, Doherty N, Adamska L, Sprosen T, Collins R, Allen NE. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am J Epidemiol. 2017;186(9):1026–1034. doi: 10.1093/aje/kwx246.
  11. Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA; 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491(7422):56–65. doi: 10.1038/nature11632.
  12. Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR; 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526(7571):68–74. doi: 10.1038/nature15393.
  13. Hill WG, Goddard ME, Visscher PM. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet. 2008;4(2):e1000008.
  14. Hivert V, Sidorenko J, Rohart F, Goddard ME, Yang J, Wray NR, Yengo L, Visscher PM. Estimation of non-additive genetic variance in human complex traits from a large sample of unrelated individuals. Am J Hum Genet. 2021;108(5):962. doi: 10.1016/j.ajhg.2021.04.012.
  15. Hormozdiari F, van de Bunt M, Segre AV, Li X, Joo JWJ, Bilow M, Sul JH, Sankararaman S, Pasaniuc B, Eskin E. Colocalization of GWAS and eQTL signals detects target genes. Am J Hum Genet. 2016;99(6):1245–1260. doi: 10.1016/j.ajhg.2016.10.003.
  16. Huber CD, Durvasula A, Hancock AM, Lohmueller KE. Gene expression drives the evolution of dominance. Nat Commun. 2018;9(1):2750. doi: 10.1038/s41467-018-05281-7.
  17. Inouye M, Abraham G, Nelson CP, Wood AM, Sweeting MJ, Dudbridge F, Lai FY, Kaptoge S, Brozynska M, Wang T, et al.; UK Biobank CardioMetabolic Consortium CHD Working Group. Genomic risk prediction of coronary artery disease in 480,000 adults: implications for primary prevention. J Am Coll Cardiol. 2018;72(16):1883–1893. doi: 10.1016/j.jacc.2018.07.079.
  18. Jiang L, Zheng Z, Qi T, Kemper KE, Wray NR, Visscher PM, Yang J. A resource-efficient tool for mixed model association analysis of large-scale data. Nat Genet. 2019;51(12):1749–1755. doi: 10.1038/s41588-019-0530-8.
  19. Kaur-Knudsen D, Nordestgaard BG, Bojesen SE. CHRNA3 genotype, nicotine dependence, lung function and disease in the general population. Eur Respir J. 2012;40(6):1538–1544. doi: 10.1183/09031936.00176811.
  20. Kerin M, Marchini J. Inferring gene-by-environment interactions with a Bayesian whole-genome regression model. Am J Hum Genet. 2020;107(4):698–713. doi: 10.1016/j.ajhg.2020.08.009.
  21. Khera AV, Chaffin M, Aragam KG, Haas ME, Roselli C, Choi SH, Natarajan P, Lander ES, Lubitz SA, Ellinor PT, et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet. 2018;50(9):1219–1224. doi: 10.1038/s41588-018-0183-z.
  22. Khera AV, Chaffin M, Wade KH, Zahid S, Brancale J, Xia R, Distefano M, Senol-Cosar O, Haas ME, Bick A, et al. Polygenic prediction of weight and obesity trajectories from birth to adulthood. Cell. 2019;177(3):587–596.e9. doi: 10.1016/j.cell.2019.03.028.
  23. Lambert SA, Abraham G, Inouye M. Towards clinical utility of polygenic risk scores. Hum Mol Genet. 2019;28(R2):R133–R142. doi: 10.1093/hmg/ddz187.
  24. Lewis CM, Vassos E. Polygenic risk scores: from research tools to clinical instruments. Genome Med. 2020;12(1):44. doi: 10.1186/s13073-020-00742-5.
  25. Loh PR, Tucker G, Bulik-Sullivan BK, Vilhjalmsson BJ, Finucane HK, Salem RM, Chasman DI, Ridker PM, Neale BM, Berger B, et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat Genet. 2015;47(3):284–290. doi: 10.1038/ng.3190.
  26. Lu T, Forgetta V, Keller-Baruch J, Nethander M, Bennett D, Forest M, Bhatnagar S, Walters RG, Lin K, Chen Z, et al. Improved prediction of fracture risk leveraging a genome-wide polygenic risk score. Genome Med. 2021;13(1):16. doi: 10.1186/s13073-021-00838-6.
  27. Lu T, Forgetta V, Wu H, Perry JRB, Ong KK, Greenwood CMT, Timpson NJ, Manousaki D, Richards JB. A polygenic risk score to predict future adult short stature amongst children. J Clin Endocrinol Metab. 2021;106(7):1918–1928. doi: 10.1210/clinem/dgab215.
  28. Lu T, Forgetta V, Yu OHY, Mokry L, Gregory M, Thanassoulis G, Greenwood CMT, Richards JB. Polygenic risk for coronary heart disease acts through atherosclerosis in type 2 diabetes. Cardiovasc Diabetol. 2020;19(1):12. doi: 10.1186/s12933-020-0988-9.
  29. Lu T, Zhou S, Wu H, Forgetta V, Greenwood CMT, Richards JB. Individuals with common diseases but with a low polygenic risk score could be prioritized for rare variant screening. Genet Med. 2021;23(3):508–515. doi: 10.1038/s41436-020-01007-7.
  30. Lu T, Forgetta V, Richards JB, Greenwood CM. Capturing additional genetic risk from family history for improved polygenic risk prediction. Commun Biol. 2022a;5:595. doi: 10.1038/s42003-022-03532-4.
  31. Lu T, Forgetta V, Richards JB, Greenwood CM. Polygenic risk score as a possible tool for identifying familial monogenic causes of complex diseases. Genet Med. 2022b;24(7):1545–1555.
  32. Martin AR, Gignoux CR, Walters RK, Wojcik GL, Neale BM, Gravel S, Daly MJ, Bustamante CD, Kenny EE. Human demographic history impacts genetic risk prediction across diverse populations. Am J Hum Genet. 2017;100(4):635–649. doi: 10.1016/j.ajhg.2017.03.004.
  33. Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet. 2019;51(4):584–591. doi: 10.1038/s41588-019-0379-x.
  34. McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, Kang HM, Fuchsberger C, Danecek P, Sharp K, et al.; Haplotype Reference Consortium. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet. 2016;48(10):1279–1283. doi: 10.1038/ng.3643.
  35. Miller DT, Lee K, Chung WK, Gordon AS, Herman GE, Klein TE, Stewart DR, Amendola LM, Adelman K, Bale SJ, et al.; ACMG Secondary Findings Working Group. ACMG SF v3.0 list for reporting of secondary findings in clinical exome and genome sequencing: a policy statement of the American College of Medical Genetics and Genomics (ACMG). Genet Med. 2021;23(8):1381–1390. doi: 10.1038/s41436-021-01172-3.
  36. Mostafavi H, Harpak A, Agarwal I, Conley D, Pritchard JK, Przeworski M. Variable prediction accuracy of polygenic scores within an ancestry group. eLife. 2020;9:e48376. doi: 10.7554/eLife.48376.
  37. Nikpay M, Goel A, Won HH, Hall LM, Willenborg C, Kanoni S, Saleheen D, Kyriakou T, Nelson CP, Hopewell JC, et al. A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat Genet. 2015;47(10):1121–1130. doi: 10.1038/ng.3396.
  38. Paré G, Cook NR, Ridker PM, Chasman DI. On the use of variance per genotype as a tool to identify quantitative trait interaction effects: a report from the Women's Genome Health Study. PLoS Genet. 2010;6(6):e1000981.
  39. Rehm HL, Bale SJ, Bayrak-Toydemir P, Berg JS, Brown KK, Deignan JL, Friez MJ, Funke BH, Hegde MR, Lyon E; Working Group of the American College of Medical Genetics and Genomics Laboratory Quality Assurance Committee. ACMG clinical laboratory standards for next-generation sequencing. Genet Med. 2013;15(9):733–747. doi: 10.1038/gim.2013.92.
  40. Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, Grody WW, Hegde M, Lyon E, Spector E, et al.; ACMG Laboratory Quality Assurance Committee. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17(5):405–424. doi: 10.1038/gim.2015.30.
  41. Ritchie SC, Lambert SA, Arnold M, Teo SM, Lim S, Scepanovic P, Marten J, Zahid S, Chaffin M, Liu Y, et al. Integrative analysis of the plasma proteome and polygenic risk of cardiometabolic diseases. Nat Metab. 2021;3(11):1476–1483. doi: 10.1038/s42255-021-00478-5.
  42. Soave D, Sun L. A generalized Levene's scale test for variance heterogeneity in the presence of sample correlation and group uncertainty. Biometrics. 2017;73(3):960–971. doi: 10.1111/biom.12651.
  43. Stevens VL, Bierut LJ, Talbot JT, Wang JC, Sun J, Hinrichs AL, Thun MJ, Goate A, Calle EE. Nicotinic receptor gene variants influence susceptibility to heavy smoking. Cancer Epidemiol Biomarkers Prev. 2008;17(12):3517–3525.
  44. Sulc J, Mounier N, Günther F, Winkler T, Wood AR, Frayling TM, Heid IM, Robinson MR, Kutalik Z. Quantification of the overall contribution of gene-environment interaction for obesity-related traits. Nat Commun. 2020;11(1):1–13.
  45. Varona L, Legarra A, Toro MA, Vitezica ZG. Non-additive effects in genomic selection. Front Genet. 2018;9:78. doi: 10.3389/fgene.2018.00078.
  46. Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, Yang J. 10 years of GWAS discovery: biology, function, and translation. Am J Hum Genet. 2017;101(1):5–22. doi: 10.1016/j.ajhg.2017.06.005.
  47. Wand H, Lambert SA, Tamburro C, Iacocca MA, O'Sullivan JW, Sillari C, Kullo IJ, Rowley R, Dron JS, Brockman D, et al. Improving reporting standards for polygenic scores in risk prediction studies. Nature. 2021;591(7849):211–219. doi: 10.1038/s41586-021-03243-6.
  48. Wang H, Zhang F, Zeng J, Wu Y, Kemper KE, Xue A, Zhang M, Powell JE, Goddard ME, Wray NR, et al. Genotype-by-environment interactions inferred from genetic effects on phenotypic variability in the UK Biobank. Sci Adv. 2019;5(8):eaaw3538. doi: 10.1126/sciadv.aaw3538.
  49. Yang J, Ferreira T, Morris AP, Medland SE; Genetic Investigation of ANthropometric Traits (GIANT) Consortium; DIAbetes Genetics Replication and Meta-analysis (DIAGRAM) Consortium; Madden PAF, Heath AC, Martin NG, Montgomery GW, et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet. 2012;44(4):369–375, S1–3. doi: 10.1038/ng.2213.
  50. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88(1):76–82. doi: 10.1016/j.ajhg.2010.11.011.
  51. Yang J, Loos RJ, Powell JE, Medland SE, Speliotes EK, Chasman DI, Rose LM, Thorleifsson G, Steinthorsdottir V, Magi R, et al. FTO genotype is associated with phenotypic variability of body mass index. Nature. 2012;490(7419):267–272. doi: 10.1038/nature11401.
  52. Yengo L, Sidorenko J, Kemper KE, Zheng Z, Wood AR, Weedon MN, Frayling TM, Hirschhorn J, Yang J, Visscher PM; GIANT Consortium. Meta-analysis of genome-wide association studies for height and body mass index in approximately 700000 individuals of European ancestry. Hum Mol Genet. 2018;27(20):3641–3649. doi: 10.1093/hmg/ddy271.
  53. Young AI, Wauthier FL, Donnelly P. Identifying loci affecting trait variability and detecting interactions in genome-wide association studies. Nat Genet. 2018;50(11):1608–1614. doi: 10.1038/s41588-018-0225-6.
  54. Zhang F, Chen W, Zhu Z, Zhang Q, Nabais MF, Qi T, Deary IJ, Wray NR, Visscher PM, McRae AF, et al. OSCA: a tool for omic-data-based complex trait analysis. Genome Biol. 2019;20(1):107. doi: 10.1186/s13059-019-1718-z.
  55. Zhu Z, Bakshi A, Vinkhuyzen AA, Hemani G, Lee SH, Nolte IM, van Vliet-Ostaptchouk JV, Snieder H; LifeLines Cohort Study; Esko L, et al. Dominance genetic variation contributes little to the missing heritability for human complex traits. Am J Hum Genet. 2015;96(3):377–385. doi: 10.1016/j.ajhg.2015.01.001.
  56. Zhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, Powell JE, Montgomery GW, Goddard ME, Wray NR, Visscher PM, et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet. 2016;48(5):481–487. doi: 10.1038/ng.3538.

Articles from Genetics are provided here courtesy of Oxford University Press
