Significance tests for R² of out-of-sample prediction using polygenic scores

Md Moksedul Momin; Soohyun Lee; Naomi R Wray; S Hong Lee

doi:10.1016/j.ajhg.2023.01.004

2023 Jan 25;110(2):349–358. doi: 10.1016/j.ajhg.2023.01.004


PMCID: PMC9943721 PMID: 36702127

Summary

The coefficient of determination (R²) is a well-established measure to indicate the predictive ability of polygenic scores (PGSs). However, the sampling variance of R² is rarely considered so that 95% confidence intervals (CI) are not usually reported. Moreover, when comparisons are made between PGSs based on different discovery samples, the sampling covariance of R² is required to test the difference between them. Here, we show how to estimate the variance and covariance of R² values to assess the 95% CI and p value of the R² difference. We apply this approach to real data calculating PGSs in 28,880 European participants derived from UK Biobank (UKBB) and Biobank Japan (BBJ) GWAS summary statistics for cholesterol and BMI. We quantify the significantly higher predictive ability of UKBB PGSs compared to BBJ PGSs (p value 7.6e−31 for cholesterol and 1.4e−50 for BMI). A joint model of UKBB and BBJ PGSs significantly improves the predictive ability, compared to a model of UKBB PGS only (p value 3.5e−05 for cholesterol and 1.3e−28 for BMI). We also show that the predictive ability of regulatory SNPs is significantly enriched over non-regulatory SNPs for cholesterol (p value 8.9e−26 for UKBB and 3.8e−17 for BBJ). We suggest that the proposed approach (available in R package r2redux) should be used to test the statistical significance of difference between pairs of PGSs, which may help to draw a correct conclusion about the comparative predictive ability of PGSs.

Keywords: coefficient of determination, R², polygenic scores, PGSs, sampling variance, sampling covariance, significance test, non-nested model comparison, R²-based genomic partitioning

R² is a well-established measure for the reliability of polygenic score models although its significance test is rarely considered in this context. We release an R package r2redux that allows formal statistical comparison of two polygenic score models, providing the 95% confidence interval and significance of R² difference.

Introduction

Complex traits are affected by many risk factors including polygenic effects.¹,²,³ Genetic profile analysis can quantify how polygenic effects are associated with future disease risk at the individual and population levels.⁴,⁵ Genetic profiling has potential benefits that can help people make informed decisions when they manage their health and medical care.⁶,⁷,⁸

Genome-wide association studies (GWASs) have provided an opportunity to estimate genetic profiles, or polygenic scores (PGSs), that represent individual risk predictions from genetic data.⁴,⁹,¹⁰,¹¹,¹²,¹³,¹⁴ Typically, the effects of genome-wide single-nucleotide polymorphisms (SNPs) associated with complex traits are estimated in a discovery dataset and then projected onto an independent target dataset. For each individual in the target sample, the genotypic coefficients weighted by the projected SNP effects (i.e., PGSs) are derived and correlated with the outcome (a trait, including affected/unaffected status for disease) to quantify the prediction accuracy. The squared correlation or coefficient of determination (R²) is a useful measure to quantify the reliability of the PGS. Note that, for a single explanatory variable, R² is equivalent to the squared regression coefficient if the dependent and explanatory variables are column standardized.¹⁵

Previously, we introduced a measure of R² on the liability scale that can be comparable across different models and scales¹⁶ when using disease traits or ascertained case-control data. Choi et al.¹² reported that this R² measure on the liability scale outperforms the widely used Nagelkerke pseudo R² in controlling for bias due to ascertained case-control samples. Nagelkerke pseudo R² estimates depend on the proportion of affected individuals in the sample. In contrast, R² on the liability scale does not depend on the proportion of cases in the sample but does require an estimate of the lifetime population prevalence of the disease.

Wand et al.¹¹ suggested that any PGS study should report R² as an indicator of the predictive ability. Choi et al.¹² concluded that R² is a useful metric to measure association and goodness of fit in the interpretation of PGS predictions. Many studies have demonstrated the predictive ability of PGSs using R².¹²,¹³,¹⁷,¹⁸ However, the variance of R² (Olkin and Finn¹⁵) has rarely been studied, especially in the context of PGSs, although it is the crucial parameter for estimating confidence intervals (CI) of R². Furthermore, estimates of the covariance between a pair of R² values (e.g., from two sets of PGSs) are necessary to assess whether they are significantly different from each other, or whether the ratio of two R² values significantly deviates from the expectation. This significance test for the difference or ratio is important when comparing two or multiple sets of PGSs that are derived from different sets of SNPs, e.g., genomic partitioning, genome-wide association p value thresholds (p_T) analysis, or PGSs based on pathway subsets.¹⁹,²⁰

In this study, we use R² measures and their variance-covariance matrix to assess whether the predictive abilities of PGSs based on different sources are significantly different from each other. We derive the variance and covariance of R² values to generate estimates of its 95% CI and p value of the R² difference, considering two sets of dependent or independent PGSs. We also derive the variance and covariance matrix (i.e., information matrix) of squared regression coefficients in a multiple regression model, testing whether the proportion of the squared regression coefficient attributable to SNPs in the regulatory region is significantly higher than expected (i.e., PGS-based genomic partitioning method). We apply this approach to real data to compare PGSs calculated in 28,880 European individuals using UK Biobank (UKBB) and Biobank Japan (BBJ) GWAS summary statistics for cholesterol and BMI.

Material and methods

We used data from the UK Biobank (https://www.ukbiobank.ac.uk), the scientific protocol of which has been reviewed and approved by the Northwest Multi-center Research Ethics Committee, National Information Governance Board for Health & Social Care, and Community Health Index Advisory Group. UK Biobank has obtained informed consent from all participants. Our access to the UK Biobank data was under the reference number 14575.

Publicly available GWAS summary statistics of Biobank Japan (BBJ)²¹,²² were used, following BBJ’s guidelines (http://jenger.riken.jp/en/result). The research ethics approval of this study has been obtained from the University of South Australia Human Research Ethics Committee.

PGS models

We use a linear model that regresses the observed phenotypes on a single or multiple sets of PGSs. It is assumed that the phenotypes are already adjusted for other non-genetic and environmental factors (e.g., demographic variables, ancestry principal components), and PGSs are already calculated based on GWAS summary statistics.

A PGS model can be written as

y = X β + e

(Equation 1)

where y is the vector of standardized phenotypes of trait, X is a column-standardized N × M matrix including M sets of PGS, β is the vector of regression coefficients of X (i.e., PGS), and e is the vector of residuals. For example, with two sets of PGSs (M = 2), X and $\hat{β}$ can be expressed as

X = [x_{1}, x_{2}]

\hat{β} = \begin{bmatrix} \hat{β}_{1} \\ \hat{β}_{2} \end{bmatrix} = (X^{'} X)^{-1} X^{'} y = Σ_{22}^{-1} Σ_{21},

(Equation 2)

Σ = \begin{bmatrix} Σ_{11} & Σ_{12} \\ Σ_{21} & Σ_{22} \end{bmatrix} = \left[\begin{array}{c|cc} 1 & r_{y, x_{1}} & r_{y, x_{2}} \\ \hline r_{y, x_{1}} & 1 & r_{x_{1}, x_{2}} \\ r_{y, x_{2}} & r_{x_{1}, x_{2}} & 1 \end{array}\right]

(Equation 3)

where $r_{y, x_{1}}$, $r_{y, x_{2}}$, and $r_{x_{1}, x_{2}}$ are the correlations between y and the first PGS (x₁), between y and the second PGS (x₂), and between the two PGSs (x₁ and x₂), respectively, in the sample. Using the $\hat{β}$ estimated in the multiple regression (Equation 2), the predicted phenotypes ( $\hat{y}$ ) can be obtained as

\hat{y} = X \hat{β} .

The coefficient of determination for this multiple regression model with $X = [x_{1}, x_{2}]$ in Equation 1 can be written as

r_{y, (x_{1}, x_{2})}^{2} = 1 - \frac{\sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{N} y_{i}^{2}} = \frac{\sum_{i = 1}^{N} {\hat{y}}_{i}^{2}}{N} = {\hat{β}}_{1}^{2} + {\hat{β}}_{2}^{2} + 2 r_{x_{1}, x_{2}} {\hat{β}}_{1} {\hat{β}}_{2} .

(Equation 4)

With a single set of PGSs, i.e., M = 1 and $X = [x_{1}]$ or $[x_{2}]$ in Equation 1, the expression of R² can be reduced as

r_{y, x_{1}}^{2} = \frac{\sum_{i = 1}^{N} \hat{y}_{i}^{2}}{N} = \hat{β}_{1}^{2} \quad \text{with } X = [x_{1}]

r_{y, x_{2}}^{2} = \frac{\sum_{i = 1}^{N} \hat{y}_{i}^{2}}{N} = \hat{β}_{2}^{2} \quad \text{with } X = [x_{2}] .

It is noted that $r_{y, (x_{1}, x_{2})}^{2}$, $r_{y, x_{1}}^{2}$, or $r_{y, x_{2}}^{2}$ is an estimate of the parameter $ρ_{y, (x_{1}, x_{2})}^{2}$, $ρ_{y, x_{1}}^{2}$, or $ρ_{y, x_{2}}^{2}$, respectively, and each estimate has a sampling variance.
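Equations 2 to 4 can be checked numerically from the three sample correlations alone. The following is a minimal sketch, assuming standardized y, x₁, and x₂; the function name `pgs_regression` and the correlation values are hypothetical and chosen only for illustration.

```python
def pgs_regression(r_y1, r_y2, r_12):
    """Return (beta1, beta2, R2) for standardized y, x1, x2 given their correlations."""
    det = 1.0 - r_12 ** 2                      # determinant of Sigma_22
    beta1 = (r_y1 - r_12 * r_y2) / det         # Equation 2: Sigma_22^{-1} Sigma_21
    beta2 = (r_y2 - r_12 * r_y1) / det
    # Equation 4: coefficient of determination of the joint model
    r2 = beta1 ** 2 + beta2 ** 2 + 2.0 * r_12 * beta1 * beta2
    return beta1, beta2, r2


# Hypothetical correlations, not values from the study
beta1, beta2, r2_joint = pgs_regression(r_y1=0.15, r_y2=0.05, r_12=0.20)
print(beta1, beta2, r2_joint)
```

When the two PGSs are uncorrelated (r_12 = 0), the joint R² reduces to the sum of the two single-PGS R² values, which matches the single-PGS expressions above.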

Variance of R²

The distribution of R² can be transformed to a non-central $χ^{2}$ distribution with mean $= M + λ$ and variance $= 2 (M + 2 λ)$, where $λ = \frac{N \times R^{2}}{(1 - R^{2})^{2}}$ is the non-centrality parameter. For example, the variance of the transformed value for $r_{y, x_{1}}^{2}$ is

\mathrm{var}\left[\left(\frac{\hat{β}_{1}}{\mathrm{sd}(\hat{β}_{1})}\right)^{2}\right] = \frac{1}{\mathrm{var}(\hat{β}_{1})^{2}} \mathrm{var}(\hat{β}_{1}^{2}) = 2 (M + 2 λ) .

Therefore,

\mathrm{var}(r_{y, x_{1}}^{2}) = \mathrm{var}(\hat{β}_{1}^{2}) = 2 \, \mathrm{var}(\hat{β}_{1})^{2} (M + 2 λ)

(Equation 5)

where $\mathrm{var}(\hat{β}_{1}) = \frac{1}{N} (1 - ρ_{y, x_{1}}^{2})^{2}$, M = 1, and $ρ_{y, x_{1}}^{2}$ is the squared correlation in the population and can be approximated as $ρ_{y, x_{1}}^{2} \approx r_{y, x_{1}}^{2}$.²³,²⁴

In a similar manner, Equation 5 can be extended to multiple explanatory variables as

\mathrm{var}(r_{y, (x_{1}, x_{2}, \dots, x_{M})}^{2}) \approx 2 \left[\frac{1}{N} (1 - r_{y, (x_{1}, x_{2}, \dots, x_{M})}^{2})^{2}\right]^{2} (M + 2 λ),

(Equation 6)

that is, Equation 6 is a generalized form of Equation 5.

Wishart²⁵ introduced a formula to obtain the variance of R² (also see Stuart and Ord²⁶ and Olkin and Finn¹⁵) as

\mathrm{var}(R^{2}) = \frac{4 \times R^{2} \times (1 - R^{2})^{2} \times [N - (M + 1)]^{2}}{(N^{2} - 1) \times (N + 3)}

which provides an estimate equivalent to that of Equation 6. Wishart²⁵ derived his formula for the variance of R² from the hypergeometric series, and it has been used in the literature, including Olkin and Finn.¹⁵ Equation 6 is instead derived from the transformation to a non-central $χ^{2}$ distribution. Equation 6 and the Wishart equation provide identical estimates of the variance of R² (Figure S1). The s.e. of the R² estimate is the square root of $\mathrm{var}(R^{2})$.
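The two variance formulas can be compared side by side. The sketch below evaluates Equation 6 and the Wishart formula for the same inputs; the function names are hypothetical, the 1.96 × s.e. interval is a normal approximation assumed here for illustration, and R² = 0.024 is the rounded cholesterol value from the Results rather than the exact estimate.

```python
import math


def var_r2_chisq(r2, n, m=1):
    """Equation 6: variance of R2 via the non-central chi-squared transformation."""
    lam = n * r2 / (1.0 - r2) ** 2             # non-centrality parameter
    var_beta = (1.0 - r2) ** 2 / n
    return 2.0 * var_beta ** 2 * (m + 2.0 * lam)


def var_r2_wishart(r2, n, m=1):
    """Wishart's formula for the variance of R2."""
    num = 4.0 * r2 * (1.0 - r2) ** 2 * (n - (m + 1)) ** 2
    return num / ((n ** 2 - 1.0) * (n + 3.0))


def ci95(r2, var):
    """Normal-approximation 95% CI (an assumption of this sketch)."""
    se = math.sqrt(var)
    return r2 - 1.96 * se, r2 + 1.96 * se


v_chisq = var_r2_chisq(0.024, 28880)
v_wishart = var_r2_wishart(0.024, 28880)
print(v_chisq, v_wishart, ci95(0.024, v_chisq))
```

With a target sample of this size the two estimates agree to well within 1%, so the choice between them has little practical effect on the s.e.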

Variance of the difference between two R² values

Following Olkin and Finn,¹⁵ we use the delta method to estimate the variance of the difference between R² values based on two sets of PGS (x₁ and x₂). Assuming that the difference of R² values can be formulated as a function of the correlations, i.e., $f (r_{y, x_{1}}, r_{y, x_{2}}, r_{x_{1}, x_{2}})$ , the delta method approximates the variance of the difference as

\mathrm{var}(f) = θ^{'} Ω θ

(Equation 7)

where

θ^{'} = \left(\frac{\partial f}{\partial r_{y, x_{1}}}, \frac{\partial f}{\partial r_{y, x_{2}}}, \frac{\partial f}{\partial r_{x_{1}, x_{2}}}\right)

(Equation 8)

is the vector of derivatives of f with respect to the correlations and

Ω = \begin{bmatrix} \mathrm{var}(r_{y, x_{1}}) & \mathrm{cov}(r_{y, x_{1}}, r_{y, x_{2}}) & \mathrm{cov}(r_{y, x_{1}}, r_{x_{1}, x_{2}}) \\ \mathrm{cov}(r_{y, x_{1}}, r_{y, x_{2}}) & \mathrm{var}(r_{y, x_{2}}) & \mathrm{cov}(r_{y, x_{2}}, r_{x_{1}, x_{2}}) \\ \mathrm{cov}(r_{y, x_{1}}, r_{x_{1}, x_{2}}) & \mathrm{cov}(r_{y, x_{2}}, r_{x_{1}, x_{2}}) & \mathrm{var}(r_{x_{1}, x_{2}}) \end{bmatrix}

Each element of $Ω$ is shown in Olkin and Finn¹⁵ (also see Supplemental Note A).

From Equation 7, the following variances of differences can be estimated and used in our PGS analyses.

R² difference when using different discovery samples to generate the PGS

The variance of R² difference can be written as

\mathrm{var}(r_{y, x_{1}}^{2} - r_{y, x_{2}}^{2}) \quad \text{with } f (r_{y, x_{1}}, r_{y, x_{2}}, r_{x_{1}, x_{2}}) = r_{y, x_{1}}^{2} - r_{y, x_{2}}^{2},

(Equation 9)

which allows us to compare two PGS models that are not nested within each other (see "R² difference when using different information sources" in the Results section), for which the conventional log likelihood ratio test cannot be applied.

In Equation 9, the values of $r_{y, x_{1}}^{2} - r_{y, x_{2}}^{2}$ from random samples in the population are normally distributed when the sample size is sufficient.¹⁵ Assuming that our PGS analysis is sufficiently powered (n > 25,000), the p value for the significance test of the difference can be derived from

\frac{(r_{y, x_{1}}^{2} - r_{y, x_{2}}^{2})^{2}}{\mathrm{var}(r_{y, x_{1}}^{2} - r_{y, x_{2}}^{2})} \sim χ_{1}^{2}

and the 95% confidence interval is

\left[(r_{y, x_{1}}^{2} - r_{y, x_{2}}^{2}) - 1.96 \sqrt{\mathrm{var}(r_{y, x_{1}}^{2} - r_{y, x_{2}}^{2})}, \; (r_{y, x_{1}}^{2} - r_{y, x_{2}}^{2}) + 1.96 \sqrt{\mathrm{var}(r_{y, x_{1}}^{2} - r_{y, x_{2}}^{2})}\right]

(Equation 10)

When comparisons are made between R² values based on two sets of PGSs (x₁ and x₂), the sampling covariance of R² is required, which is explicitly used in Equations 7 and 9. If the sampling covariance is ignored, the test statistics can be biased (Figures S2 and S3).
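The test in Equations 9 and 10 can be sketched as follows. For f = r²(y,x₁) − r²(y,x₂) the derivatives in Equation 8 are (2r(y,x₁), −2r(y,x₂), 0), so only three elements of Ω are needed. The variance and covariance expressions used here are the standard large-sample formulas for sample correlations under multivariate normality; they are assumed, not confirmed, to match those in Olkin and Finn¹⁵ and Supplemental Note A. Function names and input values are hypothetical.

```python
import math


def var_corr(r, n):
    """Large-sample variance of a sample correlation (assumed form)."""
    return (1.0 - r ** 2) ** 2 / n


def cov_shared(r_ij, r_ik, r_jk, n):
    """Large-sample cov(r_ij, r_ik) for two correlations sharing variable i (assumed form)."""
    core = 0.5 * (2.0 * r_jk - r_ij * r_ik) * (1.0 - r_ij ** 2 - r_ik ** 2 - r_jk ** 2)
    return (core + r_jk ** 3) / n


def r2_diff_test(r_y1, r_y2, r_12, n):
    """Difference of two dependent R2 values with its variance, p value, and 95% CI."""
    diff = r_y1 ** 2 - r_y2 ** 2
    g1, g2 = 2.0 * r_y1, -2.0 * r_y2           # derivatives of f; the one for r_12 is zero
    var = (g1 ** 2 * var_corr(r_y1, n)
           + g2 ** 2 * var_corr(r_y2, n)
           + 2.0 * g1 * g2 * cov_shared(r_y1, r_y2, r_12, n))
    stat = diff ** 2 / var                     # compared to chi-squared with 1 d.f.
    p_value = math.erfc(math.sqrt(stat / 2.0)) # upper tail of chi-squared, 1 d.f.
    half = 1.96 * math.sqrt(var)
    return diff, var, p_value, (diff - half, diff + half)


# Hypothetical correlations, not values from the study
print(r2_diff_test(r_y1=0.155, r_y2=0.055, r_12=0.20, n=28880))
```

Dropping the covariance term changes the variance whenever the two PGSs are correlated, which is the source of the bias in the test statistics noted above.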

R² difference when using nested models

When using nested models, the variance of R² difference can be written as

\mathrm{var}(r_{y, (x_{1}, x_{2})}^{2} - r_{y, x_{2}}^{2}) \quad \text{with } f (r_{y, x_{1}}, r_{y, x_{2}}, r_{x_{1}, x_{2}}) = r_{y, (x_{1}, x_{2})}^{2} - r_{y, x_{2}}^{2} = \hat{β}_{1}^{2} + \hat{β}_{2}^{2} + 2 r_{x_{1}, x_{2}} \hat{β}_{1} \hat{β}_{2} - r_{y, x_{2}}^{2}

(Equation 11)

where ${\hat{β}}_{1}$ and ${\hat{β}}_{2}$ are the estimated regression coefficients from a multiple regression (Equation 2), calculated from $Σ$ (see Equations 2, 3, and 4). Again, the derivative with respect to each of the correlations can be obtained for this function (Equation 8). Note that the comparison between $r_{y, (x_{1}, x_{2})}^{2}$ and $r_{y, x_{2}}^{2}$ is equivalent to the log likelihood ratio test (i.e., $y = x_{1} β_{1} + x_{2} β_{2} + e$ vs. $y = x_{2} β_{2} + e$ ).¹⁵

The values of $r_{y, (x_{1}, x_{2})}^{2} - r_{y, x_{2}}^{2}$ in Equation 11 from random samples in the population follow a non-central chi-squared distribution with a non-centrality parameter $λ = N \times \frac{r_{y, (x_{1}, x_{2})}^{2} - r_{y, x_{2}}^{2}}{(1 - r_{y, (x_{1}, x_{2})}^{2} - r_{y, x_{2}}^{2})^{2}}$. The p value for the significance test of the difference can be derived from

λ \sim χ_{1}^{2}

and the 95% confidence interval is

\left[(r_{y, (x_{1}, x_{2})}^{2} - r_{y, x_{2}}^{2}) + \sqrt{\mathrm{var}(r_{y, (x_{1}, x_{2})}^{2} - r_{y, x_{2}}^{2})} \, \frac{ξ_{97.5\%} - λ - 1}{\sqrt{2 (1 + 2 λ)}}, \; (r_{y, (x_{1}, x_{2})}^{2} - r_{y, x_{2}}^{2}) + \sqrt{\mathrm{var}(r_{y, (x_{1}, x_{2})}^{2} - r_{y, x_{2}}^{2})} \, \frac{ξ_{2.5\%} - λ - 1}{\sqrt{2 (1 + 2 λ)}}\right]

(Equation 12)

where $ξ_{\%}$ is the value at the given percentile of the inverse of the non-central chi-squared cumulative distribution function with mean = $λ + 1$ and d.f. = 1.

When the sample size is large, the values of $r_{y, (x_{1}, x_{2})}^{2} - r_{y, x_{2}}^{2}$ from random samples in the population are normally distributed,¹⁵ and the 95% confidence interval is

\left[(r_{y, (x_{1}, x_{2})}^{2} - r_{y, x_{2}}^{2}) - 1.96 \sqrt{\mathrm{var}(r_{y, (x_{1}, x_{2})}^{2} - r_{y, x_{2}}^{2})}, \; (r_{y, (x_{1}, x_{2})}^{2} - r_{y, x_{2}}^{2}) + 1.96 \sqrt{\mathrm{var}(r_{y, (x_{1}, x_{2})}^{2} - r_{y, x_{2}}^{2})}\right]

(Equation 13)

Note that Equations 12 and 13 are equivalent when the sample size is sufficient.¹⁵

R² difference when using two independent sets of PGSs

In this case, there is no correlation structure between two independent sets of PGSs ( $r_{x_{1}, x_{2}} = 0$ , e.g., PGSs in male and female individuals), so the variance of R² difference is simply the sum of the variances of each R² value, which can be obtained from Equation 5. For example, assuming $r_{x_{1}, x_{2}} = 0$ , the variance of R² difference can be written as

\mathrm{var}(r_{y_{1}, x_{1}}^{2} - r_{y_{2}, x_{2}}^{2}) = 2 \left[\frac{1}{N_{1}} (1 - r_{y_{1}, x_{1}}^{2})^{2}\right]^{2} (1 + 2 λ_{1}) + 2 \left[\frac{1}{N_{2}} (1 - r_{y_{2}, x_{2}}^{2})^{2}\right]^{2} (1 + 2 λ_{2})

(Equation 14)

where y₁ and y₂ are the vectors of standardized phenotypes and N₁ and N₂ are the sample sizes for the two independent sets of PGSs. The non-centrality parameters ( $λ_{1}$ and $λ_{2}$ ) for two independent PGSs can be written as

λ_{1} = \frac{N_{1} \times r_{y_{1}, x_{1}}^{2}}{(1 - r_{y_{1}, x_{1}}^{2})^{2}} \quad \text{and} \quad λ_{2} = \frac{N_{2} \times r_{y_{2}, x_{2}}^{2}}{(1 - r_{y_{2}, x_{2}}^{2})^{2}} .

The p value for the significance test of the difference can be derived from

\frac{(r_{y_{1}, x_{1}}^{2} - r_{y_{2}, x_{2}}^{2})^{2}}{\mathrm{var}(r_{y_{1}, x_{1}}^{2} - r_{y_{2}, x_{2}}^{2})} \sim χ_{1}^{2}

and the 95% confidence interval¹⁵ is

\left[(r_{y_{1}, x_{1}}^{2} - r_{y_{2}, x_{2}}^{2}) - 1.96 \sqrt{\mathrm{var}(r_{y_{1}, x_{1}}^{2} - r_{y_{2}, x_{2}}^{2})}, \; (r_{y_{1}, x_{1}}^{2} - r_{y_{2}, x_{2}}^{2}) + 1.96 \sqrt{\mathrm{var}(r_{y_{1}, x_{1}}^{2} - r_{y_{2}, x_{2}}^{2})}\right]

(Equation 15)
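For two independent groups, Equations 14 and 15 need only the two R² values and the two sample sizes. A minimal sketch follows; the function names and the male/female inputs are hypothetical and are not results from this study.

```python
import math


def var_r2(r2, n, m=1):
    """Equation 5/6: sampling variance of a single R2 value."""
    lam = n * r2 / (1.0 - r2) ** 2
    return 2.0 * ((1.0 - r2) ** 2 / n) ** 2 * (m + 2.0 * lam)


def r2_diff_independent(r2_a, n_a, r2_b, n_b):
    """Equation 14: with independent samples the variances simply add."""
    diff = r2_a - r2_b
    var = var_r2(r2_a, n_a) + var_r2(r2_b, n_b)
    stat = diff ** 2 / var                     # compared to chi-squared with 1 d.f.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    half = 1.96 * math.sqrt(var)               # Equation 15
    return diff, var, p_value, (diff - half, diff + half)


# Hypothetical values for two independent groups
print(r2_diff_independent(r2_a=0.025, n_a=13000, r2_b=0.023, n_b=15880))
```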

PGS-based genomic partitioning analysis

It is of interest to test whether a set of PGSs based on a genomic region of interest (or a pathway-based SNP subset) can better predict the phenotypes, compared to the rest of the genomic regions. The proportion of the coefficient of determination explained by x₁ can be estimated as $\hat{β}_{1}^{2} / r_{y, (x_{1}, x_{2})}^{2}$ from a multiple regression, $y = x_{1} β_{1} + x_{2} β_{2} + e$, where x₁ is the PGS of a genomic region of interest and x₂ is the PGS of the rest of the genomic regions. The expected proportion of the coefficient of determination explained by x₁ can be calculated from prior information, referred to as p_exp = (number of SNPs used for PGS1) / (total number of SNPs). We are interested in testing whether the value of $\hat{β}_{1}^{2} / r_{y, (x_{1}, x_{2})}^{2}$ is significantly different from its expectation, p_exp, which requires estimating the sampling variance of $\hat{β}_{1}^{2} / r_{y, (x_{1}, x_{2})}^{2}$ using Equation 7. The variance of the proportion can be written as

\mathrm{var}(\hat{β}_{1}^{2} / r_{y, (x_{1}, x_{2})}^{2}) \quad \text{with } f (r_{y, x_{1}}, r_{y, x_{2}}, r_{x_{1}, x_{2}}) = \hat{β}_{1}^{2} / r_{y, (x_{1}, x_{2})}^{2}

(Equation 16)

where $\hat{β}_{1}$ is the estimated regression coefficient of x₁, calculated from $Σ$ (Equation 3), and $r_{y, (x_{1}, x_{2})}^{2} = \hat{β}_{1}^{2} + \hat{β}_{2}^{2} + 2 r_{x_{1}, x_{2}} \hat{β}_{1} \hat{β}_{2}$ is the coefficient of determination. Therefore, it is possible to get the derivative with respect to each of the correlations, $r_{y, x_{1}}$, $r_{y, x_{2}}$, and $r_{x_{1}, x_{2}}$, in Equation 8. This variance can be used to obtain the significance and 95% CI of the observed proportion of the coefficient of determination.

Analogous to Equation 9, the values of $\frac{\hat{β}_{1}^{2}}{r_{y, (x_{1}, x_{2})}^{2}} - p_{\exp}$ with random samples in the population are asymptotically normal.¹⁵ Using a Wald test, the p value for the significance test of the difference can be derived from

\frac{\left(\frac{\hat{β}_{1}^{2}}{r_{y, (x_{1}, x_{2})}^{2}} - p_{\exp}\right)^{2}}{\mathrm{var}\left(\frac{\hat{β}_{1}^{2}}{r_{y, (x_{1}, x_{2})}^{2}} - p_{\exp}\right)} \sim χ_{1}^{2}

The 95% confidence interval of the ratio is

\left[\left(\frac{\hat{β}_{1}^{2}}{r_{y, (x_{1}, x_{2})}^{2}} - p_{\exp}\right) - 1.96 \sqrt{\mathrm{var}\left(\frac{\hat{β}_{1}^{2}}{r_{y, (x_{1}, x_{2})}^{2}} - p_{\exp}\right)}, \; \left(\frac{\hat{β}_{1}^{2}}{r_{y, (x_{1}, x_{2})}^{2}} - p_{\exp}\right) + 1.96 \sqrt{\mathrm{var}\left(\frac{\hat{β}_{1}^{2}}{r_{y, (x_{1}, x_{2})}^{2}} - p_{\exp}\right)}\right]

(Equation 17)

In addition, the package, r2redux, can provide $\mathrm{var}(\hat{β}_{1}^{2})$, $\mathrm{var}(\hat{β}_{2}^{2})$, and $\mathrm{var}(\hat{β}_{1}^{2} - \hat{β}_{2}^{2})$, i.e., the information matrix of the squared regression coefficients (see Supplemental Note B) that is useful when comparing the actual values of $\hat{β}_{1}^{2}$ and $\hat{β}_{2}^{2}$.

It is noted that the delta method employed in this study is a well-established approach to derive the distribution of a function of an asymptotically normal variable.²⁷ Following Olkin and Finn,¹⁵ we used the delta method to derive the variances of R² and their difference as a function of regression coefficients (Equations 7, 8, 9, 11, and 16). We explicitly checked that the regression coefficients are asymptotically normal, using a realistic correlation structure among variables (Figures S4–S6).
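The delta method in Equations 7 and 8 can also be applied with numerical rather than analytical derivatives, which makes it easy to handle both the nested-model difference (Equation 11) and the partitioning proportion (Equation 16). The sketch below is illustrative only: the elements of Ω use the standard large-sample formulas for sample correlations under multivariate normality, assumed to correspond to Supplemental Note A; the finite-difference gradient is a substitute for the analytical derivatives; and all function names and input values are hypothetical. It does not reproduce the r2redux implementation.

```python
import math


def omega(r_y1, r_y2, r_12, n):
    """3x3 large-sample covariance matrix of (r_y1, r_y2, r_12) (assumed form)."""
    def var(r):
        return (1.0 - r * r) ** 2 / n

    def cov(r_ij, r_ik, r_jk):                 # two correlations sharing variable i
        core = 0.5 * (2.0 * r_jk - r_ij * r_ik) * (1.0 - r_ij**2 - r_ik**2 - r_jk**2)
        return (core + r_jk ** 3) / n

    c_12 = cov(r_y1, r_y2, r_12)               # both involve y
    c_13 = cov(r_y1, r_12, r_y2)               # both involve x1
    c_23 = cov(r_y2, r_12, r_y1)               # both involve x2
    return [[var(r_y1), c_12, c_13],
            [c_12, var(r_y2), c_23],
            [c_13, c_23, var(r_12)]]


def betas_r2(r_y1, r_y2, r_12):
    """Equations 2 and 4 for standardized variables."""
    det = 1.0 - r_12 ** 2
    b1 = (r_y1 - r_12 * r_y2) / det
    b2 = (r_y2 - r_12 * r_y1) / det
    return b1, b2, b1 ** 2 + b2 ** 2 + 2.0 * r_12 * b1 * b2


def nested_gain(r_y1, r_y2, r_12):
    """f in Equation 11: joint R2 minus the R2 of x2 alone."""
    return betas_r2(r_y1, r_y2, r_12)[2] - r_y2 ** 2


def proportion(r_y1, r_y2, r_12):
    """f in Equation 16: share of the joint R2 attributed to x1."""
    b1, _, r2 = betas_r2(r_y1, r_y2, r_12)
    return b1 ** 2 / r2


def delta_var(f, r, n, h=1e-6):
    """Equation 7, var(f) = theta' Omega theta, with a central-difference gradient."""
    grad = []
    for k in range(3):
        up, down = list(r), list(r)
        up[k] += h
        down[k] -= h
        grad.append((f(*up) - f(*down)) / (2.0 * h))
    om = omega(*r, n)
    return sum(grad[a] * om[a][b] * grad[b] for a in range(3) for b in range(3))


def wald_p(estimate, expected, var):
    """Two-sided p value from a chi-squared statistic with 1 d.f."""
    return math.erfc(math.sqrt((estimate - expected) ** 2 / var / 2.0))


r, n = (0.155, 0.055, 0.20), 28880            # hypothetical correlations
gain, gain_var = nested_gain(*r), delta_var(nested_gain, r, n)
print(gain - 1.96 * math.sqrt(gain_var), gain + 1.96 * math.sqrt(gain_var))  # Equation 13
prop, prop_var = proportion(*r), delta_var(proportion, r, n)
print(prop, wald_p(prop, 0.04, prop_var))     # p_exp = 0.04 is a hypothetical expectation
```

The same `delta_var` helper accepts any smooth function of the three correlations, so other contrasts can be tested by supplying a different f.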

Data

The UK Biobank is a large-scale biomedical database that comprises 0.5 million individuals who were recruited between 2006 and 2010; their ages ranged between 40 and 69 years.²⁸,²⁹ The data consist of health-related information for samples that are genotyped for genome-wide SNPs. A stringent quality control (QC) process was applied to the UKBB data, excluding individuals with non-white British ancestry, a mismatch between reported sex and sex inferred from genotypic information, genotype call rate < 0.95, or putative sex chromosome aneuploidy. The SNP QC criteria filtered out SNPs with an imputation reliability < 0.6, missingness > 0.05, minor allele frequency (MAF) < 0.01, or Hardy-Weinberg equilibrium p value < 10⁻⁷. We also applied a relatedness cut-off QC (> 0.05) so that there was no high pairwise relatedness among individuals. After QC, 288,792 individuals and 7,701,772 SNPs were retained.

Discovery GWAS data

Ninety percent of the 288,792 QCed individuals were randomly selected as discovery samples (n = 259,912) to generate GWAS summary statistics (UKBB hereafter) for the 7,701,772 SNPs. For the GWAS with the 259,912 UKBB discovery samples, we used BMI and cholesterol that were adjusted for age, sex, birth year, Townsend Deprivation Index (TDI), education, genotype measurement batch, assessment center, and the first 10 ancestry principal components using a linear regression.

We also used Biobank Japan (BBJ) (http://jenger.riken.jp/en/result) GWAS summary statistics (BBJ hereafter) for BMI²¹ (n = 158,284) and cholesterol²² (n = 128,305) for 5,961,601 SNPs.

Target data

Ten percent of the individuals from the 288,792 QCed individuals were randomly selected as an independent target dataset (n = 28,880) that were non-overlapping and unrelated with the UKBB and BBJ discovery samples. In the PGS analyses, we used only 4,113,630 SNPs that were common between UKBB and BBJ GWAS data after excluding ambiguous SNPs and SNPs with any strand issue.

In the target dataset (n = 28,880), the phenotypes of each trait were adjusted for age, sex, birth year, TDI, education, genotype batch, assessment center, and the first 10 principal components using a linear regression. The pre-adjusted phenotypes were correlated with PGSs estimated in the following step. For each trait, we used the UKBB and BBJ GWAS summary statistics to estimate two sets of PGSs (UKBB PGSs vs. BBJ PGSs) for the 28,880 target individuals, using PLINK2 (https://www.cog-genomics.org/plink/2.0/) with the score function.³⁰ Then, we estimated the correlation between the PGS and pre-adjusted phenotypes to obtain R² values in the PGS analyses.

Functional annotation of the genome

We annotated the genome using pre-defined functional categories (regulatory vs. non-regulatory genomic regions).³¹ The regulatory region includes SNPs from coding regions, untranslated regions (UTRs), and promoters. The non-regulatory region includes all other regions. The numbers of SNPs belonging to the regulatory and non-regulatory regions are 158,653 and 3,954,947, respectively (i.e., 4% of the total SNPs are located in the regulatory region).

Simulation of dependent and explanatory variables

For a quantitative trait, we simulated the dependent variable (y) and PGSs (x₁ and x₂), varying the correlation structure $\begin{bmatrix} 1 & r_{y, x_{1}} & r_{y, x_{2}} \\ r_{y, x_{1}} & 1 & r_{x_{1}, x_{2}} \\ r_{y, x_{2}} & r_{x_{1}, x_{2}} & 1 \end{bmatrix}$ and the sample size (detailed simulation parameters are shown in Figures S7–S15). For a disease trait, the same simulation procedure was used, and the simulated quantitative phenotypes were transformed to binary responses using a liability threshold model with a population prevalence of k = 0.05. That is, case-control status was assigned to individuals according to their standardized quantitative phenotypes (i.e., liability): cases have a liability greater than a threshold chosen such that the proportion of cases is k = 0.05. The empirical variances of $r_{y, x_{1}}^{2}$, $r_{y, x_{1}}^{2} - r_{y, x_{2}}^{2}$, $r_{y, (x_{1}, x_{2})}^{2} - r_{y, x_{2}}^{2}$, and $\frac{\hat{β}_{1}^{2}}{r_{y, (x_{1}, x_{2})}^{2}} - p_{\exp}$ were obtained over 10,000 replicates, which were compared to the theoretical variances estimated using Equations 6, 9, 11, and 17, respectively.
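The comparison of empirical and theoretical variances can be illustrated on a small scale for the simplest case, a quantitative trait with a single PGS. The sketch below is not the simulation pipeline used in this study: the function names, sample size, population R², replicate count, and seed are all hypothetical, and far fewer replicates are used than the 10,000 described above to keep the run short.

```python
import math
import random


def simulate_r2(n, rho, rng):
    """Draw n pairs (x, y) with population correlation rho and return the sample r^2."""
    sx = sy = sxx = syy = sxy = 0.0
    resid_sd = math.sqrt(1.0 - rho * rho)
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + resid_sd * rng.gauss(0.0, 1.0)
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    cov = sxy - sx * sy / n
    return cov * cov / ((sxx - sx * sx / n) * (syy - sy * sy / n))


def empirical_vs_theory(n=300, rho2=0.1, reps=1000, seed=1):
    """Empirical variance of r^2 over replicates vs. Equation 6 with M = 1."""
    rng = random.Random(seed)
    vals = [simulate_r2(n, math.sqrt(rho2), rng) for _ in range(reps)]
    mean = sum(vals) / reps
    empirical = sum((v - mean) ** 2 for v in vals) / (reps - 1)
    lam = n * rho2 / (1.0 - rho2) ** 2
    theoretical = 2.0 * ((1.0 - rho2) ** 2 / n) ** 2 * (1.0 + 2.0 * lam)
    return empirical, theoretical


print(empirical_vs_theory())
```

The disease-trait case would additionally dichotomize y at the liability threshold for k = 0.05 before computing R², which this sketch omits.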

Results

Simulation verification

The theory of the proposed method has been explicitly verified using simulations, varying the sample size and the values of $r_{y, x_{1}}^{2}$, $r_{y, x_{1}}^{2} - r_{y, x_{2}}^{2}$, $r_{y, (x_{1}, x_{2})}^{2} - r_{y, x_{2}}^{2}$, and $\frac{\hat{β}_{1}^{2}}{r_{y, (x_{1}, x_{2})}^{2}} - p_{\exp}$ (Figures S7–S15). The empirical variances obtained from 10,000 simulated replicates are almost perfectly correlated with the theoretical variances for the values of $r_{y, x_{1}}^{2}$, $r_{y, x_{1}}^{2} - r_{y, x_{2}}^{2}$, $r_{y, (x_{1}, x_{2})}^{2} - r_{y, x_{2}}^{2}$, and $\frac{\hat{β}_{1}^{2}}{r_{y, (x_{1}, x_{2})}^{2}} - p_{\exp}$ when varying the sample size (Figures S7–S10) and when varying R² values (Figures S11–S14). When considering two independent PGSs, the theoretical and empirical variances also agree well (Figure S15).

R² difference when using different information sources: UKBB vs. BBJ

It is of interest to determine whether different information sources (e.g., ancestries) have significantly different predictive abilities in PGS analyses, which can be assessed using Equations 9 and 10. Figure 1 illustrates that when predicting the 28,880 European target samples, the coefficients of determination (R²) with the UKBB and BBJ PGSs were 0.024 (95% CI = 0.021–0.028) and 0.003 (95% CI = 0.002–0.004), respectively, for cholesterol. However, these R² values and CIs cannot be used to assess their difference because the two sets of PGSs are not independent. Furthermore, the two PGS models with UKBB and BBJ are not nested within each other, so the likelihood ratio test cannot be used either. To address this, we used Equations 9 and 10 to obtain the variance, 95% CI (0.0175–0.0247), and p value (7.6e−31) of the R² difference, accounting for the dependency between UKBB and BBJ PGSs, for cholesterol (Figure 1). Similarly, the test statistics of the R² difference were obtained for BMI: 95% CI = 0.035–0.046 and p value = 1.4e−50 (Figure 1).

Figure 1. The predictive ability (R²) of PGSs when predicting 28,880 European individuals using UKBB or BBJ discovery GWAS dataset

(A) The main bars represent R² values and error bars correspond to 95% confidence intervals. Two sets of GWAS summary statistics were obtained from UKBB and BBJ discovery GWAS datasets to estimate two sets of PGSs.

(B) Dot points represent the differences of R² values between UKBB and BBJ PGS models, and error bars indicate 95% confidence intervals of the difference.

It is also of interest to test whether BBJ PGSs provide a significant improvement in the predictive ability, in addition to UKBB PGSs, when predicting the 28,880 European target samples. Figure 2 compares the R² value of each of the UKBB and BBJ PGSs to the R² value from a joint model fitting UKBB and BBJ PGSs simultaneously. Using Equations 11 and 12, we obtained the variance, 95% CI (0.0001–0.001), and p value (3.5e−05) of the R² difference when comparing the joint model with a single model with UKBB, indicating that BBJ PGSs contributed a significant improvement for cholesterol. Similarly, BBJ PGSs improved the predictive ability significantly (p value = 1.3e−28) for BMI. As expected, excluding UKBB PGSs from the joint model substantially decreased the prediction accuracy (p value = 1.6e−136 for cholesterol and 3.0e−308 for BMI).

Figure 2. The predictive ability (R²) of a PGS model based on UKBB or BBJ discovery dataset, compared to the joint model of both UKBB and BBJ when predicting 28,880 European individuals

(A) The main bars represent R² values and error bars correspond to 95% confidence intervals. Two sets of GWAS summary statistics were obtained from UKBB and BBJ discovery GWAS datasets to estimate two sets of PGSs, i.e., UKBB and BBJ PGSs. In addition, a joint model fitting both UKBB and BBJ PGSs was compared.

(B) Dot points represent the differences of R² values between the joint model and UKBB or BBJ PGS model, and error bars indicate 95% confidence intervals of the difference.

R² difference when using two independent sets of PGSs: male vs. female

We were also interested in testing whether the PGSs could predict the adjusted phenotypes of the target individuals equally well for males and females. In this case, there is no correlation structure between male and female PGSs, so the variance of R² difference is simply the sum of the variances of each R² value, which can be obtained from Equation 5 or 6. Figure S16 shows that there was no significant difference between male and female PGSs in their predictive ability for cholesterol and BMI whether using UKBB or BBJ discovery GWAS dataset.

PGSs with genome-wide association p value thresholds (p_T)

PGSs have also been widely used to determine which p_T provides the highest prediction accuracy, for example, using PGS software such as PLINK.³⁰,³² However, there is a lack of test statistics that can assess whether the predictive ability of the best-performing p_T is significantly different from that of the other p_T. Figure 3A illustrates that the R² value is the highest at p_T = 0.3 when predicting 28,880 European individuals in the target dataset, using the BBJ discovery GWAS dataset (BMI). However, it is not clear whether the predictive ability at p_T = 0.3 is significantly higher than at the adjacent p_T (e.g., p_T = 0.2 or 0.4), and it may be important to report the p_T values whose predictive ability is not statistically different from that of the best-performing p_T. Using Equations 9 and 10, we assessed the significance of the difference between the best-performing p_T and each of the other p_T (Figure 3B). From this analysis, we found that the best-performing p_T was not significantly different from p_T ranging between 0.1 and 1, but significantly different from p_T < 0.05 (Figure 3B). When using the UKBB discovery GWAS dataset to predict the 28,880 European individuals, the highest R² value at the p_T of 1 was significantly different from all the other p_T (Figure S17B).

Figure 3. The predictive ability (R²) of PGSs estimated based on SNPs below p_T when predicting BMI in 28,880 European samples using BBJ discovery samples (GWAS summary statistics)

(A) The main bars represent R² values and error bars correspond to 95% confidence intervals. The values above 95% CIs are p values indicating that R² values are not different from zero.

(B) The main bars represent the difference of R² values between the corresponding p_T and the best-performing p_T and error bars indicate 95% confidence intervals. The values above 95% CIs are p values indicating the significance of the difference between the pairs of R² values.

Interestingly, the highest R² value was found at p_T = 1e−04 (Figure 4A) when predicting the European target samples using the BBJ discovery GWAS dataset for cholesterol; it was not statistically different from the value at p_T = 0.001 but was significantly higher than those at the other p_T (Figure 4B). For the same target samples and trait, the best R² value was obtained at p_T = 0.01 when using the UKBB discovery GWAS dataset (Figure S18A). Except for p_T = 0.01, 0.05, and 0.1, R² values at the other p_T were significantly different from the best R² value (Figure S18B).

Figure 4. The predictive ability (R²) of PGSs estimated based on SNPs below p_T when predicting cholesterol in 28,880 European samples using BBJ discovery samples (GWAS summary statistics)

(A) The main bars represent R² values and error bars correspond to 95% confidence intervals. The values above the 95% CIs are p values for the test of whether each R² value differs from zero.

(B) The main bars represent the difference in R² values between the corresponding p_T and the best-performing p_T, and error bars indicate 95% confidence intervals. The values above the 95% CIs are p values indicating the significance of the difference between the pairs of R² values.

PGS-based genomic partitioning analyses

Genomic partitioning analyses have been widely applied.³¹,³³,³⁴,³⁵ Such analyses could also be useful in the PGS context. Using Equation 16, we can estimate the variance of $\frac{\hat{\beta}_{\mathrm{regu}}^{2}}{R^{2}}$, where $\hat{\beta}_{\mathrm{regu}}$ is the estimated regression coefficient from a multiple regression (Equation 2), and assess whether the observed proportion ($\frac{\hat{\beta}_{\mathrm{regu}}^{2}}{R^{2}}$) is significantly different from p_exp (i.e., the proportion of SNPs belonging to the category). For example, we partitioned the genome-wide SNPs into regulatory (158,653 SNPs) and non-regulatory (3,954,947 SNPs) regions, following Gusev et al.,³¹ which gives p_exp = 4% (the SNP coverage of the regulatory region) as the expectation. We simultaneously fit the two sets of PGSs from the regulatory and non-regulatory regions in a multiple regression to obtain $\hat{\beta}_{\mathrm{regu}}^{2}$ and $\hat{\beta}_{\mathrm{non\text{-}regu}}^{2}$, and then assess whether the value of $\frac{\hat{\beta}_{\mathrm{regu}}^{2}}{R^{2}} - p_{\exp}$ is significantly different from zero (Equation 17). Figure 5 shows that the predictive ability of regulatory SNPs was significantly higher than the expectation (p value = 8.9e−26 for UKBB and 3.8e−17 for BBJ) for cholesterol. In contrast, the predictive ability of regulatory SNPs was not better than the expectation for BMI (Figure 5).
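The quantities in this test can be illustrated with a small sketch. It computes p_exp from the SNP counts given above and the observed proportion from standardized coefficients of a two-predictor regression. Equation 16 is not reproduced here, so the uncertainty is approximated with a nonparametric bootstrap instead, which is a substitute technique and not the paper's analytical variance. The phenotype and PGSs are simulated with made-up effect sizes.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def regu_proportion(y, pgs_regu, pgs_non):
    """Squared standardized coefficient of the regulatory PGS over the joint R^2."""
    r1, r2, r12 = pearson(y, pgs_regu), pearson(y, pgs_non), pearson(pgs_regu, pgs_non)
    b_regu = (r1 - r2 * r12) / (1.0 - r12 ** 2)
    b_non = (r2 - r1 * r12) / (1.0 - r12 ** 2)
    r2_joint = b_regu * r1 + b_non * r2
    return b_regu ** 2 / r2_joint


# Expected proportion from the SNP counts reported in the text.
p_exp = 158653 / (158653 + 3954947)

# Simulated (hypothetical) data: effect sizes 0.3 and 0.15 are made up.
rng = random.Random(1)
n = 2000
pgs_regu = [rng.gauss(0.0, 1.0) for _ in range(n)]
pgs_non = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [0.3 * a + 0.15 * b + rng.gauss(0.0, 1.0) for a, b in zip(pgs_regu, pgs_non)]

observed = regu_proportion(y, pgs_regu, pgs_non)

# Bootstrap over individuals (stand-in for the analytical variance).
boot = []
for _ in range(200):
    idx = [rng.randrange(n) for _ in range(n)]
    boot.append(regu_proportion([y[i] for i in idx],
                                [pgs_regu[i] for i in idx],
                                [pgs_non[i] for i in idx]))
boot.sort()
ci = (boot[4], boot[194])  # approximate 2.5th and 97.5th percentiles of 200 draws

print(round(p_exp, 4), observed - p_exp, (ci[0] - p_exp, ci[1] - p_exp))
```

Enrichment is indicated when the interval for the observed proportion minus p_exp lies entirely above zero.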

Figure 5. PGS-based genomic partitioning method to assess whether the predictive ability is enriched in the regulatory region for cholesterol and BMI

Here p_exp = 0.04 is the expectation for the regulatory SNPs based on the proportion of SNPs allocated to this annotation.

(A) The main bars represent squared regression coefficients attributable to SNPs in the regulatory region ($\hat{\beta}_{\mathrm{regu}}^{2}$) and non-regulatory region ($\hat{\beta}_{\mathrm{non\text{-}regu}}^{2}$), and error bars correspond to 95% confidence intervals when predicting 28,880 European samples using UKBB or BBJ GWAS summary statistics.

(B) Dot points represent the difference between the observed and expected proportions ($\frac{\hat{\beta}_{\mathrm{regu}}^{2}}{R^{2}} - p_{\exp}$) and error bars indicate 95% confidence intervals of the difference.

Application to binary responses and ascertained case-control data

The proposed method was also verified by simulation for binary or case-control data, varying the sample size and the values of $r_{y,x_{1}}^{2}$, $r_{y,x_{1}}^{2} - r_{y,x_{2}}^{2}$, $r_{y,(x_{1},x_{2})}^{2} - r_{y,x_{2}}^{2}$, and $\frac{\hat{\beta}_{1}^{2}}{R^{2}} - p_{\exp}$ (Figures S19–S26). The empirical variances obtained from 10,000 simulated replicates are almost identical to the theoretical variances of $r_{y,x_{1}}^{2}$, $r_{y,x_{1}}^{2} - r_{y,x_{2}}^{2}$, $r_{y,(x_{1},x_{2})}^{2} - r_{y,x_{2}}^{2}$, and $\frac{\hat{\beta}_{1}^{2}}{R^{2}} - p_{\exp}$ when varying the sample size (Figures S19–S22) and when varying the R² values (Figures S23–S26). For ascertained case-control data, a similar pattern is seen, i.e., the empirical variances obtained from 10,000 simulated replicates agree well with the theoretical variances (Figures S27–S30). This finding shows that the proposed method can be applied to test the significance of the difference between the predictive abilities of PGSs for binary traits and ascertained case-control traits when R² is not very high (<0.1). Note that the empirical and theoretical variances diverge when R² values on the observed scale exceed 0.1 for binary responses and ascertained case-control data (Figures S31 and S32). Although R² values > 0.1 are not frequently observed in current PGS studies (Table S2), the variance of such high R² values requires careful interpretation, and we would not recommend using the theoretical approximation in that case.
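A scaled-down version of this kind of check can be sketched as follows. It simulates a binary trait from a liability threshold model, estimates R² on the observed scale in each replicate, and compares the empirical variance with the approximation 4R²(1 − R²)²/N. Everything here is an assumption made for illustration: the liability variance explained (0.08), the 50% prevalence without ascertainment, and the use of 300 replicates of 500 individuals instead of the 10,000 replicates used in the paper.

```python
import math
import random
import statistics


def squared_correlation(x, y):
    """Squared Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def simulate_binary_r2(n, h2_liability, replicates, seed):
    """Observed-scale R^2 of a PGS for a binary trait, one value per replicate."""
    rng = random.Random(seed)
    a = math.sqrt(h2_liability)
    b = math.sqrt(1.0 - h2_liability)
    estimates = []
    for _ in range(replicates):
        pgs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Case if liability exceeds 0, i.e. 50% prevalence (assumed).
        status = [1.0 if a * g + b * rng.gauss(0.0, 1.0) > 0.0 else 0.0 for g in pgs]
        estimates.append(squared_correlation(pgs, status))
    return estimates


n = 500
estimates = simulate_binary_r2(n, h2_liability=0.08, replicates=300, seed=7)
mean_r2 = statistics.mean(estimates)
empirical_var = statistics.variance(estimates)
theoretical_var = 4.0 * mean_r2 * (1.0 - mean_r2) ** 2 / n
ratio = empirical_var / theoretical_var
print(mean_r2, empirical_var, theoretical_var, ratio)
```

A ratio close to 1 indicates agreement between the empirical and theoretical variances. The sketch deliberately stays in the low-R² range (about 0.05 on the observed scale), where the text reports that the approximation holds.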

Discussion

R² has been widely used to measure the predictive ability of PGSs.¹³ However, the confidence interval of R² has rarely been reported, and the test statistic for the difference of two R² values has not been well documented. Here, we show how to get the variance of each estimated R² value and covariance between two R² estimates (from two sets of PGSs) that can be used to assess whether they are significantly different from each other.

Martin et al.¹⁸ reported that the PGS prediction accuracy is higher when discovery and target samples are from the same ancestry background, compared to when the samples are from different ancestries. However, they did not formally assess the statistical significance of the increase (no p value provided). More importantly, they did not consider the correlation structure between predictors when they compared two PGSs. We applied the proposed approach and found that the predictive ability of PGSs based on UKBB discovery GWASs is significantly higher than that of PGSs based on BBJ discovery GWASs, by formally deriving the 95% CI and p value of the R² difference.

Many studies evaluating PGSs use the p_T method¹² and report the p_T that maximizes performance. This provides useful information when inferring the genetic architecture of the trait of interest and when fine-tuning p_T as a hyper-parameter in PGS methods.³²,³⁶,³⁷,³⁸ In such cases, it may be crucial to determine whether the best-performing p_T is genuinely better than other (adjacent) p_T or whether it arises by chance (i.e., sampling error). For example, in Figure 3, the best-performing p_T is 0.3 (the set of SNPs with p_T ≤ 0.3), which is, however, not statistically different from p_T ≤ 0.2 or ≤ 0.1. Note that the set of SNPs with p_T ≤ 0.1 is nested within the set with p_T ≤ 0.3, meaning that the additional SNPs in the latter do not significantly improve the prediction accuracy. Therefore, p_T ≤ 0.1 should be used instead of p_T ≤ 0.3, as the former is the more parsimonious model. Our proposed approach can formally assess statistical differences among p_T, providing the 95% CI of each difference together with its p value.
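The parsimony rule described above can be written as a short selection step. The function name and the p values below are hypothetical; the values only mimic the qualitative pattern described for Figure 3 and are not taken from the paper.

```python
def most_parsimonious_threshold(diff_p_values, alpha=0.05):
    """Return the smallest p_T whose R^2 is not significantly different from the best.

    diff_p_values maps each p_T to the p value of its R^2 difference from the
    best-performing p_T; the best-performing p_T itself maps to 1.0.
    """
    candidates = [p_t for p_t, p in diff_p_values.items() if p >= alpha]
    return min(candidates)


# Hypothetical p values for the difference from the best-performing p_T (0.3).
hypothetical = {
    1e-4: 1e-20, 0.001: 1e-12, 0.01: 1e-6, 0.05: 0.01,
    0.1: 0.21, 0.2: 0.55, 0.3: 1.0, 0.4: 0.80, 0.5: 0.60, 1.0: 0.30,
}
chosen = most_parsimonious_threshold(hypothetical)
print(chosen)
```

Because the SNP sets are nested, the smallest qualifying p_T uses the fewest SNPs while giving a predictive ability that cannot be distinguished from the best one.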

We also derived an information matrix of squared regression coefficients in a multiple regression model, establishing a PGS-based genomic partitioning method that can test whether the ratio of two squared regression coefficients deviates significantly from its expectation given the proportion of SNPs allocated to each partition. This is analogous to the existing genomic partitioning approaches using GREML or LDSC,³¹,³³,³⁴,³⁵ which may have an overfitting issue because SNP effects and genomic partitioning are estimated in the same samples.

In conclusion, we show how to estimate the variance and covariance of R² estimates to quantify the 95% CI and p value of the difference and ratio when comparing two PGSs. The approach is available in the R package r2redux (see Supplemental Note B). We suggest that the proposed approach should be used to test the statistical significance of the difference and ratio between pairs of PGSs, which may help to draw correct conclusions about the predictive ability of PGSs.

Acknowledgments

This research is supported by the Australian Research Council (DP190100766). The R package development is supported by the Cooperative Research Program for Agriculture Science and Technology Development (PJ01609901) from the Rural Development Administration, Republic of Korea. We thank the staff and participants of the UK Biobank and Biobank Japan for their important contributions. Our reference number approved by UK Biobank is 14575. The UK Biobank is funded by the UK Department of Health, the Medical Research Council, the Scottish Executive, and the Wellcome Trust medical research charity. The analyses were performed using computational resources provided by the Australian Government through Gadi under the National Computational Merit Allocation Scheme (NCMAS), and HPCs (Tango and Statgen server) managed by UniSA IT. N.R.W. acknowledges funding from the NHMRC (1113400, 1173790), Australia.

Author contributions

S.H.L. and N.R.W. conceived the idea. S.H.L. derived theory and supervised the study. M.M.M. performed the analysis. M.M.M. and S.H.L. verified the theory and analytical methods, and made the R package, with support from S.L. S.H.L. and M.M.M. wrote the first draft of the manuscript. N.R.W. and S.L. provided critical feedback and suggestions. All the authors discussed the results and contributed to the final manuscript.

Declaration of interests

The authors declare that they have no competing interests.

Published: January 25, 2023

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2023.01.004.

Contributor Information

Md. Moksedul Momin, Email: cvasu.momin@gmail.com.

S. Hong Lee, Email: hong.lee@unisa.edu.au.

Supplemental information

Document S1. Figures S1–S35, Tables S1 and S2, and supplemental notes

mmc1.pdf (2.9 MB, PDF)

Document S2. Article plus supplemental information

mmc2.pdf (3.9 MB, PDF)

Data and code availability

The genotype and phenotype data of the UK Biobank can be accessed through procedures described on its webpage (https://www.ukbiobank.ac.uk/), and summary statistics of BMI and cholesterol from Biobank Japan (BBJ) can be obtained from its website (http://jenger.riken.jp/en/). r2redux can be downloaded from GitHub (https://github.com/mommy003/r2redux) or installed from CRAN with install.packages("r2redux") in R (also see Supplemental Note B).

References

1. Plomin R., Haworth C.M.A., Davis O.S.P. Common disorders are quantitative traits. Nat. Rev. Genet. 2009;10:872–878. doi: 10.1038/nrg2670.
2. Schork N.J. Genetics of complex disease: approaches, problems, and solutions. Am. J. Respir. Crit. Care Med. 1997;156:S103–S109. doi: 10.1164/ajrccm.156.4.12-tac-5.
3. Gibson G. Decanalization and the origin of complex disease. Nat. Rev. Genet. 2009;10:134–140. doi: 10.1038/nrg2502.
4. Khera A.V., Chaffin M., Aragam K.G., Haas M.E., Roselli C., Choi S.H., Natarajan P., Lander E.S., Lubitz S.A., Ellinor P.T., Kathiresan S. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 2018;50:1219–1224. doi: 10.1038/s41588-018-0183-z.
5. Ding Y., Hou K., Burch K.S., Lapinska S., Privé F., Vilhjálmsson B., Sankararaman S., Pasaniuc B. Large uncertainty in individual polygenic risk score estimation impacts PRS-based risk stratification. Nat. Genet. 2022;54:30–39. doi: 10.1038/s41588-021-00961-5.
6. Bilkey G.A., Burns B.L., Coles E.P., Bowman F.L., Beilby J.P., Pachter N.S., Baynam G., JS Dawkins H., Nowak K.J., Weeramanthri T.S. Genomic testing for human health and disease across the life cycle: applications and ethical, legal, and social challenges. Front. Public Health. 2019;7:40. doi: 10.3389/fpubh.2019.00040.
7. Allyse M.A., Robinson D.H., Ferber M.J., Sharp R.R. Vol. 1. Elsevier; 2018. pp. 113–120. (Direct-to-consumer Testing 2.0: Emerging Models of Direct-To-Consumer Genetic Testing).
8. Frerichs F.C.P., Dingemans K.P., Brinkman K. Cardiomyopathy with mitochondrial damage associated with nucleoside reverse-transcriptase inhibitors. N. Engl. J. Med. 2002;347:1895–1896. doi: 10.1056/NEJM200212053472320.
9. Wray N.R., Goddard M.E., Visscher P.M. Prediction of individual genetic risk to disease from genome-wide association studies. Genome Res. 2007;17:1520–1528. doi: 10.1101/gr.6665407.
10. Wray N.R., Yang J., Goddard M.E., Visscher P.M. The genetic interpretation of area under the ROC curve in genomic profiling. PLoS Genet. 2010;6:e1000864. doi: 10.1371/journal.pgen.1000864.
11. Wand H., Lambert S.A., Tamburro C., Iacocca M.A., O’Sullivan J.W., Sillari C., Kullo I.J., Rowley R., Dron J.S., Brockman D., et al. Improving reporting standards for polygenic scores in risk prediction studies. Nature. 2021;591:211–219. doi: 10.1038/s41586-021-03243-6.
12. Choi S.W., Mak T.S.H., O’Reilly P.F. A guide to performing Polygenic Risk Score analyses. Nat. Protoc. 2020;15:2759–2772. doi: 10.1038/s41596-020-0353-1.
13. Lewis C.M., Vassos E. Polygenic risk scores: from research tools to clinical instruments. Genome Med. 2020;12:44. doi: 10.1186/s13073-020-00742-5.
14. Purcell S.M., Wray N.R., Stone J.L., Visscher P.M., O'Donovan M.C., Sullivan P.F., Sklar P., Consortium I.S. Common polygenic variation contributes to risk of schizophrenia that overlaps with bipolar disorder. Nature. 2009;460:748. doi: 10.1038/nature08185.
15. Olkin I., Finn J.D. Correlations redux. Psychol. Bull. 1995;118:155–164.
16. Lee S.H., Goddard M.E., Wray N.R., Visscher P.M. A better coefficient of determination for genetic profile analysis. Genet. Epidemiol. 2012;36:214–224. doi: 10.1002/gepi.21614.
17. So H.-C., Sham P.C. Exploring the predictive power of polygenic scores derived from genome-wide association studies: a study of 10 complex traits. Bioinformatics. 2017;33:886–892. doi: 10.1093/bioinformatics/btw745.
18. Martin A.R., Kanai M., Kamatani Y., Okada Y., Neale B.M., Daly M.J. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 2019;51:584–591. doi: 10.1038/s41588-019-0379-x.
19. Choi S., Garcia-Gonzalez J., Ruan Y., Wu H., Johnson J., Hoggart C., O’Reilly P. The power of pathway-based polygenic risk scores. Research Square. 2021. doi: 10.21203/rs.3.rs-643696/v1.
20. Li J., Chaudhary D.P., Khan A., Griessenauer C., Carey D.J., Zand R., Abedi V. Polygenic risk scores augment stroke subtyping. Neurol. Genet. 2021;7:e560. doi: 10.1212/NXG.0000000000000560.
21. Akiyama M., Okada Y., Kanai M., Takahashi A., Momozawa Y., Ikeda M., Iwata N., Ikegawa S., Hirata M., Matsuda K., et al. Genome-wide association study identifies 112 new loci for body mass index in the Japanese population. Nat. Genet. 2017;49:1458–1467. doi: 10.1038/ng.3951.
22. Kanai M., Akiyama M., Takahashi A., Matoba N., Momozawa Y., Ikeda M., Iwata N., Ikegawa S., Hirata M., Matsuda K., et al. Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases. Nat. Genet. 2018;50:390–400. doi: 10.1038/s41588-018-0047-6.
23. Olkin I., Siotani M. Asymptotic distribution of functions of a correlation matrix. Essays in probability and statistics. 1976:235–251.
24. Olkin I., Finn J.D. Testing correlated correlations. Psychol. Bull. 1990;108:330–333.
25. Wishart J. The mean and second moment coefficient of the multiple correlation coefficient, in samples from a normal population. Biometrika. 1931;22:353–361.
26. Stuart A., Ord J.K. 5th ed. Vol 2. 1991. Kendall's Advanced Theory of Statistics.
27. Ver Hoef J.M. Who invented the delta method? Am. Statistician. 2012;66:124–127.
28. Bycroft C., Freeman C., Petkova D., Band G., Elliott L.T., Sharp K., Motyer A., Vukcevic D., Delaneau O., O’Connell J., et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z.
29. Fry A., Littlejohns T.J., Sudlow C., Doherty N., Adamska L., Sprosen T., Collins R., Allen N.E. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am. J. Epidemiol. 2017;186:1026–1034. doi: 10.1093/aje/kwx246.
30. Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A.R., Bender D., Maller J., Sklar P., de Bakker P.I.W., Daly M.J., Sham P.C. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. doi: 10.1086/519795.
31. Gusev A., Lee S.H., Trynka G., Finucane H., Vilhjálmsson B.J., Xu H., Zang C., Ripke S., Bulik-Sullivan B., Stahl E., et al. Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet. 2014;95:535–552. doi: 10.1016/j.ajhg.2014.10.004.
32. Euesden J., Lewis C.M., O’Reilly P.F. PRSice: polygenic risk score software. Bioinformatics. 2015;31:1466–1468. doi: 10.1093/bioinformatics/btu848.
33. Yang J., Manolio T.A., Pasquale L.R., Boerwinkle E., Caporaso N., Cunningham J.M., De Andrade M., Feenstra B., Feingold E., Hayes M.G., et al. Genome partitioning of genetic variation for complex traits using common SNPs. Nat. Genet. 2011;43:519–525. doi: 10.1038/ng.823.
34. Lee S.H., DeCandia T.R., Ripke S., Yang J., Schizophrenia Psychiatric Genome-Wide Association Study Consortium PGC-SCZ. International Schizophrenia Consortium ISC. Molecular Genetics of Schizophrenia Collaboration MGS. Sullivan P.F., Goddard M.E., Keller M.C., et al. Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs. Nat. Genet. 2012;44:247–250. doi: 10.1038/ng.1108.
35. Finucane H.K., Bulik-Sullivan B., Gusev A., Trynka G., Reshef Y., Loh P.-R., Anttila V., Xu H., Zang C., Farh K., et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 2015;47:1228–1235. doi: 10.1038/ng.3404.
36. Choi S.W., O'Reilly P.F. PRSice-2: Polygenic Risk Score software for biobank-scale data. GigaScience. 2019;8:giz082. doi: 10.1093/gigascience/giz082.
37. Zhao Z., Yi Y., Song J., Wu Y., Zhong X., Lin Y., Hohman T.J., Fletcher J., Lu Q. PUMAS: fine-tuning polygenic risk scores with GWAS summary statistics. Genome Biol. 2021;22:257. doi: 10.1186/s13059-021-02479-9.
38. Vilhjálmsson B.J., Yang J., Finucane H.K., Gusev A., Lindström S., Ripke S., Genovese G., Loh P.-R., Bhatia G., Do R., et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 2015;97:576–592. doi: 10.1016/j.ajhg.2015.09.001.
