Evaluating and improving health equity and fairness of polygenic scores

Tianyu Zhang; Geyu Zhou; Lambertus Klei; Peng Liu; Alexandra Chouldechova; Hongyu Zhao; Kathryn Roeder; Max G’Sell; Bernie Devlin

doi:10.1016/j.xhgg.2024.100280

2024 Feb 23;5(2):100280. doi: 10.1016/j.xhgg.2024.100280

PMCID: PMC10937319 PMID: 38402414

Summary

Polygenic scores (PGSs) are quantitative metrics for predicting phenotypic values, such as human height or disease status. Some PGS methods require only summary statistics of a relevant genome-wide association study (GWAS) for their score. One such method is Lassosum, which inherits the model selection advantages of Lasso to select a meaningful subset of the GWAS single-nucleotide polymorphisms as predictors from their association statistics. However, even efficient scores like Lassosum, when derived from European-based GWASs, are poor predictors of phenotype for subjects of non-European ancestry; that is, they have limited portability to other ancestries. To increase the portability of Lassosum, when GWAS information and estimates of linkage disequilibrium are available for both ancestries, we propose Joint-Lassosum (JLS). In the simulation settings we explore, JLS provides more accurate PGSs compared to other methods, especially when measured in terms of fairness. In analyses of UK Biobank data, JLS was computationally more efficient but slightly less accurate than a Bayesian comparator, SDPRX. Like all PGS methods, JLS requires selection of predictors, which are determined by data-driven tuning parameters. We describe a new approach to selecting tuning parameters and note its relevance for model selection for any PGS. We also draw connections to the literature on algorithmic fairness and discuss how JLS can help mitigate fairness-related harms that might result from the use of PGSs in clinical settings. While no PGS method is likely to be universally portable, due to the diversity of human populations and unequal information content of GWASs for different ancestries, JLS is an effective approach for enhancing portability and reducing predictive bias.

Genetic scores of individuals can be used to predict their traits, such as eventual disease status. However, these scores work best in the ancestry from which they were developed. Lack of portability across ancestries violates fairness principles and could generate clinical harm. We propose methods to expand portability.

Introduction

Phenotypic diversity is a hallmark of human populations. When heritable variation underlies within-population diversity, polygenic scores (PGSs) are logical, if imperfect, predictors for phenotypic variation.¹,²,³,⁴ Insofar as we are aware, the concept of genetic scores arose in the animal breeding literature to estimate the breeding values of individuals, usually males.⁵,⁶ In human genetics, PGSs were introduced⁷ as a natural byproduct of a genome-wide association study (GWAS), from which a subset of SNPs was selected by some threshold for their association with phenotype and some highly dependent SNPs were pruned out to create a sparser set of quasi-independent predictors. Since the original pruning-and-thresholding (P&T) method was introduced,⁷ many somewhat more accurate PGS methods have been developed.⁸,⁹,¹⁰,¹¹,¹²,¹³,¹⁴,¹⁵,¹⁶,¹⁷,¹⁸ An ongoing challenge for such PGSs, however, is making them portable across ancestries: if a PGS is derived from a GWAS based on subjects of a certain ancestry, does it provide equivalent prediction for individuals of other ancestries? Disappointingly, the answer is usually no.¹⁹,²⁰,²¹,²²,²³ For example, prediction accuracy for European-based PGSs was 4.5-fold lower when applied to individuals of African (AFR) ancestry,²⁴ raising concerns that clinical use of PGSs could exacerbate existing health disparities.²⁰

In part, migration patterns and limitations to gene flow underlie lack of portability. Populations separated by greater distances or geographic impediments, which tend to hinder gene flow, also tend to show larger allele frequency differences than nearby populations. Nonetheless, whenever the distribution of genetic variation within and among populations is assessed, a recurrent theme emerges: variation common in one population is rarely absent in others, even for populations from different continents.²⁵,²⁶ Exceptions include variation under natural selection due to environmental forces unique to specific populations. That most common genetic variations are present across diverse ancestries, even if at different frequencies, allows for a simplifying assumption for methods seeking to make PGSs more portable: assume that common variants altering variation of a phenotype in one population (causal variants) have similar, but not identical, effects in other populations.²⁷ Another complication is linkage disequilibrium (LD).²⁸ Due to extensive LD among proximate SNPs in a population, GWAS does not necessarily reveal causal variants. Rather, it reveals associated SNPs, some of which are causal, many of which are neutral but in LD with causal variants, and others that are falsely associated. Moreover, LD patterns differ among populations, due to human history and resulting allele frequency differences.²⁸ To overcome this hurdle, it is commonly assumed that LD patterns derived from population resources such as the 1000 Genomes Project are useful approximations for particular population samples and that these can be used to understand patterns of association in different populations.

Some recent reviews²⁹,³⁰ delineate the challenges and opportunities in the field, and various methods have been proposed to improve portability,²³,³¹,³²,³³,³⁴,³⁵,³⁶ including methods such as XPXP³⁷ and PolyPred³⁸ that incorporate information beyond GWASs of one phenotype. Building on modern statistical methods, for example, TL-PRS³⁹ uses transfer learning (TL) techniques to improve portability. In this setting, we conjectured that model selection techniques, such as penalized regression embodied in Lassosum (LS),¹⁰ could be another useful approach for refining portability. Here, we develop Joint-Lassosum (JLS), a computationally efficient method to leverage GWAS summary statistics and their proxy LD matrices from two ancestries. This approach is more efficient, computationally, than its natural Bayesian counterpart, PRS-CSx,²³ or hierarchical Bayesian SDPRX,³⁵ although not necessarily more accurate. Using simulations consistent with our assumptions, JLS achieved good prediction performance across various scenarios, including in comparison to prediction from TL-PRS. These simulations use schizophrenia as a model outcome and Europeans and Yorubas as model ancestries. Additionally, we introduce a convenient method for choosing tuning parameters that obviates the need for independent validation data, which are often extremely limited for underrepresented populations. Finally, we draw connections to the literature on algorithmic fairness and assess the fairness of PGSs across different models. We motivate the disparity in false discovery rate (FDR) across groups as a meaningful measure of predictive bias for PGSs and show that JLS is effective in reducing this disparity. Furthermore, we demonstrate that underrepresentation is not the only source of disparate predictive performance. Disparities in the predictive performance of PGSs persist even if Europeans and Yorubas are equally represented in the data. This is because the Yoruba LD structure creates a more challenging estimation problem.

Material and methods

Penalized linear regression for PGS

LS¹⁰ is a penalized regression method for derivation of a PGS and requires only GWAS summary statistics and LD information as input. The $l_{1}$ -penalized regression⁴⁰ minimizes the objective function,

$$f(β) = y^{T} y + β^{T} R β - 2 β^{T} r + λ ∥β∥_{1}.$$

Here, $y$ is the phenotype vector of length n (the sample size); $R = X^{T} X$ is the LD matrix between SNPs (LD block matrix), in the form of a correlation matrix; and $r = X^{T} y$ is a vector of relationships between the SNPs and phenotype, which can be constructed using GWAS summary statistics. The $∥β∥_{1}$ term is the $l_{1}$-norm of the SNP effects vector $β$. Such a sparsity-inducing regularization term shrinks most of the SNP effects to zero, and λ is a hyperparameter controlling the regularization strength. GWASs usually report phenotype-SNP relationships $r$, but the SNP-SNP LD level, which requires the genotypes of the subjects in the study, is often not available. Instead, $R$ is often estimated based on another resource, such as the 1000 Genomes Project. This replacement does not hurt statistical validity, although it could cause the solution to be numerically more sensitive. Therefore, the LS algorithm aims to minimize the following objective function:

$$f(β) = (1 - s) β^{T} R β - 2 β^{T} r + λ ∥β∥_{1} + s β^{T} β,$$

in which s is another regularization parameter to ensure a stable solution.

From one population to two: JLS

We extend LS by incorporating information from two ancestries. Our proposed method, JLS (Figure 1), calls for two LD block matrices ( $R_{1}, R_{2}$ ) and GWAS summary statistics ( $r_{1}, r_{2}$ ) from two ancestries. These summary statistics can come from different resources. Suppose that the GWAS from population 1 is of large sample size, and therefore its PGS has good performance, whereas we especially wish to improve the PGS for population 2, the “target” population. The (minimization) objective function of JLS is

$$f(β) = γ \{(1 - s) β^{T} R_{1} β - 2 β^{T} r_{1}\} + (1 - γ) \{(1 - s) β^{T} R_{2} β - 2 β^{T} r_{2}\} + λ ∥β∥_{1} + s β^{T} β.$$

Figure 1. Joint-Lassosum (JLS)

This flowchart presents the standard implementation of JLS, which would require independent individual-level data from the target population for tuning (hyperparameter selection) and for testing (assessing diagnostic accuracy). In practice, independent tuning and testing datasets may not be available, which motivates the synthetic-data-based tuning procedure described in the section on proposed parameter-tuning procedures.

We introduced a new hyperparameter, $0 \leq γ \leq 1$ , to balance the contribution of the two sets of samples. A larger γ up-weights information from population 1, which would have less variability in its association statistics, whereas a smaller γ induces a β calibrated better to the target population’s genome structure. The objective function can be solved by a coordinate descent algorithm, which is described in detail in the supplemental information.
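The JLS objective can be minimized one coordinate at a time because, after collecting terms, it has the same form as the LS objective with blended inputs $R_{γ} = γ R_{1} + (1 - γ) R_{2}$ and $r_{γ} = γ r_{1} + (1 - γ) r_{2}$. The following is a minimal sketch of cyclic coordinate descent under that observation, not the authors' implementation (theirs is described in the supplemental information and may differ). The function names `fit_jls`, `jls_objective`, and `soft_threshold`, the iteration cap, and the tolerance are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def jls_objective(beta, R1, r1, R2, r2, gam, lam, s):
    """Evaluate the JLS objective exactly as written in the text."""
    q1 = (1 - s) * beta @ R1 @ beta - 2 * beta @ r1
    q2 = (1 - s) * beta @ R2 @ beta - 2 * beta @ r2
    return gam * q1 + (1 - gam) * q2 + lam * np.abs(beta).sum() + s * beta @ beta

def fit_jls(R1, r1, R2, r2, gam, lam, s=0.9, n_iter=200, tol=1e-10):
    """Cyclic coordinate descent on the JLS objective (illustrative sketch).

    R1, R2 are symmetric LD (correlation) matrices; r1, r2 are SNP-phenotype
    association vectors. Setting gam=1 recovers single-population LS.
    """
    # Blending the inputs reduces JLS to an LS-type problem.
    R = gam * R1 + (1 - gam) * R2
    r = gam * r1 + (1 - gam) * r2
    p = len(r)
    beta = np.zeros(p)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # In coordinate j the objective is a*b_j^2 - 2*b*b_j + lam*|b_j|,
            # whose minimizer is soft_threshold(b, lam/2) / a.
            b = r[j] - (1 - s) * (R[j] @ beta - R[j, j] * beta[j])
            a = (1 - s) * R[j, j] + s
            new = soft_threshold(b, lam / 2) / a
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:  # stop once a full sweep changes nothing
            break
    return beta
```

Because s > 0 makes the smooth part strictly convex, the sweeps converge to the unique minimizer. In practice each LD block would be solved separately, which is what makes the parallelization described in the text possible.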

There are some direct generalizations of one-population methods to two-population problems in the literature. For example, there are methods that take the weighted average of two models trained separately for each population (one from the source population and one from the target population), including “weighted P&T”⁴¹ and “weighted LS” (WLS).³⁹ Although conceptually simple, both methods require tuning models for both populations because the optimal values for parameters will typically be different for each component model. By comparison, JLS only requires tuning two key parameters, γ and λ.

Proposed parameter-tuning procedures

Parameter tuning is a common challenge for PGS methods, including all the penalized-regression-based PGS methods. In our method, there are three hyperparameters: (1) sparsity penalty parameter λ, (2) population weighting parameter γ, and (3) shrinkage parameter s. The penalty parameter λ controls the number of SNPs that have non-zero contribution to the PGS (i.e., the sparsity of the regression coefficients). The weighting parameter γ balances the contribution of two populations to the loss function, and the shrinkage parameter s is introduced because of numerical concerns. According to our simulation results, the parameter s can be trivially set to 0.9. The parameter tuning mainly focuses on the other two more meaningful parameters: sparsity parameter λ and population weight parameter γ.

To tune these hyperparameters, we propose using a synthetic-sample-based procedure. This proposal is inspired by the parametric bootstrap estimation of prediction error described in Bradley,⁴² which is shown to have favorable theoretical guarantees under mild assumptions. The procedure can be summarized as follows (for a sketch of the framework, see Figure 2).

(1) Perform a preliminary fitting of JLS under a set of candidate hyperparameters. Each candidate hyperparameter corresponds to one candidate model.
(2) Use one of the fitted coefficients ${\hat{β}}_{\text{pre}}$ and publicly available genotype information to generate individual-level synthetic samples. The synthetic sample should have a similar sample size, noise level, and study design to the original dataset of interest. (Note that this information can be obtained by examining summary statistics from the GWAS samples.)
(3) Employ standard (cross-)validation techniques with the synthetic samples to select the hyperparameters. This is possible because we have access to the genotypes of each synthetic individual and their phenotype. More specifically, this step can be further divided as follows.
- (3.1) Split the data generated in step (2) into training and testing sets.
- (3.2) Fit models using the training set with the same candidate hyperparameters as in step (1).
- (3.3) Use the testing set to evaluate the performance and determine the better candidate hyperparameter. A range of sparsity hyperparameters (λs) may produce similar performance (areas under the receiver operating characteristic curve [AUCs]). In this situation, we recommend choosing the more regularized model (i.e., the model that selects fewer non-zero regression coefficients).
(4) Choose the model, among those created in step (1), that corresponds to the hyperparameter selected in step (3).

This procedure is general and is applicable to other methods as well.

Figure 2. Framework of synthetic-data-based hyperparameter tuning procedure

We take the proposed JLS method as an example. In this work, the “source population” usually refers to subjects of European ancestry (CEU) and the “target population” refers to those of African ancestry, specifically Yoruba (YRI).
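Steps (2) and (3) of the tuning procedure can be sketched in a few lines. This is a simplified illustration under several assumptions that are not in the text: synthetic individuals are drawn by resampling whole reference genotypes (rather than haplotypes by LD block), the synthetic phenotype is quantitative and scored by held-out correlation (rather than a binary trait scored by AUC), and `toy_fit` is a hypothetical stand-in for the PGS fitting routine. All function and parameter names are illustrative.

```python
import numpy as np

def simulate_synthetic(beta_pre, genotype_pool, n, noise_sd, rng):
    """Step (2): resample reference genotypes, then draw phenotypes from beta_pre."""
    rows = rng.integers(0, genotype_pool.shape[0], size=n)
    X = genotype_pool[rows].astype(float)
    y = X @ beta_pre + rng.normal(0.0, noise_sd, size=n)
    return X, y

def synthetic_tune(beta_pre, genotype_pool, candidates, fit, n, noise_sd,
                   test_frac=0.2, seed=0):
    """Steps (2)-(3): score each candidate hyperparameter on synthetic data.

    Returns the best candidate and a dict of held-out correlations.
    """
    rng = np.random.default_rng(seed)
    X, y = simulate_synthetic(beta_pre, genotype_pool, n, noise_sd, rng)
    n_test = int(test_frac * n)                    # step (3.1): rows are already random
    X_te, y_te, X_tr, y_tr = X[:n_test], y[:n_test], X[n_test:], y[n_test:]
    scores = {}
    for hyper in candidates:
        beta = fit(X_tr, y_tr, hyper)              # step (3.2)
        pred = X_te @ beta
        # step (3.3): a constant prediction carries no information
        scores[hyper] = 0.0 if pred.std() == 0 else float(np.corrcoef(pred, y_te)[0, 1])
    best = max(scores, key=scores.get)
    return best, scores

def toy_fit(X, y, lam):
    """Hypothetical stand-in for a PGS fit: soft-thresholded marginal effects."""
    Xc = X - X.mean(axis=0)
    marg = Xc.T @ (y - y.mean()) / len(y)
    return np.sign(marg) * np.maximum(np.abs(marg) - lam, 0.0)
```

Step (4) then amounts to looking up the model from step (1) that was fitted with the returned `best` hyperparameter; the synthetic fits themselves are discarded.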

Simulations using realistic LD data

We performed simulations to evaluate the performance of JLS under various scenarios and compare its efficiency with some other PGS algorithms. Simulations were based on two populations, one of European (EUR) ancestry and another of AFR ancestry. Imitating the current-day reality, for most simulations, the sample size was larger for the sample of EUR ancestry, and we wish to transfer what is learned from the larger EUR population to the smaller AFR population. To generate genotypes, we used data on inferred haplotypes from individuals who contributed DNA to the 1000 Genomes Project, specifically haplotypes from 179 individuals of European (Utah residents with Northern and Western European ancestry [CEU]) and 178 African (Yoruba [YRI]) ancestry. From these data, we selected SNPs that were haplotyped across all samples; had minor allele frequency greater than 0.01 and p values greater than 0.005 from the two tests of Hardy-Weinberg equilibrium, one for each ancestry group; were autosomal; and fell into one of the 2,559 and 1,681 LD blocks for YRI and CEU, respectively, as defined by Berisa and Pickrell.⁴³ In total, 5,630,745 SNPs met these criteria.

Within each ancestry, genotypes for each of n individuals were generated by LD block (e.g., within a block, sample from the 356 haplotypes that were available for YRI). For each of the n individuals, we randomly selected two of the haplotypes, with the restriction that they could not be the same. Performing this operation over all haplotype blocks creates the combined set of genotypes for 5,630,745 SNPs; among these SNPs, the LD structures of the original CEU and YRI samples were largely preserved, as were the relative allele frequencies.
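The block-wise sampling described above can be sketched as follows. This is an illustration only: the array layout (haplotypes as rows of 0/1 alleles, LD blocks as column ranges) and the function name `simulate_genotypes` are assumptions, not details taken from the authors' code.

```python
import numpy as np

def simulate_genotypes(haplotypes, block_bounds, n, seed=0):
    """Build n diploid genotypes by sampling haplotype pairs per LD block.

    haplotypes:   (n_hap, n_snp) array of 0/1 alleles for one ancestry.
    block_bounds: list of (start, stop) column ranges, one per LD block.
    Returns an (n, n_snp) array of allele counts in {0, 1, 2}.
    """
    rng = np.random.default_rng(seed)
    n_hap, n_snp = haplotypes.shape
    geno = np.zeros((n, n_snp), dtype=int)
    for start, stop in block_bounds:
        for i in range(n):
            # Two distinct haplotypes per individual within this block.
            h1, h2 = rng.choice(n_hap, size=2, replace=False)
            geno[i, start:stop] = haplotypes[h1, start:stop] + haplotypes[h2, start:stop]
    return geno
```

Sampling independently per block preserves LD within blocks while breaking it between blocks, which matches the independence assumption used later for parallel fitting.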

From these SNPs, we randomly selected 4,000 causal variants after meeting these conditions: SNPs were chosen based on the relative length of each chromosome versus their total length, the first causal SNP was chosen to be between 100 and 250 kb from the first SNP on the chromosome, and the remaining causal SNPs were then spread approximately equidistant across the chromosome. The average distance between causal SNPs was approximately 690 kb.

Our next goal was to obtain the binary phenotypes for individuals using the standard threshold model from quantitative genetics,⁴⁴ which assumes an underlying normal distribution for population liability for a binary trait and under which an individual is affected if their liability exceeds a threshold t. The prevalence of affection status in the population, which we set to 1.5% for our simulations, determined t. To generate causal effects, we fixed the variance due to environmental effects to 1, which is typical for the threshold model. Because the heritability of a binary trait on the liability scale is then $h^{2} = σ_{g} / (σ_{g} + 1)$, the total variance due to genetic effects was $σ_{g} = h^{2} / (1 - h^{2})$. After distributing the genetic variance evenly across loci, $τ = σ_{g} / 4{,}000$, the effect size for each SNP i was defined as ${(τ / (2 p_{i} q_{i}))}^{1 / 2}$, where $p_{i}$ is the relative frequency of causal allele i in the population sample and $q_{i} = 1 - p_{i}$.

The genetic score for each simulated subject was determined by summing the number of risk alleles times the effect size at each locus. Formally, let $x_{\text{risk}, i}$ be the risk allele genotype profile of individual i and $β_{\text{risk}}$ record the effect size of each risk allele. We determine the score of individual i as ${GeneScore}_{i} = x_{\text{risk}, i}^{⊤} β_{\text{risk}}$. Because the numbers of subjects and causal variants are finite, we adjusted these genetic scores so that their mean is zero. To account for environmental effects on the phenotype, a random number drawn from $N (0, 1)$ was added to the individual’s genetic score to obtain their phenotypic score on the liability scale. Affection status was then assigned based on whether the phenotypic score exceeded the threshold t, determined from the normal distribution $N (0, σ_{g} + 1)$. Rejection sampling was performed until $n / 2$ affected and $n / 2$ unaffected individuals were obtained. For each simulation, we created three datasets for both CEU and YRI: one used to calculate the GWAS (training [TRN]), one for tuning parameters, and one for testing. Unless noted otherwise, for each of the TRN, tuning, and testing datasets, we generated 20,000 CEU subjects and 4,000 YRI subjects. To obtain reference populations for quantifying LD among SNPs, we generated random sets of subjects of CEU and YRI ancestry with the same size. For these samples, affection status was irrelevant.
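The liability-threshold simulation above can be sketched as follows. The sketch assumes environmental variance 1, so that $h^{2} = σ_{g} / (σ_{g} + 1)$ and hence $σ_{g} = h^{2} / (1 - h^{2})$. It also makes two simplifications that are not in the text: genotypes are drawn as independent binomials rather than from haplotype blocks, and scores are centered by their expected value rather than empirically. Function names and the `batch` parameter are illustrative.

```python
import numpy as np
from statistics import NormalDist

def liability_parameters(h2, n_causal, prevalence):
    """Genetic variance, per-locus variance, and liability threshold,
    assuming environmental variance 1 so that h2 = sigma_g / (sigma_g + 1)."""
    sigma_g = h2 / (1.0 - h2)
    tau = sigma_g / n_causal
    # Upper-tail quantile of N(0, sigma_g + 1) at the given prevalence.
    t = NormalDist(0.0, (sigma_g + 1.0) ** 0.5).inv_cdf(1.0 - prevalence)
    return sigma_g, tau, t

def simulate_case_control(freqs, h2, prevalence, n, seed=0, batch=20000):
    """Draw n/2 cases and n/2 controls by rejection sampling.

    freqs: risk-allele frequencies of the causal SNPs (all SNPs causal here).
    """
    rng = np.random.default_rng(seed)
    freqs = np.asarray(freqs, dtype=float)
    sigma_g, tau, t = liability_parameters(h2, len(freqs), prevalence)
    effects = np.sqrt(tau / (2.0 * freqs * (1.0 - freqs)))
    center = 2.0 * freqs @ effects      # expected score, used to center at zero
    cases, controls = [], []
    n_half = n // 2
    while len(cases) < n_half or len(controls) < n_half:
        G = rng.binomial(2, freqs, size=(batch, len(freqs)))
        liability = G @ effects - center + rng.standard_normal(batch)
        for g, affected in zip(G, liability > t):
            bucket = cases if affected else controls
            if len(bucket) < n_half:    # rejection step: discard surplus draws
                bucket.append(g)
    X = np.array(cases + controls)
    y = np.r_[np.ones(n_half, dtype=int), np.zeros(n_half, dtype=int)]
    return X, y
```

With $h^{2} = 0.8$ and 4,000 causal variants, `liability_parameters` gives $σ_{g} = 4$ and $τ = 0.001$; at 1.5% prevalence most draws are unaffected, so nearly all rejected samples are surplus controls.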

Results

Performance of PGS

We investigate performance of some one- and two-population PGS methods under various simulated scenarios. In the first scenario, we set the heritability of liability of the phenotype to 80% and set the genetic variance explained by each causal variant to be the same over SNPs and populations (CEU and YRI). Because relative allele frequencies of SNPs typically differ between populations, the realized effects of the SNPs on phenotype differed as well. In contrast to a one-population method (LS), two-population PGS algorithms substantially improve prediction of affection status for the smaller YRI sample (Figure 3A). Of the two-population PGS algorithms, JLS slightly outperforms TL³⁹ and WLS. Similar patterns were obtained for lower heritability (50%), although prediction accuracy is correspondingly diminished over all methods too (Figure S1A). The same patterns hold when there are different heritabilities, 80% for CEU and 60% for YRI populations (Figure S1B).

Figure 3. Prediction accuracy of PGS algorithms

Accuracy is measured by the area under the receiver operating characteristic curve (AUC). The legend uses the format method (training “2” tuning population samples). For example, LS(Y2Y) stands for the LS score, which is trained with a sample from the YRI population and tuned with a sample from the YRI population.

(A) The dashed line is the AUC of a Lassosum (LS) PGS for a CEU sample, trained using another sample from the CEU population. Also shown are an LS PGS for a YRI sample, trained using another sample from the YRI population, LS(Y2Y), or from the CEU population, LS(C2Y); a PGS obtained as a weighted combination of the Y2Y and C2Y PGSs, denoted as WLS; a PGS obtained by a transfer learning (TL) method; and a PGS obtained by JLS.

(B–D) Results are shown for diminishing levels of genetic correlation (0.80, B; 0.60, C; and 0.40, D) between the trait in the CEU and YRI populations.

The meaning of prediction accuracy, as summarized by AUC, is arguably obscure for clinical practice. Suppose clinicians wanted to apply a preventative treatment to adolescents who, without it, would go on to be diagnosed with schizophrenia. To identify those at greatest risk, they selected individuals with extreme PGSs, say, exceeding 2. The fraction of individuals selected who would never be affected, the FDR, differs markedly across populations and methods (Figure 4A). For the larger CEU population, when predicted using LS score trained with CEU samples, only 3.4% of the individuals would be falsely treated. However, a much larger fraction of the YRI population would be falsely treated regardless of method. As expected, based on AUC (Figure 3A), JLS performs slightly better than TL at minimizing the FDR (7.5% versus 8.2%), and far better than LS(Y2Y) (21.2%), although none perform as well as LS(C2C). As expected, more extreme results were obtained for lower (Figure S2A) or unequal heritabilities (Figure S2B).
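The FDR used here is the fraction of flagged individuals who are not affected, that is, one minus the positive predictive value. A minimal sketch of the computation, overall and by ancestry group, is given below. The function names are illustrative, and the scale on which a PGS of 2 is "extreme" (for example, standardized units) is an assumption of this sketch.

```python
import numpy as np

def fdr_at_threshold(pgs, affected, threshold=2.0):
    """Share of individuals with PGS above `threshold` who are not affected."""
    pgs = np.asarray(pgs, dtype=float)
    affected = np.asarray(affected, dtype=bool)
    flagged = pgs > threshold
    if not flagged.any():
        return float("nan")           # undefined when nobody is flagged
    return float(np.mean(~affected[flagged]))

def fdr_by_group(pgs, affected, group, threshold=2.0):
    """FDR computed separately within each group label."""
    pgs, affected, group = map(np.asarray, (pgs, affected, group))
    return {str(g): fdr_at_threshold(pgs[group == g], affected[group == g], threshold)
            for g in np.unique(group)}
```

Comparing the values returned by `fdr_by_group` across ancestries gives the disparity that the fairness discussion refers to.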

Figure 4. Prediction inaccuracy, as highlighted by the false discovery rate (FDR)

The legend uses the format method (training “2” tuning population samples). For example, LS(C2Y) stands for the LS score, which is trained with a sample from the CEU population and tuned with a sample from the YRI population, and JLS(Y&C2Y) is the JLS score, which is trained with samples from both populations and tuned with a sample from the YRI population.

In the first simulated scenario (Figure 3A), the genetic correlation of the trait over the two populations was 1.0, even though the effects of risk alleles differed between the populations. We also explored prediction performance when the heritability of the trait was held constant over populations but the genetic correlation was less than unity. We explored three levels of genetic correlation, namely 0.8, 0.6, and 0.4 (Figures 3B–3D and 4B–4D), which was accomplished by reducing the overlap of causal loci (i.e., 80%, 60%, and 40% shared causal SNPs, respectively). As the genetic correlation declines, the prediction accuracy declines for all of the two-population methods. Nonetheless, they always outperformed the one-population methods. Notably, despite the 5-fold smaller size of the YRI sample, versus CEU, as the genetic correlation fell to 0.4, the predictive accuracy of the PGS for YRI trained on YRI outperformed that of YRI scored by CEU (Figure 3D). Paralleling results for AUC, the FDR increased with diminishing genetic correlation (Figures 4B–4D). If clinicians were to use a PGS trained on CEU samples for prediction of YRI risk, false treatment rates would be very high, approaching 30%.

To explore the effect of the size of the YRI sample, we generated 20,000 YRI under the 60% overlap scenario and compared the FDR over a range of PGS thresholds for four combinations of method, training population, and testing population. Even when YRI and CEU are equally represented in the data (Figure 5A), the FDR for LS(Y2Y) remains significantly greater than that of LS(C2C) at all thresholds. This suggests that shorter LD in YRI makes the estimation process inherently more challenging. Nonetheless, the poor performance of PGS for the YRI population, as seen in most of our results, is largely due to smaller sample size. Furthermore, JLS improves performance over LS for both YRI and CEU populations, even when the training sample sizes are both large (Figure 5A). This suggests that any bias induced by using heterogeneous training data is overcome by the reduced variance of the larger sample size. Next, by subsampling the 20,000 YRI to obtain a range of sample sizes from 4,000 to 20,000, we varied the proportion of the YRI to CEU population from 0.2 to 1.0 and evaluated the FDR for YRI at a threshold of 2 (Figure 5B). Notably, while LS is always inferior to JLS, the relative improvement produced by JLS declines as the proportion grows.

Figure 5. Prediction inaccuracy and fairness

(A) Equal sample size setting (YRI = CEU = 20,000).

(B) PGS threshold is set to 2.0, with varying YRI sample size (from 4,000 to 20,000). The data points at x = 2 in (A) are identical to those at x = 1 in (B). The legend uses the format method (training “2” tuning population samples). For example, LS(C2C) stands for the PGS trained with a sample from the CEU population and tuned with a sample from the CEU population, and JLS(Y&C2Y) is the JLS score trained with samples from both populations and tuned with a sample from the YRI population.

Similar to the LS method, fitting a JLS model is computationally efficient. Moreover, the process can be run in parallel under the reasonable assumption that SNPs from different LD blocks are independent. The CPU time of JLS is roughly linear with respect to the total number of SNPs, the number of samples used to estimate the LD block structure, and the number of tuning parameter combinations. For an optimized set of hyperparameters, fitting the JLS took roughly 4 min using 22 CPU cores, with each core processing computations for a different autosome. See the supplemental information for a thorough analysis of computational times under different scenarios.

Parameter tuning with synthetic samples

While hyperparameter selection is critical for all PGS algorithms, an independent dataset for such tuning is often unavailable, especially for underrepresented populations. In material and methods, we presented a synthetic tuning approach to overcome this challenge. Here, through simulations, we evaluated the proposed approach. In these simulations, we generated training and testing samples (CEU and YRI) and calculated summary GWAS statistics. Using these summary statistics, multiple JLS models were fitted under 30 different combinations of hyperparameters γ and λ (three choices of γ times ten choices of λ). These 30 models all had different prediction accuracy, based on the test data, and we defined the best of them as the “Oracle (testing) AUC.” We performed 10 such simulations, so there were 10 “Oracle AUCs” for each population.

To apply the method described in proposed parameter-tuning procedures, we used the regression coefficients ${\hat{β}}_{\text{pre}}$ of one of the 30 models to generate synthetic data. (In practice, it is up to the user which ${\hat{β}}_{\text{pre}}$ is implemented because no other reference information is available.) We chose four different ${\hat{β}}_{\text{pre}}$ to generate the synthetic data: (1) $γ = 0.8, λ = 7 \times 10^{- 3}$; (2) $γ = 0.5, λ = 7 \times 10^{- 3}$; (3) $γ = 0.2, λ = 1.5 \times 10^{- 2}$; and (4) $γ = 0.8, λ = 2.5 \times 10^{- 3}$. The Oracle testing AUC in general is achieved around $γ = 0.8, λ = 2 \times 10^{- 2}$, which we purposely avoided in the ${\hat{β}}_{\text{pre}}$ settings. Using these synthetic tuning data, the hyperparameters yielding the best prediction from the synthetic test data are chosen. The JLS model corresponding to these hyperparameters, one of the 30 models, is then applied to the true target data to determine the realized AUC, which is then compared to the corresponding Oracle (Figure 6). For comparison, we also generated independent tuning data, 20,000 CEU and 4,000 YRI samples, and used them to perform model selection with cross-validation (CV). In each repeat, CV identified one model that is expected to perform best, and the selected model’s testing AUC was contrasted to the corresponding Oracle (Figure 6). Leveraging extra individual-level information, CV can more consistently select models of almost the best performance. However, synthetic data model selection can still help investigators to effectively exclude severely underperforming models without requiring any (extra) individual-level information. Typically, more than half of the 30 candidate models have YRI testing AUC $< 0.7$ (the worst can be as low as 0.6), and we observe that the proposed procedure is likely to select models with AUCs between 0.75 and 0.8 (Figure 6). Synthetic (4) underperforms the other choices, usually yielding AUCs around 0.75.

Figure 6. Hyperparameter tuning results comparison

The synthetic-data-based tuning method gives results comparable to cross-validation (CV) and the Oracle for two population samples from CEU and YRI. The Oracle testing AUC on the x axis is the best AUC that can be achieved by the candidate hyperparameters. Synthetic datasets are generated under four different choices of ${\hat{β}}_{\text{pre}}$, which are estimated under different hyperparameter combinations.

Performance for type 2 diabetes (T2D) and coronary artery disease (CAD)

We next compared the performance of JLS with LS and SDPRX³⁵ on two binary traits, T2D and CAD, assessed in samples of EUR, East Asian (EAS), and AFR ancestry from the UK Biobank.⁴⁵ Summary GWAS statistics and genotype data were cleaned and processed as described by Zhou et al.³⁵ The GWAS pooled sample sizes were 156,109 (EUR), 191,764 (EAS), and 14,480 (AFR) for T2D⁴⁶,⁴⁷,⁴⁸ and 61,294 (EUR) and 101,091 (EAS) for CAD.⁴⁹,⁵⁰ Validation samples from UK Biobank consisted of 1,263 EAS and 4,809 AFR individuals for T2D and 1,116 EAS individuals for CAD.

For parameter tuning, we also followed Zhou et al.³⁵ Specifically, we randomly assigned one-third of the participants for each ancestry to the tuning dataset and all other participants to the test dataset, which is used to evaluate prediction performance. After performing the random assignment 20 times, we computed the average AUC of PGS for the traits (Table 1). For both traits, the methods showed similar predictions, although JLS was computationally more efficient and SDPRX was more accurate. The baseline model LS has a decent prediction accuracy in certain settings, mainly due to the large sample size when calculating EAS or AFR GWAS statistics. We do observe a significant improvement when leveraging EUR information in predicting T2D outcomes for EAS individuals (AUC from 0.54 to 0.60).
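The repeated random-split evaluation described above can be sketched as follows. The sketch assumes that a PGS vector has already been computed for every individual under each candidate hyperparameter (the scores come from summary statistics, so no refitting happens inside the loop). The names `auc`, `repeated_split_auc`, and `candidate_scores` are illustrative, and the AUC is computed from the rank-sum identity rather than with any particular library.

```python
import numpy as np

def auc(score, y):
    """AUC via the rank-sum (Mann-Whitney) identity; ties get average ranks."""
    score, y = np.asarray(score, dtype=float), np.asarray(y, dtype=bool)
    order = np.argsort(score, kind="mergesort")
    ranks = np.empty(len(score))
    ranks[order] = np.arange(1, len(score) + 1)
    for v in np.unique(score):            # average ranks within ties
        tie = score == v
        ranks[tie] = ranks[tie].mean()
    n1, n0 = y.sum(), (~y).sum()
    return float((ranks[y].sum() - n1 * (n1 + 1) / 2) / (n1 * n0))

def repeated_split_auc(candidate_scores, y, n_repeats=20, tune_frac=1 / 3, seed=0):
    """Mean and SD of test AUC over repeated tuning/test splits.

    candidate_scores: dict mapping each hyperparameter to a PGS vector
    for all individuals. In each repeat, the hyperparameter with the best
    AUC on the tuning third is evaluated on the remaining two-thirds.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=bool)
    scores = {h: np.asarray(s, dtype=float) for h, s in candidate_scores.items()}
    n = len(y)
    n_tune = int(round(tune_frac * n))
    test_aucs = []
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        tune, test = perm[:n_tune], perm[n_tune:]
        best = max(scores, key=lambda h: auc(scores[h][tune], y[tune]))
        test_aucs.append(auc(scores[best][test], y[test]))
    return float(np.mean(test_aucs)), float(np.std(test_aucs))
```

The returned mean and standard deviation correspond in form to the "AUC ± spread" entries of Table 1, although the exact spread measure used there is not stated in the text.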

Table 1.

Prediction results for type 2 diabetes and coronary artery disease from the UK Biobank

Method	T2D AUC, AFR	T2D AUC, EAS	T2D time (h)	CAD AUC, EAS	CAD time (h)
JLS	$0.54 \pm 0.01$	$0.60 \pm 0.04$	1.0	$0.61 \pm 0.04$	0.5
LS	$0.54 \pm 0.01$	$0.54 \pm 0.02$	0.1	$0.60 \pm 0.03$	0.1
SDPRX	$0.56 \pm 0.01$	$0.61 \pm 0.03$	4.5	$0.63 \pm 0.03$	6.9


Both testing accuracy and computational time are reported. SDPRX is a comparison polygenic score method.³⁵ AUC, AFR: testing population is of African (AFR) ancestry. AUC, EAS: testing population is of East Asian (EAS) ancestry. The wall-clock time was measured on an Intel Xeon Gold 6240 processor. We used the most suitable parallelization scheme for each method: JLS, 12 cores; LS, 3 cores; and SDPRX, 66 cores. T2D, type 2 diabetes; CAD, coronary artery disease; AUC, area under the receiver operating characteristic curve; JLS, Joint-Lassosum; LS, Lassosum.

Discussion

PGSs are typically developed from one major ancestry group, such as EUR ancestry. Because of human diversity, unequal information content of GWASs for different ancestries, and LD among nearby SNPs, such PGSs have limited portability to other ancestries. Here, we develop a portable PGS that requires only summary statistics from a relevant GWAS and that has the model selection advantages of the well-characterized Lasso method. We present JLS (Figure 1), which integrates information for two populations to increase the portability of the PGS from the larger “source” population to the smaller “target” population. Simulations show that JLS provides portability comparable with other methods (Figures 3, 4, and 5). JLS requires selection of data-driven tuning parameters, as do all PGS methods. We describe an approach to tuning that obviates the need for additional population samples, beyond the original GWAS, by generating synthetic population samples for tuning from the GWAS results (Figures 2 and 6). PUMAS⁵¹ is another recent method designed for tuning hyperparameters. It differs from our proposed method by leveraging the limiting distribution of the GWAS summary statistics, while our method is closer to the standard CV. JLS and our proposed synthetic tuning are effective approaches for enhancing PGS portability. In large part, our results emphasize the FDR as a measure of performance, in contrast to AUC, a popular metric. This is because AUC can fail to differentiate the performance of PGS methods in a meaningful way, while FDR provides a metric that is meaningful in practice.

We can also view the results of the simulations used to evaluate JLS (Figures 4 and 5) through the lens of fairness-related harms.⁵² Specifically, we are concerned with the potential for allocative harms, wherein a resource or burden is inequitably distributed due to the differential performance of a model across groups (here, ancestry groups). To ground the discussion, consider the hypothetical clinical screening scenario described for our simulations, in which a PGS model is used to identify adolescents who are at greatest risk of schizophrenia, for the purpose of providing therapeutic treatment aimed at prevention. In this setting, allocative harms can arise in two ways: (1) failing to treat individuals who develop schizophrenia and (2) unnecessarily treating individuals who would not develop schizophrenia even in the absence of treatment. We focus here on the second: overtreatment due to false positive predictions.

Individuals who are flagged in adolescence as being at high risk for developing schizophrenia and are subsequently treated would be subject to at least two sources of harm. First, beyond burdens such as cost, any treatment could have significant side effects, and side effects of drugs designed to prevent psychosis are numerous.⁵³ Second, such individuals could unnecessarily and adversely alter their lives going forward due to perceived risk. We can assess the extent to which a given PGS model would inequitably subject individuals to these harms by comparing the FDRs across groups. Because FDR equals one minus the positive predictive value, comparing FDRs is equivalent to assessing a model for predictive parity (that is, the equivalence of positive predictive values across groups), a fairness metric considered in risk assessment contexts such as criminal justice and healthcare.⁵⁴,⁵⁵,⁵⁶ For instance, consider the results in Figure 4C, in which there is 60% overlap of risk variants in the Yoruba (YRI) and CEU-EUR populations. In the absence of joint modeling, FDRs at all score thresholds are considerably lower for the CEU-EUR population (LS(C2C)) than for the YRI population (LS(Y2Y)). Joint modeling helps to bridge this gap, with the proposed JLS method producing the greatest decrease in FDR. For instance, at a threshold of 2, which is high enough to be plausibly clinically relevant, the FDR for YRI drops from 25% to 13% when applying JLS. This reduces the FDR disparity between the two populations from 25% − 3% = 22% to 13% − 3% = 10%. Thus, while disparity persists, it is greatly mitigated by applying JLS.
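The relationship between FDR, positive predictive value, and the disparity arithmetic above can be sketched as follows. This is an illustrative sketch only: the function names `fdr`, `ppv`, and `parity_gap` and the counts of 75 true positives and 25 false positives are hypothetical, and the 3% figure is taken to be the CEU-EUR FDR, as the subtraction in the text implies. The percentages themselves are the ones quoted for a threshold of 2.

```python
def fdr(true_pos, false_pos):
    """False discovery rate among flagged individuals: FP / (TP + FP)."""
    flagged = true_pos + false_pos
    if flagged == 0:
        raise ValueError("no flagged individuals")
    return false_pos / flagged


def ppv(true_pos, false_pos):
    """Positive predictive value, which equals 1 - FDR by definition."""
    return 1.0 - fdr(true_pos, false_pos)


def parity_gap(fdr_group_a, fdr_group_b):
    """Absolute FDR difference between groups; 0 means predictive parity."""
    return abs(fdr_group_a - fdr_group_b)


# Hypothetical counts: 100 flagged individuals, 25 of them false positives.
example_fdr = fdr(true_pos=75, false_pos=25)  # 0.25
example_ppv = ppv(true_pos=75, false_pos=25)  # 0.75

# FDRs quoted in the text at a score threshold of 2.
FDR_YRI_SINGLE = 0.25  # YRI without joint modeling
FDR_YRI_JLS = 0.13     # YRI with JLS
FDR_CEU = 0.03         # CEU-EUR reference (assumed reading of the "3%" term)

gap_before = parity_gap(FDR_YRI_SINGLE, FDR_CEU)
gap_after = parity_gap(FDR_YRI_JLS, FDR_CEU)
print(round(gap_before, 2), round(gap_after, 2))  # prints: 0.22 0.1
```

Because PPV is one minus FDR, the gap in PPV between the two groups is the same size as the gap in FDR, so either quantity can be used to check predictive parity.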

For simplicity, we focused our simulation analyses on two continental ancestry groups. Nonetheless, the results have direct relevance to African Americans, whose ancestry typically is a mixture of AFR and EUR ancestries. Clearly, the fairness discrepancies arising from unequal sample sizes for GWAS, as seen in Figures 4 and 5, apply with equal force to African Americans, who are typically underrepresented in genetic studies. Moreover, of relevance to our hypothetical clinical screening scenario, African Americans also tend to be disproportionately diagnosed with psychotic disorders, at least in part due to diagnostic bias.⁵⁷ This underscores the importance of ensuring accuracy of PGSs for minority populations, if PGSs are to be used in a clinical setting, as a matter of fairness and to minimize harms.

There are now a variety of two-population methods for enhancing portability.²³,³¹,³²,³³,³⁴,³⁵ Undoubtedly, some of these methods will yield better prediction than JLS for some pairs of populations and samples, as our analyses of UK Biobank data show (Table 1). Nonetheless, whenever multiple analytical methods have different advantages in different settings, ensembles of these methods, appropriately tuned, will perform at least as well as and usually better than any one of the methods. Hence, an ensemble of current two-population methods is a viable option for greater prediction accuracy.
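As a rough illustration of the ensemble idea, here is a minimal sketch that combines two scores with a single weight chosen by grid search on a tuning set. The text does not specify an ensemble scheme, so the convex combination, the AUC tuning criterion, the function names, and the toy scores are all assumptions made for illustration; the two scores are assumed to be on a common (for example, standardized) scale.

```python
def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


def tune_ensemble_weight(pgs_a, pgs_b, labels, n_grid=11):
    """Grid-search w in [0, 1] for the combined score w * A + (1 - w) * B."""
    best_w, best_auc = 0.0, -1.0
    for i in range(n_grid):
        w = i / (n_grid - 1)
        combined = [w * a + (1 - w) * b for a, b in zip(pgs_a, pgs_b)]
        current = auc(combined, labels)
        if current > best_auc:
            best_w, best_auc = w, current
    return best_w, best_auc


# Hypothetical tuning set: 3 cases, 3 controls, two standardized scores.
labels = [1, 1, 1, 0, 0, 0]
pgs_a = [1.2, 0.3, 0.9, 0.5, -0.4, 0.1]   # e.g., output of one method
pgs_b = [0.2, 1.1, 0.8, -0.3, 0.6, 0.0]   # e.g., output of another method

weight, ensemble_auc = tune_ensemble_weight(pgs_a, pgs_b, labels)
single_best = max(auc(pgs_a, labels), auc(pgs_b, labels))
print(weight, ensemble_auc, single_best)
```

Because the grid includes w = 0 and w = 1, the tuned combination cannot do worse than either single score on the tuning set. That guarantee does not carry over to new data, so the chosen weight would need to be evaluated on samples not used for tuning.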

Data and code availability

The algorithm code of the proposed JLS method and synthetic parameter tuning is available at https://github.com/terrytianyuzhang/JointLassosum/.

Web resources

BBJ, https://pheweb.jp/downloads

CARDIoGRAMplusC4D, http://www.cardiogramplusc4d.org/data-downloads/

DIAGRAM, https://diagram-consortium.org/downloads.html

PAGE, https://www.ebi.ac.uk/gwas/studies/

PLINK, https://www.cog-genomics.org/plink/

SDPRX, https://github.com/eldronzhou/SDPRX

Acknowledgments

This project was funded by the National Institute of Mental Health (NIMH) grants R01MH123184 (K.R.), R37MH057881 (B.D.), and R01MH128813 (to Joseph Buxbaum), and G.Z. and H.Z. were supported in part by NIH grant R01 HG012735. G.Z. and H.Z. conducted the comparisons using UK Biobank resources under an approved data request (ref. 29900). We thank the BBJ, CARDIoGRAMplusC4D, DIAGRAM, and PAGE consortia for making their GWAS summary data accessible.

Declaration of interests

The authors declare no competing interests.

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.xhgg.2024.100280.

Contributor Information

Tianyu Zhang, Email: tianyuz3@andrew.cmu.edu.

Bernie Devlin, Email: devlinbj@upmc.edu.

Supplemental information

Document S1. Supplemental results, Figures S1–S4, Table S1, and supplemental methods

mmc1.pdf (349.4 KB, pdf)

Document S2. Article plus supplemental information

mmc2.pdf (3.4 MB, pdf)

References

1. Wray N.R., Lee S.H., Mehta D., Vinkhuyzen A.A.E., Dudbridge F., Middeldorp C.M. Research review: polygenic methods and their application to psychiatric traits. JCPP (J. Child Psychol. Psychiatry) 2014;55:1068–1087. doi: 10.1111/jcpp.12295.
2. Choi S.W., Mak T.S.-H., O’Reilly P.F. Tutorial: a guide to performing polygenic risk score analyses. Nat. Protoc. 2020;15:2759–2772. doi: 10.1038/s41596-020-0353-1.
3. Lambert S.A., Gil L., Jupp S., Ritchie S.C., Xu Y., Buniello A., McMahon A., Abraham G., Chapman M., Parkinson H., et al. The polygenic score catalog as an open database for reproducibility and systematic evaluation. Nat. Genet. 2021;53:420–425. doi: 10.1038/s41588-021-00783-5.
4. Ni G., Zeng J., Revez J.A., Wang Y., Zheng Z., Ge T., Restuadi R., Kiewa J., Nyholt D.R., Coleman J.R.I., et al. A comparison of ten polygenic score methods for psychiatric disorders applied across multiple cohorts. Biol. Psychiatr. 2021;90:611–620. doi: 10.1016/j.biopsych.2021.04.018.
5. Lande R., Thompson R. Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics. 1990;124:743–756. doi: 10.1093/genetics/124.3.743.
6. Meuwissen T.H., Hayes B.J., Goddard M.E. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–1829. doi: 10.1093/genetics/157.4.1819.
7. International Schizophrenia Consortium, Purcell S.M., Wray N.R., Stone J.L., Visscher P.M., O'Donovan M.C., Sullivan P.F., Sklar P. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009;460:748–752. doi: 10.1038/nature08185.
8. Vilhjálmsson B.J., Yang J., Finucane H.K., Gusev A., Lindström S., Ripke S., Genovese G., Loh P.R., Bhatia G., Do R., et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 2015;97:576–592. doi: 10.1016/j.ajhg.2015.09.001.
9. Ge T., Chen C.-Y., Ni Y., Feng Y.A., Smoller J.W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat. Commun. 2019;10:1776. doi: 10.1038/s41467-019-09718-5.
10. Mak T.S.H., Porsch R.M., Choi S.W., Zhou X., Sham P.C. Polygenic scores via penalized regression on summary statistics. Genet. Epidemiol. 2017;41:469–480. doi: 10.1002/gepi.22050.
11. Oetjens M.T., Kelly M.A., Sturm A.C., Martin C.L., Ledbetter D.H. Quantifying the polygenic contribution to variable expressivity in eleven rare genetic disorders. Nat. Commun. 2019;10:4897–4910. doi: 10.1038/s41467-019-12869-0.
12. Privé F., Vilhjálmsson B.J., Aschard H., Blum M.G.B. Making the most of clumping and thresholding for polygenic scores. Am. J. Hum. Genet. 2019;105:1213–1221. doi: 10.1016/j.ajhg.2019.11.001.
13. Lambert S.A., Abraham G., Inouye M. Towards clinical utility of polygenic risk scores. Hum. Mol. Genet. 2019;28:R133–R142. doi: 10.1093/hmg/ddz187.
14. Aragam K.G., Natarajan P. Polygenic scores to assess atherosclerotic cardiovascular disease risk: clinical perspectives and basic implications. Circ. Res. 2020;126:1159–1177. doi: 10.1161/CIRCRESAHA.120.315928.
15. Ma Y., Zhou X. Genetic prediction of complex traits with polygenic scores: a statistical review. Trends Genet. 2021;37:995–1011. doi: 10.1016/j.tig.2021.06.004.
16. Mistry S., Harrison J.R., Smith D.J., Escott-Price V., Zammit S. The use of polygenic risk scores to identify phenotypes associated with genetic risk of schizophrenia: Systematic review. Schizophr. Res. 2018;197. doi: 10.1016/j.schres.2017.10.037.
17. Jansen P.R., Polderman T.J.C., Bolhuis K., van der Ende J., Jaddoe V.W.V., Verhulst F.C., White T., Posthuma D., Tiemeier H. Polygenic scores for schizophrenia and educational attainment are associated with behavioural problems in early childhood in the general population. JCPP (J. Child Psychol. Psychiatry) 2018;59:39–47. doi: 10.1111/jcpp.12759.
18. Fanous A.H., Zhou B., Aggen S.H., Bergen S.E., Amdur R.L., Duan J., Sanders A.R., Shi J., Mowry B.J., Olincy A., et al. Genome-wide association study of clinical dimensions of schizophrenia: polygenic effect on disorganized symptoms. Am. J. Psychiatr. 2012;169:1309–1317. doi: 10.1176/appi.ajp.2012.12020218.
19. Curtis D. Polygenic risk score for schizophrenia is more strongly associated with ancestry than with schizophrenia. Psychiatr. Genet. 2018;28:85–89. doi: 10.1097/YPG.0000000000000206.
20. Clarke S.L., Assimes T.L., Tcheandjieu C. The propagation of racial disparities in cardiovascular genomics research. Circ. Genom. Precis. Med. 2021;14. doi: 10.1161/CIRCGEN.121.003178.
21. Privé F., Aschard H., Carmi S., Folkersen L., Hoggart C., O’Reilly P.F., Vilhjálmsson B.J. Portability of 245 polygenic scores when derived from the UK Biobank and applied to 9 ancestry groups from the same cohort. Am. J. Hum. Genet. 2022;109:373–423. doi: 10.1016/j.ajhg.2022.01.007.
22. Yang S., Zhou X. PGS-server: accuracy, robustness and transferability of polygenic score methods for biobank scale studies. Briefings Bioinf. 2022;23:bbac039. doi: 10.1093/bib/bbac039.
23. Ruan Y., Lin Y.-F., Feng Y.C.A., Chen C.-Y., Lam M., Guo Z., Stanley Global Asia Initiatives, He L., Sawa A., Martin A.R., et al. Improving polygenic prediction in ancestrally diverse populations. Nat. Genet. 2022;54:573–580. doi: 10.1038/s41588-022-01054-7.
24. Martin A.R., Kanai M., Kamatani Y., Okada Y., Neale B.M., Daly M.J. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 2019;51:584–591. doi: 10.1038/s41588-019-0379-x.
25. Rosenberg N.A. A population-genetic perspective on the similarities and differences among worldwide human populations. Hum. Biol. 2011;83:659–684. doi: 10.3378/027.083.0601.
26. Biddanda A., Rice D.P., Novembre J. A variant-centric perspective on geographic patterns of human allele frequency variation. eLife. 2020;9. doi: 10.7554/eLife.60107.
27. Hou K., Ding Y., Xu Z., Wu Y., Bhattacharya A., Mester R., Belbin G.M., Buyske S., Conti D.V., Darst B.F., et al. Causal effects on complex traits are similar for common variants across segments of different continental ancestries within admixed individuals. Nat. Genet. 2023;55:549–558. doi: 10.1038/s41588-023-01338-6.
28. Kang J.T.L., Rosenberg N.A. Mathematical properties of linkage disequilibrium statistics defined by normalization of the coefficient D = pAB − pApB. Hum. Hered. 2019;84:127–143. doi: 10.1159/000504171.
29. Wang Y., Tsuo K., Kanai M., Neale B.M., Martin A.R. Challenges and opportunities for developing more generalizable polygenic risk scores. Annu. Rev. Biomed. Data Sci. 2022;5:293–320. doi: 10.1146/annurev-biodatasci-111721-074830.
30. Kachuri L., Chatterjee N., Hirbo J., Schaid D.J., Martin I., Kullo I.J., Kenny E.E., Pasaniuc B., Auer P.L., Conomos M.P., et al.; Polygenic Risk Methods in Diverse Populations (PRIMED) Consortium Methods Working Group. Principles and methods for transferring polygenic risk scores across global populations. Nat. Rev. Genet. 2023;25:8–25. doi: 10.1038/s41576-023-00637-2.
31. Hu Y., Lu Q., Powles R., Yao X., Yang C., Fang F., Xu X., Zhao H. Leveraging functional annotations in genetic risk prediction for human complex diseases. PLoS Comput. Biol. 2017;13. doi: 10.1371/journal.pcbi.1005589.
32. Lloyd-Jones L.R., Zeng J., Sidorenko J., Yengo L., Moser G., Kemper K.E., Wang H., Zheng Z., Magi R., Esko T., et al. Improved polygenic prediction by Bayesian multiple regression on summary statistics. Nat. Commun. 2019;10:5086–5111. doi: 10.1038/s41467-019-12653-0.
33. Amariuta T., Ishigaki K., Sugishita H., Ohta T., Koido M., Dey K.K., Matsuda K., Murakami Y., Price A.L., Kawakami E., et al. Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements. Nat. Genet. 2020;52:1346–1354. doi: 10.1038/s41588-020-00740-8.
34. Márquez-Luna C., Gazal S., Loh P.-R., Kim S.S., Furlotte N., Auton A., 23andMe Research Team, Price A.L. Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets. Nat. Commun. 2021;12:6052. doi: 10.1038/s41467-021-25171-9.
35. Zhou G., Chen T., Zhao H. SDPRX: A statistical method for cross-population prediction of complex traits. Am. J. Hum. Genet. 2023;110:13–22. doi: 10.1016/j.ajhg.2022.11.007.
36. Miao J., Guo H., Song G., Zhao Z., Hou L., Lu Q. Quantifying portable genetic effects and improving cross-ancestry genetic prediction with GWAS summary statistics. Nat. Commun. 2023;14:832. doi: 10.1038/s41467-023-36544-7.
37. Xiao J., Cai M., Hu X., Wan X., Chen G., Yang C. XPXP: improving polygenic prediction by cross-population and cross-phenotype analysis. Bioinformatics. 2022;38:1947–1955. doi: 10.1093/bioinformatics/btac029.
38. Weissbrod O., Kanai M., Shi H., Gazal S., Peyrot W.J., Khera A.V., Okada Y., Biobank Japan Project, Martin A.R., Finucane H.K., Price A.L. Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores. Nat. Genet. 2022;54:450–458. doi: 10.1038/s41588-022-01036-9.
39. Zhao Z., Fritsche L.G., Smith J.A., Mukherjee B., Lee S. The construction of cross-population polygenic risk scores using transfer learning. Am. J. Hum. Genet. 2022;109:1998–2008. doi: 10.1016/j.ajhg.2022.09.010.
40. Tibshirani R. Regression shrinkage and selection via the lasso: a retrospective. J. Roy. Stat. Soc. B. 2011;73:273–282.
41. Márquez-Luna C., Loh P.-R., South Asian Type 2 Diabetes (SAT2D) Consortium, SIGMA Type 2 Diabetes Consortium, Price A.L. Multiethnic polygenic risk scores improve risk prediction in diverse populations. Genet. Epidemiol. 2017;41:811–823. doi: 10.1002/gepi.22083.
42. Efron B. The estimation of prediction error: covariance penalties and cross-validation. J. Am. Stat. Assoc. 2004;99:619–632.
43. Berisa T., Pickrell J.K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics. 2016;32:283–285. doi: 10.1093/bioinformatics/btv546.
44. Falconer D.S. The inheritance of liability to certain diseases, estimated from the incidence among relatives. Ann. Hum. Genet. 1965;29:51–76.
45. Bycroft C., Freeman C., Petkova D., Band G., Elliott L.T., Sharp K., Motyer A., Vukcevic D., Delaneau O., O’Connell J., et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z.
46. Wojcik G.L., Graff M., Nishimura K.K., Tao R., Haessler J., Gignoux C.R., Highland H.M., Patel Y.M., Sorokin E.P., Avery C.L., et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature. 2019;570:514–518. doi: 10.1038/s41586-019-1310-4.
47. Scott R.A., Scott L.J., Mägi R., Marullo L., Gaulton K.J., Kaakinen M., Pervjakova N., Pers T.H., Johnson A.D., Eicher J.D., et al. An expanded genome-wide association study of type 2 diabetes in Europeans. Diabetes. 2017;66:2888–2902. doi: 10.2337/db16-1253.
48. Suzuki K., Akiyama M., Ishigaki K., Kanai M., Hosoe J., Shojima N., Hozawa A., Kadota A., Kuriki K., Naito M., et al. Identification of 28 new susceptibility loci for type 2 diabetes in the Japanese population. Nat. Genet. 2019;51:379–386. doi: 10.1038/s41588-018-0332-4.
49. Schunkert H., König I.R., Kathiresan S., Reilly M.P., Assimes T.L., Holm H., Preuss M., Stewart A.F.R., Barbalic M., Gieger C., et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat. Genet. 2011;43:333–338. doi: 10.1038/ng.784.
50. Koyama S., Ito K., Terao C., Akiyama M., Horikoshi M., Momozawa Y., Matsunaga H., Ieki H., Ozaki K., Onouchi Y., et al. Population-specific and trans-ancestry genome-wide analyses identify distinct and shared genetic risk loci for coronary artery disease. Nat. Genet. 2020;52:1169–1177. doi: 10.1038/s41588-020-0705-3.
51. Zhao Z., Yi Y., Song J., Wu Y., Zhong X., Lin Y., Hohman T.J., Fletcher J., Lu Q. PUMAS: fine-tuning polygenic risk scores with GWAS summary statistics. Genome Biol. 2021;22. doi: 10.1186/s13059-021-02479-9.
52. Crawford K. The Trouble with Bias. NeurIPS keynote; 2017.
53. Mastro A.M., Rozengurt E. Endogenous protein kinase in outer plasma membrane of cultured 3T3 cells. Nature of the membrane-bound substrate and effect of cell density, serum addition, and oncogenic transformation. J. Biol. Chem. 1976;251:7899–7906.
54. Chouldechova A. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data. 2017;5:153–163. doi: 10.1089/big.2016.0047.
55. Verma S., Rubin J. Fairness definitions explained. Proceedings of the International Workshop on Software Fairness. 2018; pp. 1–7.
56. Obermeyer Z., Powers B., Vogeli C., Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366:447–453. doi: 10.1126/science.aax2342.
57. Gara M.A., Minsky S., Silverstein S.M., Miskimen T., Strakowski S.M. A naturalistic study of racial disparities in diagnoses at an outpatient behavioral health clinic. Psychiatr. Serv. 2019;70:130–134. doi: 10.1176/appi.ps.201800223.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Document S1. Supplemental results, Figures S1–S4, Table S1, and supplemental methods

mmc1.pdf^{(349.4KB, pdf)}

Document S2. Article plus supplemental information

mmc2.pdf^{(3.4MB, pdf)}

Data Availability Statement

The algorithm code of the proposed JLS method and synthetic parameter tuning is available at https://github.com/terrytianyuzhang/JointLassosum/.

[bib1] 1.Wray N.R., Lee S.H., Mehta D., Vinkhuyzen A.A.E., Dudbridge F., Middeldorp C.M. Research review: polygenic methods and their application to psychiatric traits. JCPP (J. Child Psychol. Psychiatry) 2014;55:1068–1087. doi: 10.1111/jcpp.12295. [DOI] [PubMed] [Google Scholar]

[bib2] 2.Choi S.W., Mak T.S.-H., O’Reilly P.F. Tutorial: a guide to performing polygenic risk score analyses. Nat. Protoc. 2020;15:2759–2772. doi: 10.1038/s41596-020-0353-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib3] 3.Lambert S.A., Gil L., Jupp S., Ritchie S.C., Xu Y., Buniello A., McMahon A., Abraham G., Chapman M., Parkinson H., et al. The polygenic score catalog as an open database for reproducibility and systematic evaluation. Nat. Genet. 2021;53:420–425. doi: 10.1038/s41588-021-00783-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib4] 4.Ni G., Zeng J., Revez J.A., Wang Y., Zheng Z., Ge T., Restuadi R., Kiewa J., Nyholt D.R., Coleman J.R.I., et al. A comparison of ten polygenic score methods for psychiatric disorders applied across multiple cohorts. Biol. Psychiatr. 2021;90:611–620. doi: 10.1016/j.biopsych.2021.04.018. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib5] 5.Lande R., Thompson R. Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics. 1990;124:743–756. doi: 10.1093/genetics/124.3.743. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib6] 6.Meuwissen T.H., Hayes B.J., Goddard M.E. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–1829. doi: 10.1093/genetics/157.4.1819. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib7] 7.International Schizophrenia Consortium. Purcell S.M., Wray N.R., Stone J.L., Visscher P.M., O'Donovan M.C., Sullivan P.F., Sklar P. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009;460:748–752. doi: 10.1038/nature08185. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib8] 8.Vilhjálmsson B.J., Yang J., Finucane H.K., Gusev A., Lindström S., Ripke S., Genovese G., Loh P.R., Bhatia G., Do R., et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 2015;97:576–592. doi: 10.1016/j.ajhg.2015.09.001. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib9] 9.Tian G., Chen C.-Y., Ni Y., Feng Y.A., Smoller J.W. Polygenic prediction via bayesian regression and continuous shrinkage priors. Nat. Commun. 2019;10:1776. doi: 10.1038/s41467-019-09718-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib10] 10.Mak T.S.H., Porsch R.M., Choi S.W., Zhou X., Sham P.C. Polygenic scores via penalized regression on summary statistics. Genet. Epidemiol. 2017;41:469–480. doi: 10.1002/gepi.22050. [DOI] [PubMed] [Google Scholar]

[bib11] 11.Oetjens M.T., Kelly M.A., Sturm A.C., Martin C.L., Ledbetter D.H. Quantifying the polygenic contribution to variable expressivity in eleven rare genetic disorders. Nat. Commun. 2019;10:4897–4910. doi: 10.1038/s41467-019-12869-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib12] 12.Privé F., Vilhjálmsson B.J., Aschard H., Blum M.G.B. Making the most of clumping and thresholding for polygenic scores. Am. J. Hum. Genet. 12 2019;105:1213–1221. doi: 10.1016/j.ajhg.2019.11.001. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib13] 13.Lambert S.A., Abraham G., Inouye M. Towards clinical utility of polygenic risk scores. Hum. Mol. Genet. 2019;28:R133–R142. doi: 10.1093/hmg/ddz187. [DOI] [PubMed] [Google Scholar]

[bib14] 14.Aragam K.G., Natarajan P. Polygenic scores to assess atherosclerotic cardiovascular disease risk: clinical perspectives and basic implications. Circ. Res. 2020;126:1159–1177. doi: 10.1161/CIRCRESAHA.120.315928. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib15] 15.Ma Y., Zhou X. Genetic prediction of complex traits with polygenic scores: a statistical review. Trends Genet. 2021;37:995–1011. doi: 10.1016/j.tig.2021.06.004. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib16] 16.Mistry S., Judith R., Harrison D.J.S., Escott-Price V., Zammit S. The use of polygenic risk scores to identify phenotypes associated with genetic risk of schizophrenia: Systematic review. Schizophr. Res. 2018;197 doi: 10.1016/j.schres.2017.10.037. [DOI] [PubMed] [Google Scholar]

[bib17] 17.Jansen P.R., Polderman T.J.C., Bolhuis K., van der Ende J., Jaddoe V.W.V., Verhulst F.C., White T., Posthuma D., Tiemeier H. Polygenic scores for schizophrenia and educational attainment are associated with behavioural problems in early childhood in the general population. JCPP (J. Child Psychol. Psychiatry) 2018;59:39–47. doi: 10.1111/jcpp.12759. [DOI] [PubMed] [Google Scholar]

[bib18] 18.Fanous A.H., Zhou B., Aggen S.H., Bergen S.E., Amdur R.L., Duan J., Sanders A.R., Shi J., Mowry B.J., Olincy A., et al. Genome-wide association study of clinical dimensions of schizophrenia: polygenic effect on disorganized symptoms. Am. J. Psychiatr. 2012;169:1309–1317. doi: 10.1176/appi.ajp.2012.12020218. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib19] 19.Curtis D. Polygenic risk score for schizophrenia is more strongly associated with ancestry than with schizophrenia. Psychiatr. Genet. 2018;28:85–89. doi: 10.1097/YPG.0000000000000206. [DOI] [PubMed] [Google Scholar]

[bib20] 20.Clarke S.L., Assimes T.L., Tcheandjieu C. The propagation of racial disparities in cardiovascular genomics research. Circ. Genom. Precis. Med. 2021;14 doi: 10.1161/CIRCGEN.121.003178. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib21] 21.Privé F., Aschard H., Carmi S., Folkersen L., Hoggart C., O’Reilly P.F., Vilhjálmsson B.J. Portability of 245 polygenic scores when derived from the uk biobank and applied to 9 ancestry groups from the same cohort. Am. J. Hum. Genet. 2022;109:373–423. doi: 10.1016/j.ajhg.2022.01.007. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib22] 22.Yang S., Zhou X. Pgs-server: accuracy, robustness and transferability of polygenic score methods for biobank scale studies. Briefings Bioinf. 2022;23:bbac039. doi: 10.1093/bib/bbac039. [DOI] [PubMed] [Google Scholar]

[bib23] 23.Ruan Y., Lin Y.-F., Feng Y.C.A., Chen C.-Y., Lam M., Guo Z., Stanley Global Asia Initiatives. He L., Sawa A., Martin A.R., et al. Improving polygenic prediction in ancestrally diverse populations. Nat. Genet. 2022;54:573–580. doi: 10.1038/s41588-022-01054-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib24] 24.Martin A.R., Kanai M., Kamatani Y., Okada Y., Neale B.M., Daly M.J. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 2019;51:584–591. doi: 10.1038/s41588-019-0379-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib25] 25.Rosenberg N.A. A population-genetic perspective on the similarities and differences among worldwide human populations. Hum. Biol. 2011;83:659–684. doi: 10.3378/027.083.0601. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib26] 26.Biddanda A., Rice D.P., Novembre J. A variant-centric perspective on geographic patterns of human allele frequency variation. Elife. 2020;9 doi: 10.7554/eLife.60107. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib27] 27.Hou K., Ding Y., Xu Z., Wu Y., Bhattacharya A., Mester R., Belbin G.M., Buyske S., Conti D.V., Darst B.F., et al. Causal effects on complex traits are similar for common variants across segments of different continental ancestries within admixed individuals. Nat. Genet. 2023;55:549–558. doi: 10.1038/s41588-023-01338-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib28] 28.Kang J.T.L., Rosenberg N.A. Mathematical properties of linkage disequilibrium statistics defined by normalization of the coefficient d = pab - papb. Hum. Hered. 2019;84:127–143. doi: 10.1159/000504171. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib29] 29.Wang Y., Tsuo K., Kanai M., Neale B.M., Martin A.R. Challenges and opportunities for developing more generalizable polygenic risk scores. Annu. Rev. Biomed. Data Sci. 2022;5:293–320. doi: 10.1146/annurev-biodatasci-111721-074830. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib30] 30.Kachuri L., Chatterjee N., Hirbo J., Schaid D.J., Martin I., Kullo I.J., Kenny E.E., Pasaniuc B., Auer P.L., Conomos M.P., et al. Tian Ge, and Polygenic Risk Methods in Diverse Populations (PRIMED) Consortium Methods Working Group. Principles and methods for transferring polygenic risk scores across global populations. Nat. Rev. Genet. 2023;25:8–25. doi: 10.1038/s41576-023-00637-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib31] 31.Hu Y., Lu Q., Powles R., Yao X., Yang C., Fang F., Xu X., Zhao H. Leveraging functional annotations in genetic risk prediction for human complex diseases. PLoS Comput. Biol. 2017;13 doi: 10.1371/journal.pcbi.1005589. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib32] 32.Lloyd-Jones L.R., Zeng J., Sidorenko J., Yengo L., Moser G., Kemper K.E., Wang H., Zheng Z., Magi R., Esko T., et al. Improved polygenic prediction by bayesian multiple regression on summary statistics. Nat. Commun. 2019;10:5086–5111. doi: 10.1038/s41467-019-12653-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib33] 33.Amariuta T., Ishigaki K., Sugishita H., Ohta T., Koido M., Kushal K., Dey, Matsuda K., Murakami Y., Price A.L., Kawakami E., et al. Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements. Nat. Genet. 2020;52:1346–1354. doi: 10.1038/s41588-020-00740-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib34] 34.Márquez-Luna C., Gazal S., Loh P.-R., Kim S.S., Furlotte N., Auton A., 23andMe Research Team. Price A.L. Incorporating functional priors improves polygenic prediction accuracy in uk biobank and 23andme data sets. Nat. Commun. 10 2021;12:6052. doi: 10.1038/s41467-021-25171-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib35] 35.Zhou G., Chen T., Zhao H. Sdprx: A statistical method for cross-population prediction of complex traits. Am. J. Hum. Genet. 2023;110:13–22. doi: 10.1016/j.ajhg.2022.11.007. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib36] 36.Miao J., Guo H., Song G., Zhao Z., Hou L., Lu Q. Quantifying portable genetic effects and improving cross-ancestry genetic prediction with gwas summary statistics. Nat. Commun. 2023;14:832. doi: 10.1038/s41467-023-36544-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib37] 37.Xiao J., Cai M., Hu X., Wan X., Chen G., Yang C. Xpxp: improving polygenic prediction by cross-population and cross-phenotype analysis. Bioinformatics. 2022;38:1947–1955. doi: 10.1093/bioinformatics/btac029. [DOI] [PubMed] [Google Scholar]

38. Weissbrod O., Kanai M., Shi H., Gazal S., Peyrot W.J., Khera A.V., Okada Y., Biobank Japan Project, Martin A.R., Finucane H.K., Price A.L. Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores. Nat. Genet. 2022;54:450–458. doi: 10.1038/s41588-022-01036-9.

39. Zhao Z., Fritsche L.G., Smith J.A., Mukherjee B., Lee S. The construction of cross-population polygenic risk scores using transfer learning. Am. J. Hum. Genet. 2022;109:1998–2008. doi: 10.1016/j.ajhg.2022.09.010.

40. Tibshirani R. Regression shrinkage and selection via the lasso: a retrospective. J. Roy. Stat. Soc. B. 2011;73:273–282.

41. Márquez-Luna C., Loh P.-R., South Asian Type 2 Diabetes (SAT2D) Consortium, SIGMA Type 2 Diabetes Consortium, Price A.L. Multiethnic polygenic risk scores improve risk prediction in diverse populations. Genet. Epidemiol. 2017;41:811–823. doi: 10.1002/gepi.22083.

42. Efron B. The estimation of prediction error: covariance penalties and cross-validation. J. Am. Stat. Assoc. 2004;99:619–632.

43. Berisa T., Pickrell J.K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics. 2016;32:283–285. doi: 10.1093/bioinformatics/btv546.

44. Falconer D.S. The inheritance of liability to certain diseases, estimated from the incidence among relatives. Ann. Hum. Genet. 1965;29:51–76.

45. Bycroft C., Freeman C., Petkova D., Band G., Elliott L.T., Sharp K., Motyer A., Vukcevic D., Delaneau O., O’Connell J., et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z.

46. Wojcik G.L., Graff M., Nishimura K.K., Tao R., Haessler J., Gignoux C.R., Highland H.M., Patel Y.M., Sorokin E.P., Avery C.L., et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature. 2019;570:514–518. doi: 10.1038/s41586-019-1310-4.

47. Scott R.A., Scott L.J., Mägi R., Marullo L., Gaulton K.J., Kaakinen M., Pervjakova N., Pers T.H., Johnson A.D., Eicher J.D., et al. An expanded genome-wide association study of type 2 diabetes in Europeans. Diabetes. 2017;66:2888–2902. doi: 10.2337/db16-1253.

48. Suzuki K., Akiyama M., Ishigaki K., Kanai M., Hosoe J., Shojima N., Hozawa A., Kadota A., Kuriki K., Naito M., et al. Identification of 28 new susceptibility loci for type 2 diabetes in the Japanese population. Nat. Genet. 2019;51:379–386. doi: 10.1038/s41588-018-0332-4.

49. Schunkert H., König I.R., Kathiresan S., Reilly M.P., Assimes T.L., Holm H., Preuss M., Stewart A.F.R., Barbalic M., Gieger C., et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat. Genet. 2011;43:333–338. doi: 10.1038/ng.784.

50. Koyama S., Ito K., Terao C., Akiyama M., Horikoshi M., Momozawa Y., Matsunaga H., Ieki H., Ozaki K., Onouchi Y., et al. Population-specific and trans-ancestry genome-wide analyses identify distinct and shared genetic risk loci for coronary artery disease. Nat. Genet. 2020;52:1169–1177. doi: 10.1038/s41588-020-0705-3.

51. Zhao Z., Yi Y., Song J., Wu Y., Zhong X., Lin Y., Hohman T.J., Fletcher J., Lu Q. PUMAS: fine-tuning polygenic risk scores with GWAS summary statistics. Genome Biol. 2021;22. doi: 10.1186/s13059-021-02479-9.

52. Crawford K. NeurIPS keynote; 2017. The Trouble with Bias.

53. Mastro A.M., Rozengurt E. Endogenous protein kinase in outer plasma membrane of cultured 3T3 cells. Nature of the membrane-bound substrate and effect of cell density, serum addition, and oncogenic transformation. J. Biol. Chem. 1976;251:7899–7906.

54. Chouldechova A. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data. 2017;5:153–163. doi: 10.1089/big.2016.0047.

55. Verma S., Rubin J. Proceedings of the International Workshop on Software Fairness. 2018. Fairness definitions explained; pp. 1–7.

56. Obermeyer Z., Powers B., Vogeli C., Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366:447–453. doi: 10.1126/science.aax2342.

57. Gara M.A., Minsky S., Silverstein S.M., Miskimen T., Strakowski S.M. A naturalistic study of racial disparities in diagnoses at an outpatient behavioral health clinic. Psychiatr. Serv. 2019;70:130–134. doi: 10.1176/appi.ps.201800223.
