Evaluating Performance and Agreement of Coronary Heart Disease Polygenic Risk Scores

Sarah A Abramowitz; Kristin Boulier; Karl Keat; Katie M Cardone; Manu Shivakumar; John DePaolo; Renae Judy; Francisca Bermudez; Nour Mimouni; Christopher Neylan; Dokyoon Kim; Daniel J Rader; Marylyn D Ritchie; Benjamin F Voight; Bogdan Pasaniuc; Michael G Levin; Scott M Damrauer

doi:10.1001/jama.2024.23784

JAMA. 2024 Nov 16;333(1):60–70. doi: 10.1001/jama.2024.23784

Sarah A Abramowitz ¹,², Kristin Boulier ³, Karl Keat ⁴, Katie M Cardone ⁵, Manu Shivakumar ⁴, John DePaolo ¹, Renae Judy ¹, Francisca Bermudez ¹, Nour Mimouni ¹, Christopher Neylan ¹, Dokyoon Kim ⁶, Daniel J Rader ⁵,⁷, Marylyn D Ritchie ⁵,⁶, Benjamin F Voight ⁵,⁶,⁷,⁸, Bogdan Pasaniuc ³, Michael G Levin ⁷,⁹,¹⁰,¹¹, Scott M Damrauer ¹,⁵,⁶,⁷,⁹,¹⁰,✉, for the Penn Medicine BioBank

¹Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia

²Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York

³Department of Computational Medicine, University of California, Los Angeles

⁴Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia

⁵Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia

⁶Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia

⁷Institute of Translational Medicine and Therapeutics, University of Pennsylvania Perelman School of Medicine, Philadelphia

⁸Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania Perelman School of Medicine, Philadelphia

⁹Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia

¹⁰Corporal Michael J. Crescenz VA Medical Center, Philadelphia, Pennsylvania

¹¹Division of Cardiovascular Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia

Group Information: Penn Medicine BioBank members appear in Supplement 3.

Accepted for Publication: October 23, 2024.

Published Online: November 16, 2024. doi:10.1001/jama.2024.23784

✉ Corresponding Author: Scott M. Damrauer, MD, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd, Perelman Center for Advanced Medicine 14 South Tower, Philadelphia, PA 19104 (Scott.Damrauer@pennmedicine.upenn.edu).

Author Contributions: Ms Abramowitz had full access to Penn Medicine BioBank and All of Us Research Program data used in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Dr Boulier had full access to University of California, Los Angeles (UCLA) ATLAS Precision Health Biobank data used in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Drs Levin and Damrauer jointly supervised this work.

Concept and design: Abramowitz, DePaolo, Ritchie, Pasaniuc, Levin, Damrauer.

Acquisition, analysis, or interpretation of data: All authors.

Drafting of the manuscript: Abramowitz, Bermudez, Pasaniuc, Levin, Damrauer.

Critical review of the manuscript for important intellectual content: Abramowitz, Boulier, Keat, Cardone, Shivakumar, DePaolo, Judy, Mimouni, Neylan, Kim, Rader, Ritchie, Voight, Pasaniuc, Levin, Damrauer.

Statistical analysis: Abramowitz, Boulier, Keat, Cardone, Shivakumar, DePaolo, Bermudez, Kim, Voight, Pasaniuc, Levin, Damrauer.

Obtained funding: Abramowitz, Damrauer.

Administrative, technical, or material support: Abramowitz, Keat, DePaolo, Judy, Mimouni, Neylan, Rader, Ritchie, Voight, Pasaniuc, Levin, Damrauer.

Supervision: Abramowitz, Neylan, Ritchie, Levin, Damrauer.

Conflict of Interest Disclosures: Dr Ritchie has a patent for genetic risk prediction model for primary open-angle glaucoma pending for the University of Pennsylvania. Dr Voight received fees for services provided as a statistical reviewer for JAMA Network Open. Drs Rader and Levin received research funding to the institution from Myome to study a coronary heart disease polygenic risk score, unrelated to this study. Dr Damrauer reported receiving grants from the National Heart, Lung, and Blood Institute, Veterans Affairs Office of Research and Development, and RenalytixAI, and nonfinancial support from Novo Nordisk and Amgen, all outside the scope of the study, during the conduct of the study; in addition, Dr Damrauer has a patent for genetic risk prediction for venous thromboembolic disease and for the use of PDE3B inhibition for preventing cardiovascular disease filed by the US Department of Veterans Affairs in accordance with federal regulatory requirements. No other disclosures were reported.

Funding/Support: This study was supported by the National Heart, Lung, and Blood Institute (HL169458). Ms Abramowitz and Ms Bermudez were supported by the Sarnoff Cardiovascular Research Foundation. Dr Boulier was supported by a National Institutes of Health T32 award. Dr DePaolo was supported by the American Heart Association (23POST1011251). Dr Neylan was supported by the Institute for Translational Medicine and Therapeutics of the Perelman School of Medicine at University of Pennsylvania, the National Center for Advancing Translational Sciences (TL1TR001880), and the American Society of Colon and Rectal Surgeons. Dr Voight was supported by the National Institute of Diabetes and Digestive and Kidney Diseases (DK126194, DK138512). Dr Levin was supported by the Institute for Translational Medicine and Therapeutics of the Perelman School of Medicine at University of Pennsylvania, the National Heart, Lung, and Blood Institute (T32HL007843), the Measey Foundation, the Doris Duke Foundation (award 2023-0224), and US Department of Veterans Affairs (IK2-BX006551). The All of Us Research Program is supported by the National Institutes of Health, Office of the Director: Regional Medical Centers: 1 OT2 OD026549; 1 OT2 OD026554; 1 OT2 OD026557; 1 OT2 OD026556; 1 OT2 OD026550; 1 OT2 OD 026552; 1 OT2 OD026553; 1 OT2 OD026548; 1 OT2 OD026551; 1 OT2 OD026555; interagency agreement: AOD16037; Federally Qualified Health Centers: HHSN 263201600085U; Data and Research Center: 5 U2C 9 OD023196; Biobank: 1 U24 OD023121; the Participant Center: U24 OD023176; Participant Technology Systems Center: 1 U24 OD023163; Communications and Engagement: 3 OT2 OD023205; 3 OT2 OD023206; and Community Partners: 1 OT2 OD025277; 3 OT2 OD025315; 1 OT2 OD025337; 1 OT2 OD025276. The Penn Medicine BioBank is supported by the Perelman School of Medicine at University of Pennsylvania, a gift from the Smilow family, and the National Center for Advancing Translational Sciences (UL1TR001878). 
The UCLA ATLAS Precision Health Biobank is supported by the National Center for Advancing Translational Sciences (UL1TR001881).

Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Group Information: The Penn Medicine Biobank members appear in Supplement 3.

Disclaimer: This article does not represent the views of the Department of Veterans Affairs or the US government.

Meeting Presentation: This study was presented at the American Heart Association Scientific Sessions 2024; November 16, 2024; Chicago, Illinois.

Data Sharing Statement: See Supplement 4.

Additional Contributions: We acknowledge the All of Us Research Program and its participants. We acknowledge the Penn Medicine BioBank (PMBB) for providing data and thank the patient-participants of Penn Medicine who consented to participate in this research program. We thank the Penn Medicine BioBank team and Regeneron Genetics Center for providing genetic variant data for analysis. We acknowledge the UCLA Institute for Precision Health, participating patients from the UCLA ATLAS Precision Health Biobank, UCLA David Geffen School of Medicine, UCLA Clinical and Translational Science Institute, and UCLA Health. We thank the participants and researchers of FinnGen, China Kadoorie Biobank, BioBank Japan, Mass General Brigham Biobank, CardiogramplusC4D, the UK Biobank, and all other biobanks and consortia whose available genome-wide association study data were used for this study. Data from the VA Million Veteran Program (MVP) were obtained from dbGaP (accession phs001672.v11.p1) under project No. 33458; we thank the MVP staff, researchers, and volunteers, who have contributed to MVP, and especially participants who previously served their country in the military and agreed to enroll in the study. Additionally, much of this work leveraged the robust tools and platform provided by the PGS Catalog, and we thank its creators as well as the researchers who have deposited their scores in the Catalog.

PMCID: PMC11569413 PMID: 39549270

Key Points

Question

Is prediction of an individual’s genetic risk for coronary heart disease (CHD) consistent across polygenic risk scores (PRSs) that perform equivalently at a population level?

Findings

Forty-six scores with similar population-level performance produced discordant individual-level estimates of risk in the All of Us Research Program. Twenty percent of participants had PRSs in both the top and bottom 5th percentile at least once across the 46 nearly equivalent PRSs.

Meaning

CHD PRSs with similar population-level performance characteristics may not provide consistent individual risk estimates. Personalized applications and clinical interpretations of CHD PRSs should consider the uncertainty of risk predictions at the individual level.

Abstract

Importance

Polygenic risk scores (PRSs) for coronary heart disease (CHD) are a growing clinical and commercial reality. Whether existing scores provide similar individual-level assessments of disease susceptibility remains incompletely characterized.

Objective

To characterize the individual-level agreement of CHD PRSs that perform similarly at the population level.

Design, Setting, and Participants

Cross-sectional study of participants from diverse backgrounds enrolled in the All of Us Research Program (AOU), Penn Medicine BioBank (PMBB), and University of California, Los Angeles (UCLA) ATLAS Precision Health Biobank with electronic health record and genotyping data.

Exposures

Polygenic risk for CHD from published PRSs and new PRSs developed separately from testing samples.

Main Outcomes and Measures

PRSs that performed population-level prediction similarly were identified by comparing calibration and discrimination of models of prevalent CHD. Individual-level agreement was tested with intraclass correlation coefficient (ICC) and Light κ.

Results

A total of 48 PRSs were calculated for 171 095 AOU participants. The mean (SD) age was 56.4 (16.8) years. A total of 104 947 participants (61.3%) were female. A total of 35 590 participants (20.8%) were most genetically similar to an African reference population, 29 801 (17.4%) to an admixed American reference population, 100 493 (58.7%) to a European reference population, and the remaining to Central/South Asian, East Asian, and Middle Eastern reference populations. There were 17 589 participants (10.3%) with and 153 506 participants without (89.7%) CHD. When included in a model of prevalent CHD, 46 scores had practically equivalent Brier scores and area under the receiver operator curves (region of practical equivalence ±0.02). Twenty percent of participants had at least 1 score in both the top and bottom 5% of risk. Continuous agreement of individual predictions was poor (ICC, 0.373 [95% CI, 0.372-0.375]). Light κ, used to evaluate consistency of risk assignment, did not exceed 0.56. Analysis among 41 193 PMBB and 53 092 ATLAS participants yielded different sets of equivalent scores, which also lacked individual-level agreement.

Conclusions and Relevance

CHD PRSs that performed similarly at the population level demonstrated highly variable individual-level estimates of risk. Recognizing that CHD PRSs may generate incongruent individual-level risk estimates, effective clinical implementation will require refined statistical methods to quantify uncertainty and new strategies to communicate this uncertainty to patients and clinicians.

This study aims to characterize the individual-level agreement of polygenic risk scores for coronary heart disease that perform similarly at the population level.

Introduction

Polygenic risk scores (PRSs) have been proposed as a tool to improve prevention and treatment of coronary heart disease (CHD).¹ PRSs aggregate the effects of risk variants to summarize the genetic component of an individual’s disease risk. Advocates suggest that PRSs have the potential to enable early identification of individuals at increased risk for CHD and facilitate implementation of focused primary prevention as part of precision cardiovascular medicine initiatives.² Applications include combining PRSs with clinical variables into comprehensive risk models, considering a PRS as a risk enhancer to be applied holistically to individuals at borderline risk, or considering a PRS risk estimate as a stand-alone test.^3,4

Dozens of CHD PRSs have been deposited in the Polygenic Score (PGS) Catalog, which seeks to standardize and improve the reporting of PRSs.⁵ Proprietary commercial scores, for which the underlying genetic association data, methodology, and weights are variably reported, are also being marketed.

Consensus recommendations from experts provide a framework to assess how scores estimate risk at the population level, and how scores can be compared with each other.⁶ Based largely on the results of these population-level analyses, multiple CHD PRSs have begun to move toward clinical implementation.⁷ Because there is no criterion standard approach to calculate an individual’s genetic risk, it is not possible to characterize how close a PRS estimate is to the true value (accuracy). In the absence of a criterion standard with which to calculate accuracy, tests instead may be evaluated for the agreement of results with those of a nonreference standard as a test of precision.⁸

To compare the consistency of risk assessment between PRSs, the current study sought to evaluate the population-level performance of available CHD PRSs, and assess individual-level agreement of risk estimates among the set of scores that performed similarly at the population level.

Methods

Study Population

The primary analysis was conducted using data from the All of Us (AOU) Research Program, a National Institutes of Health–funded biobank composed of adult volunteers across the United States who gave written consent for analysis of deidentified electronic health record and genetic data.⁹ All analyses were performed on deidentified data from All of Us Controlled Tier Dataset v7 by authorized researchers; the current analysis was deemed exempt from institutional review board (IRB) review by the University of Pennsylvania IRB. For each participant, genetic sex was inferred based on heterozygosity of the X chromosome using PLINK.¹⁰ A binary CHD phenotype was assigned in the presence of at least 1 of the following codes: 410, 411, 412, 413, 41, V45.81 (International Classification of Diseases, Ninth Revision), I21, I22, I24, Z95.1, Z98.61, or I20.0 (International Statistical Classification of Diseases and Related Health Problems, Tenth Revision). Age was calculated using participant birth year and the most recent data release cutoff date. Based on the National Academies of Sciences, Engineering, and Medicine guidelines regarding the use of race, ethnicity, and ancestry as population descriptors in genomics research, population group membership was assessed based on genetic similarity to established, publicly available reference populations.¹¹ Individuals were assigned to 1 of 6 population groups based on their genetic similarity to reference populations included in the 1000 Genomes and Human Genome Diversity Project (HGDP) reference panel using pgsc_calc v2.0.0-alpha.2 (eMethods in Supplement 1).¹² Individuals without electronic health record data were excluded. Similar approaches were used for the Penn Medicine BioBank (PMBB) and University of California, Los Angeles (UCLA) ATLAS Precision Health Biobank (ATLAS) (eMethods in Supplement 1).^13,14 PMBB was approved by the University of Pennsylvania IRB; the UCLA ATLAS Precision Health Biobank was approved by the UCLA IRB.

Selection and Creation of PRSs

PRSs for CHD were selected from the PGS Catalog (eMethods in Supplement 1).⁵ In addition, 2 novel CHD PRSs were created (eMethods and eFigure 1 in Supplement 1, eTable 1 in Supplement 2). Scores are referred to by their catalog-designated identifiers. Neither of the created scores had AOU, PMBB, or ATLAS data used in their construction.

Calculation of PRSs

Heterogeneity of allele frequencies and linkage disequilibrium patterns across populations can influence raw PRS distributions, limiting interpretation and generalizability.^4,15 To minimize these effects, scores were adjusted using a principal component analysis (PCA)–based method to normalize both mean and variance to the 1000 Genomes and HGDP reference panel (eMethods in Supplement 1).⁴ All downstream analyses used PCA-normalized scores. These values were transformed into risk percentiles based on a standard normal distribution, facilitating standardized interpretation of scores across all samples and analyses.
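The percentile transformation described above can be sketched as follows. This is an illustration only: it assumes the PCA-normalized score is already approximately standard normal, and the function name `risk_percentile` is hypothetical rather than taken from the study's code.

```python
from statistics import NormalDist

# Assumption: after PCA-based normalization, each score is ~N(0, 1),
# so the standard normal CDF maps it onto a 0-100 percentile scale.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def risk_percentile(normalized_score: float) -> float:
    """Convert a normalized PRS value to a risk percentile (0-100)."""
    return 100.0 * STANDARD_NORMAL.cdf(normalized_score)


if __name__ == "__main__":
    for z in (-1.645, 0.0, 1.645):
        print(f"normalized score {z:+.3f} -> percentile {risk_percentile(z):.1f}")
```

Under this convention, a normalized score of 0 sits at the 50th percentile, and a value near 1.645 sits near the 95th percentile, the cutoff for the top 5% of risk.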

PRSs are built from the results of genome-wide association studies (GWASs), which identify genomic variants statistically associated with the disease of interest. PRS performance in diverse populations depends on representation of genetically similar individuals in the GWAS used to construct the PRS.^16,17,18,19 CHD PRSs vary widely in their use of diverse GWAS data. To characterize the impact of genetic background on PRS performance, sensitivity analyses were conducted that stratified individuals into population subgroups.

Population-Level Assessment: Identification of Similarly Performing PRSs

The association between each PRS and prevalent CHD was assessed using generalized linear regression models with a logit link, including age and sex as covariates. These covariates were included to minimize ascertainment bias. Genetic principal components were not included in the primary analysis model on the premise that they are already accounted for by PCA-based score normalization. Inclusion of PCs, consideration of 10-year age groupings as factors, addition of an age-sex interaction term, as well as exclusion of all covariates were evaluated as sensitivity analyses. As an initial quality-control step, scores without a significant (P < .05) positive association with CHD were excluded. The approach to risk model evaluation considered published reporting standards^6,20: risk score distribution was assessed visually, PRS effect size was quantified with odds ratios, discrimination and calibration were visualized with calibration plots and receiver operator curves, and quantified using Brier score (mean squared error for binary classification) and area under the receiver operator curve (AUROC). Bayesian methods were used to perform systematic between-model comparisons. First, the score with the lowest absolute model Brier score or highest absolute model AUROC was selected as the reference. Next, the difference between the Brier score/AUROC of each model compared with the reference PRS was evaluated using bayesian analysis of variance and a determination made for whether the PRS performed equivalently to the reference.
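The two population-level metrics named above can be illustrated with a small, dependency-free sketch. The predicted probabilities are assumed to come from an already fitted logistic model (PRS, age, and sex as predictors); the toy values and function names are hypothetical, and the pairwise AUROC loop is written for clarity rather than for biobank-scale data.

```python
def brier_score(outcomes, predicted):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, predicted)) / len(outcomes)


def auroc(outcomes, predicted):
    """Probability that a random case is ranked above a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [p for y, p in zip(outcomes, predicted) if y == 1]
    controls = [p for y, p in zip(outcomes, predicted) if y == 0]
    wins = 0.0
    for p_case in cases:
        for p_control in controls:
            if p_case > p_control:
                wins += 1.0
            elif p_case == p_control:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Toy data: 1 = prevalent CHD, 0 = no CHD; probabilities are made up.
    y = [0, 0, 1, 1]
    p = [0.10, 0.40, 0.35, 0.80]
    print("Brier score:", brier_score(y, p))
    print("AUROC:", auroc(y, p))
```

A lower Brier score indicates better overall accuracy and calibration of the predicted probabilities, whereas a higher AUROC indicates better separation of cases from controls.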

In the absence of a criterion standard approach for interpreting probabilities in this application, multiple definitions of equivalent were considered. Statistically equivalent scores were defined as those with a less than 95% probability of a difference of both Brier score and AUROC (eMethods and eFigure 2 in Supplement 1). Noting that model performance differences may be statistically distinguishable but of minor clinical importance, score pairings were also tested for the probability of the difference falling outside the bounds of a prespecified range (referred to as a region of practical equivalence, or ROPE). Scores where the difference relative to the reference fell within the ROPE boundary were considered practically equivalent. An a priori practical effect size margin was set at ±0.02. In sensitivity analyses, margins of ±0.01 and ±0.005 were also considered. Practically equivalent scores were those with a more than 95% probability of model performance being practically equivalent at a given practical effect size margin (eMethods and eFigure 2 in Supplement 1).²¹
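The practical-equivalence rule can be sketched as a simple check on posterior draws. The sketch assumes that draws of the difference in a metric (score minus reference) are already available from the bayesian model; how those draws are generated is not shown, and the function names and toy values are hypothetical.

```python
def prob_within_rope(difference_draws, margin):
    """Share of posterior draws of the metric difference that fall inside [-margin, +margin]."""
    inside = sum(1 for d in difference_draws if -margin <= d <= margin)
    return inside / len(difference_draws)


def practically_equivalent(difference_draws, margin=0.02, threshold=0.95):
    """True when more than `threshold` of the posterior mass lies inside the ROPE."""
    return prob_within_rope(difference_draws, margin) > threshold


if __name__ == "__main__":
    # Made-up posterior draws of (score AUROC - reference AUROC).
    draws = [-0.003, 0.001, 0.004, 0.006, 0.008, 0.009, 0.011, 0.012, 0.015, 0.024]
    for margin in (0.03, 0.02, 0.01):
        share = prob_within_rope(draws, margin)
        verdict = practically_equivalent(draws, margin)
        print(f"margin ±{margin}: {share:.2f} inside ROPE, equivalent = {verdict}")
```

Tightening the margin can only shrink the set of scores that pass, which is why the sensitivity analyses at ±0.01 and ±0.005 yield smaller sets than the primary ±0.02 margin.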

This approach resulted in sets of scores that were statistically or practically equivalent to the reference score for each metric (Brier or AUROC) in a given sample/analysis.

Individual-Level PRS Assessment

Descriptive statistics were calculated to understand the individual-level distributions of genetic risk estimates produced by similarly performing scores. Each individual’s average risk percentile was calculated, defined as the mean of percentiles assigned by each score meeting the ROPE ± 0.02 practical equivalence criteria. We calculated the standard deviation (SD) and the ratio of the SD to the mean (ie, the coefficient of variation) across scores per individual. Distributions of mean risk percentiles and SDs across each sample were plotted, and we calculated the population-level median of the individual-level mean scores, SDs, and coefficient of variation. Bootstrapped 95% CIs were obtained using the “boot” version 1.3.28.1 and “simpleboot” version 1.1.7 R packages.
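As an illustration of these per-individual summaries, the sketch below computes the mean, SD, and coefficient of variation of one person's percentiles across scores, and then the cohort medians. The use of the sample SD (n - 1 denominator) is an assumption, since the text does not specify it, and all names and values are hypothetical.

```python
from statistics import mean, median, stdev


def individual_summary(percentiles):
    """Return (mean, SD, coefficient of variation) of one person's percentiles across scores."""
    m = mean(percentiles)
    sd = stdev(percentiles)  # assumption: sample SD with n - 1 denominator
    return m, sd, sd / m


def cohort_medians(percentiles_by_person):
    """Median across individuals of the per-person mean, SD, and coefficient of variation."""
    summaries = [individual_summary(p) for p in percentiles_by_person]
    means, sds, cvs = zip(*summaries)
    return median(means), median(sds), median(cvs)


if __name__ == "__main__":
    # Made-up percentiles: one row per person, one column per equivalent score.
    cohort = [
        [10, 50, 90],  # same mean as the next person, far less agreement
        [40, 50, 60],
        [20, 30, 40],
    ]
    for row in cohort:
        print(row, individual_summary(row))
    print("cohort medians (mean, SD, CV):", cohort_medians(cohort))
```

The first two toy individuals share a mean percentile of 50 but differ fourfold in SD, which is the kind of spread the individual-level analysis is designed to expose.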

Given the lack of a criterion standard, accuracy of each PRS could not be assessed. Rather, agreement (precision) across scores was assessed. The intraclass correlation coefficient (ICC), which is a metric of the consistency of quantitative measurements (precision) made by different raters measuring the same quantity, was used to compare PRS percentile as a continuous variable.²²
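One common form of the ICC can be sketched from a two-way ANOVA decomposition, treating individuals as rows and scores as raters. The single-measurement consistency form, often written ICC(3,1), is an assumption made here for illustration; the text does not state which form was used, and the function name is hypothetical.

```python
def icc_consistency(ratings):
    """ICC(3,1): two-way mixed effects, single measurement, consistency.

    ratings: list of rows, one row per individual and one column per score.
    """
    n = len(ratings)       # individuals
    k = len(ratings[0])    # scores ("raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between individuals
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between scores
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


if __name__ == "__main__":
    shifted = [[10, 20], [50, 60], [80, 90]]   # second score = first score + 10
    reversed_ = [[10, 90], [50, 50], [90, 10]]  # second score ranks people in reverse
    print("shifted:", icc_consistency(shifted))
    print("reversed:", icc_consistency(reversed_))
```

The consistency form ignores a constant offset between scores, so the shifted example still counts as full agreement; an absolute-agreement form would penalize that offset.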

Many tested and proposed clinical applications of PRSs use percentile cutoffs to define high-risk groups. We quantified the proportion of individuals whose assignment to the top 5th, 10th, or 20th percentile varied by score, as well as the proportion of individuals receiving risk estimates in both the top and bottom 5th, 10th, or 20th percentile.⁷ Agreement of binary assignment above or below these thresholds was quantitatively assessed by evaluating interrater agreement of the categorical classifications using Light κ (eMethods in Supplement 1).²²
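Both threshold-based measures can be sketched in a few lines. Light κ is computed here as the mean of Cohen κ over all pairs of scores, which is its standard definition; treating the cutoffs as inclusive is an assumption, and the function names and toy percentiles are hypothetical.

```python
from itertools import combinations


def cohen_kappa(flags_a, flags_b):
    """Cohen kappa for two binary (0/1) classifications of the same individuals."""
    n = len(flags_a)
    p_observed = sum(1 for a, b in zip(flags_a, flags_b) if a == b) / n
    rate_a = sum(flags_a) / n
    rate_b = sum(flags_b) / n
    p_expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both classifications are constant")
    return (p_observed - p_expected) / (1 - p_expected)


def light_kappa(percentiles_by_score, threshold):
    """Mean pairwise Cohen kappa for assignment at or above `threshold` (assumed inclusive)."""
    flags = [[1 if p >= threshold else 0 for p in score] for score in percentiles_by_score]
    pairs = list(combinations(flags, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


def share_with_both_extremes(percentiles_by_score, tail=5.0):
    """Share of individuals placed in the top `tail`% by one score and the bottom `tail`% by another."""
    per_person = list(zip(*percentiles_by_score))
    hits = sum(1 for p in per_person if max(p) >= 100.0 - tail and min(p) <= tail)
    return hits / len(per_person)


if __name__ == "__main__":
    # Made-up percentiles: one list per score, one entry per individual.
    scores = [
        [96, 50, 4, 97],
        [98, 40, 20, 96],
        [30, 60, 99, 95],
    ]
    print("Light kappa at the 95th percentile:", light_kappa(scores, 95))
    print("share in both top and bottom 5%:", share_with_both_extremes(scores, 5.0))
```

In the toy data, the first two scores agree perfectly on who is high risk, but the third agrees with each of them no better than chance, so the average across the three pairs is low.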

Pearson correlation coefficient (r) of individual risk estimates was calculated for all pairs of scores in the primary sample.

Statistical Analysis

This study followed Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guidelines for cross-sectional studies. All analyses were performed using R (version 4.3.0 in PMBB, version 4.3.1 in AOU and ATLAS). For population-level analysis, AUROC and Brier score were used to evaluate prediction of prevalent CHD. For individual-level analysis, Light κ and ICC were used to compare categorical and continuous agreement of PRS percentiles. Although both κ and ICC are descriptive measures, guidelines exist to aid interpretation of intermediate values. For both, 0 indicates agreement that is no better than random chance, and 1 indicates perfect agreement. A κ between 0.41 and 0.60 can be interpreted as moderate agreement; an ICC between 0.50 and 0.75 is generally interpreted as a moderate to good relationship.^23,24,25 Methods are described in further detail in eMethods in Supplement 1.

Results

Clinical Characteristics

The AOU study population comprised 17 589 participants with (10.3%) and 153 506 participants without (89.7%) CHD (Table 1; eResults in Supplement 1). A total of 97 265 (63.4%) of those without CHD and 7682 (43.7%) of those with CHD were female. A total of 35 590 participants (20.8%) were most genetically similar to an African reference population, 29 801 (17.4%) to an admixed American reference population, 100 493 participants (58.7%) to a European reference population, and the remaining to Central/South Asian, East Asian, and Middle Eastern reference populations (eFigure 3 in Supplement 1). Those with CHD were older on average (69.3 vs 54.9 years) (Table 1). Rates of prevalent CHD were higher in the PMBB and ATLAS than AOU groups. Of 41 193 PMBB participants, 9215 (22.4%) had CHD (eTable 2 in Supplement 2). Of 53 092 ATLAS participants, 12 512 (23.5%) had CHD (eTable 3 in Supplement 2).

Table 1. Baseline Characteristics of All of Us Participants by CHD Status.

Characteristic	CHD (n = 17 589)	No CHD (n = 153 506)
Age, mean (SD), y	69.3 (12.0)	54.9 (16.7)
Sex, No. (%)
Male	9907 (56.3)	56 241 (36.6)
Female	7682 (43.7)	97 265 (63.4)
Population, No. (%)^a
Admixed American	2268 (12.9)	27 533 (17.9)
African	3558 (20.2)	32 032 (20.9)
Central/South Asian	134 (0.762)	1612 (1.05)
East Asian	132 (0.750)	1865 (1.21)
European	11 344 (64.5)	89 149 (58.1)
Middle Eastern	153 (0.870)	1315 (0.857)
BMI, mean (SD) (n = 167 261)	30.7 (6.91)	29.7 (7.22)
Ever smoked, No. (%)	9638 (55.6)	63 573 (42.2)
Type 2 diabetes diagnosis, No. (%)	8062 (45.8)	24 615 (16.0)
Hypertension diagnosis, No. (%)	14 762 (83.9)	55 845 (36.4)
Lipid parameters, mean (SD), mg/dL
HDL-C (n = 96 976)	39.8 (13.4)	48.4 (15.7)
LDL-C (n = 94 597)	128.9 (45.2)	126.3 (38.9)
Triglycerides (n = 95 728)	128.2 (59.9)	117.1 (58.3)

Abbreviations: BMI, body mass index (calculated as weight in kilograms divided by height in meters squared); CHD, coronary heart disease; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol.

SI conversion factors: to convert cholesterol to mmol/L, multiply by 0.0259; and triglycerides to mmol/L, multiply by 0.0113.

^a Population categorization was determined based on genetic similarity to an external reference population. Details for the UCLA ATLAS and Penn Medicine BioBank replication samples can be found in eTables 2-3 in Supplement 2.

Population-Level PRS Performance

We first aimed to identify a set of polygenic scores that predicted prevalent CHD in a statistically equivalent way based on population-based performance prediction metrics (Figure 1). We tested a total of 48 CHD polygenic scores (eTable 4 in Supplement 2). A mean (SD) of 92.1% (8.6%), 91.6% (8.5%), and 84.6% (7.5%) of genetic variants included in each score were present for AOU, PMBB, and ATLAS participants, respectively (eTables 5-7 in Supplement 2). In an initial quality-control step to confirm scores’ function as a CHD PRS, 2 scores were excluded from downstream analysis due to negative association with disease (Figure 1; eTable 8 in Supplement 2; eFigure 4 and eResults in Supplement 1). When added to a model of prevalent disease that included age and sex as covariates in AOU, the remaining 46 scores were significantly associated with increased risk of CHD (P < .05), with odds ratios that ranged from 1.10 (model PGS000059) to 1.46 (model PGS005091) per SD increase in each PRS (eFigure 4 in Supplement 1, eTable 8 in Supplement 2), confirming the expected statistical association of these models. Similar results were seen in PMBB and ATLAS (eTables 9-10 in Supplement 2, eFigures 5-6 in Supplement 1).

Figure 1. A total of 48 coronary heart disease polygenic risk scores (PRSs) were calculated for all individuals in the All of Us Research Program (primary sample) and Penn Medicine BioBank (replication sample), and 47 in University of California, Los Angeles (UCLA) ATLAS (replication sample). Overall score performance and calibration were estimated with Brier score (the mean squared error of the probabilities), and discrimination was estimated by calculating the area under the receiver operator curve (AUROC). We identified a set of scores that were practically equivalent based on the probability of the posterior distribution of the difference between score model metrics falling within a region of practical equivalence (ROPE) of ±0.02; ROPE establishes a region in which the difference is a priori determined to be not practically or clinically meaningful. We considered agreement between the risk percentiles assigned to individuals by equivalent scores. The primary and replication samples were not present in the genome-wide association study (GWAS) data used to construct scores.

The model for PGS005091 had the best calibration measured by Brier score (0.0825 [95% credible interval {CrI}, 0.0823-0.0828]), whereas the model including PGS003725 had the best discrimination as measured by AUROC (0.777 [95% CrI, 0.776-0.778]) (eTable 11 in Supplement 2). All 46 scores had practically equivalent population-level performance using a prespecified region of practical equivalence margin of ±0.02 (eFigure 7 in Supplement 1). Sensitivity analyses that applied more stringent practical effect size margins of ±0.01 and ±0.005 resulted in 19 and 5 equivalent scores, respectively (eTable 11 in Supplement 2). Two scores met our prespecified definition for statistically equivalent performance across both measures of calibration and discrimination. Sensitivity analyses modifying the model to include principal components, an age-sex interaction term, or age groupings as factors yielded the same sets of equivalent scores as the primary model (eResults in Supplement 1, eTables 12-17 in Supplement 2). Analyses in PMBB and ATLAS revealed similar results, identifying 33 and 46 practically equivalent scores, respectively (eResults and eFigures 8-9 in Supplement 1, eTables 18-19 in Supplement 2). This result indicates that most CHD PRSs are practically equivalent using population-level performance metrics, and the best-performing scores can differ between samples.

In sensitivity analyses stratifying individuals by genetic similarity to reference populations, or removing model covariates, we observed unique sets of scores meeting equivalence criteria (eResults and eFigures 10-15 in Supplement 1, eTables 20-24 in Supplement 2). These results indicate that the best-performing score at the population-level can differ within as well as between samples.

Agreement of Individual PRS Risk Assessments

After identifying CHD PRSs with practically equivalent population-level metrics (ROPE ± 0.02), we next evaluated the consistency of individual-level risk estimates provided by those scores. Across AOU participants, the median SD of risk percentiles for each individual was 22.94 (95% CI, 22.92-22.96) and the median coefficient of variation for each individual was 0.504 (95% CI, 0.503-0.505), providing scale-dependent and scale-independent evidence of the variability of individual-level estimates (Figure 2; eResults in Supplement 1). We observed similar results in analyses of PMBB and UCLA ATLAS and across sensitivity analyses (eFigures 16-20 in Supplement 1).

Figure 2. Risk estimates were calculated for each participant (n = 171 095) using the 46 scores determined to have practically equivalent population-level performance and are presented as percentiles, normalized to the 1000 Genomes and Human Genome Diversity Project reference population. A, Heat map of Pearson correlation coefficients between risk estimates from pairs of scores, by PGS Catalog identifier. B, Mean individual risk percentile (median, 48.24 [95% CI, 48.10-48.36]). C, SD of individual risk percentiles across scores (median, 22.48 [95% CI, 22.46-22.50]). D, Coefficient of variation of individual risk percentiles across scores (median, 0.4997 [95% CI, 0.4984-0.5010]).

The 46 practically equivalent scores had an ICC of 0.373 (95% CI, 0.372-0.375), which, according to common interpretation frameworks, can be considered poor (Table 2).^23,24,25 ICC improved when focusing on the most recently published scores (eTable 25 in Supplement 2). ICCs for the sets of 19 and 5 scores that met more stringent practical equivalence criteria with a ROPE of ±0.01 and ±0.005 were 0.555 (95% CI, 0.551-0.558) and 0.734 (95% CI, 0.732-0.736), respectively. The 5 practically equivalent scores (ROPE ±0.005) included 2 pairs of scores from the same studies (PGS005092 and PGS005091; PGS003725 and PGS003726) (eTable 4 in Supplement 2). The ICC between the 2 scores that met the definition of statistically equivalent population-level performance (PGS003725 and PGS005091) of 0.646 (95% CI, 0.643-0.649) can be considered moderate to good.^23,24,25 Light κ was then used to assess whether scores agreed in their risk categorization. Across a range of possible high-risk thresholds, κ never exceeded 0.56 (Table 2; eTable 26 in Supplement 2, eResults in Supplement 1).

Table 2. Individual-Level Score Concordance in AOU.

Equivalence criteria^a	No. of scores	ICC (95% CI)^b	Light κ^c, 99th percentile	Light κ^c, 95th percentile	Light κ^c, 90th percentile	Light κ^c, 80th percentile	Light κ^c, 70th percentile	Light κ^c, 50th percentile
<95% Probability of difference	2	0.649 (0.646-0.652)	0.233	0.340	0.390	0.436	0.464	0.476
ROPE, 0.005	5	0.734 (0.732-0.736)	0.334	0.431	0.476	0.520	0.542	0.557
ROPE, 0.01	19	0.555 (0.551-0.558)	0.182	0.267	0.310	0.357	0.381	0.401
ROPE, 0.02	46	0.373 (0.372-0.375)	0.096	0.156	0.190	0.227	0.246	0.261

Abbreviations: AOU, All of Us; ICC, intraclass correlation coefficient; ROPE, region of practical equivalence.

^a For sets of scores meeting a series of designated equivalence criteria in AOU, the agreement of these scores’ individual risk classification was assessed. Equivalence criteria were based on practically equivalent population-level performance determined by the probability of the difference between metrics of discrimination and calibration differing beyond a ROPE of ±0.02; ROPE establishes a region in which the difference is a priori determined to be not practically or clinically meaningful.

^b ICC measures agreement of risk estimates as continuous variables and was calculated using a 2-way mixed-effects model.

^c Light κ values reflect congruence of scores meeting a denoted equivalence criterion in stratifying individuals above a specified percentile (99th, 95th, 90th, 80th, 70th, 50th).
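Light κ is the mean of Cohen κ over every pair of raters, here every pair of scores. The sketch below dichotomizes each score at a percentile threshold and averages the pairwise κ values. It is an illustration only: the function names, the toy data, and the convention that a percentile at or above the threshold counts as high risk are assumptions.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen kappa for two equal-length lists of booleans."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    p_exp = pa * pb + (1 - pa) * (1 - pb)  # agreement expected by chance
    if p_exp == 1.0:
        return float("nan")  # undefined when both raters are constant
    return (p_obs - p_exp) / (1 - p_exp)


def light_kappa(percentiles_by_person, threshold):
    """Mean pairwise Cohen kappa for 'high risk' calls across scores."""
    k = len(percentiles_by_person[0])
    # One list of high-risk flags per score (assumption: >= threshold).
    flags = [[row[j] >= threshold for row in percentiles_by_person]
             for j in range(k)]
    kappas = [cohen_kappa(flags[i], flags[j])
              for i, j in combinations(range(k), 2)]
    return sum(kappas) / len(kappas)


# Hypothetical toy data: 4 people x 3 scores (not study data).
toy_percentiles = [
    [90.0, 85.0, 95.0],
    [10.0, 20.0, 85.0],
    [50.0, 40.0, 30.0],
    [85.0, 90.0, 10.0],
]
toy_light_kappa = light_kappa(toy_percentiles, threshold=80.0)
```

In this toy example the first two scores agree on every person while the third agrees with each of them only at chance level, so the average is pulled well below 1.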

We also observed that the pairwise correlation of risk estimates across all combinations of practically equivalent scores varied widely, ranging from 0.028 to 0.98 (Figure 2). Pairs of scores arising from the same original publications, which often leverage the same GWAS data for PRS construction, were on average more strongly—but still variably—correlated (median Pearson correlation, 0.56 [IQR, 0.37-0.72] vs 0.36 [IQR, 0.25-0.47]; Wilcoxon rank-sum P < .001). Correlation tended to be higher among pairs of scores with similar population-level performance and scores published more recently (eResults and eFigures 21-22 in Supplement 1). These trends suggest that using a common dataset to generate multiple scores can inflate agreement, and that interscore agreement may be improving with time and score performance.
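The pairwise comparison described above can be sketched as follows: compute the Pearson correlation for every pair of scores, then split the pairs by whether both scores come from the same publication. The score identifiers, publication labels, and data below are hypothetical, and the rank-sum test reported in the text is not reproduced here.

```python
import math
import statistics
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_by_publication(scores, publication):
    """Split pairwise correlations into same- and different-publication.

    scores: dict mapping score id -> list of percentiles (one per person).
    publication: dict mapping score id -> publication label.
    """
    same, different = [], []
    for a, b in combinations(sorted(scores), 2):
        r = pearson(scores[a], scores[b])
        if publication[a] == publication[b]:
            same.append(r)
        else:
            different.append(r)
    return same, different


# Hypothetical identifiers and values (not PGS Catalog entries).
toy_scores = {
    "score_1": [1.0, 2.0, 3.0, 4.0],
    "score_2": [2.0, 4.0, 6.0, 8.0],
    "score_3": [4.0, 3.0, 2.0, 1.0],
}
toy_publication = {"score_1": "pub_A", "score_2": "pub_A", "score_3": "pub_B"}
same_pub, diff_pub = pairwise_by_publication(toy_scores, toy_publication)
median_same = statistics.median(same_pub)
median_diff = statistics.median(diff_pub)
```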

Across the 46 practically equivalent scores, 52% of individuals received at least 1 risk estimate in the top 5% of risk, an established cutoff for designating individuals at high risk⁷; all of these individuals also had at least 1 score that did not place them in the top 5% of risk, and 39% of them (20% of all participants) had at least 1 score that placed them in the bottom 5% of risk. When the top quintile was considered high risk, 80% of all participants had at least 1 risk estimate in the top 20% of risk as well as at least 1 in the bottom 20% (eTable 27 in Supplement 2). To illustrate this individual-level variability, we plotted individual-level risk percentiles for randomly selected participants across all practically equivalent scores (Figure 3; eFigure 23 in Supplement 1) and designed an interactive web application that allows users to explore CHD PRS percentiles for randomly selected individuals from the 1000 Genomes and HGDP reference population (https://mglev1n.github.io/CAD-prs-variability/).
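The proportions in this paragraph are simple counts over participants. A minimal sketch, assuming a hypothetical matrix of percentiles (rows are people, columns are scores) and inclusive cutoffs at the 95th and 5th percentiles, might look like this; the function and key names are illustrative.

```python
def discordance_summary(percentiles_by_person, tail=5.0):
    """Share of people flagged high risk by any score, and how often the
    same people are also placed in the low-risk tail by another score."""
    n = len(percentiles_by_person)
    high_cut, low_cut = 100.0 - tail, tail  # inclusive cutoffs (assumption)
    any_high = high_and_low = always_high = 0
    for row in percentiles_by_person:
        has_high = any(p >= high_cut for p in row)
        has_low = any(p <= low_cut for p in row)
        any_high += has_high
        high_and_low += has_high and has_low
        always_high += all(p >= high_cut for p in row)
    return {
        "share_any_high": any_high / n,
        "share_high_and_low": high_and_low / n,
        "share_high_and_low_among_high":
            high_and_low / any_high if any_high else float("nan"),
        "share_always_high": always_high / n,
    }


# Hypothetical toy data: 4 people x 3 scores (not study data).
toy_percentiles = [
    [97.0, 50.0, 3.0],   # top 5% by one score, bottom 5% by another
    [96.0, 60.0, 40.0],  # top 5% by one score only
    [50.0, 50.0, 50.0],  # never in either tail
    [10.0, 2.0, 30.0],   # bottom 5% by one score only
]
summary = discordance_summary(toy_percentiles, tail=5.0)
```

Dividing the count of people with both a high and a low call by the count with any high call gives the conditional share; dividing by all participants gives the overall share, which is how a figure such as 39% of flagged individuals can correspond to 20% of the cohort.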

Figure 3. — Individual-level variance in estimates for the 46 practically equivalent polygenic risk scores (PRSs), calculated and plotted for randomly selected AOU participants. The left panels plot scores in order of first publication, grouped by year. The right panel summarizes the distribution of PRSs. Scores are named according to their PGS Catalog identifier. An exemption to the Data and Statistics Dissemination Policy was obtained from the AOU Resource Access Board to permit display of individual-level results. These prediction patterns can be interactively visualized in the 1000 Genomes and HGDP reference population.

We found that metrics and patterns of variability and risk percentile congruence were similar in a population-stratified analysis of AOU participants (eTables 26-27 in Supplement 2) and in PMBB and ATLAS (eResults in Supplement 1, eTables 28-31 in Supplement 2). These results confirmed that the individual-level variability was not exclusive to our primary sample.

Discussion

In this study, results from 265 380 individuals across 3 large and diverse biobanks demonstrated that despite similar population-level performance across extant PRSs, individual-level estimates of genetic susceptibility to CHD varied considerably. The current findings highlight challenges in evaluating PRS performance at both the population and individual levels, call into question the validity of PRSs as interchangeable tests of genetic risk, and have important implications for the clinical implementation of PRSs for CHD.

At the population level, the current results identify challenges in selecting a single PRS to apply broadly for clinical implementation. While the current findings support that larger and more diverse training data, advances in score construction, and incorporation of cardiovascular disease–related traits^{26,27,28,29,30,31} may translate to improved population-level PRS performance, the improvements were marginal, and in most cases less than the prespecified ROPE ±0.02. More importantly, no single score performed best in all analyses. These conclusions are consistent with prior literature demonstrating that PRS performance varies depending on the environment in which it is assessed, owing to factors such as the age, sex, and sociodemographic composition of the population, as well as genetic background.^18,19

These results highlight that CHD PRSs that demonstrate similar performance at the population level are not interchangeable at the individual level. Across the primary analyses, considerable individual-level variability in PRS estimates was observed across practically and statistically equivalent scores. Even among smaller sets of scores stratified based on training data, population group, or date of score development, agreement was not strong. Based on these sensitivity analyses, it can be concluded that factors such as training dataset, time, PRS method, and training/testing mismatch do not fully explain the discordance in individual-level risk estimates. Rather, PRSs must be recognized for what they are: estimates of genetic risk that come with an inherent uncertainty.¹⁶

The results of this study have important implications for the clinical implementation of PRSs for CHD. Currently, there is no framework for guiding patients and clinicians on how to interpret potentially divergent estimates of polygenic risk, as the concepts of precision and uncertainty have historically been absent from discussions surrounding the clinical implementation of PRSs.^15,32 This challenge is compounded by the ever-accelerating proliferation of available scores, including a growing number of proprietary scores that cannot be externally evaluated. The combination of these factors has the potential to undermine the clinical utility of PRSs and lead to confusion and harm.^33,34,35

Taken collectively, this study should motivate improvements in the framework with which future CHD PRSs are evaluated, guidelines are issued, and CHD PRS investigations are designed. Ultimately, future work must balance a desire for enhancing population-level cardiovascular health with pragmatic consideration for the ramifications of the proliferation of scores that disagree in their individual-level risk estimates.

Limitations

This study should be interpreted in the context of its limitations. First, the primary conclusions regarding relative population-level performance of CHD PRSs are specific to the tested models of prevalent CHD. Prevalent disease was chosen as a primary outcome to maximize the number of individuals with CHD and, therefore, power to detect differences in score performance. Although sensitivity analyses were performed that considered different population subgroups and demographic covariates, other predictive modeling scenarios have been proposed.³⁶ These results do not preclude the possibility that CHD PRSs can be more conclusively differentiated within these other models, such as integrated risk models that combine traditional cardiovascular disease risk factors (eg, pooled cohort equations or PREVENT) with genetics, or models that consider incident disease or absolute risk. Second, minor differences in sequencing and imputation methodology (which in turn affects match percentage) and phenotype definitions used in ATLAS, PMBB, and AOU may account for some of the variability between score performance metrics obtained from each biobank.³⁷ Third, in testing individual-level score concordance/precision, it is assumed that the scores were independent of each other, but these tested PRSs cannot be considered truly independent because, in most cases, the scores build on a core set of CHD GWAS data. This lack of PRS independence should bias the results toward a higher degree of agreement/correlation, and thus strengthen the interpretation of poor concordance. Fourth, individual-level variability of CHD PRSs does not preclude a meaningful role of PRSs at the population level, where higher-risk populations on average may derive greater benefit from therapeutic interventions.^38,39

Conclusions

CHD PRSs that performed similarly at the population level demonstrated highly variable individual-level estimates of risk. Recognizing that CHD PRSs may generate incongruent individual-level risk estimates, effective clinical implementation will require refined statistical methods to quantify uncertainty and new strategies to communicate this uncertainty to patients and clinicians.

Supplement 1.

eFigure 1. Flowchart Summarizing CHD Multi-Population Meta-Analysis

eFigure 2. Schematic of Equivalence Criteria

eFigure 3. Principal Component Plots for A) All of Us, B) Penn Medicine Biobank, and C) UCLA ATLAS

eFigure 4. Forest Plot of Odds Ratios – AOU

eFigure 5. Forest Plot of Odds Ratios – PMBB

eFigure 6. Forest Plot of Odds Ratios in ATLAS

eFigure 7. Distribution of Polygenic Risk Score Brier Scores and AUROCs in All of Us

eFigure 8. Distribution of Polygenic Risk Score Brier Scores and AUROCs in PMBB

eFigure 9. Distribution of Polygenic Risk Score Brier Scores and AUROCs in UCLA ATLAS

eFigure 10. Forest Plot of Odds Ratios – AOU AFR

eFigure 11. Distribution of Polygenic Risk Score Brier Scores and AUROCs in AOU - AFR

eFigure 12. Forest Plot of Odds Ratios – AOU EUR

eFigure 13. Distribution of Polygenic Risk Score Brier Scores and AUROCs in AOU - EUR

eFigure 14. Forest Plot of Odds Ratios – AOU, PRS Only

eFigure 15. Distribution of Polygenic Risk Score Brier Scores and AUROCs in AOU – PRS Only

eFigure 16. Within-Person Score Concordance in PMBB

eFigure 17. Within-Person Score Concordance in ATLAS

eFigure 18. Within-Person Score Concordance in AOU – AFR

eFigure 19. Within-Person Score Concordance in AOU – EUR

eFigure 20. Within-Person Score Concordance in AOU – PRS Only

eFigure 21. Correlation Coefficients Between Pairs of Scores From Same Publication vs Pairs of Scores from Different Publication

eFigure 22. Score Pair Correlation by Magnitude of Performance Difference - AOU

eFigure 23. Risk Predictions for 25 Randomly Selected Individuals from a Reference Population

eReferences

jama-e2423784-s001.pdf (5.4 MB, pdf)

Supplement 2.

eTable 1. Studies Included in CHD GWAS Meta-Analysis

eTable 2. PMBB Participant Demographics

eTable 3. ATLAS Participant Demographics

eTable 4. List of PRS Scores Used

eTable 5. Variant Match Percentages in AOU

eTable 6. Variant Match Percentages in PMBB

eTable 7. Variant Match Percentages in UCLA ATLAS

eTable 8. Model Term Odds Ratios (AOU)

eTable 9. Model Term Odds Ratios (PMBB)

eTable 10. Model Term Odds Ratios (ATLAS)

eTable 11. AOU Model Metrics - Primary Model w PRS, Age Sex

eTable 12. Model Term Odds Ratios (AOU, Model With Age, Sex, and PCs)

eTable 13. Model Term Odds Ratios (AOU, Model With Age: Sex Interaction Term)

eTable 14. Model Term Odds Ratios (AOU, Model With Age Group as Factor)

eTable 15. AOU Model Metrics - Model w PRS, Age, Sex, 5 PCs

eTable 16. AOU Model Metrics - Model w Age:Sex Interaction Term

eTable 17. AOU Model Metrics - Model w Age Group as Factor

eTable 18. PMBB Model Metrics

eTable 19. ATLAS Model Metrics

eTable 20. Model Term Odds Ratios (AFR and EUR GIA-Stratified, AOU)

eTable 21. Model Term Odds Ratios (AOU, PRS w/o Age and Sex in Model)

eTable 22. AOU Model Metrics (EUR) - Model w PRS, Age Sex

eTable 23. AOU Model Metrics (AFR) - Model w PRS, Age Sex

eTable 24. AOU Model Metrics - Model w PRS Only, No Covariates

eTable 25. ICC Based on Year of Publication

eTable 26. AOU Within Person Score Concordance

eTable 27. AOU Individual Consistency

eTable 28. PMBB Within Person Score Concordance

eTable 29. ATLAS Within Person Score Concordance

eTable 30. PMBB Individual Consistency

eTable 31. ATLAS Individual Consistency

jama-e2423784-s002.xlsx (168.6 KB, xlsx)

Supplement 3.

Nonauthor Collaborators. Penn Medicine BioBank

jama-e2423784-s003.pdf (118.5 KB, pdf)

Supplement 4.

Data Sharing Statement

jama-e2423784-s004.pdf (121.3 KB, pdf)

References

1. O’Sullivan JW, Raghavan S, Marquez-Luna C, et al.; American Heart Association Council on Genomic and Precision Medicine; Council on Clinical Cardiology; Council on Arteriosclerosis, Thrombosis and Vascular Biology; Council on Cardiovascular Radiology and Intervention; Council on Lifestyle and Cardiometabolic Health; and Council on Peripheral Vascular Disease. Polygenic risk scores for cardiovascular disease: a scientific statement from the American Heart Association. Circulation. 2022;146(8):e93-e118. doi: 10.1161/CIR.0000000000001077
2. Natarajan P. Polygenic risk scoring for coronary heart disease: the first risk factor. J Am Coll Cardiol. 2018;72(16):1894-1897. doi: 10.1016/j.jacc.2018.08.1041
3. Klarin D, Natarajan P. Clinical utility of polygenic risk scores for coronary artery disease. Nat Rev Cardiol. 2022;19(5):291-301. doi: 10.1038/s41569-021-00638-w
4. Lennon NJ, Kottyan LC, Kachulis C, et al.; GIANT Consortium; All of Us Research Program. Selection, optimization and validation of ten chronic disease polygenic risk scores for clinical implementation in diverse US populations. Nat Med. 2024;30(2):480-487. doi: 10.1038/s41591-024-02796-z
5. Lambert SA, Gil L, Jupp S, et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat Genet. 2021;53(4):420-425. doi: 10.1038/s41588-021-00783-5
6. Wand H, Lambert SA, Tamburro C, et al. Improving reporting standards for polygenic scores in risk prediction studies. Nature. 2021;591(7849):211-219. doi: 10.1038/s41586-021-03243-6
7. Linder JE, Allworth A, Bland HT, et al.; eMERGE Consortium. Returning integrated genomic risk and clinical recommendations: the eMERGE study. Genet Med. 2023;25(4):100006. doi: 10.1016/j.gim.2023.100006
8. US Food and Drug Administration. Statistical guidance on reporting results from studies evaluating diagnostic tests - guidance for industry and FDA staff. Published online March 2007. Accessed May 27, 2024. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/statistical-guidance-reporting-results-studies-evaluating-diagnostic-tests-guidance-industry-and-fda
9. All of Us Research Program Genomics Investigators. Genomic data in the All of Us Research Program. Nature. 2024;627(8003):340-346. doi: 10.1038/s41586-023-06957-x
10. Purcell S, Neale B, Todd-Brown K, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559-575. doi: 10.1086/519795
11. Committee on the Use of Race, Ethnicity, and Ancestry as Population Descriptors in Genomics Research; Board on Health Sciences Policy; Committee on Population; Health and Medicine Division; Division of Behavioral and Social Sciences and Education; National Academies of Sciences, Engineering, and Medicine. Using Population Descriptors in Genetics and Genomics Research: A New Framework for an Evolving Field. National Academies Press; 2023.
12. Lambert SA, Wingfield B, Gibson JT, et al. Enhancing the Polygenic Score Catalog with tools for score calculation and ancestry normalization. Nat Genet. Published online September 26, 2024. doi: 10.1038/s41588-024-01937-x
13. Johnson R, Ding Y, Venkateswaran V, et al.; UCLA Precision Health Data Discovery Repository Working Group, UCLA Precision Health ATLAS Working Group. Leveraging genomic diversity for discovery in an electronic health record linked biobank: the UCLA ATLAS Community Health Initiative. Genome Med. 2022;14(1):104. doi: 10.1186/s13073-022-01106-x
14. Verma A, Damrauer SM, Naseer N, et al.; Penn Medicine BioBank. The Penn Medicine BioBank: towards a genomics-enabled learning healthcare system to accelerate precision medicine in a diverse population. J Pers Med. 2022;12(12):1974. doi: 10.3390/jpm12121974
15. Ding Y, Hou K, Xu Z, et al. Polygenic scoring accuracy varies across the genetic ancestry continuum. Nature. 2023;618(7966):774-781. doi: 10.1038/s41586-023-06079-4
16. Ding Y, Hou K, Burch KS, et al. Large uncertainty in individual polygenic risk score estimation impacts PRS-based risk stratification. Nat Genet. 2022;54(1):30-39. doi: 10.1038/s41588-021-00961-5
17. Pencina MJ, Goldstein BA, D’Agostino RB. Prediction models: development, evaluation, and clinical application. N Engl J Med. 2020;382(17):1583-1586. doi: 10.1056/NEJMp2000589
18. Kachuri L, Chatterjee N, Hirbo J, et al.; Polygenic Risk Methods in Diverse Populations (PRIMED) Consortium Methods Working Group. Principles and methods for transferring polygenic risk scores across global populations. Nat Rev Genet. 2024;25(1):8-25. doi: 10.1038/s41576-023-00637-2
19. Monti R, Eick L, Hudjashov G, et al.; Genes and Health Research Team. Evaluation of polygenic scoring methods in five biobanks shows larger variation between biobanks than methods and finds benefits of ensemble learning. Am J Hum Genet. 2024;111(7):1431-1447. doi: 10.1016/j.ajhg.2024.06.003
20. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement. BMC Med. 2015;13(1):1. doi: 10.1186/s12916-014-0241-z
21. Kruschke JK. Rejecting or accepting parameter values in bayesian estimation. Adv Methods Pract Psychol Sci. 2018;1(2):270-280. doi: 10.1177/2515245918771304
22. Hallgren KA. Computing inter-rater reliability for observational data: an overview and tutorial. Tutor Quant Methods Psychol. 2012;8(1):23-34. doi: 10.20982/tqmp.08.1.p023
23. White SE. Basic & Clinical Biostatistics. 5th ed. McGraw-Hill Education; 2020.
24. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15(2):155-163. doi: 10.1016/j.jcm.2016.02.012
25. Cicchetti DV. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assess. 1994;6(4):284-290. doi: 10.1037/1040-3590.6.4.284
26. Tcheandjieu C, Zhu X, Hilliard AT, et al.; Regeneron Genetics Center; CARDIoGRAMplusC4D Consortium; Biobank Japan; Million Veteran Program. Large-scale genome-wide association study of coronary artery disease in genetically diverse populations. Nat Med. 2022;28(8):1679-1692. doi: 10.1038/s41591-022-01891-3
27. Aragam KG, Jiang T, Goel A, et al.; Biobank Japan; EPIC-CVD; CARDIoGRAMplusC4D Consortium. Discovery and systematic characterization of risk variants and genes for coronary artery disease in over a million participants. Nat Genet. 2022;54(12):1803-1815. doi: 10.1038/s41588-022-01233-6
28. Truong B, Hull LE, Ruan Y, et al. Integrative polygenic risk score improves the prediction accuracy of complex traits and diseases. Cell Genom. 2024;4(4):100523. doi: 10.1016/j.xgen.2024.100523
29. Graham SE, Clarke SL, Wu KHH, et al.; VA Million Veteran Program; Global Lipids Genetics Consortium. The power of genetic diversity in genome-wide association studies of lipids. Nature. 2021;600(7890):675-679. doi: 10.1038/s41586-021-04064-3
30. Patel AP, Wang M, Ruan Y, et al.; Genes & Health Research Team; the Million Veteran Program. A multi-ancestry polygenic risk score improves risk prediction for coronary artery disease. Nat Med. 2023;29(7):1793-1803. doi: 10.1038/s41591-023-02429-x
31. Norland K, Schaid DJ, Kullo IJ. A linear weighted combination of polygenic scores for a broad range of traits improves prediction of coronary heart disease. Eur J Hum Genet. 2024;32(2):209-214. doi: 10.1038/s41431-023-01463-0
32. Kavousi M, Schunkert H. Polygenic risk score: a tool ready for clinical use? Eur Heart J. 2022;43(18):1712-1714. doi: 10.1093/eurheartj/ehab923
33. Sherkow JS, Park JK, Lu CY. Regulating direct-to-consumer polygenic risk scores. JAMA. 2023;330(8):691-692. doi: 10.1001/jama.2023.12262
34. Muslimova D, Dias Pereira R, von Hinke S, van Kippersluis H, Rietveld CA, Meddens SFW. Rank concordance of polygenic indices. Nat Hum Behav. 2023;7(5):802-811. doi: 10.1038/s41562-023-01544-6
35. Namba S, Akiyama M, Hamanoue H, et al.; BioBank Japan Project. Inconsistent embryo selection across polygenic score methods. Nat Hum Behav. Published online October 14, 2024. doi: 10.1038/s41562-024-02019-y
36. Elliott J, Bodinier B, Bond TA, et al. Predictive accuracy of a polygenic risk score-enhanced prediction model vs a clinical risk score for coronary artery disease. JAMA. 2020;323(7):636-645. doi: 10.1001/jama.2019.22241
37. Pain O, Gillett AC, Austin JC, Folkersen L, Lewis CM. A tool for translating polygenic scores onto the absolute scale using summary statistics. Eur J Hum Genet. 2022;30(3):339-348. doi: 10.1038/s41431-021-01028-z
38. Mega JL, Morrow DA, Brown A, Cannon CP, Sabatine MS. Identification of genetic variants associated with response to statin therapy. Arterioscler Thromb Vasc Biol. 2009;29(9):1310-1315. doi: 10.1161/ATVBAHA.109.188474
39. Khera AV, Emdin CA, Drake I, et al. Genetic risk, adherence to a healthy lifestyle, and coronary disease. N Engl J Med. 2016;375(24):2349-2358. doi: 10.1056/NEJMoa1605086


[joi240138r21] 21.Kruschke JK. Rejecting or accepting parameter values in bayesian estimation. Adv Methods Pract Psychol Sci. 2018;1(2):270-280. doi: 10.1177/2515245918771304 [DOI] [Google Scholar]

[joi240138r22] 22.Hallgren KA. Computing inter-rater reliability for observational data: an overview and tutorial. Tutor Quant Methods Psychol. 2012;8(1):23-34. doi: 10.20982/tqmp.08.1.p023 [DOI] [PMC free article] [PubMed] [Google Scholar]

[joi240138r23] 23.White SE. Basic & Clinical Biostatistics. 5th ed. McGraw-Hill Education; 2020. [Google Scholar]

[joi240138r24] 24.Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15(2):155-163. doi: 10.1016/j.jcm.2016.02.012 [DOI] [PMC free article] [PubMed] [Google Scholar]

[joi240138r25] 25.Cicchetti DV. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assess. 1994;6(4):284-290. doi: 10.1037/1040-3590.6.4.284 [DOI] [Google Scholar]

[joi240138r26] 26.Tcheandjieu C, Zhu X, Hilliard AT, et al. ; Regeneron Genetics Center; CARDIoGRAMplusC4D Consortium; Biobank Japan; Million Veteran Program . Large-scale genome-wide association study of coronary artery disease in genetically diverse populations. Nat Med. 2022;28(8):1679-1692. doi: 10.1038/s41591-022-01891-3 [DOI] [PMC free article] [PubMed] [Google Scholar]

[joi240138r27] 27.Aragam KG, Jiang T, Goel A, et al. ; Biobank Japan; EPIC-CVD; CARDIoGRAMplusC4D Consortium . Discovery and systematic characterization of risk variants and genes for coronary artery disease in over a million participants. Nat Genet. 2022;54(12):1803-1815. doi: 10.1038/s41588-022-01233-6 [DOI] [PMC free article] [PubMed] [Google Scholar]

[joi240138r28] 28.Truong B, Hull LE, Ruan Y, et al. Integrative polygenic risk score improves the prediction accuracy of complex traits and diseases. Cell Genom. 2024;4(4):100523. doi: 10.1016/j.xgen.2024.100523 [DOI] [PMC free article] [PubMed] [Google Scholar]

[joi240138r29] 29.Graham SE, Clarke SL, Wu KHH, et al. ; VA Million Veteran Program; Global Lipids Genetics Consortium . The power of genetic diversity in genome-wide association studies of lipids. Nature. 2021;600(7890):675-679. doi: 10.1038/s41586-021-04064-3 [DOI] [PMC free article] [PubMed] [Google Scholar]

[joi240138r30] 30.Patel AP, Wang M, Ruan Y, et al. ; Genes & Health Research Team; the Million Veteran Program . A multi-ancestry polygenic risk score improves risk prediction for coronary artery disease. Nat Med. 2023;29(7):1793-1803. doi: 10.1038/s41591-023-02429-x [DOI] [PMC free article] [PubMed] [Google Scholar]

[joi240138r31] 31.Norland K, Schaid DJ, Kullo IJ. A linear weighted combination of polygenic scores for a broad range of traits improves prediction of coronary heart disease. Eur J Hum Genet. 2024;32(2):209-214. doi: 10.1038/s41431-023-01463-0 [DOI] [PMC free article] [PubMed] [Google Scholar]

[joi240138r32] 32.Kavousi M, Schunkert H. Polygenic risk score: a tool ready for clinical use? Eur Heart J. 2022;43(18):1712-1714. doi: 10.1093/eurheartj/ehab923 [DOI] [PubMed] [Google Scholar]

[joi240138r33] 33.Sherkow JS, Park JK, Lu CY. Regulating direct-to-consumer polygenic risk scores. JAMA. 2023;330(8):691-692. doi: 10.1001/jama.2023.12262 [DOI] [PubMed] [Google Scholar]

[joi240138r34] 34.Muslimova D, Dias Pereira R, von Hinke S, van Kippersluis H, Rietveld CA, Meddens SFW. Rank concordance of polygenic indices. Nat Hum Behav. 2023;7(5):802-811. doi: 10.1038/s41562-023-01544-6 [DOI] [PubMed] [Google Scholar]

[joi240138r35] 35.Namba S, Akiyama M, Hamanoue H, et al. ; BioBank Japan Project . Inconsistent embryo selection across polygenic score methods. Nat Hum Behav. Published online October 14, 2024. doi: 10.1038/s41562-024-02019-y [DOI] [PubMed] [Google Scholar]

[joi240138r36] 36.Elliott J, Bodinier B, Bond TA, et al. Predictive accuracy of a polygenic risk score-enhanced prediction model vs a clinical risk score for coronary artery disease. JAMA. 2020;323(7):636-645. doi: 10.1001/jama.2019.22241 [DOI] [PMC free article] [PubMed] [Google Scholar]

[joi240138r37] 37.Pain O, Gillett AC, Austin JC, Folkersen L, Lewis CM. A tool for translating polygenic scores onto the absolute scale using summary statistics. Eur J Hum Genet. 2022;30(3):339-348. doi: 10.1038/s41431-021-01028-z [DOI] [PMC free article] [PubMed] [Google Scholar]

[joi240138r38] 38.Mega JL, Morrow DA, Brown A, Cannon CP, Sabatine MS. Identification of genetic variants associated with response to statin therapy. Arterioscler Thromb Vasc Biol. 2009;29(9):1310-1315. doi: 10.1161/ATVBAHA.109.188474 [DOI] [PubMed] [Google Scholar]

[joi240138r39] 39.Khera AV, Emdin CA, Drake I, et al. Genetic risk, adherence to a healthy lifestyle, and coronary disease. N Engl J Med. 2016;375(24):2349-2358. doi: 10.1056/NEJMoa1605086 [DOI] [PMC free article] [PubMed] [Google Scholar]


