Key Points
Question
Does a polygenic predictor of coronary heart disease (CHD) that incorporates millions of common single-nucleotide polymorphisms (SNPs) improve risk stratification compared with a guideline-based risk equation?
Findings
In a retrospective cohort study that included 7237 middle-aged participants of European ancestry free of clinical CHD at baseline, a polygenic risk score added to the 2013 American College of Cardiology and American Heart Association pooled cohort equations did not significantly improve discriminative accuracy (measured by C statistic), calibration (comparing observed vs expected event probabilities), or net reclassification improvement (using a 10-year risk threshold of 7.5%).
Meaning
Addition of a polygenic risk score to a clinical risk score for incident CHD may not provide important information in a white middle-aged population.
Abstract
Importance
Polygenic risk scores comprising millions of single-nucleotide polymorphisms (SNPs) could be useful for population-wide coronary heart disease (CHD) screening.
Objective
To determine whether a polygenic risk score improves prediction of CHD compared with a guideline-recommended clinical risk equation.
Design, Setting, and Participants
The predictive accuracy of a previously validated polygenic risk score was assessed in a retrospective cohort study of 4847 adults of white European ancestry, aged 45 through 79 years, participating in the Atherosclerosis Risk in Communities (ARIC) study and 2390 participating in the Multi-Ethnic Study of Atherosclerosis (MESA) from 1996 through December 31, 2015, the final day of follow-up. The performance of the polygenic risk score was compared with that of the 2013 American College of Cardiology and American Heart Association pooled cohort equations.
Exposures
Genetic risk was computed for each participant by summing the product of the weights and allele dosage across 6 630 149 SNPs. Weights were based on an international genome-wide association study.
Main Outcomes and Measures
Prediction of 10-year first CHD events (including myocardial infarctions, fatal coronary events, silent infarctions, revascularization procedures, or resuscitated cardiac arrest) assessed using measures of model discrimination, calibration, and net reclassification improvement (NRI).
Results
The study population included 4847 adults from the ARIC study (mean [SD] age, 62.9 [5.6] years; 56.4% women) and 2390 adults from the MESA cohort (mean [SD] age, 61.8 [9.6] years; 52.2% women). Incident CHD events occurred in 696 participants (14.4%) and 227 participants (9.5%), respectively, over median follow-up of 15.5 years (interquartile range [IQR], 6.3 years) and 14.2 years (IQR, 2.5 years). The polygenic risk score was significantly associated with 10-year CHD incidence, with hazard ratios per SD increment of 1.24 (95% CI, 1.15 to 1.34) in ARIC and 1.38 (95% CI, 1.21 to 1.58) in MESA. Addition of the polygenic risk score to the pooled cohort equations did not significantly increase the C statistic in either cohort (ARIC, change in C statistic, −0.001; 95% CI, −0.009 to 0.006; MESA, 0.021; 95% CI, −0.0004 to 0.043). At the 10-year risk threshold of 7.5%, the addition of the polygenic risk score to the pooled cohort equations did not provide significant improvement in reclassification in either ARIC (NRI, 0.018; 95% CI, −0.012 to 0.036) or MESA (NRI, 0.001; 95% CI, −0.038 to 0.076). The polygenic risk score did not significantly improve calibration in either cohort.
Conclusions and Relevance
In this analysis of 2 cohorts of US adults, the polygenic risk score was associated with incident coronary heart disease events but did not significantly improve discrimination, calibration, or risk reclassification compared with conventional predictors. These findings suggest that a polygenic risk score may not enhance risk prediction in a general, white middle-aged population.
This pooled cohort study involving US adults of white European ancestry compares the accuracy of a polygenic risk score vs the 2013 ACC/AHA pooled cohort equations for predicting 10-year risk of coronary heart disease.
Introduction
Early identification and treatment of individuals at risk of coronary heart disease (CHD) has been an important contributor to reductions in cardiovascular morbidity and mortality since 1970.1 The American College of Cardiology and American Heart Association (ACC/AHA) cardiovascular prevention guidelines suggest that therapy with lipid-lowering statin medications be considered for individuals with an estimated 10-year risk of atherosclerotic events greater than 7.5% based on the 2013 ACC/AHA pooled cohort equations.2 However, many individuals who develop CHD have an estimated 10-year cardiovascular risk of less than 7.5%.3 Conversely, only a minority of those judged to be at high risk actually have events over the subsequent decades. Thus, there is considerable interest in identifying strategies to enhance risk stratification in order to minimize overtreatment and undertreatment, improve communication with patients about risk, and promote further health gains.4
Recently, CHD risk scores based on common genetic variation have been developed using single-nucleotide polymorphisms (SNPs) derived from genome-wide association studies (GWAS).5 Such classifiers may now incorporate millions of SNPs.6 In cross-sectional studies, individuals falling in the highest deciles of these polygenic risk scores have odds ratios for prevalent CHD of 3 to 4 compared with lower risk individuals.7,8 The risk associated with elevated polygenic risk scores has been more modest in studies focusing on incident events.9,10 Nevertheless, there has been substantial interest in the possibility of incorporating polygenic risk scores into population-wide screening, as evidenced by the attention given by both the scientific and lay communities.11,12
The clinical utility of new risk markers such as the polygenic risk score depends on the ability to predict future CHD events, not on the strength of the associations with prevalent CHD. The objective of this study was to evaluate the performance of a polygenic risk score for prediction of incident CHD events compared with risk prediction using a guideline-recommended clinical risk equation.13,14
Methods
Study Population
Data came from 2 population-based cohort studies, the Atherosclerosis Risk in Communities (ARIC) study and the Multi-Ethnic Study of Atherosclerosis (MESA) (Table 1). The ARIC study comprised genotyped adult participants aged 45 to 64 years who were followed up from 1986 through 2015.13 Publicly available ARIC data were obtained from dbGaP (phs000280). Use of ARIC data was approved by the institutional review board of Vanderbilt University Medical Center.
Table 1. Baseline Characteristics of the ARIC and MESA Cohorts.
Characteristic | ARIC (n = 4847), No. (%)a | MESA (n = 2390), No. (%)a |
---|---|---|
Men | 2113 (43.6) | 1142 (47.8) |
Women | 2734 (56.4) | 1248 (52.2) |
Age, mean (SD), y | 62.9 (5.6) | 61.8 (9.6) |
Total cholesterol, mean (SD), mg/dL | 202.3 (36.0) | 196.3 (35.2) |
HDL cholesterol, mean (SD), mg/dL | 50.2 (16.5) | 52.5 (15.7) |
Systolic blood pressure, mean (SD), mmHg | 125.5 (18.2) | 122.8 (20.0) |
Taking antihypertensive medications | 1400 (28.9) | 768 (32.1) |
Taking statin medication | 472 (9.7) | 397 (16.6) |
Current smoker | 689 (14.2) | 284 (11.9) |
Type 2 diabetes | 291 (6.0) | 139 (5.8) |
High estimated 10-y risk, >7.5%b | 2772 (57.2) | 1198 (50.1) |
Maternal or paternal family history of CHDc | 2011 (41.5) | NA |
Maternal or paternal premature CHDd | 443 (9.1) | NA |
Completed high school | 4333 (89.4) | 2280 (95.4) |
Abbreviations: ARIC, Atherosclerosis Risk in Communities; CHD, coronary heart disease; HDL, high-density lipoprotein; MESA, Multi-Ethnic Study of Atherosclerosis; NA, not available.
SI conversion factor: To convert cholesterol and HDL cholesterol from mg/dL to mmol/L, multiply by 0.0259.
a For the ARIC study, counts are for participants without a prior diagnosis of CHD at their visit 4 examination. For the MESA study, counts are for participants without a CHD diagnosis at their initial visit 1 examination.
b Estimated 10-year risk is based on the 2013 American College of Cardiology and American Heart Association pooled cohort equations and was calculated using the race- and sex-specific formulas provided in the guidelines. Individuals with a 10-year estimated atherosclerotic risk of 7.5% or less are classified as low risk and individuals with a risk higher than 7.5% are classified as high risk.
c Indicates that either a participant’s mother or father had a history of CHD.
d Indicates that either a participant’s mother or father had a history of CHD prior to the age of 60 or 55 years, respectively.
The MESA cohort comprised genotyped individuals aged 45 through 84 years, recruited from 2000 through 2002, and followed up through 2015.14 Data were obtained through dbGaP (phs000209), and analyses were approved by the institutional review boards of Cedars-Sinai Medical Center, LA BioMed at Harbor UCLA, University of Washington (MESA DCC), and affiliated MESA field centers. All ARIC and MESA participants provided written informed consent.
Because the existing polygenic risk score was derived from a majority of persons (77%) with white European ancestry via genome-wide association study analysis15 and calibrated for use in this population, analyses were restricted to participants with European ancestry. In the ARIC cohort, genetic ancestry was determined using the STRUCTURE program.16,17 In the MESA cohort, individuals of European ancestry were those whose race was reported as white and confirmed by principal components analyses.
Genetic Data
Single-nucleotide polymorphism genotype data for both cohorts were acquired on the Affymetrix 6.0 SNP array. Quality control for the ARIC data set followed the guidelines accompanying the dbGaP release and used PLINK version 1.07.18 For both data sets, SNPs were imputed using the 1000 Genomes cosmopolitan phase 3 version 5 reference haplotypes. Closely related individuals were excluded by randomly removing one of each pair of individuals with pi-hat genetic relatedness greater than 0.05 for the ARIC study or greater than 0.2 for the MESA cohort. Principal components used to control for population stratification19 were generated from the underlying SNP genotypes using the SNPRelate package for ARIC or EIGENSOFT for MESA.20
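The relatedness exclusion described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `prune_related`, the toy identifiers, and the rule for pairs that share a member (a pair is skipped once either member has already been removed) are assumptions made for illustration only.

```python
import random

def prune_related(pairs, threshold, seed=0):
    """Return the set of participant IDs to exclude.

    pairs: iterable of (id_a, id_b, pi_hat) tuples (hypothetical input format).
    threshold: pi-hat cutoff, eg, 0.05 for ARIC or 0.2 for MESA.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    removed = set()
    for id_a, id_b, pi_hat in pairs:
        if pi_hat <= threshold:
            continue  # pair is not considered closely related
        if id_a in removed or id_b in removed:
            continue  # pair already broken by an earlier removal (assumption)
        removed.add(rng.choice([id_a, id_b]))  # drop one member at random
    return removed

# Hypothetical toy input: two related pairs sharing participant "b",
# plus one unrelated pair.
toy_pairs = [("a", "b", 0.50), ("b", "c", 0.30), ("d", "e", 0.01)]
excluded = prune_related(toy_pairs, threshold=0.05)
```

After pruning, every pair above the threshold has lost at least one member, and the unrelated pair is untouched.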
Phenotypes
In ARIC, the primary analyses examined age, smoking status (current vs other), systolic blood pressure, antihypertensive medication use, total cholesterol, high-density lipoprotein cholesterol, and type 2 diabetes status ascertained at the visit 4 examination (1996-1998). The analogous variables in the MESA cohort were ascertained at the baseline examination (visit 1, 2000-2002). In the ARIC cohort, a binary family history variable, which was not used in predictive models, was defined as positive if either the mother or father had CHD or negative if otherwise.
The ARIC study incident CHD cases were defined as having incident myocardial infarction (MI), fatal coronary event, or silent infarction or having undergone a revascularization procedure by December 31, 2015. The ARIC study prevalent CHD cases were participants with a reported history of MI, heart or arterial surgery, coronary artery bypass graft surgery, or angioplasty; or evidence of having had an MI based on electrocardiogram taken at their visit 1 examination. The MESA cohort incident CHD cases were defined as MI, resuscitated cardiac arrest, definite or probable angina if followed by a revascularization, and CHD death occurring by visit 5 (December 31, 2015). For each individual, 10-year risk based on the 2013 ACC/AHA pooled cohort equations was calculated using the race- and sex-specific formulas provided in the guidelines.2 Individuals were also grouped into low risk (10-year risk ≤7.5%) or high risk (>7.5%) groups based on the pooled risk equations.2 Individuals missing any measurement required to compute their pooled cohort equations–based risk were excluded from analyses.
CHD Polygenic Risk Score
These analyses used the CHD polygenic risk score previously developed by Khera et al7 and based on the summary statistics from the Coronary Artery Disease Genome Wide Replication and Meta-analysis plus the Coronary Artery Disease Genetics (CARDIOGRAMplusC4D) consortium GWAS analysis. The Khera study empirically evaluated a large number of polygenic risk scores that were created using differing SNP-selection methods. The study found that the best performing polygenic risk score was based on the linkage-disequilibrium SNP-reweighting approach encoded in the LDpred21 software package, which incorporated the majority of common SNPs analyzed in the GWAS. The best-performing score comprised 6 630 149 SNPs and was the one used in our analyses. Single-nucleotide polymorphism weightings were downloaded from http://www.broadcvdi.org/informational/data.
The 6-million SNP score included a large number of SNPs below the genome-wide significance threshold for association with CHD, so it is likely that many of those SNPs did not contribute to the explanatory power of the score. In secondary analyses, the performance of 5 additional polygenic risk scores that used smaller numbers of SNPs, from 652 down to 44 corresponding to increasingly stringent thresholds for significance for association with CHD, was also evaluated. Each polygenic risk score was computed for each individual by summing the product of the allele weighting and the allele dosage across the selected SNPs.
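The score computation described above (the sum, across selected SNPs, of allele weighting multiplied by allele dosage) can be sketched in a few lines of Python. This is an illustrative sketch rather than the study code: the SNP identifiers, weights, and dosages are hypothetical, and treating a missing SNP as contributing zero is an assumption. The standardization step reflects that effects in this report are expressed per SD increment.

```python
from typing import Dict, List

def polygenic_risk_score(weights: Dict[str, float],
                         dosages: Dict[str, float]) -> float:
    """Sum of weight x allele dosage across the SNPs in the weight table."""
    total = 0.0
    for snp_id, weight in weights.items():
        # Assumption: a SNP missing for this participant contributes nothing.
        total += weight * dosages.get(snp_id, 0.0)
    return total

def standardize(scores: List[float]) -> List[float]:
    """Rescale scores to mean 0 and SD 1 (per-SD-increment scale)."""
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((s - mean) ** 2 for s in scores) / (n - 1)) ** 0.5
    return [(s - mean) / sd for s in scores]

# Hypothetical toy data: 3 SNPs and 3 participants (dosages range from 0 to 2).
weights = {"rsA": 0.10, "rsB": -0.05, "rsC": 0.02}
cohort = {
    "p1": {"rsA": 2.0, "rsB": 0.0, "rsC": 1.0},
    "p2": {"rsA": 0.0, "rsB": 2.0, "rsC": 1.0},
    "p3": {"rsA": 1.0, "rsB": 1.0, "rsC": 0.0},
}
raw_scores = {pid: polygenic_risk_score(weights, d) for pid, d in cohort.items()}
z_scores = standardize(list(raw_scores.values()))
```

The same function applies unchanged to the smaller secondary scores; only the weight table differs.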
Analysis
Although the primary focus was the prediction of incident CHD events, analyses were conducted that used the combination of prevalent and incident CHD events. Prevalent CHD was incorporated into the initial analyses so that the results presented herein could be compared with those of Khera et al, who examined prevalent cases in the UK Biobank data set. Khera et al showed that the polygenic risk score was significantly associated with CHD case status in the UK Biobank in logistic regression analyses.7 To demonstrate that the polygenic risk score in these analyses retained this feature, similar logistic regression analyses were used to measure the association of the polygenic risk score with CHD using prevalent and incident (prior to visit 4) CHD cases and controls from the ARIC study. The analyses adjusted for age, sex, and the first 5 principal components. Because adjusting for principal components is essential to ensure that any associations with a polygenic risk score are not attributable to population stratification, all statistical models incorporated principal components.
Subsequent analyses focused only on incident CHD events. Cox proportional hazards modeling was used to estimate hazard ratios and to compute 10-year CHD event probabilities. All analyses were adjusted for age, the top 5 principal components, and sex. Model fit was evaluated by examining Schoenfeld residuals to evaluate the proportional hazards assumptions for the covariates, martingale residuals to assess nonlinearity, and deviance residuals to identify influential outliers.
Harrell C statistics were based on a 10-year follow-up window, as previously described.22 The C statistics were computed using a Cox model that included the pooled cohort equations estimated risk modeled as a continuous variable (range, 0-1).2 The primary analyses examined the difference in C statistics when the polygenic risk score was added to the pooled cohort risk model. We computed 95% confidence intervals for C statistics and for the difference in C statistic values between models by bootstrapping. Model calibration was assessed by comparing observed vs expected event probabilities using the Greenwood-Nam-D'Agostino χ2 test.23
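To make the discrimination analysis concrete, the following Python sketch computes a Harrell C statistic with follow-up truncated at 10 years and a percentile bootstrap interval for the difference in C between 2 risk predictors. It is a simplified illustration, not the R code used in the study: the function names, the toy data, the handling of tied event times (ignored), and the use of raw risk values rather than fitted Cox model predictions are all assumptions.

```python
import random

def harrell_c(times, events, risks, horizon=10.0):
    """Harrell C with administrative censoring at `horizon` years."""
    t = [min(x, horizon) for x in times]
    e = [bool(ev) and x <= horizon for x, ev in zip(times, events)]
    concordant, usable = 0.0, 0
    for i in range(len(t)):
        if not e[i]:
            continue  # a usable pair needs an observed event for the earlier time
        for j in range(len(t)):
            if t[i] < t[j]:  # participant i had the event before j's time
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / usable if usable else float("nan")

def bootstrap_c_difference(times, events, base_risk, new_risk,
                           n_boot=200, seed=0):
    """Percentile 95% CI for C(new) - C(base) over participant resamples."""
    rng = random.Random(seed)
    n = len(times)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bt = [times[k] for k in idx]
        be = [events[k] for k in idx]
        c_base = harrell_c(bt, be, [base_risk[k] for k in idx])
        c_new = harrell_c(bt, be, [new_risk[k] for k in idx])
        if c_base == c_base and c_new == c_new:  # skip resamples without usable pairs
            diffs.append(c_new - c_base)
    diffs.sort()
    return (diffs[int(0.025 * (len(diffs) - 1))],
            diffs[int(0.975 * (len(diffs) - 1))])

# Hypothetical toy data: follow-up years, event indicators, and 2 sets of risks.
times = [2.0, 4.0, 6.0, 12.0, 3.0, 9.0, 11.0, 7.5]
events = [1, 1, 0, 1, 0, 1, 0, 1]
base = [0.20, 0.15, 0.05, 0.04, 0.10, 0.08, 0.03, 0.12]
new = [0.22, 0.14, 0.05, 0.03, 0.09, 0.10, 0.03, 0.11]
ci_low, ci_high = bootstrap_c_difference(times, events, base, new)
```

Under the convention stated in the Analysis section, an interval that includes 0 would be read as no significant change in discrimination.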
The net reclassification improvement (NRI) assesses the correct reassignment among risk categories.24,25 Risk probabilities are determined by Cox modeling and a base model is compared with an alternate model that includes the additional classifier being evaluated.26 For the primary analyses, the base model included the pooled cohort risk classifier defined as low risk (≤7.5% 10-year risk of incident events) or high risk (>7.5%) based on the pooled risk equations. The alternate model included the CHD polygenic risk score. We used bootstrapping to determine 95% confidence intervals.
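The 2-category NRI at the 7.5% threshold can be written out as a short Python sketch. It is a simplification of the analysis described above: it assumes that 10-year event status is known for every participant, whereas the study estimated risk with Cox models and used the nricens package, which accounts for censoring. The function name and the toy values are hypothetical.

```python
def categorical_nri(base_risk, new_risk, had_event, threshold=0.075):
    """Two-category NRI: low risk (<= threshold) vs high risk (> threshold).

    Returns (nri_total, nri_events, nri_nonevents).
    Assumption: had_event is a fully observed 10-year outcome (no censoring).
    """
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for b, a, ev in zip(base_risk, new_risk, had_event):
        moved_up = a > threshold and b <= threshold
        moved_down = a <= threshold and b > threshold
        if ev:
            n_e += 1
            up_e += moved_up      # upward moves are correct for events
            down_e += moved_down
        else:
            n_n += 1
            up_n += moved_up
            down_n += moved_down  # downward moves are correct for nonevents
    if n_e == 0 or n_n == 0:
        raise ValueError("need at least one event and one nonevent")
    nri_events = (up_e - down_e) / n_e
    nri_nonevents = (down_n - up_n) / n_n
    return nri_events + nri_nonevents, nri_events, nri_nonevents

# Hypothetical toy data: 2 participants with events and 4 without.
base = [0.05, 0.09, 0.06, 0.10, 0.04, 0.08]
new = [0.08, 0.07, 0.06, 0.12, 0.03, 0.06]
event = [1, 1, 0, 0, 0, 0]
nri_total, nri_ev, nri_nonev = categorical_nri(base, new, event)
```

Because the event and nonevent components are each bounded by −1 and 1, the total ranges from −2 to 2, and reporting the 2 components separately shows which group drives any apparent improvement.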
All statistical tests were 2-sided and a P < .05 was considered significant. We also considered significant 95% confidence intervals derived from bootstrapping that did not cross 0.
Separate models that included either only men or only women were also run.27 Analyses were conducted in R version 3.5.0 with the survival, survminer, nricens, boot, and DescTools packages.
Results
The overall ARIC study sample consisted of 13 113 participants, of whom 7480 participants of European ancestry had complete data at their baseline visit 1 examination (Figure 1; and eTable 1 in the Supplement). Of these, 4847 participants between the ages of 53 and 74 years did not have a prior diagnosis of CHD at their visit 4 examination (43.6% men) (Table 1). The overall MESA sample comprised 6680 participants, of whom 2390 met the inclusion criteria (47.8% men) (Table 1 and Figure 1). The ARIC study reported 696 (14.4%) incident CHD events over a median follow-up of 15.5 years (interquartile range [IQR], 6.3 years) with 448 (64%) occurring in men. The MESA cohort reported 227 (9.5%) incident CHD events over a median follow-up of 14.2 years (IQR, 2.5 years) with 139 (61%) occurring in men. Demographic characteristics by sex are presented in eTable 2 in the Supplement.
In the ARIC study, 394 participants had prevalent CHD at visit 1 in 1986 and another 611 participants developed incident CHD before visit 4 in 1996. The association of the polygenic risk score with these combined prevalent and incident CHD cases was assessed in age- and sex-adjusted analyses to provide information comparable with prior studies that had used a similar end point. The polygenic risk score was significantly associated with CHD (adjusted odds ratio [OR] per SD increment, 1.89; 95% CI, 1.75-2.03; Table 2). Those in the top decile of the polygenic risk score had an adjusted OR of 3.19 (95% CI, 2.64-3.84) compared with those in the lower 9 deciles. The odds of CHD for other quantiles of the polygenic risk score are presented in Table 2.
Table 2. Odds Ratios Associated With CHD Cases Prior to Visit 4 for Selected Polygenic Risk Score Percentiles Among ARIC Participants.
Polygenic Risk Scores | Odds Ratio (95% CI)a |
---|---|
Continuous per SD Increment | 1.89 (1.75-2.03) |
Top percentilesb | |
20 | 2.89 (2.49-3.36) |
10 | 3.19 (2.64-3.84) |
5 | 4.14 (3.25-5.26) |
2 | 4.81 (3.32-6.93) |
Abbreviations: ARIC, Atherosclerosis Risk in Communities; CHD, coronary heart disease.
a Odds ratios (95% CIs) are derived from logistic regression models adjusted for age, sex, and 5 principal components. Differences in allele frequencies between cases and controls due to systematic ancestral differences (population stratification) can cause spurious genetic associations. Such differences can be captured by principal components. The statistical models were adjusted for principal components to ensure that associations with the polygenic risk score were not attributable to such population stratification.
b The odds ratios compare the odds of a CHD diagnosis for individuals in the top percentiles of the distribution with the odds for individuals in the remaining sample. For instance, participants in the top 20% of the polygenic risk score distribution are compared with individuals in the bottom 80%.
The association of the polygenic risk score with incident CHD events in the ARIC cohort after the fourth visit in 1996 and the MESA cohort after the first visit in 2000 was examined next. The polygenic risk score was significantly associated with incident CHD in the ARIC cohort (adjusted hazard ratio [HR] per SD increment, 1.24; 95% CI, 1.15-1.34) and the MESA cohort (adjusted HR, 1.38; 95% CI, 1.21-1.58; Table 3). Hazard ratios associated with polygenic risk score values in the upper percentiles of the distribution are shown in Table 3 and are shown stratified by sex in eTable 3 in the Supplement.
Table 3. Hazard Ratios for ARIC and MESA Incident CHD Events for Selected Polygenic Risk Score Strata.
Polygenic Risk Scores | ARIC, Hazard Ratio (95% CI)a | MESA, Hazard Ratio (95% CI)a |
---|---|---|
Continuous per SD increment | 1.24 (1.15-1.34) | 1.38 (1.21-1.58) |
Top percentilesb | ||
20 | 1.54 (1.30-1.83) | 1.63 (1.22-2.19) |
10 | 1.68 (1.35-2.09) | 1.74 (1.21-2.51) |
5 | 1.68 (1.25-2.26) | 2.15 (1.37-3.37) |
2 | 2.04 (1.33-3.13) | 2.68 (1.41-5.06) |
Abbreviations: ARIC, Atherosclerosis Risk in Communities; CHD, coronary heart disease; MESA, Multi-Ethnic Study of Atherosclerosis.
a Hazard ratios (95% CIs) are derived from a Cox proportional hazards regression adjusted for age, sex, and 5 principal components.
b The hazard ratios compare the hazard of a CHD diagnosis for individuals in the top percentiles of the distribution with the hazard for individuals in the remaining sample.
The C statistic associated with the polygenic risk score alone was 0.549 (95% CI, 0.521 to 0.571) for the ARIC cohort and 0.587 (95% CI, 0.532 to 0.623) for the MESA cohort (Table 4). A model that included age and sex in addition to the polygenic risk score had a C statistic of 0.669 (95% CI, 0.644 to 0.691) for the ARIC cohort and 0.672 (95% CI, 0.627 to 0.705) for the MESA cohort. The addition of the polygenic risk score to the pooled equations predictor (modeled as a continuous variable between 0 and 1) did not significantly change the C statistic in the ARIC cohort, which went from 0.701 to 0.700 (difference, −0.001; 95% CI, −0.009 to 0.006). In the MESA cohort, the addition of the polygenic risk score changed the C statistic from 0.660 to 0.681, an increase that was not statistically significant (difference, 0.021; 95% CI, −0.0004 to 0.043; Table 4). In both data sets, the findings were similar when the polygenic risk score was dichotomized at the top 10% or top 20% of the distribution (Table 4). Similar findings were observed in sex-stratified analyses, analyses that used alternative risk scores comprising smaller numbers of SNPs, and analyses that excluded participants taking lipid-lowering statin medications (eTable 4, eTable 5, and eTable 6 in the Supplement).
Table 4. C Statistics Evaluating the Performance of the Polygenic Risk Score in ARIC and MESA.
Model | ARIC, C Statistic (95% CI)a | MESA, C Statistic (95% CI)a |
---|---|---|
5 principal components + PRSb | 0.549 (0.521-0.571) | 0.587 (0.532-0.623) |
Age + sex + 5 principal components | 0.663 (0.638-0.684) | 0.646 (0.600-0.681) |
Age + sex + 5 principal components + PRS | 0.669 (0.644-0.691) | 0.672 (0.627-0.705) |
Base modelc | 0.701 (0.679-0.722) | 0.660 (0.613-0.694) |
Base model + PRS | 0.700 (0.677-0.721) | 0.681 (0.637-0.715) |
Base model + PRS: Top 10%d | 0.700 (0.676-0.721) | 0.675 (0.630-0.711) |
Base model + PRS: Top 20%d | 0.700 (0.675-0.721) | 0.670 (0.625-0.703) |
Base model + family history | 0.705 (0.681-0.725) | NA |
Abbreviations: ARIC, Atherosclerosis Risk in Communities; MESA, Multi-Ethnic Study of Atherosclerosis; NA, not available; PRS, polygenic risk score.
a C statistics are based on 10-year incident events from a Cox regression model.
b The PRS is modeled as a continuous variable.
c The base model includes the pooled cohort risk percentile (a continuous variable) based on the pooled equations, sex, age, and 5 principal components.
d The PRS is modeled as a binary variable representing individuals in the top 10% of the score distribution vs the bottom 90% or the top 20% vs the bottom 80%, respectively.
Calibration was assessed by comparing expected and actual event rates for CHD models with and without the polygenic risk score. In the ARIC study, the pooled cohort equations model categorized 39.2% of the sample as low risk (predicted 10-year event rate ≤7.5%) and 60.8% as high risk (predicted 10-year event rate >7.5%) (Figure 2). Actual event rates in these groups were 4.4% and 16.7%, respectively. After adding the polygenic risk score, the model categorized similar proportions of individuals as low risk (42.2%) and high risk (57.8%), with similar event rates as well (4.4% and 17.3%). Calibration analyses suggested better calibration in the model without the polygenic risk score (Greenwood-Nam-D’Agostino χ2, P = .85) than with the polygenic risk score (P = .03) (eFigure 1 in the Supplement).
In the MESA cohort, the pooled cohort equations model categorized 54.7% of the sample as low risk and 45.3% as high risk (Figure 2) with actual event rates of 3.4% and 13.4%, respectively. After adding the polygenic risk score, the proportion of individuals categorized as low risk was 59.2% and as high risk was 40.8%, with event rates of 3.8% and 14.0%, respectively. Both models showed good calibration (Greenwood-Nam-D’Agostino χ2, P = .39 and P = .93, respectively, for models without and with the polygenic risk score).
Adding the polygenic risk score to the pooled cohort risk categories did not significantly improve classification accuracy in either the ARIC cohort (NRI, 0.018; 95% CI, −0.012 to 0.036) or the MESA cohort (NRI, 0.001; 95% CI, −0.038 to 0.076; Table 5). The overall proportions of individuals reclassified to a new category were 4.4% in the ARIC study and 6.9% in the MESA study. Among those who subsequently developed a CHD event, these reclassifications were often incorrect (80.0% of reclassifications in the ARIC cohort and 78.6% of reclassifications in the MESA cohort; Figure 2). There was no significant improvement associated with the polygenic risk score in analyses stratified by sex (eTable 7 in the Supplement).
Table 5. Reclassification Based on the Net Reclassification Improvement.
Model 1a | Model 2a | ARIC, Net Reclassification Improvement (95% CI)b | MESA, Net Reclassification Improvement (95% CI)b |
---|---|---|---|
Age + sex | + Pooled cohort risk groupc | 0.020 (−0.015 to 0.091) | 0.112 (0.001 to 0.167) |
Age + sex | + Polygenic risk score | −0.022 (−0.036 to 0.021) | 0.042 (−0.046 to 0.102) |
Age + sex + pooled cohort risk groupc | + Polygenic risk score | 0.018 (−0.012 to 0.036) | 0.001 (−0.038 to 0.076) |
Abbreviations: ARIC, Atherosclerosis Risk in Communities; MESA, Multi-Ethnic Study of Atherosclerosis.
a All models are additionally adjusted for 5 principal components.
b The NRI compares participant reassignment to high- vs low-risk categories for a base statistical model (model 1) compared with a model that includes an additional covariate (model 2). The NRI reported herein is based on a 7.5% 10-year coronary heart disease risk threshold, with risk being estimated using a Cox proportional hazards model. The maximum value of the categorical NRI is 2, because it is the sum of the net proportions of correct reclassifications for events and nonevents. Most cardiovascular risk factors have NRI values in excess of 0.10.
c The pooled cohort group is a binary classifier for low vs high risk based on the pooled equations classifier.
Discussion
A CHD polygenic risk score offered little to no improvement in CHD risk stratification among middle-aged adults of white European ancestry in 2 well-characterized cohorts analyzed retrospectively. The score minimally changed risk discrimination and reclassified fewer than 10% of individuals to a higher or lower CHD risk category. Furthermore, among individuals who subsequently developed CHD, the specific group that screening programs wish to identify, the majority of reclassifications (79%-80%) were incorrect. Neither the proportions of individuals categorized as high risk or low risk nor the observed event rates in each group were substantially altered by the polygenic risk score, suggesting that implementation may have limited effect at the population level.
These analyses used the polygenic risk score developed by Khera and colleagues,7 based on 6 million SNPs from an international GWAS and cross-sectionally validated in the UK Biobank, a middle-aged population of largely European ancestry. Similar results were seen with regard to the robust association of the polygenic risk score with prevalent CHD, with relative risk estimates in a similar range. These data support the overall applicability of their genetic model to these US-based white populations and suggest that its limited predictive utility was not a result of poor model selection.28 This study also found that polygenic risk scores comprising smaller numbers of SNPs strongly associated with CHD risk did not perform better than the 6-million SNP predictor.
These findings underscore the frequent discordance between statistical association and predictive performance, a phenomenon that has been observed with other cardiovascular biomarkers.4 Odds ratios greater than 10 are typically required for new risk markers to substantially improve model discrimination.29,30 The odds ratio associated with being in the top 5% of polygenic risk score (≈ 4) is comparable with that observed with other biomarkers such as C-reactive protein and homocysteine that have been shown to have similarly modest predictive utility.31,32
Initial studies characterizing highly polygenic CHD predictors using UK Biobank data reported C statistics of approximately 0.80.7,8 However, these estimates were from models that included age and sex, which are the major determinants of CHD risk.33 These analyses highlight the modest performance of the polygenic risk score when considered alone, consistent with the findings of Inouye et al.8 Another distinction between these analyses and the aforementioned studies is the primary focus on incident events rather than prevalent CHD. A few studies have examined prospective outcomes, and they have observed results similar to those reported herein. For instance, the C statistics for an analysis of incident CHD in French-Canadians were 0.56 to 0.60, despite findings of high risk estimates associated with being in the tails of the polygenic risk score distribution.10 Similarly, a study involving approximately 52 000 white people in a Northern California health care system found a C statistic improvement of 0.008 when a polygenic risk score was added to the Framingham risk score.9
A proposed strength of the polygenic risk score is its ability to identify a subgroup of individuals with a relative risk of CHD comparable with individuals with monogenic traits, such as familial hypercholesterolemia. In the UK Biobank, the polygenic risk score identified 8% of the sample with a relative risk of CHD of 3, similar to the risk associated with some familial hypercholesterolemia mutations.34 Enrichment of high-risk individuals in the tails of the distribution is a feature of many biomarkers of modest prognostic utility.35 Furthermore, an important distinction between risk stratification using a mendelian variant or a rare genetic variant vs a polygenic risk score is that the former identifies individuals with a specific mechanism of disease that can be targeted. In contrast, the polygenic risk score does not focus on an underlying mechanism, biology, or behavior that can be intervened upon. Interventions promoting general cardiovascular risk modification in individuals with high genetic risk have yielded mixed results to date.36,37 Thus, the clinical value of the polygenic risk score relies principally on its ability to risk stratify individuals, which, in this case, appears limited.
Another potential advantage of the polygenic risk score, compared with conventional risk markers, is that it can be assessed at an early age. Given the poor discriminative performance of a polygenic risk score observed in these analyses, the clinical implications of finding a high polygenic risk score in a young person with very low absolute risk are unclear, in the absence of an identifiable risk factor such as hyperlipidemia. Screening with a polygenic risk score could provide motivation for lifestyle modification (eg, better diet or increased physical activity), but there may be simpler ways to promote such interventions at the individual or population level.
Limitations
This study has several limitations. First, the SNP weights for the polygenic risk score were derived from the CARDIOGRAMplusC4D GWAS. Although 77% of participants in this GWAS were of European ancestry, inclusion of other ancestries could affect SNP weights and attenuate the performance of the polygenic risk score in a European ancestry population. Similarly, CARDIOGRAMplusC4D captured a heterogeneous collection of CHD cases, so while SNP weightings derived from this study may be well-suited to the heterogeneous mix of cases that would be expected in the community-based cohorts used in these analyses, the weightings are not optimized to capture specific CHD subtypes such as early-onset CHD. Second, the ARIC study was one of the cohorts in CARDIOGRAMplusC4D, which might lead to an overestimation of the performance of the polygenic risk score. The ARIC study also contributed data used for the pooled cohort equations; however, the ARIC events that contributed to the pooled cohort equations largely occurred prior to the visit 4 time point used in this study. The consistency of the results between the ARIC and MESA studies (the latter of which did not contribute to the derivation of the pooled cohort equations) further supports the study findings. Third, the pooled cohort equations were calibrated to identify individuals at risk of any atherosclerotic cardiovascular event, not just CHD. This might lead to an overestimation of the performance of the polygenic risk score relative to the pooled cohort equations.
Fourth, a 10-year risk threshold of 7.5% was used to assess reclassification, based on the current ACC/AHA cholesterol guidelines. It is possible that the performance of the predictor could vary using other thresholds. Fifth, the analyses were restricted to participants in epidemiological cohorts, who may not be representative of individuals seen in other settings such as hospitals or clinics. Sixth, these analyses were restricted to individuals of European descent because the polygenic risk score was calibrated to a European-ancestry GWAS using a European linkage-disequilibrium reference panel. The inability to assess polygenic prediction in nonwhite individuals underscores the problem of limited diversity in prior GWAS.38
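The dependence of reclassification on the chosen threshold can be sketched with a small calculation. The example below is a simplified two-category net reclassification improvement for binary outcomes. It ignores censoring and follow-up time, so it is not the survival-based method used in these analyses, and the function name `categorical_nri` and the toy risk values are hypothetical.

```python
def categorical_nri(old_risk, new_risk, event, threshold=0.075):
    """Two-category NRI at a single risk threshold (binary outcomes only).

    Returns (nri_events, nri_nonevents, nri_total). Censoring is ignored,
    which is a simplifying assumption for illustration.
    """
    up_e = down_e = n_e = 0  # movements among participants with events
    up_n = down_n = n_n = 0  # movements among participants without events

    for old, new, had_event in zip(old_risk, new_risk, event):
        old_high = old >= threshold
        new_high = new >= threshold
        moved_up = new_high and not old_high
        moved_down = old_high and not new_high
        if had_event:
            n_e += 1
            up_e += moved_up
            down_e += moved_down
        else:
            n_n += 1
            up_n += moved_up
            down_n += moved_down

    if n_e == 0 or n_n == 0:
        raise ValueError("need at least one event and one nonevent")

    # Events should move up; nonevents should move down.
    nri_events = (up_e - down_e) / n_e
    nri_nonevents = (down_n - up_n) / n_n
    return nri_events, nri_nonevents, nri_events + nri_nonevents


# Toy 10-year risks (invented for illustration, not study data).
old = [0.05, 0.10, 0.06, 0.09, 0.03, 0.08]
new = [0.08, 0.12, 0.05, 0.06, 0.02, 0.09]
events = [1, 1, 0, 0, 0, 1]

print(categorical_nri(old, new, events, threshold=0.075))
print(categorical_nri(old, new, events, threshold=0.10))
```

With these toy values, one event moves above and one nonevent moves below the 7.5% threshold, whereas no participant crosses a 10% threshold, so the same pair of models yields a different net reclassification improvement depending on the cut point.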
Conclusions
In this analysis of 2 cohorts of US adults, the polygenic risk score was associated with incident coronary heart disease events but did not significantly improve discrimination, calibration, or risk reclassification compared with conventional predictors. These findings suggest that a polygenic risk score may not enhance risk prediction in a general, white middle-aged population.
References
- 1. Benjamin EJ, Muntner P, Alonso A, et al; American Heart Association Council on Epidemiology and Prevention Statistics Committee and Stroke Statistics Subcommittee. Heart disease and stroke statistics-2019 update: a report from the American Heart Association. Circulation. 2019;139(10):e56-e528. doi: 10.1161/CIR.0000000000000659
- 2. Goff DC Jr, Lloyd-Jones DM, Bennett G, et al; American College of Cardiology/American Heart Association Task Force on Practice Guidelines. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Circulation. 2014;129(25)(suppl 2):S49-S73. doi: 10.1161/01.cir.0000437741.48606.98
- 3. Muntner P, Colantonio LD, Cushman M, et al. Validation of the atherosclerotic cardiovascular disease pooled cohort risk equations. JAMA. 2014;311(14):1406-1415. doi: 10.1001/jama.2014.2630
- 4. Greenland P, Hassan S. Precision preventive medicine-ready for prime time? JAMA Intern Med. 2019;179(5):605-606. doi: 10.1001/jamainternmed.2019.0142
- 5. Torkamani A, Wineinger NE, Topol EJ. The personal and clinical utility of polygenic risk scores. Nat Rev Genet. 2018;19(9):581-590. doi: 10.1038/s41576-018-0018-x
- 6. Khera AV, Chaffin M, Wade KH, et al. Polygenic prediction of weight and obesity trajectories from birth to adulthood. Cell. 2019;177(3):587-596.e9. doi: 10.1016/j.cell.2019.03.028
- 7. Khera AV, Chaffin M, Aragam KG, et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet. 2018;50(9):1219-1224. doi: 10.1038/s41588-018-0183-z
- 8. Inouye M, Abraham G, Nelson CP, et al; UK Biobank CardioMetabolic Consortium CHD Working Group. Genomic risk prediction of coronary artery disease in 480,000 adults: implications for primary prevention. J Am Coll Cardiol. 2018;72(16):1883-1893. doi: 10.1016/j.jacc.2018.07.079
- 9. Iribarren C, Lu M, Jorgenson E, et al. Clinical utility of multimarker genetic risk scores for prediction of incident coronary heart disease: a cohort study among over 51 000 individuals of European ancestry. Circ Cardiovasc Genet. 2016;9(6):531-540. doi: 10.1161/CIRCGENETICS.116.001522
- 10. Wünnemann F, Sin Lo K, Langford-Avelar A, et al. Validation of genome-wide polygenic risk scores for coronary artery disease in French Canadians. Circ Genom Precis Med. 2019;12(6):e002481. doi: 10.1161/CIRCGEN.119.002481
- 11. Knowles JW, Ashley EA. Cardiovascular disease: the rise of the genetic risk score. PLoS Med. 2018;15(3):e1002546. doi: 10.1371/journal.pmed.1002546
- 12. Warren M. The approach to predictive medicine that is taking genomics research by storm. Nature. 2018;562(7726):181-183. doi: 10.1038/d41586-018-06956-3
- 13. The ARIC Investigators. The Atherosclerosis Risk in Communities (ARIC) study: design and objectives. Am J Epidemiol. 1989;129(4):687-702. doi: 10.1093/oxfordjournals.aje.a115184
- 14. Bild DE, Bluemke DA, Burke GL, et al. Multi-Ethnic Study of Atherosclerosis: objectives and design. Am J Epidemiol. 2002;156(9):871-881. doi: 10.1093/aje/kwf113
- 15. Nikpay M, Goel A, Won H-H, et al. A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat Genet. 2015;47(10):1121-1130. doi: 10.1038/ng.3396
- 16. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155(2):945-959.
- 17. Mosley JD, van Driest SL, Wells QS, et al. Defining a contemporary ischemic heart disease genetic risk profile using historical data. Circ Cardiovasc Genet. 2016;9(6):521-530. doi: 10.1161/CIRCGENETICS.116.001530
- 18. Purcell S, Neale B, Todd-Brown K, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559-575. doi: 10.1086/519795
- 19. Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2(12):e190. doi: 10.1371/journal.pgen.0020190
- 20. Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics. 2012;28(24):3326-3328. doi: 10.1093/bioinformatics/bts606
- 21. Vilhjálmsson BJ, Yang J, Finucane HK, et al; Schizophrenia Working Group of the Psychiatric Genomics Consortium, Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am J Hum Genet. 2015;97(4):576-592. doi: 10.1016/j.ajhg.2015.09.001
- 22. Harrell FE Jr, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating the yield of medical tests. JAMA. 1982;247(18):2543-2546. doi: 10.1001/jama.1982.03320430047030
- 23. Demler OV, Paynter NP, Cook NR. Tests of calibration and goodness-of-fit in the survival setting. Stat Med. 2015;34(10):1659-1680. doi: 10.1002/sim.6428
- 24. Pencina MJ, D’Agostino RB Sr, D’Agostino RB Jr, Vasan RS. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med. 2008;27(2):157-172. doi: 10.1002/sim.2929
- 25. Kerr KF, Wang Z, Janes H, McClelland RL, Psaty BM, Pepe MS. Net reclassification indices for evaluating risk prediction instruments: a critical review. Epidemiology. 2014;25(1):114-121. doi: 10.1097/EDE.0000000000000018
- 26. Baker SG, Schuit E, Steyerberg EW, et al. How to interpret a small increase in AUC with an additional risk prediction marker: decision analysis comes through. Stat Med. 2014;33(22):3946-3959. doi: 10.1002/sim.6195
- 27. Hajek C, Guo X, Yao J, et al. Coronary heart disease genetic risk score predicts cardiovascular disease risk in men, not women. Circ Genom Precis Med. 2018;11(10):e002324. doi: 10.1161/CIRCGEN.118.002324
- 28. Janes H, Pepe MS, Gu W. Assessing the value of risk predictions by using risk stratification tables. Ann Intern Med. 2008;149(10):751-760. doi: 10.7326/0003-4819-149-10-200811180-00009
- 29. Wald NJ, Old R. The illusion of polygenic disease risk prediction. Genet Med. 2019;21(8):1705-1707. doi: 10.1038/s41436-018-0418-5
- 30. Pepe MS, Janes H, Longton G, Leisenring W, Newcomb P. Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. Am J Epidemiol. 2004;159(9):882-890. doi: 10.1093/aje/kwh101
- 31. Homocysteine Studies Collaboration. Homocysteine and risk of ischemic heart disease and stroke: a meta-analysis. JAMA. 2002;288(16):2015-2022. doi: 10.1001/jama.288.16.2015
- 32. Kaptoge S, Di Angelantonio E, Pennells L, et al; Emerging Risk Factors Collaboration. C-reactive protein, fibrinogen, and cardiovascular disease prediction. N Engl J Med. 2012;367(14):1310-1320. doi: 10.1056/NEJMoa1107477
- 33. Karmali KN, Goff DC Jr, Ning H, Lloyd-Jones DM. A systematic examination of the 2013 ACC/AHA pooled cohort risk assessment tool for atherosclerotic cardiovascular disease. J Am Coll Cardiol. 2014;64(10):959-968. doi: 10.1016/j.jacc.2014.06.1186
- 34. Perak AM, Ning H, de Ferranti SD, Gooding HC, Wilkins JT, Lloyd-Jones DM. Long-term risk of atherosclerotic cardiovascular disease in US adults with the familial hypercholesterolemia phenotype. Circulation. 2016;134(1):9-19. doi: 10.1161/CIRCULATIONAHA.116.022335
- 35. Wang TJ, Gona P, Larson MG, et al. Multiple biomarkers for the prediction of first major cardiovascular events and death. N Engl J Med. 2006;355(25):2631-2639. doi: 10.1056/NEJMoa055373
- 36. Knowles JW, Zarafshar S, Pavlovic A, et al. Impact of a genetic risk score for coronary artery disease on reducing cardiovascular risk: a pilot randomized controlled study. Front Cardiovasc Med. 2017;4:53. doi: 10.3389/fcvm.2017.00053
- 37. Kullo IJ, Jouni H, Austin EE, et al. Incorporating a genetic risk score into coronary heart disease risk estimates: effect on low-density lipoprotein cholesterol levels (the MI-GENES Clinical Trial). Circulation. 2016;133(12):1181-1188. doi: 10.1161/CIRCULATIONAHA.115.020109
- 38. De La Vega FM, Bustamante CD. Polygenic risk scores: a biased prediction? Genome Med. 2018;10(1):100. doi: 10.1186/s13073-018-0610-x