Abstract
Background
In 2023, the American Heart Association PREVENT (Predicting Risk of Cardiovascular Disease Events) equations were introduced as a tool to improve cardiovascular disease (CVD) risk prediction. This study tests their performance in a socioeconomically diverse cohort.
Methods
We analyzed All of Us participants aged 30 to 79 years without baseline CVD who had the required PREVENT input data, over a 5.4‐year follow‐up. Discrimination was assessed using Harrell's C‐statistic, and calibration was assessed by comparing predicted and observed 5‐year CVD rates across 10‐year risk deciles. Means are reported ±SD.
Results
We examined 9010 individuals (mean age, 63.0±11.0 years; 45.5% male). Racial and ethnic composition was 61.7% non‐Hispanic White, 17.2% non‐Hispanic Black, 4.5% multiracial/other, 1.3% non‐Hispanic Asian, and 11.2% Hispanic or Latino. The "other" race/ethnic category reflects participants who self‐identified as "other" in response to the "Which category describes you?" item in the Basics survey. Over a mean follow‐up of 3.6±1.8 years, 9.0% experienced a cardiovascular event. The mean 10‐year predicted risks were 0.23±0.17 for total CVD, 0.13±0.10 for atherosclerotic CVD (ASCVD), and 0.19±0.17 for heart failure. The predicted‐to‐observed rate ratios were 5.3 for CVD and 3.3 for ASCVD. The C statistic for the overall sample was 0.732 (95% CI, 0.718–0.752) for CVD, 0.716 (95% CI, 0.698–0.741) for ASCVD, and 0.777 (95% CI, 0.757–0.800) for heart failure.
Conclusions
The PREVENT equations showed strong discrimination across all strata in this national cohort. Overprediction of CVD events likely reflects baseline differences in comorbidity burden between the PREVENT development cohort and this All of Us cohort, particularly due to the exclusion of individuals missing estimated glomerular filtration rate, a variable that is not routinely collected and is likely missing not at random. Strong discrimination supports potential clinical utility, though further work is needed to improve calibration in this population.
Keywords: cardiovascular disease, cardiovascular models, heart failure, risk prediction, social determinants of health
Subject Categories: Race and Ethnicity, Social Determinants of Health, Cardiovascular Disease
Nonstandard Abbreviations and Acronyms
- AHA
American Heart Association
- AoU
All of Us
- PREVENT
Predicting Risk of Cardiovascular Disease Events
Clinical Perspective.
What Is New?
The American Heart Association (AHA) PREVENT (Predicting Risk of Cardiovascular Disease Events) equations were introduced in late 2023 as a new tool to improve CVD risk prediction.
The AHA PREVENT equations were shown to have good model discrimination among diverse racial and ethnic groups, but their performance among diverse socioeconomic groups has been less explored.
Using the national All of Us cohort, we show that the AHA PREVENT equations have good discriminatory ability but slight overprediction of short‐term cardiovascular disease risk.
What Are the Clinical Implications?
The AHA PREVENT equations display strong model discrimination across socioeconomic groups.
The AHA PREVENT equations have utility for clinical decision‐making for cardiovascular disease risk prediction, especially among diverse socioeconomic groups.
However, additional calibration of the AHA PREVENT equations is needed for shorter‐term risk prediction.
In late 2023, the American Heart Association (AHA) introduced the PREVENT (Predicting Risk of Cardiovascular Disease Events) equations as an updated method to guide cardiovascular disease (CVD) risk prediction. 1 Previous equations, such as the Pooled Cohort Equations and the Pooled Cohort Equations to Prevent Heart Failure, often used condition‐specific inputs (eg, QRS duration for heart failure) to predict specific CVD outcomes. 2 , 3 In contrast, the PREVENT equations provided a unifying framework to predict the 10‐year and 30‐year risk of atherosclerotic CVD (ASCVD), coronary heart disease, stroke, heart failure (HF), and total CVD (ie, any CVD outcome including ASCVD, coronary heart disease, stroke, and HF). This unifying framework also accounted for the interplay of cardiovascular‐kidney‐metabolic syndrome by including estimated glomerular filtration rate (eGFR) and urinary albumin‐to‐creatinine ratio as predictors within the equations. Additionally, the PREVENT equations expanded eligibility to people aged 30 to 79 years, reflecting the rising incidence of CVD among younger people. Finally, the PREVENT equations removed racial correction factors and incorporated social deprivation indices, citing the lack of utility of previous equations in predicting CVD among non‐White and non‐Black groups and the need to consider upstream social factors that influence CVD disparities by race.
The National Institutes of Health's AoU (All of Us) research program offers a valuable opportunity to evaluate the generalizability of the PREVENT equations to the US population, addressing the limitations of previous validation cohorts. These cohorts—REGARDS (Reasons for Geographic and Racial Differences in Stroke), CRIC (Chronic Renal Insufficiency Cohort), RBS (Rancho Bernardo Study), and OLDW (OptumLabs Data Warehouse)—have significant generalizability constraints. REGARDS mainly focuses on stroke disparities between Black and White individuals in the Southeastern United States and lacks representation of other racial and ethnic groups. CRIC targets individuals with chronic kidney disease, resulting in a narrow population profile. RBS includes only older adults in a predominantly White, higher‐socioeconomic suburban community in Southern California, whereas OLDW often overlooks uninsured individuals. Additionally, these studies rely on extensive follow‐up data, which may not reflect the fragmented health care records common in the United States. In contrast, AoU's intentional recruitment of underrepresented individuals across the United States since 2018 positions it as an ideal validation cohort for a contemporary study. AoU also contains comprehensive survey data that are linked to electronic health records and laboratory data. 4 By examining AoU's diverse population over the initial follow‐up period, we aim to assess model performance in a real‐world population—an underexplored area—providing insights into variations influenced by common yet overlooked limitations.
The PREVENT equations have shown great promise in improving CVD risk prediction (with C‐total statistic interquartile interval [IQI] of 0.789 [0.778–0.810] for women and 0.745 [0.734–0.745] for men for Total CVD, and 0.768 [0.756–0.788] for women and 0.730 [0.718–0.741] for men for ASCVD). However, the equations' generalizability to ongoing diverse racial and socioeconomic cohorts is less known. 1 In this study, we provide insights on the performance of the total CVD and ASCVD PREVENT equations across stratified income and educational groups to further assess their generalizability within community‐based cohorts.
METHODS
Transparency and Openness Promotion Statement
Data used in this study are available to registered users on the AoU Researcher Workbench. The code used for statistical analyses can be accessed on the workbench upon request.
Data Source
This study used curated data repository version 8 from the National Institutes of Health's AoU research program. 5 The longitudinal research database encompasses a wide range of data, including electronic health records, surveys, and genomics, from a diverse US population. As a secondary analysis of deidentified data, this study does not meet federal or Stanford University criteria for human subjects research and therefore does not require institutional review board approval. All participants previously provided informed consent as part of the AoU program. 6
Study Design
This study initially included 38 523 individuals without CVD at baseline. We excluded those with clinical variable values outside of the range of the original development criteria for the equations. 1 Additionally, participants missing responses on smoking frequency from the AoU Lifestyle questionnaire were excluded. To assess potential differences between the included and excluded groups, we examined systematic differences by self‐reported race or ethnicity and socioeconomic status (SES) categories.
Variables
The clinical variables used in this analysis include those required for each PREVENT equation: total cholesterol, high‐density lipoprotein cholesterol, systolic blood pressure, body mass index, and serum creatinine, which was used to calculate eGFR using the race‐free Chronic Kidney Disease Epidemiology Collaboration equation. 7 Creatinine (and thus eGFR) values were missing for a large share of the AoU data set. Although we initially considered imputing missing eGFR values, we chose not to because these values were not missing at random.
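As a minimal sketch of the eGFR step, the function below assumes that the race‐free equation cited above is the 2021 Chronic Kidney Disease Epidemiology Collaboration creatinine equation and that serum creatinine is recorded in mg/dL. The function and argument names are illustrative and are not taken from the study code.

```python
def egfr_ckd_epi_2021(scr_mg_dl: float, age_years: float, female: bool) -> float:
    """Estimate GFR (mL/min/1.73 m2) from serum creatinine.

    Assumption: the study's "race-free" equation is the 2021 CKD-EPI
    creatinine equation, which uses only creatinine, age, and sex.
    """
    kappa = 0.7 if female else 0.9        # sex-specific creatinine scaling
    alpha = -0.241 if female else -0.302  # exponent used when creatinine <= kappa
    ratio = scr_mg_dl / kappa
    egfr = (
        142.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.200
        * 0.9938 ** age_years
    )
    # The equation applies a fixed multiplier for female participants.
    return egfr * 1.012 if female else egfr
```

Because eGFR is derived entirely from creatinine, age, and sex, any participant without a creatinine value has no eGFR, which is why creatinine missingness determines inclusion in the analytic sample.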
Additional variables included were diabetes diagnosis based on International Classification of Diseases (ICD) codes, as well as the use of antihypertensive and lipid‐lowering medications, as identified through Anatomical Therapeutic Chemical medication codes. Our key independent variables of interest were race or ethnicity, annual income, and educational attainment. We examined 6 racial and ethnic groups: Hispanic, non‐Hispanic Asian, non‐Hispanic Black, non‐Hispanic multiracial, non‐Hispanic White, and missing race or ethnicity. In our descriptive analyses, we report the proportion of each group. However, because of small sample sizes in several groups, we were unable to calculate stable discrimination statistics for every group. For this reason, we examine the overall performance for all participants regardless of race or ethnicity and specifically among non‐Hispanic Black and non‐Hispanic White individuals.
For income, we originally examined 7 distinct income categories: <$35k, $35k to $50k, $50k to $75k, $75k to $100k, $100k to $150k, >$150k, and missing income. Due to small sample sizes, we condensed our income categories to the following: <$50k, $50k to $100k, and >$100k.
For educational attainment, we examined 5 categories: primary or some secondary school, high school or General Educational Development (GED), some college, college graduate or higher, and missing education. Due to small sample sizes, we are unable to provide equation performance for missing education. Additionally, we combined individuals with primary or some secondary school with those with a high school or GED degree to create a “high school diploma/GED or lower” category.
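The category condensing described above can be sketched as a pair of lookup tables. The label strings are hypothetical stand‐ins for the survey response options, which are not reproduced in this article, and the choice to raise an error on unrecognized labels is ours.

```python
from typing import Optional

# Hypothetical survey labels mapped to the condensed analysis categories.
INCOME_MAP = {
    "<$35k": "<$50k",
    "$35k-$50k": "<$50k",
    "$50k-$75k": "$50k-$100k",
    "$75k-$100k": "$50k-$100k",
    "$100k-$150k": ">$100k",
    ">$150k": ">$100k",
}

EDUCATION_MAP = {
    "Primary or some secondary": "High school diploma/GED or lower",
    "High school or GED": "High school diploma/GED or lower",
    "Some college": "Some college",
    "College graduate or higher": "College graduate or higher",
}


def collapse(value: Optional[str], mapping: dict, missing_label: str) -> str:
    """Map a raw survey label to its condensed category.

    None is treated as a missing response; labels absent from the mapping
    raise an error so they are handled explicitly rather than silently.
    """
    if value is None:
        return missing_label
    if value not in mapping:
        raise ValueError(f"Unrecognized category: {value!r}")
    return mapping[value]
```

For example, `collapse("$35k-$50k", INCOME_MAP, "Missing income")` returns the `<$50k` group, and a `None` response is kept as its own missing category.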
Finally, we examine the varying distribution of race or ethnicity and SES across risk deciles.
Outcome Variable
Individuals were identified as having a CVD outcome if they had ICD, Ninth Revision (ICD‐9) or ICD, Tenth Revision (ICD‐10) diagnosis codes for 1 of the outcomes. Follow‐up for CVD outcomes ended at the time of the event of interest, and participants without an event were censored at death from any cause or on October 1, 2023, which is the AoU cutoff date for the latest version of the curated data repository.
Events were coded into 4 categories: no event, ASCVD only, HF only, and both ASCVD and HF. This approach captured total CVD as a composite outcome, aligning with the definition used in the Khan et al study. 1 ASCVD was defined as a composite of coronary heart disease and stroke.
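The event coding and follow‐up rules above can be illustrated with a short sketch. The function names and the use of 365.25 days per year are assumptions for illustration; only the 4 event categories and the October 1, 2023, cutoff come from the text.

```python
from datetime import date
from typing import Optional, Tuple

CUTOFF = date(2023, 10, 1)  # data cutoff date stated in the text


def code_event(ascvd: bool, hf: bool) -> str:
    """Assign 1 of the 4 mutually exclusive event categories."""
    if ascvd and hf:
        return "both ASCVD and HF"
    if ascvd:
        return "ASCVD only"
    if hf:
        return "HF only"
    return "no event"


def has_total_cvd(category: str) -> bool:
    """Total CVD is the composite of any ASCVD or HF event."""
    return category != "no event"


def follow_up(
    baseline: date,
    event_date: Optional[date],
    death_date: Optional[date],
    cutoff: date = CUTOFF,
) -> Tuple[float, bool]:
    """Return (years of follow-up, whether the event was observed).

    Follow-up ends at the earliest of the event, death, or the cutoff.
    """
    end = min(d for d in (event_date, death_date, cutoff) if d is not None)
    observed = event_date is not None and event_date <= end
    years = (end - baseline).days / 365.25
    return years, observed
```

A participant with no recorded event or death is therefore followed to the cutoff date and contributes censored time to each outcome.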
Statistical Analysis
A Cox proportional hazards model was fitted to assess the risk factors for total CVD, ASCVD, and HF, modeling time to first event separately for each outcome. Bootstrapping was employed to estimate the 95% CIs for the C index. Separate Cox models were fitted for subgroups based on race and SES to evaluate the performance and risk factors within these groups. Calibration was assessed by comparing predicted 10‐year risk to observed 5‐year event rates for total CVD, ASCVD, and HF separately, stratified by decile, race or ethnicity, and SES.
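To make the discrimination metric concrete, the following is a minimal sketch of Harrell's C index with a percentile bootstrap CI. It is not the study's R code: it scores the predicted risk directly rather than a fitted Cox model, skips pairs with tied follow‐up times, and uses an arbitrary number of resamples and seed.

```python
import random
from typing import List, Sequence, Tuple


def harrell_c(times: Sequence[float], events: Sequence[int], risks: Sequence[float]) -> float:
    """Harrell's C: share of usable pairs in which the participant with the
    earlier observed event also has the higher predicted risk."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when i has an observed event strictly before j's time.
            if events[i] and times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / usable if usable else float("nan")


def bootstrap_ci(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
    n_boot: int = 200,
    level: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for the C index (resampling participants)."""
    rng = random.Random(seed)
    n = len(times)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        c = harrell_c([times[k] for k in idx], [events[k] for k in idx], [risks[k] for k in idx])
        if c == c:  # drop resamples with no usable pairs (NaN)
            stats.append(c)
    if not stats:
        raise ValueError("no bootstrap resample contained a usable pair")
    stats.sort()
    lower = stats[int((1 - level) / 2 * (len(stats) - 1))]
    upper = stats[int((1 + level) / 2 * (len(stats) - 1))]
    return lower, upper
```

Small subgroups yield few usable pairs per resample, which is consistent with the unstable bootstrap estimates noted for the smallest strata.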
ANOVA was conducted to assess significant differences across risk deciles for all strata. Means are reported ±SD. Post hoc tests, including Tukey's honest significant difference and pairwise t tests with Bonferroni correction, identified specific group differences where ANOVA indicated significance. χ2 tests compared the distributions of socioeconomic variables and health outcomes between included and excluded groups. All tests were performed using R version 4.4.0 (2024‐04‐24).
RESULTS
Study Cohort
The final cohort comprised 9010 individuals (Table 1) with a mean age of 63.0±11.0 years. The majority of participants were female (54.5%), and 45.5% were male. The racial and ethnic composition of the cohort included 11.2% Hispanic, 1.3% non‐Hispanic Asian, 17.2% non‐Hispanic Black, 4.5% non‐Hispanic multiracial, and 61.7% non‐Hispanic White participants, with 4.2% missing data on race or ethnicity. Educational attainment varied, with 42.3% holding a college degree or higher, 30.2% having some college education, 18.7% with a high school diploma or GED, and 6.8% with only primary or some secondary education; 1.9% had missing education data. Income distribution showed 42.2% earning <$50k annually, 20.4% earning between $50k and $100k, and 22.4% earning >$100k; income data were missing for 15.0% of the cohort. Employment status indicated that 31.8% were employed for wages, 3.0% were homemakers, 9.4% were out of work, 28.0% were retired, 6.7% were self‐employed, 0.9% were students, and 18.0% were unable to work, with 2.2% missing employment status data.
Table 1.
Descriptive Statistics by AHA PREVENT Risk Category, All of Us Cohort
| Level | Overall | Low risk (<5%) | Borderline risk (5%–7.4%) | Intermediate risk (7.5%–19.9%) | High risk (≥20%) |
|---|---|---|---|---|---|
| No. | 9010 | 1710 | 1099 | 4517 | 1684 |
| Age, y, mean±SD | 63.0±11.0 | 52.2±8.5 | 62.9±8.5 | 67.36±9.5 | 62.9±10.7 |
| Sex | |||||
| Female (%) | 4910 (54.5) | 1229 (71.9) | 724 (65.9) | 2229 (49.3) | 728 (43.2) |
| Male (%) | 4100 (45.5) | 481 (28.1) | 375 (34.1) | 2288 (50.7) | 956 (56.8) |
| Race and ethnicity | |||||
| Hispanic (%) | 1007 (11.2) | 180 (10.5) | 93 (8.5) | 420 (9.3) | 314 (18.6) |
| Non‐Hispanic Asian (%) | 115 (1.3) | 39 (2.3) | * | 40 (0.9) | * |
| Non‐Hispanic Black (%) | 1547 (17.2) | 262 (15.3) | 163 (14.8) | 749 (16.6) | 373 (22.1) |
| Non‐Hispanic multiracial/other (%) | 403 (4.5) | 86 (5.0) | 40 (3.6) | 185 (4.1) | * |
| Non‐Hispanic White (%) | 5559 (61.7) | 1089 (63.7) | 748 (68.1) | 2939 (65.1) | 783 (46.5) |
| Missing race or ethnicity (%) | 379 (4.2) | 54 (3.2) | * | 184 (4.1) | 103 (6.1) |
| Educational attainment | |||||
| Never attended (%) | * | * | * | * | * |
| Primary or some secondary (%) | 615 (6.8) | 64 (3.7) | * | 267 (5.9) | 239 (14.2) |
| High school diploma or General Educational Development (%) | 1686 (18.7) | 221 (12.9) | 179 (16.3) | 848 (18.8) | 438 (26.0) |
| Some college (%) | 2721 (30.2) | 477 (27.9) | 337 (30.7) | 1341 (29.7) | 566 (33.6) |
| College graduate or higher (%) | 3807 (42.3) | 923 (54.0) | 527 (48.0) | 1989 (44.0) | 368 (21.9) |
| Missing education (%) | 175 (1.9) | * | * | * | * |
| Annual income | |||||
| <$50k (%) | 3798 (42.2) | 581 (34.0) | 403 (36.7) | 1893 (41.9) | 921 (54.7) |
| $50–100k (%) | 1840 (20.4) | 371 (21.7) | 240 (21.8) | 1001 (22.2) | 228 (13.5) |
| >$100k (%) | 2022 (22.4) | 583 (34.1) | 303 (27.6) | 962 (21.3) | 174 (10.3) |
| Missing income (%) | 1350 (15.0) | 175 (10.2) | 153 (13.9) | 661 (14.6) | 361 (21.4) |
| Current employment | |||||
| Employed (%) | 2862 (31.8) | 935 (54.7) | 418 (38.0) | 1151 (25.5) | 358 (21.3) |
| Homemaker (%) | 271 (3.0) | 96 (5.6) | 39 (3.5) | 89 (2.0) | ‐ |
| Out of work (%) | 847 (9.4) | 164 (9.6) | 100 (9.1) | 348 (7.7) | 235 (14.0) |
| Retired (%) | 2523 (28.0) | 79 (4.6) | 224 (20.4) | 1801 (39.9) | 410 (24.3) |
| Self‐employed (%) | 604 (6.7) | 136 (8.0) | 88 (8.0) | 288 (6.4) | 92 (5.5) |
| Student (%) | 81 (0.9) | 37 (2.2) | * | * | * |
| Unable to work (%) | 1624 (18.0) | 232 (13.6) | 197 (17.9) | 736 (16.3) | 459 (27.3) |
| Missing employment status (%) | 198 (2.2) | 31 (1.8) | * | * | 61 (3.6) |
| Health outcomes and behaviors | |||||
| Body mass index, kg/m2, mean±SD | 29.09 ±5.09 | 28.39 ±5.14 | 28.52 ±4.96 | 29.14 ±5.07 | 30.02 ±5.05 |
| Current smoking (%) | 2431 (27.0) | 297 (17.4) | 225 (20.5) | 1071 (23.7) | 838 (49.8) |
| Diabetes (%) | 2719 (30.2) | 100 (5.8) | 178 (16.2) | 1365 (30.2) | 1076 (63.9) |
| Total cholesterol, mg/dL, mean±SD | 212.38 ±42.29 | 209.29 ±36.88 | 218.84 ±41.35 | 213.05 ±42.57 | 209.49 ±46.56 |
| High‐density lipoprotein cholesterol, mg/dL, mean±SD | 60.74 ±17.25 | 66.58 ±17.09 | 66.86 ±17.33 | 61.32 ±16.44 | 49.27 ±13.55 |
| eGFR, mL/min/1.73m2, mean±SD | 59.85 ±36.18 | 87.27 ±18.10 | 77.47 ±20.56 | 60.04 ±32.75 | 20.01 ±31.07 |
| Systolic blood pressure, mm Hg, mean±SD | 146.11 ±23.30 | 134.26 ±17.83 | 141.80 ±20.02 | 148.12 ±23.07 | 155.58 ±25.27 |
| Antihypertensive medication use (%) | 5822 (64.6) | 678 (39.6) | 584 (53.1) | 3186 (70.5) | 1374 (81.6) |
| Statin medication use (%) | 4616 (51.2) | 342 (20.0) | 497 (45.2) | 2752 (60.9) | 1025 (60.9) |
| Cardiovascular outcomes | |||||
| 10‐y total CVD risk, mean±SD | 0.23 ±0.17 | 0.04 ±0.02 | 0.11 ±0.02 | 0.23 ±0.08 | 0.50 ±0.14 |
| 10‐y total ASCVD risk, mean±SD | 0.13 ±0.10 | 0.03 ±0.01 | 0.06 ±0.01 | 0.13 ±0.03 | 0.29 ±0.10 |
| 10‐y HF risk, mean±SD | 0.19 ±0.17 | 0.03 ±0.02 | 0.07 ±0.03 | 0.18 ±0.10 | 0.45 ±0.18 |
| CVD (%) | 807 (9.0) | 55 (3.2) | 56 (5.1) | 397 (8.8) | 299 (17.8) |
| ASCVD (%) | 517 (5.7) | 44 (2.6) | 42 (3.8) | 246 (5.4) | 185 (11.0) |
| HF (%) | 410 (4.6) | * | * | 209 (4.6) | 168 (10.0) |
| Died during follow‐up (%) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
AHA indicates American Heart Association; ASCVD, atherosclerotic cardiovascular disease; CVD, cardiovascular disease; HF, heart failure; and PREVENT, Predicting Risk of Cardiovascular Disease Events.
Counts <20, or values that could enable deduction of participant identities, have been suppressed.
Among this sample, 1710 individuals were classified as “low risk (<5%)” (19.0%) by the PREVENT equation, 1099 individuals were classified as “borderline risk (5%–7.4%)” (12.2%), 4517 individuals were classified as “intermediate risk (7.5%–19.9%)” (50.1%), and 1684 individuals were classified as “high risk (≥20%)” (18.7%) for CVD (Table 1). As expected, mean CVD risk was highest among the “high risk” category (0.50±0.14) and lowest among the “low risk” category (0.04±0.02).
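The risk categories can be expressed as a simple threshold rule. This sketch takes the cut points from the Table 1 column headings; the function name is hypothetical, and which 10‐year risk estimate is passed in is left to the caller.

```python
def risk_category(ten_year_risk: float) -> str:
    """Classify a 10-year predicted risk (as a proportion between 0 and 1)
    using the cut points in the Table 1 column headings."""
    if not 0.0 <= ten_year_risk <= 1.0:
        raise ValueError("risk must be a proportion between 0 and 1")
    if ten_year_risk < 0.05:
        return "Low risk (<5%)"
    if ten_year_risk < 0.075:
        return "Borderline risk (5%-7.4%)"
    if ten_year_risk < 0.20:
        return "Intermediate risk (7.5%-19.9%)"
    return "High risk (>=20%)"
```

Using half‐open intervals avoids gaps between categories, so a risk of exactly 7.5% falls in the intermediate group and exactly 20% in the high‐risk group.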
There were significant differences in sociodemographic characteristics by risk group (Table 1). The “low risk” group had a higher proportion of college graduates or higher (54.0%) compared with the “high risk” group (21.9%). Similarly, 34.1% of the “low risk” group reported an income >$100 000 annually, whereas only 10.3% of the “high risk” group fell into this income category.
Performance of PREVENT Equations Across Strata
Table 2 presents the performance of the PREVENT equations, stratified by race and ethnicity, income, and educational attainment. The PREVENT total CVD equation performed well among the full cohort (C statistic, 0.732 [95% CI, 0.718–0.752]). The total CVD equation performed robustly among non‐Hispanic White (C statistic, 0.748 [95% CI, 0.729–0.774]) and non‐Hispanic Black individuals (C statistic, 0.733 [95% CI, 0.705–0.775]). The PREVENT ASCVD equation performed similarly among non‐Hispanic White (C statistic, 0.716 [95% CI, 0.692–0.754]) and non‐Hispanic Black individuals (C statistic, 0.720 [95% CI, 0.690–0.776]). The PREVENT HF equation also performed similarly among non‐Hispanic White (C statistic, 0.812 [95% CI, 0.790–0.842]) and non‐Hispanic Black (C statistic, 0.765 [95% CI, 0.734–0.842]) individuals. Differences between groups were not statistically significant, as indicated by the overlapping 95% CIs.
Table 2.
AHA PREVENT Total CVD, ASCVD, and HF C‐Statistics by Racial or Ethnic Category, Income, and Educational Attainment
| Category | 10‐y total CVD risk, C‐statistic (95% CI) | 10‐y ASCVD risk, C‐statistic (95% CI) | 10‐y HF risk, C‐statistic (95% CI) |
|---|---|---|---|
| Overall | 0.732 (0.718–0.752) | 0.716 (0.698–0.741) | 0.777 (0.757–0.800) |
| Race/ethnic group | |||
| Non‐Hispanic White | 0.748 (0.729–0.774) | 0.716 (0.692–0.754) | 0.812 (0.790–0.842) |
| Non‐Hispanic Black | 0.733 (0.705–0.775) | 0.720 (0.690–0.776) | 0.765 (0.734–0.842) |
| Other race | 0.707 (0.678–0.749) | 0.704 (0.675–0.761) | 0.753 (0.716–0.807) |
| Income | |||
| <$50k | 0.730 (0.709–0.758) | 0.723 (0.700–0.759) | 0.758 (0.733–0.793) |
| $50k–$100k | 0.728 (0.695–0.785) | 0.717 (0.682–0.792) | 0.785 (0.742–0.857) |
| >$100k | 0.757 (0.730–0.820) | 0.744 (0.714–0.821) | 0.853 (0.825–0.914) |
| Missing income | 0.686 (0.662–0.736) | 0.682 (0.648–0.748) | 0.743 (0.714–0.802) |
| Education | |||
| High school diploma/General Educational Development or lower | 0.706 (0.680–0.743) | 0.713 (0.683–0.760) | 0.731 (0.698–0.777) |
| Some college | 0.730 (0.707–0.764) | 0.729 (0.705–0.774) | 0.770 (0.744–0.811) |
| College graduate or higher | 0.737 (0.712–0.772) | 0.686 (0.661–0.738) | 0.824 (0.792–0.864) |
| Missing education | 0.834 (0.808–0.979) | 0.855 (0.821–0.989) | * |
AHA indicates American Heart Association; ASCVD, atherosclerotic cardiovascular disease; CVD, cardiovascular disease; HF, heart failure; and PREVENT, Predicting Risk of Cardiovascular Disease Events.
*Too few observations to produce a stable bootstrap estimate.
For income (Table 2), discrimination for total CVD was strong across the <$50k (C statistic, 0.730 [95% CI, 0.709–0.758]), $50k to $100k (C statistic, 0.728 [95% CI, 0.695–0.785]), and >$100k (C statistic, 0.757 [95% CI, 0.730–0.820]) income categories. Similarly, discrimination for ASCVD was good in the <$50k (C statistic, 0.723 [95% CI, 0.700–0.759]), $50k to $100k (C statistic, 0.717 [95% CI, 0.682–0.792]), and >$100k (C statistic, 0.744 [95% CI, 0.714–0.821]) income groups. Discrimination was also generally good for HF, with C statistics ranging from 0.758 among people who made <$50k to as high as 0.853 among those who made >$100k. Differences were not statistically significant.
Finally, discriminatory performance was consistent across all educational attainment groups. For individuals with a high school diploma/GED or less education, discrimination was good for total CVD (C‐statistic, 0.706 [95% CI, 0.680–0.743]), ASCVD (C‐statistic, 0.713 [95% CI, 0.683–0.760]), and HF (C‐statistic, 0.731 [95% CI, 0.698–0.777]). For individuals with “some college,” discrimination was similarly strong for total CVD (C‐statistic, 0.730 [95% CI, 0.707–0.764]), ASCVD (C‐statistic, 0.729 [95% CI, 0.705–0.774]), and HF (C‐statistic, 0.770 [95% CI, 0.744–0.811]). Discrimination was also good for individuals with a college degree or higher for total CVD (C‐statistic, 0.737 [95% CI, 0.712–0.772]), ASCVD (C‐statistic, 0.686 [95% CI, 0.661–0.738]), and HF (C‐statistic, 0.824 [95% CI, 0.792–0.864]).
Tables 3 and 4 display the predicted versus observed rates for total CVD and ASCVD (Table 3) and HF (Table 4) for the full sample and by race or ethnicity, income, and educational attainment. In general, the predicted rates of total CVD (23.0%), ASCVD (18.7%), and HF (31.5%) were higher than the observed rates for the full sample (total CVD=9.0%; ASCVD=5.7%; HF=4.6%; P<0.001 for all outcomes). This overprediction in total CVD, ASCVD, and HF persisted when stratified by race or ethnicity, income, and educational attainment. For total CVD by race, the predicted rates were 6‐fold higher than observed among non‐Hispanic White individuals (P<0.001). For income, predicted total CVD rates were 2‐fold higher for each income group (P<0.001), whereas for educational attainment, predicted total CVD rates were 6‐fold higher among those who had at least a college education (P<0.001).
Table 3.
PREVENT Equation 10‐Year Predicted Versus Observed Rates of Total CVD and ASCVD by Race or Ethnicity, Income, and Educational Attainment
| Category | No. | PREVENT‐predicted total CVD | Observed total CVD | Predicted to observed ratio | P value | PREVENT‐predicted ASCVD | Observed ASCVD | Predicted to observed ratio | P value |
|---|---|---|---|---|---|---|---|---|---|
| Overall | 9010 | 23.0% | 9.0% | 5.3 | <0.001 | 18.7% | 5.7% | 3.3 | <0.001 |
| Race or ethnicity | |||||||||
| Non‐Hispanic White | 5559 | 43.8% | 6.9% | 6.3 | <0.001 | 14.1% | 4.4% | 3.2 | <0.001 |
| Non‐Hispanic Black | 1547 | 53.5% | 13.2% | 4.1 | <0.001 | 24.1% | 8.5% | 2.8 | <0.001 |
| Non‐Hispanic Asian | 115 | 33.0% | 7.0% | 4.8 | <0.001 | 16.5% | 3.5% | 4.8 | <0.001 |
| Hispanic | 1007 | 58.5% | 13.2% | 4.4 | <0.001 | 31.2% | 8.7% | 3.6 | <0.001 |
| Non‐Hispanic other | 403 | 51.6% | 8.4% | 6.1 | <0.001 | 22.8% | 5.7% | 4.0 | <0.001 |
| Missing race or ethnicity | 379 | 57.8% | 11.1% | 5.2 | <0.001 | 27.2% | 6.3% | 4.3 | <0.001 |
| Income | |||||||||
| <$50k | 3798 | 24.3% | 11.4% | 2.1 | <0.001 | 24.2% | 7.2% | 3.3 | <0.001 |
| $50k–$100k | 1840 | 12.4% | 6.1% | 2.0 | <0.001 | 12.4% | 3.8% | 3.3 | <0.001 |
| >$100k | 2022 | 8.6% | 3.9% | 2.2 | <0.001 | 8.6% | 2.8% | 3.1 | <0.001 |
| Missing Income | 1350 | 26.7% | 13.6% | 2.0 | <0.001 | 26.7% | 8.7% | 3.1 | <0.001 |
| Educational attainment | |||||||||
| High school diploma/General Educational Development or lower | 2307 | 60.8% | 12.7% | 4.8 | <0.001 | 29.5% | 8.1% | 3.7 | <0.001 |
| Some college | 2721 | 49.2% | 9.8% | 5.0 | <0.001 | 20.8% | 6.2% | 3.3 | <0.001 |
| College graduate or higher | 3801 | 38.2% | 6.1% | 6.2 | <0.001 | 9.7% | 3.9% | 2.5 | <0.001 |
| Missing educational attainment | 175 | 23.3% | 6.1% | 3.8 | <0.001 | 39.4% | 6.9% | 5.8 | <0.001 |
ASCVD indicates atherosclerotic cardiovascular disease; CVD, cardiovascular disease; HF, heart failure; and PREVENT, Predicting Risk of Cardiovascular Disease Events.
Table 4.
PREVENT Equation 10‐Year Predicted Versus Observed Rates of Heart Failure by Race or Ethnicity, Income, and Educational Attainment
| Category | No. | PREVENT‐predicted HF | Observed HF | Predicted‐to‐observed ratio | P value |
|---|---|---|---|---|---|
| Overall | 9010 | 31.5% | 4.6% | 6.9 | <0.001 |
| Race or ethnicity | |||||
| Non‐Hispanic White | 5559 | 26.4% | 3.5% | 7.5 | <0.001 |
| Non‐Hispanic Black | 1547 | 36.9% | 6.6% | 5.6 | <0.001 |
| Non‐Hispanic Asian | 115 | 20.1% | 3.5% | 6.0 | <0.001 |
| Hispanic | 1007 | 45.5% | 6.8% | 6.6 | <0.001 |
| Non‐Hispanic other | 403 | 35.7% | 4.0% | 9.0 | <0.001 |
| Missing race or ethnicity | 379 | 43.3% | 6.3% | 6.8 | <0.001 |
| Income | |||||
| <$50k | 3798 | 38.7% | 5.9% | 6.6 | <0.001 |
| $50k–$100k | 1840 | 25.0% | 3.1% | 8.1 | <0.001 |
| >$100k | 2022 | 15.9% | 1.7% | 9.2 | <0.001 |
| Missing income | 1350 | 31.4% | 4.6% | 6.0 | <0.001 |
| Educational attainment | |||||
| High school diploma/General Educational Development or lower | 2307 | 46.3% | 6.4% | 7.2 | <0.001 |
| Some college | 2721 | 32.7% | 5.3% | 6.2 | <0.001 |
| College graduate or higher | 3801 | 20.3% | 2.9% | 6.9 | <0.001 |
| Missing educational attainment | 175 | 56.0% | 4.0% | 14.0 | <0.001 |
HF indicates heart failure; and PREVENT, Predicting Risk of Cardiovascular Disease Events.
For ASCVD, the ratio of predicted versus observed rates remained high and statistically significant for each group (Table 3). Among the racial and ethnic groups, predicted versus observed rates of ASCVD were nearly 5‐fold higher among Asian individuals (P<0.001). Among income and educational attainment groups, predicted versus observed rates of ASCVD were over 3‐fold higher among those who made $50k to $100k and those who had some college or less (P<0.001).
Finally, predicted versus observed rates were highest for HF (Table 4). Among the full sample, predicted versus observed rates were nearly 7‐fold higher. Among race groups, Non‐Hispanic White individuals had one of the highest discrepancies, with 7.5‐fold higher predicted HF rates than observed (P<0.001). For income, those who made more than $100k per year had >9‐fold higher predicted rates of HF than observed (P<0.001). Finally, for educational attainment, those with a high school degree or less had 7‐fold higher predicted rates of HF than observed (P<0.001).
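The predicted‐to‐observed comparison reported in Tables 3 and 4 can be sketched as the mean predicted risk divided by the observed event proportion, computed overall or within predicted‐risk deciles. This is an illustrative simplification under our own assumptions: it does not rescale the 10‐year predictions to the shorter observation window, and the helper names are hypothetical.

```python
from statistics import fmean
from typing import Dict, List, Sequence


def predicted_to_observed(pred_risks: Sequence[float], events: Sequence[int]) -> float:
    """Mean predicted risk divided by the observed event proportion.
    Values above 1 indicate overprediction."""
    observed = sum(events) / len(events)
    if observed == 0:
        return float("inf")
    return fmean(pred_risks) / observed


def assign_deciles(pred_risks: Sequence[float]) -> List[int]:
    """Rank participants by predicted risk and split them into 10
    near-equal groups, numbered 1 (lowest risk) to 10 (highest)."""
    n = len(pred_risks)
    order = sorted(range(n), key=lambda i: pred_risks[i])
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles


def ratio_by_decile(pred_risks: Sequence[float], events: Sequence[int]) -> Dict[int, float]:
    """Predicted-to-observed ratio within each predicted-risk decile."""
    groups: Dict[int, List[List[float]]] = {}
    for d, p, e in zip(assign_deciles(pred_risks), pred_risks, events):
        group = groups.setdefault(d, [[], []])
        group[0].append(p)
        group[1].append(e)
    return {d: predicted_to_observed(ps, es) for d, (ps, es) in sorted(groups.items())}
```

A well‐calibrated equation would give ratios near 1 in every decile, so ratios well above 1 across deciles correspond to the systematic overprediction described above.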
Statistical Tests Results
When examining 10‐year predicted versus 5‐year observed risk by AHA PREVENT decile, the 10‐year predicted rates of both total CVD and ASCVD were higher than the observed rates (Table S1). Total CVD risk per 1000 person‐years (Tables S4–S6) was highest among non‐Hispanic Black individuals (31.9 per 1000 person‐years) and Hispanic individuals (31.8 per 1000 person‐years). ASCVD and HF rates were also highest among non‐Hispanic Black (ASCVD: 20.6 per 1000 person‐years; HF: 15.9 per 1000 person‐years) and Hispanic individuals (ASCVD: 21.1 per 1000 person‐years; HF: 16.5 per 1000 person‐years). Total CVD, ASCVD, and HF rates per 1000 person‐years were highest among those with missing income (total CVD: 32.7; ASCVD: 20.9; HF: 17.0) and those with lower educational attainment, such as a high school diploma or GED (total CVD: 30.6; ASCVD: 19.4; HF: 15.6).
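The rates per 1000 person‐years reported above follow the standard incidence rate calculation, sketched here with a hypothetical function name and toy inputs rather than study data.

```python
from typing import Sequence


def rate_per_1000_person_years(events: Sequence[int], follow_up_years: Sequence[float]) -> float:
    """Incidence rate: number of events divided by total follow-up time,
    scaled to 1000 person-years."""
    person_years = sum(follow_up_years)
    if person_years <= 0:
        raise ValueError("total follow-up time must be positive")
    return 1000.0 * sum(events) / person_years


# Toy example: 2 events over 20 person-years of follow-up.
toy_rate = rate_per_1000_person_years([1, 0, 0, 1], [2.0, 5.0, 3.0, 10.0])
```

Unlike a simple event proportion, this rate accounts for differences in follow‐up length between groups, which matters when follow‐up varies as widely as it does in this cohort (3.6±1.8 years).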
A sensitivity analysis was conducted on those included in the final sample compared with those excluded due to missingness of 1 or more of the required variables for PREVENT. The chi‐square test revealed a significant difference in sex distribution between the groups (P<0.001), with the included group showing a greater proportion of women (54.5%) compared with the excluded group (52.1%). Other significant differences between groups include age, comorbidity, and outcome categories (Table S7).
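For a comparison such as sex by inclusion status, the Pearson χ2 test on a 2×2 table can be sketched as follows. The counts in the example are hypothetical, because the included and excluded group counts are reported only in Table S7. The sketch also omits the Yates continuity correction that R's `chisq.test` applies to 2×2 tables by default, so its results would differ slightly from that default.

```python
import math
from typing import Sequence, Tuple


def chi_square_2x2(table: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Pearson chi-square statistic and P value (1 degree of freedom)
    for a 2x2 contingency table, without continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: rows are included/excluded, columns are female/male.
stat, p_value = chi_square_2x2([[10, 20], [20, 10]])
```

With samples as large as those in this cohort, even a difference of a few percentage points can produce a very small P value, so the magnitude of the difference should be read alongside the test result.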
DISCUSSION
Using the AoU cohort, this study demonstrates that the PREVENT equations have strong discriminatory ability over a 5‐year follow‐up period. However, all equations tended to overpredict risk in this AoU cohort, despite higher overall CVD event rates (9.0%) than in the development and validation cohorts used in the original derivation and validation study. 1 These differences in predictive ability and calibration of the AHA PREVENT equations in the AoU cohort compared with the derivation and validation study may be driven by differences in key CVD risk factors such as smoking prevalence (27% in AoU versus ≈6% in the PREVENT development cohort), mean systolic blood pressure (146 mm Hg in AoU versus 125 mm Hg in PREVENT), and diabetes prevalence (30% in AoU versus 10% in PREVENT). The healthier CVD risk profile of the PREVENT validation and derivation cohorts (which include older historical cohorts with a lower burden of CVD) may lead to inflated risk estimates when the equations are applied to a cohort with a higher‐risk baseline profile, such as AoU. This misalignment in risk factor distribution may produce systematic overprediction, particularly in subgroups with high comorbidity prevalence.
Our findings extend beyond the original PREVENT study by testing the equations' performance across socioeconomic classifications, such as income and educational attainment. Although previous studies have used other electronic health record systems as well as the NHANES (National Health and Nutrition Examination Survey) data to evaluate the equations for statin eligibility among racial groups, 8 , 9 this study evaluates their generalizability within a diverse community sample and provides comprehensive sociodemographic characterization, which is often lacking in evaluations of widely used models like the Framingham Risk Score and Pooled Cohort Equations.
In our study, differences in discrimination performance between racial and ethnic groups were not statistically significant, suggesting that the PREVENT equations may be more robust than the Pooled Cohort Equations because of the larger and more racially diverse training and validation cohorts. Unlike the Framingham Risk Score and the Pooled Cohort Equations, which have been criticized for overestimating risk in Black populations, 6 , 10 the PREVENT equations show promise for a more equitable approach to CVD risk prediction across racial groups.
Our study's findings on the variation of model performance across income groups also contribute novel insights. Although SES has long been recognized as a key determinant of cardiovascular outcomes, few studies have rigorously tested how SES affects the performance of prediction models such as the PREVENT equations. 8 , 11 , 12 Previous models have often underperformed in low‐income populations because of underrepresentation in training data sets. 9 , 13
There were notable differences across risk deciles by SES category, with a greater proportion of individuals with less educational attainment and lower income in the higher risk deciles—consistent with findings from previous studies. 14 , 15 Evidence suggests that nonbiological factors, including social, behavioral, and economic stressors, significantly influence CVD outcomes. Further investigation into the unique factors affecting individuals within these risk ranges could provide valuable insights to inform targeted interventions.
Previous studies have highlighted challenges in model generalizability across socioeconomic status groups. 16 , 17 Key features of our cohort, including higher rates of current smoking and diabetes and an older mean age (63.0±11.0 years), likely contribute to discrepancies observed between our results, the original derivation and validation study, and models developed within more integrated health care systems. 1 , 18 The poor calibration observed, particularly the overprediction of cardiovascular risk, underscores the importance of recalibrating the AHA PREVENT equations to ensure accurate clinical application. Overprediction has also been seen in other studies. 19 , 20 An examination of >360 000 patients in a Northern California health care system found that although discrimination for the AHA PREVENT equations was generally good by racial or ethnic group, there was slight overprediction among Asian and Hispanic populations in both aggregated and disaggregated groups. 19 An earlier study in 2023 using the AoU data set, although for a different purpose (to assess the performance of a contralateral breast cancer calculator), also observed significant overprediction of 5‐year risks. 20 Collectively, these findings highlight challenges in predicting outcomes in real‐world settings, underscoring the need to enhance model generalizability across diverse populations.
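As an illustration of what recalibration can mean in its simplest form, the sketch below applies "recalibration in the large": every predicted risk is shifted by one constant on the logit scale so that the mean recalibrated risk matches an observed event rate. This is one generic approach, not a method used or endorsed in this study. The function names and toy values are hypothetical, and a real recalibration would need to account for censoring and the prediction horizon.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def recalibrate_in_the_large(predicted, observed_rate, tol=1e-10):
    """Find one logit-scale shift so mean recalibrated risk equals observed_rate.

    Returns (shift, recalibrated_risks). Because the same shift is applied to
    everyone, the ranking of individuals (and so discrimination) is unchanged.
    """
    lo, hi = -20.0, 20.0
    # Bisection: mean recalibrated risk increases monotonically with the shift.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        mean_risk = sum(expit(logit(p) + mid) for p in predicted) / len(predicted)
        if mean_risk > observed_rate:
            hi = mid
        else:
            lo = mid
    shift = (lo + hi) / 2.0
    return shift, [expit(logit(p) + shift) for p in predicted]


# Hypothetical toy values: four predicted risks and an observed rate of 9%.
toy_predicted = [0.05, 0.10, 0.20, 0.40]
shift, recalibrated = recalibrate_in_the_large(toy_predicted, observed_rate=0.09)
print(f"shift={shift:.3f}", [round(r, 3) for r in recalibrated])
```

A negative shift corresponds to correcting overprediction. A single shift leaves the ordering of individuals untouched, which is why a model can be recalibrated without changing its C statistic.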
The study cohort was significantly limited by the availability of creatinine values, which inherently introduces selection bias within our analytical sample. Although imputation may be a possible way to mitigate this bias, creatinine, and eGFR by extension, may not be missing at random. We recognize that eGFR may not be routinely collected during regular laboratory workup unless there is a clinical suspicion of kidney dysfunction or the presence of related comorbidities. We show that individuals with observed eGFR values were systematically different in clinical variables from those without eGFR. Furthermore, those with eGFR values had higher rates of CVD, supporting the nonrandomness of the analytical sample. Future studies and clinical practice should consider the routine collection of creatinine and calculation of eGFR to better ensure completeness in analysis and avoid such exclusion in further validation of CVD risk prediction equations.
The inability to adequately disaggregate Asian, Hispanic, and multiracial groups due to low event rates, leading to unstable estimates, is another notable limitation of this study. Less populous groups, such as American Indian, Alaska Native, and Native Hawaiian or Pacific Islander, were excluded because of small sample sizes and limited representation in the original validation cohort. Future research should investigate the generalizability of the PREVENT equations for these populations. Additionally, self‐reported income and education data may introduce misclassification bias, particularly for individuals with “missing” income. Although we included a missing income category to reduce sample size biases and found the equations applicable, further research is necessary to address underreporting, especially among those in poverty. Finally, our use of a composite CVD outcome combining ASCVD and HF events within total CVD may have led to misclassification in time‐to‐event analyses, and future studies should consider recurrent event modeling to better capture disease burden.
CONCLUSIONS
The PREVENT equations demonstrate strong discriminatory ability in predicting short‐term CVD risk across various sociodemographic groups, including race or ethnicity, income, and education. However, calibration is needed to address the overprediction observed in shorter‐term settings. Despite their overall generalizability to this diverse cohort, potential limitations exist, particularly a tendency to overpredict risk in populations with higher rates of comorbidities than those in the PREVENT development and validation cohorts. By considering these variations in predictive performance, clinicians can improve individual patient assessments and enhance cardiovascular risk management.
Sources of Funding
This research was supported by the Department of Biomedical Data Science at Stanford School of Medicine, the American Heart Association (Grant Numbers: 24POST1192328, 23ARCRM1077167 awarded to Dr Adrian M. Bacong), and the National Institutes of Health National Heart, Lung, and Blood Institute (1K24HL150476‐01A1 awarded to Dr Latha Palaniappan).
Disclosures
None.
Supporting information
Data S1 Tables S1–S7.
Acknowledgments
We gratefully acknowledge All of Us participants for their contributions, without whom this research would not have been possible. We also thank the National Institutes of Health's All of Us Research Program for making available the participant data examined in this study.
This article was sent to Tiffany M. Powell‐Wiley, MD, MPH, Associate Editor, for review by expert referees, editorial decision, and final disposition.
Supplemental Material is available at https://www.ahajournals.org/doi/suppl/10.1161/JAHA.125.041549
References
- 1. Khan SS, Matsushita K, Sang Y, et al. Development and validation of the American Heart Association's PREVENT equations. Circulation. 2024;149:430–449. doi: 10.1161/CIRCULATIONAHA.123.067626
- 2. Goff DC, Lloyd‐Jones DM, Bennett G, et al. ACC/AHA guideline on the assessment of cardiovascular risk. Circulation. 2013;129(25_suppl_2):S49–S73. doi: 10.1161/01.cir.0000437741.48606.98
- 3. Khan SS, Ning H, Shah SJ, et al. 10‐year risk equations for incident heart failure in the general population. J Am Coll Cardiol. 2019;73:2388–2397. doi: 10.1016/j.jacc.2019.02.057
- 4. All of Us Research Program, Data Methods. Accessed March 14, 2025.
- 5. v8 Curated Data Repository (CDR) Release Notes (C2024Q3R4 versions). User Support. 2025. Accessed March 1, 2025. https://support.researchallofus.org/hc/en‐us/articles/30294451486356‐Curated‐Data‐Repository‐CDR‐version‐8‐Release‐Notes.
- 6. Doerr M, Grayson S, Moore S, Suver C, Wilbanks J, Wagner J. Implementing a universal informed consent process for the all of us research program. Pac Symp Biocomput. 2019;24:427–438.
- 7. Inker LA, Eneanya ND, Coresh J, et al. New creatinine‐ and cystatin C–based equations to estimate GFR without race. N Engl J Med. 2021;385:1737–1749. doi: 10.1056/NEJMoa2102953
- 8. Albert MA, Glynn RJ, Buring J, Ridker PM. Impact of traditional and novel risk factors on the relationship between socioeconomic status and incident cardiovascular events.
- 9. van DKR, Zhang D, Kaptoge S, Paige E, Angelantonio ED, Pennells L. Risk estimation for the primary prevention of cardiovascular disease: considerations for appropriate risk prediction model selection. Lancet Glob Health. 2024;12:e1343–e1358. doi: 10.1016/S2214-109X(24)00210-9
- 10. Gijsberts CM, Groenewegen KA, Hoefer IE, et al. Race/ethnic differences in the associations of the Framingham risk factors with carotid IMT and cardiovascular events. PLOS One. 2015;10:e0132321. doi: 10.1371/journal.pone.0132321
- 11. DeFilippis AP, Young R, Carrubba CJ, et al. An analysis of calibration and discrimination among multiple cardiovascular risk scores in a modern multiethnic cohort. Ann Intern Med. 2015;162:266–275. doi: 10.7326/M14-1281
- 12. Clark AM, DesMeules M, Luo W, Duncan AS, Wielgosz A. Socioeconomic status and cardiovascular disease: risks and implications for care. Nat Rev Cardiol. 2009;6:712–722. doi: 10.1038/nrcardio.2009.163
- 13. Yang J, Clifton L, Dung NT, et al. Mitigating machine learning bias between high income and low–middle income countries for enhanced model fairness and generalizability. Sci Rep. 2024;14:13318. doi: 10.1038/s41598-024-64210-5
- 14. Minhas AMK, Jain V, Li M, et al. Family income and cardiovascular disease risk in American adults. Sci Rep. 2023;13:279. doi: 10.1038/s41598-023-27474-x
- 15. Palakshappa D, Ip EH, Berkowitz SA, et al. Pathways by which food insecurity is associated with atherosclerotic cardiovascular disease risk. J Am Heart Assoc. 2021;10:e021901. doi: 10.1161/JAHA.121.021901
- 16. Li Y, Wang H, Luo Y. Improving fairness in the prediction of heart failure length of stay and mortality by integrating social determinants of health. Circ Heart Fail. 2022;15:e009473. doi: 10.1161/CIRCHEARTFAILURE.122.009473
- 17. Schuch HS, Furtado M, Dos Silva GF, Kawachi I, Adp CF, Elani HW. Fairness of machine learning algorithms for predicting foregone preventive dental care for adults. JAMA Netw Open. 2023;6:e2341625. doi: 10.1001/jamanetworkopen.2023.41625
- 18. Hurley LP, Dickinson LM, Estacio RO, Steiner JF, Havranek EP. Prediction of cardiovascular death in racial/ethnic minorities using Framingham risk factors. Circ Cardiovasc Qual Outcomes. 2010;3:181–187. doi: 10.1161/CIRCOUTCOMES.108.831073
- 19. Yan X, Bacong AM, Huang Q. Performance of the American Heart Association's PREVENT equations among disaggregated racial and ethnic subgroups. JAMA Cardiol. 2025. doi: 10.1001/jamacardio.2025.1865
- 20. Yoo RH. Assessment of Contralateral Breast Cancer Risk Prediction Using the All of Us Dataset. Johns Hopkins University; 2023. JScholarship repository.
