Performance of PREVENT Cardiovascular Risk in Electronic Health Record–Based Clinical Practice

Chuan Hong; Mu Niu; Haoyuan Wang; Daniel M Wojdyla; Nicoleta Economou-Zavlanos; Matthew M Engelhard; Michael Pignone; Manesh R Patel; Michael J Pencina

doi:10.1001/jamanetworkopen.2026.6838

JAMA Netw Open. 2026 Apr 14;9(4):e266838. doi: 10.1001/jamanetworkopen.2026.6838

Chuan Hong ¹,✉, Mu Niu ², Haoyuan Wang ¹, Daniel M Wojdyla ³, Nicoleta Economou-Zavlanos ¹, Matthew M Engelhard ¹, Michael Pignone ⁴, Manesh R Patel ⁴, Michael J Pencina ¹

¹Department of Biostatistics & Bioinformatics, Duke University School of Medicine, Durham, North Carolina

²Interdisciplinary Data Science, Duke University, Durham, North Carolina

³Duke Clinical Research Institute, Durham, North Carolina

⁴Department of Medicine, Duke University, Durham, North Carolina

Accepted for Publication: February 19, 2026.

Published: April 14, 2026. doi:10.1001/jamanetworkopen.2026.6838

Open Access: This is an open access article distributed under the terms of the CC-BY-NC-ND License, which does not permit alteration or commercial use, including those for text and data mining, AI training, and similar technologies. © 2026 Hong C et al. JAMA Network Open.

Corresponding Author: Chuan Hong, PhD, Department of Biostatistics & Bioinformatics, Duke University School of Medicine, 2424 Erwin Rd, Room 9022, Durham, NC 27710 (chuan.hong@duke.edu).

Author Contributions: Dr Hong and Mr Niu had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Dr Hong and Mr Niu contributed equally.

Concept and design: Hong, Niu, Patel, Pencina.

Acquisition, analysis, or interpretation of data: Hong, Niu, Wang, Wojdyla, Economou-Zavlanos, Engelhard, Pignone, Patel.

Drafting of the manuscript: Hong, Niu.

Critical review of the manuscript for important intellectual content: All authors.

Statistical analysis: Hong, Niu, Wang, Wojdyla, Engelhard.

Administrative, technical, or material support: Hong, Niu.

Supervision: Hong, Engelhard, Patel.

Conflict of Interest Disclosures: Dr Economou-Zavlanos reported receiving grants from The Duke Endowment Improving the Safety and Trustworthiness of Artificial Intelligence in Health Care and the Gordon and Betty Moore Foundation Health AI Maturity Model; personal fees from Coalition for Health AI (CHAI; scientific advisor), National Institutes of Health Common Fund Bridge2AI (chair of executive advisory board), and Bipartisan Policy Center (consultant); having done volunteer work as a scientific and operations for CHAI Trustworthy AI Blueprint and Responsible AI Guide; and having a patent for systems and methods for optimal operating room scheduling under uncertainty pending (US provisional patent application No. 63/555,456) outside the submitted work. Dr Patel reported receiving grants from Novartis; Regeneron; and the National Heart, Lung, and Blood Institute outside the submitted work. Dr Pencina reported receiving grants from the American Heart Association; personal fees from Eli Lilly, Cleerly Inc, McGill University Health Centre, and American Heart Association; and being employed by Optum outside the submitted work. No other disclosures were reported.

Data Sharing Statement: See Supplement 2.

PMCID: PMC13080547 PMID: 41979878

This cohort study evaluates the discrimination and calibration of the Predicting Risk of Cardiovascular Disease Events (PREVENT) equations using 2 cohorts of data collected from electronic health records (EHRs), comparing the discrimination and calibration with derivation and validation cohorts.

Key Points

Question

Do the Predicting Risk of Cardiovascular Disease Events (PREVENT) equations maintain 5-year cardiovascular disease (CVD) risk performance across subgroups under electronic health record (EHR) conditions with missing data?

Findings

In this cohort study using data from the Duke University Health System EHR to create a cohort of 127 151 individuals with complete data and a cohort of 406 230 individuals with partially missing data, PREVENT showed strong discrimination with consistent subgroup performance. Original PREVENT equations modestly underestimated risk; local adaptation minimally improved calibration without affecting discrimination.

Meaning

These findings suggest that the PREVENT equations can be applied to detect increased CVD risk in common clinical settings, including those with missing laboratory or vital sign data when relevant imputation is used.

Abstract

Importance

In 2023, the American Heart Association Cardiovascular-Kidney-Metabolic Scientific Advisory Group introduced the Predicting Risk of Cardiovascular Disease Events (PREVENT) equations, a race-free, sex-specific model for cardiovascular disease (CVD) risk prediction in adults aged 30 to 79 years. While initial validations showed strong performance, their reliability under missingness conditions remains unclear.

Objective

To evaluate discrimination and calibration of the PREVENT equations in an electronic health record (EHR) cohort and assess robustness to missingness.

Design, Setting, and Participants

This retrospective cohort study used EHR data from March 2014 to December 2024, with up to 8 years of follow-up, from Duke University Health System, a health network encompassing tertiary hospitals, regional hospitals, and primary care practices across North Carolina. Patients without baseline CVD with sufficient data to calculate PREVENT risk were included. Two cohorts were defined: a relaxed cohort, allowing for missing laboratory and vital sign data with race-sex median imputation, and a strict cohort, restricted to those with complete records. Data were analyzed from October 2024 to June 2025.

Exposures

Published PREVENT equations alongside locally fitted Cox proportional hazards, discrete-time neural network, and recalibrated PREVENT models.

Main Outcomes and Measures

The primary outcome was the estimated 5-year risk of incident CVD. Discrimination (C-index) and calibration (expected vs observed event rates) were assessed at 5 years by race, sex, and socioeconomic subgroups. Local adaptation via Duke retraining was compared with machine learning–based recalibration of PREVENT scores.

Results

The study included 406 230 patients in the relaxed cohort (239 764 females with a mean [SD] age of 49 [20] years and 166 466 males with a mean [SD] age of 49 [20] years; 16 291 Asian [4.0%], 107 114 Black [26.4%], and 256 403 White [63.1%]) and 127 151 patients in the strict cohort (71 086 females with a mean [SD] age of 54 [13] years and 56 065 males with a mean [SD] age of 53 [12] years; 8210 Asian [6.5%], 29 033 Black [22.8%], and 83 515 White [65.7%]). PREVENT showed strong discrimination in both cohorts (C-index, 0.77 for both males and females in the strict cohort vs 0.75 for males and 0.77 for females in the relaxed cohort), indicating robustness to missing data. Calibration ratios were higher in the strict cohort, indicating more risk underestimation in the relaxed cohort. Local adaptations minimally affected discrimination and modestly improved calibration.

Conclusions and Relevance

In this cohort study, the PREVENT equations showed strong discrimination and generalizability, including with missing laboratory and vital sign data when imputation was applied, supporting reliable CVD risk identification and ranking in routine practice.

Introduction

Cardiovascular disease (CVD) remains a leading cause of death in the US.^1,2,3 Accurate risk assessment using multivariable risk prediction equations is recommended to guide primary prevention strategies for CVD.^4,5,6,7,8 Existing models, such as the Framingham Stroke Risk Profiles^9,10,11 and the Pooled Cohort Equations,¹² are widely used, but are limited to certain age groups, are race-specific to Black and White populations, and do not account for CVD subtypes like heart failure (HF).^13,14,15,16 To address these gaps, the American Heart Association Cardiovascular-Kidney-Metabolic Scientific Advisory Group developed the Predicting Risk of Cardiovascular Disease Events (PREVENT) equations in 2023.^13,17,18,19 This race-free, sex-specific model improves upon prior tools by extending its applicability to US adults aged 30 to 79 years and including individuals from diverse racial and ethnic backgrounds. Furthermore, the PREVENT equations evaluate the total burden of CVD, including both atherosclerotic CVD (ASCVD) and HF, while incorporating additional predictors relevant to cardiovascular-kidney-metabolic syndrome.^20,21 While early validations of PREVENT have shown promising discrimination and calibration,^22,23,24,25,26,27 these studies relied on curated datasets with complete information and may not reflect the complex, often incomplete data environments of clinical settings.²⁸

This study evaluated the performance of the PREVENT equations using electronic health record (EHR) data from Duke University Health System (DUHS).²⁹ The DUHS EHR dataset provides a unique opportunity for analysis due to its diverse patient population (more than 30% racial and ethnic minority groups). In addition, situated in North Carolina, part of the Stroke Belt (a region with a significantly higher incidence of stroke and stroke-related deaths than the national average),^30,31 the dataset allows for a meaningful assessment of the PREVENT equations within a high-risk population, making it particularly relevant for validating this novel risk prediction model. In this study, we focused on the base PREVENT equations (expanded PREVENT models were not evaluated), which were originally developed for a 10-year risk horizon. To align with the available follow-up in our cohort, we adapted the evaluation to a 5-year horizon. Predicted 5-year CVD risk was calculated using the PREVENT baseline survival at 5 years, and observed 5-year event rates were estimated with Kaplan-Meier methods to account for variable follow-up.
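
As an illustration of how a predicted risk at a fixed horizon can be obtained from a baseline survival value, the following minimal Python sketch assumes the standard proportional hazards form, risk = 1 − S0(t)^exp(LP). The function name, the baseline survival of 0.97, and the linear predictor values are hypothetical and are not PREVENT coefficients; the published PREVENT specification should be consulted for the actual functional form.

```python
import math


def predicted_risk(linear_predictor: float, baseline_survival: float) -> float:
    """Risk by the horizon under a proportional hazards form: 1 - S0(t) ** exp(lp)."""
    if not 0.0 < baseline_survival <= 1.0:
        raise ValueError("baseline survival must be in (0, 1]")
    return 1.0 - baseline_survival ** math.exp(linear_predictor)


# Hypothetical values for illustration only (not PREVENT coefficients).
S0_5Y = 0.97
for lp in (-1.0, 0.0, 1.0):
    print(f"lp={lp:+.1f} -> 5-year risk {predicted_risk(lp, S0_5Y):.4f}")
```

Under this form, a linear predictor of 0 returns exactly 1 minus the baseline survival, and higher linear predictors yield higher risk.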

Our study assessed the performance of the PREVENT equations in a large unselected EHR cohort, including patients with missing or irregular data common in routine care but often underrepresented in model development. Specifically, we (1) evaluated performance in a cohort allowing missing data to reflect real-life limitations, (2) assessed 5-year discrimination and calibration across subgroups defined by sex, race, and social determinants of health (SDOH), and (3) examined the impact of local adaptation using Duke-specific retrained and recalibrated models.

Methods

This cohort study was approved by the Duke University Health System institutional review board. Informed consent was waived because the study used deidentified EHR data. We evaluated the PREVENT equations in DUHS EHR data using 2 cohorts: a relaxed cohort, which allowed missing laboratory and vital sign data to reflect real-life practice, and a strict cohort, which required complete measurements. We defined outcomes, curated clinical and socioeconomic predictors, and compared the published PREVENT equations with locally retrained and recalibrated models. Discrimination and calibration were assessed across demographic and socioeconomic subgroups (eFigure 1 in Supplement 1). Reporting followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guideline.⁷

Study Design and Population

Eligibility required at least 1 outpatient encounter to establish cohort entry, while laboratory and vital sign measurements used for the index date and baseline covariates could come from any care setting. Figure 1A summarizes the study timeline, including observation windows, index date definition, lookback periods, and follow-up. DUHS EHRs integrate data across Duke hospitals, primary care, and affiliated practices.²⁹

Figure 1. A, Timeline overview of observation windows, index date, and follow-up periods. B and C, Inclusion and exclusion criteria used to define analytic cohorts. ASCVD indicates atherosclerotic cardiovascular disease; BMI, body mass index; CVD, cardiovascular disease; EHR, electronic health record; HDL-C, high-density lipoprotein cholesterol; HF, heart failure; PREVENT, Predicting Risk of Cardiovascular Disease Events equations; SBP, systolic blood pressure; SC, serum creatinine; SDOH, social determinants of health; TC, total cholesterol. To convert TC and HDL-C to millimoles per liter, multiply by 0.0259.

To reflect real-life conditions, we constructed the relaxed cohort, allowing missing laboratory and vital sign data and imputing missing values using race- and sex-specific medians.³² As a benchmark, we defined the strict cohort as restricted to patients with complete data per the original PREVENT inclusion criteria. Figure 1B shows the cohort identification flowchart and inclusion and exclusion steps.
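
Race- and sex-specific median imputation can be sketched as follows. This is a minimal Python illustration, not the study code; the record layout, the column name `tc`, and the function name are assumptions made for the example.

```python
from collections import defaultdict
from statistics import median


def impute_group_median(rows, column, group_keys=("race", "sex")):
    """Fill missing (None) values of `column` with the median of the row's group."""
    observed = defaultdict(list)
    for row in rows:
        if row[column] is not None:
            observed[tuple(row[k] for k in group_keys)].append(row[column])
    medians = {group: median(values) for group, values in observed.items()}

    imputed = []
    for row in rows:
        new_row = dict(row)  # copy so the input records are left unchanged
        if new_row[column] is None:
            # Remains None if the group has no observed value at all.
            new_row[column] = medians.get(tuple(row[k] for k in group_keys))
        imputed.append(new_row)
    return imputed


# Hypothetical records for illustration only.
patients = [
    {"race": "Black", "sex": "F", "tc": 180.0},
    {"race": "Black", "sex": "F", "tc": 200.0},
    {"race": "Black", "sex": "F", "tc": None},
    {"race": "White", "sex": "M", "tc": 190.0},
    {"race": "White", "sex": "M", "tc": None},
]
filled = impute_group_median(patients, "tc")
```

Each missing value is replaced by the median of observed values in the same race-sex group, so imputed patients sit at the center of their group's distribution for that predictor.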

Eligibility Criteria

The eligibility criteria for the relaxed cohort were less stringent than those established in the PREVENT equations.^13,17 Individual-level participant data were included without age restriction. Patients with known ASCVD or HF at baseline were excluded. Participants with missing data for both serum creatinine (SC) and systolic blood pressure (SBP) were excluded. Rather than excluding patients with extreme clinical values, we truncated measurements at the thresholds established in the PREVENT equations: SBP less than 90 or greater than 200 mm Hg, total cholesterol (TC) less than 130 or greater than 320 mg/dL, high-density lipoprotein cholesterol (HDL-C) less than 20 or greater than 100 mg/dL (to convert TC and HDL-C to millimoles per liter, multiply by 0.0259), and body mass index (BMI; calculated as weight in kilograms divided by height in meters squared) less than 18.5 or greater than 40.0.
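
The truncation rule can be illustrated with a short Python sketch. The bounds are those stated above; the dictionary keys and function name are illustrative choices, not part of the study code.

```python
# Bounds as stated in the text; key names are illustrative.
BOUNDS = {
    "sbp": (90.0, 200.0),    # systolic blood pressure, mm Hg
    "tc": (130.0, 320.0),    # total cholesterol, mg/dL
    "hdl_c": (20.0, 100.0),  # HDL cholesterol, mg/dL
    "bmi": (18.5, 40.0),     # body mass index
}


def truncate(name, value):
    """Clamp a measurement to its allowed range; missing values pass through."""
    if value is None:
        return None
    low, high = BOUNDS[name]
    return min(max(value, low), high)


print(truncate("sbp", 230.0))  # above the upper bound, set to 200.0
print(truncate("bmi", 17.0))   # below the lower bound, set to 18.5
```

Truncation keeps patients with extreme values in the cohort while confining predictor values to the stated bounds.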

To construct the relaxed cohort, we set the enrollment period to 2016 to 2018 to ensure sufficient follow-up. The index date was the later of the earliest SC and SBP measured within 1 year of each other; if no such pair existed, it was the earliest available SC or SBP measurement. Patients were included if they (1) had 1 or more outpatient visits with an index date in the enrollment period, (2) had no prior CVD before the index date, (3) had SC or SBP within a 1-year lookback window (no requirement for TC, HDL-C, or BMI), (4) had 1 or more follow-up visits 7 or more days after the index date, and (5) had demographic data (age, sex, and race).
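
One possible reading of the index date rule is sketched below in Python. It assumes that, among all SC and SBP measurement pairs taken within 365 days of each other, the pair that completes earliest is used and the index date is the later date of that pair; otherwise the earliest available measurement is used. The function name and the 365-day window are assumptions, and the enrollment-period and follow-up criteria are not applied here.

```python
from datetime import date, timedelta


def index_date(sc_dates, sbp_dates, window_days=365):
    """Return an index date under the assumed reading of the rule, or None."""
    window = timedelta(days=window_days)
    # Completion date (later measurement) of every SC-SBP pair within the window.
    candidates = [
        max(sc, sbp)
        for sc in sc_dates
        for sbp in sbp_dates
        if abs(sc - sbp) <= window
    ]
    if candidates:
        return min(candidates)
    # Fallback: earliest available SC or SBP measurement.
    available = list(sc_dates) + list(sbp_dates)
    return min(available) if available else None


# Hypothetical measurement dates.
paired = index_date([date(2016, 3, 1)], [date(2016, 5, 10)])
unpaired = index_date([date(2016, 3, 1)], [date(2018, 1, 1)])
```

In the first call the two measurements fall within 1 year of each other, so the later date is returned; in the second they do not, so the earliest measurement is returned.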

The strict cohort used the same enrollment period as the relaxed cohort but required complete laboratory and vital sign data. The design followed the variable collection strategy in the original PREVENT equations study^13,17,18,19; full details are in eAppendix 1 in Supplement 1.

Variable Curation

CVD Outcomes

Myocardial infarction, stroke, and HF were identified using International Classification of Diseases, Ninth Revision (ICD-9) and International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) codes.^33,34 To maintain consistency across the October 2015 ICD-9 to ICD-10 transition, we used both ICD-9 and ICD-10 codes wherever diagnosis codes were used in cohort construction. A CVD event was defined by any of these outcomes; ASCVD included myocardial infarction or stroke. Time-to-event was measured from the index date to the first event, with censoring at study end. Outcomes (CVD, ASCVD, or HF) were analyzed by submodel, with performance metrics assessed at 5 years.

Key Variables

The index date marked the start of follow-up and varied by cohort. Baseline predictors (TC, HDL-C, and BMI) were taken from a 3-year lookback window, using the value closest to the index date. Estimated glomerular filtration rate was calculated with the 2021 Chronic Kidney Disease Epidemiology Collaboration creatinine equation.³⁵ Age, sex, and race were extracted from demographics. Race was based on self-report and categorized into 4 groups: Asian, Black, White, and other. The other race category included individuals who identified as American Indian or Alaska Native, Native Hawaiian or Other Pacific Islander, or multiple races, as well as records with refused responses; race was included to evaluate subgroup performance of the PREVENT equations. Diabetes was identified using ICD-9 and ICD-10 codes, medications (statins and antihypertensives) via RxNorm,³⁶ and smoking via ICD-9 and ICD-10 codes and vital signs within 3 years. Patients without data were assumed negative. Data processing followed the PREVENT protocol; in the relaxed cohort, missing values were imputed using race- and sex-specific medians.
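
For reference, the 2021 CKD-EPI creatinine equation can be written as a short Python function. This is a sketch of the published race-free equation, with serum creatinine in mg/dL; the function and argument names are illustrative, and unit handling in the study pipeline is not described in the text.

```python
def egfr_ckd_epi_2021(serum_creatinine_mg_dl, age_years, sex):
    """Estimated GFR (mL/min/1.73 m^2) from the 2021 CKD-EPI creatinine equation."""
    if sex not in ("female", "male"):
        raise ValueError("sex must be 'female' or 'male'")
    female = sex == "female"
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    ratio = serum_creatinine_mg_dl / kappa
    egfr = (
        142.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.200
        * 0.9938 ** age_years
    )
    if female:
        egfr *= 1.012
    return egfr


# Hypothetical patient: 50-year-old male with serum creatinine 0.9 mg/dL.
print(round(egfr_ckd_epi_2021(0.9, 50, "male"), 1))
```

Higher serum creatinine and older age both lower the estimate, and the equation contains no race term.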

SDOH Variables

Socioeconomic variables, including Area Deprivation Index (ADI) national rank^36,37 and insurance type, were assigned based on proximity to each patient’s index date. ADI was derived from residential Census block group, with lower national rank indicating greater socioeconomic advantage,^38,39 and categorized into quartiles (quartile 1 [most advantaged] to quartile 4 [most disadvantaged]).³⁸ Insurance was categorized as commercial, Medicare, Medicaid, other, or no insurance. Additional curation details are in eAppendix 2 in Supplement 1.

Statistical Analysis

Comparison of Patient Characteristics Across Cohorts

We compared the demographic, clinical, and socioeconomic characteristics of our study cohorts (relaxed cohort and strict cohort) with the original PREVENT derivation and validation cohorts. Characteristics were stratified by sex and included age, race, clinical risk factors, ADI, insurance type, medication use, and cardiovascular outcomes. Data were analyzed from October 2024 to June 2025.

Cumulative Incidence Estimation

Kaplan-Meier plots⁴⁰ were generated to compare cumulative incidence rates of CVD, ASCVD, and HF events across demographic and socioeconomic subgroups, including sex, race, ADI quartiles, and insurance type. The log-rank test was used to assess statistical differences between groups.⁴¹
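
The Kaplan-Meier estimate of cumulative incidence at a fixed horizon can be sketched in a few lines of Python. This is a minimal illustration with hypothetical data, not the study code (which used the R survival package), and it does not implement the log-rank test.

```python
def kaplan_meier_risk(times, events, horizon):
    """Cumulative incidence 1 - S(horizon) from the Kaplan-Meier estimator.

    times:  follow-up time for each patient
    events: 1 if the event occurred at that time, 0 if censored
    """
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1.0 - n_events / at_risk
    return 1.0 - survival


# Hypothetical follow-up (years): events at 1, 3, and 5; censoring at 2, 4, and 6.
times = [1, 2, 3, 4, 5, 6]
events = [1, 0, 1, 0, 1, 0]
risk_5y = kaplan_meier_risk(times, events, horizon=5)
```

Here the 5-year estimate is 1 − (5/6)(3/4)(1/2) = 0.6875, which is higher than the naive proportion of 3 events in 6 patients because censored patients leave the risk set before the horizon.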

Models for Comparison

We compared the published PREVENT equations with locally trained and recalibrated models. These included (1) original PREVENT, applied to the Duke cohort with predicted 5-year CVD risk computed from the published linear predictor and 5-year baseline survival; (2) local Cox, a Cox proportional hazards model retrained on Duke data using the same predictors with coefficients re-estimated to handle right censoring; (3) local deep neural network, a discrete-time neural network with 2 hidden layers (64 and 32 neurons)⁴²; (4) recalibrated PREVENT, using the published linear predictor with 5-year baseline survival re-estimated to match the mean Kaplan-Meier risk in the training data; and (5) machine learning model–recalibrated PREVENT, a Cox Survival XGBoost model using age, sex, and the original PREVENT risk score to evaluate data-driven recalibration (eAppendix 3 in Supplement 1).⁴³
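
The baseline-survival recalibration in approach 4 can be illustrated as follows. This Python sketch assumes a proportional hazards form for the predicted risk and uses bisection to find the baseline survival whose mean predicted risk matches a target Kaplan-Meier risk; the function names, linear predictors, and target value are hypothetical, and the study's own procedure is described in eAppendix 3 in Supplement 1.

```python
import math


def mean_predicted_risk(linear_predictors, baseline_survival):
    """Mean of 1 - S0 ** exp(lp) over all patients."""
    risks = [1.0 - baseline_survival ** math.exp(lp) for lp in linear_predictors]
    return sum(risks) / len(risks)


def recalibrate_baseline(linear_predictors, target_risk, tol=1e-10):
    """Bisection for S0 in (0, 1) so that mean predicted risk equals target_risk."""
    low, high = 1e-12, 1.0 - 1e-12
    while high - low > tol:
        mid = (low + high) / 2.0
        if mean_predicted_risk(linear_predictors, mid) > target_risk:
            low = mid   # predicted risk too high: raise baseline survival
        else:
            high = mid  # predicted risk too low: lower baseline survival
    return (low + high) / 2.0


# Hypothetical linear predictors and a hypothetical observed 5-year risk of 6%.
lps = [-0.5, 0.0, 0.5, 1.0]
s0 = recalibrate_baseline(lps, target_risk=0.06)
```

Because only the baseline survival changes, the ordering of patients by predicted risk is unchanged, so this form of recalibration shifts calibration without altering the C-index.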

Discrimination and Calibration Assessment

Model discrimination was evaluated using the C-index with 95% CIs from 100 bootstrap replicates and validated via 5-fold cross-validation.⁴⁴ Subgroup analyses were conducted by sex, race, ADI quartile, and insurance type, with subgroup differences assessed using bootstrap CIs (eAppendix 4 in Supplement 1).⁴⁵ Calibration compared predicted and observed event rates using the Kaplan-Meier estimator,⁴⁶ assessing over- and underestimation across race, ADI, and insurance subgroups.
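
A minimal Python sketch of the C-index with a percentile bootstrap interval is shown below. It uses Harrell's pairwise definition and resamples patients with replacement; the data, function names, and the percentile method for the interval are assumptions for illustration, and ties in follow-up time are ignored for simplicity.

```python
import random


def c_index(times, events, risks):
    """Harrell's C: share of comparable pairs in which the patient with the
    earlier event has the higher predicted risk (risk ties count 0.5)."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def bootstrap_ci(times, events, risks, replicates=100, seed=0):
    """Percentile 95% interval from patient-level bootstrap resamples."""
    rng = random.Random(seed)
    n = len(times)
    stats = []
    for _ in range(replicates):
        idx = [rng.randrange(n) for _ in range(n)]
        value = c_index([times[k] for k in idx],
                        [events[k] for k in idx],
                        [risks[k] for k in idx])
        if value == value:  # drop resamples with no comparable pairs (NaN)
            stats.append(value)
    stats.sort()
    return stats[int(0.025 * (len(stats) - 1))], stats[int(0.975 * (len(stats) - 1))]


# Hypothetical data: follow-up years, event indicators, predicted 5-year risks.
times = [2, 4, 5, 7, 9, 10]
events = [1, 1, 0, 1, 0, 0]
risks = [0.30, 0.20, 0.25, 0.10, 0.05, 0.08]
point = c_index(times, events, risks)
lower, upper = bootstrap_ci(times, events, risks)
```

In this toy example, 10 of 11 comparable pairs are concordant. The pairwise loop is quadratic in cohort size, so cohorts of the size analyzed here would need an optimized implementation.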

Software and Implementation

Analyses used R version 4.4.1 (R Project for Statistical Computing) and the survival package.⁴⁷ Code for data curation and analysis is available online.⁴⁸

Results

Study Population

Sample Size and Age

As shown in the Table, the relaxed cohort (406 230 individuals; 239 764 females [59.0%] and 166 466 males [41.0%]) and the strict cohort (127 151 individuals; 71 086 females [55.9%] and 56 065 males [44.1%]) were smaller than the original PREVENT cohorts (>3 million each) but remain large samples from 1 health system. Mean (SD) age was 49 (20) years in the relaxed cohort (both sexes) vs 52 (13) years for females and 52 (12) years for males in the original PREVENT cohorts, reflecting inclusion of younger adults. The strict cohort was older (mean [SD] age, 54 [13] years in females and 53 [12] years in males), more closely matching the development cohorts.

Table. Demographic and Clinical Characteristics of the Study Cohorts Compared to Original PREVENT Derivation and Validation Cohorts, Stratified by Sex.

Demographic characteristics	Cohort, No. (%)
	Original PREVENT derivation cohort (N = 3 281 919)		Original PREVENT validation cohort (N = 3 330 085)		Relaxed cohort (N = 406 230)		Strict cohort (N = 127 151)
	Female (n = 1 839 828)	Male (n = 1 442 091)	Female (n = 1 894 882)	Male (n = 1 435 203)	Female (n = 239 764)	Male (n = 166 466)	Female (n = 71 086)	Male (n = 56 065)
Age, mean (SD), y	53 (13)	52 (12)	52 (13)	52 (12)	49 (20)	49 (20)	54 (13)	53 (12)
Race
Asian	47 835 (2.6)	36 052 (2.5)	51 162 (2.7)	31 574 (2.2)	9493 (4.0)	6798 (4.1)	4485 (6.3)	3725 (6.6)
Black	183 983 (10.0)	115 367 (8.0)	189 488 (10.0)	117 686 (8.2)	68 490 (28.6)	38 624 (23.2)	18 040 (25.4)	10 993 (19.6)
Hispanic^a	110 390 (6.0)	76 431 (5.3)	79 585 (4.2)	53 102 (3.7)	NA	NA	NA	NA
White	1 435 066 (78.0)	1 153 673 (80.0)	1 478 008 (78.0)	1 148 162 (80.0)	146 610 (61.1)	109 793 (66.0)	44 986 (63.3)	38 529 (68.7)
Other^b	75 433 (4.1)	66 336 (4.6)	92 849 (4.9)	78 936 (5.5)	15 171 (6.3)	11 251 (6.8)	3575 (5.0)	2818 (5.0)
Risk factors
Systolic blood pressure
Mean (SD), mm Hg	123 (16)	127 (15)	123 (16)	128 (15)	125 (18)	128 (18)	123 (17)	127 (16)
Missing	NA	NA	NA	NA	10 385 (4.3)	10 375 (6.2)	NA	NA
Total cholesterol
Mean (SD), mg/dL	193 (31)	190 (31)	193 (31)	190 (31)	194 (39)	189 (38)	201 (36)	195 (36)
Missing	NA	NA	NA	NA	127 127 (53.0)	85 003 (51.1)	NA	NA
High-density lipoprotein cholesterol
Mean (SD), mg/dL	58 (16)	46 (12)	58 (16)	46 (12)	58 (17)	46 (13)	59 (15)	47 (13)
Missing	NA	NA	NA	NA	127 461 (53.2)	85 227 (51.2)	NA	NA
Body mass index^c
Mean (SD)	29 (5)	29 (4)	28 (5)	29 (4)	29 (7)	29 (6)	29 (5)	29 (4)
Missing	NA	NA	NA	NA	13 822 (5.8)	13 062 (7.8)	NA	NA
Estimated glomerular filtration rate
Mean (SD), mL/min/1.73 m²	91 (19)	91 (17)	91 (18)	91 (17)	90 (30)	90 (33)	88 (19)	88 (18)
Missing	NA	NA	NA	NA	3146 (1.3)	2324 (1.4)	NA	NA
Diabetes	183 983 (10.0)	173 051 (12.0)	208 437 (11.0)	186 576 (13.0)	24 456 (10.2)	20 142 (12.1)	7749 (10.9)	6671 (11.9)
Current smoking	106 710 (5.8)	89 410 (6.2)	89 059 (4.7)	70 325 (4.9)	23 737 (9.9)	22 806 (13.7)	5758 (8.1)	6111 (10.9)
Antihypertensive treatment	423 161 (23.0)	389 365 (27.0)	454 772 (24.0)	416 209 (29.0)	78 403 (32.7)	60 234 (36.2)	25 022 (35.2)	20 968 (37.4)
Statin treatment	257 576 (14.0)	245 156 (17.0)	265 283 (14.0)	243 985 (17.0)	37 883 (15.8)	34 791 (20.9)	14 501 (20.4)	14 072 (25.1)
Outcomes
Follow-up time, mean (SD), y	4.8 (3.1)	4.6 (3.0)	5.0 (3.2)	4.8 (3.2)	5.1 (2.4)	4.9 (2.5)	5.9 (2.1)	5.7 (2.2)
Cardiovascular disease events	53 258 (2.9)	53 403 (3.7)	54 365 (2.9)	50 489 (3.5)	17 023 (7.1)	14 483 (8.7)	3910 (5.5)	3700 (6.6)
Atherosclerotic cardiovascular disease events	31 812 (1.7)	34 691 (2.4)	33 969 (1.8)	33 933 (2.4)	8392 (3.5)	7491 (4.5)	2062 (2.9)	2074 (3.7)
Heart failure events	30 957 (1.7)	28 393 (2.0)	30 287 (1.6)	25 679 (1.8)	11 269 (4.7)	9322 (5.6)	2346 (3.3)	2243 (4.0)
Deaths	84 289 (4.6)	80 897 (5.6)	82 555 (4.4)	76 783 (5.3)	18 941 (7.9)	17 146 (10.3)	2488 (3.5)	2579 (4.6)

Abbreviation: NA, not applicable.

SI conversion factor: To convert total cholesterol and high-density lipoprotein cholesterol to millimoles per liter, multiply by 0.0259.

^a Hispanic was not included as a race category in the strict or relaxed cohorts, because in the Duke University Health System electronic health record, Hispanic is recorded as an ethnicity rather than a race category.

^b American Indian or Alaska Native, Native Hawaiian or Other Pacific Islander, or multiple races, as well as records with refused responses.

^c Calculated as weight in kilograms divided by height in meters squared.

Race and Ethnicity

The Duke cohorts (relaxed cohort: 16 291 Asian [4.0%], 107 114 Black [26.4%], 256 403 White [63.1%], and 26 422 other [6.5%]; strict cohort: 8210 Asian [6.5%], 29 033 Black [22.8%], 83 515 White [65.7%], and 6393 other [5.0%]) were more racially diverse than the original PREVENT cohorts (derivation cohort: 83 887 Asian [2.6%], 299 350 Black [9.1%], 2 588 739 White [78.9%], and 141 769 other [4.3%]; validation cohort: 82 736 Asian [2.5%], 307 174 Black [9.2%], 2 626 170 White [78.9%], and 171 785 other [5.2%]), with higher Asian representation, especially in the strict cohort. In the relaxed cohort, there were 68 490 Black female patients (28.6%) and 38 624 Black male patients (23.2%), compared with 183 983 Black female patients (10.0%) and 115 367 Black male patients (8.0%) in the PREVENT derivation cohort, and 189 488 Black female patients (10.0%) and 117 686 Black male patients (8.2%) in the PREVENT validation cohort.

Socioeconomic Indicators

Socioeconomic status differed by cohort. The relaxed cohort included more patients from the most disadvantaged ADI quartile (60 322 patients [14.8%]) than the strict cohort (10 587 patients [8.3%]), consistent with greater missingness among underserved groups. Commercial insurance was most common (242 910 individuals [59.8%] in the relaxed cohort vs 93 120 individuals [73.2%] in the strict cohort), while Medicaid and uninsured rates were higher in the relaxed cohort. Medicare was more frequent in the strict cohort, consistent with its older population.

Clinical Risk Factors

Blood pressure, cholesterol, BMI, and estimated glomerular filtration rate were similar across PREVENT and Duke cohorts, although missingness was high in the relaxed cohort (TC: 212 130 individuals [52.2%]; HDL-C: 212 688 individuals [52.4%]). Males had higher SBP and lower HDL-C than females. Statin and antihypertensive use was more common in Duke data, especially among males and in the strict cohort (male statin use: 14 072 individuals [25.1%] in the strict cohort vs 34 791 individuals [20.9%] in the relaxed cohort vs 245 156 individuals [17.0%] in the PREVENT derivation cohort vs 243 985 individuals [17.0%] in the PREVENT validation cohort).

Comorbidities and Risk Behaviors

Diabetes prevalence was slightly higher in males across all cohorts. Current smoking rates were notably higher in Duke data, especially in males in the relaxed cohort (22 806 individuals [13.7%]) compared with PREVENT derivation (89 410 individuals [6.2%]) and validation (70 325 individuals [4.9%]) cohorts. This pattern underscores the higher prevalence of behavioral risk factors in EHR data.

Follow-Up Time and Outcomes

Mean (SD) follow-up time was longest in the strict cohort (5.9 [2.1] years in females and 5.7 [2.2] years in males) and shorter in the relaxed cohort (5.1 [2.4] years in females and 4.9 [2.5] years in males). CVD events were more frequent in Duke data (17 023 females [7.1%] and 14 483 males [8.7%] in the relaxed cohort and 3910 females [5.5%] and 3700 males [6.6%] in the strict cohort) than in PREVENT derivation (53 258 females [2.9%] and 53 403 males [3.7%]) and validation (54 365 females [2.9%] and 50 489 males [3.5%]) cohorts, as were ASCVD and HF events. Mortality was also higher, particularly among males.

Cumulative Incidence Estimation

eFigure 2 in Supplement 1 shows cumulative CVD incidence in the relaxed cohort, revealing consistent subgroup disparities. Asian patients had the lowest event rates and Black patients the highest. Males showed slightly higher incidence than females. Higher risk was associated with greater neighborhood disadvantage. By insurance, Medicare patients had the greatest incidence, while those with commercial insurance had the lowest. Similar trends were observed across CVD subtypes.

Hazard Ratio From Different Models

eFigure 3 in Supplement 1 shows hazard ratios for CVD outcomes in the strict cohort. Elevated SBP, diabetes, smoking, and low estimated glomerular filtration rate were associated with increased risk of CVD, whereas statin use was associated with lower risk. eFigure 4 and eFigure 5 in Supplement 1 show similar patterns across CVD subtypes.

Robustness of PREVENT to Data Missingness: Relaxed vs Strict Cohort

Figure 2 compares C-index values across subgroups in the relaxed cohort and strict cohort. Discrimination remained stable between cohorts, although disparities persisted. Asian males showed the highest C-index, and Black individuals showed the lowest. Model performance declined with increasing neighborhood disadvantage, with the poorest discrimination in the highest ADI quartile. Medicare patients had lower C-index values than those with commercial or Medicaid insurance, while sex-based differences were minimal.

The effect of missingness was minimal, with C-index values largely consistent across cohorts, confirming the robustness of PREVENT to incomplete data. The relaxed cohort showed slightly better performance for Asian subgroups, suggesting good generalizability despite relaxed inclusion criteria. Results for other CVD subtypes are shown in eFigure 6, eFigure 7, and eTable 1 in Supplement 1.

Compared with the original PREVENT validation sample, discrimination in both the relaxed cohort and the strict cohort was similar. Among females, C-index values were both 0.77, slightly below the PREVENT validation cohort (0.79). Among males in the relaxed and strict cohorts, C-index values ranged from 0.75 to 0.77, comparable to or slightly higher than in the PREVENT validation cohort (0.76). Patterns were consistent across CVD subtypes. Black individuals, who showed similar performance to White participants in PREVENT cohorts, had the lowest C-index in both Duke cohorts. Calibration ratios were higher in the strict cohort, indicating greater underestimation of CVD risk in the broader relaxed cohort, especially among Black patients, those in more disadvantaged ADI quartiles, and those with Medicaid or no insurance (eFigure 8, eFigure 9, and eTable 2 in Supplement 1). Because baseline predictors could be drawn from measurements recorded in different care settings, we conducted a sensitivity analysis restricting baseline predictor ascertainment to outpatient-only measurements to reflect typical ambulatory risk calculator use; discrimination and calibration were not different from the primary analysis, and conclusions were unchanged (eTables 1-2 in Supplement 1).
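
To make the interpretation of calibration ratios concrete, the following Python sketch assumes the ratio is defined as mean predicted risk divided by observed (Kaplan-Meier) risk, so that values below 1 indicate underestimation. That definition is inferred from the direction of the statements above rather than stated in the text, and the subgroup names and values are hypothetical.

```python
def calibration_ratio(predicted_risks, observed_risk):
    """Mean predicted risk divided by observed risk (assumed definition)."""
    if observed_risk <= 0:
        raise ValueError("observed risk must be positive")
    return (sum(predicted_risks) / len(predicted_risks)) / observed_risk


# Hypothetical subgroups: (predicted 5-year risks, observed 5-year risk).
subgroups = {
    "group_a": ([0.04, 0.06, 0.05], 0.05),
    "group_b": ([0.03, 0.05, 0.04], 0.08),
}
ratios = {name: calibration_ratio(pred, obs) for name, (pred, obs) in subgroups.items()}
for name, ratio in ratios.items():
    label = "underestimates" if ratio < 1 else "matches or overestimates"
    print(f"{name}: ratio {ratio:.2f} ({label} observed risk)")
```

Under this definition, a subgroup with a ratio of 0.5 has an observed event rate twice its mean predicted risk.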

Value of Local Adaptation

Figure 3 compares original PREVENT with Duke-specific adaptations in the relaxed cohort. Discrimination (C-index) ranged from 0.64 to 0.82 and was highest in Asian females and males (0.82) and lowest in Medicare patients (0.63), Black males (0.73), and those in the most disadvantaged ADI quartile (0.72); Medicaid showed higher discrimination (C-index, 0.77). Local retraining and recalibration yielded similar C-indices, with consistent patterns at 5 and 8 years (eFigure 10, eFigure 11, and eTable 3 in Supplement 1). Recalibration modestly improved calibration, but risk remained underestimated for most subgroups, particularly Black patients, higher ADI quartiles, and those with Medicaid or no insurance (eFigure 12, eFigure 13, and eTable 4 in Supplement 1); decile-based calibration plots are shown in eFigure 14 and eFigure 15 in Supplement 1. Adding race, SDOH, and race × SDOH terms did not materially change performance vs the local Cox model.

Discussion

This cohort study assessed the performance of the PREVENT equations using Duke EHR data across 2 cohorts and multiple model variants, including retrained Cox, deep neural network, and recalibrated approaches. PREVENT showed strong, consistent discrimination across demographic and socioeconomic subgroups, even with missing data, supporting robustness in clinical settings. Discrimination was highest in Asian patients and females and lowest in Black patients and those from disadvantaged neighborhoods, highlighting persistent performance disparities and the need for contextual evaluation.⁴⁹

Impact of Missingness and the Value of Flexible Cohort Definitions

Using 2 cohorts, the relaxed cohort (allowing missing data) and strict cohort (complete case), we assessed performance across different levels of data completeness. PREVENT maintained stable discrimination in both cohorts, indicating resilience to missingness. However, calibration was worse in the relaxed cohort, especially among disadvantaged subgroups (Black patients, higher ADI quartiles, and those with Medicaid or no insurance), suggesting missingness is associated with poorer calibration even when discrimination is preserved.

Local Adaptation: Limited Gains in Discrimination and Modest Gains in Calibration

We compared the original PREVENT equations with 4 Duke-specific adaptations, including a locally trained Cox model, a deep neural network, and 2 recalibration approaches. Local adaptation had little effect on discrimination, suggesting PREVENT generalizes well for rank-ordering risk. In contrast, recalibration modestly improved calibration, especially in subgroups underrepresented or miscalibrated by the original model, indicating local adaptation may be more useful for refining absolute risk estimates than improving discrimination.
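One generic way to see why recalibration can change absolute risk without changing discrimination is recalibration-in-the-large. The sketch below shifts every predicted risk by a single intercept on the complementary log-log scale until the mean predicted risk matches the observed event rate. It is not the study's recalibration procedure (described in eAppendix 3 in Supplement 1); the function names, search bounds, and choice of scale are assumptions.

```python
import math


def shift_risk(p, delta):
    """Shift a predicted risk by `delta` on the complementary log-log scale."""
    return 1.0 - (1.0 - p) ** math.exp(delta)


def fit_intercept_shift(predicted_risks, observed_risk, lo=-5.0, hi=5.0, tol=1e-10):
    """Find delta so the mean shifted risk equals the observed risk (bisection)."""
    def gap(delta):
        shifted = [shift_risk(p, delta) for p in predicted_risks]
        return sum(shifted) / len(shifted) - observed_risk

    if gap(lo) > 0 or gap(hi) < 0:
        raise ValueError("observed risk is outside the search range")
    # The mean shifted risk increases with delta, so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Because the shift is a monotone transformation, it leaves the rank ordering of patients, and therefore the C-index, unchanged while moving the average predicted risk toward the observed rate.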

Implications for Implementation

Overall, the PREVENT equations appear well-suited for routine care, even with incomplete data, given consistent discrimination across diverse subgroups. However, calibration disparities in disadvantaged populations suggest health systems should perform subgroup-specific calibration checks and consider recalibration before deployment, particularly when absolute risk thresholds guide treatment. These findings also underscore the importance of evaluating models in the settings and populations where they will be used. In practice, PREVENT is most useful for population-level risk stratification to guide targeted outreach, follow-up, or referral, and may be especially valuable in primary care and large systems with missing or inconsistently collected measurements. Because PREVENT ranks relative risk well, it can help prioritize prevention, but absolute-risk use should be paired with calibration assessment and, when needed, local or subgroup-specific recalibration.

Limitations

This study has several limitations. Follow-up was relatively short, limiting long-term risk estimation and precluding full 10-year evaluation; because maximum follow-up was 8 years, we assessed discrimination and calibration at 5 and 8 years, with consistent patterns. Analyses were based on a single North Carolina health system, which may limit geographic generalizability and reflect regional risk patterns, and events occurring outside DUHS may have been missed. We did not evaluate dynamic or time-updated models. Additionally, we used race- and sex-specific median imputation to preserve inclusiveness; while this approach retains patients with incomplete data, it may not capture complex dependencies among predictors.⁵⁰ Future work will evaluate advanced imputation methods to improve calibration, particularly in disadvantaged groups.⁵¹
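A minimal sketch of group-specific median imputation of the kind described above is shown below. The record layout, the field names ("race", "sex", "sbp"), and the choice to leave a value missing when a group has no observed values are hypothetical and are used only for illustration.

```python
from statistics import median


def impute_group_median(records, variable, group_keys=("race", "sex")):
    """Fill missing values of `variable` with the median of the same group.

    records    : list of dicts; a missing value is represented by None
    variable   : name of the field to impute (hypothetical, eg, "sbp")
    group_keys : fields that define the imputation groups
    """
    observed = {}
    for rec in records:
        if rec[variable] is not None:
            key = tuple(rec[k] for k in group_keys)
            observed.setdefault(key, []).append(rec[variable])
    medians = {key: median(values) for key, values in observed.items()}

    imputed = []
    for rec in records:
        filled = dict(rec)  # copy so the input records are not modified
        key = tuple(filled[k] for k in group_keys)
        if filled[variable] is None and key in medians:
            filled[variable] = medians[key]
        imputed.append(filled)
    return imputed
```

Every imputed patient in a group receives the same value regardless of their other predictors, which is one way to see why this approach may miss dependencies among variables.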

Conclusions

In this retrospective cohort study of EHR data from an integrated health system, the PREVENT equations showed strong, consistent discrimination, including with missing laboratory and vital sign data. Local adaptation modestly improved calibration but did not meaningfully change discrimination. These findings support PREVENT for routine clinical use, particularly for ranking cardiovascular risk across diverse populations; however, subgroup-specific recalibration may be needed to improve absolute risk estimates and promote equitable performance across sociodemographic strata.

Supplement 1.

eFigure 1. Overview of Study Design and Analysis Framework

eAppendix 1. Detailed Eligibility Criteria for CohortStrict

eAppendix 2. Codes Used for Data Variables and Medications

eAppendix 3. Open-Source ML Model Recalibration: Detailed Implementation

eAppendix 4. Statistical Analysis: Bootstrapping Method

eFigure 2. Stratified Kaplan-Meier Plots

eFigure 3. Forest Plot for Hazard Ratios for CVD

eFigure 4. Forest Plot for Hazard Ratios for ASCVD

eFigure 5. Forest Plot for Hazard Ratios for HF

eFigure 6. Radar Plot for C-Index of PREVENT Equations for ASCVD and HF (5-Year ASCVD)

eFigure 7. Radar Plot for C-Index of PREVENT Equations (8-Year CVD) Across Cohorts

eTable 1. C-Index of PREVENT Equations Across Cohorts

eFigure 8. Calibration Ratio for 5-Year ASCVD and HF Across Cohorts (PREVENT Predicted vs Observed Event Rate)

eFigure 9. Calibration Ratio for 8-Year CVD Across Cohorts (PREVENT Predicted vs Observed Event Rate)

eTable 2. Calibration Ratio for 5-Year CVD, ASCVD, and HF Across Cohorts (PREVENT Predicted vs Observed Event Rate)

eFigure 10. Radar Plot of 5-Year C-Index for ASCVD and HF Models (CohortRelax)

eFigure 11. Radar Plot of 8-Year C-Index for CVD Models (CohortRelax)

eTable 3. C-Index for 5-Year CVD, ASCVD, and HF Across Models

eFigure 12. Calibration Ratio for 5-Year ASCVD and HF Across Models (Models Predicted vs Observed Event Rate)

eFigure 13. Calibration Ratio for 8-Year CVD Across Models (Models Predicted vs Observed Event Rate)

eTable 4. Calibration Ratio for 5-Year CVD, ASCVD, and HF Across Models (Models Predicted vs Observed Event Rate)

eFigure 14. Calibration Line Plot—CohortRelax Across Models

eFigure 15. Calibration Line Plot—CohortStrict Across Models

jamanetwopen-e266838-s001.pdf (2.9 MB, pdf)

Supplement 2.

Data Sharing Statement

jamanetwopen-e266838-s002.pdf (15.3 KB, pdf)

References

1. Martin SS, Aday AW, Allen NB, et al; American Heart Association Council on Epidemiology and Prevention Statistics Committee and Stroke Statistics Committee. 2025 Heart disease and stroke statistics: a report of US and global data from the American Heart Association. Circulation. 2025;151(8):e41-e660. doi: 10.1161/CIR.0000000000001303
2. Roth GA, Mensah GA, Johnson CO, et al; GBD-NHLBI-JACC Global Burden of Cardiovascular Diseases Writing Group. Global burden of cardiovascular diseases and risk factors, 1990-2019: update from the GBD 2019 study. J Am Coll Cardiol. 2020;76(25):2982-3021. doi: 10.1016/j.jacc.2020.11.010
3. Virani SS, Alonso A, Aparicio HJ, et al; American Heart Association Council on Epidemiology and Prevention Statistics Committee and Stroke Statistics Subcommittee. Heart disease and stroke statistics—2021 update: a report from the American Heart Association. Circulation. 2021;143(8):e254-e743. doi: 10.1161/CIR.0000000000000950
4. Arnett DK, Blumenthal RS, Albert MA, et al. 2019 ACC/AHA guideline on the primary prevention of cardiovascular disease: a report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. Circulation. 2019;140(11):e596-e646. doi: 10.1161/CIR.0000000000000678
5. Goff DC Jr, Lloyd-Jones DM, Bennett G, et al; American College of Cardiology/American Heart Association Task Force on Practice Guidelines. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Circulation. 2014;129(25)(suppl 2):S49-S73. doi: 10.1161/01.cir.0000437741.48606.98
6. Lloyd-Jones DM, Braun LT, Ndumele CE, et al. Use of risk assessment tools to guide decision-making in the primary prevention of atherosclerotic cardiovascular disease: a special report from the American Heart Association and American College of Cardiology. Circulation. 2019;139(25):e1162-e1177. doi: 10.1161/CIR.0000000000000638
7. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ. 2015;350:g7594. doi: 10.1136/bmj.g7594
8. Damen JA, Hooft L, Schuit E, et al. Prediction models for cardiovascular disease risk in the general population: systematic review. BMJ. 2016;353:i2416. doi: 10.1136/bmj.i2416
9. Wolf PA, D’Agostino RB, Belanger AJ, Kannel WB. Probability of stroke: a risk profile from the Framingham Study. Stroke. 1991;22(3):312-318. doi: 10.1161/01.STR.22.3.312
10. D’Agostino RB, Wolf PA, Belanger AJ, Kannel WB. Stroke risk profile: adjustment for antihypertensive medication. The Framingham Study. Stroke. 1994;25(1):40-43. doi: 10.1161/01.STR.25.1.40
11. Dufouil C, Beiser A, McClure LA, et al. Revised Framingham stroke risk profile to reflect temporal trends. Circulation. 2017;135(12):1145-1159. doi: 10.1161/CIRCULATIONAHA.115.021275
12. Andrus B, Lacaille D. 2013 ACC/AHA guideline on the assessment of cardiovascular risk. J Am Coll Cardiol. 2014;63(25 Pt A):2886. doi: 10.1016/j.jacc.2014.02.606
13. Diaz CL, Shah NS, Lloyd-Jones DM, Khan SS. State of the nation’s cardiovascular health and targeting health equity in the United States: a narrative review. JAMA Cardiol. 2021;6(8):963-970. doi: 10.1001/jamacardio.2021.1137
14. Shah NS, Molsberry R, Rana JS, et al. Heterogeneous trends in burden of heart disease mortality by subtypes in the United States, 1999-2018: observational analysis of vital statistics. BMJ. 2020;370:m2688. doi: 10.1136/bmj.m2688
15. Tsao CW, Aday AW, Almarzooq ZI, et al; American Heart Association Council on Epidemiology and Prevention Statistics Committee and Stroke Statistics Subcommittee. Heart disease and stroke statistics—2023 update: a report from the American Heart Association. Circulation. 2023;147(8):e93-e621. doi: 10.1161/CIR.0000000000001123
16. Sofogianni A, Stalikas N, Antza C, Tziomalos K. Cardiovascular risk prediction models and scores in the era of personalized medicine. J Pers Med. 2022;12(7):1180. doi: 10.3390/jpm12071180
17. Khan SS, Matsushita K, Sang Y, et al; Chronic Kidney Disease Prognosis Consortium and the American Heart Association Cardiovascular-Kidney-Metabolic Science Advisory Group. Development and validation of the American Heart Association’s PREVENT equations. Circulation. 2024;149(6):430-449. doi: 10.1161/CIRCULATIONAHA.123.067626
18. American Heart Association. The American Heart Association PREVENT™ Online Calculator. Accessed October 19, 2025. https://professional.heart.org/en/guidelines-and-statements/prevent-calculator
19. Schutte AE, Pilote L. Implementing the PREVENT Risk equation in the 2025 Guideline for the prevention, detection, evaluation, and management of high blood pressure in adults. Hypertension. 2025;82(10):1541-1544. doi: 10.1161/HYPERTENSIONAHA.125.25465
20. Ndumele CE, Neeland IJ, Tuttle KR, et al; American Heart Association. A synopsis of the evidence for the science and clinical management of cardiovascular-kidney-metabolic (CKM) syndrome: a scientific statement from the American Heart Association. Circulation. 2023;148(20):1636-1664. doi: 10.1161/CIR.0000000000001186
21. Ndumele CE, Rangaswami J, Chow SL, et al; American Heart Association. Cardiovascular-kidney-metabolic health: a presidential advisory from the American Heart Association. Circulation. 2023;148(20):1606-1635. doi: 10.1161/CIR.0000000000001184
22. Al-Shamsi S, Govender RD. External validation of the American heart association’s PREVENT equations in predicting atherosclerotic cardiovascular disease risk among Arab women and men: a retrospective cohort study. BMC Cardiovasc Disord. 2025;25(1):732. doi: 10.1186/s12872-025-05211-8
23. Razavi AC, Kohli P, McGuire DK, et al. PREVENT equations: a new era in cardiovascular disease risk assessment. Circ Cardiovasc Qual Outcomes. 2024;17(4):e010763. doi: 10.1161/CIRCOUTCOMES.123.010763
24. Kobo O, Rutter MK, Misra S, et al. Predicting mortality risk using the PREVENT equation across diverse racial groups. Am J Manag Care. 2025;31(5):e113-e119. doi: 10.37765/ajmc.2025.89734
25. Cho SMJ, Levin M, Chen R, et al. AHA PREVENT equations and cardiovascular disease risk in diverse health care populations. J Am Coll Cardiol. 2025;86(3):181-192. doi: 10.1016/j.jacc.2025.04.066
26. Lewis AA, Bacong AM, Palaniappan L, Hernandez-Boussard T. Validation of the American Heart Association predicting risk of cardiovascular disease events equations in diverse socioeconomic groups: the All of Us Cohort. J Am Heart Assoc. 2025;14(18):e041549. doi: 10.1161/JAHA.125.041549
27. Roxane H, Pedro MV, Julien V. External validation of the 2023 American Heart Association predicting risk of cardiovascular disease events equations for atherosclerotic cardiovascular disease in primary cardiovascular prevention setting and comparison with 2021 systematic coronary risk evaluation and 2013 pooled cohort equations. Eur J Prev Cardiol. Published online May 30, 2025. doi: 10.1093/eurjpc/zwaf213
28. Scheuermann B, Brown A, Colburn T, Hakeem H, Chow CH, Ade C. External validation of the American Heart Association PREVENT cardiovascular disease risk equations. JAMA Netw Open. 2024;7(10):e2438311. doi: 10.1001/jamanetworkopen.2024.38311
29. Hurst JH, Liu Y, Maxson PJ, Permar SR, Boulware LE, Goldstein BA. Development of an electronic health records datamart to support clinical and population health research. J Clin Transl Sci. 2020;5(1):e13. doi: 10.1017/cts.2020.499
30. Howard G, Howard VJ. Response by G. Howard and V.J. Howard to letter regarding article, “Twenty Years of Progress Toward Understanding the Stroke Belt”. Stroke. 2020;51(6):e114-e115. doi: 10.1161/STROKEAHA.120.029633
31. Cushman M, Cantrell RA, McClure LA, et al. Estimated 10-year stroke risk by region and race in the United States: geographic and racial differences in stroke risk. Ann Neurol. 2008;64(5):507-513. doi: 10.1002/ana.21493
32. Wells BJ, Chagin KM, Nowacki AS, Kattan MW. Strategies for handling missing data in electronic health record derived data. EGEMS (Wash DC). 2013;1(3):1035. doi: 10.13063/2327-9214.1035
33. Slee VN. The international classification of diseases: ninth revision (ICD-9). Ann Intern Med. 1978;88(3):424-426. doi: 10.7326/0003-4819-88-3-424
34. Steindel SJ. International classification of diseases, 10th edition, clinical modification and procedure coding system: descriptive overview of the next generation HIPAA code sets. J Am Med Inform Assoc. 2010;17(3):274-282. doi: 10.1136/jamia.2009.001230
35. Inker LA, Eneanya ND, Coresh J, et al; Chronic Kidney Disease Epidemiology Collaboration. New creatinine- and cystatin C–based equations to estimate GFR without race. N Engl J Med. 2021;385(19):1737-1749. doi: 10.1056/NEJMoa2102953
36. Nelson SJ, Zeng K, Kilbourne J, Powell T, Moore R. Normalized names for clinical drugs: RxNorm at 6 years. J Am Med Inform Assoc. 2011;18(4):441-448. doi: 10.1136/amiajnl-2011-000116
37. Petterson S. Deciphering the Neighborhood Atlas Area Deprivation Index: the consequences of not standardizing. Health Aff Sch. 2023;1(5):qxad063. doi: 10.1093/haschl/qxad063
38. Kind AJH, Buckingham WR. Making neighborhood-disadvantage metrics accessible—The Neighborhood Atlas. N Engl J Med. 2018;378(26):2456-2458. doi: 10.1056/NEJMp1802313
39. University of Wisconsin School of Medicine and Public Health. 2015 Area Deprivation Index. Accessed May 23, 2019. https://www.neighborhoodatlas.medicine.wisc.edu/
40. Klein JP, Moeschberger ML. Survival Analysis: Techniques for Censored and Truncated Data. Springer; 2003. doi: 10.1007/b97377
41. Bland JM, Altman DG. The logrank test. BMJ. 2004;328(7447):1073. doi: 10.1136/bmj.328.7447.1073
42. Hickey J, Henao R, Wojdyla D, Pencina M, Engelhard MM. Improving event time prediction by learning to partition the event time space. arXiv. Preprint posted online October 24, 2023. https://arxiv.org/abs/2310.15853
43. Zinzuwadia AN, Mineeva O, Li C, et al. Tailoring risk prediction models to local populations. JAMA Cardiol. 2024;9(11):1018-1028. doi: 10.1001/jamacardio.2024.2912
44. Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. Abstract presented at the 14th International Joint Conference on Artificial Intelligence; August 20, 1995; Montreal, Quebec, Canada. Accessed March 13, 2026. https://www.ijcai.org/Proceedings/95-2/Papers/016.pdf
45. Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21(1):128-138. doi: 10.1097/EDE.0b013e3181c30fb2
46. Van Calster B, McLernon DJ, van Smeden M, Wynants L, Steyerberg EW; Topic Group ‘Evaluating diagnostic tests and prediction models’ of the STRATOS initiative. Calibration: the Achilles heel of predictive analytics. BMC Med. 2019;17(1):230. doi: 10.1186/s12916-019-1466-7
47. Miller DM, Shalhout SZ. Survival Analysis in R. The Miller Lab. Published April 13, 2020. Accessed October 19, 2025. https://www.themillerlab.io/posts/survival_analysis/
48. Niu M. Real world performance of prevent cardiovascular risk equations using Duke electronic health records. GitHub. Updated February 2026. Accessed March 5, 2026. https://github.com/AutoEvaluation/PREVENT_DukeStudy
49. Hong C, Pencina MJ, Wojdyla DM, et al. Predictive accuracy of stroke risk prediction models across Black and White race, sex, and age groups. JAMA. 2023;329(4):306-317. doi: 10.1001/jama.2022.24683
50. Sterne JAC, White IR, Carlin JB, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009;338:b2393. doi: 10.1136/bmj.b2393
51. van Buuren S. Flexible Imputation of Missing Data. 2nd ed. Chapman and Hall/CRC; 2018. doi: 10.1201/9780429492259
