JACC: Advances. 2025 Jul 31;4(9):102041. doi: 10.1016/j.jacadv.2025.102041

Electrocardiogram-Based Artificial Intelligence to Identify Coronary Artery Disease

Shinwan Kany, Samuel F Friedman, Mostafa Al-Alusi, Shaan Khurshid, Joel T Rämö, Daniel Pipilas, James P Pirruccello, Christopher Reeder, Anthony A Philippakis, Jennifer E Ho, Mahnaz Maddah, Patrick T Ellinor, Akl C Fahed
PMCID: PMC12337187  PMID: 40749517

Abstract

Background

Coronary artery disease (CAD) results in substantial morbidity and mortality.

Objectives

The purpose of this study was to develop a deep learning model to detect CAD defined using diagnostic codes (“ECG2CAD”) and identify people at risk for adverse events using electrocardiograms (ECGs) in a primary care setting.

Methods

ECG2CAD was trained on 764,670 ECGs representing 137,199 individuals at Massachusetts General Hospital (MGH). Model performance for discrimination of prevalent CAD was measured using area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC), and compared against a model of age and sex and against the Pooled Cohort Equations in 3 test sets: MGH, Brigham and Women's Hospital (BWH), and UK Biobank. Subgroups were assessed for incident CAD-related events in a BWH primary care cohort.

Results

ECG2CAD was evaluated in MGH (N = 18,706 [6,051 cases], age 57 ± 16 years), BWH (N = 88,270 [27,898 cases], age 57 ± 16 years), and UK Biobank (N = 42,147 [1,509 cases], age 65 ± 8 years). ECG2CAD consistently discriminated prevalent CAD (MGH: AUROC 0.782, AUPRC 0.639; BWH: AUROC 0.747, AUPRC 0.588; UK Biobank: AUROC 0.760, AUPRC 0.155) and improved discrimination over models based on age and sex or Pooled Cohort Equations (P < 0.01) in MGH and BWH. In the BWH primary care subset, model performance was consistent across subgroups. Being in the highest quintile of ECG2CAD risk was associated with higher risk for adverse events compared with the low-risk group (myocardial infarction HR: 5.59; 95% CI: 4.76-6.56; heart failure HR: 10.49; 95% CI: 7.96-13.84; all-cause mortality HR: 2.68; 95% CI: 2.32-3.10).

Conclusions

Artificial intelligence–enabled analysis of the ECG may facilitate identification of individuals with possible undiagnosed CAD and inform downstream testing and preventive measures.

Key words: artificial intelligence, coronary artery disease, electrocardiogram, prediction



Coronary artery disease (CAD) remains a leading cause of global morbidity and mortality despite increasing attention to the optimization of risk factors.1,2 The process of coronary atherosclerosis starts early in the life course and often remains unrecognized and untreated until older age, or worse, until it manifests clinically through stable symptoms (eg, angina, dyspnea), myocardial infarction, or heart failure. Subclinical CAD is common even as early as the second decade, as shown by autopsy studies on soldiers who died during their service.3,4 A more contemporary study using coronary computed tomography angiography to screen an early middle-aged population reported a CAD prevalence of over 42% in individuals without known clinical disease as young as 50 years.5

Current paradigms of CAD prevention rely heavily on evaluation and treatment of conventional risk factors. While evaluation of conventional risk factors such as hypertension, diabetes, smoking, and hyperlipidemia is integral to the risk assessment process, this risk-factor-centric approach might fail to identify patients with substantial disease who do not present with these conventional risk factors.6 A recent international working group estimated that at least 11.6% of patients presenting with acute coronary syndromes have no standard modifiable cardiovascular risk factors.7

The 12-lead electrocardiogram (ECG) is a widely available, noninvasive, and inexpensive tool in medical care worldwide. Classical features, such as Q waves or reduced R-wave amplitudes, are commonly used to diagnose or raise suspicion of CAD.8 However, the ECG may contain more subtle features associated with CAD that are difficult or time-consuming to measure manually. Advances in artificial intelligence (AI) have enabled rapid prediction of disease states using data from the whole ECG, as recently shown for the prediction of left ventricular dysfunction or atrial fibrillation.9,10

We hypothesized that a deep learning model based on the standard 12-lead ECG could detect prevalent CAD. Our primary aims were to show that 1) an ECG-based AI model can provide discriminatory power to detect CAD; and 2) this prediction would be at least as good as one based on clinical risk or on age and sex. We also wanted to explore whether the ECG-based AI prediction is associated with adverse events related to CAD.

We envisioned this model as an additional risk estimator for prevalent CAD in a primary care setting augmenting existing clinical frameworks.

Methods

Study populations

The deep learning model to predict CAD (Electrocardiogram to Coronary Artery Disease, ECG2CAD) was developed using data from 2 established cohorts, the Community Care Cohort Project (C3PO) and the Enterprise Warehouse of Cardiology (EWOC), for training and validation. These cohorts represent 520,868 patients from primary care (C3PO) and 99,257 patients from cardiology care (EWOC) within the Mass General Brigham (MGB) multi-institutional health care system in New England, USA.11,12 Briefly, patients aged 18 to 90 years, who received medical services in the MGB health care system, within 3 years, at any of the 7 MGB network hospitals between 2000 and 2019 were included. Patients needed to have at least 2 visits with primary care (C3PO) or ambulatory cardiology services (EWOC) to qualify for inclusion. Only the Brigham and Women's Hospital (BWH) data from C3PO and EWOC represent the external validation test set within MGB.

All these institutions share a unified electronic health record (EHR) database. Training for ECG2CAD was conducted on data from patients who had at least one ECG performed within 3 years prior to the start of follow-up within Massachusetts General Hospital (MGH) (Figure 1) in either primary or cardiology care. The model was then tested on an independent test set of individuals in MGH as well as an external validation set in BWH for the association with prevalent CAD. Given our interest in the potential use of ECG2CAD as a risk estimator in the primary care setting, we utilized a subset of the BWH external validation set that included the primary care patients (BWH Primary Care Cohort) for exploratory incident outcome analysis.

Figure 1.


Study Overview

Depicted is an overview of the study design. ECG2CAD was derived from the Massachusetts General Hospital (MGH) development set and trained using both prevalent and incident CAD within 3 years of start of follow-up. The exclusions are shown in Supplemental Figure 2. This figure includes graphics from Smart Servier Medical Art under the CC BY 4.0 license. BWH = Brigham and Women's Hospital; CAD = coronary artery disease; PCE = Pooled Cohort Equations.

We performed additional external validation in the UK Biobank, a prospective cohort comprising 502,629 participants recruited between 2006 and 2010.13 We included all UK Biobank participants who had a standardized resting 12-lead ECG during the imaging visit (“instance 2 visit”).

The use of MGB data was approved by the MGB Institutional Review Board. The UK Biobank was approved by the UK Biobank Research Ethics Committee (reference 11/NW/0382) and conducted under application #7089. All UK Biobank participants provided written informed consent. No preregistered protocol was published for this work. There was no patient involvement in the study design. This work was conducted following the TRIPOD-AI Checklist.

Outcome definition

CAD was defined using International Statistical Classification of Diseases and Related Health Problems (ICD)-9 and ICD-10 billing codes in MGH and BWH utilizing a previously established data infrastructure for processing and updating EHR.12 In the UK Biobank, CAD was defined using self-reported questionnaire data, in addition to ICD-9 and ICD-10 billing codes as well as OPCS procedure codes from inpatient and primary care data. The complete list with each code is provided in Supplemental Table 1.

Model training and output

ECG2CAD is a convolutional neural network designed to perform binary classification of prevalent CAD using a single 12-lead ECG as input. The model was trained using 2 different loss functions: cross-entropy loss for categorical tasks (CAD diagnosis and sex) and logcosh loss for the regression task (age). ECG2CAD provides a probability estimation of CAD on a scale from 0 to 1. Further details are found in the Supplemental Methods.
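As a rough illustration of the multitask objective described above, the following sketch combines cross-entropy terms for the two categorical targets (CAD and sex) with a log-cosh term for the regression target (age). The equal weighting of the three terms, the function names, and the per-sample formulation are assumptions made for illustration only; the actual architecture and loss configuration are given in the Supplemental Methods.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Cross-entropy for one binary label (e.g., CAD yes/no, or sex)."""
    p = min(max(p_pred, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

def log_cosh(y_true, y_pred):
    """Log-cosh loss: roughly quadratic for small errors, linear for large ones."""
    x = abs(y_pred - y_true)
    # Numerically stable form of log(cosh(x)) for x >= 0
    return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)

def multitask_loss(cad, p_cad, sex, p_sex, age, age_pred,
                   weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three task losses (weights are hypothetical)."""
    w_cad, w_sex, w_age = weights
    return (w_cad * binary_cross_entropy(cad, p_cad)
            + w_sex * binary_cross_entropy(sex, p_sex)
            + w_age * log_cosh(age, age_pred))
```

In practice the age target would likely be standardized before applying the regression loss so that the three terms are on comparable scales; that detail is also an assumption here.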

For training, ECG2CAD was exposed to all 12-lead ECGs performed within the 3 years before the start of follow-up in the MGH training set and early stopping was used to select model weights with the best performance in the held-out subset of 24,267 individuals from the MGH development set. For the purposes of model development, the training target for ECG2CAD was not only prevalent CAD but also incident CAD occurring within 3 years of start of follow-up. We estimated that any clinical CAD diagnosed within 3 years would be likely already present at the start of follow-up, and therefore chose to include these cases in model training. We additionally chose to compare ECG2CAD to a logistic model of age and sex for predicting prevalent CAD for benchmarking.

For evaluation, ECG2CAD was tested using only the most recent ECG at or before the start of follow-up in MGH/BWH, and the single 12-lead ECG performed at the imaging visit in the UK Biobank, and only considering prevalent CAD as the prediction target.

Pooled cohorts equations

We sought to compare ECG2CAD to a clinical risk score. Due to the lack of such a score to discriminate prevalent CAD, other groups have used the Pooled Cohorts Equations (PCE) to classify prevalent CAD or 1-year risk of CAD. To enable a more accurate comparison, we refitted the PCE for the prediction of prevalent CAD in a logistic regression in the MGH development set, which was used to train ECG2CAD. The PCE scores were designed to estimate 10-year incident atherosclerotic cardiovascular disease (ASCVD) risk based on clinical factors. We selected PCE as a comparator for benchmarking given its use as an ASCVD risk stratification tool for ambulatory patients and strong incorporation into clinical care, along with its previously favorable performance compared to other prediction scores and its endorsement in guidelines.14,15

We applied the revised PCEs by Yadlowsky et al14 for risk estimation. For this analysis, we utilized a complete case approach and included only patients aged 40 to 79 years, the specified age range for PCE application. Baseline age, sex, total cholesterol (ranging from 130 to 320 mg/dL), high density lipoprotein cholesterol (ranging from 20 to 100 mg/dL), systolic blood pressure (ranging from 90 to 200 mm Hg and accounting for antihypertensive medication status), diabetes status, and smoking status were obtained from the EHR. We excluded any patients with values outside the recommended ranges for blood-based measures and systolic blood pressure to ensure accurate risk estimation for PCE as outlined by Yadlowsky et al.14
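The complete-case and range restrictions described above can be sketched as a simple eligibility filter. The field names, the dictionary layout, and the inclusive treatment of the range boundaries are assumptions for illustration; only the numeric ranges come from the text.

```python
# Hypothetical field names; every one must be present for a complete case.
REQUIRED = ("age", "sex", "total_chol", "hdl", "sbp",
            "on_bp_meds", "diabetes", "smoker")

# Ranges quoted in the text (years, mg/dL, mg/dL, mm Hg); inclusive bounds assumed.
RANGES = {
    "age": (40, 79),
    "total_chol": (130, 320),
    "hdl": (20, 100),
    "sbp": (90, 200),
}

def pce_eligible(record):
    """Return True if the record is complete and inside all PCE ranges."""
    if any(record.get(key) is None for key in REQUIRED):
        return False  # complete-case approach: drop records with missing data
    return all(lo <= record[key] <= hi for key, (lo, hi) in RANGES.items())
```

For example, a 55-year-old with total cholesterol 200 mg/dL, HDL 50 mg/dL, and systolic blood pressure 130 mm Hg would pass, whereas the same record with HDL 15 mg/dL or a missing diabetes status would be excluded.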

Model performance in ECGs without abnormal findings and subgroups

To assess the model performance beyond known ECG signs of prevalent CAD, we tested ECG2CAD in the subgroup of ECGs that were interpreted as normal by the clinical cardiologist reader. The text of the cardiologist ECG readings was examined for the term “Normal Sinus Rhythm Normal ECG,” a phrase frequently entered via a text macro by cardiologists in the MGB system when an ECG displays no abnormalities. We additionally assessed model performance within subgroups of age (<45 years, 45-65 years, and >65 years), sex, and self-reported race, and compared it with the age- and sex-based logistic model. We also assessed performance in individuals without risk factors, defined as patients without evidence for hypertension, hyperlipidemia, diabetes mellitus, or chronic kidney disease at the start of follow-up.

Association of incident coronary artery disease-related outcomes

To evaluate whether the prediction of ECG2CAD has any utility for risk estimation in people without and with known CAD, we analyzed associations with incident CAD-related outcomes (Supplemental Table 1) including myocardial infarction, heart failure, and all-cause mortality within the BWH primary care cohort, given ECG2CAD is designed for primary care use. To ensure minimal misclassification and contamination with acute events, we excluded participants who encountered incident events within the initial 30 days of the start of follow-up.

Using Cox proportional hazard models adjusted for age and sex, we compared risks across 4 scenarios: 1) participants identified with CAD based solely on the ICD-based definition; 2) those classified with CAD based solely on the ECG2CAD model based on the threshold that optimized the F1 score in the BWH test set; 3) those identified with CAD by both the ICD-based definition and the ECG2CAD model; and 4) those not identified with CAD by either method. Schoenfeld residuals were visually inspected to test the proportional hazard assumptions.
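The thresholding step and the resulting 4 groups can be sketched as follows. This is a minimal illustration that scans every observed score as a candidate cutoff and keeps the one with the highest F1 score; the function names and the group labels are hypothetical, and the study's actual search procedure is not described in the text.

```python
def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def best_f1_threshold(y_true, scores):
    """Try each observed score as a cutoff (score >= cutoff means predicted CAD)."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(y_true, preds)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

def cad_group(score, has_icd_cad, threshold):
    """Assign one of the 4 exposure groups used in the Cox models."""
    ecg = "+" if score >= threshold else "-"
    icd = "+" if has_icd_cad else "-"
    return f"ECG2CAD{ecg}/ICD{icd}"
```

The group with neither definition ("ECG2CAD-/ICD-" in this sketch) would then serve as the reference category in the age- and sex-adjusted Cox models.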

We also assessed the cumulative incidence of cardiovascular outcomes by these groups and the cumulative incidence when stratifying the ECG2CAD prediction into high risk (top quintile), intermediate risk (quintile 2-4), and low risk (bottom quintile). This analysis was conducted for the overall cohort and after excluding individuals with diagnosed CAD.
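A rank-based version of the quintile stratification described above might look like the sketch below. It assumes quintiles are computed within the cohort being analyzed and that ties are broken by input order; neither detail is stated in the text.

```python
def risk_groups(scores):
    """Label each score: bottom quintile = low, quintiles 2-4 = intermediate,
    top quintile = high. Quintiles are assigned by rank within `scores`."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices, lowest score first
    labels = [None] * n
    for rank, i in enumerate(order):
        quintile = rank * 5 // n + 1  # integer in 1..5
        if quintile == 1:
            labels[i] = "low"
        elif quintile == 5:
            labels[i] = "high"
        else:
            labels[i] = "intermediate"
    return labels
```

With 10 participants, the 2 lowest predictions are labeled low risk, the 2 highest are labeled high risk, and the remaining 6 are intermediate.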

Phenome-wide association study in the UK Biobank

Deep learning–based models such as ECG2CAD are often a black box, which is challenging when envisioning a clinical use case. One method to interrogate the model is to perform hypothesis-free systematic association testing to recapitulate known associations with CAD risk. Therefore, we performed a phenome-wide association study (PheWAS) in the UK Biobank. We used logistic regression models with standardized ECG2CAD prediction adjusting for age at the time of ECG and sex. Since the follow-up time after ECG in the UK Biobank set was only around 3 years, we opted to combine both prevalent and incident disease, similar to our reasoning in ECG2CAD model development. We employed v1.2 of the Phecode Map19,16 comprising 1,867 disease definitions organized into clinically relevant categories and defined using standardized sets of International Classification of Diseases-9th and -10th Revision codes. These Phecode definitions can be accessed at https://phewascatalog.org/.

Statistical analysis

Two primary metrics were used for ECG2CAD evaluation: the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). The AUPRC enables a more interpretable estimation of the ability of a model to identify true positives, as it evaluates the balance between precision (positive predictive value) and recall (sensitivity), which is particularly useful in unbalanced cohorts.
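To make the two metrics concrete, here is a small pure-Python sketch. AUROC is computed as the probability that a randomly chosen case receives a higher score than a randomly chosen non-case, and AUPRC is approximated by average precision, which is one common estimator; which estimator the study used is not stated, so that choice is an assumption. The pairwise AUROC loop is quadratic and intended only for small examples, and the average-precision function assumes no tied scores.

```python
def auroc(y_true, scores):
    """P(score of a random case > score of a random non-case); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(y_true)
    tp = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / k  # precision at the rank where recall increases
    return total / n_pos
```

Unlike AUROC, whose chance level is 0.5 regardless of prevalence, the AUPRC of an uninformative score is approximately the prevalence of the outcome. AUPRC values from cohorts with very different CAD prevalence are therefore not directly comparable.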

CIs for these metrics were computed using bootstrapping techniques with 1,000 replicates. To evaluate the relative performance and added predictive value of ECG2CAD, we compared it against logistic regression models 1) constructed solely using age and sex; and 2) using ECG2CAD, age, and sex as covariates. The statistical significance of differences in AUROC between ECG2CAD and the baseline models, as well as the perturbations of combined models, was evaluated using DeLong's test for paired ROC curves. In the primary care analysis across subgroups, we estimated statistical significance using bootstrapping with 1,000 replicates. A P value <0.05 was considered statistically significant, except for the PheWAS, for which a Bonferroni threshold of P = 4.6 × 10^-5 was used. A similar approach using logistic regression was used for combined models of PCE and ECG2CAD in the PCE subsets as outlined in Figure 1.
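A bootstrap CI of this kind can be sketched as below. The percentile method, the fixed random seed, and the redrawing of resamples that contain only one outcome class are assumptions made for illustration; the text specifies only that 1,000 replicates were used. The function will not terminate if the input itself contains a single class.

```python
import random

def auroc(y_true, scores):
    """Pairwise AUROC (ties count 0.5); adequate for small illustrative samples."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(y_true, scores, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample individuals with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # metric undefined without both classes; draw again
        stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Called as `bootstrap_ci(y, scores, auroc)`, this returns the lower and upper bounds of an approximate 95% CI for the AUROC.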

We additionally assessed the accuracy, negative predictive value, positive predictive value, sensitivity, specificity, and maximal F1 score of ECG2CAD using thresholds that optimize F1 score in each cohort to assess how ECG2CAD may work as a risk estimation tool using a prespecified cutoff. Finally, a decision curve analysis was conducted to compare the net-benefit of using ECG2CAD combined with age and sex vs an age- and sex-only model.
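Decision curve analysis plots net benefit against the threshold probability. As a sketch using the standard definition, net benefit at threshold p_t is TP/N − (FP/N) × p_t/(1 − p_t), and the treat-all strategy is the special case in which everyone is flagged. The function names below are hypothetical, and the scores are assumed to be calibrated probabilities so that comparing them to p_t is meaningful.

```python
def net_benefit(y_true, scores, threshold):
    """Net benefit of flagging everyone with score >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability
    return tp / n - (fp / n) * threshold / (1.0 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit when every individual is treated as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

A model is useful at a given threshold only if its net benefit exceeds both the treat-all value and zero (the treat-none strategy).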

All statistical analyses were conducted using Python and R (version 4.3.1).

Results

Participant characteristics

There were 286,240 participants in this study divided into an MGH development set used for training and validation (N = 137,036) and 3 test sets (Table 1). The MGH development set comprised 137,036 participants (27.4% with CAD) with 764,670 ECGs (median ECGs per individual: 3 [Q1: 1, Q3: 6]). The mean age was 56 ± 20 years and 50.7% were women. The performance of ECG2CAD was tested in 3 different test sets: MGH (18,794 individuals, age 57 ± 16 years, 48.8% women), BWH (88,270 individuals, age 57 ± 16, 54.8% women), and UK Biobank (42,147 individuals, age 65 ± 8 years, 51.6% women). The MGH and BWH test sets had overall similar baseline characteristics compared with the MGH development set, whereas UK Biobank participants were older but generally healthier (Table 1). An overview of the cohorts by CAD status is shown in Supplemental Tables 2 to 4.

Table 1.

Sample Baseline Clinical Characteristics

Characteristic | MGH Development (n = 137,036) | MGH Test (n = 18,794) | BWH Test (n = 88,270) | UK Biobank Test (n = 42,140)
Age, y (SD) | 56.19 (17.0) | 57.47 (16.7) | 57.17 (16.2) | 64.66 (7.7)
Female, n (%) | 69,445 (50.7) | 9,164 (48.8) | 48,327 (54.8) | 21,737 (51.6)
Race/ethnicity, n (%) | | | |
 Asian or Pacific Islander | 5,098 (3.7) | 697 (3.7) | 2,211 (2.5) | 647 (1.5)
 Black | 7,588 (5.5) | 967 (5.1) | 9,447 (10.7) | 364 (0.9)
 Hispanic or Latino | 5,656 (4.1) | 765 (4.1) | 5,527 (6.3) | 0
 Other | 8,343 (6.1) | 1,121 (6.0) | 6,445 (7.3) | 1,714 (4.1)
 White | 110,357 (80.5) | 15,244 (81.1) | 64,640 (73.2) | 39,390 (93.5)
Coronary artery disease, n (%) | 37,537 (27.4) | 6,051 (32.2) | 27,898 (31.6) | 1,509 (3.6)
Body mass index, kg/m2 (SD) | 28.66 (6.5) | 28.71 (7.0) | 29.02 (7.9) | 26.46 (4.4)
Diabetes mellitus, n (%) | 23,208 (16.9) | 3,348 (17.8) | 16,234 (18.4) | 1,572 (3.7)
Hypertension, n (%) | 80,682 (58.9) | 11,789 (62.7) | 51,953 (58.9) | 12,914 (30.7)
Atrial fibrillation, n (%) | 19,242 (14.0) | 3,199 (17.0) | 14,454 (16.4) | 1,218 (2.9)
Chronic kidney disease, n (%) | 11,919 (8.7) | 1,816 (9.7) | 9,414 (10.7) | 392 (0.9)

Baseline clinical characteristics of the development cohort and the 3 test cohorts. Other is defined as either mixed, other, or unknown.

BWH = Brigham and Women's Hospital; MGH = Massachusetts General Hospital.

ECG2CAD detects prevalent coronary artery disease

The ECG2CAD model exhibited favorable performance in detecting CAD across all test sets. The prevalence of CAD differed between cohorts. In the MGH test set it was 32.2% (6,051 cases, 37.2% female), in BWH it was 31.6% (27,898 cases, 41.3% female), and in UK Biobank 3.6% (1,509 cases, 18.6% female). Using AUROC, the performance of ECG2CAD was consistent in all 3 test sets with an AUROC of 0.782 (95% CI: 0.775-0.789) in MGH, of 0.747 (95% CI: 0.744-0.750) in BWH, and 0.760 (95% CI: 0.748-0.772) in UK Biobank. Reflecting the differences in prevalence, the AUPRC was 0.643 (95% CI: 0.630-0.655) in the MGH test set, 0.574 (95% CI: 0.568-0.580) in BWH, and 0.156 (95% CI: 0.138-0.173) in UK Biobank.

When compared to ECG2CAD, discrimination of prevalent CAD using age and sex was less favorable in the MGH (age and sex model AUROC 0.744, 95% CI: 0.736-0.751; P < 0.01) and BWH (AUROC: 0.717; 95% CI: 0.713-0.721; P < 0.01) test sets, and comparable in the UK Biobank (0.756; 95% CI: 0.743-0.766; P = 0.525). However, AUPRC in the UK Biobank was greater using ECG2CAD as depicted in Figure 2. The logistic regression model including ECG2CAD and age and sex, the minimal information available to a clinician in a putative use case, was superior to the other models in all test sets, achieving an AUROC of 0.790 (95% CI: 0.784-0.797) in MGH, 0.757 (95% CI: 0.754-0.760) in BWH, and 0.786 (95% CI: 0.774-0.797) in the UK Biobank.

Figure 2.


Model Performance vs Age and Sex in the 3 Test Cohorts for Discrimination of Prevalent CAD

Depicted is model discrimination of prevalent CAD based on a model of age and sex (yellow), ECG2CAD (blue), and a combined model (red), measured using the area under the receiver operating characteristic curve (AUROC) with 95% CI on the left and the area under the precision-recall curve (AUPRC) on the right. MGH = Massachusetts General Hospital; other abbreviations as in Figure 1.

Comparison of ECG2CAD with pooled cohort equations

We then compared the performance of ECG2CAD to a model based on the PCE refitted within the MGH development set to predict prevalent CAD. After exclusion of people with missing data or outside the recommended ranges for PCE calculation, the MGH PCE subset consisted of 2,886 patients, the BWH PCE subset consisted of 20,868 patients, and the UK Biobank PCE subset consisted of 30,160 participants. A flow diagram is shown in Supplemental Figure 2.

ECG2CAD showed better discrimination than the refitted PCE-based model in the MGH PCE subset (AUROC: 0.721; 95% CI: 0.699-0.741, vs 0.673, 95% CI: 0.652-0.694; P < 0.001), in the BWH PCE subset (AUROC: 0.703; 95% CI: 0.695-0.711, vs 0.675; 95% CI: 0.667-0.683; P < 0.001) and the UK Biobank PCE subset (AUROC: 0.762; 95% CI: 0.746-0.777, vs 0.740; 95% CI: 0.726-0.753; P = 0.009). ECG2CAD showed the highest AUPRC for all 3 test sets. Compared with ECG2CAD alone, a combined logistic regression model of both PCE and ECG2CAD showed the best performance as depicted in Figure 3 with an AUROC of 0.730 (95% CI: 0.710-0.748; P < 0.001) in MGH PCE subset, 0.715 (95% CI: 0.707-0.723; P < 0.001) in the BWH PCE subset, and 0.779 (95% CI: 0.764-0.794; P = 0.001) in the UK Biobank PCE subset.

Figure 3.


Model Performance vs Pooled-Cohort Equations in the 3 Test Cohorts for Discrimination of Prevalent CAD

Depicted is model discrimination of prevalent CAD based on a model of PCE risk (yellow), ECG2CAD (blue), and a combined model (red), measured using the area under the receiver operating characteristic curve (AUROC) with 95% CI on the left and the area under the precision-recall curve (AUPRC) on the right. All test sets shown comprise the subset eligible for PCE calculation within the overall test sets, and the results are therefore not comparable to the performance in Figure 2. Abbreviations as in Figures 1 and 2.

ECG2CAD performance is consistent in normal ECGs and across age, sex, and self-reported ethnicity/race groups

Since ECG2CAD is conceptualized as a tool that would be particularly useful in the primary care setting, we compared the performance of the combined ECG2CAD + age and sex model with that of the age- and sex-only model in a subset of the BWH test set including only primary care patients (N = 60,386). We used the combined model since both age and sex would be available when using the ECG2CAD tool.

We assessed the performance of our model to detect CAD in ECGs that were read as “normal” by cardiologists, and across age groups, sex, and self-reported race groups (Table 2). Utilizing ECG2CAD in addition to age and sex consistently improved discrimination (AUROC delta of 0.011-0.086) in all subgroups, including those considered normal by cardiologist reading as well as subgroups based on age categories, sex, or self-reported race/ethnicity and patients without risk factors. The comparison of ECG2CAD only vs age and sex only is provided in Supplemental Table 6.

Table 2.

Discrimination of Prevalent Coronary Artery Disease Using ECG2CAD Across Subgroups in the BWH Primary Care Cohort Set

Subgroup | N | Cases | AUROC: Age and Sex | AUROC: ECG2CAD + Age and Sex | Improvement Using ECG2CAD | P Value
Overall | 60,386 | 14,660 | 0.716 (0.711-0.72) | 0.76 (0.756-0.765) | 0.044 | <0.001
Normal ECG | 19,615 | 2,948 | 0.667 (0.657-0.678) | 0.679 (0.668-0.689) | 0.011 | <0.001
Not normal ECG | 40,771 | 11,712 | 0.715 (0.709-0.72) | 0.766 (0.761-0.772) | 0.052 | <0.001
Age | | | | | |
 <45 y | 16,157 | 1,546 | 0.607 (0.592-0.622) | 0.658 (0.643-0.673) | 0.051 | <0.001
 45-65 y | 26,617 | 6,039 | 0.634 (0.626-0.642) | 0.7 (0.692-0.708) | 0.066 | <0.001
 >65 y | 17,612 | 7,075 | 0.626 (0.618-0.634) | 0.712 (0.704-0.719) | 0.086 | <0.001
Sex | | | | | |
 Male | 24,622 | 7,930 | 0.698 (0.691-0.705) | 0.761 (0.754-0.767) | 0.063 | <0.001
 Female | 35,764 | 6,730 | 0.69 (0.683-0.697) | 0.732 (0.725-0.739) | 0.042 | <0.001
Race/Ethnicity | | | | | |
 Black | 7,759 | 1,627 | 0.697 (0.683-0.711) | 0.75 (0.736-0.764) | 0.053 | <0.001
 White | 41,319 | 10,576 | 0.718 (0.713-0.724) | 0.762 (0.756-0.767) | 0.043 | <0.001
 Other | 4,998 | 1,208 | 0.698 (0.681-0.715) | 0.744 (0.727-0.76) | 0.045 | <0.001
 Asian or Pacific Islander | 1,611 | 340 | 0.710 (0.68-0.741) | 0.743 (0.713-0.773) | 0.033 | <0.001
 Hispanic or Latino | 4,699 | 909 | 0.716 (0.697-0.735) | 0.760 (0.741-0.778) | 0.044 | <0.001
Without risk factors | 17,626 | 1,560 | 0.644 (0.629-0.659) | 0.670 (0.655-0.685) | 0.026 | <0.001

AUROC in the overall BWH primary care test set using age and sex either alone or in combination with ECG2CAD for discrimination of prevalent coronary artery disease across subgroups of age, sex, and race/ethnicity. "Without risk factors" is defined as patients without evidence for hypertension, hyperlipidemia, diabetes mellitus, or chronic kidney disease at the start of follow-up. P values derived from DeLong test. Other is defined as either mixed, other, or unknown.

AUROC = area under the receiver operating characteristic curve; ECG = electrocardiogram; other abbreviations as in Table 1.

ECG2CAD is associated with ischemic heart disease in a UK Biobank PheWAS

To assess which diseases the ECG2CAD prediction is associated with in structured, hypothesis-free testing, we performed a PheWAS in the UK Biobank. ECG2CAD had the strongest statistical association with ischemic heart disease (OR: 1.7 per SD of standardized ECG2CAD; P = 2.0 × 10^-175), myocardial infarction (OR: 2.1/SD; P = 6.9 × 10^-148), and coronary atherosclerosis (OR: 1.8/SD; P = 1.8 × 10^-148) (Supplemental Figure 3). In terms of noncardiovascular diagnoses, we also observed associations with disorders of lipid metabolism (OR: 1.3/SD; P = 2.1 × 10^-62) and hyperlipidemia (OR: 1.3/SD; P = 2.0 × 10^-175). The top 20 statistical associations are shown in Supplemental Table 7.

Exploratory analysis of incident coronary artery disease-related outcomes

To understand whether the ECG2CAD classifier has predictive value even when CAD status is known, we conducted an exploratory analysis of incident adverse outcomes based on the threshold that optimizes the F1 score in the BWH set. After excluding anyone with an incident event of myocardial infarction, heart failure, or death within 30 days of ECG in the BWH primary care cohort, a total of 51,808 participants were available for longitudinal analysis of future clinical events. The mean follow-up was 8.1 years (SD: 5.7 years, 37.6% with at least 10 years of follow-up).

Compared with the reference population of those classified as having no CAD by ECG2CAD and with no ICD-based definition of CAD, we observed that both the presence of AI-predicted CAD (HR: 2.35; 95% CI: 2.16-2.56; P < 0.001) and ICD-based known CAD (HR: 2.21; 95% CI: 1.98-2.48; P < 0.001) were associated with increased risk for incident myocardial infarction. However, the presence of both definitions was associated with an almost five-fold increase in risk (HR: 4.92; 95% CI: 4.80-5.40; P < 0.001). We observed a similar pattern for heart failure (HR for combined CAD definition: 5.77; 95% CI: 4.97-6.71; P < 0.001) and all-cause mortality (HR for combined CAD definition: 2.12; 95% CI: 1.95-2.31; P < 0.001) as shown in Table 3 (cumulative incidence in Supplemental Figure 4). In the ECG2CAD-based prediction by risk groups, the high-risk group consistently showed the highest cumulative incidence, while the risk for future events was modest in the low-risk group (Figure 4). Being in the high-risk group was associated with higher risk for adverse events compared with the low-risk group (myocardial infarction HR: 5.59; 95% CI: 4.76-6.56; heart failure HR: 10.49; 95% CI: 7.96-13.84; all-cause mortality HR: 2.68; 95% CI: 2.32-3.10) (Supplemental Table 8). Similar observations were made even when excluding people with diagnosed CAD at baseline from the analysis (Supplemental Table 8, Supplemental Figure 5).

Table 3.

Hazard Ratios for Incident Adverse Events by ECG2CAD Classification and by ICD-Based CAD

Outcome and group | HR (95% CI) | P Value
Myocardial infarction | |
 ECG2CAD−/ICD+ | 2.21 (1.98-2.48) | <0.001
 ECG2CAD+/ICD− | 2.35 (2.16-2.56) | <0.001
 ECG2CAD+/ICD+ | 4.92 (4.80-5.40) | <0.001
Heart failure | |
 ECG2CAD−/ICD+ | 1.96 (1.55-2.48) | <0.001
 ECG2CAD+/ICD− | 3.15 (2.75-3.61) | <0.001
 ECG2CAD+/ICD+ | 5.77 (4.97-6.71) | <0.001
Mortality | |
 ECG2CAD−/ICD+ | 1.20 (1.05-1.37) | <0.001
 ECG2CAD+/ICD− | 1.67 (1.55-1.79) | <0.001
 ECG2CAD+/ICD+ | 2.12 (1.95-2.31) | <0.001

Cox models of incident events in participants classified as having coronary artery disease by neither ECG2CAD nor ICD billing code (ECG2CAD−/ICD−, N = 28,415), by only ICD billing code (ECG2CAD−/ICD+, N = 3,276), by only ECG2CAD (ECG2CAD+/ICD−, N = 14,906), or by both (ECG2CAD+/ICD+, N = 5,187). The ECG2CAD−/ICD− group is the reference group; all models were adjusted for age and sex.

CAD = coronary artery disease; ICD = International Statistical Classification of Diseases and Related Health Problems.

Figure 4.


Cumulative Incidence of Myocardial Infarction, Heart Failure, and All-Cause Mortality in the BWH Primary Care Cohort Stratified by ECG2CAD Risk Group

Depicted are incident all-cause mortality, myocardial infarction, and heart failure in the BWH primary care cohort. The low-risk group represents the bottom quintile of ECG2CAD predicted risk, the intermediate-risk group quintiles 2 to 4, and the high-risk group the top quintile. The analysis excluding people with prevalent CAD is shown in Supplemental Figure 5. Abbreviations as in Figure 1.

Model interpretability

To understand which waveform patterns are associated with higher predictions from ECG2CAD, we stratified the cohort into high- and low-risk categories and produced median waveforms as well as saliency maps. The saliency map shows the gradient of the model's CAD prediction with respect to input voltage. Since saliency maps show which parts of the ECG the model is most sensitive to, they give an indication of what drives the model prediction. However, since saliency maps are derived from specific examples, they might not give an entirely comprehensive picture of what drives the model's predictions. For that reason, we also depict the median waveforms from both extremes of the CAD prediction distribution. The high-risk group demonstrated greater feature importance in the amplitudes during the QRS across all leads, except for leads V5 and V6, where feature importance was more evenly distributed as seen in Supplemental Figure 6. Saliency maps were also generated for ECGs considered “normal” by cardiologist reading and these showed high sensitivity to the P wave and the initial slope of the R-wave, particularly in leads I, aVL, and V1 (Supplemental Figure 7).

Net benefit of using ECG2CAD compared with age and sex

In the decision curve analysis, we observed that, compared with a treat-all strategy, risk stratification becomes useful at a threshold probability of around 7.5%. The combined model consistently shows higher net benefit than the age- and sex-only model, which is especially evident at moderate threshold probabilities (20%-50%) (Supplemental Figure 8).

Discussion

Our study demonstrates that ECG2CAD, a deep learning model developed and evaluated using data from nearly 300,000 ambulatory primary care and cardiology patients, can discriminate the presence of CAD based on a single 12-lead ECG (Central Illustration). ECG2CAD demonstrated consistent performance across 3 different test sets spanning 2 ambulatory health care samples and a prospective cohort study, and across strata of sex, age, and race. Incorporation of ECG2CAD provided incremental improvement over age and sex, or the PCEs, in detecting CAD, implying the presence of predictive information beyond conventional risk factors. Furthermore, ECG2CAD was associated with future CAD-related adverse events such as incident myocardial infarction, heart failure, and mortality, both in people with and without clinically known CAD at baseline, highlighting its potential to uncover undiagnosed CAD. These findings provide evidence that deep learning models may empower large-scale efforts to identify people at risk for CAD and its sequelae, such as heart failure.

Central Illustration.

ECG2CAD Identifies Coronary Artery Disease and Predicts Adverse Events

ECG2CAD was derived from the Massachusetts General Hospital (MGH) development set and trained using both prevalent and incident CAD within 3 years of start of follow-up. This figure includes graphics from Smart Servier Medical Art under the CC BY 4.0 license. AUROC = area under the receiver operating characteristic curve; other abbreviations as in Figure 1.

ECG2CAD expands prior work using deep learning models trained on 12-lead ECGs to detect existing heart disease or predict future cardiac conditions. Previous work from our group and others has shown that convolutional neural network-based models using 12-lead ECGs accurately predict 5-year risk of incident atrial fibrillation, or underlying left ventricular dysfunction.9,10 Recently, ASCVD has become a focus of 12-lead ECG-based modeling of cardiovascular mortality. Hughes et al used over 300,000 ECGs to train a model to predict 5-year risk of cardiovascular mortality with an area under the curve (AUC) of 0.83. However, the performance to predict 5-year risk of ASCVD was lower, with an AUC of 0.63 to 0.67, reflecting the diverse etiology of cardiovascular mortality.17 Awasthi et al developed 3 ECG-based models to detect coronary artery calcium ≥300, obstructive CAD, and regional akinesis (ie, a surrogate for prior myocardial infarction). While these specific models showed good performance with AUCs of 0.88, 0.85, and 0.94, respectively, they were trained on smaller patient cohorts (N < 20,000) and provided no external validation.18 In contrast, ECG2CAD was designed specifically as a risk estimation tool for CAD and achieves an AUC of 0.75 to 0.78 across 3 distinct samples, including a large healthy cohort with a relatively low prevalence of CAD (UK Biobank).

When considering how to use an ECG-based prediction model such as ECG2CAD, it is important to consider the context within a clinical framework. In addition to minimal available information (age and sex), physical examination, patient history, and risk factors remain critical in assessing the risk for cardiovascular disease. While the ability of ECG2CAD to detect CAD is promising, it is neither perfect nor without inherent challenges. One challenge is that ECGs are usually performed because cardiovascular disease is suspected, which leads to an indication bias in most available data sets. In contrast, the UK Biobank collects a 12-lead resting ECG as part of the structured imaging visit. Additionally, the prevalence of CAD in UK Biobank is low and the pretest probability in this healthier cohort is also lower compared with MGH and BWH. The discriminatory ability of ECG2CAD even in the UK Biobank is therefore reassuring but points to the need for comprehensive clinical work-up in addition to AI tools in settings with low pretest probability of CAD.
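The effect of pretest probability can be made concrete with Bayes' rule. The sketch below uses the CAD prevalences reported for the MGH and UK Biobank test sets (6,051 of 18,706 and 1,509 of 42,147) but assumes a purely hypothetical operating point of 70% sensitivity and 70% specificity; those two numbers are illustrative assumptions and are not results from this study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: P(disease | positive test)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Prevalence of CAD in two test sets, from the reported case counts.
cohort_prevalence = {
    "MGH": 6051 / 18706,
    "UK Biobank": 1509 / 42147,
}

# Hypothetical operating point, chosen only for illustration.
SENSITIVITY = 0.70
SPECIFICITY = 0.70

ppv = {
    name: positive_predictive_value(SENSITIVITY, SPECIFICITY, prev)
    for name, prev in cohort_prevalence.items()
}

for name, value in ppv.items():
    print(f"{name}: prevalence {cohort_prevalence[name]:.3f}, PPV {value:.3f}")
```

With the same assumed sensitivity and specificity, a positive result is far less likely to be a true positive in the low-prevalence cohort, which is the reason a positive AI flag in such settings calls for further clinical work-up rather than a diagnosis.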

The detection of subclinical CAD is an interesting application of AI in preventive cardiology and needs prospective validation. In our work, as shown in the incident CAD-related outcome analysis, it would be more accurate to view undiagnosed CAD as the potential use case of ECG2CAD. The capability to detect CAD from ECGs, even those classified as “normal” by experienced cardiologists, points to variations or patterns that the AI discerns but that seem to elude expert-level detection. This might be due to very subtle changes that are individually imperceptible on routine reads of ECGs but may be important in sum, or to more obvious changes that are not yet widely recognized as CAD-conferring. For instance, predicted high-risk ECGs show larger R-wave amplitudes in leads I and aVL but not V1, a pattern that is also observed when inflating a balloon during coronary interventions as a very early sign of transmural ischemia.19 Whether these changes also indicate the presence of chronic ischemia such as CAD is not clinically established. Such patterns on ECG, while perhaps not pathognomonic for CAD, might also reflect a predisposition or an earlier phase of the disease with detectable effects on the ECG waveform and amplitude.

With advances in noninvasive coronary imaging, it is now feasible to evaluate undiagnosed atherosclerosis and detect high-risk coronary features on a computed tomography (CT) scan earlier in the life course. Given that a CT-focused approach is not feasible in the general population today due to radiation exposure, lack of CT machines, and cost considerations, enrichment strategies will be needed to identify individuals for risk estimation. An AI-based strategy leveraging the widely available ECG might become useful for this purpose.

The predictive ability of ECG2CAD for CAD sequelae like myocardial infarction, heart failure, and mortality highlights the potential use case in preventive cardiology. For patients identified within the highest risk quintile by the model, early preventive measures (from medical therapy to lifestyle modifications) could be initiated, potentially redirecting their cardiac risk trajectory.20 This also raises the question of how to incorporate such a tool into primary care workflows. With early CAD detection, primary care practitioners could make more precise referrals to cardiology specialists for preventive care. The low cost and ubiquity of 12-lead ECGs, both in academic and resource-constrained clinical settings, underline the scalability of AI approaches utilizing ECGs. Early CAD detection in areas with limited health care resources could improve access to preventive cardiac care globally.

However, an increase in testing due to AI-based tools could add to the financial burden on health care systems and patients, induce anxiety stemming from potential false-positive diagnoses, and lead to net harm. Therefore, a critical appraisal of any risk estimation modality is needed to balance the advantages of early CAD detection against the pitfalls of potential overinterpretation of such an AI tool.

Study Limitations

This study, while very large and robust in its design and outcomes, has several limitations worth noting. First, the application of ECG2CAD as a clinical decision tool needs to be evaluated prospectively to determine whether it leads to higher detection rates and greater initiation of appropriate preventive measures. Second, the reliance on ICD billing codes for CAD as labels might introduce misclassification, given that billing codes may not always accurately capture clinical diagnoses. Third, the intended application is within a clinical decision framework in primary care, but the model was trained solely using ICD-based diagnosis of CAD, that is, clinical CAD. However, we submit that associations with incident CAD-related events among individuals with no known CAD provide evidence for the detection of undiagnosed CAD. Fourth, our analysis was confined to the specific populations of the involved health care institutions and the UK Biobank, mainly of White European descent, potentially limiting the generalizability to other demographics. Fifth, while we used the PCE as a clinical comparison tool for benchmarking against ECG2CAD, the PCE score was originally developed to predict 10-year risk of ASCVD rather than prevalent CAD. Sixth, we could not correlate ECG2CAD's prediction with information from coronary computed tomography angiography or coronary angiography as we did not have linkage to these imaging data.

Finally, 2 of our test cohorts (MGH and BWH) derive from hospitals in the same network, which have operated independent cardiology services but are currently in the early stages of clinical integration.

Conclusions

We have developed a model, ECG2CAD, that leverages deep learning for the early detection of CAD using 12-lead ECGs. Our model shows favorable performance compared with age and sex or clinical risk models in detecting CAD and is associated with incident risk of myocardial infarction, heart failure, and all-cause mortality. The ability of ECG2CAD to discern CAD from ECGs across subgroups of age, sex, and ethnicity suggests potential utility specifically for identifying undiagnosed CAD. However, careful implementation and prospective evaluation are needed to understand benefits while minimizing unintended consequences. The findings motivate further research to refine the model, validate it across diverse populations, and evaluate practical utility from integration into clinical practice for optimized preventive care.

Perspectives.

COMPETENCY IN MEDICAL KNOWLEDGE: AI based on a single 12-lead ECG (“ECG2CAD”) has the potential to identify individuals with existing CAD. Such a model performs well even in subgroups of ethnicity, age, and sex, and in those with normal ECGs or without conventional risk factors.

TRANSLATIONAL OUTLOOK: Careful implementation and prospective evaluation are needed to understand the benefits of ECG2CAD while minimizing unintended consequences. Further research is needed to refine the model, validate it across diverse populations, and evaluate practical utility from integration into clinical practice.

Funding support and author disclosures

Dr Kany is supported by the Walter Benjamin Fellowship from the Deutsche Forschungsgemeinschaft (521832260). Dr Rämö is supported by a Fellowship grant from the Sigrid Jusélius Foundation. Dr Ellinor is supported by grants from the National Institutes of Health (R01HL092577, 1R01HL157635, 5R01HL139731), from the American Heart Association (18SFRN34110082, 961045) and from the European Union (MAESTRIA 965286). Dr Fahed is supported by grants from the National Heart, Lung, and Blood Institute (K08HL161448 and R01HL164629). Dr Ho is supported by grants from the National Institutes of Health (K24 HL153669, R01 HL160003, R01 HL140224). Dr Khurshid is supported by grants from the National Heart, Lung, and Blood Institute (K23HL169839) and the American Heart Association (23CDA1050571). Dr Fahed is cofounder of Goodpath. Dr Ellinor has received sponsored research support from Bayer AG, IBM Research, Bristol Myers Squibb, Pfizer, and Novo Nordisk; and has also served on advisory boards or consulted for MyoKardia and Bayer AG. Dr Ho has received research support from Bayer AG. Dr Philippakis is a Venture Partner and employee of GV; and has received funding from Intel, IBM, Verily, Microsoft, and Bayer, all unrelated to the present work. All other authors have reported that they have no relationships relevant to the contents of this paper to disclose.

Footnotes

The authors attest they are in compliance with human studies committees and animal welfare regulations of the authors’ institutions and Food and Drug Administration guidelines, including patient consent where appropriate. For more information, visit the Author Center.

Appendix

For supplemental methods, tables, and figures, please see the online version of this paper.

Supplementary data

mmc1.docx (3.8MB, docx)

References

  • 1. American Heart Association Council on Epidemiology and Prevention Statistics Committee and Stroke Statistics Subcommittee. Heart disease and stroke statistics—2023 update: a report from the American Heart Association. Circulation. 2023;147:e93–e621. doi: 10.1161/CIR.0000000000001123.
  • 2. Roth G.A., Mensah G.A., Johnson C.O., et al. GBD-NHLBI-JACC Global Burden of Cardiovascular Diseases Writing Group. Global Burden of cardiovascular diseases and risk factors, 1990-2019: update from the GBD 2019 Study. J Am Coll Cardiol. 2020;76:2982–3021. doi: 10.1016/j.jacc.2020.11.010.
  • 3. Enos W.F., Holmes R.H., Beyer J. Coronary disease among United States soldiers killed in action in Korea; preliminary report. J Am Med Assoc. 1953;152:1090–1093. doi: 10.1001/jama.1953.03690120006002.
  • 4. Strong J.P. Coronary atherosclerosis in soldiers: a clue to the natural history of atherosclerosis in the young. JAMA. 1986;256:2863–2866. doi: 10.1001/jama.256.20.2863.
  • 5. Bergström G., Persson M., Adiels M., et al. Prevalence of subclinical coronary artery atherosclerosis in the general population. Circulation. 2021;144:916–929. doi: 10.1161/CIRCULATIONAHA.121.055340.
  • 6. Vedin O. Acta Universitatis Upsaliensis; 2015. Prevalence and Prognostic Impact of Periodontal Disease and Conventional Risk Factors in Patients With Stable Coronary Heart Disease.
  • 7. Figtree G.A., Vernon S.T., Harmer J.A., et al. Clinical pathway for coronary atherosclerosis in patients without conventional modifiable risk factors: JACC State-of-the-Art Review. J Am Coll Cardiol. 2023;82:1343–1359. doi: 10.1016/j.jacc.2023.06.045.
  • 8. Gorgels A.P., Vos M.A., Mulleneers R., de Zwaan C., Bär F.W., Wellens H.J. Value of the electrocardiogram in diagnosing the number of severely narrowed coronary arteries in rest angina pectoris. Am J Cardiol. 1993;72:999–1003. doi: 10.1016/0002-9149(93)90852-4.
  • 9. Khurshid S., Friedman S., Reeder C., et al. ECG-based deep learning and clinical risk factors to predict atrial fibrillation. Circulation. 2022;145:122–133. doi: 10.1161/CIRCULATIONAHA.121.057480.
  • 10. Attia Z.I., Kapa S., Lopez-Jimenez F., et al. Screening for cardiac contractile dysfunction using an artificial intelligence–enabled electrocardiogram. Nat Med. 2019;25:70–74. doi: 10.1038/s41591-018-0240-2.
  • 11. Singh P., Haimovich J., Reeder C., et al. One clinician is all you need–cardiac magnetic resonance imaging measurement extraction: deep learning algorithm development. JMIR Med Inform. 2022;10 doi: 10.2196/38178.
  • 12. Khurshid S., Reeder C., Harrington L.X., et al. Cohort design and natural language processing to reduce bias in electronic health records research. NPJ Digit Med. 2022;5:47. doi: 10.1038/s41746-022-00590-0.
  • 13. Bycroft C., Freeman C., Petkova D., et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z.
  • 14. Yadlowsky S., Hayward R.A., Sussman J.B., McClelland R.L., Min Y.-I., Basu S. Clinical implications of revised pooled cohort equations for estimating atherosclerotic cardiovascular disease risk. Ann Intern Med. 2018;169:20–29. doi: 10.7326/M17-3011.
  • 15. Goff D.C., Jr., Lloyd-Jones D.M., Bennett G., et al. American College of Cardiology/American Heart Association Task Force on Practice Guidelines. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on practice guidelines. J Am Coll Cardiol. 2014;63(25_Part_B):2935–2959. doi: 10.1016/j.jacc.2013.11.005.
  • 16. Wu P., Gifford A., Meng X., et al. Mapping ICD-10 and ICD-10-CM codes to phecodes: workflow development and initial evaluation. JMIR Med Inform. 2019;7 doi: 10.2196/14325.
  • 17. Hughes J.W., Tooley J., Torres Soto J., et al. A deep learning-based electrocardiogram risk score for long term cardiovascular death and disease. NPJ Digit Med. 2023;6:169. doi: 10.1038/s41746-023-00916-6.
  • 18. Awasthi S., Sachdeva N., Gupta Y., et al. Identification and risk stratification of coronary disease by artificial intelligence-enabled ECG. EClinicalMedicine. 2023;65 doi: 10.1016/j.eclinm.2023.102259.
  • 19. Sinno M.C.N., Kowalski M., Kenigsberg D.N., Krishnan S.C., Khanal S. R-wave amplitude changes measured by electrocardiography during early transmural ischemia. J Electrocardiol. 2008;41:425–430. doi: 10.1016/j.jelectrocard.2007.12.008.
  • 20. Byrne P., Cullinan J., Smith S.M. Statins for primary prevention of cardiovascular disease. BMJ. 2019;367 doi: 10.1136/bmj.l5674.

Articles from JACC: Advances are provided here courtesy of Elsevier
