Artificial Intelligence–Enhanced Electrocardiogram Models for Detection of Left Ventricular Dysfunction: A Comparison Study

Philip M Croon; Machteld J Boonstra; Cornelis P Allaart; Bauke KO Arends; Lovedeep S Dhingra; Yu-Chang Huang; Thomas Mast; Rohan Khera; Chang-Fu Kuo; Joon-Myoung Kwon; Hak Seung Lee; Min Sung Lee; Rutger R van de Leur; Zhi-Yong Liu; Evangelos K Oikonomou; Jasper L Selder; Michiel M Winter; Folkert W Asselbergs

doi:10.1016/j.jacadv.2025.102572

2026 Jan 20;5(2):102572.


PMCID: PMC12856472 PMID: 41564731

Abstract

Background

Several artificial intelligence–enhanced electrocardiogram (AI-ECG) models have shown promise in detecting left ventricular systolic dysfunction (LVSD), but their head-to-head agreement and performance have not been independently compared within the same cohort.

Objectives

This study aimed to compare the performance of published AI-ECG models for LVSD detection in a standardized external cohort and evaluate the field’s transparency and reproducibility.

Methods

We systematically reviewed AI-ECG models predicting LVSD and assessed the risk of bias. Authors were invited to share models for external validation in a well-phenotyped registry of patients undergoing routine clinical cardiac magnetic resonance imaging with cardiologist-adjudicated reports and paired ECGs. Model performance was evaluated in all consecutive patients and a lower-complexity subgroup with 15% LVSD prevalence.

Results

We identified 35 studies describing 51 models, reporting high (area under the receiver-operating characteristic curve [AUROC] >0.80) or excellent (AUROC >0.90) performance. The risk of bias was high, primarily because of the limited description of development and validation cohort characteristics and the lack of independent external validation. Four groups (from Korea, the United States, Taiwan, and the Netherlands) shared models for independent testing. AUROCs ranged from 0.83 to 0.93 in all patients (n = 1,203; mean age 59 ± 15 years; 431 [36%] female) and from 0.87 to 0.96 in the lower-complexity subset. Performance remained consistent across subgroups, with slight decreases in ECGs showing wide QRS complexes or atrial fibrillation.

Conclusions

In this first-of-its-kind independent validation and head-to-head comparison study, AI-ECG models for LVSD detection demonstrated strong performance despite having been trained on disparate populations. However, the limited availability of models hinders independent validation.

Key words: artificial intelligence, deep learning, electrocardiography, heart failure, left ventricular dysfunction


Recent advances in artificial intelligence (AI) have enabled the detection of subtle abnormalities in electrocardiograms (ECGs) that often elude expert interpretation.¹,² This expands the ECG’s utility to the detection of conditions that typically require advanced imaging.³⁻⁷ Given its low cost and widespread availability, AI-enhanced ECG (AI-ECG) models show great promise for large-scale screening and early detection of cardiovascular disease, even in low-resource settings.⁸ While the early results of these models are encouraging, clinical adoption remains limited.

A major obstacle to the widespread adoption of AI-ECG is the uncertainty surrounding its performance in real-world settings across diverse populations.⁹ To evaluate the generalizability of these models beyond their training population and to identify potential biases introduced by the training population, external validation is crucial.¹⁰,¹¹ However, most AI-ECG models are not publicly available.¹⁰ As a result, external validation is often conducted by the original developers, raising concerns about biases such as selective reporting and overfitting to specific populations.¹⁰,¹²

Among the various applications of AI-ECG, potentially one of the most clinically impactful and commonly studied is the detection of left ventricular systolic dysfunction (LVSD).²,¹³,¹⁴ Early and accurate detection of LVSD can significantly improve patient outcomes by enabling timely intervention and management.¹⁵⁻¹⁷ However, LVSD is often diagnosed at an advanced disease stage, which delays the timely initiation of therapy.¹⁸ AI-ECG models may provide a straightforward tool for identifying patients at risk for LVSD who require further clinical evaluation.¹⁹

We sought to systematically compare the performance of AI-ECG models for LVSD detection in a standardized external cohort, while assessing the transparency and reproducibility of the field of AI-ECG (Central Illustration). We identified all AI-ECG models through a systematic review and evaluated their internal and external performance, as well as the risk of bias of the identified models. We invited all authors to share their models for the first independent external validation and head-to-head comparison of AI-ECG for LVSD detection.

Central Illustration — **Artificial Intelligence–Enhanced Electrocardiogram Models for Detection of Left Ventricular Dysfunction - A Comparison Study**

We propose a structured framework for the independent validation of AI-enhanced ECG models. Of the 51 published AI-enhanced ECG models for left ventricular systolic dysfunction, only 4 were available for true external validation. These models were evaluated in an independent cohort with paired electrocardiograms and cardiac magnetic resonance imaging as the reference standard. Despite differences in architecture and training cohorts, model performance remained high in this external setting. However, limited openness and accessibility of AI-enhanced ECG models constrain independent assessment and may impede broader clinical adoption. AI-ECG = AI-enhanced ECG; AUROC = area under the receiver-operating characteristic curve; CGMH = Chang Gung Memorial Hospital; CMR = cardiac magnetic resonance; ECG = electrocardiogram; LVSD = left ventricular systolic dysfunction.

Methods

The institutional review board (Medisch Etische Toetsingscommissie Amsterdam UMC) waived the requirement for informed consent since the study involved secondary analysis of existing data.

Systematic review of AI-ECG for LVSD

To identify all available AI-ECG models for detecting LVSD, we conducted a systematic review of the literature. This systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines (Supplemental Table 1).²⁰ Screening and data extraction were performed by 2 independent reviewers (PMC, MJB), and disagreements were resolved through discussion. The comprehensive methods for the systematic review are presented in the Supplemental Appendix.

External validation cohort

Data for this study were obtained from a prospectively maintained registry that includes all consecutive patients who underwent cardiac magnetic resonance (CMR) imaging as part of routine clinical care, the Adam CMR database (Amsterdam University Medical Center [UMC], The Netherlands). The database contains over 130 variables, including patient demographics, clinical characteristics, and detailed quantitative CMR image measurements. All CMR scans were conducted between 2015 and 2022, and the reports were reviewed by expert cardiologists. From all patients in the Adam CMR database, clinically obtained 12-lead ECGs (Philips PageWriter or GE MUSE) were collected. All ECGs recorded within a 45-day window, covering 15 days before and 30 days after CMR, were included in the external validation cohort. Left ventricular ejection fraction (LVEF) was documented as a continuous variable for all patients and used to define the LVEF threshold labels.
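The ECG-CMR pairing rule can be sketched as a simple date filter. This is a minimal illustration, assuming inclusive window boundaries and offsets counted in calendar days; the function names (`ecgs_in_window`, `label_lvsd`) are hypothetical, and the <40% cut-off mirrors the LVSD definition used by the shared models rather than a documented step of the registry pipeline.

```python
from datetime import date

# Pairing window from the Methods: 15 days before to 30 days after the CMR scan.
DAYS_BEFORE_CMR = 15
DAYS_AFTER_CMR = 30


def ecgs_in_window(cmr_date, ecg_dates, before=DAYS_BEFORE_CMR, after=DAYS_AFTER_CMR):
    """Return ECG dates within [cmr_date - before, cmr_date + after] (inclusive, assumed)."""
    selected = []
    for ecg_date in ecg_dates:
        offset = (ecg_date - cmr_date).days  # negative = ECG recorded before the CMR
        if -before <= offset <= after:
            selected.append(ecg_date)
    return selected


def label_lvsd(lvef_percent, threshold=40.0):
    """Binary LVSD label from continuous CMR-derived LVEF (1 if LVEF < threshold)."""
    return int(lvef_percent < threshold)


cmr = date(2020, 3, 1)
ecgs = [date(2020, 2, 10), date(2020, 2, 20), date(2020, 3, 25), date(2020, 4, 5)]
print(ecgs_in_window(cmr, ecgs))
```

In this toy example, only the ECGs recorded 10 days before and 24 days after the scan fall inside the window.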

To assess the models’ behavior in a lower-complexity scenario, such as the cardiology outpatient clinic or emergency department, we created a subcohort referred to as the representative heart failure (HF) cohort. This cohort was constructed by stratified resampling from the full CMR cohort to include 15% of patients with LVEF <40% and 10% with LVEF between 40% and 50%. Although this cohort is a subset of the CMR cohort and not a real clinical cohort, these proportions are in accordance with previously published reports of LVSD prevalence in these settings.²¹,²² Among the remaining controls, CMR findings were either normal or limited to isolated aortic pathology or coronary artery disease, limiting the number of complex cardiomyopathies with a normal LVEF. This stratification served as a sensitivity analysis of model performance in a less complex setting, closer to the point of care where AI-ECG models are most likely to be applied. Additionally, because outcome prevalence can significantly affect model performance metrics, this design allows assessment of the influence of the LVEF distribution on discriminative performance. The use of CMR, known for its high precision in measuring LVEF, ensures robust ground truth labels, making it a particularly suitable modality for AI-ECG evaluation.
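A minimal sketch of how such a representative HF cohort could be drawn, assuming per-stratum sampling without replacement and the per-iteration counts reported in the Results (84, 50, and 420 patients). The `eligible_control` flag, the restriction of that flag to the LVEF >50% stratum, and the assignment of LVEF values of exactly 40% or 50% to the intermediate stratum are all assumptions for illustration, not details of the original pipeline.

```python
import random

# Per-iteration composition reported in the Results (554 patients in total).
TARGET_COUNTS = {"lvef_lt40": 84, "lvef_40_50": 50, "lvef_gt50": 420}


def lvef_stratum(lvef):
    """Assign a patient to an LVEF stratum (boundary handling is an assumption)."""
    if lvef < 40:
        return "lvef_lt40"
    if lvef <= 50:
        return "lvef_40_50"
    return "lvef_gt50"


def sample_representative_cohort(patients, target_counts=TARGET_COUNTS, rng=None):
    """Draw a distinct (without-replacement) subset matching the target counts.

    `patients` is a list of (patient_id, lvef, eligible_control) tuples, where the
    hypothetical `eligible_control` flag marks controls whose CMR was normal or
    showed only isolated aortic pathology or coronary artery disease.
    """
    rng = rng or random.Random()
    pools = {stratum: [] for stratum in target_counts}
    for patient_id, lvef, eligible_control in patients:
        stratum = lvef_stratum(lvef)
        # Keep only lower-complexity controls in the LVEF >50% stratum.
        if stratum == "lvef_gt50" and not eligible_control:
            continue
        pools[stratum].append(patient_id)
    cohort = []
    for stratum, n in target_counts.items():
        # random.Random.sample raises ValueError if a pool is smaller than n.
        cohort.extend(rng.sample(pools[stratum], n))
    return cohort
```

Calling the function once per bootstrap iteration with a fresh random state yields a new distinct subset each time while keeping the LVEF distribution fixed.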

External validation of included LVSD models

To gain access to existing AI-ECG models, we contacted all the corresponding authors of identified AI-ECG for LVSD publications. Initial emails were sent on July 16, 2024, outlining our request for independent external validation of their models. Specifically, we asked that the developed AI-ECG models for LVSD detection be shared for local deployment at Amsterdam UMC. If no response was received, a follow-up email was sent 2 weeks later to increase the likelihood of engagement and provide authors with a fair opportunity to share their models for independent validation. The email templates for the initial invitation and follow-up are provided in the Supplemental Methods. For all responding groups, meetings were held to discuss local technical deployment and concerns regarding data and model privacy. After these meetings, the research groups prepared their models for local deployment at Amsterdam UMC.

Model performance assessment

All shared models were applied to the ECGs in the CMR cohort to generate predicted probabilities of LVSD ranging from 0 to 1. The models were delivered as modules, either containerized using Docker or as repositories containing the entire pipeline and model weights. Each module ingested raw signals, as all signal-processing steps were integrated within the module. Classification thresholds were those provided by the original developers. The number of ECGs for which each model failed to produce a valid prediction, for example, due to noise, was recorded and reported. To ensure a fair comparison across models, sensitivity analyses were conducted using only the subset of ECGs that returned valid predictions for all models. Results were stratified by sex (male and female), age (<60 and ≥60 years), QRS duration (<100 ms and ≥100 ms), heart rate (≤80 beats/min and >80 beats/min), atrial fibrillation status (present or absent), and clinical diagnosis related to the CMR, including coronary artery disease, ischemia, dilated cardiomyopathy, hypertrophic cardiomyopathy, myocarditis, and other cardiomyopathies. Additionally, false positive rates were calculated separately for patients with LVEF 40% to 50% and those with LVEF >50%.
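The threshold-based metrics and the false positive rates by LVEF band can be illustrated as follows. This sketch assumes that a probability at or above the developer-provided threshold counts as a positive prediction; the helper names are hypothetical and undefined ratios are returned as NaN.

```python
def confusion_metrics(y_true, y_prob, threshold):
    """Sensitivity, specificity, PPV and NPV at a fixed probability threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = prob >= threshold  # assumption: ">=" counts as positive
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def false_positive_rate_by_band(lvef, y_prob, threshold):
    """Share of patients without LVSD flagged positive, split by LVEF band."""
    bands = {"40-50%": [], ">50%": []}
    for ef, prob in zip(lvef, y_prob):
        if 40 <= ef <= 50:
            bands["40-50%"].append(prob >= threshold)
        elif ef > 50:
            bands[">50%"].append(prob >= threshold)
    return {
        band: (sum(flags) / len(flags) if flags else float("nan"))
        for band, flags in bands.items()
    }
```

Because patients with LVEF 40% to 50% are non-cases under the <40% definition, any positive prediction in that band counts as a false positive.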

Sensitivity analyses

To determine whether recalibrating the thresholds could improve model performance in the Amsterdam UMC CMR data set, we re-derived each model’s probability threshold in this data set by maximizing Youden’s index. To assess the similarity of the outputs produced by the individual models, we calculated pairwise correlation coefficients between their predicted probabilities.
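Threshold recalibration with Youden’s index (J = sensitivity + specificity − 1) can be sketched by scanning candidate thresholds. This illustration assumes that the observed probabilities serve as candidate cut-offs and that ties in J are resolved by keeping the lowest threshold; it is not necessarily how the original analysis selected the threshold.

```python
def youden_threshold(y_true, y_prob):
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    Candidate thresholds are the observed probabilities; a sample is called
    positive when its probability is >= the threshold. Ties keep the lowest one.
    """
    positives = sum(1 for y in y_true if y)
    negatives = len(y_true) - positives
    best_threshold, best_j = None, -1.0
    for candidate in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y and p >= candidate)
        tn = sum(1 for y, p in zip(y_true, y_prob) if not y and p < candidate)
        j = tp / positives + tn / negatives - 1
        if j > best_j:
            best_threshold, best_j = candidate, j
    return best_threshold, best_j
```

The quadratic scan keeps the logic transparent; for thousands of ECGs, a sorted single pass over the ROC curve would be more efficient.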

Statistical analysis

The characteristics of the study population were summarized using mean ± SD for continuous variables and counts and percentages for categorical variables. Model performance was evaluated by calculating the area under the receiver-operating characteristic curve (AUROC), sensitivity, specificity, negative predictive value, and positive predictive value, with 95% CIs derived from 500 bootstrap samples. For the representative HF cohort, each bootstrap iteration drew a new random subset of distinct patients from the full cohort that matched the predefined LVEF distribution. Calibration was evaluated by comparing predicted and observed risks across deciles of predicted probability. Pairwise correlations between model predictions were estimated using Spearman’s rank correlation coefficients. Analyses were conducted in Python 3.11.
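The bootstrap procedure can be sketched in plain Python. This is an illustration under stated assumptions: AUROC is computed with the Mann-Whitney formulation, patients are resampled with replacement, one ECG is drawn at random per sampled patient, and the 95% CI uses the 2.5th and 97.5th percentiles of the bootstrap distribution. The text does not specify the CI method or the point estimate, so the percentile interval and the bootstrap median are assumptions here.

```python
import random
import statistics


def auroc(y_true, y_score):
    """AUROC as the probability that a random case scores above a random control.

    Ties between a case and a control count as 0.5 (Mann-Whitney formulation).
    """
    cases = [s for y, s in zip(y_true, y_score) if y]
    controls = [s for y, s in zip(y_true, y_score) if not y]
    wins = 0.0
    for case in cases:
        for control in controls:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def bootstrap_auroc_ci(probs_by_patient, labels, n_boot=500, seed=0):
    """Bootstrap median AUROC and percentile 95% CI.

    probs_by_patient: dict patient_id -> list of model probabilities (one per ECG)
    labels: dict patient_id -> 0/1 LVSD label from CMR
    """
    if len(set(labels.values())) < 2:
        raise ValueError("both LVSD and non-LVSD patients are required")
    rng = random.Random(seed)
    patient_ids = list(probs_by_patient)
    estimates = []
    while len(estimates) < n_boot:
        # Resample patients with replacement (assumption), then keep one
        # randomly chosen ECG per sampled patient.
        sample = rng.choices(patient_ids, k=len(patient_ids))
        y = [labels[pid] for pid in sample]
        if len(set(y)) < 2:
            continue  # AUROC is undefined if a resample contains one class only
        scores = [rng.choice(probs_by_patient[pid]) for pid in sample]
        estimates.append(auroc(y, scores))
    cut_points = statistics.quantiles(estimates, n=40)  # 2.5%, 5%, ..., 97.5%
    return statistics.median(estimates), (cut_points[0], cut_points[-1])
```

Drawing one ECG per patient in every iteration avoids weighting patients with many ECGs more heavily, while still letting the CI reflect ECG-to-ECG variability.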

Results

Identification and review of AI-ECG models

A systematic review of the literature identified 35 eligible studies describing 51 unique AI-ECG models for LVSD (Table 1). The full results of the review are presented in the Supplemental Results, Supplemental Figure 1, and Supplemental Tables 4 to 7. Most models were convolutional neural networks trained on ECG signal data, with LVSD mainly defined as LVEF <40% on echocardiography. Among these articles, 9 (26%) reported internal validation only, 11 (31%) included both internal and external validation, and 15 (43%) involved external validation only. All external validation efforts were carried out by the model developers.

Table 1.

Overview of all Included Articles

First Author	Year of Publication	Outcome Label	Validation
Bergquist et al²³	2023	LVEF <40%	Internal validation
Khunte et al²⁴	2023	LVEF <40%	Internal validation
Vaid et al²⁵	2023	LVEF <40%	Internal and external validation
Honarvar et al²⁶	2022	LVEF <35%	Internal validation
Huang et al¹³	2023	LVEF <40%	Internal and external validation
Chen et al²⁷	2022	LVEF <35%, <50%	Internal and external validation
Golany et al²⁸	2022	LVEF <35%	Internal validation
Yagi et al¹¹	2022	LVEF <40%	Internal and external validation
Lee et al²⁹	2022	LVEF <40%	Internal and external validation
Sangha et al²	2023	LVEF <40%	Internal and external validation
Chen et al³⁰	2022	LVEF <35%	Internal and external validation
Kwon et al³¹	2022	LVEF <40%	Internal and external validation
Surendra et al³²	2023	Asymptomatic LVSD, HFrEF, HFmrEF, HFpEF	Internal validation
Vaid et al³³	2022	LVEF <35%	Internal and external validation
Katsushika et al³⁴	2021	LVEF <40%	Internal validation
Cho et al³⁵	2021	LVEF <40%	Internal and external validation
Choi et al³⁶	2022	LVEF <40%	External validation model, Cho et al
Sun et al³⁷	2021	LVEF <50%	Internal validation
Attia et al⁶	2019	LVEF <35%	Internal validation
Attia et al³⁸	2022	LVEF <40%	External validation model, Attia et al
Harmon et al³⁹	2022	LVEF <50%, <40%	External validation model, Attia et al
Harmon et al⁴⁰	2022	LVEF <40%	External validation model, Attia et al
Attia et al⁴¹	2022	LVEF <35%, <40%, <50%	External validation model, Attia et al
Attia et al⁴²	2021	LVEF <35%	External validation model, Attia et al
Attia et al⁴³	2019	LVEF <35%	External validation model, Attia et al
Klein et al⁴⁴	2022	LVEF <35%	External validation model, Attia et al
Bachtiger et al⁴⁵	2021	LVEF <40%	External validation model, Attia et al
Kashou et al⁴⁶	2021	LVEF <35%	External validation model, Attia et al
Brito et al⁴⁷	2021	LVEF <40%	External validation model, Attia et al
Kashou et al⁴⁸	2021	LVEF 35%	External validation model, Attia et al
Jentzer et al⁴⁹	2021	LVEF <35%	External validation model, Attia et al
Adedinsewo et al⁵⁰	2020	LVEF <50%	External validation model, Attia et al
Noseworthy et al⁵¹	2020	LVEF <35%	External validation model, Attia et al
van de Leur et al⁵²	2022	LVEF <40%	Internal and external validation
Vaid et al⁵³	2022	LVEF <40%	Internal validation


HFmrEF = heart failure with mildly reduced ejection fraction; HFpEF = heart failure with preserved ejection fraction; HFrEF = heart failure with reduced ejection fraction; LVEF = left ventricular ejection fraction; LVSD = left ventricular systolic dysfunction.

Risk of bias of AI-ECG models

The description of baseline characteristics varied, with multiple studies not providing detailed cohort characteristics for the development and external validation cohorts (Figure 1A). In 12 of the 20 articles describing model derivation, external validation was reported either within the same article or in a separate article (Supplemental Table 6). In internal validation, performance was excellent in 29 (57%) of the models, good in 19 (37%), moderate in 2 (4%), and poor in 1 (2%) (Supplemental Table 6). Across the 62 reported external validations, performance was excellent in 31 (50%), good in 28 (45%), and moderate in 3 (5%), with none being poor (Figure 1B, Supplemental Table 7). Of the articles reporting on the derivation of their models, 9 stratified their results to identify potential bias toward subgroups based on age, sex, ethnicity, comorbidity, or ECG characteristics (Supplemental Table 6).
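The performance categories and the internal-minus-external AUROC difference shown in Figure 1B can be expressed compactly. The >0.80 and >0.90 cut-offs follow the abstract’s definitions of high (here, good) and excellent performance; the 0.70 cut-off separating moderate from poor performance is an assumption for illustration, as are the helper names.

```python
from collections import Counter


def performance_category(auc):
    """Map an AUROC to a performance category (the 0.70 cut-off is assumed)."""
    if auc > 0.90:
        return "excellent"
    if auc > 0.80:
        return "good"
    if auc > 0.70:  # assumed boundary between moderate and poor
        return "moderate"
    return "poor"


def internal_minus_external(internal_auc, external_auc):
    """Positive values mean the model performed worse on external data."""
    return round(internal_auc - external_auc, 3)


def tally(aucs):
    """Count AUROCs per category, with each count as a rounded percentage."""
    counts = Counter(performance_category(a) for a in aucs)
    return {cat: (n, round(100 * n / len(aucs))) for cat, n in counts.items()}
```

For example, the CGMH model’s reported internal AUROC of 0.94 and external AUROC of 0.95 give a difference of −0.01, that is, slightly better external performance.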

Figure 1. **Reported Baseline Characteristics, Internal and External Performance for Included Models**

(A) Heatmaps of baseline characteristics reported in derivation (top) and external validation (bottom) studies. Each row represents a study, and each column indicates a specific characteristic (eg, age, sex, comorbidities). Green indicates that the characteristic was reported; red indicates that it was not. (B) Difference between internal and external validation area under the receiver-operating characteristic curve across models. Purple diamonds represent external validation reported in the original paper; red diamonds represent external validation conducted in separate papers. All values are referenced to the internal area under the receiver-operating characteristic curve; a value above zero therefore indicates that the internal area under the receiver-operating characteristic curve was higher than the external one. AUROC = area under the receiver-operating characteristic curve; BWH = Brigham and Women's Hospital; CNN = convolutional neural network; ECG = electrocardiogram; MGH = Massachusetts General Hospital; UCSF = University of California San Francisco; VAE = variational autoencoder.

According to the Prediction model Risk Of Bias ASsessment Tool (PROBAST), most derivation studies were rated as having high or unclear risk of bias, mainly due to a lack of external validation or inadequate cohort descriptions (Supplemental Figures 2 and 3). Concerns about applicability were infrequent (Supplemental Figures 2 and 3).

Model availability

None of the reported models and/or weights were publicly available for independent external validation. After contacting all corresponding authors of articles describing the development of AI-ECG models for LVSD detection, 6 research groups responded to our inquiries. Of these, 4 research groups agreed to participate in the external validation study by sharing their models for external validation. The models originated from Korea, the United States, Taiwan, and the Netherlands.

The research group identified through Cho et al.³⁵ shared an algorithm called AiTiALVSD v2.00.00, which is based on a transformer architecture and convolutional encoders. It was trained on 498,726 ECGs from 186,889 patients across 16 tertiary hospitals in Korea, using an 80%/10%/10% split. Labels were defined as LVEF <40% on transthoracic echocardiography (TTE) within 30 days of the included ECG. External validation performance of this version of the model has not yet been published.

Sangha et al.² introduced an AI-ECG model called ECG Vision. This model used a pretrained EfficientNet-B3, fine-tuned on a data set of 116,210 patients from Yale New Haven Hospital in the United States, with an 85%/5%/10% split for training, validation, and testing. The model takes 300 × 300 ECG images as input, either directly or plotted from raw signals. LVSD was defined as LVEF <40% based on TTE performed within 15 days of the ECG. The model achieved AUROCs of 0.90 to 0.95 across 6 external validation sites (Supplemental Tables 6 and 7).

Huang et al.¹³ shared a ResNet-based model, hereafter referred to as the CGMH model, trained on signal data from 380,675 patients at Chang Gung Memorial Hospital (CGMH) in Taiwan, using a 35%/15%/50% split for training, validation, and testing. LVSD was defined as LVEF <40% on TTE performed within 2 weeks of the ECG. External validation was performed on data from another hospital in Taiwan; the model achieved an AUROC of 0.94 in internal validation and 0.95 in external validation (Supplemental Tables 6 and 7).
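The splits reported above can be illustrated with a patient-level partition, which keeps all ECGs from one patient within a single set. Whether each group split at the patient level is not stated here, so this sketch, including the hypothetical `patient_level_split` helper, is an assumption rather than a description of any group’s pipeline.

```python
import random


def patient_level_split(patient_ids, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split unique patients into train/validation/test sets.

    `patient_ids` may contain repeats (one entry per ECG); splitting on the set of
    unique patients prevents ECGs from one patient leaking across partitions.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n_train = round(fractions[0] * len(ids))
    n_val = round(fractions[1] * len(ids))
    # The test set receives the remainder so that no patient is dropped.
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]
```

The same function covers the other reported splits by changing `fractions`, for example `(0.35, 0.15, 0.5)` for the CGMH proportions.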

The research group identified through the publication by van de Leur et al.⁵² shared an unpublished ResNet-based model, hereafter referred to as the Utrecht model, trained on median-beat ECGs from 57,250 patients across 2 hospitals in the Netherlands. They used a 90%/10% split for training and internal validation and evaluated the model on a separate internal cohort of newly referred cardiology outpatients. This model differs from the one in the original publication in architecture, training, and development cohort. The pipeline applied a custom algorithm to construct median-beat ECGs. Because median-beat reconstruction requires consistent, noise-free beats, this preprocessing step was more sensitive to signal noise and led to rejection of 6% of ECGs. LVSD was defined as LVEF <40% on TTE performed within 90 days of the ECG.
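The Utrecht pipeline’s median-beat algorithm is custom and not described here. As a generic illustration only, a median beat can be formed by aligning fixed windows around previously detected R peaks and taking the sample-wise median. The window lengths, the minimum beat count used as a rejection rule, and the assumption that R peaks are already available are all hypothetical.

```python
import statistics


def median_beat(signal, r_peaks, pre=100, post=150, min_beats=3):
    """Build a median beat for one lead by aligning windows around R peaks.

    `pre`/`post` are window lengths in samples before and after each R peak;
    beats whose window would run off the recording are skipped. Returns None
    (ECG rejected) when fewer than `min_beats` complete beats are available.
    """
    windows = [
        signal[r - pre:r + post]
        for r in r_peaks
        if r - pre >= 0 and r + post <= len(signal)
    ]
    if len(windows) < min_beats:
        return None
    # The sample-wise median across aligned beats suppresses isolated artifacts.
    return [statistics.median(values) for values in zip(*windows)]
```

The rejection branch hints at why such a step is noise-sensitive: recordings with too few usable beats yield no median beat, and the ECG is then excluded from prediction.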

External validation of AI-ECG models for LVSD detection

Independent external validation was conducted using the full CMR cohort, which included 1,203 patients and 4,737 corresponding ECGs, with one randomly selected ECG per individual in each bootstrap iteration. The mean age of the cohort was 59 ± 15 years, and 431 (36%) were female. Based on CMR measurements, LVEF was <40% in 212 (18%) patients, between 40% and 50% in 231 (19%) patients, and >50% in the remaining 760 (63%) patients (Table 2). The subset of ECGs for which all models provided valid predictions comprised 94% of all ECGs. For the representative HF cohort, 554 patients were included in each bootstrap iteration: 84 (15%) with LVEF <40%, 50 (9%) with LVEF between 40% and 50%, and 420 (76%) with LVEF >50%.

Table 2.

Baseline Characteristics of the Amsterdam UMC CMR Cohort

Number of ECGs	4,737
Number of unique patients	1,203
Age, years	59.5 ± 14.9
Female	431 (35.9)
LVEF 40%-50%	231 (19.3)
LVEF <40%	212 (17.7)
Coronary artery disease	328 (27.6)
Cardiomyopathy other	147 (12.4)
Ischemia	87 (7.3)
Dilated cardiomyopathy	83 (7.0)
Aortic pathology	49 (4.1)
Hypertrophic cardiomyopathy	63 (5.3)


Values are n, mean ± SD, or n (%). Diagnoses are based on the conclusions of the cardiac magnetic resonance imaging study reported by an expert cardiologist.

CMR = cardiac magnetic resonance; ECG = electrocardiogram; n = number of samples; other abbreviations as in Table 1.

Performance of AI-ECG models in detecting left ventricular dysfunction

AiTiALVSD achieved an AUROC of 0.94 (95% CI: 0.93-0.94) in the full CMR cohort and 0.96 (95% CI: 0.94-0.97) in the representative HF cohort (Figure 2A, Supplemental Tables 8 and 9). ECG Vision achieved an AUROC of 0.89 (95% CI: 0.88-0.90) in the full CMR cohort and 0.92 (95% CI: 0.89-0.94) in the representative HF cohort. The AUROC for the CGMH model was 0.90 (95% CI: 0.89-0.91) in the full CMR cohort and 0.94 (95% CI: 0.92-0.95) in the representative HF cohort, while the Utrecht model achieved AUROCs of 0.83 (95% CI: 0.82-0.84) and 0.86 (95% CI: 0.83-0.89) in these cohorts, respectively. Sensitivity analyses restricted to ECGs with valid predictions across all models showed no meaningful change in model performance (Supplemental Table 10).

Figure 2. **Performance and Calibration of Included Models for External Validation**

Panel A: Forest plot showing the independent external validation results of 4 models (Cho et al., Sangha et al., Huang et al., and van de Leur et al.) in the full cardiac magnetic resonance cohort and the representative heart failure cohort, with area under the receiver-operating characteristic curve values and their 95% CIs. Panel B: Calibration plot for the external validation, showing mean predicted probabilities vs observed proportions using quantile binning (10 bins). The “Perfect Calibration” line indicates ideal model calibration. AUROC = area under the receiver-operating characteristic curve; CGMH = Chang Gung Memorial Hospital; CMR = cardiac magnetic resonance; ECG = electrocardiogram; HF = heart failure; NPV = negative predictive value; PPV = positive predictive value; Sens = sensitivity; Spec = specificity.

AiTiALVSD and ECG Vision slightly underestimated the observed risk, whereas the CGMH and Utrecht models slightly overestimated it (Figure 2B). Recalibrating the thresholds changed sensitivity and specificity only slightly: in 3 models, sensitivity improved slightly at the cost of a minor decrease in specificity, whereas in 1 model, specificity increased slightly with a corresponding drop in sensitivity (Supplemental Table 11).
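Calibration by quantile binning, as in Figure 2B, can be sketched as follows; this assumes equal-count bins formed after sorting by predicted probability, and the function name is hypothetical. Bins where the observed proportion exceeds the mean predicted probability indicate underestimation of risk, and the reverse indicates overestimation.

```python
def calibration_by_quantile_bins(y_true, y_prob, n_bins=10):
    """Return (mean predicted probability, observed event rate) per quantile bin.

    Samples are sorted by predicted probability and split into `n_bins`
    groups of (nearly) equal size.
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    bins = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # possible when there are fewer samples than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        bins.append((mean_pred, observed))
    return bins
```

Plotting the returned pairs against the identity line reproduces the layout of a standard calibration plot.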

Stratification by age and sex did not reveal any potential bias (Figure 3, Supplemental Tables 8 and 9). All 4 models showed slightly reduced performance in ECGs with a QRS duration ≥100 ms and in patients with atrial fibrillation. AUROCs were also generally lower in subgroups with cardiomyopathy. Despite these differences, the models demonstrated good to excellent performance across all subgroups (Figure 3). False positive rates ranged from 53% to 65% in patients with an LVEF between 40% and 50% and from 12% to 39% in patients with an LVEF >50% (Supplemental Table 12).
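A subgroup AUROC analysis of this kind can be sketched by grouping records and computing the AUROC within each group. The record fields (`lvsd`, `prob`, `qrs_ms`) and helper names are hypothetical, and subgroups containing only one outcome class return `None` because the AUROC is undefined there.

```python
from collections import defaultdict


def auroc(y_true, y_score):
    """Mann-Whitney AUROC; case-control ties count as 0.5."""
    cases = [s for y, s in zip(y_true, y_score) if y]
    controls = [s for y, s in zip(y_true, y_score) if not y]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def subgroup_aurocs(records, group_of):
    """AUROC per subgroup; `group_of` maps a record to its subgroup name."""
    grouped = defaultdict(lambda: ([], []))
    for record in records:
        labels, probs = grouped[group_of(record)]
        labels.append(record["lvsd"])
        probs.append(record["prob"])
    return {
        name: auroc(labels, probs) if 0 < sum(labels) < len(labels) else None
        for name, (labels, probs) in grouped.items()
    }


def qrs_group(record):
    """QRS duration strata from the Methods (<100 ms vs >=100 ms)."""
    return "QRS >=100 ms" if record["qrs_ms"] >= 100 else "QRS <100 ms"
```

Other strata, such as atrial fibrillation status or the CMR diagnosis, only require a different `group_of` function.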

Figure 3. **Subgroup Analysis of Area Under the Receiver-Operating Characteristic Curve Values for External Validation of Models**

Performance in the full cardiac magnetic resonance cohort across patient subgroups, including sex, age, QRS duration, heart rate, atrial fibrillation status, coronary artery disease, types of cardiomyopathy, and myocarditis. For each subgroup, the total number of patients and the number with left ventricular systolic dysfunction are listed as n patients [cases EF <40%]. AUROC values are shown with 95% CIs to give insight into model performance within specific populations. AUROC = area under the receiver-operating characteristic curve; CGMH = Chang Gung Memorial Hospital; CMP = cardiomyopathy; CMR = cardiac magnetic resonance; DCM = dilated cardiomyopathy; ECG = electrocardiogram; EF = ejection fraction; HCM = hypertrophic cardiomyopathy.

Correlation between individual models

Pairwise Spearman’s correlation coefficients of the continuous model outputs showed moderate to high correlations across all models (Figure 4). The highest correlation was observed between the CGMH model and AiTiALVSD (ρ = 0.86). Correlations with the Utrecht model were generally lower (ρ = 0.69-0.73).
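Spearman’s rank correlation is Pearson’s correlation applied to ranks, with tied values sharing their average rank. A minimal standard-library sketch of the pairwise comparison follows; the input layout (a dict mapping model names to probabilities for the same, identically ordered ECGs) and the helper names are assumptions.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_spearman(predictions):
    """predictions: dict model name -> probabilities for the same ordered ECGs."""
    names = list(predictions)
    return {
        (a, b): spearman(predictions[a], predictions[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
```

Because Spearman’s coefficient depends only on ranks, it compares how models order patients by risk, independent of differences in their calibration.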

Figure 4. **Correlation Between the Different AI-Enhanced ECG Models**

The heatmap shows the pairwise Spearman correlation coefficients between the model predictions of the different included models. AI-ECG = AI-Enhanced ECG; ECG = electrocardiogram; CGMH = Chang Gung Memorial Hospital.

Discussion

We introduce a framework for the independent external validation of AI-ECG and provide the first systematic, head-to-head comparison of all available AI-ECG models for LVSD detection. We found that the overall risk of bias for AI-ECG remains high, mainly due to incomplete descriptions of training cohorts and an overreliance on developer-led validations. Importantly, of the 51 models we identified, only 4 were accessible for independent testing, highlighting significant gaps in transparency and raising important questions about external validity. When tested on a single, well-characterized data set, these 4 models performed consistently despite having been trained on populations from different continents. Their diagnostic accuracy was, however, sensitive to edge cases, notably individuals with an LVEF between 40% and 50%, in whom false positives were common. High intermodel correlations indicate that AI-ECG models capture similar electrocardiographic signs of LVSD. Furthermore, validation in a more complex cohort attenuated discrimination, underscoring the importance of using representative validation populations.

The strong performance of AI-ECG models for LVSD detection in the literature underscores their potential to screen patients for early diagnosis of LVSD. However, this study reveals significant gaps in the evaluation of AI-ECG models, particularly in assessing potential biases, which may undermine their trustworthiness and hinder clinical adoption. External validation in all reviewed articles was performed solely by the original developers, increasing the risk of selective data inclusion and biased reporting. Moreover, most models do not fully align with the FUTURE-AI guidelines, particularly in terms of transparency, reproducibility, and open access.¹⁰ With the recent Food and Drug Administration approval of AI-ECG models, which signals their potential integration into clinical practice, it is crucial to evaluate the models' performance while minimizing potential bias. This makes it even more concerning that only a small number of AI-ECG models were made available for our external validation study. Selective sharing of models likely introduces systematic bias, as investigators who are more confident in their models may be more willing to share them, creating an inflated and potentially misleading perception of performance at the field level.

The 4 models shared for independent external validation demonstrated consistent performance, with a slight attenuation of the AUROC in the higher-complexity (full CMR) cohort compared with the lower-complexity (representative HF) cohort. Notably, this external performance remained consistent even though 3 of the models were trained on geographically distinct cohorts (Korea, the United States, and Taiwan). Moreover, the continuous probabilities produced by the models were highly correlated, indicating that the ECG features identified by AI are consistent across models trained on cohorts from 3 continents, which supports their generalizability. Additionally, models trained on raw ECG signals or ECG images performed similarly well, in line with recent literature.⁸,⁵⁴⁻⁵⁶ It is important to note that all included AI-ECG models were initially developed and validated using TTE-derived LVEF as the ground truth, whereas our independent validation used CMR, the reference standard for LVEF assessment, known for its superior accuracy and reproducibility. Despite the differences between these modalities, with TTE tending to underestimate LVEF compared with CMR, the models maintained strong performance, highlighting their robustness and potential clinical utility. The slight performance decline observed in the full CMR cohort warrants careful consideration and possibly model recalibration when the models are implemented in highly complex patient populations.

Our findings demonstrate that the characteristics of the validation cohort substantially influence model performance. AUROC differences of up to 0.04 between the low-complexity (representative HF) and high-complexity (CMR referral) populations underscore this effect. Many published models do not clearly describe their cohort composition, which limits their interpretability and applicability.⁵⁷ Performance variation across subgroups, such as patients with wide QRS complexes or atrial fibrillation, further highlights the need for transparent reporting and potential model recalibration.⁵⁸ Notably, most false positives in our study had a CMR-derived LVEF between 40% and 50%, suggesting borderline dysfunction and reflecting the challenge of distinguishing subclinical or overlapping cardiac conditions.
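The per-model gap between the two cohorts can be checked directly from the AUROCs reported in the Results. Because those AUROCs are rounded to 2 decimals, the computed differences are approximate.

```python
# AUROCs as reported in the Results: (full CMR cohort, representative HF cohort).
reported = {
    "AiTiALVSD": (0.94, 0.96),
    "ECG Vision": (0.89, 0.92),
    "CGMH": (0.90, 0.94),
    "Utrecht": (0.83, 0.86),
}

# Positive values: higher discrimination in the lower-complexity cohort.
differences = {name: round(hf - cmr, 2) for name, (cmr, hf) in reported.items()}
print(differences)
```

The differences range from 0.02 (AiTiALVSD) to 0.04 (CGMH model).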

Limitations and future directions

We included 4 models for external validation, all of which showed good or excellent performance. However, this may have introduced a form of publication bias, as research groups with better-performing models may have been more willing to share them. Additionally, some research groups provided updated versions of their models that differ from those in the systematic review, precluding a direct comparison of internal and external performance. Furthermore, although all models were trained to detect LVSD defined as LVEF <40%, heterogeneity in ECG input formats and development pipelines limits the direct equivalence of performance estimates. Another important limitation is that the analysis was restricted to a single Western European cohort of patients referred for CMR. This cohort is enriched for complex cardiomyopathies and therefore represents a more challenging classification task. To mitigate the effect of cohort-specific selection bias, we simulated a population with an LVSD prevalence similar to that in emergency departments or outpatient clinics. Because this is an augmented cohort and not a real clinical cohort, this analysis should be interpreted as a sensitivity analysis. Finally, we used CMR-derived LVSD to define the outcome of interest. Although CMR is considered the reference standard for LVEF assessment, it systematically differs from TTE-derived LVEF, which was used during model development. Thus, even though CMR allows a more accurate evaluation, this modality mismatch may have influenced model performance, particularly in patients with borderline LVEF values.
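
The construction of the augmented, lower-prevalence cohort is described in the online Methods rather than in this section. As a hypothetical sketch only, one common approach is random within-class subsampling to a target prevalence (about 15% here). The function names and the fixed seed are assumptions. The sketch also includes the Bayes relationship that makes prevalence matter clinically: sensitivity and specificity do not depend on prevalence, but positive predictive value (PPV) does.

```python
import numpy as np


def subsample_to_prevalence(y_true, target_prevalence, seed=0):
    """Indices of a random subsample with the requested positive fraction.

    Keeps every patient of one class and randomly drops patients of the
    class that is over-represented relative to the target.
    Requires 0 < target_prevalence < 1."""
    y = np.asarray(y_true, dtype=bool)
    pos_idx, neg_idx = np.flatnonzero(y), np.flatnonzero(~y)
    rng = np.random.default_rng(seed)
    t = target_prevalence
    if pos_idx.size / y.size > t:
        # Too many positives: keep all negatives, sample positives
        n_pos = int(round(t * neg_idx.size / (1 - t)))
        pos_idx = rng.choice(pos_idx, size=n_pos, replace=False)
    else:
        # Too few positives: keep all positives, sample negatives
        n_neg = int(round((1 - t) * pos_idx.size / t))
        neg_idx = rng.choice(neg_idx, size=n_neg, replace=False)
    return np.sort(np.concatenate([pos_idx, neg_idx]))


def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)
```

Random within-class subsampling leaves the class-conditional score distributions unchanged in expectation, so by itself it should not shift the expected AUROC. A different AUROC in a representative cohort would instead reflect its different case mix. Prevalence does change PPV. With an illustrative (not study-derived) sensitivity of 0.90 and specificity of 0.80, `ppv(0.9, 0.8, 0.15)` is about 0.44, whereas at 50% prevalence it would be about 0.82.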

Future research should prioritize collaboration, model sharing, and independent external validation across geographically and demographically diverse data sets. Additionally, standardized metrics and benchmarks for describing data set characteristics will be crucial for ensuring consistency and comparability across the literature. With these combined efforts, AI-ECG models can evolve from promising research tools into clinical decision-support systems that provide real benefits to patients.

Conclusions

AI-ECG models for detecting LVSD demonstrated high discriminative performance in a novel framework for independent external validation. However, the incomplete reporting of development and validation cohort characteristics in many studies raises concerns about potential bias, and the limited availability of models greatly hinders independent validation, thereby impeding wider clinical adoption.

Funding support and author disclosures

This work was funded by UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee EP/Z000211/1. This work received funding from the European Union’s Horizon Europe research and innovation program under Grant Agreement No. 101057849 (DataTools4Heart project) and No. 101080430 (AI4HF project). Dr Croon is supported by the University of Amsterdam Research Priority Agenda Program AI for Health Decision-Making. Dr Asselbergs is supported by UCL Hospitals NIHR Biomedical Research Centre. The Department of Cardiology at UMC Utrecht may receive royalties in the future from sales of deep learning ECG algorithms developed by Cordys Analytics, a spin-off company. Additionally, Dr van de Leur is a shareholder of Cordys Analytics. Dr Oikonomou reported being a cofounder of Evidence2Health; serving as a consultant to Caristo Diagnostics Ltd and Ensight-AI; having stock options in Caristo Diagnostics Ltd; receiving a grant from the National Heart, Lung, and Blood Institute of the National Institutes of Health; and having patents 63/508,315 and 63/177,117 outside the submitted work. Dr Khera reported receiving grants from the National Heart, Lung, and Blood Institute, National Institutes of Health, Doris Duke Charitable Foundation, Bristol Myers Squibb, Novo Nordisk, BridgeBio, and Blavatnik Foundation; being an academic cofounder of Ensight-AI and Evidence2Health; having patents 63/346,610, WO2023230345A1, US20220336048A1, 63/484,426, 63/508,315, 63/580,137, 63/606,203, 63/619,241, and 63/562,335 pending; and serving as associate editor of JAMA outside the submitted work. These affiliations and potential financial interests have been disclosed and are being managed in accordance with institutional policies. All other authors have reported that they have no relationships relevant to the contents of this paper to disclose.

Footnotes

The authors attest they are in compliance with human studies committees and animal welfare regulations of the authors’ institutions and Food and Drug Administration guidelines, including patient consent where appropriate. For more information, visit the Author Center.

Appendix

For an expanded Methods section and supplemental tables and figures, please see the online version of this paper.

Supplementary Material

mmc1.docx (492.9 KB, docx)

References

1. Siontis K.C., Noseworthy P.A., Attia Z.I., Friedman P.A. Artificial intelligence-enhanced electrocardiography in cardiovascular disease management. Nat Rev Cardiol. 2021;18:465–478. doi: 10.1038/s41569-020-00503-2.
2. Sangha V., Nargesi A.A., Dhingra L.S., et al. Detection of left ventricular systolic dysfunction from electrocardiographic images. Circulation. 2023;148:765–777. doi: 10.1161/CIRCULATIONAHA.122.062646.
3. Sangha V., Oikonomou E.K., Khera R. Artificial intelligence applied to electrocardiographic images for scalable screening of transthyretin amyloid cardiomyopathy. Cardiovasc Med. 2024 doi: 10.1101/2024.09.30.24314651.
4. Khunte A., Sangha V., Oikonomou E.K., et al. Automated diagnostic reports from images of electrocardiograms at the point-of-care. medRxiv. 2024 doi: 10.1101/2024.02.17.24302976.
5. Noseworthy P.A., Attia Z.I., Behnken E.M., et al. Artificial intelligence-guided screening for atrial fibrillation using electrocardiogram during sinus rhythm: a prospective non-randomised interventional trial. Lancet. 2022;400:1206–1212. doi: 10.1016/S0140-6736(22)01637-3.
6. Attia Z.I., Kapa S., Lopez-Jimenez F., et al. Screening for cardiac contractile dysfunction using an artificial intelligence-enabled electrocardiogram. Nat Med. 2019;25:70–74. doi: 10.1038/s41591-018-0240-2.
7. Croon P.M., Dhingra L.S., Biswas D., Oikonomou E.K., Khera R. Phenotypic selectivity of artificial intelligence-enhanced electrocardiography in cardiovascular diagnosis and risk prediction. Circulation. 2025;152:1282–1294. doi: 10.1161/circulationaha.125.076279.
8. Croon P.M., Pedroso A.F., Khera R. The emerging role of AI in transforming cardiovascular care. Future Cardiol. 2025;21:547–550. doi: 10.1080/14796678.2025.2492973.
9. Myhre P.L., Grenne B., Asch F.M., et al. Artificial intelligence-enhanced echocardiography in cardiovascular disease management. Nat Rev Cardiol. 2025:1–19. doi: 10.1038/s41569-025-01197-0.
10. Lekadir K., Frangi A.F., Porras A.R., et al. FUTURE-AI: international consensus guideline for trustworthy and deployable artificial intelligence in healthcare. BMJ. 2025;388 doi: 10.1136/bmj-2024-081554.
11. Yagi R., Goto S., Katsumata Y., MacRae C.A., Deo R.C. Importance of external validation and subgroup analysis of artificial intelligence in the detection of low ejection fraction from electrocardiograms. Eur Heart J Digit Health. 2022;3:654–657. doi: 10.1093/ehjdh/ztac065.
12. Rosenbaum L. Reconnecting the dots — reinterpreting industry–physician relations. N Engl J Med. 2015;372:1860–1864. doi: 10.1056/NEJMms1502493.
13. Huang Y.-C., Hsu Y.-C., Liu Z.-Y., et al. Artificial intelligence-enabled electrocardiographic screening for left ventricular systolic dysfunction and mortality risk prediction. Front Cardiovasc Med. 2023;10 doi: 10.3389/fcvm.2023.1070641.
14. Sandeep B., Huang X., Xiao Z. Artificial intelligence in heart failure improving the efficiency or dependency on it? Letter regarding the article “Artificial intelligence and heart failure: a state-of-the-art review.”. Eur J Heart Fail. 2024;26:704. doi: 10.1002/ejhf.3017.
15. Ledwidge M., Gallagher J., Conlon C., et al. Natriuretic peptide-based screening and collaborative care for heart failure: the STOP-HF randomized trial. JAMA. 2013;310:66–74. doi: 10.1001/jama.2013.7588.
16. Anon 2021 ESC guidelines for the diagnosis and treatment of acute and chronic heart failure. https://www.escardio.org/Guidelines/Clinical-Practice-Guidelines/Acute-and-Chronic-Heart-Failure
17. Heidenreich P.A., Bozkurt B., Aguilar D., et al. 2022 AHA/ACC/HFSA guideline for the management of heart failure: a report of the American College of Cardiology/American Heart Association Joint Committee on clinical practice guidelines. Circulation. 2022;145:e895–e1032. doi: 10.1161/CIR.0000000000001063.
18. Clark K.A.A., Reinhardt S.W., Chouairi F., et al. Trends in heart failure hospitalizations in the US from 2008 to 2018. J Card Fail. 2022;28:171–180. doi: 10.1016/j.cardfail.2021.08.020.
19. Dhingra L.S., Aminorroaya A., Sangha V., et al. Heart failure risk stratification using artificial intelligence applied to electrocardiogram images: a multinational study. Eur Heart J. 2025;46:1044–1053. doi: 10.1093/eurheartj/ehae914.
20. Page M.J., McKenzie J.E., Bossuyt P.M., et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372 doi: 10.1136/bmj.n71.
21. Gottlieb M., Schraft E., O’Brien J., Patel D., Peksa G.D. Prevalence of undiagnosed stage B heart failure among emergency department patients. Am J Emerg Med. 2024;85:153–157. doi: 10.1016/j.ajem.2024.09.026.
22. Balderston J.R., Gertz Z.M., Brooks S., Joyce J.M., Evans D.P. Diagnostic yield and accuracy of bedside echocardiography in the emergency department in hemodynamically stable patients. J Ultrasound Med. 2019;38:2845–2851. doi: 10.1002/jum.14985.
23. Bergquist J.A., Zenger B., Brundage J., et al. Performance of off-the-shelf machine learning architectures and biases in low left ventricular ejection fraction detection. Heart Rhythm O2. 2024;5:644–654. doi: 10.1016/j.hroo.2024.07.009.
24. Khunte A., Sangha V., Oikonomou E.K., et al. Detection of left ventricular systolic dysfunction from single-lead electrocardiography adapted for portable and wearable devices. NPJ Digit Med. 2023;6:124. doi: 10.1038/s41746-023-00869-w.
25. Vaid A., Jiang J., Sawant A., et al. A foundational vision transformer improves diagnostic performance for electrocardiograms. NPJ Digit Med. 2023;6:108. doi: 10.1038/s41746-023-00840-9.
26. Honarvar H., Agarwal C., Somani S., et al. Enhancing convolutional neural network predictions of electrocardiograms with left ventricular dysfunction using a novel sub-waveform representation. Cardiovasc Digit Health J. 2022;3:220–231. doi: 10.1016/j.cvdhj.2022.07.074.
27. Chen H.-Y., Lin C.-S., Fang W.-H., et al. Artificial intelligence-enabled electrocardiogram predicted left ventricle diameter as an independent risk factor of long-term cardiovascular outcome in patients with normal ejection fraction. Front Med (Lausanne) 2022;9 doi: 10.3389/fmed.2022.870523.
28. Golany T., Radinsky K., Kofman N., et al. Physicians and machine-learning algorithm performance in predicting left-ventricular systolic dysfunction from a standard 12-lead-electrocardiogram. J Clin Med. 2022;11:6767. doi: 10.3390/jcm11226767.
29. Lee C.-H., Liu W.-T., Lou Y.-S., et al. Artificial intelligence-enabled electrocardiogram screens low left ventricular ejection fraction with a degree of confidence. Digit Health. 2022;8 doi: 10.1177/20552076221143249.
30. Chen H.-Y., Lin C.-S., Fang W.-H., et al. Artificial intelligence-enabled electrocardiography predicts left ventricular dysfunction and future cardiovascular outcomes: a retrospective analysis. J Pers Med. 2022;12:455. doi: 10.3390/jpm12030455.
31. Kwon J.-M., Jo Y.-Y., Lee S.Y., et al. Artificial intelligence-enhanced smartwatch ECG for heart failure-reduced ejection fraction detection by generating 12-lead ECG. Diagnostics (Basel) 2022;12:654. doi: 10.3390/diagnostics12030654.
32. Surendra K., Nürnberg S., Bremer J.P., et al. Pragmatic screening for heart failure in the general population using an electrocardiogram-based neural network. ESC Heart Fail. 2023;10:975–984. doi: 10.1002/ehf2.14263.
33. Vaid A., Johnson K.W., Badgeley M.A., et al. Using deep-learning algorithms to simultaneously identify right and left ventricular dysfunction from the electrocardiogram. JACC Cardiovasc Imaging. 2022;15:395–410. doi: 10.1016/j.jcmg.2021.08.004.
34. Katsushika S., Kodera S., Nakamoto M., et al. The effectiveness of a deep learning model to detect left ventricular systolic dysfunction from electrocardiograms. Int Heart J. 2021;62:1332–1341. doi: 10.1536/ihj.21-407.
35. Cho J., Lee B., Kwon J.-M., et al. Artificial intelligence algorithm for screening heart failure with reduced ejection fraction using electrocardiography. ASAIO J. 2021;67:314–321. doi: 10.1097/MAT.0000000000001218.
36. Choi J., Lee S., Chang M., Lee Y., Oh G.C., Lee H.-Y. Deep learning of ECG waveforms for diagnosis of heart failure with a reduced left ventricular ejection fraction. Sci Rep. 2022;12 doi: 10.1038/s41598-022-18640-8.
37. Sun J.-Y., Qiu Y., Guo H.-C., et al. A method to screen left ventricular dysfunction through ECG based on convolutional neural network. J Cardiovasc Electrophysiol. 2021;32:1095–1102. doi: 10.1111/jce.14936.
38. Attia Z.I., Harmon D.M., Dugan J., et al. Prospective evaluation of smartwatch-enabled detection of left ventricular dysfunction. Nat Med. 2022;28:2497–2503. doi: 10.1038/s41591-022-02053-1.
39. Harmon D.M., Adedinsewo D., Van’t Hof J.R., et al. Community-based participatory research application of an artificial intelligence-enhanced electrocardiogram for cardiovascular disease screening: a FAITH! trial ancillary study. Am J Prev Cardiol. 2022;12 doi: 10.1016/j.ajpc.2022.100431.
40. Harmon D.M., Carter R.E., Cohen-Shelly M., et al. Real-world performance, long-term efficacy, and absence of bias in the artificial intelligence enhanced electrocardiogram to detect left ventricular systolic dysfunction. Eur Heart J Digit Health. 2022;3:238–244. doi: 10.1093/ehjdh/ztac028.
41. Attia Z.I., Dugan J., Rideout A., et al. Automated detection of low ejection fraction from a one-lead electrocardiogram: application of an AI algorithm to an electrocardiogram-enabled Digital Stethoscope. Eur Heart J Digit Health. 2022;3:373–379. doi: 10.1093/ehjdh/ztac030.
42. Attia I.Z., Tseng A.S., Benavente E.D., et al. External validation of a deep learning electrocardiogram algorithm to detect ventricular dysfunction. Int J Cardiol. 2021;329:130–135. doi: 10.1016/j.ijcard.2020.12.065.
43. Attia Z.I., Kapa S., Yao X., et al. Prospective validation of a deep learning electrocardiogram algorithm for the detection of left ventricular systolic dysfunction. J Cardiovasc Electrophysiol. 2019;30:668–674. doi: 10.1111/jce.13889.
44. Klein C.J., Ozcan I., Attia Z.I., et al. Electrocardiogram-artificial intelligence and immune-mediated necrotizing myopathy: predicting left ventricular dysfunction and clinical outcomes. Mayo Clin Proc Innov Qual Outcomes. 2022;6:450–457. doi: 10.1016/j.mayocpiqo.2022.08.003.
45. Bachtiger P., Petri C.F., Scott F.E., et al. Point-of-care screening for heart failure with reduced ejection fraction using artificial intelligence during ECG-enabled stethoscope examination in London, UK: a prospective, observational, multicentre study. Lancet Digit Health. 2022;4:e117–e125. doi: 10.1016/S2589-7500(21)00256-9.
46. Kashou A.H., Medina-Inojosa J.R., Noseworthy P.A., et al. Artificial intelligence-augmented electrocardiogram detection of left ventricular systolic dysfunction in the general population. Mayo Clin Proc. 2021;96:2576–2586. doi: 10.1016/j.mayocp.2021.02.029.
47. Brito B.O.F., Attia Z.I., Martins L.N.A., et al. Left ventricular systolic dysfunction predicted by artificial intelligence using the electrocardiogram in Chagas disease patients-the SaMi-Trop cohort. PLoS Negl Trop Dis. 2021;15 doi: 10.1371/journal.pntd.0009974.
48. Kashou A.H., Noseworthy P.A., Lopez-Jimenez F., et al. The effect of cardiac rhythm on artificial intelligence-enabled ECG evaluation of left ventricular ejection fraction prediction in cardiac intensive care unit patients. Int J Cardiol. 2021;339:54–55. doi: 10.1016/j.ijcard.2021.07.001.
49. Jentzer J.C., Kashou A.H., Attia Z.I., et al. Left ventricular systolic dysfunction identification using artificial intelligence-augmented electrocardiogram in cardiac intensive care unit patients. Int J Cardiol. 2021;326:114–123. doi: 10.1016/j.ijcard.2020.10.074.
50. Adedinsewo D., Carter R.E., Attia Z., et al. Artificial intelligence-enabled ECG Algorithm to identify patients with left ventricular systolic dysfunction presenting to the emergency department with Dyspnea. Circ Arrhythm Electrophysiol. 2020 doi: 10.1161/circep.120.008437.
51. Noseworthy P.A., Attia Z.I., Brewer L.C., et al. Assessing and mitigating bias in medical artificial intelligence: the effects of race and ethnicity on a deep learning model for ECG analysis. Circ Arrhythm Electrophysiol. 2020;13 doi: 10.1161/CIRCEP.119.007988.
52. van de Leur R.R., Bos M.N., Taha K., et al. Improving explainability of deep neural network-based electrocardiogram interpretation using variational auto-encoders. Eur Heart J Digit Health. 2022;3:390–404. doi: 10.1093/ehjdh/ztac038.
53. Vaid A., Jiang J.J., Sawant A., et al. Automated determination of left ventricular function using electrocardiogram data in patients on maintenance hemodialysis. Clin J Am Soc Nephrol. 2022;17:1017–1025. doi: 10.2215/CJN.16481221.
54. Sau A., Zeidaabadi B., Patlatzoglou K., et al. A comparison of artificial intelligence-enhanced electrocardiography approaches for prediction of time-to-mortality using electrocardiogram images. Eur Heart J Digit Health. 2024;6:180–189. doi: 10.1093/ehjdh/ztae090.
55. Sangha V., Dhingra L.S., Aminorroaya A., et al. Identification of hypertrophic cardiomyopathy on electrocardiographic images with deep learning. Nat Cardiovasc Res. 2025:1–10. doi: 10.1038/s44161-025-00685-3.
56. Dhingra L.S., Aminorroaya A., Sangha V., et al. Ensemble deep learning algorithm for structural heart disease screening using electrocardiographic images. J Am Coll Cardiol. 2025;85:1302–1313. doi: 10.1016/j.jacc.2025.01.030.
57. Galanty M., Luitse D., Noteboom S.H., et al. Assessing the documentation of publicly available medical image and signal datasets and their impact on bias using the BEAMRAD tool. Sci Rep. 2024;14 doi: 10.1038/s41598-024-83218-5.
58. Wu J., Biswas D., Ryan M., et al. Artificial intelligence methods for improved detection of undiagnosed heart failure with preserved ejection fraction. Eur J Heart Fail. 2024;26:302–310. doi: 10.1002/ejhf.3115.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Material

mmc1.docx^{(492.9KB, docx)}

[bib1] 1.Siontis K.C., Noseworthy P.A., Attia Z.I., Friedman P.A. Artificial intelligence-enhanced electrocardiography in cardiovascular disease management. Nat Rev Cardiol. 2021;18:465–478. doi: 10.1038/s41569-020-00503-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib2] 2.Sangha V., Nargesi A.A., Dhingra L.S., et al. Detection of left ventricular systolic dysfunction from electrocardiographic images. Circulation. 2023;148:765–777. doi: 10.1161/CIRCULATIONAHA.122.062646. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib3] 3.Sangha V., Oikonomou E.K., Khera R. Artificial intelligence applied to electrocardiographic images for scalable screening of transthyretin amyloid cardiomyopathy. Cardiovasc Med. 2024 doi: 10.1101/2024.09.30.24314651. [DOI] [Google Scholar]

[bib4] 4.Khunte A., Sangha V., Oikonomou E.K., et al. Automated diagnostic reports from images of electrocardiograms at the point-of-care. medRxiv. 2024 doi: 10.1101/2024.02.17.24302976. [DOI] [Google Scholar]

[bib5] 5.Noseworthy P.A., Attia Z.I., Behnken E.M., et al. Artificial intelligence-guided screening for atrial fibrillation using electrocardiogram during sinus rhythm: a prospective non-randomised interventional trial. Lancet. 2022;400:1206–1212. doi: 10.1016/S0140-6736(22)01637-3. [DOI] [PubMed] [Google Scholar]

[bib6] 6.Attia Z.I., Kapa S., Lopez-Jimenez F., et al. Screening for cardiac contractile dysfunction using an artificial intelligence-enabled electrocardiogram. Nat Med. 2019;25:70–74. doi: 10.1038/s41591-018-0240-2. [DOI] [PubMed] [Google Scholar]

[bib7] 7.Croon P.M., Dhingra L.S., Biswas D., Oikonomou E.K., Khera R. Phenotypic selectivity of artificial intelligence-enhanced electrocardiography in cardiovascular diagnosis and risk prediction. Circulation. 2025;152:1282–1294. doi: 10.1161/circulationaha.125.076279. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib8] 8.Croon P.M., Pedroso A.F., Khera R. The emerging role of AI in transforming cardiovascular care. Future Cardiol. 2025;21:547–550. doi: 10.1080/14796678.2025.2492973. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib9] 9.Myhre P.L., Grenne B., Asch F.M., et al. Artificial intelligence-enhanced echocardiography in cardiovascular disease management. Nat Rev Cardiol. 2025:1–19. doi: 10.1038/s41569-025-01197-0. [DOI] [PubMed] [Google Scholar]

[bib10] 10.Lekadir K., Frangi A.F., Porras A.R., et al. FUTURE-AI: international consensus guideline for trustworthy and deployable artificial intelligence in healthcare. BMJ. 2025;388 doi: 10.1136/bmj-2024-081554. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib11] 11.Yagi R., Goto S., Katsumata Y., MacRae C.A., Deo R.C. Importance of external validation and subgroup analysis of artificial intelligence in the detection of low ejection fraction from electrocardiograms. Eur Heart J Digit Health. 2022;3:654–657. doi: 10.1093/ehjdh/ztac065. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib12] 12.Rosenbaum L. Reconnecting the dots — reinterpreting industry–physician relations. N Engl J Med. 2015;372:1860–1864. doi: 10.1056/NEJMms1502493. [DOI] [PubMed] [Google Scholar]

[bib13] 13.Huang Y.-C., Hsu Y.-C., Liu Z.-Y., et al. Artificial intelligence-enabled electrocardiographic screening for left ventricular systolic dysfunction and mortality risk prediction. Front Cardiovasc Med. 2023;10 doi: 10.3389/fcvm.2023.1070641. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib14] 14.Sandeep B., Huang X., Xiao Z. Artificial intelligence in heart failure improving the efficiency or dependency on it? Letter regarding the article “Artificial intelligence and heart failure: a state-of-the-art review.”. Eur J Heart Fail. 2024;26:704. doi: 10.1002/ejhf.3017. [DOI] [PubMed] [Google Scholar]

[bib15] 15.Ledwidge M., Gallagher J., Conlon C., et al. Natriuretic peptide-based screening and collaborative care for heart failure: the STOP-HF randomized trial. JAMA. 2013;310:66–74. doi: 10.1001/jama.2013.7588. [DOI] [PubMed] [Google Scholar]

[bib16] 16.Anon 2021 ESC guidelines for the diagnosis and treatment of acute and chronic heart failure. https://www.escardio.org/Guidelines/Clinical-Practice-Guidelines/Acute-and-Chronic-Heart-Failure [DOI] [PubMed]

[bib17] 17.Heidenreich P.A., Bozkurt B., Aguilar D., et al. 2022 AHA/ACC/HFSA guideline for the management of heart failure: a report of the American College of Cardiology/American Heart Association Joint Committee on clinical practice guidelines. Circulation. 2022;145:e895–e1032. doi: 10.1161/CIR.0000000000001063. [DOI] [PubMed] [Google Scholar]

[bib18] 18.Clark K.A.A., Reinhardt S.W., Chouairi F., et al. Trends in heart failure hospitalizations in the US from 2008 to 2018. J Card Fail. 2022;28:171–180. doi: 10.1016/j.cardfail.2021.08.020. [DOI] [PubMed] [Google Scholar]

[bib19] 19.Dhingra L.S., Aminorroaya A., Sangha V., et al. Heart failure risk stratification using artificial intelligence applied to electrocardiogram images: a multinational study. Eur Heart J. 2025;46:1044–1053. doi: 10.1093/eurheartj/ehae914. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib20] 20.Page M.J., McKenzie J.E., Bossuyt P.M., et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372 doi: 10.1136/bmj.n71. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib21] 21.Gottlieb M., Schraft E., O’Brien J., Patel D., Peksa G.D. Prevalence of undiagnosed stage B heart failure among emergency department patients. Am J Emerg Med. 2024;85:153–157. doi: 10.1016/j.ajem.2024.09.026. [DOI] [PubMed] [Google Scholar]

[bib22] 22.Balderston J.R., Gertz Z.M., Brooks S., Joyce J.M., Evans D.P. Diagnostic yield and accuracy of bedside echocardiography in the emergency department in hemodynamically stable patients. J Ultrasound Med. 2019;38:2845–2851. doi: 10.1002/jum.14985. [DOI] [PubMed] [Google Scholar]

[bib23] 23.Bergquist J.A., Zenger B., Brundage J., et al. Performance of off-the-shelf machine learning architectures and biases in low left ventricular ejection fraction detection. Heart Rhythm O2. 2024;5:644–654. doi: 10.1016/j.hroo.2024.07.009. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib24] 24.Khunte A., Sangha V., Oikonomou E.K., et al. Detection of left ventricular systolic dysfunction from single-lead electrocardiography adapted for portable and wearable devices. NPJ Digit Med. 2023;6:124. doi: 10.1038/s41746-023-00869-w. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib25] 25.Vaid A., Jiang J., Sawant A., et al. A foundational vision transformer improves diagnostic performance for electrocardiograms. NPJ Digit Med. 2023;6:108. doi: 10.1038/s41746-023-00840-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib26] 26.Honarvar H., Agarwal C., Somani S., et al. Enhancing convolutional neural network predictions of electrocardiograms with left ventricular dysfunction using a novel sub-waveform representation. Cardiovasc Digit Health J. 2022;3:220–231. doi: 10.1016/j.cvdhj.2022.07.074. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib27] 27.Chen H.-Y., Lin C.-S., Fang W.-H., et al. Artificial intelligence-enabled electrocardiogram predicted left ventricle diameter as an independent risk factor of long-term cardiovascular outcome in patients with normal ejection fraction. Front Med (Lausanne) 2022;9 doi: 10.3389/fmed.2022.870523. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib28] 28.Golany T., Radinsky K., Kofman N., et al. Physicians and machine-learning algorithm performance in predicting left-ventricular systolic dysfunction from a standard 12-lead-electrocardiogram. J Clin Med. 2022;11:6767. doi: 10.3390/jcm11226767. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib29] 29.Lee C.-H., Liu W.-T., Lou Y.-S., et al. Artificial intelligence-enabled electrocardiogram screens low left ventricular ejection fraction with a degree of confidence. Digit Health. 2022;8 doi: 10.1177/20552076221143249. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib30] 30.Chen H.-Y., Lin C.-S., Fang W.-H., et al. Artificial intelligence-enabled electrocardiography predicts left ventricular dysfunction and future cardiovascular outcomes: a retrospective analysis. J Pers Med. 2022;12:455. doi: 10.3390/jpm12030455. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib31] 31.Kwon J.-M., Jo Y.-Y., Lee S.Y., et al. Artificial intelligence-enhanced smartwatch ECG for heart failure-reduced ejection fraction detection by generating 12-lead ECG. Diagnostics (Basel) 2022;12:654. doi: 10.3390/diagnostics12030654. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib32] 32.Surendra K., Nürnberg S., Bremer J.P., et al. Pragmatic screening for heart failure in the general population using an electrocardiogram-based neural network. ESC Heart Fail. 2023;10:975–984. doi: 10.1002/ehf2.14263. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib33] 33.Vaid A., Johnson K.W., Badgeley M.A., et al. Using deep-learning algorithms to simultaneously identify right and left ventricular dysfunction from the electrocardiogram. JACC Cardiovasc Imaging. 2022;15:395–410. doi: 10.1016/j.jcmg.2021.08.004. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib34] 34.Katsushika S., Kodera S., Nakamoto M., et al. The effectiveness of a deep learning model to detect left ventricular systolic dysfunction from electrocardiograms. Int Heart J. 2021;62:1332–1341. doi: 10.1536/ihj.21-407. [DOI] [PubMed] [Google Scholar]

[bib35] 35.Cho J., Lee B., Kwon J.-M., et al. Artificial intelligence algorithm for screening heart failure with reduced ejection fraction using electrocardiography. ASAIO J. 2021;67:314–321. doi: 10.1097/MAT.0000000000001218. [DOI] [PubMed] [Google Scholar]

[bib36] 36.Choi J., Lee S., Chang M., Lee Y., Oh G.C., Lee H.-Y. Deep learning of ECG waveforms for diagnosis of heart failure with a reduced left ventricular ejection fraction. Sci Rep. 2022;12 doi: 10.1038/s41598-022-18640-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib37] 37.Sun J.-Y., Qiu Y., Guo H.-C., et al. A method to screen left ventricular dysfunction through ECG based on convolutional neural network. J Cardiovasc Electrophysiol. 2021;32:1095–1102. doi: 10.1111/jce.14936. [DOI] [PubMed] [Google Scholar]

[bib38] 38.Attia Z.I., Harmon D.M., Dugan J., et al. Prospective evaluation of smartwatch-enabled detection of left ventricular dysfunction. Nat Med. 2022;28:2497–2503. doi: 10.1038/s41591-022-02053-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

39. Harmon D.M., Adedinsewo D., Van’t Hof J.R., et al. Community-based participatory research application of an artificial intelligence-enhanced electrocardiogram for cardiovascular disease screening: a FAITH! trial ancillary study. Am J Prev Cardiol. 2022;12. doi: 10.1016/j.ajpc.2022.100431.

40. Harmon D.M., Carter R.E., Cohen-Shelly M., et al. Real-world performance, long-term efficacy, and absence of bias in the artificial intelligence enhanced electrocardiogram to detect left ventricular systolic dysfunction. Eur Heart J Digit Health. 2022;3:238–244. doi: 10.1093/ehjdh/ztac028.

41. Attia Z.I., Dugan J., Rideout A., et al. Automated detection of low ejection fraction from a one-lead electrocardiogram: application of an AI algorithm to an electrocardiogram-enabled Digital Stethoscope. Eur Heart J Digit Health. 2022;3:373–379. doi: 10.1093/ehjdh/ztac030.

42. Attia I.Z., Tseng A.S., Benavente E.D., et al. External validation of a deep learning electrocardiogram algorithm to detect ventricular dysfunction. Int J Cardiol. 2021;329:130–135. doi: 10.1016/j.ijcard.2020.12.065.

43. Attia Z.I., Kapa S., Yao X., et al. Prospective validation of a deep learning electrocardiogram algorithm for the detection of left ventricular systolic dysfunction. J Cardiovasc Electrophysiol. 2019;30:668–674. doi: 10.1111/jce.13889.

44. Klein C.J., Ozcan I., Attia Z.I., et al. Electrocardiogram-artificial intelligence and immune-mediated necrotizing myopathy: predicting left ventricular dysfunction and clinical outcomes. Mayo Clin Proc Innov Qual Outcomes. 2022;6:450–457. doi: 10.1016/j.mayocpiqo.2022.08.003.

45. Bachtiger P., Petri C.F., Scott F.E., et al. Point-of-care screening for heart failure with reduced ejection fraction using artificial intelligence during ECG-enabled stethoscope examination in London, UK: a prospective, observational, multicentre study. Lancet Digit Health. 2022;4:e117–e125. doi: 10.1016/S2589-7500(21)00256-9.

46. Kashou A.H., Medina-Inojosa J.R., Noseworthy P.A., et al. Artificial intelligence-augmented electrocardiogram detection of left ventricular systolic dysfunction in the general population. Mayo Clin Proc. 2021;96:2576–2586. doi: 10.1016/j.mayocp.2021.02.029.

47. Brito B.O.F., Attia Z.I., Martins L.N.A., et al. Left ventricular systolic dysfunction predicted by artificial intelligence using the electrocardiogram in Chagas disease patients-the SaMi-Trop cohort. PLoS Negl Trop Dis. 2021;15. doi: 10.1371/journal.pntd.0009974.

48. Kashou A.H., Noseworthy P.A., Lopez-Jimenez F., et al. The effect of cardiac rhythm on artificial intelligence-enabled ECG evaluation of left ventricular ejection fraction prediction in cardiac intensive care unit patients. Int J Cardiol. 2021;339:54–55. doi: 10.1016/j.ijcard.2021.07.001.

49. Jentzer J.C., Kashou A.H., Attia Z.I., et al. Left ventricular systolic dysfunction identification using artificial intelligence-augmented electrocardiogram in cardiac intensive care unit patients. Int J Cardiol. 2021;326:114–123. doi: 10.1016/j.ijcard.2020.10.074.

50. Adedinsewo D., Carter R.E., Attia Z., et al. Artificial intelligence-enabled ECG algorithm to identify patients with left ventricular systolic dysfunction presenting to the emergency department with dyspnea. Circ Arrhythm Electrophysiol. 2020. doi: 10.1161/circep.120.008437.

51. Noseworthy P.A., Attia Z.I., Brewer L.C., et al. Assessing and mitigating bias in medical artificial intelligence: the effects of race and ethnicity on a deep learning model for ECG analysis. Circ Arrhythm Electrophysiol. 2020;13. doi: 10.1161/CIRCEP.119.007988.

52. van de Leur R.R., Bos M.N., Taha K., et al. Improving explainability of deep neural network-based electrocardiogram interpretation using variational auto-encoders. Eur Heart J Digit Health. 2022;3:390–404. doi: 10.1093/ehjdh/ztac038.

53. Vaid A., Jiang J.J., Sawant A., et al. Automated determination of left ventricular function using electrocardiogram data in patients on maintenance hemodialysis. Clin J Am Soc Nephrol. 2022;17:1017–1025. doi: 10.2215/CJN.16481221.

54. Sau A., Zeidaabadi B., Patlatzoglou K., et al. A comparison of artificial intelligence-enhanced electrocardiography approaches for prediction of time-to-mortality using electrocardiogram images. Eur Heart J Digit Health. 2024;6:180–189. doi: 10.1093/ehjdh/ztae090.

55. Sangha V., Dhingra L.S., Aminorroaya A., et al. Identification of hypertrophic cardiomyopathy on electrocardiographic images with deep learning. Nat Cardiovasc Res. 2025:1–10. doi: 10.1038/s44161-025-00685-3.

56. Dhingra L.S., Aminorroaya A., Sangha V., et al. Ensemble deep learning algorithm for structural heart disease screening using electrocardiographic images. J Am Coll Cardiol. 2025;85:1302–1313. doi: 10.1016/j.jacc.2025.01.030.

57. Galanty M., Luitse D., Noteboom S.H., et al. Assessing the documentation of publicly available medical image and signal datasets and their impact on bias using the BEAMRAD tool. Sci Rep. 2024;14. doi: 10.1038/s41598-024-83218-5.

58. Wu J., Biswas D., Ryan M., et al. Artificial intelligence methods for improved detection of undiagnosed heart failure with preserved ejection fraction. Eur J Heart Fail. 2024;26:302–310. doi: 10.1002/ejhf.3115.
