Author manuscript; available in PMC: 2026 Jan 6.
Published in final edited form as: Circ Arrhythm Electrophysiol. 2025 Dec 23;19(1):e014369. doi: 10.1161/CIRCEP.125.014369

Convolutional Neural Network Models Leverage Morphological Rather than Temporal Features to Detect Myocardial Diseases from 12-lead Electrocardiograms

Masamitsu Nakayama 1,2, Ryuichiro Yagi 1,2, Yoshinori Katsumata 3, Masayuki Oki 4, Rahul C Deo 1,2, Calum A MacRae 1,2, Shinichi Goto 1,2,5,6
PMCID: PMC12768457  NIHMSID: NIHMS2130120  PMID: 41431897

Artificial intelligence (AI) combined with the electrocardiogram (ECG) is a powerful tool for detecting myocardial disorders such as left ventricular systolic dysfunction (LVSD),1 hypertrophic cardiomyopathy (HCM),2 and cardiac amyloidosis (CA),3 which are frequently missed in current clinical practice. Attribution-based approaches such as gradient-weighted class activation mapping (Grad-CAM) have been used to mitigate the “black-box” problem by delivering intuitive heatmaps that indicate the regions contributing to the prediction. However, such techniques provide little information on the specific features used.2 A fundamental question that remains unanswered is whether the models primarily use morphological features of the waveform or rely on rhythm abnormalities and temporal fluctuation within the ECG. To separate morphological from temporal features, we compared the performance of convolutional neural network (CNN) models trained on 12-lead ECGs of varying lengths for detecting LVSD, HCM, and CA.

Four models were trained to detect each of the three diseases using ECG recordings of 10, 5, and 2.5 seconds, and a single beat (defined as the segment from 300 ms before to 500 ms after the second R peak, detected by the Pan-Tompkins algorithm). We collected ECGs from Brigham and Women’s Hospital (BWH) and Keio University Hospital (KEIO) (Figure [A]). All models were trained with data from BWH and externally validated on data from KEIO. The LVSD dataset was constructed using ECGs recorded within 14 days of an echocardiogram. HCM and CA datasets were constructed by matching case ECGs to control ECGs in a 1:5 ratio based on age and sex. The datasets from BWH were randomly divided into derivation, validation, and test sets in a 5:2:3 ratio without patient overlap. While all available ECGs in the derivation and validation sets were used, only one ECG per patient was selected for the test sets (the ECG closest to the echocardiogram for LVSD and the first available ECG after diagnosis for HCM and CA). All models were constructed based on a previously published architecture,3 differing only in the number of max pooling layers depending on the input data shape. The final model for each disease/input length was chosen as the one achieving the highest area under the receiver operating characteristic curve (AUROC) on the validation set, and it was then tested once on the test sets. Model performance was reported with 95% confidence intervals (CI) calculated from 2000 bootstrap samples, with threshold-dependent metrics evaluated at the cutoff that maximized Youden’s index. Subgroup analyses stratified by patient characteristics and overt ECG abnormalities were performed for the single-beat models. The institutional review boards of all institutions approved the study. The data of this study are available from the corresponding author upon reasonable request.
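The single-beat definition above (300 ms before to 500 ms after the second R peak) can be illustrated with a minimal sketch. It assumes that R-peak sample indices have already been obtained from a detector such as Pan-Tompkins, which is not reimplemented here. The function name `extract_single_beat`, the 500 Hz sampling rate, and the synthetic 12-lead array are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def extract_single_beat(ecg, r_peaks, fs, pre_ms=300, post_ms=500, beat_index=1):
    """Cut a fixed window around one R peak from a (leads, samples) ECG array.

    beat_index=1 selects the second detected R peak.
    Returns None if that peak does not exist or the window leaves the recording.
    """
    ecg = np.asarray(ecg)
    if len(r_peaks) <= beat_index:
        return None
    pre = int(round(pre_ms * fs / 1000))    # samples before the R peak
    post = int(round(post_ms * fs / 1000))  # samples after the R peak
    center = int(r_peaks[beat_index])
    start, stop = center - pre, center + post
    if start < 0 or stop > ecg.shape[-1]:
        return None
    return ecg[..., start:stop]

# Synthetic example: 12 leads, 10 s at an assumed 500 Hz, one unit spike per second.
fs = 500
ecg = np.zeros((12, 10 * fs))
r_peaks = np.arange(fs // 2, 10 * fs, fs)  # hypothetical detector output: 250, 750, ...
ecg[:, r_peaks] = 1.0

beat = extract_single_beat(ecg, r_peaks, fs)
print(beat.shape)  # (leads, 800 ms worth of samples)
```

Because the window has a fixed duration, every single-beat input has the same shape regardless of heart rate, which is consistent with a model whose pooling depth depends only on input length.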

Figure.

Model performance for detecting myocardial diseases. A, Numbers of ECGs in case and control groups. B, AUROCs across internal and external test sets for models trained with different ECG lengths. C, Sensitivities, specificities, PPVs, and NPVs at optimal cutoffs. P values for the difference in AUROCs were calculated with the DeLong method, with the 10-second model as the reference. D, Subgroup analyses for single-beat models in the BWH test set for LVSD, HCM, and CA.

AUROC, area under the receiver operating characteristic curve; BWH, Brigham and Women’s Hospital; KEIO, Keio University Hospital; LVSD, left ventricular systolic dysfunction; HCM, hypertrophic cardiomyopathy; CA, cardiac amyloidosis; HR, heart rate; 1AVb, first-degree atrioventricular block; AF, atrial fibrillation; LBBB, left bundle branch block; PAC, premature atrial complex; PVC, premature ventricular complex; PPV, positive predictive value; NPV, negative predictive value; CI, confidence interval.
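The threshold-dependent metrics in panel C (sensitivity, specificity, PPV, and NPV at the optimal cutoff) can be sketched as follows, assuming the optimal cutoff is the score threshold that maximizes Youden's index J = sensitivity + specificity - 1. The function names and the toy labels and scores are hypothetical and serve only to illustrate the calculation; both classes must be present in the data.

```python
import numpy as np

def youden_cutoff(y_true, y_score):
    """Return the score threshold maximizing J = sensitivity + specificity - 1."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    best_j, best_t = -1.0, None
    for t in np.unique(y_score):          # each observed score is a candidate cutoff
        pred = y_score >= t
        sens = np.mean(pred[y_true])      # fraction of cases called positive
        spec = np.mean(~pred[~y_true])    # fraction of controls called negative
        j = sens + spec - 1
        if j > best_j:
            best_j, best_t = j, float(t)
    return best_t

def metrics_at(y_true, y_score, cutoff):
    """Confusion-matrix metrics when 'score >= cutoff' is called positive."""
    y_true = np.asarray(y_true, dtype=bool)
    pred = np.asarray(y_score, dtype=float) >= cutoff
    tp = int(np.sum(pred & y_true))
    fp = int(np.sum(pred & ~y_true))
    fn = int(np.sum(~pred & y_true))
    tn = int(np.sum(~pred & ~y_true))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }

# Toy data (not study data): 3 cases and 5 controls.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.2, 0.3, 0.4, 0.35, 0.8, 0.7, 0.9]

cutoff = youden_cutoff(y_true, y_score)
m = metrics_at(y_true, y_score, cutoff)
print(cutoff, m)
```

Note that PPV and NPV depend on disease prevalence in the evaluated set, so values obtained in a 1:5 matched case-control sample would not transfer directly to an unselected population.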

The 10-, 5-, 2.5-second, and single-beat models detecting LVSD showed comparable AUROCs of 0.89 [95%CI, 0.88–0.91], 0.90 [0.88–0.91], 0.90 [0.88–0.91], and 0.89 [0.88–0.91] in the BWH test set (Figure [B] and [C]). The results were similar in other myocardial diseases: for HCM, AUROCs were 0.91 [0.87–0.94], 0.93 [0.90–0.95], 0.94 [0.91–0.96], and 0.92 [0.90–0.94] for 10-, 5-, 2.5-second, and single-beat models; for CA, AUROCs were 0.94 [0.92–0.96], 0.94 [0.91–0.96], 0.93 [0.90–0.95], and 0.93 [0.90–0.95] for 10-, 5-, 2.5-second, and single-beat models, respectively. The model performances were consistent upon external validation (Figure [C]). The stratified analysis using the single-beat model for LVSD detection demonstrated consistent performance across patient backgrounds (Figure [D]). Although the performance of the single-beat model was affected by the presence of left bundle branch block, atrial fibrillation, and premature ventricular complexes for the LVSD model, the model was robust across various heart rate ranges and against other common ECG abnormalities. The external validation studies similarly showed consistent performance of the single-beat models across subpopulations for HCM and CA detection.
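As an illustration of how an AUROC with a bootstrap 95% CI of the kind reported above can be computed, the sketch below uses a percentile bootstrap with 2000 resamples. The percentile method, the function names, and the simulated scores are assumptions made for illustration; the text does not state which bootstrap variant was used, and the numbers produced here are unrelated to the study results.

```python
import numpy as np

def auroc(y_true, y_score):
    """AUROC as the probability that a random case outscores a random control.

    Ties count as 0.5. Builds a cases-by-controls matrix, so it suits small data only.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def bootstrap_auroc_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap CI for the AUROC."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)  # resample ECGs with replacement
        if y_true[idx].all() or not y_true[idx].any():
            continue  # AUROC is undefined without both classes; draw again
        stats.append(auroc(y_true[idx], y_score[idx]))
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return auroc(y_true, y_score), float(lower), float(upper)

# Simulated scores (not study data): cases score one unit higher on average.
rng = np.random.default_rng(42)
labels = rng.integers(0, 2, size=300)
scores = labels + rng.normal(0.0, 1.0, size=300)

point, lower, upper = bootstrap_auroc_ci(labels, scores)
print(f"AUROC {point:.2f} [95% CI, {lower:.2f}-{upper:.2f}]")
```

Resampling here is done per ECG, which matches a test set containing one ECG per patient; with several ECGs per patient, resampling by patient would be the more appropriate choice.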

Previous studies have indicated that both waveform profiles in a single beat and variability within the 10-second recording are indicative of cardiac diseases.4,5 In this study, the performance of our CNN-based AI models was not affected by ECG length. Furthermore, the models using a single beat, which lacks information on beat rhythm and on differences between waveforms, demonstrated robust performance for detecting multiple cardiac diseases across subpopulations and external institutions. Our results suggest that the CNN-based AI models primarily focus on the morphological features of a single-beat waveform rather than on rhythm-related abnormalities. This insight offers a refined perspective on the interpretation of visualization outputs, indicating that the highlighted regions may contain clinically relevant morphological characteristics. One limitation of our study is that the analyses were performed on a specific CNN architecture. The results may not generalize to other models, including transformers, which are reported to capture features from longer sequences than CNNs. However, the AUROCs observed were comparable to those of state-of-the-art models, suggesting that the information content within a single beat is sufficient to achieve such performance.

In summary, the CNN-based single-beat models showed performance equivalent to that of the 10-second ECG models for detecting myocardial diseases. More detailed morphological analysis may enhance the interpretability of ECG AI models.

Source of Funding:

This work was supported by the postdoctoral fellowship from the American Heart Association (AHA), grants from Terumo Life Science Foundation, Senshin Medical Research Foundation, the Japanese Circulation Society (JCS), JSPS KAKENHI grant numbers 24K11225, 24K10541, SECOM Science and Technology Foundation, Sakakibara Memorial Foundation, AMED grant number JP25hma322032 and AHA’s Second Century Early Faculty Independence Award.

Disclosures:

Dr Yagi is supported by the postdoctoral fellowship from American Heart Association (AHA). Dr Goto is partially supported by AHA’s Second Century Early Faculty Independence Award. Dr Deo is a co-founder of Atman Health. Dr MacRae is support and co-founder of Atman Health.

Nonstandard Abbreviations and Acronyms

AF: Atrial fibrillation
AUROC: Area under the receiver operating characteristic curve
BWH: Brigham and Women’s Hospital
CA: Cardiac amyloidosis
CI: Confidence interval
HCM: Hypertrophic cardiomyopathy
HR: Heart rate
KEIO: Keio University Hospital
LBBB: Left bundle branch block
LVSD: Left ventricular systolic dysfunction
NPV: Negative predictive value
PAC: Premature atrial complex
PVC: Premature ventricular complex
PPV: Positive predictive value
1AVb: First-degree atrioventricular block

References:

1. Attia ZI, Kapa S, Lopez-Jimenez F, McKie PM, Ladewig DJ, Satam G, Pellikka PA, Enriquez-Sarano M, Noseworthy PA, Munger TM, et al. Screening for cardiac contractile dysfunction using an artificial intelligence–enabled electrocardiogram. Nat Med. 2019;25:70–74.
2. Goto S, Solanki D, John JE, Yagi R, Homilius M, Ichihara G, Katsumata Y, Gaggin HK, Itabashi Y, MacRae CA, et al. Multinational Federated Learning Approach to Train ECG and Echocardiogram Models for Hypertrophic Cardiomyopathy Detection. Circulation. 2022;146:755–769.
3. Goto S, Mahara K, Beussink-Nelson L, Ikura H, Katsumata Y, Endo J, Gaggin HK, Shah SJ, Itabashi Y, MacRae CA, et al. Artificial intelligence-enabled fully automated detection of cardiac amyloidosis using electrocardiograms and echocardiograms. Nat Commun. 2021;12:2726.
4. Faust O, Hong W, Loh HW, Xu S, Tan R-S, Chakraborty S, Barua PD, Molinari F, Acharya UR. Heart rate variability for medical decision support systems: A review. Comput Biol Med. 2022;145:105407.
5. Siontis KC, Suárez AB, Sehrawat O, Ackerman MJ, Attia ZI, Friedman PA, Noseworthy PA, Maanja M. Saliency maps provide insights into artificial intelligence-based electrocardiography models for detecting hypertrophic cardiomyopathy. J Electrocardiol. 2023;81:286–291.
