Systematic Reviews
2025 Dec 20;15:28. doi: 10.1186/s13643-025-03033-5

Artificial intelligence in electrocardiogram signals for sudden cardiac death prediction: a systematic review and meta-analysis

Shuang He 1,#, Ming Du 1,#, Zhao Wang 1, Yuhang Zang 1, Guanfei Ning 1, Shuxin Pang 1, Yining Wan 1, Yuchen Wang 1, Meng Zuo 1, Bo Luan 1,✉,#, Na Duan 1,✉,#
PMCID: PMC12836852  PMID: 41422068

Abstract

Purpose

The study aimed to evaluate the diagnostic performance of artificial intelligence (AI) in detecting sudden cardiac death on electrocardiogram (ECG).

Methods

We systematically searched PubMed, Web of Science, Embase, and IEEE Xplore for studies published through April 2025 evaluating AI models for ECG-based sudden cardiac death detection, using expert consensus or database records as the reference standard. A bivariate random-effects model generated pooled sensitivity and specificity estimates. Heterogeneity was quantified via I² and τ² statistics. Study quality was appraised using the revised QUADAS-2 tool, with evidence certainty graded via the GRADE assessment.

Results

Of 958 records screened after duplicate removal, 27 studies comprising 2613 patients/images were included in the final analysis. For heart rate variability, AI demonstrated a sensitivity of 0.90 (95% CI: 0.86–0.92) and specificity of 0.91 (95% CI: 0.83–0.96), with an AUC of 0.93 (95% CI: 0.91–0.95). For ECG signal segmentation, AI demonstrated a sensitivity of 0.96 (95% CI: 0.92–0.98) and specificity of 0.99 (95% CI: 0.94–1.00), with an AUC of 0.99 (95% CI: 0.98–1.00). For direct input of ECG lead signals, AI demonstrated a sensitivity of 0.87 (95% CI: 0.61–0.97) and specificity of 0.91 (95% CI: 0.75–0.97), with an AUC of 0.95 (95% CI: 0.93–0.97).

Conclusions

This meta-analysis indicates that AI-based ECG analysis shows potential for SCD prediction. However, the summary estimates are derived from highly heterogeneous studies and should not be considered benchmarks for clinical performance. The current evidence remains preliminary and derived from idealized research settings, underscoring the need for prospective, multicenter studies with standardized methodologies to establish generalizability and clinical applicability.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13643-025-03033-5.

Keywords: Electrocardiogram, Sudden cardiac death, Artificial intelligence, Deep learning, Meta-analysis

Introduction

Sudden cardiac death (SCD) ranks as one of the primary factors contributing to deaths globally [1]. Frequently, it arises from pre-existing heart conditions, including coronary artery disease, arrhythmias, and cardiomyopathies [2]. Despite advances in cardiovascular care, the unpredictability of SCD remains a significant challenge for clinicians [3]. Early identification of individuals at risk is crucial, as timely intervention can significantly improve survival rates. However, due to the sudden onset of this condition, accurate and early risk stratification remains a key focus of ongoing research [4].

Electrocardiogram (ECG) is the most widely used diagnostic tool for the assessment of cardiac arrhythmias and the detection of abnormalities that predispose individuals to SCD [5]. This non-invasive, cost-effective, and widely available method offers valuable insights into the electrical activity of the heart [6]. However, conventional ECG interpretation is highly dependent on clinician experience, which can lead to variability in diagnosis. Additionally, ECG alone may fail to identify patients at risk of SCD, as it does not always capture subtle arrhythmic events or non-obvious risk factors [7].

The interpretation of ECG signals has seen a growing incorporation of artificial intelligence (AI), especially through machine learning (ML) algorithms [8, 9]. AI-based models offer the potential to enhance diagnostic accuracy by automating feature extraction, identifying hidden patterns, and providing objective predictions [10]. These systems have demonstrated promise in detecting arrhythmias and predicting SCD risk by analyzing complex ECG data [11]. Despite these advancements, the clinical applicability of AI-based ECG analysis remains controversial [12]. Critics argue that while AI models may improve diagnostic performance, concerns regarding their generalizability, interpretability, and integration into clinical practice persist [13].

The aim of this meta-analysis is to conduct a systematic review and quantitatively assess the diagnostic efficacy of AI-based models in ECG for predicting SCD, addressing the current controversies and providing a comprehensive synthesis of existing evidence.

Methods

This review was reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy (PRISMA-DTA) [14]. The study protocol was registered in PROSPERO (CRD420251069632) before study selection and data extraction.

Search strategy

A systematic search was conducted in PubMed, Web of Science, Embase, and IEEE Xplore from database inception to April 24, 2025. The search combined three concept groups: “artificial intelligence,” “sudden cardiac death,” and “ECG.” Details of the search strategy are provided in Supplementary Table 1. We also screened the reference lists of selected articles for additional relevant studies. Finally, to capture the latest publications, the search was repeated on June 5, 2025.

Inclusion and exclusion criteria

Research was selected based on the following PICOS criteria. Population (P): SCD and non-SCD individuals. Intervention (I): ECG-based AI models. Comparison (C): assessment of performance with or without expert consensus and database records. Outcomes (O): diagnostic performance measures, such as sensitivity, specificity, and AUC, for identifying heart rate variability (HRV), segmenting ECG signals, and directly using ECG lead signals. Study design (S): inclusive of both retrospective and prospective studies.

Studies were excluded if they (1) involved animals; (2) were not original research (reviews, conference abstracts, case reports, meta-analyses, letters to the editor, and preprints); (3) had incomplete primary outcome data; or (4) were not published in English. We further excluded duplicate publications and overlapping datasets, retaining only the most thorough or most recent study.

Quality assessment

In order to comprehensively assess the quality of the studies included, we utilized the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool [15]. This instrument was modified by substituting irrelevant criteria with relevant items derived from the Prediction model Risk Of Bias Assessment Tool (PROBAST) [16]. The adaptations, described in this section, were informed by our experience with QUADAS-2 and consideration of potential biases stemming from variations in study design and conduct among the included studies.

The modified QUADAS-2 tool we developed assesses four key domains: participants, the index test (AI algorithm), the reference standard, and the analysis. The assessment covers not only the risk of bias in each specific domain but also addresses applicability issues related to the initial three domains. Two reviewers conducted independent evaluations of the bias risk present in the selected studies utilizing this instrument. The certainty of the evidence was evaluated employing the GRADE framework specifically for diagnostic tests. Any differences in the reviewers’ evaluations were reconciled through discussion.

Data extraction

Two reviewers (SH and MD) independently screened studies for eligibility and extracted relevant data. Discrepancies were resolved by consensus, with a third reviewer (ZW) acting as adjudicator. Extracted variables comprised first author, publication year, country, study design, imaging modality, AI methodology (algorithm/model), reference standard, analytical approach, patient/lesion characteristics (enrollment numbers, lesion counts), data sources, validation method, and prediction interval. When data were incomplete or ambiguous, corresponding authors were contacted by email. Unavailable data were excluded to maintain analytical rigor.

Because many studies omitted diagnostic contingency tables, we reconstructed 2 × 2 (fourfold) tables using two methods: (1) deriving counts from the reported sensitivity, specificity, number of true positives confirmed by the reference standard, and total sample size; and (2) determining optimal sensitivity and specificity via Youden’s index applied to receiver operating characteristic (ROC) curve data. We acknowledge that this reconstruction is a potential source of estimation error.
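The first reconstruction method can be sketched as follows. This helper is illustrative only (it is not the authors' code, and the rounding step is our assumption, since the paper does not state the exact arithmetic):

```python
def reconstruct_table(sens, spec, n_pos, n_total):
    """Rebuild a 2x2 contingency table from reported sensitivity,
    specificity, reference-standard positives, and total sample size.
    Rounding to integer counts is an assumption of this sketch."""
    tp = round(sens * n_pos)   # true positives
    fn = n_pos - tp            # false negatives
    n_neg = n_total - n_pos    # reference-standard negatives
    tn = round(spec * n_neg)   # true negatives
    fp = n_neg - tn            # false positives
    return tp, fp, fn, tn

# Reported sens 0.889 / spec 0.944 with 135 positives of 261 total
# recovers (120, 7, 15, 119) - the counts listed for Parsi et al. in Table 2
print(reconstruct_table(0.889, 0.944, 135, 261))
```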

Outcome measures

The primary outcomes were AI model performance for HRV analysis, ECG signal segmentation, and direct ECG lead interpretation. Key metrics included sensitivity (TP/[TP + FN]), specificity (TN/[TN + FP]), and the area under the ROC curve (AUC). Sensitivity measured correct positive identification, specificity quantified accurate negative classification, and the AUC summarized overall diagnostic accuracy. Where multiple AI models were reported, data for the best-performing model (highest AUC) were selected to represent optimal performance. This approach was chosen to capture “proof-of-concept” performance but may introduce optimism bias.
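As a worked example of these definitions, using the counts reported for Shi et al. 2020 in Table 2 (TP = 39, FP = 2, FN = 1, TN = 34):

```python
def sensitivity(tp, fn):
    """TP / (TP + FN): proportion of reference-standard positives detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """TN / (TN + FP): proportion of reference-standard negatives ruled out."""
    return tn / (tn + fp)

# Shi et al. 2020 (Table 2): TP=39, FP=2, FN=1, TN=34
print(round(sensitivity(39, 1), 3))  # 0.975
print(round(specificity(34, 2), 3))  # 0.944
```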

Statistical analysis

A bivariate random-effects model was utilized to combine the sensitivity and specificity estimates for AI applications regarding HRV, ECG segmentation, and direct interpretation of ECG. The pooled results were visualized using forest plots. Additionally, summary receiver operating characteristic (SROC) curves illustrated the pooled estimates along with 95% confidence and prediction intervals [17]. Clinical utility was assessed using Fagan’s nomogram.
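Fagan's nomogram is Bayes' theorem on the odds scale: post-test odds = pre-test odds × likelihood ratio. A minimal sketch (illustrative, not the review's software), applied to the pooled HRV estimates reported in the Results (sensitivity 0.90, specificity 0.91) at the 20% pre-test probability used there:

```python
def post_test_probability(pre, sens, spec, positive=True):
    """Post-test probability via Fagan's nomogram logic:
    convert probability to odds, multiply by LR+ or LR-, convert back."""
    lr = sens / (1 - spec) if positive else (1 - sens) / spec
    odds = pre / (1 - pre) * lr
    return odds / (1 + odds)

# Pooled HRV estimates from this review, pre-test probability 20%
print(round(post_test_probability(0.20, 0.90, 0.91), 2))                  # 0.71
print(round(post_test_probability(0.20, 0.90, 0.91, positive=False), 2))  # 0.03
```

These values reproduce the approximately 72% and 3% post-test probabilities reported for HRV in the Results (small differences reflect rounding of the pooled inputs).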

Study heterogeneity was assessed using I² and τ² statistics, with I² thresholds of 25%, 50%, and 75% representing low, moderate, and high heterogeneity, respectively [18]. For the HRV analyses, which involved more than ten datasets, meta-regression was applied when I² exceeded 50% to investigate potential covariates, including imaging modality, region, AI method, year, reference standard, data sources, validation method, and prediction interval. Subgroup analyses were conducted accordingly.
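For reference, I² is derived from Cochran's Q statistic as I² = max(0, (Q − df)/Q) × 100. A small sketch (the Q and df values below are hypothetical, chosen only to illustrate the thresholds; they are not taken from the included studies):

```python
def i_squared(q, df):
    """Higgins' I^2: percentage of total variation across studies
    attributable to heterogeneity rather than chance.
    q = Cochran's Q, df = number of studies minus 1."""
    return max(0.0, (q - df) / q) * 100.0

print(round(i_squared(35.7, 15), 1))  # 58.0 -> moderate-to-high by the 25/50/75% thresholds
print(round(i_squared(10.0, 15), 1))  # 0.0  -> truncated at zero when Q < df
```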

Deeks’ funnel plot test was used to evaluate publication bias. Analyses were conducted in Stata 15.1 (Midas and Meta modules), and RevMan 5.4 was used for the risk of bias assessment. Statistical significance was set at P < 0.05.

Results

Study selection

The database search yielded 1148 records. After removal of duplicates, 958 records remained; title and abstract screening then excluded 897 ineligible studies. Of the 61 articles assessed in full text, 34 were excluded because data could not be extracted (n = 19) or because non-AI methodologies or models were used (n = 15), leaving 27 studies for the final analysis [7, 10, 19–43]. The selection process is illustrated in the PRISMA flowchart in Fig. 1.

Fig. 1.

Fig. 1

PRISMA flow diagram illustrating the study selection process

Study description and quality assessment

A total of 27 eligible studies were assessed: 16 investigated HRV (733 patients/images; range 14–211), five addressed ECG signal segmentation (484 patients/images; range 16–332), and six concerned the direct input of ECG lead signals (1396 patients/images; range 18–714). The 16 HRV studies provided adequate diagnostic-performance data to support a meta-regression, whereas the studies on ECG signal segmentation and direct ECG lead signal input did not. One study was prospective; the remaining 26 were retrospective. The reference standards comprised database records (n = 24) and expert consensus (n = 3). The predominant modeling algorithms were support vector machine (SVM) (8/27, 29.6%), followed by k-nearest neighbor (KNN) (6/27, 22.2%), artificial neural network (ANN) (4/27, 14.8%), convolutional neural network (CNN) (3/27, 11.1%), and decision tree, EIANet, random forest (RF), U-Net, and XGBoost (1/27, 3.7% each). Study characteristics and technical details are summarized in Tables 1 and 2 and Supplementary Table 2.

Table 1.

Study and patient characteristics of the included studies

Author Year Country Study design Imaging modality Reference standard Analysis Patients/images per set: heart rate variability / ECG signal segmentation / direct input of ECG lead signals
Urteaga et al. 2025 America Retro ECG Expert consensus PB NR 332 NR
Lu et al. 2025 China Retro ECG Database records PB NR NR 114
Holmstrom et al. 2024 America Pro ECG Database records IB NR NR 714
Butler et al. 2024 Japan Retro ECG Database records PB NR NR 461
Kenet et al. 2023 America Retro ECG Expert consensus PB 211 NR NR
Panjaitan et al. 2023 Indonesia Retro ECG Database records PB 20 NR NR
Caesarendra et al. 2022 Brunei Retro ECG Database records IB NR 90 NR
Parsi et al. 2021 Ireland Retro ECG Database records IB 135 NR NR
Shi et al. 2020 China Retro ECG Database records IB 40 NR NR
Murugappan et al. 2020 Kuwait Retro ECG Database records PB NR 18 NR
Ebrahimzadeh et al. 2019 Iran Retro ECG Database records IB 35 NR NR
Devi et al. 2019 India Retro ECG Database records PB 18 NR NR
Lai et al. 2019 China Retro ECG Database records PB NR 28 NR
Calderon et al. 2019 Spain Retro ECG Database records PB NR 20 NR
Rodriguez et al. 2019 Germany Retro ECG Expert consensus PB 14 NR NR
Khazaei et al. 2018 Iran Retro ECG Database records IB 40 NR NR
Ebrahimzadeh et al. 2017 Iran Retro ECG Database records IB 35 NR NR
Houshyarifar et al. 2017 Iran Retro ECG Database records PB 23 NR NR
Fujita et al. 2016 Japan Retro ECG Database records IB 21 NR NR
Fairooz et al. 2016 Saudi Arabia Retro ECG Database records PB NR NR 18
Acharya et al. 2015 Singapore Retro ECG Database records IB NR NR 40
Acharya et al. 2015 Singapore Retro ECG Database records IB 20 NR NR
Ramírez et al. 2015 Spain Retro ECG Expert consensus PB NR NR 49
Murugappan et al. 2015 Malaysia Retro ECG Database records IB 40 NR NR
Murukesan et al. 2014 Malaysia Retro ECG Database records IB 23 NR NR
Ebrahimzadeh et al. 2014 America Retro ECG Database records IB 35 NR NR
Shen et al. 2007 China Retro ECG Database records IB 23 NR NR

Retro retrospective, Pro prospective, ECG electrocardiogram, PB patient-based, IB image-based, NR not reported

Table 2.

Technical aspects of included studies

Author Year AI method Optimal AI algorithm Heart rate variability (TP FP FN TN) / ECG signal segmentation (TP FP FN TN) / direct input of ECG lead signals (TP FP FN TN)
Urteaga et al. 2025 Deep learning U-Net NR NR NR NR 324 4 8 101 NR NR NR NR
Lu et al. 2025 Deep learning EIANet NR NR NR NR NR NR NR NR 91 19 23 147
Holmstrom et al. 2024 Deep learning NR NR NR NR NR NR NR NR 545 67 169 262
Butler et al. 2024 Deep learning CNN NR NR NR NR NR NR NR NR 318 258 143 1586
Kenet et al. 2023 Machine learning XGBoost 192 345 19 2503 NR NR NR NR NR NR NR NR
Panjaitan et al. 2023 Deep learning CNN 19 1 1 94 NR NR NR NR NR NR NR NR
Caesarendra et al. 2022 Deep learning CNN NR NR NR NR 80 0 10 230 NR NR NR NR
Parsi et al. 2021 Machine learning KNN 120 7 15 119 NR NR NR NR NR NR NR NR
Shi et al. 2020 Machine learning KNN 39 2 1 34 NR NR NR NR NR NR NR NR
Murugappan et al. 2020 Machine learning SVM NR NR NR NR 17 1 1 17 NR NR NR NR
Ebrahimzadeh et al. 2019 Machine learning KNN 30 6 5 29 NR NR NR NR NR NR NR NR
Devi et al. 2019 Machine learning KNN 14 4 4 29 NR NR NR NR NR NR NR NR
Lai et al. 2019 Machine learning RF NR NR NR NR 28 0 0 18 NR NR NR NR
Calderon et al. 2019 Deep learning ANN NR NR NR NR 15 1 1 19 NR NR NR NR
Rodriguez et al. 2019 Machine learning SVM 14 5 0 72 NR NR NR NR NR NR NR NR
Khazaei et al. 2018 Machine learning Decision tree 38 2 2 34 NR NR NR NR NR NR NR NR
Ebrahimzadeh et al. 2017 Deep learning MLP 29 5 6 30 NR NR NR NR NR NR NR NR
Houshyarifar et al. 2017 Machine learning SVM 19 1 4 35 NR NR NR NR NR NR NR NR
Fujita et al. 2016 Machine learning SVM 20 1 1 16 NR NR NR NR NR NR NR NR
Fairooz et al. 2016 Machine learning SVM NR NR NR NR NR NR NR NR 18 0 0 18
Acharya et al. 2015 Machine learning SVM NR NR NR NR NR NR NR NR 40 1 0 35
Acharya et al. 2015 Machine learning KNN 16 1 4 17 NR NR NR NR NR NR NR NR
Ramirez et al. 2015 Machine learning SVM NR NR NR NR NR NR NR NR 27 175 22 373
Murugappan et al. 2015 Machine learning KNN 37 2 3 34 NR NR NR NR NR NR NR NR
Murukesan et al. 2014 Machine learning SVM 21 0 2 18 NR NR NR NR NR NR NR NR
Ebrahimzadeh et al. 2014 Deep learning MLP 29 29 6 6 NR NR NR NR NR NR NR NR
Shen et al. 2007 Deep learning ANN 16 8 7 12 NR NR NR NR NR NR NR NR

TP true positive, TN true negative, FP false positive, FN false negative, NR not reported

Quality assessment with the revised QUADAS-2 tool revealed a notable risk of bias related to patient selection in one study owing to insufficient exclusion criteria: the investigation by Kenet et al. involved patients admitted to a pediatric intensive care unit. No study was categorized as high risk in the index test, reference standard, or analysis domains. Overall, the included studies were of acceptable quality, as illustrated in Fig. 2. Among the 16 outcomes evaluated in our meta-analysis, the GRADE certainty of evidence ranged from very low to high. Notably, the certainty of evidence regarding sensitivity and specificity in the “AI for HRV and ECG signal segmentation” group was assessed as moderate, whereas that for the “AI for direct input of ECG lead signals” group was rated low. Further details of the GRADE assessments are provided in Supplementary Table 3.

Fig. 2.

Fig. 2

Risk of bias and applicability concerns of the included studies using the modified Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool

Diagnostic performance of AI in detecting heart rate variability

In the analysis of HRV, the pooled estimates showed a sensitivity of 0.90 (95% CI: 0.86–0.92) and a specificity of 0.91 (95% CI: 0.83–0.96), with an AUC of 0.93 (95% CI: 0.91–0.95); the 95% prediction contour, representing the expected range of future study results after accounting for between-study heterogeneity, revealed substantial variability, particularly in the specificity dimension (Figs. 3 and 4). Applying Fagan’s nomogram with a pre-test probability of 20%, a positive result raised the post-test probability to 72%, whereas a negative result lowered it to 3% (Fig. 5).

Fig. 3.

Fig. 3

Forest plot of sensitivity and specificity for electrocardiogram-based artificial intelligence in detecting heart rate variability. Squares denote the sensitivity and specificity of each study; horizontal bars indicate the 95% confidence intervals

Fig. 4.

Fig. 4

Summary receiver operating characteristic (SROC) curves for electrocardiogram-based artificial intelligence in detecting heart rate variability

Fig. 5.

Fig. 5

Fagan’s nomogram of electrocardiogram-based artificial intelligence diagnostic performance in patients with heart rate variability

Substantial heterogeneity in AI performance was noted, with I² values of 58.01% for sensitivity and 93.73% for specificity. To quantify heterogeneity more appropriately for diagnostic meta-analyses, we calculated the variance components (τ²) from the bivariate random-effects model, which yielded τ² = 0.126 for sensitivity and τ² = 1.878 for specificity; the notably higher τ² for specificity reflects considerable between-study variation in specificity estimates. For the heterogeneity of sensitivity, meta-regression identified year (2015–2025 vs before 2015), AI method (deep learning vs machine learning), analysis (image-based vs patient-based), reference standard (database records vs expert consensus), region (Asia vs non-Asia), validation method (cross-validation vs hold-out validation), prediction interval (<10 min vs ≥10 min), and data sources (database vs hospital) as significant contributors (P < 0.05) (Supplementary Fig. 1). For the heterogeneity of specificity, year (2015–2025 vs before 2015) was the main contributor.

Subgroup analyses further revealed that for sensitivity, year was the only significant factor (P < 0.05), whereas for specificity multiple factors showed significant associations, including region, AI method, year, validation method, and prediction interval (all P < 0.05) (Table 3). In addition, we identified three studies that may have driven the heterogeneity; after excluding them, heterogeneity fell markedly (sensitivity I² from 58.01% to 28.21%; specificity I² from 93.73% to 39.99%) (Supplementary Fig. 2).

Table 3.

Subgroup analysis of artificial intelligence performance in internal validation cohorts for the detection of heart rate variability

Subgroup Studies, n Sensitivity (95% CI) Subgroup difference P value Specificity (95% CI) Subgroup difference P value
Reference standard 0.38 0.79
 Database records 14 0.89 (0.85–0.93) 0.93 (0.86–0.97)
 Expert consensus 2 0.92 (0.86–0.96) 0.91 (0.69–0.98)
Analysis 0.42 0.55
 Image-based 11 0.90 (0.85–0.93) 0.91 (0.82–0.96)
 Patient-based 5 0.92 (0.87–0.95) 0.94 (0.85–0.98)
Region 0.25 0.00
 Asia 12 0.90 (0.86–0.94) 0.97 (0.93–0.99)
 Non-Asia 4 0.87 (0.80–0.91) 0.68 (0.42–0.86)
AI method 0.09 0.00
 Deep learning 4 0.85 (0.75–0.91) 0.80 (0.63–0.91)
 Machine learning 12 0.91 (0.88–0.93) 0.94 (0.90–0.97)
Year 0.04 0.00
 2015–2025 3 0.92 (0.87–0.95) 0.95 (0.84–0.99)
 Before 2015 13 0.90 (0.85–0.93) 0.92 (0.84–0.96)
Validation method 0.55 0.02
 Cross validation 10 0.90 (0.85–0.93) 0.90 (0.80–0.95)
 Hold-out validation 6 0.92 (0.84–0.96) 0.96 (0.90–0.98)
Prediction interval 0.11 0.00
 <10 min 11 0.89 (0.85–0.92) 0.88 (0.73–0.95)
 ≥10 min 5 0.94 (0.87–0.97) 0.99 (0.96–1.00)
Data sources 0.38 0.79
 Database 14 0.89 (0.85–0.93) 0.93 (0.86–0.97)
 Hospital 2 0.92 (0.86–0.95) 0.91 (0.69–0.98)

Additional analysis by algorithm type showed that KNN demonstrated consistent performance across studies (sensitivity: 0.89, I² = 20.74%; specificity: 0.92, I² = 9.08%). In contrast, SVM and CNN models, while achieving high point estimates (SVM sensitivity: 0.94; CNN specificity: 0.99), exhibited substantial heterogeneity (I² > 90%), suggesting limited generalizability. ANN showed moderate performance with considerable variability (Supplementary Fig. 3).

Diagnostic performance of AI in detecting ECG signal segmentation and direct input of ECG lead signals

The comprehensive analysis of the different ECG analysis approaches revealed promising but highly variable diagnostic performance, with significant heterogeneity precluding robust quantitative pooling. For ECG signal segmentation (n = 5 studies), individual studies demonstrated a wide range of sensitivity (0.89 to 1.00) and specificity (0.81 to 1.00) (Supplementary Fig. 4A). The SROC analysis (Supplementary Fig. 5A) indicated a summary operating point with high sensitivity and specificity and an AUC of 0.99. However, the considerable width of the 95% prediction contour reflects substantial uncertainty in the expected performance of future studies, consistent with the significant statistical heterogeneity (I² = 73.39%) observed in the forest plot. Based on these potentially unstable summary estimates, Fagan’s nomogram analysis (Supplementary Fig. 6A) suggested that, at a pre-test probability of 20%, a positive AI result could increase the post-test probability to 94%. Similarly, for direct input of ECG lead signals (n = 6 studies), performance estimates varied considerably, with sensitivity spanning 0.61 to 0.97 and specificity ranging from 0.75 to 0.97 (Supplementary Fig. 4B). The SROC curve (Supplementary Fig. 5B) showed good overall discriminatory power (AUC = 0.95), but with noticeably wider confidence and prediction contours than the segmentation approach. This extensive prediction region aligns with the high heterogeneity (I² = 93.73%) found in the quantitative synthesis. Consequently, the corresponding Fagan’s nomogram analysis (Supplementary Fig. 6B), which indicated a post-test probability of 71% for positive results at the same pre-test probability, should be interpreted with caution given the substantial uncertainty in the underlying summary estimates. Because of the small number of studies, these pooled estimates should be regarded as preliminary and hypothesis-generating.

Publication bias

Deeks’ funnel plot asymmetry testing indicated no substantial publication bias for HRV-based AI (P = 0.87) (Fig. 6). Likewise, no notable publication bias was detected for ECG signal segmentation or direct input of ECG lead signals (P = 0.40 and P = 0.20, respectively) (Supplementary Fig. 7).

Fig. 6.

Fig. 6

Deeks’ funnel plot was used to evaluate publication bias of electrocardiogram-based artificial intelligence in detecting heart rate variability. P < 0.05 was considered statistically significant

Discussion

We systematically identified and summarized machine learning (ML) and deep learning (DL) models that use the ECG to predict SCD and conducted exploratory analyses of sources of heterogeneity. AI can derive and analyze features from intricate, high-dimensional electrophysiological signals, potentially enabling the identification of complex relationships with SCD [44]. Before AI models can be implemented in clinical settings, however, several issues must be addressed: the evidence shows substantial heterogeneity in model performance, electrophysiological features, and study designs, and relatively few primary studies applied standardized quality assessments such as QUADAS-2 and GRADE. Addressing these methodological limitations will be crucial for translating AI models into clinical practice [45].

When compared with previous meta-analyses, our study introduces several methodological and analytical advancements. Lee et al. conducted a meta-analysis of 102 studies examining the application of AI technology in wearable devices across various cardiovascular diseases [46]. However, their work did not assess SCD as a specific clinical endpoint, nor did it include subgroup analyses or GRADE assessment of the certainty of evidence. In contrast, our meta-analysis was devoted specifically to evaluating the diagnostic performance of AI models using the ECG to predict SCD. Beyond conventional metrics, we also compared AI algorithms with respect to validation methods, prediction intervals, and other characteristics to make the evaluation more comprehensive. We employed well-defined disease and inclusion criteria and systematically evaluated three representative AI approaches: HRV analysis, ECG signal segmentation, and direct ECG lead input. We also applied the QUADAS-2 and GRADE tools, enhancing the methodological rigor and evidentiary reliability of our findings. Nevertheless, our study shares some limitations, most notably substantial heterogeneity, whose potential sources we explored and adjusted for where possible. Furthermore, our study provides several improvements over a recent meta-analysis by Kolk et al. [47]. First, in assessing heterogeneity we applied not only I² but also τ², as I² may ignore the correlation between sensitivity and specificity. Second, we used Fagan’s nomogram to quantify the clinical utility of AI models under different pre-test probabilities, providing a more practical framework for translating diagnostic performance into clinical decision-making.

Our investigation into the sources of heterogeneity revealed a complex and inconsistent picture, underscoring that the high and largely unexplained heterogeneity makes the pooled results a statistical summary of a disparate literature rather than a generalizable performance metric. While meta-regression identified multiple potential contributors to sensitivity (including study year, AI method, and reference standard), subgroup analyses yielded discordant results. This inconsistency, likely stemming from methodological differences and limited statistical power [48], means that the true sources of variation remain elusive. Furthermore, the interpretability of AI models in clinical scenarios is a recognized challenge: despite the high performance observed in recent studies, their inherent “black-box” nature may limit deployment in clinical settings [49]. Critically, sensitivity analysis demonstrated the fragility of our results: excluding only three methodologically distinct studies (Kenet et al. (2023), with a pediatric ICU population, and two early studies [27, 41] with potentially obsolete device models) precipitated a dramatic reduction in heterogeneity (e.g., I² for specificity dropped from 93.73% to 39.99%). This profound sensitivity undermines the robustness of the summary estimates and confirms that the impressive pooled performance metrics are unstable artifacts, heavily dependent on a few specific studies. These estimates therefore cannot be considered reliable benchmarks for clinical practice. The persistent, multifactorial heterogeneity necessitates a fundamental shift toward prospective, standardized, and rigorously validated study designs before clinically meaningful performance metrics can be established.

Analysis by algorithm type revealed distinct performance patterns; KNN demonstrated relatively consistent performance across studies, whereas SVM and CNN models, despite achieving high performance metrics in specific settings, exhibited substantially higher heterogeneity that may limit their generalizability. These findings suggest a potential trade-off between consistency and peak performance in algorithm selection for SCD prediction. The potential clinical implications of the AI-ECG performance estimates were explored methodologically using Fagan’s nomogram under a hypothetical pre-test probability of 20%. However, it is crucial to emphasize that these calculations are dependent on the unstable and heterogeneous summary estimates presented earlier. The Fagan’s nomogram analysis suggested that a positive result from the AI model analyzing ECG signal segmentation could theoretically increase the post-test probability to 94%. Similarly, the analysis for HRV-based and direct ECG input models yielded theoretical probability shifts to 71–72% for positive results, while negative results across all models consistently reduced the probability to 1–3%. These striking numerical results, while mathematically derived from our meta-analytic estimates, are built upon a chain of evidence with inherent limitations, including retrospective data, selective reporting of best-performing models, data reconstruction, and heterogeneous pooling. Therefore, these probability shifts should be interpreted not as established clinical benchmarks, but as hypothetical illustrations.

Limitations

This study has several limitations. First, the retrospective design and small sample sizes of the included studies limit generalizability, while extreme heterogeneity makes the pooled estimates statistically fragile and clinically non-generalizable. Second, exploratory analyses of heterogeneity sources yielded inconsistent results, leaving the true causes of variation unconfirmed. Third, selective inclusion of best-performing models likely introduces optimism bias, so the pooled metrics reflect idealized performance rather than real-world clinical utility. Finally, the low-to-moderate certainty of the evidence suggests that the impressive performance metrics reflect optimized research conditions rather than reliable clinical benchmarks.

Future directions

Prospective, multicenter studies with larger samples are needed to validate these findings. Future work should further investigate sources of heterogeneity, report all developed models rather than only the best-performing ones, and adopt standardized study designs to better guide clinical application.

Conclusion

This meta-analysis demonstrates that the current evidence for AI-based ECG analysis in SCD prediction is preliminary and derived from biased and heterogeneous retrospective data. Consequently, the impressive summary performance metrics must be interpreted as artifacts of idealized research settings rather than guarantees of clinical utility. To bridge this gap, future research must prioritize prospective, multicenter studies employing standardized methodologies to validate the generalizability and robustness of these models in real-world clinical environments.

Supplementary Information

Supplementary Material 1 (3.5MB, docx)

Acknowledgements

Not applicable.

Authors’ contributions

SH conceived and designed the study. SH, MD, and ZW extracted and analyzed the data, while SH and MD wrote the first version of the manuscript. All authors contributed to the manuscript and approved the final version for submission.

Funding

Doctoral Start-up Foundation of Liaoning Province (2021-BS-039).

Data availability

The original findings of this research are included in the article. For additional inquiries, please contact the corresponding authors.

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors report no commercial or financial interests that could be interpreted as potential conflicts of interest in conducting this research.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Shuang He and Ming Du contributed equally to this work and should be considered co-first authors.

Bo Luan and Na Duan contributed equally to this work and should be considered co-corresponding authors.

Contributor Information

Bo Luan, Email: luanbo2016@163.com.

Na Duan, Email: duanna202508@163.com.

References

  • 1.Wong CX, Brown A, Lau DH, Chugh SS, Albert CM, Kalman JM, et al. Epidemiology of sudden cardiac death: global and regional perspectives. Heart Lung Circ. 2019;28(1):6–14. [DOI] [PubMed] [Google Scholar]
  • 2.Zipes DP, Wellens HJJ. Sudden cardiac death. Circulation. 1998;98(21):2334–51. [DOI] [PubMed] [Google Scholar]
  • 3.Marijon E, Narayanan K, Smith K, Barra S, Basso C, Blom MT, et al. The Lancet Commission to reduce the global burden of sudden cardiac death: a call for multidisciplinary action. Lancet. 2023;402(10405):883–936. [DOI] [PubMed] [Google Scholar]
  • 4.Chugh SS. Early identification of risk factors for sudden cardiac death. Nat Rev Cardiol. 2010;7(6):318–26. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.McLaughlin MG, Zimetbaum PJ. Electrocardiographic predictors of arrhythmic death. Ann Noninvasive Electrocardiol. 2006;11(4):327–37. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Abdelghani SA, Rosenthal TM, Morin DP. Surface electrocardiogram predictors of sudden cardiac arrest. Ochsner J. 2016;16(3):280–9. [PMC free article] [PubMed] [Google Scholar]
  • 7.Holmstrom L, Chugh H, Nakamura K, Bhanji Z, Seifer M, Uy-Evanado A, et al. An ECG-based artificial intelligence model for assessment of sudden cardiac death risk. Commun Med. 2024;4(1):17. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Zhang J, Liu A, Gao M, Chen X, Zhang X, Chen X. ECG-based multi-class arrhythmia detection using spatio-temporal attention-based convolutional recurrent neural network. Artif Intell Med. 2020;106:101856. [DOI] [PubMed] [Google Scholar]
  • 9.Martínez-Sellés M, Marina-Breysse M. Current and future use of artificial intelligence in electrocardiography. J Cardiovasc Dev Dis. 2023. 10.3390/jcdd10040175. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Panjaitan F, Nurmaini S, Partan RU. Accurate prediction of sudden cardiac death based on heart rate variability analysis using convolutional neural network. Medicina (Kaunas). 2023. 10.3390/medicina59081394. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Kolk MZH, Ruipérez-Campillo S, Wilde AAM, Knops RE, Narayan SM, Tjong FVY. Prediction of sudden cardiac death using artificial intelligence: current status and future directions. Heart Rhythm. 2025;22(3):756–66. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Chung CT, Lee S, King E, Liu T, Armoundas AA, Bazoukis G, et al. Clinical significance, challenges and limitations in using artificial intelligence for electrocardiography-based diagnosis. International Journal of Arrhythmia. 2022;23(1):24. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Liu Z. A review of artificial intelligence in electrocardiogram recognition. Preprints. 2025;2025040799. 10.20944/preprints202504.0799.v1
  • 14.Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JP, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ. 2009;339:b2700. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–36. [DOI] [PubMed] [Google Scholar]
  • 16.Wolff RF, Moons KGM, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. 2019;170(1):51–8. [DOI] [PubMed] [Google Scholar]
  • 17.Jones CM, Athanasiou T. Summary receiver operating characteristic curve analysis techniques in the evaluation of diagnostic tests. Ann Thorac Surg. 2005;79(1):16–20. [DOI] [PubMed] [Google Scholar]
  • 18.Lin L. Comparison of four heterogeneity measures for meta-analysis. J Eval Clin Pract. 2020;26(1):376–84. [DOI] [PubMed] [Google Scholar]
  • 19.Acharya UR, Fujita H, Sudarshan VK. An integrated index for detection of sudden cardiac death using discrete wavelet transform and nonlinear features. Knowl Based Syst. 2015;83:149–58. [Google Scholar]
  • 20.Acharya UR, Fujita H, Sudarshan VK, Ghista DN, Lim WJE, Koh JEW. Automated prediction of sudden cardiac death risk using Kolmogorov complexity and recurrence quantification analysis features extracted from HRV signals. IEEE Int Conf Syst Man Cybern. 2015:1110–1115.
  • 21.Butler L, Ivanov A, Celik T, Karabayir I, Chinthala L, Tootooni MS, et al. Time-dependent ECG-AI prediction of fatal coronary heart disease: a retrospective study. J Cardiovasc Dev Dis. 2024. 10.3390/jcdd11120395. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Caesarendra W, Hishamuddin TA, Lai DTC, Husaini A, Nurhasanah L, Glowacz A, et al. An embedded system using convolutional neural network model for online and real-time ECG signal classification and prediction. Diagnostics. 2022. 10.3390/diagnostics12040795. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Calderon A PA, Valente J. ECG feature extraction and ventricular fibrillation (VF) prediction using data mining techniques. IEEE 32nd International Symposium on Computer-Based Medical Systems. 2019:14–19.
  • 24.Devi RTH, Kumar D. A novel multi-class approach for early stage prediction of sudden cardiac death. Biocybern Biomed Eng. 2019;39(3):586–98. [Google Scholar]
  • 25.Ebrahimzadeh E, Foroutan A, Shams M, Baradaran R, Rajabion L, Joulani M, et al. An optimal strategy for prediction of sudden cardiac death through a pioneering feature-selection approach from HRV signal. Comput Methods Programs Biomed. 2019;169:19–36. [DOI] [PubMed] [Google Scholar]
  • 26.Ebrahimzadeh E, Manuchehri MS, Amoozegar S, Araabi BN, Soltanian-Zadeh H. A time local subset feature selection for prediction of sudden cardiac death from ECG signal. Med Biol Eng Comput. 2018;56(7):1253–70. [DOI] [PubMed] [Google Scholar]
  • 27.Ebrahimzadeh E, Pooyan M, Bijar A. A novel approach to predict sudden cardiac death (SCD) using nonlinear and time-frequency analyses from HRV signals. PLoS One. 2014;9(2):e81896. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Fairooz TKH. SVM classification of CWT signal features for predicting sudden cardiac death. Biomed Phys Eng Express. 2016. 10.1088/2057-1976/2/2/025006. [Google Scholar]
  • 29.Fujita H, Acharya UR, Sudarshan VK, et al. Sudden cardiac death (SCD) prediction based on nonlinear heart rate variability features and SCD index. Appl Soft Comput. 2016;43:510–9. [Google Scholar]
  • 30.Houshyarifar VAM. Early detection of sudden cardiac death using Poincaré plots and recurrence plot-based features from HRV signals. Turk J Electr Eng Comput Sci. 2017;25:1541–53. [Google Scholar]
  • 31.Kenet AL, Pemmaraju R, Ghate S, Raghunath S, Zhang Y, Yuan M, et al. A pilot study to predict cardiac arrest in the pediatric intensive care unit. Resuscitation. 2023;185:109740. [DOI] [PubMed] [Google Scholar]
  • 32.Khazaei MRK, Goshvarpour A, Ahmadzadeh M. Early detection of sudden cardiac death using nonlinear analysis of heart rate variability. Biocybern Biomed Eng. 2018;38(4):931–40. [Google Scholar]
  • 33.Lai DZY, Zhang X, Su Y, Bin Heyat MB. An automated strategy for early risk identification of sudden cardiac death by using machine learning approach on measurable arrhythmic risk markers. IEEE Access. 2019;7:94701–16. [Google Scholar]
  • 34.Lu SC, Chen GY, Liu AS, Sun JT, Gao JW, Huang CH, et al. Deep learning-based electrocardiogram model (EIANet) to predict emergency department cardiac arrest: development and external validation study. J Med Internet Res. 2025;27:e67576. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Murugappan MML, Omar I, Khatun S, Murugappan S. Time domain features based sudden cardiac arrest prediction using machine learning algorithms. J Med Imag Health Inform. 2015;5(6):1267–71. [Google Scholar]
  • 36.Murugappan MML, Jerritta S, Adeli H. Sudden cardiac arrest (SCA) prediction using ECG morphological features. Arab J Sci Eng. 2020;46(2):947–61. [Google Scholar]
  • 37.Murukesan LMM, Iqbal M, Saravanan K. Machine learning approach for sudden cardiac arrest prediction based on optimal heart rate variability features. J Med Imaging Health Inform. 2014;4(4):521–32. [Google Scholar]
  • 38.Parsi ABD, Glavin M, Jones E. Heart rate variability feature selection method for automated prediction of sudden cardiac death. Biomed Signal Process Control. 2021. 10.1016/j.bspc.2020.102310. [Google Scholar]
  • 39.Ramirez JMV, Minchole A, et al. Automatic SVM classification of sudden cardiac death and pump failure death from autonomic and repolarization ECG markers. J Electrocardiol. 2015;48(4):551–7. [DOI] [PubMed] [Google Scholar]
  • 40.Rodriguez J, Schulz S, Giraldo BF, Voss A. Risk stratification in idiopathic dilated cardiomyopathy patients using cardiovascular coupling analysis. Front Physiol. 2019;10:841. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Shen TW, Shen HP, Lin CH, Ou YL. Detection and prediction of sudden cardiac death (SCD) for personal healthcare. Annu Int Conf IEEE Eng Med Biol Soc. 2007;2007:2575–8. [DOI] [PubMed] [Google Scholar]
  • 42.Shi M, He H, Geng W, Wu R, Zhan C, Jin Y, et al. Early detection of sudden cardiac death by using ensemble empirical mode decomposition-based entropy and classical linear features from heart rate variability signals. Front Physiol. 2020;11:118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Urteaga J, Elola A, Herráez D, Norvik A, Unneland E, Bhardwaj A, et al. A deep learning model for QRS delineation in organized rhythms during in-hospital cardiac arrest. Int J Med Inform. 2025;196:105803. [DOI] [PubMed] [Google Scholar]
  • 44.Mehari T, Strodthoff N. Self-supervised representation learning from 12-lead ECG data. Comput Biol Med. 2022;141:105114. [DOI] [PubMed] [Google Scholar]
  • 45.Attia ZI, Noseworthy PA, Lopez-Jimenez F, Asirvatham SJ, Deshmukh AJ, Gersh BJ, et al. An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction. Lancet. 2019;394(10201):861–7. [DOI] [PubMed] [Google Scholar]
  • 46.Lee S, Chu Y, Ryu J, Park YJ, Yang S, Koh SB. Artificial intelligence for detection of cardiovascular-related diseases from wearable devices: a systematic review and meta-analysis. Yonsei Med J. 2022;63(Suppl):S93-s107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Kolk MZH, Deb B, Ruipérez-Campillo S, Bhatia NK, Clopton P, Wilde AAM, et al. Machine learning of electrophysiological signals for the prediction of ventricular arrhythmias: systematic review and examination of heterogeneity between studies. EBioMedicine. 2023;89:104462. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Spineli LM, Pandis N. Problems and pitfalls in subgroup analysis and meta-regression. Am J Orthod Dentofacial Orthop. 2020;158(6):901–4. [DOI] [PubMed] [Google Scholar]
  • 49.Huang Z, Yang E, Shen J, Gratzinger D, Eyerer F, Liang B, et al. A pathologist-AI collaboration framework for enhancing diagnostic accuracies and efficiencies. Nat Biomed Eng. 2025;9(4):455–70. [DOI] [PubMed] [Google Scholar]



Articles from Systematic Reviews are provided here courtesy of BMC
