PLOS Digital Health. 2026 Mar 4;5(3):e0001266. doi: 10.1371/journal.pdig.0001266

Association between deep learning–based atrial fibrillation burden and in-hospital mortality

Yongseop Lee 1,#, Yujee Chang 2,#, Jihoon Seo 2, Jung Ah Lee 1, Jung Ho Kim 1, Jin Young Ahn 1, Su Jin Jeong 1, Jun Yong Choi 1, Joon-Sup Yeom 1, Nam Su Ku 1,*, Dukyong Yoon 2,3,4,*
Editor: Iqram Hussain 5
PMCID: PMC12959658  PMID: 41779734

Abstract

Despite its clinical significance, research on atrial fibrillation (AF) burden as a dynamic, real-time predictor of adverse outcomes in patients with critical illness is lacking. This study examined the association between high AF burden and in-hospital mortality in critically ill patients, using intensive care unit (ICU) data from the Medical Information Mart for Intensive Care III (MIMIC-III; 2001–2012) and Yongin Severance Hospital (2021–2023). Electrocardiogram (ECG) waveform data were analyzed using deep learning models to calculate AF burden. Adult ICU patients were included; patients aged ≥90 years and those with an AF burden >0.9 (i.e., >90%) were excluded. AF burden was defined as the proportion of analyzed ECG segments classified as AF during ICU admission, and a high burden was defined as ≥7.0%. Logistic regression and machine learning models were used to assess the association between AF burden and in-hospital mortality and to evaluate the contribution of AF burden to mortality prediction. From the MIMIC-III database, 7,734 patients were included: 5,734 (74.1%) had a low AF burden (median, 0.3%) and 2,000 (25.9%) had a high AF burden (median, 22.5%). High AF burden was associated with significantly higher in-hospital mortality (18.1% vs. 8.6%, P < 0.001) and was an independent risk factor (adjusted odds ratio, 1.63; 95% confidence interval, 1.36–1.95; P < 0.001). In machine learning models, AF burden was a major contributor to mortality prediction, and the best-performing model achieved an area under the curve of 0.86. AF burden may serve as a dynamic marker for real-time alerts of clinical deterioration and for risk stratification in critically ill patients.

Author summary

When people become critically ill and are admitted to the intensive care unit, irregular heart rhythms such as atrial fibrillation are common and can be dangerous. In the past, most research has looked at atrial fibrillation simply as present or absent. However, this approach ignores how much time a patient actually spends in this rhythm. In our study, we measured the total “burden” of atrial fibrillation, meaning the percentage of time a patient’s heart was in this rhythm during their stay in the intensive care unit. We analyzed data from over 7,700 patients in a large, publicly available intensive care database from the United States and confirmed our results using data from a Korean hospital. We found that patients with a high atrial fibrillation burden had a significantly higher risk of dying in the hospital. Using deep learning and machine learning methods, we also showed that atrial fibrillation burden was an important factor in predicting patient outcomes, alongside age, sepsis, and use of a ventilator. Because heart rhythm monitoring is already part of routine care in intensive care units, our approach could allow doctors to identify high-risk patients in real time, without extra cost or procedures, and potentially guide early interventions.

Introduction

Atrial fibrillation (AF) is a common arrhythmia in critically ill patients. The incidence of new-onset AF in patients with critical illness ranges from 4.5% to 15.0%. The risk increases with advanced age, acute respiratory failure, and sepsis; in patients with septic shock, the incidence reaches 46% [1,2]. New-onset AF is associated with an increased risk of in-hospital mortality and a prolonged length of stay [2,3].

Most studies have evaluated AF as a binary entity (present or absent). However, AF burden (the total amount of time spent in AF) has recently been suggested to be a more informative predictor of AF-related adverse outcomes than the mere presence of AF [4]. Importantly, AF burden is a dynamic marker that can be captured accurately only through continuous, real-time monitoring rather than intermittent electrocardiogram assessments. Prior studies relying on spot recordings or manual review could not reflect temporal fluctuations in AF episodes.

Previously, technology for processing large amounts of electrocardiogram (ECG) data was lacking. Recent advances in deep learning have enabled the automatic classification of heart rhythms at the cardiologist level, facilitating accurate and effective analysis of large volumes of ECG monitoring data [5]. Leveraging these advances, deep learning models now enable continuous and automatic computation of AF burden from full-length ICU ECG streams, offering a unique approach compared with conventional intermittent monitoring methods. However, this capability has not been explored in critically ill populations.

In parallel, deep learning has also shown strong performance in other continuous biosignal and IoT-based monitoring environments, where sensor-fused data streams are analyzed to detect clinically meaningful physiologic patterns [6]. These developments highlight the broader technological foundation supporting our approach and further emphasize the potential of AI-based, real-time signal analysis for improving clinical risk stratification.

Despite the clinical importance of AF in patients with critical illness, research on the AF burden in this population is lacking. Traditionally, scoring systems, such as the Acute Physiology and Chronic Health Evaluation II (APACHE II), Sequential Organ Failure Assessment (SOFA), and Simplified Acute Physiology Score (SAPS), have been widely used to predict the prognosis of patients in the intensive care unit (ICU) [7,8]. However, these methods are limited by their inability to reflect real-time changes in a patient’s condition.

This study aimed to examine the association between a high AF burden and in-hospital mortality in patients with critical illness. Additionally, the study sought to determine the contribution of AF burden to in-hospital mortality using machine learning. We also aimed to determine the utility of real-time ECG data in predicting patient mortality in the ICU. An overview of this study is provided in Fig 1.

Fig 1. Study overview.


Abbreviations: MIMIC-III, Medical Information Mart for Intensive Care III; AF, atrial fibrillation; ICU, intensive care unit; CCI, Charlson Comorbidity Index; SOFA, Sequential Organ Failure Assessment; ML, machine learning; SHAP, SHapley Additive exPlanations.

Results

Study population

From the MIMIC-III database, 57,786 patients were admitted to the ICU, of whom 8,455 had ECG data available for analysis (Fig 2A). Of these, 721 patients were excluded because of age or suspected persistent AF: 16 were aged <18 years, 332 were aged ≥90 years, and 373 had an AF burden >0.9. After exclusion, 7,734 patients were included in the analysis; 5,734 (74.1%) and 2,000 (25.9%) had low and high AF burden, respectively.

Fig 2. Flowchart of the study.


(A) MIMIC-III (training dataset) and (B) Yongin Severance dataset (external validation dataset) flowcharts. Abbreviations: ICU, intensive care unit; EKG, electrocardiogram; AF, atrial fibrillation; MIMIC-III, Medical Information Mart for Intensive Care III.

The median age of the high AF burden group was 74 years (IQR: 64–81 years), which was significantly higher than that of the low AF burden group (60 years [IQR: 49–71 years]; Table 1). The sex distribution was not significantly different between the two groups, with the proportion of male patients being 57.0% and 58.8% in the low and high AF burden groups, respectively.

Table 1. Clinical characteristics of patients with critical illness stratified by atrial fibrillation burden.

| Characteristic | MIMIC-III total (n = 7,734) | MIMIC-III low AF burden (n = 5,734) | MIMIC-III high AF burden (n = 2,000) | P-value | Yongin Severance total (n = 3,428) | Yongin Severance low AF burden (n = 2,573) | Yongin Severance high AF burden (n = 855) | P-value |
|---|---|---|---|---|---|---|---|---|
| Age (years) | 64.0 (52.0–75.0) | 60.0 (49.0–71.0) | 74.0 (64.0–81.0) | <0.001 | 68.0 (57.0–79.0) | 65.0 (54.0–76.0) | 78.0 (67.0–83.0) | <0.001 |
| Male sex | 4,444 (57.5%) | 3,268 (57.0%) | 1,176 (58.8%) | 0.159 | 2,087 (60.8%) | 1,804 (70.1%) | 502 (58.7%) | 0.188 |
| Admission type | | | | | | | | |
| Elective admission | 1,191 (15.4%) | 875 (15.3%) | 316 (15.8%) | 0.564 | 906 (26.4%) | 769 (29.9%) | 137 (16.0%) | <0.001 |
| Emergency department | 6,543 (84.6%) | 4,859 (84.7%) | 1,684 (84.2%) | 0.564 | 2,522 (73.6%) | 1,804 (70.1%) | 718 (84.0%) | <0.001 |
| Admission unit | | | | | | | | |
| MICU | 2,232 (28.9%) | 1,684 (29.4%) | 548 (27.4%) | 0.094 | 2,137 (62.3%) | 1,531 (59.5%) | 606 (70.9%) | <0.001 |
| SICU | 1,823 (23.6%) | 1,459 (25.4%) | 364 (18.2%) | <0.001 | 1,291 (37.7%) | 1,042 (40.5%) | 249 (29.1%) | <0.001 |
| CCU | 1,680 (21.7%) | 1,152 (20.1%) | 528 (26.4%) | <0.001 | – | – | – | – |
| CSRU | 1,308 (16.9%) | 873 (15.2%) | 435 (21.8%) | <0.001 | – | – | – | – |
| TSICU | 691 (8.9%) | 566 (9.9%) | 125 (6.2%) | <0.001 | – | – | – | – |
| AF burden (%) | 1.0 (0.1–7.6) | 0.3 (0.0–1.5) | 22.5 (12.8–44.8) | <0.001 | 1.1 (0.1–7.0) | 0.4 (0.1–1.7) | 23.6 (12.6–46.8) | <0.001 |
| Comorbidities | | | | | | | | |
| Diabetes mellitus | 2,202 (28.5%) | 1,595 (27.8%) | 607 (30.4%) | 0.031 | 2,024 (59.0%) | 1,475 (57.3%) | 549 (64.2%) | <0.001 |
| Congestive heart failure | 2,132 (27.6%) | 1,306 (22.8%) | 826 (41.3%) | <0.001 | 779 (22.7%) | 496 (19.3%) | 283 (33.1%) | <0.001 |
| Chronic kidney disease | 1,300 (16.8%) | 868 (15.1%) | 432 (21.6%) | <0.001 | 521 (15.2%) | 314 (12.2%) | 207 (24.2%) | <0.001 |
| Chronic liver disease | 992 (12.8%) | 787 (13.7%) | 205 (10.2%) | <0.001 | 286 (8.3%) | 227 (8.8%) | 59 (6.9%) | 0.078 |
| Chronic pulmonary disease | 1,532 (19.8%) | 1,085 (18.9%) | 447 (22.4%) | <0.001 | 459 (13.4%) | 301 (11.7%) | 158 (18.5%) | <0.001 |
| Cerebrovascular disease | 1,130 (14.6%) | 827 (14.4%) | 303 (15.2%) | 0.428 | 780 (22.8%) | 601 (23.4%) | 179 (20.9%) | 0.143 |
| Dementia | 401 (5.2%) | 253 (4.4%) | 148 (7.4%) | <0.001 | 52 (1.5%) | 29 (1.1%) | 23 (2.7%) | 0.001 |
| Connective tissue disease | 256 (3.3%) | 170 (3.0%) | 86 (4.3%) | 0.004 | 41 (1.2%) | 33 (1.3%) | 8 (0.9%) | 0.419 |
| Cancer | 547 (7.1%) | 430 (7.5%) | 117 (5.8%) | 0.013 | 414 (12.1%) | 322 (12.5%) | 92 (10.8%) | 0.173 |
| Charlson comorbidity index | 2.0 (1.0–4.0) | 2.0 (1.0–4.0) | 2.0 (1.0–4.0) | <0.001 | 6.0 (3.0–10.0) | 5.0 (3.0–9.0) | 8.0 (5.0–12.0) | <0.001 |
| SOFA score | 3.0 (2.0–6.0) | 3.0 (2.0–5.0) | 4.0 (2.0–7.0) | <0.001 | 4.0 (2.0–8.0) | 3.0 (1.0–7.0) | 7.0 (4.0–11.0) | <0.001 |
| Ventilator use | 3,905 (50.5%) | 2,812 (49.0%) | 1,093 (54.6%) | <0.001 | 935 (27.3%) | 596 (23.2%) | 339 (39.6%) | <0.001 |
| Renal replacement therapy | 368 (4.8%) | 249 (4.3%) | 119 (6.0%) | 0.004 | 395 (11.5%) | 202 (7.9%) | 193 (22.6%) | <0.001 |
| Vasopressor use | 1,279 (16.7%) | 813 (14.4%) | 466 (23.6%) | <0.001 | – | – | – | – |
| Sepsis | 2,379 (30.8%) | 1,685 (29.4%) | 694 (34.7%) | <0.001 | 883 (25.8%) | 542 (21.1%) | 341 (39.9%) | <0.001 |

AF, atrial fibrillation; MICU, medical intensive care unit; SICU, surgical intensive care unit; CCU, coronary care unit; CSRU, cardiac surgery recovery unit; TSICU, trauma surgical intensive care unit; SOFA, Sequential Organ Failure Assessment; MIMIC-III, Medical Information Mart for Intensive Care III.

The median AF burden for all patients was 1.0% (IQR: 0.1%–7.6%). The median AF burden was 0.3% (IQR: 0.0%–1.5%) and 22.5% (IQR: 12.8%–44.8%) in the low and high AF burden groups, respectively. Significant differences in demographics, admission type, comorbidities, and severity indices were observed between the two groups. The high AF burden group had a higher Charlson comorbidity index, SOFA score, and proportion of patients with sepsis than did the low AF burden group. The characteristics of the study population are presented in Table 1.

Outcomes

In the MIMIC-III database, the in-hospital mortality rate was significantly higher in patients with a high AF burden (18.1%) than in those with a low AF burden (8.6%; P < 0.001; Table 2). In-hospital mortality rates increased sequentially with AF burden, ranging from 8.0% in the lowest-burden group to 20.3% in the highest-burden group (S1 Fig). Overall mortality was significantly higher in the high AF burden group (48.4%) than in the low AF burden group (30.0%; P < 0.001).

Table 2. Comparison of outcomes in patients with critical illness stratified by atrial fibrillation burden.

| Outcome | MIMIC-III low AF burden (n = 5,734) | MIMIC-III high AF burden (n = 2,000) | P-value | Yongin Severance low AF burden (n = 2,573) | Yongin Severance high AF burden (n = 855) | P-value |
|---|---|---|---|---|---|---|
| In-hospital mortality | 495 (8.6%) | 363 (18.1%) | <0.001 | 285 (11.1%) | 268 (31.3%) | <0.001 |
| Overall mortality | 1,721 (30.0%) | 967 (48.4%) | <0.001 | 372 (14.5%) | 319 (37.3%) | <0.001 |
| Length of ICU stay (days) | 2.1 (1.2–4.3) | 3.0 (1.5–6.0) | <0.001 | 2.0 (2.0–4.0) | 3.0 (2.0–6.0) | <0.001 |
| Length of hospital stay (days) | 7.0 (4.0–12.0) | 8.0 (4.0–13.0) | <0.001 | 12.0 (5.0–12.0) | 15.0 (7.0–29.0) | <0.001 |

AF, atrial fibrillation; ICU, intensive care unit; MIMIC-III, Medical Information Mart for Intensive Care III.

Additionally, patients with a high AF burden had a significantly longer median ICU stay (3.0 days [IQR: 1.5–6.0 days]) than did the low AF burden counterparts (2.1 days [IQR: 1.2–4.3 days]; P < 0.001). The median length of hospital stay was also significantly longer in the high AF burden group (8.0 days [IQR: 4.0–13.0 days]) than in the low AF burden group (7.0 days [IQR: 4.0–12.0 days]; P < 0.001).

Impact of AF burden on in-hospital mortality

Logistic regression analysis was conducted to evaluate the association between AF burden and in-hospital mortality (Table 3). Univariable analysis showed a significant association between high AF burden and in-hospital mortality (OR, 2.35; 95% CI, 2.03–2.72; P < 0.001). Multivariable analysis was performed to adjust for confounders, including age, sex, admission type, admission unit, Charlson Comorbidity Index, SOFA score, ventilator use, renal replacement therapy, vasopressor use, and sepsis. After adjustment, a high AF burden remained an independent risk factor for in-hospital mortality in patients with critical illness (adjusted OR, 1.63; 95% CI, 1.36–1.95; P < 0.001). Furthermore, restricted cubic spline analysis demonstrated a significant association between AF burden and in-hospital mortality (P < 0.001), with significant nonlinearity in this relationship (P < 0.001; S2 Fig).
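The univariable odds ratio for high AF burden can be checked directly from the MIMIC-III counts in Table 2 (363 deaths among 2,000 patients with high AF burden vs 495 among 5,734 with low AF burden). For a single binary predictor, the logistic regression OR equals the 2 × 2 table OR, and its Wald interval equals the Woolf (log-scale) interval, so this minimal sketch should reproduce the reported 2.35 (2.03–2.72). The function name is illustrative, not taken from the authors' code.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale Wald) confidence interval.

    a: events among exposed, b: non-events among exposed,
    c: events among unexposed, d: non-events among unexposed.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi


# MIMIC-III counts from Table 2: deaths and survivors by AF burden group
high_deaths, high_n = 363, 2000
low_deaths, low_n = 495, 5734
or_, lo, hi = odds_ratio_woolf(high_deaths, high_n - high_deaths,
                               low_deaths, low_n - low_deaths)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")  # prints OR 2.35 (95% CI 2.03-2.72)
```

The adjusted OR (1.63) cannot be reproduced this way, because it requires fitting the multivariable model on patient-level data.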

Table 3. Risk factors for in-hospital mortality in patients with critical illness.

| Variable | Univariable OR (95% CI) | P-value | Multivariable aOR (95% CI) | P-value |
|---|---|---|---|---|
| Age | 1.03 (1.03–1.04) | <0.001 | 1.03 (1.03–1.04) | <0.001 |
| Male sex | 0.92 (0.80–1.06) | 0.241 | 0.90 (0.77–1.06) | 0.220 |
| MICU | 1.75 (1.51–2.02) | <0.001 | 1.10 (0.92–1.31) | 0.305 |
| SICU | 0.96 (0.81–1.13) | 0.595 | – | – |
| CCU | 0.96 (0.80–1.14) | 0.637 | – | – |
| CSRU | 0.35 (0.26–0.45) | <0.001 | 0.23 (0.17–0.32) | <0.001 |
| TSICU | 1.02 (0.79–1.30) | 0.865 | – | – |
| Admission via ED | 4.23 (3.09–5.96) | <0.001 | 2.54 (1.79–3.71) | <0.001 |
| Charlson Comorbidity Index | 1.18 (1.15–1.21) | <0.001 | 1.09 (1.06–1.13) | <0.001 |
| High AF burden | 2.35 (2.03–2.72) | <0.001 | 1.63 (1.36–1.95) | <0.001 |
| SOFA score | 1.30 (1.27–1.33) | <0.001 | 1.21 (1.18–1.25) | <0.001 |
| Ventilator use | 4.20 (3.55–4.99) | <0.001 | 3.90 (3.20–4.76) | <0.001 |
| Renal replacement therapy | 2.39 (1.84–3.08) | <0.001 | 0.92 (0.67–1.27) | 0.629 |
| Vasopressor use | 3.85 (3.30–4.49) | <0.001 | 1.06 (0.86–1.32) | 0.575 |
| Sepsis | 3.71 (3.21–4.30) | <0.001 | 1.36 (1.14–1.62) | <0.001 |

All variance inflation factors were <5, and the Hosmer–Lemeshow test indicated no significant lack of fit (P = 0.203).

OR, odds ratio; aOR, adjusted odds ratio; AF, atrial fibrillation; MICU, medical intensive care unit; SICU, surgical intensive care unit; CCU, coronary care unit; CSRU, cardiac surgery recovery unit; TSICU, trauma surgical intensive care unit; ED, emergency department; SOFA, Sequential Organ Failure Assessment.

Prediction of in-hospital mortality using machine learning models based on AF burden

Machine learning models were trained to predict in-hospital mortality, and SHAP values were computed to quantify feature contributions. Several models, including random forest, XGBoost, and logistic regression, were trained; the random forest classifier achieved the best performance (S1 Table), with an accuracy of 0.788, sensitivity of 0.815, specificity of 0.760, positive predictive value (PPV) of 0.775, F1 score of 0.795, and area under the curve (AUC) of 0.86 (Fig 3A).
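All of these metrics except the AUC are computed from a 2 × 2 confusion matrix at a fixed decision threshold. The minimal sketch below spells out the standard definitions; the confusion-matrix counts in the example are hypothetical, and only the last line uses numbers from the text, to check that the reported F1 score (0.795) is consistent, up to rounding, with the reported PPV (0.775) and sensitivity (0.815).

```python
def f1_from_ppv_sensitivity(ppv, sensitivity):
    """F1 is the harmonic mean of precision (PPV) and recall (sensitivity)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def binary_metrics(tp, fp, tn, fn):
    """Threshold-dependent metrics from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)                # true-positive rate (recall)
    specificity = tn / (tn + fp)                # true-negative rate
    ppv = tp / (tp + fp)                        # positive predictive value (precision)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "accuracy": accuracy,
        "f1": f1_from_ppv_sensitivity(ppv, sensitivity),
    }


# Hypothetical counts, for illustration only
print(binary_metrics(tp=80, fp=25, tn=75, fn=20))

# Consistency check against the reported random forest PPV and sensitivity
print(round(f1_from_ppv_sensitivity(0.775, 0.815), 3))  # about 0.794, vs reported 0.795
```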

Fig 3. Performance and feature importance of in-hospital mortality prediction models in patients with critical illness.


(A) MIMIC-III (training dataset) AUROC, (B) Yongin Severance Hospital (external validation dataset) AUROC, (C) MIMIC-III (training dataset) SHAP values, (D) Yongin Severance Hospital (external validation dataset) SHAP values. Abbreviations: MIMIC-III, Medical Information Mart for Intensive Care III; AUROC, area under the receiver operating characteristic curve; SHAP, SHapley Additive exPlanations.

In the SHAP analysis, AF burden contributed substantially to the prediction of in-hospital mortality (Figs 3C and S3A). AF burden ranked fifth in feature importance, after SOFA score, ventilator use, age, and sepsis. Higher AF burden values were associated with higher predicted mortality, indicating that AF burden contributes meaningfully to predicting patient outcomes.
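SHAP values are Shapley values of a model's prediction, distributing the difference between the prediction and a reference value across features. For tree ensembles such as a random forest, SHAP implementations typically use the TreeSHAP algorithm; the sketch below instead enumerates every coalition exactly for a tiny toy model, with "absent" features set to baseline values (a common interventional simplification). The toy risk score, feature values, and baseline are hypothetical and are not the study's model.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition keep their baseline value. This needs
    2**n model evaluations per feature, so it only suits a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    # Hypothetical score: SOFA, ventilator use (0/1), AF burden (fraction), plus one interaction
    sofa, vent, af = z
    return 0.1 * sofa + 0.5 * vent + 2.0 * af + 0.3 * vent * af


x = [8.0, 1.0, 0.30]
baseline = [3.0, 0.0, 0.01]
phi = shapley_values(toy_risk, x, baseline)
# Efficiency property: contributions sum to f(x) - f(baseline)
print([round(p, 3) for p in phi], round(sum(phi), 6), round(toy_risk(x) - toy_risk(baseline), 6))
```

The interaction term is split between the ventilator and AF burden features, which is how Shapley values attribute joint effects.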

External validation

For the external validation dataset from Yongin Severance Hospital, 3,428 eligible patients were included in the analysis (Fig 2B). Among them, 2,573 (75.1%) and 855 (24.9%) patients had low and high AF burdens, respectively. The two groups differed significantly in several baseline characteristics, including age, sex, admission type, admission unit, and comorbidities (Table 1). The median AF burden in all patients was 1.1% (IQR: 0.1%–7.0%). The median AF burden was 0.4% (IQR: 0.1%–1.7%) and 23.6% (IQR: 12.6%–46.8%) in the low and high AF burden groups, respectively. The high AF burden group had significantly higher in-hospital (31.3% vs. 11.1%; P < 0.001) and overall mortality (37.3% vs. 14.5%; P < 0.001) rates than the low AF burden group.

The performance of the machine learning model in predicting in-hospital mortality was evaluated using the Yongin Severance Hospital external validation dataset. The model showed an accuracy of 0.754, sensitivity of 0.850, specificity of 0.736, PPV of 0.382, F1 score of 0.528, and AUC of 0.87 (Fig 3B). Discrimination was comparable to that in the MIMIC-III training dataset (AUC, 0.86), although PPV and F1 score were lower. The SHAP values also showed that AF burden had a substantial impact on the model predictions (Figs 3D and S3B).
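Unlike sensitivity and specificity, PPV and accuracy depend on outcome prevalence. Assuming the external-validation metrics were computed at the cohort's natural prevalence of in-hospital mortality ((285 + 268) / 3,428 ≈ 16.1%, from Table 2), Bayes' rule reproduces the reported PPV (0.382) and accuracy (0.754) from the reported sensitivity and specificity. The 50% prevalence case is shown only to illustrate the effect; the text does not state whether the MIMIC-III evaluation set was class-balanced.

```python
def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """Bayes' rule: probability of the outcome given a positive prediction."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def accuracy_at_prevalence(sensitivity, specificity, prevalence):
    """Accuracy is a prevalence-weighted average of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


# External cohort (Table 2): 285 + 268 in-hospital deaths among 3,428 patients
prevalence = (285 + 268) / 3428
sens, spec = 0.850, 0.736
print(round(ppv_at_prevalence(sens, spec, prevalence), 3))       # 0.382
print(round(accuracy_at_prevalence(sens, spec, prevalence), 3))  # 0.754

# Same sensitivity and specificity at a hypothetical 50% prevalence
print(round(ppv_at_prevalence(sens, spec, 0.5), 3))
```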

Discussion

We investigated the association between a high AF burden and in-hospital mortality in patients with critical illness. Patients with a high AF burden had a significantly increased risk of in-hospital mortality. About one-quarter of patients had a high AF burden, and in-hospital mortality exceeded 20% among those with the highest AF burden (>23.9%). We introduced a method to calculate AF burden by applying a deep learning model to continuously recorded ECG waveform data in the ICU. Furthermore, the findings were validated using external data from Yongin Severance Hospital, supporting the robustness and generalizability of the results.

Although AF has long been recognized as a marker of poor prognosis in patients with critical illness, it has primarily been understood as a binary entity [9]. However, recent studies have evaluated the impact of AF burden—the proportion of time spent in AF during monitoring—and have shown that a high AF burden is significantly associated with subsequent AF diagnosis and ischemic stroke [10,11]. In the study by Lancini et al., a threshold of ≥7% was used to define high AF burden, corresponding to the median AF burden of their cohort [12]. The authors noted that this cutoff was used as an operational threshold for binary analyses rather than a clinically validated value. Importantly, they demonstrated that AF burden was a strong independent predictor of subsequent AF diagnosis, with risk increasing progressively across higher burden levels—supporting the interpretation of AF burden as a continuous risk factor.

Similarly, in our study, a high AF burden was significantly associated with in-hospital mortality among patients with critical illness. We also used the 7% threshold applied by Lancini et al. [12] to distinguish high AF burden; however, this value served only as a reference point for binary analysis, as our objective was not to propose a fixed clinical cutoff but to evaluate the continuous association between AF burden and mortality risk. As shown in S1 and S2 Figs, mortality increased steadily with higher AF burden, indicating that AF burden behaves as a continuous risk factor. Moreover, restricted cubic spline analysis showed significant nonlinearity in the relationship between AF burden and mortality. This nonlinear pattern supports the use of flexible approaches, including our machine learning–based mortality prediction model, which can capture complex nonlinear relationships that linear models cannot.
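For reference, a restricted cubic spline replaces AF burden in the regression with a small set of basis columns: the curve is cubic between knots and constrained to be linear below the first and above the last knot. The sketch below uses Harrell's parameterization with a common scaling by the squared knot range; the knot locations in the usage are hypothetical, since the text does not report the number or placement of knots. Nonlinearity is then typically tested jointly on the coefficients of the nonlinear columns (all columns except `x`).

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis (Harrell's parameterization).

    Returns [x, S_1(x), ..., S_{k-2}(x)] for k knots.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def pos3(u):
        # Truncated cubic (u)_+^3
        return max(u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2   # keeps the nonlinear terms on roughly the scale of x
    denom = t[-1] - t[-2]
    basis = [x]
    for j in range(k - 2):
        term = (pos3(x - t[j])
                - pos3(x - t[-2]) * (t[-1] - t[j]) / denom
                + pos3(x - t[-1]) * (t[-2] - t[j]) / denom)
        basis.append(term / scale)
    return basis


# Hypothetical knots for AF burden in percent
knots = [0.1, 1.0, 7.0, 25.0]
for burden_pct in (0.05, 2.0, 15.0, 60.0):
    print(burden_pct, [round(v, 4) for v in rcs_basis(burden_pct, knots)])
```

With 4 knots this yields 3 columns, so the spline term costs 3 degrees of freedom in the logistic model.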

A possible explanation for the observed association is that AF burden may serve as a surrogate marker of underlying illness severity and systemic stress, given its frequent occurrence in the setting of heightened inflammatory responses, vasopressor use, electrolyte disturbances, and advanced organ dysfunction [13]. Additionally, AF burden may contribute directly to clinical deterioration, as sustained AF can eliminate effective atrial contraction, reduce diastolic filling, and increase ventricular rates, which together can impair cardiac output and worsen hemodynamic stability in vulnerable ICU patients [13]. These mechanisms support the clinical plausibility of the association between higher AF burden and increased mortality. Hence, AF may be better understood as a continuous quantity, the AF burden, than as a binary entity.

Although numerous mortality prediction models have been proposed, most rely exclusively on static clinical variables such as age, sepsis, and SOFA score, which are collected once or intermittently and do not capture rapid physiologic deterioration [7,8]. A key innovation of our study is the incorporation of a continuously quantified AF burden derived from deep learning analysis of ICU ECG waveforms. By transforming AF from a binary event into a time-dependent physiologic marker, our model incorporates high-resolution information on cardiac rhythm instability that existing models do not assess. Importantly, AF burden remained independently associated with mortality after adjustment for traditional predictors, and its importance was reproduced in an external ICU cohort. These findings suggest that continuous monitoring data can add a physiologic dimension to mortality prediction beyond conventional risk scores.

Building on this advantage, real-time AF burden monitoring could be integrated into ICU clinical decision-support systems to facilitate earlier detection of clinical deterioration. Dynamic AF burden thresholds could trigger automated alerts, prompting clinicians to evaluate hemodynamic status, reassess rate- or rhythm-control strategies, and consider anticoagulation when appropriate. Incorporating AF burden into existing ICU dashboards could enable continuous risk stratification and help identify patients whose cardiovascular stability is worsening.
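One way to operationalize such alerts is a rolling AF burden over the most recent classified segments. The sketch below is hypothetical: the window (360 segments of 10 s, i.e., 1 h) and the 7% alert threshold are illustrative choices that were not evaluated in this study, which computed AF burden over the entire ICU stay.

```python
from collections import deque


class RollingAFBurden:
    """Rolling AF burden over the most recent `window` analyzed 10-s segments.

    Hypothetical alert logic: window length and threshold are illustrative only.
    """

    def __init__(self, window=360, threshold=0.07):
        self.labels = deque(maxlen=window)  # 1 = AF, 0 = non-AF
        self.threshold = threshold
        self.af_count = 0

    @property
    def burden(self):
        return self.af_count / len(self.labels) if self.labels else 0.0

    def update(self, is_af):
        """Add one classified segment; return True when the alert condition is met."""
        if len(self.labels) == self.labels.maxlen:
            self.af_count -= self.labels[0]  # the oldest label is evicted by append()
        label = int(bool(is_af))
        self.labels.append(label)
        self.af_count += label
        # Alert only once the window is full, so a few early AF segments cannot trigger it
        return len(self.labels) == self.labels.maxlen and self.burden >= self.threshold


monitor = RollingAFBurden(window=360, threshold=0.07)
# Segments excluded for poor quality or pacing would simply not be passed to update()
```

Keeping a running count makes each update O(1), so the burden can be refreshed every 10 s without rescanning the window.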

In the SHAP analysis, this distinction was clearly reflected in the feature contributions. Higher SOFA scores, mechanical ventilation use, and sepsis were strong positive predictors of mortality, consistent with their roles as established markers of critical illness severity [7,8]. AF burden also emerged as an important contributor, functioning as a marker derived from continuous monitoring rather than a static admission-time variable. Increasing AF burden was associated with steadily higher predicted mortality risk, suggesting that sustained or frequent AF episodes reflect ongoing hemodynamic instability during the ICU stay. These results further support the added value of variables derived from continuous monitoring in mortality prediction models.

Deep learning offers a promising approach for automating ECG feature extraction, with convolutional and recurrent neural networks demonstrating robustness in detecting AF [14]. Traditional convolutional networks with many layers often suffer from vanishing or exploding gradients, a problem that ResNet addresses through skip (residual) connections. These connections allow the model to learn residual functions and train deeper networks efficiently by preserving information across layers without degrading the learning process [15]. ResNet-based models have achieved high accuracy in AF detection. Andreotti et al. used ResNet-34 with data augmentation on single-lead ECGs and reported a high F1 score [16]. Guan et al. added hidden attention to ResNet, effectively capturing spatiotemporal ECG features and improving both accuracy and interpretability [17]. In a previous study, SE-ResNet was compared with ResNet as a baseline at different depths (18, 34, 50, 101, and 152 layers), and SE-ResNet showed better classification performance than ResNet at every depth [18]. In the present study, SE-ResNet likewise outperformed ResNet, and SE-ResNet-34 was used to train the final AF classification model. Beyond AF detection performance, deep learning has also demonstrated substantial utility in other continuous biosignal and IoT-based monitoring environments, where sensor-fused physiologic streams are analyzed to capture subtle, clinically meaningful patterns [6]. This expanding technological foundation supports the broader methodological rationale of our study and underscores how AI-based waveform analysis can address gaps left by traditional, intermittently measured clinical variables.
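The gradient argument for residual connections can be made concrete with the standard residual formulation of He et al., written here for an identity shortcut with matching dimensions and with gradients as row vectors:

```latex
\mathbf{y} = \mathcal{F}(\mathbf{x};\{W_i\}) + \mathbf{x},
\qquad
\frac{\partial \mathcal{L}}{\partial \mathbf{x}}
  = \frac{\partial \mathcal{L}}{\partial \mathbf{y}}
    \left( \frac{\partial \mathcal{F}}{\partial \mathbf{x}} + I \right)
  = \frac{\partial \mathcal{L}}{\partial \mathbf{y}}\,
    \frac{\partial \mathcal{F}}{\partial \mathbf{x}}
  + \frac{\partial \mathcal{L}}{\partial \mathbf{y}}
```

Even when the Jacobian of the residual branch is small, the second term passes the upstream gradient through unchanged, so stacking many blocks does not reduce the signal to a product of small Jacobians. When the number of channels or the temporal resolution changes between stages, the shortcut is usually a learned projection (e.g., a 1 × 1 convolution), and the identity term becomes that projection's Jacobian.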

Importantly, this deep learning–based approach overcomes a key limitation of traditional AF assessment in the ICU, which often relies on intermittent rhythm checks or manual review of telemetry strips. Such methods cannot capture the rapid, high-frequency fluctuations in cardiac rhythm that occur during critical illness. By automatically analyzing every 10-second ECG segment, our model provides continuous, high-resolution quantification of AF burden that would be impractical to obtain manually, thereby reducing sampling bias and clinician workload in a busy ICU environment.

We propose an approach for real-time risk stratification and prediction of clinical deterioration using AF burden. While ICU scoring systems such as SOFA, APACHE, and SAPS are widely used, they rely on manual input of complex patient characteristics and include discontinuously measured variables, limiting their ability to reflect real-time deterioration [7,8]. In contrast, our approach uses only existing 24-h ECG monitoring data to quantify AF burden, requiring no additional data entry or procedures. A key finding is that AF burden, calculated from ECG waveforms, was associated with patient outcomes and could in principle be updated in real time. Because ECG monitoring is routine in the ICU, this method could enable continuous assessment without additional intervention or monitoring cost.

Patients with a high AF burden may benefit from early and aggressive management, such as antiarrhythmic medications. However, the efficacy of pharmacological agents for AF in patients with critical illness varies. Reported success rates have been mixed for amiodarone (30.0%–95.2%), beta-blockers (31.8%–92.3%), and calcium channel blockers (30.0%–87.1%), with a risk of hypotension, especially in patients with sepsis. Magnesium (55.2%–77.8%) appears safer, with fewer adverse events, particularly as a first-line agent or as an adjunct to amiodarone. Nonetheless, these drugs have side effects, and no study has conclusively shown a survival benefit [19]. Previous studies have treated AF as a binary condition and have not explored rhythm-control strategies in high-risk groups identified by AF burden. This study highlights the need to recognize AF as a continuous rather than a binary variable. Further research is warranted to assess whether additional treatment of patients with a high AF burden and high mortality risk improves prognosis.

This study had several limitations. The cohort included only patients with critical illness in the ICU; therefore, the findings may not be generalizable to general ward patients. AF burden was calculated as a proportion of the total ICU stay, which limits the assessment of its real-time implications. Moreover, we could not determine whether AF burden–guided interventions would improve outcomes. The study also did not identify the specific reason for ICU admission, although ICU type was included. Baseline cardiac history and echocardiographic findings were not available in our dataset and therefore could not be included in the analysis. Future studies should assess whether targeting AF burden can improve outcomes and whether real-time clinical decisions based on AF burden offer prognostic benefits.

In conclusion, a high AF burden was significantly associated with in-hospital mortality. We present a method to calculate AF burden by analyzing continuous ECG monitoring data in the ICU with a deep learning model. AF burden could be used to provide real-time alerts for clinical deterioration in the ICU and to stratify risk in patients with critical illness and AF.

Methods

Study population

This multicenter retrospective study used data from the Medical Information Mart for Intensive Care III (MIMIC-III) database (version 2.2) and Yongin Severance Hospital [20,21]. The MIMIC-III database is a publicly available, de-identified electronic health record database that includes comprehensive clinical information and waveform data for patients admitted to the ICUs of Beth Israel Deaconess Medical Center (BIDMC) in Boston, Massachusetts, between 2001 and 2012. The waveform records contain digitized signals, including ECG, arterial blood pressure, and respiratory signals [22]. The data were randomly collected in an automated manner.

Yongin Severance Hospital, located in Yongin, Republic of Korea, is a 708-bed university-affiliated hospital. The patients included in this study were admitted to its ICU between January 2021 and February 2023. ECG data were extracted via the Severance Data Portal, which provided access to single-lead 10-s ECG recordings. These recordings were acquired from lead II at a sampling rate of 500 Hz by a system capable of real-time monitoring in the ICU. These data were used as an external validation dataset for the machine learning models assessing the predictive value of AF burden for in-hospital mortality.

This study included adult patients (aged ≥18 years) admitted to the ICU of each hospital. Patients aged ≥90 years were excluded because ages ≥90 years are obscured in the MIMIC-III database for de-identification. Patients whose AF burden could not be calculated because of incomplete ECG data were excluded. Patients with an AF burden >0.9 (>90%) were also excluded, because a burden above this level is highly suggestive of persistent AF, defined as AF lasting longer than 7 days and not self-terminating [23]. In such cases, AF is present continuously regardless of short-term changes in clinical condition, making AF burden less appropriate as a dynamic predictor of outcomes. Because formal historical diagnostic information (e.g., past ECG data and physicians' diagnoses) was not available to reliably exclude all persistent AF cases, the >0.9 threshold was applied on the basis of this clinical rationale.
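The inclusion and exclusion rules can be expressed as a short filter. This sketch assumes a hypothetical `IcuPatient` record holding age and AF burden as a fraction (None when it cannot be computed) and checks the criteria in the order of the MIMIC-III flowchart so that each excluded patient receives a single reason; that ordering is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IcuPatient:
    # Hypothetical record layout for illustration
    patient_id: str
    age: float
    af_burden: Optional[float]  # fraction of analyzed segments in AF; None if not computable


def apply_exclusions(patients):
    """Return (included, excluded), where excluded holds (patient_id, reason) pairs."""
    included, excluded = [], []
    for p in patients:
        if p.age < 18:
            excluded.append((p.patient_id, "age < 18"))
        elif p.age >= 90:
            excluded.append((p.patient_id, "age >= 90 (ages obscured in MIMIC-III)"))
        elif p.af_burden is None:
            excluded.append((p.patient_id, "AF burden not computable"))
        elif p.af_burden > 0.9:
            excluded.append((p.patient_id, "AF burden > 0.9 (suspected persistent AF)"))
        else:
            included.append(p)
    return included, excluded
```

Note that a burden of exactly 0.9 is retained, because the exclusion criterion is strictly greater than 0.9.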

AF burden

To calculate AF burden, binary AF-classification convolutional neural network models were trained using public ECG datasets (PTB-XL [24] and the AF 2017 Challenge dataset [25]; S1 and S2 Methods). The architectures evaluated were ResNet-18, ResNet-34, SE-ResNet-18, and SE-ResNet-34. Model performance was evaluated on another public ECG dataset (Shaoxing Hospital ECG dataset [26]; S3 Method). The SE-ResNet-34 model [27,28] demonstrated the best overall performance (accuracy: 0.943, sensitivity: 0.937, specificity: 0.945, AUROC: 0.983; S2 Table) and was therefore selected as the final AF detection model for computing AF burden from ICU ECG waveforms.

The SE-ResNet-34 architecture consisted of an initial 1-D convolutional layer (kernel size 7, 64 channels, stride 2), batch normalization, ReLU activation, and a max-pooling layer (kernel size 3, stride 2), followed by four residual stages with 3, 4, 6, and 3 blocks and channel widths of 64, 128, 256, and 512, respectively. In each residual block, two 1-D convolutional layers (kernel size 3) with batch normalization and ReLU were followed by a squeeze-and-excitation (SE) block, which applied global average pooling and two fully connected layers to adaptively reweight channel-wise features. A global average pooling layer over the temporal dimension and a final fully connected layer produced the binary AF prediction.

To exclude pacemaker rhythms, another SE-ResNet-34 model was trained for pacemaker rhythm prediction using the PTB-XL and Lobachevsky University Electrocardiography Database (S4 Method) [29]. Its performance was evaluated using the MIT-BIH Arrhythmia Database [30] (S5 and S6 Methods, S3 Table).
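The squeeze-and-excitation step described above can be sketched in plain Python for a single feature map of shape channels × time. This is an illustrative sketch, not the authors' implementation: the weights, biases, and number of hidden units (C/r for an unstated reduction ratio r) are hypothetical, and in an SE-ResNet block the gated output would then be added to the shortcut branch.

```python
import math


def squeeze_excite(feature_map, w1, b1, w2, b2):
    """Squeeze-and-excitation for one 1-D feature map given as [channels][time].

    w1: hidden x C weights, w2: C x hidden weights (lists of rows); b1, b2: biases.
    Returns the channel-reweighted feature map and the per-channel gates.
    """
    # Squeeze: global average pooling over time gives one descriptor per channel
    z = [sum(channel) / len(channel) for channel in feature_map]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z)) + b)
              for row, b in zip(w1, b1)]
    gates = [1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
             for row, b in zip(w2, b2)]
    # Scale: multiply every time step of channel c by its gate in (0, 1)
    scaled = [[g * v for v in channel] for g, channel in zip(gates, feature_map)]
    return scaled, gates


# Toy example: 2 channels, 1 hidden unit (hypothetical weights)
fmap = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
out, gates = squeeze_excite(fmap, w1=[[1.0, -1.0]], b1=[0.0],
                            w2=[[2.0], [-2.0]], b2=[0.0, 0.0])
print([round(g, 3) for g in gates])
```

Because the gates depend on a summary of the whole segment, the block can emphasize channels that respond to rhythm-level features, which is the motivation usually given for SE over plain ResNet.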

Each patient’s entire ECG waveform record was divided into 10-s segments, and segments with poor signal quality or pacemaker rhythms were excluded from the analysis. For signal quality control, a high-pass filter with a cutoff frequency of 0.5 Hz was applied to remove baseline wander, and a power-line filter was used to eliminate 50-Hz power-line noise [31]. Z-normalization was performed using the mean and standard deviation. After these exclusions, the AF classification model was applied to each remaining segment. The AF burden was calculated as the ratio of segments classified as AF to the total number of analyzed segments during the ICU admission. For the external validation cohort, the identical preprocessing pipeline and AF classification model used for the MIMIC-III cohort were applied without modification, including the same filtering procedures, segmentation strategy, normalization method, and pretrained SE-ResNet-34 classifier. AF burden in the external cohort was therefore computed with the same algorithm, ensuring methodological consistency across cohorts.
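The segmentation and filtering steps can be sketched with SciPy as follows. This is a minimal illustration under stated assumptions, not the authors’ pipeline: the 125-Hz sampling rate, the second-order zero-phase Butterworth high-pass filter, the notch quality factor (Q = 30), and per-segment z-normalization are all assumptions; the article specifies only the 10-s segment length, the 0.5-Hz cutoff, and the 50-Hz power-line filter. The signal-quality and pacemaker-rhythm exclusions, which rely on separate criteria and a separate model, are not shown.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 125.0        # assumed sampling rate (Hz)
SEGMENT_SEC = 10  # 10-s segments, as described in the text


def preprocess_ecg(signal, fs=FS, seg_sec=SEGMENT_SEC, hp_cutoff=0.5,
                   mains_hz=50.0, notch_q=30.0):
    """Split a single-lead ECG into 10-s segments, filter, and z-normalize each one."""
    seg_len = int(round(fs * seg_sec))
    n_seg = len(signal) // seg_len                      # drop a trailing partial segment
    segments = np.asarray(signal, dtype=float)[: n_seg * seg_len].reshape(n_seg, seg_len)

    b_hp, a_hp = butter(2, hp_cutoff, btype="highpass", fs=fs)  # baseline wander
    b_n, a_n = iirnotch(mains_hz, notch_q, fs=fs)               # power-line interference

    out = np.empty_like(segments)
    for i, seg in enumerate(segments):
        y = filtfilt(b_hp, a_hp, seg)   # zero-phase filtering preserves waveform timing
        y = filtfilt(b_n, a_n, y)
        centered = y - y.mean()
        sd = centered.std()
        out[i] = centered / sd if sd > 0 else centered  # z-normalization (flat segment guarded)
    return out


# Example: 35 s of synthetic noise yields three complete 10-s segments.
rng = np.random.default_rng(0)
segs = preprocess_ecg(rng.standard_normal(int(FS * 35)))
print(segs.shape)
```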

Variables and outcome measures

The following variables were extracted from the MIMIC-III database and from the Severance Clinical Research Analysis Portal for Anonymous (SCRAP-A) data of Yongin Severance Hospital: demographic characteristics (age and sex), admission type, ICU admission, comorbidities, SOFA score, ventilation, and renal replacement therapy. The outcome of interest was in-hospital mortality.

In critically ill patients, a high AF burden was defined as ≥7.0% during the ICU stay, following the threshold used in a prior study by Lancini et al. [12]. This cutoff was used to categorize participants into high and low AF burden groups. Additionally, AF burden was categorized into quartiles: Q1, 0.1%–2.4%; Q2, 2.4%–7.0%; Q3, 7.1%–23.9%; and Q4, 24.7%–100% [12]. Sepsis was identified on the basis of International Classification of Diseases, Ninth Revision (ICD-9) and Tenth Revision (ICD-10) codes [32,33]. The Charlson Comorbidity Index was calculated using coding algorithms based on ICD-9-CM and ICD-10 codes [34]. These variables were extracted according to the criteria applicable at ICU admission.
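Once each segment has been classified, the AF burden and the high/low grouping follow directly. In the sketch below, the segment labels ("AF", "non-AF", "excluded") are hypothetical, and excluded segments (poor quality or paced rhythm) are assumed to be removed from the denominator, consistent with the exclusion step described in the AF burden subsection.

```python
from typing import Optional, Sequence

HIGH_BURDEN_THRESHOLD = 0.07  # >= 7.0%, following Lancini et al. [12]


def af_burden(segment_labels: Sequence[str]) -> Optional[float]:
    """Fraction of analyzable 10-s segments classified as AF.

    Labels are hypothetical: "AF", "non-AF", or "excluded" (poor quality or
    paced rhythm). Excluded segments are dropped from the denominator.
    """
    analyzable = [label for label in segment_labels if label != "excluded"]
    if not analyzable:
        return None  # burden cannot be calculated
    n_af = sum(1 for label in analyzable if label == "AF")
    return n_af / len(analyzable)


def burden_group(burden: float) -> str:
    """Dichotomize AF burden at the 7.0% threshold."""
    return "high" if burden >= HIGH_BURDEN_THRESHOLD else "low"


labels = ["AF"] * 7 + ["non-AF"] * 90 + ["excluded"] * 3
print(burden_group(af_burden(labels)))  # -> 'high' (7 of 97 analyzable segments, about 7.2%)
```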

Statistical analysis

Continuous variables were compared between groups using independent two-tailed t-tests or the Mann–Whitney U-test, as appropriate, based on the data distribution. The Shapiro–Wilk test was used to assess normality, with normally distributed variables reported as means ± standard deviations and non-normally distributed variables as medians with interquartile ranges (IQRs). Categorical variables were analyzed using Pearson’s chi-square or Fisher’s exact test, as appropriate.
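The test-selection logic described above might be sketched with SciPy as follows. The α = 0.05 normality cutoff, the requirement that both groups pass the Shapiro–Wilk test before a t-test is used, Student’s t-test with pooled variance (SciPy’s default), and the conventional rule of switching to Fisher’s exact test when any expected 2×2 cell count is below 5 are all assumptions; the article states only that tests were chosen “as appropriate,” and the authors performed these analyses in R.

```python
import numpy as np
from scipy import stats


def compare_continuous(a, b, alpha=0.05):
    """Two-sided t-test if both groups look normal (Shapiro-Wilk), else Mann-Whitney U."""
    normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    if normal:
        return "t-test", stats.ttest_ind(a, b).pvalue
    return "Mann-Whitney U", stats.mannwhitneyu(a, b, alternative="two-sided").pvalue


def compare_categorical(table):
    """Pearson chi-square, or Fisher's exact test for sparse 2x2 tables (assumed rule)."""
    table = np.asarray(table)
    _, p, _, expected = stats.chi2_contingency(table, correction=False)
    if table.shape == (2, 2) and (expected < 5).any():
        return "Fisher exact", stats.fisher_exact(table)[1]
    return "chi-square", p


def summarize(x, normal):
    """Mean +/- SD for normally distributed data, otherwise median (IQR)."""
    x = np.asarray(x, dtype=float)
    if normal:
        return f"{x.mean():.1f} ± {x.std(ddof=1):.1f}"
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return f"{med:.1f} ({q1:.1f}–{q3:.1f})"


# Example with skewed (exponential) data, where normality is clearly rejected.
rng = np.random.default_rng(0)
print(compare_continuous(rng.exponential(1.0, 300), rng.exponential(1.5, 300)))
```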

Univariable logistic regression was used to assess the association between a high AF burden and in-hospital mortality. Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated for all variables. Variables that showed a statistically significant association (P < 0.05) in the univariable analysis were included in the multivariable logistic regression model. Multicollinearity was assessed using variance inflation factors (VIFs), with a VIF >10 indicating problematic multicollinearity. The goodness of fit of the logistic regression model was assessed using the Hosmer–Lemeshow test. In addition, a restricted cubic spline analysis was performed to examine potential nonlinear associations between AF burden and mortality [35]; covariates that remained significant in the multivariable logistic regression model were included in the spline model to adjust for confounding. Statistical analyses were performed using R, version 4.4.0 (The R Foundation for Statistical Computing, Vienna, Austria). Random Forest modeling and SHapley Additive exPlanations (SHAP) analyses were conducted using Python (version 3.10.13) in a Jupyter notebook. The SQL code used for data extraction is available on GitHub (https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iii).
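The analyses were run in R; purely as an illustration, the two model diagnostics named above can be written in a few lines of NumPy/SciPy. The sketch computes each VIF by regressing one covariate on the others (VIF = 1 / (1 − R²)) and the Hosmer–Lemeshow statistic over groups of predicted risk with G − 2 degrees of freedom. Grouping by deciles is the usual convention and is assumed here, since the number of groups is not reported.

```python
import numpy as np
from scipy.stats import chi2


def vif(X):
    """Variance inflation factor for each column of X (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])  # intercept + rest
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow test; returns (statistic, p-value) with groups - 2 df."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)                       # sort by predicted risk
    stat = 0.0
    for idx in np.array_split(order, groups):   # near-equal-size risk groups
        n_g = len(idx)
        observed = y[idx].sum()                 # observed deaths in the group
        expected = p[idx].sum()                 # expected deaths in the group
        denom = expected * (1.0 - expected / n_g)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat, chi2.sf(stat, groups - 2)


# Example: x3 is nearly a copy of x1, so both should show a large VIF.
rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal(1000), rng.standard_normal(1000)
x3 = x1 + 0.05 * rng.standard_normal(1000)
print(np.round(vif(np.column_stack([x1, x2, x3])), 1))
```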

Machine learning modeling

Based on the statistical analyses, machine learning models were developed to further explore the predictive power and feature importance of AF burden for in-hospital mortality. A Random Forest classifier was implemented using the scikit-learn library in Python (version 3.10.13) in a Jupyter notebook. The Random Forest model, an ensemble learning method known for its robustness and its ability to handle complex feature interactions while mitigating overfitting, was selected for its suitability for classification tasks (S1 Table). The dataset was divided into training and test sets in an 8:2 ratio. Model performance was assessed using accuracy, sensitivity, specificity, F1 score, and the area under the receiver operating characteristic curve (AUROC). To assess the importance of AF burden and other features in predicting in-hospital mortality, SHAP values were calculated. SHAP provides a unified measure of feature contributions, quantifying the impact of each variable on the model’s predictions [36]. To facilitate interpretation of how different variables influenced the model’s decisions, the resulting SHAP values were visualized using several types of plots. A list of model features, including their names and definitions, is provided in S4 Table to enhance clarity and reproducibility.
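A minimal scikit-learn sketch of the modeling and evaluation steps is shown below. The hyperparameters (300 trees), stratified splitting, the 0.5 probability threshold, and the synthetic data from `make_classification` are all assumptions for illustration; the actual features are listed in S4 Table, and the authors’ own code is available in their GitHub repository. SHAP values could then be computed from the fitted forest (for example, with the `shap` package’s `TreeExplainer`); that step is omitted to keep the example dependency-light.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split


def evaluate_random_forest(X, y, seed=42):
    """Fit a Random Forest on an 8:2 split and report the metrics named in the text."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)  # stratification is assumed
    model = RandomForestClassifier(n_estimators=300, random_state=seed, n_jobs=-1)
    model.fit(X_tr, y_tr)

    prob = model.predict_proba(X_te)[:, 1]   # predicted probability of in-hospital death
    pred = (prob >= 0.5).astype(int)          # assumed 0.5 decision threshold
    tn, fp, fn, tp = confusion_matrix(y_te, pred, labels=[0, 1]).ravel()
    metrics = {
        "accuracy": accuracy_score(y_te, pred),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": f1_score(y_te, pred, zero_division=0),
        "auroc": roc_auc_score(y_te, prob),
    }
    return model, metrics


# Synthetic, imbalanced stand-in for the real feature matrix (hypothetical data only).
X, y = make_classification(n_samples=2000, n_features=12, n_informative=4,
                           weights=[0.85], random_state=0)
model, metrics = evaluate_random_forest(X, y)
print({k: round(float(v), 3) for k, v in metrics.items()})
```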

For external validation, an independent dataset from Yongin Severance Hospital was used, which was also divided into training and test sets at a ratio of 8:2. To assess the generalizability of the model to external data, performance metrics (accuracy, sensitivity, specificity, F1 score, and AUROC) were recalculated.

Ethics statement

The MIMIC-III database was approved by the institutional review boards of the Massachusetts Institute of Technology and Beth Israel Deaconess Medical Center (BIDMC). The protocols for the external validation dataset were approved by the Institutional Review Board of Yongin Severance Hospital (Approval No. 9-2023-0040). The requirement for informed consent was waived by the board owing to the retrospective analysis of anonymized data obtained from routine evaluations of patients undergoing Holter monitoring.

Supporting information

S1 Method. PTB-XL data.

(DOCX)

pdig.0001266.s001.docx (17.8KB, docx)
S2 Method. AF 2017 Challenge data.

(DOCX)

pdig.0001266.s002.docx (16.5KB, docx)
S3 Method. Shaoxing Hospital data.

(DOCX)

pdig.0001266.s003.docx (15.8KB, docx)
S4 Method. Lobachevsky University Electrocardiography Database (LUDB).

(DOCX)

pdig.0001266.s004.docx (16KB, docx)
S5 Method. MIT-BIH Arrhythmia database.

(DOCX)

pdig.0001266.s005.docx (15.9KB, docx)
S6 Method. Deep-learning model for rhythm classification.

(DOCX)

pdig.0001266.s006.docx (15.2KB, docx)
S1 Table. Performance of in-hospital mortality prediction models in critically ill patients.

(DOCX)

pdig.0001266.s007.docx (15.6KB, docx)
S2 Table. Performance of the atrial fibrillation classification model using external validation dataset.

(DOCX)

pdig.0001266.s008.docx (15.7KB, docx)
S3 Table. Performance of the pacemaker rhythm classification model using external validation dataset.

(DOCX)

pdig.0001266.s009.docx (15.2KB, docx)
S4 Table. Feature name and clinical meaning.

(DOCX)

pdig.0001266.s010.docx (16.8KB, docx)
S1 Fig. Risk of in-hospital mortality by AF burden.

(A) MIMIC-III training dataset; (B) Yongin Severance Hospital external validation dataset.

(DOCX)

pdig.0001266.s011.docx (1.3MB, docx)
S2 Fig. Restricted cubic spline showing the relationship between AF burden and in-hospital mortality in the MIMIC-III dataset.

(DOCX)

pdig.0001266.s012.docx (885.7KB, docx)
S3 Fig. Feature importance. (A) MIMIC-III dataset; (B) Yongin Severance Hospital dataset.

(DOCX)

pdig.0001266.s013.docx (303.9KB, docx)

Data Availability

The Yongin Severance Hospital dataset (external dataset) cannot be shared publicly because it contains potentially identifying and sensitive patient information and is subject to the Personal Information Protection Act and institutional review board (IRB) restrictions. Requests for access to the Yongin Severance Hospital dataset may be directed to the Institutional Review Board of Yonsei University Health System (email: irb@yuhs.ac). The MIMIC-III dataset is publicly available via PhysioNet (MIMIC-III Waveform Database, https://physionet.org/content/mimic3wdb/1.0/), subject to completion of the required data use training and credentialing process. The code used for model development and analysis is publicly available at https://github.com/CMI-Laboratory/AF_burden.

Funding Statement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and ICT of the Republic of Korea (RS-2023-00276320 to D.Y). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Moss TJ, Calland JF, Enfield KB, Gomez-Manjarres DC, Ruminski C, DiMarco JP, et al. New-onset atrial fibrillation in the critically ill. Crit Care Med. 2017;45(5):790–7. doi: 10.1097/CCM.0000000000002325
2. Meierhenrich R, Steinhilber E, Eggermann C, Weiss M, Voglic S, Bögelein D, et al. Incidence and prognostic impact of new-onset atrial fibrillation in patients with septic shock: a prospective observational study. Crit Care. 2010;14(3):R108. doi: 10.1186/cc9057
3. Gandhi S, Litt D, Narula N. New-onset atrial fibrillation in sepsis is associated with increased morbidity and mortality. Neth Heart J. 2015;23(2):82–8. doi: 10.1007/s12471-014-0641-x
4. Chen LY, Chung MK, Allen LA, Ezekowitz M, Furie KL, McCabe P, et al. Atrial fibrillation burden: moving beyond atrial fibrillation as a binary entity: a scientific statement from the American Heart Association. Circulation. 2018;137(20):e623–44. doi: 10.1161/CIR.0000000000000568
5. Hannun AY, Rajpurkar P, Haghpanahi M, Tison GH, Bourn C, Turakhia MP, et al. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nat Med. 2019;25(1):65–9. doi: 10.1038/s41591-018-0268-3
6. Selvan PS, Addula SR, Singh CE, Narayanaperumal M, Marriwala NK, Appathurai A. Deep learning-enabled fetal health classification through sensor-fused IoT. Mobile Radio Communications and 5G Networks: Proceedings of Fifth MRCN 2024; 2025. 157 p.
7. Falcão ALE, Barros AG de A, Bezerra AAM, Ferreira NL, Logato CM, Silva FP, et al. The prognostic accuracy evaluation of SAPS 3, SOFA and APACHE II scores for mortality prediction in the surgical ICU: an external validation study and decision-making analysis. Ann Intensive Care. 2019;9(1):18. doi: 10.1186/s13613-019-0488-9
8. Strand K, Flaatten H. Severity scoring in the ICU: a review. Acta Anaesthesiol Scand. 2008;52(4):467–78. doi: 10.1111/j.1399-6576.2008.01586.x
9. Blythe R, Parsons R, White NM, Cook D, McPhail S. A scoping review of real-time automated clinical deterioration alerts and evidence of impacts on hospitalised patient outcomes. BMJ Qual Saf. 2022;31(10):725–34. doi: 10.1136/bmjqs-2021-014527
10. Doundoulakis I, Nedios S, Zafeiropoulos S, Vitolo M, Della Rocca DG, Kordalis A, et al. Atrial fibrillation burden: stepping beyond the categorical characterization. Heart Rhythm. 2025;22(5):1179–87. doi: 10.1016/j.hrthm.2024.08.051
11. Park YJ, Kim JS, Park K-M, On YK, Park S-J. Subclinical atrial fibrillation burden and adverse clinical outcomes in patients with permanent pacemakers. Stroke. 2021;52(4):1299–308. doi: 10.1161/STROKEAHA.120.031822
12. Lancini D, Tan WL, Guppy-Coles K, Boots R, Prasad S, Atherton J, et al. Critical illness associated new onset atrial fibrillation: subsequent atrial fibrillation diagnoses and other adverse outcomes. Europace. 2023;25(2):300–7. doi: 10.1093/europace/euac174
13. Bosch NA, Cimini J, Walkey AJ. Atrial fibrillation in the ICU. Chest. 2018;154(6):1424–34. doi: 10.1016/j.chest.2018.03.040
14. Xie J, Stavrakis S, Yao B. Automated identification of atrial fibrillation from single-lead ECGs using multi-branching ResNet. Front Physiol. 2024;15:1362185. doi: 10.3389/fphys.2024.1362185
15. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016.
16. Andreotti F, Carr O, Pimentel MA, Mahdi A, De Vos M. Comparing feature-based classifiers and convolutional neural networks to detect arrhythmia from short segments of ECG. 2017 Computing in Cardiology (CINC). IEEE; 2017.
17. Guan Y, An Y, Xu J, Liu N, Wang J. HA-ResNet: residual neural network with hidden attention for ECG arrhythmia detection using two-dimensional signal. IEEE/ACM Trans Comput Biol Bioinform. 2023;20(6):3389–98. doi: 10.1109/TCBB.2022.3198998
18. Park J, Kim JK, Jung S, Gil Y, Choi JI, Son HS. ECG-signal multi-classification model based on squeeze-and-excitation residual neural networks. Appl Sci. 2020;10(18):6495.
19. O’Bryan LJ, Redfern OC, Bedford J, Petrinic T, Young JD, Watkinson PJ. Managing new-onset atrial fibrillation in critically ill patients: a systematic narrative review. BMJ Open. 2020;10(3):e034774. doi: 10.1136/bmjopen-2019-034774
20. Johnson AEW, Pollard TJ, Shen L, Lehman L-WH, Feng M, Ghassemi M, et al. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3:160035. doi: 10.1038/sdata.2016.35
21. Johnson A, Pollard T, Mark R. MIMIC-III clinical database (version 1.4). PhysioNet; 2016.
22. Moody B, Moody G, Villarroel M, et al. MIMIC-III waveform database (version 1.0). PhysioNet; 2020.
23. Van Gelder IC, Rienstra M, Bunting KV, Casado-Arroyo R, Caso V, Crijns HJGM, et al. 2024 ESC Guidelines for the management of atrial fibrillation developed in collaboration with the European Association for Cardio-Thoracic Surgery (EACTS). Eur Heart J. 2024;45(36):3314–414. doi: 10.1093/eurheartj/ehae176
24. Wagner P. PTB-XL, a large publicly available electrocardiography dataset (version 1.0.3); 2022. doi: 10.13026/kfzx-aw45
25. Clifford GD, Liu C, Moody B, Lehman LH, Silva I, Li Q. AF classification from a short single lead ECG recording: the PhysioNet/Computing in Cardiology Challenge 2017. Comput Cardiol. 2017;44.
26. Zheng J. A large scale 12-lead electrocardiogram database for arrhythmia study (version 1.0.0); 2022. doi: 10.13026/wgex-er52
27. Xu J, Leng L, Kim B-G. Gesture recognition and hand tracking for anti-counterfeit palmvein recognition. Appl Sci. 2023;13(21):11795. doi: 10.3390/app132111795
28. Yoo J, Jin Y, Ko B, Kim M-S. k-Labelsets method for multi-label ECG signal classification based on SE-ResNet. Appl Sci. 2021;11(16):7758. doi: 10.3390/app11167758
29. Kalyakulina A, et al. Lobachevsky University Electrocardiography Database; 2021.
30. Moody GB, Mark RG. The impact of the MIT-BIH arrhythmia database. IEEE Eng Med Biol Mag. 2001;20(3):45–50. doi: 10.1109/51.932724
31. Zhao Z-D, Chen Y-Q. A new method for removal of baseline wander and power line interference in ECG signals. 2006 International Conference on Machine Learning and Cybernetics. IEEE; 2006.
32. Mellhammar L, Wollter E, Dahlberg J, Donovan B, Olséen C-J, Wiking PO, et al. Estimating sepsis incidence using administrative data and clinical medical record review. JAMA Netw Open. 2023;6(8):e2331168. doi: 10.1001/jamanetworkopen.2023.31168
33. Angus DC, Linde-Zwirble WT, Lidicker J, Clermont G, Carcillo J, Pinsky MR. Epidemiology of severe sepsis in the United States: analysis of incidence, outcome, and associated costs of care. Crit Care Med. 2001;29(7):1303–10. doi: 10.1097/00003246-200107000-00002
34. Quan H, Sundararajan V, Halfon P, Fong A, Burnand B, Luthi J-C, et al. Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Med Care. 2005;43(11):1130–9. doi: 10.1097/01.mlr.0000182534.19832.83
35. Gauthier J, Wu QV, Gooley TA. Cubic splines to model relationships between continuous variables and outcomes: a guide for clinicians. Bone Marrow Transplant. 2020;55(4):675–80. doi: 10.1038/s41409-019-0679-x
36. Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell. 2020;2(1):56–67. doi: 10.1038/s42256-019-0138-9
PLOS Digit Health. doi: 10.1371/journal.pdig.0001266.r001

Decision Letter 0

Iqram Hussain

13 Nov 2025

Journal Requirements:

1. Please send a completed 'Competing Interests' statement, including any COIs declared by your co-authors. If you have no competing interests to declare, please state "The authors have declared that no competing interests exist". Otherwise please declare all competing interests beginning with the statement "I have read the journal's policy and the authors of this manuscript have the following competing interests:"

2. Your current Financial Disclosure states, “The author(s) received no specific funding for this work.”. However, your funding information on the submission form indicates that you received funding from “Ministry of Science and ICT, South Korea”. Please indicate by return email the full and correct funding information for your study and confirm the order in which funding contributions should appear. Please be sure to indicate whether the funders played any role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

3. If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.

Additional Editor Comments:

Please include a table describing the feature names and their meanings.

There are already many published models on in-hospital mortality; please explain how this study is different or novel compared to prior work. The top predictors mirror what has been repeatedly shown in published works (e.g., sepsis, age, SOFA). This limits the innovative value of your model.

Discuss the clinical significance of the key features highlighted in the SHAP plots; what do these findings imply in a medical context? SHAP and mortality common topics, find some new interpretability methods to interpret the model.

The model architectures are rather basic; consider adding some methodological novelty or advanced comparison.

The image quality in the figures needs improvement (resolution).

Indicate whether you have validated the model using an external dataset.

Reviewers' Comments:

Comments to the Author

1. Does this manuscript meet PLOS Digital Health’s publication criteria?

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

Reviewer #1: 1. In the Introduction and abstract, the authors need to specify AF burden as a dynamic marker produced through real-time monitoring. In the introduction, clearly state the uniqueness of applying deep learning models to continuous and automatic computation of AF burden in critically ill patients compared to those utilizing intermittent monitoring devices.

2. Use consistent notation between the two datasets. For instance, sometimes MIMIC-III is also referred to as "training dataset" and Yongin Severance Hospital as "external validation dataset", which is right, but occasional use of "training dataset" without declaring MIMIC-III (e.g., Line 880) should be removed for clarity.

3. The Data Availability statement indicates that it is not feasible to share the external validation dataset openly due to the Personal Information Protection Act, understandable. However, in simple words, proclaim MIMIC-III as the primary, public dataset and categorically commit that the code used for the analysis (e.g., deployment of the deep learning model) will be shared to ascertain full reproducibility. (The mention of SQL code on GitHub is a solid start, yet the code of the ML/DL model is critical).

4. The discussion acknowledges that the study has a limitation in evaluating if AF burden-directed interventions reduce outcomes. However, more reinforcing the potential mechanism of action (e.g., hemodynamic instability, inflammatory response) linking high AF burden to mortality would strengthen the argument. While the ≥7.0% threshold for "high AF burden" is borrowed from an earlier study, its clinical and biological logic in the setting of this manuscript may be clarified further in the introduction or discussion.

5. It is suggested that the authors need to mention about the technology foundation in the Introduction and Discussion, include a citation on the use of deep learning or AI in healthcare or IoT because it deserves the largest methodology. The following citation from the provided document is extremely relevant to the intersection of deep learning and health monitoring. The suggested reference paper from: “Deep Learning-Enabled Fetal Health Classification Through Sensor-Fused IoT. Mobile Radio Communications and 5G Networks”. This paper suggested explains where the lack of adequate research in AF burden and its clinical importance is expressed, to move to the solution using deep learning to the discussion to reinforce the present arguments in deep learning in healthcare and application of AI in risk stratification.

6. Although deep learning forms the crux of the method, the exposition is primarily based on the existing deep learning architecture (ResNet/SE-ResNet). A brief clarification regarding how this computerized high-frequency load calculation is specifically bridging the limitation of traditional manual or intermittent monitoring in a busy ICU scenario would be beneficial. Enhance Rationale for AF Burden Threshold: Provide a superior, evidence-based rationale in the discussion for employing the ≥7.0% threshold for high AF burden by explaining its clinical relevance or statistical derivation from prior work.

Reviewer #2: Dear Authors,

Thank you for the opportunity to review your manuscript entitled “Association Between Deep Learning–Based Atrial Fibrillation Burden and In-Hospital Mortality.” The study presents a novel and clinically relevant approach to quantifying atrial fibrillation (AF) burden using deep learning analysis of continuous ECG monitoring in critically ill patients. The integration of artificial intelligence with large-scale ICU datasets (MIMIC-III and an external Korean cohort) is impressive and has potential to enhance dynamic risk stratification in critical care settings.

However, several important issues need to be addressed before the paper can be considered for publication.

Methodological Transparency:

Please clarify how AF burden was computed and justify the chosen cutoff (≥7.0%) for defining “high burden.” Was this threshold determined empirically, or derived from prior literature or ROC analysis?

Provide more details on the deep learning model architecture (e.g., SE-ResNet-34), input parameters, and validation metrics (accuracy, sensitivity, specificity) used to ensure reliable AF detection.

Explain how ECG artifacts were managed and why patients with AF burden >0.9 were excluded.

Confounding and Adjustment:

Expand on the rationale for including variables such as SOFA score, sepsis, and ventilator use in the multivariable model. Discuss potential multicollinearity, and report VIFs if available.

Consider adjusting for additional confounders such as vasoactive drug use, baseline cardiac history, or renal replacement therapy.

Model Interpretability and Generalizability:

Provide quantitative data on feature importance (not only SHAP plots) to highlight the predictive contribution of AF burden relative to other variables.

In the external validation cohort, please describe ECG data collection parameters and confirm that preprocessing and classification pipelines were identical to the MIMIC-III cohort.

Clinical Implications:

The discussion could be strengthened by explaining how real-time AF burden monitoring might be integrated into ICU alert systems or decision support tools.

Minor Revisions:

Ensure consistent terminology for AF burden thresholds throughout the text.

Improve figure readability (particularly Fig. 2B and Fig. 3) and update references to include more recent AI–AF burden studies (post-2022).

In summary, this is a promising and methodologically sound study that could significantly contribute to the understanding of AF burden as a dynamic prognostic marker in ICU patients. With greater methodological transparency and clearer articulation of clinical applications, the paper will be substantially strengthened.

Best regards,

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy

Reviewer #1: No

Reviewer #2: Yes: Duaa Abualkhair, Division of Physiotherapy , Department of Applied and Allied Medical Sciences, Faculty of Medicine and Allied Medical Sciences, An-Najah National University, Nablus, Palestine . (www.najah.edu)

**********

Figure resubmission:

Reproducibility: To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

PLOS Digit Health. doi: 10.1371/journal.pdig.0001266.r003

Decision Letter 1

Iqram Hussain

8 Feb 2026

Association Between Deep Learning–Based Atrial Fibrillation Burden and In-Hospital Mortality

PDIG-D-25-00824R1

Dear Dr. Yoon,

We are pleased to inform you that your manuscript 'Association Between Deep Learning–Based Atrial Fibrillation Burden and In-Hospital Mortality' has been provisionally accepted for publication in PLOS Digital Health.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email from a member of our team.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact digitalhealth@plos.org.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Digital Health.

Best regards,

Iqram Hussain, Ph.D.

Academic Editor

PLOS Digital Health

***********************************************************

Additional Editor Comments (if provided):

Reviewer Comments (if any, and for reference):

Reviewer's Responses to Questions

Comments to the Author

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

**********

2. Does this manuscript meet PLOS Digital Health’s publication criteria?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

Reviewer #1: I recommend accepting this manuscript. The authors have thoroughly addressed all the concerns. The updated manuscript significantly enhances the case for utilizing deep learning to assess Atrial Fibrillation (AF) burden in critical care settings.

1. The study goes beyond simply labeling AF as "present or absent." By defining AF burden as a continuous, changing measure, the authors give a clearer picture of how serious the arrhythmia is.

2. The authors applied an advanced SE-ResNet-34 model to analyze continuous ECG data from the ICU. This method effectively uses deep learning for automated analysis and overcomes the limitations of manual checks.

3. The authors have shown that AF burden is an important risk factor for in-hospital mortality (adjusted odds ratio: 1.63; 95% confidence interval: 1.36-1.95). They also demonstrate that this dynamic measure adds predictive value beyond traditional scores like SOFA.

4. The findings were confirmed in an independent group from Yongin Severance Hospital. The results were consistent across different datasets, such as MIMIC-III and the Korean cohort, highlighting the model's relevance in various settings.

5. The authors included a restricted cubic spline analysis, which effectively shows the significant and complex relationship between AF burden and mortality. This provides a stronger reason for understanding AF burden as a continuous measure.

6. The authors have made their deep learning model and analysis code available on GitHub, allowing others to reproduce their findings. This is essential for meaningful research in digital health.

Reviewer #2: The authors have adequately addressed all comments raised by the reviewers. The revisions have improved the clarity and quality of the manuscript, and I am satisfied with the responses provided.

I therefore recommend acceptance of the manuscript for publication.

**********

If the peer review history is published, it will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Duaa Abualkhair

**********

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Method. PTB-XL data.

    (DOCX)

    pdig.0001266.s001.docx (17.8KB, docx)
    S2 Method. AF 2017 Challenge data.

    (DOCX)

    pdig.0001266.s002.docx (16.5KB, docx)
    S3 Method. Shaoxing Hospital data.

    (DOCX)

    pdig.0001266.s003.docx (15.8KB, docx)
    S4 Method. Lobachevsky University Electrocardiography Database (LUDB).

    (DOCX)

    pdig.0001266.s004.docx (16KB, docx)
    S5 Method. MIT-BIH Arrhythmia database.

    (DOCX)

    pdig.0001266.s005.docx (15.9KB, docx)
    S6 Method. Deep-learning model for rhythm classification.

    (DOCX)

    pdig.0001266.s006.docx (15.2KB, docx)
    S1 Table. Performance of in-hospital mortality prediction models in critically ill patients.

    (DOCX)

    pdig.0001266.s007.docx (15.6KB, docx)
    S2 Table. Performance of the atrial fibrillation classification model using external validation dataset.

    (DOCX)

    pdig.0001266.s008.docx (15.7KB, docx)
    S3 Table. Performance of the pacemaker rhythm classification model using external validation dataset.

    (DOCX)

    pdig.0001266.s009.docx (15.2KB, docx)
    S4 Table. Feature name and clinical meaning.

    (DOCX)

    pdig.0001266.s010.docx (16.8KB, docx)
    S1 Fig. Risk of in-hospital mortality by AF burden.

    (A) MIMIC-III training dataset; (B) Yongin Severance Hospital external validation dataset.

    (DOCX)

    pdig.0001266.s011.docx (1.3MB, docx)
    S2 Fig. Restricted cubic spline showing the relationship between AF burden and in-hospital mortality in the MIMIC-III dataset.

    (DOCX)

    pdig.0001266.s012.docx (885.7KB, docx)
    S3 Fig. Feature importance: (A) MIMIC-III dataset; (B) Yongin Severance Hospital.

    (DOCX)

    pdig.0001266.s013.docx (303.9KB, docx)
    Attachment

    Submitted filename: Response to Reviewers.docx

    pdig.0001266.s015.docx (1.2MB, docx)

    Data Availability Statement

    The Yongin Severance Hospital dataset (external dataset) cannot be shared publicly because it contains potentially identifying and sensitive patient information and is subject to the Personal Information Protection Act and institutional review board (IRB) restrictions. Requests for access to the Yongin Severance Hospital dataset may be directed to the Institutional Review Board of Yonsei University Health System (email: irb@yuhs.ac). The MIMIC-III dataset is publicly available via PhysioNet (MIMIC-III Waveform Database, https://physionet.org/content/mimic3wdb/1.0/), subject to completion of the required data use training and credentialing process. The code used for model development and analysis is publicly available at https://github.com/CMI-Laboratory/AF_burden.

