Author manuscript; available in PMC: 2023 Aug 1.
Published in final edited form as: Circ Arrhythm Electrophysiol. 2022 Jul 22;15(8):e010850. doi: 10.1161/CIRCEP.122.010850

Machine Learning-Enabled Multimodal Fusion of Intra-Atrial and Body Surface Signals in Prediction of Atrial Fibrillation Ablation Outcomes

Siyi Tang 1, Orod Razeghi 2, Ridhima Kapoor 1, Mahmood I Alhusseini 1, Muhammad Fazal 1, Albert J Rogers 1, Miguel Rodrigo Bort 1,3, Paul Clopton 1, Paul Wang 1, Daniel Rubin 1, Sanjiv M Narayan 1, Tina Baykaner 1
PMCID: PMC9972736  NIHMSID: NIHMS1823319  PMID: 35867397

Abstract

Background:

Machine learning (ML) is a promising approach to personalize atrial fibrillation (AF) management strategies for patients after catheter ablation. Prior AF ablation outcome prediction studies applied classical ML methods to hand-crafted clinical scores, and none have leveraged intracardiac electrograms (EGM) or 12-lead surface electrocardiograms (ECG) for outcome prediction. We hypothesized that (a) ML models trained on EGM or ECG signals can perform better at predicting patient outcomes after AF ablation than existing clinical scores and (b) multimodal fusion of EGM, ECG, and clinical features can further improve the prediction of patient outcomes.

Methods:

Consecutive patients who underwent catheter ablation between 2015–2017 with panoramic left atrial EGM prior to ablation and clinical follow-up for at least one year following ablation were included. Convolutional neural networks (CNN) and a novel multimodal fusion framework were developed for predicting 1-year AF recurrence after catheter ablation from EGM, ECG signals, and clinical features. The models were trained and validated using 10-fold cross-validation on patient-level splits.

Results:

156 patients (64.5±10.5 years, 74% male, 42% paroxysmal) were analyzed. Using EGM signals alone, the CNN achieved an Area Under the Receiver Operating Characteristics Curve (AUROC) of 0.731, outperforming the existing APPLE scores (AUROC=0.644) and CHA2DS2-VASc scores (AUROC=0.650). Similarly, using 12-lead ECG alone, the CNN achieved an AUROC of 0.767. Combining EGM, ECG, and clinical features, the fusion model achieved an AUROC of 0.859, outperforming single- and dual-modality models.

Conclusions:

Deep neural networks trained on EGM or ECG signals improved the prediction of catheter ablation outcome compared to existing clinical scores, and fusion of EGM, ECG, and clinical features further improved the prediction. This suggests the promise of using ML to help treatment planning for patients after catheter ablation.

Keywords: Machine learning, atrial fibrillation, cardiac electrophysiology, catheter ablation


Introduction

Atrial fibrillation (AF) ablation is the cornerstone of therapy for symptomatic AF, and it helps improve quality of life and prolongs survival in several populations1,2. Improved tools for predicting the success of AF catheter ablation are needed to guide clinicians in better patient selection for this procedure, as well as setting realistic patient expectations following the procedure.

Clinical scores have been developed to predict success after catheter ablation of AF, with an Area Under the Receiver Operating Characteristics Curve (AUROC) of 0.55–0.65 for the majority of the models and rare models reaching an AUROC of 0.75 (refs. 3–5). However, none of these previous predictive scores have incorporated electrophysiological data, which may place specific AF mechanisms within the clinical context to improve predictive accuracy.

We hypothesized that (a) machine learning (ML) models trained on intracardiac electrograms (EGM) or surface electrocardiograms (ECG) signals can perform better at predicting patient outcomes after AF ablation (i.e., 1-year AF recurrence) compared to existing clinical scores and (b) multimodal fusion of EGM, ECG, and clinical features can further improve the prediction of patient outcomes.

Although there are no prior ML-based studies that directly take signals as inputs to predict AF ablation outcomes, recent advances in the use of ML in signal analysis of human rhythm disorders have led to promising preliminary results. For example, ML models were able to predict future ventricular arrhythmia from ventricular signals6. Prior work using ML to predict the success of AF ablation includes estimation of recurrence by predicting shape descriptors directly from magnetic resonance imaging (MRI)7 and combining imaging and clinical biomarkers to predict cryoballoon pulmonary vein isolation (PVI) outcomes8. ML methods and personalized computational modeling have also been used together to predict recurrence following PVI9. In addition, handcrafted features derived from computerized tomography (CT) scans have been shown to be associated with the likelihood of post-ablation AF recurrence10.

Deep neural networks (DNNs) are state-of-the-art ML models that are able to learn complex features directly from large amounts of data without the need for feature engineering11. DNNs have shown promising empirical successes across a wide variety of medical domains12. Unlike previous works using classical ML models8–10, we aim to develop and validate (a) a deep neural network for post-ablation AF recurrence prediction from signals (EGM and ECG) and (b) a multimodal fusion framework that leverages three modalities (EGM, ECG, and patients’ clinical features) to further improve the model performance (Figure 1A).

Figure 1. (A) Overview of our methods.

The inputs come from three modalities: patient EGM signals, ECG signals, and clinical features. A multimodal machine learning model fuses the inputs from the three modalities and outputs prediction of AF recurrence. (B) Details of our multimodal fusion framework. We first trained a model on EGM signals only for AF recurrence prediction, and a separate model on ECG signals only for AF recurrence prediction. We then extracted EGM and ECG features from the respective trained models. Finally, the EGM and ECG features were concatenated with the clinical features, and were subsequently passed to a multimodal fusion model to predict AF recurrence.

Methods

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Subject recruitment

This is a retrospective analysis of consecutive adult patients with paroxysmal or persistent AF who underwent catheter ablation between 2015 and 2017 at a tertiary referral center by 5 providers. To be included, patients were required to have panoramic left atrial electrograms recorded prior to ablation and clinical follow-up for at least 12 months following ablation for accurate assessment of their AF ablation procedure outcomes. All patients had pulmonary vein isolation as a part of the AF ablation procedure; additional ablation lesions at the operating physicians’ discretion were allowed. These comprised ablation of localized AF sources via focal impulse and rotor mapping (FIRM, in 100% of patients), ablation of left atrial linear lesions (in 24% of patients) and cavotricuspid isthmus (CTI) ablation (in 27% of patients).

Clinical and demographic data were obtained from electronic medical records. 12-lead ECGs in sinus rhythm obtained within 1 year of the ablation procedure were included. Patients with no 12-lead ECG available (n=3) were excluded from the ECG-only model; in the fusion models, their ECG features were imputed with the means of the other patients’ ECG features. This study protocol was approved by the Institutional Review Board of Stanford University. Due to the retrospective nature of the study, no informed consent was required. The corresponding author had full access to all the data in the study and took responsibility for its integrity and the data analysis.

Ablation procedure and clinical follow-up

All procedures were performed under general anesthesia. PVI was achieved either by point-by-point radiofrequency ablation with a contact force-sensing 3.5 mm tip irrigated catheter (Biosense Webster; Abbott) or by cryoballoon (Arctic Front, Medtronic). Unipolar panoramic intracardiac signals used for ML analysis were obtained prior to any ablation with a 64-pole basket catheter (FIRMap catheter, Abbott) during atrial fibrillation. If patients presented to the electrophysiology laboratory in normal sinus rhythm, AF was induced with burst pacing.

Patients were followed up routinely in the outpatient setting, and all had evaluations every 3 months for at least 1 year, which included rhythm assessment with 12-lead ECGs at 3 and 6 months and a 14-day event monitor at 1 year. AF recurrence was defined as episodes lasting >30 seconds on ECG monitoring, or >1% AF burden on device interrogation for patients with implantable monitors. In this study, we focus on the outcome of whether a patient has recurrent AF within one year after catheter ablation.

Demographic and clinical features

The demographic variables extracted from electronic health records included patients’ age at the time of ablation, sex, height, weight, body mass index (BMI), race and ethnicity. Clinical comorbidities such as presence of hypertension (HTN), hyperlipidemia (HLD), transient ischemic attack (TIA), stroke (CVA), coronary artery disease (CAD), diabetes mellitus (DM), chronic kidney disease (CKD), congestive heart failure (CHF), and obstructive sleep apnea (OSA) were collected. Arrhythmia characteristics such as type of AF (paroxysmal, persistent or long-standing persistent), and history of prior AF ablation were recorded. Structural features extracted from imaging studies included left ventricular ejection fraction (LVEF) and left atrial diameter (LAd) from transthoracic echocardiograms; and left atrial volume, surface area and sphericity index from computed tomography (CT) scans that were routinely obtained within 1 year prior to AF ablation. These variables were selected based on the literature on known factors which could impact AF ablation outcomes3–5,8,13–16. A complete list of clinical features and number of missing values is shown in Supplemental Table I. Missing values were imputed with the most frequent value of the feature. Supplemental Table II shows model performance with different missing value imputation techniques and Supplemental Table III shows model performance in patients without missing values.
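The most-frequent-value imputation described above can be sketched as follows. This is a minimal illustration rather than the study's code: the function name `impute_most_frequent`, the use of `None` to mark a missing entry, and the toy hypertension column are all assumptions.

```python
from collections import Counter

def impute_most_frequent(column):
    """Replace None entries with the most frequent observed value (hypothetical helper)."""
    observed = [v for v in column if v is not None]
    if not observed:
        return list(column)  # nothing to learn the mode from
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in column]

# Toy binary comorbidity column with two missing entries (made-up values).
htn = [1, 0, None, 1, 1, None]
htn_imputed = impute_most_frequent(htn)
```

In a cross-validated setting, the mode would normally be computed on the training folds only; the source does not state how this was handled.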

Modeling clinical features for AF recurrence prediction

As a baseline method, we built a classifier for predicting 1-year AF recurrence from demographic and clinical features. For each patient, a multi-dimensional feature vector was constructed from the clinical and demographic features, where continuous variables were normalized to have zero mean and unit variance and categorical variables were one-hot encoded. We used the categorical boosting (CatBoost) classifier17, a state-of-the-art, gradient boosted decision tree-based ML algorithm, for AF recurrence prediction. Briefly, CatBoost sequentially builds many weak learners (i.e. decision trees) and creates a strong predictive model by greedy search and ensembling. We chose CatBoost because it has been shown to outperform other gradient boosted decision tree-based algorithms and naturally handles both continuous and categorical variables17.
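The feature-vector construction described above (zero-mean, unit-variance scaling for continuous variables and one-hot encoding for categorical ones) might look like the sketch below. The helper names, the use of the population standard deviation, and the three toy patients are assumptions made for illustration.

```python
import statistics

def zscore(values):
    """Scale a continuous feature to zero mean and unit variance (population SD assumed)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]

def one_hot(values):
    """One-hot encode a categorical feature; categories are sorted for a stable order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories

# Toy cohort of three patients (made-up values).
ages = [50.0, 60.0, 70.0]
af_type = ["paroxysmal", "persistent", "paroxysmal"]

age_z = zscore(ages)
af_encoded, af_categories = one_hot(af_type)
# One multi-dimensional feature vector per patient.
feature_vectors = [[z] + enc for z, enc in zip(age_z, af_encoded)]
```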

Preprocessing of EGM and ECG signals

In each patient, unipolar left atrial intracardiac electrograms (EGM) were recorded during atrial fibrillation. Unipolar signals recorded from a 64-pole basket catheter positioned in the mid left atrium (LA) prior to any ablation were exported. Preprocessing of EGM signals included QRS subtraction and resampling to 200 Hz. See Supplemental Methods for details.

Preprocessing of ECG signals included bandpass filtering at 0.05–100 Hz and resampling to 200 Hz. Eight independent ECG channels were used (channels I, II, and V1–6), because the remaining four leads are linear combinations of leads I and II (e.g., lead III can be derived vectorially from leads I and II) and any such linear dependency can be naturally learned by deep neural networks.
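The linear dependency among limb leads can be made concrete with the standard relations III = II − I, aVR = −(I + II)/2, aVL = I − II/2 and aVF = II − I/2. The sketch below applies them sample by sample; the function name and the toy sample values are assumptions.

```python
def derive_limb_leads(lead_i, lead_ii):
    """Derive leads III, aVR, aVL and aVF from leads I and II, sample by sample."""
    pairs = list(zip(lead_i, lead_ii))
    return {
        "III": [ii - i for i, ii in pairs],
        "aVR": [-(i + ii) / 2 for i, ii in pairs],
        "aVL": [i - ii / 2 for i, ii in pairs],
        "aVF": [ii - i / 2 for i, ii in pairs],
    }

# Toy samples in millivolts (made-up values).
derived = derive_limb_leads([0.1, 0.5, 0.2], [0.3, 1.0, 0.1])
```

Because these four leads carry no information beyond leads I and II, dropping them leaves the eight independent channels used as model input.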

Each EGM and ECG signal was augmented by dividing into 5-sec windows with a 4-sec overlap between consecutive windows, resulting in a 1000x64 matrix for each input EGM data point and a 1000x8 matrix for each input ECG data point.
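At 200 Hz, a 5-sec window holds 1000 samples and a 4-sec overlap corresponds to a stride of 200 samples. A minimal sketch of this windowing, assuming the recording is stored as a list of per-sample channel lists (the function name and the 10-sec toy recording are illustrative):

```python
def sliding_windows(signal, fs=200, window_sec=5, overlap_sec=4):
    """Split a (samples x channels) recording into overlapping windows."""
    size = fs * window_sec                    # 1000 samples per window
    step = fs * (window_sec - overlap_sec)    # 200-sample stride
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, step)]

# Toy 10-sec, 8-channel ECG recording filled with zeros.
recording = [[0.0] * 8 for _ in range(200 * 10)]
windows = sliding_windows(recording)
# Each window is a 1000x8 matrix; a 10-sec recording yields 6 windows.
```

All windows cut from one recording belong to the same patient, which is why the cross-validation splits are made at the patient level.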

Modeling EGM and ECG signals for AF recurrence prediction

We developed a convolutional neural network (CNN) for predicting 1-year AF recurrence from EGM or ECG signals.

Similar to Attia et al.14, our CNN consisted of several layers of bottleneck blocks with 1-dimensional (1D) convolutions operating on the time dimension, followed by a 1D convolutional layer operating on the channel dimension. Intuitively, the time-dimension convolutional layers capture the temporal dependency in the signal by extracting features from signals within one channel, whereas the final channel-dimension convolutional layer aggregates the features across channels to obtain a spatial representation of the signal. Details of the CNN can be found in Supplemental Methods and Supplemental Figure I.

Fusion model for AF recurrence prediction

Finally, we developed a multimodal fusion framework that leverages more than one modality to improve the prediction of AF recurrence (Figure 1B).

First, EGM features were extracted from the CNN that was trained on EGM signals only. All the features from the EGM signals from the same patient were averaged to obtain a single EGM feature representation for each patient. ECG features for each patient were extracted in a similar way. Next, for each patient, the feature vectors of the fused modalities (i.e. EGM features, ECG features, and clinical features) were concatenated to form a multimodal feature vector. Lastly, a classifier was trained on the patients’ multimodal feature vectors for predicting 1-year AF recurrence. As a fair comparison to clinical feature-based models, we also applied the CatBoost17 classifier in the fusion framework.
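The feature-level fusion step can be sketched as below: window-level CNN features are averaged per patient, then concatenated with the clinical features. The helper names, vector sizes and values are hypothetical, and the downstream classifier (CatBoost in the study) is omitted.

```python
def mean_vector(vectors):
    """Average a list of equal-length feature vectors element-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def fuse(egm_window_feats, ecg_window_feats, clinical):
    """Concatenate patient-level EGM, ECG and clinical features into one vector."""
    return mean_vector(egm_window_feats) + mean_vector(ecg_window_feats) + list(clinical)

# Toy patient: two EGM windows, one ECG window, two clinical features (made-up values).
egm = [[1.0, 2.0], [3.0, 4.0]]
ecg = [[0.0, 1.0, 2.0]]
clinical = [0.5, 1.0]
fused = fuse(egm, ecg, clinical)
```

Dropping one argument from the concatenation gives the two-modality variants used in the ablation experiments.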

As ablation experiments, we also validated fusion of two modalities (i.e. EGM and clinical features, ECG and clinical features, or EGM and ECG features) and compared the results to fusion of three modalities (EGM, ECG, and clinical features).

Model training and validation

Stratified 10-fold cross-validation (patient-wise split) was used to train and validate each of the models described above. Specifically, all patients were randomly divided into 10 groups (i.e., folds) with the same proportion of AF recurrence in each fold (i.e., stratified 10-fold). At the i-th cross-validation step, the i-th fold was used to test the model and the remaining 9 folds were used to train the model. This process was repeated 10 times, such that each patient appeared in exactly one test fold.
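A patient-level stratified split of this kind can be sketched with the standard library alone. The round-robin assignment, the seed and the patient identifiers are assumptions; only the cohort sizes (44 recurrent, 112 non-recurrent) come from the study.

```python
import random

def stratified_folds(labels, k=10, seed=0):
    """labels: dict patient_id -> 0/1. Returns k lists of patient ids with balanced classes."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for cls in sorted(set(labels.values())):
        ids = sorted(p for p, y in labels.items() if y == cls)
        rng.shuffle(ids)
        # Deal patients of this class round-robin, continuing where the last class stopped.
        for i, pid in enumerate(ids):
            folds[(offset + i) % k].append(pid)
        offset += len(ids)
    return folds

# Hypothetical ids; label 1 marks AF recurrence.
labels = {f"p{i:03d}": int(i < 44) for i in range(156)}
folds = stratified_folds(labels)
# (train_ids, test_ids) for each of the 10 cross-validation steps.
splits = [(sorted(set(labels) - set(test)), test) for test in folds]
```

Because the split is over patient ids, every window from a given patient lands on the same side of each train/test partition.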

To mitigate overfitting, data augmentation was applied during training. We designed five data augmentation methods using electrophysiology domain knowledge: (a) randomly shift (forward or backward in time) each 5-sec window by up to 2.5 sec, (b) randomly scale the raw signal by a factor within the range 0.5 to 2, (c) randomly shift the DC value within the range −10 to 10 microvolts, (d) randomly mask up to 25% of the 5-sec window with zeros, and (e) randomly add Gaussian noise with zero mean and a standard deviation < 0.2. Importantly, these data augmentations did not result in invalid signals but naturally increased the variability of the training data, which could mitigate overfitting of deep neural networks.
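The five augmentations can be sketched for a single channel as follows. The order in which they are applied, the use of one contiguous mask, per-channel application, and all names here are assumptions; the source gives only the parameter ranges.

```python
import math
import random

def shifted_window(recording, start, size, max_shift, rng):
    """(a) Move the window start by up to max_shift samples, clamped to the recording."""
    shift = rng.randint(-max_shift, max_shift)
    s = min(max(start + shift, 0), len(recording) - size)
    return recording[s:s + size]

def augment(window, rng):
    """Apply (b) scaling, (c) DC shift, (e) Gaussian noise, then (d) zero masking."""
    n = len(window)
    scale = rng.uniform(0.5, 2.0)        # (b) amplitude factor
    dc = rng.uniform(-10.0, 10.0)        # (c) DC offset, microvolts assumed
    noise_sd = rng.uniform(0.0, 0.2)     # (e) noise SD kept below 0.2
    out = [x * scale + dc + rng.gauss(0.0, noise_sd) for x in window]
    mask_len = rng.randint(0, n // 4)    # (d) mask at most 25% of the window
    mask_start = rng.randint(0, n - mask_len)
    out[mask_start:mask_start + mask_len] = [0.0] * mask_len
    return out

rng = random.Random(0)
# Toy 30-sec single-channel recording at 200 Hz (synthetic 6 Hz sine).
recording = [100.0 * math.sin(2 * math.pi * 6 * t / 200) for t in range(200 * 30)]
window = shifted_window(recording, start=2000, size=1000, max_shift=500, rng=rng)
augmented = augment(window, rng)
```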

Training for the CNNs on EGM and ECG signals was accomplished using the Adam optimizer18 in PyTorch on a single NVIDIA P100 GPU. For CNNs, we followed the same model architecture configuration as that in Attia et al.14 (except for reducing the number of bottleneck blocks from 9 to 6 in the ECG-based CNN) and did not tune the model hyperparameters. Training for CatBoost was done using the CatBoost Python package17, and CatBoost hyperparameters were tuned using grid search (see Supplemental Methods for details). All models were trained to optimize AUROC. We assessed each model’s ability to predict 1-year AF recurrence using AUROC, sensitivity, specificity, accuracy, and F1-scores. To derive sensitivity, specificity, accuracy, and F1-scores, a probability threshold was selected based on the highest F1-score on the 10-fold test sets.
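Selecting a probability threshold by highest F1-score and then reading off the other metrics can be sketched as below. The candidate thresholds (the observed probabilities), the function names and the toy predictions are assumptions.

```python
def metrics(y_true, y_prob, thr):
    """Sensitivity, specificity, accuracy and F1 when predicting positive if prob >= thr."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if y == 1 and p >= thr)
    fp = sum(1 for y, p in pairs if y == 0 and p >= thr)
    fn = sum(1 for y, p in pairs if y == 1 and p < thr)
    tn = sum(1 for y, p in pairs if y == 0 and p < thr)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(pairs),
        "f1": 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0,
    }

def best_f1_threshold(y_true, y_prob):
    """Return the observed probability that maximizes F1 when used as the threshold."""
    return max(sorted(set(y_prob)), key=lambda t: metrics(y_true, y_prob, t)["f1"])

# Toy labels (1 = AF recurrence) and predicted probabilities (made-up values).
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
thr = best_f1_threshold(y_true, y_prob)
result = metrics(y_true, y_prob, thr)
```

In this toy example the selected threshold is 0.35, which captures all three positives at the cost of one false positive.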

Statistical analysis

For population characteristics, continuous data are reported as mean ± standard deviation, unless otherwise stated, and are tested for normality using the Shapiro-Wilk test (p > 0.05). Independent-samples t-test and Mann-Whitney U test were run to determine if there were differences in mean values between cohorts for analysis of continuous data. Categorical variables were compared using the Pearson chi-square test or Fisher’s exact test where expected frequencies were less than 5. For model evaluation, we report the mean and standard deviation of AUROC, sensitivity, specificity, accuracy, and F1-scores of the 10-fold test results. In addition, we measure the calibration of the models using Brier score19 and expected calibration error (ECE)20. Briefly, the Brier score measures the mean squared difference between the predicted probability and the actual label. The ECE approximates the expected gap between model confidence and accuracy by binning the predictions into equally spaced confidence bins and taking an average of the absolute difference between each bin’s accuracy and confidence, weighted by the fraction of predictions in the bin. For both Brier score and ECE, lower values indicate better calibrated models. A statistical significance threshold (α) of 0.05 was used for all the reported tests.
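Both calibration measures can be written in a few lines. The sketch below assumes binary labels, a 0.5 decision threshold for defining confidence, and 10 equally spaced bins over [0, 1]; these choices and the toy predictions are assumptions, since the source does not give the exact settings.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Weighted average of |accuracy - confidence| over equally spaced confidence bins."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1 - p          # confidence in the predicted class
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1 if pred == y else 0))
    n = len(y_true)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(a for _, a in b) / len(b)
            ece += len(b) / n * abs(acc - avg_conf)
    return ece

# Toy predictions (made-up values): all correct, but under-confident.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.1, 0.6, 0.4]
brier = brier_score(y_true, y_prob)
ece = expected_calibration_error(y_true, y_prob)
```

Here every prediction is correct, yet the ECE is 0.25 because the stated confidences (0.9 and 0.6) fall short of the 100% observed accuracy.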

Results

Overall summary

Between 2015 and 2017, 226 consecutive AF ablations were done using a 64-pole basket catheter that recorded simultaneous panoramic unipolar electrograms from the left and the right atria. Of these, 161 had left atrial signals recorded prior to any ablation. Five were excluded due to poor signal quality, leaving 156 patients to be analyzed for this study. Baseline characteristics of these patients are shown in Table 1. PVI was done using radiofrequency in 118 patients (76%) and cryoballoon in 38 patients (24%). 34 patients (21.8%) were on antiarrhythmic drugs (AADs) at the time of follow-up (10.2% on class IC agents, 3.9% on class III agents (sotalol or dofetilide), 8.3% on amiodarone and 1.9% on dronedarone). Additional ablation lesions beyond PVI and ablation of localized sources are presented in Table 1.

Table 1.

Baseline characteristics of population

| Characteristic | All Subjects (n=156) | Free from AF (n=112) | Recurrent AF (n=44) | p-value |
|---|---|---|---|---|
| Demographics | | | | |
| Age (years, mean ± SD) | 64.5 ± 10.5 | 64.5 ± 9.9 | 64.5 ± 11.9 | 0.988 |
| Male Gender, n (%) | 115 (74%) | 87 (78%) | 28 (64%) | 0.073 |
| Height (m, mean ± SD) | 1.77 ± 0.1 | 1.77 ± 0.1 | 1.77 ± 0.1 | 0.298 |
| Weight (kg, mean ± SD) | 96.6 ± 24.4 | 98.1 ± 24.3 | 92.6 ± 24.4 | 0.205 |
| BMI (kg/m², mean ± SD) | 30.6 ± 6.8 | 31.2 ± 7.1 | 29.3 ± 5.8 | 0.117 |
| Comorbidities | | | | |
| CAD, n (%) | 30 (19%) | 25 (22%) | 5 (11%) | 0.118 |
| CHF, n (%) | 32 (21%) | 25 (22%) | 7 (16%) | 0.359 |
| Hypertension, n (%) | 104 (67%) | 76 (68%) | 28 (64%) | 0.615 |
| Hyperlipidemia, n (%) | 88 (56%) | 69 (62%) | 19 (43%) | 0.037 |
| TIA or CVA, n (%) | 13 (8%) | 11 (10%) | 2 (5%) | 0.352 |
| Diabetes mellitus, n (%) | 30 (19%) | 26 (23%) | 4 (9%) | 0.037 |
| OSA, n (%) | 59 (38%) | 43 (38%) | 16 (36%) | 0.784 |
| CKD, n (%) | 24 (15%) | 17 (15%) | 7 (16%) | 0.872 |
| Prior AF ablation, n (%) | 43 (28%) | 26 (23%) | 17 (39%) | 0.052 |
| Type of AF | | | | 0.210 |
| Paroxysmal AF, n (%) | 67 (43%) | 47 (42%) | 20 (46%) | |
| Persistent AF, n (%) | 66 (42%) | 45 (40%) | 21 (48%) | |
| Long-standing Persistent AF, n (%) | 23 (15%) | 20 (19%) | 3 (7%) | |
| AF Ablation type * | | | | 0.248 |
| Left atrial linear ablation | 38 (24%) | 30 (34%) | 8 (18%) | |
| CTI | 42 (27%) | 35 (31%) | 7 (16%) | |
| Antiarrhythmic drug use | 34 (22%) | 26 (23%) | 8 (18%) | 0.667 |

Values are n, mean ± standard deviation, or median (interquartile range). Categorical variables are compared using Fisher’s exact test; continuous variables using the t-test or Mann-Whitney U test if data is not normally distributed.

* In addition to pulmonary vein isolation and ablation of localized rotational and focal sources by FIRM mapping.

Abbreviations: CAD, coronary artery disease; CHF, congestive heart failure; TIA, transient ischemic attack; CVA, stroke; OSA, obstructive sleep apnea; CKD, chronic kidney disease; AF, atrial fibrillation; CTI, cavotricuspid isthmus ablation.

Catheter ablation outcomes

On follow-up at 1 year, 112 (72%) patients remained free of atrial fibrillation. Patients with and without recurrence had similar age, BMI and comorbidities (Table 1). AAD use was not different among groups. 28% of the patients had a prior history of AF ablation. Hyperlipidemia and diabetes mellitus were each associated with AF recurrence status (p=0.04) in univariate analysis, both being less frequent in patients with recurrence (Table 1). Ablation of additional left atrial lines did not correlate with AF ablation outcomes.

Validation of existing AF ablation outcome prediction scores: APPLE and CHA2DS2-VASc

First, we validated two existing clinical feature-based prediction scores, APPLE3 and CHA2DS2-VASc4, for 1-year AF recurrence prediction using CatBoost17. Detailed formulation of APPLE and CHA2DS2-VASc scores can be found in Supplemental Methods.

The CatBoost classifier achieved an AUROC of 0.644 (SD=0.129) on APPLE scores and an AUROC of 0.650 (SD=0.133) on CHA2DS2-VASc scores (Table 2, 1st–2nd rows).
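AUROC values such as these are reported as a mean and standard deviation over the test folds. A minimal sketch of that computation, using the rank-based (Mann-Whitney) definition of AUROC; the two toy folds, the function name and the use of the sample standard deviation are assumptions.

```python
import statistics

def auroc(y_true, y_score):
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Two toy test folds of (labels, predicted scores); values are made up.
fold_results = [
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.6]),
]
fold_aurocs = [auroc(y, s) for y, s in fold_results]
mean_auroc = statistics.fmean(fold_aurocs)
sd_auroc = statistics.stdev(fold_aurocs)
```

With only a handful of recurrent patients per test fold, fold-level AUROC is noisy, which is consistent with the standard deviations of roughly 0.1 reported in Table 2.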

Table 2.

Results of 1-year AF recurrence prediction

| Model | AUROC | Sensitivity | Specificity | Accuracy | F1-Score |
|---|---|---|---|---|---|
| APPLE Score | 0.644 ± 0.129 | 0.915 ± 0.138 | 0.350 ± 0.329 | 0.504 ± 0.213 | 0.533 ± 0.111 |
| CHA2DS2-VASc Score | 0.650 ± 0.133 | 0.905 ± 0.162 | 0.427 ± 0.355 | 0.560 ± 0.226 | 0.568 ± 0.124 |
| Clinical Feature | 0.755 ± 0.093 | 0.875 ± 0.137 | 0.680 ± 0.198 | 0.728 ± 0.121 | 0.656 ± 0.102 |
| EGM | 0.731 ± 0.105 | 0.885 ± 0.116 | 0.627 ± 0.131 | 0.701 ± 0.098 | 0.630 ± 0.092 |
| ECG | 0.767 ± 0.122 | 0.812 ± 0.176 | 0.770 ± 0.183 | 0.781 ± 0.112 | 0.682 ± 0.108 |
| Fusion of EGM & Clinical Data | 0.788 ± 0.110 | 0.905 ± 0.117 | 0.706 ± 0.144 | 0.764 ± 0.107 | 0.691 ± 0.117 |
| Fusion of ECG & Clinical Data | 0.836 ± 0.063 | 0.865 ± 0.112 | 0.812 ± 0.124 | 0.827 ± 0.070 | 0.747 ± 0.075 |
| Fusion of EGM & ECG | 0.833 ± 0.084 | 0.915 ± 0.138 | 0.793 ± 0.124 | 0.826 ± 0.083 | 0.753 ± 0.096 |
| Fusion of EGM, ECG & Clinical Feature | 0.859 ± 0.082 | 0.870 ± 0.200 | 0.867 ± 0.121 | 0.866 ± 0.076 | 0.784 ± 0.106 |

Values are mean ± standard deviation across 10-folds. Best mean results for each metric are highlighted in bold.

ML-based AF recurrence prediction from clinical features

Using clinical features, the CatBoost classifier achieved an AUROC of 0.755 (SD=0.093; Table 2, 3rd row), outperforming the performance of the CatBoost classifier trained on APPLE and CHA2DS2-VASc scores. This performance improvement is expected given that multiple clinical features were used, whereas APPLE and CHA2DS2-VASc scores only accounted for five and seven clinical features, respectively.

Figure 2 shows the model interpretation of the clinical features that contribute the most to AF recurrence prediction in our clinical feature-based model, where the most important features are LVEF, height, BMI, weight, left atrial volume from CT, and left atrial surface area from CT. These features have previously been reported to correlate with development of incident AF15,21 or poorer outcomes following AF ablation16,22.

Figure 2. Clinical feature-based model interpretation.

Importance of clinical features in predicting AF recurrence using the CatBoost classifier (averaged across 10 folds). The most important features are: left ventricular ejection fraction (LVEF), height, body mass index (BMI), weight, left atrial volume from CT, and left atrial surface area from CT.

ML-based AF recurrence prediction from EGM or ECG

Using EGM signals only, the CNN achieved an AUROC of 0.731 (SD=0.105) for AF recurrence prediction (Table 2, 4th row); using ECG signals only, the CNN achieved an AUROC of 0.767 (SD=0.122; Table 2, 5th row), both of which outperform APPLE and CHA2DS2-VASc scores.

In addition, we visualize examples of EGM and ECG features learned by the CNNs using the Uniform Manifold Approximation and Projection23 (UMAP) dimensionality reduction technique. As shown in Supplemental Figure II, the same patient’s EGM features are clustered together, whereas different patients’ EGM features are further apart. Moreover, EGM/ECG features of patients with AF recurrence are further away from features of patients without AF recurrence, suggesting that the CNNs are able to learn distinct patterns in patients with different outcomes.

ML-based AF recurrence prediction from fusion of EGM, ECG, and clinical features

Our final fusion model that combines EGM, ECG, and clinical features achieved an AUROC of 0.859 (SD=0.082; Table 2, last row), outperforming the APPLE scores, CHA2DS2-VASc scores, and ECG or EGM signals alone, suggesting the effectiveness of our fusion framework.

Figure 3 shows the ROC curves of the clinical feature-based models, the signal-based CNN models, and the fusion model. At a low false positive rate (FPR), such as 20% FPR, our fusion model had a true positive rate (TPR) of 80%, which translates clinically to missing 20% of recurrent AF patients while incorrectly flagging 20% of patients without recurrence. In contrast, the CHA2DS2-VASc score-based classifier and the clinical feature-based classifier achieved TPRs of only 40% and 58%, respectively, which translates to missing 60% and 42% of recurrent AF patients, respectively, at the same FPR.
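Because FPR is defined relative to patients without recurrence and TPR relative to patients with recurrence, an operating point can be converted into expected patient counts using the cohort sizes in Table 1 (44 recurrent, 112 free from AF). The sketch below is illustrative arithmetic only: the ROC curves are fold-averaged, so applying the rates to the whole cohort is an assumption, and the function name is hypothetical.

```python
def counts_at_operating_point(n_pos, n_neg, tpr, fpr):
    """Expected confusion-matrix counts and positive predictive value at a given (TPR, FPR)."""
    tp = tpr * n_pos
    fp = fpr * n_neg
    return {"tp": tp, "fn": n_pos - tp, "fp": fp, "tn": n_neg - fp, "ppv": tp / (tp + fp)}

# Operating points read at 20% FPR.
fusion = counts_at_operating_point(44, 112, tpr=0.80, fpr=0.20)
clinical = counts_at_operating_point(44, 112, tpr=0.58, fpr=0.20)
```

Under these assumptions the fusion model would miss about 9 of 44 recurrent patients versus about 18 for the clinical feature-based classifier, with about 22 false positives in both cases.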

Figure 3. Receiver operating characteristics (ROC) curves of the clinical feature-based models, signal-based models, and the fusion model.

The x-axis shows the false positive rate averaged across 10 folds for each model, and the y-axis shows the true positive rate averaged across 10 folds for each model.

Moreover, combining two modalities performed better than single modalities (Table 2, 6th–8th rows), which is intuitive given that two modalities encode more information than a single modality. Model performance in various subgroups is provided in Supplemental Results and Supplemental Tables IV–VI.

In addition to discriminative measures (e.g., AUROC, sensitivity, and specificity), we evaluate the calibration of the models using Brier score19 and expected calibration error20. See Supplemental Results and Supplemental Table VII for details.

Discussion

In this study, we developed a deep convolutional neural network that encodes the spatiotemporal dependencies in EGM and ECG signals, as well as a multimodal fusion framework that leverages clinical features, EGM, and ECG for predicting 1-year AF recurrence after catheter ablation. Our study was based on a cohort of 156 patients.

To our knowledge, this approach provides the highest performance among AF recurrence prediction scores reported to date in predicting which patients would be free from AF one year following ablation.

Other studies evaluating prediction of AF ablation outcomes using machine learning include Shade et al.9, who used ML and personalized computational modeling in 32 patients to predict AF recurrence following PVI with either a cryoballoon or radiofrequency approach. In their machine learning model, their sources of information (imaging, clinical data) were combined equally9. Late gadolinium-enhanced MRI scans were used for imaging data. An AUROC of 0.82 was reported when clinical variables were included in the model. Firouznia et al.10 extracted data from chest CT scans to establish their association with the likelihood of post-ablation AF recurrence in 203 patients using a random forest classifier. Certain derived imaging features used in their study, such as left atrial surface area, volume and sphericity index, were also included in our model as a part of clinical features10. PVI in that study was completed with either cryoballoon or radiofrequency catheters. Moreover, posterior wall, septal, superior vena cava and CTI ablation were performed according to operator choice, although further details of extra-PVI ablation were not discussed in the study or included in the models.

In our study, all patients underwent PVI with a cryoballoon or radiofrequency approach. Similar to Firouznia et al.10, patients undergoing various ablation strategies were included, including ablation of localized sources detected by FIRM mapping strategy in 100% of patients, left atrial linear lesions in 24% and CTI ablation in 27% of patients. FIRM strategy was used in all patients as it allowed simultaneous recording of unipolar signals in the left atrium prior to any ablation in this cohort, which was a prerequisite in our analysis. Our models were able to predict long-term (one year) freedom from arrhythmias independent of the ablation strategy. Clinical benefit of lesions beyond PVI in patients with persistent AF has been a subject of debate, with multiple studies showing no additional benefit of extra PV lesions in long term freedom from AF24,25, with some demonstrating incremental benefit26,27, and larger multicenter studies underway to evaluate this further28. Furthermore, incorporation of intracardiac electrograms indeed improved prediction of AF ablation outcomes, suggesting that an AF mechanism might be at play that could be delineated further by feature interpretation of these signals. Given the wide variety of ablation approaches used in the training and testing cohorts for our machine learning model, and limited representation of subgroups such as women, generalizability of our findings to the broader population could be limited.

Limitations

This study was performed at a single center, involves a small cohort with underrepresentation of women, and results have not been validated externally. Heterogeneity in ablation approaches may limit generalizability of the findings to specific ablation strategies. Despite this limitation, all the patients underwent PVI, and evidence of benefit of further ablation beyond PVI, including linear ablation and ablation of sites of organized rotational or focal activation, has not been proven consistently in multicenter randomized studies24,29,30. All patients in this study underwent FIRM mapping and ablation that formed the basis of the unipolar EGMs used in the model. The necessity of the use of FIRM mapping is a limitation to this study, as this is not a widely used catheter or mapping strategy in the community.

Freedom from AF appears high for a mixed cohort of patients, but is consistent with other studies that used intermittent monitoring rather than implanted loop recorders. Intermittent monitoring of AF recurrence with 12-lead ECGs and 14-day event monitors likely underrepresents true AF recurrence, which could affect the accuracy of our predictive model. The retrospective nature of the data precluded strict guidelines on AAD use in follow-up; for certain patients, pre-procedure AADs were continued post ablation due to patient or provider preference regardless of procedure outcome. 28% of patients had prior AF ablation, which may have impacted intracardiac signal characteristics. 12-lead ECGs in sinus rhythm prior to ablation were not available in all patients. When a patient’s 12-lead ECG in sinus rhythm prior to ablation was not available, a 12-lead ECG in sinus rhythm immediately after ablation was used for analysis, which could result in bias in analyses. While we show that when evaluating the trained models on patients whose pre-ablation 12-lead ECGs are available, the model performance did not differ significantly from our original analysis (i.e., post-ablation 12-lead ECGs are used for patients whose pre-ablation ECGs are not available), we did not re-train the models on pre-ablation ECGs only due to the limited size of our cohort (n = 107 patients with pre-ablation ECGs). The majority of patients without an available pre-ablation ECG had a 12-lead ECG in sinus rhythm prior to ablation that was performed outside our center and not in an electronic format that could be exported for analysis, reflecting the tertiary referral status of the center where the study was conducted. Some of the data that were used in the models to predict ablation success, including intracardiac signals, are obtained at the time of ablation, and may not help in patient selection for the ablation procedure, but can rather guide medical management and expectations following the procedure.
Furthermore, while we show that most of the trained models perform similarly on patient subgroups (patients with paroxysmal versus non-paroxysmal AF; patients with cryoablation versus radiofrequency ablation), a future study with a larger cohort that trains models on these subgroups independently is needed to compare them further. Lastly, while we show that our CNNs and fusion model are better calibrated than the existing APPLE and CHA2DS2-VASc scores (Supplemental Table VII), the Brier scores and expected calibration errors are still relatively high; advanced calibration techniques31 for deep neural networks need to be incorporated in the future to produce better calibrated models.
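
For readers less familiar with the two calibration metrics named above, the following is a minimal sketch of how a Brier score and an expected calibration error (ECE) can be computed for binary predictions. The function names, the choice of ten equal-width probability bins, and the toy probabilities are assumptions made for illustration only; the article does not describe its binning scheme, and this is not the authors' code.

```python
from typing import List, Sequence, Tuple


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binned ECE on the positive-class probability (equal-width bins, assumed)."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_prob = sum(p for p, _ in members) / len(members)
        observed_rate = sum(y for _, y in members) / len(members)
        # Weight each bin's calibration gap by the fraction of samples in it.
        ece += (len(members) / n) * abs(observed_rate - mean_prob)
    return ece


# Hypothetical predicted recurrence probabilities and observed outcomes.
toy_probs = [0.1, 0.2, 0.8, 0.9]
toy_labels = [0, 0, 1, 1]
print(brier_score(toy_probs, toy_labels))
print(expected_calibration_error(toy_probs, toy_labels))
```

Both metrics are zero for a model that assigns probability 0 or 1 correctly to every patient, and lower values indicate better calibration. Post hoc methods such as those discussed in reference 31 aim to reduce these values without re-training the underlying network.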

Conclusions

Our machine learning approach provides an automatic technique to predict freedom from atrial arrhythmias in patients undergoing AF ablation, outperforming traditional scoring systems. Larger datasets are needed in the future to train and validate this approach even further to help develop personalized ablation strategies for AF patients.

Supplementary Material

010850 - Supplemental Material

What is Known:

  • Atrial fibrillation (AF) ablation is the cornerstone of therapy for symptomatic AF, with increasing evidence on its safety and efficacy.

  • Clinical scores have been developed to predict the success of catheter ablation and to guide patient selection; most reach an Area Under the Receiver Operating Characteristics Curve (AUROC) of 0.55–0.65 in predicting AF ablation success.

What the Study Adds:

  • Deep neural networks trained on intracardiac signals and 12-lead ECG signals, in addition to clinical features, can improve the prediction accuracy of catheter ablation outcomes compared to existing clinical scores.

  • A convolutional neural network (CNN) using intracardiac signals in AF achieves an AUROC of 0.731; similarly, a CNN using 12-lead ECG alone achieves an AUROC of 0.767. Fusion of EGM, ECG, and clinical features further improves the prediction (AUROC = 0.859) compared to models with a single modality.

  • Machine learning models can help treatment planning for patients after catheter ablation of atrial fibrillation through more accurate prediction of treatment outcomes.

Sources of Funding:

This work was funded by an NIH grant (K23 HL145017) to Dr. Baykaner.

Dr. Rubin reports grants from NIH (1U01CA190214, 1U01CA187947, U01CA242879, U24CA226110) and consulting fees from Roche-Genentech. Dr. Narayan reports research grants from NIH (HL70529, HL103800, HL83359); consulting from LifeSignals.ai Inc, TDK Inc., Up to Date, Abbott Laboratories, and American College of Cardiology Foundation (all modest); and Intellectual Property Rights from University of California Regents and Stanford University. Dr. Baykaner reports funding from NIH (K23 HL145017) and consulting fees from Medtronic, BIOTRONIK and PaceMate.

Nonstandard Abbreviations and Acronyms

AF: Atrial Fibrillation
AUROC: Area Under the Receiver Operating Characteristics Curve
ML: Machine Learning
EGM: Intracardiac Electrogram
ECG: Electrocardiogram
MRI: Magnetic Resonance Imaging
PVI: Pulmonary Vein Isolation
CT: Computerized Tomography
DNN: Deep Neural Networks
CTI: Cavotricuspid Isthmus
BMI: Body Mass Index
HTN: Hypertension
HLD: Hyperlipidemia
TIA: Transient Ischemic Attack
CVA: Stroke
CAD: Coronary Artery Disease
DM: Diabetes Mellitus
CKD: Chronic Kidney Disease
CHF: Congestive Heart Failure
OSA: Obstructive Sleep Apnea
LVEF: Left Ventricular Ejection Fraction
LAd: Left Atrial Diameter
CatBoost: Categorical Boosting Classifier
LA: Left Atrium
CNN: Convolutional Neural Network
AAD: Antiarrhythmic Drugs
SD: Standard Deviation

Footnotes

Disclosures: Ms. Tang reports no disclosures. Dr. Razeghi reports no disclosures. Dr. Kapoor reports no disclosures. Mr. Alhusseini reports intellectual property rights from Stanford University. Dr. Fazal reports no disclosures. Dr. Rogers reports no disclosures. Dr. Rodrigo reports no disclosures. Mr. Clopton reports no disclosures. Dr. Wang reports no disclosures.

References:

  • 1. Packer DL, Mark DB, Robb RA, Monahan KH, Bahnson TD, Poole JE, Noseworthy PA, Rosenberg YD, Jeffries N, Mitchell LB, Flaker GC, Pokushalov E, Romanov A, Bunch TJ, Noelker G, Ardashev A, Revishvili A, Wilber DJ, Cappato R, Kuck K-H, Hindricks G, Davies DW, Kowey PR, Naccarelli GV, Reiffel JA, Piccini JP, Silverstein AP, Al-Khalidi HR, Lee KL, CABANA Investigators. Effect of Catheter Ablation vs Antiarrhythmic Drug Therapy on Mortality, Stroke, Bleeding, and Cardiac Arrest Among Patients With Atrial Fibrillation: The CABANA Randomized Clinical Trial. JAMA 2019;321:1261–1274.
  • 2. Marrouche NF, Brachmann J, Andresen D, Siebels J, Boersma L, Jordaens L, Merkely B, Pokushalov E, Sanders P, Proff J, Schunkert H, Christ H, Vogt J, Bänsch D, CASTLE-AF Investigators. Catheter Ablation for Atrial Fibrillation with Heart Failure. N Engl J Med 2018;378:417–427.
  • 3. Kornej J, Hindricks G, Arya A, Sommer P, Husser D, Bollmann A. The APPLE Score – A Novel Score for the Prediction of Rhythm Outcomes after Repeat Catheter Ablation of Atrial Fibrillation. PLOS ONE 2017;12:e0169933.
  • 4. Jacobs V, May HT, Bair TL, Crandall BG, Cutler M, Day JD, Weiss JP, Osborn JS, Muhlestein JB, Anderson JL, Mallender C, Bunch TJ. The impact of risk score (CHADS2 versus CHA2DS2-VASc) on long-term outcomes after atrial fibrillation ablation. Heart Rhythm 2015;12:681–686.
  • 5. Kosich F, Schumacher K, Potpara T, Lip GY, Hindricks G, Kornej J. Clinical scores used for the prediction of negative events in patients undergoing catheter ablation for atrial fibrillation. Clin Cardiol 2019;42:320–329.
  • 6. Rogers AJ, Selvalingam A, Alhusseini MI, Krummen DE, Corrado C, Abuzaid F, Baykaner T, Meyer C, Clopton P, Giles W, Bailis P, Niederer S, Wang PJ, Rappel W-J, Zaharia M, Narayan SM. Machine learned cellular phenotypes in cardiomyopathy predict sudden death. Circ Res 2021;128:172–184.
  • 7. Bhalodia R, Goparaju A, Sodergren T, Morris A, Kholmovski E, Marrouche N, Cates J, Whitaker R, Elhabian S. Deep Learning for End-to-End Atrial Fibrillation Recurrence Estimation. In: 2018 Computing in Cardiology Conference (CinC). 2018. p. 1–4.
  • 8. Budzianowski J, Hiczkiewicz J, Burchardt P, Pieszko K, Rzeźniczak J, Budzianowski P, Korybalska K. Predictors of atrial fibrillation early recurrence following cryoballoon ablation of pulmonary veins using statistical assessment and machine learning algorithms. Heart Vessels 2019;34:352–359.
  • 9. Shade JK, Ali RL, Basile D, Popescu D, Akhtar T, Marine JE, Spragg DD, Calkins H, Trayanova NA. Preprocedure Application of Machine Learning and Mechanistic Simulations Predicts Likelihood of Paroxysmal Atrial Fibrillation Recurrence Following Pulmonary Vein Isolation. Circulation: Arrhythmia and Electrophysiology 2020;13.
  • 10. Firouznia M, Feeny AK, LaBarbera MA, McHale M, Cantlay C, Kalfas N, Schoenhagen P, Saliba W, Tchou P, Barnard J, Chung MK, Madabhushi A. Machine learning-derived fractal features of shape and texture of the left atrium and pulmonary veins from cardiac computed tomography scans are associated with risk of recurrence of atrial fibrillation postablation. Circ Arrhythm Electrophysiol 2021;14:e009265.
  • 11. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436–444.
  • 12. Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K, Cui C, Corrado G, Thrun S, Dean J. A guide to deep learning in healthcare. Nat Med 2019;25:24–29.
  • 13. Firouznia M, Feeny AK, LaBarbera MA, McHale M, Cantlay C, Kalfas N, Schoenhagen P, Saliba W, Tchou P, Barnard J, Chung MK, Madabhushi A. Machine Learning–Derived Fractal Features of Shape and Texture of the Left Atrium and Pulmonary Veins From Cardiac Computed Tomography Scans Are Associated With Risk of Recurrence of Atrial Fibrillation Postablation. Circulation: Arrhythmia and Electrophysiology 2021;14.
  • 14. Attia ZI, Noseworthy PA, Lopez-Jimenez F, Asirvatham SJ, Deshmukh AJ, Gersh BJ, Carter RE, Yao X, Rabinstein AA, Erickson BJ, Kapa S, Friedman PA. An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction. Lancet 2019;394:861–867.
  • 15. Lavie CJ, Pandey A, Lau DH, Alpert MA, Sanders P. Obesity and Atrial Fibrillation Prevalence, Pathogenesis, and Prognosis: Effects of Weight Loss and Exercise. J Am Coll Cardiol 2017;70:2022–2035.
  • 16. Khaykin Y, Oosthuizen R, Zarnett L, Essebag V, Parkash R, Seabrook C, Beardsall M, Tsang B, Wulffhart Z, Verma A. Clinical predictors of arrhythmia recurrences following pulmonary vein antrum isolation for atrial fibrillation: predicting arrhythmia recurrence post-PVAI. J Cardiovasc Electrophysiol 2011;22:1206–1214.
  • 17. Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A. CatBoost: unbiased boosting with categorical features. In: Advances in Neural Information Processing Systems 2018.
  • 18. Kingma DP, Ba J. Adam: A Method for Stochastic Optimization. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings. 2015.
  • 19. Brier GW. Verification of forecasts expressed in terms of probability. Mon Weather Rev 1950;78:1–3.
  • 20. Naeini MP, Cooper GF, Hauskrecht M. Obtaining Well Calibrated Probabilities Using Bayesian Binning. Proc Conf AAAI Artif Intell 2015;2015:2901–2907.
  • 21. Santhanakrishnan R, Wang N, Larson MG, Magnani JW, McManus DD, Lubitz SA, Ellinor PT, Cheng S, Vasan RS, Lee DS, Wang TJ, Levy D, Benjamin EJ, Ho JE. Atrial Fibrillation Begets Heart Failure and Vice Versa: Temporal Associations and Differences in Preserved Versus Reduced Ejection Fraction. Circulation 2016;133:484–492.
  • 22. Pathak RK, Middeldorp ME, Lau DH, Mehta AB, Mahajan R, Twomey D, Alasady M, Hanley L, Antic NA, McEvoy RD, Kalman JM, Abhayaratna WP, Sanders P. Aggressive risk factor reduction study for atrial fibrillation and implications for the outcome of ablation: the ARREST-AF cohort study. J Am Coll Cardiol 2014;64:2222–2231.
  • 23. McInnes L, Healy J, Melville J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv [stat.ML] 2018.
  • 24. Verma A, Jiang C-Y, Betts TR, Chen J, Deisenhofer I, Mantovan R, Macle L, Morillo CA, Haverkamp W, Weerasooriya R, Albenque J-P, Nardi S, Menardi E, Novak P, Sanders P, STAR AF II Investigators. Approaches to catheter ablation for persistent atrial fibrillation. N Engl J Med 2015;372:1812–1822.
  • 25. Brachmann J, Hummel JD, Wilber DJ, Sarver AE, Rapkin J, Shpun S, Szili-Torok T. Prospective randomized comparison of rotor ablation vs conventional ablation for treatment of persistent atrial fibrillation: The REAFFIRM trial. Heart Rhythm 2019;16:963–965.
  • 26. Clarke J-RD, Piccini JP, Friedman DJ. The role of posterior wall isolation in catheter ablation of persistent atrial fibrillation. J Cardiovasc Electrophysiol 2021;32:2567–2576.
  • 27. Narayan SM, Baykaner T, Clopton P, Schricker A, Lalani GG, Krummen DE, Shivkumar K, Miller JM. Ablation of rotor and focal sources reduces late recurrence of atrial fibrillation compared with trigger ablation alone: extended follow-up of the CONFIRM trial (Conventional Ablation for Atrial Fibrillation With or Without Focal Impulse and Rotor Modulation). J Am Coll Cardiol 2014;63:1761–1768.
  • 28. Terricabras M, Piccini JP, Verma A. Ablation of persistent atrial fibrillation: Challenges and solutions. J Cardiovasc Electrophysiol 2020;31:1809–1821.
  • 29. Thiyagarajah A, Kadhim K, Lau DH, Emami M, Linz D, Khokhar K, Munawar DA, Mishima R, Malik V, O’Shea C, Mahajan R, Sanders P. Feasibility, Safety, and Efficacy of Posterior Wall Isolation During Atrial Fibrillation Ablation: A Systematic Review and Meta-Analysis. Circ Arrhythm Electrophysiol 2019;12:e007005.
  • 30. Kirzner JM, Raelson CA, Liu CF, Thomas G, Ip JE, Lerman BB, Markowitz SM, Cheung JW. Effects of focal impulse and rotor modulation-guided ablation on atrial arrhythmia termination and inducibility: Impact on outcomes after treatment of persistent atrial fibrillation. Journal of Cardiovascular Electrophysiology 2019;30:2773–2781.
  • 31. Guo C, Pleiss G, Sun Y, Weinberger KQ. On Calibration of Modern Neural Networks. In: Precup D, Teh YW, editors. Proceedings of the 34th International Conference on Machine Learning. PMLR; 06–11 Aug 2017. p. 1321–1330.
  • 32. Ioffe S, Szegedy C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In: Bach F, Blei D, editors. Proceedings of the 32nd International Conference on Machine Learning. Lille, France: PMLR; 2015. p. 448–456.
  • 33. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2016.
  • 34. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15:1929–1958.
  • 35. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44:837–845.
