European Heart Journal. Digital Health. 2026 Feb 2;7(2):ztag015. doi: 10.1093/ehjdh/ztag015

Development and multicentre validation of an artificial intelligence electrocardiogram model for ventricular remodeling in repaired tetralogy of Fallot

Son Q Duong 1,2,3,✉,3, Akhil Vaid 4,5, Pengfei Jiang 6,7, Yuval Bitterman 8, Yamini Krishnamurthy 9, I Min Chiu 10, Joshua Finer 11, Brian Cleary 12, Benjamin S Glicksberg 13,14,15, Ruchira Garg 16, Michael DiLorenzo 17, Mark Friedberg 18, Evan Zahn 19, Matthew Lewis 20, Michael Satzer 21, David Ouyang 22, Pierre Elias 23, Tal Geva 24, Sunil Ghelani 25, Brett R Anderson 26,27,28, Ali Zaidi 29,30, Rachel M Wald 31,32, Girish N Nadkarni 33,34,#, Joshua Mayourian 35,#
PMCID: PMC12902437  PMID: 41695565

Abstract

Aims

Periodic cardiac MRI (CMR) is recommended to identify adverse ventricular remodelling in repaired tetralogy of Fallot (TOF), but access to CMR is uneven, and compliance is poor. We developed a 12-lead electrocardiogram (ECG) artificial intelligence (AI) biomarker to identify CMR-quantified adverse biventricular remodelling in repaired TOF.

Methods and results

Six (1 train/5 external test) North American retrospective cohorts with paired ECG and CMR were included. The main outcome was a composite of ≥2 TOF-specific CMR abnormalities: right ventricular (RV) end-diastolic volume ≥ 160 mL/m2, RV end-systolic volume ≥ 80 mL/m2, RV ejection fraction (EF) <47%, and left ventricular EF <55%. Model discrimination, calibration, and net benefit as a screening test to rule out ventricular remodelling were assessed. Nine hundred and eight patients (2552 ECGs) were included in training, and 782 patients (1795 ECGs) in external validation (outcome prevalence 57%). The area under the receiver-operating curve (AUROC) was 0.85 (95% confidence interval 0.83–0.87), and average precision was 0.88. At a screening risk-threshold of 0.25, there was 92% sensitivity, 41% specificity, 87% negative predictive value, and 54% positive predictive value for ventricular remodelling, which yielded a 13% net reduction in CMR use on net benefit analysis. Discrimination did not differ by sex or race/ethnicity but did differ by age and site: two of five sites had lower AUROC than the others, and three of five sites met criteria for miscalibration, which improved after centre-specific calibration.

Conclusion

An artificial intelligence electrocardiogram (AI-ECG) biomarker in repaired TOF effectively identifies ventricular remodelling to inform timing of advanced imaging. Extensive external validation revealed variation in discrimination and calibration, which is an important consideration for clinical implementation and regulatory approval pathways of AI-ECG in congenital heart disease.

Graphical Abstract

Introduction

Tetralogy of Fallot (TOF) is the most common cyanotic congenital heart disease, affecting 1 in 3000 live births.1 Right ventricular outflow tract (RVOT) dysfunction after repair of TOF is nearly universal and characterized by chronic pulmonary regurgitation with residual obstruction. Although well-tolerated early in life, chronic RVOT dysfunction precipitates a pathophysiological cascade that may lead to morbidity and mortality after the second decade of life.2

Lifelong surveillance is recommended for early detection of electromechanical cardiomyopathy that might be ameliorated by pulmonary valve replacement (PVR) to increase survival.3,4 Guidelines recommend serial monitoring and RVOT intervention with PVR at ventricular size and functional thresholds that represent a ‘tipping point’ before irreversible changes occur.2,4,5

Because assessment of RV size and systolic function by echocardiography is suboptimal,6,7 cardiac magnetic resonance imaging (CMR) is recommended for assessment of RV volumes, ejection fraction (EF), and mass. However, CMR is infeasible in some patients, requires specialized expertise and equipment, and typically exceeds 2 h of patient and clinician time to acquire, process, and report.8 These factors limit its use in under-resourced settings: half of adults with congenital heart disease live more than an hour from an appropriate care centre,9 and adherence to guideline-recommended diagnostic imaging is low.10 Furthermore, frequent CMR in TOF may not be necessary in select patients.11,12 This population is therefore an ideal target for the development of broadly available precision diagnostics to increase care efficiency and expand access to quantitative right ventricular (RV) assessment.

Artificial intelligence analysis of electrocardiograms (AI-ECG) is a novel method for RV assessment. AI-ECG methods can predict CMR-quantified RV dilation and dysfunction in adult13 and congenital heart disease populations,14 and AI-ECG complements echocardiogram-based RV functional assessment.15 However, the performance of AI-ECG to identify adverse ventricular remodelling at clinically important quantitative thresholds in repaired TOF has not been explored. Additionally, prior AI-ECG studies in paediatric and congenital heart disease are limited by a lack of multiple-centre external validation, and thus, differences in performance and calibration across centres have yet to be explored. This is an important next step in the regulatory approval pathway for these novel diagnostics.

This study sought to develop and externally validate an AI-ECG model in six centres across North America to predict the risk of clinically important CMR-based biventricular size and systolic functional abnormalities in patients with repaired TOF.

Methods

Participating centres

Model training occurred at a large northeastern combined paediatric and adult congenital heart disease (ACHD) centre. External validation occurred at five hospital-based centres consisting of mixed ACHD and paediatric practices across the USA and Canada. An overview of the training and validation process is shown in Figure 1. Institutional Review Board approval with a waiver of consent was obtained at all participating institutions. This study adheres to EHRA-AI16 and TRIPOD-AI17 reporting guidelines (see Supplementary material online).

Figure 1.

Study overview. A multicentre study to train and validate a deep learning electrocardiogram model to predict the abnormalities in biventricular size and function on cardiac MRI for patients with tetralogy of Fallot.

Inclusion criteria

Investigators at each participating centre retrospectively identified patients of any age followed at their institution with TOF status post full intracardiac repair, body surface area (BSA) ≥ 1 m2, and at least one 12-lead ECG performed within 90 days of a CMR without an intermediate intervention (cardiac catheterization or surgical). At Site B, part of the cohort submitted was existing longitudinal registry data consisting of patients with moderate-or-greater pulmonary regurgitation at enrolment.18

Data collection

Clinical and demographic data were collected from electronic health records, internal databases, and imaging reports per site-specific practices (see Supplementary material online, Methods). Electrocardiograms obtained during clinical care were retrospectively identified and extracted from the MUSE ECG management system (GE Healthcare, USA) in XML format, either manually (Site E, which submitted 1 ECG per CMR) or through database query (all other sites, resulting in ≥ 1 ECG per CMR in the inclusion period). Electrocardiogram tracings were 10 s acquisitions sampled at either 250 or 500 Hz. Only the standard 12 leads were included. Cardiac MRI volumes and EF were collected from CMR reports performed during the course of clinical care, except for registry participants at Site B, for whom CMR volumes were remeasured in a core lab as part of the existing registry protocol.18

Prediction target

Cardiac MRI in TOF quantifies adverse ventricular remodelling to guide the timing of PVR. These CMR-defined criteria were the prediction targets of interest:2–4,19,20 BSA-indexed RV end-diastolic volume (RVEDVi) ≥ 160 mL/m2, BSA-indexed RV end-systolic volume (RVESVi) ≥ 80 mL/m2, RVEF < 47%, LVEF < 55%, and a composite of ≥2 of the above criteria. The five model outputs were prediction probabilities ranging from 0 to 1 for their respective CMR abnormality. As guidelines suggest referral for PVR with ≥2 CMR abnormalities in asymptomatic individuals, the ≥2 composite criteria were considered the main outcome and were subsequently evaluated for clinical utility as a screening tool to identify adverse ventricular remodelling.
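The five prediction targets can be written as a small labelling function. The sketch below is illustrative only: the `CmrStudy` class, its field names, and `remodelling_labels` are hypothetical and are not taken from the study code; only the thresholds come from the text above.

```python
from dataclasses import dataclass


@dataclass
class CmrStudy:
    """Hypothetical container for the CMR measurements used as targets."""
    rvedvi: float  # BSA-indexed RV end-diastolic volume, mL/m2
    rvesvi: float  # BSA-indexed RV end-systolic volume, mL/m2
    rvef: float    # RV ejection fraction, %
    lvef: float    # LV ejection fraction, %


def remodelling_labels(study: CmrStudy) -> dict:
    """Return the four single-criterion labels and the >=2 composite label."""
    labels = {
        "rvedvi_ge_160": study.rvedvi >= 160,
        "rvesvi_ge_80": study.rvesvi >= 80,
        "rvef_lt_47": study.rvef < 47,
        "lvef_lt_55": study.lvef < 55,
    }
    # Composite (main outcome): at least two of the four criteria are met.
    labels["composite_ge_2"] = sum(labels.values()) >= 2
    return labels


# One criterion met (RVEDVi only): composite is negative.
single = remodelling_labels(CmrStudy(rvedvi=165, rvesvi=75, rvef=50, lvef=58))
# Two criteria met (RVESVi and RVEF): composite is positive.
double = remodelling_labels(CmrStudy(rvedvi=150, rvesvi=85, rvef=45, lvef=60))
```

A patient with isolated RV dilation therefore counts as negative for the main outcome even though one sub-model target is positive.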

Model selection, architecture, and training

To maximize sample size, the training set was partitioned 90% training and 10% validation for monitoring loss and hyperparameter tuning, with no other data split for internal testing at the training site. Instead, model performance was evaluated exclusively at external validation sites. Increasing the training sample by combining data across centres for multisite training was not feasible due to data-sharing restrictions. Starting model weights were from a previously published congenital heart disease model trained on over 90 000 ECGs.14 The network architecture is identical to previous work,14,21 in which 12 × 2048 ECG inputs are fed into a convolutional neural network that includes residual blocks adapted for unidimensional signals.22 Details of model training are found in Supplementary material online, Methods.

Multicentre external validation

The model and pipeline for inference were packaged into a Docker container (Docker Inc., USA) for reproducibility of inference across all external validation centres. Global model discrimination was measured with area under the receiver-operating curve (AUROC) and average precision [analogous to area under the precision–recall curve (AUPRC)] with bootstrapped 95% confidence intervals. Subgroups were compared with the Bonferroni-corrected DeLong test. The containerized model and pipeline used for multicentre external inference and evaluation are available at https://github.com/sonqduong/ECGsizefxn_rTOF. Other software used and the data availability statement can be found in Supplementary material online, Methods.
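As an illustration of the discrimination metric, the sketch below computes a rank-based AUROC and a percentile bootstrap confidence interval using only the Python standard library. It is not the study pipeline: the function names, the number of resamples, and the ECG-level resampling are assumptions, and the text does not state how the authors resampled (a patient-level bootstrap may be preferable when patients contribute several ECGs).

```python
import random


def auroc(y_true, y_score):
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUROC (assumed scheme: resample ECGs with replacement)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class
        stats.append(auroc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data for illustration only (not study data).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)
toy_ci = bootstrap_ci(toy_labels, toy_scores, n_boot=200)
```

The pairwise loop is quadratic in cohort size, which is acceptable for a sketch; a sort-based rank computation would be the usual choice for cohorts of thousands of ECGs.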

Model calibration

Model calibration was assessed visually with reliability diagrams and statistically with the Spiegelhalter Z test.23 A Z test P < 0.05 in a model with AUROC >0.65 was considered evidence of miscalibration. In centres that met criteria for miscalibration, centre-specific Platt scaling23 was performed, which does not affect model AUROC at individual centres but improves calibration. Platt scaling was performed using leave-one-group-out cross-validation (with grouping at the patient level) to prevent data leakage and reduce overoptimistic estimates of performance. The calibrated results were then aggregated across centres to allow for global examination of performance.
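For readers unfamiliar with these calibration statistics, a minimal sketch of the Spiegelhalter Z test and of expected calibration error (ECE) follows. The function names and the choice of 10 equal-width bins for ECE are assumptions made for illustration; the study's exact implementation is described in its Supplementary material and is not reproduced here.

```python
import math


def spiegelhalter_z(y_true, p_pred):
    """Spiegelhalter Z statistic and two-sided normal P-value for calibration."""
    num = sum((y - p) * (1 - 2 * p) for y, p in zip(y_true, p_pred))
    var = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in p_pred)
    if var == 0:
        raise ValueError("variance is zero (all predictions are 0, 0.5, or 1)")
    z = num / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


def expected_calibration_error(y_true, p_pred, n_bins=10):
    """Weighted mean gap between observed and predicted risk per bin (assumed equal-width bins)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    n = len(y_true)
    ece = 0.0
    for members in bins:
        if members:
            observed = sum(y for y, _ in members) / len(members)
            predicted = sum(p for _, p in members) / len(members)
            ece += len(members) / n * abs(observed - predicted)
    return ece


# Toy data for illustration only: predicted risk 0.25 for every ECG.
calibrated = spiegelhalter_z([0, 0, 0, 1], [0.25] * 4)     # observed rate 25%
underestimate = spiegelhalter_z([1, 1, 1, 0], [0.25] * 4)  # observed rate 75%
toy_ece = expected_calibration_error([1, 1, 1, 0], [0.25] * 4)
```

In the second toy case the model underestimates risk, which gives a positive Z statistic; this is the direction of miscalibration the Results describe for the miscalibrated sites.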

Clinical utility

The composite outcome model was selected to demonstrate the clinical utility of AI-ECG as a biomarker to screen for adverse ventricular remodelling suggestive of the need for PVR. The cohort was further limited to the closest available ECG to the time of CMR to mimic the expected clinical use case of the algorithm. Threshold-specific accuracy metrics [sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV)] were reported at a 25% risk threshold, which represented a clinically reasonable pre-test probability of disease below which CMR could be deferred. Net benefit analysis with decision curves24 was implemented as described in the Supplementary material online, Methods to evaluate net reduction in CMR with implementation of the AI-ECG model at this threshold and over a range of clinically reasonable thresholds.

Model explainability

Median waveform analysis provides a visualization of representative high-risk and low-risk ECGs. Saliency mapping was performed using a Shapley Additive Explanations framework7 to visualize input ECG features that contribute most to model prediction. Model explainability was performed on ECGs from test Site A using saliency mapping on median waveforms from 10 independent patient ECGs from the highest and lowest AI-ECG risk scores, similar to prior work.25,26

As a comparator model, the final reported QRS duration on a subset of ECG from Site A and Site B was collected, and a simple univariable logistic regression model was fit using patient-stratified five-fold cross-validation to predict the primary outcome using QRS duration as a sole predictor. Area under the receiver-operating curve and decision curve analysis were reported for the AI-ECG model.

Results

Cohort characteristics

The training cohort consisted of 908 patients with 1991 CMRs and 2552 ECGs. The aggregated external validation cohort consisted of 782 patients with 996 CMRs and 1795 ECGs (Table 1). The aggregated external validation cohort had a higher prevalence of disease outcomes than the training site (34 vs. 23% for RVEDVi ≥ 160, 48 vs. 29% for RVESVi ≥ 80, 61 vs. 36% for RVEF < 47%, 51 vs. 35% for LVEF < 55%, 57 vs. 38% for composite of ≥2 criteria; P < 0.001 for all comparisons). There was heterogeneity across external validation centres in race/ethnicity, TOF subtype, age at study, year of study, body size measures, and RVOT type at CMR (P < 0.001 for all measures, Table 1). There were significant group differences in RV volumes and biventricular EFs across validation centres (P < 0.001).

Table 1.

Cohort characteristics

Sites A to E are the individual external validation centres.

| Characteristic | Train site | Aggregated external validation | Site A | Site B | Site C | Site D | Site E | P-value* |
|---|---|---|---|---|---|---|---|---|
| Patient characteristics | | | | | | | | |
| Independent patients, n | 908 | 782 | 161 | 250 | 54 | 175 | 142 | |
| Female sex, n (%) | 405 (45%) | 351 (46%) | 71 (49%) | 104 (42%) | 28 (52%) | 76 (45%) | 70 (49%) | 0.41 |
| Race/ethnicity | | | | | | | | |
| Non-Hispanic Asian | 29 (3%) | 70 (9%) | 11 (7%) | 33 (13%) | 8 (15%) | 6 (3%) | 12 (8%) | <0.001 |
| Non-Hispanic Black | 33 (4%) | 55 (7%) | 19 (12%) | 4 (2%) | 4 (7%) | 8 (5%) | 20 (14%) | |
| Hispanic | 51 (6%) | 95 (12%) | 27 (17%) | 4 (2%) | 16 (30%) | 12 (7%) | 36 (25%) | |
| Other/mixed | 35 (4%) | 45 (6%) | 16 (10%) | 12 (5%) | 2 (4%) | 12 (7%) | 3 (2%) | |
| Non-Hispanic Pacific Islander | | 4 (1%) | 3 (2%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (1%) | |
| Unknown | 152 (17%) | 174 (22%) | 27 (17%) | 67 (27%) | 1 (2%) | 78 (45%) | 1 (1%) | |
| Non-Hispanic White | 608 (67%) | 339 (43%) | 58 (36%) | 130 (52%) | 23 (43%) | 59 (34%) | 69 (49%) | |
| TOF subtype | | | | | | | | |
| Pulmonary stenosis | 580 (64%) | 530 (68%) | 139 (86%) | 181 (72%) | 18 (33%) | 91 (52%) | 102 (72%) | <0.001 |
| Pulmonary atresia | 161 (18%) | 68 (9%) | 16 (10%) | 9 (4%) | 6 (11%) | 19 (11%) | 18 (13%) | |
| PA/MAPCAs | 7 (1%) | 18 (2%) | 2 (1%) | 0 (0%) | 1 (2%) | 7 (4%) | 8 (6%) | |
| Absent pulmonary valve | 6 (1%) | 22 (3%) | 1 (1%) | 0 (0%) | 0 (0%) | 9 (5%) | 11 (8%) | |
| TOF-AV canal defect | 11 (1%) | 6 (1%) | 1 (1%) | 0 (0%) | 1 (2%) | 2 (1%) | 2 (1%) | |
| Unspecified | 143 (16%) | 138 (18%) | 2 (1%) | 60 (24%) | 28 (52%) | 47 (27%) | 1 (1%) | |
| CMR characteristics | | | | | | | | |
| Independent CMR, n | 1991 | 996 | 242 | 263 | 72 | 184 | 235 | |
| Age, median [IQR] | 23 [16–34] | 23 [16–34] | 25 [18–35] | 26 [18–40] | 28 [22–40] | 27 [17–41] | 17 [14–22] | <0.001 |
| Year, median [IQR] | 2012 [’08–’16] | 2016 [’14–’20] | 2019 [’14–’22] | 2015 [’14–’17] | 2017 [’15–’18] | 2021 [’18–’22] | 2015 [’11–’19] | <0.001 |
| Weight, median [IQR] | 65 [52–79] | 63 [52–78] | 67 [56–79] | 65 [52–80] | 67 [54–84] | 61 [50–75] | 58 [49–70] | <0.001 |
| BSA, median [IQR] | 1.7 [1.5–1.9] | 1.7 [1.5–1.9] | 1.8 [1.6–1.9] | 1.7 [1.5–2.0] | 1.8 [1.5–2.0] | 1.7 [1.5–1.9] | 1.6 [1.4–1.8] | <0.001 |
| Obese, n (%) | 374 (19%) | 162 (18%) | 46 (19%) | 48 (18%) | 15 (21%) | 17 (18%) | 36 (15%) | 0.75 |
| RVOT anatomy at the time of CMR | | | | | | | | |
| Surgical/transcatheter PVR, n (%) | | 198 (20%) | 52 (21%) | 5 (2%) | 37 (51%) | 59 (32%) | 45 (19%) | <0.001 |
| RV-PA conduit, n (%) | | 117 (12%) | 15 (6%) | 11 (4%) | 2 (3%) | 18 (10%) | 71 (30%) | |
| Transannular patch, n (%) | | 382 (38%) | 76 (31%) | 136 (52%) | 18 (25%) | 90 (49%) | 62 (26%) | |
| Valve-sparing, n (%) | | 175 (18%) | 29 (12%) | 75 (29%) | 10 (14%) | 7 (4%) | 54 (23%) | |
| Unknown, n (%) | | 124 (12%) | 70 (29%) | 36 (14%) | 5 (7%) | 10 (5%) | 3 (1%) | |
| RVEDVi, median [IQR] | 129 [107–153] | 132 [110–156] | 126 [105–151] | 150 [129–179] | 129 [109–154] | 134 [112–159] | 119 [103–138] | <0.001 |
| RVESVi, median [IQR] | 63 [515–80] | 69 [55–87] | 63 [51–75] | 85 [68–104] | 61 [47–77] | 72 [58–94] | 63 [52–76] | <0.001 |
| RVEF, median [IQR] | 50 [45–55] | 47 [42–52] | 50 [45–54] | 45 [40–49] | 51 [47–58] | 45 [39–50] | 48 [42–52] | <0.001 |
| LVEF, median [IQR] | 58 [54–62] | 56 [52–60] | 58 [55–63] | 55 [51–59] | 61 [57–66] | 55 [51–59] | 55 [50–58] | <0.001 |
| ECG prediction | | | | | | | | |
| Independent ECGs, n | 2552 | 1795 | 473 | 540 | 90 | 457 | 235 | |
| RVEDVi ≥ 160, n (%) | 590 (23%) | 608 (34%) | 90 (19%) | 314 (58%) | 25 (28%) | 159 (35%) | 20 (9%) | <0.001 |
| RVESVi ≥ 80, n (%) | 752 (29%) | 854 (48%) | 151 (32%) | 393 (73%) | 23 (26%) | 240 (53%) | 47 (20%) | <0.001 |
| RVEF < 47%, n (%) | 911 (36%) | 1086 (61%) | 201 (43%) | 410 (76%) | 26 (29%) | 345 (75%) | 104 (44%) | <0.001 |
| LVEF < 55%, n (%) | 892 (35%) | 909 (51%) | 182 (39%) | 336 (62%) | 18 (20%) | 258 (56%) | 115 (49%) | <0.001 |
| ≥2 criteria, n (%) | 966 (38%) | 1019 (57%) | 185 (39%) | 420 (78%) | 28 (31%) | 298 (65%) | 88 (37%) | <0.001 |

* P-value for differences between external validation sites.

External validation model performance

The centre-aggregated primary outcome model AUROC was 0.85 (95% CI 0.83–0.87), and the average precision was 0.88 (95% CI 0.86–0.90). Notably, AUROC varied across centres, ranging from 0.69 to 0.88 (Site A: 0.87, B: 0.88, C: 0.82, D: 0.74, E: 0.69; see Supplementary material online, Figure S1). Three of five centres met criteria for miscalibration (Sites B, D, E; see Supplementary material online, Results), with miscalibrated sites tending to underestimate the risk of ventricular remodelling (see Supplementary material online, Figure S1). After within-centre calibration, the model was adequately calibrated with expected calibration error of 6.1% (Spiegelhalter Z test P = 0.25, Figure 2).

Clinical utility of model implementation as a screening tool to identify adverse ventricular remodelling was examined for the ECG prediction closest to the time of CMR (outcome prevalence of 43%) at a risk threshold of 0.25. Performance metrics were sensitivity 92% (95% CI 89–95%), specificity 41% (95% CI 37–45%), PPV 54% (95% CI 50–57%), and NPV 87% (95% CI 83–91%). In net benefit analysis, implementation of the model as a screening test for ventricular remodelling yielded a net 13% reduction in CMR without missing any additional true positives at a risk threshold of 0.25. Limiting the evaluation to only well-performing centres increased the net reduction in CMR to 21%. Decision curves show a net reduction in CMR with the model compared with ‘CMR all’ and ‘CMR none’ strategies across a clinically acceptable risk threshold range of 8% to 50% (see Figure 3). Net benefit was examined across a range of disease prevalence from 25 to 50% due to concern that the study inclusion criteria favoured more diseased groups (see Supplementary material online, Figure S2). The projected net reduction in CMR increased to 24.5% at a prevalence of 25%, implying that AI-ECG screening could be more effective at reducing CMR usage in settings where ventricular remodelling is rarer.

Figure 2.

Model discrimination and calibration. Receiver-operating characteristic curves (left panel) and precision–recall curves (middle panel) to identify cardiac MRI-quantified volumetric and functional abnormalities in patients with repaired tetralogy of Fallot. Composite outcome reliability diagram (right panel). Calibrated predictions were aggregated across all sites. ECE, expected calibration error. P-value is the Spiegelhalter Z test P-value, in which P < 0.05 suggests model miscalibration.

Figure 3.

Decision curve analysis. Implementation of artificial intelligence analysis of electrocardiogram as a screening tool to detect adverse ventricular remodelling to aid in the timing of advanced imaging in tetralogy of Fallot. The net reduction in cardiac MRI (green line) is plotted against the current strategy of every patient undergoing cardiac MRI (red dashed line) and the strategy of no patient undergoing cardiac MRI (black dashed line). The well-performing subset of sites (A, B, C) is also separately plotted to examine the best-case net benefit (green dashed line).

Examining the utility of the model as a tool to rule-in disease (i.e. tuned for specificity at a risk threshold of 0.75), the model sensitivity was 43% (40–46%), specificity 94% (92–96%), PPV 84% (78–89%), and NPV 68% (65–71%). In total, 59% of the study cohort fell within either the low- or high-risk thresholds.

Model performance for the subcomponent volume and systolic functional CMR metrics, relative to the primary composite model, is shown in Figure 2. Prediction of RVESV and RVEF derangements has similar performance to the composite model (AUROC 0.85 for both). Right ventricular end-diastolic volume prediction performance is moderately lower (AUROC 0.78). Discrimination between those above and below an LVEF of 55% is limited (AUROC 0.69).

Subgroup analysis

Subgroup analysis of the composite outcome aggregated across centres is shown in Figure 4. There was no difference in model discrimination by patient sex (P = 0.18), race/ethnicity (P = 1.0), or presence of PVR after intracardiac repair (P = 1.0). There was lower performance in children and young adults <22 years old vs. older adults (AUROC 0.73 vs. 0.85; P = 0.005). Due to this finding, threshold-specific metrics were re-examined in the young adult subgroup, and despite the drop in AUROC, at the screening threshold to rule out adverse ventricular remodelling, the sensitivity was 85% and NPV was 83%, and there was 9% net benefit. At the specific ‘rule-in’ threshold, specificity was 97% and PPV 83%.

Figure 4.

Key subgroup performance. Key subgroup model performance in the aggregated external test cohort. Bonferroni-corrected DeLong test P-value for the difference in area under the receiver-operating curve displayed on the right.

Obesity was also associated with improved prediction (AUROC 0.90 vs. 0.84, P = 0.025). However, age and BMI were positively correlated (Spearman rho 0.49, P < 0.001), and in the cohort <22 years, the proportion of obese patients was significantly lower than in the older group (10.2 vs. 19.3%, P < 0.001). Therefore, AUROC was examined in the older age cohort alone, and no difference in prediction performance was identified (AUROC 0.90 vs. 0.89 for obese vs. non-obese).

Model explainability with saliency mapping

A saliency mapping example for the composite outcome model is shown in Figure 5. Mapping suggests that the QRS complex is the most important region for the prediction of disease state, particularly in V1 and V6. High-risk features include R’ notching in V1 and V2 and a taller R wave in V6.

Figure 5.

Median waveform analysis and saliency mapping. Median waveforms of the 10 highest (red line) and 10 lowest (green line) predicted risk for the primary outcome in independent patient electrocardiograms from Site A. Saliency mapping (blue shade) demarcates important prediction regions of the waveform.

Comparison to QRS duration model

A univariable logistic regression model to predict adverse ventricular remodelling using QRS duration was fit from 392 available ECGs and compared with the matching AI-ECG predictions. AI-ECG outperformed QRS duration (AUROC 0.87 vs. 0.78, see Supplementary material online, Figure S3). The QRS duration model was properly calibrated (Spiegelhalter Z test P = 0.99), but at the same clinical decision threshold of 0.25, there was no net reduction in CMR, and it was slightly harmful (net reduction −0.02).

Error analysis

To better understand the characteristics of the false negative predictions and the implications for care delivery, studies were partitioned into true positive, false negative, false positive, and true negative groups to examine CMR measurements and age at CMR (Table 2). The false negative group had less extreme median adverse remodelling than the true positive group. The false negative group was also younger than the true positive group (median age 17.0 vs. 27.1 years, P < 0.001), again demonstrating the diminished prediction performance in younger patients.

Table 2.

Characteristics of AI-ECG model predictions in correct and misclassified groups

| | True positive | False negative | False positive | True negative |
|---|---|---|---|---|
| n | 392 | 34 | 337 | 231 |
| Age (years) | 27.1 [19.1–41.0] | 17.0 [14.0–24.9] | 21.2 [15.4–30.6] | 21.9 [14.8–29.6] |
| RVEDVi (mL/m2) | 159.7 [135.0–186.4] | 150.9 [117.6–172.8] | 120.8 [104.2–135.0] | 117.0 [98.9–134.5] |
| RVESVi (mL/m2) | 92.0 [80.7–109.1] | 84.8 [68.4–90.0] | 60.2 [51.7–69.1] | 54.1 [44.9–65.7] |
| RVEF (%) | 41.2 [35.8–45.0] | 44.3 [40.1–49.0] | 50.0 [47.0–53.2] | 52.7 [49.9–56.9] |
| LVEF (%) | 52.0 [47.4–55.4] | 53.6 [50.0–58.0] | 58.0 [55.2–61.0] | 59.3 [55.7–64.0] |

Results are presented as median [IQR].
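The group sizes in Table 2 also allow a check of the threshold-specific metrics and the net reduction in CMR reported at the 0.25 risk threshold. The sketch below uses the standard decision-curve expression for net reduction in interventions, which is an assumption here because the study's own implementation is described only in its Supplementary material; the function names are hypothetical. With the Table 2 counts it returns values that round to the reported 92% sensitivity, 41% specificity, 54% PPV, 87% NPV, and 13% net reduction, and about 24.5% when prevalence is set to 25%.

```python
def screening_metrics(tp, fn, fp, tn, threshold):
    """Threshold-specific accuracy metrics and net reduction in CMR per patient."""
    n = tp + fn + fp + tn
    odds = threshold / (1 - threshold)  # 0.25 risk corresponds to odds of 1:3
    return {
        "prevalence": (tp + fn) / n,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Deferred true negatives minus missed cases weighted by (1 - pt) / pt.
        "net_reduction": tn / n - (fn / n) / odds,
    }


def net_reduction_at_prevalence(sensitivity, specificity, prevalence, threshold):
    """Same quantity, re-expressed for a hypothetical disease prevalence."""
    odds = threshold / (1 - threshold)
    return specificity * (1 - prevalence) - (1 - sensitivity) * prevalence / odds


# Counts taken from Table 2 (closest ECG to each CMR, risk threshold 0.25).
m = screening_metrics(tp=392, fn=34, fp=337, tn=231, threshold=0.25)
low_prevalence = net_reduction_at_prevalence(
    m["sensitivity"], m["specificity"], prevalence=0.25, threshold=0.25
)
```

At a threshold of 0.25 each missed case is weighted as three unnecessary CMRs, so the 34 false negatives offset 102 of the 231 deferred scans.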

Discussion

The widely available and cost-effective 12-lead ECG can serve as an AI-ECG biomarker to personalize the assessment of adverse ventricular remodelling in TOF. By linking CMR-defined criteria for adverse ventricular remodelling with the ECG waveform, this unique biomarker may inform the timing of further imaging and may be used to effectively reduce the number of CMRs required in TOF. A strength of this study is the breadth of multiple external validations, which advance important technical and pragmatic aspects of multicentre AI model implementation in congenital heart disease. Important additional novel findings of this study are (i) RV contractile metrics such as RVESV and RVEF were better predicted than RVEDV; and (ii) AI-ECG models exhibited important differences in discrimination and calibration across external validation sites.

Clinical significance

The output of the AI-ECG model is the probability of CMR volumetric or functional derangements indicative of adverse ventricular remodelling. This may be used as a novel biomarker to risk-stratify the need for and mode of advanced imaging. As the recommended CMR screening interval in the asymptomatic adult TOF population is every 3 years,4,5 a proposed use of this model is to personalize the timing and mode of surveillance imaging. For low-risk patients who have less than a 25% risk of having an actionable CMR abnormality, use of this test may delay (but not necessarily replace, due to false negatives) the need for surveillance CMR by a year, with 92% sensitivity and 87% NPV. One method of evaluating benefit is net benefit and decision curve analysis, which describes the net reduction in CMR over an acceptable range of risks. Acceptable risk is dependent on patient and practitioner preferences. In this analysis, anything less than a 25% risk, or three ‘negative’ CMRs for every one ‘positive’ CMR for evidence of significant ventricular remodelling, was considered a reasonable range of risk to delay a CMR given the typically slow progression of ventricular remodelling in older children and adults.11 Because current practice is to obtain CMR on all patients, implementation of this strategy is equivalent to a strategy that identified every case of disease in the population but only referred 87% of patients for CMR. Importantly, decision curves illustrate net reduction over a range of thresholds of patient and provider risk preference. A net reduction in CMR was observed for all risk probabilities at and above 8%, meaning that the current strategy of obtaining CMR on every patient would be warranted only for very risk-averse providers or patients.

Although the model utilized North American CMR-based surveillance guidelines as prediction targets, these are similar to European Society of Cardiology guidelines, which also state RVEDVi ≥160 mL/m2, RVESVi ≥ 80 mL/m2, and RV systolic dysfunction are Class IIa recommendations for PVR.27

The AI-ECG biomarker may also be informative for the patient with a high risk (>75%) of adverse ventricular remodelling. They could be referred for earlier-than-usual CMR for earlier disease identification, or they may be referred directly for retrospective ECG-gated cardiac CT for transcatheter pulmonary valve evaluation28 and volumetric assessment to eliminate redundant imaging. Finally, patients with intermediate risk would continue usual care. The elimination of unnecessary or redundant imaging could increase cost-effectiveness and ensure that limited resources are used in those most likely to benefit.

This approach meets a particularly salient need in the ACHD population. Shortfalls in the availability of ACHD providers29 lead to unequal access to speciality care. Early adulthood is a particularly high-risk period for patients with congenital heart disease to lose access to care,30 and in one study, only 13% of patients were adherent to imaging guidelines,10 highlighting the challenge of delivering appropriate care to this population. Furthermore, this is a high-risk age window in which adverse ventricular remodelling may develop, and PVR may be indicated.31 AI-ECG could be an effective tool to provide increased access to congenital care in this at-risk population, as it may be performed with non-specialized equipment and providers. Even though global model discrimination (AUROC) was decreased in patients <22 years old, it is important to recognize that the screening threshold metrics still suggest clinical utility and net benefit in this patient population. Nevertheless, and even though the global false negative rate is low (high test sensitivity), error analysis suggests that ‘missed’ patients tend to be younger, with CMR measurements closer to the threshold for intervention. The clinical impact of delaying diagnosis in younger patients may be minimized as PVR in children is less common,5 and the vast majority of TOF patients do not progress rapidly across serial CMR.11 However, as guidelines recommend a ‘baseline’ CMR in adolescence near the time of transition to adulthood,32 this algorithm may not be suited to replace this initial CMR. Differences in model performance may be related to underrepresentation of younger patients within the training cohort due to the experimental requirement for a paired CMR (which is typically first obtained in older children and young adults) and BSA ≥1 m2.

Pathophysiological relevance

The ECG patterns in TOF convey important electromechanical information. Right bundle branch block QRS duration is prognostic, with a duration >180 ms a risk factor for sudden cardiac death.33,34 Notably, the AI-ECG model outperformed QRS duration for the prediction of adverse remodelling, and QRS duration alone was insufficient to reduce CMR usage at clinically relevant screening thresholds. QRS fragmentation is a more recently described pattern that has been associated with RV size, function, exercise tolerance, and outcome.35–40 Saliency mapping in this study identifies the later portions of the QRS complex, during early to mid-systole, as an important area for risk stratification, with notching of the R’ and R height observed as important factors. Interestingly, new data deemphasize the importance of diastolic size in risk stratification and emphasize that contractile metrics like RVESV and RVEF may have stronger associations with outcome.5,19,20,41 Similarly, this study suggests that RVESV and RVEF may have stronger electromechanical representation in the ECG than diastolic size, which is a potential mechanism for the correlations observed between the ECG, the CMR, and long-term outcome in TOF.

Interestingly, obesity was associated with improved model performance. Obesity in TOF is associated with lower LV and RV EF and leads to underestimation of the degree of ventricular dilation when volumes are indexed to BSA rather than ideal body weight.42 In this study, weak negative correlations between BMI and RVEF (Spearman rho = −0.07, P = 0.024, see Supplementary material online, Figure S4) and between BMI and RVEDVi (Spearman rho = −0.08, P = 0.015) were observed, consistent with these prior reports. However, unsurprisingly, obesity and age were correlated with each other, and within the ≥22-year-old age group, model performance was not higher in obese patients. Given the known limitations of BSA scaling in volumetric RV assessment in the setting of obesity, ECG analysis could be a novel method for identification of ventricular remodelling, but further development of this method is required to ensure results are not confounded by age effects.

Implications for AI model development and deployment in congenital heart disease

This study goes beyond single-centre external validation to evaluate the generalizability of an AI-ECG model across a diverse spectrum of practices and settings. Based on the extensive external validation performed in this study, we conclude that local verification of discrimination and calibration is required. This is a novel and important finding in the congenital heart disease literature, particularly when examined in a regulatory framework. For ‘high risk’ AI systems, the newly implemented European Union AI Act demands ‘an appropriate level of accuracy’ which shall be defined ‘in cooperation with relevant stakeholders and organizations’. The United States Food and Drug Administration approval pathway requires evidence of model generalizability43 that may be demonstrated with two separate external validations. This study could meet these requirements as three centres performed strongly on external validation, but further evaluation revealed suboptimal performance in two other centres. This reinforces that demonstration of discrimination, calibration, and net benefit across a wide range of centres is important for safe and effective adoption of AI tools. The current standards could be inadequate to ensure truly generalizable performance, which warrants further examination across the breadth of AI-ECG applications.

Calibration is an essential consideration to ensure that model probability scores can be reliably interpreted as predicted risk, but it is not well studied in the congenital heart disease literature. Calibration varied across sites, including miscalibration at one site that had a high AUROC (Site B). We speculate that differences in calibration may arise from out-of-distribution prevalence of adverse remodelling and age distributions in the miscalibrated centres (see Supplementary material online, Results and discussion). This highlights that model discrimination (AUROC) is only the first step in AI model evaluation. Calibration is essential to ensure that globally applied risk thresholds can be expected to have similar performance across institutions. Furthermore, calibration ensures the predicted risk aligns with the observed outcomes to give clinicians the context needed to make personalized decisions. Important metrics like net benefit are not meaningful without proper model calibration24 because the model output probability does not reflect the underlying risk of disease. This is further emphasized by analysis of net benefit curves for the uncalibrated model (see Supplementary material online, Figure S5), which shows very little net benefit at the a priori threshold amongst well-performing centres and negative net benefit when all centres are considered. This shows that without proper consideration of calibration, a model may be harmful compared with the current standard of care when considered in the context of clinical risk tolerance. These findings echo a recent report that an FDA-approved AI-ECG model has calibration differences across centres that significantly affect the model performance at predefined screening thresholds to identify hypertrophic cardiomyopathy.44 Mitigation of calibration differences across centres might be achieved through a multicentre training design to account for distribution differences across centres.
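As an illustration of what centre-specific recalibration can look like, the following minimal sketch fits a logistic recalibration (an intercept and slope on the logit of the raw score) to simulated data from one site. This is one common approach and is not necessarily the method used in this study; the simulated risks, the size of the miscalibration, and all function names are assumptions made for illustration only.

```python
import numpy as np

def logit(p):
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return np.log(p / (1 - p))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_recalibration(y, p, n_iter=25):
    """Fit intercept a and slope b so that sigmoid(a + b * logit(p)) tracks y.

    Uses Newton-Raphson on the logistic log-likelihood (hypothetical helper).
    """
    X = np.column_stack([np.ones_like(p), logit(p)])
    beta = np.array([0.0, 1.0])  # start from "no recalibration needed"
    for _ in range(n_iter):
        mu = sigmoid(X @ beta)
        weights = mu * (1 - mu)
        grad = X.T @ (y - mu)
        hess = X.T @ (X * weights[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta[0], beta[1]

# Simulated site (assumption): the raw score overestimates risk by +1 on the logit scale.
rng = np.random.default_rng(0)
true_risk = rng.uniform(0.05, 0.95, size=5000)
y = (rng.uniform(size=5000) < true_risk).astype(float)
raw_score = sigmoid(logit(true_risk) + 1.0)

a, b = fit_logistic_recalibration(y, raw_score)
recalibrated = sigmoid(a + b * logit(raw_score))

threshold = 0.25  # screening risk threshold used in this study
print(f"intercept={a:.2f}, slope={b:.2f}")
print(f"mean raw score={raw_score.mean():.3f}, observed rate={y.mean():.3f}")
print(f"flagged at {threshold}: raw={np.mean(raw_score >= threshold):.3f}, "
      f"recalibrated={np.mean(recalibrated >= threshold):.3f}")
```

In this simulation the same 0.25 threshold flags a different fraction of patients before and after recalibration, which is why a globally applied threshold can behave differently across centres when calibration differs.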

To aid in further research and implementation in the congenital heart disease community, the complete containerized pipeline and model to perform AI-ECG inference from 12-lead ECG in XML format is available for download at https://github.com/sonqduong/ECGsizefxn_rTOF.

Limitations

The AI-ECG model output is the predicted probability of significant CMR abnormalities. However, these volumetric criteria for intervention are debated. A scientific statement5 released after data collection, model training, and analysis of this study suggested modification of the CMR-based indications for PVR, though RVESVi, RVEF, LVEF, and RVEDVi are still all directly or indirectly integrated. It remains to be seen whether this guidance will be broadly adopted. Notably, in this study, the composite, RVEF, and RVESV models all exhibit similar performance, which suggests that this model could be successfully adapted to updated guidance. Although the models were produced from retrospective data before the release of this statement, if the definitions of significant ventricular remodelling change, then CMR screening practices are also likely to change over time, which may in turn become a source of model drift in prospective implementation. It is also recognized that the AI-ECG model cannot completely replace advanced imaging because CMR also provides important information beyond quantification of volumes and EF such as pulmonary regurgitation fraction, aorta and pulmonary artery size, branch pulmonary artery flow distribution, and evaluation for scar and fibrosis to further risk stratify outcomes.45–47

In this study, significant differences across institutions in CMR measurements and outcome prevalence are observed, which may be related to population differences across centres but may also be related to how CMR studies are quantified at each centre. The combined effect of interstudy and interobserver variability on RV measurements in TOF may be substantial, varying by 12.7 mL/m2 for RVEDVi, 9.3 mL/m2 for RVESVi, and 7% for RVEF.48 Even in healthy children, coefficients of variation in RVEDV, RVESV, and RVEF range from 6 to ∼10%, with evidence for systematic differences between observers.49 If there are systematic differences in how CMRs are measured, then this may degrade performance in external validation through ‘concept shift’ (i.e. similar ECG features may reflect the same underlying ‘true’ measurement, but due to CMR measurement differences, the ECG is labelled as disease-positive at one site and disease-negative at another). This may in part explain the differences in calibration observed across centres. Although one of the sites (Site B) submitted CMR data that were partially remeasured in a core lab18 and discrimination remained strong, it is likely that performance could be further improved by recontouring all studies at a core facility, as has been performed in other major TOF cohort studies18,50 to improve generalizability.
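The concept shift described above can be illustrated with a small simulation. The sketch below applies the RVEDVi ≥ 160 mL/m2 criterion to the same simulated patients measured at two hypothetical sites, one of which contours systematically larger. The distribution parameters and the 10 mL/m2 offset are illustrative assumptions and are not estimates from this study.

```python
import numpy as np

RVEDVI_THRESHOLD = 160.0  # mL/m2, from the composite outcome definition

def label_prevalence(measured, threshold=RVEDVI_THRESHOLD):
    """Fraction of patients labelled abnormal at the given threshold."""
    return float(np.mean(measured >= threshold))

rng = np.random.default_rng(1)

# Hypothetical "true" RVEDVi values; mean and SD are assumptions for illustration.
true_rvedvi = rng.normal(loc=150.0, scale=30.0, size=10_000)

# Hypothetical systematic contouring difference between two sites (mL/m2).
site_offset = 10.0
site_a = true_rvedvi
site_b = true_rvedvi + site_offset

prev_a = label_prevalence(site_a)
prev_b = label_prevalence(site_b)

# Patients whose label differs between sites despite identical underlying anatomy.
flipped = float(np.mean((site_a >= RVEDVI_THRESHOLD) != (site_b >= RVEDVI_THRESHOLD)))

print(f"label prevalence: site A={prev_a:.3f}, site B={prev_b:.3f}")
print(f"patients with discordant labels: {flipped:.3f}")
```

Under these assumptions, a modest systematic offset changes both the outcome prevalence and the label attached to otherwise identical ECGs, which affects calibration even if the ECG-to-anatomy relationship is unchanged.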

Because variations in discrimination are observed, it is important to verify discrimination and calibration locally. The calibration was performed using cross-validation instead of a hold-out calibration set to maximize sample size for detailed subgroup performance analysis. However, this strategy risks presenting overoptimistic calibration results, which may in turn affect net benefit analysis results, which are sensitive to calibration. In prospective implementation, recalibration would by necessity be performed on a hold-out calibration set. As this validation was only performed at North American sites, validation in Europe and other global locations that may benefit from AI-ECG screening is required. Finally, due to the experimental requirement for a paired CMR, the cohort is enriched for patients with more severe disease, as more frequent CMR is recommended in higher-risk patients.4 Certain populations with a likely lower incidence of disease such as those without significant pulmonary regurgitation might be underrepresented. We addressed this by examining net benefit under a range of projected disease prevalence and found that the net reduction in CMR increased as disease prevalence decreased.
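The dependence of net reduction on prevalence follows from the standard decision-curve formulation for a rule-out strategy.24 The sketch below computes the net reduction in CMR per 100 patients, relative to imaging everyone, from sensitivity, specificity, prevalence, and the risk threshold. It uses the sensitivity and specificity reported at the 0.25 threshold and assumes, for illustration only, that both remain fixed as prevalence changes; the prevalence values and the function name are hypothetical.

```python
def net_reduction_per_100(sensitivity, specificity, prevalence, threshold):
    """Net reduction in tests per 100 patients for a rule-out strategy.

    Equivalent to (net benefit of model - net benefit of test-all)
    divided by threshold / (1 - threshold), multiplied by 100.
    """
    true_neg = specificity * (1.0 - prevalence)      # tests correctly avoided
    false_neg = (1.0 - sensitivity) * prevalence     # disease missed
    harm_weight = (1.0 - threshold) / threshold      # cost of a miss in units of avoided tests
    return 100.0 * (true_neg - false_neg * harm_weight)

sens, spec, pt = 0.92, 0.41, 0.25  # values reported at the screening threshold

# Hypothetical projected prevalences.
prevalences = [0.2, 0.3, 0.4, 0.5]
reductions = [net_reduction_per_100(sens, spec, prev, pt) for prev in prevalences]

for prev, red in zip(prevalences, reductions):
    print(f"prevalence={prev:.1f}: net reduction per 100 patients = {red:.1f}")
```

With fixed sensitivity and specificity, the net reduction falls linearly as prevalence rises, because each missed case is weighted by (1 − threshold)/threshold while each avoided test in a disease-free patient counts once.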

Conclusions

Artificial intelligence analysis of the electrocardiogram can be used as a biomarker for adverse ventricular remodelling to inform the timing of advanced imaging in patients with repaired TOF. This may increase care efficiency and expand access to care by reducing unnecessary or redundant CMR. Through its rare multicentre design, this study also reveals important differences in performance and model calibration across centres and cohorts, which have important practical implications for real-world prospective implementation of AI-ECG biomarkers in congenital heart disease and beyond.

Supplementary Material

ztag015_Supplementary_Data

Acknowledgements

The authors would like to thank Dr. Andrew Vickers for advice in implementation of net benefit analysis.

Contributor Information

Son Q Duong, Department of Pediatrics (Cardiology), Icahn School of Medicine at Mount Sinai, 1468 Madison Ave, Annenberg 3rd Fl, New York, NY 10029, USA; Windreich Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Center for Artificial Intelligence in Children’s Health, Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

Akhil Vaid, Windreich Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; The Hasso Plattner Institute for Digital Health at Mount Sinai, New York, NY 10029, USA.

Pengfei Jiang, Center for Child Health Services Research, Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Population Health Sciences and Policy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

Yuval Bitterman, Department of Paediatrics, Labatt Family Heart Centre, Hospital for Sick Children (SickKids), Toronto M5G 1X8, Canada.

Yamini Krishnamurthy, Department of Medicine (Cardiology), Columbia University Medical Center, New York, NY 10032, USA.

I Min Chiu, Department of Medicine (Cardiology), Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA.

Joshua Finer, Department of Medicine (Cardiology), Columbia University Medical Center, New York, NY 10032, USA.

Brian Cleary, Department of Pediatrics (Cardiology), Northwestern Feinberg School of Medicine, Chicago, IL 60611, USA.

Benjamin S Glicksberg, Windreich Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Center for Artificial Intelligence in Children’s Health, Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; The Hasso Plattner Institute for Digital Health at Mount Sinai, New York, NY 10029, USA.

Ruchira Garg, Department of Pediatrics (Cardiology), Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA.

Michael DiLorenzo, Department of Pediatrics (Cardiology), Columbia University College of Physicians and Surgeons, New York, NY 10032, USA.

Mark Friedberg, Department of Paediatrics, Labatt Family Heart Centre, Hospital for Sick Children (SickKids), Toronto M5G 1X8, Canada.

Evan Zahn, Department of Pediatrics (Cardiology), Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA.

Matthew Lewis, Department of Medicine (Cardiology), Columbia University Medical Center, New York, NY 10032, USA.

Michael Satzer, Department of Pediatrics (Cardiology), Northwestern Feinberg School of Medicine, Chicago, IL 60611, USA.

David Ouyang, Department of Medicine (Cardiology), Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA.

Pierre Elias, Department of Medicine (Cardiology), Columbia University Medical Center, New York, NY 10032, USA.

Tal Geva, Department of Cardiology, Boston Children’s Hospital, Boston, MA 02115, USA.

Sunil Ghelani, Department of Cardiology, Boston Children’s Hospital, Boston, MA 02115, USA.

Brett R Anderson, Department of Pediatrics (Cardiology), Icahn School of Medicine at Mount Sinai, 1468 Madison Ave, Annenberg 3rd Fl, New York, NY 10029, USA; Center for Child Health Services Research, Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Population Health Sciences and Policy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

Ali Zaidi, Department of Pediatrics (Cardiology), Icahn School of Medicine at Mount Sinai, 1468 Madison Ave, Annenberg 3rd Fl, New York, NY 10029, USA; Mount Sinai Fuster Heart Hospital, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

Rachel M Wald, Department of Paediatrics, Labatt Family Heart Centre, Hospital for Sick Children (SickKids), Toronto M5G 1X8, Canada; Peter Munk Cardiac Centre, Toronto Adult Congenital Heart Disease Program, University of Toronto, Toronto M5G 2C4, Ontario, Canada.

Girish N Nadkarni, Windreich Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; The Hasso Plattner Institute for Digital Health at Mount Sinai, New York, NY 10029, USA.

Joshua Mayourian, Department of Cardiology, Boston Children’s Hospital, Boston, MA 02115, USA.

Supplementary material

Supplementary material is available at European Heart Journal – Digital Health.

Author contributions

Son Duong (Conceptualization [lead]; Data curation [lead]; Formal analysis [lead]; Investigation [lead]; Methodology [lead]; Supervision [equal]; Validation [lead]; Visualization [lead]; Writing—original draft [lead]; Writing—review & editing [lead]), Rachel Wald (Conceptualization [supporting]; Data curation [supporting]; Formal analysis [supporting]; Investigation [supporting]; Supervision [supporting]; Writing—review & editing [supporting]), Ali Zaidi (Data curation [supporting]; Investigation [supporting]; Writing—review & editing [supporting]), Brett Anderson (Data curation [supporting]; Formal analysis [supporting]; Writing—review & editing [supporting]), Sunil Ghelani (Data curation [supporting]; Writing—review & editing [supporting]), Tal Geva (Data curation [supporting]; Writing—review & editing [supporting]), Pierre Elias (Data curation [supporting]; Formal analysis [supporting]; Writing—review & editing [supporting]), David Ouyang (Data curation [supporting]; Formal analysis [supporting]; Writing—review & editing [supporting]), Michael Satzer (Data curation [supporting]; Writing—review & editing [supporting]), Matthew Lewis (Data curation [supporting]; Formal analysis [supporting]; Writing—review & editing [supporting]), Evan Zahn (Data curation [supporting]; Writing—review & editing [supporting]), Mark Friedberg (Data curation [supporting]; Writing—review & editing [supporting]), Michael DiLorenzo (Data curation [supporting]; Writing—review & editing [supporting]), Ruchira Garg (Data curation [supporting]; Writing—review & editing [supporting]), Benjamin Glicksberg (Methodology [supporting]; Resources [supporting]; Writing—review & editing [supporting]), Brian Cleary (Data curation [supporting]; Writing—review & editing [supporting]), Joshua Finer (Data curation [supporting]; Formal analysis [supporting]; Writing—review & editing [supporting]), I-Min Chiu (Data curation [supporting]; Formal analysis [supporting]; Writing—review & editing [supporting]), 
Yamini Krishnamurthy (Data curation [supporting]; Writing—review & editing [supporting]), Yuval Bitterman (Data curation [supporting]; Writing—review & editing [supporting]), Pengfei Jiang (Data curation [supporting]; Formal analysis [supporting]; Writing—review & editing [supporting]), Akhil Vaid (Formal analysis [supporting]; Investigation [supporting]; Methodology [supporting]), Girish Nadkarni (Formal analysis [supporting]; Funding acquisition [equal]; Resources [equal]; Supervision [equal]; Writing—review & editing [lead]), and Joshua Mayourian (Conceptualization [equal]; Formal analysis [equal]; Investigation [equal]; Methodology [equal]; Writing—review & editing [equal])

Funding

This work was supported by National Institutes of Health K08HL173639 and the American Society of Echocardiography Foundation EDGES Award (S.Q.D.), National Institutes of Health R01HL167050 (G.N.N.), Kostin Innovation Fund (J.M.), Thrasher Research Fund Early Career Award (J.M.), and National Institutes of Health T32HL007572 (J.M.).

Data availability

The data underlying this article cannot be shared publicly because they contain protected patient health information. The data may be shared upon reasonable request and in accordance with appropriate regulatory oversight. The containerized pipeline and model to perform AI-ECG inference from 12-lead ECG in XML format is available for download at https://github.com/sonqduong/ECGsizefxn_rTOF.

References

  • 1. Apitz C, Webb GD, Redington AN. Tetralogy of Fallot. Lancet 2009;374:1462–1471.
  • 2. Geva T. Indications for pulmonary valve replacement in repaired tetralogy of Fallot: the quest continues. Circulation 2013;128:1855–1857.
  • 3. Bokma JP, Geva T, Sleeper LA, Lee JH, Lu M, Sompolinsky T, et al. Improved outcomes after pulmonary valve replacement in repaired tetralogy of Fallot. J Am Coll Cardiol 2023;81:2075–2085.
  • 4. Stout KK, Daniels CJ, Aboulhosn JA, Bozkurt B, Broberg CS, Colman JM, et al. 2018 AHA/ACC guideline for the management of adults with congenital heart disease: a report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. Circulation 2019;139:e698–e800.
  • 5. Geva T, Wald RM, Bucholz E, Cnota JF, McElhinney DB, Mercer-Rosa LM, et al. Long-term management of right ventricular outflow tract dysfunction in repaired tetralogy of Fallot: a scientific statement from the American Heart Association. Circulation 2024;150:e689–e707.
  • 6. Mercer-Rosa L, Parnell A, Forfia PR, Yang W, Goldmuntz E, Kawut SM. Tricuspid annular plane systolic excursion in the assessment of right ventricular function in children and adolescents after repair of tetralogy of Fallot. J Am Soc Echocardiogr 2013;26:1322–1329.
  • 7. Lopez L, Saurers DL, Barker PCA, Cohen MS, Colan SD, Dwyer J, et al. Guidelines for performing a comprehensive pediatric transthoracic echocardiogram: recommendations from the American Society of Echocardiography. J Am Soc Echocardiogr 2024;37:119–170.
  • 8. Buddhe S, Soriano BD, Powell AJ. Survey of centers performing cardiovascular magnetic resonance in pediatric and congenital heart disease: a report of the Society for Cardiovascular Magnetic Resonance. J Cardiovasc Magn Reson 2022;24:10.
  • 9. Salciccioli KB, Oluyomi A, Lupo PJ, Ermis PR, Lopez KN. A model for geographic and sociodemographic access to care disparities for adults with congenital heart disease. Congenit Heart Dis 2019;14:752–759.
  • 10. Khan AM, McGrath LB, Ramsey K, Agarwal A, Broberg CS. Association of adults with congenital heart disease-specific care with clinical characteristics and healthcare use. J Am Heart Assoc 2021;10:e019598.
  • 11. Rutz T, Ghandour F, Meierhofer C, Naumann S, Martinoff S, Lange R, et al. Evolution of right ventricular size over time after tetralogy of Fallot repair: a longitudinal cardiac magnetic resonance study. Eur Heart J Cardiovasc Imaging 2017;18:364–370.
  • 12. Wald RM, Valente AM, Gauvreau K, Babu-Narayan SV, Assenza GE, Schreier J, et al. Cardiac magnetic resonance markers of progressive RV dilation and dysfunction after tetralogy of Fallot repair. Heart 2015;101:1724–1730.
  • 13. Duong SQ, Vaid A, My VTH, Butler LR, Lampert J, Pass RH, et al. Quantitative prediction of right ventricular size and function from the ECG. J Am Heart Assoc 2024;13:e031671.
  • 14. Mayourian J, Gearhart A, La Cava WG, Vaid A, Nadkarni GN, Triedman JK, et al. Deep learning-based electrocardiogram analysis predicts biventricular dysfunction and dilation in congenital heart disease. J Am Coll Cardiol 2024;84:815–828.
  • 15. Duong SQ, Dominy CL, Lampert J, Singh S, Croft L, Zaidi AN, et al. Ensemble modeling of multimodal electrocardiogram and echocardiogram data improves quantitative assessment of right ventricular function. JACC Adv 2024;3:101186.
  • 16. Svennberg E, Han JK, Caiani EG, Engelhardt S, Ernst S, Friedman P, et al. State of the art of artificial intelligence in clinical electrophysiology in 2025: a scientific statement of the European Heart Rhythm Association (EHRA) of the ESC, the Heart Rhythm Society (HRS), and the ESC Working Group on E-Cardiology. Europace 2025;27:euaf071.
  • 17. Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD + AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 2024;385:q902.
  • 18. Wald RM, Altaha MA, Alvarez N, Caldarone CA, Cavallé-Garrido T, Dallaire F, et al. Rationale and design of the Canadian outcomes registry late after tetralogy of Fallot repair: the CORRELATE study. Can J Cardiol 2014;30:1436–1443.
  • 19. Valente AM, Gauvreau K, Assenza GE, Babu-Narayan SV, Schreier J, Gatzoulis MA, et al. Contemporary predictors of death and sustained ventricular tachycardia in patients with repaired tetralogy of Fallot enrolled in the INDICATOR cohort. Heart 2014;100:247–253.
  • 20. Bokma JP, Winter MM, Oosterhof T, Vliegen HW, van Dijk AP, Hazekamp MG, et al. Preoperative thresholds for mid-to-late haemodynamic and clinical outcomes after pulmonary valve replacement in tetralogy of Fallot. Eur Heart J 2016;37:829–835.
  • 21. Mayourian J, La Cava W, Vaid A, Ghelani SJ, Mannix R, Bezzerides VJ, et al. Pediatric electrocardiogram-based deep learning to predict left ventricular dysfunction and remodeling. Circulation 2024;149:917–931.
  • 22. Ribeiro AH, Ribeiro MH, Paixão GMM, Oliveira DM, Gomes PR, Canazart JA, et al. Automatic diagnosis of the 12-lead ECG using a deep neural network. Nat Commun 2020;11:1760.
  • 23. Huang Y, Li W, Macheret F, Gabriel RA, Ohno-Machado L. A tutorial on calibration measurements and calibration models for clinical prediction models. J Am Med Inform Assoc 2020;27:621–633.
  • 24. Vickers AJ, Van Calster B, Steyerberg EW. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ 2016;352:i6.
  • 25. Khurshid S, Friedman S, Pirruccello JP, Di Achille P, Diamant N, Anderson CD, et al. Deep learning to predict cardiac magnetic resonance–derived left ventricular mass and hypertrophy from 12-lead ECGs. Circ Cardiovasc Imaging 2021;14:e012281.
  • 26. Mayourian J, El-Bokl A, Lukyanenko P, La Cava WG, Geva T, Valente AM, et al. Electrocardiogram-based deep learning to predict mortality in paediatric and adult congenital heart disease. Eur Heart J 2025;46:856–868.
  • 27. Baumgartner H, De Backer J, Babu-Narayan SV, Budts W, Chessa M, Diller G-P, et al. 2020 ESC guidelines for the management of adult congenital heart disease. Eur Heart J 2021;42:563–645.
  • 28. Gillespie MJ, Benson LN, Bergersen L, Bacha EA, Cheatham SL, Crean AM, et al. Patient selection process for the harmony transcatheter pulmonary valve early feasibility study. Am J Cardiol 2017;120:1387–1392.
  • 29. Chowdhury D, Johnson JN, Baker-Smith CM, Jaquiss RDB, Mahendran AK, Curren V, et al. Health care policy and congenital heart disease: 2020 focus on our 2030 future. J Am Heart Assoc 2021;10:e020605.
  • 30. Moons P, Bratt EL, De Backer J, Goossens E, Hornung T, Tutarel O, et al. Transition to adulthood and transfer to adult care of adolescents with congenital heart disease: a global consensus statement of the ESC Association of Cardiovascular Nursing and Allied Professions (ACNAP), the ESC Working Group on Adult Congenital Heart Disease (WG ACHD), the Association for European Paediatric and Congenital Cardiology (AEPC), the Pan-African Society of Cardiology (PASCAR), the Asia-Pacific Pediatric Cardiac Society (APPCS), the Inter-American Society of Cardiology (IASC), the Cardiac Society of Australia and New Zealand (CSANZ), the International Society for Adult Congenital Heart Disease (ISACHD), the World Heart Federation (WHF), the European Congenital Heart Disease Organisation (ECHDO), and the Global Alliance for Rheumatic and Congenital Hearts (Global ARCH). Eur Heart J 2021;42:4213–4223.
  • 31. Slouha E, Trygg G, Tariq AH, La A, Shay A, Gorantla VR. Pulmonary valve replacement timing following initial tetralogy of Fallot repair: a systematic review. Cureus 2023;15:e49577.
  • 32. Mayourian J, La Cava WG, Vaid A, Nadkarni GN, Ghelani SJ, Mannix R, et al. Pediatric ECG-based deep learning to predict left ventricular dysfunction and remodeling. Circulation 2024;149:917–931.
  • 33. Khairy P, Harris L, Landzberg MJ, Viswanathan S, Barlow A, Gatzoulis MA, et al. Implantable cardioverter-defibrillators in tetralogy of Fallot. Circulation 2008;117:363–370.
  • 34. Gatzoulis MA, Till JA, Somerville J, Redington AN. Mechanoelectrical interaction in tetralogy of Fallot. Circulation 1995;92:231–237.
  • 35. Egbe AC, Luis SA, Padang R, Warnes CA. Outcomes in moderate mixed aortic valve disease: is it time for a paradigm shift? J Am Coll Cardiol 2016;67:2321–2329.
  • 36. Bokma JP, Winter MM, Vehmeijer JT, Vliegen HW, van Dijk AP, van Melle JP, et al. QRS fragmentation is superior to QRS duration in predicting mortality in adults with tetralogy of Fallot. Heart 2017;103:666–671.
  • 37. Alonso P, Andrés A, Rueda J, Buendía F, Igual B, Rodríguez M, et al. Value of the electrocardiogram as a predictor of right ventricular dysfunction in patients with chronic right ventricular volume overload. Rev Esp Cardiol 2015;68:390–397.
  • 38. Buntharikpornpun R, Jaruratanasirikul S, Roymanee S, Jarutach J, Wongwaitaweewong K, Sangthong R. Correlation between fragmented QRS and ventricular function from cardiac magnetic resonance in patients with repaired tetralogy of Fallot. Pediatr Cardiol 2021;42:1713–1721.
  • 39. Book WM, Hurst JW, Parks WJ, Hopkins KL. Electrocardiographic predictors of right ventricular volume measured by magnetic resonance imaging late after total repair of tetralogy of Fallot. Clin Cardiol 1999;22:740–746.
  • 40. Lumens J, Fan CS, Walmsley J, Yim D, Manlhiot C, Dragulescu A, et al. Relative impact of right ventricular electromechanical dyssynchrony versus pulmonary regurgitation on right ventricular dysfunction and exercise intolerance in patients after repair of tetralogy of Fallot. J Am Heart Assoc 2019;8:e010903.
  • 41. Ishikita A, McIntosh C, Hanneman K, Lee MM, Liang T, Karur GR, et al. Machine learning for prediction of adverse cardiovascular events in adults with repaired tetralogy of Fallot using clinical and cardiovascular magnetic resonance imaging variables. Circ Cardiovasc Imaging 2023;16:e015205.
  • 42. Aly S, Lizano Santamaria RW, Devlin PJ, Jegatheeswaran A, Russell J, Seed M, et al. Negative impact of obesity on ventricular size and function and exercise performance in children and adolescents with repaired tetralogy of Fallot. Can J Cardiol 2020;36:1482–1490.
  • 43. Center for Devices and Radiological Health. Good Machine Learning Practice for Medical Device Development: Guiding Principles. FDA. Published online March 25, 2025. https://www.fda.gov/medical-devices/software-medical-device-samd/good-machine-learning-practice-medical-device-development-guiding-principles (6 August 2025).
  • 44. Lampert J, Bhatt DL, Vaid A, Kon K, Feinman J, Jou S, et al. Calibration of ECG-based deep-learning algorithm scores for patients flagged as high risk for hypertrophic cardiomyopathy. NEJM AI 2025;2:AIoa2400421.
  • 45. Geva T. Repaired tetralogy of Fallot: the roles of cardiovascular magnetic resonance in evaluating pathophysiology and for pulmonary valve replacement decision support. J Cardiovasc Magn Reson 2011;13:9.
  • 46. Valente AM, Cook S, Festa P, Ko HH, Krishnamurthy R, Taylor AM, et al. Multimodality imaging guidelines for patients with repaired tetralogy of Fallot: a report from the American Society of Echocardiography. J Am Soc Echocardiogr 2014;27:111–141.
  • 47. Ghonim S, Gatzoulis MA, Ernst S, Li W, Moon JC, Smith GC, et al. Predicting survival in repaired tetralogy of Fallot: a lesion-specific and personalized approach. JACC Cardiovasc Imaging 2022;15:257–268.
  • 48. Blalock SE, Banka P, Geva T, Powell AJ, Zhou J, Prakash A. Inter-study variability in CMR measurements of right ventricular volume, mass and ejection fraction in tetralogy of Fallot: a prospective observational study. J Cardiovasc Magn Reson 2012;14:P104.
  • 49. van der Ven JPG, Sadighy Z, Valsangiacomo Buechel ER, Sarikouch S, Robbers-Visser D, Kellenberger CJ, et al. Multicentre reference values for cardiac magnetic resonance imaging derived ventricular size and function for children aged 0–18 years. Eur Heart J Cardiovasc Imaging 2020;21:102–113.
  • 50. Valente AM, Gauvreau K, Assenza GE, Babu-Narayan SV, Evans SP, Gatzoulis M, et al. Rationale and design of an international multicenter registry of patients with repaired tetralogy of Fallot to define risk factors for late adverse outcomes: the INDICATOR cohort. Pediatr Cardiol 2013;34:95–104.

Articles from European Heart Journal. Digital Health are provided here courtesy of Oxford University Press on behalf of the European Society of Cardiology