Abstract
Aims
Periodic cardiac MRI (CMR) is recommended to identify adverse ventricular remodelling in repaired tetralogy of Fallot (TOF), but access to CMR is uneven, and compliance is poor. We developed a 12-lead electrocardiogram (ECG) artificial intelligence (AI) biomarker to identify CMR-quantified adverse biventricular remodelling in repaired TOF.
Methods and results
Six (1 train/5 external test) North American retrospective cohorts with paired ECG and CMR were included. The main outcome was a composite of ≥2 TOF-specific CMR abnormalities: right ventricular (RV) end-diastolic volume ≥ 160 mL/m2, RV end-systolic volume ≥ 80 mL/m2, RV ejection fraction (EF) <47%, and left ventricular EF <55%. Model discrimination, calibration, and net benefit as a screening test to rule out ventricular remodelling were assessed. Nine hundred and eight patients (2552 ECGs) were included in training, and 782 patients (1795 ECGs) in external validation (outcome prevalence 57%). The area under the receiver-operating curve (AUROC) was 0.85 (95% confidence interval 0.83–0.87), and average precision was 0.88. At a screening risk-threshold of 0.25, there was 92% sensitivity, 41% specificity, 87% negative predictive value, and 54% positive predictive value for ventricular remodelling, which yielded a 13% net reduction in CMR use on net benefit analysis. Discrimination did not differ by sex or race/ethnicity but did differ by age and site: two of five sites had lower AUROC than the others, and three of five sites met criteria for miscalibration, which improved after centre-specific calibration.
Conclusion
A biomarker based on artificial intelligence analysis of the electrocardiogram (AI-ECG) effectively identifies ventricular remodelling in repaired TOF to inform the timing of advanced imaging. Extensive external validation revealed variation in discrimination and calibration, which is an important consideration for clinical implementation and regulatory approval pathways of AI-ECG in congenital heart disease.
Graphical Abstract
Introduction
Tetralogy of Fallot (TOF) is the most common cyanotic congenital heart disease, affecting 1 in 3000 live births.1 Right ventricular outflow tract (RVOT) dysfunction after repair of TOF is nearly universal and characterized by chronic pulmonary regurgitation with residual obstruction. Although well-tolerated early in life, chronic RVOT dysfunction precipitates a pathophysiological cascade that may lead to morbidity and mortality after the second decade of life.2
Lifelong surveillance is recommended for early detection of electromechanical cardiomyopathy that might be ameliorated by pulmonary valve replacement (PVR) to increase survival.3,4 Guidelines recommend serial monitoring and RVOT intervention with PVR at ventricular size and functional thresholds that represent a ‘tipping point’ before irreversible changes occur.2,4,5
Because assessment of RV size and systolic function by echocardiography is suboptimal,6,7 cardiac magnetic resonance imaging (CMR) is recommended for assessment of RV volumes, ejection fraction (EF), and mass. However, CMR is infeasible in some patients, requires specialized expertise and equipment, and typically exceeds 2 h of patient and clinician time to acquire, process, and report.8 These factors limit its use in under-resourced settings: half of adults with congenital heart disease live more than an hour from an appropriate care centre,9 and adherence to guideline-recommended diagnostic imaging is low.10 Furthermore, frequent CMR in TOF may not be necessary in select patients.11,12 This population is therefore an ideal target for the development of broadly available precision diagnostics to increase care efficiency and expand access to quantitative RV assessment.
Artificial intelligence analysis of electrocardiograms (AI-ECG) is a novel method for RV assessment. Artificial intelligence analysis of electrocardiogram methods can predict CMR-quantified RV dilation and dysfunction in adult13 and congenital heart disease populations,14 and AI-ECG complements echocardiogram-based RV functional assessment.15 However, the performance of AI-ECG to identify adverse ventricular remodelling at clinically important quantitative thresholds in repaired TOF has not been explored. Additionally, prior AI-ECG studies in paediatric and congenital heart disease are limited by a lack of multiple-centre external validation, and thus, differences in performance and calibration have yet to be explored. This is an important next step in the regulatory approval pathway for these novel diagnostics.
This study sought to develop and externally validate an AI-ECG model in six centres across North America to predict the risk of clinically important CMR-based biventricular size and systolic functional abnormalities in patients with repaired TOF.
Methods
Participating centres
Model training occurred at a large northeastern combined paediatric and adult congenital heart disease (ACHD) centre. External validation occurred at five hospital-based centres consisting of mixed ACHD and paediatric practices across the USA and Canada. An overview of the training and validation process is shown in Figure 1. Institutional Review Board approval with a waiver of consent was obtained at all participating institutions. This study adheres to EHRA-AI16 and TRIPOD-AI17 reporting guidelines (see Supplementary material online).
Figure 1.
Study overview. A multicentre study to train and validate a deep learning electrocardiogram model to predict the abnormalities in biventricular size and function on cardiac MRI for patients with tetralogy of Fallot.
Inclusion criteria
Investigators at each participating centre retrospectively identified patients of any age followed at their institution with TOF status post full intracardiac repair, body surface area (BSA) ≥ 1 m2, and at least one 12-lead ECG performed within 90 days of a CMR without an intermediate intervention (cardiac catheterization or surgical). At Site B, part of the cohort submitted was existing longitudinal registry data consisting of patients with moderate-or-greater pulmonary regurgitation at enrolment.18
Data collection
Clinical and demographic data were collected from electronic health records, internal databases, and imaging reports per site-specific practices (see Supplementary material online, Methods). Electrocardiograms obtained during clinical care were retrospectively identified and extracted from the MUSE ECG management system in XML format (GE Healthcare, USA) either manually (Site E, which submitted 1 ECG per CMR) or through database query (all other sites, resulting in ≥ 1 ECG per CMR in the inclusion period). Electrocardiogram tracings were 10 s acquisitions sampled at either 250 or 500 Hz. Only the standard 12 leads were included. Cardiac MRI volumes and EF were collected from CMR reports performed during the course of clinical care, except for registry participants at Site B, for whom CMR volumes were remeasured in a core lab as part of the existing registry protocol.18
Prediction target
Cardiac MRI in TOF quantifies adverse ventricular remodelling to guide the timing of PVR. These CMR-defined criteria were the prediction targets of interest:2–4,19,20 BSA-indexed RV end-diastolic volume (RVEDVi) ≥ 160 mL/m2, BSA-indexed RV end-systolic volume (RVESVi) ≥ 80 mL/m2, RVEF < 47%, LVEF < 55%, and a composite of ≥2 of the above criteria. The five model outputs were prediction probabilities ranging from 0 to 1 for their respective CMR abnormality. As guidelines suggest referral for PVR with ≥2 CMR abnormalities in asymptomatic individuals, the ≥2 composite criteria were considered the main outcome and were subsequently evaluated for clinical utility as a screening tool to identify adverse ventricular remodelling.
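The five prediction targets above can be expressed as simple threshold rules. The following is a minimal sketch, not the study's labelling code: the class name `CmrMeasurement`, the function `cmr_targets`, and the dictionary keys are hypothetical names introduced here for illustration, and only the thresholds are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class CmrMeasurement:
    """One CMR study (hypothetical container; field names are illustrative)."""
    rvedvi: float  # BSA-indexed RV end-diastolic volume, mL/m2
    rvesvi: float  # BSA-indexed RV end-systolic volume, mL/m2
    rvef: float    # RV ejection fraction, %
    lvef: float    # LV ejection fraction, %


def cmr_targets(m: CmrMeasurement) -> dict:
    """Return the four single-criterion labels and the composite (>=2 criteria) label."""
    flags = {
        "rvedvi_ge_160": m.rvedvi >= 160,
        "rvesvi_ge_80": m.rvesvi >= 80,
        "rvef_lt_47": m.rvef < 47,
        "lvef_lt_55": m.lvef < 55,
    }
    # Composite main outcome: at least two of the four criteria are met.
    flags["composite_ge_2"] = sum(flags.values()) >= 2
    return flags


# Example: dilated RV end-systolic volume plus reduced RVEF meets the composite.
example = cmr_targets(CmrMeasurement(rvedvi=150, rvesvi=85, rvef=45, lvef=58))
```

Each ECG paired with the CMR would inherit these five binary labels, one per model output.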
Model selection, architecture, and training
To maximize sample size, the training set was partitioned 90% training and 10% validation for monitoring loss and hyperparameter tuning, with no other data split for internal testing at the training site. Instead, model performance was evaluated exclusively at external validation sites. Increasing the training sample by combining data across centres for multisite training was not feasible due to data-sharing restrictions. Starting model weights were from a previously published congenital heart disease model trained on over 90 000 ECGs.14 The network architecture is identical to previous work,14,21 in which 12 × 2048 ECG signals are passed into a convolutional neural network that includes residual blocks adapted for unidimensional signals.22 Details of model training are found in Supplementary material online, Methods.
Multicentre external validation
The model and pipeline for inference were packaged into a Docker container (Docker Inc., USA) for reproducibility of inference across all external validation centres. Global model discrimination was measured with area under the receiver-operating curve (AUROC) and average precision [analogous to area under the precision–recall curve (AUPRC)] with bootstrapped 95% confidence intervals. Subgroups were analysed with the Bonferroni-corrected DeLong test. The containerized model and pipeline used for multicentre external inference and evaluation are available at https://github.com/sonqduong/ECGsizefxn_rTOF. Other software used and the data availability statement can be found in Supplementary material online, Methods.
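As an illustration of the discrimination metric described above, the sketch below computes AUROC from ranks and a percentile bootstrap confidence interval using only the Python standard library. It is an assumption-laden example rather than the study's evaluation code: the function names are hypothetical, and resampling is done per ECG, whereas the study does not state here whether resampling was per ECG or per patient.

```python
import random


def auroc(y_true, y_score):
    """AUROC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pairs = sorted(zip(y_score, y_true))
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j shared by tied scores
        rank_sum += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    # Mann-Whitney U statistic of the positives, normalised by the number of pairs.
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUROC (resamples individual records with replacement)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [y_true[k] for k in idx]
        if 0 < sum(labels) < n:  # skip resamples that contain a single class
            stats.append(auroc(labels, [y_score[k] for k in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Labels are expected as 0/1 integers. With several ECGs per patient, resampling whole patients rather than single ECGs would be the more conservative choice.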
Model calibration
Model calibration was assessed visually with reliability diagrams and statistically with the Spiegelhalter Z test.23 A Z test P < 0.05 in a model with AUROC >0.65 was considered evidence of miscalibration. In centres that met criteria for miscalibration, centre-specific Platt scaling23 was performed, which does not affect model AUROC at individual centres but improves calibration. Platt scaling was performed using leave-one-group-out cross-validation (with grouping at the patient level) to prevent data leakage and reduce overoptimistic estimates of performance. The calibrated results were then aggregated across centres to allow for global examination of performance.
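The calibration steps described above can be sketched as follows. This is an illustrative standard-library implementation under stated assumptions, not the study's code: the Spiegelhalter Z statistic uses its usual textbook form, Platt scaling is fitted on the logit of the model probability by Newton's method, and the function names are hypothetical. The leave-one-group-out cross-validation loop used in the study is omitted for brevity.

```python
import math


def spiegelhalter_z(y_true, p_pred):
    """Spiegelhalter Z statistic and two-sided P-value; small P suggests miscalibration."""
    num = sum((y - p) * (1 - 2 * p) for y, p in zip(y_true, p_pred))
    var = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in p_pred)
    z = num / math.sqrt(var)  # undefined if every prediction is exactly 0, 0.5, or 1
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def logit(p, eps=1e-6):
    p = min(max(p, eps), 1 - eps)  # clip to avoid infinite logits
    return math.log(p / (1 - p))


def fit_platt(y_true, p_pred, n_iter=50):
    """Fit p' = sigmoid(a * logit(p) + b) by Newton's method on the log-loss."""
    x = [logit(p) for p in p_pred]
    a, b = 1.0, 0.0  # identity mapping as the starting point
    for _ in range(n_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y_true):
            q = 1 / (1 + math.exp(-(a * xi + b)))
            w = q * (1 - q)
            g_a += (q - yi) * xi
            g_b += q - yi
            h_aa += w * xi * xi
            h_ab += w * xi
            h_bb += w
        det = h_aa * h_bb - h_ab * h_ab
        if det < 1e-12:  # singular Hessian, e.g. all scores identical
            break
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a - da, b - db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


def apply_platt(p_pred, a, b):
    """Map raw probabilities to calibrated probabilities with fitted (a, b)."""
    return [1 / (1 + math.exp(-(a * logit(p) + b))) for p in p_pred]
```

When the fitted slope `a` is positive, the mapping is monotonic, so the ranking of ECGs and therefore the within-centre AUROC is unchanged, which matches the statement above. To avoid leakage, `fit_platt` would be fitted on all patients except the held-out group and applied only to that group.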
Clinical utility
The composite outcome model was selected to demonstrate the clinical utility of AI-ECG as a biomarker to screen for adverse ventricular remodelling suggestive of the need for PVR. The cohort was further limited to the closest available ECG to the time of CMR to mimic the expected clinical use case of the algorithm. Threshold-specific accuracy metrics [sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV)] were reported at a 25% risk threshold, which represented a clinically reasonable pre-test odds of disease at which CMR could be deferred. Net benefit with decision curve analysis24 was implemented as described in the Supplementary material online, Methods to evaluate the net reduction in CMR with implementation of the AI-ECG model at this threshold and over a range of clinically reasonable thresholds.
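For orientation, the standard decision curve analysis quantities can be written as below. This is the conventional textbook formulation and is assumed here; the exact implementation used in this study is the one described in the Supplementary material and may differ in detail. Here $p_t$ is the risk threshold, $N$ the number of patients, and $TP$, $FP$, $TN$, $FN$ the counts obtained by classifying patients as positive when predicted risk is at least $p_t$.

$$
\text{NB}_{\text{model}}(p_t) = \frac{TP}{N} - \frac{FP}{N}\cdot\frac{p_t}{1-p_t},
\qquad
\text{NB}_{\text{CMR all}}(p_t) = \frac{TP+FN}{N} - \frac{FP+TN}{N}\cdot\frac{p_t}{1-p_t}
$$

$$
\text{Net reduction in CMR}(p_t)
= \frac{\text{NB}_{\text{model}}(p_t) - \text{NB}_{\text{CMR all}}(p_t)}{p_t/(1-p_t)}
= \frac{TN}{N} - \frac{FN}{N}\cdot\frac{1-p_t}{p_t}
$$

At $p_t = 0.25$ the weight $(1-p_t)/p_t$ equals 3, so each missed case offsets three avoided CMR studies.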
Model explainability
Median waveform analysis provides a visualization of representative high-risk and low-risk ECGs. Saliency mapping was performed using a Shapley Additive Explanations framework7 to visualize input ECG features that contribute most to model prediction. Model explainability was performed on ECGs from test Site A using saliency mapping on median waveforms from 10 independent patient ECGs from the highest and lowest AI-ECG risk scores, similar to prior work.25,26
As a comparator model, the final reported QRS duration on a subset of ECGs from Site A and Site B was collected, and a simple univariable logistic regression model was fit using patient-stratified five-fold cross-validation to predict the primary outcome using QRS duration as the sole predictor. Area under the receiver-operating curve and decision curve analysis were reported for comparison with the AI-ECG model.
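Patient-stratified cross-validation keeps every ECG from one patient in the same fold, so that a patient never appears in both the fitting and evaluation data. A minimal sketch of such a fold assignment is shown below; the function name `patient_grouped_folds` and the round-robin assignment are illustrative assumptions, not the study's procedure.

```python
import random


def patient_grouped_folds(patient_ids, n_folds=5, seed=0):
    """Return one fold index per ECG such that all ECGs of a patient share a fold."""
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)  # random but reproducible patient order
    # Deal patients into folds round-robin so fold sizes differ by at most one patient.
    fold_of = {pid: i % n_folds for i, pid in enumerate(unique_ids)}
    return [fold_of[pid] for pid in patient_ids]


# Example with hypothetical patient identifiers, one entry per ECG.
ecg_patient_ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5", "p6"]
folds = patient_grouped_folds(ecg_patient_ids)
```

For each fold, the logistic regression would be fitted on ECGs from the other folds and used to predict the held-out fold.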
Results
Cohort characteristics
The training cohort consisted of 908 patients with 1991 CMR and 2552 ECGs. The aggregated external validation cohort consisted of 782 patients with 996 CMR and 1795 ECGs (Table 1). The aggregated external validation cohort had a higher prevalence of disease outcomes compared with the training site (34 vs. 23% for RVEDVi ≥ 160, 48 vs. 29% for RVESVi ≥ 80, 61 vs. 36% for RVEF < 47%, 51 vs. 35% for LVEF < 55%, 57 vs. 38% for composite of ≥2 criteria; P < 0.001 for all comparisons). There was heterogeneity across external validation centres in race/ethnicity, TOF subtype, age at study, year of study, body size measures, and RVOT type at CMR (P < 0.001 for all measures, Table 1). There were significant group differences in RV volumes and biventricular EFs across validation centres (P < 0.001).
Table 1.
Cohort characteristics
| | Train site | Aggregated external validation | Individual external validation centres | | | | | |
|---|---|---|---|---|---|---|---|---|
| | | | Site A | Site B | Site C | Site D | Site E | P-value* |
| Patient characteristics | | | | | | | | |
| Independent patients, n | 908 | 782 | 161 | 250 | 54 | 175 | 142 | |
| Female sex, n (%) | 405 (45%) | 351 (46%) | 71 (49%) | 104 (42%) | 28 (52%) | 76 (45%) | 70 (49%) | 0.41 |
| Race/ethnicity | | | | | | | | |
| Non-Hispanic Asian | 29 (3%) | 70 (9%) | 11 (7%) | 33 (13%) | 8 (15%) | 6 (3%) | 12 (8%) | <0.001 |
| Non-Hispanic Black | 33 (4%) | 55 (7%) | 19 (12%) | 4 (2%) | 4 (7%) | 8 (5%) | 20 (14%) | |
| Hispanic | 51 (6%) | 95 (12%) | 27 (17%) | 4 (2%) | 16 (30%) | 12 (7%) | 36 (25%) | |
| Other/mixed | 35 (4%) | 45 (6%) | 16 (10%) | 12 (5%) | 2 (4%) | 12 (7%) | 3 (2%) | |
| Non-Hispanic Pacific Islander | — | 4 (1%) | 3 (2%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (1%) | |
| Unknown | 152 (17%) | 174 (22%) | 27 (17%) | 67 (27%) | 1 (2%) | 78 (45%) | 1 (1%) | |
| Non-Hispanic White | 608 (67%) | 339 (43%) | 58 (36%) | 130 (52%) | 23 (43%) | 59 (34%) | 69 (49%) | |
| TOF subtype | | | | | | | | |
| Pulmonary stenosis | 580 (64%) | 530 (68%) | 139 (86%) | 181 (72%) | 18 (33%) | 91 (52%) | 102 (72%) | <0.001 |
| Pulmonary atresia | 161 (18%) | 68 (9%) | 16 (10%) | 9 (4%) | 6 (11%) | 19 (11%) | 18 (13%) | |
| PA/MAPCAs | 7 (1%) | 18 (2%) | 2 (1%) | 0 (0%) | 1 (2%) | 7 (4%) | 8 (6%) | |
| Absent pulmonary valve | 6 (1%) | 22 (3%) | 1 (1%) | 0 (0%) | 0 (0%) | 9 (5%) | 11 (8%) | |
| TOF-AV canal defect | 11 (1%) | 6 (1%) | 1 (1%) | 0 (0%) | 1 (2%) | 2 (1%) | 2 (1%) | |
| Unspecified | 143 (16%) | 138 (18%) | 2 (1%) | 60 (24%) | 28 (52%) | 47 (27%) | 1 (1%) | |
| CMR characteristics | | | | | | | | |
| Independent CMR, n | 1991 | 996 | 242 | 263 | 72 | 184 | 235 | |
| Age (years), median [IQR] | 23 [16–34] | 23 [16–34] | 25 [18–35] | 26 [18–40] | 28 [22–40] | 27 [17–41] | 17 [14–22] | <0.001 |
| Year, median [IQR] | 2012 [’08–’16] | 2016 [’14–’20] | 2019 [’14–’22] | 2015 [’14–’17] | 2017 [’15–’18] | 2021 [’18–’22] | 2015 [’11–’19] | <0.001 |
| Weight (kg), median [IQR] | 65 [52–79] | 63 [52–78] | 67 [56–79] | 65 [52–80] | 67 [54–84] | 61 [50–75] | 58 [49–70] | <0.001 |
| BSA (m2), median [IQR] | 1.7 [1.5–1.9] | 1.7 [1.5–1.9] | 1.8 [1.6–1.9] | 1.7 [1.5–2.0] | 1.8 [1.5–2.0] | 1.7 [1.5–1.9] | 1.6 [1.4–1.8] | <0.001 |
| Obese, n (%) | 374 (19%) | 162 (18%) | 46 (19%) | 48 (18%) | 15 (21%) | 17 (18%) | 36 (15%) | 0.75 |
| RVOT anatomy at the time of CMR | | | | | | | | |
| Surgical/transcatheter PVR, n (%) | — | 198 (20%) | 52 (21%) | 5 (2%) | 37 (51%) | 59 (32%) | 45 (19%) | <0.001 |
| RV-PA conduit, n (%) | — | 117 (12%) | 15 (6%) | 11 (4%) | 2 (3%) | 18 (10%) | 71 (30%) | |
| Transannular patch, n (%) | — | 382 (38%) | 76 (31%) | 136 (52%) | 18 (25%) | 90 (49%) | 62 (26%) | |
| Valve-sparing, n (%) | — | 175 (18%) | 29 (12%) | 75 (29%) | 10 (14%) | 7 (4%) | 54 (23%) | |
| Unknown, n (%) | — | 124 (12%) | 70 (29%) | 36 (14%) | 5 (7%) | 10 (5%) | 3 (1%) | |
| RVEDVi (mL/m2), median [IQR] | 129 [107–153] | 132 [110–156] | 126 [105–151] | 150 [129–179] | 129 [109–154] | 134 [112–159] | 119 [103–138] | <0.001 |
| RVESVi (mL/m2), median [IQR] | 63 [515–80] | 69 [55–87] | 63 [51–75] | 85 [68–104] | 61 [47–77] | 72 [58–94] | 63 [52–76] | <0.001 |
| RVEF (%), median [IQR] | 50 [45–55] | 47 [42–52] | 50 [45–54] | 45 [40–49] | 51 [47–58] | 45 [39–50] | 48 [42–52] | <0.001 |
| LVEF (%), median [IQR] | 58 [54–62] | 56 [52–60] | 58 [55–63] | 55 [51–59] | 61 [57–66] | 55 [51–59] | 55 [50–58] | <0.001 |
| ECG prediction | | | | | | | | |
| Independent ECGs, n | 2552 | 1795 | 473 | 540 | 90 | 457 | 235 | |
| RVEDVi ≥ 160, n (%) | 590 (23%) | 608 (34%) | 90 (19%) | 314 (58%) | 25 (28%) | 159 (35%) | 20 (9%) | <0.001 |
| RVESVi ≥ 80, n (%) | 752 (29%) | 854 (48%) | 151 (32%) | 393 (73%) | 23 (26%) | 240 (53%) | 47 (20%) | <0.001 |
| RVEF < 47%, n (%) | 911 (36%) | 1086 (61%) | 201 (43%) | 410 (76%) | 26 (29%) | 345 (75%) | 104 (44%) | <0.001 |
| LVEF < 55%, n (%) | 892 (35%) | 909 (51%) | 182 (39%) | 336 (62%) | 18 (20%) | 258 (56%) | 115 (49%) | <0.001 |
| ≥2 criteria, n (%) | 966 (38%) | 1019 (57%) | 185 (39%) | 420 (78%) | 28 (31%) | 298 (65%) | 88 (37%) | <0.001 |
* P-value for differences between external validation sites.
External validation model performance
The centre-aggregated primary outcome model AUROC was 0.85 (95% CI 0.83–0.87), and the average precision was 0.88 (95% CI 0.86–0.90). Notably, AUROC varied across centres, ranging from 0.69 to 0.88 (AUROC Site A: 0.87, B: 0.88, C: 0.82, D: 0.74, E: 0.69; see Supplementary material online, Figure S1). Three of five centres met criteria for miscalibration (Sites B, D, E; see Supplementary material online, Results), with miscalibrated sites tending to underestimate the risk of ventricular remodelling (see Supplementary material online, Figure S1). After within-centre calibration, the model was adequately calibrated with expected calibration error of 6.1% (Spiegelhalter Z test P = 0.25, Figure 2).
Clinical utility of model implementation as a screening tool to identify adverse ventricular remodelling was examined for the ECG prediction closest to the time of CMR (outcome prevalence of 43%) at a risk threshold of 0.25. Performance metrics were sensitivity 92% (95% CI 89–95%), specificity 41% (95% CI 37–45%), PPV 54% (95% CI 50–57%), and NPV 87% (95% CI 83–91%). In net benefit analysis, implementation of the model as a screening test for ventricular remodelling yielded a net 13% reduction in CMR without missing any additional true positives at a risk threshold of 0.25. Limiting the evaluation to only well-performing centres increased the net reduction in CMR to 21%. Decision curves show a net reduction in CMR with the model compared with ‘CMR all’ and ‘CMR none’ strategies in a clinically acceptable risk threshold range of 8% to 50% (see Figure 3).
Net benefit was examined across a range of disease prevalence from 25 to 50% due to concern that the study inclusion criteria favoured more diseased groups (see Supplementary material online, Figure S2). The projected net reduction in CMR increased to 24.5% at a prevalence of 25%, implying that AI-ECG screening could be more effective at reducing CMR usage in settings where ventricular remodelling is rarer.
Figure 2.
Model discrimination and calibration. Receiver-operating characteristic curves (left panel) and precision–recall curves (middle panel) to identify cardiac MRI-quantified volumetric and functional abnormalities in patients with repaired tetralogy of Fallot. Composite outcome reliability diagram (right panel). Calibrated predictions were aggregated across all sites. ECE, expected calibration error. P-value is Spiegelhalter’s Z test P-value, in which P < 0.05 suggests model miscalibration.
Figure 3.
Decision curve analysis. Implementation of artificial intelligence analysis of electrocardiogram as a screening tool to detect adverse ventricular remodelling to aid in the timing of advanced imaging in tetralogy of Fallot. The net reduction in cardiac MRI (green line) is plotted against the current strategy of every patient undergoing cardiac MRI (red dashed line) and the strategy of no patient undergoing cardiac MRI (black dashed line). The well-performing subset of sites (A, B, C) is also separately plotted to examine the best-case net benefit (green dashed line).
Examining the utility of the model as a tool to rule-in disease (i.e. tuned for specificity at a risk threshold of 0.75), the model sensitivity was 43% (40–46%), specificity 94% (92–96%), PPV 84% (78–89%), and NPV 68% (65–71%). In total, 59% of the study cohort fell within either the low- or high-risk thresholds.
Model performance for the subcomponent volume and systolic functional CMR metrics, compared with the primary composite model, is shown in Figure 2. Prediction of RVESV and RVEF derangements had similar performance to the composite model (AUROC 0.85 for both). Right ventricular end-diastolic volume prediction performance was moderately lower (AUROC 0.78). Discrimination between those above and below an LVEF of 55% was limited (AUROC 0.69).
Subgroup analysis
Subgroup analysis of the composite outcome aggregated across centres is shown in Figure 4. There was no difference in model discrimination by patient sex (P = 0.18), race/ethnicity (P = 1.0), or presence of PVR after intracardiac repair (P = 1.0). There was lower performance in children and young adults <22 years old vs. older adults (AUROC 0.73 vs. 0.85; P = 0.005). Due to this finding, threshold-specific metrics were re-examined in the young adult subgroup, and despite the drop in AUROC, at the screening threshold to rule out adverse ventricular remodelling, the sensitivity was 85% and NPV was 83%, and there was 9% net benefit. At the specific ‘rule-in’ threshold, specificity was 97% and PPV 83%.
Figure 4.
Key subgroup performance. Key subgroup model performance in the aggregated external test cohort. Bonferroni-corrected DeLong test P-value for the difference in area under the receiver-operating curve displayed on the right.
Obesity was also associated with improved prediction (AUROC 0.90 vs. 0.84, P = 0.025). However, age and BMI were positively correlated (Spearman rho 0.49, P < 0.001), and in the cohort <22 years, the proportion of obese patients was significantly lower than in the older group (10.2 vs. 19.3%, P < 0.001). Therefore, AUROC was examined in the older age cohort, and no prediction performance difference was identified (AUROC 0.90 vs. 0.89 for obese vs. non-obese).
Model explainability with saliency mapping
A saliency mapping example of the composite outcome model is shown in Figure 5. Mapping suggests that the QRS complex is the most important region for the prediction of disease state, particularly in V1 and V6. High-risk features include R’ notching in V1 and V2 and a taller R wave in V6.
Figure 5.
Median waveform analysis and saliency mapping. Median waveforms of the 10 highest (red line) and 10 lowest (green line) predicted risk for the primary outcome in independent patient electrocardiograms from Site A. Saliency mapping (blue shade) demarcates important prediction regions of the waveform.
Comparison to QRS duration model
A univariable logistic regression model to predict adverse ventricular remodelling using QRS duration was fit from 392 available ECGs and compared with the matching AI-ECG predictions. Artificial intelligence analysis of electrocardiograms outperformed QRS duration (AUROC 0.87 vs. 0.78, see Supplementary material online, Figure S3). The QRS duration model was properly calibrated (Spiegelhalter’s Z test P = 0.99), but at a similar clinical decision threshold of 0.25, there was no net reduction in CMR, and it was slightly harmful (net reduction −0.02).
Error analysis
To better understand the characteristics of the false negative predictions and the implications for care delivery, studies were partitioned into true positive, false negative, false positive, and true negative groups to examine CMR measurements and age at CMR (Table 2). The false negative group had less extreme median adverse remodelling than the true positive group. The false negative group was also younger than the true positive group (median age 17.0 vs. 27.1 years, P < 0.001), again demonstrating the diminished prediction performance in younger patients.
Table 2.
Characteristics of artificial intelligence analysis of electrocardiogram model in correct and misclassified groups
| | True positive | False negative | False positive | True negative |
|---|---|---|---|---|
| n | 392 | 34 | 337 | 231 |
| Age (years) | 27.1 [19.1–41.0] | 17.0 [14.0–24.9] | 21.2 [15.4–30.6] | 21.9 [14.8–29.6] |
| RVEDVi (mL/m2) | 159.7 [135.0–186.4] | 150.9 [117.6–172.8] | 120.8 [104.2–135.0] | 117.0 [98.9–134.5] |
| RVESVi (mL/m2) | 92.0 [80.7–109.1] | 84.8 [68.4–90.0] | 60.2 [51.7–69.1] | 54.1 [44.9–65.7] |
| RVEF (%) | 41.2 [35.8–45.0] | 44.3 [40.1–49.0] | 50.0 [47.0–53.2] | 52.7 [49.9–56.9] |
| LVEF (%) | 52.0 [47.4–55.4] | 53.6 [50.0–58.0] | 58.0 [55.2–61.0] | 59.3 [55.7–64.0] |
Results are presented as median [IQR].
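As a worked check, the threshold-specific metrics and the net reduction in CMR can be recomputed from the group sizes in Table 2. The sketch below assumes that the Table 2 counts correspond to the 0.25 screening threshold (they reproduce the reported sensitivity and specificity) and uses the conventional decision curve formula for net reduction; the function names are hypothetical and this is not the study's analysis code.

```python
def screening_metrics(tp, fn, fp, tn, threshold):
    """Confusion-matrix metrics plus net reduction in CMR at a given risk threshold."""
    n = tp + fn + fp + tn
    weight = (1 - threshold) / threshold  # CMR studies traded per missed case (3 at 0.25)
    return {
        "prevalence": (tp + fn) / n,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "net_reduction": (tn - fn * weight) / n,
    }


def projected_net_reduction(sensitivity, specificity, prevalence, threshold):
    """Net reduction in CMR at a different prevalence, holding sensitivity/specificity fixed."""
    weight = (1 - threshold) / threshold
    return specificity * (1 - prevalence) - (1 - sensitivity) * prevalence * weight


# Group sizes from Table 2.
m = screening_metrics(tp=392, fn=34, fp=337, tn=231, threshold=0.25)
for name, value in m.items():
    print(f"{name}: {value:.3f}")

# Projection to a population in which 25% have adverse remodelling.
low_prev = projected_net_reduction(m["sensitivity"], m["specificity"], 0.25, 0.25)
print(f"projected net reduction at 25% prevalence: {low_prev:.3f}")
```

Rounded to whole percentages, these values match the reported prevalence of 43%, sensitivity 92%, specificity 41%, PPV 54%, NPV 87%, and the 13% net reduction in CMR; the projection gives approximately 24.5% at a prevalence of 25%.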
Discussion
The widely available and cost-effective 12-lead ECG can serve as an AI-ECG biomarker to personalize the assessment of adverse ventricular remodelling in TOF. By linking CMR-defined criteria for adverse ventricular remodelling with the ECG waveform, this unique biomarker may inform the timing of further imaging and may be used to effectively reduce the number of CMR studies required in TOF. A strength of this study is the breadth of multiple external validations, which advance important technical and pragmatic aspects of multicentre AI model implementation in congenital heart disease. Important additional novel findings of this study are (i) RV contractile metrics such as RVESV and RVEF were better predicted than RVEDV; and (ii) AI-ECG models exhibited important differences in discrimination and calibration across external validation sites.
Clinical significance
The output of the AI-ECG model is the probability of CMR volumetric or functional derangements indicative of adverse ventricular remodelling. This may be used as a novel biomarker to risk-stratify the need and mode of advanced imaging. As the recommended CMR screening interval in the asymptomatic adult TOF population is every 3 years,4,5 a proposed use of this model is to personalize the timing and mode of surveillance imaging. For low-risk patients who have less than a 25% risk of having an actionable CMR abnormality, use of this test may delay (but not necessarily replace due to false negatives) the need for surveillance CMR by a year, with 92% sensitivity and 87% NPV. A method of evaluating benefit is net benefit and decision curve analysis, which describes the net reduction in CMR in an acceptable range of risks. Acceptable risk is dependent on patient and practitioner preferences. In this analysis, anything less than a 25% risk, or three ‘negative’ CMR studies for every one ‘positive’ for evidence of significant ventricular remodelling, was considered a reasonable range of risk to delay a CMR given the typically slow progression of ventricular remodelling in older children and adults.11 Because current practice is to obtain CMR on all patients, implementation of this strategy is equivalent to a strategy that identified every case of disease in the population but only referred 87% of patients for CMR. Importantly, decision curves illustrate net reduction over a range of thresholds of patient and provider risk preference. A net reduction in CMR was observed for all risk probabilities at and above 8%, meaning that the current strategy of obtaining CMR on every patient would be warranted only for very risk-averse providers or patients.
Although the model utilized North American CMR-based surveillance guidelines as prediction targets, these are similar to European Society of Cardiology guidelines, which also state RVEDVi ≥160 mL/m2, RVESVi ≥ 80 mL/m2, and RV systolic dysfunction are Class IIa recommendations for PVR.27
The AI-ECG biomarker may also be informative for the patient with a high risk (>75%) of adverse ventricular remodelling. They could be referred for earlier-than-usual CMR for earlier disease identification, or they may be referred directly for retrospective ECG-gated cardiac CT for transcatheter pulmonary valve evaluation28 and volumetric assessment to eliminate redundant imaging. Finally, patients with intermediate risk would continue usual care. The elimination of unnecessary or redundant imaging could increase cost-effectiveness and ensure that limited resources are used in those most likely to benefit.
This approach meets a particularly salient need in the ACHD population. Shortfalls in the availability of ACHD providers29 lead to unequal access to speciality care. Early adulthood is a particularly high-risk period for patients with congenital heart disease to lose access to care,30 and in one study, only 13% of patients were adherent to imaging guidelines,10 highlighting the challenge of delivering appropriate care to this population. Furthermore, this is a high-risk age window in which adverse ventricular remodelling may develop, and PVR may be indicated.31 Artificial intelligence analysis of electrocardiograms could be an effective tool to provide increased access to congenital care in this at-risk population, as it may be performed with non-specialized equipment and providers. Even though global model discrimination (AUROC) was decreased in patients <22 years old, it is important to recognize that the screening threshold metrics still suggest clinical utility and net benefit in this patient population. Nevertheless, even though the global false negative rate is low (high test sensitivity), error analysis suggests that ‘missed’ patients tend to be younger, with CMR measurements closer to the threshold for intervention. The clinical impact of delaying diagnosis in younger patients may be minimized as PVR in children is less common,5 and the vast majority of TOF patients do not progress rapidly across serial CMR.11 However, as guidelines recommend a ‘baseline’ CMR in adolescence near the time of transition to adulthood,32 this algorithm may not be suited to replace this initial CMR. Differences in model performance may be related to underrepresentation of younger patients within the training cohort due to the experimental requirements for a paired CMR (which is typically first obtained in older children and young adults) and BSA ≥ 1 m2.
Pathophysiological relevance
The ECG patterns in TOF convey important electromechanical information. Right bundle branch block QRS duration is prognostic, with a duration >180 ms a risk factor for sudden cardiac death.33,34 Notably, the AI-ECG model outperformed QRS duration for the prediction of adverse remodelling, and QRS duration alone was insufficient to reduce CMR usage at clinically relevant screening thresholds. QRS fragmentation is a more recently described pattern that has been associated with RV size, function, exercise tolerance, and outcome.35–40 Saliency mapping in this study identifies the later portions of the QRS complex, during early to mid-systole, as an important area for risk stratification, with notching of the R’ and R height observed as important factors. Interestingly, new data deemphasize the importance of diastolic size in risk stratification and emphasize that contractile metrics like RVESV and RVEF may have stronger associations with outcome.5,19,20,41 Similarly, this study suggests that RVESV and RVEF may have stronger electromechanical representation in the ECG than diastolic size, which is a potential mechanism for the correlations observed between the ECG, the CMR, and long-term outcome in TOF.
Interestingly, obesity was associated with improved model performance. Obesity in TOF is associated with lower LV and RV EF and leads to underestimation of the degree of ventricular dilation when volumes are indexed to BSA rather than ideal body weight.42 In this study, weak negative correlations between BMI and RVEF (Spearman rho = −0.07, P = 0.024, see Supplementary material online, Figure S4) and between BMI and RVEDVi (Spearman rho = −0.08, P = 0.015) were observed, consistent with these prior reports. However, obesity and age were correlated with each other, and within the ≥22-year-old age group, model performance was not higher in obese patients. Given the known limitations of BSA scaling in volumetric RV assessment in the setting of obesity, ECG analysis could be a novel method for identification of ventricular remodelling, but further development of this method is required to ensure results are not confounded by age effects.
Implications for AI model development and deployment in congenital heart disease
This study goes beyond single-centre external validation to evaluate the generalizability of an AI-ECG model across a diverse spectrum of practices and settings. Based on the extensive external validation performed in this study, we conclude that local verification of discrimination and calibration is required. This is a novel and important finding in the congenital heart disease literature, particularly when examined in a regulatory framework. For ‘high risk’ AI systems, the newly implemented European Union AI Act demands ‘an appropriate level of accuracy’ which shall be defined ‘in cooperation with relevant stakeholders and organizations’. The United States Food and Drug Administration approval pathway requires evidence of model generalizability43 that may be demonstrated with two separate external validations. This study could meet these requirements as three centres performed strongly on external validation, but further evaluation revealed suboptimal performance in two other centres. This reinforces that demonstration of discrimination, calibration, and net benefit across a wide range of centres is important for safe and effective adoption of AI tools. The current standards could be inadequate to ensure truly generalizable performance, which warrants further examination across the breadth of AI-ECG applications.
Calibration is an essential consideration to ensure that model probability scores can be reliably interpreted as predicted risk, but it is not well studied in the congenital heart disease literature. Calibration varied across sites, including miscalibration at one site that had a high AUROC (Site B). We speculate that differences in calibration may arise from out-of-distribution prevalence of adverse remodelling and age distributions in the miscalibrated centres (see Supplementary material online, Results and discussion). This highlights that model discrimination (AUROC) is only the first step towards AI model evaluation. Calibration is essential to ensure that globally applied risk thresholds can be expected to have similar performance across institutions. Furthermore, calibration ensures the predicted risk aligns with the observed outcomes to give clinicians the context needed to make personalized decisions. Important metrics like net benefit are not meaningful without proper model calibration24 because the model output probability does not then reflect the underlying risk of disease. This is further emphasized by the net benefit curves for the uncalibrated model (see Supplementary material online, Figure S5), which show very little net benefit at the a priori threshold amongst well-performing centres and negative net benefit when all centres are considered. This shows that without proper consideration of calibration, a model may be harmful compared with the current standard of care when considered in the context of clinical risk tolerance. These findings echo recent findings that an FDA-approved AI-ECG model has calibration differences across centres that significantly affect model performance at predefined screening thresholds to identify hypertrophic cardiomyopathy.44 Mitigation of calibration differences across centres might be achieved through a multicentre training design to account for distribution differences across centres.
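To make the distinction between discrimination and calibration concrete, the following is a minimal sketch of a binned reliability check: predicted risks are grouped into equal-width bins and the mean predicted risk in each bin is compared with the observed event rate. The function names, the bin count, and the example data are illustrative assumptions and are not taken from the study pipeline; the expected calibration error shown here is one common summary and is not necessarily the miscalibration criterion used in this study.

```python
from typing import List, Tuple

Row = Tuple[float, float, int]  # (mean predicted risk, observed rate, count)


def reliability_table(probs: List[float], labels: List[int],
                      n_bins: int = 5) -> List[Row]:
    """Group predictions into equal-width probability bins.

    Returns one row per non-empty bin. For a well-calibrated model the
    mean predicted risk and the observed event rate are close in every bin.
    """
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    rows: List[Row] = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            obs_rate = sum(y for _, y in members) / len(members)
            rows.append((mean_pred, obs_rate, len(members)))
    return rows


def expected_calibration_error(rows: List[Row]) -> float:
    """Count-weighted mean absolute gap between predicted and observed risk."""
    total = sum(n for _, _, n in rows)
    return sum(n * abs(pred - obs) for pred, obs, n in rows) / total


# Hypothetical site: the model ranks patients well, but its scores are
# too extreme (0.1 vs 25% observed, 0.9 vs 75% observed).
example_probs = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
example_labels = [0, 0, 0, 1, 1, 1, 1, 0]
example_rows = reliability_table(example_probs, example_labels)
example_ece = expected_calibration_error(example_rows)
```

In a local verification workflow, a table of this kind computed on a centre's own paired ECG and CMR data would indicate whether a globally chosen risk threshold can be expected to behave as intended at that centre.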
To aid further research and implementation in the congenital heart disease community, the complete containerized pipeline and model to perform AI-ECG inference from 12-lead ECG in XML format are available for download at https://github.com/sonqduong/ECGsizefxn_rTOF.
Limitations
The AI-ECG model output is the predicted probability of significant CMR abnormalities. However, these volumetric criteria for intervention are debated. A scientific statement5 released after data collection, model training, and analysis of this study suggested modification of the CMR-based indications for PVR, though RVESVi, RVEF, LVEF, and RVEDVi are still all directly or indirectly integrated. It remains to be seen whether this guidance will be broadly adopted. Notably, in this study, the composite, RVEF, and RVESV models all exhibit similar performance, which suggests that this model could be successfully adapted to updated guidance. Although the models were produced from retrospective data before the release of this statement, if the definitions of significant ventricular remodelling change, then CMR screening practices are also likely to change over time, which may in turn become a source of model drift in prospective implementation. It is also recognized that the AI-ECG model cannot completely replace advanced imaging because CMR also provides important information beyond quantification of volumes and EF such as pulmonary regurgitation fraction, aorta and pulmonary artery size, branch pulmonary artery flow distribution, and evaluation for scar and fibrosis to further risk stratify outcomes.45–47
In this study, significant differences across institutions in CMR measurements and outcome prevalence are observed, which may be related to population differences across centres but may also be related to how CMR studies are quantified at each centre. The combined effect of interstudy and interobserver variability on RV measurements in TOF may be substantial, varying by 12.7 mL/m2 for RVEDVi, 9.3 mL/m2 for RVESVi, and 7% for RVEF.48 Even in healthy children, coefficients of variation in RVEDV, RVESV, and RVEF range from 6 to ∼10%, with evidence for systematic differences between observers.49 If there are systematic differences in how CMRs are measured, then this may degrade performance in external validation through ‘concept shift’ (i.e. similar ECG features may reflect the same underlying ‘true’ measurement, but due to CMR measurement differences, the ECG is labelled as disease-positive at one site and disease-negative at another). This may in part explain the differences in calibration observed across centres. Although one of the sites (Site B) submitted CMR data that were partially remeasured in a core lab18 and discrimination remained strong, it is likely that performance could be further improved by recontouring all studies at a core facility, as has been performed in other major TOF cohort studies,18,50 to improve generalizability.
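The concept shift described above can be illustrated with a short sketch that applies the composite outcome definition used in this study (≥2 of RVEDVi ≥ 160 mL/m2, RVESVi ≥ 80 mL/m2, RVEF <47%, and LVEF <55%) to the same hypothetical patient measured under two contouring conventions. The class and function names, the patient values, and the measurement offsets are illustrative assumptions; the offsets are chosen to be smaller than the interstudy variability quoted above, and RVEF is derived from the two volumes so that the example stays internally consistent.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CmrStudy:
    rvedvi: float  # RV end-diastolic volume index, mL/m2
    rvesvi: float  # RV end-systolic volume index, mL/m2
    lvef: float    # LV ejection fraction, %

    @property
    def rvef(self) -> float:
        """RV ejection fraction (%), derived from the indexed volumes."""
        return 100.0 * (self.rvedvi - self.rvesvi) / self.rvedvi


def abnormality_count(study: CmrStudy) -> int:
    """Number of TOF-specific CMR abnormalities present (0 to 4)."""
    return sum([
        study.rvedvi >= 160,
        study.rvesvi >= 80,
        study.rvef < 47,
        study.lvef < 55,
    ])


def composite_outcome(study: CmrStudy) -> bool:
    """Composite label: at least two of the four abnormalities."""
    return abnormality_count(study) >= 2


# Hypothetical patient as contoured at one site: no criterion is met.
site_a = CmrStudy(rvedvi=156.0, rvesvi=78.0, lvef=58.0)

# Same patient under a hypothetical contouring convention that yields
# systematically larger RV volumes (+6 and +4 mL/m2): two criteria are met.
site_b = replace(site_a, rvedvi=site_a.rvedvi + 6.0, rvesvi=site_a.rvesvi + 4.0)

label_a = composite_outcome(site_a)
label_b = composite_outcome(site_b)
```

Under these assumptions, an identical ECG would carry a negative label at the first site and a positive label at the second, even though the underlying ventricle is unchanged.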
Because variations in discrimination are observed, it is important to verify discrimination and calibration locally. The calibration was performed using cross-validation instead of a hold-out calibration set to maximize sample size for detailed subgroup performance analysis. However, this strategy risks presenting overoptimistic calibration results, which may in turn affect net benefit analysis results, which are sensitive to calibration. In prospective implementation, recalibration would by necessity be performed on a hold-out calibration set. As this validation was only performed at North American sites, validation in Europe and other global locations that may benefit from AI-ECG screening is required. Finally, due to the experimental requirement for a paired CMR, the cohort is enriched for patients with more severe disease, as more frequent CMR is recommended in higher-risk patients.4 Certain populations with a likely lower incidence of disease such as those without significant pulmonary regurgitation might be underrepresented. We addressed this by examining net benefit under a range of projected disease prevalence and found that the net reduction in CMR increased as disease prevalence decreased.
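The relationship between projected prevalence and net reduction in CMR can be sketched with the standard net benefit formulation.24 The example below takes the sensitivity, specificity, and screening threshold reported in this study and, as a simplifying assumption, holds sensitivity and specificity fixed while prevalence varies; the function names and the prevalence grid are illustrative, and the sketch is not the analysis code used in this study.

```python
def net_benefit(sens: float, spec: float, prevalence: float,
                threshold: float) -> float:
    """Net benefit per patient of referring test-positive patients.

    True positives are credited and false positives are penalized by the
    odds of the risk threshold, threshold / (1 - threshold).
    """
    odds = threshold / (1.0 - threshold)
    true_pos = sens * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    return true_pos - false_pos * odds


def net_reduction_in_cmr(sens: float, spec: float, prevalence: float,
                         threshold: float) -> float:
    """Net reduction in CMR per patient versus a scan-everyone strategy."""
    odds = threshold / (1.0 - threshold)
    nb_model = net_benefit(sens, spec, prevalence, threshold)
    # Scanning everyone is equivalent to sensitivity 1 and specificity 0.
    nb_scan_all = net_benefit(1.0, 0.0, prevalence, threshold)
    return (nb_model - nb_scan_all) / odds


# Reported operating point; the prevalence grid is a hypothetical projection.
SENS, SPEC, THRESHOLD = 0.92, 0.41, 0.25
projected_prevalence = [0.6, 0.5, 0.4, 0.3, 0.2]
reductions = [
    net_reduction_in_cmr(SENS, SPEC, p, THRESHOLD) for p in projected_prevalence
]
for p, r in zip(projected_prevalence, reductions):
    print(f"prevalence {p:.1f}: net reduction in CMR per 100 patients = {100 * r:.1f}")
```

Algebraically, the net reduction equals spec × (1 − prevalence) − (1 − sens) × prevalence × (1 − threshold)/threshold, which falls linearly as prevalence rises. Under the fixed-accuracy assumption, this is why the avoided scans grow in lower-prevalence populations.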
Conclusions
Artificial intelligence analysis of the electrocardiogram can be used as a biomarker for adverse ventricular remodelling to inform the timing of advanced imaging in patients with repaired TOF. This may increase care efficiency and expand access to care by reducing unnecessary or redundant CMR. Through its rare multicentre design, this study also reveals important differences in performance and model calibration across centres and cohorts, which have important practical implications for real-world prospective implementation of AI-ECG biomarkers in congenital heart disease and beyond.
Acknowledgements
The authors would like to thank Dr. Andrew Vickers for advice in implementation of net benefit analysis.
Contributor Information
Son Q Duong, Department of Pediatrics (Cardiology), Icahn School of Medicine at Mount Sinai, 1468 Madison Ave, Annenberg 3rd Fl, New York, NY 10029, USA; Windreich Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Center for Artificial Intelligence in Children’s Health, Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Akhil Vaid, Windreich Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; The Hasso Plattner Institute for Digital Health at Mount Sinai, New York, NY 10029, USA.
Pengfei Jiang, Center for Child Health Services Research, Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Population Health Sciences and Policy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Yuval Bitterman, Department of Paediatrics, Labatt Family Heart Centre, Hospital for Sick Children (SickKids), Toronto M5G 1X8, Canada.
Yamini Krishnamurthy, Department of Medicine (Cardiology), Columbia University Medical Center, New York, NY 10032, USA.
I Min Chiu, Department of Medicine (Cardiology), Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA.
Joshua Finer, Department of Medicine (Cardiology), Columbia University Medical Center, New York, NY 10032, USA.
Brian Cleary, Department of Pediatrics (Cardiology), Northwestern Feinberg School of Medicine, Chicago, IL 60611, USA.
Benjamin S Glicksberg, Windreich Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Center for Artificial Intelligence in Children’s Health, Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; The Hasso Plattner Institute for Digital Health at Mount Sinai, New York, NY 10029, USA.
Ruchira Garg, Department of Pediatrics (Cardiology), Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA.
Michael DiLorenzo, Department of Pediatrics (Cardiology), Columbia University College of Physicians and Surgeons, New York, NY 10032, USA.
Mark Friedberg, Department of Paediatrics, Labatt Family Heart Centre, Hospital for Sick Children (SickKids), Toronto M5G 1X8, Canada.
Evan Zahn, Department of Pediatrics (Cardiology), Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA.
Matthew Lewis, Department of Medicine (Cardiology), Columbia University Medical Center, New York, NY 10032, USA.
Michael Satzer, Department of Pediatrics (Cardiology), Northwestern Feinberg School of Medicine, Chicago, IL 60611, USA.
David Ouyang, Department of Medicine (Cardiology), Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA.
Pierre Elias, Department of Medicine (Cardiology), Columbia University Medical Center, New York, NY 10032, USA.
Tal Geva, Department of Cardiology, Boston Children’s Hospital, Boston, MA 02115, USA.
Sunil Ghelani, Department of Cardiology, Boston Children’s Hospital, Boston, MA 02115, USA.
Brett R Anderson, Department of Pediatrics (Cardiology), Icahn School of Medicine at Mount Sinai, 1468 Madison Ave, Annenberg 3rd Fl, New York, NY 10029, USA; Center for Child Health Services Research, Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Population Health Sciences and Policy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Ali Zaidi, Department of Pediatrics (Cardiology), Icahn School of Medicine at Mount Sinai, 1468 Madison Ave, Annenberg 3rd Fl, New York, NY 10029, USA; Mount Sinai Fuster Heart Hospital, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Rachel M Wald, Department of Paediatrics, Labatt Family Heart Centre, Hospital for Sick Children (SickKids), Toronto M5G 1X8, Canada; Peter Munk Cardiac Centre, Toronto Adult Congenital Heart Disease Program, University of Toronto, Toronto M5G 2C4, Ontario, Canada.
Girish N Nadkarni, Windreich Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; The Hasso Plattner Institute for Digital Health at Mount Sinai, New York, NY 10029, USA.
Joshua Mayourian, Department of Cardiology, Boston Children’s Hospital, Boston, MA 02115, USA.
Supplementary material
Supplementary material is available at European Heart Journal – Digital Health.
Author contributions
Son Duong (Conceptualization [lead]; Data curation [lead]; Formal analysis [lead]; Investigation [lead]; Methodology [lead]; Supervision [equal]; Validation [lead]; Visualization [lead]; Writing—original draft [lead]; Writing—review & editing [lead]), Rachel Wald (Conceptualization [supporting]; Data curation [supporting]; Formal analysis [supporting]; Investigation [supporting]; Supervision [supporting]; Writing—review & editing [supporting]), Ali Zaidi (Data curation [supporting]; Investigation [supporting]; Writing—review & editing [supporting]), Brett Anderson (Data curation [supporting]; Formal analysis [supporting]; Writing—review & editing [supporting]), Sunil Ghelani (Data curation [supporting]; Writing—review & editing [supporting]), Tal Geva (Data curation [supporting]; Writing—review & editing [supporting]), Pierre Elias (Data curation [supporting]; Formal analysis [supporting]; Writing—review & editing [supporting]), David Ouyang (Data curation [supporting]; Formal analysis [supporting]; Writing—review & editing [supporting]), Michael Satzer (Data curation [supporting]; Writing—review & editing [supporting]), Matthew Lewis (Data curation [supporting]; Formal analysis [supporting]; Writing—review & editing [supporting]), Evan Zahn (Data curation [supporting]; Writing—review & editing [supporting]), Mark Friedberg (Data curation [supporting]; Writing—review & editing [supporting]), Michael DiLorenzo (Data curation [supporting]; Writing—review & editing [supporting]), Ruchira Garg (Data curation [supporting]; Writing—review & editing [supporting]), Benjamin Glicksberg (Methodology [supporting]; Resources [supporting]; Writing—review & editing [supporting]), Brian Cleary (Data curation [supporting]; Writing—review & editing [supporting]), Joshua Finer (Data curation [supporting]; Formal analysis [supporting]; Writing—review & editing [supporting]), I-Min Chiu (Data curation [supporting]; Formal analysis [supporting]; Writing—review & editing [supporting]), 
Yamini Krishnamurthy (Data curation [supporting]; Writing—review & editing [supporting]), Yuval Bitterman (Data curation [supporting]; Writing—review & editing [supporting]), Pengfei Jiang (Data curation [supporting]; Formal analysis [supporting]; Writing—review & editing [supporting]), Akhil Vaid (Formal analysis [supporting]; Investigation [supporting]; Methodology [supporting]), Girish Nadkarni (Formal analysis [supporting]; Funding acquisition [equal]; Resources [equal]; Supervision [equal]; Writing—review & editing [lead]), and Joshua Mayourian (Conceptualization [equal]; Formal analysis [equal]; Investigation [equal]; Methodology [equal]; Writing—review & editing [equal]).
Funding
This work was supported by National Institutes of Health K08HL173639 and the American Society of Echocardiography Foundation EDGES Award (S.Q.D.), National Institutes of Health R01HL167050 (G.N.N.), Kostin Innovation Fund (J.M.), Thrasher Research Fund Early Career Award (J.M.), and National Institutes of Health T32HL007572 (J.M.).
Data availability
The data underlying this article cannot be shared publicly because they contain protected patient health information. The data may be shared upon reasonable request and in accordance with appropriate regulatory oversight. The containerized pipeline and model to perform AI-ECG inference from 12-lead ECG in XML format are available for download at https://github.com/sonqduong/ECGsizefxn_rTOF.
References
- 1. Apitz C, Webb GD, Redington AN. Tetralogy of Fallot. Lancet 2009;374:1462–1471.
- 2. Geva T. Indications for pulmonary valve replacement in repaired tetralogy of Fallot: the quest continues. Circulation 2013;128:1855–1857.
- 3. Bokma JP, Geva T, Sleeper LA, Lee JH, Lu M, Sompolinsky T, et al. Improved outcomes after pulmonary valve replacement in repaired tetralogy of Fallot. J Am Coll Cardiol 2023;81:2075–2085.
- 4. Stout KK, Daniels CJ, Aboulhosn JA, Bozkurt B, Broberg CS, Colman JM, et al. 2018 AHA/ACC guideline for the management of adults with congenital heart disease: a report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. Circulation 2019;139:e698–e800.
- 5. Geva T, Wald RM, Bucholz E, Cnota JF, McElhinney DB, Mercer-Rosa LM, et al. Long-term management of right ventricular outflow tract dysfunction in repaired tetralogy of Fallot: a scientific statement from the American Heart Association. Circulation 2024;150:e689–e707.
- 6. Mercer-Rosa L, Parnell A, Forfia PR, Yang W, Goldmuntz E, Kawut SM. Tricuspid annular plane systolic excursion in the assessment of right ventricular function in children and adolescents after repair of tetralogy of Fallot. J Am Soc Echocardiogr 2013;26:1322–1329.
- 7. Lopez L, Saurers DL, Barker PCA, Cohen MS, Colan SD, Dwyer J, et al. Guidelines for performing a comprehensive pediatric transthoracic echocardiogram: recommendations from the American Society of Echocardiography. J Am Soc Echocardiogr 2024;37:119–170.
- 8. Buddhe S, Soriano BD, Powell AJ. Survey of centers performing cardiovascular magnetic resonance in pediatric and congenital heart disease: a report of the Society for Cardiovascular Magnetic Resonance. J Cardiovasc Magn Reson 2022;24:10.
- 9. Salciccioli KB, Oluyomi A, Lupo PJ, Ermis PR, Lopez KN. A model for geographic and sociodemographic access to care disparities for adults with congenital heart disease. Congenit Heart Dis 2019;14:752–759.
- 10. Khan AM, McGrath LB, Ramsey K, Agarwal A, Broberg CS. Association of adults with congenital heart disease-specific care with clinical characteristics and healthcare use. J Am Heart Assoc 2021;10:e019598.
- 11. Rutz T, Ghandour F, Meierhofer C, Naumann S, Martinoff S, Lange R, et al. Evolution of right ventricular size over time after tetralogy of Fallot repair: a longitudinal cardiac magnetic resonance study. Eur Heart J Cardiovasc Imaging 2017;18:364–370.
- 12. Wald RM, Valente AM, Gauvreau K, Babu-Narayan SV, Assenza GE, Schreier J, et al. Cardiac magnetic resonance markers of progressive RV dilation and dysfunction after tetralogy of Fallot repair. Heart 2015;101:1724–1730.
- 13. Duong SQ, Vaid A, My VTH, Butler LR, Lampert J, Pass RH, et al. Quantitative prediction of right ventricular size and function from the ECG. J Am Heart Assoc 2024;13:e031671.
- 14. Mayourian J, Gearhart A, La Cava WG, Vaid A, Nadkarni GN, Triedman JK, et al. Deep learning-based electrocardiogram analysis predicts biventricular dysfunction and dilation in congenital heart disease. J Am Coll Cardiol 2024;84:815–828.
- 15. Duong SQ, Dominy CL, Lampert J, Singh S, Croft L, Zaidi AN, et al. Ensemble modeling of multimodal electrocardiogram and echocardiogram data improves quantitative assessment of right ventricular function. JACC Adv 2024;3:101186.
- 16. Svennberg E, Han JK, Caiani EG, Engelhardt S, Ernst S, Friedman P, et al. State of the art of artificial intelligence in clinical electrophysiology in 2025: a scientific statement of the European Heart Rhythm Association (EHRA) of the ESC, the Heart Rhythm Society (HRS), and the ESC Working Group on E-Cardiology. Europace 2025;27:euaf071.
- 17. Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD + AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 2024;385:q902.
- 18. Wald RM, Altaha MA, Alvarez N, Caldarone CA, Cavallé-Garrido T, Dallaire F, et al. Rationale and design of the Canadian outcomes registry late after tetralogy of Fallot repair: the CORRELATE study. Can J Cardiol 2014;30:1436–1443.
- 19. Valente AM, Gauvreau K, Assenza GE, Babu-Narayan SV, Schreier J, Gatzoulis MA, et al. Contemporary predictors of death and sustained ventricular tachycardia in patients with repaired tetralogy of Fallot enrolled in the INDICATOR cohort. Heart 2014;100:247–253.
- 20. Bokma JP, Winter MM, Oosterhof T, Vliegen HW, van Dijk AP, Hazekamp MG, et al. Preoperative thresholds for mid-to-late haemodynamic and clinical outcomes after pulmonary valve replacement in tetralogy of Fallot. Eur Heart J 2016;37:829–835.
- 21. Mayourian J, La Cava W, Vaid A, Ghelani SJ, Mannix R, Bezzerides VJ, et al. Pediatric electrocardiogram-based deep learning to predict left ventricular dysfunction and remodeling. Circulation 2024;149:917–931.
- 22. Ribeiro AH, Ribeiro MH, Paixão GMM, Oliveira DM, Gomes PR, Canazart JA, et al. Automatic diagnosis of the 12-lead ECG using a deep neural network. Nat Commun 2020;11:1760.
- 23. Huang Y, Li W, Macheret F, Gabriel RA, Ohno-Machado L. A tutorial on calibration measurements and calibration models for clinical prediction models. J Am Med Inform Assoc 2020;27:621–633.
- 24. Vickers AJ, Van Calster B, Steyerberg EW. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ 2016;352:i6.
- 25. Khurshid S, Friedman S, Pirruccello JP, Di Achille P, Diamant N, Anderson CD, et al. Deep learning to predict cardiac magnetic resonance–derived left ventricular mass and hypertrophy from 12-lead ECGs. Circ Cardiovasc Imaging 2021;14:e012281.
- 26. Mayourian J, El-Bokl A, Lukyanenko P, La Cava WG, Geva T, Valente AM, et al. Electrocardiogram-based deep learning to predict mortality in paediatric and adult congenital heart disease. Eur Heart J 2025;46:856–868.
- 27. Baumgartner H, De Backer J, Babu-Narayan SV, Budts W, Chessa M, Diller G-P, et al. 2020 ESC guidelines for the management of adult congenital heart disease. Eur Heart J 2021;42:563–645.
- 28. Gillespie MJ, Benson LN, Bergersen L, Bacha EA, Cheatham SL, Crean AM, et al. Patient selection process for the harmony transcatheter pulmonary valve early feasibility study. Am J Cardiol 2017;120:1387–1392.
- 29. Chowdhury D, Johnson JN, Baker-Smith CM, Jaquiss RDB, Mahendran AK, Curren V, et al. Health care policy and congenital heart disease: 2020 focus on our 2030 future. J Am Heart Assoc 2021;10:e020605.
- 30. Moons P, Bratt EL, De Backer J, Goossens E, Hornung T, Tutarel O, et al. Transition to adulthood and transfer to adult care of adolescents with congenital heart disease: a global consensus statement of the ESC Association of Cardiovascular Nursing and Allied Professions (ACNAP), the ESC Working Group on Adult Congenital Heart Disease (WG ACHD), the Association for European Paediatric and Congenital Cardiology (AEPC), the Pan-African Society of Cardiology (PASCAR), the Asia-Pacific Pediatric Cardiac Society (APPCS), the Inter-American Society of Cardiology (IASC), the Cardiac Society of Australia and New Zealand (CSANZ), the International Society for Adult Congenital Heart Disease (ISACHD), the World Heart Federation (WHF), the European Congenital Heart Disease Organisation (ECHDO), and the Global Alliance for Rheumatic and Congenital Hearts (Global ARCH). Eur Heart J 2021;42:4213–4223.
- 31. Slouha E, Trygg G, Tariq AH, La A, Shay A, Gorantla VR. Pulmonary valve replacement timing following initial tetralogy of Fallot repair: a systematic review. Cureus 2023;15:e49577.
- 32. Mayourian J, La Cava WG, Vaid A, Nadkarni GN, Ghelani SJ, Mannix R, et al. Pediatric ECG-based deep learning to predict left ventricular dysfunction and remodeling. Circulation 2024;149:917–931.
- 33. Khairy P, Harris L, Landzberg MJ, Viswanathan S, Barlow A, Gatzoulis MA, et al. Implantable cardioverter-defibrillators in tetralogy of Fallot. Circulation 2008;117:363–370.
- 34. Gatzoulis MA, Till JA, Somerville J, Redington AN. Mechanoelectrical interaction in tetralogy of Fallot. Circulation 1995;92:231–237.
- 35. Egbe AC, Luis SA, Padang R, Warnes CA. Outcomes in moderate mixed aortic valve disease: is it time for a paradigm shift? J Am Coll Cardiol 2016;67:2321–2329.
- 36. Bokma JP, Winter MM, Vehmeijer JT, Vliegen HW, van Dijk AP, van Melle JP, et al. QRS fragmentation is superior to QRS duration in predicting mortality in adults with tetralogy of Fallot. Heart 2017;103:666–671.
- 37. Alonso P, Andrés A, Rueda J, Buendía F, Igual B, Rodríguez M, et al. Value of the electrocardiogram as a predictor of right ventricular dysfunction in patients with chronic right ventricular volume overload. Rev Esp Cardiol 2015;68:390–397.
- 38. Buntharikpornpun R, Jaruratanasirikul S, Roymanee S, Jarutach J, Wongwaitaweewong K, Sangthong R. Correlation between fragmented QRS and ventricular function from cardiac magnetic resonance in patients with repaired tetralogy of Fallot. Pediatr Cardiol 2021;42:1713–1721.
- 39. Book WM, Hurst JW, Parks WJ, Hopkins KL. Electrocardiographic predictors of right ventricular volume measured by magnetic resonance imaging late after total repair of tetralogy of Fallot. Clin Cardiol 1999;22:740–746.
- 40. Lumens J, Fan CS, Walmsley J, Yim D, Manlhiot C, Dragulescu A, et al. Relative impact of right ventricular electromechanical dyssynchrony versus pulmonary regurgitation on right ventricular dysfunction and exercise intolerance in patients after repair of tetralogy of Fallot. J Am Heart Assoc 2019;8:e010903.
- 41. Ishikita A, McIntosh C, Hanneman K, Lee MM, Liang T, Karur GR, et al. Machine learning for prediction of adverse cardiovascular events in adults with repaired tetralogy of Fallot using clinical and cardiovascular magnetic resonance imaging variables. Circ Cardiovasc Imaging 2023;16:e015205.
- 42. Aly S, Lizano Santamaria RW, Devlin PJ, Jegatheeswaran A, Russell J, Seed M, et al. Negative impact of obesity on ventricular size and function and exercise performance in children and adolescents with repaired tetralogy of Fallot. Can J Cardiol 2020;36:1482–1490.
- 43. Center for Devices and Radiological Health. Good Machine Learning Practice for Medical Device Development: Guiding Principles. FDA. Published online March 25, 2025. https://www.fda.gov/medical-devices/software-medical-device-samd/good-machine-learning-practice-medical-device-development-guiding-principles (6 August 2025).
- 44. Lampert J, Bhatt DL, Vaid A, Kon K, Feinman J, Jou S, et al. Calibration of ECG-based deep-learning algorithm scores for patients flagged as high risk for hypertrophic cardiomyopathy. NEJM AI 2025;2:AIoa2400421.
- 45. Geva T. Repaired tetralogy of Fallot: the roles of cardiovascular magnetic resonance in evaluating pathophysiology and for pulmonary valve replacement decision support. J Cardiovasc Magn Reson 2011;13:9.
- 46. Valente AM, Cook S, Festa P, Ko HH, Krishnamurthy R, Taylor AM, et al. Multimodality imaging guidelines for patients with repaired tetralogy of Fallot: a report from the American Society of Echocardiography. J Am Soc Echocardiogr 2014;27:111–141.
- 47. Ghonim S, Gatzoulis MA, Ernst S, Li W, Moon JC, Smith GC, et al. Predicting survival in repaired tetralogy of Fallot: a lesion-specific and personalized approach. JACC Cardiovasc Imaging 2022;15:257–268.
- 48. Blalock SE, Banka P, Geva T, Powell AJ, Zhou J, Prakash A. Inter-study variability in CMR measurements of right ventricular volume, mass and ejection fraction in tetralogy of Fallot: a prospective observational study. J Cardiovasc Magn Reson 2012;14:P104.
- 49. van der Ven JPG, Sadighy Z, Valsangiacomo Buechel ER, Sarikouch S, Robbers-Visser D, Kellenberger CJ, et al. Multicentre reference values for cardiac magnetic resonance imaging derived ventricular size and function for children aged 0–18 years. Eur Heart J Cardiovasc Imaging 2020;21:102–113.
- 50. Valente AM, Gauvreau K, Assenza GE, Babu-Narayan SV, Evans SP, Gatzoulis M, et al. Rationale and design of an international multicenter registry of patients with repaired tetralogy of Fallot to define risk factors for late adverse outcomes: the INDICATOR cohort. Pediatr Cardiol 2013;34:95–104.