Table 1.
First Author | Year | Patient Population | Category for Analysis | Models | Training/Validation Data Sets | Test Data Set | Results Metrics | Limitations |
---|---|---|---|---|---|---|---|---|
Prenatal CHD screening | ||||||||
Chen et al24 | 2017 | 900 fetuses | Echocardiograms | Composite RNN to define standard fetal cardiac imaging planes | 900 videos | 331 videos | AUC: 0.95 | Limited to healthy patients, not tested on CHD. |
Dong et al25 | 2022 | 3,910 fetuses (14.1% with CHD) | Echocardiograms | Random forest algorithms (ML) to differentiate normal and CHD hearts | 25 features | 10 features | AUC: 0.94; Sensitivity: 0.85; Specificity: 0.88 | Tabular data instead of raw images. No specific subtypes of CHD defined. Single center. |
Arnaout et al12 | 2021 | 1,326 fetuses | Echocardiograms | CNN (classification) | 107,823 images from 1,326 echocardiograms | 4,108 fetal ultrasounds | AUC: 0.99; Sensitivity: 0.95; Specificity: 0.96 | No published algorithms. Not clinically deployed. |
Truong et al26 | 2022 | 3,910 fetuses (14.1% with CHD) | Echocardiograms | Random forest algorithms (ML) to differentiate normal and CHD hearts | 25 features | 10 features | AUC: 0.94; Sensitivity: 0.85; Specificity: 0.88 | Tabular data instead of raw images. No specific subtypes of CHD defined. Single center. |
Postnatal CHD screening | ||||||||
Gharehbaghi et al27 | 2017 | 55 healthy children vs 35 with BAV | Heart sounds | Support vector machine and Markov model | Unknown | Unknown | Sensitivity: 0.86; Specificity: 0.87 | Small study; clinical deployment not widespread. |
Gharehbaghi et al28 | 2020 | 50 healthy children vs 35 with septal defects vs 30 with valvular regurgitation | Heart sounds | Time-growing neural network (a type of DL) | 80 patients for training; 30% randomly sampled as the validation set | Unknown | Sensitivity: 0.92 | No test data set; not used clinically. |
Toba et al29 | 2020 | 1,031 cardiac catheterizations from 657 CHD patients, used to predict the pulmonary-to-systemic flow ratio | Chest x-rays | Transfer learning of CNN | 931 | 100 | AUC: 0.88; Sensitivity: 0.47; Specificity: 0.95 | Lack of external validation. Selection bias, as all patients had CHD and had undergone cardiac catheterization. Limited number of patients in the training group. |
Gomez-Quintana et al30 | 2021 | 265 term and late-preterm neonates (137 normal vs 89 PDA vs 39 CHD patients) | Heart sounds (healthy vs PDA; healthy vs CHD) | ML | 90% of data | 10% of data | AUC (PDA): 0.74; AUC (CHD): 0.78 | Not clinically deployed. Limited data sets. |
Mori et al31 | 2021 | 1,192 EKGs from 728 patients (828 normal and 364 ASD) | EKG | CNN and LSTM | 1,000 learning EKGs, of which 25% were used for validation | 192 EKGs (155 healthy and 37 ASD) | AUC: 0.96; Sensitivity: 0.76; Specificity: 0.96 | Data volume small for DL. Bias associated with priming effect. Insufficient data to deploy into clinical practice. |
Lai et al32 | 2021 | 236 newborns | Pulse oximetry | ML (random forest, logistic regression, multilayer perceptron) | 158 healthy and 27 CHD patients (0-48 h), 50 healthy and 36 CHD patients (>48 h) | 50 healthy and 36 CHD | AUC: 0.91; Sensitivity: 95.8%; Specificity: 86.4% | Small data sets. |
Bos et al33 | 2021 | 2,059 patients: 967 with LQTS and 1,092 evaluated for LQTS but discharged without a diagnosis | EKGs | CNN (classification) | Trained on 60% and validated on 10% of patients | Tested on the remaining 30% of patients | AUC: 0.900 (95% CI: 0.876-0.925) | Referral bias, as the cohort was referred with suspected LQTS, limiting generalizability. Lacks external validation and calibration at a different center. |
Hong et al34 | 2022 | | Color Doppler echocardiogram images | CNN for classification and segmentation | 4,031 cases with 370,057 images | 229 cases with 203,619 images (105 with ASD and 124 with intact atrial septum) | Accuracy: 0.8833; Recall: 0.8545; Precision: 0.8577; Specificity: 0.9136; F1 score: 0.8546 | Not generalizable to the spectrum of CHD; single center. |
Cardiac imaging | ||||||||
Pereira et al35 | 2017 | 90 patients; 26 coarctation and 64 healthy | 2D echocardiograms of the parasternal long axis, apical 4-chamber, and suprasternal notch views | SVM (support vector machine classifiers) | Trained on 80% | Tested on 20% | Total error rate of 12.9% (11.5% false negative error and 13.6% false positive) | Single-center study. Limited to single disease. No external validation. |
Diller et al10 | 2019 | 132 patients with a systemic RV and 67 normal controls (73,425 TGA; 33,394 ccTGA; and 24,354 normal apical 4-chamber frames) | Echocardiograms | CNN—classification and segmentation | 159 | 40 | Accuracy: 0.98 | Model requires external validation. |
Wegner et al36 | 2022 | 9,793 echocardiogram images from 262 patients with CHD (ToF, Ebstein anomaly, TGA) and 62 controls, used to build a new model; the prior model was trained on 14,035 echocardiograms from patients without CHD for automated view classification | Echocardiograms from patients with CHD or structural heart disease, used to validate an existing CNN trained on structurally normal hearts; an additional model trained on CHD echocardiograms was built to compare performance | CNN view classification model | 80% for training and validation | 20% for testing | View-classification accuracy of the model trained without CHD: 48.3% in patients with CHD vs 66.7% in patients without cardiac disease. Accuracy of the new CHD-trained model: 76.1%. | Single-center study. Not vendor agnostic. Relatively few patients with cyanotic forms of CHD (ie, 3 with HLHS, 1 with tricuspid atresia). |
Karimi-Bidhendi et al13 | 2020 | 64 patients (20 ToF, 9 DORV, 9 TGA, 8 cardiomyopathy, 9 coronary artery anomaly, 4 pulmonary stenosis, 3 truncus arteriosus, 2 aortic arch anomaly) | MRI images | Generative adversarial network (a form of unsupervised learning) used to augment the training set; CNN used to segment MRI images | 26 patients randomly assigned to the training data set (split 80/20 for training and validation) | 38 patients randomly selected for testing | Dice similarity index of 91% and 86.8% for the LV at end-diastole and end-systole, respectively, and 87.4% and 80.6% for the RV at end-diastole and end-systole, respectively. Externally validated. | Single site. Small patient numbers. |
Tandon et al37 | 2021 | 87 cardiac MRI from repaired ToF patients | MRI images | CNN—transfer learning | 57 | 30 | Dice similarity coefficient: 0.90 | Small data sets |
Wang et al38 | 2021 | 1,308 children (823 healthy, 209 VSDs, 276 ASDs) | Echocardiograms | CNN view classification for 5 views | 90% training | 10% testing | Autoencoders trained significantly better on CHD samples than on healthy samples (cross-entropy: healthy 0.2649 ± 0.0369 vs CHD 0.2597 ± 0.0327; mean squared difference: healthy 133.89 ± 79.06 vs CHD 118.86 ± 61.52). A lower cross-entropy indicates a closer representation of the underlying distribution. | No external validation. Limited diseases. |
Procedural planning for catheterization and surgery | ||||||||
Ruiz-Fernandez et al39 | 2016 | 2,432 patients | Basic clinical data, health history, surgical intervention, and postsurgical intervention | Classification model | 2,432 | 2,432 | Accuracy: 0.99 | Not clinically deployed. |
Lu et al40 | 2020 | 550 echocardiogram images; 275 before and after atrial septal occlusion surgery | 2D echocardiogram images | CNN based on a variant of the U-Net architecture, used for atrial segmentation to assess surgical outcomes of atrial septal defects before and after septal occlusion | 75% (3:1 training-to-testing ratio) | 25% (3:1 training-to-testing ratio) | Mean (±SD) Dice similarity index: 0.9488 (±0.0209); Jaccard index: 0.9033 (±0.0374); Hausdorff distance: 7.5625 (±4.4549) | Single clinical site and scanner. No external validation. |
Outcome prediction and risk stratification | ||||||||
Diller et al10 | 2019 | 10,019 adult CHD patients | Clinical data, EKG, cardiopulmonary exercise test, laboratory markers | CNN to categorize diagnostic groups, disease complexity, and New York Heart Association class | 44,000 medical reports | Unclear | Accuracy: 91% for diagnosis, 96% for disease complexity, and 90% for New York Heart Association class | Retrospective, single-center data. Models trained on raw echocardiographic and MRI data require external validation. |
Atallah et al41 | 2020 | 288 patients (72 ToF patients and 216 controls) | Clinical data and noninvasive testing | Random forest and decision tree to risk-stratify patients into low, moderate, and high risk for ventricular arrhythmia and life-threatening events | Unknown | Unknown | High-risk group: Sensitivity: 0.54; Specificity: 0.86 | Small, retrospective data set. Sizes of training and test sets not reported. |
Jalali et al42 | 2020 | 549 single-ventricle patients | Clinical data, surgery | Logistic regression, decision tree, random forest, gradient boosting | 25 of 100 variables selected for training | Unknown | AUC (mortality/cardiac transplantation): 0.95; AUC (prolonged length of stay): 0.94 | Very ill patients were excluded from the PHN SVR trial, biasing the data toward higher survival rates. Retrospective data set. |
Bertsimas et al43 | 2021 | 235,000 patients with 295,000 operations | Clinical data and general preoperative patient risk factors, used to predict mortality, postoperative MVST, and length of hospital stay (LOS) | | 175,239 | 46,096 | AUC (mortality): 0.86; AUC (prolonged MVST): 0.85; AUC (prolonged LOS): 0.82 | Heterogeneous data can lead to bias. |
Precision medicine | ||||||||
Meza et al44 | 2018 | 651 neonates with critical left heart obstruction | 136 echocardiographic measures, used to group patients into 3 subtypes and identify differentiating characteristics | Unsupervised clustering analysis | Divided into group 1, 215; group 2, 338; and group 3, 98 | | Median LV end-diastolic area was 1.35, 0.69, and 2.47 cm² in groups 1, 2, and 3 (P < 0.001). Overall mortality was 27%, 41%, and 12%, respectively (P < 0.001). | |
Bruse et al45 | 2017 | 60 patients | CMR | Automated segmentation, statistical shape modeling, and unsupervised hierarchical clustering to group patients and identify novel subgroups | Cohort of 20 healthy subjects, 20 patients who had undergone surgical aortic arch reconstruction, and 20 patients whose aorta had been moved posteriorly by the Lecompte maneuver during an arterial switch operation | | Automatic division of input shape data according to primary clinical diagnosis, with a high F-score (0.902 ± 0.042) and Matthews correlation coefficient (0.851 ± 0.064) using the correlation/weighted distance/linkage combination | Relatively small cohort; not generalizable to other forms of CHD. |
Bahado-Singh et al46 | 2022 | 24 coarctation patients and 16 controls | Blood spots | Deep learning to perform genome-wide DNA methylation analysis | Unknown | Unknown | AUC: 0.97; Sensitivity: 0.95; Specificity: 0.98 | Sizes of training and test sets not reported. |
ASD = atrial septal defect; AUC = area under the curve; BAV = bicuspid aortic valve; ccTGA = congenitally corrected transposition of the great arteries; CHD = congenital heart disease; CI = confidence interval; CMR = cardiac magnetic resonance imaging; CNN = convolutional neural network; DL = deep learning; DORV = double outlet right ventricle; EKG = electrocardiogram; HLHS = hypoplastic left heart syndrome; LQTS = long QT syndrome; LSTM = long short-term memory; LV = left ventricle; ML = machine learning; MRI = magnetic resonance imaging; MVST = mechanical ventilatory support time; PDA = patent ductus arteriosus; PHN = Pediatric Heart Network; RNN = recurrent neural network; RV = right ventricle; SD = standard deviation; SVM = support vector machine; SVR = Single Ventricle Reconstruction (trial); TGA = transposition of the great arteries; ToF = tetralogy of Fallot; VSD = ventricular septal defect.
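Most rows in Table 1 report threshold-based classification metrics (sensitivity, specificity, accuracy, precision, F1 score), the threshold-free AUC, or overlap metrics for segmentation (Dice similarity index, Jaccard index). As a reading aid, the minimal Python sketch below shows the standard textbook definitions of these metrics. It is not the evaluation code of any cited study; the class and function names are illustrative, AUC is computed by pairwise comparison of scores (the Mann-Whitney formulation) rather than by integrating a plotted ROC curve, and all example counts, masks, and scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Outcome counts for a binary classifier on a labeled test set.

    Assumes every denominator used below is non-zero.
    """
    tp: int  # diseased cases correctly flagged
    fp: int  # non-diseased cases incorrectly flagged
    tn: int  # non-diseased cases correctly cleared
    fn: int  # diseased cases missed

    def sensitivity(self) -> float:
        """True-positive rate, also called recall: TP / (TP + FN)."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """True-negative rate: TN / (TN + FP)."""
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        """Positive predictive value: TP / (TP + FP)."""
        return self.tp / (self.tp + self.fp)

    def accuracy(self) -> float:
        """Fraction of all cases classified correctly."""
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def f1(self) -> float:
        """Harmonic mean of precision and recall."""
        p, r = self.precision(), self.sensitivity()
        return 2 * p * r / (p + r)


def dice(a: set, b: set) -> float:
    """Dice similarity coefficient of two segmentations given as sets of pixel coordinates."""
    if not a and not b:
        return 1.0  # two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a: set, b: set) -> float:
    """Jaccard index (intersection over union) of two segmentations."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def auc(pos_scores: list, neg_scores: list) -> float:
    """ROC AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical example inputs, not taken from any study in Table 1.
counts = ConfusionCounts(tp=8, fp=10, tn=90, fn=2)
mask_a = {(0, 0), (0, 1), (1, 0), (1, 1)}
mask_b = {(0, 1), (1, 1), (2, 1)}

print(f"sensitivity={counts.sensitivity():.2f} specificity={counts.specificity():.2f}")
print(f"precision={counts.precision():.3f} F1={counts.f1():.3f}")
print(f"Dice={dice(mask_a, mask_b):.3f} Jaccard={jaccard(mask_a, mask_b):.3f}")
print(f"AUC={auc([0.9, 0.8, 0.4], [0.7, 0.3]):.3f}")
```

Because sensitivity and specificity are measured at one chosen operating threshold whereas AUC summarizes performance across all thresholds, a high AUC can coexist with low sensitivity, as in the Toba et al29 row (AUC 0.88, sensitivity 0.47). For a single pair of masks, Dice (D) and Jaccard (J) are linked by J = D / (2 − D), so they rank segmentations identically even though their values differ.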