2022 Dec 14;1(5):100153. doi: 10.1016/j.jacadv.2022.100153

Table 1.

Application of Artificial Intelligence in Congenital Heart Disease

First Author	Year	Patient Population	Category for Analysis	Models	Training/Validation Data Sets	Test Data Set	Results Metrics	Limitations
Prenatal CHD screening
Chen et al²⁴	2017	900 fetuses	Echocardiograms	Composite RNN to define standard fetal cardiac imaging planes	900 videos	331 videos	AUC: 0.95	Limited to healthy patients, not tested on CHD.
Dong et al²⁵	2022	3,910 fetuses (14.1% with CHD)	Echocardiograms	Random forest algorithms (ML) to differentiate normal and CHD hearts	25 features	10 features	AUC: 0.94 Sensitivity: 0.85 Specificity: 0.88	Tabular data instead of raw images. No specific subtypes of CHD defined. Single center.
Arnaout et al¹²	2021	1,326 fetuses	Echocardiograms	CNN (classification)	107,823 images from 1,326 echocardiograms	4,108 fetal ultrasounds	AUC: 0.99 Sensitivity: 0.95 Specificity: 0.96	No published algorithms. Not clinically deployed in practice.
Truong et al²⁶	2022	3,910 fetuses (14.1% with CHD)	Echocardiograms	Random forest algorithms (ML) to differentiate normal and CHD hearts	25 features	10 features	AUC: 0.94 Sensitivity: 0.85 Specificity: 0.88	Tabular data instead of raw images. No specific subtypes of CHD defined. Single center.
Postnatal CHD screening
Gharehbaghi et al²⁷	2017	55 healthy children vs 35 BAV	Heart sounds	Support vector machine and Markov model	Unknown	Unknown	Sensitivity: 0.86 Specificity: 0.87	Small study; clinical deployment not widespread.
Gharehbaghi et al²⁸	2020	50 healthy children vs 35 septal defects vs 30 valvular regurgitation	Heart sounds	Time growing neural network (a type of DL)	80 patients for training, 30% random sampling as validation test	Unknown	Sensitivity: 0.92	No test data sets and not used clinically.
Toba et al²⁹	2020	1,031 cardiac catheterizations from 657 CHD patients to predict pulmonary-to-systemic flow ratio	Chest x-rays	Transfer learning of CNN	931	100	AUC: 0.88 Sensitivity: 0.47 Specificity: 0.95	Lack of external validation. Selection bias, as all were CHD patients who had undergone cardiac catheterization. Limited number of patients in the training group.
Gomez-Quintana et al³⁰	2021	265 term and late-preterm neonates (137 normal vs 89 PDA vs 39 CHD patients)	Heart sounds (healthy vs PDA) (healthy vs CHD)	ML	90% of data	10% of data	AUC (PDA): 0.74 AUC (CHD): 0.78	Not clinically deployed. Limited data sets.
Mori et al³¹	2021	1,192 EKGs from 728 patients (828 normal and 364 ASD)	EKG	CNN and LSTM	Validation was 25% of 1,000 learning data	192 EKG (155 healthy and 37 ASD)	AUC: 0.96 Sensitivity: 0.76 Specificity: 0.96	Volume of data was small for DL. Bias associated with priming effect. Insufficient data to deploy into clinical practice.
Lai et al³²	2021	236 newborns	Pulse oximetry	ML (random forest, logistic regression, multilayer perceptron)	158 healthy and 27 CHD patients (0-48 h), 50 healthy and 36 CHD patients (>48 h)	50 healthy and 36 CHD	AUC: 0.91 Sensitivity: 95.8% Specificity: 86.4%	Small data sets.
Bos et al³³	2021	2,059 patients; 967 with LQTS and 1,092 evaluated for LQTS but discharged without a diagnosis	EKGs	CNN classification	Trained using 60% and validated in 10% of the patients	Tested on remaining 30% of patients	AUC: 0.900 (95% CI: 0.876-0.925)	Bias, as the patient cohort was referred with suspicion of LQTS, limiting generalizability. Lacks external validation and calibration from a different center.
Hong et al³⁴	2022		Color Doppler echocardiogram images	CNN for classification and segmentation	4,031 cases with 370,057 images	229 cases with 203,619 images of which 105 cases with ASD and 124 with intact atrial septum	Accuracy, recall, precision, specificity, and F1 score of 0.8833, 0.8545, 0.8577, 0.9136, and 0.8546, respectively	Not generalizable to spectrum of CHD; single center.
Cardiac imaging
Pereira et al³⁵	2017	90 patients; 26 coarctation and 64 healthy	2D echocardiograms of the parasternal long axis, apical 4-chamber, and suprasternal notch views	SVM (support vector machine classifiers)	Trained on 80%	Tested on 20%	Total error rate of 12.9% (11.5% false negative error and 13.6% false positive)	Single-center study. Limited to single disease. No external validation.
Diller et al¹⁰	2019	132 patients with a systemic RV and 67 normal controls (73,425 TGA; 33,394 ccTGA; and 24,354 normal apical 4-chamber frames)	Echocardiograms	CNN (classification and segmentation)	159	40	Accuracy: 0.98	Model requires external validation.
Wegner et al³⁶	2022	9,793 echocardiogram images from 262 patients with CHD (ToF, Ebstein, TGA) and 62 controls used to build a new model. Prior model was trained on 14,035 echocardiograms from patients without CHD for automated view classification.	Echocardiograms from patients with CHD or structural heart disease used to validate an existing CNN trained on structurally normal hearts. An additional model was trained on CHD echocardiograms to compare performance.	CNN view classification model	80% for training and validation	20% for testing	Noncongenital model overall view classification accuracy of 48.3% in patients with CHD vs 66.7% in patients without cardiac disease. New CHD-trained model accuracy of 76.1% for view classification.	Single-center study. Not vendor agnostic. Relatively small number of patients with cyanotic forms of CHD (ie, 3 patients with HLHS, 1 with tricuspid atresia).
Karimi-Bidhedi et al¹³	2020	64 patients (20 ToF, 9 DORV, 9 TGA, 8 cardiomyopathy, 9 coronary artery anomaly, 4 pulmonary stenosis, 3 truncus, 2 aortic arch anomaly)	MRI images	Generative adversarial network (a form of unsupervised learning) used to augment the training set. CNN used to segment MRI images.	26 patients randomly assigned to training data set (split 80/20 for training and validation)	38 patients randomly selected for testing	Dice Similarity Index metrics of 91% and 86.8% for LV at end-diastole and end-systole, respectively, and 87.4% and 80.6% for RV at end-diastole and end-systole, respectively. Externally validated.	Single site. Small patient numbers.
Tandon et al³⁷	2021	87 cardiac MRI from repaired ToF patients	MRI images	CNN (transfer learning)	57	30	Dice similarity coefficient: 0.90	Small data sets.
Wang et al³⁸	2021	1,308 children (823 healthy, 209 VSDs, 276 ASDs)	Echocardiograms	CNN view classification for 5 views	90% training	10% testing	Autoencoders trained significantly better on CHD samples than healthy samples; cross-entropy healthy: 0.2649 ± 0.0369 vs 0.2597 ± 0.0327 for CHD, and mean squared difference healthy: 133.89 ± 79.06 vs 118.86 ± 61.52 for CHD. A lower cross-entropy indicates a closer representation of the underlying distribution.	No external validation. Limited diseases.
Procedural planning for catheterization and surgery
Ruiz-Fernandez et al³⁹	2016	2,432 patients	Basic clinical data, healthy history, surgical intervention, and postsurgical intervention	Classification model: 1. Multilayer perceptron 2. Radial basis function 3. Self-organizing map 4. Decision tree	2,432	2,432	Accuracy: 0.99	Not clinically deployed
Lu et al⁴⁰	2020	550 echocardiogram images; 275 before and after atrial septal occlusion surgery	2D echocardiogram images	Variant of the U-Net architecture (a CNN) used to perform atrial segmentation to determine surgical outcomes of atrial septal defects before and after septal occlusion	3:1 training-to-testing ratio		The U-Net mean and SD reported for the Dice Similarity Index, Jaccard Index, and Hausdorff Distance were 0.9488 (±0.0209), 0.9033 (±0.0374), and 7.5625 (±4.4549), respectively.	Single clinical site and scanner used. No external validation.
Outcome prediction and risk stratification
Diller et al¹⁰	2019	10,019 adult CHD patients	Clinical data, EKG, cardiopulmonary exercise test, laboratory markers	CNN to categorize diagnostic groups, disease complexity, and New York Heart Association Class	44,000 medical reports	Unclear	Accuracy 91% in diagnosis, 96% in disease complexity, 90% New York Heart Association Class	Retrospective single-center data. Raw echo and MRI data using specifically trained data need validation externally.
Atallah et al⁴¹	2020	288 patients (72 ToF patients and 216 controls)	Clinical data and noninvasive testing	Random forest Decision tree to risk stratify into low, moderate, high risk for ventricular arrhythmia and life-threatening events	Unknown	Unknown	High-risk group Sensitivity: 0.54 Specificity: 0.86	Small data set and retrospective. Unknown numbers for training and testing data sets.
Jalali et al⁴²	2020	549 single-ventricle patients	Clinical data, surgery	1. Logistic regression 2. Decision tree 3. Random forest 4. Gradient boosting 5. Deep neural network	25 out of 100 variables selected for training	Unknown	AUC (mortality/cardiac transplantation): 0.95 AUC (prolonged length of stay): 0.94	Exclusion of very ill patients from the PHN SVR trial, thus biased toward higher survival rates. Retrospective data set.
Bertsimas et al⁴³	2021	235,000 patients with 295,000 operations	Clinical data, general preoperative patient risk factors to predict mortality, postoperative MVST, and length of hospital stay (LOS)	1. Optimal classification trees 2. Random forests 3. Gradient boosting	175,239	46,096	AUC (mortality): 0.86 AUC (prolonged MVST): 0.85 AUC (prolonged LOS): 0.82	Heterogeneous data can lead to bias.
Precision medicine
Meza et al⁴⁴	2018	651 neonates with critical left heart obstruction	136 echocardiographic measures to group patients into 3 subtypes and identify differentiating characteristics	Unsupervised clustering analysis	Divided into group 1, 215; group 2, 338; and group 3, 98.		Median LV end diastolic area was 1.35, 0.69, 2.47 cm² in groups 1, 2, and 3; P < 0.001. Overall mortality was 27%, 41%, and 12%, respectively; P < 0.001.
Bruse et al⁴⁵	2017	60 patients	CMR	Automated segmentation, statistical shape modeling, and unsupervised hierarchical clustering to group patients accordingly and identify novel subgroups	Cohort divided into 20 healthy subjects, 20 patients who had undergone surgical aortic arch reconstruction, and 20 patients who had their aorta pushed back posteriorly in the Lecompte maneuver for arterial switch operation		Achieved automatic division of input shape data according to primary clinical diagnosis with a high F-score (0.902 ± 0.042) and Matthews correlation coefficient (0.851 ± 0.064) using the correlation/weighted distance/linkage combination.	Relatively small cohort of patients; not generalizable to other forms of CHD.
Bahado-Singh et al⁴⁶	2022	24 coarctation patients and 16 controls	Blood spots	Deep learning to perform genome-wide DNA methylation analysis	Unknown	Unknown	AUC: 0.97 Sensitivity: 0.95 Specificity: 0.98	Unknown number of training and testing data sets

ASD = atrial septal defect; AUC = area under the curve; BAV = bicuspid aortic valve; ccTGA = congenitally corrected transposition of the great arteries; CHD = congenital heart disease; CI = confidence interval; CMR = cardiac magnetic resonance imaging; CNN = convolutional neural network; DL = deep learning; DORV = double outlet right ventricle; EKG = electrocardiogram; HLHS = hypoplastic left heart syndrome; LQTS = long QT syndrome; LSTM = long short-term memory; LV = left ventricle; ML = machine learning; MRI = magnetic resonance imaging; MVST = mechanical ventilatory support time; PDA = patent ductus arteriosus; PHN = Pediatric Heart Network; RNN = recurrent neural network; RV = right ventricle; SD = standard deviation; SVM = support vector machine; SVR = Single Ventricle Reconstruction; TGA = transposition of the great arteries; ToF = tetralogy of Fallot; VSD = ventricular septal defect.
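
For readers less familiar with the Results Metrics column, the following is a minimal sketch of how four of the reported metrics are conventionally defined: sensitivity, specificity, the Dice similarity index, and the Jaccard index. It is not taken from any of the cited studies. The function names, the confusion-matrix counts, and the toy pixel sets are all hypothetical and chosen only for illustration.

```python
# Illustrative definitions only; all inputs below are made-up examples,
# not data from any study in Table 1.

def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true disease cases that the model flags (true-positive rate)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Fraction of healthy cases that the model correctly clears (true-negative rate)."""
    return tn / (tn + fp)

def dice(pred: set, truth: set) -> float:
    """Dice similarity index between two segmentations given as sets of pixel ids.
    Assumes at least one of the two sets is non-empty."""
    return 2 * len(pred & truth) / (len(pred) + len(truth))

def jaccard(pred: set, truth: set) -> float:
    """Jaccard index (intersection over union) between two segmentations.
    Assumes at least one of the two sets is non-empty."""
    return len(pred & truth) / len(pred | truth)

# Hypothetical classifier results: 100 diseased and 100 healthy cases.
sens = sensitivity(tp=85, fn=15)
spec = specificity(tn=88, fp=12)

# Hypothetical segmentation: predicted vs reference pixel ids.
pred_mask = {1, 2, 3, 4}
true_mask = {2, 3, 4, 5}
d = dice(pred_mask, true_mask)
j = jaccard(pred_mask, true_mask)

print(f"sensitivity={sens:.2f} specificity={spec:.2f} dice={d:.2f} jaccard={j:.2f}")
```

For a single pair of segmentations the two overlap metrics are linked by J = D / (2 - D), so they rank results identically. Means reported over many images need not satisfy this identity exactly.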