Author manuscript; available in PMC: 2022 May 2.

Published in final edited form as: Cardiol Young. 2021 Nov 2;31(11):1770–1780. doi: 10.1017/S1047951121004212

Table:

Summary of Studies Focusing on Machine Learning Techniques Applied to Diagnosing and Assessing CHD in Neonates, Infants, and Children

Study Citation	Study Aim and Design	Machine Learning Approach	Main Findings
*Auscultation of Heart Sounds*
Gomez-Quintana S, et al. (2021).³⁴	Case-control study (n = 265 newborns [39 other CHD, 89 PDA, 137 healthy], gestational ages 35 to 42 weeks). PCG recordings from the first 6 days of life were used to create a decision support system that detects sound signatures with and without PDA and CHD. Two clinical sites were used.	A boosted decision tree classifier with 10-fold cross-validation on a 90/10 training/testing split was used to estimate the probability of PDA or CHD. No external validation.	The classifier achieved an AUC of 78% for detecting CHD and 77% for detecting PDA.
Lv J, et al. (2021).³⁷	Case-control study (n = 1362 [1194 abnormal and 168 normal heart sounds], mean age 2.4 years [SD 3.1; median 0.9 years; range 1 day to 15.9 years]) comparing detection of abnormal heart sounds by remote auscultation and by automated artificial intelligence auscultation with face-to-face auscultation by experienced cardiologists (gold standard). One clinical site was used.	CNN with 5-fold cross-validation. No external validation.	Remote vs. face-to-face auscultation: sensitivity, specificity, and accuracy of 98% (95% CI 97-99%), 91% (95% CI 87-95%), and 97% (95% CI 96-98%), respectively. Automated artificial intelligence vs. face-to-face auscultation: sensitivity, specificity, and accuracy of 97% (95% CI 96-98%), 89% (95% CI 84-94%), and 96% (95% CI 95-97%), respectively.
Aziz S, et al. (2020).²⁹	Case-control study (n = 56 [17 ASD, 11 VSD, and 28 healthy]) to automate detection and classification of CHD using pattern recognition techniques. One clinical site was used.	SVM with 10-fold cross-validation. Quadratic, cubic, and Gaussian kernels were applied to the SVM classifier. No external validation.	The SVM classifier with a cubic kernel function, using a subset of fused frequency-based and temporal features, had the best performance in the binary and multiclass experiments. Accuracy 95.24%, sensitivity 95.24%, specificity 95.24%, PPV 86.96%, NPV 98.36%, and error 4.76%.
Gharehbaghi A, et al. (2020).³⁰	Case-control study (n = 115 [10 ASD, 25 healthy with innocent murmur, 25 healthy with no murmur, 15 MR, 15 TR, and 25 VSD], average age 3.9 ± 2.4 to 12.6 ± 4.4 years) to distinguish children with a septal defect from children with valvular leakage, both of which produce a systolic murmur. One clinical site was used.	TGNN with K-fold validation over multiple values of K, ranging from 2 to half of the minimum group size (A-test method), using a 70/30 training/testing split. Repeated random sub-sampling was also applied. No external validation.	Average accuracy 88.4% ± 3.9, sensitivity 91.6% ± 5.7, and classification error 9.89% using the A-test method (which evaluates structural risk).
Elgendi M, et al. (2018).²⁷	Cohort study (n = 60, median age 7 years [range 3 months to 78 years]) to classify pulmonary artery hypertension using sound signatures of pulmonary circulation vibrations recorded non-invasively. One clinical site was used.	LDA with leave-one-out cross-validation. No external validation.	Sensitivity 84% and specificity 88.57% using the entropy (disorder of the heart sound pattern) of the first sinusoid formant (frequency resonance) of heart sounds.
Sun S, et al. (2018).²⁰	Case-control study (n = 227 [60 VSD and 167 healthy], average age 2 years in the VSD group and 21 years in the healthy group) to classify small, medium, and large VSD based on heart sound feature extraction. Two databases (3M and Michigan), one cohort of undergraduate students, and one clinical site were used.	PCA for feature generation, with an ellipse model and an SVM for classification, using 2,276 heart sounds (22% positive cases). A Gaussian kernel was applied to the SVM classifier. No external validation.	The ellipse model used for classification had the highest performance, with accuracy of 95.5%, 92.1%, and 96.2% for small, medium, and large VSD, sensitivity of 94.9%, 93.8%, and 95.3%, and specificity of 95.6%, 91.9%, and 96.3%, respectively.
Thompson W, et al. (2018).³⁶	Case-control study (n = 603 cases [374 abnormal, confirmed by echocardiogram and pathologic murmur; 229 normal, confirmed by echocardiogram and innocent murmur (90) or no murmur (139)], median age 8.8 ± 0.1 to 80.9 years) to compare classification of heart rate by a murmur detection algorithm with the gold standard 3-lead electrocardiogram. One clinical site was used.	A murmur detection algorithm performs a signal quality check, segments the heart sounds (S1, systole, S2, and diastole), extracts feature vectors, and uses these feature vectors to build a non-linear artificial intelligence classifier. Patients in the Johns Hopkins Cardiac Auscultatory Database were used only for testing, not for training.	The murmur detection algorithm's sensitivity and specificity for detecting pathologic cases were 93% (CI 90-95%) and 81% (CI 75-85%), respectively, with accuracy of 88% (CI 85-91%).
Gharehbaghi A, et al. (2017).²²	Case-control study (n = 90 [55 healthy and 35 CHD], age 6.6 ± 1.2 to 11.8 ± 4.1 years) to classify BAV and mitral regurgitation using recorded heart sounds. One clinical site was used.	Hidden Markov model and SVM using repeated random sub-sampling with 5-fold cross-validation on a 50/50 training/testing split. A quadratic kernel was applied to the SVM classifier. No external validation.	Accuracy 86.4%, sensitivity 85.6%, and specificity 87%.
Gharehbaghi A, et al. (2017).²⁶	Case-control study (n = 90 [30 VSD, age 3.6 ± 1.2 years; 30 valvular regurgitation, age 11.8 ± 4.1 and 12.6 ± 4.4 years; 30 healthy, age 6.7 ± 3.7 years]) to classify phonocardiographic recordings and distinguish VSD from AV valve regurgitation. One clinical site was used.	TGNN with leave-one-out validation. No external validation.	Accuracy 86.7% and sensitivity 83.3%.
Gharehbaghi A, et al. (2015).¹	Case-control study (n = 50 [22 BAV and 28 healthy], median age 7 years [range 2.5 to 12 years]) to classify BAV using recorded heart sounds. One clinical site was used.	Statistical TGNN and SVM using 856 cardiac cycles (45% positive cases), with cross-validation on a 50/50 training/testing split. A linear kernel was applied to the SVM classifier. No external validation.	On average, the statistical TGNN performed better than the other models, with classification rate 87.4%, sensitivity 86.5%, and specificity 88.4%.
*Transthoracic Echocardiogram*
Wang J, et al. (2021).³³	Case-control study (n = 1308 children [823 healthy, 209 VSD, 276 ASD]) designed to automatically interpret five-view echocardiograms. One clinical site was used.	A multi-channel CNN was applied to the dataset with a 90/10 training/testing split. No external validation.	The video-based model achieved 93.9% accuracy on the binary classification problem (positive or negative) and 92.1% accuracy on the 3-class classification problem (negative, ASD, VSD), with an AUC of 0.922 for binary classification. The model did not use ground truth labels or key-frame annotation.
Diller GP, et al. (2019).¹⁹	Case-control study (n = 267 [152 CHD and 155 healthy], mean age 39 ± 16 years) to remove artifacts from transthoracic echocardiograms by estimating cross-entropy (a loss function that measures the difference between two probability distributions, here the original vs. the reconstructed image)¹⁹,⁷³ and the sum of squared differences (a measure of quality between healthy and CHD images). One clinical site was used.	DNN with an autoencoder applied to 153,420 apical 4-chamber views from CHD subjects and 24,354 from healthy subjects, with a 70/30 training/testing split. No external validation.	Autoencoders trained significantly better on CHD samples than on healthy samples (cross-entropy: 0.2649 ± 0.0369 for healthy vs. 0.2597 ± 0.0327 for CHD; mean squared difference: 133.89 ± 79.06 for healthy vs. 118.86 ± 61.52 for CHD). A lower cross-entropy indicates a closer representation of the underlying distribution.
Diller GP, et al. (2019).²⁵	Case-control study (n = 199 [132 CHD and 67 healthy], mean age 38 ± 12 years) to distinguish transposition of the great arteries after arterial switch procedure from congenitally corrected transposition and healthy subjects. Two clinical sites were used.	CNN on apical 4-chamber view images with an 80/20 training/testing split. No external validation.	Model accuracy was 95% in the training set and 94.4% in the testing set.
Meza JM, et al. (2018).²¹	Cohort study of neonates with critical left heart obstruction (n = 651, median gestational age 38 [38-39] weeks) to phenotype clinically meaningful clusters of baseline and pre-intervention disease. Twenty-one clinical sites enrolled in the Congenital Heart Surgeons' Society Data Center were used.	Unsupervised hierarchical, non-overlapping, agglomerative cluster analysis of 136 quantitative and qualitative morphologic and functional variables from baseline echocardiograms. No external validation.	Three distinct clusters emerged (C1 = 215, C2 = 338, and C3 = 98). Aortic valve atresia and LV end-diastolic area differed significantly between clusters (aortic atresia in 11%, 87%, and 8%, and median LV end-diastolic area of 1.35, 0.69, and 2.47 cm², respectively).
Pereira F, et al. (2017).²³	Case-control study (n = 90 [26 CoA and 64 healthy], mean neonatal age 7 days) to classify CoA and healthy hearts from 2-D echocardiograms. One clinical site was used.	SVM using 5-fold cross-validation on training and testing datasets of ~80/20. A Gaussian kernel was applied to the SVM classifier. No external validation.	The parasternal long-axis view had the lowest false negative error rates (end-diastolic phase [7.7], end-systolic phase [11.5]) and the lowest total error rates (end-diastolic phase [18.9], end-systolic phase [20.0]).
*Advanced Medical Imaging*
Tandon A, et al. (2021).³⁵	Cohort study (n = 87 patients with repaired tetralogy of Fallot and pulmonary stenosis or atresia; age 13.5 years [IQR 10-17.5] in the training dataset and 13.9 years [IQR 11.7-18] in the testing dataset) to automate ventricular contouring during CMR in patients with repaired tetralogy of Fallot. One clinical site and one scanner type were used.	CNN using training/testing datasets of ~70/30. The datasets were not split randomly but by time: earlier enrolled cases were assigned to the training dataset. No external validation.	This study continued previously established research: the contouring algorithm, whose training data were mostly structurally normal hearts, was retrained with the addition of the repaired tetralogy of Fallot patients. Algorithm performance was evaluated with spatial metrics, including the Dice Similarity Coefficient (spatial overlap in 3 dimensions; a Dice of 1 indicates perfect overlap and 0 indicates no overlap). LV endocardial, LV epicardial, and RV endocardial contours at end diastole all had improved Dice metrics with the retrained algorithm: 0.903 (0.875, 0.920), p = 0.0248; 0.905 (0.881, 0.937), p < 0.0001; and 0.894 (0.855, 0.907), p < 0.0001, respectively.
Lu Y, et al. (2020).³²	Cohort study (n = 3 ASD patients [550 images: 275 before and 275 after atrial septal occlusion surgery]) to segment right atrial CMR images to aid in determining surgical outcomes. One clinical site and one scanner type were used.	U-net deep CNN compared with an active contour model, with cross-validation using training and testing datasets in a 3:1 ratio. No external validation.	The proposed technique outperformed the traditional active contour model in accurately segmenting the atria. The U-net mean (± SD) Dice Similarity Index, Jaccard Index, and Hausdorff Distance were 0.9488 (± 0.0209), 0.9033 (± 0.0374), and 7.5625 (± 4.4549), respectively.
Karimi-Bidhedi S, et al. (2020).³¹	Case-control study (n = 64 patients, age range 2 to 18 years [20 tetralogy of Fallot, 9 double outlet right ventricle, 9 transposition of the great arteries after repaired arterial switch operation, 8 cardiomyopathy, 9 coronary artery anomaly, 4 pulmonary stenosis or atresia, 3 truncus arteriosus, and 2 aortic arch anomaly]). Synthetic segmented CMR images were generated to produce a large training dataset for automated detection of complex heart disease. One clinical site and two scanner types were used.	A generative adversarial network was used to augment the training dataset, and a fully convolutional network was used to segment the CMR images. Patients were split randomly: 26 to the training dataset and 38 to the testing dataset. The training dataset was further split 80/20 for training and validation. The framework was externally validated on a second dataset.	The fully convolutional network (automated) produced average Dice Similarity Index values of 91% and 86.8% for the LV at end diastole and end systole, and 87.4% and 80.6% for the RV at end diastole and end systole, respectively.
Hauptmann A, et al. (2019).²⁸	Cohort study (n = 250 [retrospective data, mean age 22 ± 13 years] and n = 10 [prospective data, mean age 34 ± 17 years]). For the prospective study, one clinical site and one scanner type were used.	CNN used to de-noise CMR images of free-breathing individuals, with cross-validation on the retrospective data and external validation on the prospective data.	RMSE and SSIM were computed on the reconstructed images of the test dataset for varying SNR, acceleration factor, and image cropping. The continuously rotating tiny golden angle CMR sampling pattern had the lowest RMSE and highest SSIM of all sampling methods. SNR decreased from 20 dB to 10 dB, and the acceleration factor increased from 10x to 16x.
Bruse JL, et al. (2017).²⁴	Case-control study (n = 60 [20 healthy, aged 15 ± 2 years; 20 with surgical aortic arch reconstruction, aged 23 ± 7 years; and 20 with Lecompte maneuver reconstruction, aged 14 ± 3 years]) to identify meaningful clusters within anatomical shape data. One clinical site and one scanner type were used.	Agglomerative hierarchical clustering was performed to subdivide groups, followed by PCA with a leave-one-out strategy and 10-fold cross-validation. No external validation.	The best-performing distance/linkage combination had correlation coefficient scores > 0.8 and an F score of ~0.9. Classification accuracy for healthy arches, CoA shapes, and arterial switch shapes was 83%, 85%, and 100%, respectively.

Area Under the Curve (AUC), Atrial Septal Defect (ASD), Atrioventricular (AV), Bicuspid Aortic Valve (BAV), Cardiovascular Magnetic Resonance Imaging (CMR), Coarctation of the aorta (CoA), Confidence Interval (CI), Congenital Heart Defect (CHD), Convolutional Neural Network (CNN), Deep Neural Network (DNN), Interquartile Range (IQR), Left Ventricle (LV), Linear Discriminant Analysis (LDA), Mitral Regurgitation (MR), Negative Predictive Value (NPV), Patent Ductus Arteriosus (PDA), Phonocardiogram (PCG), Positive Predictive Value (PPV), Principal Component Analysis (PCA), Right Ventricle (RV), Root Mean-Square Error (RMSE), Signal-to-Noise Ratio (SNR), Standard Deviation (SD), Structural Similarity Index (SSIM), Support Vector Machine (SVM), Time Growing Neural Network (TGNN), Tricuspid Regurgitation (TR), Ventricular Septal Defect (VSD)
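Many rows in the table report sensitivity, specificity, PPV, NPV, accuracy, and error. All of these proportions can be computed from a 2 × 2 confusion matrix against the reference standard. The sketch below shows that arithmetic. It is not code from any of the cited studies. The `ConfusionCounts` class and `summary_metrics` function are hypothetical names, and the counts in the usage block are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Classifier calls compared with the reference standard (e.g. echocardiography)."""
    tp: int  # abnormal cases flagged as abnormal
    fn: int  # abnormal cases missed
    tn: int  # normal cases correctly cleared
    fp: int  # normal cases wrongly flagged


def summary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the proportions typically reported in diagnostic accuracy studies."""
    total = c.tp + c.fn + c.tn + c.fp
    accuracy = (c.tp + c.tn) / total
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
        "accuracy": accuracy,
        "error": 1.0 - accuracy,              # misclassification rate
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not taken from any study in the table.
    counts = ConfusionCounts(tp=45, fn=5, tn=90, fp=10)
    for name, value in summary_metrics(counts).items():
        print(f"{name}: {value:.1%}")
```

Sensitivity and specificity are conditioned on true disease status. PPV and NPV, by contrast, also depend on the proportion of positive cases in the sample. The case-control designs in the table fix that proportion by sampling, so their PPV and NPV may not carry over to a screening population in which CHD is less common.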
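Several rows in the table count their data in cardiac cycles, heart sound recordings, or image frames rather than in patients. Examples include 856 cardiac cycles from 50 children and 153,420 apical views. When one patient contributes many samples, a k-fold or fixed-ratio split made at the sample level can place the same patient in both the training and the testing data. The minimal sketch below, built around a hypothetical `group_kfold` helper and plain Python lists, assigns whole patients to folds to avoid this. It illustrates the idea only and does not reproduce the procedure of any cited study.

```python
from __future__ import annotations

import random


def group_kfold(groups: list[str], k: int, seed: int | None = None) -> list[tuple[list[int], list[int]]]:
    """Return k (train_idx, test_idx) pairs in which no patient appears on both sides.

    groups[i] is the patient ID of sample i (e.g. one cardiac cycle or image frame).
    """
    patients = sorted(set(groups))
    if not 2 <= k <= len(patients):
        raise ValueError("k must be between 2 and the number of patients")
    if seed is not None:
        random.Random(seed).shuffle(patients)  # reproducible shuffle of patient order
    # Deal patients round-robin so fold sizes differ by at most one patient.
    fold_of = {patient: i % k for i, patient in enumerate(patients)}
    splits = []
    for fold in range(k):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    # Hypothetical data: three cardiac cycles from each of four patients.
    groups = ["p1"] * 3 + ["p2"] * 3 + ["p3"] * 3 + ["p4"] * 3
    for train, test in group_kfold(groups, k=2, seed=0):
        print("test patients:", sorted({groups[i] for i in test}))
```

With one sample per patient and `k` equal to the number of patients, this reduces to leave-one-out cross-validation. scikit-learn's `GroupKFold` provides a library implementation of the same grouping idea.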
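The advanced imaging rows compare automated contours with reference contours using three metrics: the Dice Similarity Coefficient, the Jaccard Index, and the Hausdorff Distance. Below is a minimal sketch of all three. It assumes each segmentation is a set of (row, col) pixel coordinates, a hypothetical representation chosen for brevity. Real pipelines typically work with binary mask arrays and compute the Hausdorff distance on contour points scaled by pixel spacing.

```python
from __future__ import annotations

import math


def dice(a: set[tuple[int, int]], b: set[tuple[int, int]]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|): 1 means perfect overlap, 0 means none."""
    if not a and not b:
        return 1.0  # convention: two empty masks agree
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a: set[tuple[int, int]], b: set[tuple[int, int]]) -> float:
    """Jaccard = |A ∩ B| / |A ∪ B|; related to Dice by J = D / (2 - D)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def hausdorff(a: set[tuple[int, int]], b: set[tuple[int, int]]) -> float:
    """Symmetric Hausdorff distance in pixel units (brute force, O(|A| * |B|)).

    The largest distance from any point in one set to its nearest point in the
    other set. Both sets must be non-empty.
    """
    def directed(p, q):
        return max(min(math.dist(x, y) for y in q) for x in p)

    return max(directed(a, b), directed(b, a))


if __name__ == "__main__":
    # Hypothetical 2x2 masks; the automated one is shifted right by one column.
    manual = {(0, 0), (0, 1), (1, 0), (1, 1)}
    automated = {(0, 1), (1, 1), (0, 2), (1, 2)}
    print(dice(manual, automated), jaccard(manual, automated), hausdorff(manual, automated))
```

The two hypothetical masks share two of their four pixels each. For them, Dice is 0.5, Jaccard is 1/3, and the Hausdorff distance is 1 pixel. Because J = D / (2 − D), the Jaccard value for any pair of masks is never higher than the Dice value.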
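Two rows score reconstructed images rather than classifications. The Diller et al. autoencoder study uses cross-entropy and squared differences, and the Hauptmann et al. de-noising study uses RMSE and SSIM. The sketch below makes three assumptions:

- Images are flattened to lists of intensities scaled to [0, 1].
- Cross-entropy is computed pixel-wise as binary cross-entropy, a common autoencoder reconstruction loss. The table does not state whether the cited study used exactly this form.
- `global_ssim` is a simplified single-window version. The standard index averages SSIM over local windows.

```python
from __future__ import annotations

import math


def _check_same_size(a: list[float], b: list[float]) -> None:
    if not a or len(a) != len(b):
        raise ValueError("images must be non-empty and have the same number of pixels")


def binary_cross_entropy(original: list[float], reconstructed: list[float], eps: float = 1e-7) -> float:
    """Mean per-pixel binary cross-entropy (intensities assumed in [0, 1]).

    For a fixed original image it is smallest when reconstructed == original,
    but that minimum is zero only when the original pixels are exactly 0 or 1.
    """
    _check_same_size(original, reconstructed)
    total = 0.0
    for x, y in zip(original, reconstructed):
        y = min(max(y, eps), 1.0 - eps)  # clamp so log() never sees 0
        total -= x * math.log(y) + (1.0 - x) * math.log(1.0 - y)
    return total / len(original)


def sum_squared_differences(a: list[float], b: list[float]) -> float:
    _check_same_size(a, b)
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rmse(a: list[float], b: list[float]) -> float:
    """Root-mean-square error: square root of the mean squared pixel difference."""
    return math.sqrt(sum_squared_differences(a, b) / len(a))


def global_ssim(a: list[float], b: list[float], data_range: float = 1.0) -> float:
    """Single-window SSIM over the whole image (standard SSIM averages local windows)."""
    _check_same_size(a, b)
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    # Stabilising constants with the usual K1 = 0.01 and K2 = 0.03.
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    luminance = (2 * mu_a * mu_b + c1) / (mu_a ** 2 + mu_b ** 2 + c1)
    contrast_structure = (2 * cov + c2) / (var_a + var_b + c2)
    return luminance * contrast_structure


if __name__ == "__main__":
    original = [0.0, 0.2, 0.8, 1.0]  # hypothetical 2x2 image, flattened
    noisy = [0.1, 0.25, 0.7, 0.9]
    print(binary_cross_entropy(original, noisy), rmse(original, noisy), global_ssim(original, noisy))
```

A closer reconstruction gives lower cross-entropy, squared-difference, and RMSE values and a higher SSIM. SSIM reaches its maximum of 1 for identical images.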