Table 1. Comparison between machine learning models and conventional methods in heart failure studies.

Author | Journal | Year | Outcome | Machine learning models | Conventional methods | Conclusion
---|---|---|---|---|---|---
**Classification of HF patients** | | | | | |
Austin PC | Journal of Clinical Epidemiology | 2013 | Discrimination of HFpEF vs HFrEF | AUC: regression tree 0.683; bagged regression tree 0.733; random forest 0.751; boosted regression tree depth 1 0.752, depth 2 0.768, depth 3 0.772, depth 4 0.774 | AUC: LR 0.780 | Conventional LR performed at least as well as modern methods
**CRT response** | | | | | |
Kalscheur MM | Circ Arrhythm Electrophysiol | 2018 | All-cause mortality or HF hospitalization in CRT recipients | AUC: RF 0.74 (95% CI 0.72–0.76); SVM trained with sequential minimal optimization 0.67 (95% CI 0.65–0.68) | AUC: multivariate LR 0.67 (95% CI 0.65–0.69) | The improvement in AUC for the RF model was statistically significant compared with the other models (p < 0.001)
**Data extraction** | | | | | |
Zhang R | BMC Med Inform Decis Mak | 2018 | HF information (NYHA class) extraction from clinical notes | RF with n-gram features: F-measure 93.78%, recall 92.23%, precision 95.40%; SVM: F-measure 93.52%, recall 93.21%, precision 93.84% | LR: F-measure 90.42%, recall 90.82%, precision 90.03% | ML-based methods outperformed a rule-based method; the best ML method was an RF
**HF diagnosis** | | | | | |
Nirschl JJ | PLoS ONE | 2018 | HF diagnosis using biopsy images | AUC: RF 0.952; deep learning 0.974 | AUC: pathologists 0.75 | ML models outperformed conventional methods
Rasmy L | J Biomed Inform | 2018 | HF diagnosis | AUC: recurrent NN 0.822 | AUC: LR 0.766 | ML outperformed conventional methods
Son CS | J Biomed Inform | 2012 | HF diagnosis | Rough-set-based decision-making model: accuracy 97.5%, SENS 97.2%, SPE 97.7%, PPV 97.2%, NPV 97.7%, AUC 97.5% | LR-based decision-making model: accuracy 88.7%, SENS 90.1%, SPE 87.5%, PPV 85.3%, NPV 91.7%, AUC 88.8% | ML models outperformed conventional methods
Wu J | Med Care | 2010 | HF diagnosis | Boosting with a less strict cut-off performed better than SVM | The highest median AUC (0.77) was observed for LR with the Bayesian information criterion | Both LR and boosting were superior to SVM
**Identification of HF patients** | | | | | |
Blecker S | JAMA Cardiology | 2016 | Identification of HF patients | ML using notes and imaging reports: AUC 99%, SENS 92%, PPV 80% (development set); AUC 97%, SENS 84%, PPV 80% (validation set) | LR using structured data: AUC 96%, SENS 78%, PPV 80% (development set); AUC 95%, SENS 76%, PPV 80% (validation set) | ML models improved identification of HF patients
Blecker S | J Card Fail | 2018 | Identification of HF hospitalization | ML using both types of data: AUC 99%, SENS 98%, PPV 43% (development set); AUC 99%, SENS 98%, PPV 34% (validation set) | LR using structured data, notes, and imaging reports: AUC 96%, SENS 98%, PPV 14% (development set); AUC 96%, SENS 98%, PPV 15% (validation set) | ML models performed better in identifying decompensated HF
Choi E | J Am Med Inform Assoc | 2017 | Predicting HF diagnosis from the EHR | AUC (12-month observation window): NN model 0.777; MLP with 1 hidden layer 0.765; SVM 0.743; K-NN 0.730 | AUC (12-month observation window): LR 0.747 | ML models performed better in detecting incident HF with a short observation window of 12–18 months
**Prediction of outcomes** | | | | | |
Austin PC | Biom J | 2012 | 30-day mortality | AUC: regression tree 0.674; bagged trees 0.713; random forests 0.752; boosted trees depth 1 0.769, depth 2 0.788, depth 3 0.801, depth 4 0.811 | AUC: LR 0.773 | Ensemble methods from the data mining and ML literature increase the predictive performance of regression trees but may not lead to clear advantages over conventional LR models
Austin PC | J Clin Epidemiol | 2010 | In-hospital mortality | AUC: regression trees 0.620–0.651 | AUC: LR 0.747–0.775 | LR predicted in-hospital mortality in patients hospitalized with HF more accurately than regression trees
Awan SE | ESC Heart Failure | 2019 | 30-day readmissions | AUC: MLP 0.62; weighted random forest 0.55; weighted decision trees 0.53; weighted SVM 0.54 | AUC: LR 0.58 | The proposed MLP-based approach was superior to the other ML and regression techniques
Fonarow GC | JAMA | 2005 | In-hospital mortality | AUC: CART model 68.7% (derivation cohort), 66.8% (validation cohort) | AUC: LR model 75.9% (derivation cohort), 75.7% (validation cohort) | The accuracy of the CART model was modestly lower than that of the more complicated LR model
Frizzell JD | JAMA Cardiol | 2016 | 30-day readmissions | C-statistics: tree-augmented naive Bayesian network 0.618; RF 0.607; gradient boosting 0.614; least absolute shrinkage and selection operator (LASSO) models 0.618 | C-statistic: LR 0.624 | ML methods showed limited predictive ability
Golas SB | BMC Med Inform Decis Mak | 2018 | 30-day readmissions | AUC: gradient boosting 0.650 ± 0.011; maxout networks 0.695 ± 0.016; deep unified networks 0.705 ± 0.015 | AUC: LR 0.664 ± 0.015 | Deep learning techniques performed better than other traditional techniques
Hearn J | Circ Heart Fail | 2018 | Clinical deterioration (i.e., need for mechanical circulatory support, listing for heart transplantation, or all-cause mortality) | AUC: ppVO2 0.800 (0.753–0.838); staged LASSO 0.827 (0.785–0.867); staged NN 0.835 (0.795–0.880); breath-by-breath LASSO 0.816 (0.767–0.866); breath-by-breath NN 0.842 (0.794–0.882) | AUC: CPET risk score 0.759 (0.709–0.799) | An NN incorporating breath-by-breath data achieved the best performance
Kwon JM | Echocardiography | 2019 | In-hospital mortality | AUC: deep learning 0.913; RF 0.835 | AUC: LR 0.835; MAGGIC score 0.806; GWTG score 0.783 | The echocardiography-based deep learning model predicted in-hospital mortality among HF patients more accurately than existing prediction models
Phillips KT | AMIA Annu Symp Proc | 2005 | Mortality | AUC: nearest neighbor 0.823; NN 0.802; decision tree 0.4975 | AUC: stepwise LR 0.734 | Data mining methods outperformed multiple logistic regression and traditional epidemiological methods
Mortazavi BJ | Circ Cardiovasc Qual Outcomes | 2016 | HF readmissions | C-statistic: boosting 0.678 | C-statistic: LR 0.543 | Boosting improved the C-statistic by 24.9% over LR
Myers J | Int J Cardiol | 2014 | Cardiovascular death | AUC: artificial NN 0.72; Cox PH models 0.69 | AUC: LR 0.70 | An artificial NN model slightly improved upon conventional methods
Panahiazar M | Stud Health Technol Inform | 2015 | 5-year mortality | AUC: RF 62% (baseline set), 72% (extended set); decision tree 50%, 50%; SVM 55%, 38%; AdaBoost 61%, 68% | AUC: LR 61% (baseline set), 73% (extended set) | LR and RF returned the most accurate models
Subramanian D | Circ Heart Fail | 2011 | 1-year mortality | C-statistic: ensemble model using gentle boosting with 10-fold cross-validation 84% | C-statistic: multivariate LR model using time-series cytokine measurements 81% | The ensemble model showed significantly better performance
Taslimitehrani V | J Biomed Inform | 2016 | 5-year survival | SVM: precision 0.2, recall 0.5, accuracy 0.66; CPXR(Log): precision 0.721, recall 0.615, accuracy 0.809 | LR: precision 0.513, recall 0.506, accuracy 0.717 | CPXR outperformed logistic regression, SVM, random forest, and AdaBoost
Turgeman L | Artif Intell Med | 2016 | Hospital readmissions | AUC (train, test): NN 0.589, 0.639; naïve Bayes 0.699, 0.676; SVM 0.768, 0.643; CART decision tree 0.529, 0.556; C5 ensemble models 0.714, 0.693; CHAID decision tree 0.671, 0.691 | AUC (train, test): LR 0.642, 0.699 | A mixed-ensemble model combined a boosted C5.0 base classifier with an SVM secondary classifier to control classification error for the minority class
Wong W | Scientific World Journal | 2003 | Mortality (365-day models) | AUC: MLP 69%; radial basis function 67% | AUC: LR 60% | NNs were able to outperform LR in out-of-sample prediction
Yu S | Artif Intell Med | 2015 | 30-day HF readmissions | AUC: linear SVM 0.65; polynomial SVM 0.61; Cox PH 0.63 | AUC: industry-standard method (LACE) 0.56 | The ML models performed better than the standard method
Zhang J | Int J Cardiol | 2013 | Death or hospitalization | AUC: decision trees 79.7% | AUC: LR 73.8% | Decision trees tended to perform better than LR models
Zhu K | Methods Inf Med | 2015 | 30-day readmissions | AUC: RF 0.577; SVM 0.560; conditional LR 1 0.576; conditional LR 2 0.608; conditional LR 3 0.615 | AUC: standard LR 0.547; stepwise LR 0.539 | LR combined with ML outperformed the standard classification models
Zolfaghar K | 2013 IEEE International Conference on Big Data | 2013 | HF readmissions | AUC: RF (MultiCare Health System model) 62.25% | AUC: LR (MultiCare Health System model) 63.78%; LR (Yale model) 59.72% | The ML random forest model did not outperform the traditional LR model
AUC area under the receiver operating characteristic curve, CPET cardiopulmonary exercise test, CRT cardiac resynchronization therapy, HF heart failure, LR logistic regression, ML machine learning, MLP multilayer perceptron, NN neural network, NPV negative predictive value, PH proportional hazards, PPV positive predictive value, ppVO2 predicted peak oxygen uptake, RF random forest, SENS sensitivity, SPE specificity, SVM support vector machine