Table 2.
Summary of data processing and performance of machine-learning algorithms in enrolled studies.
| Study | Feature Selection Algorithm | Feature Selection Method | Data Splitting | Machine Learning Algorithm | AUROC |
|---|---|---|---|---|---|
| Kate et al. [60] | NR | NR | ten-fold cross-validation | naïve Bayes | 0.654 |
| | | | | SVM | 0.621 |
| | | | | decision trees | 0.639 |
| | | | | logistic regression | 0.660 |
| Thottakkara et al. [61] | LASSO | embedded method | training data (70%); validation (30%) | naïve Bayes | 0.819 |
| | | | | generalized additive model | 0.858 |
| | | | | logistic regression | 0.853 |
| | | | | support vector machine | 0.857 |
| Davis et al. [62] | according to clinical experience or previous report | NR | five-fold cross-validation | random forest | 0.73 |
| | | | | neural network | 0.72 |
| | | | | naïve Bayes | 0.69 |
| | | | | logistic regression | 0.78 |
| Cheng et al. [63] | according to clinical experience or previous report | NR | ten-fold cross-validation | random forest | 0.765 |
| | | | | AdaBoostM1 | 0.751 |
| | | | | logistic regression | 0.763 |
| Ibrahim et al. [64] | LASSO | embedded method | Monte Carlo cross-validation | logistic regression | 0.79 |
| Koola et al. [65] | LASSO | embedded method | five-fold cross-validation | logistic regression | 0.93 |
| | | | | naïve Bayes | 0.73 |
| | | | | support vector machines | 0.90 |
| | | | | random forest | 0.91 |
| | | | | gradient boosting | 0.88 |
| Koyner et al. [66] | tree-based method | embedded method | ten-fold cross-validation | gradient boosting | 0.9 |
| Huang et al. [67] | XGBoost and LASSO | embedded method | training data (70%); validation (30%) | gradient boosting | 0.728 |
| | | | | logistic regression | 0.717 |
| Lin et al. [68] | according to clinical experience or previous report | NR | five-fold cross-validation | SVM | 0.86 |
| Simonov et al. [69] | according to clinical experience or previous report | NR | training data (67%); validation (33%) | discrete-time logistic regression | 0.74 |
| Huang et al. [70] | stepwise backward selection, LASSO, permutation-based selection | embedded method | training (50%); validation (50%) | generalized additive model | 0.777 |
| Tomašev et al. [71] | L1 regularization | embedded method | training (80%); validation (5%); calibration (5%); test (10%) | recurrent neural network | 0.934 |
| Adhikari et al. [72] | F-test | filter method | five-fold cross-validation | random forest | 0.86 |
| Flechet et al. [73] | according to clinical experience or previous report | NR | NR | random forest | 0.78 |
| Parreco et al. [74] | NR | NR | NR | gradient boosting | 0.834 |
| | | | | logistic regression | 0.827 |
| | | | | deep learning | 0.817 |
| Xu et al. [75] | gradient boosting | embedded method | five-fold cross-validation | gradient boosting | 0.749 |
| Tran et al. [76] | NR | NR | scikit-learn cross-validation | k-nearest neighbor | 0.92 |
| Zhang et al. [77] | XGBoost | embedded method | bootstrap validation | gradient boosting | 0.86 |
| Zimmerman et al. [78] | logistic regression | embedded method | five-fold cross-validation | logistic regression | 0.783 |
| | | | | random forest | 0.779 |
| | | | | neural network | 0.796 |
| Rashidi et al. [79] | according to clinical experience or previous report | NR | scikit-learn cross-validation | recurrent neural network | 0.92 |
| Zhou et al. [80] | NR | NR | five-fold cross-validation | logistic regression | 0.73 |
| | | | | linear kernel SVM | 0.84 |
| | | | | Gaussian kernel SVM | 0.77 |
| | | | | random forest | 0.89 |
| Martinez et al. [81] | LASSO | embedded method | ten-fold cross-validation | random forest | not provided |
| Lei et al. [82] | NR | NR | training data (70%); validation (30%) | gradient boosting | 0.8 |
| Lei et al. [82] | NR | NR | training data (70%); validation (30%) | gradient boosting | 0.772 |
| | | | | light gradient boosted machine | 0.725 |
| | | | | random forest | 0.662 |
| | | | | decision tree | 0.628 |
| Qu et al. [84] | NR | NR | ten-fold cross-validation | random forest | 0.821 |
| | | | | classification and regression tree | 0.8033 |
| | | | | logistic regression | 0.8728 |
| | | | | extreme gradient boosting | 0.9193 |
| Tseng et al. [85] | tree-based method | embedded method | five-fold cross-validation | random forest | 0.839 |
| | | | | random forest with extreme gradient boosting | 0.843 |
| Sun et al. [86] | Boruta algorithm | wrapper method | ten-fold cross-validation | random forest | 0.82 |
| | | | | logistic regression | 0.69 |
| Churpek et al. [87] | gradient boosting | embedded method | ten-fold cross-validation | gradient boosted machine | 0.72 |
| Hsu et al. [88] | XGBoost and LASSO | embedded method | five-fold cross-validation | logistic regression | 0.767 |
| Penny-Dimri et al. [89] | tree-based method | embedded method | five-fold cross-validation | logistic regression | 0.77 |
| | | | | gradient boosted machine | 0.78 |
| | | | | neural networks | 0.77 |
| Li et al. [90] | LASSO | embedded method | ten-fold cross-validation | Bayesian networks | 0.736 |
AUROC: area under the receiver operating characteristic curve; LASSO: least absolute shrinkage and selection operator; NR: not reported; SAPS: simplified acute physiology score; SVM: support vector machine; XGB: eXtreme Gradient Boosting.
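
For readers less familiar with the metric reported in the last column, the following is a minimal sketch of how an AUROC value can be computed from predicted risk scores. It uses the pairwise-ranking interpretation of the AUROC (the fraction of positive/negative case pairs in which the positive case receives the higher score, with ties counted as one half). The function name `auroc` and the toy labels and scores are hypothetical illustrations and are not taken from any of the enrolled studies, which may have used other software or estimation procedures.

```python
from itertools import product


def auroc(labels, scores):
    """Return the AUROC as the fraction of (positive, negative) pairs
    in which the positive case has the higher score; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = event occurred, 0 = no event.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.8]

print(round(auroc(labels, scores), 3))  # 7 of 9 pairs ranked correctly -> 0.778
```

This pairwise form is quadratic in the number of cases, so it is suitable only for illustration; on this reading, a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes.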