Table 2. Performance metrics of methods: random forest, gradient boosting machine (GBM), and logistic regression.
| Metric | Random forest: Training (Mar–Apr) | Random forest: Validation (Mar–Apr) | Random forest: Testing (May–Dec) | GBM: Training (Mar–Apr) | GBM: Validation (Mar–Apr) | GBM: Testing (May–Dec) | Logistic regression: Training (Mar–Apr) | Logistic regression: Validation (Mar–Apr) | Logistic regression: Testing (May–Dec) |
|---|---|---|---|---|---|---|---|---|---|
| AUC (DeLong) (95% CI) | 0.97 (0.97–0.98) | 0.83 (0.80–0.87) | 0.78 (0.73–0.84) | 0.88 (0.86–0.89) | 0.84 (0.80–0.88) | 0.78 (0.73–0.83) | 0.84 (0.82–0.86) | 0.83 (0.79–0.87) | 0.52 (0.44–0.60) |
| Sensitivity (95% CI) | 0.93 (0.91–0.97) | 0.82 (0.72–0.92) | 0.73 (0.54–1.00) | 0.85 (0.80–0.88) | 0.80 (0.66–0.90) | 0.77 (0.65–0.94) | 0.80 (0.77–0.84) | 0.84 (0.76–0.91) | 0.87 (0.18–1.00) |
| Specificity (95% CI) | 0.92 (0.88–0.94) | 0.75 (0.63–0.83) | 0.73 (0.41–0.89) | 0.77 (0.73–0.81) | 0.75 (0.65–0.87) | 0.71 (0.50–0.79) | 0.74 (0.70–0.77) | 0.73 (0.65–0.79) | 0.26 (0.11–0.94) |
Comparison of the performance of three methods: random forest, GBM, and a logistic regression model, applied to the rebalanced dataset obtained with the SMOTE methodology. Logistic regression predictions are computed using 10-fold cross-validation so that they are comparable with the random forest and GBM predictions (which use out-of-bag estimation and 10-fold cross-validation, respectively).
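The evaluation scheme described above can be sketched in code. This is a minimal illustration, not the authors' pipeline: it uses a synthetic dataset in place of the rebalanced SMOTE data, scikit-learn estimators standing in for the paper's models, out-of-bag predictions for the random forest, and 10-fold cross-validated predictions for the GBM and logistic regression, with AUC computed on each set of predictions.

```python
# Sketch of the comparison scheme (assumptions: synthetic data stands in
# for the SMOTE-rebalanced dataset; default sklearn hyperparameters).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

# Synthetic binary-classification data as a placeholder.
X, y = make_classification(n_samples=600, n_informative=6,
                           weights=[0.7], random_state=0)

aucs = {}

# Random forest: AUC from out-of-bag predicted probabilities.
rf = RandomForestClassifier(oob_score=True, random_state=0).fit(X, y)
aucs["Random forest (OOB)"] = roc_auc_score(y, rf.oob_decision_function_[:, 1])

# GBM and logistic regression: AUC from 10-fold cross-validated probabilities,
# so all three methods are scored on held-out predictions.
for name, model in [
    ("GBM (10-fold CV)", GradientBoostingClassifier(random_state=0)),
    ("Logistic regression (10-fold CV)", LogisticRegression(max_iter=1000)),
]:
    proba = cross_val_predict(model, X, y, cv=10, method="predict_proba")[:, 1]
    aucs[name] = roc_auc_score(y, proba)

for name, auc in aucs.items():
    print(f"{name}: AUC = {auc:.3f}")
```

In a real analysis the SMOTE rebalancing step (e.g. via the imbalanced-learn package) would be applied inside each cross-validation fold to avoid leaking synthetic samples into the held-out data.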