Table 2.
Statistical metrics on the training and testing sets for the best algorithm of each method cluster and for their combined regression models: the “Full Regression” model and the stepwise-optimized regression model (rf + svmPoly + pda).
| Metric | C5.0 Train | C5.0 Test | pda Train | pda Test | plr Train | plr Test | rf Train | rf Test | svmPoly Train | svmPoly Test | Full Regression Train | Full Regression Test | rf + svmPoly + pda Train | rf + svmPoly + pda Test |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AUROC | 0.88 | 0.83 | 0.85 | 0.84 | 0.83 | 0.85 | 0.93 | 0.83 | 0.89 | 0.83 | 0.91 | 0.91 | 0.91 | 0.91 |
| Accuracy | 0.88 | 0.91 | 0.85 | 0.88 | 0.83 | 0.85 | 0.93 | 0.90 | 0.89 | 0.90 | 0.94 | 0.95 | 0.94 | 0.95 |
| Sensitivity | 0.78 | 0.68 | 0.86 | 0.76 | 0.82 | 0.84 | 0.87 | 0.71 | 0.80 | 0.68 | 0.98 | 0.98 | 0.98 | 0.98 |
| Specificity | 0.98 | 0.98 | 0.84 | 0.91 | 0.85 | 0.85 | 0.98 | 0.96 | 0.98 | 0.97 | 0.84 | 0.85 | 0.84 | 0.85 |
| PPV | 0.98 | 0.90 | 0.84 | 0.73 | 0.84 | 0.64 | 0.98 | 0.84 | 0.97 | 0.87 | 0.95 | 0.95 | 0.95 | 0.95 |
| NPV | 0.81 | 0.91 | 0.85 | 0.93 | 0.82 | 0.95 | 0.89 | 0.91 | 0.83 | 0.91 | 0.91 | 0.94 | 0.91 | 0.94 |
| FPR | 0.02 | 0.02 | 0.16 | 0.09 | 0.15 | 0.15 | 0.02 | 0.04 | 0.02 | 0.03 | 0.16 | 0.15 | 0.16 | 0.15 |
| FNR | 0.22 | 0.32 | 0.14 | 0.24 | 0.18 | 0.16 | 0.13 | 0.29 | 0.20 | 0.32 | 0.02 | 0.02 | 0.02 | 0.02 |
| F1 | 0.86 | 0.78 | 0.85 | 0.74 | 0.83 | 0.73 | 0.92 | 0.77 | 0.88 | 0.76 | 0.96 | 0.97 | 0.96 | 0.97 |
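AUROC: area under the receiver operating characteristic curve; PPV: positive predictive value; NPV: negative predictive value; FPR: false positive rate; FNR: false negative rate; F1: harmonic mean of PPV and sensitivity. PCA: dataset after Principal Component Analysis; PCAUp: dataset after Principal Component Analysis and up-sampling of the minority class; PCADown: dataset after Principal Component Analysis and down-sampling of the majority class; Scaled: dataset after z-score standardization; ScaledUp: dataset after z-score standardization and up-sampling of the minority class; ScaledDown: dataset after z-score standardization and down-sampling of the majority class.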
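Every metric in the table except AUROC follows from the four cells of a binary confusion matrix, while AUROC needs the models' predicted scores. The following is a minimal Python sketch of these standard definitions. It assumes a two-class problem with a fixed positive class. The names `ConfusionCounts`, `binary_metrics` and `auroc` and the example counts are hypothetical and not taken from the study. The sketch makes explicit that FPR = 1 - specificity and FNR = 1 - sensitivity, which gives a quick consistency check between rows of the table.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import nan


@dataclass(frozen=True)
class ConfusionCounts:
    """Cell counts of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # positives predicted as positive
    fp: int  # negatives predicted as positive
    tn: int  # negatives predicted as negative
    fn: int  # positives predicted as negative


def _ratio(num: float, den: float) -> float:
    # Return NaN instead of raising when a class or prediction is empty.
    return num / den if den else nan


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # true positive rate (recall)
    specificity = _ratio(c.tn, c.tn + c.fp)  # true negative rate
    ppv = _ratio(c.tp, c.tp + c.fp)          # positive predictive value (precision)
    npv = _ratio(c.tn, c.tn + c.fn)          # negative predictive value
    return {
        "Accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "PPV": ppv,
        "NPV": npv,
        "FPR": _ratio(c.fp, c.fp + c.tn),    # equals 1 - specificity
        "FNR": _ratio(c.fn, c.fn + c.tp),    # equals 1 - sensitivity
        "F1": _ratio(2 * ppv * sensitivity, ppv + sensitivity),
    }


def auroc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUROC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation; ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return _ratio(wins, len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    m = binary_metrics(ConfusionCounts(tp=40, fp=10, tn=45, fn=5))
    for name, value in m.items():
        print(f"{name}: {value:.2f}")
    print(f"AUROC: {auroc([0.9, 0.8, 0.4], [0.3, 0.4, 0.1]):.2f}")
```

The pairwise AUROC loop is quadratic in the number of samples. It is chosen here for readability; a rank-based computation gives the same value more efficiently on large test sets.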