Table 2.
Final performance of the prediction model.
| Dataset | Base dataset | Base dataset | Base dataset | Base dataset | Embedded dataset |
|---|---|---|---|---|---|
| Model | Random Forest | Logistic regression | SVM | XGBoost | XGBoost |
| AUROC | 0.655 | 0.622 | 0.644 | 0.762 | 0.780 |
| AUPRC | 0.089 | 0.061 | 0.055 | 0.141 | 0.175 |
| Sensitivity | 0.611 | 0.583 | 0.806 | 0.778 | 0.722 |
| Specificity | 0.776 | 0.751 | 0.504 | 0.788 | 0.818 |
| PPV | 0.073 | 0.071 | 0.047 | 0.105 | 0.114 |
| NPV | 0.986 | 0.985 | 0.989 | 0.992 | 0.990 |
| Accuracy | 0.771 | 0.745 | 0.513 | 0.788 | 0.815 |
| MCC | 0.151 | 0.134 | 0.106 | 0.232 | 0.234 |
AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve; MCC, Matthews correlation coefficient; NPV, negative predictive value; PPV, positive predictive value; SVM, support vector machine; XGBoost, extreme gradient boosting.
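
The threshold-dependent metrics in Table 2 (sensitivity, specificity, PPV, NPV, accuracy, and MCC) can all be derived from the four confusion-matrix counts using their standard definitions. The following is a minimal sketch of those definitions, not the authors' code: the function name `confusion_metrics` and the example counts are hypothetical and are not taken from the study. AUROC and AUPRC are omitted because they depend on the ranking of predicted scores rather than on a single threshold.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Compute threshold-dependent metrics from confusion-matrix counts.

    Assumes every denominator is non-zero, except for MCC, which is
    conventionally set to 0.0 when its denominator is zero.
    """
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    ppv = tp / (tp + fp)                  # positive predictive value (precision)
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)

    # Matthews correlation coefficient
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not data from the study)
metrics = confusion_metrics(tp=8, fp=20, tn=80, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a rare positive class, as in this illustrative example, NPV stays high and PPV stays low even when sensitivity and specificity are both moderate, which is the pattern visible across the columns of Table 2.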