Table 3.
Admission type | Model | Accuracy | Balanced accuracy | Sensitivity/Recall | Specificity | PPV/Precision | NPV | F1-score | AUROC | AUPRC | MAE (%) |
---|---|---|---|---|---|---|---|---|---|---|---|
Test data: 01/02/2019 to 31/01/2020 | |||||||||||
Elective | XGB (200 features) | 0.823 | 0.767 | 0.673 | 0.861 | 0.555 | 0.911 | 0.609 | 0.871 | 0.658 | 8.9 |
Elective | LR (baseline) | 0.464 | 0.596 | 0.820 | 0.372 | 0.252 | 0.889 | 0.385 | 0.629 | 0.269 | 10.7 |
Elective | LR (200 features) | 0.696 | 0.740 | 0.815 | 0.666 | 0.386 | 0.933 | 0.524 | 0.821 | 0.538 | 10.6 |
Emergency | XGB (200 features) | 0.844 | 0.756 | 0.616 | 0.896 | 0.571 | 0.912 | 0.593 | 0.860 | 0.644 | 4.9 |
Emergency | LR (baseline) | 0.637 | 0.654 | 0.682 | 0.626 | 0.292 | 0.897 | 0.409 | 0.708 | 0.349 | 5.8 |
Emergency | LR (200 features) | 0.718 | 0.738 | 0.769 | 0.707 | 0.372 | 0.931 | 0.501 | 0.813 | 0.507 | 5.6 |
Overall | XGB (200 features) | 0.837 | 0.752 | 0.615 | 0.888 | 0.561 | 0.909 | 0.587 | 0.859 | 0.634 | 4.6 |
Overall | LR (baseline) | 0.589 | 0.642 | 0.726 | 0.558 | 0.276 | 0.898 | 0.400 | 0.694 | 0.327 | 5.4 |
Overall | LR (200 features) | 0.700 | 0.733 | 0.787 | 0.680 | 0.363 | 0.932 | 0.497 | 0.809 | 0.497 | 5.0 |
Test data: 01/02/2021 to 31/01/2022 | |||||||||||
Elective | XGB (200 features) | 0.825 | 0.753 | 0.638 | 0.869 | 0.532 | 0.911 | 0.580 | 0.864 | 0.614 | 11.6 |
Emergency | XGB (200 features) | 0.835 | 0.703 | 0.501 | 0.906 | 0.528 | 0.896 | 0.514 | 0.820 | 0.543 | 10.0 |
The baseline LR models included only age, sex, day of the week, and hours since admission. PPV: Positive predictive value, NPV: Negative predictive value, AUROC: Area under the receiver operating characteristic curve, AUPRC: Area under the precision-recall curve, MAE: Normalised mean absolute error (mean absolute difference between predicted and actual discharges per day, divided by the mean number of discharges per day).
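
As an illustration of how the threshold-based columns and the normalised MAE relate to the underlying counts, the following is a minimal sketch rather than the authors' code. The function names and the example numbers are hypothetical. It assumes a binary confusion matrix (true/false positives and negatives) and assumes the MAE is computed from daily totals of predicted and actual discharges, which is one reading of the definition above.

```python
from statistics import mean


def classification_metrics(tp, fp, tn, fn):
    """Threshold-based metrics from confusion-matrix counts (hypothetical helper)."""
    sensitivity = tp / (tp + fn)   # recall: share of actual discharges predicted
    specificity = tn / (tn + fp)   # share of non-discharges correctly predicted
    ppv = tp / (tp + fp)           # precision
    npv = tn / (tn + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


def normalised_mae(predicted_per_day, actual_per_day):
    """Mean absolute daily error divided by mean daily discharges, in percent.

    Assumption: both inputs are daily totals aligned by date.
    """
    abs_errors = [abs(p - a) for p, a in zip(predicted_per_day, actual_per_day)]
    return 100 * mean(abs_errors) / mean(actual_per_day)


# Made-up counts, not taken from the study data.
metrics = classification_metrics(tp=60, fp=40, tn=360, fn=40)
mae_pct = normalised_mae([10, 12, 8], [11, 10, 9])
```

Because balanced accuracy is the mean of sensitivity and specificity, the table rows can be spot-checked by hand: for the elective XGB model, (0.673 + 0.861) / 2 = 0.767, matching the reported value.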