2024 Nov 18;4:236. doi: 10.1038/s43856-024-00673-x

Table 3.

Model performance of extreme gradient boosting (XGB) models with 200 features, baseline logistic regression (LR) model, and LR model with 200 features predicting 24-hour discharge in the test dataset (01 February 2019 to 31 January 2020) and an additional test dataset post-COVID (01 February 2021 to 31 January 2022)

Test data: 01/02/2019 to 31/01/2020

| Admission type | Model | Accuracy | Balanced accuracy | Sensitivity/Recall | Specificity | PPV/Precision | NPV | F1-score | AUROC | AUPRC | MAE (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Elective | XGB (200 features) | 0.823 | 0.767 | 0.673 | 0.861 | 0.555 | 0.911 | 0.609 | 0.871 | 0.658 | 8.9 |
| Elective | LR (baseline) | 0.464 | 0.596 | 0.820 | 0.372 | 0.252 | 0.889 | 0.385 | 0.629 | 0.269 | 10.7 |
| Elective | LR (200 features) | 0.696 | 0.740 | 0.815 | 0.666 | 0.386 | 0.933 | 0.524 | 0.821 | 0.538 | 10.6 |
| Emergency | XGB (200 features) | 0.844 | 0.756 | 0.616 | 0.896 | 0.571 | 0.912 | 0.593 | 0.860 | 0.644 | 4.9 |
| Emergency | LR (baseline) | 0.637 | 0.654 | 0.682 | 0.626 | 0.292 | 0.897 | 0.409 | 0.708 | 0.349 | 5.8 |
| Emergency | LR (200 features) | 0.718 | 0.738 | 0.769 | 0.707 | 0.372 | 0.931 | 0.501 | 0.813 | 0.507 | 5.6 |
| Overall | XGB (200 features) | 0.837 | 0.752 | 0.615 | 0.888 | 0.561 | 0.909 | 0.587 | 0.859 | 0.634 | 4.6 |
| Overall | LR (baseline) | 0.589 | 0.642 | 0.726 | 0.558 | 0.276 | 0.898 | 0.400 | 0.694 | 0.327 | 5.4 |
| Overall | LR (200 features) | 0.700 | 0.733 | 0.787 | 0.680 | 0.363 | 0.932 | 0.497 | 0.809 | 0.497 | 5.0 |

Test data: 01/02/2021 to 31/01/2022

| Admission type | Model | Accuracy | Balanced accuracy | Sensitivity/Recall | Specificity | PPV/Precision | NPV | F1-score | AUROC | AUPRC | MAE (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Elective | XGB (200 features) | 0.825 | 0.753 | 0.638 | 0.869 | 0.532 | 0.911 | 0.580 | 0.864 | 0.614 | 11.6 |
| Emergency | XGB (200 features) | 0.835 | 0.703 | 0.501 | 0.906 | 0.528 | 0.896 | 0.514 | 0.820 | 0.543 | 10.0 |

The baseline LR models included only age, sex, day of the week, and hours since admission. PPV: positive predictive value; NPV: negative predictive value; AUROC: area under the receiver operating characteristic curve; AUPRC: area under the precision-recall curve; MAE: normalised mean absolute error (mean absolute difference between predicted and actual discharges per day, divided by the mean number of discharges per day).
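As a rough illustration of how the derived columns relate to each other, the following Python sketch computes balanced accuracy, F1-score, and the normalised MAE from their standard definitions. The function names are hypothetical, the daily discharge counts are made-up toy values, and the MAE function assumes the absolute difference is taken per day between the predicted and actual discharge totals; the paper's own implementation may differ.

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


def f1_score(precision, recall):
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def normalised_mae_percent(predicted_daily, actual_daily):
    """Mean absolute daily error divided by mean actual daily discharges, in %.

    Assumption: both inputs are per-day discharge totals of equal length.
    """
    if len(predicted_daily) != len(actual_daily) or not actual_daily:
        raise ValueError("inputs must be non-empty and of equal length")
    n_days = len(actual_daily)
    mean_abs_diff = sum(
        abs(p - a) for p, a in zip(predicted_daily, actual_daily)
    ) / n_days
    mean_actual = sum(actual_daily) / n_days
    if mean_actual == 0:
        raise ValueError("mean number of actual discharges must be non-zero")
    return 100 * mean_abs_diff / mean_actual


# Elective XGB row, 2019-2020 test data (values taken from the table).
elective_xgb_bal_acc = balanced_accuracy(0.673, 0.861)
elective_xgb_f1 = f1_score(0.555, 0.673)

# Made-up daily totals, for illustration only (not from the study).
toy_mae = normalised_mae_percent([10, 12, 8], [12, 10, 8])
```

Applied to the Elective XGB row, the sketch gives a balanced accuracy of 0.767 and an F1-score of about 0.608, consistent with the reported 0.767 and 0.609 given that the inputs are themselves rounded to three decimals.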