Table 5.
Performance metrics, including confidence intervals for the area under the receiver operating characteristic curve (ROC-AUC), for all long-term hospitalization models. Models were trained on the Danish trauma dataset (DTD), the American Trauma Quality Improvement Program dataset (TQIPD), or a mixed training dataset consisting of a random forest-selected subset of TQIPD and DTD training data (mixed). The neural networks with transfer learning were retrained on DTD.
| Model | Training data set | ROC-AUC | Lower CI ROC-AUC | Upper CI ROC-AUC | Precision | Recall | F1 score |
|---|---|---|---|---|---|---|---|
| Random forest | DTD | 0.885 | 0.849 | 0.921 | 0.817 | 0.958 | 0.882 |
| | TQIPD | 0.860 | 0.820 | 0.900 | 0.840 | 0.930 | 0.883 |
| | Mixed | 0.884 | 0.848 | 0.920 | 0.858 | 0.898 | 0.877 |
| AdaBoost | DTD | 0.890 | 0.855 | 0.925 | 0.863 | 0.935 | 0.897 |
| | TQIPD | 0.885 | 0.849 | 0.921 | 0.834 | 0.935 | 0.822 |
| | Mixed | 0.884 | 0.848 | 0.920 | 0.858 | 0.898 | 0.877 |
| XGBoost | DTD | 0.865 | 0.825 | 0.904 | 0.820 | 0.953 | 0.882 |
| | TQIPD | 0.866 | 0.826 | 0.905 | 0.801 | 0.935 | 0.863 |
| | Mixed | 0.865 | 0.826 | 0.905 | 0.855 | 0.930 | 0.891 |
| Explainable boosting machine | DTD | 0.883 | 0.847 | 0.919 | 0.885 | 0.898 | 0.891 |
| | TQIPD | 0.880 | 0.843 | 0.917 | 0.869 | 0.893 | 0.881 |
| | Mixed | 0.877 | 0.839 | 0.914 | 0.844 | 0.907 | 0.874 |
| Neural network | DTD | 0.857 | 0.817 | 0.898 | 0.853 | 0.916 | 0.883 |
| | TQIPD | 0.888 | 0.852 | 0.923 | 0.810 | 0.953 | 0.876 |
| | Mixed | 0.875 | 0.837 | 0.913 | 0.831 | 0.935 | 0.88 |
| Neural network (Transfer learning) | TQIPD/DTD | 0.879 | 0.842 | 0.916 | 0.813 | 0.949 | 0.876 |
| | Mixed/DTD | 0.874 | 0.836 | 0.912 | 0.830 | 0.930 | 0.877 |