2022 Dec 8;12:21247. doi: 10.1038/s41598-022-25472-z

Table 2.

Hold-out validation performance of all models in all binary classification tasks (value ± 95% CI).

| Model | AUROC | AUPRC | Accuracy | F-1 | Precision | Recall |
|---|---|---|---|---|---|---|
| In-ICU mortality | | | | | | |
| LR | 85.1 ± 3.2 | 39.5 ± 7.2 | 93.4 ± 0.6 | 30.1 ± 7.6 | 55.0 ± 11.6 | 20.7 ± 6.1 |
| RF | 89.1 ± 2.2 | 45.9 ± 7.3 | 93.5 ± 0.3 | 14.2 ± 6.5 | 81.8 ± 19.2 | 7.8 ± 3.9 |
| GRU-D | **89.4 ± 2.3** | **50.8 ± 6.8** | 94.0 ± 0.6 | 38.9 ± 8.1 | 66.2 ± 10.3 | 27.6 ± 6.5 |
| TCN | 89.2 ± 2.5 | **50.8 ± 7.0** | **94.3 ± 0.6** | **46.6 ± 7.3** | 64.5 ± 8.7 | 36.5 ± 7.1 |
| In-hospital mortality | | | | | | |
| LR | 83.6 ± 2.6 | 44.7 ± 5.7 | 91.0 ± 0.7 | 35.7 ± 6.0 | 61.4 ± 9.3 | 25.2 ± 5.3 |
| RF | 86.4 ± 2.3 | 49.3 ± 5.9 | 90.7 ± 0.4 | 14.5 ± 5.8 | 85.1 ± 14.0 | 7.9 ± 3.4 |
| GRU-D | 87.3 ± 2.3 | 52.1 ± 5.6 | **91.6 ± 0.8** | 44.2 ± 6.0 | 65.4 ± 7.5 | 33.4 ± 5.8 |
| TCN | **87.7 ± 2.1** | **53.0 ± 6.0** | 91.2 ± 0.9 | **47.2 ± 6.0** | 58.7 ± 6.7 | 39.5 ± 6.2 |
| Length of stay (LOS > 3) | | | | | | |
| LR | 69.0 ± 2.1 | 61.7 ± 2.8 | 65.5 ± 1.8 | 53.5 ± 2.7 | 63.6 ± 2.8 | 46.2 ± 2.9 |
| RF | 71.4 ± 2.0 | 65.5 ± 2.8 | 67.3 ± 1.7 | 55.3 ± 2.7 | 67.1 ± 2.8 | 47.0 ± 3.0 |
| GRU-D | **72.2 ± 2.0** | **65.7 ± 2.7** | **68.1 ± 1.7** | **59.4 ± 2.5** | 65.6 ± 2.6 | 54.2 ± 3.0 |
| TCN | 71.6 ± 2.2 | 65.0 ± 2.7 | 67.0 ± 1.7 | 55.6 ± 2.7 | 66.0 ± 2.8 | 48.0 ± 2.9 |
| Length of stay (LOS > 7) | | | | | | |
| LR | 66.8 ± 4.2 | 15.9 ± 3.3 | 91.7 ± 0.3 | 2.3 ± 2.8 | 15.2 ± 17.7 | 1.3 ± 1.6 |
| RF | **75.3 ± 3.5** | 22.0 ± 4.5 | **92.1 ± 0.0** | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 |
| GRU-D | 74.4 ± 3.8 | **22.4 ± 4.5** | 92.0 ± 0.4 | **9.8 ± 5.3** | 44.9 ± 20.4 | 5.5 ± 3.2 |
| TCN | 73.5 ± 3.6 | 18.8 ± 3.5 | 91.8 ± 0.3 | 3.7 ± 3.5 | 25.0 ± 21.9 | 2.0 ± 1.9 |

All values are shown in %. Primary evaluation metrics: AUROC, AUPRC, accuracy, F-1. Secondary evaluation metrics: precision, recall. TCN, temporal convolution network; GRU-D, gated recurrent unit with decay; RF, random forest; LR, logistic regression; AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve.
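
The threshold-based metrics in the table are related by standard definitions: F-1 is the harmonic mean of precision and recall. The sketch below is an illustration, not the authors' evaluation code. It recomputes F-1 from the reported precision and recall of the in-ICU mortality rows, so small differences from the reported F-1 are expected because the inputs are already rounded. The function names `f1_from_precision_recall` and `metrics_from_counts` are hypothetical helpers introduced here.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-1 (as fractions) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from_precision_recall(precision, recall),
    }


# (precision %, recall %, reported F-1 %) copied from the in-ICU mortality rows of Table 2.
in_icu_rows = {
    "LR": (55.0, 20.7, 30.1),
    "RF": (81.8, 7.8, 14.2),
    "GRU-D": (66.2, 27.6, 38.9),
    "TCN": (64.5, 36.5, 46.6),
}

for model, (p, r, reported) in in_icu_rows.items():
    recomputed = f1_from_precision_recall(p, r)
    print(f"{model}: recomputed F-1 = {recomputed:.1f}, reported = {reported}")
```

With made-up counts, `metrics_from_counts(tp=0, fp=0, fn=8, tn=92)` shows how a model that never predicts the positive class can still reach high accuracy while precision, recall, and F-1 are all zero, which is the pattern in the RF row for LOS > 7.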

Best-in-task values for primary evaluation metrics are in bold.
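
The caption reports each value as value ± 95% CI, but this section does not state how the intervals were obtained. One common way to get a symmetric interval on a hold-out set is a bootstrap with a normal approximation. The sketch below shows that approach as an assumption, not as the authors' method. The labels, the predictions, and the helper names `accuracy` and `bootstrap_interval` are invented for illustration.

```python
import random
import statistics


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_interval(y_true, y_pred, metric, n_resamples=1000, seed=0):
    """Return (point estimate, half-width) of an approximate 95% interval.

    Assumption: resample the hold-out set with replacement and use
    1.96 times the standard deviation of the resampled metric values.
    """
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    point = metric(y_true, y_pred)
    half_width = 1.96 * statistics.stdev(scores)
    return point, half_width


# Invented toy hold-out set: 90 negatives and 10 positives.
y_true = [0] * 90 + [1] * 10
# Invented predictions: 2 false positives, 4 true positives, 6 false negatives.
y_pred = [0] * 88 + [1] * 2 + [1] * 4 + [0] * 6

point, half_width = bootstrap_interval(y_true, y_pred, accuracy)
print(f"accuracy = {100 * point:.1f} ± {100 * half_width:.1f} %")
```

The same function accepts any metric with the signature `metric(y_true, y_pred)`, so it could wrap an F-1 or AUROC routine. Metrics that need both classes in every resample would need an extra guard.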