2022 Dec 8;12:21247. doi: 10.1038/s41598-022-25472-z

Table 2.

Hold-out validation performance of all models in all binary classification tasks (value ± 95% CI).

Model	AUROC	AUPRC	Accuracy	F-1	Precision	Recall
In-ICU mortality
LR	85.1 ± 3.2	39.5 ± 7.2	93.4 ± 0.6	30.1 ± 7.6	55.0 ± 11.6	20.7 ± 6.1
RF	89.1 ± 2.2	45.9 ± 7.3	93.5 ± 0.3	14.2 ± 6.5	81.8 ± 19.2	7.8 ± 3.9
GRU-D	89.4 ± 2.3	50.8 ± 6.8	94.0 ± 0.6	38.9 ± 8.1	66.2 ± 10.3	27.6 ± 6.5
TCN	89.2 ± 2.5	50.8 ± 7.0	94.3 ± 0.6	46.6 ± 7.3	64.5 ± 8.7	36.5 ± 7.1
In-hospital mortality
LR	83.6 ± 2.6	44.7 ± 5.7	91.0 ± 0.7	35.7 ± 6.0	61.4 ± 9.3	25.2 ± 5.3
RF	86.4 ± 2.3	49.3 ± 5.9	90.7 ± 0.4	14.5 ± 5.8	85.1 ± 14.0	7.9 ± 3.4
GRU-D	87.3 ± 2.3	52.1 ± 5.6	91.6 ± 0.8	44.2 ± 6.0	65.4 ± 7.5	33.4 ± 5.8
TCN	87.7 ± 2.1	53.0 ± 6.0	91.2 ± 0.9	47.2 ± 6.0	58.7 ± 6.7	39.5 ± 6.2
Length of stay (LOS > 3)
LR	69.0 ± 2.1	61.7 ± 2.8	65.5 ± 1.8	53.5 ± 2.7	63.6 ± 2.8	46.2 ± 2.9
RF	71.4 ± 2.0	65.5 ± 2.8	67.3 ± 1.7	55.3 ± 2.7	67.1 ± 2.8	47.0 ± 3.0
GRU-D	72.2 ± 2.0	65.7 ± 2.7	68.1 ± 1.7	59.4 ± 2.5	65.6 ± 2.6	54.2 ± 3.0
TCN	71.6 ± 2.2	65.0 ± 2.7	67.0 ± 1.7	55.6 ± 2.7	66.0 ± 2.8	48.0 ± 2.9
Length of stay (LOS > 7)
LR	66.8 ± 4.2	15.9 ± 3.3	91.7 ± 0.3	2.3 ± 2.8	15.2 ± 17.7	1.3 ± 1.6
RF	75.3 ± 3.5	22.0 ± 4.5	92.1 ± 0.0	0.0 ± 0.0	0.0 ± 0.0	0.0 ± 0.0
GRU-D	74.4 ± 3.8	22.4 ± 4.5	92.0 ± 0.4	9.8 ± 5.3	44.9 ± 20.4	5.5 ± 3.2
TCN	73.5 ± 3.6	18.8 ± 3.5	91.8 ± 0.3	3.7 ± 3.5	25.0 ± 21.9	2.0 ± 1.9

All values are shown in %. Primary evaluation metrics: AUROC, AUPRC, Accuracy, F-1. Secondary evaluation metrics: Precision, Recall. TCN, temporal convolutional network; GRU-D, gated recurrent unit with decay; RF, random forest; LR, logistic regression; AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve.

Best-in-task values for primary evaluation metrics are in bold.
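
The F-1 column can be related to the Precision and Recall columns through the standard harmonic-mean definition, F-1 = 2PR / (P + R). The sketch below is an illustration only, not the authors' evaluation code: it recomputes F-1 from the rounded precision and recall values reported for the in-ICU mortality rows. The function name f1_from_precision_recall and the rows dictionary are hypothetical, and because the inputs are already rounded to one decimal place, the recomputed values may differ from the reported ones by about 0.1.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs).

    Returns 0.0 when both inputs are zero, as in the RF row for LOS > 7,
    where the model predicts no positive cases.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F-1 %) copied from the
# in-ICU mortality rows of Table 2; values are rounded point estimates.
rows = {
    "LR": (55.0, 20.7, 30.1),
    "RF": (81.8, 7.8, 14.2),
    "GRU-D": (66.2, 27.6, 38.9),
    "TCN": (64.5, 36.5, 46.6),
}

for model, (precision, recall, reported) in rows.items():
    computed = f1_from_precision_recall(precision, recall)
    print(f"{model}: computed F-1 = {computed:.2f}, reported = {reported}")
```

The same relation explains why RF shows the highest precision but the lowest F-1 in both mortality tasks: with recall below 8%, the harmonic mean is pulled toward the smaller of the two values.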