Table 2.
LOS bin | Model | AUC | F1 | P-value |
---|---|---|---|---|
Classic models | ||||
1–3 d | MLP w/ x48 | 0.791 (0.0043) | 0.746 (0.0072) | .0034 |
3–5 d | MLP w/ w48 | 0.653 (0.018) | 0.444 (0.029) | .081 |
5–8 d | LR w/ w48 | 0.705 (0.006) | 0.298 (0.007) | .121 |
8–14 d | LR w/ w48 | 0.840 (0.0079) | 0.372 (0.014) | .029 |
14–21 d | LR w/ x48 | 0.887 (0.019) | 0.264 (0.015) | .033 |
21–30 d | LR w/ x48 | 0.917 (0.011) | 0.182 (0.01) | .0016 |
30+ d | LR w/ w48 | 0.934 (0.011) | 0.173 (0.0041) | .0028 |
Micro | LR w/ w48 | 0.747 (0.0025) | 0.419 (0.0018) | .051 |
Sequential models | ||||
1–3 d | CNN-LSTM w/ x19 | 0.758 (0.0055) | 0.615 (0.015) | .013 |
3–5 d | CNN-LSTM w/ x19 | 0.645 (0.0047) | 0.139 (0.031) | .092 |
5–8 d | CNN-LSTM w/ x19 | 0.736 (0.0029) | 0.103 (0.012) | .088 |
8–14 d | CNN-LSTM w/ x19 | 0.838 (0.0055) | 0.181 (0.037) | .055 |
14–21 d | CNN-LSTM w/ x19 | 0.877 (0.009) | 0.112 (0.025) | .0046 |
21–30 d | LSTM w/ x19+h2v | 0.879 (0.025) | 0.135 (0.032) | .011 |
30+ d | LSTM w/ x19+h2v | 0.889 (0.027) | 0.165 (0.07) | .005 |
Micro | CNN-LSTM w/ x19 | 0.846 (0.001) | 0.368 (0.010) | .00014 |
Note: Each performance metric is evaluated across 5 stratified shuffle splits; the mean is reported with the standard deviation in parentheses. The P-value compares the AUC of a given model with that of the baseline, a random forest classifier using diagnostic histories. More extensive pairwise t-tests are shown in Supplementary Table S8.
Abbreviations: LOS: length of stay; AUC: area under the receiver operating characteristic curve; F1: F1 score; CNN: Convolutional Neural Network; MLP: Multi-Layer Perceptron; LR: Logistic Regression; LSTM: Long Short-Term Memory.
Bold values indicate best performance.
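
The note above describes each cell as a mean (standard deviation) over 5 splits and each P-value as a comparison of a model's AUC with the baseline's. The following is a minimal sketch of how such a summary and comparison could be computed from per-split AUCs. It rests on several assumptions not stated in the table: a paired t-test over splits, the sample standard deviation, and a two-sided P-value. The function names and per-split numbers are hypothetical placeholders, not values from the study.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided P-value, integrating the density on [0, |t|] with Simpson's rule."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # probability of falling in [-|t|, |t|]
    return max(0.0, 1 - central_mass)


def paired_t_test(model_scores, baseline_scores):
    """Paired t-test on per-split scores (assumed test; the table does not specify)."""
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all per-split differences are identical")
    t_stat = mean(diffs) / (sd / math.sqrt(n))
    return t_stat, two_sided_p(t_stat, n - 1)


def summarize(scores):
    """Format scores the way the table does: mean (standard deviation)."""
    return f"{mean(scores):.3f} ({stdev(scores):.2g})"


# Hypothetical per-split AUCs for one model and the baseline (illustrative only).
model_auc = [0.790, 0.795, 0.786, 0.797, 0.788]
baseline_auc = [0.775, 0.781, 0.779, 0.780, 0.776]

t_stat, p_value = paired_t_test(model_auc, baseline_auc)
print("model AUC:", summarize(model_auc))
print("baseline AUC:", summarize(baseline_auc))
print(f"t = {t_stat:.2f}, two-sided P = {p_value:.4f}")
```

With only 5 splits the test has 4 degrees of freedom, so a small standard deviation across splits can produce a small P-value even when the difference in mean AUC is modest. An unpaired or Welch test would be a drop-in alternative if the splits were not shared between models.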