Table 2.
Performance metrics (95% CI) | SOFA Score, Derivation Cohort (UKY) | SOFA Score, Validation Cohort (UTSW) | APACHE II Score, Derivation Cohort (UKY) | APACHE II Score, Validation Cohort (UTSW) | Clinical model *, Derivation Cohort (UKY) | Clinical model *, Validation Cohort (UTSW) |
---|---|---|---|---|---|---|
AUC | 0.71 (0.71–0.71) | 0.71 (0.71–0.71) | 0.69 (0.68–0.69) | 0.67 (0.67–0.67) | 0.79 (0.79–0.80) | 0.74 (0.73–0.74) |
Difference in AUC (vs. SOFA) | - | - | - | - | 0.08 | 0.03 |
- P value | - | - | - | - | <0.001 | <0.001 |
Difference in AUC (vs. APACHE II) | - | - | - | - | 0.10 | 0.07 |
- P value | - | - | - | - | <0.001 | <0.001 |
Accuracy | 0.65 (0.64–0.65) | 0.73 (0.73–0.73) | 0.65 (0.64–0.65) | 0.52 (0.50–0.54) | 0.71 (0.71–0.71) | 0.65 (0.64–0.66) |
Precision | 0.34 (0.34–0.35) | 0.19 (0.19–0.19) | 0.34 (0.33–0.34) | 0.14 (0.13–0.14) | 0.41 (0.40–0.42) | 0.18 (0.17–0.18) |
Sensitivity | 0.68 (0.67–0.68) | 0.55 (0.55–0.55) | 0.63 (0.62–0.64) | 0.73 (0.70–0.75) | 0.72 (0.72–0.73) | 0.69 (0.67–0.71) |
Specificity | 0.64 (0.63–0.64) | 0.75 (0.75–0.75) | 0.65 (0.65–0.65) | 0.50 (0.47–0.52) | 0.71 (0.70–0.71) | 0.64 (0.63–0.65) |
F1 | 0.46 (0.45–0.46) | 0.29 (0.29–0.29) | 0.44 (0.43–0.45) | 0.23 (0.23–0.23) | 0.52 (0.52–0.53) | 0.28 (0.27–0.29) |
PPV | 0.34 (0.34–0.35) | 0.19 (0.19–0.19) | 0.34 (0.33–0.34) | 0.14 (0.13–0.14) | 0.41 (0.40–0.42) | 0.18 (0.17–0.18) |
NPV | 0.88 (0.87–0.88) | 0.94 (0.94–0.94) | 0.86 (0.86–0.87) | 0.94 (0.94–0.94) | 0.90 (0.90–0.90) | 0.95 (0.95–0.95) |
Calibration Intercept | −1.27 (−1.29 to −1.25) | −2.03 (−2.03 to −2.02) | −1.28 (−1.29 to −1.26) | −2.47 (−2.47 to −2.46) | −1.25 (−1.27 to −1.24) | −2.23 (−2.26 to −2.20) |
Calibration Slope | 1.02 (1.00–1.04) | 1.12 (1.08–1.16) | 0.99 (0.95–1.03) | 0.75 (0.73–0.78) | 1.13 (1.11–1.14) | 1.17 (1.12–1.21) |
NRI % (vs. SOFA) | | | | | | |
- Categorical | - | - | - | - | 0.12 [0.09 to 0.15] | 0.05 [−0.03 to 0.13] |
- P value | - | - | - | - | < 0.001 | 0.20 |
NRI % (vs. APACHE II) | | | | | | |
- Categorical | - | - | - | - | 0.15 [0.12 to 0.18] | 0.12 [0.04 to 0.20] |
- P value | - | - | - | - | < 0.001 | < 0.001 |
* The proposed clinical model included 15 features; refer to the Methods section for details. Performance of SOFA and APACHE II was evaluated using logistic regression, and performance of the proposed clinical model was evaluated using random forest.
Abbreviations: APACHE (acute physiologic assessment and chronic health evaluation), AUC (area under the curve), CI (confidence interval), F1 (F1 score), NPV (negative predictive value), NRI (net reclassification index), PPV (positive predictive value), SOFA (sequential organ failure assessment), UKY (University of Kentucky), UTSW (University of Texas Southwestern)
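
As a reading aid, the F1 rows in Table 2 can be related to the Precision (PPV) and Sensitivity rows, since F1 is the harmonic mean of the two. The sketch below is not the authors' evaluation code; the helper name `f1_score` is illustrative, and the inputs are the rounded point estimates for the clinical model copied from the table. Because the inputs are already rounded, the recomputed values should be expected to agree with the reported F1 only to within about 0.01.

```python
def f1_score(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision (PPV) and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# Rounded point estimates from Table 2, clinical model columns.
derivation_f1 = f1_score(precision=0.41, sensitivity=0.72)  # reported F1: 0.52
validation_f1 = f1_score(precision=0.18, sensitivity=0.69)  # reported F1: 0.28

print(f"Derivation cohort (UKY): {derivation_f1:.3f}")
print(f"Validation cohort (UTSW): {validation_f1:.3f}")
```

The drop in F1 between cohorts follows mainly from the lower precision in the validation cohort, while sensitivity stays broadly similar.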