Table 2.
Reference: model performance metrics in the development and internal validation cohort

Metric | 90-day mortality | Reference | 2-year mortality | Reference |
---|---|---|---|---|
c-statistic a | 0.67 (0.62, 0.71) | 0.74 (0.67, 0.80) | 0.67 (0.65, 0.70) | 0.70 (0.63, 0.75) |
Intercept b | 0.18 (0.02, 0.35) | −0.05 (−0.37, 0.26) | 0.50 (0.40, 0.61) | −0.03 (−0.27, 0.19) |
Slope b | 0.92 (0.67, 1.17) | 1.11 (0.73, 1.51) | 0.90 (0.74, 1.04) | 0.89 (0.62, 1.19) |
Brier c | 0.071 (0.062, 0.081) | 0.078 (0.061, 0.098) | 0.19 (0.18, 0.20) | 0.16 (0.15, 0.18) |
Null-model Brier score in the Israeli cohort: 90-day, 0.073; 2-year, 0.20
a A c-statistic of 0.5 indicates random guessing and 1.0 indicates perfect discriminatory ability; a c-statistic of 0.6 to 0.7 is typically considered acceptable discriminatory ability
b A calibration plot compares the predicted with the observed probabilities; a perfect calibration plot has an intercept of 0 (< 0 reflects overestimation and > 0 reflects underestimation of the probability of the outcome) and a slope of 1 (the model performs similarly in the training and test sets); a slope < 1 (common with small datasets) reflects model overfitting, meaning the predicted probabilities are too extreme (low probabilities too low, high probabilities too high)
c The Brier score of the prediction model should be compared with that of the null model; the null-model Brier score is calculated from the observed probability of the outcome (here, mortality) in the dataset and serves as the benchmark for the algorithm's Brier score; a Brier score lower than the null-model score indicates better overall model performance
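The definitions in footnotes a and c can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names and the toy outcome and probability vectors are hypothetical, chosen only to show how a c-statistic, a Brier score, and a null-model Brier score could be computed from binary outcomes and predicted probabilities. The null model is assumed here to predict the observed outcome prevalence for every patient.

```python
from itertools import product


def c_statistic(y_true, y_prob):
    """Share of (event, non-event) pairs in which the event has the
    higher predicted probability; tied pairs count as 0.5."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event, p_non_event in product(events, non_events):
        if p_event > p_non_event:
            concordant += 1.0
        elif p_event == p_non_event:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def null_brier_score(y_true):
    """Brier score of a null model that assigns every patient the
    observed outcome prevalence (assumption stated in the lead-in)."""
    prevalence = sum(y_true) / len(y_true)
    return brier_score(y_true, [prevalence] * len(y_true))


# Hypothetical toy data: 1 = died within the time window, 0 = survived.
outcomes = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
predicted = [0.1, 0.2, 0.1, 0.6, 0.3, 0.4, 0.2, 0.5, 0.7, 0.1]

print("c-statistic:", round(c_statistic(outcomes, predicted), 3))
print("Brier score:", round(brier_score(outcomes, predicted), 3))
print("Null-model Brier score:", round(null_brier_score(outcomes), 3))
```

For a binary outcome with prevalence p, the null-model Brier score defined this way equals p(1 − p), so it depends only on how common the outcome is. This is why the model's Brier score is interpreted relative to the null-model value rather than on its own.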