2023 Feb 9;49(3):1545–1553. doi: 10.1007/s00068-023-02237-5

Table 2.

Model performance assessment on external validation in the Sheba Medical Center cohort (95% CI), n = 2,033

Reference: Model performance metrics in the development and internal validation cohort
| Metric | 90-day mortality | Reference | 2-year mortality | Reference |
|---|---|---|---|---|
| c-statistic (a) | 0.67 (0.62, 0.71) | 0.74 (0.67, 0.80) | 0.67 (0.65, 0.70) | 0.70 (0.63, 0.75) |
| Intercept (b) | 0.18 (0.02, 0.35) | −0.05 (−0.37, 0.26) | 0.50 (0.40, 0.61) | −0.03 (−0.27, 0.19) |
| Slope (b) | 0.92 (0.67, 1.17) | 1.11 (0.73, 1.51) | 0.90 (0.74, 1.04) | 0.89 (0.62, 1.19) |
| Brier (c) | 0.071 (0.062, 0.081) | 0.078 (0.061, 0.098) | 0.19 (0.18, 0.20) | 0.16 (0.15, 0.18) |

Null-model Brier score in the Israeli cohort: 90-day, 0.073; 2-year, 0.20

(a) A c-statistic of 0.5 indicates random guessing and 1.0 indicates perfect discriminatory ability; a c-statistic of 0.6 to 0.7 is typically considered acceptable discriminatory ability

(b) A calibration plot compares the predicted with the observed probabilities. A perfect calibration plot has an intercept of 0 (an intercept < 0 reflects overestimation and > 0 reflects underestimation of the probability of the outcome) and a slope of 1 (the model performs similarly in the training and test sets). A slope < 1, which is common with small datasets, reflects model overfitting: the predicted probabilities are too extreme (low probabilities too low, high probabilities too high)

(c) The Brier score of the prediction model should be compared with that of the null model. The null-model Brier score is calculated from the overall probability of the outcome in the dataset and is used to benchmark the algorithm's Brier score; a prediction-model Brier score lower than the null-model score indicates good overall model performance
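The metrics described in the footnotes can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, the toy data are invented for illustration and are not taken from the study, and the sketch assumes the intercept is estimated with the slope fixed at 1 (calibration-in-the-large), which the table does not state.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def c_statistic(y, p):
    """Share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0
                for a in pos for b in neg)
    return score / (len(pos) * len(neg))

def brier(y, p):
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def null_brier(y):
    """Brier score when every case is assigned the observed event rate."""
    rate = sum(y) / len(y)
    return brier(y, [rate] * len(y))

def fit_logistic(y, x, fixed_slope=None, iters=50):
    """Newton-Raphson fit of P(y=1) = expit(a + b*x); b may be held fixed."""
    a = 0.0
    b = 1.0 if fixed_slope is None else fixed_slope
    for _ in range(iters):
        mu = [expit(a + b * xi) for xi in x]
        w = [m * (1.0 - m) for m in mu]
        g0 = sum(yi - m for yi, m in zip(y, mu))
        h00 = sum(w)
        if fixed_slope is not None:
            a += g0 / h00  # one-parameter update, slope untouched
            continue
        g1 = sum((yi - m) * xi for yi, m, xi in zip(y, mu, x))
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def calibration(y, p):
    """Return (intercept, slope) on the logit scale of the predictions.

    Assumption: the intercept is fitted with the slope fixed at 1;
    the slope comes from a separate two-parameter fit.
    """
    lp = [logit(pi) for pi in p]
    intercept, _ = fit_logistic(y, lp, fixed_slope=1.0)
    _, slope = fit_logistic(y, lp)
    return intercept, slope

# Invented toy data: 10 cases with 2 events, then 10 cases with 6 events.
y = [1] * 2 + [0] * 8 + [1] * 6 + [0] * 4
p_calibrated = [0.2] * 10 + [0.6] * 10  # predictions match observed rates
p_extreme = [0.1] * 10 + [0.8] * 10     # predictions pushed toward 0 and 1

print("c-statistic:", c_statistic(y, p_calibrated))
print("Brier:", brier(y, p_calibrated), "null:", null_brier(y))
print("calibrated (intercept, slope):", calibration(y, p_calibrated))
print("too extreme (intercept, slope):", calibration(y, p_extreme))
```

On this toy data the well-calibrated predictions give an intercept near 0 and a slope near 1, with a Brier score of 0.20 against a null-model score of 0.24. The too-extreme predictions rank cases identically, so the c-statistic is unchanged, but the slope drops to about 0.5 and the intercept becomes negative, the pattern footnote (b) describes as overfitting and overestimation.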