2023 Feb 9;49(3):1545–1553. doi: 10.1007/s00068-023-02237-5

Table 2.

Model performance assessment on external validation in the Sheba Medical Center cohort (95% CI), n = 2,033

Reference: Model performance metrics in the development and internal validation cohort
Metric	90-day mortality	Reference	2-year mortality	Reference
c-statistic ^a	0.67 (0.62, 0.71)	0.74 (0.67, 0.80)	0.67 (0.65, 0.70)	0.70 (0.63, 0.75)
Intercept ^b	0.18 (0.02, 0.35)	−0.05 (−0.37, 0.26)	0.50 (0.40, 0.61)	−0.03 (−0.27, 0.19)
Slope ^b	0.92 (0.67, 1.17)	1.11 (0.73, 1.51)	0.90 (0.74, 1.04)	0.89 (0.62, 1.19)
Brier ^c	0.071 (0.062, 0.081)	0.078 (0.061, 0.098)	0.19 (0.18, 0.20)	0.16 (0.15, 0.18)

Null-model Brier score in the Israeli cohort: 0.073 for 90-day mortality and 0.20 for 2-year mortality

^aA c-statistic of 0.5 indicates random guessing and 1.0 indicates perfect discriminatory ability; a c-statistic of 0.6 to 0.7 is typically considered acceptable discriminatory ability
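
As an illustration of footnote a, the c-statistic for a binary outcome can be computed as the fraction of (event, non-event) pairs in which the patient with the event received the higher predicted risk, with ties counted as one half. The following is a minimal sketch, not the authors' code; the function name `c_statistic` and the toy data are hypothetical.

```python
from itertools import product


def c_statistic(y_true, y_prob):
    """Concordance for a binary outcome (hypothetical helper).

    Share of (event, non-event) pairs in which the event case has the
    higher predicted probability; tied predictions count as 0.5.
    """
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for p_event, p_non_event in product(events, non_events):
        if p_event > p_non_event:
            concordant += 1.0
        elif p_event == p_non_event:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Toy data (made up): 3 of the 4 event/non-event pairs are ordered correctly.
print(c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A model that assigns every patient the same risk scores 0.5 under this definition, which corresponds to the random-guessing end of the scale described above.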

^bA calibration plot shows the predicted versus the observed probabilities; perfect calibration has an intercept of 0 (< 0 reflects overestimation and > 0 reflects underestimation of the probability of the outcome) and a slope of 1 (the model performs similarly in training and test sets); a slope < 1 (often seen with small datasets) reflects model overfitting, meaning the predicted probabilities are too extreme (low probabilities too low, high probabilities too high)
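
One common way to obtain a calibration intercept and slope is to regress the observed outcome on the logit of the predicted probability with a logistic model. The sketch below assumes that approach and fits both parameters jointly by Newton-Raphson; whether the authors estimated the intercept jointly or with the slope fixed at 1 is not stated here, and the function name `calibration_intercept_slope` and the toy data are hypothetical.

```python
import math


def calibration_intercept_slope(y_true, y_prob, n_iter=50, tol=1e-10):
    """Fit logit(P(y = 1)) = a + b * logit(p_hat) by Newton-Raphson.

    Hypothetical helper: returns (a, b). Predicted probabilities must lie
    strictly between 0 and 1, and the outcomes must not be perfectly
    separated by the predictions.
    """
    x = [math.log(p / (1.0 - p)) for p in y_prob]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu          # score for the intercept
            g1 += (yi - mu) * xi   # score for the slope
            h00 += w               # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            raise ValueError("singular information matrix")
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Toy data (made up): predictions of 0.1 and 0.9, observed rates of 0.2 and 0.8.
y = [1] * 2 + [0] * 8 + [1] * 8 + [0] * 2
p = [0.1] * 10 + [0.9] * 10
intercept, slope = calibration_intercept_slope(y, p)
print(round(intercept, 3), round(slope, 3))
```

In this toy example the predictions are more extreme than the observed rates, so the fitted slope is about 0.63, below 1, which is the pattern the footnote associates with overfitting.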

^cThe Brier score of the prediction model should be compared with that of the null model; the null-model Brier score is calculated from the probability of the outcome (here, mortality) in the dataset and is used to benchmark the algorithm’s Brier score; a Brier score lower than that of the null model indicates good overall model performance
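
To make the comparison in footnote c concrete, the Brier score is the mean squared difference between predicted probability and observed outcome. The sketch below assumes the null model assigns every patient the cohort's observed outcome rate, in which case its Brier score reduces to rate × (1 − rate). The function names and toy data are hypothetical and are not taken from the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def null_brier_score(y_true):
    """Brier score of a null model (assumed here to predict the observed
    outcome rate for every patient)."""
    rate = sum(y_true) / len(y_true)
    return brier_score(y_true, [rate] * len(y_true))


# Toy data (made up): one event among four patients.
outcomes = [0, 0, 0, 1]
predictions = [0.1, 0.2, 0.1, 0.7]
print(brier_score(outcomes, predictions))  # model
print(null_brier_score(outcomes))          # benchmark: 0.25 * 0.75 = 0.1875
```

A model Brier score below the null-model value indicates that the predictions carry information beyond the overall outcome rate.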