Table 1.
Characteristics of some traditional and novel performance measures
Aspect | Measure | Visualization | Characteristics |
---|---|---|---|
Overall performance | R², Brier score | Validation graph | Better with smaller distance between Y and Ŷ; captures both calibration and discrimination aspects |
Discrimination | C statistic | ROC curve | Rank order statistic; Interpretation for a pair of patients with and without the outcome |
Discrimination | Discrimination slope | Box plot | Difference in mean of predictions between outcomes; easy visualization |
Calibration | Calibration-in-the-large | Calibration or validation graph | Compare mean(y) versus mean(ŷ); essential aspect for external validation |
Calibration | Calibration slope | Calibration or validation graph | Regression slope of linear predictor; essential aspect for internal and external validation; related to ‘shrinkage’ of regression coefficients |
Calibration | Hosmer-Lemeshow test | Calibration or validation graph | Compares observed to predicted by decile of predicted probability |
Reclassification | Reclassification table | Cross-table or scatter plot | Compare classifications from 2 models (one with, one without a marker) for changes |
Reclassification | Reclassification calibration | Cross-table or scatter plot | Compare observed and predicted within cross-classified categories |
Reclassification | Net Reclassification Index (NRI) | Cross-table or scatter plot | Compare classifications from 2 models for changes by outcome, for a net calculation of changes in the right direction |
Reclassification | Integrated Discrimination Index (IDI) | Box plots for 2 models (one with, one without a marker) | Integrates the NRI over all possible cut-offs; equivalent to difference in discrimination slopes |
Clinical usefulness | Net Benefit (NB) | Cross-table | Net number of true positives gained by using a model compared to no model at a single threshold (NB) or over a range of thresholds (DCA) |
Clinical usefulness | Decision curve analysis (DCA) | Decision curve | Net Benefit (NB) evaluated over a range of thresholds |
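
Several of the measures in Table 1 can be computed directly from observed outcomes and predicted probabilities. The following is a minimal sketch in plain Python, not the authors' implementation. The function names and the toy data (`y_demo`, `p_demo`) are hypothetical. Calibration-in-the-large is shown here as the simple difference mean(y) − mean(ŷ), and Net Benefit uses the commonly cited definition TP/n − FP/n × pt/(1 − pt), where pt is the threshold probability; that formula is an assumption, since the table does not state it.

```python
def brier_score(y, p):
    """Mean squared distance between outcomes (0/1) and predictions."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def c_statistic(y, p):
    """Fraction of (event, non-event) pairs in which the event has the
    higher prediction; ties count as one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(nonevents))


def discrimination_slope(y, p):
    """Mean prediction among events minus mean prediction among non-events."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    return sum(events) / len(events) - sum(nonevents) / len(nonevents)


def calibration_in_the_large(y, p):
    """Simple version: mean observed outcome minus mean prediction."""
    return sum(y) / len(y) - sum(p) / len(p)


def net_benefit(y, p, threshold):
    """Assumed definition: TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def decision_curve(y, p, thresholds):
    """Net Benefit at each threshold, as (threshold, NB) pairs."""
    return [(t, net_benefit(y, p, t)) for t in thresholds]


# Hypothetical toy data: 8 patients, 4 with the outcome.
y_demo = [0, 0, 0, 1, 0, 1, 1, 1]
p_demo = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]

print("Brier score:", brier_score(y_demo, p_demo))
print("C statistic:", c_statistic(y_demo, p_demo))
print("Discrimination slope:", discrimination_slope(y_demo, p_demo))
print("Calibration-in-the-large:", calibration_in_the_large(y_demo, p_demo))
print("Decision curve:", decision_curve(y_demo, p_demo, [0.2, 0.35, 0.5]))
```

The pairwise loop for the C statistic is quadratic in the number of patients, which is acceptable for illustration but slow for large cohorts; a rank-based calculation would be the usual choice there.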