Table 2.
1. Test for model improvement using a likelihood-based or similar test.
   1a. The IDI may be used as a nonparametric test or measure of effect if the models are well-calibrated.
   1b. The NRI > 0 may be misleading, especially if a new marker is not normally distributed.
2. Assess overall calibration and discrimination of each model.
   2a. Plot observed and expected risk in categories or continuously with a smoother, and compute the calibration intercept and slope.
   2b. Compute the ROC curve and AUC or c-statistic if discrimination across the whole range of risk is of interest.
3. If relevant risk strata are available, compute the risk reclassification table with clinical cut points or the overall prevalence, if relevant.
   3a. Assess improvement in calibration within cross-classified categories.
   3b. Assess improvement in discrimination through the categorical NRI.
4. If relevant, consider bias-corrected conditional NRI to enhance screening of individuals at intermediate risk.
5. If pre-specified risk strata are not available, consider cost tradeoffs to develop appropriate cut points.
6. Consider decision analysis to assess the net benefit of using models for treatment decisions.
   6a. Decision curves can be used to compare treatment strategies across a wide range of thresholds.
   6b. Conduct full cost-effectiveness analysis if appropriate and if estimates are available.
7. Validate all measures or tests of improvement in data not used to fit or select models.
   7a. Internal validation, using bootstrapping, X-fold cross-validation, or (ideally multiple) split samples, is required.
   7b. External validation is preferable, particularly prior to clinical use.
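
As an illustration of steps 3b and 6a, the following is a minimal Python sketch of the categorical NRI and of the net benefit at a single risk threshold, using their standard definitions. The function names (`categorical_nri`, `net_benefit`), the cut points 0.10 and 0.20, and the toy outcomes and risks are all hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the table or from any data set. The sketch assumes binary outcomes with at least one event and one non-event, and it does not include the validation called for in step 7.

```python
import numpy as np


def categorical_nri(y, p_old, p_new, cuts):
    """Categorical NRI for binary outcomes y (1 = event, 0 = non-event).

    p_old and p_new are predicted risks from the old and new model;
    cuts are the risk-category cut points (assumed here, not prescribed).
    """
    y = np.asarray(y)
    # Assign each predicted risk to a category index (0, 1, 2, ...).
    cat_old = np.digitize(np.asarray(p_old), cuts)
    cat_new = np.digitize(np.asarray(p_new), cuts)
    up = cat_new > cat_old      # moved to a higher risk category
    down = cat_new < cat_old    # moved to a lower risk category
    events = y == 1
    nonevents = y == 0
    # Events should move up; non-events should move down.
    nri_events = up[events].mean() - down[events].mean()
    nri_nonevents = down[nonevents].mean() - up[nonevents].mean()
    return nri_events + nri_nonevents


def net_benefit(y, p, threshold):
    """Net benefit of treating everyone with predicted risk >= threshold."""
    y = np.asarray(y)
    treat = np.asarray(p) >= threshold
    n = len(y)
    true_pos = np.sum(treat & (y == 1))
    false_pos = np.sum(treat & (y == 0))
    # False positives are weighted by the odds of the threshold.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


# Toy data, invented purely for illustration.
y = [1, 1, 1, 0, 0, 0, 0, 0]
p_old = [0.08, 0.15, 0.25, 0.04, 0.12, 0.18, 0.06, 0.22]
p_new = [0.12, 0.22, 0.30, 0.03, 0.08, 0.15, 0.04, 0.25]
cuts = [0.10, 0.20]  # hypothetical clinical cut points

print("categorical NRI:", categorical_nri(y, p_old, p_new, cuts))
print("net benefit (old) at 0.20:", net_benefit(y, p_old, 0.20))
print("net benefit (new) at 0.20:", net_benefit(y, p_new, 0.20))
```

Evaluating `net_benefit` over a grid of thresholds for each model, alongside the treat-all and treat-none strategies, gives the decision curve described in step 6a. Both functions would need to be applied to data not used for model fitting, or wrapped in a bootstrap or cross-validation loop, to respect step 7.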