Table 2.

Recommendations

1. Test for model improvement using a likelihood-based or similar test.
   1a. The IDI may be used as a nonparametric test or measure of effect if the models are well calibrated.
   1b. The NRI > 0 may be misleading, especially if a new marker is not normally distributed.
2. Assess overall calibration and discrimination of each model.
   2a. Plot observed and expected risk in categories or continuously with a smoother, and compute the calibration intercept and slope.
   2b. Compute the ROC curve and AUC or c-statistic if discrimination across the whole range of risk is of interest.
3. If relevant risk strata are available, compute the risk reclassification table using clinical cut points or, if relevant, the overall prevalence.
   3a. Assess improvement in calibration within cross-classified categories.
   3b. Assess improvement in discrimination through the categorical NRI.
4. If relevant, consider the bias-corrected conditional NRI to enhance screening of individuals at intermediate risk.
5. If pre-specified risk strata are not available, consider cost tradeoffs to develop appropriate cut points.
6. Consider decision analysis to assess the net benefit of using models for treatment decisions.
   6a. Decision curves can be used to compare treatment strategies across a wide range of thresholds.
   6b. Conduct a full cost-effectiveness analysis if appropriate and if estimates are available.
7. Validate all measures or tests of improvement in data not used to fit or select models.
   7a. Internal validation, using bootstrapping, X-fold cross-validation, or (ideally multiple) split samples, is required.
   7b. External validation is preferable, particularly prior to clinical use.
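
As an illustration of the IDI mentioned in recommendation 1a, the following minimal sketch computes it as the change in mean predicted risk among events minus the change among non-events. The function name `idi` and the toy inputs are assumptions made for this example, not part of the table; the sketch presumes that both models supply a predicted risk for every individual and that outcomes are coded 1 (event) and 0 (non-event).

```python
def idi(old_risks, new_risks, outcomes):
    """Integrated discrimination improvement (illustrative sketch).

    IDI = (mean new risk - mean old risk among events)
        - (mean new risk - mean old risk among non-events).
    """
    def mean(values):
        return sum(values) / len(values)

    # Per-person change in predicted risk, split by outcome status.
    change_events = [n - o for o, n, y in zip(old_risks, new_risks, outcomes) if y == 1]
    change_nonevents = [n - o for o, n, y in zip(old_risks, new_risks, outcomes) if y == 0]
    if not change_events or not change_nonevents:
        raise ValueError("need at least one event and one non-event")
    return mean(change_events) - mean(change_nonevents)


# Hypothetical toy data: two events followed by two non-events.
old = [0.2, 0.4, 0.1, 0.3]
new = [0.4, 0.6, 0.1, 0.1]
y = [1, 1, 0, 0]
print(round(idi(old, new, y), 3))
```

A positive value indicates that the new model raises predicted risk for events relative to non-events; as the table notes, the measure is interpretable only if both models are well calibrated.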
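
For recommendation 2b, the c-statistic can be sketched as the proportion of event/non-event pairs in which the event has the higher predicted risk, with ties counted as one half. The function name `c_statistic` and the toy data are assumptions for illustration; the pairwise loop is quadratic in sample size, so a rank-based implementation would be preferable for large data sets.

```python
def c_statistic(risks, outcomes):
    """Concordance (c) statistic for a binary outcome (illustrative sketch)."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    nonevents = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for risk_event in events:
        for risk_nonevent in nonevents:
            if risk_event > risk_nonevent:
                concordant += 1.0   # pair is correctly ordered
            elif risk_event == risk_nonevent:
                concordant += 0.5   # tied pair counts as one half
    return concordant / (len(events) * len(nonevents))


# Hypothetical toy data: two non-events followed by two events.
risks = [0.1, 0.4, 0.35, 0.8]
y = [0, 0, 1, 1]
print(c_statistic(risks, y))
```

With these four observations, three of the four event/non-event pairs are correctly ordered.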
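
The categorical NRI in recommendation 3b can be sketched from the risk reclassification table as the net proportion of events moving to a higher category plus the net proportion of non-events moving to a lower category. The cut points of 5% and 20% used below are purely illustrative assumptions, as are the function names; in practice the cut points should be the clinically relevant ones referred to in the table.

```python
from bisect import bisect_right


def risk_category(risk, cut_points):
    """Index of the risk stratum; a risk equal to a cut point falls in the higher stratum."""
    return bisect_right(cut_points, risk)


def categorical_nri(old_risks, new_risks, outcomes, cut_points):
    """Return (NRI among events, NRI among non-events, overall NRI)."""
    cuts = sorted(cut_points)
    up = {0: 0, 1: 0}     # moved to a higher category, keyed by outcome
    down = {0: 0, 1: 0}   # moved to a lower category, keyed by outcome
    total = {0: 0, 1: 0}
    for old, new, y in zip(old_risks, new_risks, outcomes):
        total[y] += 1
        old_cat = risk_category(old, cuts)
        new_cat = risk_category(new, cuts)
        if new_cat > old_cat:
            up[y] += 1
        elif new_cat < old_cat:
            down[y] += 1
    if total[0] == 0 or total[1] == 0:
        raise ValueError("need at least one event and one non-event")
    nri_events = (up[1] - down[1]) / total[1]
    nri_nonevents = (down[0] - up[0]) / total[0]
    return nri_events, nri_nonevents, nri_events + nri_nonevents


# Hypothetical toy data: four events followed by two non-events.
old = [0.03, 0.10, 0.10, 0.25, 0.10, 0.03]
new = [0.10, 0.25, 0.10, 0.10, 0.03, 0.03]
y = [1, 1, 1, 1, 0, 0]
print(categorical_nri(old, new, y, cut_points=[0.05, 0.20]))
```

Reporting the event and non-event components separately, rather than only their sum, keeps the direction of reclassification in each group visible.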
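
For recommendation 6a, a decision curve plots net benefit against the risk threshold. A minimal sketch follows, assuming the commonly used definition in which net benefit at threshold p_t equals TP/n minus (FP/n) times p_t / (1 - p_t), and assuming that individuals with predicted risk at or above the threshold are treated. The function names, the toy data, and the threshold grid are illustrative assumptions.

```python
def net_benefit(risks, outcomes, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    true_pos = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    false_pos = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    weight = threshold / (1 - threshold)  # harm of a false positive relative to a true positive
    return true_pos / n - weight * false_pos / n


def decision_curve(risks, outcomes, thresholds):
    """Rows of (threshold, net benefit of model, net benefit of treating all).

    The treat-none strategy has net benefit 0 at every threshold.
    """
    treat_all = [1.0] * len(outcomes)  # a risk of 1 exceeds any threshold
    return [
        (t, net_benefit(risks, outcomes, t), net_benefit(treat_all, outcomes, t))
        for t in thresholds
    ]


# Hypothetical toy data: two non-events followed by two events.
risks = [0.1, 0.4, 0.35, 0.8]
y = [0, 0, 1, 1]
for row in decision_curve(risks, y, thresholds=[0.1, 0.2, 0.3]):
    print(row)
```

Comparing two models then amounts to computing the model column once per model and checking which curve is higher over the range of thresholds that is clinically plausible.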