Table 1.

Summary of performance measures for quantifying added value

Measure	Advantages	Disadvantages
Likelihood-based measures	Reflects probability of obtaining the observed data	Based on assumed model
Likelihood ratio (LR), change in AIC or BIC	The LR test is the uniformly most powerful test for nested models. The AIC and BIC can be used to assess non-nested models.	Although these tests are powerful, a statistically significant association or model improvement may not be of clinical importance.
Discrimination	Assesses separation of cases and non-cases	Only one component of model fit
Difference in ROC curves, AUC, c-statistic	Assesses discrimination between those with and without the outcome of interest across the whole range of a continuous predictor or score. Useful for classification.	Based on ranks only. Does not assess calibration. Differences may not be of clinical importance.
Clinical risk reclassification	Examines differences in assignment to clinically important risk strata	Strata should be pre-defined. Loses information if strata are not clinically important
Reclassification calibration statistic	Assesses calibration within cross-classified risk strata	A test for each model is needed
Categorical NRI	Can assess changes in important risk strata. Cases and non-cases can be considered separately	Depends on the number of categories and cut points used
NRI(p)	Nice statistical properties. Does not vary by event rate in the data	May not be clinically relevant
Conditional NRI	Indicates improvement within clinically important risk subgroups	Biased in its crude form, and a correction based on the full data is needed.
Category-free measures	Does not require cut points	May lose clinical intuition
Brier score	Proper scoring rule	May be difficult to interpret; the maximum value depends on incidence of the outcome.
NRI(0)	Continuous, does not depend on categories	Based on ranks only. Measure of association rather than model improvement. Behavior may be erratic if the new predictor is not normally distributed.
IDI	Nice statistical properties. Related to the difference in model R²	Depends on event rate. Values are low and may be difficult to interpret.
Decision analytics	Estimates clinical impact of using model	Not a direct estimate of model fit or improvement. Need reasonable estimates of decision thresholds
Decision curve	Displays the net benefit across a range of thresholds	Does not compare model improvement directly, but rather the clinical consequences of using the models for treatment decisions
Cost-benefit analysis	Compares costs and benefits of one model or treatment strategy vs. another	Need detailed estimates of costs and benefits of misclassification, including further diagnostic workup and treatments
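
Several of the measures in Table 1 reduce to short calculations on the observed outcomes and the predicted risks from the old and new models. The following is a minimal sketch using the standard textbook definitions, not code from the paper. The function names (`brier_score`, `idi`, `nri`, `net_benefit`) and their arguments are hypothetical and chosen for illustration. Outcomes are assumed to be coded 0/1, with at least one event and one non-event present.

```python
from typing import Optional, Sequence


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def idi(y: Sequence[int], p_old: Sequence[float], p_new: Sequence[float]) -> float:
    """IDI: mean change in predicted risk among events minus that among non-events."""
    ev = [pn - po for yi, po, pn in zip(y, p_old, p_new) if yi == 1]
    ne = [pn - po for yi, po, pn in zip(y, p_old, p_new) if yi == 0]
    return sum(ev) / len(ev) - sum(ne) / len(ne)


def nri(
    y: Sequence[int],
    p_old: Sequence[float],
    p_new: Sequence[float],
    cuts: Optional[Sequence[float]] = None,
) -> float:
    """Net reclassification improvement.

    With `cuts`, risks are first mapped to strata (categorical NRI).
    Without `cuts`, any upward or downward change in risk counts
    (category-free NRI).
    """
    def level(p: float) -> float:
        # Stratum index = number of cut points at or below p (assumed convention).
        return p if cuts is None else sum(p >= c for c in cuts)

    total = 0.0
    # Upward moves are favourable for events (sign +1), downward for non-events (sign -1).
    for outcome, sign in ((1, 1), (0, -1)):
        moves = [
            level(pn) - level(po)
            for yi, po, pn in zip(y, p_old, p_new)
            if yi == outcome
        ]
        up = sum(m > 0 for m in moves)
        down = sum(m < 0 for m in moves)
        total += sign * (up - down) / len(moves)
    return total


def net_benefit(y: Sequence[int], p: Sequence[float], threshold: float) -> float:
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    # False positives are weighted by the odds of the decision threshold.
    return tp / n - (fp / n) * threshold / (1 - threshold)
```

Evaluating `net_benefit` for each model over a grid of thresholds gives the points of a decision curve. Calling `nri` with and without `cuts` on the same data shows how the categorical version depends on the cut points chosen, as the table notes.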