| Metric | Definition | Calculation |
| --- | --- | --- |
| Sensitivity (recall) | The ability of a test to correctly identify those with the condition (true positive rate) | Sensitivity = true positives/(true positives + false negatives) |
| Specificity | The ability of a test to correctly identify those without the condition (true negative rate) | Specificity = true negatives/(true negatives + false positives) |
| Negative predictive value (NPV) | The proportion of negative test results that are true negatives | NPV = true negatives/(true negatives + false negatives) |
| Positive predictive value (PPV) or precision | The proportion of positive test results that are true positives | PPV = true positives/(true positives + false positives) |
| F1 score | The harmonic mean of precision and recall, accounting for both false positives and false negatives | F1 score = 2 × (precision × recall)/(precision + recall) |
| Discrimination | The model's ability to distinguish between different levels of the outcome, often measured by the AUC for binary outcomes | The AUC (area under the curve, i.e. area under the receiver operating characteristic curve) is not derived from a simple formula: the true positive rate is plotted against the false positive rate at various threshold settings and the area under this curve is measured. The AUC can be interpreted as the probability that the model ranks a random positive example higher than a random negative example |
| Calibration | The agreement between observed outcomes and predictions, assessed by the calibration slope (beta) and intercept (alpha) | The calibration slope and intercept are estimated from a logistic regression for categorical outcomes and are often visualised with calibration plots, where the predicted probabilities are plotted on the x-axis and the observed frequencies on the y-axis. For a perfectly calibrated model the points lie on the diagonal line; in the ideal case the calibration slope is 1 and the intercept 0 |
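As a concrete illustration of the threshold-based metrics in the table, the sketch below computes sensitivity, specificity, PPV, NPV and the F1 score directly from the confusion-matrix counts. It is a minimal example using NumPy with a toy data set, not a reference implementation; the function and variable names are illustrative assumptions.

```python
import numpy as np

def confusion_matrix_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, NPV and F1 from binary labels and predictions.

    y_true, y_pred: array-like of 0/1 values (1 = condition present / test positive).
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)

    tp = np.sum(y_pred & y_true)      # true positives
    fp = np.sum(y_pred & ~y_true)     # false positives
    tn = np.sum(~y_pred & ~y_true)    # true negatives
    fn = np.sum(~y_pred & y_true)     # false negatives

    sensitivity = tp / (tp + fn)      # recall / true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    ppv = tp / (tp + fp)              # precision
    npv = tn / (tn + fn)
    f1 = 2 * (ppv * sensitivity) / (ppv + sensitivity)  # harmonic mean of precision and recall

    return {"sensitivity": sensitivity, "specificity": specificity,
            "PPV": ppv, "NPV": npv, "F1": f1}

# Toy example: 6 patients with the condition, 4 without
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
print(confusion_matrix_metrics(y_true, y_pred))
# sensitivity = 4/6 ≈ 0.67, specificity = 3/4 = 0.75, PPV = 4/5 = 0.80, NPV = 3/5 = 0.60
```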
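Discrimination and calibration can be illustrated in the same spirit. The sketch below computes the AUC via its rank-based (Mann-Whitney) interpretation, the probability that a random positive example is ranked above a random negative one, and estimates the calibration slope and intercept by regressing the observed outcome on the log-odds of the predicted probability using statsmodels. Note that in this sketch the intercept is estimated jointly with the slope, rather than as calibration-in-the-large with the slope fixed at 1; the names and toy data are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import rankdata
import statsmodels.api as sm

def auc_mann_whitney(y_true, p_pred):
    """AUC via the rank-sum (Mann-Whitney U) formulation: the probability that a
    randomly chosen positive receives a higher predicted probability than a
    randomly chosen negative (ties counted as 0.5)."""
    y_true = np.asarray(y_true, dtype=bool)
    ranks = rankdata(np.asarray(p_pred, dtype=float))   # average ranks handle ties
    n_pos, n_neg = y_true.sum(), (~y_true).sum()
    return (ranks[y_true].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def calibration_slope_intercept(y_true, p_pred, eps=1e-10):
    """Calibration slope and intercept from a logistic regression of the observed
    outcome on the log-odds of the predicted probability (slope 1 and intercept 0
    for a perfectly calibrated model)."""
    p_pred = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    logit_p = np.log(p_pred / (1 - p_pred))              # linear predictor
    model = sm.Logit(np.asarray(y_true), sm.add_constant(logit_p)).fit(disp=0)
    intercept, slope = model.params
    return slope, intercept

# Toy example with predicted probabilities
y_true = np.array([0, 0, 0, 1, 0, 1, 0, 1, 1, 1])
p_pred = np.array([0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.65, 0.7, 0.8, 0.9])
print(f"AUC = {auc_mann_whitney(y_true, p_pred):.2f}")                  # 0.88
slope, intercept = calibration_slope_intercept(y_true, p_pred)
print(f"calibration slope = {slope:.2f}, intercept = {intercept:.2f}")
```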