2024 Sep 23;53(9):afae201. doi: 10.1093/ageing/afae201

Table 2.

Key metrics for evaluating prediction models with categorical outcomes—definitions, formulas and interpretations

| Term | Explanation | Formula |
| --- | --- | --- |
| Sensitivity (recall) | The ability of a test to correctly identify those with the condition (true positive rate) | Sensitivity = true positives/(true positives + false negatives) |
| Specificity | The ability of a test to correctly identify those without the condition (true negative rate) | Specificity = true negatives/(true negatives + false positives) |
| Negative predictive value (NPV) | The proportion of negative test results that are true negatives | NPV = true negatives/(true negatives + false negatives) |
| Positive predictive value (PPV) or precision | The proportion of positive test results that are true positives | PPV = true positives/(true positives + false positives) |
| F1 score | The F1 score is the harmonic mean of precision and recall, accounting both for false positives and false negatives | F1 score = 2 × (precision × recall)/(precision + recall) |
| Discrimination | The model’s ability to distinguish between different levels of outcome, often measured by metrics like the AUC for binary outcomes | AUC (area under the curve or area under the receiver operating characteristic curve) is not derived from a simple formula. It involves plotting true positive rate against false positive rate at various threshold settings and measuring the area under this curve. The AUC can be interpreted as the probability that the model ranks a random positive example higher than a random negative example |
| Calibration | The agreement between observed outcomes and predictions is assessed by calibration slope (beta) and intercept (alpha) | Calibration slope and intercept based on a logistic regression for categorical outcomes; often visualised with calibration plots, where the predicted probabilities are plotted on the x-axis and the observed frequencies on the y-axis. A perfectly calibrated model would result in a plot where the points lie on the diagonal line. In the ideal case, the calibration slope would be 1 and the intercept 0 |
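
As an illustration of the definitions in Table 2, the following minimal Python sketch computes the threshold-based metrics from the four cells of a confusion matrix, the AUC through its ranking interpretation, and a calibration slope and intercept. It is not taken from the article. The function names (`confusion_metrics`, `auc_by_ranking`, `calibration_slope_intercept`) are hypothetical, all counts and predictions are made up, and the logistic regression is fitted with a hand-written Newton-Raphson loop chosen only to keep the example free of dependencies. The sketch fits slope and intercept jointly; some analyses instead estimate the intercept with the slope fixed at 1.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Threshold-based metrics from the four cells of a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)  # recall, true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    ppv = tp / (tp + fp)          # precision
    npv = tn / (tn + fn)
    f1 = 2 * (ppv * sensitivity) / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


def auc_by_ranking(y_true, y_score):
    """AUC as the share of (positive, negative) pairs in which the positive
    receives the higher score; tied pairs count as half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_slope_intercept(y_true, y_prob, iterations=25):
    """Fit logit(P(y = 1)) = alpha + beta * logit(p) by Newton-Raphson.

    Assumes every predicted probability lies strictly between 0 and 1.
    Returns (alpha, beta), i.e. (intercept, slope).
    """
    x = [math.log(p / (1 - p)) for p in y_prob]  # logit of each prediction
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        mu = [1 / (1 + math.exp(-(alpha + beta * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to alpha and beta
        g_a = sum(yi - mi for yi, mi in zip(y_true, mu))
        g_b = sum((yi - mi) * xi for yi, mi, xi in zip(y_true, mu, x))
        # Entries of the 2x2 information matrix X^T W X
        w = [mi * (1 - mi) for mi in mu]
        h_aa = sum(w)
        h_ab = sum(wi * xi for wi, xi in zip(w, x))
        h_bb = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system in closed form
        alpha += (h_bb * g_a - h_ab * g_b) / det
        beta += (h_aa * g_b - h_ab * g_a) / det
    return alpha, beta


# Illustrative (made-up) data
metrics = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
auc = auc_by_ranking([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.7, 0.3, 0.2])

# Ten predictions whose observed event rates match the predicted risks
# (1 of 5 events at p = 0.2, 4 of 5 events at p = 0.8)
outcomes = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
predicted = [0.2] * 5 + [0.8] * 5
intercept, slope = calibration_slope_intercept(outcomes, predicted)
```

In the last example the observed event rate in each group equals the predicted risk, so the fitted intercept and slope should come out close to the ideal values of 0 and 1. For real analyses, an established statistics library would normally be preferred over a hand-written fit.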