| Metric | Definition | Calculation |
| --- | --- | --- |
| Sensitivity (recall) | The ability of a test to correctly identify those with the condition (true positive rate) | Sensitivity = true positives/(true positives + false negatives) |
| Specificity | The ability of a test to correctly identify those without the condition (true negative rate) | Specificity = true negatives/(true negatives + false positives) |
| Negative predictive value (NPV) | The proportion of negative test results that are true negatives | NPV = true negatives/(true negatives + false negatives) |
| Positive predictive value (PPV) or precision | The proportion of positive test results that are true positives | PPV = true positives/(true positives + false positives) |
| F1 score | The harmonic mean of precision and recall, accounting for both false positives and false negatives | F1 score = 2 × (precision × recall)/(precision + recall) |
| Discrimination | The model's ability to distinguish between different levels of the outcome, often measured by the AUC for binary outcomes | The AUC (area under the curve, i.e. area under the receiver operating characteristic curve) is not derived from a simple formula: the true positive rate is plotted against the false positive rate at various threshold settings and the area under this curve is measured. The AUC can be interpreted as the probability that the model ranks a random positive example higher than a random negative example |
| Calibration | The agreement between observed outcomes and predictions, assessed by the calibration slope (beta) and intercept (alpha) | The calibration slope and intercept are estimated from a logistic regression for categorical outcomes and are often visualised with calibration plots, where the predicted probabilities are plotted on the x-axis and the observed frequencies on the y-axis. For a perfectly calibrated model the points lie on the diagonal line; in the ideal case the calibration slope is 1 and the intercept 0 |
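As a concrete illustration of the threshold-based metrics in the table, the sketch below computes sensitivity, specificity, PPV, NPV and the F1 score directly from the confusion-matrix counts. It is a minimal example using NumPy with a toy data set, not a reference implementation; the function and variable names are illustrative assumptions.

```python
import numpy as np

def confusion_matrix_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, NPV and F1 from binary labels and predictions.

    y_true, y_pred: array-like of 0/1 values (1 = condition present / test positive).
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)

    tp = np.sum(y_pred & y_true)      # true positives
    fp = np.sum(y_pred & ~y_true)     # false positives
    tn = np.sum(~y_pred & ~y_true)    # true negatives
    fn = np.sum(~y_pred & y_true)     # false negatives

    sensitivity = tp / (tp + fn)      # recall / true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    ppv = tp / (tp + fp)              # precision
    npv = tn / (tn + fn)
    f1 = 2 * (ppv * sensitivity) / (ppv + sensitivity)  # harmonic mean of precision and recall

    return {"sensitivity": sensitivity, "specificity": specificity,
            "PPV": ppv, "NPV": npv, "F1": f1}

# Toy example: 6 patients with the condition, 4 without
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
print(confusion_matrix_metrics(y_true, y_pred))
# sensitivity = 4/6 ≈ 0.67, specificity = 3/4 = 0.75, PPV = 4/5 = 0.80, NPV = 3/5 = 0.60
```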
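Discrimination and calibration can be illustrated in the same spirit. The sketch below computes the AUC via its rank-based (Mann-Whitney) interpretation, the probability that a random positive example is ranked above a random negative one, and estimates the calibration slope and intercept by regressing the observed outcome on the log-odds of the predicted probability using statsmodels. Note that in this sketch the intercept is estimated jointly with the slope, rather than as calibration-in-the-large with the slope fixed at 1; the names and toy data are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import rankdata
import statsmodels.api as sm

def auc_mann_whitney(y_true, p_pred):
    """AUC via the rank-sum (Mann-Whitney U) formulation: the probability that a
    randomly chosen positive receives a higher predicted probability than a
    randomly chosen negative (ties counted as 0.5)."""
    y_true = np.asarray(y_true, dtype=bool)
    ranks = rankdata(np.asarray(p_pred, dtype=float))   # average ranks handle ties
    n_pos, n_neg = y_true.sum(), (~y_true).sum()
    return (ranks[y_true].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def calibration_slope_intercept(y_true, p_pred, eps=1e-10):
    """Calibration slope and intercept from a logistic regression of the observed
    outcome on the log-odds of the predicted probability (slope 1 and intercept 0
    for a perfectly calibrated model)."""
    p_pred = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    logit_p = np.log(p_pred / (1 - p_pred))              # linear predictor
    model = sm.Logit(np.asarray(y_true), sm.add_constant(logit_p)).fit(disp=0)
    intercept, slope = model.params
    return slope, intercept

# Toy example with predicted probabilities
y_true = np.array([0, 0, 0, 1, 0, 1, 0, 1, 1, 1])
p_pred = np.array([0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.65, 0.7, 0.8, 0.9])
print(f"AUC = {auc_mann_whitney(y_true, p_pred):.2f}")                  # 0.88
slope, intercept = calibration_slope_intercept(y_true, p_pred)
print(f"calibration slope = {slope:.2f}, intercept = {intercept:.2f}")
```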