Table 1.
Definition of terms used in the article
| Term | Definition |
|---|---|
| Ground truth | the “known” label for an instance in a dataset; in binary classification, this is a 1 or 0, which are here referred to as positives (P) and negatives (N), respectively |
| Confusion matrix | a 2 × 2 contingency table in which the rows are the ground-truth labels (P or N) and the columns are the predicted labels (P̂ or N̂), or vice versa; the quadrants are true positives (TPs) and true negatives (TNs), which are positive and negative instances successfully classified as such, and false positives (FPs) and false negatives (FNs), which are negatives falsely classified as positives and positives falsely classified as negatives, respectively |
| Operating point | a specific use of a classifier output or set of classifier outputs; most commonly, this refers to the selection of a score threshold where the classifier has a continuous (not necessarily calibrated) output, with instances scoring above this threshold labeled as positives and those scoring below labeled as negatives |
| Precision; positive predictive value | TP/(TP + FP) = TP/P̂ |
| True positive rate; recall; sensitivity | TP/(TP + FN) = TP/P |
| False positive rate; 1 − specificity | FP/(FP + TN) = FP/N |
| Specificity | TN/(FP + TN) = TN/N |
| Accuracy | (TP + TN)/(TP + FP + FN + TN) = (TP + TN)/(N + P) |
| ROC curve | a plot in which the x axis is FPR and the y axis is TPR; each point corresponds to a particular operating point, e.g., score threshold; straight lines are drawn between each point |
| PR curve | a plot in which the x axis is TPR (referred to as recall) and the y axis is precision; as in the ROC curve, each point corresponds to a particular operating point, and lines are drawn between each point to construct a curve |
| AUC | area under the curve, e.g., ROC (ROC-AUC) or PR (PR-AUC) curve, providing a single number that summarizes predictive performance, allowing comparison of curves |
| Early retrieval (ER) | performance of the classifier up to a certain FPR, e.g., from FPR = 0 to FPRmax = 0.1; this reflects the performance of the classifier over the highest-scoring instances |
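
For concreteness, the metrics defined in Table 1 follow directly from the four confusion-matrix cell counts. Below is a minimal sketch in Python; the function name and example counts are illustrative, not from the article.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute the Table 1 metrics from confusion-matrix cell counts."""
    p = tp + fn          # ground-truth positives (P)
    n = fp + tn          # ground-truth negatives (N)
    p_hat = tp + fp      # predicted positives (P̂)
    return {
        "precision": tp / p_hat,          # TP/(TP + FP) = TP/P̂
        "recall (TPR)": tp / p,           # TP/(TP + FN) = TP/P
        "FPR": fp / n,                    # FP/(FP + TN) = FP/N
        "specificity": tn / n,            # TN/(FP + TN) = TN/N
        "accuracy": (tp + tn) / (p + n),  # (TP + TN)/(P + N)
    }

# Illustrative counts only
print(confusion_metrics(tp=80, fp=10, fn=20, tn=90))
```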
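
The ROC curve itself can be traced by sweeping the score threshold over all classifier outputs, with ROC-AUC obtained from the trapezoidal rule (consistent with straight lines being drawn between points). A sketch under the assumption that scores and binary ground-truth labels are available as arrays; the names here are illustrative.

```python
import numpy as np

def roc_curve_points(scores, labels):
    """Sweep score thresholds (one per instance, descending) and
    return the (FPR, TPR) points of the ROC curve, starting at (0, 0)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores)            # rank instances by descending score
    tp = np.cumsum(labels[order])          # TPs above each threshold
    fp = np.cumsum(1 - labels[order])      # FPs above each threshold
    tpr = np.concatenate([[0.0], tp / labels.sum()])
    fpr = np.concatenate([[0.0], fp / (labels == 0).sum()])
    return fpr, tpr

# Illustrative scores and ground-truth labels
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
fpr, tpr = roc_curve_points(scores, labels)
print("ROC-AUC:", np.trapz(tpr, fpr))  # area via the trapezoidal rule
```

PR-curve points follow analogously by pairing recall (TPR) with precision = TP/(TP + FP) at each threshold.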
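
Early-retrieval performance can likewise be summarized as a partial area under the ROC curve over FPR from 0 to FPRmax. The clip-and-interpolate approach below is one illustrative way to compute it, not a method prescribed by the article.

```python
import numpy as np

def partial_roc_auc(fpr, tpr, fpr_max=0.1):
    """Area under the ROC curve restricted to FPR in [0, fpr_max],
    linearly interpolating TPR at the cutoff."""
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    keep = fpr <= fpr_max
    tpr_cut = np.interp(fpr_max, fpr, tpr)  # TPR exactly at the cutoff
    x = np.concatenate([fpr[keep], [fpr_max]])
    y = np.concatenate([tpr[keep], [tpr_cut]])
    return np.trapz(y, x)

# Illustrative ROC points (e.g., from roc_curve_points above)
fpr_pts = [0.0, 0.0, 0.25, 0.25, 0.5, 1.0]
tpr_pts = [0.0, 0.5, 0.5, 0.75, 1.0, 1.0]
print("pAUC up to FPR 0.25:", partial_roc_auc(fpr_pts, tpr_pts, fpr_max=0.25))
```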