2023 Jul 7;4(8):100791. doi: 10.1016/j.patter.2023.100791

Figure 1.

A schematic representation of paired evaluation

(A) Individual samples in a test dataset are represented by squares, colored according to their true labels in binary classification and linear regression settings. The test dataset is broken up into rankable pairs, and a predictor is asked to score each pair separately. The scores are used to determine whether a given pair was ranked correctly (✓) or incorrectly (X), and the AUC is determined by the fraction of correctly ranked pairs.
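The AUC computation in panel (A) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `paired_auc` and `score_pair` are hypothetical names, each pair is assumed to be ordered so that the second sample has the larger true label, and tied scores are assumed to count as incorrectly ranked.

```python
def paired_auc(pairs, score_pair):
    """Estimate AUC as the fraction of correctly ranked pairs.

    pairs: list of (low, high) samples, where `high` has the larger true label
           (an assumed convention for this sketch).
    score_pair: hypothetical callable that scores one pair on its own and
                returns (score_low, score_high).
    """
    correct = []
    for low, high in pairs:
        s_low, s_high = score_pair(low, high)
        # A pair is ranked correctly when the higher-labeled sample
        # receives the strictly higher score (ties count as incorrect here).
        correct.append(s_high > s_low)
    auc = sum(correct) / len(correct)
    return auc, correct
```

The per-pair outcomes are returned alongside the AUC so that two predictors can later be compared on the same pairs.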

(B) The criteria for a valid rankable test pair. In binary classification, two samples are rankable if they belong to opposite classes; in linear regression, two samples are rankable if the difference between their labels is greater than a predefined meta-parameter δ.
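The two rankability criteria in panel (B) might be expressed as in the sketch below. The function names are hypothetical, pairs are returned as index tuples, and the regression criterion is assumed to use the absolute difference between labels.

```python
from itertools import combinations


def rankable_pairs_binary(labels):
    """Index pairs whose samples belong to opposite classes."""
    return [
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] != labels[j]
    ]


def rankable_pairs_regression(labels, delta):
    """Index pairs whose labels differ by more than delta.

    Assumption: the difference is taken in absolute value.
    """
    return [
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if abs(labels[i] - labels[j]) > delta
    ]
```

For example, `rankable_pairs_binary([0, 1, 1, 0])` keeps the four pairs that cross classes and drops the two same-class pairs.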

(C) An example comparison of two models (A and B). A 2 × 2 contingency table tallies the correctly and incorrectly ranked pairs by each model. Statistical significance of the difference in method performance is assessed by Fisher’s exact test and McNemar’s test.
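The comparison in panel (C) can be illustrated with a short sketch. It assumes each model's results are available as a list of booleans (one per test pair, in the same order), uses a table layout chosen for this example rather than taken from the figure, and implements only the exact (binomial) form of McNemar's test. Fisher's exact test on the same table is available in, for example, `scipy.stats.fisher_exact` and is not reproduced here.

```python
from math import comb


def contingency_table(correct_a, correct_b):
    """Tally pairs by whether models A and B ranked them correctly.

    Layout (an assumption of this sketch):
        [[both correct,   only A correct],
         [only B correct, both incorrect]]
    """
    both = only_a = only_b = neither = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            both += 1
        elif a:
            only_a += 1
        elif b:
            only_b += 1
        else:
            neither += 1
    return [[both, only_a], [only_b, neither]]


def mcnemar_exact(table):
    """Two-sided exact McNemar p-value from the discordant counts."""
    b, c = table[0][1], table[1][0]
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under the null, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)
```

Only the off-diagonal (discordant) cells enter McNemar's test, since pairs that both models rank the same way carry no information about which model is better.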

See also Figure S1.