Table 3.
Metric | Model [5-vote] (std) | Model [3-vote] (std) | Neurologists [3-vote] (std) |
---|---|---|---|
AUC | 0.92 (0.01) | 0.92 (0.01) | N/A |
AP | 0.85 (0.01) | 0.82 (0.02) | N/A |
F1 | 0.81 (0.01) | 0.77 (0.02) | 0.71 (0.08) |
Accuracy | 0.89 (0.01) | 0.87 (0.02) | 0.82 (0.05) |
Precision | 0.84 (0.01) | 0.81 (0.03) | 0.76 (0.17) |
Recall | 0.78 (0.02) | 0.74 (0.05) | 0.75 (0.17) |
Cohen's kappa | 0.73 (0.02) | 0.68 (0.03) | 0.60 (0.11) |
The values in parentheses are the standard deviations of the average performance. Recall that the performance of each neurologist is averaged across four 3-vote ground-truth sets. For the 5-vote labels, we subsampled the test set five times (at 80%) to obtain the standard deviation. For all of these metrics, the abnormal class was used as the positive label. AUC, Area Under the receiver operating characteristic Curve; AP, Average Precision; std, standard deviation.
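
As an illustration of how the threshold-based metrics in the table and their standard deviations could be computed, the following is a minimal Python sketch. It is not the code used in the study. It assumes binary labels with 1 denoting the abnormal (positive) class, subsampling without replacement, and the sample standard deviation; the function names `binary_metrics` and `subsampled_mean_std` and the toy labels are hypothetical.

```python
import random
import statistics


def binary_metrics(y_true, y_pred, positive=1):
    """F1, accuracy, precision, recall and Cohen's kappa for binary labels.

    `positive` marks the abnormal class (assumed to be encoded as 1).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    n = tp + fp + fn + tn

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / n

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance < 1 else 0.0

    return {"f1": f1, "accuracy": accuracy, "precision": precision,
            "recall": recall, "kappa": kappa}


def subsampled_mean_std(y_true, y_pred, n_rounds=5, fraction=0.8, seed=0):
    """Mean and sample std of each metric over random subsamples of the test set.

    Assumption: each round draws `fraction` of the test set without replacement.
    """
    rng = random.Random(seed)
    n = len(y_true)
    k = int(round(fraction * n))
    runs = []
    for _ in range(n_rounds):
        idx = rng.sample(range(n), k)
        runs.append(binary_metrics([y_true[i] for i in idx],
                                   [y_pred[i] for i in idx]))
    summary = {}
    for name in runs[0]:
        values = [run[name] for run in runs]
        summary[name] = (statistics.mean(values), statistics.stdev(values))
    return summary


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    truth = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
    preds = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
    for name, (mean, std) in subsampled_mean_std(truth, preds).items():
        print(f"{name}: {mean:.2f} ({std:.2f})")
```

AUC and AP are omitted from the sketch because they require the model's continuous output scores rather than thresholded predictions.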