Table 2.
Evaluation metric | Overall | One-vs-Rest: No vs. rest | One-vs-Rest: Yes vs. rest | One-vs-Rest: Not answered vs. rest | One-vs-One: Yes vs. no | One-vs-One: Yes vs. not answered | One-vs-One: No vs. not answered |
---|---|---|---|---|---|---|---|
Accuracy (95% CI) | 0.82 (0.75; 0.88) | 0.83 (0.76; 0.89) | 0.88 (0.81; 0.93) | 0.77 (0.69; 0.84) | 0.83 (0.71; 0.91) | 0.64 (0.59; 0.77) | 0.83 (0.71; 0.91) |
AUC | 0.80 (0.72; 0.88) | 0.83 (0.76; 0.90) | 0.83 (0.75; 0.91) | 0.74 (0.66; 0.82) | 0.90 (0.82; 0.98) | 0.71 (0.57; 0.85) | 0.82 (0.71; 0.92) |
Kappa | 0.60 | 0.66 | 0.68 | 0.48 | 0.63 | 0.28 | 0.65 |
No Information Rate | 0.41 | 0.59 | 0.74 | 0.67 | 0.60 | 0.54 | 0.53 |
P-value [Acc > NIR] | 2.6e-14 | 1.7e-9 | 9.3e-5 | 8.6e-3 | 2.0e-4 | 0.11 | 6.4e-7 |
McNemar's Test P-value | 0.36 | 0.83 | 0.80 | 1.00 | 0.34 | 0.17 | 0.55 |
Sensitivity/Recall | 0.74 | 0.82 | 0.74 | 0.65 | 0.70 | 0.54 | 0.88 |
Specificity | 0.86 | 0.84 | 0.93 | 0.83 | 0.91 | 0.75 | 0.77 |
Average precision | 0.65 | 0.51 | 0.82 | 0.61 | 0.74 | 0.53 | 0.80 |
Precision/PPV | 0.74 | 0.79 | 0.78 | 0.65 | 0.84 | 0.71 | 0.81 |
NPV | 0.87 | 0.87 | 0.91 | 0.83 | 0.82 | 0.58 | 0.85 |
F1 score | 0.74 | 0.80 | 0.76 | 0.65 | 0.76 | 0.61 | 0.85 |
Prevalence | 0.34 | 0.41 | 0.26 | 0.33 | 0.40 | 0.54 | 0.53 |
Detection Rate | 0.26 | 0.34 | 0.19 | 0.21 | 0.28 | 0.29 | 0.47 |
Detection Prevalence | 0.35 | 0.43 | 0.24 | 0.33 | 0.33 | 0.40 | 0.58 |
Balanced Accuracy | 0.80 | 0.83 | 0.83 | 0.74 | 0.81 | 0.64 | 0.83 |
95% CI: 95% confidence interval; Acc: accuracy; AUC (AUROC): area under the receiver operating characteristic curve; knn: k-nearest neighbors; NIR: No Information Rate; NPV: negative predictive value; PPV: positive predictive value
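
Several rows of the table are linked by simple identities, which can help when reading or checking the values. The following is a minimal sketch, assuming the standard textbook definitions of balanced accuracy, F1 score and detection rate; the function names are illustrative and are not taken from the original analysis. It recomputes the three derived metrics for the "No vs. rest" column from its reported sensitivity, specificity, precision and prevalence. Because the inputs are themselves rounded to two decimals, agreement is expected only to within rounding.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)


def detection_rate(sensitivity: float, prevalence: float) -> float:
    """Share of all samples that are true positives."""
    return sensitivity * prevalence


# Values copied from the "No vs. rest" column of the table.
sensitivity, specificity = 0.82, 0.84
precision, prevalence = 0.79, 0.41

print(f"Balanced accuracy: {balanced_accuracy(sensitivity, specificity):.2f}")
print(f"F1 score:          {f1_score(precision, sensitivity):.2f}")
print(f"Detection rate:    {detection_rate(sensitivity, prevalence):.2f}")
```

The recomputed values can be compared with the Balanced Accuracy, F1 score and Detection Rate rows of the same column. The "Overall" column is deliberately left out of this check, since the table does not state how the multiclass summary values were aggregated.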