. 2024 Jan 11;44(3):523–534. doi: 10.1007/s00296-023-05518-9

Table 2.

Results of the multiclass classification

		One-vs-Rest			One-vs-One
Evaluation metric	Overall	No vs. rest	Yes vs. rest	Not answered vs. rest	Yes vs. no	Yes vs. not answered	No vs. not answered
Accuracy (95% CI)	0.82 (0.75; 0.88)	0.83 (0.76; 0.89)	0.88 (0.81; 0.93)	0.77 (0.69; 0.84)	0.83 (0.71; 0.91)	0.64 (0.59; 0.77)	0.83 (0.71; 0.91)
AUROC (95% CI)	0.80 (0.72; 0.88)	0.83 (0.76; 0.90)	0.83 (0.75; 0.91)	0.74 (0.66; 0.82)	0.90 (0.82; 0.98)	0.71 (0.57; 0.85)	0.82 (0.71; 0.92)
Kappa	0.60	0.66	0.68	0.48	0.63	0.28	0.65
No Information Rate	0.41	0.59	0.74	0.67	0.60	0.54	0.53
P-value [Acc > NIR]	2.6e-14	1.7e-9	9.3e-5	8.6e-3	2.0e-4	0.11	6.4e-7
McNemar's Test P-value	0.36	0.83	0.80	1.00	0.34	0.17	0.55
Sensitivity/Recall	0.74	0.82	0.74	0.65	0.70	0.54	0.88
Specificity	0.86	0.84	0.93	0.83	0.91	0.75	0.77
Average precision	0.65	0.51	0.82	0.61	0.74	0.53	0.80
Precision/PPV	0.74	0.79	0.78	0.65	0.84	0.71	0.81
NPV	0.87	0.87	0.91	0.83	0.82	0.58	0.85
F1 score	0.74	0.80	0.76	0.65	0.76	0.61	0.85
Prevalence	0.34	0.41	0.26	0.33	0.40	0.54	0.53
Detection Rate	0.26	0.34	0.19	0.21	0.28	0.29	0.47
Detection Prevalence	0.35	0.43	0.24	0.33	0.33	0.40	0.58
Balanced Accuracy	0.80	0.83	0.83	0.74	0.81	0.64	0.83

95% CI 95% confidence interval, Acc accuracy, AUROC area under the receiver operating characteristic curve, knn k-nearest neighbors, NIR No Information Rate, NPV negative predictive value, PPV positive predictive value
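The count-based rows of Table 2 (accuracy and its confidence interval, kappa, no-information rate, the accuracy-vs-NIR test, McNemar's test, sensitivity through balanced accuracy) can all be derived from a 2 × 2 confusion matrix for each one-vs-rest split. The row labels resemble those printed by R's `caret::confusionMatrix`, but the source does not state which software was used, so the following is only a minimal sketch under assumed standard definitions: an exact (Clopper-Pearson) interval for accuracy, a one-sided exact binomial test for Acc > NIR, McNemar's test with continuity correction on the off-diagonal counts, and Cohen's kappa. All function names (`binary_metrics`, `one_vs_rest`, and the helpers) are hypothetical, and the counts in the usage lines are invented, not the study's data. The area-under-curve and average-precision rows need predicted scores rather than counts and are not covered, nor are the one-vs-one columns, whose construction the source does not describe.

```python
import math
from typing import Dict, List, Tuple

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))

def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact (Clopper-Pearson) confidence interval for k successes in n trials,
    found by bisection on the binomial tail probabilities (assumed CI method)."""
    def bisect(move_right) -> float:
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if move_right(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: the p at which P(X >= k) rises to alpha / 2.
    lower = 0.0 if k == 0 else bisect(lambda p: binom_sf(k, n, p) < alpha / 2)
    # Upper bound: the p at which P(X <= k) falls to alpha / 2.
    upper = 1.0 if k == n else bisect(lambda p: binom_cdf(k, n, p) > alpha / 2)
    return lower, upper

def mcnemar_p(b: int, c: int) -> float:
    """McNemar test on the two off-diagonal counts, with continuity correction."""
    if b + c == 0:
        return float("nan")
    correction = 1 if b != c else 0
    stat = (abs(b - c) - correction) ** 2 / (b + c)
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))

def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Count-based metrics for one binary split (assumed standard definitions)."""
    n = tp + fp + fn + tn
    acc = (tp + tn) / n
    prevalence = (tp + fn) / n
    nir = max(prevalence, 1 - prevalence)  # accuracy of always predicting the larger class
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    # Chance agreement from the predicted and reference marginals (Cohen's kappa).
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    ci_low, ci_high = clopper_pearson(tp + tn, n)
    return {
        "accuracy": acc,
        "accuracy_ci_low": ci_low,
        "accuracy_ci_high": ci_high,
        "kappa": (acc - pe) / (1 - pe),
        "no_information_rate": nir,
        "p_acc_gt_nir": binom_sf(tp + tn, n, nir),  # one-sided exact binomial test
        "mcnemar_p": mcnemar_p(fp, fn),
        "sensitivity": sens,
        "specificity": spec,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sens / (ppv + sens),
        "prevalence": prevalence,
        "detection_rate": tp / n,
        "detection_prevalence": (tp + fp) / n,
        "balanced_accuracy": (sens + spec) / 2,
    }

def one_vs_rest(cm: List[List[int]], c: int) -> Tuple[int, int, int, int]:
    """Collapse a k x k matrix (rows = predicted, columns = reference)
    into (tp, fp, fn, tn) for class index c versus all other classes."""
    total = sum(map(sum, cm))
    tp = cm[c][c]
    fp = sum(cm[c]) - tp                 # predicted c, reference is another class
    fn = sum(row[c] for row in cm) - tp  # reference c, predicted as another class
    return tp, fp, fn, total - tp - fp - fn

# Hypothetical 3-class counts (NOT the study's data); order: no, yes, not answered.
cm = [[30, 5, 3], [4, 20, 2], [6, 1, 29]]
for idx, name in enumerate(["No", "Yes", "Not answered"]):
    m = binary_metrics(*one_vs_rest(cm, idx))
    print(f"{name} vs. rest: accuracy {m['accuracy']:.2f}, kappa {m['kappa']:.2f}")
```

The sketch sticks to the standard library and finds the exact interval by bisection on binomial tail sums instead of calling a beta-quantile function, which keeps it dependency-free at the cost of speed for very large samples. It does not guard against empty cells (for example, a class that is never predicted), which would raise a division error.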