Author manuscript; available in PMC: 2022 Oct 1.

Published in final edited form as: Med Image Anal. 2021 Jul 21;73:102179. doi: 10.1016/j.media.2021.102179

Table 2:

Comparison with baseline and ablated methods.

Method	κ	AUC	F₁	Pre	Rec
Ours	0.49	0.83	0.66	0.63	0.72
1) Ours w/o Ordinal	0.46	0.83	0.67	0.69	0.72
2) Ours w/o Focal	0.46	0.84	0.66	0.64	0.70
3) Ours w/o OF loss*	0.45	0.80	0.66	0.67	0.65

4) RCE w/ implicit norm*	0.41	0.77	0.63	0.65	0.61
5) Soft scores*	0.32	0.75	0.56	0.57	0.56
6) Soft scores (KL)*	0.42	0.72	0.62	0.65	0.62

7) Majority Vote (OF)*	0.33	0.73	0.58	0.59	0.58
8) Majority Vote*	0.32	0.75	0.56	0.57	0.56

9) Baseline OF-CNN*	0.26	0.72	0.57	0.60	0.54
10) Baseline CNN*	0.24	0.71	0.55	0.61	0.49
11) DeepRank* (Pang et al., 2017)	0.27	0.70	0.56	0.53	0.58
12) SVM*	0.21	0.56	0.44	0.49	0.40

* indicates a statistically significant difference (p < 0.05) compared with our method, measured by the Wilcoxon signed-rank test (Wilcoxon, 1992). Best results are in bold and second best are underlined. See text for details about compared methods.
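
For readers who want to reproduce this kind of paired significance check, the following is a minimal sketch of a two-sided Wilcoxon signed-rank test. It is not the authors' code. It assumes that each method yields one score per test unit (for example, per fold or per case; the table does not say which), and it uses the large-sample normal approximation without a tie correction, which may differ from the exact procedure used in the paper. The function name `wilcoxon_signed_rank` and the example scores are hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (w, p), where w is the smaller of the positive and negative
    rank sums. Zero differences are dropped and tied absolute differences
    receive average ranks. No tie correction is applied to the variance.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    # Under the null hypothesis, w is approximately normal for large n.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w, p


# Hypothetical per-unit scores for two methods (placeholders, not paper data).
ours = [0.81, 0.79, 0.85, 0.90, 0.76, 0.88, 0.83, 0.80]
other = [0.74, 0.75, 0.79, 0.81, 0.77, 0.78, 0.72, 0.68]
w, p = wilcoxon_signed_rank(ours, other)
print(f"W = {w}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

For small samples an exact test is generally preferable to the normal approximation; a statistics library that implements the exact distribution would be the more robust choice in practice.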