Table 2: Comparison with baseline and ablated methods.
Method | κ | AUC | F1 | Precision | Recall |
---|---|---|---|---|---|
Ours | 0.49 | 0.83 | 0.66 | 0.63 | 0.72 |
1) Ours w/o Ordinal | 0.46 | 0.83 | 0.67 | 0.69 | 0.72 |
2) Ours w/o Focal | 0.46 | 0.84 | 0.66 | 0.64 | 0.70 |
3) Ours w/o OF loss* | 0.45 | 0.80 | 0.66 | 0.67 | 0.65 |
| | | | | | |
4) RCE w/ implicit norm* | 0.41 | 0.77 | 0.63 | 0.65 | 0.61 |
5) Soft scores* | 0.32 | 0.75 | 0.56 | 0.57 | 0.56 |
6) Soft scores (KL)* | 0.42 | 0.72 | 0.62 | 0.65 | 0.62 |
| | | | | | |
7) Majority Vote (OF)* | 0.33 | 0.73 | 0.58 | 0.59 | 0.58 |
8) Majority Vote* | 0.32 | 0.75 | 0.56 | 0.57 | 0.56 |
| | | | | | |
9) Baseline OF-CNN* | 0.26 | 0.72 | 0.57 | 0.60 | 0.54 |
10) Baseline CNN* | 0.24 | 0.71 | 0.55 | 0.61 | 0.49 |
11) DeepRank* (Pang et al., 2017) | 0.27 | 0.70 | 0.56 | 0.53 | 0.58 |
12) SVM* | 0.21 | 0.56 | 0.44 | 0.49 | 0.40 |
An asterisk (*) indicates a statistically significant difference (p < 0.05) compared with our method, measured by the Wilcoxon signed-rank test (Wilcoxon, 1992). Best results are in bold and second best are underlined. See text for details about compared methods.
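For readers unfamiliar with the significance test named in the table note, the following is a minimal sketch of an exact two-sided Wilcoxon signed-rank test on paired scores. It is not the authors' evaluation code. The function name `wilcoxon_signed_rank` and the paired values are hypothetical placeholders: they are not taken from Table 2, and this section does not state which paired units (folds, subjects, or samples) were compared. The scores are given as integers so that ties are detected exactly.

```python
from itertools import product


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W, p) with W = min(W+, W-). Zero differences are dropped.
    The exact enumeration is only practical for small n (roughly n <= 20).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    total = sum(ranks)

    # Under the null hypothesis each rank is positive or negative with
    # probability 1/2, so enumerate all 2^n sign assignments and count
    # those at least as extreme as the observed statistic.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        s = sum(r for r, positive in zip(ranks, signs) if positive)
        if min(s, total - s) <= w:
            extreme += 1
    return w, extreme / 2 ** n


# Hypothetical paired scores (x100) for two methods; NOT values from Table 2.
method_a = [49, 52, 47, 50, 51, 48]
method_b = [41, 45, 44, 40, 46, 47]
w_stat, p_value = wilcoxon_signed_rank(method_a, method_b)
print(w_stat, p_value)
```

With these placeholder values every difference favours the first method, so the p-value is 2/2^6 = 0.03125, the smallest two-sided value attainable with six pairs. For larger samples a library routine such as `scipy.stats.wilcoxon` is the more practical choice.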