Table 7.
Ablation study on the effectiveness of UC-MIL's backbone networks and the proposed Uncertainty-aware Consensus-assisted mechanism. Specifically, we replace the proposed UC-MIL with two classic MIL methods: Campanella et al. (2019) (w/ Instance-based) and Ilse et al. (2018) (w/ Embedding-based). Performance is reported as F1 (%) and AUROC (%), with 95% confidence intervals in brackets.
| Methods | Learning ability: F1 (%)↑ | Learning ability: AUROC (%)↑ | Generalisation ability: F1 (%)↑ | Generalisation ability: AUROC (%)↑ |
|---|---|---|---|---|
| Backbone | | | | |
| w/ ResNet34 | 93.2 (90.9, 95.5) | 98.0 (96.1, 99.1) | 86.8 (84.7, 88.1) | 90.5 (88.7, 92.0) |
| w/ ResWide50 | 93.3 (91.7, 95.0) | 97.7 (95.4, 98.9) | 86.0 (84.2, 88.1) | 90.2 (88.4, 92.3) |
| w/ EfficientNetB3 | 90.2 (88.1, 92.3) | 96.0 (93.9, 98.0) | 84.6 (82.7, 86.1) | 88.1 (86.5, 89.7) |
| w/ Res2Net50 | 91.7 (89.9, 93.2) | 96.8 (94.7, 98.0) | 85.0 (83.1, 87.2) | 88.7 (86.6, 89.9) |
| Ours | 94.9 (93.0, 96.8) | 98.7 (97.6, 99.4) | 88.0 (82.3, 92.7) | 91.8 (84.6, 93.3) |
| Component | | | | |
| w/o Uncertainty | 92.2 (90.1, 94.1) | 97.0 (95.8, 98.1) | 86.2 (94.9, 88.3) | 90.2 (88.1, 92.1) |
| w/o Consensus | 92.2 (90.4, 94.6) | 97.7 (95.1, 98.6) | 85.8 (83.3, 87.0) | 89.4 (87.2, 90.5) |
| w/ Instance-based | 88.7 (86.0, 90.1) | 95.5 (93.3, 96.8) | 81.5 (80.0, 83.1) | 85.7 (83.3, 87.1) |
| w/ Embedding-based | 89.9 (87.4, 91.2) | 95.9 (93.3, 97.6) | 83.0 (81.0, 85.2) | 87.0 (85.1, 88.9) |
| Ours | 94.9 (93.0, 96.8) | 98.7 (97.6, 99.4) | 88.0 (82.3, 92.7) | 91.8 (84.6, 93.3) |