Table 2:
Classification performance on CBIS-DDSM. For GMIC, we reported the test AUC of the top-5 models that achieved the highest validation AUC in identifying breasts with malignant findings. We compared GMIC with five baselines. The performance of Deep MIL, RGP, and GGP in this table was originally reported in Shu et al. (2020).
| Model | AUC (malignant) |
|---|---|
| ResNet-34 | 0.792 ± 0.014 |
| ResNet-34-1×1 conv | 0.800 ± 0.011 |
| Deep MIL (Zhu et al., 2017) | 0.791 ± 0.0002 |
| RGP (Shu et al., 2020) | 0.838 ± 0.0001 |
| GGP (Shu et al., 2020) | 0.823 ± 0.0002 |
| GMIC-ResNet-18 | 0.833 ± 0.004 |
| GMIC-ResNet-18 (best) | 0.840 |
| GMIC-ResNet-34 | 0.830 ± 0.003 |
| GMIC-ResNet-50 | 0.828 ± 0.001 |
| GMIC-ResNet-18-ensemble | 0.858 |
| GMIC-ResNet-34-ensemble | 0.849 |
| GMIC-ResNet-50-ensemble | 0.849 |
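The reporting protocol described in the caption can be sketched as follows: rank training runs by validation AUC, keep the top five, and summarize their test AUC. This is a hypothetical illustration, not the authors' evaluation code. The run format (`val_auc`, `test_scores`), the reading of the ± values as a sample standard deviation, and the ensembling rule (averaging the models' predicted malignancy probabilities) are all assumptions that the table does not state.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC as the probability that a random positive is scored above a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize_top_k(runs, test_labels, k=5):
    """Summarize the k runs with the highest validation AUC.

    runs: list of dicts with a 'val_auc' float and a 'test_scores' list
          (a hypothetical format chosen for this sketch).
    Returns (mean test AUC, sample std of test AUC, ensemble test AUC).
    """
    # Model selection uses validation AUC only; test AUC is never used for ranking.
    top = sorted(runs, key=lambda r: r["val_auc"], reverse=True)[:k]
    test_aucs = [roc_auc(test_labels, r["test_scores"]) for r in top]
    # Assumed ensembling rule: average the top-k models' predicted probabilities per example.
    ens_scores = [mean(r["test_scores"][i] for r in top) for i in range(len(test_labels))]
    return mean(test_aucs), stdev(test_aucs), roc_auc(test_labels, ens_scores)
```

Selecting on validation AUC and only then reading off test AUC keeps the test set out of model selection. Under this assumed averaging rule, an ensemble can exceed every individual member, which is consistent with the ensemble rows scoring above the single-model rows in the table.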