Table 6:
To evaluate the effectiveness of the patch-wise attention, we compare the proposed model (gated attention with patches from retrieve_roi) with a variant (uniform) that always assigns equal attention weights to all patches. To investigate the importance of the localization information in the saliency maps, we train another variant (random) that selects patches at random from the input image instead of retrieving them from the saliency maps. We use the GMIC-ResNet-18 model with top t% pooling (t = 3%) as the base model. The performance of the local module ($\hat{y}^{\text{local}}$) is reported. AUC(M) and AUC(B) denote the AUC for the malignant and benign classes, respectively.
| Attention | ROI patches | AUC(M) | AUC(B) |
|---|---|---|---|
| uniform | retrieve_roi | 0.874 ± 0.008 | 0.776 ± 0.007 |
| gated | random | 0.629 ± 0.042 | 0.658 ± 0.011 |
| gated | retrieve_roi | 0.898 ± 0.010 | 0.780 ± 0.008 |
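The contrast between the "uniform" and "gated" rows, and the top t% pooling used by the base model, can be illustrated with a minimal NumPy sketch. It assumes the gated attention formulation of Ilse et al. (2018), with hypothetical parameter names `V`, `U`, `w` and randomly initialized weights. It is not the authors' implementation. `top_t_pooling` averages the largest t-fraction of saliency values. The rounding rule, which keeps at least one value, is an assumption.

```python
import numpy as np


def gated_attention_weights(h, V, U, w):
    """Gated attention over K patch embeddings (after Ilse et al., 2018).

    h: (K, L) patch embeddings; V, U: (D, L) projections; w: (D,) scoring vector.
    These names and shapes are illustrative assumptions. Returns (K,) weights summing to 1.
    """
    tanh_branch = np.tanh(h @ V.T)                  # (K, D)
    sigmoid_gate = 1.0 / (1.0 + np.exp(-(h @ U.T)))  # (K, D)
    scores = (tanh_branch * sigmoid_gate) @ w       # (K,) one score per patch
    scores = scores - scores.max()                  # stabilize the softmax
    e = np.exp(scores)
    return e / e.sum()


def uniform_attention_weights(h):
    """The 'uniform' variant: every one of the K patches gets weight 1/K."""
    k = h.shape[0]
    return np.full(k, 1.0 / k)


def aggregate(h, alpha):
    """Attention-weighted sum of patch embeddings, giving an (L,) vector."""
    return alpha @ h


def top_t_pooling(saliency, t=0.03):
    """Mean of the largest t-fraction of values in a saliency map."""
    flat = np.sort(saliency.ravel())[::-1]          # descending order
    n = max(1, int(round(t * flat.size)))           # assumed rounding rule
    return float(flat[:n].mean())


# Usage: 6 hypothetical patch embeddings with random parameters.
rng = np.random.default_rng(0)
K, L, D = 6, 512, 128
h = rng.standard_normal((K, L))
V = rng.standard_normal((D, L)) * 0.01
U = rng.standard_normal((D, L)) * 0.01
w = rng.standard_normal(D)

alpha_gated = gated_attention_weights(h, V, U, w)
alpha_uniform = uniform_attention_weights(h)
z_gated = aggregate(h, alpha_gated)
z_uniform = aggregate(h, alpha_uniform)
pooled = top_t_pooling(np.arange(100.0).reshape(10, 10), t=0.03)
```

With uniform weights, `aggregate` reduces to the plain mean of the patch embeddings, so no patch can be emphasized over another. The gated weights are data-dependent and can concentrate on the most informative patches.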