Table 3.
Pooling encoder comparisons
| Dataset | Meas. (%) | SIFT BoVW | SIFT LLC | SIFT VLAD | SIFT IFV | VGG-M BoVW | VGG-M LLC | VGG-M VLAD | VGG-M IFV | VGG-M FC |
|---|---|---|---|---|---|---|---|---|---|---|
| DTD | acc | 49.0 ± 0.8 | 48.2 ± 1.4 | 54.3 ± 0.8 | 58.6 ± 1.2 | 61.2 ± 1.3 | 64.0 ± 1.3 | 67.6 ± 0.7 | 66.8 ± 1.5 | 58.7 ± 0.9 |
| OS+R | acc | 30.0 | 30.8 | 32.5 | 39.8 | 41.3 | 45.3 | 49.7 | 52.5 | 41.3 |
| KTH-T2b | acc | 57.6 ± 1.5 | 56.8 ± 2.0 | 64.3 ± 1.3 | 70.2 ± 1.6 | 73.6 ± 2.8 | 74.0 ± 3.3 | 72.2 ± 4.7 | 73.3 ± 4.8 | 71.0 ± 2.3 |
| FMD | acc | 50.5 ± 1.7 | 48.4 ± 2.2 | 54.0 ± 1.3 | 59.7 ± 1.6 | 67.9 ± 2.2 | 71.7 ± 2.1 | 74.2 ± 2.0 | 73.5 ± 2.0 | 70.3 ± 1.8 |
| VOC07 | mAP11 | 51.2 | 47.8 | 56.9 | 59.9 | 72.9 | 75.5 | 76.8 | 76.4 | 76.8 |
| MIT Indoor | acc | 47.7 | 39.2 | 51.0 | 54.9 | 69.1 | 68.9 | 71.2 | 74.2 | 62.5 |
The table compares the orderless pooling encoders BoVW, LLC, VLAD, and IFV applied to either SIFT local descriptors or VGG-M CNN local descriptors (FV-CNN). It also compares pooling convolutional features against the CNN fully connected layers (FC-CNN). The table reports classification accuracy for all datasets except VOC07 and OS+R, for which mAP-11 (Everingham et al. 2007) and mAP are reported, respectively.
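To make the idea of an orderless pooling encoder concrete, the following is a minimal sketch of VLAD encoding with NumPy (the function name `vlad_encode` and the random codebook are illustrative, not from the paper): each local descriptor, whether SIFT or a CNN feature, is assigned to its nearest codebook center, residuals are accumulated per center with spatial order discarded, and the result is power- and L2-normalized.

```python
import numpy as np

def vlad_encode(descriptors, centers):
    """Sketch of VLAD encoding.

    descriptors: (N, D) local descriptors extracted from one image.
    centers:     (K, D) codebook centers (e.g. from k-means).
    Returns a (K * D,) L2-normalized VLAD vector.
    """
    # Hard-assign each descriptor to its nearest center.
    sq_dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    assignments = sq_dists.argmin(axis=1)

    K, D = centers.shape
    v = np.zeros((K, D))
    for k in range(K):
        mask = assignments == k
        if mask.any():
            # Accumulate residuals for cluster k; position information
            # is discarded, which is what makes the encoding orderless.
            v[k] = (descriptors[mask] - centers[k]).sum(axis=0)

    v = v.ravel()
    # Signed square-root (power) normalization, then L2 normalization.
    v = np.sign(v) * np.sqrt(np.abs(v))
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

IFV follows the same pattern but uses soft assignments under a Gaussian mixture model and also accumulates second-order statistics, which roughly doubles the descriptor dimensionality relative to VLAD.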
Bold values indicate the best result achieved among the compared methods.