2016 Jan 9;118:65–94. doi: 10.1007/s11263-015-0872-3

Table 3.

Pooling encoder comparisons

| Dataset | Meas. (%) | SIFT BoVW | SIFT LLC | SIFT VLAD | SIFT IFV | VGG-M BoVW | VGG-M LLC | VGG-M VLAD | VGG-M IFV | VGG-M FC |
|---|---|---|---|---|---|---|---|---|---|---|
| DTD | acc | 49.0 ± 0.8 | 48.2 ± 1.4 | 54.3 ± 0.8 | 58.6 ± 1.2 | 61.2 ± 1.3 | 64.0 ± 1.3 | 67.6 ± 0.7 | 66.8 ± 1.5 | 58.7 ± 0.9 |
| OS+R | mAP | 30.0 | 30.8 | 32.5 | 39.8 | 41.3 | 45.3 | 49.7 | 52.5 | 41.3 |
| KTH-T2b | acc | 57.6 ± 1.5 | 56.8 ± 2.0 | 64.3 ± 1.3 | 70.2 ± 1.6 | 73.6 ± 2.8 | 74.0 ± 3.3 | 72.2 ± 4.7 | 73.3 ± 4.8 | 71.0 ± 2.3 |
| FMD | acc | 50.5 ± 1.7 | 48.4 ± 2.2 | 54.0 ± 1.3 | 59.7 ± 1.6 | 67.9 ± 2.2 | 71.7 ± 2.1 | 74.2 ± 2.0 | 73.5 ± 2.0 | 70.3 ± 1.8 |
| VOC07 | mAP-11 | 51.2 | 47.8 | 56.9 | 59.9 | 72.9 | 75.5 | 76.8 | 76.4 | 76.8 |
| MIT Indoor | acc | 47.7 | 39.2 | 51.0 | 54.9 | 69.1 | 68.9 | 71.2 | 74.2 | 62.5 |

The table compares the orderless pooling encoders BoVW, LLC, VLAD, and IFV applied to either SIFT local descriptors or VGG-M CNN local descriptors (FV-CNN). It also compares pooling convolutional features against the CNN fully connected layers (FC-CNN). Classification accuracy is reported for all datasets except VOC07 and OS+R, for which mAP-11 (Everingham et al. 2007) and mAP are reported, respectively.

Bold values indicate the best results achieved by the compared methods.
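As a concrete illustration of the orderless pooling compared in the table, the sketch below implements VLAD encoding with NumPy: local descriptors (e.g. SIFT or CNN convolutional features) are assigned to their nearest codebook centers, per-center residuals are accumulated, and the result is flattened and normalized. This is a minimal sketch, not the authors' implementation; the function name `vlad_encode` and the use of signed square-root plus L2 normalization (a common choice for VLAD/IFV encodings) are assumptions for illustration.

```python
import numpy as np

def vlad_encode(descriptors, centers):
    """Orderless VLAD pooling of local descriptors (illustrative sketch).

    descriptors: (N, D) array of local descriptors from one image.
    centers:     (K, D) k-means codebook.
    Returns a flattened (K*D,) encoding, invariant to descriptor order.
    """
    # Hard-assign each descriptor to its nearest codebook center.
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    assign = d2.argmin(axis=1)

    # Accumulate residuals (descriptor minus center) per center.
    K, D = centers.shape
    v = np.zeros((K, D))
    for k in range(K):
        sel = descriptors[assign == k]
        if len(sel):
            v[k] = (sel - centers[k]).sum(axis=0)

    # Flatten, then apply signed square-root and L2 normalization
    # (assumed normalization, common for VLAD/IFV encodings).
    v = v.reshape(-1)
    v = np.sign(v) * np.sqrt(np.abs(v))
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```

Because the encoding sums residuals per center, shuffling the descriptors leaves the output unchanged, which is what makes the pooling "orderless".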