Table 3.
Pooling encoder comparisons
| Dataset | Meas. (%) | SIFT BoVW | SIFT LLC | SIFT VLAD | SIFT IFV | VGG-M BoVW | VGG-M LLC | VGG-M VLAD | VGG-M IFV | VGG-M FC |
|---|---|---|---|---|---|---|---|---|---|---|
| DTD | acc | 49.0 ± 0.8 | 48.2 ± 1.4 | 54.3 ± 0.8 | 58.6 ± 1.2 | 61.2 ± 1.3 | 64.0 ± 1.3 | 67.6 ± 0.7 | 66.8 ± 1.5 | 58.7 ± 0.9 |
| OS+R | acc | 30.0 | 30.8 | 32.5 | 39.8 | 41.3 | 45.3 | 49.7 | 52.5 | 41.3 |
| KTH-T2b | acc | 57.6 ± 1.5 | 56.8 ± 2.0 | 64.3 ± 1.3 | 70.2 ± 1.6 | 73.6 ± 2.8 | 74.0 ± 3.3 | 72.2 ± 4.7 | 73.3 ± 4.8 | 71.0 ± 2.3 |
| FMD | acc | 50.5 ± 1.7 | 48.4 ± 2.2 | 54.0 ± 1.3 | 59.7 ± 1.6 | 67.9 ± 2.2 | 71.7 ± 2.1 | 74.2 ± 2.0 | 73.5 ± 2.0 | 70.3 ± 1.8 |
| VOC07 | mAP11 | 51.2 | 47.8 | 56.9 | 59.9 | 72.9 | 75.5 | 76.8 | 76.4 | 76.8 |
| MIT Indoor | acc | 47.7 | 39.2 | 51.0 | 54.9 | 69.1 | 68.9 | 71.2 | 74.2 | 62.5 |
The table compares the orderless pooling encoders BoVW, LLC, VLAD, and IFV applied to either SIFT local descriptors or VGG-M CNN local descriptors (FV-CNN). It also compares pooling convolutional features against the CNN fully connected layers (FC-CNN). The table reports classification accuracy for all datasets except VOC07 and OS+R, for which mAP-11 (Everingham et al. 2007) and mAP are reported, respectively.
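To make the idea of an orderless pooling encoder concrete, the following is a minimal sketch of VLAD encoding with NumPy (the function name `vlad_encode` and the random codebook are illustrative, not from the paper): each local descriptor, whether SIFT or a CNN feature, is assigned to its nearest codebook center, residuals are accumulated per center with spatial order discarded, and the result is power- and L2-normalized.

```python
import numpy as np

def vlad_encode(descriptors, centers):
    """Sketch of VLAD encoding.

    descriptors: (N, D) local descriptors extracted from one image.
    centers:     (K, D) codebook centers (e.g. from k-means).
    Returns a (K * D,) L2-normalized VLAD vector.
    """
    # Hard-assign each descriptor to its nearest center.
    sq_dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    assignments = sq_dists.argmin(axis=1)

    K, D = centers.shape
    v = np.zeros((K, D))
    for k in range(K):
        mask = assignments == k
        if mask.any():
            # Accumulate residuals for cluster k; position information
            # is discarded, which is what makes the encoding orderless.
            v[k] = (descriptors[mask] - centers[k]).sum(axis=0)

    v = v.ravel()
    # Signed square-root (power) normalization, then L2 normalization.
    v = np.sign(v) * np.sqrt(np.abs(v))
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

IFV follows the same pattern but uses soft assignments under a Gaussian mixture model and also accumulates second-order statistics, which roughly doubles the descriptor dimensionality relative to VLAD.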
Bold values indicate the best result achieved among the compared methods.