Fig. 5.
Per-class classification accuracy on the DTD dataset, comparing three local image descriptors: SIFT, VGG-M, and VGG-VD. For all three local descriptors, BoVW encoding with 4096 visual words was used. Classes are sorted by increasing BoVW-CNN-VD accuracy, which is reported next to each bar.
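The two operations behind this figure can be sketched as follows. The first is BoVW encoding, where each local descriptor is hard-assigned to its nearest visual word and the image is represented by the resulting histogram. The second is per-class accuracy, with classes sorted in increasing order. This is a minimal illustration under stated assumptions. It does not specify how the 4096-word vocabulary was learned or which classifier was used. The helper names `bovw_histogram` and `per_class_accuracy`, the squared Euclidean distance, and the L1 normalisation are all hypothetical choices made for the example.

```python
from collections import defaultdict


def bovw_histogram(descriptors, vocabulary):
    """Hypothetical helper: hard-assign each local descriptor to its nearest
    visual word (squared Euclidean distance, an assumption) and return an
    L1-normalised histogram with one bin per visual word."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        nearest = min(
            range(len(vocabulary)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, vocabulary[k])),
        )
        hist[nearest] += 1.0
    total = sum(hist)
    # L1 normalisation is an illustrative choice, not taken from the figure.
    return [h / total for h in hist] if total else hist


def per_class_accuracy(y_true, y_pred):
    """Hypothetical helper: fraction of correctly classified images per class,
    returned as (class, accuracy) pairs sorted by increasing accuracy."""
    correct = defaultdict(int)
    count = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        count[t] += 1
        correct[t] += int(t == p)
    acc = {c: correct[c] / count[c] for c in count}
    return sorted(acc.items(), key=lambda item: item[1])


# Toy usage: a 3-word vocabulary of 2-D points stands in for the 4096-word
# vocabulary over SIFT or CNN descriptors used in the figure.
vocab = [[0, 0], [10, 0], [0, 10]]
print(bovw_histogram([[1, 1], [9, 0], [0, 11], [0.5, 0]], vocab))
print(per_class_accuracy(["banded", "banded", "dotted", "dotted"],
                         ["banded", "dotted", "dotted", "dotted"]))
```

The same histogram code works unchanged for SIFT, VGG-M, or VGG-VD descriptors, because only the descriptor dimensionality changes. That is why the three descriptors can be compared under a single BoVW encoder.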