2014 Apr 10;4:4636. doi: 10.1038/srep04636

Figure 3. Illustration of grid spacing and DSIFT feature extraction.

For consistency, we extract the same number of features from every vocabulary, training, and test image. The image locations from which the features are extracted are fixed by a regular grid. A sparse grid, which yields few features, may miss important image information, while a dense grid, which yields many features, is computationally demanding and may produce redundant information. The DSIFT descriptors extracted at the grid locations are histograms that combine local gradient orientations and magnitudes from a neighborhood around each keypoint; the size of this neighborhood is set by the bin size. More precisely, the descriptor is a histogram of gradient location and orientation, where location is quantized into a 4 × 4 grid of spatial bins and the gradient angle is quantized into 8 orientations, one for each of the cardinal and intercardinal directions. The resulting descriptor is a normalized vector of 4 × 4 × 8 = 128 elements.
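The grid layout and the 4 × 4 × 8 histogram structure described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation and not any library's DSIFT: the function names `grid_locations` and `dsift_like_descriptor`, the `step` parameter, and the nearest-bin voting are all assumptions made for illustration. The Gaussian weighting and bin interpolation used by full SIFT implementations are deliberately omitted.

```python
import math


def grid_locations(width, height, step, bin_size, num_cells=4):
    """Return (x, y) descriptor centers on a regular grid.

    Hypothetical helper: keeps every descriptor patch, plus the one-pixel
    margin needed for central differences, fully inside the image.
    """
    half = num_cells * bin_size // 2  # half-width of the descriptor support
    xs = range(half + 1, width - half, step)
    ys = range(half + 1, height - half, step)
    return [(x, y) for y in ys for x in xs]


def dsift_like_descriptor(image, cx, cy, bin_size, num_cells=4, num_orient=8):
    """Simplified SIFT-style descriptor centered at (cx, cy).

    `image` is a list of rows of grayscale floats. Each pixel votes with its
    gradient magnitude into one spatial cell (4 x 4 by default) and one
    orientation bin (8 by default, centered on multiples of 45 degrees).
    """
    half = num_cells * bin_size // 2
    side = num_cells * bin_size
    bin_width = 2.0 * math.pi / num_orient
    hist = [0.0] * (num_cells * num_cells * num_orient)

    for dy in range(side):
        for dx in range(side):
            x = cx - half + dx
            y = cy - half + dy
            # Central-difference image gradient.
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % (2.0 * math.pi)
            # Nearest orientation bin, wrapping 360 degrees back to bin 0.
            orient = int(angle / bin_width + 0.5) % num_orient
            cell = (dy // bin_size) * num_cells + (dx // bin_size)
            hist[cell * num_orient + orient] += magnitude

    # L2-normalize so the descriptor has unit length.
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


if __name__ == "__main__":
    # Synthetic 32 x 32 horizontal ramp; every gradient points along +x.
    ramp = [[float(x) for x in range(32)] for _ in range(32)]
    locations = grid_locations(32, 32, step=8, bin_size=4)
    descriptors = [dsift_like_descriptor(ramp, x, y, bin_size=4)
                   for x, y in locations]
    print(len(locations), len(descriptors[0]))
```

In this sketch, a smaller `step` gives a denser grid and therefore more descriptors per image, while a larger `bin_size` widens the neighborhood each descriptor summarizes. The descriptor length stays at 128 in both cases.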