An efficient image descriptor for image classification and CBIR

Ashkan Shakarami; Hadis Tarrah

doi:10.1016/j.ijleo.2020.164833

. 2020 May 4;214:164833. doi: 10.1016/j.ijleo.2020.164833

An efficient image descriptor for image classification and CBIR

Ashkan Shakarami ^a,^*, Hadis Tarrah ^b

PMCID: PMC7198219 PMID: 32372771

Abstract

Pattern recognition and feature extraction of images always have been important subjects in improving the performance of image classification and Content-Based Image Retrieval (CBIR). Recently, Machine Learning and Deep Learning algorithms are utilized widely in order to achieve these targets. In this research, an efficient method for image description is proposed which is developed by Machine Learning and Deep Learning algorithms. This method is created using combination of an improved AlexNet Convolutional Neural Network (CNN), Histogram of Oriented Gradients (HOG) and Local Binary Pattern (LBP) descriptors. Furthermore, the Principle Component Analysis (PCA) algorithm has been used for dimension reduction. The experimental results demonstrate the superiority of the offered method compared to existing methods by improving the accuracy, mean Average Precision (mAP) and decreasing the complex computation. The experiments have been run on Corel-1000, OT and FP datasets.

Keywords: Pattern recognition, Image descriptor, Machine learning, Convolutional neural network (CNN), Image classification, Content-based image retrieval (CBIR)

1. Introduction

In recent years, because of the wide spread use of the Internet and the massive use of audio-visual information in digital format for communications, designing the systems for describing the content of multimedia information in order to seek and classify them is really important. In computer vision, image descriptors describe elementary characteristics such as shape, color, texture or motion of images, which are visual features of images [1,2].

Offering new image descriptors has been active research area and will help in increasing the performance of many tasks in computer vision. Some descriptors such as HOG, LBP, SURF and SIFT have been proposed, so far [[3], [4], [5], [6], [7]]. These descriptors are usually used in Machine Learning for pattern recognition and feature extraction [[8], [9], [10], [11]]. Each of these descriptors has disadvantages, like large dimension of feature vector and considering only certain features such as texture. To overcome these problems, an efficient image descriptor is presented in this research. This descriptor is created using combination of HOG, LBP and improved AlexNet CNN. Furthermore, for dimension reduction the PCA has been applied. When the proposed descriptor is used for image classification and CBIR, it provides benefits which are divided into following items:

•
Higher accuracy and mAP compare with other works;
•
Overcoming the problem of high dimension descriptors such as HOG;
•
Sensitivity to intra-class as well as inter-class variety;
•
High performance on imbalanced databases.

The rest of this paper is organized as:

The related works is introduced in section 2. The proposed method is discussed in section 3. The experimental results are reported in section 4 and these results are compared with existing experimental results in this section. Finally, the conclusion and future work are presented in section 5.

2. Related works

In recent years, image descriptors are usually used for many cases such as image classification and CBIR [12,13]. Some researchers utilized combination of descriptors to propose new ones in order to increase efficiency and use it for special cases [14]. Some of them are described as follow:

[15] has offered a descriptor using combination of HOG and LBP. The results of this approach have showed that, it is an efficient descriptor for object detection, but the unequal dimension is the main problem of that. This descriptor is affected by HOG. Suppose the dimension of input image is 227 × 227 × 3; in this case, the feature vector’s dimension of HOG descriptor is 1 × 26244, whereas this dimension for LBP descriptor is 1 × 59 and the combined descriptor has more affected by the larger descriptor. Whatever this difference become larger, the smaller descriptor lose its effect [16,17].

[18] similar to [15], has suggested a way that feature vector’s dimension of HOG and LBP descriptors (1 × 181,535) are combined and are reduced to 1 × 2824 using the variance and improved PSO algorithm, in which LBP descriptor has more influence. In our proposed method, for overcoming mentioned problems a solution is presented for dimension reduction using PCA algorithm.

Convolutional Neural Networks can be used for feature extraction, and the extracted features are utilized in cases such as classification, retrieval or detection. [19] proposed a method in which the fusion of extracted features by two networks used for face detection. In some works the combination of Convolutional Neural Networks and descriptors is used to create a new image descriptor. For instance [20], has suggested a method that image features are extracted using the HOG and SURF descriptors, then the features of each descriptor are sent to convolutional layers, and the feature vectors with 1 × 2016 and 1 × 1024 dimensions are created. After that, these features are combined and their dimension is reduced by Fully Connected (FC) layers and eventually are used for classification. In this way, such as [15,18], because of the inequality of the dimensions of HOG and SURF descriptors, their impacts on the outcome of classification will be different.

Considering the weaknesses of the mentioned existing methods, in this study a new descriptor has been offered to overcome these problems.

3. Proposed method

In Fig. 1 , the schematic of proposed method has been shown. According to this figure, in the first step, the images are read and resized to 227 × 227 × 3. Then, each image is sent to deep feature extractor (an improved AlexNet CNN) and handcrafted descriptors such as HOG and LBP simultaneously. On the one hand, the improved AlexNet CNN processes the images and recognizes its patterns, and finally proposes a feature vector with the dimension of 1 × 64 [21]. On the other hand, the HOG and LBP descriptors extracts features. Then the PCA algorithm is used to reduce the dimensions of produced features by HOG descriptor.

In this paper, for having equal effects of HOG and LBP descriptors on final feature descriptor, the dimensions of HOG descriptor has been reduced to 1 × 59 by PCA algorithm. After that, the HOG-PCA and LBP feature’s vectors are combined and a new handcrafted feature vector with dimension of 1 × 118 is created. In order to match the dimension of handcrafted feature vector with deep feature vector the PCA is applied on handcrafted feature vector and 64 of 118 features are selected and a Handcrafted-PCA feature vector with a dimension of 1 × 64 is created. Finally, the deep feature vector and Handcrafted-PCA feature vector are combined and an efficient image descriptor with a dimension of 1 × 128 is created.

In this research, the AlexNet CNN, HOG, LBP and PCA are chosen by authors due to some reasons. These reasons are described as follow:

1)
The Feature selection technique which is applied by PCA algorithm, has been used for several reasons such as reducing the computation volume and training times, simplification of models and etc. [22].
2)
The LBP descriptor is a strong feature for texture classification. Combining the LBP with HOG descriptor, improves the performance of detection considerably on some datasets [15,23].
3)
The HOG descriptor is computed on a dense grid of uniformly spaced cells and for improving the accuracy, uses overlapping local contrast normalization [62]. The advantages of this descriptor is that it executes on local cells which is invariant to geometric and photometric transformation, except for object orientation [4,24].
4)
The improved AlexNet CNN detects the important and high level features automatically without any human supervision [21].

4. Simulation and comparison results

The proposed method has been implemented with Matlab 2018b software, a computer system with 6GB RAM, a NVIDIA GeForce 920 M graphics processor unit (GPU), and an Intel® Core™ i5-7200U @ 2.50 GHz central processor unit (CPU).

In this investigation, the accuracy, mean average precision (mAP) and recall criteria have been measured in order to evaluate the proposed method for classification and CBIR [[25], [26], [27], [28]] and 3-fold cross-validation has been applied in all experiments (Westerhuis et al 2008). In addition, the Adam algorithm has been used to train the AlexNet CNN [29] and Random forest, SVM and KNN classifiers have been utilized for classification [[30], [31], [32], [33], [34], [35], [36]]. Moreover, for measuring the similarity, Euclidean distance has also been used [37,38]. In the following, used datasets, experiments and comparison results have been demonstrated and described in details.

4.1. Used datasets

In this study, for evaluating the offered way, Corel-1000 (Wang), OT and FP datasets have been used. In Table 1 , the number of categories and number of images per categories have been shown. These datasets are described in details as below:

Table 1.

The number of categories and images per categories for used datasets.

Datasets	Number of categories	Number of images
Corel-1000	10	1000
OT	8	2688
FP (Catlech-101)	5	380

Method	Train data (Accuracy ± Standard division)	Test data (Accuracy ± Standard division)
	Accuracy ± Standard division	Accuracy ± Standard division
AlexNet CNN	97.80 ± 0.46	90.10 ± 1.80

Method	Train data (Accuracy ± Standard division)			Test data (Accuracy ± Standard division)
	Classifiers			Classifiers
	Random forest	SVM	KNN	Random forest	SVM	KNN
Proposed method	100	100	100	96 ± 0.62	94.70 ± 1.08	95.20 ± 1.04

Methods	mAP ± Standard division
	5-top	10-top	All relative images
AlexNet CNN	93.14	91.87	75.48
Proposed method	96.02 ± 1.94	95.80 ± 1.82	91.65 ± 1.22

Methods	mAP ± Standard division
Methods	5-top	10-top	All relative images
AlexNet CNN	93 ± 0.48	92.30 ± 0.65	71.26 ± 1.58
Proposed method	94.22 ± 0.36	93.91 ± 0.37	85.64 ± 0.55

Dataset	Method	Accuracy	Training rate%, Test rate%
Corel-1000	Fuzzy Topological [53]	62.20	33, 67
	Color Histogram + Fuzzy Neural Network [54]	73.40	–
	Proposed method	96	3-fold cross validation
OT	Fusion features [55]	63.89	–
	Co-occurance matrix + Bayesian classifier [43]	89	10-fold cross validation
	Hybrid generative + Dense SIFT [56]	91.08	50, 50
	Proposed method	93.86	3-fold cross validation
FP	Co-occurance matrix + KNN [43]	87.20	10-fold cross validation
FP	Proposed method	89.74	3-fold cross validation

Dataset	Method	mAP	Training rate%, Test rate%	Dimension of proposed method
Corel-1000	CCM + DBPSP [63]	76.10	90, 10	1 × 21
	Block Truncation Coding [57]	77.90	–	1 × 96
	HOG + SURF [64]	80.61	70, 30	–
	Dense SIFT [65]	84.20	50, 50	1 × 128
	SURF + FREAK [66]	86	70, 30	1 × 128
	SURF + MSER [67]	88	70, 30	1 × 128
	Fusion features [55]	83.50	–	–
	AlexNet CNN [44]	93.80	–	1 × 4096
	Proposed method	95.80	3-fold cross validation	1 × 128
OT	Co-occurance matrix [43]	76.39	10-fold cross validation	1 × 9
	Color moment + Angular Radial Transform + Edge histogram [58]	50.59	85, 15	–
	Relevance Feedback [59]	79	–	–
	Proposed method	93.91	3-fold cross validation	1 × 128
FP	Co-occurance matrix [43]	78.83	10-fold cross validation	1 × 9
FP	Proposed method	86.86	3-fold cross validation	1 × 128

PERMALINK

An efficient image descriptor for image classification and CBIR

Ashkan Shakarami

Hadis Tarrah

Abstract

1. Introduction

2. Related works

3. Proposed method

Fig. 1.

4. Simulation and comparison results

4.1. Used datasets

Table 1.

4.1.1. Corel-1000 dataset

4.1.2. Oliva and Torralba (OT) dataset

4.1.3. FP (Caltech-101) dataset

4.2. Simulations and evaluation of proposed method for image classification

4.2.1. Simulation and evaluation of proposed method on Corel-1000 dataset for image classification

Table 2.

Table 3.

4.2.2. Simulation and evaluation of proposed method on OT dataset for image classification

Table 4.

Table 5.

4.2.3. Simulation and evaluation of proposed method on FP dataset for image classification

Table 6.

Table 7.

4.3. Simulations and evaluation of proposed method for Contend-based Image Retrieval (CBIR)

4.3.1. Simulation and evaluation of proposed method on Corel-1000 dataset for CBIR

Table 8.

4.3.2. Simulation and evaluation of proposed method on OT dataset for CBIR

Table 9.

4.3.3. Simulation and evaluation of proposed method on FP dataset for CBIR

Table 10.

4.4. The proposed method’s mAP-Mean recall plots

Plot 1.

Plot 2.

Plot 3.

4.5. Visual representation of proposed method's retrieval performance

4.5.1. Visual representation of proposed method's retrieval for Corel-1000 dataset

Table 11.

4.5.2. Visual representation of the proposed method's retrieval for OT dataset

Table 12.

4.5.3. Visual representation of proposed method's retrieval for FP dataset

Table 13.

4.6. Comparison results

Table 14.

Table 15.

5. Conclusion and future works

Declaration of Competing Interest

Contributor Information

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases