Static hand gesture recognition based on hierarchical decision and classification of finger features

Yunfeng Li; Pengyue Zhang

doi:10.1177/00368504221086362

. 2022 Mar 17;105(1):00368504221086362. doi: 10.1177/00368504221086362

Static hand gesture recognition based on hierarchical decision and classification of finger features

Yunfeng Li ^1,^2,^✉, Pengyue Zhang ¹

PMCID: PMC10358564 PMID: 35296188

Abstract

Considering that the distinctions among static hand gestures are the difference between fingers sticking out, a method of grouping and classifying hand gestures step by step by using the information of the quantity, direction, position and shape of the outstretched fingers was proposed this paper. Firstly, the gesture region was segmented by using the skin color information of the hand, and the gesture direction was normalized by using the direction information of the gesture contour lines. Secondly, the finger was segmented one by one by using convex decomposition in the hand gesture image based on the convex characteristic of the gesture shape. Thirdly, the features of quantity, direction, position and shape of the segmented fingers were extracted. Lastly, a hierarchical decision classifier embedded with deep sparse autoencoders was constructed. The quantity of fingers was used to divide the gesture images into groups first, then the direction, position and shape features of the fingers were used to subdivide and recognize gestures within each group. The experimental results show that the proposed method is robust as lighting, direction and scale changes and significantly superior to the traditional method both in the recognition rate and the recognition stability.

Keywords: gesture recognition, convex decomposition, finger feature, hierarchical classification, deep learning

Introduction

Nowadays, almost all man-machine interactions are done by hands, such as using hands to input data into or read information from the devices by manipulating the keyboard, mouse or touch screen, using data gloves to perceive grasping, movement, rotation and other motions of the hand for the interaction with the virtual reality system. Traditional ways of man-machine interaction need tedious and boring operations. Even complex hardware systems are required. All these have brought a great inconvenience to the users. The development of artificial intelligence technology provides conditions for the vision based non-contact human-computer interaction. Vision based hand gesture recognition technology relieves the constraints and limits of traditional man-machine interaction on the users dramatically, and it is helpful for the natural expression of human gestures. Vision based hand gesture recognition technology has wide application prospects in the field of human-computer interaction,^1,2 human-robot interaction,^3,4 sign language interaction,^5,6 surgeon-computer interaction,^7,8 smart home appliances,^9,10 virtual reality^11,12 and game interface,^13,14 etc.

Vision based hand gesture recognition technology refers to the technology that performs the processing on images or videos containing hand gestures by using computer vision algorithms, and identifies the messages that users send with different hand gestures further. Hand gesture recognition can be divided into static hand gesture recognition and dynamic hand gesture recognition.^15–17 Static hand gesture recognition obtains the meaning represented by each category of hand gesture through the processing of the hand gesture images according to the combining state of different fingers stretching out, while dynamic hand gesture recognition identifies the meaning expressed by the hand gesture through the processing of the hand gesture videos according to the trajectory, velocity and angle of the hand motion. Although the research on gesture recognition has lasted for decades, it is still an open problem now. Existing gesture recognition methods lack the ability to adapt to the changes of illumination, direction and scale. The essence of the problem is that neither method can fully extract the changeable characteristics of gestures. The motivation of this paper is to explore a novel hand gesture representation and recognition method which is robust enough to lighting, direction and scale changes.

Static hand gesture conveys a different meaning by sticking out different combinations of fingers. Typical static hand gestures are shown in Figure 1. The distinctions among static hand gestures are the difference between fingers sticking out, so the essence of static hand gesture recognition lies in the judgment of the state of the fingers stretching out. In the case of large category quantity of hand gestures, they can be grouped and classified step by step, by using the information of the quantity, direction, position and shape of the outstretched fingers. This will gradually narrow the classification scope of the hand gestures, and also perform a classification according to the characteristics of each gesture.

Figure 1. — Typical static hand gestures (a) palm (b) three-left (c) V (d) loose.

A hierarchical decision and classification method for hand gesture recognition by using the finger features was proposed in this paper. Firstly, the gesture region was segmented by using the skin color information of the hand, and the gesture direction was normalized by using the direction information of the gesture contour lines. Secondly, the finger was segmented one by one by using convex decomposition in the hand gesture image based on the convex characteristic of the gesture shape. Thirdly, the features of quantity, direction, position and shape of the segmented fingers were extracted. Lastly, hand gestures were grouped and recognized by using hierarchical decision classifier which is embedded with deep sparse autoencoders. The flow chart of the proposed method is shown in Figure 2. The main contributions of this paper are summarized as following:

A gesture preprocessing method was proposed for the gesture image with unconstrained background.
Specific convex decomposition conditions were established especially for hand gesture decomposition.
A new descriptor based on direction, position and shape for hand gesture was constructed.
A new hierarchical decision classifier embedded with deep sparse autoencoders was proposed for hand gesture recognition.
Special considerations for robustness on lighting, direction and scale variations were included in our method.

Figure 2. — Flow chart of gesture recognition.

The remainder of the paper is organized as following. Section ‘Related work’ reviews the related researches on hand gesture recognition in recent years. The segmentation of the gesture region and the normalization of the gesture direction are introduced in Section ‘Preprocessing of hand gesture image’. The method of finger segmentation based on convex decomposition is explained in detail in Section ‘Convex decomposition of hand gesture’. Extraction methods of the features of the direction, position and shape of the segmented fingers are described in Section ‘Feature extraction of fingers’. The hierarchical decision classifier and corresponding classification method are established in Section ‘Hand gesture classification’. Experimental results and comparisons in different scenarios are demonstrated in Section ‘Experiment and discussion’. Conclusions and future work are given in Section ‘Conclusion’.

Related work

Vision based gesture recognition technology is a challenging frontier in the world and attracting a lot of interest from researchers. The main tasks of static hand gesture recognition include feature extraction and feature classification of the hand gesture images. In the aspect of feature extraction of gestures, studies were mainly carried out on the extraction of geometric features, moment features, contour features, histograms of oriented gradients and wavelet features of hand gesture images in the literature. Lopez-Casado et al.⁷ introduced a hand gesture geometric descriptor based on the distances and orientations of the lines connecting the hand contour convex extrema and concave extrema, and a linear Support Vector Machine classifier was used for hand gesture recognition. Wu et al.¹⁸ presented a hand gesture recognition method based on hand shape. Fingertips and concave points between fingers were detected by using convex hull, and the numbers of fingertips and concave points were used for hand gestures recognition. Dominio et al.¹⁹ extracted four different sets of hand gesture feature, which are the distances of the fingertips from the hand center and the palm plane, the curvature of the hand contour and the geometry of the palm region. Marin et al.²⁰ employed the distances from the hand centroid, the curvature of the hand contour and the convex hull of the hand shape as gesture feature descriptors. The extracted feature sets were used for hand gesture recognition. Park et al.²¹ applied masked Zernike moment features for hand gesture recognition. They presented two categories of masks to eliminate the overlapped information of hand images due to their shape characteristic. The internal mask was used to eliminate overlapped information in hand images, while the external mask was used to weigh the outstanding features of hand images. Priyal and Bora²² presented a hand gesture recognition method by using geometry based normalizations and Krawtchouk moment features. The regions constituting the hand and the forearm were extracted through skin color detection and the anthropometric measures. Rotation normalization was used for aligning the extracted hand and the Krawtchouk moment features were used to represent the hand gestures. Chevtchenko et al.²³ researched the multi-objective optimization problem in hand gesture feature selection. They used Hu moments and Gabor features to represent a hand gesture. Gesture recognition was undertaken by a multilayer perceptron. Both the feature vector and the neural network were tuned by a multi-objective evolutionary algorithm. Elouariachi et al.²⁴ proposed a quaternion Tchebichef moment invariants by using the quaternion algebra for extracting the gesture features. Based on the algebraic properties of the discrete Tchebichef polynomials, the direct derivation of invariants from their orthogonal moments has the robustness against geometrical distortion, noisy conditions and complex background. Ren YY et al.²⁵ proposed a contour based static hand gesture recognition method. They performed direction normalization of hand gesture images by applying a multi-scale weighted histogram of contour direction. The contour direction histogram was weighted by considering the position and direction of each contour point. Time-series curve feature was extracted from the hand contour for hand gesture recognition. Ren Z et al.²⁶ applied near-convex decomposition to obtain the finger clusters in their time-series curves of the hand gesture. A distance metric called Finger-Earth Mover's distance which considers each finger as a cluster and penalizes the empty finger-hole was used for hand gesture recognition. As it only matches fingers while not the whole hand shape, it can better distinguish hand gestures with slight differences. Feng and Yuan²⁷ proposed a static hand gesture recognition method based on gradient direction histogram features. Gradient direction histogram feature is operating on the local grid unit of image, so it can maintain a good invariance on geometric and optical deformation. Ding et al.²⁸ presented a cascade feature of the histograms of oriented gradients and improved local binary pattern to represent the hand gesture. Pansare et al.²⁹ used edge orientation histogram to extract the feature of hand gesture images. Huang et al.³⁰ applied Gabor filtered image for hand gesture representation and PCA method for feature dimensionality reduction. Parvathy and Subramaniam³¹ applied 2D-Discreete Wavelet Transform to reduce the hand gesture image size and Harris corner detector to extract key points of the hand. The geometric contour feature was extracted for a window centered on the detected key point. Liu et al.³² proposed a tortoise model to describe the hand gesture. The tortoise model is composed of the hand gesture features, such as the radius of the palm, the radius of the wrist, the number of fingers, the length and width of fingers, etc. These features were extracted by using the concentric circular scan lines of the palm. Yang et al.³³ employed saliency based feature and sparse representation for hand gesture recognition. The block radial histogram based saliency features of the hand gestures were extracted, and the histogram intersection kernel function was used to map the extracted features into the kernel feature space. Wu et al.³⁴ extracted the length feature, angle feature and angular velocity feature based on hand joint coordinates acquired by the Leap Motion. These features were fed into the long-short time memory recurrent neural network to predict the gesture. Zhang et al.³⁵ proposed a hand gesture recognition algorithm based on geometric features. Some length parameters of the palms were used to divide the hand gestures into different types firstly, and the area-perimeter ratio and effective-area ratio of the hand gesture were extracted for gesture recognition. Zhang et al.³⁶ proposed a distinctive fingertip gradient orientation with finger Fourier descriptor and modified Hu moments for depth gesture images collected by Kinect sensor. A weighted AdaBoost classifier based on finger-earth mover's distance and SVM models was used to realize the hand gesture recognition.

In the aspect of classification and recognition of hand gestures, hand gesture recognition by a neural network based on deep learning shows great potential.³⁷ Oyedotun and Khashman¹⁵ applied deep learning-based networks to the task of recognizing hand gestures. The segmented binary hand gesture images were used to train Convolutional Neural Network and stacked denoising autoencoder respectively, the trained network was tested for hand gesture recognition. Hu et al.³⁸ used a Deep Belief Network which is composed of three Restricted Boltzmann Machines for hand gesture recognition. Tang et al.³⁹ applied Deep Neural Networks to automatically learn features from the hand gesture images that are insensitive to movement, scaling, and rotation. Chen et al.⁴⁰ proposed a pose guided structured region ensemble network for hand pose estimation. This network extracts regions from the feature maps of Convolutional Neural Network and generates optimal and representative features for hand pose estimation. Jain et al.⁴¹ used Shift Invariant Convolutional Deep Structured Neural Learning with Long Short-Term Memory and Bivariate Fully Recurrent Deep Neural Network with Long Short-Term Memory for gestures classification. The proposed method can automatically learn the features and the data to minimize time complexity in gesture recognition. Zhang et al.⁴² proposed a gesture recognition method based on Deconvolutional Single Shot Detector. They used the K-means clustering algorithm to select the aspect ratios of the prior boxes to improve the detection accuracy. By using the transfer learning method, the detection accuracy of small gesture data set is improved. Bhaumik et al.⁴³ proposed a hybrid feature attention network by stacking four multi-scale refined edge extraction modules for hand gesture recognition. The purpose of edge extraction module was to capture the refined edge information of hand gestures by incorporating hybrid feature attention block. Noreen et al.⁴⁴ proposed a 2D CNN model with four parallel streams to classify hand gestures with depth data. Each stream received input samples from the gesture data and the 2-D convolution process was followed in parallel. SoftMax was applied for the final classification. Iglesias et al.⁴⁵ specifically designed a CNN network with a small architecture for using in computationally limited devices. The CNN network adopted a Darknet reference model which has high speed on the detection stage while having simple network architecture. Kowdiki and Khaparde⁴⁶ developed a dynamic hand gesture segmentation and deep learning-based strategy for gesture recognition. The segmentation of gesture was performed by adaptive Hough transform in which the theta value was optimized by the Whale Optimization Algorithm. Segmented gesture images were classified by the optimized Deep CNN. Al-Hammadi et al.^47,48 proposed a system for dynamic hand gesture recognition using multiple deep learning architectures for hand segmentation, local and global feature representations, and sequence feature globalization and recognition. Two 3DCNN instances were used separately for learning the fine-grained features of the hand shape and the coarse-grained features of the global body configuration.

Human hand is composed of palm, fingers and joints, and the joints have more than 20 degrees of freedom. The acquisition of hand gesture images is conducted in unconstrained environments.⁴⁹ There are physiological differences among individual hands. In addition, there exist the interference from changes of lighting, shielding, background, direction, scale, position and angle of view, all these cause the patterns of hand gesture images to be very complicated. Although there have been many researches on hand gesture recognition, the robustness of the existing hand gesture recognition method is far from the needs of the practical application.⁵⁰

Preprocessing of hand gesture image

The purpose of preprocessing is to segment the hand region from the captured hand gesture image and conduct direction correction for the hand gesture. The skin area was segmented by Bayesian decision in YCrCb color space firstly. Then, the center point of the hand gesture was determined by using the maximum inscribed circle in the gesture region, and the forearm was removed based on the obtained center point. Lastly, Hough transform was used to detect the direction of the linear features on the hand contour and the gesture was rotated to the vertical direction according to the direction of the detected linear features.

Skin area segmentation

The commonly used skin area segmentation is completed by threshold processing. This fixed threshold segmentation method cannot adapt to the influence of illumination changes. So posterior probability based Bayesian decision method was used to segment the skin color area in the hand gesture image. Considering that the skin color features have excellent clustering characteristics in YCrCb color space, Cr and Cb components of hand gesture images in YCrCb color space were used to form skin color feature vector. Firstly, the prior probability of the feature vector of skin color region and background region was established respectively. Then, according to the values of Cr and Cb components in different regions of the image, the skin color gesture region was segmented by Bayesian decision formula.

As segmenting the skin area, the pixels in the image were divided into two categories, namely skin color area pixels and background area pixels. They were labeled as category 1 and category 2 respectively. Select some sample images from the hand gesture image database for building Bayesian model. In YCrCb space of the hand gesture image, mark the pixel points of the skin area and the pixel points of the background area with category labels respectively. Establish the prior probability of the feature vectors of the skin area and background area by using Cr and Cb components of the pixels.

Suppose samples of category 1 in the labeled pixels are $(x_{1}^{(1)}, x_{2}^{(1)}, \dots, x_{n}^{(1)})$ . The samples of category 2 are $(x_{1}^{(2)}, x_{2}^{(2)}, \dots x_{m}^{(2)})$ . Each sample contains two features. They are the Cr and Cb components of the pixel respectively. According to the known sample images, calculate the prior probabilities of category 1 and category 2:

P (Y = 1) = \frac{n}{n + m}

(1)

P (Y = 2) = \frac{m}{n + m}

(2)

where n is the number of pixels labeled as category 1 in all sample images, m is the number of pixels labeled as category 2 in all sample images, and Y is the pixel category label.

Then, calculate the conditional probability as $C r = a$ and $C b = b$ in each category of the samples respectively:

P (C r = a | Y = 1) = \frac{n^{a}}{n}

(3)

P (C b = b | Y = 1) = \frac{n^{b}}{n}

(4)

P (C r = a | Y = 2) = \frac{m^{a}}{m}

(5)

P (C b = b | Y = 2) = \frac{m^{b}}{m}

(6)

where $n^{a}$ and $n^{b}$ is the number of pixels in the sample category 1 as $C r = a$ and $C b = b$ respectively, $m^{a}$ and $m^{b}$ is the number of pixels in the sample category 2 as $C r = a$ and $C b = b$ respectively.

As for the input pixel $x^{C r = A, C b = B}$ , the output probability of category $Y = C (C \in 1, 2)$ is

P (Y = C | x^{C r = A, C b = B}) = P (x^{C r = A} | Y = C) \cdot P (x^{C b = B} | Y = C) \cdot P (Y = C)

(7)

According to Bayesian decision, the output category of the input pixel is:

Y = {\begin{matrix} 1, P (Y = 1 | X) > P (Y = 2 | X) \\ 2, P (Y = 1 | X) < P (Y = 2 | X) \end{matrix}

(8)

A total of 600 images were selected from the sample dataset to build the Bayesian model for skin color segmentation. In order to obtain the representative model resistant to varying lighting conditions, these hand gesture images were captured in a shaded environment, natural lighting, indoor lighting and artificial lighting four different illumination conditions respectively. According to the component $C r$ and $C b$ of the pixels of the original image in Figure 1, the pixels are classified into skin pixels and background pixels by formula (7) and (8). Set the values of the skin pixels to be 1 and the background pixels to be 0, so the skin area segmented binary image can be obtained as shown in Figure 3.

Figure 3. — Skin area segmented results (a) palm (b) three-left (c)V (d) loose.

Hand region segmentation

After skin color area segmentation, the obtained binary gesture image may also contain non-gesture areas such as face, wrist and arm, etc. These areas are redundant for gesture recognition based on hand; their presence will affect and interfere with the extraction and recognition of hand gesture features. Therefore, areas in the image that are not related to hand gestures need to be removed. The hand gesture area was segmented by the following steps. Traverse the whole hand gesture image and judge the connectivity of the 3 by 3 neighborhood of each pixel, set the connected pixels belong to the same connected region. So some connected areas in the image can be obtained which correspond to the hand region, face region and other regions respectively. Calculate the perimeter, area and roundness features of each connected area. According to the value of these features, retain the hand gesture area and remove the other areas. The processed results are shown in Figure 4.

Figure 4. — Connectivity area segmented results (a) palm (b) three-left (c) V (d) loose.

The segmented hand gesture area of the image still contains the arm part. The maximum inner circle of the hand gesture area was used to determine the hand gesture center, and the arm part was removed by using this center point as the reference. All pixels in the hand gesture area in the binary image are marked as set I. Take any point in the gesture area as the circle center, draw a circle with the radius r, and mark all pixels inside the circle as the set $I_{c l}$ . The maximum inner circle $C$ of the hand gesture area should satisfy the following conditions:

C = {(x_{c}, y_{c}, r) | I_{c l} \subset I, r \to max}

(9)

where $(x_{c}, y_{c})$ is the coordinates of the circle center, r is the circle radius. Traverse the position of the circle center and change the radius of the circle in the hand gesture area, and the result of Figure 5 (c) can be obtained as the formula (9) is satisfied. Thus, the inner circle center coordinates $(x_{c}, y_{c})$ and the maximum inner circle center radius $r_{m}$ are obtained. Draw the lower half circle with the coordinates $(x_{c}, y_{c})$ as the center and the value $1.5 r_{m}$ as the radius. Set the values of the pixels outside the lower left half circle to be 0, so the wrist and arm area are removed as shown in Figure 5 (d).

Figure 5. — Arm removing process (a) Traverse circle position (b) Change circle radius (c) Maximum inner circle (d) Arm removed.

Gesture direction normalization

Hand gesture interaction is carried out under unconstrained conditions, so the gesture direction in the acquired image shows certain arbitrariness. In the process of hand gesture recognition, in order to facilitate the comparison between different hand gesture images, it is necessary to unify all gestures into the same direction by rotating the gesture image with a certain angle. The gesture direction is mainly embodied in the gesture contours, so the needed rotation angle for correction was obtained through calculating the contour direction. Thus all gesture directions can be normalized by rotating the images.

Considering that the contours on both sides of the fingers and the palms show the characteristics of a straight line, Hough transform was used to detect the direction of the linear features on the gesture contours. Then the needed correction angle can be determined according to the average direction of the detected lines. For a straight line in the pixel coordinate space $x - y$ , the equation can be expressed as:

ρ = x \cos θ + y \sin θ

(10)

where $ρ$ and $θ$ are the parameters of the straight line.

According to the above equation, for every point on a straight line in the $x - y$ coordinate space, there exists a corresponding curve in the $ρ - θ$ parameter space. All these curves intersect at the same point $(ρ, θ)$ , and this is the value of the parameters $(ρ, θ)$ of the linear equation in the $x - y$ coordinate space.

When detecting straight lines on the gesture contours, a two-dimensional accumulative array of $ρ - θ$ parameter space was established, and the element of each $(ρ, θ)$ position was denoted as $A r (ρ, θ)$ . For each pixel point $(x, y)$ on the gesture contour, the $ρ$ value was calculated according to equation (9) as $θ$ changing continuously. For each $(ρ, θ)$ , add 1 to the element $A r (ρ, θ)$ in the corresponding position. When all the pixels in the $x - y$ coordinate space have been calculated, the $(ρ, θ)$ values corresponding to the element with the largest values in the accumulative array are the parameters in equation (10) of the line. The main direction of the gesture can be obtained by averaging the directions of the detected lines, and then rotate the gesture to the vertical direction according to the value of the main direction. The boundary of the gesture area was obtained by the grayscale projection of the obtained image, and the rectangular portion tightly including the hand area was extracted by using the position of the boundary. The preprocessed hand gesture images are shown in Figure 6.

Figure 6. — Direction normalized results (a) palm (b) three-left (c) V (d) loose.

Convex decomposition of hand gesture

Convex decomposition²⁶ is the process of decomposing the convex shape from the original shape of the image. The purpose of the convex decomposition of hand gesture is to separate the fingers from the gesture silhouette, and the information of the decomposed fingers is used to construct the feature vector for hand gesture recognition. In order to perform near-convex decomposition of hand gesture in the image, edge detection is carried out on the pre-processed hand gesture image to obtain the contour of the hand. Firstly, according to the physiological characteristics of the human hand silhouette, the candidate cutting line required for the convex decomposition was obtained. Then, the optimal cutting line for each finger in the gesture was determined according to the convexity of the decomposed finger shape and the number of spacing contour points between the two endpoints of the candidate cutting line.

Determination of candidate cut lines

In the process of convex decomposition of the hand gesture, in order to decompose the fingers and reduce the calculation, points were taken on the gesture contour at intervals. Starting from the contour point directly below the center of the gesture region, the contour points were numbered clockwise. As shown in Figure 7, suppose $V_{1}$ , $V_{2}$ , $V_{3}$ and $V_{4}$ are 4 contour points in the interval contour point set. When the connecting line between the two contour points is within the gesture contour or overlaps with the contour, the two contour points are called a visible pair. In Figure 7, $V_{2} V_{4}$ is a visible pair, because the connecting line between the two contour points is within the gesture contour, while $V_{1} V_{2}$ and $V_{2} V_{3}$ are not, because all or part of their connecting line is outside the gesture contour.

Figure 7. — Visible pair for convex decomposition.

Because the width of the finger contour is narrow, so the cutting line used in convex decomposition of the hand gesture is shorter. The candidate cut lines of convex decomposition are the connecting lines between the visible pairs conforming to the short cut rule. To find the visible pairs that conform to the short cut rule, calculate the distance between each pair of visible pairs in the interval contour point set, and normalize them to be $d_{L}$ by using the radius of the maximum inscribed circle of the hand gesture region. On the other hand, the two ends of the cutting line are distributed on the contour lines on both sides of the finger, and the contour lines between them are also arranged with multiple interval contour points. Therefore, the relative distance $d_{L}$ between the visible pair and the number $h_{L}$ of the interval contour points between the visible pair are used to determine whether a cutting line is a candidate cutting line. Set the distance threshold $T h_{d}$ and interval threshold $T h_{h}$ for judgment:

S (d_{L}, h_{L}) = {\begin{matrix} 1, d_{L} \leq T h_{d}, h_{L} \geq T h_{h} \\ 0, other \end{matrix}

(11)

If the value of $S (d_{L}, h_{L})$ equals to 1, it indicates that the visible pair conforms to the threshold value condition, and the cut line formed by it will be retained. Processed candidate cut lines by using this condition are shown in Figure 8. The lines in the figure are the candidate cutting line retained after threshold judgment.

Figure 8. — Candidate cut lines for hand gesture decomposition (a) palm (b) three-left (c) V (d) loose.

Selection of optimal cut line

In the convex decomposition of the hand gestures, each decomposed shape may not be a strict convex shape, but it should be as convex as possible. Suppose $G_{m}$ is the number of all visible pairs in the interval contour point set, and $G_{n}$ is the number of the line segments connecting all contour points in the interval contour point set. A variable $C_{R}$ is defined to represent the convexity of the decomposed shape to measure whether the decomposed shape satisfies the convex decomposition:

C_{R} = \frac{G_{m}}{G_{n}}

(12)

According to the above formula, when the shape is strictly convex, such as a circle, rectangle, etc., $C_{R}$ equals to 1. The optimal cutting line is the candidate cut line which ensures the average value of all the convexities of the decomposed shapes from the gesture to be maximum. In order to get the optimal cutting line for each finger, reserved cutting lines were numbered and grouped. Since all interval contour points have been assigned serial numbers, the smaller of the two contour points in the visible pair was taken as the cutting line number. Then the cutting lines were divided into several groups according to the degree of similarity of the serial number of them. Each set of cutting lines corresponds to one finger, and on this basis the candidate cutting lines of each finger were determined.

Calculate the convexity $c_{L}$ of the finger decomposed by each cut line in each group, and count the number $g_{L}^{(i, j)}$ of the interval contour points between the visible pair which constitutes the cut line. Set the convex threshold $T h_{c}$ to measure whether the hand gesture decomposition satisfies the convexity:

M (c_{L}, g_{L}) = {\begin{matrix} 1, c_{L} > T h_{c}, g_{L} \to max \\ 0, other \end{matrix}

(13)

If the value of $M (c_{L}, g_{L})$ equals to 1, it indicates that the cutting line constituted by the visible pair conforms to the convexity condition of gesture decomposition. When a cutting line in each group satisfies equations (12) and (13) at the same time, then this cutting line is the optimal cutting line in this group, that is, the optimal cutting line of the corresponding finger. According to the order from left to right, each finger was decomposed one by one, and the decomposition results of typical hand gestures were shown in Figure 9. The decomposed fingers are marked with different colors in the figure.

Figure 9. — Convex decomposition results (a) palm (b) three-left (c) V (d) loose.

Feature extraction of fingers

Hand gesture is composed of palm and fingers. The difference among hand gestures is the quantity, direction, position and shape of the fingers. After convex decomposition, the outstretched fingers have been determined. So the outstretched fingers were used to establish the features for gesture recognition. Firstly, each finger was refined into a single pixel line that approximates the center line of that finger, and the direction of the finger was calculated by the pixel coordinates of the center line of the finger. Then, the position feature of the finger was determined according to the distribution of all pixel points of each decomposed finger on the circumference of the gesture. Finally, shape features of the decomposed finger were constructed by the scale invariant Hu moment feature.

Direction feature of finger

Each finger area was refined into a single pixel line that approximates the center line of the decomposed finger by using the image thinning algorithm, and then the pixel coordinates of the line are used to calculate the direction of the outstretched finger. Check each pixel of the image in the 3 × 3 neighborhood, if it satisfies: 1) there is no upper adjacent pixel, that is lower adjacent pixel, left adjacent pixel and right adjacent pixel; 2) not isolated point or termination line; 3) remove the pixel point will not disconnect the region, then remove the checked pixel. Scan the whole finger area, and repeat this step until no pixels can be removed. This process is realized by an iterative method, which removes the boundary layer by layer at a time until the finger area is refined into a central line. The original finger is represented by the corresponding center line after the refinement processing.

Select two pixels at a certain pixel interval on the thinning center line of the finger, and their coordinates are $(x_{1}, y_{1})$ and $(x_{2}, y_{2})$ respectively. The direction of the i th line formed by connecting the two pixels can be expressed as:

β_{f}^{i} = a r c t g \frac{y_{2} - y_{1}}{x_{2} - x_{1}}

(14)

The direction $β_{f}^{i} (i = 1, 2, 3, \dots, n)$ of $n_{l}$ lines can be obtained by selecting such pixels along the direction of the thinning center line of the finger. The average value of these $n_{l}$ directions is defined as the direction feature of the finger:

β_{f} = \frac{\sum_{i = 1}^{n} β_{f}^{i}}{n_{l}}

(15)

Position feature of finger

In order to avoid the influence of the inconsistency of the gesture direction on the recognition accuracy, the average value of all finger directions of each gesture was defined as the main direction of the gesture. The main direction was used as the benchmark for further direction correction of the hand gesture. The direction angle of each finger was calculated according to equation (15), and the main direction of the gesture was obtained by averaging the direction angle of all fingers. Rotate the gesture image with a certain angle according to the value of the main direction of the gesture, and adjust the main direction of all gesture images to the vertical direction.

With the center of the hand gesture region as the circle center and the horizontal rightward direction as the starting position, the gesture region was divided into 360 equal parts counterclockwise along the circumference, as shown in Figure 10. Each part was numbered in turn as $1, 2, 3, \dots, 360$ , these numbers were used to define the position of each finger. Suppose $p_{j}$ represents the position of the j th pixel included in a finger. If this pixel lies in the k th part, then $p_{j} = k$ . The position of this finger is the average of the numbers of the parts that all pixels lie in:

D = \frac{\sum_{j = 1}^{n_{f}} p_{j}}{n_{f}}

(16)

where $n_{f}$ is the number of pixels contained in the finger.

Figure 10. — Partitions of gesture image.

Shape feature of finger

Each finger was separated from the palm of the hand through convex decomposition. In addition to the difference in direction and position of the fingers, the shapes of them also vary greatly. Since the acquisition of gesture images is carried out under unconstrained conditions, the same finger has the change of zoom, rotation, translation and other modes.Hu invariant moments have rotation and translation invariance. Based on Hu invariant moments,⁵¹ moments with scale invariant characteristics were constructed here to describe the shapes of different fingers.

Suppose the value of the pixel with coordinates $(x, y)$ in the image is $f (x, y)$ , then the central moment of the image is:

u_{p q} = \sum_{(x, y) \in R} {\sum (x - x_{c})}^{p} {(y - y_{c})}^{q} f (x, y)

(17)

where $(x_{c}, y_{c})$ is the coordinate of the central pixel of the moment, $p, q = 0, 1, 2, \cdot \cdot \cdot .$

The normalized central moment is:

η_{p q} = \frac{u_{p q}}{u_{00}^{(p + q) / 2 + 1}}

(18)

The pixel coordinates of the image after scaling are $x^{'} = ρ x$ and $y^{'} = ρ y$ , and the central moment of the scaled image is:

u_{p q}^{'} = \sum_{(x, y) \in R} \sum ρ^{p} {(x - x_{c})}^{p} ρ^{q} {(y - y_{c})}^{q} f (x, y) = ρ^{p + q} u_{p q}

(19)

where $ρ$ is the is the scaling factor.

According to the above relation, the normalized central moment of the scaled image is:

n_{p q}^{'} = ρ^{p + q} η_{p q}

(20)

Then, the relationship between Hu invariant moments $ϕ_{1}, ϕ_{2}, ϕ_{3}, \dots, ϕ_{7}$ and Huinvariant moments $ϕ_{1}^{'}, ϕ_{2}^{'}, ϕ_{3}^{'}, \dots, ϕ_{7}^{'}$ with scale variation are as follows:

ϕ_{1}^{'} = ρ^{2} \cdot ϕ_{1}

(21)

ϕ_{2}^{'} = ρ^{4} \cdot ϕ_{2}

(22)

ϕ_{3}^{'} = ρ^{6} \cdot ϕ_{3}

(23)

ϕ_{4}^{'} = ρ^{6} \cdot ϕ_{4}

(24)

ϕ_{5}^{'} = ρ^{12} \cdot ϕ_{5}

(25)

ϕ_{6}^{'} = ρ^{8} \cdot ϕ_{6}

(26)

ϕ_{7}^{'} = ρ^{12} \cdot ϕ_{7}

(27)

From the above formula, it can be seen that the scaling factor $ρ$ affects the result of Hu moment. Therefore, the following moment features with scaling invariance were constructed to eliminate this effect:

M_{1} = \lg | \frac{ϕ_{2}}{ϕ_{1}^{2}} |

(28)

M_{2} = \lg | \frac{ϕ_{3}}{ϕ_{1}^{3}} |

(29)

M_{3} = \lg | \frac{ϕ_{4}}{ϕ_{1}^{3}} |

(30)

M_{4} = \lg | \frac{ϕ_{5}}{ϕ_{1}^{6}} |

(31)

M_{5} = \lg | \frac{ϕ_{6}}{ϕ_{1}^{4}} |

(32)

M_{6} = \lg | \frac{ϕ_{7}}{ϕ_{1}^{6}} |

(33)

$M_{1}, M_{2}, M_{3}, \dots, M_{6}$ were used to describe the shape of each finger.

Hand gesture classification

Due to the unconstraint of the image acquisition environment and the physiological differences among individual hands, the acquired gesture images have the pattern variations of direction, scale, position and perspective. These bring great difficulty in the classification and recognition of hand gestures. In the case of large category quantity of the hand gestures, they can be grouped and classified step by step by using the information of the quantity, direction, position and shape of the outstretched fingers. This will gradually narrow the classification scope of the hand gestures, and perform classification according to the characteristics of each gesture. Based on this consideration, a hierarchical decision classifier with embedded deep sparse autoencoders was established to classify the hand gestures step by step. By using the method of threshold judgment, hand gestures were classified step by step by using the quantity, direction and position of the outstretched fingers. By using the output of deep network, the final recognition of hand gestures was realized by recognizing the shape of the outstretched fingers.

Deep sparse autoencoder

Deep sparse autoencoder establishes the correlation of the data by learning the characteristics of input data. In this paper, the deep sparse autoencoder used for finger shape classification has a four-layer network structure, that is one input layer, two feature layers and one Softmax classification layer.

The adopted network structure of the deep sparse autoencoder is shown in Figure 11. It was trained through layer by layer training method. Firstly, train the network between the Input layer and the Feature I layer by using the input feature data. Then, train the network between the Feature I layer and the Feature II layer by using the data of Feature I layer as input. Lastly, train the network between the Feature II layer and the Softmax layer by using the data of Feature II layer as input. As training for all layers has been finished, fine-tuned was conducted for the whole network. Consider all layers of the network as a model, optimize all connection weights in each iteration.

Hierarchical decision classifier

A hierarchical decision classifier embedded with deep sparse autoencoders was constructed for hand gesture recognition, as shown in Figure 12. The quantity of fingers was used to divide the gesture images into groups first, then the direction, position and shape features of the fingers were used to subdivide and recognize gestures within each group. The classification process is as follows:

Figure 12. — Hierarchical decision classifier.

Firstly, the hand gestures were divided into six groups according to the quantity of fingers stretching out, and each group includes 0, 1, 2, 3, 4 and 5 fingers respectively. Then, classification for each group was performed.

Classification of the first group: the quantity of the outstretched finger in the first group of gestures is 0, which only corresponds to one category of hand gesture “fist”, so it was directly recognized as “fist”.

Classification of the second group: “thumb left” was distinguished according to the direction of the outstretched finger, and then “one” and “one-right” were distinguished according to the position of the outstretched finger. In order to avoid confusion between the gesture “one” and gesture “thumb-left”, shape features were adopted to classify them again.

Classification of the third group: divide the gestures into two groups according to the position of the outstretched left finger, and then classify each group according to the position of the outstretched right finger, thus the gestures “loose”, “two-left”, “lock” and “V” can be recognized.

Classification of the fourth group: divide the gestures into two groups according to the position of the outstretched left finger. Then divide the first group into “ILY” and “three-left” according to the shape of the outstretched right finger, and divide the second group into “W” and “OK” according to the position of the outstretched right finger. In order to avoid confusion between the gesture “W” and the gesture “OK”, classify them again by using the shape feature of the outstretched right finger.

Classification of the fifth group: divide the hand gestures into gesture “four-right” and “four-left” according to the shape feature of the first outstretched finger on the left.

Classification of the sixth group: the quantity of the outstretched finger in the sixth group of gestures is 5, which only corresponds to one category of hand gesture “palm”, so it is directly recognized as “palm”.

Experiment and discussion

In order to evaluate the effectiveness and robustness of our proposed method, experimental evaluations were performed on our self-built gesture image dataset and the gesture image dataset²⁰ built by University of Padova.

Evaluation on our self-built dataset

Our self-built gesture image dataset contains 15 categories of hand gestures. The gestures are shown in Figure 13. The category number of the 15 categories of hand gestures is 1, 2, 3, … , 15 respectively. Each category of gesture was acquired 40 times from 5 people, and a total of 3000 gesture images were collected. The size of the images is 300 × 400. Among them, 900 images were used for the training of the hierarchical decision classifier, and the remaining 2100 images were used for the recognition test The effectiveness and robustness of the proposed method are verified by comparing it with the gesture recognition methods in the literature for the variation of lighting conditions, gesture direction and gesture scale respectively.

Comparison on illumination variation

In order to verify the robustness of the proposed method on illumination variation, hand gesture images were captured in a shaded environment, natural lighting, indoor lighting and artificial lighting four different illumination conditions respectively. Since the robustness to the variation of lighting conditions is mainly reflected in the segmentation and preprocessing method of the hand gesture region, the Bayesian decision method in YCrCb space used in this paper, k-means clustering method in RGB space⁵² and Gaussian model method in HSV space⁵³ was respectively adopted to segment the gesture images. Then, gesture recognition experiments were carried out using the processed images, the results were shown in Figure 14. It can be seen from the figure that the proposed method in this paper is significantly superior to the traditional method both in the recognition rate and the recognition stability of different gestures when the lighting condition of gesture image acquisition changes.

Figure 14. — Recognition results for illumination variation (a) Shaded environment (b) Natural lighting (c) Indoor lighting (d) Artificial lighting.

Comparison on gesture direction variation

In order to verify the robustness of the proposed method on gesture direction variation, hand gesture images were rotated by −20°, −10°, + 10°and + 20° respectively. The proposed method in this paper, the CNN method,⁵⁴ the Hu moment feature method²³ and the HOG feature method²⁷ was respectively adopted to conduct gesture recognition experiments, the results were shown in Figure 15. It can be seen from the figure that the proposed method in this paper is significantly superior to the traditional method both in the recognition rate and the recognition stability of different gestures when the direction of the hand gesture changes.

Figure 15. — Recognition results for rotation variation (a) Rotated by −20° (b) Rotated by −10° (c) Rotated by + 10° (d) Rotated by + 20°.

Comparison on gesture scale variation

In order to verify the robustness of the proposed method on gesture scale variation, the size of the gesture in the hand gesture image was rescaled by 0.5 times, 0.75 times, 1.5 times and 2 times respectively, as shown in Figure 16. The proposed method in this paper, the CNN method,⁵⁴ the Hu moment feature method²³ and the SIFT method⁵⁵ was respectively adopted to conduct gesture recognition experiments, the results were shown in Figure 16. It can be seen from the figure that the proposed method in this paper is significantly superior to the traditional method both in the recognition rate and the recognition stability of different gestures when the scale of the hand gesture changes.

Figure 16. — Recognition results for scale variation (a) Rescaled by 0.5 times (b) Rescaled by 0.75 times (c) Rescaled by 1.5 times (d) Rescaled by 2.0 times.

Evaluation on publicly available dataset

In order to verify the generalization ability of the proposed method on different gesture image dataset, the publicly available gesture image dataset20 built by the University of Padova was used for evaluation. The dataset contains 10 different categories of gestures, as shown in Figure 17. These gestures were acquired from 14 different persons; each person performed each gesture 10 times. A total of 1400 RGB gesture images were obtained The size of each image is 1280 × 960.

Figure 17. — Hand gestures from other dataset²⁰ G1 G2 G3 G4 G5 G6 G7 G8 G9 G10.

Four tests by using the leave-three-out method were performed on this dataset. In each test, 300 images from 3 persons were selected for training the classifier, and 1100 images from the remaining 11 persons were used for the recognition test Different training sets and testing sets are selected from different persons for each test For example, the images from person P1 to P3 were used for training in Test 1. The training sets and testing sets selection of each test are shown in Table 1.

Table 1.

Training sets and testing sets selection for each test.

	P1	P2	P3	P4	P5	P6	P7	P8	P9	P10	P11	P12	P13	P14
Test 1	train	train	train	test	test	test	test	test	test	test	test	test	test	test
Test 2	test	test	test	train	train	train	test	test	test	test	test	test	test	test
Test 3	test	test	test	test	test	test	train	train	train	test	test	test	test	test
Test 4	test	test	test	test	test	test	test	test	test	train	train	train	test	test

Open in a new tab

The proposed method in this paper, the CNN method,⁵⁴ the Hu moment feature method²³ and the SIFT method⁵⁵ was respectively adopted to conduct gesture recognition experiments, the results were shown in Figure 18. It can be seen from the figure that the proposed method in this paper is significantly superior to the traditional method both in the recognition rate and the recognition stability of different gestures on publicly available dataset.

Figure 18. — Recognition results on publicly available gesture dataset (a) Test 1 (b) Test 2 (c) Test 3 (d) Test 4.

Conclusion

In this paper, the hand gesture region was obtained by skin color segmentation in YCrCb space. This processing utilizes good clustering characteristics of skin color in YCrCb space and overcomes the influence of changes in lighting conditions on gesture recognition. Before the finger segmentation, the hand gesture direction was corrected by using the direction of the gesture contour lines. After the finger segmentation, the hand gesture direction was corrected further by using the direction of the finger direction. These processing overcome the influence of changes in gesture direction on gesture recognition. The shapes of different fingers were described by constructing Hu moment features with scale invariance. This representation overcomes the influence of changes in gesture scale on gesture recognition. In the stage of gesture recognition, the generalization ability of the classifier is improved by embedding the embedded deep sparse autoencoders in the classifier. The experimental results show that the proposed method is robust as lighting, direction and scale changes and significantly superior to the traditional method both in the recognition rate and the recognition stability. Further research will be considered in the following aspects. Firstly, feature extraction will be not limited to the silhouette image based method in this paper, but conducted from the contour curves of the binary gesture images or texture of the grayscale gesture images. Secondly, optimization, selection and weighting processing of the extracted features will be studied for simplifying calculations. Finally, in algorithm design aspect, not only the recognition accuracy and robustness would be considered, but also the convenience of use and efficiency of operation will be considered.

Acknowledgements

None.

Author biographies

Yunfeng Li received his Ph.D. degree from Dalian University of Science and Technology, China. He is currently working as an associate professor in School of Mechatronics Engineering of Henan University of Science and Technology, China. He is one of the members of China Instrument and Control Society. His main research interests include image understanding, biometric recognition and machine learning.

Pengyue Zhang received his B.S. degree in Measurement Control Technology and Instrumentation from Henan University of Science and Technology, China. Now he is a master degree candidate in School of Mechatronics Engineering of Henan University of Science and Technology. His main research interests include digital image processing and image recognition.

Footnotes

Author contributions (roles): Yunfeng Li: proposed conceptualization and methodology, performed calculating.

Pengyue Zhang: performed experiments and data processing.

Ethical Approval /Patient consent: Topic of the paper does not based on the human subjects, thus no ethical approval and patient consent are required.

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD: Yunfeng Li https://orcid.org/0000-0003-4816-2298

References

1.Kiliboz NC, Gudukbay U. A hand gesture recognition technique for human-computer interaction. J Vis Commun Image R 2015; 28: 97–104. [Google Scholar]
2.Rautaray SS, Agrawal A. Vision based hand gesture recognition for human computer interaction: a survey. Artif Intell Rev 2015; 43: 1–54. [Google Scholar]
3.Wang M, Chen WY, Li XD. Hand gesture recognition using valley feature and Hu's Moments technique for robot movement control. Meas 2016; 94: 734–744. [Google Scholar]
4.Xing L. Human-robot interaction based on gesture and movement recognition. Signal Process-Image 2020; 81: 115686. [Google Scholar]
5.Neiva DH, Zanchettin C. Gesture recognition: a review focusing on sign language in a mobile context. Expert Syst Appl 2018; 103: 159–183. [Google Scholar]
6.Almasre MA, Al-Nuaim H. A comparison of Arabic sign language dynamic gesture recognition models. Heliyon 2020; 6: e03554. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Lopez-Casado C, Bauzano E, Rivas-Blanco I, et al. A gesture recognition algorithm for hand-assisted laparoscopic surgery. Sensors 2019; 19: 5182. [DOI] [PMC free article] [PubMed] [Google Scholar]
8.A-Reum L, Yongwon C, Seongho J, et al. Enhancement of surgical hand gesture recognition using a capsule network for a contactless interface in the operating room. Comput Meth Prog Bio 2020; 190: 105385. [DOI] [PubMed] [Google Scholar]
9.Abid MR, Petriu EM, Amjadian E. Dynamic sign language recognition for smart home interactive application using stochastic linear formal grammar. IEEE T Instrum Meas 2015; 64: 596–605. [Google Scholar]
10.Dinh DL, Ngoan PTK, Thang ND, A single depth silhouette-based hand gesture recognition for appliance interfaces in smart home environment. In: 6th International Conference on the Development of Biomedical Engineering in Vietnam (ed Vo Van T, et al. ). Ho Chi Minh, Vietnam, June 2016. IFMBE Proceedings , vol 63, pp.369–373. Singapore: Springer. doi: 10.1007/978-981-10-4361-1_62 [DOI] [Google Scholar]
11.Rautaray SS, Agrawal A. Manipulating objects through hand gesture recognition in virtual environment. In: Advances in Parallel Distributed Computing (ed Nagamalai D, et al. ). Tirunelveli, India, Sept 23–25, 2011. Communications in Computer and Information Science , vol 203, pp. 270–281. Berlin, Heidelberg: Springer. doi: 10.1007/978-3-642-24037-9_26 [DOI] [Google Scholar]
12.Sagayam KM, Hemanth DJ. Hand posture and gesture recognition techniques for virtual reality applications: a survey. Virtual Real 2017; 21: 91–107. [Google Scholar]
13.Lee DH, Hong KS. Game interface using hand gesture recognition. In: 5th International Conference on Computer Sciences and Convergence Information Technology (ed Ko F, Na YJ.). Seoul, Korea (South), Nov 30-Dec 2, 2010. ICCIT2010 Proceeding, pp.1092–1097. doi: 10.1109/ICCIT.2010.5711226 [DOI] [Google Scholar]
14.Zhu YM, Yuan B. Real-time hand gesture recognition with kinect for playing racing video games. In: Proceedings of IEEE International Joint Conference on Neural Networks, Beijing, China, July 6–11, 2014. pp. 3240–3246. Los Alamitos: IEEE Computer Society. doi: 10.1109/IJCNN.2014.6889481. [DOI] [Google Scholar]
15.Oyedotun OK, Khashman A. Deep learning in vision-based static hand gesture recognition. Neural Comput & Applic 2017; 28: 3941–3951. [Google Scholar]
16.Jian CF, Li JJ. Real-time multi-trajectory matching for dynamic hand gesture recognition. IET Image Proc 2020; 14: 236–244. [Google Scholar]
17.Verma B, Choudhary A. Grassmann manifold based dynamic hand gesture recognition using depth data. Multimed Tools Appl 2020; 79: 1–25. [Google Scholar]
18.Wu X, Zhang Q, Li ZM. A method research of hand gesture recognition based on hand shape in complex background. In Proceedings of 1st Asia-Pacific Computational Intelligence and Information Technology Conference, Shanghai, China, Dec28, 2013. Proceedings of the 2013 Asia-Pacific Computational Intelligence and Information Technology Conference, pp.578–585. [Google Scholar]
19.Dominio F, Donadeo M, Zanuttigh P. Combining multiple depth-based descriptors for hand gesture recognition. Pattern Recogn Lett 2014; 50: 101–111. [Google Scholar]
20.Marin G, Dominio F, Zanuttigh P. Hand gesture recognition with jointly calibrated leap motion and depth sensor. Multimed Tools Appl 2016; 75: 14991–15015. [Google Scholar]
21.Park JS, Choi HR, Kim JY, et al. Hand pose recognition by using masked zernike moments. In: Proceedings of 9th International Conference on Computer Vision Theory and Applications. Lisbon, Portugal, Jan 5–8, 2014. pp.551–556. [Google Scholar]
22.Priyal SP, Bora PK. A robust static hand gesture recognition system using geometry based normalizations and krawtchouk moments. Pattern Recogn 2013; 46: 2202–2219. [Google Scholar]
23.Chevtchenko SF, Vale RF, Macario V. Multi-objective optimization for hand posture recognition. Expert Syst Appl 2018; 92: 170–181. [Google Scholar]
24.Elouariachi I, Benouini R, Zenkouar K, et al. Robust hand gesture recognition system based on a new set of quaternion tchebichef moment invariants. Pattern Anal Applic 2020; 23: 1337–1353. [Google Scholar]
25.Ren YY, Xie X, Li GL, et al. Hand gesture recognition with multi-scale weighted histogram of contour direction normalization for wearable applications. IEEE T Circ Syst Vid 2018; 28: 364–377. [Google Scholar]
26.Ren Z, Yuan JS, Zhang ZY. Robust hand gesture recognition based on finger-earth mover's Distance with a commodity depth camera. In: Proceedings of the 19th ACM international conference on Multimedia. Scottsdale, AZ, USA, Nov 28 - Dec1, 2011. pp.1093–1096. doi: 10.1145/2072298.2071946. [DOI] [Google Scholar]
27.Feng KP, Yuan F. Static hand gesture recognition based on HOG characters and support vector machines. In: Proceedings of 2nd International Symposium on Instrumentation and Measurement, Sensor Network and Automation, Toronto, ON, Canada, Dec 23–24, 2013. pp. 936–938. doi: 10.1109/IMSNA.2013.6743432. [DOI] [Google Scholar]
28.Ding YD, Pang HB, Wu XC. Static hand-gesture recognition using HOG and improved LBP features. Int J Digi Cont Tech Appl 2011; 5: 236–243. [Google Scholar]
29.Pansare JR, Rampurkar KS, Mahamane PL, et al. Real-time static devnagri sign language translation using histogram. Int J Adv Res Comput Eng & Technology 2013; 2: 2–6. [Google Scholar]
30.Huang DY, Hu WC, Chang SH. Gabor filter-based hand-pose angle estimation for hand gesture recognition under varying illumination. Expert Syst Appl 2011; 38: 6031–6042. [Google Scholar]
31.Parvathy DP, Subramaniam K. Rapid speedup segment analysis based feature extraction for hand gesture recognition. Multimed Tools Appl 2020; 79: 16987–17002. [Google Scholar]
32.Liu Y, Wang X, Yan K. Hand gesture recognition based on concentric circular scan lines and weighted K-nearest neighbor algorithm. Multimed Tools Appl 2018; 77: 209–223. [Google Scholar]
33.Yang W, Kong L, Wang M. Hand gesture recognition using saliency and histogram intersection kernel based sparse representation. Multimed Tools Appl 2016; 75: 6021–6034. [Google Scholar]
34.Wu BX, Zhong JP, Yang CG. A visual-based gesture prediction framework applied in social robots. IEEE/CAA J Autom Sinica 2022; 9: 510–519. [Google Scholar]
35.Zhang Q, Xiao SL, Yu ZY, et al. Hand gesture recognition algorithm combining hand-type adaptive algorithm and effective-area ratio for efficient edge computing. J Electron Imag 2021; 30: 63026. [Google Scholar]
36.Zhang B, Zhang Y, Liu J, et al. FGFF Descriptor and modified Hu moment-based hand gesture recognition. Sensors 2021; 21: 6525. [DOI] [PMC free article] [PubMed] [Google Scholar]
37.Schmidhuber J. Deep learning in neural networks: an overview. Neural Netw 2015; 61: 85–117. [DOI] [PubMed] [Google Scholar]
38.Hu Y, Zhao HF, Wang ZG. Sign language fingers spelling recognition using depth information and deep belief networks. Int J Pattern Recogn 2018; 32: 1850018. [Google Scholar]
39.Tang A, Lu K, Wang YF, et al. A real-time hand posture recognition system using deep neural networks. ACM T Intel Syst Tec 2015; 6: 1–23. [Google Scholar]
40.Chen XH, Wang GJ, Guo HK, et al. Pose guided structured region ensemble network for cascaded hand pose estimation. Neurocomputing. 2020; 395: 138-149. [Google Scholar]
41.Jain DK, Mahanti A, Shamsolmoali P. Deep neural learning techniques with long short-term memory for gesture recognition. Neural Comput & Applic 2020; 32: 16073–16089. [Google Scholar]
42.Zhang Y, Zhou W, Wang Y, et al. A real-time recognition method of static gesture based on DSSD. Multimed Tools Appl 2020; 79: 17445–17461. [Google Scholar]
43.Bhaumik G, Verma M, Govil MC, et al. Hyfinet: hybrid feature attention network for hand gesture recognition. Multimed Tools Appl. Epub ahead of print 8 Jan 2022. DOI: 10.1007/s11042-021-11623-3. [DOI] [Google Scholar]
44.Noreen I, Hamid M, Akram U, et al. Hand pose recognition using parallel multi stream CNN. Sensors 2021; 21: 8469. [DOI] [PMC free article] [PubMed] [Google Scholar]
45.Iglesias AT, Astorquia IF, Gomez JIV, et al. Gesture-based human machine interaction using RCNNs in limited computation power devices. Sensors 2021; 21: 8202. [DOI] [PMC free article] [PubMed] [Google Scholar]
46.Kowdiki M, Khaparde A. Adaptive hough transform with optimized deep learning followed by dynamic time warping for hand gesture recognition. Multimed Tools Appl 2022; 81: 2095-2126. [Google Scholar]
47.Al-Hammadi M, Muhammad G, Abdul W, et al. Deep learning-based approach for sign language gesture recognition with efficient hand gesture representation. IEEE Access 2020; 8: 192527–192542. [Google Scholar]
48.Al-Hammadi M, Muhammad G, Abdul W, et al. Hand gesture recognition for sign language using 3DCNN. IEEE Access 2020; 8: 79491–79509. [Google Scholar]
49.Medjram S, Babahenini MC, Taleb-Ahmed A, et al. Automatic hand detection in color images based on skin region verification. Multimed Tools Appl 2018; 77: 13821–13851. [Google Scholar]
50.Al-Shamayleh AS, Ahmad R, Abushariah MAM, et al. A systematic literature review on vision based gesture recognition techniques. Multimed Tools Appl 2018; 77: 28121–28184. [Google Scholar]
51.Hu MK. Visual pattern recognition by moment invariants. IEEE Trans Inf Theory 1962; 8: 179–187. [Google Scholar]
52.Sanda Mahama AT, Dossa AS, Gouton P. Choice of distance metrics for RGB color image analysis. Electron Imaging 2016; 20: 1–4. [Google Scholar]
53.JeongS Gray RM. Minimum distortion color image retrieval based on lloyd-clustered Gauss mixtures. In: Proceedings of DCC 2005: Data Compression Conference (ed Storer JA, Cohn M.). Snowbird, UT, USA, Mar 29-31, 2005. pp. 279–288. Los Alamitos: IEEE Computer Society. doi: 10.1109/DCC.2005.52 [DOI] [Google Scholar]
54.Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. https://arxiv.org/abs/1409.1556v4 (2014, accessed 19 Jun 2021)
55.Lin WS, Wu YL, Hung WC, A study of real-time hand gesture recognition using SIFT on binary images. In: Advances in intelligent systems and applications - volume 2. Smart innovation, systems and technologies (ed Pan JS, et al. ). 2013. v 21, pp.235–246. Berlin, Heidelberg: Springer. doi: 10.1007/978-3-642-35473-1_24 [DOI] [Google Scholar]

[bibr1-00368504221086362] 1.Kiliboz NC, Gudukbay U. A hand gesture recognition technique for human-computer interaction. J Vis Commun Image R 2015; 28: 97–104. [Google Scholar]

[bibr2-00368504221086362] 2.Rautaray SS, Agrawal A. Vision based hand gesture recognition for human computer interaction: a survey. Artif Intell Rev 2015; 43: 1–54. [Google Scholar]

[bibr3-00368504221086362] 3.Wang M, Chen WY, Li XD. Hand gesture recognition using valley feature and Hu's Moments technique for robot movement control. Meas 2016; 94: 734–744. [Google Scholar]

[bibr4-00368504221086362] 4.Xing L. Human-robot interaction based on gesture and movement recognition. Signal Process-Image 2020; 81: 115686. [Google Scholar]

[bibr5-00368504221086362] 5.Neiva DH, Zanchettin C. Gesture recognition: a review focusing on sign language in a mobile context. Expert Syst Appl 2018; 103: 159–183. [Google Scholar]

[bibr6-00368504221086362] 6.Almasre MA, Al-Nuaim H. A comparison of Arabic sign language dynamic gesture recognition models. Heliyon 2020; 6: e03554. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bibr7-00368504221086362] 7.Lopez-Casado C, Bauzano E, Rivas-Blanco I, et al. A gesture recognition algorithm for hand-assisted laparoscopic surgery. Sensors 2019; 19: 5182. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bibr8-00368504221086362] 8.A-Reum L, Yongwon C, Seongho J, et al. Enhancement of surgical hand gesture recognition using a capsule network for a contactless interface in the operating room. Comput Meth Prog Bio 2020; 190: 105385. [DOI] [PubMed] [Google Scholar]

[bibr9-00368504221086362] 9.Abid MR, Petriu EM, Amjadian E. Dynamic sign language recognition for smart home interactive application using stochastic linear formal grammar. IEEE T Instrum Meas 2015; 64: 596–605. [Google Scholar]

[bibr10-00368504221086362] 10.Dinh DL, Ngoan PTK, Thang ND, A single depth silhouette-based hand gesture recognition for appliance interfaces in smart home environment. In: 6th International Conference on the Development of Biomedical Engineering in Vietnam (ed Vo Van T, et al. ). Ho Chi Minh, Vietnam, June 2016. IFMBE Proceedings , vol 63, pp.369–373. Singapore: Springer. doi: 10.1007/978-981-10-4361-1_62 [DOI] [Google Scholar]

[bibr11-00368504221086362] 11.Rautaray SS, Agrawal A. Manipulating objects through hand gesture recognition in virtual environment. In: Advances in Parallel Distributed Computing (ed Nagamalai D, et al. ). Tirunelveli, India, Sept 23–25, 2011. Communications in Computer and Information Science , vol 203, pp. 270–281. Berlin, Heidelberg: Springer. doi: 10.1007/978-3-642-24037-9_26 [DOI] [Google Scholar]

[bibr12-00368504221086362] 12.Sagayam KM, Hemanth DJ. Hand posture and gesture recognition techniques for virtual reality applications: a survey. Virtual Real 2017; 21: 91–107. [Google Scholar]

[bibr13-00368504221086362] 13.Lee DH, Hong KS. Game interface using hand gesture recognition. In: 5th International Conference on Computer Sciences and Convergence Information Technology (ed Ko F, Na YJ.). Seoul, Korea (South), Nov 30-Dec 2, 2010. ICCIT2010 Proceeding, pp.1092–1097. doi: 10.1109/ICCIT.2010.5711226 [DOI] [Google Scholar]

[bibr14-00368504221086362] 14.Zhu YM, Yuan B. Real-time hand gesture recognition with kinect for playing racing video games. In: Proceedings of IEEE International Joint Conference on Neural Networks, Beijing, China, July 6–11, 2014. pp. 3240–3246. Los Alamitos: IEEE Computer Society. doi: 10.1109/IJCNN.2014.6889481. [DOI] [Google Scholar]

[bibr15-00368504221086362] 15.Oyedotun OK, Khashman A. Deep learning in vision-based static hand gesture recognition. Neural Comput & Applic 2017; 28: 3941–3951. [Google Scholar]

[bibr16-00368504221086362] 16.Jian CF, Li JJ. Real-time multi-trajectory matching for dynamic hand gesture recognition. IET Image Proc 2020; 14: 236–244. [Google Scholar]

[bibr17-00368504221086362] 17.Verma B, Choudhary A. Grassmann manifold based dynamic hand gesture recognition using depth data. Multimed Tools Appl 2020; 79: 1–25. [Google Scholar]

[bibr18-00368504221086362] 18.Wu X, Zhang Q, Li ZM. A method research of hand gesture recognition based on hand shape in complex background. In Proceedings of 1st Asia-Pacific Computational Intelligence and Information Technology Conference, Shanghai, China, Dec28, 2013. Proceedings of the 2013 Asia-Pacific Computational Intelligence and Information Technology Conference, pp.578–585. [Google Scholar]

[bibr19-00368504221086362] 19.Dominio F, Donadeo M, Zanuttigh P. Combining multiple depth-based descriptors for hand gesture recognition. Pattern Recogn Lett 2014; 50: 101–111. [Google Scholar]

[bibr20-00368504221086362] 20.Marin G, Dominio F, Zanuttigh P. Hand gesture recognition with jointly calibrated leap motion and depth sensor. Multimed Tools Appl 2016; 75: 14991–15015. [Google Scholar]

[bibr21-00368504221086362] 21.Park JS, Choi HR, Kim JY, et al. Hand pose recognition by using masked zernike moments. In: Proceedings of 9th International Conference on Computer Vision Theory and Applications. Lisbon, Portugal, Jan 5–8, 2014. pp.551–556. [Google Scholar]

[bibr22-00368504221086362] 22.Priyal SP, Bora PK. A robust static hand gesture recognition system using geometry based normalizations and krawtchouk moments. Pattern Recogn 2013; 46: 2202–2219. [Google Scholar]

[bibr23-00368504221086362] 23.Chevtchenko SF, Vale RF, Macario V. Multi-objective optimization for hand posture recognition. Expert Syst Appl 2018; 92: 170–181. [Google Scholar]

[bibr24-00368504221086362] 24.Elouariachi I, Benouini R, Zenkouar K, et al. Robust hand gesture recognition system based on a new set of quaternion tchebichef moment invariants. Pattern Anal Applic 2020; 23: 1337–1353. [Google Scholar]

[bibr25-00368504221086362] 25.Ren YY, Xie X, Li GL, et al. Hand gesture recognition with multi-scale weighted histogram of contour direction normalization for wearable applications. IEEE T Circ Syst Vid 2018; 28: 364–377. [Google Scholar]

[bibr26-00368504221086362] 26.Ren Z, Yuan JS, Zhang ZY. Robust hand gesture recognition based on finger-earth mover's Distance with a commodity depth camera. In: Proceedings of the 19th ACM international conference on Multimedia. Scottsdale, AZ, USA, Nov 28 - Dec1, 2011. pp.1093–1096. doi: 10.1145/2072298.2071946. [DOI] [Google Scholar]

[bibr27-00368504221086362] 27.Feng KP, Yuan F. Static hand gesture recognition based on HOG characters and support vector machines. In: Proceedings of 2nd International Symposium on Instrumentation and Measurement, Sensor Network and Automation, Toronto, ON, Canada, Dec 23–24, 2013. pp. 936–938. doi: 10.1109/IMSNA.2013.6743432. [DOI] [Google Scholar]

[bibr28-00368504221086362] 28.Ding YD, Pang HB, Wu XC. Static hand-gesture recognition using HOG and improved LBP features. Int J Digi Cont Tech Appl 2011; 5: 236–243. [Google Scholar]

[bibr29-00368504221086362] 29.Pansare JR, Rampurkar KS, Mahamane PL, et al. Real-time static devnagri sign language translation using histogram. Int J Adv Res Comput Eng & Technology 2013; 2: 2–6. [Google Scholar]

[bibr30-00368504221086362] 30.Huang DY, Hu WC, Chang SH. Gabor filter-based hand-pose angle estimation for hand gesture recognition under varying illumination. Expert Syst Appl 2011; 38: 6031–6042. [Google Scholar]

[bibr31-00368504221086362] 31.Parvathy DP, Subramaniam K. Rapid speedup segment analysis based feature extraction for hand gesture recognition. Multimed Tools Appl 2020; 79: 16987–17002. [Google Scholar]

[bibr32-00368504221086362] 32.Liu Y, Wang X, Yan K. Hand gesture recognition based on concentric circular scan lines and weighted K-nearest neighbor algorithm. Multimed Tools Appl 2018; 77: 209–223. [Google Scholar]

[bibr33-00368504221086362] 33.Yang W, Kong L, Wang M. Hand gesture recognition using saliency and histogram intersection kernel based sparse representation. Multimed Tools Appl 2016; 75: 6021–6034. [Google Scholar]

[bibr34-00368504221086362] 34.Wu BX, Zhong JP, Yang CG. A visual-based gesture prediction framework applied in social robots. IEEE/CAA J Autom Sinica 2022; 9: 510–519. [Google Scholar]

[bibr35-00368504221086362] 35.Zhang Q, Xiao SL, Yu ZY, et al. Hand gesture recognition algorithm combining hand-type adaptive algorithm and effective-area ratio for efficient edge computing. J Electron Imag 2021; 30: 63026. [Google Scholar]

[bibr36-00368504221086362] 36.Zhang B, Zhang Y, Liu J, et al. FGFF Descriptor and modified Hu moment-based hand gesture recognition. Sensors 2021; 21: 6525. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bibr37-00368504221086362] 37.Schmidhuber J. Deep learning in neural networks: an overview. Neural Netw 2015; 61: 85–117. [DOI] [PubMed] [Google Scholar]

[bibr38-00368504221086362] 38.Hu Y, Zhao HF, Wang ZG. Sign language fingers spelling recognition using depth information and deep belief networks. Int J Pattern Recogn 2018; 32: 1850018. [Google Scholar]

[bibr39-00368504221086362] 39.Tang A, Lu K, Wang YF, et al. A real-time hand posture recognition system using deep neural networks. ACM T Intel Syst Tec 2015; 6: 1–23. [Google Scholar]

[bibr40-00368504221086362] 40.Chen XH, Wang GJ, Guo HK, et al. Pose guided structured region ensemble network for cascaded hand pose estimation. Neurocomputing. 2020; 395: 138-149. [Google Scholar]

[bibr41-00368504221086362] 41.Jain DK, Mahanti A, Shamsolmoali P. Deep neural learning techniques with long short-term memory for gesture recognition. Neural Comput & Applic 2020; 32: 16073–16089. [Google Scholar]

[bibr42-00368504221086362] 42.Zhang Y, Zhou W, Wang Y, et al. A real-time recognition method of static gesture based on DSSD. Multimed Tools Appl 2020; 79: 17445–17461. [Google Scholar]

[bibr43-00368504221086362] 43.Bhaumik G, Verma M, Govil MC, et al. Hyfinet: hybrid feature attention network for hand gesture recognition. Multimed Tools Appl. Epub ahead of print 8 Jan 2022. DOI: 10.1007/s11042-021-11623-3. [DOI] [Google Scholar]

[bibr44-00368504221086362] 44.Noreen I, Hamid M, Akram U, et al. Hand pose recognition using parallel multi stream CNN. Sensors 2021; 21: 8469. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bibr45-00368504221086362] 45.Iglesias AT, Astorquia IF, Gomez JIV, et al. Gesture-based human machine interaction using RCNNs in limited computation power devices. Sensors 2021; 21: 8202. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bibr46-00368504221086362] 46.Kowdiki M, Khaparde A. Adaptive hough transform with optimized deep learning followed by dynamic time warping for hand gesture recognition. Multimed Tools Appl 2022; 81: 2095-2126. [Google Scholar]

[bibr47-00368504221086362] 47.Al-Hammadi M, Muhammad G, Abdul W, et al. Deep learning-based approach for sign language gesture recognition with efficient hand gesture representation. IEEE Access 2020; 8: 192527–192542. [Google Scholar]

[bibr48-00368504221086362] 48.Al-Hammadi M, Muhammad G, Abdul W, et al. Hand gesture recognition for sign language using 3DCNN. IEEE Access 2020; 8: 79491–79509. [Google Scholar]

[bibr49-00368504221086362] 49.Medjram S, Babahenini MC, Taleb-Ahmed A, et al. Automatic hand detection in color images based on skin region verification. Multimed Tools Appl 2018; 77: 13821–13851. [Google Scholar]

[bibr50-00368504221086362] 50.Al-Shamayleh AS, Ahmad R, Abushariah MAM, et al. A systematic literature review on vision based gesture recognition techniques. Multimed Tools Appl 2018; 77: 28121–28184. [Google Scholar]

[bibr51-00368504221086362] 51.Hu MK. Visual pattern recognition by moment invariants. IEEE Trans Inf Theory 1962; 8: 179–187. [Google Scholar]

[bibr52-00368504221086362] 52.Sanda Mahama AT, Dossa AS, Gouton P. Choice of distance metrics for RGB color image analysis. Electron Imaging 2016; 20: 1–4. [Google Scholar]

[bibr53-00368504221086362] 53.JeongS Gray RM. Minimum distortion color image retrieval based on lloyd-clustered Gauss mixtures. In: Proceedings of DCC 2005: Data Compression Conference (ed Storer JA, Cohn M.). Snowbird, UT, USA, Mar 29-31, 2005. pp. 279–288. Los Alamitos: IEEE Computer Society. doi: 10.1109/DCC.2005.52 [DOI] [Google Scholar]

[bibr54-00368504221086362] 54.Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. https://arxiv.org/abs/1409.1556v4 (2014, accessed 19 Jun 2021)

[bibr55-00368504221086362] 55.Lin WS, Wu YL, Hung WC, A study of real-time hand gesture recognition using SIFT on binary images. In: Advances in intelligent systems and applications - volume 2. Smart innovation, systems and technologies (ed Pan JS, et al. ). 2013. v 21, pp.235–246. Berlin, Heidelberg: Springer. doi: 10.1007/978-3-642-35473-1_24 [DOI] [Google Scholar]

PERMALINK

Static hand gesture recognition based on hierarchical decision and classification of finger features

Yunfeng Li

Pengyue Zhang

Abstract

Introduction

Figure 1.

Figure 2.

Related work

Preprocessing of hand gesture image

Skin area segmentation

Figure 3.

Hand region segmentation

Figure 4.

Figure 5.

Gesture direction normalization

Figure 6.

Convex decomposition of hand gesture

Determination of candidate cut lines

Figure 7.

Figure 8.

Selection of optimal cut line

Figure 9.

Feature extraction of fingers

Direction feature of finger

Position feature of finger

Figure 10.

Shape feature of finger

Hand gesture classification

Deep sparse autoencoder

Figure 11.

Hierarchical decision classifier

Figure 12.

Experiment and discussion

Evaluation on our self-built dataset

Figure 13.

Comparison on illumination variation

Figure 14.

Comparison on gesture direction variation

Figure 15.

Comparison on gesture scale variation

Figure 16.

Evaluation on publicly available dataset

Figure 17.

Table 1.

Figure 18.

Conclusion

Acknowledgements

Author biographies

Footnotes

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases