Component-based face recognition using statistical pattern matching analysis

Sushil Kumar Paul; Saida Bouakaz; Chowdhury Mofizur Rahman; Mohammad Shorif Uddin

doi:10.1007/s10044-020-00895-4

2020 Jul 18;24(1):299–319. doi: 10.1007/s10044-020-00895-4

PMCID: PMC7368618 PMID: 32837298

Abstract

The aim of this research is to develop a fusion concept for component-based face recognition algorithms that analyze the features of binary facial components (BFCs), which are invariant to illumination, expression, pose variations and partial occlusion. To analyze the features, a combination of statistical pattern matching concepts is proposed for face recognition: Chi-square (CSQ), Hu moment invariants (HuMIs), absolute difference of probability of white pixels (AbsDifPWPs) and geometric distance values (GDVs). Each grayscale face image is cropped by applying the Viola–Jones face detection algorithm to a face database with variations in illumination, appearance, pose and partial occlusion against complex backgrounds. After illumination correction through the histogram linearization technique, grayscale face components such as the eyes, nose and mouth regions are extracted using their 2D geometric positions. The binary face image is created by applying the cumulative probability distribution function with Otsu’s adaptive thresholding method, and BFCs such as the eyes, nose and mouth regions are then extracted from it. Five statistical pattern matching tools are developed for recognition: the standard deviation of CSQ values computed from probabilities of white pixels (PWPs), the standard deviation of HuMIs computed from Hu’s seven moment invariants, AbsDifPWPs, GDVs and pixel intensity values (PIVs). GDVs are determined between pairs of corresponding facial corner points (FCPs); nine FCPs are extracted from the binary whole face and the BFCs. PIVs are determined using L₂ norms of the grayscale values of the whole face and of the face components. Experiments on the BioID Face Database, based on these pattern matching tools and appropriate threshold values combined with logical and conditional operators, give the best expected results from the true positive rate perspective.

Keywords: Binary facial component, Histogram linearization technique, Otsu thresholding, Probability of white pixels, Hu’s moment invariants, Facial corner points

Introduction

Human face recognition plays a significant role in identifying a person in many real-world computer vision applications, such as identification, authentication, security, surveillance systems, human–computer interaction, anti-terrorism and psychology. It is a non-intrusive technique (i.e., it does not convey health risks such as coronavirus transmission) because nothing needs to be touched during acquisition. Moreover, the face is a rich source of non-verbal information about human behavior [1]. Because of the complex and multidimensional structure of the face, recognition requires extensive processing and computation. On the other hand, face perception is the most developed visual perceptual ability in human beings. Infants prefer to look at faces and try to recognize them from shortly after birth [2]. Most people spend more time looking at faces than at any other type of object. Humans cannot recognize an unlimited number of different faces, whereas a computer can handle far more because of its large memory and processing capabilities. Although automatic face recognition is not a new topic, developing an appropriate face recognition algorithm remains an unsolved challenge. The main problems of face recognition are illumination and lighting variations, different expressions, different poses, partial occlusion, the complex and multidimensional structure of the face and cluttered backgrounds.

Face recognition is the technique of identifying one or more persons in images. Algorithms for face recognition typically extract facial features such as facial corner points and facial component values and compare them to a database to find the best match. Facial feature extraction is the initial stage of face recognition in computer vision. The most significant facial features are the facial corner points and the eyes, nostrils and mouth areas. The facial corner points are the eye corners, nostrils, nose tip and mouth corners. The eyes, nostrils and mouth areas are also the key feature regions for face recognition. The recognition decision is made by combining all of this information with appropriate pattern matching tools [3–8]. A face recognition algorithm applies appropriate matching to identify similar faces and reject dissimilar face images from the face database.

Face recognition techniques are mainly classified into two principal categories: appearance based and model based. In appearance-based techniques, the holistic (global), facial component (local) and fusion (hybrid) approaches to facial feature extraction play vital roles in the classification and recognition of faces. In the global (holistic) approach, the information of the whole face image is extracted as a single vector. In local feature-based methods, facial component features such as the eyes, nose and mouth are extracted from face images. The eyes are robust to facial expressions and occlusions because the inter-ocular distance is constant among people and is unaffected by a mustache or beard [7, 9]. The nose indicates the head pose; its nostrils and nose tip are the symmetry points of the right and left face regions and are also robust to facial expressions. The mouth is also a key component for face recognition, and mouth length and lips convey facial expression information [7].

A feature-based approach can cope with diverse imaging conditions to achieve better performance. The fusion (hybrid) method combines the global and local concepts to achieve the desired outcome. Holistic features may therefore be unnecessary if exact and accurate extraction of facial component features is possible [1].

Various holistic approaches have been used for face recognition, such as Eigenfaces [10], Fisherfaces [11], Independent Component Analysis (ICA) [12], Moment Invariants [13] and Discrete Cosine Transform (DCT) [14]. Component feature-based techniques describe the face through facial components using various ideas, for example, components with a support vector machine (SVM) [15], LDA [16] and 3D models [17].

Zhang et al. [18] proposed a holistic illumination-invariant face recognition approach that uses only high-frequency components, so some useful identity information is lost when the low-frequency components are discarded. Wada et al. [19] proposed a holistic expression-invariant face recognition system based on a constrained optical flow algorithm. It has a high computational cost, and the reported results are not satisfactory.

As a model-based approach, Weyrauch et al. [17] constructed a component feature-based algorithm using 3D morphable models. It defined fourteen component features but used only nine because of the high computational complexity. Bonnen et al. [20] developed a component-based technique utilizing heterogeneous concepts, which is highly intricate. Hua [21] developed a pose-invariant component-based face recognition technique, but it has a high computational cost and does not perform well. Turk and Pentland [10] used Eigenfaces, but this method cannot incorporate additional training data into an existing PCA projection matrix and is not invariant to changes in shape, pose and expression [11, 22]. Rajiv Kapoor and Pallavi Mathur [23] developed three kinds of moment invariants, including Hu moment invariants, and achieved poor results. Harguess and Aggarwal [24] demonstrated the average-half-face concept on full frontal-view faces with six types of face recognition algorithms, and the results were not good. Nabatchian et al. [13] applied nine moment invariants (MIs) but obtained poor results except with Pseudo-Zernike Moment Invariants (PZMI). Brunelli and Poggio [25] applied geometrical feature and template-matching concepts with facial component features to frontal-view faces under the same illumination condition. Sushil et al. developed component-based face recognition algorithms using statistical pattern matching concepts, which did not show good results from the true positive rate (TPR) perspective [3, 4].

The above-mentioned holistic and model-based techniques have a high computational cost and are not suitable under variations in illumination, expression, pose and partial occlusion. It is therefore essential to develop component-based face recognition algorithms with a lower computational cost that can include a new feature, or discard or modify an existing one, and that summarize the similarity decision as a fusion concept: a match is declared if any one facial component feature of a test face matches the corresponding facial component feature in the reference database. Such algorithms establish invariance to different illumination, expression, pose variation and partial occlusion for better performance.

The main objective of this research is for the different pattern matching algorithms to recognize the same images, as well as different images of the same individual, across the entire face database. The combination of multiple algorithms is therefore expected to provide the same percentage of TPR values for each individual.

The purposes of this research work are: (1) to extract crucial facial features, namely the eyes, nostrils and mouth regions, as the key components and to apply a shape invariant concept to these components, (2) to develop a technique that is invariant to multiple imaging situations such as illumination, expression, pose variations and partial occlusion, (3) to analyze which facial features need to be included or discarded to achieve better performance from the TPR perspective, (4) to require only simple mathematical calculations for constructing the desired pattern matching tools and (5) to combine two or more pattern matching tools into new recognition algorithms. As a fusion-based similarity technique, five statistical pattern matching tools are constructed for the face recognition algorithms: the standard deviation of CSQ values [3], the standard deviation of Hu moment invariants (HuMIs) [4], Geometric Distance Values (GDVs) [5–8], the absolute difference of probability of white pixels (AbsDifPWPs) [3, 4] and Pixel Intensity Values (PIVs) [3, 4].

In this research, facial component feature extraction is based on BFCs and relies mainly on three independent methods: probability of white pixels (PWPs) with the CSQ formula [3], shape similarity with Hu moment invariants (HuMIs) [4] and geometric distance values (GDVs) of nine corresponding FCPs between the test face and the reference database images, using each individual facial component and the whole face [5–8].

The PWPs depend on facial appearance and expression but are independent of illumination variations. The PWP of each binary face component is the number of white pixels divided by the total number of pixels. To establish a matching technique without complex processing, only normalization (the probabilities of white and black pixels of each binary facial component sum to one) is applied to increase invariance to facial appearance and expression. Each component is matched using the Chi-square (CSQ) formula [3].

All images of the same individual share the same shape, whereas the shapes of different people differ. Shape similarity has been established as a statistical pattern matching concept using Hu’s seven nonlinear moment functions because of their invariance properties. A BFC is invariant if its moment values remain constant under changes in scale, rotation, translation and/or reflection of the image component, which ensures independence from appearance, expression, pose and lighting variations. The approach is computationally efficient because it does not require any prior information about the face model [4].

The geometric distance values (GDVs) are used for recognition and are calculated from nine automatically extracted corresponding FCPs (the corner points of both eyes, the mouth corner points, the nostrils and the nose tip) in the test face and the reference database images [5–8]. The benefit of FCPs is that they are completely independent of facial appearance, expression, illumination, pose variations and partial occlusion, and require the least computational cost [8].

The absolute difference of probability of white pixels (AbsDifPWPs) is calculated between the binary test face and the binary reference database images. The pixel intensity values (PIVs) between the grayscale test face and the reference database images are determined using the $L_{2}$ norm, also known as the Euclidean distance [3, 4, 8].

In this paper, we present six recognition algorithms: (i) Chi-Square (CSQ) [3], (ii) Chi-Square with Facial Corner Points (CSQ + FCPs) [8], (iii) Hu Moment Invariants (HuMIs) [4], (iv) Hu Moment Invariants with Facial Corner Points (HuMIs + FCPs), (v) Chi-Square and Hu Moment Invariants (CSQ + HuMIs) and (vi) Chi-Square and Hu Moment Invariants with Facial Corner Points (CSQ + HuMIs + FCPs). Five components (the whole face, both eyes, the nose and the mouth regions) are taken from both grayscale and binary images [3, 4]. Algorithms (i) to (iii) are existing methods, and algorithms (iv) to (vi) are developed here for feature analysis.

The major contributions of this paper are: (1) construction of a technique that is invariant to multiple imaging conditions, together with a shape invariant concept using binary facial components, (2) development of a new recognition algorithm by combining two or more pattern matching tools, (3) a decision framework for finding an appropriate pattern matching tool from the TPR perspective and (4) performance analysis of each pattern matching tool.

The remainder of this paper is organized into six sections. Sections 2 and 3 describe the preprocessing and processing works, respectively. The five types of statistical pattern matching tools are determined in Sect. 4. The similarity evaluation and recognition decisions are described in Sect. 5. Section 6 presents the implementation and results, and Sect. 7 gives the conclusions and future work.

Preprocessing work

This section covers face detection, face size normalization, forehead elimination and, conditionally, the histogram linearization technique (HLT). Localizing and cropping the exact face portion from the cluttered background is part of the preprocessing work. Because the detected faces are not all the same size, all images must be converted to the same size for uniformity, a task known as normalization. The Viola–Jones face detection algorithm is used for localizing and cropping the exact face area [26]. The forehead portion carries no information in the binary face image, so it is eliminated to speed up processing [3–6, 25]. The complete proposed work is shown in Fig. 1.

Face detection and cropping

Face detection in an input image is performed with the Viola–Jones face detector, developed by Paul Viola and Michael Jones. It is based on four key concepts: rectangular (Haar-like) features, an integral image for fast feature computation, classifier training and feature selection using the AdaBoost machine-learning algorithm, and a cascade classifier that combines many features efficiently [26].

Rectangular (Haar-like) features are of three kinds: two-, three- and four-rectangle features. A two-rectangle feature value is the difference between the sums of the pixels within two vertically or horizontally adjacent rectangular regions. A three-rectangle feature value is the sum within the two outside rectangles subtracted from the sum in the center rectangle, and a four-rectangle feature value is the difference between diagonal pairs of rectangles.

The integral image at pixel location p(x, y) is the sum of all the pixels above it and to its left. It is calculated in one pass over the original image. Features can therefore be computed rapidly because the sum over any rectangle needs only four array references.
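The integral image and the four-reference rectangle sum can be sketched in NumPy as follows. This is a minimal illustration and not the detector's actual implementation; the function names `integral_image`, `rect_sum` and `two_rect_feature` are hypothetical, and the extra zero row and column are an implementation convenience assumed here so that rectangles touching the image border need no special case.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended (assumed layout)."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    # One pass of cumulative sums along rows, then along columns.
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y), width w, height h.

    Only four array references are needed, whatever the rectangle size.
    """
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

For example, `rect_sum(integral_image(img), 1, 1, 2, 2)` equals `img[1:3, 1:3].sum()` for any 2D array `img`.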

A machine-learning algorithm called AdaBoost is applied to construct a strong classifier through a weighted combination of weak classifiers. A series of AdaBoost classifiers forms a filter chain in which each filter is a separate AdaBoost classifier containing a fairly small number of weak classifiers. If an image region fails any one of these filters, it is immediately classified as “Not Face”. Image regions that pass through all filters in the chain are classified as “Face”. This filtering chain is known as a cascade [26].
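The early-rejection logic of such a filter chain can be sketched as below. The data layout is an assumption made for illustration: each stage is a hypothetical pair of weak classifiers and a stage threshold, and each weak classifier is a tuple `(feature_fn, threshold, polarity, alpha)` that votes when `polarity * feature < polarity * threshold`. Trained values would come from AdaBoost, which is not shown.

```python
def cascade_classify(region, stages):
    """Return "Face" only if the region passes every stage of the cascade.

    `stages` is a list of (weak_classifiers, stage_threshold) pairs; this
    layout is assumed for the sketch and is not taken from the paper.
    """
    for weak_classifiers, stage_threshold in stages:
        score = 0.0
        for feature_fn, threshold, polarity, alpha in weak_classifiers:
            # A weak classifier votes with weight alpha when its feature
            # value falls on the "face" side of its threshold.
            if polarity * feature_fn(region) < polarity * threshold:
                score += alpha
        if score < stage_threshold:
            return "Not Face"  # rejected immediately; later stages are skipped
    return "Face"
```

Because most image regions are rejected by the first few stages, the later and more expensive stages run on only a small fraction of the regions.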

Finally, the detected face is cropped (face size = W × H pixels) from the input image. The detected and cropped faces are shown in Fig. 3a and b, respectively.

Face size normalization

Because the detected faces are not all the same size, it is necessary to convert all images to an equal size (normalized face size = W1 × H1 = 128 × 128 pixels) for uniformity (see Fig. 3c).

Face forehead portion elimination

No information is available in the forehead region after binary image conversion (i.e., no white pixels are present in the forehead portion). It is therefore necessary to discard the forehead portion (face size without forehead region = W2 × H2 = 0.75W1 × 0.60H1 = 96 × 76 pixels) to speed up processing (see Fig. 3d).

Illumination uniformity activity using histogram linearization technique (HLT)

A face image with weak illumination can reduce recognition performance, so illumination adjustment is necessary in the preprocessing stage. HLT transforms the intensity values so that the histogram of the output image is approximately flat (uniform). We apply HLT to a grayscale face image if its average pixel intensity value is less than 170 [27] (see Fig. 3e).
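A minimal sketch of this conditional step is given below. It assumes that HLT corresponds to standard histogram equalization through the cumulative distribution of gray levels and that the input is an 8-bit grayscale NumPy array; the function names and the rounding of the lookup table are illustrative choices, not details taken from the paper.

```python
import numpy as np

def histogram_linearization(gray, levels=256):
    """Map gray levels through the cumulative distribution (histogram equalization)."""
    hist = np.bincount(gray.ravel(), minlength=levels)
    cdf = hist.cumsum() / gray.size              # cumulative probability per level
    lut = np.round((levels - 1) * cdf).astype(np.uint8)
    return lut[gray]                             # apply the lookup table per pixel

def correct_illumination(gray, mean_limit=170):
    """Apply HLT only when the mean intensity is below `mean_limit` (170 in the text)."""
    if gray.mean() < mean_limit:
        return histogram_linearization(gray)
    return gray
```

A dark image is stretched toward the full intensity range, while an image whose mean is already 170 or more is returned unchanged.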

Processing work

The principal tasks in this section are the conversion to a binary face image using the cumulative probability distribution function (CPDF) with Otsu’s optimal global threshold, the cropping of four BFCs (both eyes, nose and mouth components) and four grayscale facial components (GFCs) (both eyes, nose and mouth components), and the extraction of nine FCPs. The detailed procedures are shown in Fig. 3. Cropping of facial components is explained in detail in [5–7] (see Fig. 3f, h). The mathematical concepts behind Otsu’s adaptive thresholding, binary image conversion, cropping of the BFCs and GFCs and extraction of the nine FCPs [3–7, 28] are as follows.

Otsu’s optimal global thresholding

Otsu’s thresholding is a nonlinear, nonparametric and unsupervised image segmentation method based on statistical discriminant analysis, also known as clustering-based adaptive thresholding, that transforms a grayscale image into a binary image. Its fundamental concept is to split the image’s pixels into two classes and find the optimal threshold that maximizes the between-class (inter-class) variance or, equivalently, minimizes the weighted sum of within-class (intra-class) variances.

An image, I (x, y), is a 2D grayscale intensity function containing $N = W \times H$ pixels with gray levels from $0$ to $L - 1$. The number of pixels with gray level $i$ is denoted $n_{i}$, giving a probability of gray level $i$ in the image of $ρ_{i} = \frac{n_{i}}{N}$, where $\sum_{i = 0}^{L - 1} ρ_{i} = 1$, $ρ_{i} \geq 0$, $0 \leq i \leq L - 1$ and $L = 256$.

In the case of bi-level thresholding of an image, I (x, y), the pixels are divided into two classes, $C_{1}$ with gray levels $[0, 1, 2, \dots, t]$ and $C_{2}$ with gray levels $[t + 1, t + 2, t + 3, \dots, L - 1]$, for a threshold value $τ_{threshold} (t) = t$ with $0 < t < L - 1$. The probabilities $ω_{1} (t)$ and $ω_{2} (t)$ that a pixel is assigned to class $C_{1}$ or $C_{2}$ are given by the following cumulative sums [28]:

ω_{1} (t) = \sum_{i = 0}^{t} ρ_{i}

and

ω_{2} (t) = \sum_{i = t + 1}^{L - 1} ρ_{i} = 1 - ω_{1} (t)

The mean intensity values, $μ_{1} (t)$ and $μ_{2} (t)$, of the pixels assigned to the two classes $C_{1}$ and $C_{2}$ follow from Bayes’ rule:

μ_{1} (t) = \sum_{i = 0}^{t} i P (i \mid C_{1}) = \sum_{i = 0}^{t} i P (C_{1} \mid i) \frac{P (i)}{P (C_{1})} = \frac{\sum_{i = 0}^{t} i ρ_{i}}{ω_{1} (t)}

Similarly, for class $C_{2}$:

μ_{2} (t) = \sum_{i = t + 1}^{L - 1} i P (i \mid C_{2}) = \frac{\sum_{i = t + 1}^{L - 1} i ρ_{i}}{ω_{2} (t)}

The cumulative mean (average intensity) $μ_{C} (t)$ up to level $t$ and the global mean of the entire image $μ_{G}$ (up to level $L - 1$) are expressed by the following equations:

μ_{C} (t) = \sum_{i = 0}^{t} i ρ_{i}

and

μ_{G} = \sum_{i = 0}^{L - 1} i ρ_{i}

The global mean $μ_{G}$ can be written as the weighted sum of the means of the two classes $C_{1}$ and $C_{2}$:

μ_{G} = ω_{1} (t) μ_{1} (t) + ω_{2} (t) μ_{2} (t)

and

ω_{1} (t) + ω_{2} (t) = 1

The global variance $σ_{G}^{2}$ (the total variance, i.e., the intensity variance of all pixels in the image) is the sum of the within-class variance $σ_{W}^{2}$ and the between-class variance $σ_{B}^{2}$:

σ_{G}^{2} = σ_{W}^{2} (t) + σ_{B}^{2} (t)

where the global variance

σ_{G}^{2} = \sum_{i = 0}^{L - 1} {(i - μ_{G})}^{2} ρ_{i}

is constant and independent of the threshold value, the weighted within-class variance of classes $C_{1}$ and $C_{2}$ is

σ_{W}^{2} (t) = ω_{1} (t) \sum_{i = 0}^{t} {(i - μ_{1} (t))}^{2} \frac{ρ_{i}}{ω_{1} (t)} + ω_{2} (t) \sum_{i = t + 1}^{L - 1} {(i - μ_{2} (t))}^{2} \frac{ρ_{i}}{ω_{2} (t)}

and the between-class variance of classes $C_{1}$ and $C_{2}$ is

σ_{B}^{2} (t) = ω_{1} (t) {(μ_{1} (t) - μ_{G})}^{2} + ω_{2} (t) {(μ_{2} (t) - μ_{G})}^{2} = ω_{1} (t) ω_{2} (t) {(μ_{1} (t) - μ_{2} (t))}^{2} = \frac{{(μ_{G} ω_{1} (t) - μ_{C} (t))}^{2}}{ω_{1} (t) (1 - ω_{1} (t))}

Since the total (global) variance is constant and does not depend on the threshold value, the algorithm only needs to minimize $σ_{W}^{2} (t)$ or, equivalently, maximize $σ_{B}^{2} (t)$. Calculating the within-class variance of both classes for every possible threshold requires a lot of computation, whereas maximizing $σ_{B}^{2} (t)$ is easy because it depends only on the global mean, the cumulative mean and the probability of class $C_{1}$, all of which are easy to compute.

The optimal threshold value $t^{*}$ that maximizes $σ_{B}^{2} (t)$ satisfies:

σ_{B}^{2} (t^{*}) = max_{0 < t < L - 1} σ_{B}^{2} (t)

and finally, we obtain the Otsu threshold value,

τ_{Otsu} = t^{*}

(see Fig. 2b, d).
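The search for $t^{*}$ can be sketched directly from the closed form of $σ_{B}^{2} (t)$, which needs only $ω_{1} (t)$, $μ_{C} (t)$ and $μ_{G}$. The sketch below assumes an 8-bit grayscale NumPy array; the function name `otsu_threshold` is illustrative, and levels where $ω_{1} (t) (1 - ω_{1} (t)) = 0$ are skipped because the ratio is undefined there.

```python
import numpy as np

def otsu_threshold(gray, levels=256):
    """Return the gray level t* that maximizes the between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=levels).astype(np.float64)
    rho = hist / hist.sum()                  # rho_i = n_i / N
    i = np.arange(levels)
    omega1 = np.cumsum(rho)                  # omega_1(t), class C1 probability
    mu_c = np.cumsum(i * rho)                # mu_C(t), cumulative mean
    mu_g = mu_c[-1]                          # mu_G, global mean
    denom = omega1 * (1.0 - omega1)
    sigma_b2 = np.zeros(levels)
    valid = denom > 0                        # skip levels where one class is empty
    sigma_b2[valid] = (mu_g * omega1[valid] - mu_c[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b2))
```

When several levels give the same maximum, `np.argmax` returns the lowest of them; other tie-breaking rules are equally valid.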

Binary image conversion using Otsu thresholding

The cumulative probability density function (CPDF) is applied to a cropped grayscale face image together with Otsu’s threshold to obtain the desired binary face image. Finally, four ROIs (Regions of Interest), namely the binary eyes (both), nose and mouth regions, are extracted using their 2D geometrical positions (Cartesian coordinate system on the xy-plane) [3–7] (see Fig. 3g, h).

If $I_{grayscale} (X, Y)$ is a cropped grayscale face image of size $W \times H$, the binary face image $I_{binary} (X, Y)$ is obtained using the following equations.

P_{Z}^{(I_{grayscale} (X, Y) = Z)} = \frac{n_{Z}}{W \times H}

where $0 \leq Z \leq L - 1$, $L = 256$ and $0 \leq P_{Z} \leq 1.0$.

\mathrm{cpdf}_{Z}^{I_{grayscale} (X, Y)} = \sum_{j = 0}^{Z} P_{j}^{I_{grayscale} (X, Y)}

I_{binary} (X, Y) = \begin{cases} 255, & \text{if } \mathrm{cpdf}_{Z}^{I_{grayscale} (X, Y)} \leq T_{Otsu} \\ 0, & \text{otherwise} \end{cases}

where $P_{Z}^{I_{grayscale} (X, Y)}$ is the probability density function of pixel value $Z$ ($0 \leq Z \leq 255$), $W \times H$ is the image size (width × height), $n_{Z}$ is the total number of pixels with pixel value $Z$, $\mathrm{cpdf}_{Z}^{I_{grayscale} (X, Y)}$ is the cumulative probability density function up to the intensity value $Z$ and $T_{Otsu}$ is Otsu’s adaptive threshold value.
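One possible reading of this rule is sketched below. The text does not say how a cumulative probability in [0, 1] is compared with a gray-level threshold, so the sketch assumes that the Otsu level is first mapped into the cumulative-probability domain (the limit is the cumulative probability at the Otsu gray level). Under that assumption, pixels at or below the Otsu level become white (255), which is consistent with the bright forehead containing no white pixels. The function name and the `t_otsu` argument are illustrative.

```python
import numpy as np

def binarize_with_cpdf(gray, t_otsu, levels=256):
    """Binary image from the cumulative probability of gray levels.

    ASSUMPTION: the Otsu gray level `t_otsu` is converted to a cumulative
    probability before the comparison; the paper does not spell this out.
    """
    pdf = np.bincount(gray.ravel(), minlength=levels) / gray.size   # P_Z
    cpdf = np.cumsum(pdf)                                           # cpdf_Z
    limit = cpdf[t_otsu]            # assumed mapping of the threshold
    # Look up the cumulative probability of each pixel's own gray level.
    return np.where(cpdf[gray] <= limit, 255, 0).astype(np.uint8)
```

With this mapping the result matches thresholding on the gray level for every gray level that occurs in the image, so dark structures such as the eyes, nostrils and mouth appear as white pixels.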

Cropping grayscale and binary facial components

In a frontal face, for both grayscale and binary images, the eyes, nose and mouth regions are situated in the upper, middle and lower portions of the face image, respectively. The upper portion is divided vertically into left and right segments to isolate the right and left eyes, respectively. The nose region is the center part of the middle portion, obtained by discarding equally sized leftmost and rightmost regions, and the mouth region is the center part of the lower portion, obtained in the same way (see Fig. 3f, h). Details are explained in [5–7]. The ROI (Region of Interest) sizes are as follows:

Normalized face image size = W1 × H1 = 128 × 128 pixels

Face image size (without forehead area) = W2 × H2 = (1.0 − 2 × 0.125) W1 × (1.0 − 0.25 (upper portion) − 0.15 (lower portion)) H1 = 0.75 W1 × 0.60 H1 = 96 × 76 pixels

Upper portion image size = W2 × 0.42 H2; left and right eye size = 0.50 W2 × 0.42 H2

Middle portion image size = W2 × 0.31 H2; nose size = (1.0 − 2 × 0.17) W2 × 0.31 H2 = 0.66 W2 × 0.31 H2

Lower portion image size = W2 × 0.27 H2; mouth size = (1.0 − 2 × 0.17) W2 × 0.27 H2 = 0.66 W2 × 0.27 H2
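The proportions above translate into array slices as in the following sketch. It assumes a 128 × 128 NumPy image indexed as `[row, column]`, and the way fractional sizes are rounded to whole pixels is a choice made here, since the text gives only the fractions; the function name `crop_components` is illustrative.

```python
import numpy as np

def crop_components(face128):
    """Crop the forehead-free face and its four component ROIs by fixed proportions."""
    H1, W1 = face128.shape
    # Discard 12.5% on each side, 25% at the top and 15% at the bottom.
    x0, x1 = int(0.125 * W1), int(0.875 * W1)
    y0, y1 = int(0.25 * H1), int(0.85 * H1)
    face = face128[y0:y1, x0:x1]              # 96 wide, 76 high for a 128 x 128 input
    H2, W2 = face.shape
    y_up = round(0.42 * H2)                   # bottom of the upper (eyes) portion
    y_mid = y_up + round(0.31 * H2)           # bottom of the middle (nose) portion
    side = round(0.17 * W2)                   # margin discarded on each side
    half = W2 // 2
    return {
        "face": face,
        "right_eye": face[:y_up, :half],      # left segment of the image
        "left_eye": face[:y_up, half:],       # right segment of the image
        "nose": face[y_up:y_mid, side:W2 - side],
        "mouth": face[y_mid:, side:W2 - side],
    }
```

For a 128 × 128 input this gives two 48 × 32 eye regions, a 64 × 24 nose region and a 64 × 20 mouth region (width × height), which agree with the stated fractions up to rounding.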

Human facial corner points (FCPs) extraction

Four binary ROI (Region of Interest) images, the Right-Eye, Left-Eye, Nose and Mouth regions, are used to detect corner points automatically (see Fig. 3h). A simple linear search is applied to the Right-Eye, Left-Eye and Mouth binary ROIs to detect the first white pixel locations, and a contour algorithm is applied to the binary Nose ROI to detect the nostril locations. Finally, the Nose Tip is calculated from the nostril locations. Details are explained in [5–7]. The positions of the nine facial corner points are summarized in Table 1.

Table 1.

Summary of facial corner points (FCPs) detection

The nine detected human facial corner points are shown in Fig. 4.

Determination of statistical pattern matching tools

Five statistical pattern matching tools are constructed mathematically to describe the six different types of face recognition algorithms [3–8]: the standard deviation of five CSQ values, the standard deviation of fifteen HuMIs values, five AbsDifPWPs values between the binary test face and the binary reference database images, five PIVs ($L_{2}$ norm) between the grayscale test face and the grayscale reference database images, and nine GDVs computed from nine automatically extracted FCPs of the test face and the reference database images.

Chi-square (CSQ) method

The Chi-square (CSQ) statistic measures the goodness-of-fit of the data to the model. Here, Chi-square is the sum, over all possible categories, of the squared difference between the observed ( $O$ ) and the expected ( $ε$ ) data (or the deviation, $δ$ ), divided by the sum of the observed and expected data.

If the observed values in each of $β$ bins are $O_{i}$ , and the expected values from the model are $ε_{i}$ , then $χ_{CSQ}^{2}$ statistic can be written by the following formula using Eqs. (19) and (20) [29, 30]:

χ_{CSQ}^{2} = \sum_{i = 1}^{β = 2} \frac{{(O_{i} - ε_{i})}^{2}}{(O_{i} + ε_{i})}

χ_{CSQ}^{2} = \frac{{(P_{Ref} - P_{Test})}^{2}}{(P_{Ref} + P_{Test})} + \frac{{(Q_{Ref} - Q_{Test})}^{2}}{(Q_{Ref} + Q_{Test})}

δ_{CSQ} = χ_{CSQ}^{2} = \frac{2 {(P_{Ref} - P_{Test})}^{2}}{(P_{Ref} + P_{Test}) (2 - P_{Ref} - P_{Test})}

where $P$ is the probability of white pixels (PWPs), $Q = 1 - P$ is the probability of black pixels (PBPs), $O = P_{Ref}$, $ε = P_{Test}$, and $P_{Ref}$ and $P_{Test}$ are the PWPs of the reference and test images, respectively.

A low value of $δ = χ_{CSQ}^{2}$ indicates a better match than a high value. An exact match gives 0 (zero); a total mismatch is unbounded and depends on the size of the bin.
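The two-bin computation can be sketched as follows. The helper names `pwp`, `csq` and `csq_closed_form` are illustrative, and skipping a bin that is empty in both images (to avoid dividing by zero) is an assumption of this sketch. The closed form corresponds to the simplified expression for $δ_{CSQ}$ and should agree with the two-term sum.

```python
import numpy as np

def pwp(binary):
    """Probability of white pixels: number of non-zero pixels / total pixels."""
    return np.count_nonzero(binary) / binary.size

def csq(p_ref, p_test):
    """Two-bin chi-square distance over the white (P) and black (Q = 1 - P) bins."""
    total = 0.0
    for o, e in ((p_ref, p_test), (1.0 - p_ref, 1.0 - p_test)):
        if o + e > 0:                 # assumed: a bin empty in both images adds nothing
            total += (o - e) ** 2 / (o + e)
    return total

def csq_closed_form(p_ref, p_test):
    """Simplified single-fraction form; undefined when P_ref + P_test is 0 or 2."""
    s = p_ref + p_test
    return 2.0 * (p_ref - p_test) ** 2 / (s * (2.0 - s))
```

For example, with $P_{Ref} = 0.4$ and $P_{Test} = 0.2$ both functions give $0.08 / 0.84 \approx 0.0952$, and identical PWPs give 0.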

Standard deviation of Chi-Square value

The standard deviation of the five CSQ values for the five BFCs between the test face and a reference database image is given in Eq. (21) [3].

σ_{CSQ} = CSQ_{i} (I_{{Test}_{i}} (x, y), J_{{Ref}_{i}} (x, y)) = \frac{\sqrt{\sum_{i = 1}^{5} {(δ_{{CSQ}_{i}} - \bar{δ}_{CSQ})}^{2}}}{5}

where $σ_{CSQ}$ is the standard deviation of the CSQ values, $δ_{{CSQ}_{i}}$ are the CSQ values for the five BFCs, $\bar{δ}_{CSQ}$ is the mean CSQ value and i = 1, 2, …, 5.
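The sketch below evaluates the expression exactly as it is printed, with the square root of the summed squared deviations divided by the number of values. Note that this differs from the conventional population standard deviation, where the division happens inside the square root, by a factor of $\sqrt{n}$; which form the authors intended is not stated here, so the function name `sigma_as_printed` is deliberately neutral.

```python
import math

def sigma_as_printed(values):
    """sqrt(sum of squared deviations from the mean) / n, following the printed form."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values)) / n
```

Five identical CSQ values give 0, and the value grows as the component-wise distances become less uniform.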

Moments and moment invariants

Moments are mainly used as shape descriptors of a probability distribution function and are applied in many real-world areas such as computer vision, image processing and pattern recognition for object matching, recognition, classification and identification. Mathematically, moments are “projections” of a function onto a polynomial basis. Image shape features play a vital role in image classification, identification and recognition, so effective and efficient extraction of shape features is the key element of image representation and comparison. Moment invariants are invariant to a certain class of image degradations, for example translation, rotation and scaling (TRS) or affine transforms. In this research work, we use moment invariants as shape descriptors on five binary facial components for face recognition [31].

Hu moment invariants (HuMIs)

Hu [32] first introduced two-dimensional geometric moment invariants for shape recognition. A set of seven nonlinear moment functions that are translation, scale and rotation invariant is derived from the second- and third-order moments.

For a digital image $f (a, b)$ of size $W \times H$, the 2D traditional geometric moments of order $(α + β)$ are expressed by:

K_{α β} = \sum_{a = 0}^{W - 1} \sum_{b = 0}^{H - 1} f (a, b) a^{α} b^{β}

where $α, β = 0, 1, 2, \dots$ are integer values.

The double sums run over the entire area of the image, including its boundary.

In the image plane, the image centroids are used to define the central moments to normalize for translation. The central moment of order $(α + β)$ is defined as:

λ_{α β} = \sum_{a = 0}^{W - 1} \sum_{b = 0}^{H - 1} f (a, b) {(a - \bar{a})}^{α} {(b - \bar{b})}^{β}

where $\bar{a} = \frac{K_{10}}{K_{00}}$ and $\bar{b} = \frac{K_{01}}{K_{00}}$.

Central moments are independent of the origin and are therefore translation invariant, but in their original form they are not invariant to scale or rotation. Applying a scaling normalization gives the normalized central moments:

Υ_{α β} = \frac{λ_{α β}}{λ_{00}^{(\frac{α + β + 2}{2})}} \quad \text{for } α + β = 2, 3, \dots

Hu [32] derived seven scalar values from the normalized central moments through order three that are invariant to object scale, position and orientation. In terms of the normalized central moments, the seven moments are given by Eqs. (25)–(31):

ψ_{1} = Υ_{20} + Υ_{02}

ψ_{2} = {(Υ_{20} - Υ_{02})}^{2} + 4 Υ_{11}^{2}

ψ_{3} = {(Υ_{30} - 3 Υ_{12})}^{2} + {(3 Υ_{21} - Υ_{03})}^{2}

ψ_{4} = {(Υ_{30} + Υ_{12})}^{2} + {(Υ_{21} + Υ_{03})}^{2}

ψ_{5} = (Υ_{30} - 3 Υ_{12}) (Υ_{30} + Υ_{12}) ({(Υ_{30} + Υ_{12})}^{2} - 3 (Υ_{21} + Υ_{03})^{2}) + (3 Υ_{21} - Υ_{03}) (Υ_{21} + Υ_{03}) (3 (Υ_{30} + Υ_{12})^{2} - {(Υ_{21} + Υ_{03})}^{2})

ψ_{6} = (Υ_{20} - Υ_{02}) ({(Υ_{30} + Υ_{12})}^{2} - {(Υ_{21} + Υ_{03})}^{2}) + 4 Υ_{11} (Υ_{30} + Υ_{12}) (Υ_{21} + Υ_{03})

ψ_{7} = (3 Υ_{21} - Υ_{03}) (Υ_{30} + Υ_{12}) ({(Υ_{30} + Υ_{12})}^{2} - 3 {(Υ_{21} + Υ_{03})}^{2}) + (3 Υ_{12} - Υ_{30}) (Υ_{21} + Υ_{03}) (3 {(Υ_{30} + Υ_{12})}^{2} - {(Υ_{21} + Υ_{03})}^{2})

These seven invariant moments, $ψ_{z}$ , $1 \leq z \leq 7$ (Eqs. 25–31), are independent of scale, translation and rotation. We compute $ψ_{z}$ on five binary images (the four binary ROIs, i.e., the right eye, left eye, nostrils and mouth components, plus the binary face image with the forehead portion discarded) for both the test and the reference images and use the values to match the shapes.
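
As an illustration, the seven invariants of Eqs. (25–31) can be sketched in NumPy as follows. The helper names `normalized_moments` and `hu_invariants` are hypothetical, and the mapping of array axes to $(a, b)$ is an assumption of this sketch rather than a detail of the original implementation.

```python
import numpy as np

def normalized_moments(f):
    """Normalized central moments Y[(p, q)] for 2 <= p + q <= 3."""
    f = np.asarray(f, dtype=float)
    W, H = f.shape  # assumption: axis 0 indexes a, axis 1 indexes b
    a = np.arange(W, dtype=float)[:, None]
    b = np.arange(H, dtype=float)[None, :]
    k00 = f.sum()  # equals the central moment of order zero
    a_bar = (f * a).sum() / k00
    b_bar = (f * b).sum() / k00
    Y = {}
    for p in range(4):
        for q in range(4):
            if 2 <= p + q <= 3:
                lam = (f * (a - a_bar) ** p * (b - b_bar) ** q).sum()
                Y[(p, q)] = lam / k00 ** ((p + q + 2) / 2)
    return Y

def hu_invariants(f):
    """Return the seven invariants psi_1..psi_7 as a NumPy array."""
    Y = normalized_moments(f)
    y20, y02, y11 = Y[(2, 0)], Y[(0, 2)], Y[(1, 1)]
    y30, y03, y21, y12 = Y[(3, 0)], Y[(0, 3)], Y[(2, 1)], Y[(1, 2)]
    s, t = y30 + y12, y21 + y03  # sums that recur in psi_4..psi_7
    psi1 = y20 + y02
    psi2 = (y20 - y02) ** 2 + 4 * y11 ** 2
    psi3 = (y30 - 3 * y12) ** 2 + (3 * y21 - y03) ** 2
    psi4 = s ** 2 + t ** 2
    psi5 = ((y30 - 3 * y12) * s * (s ** 2 - 3 * t ** 2)
            + (3 * y21 - y03) * t * (3 * s ** 2 - t ** 2))
    psi6 = (y20 - y02) * (s ** 2 - t ** 2) + 4 * y11 * s * t
    psi7 = ((3 * y21 - y03) * s * (s ** 2 - 3 * t ** 2)
            + (3 * y12 - y30) * t * (3 * s ** 2 - t ** 2))
    return np.array([psi1, psi2, psi3, psi4, psi5, psi6, psi7])

# An L-shaped binary blob, compared with a shifted and a rotated copy.
f = np.zeros((16, 16)); f[2:8, 3:5] = 1; f[6:8, 3:10] = 1
print(np.allclose(hu_invariants(f), hu_invariants(np.rot90(f))))
```

For a shift or a 90° rotation of a binary shape the pixel grid is preserved, so the invariants should agree up to floating-point error. Arbitrary rotation angles and rescaling require resampling, and the agreement is then only approximate.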

Standard deviation of fifteen Hu moment invariants (HuMIs)

The three types of Hu invariant moment values, $ε_{i}$ , $φ_{i}$ and $ζ_{i}$ , are computed using Eqs. (32–36) [4]:

ε_{i} (I_{{Ref}_{i}} (a, b), I_{{Test}_{i}} (a, b)) = \sum_{z = 1 \dots 7} |\frac{1}{m_{z}^{{Ref}_{i}}} - \frac{1}{m_{z}^{{Test}_{i}}}|

φ_{i} (I_{{Ref}_{i}} (a, b), I_{{Test}_{i}} (a, b)) = \sum_{z = 1 \dots 7} |m_{z}^{{Ref}_{i}} - m_{z}^{{Test}_{i}}|

ζ_{i} (I_{{Ref}_{i}} (a, b), I_{{Test}_{i}} (a, b)) = \sum_{z = 1 \dots 7} \frac{|m_{z}^{{Ref}_{i}} - m_{z}^{{Test}_{i}}|}{|m_{z}^{{Ref}_{i}}|}

where i = 1,2,3,4,5,

m_{z}^{Ref} = sign (ψ_{z}^{Ref}) \cdot log (ψ_{z}^{Ref})

m_{z}^{Test} = sign (ψ_{z}^{Test}) \cdot log (ψ_{z}^{Test})

$ψ_{z}^{Ref}$ and $ψ_{z}^{Test}$ are the HuMIs of the binary forms of the reference and test images, respectively. The resulting measures are invariant to scale, rotation and reflection [4].

Using the three groups of Hu invariant moment values $ε_{i}$ , $φ_{i}$ and $ζ_{i}$ , the standard deviation of the fifteen values (HuMIs) is [4]:

σ_{HuMIs} = {StdDev}_{HuMIs} (I_{Binary - Ref} (a, b), I_{Binary - Test} (a, b)) = \frac{\sqrt{\sum_{k = 1}^{15} {(d_{k} - \bar{d})}^{2}}}{15}

where $d_{k = 1, 2, \dots, 15} = \{ε_{i = 1, 2, \dots, 5}, φ_{i = 1, 2, \dots, 5}, ζ_{i = 1, 2, \dots, 5}\}$ and

$\bar{d}$ is the mean value of the $d_{k}$ ( $k = 1, 2, \dots, 15$ ).
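
A minimal sketch of Eqs. (32–37) is given below. It takes the seven HuMIs of each of the five components as input, and all function names are hypothetical. Two points are assumptions of this sketch, not statements from the text: the logarithm is taken base 10 of the absolute value $|ψ_{z}|$ so that negative invariants can be handled, and every $ψ_{z}$ is assumed to be nonzero with $|ψ_{z}| \neq 1$ (otherwise $m_{z}$ is infinite or zero and $ε_{i}$ , $ζ_{i}$ are undefined).

```python
import numpy as np

def log_signed(psi):
    """m_z = sign(psi_z) * log10|psi_z| (base 10 and |.| are assumptions)."""
    psi = np.asarray(psi, dtype=float)
    return np.sign(psi) * np.log10(np.abs(psi))

def hu_distances(psi_ref, psi_test):
    """Return (eps_i, phi_i, zeta_i) for one component from its seven HuMIs."""
    m_ref, m_test = log_signed(psi_ref), log_signed(psi_test)
    eps = np.sum(np.abs(1.0 / m_ref - 1.0 / m_test))
    phi = np.sum(np.abs(m_ref - m_test))
    zeta = np.sum(np.abs(m_ref - m_test) / np.abs(m_ref))
    return eps, phi, zeta

def sigma_humis(psi_ref_all, psi_test_all):
    """psi_*_all: arrays of shape (5, 7), one row per binary component."""
    d = np.array([hu_distances(r, t)
                  for r, t in zip(psi_ref_all, psi_test_all)])
    d = d.T.ravel()  # order: eps_1..eps_5, phi_1..phi_5, zeta_1..zeta_5
    # Follows the equation as printed: square root of the sum, divided by 15.
    return float(np.sqrt(np.sum((d - d.mean()) ** 2)) / d.size)

# Synthetic HuMIs used purely for illustration.
ref = np.full((5, 7), 0.1)
test = np.full((5, 7), 0.01)
print(hu_distances(ref[0], test[0]), sigma_humis(ref, test))
```

Note that $σ_{HuMIs}$ measures the spread of the fifteen distance values, so identical test and reference components give fifteen zeros and therefore $σ_{HuMIs} = 0$.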

Absolute difference of probability of white pixels (AbsDifPWPs)

The absolute difference of probability of white pixels (AbsDifPWPs) between the binary form of the test image and a reference database image is given by Eq. (38) [3, 4].

δ_{{ABS}_{i}} = {AbsDifPWPs}_{i} (I_{{Test}_{i}} (x, y), J_{{Ref}_{i}} (x, y)) = |P_{{Test}_{i}} (I_{{Test}_{i}} (x, y)) - P_{{Ref}_{i}} (J_{{Ref}_{i}} (x, y))|

$δ_{{ABS}_{i}} \geq 0$ , and $δ_{{ABS}_{i}} = 0$ indicates a perfect match, where $i = 1, 2, \dots, 5$ .
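
Eq. (38) can be sketched as follows, under the assumption that the probability of white pixels of a binary image is the fraction of its pixels that are nonzero. The function names `pwp` and `abs_dif_pwp` are hypothetical.

```python
import numpy as np

def pwp(binary_img):
    """Probability of white pixels: nonzero pixels / all pixels (assumed)."""
    img = np.asarray(binary_img)
    return np.count_nonzero(img) / img.size

def abs_dif_pwp(test_img, ref_img):
    """Absolute difference of the two white-pixel probabilities."""
    return abs(pwp(test_img) - pwp(ref_img))

# Two 4 x 4 binary components with 4 and 8 white pixels, respectively.
test = np.zeros((4, 4), dtype=np.uint8); test[0, :] = 255
ref = np.zeros((4, 4), dtype=np.uint8); ref[:2, :] = 255
print(abs_dif_pwp(test, ref))
```

Because only the proportion of white pixels is compared, two components with different shapes but the same white-pixel fraction also give a value of zero, which is one reason to combine this measure with the other tools.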

Pixel intensity values (PIVs) using L₂ norms

Similarity (or closeness) is calculated using a Euclidean distance formula (EDF). The square root of the sum of the squared differences between the corresponding pixel values of two grayscale images of the same size $(W \times H)$ , $I_{{Test}_{i}} (x, y)$ and $J_{{Ref}_{i}} (x, y)$ , is known as the $L_{2}$ norm. The $L_{2}$ measure, Eq. (39), is [3, 4]:

d_{{EDF}_{i}} = {EDF}_{i} (I_{{Test}_{i}} (x, y), J_{{Ref}_{i}} (x, y)) = \frac{\sqrt{\sum_{x, y} {(I_{{Test}_{i}} (x, y) - J_{{Ref}_{i}} (x, y))}^{2}}}{W \times H}

where $i = 1, 2, \dots, 5$ , $d_{{EDF}_{i}} \geq 0.0$ , and $d_{{EDF}_{i}} = 0.0$ indicates a perfect match.
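
A minimal sketch of Eq. (39) follows. The function name `edf` is hypothetical. The conversion to floating point before subtraction is a choice made in this sketch to avoid wrap-around when the images are stored as 8-bit unsigned integers.

```python
import numpy as np

def edf(test_img, ref_img):
    """L2 norm of the pixel difference, divided by the pixel count W * H."""
    t = np.asarray(test_img, dtype=float)  # float avoids uint8 wrap-around
    r = np.asarray(ref_img, dtype=float)
    if t.shape != r.shape:
        raise ValueError("images must have the same size")
    return float(np.sqrt(np.sum((t - r) ** 2)) / t.size)

# Two 2 x 2 grayscale patches that differ by 6 gray levels at every pixel.
test = np.full((2, 2), 10, dtype=np.uint8)
ref = np.full((2, 2), 4, dtype=np.uint8)
print(edf(test, ref))
```

For this toy input the squared differences sum to 144, so the value is 12 / 4 = 3.0.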

Geometric distance values (GDVs) using nine facial corner points (FCPs) between two faces

The geometric distance value (GDV) between two Cartesian coordinate points on the xy-plane is defined by the geometric distance formula.

The distance between two corresponding corner points $Q_{r}^{Test} (x_{k}, y_{ℓ})$ and $R_{r}^{Ref} (x_{u}, y_{v})$ of two face images of the same size (128 × 128), ${Img}_{{Test}_{r}} (x, y)$ and ${Img}_{{Ref}_{r}} (x, y)$ , respectively, is (see Fig. 4) [5–8]:

{GDVs}_{r} (Q_{r}^{Test} (x_{k}, y_{ℓ}), R_{r}^{Ref} (x_{u}, y_{v})) = \sqrt{{(x_{{Test}_{k}} - x_{{Ref}_{u}})}^{2} + {(y_{{Test}_{ℓ}} - y_{{Ref}_{v}})}^{2}}

where $r = k = ℓ = u = v = 1, 2, 3, \dots, 9$ , ${GDVs}_{r} \geq 0.0$ , and ${GDVs}_{r} = 0.0$ confirms a perfect match.
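
Eq. (40) can be sketched for all nine corner points at once, assuming the points of each face are stored as an array of $(x, y)$ rows in the same order. The function name `gdvs` and the sample coordinates are hypothetical.

```python
import numpy as np

def gdvs(test_points, ref_points):
    """Euclidean distance between corresponding corner points, row by row."""
    q = np.asarray(test_points, dtype=float)
    r = np.asarray(ref_points, dtype=float)
    if q.shape != r.shape or q.ndim != 2 or q.shape[1] != 2:
        raise ValueError("expected two arrays of shape (n_points, 2)")
    return np.sqrt(np.sum((q - r) ** 2, axis=1))

# Nine made-up corner points; the reference set is shifted by (3, 4) pixels.
test_pts = np.array([[20, 40], [45, 40], [60, 40], [85, 40], [50, 70],
                     [75, 70], [40, 95], [64, 100], [88, 95]])
ref_pts = test_pts + np.array([3, 4])
print(gdvs(test_pts, ref_pts))
```

Each of the nine distances in this toy case equals 5.0, since every point is displaced by the same (3, 4) offset.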

Similarity evaluation and recognition decisions

A recognition decision is based on matching any one facial component value of a test face image against the corresponding facial component value of the reference database images. The match uses any two or three of the five statistical tools, $σ_{CSQ}$ , $σ_{HuMIs}$ , $δ_{{ABS}_{i}}$ , $d_{{EDF}_{i}}$ and any three of the nine ${GDVs}_{r}$ points, combined through logical and conditional operators with the appropriate threshold values $T_{CSQ}$ , $T_{HuMIs}$ , $T_{ABS}$ , $T_{PIVs}$ and $T_{GDVs}$ , respectively. The six different types of recognition decisions are explained below:

CSQ (Method 1):

if ((σ_{CSQ} < T_{CSQ}) && (any one out of five values of δ_{{ABS}_{i}} \leq T_{ABS}) && (any one out of five values of d_{{EDF}_{i}} < T_{PIVs}))