2021 Nov 5;52(8):8793–8809. doi: 10.1007/s10489-021-02843-z

Two-dimensional Bhattacharyya bound linear discriminant analysis with its applications

Yan-Ru Guo 1, Yan-Qin Bai 1, Chun-Na Li 2, Lan Bai 3, Yuan-Hai Shao 2
PMCID: PMC8568685  PMID: 34764624

Abstract

The recently proposed L2-norm linear discriminant analysis criterion based on Bhattacharyya error bound estimation (L2BLDA) is an effective improvement over linear discriminant analysis (LDA) for vector input samples. When the inputs are two-dimensional (2D), such as images, converting them to vectors disregards the inherent structure of the image and may lose useful information. In this paper, we propose a novel two-dimensional Bhattacharyya bound linear discriminant analysis (2DBLDA). 2DBLDA maximizes the matrix-based between-class distance, measured by the weighted pairwise distances of class means, and minimizes the matrix-based within-class distance. Optimizing the 2DBLDA criterion is equivalent to optimizing an upper bound of the Bhattacharyya error. The weighting constant between the between-class and within-class terms is determined by the data themselves, which makes 2DBLDA adaptive. By construction, 2DBLDA avoids the small sample size (SSS) problem, is robust, and can be solved through a simple standard eigenvalue decomposition problem. The experimental results on image recognition and face image reconstruction demonstrate the effectiveness of 2DBLDA.

Keywords: Feature extraction, Dimensionality reduction, Two-dimensional linear discriminant analysis, Robust linear discriminant analysis, Bhattacharyya error bound

Introduction

Feature extraction plays an important role in pattern recognition. As a powerful supervised feature extraction method, linear discriminant analysis (LDA) [1] has been successfully applied in many problems, such as face recognition [2, 3], text mining [4, 5], image retrieval [6, 7], gait recognition [8], and microarrays [9, 10].

However, classical LDA is a vector-based, or one-dimensional (1D), method. When input data are naturally in matrix, or two-dimensional (2D), form, such as images, two issues may arise. First, converting 2D data to 1D data may produce high-dimensional vectors and hence may lead to the small sample size (SSS) problem [11]. For example, a 32 × 32 face image corresponds to a 1024-dimensional vector. Second, during the transformation from 2D data to 1D data, the underlying spatial (structural) information is destroyed, so useful discriminant information may be lost [12, 13]. To handle these problems, many image-as-matrix methods have been developed [14, 15]. In contrast to the image-as-vector methods, the image-as-matrix methods treat an image as a second-order tensor, and their objective functions are expressed as functions of the image matrix instead of the high-dimensional image vector. The representative image-as-matrix method is two-dimensional LDA (2DLDA) [16]. 2DLDA constructs the within-class and between-class scatter matrices from the original image samples in matrix form rather than converting the matrices to vectors beforehand. Compared to LDA, 2DLDA can alleviate the SSS problem when a mild condition is satisfied [17] and can preserve the original structure of the input matrix.

Thereafter, many modifications and improvements of 2DLDA were studied. Because 2DLDA is built on the squared L2-norm, it is sensitive to noise and outliers. To improve the robustness of 2DLDA, robust replacements of the L2-norm were studied, including the L1-norm [18–21], nuclear norm [22, 23], Lp-norm [24, 25], and Schatten Lp-norm, 0 < p < 1 [26]. Some studies focused on extracting discriminative transformations on both sides of the matrix samples. The authors in [27, 28] implemented 2DLDA on matrices in sequence or independently and then combined the left- and right-side transformations to achieve bilateral dimensionality reduction. Li et al. [25] used iterative schemes to extract transformations on both sides. Extensions to other machine learning problems and real applications were also investigated. For example, Wang et al. [29] proposed a convolutional 2DLDA for nonlinear dimensionality reduction, and Xiao et al. [30] studied a two-dimensional quaternion sparse discriminant analysis that met the requirements of representing RGB and RGB-D images.

Although 2DLDA can ease the SSS problem, in theory it may still face the singularity issue, as LDA does, since it needs to solve a generalized eigenvalue problem. Recently, a novel vector-based L2-norm linear discriminant analysis criterion based on Bhattacharyya error bound estimation (L2BLDA) [31] was proposed. Compared to LDA, L2BLDA solves a simple standard eigenvalue decomposition problem rather than a generalized eigenvalue decomposition problem, which avoids the singularity issue and improves robustness. In fact, minimizing the Bhattacharyya error [32] bound is a reasonable way to establish classification [33]. In this paper, inspired by L2BLDA, to cope with the SSS problem and improve the robustness of 2DLDA, we first derive a Bhattacharyya error upper bound for matrix input classification and then propose a novel two-dimensional linear discriminant analysis by minimizing this Bhattacharyya error upper bound, called 2DBLDA. The proposed 2DBLDA has the following characteristics:

  • 2DBLDA is designed for two-dimensional matrix inputs. Its criterion is shown to be an upper bound of the Bhattacharyya error, and we prove that optimizing this upper bound can lead to an optimal discriminant direction. Therefore, the rationality of the 2DBLDA optimization problem is guaranteed theoretically.

  • The constant that weights the between-class distance against the within-class distance is computed directly from the input data. This constant not only helps the objective of 2DBLDA achieve the minimum error bound but also makes 2DBLDA adaptive, with no parameters to tune. By using the weighted between-class distance information, 2DBLDA can achieve robustness.

  • Unlike 2DLDA, 2DBLDA is solved effectively through a standard eigenvalue decomposition problem, which does not involve the inverse of a matrix and hence avoids the SSS problem.

  • To assess the discriminant ability of our method, we report accuracy on different databases, plot the variation of accuracy with the reduced dimension, and measure face image reconstruction performance. The experimental results on image recognition and face reconstruction demonstrate the effectiveness of 2DBLDA.

The paper is organized as follows. Section 2 briefly introduces LDA, L2BLDA and 2DLDA. Section 3 proposes our 2DBLDA and gives the corresponding theoretical analysis. Section 4 compares 2DBLDA with its related approaches. Section 5 discusses the relationship between our 2DBLDA and related methods and analyses the experimental results. Finally, the concluding remarks are given in Section 6. The proof of the Bhattacharyya error upper bound of 2DBLDA is given in the Appendix.

The notations of this paper are given as follows. We consider a supervised learning problem in the $d_1 \times d_2$-dimensional matrix space $\mathbb{R}^{d_1\times d_2}$. The training dataset is given by $T=\{(X_1,y_1),\ldots,(X_N,y_N)\}$, where $X_l\in\mathbb{R}^{d_1\times d_2}$ is the $l$-th input matrix sample and $y_l\in\{1,\ldots,c\}$ is the corresponding label, $l=1,\ldots,N$. Assume that the $i$-th class contains $N_i$ samples, $i=1,\ldots,c$. Then, we have $\sum_{i=1}^{c}N_i=N$. We further write the samples in the $i$-th class as $\{X_{is}\}$, where $X_{is}$ is the $s$-th sample in the $i$-th class, $i=1,\ldots,c$, $s=1,\ldots,N_i$. Let $\bar{X}=\frac{1}{N}\sum_{l=1}^{N}X_l$ be the mean of all matrix samples and $\bar{X}_i=\frac{1}{N_i}\sum_{s=1}^{N_i}X_{is}$ be the mean of matrix samples in the $i$-th class. For a matrix $Q=(q_1,q_2,\ldots,q_n)\in\mathbb{R}^{m\times n}$, its Frobenius norm (F-norm) $\|Q\|_F$ is defined as $\|Q\|_F=\sqrt{\sum_{i=1}^{n}\|q_i\|_2^2}$. The F-norm is a natural generalization of the vector L2-norm to matrices.

Related work

Linear discriminant analysis

Linear discriminant analysis (LDA) finds a projection transformation matrix $W$ such that the ratio of between-class distance to within-class distance is maximized in the projected space. For data in $\mathbb{R}^n$, LDA finds an optimal $W\in\mathbb{R}^{n\times r}$, $r\le n$, such that the most discriminant information of the data is retained in $\mathbb{R}^r$ by solving the following problem:

$$\max_{W}\ \frac{\operatorname{tr}(W^TS_bW)}{\operatorname{tr}(W^TS_wW)}, \tag{1}$$

where $\operatorname{tr}(\cdot)$ is the trace operation of a matrix, and the between-class scatter matrix $S_b$ and the within-class scatter matrix $S_w$ are defined by

$$S_b=\frac{1}{N}\sum_{i=1}^{c}N_i(\bar{x}_i-\bar{x})(\bar{x}_i-\bar{x})^T \tag{2}$$

and

$$S_w=\frac{1}{N}\sum_{i=1}^{c}\sum_{s=1}^{N_i}(x_{is}-\bar{x}_i)(x_{is}-\bar{x}_i)^T, \tag{3}$$

where $\bar{x}_i\in\mathbb{R}^n$ is the mean of the samples in the $i$-th class, $\bar{x}\in\mathbb{R}^n$ is the mean of all the data, and $x_{is}\in\mathbb{R}^n$ is the $s$-th sample of the $i$-th class. The optimization problem (1) is equivalent to the generalized eigenvalue problem $S_bw=\lambda S_ww$, where $\lambda\ne 0$, with its solution $W=(w_1,\ldots,w_r)$ given by the eigenvectors corresponding to the $r$ largest eigenvalues of $S_w^{-1}S_b$, provided that $S_w$ is nonsingular.

L2-norm linear discriminant analysis criterion via the Bhattacharyya error bound estimation

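As an improvement over LDA, the L2-norm linear discriminant analysis criterion based on Bhattacharyya error bound estimation (L2BLDA) [31] is a recently proposed vector-based weighted linear discriminant analysis. In the vector space $\mathbb{R}^n$, by minimizing an upper bound of the Bhattacharyya error, the optimization problem of L2BLDA is formulated as

$$\min_{W}\ -\frac{1}{N}\sum_{i<j}\sqrt{N_iN_j}\,\|W^T(\bar{x}_i-\bar{x}_j)\|_2^2+\Delta\sum_{i=1}^{c}\sum_{s=1}^{N_i}\|W^T(x_{is}-\bar{x}_i)\|_2^2\quad\text{s.t.}\ W^TW=I, \tag{4}$$

where $W\in\mathbb{R}^{n\times r}$, $r\le n$, $P_i=\frac{N_i}{N}$, $P_j=\frac{N_j}{N}$, $\bar{x}_i\in\mathbb{R}^n$ is the mean of the samples in the $i$-th class, $x_{is}\in\mathbb{R}^n$ is the $s$-th sample of the $i$-th class, $\Delta=\frac{1}{4}\sum_{i<j}^{c}\sqrt{P_iP_j}\,\|\bar{x}_i-\bar{x}_j\|_2^2$, and $I\in\mathbb{R}^{r\times r}$ is the identity matrix.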

L2BLDA is solved through the following standard eigenvalue decomposition problem:

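$$\min_{W}\ \operatorname{tr}(W^TSW)\quad\text{s.t.}\ W^TW=I, \tag{5}$$

where

$$S=-\frac{1}{N}\sum_{i<j}\sqrt{N_iN_j}\,(\bar{x}_i-\bar{x}_j)(\bar{x}_i-\bar{x}_j)^T+\Delta\sum_{i=1}^{c}\sum_{s=1}^{N_i}(x_{is}-\bar{x}_i)(x_{is}-\bar{x}_i)^T. \tag{6}$$

Then, $W=(w_1,w_2,\ldots,w_r)$ is formed by the $r$ orthonormal eigenvectors that correspond to the $r$ smallest nonzero eigenvalues of $S$. After obtaining the optimal $W$, a new sample $x\in\mathbb{R}^n$ is projected into $\mathbb{R}^r$ by $W^Tx$.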
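The step from (4) to (5) rests on a standard trace identity, sketched here for completeness. The symbol $v$ is only a placeholder for any of the difference vectors $\bar{x}_i-\bar{x}_j$ or $x_{is}-\bar{x}_i$; it is not notation from the original derivation.

```latex
\|W^T v\|_2^2 \;=\; v^T W W^T v \;=\; \operatorname{tr}\!\left(W^T v\, v^T W\right).
```

Summing this identity over all terms of (4) with their respective weights gives $\operatorname{tr}(W^TSW)$ with $S$ as in (6). Since $S$ is symmetric, minimizing this trace under $W^TW=I$ is a standard symmetric eigenvalue problem.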

Two-dimensional linear discriminant analysis

Unlike LDA and L2BLDA, which work on vector samples, two-dimensional linear discriminant analysis (2DLDA) [16, 17] operates on matrix samples. 2DLDA defines the between-class scatter matrix and the within-class scatter matrix directly on the 2D data set $T$ as

$$S_b=\frac{1}{N}\sum_{i=1}^{c}N_i(\bar{X}_i-\bar{X})(\bar{X}_i-\bar{X})^T \tag{7}$$

and

$$S_w=\frac{1}{N}\sum_{i=1}^{c}\sum_{s=1}^{N_i}(X_{is}-\bar{X}_i)(X_{is}-\bar{X}_i)^T. \tag{8}$$

Then 2DLDA solves the following optimization problem:

$$\max_{W}\ \frac{\operatorname{tr}(W^TS_bW)}{\operatorname{tr}(W^TS_wW)}=\frac{\sum_{i=1}^{c}N_i\|W^T(\bar{X}_i-\bar{X})\|_F^2}{\sum_{i=1}^{c}\sum_{s=1}^{N_i}\|W^T(X_{is}-\bar{X}_i)\|_F^2}, \tag{9}$$

where $W=(w_1,\ldots,w_r)\in\mathbb{R}^{d_1\times r}$, $r\le d_1$, $i=1,\ldots,c$, and $s=1,\ldots,N_i$. Problem (9) can be solved through the generalized eigenvalue problem $S_bw=\lambda S_ww$ when $S_w$ is nonsingular, and its solution consists of the $r$ eigenvectors corresponding to the $r$ largest nonzero eigenvalues. After obtaining the optimal $W$, a new sample $X\in\mathbb{R}^{d_1\times d_2}$ is projected into $\mathbb{R}^{r\times d_2}$ by $W^TX$. Note that 2DLDA will still encounter the singularity problem when $S_w$ is not of full rank.
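The matrix-based scatter matrices and the generalized eigenvalue step can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' MATLAB implementation: the function names `scatter_2d` and `fit_2dlda` and the array layout `(N, d1, d2)` are assumptions made here, and the solve step presumes that the within-class scatter is nonsingular.

```python
import numpy as np

def scatter_2d(X, y):
    """Between- and within-class scatter matrices for matrix samples.

    X: array of shape (N, d1, d2); y: integer labels of shape (N,).
    Returns two (d1, d1) matrices (hypothetical helper, illustrative only).
    """
    N, d1, _ = X.shape
    mean_all = X.mean(axis=0)
    Sb = np.zeros((d1, d1))
    Sw = np.zeros((d1, d1))
    for label in np.unique(y):
        Xi = X[y == label]
        mean_i = Xi.mean(axis=0)
        D = mean_i - mean_all
        Sb += len(Xi) * (D @ D.T)                 # N_i (mean_i - mean)(mean_i - mean)^T
        C = Xi - mean_i                           # centred samples, shape (N_i, d1, d2)
        Sw += np.einsum("sab,scb->ac", C, C)      # sum over s of C_s C_s^T
    return Sb / N, Sw / N

def fit_2dlda(X, y, r):
    """Return a (d1, r) projection from the r largest eigenvalues of Sw^{-1} Sb."""
    Sb, Sw = scatter_2d(X, y)
    # Assumption: Sw is nonsingular; otherwise this solve fails or is unstable.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)[:r]
    return evecs[:, order].real
```

A projected sample is then `W.T @ X[k]`, of shape `(r, d2)`. The explicit solve against `Sw` is exactly the step that breaks down when the within-class scatter is rank deficient.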

Two-dimensional Bhattacharyya bound linear discriminant analysis

The derivation of a Bhattacharyya error bound estimation

In this section, we derive a new two-dimensional linear discriminant analysis criterion by minimizing a Bhattacharyya error bound.

From the viewpoint of minimizing the probability of classification error, the Bayes classifier is the best classifier [1], and its error rate, known as the Bayes error, is defined as

$$\epsilon=1-\int\max_{i\in\{1,2,\ldots,c\}}\{P_ip_i(X)\}\,dX, \tag{10}$$

where $X$ is a sample, $P_i$ is the prior probability, and $p_i(X)$ is the probability density function of the $i$-th class of the data. Computing the Bayes error is usually hard, and therefore minimizing an upper bound of it is often considered an effective alternative [35–37]. Among various bounds, the Bhattacharyya error [32] is a close upper bound of the Bayes error, which is given by

$$\epsilon_B=\sum_{i<j}^{c}\sqrt{P_iP_j}\int\sqrt{p_i(X)p_j(X)}\,dX. \tag{11}$$
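For intuition about how the Bhattacharyya error depends on the class means, it may help to recall the well-known closed form of the Bhattacharyya integral for two Gaussians sharing one covariance. The block states it for the vector case, with $\mu_i$, $\mu_j$ and $\Sigma$ as illustrative symbols for the class means and the common covariance. It is a textbook identity given as background and is not claimed to be the derivation used in the Appendix.

```latex
\int \sqrt{\mathcal{N}(x\mid\mu_i,\Sigma)\,\mathcal{N}(x\mid\mu_j,\Sigma)}\;dx
  \;=\; \exp\!\Big(-\tfrac{1}{8}\,(\mu_i-\mu_j)^T\Sigma^{-1}(\mu_i-\mu_j)\Big).
```

The integral decays as the class means move apart relative to the within-class spread, so separating class means while keeping classes compact reduces the Bhattacharyya error.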

In the setting of two-dimensional supervised dimensionality reduction, if we can derive a relatively close upper bound of $\epsilon_B$, we may obtain a reasonable dimensionality reduction model. In fact, under some basic assumptions, we can obtain an upper bound of $\epsilon_B$, as shown in the following proposition.

Proposition 1

Assume $P_i$ and $p_i(X)$ are the prior probability and the probability density function of the $i$-th class for the training data set $T$, respectively, and the data samples in each class are independent and identically normally distributed. Let $p_1(X),p_2(X),\ldots,p_c(X)$ be the Gaussian functions given by $p_i(X)=\mathcal{N}(X\mid\bar{X}_i,\Sigma_i)$, where $\bar{X}_i$ and $\Sigma_i$ are the class mean and the class covariance matrix, respectively. We further suppose $\Sigma_i=\Sigma$, $i=1,2,\ldots,c$, where $\Sigma$ is the covariance matrix of the data set $T$, and $\bar{X}_i$ and $\Sigma$ can be estimated accurately from $T$. Then for an arbitrary projection vector $w\in\mathbb{R}^{d_1}$, the Bhattacharyya error bound $\epsilon_B$ defined by (11) on the data set $\widetilde{T}=\{\widetilde{X}_i\mid\widetilde{X}_i=w^TX_i\in\mathbb{R}^{1\times d_2}\}$ satisfies the following:

$$\epsilon_B\le-\frac{a}{8}\sum_{i<j}^{c}\sqrt{P_iP_j}\,\|w^T(\bar{X}_i-\bar{X}_j)\|_2^2+\frac{a}{8}\Delta\sum_{i=1}^{c}\sum_{s=1}^{N_i}\|w^T(X_{is}-\bar{X}_i)\|_2^2+\sum_{i<j}^{c}\sqrt{P_iP_j}, \tag{12}$$

where $\Delta=\frac{1}{4}\sum_{i<j}^{c}\sqrt{P_iP_j}\,\|\bar{X}_i-\bar{X}_j\|_F^2$, and $a>0$ is some constant.

Proof

See the Appendix. □

The proposed two-dimensional Bhattacharyya bound linear discriminant analysis

Proposition 1 gives a reasonable upper bound of 𝜖B. After obtaining an upper error bound, it is natural to minimize it. Therefore, we minimize the upper bound of 𝜖B in (12), that is, the right side of (12). In fact, by minimizing it, we can easily obtain a novel two-dimensional Bhattacharyya bound linear discriminant analysis (2DBLDA) as follows:

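$$\min_{w^Tw=1}\ -\frac{1}{N}\sum_{i<j}\sqrt{N_iN_j}\,\|w^T(\bar{X}_i-\bar{X}_j)\|_2^2+\Delta\sum_{i=1}^{c}\sum_{s=1}^{N_i}\|w^T(X_{is}-\bar{X}_i)\|_2^2, \tag{13}$$

where $\Delta=\frac{1}{4}\sum_{i<j}^{c}\sqrt{P_iP_j}\,\|\bar{X}_i-\bar{X}_j\|_F^2$, $w\in\mathbb{R}^{d_1}$, $P_i=\frac{N_i}{N}$.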
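The passage from the bound (12) to problem (13) can be spelled out in one line. The sketch assumes the reading of (12) in which the between-class term carries a negative sign and the pairwise weights are $\sqrt{P_iP_j}$, and it uses $P_i=N_i/N$. The last term of (12) does not depend on $w$, and the positive factor $a/8$ does not change the minimizer.

```latex
\sqrt{P_iP_j}=\sqrt{\frac{N_i}{N}\cdot\frac{N_j}{N}}=\frac{1}{N}\sqrt{N_iN_j},
\qquad
\frac{8}{a}\Big(\text{RHS of (12)}-\sum_{i<j}^{c}\sqrt{P_iP_j}\Big)
 = -\frac{1}{N}\sum_{i<j}^{c}\sqrt{N_iN_j}\,\|w^T(\bar{X}_i-\bar{X}_j)\|_2^2
   +\Delta\sum_{i=1}^{c}\sum_{s=1}^{N_i}\|w^T(X_{is}-\bar{X}_i)\|_2^2 .
```

Minimizing the right-hand side over unit vectors $w$ is therefore the same as minimizing the bound itself.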

By applying (13), we can project a $d_1\times d_2$ sample $X$ into a $1\times d_2$ sample $\widetilde{X}$ by $\widetilde{X}=w^TX$. However, a single projection usually does not retain enough discriminant information in the $1\times d_2$ space, and we may need $r\ge 1$ projection vectors $w_1,w_2,\ldots,w_r$ that constitute a projection matrix $W=(w_1,w_2,\ldots,w_r)\in\mathbb{R}^{d_1\times r}$ and project $X$ into an $r\times d_2$ space by $\widetilde{X}=W^TX$.

In general, we consider the following 2DBLDA

$$\min_{W}\ -\frac{1}{N}\sum_{i<j}\sqrt{N_iN_j}\,\|W^T(\bar{X}_i-\bar{X}_j)\|_F^2+\Delta\sum_{i=1}^{c}\sum_{s=1}^{N_i}\|W^T(X_{is}-\bar{X}_i)\|_F^2\quad\text{s.t.}\ W^TW=I, \tag{14}$$

where $W\in\mathbb{R}^{d_1\times r}$, $r\le d_1$. We now give the geometric meaning of 2DBLDA. Minimizing the first term in (14), which carries a negative sign, pushes the means of different classes far from each other in the projected space, which guarantees between-class separability. Here, the coefficients $\frac{1}{N}\sqrt{N_iN_j}$ in the first term weight the pairwise distances between different class means. Minimizing the second term in (14) forces each sample to stay close to its own class mean in the projected space. The weighting constant $\Delta$ in front of the second term balances the between-class and within-class importance while also ensuring a minimum error bound according to the proof of Proposition 1. We can observe that 2DBLDA is adaptive to different data since $\Delta$ is determined by the given data set. To ensure minimum redundancy in the projected space, we also impose an orthonormal constraint $W^TW=I$ on the discriminant directions.

It is easily seen that we can solve 2DBLDA through the following standard eigenvalue decomposition problem:

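$$\min_{W}\ \operatorname{tr}(W^TSW)\quad\text{s.t.}\ W^TW=I, \tag{15}$$

where

$$S=-\frac{1}{N}\sum_{i<j}\sqrt{N_iN_j}\,(\bar{X}_i-\bar{X}_j)(\bar{X}_i-\bar{X}_j)^T+\Delta\sum_{i=1}^{c}\sum_{s=1}^{N_i}(X_{is}-\bar{X}_i)(X_{is}-\bar{X}_i)^T. \tag{16}$$

Then, we obtain the optimal solution as $W=(w_1,w_2,\ldots,w_r)$, where $w_1,w_2,\ldots,w_r$ are the $r$ orthonormal eigenvectors corresponding to the $r$ smallest nonzero eigenvalues of $S$.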
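The whole procedure reduces to building one symmetric $d_1\times d_1$ matrix and taking an eigendecomposition, which the following NumPy sketch illustrates. It is not the authors' released MATLAB code. The name `fit_2dblda` and the `(N, d1, d2)` array layout are assumptions made here, the between-class term is entered with a negative sign (consistent with the statement that minimizing it pushes class means apart), and for simplicity the sketch takes the `r` smallest eigenvalues without filtering out zero eigenvalues.

```python
import numpy as np

def fit_2dblda(X, y, r):
    """Illustrative 2DBLDA sketch: eigenvectors of S for the r smallest eigenvalues.

    X: array (N, d1, d2) of matrix samples; y: integer labels (N,).
    Returns W of shape (d1, r) with orthonormal columns.
    """
    N, d1, _ = X.shape
    labels = np.unique(y)
    means = np.stack([X[y == k].mean(axis=0) for k in labels])   # (c, d1, d2)
    counts = np.array([(y == k).sum() for k in labels])

    between = np.zeros((d1, d1))
    delta = 0.0
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            D = means[i] - means[j]
            weight = np.sqrt(counts[i] * counts[j]) / N           # equals sqrt(P_i P_j)
            between += weight * (D @ D.T)
            delta += 0.25 * weight * np.sum(D ** 2)               # squared Frobenius norm

    within = np.zeros((d1, d1))
    for idx, k in enumerate(labels):
        C = X[y == k] - means[idx]                                # centred class samples
        within += np.einsum("sab,scb->ac", C, C)                  # sum of C_s C_s^T

    S = -between + delta * within                                 # symmetric d1 x d1 matrix
    evals, evecs = np.linalg.eigh(S)                              # eigenvalues in ascending order
    return evecs[:, :r]
```

Because `np.linalg.eigh` works on the symmetric matrix directly, no matrix inverse is needed, which is the practical difference from the generalized eigenvalue problem of 2DLDA.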

Experiments

In this section, we compare the proposed 2DBLDA with 2DPCA [34], 2DPCA-L1 [12], 2DLDA [16] and L1-2DLDA [18, 19]. The learning parameter δ of L1-2DLDA is selected optimally from the set {0.001,0.005,0.01,0.05,0.1,0.5,1} by grid search.

We experimented on three image databases for image recognition and one face image database for face reconstruction. In the experiments, a dimensionality reduction method is first applied to the training data to obtain a projection matrix, and the test data are then projected to the lower-dimensional space using this projection matrix. For image recognition, the nearest neighbour classifier is employed to obtain the classification accuracy. In addition, when the data classes are unbalanced, the area under the ROC curve (AUC) and G-mean are used as performance measures. For face reconstruction, the mean reconstruction error is used for performance evaluation. All the methods were run on a PC with a P4 2.3 GHz CPU using Matlab 2017b.
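The evaluation pipeline for recognition (project, then classify by nearest neighbour) can be sketched as follows. The distance used by the authors is not stated, so the squared Frobenius distance between projected matrices is an assumption here, as are the function name `nn_accuracy` and the `(N, d1, d2)` array layout.

```python
import numpy as np

def nn_accuracy(W, X_train, y_train, X_test, y_test):
    """1-nearest-neighbour accuracy on projected matrix samples.

    W: (d1, r) projection; X_*: (N, d1, d2) samples; y_*: (N,) label arrays.
    Assumption: neighbours are ranked by squared Frobenius distance.
    """
    y_train, y_test = np.asarray(y_train), np.asarray(y_test)
    P_train = np.einsum("ar,nab->nrb", W, X_train)    # W^T X for every training sample
    P_test = np.einsum("ar,nab->nrb", W, X_test)      # W^T X for every test sample
    diff = P_test[:, None, :, :] - P_train[None, :, :, :]
    dist = np.sum(diff ** 2, axis=(2, 3))             # (n_test, n_train) distances
    pred = y_train[np.argmin(dist, axis=1)]
    return float(np.mean(pred == y_test))
```

The pairwise difference array has size `n_test * n_train * r * d2`, which is fine for 32 × 32 images and a few hundred samples but would need batching for larger sets.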

Image recognition

Performance on three image databases

The Yale database is a human face database that contains 165 images of 15 individuals, with 11 images per individual. The database is used to evaluate the performance of methods under changing facial expressions and lighting conditions.

The Columbia Object Image Library (Coil100) is a database of colour images of 100 objects. The objects were placed on a motorized turntable against a black background. The turntable was rotated 360 degrees to vary object pose with respect to a fixed colour camera. Images of the objects were taken at pose intervals of 5 degrees. The database contains 900 images of 100 objects, with each object containing 9 images.

The COVID database has 349 CT images containing clinical findings of COVID-19 from 216 patients and 397 non-COVID CT scans. The images are collected from COVID-19 related papers from medRxiv, bioRxiv, Lancet, etc. In our experiment, 195 COVID-19 images and 195 non-COVID-19 images were randomly extracted.

We resize each image to 32 × 32 for all three databases. Since some classes of the image data contain relatively few samples, random cross-validation could leave such classes unselected. To avoid this, for each class we randomly select 60% of the data samples as the training set and use the rest as the test set, which ensures that both the training and test sets contain samples from every class. First, we obtain all projection matrices from the training data and then compute the test classification accuracy on the projected test data. Since 2DPCA, 2DLDA and 2DBLDA have no parameters, the result of one run is the final result. For L1-2DLDA, there is one parameter, and we adopt ten-fold cross validation on the training set to find its optimal value. This optimal parameter is then used to run L1-2DLDA ten times on the test set to eliminate the influence of random initialization, and the average of these ten accuracies is reported. Similarly, for 2DPCA-L1, since its performance is affected by the initial projections, we repeat the method ten times and report the mean accuracy along with the standard deviation. The results on these databases are listed in Table 1, and the best accuracies are shown in bold figures. From the table, we see that 2DBLDA performs comparably to the other methods. 2DPCA-L1 and L1-2DLDA clearly have the highest computational burden. In contrast, 2DBLDA requires the least CPU time on Yale and Coil100 and is comparable to 2DPCA and 2DLDA on COVID.
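The per-class 60% split described above can be sketched as follows. The function name `per_class_split`, the seed handling and the rounding rule for class sizes that are not multiples of ten are assumptions made for illustration, since the text does not specify them.

```python
import numpy as np

def per_class_split(y, train_fraction=0.6, seed=0):
    """Return (train_idx, test_idx), drawing train_fraction of every class.

    Assumption: the per-class training count is rounded to the nearest integer.
    """
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    train_idx = []
    for label in np.unique(y):
        members = np.flatnonzero(y == label)
        rng.shuffle(members)                          # random order within the class
        n_train = int(round(train_fraction * len(members)))
        train_idx.extend(members[:n_train])
    train_idx = np.sort(np.array(train_idx, dtype=int))
    test_idx = np.setdiff1d(np.arange(len(y)), train_idx)
    return train_idx, test_idx
```

With 15 classes of 11 images each, this rounding rule yields 7 training and 4 test images per class, so every class appears in both sets.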

Table 1.

Comparison of mean accuracy (%) and CPU time (seconds) for different methods on the original three databases

| Data set | Measure | 2DPCA | 2DPCA-L1 | 2DLDA | L1-2DLDA (δ) | 2DBLDA |
|---|---|---|---|---|---|---|
| Yale | AC | 85.00 | 82.67 ± 0.86 | 83.33 | 85.00 ± 0.00 | 85.00 |
| Yale | Time | 1.6835 | 9.2095 | 1.4929 | 10.4038 (0.001) | 1.4174 |
| Coil100 | AC | 74.00 | 67.93 ± 1.26 | 72.00 | 73.37 ± 0.46 | 74.33 |
| Coil100 | Time | 30.4082 | 67.3805 | 29.1930 | 71.5989 (0.01) | 28.7575 |
| COVID | AC | 95.00 | 90.63 ± 0.93 | 94.38 | 93.75 ± 0.51 | 94.38 |
| COVID | Time | 6.0727 | 21.2971 | 5.2936 | 20.6759 (0.05) | 5.9291 |
| Average accuracy | AC | 84.67 | 80.41 | 83.24 | 84.04 | 84.57 |

The optimal parameter δ of L1-2DLDA is shown in parentheses.

To further demonstrate the advantage of 2DBLDA, we artificially pollute the training data by adding a rectangular block occlusion to each training sample at a random location. We set the occlusion area ratio to 10%, 20%, 30% and 40%. For convenience, we denote these four data sets as Yale_b0.1, Yale_b0.2, Yale_b0.3 and Yale_b0.4, where the subscript “b” represents block occlusion and the number next to it is the occlusion ratio. For the Coil100 and COVID data, we add rectangular Gaussian noise of mean 0 and variance 0.2 that covers 10%, 20%, 30% and 40% of the area of each training image at a random position. We denote these eight data sets as Coil_g0.1, Coil_g0.2, Coil_g0.3, Coil_g0.4, COVID_g0.1, COVID_g0.2, COVID_g0.3 and COVID_g0.4, where the subscript “g” represents Gaussian noise and the number next to it is the noise ratio. Some noise samples are shown in Fig. 1.
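The two kinds of pollution can be sketched as below. Several details are not given in the text and are assumptions of this sketch: the block keeps the aspect ratio of the image, the occlusion value is 0, the pixel values are floating point, and the helper names `random_block`, `add_block_occlusion` and `add_block_gaussian` are hypothetical.

```python
import numpy as np

def random_block(shape, area_ratio, rng):
    """Pick a random rectangle covering roughly area_ratio of an image."""
    h, w = shape
    side = np.sqrt(area_ratio)                        # assumption: same aspect ratio as the image
    bh = max(1, int(round(side * h)))
    bw = max(1, int(round(side * w)))
    top = rng.integers(0, h - bh + 1)                 # upper bound is exclusive
    left = rng.integers(0, w - bw + 1)
    return slice(top, top + bh), slice(left, left + bw)

def add_block_occlusion(img, area_ratio, rng, value=0.0):
    """Overwrite a random rectangle with a constant value (assumed 0)."""
    out = img.astype(float).copy()
    rows, cols = random_block(img.shape, area_ratio, rng)
    out[rows, cols] = value
    return out

def add_block_gaussian(img, area_ratio, rng, var=0.2):
    """Add zero-mean Gaussian noise of the given variance inside a random rectangle."""
    out = img.astype(float).copy()
    rows, cols = random_block(img.shape, area_ratio, rng)
    block = out[rows, cols]
    out[rows, cols] = block + rng.normal(0.0, np.sqrt(var), size=block.shape)
    return out
```

For example, `add_block_occlusion(img, 0.2, np.random.default_rng(0))` on a 32 × 32 image blanks a 14 × 14 rectangle, which is about 19% of the area because of the integer rounding.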

Fig. 1. Noise samples from the three databases

The classification results on the noise datasets are listed in Table 2. From the table, we have the following observations: (i) All methods are affected by noise, and their accuracies are lower than those on the original data. In general, the larger the noise area, the lower the accuracy. (ii) The proposed 2DBLDA has the highest average accuracy over the noise data sets and achieves the best or tied-best accuracy on each of them. (iii) On average, L1-2DLDA and 2DPCA perform better than 2DPCA-L1 and 2DLDA. (iv) L1-2DLDA can achieve its optimal accuracy when δ is relatively small. (v) For CPU time, 2DPCA-L1 and L1-2DLDA are at the same level of computing time but are both slower than 2DPCA and 2DLDA, and 2DLDA and 2DBLDA run the fastest since they obtain all the discriminant vectors at once.

Table 2.

Comparison of mean accuracy (%) and CPU time (seconds) for different methods on noise databases

| Data set | Measure | 2DPCA | 2DPCA-L1 | 2DLDA | L1-2DLDA (δ) | 2DBLDA |
|---|---|---|---|---|---|---|
| Yale_b0.1 | AC | 76.67 | 59.33 ± 2.63 | 76.66 | 77.33 ± 1.17 | 78.33 |
| Yale_b0.1 | Time | 2.0922 | 10.0383 | 1.8348 | 11.1539 (0.1) | 1.6212 |
| Yale_b0.2 | AC | 76.67 | 55.83 ± 3.07 | 70.00 | 73.67 ± 2.33 | 76.67 |
| Yale_b0.2 | Time | 2.7252 | 11.0812 | 2.3906 | 12.1323 (0.005) | 2.0293 |
| Yale_b0.3 | AC | 63.33 | 50.67 ± 7.58 | 63.33 | 64.67 ± 2.33 | 65.00 |
| Yale_b0.3 | Time | 3.4940 | 12.3434 | 3.1027 | 13.4336 (0.05) | 2.6751 |
| Yale_b0.4 | AC | 56.67 | 49.67 ± 3.31 | 56.67 | 51.83 ± 3.46 | 60.00 |
| Yale_b0.4 | Time | 4.6380 | 13.9008 | 4.1114 | 14.9425 (0.005) | 3.6111 |
| Coil_g0.1 | AC | 71.67 | 63.77 ± 3.39 | 70.67 | 71.37 ± 1.11 | 72.33 |
| Coil_g0.1 | Time | 36.3437 | 84.5664 | 36.3030 | 87.0550 (0.001) | 35.3140 |
| Coil_g0.2 | AC | 67.00 | 51.43 ± 1.75 | 68.67 | 67.10 ± 0.88 | 70.00 |
| Coil_g0.2 | Time | 43.2088 | 86.9822 | 43.1494 | 91.3349 (0.05) | 41.0304 |
| Coil_g0.3 | AC | 63.33 | 48.93 ± 3.43 | 63.33 | 61.80 ± 2.54 | 68.33 |
| Coil_g0.3 | Time | 48.2347 | 90.4744 | 45.5738 | 94.9467 (0.1) | 43.4944 |
| Coil_g0.4 | AC | 58.00 | 43.70 ± 0.64 | 60.00 | 57.73 ± 1.84 | 61.00 |
| Coil_g0.4 | Time | 52.1063 | 93.9737 | 47.4356 | 98.0803 (0.005) | 45.9146 |
| COVID_g0.1 | AC | 94.38 | 91.44 ± 0.42 | 93.75 | 94.11 ± 0.03 | 94.38 |
| COVID_g0.1 | Time | 6.7681 | 23.4892 | 6.8060 | 22.6747 (0.1) | 6.2296 |
| COVID_g0.2 | AC | 93.13 | 91.25 ± 0.00 | 93.13 | 92.94 ± 0.93 | 93.75 |
| COVID_g0.2 | Time | 6.8413 | 23.5693 | 6.8593 | 23.9786 (0.01) | 6.3514 |
| COVID_g0.3 | AC | 91.25 | 90.88 ± 0.67 | 90.00 | 90.88 ± 0.72 | 91.25 |
| COVID_g0.3 | Time | 7.0305 | 23.9982 | 7.0121 | 24.6301 (0.005) | 6.5660 |
| COVID_g0.4 | AC | 90.63 | 88.88 ± 0.92 | 87.50 | 92.32 ± 0.78 | 92.50 |
| COVID_g0.4 | Time | 7.2641 | 24.5959 | 6.9099 | 25.4767 (0.1) | 6.6628 |
| Average accuracy | AC | 75.23 | 65.48 | 74.48 | 74.65 | 76.96 |

The optimal parameter δ of L1-2DLDA is shown in parentheses.

The influence of the reduced dimension

To observe the discriminant ability of the dimensionality reduction methods, we evaluate the classification accuracy in the projected space and plot how it varies with the reduced dimension in Figs. 2 and 3. Figure 2 depicts the variation of accuracy with the reduced dimension on the original three databases, and Fig. 3 depicts the corresponding results on the noise databases.

Fig. 2. Accuracies of all methods on the original three databases

Fig. 3. Accuracies of all methods on three databases with different levels of noise

The results show the following: (i) As the number of reduced dimensions increases, the accuracies of 2DPCA and our 2DBLDA first reach their highest values and then remain relatively steady, while those of the other methods vary greatly. (ii) On both the original data and the noise data, the proposed 2DBLDA has the highest accuracy under its optimal reduced dimension. (iii) All the methods are greatly influenced by the reduced dimension, so it is necessary to choose an optimal reduced dimension. (iv) In addition, the optimal reduced dimension of 2DBLDA is generally not large compared to that of other methods.

The influence of the unbalanced classes

In this subsection, we examine the performance of our algorithm on unbalanced classes. To construct unbalanced data, different numbers of images are randomly selected from each class to form the training set, and the remaining data are used as the test set. Specifically, for the COVID database, we randomly select 60% of the samples of each class, drawn from the COVID-19 images and non-COVID-19 images in a ratio of 1:1.5, as the training set. Notably, both the training set and the test set constructed in this way are unbalanced. To test the robustness, as before, we pollute the training images with a black rectangular block that covers 10%, 20%, 30% and 40% of each image at a random position. In this situation, we use AUC and G-mean, both of which are designed for unbalanced data, to measure the performance of all methods. The results on the original and noise databases are shown in Figs. 4 and 5. From Figs. 4 and 5, we can see that the proposed 2DBLDA has the highest AUC and G-mean on all databases. Although the performance of all algorithms decreases as the noise area grows, 2DBLDA and L1-2DLDA are less affected by noise as the block percentage increases, while the performance of the other methods decreases dramatically, and the proposed 2DBLDA remains the best. This result is consistent with the formulation of 2DBLDA, in which the weighted between-class distance information and the constant weighting the between-class distance against the within-class distance contribute to its good performance on unbalanced problems. The result also shows that, compared to other methods, 2DBLDA is more adaptive and robust to different data.
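For reference, the two measures can be computed as in the sketch below. It uses the usual definitions: G-mean as the geometric mean of sensitivity and specificity, and AUC as the probability that a positive sample receives a higher score than a negative one. How the authors obtain scores from the nearest neighbour classifier is not stated, so the `scores` argument and both function names are assumptions of this sketch.

```python
import numpy as np

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (positive class = 1, negative = 0)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    sensitivity = np.mean(y_pred[y_true == 1] == 1)
    specificity = np.mean(y_pred[y_true == 0] == 0)
    return float(np.sqrt(sensitivity * specificity))

def auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 1/2)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))
```

Unlike plain accuracy, both measures drop sharply when a classifier favours the majority class, which is why they suit the unbalanced setting.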

Fig. 4. AUC of all methods on different databases

Fig. 5. G-mean of all methods on different databases

Face Reconstruction

In this part, the proposed 2DBLDA and other methods are applied to face reconstruction on the Indian female database. The database contains 242 human face images of 22 female individuals, with 11 different images per individual. The original images are resized to 32 × 32 pixels.

We first describe face image reconstruction. For a given image $X\in\mathbb{R}^{d_1\times d_2}$, suppose we have obtained a projection matrix $W=(w_1,w_2,\ldots,w_r)\in\mathbb{R}^{d_1\times r}$, $r\le d_1$. Then $X$ is projected into the $r\times d_2$-dimensional space by $\widetilde{X}=W^TX$. Since $w_1,w_2,\ldots,w_r$ are orthonormal, the reconstructed image of $X$ can be obtained by $\hat{X}=W\widetilde{X}=WW^TX$. To measure the reconstruction performance, we use the average reconstruction error (ARE) as a performance indicator, which is defined as

$$\bar{e}_r=\frac{1}{N}\sum_{i=1}^{N}\|X_i-WW^TX_i\|_F, \tag{17}$$

where $r=1,2,\ldots,d_1$.
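The average reconstruction error is straightforward to compute once a projection matrix is available. The sketch below assumes images stacked in an array of shape `(N, d1, d2)` and a matrix `W` with orthonormal columns; the function name `average_reconstruction_error` is chosen here for illustration.

```python
import numpy as np

def average_reconstruction_error(W, X):
    """Mean Frobenius norm of X_i - W W^T X_i over all images.

    W: (d1, r) with orthonormal columns; X: (N, d1, d2).
    """
    P = W @ W.T                                       # projector onto the span of W
    reconstructed = np.einsum("ac,ncb->nab", P, X)    # W W^T X_i for every image
    residual = X - reconstructed
    return float(np.mean(np.linalg.norm(residual, axis=(1, 2))))
```

Evaluating this for the first `r` columns of `W`, with `r` running from 1 to `d1`, produces an ARE curve over the reduced dimension. For a full orthonormal basis (`r = d1`) the error is zero up to rounding.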

We first experiment on the original data and compute the ARE for each method. The variation in ARE along different dimensions is shown in Fig. 6a. From the figure, we see that when the dimension is less than 15, 2DBLDA performs the best, especially when the dimension is greater than 5. When the dimension is greater than 15, 2DPCA is comparable to or slightly better than 2DBLDA, but both methods have almost reached steady performance. The result shows that 2DBLDA can achieve good performance for low dimensions. The other three methods clearly perform worse than 2DBLDA and 2DPCA on all the dimensions. For r = 15, we show the reconstructed face images of 7 random individuals in Fig. 6b. We can visually see that 2DBLDA and 2DPCA have the best reconstruction performance.

Fig. 6. Face reconstruction results on different databases

To further evaluate the effectiveness of the proposed 2DBLDA, we add two different types of noise to the data. The first type is Gaussian noise with mean 0 and variance 0.05 that covers 30% of the area of each image. The ARE of each method under different dimensions is plotted in Fig. 6c. On the Gaussian noise data, 2DBLDA outperforms the other methods on almost all the reduced dimensions, and 2DPCA is comparable to 2DBLDA only when the dimension is greater than 27, indicating that the proposed 2DBLDA can achieve fairly good performance with only a small number of reduced dimensions. We then add the second type of noise, dummy noise, to the data. Here, a dummy noise image is generated from the discrete uniform distribution on [0,1] and has the same size as the original image. An additional 100 dummy images are added to the whole database. After the projection matrix is obtained on these polluted data, it is used to reconstruct the human face images. The result in Fig. 6e demonstrates that 2DBLDA has the lowest ARE on these data for all the dimensions, and when the dimension is greater than 20, it has a rather low ARE. The reconstructed face images for r = 15 shown in Fig. 6f also support the above argument.

Discussion

To further clarify the contribution of our method, we discuss the differences between the proposed 2DBLDA and its two closely related methods, RLp2DLDA and L2BLDA, and give a detailed analysis of the experimental results.

Relationship between RLp2DLDA, L2BLDA and 2DBLDA

  • (i) Difference from RLp2DLDA: The formulation of 2DBLDA differs from that of any existing 2D linear discriminant analysis method, and the 2DBLDA criterion is derived by minimizing an upper bound of the Bhattacharyya error. Although robust bilateral Lp-norm two-dimensional linear discriminant analysis (RLp2DLDA) is also derived from an upper bound of the Bhattacharyya error, the two methods have different formulations since they rely on different error bounds. In fact, the bound for 2DBLDA may be tighter than that of RLp2DLDA, which can be seen from two aspects. First, when deriving its bound, RLp2DLDA ignores the term $\sqrt{P_iP_j}$ and replaces it by 1, which obviously enlarges the upper bound. In contrast, 2DBLDA keeps this term and fully exploits this weighting information, which leads to one of the good properties of 2DBLDA, namely robustness. Second, RLp2DLDA also enlarges its upper bound by using the Lp-norm (0 < p < 1) rather than the L2-norm. This results in two advantages of 2DBLDA over RLp2DLDA: 2DBLDA obtains a meaningful weighting parameter that does not need tuning, and 2DBLDA can solve its optimization problem through a standard eigenvalue problem, while RLp2DLDA solves its optimization problem through an iterative technique whose convergence is not proven.

  • (ii) Difference from L2BLDA: Compared to the vector-based robust Bhattacharyya bound linear discriminant analysis through an adaptive algorithm (L2BLDA), the proposed 2DBLDA is a matrix-based dimensionality reduction method. Although 2DBLDA is a generalization of L2BLDA, the generalization is not straightforward as far as the derivation of the upper bound is concerned. In fact, the derivation of the Bhattacharyya error bound of 2DBLDA is not exactly the same as that of L2BLDA. In addition, 2DBLDA can deal with matrix inputs more effectively without vectorizing them first, which improves the computing efficiency, especially when computing the scatter matrices.

Experimental results summary

  • (i)

    To study the performance of 2DBLDA, we give the variation of accuracies under different databases and different noise levels. The time of 2DBLDA is also investigated in Tables 1-2. Experimental results show that 2DBLDA runs fast and improves the robustness of 2DLDA.

  • (ii)

    To compare the behavior of 2DBLDA and other related methods under different reduced dimensions, we plot the accuracy variation along with the reduced dimensions in Figs. 2-3. The results demonstrate that compared to other methods, the proposed 2DBLDA obtains better classification results under its optimal reduced dimension.

  • (iii)

    To see the application ability of 2DBLDA in unbalanced classes, we experiment on three original and different noise image databases. From the results in Fig. 4 and Fig. 5, we see that the proposed 2DBLDA has the best performance compared to other methods.

  • (iv)

    To observe the behavior of the proposed method visually, we reconstruct face images with the obtained projection matrix. The original and polluted Indian female databases are used for face reconstruction. With an appropriate reduced dimension, which need not be large, the proposed 2DBLDA obtains good face reconstruction performance.
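The reconstruction step in item (iv) can be sketched as follows. The sketch assumes a left-side projection matrix W with orthonormal columns, a projected image $Y=W^TX$, and a reconstruction $\hat{X}=WY$; this convention and the helper names are assumptions made for illustration, and a random orthonormal basis stands in for a learned projection.

```python
import numpy as np

def project(X, W):
    """Project a d1 x d2 image onto the columns of W (d1 x d)."""
    return W.T @ X

def reconstruct(Y, W):
    """Map a projected image back to the original d1 x d2 space."""
    return W @ Y

def reconstruction_error(X, W):
    """Frobenius norm of the difference between X and its reconstruction."""
    return np.linalg.norm(X - reconstruct(project(X, W), W))

# Demonstration with a random orthonormal basis (a stand-in for a learned W).
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))   # columns of Q are orthonormal
X = rng.normal(size=(6, 4))                    # a toy 6 x 4 "image"
errors = [reconstruction_error(X, Q[:, :d]) for d in range(1, 7)]
```

With nested orthonormal bases the error cannot increase as the reduced dimension grows, and it vanishes when all directions are kept; the practical question raised above is how small the dimension can be while the reconstruction remains visually acceptable.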

Conclusion

This paper proposed a novel two-dimensional linear discriminant analysis via Bhattacharyya upper bound optimality (2DBLDA). Different from the existing 2DLDA, optimizing the criterion of 2DBLDA was equivalent to optimizing the upper bound of the Bhattacharyya error, leading to maximizing a weighted between-class distance and minimizing the within-class distance, where these two distances were weighted by a meaningful adaptive constant that can be computed directly from the involved data. The 2DBLDA had no parameters to be tuned and could be effectively solved by a standard eigenvalue decomposition problem. Experimental results on image recognition and face image reconstruction demonstrated the superiority of the proposed method. Our MATLAB code can be downloaded from http://www.optimal-group.org/Resources/Code/2DBLDA.html.

However, a drawback of 2DBLDA is that its classification performance degrades when the class distribution of the samples is inconsistent. The tensor-aligned invariant subspace learning (TAISL) technique could be used to handle this issue [38]. Since sparse learning can make the data more interpretable after dimensionality reduction [20], a sparse model is another direction for future study. Finally, applying our algorithm to track fault detection is also worth studying [39, 40].

Acknowledgements

This work is supported by the Hainan Provincial Natural Science Foundation of China (No. 620QN234 and No. 120RC449) and the National Natural Science Foundation of China (No. 12171307, No. 62066012, No. 61703370, No. 11871183, No. 61866010, No. 11771275 and No. 6210021509), in part by Zhejiang Soft Science Research Project (2021C35003) and in part by the Natural Science Foundation of Inner Mongolia Autonomous Region (No. 2019BS01009).

Biographies

Yan-Ru Guo

received her Master’s degree in Department of Mathematics from Zhejiang University of Technology, China, in 2015. Currently, she is pursuing a Ph.D. degree at Shanghai University. Her research interests include optimization methods and machine learning.

Yan-Qin Bai

is a Professor and Doctoral supervisor in the Department of Mathematics, School of Sciences at Shanghai University. She received her Ph.D. degree in Operations Research and Cybernetics from Shanghai University. She held a research fellow position at Delft University of Technology from 2001 to 2004 and a postdoctoral research fellow position at Tilburg University, Netherlands, in 2006. Her research interests include interior-point methods for solving linear and conic optimization problems, with applications to machine learning, sparse optimization, portfolio selection, etc. She has published over 100 research papers and a book entitled Kernel Function-Based Interior-point Algorithms for Conic Optimization, published by Science Press China. She served as the vice-president of the Operations Research Society of China (ORSC) and vice-president of the Branch of Mathematical Programming, and is currently the president of the Operations Research Society of Shanghai, an associate editor-in-chief of Operations Research Transactions, and an editorial board member of the Journal of the Operations Research Society of China (JORSC).

Chun-Na Li

received her Master’s degree and Ph.D. degree in Department of Mathematics from Harbin Institute of Technology, China, in 2009 and 2012, respectively. Currently, she is an associate professor at Management School, Hainan University. Her research interests include optimization methods, machine learning and data mining.

Lan Bai

received the Ph.D. degree in mathematics from the Department of Mathematics, Jilin University, Changchun, China, in 2014. She is currently a Lecturer with the School of Mathematical Sciences, Inner Mongolia University, Hohhot, China. Her research interests include clustering techniques, feature selection, and data mining.

Yuan-Hai Shao

received his B.S. degree in information and computing science in College of Mathematics from Jilin University, the master’s degree in applied mathematics, and Ph.D. degree in operations research and management in College of Science from China Agricultural University, China, in 2006, 2008 and 2011, respectively. Currently, he is a professor at Management School, Hainan University. His research interests include optimization methods, machine learning and data mining. He has published over 100 refereed papers.

Appendix

Proof of Proposition 1:

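We first note that $p_i(\tilde{X})=\mathcal{N}(\tilde{X}\,|\,\bar{\tilde{X}}_i,\tilde{\Sigma})$, where $\bar{\tilde{X}}_i=w^T\bar{X}_i\in\mathbb{R}^{1\times d_2}$ is the mean of the i-th class, and $\tilde{\Sigma}$ is the covariance matrix in the $1\times d_2$ projected space. Denote

$$D=\begin{pmatrix} w^TX_1\\ \vdots\\ w^TX_N\end{pmatrix}^T\in\mathbb{R}^{d_2\times N}\quad\text{and}\quad \bar{\tilde{X}}_I=\begin{pmatrix} w^T\bar{X}_{t_1}\\ \vdots\\ w^T\bar{X}_{t_N}\end{pmatrix}^T\in\mathbb{R}^{d_2\times N}. \tag{18}$$

Then $\tilde{\Sigma}=(D-\bar{\tilde{X}}_I)(D-\bar{\tilde{X}}_I)^T$.

According to [1], we have

$$\int\sqrt{p_i(\tilde{X})\,p_j(\tilde{X})}\,d\tilde{X}=e^{-\frac{1}{8}(\bar{\tilde{X}}_i-\bar{\tilde{X}}_j)\tilde{\Sigma}^{-1}(\bar{\tilde{X}}_i-\bar{\tilde{X}}_j)^T}. \tag{19}$$

The upper bound of the error $\epsilon_B$ can be estimated as

$$\begin{aligned}
\epsilon_B &= \sum_{i<j}^{c}\sqrt{P_iP_j}\,e^{-\frac{1}{8}(\bar{\tilde{X}}_i-\bar{\tilde{X}}_j)\tilde{\Sigma}^{-1}(\bar{\tilde{X}}_i-\bar{\tilde{X}}_j)^T}\\
&= \sum_{i<j}^{c}\sqrt{P_iP_j}\,e^{-\frac{1}{8}\|(\bar{\tilde{X}}_i-\bar{\tilde{X}}_j)\tilde{\Sigma}^{-\frac{1}{2}}\|_2^2}\\
&\le \sum_{i<j}^{c}\sqrt{P_iP_j}\left(1-\frac{a}{8}\|(\bar{\tilde{X}}_i-\bar{\tilde{X}}_j)\tilde{\Sigma}^{-\frac{1}{2}}\|_2^2\right)\\
&= \sum_{i<j}^{c}\sqrt{P_iP_j}-\frac{a}{8}\sum_{i<j}^{c}\sqrt{P_iP_j}\,\|(w^T\bar{X}_i-w^T\bar{X}_j)\tilde{\Sigma}^{-\frac{1}{2}}\|_2^2\\
&\le \sum_{i<j}^{c}\sqrt{P_iP_j}-\frac{a}{8}\sum_{i<j}^{c}\sqrt{P_iP_j}\,\frac{\|w^T\bar{X}_i-w^T\bar{X}_j\|_2^2}{\|\tilde{\Sigma}^{\frac{1}{2}}\|_F^2}\\
&\le \sum_{i<j}^{c}\sqrt{P_iP_j}-\frac{a}{8}\sum_{i<j}^{c}\sqrt{P_iP_j}\,\|w^T(\bar{X}_i-\bar{X}_j)\|_2^2+\frac{a}{8}\sum_{i<j}^{c}\sqrt{P_iP_j}\,\Delta_{ij}\,\|\tilde{\Sigma}^{\frac{1}{2}}\|_F^2,
\end{aligned} \tag{20}$$

where $\Delta_{ij}=\frac{1}{4}\|\bar{X}_i-\bar{X}_j\|_F^2$ and $a>0$ is some constant. For the first inequality of (20), note that the real-valued function $f(z)=e^{-z}$ is convex on $z\in[0,b]$, $b>0$, so it lies below its chord on this interval; therefore, $e^{-z}\le 1-\frac{1-e^{-b}}{b}z$. By taking $a=\frac{1-e^{-b}}{b}$ and noting $\bar{\tilde{X}}_i=w^T\bar{X}_i$, the first inequality is obtained. For the second inequality, we first note that for any $z\in\mathbb{R}^{1\times d_2}$ and an invertible $A\in\mathbb{R}^{d_2\times d_2}$, $\|z\|_2=\|(zA)A^{-1}\|_2\le\|zA\|_2\,\|A^{-1}\|_F$, which implies $\|zA\|_2\ge\frac{\|z\|_2}{\|A^{-1}\|_F}$. By taking $z=w^T\bar{X}_i-w^T\bar{X}_j$ and $A=\tilde{\Sigma}^{-\frac{1}{2}}$, we get the second inequality. For the last inequality, since $\|w\|_2=1$, $\|w^T(\bar{X}_i-\bar{X}_j)\|_2^2\le\|w\|_2^2\,\|\bar{X}_i-\bar{X}_j\|_F^2=\|\bar{X}_i-\bar{X}_j\|_F^2$ and $\frac{1}{\|\tilde{\Sigma}^{\frac{1}{2}}\|_F^2}\left(1-\frac{1}{\|\tilde{\Sigma}^{\frac{1}{2}}\|_F^2}\right)\le\frac{1}{4}$, we have

$$\begin{aligned}
\left(\|w^T(\bar{X}_i-\bar{X}_j)\|_2^2-\frac{\|w^T(\bar{X}_i-\bar{X}_j)\|_2^2}{\|\tilde{\Sigma}^{\frac{1}{2}}\|_F^2}\right)\cdot\frac{1}{\|\tilde{\Sigma}^{\frac{1}{2}}\|_F^2}
&= \|w^T(\bar{X}_i-\bar{X}_j)\|_2^2\cdot\frac{1}{\|\tilde{\Sigma}^{\frac{1}{2}}\|_F^2}\left(1-\frac{1}{\|\tilde{\Sigma}^{\frac{1}{2}}\|_F^2}\right)\\
&\le \frac{1}{4}\|w^T(\bar{X}_i-\bar{X}_j)\|_2^2\le\frac{1}{4}\|\bar{X}_i-\bar{X}_j\|_F^2=\Delta_{ij},
\end{aligned} \tag{21}$$

which implies

$$-\frac{\|w^T(\bar{X}_i-\bar{X}_j)\|_2^2}{\|\tilde{\Sigma}^{\frac{1}{2}}\|_F^2}\le-\|w^T(\bar{X}_i-\bar{X}_j)\|_2^2+\Delta_{ij}\,\|\tilde{\Sigma}^{\frac{1}{2}}\|_F^2. \tag{22}$$

By multiplying both sides of (22) by $\frac{a}{8}\sqrt{P_iP_j}$ and summing over all $1\le i<j\le c$, we obtain the last inequality of (20).

Take $\Delta=\sum_{i<j}^{c}\sqrt{P_iP_j}\,\Delta_{ij}=\frac{1}{4}\sum_{i<j}^{c}\sqrt{P_iP_j}\,\|\bar{X}_i-\bar{X}_j\|_F^2$, and note that $\|\tilde{\Sigma}^{\frac{1}{2}}\|_F^2=\sum_{i=1}^{c}\sum_{s=1}^{N_i}\|w^T(X_{is}-\bar{X}_i)\|_2^2$; we then obtain (12). □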
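As a numerical sanity check on two scalar steps of the proof, the following sketch samples random values and evaluates (a) the chord bound $e^{-z}\le 1-\frac{1-e^{-b}}{b}z$ on $[0,b]$ and (b) the inequality $-A/S\le -A+\Delta_{ij}S$ in the worst case $\Delta_{ij}=A/4$, where $A$ stands for the squared projected mean difference and $S>0$ for the squared Frobenius norm of the square root of the projected covariance. The function names and the sampled ranges are illustrative assumptions; the check illustrates the inequalities and does not replace the proof.

```python
import math
import random

def chord_gap(z, b):
    """(1 - a*z) - exp(-z) with a = (1 - exp(-b)) / b; nonnegative for z in [0, b]."""
    a = (1.0 - math.exp(-b)) / b
    return (1.0 - a * z) - math.exp(-z)

def last_step_gap(A, S):
    """Right side minus left side of -A/S <= -A + delta*S, using delta = A/4.

    Algebraically the gap equals A * (S - 2)**2 / (4 * S), which is >= 0
    whenever A >= 0 and S > 0; a larger delta only increases the gap.
    """
    delta = A / 4.0
    return (-A + delta * S) - (-A / S)

random.seed(0)
min_chord = min(
    chord_gap(random.uniform(0.0, b), b)
    for b in (0.5, 1.0, 5.0)
    for _ in range(1000)
)
min_last = min(
    last_step_gap(random.uniform(0.0, 10.0), random.uniform(0.01, 10.0))
    for _ in range(1000)
)
print(min_chord, min_last)  # both should be nonnegative up to rounding error
```

The closed form in the docstring of `last_step_gap` also shows when the last inequality is tight: at $S=2$ with $\Delta_{ij}$ equal to a quarter of the squared projected mean difference.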

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Yan-Ru Guo, Email: Guoyanru211@163.com.

Yan-Qin Bai, Email: yqbai@shu.edu.cn.

Chun-Na Li, Email: na1013na@163.com.

Lan Bai, Email: imubailan@163.com.

Yuan-Hai Shao, Email: shaoyuanhai21@163.com.

References

  • 1. Fukunaga K. Introduction to statistical pattern recognition. New York: Academic Press; 2013.
  • 2. Shah JH, Sharif M, Yasmin M, et al. Facial expressions classification and false label reduction using LDA and threefold SVM. Pattern Recogn Lett. 2020;139:166–173. doi: 10.1016/j.patrec.2017.06.021.
  • 3. Ouyang AJ, Liu YM, Pei SY, et al. A hybrid improved kernel LDA and PNN algorithm for efficient face recognition. Neurocomputing. 2020;393:214–222. doi: 10.1016/j.neucom.2019.01.117.
  • 4. Miha P, Vili P. Text classification method based on self training and LDA topic models. Expert Syst Appl. 2017;80:83–93. doi: 10.1016/j.eswa.2017.03.020.
  • 5. Chen Y, Zhang H, Liu R, et al. Experimental explorations on short text topic mining between LDA and NMF based schemes. Knowl-Based Syst. 2019;163:1–13. doi: 10.1016/j.knosys.2018.08.011.
  • 6. Cao G, Iosifidis A, Gabbouj M, et al. Multi view nonparametric discriminant analysis for image retrieval and recognition. IEEE Signal Process Lett. 2017;24(10):1537–1541. doi: 10.1109/LSP.2017.2748392.
  • 7. Liu Z, Zhang CM, Chen CX. MMDF LDA An improved multi modal latent dirichlet allocation model for social image annotation. Expert Syst Appl. 2018;104:168–184. doi: 10.1016/j.eswa.2018.03.014.
  • 8. Wang H, Fan Y, Fang B, et al. Generalized linear discriminant analysis based on euclidean norm for gait recognition. Int J Mach Learn Cybern. 2018;9(4):569–576. doi: 10.1007/s13042-016-0540-0.
  • 9. Dong K, Zhao H, Tong T, et al. NBLDA: Negative binomial linear discriminant analysis for RNA-seq data. BMC Bioinforma. 2016;17(1):1–10. doi: 10.1186/s12859-016-1208-1.
  • 10. Ibrahim W, Abadeh MS. Protein fold recognition using deep kernelized extreme learning machine and linear discriminant analysis. Neural Comput Appl. 2019;31(8):4201–4214. doi: 10.1007/s00521-018-3346-z.
  • 11. Sharma A, Paliwal KK. Linear discriminant analysis for the small sample size problem: an overview. Int J Mach Learn Cybern. 2015;6(3):443–454. doi: 10.1007/s13042-013-0226-9.
  • 12. Li X, Pang Y, Yuan Y. L1-norm-based 2DPCA. IEEE Trans Syst Man Cybern Part B (Cybern). 2010;40(4):1170–1175. doi: 10.1109/TSMCB.2009.2035629.
  • 13. Mi JX, Zhang YN, Li Y, et al. Generalized two-dimensional PCA based on ℓ2-norm minimization. Int J Mach Learn Cybern. 2020;11:2421–2438. doi: 10.1007/s13042-020-01127-1.
  • 14. Lu Y, Yuan C, Lai Z, et al. Horizontal and vertical nuclear norm based 2DLDA for image representation. IEEE Trans Circ Syst Video Technol. 2019;29(4):941–955. doi: 10.1109/TCSVT.2018.2822761.
  • 15. Zhao M, Jia ZG, Cai YF, et al. Advanced variations of two-dimensional principal component analysis for face recognition. Neurocomputing. 2021;452:653–664. doi: 10.1016/j.neucom.2020.08.083.
  • 16. Li M, Yuan B. 2D-LDA: a statistical linear discriminant analysis for image matrix. Pattern Recogn Lett. 2005;26(5):527–532. doi: 10.1016/j.patrec.2004.09.007.
  • 17. Imani M, Ghassemian H. Two dimensional linear discriminant analyses for hyperspectral data. Photogr Eng Remote Sens. 2015;81(10):777–786. doi: 10.14358/PERS.81.10.777.
  • 18. Chen SB, Chen DR, Luo B. L1-norm based two-dimensional linear discriminant analysis. J Electron Inf Technol. 2015;37(6):1372–1377.
  • 19. Li CN, Shao YH, Deng NY. Robust L1-norm two-dimensional linear discriminant analysis. Neural Netw. 2015;65:92–104. doi: 10.1016/j.neunet.2015.01.003.
  • 20. Li CN, Shang MQ, Shao YH, et al. Sparse L1-norm two dimensional linear discriminant analysis via the generalized elastic net regularization. Neurocomputing. 2019;337:80–96. doi: 10.1016/j.neucom.2019.01.049.
  • 21. Li M, Wang J, Wang Q, et al. Trace ratio 2DLDA with L1-norm optimization. Neurocomputing. 2017;266(29):216–225. doi: 10.1016/j.neucom.2017.05.037.
  • 22. Lu Y, Yuan C, Lai Z, et al. Horizontal and vertical nuclear norm-based 2DLDA for image representation. IEEE Trans Circ Syst Video Technol. 2018;29(4):941–955. doi: 10.1109/TCSVT.2018.2822761.
  • 23. Zhang P, Deng S, Nie F, et al. Nuclear-norm based 2DLDA with application to face recognition. Neurocomputing. 2019;339:94–104. doi: 10.1016/j.neucom.2019.01.066.
  • 24. Li CN, Shao YH, Chen WJ, et al. Generalized two-dimensional linear discriminant analysis with regularization. Neural Netw. 2021;142:73–91. doi: 10.1016/j.neunet.2021.04.030.
  • 25. Li CN, Shao YH, Wang Z, et al. Robust bilateral Lp-norm two-dimensional linear discriminant analysis. Inf Sci. 2019;500:274–297. doi: 10.1016/j.ins.2019.05.066.
  • 26. Du H, Zhao Z, Wang S, et al. Two-dimensional discriminant analysis based on Schatten p-norm for image feature extraction. J Vis Commun Image Represent. 2017;45:87–94. doi: 10.1016/j.jvcir.2017.02.015.
  • 27. Lee YP. Palm vein recognition based on a modified (2d)2LDA. Signal Image Video Process. 2015;9(1):229–242. doi: 10.1007/s11760-013-0425-6.
  • 28. Liu X, Cao Y, Cao Y, et al. Novel method fusing (2d)2LDA with multichannel model for face recognition. J Harbin Inst Technol. 2015;22(6):110–114.
  • 29. Wang Q, Qin Z, Nie F, et al. Convolutional 2DLDA for nonlinear dimensionality reduction. Int Joint Conf Artif Intell. 2017:2929–2935.
  • 30. Xiao X, Chen Y, Gong YJ, et al. Two-dimensional quaternion sparse discriminant analysis. IEEE Trans Image Process. 2019;29:2271–2286. doi: 10.1109/TIP.2019.2947775.
  • 31. Li CN, Shao YH, Wang Z, et al. Robust Bhattacharyya bound linear discriminant analysis through an adaptive algorithm. Knowl-Based Syst. 2019;183:104858. doi: 10.1016/j.knosys.2019.07.029.
  • 32. Nielsen F. Generalized Bhattacharyya and Chernoff upper bounds on Bayes error using quasi-arithmetic means. Pattern Recogn Lett. 2014;42:25–34. doi: 10.1016/j.patrec.2014.01.002.
  • 33. Guo YR, Bai YQ, Li CN, et al. Reverse nearest neighbors Bhattacharyya bound linear discriminant analysis for multimodal classification. Eng Appl Artif Intell. 2021;97:104033. doi: 10.1016/j.engappai.2020.104033.
  • 34. Yang J, Zhang D, Frangi AF, et al. Two-dimensional PCA: a new approach to appearance-based face representation and recognition. IEEE Trans Pattern Anal Mach Intell. 2004;26(1):131–137. doi: 10.1109/TPAMI.2004.1261097.
  • 35. Ayed IB, Punithakumar K, Li S. Distribution matching with the Bhattacharyya similarity: A Bound Optimization Framework. IEEE Trans Pattern Anal Mach Intell. 2015;37(9):1777–1791. doi: 10.1109/TPAMI.2014.2382104.
  • 36. Jiang B, Zhu B. Dynamic Bhattacharyya bound based approach for fault classification in industrial processes. IEEE Transactions on Industrial Informatics. 2021.
  • 37. Gyamfi KS, Brusey J, Hunt A, et al. Linear dimensionality reduction for classification via a sequential Bayes error minimisation with an application to flow meter diagnostics. Expert Syst Appl. 2018;91:252–262. doi: 10.1016/j.eswa.2017.09.010.
  • 38. Hu CF, Wang YX, Gu JW. Cross domain intelligent fault classification of bearings based on tensor aligned invariant subspace learning and two dimensional convolutional neural networks. Knowl-Based Syst. 2020;209:106214. doi: 10.1016/j.knosys.2020.106214.
  • 39. Hu CF, Wang YX. Multidimensional denoising of rotating machine based on tensor factorization. Mech Syst Signal Process. 2019;122:273–289. doi: 10.1016/j.ymssp.2018.12.012.
  • 40. Hu CF, He SL, Wang YX. A classification method to detect faults in a rotating machinery based on kernelled support tensor machine and multilinear principal component analysis. Appl Intell. 2021;51:2609–2621. doi: 10.1007/s10489-020-02011-9.

Articles from Applied Intelligence (Dordrecht, Netherlands) are provided here courtesy of Nature Publishing Group
