Abstract
In real-world applications, face images vary with illumination, facial expression, and pose. More training samples can reveal more of the possible appearances of a face. Although minimum squared error classification (MSEC) is widely used, its application to face recognition usually suffers from a limited number of training samples. In this paper, we improve MSEC by using mirror faces as virtual training samples. We generate the mirror faces from the original training samples and put both kinds of samples into a new training set. Face recognition experiments show that our method achieves higher classification accuracy than conventional MSEC.
1. Introduction
The conventional minimum squared error (MSE) algorithm has been widely used for pattern recognition and also performs well in face classification. MSEC [1, 2] takes a sample and its class label as the input and output, respectively, and tries to obtain the mapping that best transforms the inputs into the corresponding outputs. MSE has several advantages in classification: it is simple and easy to implement, and it is suitable not only for two-class classification [3] but also for multiclass classification [4–6]. MSEC has also been extended to nonlinear methods. Kernel MSE (KMSE) [7, 8] is a well-known nonlinear extension of MSEC. Ruiz and Lopez-de-Teruel [9] proposed a KMSE method in which the solution is based on the generalized inverse of the kernel matrix. Unlike MSE, KMSE tries to obtain a nonlinear mapping between the input and output. A “Lasso” method based on MSE (LMSE) [10–12] has been proposed for pattern recognition. LMSE tries to obtain good performance by minimizing the $l_1$ norm of the solution vector and can be viewed as an extension of conventional MSEC. Total least squares (TLS) is another well-known improvement to MSE. TLS [13, 14] assumes that both the input and output are corrupted and that each can be expressed as the sum of the corresponding “true data” and “measurement noise.” Other methods [15–19] have also been proposed to improve the MSE algorithm. For example, Xu et al. [15] proposed a modified minimum squared error (MMSE) algorithm. Besides pattern recognition, MSE has been applied in other fields such as clustering, data fitting, and density estimation [20–23]. Moreover, the well-known representation-based methods can be viewed as generalized MSEC methods, for example, collaborative representation classification (CRC) [24], two-phase test sample sparse representation (TPTSSR) [25], and the sparse representation-based classifier (SRC) [26].
The MSE algorithm first uses the training samples and their class labels to learn a mapping and then exploits the obtained mapping to predict the class label of a test sample [27, 28]. Next, MSE finds the training sample whose class label is nearest to the predicted label. Finally, MSE assigns the test sample to the class that this training sample belongs to.
The conventional MSEC is limited by the number of training samples. Moreover, face images vary in illumination, facial expression, and pose. Variations in facial expression and pose can be reduced by constraining the capture conditions, but illumination variation is hard to control [29–31]. Therefore, dealing with the illumination problem is necessary for face recognition. Shang et al. [32] proposed an illumination-robust face recognition algorithm based on ordinal features, which adopts the ordinal feature as the ordinary variable. The solutions for illumination-invariant face recognition [33–36] can be classified into three kinds: methods based on illumination-invariant features, methods that model the changes in illumination, and methods that normalize the illumination to a standard condition.
It seems that more training samples can reveal more of the possible variations in illumination, facial expression, and pose and are therefore beneficial for correct classification of the face. However, in the real world, it is hard to capture enough samples, and a system usually has a limited number of training samples. To obtain better face classification, previous studies have proposed synthesizing new samples from the true face images. These new samples are called virtual samples [37, 38]. For example, Tang et al. [39] proposed a method based on prototype faces, optic flow, and expression ratio images to generate virtual samples. Ryu and Oh [30] exploited the distribution of the given training set to generate virtual samples. Jung et al. [40] synthesized new face samples with virtual training samples. Xu et al. [41] used the average face of every two original training samples to generate virtual training samples. Liu et al. [42] exploited kernel principal component analysis (PCA) and symmetrical faces to classify the test samples. Xu et al. [43] used symmetrical faces as training samples for two-step face recognition. Generating virtual images has several advantages. Although virtual faces and original training samples share many similar features, virtual faces can provide features that the original samples do not have. In addition, adding new training samples can compensate for the limited number of training samples.
In this paper, we propose MSE classification based on mirror faces [44, 45] for face recognition. We first establish the MSE equation and solve it. Then, we exploit the obtained solution to predict the class labels of the test samples. Finally, we classify the test samples and obtain the classification accuracy. We conduct experiments on three databases: the PIE database, the Yale B database, and a subset of the Yale database. Our method has higher classification accuracy than conventional MSE. The rest of this paper is organized as follows. Section 2 introduces the proposed method, Section 3 analyzes it, Section 4 shows the experimental results, and Section 5 concludes the paper.
2. The Proposed Method
In this section, we present the main steps of the proposed method in detail. Suppose that there are $c$ classes and each class has $m$ training samples, so the total number of training samples is $n = c \times m$. Let $x_1, \ldots, x_n$ denote the training samples; we assign a class label to each class.
2.1. Main Steps of the Proposed Method
The proposed method includes five steps. The first step generates the mirror faces of the original training samples. The second step puts both the mirror faces and the original training samples into a new set and obtains the class label of each new training sample. The third step applies the MSE algorithm to the new training set to obtain the mapping. The fourth step predicts the class label of each test sample. The last step produces the final classification result. The steps are as follows.
Step 1.
Obtain the mirror face of each training sample. Let $x_i$ be the $i$th training sample in the form of an image matrix of size $p \times l$, written as $x_i = (y_1, y_2, y_3, \ldots, y_l)$, where $i \in \{1, 2, \ldots, n\}$ and $y_j$ ($j = 1, 2, \ldots, l$) is the $j$th column vector of $x_i$. Set the variable $z = (y_l, y_{l-1}, \ldots, y_1)$, which lists the columns of $x_i$ in reverse order. The mirror face of $x_i$ is defined by $z$.
Step 2.
Use both the original training samples and the mirror faces to construct a new training set and assign its class label matrix. The number of new training samples is $N = 2n$. Converting every original training sample $x_i$ and its mirror face into a $(p \times l)$-dimensional column vector, we obtain the $(p \times l) \times N$ training sample matrix, which holds one vectorized sample per column. We use a $c$-dimensional vector $b_i$, $i = 1, \ldots, N$, to represent the class label of each training sample, and the class label matrix is formed from $b_1, \ldots, b_N$; for the definition of $b_i$, please see Section 2.2.
Step 3.
Apply the MSE algorithm to the new training sample set. This step yields the mapping that is later applied to the test samples. For the algorithm, please see Section 2.2.
Step 4.
Exploit the mapping matrix obtained in Step 3 to predict the class label of each test sample, and assign the test sample to the class whose label is nearest to the predicted label.
Step 5.
Evaluate the classification result. Compare the true class label of each test sample with the class label predicted in Step 4. If they are the same, we consider the test sample correctly classified. Finally, we obtain the accuracy of face recognition.
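Steps 1 and 2 can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's implementation: the function names are hypothetical, class labels are assumed to be integer indices starting at 0, samples are stored one per row (the transpose of the column layout described in Step 2), and the row-major vectorization order is an assumption because the text does not specify one.

```python
import numpy as np

def mirror_face(image):
    """Return the mirror face of a p x l image: its columns in reverse order."""
    return image[:, ::-1]

def build_augmented_set(images, labels, num_classes):
    """Build the training matrix and one-hot label matrix from originals and mirrors.

    images: array of shape (n, p, l); labels: n integer class indices (assumed 0-based).
    Returns X of shape (2n, p*l) and B of shape (2n, num_classes).
    """
    images = np.asarray(images, dtype=float)
    labels = np.asarray(labels)
    mirrors = images[:, :, ::-1]                     # Step 1: reverse the column order
    all_images = np.concatenate([images, mirrors])   # Step 2: N = 2n samples
    X = all_images.reshape(len(all_images), -1)      # one vectorized sample per row
    all_labels = np.concatenate([labels, labels])    # a mirror face keeps its class
    B = np.eye(num_classes)[all_labels]              # one-hot class label per row
    return X, B

# Tiny 2 x 3 "image" to show the column reversal.
img = np.array([[1, 2, 3],
                [4, 5, 6]])
print(mirror_face(img))
```

Each mirror face inherits the class label of the sample it was generated from, which is why the label vector is simply repeated.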
2.2. Minimum Squared Error (MSE) Algorithm
In this subsection, we present the MSE algorithm for face recognition. We use a $c$-dimensional vector to represent the class label of each class. If a sample is from the $j$th class, then the $j$th element of its class label is one and the other elements are all zeros. For example, if the sample is from the first class, we take $b_1 = [1\ 0\ 0\ \cdots\ 0]$ as its class label.
We assume that a matrix $W$ can transform each training sample into its class label; MSE then has the following equation:

$$XW = B, \tag{1}$$

where

$$X = \begin{bmatrix} x_1^{T} \\ \vdots \\ x_N^{T} \end{bmatrix}, \qquad B = \begin{bmatrix} b_1 \\ \vdots \\ b_N \end{bmatrix}. \tag{2}$$

Here $x_i$ denotes the $i$th vectorized training sample, so each row of $X$ is one training sample written as a row vector. We refer to $W$ as the transform matrix, and $b_i$ is the class label of the $i$th training sample. As (1) cannot be directly solved, we convert it into the following equation:

$$X^{T}XW = X^{T}B. \tag{3}$$

We can obtain $W$ using

$$W = (X^{T}X + \gamma I)^{-1}X^{T}B, \tag{4}$$

where $\gamma$ and $I$ denote a small positive constant and the identity matrix, respectively. MSE classifies a test sample $y$, in the form of a row vector, as follows. The class label of $y$ is first predicted using the following equation:

$$b_y = yW. \tag{5}$$

Then, the distances between $b_y$ and the class labels of the $c$ classes are calculated. If $b_y$ is closest to the class label of the $k$th class, then $y$ is classified into the $k$th class.
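The training and classification rules of this subsection can be sketched as follows. The sketch reads (4) as the regularized least-squares solution W = (XᵀX + γI)⁻¹XᵀB and (5) as b_y = yW, with training samples stored as rows of X and one-hot class labels as rows of B. The function names and the toy data are hypothetical and serve only as illustration; the default γ = 0.01 is the value reported in Section 3.

```python
import numpy as np

def train_mse(X, B, gamma=0.01):
    """Solve (X^T X + gamma * I) W = X^T B for the transform matrix W."""
    d = X.shape[1]
    # Solving the linear system avoids forming the matrix inverse explicitly.
    return np.linalg.solve(X.T @ X + gamma * np.eye(d), X.T @ B)

def classify(y, W):
    """Predict b_y = y W and return the index of the nearest one-hot class label."""
    b_y = y @ W
    class_labels = np.eye(W.shape[1])        # row k is the label of class k
    distances = np.linalg.norm(class_labels - b_y, axis=1)
    return int(np.argmin(distances))

# Toy data: four 2-dimensional samples, two classes (illustrative only).
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
B = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
W = train_mse(X, B)
print(classify(np.array([0.8, 0.2]), W), classify(np.array([0.2, 0.8]), W))  # prints: 0 1
```

Because the class labels are one-hot vectors, choosing the nearest label in Euclidean distance is equivalent to choosing the largest entry of b_y.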
3. Analysis of the Proposed Method
In this section, we present the rationale of the proposed method. First, the mirror faces indeed reflect some possible appearances of the face that are not shown by the original training samples. Figures 1, 2, and 3 show some original training samples from the PIE, Yale B, and Yale face databases, respectively, together with the mirror faces generated from them. The mirror training samples appear to have different illumination and features in comparison with the original training samples. Our method uses both the original training samples and their mirror training samples, and the experimental results show that the mirror face training samples are beneficial for correct classification of the test samples. Figures 4, 5, and 6 show some test samples of a single class from the PIE, Yale B, and Yale face databases, respectively. Mirror faces do not improve classification accuracy in all cases; they help mainly when the illumination varies. Figure 7 shows some original training samples from the FERET face database captured under the same illumination condition and the mirror faces generated from them. We can see that these mirror face training samples do not show other obvious features. The PIE and Yale B databases are therefore better suited to our experiments.
Figure 1.

Some original training samples from the PIE database and the mirror faces generated from the original training samples. The first row shows the original training samples. The second row shows the mirror faces generated from the original training samples.
Figure 2.
Some original training samples from the Yale B database and the mirror faces generated from the original training samples. The first row shows the original training samples. The second row shows the mirror faces generated from the original training samples.
Figure 3.

Some original training samples from the Yale database and the mirror faces generated from the original training samples. The first row shows the original training samples. The second row shows the mirror faces generated from the original training samples.
Figure 4.

Some test samples from the PIE face database from the same class.
Figure 5.

Some test samples from the Yale B face database from the same class.
Figure 6.

Some test samples from the Yale face database from the same class.
Figure 7.

Some original training samples from the FERET face database in the same illumination condition and the mirror faces generated from the original training samples. The first row shows the original training samples. The second row shows the mirror faces generated from the original training samples.
The other rationale of the proposed method is that it uses the MSE algorithm for face recognition. MSE is easy to compute, and because the deviations are squared, negative errors cannot offset positive ones. Our method uses all training samples and tries to minimize the sum of the squared deviations between the obtained class labels and the true class labels. MSEC maps every training sample to an approximation of its class label. Using the mapping obtained from (4), with γ = 0.01, we obtain the predicted class label of the test sample. Then, we choose the class label at the minimum distance as its class label. Finally, we evaluate the classification accuracy. As a result, by using mirror face training samples we can increase the probability that a test sample is correctly classified.
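The role of γ in (4) can be made explicit with a short derivation. The following is a standard ridge-regression reading, offered as an interpretation rather than as a derivation given in this paper. It assumes that $X$ stacks the vectorized training samples as rows and $B$ stacks the corresponding class labels as rows; the objective $J$ and the Frobenius norm $\|\cdot\|_F$ are notation introduced here.

```latex
\begin{aligned}
J(W) &= \|XW - B\|_F^{2} + \gamma \|W\|_F^{2}, \\
\nabla_W J(W) &= 2X^{T}(XW - B) + 2\gamma W = 0, \\
\Longrightarrow \quad W &= (X^{T}X + \gamma I)^{-1} X^{T} B .
\end{aligned}
```

For any $\gamma > 0$ the matrix $X^{T}X + \gamma I$ is positive definite and hence invertible, even when there are fewer training samples than pixels and $X^{T}X$ itself is singular.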
4. Experimental Results
In this paper, we use three databases to conduct experiments. The first is a subset of the PIE database [46] whose samples are influenced only by illumination. The second is the Yale B database [47], and the third is the Yale database [48].
4.1. Experiments on PIE Database
From the PIE database, we used 68 subjects, each with 21 gray images. Every image was resized to 32 × 32. We conducted experiments for three cases. In the first case, we used the first image of each subject as the original training sample and took the remaining images as the test samples. In the second case, we used the first two images of each subject as the original training samples and took the remaining images as the test samples. In the third case, we used the first three images of each subject as the original training samples and took the remaining images as the test samples. Table 1 shows the experimental results on the PIE database. Our method achieves much lower classification error rates than conventional MSEC. The accuracy improves as the number of training samples increases: with three original training samples per subject, the error rate of our method falls to 0.16%, so the classification accuracy is close to 100%.
Table 1.
Rate of classification errors (%) on the PIE database.
| Number of the original training samples per class | 1 | 2 | 3 |
|---|---|---|---|
| Conventional MSEC using original images | 14.78 | 7.59 | 0.98 |
| Conventional MSEC using mirror images | 83.09 | 84.21 | 79.58 |
| Our method | 1.32 | 0.23 | 0.16 |
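The evaluation protocol used in these experiments (the first k images of each subject for training, the rest for testing, and the error rate reported as a percentage) can be sketched as follows. The function names are hypothetical and images are identified only by (subject, image index) pairs; this is an illustrative sketch of the protocol described in the text, not the author's code.

```python
def split_first_k(num_subjects, images_per_subject, k):
    """Return (train, test) lists of (subject, image_index) pairs.

    The first k images of every subject go to training, the rest to testing.
    """
    train, test = [], []
    for subject in range(num_subjects):
        for index in range(images_per_subject):
            (train if index < k else test).append((subject, index))
    return train, test

def error_rate_percent(true_labels, predicted_labels):
    """Percentage of test samples whose predicted label differs from the true one."""
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * wrong / len(true_labels)

# PIE protocol from the text: 68 subjects with 21 images each.
for k in (1, 2, 3):
    train, test = split_first_k(68, 21, k)
    print(k, len(train), len(test))
# prints:
# 1 68 1360
# 2 136 1292
# 3 204 1224
```

Under this protocol the first PIE case has 68 original training images and 1360 test images; adding one mirror face per training image doubles the training set to 136 samples.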
4.2. Experiments on Yale B Database
The Yale B database includes 38 subjects, each with 64 gray images. Every image was resized to 96 × 84. We conducted experiments for five cases: we took the first 3, 4, 5, 30, and 35 face images of each subject as the original training samples and treated the remaining images as the test samples. Table 2 shows the results on the Yale B database. Again, our proposed method achieves lower classification error rates than conventional MSEC in every case.
Table 2.
Rate of classification errors (%) on the Yale B database.
| Number of the original training samples per class | 3 | 4 | 5 | 30 | 35 |
|---|---|---|---|---|---|
| Conventional MSEC using original images | 54.92 | 50.09 | 48.66 | 31.27 | 31.22 |
| Conventional MSEC using mirror images | 75.63 | 74.65 | 73.95 | 48.53 | 41.02 |
| Our method | 46.64 | 39.08 | 36.08 | 20.90 | 16.79 |
4.3. Experiments on Yale Database
We used a subset of the Yale database to test our method. This subset consists of 165 images from 15 subjects, with 11 images per subject. Every image was resized to 32 × 32. We took the first 2, 3, and 4 face images of each subject, respectively, as the original training samples and took the remaining face images as the test samples. Table 3 shows the experimental results on the Yale database. According to the results, our method performs better than conventional MSEC, and the margin is largest over conventional MSEC that uses only mirror images as training samples.
Table 3.
Rate of classification errors (%) on the Yale database.
| Number of the original training samples per class | 2 | 3 | 4 |
|---|---|---|---|
| Conventional MSEC using original images | 39.26 | 34.17 | 23.81 |
| Conventional MSEC using mirror images | 68.15 | 60.00 | 42.86 |
| Our method | 34.81 | 29.17 | 20.95 |
5. Conclusions
We propose a promising method to exploit limited training samples for MSEC face recognition. The new training samples generated in this paper exploit the mirror structure of the face. The mirror face helps overcome the drawback of limited training samples in real-world face recognition systems. MSEC is able to obtain high classification accuracy because the training samples nearest to the test sample can provide useful information for classifying it.
Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.
References
- 1. Baramidze V., Lai M.-J. Convergence of discrete and penalized least squares spherical splines. Journal of Approximation Theory. 2011;163(9):1091–1106. doi: 10.1016/j.jat.2011.03.001.
- 2. Yao G., Ding R. Two-stage least squares based iterative identification algorithm for controlled autoregressive moving average (CARMA) systems. Computers & Mathematics with Applications. 2012;63(5):975–984. doi: 10.1016/j.camwa.2011.12.002.
- 3. Duda R. O., Hart P. E., Stork D. G. Pattern Classification. 2nd ed. 2000.
- 4. Sun L., Ji S., Ye J. Canonical correlation analysis for multilabel classification: a least-squares formulation, extensions, and analysis. IEEE Transactions on Pattern Analysis & Machine Intelligence. 2011;33(1):194–200. doi: 10.1109/tpami.2010.160.
- 5. Hastie T., Buja A., Tibshirani R. Penalized discriminant analysis. The Annals of Statistics. 1995;23(1):73–102. doi: 10.1214/aos/1176324456.
- 6. Ye J. Least squares linear discriminant analysis. Proceedings of the 24th International Conference on Machine Learning (ICML '07); June 2007; Corvallis, Ore, USA. IEEE; pp. 1087–1093.
- 7. Xu J., Zhang X., Li Y. Kernel MSE algorithm: a unified framework for KFD, LS-SVM and KRR. Proceedings of the International Joint Conference on Neural Networks; July 2001; Washington, DC, USA. IEEE; pp. 1486–1491.
- 8. Xu Y., Zhang D., Jin Z., Li M., Yang J.-Y. A fast kernel-based nonlinear discriminant analysis for multi-class problems. Pattern Recognition. 2006;39(6):1026–1033. doi: 10.1016/j.patcog.2005.10.029.
- 9. Ruiz A., Lopez-de-Teruel P. E. Nonlinear kernel-based statistical pattern analysis. IEEE Transactions on Neural Networks. 2001;12(1):16–32. doi: 10.1109/72.896793.
- 10. Park M. Y., Hastie T. L 1-regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2007;69(4):659–677. doi: 10.1111/j.1467-9868.2007.00607.x.
- 11. Belloni A., Chernozhukov V. l 1-penalized quantile regression in high-dimensional sparse models. Annals of Statistics. 2011;39(1):82–130. doi: 10.1214/10-aos827.
- 12. Bickel P. J., Ritov Y., Tsybakov A. B. Simultaneous analysis of lasso and dantzig selector. Annals of Statistics. 2009;37(4):1705–1732. doi: 10.1214/08-aos620.
- 13. Markovsky I., Van Huffel S. Overview of total least-squares methods. Signal Processing. 2007;87(10):2283–2302. doi: 10.1016/j.sigpro.2007.04.004.
- 14. Eldar Y. C. Universal weighted MSE improvement of the least-squares estimator. IEEE Transactions on Signal Processing. 2008;56(5):1788–1800. doi: 10.1109/tsp.2007.913158.
- 15. Xu Y., Fang X., Zhu Q., Chen Y., You J., Liu H. Modified minimum squared error algorithm for robust classification and face recognition experiments. Neurocomputing. 2014;135:253–261. doi: 10.1016/j.neucom.2013.11.025.
- 16. Zhao Y.-P., Du Z.-H., Zhang Z.-A., Zhang H.-B. A fast method of feature extraction for kernel MSE. Neurocomputing. 2011;74(10):1654–1663. doi: 10.1016/j.neucom.2011.01.020.
- 17. Dalton L. A., Dougherty E. R. Bayesian minimum mean-square error estimation for classification error. Part I. Definition and the bayesian mmse error estimator for discrete classification. IEEE Transactions on Signal Processing. 2011;59(1):115–129. doi: 10.1109/tsp.2010.2084572.
- 18. Zhu Q. Reformative nonlinear feature extraction using kernel MSE. Neurocomputing. 2010;73(16–18):3334–3337. doi: 10.1016/j.neucom.2010.04.007.
- 19. Dalton L. A., Dougherty E. R. Exact sample conditioned MSE performance of the Bayesian MMSE estimator for classification error. Part I. Representation. IEEE Transactions on Signal Processing. 2012;60(5):2575–2587. doi: 10.1109/tsp.2012.2184101.
- 20. Zhang Y., Yao J., Chen X., Shen Z. Emitter recognition based on least square error model. Information and Electronic Engineering. 2011;9(1):30–32, 43. doi: 10.3969/j.issn.1672-2892.2011.01.007.
- 21. Wang H., Yan W., Huang M., Guo B. New AQM algorithm ISE-GPM-PID with lease square error integral. Computer Science. 2012;39(1):37–43.
- 22. Shan B., Shen T., Cui Y., Zhao P., Hu Y. Improving the performance of alternating projection demosaicking algorithm by pre-estimating minimum square error initialization. Transactions of Beijing Institute of Technology. 2007;27(5):436–440.
- 23. Wang J., You J., Li Q., Xu Y. Orthogonal discriminant vector for face recognition across pose. Pattern Recognition. 2012;45(12):4069–4079. doi: 10.1016/j.patcog.2012.04.012.
- 24. Zhang L., Yang M., Feng X., Ma Y., Zhang D. Collaborative representation based classification for face recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12); June 2012.
- 25. Xu Y., Zhang D., Yang J., Yang J.-Y. A two-phase test sample sparse representation method for use with face recognition. IEEE Transactions on Circuits and Systems for Video Technology. 2011;21(9):1255–1262. doi: 10.1109/tcsvt.2011.2138790.
- 26. Yang J., Chu D. Sparse representation classifier steered discriminative projection. Proceedings of the 20th International Conference on Pattern Recognition (ICPR '10); August 2010; Istanbul, Turkey. IEEE; pp. 694–697.
- 27. Shu N., Ma H., Sun H. The Theories and Methods of Pattern Recognition. Wuhan, China: Wuhan University Press; 2004.
- 28. Xu J.-H. Regularized Kernel forms of minimum square error method. Acta Automatica Sinica. 2004;(1):27–36.
- 29. Liang Y., Gong W., Pan Y., Li W., Liu J., Zhang H. Singular value decomposition-based approach for face recognition. Optics and Precision Engineering. 2004;12(5):543–549.
- 30. Ryu Y.-S., Oh S.-Y. Simple hybrid classifier for face recognition with adaptively generated virtual data. Pattern Recognition Letters. 2002;23(7):833–841. doi: 10.1016/S0167-8655(01)00159-3.
- 31. Beymer D., Poggio T. Face recognition from one example view. Proceedings of the 5th International Conference on Computer Vision; June 1995; pp. 500–507.
- 32. Shang Z., Hu D., Zhao H., Yang J. Self-adaptive illumination invariant feature extraction method for texture image. Computer Engineering. 2013;39(11):254–258.
- 33. Zhang Y., Xiong F., Zhang G. A preprocessing algorithm for illumination invariant face recognition. Journal of Image and Graphics. 2008;13(9):1707–1712.
- 34. Braje W. L., Kersten D., Tarr M. J., Troje N. F. Illumination effects in face recognition. Psychobiology. 1998;26(4):371–380.
- 35. Li S. Z., Chu R., Liao S., Zhang L. Illumination invariant face recognition using near-infrared images. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2007;29(4):627–639. doi: 10.1109/TPAMI.2007.1014.
- 36. Shan S., Gao W., Cao B., Zhao D. Illumination normalization for robust face recognition against varying lighting conditions. Proceedings of the IEEE International Workshop on Analysis and Modeling of Faces and Gestures (AMFG '03); October 2003; IEEE; pp. 157–164.
- 37. Zhang S., Chen F., Yang J. Some researches for face recognition with one training image per person. Computer Science. 2006;33(2).
- 38. Wang W., Zheng Y., Yang J. Optimizing regularized discriminant analysis in virtual training samples. Journal of Computer-Aided Design and Computer Graphics. 2006;18(9):1327–1331.
- 39. Tang B., Luo S., Huang H. High performance face recognition system by creating virtual sample. Proceedings of the International Conference on Neural Networks and Signal Processing; December 2003; Nanjing, China. pp. 972–975.
- 40. Jung H.-C., Hwang B.-W., Lee S.-W. Authenticating corrupted face image based on noise model. Proceedings of the 6th IEEE International Conference on Automatic Face and Gesture Recognition (FG '04); May 2004; IEEE; pp. 272–277.
- 41. Xu Y., Fang X., Li X., Yang J., You J., Teng S. Data uncertainty in face recognition. IEEE Transactions on Cybernetics. 2014;44(10):1950–1961. doi: 10.1109/TCYB.2014.2300175.
- 42. Liu S., Luo M., Zhang G. Face recognition based on symmetrical kernel principal component analysis. Journal of Computer Applications. 2012;32(5):1404–1406. doi: 10.3724/sp.j.1087.2012.01404.
- 43. Xu Y., Zhu X., Li Z., Liu G., Lu Y., Liu H. Using the original and ‘symmetrical face’ training samples to perform representation based two-step face recognition. Pattern Recognition. 2013;46(4):1151–1158. doi: 10.1016/j.patcog.2012.11.003.
- 44. Xu Y., Li X., Yang J., Zhang D. Integrate the original face image and its mirror image for face recognition. Neurocomputing. 2014;131:191–199. doi: 10.1016/j.neucom.2013.10.025.
- 45. Etemad K., Chellappa R. Discriminant analysis for recognition of human face images. Journal of the Optical Society of America A. 1997;14(8):1724–1733. doi: 10.1364/josaa.14.001724.
- 46. http://www.computervisiononline.com/dataset/cmu-pie-database.
- 47. http://www.datatang.com/data/44730.
- 48. http://vision.ucsd.edu/content/yale-face-database.

