PLOS ONE. 2013 Dec 20;8(12):e83291. doi: 10.1371/journal.pone.0083291

Discriminant Projective Non-Negative Matrix Factorization

Naiyang Guan 1, Xiang Zhang 1, Zhigang Luo 1,*, Dacheng Tao 2,*, Xuejun Yang 3
Editor: Xi-Nian Zuo
PMCID: PMC3869764  PMID: 24376680

Abstract

Projective non-negative matrix factorization (PNMF) projects high-dimensional non-negative examples X onto a lower-dimensional subspace spanned by a non-negative basis W and considers W^T X as their coefficients, i.e., X ≈ WW^T X. Since PNMF learns the natural parts-based representation W of X, it has been widely used in many fields such as pattern recognition and computer vision. However, PNMF does not perform well in classification tasks because it completely ignores the label information of the dataset. This paper proposes a Discriminant PNMF method (DPNMF) to overcome this deficiency. In particular, DPNMF incorporates Fisher's criterion into PNMF to utilize the label information. Similar to PNMF, DPNMF learns a single non-negative basis matrix and incurs a lower computational burden than NMF. In contrast to PNMF, DPNMF maximizes the distance between the centers of any two classes of examples while minimizing the distance between any two examples of the same class in the lower-dimensional subspace, and thus has more discriminant power. We develop a multiplicative update rule to solve DPNMF and prove its convergence. Experimental results on four popular face image datasets confirm its effectiveness compared with representative NMF and PNMF algorithms.

Introduction

Dimension reduction uncovers the low-dimensional structures hidden in high-dimensional data and removes data redundancy, and thus significantly enhances performance and reduces the subsequent computational cost. Due to its effectiveness, dimension reduction has been widely used in many areas such as pattern recognition and computer vision. Some data, such as image pixels and video frames, are non-negative, but conventional dimension reduction approaches like principal component analysis (PCA, [1]) and Fisher's linear discriminant analysis (FLDA, [2]) do not maintain this non-negativity property, and thus lead to a holistic representation which is inconsistent with the intuition of learning parts to form a whole.

Non-negative matrix factorization (NMF, [3]) decomposes a non-negative data matrix X into the product of two lower-rank non-negative factor matrices, i.e., X ≈ WH. Due to the non-negativity constraints on both factor matrices W and H, NMF learns a parts-based representation and has attracted much attention in practical tasks such as image processing [4] and data mining [5]–[8]. To utilize the label information of a dataset, Zafeiriou et al. [9] proposed Discriminant NMF (DNMF) by incorporating Fisher's criterion into NMF. Guan et al. [43] [44] proposed a Non-negative Patch Alignment Framework (NPAF) that incorporates margin-maximization based discriminative information into NMF. Recently, Guan et al. [42] extended NMF to a novel low-rank and sparse matrix decomposition method termed Manhattan NMF (MahNMF). Nevertheless, NMF, DNMF, NPAF, and MahNMF suffer from the out-of-sample deficiency [10] [11], namely there is no direct way to obtain the coefficient of a new coming example. Usually, after obtaining the basis W by NMF, we calculate the coefficient of a new coming example x as y = W^† x, where W^† denotes the pseudo-inverse of W. However, this strategy violates the non-negativity property of the coefficients because the pseudo-inverse operator induces negative entries. Conventional dimension reduction methods such as PAF [35], NPE [12] and LPP [13] overcome the out-of-sample deficiency by using the linearization method, which learns a projection matrix. They project a new coming example into the lower-dimensional subspace by directly multiplying it with the learned projection matrix.
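To make this deficiency concrete, the following NumPy sketch (with a randomly generated, purely illustrative basis) compares the pseudo-inverse coefficients used by NMF-style methods with the W^T projection used by PNMF-style methods; the former typically contains negative entries, while the latter is non-negative by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((100, 10))          # hypothetical non-negative basis (m x r)
x = rng.random(100)                # a new non-negative example

y_pinv = np.linalg.pinv(W) @ x     # NMF-style out-of-sample coefficient
y_proj = W.T @ x                   # PNMF-style projection

print((y_pinv < 0).any())          # usually True: negative entries appear
print((y_proj < 0).any())          # always False: product of non-negatives
```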

To overcome the out-of-sample deficiency of NMF, Yuan et al. [14] proposed projective NMF (PNMF) based on the linearization method. In particular, PNMF learns a non-negative basis of the lower-dimensional subspace and considers its transpose as the projection matrix, i.e., X ≈ WW^T X. Since the learned projection matrix is non-negative, PNMF obtains a non-negative coefficient for any new coming example because the product of a non-negative matrix and a non-negative vector is non-negative. In addition, since PNMF implicitly induces WW^T ≈ I, the rows of W are approximately orthogonal. Moreover, since W is non-negative, such orthogonality implies that each column of W contains few nonzero entries. Therefore, PNMF implicitly learns a parts-based representation. In contrast, NMF never guarantees such a parts-based representation [15]. On the other hand, PNMF involves fewer parameters than NMF, and thus it has been widely used in dimension reduction.

Recently, PNMF has been well studied and extended to deal with various tasks. Liu et al. [10] proposed projective non-negative graph embedding (PNGE), which learns two factor matrices, i.e., a non-negative basis matrix and a non-negative projection matrix, while PNMF learns a single one. PNGE incorporates both the geometric structure and the label information of a dataset based on graph embedding [16]. Wen et al. [17] proposed orthogonal projective non-negative matrix factorization based on NPE (NPOPNMF) for hyperspectral image feature extraction. However, PNGE and NPOPNMF have two unknown variables like NMF and do not benefit enough from PNMF. To handle non-linear dimension reduction problems, Yang et al. [18] proposed non-linear PNMF. Yang et al. [18] theoretically analyzed the convergence of the multiplicative update rule (MUR) of PNMF and applied the MUR to optimize the non-linear PNMF. Since the objective function of PNMF contains a fourth-order term, the MUR suffers from a serious non-convergence problem. To remedy this problem, Hu et al. [19] approximated PNMF with a high-order Taylor expansion of the objective function and developed a MUR with proven convergence. To guarantee the convergence of PNMF, Zhang et al. [20] solved PNMF by a new adaptive MUR without normalizing the basis matrix in each iteration round.

Although PNMF and its variants have been successfully applied in many fields such as face recognition and document clustering, they share the following problems: PNMF and most of its variants ignore the label information of the dataset, and thus they cannot perform well in classification tasks; PNGE considers the label information based on the graph embedding framework [16], but it introduces an additional unknown variable and increases the computational complexity. In this paper, we propose a Discriminant PNMF (DPNMF) method to overcome the aforementioned problems. In particular, DPNMF incorporates Fisher's criterion into PNMF to make examples of different classes as far apart as possible while making examples of the same class as close as possible in the lower-dimensional subspace. It has been verified that label information enhances recognition performance in practical applications [21]–[24]. Therefore, DPNMF benefits much from the label information and significantly boosts the performance of classification tasks. To avoid the singularity problem in conventional FLDA, DPNMF utilizes a carefully chosen parameter to trade off both aforementioned objectives. To solve DPNMF, we develop a MUR-based algorithm and prove its convergence. Experimental results on four popular face image datasets, including Yale [25], ORL [26], UMIST [27] and FERET [28], confirm the effectiveness of DPNMF compared with NMF, PNMF and their extensions.

Analysis

This section reviews both non-negative matrix factorization (NMF) and projective non-negative matrix factorization (PNMF) and analyses their strengths and shortcomings.

NMF

Given n examples in m-dimensional space arranged in a non-negative data matrix V ∈ R_+^{m×n}, NMF seeks two lower-rank non-negative factor matrices, i.e., W ∈ R_+^{m×r} and H ∈ R_+^{r×n}, whose product reconstructs V. The objective of NMF is to minimize the Kullback-Leibler (KL) divergence between V and WH, i.e.,

\min_{W \ge 0, H \ge 0} J_{\mathrm{NMF}}(W,H) = \sum_{ij} \Big( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \Big) \quad (1)

where log signifies the natural logarithm. Although NMF is jointly non-convex with respect to W and H, it is convex with respect to W and H separately. Therefore, NMF can be solved by alternately updating both factor matrices. Lee and Seung [3] proposed an efficient multiplicative update rule (MUR) to solve NMF:

W_{ik} \leftarrow W_{ik} \frac{\sum_j H_{kj} V_{ij} / (WH)_{ij}}{\sum_j H_{kj}} \quad (2)
W_{ik} \leftarrow \frac{W_{ik}}{\sum_i W_{ik}} \quad (3)
H_{kj} \leftarrow H_{kj} \frac{\sum_i W_{ik} V_{ij} / (WH)_{ij}}{\sum_i W_{ik}} \quad (4)

where (2) updates W followed by a normalization (3), and (4) updates H.
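For concreteness, here is a minimal NumPy sketch of one round of these updates, following the standard Lee-Seung multiplicative rules for the KL divergence as reconstructed in (2)-(4); the small constant eps is our own addition to avoid division by zero, and the column-sum normalization in (3) is an assumption about the normalization used.

```python
import numpy as np

def nmf_kl_update(V, W, H, eps=1e-10):
    """One round of multiplicative updates for KL-divergence NMF."""
    WH = W @ H + eps
    # (2) update W
    W = W * ((V / WH) @ H.T) / (H.sum(axis=1) + eps)
    # (3) normalize the columns of W
    W = W / (W.sum(axis=0, keepdims=True) + eps)
    # (4) update H with the normalized W
    WH = W @ H + eps
    H = H * (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
    return W, H
```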

Since NMF ignores the label information of a dataset, it does not perform well in classification tasks. In addition, NMF suffers from the out-of-sample problem because it is non-trivial to calculate the non-negative coefficient of a new coming example.

PNMF

To overcome the out-of-sample deficiency of NMF, PNMF [14] learns a non-negative projection matrix to directly project V onto the lower-dimensional subspace. Let W denote the basis matrix; PNMF then treats W^T V as the coefficients and utilizes WW^T V to reconstruct V. The objective function of PNMF is

\min_{W \ge 0} J_{\mathrm{PNMF}}(W) = \| V - WW^T V \|_F^2 \quad (5)

where \| \cdot \|_F denotes the Frobenius norm. Since J_PNMF is non-convex [19], it is non-trivial to get the global minimum of PNMF. Yuan et al. [14] developed a multiplicative update rule (MUR) to iteratively update W by

W_{ik} \leftarrow W_{ik} \frac{2\,(VV^T W)_{ik}}{(WW^T VV^T W)_{ik} + (VV^T WW^T W)_{ik}} \quad (6)

until J_PNMF no longer changes. In each iteration round, PNMF normalizes W by dividing it by its spectral norm, i.e., W ← W/||W||_2, where ||·||_2 signifies the spectral norm of a matrix, for the following reason. According to (5), PNMF implicitly induces the constraint WW^T ≈ I, which is not guaranteed by (6). The normalization operator shrinks W to make WW^T close to I in terms of the spectral norm.
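A minimal NumPy sketch of update (6) followed by the spectral-norm normalization might look as follows; the grouping of the matrix products is chosen to avoid forming V V^T explicitly, and eps is our own safeguard against division by zero.

```python
import numpy as np

def pnmf_update(V, W, eps=1e-10):
    """One multiplicative update of PNMF followed by spectral-norm normalization."""
    VVtW = V @ (V.T @ W)                       # V V^T W, shared by both terms
    numer = 2.0 * VVtW
    denom = W @ (W.T @ VVtW) + VVtW @ (W.T @ W) + eps
    W = W * numer / denom                      # update (6)
    W = W / np.linalg.norm(W, 2)               # divide by the spectral norm
    return W
```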

PNMF overcomes the out-of-sample deficiency of NMF and learns parts-based representation because it implicitly induces the orthogonality of the learned basis. However, since PNMF ignores the label information of a dataset, like NMF, PNMF does not work well in classification tasks.

Results

Discriminant PNMF

The above analysis gives us two observations on NMF and its extensions: 1) both NMF and DNMF suffer from the out-of-sample deficiency, and 2) although PNMF overcomes the out-of-sample deficiency, it does not utilize the label information of a dataset. To illustrate these observations, we sampled 10 training examples and 10 test examples from each of two 3-D uniform distributions whose means are [0.0137, 0.1009, 0.5292] and [0.0424, 0.2627, 0.326], respectively. We marked the two classes of examples by “*” and “o” and obtained in total 20 training examples painted in red and 20 test examples painted in blue in Figure 1. Figure 1.B and Figure 1.C show the test examples projected onto the 2-D subspaces learned by DNMF and PNMF, respectively. Figure 1.B shows that these coefficients contain negative entries caused by the pseudo-inverse operator over the basis matrix, i.e., DNMF suffers from the out-of-sample deficiency, which weakens its discriminant power. Figure 1.C shows that PNMF overcomes the out-of-sample deficiency but has weak discriminant power because it completely ignores the label information.

Figure 1. Projected test examples in the learned 2-D subspace.


Projected test examples in the learned 2-D subspace by (A) DPNMF, (B) DNMF, and (C) PNMF on the synthetic dataset.
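For reproducibility, here is a sketch of how such a two-class synthetic set could be generated; only the class means come from the description above, while the uniform spread and the random seed are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
mean1 = np.array([0.0137, 0.1009, 0.5292])
mean2 = np.array([0.0424, 0.2627, 0.3260])

def sample(mean, n):
    # +/-0.05 uniform spread around the class mean is an illustrative assumption
    return np.clip(mean + rng.uniform(-0.05, 0.05, size=(n, 3)), 0, None)

train = np.vstack([sample(mean1, 10), sample(mean2, 10)])  # 20 training examples
test = np.vstack([sample(mean1, 10), sample(mean2, 10)])   # 20 test examples
labels = np.array([0] * 10 + [1] * 10)                     # class labels per split
```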

These observations motivate us to take advantage of both DNMF and PNMF and to propose the Discriminant PNMF (DPNMF) algorithm. In particular, we assume that examples can be projected onto a lower-dimensional subspace and that the transpose of the basis is treated as a projection matrix. This assumption implicitly induces a parts-based representation of the training examples and overcomes the out-of-sample deficiency, like PNMF. To utilize the label information of a dataset like DNMF, DPNMF incorporates Fisher's criterion to enhance the discriminant ability of PNMF. Given training examples arranged in V ∈ R_+^{m×n}, DPNMF learns the basis matrix W ∈ R_+^{m×r} (r ≪ m and r ≪ n) and projects V from R^m to R^r by W^T, i.e., the coefficients are Y = W^T V. According to [2], DPNMF expects examples of the same class to be as close as possible and examples of different classes to be as far apart as possible in the lower-dimensional subspace. Since Y = W^T V, the above two objectives are equivalent to

\min_{W} \mathrm{tr}(W^T S_w W), \quad S_w = \sum_{c=1}^{C} \sum_{j=1}^{n_c} (v_j^c - \mu^c)(v_j^c - \mu^c)^T \quad (7)
\max_{W} \mathrm{tr}(W^T S_b W), \quad S_b = \sum_{c=1}^{C} n_c (\mu^c - \mu)(\mu^c - \mu)^T \quad (8)

where C signifies the number of classes, n_c is the number of examples in class c, S_w and S_b signify the within-class scatter matrix and the between-class scatter matrix, respectively, v_j^c is the j-th example of class c, μ^c is the mean of the examples of class c, and μ is the mean of all examples. By combining (5), (7), and (8), the objective function of DPNMF is

\min_{W \ge 0} J_{\mathrm{DPNMF}}(W) = \| V - WW^T V \|_F^2 + \mu \, \mathrm{tr}\big( W^T (\lambda S_w - S_b) W \big) \quad (9)

where λ balances objectives (7) and (8), and μ controls the weight of Fisher's criterion.

The trade-off parameter λ is critical in DPNMF (9). According to [29], we choose λ as the largest eigenvalue of S_w^{-1} S_b, i.e., λ = λ_1(S_w^{-1} S_b), to guarantee the convexity of the Fisher's criterion term. Although the second term of (9) is convex, the objective function of (9) is non-convex because the loss function of PNMF is non-convex. The following section presents an efficient algorithm to find a local minimum. The other trade-off parameter μ is tuned in the experiments.
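A minimal NumPy sketch of this choice of λ, using the scatter-matrix definitions in (7) and (8); the small ridge term is our own addition to keep S_w invertible when the training set is small.

```python
import numpy as np

def scatter_matrices(V, labels):
    """Within-class (Sw) and between-class (Sb) scatter of the columns of V."""
    m, n = V.shape
    mu = V.mean(axis=1, keepdims=True)
    Sw = np.zeros((m, m))
    Sb = np.zeros((m, m))
    for c in np.unique(labels):
        Vc = V[:, labels == c]
        mu_c = Vc.mean(axis=1, keepdims=True)
        Sw += (Vc - mu_c) @ (Vc - mu_c).T
        Sb += Vc.shape[1] * (mu_c - mu) @ (mu_c - mu).T
    return Sw, Sb

def fisher_lambda(Sw, Sb, ridge=1e-8):
    """lambda = largest eigenvalue of Sw^{-1} Sb (ridge added for numerical stability)."""
    M = np.linalg.solve(Sw + ridge * np.eye(Sw.shape[0]), Sb)
    return np.max(np.real(np.linalg.eigvals(M)))
```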

MUR for DPNMF

Since the objective function J_DPNMF(W) is non-convex, it is in general intractable to find its global minimum. Fortunately, it is differentiable with respect to W, and thus the gradient descent method can be used to find a local minimum of (9). By simple algebra, eq. (9) can be written as

\min_{W} J_{\mathrm{DPNMF}}(W), \quad \text{s.t.} \; W \ge 0 \quad (10)

which is obviously a constrained minimization problem. The problem (10) can be solved by using the Lagrangian multiplier method [30]. The Lagrangian function of the objective function of (10) is

L(W, \Phi) = J_{\mathrm{DPNMF}}(W) - \mathrm{tr}(\Phi W^T) \quad (11)

where Φ contains the Lagrangian multipliers of the constraint W ≥ 0.

According to the K.K.T. conditions [31], the minimizer of (9) satisfies

\frac{\partial L(W, \Phi)}{\partial W_{ik}} = \big( \nabla J_{\mathrm{DPNMF}}(W) \big)_{ik} - \Phi_{ik} = 0 \quad (12)
W_{ik} \ge 0, \quad \Phi_{ik} \ge 0 \quad (13)
\Phi_{ik} W_{ik} = 0 \quad (14)

where Wik stands for the entry positioned at the i-th row and k-th column of W.

By substituting (12) into (14), we have

\big( \nabla J_{\mathrm{DPNMF}}(W) \big)_{ik} W_{ik} = \big( -4VV^TW + 2WW^TVV^TW + 2VV^TWW^TW + 2\mu(\lambda S_w - S_b)W \big)_{ik} W_{ik} = 0 \quad (15)

Since any real matrix A can be written as its positive part minus its negative part, i.e., A = [A]_+ − [−A]_+, where the operator [X]_+ keeps the non-negative entries of X and sets the negative entries to zero, λS_w − S_b equals [λS_w − S_b]_+ − [S_b − λS_w]_+ and eq. (15) is equivalent to

\big( -4VV^TW + 2WW^TVV^TW + 2VV^TWW^TW + 2\mu[\lambda S_w - S_b]_+W - 2\mu[S_b - \lambda S_w]_+W \big)_{ik} W_{ik} = 0

By simple algebra, the above equation is equivalent to

\big( 2VV^TW + \mu[S_b - \lambda S_w]_+W \big)_{ik} W_{ik} = \big( WW^TVV^TW + VV^TWW^TW + \mu[\lambda S_w - S_b]_+W \big)_{ik} W_{ik} \quad (16)

Eq. (16) gives us a multiplicative update rule (MUR) for DPNMF

W_{ik} \leftarrow W_{ik} \frac{\big( 2VV^TW + \mu[S_b - \lambda S_w]_+W \big)_{ik}}{\big( WW^TVV^TW + VV^TWW^TW + \mu[\lambda S_w - S_b]_+W \big)_{ik}} \quad (17)

Since the MUR involves only products of non-negative matrices, the updated W naturally satisfies the non-negativity constraint. Although the MUR is derived from the K.K.T. conditions [31], it indeed decreases the objective function J_DPNMF(W) of DPNMF. The following Theorem 1 establishes the convergence of the MUR.

Theorem 1: The objective function JDPNMF(W) is non-increasing under (17).

The proof of Theorem 1 is given in the Materials section.

Similar to PNMF, DPNMF also implicitly induces the constraint WW^T ≈ I, which cannot be satisfied by the MUR alone. Therefore, DPNMF normalizes W by dividing it by its spectral norm in each iteration round to remedy this deficiency. The DPNMF algorithm is summarized in Algorithm 1 (see Table 1), where the operator ⊙ in line 5 signifies element-wise multiplication. Algorithm 1 is stopped when the following condition is satisfied:

\frac{\big| J_{\mathrm{DPNMF}}(W^{t}) - J_{\mathrm{DPNMF}}(W^{t+1}) \big|}{J_{\mathrm{DPNMF}}(W^{t})} \le \varepsilon \quad (18)

where t is the iteration counter and ε is a predefined tolerance.

Table 1. Summary of MUR algorithm for DPNMF.

Algorithm 1. MUR algorithm for DPNMF
Input: Examples V ∈ R_+^{m×n}, labels L, reduced dimensionality r, regularization parameter μ.
Output: Basis matrix W.
1. Calculate S_w and S_b from V and L according to (7) and (8), respectively.
2. Calculate the largest eigenvalue λ_1 of S_w^{-1} S_b and set λ = λ_1.
3. Initialize W^0 as a random non-negative matrix and set t = 0.
4. Repeat
5. Calculate W^{t+1} = W^t ⊙ (2VV^T W^t + μ[S_b − λS_w]_+ W^t) / (W^t(W^t)^T VV^T W^t + VV^T W^t(W^t)^T W^t + μ[λS_w − S_b]_+ W^t), where the division is element-wise, i.e., apply update (17).
6. Normalize W^{t+1} ← W^{t+1}/||W^{t+1}||_2 and set t ← t + 1.
7. Until {Stopping criterion (18) is satisfied.}
8. W = W^t.
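A compact NumPy sketch of Algorithm 1 as reconstructed above (the scatter_matrices and fisher_lambda helpers come from the earlier sketch; the random initialization, the eps safeguard, and the relative-change form of the stopping test are our own choices).

```python
import numpy as np

def dpnmf(V, labels, r, mu=1.0, max_iter=2000, tol=1e-7, eps=1e-10, seed=0):
    """Multiplicative updates for DPNMF (Algorithm 1), as reconstructed above."""
    rng = np.random.default_rng(seed)
    Sw, Sb = scatter_matrices(V, labels)              # step 1
    lam = fisher_lambda(Sw, Sb)                       # step 2
    Pos = np.maximum(lam * Sw - Sb, 0.0)              # [lam*Sw - Sb]_+
    Neg = np.maximum(Sb - lam * Sw, 0.0)              # [Sb - lam*Sw]_+
    W = rng.random((V.shape[0], r))                   # step 3: random non-negative init
    obj_prev = np.inf
    for _ in range(max_iter):                         # steps 4-7
        U = V @ (V.T @ W)                             # shared term V V^T W, cf. (19)
        numer = 2.0 * U + mu * (Neg @ W)
        denom = W @ (W.T @ U) + U @ (W.T @ W) + mu * (Pos @ W) + eps
        W = W * numer / denom                         # update (17)/(20)
        W = W / np.linalg.norm(W, 2)                  # step 6: spectral-norm normalization
        R = V - W @ (W.T @ V)
        obj = np.sum(R * R) + mu * np.trace(W.T @ (lam * Sw - Sb) @ W)
        if abs(obj_prev - obj) <= tol * max(abs(obj), 1.0):   # stopping test, cf. (18)
            break
        obj_prev = obj
    return W                                          # step 8
```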

The main time cost of Algorithm 1 is spent on lines 1, 2, and 5. Line 1 constructs the within-class and between-class scatter matrices in O(m^2 n) time. Line 2 calculates the inverse of S_w and its multiplication with S_b in O(m^3) time. Line 5 dominates the time complexity because it involves multiplications between high-dimensional matrices and the number of iterations is usually large. Looking carefully at line 5, its time cost can be decreased by updating W^{t+1} in the following two steps:

U^{t} = V \big( V^T W^{t} \big) \quad (19)

and

W^{t+1} = W^{t} \odot \frac{2U^{t} + \mu[S_b - \lambda S_w]_+ W^{t}}{W^{t}\big((W^{t})^T U^{t}\big) + U^{t}\big((W^{t})^T W^{t}\big) + \mu[\lambda S_w - S_b]_+ W^{t}} \quad (20)

where the division in (20) is element-wise, (19) costs O(mnr) time, and (20) costs O(mr^2 + m^2 r) time. Since (20) reuses the shared term U^t three times, it saves the time cost of line 5. In summary, the total time complexity of Algorithm 1 is O(m^2 n + m^3 + T(mnr + mr^2 + m^2 r)), where T is the number of iterations, and its memory complexity is O(mn + m^2 + mr).
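The saving comes from never forming the m×m matrix V V^T; a small sketch contrasting the naive grouping with the shared-term grouping of (19), using hypothetical sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 1600, 200, 50                   # hypothetical sizes
V, W = rng.random((m, n)), rng.random((m, r))

G_naive = (V @ V.T) @ W                   # forms V V^T first: O(m^2 n + m^2 r)
U = V @ (V.T @ W)                         # grouping of eq. (19): O(m n r)
assert np.allclose(G_naive, U)            # same result, different cost
```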

Experiments

This section evaluates DPNMF through a comprehensive study of its data representation ability and its effectiveness in face recognition on four datasets: Yale [25], ORL [26], UMIST [27] and FERET [28].

A Comprehensive Study

To validate the data representation ability of DPNMF, we conducted a simple experiment before the practical tasks. We randomly selected two individuals from the UMIST dataset. For each individual, 15 images were chosen for this study; 7 images were used for training and the remaining 8 images were used for testing. Each image was cropped to a 40×40 pixel array and reshaped to a 1600-dimensional vector. We marked the images of the two individuals by “*” and “o”, respectively, and painted the training images in red and the test images in blue. Therefore, we obtained in total 14 training images painted in red and 16 test images painted in blue in Figure 2. In this experiment, DPNMF, DNMF, PNMF and NMF were run on the training images to learn a 2-dimensional subspace. Then, the test images were projected onto the learned subspace to depict their data representation abilities.

Figure 2. Projected test examples in the learned 2-D subspace on the UMIST dataset.


Projected test examples in the learned 2-D subspace: (A) DPNMF, (B) DNMF, (C) PNMF and (D) NMF on the real dataset.

Figure 2 shows the coefficients of both the training and test images in the subspaces learned by DPNMF, DNMF, PNMF and NMF. Figure 2.B shows that the coefficients in the DNMF subspace contain negative entries, which means that DNMF suffers from the out-of-sample deficiency, namely the coefficients of the test examples contain negative entries. Figure 2.C shows that PNMF overcomes the out-of-sample deficiency but has weak discriminant power because it ignores the label information of the training images. In addition, NMF both suffers from the out-of-sample deficiency and ignores the label information of the training images (see Figure 2.D). Figure 2.A shows that DPNMF simultaneously overcomes the aforementioned drawbacks and separates the images of the two individuals perfectly.

Face Recognition

In this section, we validate the effectiveness of DPNMF by comparing it with the most related methods, including NMF, PNMF, PNGE and DNMF, on four datasets: Yale [25], ORL [26], UMIST [27] and FERET [28]. For each dataset, all face images were aligned according to the eye positions. Different numbers of images of each subject were randomly selected to construct the training set, and the remaining images constitute the test set. In this experiment, we used the nearest neighbor (NN) rule as the classifier and calculated the accuracy as the percentage of test face images that are correctly classified. To eliminate the effect of randomness, we repeated each trial 5 times and compared the representative algorithms based on the average accuracy. For DNMF, we set γ = 10 and δ = 0.0001 for the within-class scatter term and the between-class scatter term, respectively. For PNGE, we set the trade-off parameter μ = 0.5 and the other parameters according to [10]. For all algorithms, the maximum number of iterations is set to 2000 and the tolerance ε of the stopping criterion is set to 10^−7.

Given the training set V_tr, both NMF and DNMF learn a basis W and the coefficients H = W^† V_tr. To classify a test image v_ts, we first calculate its coefficient y_ts = W^† v_ts and then assign it to the same class as the training image whose coefficient has the smallest Euclidean distance to y_ts. Since both PNMF and DPNMF learn a basis W and consider its transpose as a projection matrix, different from NMF and DNMF, the coefficient of a test image v_ts is calculated as y_ts = W^T v_ts. We keep the remaining classification procedure identical for fairness of comparison.
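A sketch of this classification protocol; the 1-NN search uses plain Euclidean distance as described, and W is assumed to have been learned beforehand by the corresponding method.

```python
import numpy as np

def nn_classify(Y_tr, labels_tr, Y_ts):
    """1-nearest-neighbor classification in coefficient space (Euclidean distance)."""
    d2 = ((Y_ts[:, None, :] - Y_tr[None, :, :]) ** 2).sum(axis=2)
    return labels_tr[np.argmin(d2, axis=1)]

def coeffs_pinv(W, V):
    """NMF/DNMF-style coefficients via the pseudo-inverse (may contain negative entries)."""
    return (np.linalg.pinv(W) @ V).T          # one row per example

def coeffs_proj(W, V):
    """PNMF/DPNMF-style coefficients via the transposed basis (non-negative)."""
    return (W.T @ V).T                        # one row per example

# Usage sketch, with W learned on V_tr by DPNMF:
# pred = nn_classify(coeffs_proj(W, V_tr), labels_tr, coeffs_proj(W, V_ts))
# accuracy = np.mean(pred == labels_ts)
```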

Figure 3 gives the basis images learned by DPNMF, DNMF, PNGE, NMF, and PNMF on Yale, ORL, UMIST, and FERET datasets. It shows that DPNMF learns parts-based representation. In the following, we will validate the effectiveness of such representation.

Figure 3. The bases learned by different representative NMF and PNMF algorithms on four popular datasets.


The bases learned by (1) DPNMF, (2) DNMF, (3) PNGE, (4) NMF and (5) PNMF on four popular datasets (A) Yale, (B) ORL, (C) UMIST and (D) FERET datasets.

Yale Dataset

The Yale face image database [25] consists of 165 grayscale images taken from 15 subjects. Eleven images were taken of each subject under different settings such as varying facial expressions (sleepy or surprised) and other configurations. Each image is cropped to 32×32 pixels and reshaped to a 1024-dimensional vector. For each subject, 2, 4, 6, and 8 images were randomly selected as training images and the remaining images as test images. In this experiment, we set the parameter μ = 1 for DPNMF (9). Figure 4 reports the average accuracies of DPNMF, DNMF, PNGE, PNMF and NMF on the Yale dataset under different settings. It shows that DPNMF significantly outperforms the representative algorithms because it utilizes the label information in representing the training images, and such parts-based representation (cf. row A of Figure 3) effectively inhibits the influence of the contained noise.

Figure 4. Average accuracies versus different reduced dimensionalities on Yale dataset.


Average accuracies versus reduced dimensionalities when (A) 2, (B) 4, (C) 6, and (D) 8 images of each subject of Yale dataset were selected for training.

ORL Dataset

The Cambridge ORL database [26] is composed of 400 face images taken from 40 individuals with varying facial expressions, lighting and occlusions such as with and without glasses. For each individual, 2, 4, 6, and 8 images were randomly selected as training images and the remaining images as test images. Each image is cropped to 32×32 pixels and reshaped to a 1024-dimensional vector. For DPNMF, the parameter in (9) is set to μ = 10 when 2 and 4 images of each individual are selected for training and μ = 0.03 when 6 and 8 images of each individual are selected for training.

Figure 5 reports the average accuracies of DPNMF, DNMF, PNGE, PNMF and NMF on ORL dataset under different settings. It shows that DPNMF outperforms DNMF, PNMF and NMF. Figure 5.A shows that DPNMF outperforms PNGE when only two images of each individual are used for training. However, PNGE shows superiority when the training set contains four and six images of each individual (see Figure 5.B and Figure 5.C). That is because the photos in ORL dataset are taken from different views of frontal faces and the local geometric structure enhances the discriminant power of PNGE on such dataset. Figure 5.D shows that DPNMF performs comparably with PNGE when the training set contains eight images of each individual.

Figure 5. Average accuracies versus different reduced dimensionalities on ORL dataset.


Average accuracies versus reduced dimensionalities when (A) 2, (B) 4, (C) 6, and (D) 8 images of each subject of ORL dataset were selected for training.

UMIST Dataset

The UMIST database [27] includes 575 face images collected from 20 individuals under different views and poses. Each image was resized to a 40×40 pixel array and reshaped to a 1600-dimensional vector. In this experiment, a subset of 300 images composed of 15 left-profile images per subject was tested. We randomly selected 4, 6, 8, and 10 images from each individual for training, and the remaining images were used for testing. For DPNMF, we empirically set the parameter μ = 1 in (9).

Figure 6 compares the average accuracies of DPNMF, DNMF, PNGE, PNMF and NMF on UMIST dataset under different settings. It shows that DPNMF significantly outperforms other algorithms especially when four and six images of each individual are selected for training. When eight and ten images of each individual are selected for training, DPNMF almost performs perfectly.

Figure 6. Average accuracies versus different reduced dimensionalities on UMIST dataset.


Average accuracies versus reduced dimensionalities when (A) 4, (B) 6, (C) 8, and (D) 10 images of each individual of UMIST dataset are selected for training.

FERET Dataset

The FERET database [28] contains 13,539 face images taken from 1,565 subjects, varying in size, pose, illumination, facial expression and age. We randomly selected 100 individuals and 7 images for each individual to build the FERET dataset used in our experiments. Each image was cropped to a 40×40 pixel array and reshaped to a 1600-dimensional vector. In total, 2, 3, 4, and 5 images were randomly selected from each individual for training, and the remaining images were used for testing. For DPNMF (9), we set the parameter μ = 1 when 2 and 3 images of each individual are selected for training, and μ = 0.1 when 4 and 5 images of each individual are selected for training. Figure 7 reports the average accuracies of DPNMF, DNMF, PNGE, PNMF and NMF on the FERET dataset under different settings. It shows that DPNMF significantly outperforms NMF, PNMF, and PNGE because it utilizes the label information in the training set. Figure 7 also shows that DNMF performs well on this dataset, especially when 3, 4, and 5 images of each individual are selected for training. However, DNMF performs poorly when only two images of each individual are used for training because the training examples are rather limited in this case and the pseudo-inverse operator over its learned basis greatly reduces the discriminant power of DNMF. DPNMF overcomes this problem and thus performs well in this case (see Figure 7.A). This observation confirms the effectiveness of DPNMF.

Figure 7. Average accuracies versus different reduced dimensionalities on FERET dataset.


Average accuracies versus reduced dimensionalities when (A) 2, (B) 3, (C) 4, and (D) 5 images of each subject of FERET dataset were selected for training.

Discussion

This section shows how to tune the tradeoff parameter in DPNMF. In addition, we also give an empirical validation of both convergence and efficiency of the MUR algorithm for DPNMF.

Parameter Selection

The proposed DPNMF has a trade-off parameter μ that controls its discriminant power. It is usually tuned by grid search over a wide range. In our experiments, we tuned this parameter over the range {10^−10, 10^−7, 10^−3, 0.01, 0.1, 1, 3, 5, 10, 50, 100, 500, 10^3, 10^7, 10^10} on the Yale, ORL, UMIST and FERET datasets. To study the consistency of the selected parameter, we randomly selected 4 and 8 images from each individual of the Yale and ORL datasets for training, 6 and 10 images from each individual of the UMIST dataset for training, and 3 and 5 images from each individual of the FERET dataset for training. Each trial was conducted independently five times to eliminate the randomness of the training set, and the average accuracies are reported in Figure 8.A to Figure 8.H, respectively.

Figure 8. Average accuracies versus the parameter μ with the corresponding reduced dimensionality.


Average accuracies versus the parameter μ when 4 and 8 images of each individual from Yale dataset were selected for training and the reduced dimensionality is set to 50 (A and E), 4 and 8 images of each individual from ORL dataset were selected for training and the reduced dimensionality is set to 120 (B and F), 6 and 10 images of each individual from UMIST dataset were selected for training and the reduced dimensionality is set to 100 (C and G), and 3 and 5 images of each individual from FERET dataset were selected for training and the reduced dimensionality is set to 250 (D and H).

Figure 8.A and Figure 8.E show that DPNMF performs stably when μ is selected from 10^−10 to 1 on the Yale dataset and reaches its peak when μ = 1. Figure 8.B and Figure 8.F show that DPNMF performs stably when μ varies from 10^−10 to 0.1 on the ORL dataset and reaches its peak when μ = 0.1. Figure 8.C and Figure 8.G show that DPNMF performs stably when μ is selected from 10^−10 to 50 on the UMIST dataset and reaches its peak when μ = 3. Figure 8.D and Figure 8.H show that DPNMF performs stably when μ is selected from 10^−10 to 1 on the FERET dataset and reaches its peak when μ = 0.01. From Figure 8, we can see that DPNMF performs stably when the parameter μ is selected from a wide range, but its discriminant power may decrease when μ is increased further. Therefore, we empirically set the parameter μ = 1; this parameter should be tuned for satisfactory classification performance on other datasets.
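A sketch of the grid search described above, reusing the dpnmf, coeffs_proj and nn_classify helpers from the earlier sketches; selecting μ on a single validation split rather than averaging over five random splits is a simplification of our own.

```python
import numpy as np

MU_GRID = [1e-10, 1e-7, 1e-3, 0.01, 0.1, 1, 3, 5, 10, 50, 100, 500, 1e3, 1e7, 1e10]

def tune_mu(V_tr, labels_tr, V_val, labels_val, r):
    """Pick mu from MU_GRID by 1-NN accuracy on a validation split."""
    best_mu, best_acc = None, -1.0
    for mu in MU_GRID:
        W = dpnmf(V_tr, labels_tr, r, mu=mu)
        pred = nn_classify(coeffs_proj(W, V_tr), labels_tr,
                           coeffs_proj(W, V_val))
        acc = np.mean(pred == labels_val)
        if acc > best_acc:
            best_mu, best_acc = mu, acc
    return best_mu, best_acc
```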

Convergence Study

In this section, we verify the convergence of DPNMF on the four tested face datasets. We randomly selected 8, 8, 10 and 5 images from each individual of the Yale, ORL, UMIST and FERET datasets for training, and report the objective values versus the number of iterations in Figure 9.A to Figure 9.D, respectively. In this experiment, we set the trade-off parameter μ to 10, 0.1, 3, and 0.01, according to the above analysis, and the reduced dimensionalities to 116, 304, 186, and 496 on the Yale, ORL, UMIST, and FERET datasets, respectively. The maximum number of iterations is set to 500.

Figure 9. Objective value versus the iterative number on four datasets.


Objective value versus the iterative number when (A) 8 images of each individual from Yale datasets, (B) 8 images of each individual from ORL datasets, (C) 10 images of each individual from UMIST datasets, and (D) 5 images of each individual from FERET datasets.

From Figure 9.A to Figure 9.D, we can see that the MUR gradually reduces the objective function of DPNMF and converges rapidly within 500 iteration rounds on the four tested datasets.

Efficiency Study

We also measured the computational cost of DPNMF compared with the representative algorithms on the Yale, ORL, UMIST, and FERET datasets. As before, we randomly selected 8, 8, 10 and 5 images from each individual of the Yale, ORL, UMIST and FERET datasets for training and repeated each trial five times to eliminate the effect of randomness. The parameter settings are the same as those in the above section. We implemented all algorithms in MATLAB on a workstation with a 3.4 GHz Intel(R) Core(TM) processor and 8 GB of RAM. Figure 10 compares the average CPU cost per iteration round of DPNMF with those of PNMF and PNGE on the four test datasets.

Figure 10. CPU seconds versus reduced dimensionalities on four datasets.


CPU seconds versus reduced dimensionalities when (A) 8 images of each individual from Yale datasets, (B) 8 images of each individual from ORL datasets, (C) 10 images of each individual from UMIST datasets, and (D) 5 images of each individual from FERET datasets.

Figure 10 shows that DPNMF costs more CPU time than the other algorithms because it utilizes two time-consuming operators, i.e., [λS_w − S_b]_+ W and [S_b − λS_w]_+ W in line 5 of Algorithm 1, whose time complexities are both O(m^2 r). However, DPNMF achieves higher accuracy than the other algorithms (see Figure 4 to Figure 7) due to the incorporated Fisher's criterion. Several efficient NMF optimization algorithms, such as NeNMF [45], online RSA-NMF [46], and L-FGD [47], could be applied to optimize DPNMF more efficiently than the MUR.

From the above analysis, DPNMF is an effective dimension reduction method. In future work, we will apply it to various vision tasks, e.g., color-to-gray image transformation [32], 3-D face reconstruction [33], and 3-D facial expression analysis [34]. In addition, due to its effectiveness, we will extend DPNMF to tensor analysis [37] for gait recognition [36] and to Bayesian models based on covariance learning [38] [39] [40] [41].

Conclusion

This paper proposes an effective Discriminant Projective Non-negative Matrix Factorization (DPNMF) method to overcome the out-of-sample deficiency of NMF and boost its discriminant power by incorporating the label information in a dataset based on Fisher's criterion. We developed a multiplicative update rule to solve DPNMF and proved its convergence. Experimental results on popular face image databases demonstrate that DPNMF outperforms NMF and PNMF as well as their extensions.

Materials

Proof of Theorem 1

Given the current solution W′, we approximate J_DPNMF(W) by its Taylor-series expansion

J_{\mathrm{DPNMF}}(W) \approx J_{\mathrm{DPNMF}}(W') + \sum_{ik} \big( \nabla J_{\mathrm{DPNMF}}(W') \big)_{ik} (W_{ik} - W'_{ik}) + \frac{1}{2} \sum_{ik} \big( \nabla^2 J_{\mathrm{DPNMF}}(W') \big)_{ik,ik} (W_{ik} - W'_{ik})^2 \quad (21)

We construct an auxiliary function G(W, W′) of J_DPNMF(W) as follows:

G(W, W') = J_{\mathrm{DPNMF}}(W') + \sum_{ik} \big( \nabla J_{\mathrm{DPNMF}}(W') \big)_{ik} (W_{ik} - W'_{ik}) + \sum_{ik} \frac{\big( W'W'^TVV^TW' + VV^TW'W'^TW' + \mu[\lambda S_w - S_b]_+W' \big)_{ik}}{W'_{ik}} (W_{ik} - W'_{ik})^2 \quad (22)

It is easy to verify that G(W, W) = J_DPNMF(W).

In the following, we prove that G(W, W′) ≥ J_DPNMF(W) to complete the proof. For any z > 0, we have z ≥ 1 + log z. By substituting z = W_ik W_jk / (W′_ik W′_jk) into this inequality, we have

W_{ik} W_{jk} \ge W'_{ik} W'_{jk} \Big( 1 + \log \frac{W_{ik} W_{jk}}{W'_{ik} W'_{jk}} \Big) \quad (23)

Since VV^T ≥ 0 and [S_b − λS_w]_+ ≥ 0, from (23), we have

\mathrm{tr}(W^T VV^T W) \ge \sum_{ijk} (VV^T)_{ij} W'_{ik} W'_{jk} \Big( 1 + \log \frac{W_{ik} W_{jk}}{W'_{ik} W'_{jk}} \Big) \quad (24)
\mathrm{tr}\big(W^T [S_b - \lambda S_w]_+ W\big) \ge \sum_{ijk} \big([S_b - \lambda S_w]_+\big)_{ij} W'_{ik} W'_{jk} \Big( 1 + \log \frac{W_{ik} W_{jk}}{W'_{ik} W'_{jk}} \Big) \quad (25)

By substituting (24) and (25) into (21), we prove that G(W, W′) ≥ J_DPNMF(W).

Assuming W″ is a minimizer of G(W, W′), we have the following inequalities:

J_{\mathrm{DPNMF}}(W'') \le G(W'', W') \le G(W', W') = J_{\mathrm{DPNMF}}(W') \quad (26)

It remains to calculate W″ and verify that it satisfies the non-negativity constraint. To this end, we set the gradient of G(W, W′) with respect to W to zero, i.e.,

\frac{\partial G(W, W')}{\partial W_{ik}} = \big( \nabla J_{\mathrm{DPNMF}}(W') \big)_{ik} + \frac{2\big( W'W'^TVV^TW' + VV^TW'W'^TW' + \mu[\lambda S_w - S_b]_+W' \big)_{ik}}{W'_{ik}} (W_{ik} - W'_{ik}) = 0 \quad (27)

Eq. (27) gives

W''_{ik} = W'_{ik} \frac{\big( 2VV^TW' + \mu[S_b - \lambda S_w]_+W' \big)_{ik}}{\big( W'W'^TVV^TW' + VV^TW'W'^TW' + \mu[\lambda S_w - S_b]_+W' \big)_{ik}} \quad (28)

Since (28) contains only multiplications and divisions of non-negative entries, W″ is a non-negative matrix.

It is obvious that (28) is equivalent to (17), and thus (26) implies that (17) decreases the objective function of DPNMF. This completes the proof.

Acknowledgments

We thank the Research Center of Supercomputing Application, National University of Defense Technology for their kind supports.

Funding Statement

This work was partially supported by the Scientific Research Plan Project of National University of Defense Technology (No. JC13-06-01) and the Australian Research Council Discovery Project (120103730). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Hotelling H (1933) Analysis of a Complex of Statistical Variables into Principal Components. Journal of Educational Psychology 24: 417–441. [Google Scholar]
  • 2. Fisher RA (1936) The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics 7: 179–188. [Google Scholar]
  • 3. Lee DD, Seung HS (1999) Learning the Parts of Objects by Non-negative Matrix Factorization. Nature 401: 788–791. [DOI] [PubMed] [Google Scholar]
  • 4. Zafeiriou S, Petrou M (2009) Nonlinear Non-negative Component Analysis Algorithms. IEEE Transactions on Image Processing 19: 1050–1066. [DOI] [PubMed] [Google Scholar]
  • 5. Pauca VP, Shahnaz F, Berry MW, Plemmons RJ (2004) Text Mining using Non-negative Matrix Factorization. IEEE International Conference on Data Mining 1: 452–456. [Google Scholar]
  • 6. Taslaman L, Nilsson B (2012) A Framework for Regularized Non-Negative Matrix Factorization, with Application to the Analysis of Gene Expression Data. PLoS ONE 7: e46331. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Murrell B, Weighill T, Buys J, Ketteringham R, Moola S, et al. (2011) Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution. PLoS ONE 6: e28898. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Lee CM, Mudaliar MAV, Haggart DR, Wolf CR, Miele G, et al. (2012) Simultaneous Non-Negative Matrix Factorization for Multiple Large Scale Gene Expression Datasets in Toxicology. PLoS ONE 7: e48238. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Zafeiriou S, Tefas A, Buciu I, Pitas I (2006) Exploiting Discriminant Information in Nonnegative Matrix Factorization With Application to Frontal Face Verification. IEEE Transactions on Neural Networks 17: 683–695. [DOI] [PubMed] [Google Scholar]
  • 10. Liu X, Yan S, Jin H (2010) Projective Non-negative Graph Embedding. IEEE Transactions on Image Processing 19: 1126–1137. [DOI] [PubMed] [Google Scholar]
  • 11.Bengio Y, Paiement JF, Vincent P (2003) Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering. Technical Report 1238.. [Google Scholar]
  • 12. He X, Cai D, Yan S, Zhang HJ (2005) Neighborhood Preserving Embedding. IEEE Conference on Computer Vision 2: 1208–1213. [Google Scholar]
  • 13. He X, Niyogi P (2004) Locality Preserving Projections. Advances in Neural Information Processing Systems 16: 153. [Google Scholar]
  • 14. Yuan Z, Oja E (2004) Projective Nonnegative Matrix Factorization for Image Compression and Feature Extraction. Springer Lecture Notes in Computer Science 3195: 1–8. [Google Scholar]
  • 15. Donoho D, Stodden V (2004) When Does Non-negative Matrix Factorization Give A Correct Decomposition into Parts? Advances in Neural Information Processing Systems 16: 1141–1148. [Google Scholar]
  • 16. Yan S, Xu D, Zhang B, Yang Q, Zhang H, et al. (2007) Graph Embedding and Extensions: A General Framework for Dimensionality Reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence 29: 40–51. [DOI] [PubMed] [Google Scholar]
  • 17. Wen J, Tian Z, Liu X, Lin W (2013) Neighborhood Preserving Orthogonal PNMF Feature Extraction for Hyperspectral Image Classification. IEEE Transactions on Geoscience & Remote Sensing Society 6: 759–768. [Google Scholar]
  • 18. Yang Z, Oja E (2010) Linear and Nonlinear Projective Non-negative Matrix Factorization. IEEE Transactions on Neural Networks 21: 734–749. [DOI] [PubMed] [Google Scholar]
  • 19. Hu L, Wu J, Wang L (2013) Convergent Projective Non-negative Matrix Factorization. International Journal of Computer Science Issues 10: 127–133. [Google Scholar]
  • 20. Zhang H, Yang Z, Oja E (2012) Adaptive Multiplicative Updates for Projective Nonnegative Matrix Factorization. International Conference on Neural Information Processing 3: 277–284. [Google Scholar]
  • 21. Wang SJ, Yang J, Zhang N, Zhou CG (2011) Tensor Discriminant Color Space for Face Recognition. IEEE Transactions on Image Processing 20(9): 2490–2501. [DOI] [PubMed] [Google Scholar]
  • 22. Wang SJ, Yang J, Sun MF, Peng XJ, Sun MM, et al. (2012) Sparse Tensor Discriminant Color Space for Face Verification. IEEE Transactions on Neural Networks and Learning Systems 23(6): 876–888. [DOI] [PubMed] [Google Scholar]
  • 23. Wang SJ, Zhou CG, Zhang N, Peng XJ, Chen YH, et al. (2011) Face Recognition using Second Order Discriminant Tensor Subspace Analysis. Neurocomputing 74(12–13): 2142–2156. [Google Scholar]
  • 24. Wang SJ, Zhou CG, Fu X (2013) Fusion Tensor Subspace Transformation Framework. PLoS ONE 8(7): e66647. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Belhumeour P, Hespanha J, Kriegman D (1997) Eigenfaces vs. Fisherfaces: Recognition using Class Specific Linear Projection. IEEE Transactions on Pattern Analysis and Machine Intelligence 19: 711–720. [Google Scholar]
  • 26.Samaria F, Harter A (1994) Parameterisation of A Stochastic Model for Human Face Identification. IEEE Conference on Computer Vision, Sarasota: 138–142.
  • 27. Graham DB, Allinson NM, Wechsler H, Pillips PJ, Bruce V, et al. (1998) Characterizing Virtual Eigensignatures for General Purpose Face Recognition. Face Recognition: From Theory to Applications 163: 446–456. [Google Scholar]
  • 28. Phillips PJ, Moon H, Rizvi SA, Rauss PJ (2000) The FERET Evaluation Methodology for Face-Recognition Algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(10): 1090–1104. [Google Scholar]
  • 29.Kong D, Ding C (2012) A Semi-Definite Positive Linear Discriminant Analysis and its Applications. IEEE International Conference on Data Mining: 942–947.
  • 30.Bertsekas DP (1982) Constrained Optimization and Lagrange Multiplier Methods, Academic Press. Inc.
  • 31.Kuhn HW, Tucker AW (1951) Nonlinear Programming. Proceedings of 2nd Berkeley Symposium, Berkeley: University of California Press: 481–492.
  • 32. Song M, Tao D, Chen C, Li X, Chen CW (2010) Color to Gray: Visual Cue Preservation. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(9): 1537–1552. [DOI] [PubMed] [Google Scholar]
  • 33. Song M, Tao D, Huang X, Chen C, Bu J (2012) Three-Dimensional Face Reconstruction From a Single Image by a Coupled RBF Network. IEEE Transactions on Image Processing 21(5): 2887–2897. [DOI] [PubMed] [Google Scholar]
  • 34. Song M, Tao D, Sun S, Chen C, Bu J (2013) Joint Sparse Learning for 3-D Facial Expression Generation. IEEE Transactions on Image Processing 22(8): 3283–3295. [DOI] [PubMed] [Google Scholar]
  • 35. Zhang T, Tao D, Li X, Yang J (2009) Patch Alignment for Dimensionality Reduction. IEEE Transactions on Knowledge and Data Engineering 21(9): 1299–1313. [Google Scholar]
  • 36. Tao D, Li X, Wu X, Maybank SJ (2007) General Tensor Discriminant Analysis and Gabor Features for Gait Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(10): 1700–1715. [DOI] [PubMed] [Google Scholar]
  • 37.Tao D, Li X, Wu X, Maybank SJ (2007) General Averaged Divergence Analysis. International Conference on Data Mining: 302–311.
  • 38. Li J, Tao D (2013) Simple Exponential Family PCA. IEEE Transactions on Neural Networks and Learning Systems 24(3): 485–497. [DOI] [PubMed] [Google Scholar]
  • 39. Li J, Tao D (2013) Exponential Family Factors for Bayesian Factor Analysis. IEEE Transactions on Neural Networks and Learning Systems 24(6): 964–976. [DOI] [PubMed] [Google Scholar]
  • 40.Li J, Tao D (2013) A Bayesian Factorised Covariance Model for Image Analysis. International Joint Conferences on Artificial Intelligence: 1466–1471.
  • 41. Li J, Tao D (2012) On Preserving Original Variables in Bayesian PCA with Applications to Image Analysis. IEEE Transactions on Image Processing 21(12): 4830–4843. [DOI] [PubMed] [Google Scholar]
  • 42.Guan N, Tao D, Luo Z, Shawe-taylor J (2012) MahNMF: Manhattan Non-negative Matrix Factorization. arXiv: 1207.3438v1.
  • 43. Guan N, Tao D, Luo Z, Yuan B (2011) Manifold Regularized Discriminative Nonnegative Matrix Factorization with Fast Gradient Descent. IEEE Transactions on Image Processing 20: 2030–2048. [DOI] [PubMed] [Google Scholar]
  • 44. Guan N, Tao D, Luo Z, Yuan B (2011) Non-negative Patch Alignment Framework. IEEE Transactions on Neural Networks 22: 1218–1230. [DOI] [PubMed] [Google Scholar]
  • 45. Guan N, Tao D, Luo Z, Yuan B (2012) NeNMF: An Optimal Gradient Method for Non-negative Matrix Factorization. IEEE Transactions on Signal Processing 60(6): 2882–2898. [Google Scholar]
  • 46. Guan N, Tao D, Luo Z, Yuan B (2012) Online Non-negative Matrix Factorization with Robust Stochastic Approximation. IEEE Transactions on Neural Networks and Learning Systems 23(7): 1087–1099. [DOI] [PubMed] [Google Scholar]
  • 47. Guan N, Wei L, Luo Z, Tao D (2013) Limited-Memory Fast Gradient Descent Method for Graph Regularized Nonnegative Matrix Factorization. PLoS ONE 8(10): e77162. [DOI] [PMC free article] [PubMed] [Google Scholar]
