Abstract
Datasets in matrix and vector form are increasingly common in modern scientific fields. Based on the structure of such datasets, both matrix and vector coefficients need to be estimated. Existing matrix regression models mainly focus on matrix variables without accompanying vector variables. In order to fully exploit the complex structure of such data, we propose a novel matrix regression model that combines the fused LASSO and nuclear norm penalties and can handle data containing matrix and vector variables simultaneously. Our main contribution is an efficient algorithm for solving the proposed low-rank and fused LASSO matrix regression model. Following an existing idea, we design a linearized alternating direction method of multipliers and establish its global convergence. Finally, we carry out numerical experiments to demonstrate the efficiency of our method. In particular, we apply our model to two real datasets: signal shapes and trip time prediction from partial trajectories.
Keywords: Matrix regression, fused LASSO, low rank, linearized alternating direction method of multipliers, global convergence
1. Introduction
In the era of big data, modern scientific applications are more complex, and sampling units combine matrices with vectors instead of containing only one form. A well-known example is the study of the electroencephalography (EEG) dataset of alcoholism, as in [23]. The study consists of 122 subjects in two groups, an alcoholic group and a normal control group, with each subject being exposed to a stimulus. Voltage values are measured from 64 channels of electrodes placed on the subject's scalp at 256 time points, so each sampling unit is a 64 × 256 matrix. We would face intricate challenges if we turned this matrix into a vector. On the one hand, the dimension would be 64 × 256 = 16,384, but the sample size is only 122. On the other hand, vectorization destroys the structural information in the matrix data. It is crucial to propose a novel matrix regression model to deal with such sampling data. Zhou and Li [23] proposed the matrix regression
y = ⟨X, B⟩ + zᵀγ + ε,  (1)
where y ∈ ℝ is the response, B ∈ ℝ^{p×q} and γ ∈ ℝ^s are the regression coefficients to be estimated, X ∈ ℝ^{p×q} is the matrix variate, z ∈ ℝ^s is the vector variate, and ε is the noise, which follows a normal distribution with mean 0 and standard deviation σ. However, Zhou and Li [23] only analysed the properties and algorithm for the case γ = 0. At present, few researchers have considered the matrix regression model (1), let alone this model with penalized regularization. It is necessary to study this model because the data contain matrix and vector variables at the same time.
In this paper, we focus on the low-rank and fused LASSO (LRFL) matrix regression
min_{B,γ} (1/2) Σ_{i=1}^{n} (yᵢ − ⟨Xᵢ, B⟩ − zᵢᵀγ)² + λ₁‖γ‖₁ + λ₂ Σ_{j=2}^{s} |γⱼ − γⱼ₋₁| + τ‖B‖_*,  (2)
where λ₁, λ₂, τ > 0 are tuning parameters. Unlike typical models containing only vectors or matrices, the first and second ℓ₁-norm terms in this model induce sparsity of both the coefficients γ and their successive differences, and the nuclear norm induces low rank of the unknown regression matrix B. The sparsity of γ helps us choose the most important variables, and the low rank of B picks up the pivotal information in the matrix variable. If we remove B from model (2), it degenerates into the fused LASSO (FLASSO) introduced by Tibshirani et al. [18]. If we remove γ from model (2), it becomes the nuclear norm regularized matrix regression studied by Zhou and Li [23]. In this paper, we mainly focus on designing an efficient method for solving the LRFL matrix regression model (2).
In recent years, researchers have proposed many regularized regression models with different penalties, such as the power family [7], elastic net [24], log-penalty [1,3], SCAD [6], and MC+ [22]. Meanwhile, Zhou and Li [23] proposed a matrix regression model and considered the low rank of B based on spectral regularization. There is also other work related to matrix data, such as [4,13,14,17,20,21]. However, none of these considered matrix and vector variables together. Thus, a model combining matrix and vector variables deserves study. The basic work is to study the statistical properties and to design an algorithm for computing the solution of the LRFL matrix regression model (2). Zhou and Li [23] used the Nesterov method to solve spectral regularized matrix regression. Although this method is analytically simple, it is not suitable for model (2) because of the two blocks of variables. Moreover, Li et al. [12] proposed the linearized alternating direction method of multipliers (LADMM) for solving FLASSO. Since the objective function in our model (2) is convex with respect to B and γ, a natural extension can be considered. Following the procedure of Li et al. [12], we develop a LADMM algorithm for solving model (2).
The rest of the paper is organized as follows. In Section 2, some preliminaries useful for further discussion are introduced; in particular, we present two important optimization problems that appear in our algorithm. In Section 3, we give the LADMM algorithm for the LRFL matrix regression model (2). In Section 4, we demonstrate the convergence of the obtained algorithm. We conduct extensive numerical experiments to evaluate the performance of our algorithm for the LRFL matrix regression model (2) in Section 5. We conclude the paper in the final section.
2. Preliminary
In this section, we introduce some preliminaries which are useful for further discussion. First, we give some notation for matrix derivatives. Then, we briefly introduce the solutions of two important optimization problems.
2.1. Matrix calculation
- If y = f(B) is a real-valued function of B ∈ ℝ^{p×q}, the derivative of y with respect to B is defined as ∂y/∂B = (∂y/∂b_{ij}) ∈ ℝ^{p×q}.
- If Y = (y_{kl}) ∈ ℝ^{m×n} and each y_{kl} is a real-valued function of B, the derivative of Y with respect to B is defined as the block matrix ∂Y/∂B whose (k, l)-th block is ∂y_{kl}/∂B.
- For any matrix B ∈ ℝ^{p×q} with SVD B = UΛVᵀ, the subdifferential of the nuclear norm is ∂‖B‖_* = {UVᵀ + W : UᵀW = 0, WV = 0, ‖W‖₂ ≤ 1}.
2.2. Two important optimization problems
Here, we present two results which play an important role in the algorithm.
2.2.1. Soft-thresholding
The minimization problem

min_{x∈ℝˢ} λ‖x‖₁ + (1/2)‖x − r‖²

with r ∈ ℝˢ and λ > 0 has a closed-form solution, which is given componentwise by the soft-thresholding operator

S_λ(r)ᵢ = sign(rᵢ) max(|rᵢ| − λ, 0),  (3)

where sign(·) is the sign function.
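A minimal numerical sketch of the soft-thresholding operator (3); the function name `soft_threshold` is our own:

```python
import numpy as np

def soft_threshold(r, lam):
    """Elementwise soft-thresholding: S_lam(r)_i = sign(r_i) * max(|r_i| - lam, 0).

    Solves min_x lam*||x||_1 + 0.5*||x - r||^2 componentwise.
    """
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)

r = np.array([3.0, -0.5, 1.2])
print(soft_threshold(r, 1.0))  # entries shrunk toward zero by 1, small ones zeroed
```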
2.2.2. Matrix soft-thresholding
The minimization problem

min_{B∈ℝ^{p×q}} τ‖B‖_* + (1/2)‖B − R‖_F²

with R ∈ ℝ^{p×q} and τ > 0 has a closed-form solution, which is defined as

S_τ(R) = U diag(S_τ(λ)) Vᵀ,  (4)

where U, Λ, V satisfy the singular value decomposition (SVD) of R, i.e. R = UΛVᵀ, λ is the diagonal of Λ, and S_τ(λ) applies the soft-thresholding operator (3) entrywise to the singular values. When r is a vector, we arrange it column-wise into a matrix and denote the matrix by R. The proof can be found in [2] (Theorem 2.1) or [19] (Theorem 3).
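The matrix soft-thresholding operator (4) can be sketched by composing an SVD with the vector operator (3); `matrix_soft_threshold` is an illustrative name, not code from the paper:

```python
import numpy as np

def soft_threshold(r, lam):
    # elementwise soft-thresholding, as in (3)
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)

def matrix_soft_threshold(R, tau):
    """Singular value thresholding: solves min_B tau*||B||_* + 0.5*||B - R||_F^2."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U @ np.diag(soft_threshold(s, tau)) @ Vt
```

Shrinking the singular values toward zero is what produces the low-rank estimate: singular values below τ are set exactly to zero, reducing the rank.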
3. Linearized alternating direction method of multipliers
In this section, we propose the LADMM algorithm to solve the LRFL matrix regression model (2). For completeness, we first recall the ADMM algorithm. Consider the convex minimization problem

min F(β) + G(γ)  s.t.  Aβ + Cγ = b,  β ∈ 𝒳, γ ∈ 𝒴.  (5)

The objective function in (5) is separable and the constraint is linear. Here 𝒳 and 𝒴 are given non-empty, closed and convex sets, F and G are closed convex functions, A and C are given matrices, and b is a given vector. Throughout, we assume that the solution set of (5) is non-empty. The augmented Lagrangian function of (5) is

L_μ(β, γ, π) = F(β) + G(γ) − πᵀ(Aβ + Cγ − b) + (μ/2)‖Aβ + Cγ − b‖²,

where μ > 0 is a penalty parameter and π is the Lagrange multiplier. The augmented Lagrangian method (ALM) of Hestenes [10] and Powell [16] can be applied to solve (5). With a given initial point (β⁰, γ⁰, π⁰), the iterative scheme of ALM for (5) is
(β^{k+1}, γ^{k+1}) = argmin_{β∈𝒳, γ∈𝒴} L_μ(β, γ, πᵏ),  π^{k+1} = πᵏ − μ(Aβ^{k+1} + Cγ^{k+1} − b).  (6)
The direct application of the ALM to (5) yields scheme (6), which at each iteration requires minimizing over β and γ jointly. The ADMM algorithm of Gabay and Mercier [8] and Glowinski and Marrocco [9] decomposes the minimization problem in (6) into two separable subproblems and minimizes them alternately:
β^{k+1} = argmin_{β∈𝒳} L_μ(β, γᵏ, πᵏ),  γ^{k+1} = argmin_{γ∈𝒴} L_μ(β^{k+1}, γ, πᵏ),  π^{k+1} = πᵏ − μ(Aβ^{k+1} + Cγ^{k+1} − b).  (7)
Because the minimization is decomposed in this way, the subproblems in (7) are easier than the original problem (5). Moreover, the subproblems in (7) have closed-form solutions for many applications, including the LASSO and GLASSO. This fact makes ADMM efficient for a wide class of problems, so we consider applying the ADMM algorithm to the LRFL matrix regression model (2).
Now we analyse how to solve the LRFL matrix regression model (2) by applying the ADMM. In order to reformulate model (2), we define the first-order difference matrix F ∈ ℝ^{(s−1)×s} as the matrix with F_{j,j} = −1, F_{j,j+1} = 1 and all other entries zero, so that (Fγ)ⱼ = γ_{j+1} − γⱼ. Denoting 𝔛(B) = (⟨X₁, B⟩, …, ⟨Xₙ, B⟩)ᵀ and Z = (z₁, …, zₙ)ᵀ, model (2) can be written as
min_{B,γ} (1/2)‖y − 𝔛(B) − Zγ‖² + λ₁‖γ‖₁ + λ₂‖Fγ‖₁ + τ‖B‖_*.  (8)
Letting ξ = Fγ, (8) can be rewritten as
min_{B,γ,ξ} (1/2)‖y − 𝔛(B) − Zγ‖² + λ₁‖γ‖₁ + λ₂‖ξ‖₁ + τ‖B‖_*  s.t.  Fγ − ξ = 0.  (9)
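The reformulation above relies on a first-order difference matrix; a minimal sketch of its construction, assuming the convention that row j of F picks out the successive difference γ_{j+1} − γ_j:

```python
import numpy as np

def difference_matrix(s):
    """First-order difference matrix F in R^{(s-1) x s}: (F @ g)[j] = g[j+1] - g[j]."""
    F = np.zeros((s - 1, s))
    for j in range(s - 1):
        F[j, j], F[j, j + 1] = -1.0, 1.0
    return F

g = np.array([1.0, 3.0, 2.0, 2.0])
print(difference_matrix(4) @ g)  # successive differences of g
```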
The augmented Lagrangian function of (9) is
L_μ(B, γ, ξ, π) = (1/2)‖y − 𝔛(B) − Zγ‖² + λ₁‖γ‖₁ + λ₂‖ξ‖₁ + τ‖B‖_* − πᵀ(Fγ − ξ) + (μ/2)‖Fγ − ξ‖²,  (10)
where π ∈ ℝ^{s−1} is the Lagrange multiplier and μ > 0 is a given penalty parameter. The iterative scheme of ADMM for (10) is
B^{k+1} = argmin_B L_μ(B, γᵏ, ξᵏ, πᵏ),  γ^{k+1} = argmin_γ L_μ(B^{k+1}, γ, ξᵏ, πᵏ),  ξ^{k+1} = argmin_ξ L_μ(B^{k+1}, γ^{k+1}, ξ, πᵏ),  π^{k+1} = πᵏ − μ(Fγ^{k+1} − ξ^{k+1}).  (11)
Now let us look at the resulting subproblems in (11). First, after trivial manipulation, the B-subproblem in (11) can be written as

min_B (1/2)‖rᵏ − 𝔛(B)‖² + τ‖B‖_*,  (12)

where rᵏ = y − Zγᵏ. The subproblem does not have a closed-form solution because the linear operator 𝔛 is not the identity. As in Wang and Yuan [19], we can linearize the quadratic term in (12), replacing it by

⟨𝔛*(𝔛(Bᵏ) − rᵏ), B − Bᵏ⟩ + (1/(2v))‖B − Bᵏ‖_F²,

where 𝔛* denotes the adjoint operator of 𝔛 and the parameter v > 0 controls the proximity to Bᵏ. Overall, we solve the following subproblem:

min_B τ‖B‖_* + (1/(2v))‖B − Mᵏ‖_F²,  (13)

where Mᵏ = Bᵏ − v 𝔛*(𝔛(Bᵏ) − rᵏ). Then, following (4), the closed-form solution of (13) is

B^{k+1} = S_{τv}(Mᵏ) = U diag(S_{τv}(λ)) Vᵀ,  (14)

where U, Λ, V satisfy the singular value decomposition (SVD) of Mᵏ, i.e. Mᵏ = UΛVᵀ, and λ is the diagonal of Λ.
Second, dropping the terms independent of γ, the γ-subproblem in (11) can be written as

min_γ (1/2)‖y − 𝔛(B^{k+1}) − Zγ‖² + λ₁‖γ‖₁ − πᵏᵀ(Fγ − ξᵏ) + (μ/2)‖Fγ − ξᵏ‖².

The subproblem does not have a closed-form solution because of the non-identity matrices Z and F. Similarly, we can linearize the quadratic terms, replacing them by

⟨gᵏ, γ − γᵏ⟩ + (1/(2w))‖γ − γᵏ‖²,  gᵏ = Zᵀ(Zγᵏ + 𝔛(B^{k+1}) − y) + Fᵀ(μ(Fγᵏ − ξᵏ) − πᵏ),

where the parameter w > 0 controls the proximity to γᵏ. Overall, we solve the following subproblem:

min_γ λ₁‖γ‖₁ + (1/(2w))‖γ − (γᵏ − w gᵏ)‖².

Then, according to (3), its closed-form solution is

γ^{k+1} = S_{λ₁w}(γᵏ − w gᵏ).  (15)
Third, the ξ-subproblem in (11) can be written as

min_ξ λ₂‖ξ‖₁ + (μ/2)‖Fγ^{k+1} − ξ − πᵏ/μ‖².

We obtain its solution by (3),

ξ^{k+1} = S_{λ₂/μ}(Fγ^{k+1} − πᵏ/μ).  (16)
In summary, the iterative scheme of LADMM algorithm for LRFL matrix regression model can be described as follows.
Remark 3.1
When solving the B-subproblem, the rank of B^{k+1} is determined by τ: if τ increases, the rank of B^{k+1} decreases, and if τ gets smaller, the rank of B^{k+1} gets larger. Thus, if we want a low-rank estimator, we only need to choose a large τ. On the other hand, for a given τ, we do not need to compute all the singular values of Mᵏ: from the solution in (14), only the singular values greater than τv matter. Thus, in the implementation of Algorithm 1, we use the truncation technique which can be found in [15].
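The whole iteration can be sketched as follows. This is a plausible reading of Algorithm 1 rather than the authors' implementation: the operator 𝔛 is represented by stacking the vectorized Xᵢ into a matrix `A`, and the step sizes `v1`, `v2` are simple conservative choices based on spectral norms (our own assumptions):

```python
import numpy as np

def soft(r, lam):
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)

def svt(R, tau):
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U @ np.diag(soft(s, tau)) @ Vt

def ladmm(Xs, Z, y, lam1, lam2, tau, mu=1.0, iters=200):
    """Sketch of one possible LADMM loop for the LRFL model (2)."""
    n, (p, q), s = len(Xs), Xs[0].shape, Z.shape[1]
    F = np.zeros((s - 1, s))
    for j in range(s - 1):
        F[j, j], F[j, j + 1] = -1.0, 1.0
    A = np.stack([X.ravel() for X in Xs])  # n x (p*q); <X_i, B> = (A @ B.ravel())_i
    v1 = 1.0 / np.linalg.norm(A, 2) ** 2   # proximity parameter for the B-step
    v2 = 1.0 / (np.linalg.norm(Z, 2) ** 2 + mu * np.linalg.norm(F, 2) ** 2)
    B, g, xi, pi = np.zeros((p, q)), np.zeros(s), np.zeros(s - 1), np.zeros(s - 1)
    for _ in range(iters):
        # linearized B-step: singular value thresholding of a gradient step, as in (14)
        resid = A @ B.ravel() + Z @ g - y
        B = svt(B - v1 * (A.T @ resid).reshape(p, q), tau * v1)
        # linearized gamma-step: soft-thresholding of a gradient step, as in (15)
        resid = A @ B.ravel() + Z @ g - y
        grad = Z.T @ resid + F.T @ (mu * (F @ g - xi) - pi)
        g = soft(g - v2 * grad, lam1 * v2)
        # xi-step (16) and multiplier update
        xi = soft(F @ g - pi / mu, lam2 / mu)
        pi = pi - mu * (F @ g - xi)
    return B, g
```

Each pass costs one SVD of a p × q matrix plus a few matrix-vector products, which matches the per-iteration cost discussed in Remark 3.1.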
4. Convergence analysis
In this section, we focus on the convergence analysis of Algorithm 1. In fact, the procedure is similar to some existing work, such as [12]. For better understanding, we give a succinct proof here.
4.1. Convergence of Algorithm 1
Note that the Lagrange function of (9) is
L(B, γ, ξ, π) = (1/2)‖y − 𝔛(B) − Zγ‖² + λ₁‖γ‖₁ + λ₂‖ξ‖₁ + τ‖B‖_* − πᵀ(Fγ − ξ),  (17)
where π is the Lagrange multiplier. By the first-order optimality conditions of (17), it is easy to see that solving (9) is equivalent to finding B*, γ*, ξ* and π* such that
0 ∈ τ∂‖B*‖_* − 𝔛*(y − 𝔛(B*) − Zγ*),  0 ∈ λ₁∂‖γ*‖₁ − Zᵀ(y − 𝔛(B*) − Zγ*) − Fᵀπ*,  0 ∈ λ₂∂‖ξ*‖₁ + π*,  Fγ* − ξ* = 0.  (18)
Note that ∂ denotes the subdifferential operator of a non-smooth convex function. We denote the set of all elements in Ω that satisfy (18) by Ω*. Then, using the notation w = (B, γ, ξ, π), (18) can be written as a variational inequality (VI) problem: finding w* ∈ Ω such that
| (19) |
where and
For conciseness, we need to use the following matrix:
From the procedure of the proof, the matrix G must be positive definite; this can be achieved by suitable conditions on the linearization parameters v and w, stated in terms of the spectral radius ρ(·) of the corresponding matrices. In order to establish the convergence of the LADMM algorithm, in the following lemma we characterize the (k+1)th iteration of Algorithm 1 as a VI problem.
Lemma 4.1
Let be a sequence generated by Algorithm 1. Then we have
where
Proof.
First, we have
and
It follows from (11) that
(20) Deriving the first-order optimality conditions of the minimization problems (14), (15) and (16), we see that the iterative scheme (11) is equivalent to finding B^{k+1}, γ^{k+1} and ξ^{k+1} such that
(21) According to the definition of G, inserting (20) into (21), (21) can be written as
Then we obtain the conclusion.
The following lemma can be easily derived from Lemma 4.1. For completeness, we show the proof in detail.
Lemma 4.2
Let be a sequence generated by Algorithm 1. Then we have
Proof.
From Lemma 4.1, for any , by setting , we obtain
(22) where is an arbitrary solution point in . Note that we have for , thus (22) leads to
Since , the above inequality becomes
On the other hand, since and are both convex, it is obvious that the mapping is monotone. We thus have
(23) and
(24) Then, replacing by in (23) and using (24), we get the desired conclusion.
Using Lemma 4.2, we can obtain an upper bound on the difference between the sequence generated by Algorithm 1 and the true solution.
Lemma 4.3
Let be the sequence generated by Algorithm 1. Then we have
(25)
Proof.
For , it follows from Lemma 4.2 that
(26) As shown before, we have for any k. Thus, we have
We obtain that
(27)
Lemma 4.3 implies that the sequence generated by Algorithm 1 is contractive with respect to the solution set Ω*. The following corollary follows directly from inequality (25); thus, we omit the proof.
Corollary 4.4
Let be the sequence generated by Algorithm 1. Then we have
(a) lim_{k→∞} ‖w^{k+1} − wᵏ‖_G = 0.
(b) The sequence {wᵏ} is bounded.
(c) For any w* ∈ Ω*, the sequence {‖wᵏ − w*‖_G} is monotonically non-increasing.
Now we can obtain the convergence of the linearized alternating direction method of multipliers for the LRFL matrix regression model.
Theorem 4.5
Let {wᵏ} be the sequence generated by Algorithm 1 from any initial point. Then {wᵏ} converges to a point w^∞, where (B^∞, γ^∞, ξ^∞) is an optimal solution of the LRFL matrix regression model (9).
Proof.
The property (a) in Corollary 4.4 means that
In addition, property (b) in Corollary 4.4 implies that the sequence has at least one cluster point. We denote it by w^∞ and let {w^{k_j}} be a subsequence converging to w^∞. Thus, we have
(28) and
(29) Next we show that the cluster point satisfies the optimality condition (18). Due to the variational inequality (19) and (29), we have
Then, according to (28), we obtain that
Thus, the limiting point satisfies (19), i.e. . Considering the property (c) in Corollary 4.4, we have
Therefore, the sequence has the unique cluster point . Thus, is an optimal solution of the LRFL matrix regression model (9).
4.2. Convergence rate
Following the work in [24], we can easily establish a worst-case O(1/k) convergence rate measured by the iteration complexity in the ergodic sense for the proposed Algorithm 1. That is, after k iterations, the average of these k iterates generated by Algorithm 1 is an approximate solution of the LRFL matrix regression model with an accuracy of O(1/k). For succinctness, we omit the proof.
5. Numerical experiments
5.1. Simulation
We consider a class of matrix models with different ranks and sparsity levels used in [23]. Specifically, we generate the matrix covariates X and the vector covariates Z, both of which consist of independent standard normal entries. We set the sample size at n = 500, which is smaller than the number of parameters. We generate the true array signal as B = B₁B₂ᵀ, where B₁ and B₂ each have R columns, and R controls the rank of the signal. Moreover, each entry of B is 0 or 1, and the percentage of non-zero entries is controlled by a sparsity level constant s; i.e. each entry of B is Bernoulli with probability of a 1 equal to s. We vary the rank R = 1, 5 and the level of (non-)sparsity s = 0.01, 0.05, 0.1 (s = 0.05 means that about 5% of the entries of B are 1s and the rest are 0s). We generate the response y from the systematic part ⟨X, B⟩ + Zᵀγ with noise ϵ following a standard normal distribution. In addition, all computations are performed on an Intel Core(TM) i7-2640M CPU (2.80 GHz) with 8 GB RAM. The code for Algorithm 1 is written in MATLAB, and the initial point for this method is set to zero. The maximum iteration number is set to 500. For the tuning parameters λ₁, λ₂, τ, we take a large grid of values. For each model, we choose the parameters which give the best test root-mean-square error (RMSE), where the test set is generated as above except that n = 2500. The remaining algorithm parameters are fixed in advance.
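The signal construction can be sketched as follows; the factor-based sampling below is one plausible reading of the low-rank Bernoulli design (the function name and the √s factor probabilities are our own assumptions, chosen so that roughly s·100% of entries are ones):

```python
import numpy as np

def make_lowrank_sparse_signal(p, q, R, s, rng):
    """Rank-at-most-R 0/1 signal B = min(B1 @ B2.T, 1) with Bernoulli(sqrt(s)) factors.

    With independent factors, an entry is 1 with probability about s (for R = 1,
    exactly sqrt(s)*sqrt(s) = s), so the sparsity level matches the text.
    """
    B1 = (rng.random((p, R)) < np.sqrt(s)).astype(float)
    B2 = (rng.random((q, R)) < np.sqrt(s)).astype(float)
    return np.minimum(B1 @ B2.T, 1.0)  # clip so entries stay in {0, 1}

rng = np.random.default_rng(1)
B = make_lowrank_sparse_signal(64, 64, R=1, s=0.05, rng=rng)
print(np.linalg.matrix_rank(B), B.mean())
```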
In the experiment, we simulate the model 100 times. Then we evaluate the performance of our method from two aspects: parameter estimation and prediction. For the former, we employ
as the evaluation criterion. For the latter, we use independent validation data to evaluate the prediction error measured by the RMSE of the response. For simplicity, we use LRFL to stand for LRFL matrix regression model. We report the performance in Tables 1–5. Specifically, RMSE-B is the root-mean-squared error of B for training data, RMSE-γ is the root-mean-squared error of γ for training data, RMSE-PRE is the root-mean-squared error of prediction for test data, and the numerical values in parentheses are the standard deviations of the corresponding terms. The average CPU time (in seconds) is also included.
Table 1. Performance of Algorithm 1 when the true rank of coefficient matrix R = 1.
| Sparsity | RMSE-B | RMSE-γ | RMSE-PRE | CPU |
|---|---|---|---|---|
| 0.01 | 0.23(0.007) | 0.05(0.025) | 0.23(0.006) | 1.48 |
| 0.05 | 0.30(0.006) | 0.08(0.019) | 0.31(0.016) | 1.67 |
| 0.1 | 0.40(0.004) | 0.07(0.030) | 0.40(0.008) | 1.58 |
Notes: All the measurements are the means of the results over 100 repetitions. The numbers in parentheses are the corresponding standard errors.
Table 2. Performance of Algorithm 1 when the true rank of coefficient matrix R = 5.
| Sparsity | RMSE-B | RMSE-γ | RMSE-PRE | CPU |
|---|---|---|---|---|
| 0.01 | 0.20(0.008) | 0.05(0.023) | 0.20(0.014) | 1.86 |
| 0.05 | 0.27(0.008) | 0.09(0.018) | 0.29(0.014) | 1.54 |
| 0.1 | 0.40(0.005) | 0.09(0.028) | 0.40(0.013) | 1.56 |
Notes: All the measurements are the means of the results over 100 repetitions. The numbers in parentheses are the corresponding standard errors.
Table 3. Results of comparing LRFL with matrix LASSO when the true rank of coefficient matrix R = 2.
| R = 2 | |||||
|---|---|---|---|---|---|
| Sparsity | Method | RMSE-B | RMSE-γ | RMSE-PRE | Rank |
| 0.01 | LRFL | 0.38(0.055) | 0.54(0.023) | 0.66(0.017) | 1 |
| matrix LASSO | 0.33(0.042) | 0.71(0.018) | 0.78(0.037) | 24.5 | |
| 0.05 | LRFL | 0.46(0.027) | 0.54(0.015) | 0.70(0.018) | 1 |
| matrix LASSO | 0.38(0.019) | 0.71(0.011) | 0.81(0.018) | 24 | |
| 0.1 | LRFL | 0.45(0.042) | 0.55(0.024) | 0.72(0.031) | 1 |
| matrix LASSO | 0.38(0.029) | 0.71(0.018) | 0.81(0.030) | 22 | |
| 0.2 | LRFL | 0.59(0.036) | 0.55(0.017) | 0.80(0.037) | 1 |
| matrix LASSO | 0.54(0.017) | 0.72(0.086) | 0.90(0.014) | 25 | |
| 0.5 | LRFL | 0.62(0.056) | 0.57(0.021) | 0.84(0.037) | 1 |
| matrix LASSO | 0.95(0.020) | 0.74(0.019) | 1.22(0.039) | 27 | |
Notes: All the measurements are the means of the results over 100 repetitions. The numbers in parentheses are the corresponding standard errors.
Table 4. Results of comparing LRFL with matrix LASSO when the true rank of coefficient matrix R = 5.
| R = 5 | |||||
|---|---|---|---|---|---|
| Sparsity | Method | RMSE-B | RMSE-γ | RMSE-PRE | Rank |
| 0.01 | LRFL | 0.29(0.025) | 0.67(0.012) | 0.75(0.032) | 6.5 |
| matrix LASSO | 0.31(0.069) | 0.79(0.023) | 0.85(0.028) | 10.5 | |
| 0.05 | LRFL | 0.48(0.046) | 0.55(0.022) | 0.72(0.028) | 6 |
| matrix LASSO | 0.43(0.028) | 0.72(0.010) | 0.84(0.027) | 25 | |
| 0.1 | LRFL | 0.55(0.025) | 0.53(0.013) | 0.74(0.023) | 6 |
| matrix LASSO | 0.44(0.031) | 0.71(0.012) | 0.84(0.023) | 23.5 | |
| 0.2 | LRFL | 0.66(0.018) | 0.55(0.023) | 0.84(0.039) | 7 |
| matrix LASSO | 0.61(0.024) | 0.72(0.017) | 0.94(0.023) | 26 | |
| 0.5 | LRFL | 0.80(0.060) | 0.56(0.023) | 0.98(0.053) | 6 |
| matrix LASSO | 0.90(0.028) | 0.73(0.015) | 1.15(0.046) | 27.5 | |
Notes: All the measurements are the means of the results over 100 repetitions. The numbers in parentheses are the corresponding standard errors.
Table 5. Results of comparing LRFL with matrix LASSO when the true rank of coefficient matrix R = 10.
| R = 10 | |||||
|---|---|---|---|---|---|
| Sparsity | Method | RMSE-B | RMSE-γ | RMSE-PRE | Rank |
| 0.01 | LRFL | 0.43(0.043) | 0.55(0.043) | 0.70(0.039) | 11 |
| matrix LASSO | 0.31(0.057) | 0.71(0.011) | 0.78(0.038) | 24 | |
| 0.05 | LRFL | 0.48(0.026) | 0.53(0.019) | 0.72(0.023) | 15 |
| matrix LASSO | 0.34(0.032) | 0.71(0.012) | 0.78(0.017) | 23 | |
| 0.1 | LRFL | 0.54(0.015) | 0.53(0.012) | 0.76(0.027) | 15.5 |
| matrix LASSO | 0.46(0.028) | 0.72(0.012) | 0.86(0.035) | 24 | |
| 0.2 | LRFL | 0.72(0.014) | 0.55(0.018) | 0.91(0.020) | 16.5 |
| matrix LASSO | 0.68(0.015) | 0.73(0.016) | 1.01(0.026) | 26.5 | |
| 0.5 | LRFL | 0.88(0.072) | 0.59(0.042) | 1.07(0.068) | 15 |
| matrix LASSO | 1.05(0.024) | 0.74(0.015) | 1.29(0.033) | 28 | |
Notes: All the measurements are the means of the results over 100 repetitions. The numbers in parentheses are the corresponding standard errors.
From Tables 1 and 2 and Figure 1, we find that the accuracy of the estimates and of the test prediction becomes worse as the sparsity increases. The standard deviation, however, is almost unchanged, which suggests that our method is stable. Moreover, the accuracy of γ is better than that of B.
Figure 1.
The boxplots of RMSE-B, RMSE-γ and RMSE-PRE, where R is the rank of B and s is the sparsity level of B.
In Tables 3–5, we compare LRFL with the matrix LASSO in terms of the accuracy and the estimated rank of B. The ranks listed in Tables 3–5 are the medians of the 100 simulation results. For the matrix LASSO method, we stack the elements of the predictor X by column, so the matrix model is reformulated as a vector model to which the matrix LASSO can be applied. From Tables 3–5, for the accuracy of B, the matrix LASSO gives a better performance; however, LRFL gives a better rank estimation. There is no doubt that vectorization destroys the structure of X, and as a result the estimated coefficient matrix B is not low-rank.
5.2. Real data
In this subsection, we will use our algorithm to deal with two real datasets.
5.2.1. Signal shapes
Recently, signal shapes have attracted wide attention from researchers [11,23]. In the LRFL matrix regression model (2), X is a matrix with entries generated as independent standard normal variables, and Z is a five-dimensional vector. We set B to be binary, encoding the true signal shapes, and γ follows a standard normal distribution. The response is then y = ⟨X, B⟩ + Zᵀγ + ϵ, where ϵ follows a normal distribution. In our numerical experiments, we vary the sample size over n = 500, 750 and 1000; results are displayed in Tables 6–8, which show the root-mean-squared errors (RMSEs) for the vector coefficient γ, the matrix coefficient B and the response y, together with their standard deviations. Note that the RMSEs of B, γ and y reduce significantly as the sample size increases.
Table 6. Performance for signal shapes of Algorithm 1 when n = 500.
| Shapes | RMSE-B | RMSE-γ | RMSE-y |
|---|---|---|---|
| Cross | 0.29(0.004) | 1.64(0.007) | 0.20(0.002) |
| Square | 0.32(0.004) | 1.89(0.005) | 0.25(0.007) |
| Tshape | 0.32(0.005) | 1.12(0.005) | 0.27(0.008) |
| Triangle | 0.31(0.001) | 1.99(0.002) | 0.24(0.001) |
Notes: All the measurements are the means of the results over 100 repetitions. The numbers in parentheses are the corresponding standard errors.
Table 7. Performance for signal shapes of Algorithm 1 when n = 750.
| Shapes | RMSE-B | RMSE-γ | RMSE-y |
|---|---|---|---|
| Cross | 0.21(0.001) | 1.44(0.008) | 0.18(0.007) |
| Square | 0.28(0.002) | 0.95(0.005) | 0.23(0.004) |
| Tshape | 0.26(0.002) | 1.01(0.003) | 0.22(0.005) |
| Triangle | 0.22(0.009) | 1.64(0.007) | 0.19(0.004) |
Notes: All the measurements are the means of the results over 100 repetitions. The numbers in parentheses are the corresponding standard errors.
Table 8. Performance for signal shapes of Algorithm 1 when n = 1000.
| Shapes | RMSE-B | RMSE-γ | RMSE-y |
|---|---|---|---|
| Cross | 0.20(0.008) | 1.29(0.001) | 0.17(0.004) |
| Square | 0.26(0.001) | 0.88(0.004) | 0.22(0.007) |
| Tshape | 0.24(0.002) | 0.98(0.001) | 0.22(0.002) |
| Triangle | 0.20(0.003) | 0.73(0.012) | 0.18(0.003) |
Notes: All the measurements are the means of the results over 100 repetitions. The numbers in parentheses are the corresponding standard errors.
5.2.2. Trip time prediction from partial trajectories
A classical example is the ECML/PKDD Discovery Challenge 2015 competition to estimate taxi travel time for complete trips [5,11]. It contains 7733 trajectories of taxis in Porto over a period of one year, where every trajectory includes multiple features. The latitude and longitude coordinates are recorded every 15 s while the taxi is running. The data also contain seven regular variates, such as trip id, call type, origin stand and day type. The trip id is a unique identifier for every trip. Call type identifies the way the service was demanded: dispatched from the central, demanded directly to a taxi driver at a specific stand, or otherwise. Origin stand contains a unique identifier for the taxi stand. Day type indicates a holiday or other special day, a day before a holiday, or otherwise. In the LRFL matrix regression model (2), X is a matrix and Z is a vector. We remove 32 trajectories due to missing coordinate observations. We use our matrix regression model to predict taxi travel time for complete journeys.
Table 9 shows the root-mean-squared error of prediction on the test data under 5-fold and 10-fold cross-validation. Furthermore, we choose 1000 trips as a test dataset and vary the size of the training dataset; results are displayed in Table 10.
Table 9. Results of trip time prediction under 5-fold or 10-fold cross-validation.
| 5-fold rate | 10-fold rate | ||||
|---|---|---|---|---|---|
| Training dataset | Test dataset | RMSE-PRE | Training dataset | Test dataset | RMSE-PRE |
| 6161 | 1540 | 0.86(0.010) | 6931 | 770 | 0.82(0.013) |
Notes: All the measurements are the means of the results over 100 repetitions. The numbers in parentheses are the corresponding standard errors.
Table 10. Results of trip time prediction with a fixed test dataset and varying training sizes. All the measurements are the means of the results over 100 repetitions.
| Training dataset | Test dataset | RMSE-PRE |
|---|---|---|
| 200 | 1000 | 0.86(0.034) |
| 500 | 1000 | 0.86(0.041) |
| 1000 | 1000 | 0.81(0.089) |
| 1500 | 1000 | 0.82(0.068) |
| 2000 | 1000 | 0.77(0.046) |
Note: The numbers in parentheses are the corresponding standard errors.
6. Summary
In this paper, we propose the LRFL matrix regression model, which combines the nuclear norm and fused LASSO penalties. The inspiration for this model comes from the fact that, in many fields, sampling units contain matrix and vector variables at the same time. In order to solve the LRFL matrix regression model, we develop a linearized ADMM algorithm and establish its global convergence. Finally, we demonstrate the efficiency of our method through numerical experiments on simulated and real datasets. Comparing the LRFL matrix regression model with the matrix LASSO, our model gives more accurate and lower-rank estimates. We have mainly focused on the algorithm for solving the LRFL matrix regression model; the statistical properties are also worthy of further study.
Acknowledgments
The authors are very grateful to two anonymous reviewers and associate editor for their insightful remarks and comments, which considerably improved the presentation of our paper.
Funding Statement
The work was supported in part by the National Natural Science Foundation of China (11671029), the Fundamental Research Funds for the Central Universities (2019YJS200) and Colleges and Universities in Hebei Province Science and Technology Research Project (Z2019032).
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1.Armagan A., Dunson D. and Lee J., Generalized double Pareto shrinkage, Stat. Sin. 23 (2013), pp. 119–143. [PMC free article] [PubMed] [Google Scholar]
- 2.Cai J., Candés E. and Shen Z., A singular value thresholding algorithm for matrix completion, SIAM J. Optim. 20 (2010), pp. 1956–1982. doi: 10.1137/080738970 [DOI] [Google Scholar]
- 3.Candés E., Wakin M. and Boyd S., Enhancing sparsity by reweighted minimization, J. Four. Anal. Appl. 14 (2008), pp. 877–905. doi: 10.1007/s00041-008-9045-x [DOI] [Google Scholar]
- 4.Chen B. and Kong L., High-dimensional least square matrix regression via elastic net penalty, Pac. J. Optim. 13 (2017), pp. 185–196. [Google Scholar]
- 5.Discovery challenge: On learning from taxi GPS traces: ECML-PKDD, 2015. Available at http://www.geolink.pt/ecmlpkdd2015-challenge/.
- 6.Fan J. and Li R., Variable selection via nonconcave penalized likelihood and its oracle properties, J. Am. Stat. Assoc. 96 (2001), pp. 1348–1360. doi: 10.1198/016214501753382273 [DOI] [Google Scholar]
- 7.Frank I. and Friedman J., A statistical view of some chemometrics regression tools, Technometrics 35 (1993), pp. 109–135. doi: 10.1080/00401706.1993.10485033 [DOI] [Google Scholar]
- 8.Gabay D. and Mercier B., A dual algorithm for the solution of nonlinear variational problems via finite element approximation, Comp. Math. Appl. 2 (1976), pp. 17–40. doi: 10.1016/0898-1221(76)90003-1 [DOI] [Google Scholar]
- 9.Glowinski R. and Marrocco A., Sur l'approximation, par éléments finis et la résolution, par pénalisation-dualité d'une classe de problémes de Dirichlet non linéaires, ESAIM: Math. Model. Numer. Anal. 9 (1975), pp. 41–76. [Google Scholar]
- 10.Hestenes M., Multiplier and gradient methods, J. Optim. Theory Appl. 4 (1969), pp. 303–320. doi: 10.1007/BF00927673 [DOI] [Google Scholar]
- 11.Li M. and Kong L., Double fused Lasso penalized LAD for matrix regression, Appl. Math. Comput. 357 (2019), pp. 119–138. doi: 10.1016/j.cam.2019.02.009 [DOI] [Google Scholar]
- 12.Li X., Mo L., Yuan X. and Zhang J., Linearized alternating direction method of multipliers for sparse group and fused Lasso models, Comput. Stat. Data Anal. 79 (2014), pp. 203–221. doi: 10.1016/j.csda.2014.05.017 [DOI] [Google Scholar]
- 13.Lin Z., Liu R. and Su Z., Linearized alternating direction method with adaptive penalty for low-rank representation, Adv. Neural Inf. Process. Syst. 24 (2011), pp. 612–620. [Google Scholar]
- 14.Luo L., Yang J., Qian J., Tai Y. and Lu G., Robust image regression based on the extended matrix variate power exponential distribution of dependent noise, IEEE Trans. Neur. Net. Lear. 28 (2017), pp. 2168–2182. doi: 10.1109/TNNLS.2016.2573644 [DOI] [PubMed] [Google Scholar]
- 15.Ma S., Goldfarb D. and Chen L., Fixed point and Bregman iterative methods for matrix rank minimization, Math. Program. 128 (2011), pp. 321–353. doi: 10.1007/s10107-009-0306-5 [DOI] [Google Scholar]
- 16.Powell M., A Method for Nonlinear Constraints in Minimization Problems, Optimization. Academic Press, New York, 1969. [Google Scholar]
- 17.Qian J., Yang J., Zhang F. and Lin Z., Robust low-rank regularized regression for face recognition with occlusion, Biometrics Workshop in Conjunction with IEEE Conference on Computer Vision and Pattern Recognition (CVPRW), Columbus, Ohio, June 23–28, 2014.
- 18.Tibshirani R., Saunders M., Rosset S., Zhu J. and Knight K., Sparsity and smoothness via the fused Lasso, J. R. Stat. Soc. 67 (2005), pp. 91–108. doi: 10.1111/j.1467-9868.2005.00490.x [DOI] [Google Scholar]
- 19.Wang X. and Yuan X., The linearized alternating direction method for Dantzig selector, SIAM J. Sci. Comput. 34 (2012), pp. A2792–A2811. doi: 10.1137/110833543 [DOI] [Google Scholar]
- 20.Xie J., Yang J., Qian J., Tai Y. and Zhang H., Robust nuclear norm-based matrix regression with applications to robust face recognition, IEEE Trans. Image Process. 26 (2017), pp. 2286–2295. doi: 10.1109/TIP.2017.2662213 [DOI] [PubMed] [Google Scholar]
- 21.Yang J., Luo L., Qian J., Tai Y., Zhang F. and Xu Y., Nuclear norm based matrix regression with applications to face recognition with occlusion and illumination changes, IEEE Trans. Pattern Anal. 39 (2017), pp. 156–171. doi: 10.1109/TPAMI.2016.2535218 [DOI] [PubMed] [Google Scholar]
- 22.Zhang C., Nearly unbiased variable selection under minimax concave penalty, Ann. Stat. 38 (2010), pp. 894–942. doi: 10.1214/09-AOS729 [DOI] [Google Scholar]
- 23.Zhou H. and Li L., Regularized matrix regression, J. R. Stat. Soc. Ser. B Stat. Methodol. 76 (2014), pp. 463–483. doi: 10.1111/rssb.12031 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Zou H. and Hastie T., Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005), pp. 301–320. doi: 10.1111/j.1467-9868.2005.00503.x [DOI] [Google Scholar]

