Abstract
This paper is concerned with computational issues related to penalized quantile regression (PQR) with ultrahigh dimensional predictors. Various algorithms have been developed for PQR, but they become ineffective and/or infeasible in the presence of ultrahigh dimensional predictors due to storage and scalability limitations. The variable updating scheme of the feature-splitting algorithm that directly applies the ordinary alternating direction method of multipliers (ADMM) to ultrahigh dimensional PQR may make the algorithm fail to converge. To tackle this hurdle, we propose an efficient and parallelizable algorithm for ultrahigh dimensional PQR based on the three-block ADMM. The compatibility of the proposed algorithm with parallel computing alleviates the storage and scalability limitations of a single machine in large-scale data processing. We establish the rate of convergence of the newly proposed algorithm. In addition, Monte Carlo simulations are conducted to compare the finite sample performance of the proposed algorithm with that of other existing algorithms, and the numerical comparison implies that the proposed algorithm significantly outperforms the existing ones. We further illustrate the proposed algorithm via an empirical analysis of a real-world data set.
Keywords: ADMM, Penalized quantile regression, Parallel computing, Sample-splitting algorithm
1. Introduction
Quantile regression (QR) is well acknowledged as a powerful tool for analyzing data with heterogeneous effects. Since the seminal work of Koenker and Bassett (1978), QR has been extensively applied in many research fields, in particular in econometrics. For a comprehensive review of QR, refer to Koenker (2017) and Koenker et al. (2017). Many recent advances and achievements of QR can be found in the literature. Wang and He (2022) provided a unified theory for high-dimensional quantile regression with both convex and nonconvex penalties. Gimenes and Guerre (2022) proposed a QR inference framework for first-price auctions, and Cai et al. (2022) reexamined the heterogeneous predictability of US stock returns at different quantile levels. Other recent studies of QR include, but are not limited to, D’Haultfœuille et al. (2018), Altunbaş and Thornton (2019), Giessing and He (2019), Gu and Volgushev (2019), Firpo et al. (2022), He et al. (2022), and Narisetty and Koenker (2022).
For variable selection in QR, penalized quantile regression (PQR) was developed with fixed and finite dimensional predictors in Li and Zhu (2008) and Wu and Liu (2009). Furthermore, PQR with high-dimensional predictors has also been studied in the statistical literature, since, with the advent of data science, high-dimensional data analysis has become one of the most important research topics of the last decade. Belloni and Chernozhukov (2011) derived a nice error bound for PQR with the Lasso penalty (ℓ1-QR for short). Wang et al. (2012) studied PQR with folded concave penalties such as the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li, 2001) and the minimax concave penalty (MCP) (Zhang, 2010), and further established the oracle property for PQR in the ultrahigh dimensional setting under mild conditions. In summary, the estimation and theory of PQR are well studied and understood in the literature.
The numerical minimization problem for searching for solutions to PQR, however, is challenging due to the nonsmooth objective function and the possible nonconvexity of folded concave penalties. Sherwood and Maidman (2017) developed an R package, rqPen, for ℓ1-QR, whose algorithm is similar to the ℓ1-QR algorithm introduced in Koenker and Mizera (2014). Peng and Wang (2015) developed an iterative coordinate descent algorithm (QICD) for solving PQR with nonconvex penalties. Gu et al. (2018) introduced a fast alternating direction method of multipliers (ADMM) (Boyd et al., 2011) algorithm for PQR in high dimension.
With the advent of big data, it is of crucial importance to study numerical algorithms for PQR with ultrahigh dimensional predictors and/or a large data size. The ADMM (Boyd et al., 2011) has been introduced to cope with PQR with a large data size: Yu et al. (2017) and Fan et al. (2021) developed parallel algorithms for PQR based on sample-splitting ADMM, where, as the name suggests, the algorithm partitions the data across samples. Ultrahigh dimensionality adds another challenge in minimizing the objective function of ultrahigh dimensional PQR. This work aims to tackle the simultaneous challenges of nonsmoothness, nonconvexity and ultrahigh dimensionality by developing feature-splitting algorithms for PQR.
In this paper, we propose an efficient and parallelizable algorithm for PQR in ultrahigh dimension based on the three-block ADMM. It is noteworthy that Yu and Lin (2017) briefly mentioned one direct extension of the feature-splitting ADMM for PQR without theoretical justifications or numerical studies. The variable update scheme in Yu and Lin (2017) makes the convergence of the algorithm uncertain; Chen et al. (2016) showed that the Gauss-Seidel multi-block ADMM is not necessarily convergent. For a more detailed discussion on this, see Section 2. The uncertain convergence motivates us to avoid the direct extension of the feature-splitting ADMM, and instead to develop a three-block ADMM algorithm for ultrahigh dimensional PQR. Using related techniques in Sun et al. (2015), we establish the rate of convergence of the proposed algorithm, provide a theoretical convergence guarantee, and thereby address the convergence uncertainty. The compatibility of the proposed three-block ADMM algorithm with parallel computing alleviates the storage and scalability limitations of a single machine in large-scale data processing. The proposed three-block ADMM algorithms also enjoy numerical efficiency over the directly extended two-block ADMM. It is worth noting that the newly proposed algorithms can be directly applied to PQR with various penalties, including the ℓ1, the SCAD and the MCP penalties, by local linear approximation to the penalties (Zou and Li, 2008). Based on the theories developed in Wang et al. (2013) and Fan et al. (2014), the proposed algorithms are able to obtain a PQR estimate with the strong oracle property in ultrahigh dimension.
The rest of this article is organized as follows. In Section 2, we present the computational framework based on the three-block ADMM for PQR and establish the linear rate of convergence of the algorithm. In Section 3, we demonstrate the numerical and statistical efficiency of the proposed framework in high and ultra-high dimensional settings through Monte Carlo simulation, and illustrate the proposed algorithm via an empirical analysis of a Chinese supermarket data set. Technical proofs are given in the Appendix.
Throughout the paper, we adopt the following notation. For a matrix $M$, denote by $\lambda_{\min}(M)$ and $\lambda_{\max}(M)$ the smallest and largest eigenvalues of $M$, respectively. $X_A$ denotes the sub-matrix of $X$ with the columns indexed by $A$. $M \succ 0$ indicates that $M$ is positive definite. For a positive semidefinite operator or matrix $M$, $\|x\|_M$ denotes the seminorm $(x^\top M x)^{1/2}$.
2. Feature-splitting Algorithms for PQR
Suppose that $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$ is a random sample from the linear regression model
$$y_i = \mathbf{x}_i^\top \boldsymbol{\beta}^* + \varepsilon_i, \quad i = 1, \ldots, n,$$
where $\boldsymbol{\beta}^*$ is the $p$-dimensional vector of regression coefficients, and $\varepsilon_i$ is a random error whose $\tau$-th conditional quantile given $\mathbf{x}_i$ is zero. In this paper, we are interested in solving QR in the ultrahigh dimensional regime, in which $p \gg n$. Define $\mathbf{y} = (y_1, \ldots, y_n)^\top$ as the response vector, and $\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_n)^\top$ as the corresponding design matrix. For a given $\tau \in (0, 1)$, the quantile of interest, define the loss function $\rho_\tau(u) = u\{\tau - I(u < 0)\}$, where $I(\cdot)$ is the indicator function, and write $L(\mathbf{r}) = \sum_{i=1}^{n} \rho_\tau(r_i)$ for $\mathbf{r} = (r_1, \ldots, r_n)^\top$. The QR is to minimize its objective function
$$L(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) = \sum_{i=1}^{n} \rho_\tau\big(y_i - \mathbf{x}_i^\top \boldsymbol{\beta}\big) \tag{1}$$
with respect to $\boldsymbol{\beta}$, and this leads to the QR estimate of $\boldsymbol{\beta}^*$. The minimization problem in QR can be reformulated as a linear programming problem. The Frisch-Newton algorithm can be applied to solve the minimization problem, with computational complexity growing as a cubic function of $p$ when $p < n$.
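For concreteness, the check loss and the QR objective in (1) can be coded in a few lines of R; the following sketch uses simulated toy data and illustrative names, and is not part of our implementation.

```r
# Check (pinball) loss rho_tau(u) = u * (tau - I(u < 0)); a minimal sketch assuming
# the standard quantile-regression notation used above.
check_loss <- function(u, tau) u * (tau - (u < 0))

# QR objective in (1): sum of check losses over the residuals y - X %*% beta.
qr_objective <- function(beta, X, y, tau) {
  sum(check_loss(y - as.vector(X %*% beta), tau))
}

# Toy usage: n = 50 observations, p = 3 predictors, tau = 0.5 (median regression).
set.seed(1)
X <- matrix(rnorm(50 * 3), 50, 3)
y <- X %*% c(1, 0, -1) + rnorm(50)
qr_objective(rep(0, 3), X, y, tau = 0.5)
```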
2.1. Penalized quantile regression
In the presence of ultrahigh dimensional predictors, it is common to impose a sparsity assumption on $\boldsymbol{\beta}^*$. That is, only a small portion of the elements in $\boldsymbol{\beta}^*$ are nonzero, which implies that only a small portion of the predictors are significant in the model. Thus, it is critical to identify the significant predictors in ultrahigh dimensional QR. Variable selection in QR is similar to that in linear regression, for which penalized least squares methods have been proposed, and it is natural to extend the penalization idea to variable selection for QR. PQR minimizes the penalized quantile loss function
$$Q(\boldsymbol{\beta}) = L(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) + \sum_{j=1}^{p} p_{\lambda_j}(|\beta_j|), \tag{2}$$
where $p_{\lambda_j}(\cdot)$ is a penalty function with a regularization parameter $\lambda_j$ that controls model complexity. The algorithms developed in this paper allow different regression coefficients to have different penalties, although it is common to take all $\lambda_j$ to be the same and denote them by $\lambda$. This paper concentrates on the two most commonly used penalties: the Lasso (i.e., $\ell_1$) penalty $p_\lambda(|\beta_j|) = \lambda|\beta_j|$ and the SCAD penalty, whose first derivative is defined as
$$p'_\lambda(t) = \lambda\left\{ I(t \le \lambda) + \frac{(a\lambda - t)_+}{(a-1)\lambda}\, I(t > \lambda) \right\}, \quad t \ge 0, \tag{3}$$
with $(x)_+ = \max(x, 0)$ and $a = 3.7$ as suggested in Fan and Li (2001). The proposed algorithms are directly applicable to other folded concave penalties (Fan et al., 2020).
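The SCAD derivative in (3) is straightforward to evaluate; the following R sketch (with the default a = 3.7) is given only for illustration.

```r
# First derivative of the SCAD penalty in (3), following Fan and Li (2001):
# p'_lambda(t) = lambda * { I(t <= lambda) +
#                           (a*lambda - t)_+ / ((a - 1)*lambda) * I(t > lambda) }.
scad_deriv <- function(t, lambda, a = 3.7) {
  lambda * ((t <= lambda) + pmax(a * lambda - t, 0) / ((a - 1) * lambda) * (t > lambda))
}

# The derivative equals lambda on [0, lambda], decays linearly on (lambda, a*lambda],
# and is 0 beyond a*lambda, so large coefficients are essentially not penalized.
scad_deriv(c(0.05, 0.2, 1.0), lambda = 0.1)
```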
Minimizing the objective function of PQR in (2) is challenging since both the loss function and the penalty function are nonsmooth. When a folded concave penalty such as the SCAD penalty is used in ultrahigh dimensional PQR, the minimization problem becomes even more challenging due to its nonconvexity and the ultrahigh dimensionality. It is noteworthy that PQR with the $\ell_1$ penalty is a convex minimization problem and, under mild conditions, has a unique minimizer. For PQR with a folded concave penalty, minimizing the objective function in (2) may be achieved by iteratively minimizing PQR with a reweighted $\ell_1$ penalty with the aid of the local linear approximation (LLA) to the penalty function. Specifically, given $\boldsymbol{\beta}^{(k)}$ updated from the $k$-th step in the course of iterations, we first approximate
$$p_\lambda(|\beta_j|) \approx p_\lambda\big(|\beta_j^{(k)}|\big) + p'_\lambda\big(|\beta_j^{(k)}|\big)\big(|\beta_j| - |\beta_j^{(k)}|\big), \tag{4}$$
which is referred to as the LLA. Then at the $(k+1)$-th step we minimize
$$L(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) + \sum_{j=1}^{p} w_j^{(k)} |\beta_j|, \tag{5}$$
where $w_j^{(k)} = p'_\lambda(|\beta_j^{(k)}|)$. The function in (5) is the objective function of PQR with a reweighted $\ell_1$ penalty whose weights $w_j^{(k)}$ are updated at every step.
The LLA was first proposed in Zou and Li (2008) for penalized likelihood with finite dimensional predictors, and was further adopted in Wang et al. (2013) and Fan et al. (2014) for penalized least squares in ultrahigh dimensional linear regression models. Note that if we set the initial value $\boldsymbol{\beta}^{(0)} = \mathbf{0}$, then $\boldsymbol{\beta}^{(1)}$ is the PQR-Lasso estimator defined as the PQR with the $\ell_1$ penalty, and $\boldsymbol{\beta}^{(2)}$ can be regarded as the one-step sparse estimator with the PQR-Lasso estimator as the initial value. With properly chosen tuning parameters, Wang et al. (2013) and Fan et al. (2014) showed that, under some regularity conditions on high dimensional linear models, the corresponding penalized least squares estimator enjoys the strong oracle property with probability tending to one. This motivates us to focus on developing feature-splitting algorithms for PQR with a weighted $\ell_1$ penalty.
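For illustration, the LLA weights in (4)-(5) can be computed from a current estimate as in the following R sketch, which reuses the scad_deriv() function from the previous sketch; with a zero initial value all weights reduce to λ, recovering the PQR-Lasso problem.

```r
# LLA weights for the reweighted l1 problem in (5); a sketch, assuming the weight
# attached to |beta_j| at step k+1 is p'_lambda(|beta_j^(k)|) as in (4).
lla_weights <- function(beta_k, lambda, a = 3.7) {
  scad_deriv(abs(beta_k), lambda, a)   # scad_deriv() as defined in the earlier sketch
}

# With beta^(0) = 0 every weight equals lambda, so the first LLA step is exactly the
# PQR-Lasso problem; subsequent steps relax the penalty on large coefficients.
lla_weights(rep(0, 5), lambda = 0.1)            # all weights equal to 0.1
lla_weights(c(2, 0.05, 0, 0, 0), lambda = 0.1)  # weight ~ 0 for the large coefficient
```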
2.2. Three-block ADMM
Define the PQR estimator with a weighted $\ell_1$-penalty to be
$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \; L(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) + \|\mathbf{w} \circ \boldsymbol{\beta}\|_1, \tag{6}$$
where $\|\mathbf{w} \circ \boldsymbol{\beta}\|_1 = \sum_{j=1}^{p} w_j |\beta_j|$ with $\mathbf{w} = (w_1, \ldots, w_p)^\top$ being the weight vector.
The non-smoothness of the objective function in (6) hinders an efficient application of gradient-based methods. To decouple the non-smooth parts in computation, we reformulate problem (6) as the following constrained optimization problem,
| (7) |
Problem (7) is a natural candidate for the classical two-block ADMM algorithm. Define the augmented Lagrangian function as
| (8) |
where is the Lagrangian multiplier, and ϕ > 0 is the parameter associated with the quadratic term. The classic iterative scheme at iteration $k$ for the two-block ADMM is
where the tuning parameter in the dual update controls the step size. The effect of this tuning parameter on the convergence of the algorithm has been discussed in the literature (Fortin and Glowinski, 2000; Fazel et al., 2013), where convergence is established when the parameter is constrained to the interval $(0, (1+\sqrt{5})/2)$. In our numerical experiments, we set it slightly less than $(1+\sqrt{5})/2$ for faster convergence. Gu et al. (2018) proposed an efficient algorithm (qradmm) to solve PQR based on the two-block ADMM. While qradmm performs very well for moderate dimensions, we found that it can still run out of memory for larger $p$ in our numerical study. This motivates us to split the high dimensional variable into smaller blocks and to speed up the updates through parallelization.
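In such residual-splitting ADMM algorithms, the residual block is updated through the proximal operator of the check loss, which admits a simple closed form. The R sketch below gives this standard closed form; the scaling convention is illustrative and may differ from the exact update used in qradmm.

```r
# Proximal operator of the check loss,
#   prox_{rho_tau / phi}(v) = argmin_z rho_tau(z) + (phi/2) * (z - v)^2,
# a standard closed form (the scaling convention here is illustrative).
prox_check <- function(v, tau, phi) {
  pmax(v - tau / phi, 0) + pmin(v - (tau - 1) / phi, 0)
}

# Element-wise: v is shrunk toward 0 asymmetrically, by tau/phi from above and by
# (1 - tau)/phi from below; values in between collapse to exactly 0.
prox_check(c(-1, -0.1, 0.05, 1), tau = 0.3, phi = 2)
```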
We next propose a new three-block semi-proximal ADMM framework that enables a parallel update of $\boldsymbol{\beta}$ to cope with the ultrahigh dimensionality. The major computational cost of the two-block ADMM for solving (7) comes from the $\boldsymbol{\beta}$-update, which takes up to $O(np)$ operations and may impede an efficient execution of the algorithm when the dimension $p$ is ultrahigh. This calls for a feature-splitting algorithm for PQR in ultrahigh dimension. For a pre-specified number of blocks $G$, let us partition $\mathbf{X}$ and $\boldsymbol{\beta}$ as follows,
Then problem (7) can be rewritten as a three-block optimization problem
| (9) |
Intuitively, the slack variables store the information of each local update. Each $\boldsymbol{\beta}_g$ is updated independently, and we view the collection of the $\boldsymbol{\beta}_g$'s as a single variable block in the algorithm. Likewise, all the slack variables together make up the third variable block. There may exist multiple ways to transform a problem into a form that ADMM can handle; for example, in formulation (9), the roles of the blocks are not special and are exchangeable. In this paper, we use formulation (9) to illustrate the computational framework.
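For illustration, the feature splitting itself amounts to partitioning the column indices of X; the following R sketch uses equal-sized contiguous blocks, as in our numerical studies, and the object names are illustrative.

```r
# Split the p columns of X (and the corresponding entries of beta) into G blocks;
# a sketch using equal-sized contiguous blocks of column indices.
split_features <- function(p, G) {
  split(seq_len(p), cut(seq_len(p), breaks = G, labels = FALSE))
}

blocks <- split_features(p = 1000, G = 5)   # list of 5 index vectors of length 200
# X_g %*% beta_g can then be formed block by block and summed, e.g.
#   fitted <- Reduce(`+`, lapply(blocks,
#                    function(idx) X[, idx, drop = FALSE] %*% beta[idx]))
```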
The augmented Lagrangian function for (9) is given by
| (10) |
As seen from (10), each $\boldsymbol{\beta}_g$ is decoupled in the quadratic term, which allows a natural parallelization of the updates. The two-block ADMM can be directly extended to solve (9), and the corresponding algorithm is referred to as the Gauss-Seidel multi-block ADMM. At the $k$-th iteration, it updates each variable with
| (11) |
Procedure (11) may perform well in practice. However, its theoretical convergence remained unclear until the work of Chen et al. (2016), who showed that the Gauss-Seidel multi-block ADMM is not necessarily convergent. To address this convergence uncertainty, Sun et al. (2015) proposed a symmetric Gauss-Seidel based semi-proximal ADMM (sGS-sPADMM) for convex programming problems, which enjoys both a theoretical convergence guarantee and numerical efficiency over the directly extended multi-block ADMM. This convergent semi-proximal ADMM has three separable blocks in the objective function, with the third part being linear, and updates one of the blocks twice to improve convergence; the extra step, however, may incur additional computational cost.
Inspired by Sun et al. (2015), we now propose the three-block ADMM algorithm for solving PQR with weighted ℓ1 penalty using the following special iterative cycle
where and are some positive semidefinite matrices.
Given the augmented Lagrangian function defined in (10),
becomes
| (12) |
It can be seen that the $\boldsymbol{\beta}_g$-subproblems are a series of weighted $\ell_1$-penalized least squares problems. If $p_g$ is too large, $\mathbf{X}_g$ may not be of full column rank, and thus the generated sequences may not be well-defined. This concern can be addressed with an additional general position condition (Koenker, 2017), which guarantees the existence of a unique QR solution under rather general conditions. Standard quadratic solvers can be applied to solve (12) efficiently. In our numerical studies, we use the R solver ‘glmnet’ to compute the updates through the coordinate descent (CD) algorithm.
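For illustration, one such weighted ℓ1-penalized least squares subproblem can be handed to glmnet through its penalty.factor argument, as in the R sketch below; the response vector r_g, the parameter phi and the rescaling of λ are illustrative, since the exact form of the subproblem follows (12).

```r
library(glmnet)

# A generic weighted l1-penalized least squares subproblem of the form
#   min_{beta_g} (phi/2) * || r_g - X_g beta_g ||_2^2 + sum_j w_j |beta_gj| ,
# where r_g collects the current ADMM variables (its exact form follows (12)).
# glmnet minimizes (1/(2*n)) * RSS + lambda * sum_j pf_j |beta_j|, so lambda and
# penalty.factor must be rescaled accordingly; the scaling below is only a sketch.
update_beta_g <- function(X_g, r_g, w_g, phi) {
  n <- nrow(X_g)
  fit <- glmnet(X_g, r_g,
                lambda = mean(w_g) / (n * phi),    # overall level of penalization
                penalty.factor = w_g / mean(w_g),  # relative weights across coefficients
                standardize = FALSE, intercept = FALSE)
  as.vector(as.matrix(coef(fit)))[-1]              # drop the (zero) intercept
}
```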
The variables $\boldsymbol{\omega}$ and $\mathbf{z}$ are updated in the following cycle:
| (13) |
in which we perform an extra intermediate step to compute $\boldsymbol{\omega}$ before computing $\mathbf{z}$. As seen from (13), the extra cost of the $\boldsymbol{\omega}$ update is negligible. The derivations of the updates are given in Appendix A.1.
Finally, we update the Lagrangian multipliers via gradient ascent,
| (14) |
We call this algorithm FS-QRADMM-CD, and summarize it in Algorithm 1. From our numerical studies, we observe that FS-QRADMM-CD has favorable practical performance.
Algorithm 1.
FS-QRADMM-CD for weighted -penalized QR
Besides using the coordinate descent algorithm to update $\boldsymbol{\beta}_g$, we have another solution for the $\boldsymbol{\beta}_g$ update. To ensure that the solutions from (12) are well-defined, we add $G$ self-adjoint positive semidefinite matrices, denoted as $\mathbf{S}_g$, to (12). A general principle is that $\mathbf{S}_g$ should be as small as possible while the optimization problems remain easy to compute. Here we add proximal terms to each of the $\boldsymbol{\beta}_g$-subproblems, where each proximal operator $\mathbf{S}_g$ is positive definite. The positive definiteness of $\mathbf{S}_g$ makes the updates automatically well-defined. In this paper, we take $\mathbf{S}_g = \eta_g \mathbf{I}_{p_g} - \phi \mathbf{X}_g^\top \mathbf{X}_g$ with $\eta_g > \lambda_{\max}(\phi \mathbf{X}_g^\top \mathbf{X}_g)$. This essentially is a linearization step of the $\boldsymbol{\beta}_g$ update, as it uses $\eta_g \mathbf{I}_{p_g}$ to approximate the Hessian matrix $\phi \mathbf{X}_g^\top \mathbf{X}_g$. The modified minimization problem admits a closed-form solution, which can be carried out componentwise,
| (15) |
where $\mathrm{soft}(x, \kappa) = \mathrm{sign}(x)\max(|x| - \kappa, 0)$ is the soft thresholding function.
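For illustration, the componentwise update in (15) can be written as a single proximal-gradient step in R; the gradient grad_g and the step-size constant eta_g below stand for the quantities determined by (12) and the choice of the proximal term, and are illustrative.

```r
# Soft-thresholding operator soft(x, kappa) = sign(x) * max(|x| - kappa, 0).
soft_threshold <- function(x, kappa) sign(x) * pmax(abs(x) - kappa, 0)

# A sketch of the closed-form update (15) as one proximal-gradient step: linearize
# the quadratic at the current beta_g with curvature eta_g, then soft-threshold.
# grad_g and eta_g stand in for the quantities defined by (12) and the proximal term.
update_beta_g_prox <- function(beta_g, grad_g, w_g, eta_g) {
  soft_threshold(beta_g - grad_g / eta_g, w_g / eta_g)
}
```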
The updates in (15) manifest one advantage of splitting the feature space into lower dimensions. The update can be regarded as a one-step iteration of the proximal gradient method. After feature-splitting, the $\eta_g$'s are relatively small compared to the “un-split” $\eta$, as $\eta_g$ only needs to be larger than $\lambda_{\max}(\phi \mathbf{X}_g^\top \mathbf{X}_g)$. Since $\eta$ increases significantly with $p$ for high dimensional data, the step size of the un-split update (i.e., $1/\eta$) can be rather small and slow down the convergence of the algorithm. The updates for the other variables in this algorithm are exactly the same as those in Algorithm 1. We use FS-QRADMM-prox to denote this algorithm and summarize it in Algorithm 2. Note that the proximal term is also required in the proof of convergence.
Thus, we compute the block updates $\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_G$ on separate processors/cores in the manner of parallel computing, and then aggregate the updated information to compute the other variables.
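A minimal R sketch of this parallel step is given below, using mclapply() from the parallel package; update_block() is a hypothetical stand-in for either the glmnet-based update or the proximal update (15).

```r
library(parallel)

# Update the G coefficient blocks on separate cores; update_block() is hypothetical
# and stands for either the glmnet-based update (Algorithm 1) or update (15)
# (Algorithm 2). On Windows, parLapply() with a cluster can be used instead.
update_all_blocks <- function(blocks, beta, update_block, n_cores = 4) {
  new_blocks <- mclapply(seq_along(blocks),
                         function(g) update_block(g, beta[blocks[[g]]]),
                         mc.cores = n_cores)
  for (g in seq_along(blocks)) beta[blocks[[g]]] <- new_blocks[[g]]
  beta
}
# The aggregation (recomputing X %*% beta and the remaining updates) is then carried
# out once per iteration on the master process.
```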
We establish the linear rate of convergence for Algorithm 2 in Theorem 1, in which the proximal term is necessary for establishing the theory; its proof is given in Appendix A.
Theorem 1. For , the sequence generated by Algorithm 2 converges to a limit point , where is the primal optimal and is the dual optimal. Furthermore, there exists a constant μ ∈ (0, 1) such that , where at the k-th iteration is defined as,
| (16) |
where and .
Algorithm 2.
FS-QRADMM-prox for weighted -penalized QR
Remark. The minimization problem for searching for the solution of the penalized quantile regression can be written as a linear programming problem. Both the primal and dual problems are feasible, so by strong duality the optimal value of the dual problem equals that of the linear programming (primal) problem. Thus the optimal values of the primal and dual problems both equal the optimal value of the penalized quantile regression problem.
The effect of G on the convergence is twofold. On the one hand, increasing G reduces the dimension of the subproblems and the value of , and thus it accelerates the computation of each sub-problem. On the other hand, increasing G leads to an increased number of sub-problems and may raise the value of . This slows down the convergence both practically and theoretically. In our numerical experiments, choosing G between 5 and 10 works well for p ranging from thousands to tens of thousands.
2.3. PQR-Lasso and PQR-SCAD
In this paper, PQR-Lasso refers to the PQR in (2) with the $\ell_1$ penalty, $p_\lambda(|\beta_j|) = \lambda|\beta_j|$. Thus, the PQR-Lasso can be solved by Algorithms 1 and 2 directly with all weights in (6) set to $\lambda$. The resulting solutions from Algorithms 1 and 2 for the PQR-Lasso are denoted by FS-QRADMM-CD(Lasso) and FS-QRADMM-prox(Lasso) in Section 3, respectively.
Parallel to the PQR-Lasso, PQR-SCAD refers to the PQR in (2) with the SCAD penalty whose first-order derivative is defined in (3). Since the SCAD penalty is folded concave, the objective function of PQR-SCAD may have multiple local minimizers. To avoid this issue, we recommend (a) using the proposed algorithm to obtain the PQR-Lasso estimate, and then (b) solving the PQR with the weighted $\ell_1$ penalty, in which the weights are given by $p'_\lambda(\cdot)$, the first-order derivative of the SCAD penalty, evaluated at the absolute values of the PQR-Lasso estimate. We refer to the resulting estimate as the two-step PQR-SCAD estimate. Note that both the $\ell_1$ penalty and the SCAD-based weighted $\ell_1$ penalty are convex. The two-step SCAD estimate is well defined when $L(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$ is strictly convex with respect to $\boldsymbol{\beta}$. Denote FS-QRADMM-CD(TS-SCAD) and FS-QRADMM-prox(TS-SCAD) to be the resulting solutions of Algorithms 1 and 2 for the two-step PQR-SCAD. The corresponding algorithms of FS-QRADMM-CD and FS-QRADMM-prox for the two-step PQR-SCAD are presented in Algorithms 3 and 4 in Section A.3 of the Appendix.
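A compact sketch of the two-step procedure is as follows, where fs_qradmm() is a hypothetical stand-in for Algorithm 1 or 2 applied to the weighted ℓ1 problem (6), and scad_deriv() is from the earlier sketch.

```r
# Two-step PQR-SCAD as described above; fs_qradmm() is a hypothetical solver for the
# weighted-l1 PQR problem (6) and is not part of any existing package.
two_step_pqr_scad <- function(X, y, tau, lambda, upsilon = lambda, a = 3.7, fs_qradmm) {
  # Step (a): PQR-Lasso, i.e., all weights in (6) equal to lambda.
  beta_lasso <- fs_qradmm(X, y, tau, weights = rep(lambda, ncol(X)))
  # Step (b): reweighted l1 with SCAD-derivative weights evaluated at the Lasso fit,
  # using the second-step regularization level upsilon (see the discussion below).
  fs_qradmm(X, y, tau, weights = scad_deriv(abs(beta_lasso), upsilon, a))
}
```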
The two-step PQR-SCAD shares the same spirit as the one-step sparse maximum likelihood estimation proposed in Zou and Li (2008) for folded concave penalization problems. The second step in the two-step PQR-SCAD corrects the bias inherent in the $\ell_1$ penalty, which is known to over-penalize large coefficients and introduce bias into the resulting model. As shown in Corollary 8 of Fan et al. (2014), the two-step PQR-SCAD can find the oracle estimator among multiple local minima with overwhelming probability, under certain regularity conditions. This provides theoretical justification for the two-step SCAD. In other words, the resulting solutions of Algorithms 3 and 4 for the two-step PQR-SCAD enjoy the strong oracle property in the terminology of Fan et al. (2014).
The two-step PQR-SCAD procedure can be extended to two-step PQR with a general folded concave penalty characterized by the following conditions: (a) pλ(t) is nondecreasing and concave for with pλ (0) = 0; (b) pλ(t) is differentiable in (0, ∞); (c) for some positive constants a1 and for , and (d) for with a > 1. As shown in Fan et al. (2014), the two-step PQR with a general folded concave penalty also enjoys the strong oracle property under certain regularity conditions.
It is desirable to have a data-driven method to select the regularization parameters in PQR-Lasso and PQR-SCAD. In our numerical study, we set the same penalty and tuning parameter for all coefficients, and λ is chosen by the HBIC criterion proposed in Lee et al. (2014),
| (17) |
where the penalty term involves the cardinality of the active set of the fitted model. We select the λ that minimizes the HBIC.
Wang et al. (2013) recommend using different regularization parameters in the first and second steps in the penalized least squares setting to ensure that the resulting Lasso estimate satisfies a certain rate of convergence. Denote by λ and υ the regularization parameters used in the first and second steps, respectively. Following the recommendation in Wang et al. (2013), we set υ = λ in our numerical studies in Section 3.
3. Numerical Studies
In this section, we assess the performance of the proposed algorithms via simulation studies and illustrate the application of the newly proposed procedure via an empirical analysis. For all ADMM-based methods, we implement the warm-start technique introduced in Friedman et al. (2007) and Friedman et al. (2010), which uses the solution at the previous value of λ to initialize the computation at the current one. The way of splitting the features has no influence on the convergence property of the algorithm; we equally distribute the features into G groups without adjusting their order in our numerical studies. The stopping criterion of the ADMM-based algorithms is provided in the Appendix.
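For illustration, the warm-start strategy over a decreasing grid of λ values can be sketched in R as follows; fs_qradmm() and hbic() are hypothetical stand-ins for the proposed solver and for the HBIC criterion (17).

```r
# Warm start over a decreasing lambda grid: the solution at the previous lambda
# initializes the ADMM at the current one. fs_qradmm() and hbic() are hypothetical.
fit_path_warm_start <- function(X, y, tau, lambda_grid, fs_qradmm, hbic) {
  lambda_grid <- sort(lambda_grid, decreasing = TRUE)
  beta_init <- rep(0, ncol(X))
  fits <- vector("list", length(lambda_grid))
  for (k in seq_along(lambda_grid)) {
    fits[[k]] <- fs_qradmm(X, y, tau, weights = rep(lambda_grid[k], ncol(X)),
                           beta_init = beta_init)
    beta_init <- fits[[k]]            # warm start for the next lambda
  }
  scores <- vapply(fits, function(b) hbic(b, X, y, tau), numeric(1))
  fits[[which.min(scores)]]           # solution minimizing the HBIC
}
```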
3.1. Simulation study
In this simulation, we compare the performance of Algorithms 1 and 2 with the R packages rqPen (Sherwood and Maidman, 2017), qradmm (Gu et al., 2018), hqReg (Yi and Huang, 2017) and Conquer (Tan et al., 2022). Since the qradmm package is accelerated by FORTRAN, we re-implement its core algorithm, i.e., a two-block proximal ADMM, in R code for a relatively fair comparison. The R package rqPen implements the iterative coordinate descent algorithm (QICD) proposed in Peng and Wang (2015) to solve sparse quantile regression; QICD applies a convex majorization to the concave penalty term and solves the majorized objective function by coordinate descent. The R package qradmm implements the two-block proximal ADMM for PQR with the $\ell_1$ penalty proposed in Gu et al. (2018). We use the R packages hqreg and conquer to implement the methods proposed by Yi and Huang (2017) and Tan et al. (2022), respectively. The regularization parameter λ in all compared algorithms is selected by the HBIC criterion defined in (17).
We take the simulation setting similar to that of Peng and Wang (2015). We generate from , where with . Then set and for , where is the cumulative distribution function of N(0, 1). The response variable Y is generated from the following heteroscedastic regression model,
| (18) |
where . We consider three different quantiles, τ = 0.3, 0.5 and 0.7. Note that X1 does not affect the center of the conditional distribution of Y given x, but affects the conditional distribution when τ = 0.3 or 0.7. In our simulation, we set n = 400 and p = 1000 or 50000. For each case, we conduct 500 replications.
The following criteria are used to compare the performance of different algorithms.
Average absolute error (AAE): the average and standard deviation of the absolute estimation error over 500 replications.
Size: the average number of nonzero coefficient estimates over 500 replications.
P1: the proportion of models that select all active features except for X1 over 500 replications.
P2: the proportion of models that select X1 over 500 replications.
The proportion P2 is expected to be close to 0 when τ = 0.5, and to be close to 1 when τ = 0.3 and 0.7.
The simulation results over 500 replications are summarized in Tables 1 and 2. Compared to the PQR-Lasso, the two-step PQR-SCAD produces models with significantly smaller absolute errors and better selection accuracy in general. FS-QRADMM-CD(TS-SCAD) and FS-QRADMM-prox(TS-SCAD) have the best performance with respect to estimation and variable selection accuracy. When p = 1000, the three ADMM-based methods perform comparably well and outperform rqPen, hqReg and Conquer by a significant margin; rqPen, hqReg and Conquer yield relatively larger estimation errors and are more likely to miss X1 when τ = 0.3 and 0.7. The current version of rqPen runs out of memory when solving the two-step PQR-SCAD, as noted in the table. When p = 50000, both qradmm and rqPen fail due to their demanding memory usage; in fact, we notice that the efficiency of qradmm deteriorates sharply as p increases. hqReg and Conquer are able to finish the job when p = 50000, but the proposed methods still outperform them.
Table 1:
Comparison of algorithms for PQR when p = 1000 and n = 400.
| Method (n = 400, p = 1000) | τ | AAE (SD) | P1 | P2 | Size (SD) |
|---|---|---|---|---|---|
| FS-QRADMM-CD (Lasso) | 0.3 | 0.295 (0.003) | 100% | 100% | 5.56 (0.03) |
| | 0.5 | 0.210 (0.003) | 100% | 5.4% | 4.36 (0.03) |
| | 0.7 | 0.281 (0.003) | 100% | 100% | 5.56 (0.03) |
| FS-QRADMM-prox (Lasso) | 0.3 | 0.295 (0.003) | 100% | 100% | 5.62 (0.03) |
| | 0.5 | 0.198 (0.003) | 100% | 4.6% | 4.34 (0.02) |
| | 0.7 | 0.301 (0.003) | 100% | 100% | 5.56 (0.03) |
| qradmm(Lasso) | 0.3 | 0.310 (0.003) | 100% | 100% | 5.68 (0.03) |
| | 0.5 | 0.230 (0.003) | 100% | 9% | 5.32 (0.06) |
| | 0.7 | 0.327 (0.005) | 100% | 100% | 6.73 (0.08) |
| rqPen(Lasso) | 0.3 | 0.598 (0.004) | 100% | 61.2% | 5.10 (0.04) |
| | 0.5 | 0.267 (0.003) | 100% | 0% | 4.23 (0.02) |
| | 0.7 | 0.601 (0.004) | 100% | 56.6% | 5.04 (0.04) |
| hqReg(Lasso) | 0.3 | 0.593 (0.006) | 100% | 50% | 4.95 (0.04) |
| | 0.5 | 0.235 (0.003) | 100% | 0% | 4.31 (0.03) |
| | 0.7 | 0.589 (0.006) | 100% | 51.6% | 4.97 (0.04) |
| Conquer(Lasso) | 0.3 | 0.590 (0.005) | 100% | 45% | 4.73 (0.03) |
| | 0.5 | 0.231 (0.002) | 100% | 0% | 4.27 (0.02) |
| | 0.7 | 0.586 (0.005) | 100% | 45% | 4.72 (0.03) |
| FS-QRADMM-CD (TS-SCAD) | 0.3 | 0.119 (0.002) | 100% | 100% | 5.00 (0.00) |
| | 0.5 | 0.035 (0.001) | 100% | 0.2% | 4.00 (0.00) |
| | 0.7 | 0.125 (0.002) | 100% | 100% | 5.00 (0.00) |
| FS-QRADMM-prox (TS-SCAD) | 0.3 | 0.115 (0.002) | 100% | 100% | 5.00 (0.00) |
| | 0.5 | 0.040 (0.001) | 100% | 0.2% | 4.00 (0.00) |
| | 0.7 | 0.123 (0.001) | 100% | 100% | 5.00 (0.00) |
| qradmm (TS-SCAD) | 0.3 | 0.122 (0.002) | 100% | 100% | 5.00 (0.00) |
| | 0.5 | 0.038 (0.001) | 100% | 0.4% | 4.00 (0.00) |
| | 0.7 | 0.129 (0.002) | 100% | 100% | 5.00 (0.00) |
| rqPen (TS-SCAD) | The algorithm runs out of memory for τ = 0.3, 0.5, 0.7 | | | | |
| Conquer(SCAD) | 0.3 | 0.339 (0.004) | 100% | 63.4% | 4.67 (0.02) |
| | 0.5 | 0.049 (0.001) | 100% | 0% | 4.07 (0.01) |
| | 0.7 | 0.350 (0.004) | 100% | 57% | 4.60 (0.02) |
Table 2:
Comparison of algorithms for PQR when p = 50000 and n = 400.
| Method (n = 400, p = 50000) | τ | AAE (SD) | P1 | P2 | Size (SD) |
|---|---|---|---|---|---|
| FS-QRADMM-CD (Lasso) | 0.3 | 0.320 (0.003) | 100% | 98.2% | 5.34 (0.02) |
| | 0.5 | 0.250 (0.003) | 100% | 2% | 4.25 (0.03) |
| | 0.7 | 0.349 (0.003) | 100% | 100% | 5.15 (0.03) |
| FS-QRADMM-prox (Lasso) | 0.3 | 0.326 (0.003) | 100% | 92.4% | 4.93 (0.01) |
| | 0.5 | 0.121 (0.001) | 100% | 0% | 4.01 (0.00) |
| | 0.7 | 0.394 (0.002) | 100% | 95.6% | 5.01 (0.11) |
| qradmm(Lasso) | The algorithm runs out of memory for τ = 0.3, 0.5, 0.7 | | | | |
| rqPen(Lasso) | The algorithm runs out of memory for τ = 0.3, 0.5, 0.7 | | | | |
| hqReg(Lasso) | 0.3 | 0.812 (0.004) | 100% | 2.2% | 4.23 (0.02) |
| | 0.5 | 0.365 (0.003) | 100% | 0% | 4.20 (0.02) |
| | 0.7 | 0.808 (0.004) | 100% | 4.4% | 4.26 (0.02) |
| Conquer(Lasso) | 0.3 | 0.717 (0.003) | 100% | 18% | 5.88 (0.07) |
| | 0.5 | 0.303 (0.002) | 100% | 0% | 8.63 (0.11) |
| | 0.7 | 0.705 (0.003) | 100% | 26.8% | 6.13 (0.07) |
| FS-QRADMM-CD (TS-SCAD) | 0.3 | 0.180 (0.003) | 100% | 98.8% | 4.99 (0.00) |
| | 0.5 | 0.047 (0.001) | 100% | 0% | 4.00 (0.00) |
| | 0.7 | 0.172 (0.003) | 100% | 99.6% | 5.00 (0.03) |
| FS-QRADMM-prox (TS-SCAD) | 0.3 | 0.158 (0.002) | 100% | 100% | 5.00 (0.00) |
| | 0.5 | 0.069 (0.005) | 100% | 2.2% | 7.31 (0.47) |
| | 0.7 | 0.244 (0.007) | 100% | 99.2% | 6.64 (0.14) |
| qradmm (TS-SCAD) | The algorithm runs out of memory for τ = 0.3, 0.5, 0.7 | | | | |
| rqPen (TS-SCAD) | The algorithm runs out of memory for τ = 0.3, 0.5, 0.7 | | | | |
| Conquer(SCAD) | 0.3 | 0.396 (0.002) | 100% | 48.4% | 5.04 (0.04) |
| | 0.5 | 0.058 (0.001) | 100% | 0% | 5.78 (0.07) |
| | 0.7 | 0.390 (0.003) | 100% | 49.6% | 5.12 (0.05) |
Figure 1 plots the curves of the estimation error with respect to the iteration steps, averaged over 500 replications, when n = 400 and p = 1000. We can see that Algorithms 1 and 2 converge to the true β within approximately 20 iterations.
Figure 1:

Convergence curves of the estimation error of FS-QRADMM-CD(Lasso) (left panel) and FS-QRADMM-prox(Lasso) (right panel) over 500 replications.
We next examine the performance of the proposed algorithms when the sample size is large. To this end, we conduct a simulation with n = 30000 and p = 1000. The simulation results are summarized in Table 3, from which it can be seen that the two proposed algorithms and qradmm have very similar performance and perform better than the Conquer algorithm.
Table 3:
Performance of proposed algorithms for PQR when p = 1000 and n = 30000.
| Method (n = 30000, p = 1000) | τ | AAE (SD) | P1 | P2 | Size (SD) |
|---|---|---|---|---|---|
| FS-QRADMM-CD (Lasso) | 0.3 | 0.031 (0.0004) | 100% | 100% | 5.06 (0.003) |
| | 0.5 | 0.020 (0.0004) | 100% | 0.4% | 4.04 (0.003) |
| | 0.7 | 0.029 (0.0004) | 100% | 100% | 5.06 (0.003) |
| FS-QRADMM-prox (Lasso) | 0.3 | 0.029 (0.0003) | 100% | 100% | 5.05 (0.003) |
| | 0.5 | 0.020 (0.0003) | 100% | 0.5% | 4.03 (0.002) |
| | 0.7 | 0.029 (0.0003) | 100% | 100% | 5.05 (0.003) |
| qradmm(Lasso) | 0.3 | 0.030 (0.0004) | 100% | 100% | 5.08 (0.003) |
| | 0.5 | 0.023 (0.0003) | 100% | 0.6% | 4.05 (0.004) |
| | 0.7 | 0.029 (0.0004) | 100% | 100% | 5.04 (0.004) |
| rqPen(Lasso) | The algorithm runs out of memory for τ = 0.3, 0.5, 0.7 | | | | |
| hqReg(Lasso) | 0.3 | 0.040 (0.0008) | 100% | 100% | 6.06 (0.09) |
| | 0.5 | 0.020 (0.0002) | 100% | 1% | 4.60 (0.07) |
| | 0.7 | 0.040 (0.0008) | 100% | 51.6% | 5.90 (0.09) |
| Conquer(Lasso) | 0.3 | 0.066 (0.0009) | 100% | 100% | 5.23 (0.04) |
| | 0.5 | 0.020 (0.0003) | 100% | 0% | 4.11 (0.04) |
| | 0.7 | 0.065 (0.0009) | 100% | 100% | 5.25 (0.05) |
| FS-QRADMM-CD (TS-SCAD) | 0.3 | 0.012 (0.0005) | 100% | 100% | 5.00 (0.00) |
| | 0.5 | 0.004 (0.0002) | 100% | 0.2% | 4.00 (0.00) |
| | 0.7 | 0.013 (0.0004) | 100% | 100% | 5.00 (0.00) |
| FS-QRADMM-prox (TS-SCAD) | 0.3 | 0.011 (0.0003) | 100% | 100% | 5.00 (0.00) |
| | 0.5 | 0.004 (0.0002) | 100% | 0% | 4.00 (0.00) |
| | 0.7 | 0.012 (0.0003) | 100% | 100% | 5.00 (0.00) |
| qradmm (TS-SCAD) | 0.3 | 0.012 (0.0004) | 100% | 100% | 5.00 (0.00) |
| | 0.5 | 0.004 (0.0003) | 100% | 0% | 4.00 (0.00) |
| | 0.7 | 0.011 (0.0005) | 100% | 100% | 5.00 (0.00) |
| rqPen (TS-SCAD) | The algorithm runs out of memory for τ = 0.3, 0.5, 0.7 | | | | |
| Conquer(SCAD) | 0.3 | 0.028 (0.0008) | 100% | 100% | 5.15 (0.039) |
| | 0.5 | 0.005 (0.0002) | 100% | 1% | 4.45 (0.061) |
| | 0.7 | 0.029 (0.0008) | 100% | 100% | 5.14 (0.037) |
3.2. A real data example
The QR model is widely adopted in the analysis of consumer markets due to its robustness against outliers. In this section, we apply the proposed algorithms to an empirical analysis of a supermarket data set studied in Wang (2009) and compare them with other existing algorithms. This data set contains the daily number of customers and the daily sale volumes of 6398 products from a supermarket in China over 464 days. Following Wang (2009), we set the response to be the daily number of customers and the predictors to be the daily sale volumes of the products. Since the sample size n = 464 is much smaller than the dimension p = 6398, it is reasonable to assume that only a small proportion of the predictors have significant effects on the response. The distribution of the number of customers is highly skewed, which motivates us to consider PQR with the proposed algorithms in this example. We standardize the response and the predictors for our analysis.
We randomly split the observations into training and testing data sets of sizes 300 and 164, respectively, and fit the PQR-Lasso and the two-step PQR-SCAD on the training data with τ = 0.3, 0.5 and 0.7. The regularization parameter is chosen by the HBIC criterion. We report the averaged predictive error and its standard deviation on the testing data over 100 replications in Table 4, where the predictive error is measured by the check loss function. We also report the average model sizes and their corresponding standard deviations to evaluate the interpretability of the models selected by the different methods. For PQR-Lasso, we observe that the ADMM-based algorithms have performance similar to that of rqPen and hqReg in terms of prediction error; the average values and standard deviations of the loss function are very close among those methods. In general, all methods perform best when τ = 0.5. The proposed methods select fewer products than qradmm, rqPen and hqReg do in most scenarios, which indicates better model interpretability.
Table 4:
Performance of different algorithms for sparse quantile regression on the Chinese supermarket data.
| Method | τ | Prediction error (SD) | Size (SD) |
|---|---|---|---|
| FS-QRADMM-CD (Lasso) | 0.3 | 0.118 (0.001) | 97.35 (0.62) |
| | 0.5 | 0.116 (0.001) | 100.56 (0.53) |
| | 0.7 | 0.127 (0.001) | 97.37 (0.71) |
| FS-QRADMM-prox (Lasso) | 0.3 | 0.117 (0.001) | 103.23 (1.13) |
| | 0.5 | 0.113 (0.001) | 118.61 (0.93) |
| | 0.7 | 0.127 (0.001) | 96.02 (1.06) |
| qradmm(Lasso) | 0.3 | 0.115 (0.000) | 119.01 (0.49) |
| | 0.5 | 0.116 (0.001) | 121.26 (0.37) |
| | 0.7 | 0.130 (0.000) | 127.35 (0.58) |
| rqPen(Lasso) | 0.3 | 0.117 (0.001) | 113.62 (0.73) |
| | 0.5 | 0.115 (0.001) | 117.17 (0.56) |
| | 0.7 | 0.128 (0.001) | 120.11 (0.62) |
| hqReg(Lasso) | 0.3 | 0.117 (0.001) | 49.31 (0.46) |
| | 0.5 | 0.116 (0.001) | 90.85 (0.65) |
| | 0.7 | 0.127 (0.001) | 43.42 (0.42) |
| Conquer(Lasso) | 0.3 | 0.118 (0.001) | 80.8 (1.47) |
| | 0.5 | 0.114 (0.001) | 42.9 (0.58) |
| | 0.7 | 0.125 (0.001) | 39.6 (0.49) |
| FS-QRADMM-CD (TS-SCAD) | 0.3 | 0.112 (0.000) | 63.86 (0.39) |
| | 0.5 | 0.111 (0.001) | 69.77 (0.47) |
| | 0.7 | 0.116 (0.001) | 72.71 (0.61) |
| FS-QRADMM-prox (TS-SCAD) | 0.3 | 0.116 (0.000) | 97.03 (0.96) |
| | 0.5 | 0.110 (0.000) | 100.33 (1.02) |
| | 0.7 | 0.113 (0.000) | 95.66 (0.79) |
| qradmm (TS-SCAD) | 0.3 | 0.113 (0.001) | 469.88 (2.56) |
| | 0.5 | 0.114 (0.001) | 477.72 (2.33) |
| | 0.7 | 0.120 (0.001) | 521.33 (1.99) |
| rqPen(TS-SCAD) | The algorithm runs out of memory for the three τs | | |
| Conquer(SCAD) | 0.3 | 0.115 (0.001) | 65.7 (1.95) |
| | 0.5 | 0.113 (0.001) | 65.7 (0.82) |
| | 0.7 | 0.120 (0.001) | 36.0 (0.48) |
Similar results are also observed with the two-step PQR-SCAD. However, rqPen fails for the two-step SCAD in this example due to the limitation of computing memory. The proposed algorithms have prediction errors similar to those of qradmm, but the model sizes are much smaller, with fewer products included in the QR model. Conquer with the SCAD penalty has performance similar to that of the proposed methods under this scenario. We can also see that PQR-SCAD gives a better loss than PQR-Lasso does, and that the two-step PQR-SCAD procedures select fewer products when the proposed algorithms are implemented.
4. Conclusion
The QR model is a powerful data analytic tool in econometrics. To promote the application of QR in high/ultrahigh dimensions, in this paper we propose efficient and parallelizable algorithms for PQR based on a three-block ADMM algorithm with feature-splitting, and further establish the convergence of the proposed algorithms. Due to the nature of the feature-splitting algorithm, the proposed algorithms can be used to minimize the objective function of PQR in ultrahigh dimension. To illustrate the performance of the proposed methods, we conduct a comprehensive simulation study; the numerical experiments suggest that the proposed algorithms outperform existing ones for PQR and remain stable when the dimension of the data is huge, while existing algorithms run out of memory and fail to accomplish the tasks. The proposed algorithms may be extended to other statistical models, such as the support vector machine, whose loss function is similar to that of QR. This is an interesting topic for future research.
Acknowledgment
Christina Dan Wang is supported in part by National Natural Science Foundation of China (NNSFC) grant 11901395 and 12271363. Li’s research was supported by National Science Foundation DMS-1820702 and NIAID/NIH grants R01-AI136664 and R01AI170249.
Appendix: Technical Details and Proofs
In this appendix, we first provide details of how to update each variable in Algorithm 2, and then provide technical proofs of Theorem 1.
A.1. Sub-problems in Algorithm 2
In this subsection, we derive the updates for and in Algorithm 2. For ease of notation, define a set of functions f, g, h.
| (A.1) |
Thus, f, g, h are closed proper convex functions. Further define matrices F, G, H
| (A.2) |
Then Problem (9) can be expressed as a general three-block constrained optimization problem,
| (A.3) |
where by definitions of F, G and H,
| (A.4) |
As in sGS-sPADMM proposed by Sun et al. (2015), we update the three-block variables using a special cycle,
| (A.5) |
where and are optionally added self-adjoint positive semidefinite operators. To update , we need to compute . Since
we apply the Sherman–Morrison–Woodbury formula to compute it, and it follows that
| (A.6) |
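For illustration, the generic form of this Woodbury-type computation, which reduces a p × p solve to an n × n solve when p ≫ n, can be sketched in R as follows; the matrix η I + φ X⊤X is illustrative, since the exact matrix in (A.6) depends on the definitions of F, G and H above.

```r
# Apply (eta*I_p + phi*X'X)^{-1} to a vector v via the Sherman-Morrison-Woodbury
# identity, using only an n x n solve:
#   (eta*I_p + phi*X'X)^{-1} v = v/eta - (phi/eta^2) * X' (I_n + (phi/eta)*X X')^{-1} X v.
woodbury_solve <- function(X, v, eta, phi) {
  n <- nrow(X)
  Xv <- X %*% v
  drop(v / eta -
       (phi / eta^2) * crossprod(X, solve(diag(n) + (phi / eta) * tcrossprod(X), Xv)))
}

# Check against the direct p x p solve on a small example:
# X <- matrix(rnorm(20 * 50), 20, 50); v <- rnorm(50)
# max(abs(woodbury_solve(X, v, 2, 1) - solve(2 * diag(50) + crossprod(X), v)))  # ~ 1e-12
```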
In the z-subproblem, we set and then we have
| (A.7) |
The closed-form solution of the z-subproblem can be easily derived as
| (A.8) |
A.2. Proof of Theorem 1
We first show Lemmas A.1, A.2 and A.3, which are used in the proof of Theorem 1. From (A.2), we have Fact 1 below.
Fact 1. is positive definite.
Assumption 1 below is imposed to obtain a theoretical guarantee on the feasibility and convergence of the sequence.
Assumption 1. There exists , such that .
For algorithm (A.5), the projection matrix plays an important role in the convergence analysis. Let . Since can be expressed as , it follows that . Given that in our case, we can now rewrite (A.3) as
| (A.9) |
Stopping Criterion. In the implementation of Algorithm 2, we use the same stopping criterion as the one introduced in Boyd et al. (2011). The primal and dual residuals are often used to characterize the stage of convergence. Define the primal residual and the dual residual at the (k + 1)-th iteration accordingly. The termination criterion is
| (A.10) |
where and are feasibility tolerances chosen as , and . A common choice could be and .
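For illustration, the Boyd et al. (2011)-style termination test can be coded as follows; the residuals and the vectors entering the relative tolerances are placeholders whose exact definitions follow (A.10).

```r
# Termination test in the style of Boyd et al. (2011): stop when the primal and dual
# residual norms fall below tolerances combining absolute and relative parts.
# r_primal, s_dual, Ax, Bz, c_vec and At_y are placeholders for the quantities in (A.10).
admm_converged <- function(r_primal, s_dual, Ax, Bz, c_vec, At_y,
                           eps_abs = 1e-4, eps_rel = 1e-3) {
  eps_pri  <- sqrt(length(r_primal)) * eps_abs +
              eps_rel * max(sqrt(sum(Ax^2)), sqrt(sum(Bz^2)), sqrt(sum(c_vec^2)))
  eps_dual <- sqrt(length(s_dual)) * eps_abs + eps_rel * sqrt(sum(At_y^2))
  sqrt(sum(r_primal^2)) <= eps_pri && sqrt(sum(s_dual^2)) <= eps_dual
}
```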
The augmented Lagrangian function for (A.9) is given by
Using arguments similar to those in Sun et al. (2015), it follows that applying the updates in (A.5) to problem (A.3) is equivalent to applying the following two-block semi-proximal ADMM to (A.9),
| (A.11) |
The Karush–Kuhn–Tucker (KKT) optimality condition of (A.9) is
| (A.12) |
Denote the solution set to (A.12) as , then we can replace Assumption 1 by assuming that is non-empty. Let be an optimal solution to (A.9). We have the following lemma on the convergence of the proposed algorithm by utilizing its equivalence to the updates in (A.11).
Lemma A.1. Suppose Assumption 1 holds. and are chosen such that and are positive definite. Then under the condition , the sequence generated by (A.5) converges to a limit point with solving (A.3) and is the dual optimal.
Lemma A.1 follows by a direct application of Theorem 3.2 in Han et al. (2018). Based on (A.2), we have the following fact.
Fact 2. Suppose converges to . There exists a positive constant q such that
| (A.13) |
for a sufficiently large k.
For any convex function P, proxP(·) denotes the proximal mapping associated with P. That is,
| (A.14) |
Denote , where , then we have the following relationship.
Lemma A.2. Suppose the sequence is generated by algorithm (A.5), then for any ,
| (A.15) |
Proof. Considering the optimality conditions of the subproblems in (A.11), we have
| (A.16) |
Then we have and it follows that
and we have
| (A.17) |
We first bound the term . By the fact that the proximal mapping is Lipschitz continuous with constant 1, i.e., for any mapping h,
| (A.18) |
By taking into account the fact that
and
and the inequality that
where $\lambda_{\max}(FF^\top)$ denotes the largest eigenvalue of $FF^\top$, (A.18) can be reduced to
| (A.19) |
Similarly we can bound the term ,
| (A.20) |
From the update of , we have
| (A.21) |
Combining (A.19), (A.20) and (A.21), we can obtain that
| (A.22) |
□
Lemma A.3. Suppose that Assumption 1 holds, and assume that both and are positive definite. Then for all k sufficiently large and , there exists such that
| (A.23) |
where
| (A.24) |
with .
Proof. From Theorem 1 in Han et al. (2018), we can derive the following results.
| (A.25) |
When , it is ensured that . Let , then we have
| (A.26) |
| (A.27) |
Note that , and we have
| (A.28) |
Let in defined in (A.24), and . Note that when , the following relationship holds.
Combining with Lemma A.2, we have
| (A.29) |
Take , then we can obtain (A.23) with . □
Proof of Theorem 1. Since f and g are piecewise linear-quadratic functions, both proxf(·) and proxg(·) are piecewise polyhedral (Poliquin and Rockafellar, 1993), which implies Fact 2 (Han et al., 2018). Since we take , then is positive definite, and this, together with the fact that , implies that the sequence is automatically well defined. By Lemma A.1, under the condition , the sequence generated by algorithm (A.5) converges to a limit point with solving (9) and is the dual optimal.
To derive the rate of convergence, we first compute . By definition,
It follows that
| (A.30) |
Plugging equations (A.30) back into (A.29), we derive the results in Theorem 1 easily.
A.3. Algorithms for Two-Step PQR-SCAD
This section presents two three-block ADMM algorithms for PQR-SCAD proposed in Section 2.3.
Algorithm 3.
FS-QRADMM-CD for Two-Step PQR-SCAD
| Initialization: , and . |
| while the stopping criterion is not satisfied, do |
| Update by |
| Compute and by (13). |
| Update by (14). |
| end while The solution is denoted as . |
| Initialization: and . Compute |
| for . |
| while the stopping criterion is not satisfied, do |
| Update by (12). |
| Compute and by (13). |
| Update by (14). |
| end while |
Algorithm 4.
FS-QRADMM-prox for Two-Step PQR-SCAD
| Initialization: , and . |
| while the stopping criterion is not satisfied, do |
| Update by |
| Compute and by (13). |
| Update by (14). |
| end while denote the solution as |
| Initialization: and . Compute |
| for . |
| while the stopping criterion is not satisfied, do |
| Update by (15). |
| Compute and by (13). |
| Update by (14). |
| end while |
References
- Altunbaş Y and Thornton J (2019). The impact of financial development on income inequality: a quantile regression approach. Economics Letters, 175:51–56.
- Belloni A and Chernozhukov V (2011). L1-penalized quantile regression in high-dimensional sparse models. The Annals of Statistics, 39(1):82–130.
- Boyd S, Parikh N, Chu E, Peleato B, and Eckstein J (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1):1–122.
- Cai Z, Chen H, and Liao X (2022). A new robust inference for predictive quantile regression. Journal of Econometrics. In press.
- Chen C, He B, Ye Y, and Yuan X (2016). The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Mathematical Programming, 155(1):57–79.
- D’Haultfœuille X, Maurel A, and Zhang Y (2018). Extremal quantile regressions for selection models and the black–white wage gap. Journal of Econometrics, 203(1):129–142.
- Fan J and Li R (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456):1348–1360.
- Fan J, Li R, Zhang C-H, and Zou H (2020). Statistical Foundations of Data Science. Chapman and Hall/CRC.
- Fan J, Xue L, and Zou H (2014). Strong oracle optimality of folded concave penalized estimation. The Annals of Statistics, 42(3):819–849.
- Fan Y, Lin N, and Yin X (2021). Penalized quantile regression for distributed big data using the slack variable representation. Journal of Computational and Graphical Statistics, 30(3):557–565.
- Fazel M, Pong TK, Sun D, and Tseng P (2013). Hankel matrix rank minimization with applications to system identification and realization. SIAM Journal on Matrix Analysis and Applications, 34(3):946–977.
- Firpo S, Galvao AF, Pinto C, Poirier A, and Sanroman G (2022). GMM quantile regression. Journal of Econometrics. In press.
- Fortin M and Glowinski R (2000). Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems, volume 15. Elsevier.
- Friedman J, Hastie T, Höfling H, and Tibshirani R (2007). Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2):302–332.
- Friedman J, Hastie T, and Tibshirani R (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22.
- Giessing A and He X (2019). On the predictive risk in misspecified quantile regression. Journal of Econometrics, 213(1):235–260.
- Gimenes N and Guerre E (2022). Quantile regression methods for first-price auctions. Journal of Econometrics, 226(2):224–247.
- Gu J and Volgushev S (2019). Panel data quantile regression with grouped fixed effects. Journal of Econometrics, 213(1):68–91.
- Gu Y, Fan J, Kong L, Ma S, and Zou H (2018). ADMM for high-dimensional sparse penalized quantile regression. Technometrics, 60(3):319–331.
- Han D, Sun D, and Zhang L (2018). Linear rate convergence of the alternating direction method of multipliers for convex composite programming. Mathematics of Operations Research, 43(2):622–637.
- He X, Pan X, Tan KM, and Zhou W-X (2022). Smoothed quantile regression with large-scale inference. Journal of Econometrics. In press.
- Koenker R (2017). Quantile regression: 40 years on. Annual Review of Economics, 9:155–176.
- Koenker R and Bassett G (1978). Regression quantiles. Econometrica, 46(1):33–50.
- Koenker R, Chernozhukov V, He X, and Peng L (2017). Handbook of Quantile Regression. CRC Press.
- Koenker R and Mizera I (2014). Convex optimization, shape constraints, compound decisions, and empirical Bayes rules. Journal of the American Statistical Association, 109(506):674–685.
- Lee ER, Noh H, and Park BU (2014). Model selection via Bayesian information criterion for quantile regression models. Journal of the American Statistical Association, 109(505):216–229.
- Li Y and Zhu J (2008). L1-norm quantile regression. Journal of Computational and Graphical Statistics, 17(1):163–185.
- Narisetty N and Koenker R (2022). Censored quantile regression survival models with a cure proportion. Journal of Econometrics, 226(1):192–203.
- Peng B and Wang L (2015). An iterative coordinate descent algorithm for high-dimensional nonconvex penalized quantile regression. Journal of Computational and Graphical Statistics, 24(3):676–694.
- Poliquin R and Rockafellar R (1993). A calculus of epi-derivatives applicable to optimization. Canadian Journal of Mathematics, 45(4):879–896.
- Sherwood B and Maidman A (2017). rqPen: Penalized Quantile Regression. R package version 2.0.
- Sun D, Toh K-C, and Yang L (2015). A convergent 3-block semiproximal alternating direction method of multipliers for conic programming with 4-type constraints. SIAM Journal on Optimization, 25(2):882–915.
- Tan KM, Wang L, and Zhou W-X (2022). High-dimensional quantile regression: Convolution smoothing and concave regularization. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 84(1):205–233.
- Wang H (2009). Forward regression for ultra-high dimensional variable screening. Journal of the American Statistical Association, 104(488):1512–1524.
- Wang L and He X (2022). Analysis of global and local optima of regularized quantile regression in high dimensions: A subgradient approach. Econometric Theory.
- Wang L, Kim Y, and Li R (2013). Calibrating non-convex penalized regression in ultra-high dimension. The Annals of Statistics, 41(5):2505–2536.
- Wang L, Wu Y, and Li R (2012). Quantile regression for analyzing heterogeneity in ultra-high dimension. Journal of the American Statistical Association, 107(497):214–222.
- Wu Y and Liu Y (2009). Variable selection in quantile regression. Statistica Sinica, 19(2):801–817.
- Yi C and Huang J (2017). Semismooth Newton coordinate descent algorithm for elastic-net penalized Huber loss regression and quantile regression. Journal of Computational and Graphical Statistics, 26(3):547–557.
- Yu L and Lin N (2017). ADMM for penalized quantile regression in big data. International Statistical Review, 85(3):494–518.
- Yu L, Lin N, and Wang L (2017). A parallel algorithm for large-scale nonconvex penalized quantile regression. Journal of Computational and Graphical Statistics, 26(4):935–939.
- Zhang C-H (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38(2):894–942.
- Zou H and Li R (2008). One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics, 36(4):1509–1533.
