Regression shrinkage and selection via least quantile shrinkage and selection operator

Alireza Daneshvar; Golalizadeh Mousa

doi:10.1371/journal.pone.0266267

. 2023 Feb 16;18(2):e0266267. doi: 10.1371/journal.pone.0266267

Regression shrinkage and selection via least quantile shrinkage and selection operator

Alireza Daneshvar ^1,^#, Golalizadeh Mousa ^1,^*,^#

Editor: Xiaoyu Song²

PMCID: PMC9934385 PMID: 36795659

Abstract

Over recent years, the state-of-the-art lasso and adaptive lasso have aquired remarkable consideration. Unlike the lasso technique, adaptive lasso welcomes the variables’ effects in penalty meanwhile specifying adaptive weights to penalize coefficients in a different manner. However, if the initial values presumed for the coefficients are less than one, the corresponding weights would be relatively large, leading to an increase in bias. To dominate such an impediment, a new class of weighted lasso will be introduced that employs all aspects of data. That is to say, signs and magnitudes of the initial coefficients will be taken into account simultaneously for proposing appropriate weights. To allocate a particular form to the suggested penalty, the new method will be nominated as ‘lqsso’, standing for the least quantile shrinkage and selection operator. In this paper, we demonstate that lqsso encompasses the oracle properties under certain mild conditions and delineate an efficient algorithm for the computation purpose. Simulation studies reveal the predominance of our proposed methodology when compared with other lasso methods from various aspects, particularly in ultra high-dimensional condition. Application of the proposed method is further underlined with real-world problem based on the rat eye dataset.

Introduction

Most data analysts intend to accomplish two central goals while dealing with regression models in a high dimensional framework. The first aim is to profit high prediction accuracy. Another one which is recognized as interpretability, applies to the act of selecting pertinent explanatory variables that have an intense relationship with the response variable [1]. In other words, prediction accuracy refers to adjusting bias and variance components. It is important to note that managing bias and variance trade-off results in high prediction accuracy, provided that appropriate modeling methods have been designated. Regularization technique is a common strategy to design a convenient balance between these two quantities. With a convex penalty class, some regularization methods involve nonnegative garotte [2], ridge [3], lasso [4] and elastic net [5]. In association with the aforementioned tools, lasso has been greately appreciated as a result of its statistical and applied properties. It is noteworthy to mention that adaptive lasso, defined by Zou [6] is a particular version of lasso that allocates adaptive weights to different coefficients by including some impressive properties in the L₁ penalty. As a result of selecting a right subset model under some particular conditions, this method covers two aforementioned properties, i.e., high prediction accuracy and sensible interpretability. It is worthy to note that Tibshirani [7] has reviewed various statistical methods following out lasso.

It is common knowledge that lasso disregards the effect of random variables in the penalty term. On the contrary, the adaptive lasso, suppresses such a drawback and as a result, has a better precision in the statistical manner. This technique encompasses oracle properties, conforming the same algorithm as lasso does. In the mathematical context, adaptive weights are determined using the initial estimates, derived via invoking OLS method for the estimation of regression coefficients. The corresponding weights lead to high precision, improving the adaptive lasso to be authentic. Bühlmann and Van De Geer in [8] have mentioned the condition that if initial coefficients are large, adaptive lasso employs a small penalty leading to little shrinkage and less bias. In addition, neither of Zou in [6], Bühlmann and Van De Geer in [8] and Fan and his colleagues in [9] have investigated a specific situation by which the absolute of initial coefficients are less than one, resulting in extra bias. They have confined themselves to either taking the zero coefficient as the initials or restricting the weights in some pre-determined bounds. However, if the absolute of initial coefficient is less than one, the corresponding weight turns to be large. Indeed, in a regular regression (OLS), there is no penalty part, bias is low and variance is high. By applying a penalty, we sacrifice bias (i.e. increase bias) to reduce variance, considered as a bias-variance tradeoff. As a consequence, bias increases by adding a penalty term. Based upon the reseach by Bühlmann and Van De Geer in [8], if the absolute of the jth initial coefficients is large, the adaptive lasso enforces a minor penalty (i.e. little shrinkage) for the jth coefficient, causing less bias. Therefore, our proposed method can exert the mentioned subject. Specifically noted, our suggested method can overcome this problem by supporting unbiased estimates in a situation by which the weights are greater than one. It should be emphasized that adaptive lasso merely uses the magnitude of initial estimates. To handle this issue, we propose a new method with the gurantee to provide weights between zero and one, improving the accuracy of the new lasso family estimators better in comparison with other common methods. As shall be seen, our proposed method takes the signs and magnitudes of the initial coefficients into account. These two interesting features of the new method make it superior when compared with some current methods in modeling high dimensional data. One of the main proof for our proposed method is Bayesian approach. Our proposed method can be viewed through a Bayesian methodology. The lqsso can be considered as a Maximum A Posteriori (MAP) estimator. This is the case when one takes the prior distributions for the initial coefficients as the Asymmetric Laplace Distribution (ALD) written by Koenker and Machado in [10].

Specifically, we present a new weighted lasso, as an alternative to adaptive lasso for the estimation of regression parameters in various situations. Our adaptive weights have been inspired from quantile regression methodology, proposed by Koenker and Bassett in [11]. In contrast with [12, 13], applying check function as a loss with an adaptive lasso penalty, in this article we employ a common quadratic loss along with check function as a penalty to regularize the associated parameters.

In analogy with titles appeared in the terminology of lasso, we call our proposed method lqsso. We demonstrate that lqsso is a developed version of lasso and adaptive lasso. Moreover, we enforce that lqsso performs very well in comparison with other stated penalization methods. We also illustrate that the suggested method covers oracle properties, the concept advocated in [14]. Besides, lqsso is a convex optimization problem, like lasso and its extensions, therefore it does not suffer from local minimum drawback. The procedure to implement our proposed method is implicitly the same as lasso and adaptive lasso, accordingly we can apply efficient algorithms from both procedures. However if there is concern about computational costs of implemented methods, one can use coordinate descent [15] and LARS [16] to derive estimates from the lqsso method.

The rest of this article is organized as follows. We first define our proposed penalization method, i.e. lqsso. Further, we present the algorithm and geometrical aspects of our method. As mentioned previously, our proposed method takes advantages of oracle properties under certain conditions. The essentials to exhibit this advantage are presented. We conduct simulation studies to compare lasso, adaptive lasso and lqsso and bring forward the results. We provide an application of real data analysis to display the estimation and variable selection performance of our method. In conclusion, a general discussion on lqsso and comprehension for future research are represented.

Lqsso; a novel method of variable selection

Definition

Suppose the pairs (x_j,y), j = 1, 2, …, p are presented, where y = (y₁, …, y_n)^T and x_j = (x_1j, …, x_nj)^T are the response and predictor variables, respectively. Also, let X = [x₁, …, x_p] be a predictor matrix. In the present article, it is assumed that y_is are conditionally independent given x_ijs, remarking that x_ij are centered and scaled, therefore ∑_ix_ij = 0, $\sum_{i} x_{i j}^{2} / n = 1$ . Consider a general linear model structure y = Xβ + ε, where β = (β₁, …, β_p)^T, ε = (ε₁, …, ε_n)^T and ε₁,…,ε_n hint at independent identically distributed (i.i.d) random errors with zero mean and variance σ². In the rest, a subscript is assigned for the purpose of determining the estimation of coefficient β derived from a particular method. We expect such definitions are self-described without any ambiguity wherever they first appear in this and subsequent sections. Besides, a superscript analogous to (n) is applied to show dependency of the estimator on the sample size, and to investigate the asymptotic behavior of the estimator. An example is ${\hat{β}}_{l q s s o}^{(n)} = {({\hat{β}}_{(1, l q s s o)}^{(n)}, \dots, {\hat{β}}_{(p, l q s s o)}^{(n)})}^{T} .$

At first step, there is necessity to specify our proposed estimator according to a minimizing problem. In this connection, let us define an estimator, say ${\hat{β}}^{(n)},$ for the parameter β, as

\begin{matrix} {\hat{β}}^{(n)} = \underset{β}{a r g m i n} \sum_{i = 1}^{n} {(y_{i} - \sum_{j = 1}^{p} β_{j} x_{i j})}^{2} + λ_{n} \sum_{j = 1}^{p} ρ_{τ} (β_{j}), \end{matrix}

(1)

where λ_n ≥ 0 is a tuning parameter, τ is a fixed number chosen from (0, 1) and

\begin{matrix} ρ_{τ} (β_{j}) = [τ - I (β_{j} < 0)] β_{j} = [(1 - τ) I (β_{j} \leq 0) + τ I (β_{j} > 0)] | β_{j} | . \end{matrix}

(2)

Note that the estimator in Eq (1) depends on λ_n. For this reason, the superscript (n) is located over $\hat{β}$ to obliquely emphasize the dependency. Moreover, as comprehended, we add ρ_τ(β_j) in Eq (1) to express the adapting manner of estimator with observation in our proposed model. Additionally, due to the structure of the check function it can be realized that the considered penalty is flexible with quantile τ. This is in contrast with other penalties proposed previously by which the response’s quantile did not play a part in the penalty. Without loss of generality, we assume a condition by which intercept has been eliminated from the regression model. If this is not the case, we can simply center the response, then one will eccounter with the variables having a zero mean.

To fit a model, an important point to note is that the corresponding coefficients are not known formerly. So, initially imposing some criteria on coefficients in order to choose relevant weights, does not make sense in real application. In particular, we intend to turn this fact around before implementing Eq (1). To tackle this case, we suppose that one has already derived the OLS estimates of the coefficients before intending to determine the corresponding weights appeared in our proposed method. Therefore, it is recommended to define lqsso estimate, $({\hat{β}}_{l q s s o}^{(n)}),$ as

\begin{matrix} {\hat{β}}_{l q s s o}^{(n)} = \underset{β}{a r g m i n} \sum_{i = 1}^{n} {(y_{i} - \sum_{j = 1}^{p} β_{j} x_{i j})}^{2} + λ_{n} \sum_{j = 1}^{p} ρ_{τ}^{*} (β_{j}), \end{matrix}

(3)

where two fixed parameters are the same as those defined in Eq (1) and

\begin{matrix} ρ_{τ}^{*} (β_{j}) = [τ - I ({\hat{β}}_{j, O L S} < 0)] β_{j} = [(1 - τ) I ({\hat{β}}_{j, O L S} \leq 0) + τ I ({\hat{β}}_{j, O L S} > 0)] | β_{j} | . \end{matrix}

As perceived, the proposed lqsso functions similar to lasso, as a result of considering magnitude and sign of the estimated coefficients, applying OLS in the penalty term. Typically, the proposed penalty captures all accessible information in samples, therefore lqsso is expected to perform better than alternative competing methods in many small effect situations.

Our motivation to outline lqsso is in view of selecting variables and minimizing bias when initial coefficients for the corresponding weights in adaptive lasso are less than one. In this regard, the function employed in penalty term provides less weight in proportion to irrelevant coefficients, analogous to the process of treating usual and outlier variables differently in statistical analysis. Indeed, the idea is inspired from integrating two interesting methods proposed by Tibshirani and also Koenker and Bassett in [4, 11], respectively. It is commonly known that lasso minimizes

\begin{matrix} \sum_{i = 1}^{n} {(y_{i} - \sum_{j = 1}^{p} β_{j} x_{i j})}^{2} + λ_{n} \sum_{j = 1}^{p} | β_{j} |, \end{matrix}

(4)

while the quantile regression modeling attempts to minimize the following expression

\begin{matrix} \sum_{i = 1}^{n} ρ_{τ} (y_{i} - \sum_{j = 1}^{p} β_{j} x_{i j}) . \end{matrix}

(5)

But, as noticed formerly in Eq (3), we applied Koenker’s loss function as a penalty function. Our proposed method is further similar to adaptive lasso, aside from assigning different weights. Consider the weighted lasso,

\begin{matrix} \underset{β}{a r g m i n} \sum_{i = 1}^{n} {(y_{i} - \sum_{j = 1}^{p} β_{j} x_{i j})}^{2} + λ_{n} \sum_{j = 1}^{p} w_{j} | β_{j} |, \end{matrix}

(6)

where w = (w₁, …, w_p)^T refers to a known weight vector.

Assume that $\hat{β}$ is an estimator of β*, e.g. ${\hat{β}}_{(O L S)}$ , where $β^{*} = {(β_{1}^{*}, \dots, β_{p}^{*})}^{T}$ is a true coefficient vector. Choose a γ > 0, and define the weight vector as $\hat{w} = 1 / {| \hat{β} |}^{γ}$ . The adaptive lasso estimates, say ${\hat{β}}_{a l a s s o}^{(n)}$ , are then defined as

\begin{matrix} {\hat{β}}_{a l a s s o}^{(n)} = \underset{β}{a r g m i n} \sum_{i = 1}^{n} {(y_{i} - \sum_{j = 1}^{p} β_{j} x_{i j})}^{2} + λ_{n} \sum_{j = 1}^{p} {\hat{w}}_{j} | β_{j} | . \end{matrix}

(7)

It is worth noting that, similar to Eqs (4) and (7), our objective in Eq (3) is also a convex optimization problem, and a global minimizer can thus be attained. Bearing in mind these stated similarities, one can conclude that lqsso is, in fact, a weighted lasso problem available in [6]. On account of resemblance between our penalization method and previous versions of lasso, we can apply convenient algorithms for solving adaptive lasso and lasso in order to calculate the lqsso estimates. Computational details regarding to implementing lqsso are provided in subsection The algorithm of lqsso.

The proposed lqsso has a closed-form and many representations can be used with lqsso penalty in various circumstances, including an applicable algorithm, invoking in a geometrical field, recalling a Bayesian aspect, proving the Oracle property and calculating KKT conditions in closed form in the quantile regression. The noteworthy aspect of check function, by way of loss or penalty function, is that it can be considered in various structures. At this juncture, we summarize some of them. The equivalent expressions are as follows:

\begin{matrix} ρ_{τ} (u) & = [τ - I (u \leq 0)] u = [(1 - τ) I (u \leq 0) + τ I (u > 0)] | u | \\ = u [τ I (u > 0) - (1 - τ) I (u \leq 0)] = \frac{| u | + (2 τ - 1) u}{2} . \end{matrix}

Depending upon the case by which a relevant ingredient, smooth the intended mathematical consideration, one of the above equivalent representations of the check function will be taken into account. By way of illustration, the last expression will be invoked in order to evaluate KKT conditions.

The geometry of lqsso

To present a visual sense of our proposed method, S1 Fig demonstrates a sketch of this objective for the case p = 2. In this condition, the loss function is assumed quadratic. The elliptical contour of this function is displayed by an ellipse, centered at the OLS estimates. The constraint regions, visualized with diamond in gray color, indicate the estimates derived from lasso. In addition, the polyhedrals are the spaces identified for the estimates granted by lqsso. As derived, the contour touches square in the lasso process and this issue sometimes occurs at a corner, corresponding to a zero coefficient. In lqsso structure, the contours touch polyhedrals as well, but with more flexibilities in such situations. For instance, varying τ in its domain allow an irregular constrained region (rather than a regular space) for the coefficients. In contrast, space of a regular polygon is the only area to seek the candidate parameters in the lasso case. It appears that lqsso provides more complete view and information corresponding to the constraint regions than lasso, indicating more comprehensive insight of lqsso compared with lasso. This is mainly due to the fact that absolute function is a special case of check function; regarding as penalties in both lasso and lqsso. As an instance, we arbitrarily set τ = .2 and τ = .8. Note that lqsso with τ = 0.5 is equivalent to lasso in a graphical view point, though this does not occur in reality. The reason is that lqsso is approximately equivalent to lasso. Specifically, the main difference between the stated methods is that lqsso includes indicator function of OLS coefficients in the penalty component, a reason for distinction from lasso when τ = 0.5.

The related signs exhibit the direction of polygon in figure and OLS coefficients in the penalty part. Strictly speaking, OLS coefficients are based on a density function f(y—X), which examines the mass not only at the middle of the density but also at their two tails, especially when f(.) is asymmetric. As a consequence, the suggested methodology is more pliable when compared with lasso and adaptive. For more explanation, plot C in Zou [6] is presented in S2 Fig for lqsso (black points). Note that Zou [6] indicated them by a line in his graph, however for a better illustration, we preferred to plot them as points. Regarding the τ value and its related sign, the degree of closeness of lqsso line to the red line as well as the positive or negative side can be achived. As formerly mentioned about Bayesian aspect, it should be noted that the value of τ is required to be fixed initially for calculations via invoking the cross-validation technique. By assuming τ = 0.8, with β more than zero and less than one (consequently the related weight would be more than one), we are able to assign a probability for this true coefficient in conjunction with a reasonable probability for zero coefficients, i.e., sparsity. This notion relates to signs. Moreover, our calculated weights would be between zero and one, as yet. Note that the proposed method advocates the highest probability corresponding to zero for the adaptive case, as can be realized in S3 Fig.

The algorithm of lqsso

The following section deals with computational aspects of implementing lqsso. Similar to alternative methods in the class of lasso, a coordinate descent algorithm can also provide the lqsso estimates after invoking Eq (3). This algorithm is available in the glmnet package [15], freely available in statistical software R. The glmnet package covers all the computational aspects related to the L₁ penalty and its corresponding extensions. Recalling discussion in the aforementioned sections, it is straightforward to delineate a procedure for implementing the suggested lqsso. This subject is briefly included in Algorithm 1.

Algorithm 1 (coordinate descent algorithm for deriving lqsso estimate):

1. Define $x_{j}^{*} = x_{j} / w_{j}^{q}$ , where j = 1, 2, …, p.

2. Solve the lasso problem for all λ_n, i.e.

\begin{matrix} {\hat{β}}_{l a s s o}^{(n)} = \underset{β}{a r g m i n} \sum_{i = 1}^{n} {(y_{i} - \sum_{j = 1}^{p} x_{j}^{*} β_{j})}^{2} + λ_{n} \sum_{j = 1}^{p} | β_{j} | . \end{matrix}

3. Output ${\hat{β}}_{(j, l q s s o)}^{(n)} = {\hat{β}}_{(j, l a s s o)}^{(n)} / w_{j}^{q}$ , j = 1, 2, …, p.

At this point, we supply a brief sketch of a proof by which the Algorithm 1 guarantees a solution. At first, we consider the following equivalent expressions

\begin{matrix} {\hat{β}}_{l q s s o}^{(n)} & = \underset{β}{a r g m i n} \sum_{i = 1}^{n} {(y_{i} - \sum_{j = 1}^{p} β_{j} x_{i j})}^{2} + λ_{n} \sum_{j = 1}^{p} w_{j}^{q} | β_{j} | \\ = \underset{β}{a r g m i n} \sum_{i = 1}^{n} {(y_{i} - \sum_{j = 1}^{p} β_{j} x_{i j} \frac{w_{j}^{q}}{w_{j}^{q}})}^{2} + λ_{n} \sum_{j = 1}^{p} w_{j}^{q} | β_{j} | \\ = \underset{β}{a r g m i n} \sum_{i = 1}^{n} {(y_{i} - \sum_{j = 1}^{p} β_{j} w_{j}^{q} x_{i j}^{*})}^{2} + λ_{n} \sum_{j = 1}^{p} w_{j}^{q} | β_{j} | \\ = \underset{β^{α}}{a r g m i n} \sum_{i = 1}^{n} {(y_{i} - \sum_{j = 1}^{p} β_{j}^{α} x_{i j}^{*})}^{2} + λ_{n} \sum_{j = 1}^{p} | β_{j}^{α} |, \end{matrix}

where $β_{j}^{α} = β_{j} w_{j}^{q}$ . Implementing the aforementioned algorithm on the last expression gives ${\hat{β}}_{(j, l a s s o)}^{α (n)} = w_{j}^{q} {\hat{β}}_{(j, l q s s o)}^{(n)}$ . It is then straightforward to recognize that ${\hat{β}}_{(j, l q s s o)}^{(n)} = \frac{{\hat{β}}_{(j, l a s s o)}^{α (n)}}{w_{j}^{q}}$ .

Determining regularization parameter is a significant stage in all penalized regression problems. Customarily, we employ ${\hat{β}}_{(O L S)}$ to compute the related weights in lqsso. But, following Zou [6], we can use ${\hat{β}}_{(r i d g e)}$ instead of ${\hat{β}}_{(O L S)},$ in high-dimensional case. Then, the objective is to obtain the optimal pairs of (τ, λ_n). This function is similar to the technique for applying adaptive lasso, where we intend to derive optimal pairs of (γ, λ_n). According to Zou [6], the adaptive lasso uses cross-validation to tune these pair parameters. While implementing lqsso technique, we further utilize the same procedure to derive (τ, λ_n).

Oracle properties of lqsso

This section provides oracle properties in the first phase. In the subsequent, we ascertain that our proposed penalization method (lqsso) follows the mentined features subject to some mild conditions. In particular, the subsequent Theorem demonstrates that lqsso covers oracle properties provided that a proper λ_n is selected.

Let $A = {j : β_{j}^{*} \neq 0}$ where $β_{j}^{*}$ is a j-th true coefficient and assume that the cardinality of $A$ equals p₀, i.e. $| A | = p_{0}$ such that p₀ < p. As a consequence, the true model depends only on a subset of covariates, having a strong relationship with response variables. Note that $\frac{1}{n} X^{T} X = C$ where C is a positive definite matrix. Generally speaking, the estimated regression coefficients, ${\hat{β}}_{1}, \dots, {\hat{β}}_{p}$ , possess the oracle properties, defined by Fan and Li in [14], if they satisfy the following conditions:

They present a true subset model, i.e. ${j : {\hat{β}}_{j} \neq 0} = A$ .
They follow asymptotic normality, i.e. $\sqrt{n} ({\hat{β}}_{A} - β_{A}^{*}) \overset{d}{\to} N (0_{p}, Σ)$ , where Σ refers to the covariance matrix of the true subset model.

Theorem[Oracle properties] Assume $\frac{λ_{n}}{\sqrt{n}} \to 0$ and $\sqrt{n} λ_{n} \to \infty$ . Then, the lqsso estimates ought to have the following properties:

Sparsity: $l i m_{n} P (A_{n}^{l q s s o} = A) = 1$ as n → ∞, where $A_{n}^{l q s s o} = {j : {\hat{β}}_{{j, l q s s o}}^{(n)} \neq 0}$ .
Asymptotic normality: $\sqrt{n} ({\hat{β}}_{{A, l q s s o}}^{(n)} - β_{A}^{*}) \overset{d}{\to} N (0_{p_{0}}, σ^{2} \times C_{11}^{- 1})$ ,

where C₁₁ is a p₀ × p₀ matrix; a component of C partitioned as:

\begin{matrix} C = [\begin{matrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{matrix}] . \end{matrix}

Proof of Theorem:

At first, the asymptotic normality proof of the estimator derived from our proposed method, i.e., lqsso will be presented.

Let us consider $β = β^{*} + \frac{u}{\sqrt{n}}$ and

\begin{matrix} Ψ_{n} (u) = | | y - \sum_{j = 1}^{p} x_{j} (β_{j}^{*} + \frac{u_{j}}{\sqrt{n}}) {| |}^{2} + λ_{n} \sum_{j = 1}^{p} w_{j}^{q} | β_{j}^{*} + \frac{u_{j}}{\sqrt{n}} | . \end{matrix}

If ${\hat{u}}_{(n)} = \underset{u}{a r g m i n} Ψ_{n} (u)$ ; then $β_{l q s s o}^{* (n)} = β^{*} + \frac{{\hat{u}}_{(n)}}{\sqrt{n}}$ or ${\hat{u}}_{(n)} = \sqrt{n} (β_{l q s s o}^{* (n)} - β^{*})$ . Note that $Ψ_{n} (u) - Ψ_{n} (0) = V_{4}^{(n)} (u)$ , where

\begin{matrix} V_{4}^{(n)} (u) \equiv u^{T} (\frac{1}{n} X^{T} X) u - 2 \frac{ε^{T} X}{\sqrt{n}} u + \frac{λ_{n}}{\sqrt{n}} \sum_{j = 1}^{p} w_{j}^{q} \sqrt{n} (| β_{j}^{*} + \frac{u_{j}}{\sqrt{n}} | - | β_{j}^{*} |) . \end{matrix}

We know that $\frac{1}{n} X^{T} X \to C$ and $\frac{ε^{T} X}{\sqrt{n}} \overset{d}{\to} W = N (0_{p}, σ^{2} C) .$ Now consider the limiting behavior of the third term appeared in $V_{4}^{(n)} (u) .$ Note that our weights include τ, 1 − τ and an indicator function of the initial coefficients, i.e. $I ({\hat{β}}_{(j, O L S)} > 0)$ and $I ({\hat{β}}_{(j, O L S)} \leq 0)$ . Also, we know that the indicator functions converge, in probability, to an indicator function. Next, we consider three distinct cases regarding the value of $β_{j}^{*} .$

If $β_{j}^{*} > 0$ , then $\sqrt{n} (| β_{j}^{*} + \frac{u_{j}}{\sqrt{n}} | - | β_{j}^{*} |) \to u_{j} s g n (β_{j}^{*})$ . Consequently, we have $\frac{λ_{n}}{\sqrt{n}} τ \sqrt{n} (| β_{j}^{*} + \frac{u_{j}}{\sqrt{n}} | - | β_{j}^{*} |) \overset{p}{\to} 0,$ following the Slutsky’s theorem.

If $β_{j}^{*} = 0$ , then $\sqrt{n} (| β_{j}^{*} + \frac{u_{j}}{\sqrt{n}} | - | β_{j}^{*} |) \to u_{j}$ . Thus, we have $\frac{λ_{n}}{\sqrt{n}} (1 - τ) \sqrt{n} (| β_{j}^{*} + \frac{u_{j}}{\sqrt{n}} | - | β_{j}^{*} |) \overset{p}{\to} 0$ by, again, using the Slutsky’s theorem.

If $β_{j}^{*} < 0$ , then, $\sqrt{n} (| β_{j}^{*} + \frac{u_{j}}{\sqrt{n}} | - | β_{j}^{*} |) \to u_{j} s g n (β_{j}^{*})$ . Consequently, we have $\frac{λ_{n}}{\sqrt{n}} (1 - τ) \sqrt{n} (| β_{j}^{*} + \frac{u_{j}}{\sqrt{n}} | - | β_{j}^{*} |) \overset{p}{\to} 0$ , by invoking the Slutsky’s theorem one more time.

So, we see that $V_{4}^{(n)} (u) \overset{d}{\to} V_{4} (u)$ for every u, where

\begin{matrix} V_{4} (u) = u_{A}^{T} C_{11} u_{A} - 2 u_{A}^{T} W_{A} if u_{j} = 0, \forall j \notin A . \end{matrix}

Note that function $V_{4}^{(n)}$ is convex, and the unique minimum of V₄ is ${(C_{11}^{- 1} W_{A}, 0)}^{T}$ .

Following the epi-convergence results reported in Geyer [17] and Fu and Knight [18], we have

\begin{matrix} {\hat{u}}_{A}^{(n)} \overset{d}{\to} C_{11}^{- 1} W_{A} a n d {\hat{u}}_{A^{c}}^{(n)} \overset{d}{\to} 0 . \end{matrix}

(8)

In conclusion, we notice that $W_{A} \sim N (0_{p_{0}}, σ^{2} C_{11})$ ; which completes the asymptotic normality of lqsso.

At present, we reveal the consistency of lqsso. The asymptotic normality result indicates that ${\hat{β}}_{(j, l q s s o)}^{(n)} \overset{p}{\to} β_{j}^{*}$ for $\forall j \in A$ ; thus $P (j \in A_{n}^{l q s s o}) \to 1$ . Then, it suffices to show that $\forall j^{'} \notin A$ , $P (j^{'} \in A_{n}^{l q s s o}) \to 0$ .

Consider the event $j^{'} \in A_{n}^{l q s s o}$ . By the Karush-Kuhn-Tucker (KKT) optimality conditions (see, e.g. Bühlmann and Van De Geer [8]), we know that $2 x_{j^{'}}^{T} (y - X {\hat{β}}_{l q s s o}^{(n)}) = λ_{n} w_{j^{'}}^{q}$ . Note that $\frac{λ_{n}}{\sqrt{n}} w_{j^{'}}^{q} \overset{p}{\to} 0$ whereas

\begin{matrix} 2 \frac{x_{j^{'}}^{T} (y - X {\hat{β}}_{l q s s o}^{(n)})}{\sqrt{n}} = 2 \frac{x_{j^{'}}^{T} X \sqrt{n} (β^{*} - {\hat{β}}_{l q s s o}^{(n)})}{n} + 2 \frac{x_{j^{'}}^{T} ε}{\sqrt{n}} . \end{matrix}

By Eq (8) and the Slutsky’s theorem, we know that the quantity $2 \frac{x_{j^{'}}^{T} X \sqrt{n} (β^{*} - {\hat{β}}_{l q s s o}^{(n)})}{n}$ converges to a normal density in distribution and $2 \frac{x_{j^{'}}^{T} ε}{\sqrt{n}} \overset{d}{\to} N (0, 4 | | x_{j^{'}} {| |}^{2} σ^{2})$ . Thus

\begin{matrix} P (j^{'} \in A_{n}^{l q s s o}) \leq P (2 x_{j^{'}}^{T} (y - X {\hat{β}}_{l q s s o}^{(n)}) = λ_{n} w_{j^{'}}^{q}) ⟶ 0 . \end{matrix}

This proves the consistency of lqsso. Note that based upon the knowledge from elementary statistics, we applied a simple property for the last stage of convergence. The mentioned property states that the probability for the case by which a continuous variable, with a continuous distribution, equals to a constant value is zero. The important point to highlight is that, unlike our procedure, Zou [6] invoked a property by which normal distribution tends to zero at its tails, in other words corresponding variables tend to infinity.

Note that oracle property, along with implementing a simple adoption from l₁ penalty, insure that our proposed method follows oracle properties.

Simulation studies

In this section, we present the results of simulation studies in order to illustrate the performance of our proposed method. Considering that our intend is to compare lqsso with lasso and adaptive lasso, we pursue to maintain the same spirit of simulation scheme investigated by Zou [6]. Hence, we take into account the effects of the same parameters that he considered, i.e. $\frac{σ_{x}^{2}}{σ^{2}}$ , σ² and n, respectively, indicating the Signal-to-Noise Ratio (SNR), the variance of the error and the sample size. Because we can write $E [{(\hat{y} - y_{t e s t})}^{2}] = E [{(\hat{y} - X^{T} β)}^{2}] + σ^{2},$ we report the Relative Prediction Error (RPE), defined as $R P E = \frac{E [{(\hat{y} - X^{T} β)}^{2}]}{σ^{2}},$ for comparing different regression methods discussed in this paper combined with various scenarios. To conduct our simulation studies, we apply various linear models represented by y = x^Tβ + N(0, σ²) through altering sample size (n), SNR and the error variance (σ²). As frequently, we take OLS coefficient estimates as the initials for weights when considering adaptive lasso and lqsso. In the high-dimensional setting, the ridge coefficient estimates are yet determind as initial coefficients for those weights. The coordinate descent algorithm proposed by Friedman et al. [15], available in the glmnet package is then implemented to compute estimates of the relevant parameters while fitting the three delibrated methods. For each method, we select λ_n and γ in adaptive lasso and τ for lqsso, using a set of conceivable values. The selected values for λ_n, γ and τ are {0.1, 0.2, …, 2}, {0.1, 0.2, …, 2} and {0, 0.01, …, 1}, respectively. Thereupon, the sum of squared difference between all estimated coefficients and their true values has been calculated in order to inspect bias. In this manner, the mean of 100 simulation runs has been reported as a measure of bias for each method. In subsequent, the outcomes of aforesaid models in connection with various scenarios assumed in our simulation studies are presented.

Low dimensional case

Model 1: We set β = (β₁, β₂, β₃, β₄, 0, 0, 0, 0)^T, where β₁,β₂,β₃,β₄ are independently generated from standard normal distribution. The covariates x_i (i = 1, …, n) are i.i.d random vectors generated from 8 dimensional standard multivariate normal distribution. To impose collinearity among variables, we define correlation between each pair of predictor variables, x_ij and x_ij′, through the expression cor(j, j′) = (0.5)^|j−j′|, 1 ≤ j, j′ ≤ 8. We also set σ equal to 1, 3 and 6 where the corresponding SNRs are 21.25, 2.35 and 0.59, respectively. The sample size (n) is fixed at 40 and 80.

Model 2: This model is similar to Model 1, except all β_j are independently generated from standard normal distribution for j = 1, …, 8.

High dimensional setting

For two alternative models, we specify all parameters and simulation settings similar to Model 1, besides the numbers of variables which are set at p = 100.

Model 3 (Dense): All 100 coefficients are independently generated from standard normal distribution.

Model 4 (Sparse): A total of 30 non-zero coefficients are independently simulated from standard normal distribution.

Ultra high dimensional setting

At this point, we merely consider one model with number of variables (p) equal to 1000.

Model 5 (Very sparse): Only 30 coefficients are non-zero. The assumed coefficients are independently generated from normal distribution with mean and standard deviation equal to 0.5. Remainder coefficients; from the total of 970, are set at zero.

In what follows, we are going to provide more details for the simulated samples and processing them for major analysis. In addition, we present the notification of results. To evaluate the considered models, standard procedures have been carried out. In consequence, simulated observations were divided into two sections: training data and test samples. The number of training samples were fixed at 100 for each scenario analyzed formerly. Extra 1000 samples were employed as test set. To derive appropriate values for λ_n, γ and τ, RPE criterion was implemented, while n and σ were set as discussed previously.

To evaluate the accuracy of RPE, the related standard errors were computed through a bootstrap scheme, in the following manner. A fixed sample of RPEs were generated considering as bootstrapped samples. Afterwards, the median of the bootstrapped samples were extracted. The procedure was thus repeated 500 times. Standard deviation of medians, denominating as Monte Carlo sd, was reported for the estimated standard error of RPEs.

To demonstrate various aspects of simulation studies and for the purpose of comparison, the outputs were intentionally divided into sections. In other words, we separated the results to low, high and ultra high-dimensional settings, similar to the structure defined in the simulation study. We then prepared the results equivalent to the separation process. It is necessary to mention that more extensive simulations can be supplied to underline other aspects of the proposed methodology. Nevertheless, to save space and to emphasize on more appropriate results, we preferred to focus on aspects highlighting the proposed method. Henceforth, we cover the outcomes of our investigations.

The results derived from simulation studies in the low-dimensional settings are demonstrated in Tables 1 and 2. Some essential remarks extracted from these two tables are as follows. Focusing on Model 1, it can be achieved that for different values of SNR, n and σ, the adaptive lasso is performing better than lasso and lqsso in terms of RPE and bias criteria. From the perspective of RPE, the proposed method has a better performance than lasso. With some minor exceptions for lasso in terms of bias, lasso and lqsso had no superiority over each other. Pointed out that Model 1 refers to low dimensional setting.

Table 1. The mean values of RPEs for Model 1 and Model 2.

			Method
Model	n	σ	lasso	alasso	lqsso
Model 1	40	1	0.334 (0.021)	0.163 (0.019)	0.297 (0.027)
		3	0.181 (0.014)	0.126 (0.009)	0.163 (0.015)
		6	0.137 (0.017)	0.121 (0.010)	0.123 (0.018)
	80	1	0.206 (0.010)	0.051 (0.004)	0.203 (0.014)
		3	0.084 (0.007)	0.065 (0.005)	0.079 (0.006)
		6	0.075 (0.006)	0.059 (0.006)	0.074 (0.006)
Model 2	40	1	0.432 (0.030)	0.434 (0.032)	0.345 (0.025)
		3	0.258 (0.010)	0.262 (0.010)	0.253 (0.011)
		6	0.220 (0.013)	0.226 (0.021)	0.210 (0.018)
	80	1	0.314 (0.022)	0.308 (0.022)	0.242 (0.013)
		3	0.109 (0.008)	0.110 (0.008)	0.103 (0.005)
		6	0.116 (0.008)	0.117 (0.008)	0.113 (0.008)

Open in a new tab

The mean values of RPEs and their standard errors (in bracket), after fitting Model 1 and Model 2 using lasso, adaptive lasso (alasso) and lqsso methods for the simulated data. For each scenario, the selected methods in each row are highlighted in bold.

Table 2. The mean values of bias for Model 1 and Model 2.

			Method
Model	n	σ	lasso	alasso	lqsso
Model 1	40	1	0.166 (0.077)	0.104 (0.012)	0.170 (0.064)
		3	0.342 (0.070)	0.294 (0.006)	0.346 (0.072)
		6	0.597 (0.045)	0.551 (0.006)	0.594 (0.047)
	80	1	0.139 (0.063)	0.059 (0.012)	0.142 (0.080)
		3	0.245 (0.059)	0.222 (0.006)	0.234 (0.062)
		6	0.429 (0.058)	0.396 (0.005)	0.433 (0.064)
Model 2	40	1	0.194 (0.204)	0.197 (0.021)	0.165 (0.037)
		3	0.400 (0.047)	0.400 (0.028)	0.388 (0.032)
		6	0.800 (0.022)	0.803 (0.009)	0.765 (0.027)
	80	1	0.169 (0.249)	0.171 (0.042)	0.137 (0.037)
		3	0.263 (0.042)	0.267 (0.028)	0.254 (0.033)
		6	0.540 (0.042)	0.542 (0.025)	0.539 (0.043)

Open in a new tab

The mean values of bias and their standard errors (in bracket), after fitting Model 1 and Model 2 using lasso, adaptive lasso (alasso) and lqsso methods for the simulated data. For each scenario, the selected methods in each row are highlighted in bold.

Under construction of Model 2, the proposed method, i.e. lqsso, performs better than two alternative methods in terms of RPE and bias. Interestingly, standard deviation of RPEs corresponding to our suggested methodology is also lower compared with the corresponding values of lasso and adaptive lasso. Also like to point out that Model 2 regards to sparse situation in the lower dimensional setting. Hence, the lqsso is recognizing the sparsity better than two standard methods in the regression modeling framework. Additionally, Model 2 reveals that to incorporate with high variability among data expressed in the weights for coefficients, lqsso uses the information available in data better than the adaptive lasso. Bear in mind that we don’t intend to make a comparison between lasso and alasso. This is because of the fact that such comparison needs to be done precisely in terms of both mentioned criteria and the bootstrapped sd, it also requires invoking many debates related to the stated methods.

In general, regardless of the presumed model, the estimated standard errors for all methods tend to decrease by increasing the sample size. This conclusion is exactly the same as what we expect from the asymptotic behavior of estimators in the context of statistical inference. To verify the asymptotic behavior of the mentioned methods, we focus on more scientific details. In the low dimensional setting by which the sparsity also exists, we claim that our proposed methodology performs better compared with lasso and alasso. It should be noted that lqsso can be considered as an economical method in the context of having low bias, sparsity and trade-off between the bias and variance, simultaneously.

Based upon the results reported in Tables 3 and 4, at first instance it might be difficult to make an explicit decision in declaring the best method based on scenarios considered in Model 3 and Model 4. However, we can assert that our proposed method outperforms two alternatives. Such a conclusion might be a little optimistic statement based on the values reported in Table 4. But, the results in Table 3 confirms that our method has the least RPE for all scenarios considered for the Model 3 and Model 4. Our method only lost to the alasso in some cases in terms of bias as it is evident in Table 4. However, in the lost cases, the difference between values of bias obtained from lqsso and alasso is negligible, which might be as a result of some minor computation roundings.

Table 3. The mean values of RPEs for Model 3, Model 4 and Model 5.

			Method
Model	n	σ	lasso	alasso	lqsso
Model 3	40	1	1109.216 (1.628)	1110.291 (1.706)	964.131 (2.583)
		3	167.084 (0.884)	161.840 (0.644)	161.391 (1.241)
		6	40.717 (0.184)	40.901 (0.164)	40.300 (0.285)
	80	1	391.573 (5.711)	371.177 (4.848)	240.280 (2.582)
		3	43.058 (0.416)	44.073 (0.418)	42.705 (0.552)
		6	13.532 (0.211)	13.559 (0.211)	13.508 (0.185)
Model 4	40	1	32.700 (0.377)	33.031 (0.353)	31.753 (0.570)
		3	8.200 (0.137)	8.218 (0.127)	8.111 (0.123)
		6	3.396 (0.082)	3.343 (0.063)	3.311 (0.072)
	80	1	3.146 (0.081)	3.340 (0.088)	2.657 (0.072)
		3	2.370 (0.059)	2.210 (0.081)	2.002 (0.050)
		6	1.369 (0.030)	1.357 (0.032)	1.206 (0.035)
Model 5	40	1	226.060 (2.656)	162.295 (2.094)	98.148 (0.829)
		3	46.331 (0.487)	14.542 (1.226)	4.573 (0.086)
		6	9.213 (0.120)	4.956 (0.148)	3.211 (0.067)
	80	1	28.020 (0.597)	15.912 (0.462)	8.673 (0.235)
		3	3.779 (0.128)	3.254 (0.109)	2.949 (0.065)
		6	3.577 (0.070)	2.802 (0.074)	1.680 (0.046)

Open in a new tab

The mean values of RPEs and their standard errors (in bracket), after fitting Model 3, Model 4 and Model 5 using lasso, adaptive lasso (alasso) and lqsso methods. For each scenario, the selected methods in each row are highlighted in bold.

Table 4. The mean values of bias for Model 3, Model 4 and Model 5.

			Method
Model	n	σ	lasso	alasso	lqsso
Model 3	40	1	8.773 (0.010)	8.766 (0.058)	7.900 (0.887)
		3	10.142 (0.019)	10.058 (0.030)	10.161 (0.226)
		6	10.580 (0.011)	10.547 (0.046)	10.484 (0.242)
	80	1	5.219 (0.126)	5.117 (0.038)	4.165 (0.107)
		3	5.553 (0.147)	5.565 (0.045)	5.535 (0.122)
		6	6.247 (0.101)	6.257 (0.044)	6.236 (0.203)
Model 4	40	1	1.588 (0.028)	1.600 (0.026)	1.524 (0.112)
		3	2.058 (0.008)	2.067 (0.043)	2.047 (0.161)
		6	2.661 (0.009)	2.599 (0.061)	2.690 (0.207)
	80	1	0.514 (0.055)	0.528 (0.027)	0.460 (0.048)
		3	1.292 (0.027)	1.010 (0.038)	1.168 (0.040)
		6	1.822 (0.023)	1.802 (0.092)	1.684 (0.028)
Model 5	40	1	2.740 (0.004)	2.225 (0.021)	1.855 (0.231)
		3	3.766 (0.003)	2.402 (0.008)	1.613 (0.210)
		6	2.960 (0.004)	2.651 (0.012)	2.354 (0.482)
	80	1	1.002 (0.057)	0.795 (0.083)	0.603 (0.371)
		3	1.131 (0.005)	1.072 (0.004)	1.153 (0.164)
		6	2.120 (0.005)	2.082 (0.010)	1.768 (0.122)

Open in a new tab

The mean values of bias and their standard errors (in bracket), after fitting Model 3, Model 4 and Model 5 using lasso, adaptive lasso (alasso) and lqsso methods on the simulated data. For each scenario, the selected methods in each row are highlighted in bold.

But as highlighted in Tables 3 and 4, lqsso is superior to lasso for all scenarios. It should be noted that Model 3 and Model 4 demonstrate situations by which one experience with high dimensional data analysis in dense and sparse cases, respectively.

The situation is rather advantageous promising in the ultra high-dimensional setting. As demonstrated in Tables 3 and 4, it can be achieved that while invoking Model 5, our proposed method (lqsso) has the best performance in comparison with two competitive methods, i.e. lasso and adaptive lasso. Both tables indicate that our method detects sparsity very well and has low RPE and bias. Interestingly, the suggested method manage to retrieve the information among data and employs them to choose feasible weights for coefficients. Based upon conclusions achieved from simulation setting, we claim that our proposed method is better than adaptive lasso and lasso in ultra high-dimensional variable selection setting.

Note that the results and remarks presented in this section, were all based upon some particular scenarios and simulation settings. In this section, inspite the fact that various aspects of regression modeling were covered, a general conclusion can not be accomplished. Such consideration has also been addressed by Zou [6], based upon his simulation studies. In this regard, he also intended to make a decision on preferring between adaptive lasso and lasso methods.

To prepare a graphical conception in terms of bias and RPE for comparing each method based on different components appeared in the modeling process, i.e. sample size (n), standard deviation of error (σ), and five aforementioned models, S4 and S5 Figs are presented. At present, we do not discuss the remarks acquired from each figure, because the related results have already been presented while discussing the outputs in the previous tables. As stated, an specific decision on selecting the best candidate method is not straightforward. Nonetheless, according to the presented figures, the proposed lqsso method did relatively well in most cases.

In conclusion, the proposed method performed well in most scenarios and particularly in ultra high-dimensional setting. As a result, we are interested in evaluating its performance in more details. In this manner, similar to Zou [6], the performance of the suggested method in correctly selecting non-zero variables along with treating the sparsity will be evaluated. Accordingly, the performance of proposed method compared with two stated methods will be discussed in the ultra high-dimensional setting. Table 5 provides pertinent information based upon simulation studies outlined formerly in this section. It should be pointed out that the row labeled C shows the number of correctly identified non-zero variables, and the row labeled I indicates the number of zero variables incorrectly selected by each method. Hence, the method with high and less values for C and I, respectively, is preferred.

Table 5. Median of the number of (in)correctly selected variables for Model 5.

				Method
n	σ	Type	truth	lasso	alasso	lqsso
40	1	C	30	12 (0.040)	16 (0.802)	25 (1.727)
	1	I	0	18 (0.039)	14 (0.764)	5 (1.764)
	3	C	30	10 (0.082)	25 (1.140)	28 (2.065)
	3	I	0	20 (0.081)	4 (1.163)	2 (2.231)
	6	C	30	9 (0.077)	21 (0.861)	22 (2.654)
	6	I	0	21 (0.075)	9 (0.961)	8 (2.842)
80	1	C	30	25 (0.710)	27 (0.202)	28 (2.429)
	1	I	0	5 (0710)	3 (0.196)	2 (2.268)
	3	C	30	23 (0.258)	23 (0.614)	24 (1.626)
	3	I	0	7 (0.269)	7 (0.60)	6 (1.797)
	6	C	30	19 (0.131)	23 (0.479)	29 (1.805)
	6	I	0	11 (0.121)	7 (0.481)	1 (1.824)

Open in a new tab

Median of the number of (in)correctly selected variables and their standard errors (in bracket), after fitting Model 5 using lasso, adaptive lasso (alasso) and lqsso methods on the simulated data for the ultra high-dimensional setting.

Recalling our simulation process, there were 30 non-zero and 970 zero coefficients while implementing Model 5. Accordingly, in all scenarios lqsso had a better performance in terms of correctly identifying thirty important variables compared with lasso and adaptive lasso. In addition, lqsso correctly selected non-zero variables with a relative frequency of at least 74% in the worst case. In all cases, lqsso has the least zero variables incorrectly selected during the modeling process. Indeed, those variables wrongly declaring zero, i.e. its worst case is less than 1%. Typically, a better performance of lqsso is concluded other than two alternative methods in the simulation setup.

Real data analysis

To illustrate an application of our proposed method, we concentrate on the Bardet-Biedl syndrome gene expression data set studied by Scheetz et al. [19] and use the corresponding well-known data called the eye dataset which includes gene expression levels of p = 18975 genes from n = 120 rats. The main purpose of the analysis is to find out relevant genes that are correlated with gene TRIM32, a gene known to cause the eye disease Bardet-Biedl syndrome. Wang and Xiang [20] first screened down from 18975 genes to 3000 genes based upon the largest variances in gene expression levels. Afterwards, they computed the marginal correlation coefficients between each of these 3000 genes and the gene TRIM32, and selected the top 200 genes with the largest absolute correlation coefficients. In consequence, final data consists of 200 variables with 120 observations taken from the flare package [21]. We apply this final dataset, as data with n = 120, p = 200.

To conduct our analysis appropriately, we consider TRIM32 as the response variable in the proposed regression model. As discussed previously, to proceed our analysis, initial weights are not required in contrast with adaptive lasso and lqsso settings. Precisely, initial weights are set at the ridge estimates for the corresponding coefficients. The subsequent steps are designed according to the discussion represented in the paper. In other words, the main objectives are estimating the parameters by implemening different methods (lasso, adaptive lasso and lqsso) and comparing their performances with criteria provided in the simulation section.

As demonstrated in S6 Fig, the ridge estimates for coefficients, which are also considered as initial coefficients based upon the suggestion made by Zou [6], vary between -0.1 to 0.1 in magnitude. It is remarkable that estimates have a clear sign of conformity with the discussion presented for Model 4 and Model 5 in our simulation studies. Although the numbers of variables applied here is less, it might be claimed that this example is mostly relevant to Model 5. Hence, while fitting a regression model to analyze our real data example, we initially rely on this model to treat small effects scenario. In consequence, we expect that lqsso will provide a better accuracy and precision in capturing the variability in this data set.

For the comparison purpose, we compute the n-fold cross-validation test error, abbreviated as CVErr, i.e.

\begin{matrix} C V E r r = m e a n {{(y_{1} - y_{- 1, p})}^{2}, \dots, {(y_{n} - y_{- n, p})}^{2}}, \end{matrix}

(9)

where y_i is i-th observation of the response variable and y_{−i, p} refers to the predicted value by fitting a model using all observations except i-th sample. The latter technique is also nominated as leave-one-out cross-validation (sometimes LOOCV in abbreviation).

As demonstrated in Table 6, various methods function differently from each other. Note that the last lines at the end of table exhibit the accuracy criterion. As indicated, the inserted methods represent sparse solution in all covariates except in gene numbered 21094, 22016, 23041, 24565 and 29842. Therefore, the stated variables play an important role in causing eye disease, usually assigned as substantial and significant variables in this particular example. As a result, lqsso has the best performance in terms of CVErr criteria. Moreover according to the results, the superiority of suggested penalization method is apparent in comparison with previous lasso techniques. Another worth mentioning subject is the magnitude of estimated values. The lasso and lqsso provided the same estimates but CVErr regarding to lqsso is lower. In conclusion in this particular example, it should be noted that estimated values corresponding to λ_n, γ and τ were 0.4, 0.1 and 0.63, respectively.

Table 6. Estimated mean values for the coefficients in rat eye dataset.

Predictor	lasso	alasso	lqsso
21094	0.139	0.165	0.139
22016	0.172	0.184	0.172
23041	0.618	0.609	0.618
24565	0.054	0.045	0.054
29842	0.044	0.032	0.044
CVErr	0.013 (0.0004)	0.015 (0.003)	0.011 (0.0002)

Open in a new tab

The estimated mean values for cross-validation test error, their standard errors (in bracket) and the coefficients using lasso, adaptive lasso (alasso) and lqsso methods along with the model selection criteria in analyzing the rat eye data.

Conclusion

In this article, we defined a novel method in the structure of penalized regression problem. The proposed method is under the same umbrella as the renowned approach lasso does. The suggested method, denominated as lqsso, is able to treat unusual observations as well as selecting important variables. Moreover, the suggested method can appropriately deal with sparsity in various dimensional problems, i.e. low, moderate, high and ultra high-dimensional. Simulation studies conducted in this paper reveal the superiority of our proposed method in contrast with lasso and adaptive lasso in several small effect situations. Our claim is effectively confirmed regarding to RPEs and their standard errors in various simulation scenarios, particularly in ultra high-dimensional setting. Additionally, while analyzing a real data set, our proposed method has proved better performance compared with alternative methods.

We illustrated that our proposed method enjoys oracle properties under some mild conditions. In this connection, lqsso reacts the same as adaptive lasso does. Our proposed method provides lower bias than adaptive lasso, but not as good as lasso. Nevertheless, our suggested method performs well in the context of variable selection and sparsity compared with lasso. Considering RPE measure, our proposed method has a prime performance in comparison with two alternative methods, particularly in an ultra high-dimensional setting. As Zou [6] pointed out, there is no superior method in all situations, regarding what we mentioned in this article.

Similar to lasso, our proposed method can also be figured out in a Bayesian methodology. However, adaptive lasso can not be considered in this framework. The lqsso method can be remarked as a Maximum A Posteriori (MAP) estimator, similar to ridge and lasso regression. See, for instance, [22]. This will be true if the prior distributions for coefficients is considered as the Asymmetric Laplace Distribution (ALD). One can consult [10] for more details on this latter subject. We aim to develop such viewpoint in our future research.

The construction of quantile regression mainly arises from considering check function, a robust measure for treating outliers. Therefore, quantile regression is a prime trick to deal with unusual coefficients while considering a penalty function in a minimizing scheme. It is important to state that, with some prior knowledge on the skewness of regression coefficients, there is possibility and background to scrutinize an optimization algorithm for choosing appropriate weights. This topic will be our effort for further research.

Supporting information

S1 Fig. Plot of the geometry of the lqsso with different τ.

Plot of the lqsso estimators with τ = .2, τ = .5 (approximately equal to the lasso) and τ = .8 sketched at the left, middle and right panels, respectively, centered at OLS estimates. See the text for more details.

(TIFF)

Click here for additional data file.^{(13.3KB, tiff)}

S2 Fig. Plot of a comparison among the lqsso, lasso, and adaptive lasso.

Plot of the lqsso penalty in comparison with the lasso and adaptive lasso penalty according to the plot C in Zou [6].

(TIFF)

Click here for additional data file.^{(14.2KB, tiff)}

S3 Fig. Plot of the Bayesian view of the lqsso penalty.

Plot of the lqsso penalty in comparison with the lasso and adaptive lasso penalty based on Bayesian approach.

(TIFF)

Click here for additional data file.^{(11.2KB, tiff)}

S4 Fig. The RPEs comparison of the three methods.

Compare methods in terms of the RPE criterion using different models, sample sizes (n) and the standard deviation (sd) of the error term.

(PNG)

Click here for additional data file.^{(14.8KB, png)}

S5 Fig. The biases comparison of the three methods.

To compare methods in terms of the bias criterion using different models, sample sizes (n) and the standard deviation (sd) of the error term.

(TIFF)

Click here for additional data file.^{(15.7KB, tiff)}

S6 Fig. The initial ridge coefficients.

The plot shows the ridge estimates for the coefficients of the corresponding ridge regression model.

(TIFF)

Click here for additional data file.^{(371.6KB, tiff)}

Acknowledgments

Receiving support from the Center of Excellence in Analysis of Spatio-Temporal Correlated Data at Tarbiat Modares University is acknowledged.

Data Availability

https://cran.r-project.org/web/packages/flare/index.html.

Funding Statement

The author(s) received no specific funding for this work.

References

1. Friedman J, Hastie T, Tibshirani R. The Elements of Statistical Learning. New York: Springer Series in Statistics; 2001. [Google Scholar]
2. Breiman Leo. Better subset regression using the nonnegative garrote. Technometrics. 1995. Nov;37(4):373–384. doi: 10.1080/00401706.1995.10484371 [DOI] [Google Scholar]
3. Hoerl AE, Kennard RW. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics. 1970. Feb;12(1):55–67. doi: 10.1080/00401706.1970.10488634 [DOI] [Google Scholar]
4. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological). 1996. Jan;58(1):267–288. doi: 10.1111/j.2517-6161.1996.tb02080.x [DOI] [Google Scholar]
5. Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: series B (Statistical Methodology). 2005. Apr;67(2):301–320. doi: 10.1111/j.1467-9868.2005.00503.x [DOI] [Google Scholar]
6. Zou H. The adaptive lasso and its oracle properties. Journal of the American Statistical Association. 2006. Dec;101(476):1418–1429. doi: 10.1198/016214506000000735 [DOI] [Google Scholar]
7. Tibshirani R. Regression shrinkage and selection via the lasso: a retrospective. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2011. Jun;73(3):273–282. doi: 10.1111/j.1467-9868.2011.00771.x [DOI] [Google Scholar]
8. Bühlmann P, Van De Geer S. Statistics for High-Dimensional Data: Methods, Theory and Applications. Berlin: Springer Science & Business Media; 2011. [Google Scholar]
9. Fan J, Feng Y, Wu Y. Network exploration via the adaptive LASSO and SCAD penalties. The Annals of Applied Statistics. 2009. Jun;3(2):521–541. doi: 10.1214/08-AOAS215SUPP [DOI] [PMC free article] [PubMed] [Google Scholar]
10. Koenker R, Machado JA. Goodness of fit and related inference processes for quantile regression. Journal of the American Statistical Association. 1999. Dec;94(448):1296–1310. doi: 10.1080/01621459.1999.10473882 [DOI] [Google Scholar]
11. Koenker R, Bassett G Jr. Regression quantiles. Econometrica: Journal of the Econometric Society. 1978. Jan;46(1):33–50. doi: 10.2307/1913643 [DOI] [Google Scholar]
12. Wu Y, Liu Y. Variable selection in quantile regression. Statistica Sinica. 2009. Apr;19(2):801–817. [Google Scholar]
13. Ciuperca Gabriela. Adaptive LASSO model selection in a multiphase quantile regression. Statistics. 2016. Sep;50(5):1100–1131. doi: 10.1080/02331888.2016.1151427 [DOI] [Google Scholar]
14. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association. 2001. Dec;96(456):1348–1360. doi: 10.1198/016214501753382273 [DOI] [Google Scholar]
15. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software. 2010. Aug;33(1):1–22. doi: 10.18637/jss.v033.i01 [DOI] [PMC free article] [PubMed] [Google Scholar]
16. Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression. The Annals of Statistics. 2004. Apr;32(2):407–499. doi: 10.1214/009053604000000067 [DOI] [Google Scholar]
17. Geyer CJ. On the asymptotics of constrained M-estimation. The Annals of Statistics. 1994. Dec;22(4):1993–2010. doi: 10.1214/aos/1176325768 [DOI] [Google Scholar]
18. Fu W, Knight K. Asymptotics for lasso-type estimators. The Annals of Statistics. 2000. Oct;28(5):1356–1378. doi: 10.1214/aos/1015957397 [DOI] [Google Scholar]
19. Scheetz TE, Kim KY, Swiderski RE, Philp AR, Braun TA, Knudtson KL, et al. Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proceedings of the National Academy of Sciences. 2006. Sep;103(39):14429–14434. doi: 10.1073/pnas.0602562103 [DOI] [PMC free article] [PubMed] [Google Scholar]
20. Wang S, Xiang L. Two-layer EM algorithm for ALD mixture regression models: A new solution to composite quantile regression. Computational Statistics & Data Analysis. 2017. Nov;115:136–154. doi: 10.1016/j.csda.2017.06.002 [DOI] [Google Scholar]
21. Li X, Zhao T, Yuan X, Liu H. The flare package for high dimensional linear regression and precision matrix estimation in R. J. Mach. Learn. Res. 2015. Mar;16:553–557. [PMC free article] [PubMed] [Google Scholar]
22. Johnstone IM, Titterington DM. Statistical challenges of high-dimensional data. Philosophical Transactions of the Royal Society A. 2009. Nov;367(1906):4237–4253. doi: 10.1098/rsta.2009.0159 [DOI] [PMC free article] [PubMed] [Google Scholar]

PLoS One. doi: 10.1371/journal.pone.0266267.r001

Decision Letter 0

Xiaoyu Song

13 Oct 2021

PONE-D-21-27670Regression Shrinkage and Selection via Least Quantile Shrinkage and Selection OperatorPLOS ONE

Dear Dr. Golalizadeh,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Nov 27, 2021. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Xiaoyu Song

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please update your submission to use the PLOS LaTeX template. The template and more information on our requirements for LaTeX submissions can be found at http://journals.plos.org/plosone/s/latex.

3. Please ensure that you refer to Figures 3 and 4 in your text as, if accepted, production will need this reference to link the reader to the figure.

4. Please remove your figures from within your manuscript file, leaving only the individual TIFF/EPS image files, uploaded separately. These will be automatically included in the reviewers’ PDF.

Additional Editor Comments:

The authors proposed an interesting idea to extend LASSO to have a quantile based weight. The rationals for using quantile based weights are not fully introduced. How quantile based weights can be linked with the quantile of outcomes, and why they are linked with the directions of the coefficients are not well explained. The establishment of statistical properties (e.g. oracle property) needs further explanation, and simulations should report more scenarios and/or model results to demonstrate the performance of the proposed method.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript introduced a new method in the class of the penalized regression approach. It is a solid contribution to the field but some important pieces in the simulation result section are missing. I do believe, however, a substantial revision will help make it a more valuable contribution to the literature.

Major comments:

1. In the simulation studies section, the authors compared RPEs in Table 1 and 4, bias in Table 2 and 4, number of (in)correct selected variables in Table 5. However, the authors only provided the standard errors for RPEs in Table 1 and 4. I am wondering whether the authors could also provide standard errors for bias and number of (in)correct selected variables in other tables.

2. For number of (in)correct selected variables in Table 5, I would recommend using the mean and standard errors instead of median, just to be consistent with the previous 4 tables. Are there any reasons to use median here?

3. Similarly, in Table 6, please provide the standard error for the cross validation so that audience can know the variability of the metric.

Minor comments:

1. The 3rd paragraph in the Introduction section (From "One of the existing" to "for the coefficients") talks about the potential Bayesian understanding for the new approach . I would recommend moving it to the Discussion section or cut it shorter as it is beyond the function of the "Introduction" section.

2. I would strongly recommend the authors to make the code of new method publicly available so that people from a broader community can utilize.

3. The figures need to be improved. For example, in Fig 1, the scale has been distorted. In Fig 2, please get rid of the "_" in the figure labels.

4. In Table 1 to Table 6, I think the authors are providing the mean values. Please specify that instead of just putting "values" and let the readers turn back and forth to find out what the values mean.

Reviewer #2: Instead of using the originally proposed weight: w_j = 1 / |\\beta_j|^\\gamma in adaptive LASSO, the authors proposed to use w_j = (1-\\tau)*I(\\beta_j <= 0) + \\tau*I(\\beta_j > 0) as the new weight. The weight takes either \\tau or 1-\\tau depending on the sign of initial coefficient. Since the weight is between 0 and 1, the following problem is solved: if the absolute of the initial coefficient is less than 1, the weight will be large leading to increase in bias. The authors did extensive simulation studies and a real study, and showed that the proposed LQSSO outperforms LASSO and adaptive LASSO in terms of Relative Prediction Error of Y and estimation bias of \\beta. The work is well-organized and the numerical studies are informative. Comparing to the work of Zou (2006), I would like to see the followings addressed:

1. When \\tau=0.5, LQSSO is equivalent to LASSO. Since LASSO doesn't have oracle property in general, how could the oracle property of LQSSO be established? Is there some non-trivial necessary condition missing?

2. Why is the proposed weight superior? Or break down to (1) Why does considering the sign of initial coefficients help? e.g., Given \\tau=0.2, w=0.8 when the initial coefficient (usually OLS or Ridge estimates) is negative and w=0.2 when it is positive. (2) If the goal is to constrain the weight between 0 and 1, how about sth like w_j = 1 / (1 + exp(|\\beta|))?

3. Please provide literature or numerical or theoretical support for the claim: If the absolute of the initial coefficient is less than 1, the weight will be large leading to increase in bias.

4. Page 6: ''This is in contrast with other penalty proposed so far in which the response's quantile did not play a role in the penalty.'' So, how does LQSSO include Y's quantile in the penalty with w_j = (1-\\tau)*I(\\beta_j <= 0) + \\tau*I(\\beta_j > 0)?

5. Fig 1 is a nice illustration, while (1) Given one \\tau, suppose it is the \\tau selected after tuning, the constraint region is still a trapezoid, why does it provide more comprehensive insight than LASSO? (2) Please also plot a figure similar to Fig 1(c) of Zou (2006). I guess it is a horizontal shift of Fig 1(c). So this is related to Question 2, what does LQSSO gain or lose from the shift?

6. Please check statements and language, some are confusing. e.g., Page 7, last paragraph: ''when the initial weights for the coefficients in the adaptive LASSO are less than one'', what does it refer to, w_j or \\beta_j?

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Feb 16;18(2):e0266267. doi: 10.1371/journal.pone.0266267.r002

Author response to Decision Letter 0

2 Nov 2021

Responses to Reviewer #1

Major comments:

1) In the simulation studies section, the authors compared RPEs in Table 1 and 4, bias in Table 2 and 4, number of (in)correct selected variables in Table 5. However, the authors only provided the standard errors for RPEs in Table 1 and 4. I am wondering whether the authors could also provide standard errors for bias and number of (in)correct selected variables in other tables.

The standard errors have been reported for the biases and the number of (in)correct selected variables and they are added in the relevant tables.

2) For number of (in)correct selected variables in Table 5, I would recommend using the mean and standard errors instead of median, just to be consistent with the previous 4 tables. Are there any reasons to use median here?

Following Zou (2006), because the true numbers are inherently integer values, we prefer to report the median for the number of (in)correct selected variables. Moreover, as suggested, the standard errors of those numbers were also reported in the related tables.

3) Similarly, in Table 6, please provide the standard error for the cross validation so that audience can know the variability of the metric

The standard errors for the cross-validations were also reported in Table 6. We appreciate your suggesting this point.

Minor comments:

1) The 3rd paragraph in the Introduction section (From "One of the existing" to "for the coefficients") talks about the potential Bayesian understanding for the new approach. I would recommend moving it to the Discussion section or cut it shorter as it is beyond the function of the "Introduction" section.

Our Bayesian view of proposed idea became shorter in the Introduction, and more comments are added in the Conclusion section.

2) I would strongly recommend the authors to make the code of new method publicly available so that people from a broader community can utilize.

Thanks for this recommendation. However, we currently cannot share our code publicly because we’re still working on it for other research studies. Notably, we are improving the idea of quantile regression and mixed quantile regression with lqsso penalty.

3) The figures need to be improved. For example, in Fig 1, the scale has been distorted. In Fig 2, please get rid of the "_" in the figure labels.

The scales set in figure 1. Also, we corrected the order of \\tau=0.2, 0.5 and 0.8 in this figure. Also, we removed the “_” sign in the Y label of the current version of figure 2.

4) In Table 1 to Table 6, I think the authors are providing the mean values. Please specify that instead of just putting "values" and let the readers turn back and forth to find out what the values mean.

Those were good points. Now, we replaced the “values” by the “mean values” in Table 1 to Table 6 except in Table 5, which is, in fact, the median.

Responses to Reviewer #2

1) When \\tau=0.5, LQSSO is equivalent to LASSO. Since LASSO doesn't have oracle property in general, how could the oracle property of LQSSO be established? Is there some non-trivial necessary condition missing?

That is true that the Lqsso with \\tau=0.5 is equivalent to Lasso graphically but this is not in reality. The reason is that the lqsso is approximately equal to the lasso. Specifically, the main difference between the two is that lqsso consists of the indicator function of OLS coefficients in the penalty component, making it different from LASSO if \\tau=0.5.

2) Why is the proposed weight superior? Or break down to (1), Why does considering the sign of initial coefficients help? e.g., Given \\tau=0.2, w=0.8 when the initial coefficient (usually OLS or Ridge estimates) is negative and w=0.2 when it is positive. (2) If the goal is to constrain the weight between 0 and 1, how about sth like w_j = 1 / (1 + exp(|\\beta|))?

Compared with your proposed penalty w_j = 1 / (1 + exp(|\\beta|)), lqsso relies on check function (\\rho _{\\tau}(.)). So it has closed-form and many representations to be used in various circumstances such as implementing an algorithm, invoking in a geometrical view, recalling a Bayesian view, using for proof of Oracle property, and calculating KKT conditions (closed form) in the quantile regression with lqsso penalty. Other representations are as follows.

The signs show the direction of the polygon in the figure and OLS coefficients in the penalty part. More precisely, OLS coefficients are based on a density function f(y|X), so it considers the mass not only at the middle of the density but also at their two tails, especially when f(.) is asymmetric. Hence it is more flexible more than lasso and adaptive.

Plot C in Zou (2006) is as follows for lqsso (black points). Note that Zou (2006) indicated them by a line in his graph, but we preferred to plot them as points for better illustration purposes. Ali, it is not clear and lasso (Green line). It shows according to the \\tau value and its sign, lqsso line how much is closer to the red line and also on the positive or negative side.

Moreover, one can use the ALD as a prior distribution for the initial coefficients and obtain the estimations of the lqsso via computing the MAP estimator. This is the case if one considers the ALD as the prior distributions for the initial coefficients. Johnstone and Titterington (2009) also presented a Bayesian view to the lasso model by setting up the prior density for the coefficients as the Standard Laplace distribution. To do so, the zero coefficients will have a high probability compared to the non-zeros, and this feature would guarantee the sparsity. The plot given below shows this exciting property. To the best of our knowledge, the adaptive lasso has not been treated from a Bayesian perspective. We expect this gap can be filled via considering a re-scaled Standard Laplace distribution. This topic will be tackled in our future research.

Let us go back to your question, i.e., the effect of the signs. As mentioned, we should initially fix \\tau, e.g., via invoking the cross-validation technique. Now, assume that \\tau=0.8, where \\beta_1>0 and less than 1 (consequently w_1>1). Then, we could assign a probability for this true coefficient and a reasonable probability for zero coefficients, i.e., sparsity. This idea owns to the signs.

Moreover, our weights will still be between 0 and 1. Note that it advocates the highest probability for zero for the adaptive case, as seen in the plot.

As a summary, the table below indicates where our proposed method shows superiority compared with other methods.

Property Lasso Adaptive Lqsso

Oracle Yes Yes

Bayesian view and rationale Yes Yes

Use lasso algorithm Yes Yes Yes

Weights are based on random variables & observations Yes Yes

Geometry aspect Yes Yes

3) Please provide literature or numerical or theoretical support for the claim: If the absolute of the initial coefficient is less than 1, the weight will be large leading to increase in bias.

The rationale behind this claim includes the following sequential statements:

a) If the absolute of the initial coefficient is less than 1, the weight will be large.

b) In a regular regression (OLS), there is no penalty part, bias is low and variance is high. By using a penalty, we sacrifice bias (has increased) to reduce variance i.e., bias-variance tradeoff. Hence, with adding a penalty, bias increases.

c) According to Buhlmann and Van De Geer (2011), if |\\beta_ j,initial| is large, the adaptive Lasso employs a minor penalty (i.e. little shrinkage) for the jth coefficient (b_j) which implies less bias.

d) The topic has not already been appropriately addressed in the literature. Albeit, Zou (2006), B{\"u}hlmann and Van De Geer (2011) and Fan et al. (2009) suggested some easy solutions such as taking the initials coefficients equal to zero if that particular circumstance occurs.

So, considering a&b or (c&d, with less weights) support our claim.

4) Page 6: ''This is in contrast with other penalty proposed so far in which the response's quantile did not play a role in the penalty.'' So, how does LQSSO include Y's quantile in the penalty with w_j = (1-\\tau)*I(\\beta_j <= 0) + \\tau*I(\\beta_j > 0)?

We should point out that our proposed idea is not directly based on response's quantile. But, generally, the regression modeling set-up is related to the quantity E(y|X,\\beta,\\tau) which takes the randomness of Y into account. Moreover, we do see that the penalty term utilized the estimate of the OLS coefficients and these latter values are derived via the density function f(y|X), or directly speaking the randomness of Y. Hence, our considered check function, i.e. \\rho, inherits Y’s quantile.

5) Fig 1 is a nice illustration, while (1) Given one \\tau, suppose it is the \\tau selected after tuning, the constraint region is still a trapezoid, why does it provide more comprehensive insight than LASSO? (2) Please also plot a figure similar to Fig 1(c) of Zou (2006). I guess it is a horizontal shift of Fig 1(c). So this is related to Question 2, what does LQSSO gain or lose from the shift?

The answer to this comment is provided in replying to the comment 2.

6) Please check statements and language, some are confusing. e.g., Page 7, last paragraph: ''when the initial weights for the coefficients in the adaptive LASSO are less than one'', what does it refer to, w_j or \\beta_j?

The correct sentence is “when the initial coefficients for the weights in the adaptive lasso are less than one.” We corrected this in the revised version.

The best regards

Mousa

Attachment

Submitted filename: Response to Reviewers.docx

Click here for additional data file.^{(66.9KB, docx)}

PLoS One. doi: 10.1371/journal.pone.0266267.r003

Decision Letter 1

Xiaoyu Song

26 Dec 2021

PONE-D-21-27670R1Regression Shrinkage and Selection via Least Quantile Shrinkage and Selection OperatorPLOS ONE

Dear Dr. Golalizadeh,

Some of the responses provided in this revision are not well organized, vaguely stated, and hard to understand. None of the important points (Reviewer 2 Questions 1-4) are addressed in the manuscript. I also suggest they have a professional editing of the language, making the statements clear and coherent.

Please submit your revised manuscript by 5/22/2022. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

We look forward to receiving your revised manuscript.

Kind regards,

Xiaoyu Song

Academic Editor

PLOS ONE

Journal Requirements:

Additional Editor Comments (if provided):

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: No

**********

6. Review Comments to the Author

Reviewer #1: (No Response)

Reviewer #2: Thanks for the explanations in the response. Overall, it would be better if (1) The explanations in response could be more organized and polished, e.g., no need to include discussions between authors, and please explain everything clearly and to the very point (especially for Answers 1-3). (2) Most of the explanations are informative and convincing, thus could be addressed or incorporated in the main text or supplement, so that readers like me will be convinced.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

PLoS One. 2023 Feb 16;18(2):e0266267. doi: 10.1371/journal.pone.0266267.r004

Author response to Decision Letter 1

8 Feb 2022

As we understood, the Reviewer 1is fully satisfied with the changes made in our first revised paper as well as the responses to her/his comments. The Reviewer 2, but, was mostly concerned about the questions (we assume those were default of the PlOS-One) 2 and 6 in your previous email. She (he) was partly happy with the question 2 “2. Is the manuscript technically sound, and do the data support the conclusions?”. Hence, we attempted to improve the paper to fit with this request. Considering the question 6 “Review Comments to the Author “, we tried to fit ourselves with her/his request in the points (1) and (2) and set all necessary materials in the paper, particularly where the concepts should have been cleared. We hope the changes made in those relevant pages, sections and expressions are well enough to convince her/him.

Attachment

Submitted filename: Response to Reviewers.docx

Click here for additional data file.^{(15.7KB, docx)}

PLoS One. doi: 10.1371/journal.pone.0266267.r005

Decision Letter 2

Xiaoyu Song

18 Mar 2022

Regression Shrinkage and Selection via Least Quantile Shrinkage and Selection Operator

PONE-D-21-27670R2

Dear Dr. Golalizadeh,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Xiaoyu Song

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #2: Yes

**********

6. Review Comments to the Author

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

PLoS One. doi: 10.1371/journal.pone.0266267.r006

Acceptance letter

Xiaoyu Song

29 Sep 2022

PONE-D-21-27670R2

Regression shrinkage and selection via least quantile shrinkage and selection operator

Dear Dr. Golalizadeh:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Xiaoyu Song

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Fig. Plot of the geometry of the lqsso with different τ.

(TIFF)

Click here for additional data file.^{(13.3KB, tiff)}

S2 Fig. Plot of a comparison among the lqsso, lasso, and adaptive lasso.

Plot of the lqsso penalty in comparison with the lasso and adaptive lasso penalty according to the plot C in Zou [6].

(TIFF)

Click here for additional data file.^{(14.2KB, tiff)}

S3 Fig. Plot of the Bayesian view of the lqsso penalty.

Plot of the lqsso penalty in comparison with the lasso and adaptive lasso penalty based on Bayesian approach.

(TIFF)

Click here for additional data file.^{(11.2KB, tiff)}

S4 Fig. The RPEs comparison of the three methods.

Compare methods in terms of the RPE criterion using different models, sample sizes (n) and the standard deviation (sd) of the error term.

(PNG)

Click here for additional data file.^{(14.8KB, png)}

S5 Fig. The biases comparison of the three methods.

To compare methods in terms of the bias criterion using different models, sample sizes (n) and the standard deviation (sd) of the error term.

(TIFF)

Click here for additional data file.^{(15.7KB, tiff)}

S6 Fig. The initial ridge coefficients.

The plot shows the ridge estimates for the coefficients of the corresponding ridge regression model.

(TIFF)

Click here for additional data file.^{(371.6KB, tiff)}

Attachment

Submitted filename: Response to Reviewers.docx

Click here for additional data file.^{(66.9KB, docx)}

Attachment

Submitted filename: Response to Reviewers.docx

Click here for additional data file.^{(15.7KB, docx)}

Data Availability Statement

https://cran.r-project.org/web/packages/flare/index.html.

[pone.0266267.ref001] 1. Friedman J, Hastie T, Tibshirani R. The Elements of Statistical Learning. New York: Springer Series in Statistics; 2001. [Google Scholar]

[pone.0266267.ref002] 2. Breiman Leo. Better subset regression using the nonnegative garrote. Technometrics. 1995. Nov;37(4):373–384. doi: 10.1080/00401706.1995.10484371 [DOI] [Google Scholar]

[pone.0266267.ref003] 3. Hoerl AE, Kennard RW. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics. 1970. Feb;12(1):55–67. doi: 10.1080/00401706.1970.10488634 [DOI] [Google Scholar]

[pone.0266267.ref004] 4. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological). 1996. Jan;58(1):267–288. doi: 10.1111/j.2517-6161.1996.tb02080.x [DOI] [Google Scholar]

[pone.0266267.ref005] 5. Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: series B (Statistical Methodology). 2005. Apr;67(2):301–320. doi: 10.1111/j.1467-9868.2005.00503.x [DOI] [Google Scholar]

[pone.0266267.ref006] 6. Zou H. The adaptive lasso and its oracle properties. Journal of the American Statistical Association. 2006. Dec;101(476):1418–1429. doi: 10.1198/016214506000000735 [DOI] [Google Scholar]

[pone.0266267.ref007] 7. Tibshirani R. Regression shrinkage and selection via the lasso: a retrospective. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2011. Jun;73(3):273–282. doi: 10.1111/j.1467-9868.2011.00771.x [DOI] [Google Scholar]

[pone.0266267.ref008] 8. Bühlmann P, Van De Geer S. Statistics for High-Dimensional Data: Methods, Theory and Applications. Berlin: Springer Science & Business Media; 2011. [Google Scholar]

[pone.0266267.ref009] 9. Fan J, Feng Y, Wu Y. Network exploration via the adaptive LASSO and SCAD penalties. The Annals of Applied Statistics. 2009. Jun;3(2):521–541. doi: 10.1214/08-AOAS215SUPP [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0266267.ref010] 10. Koenker R, Machado JA. Goodness of fit and related inference processes for quantile regression. Journal of the American Statistical Association. 1999. Dec;94(448):1296–1310. doi: 10.1080/01621459.1999.10473882 [DOI] [Google Scholar]

[pone.0266267.ref011] 11. Koenker R, Bassett G Jr. Regression quantiles. Econometrica: Journal of the Econometric Society. 1978. Jan;46(1):33–50. doi: 10.2307/1913643 [DOI] [Google Scholar]

[pone.0266267.ref012] 12. Wu Y, Liu Y. Variable selection in quantile regression. Statistica Sinica. 2009. Apr;19(2):801–817. [Google Scholar]

[pone.0266267.ref013] 13. Ciuperca Gabriela. Adaptive LASSO model selection in a multiphase quantile regression. Statistics. 2016. Sep;50(5):1100–1131. doi: 10.1080/02331888.2016.1151427 [DOI] [Google Scholar]

[pone.0266267.ref014] 14. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association. 2001. Dec;96(456):1348–1360. doi: 10.1198/016214501753382273 [DOI] [Google Scholar]

[pone.0266267.ref015] 15. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software. 2010. Aug;33(1):1–22. doi: 10.18637/jss.v033.i01 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0266267.ref016] 16. Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression. The Annals of Statistics. 2004. Apr;32(2):407–499. doi: 10.1214/009053604000000067 [DOI] [Google Scholar]

[pone.0266267.ref017] 17. Geyer CJ. On the asymptotics of constrained M-estimation. The Annals of Statistics. 1994. Dec;22(4):1993–2010. doi: 10.1214/aos/1176325768 [DOI] [Google Scholar]

[pone.0266267.ref018] 18. Fu W, Knight K. Asymptotics for lasso-type estimators. The Annals of Statistics. 2000. Oct;28(5):1356–1378. doi: 10.1214/aos/1015957397 [DOI] [Google Scholar]

[pone.0266267.ref019] 19. Scheetz TE, Kim KY, Swiderski RE, Philp AR, Braun TA, Knudtson KL, et al. Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proceedings of the National Academy of Sciences. 2006. Sep;103(39):14429–14434. doi: 10.1073/pnas.0602562103 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0266267.ref020] 20. Wang S, Xiang L. Two-layer EM algorithm for ALD mixture regression models: A new solution to composite quantile regression. Computational Statistics & Data Analysis. 2017. Nov;115:136–154. doi: 10.1016/j.csda.2017.06.002 [DOI] [Google Scholar]

[pone.0266267.ref021] 21. Li X, Zhao T, Yuan X, Liu H. The flare package for high dimensional linear regression and precision matrix estimation in R. J. Mach. Learn. Res. 2015. Mar;16:553–557. [PMC free article] [PubMed] [Google Scholar]

[pone.0266267.ref022] 22. Johnstone IM, Titterington DM. Statistical challenges of high-dimensional data. Philosophical Transactions of the Royal Society A. 2009. Nov;367(1906):4237–4253. doi: 10.1098/rsta.2009.0159 [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Regression shrinkage and selection via least quantile shrinkage and selection operator

Alireza Daneshvar

Golalizadeh Mousa

Roles

Abstract

Introduction

Lqsso; a novel method of variable selection

Definition

The geometry of lqsso

The algorithm of lqsso

Oracle properties of lqsso

Simulation studies

Table 1. The mean values of RPEs for Model 1 and Model 2.

Table 2. The mean values of bias for Model 1 and Model 2.

Table 3. The mean values of RPEs for Model 3, Model 4 and Model 5.

Table 4. The mean values of bias for Model 3, Model 4 and Model 5.

Table 5. Median of the number of (in)correctly selected variables for Model 5.

Real data analysis

Table 6. Estimated mean values for the coefficients in rat eye dataset.

Conclusion

Supporting information

Acknowledgments

Data Availability

Funding Statement

References

Decision Letter 0

Xiaoyu Song

Roles

Author response to Decision Letter 0

Decision Letter 1

Xiaoyu Song

Roles

Author response to Decision Letter 1

Decision Letter 2

Xiaoyu Song

Roles

Acceptance letter

Xiaoyu Song

Roles

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases