Concave 1-norm group selection

Dingfeng Jiang; Jian Huang

doi:10.1093/biostatistics/kxu050

Biostatistics. 2014 Nov 21;16(2):252–267.

PMCID: PMC4441102 PMID: 25417206

Abstract

Grouping structures arise naturally in many high-dimensional problems. Incorporation of such information can improve model fitting and variable selection. Existing group selection methods, such as the group Lasso, require correct membership. However, in practice it can be difficult to correctly specify group membership of all variables. Thus, it is important to develop group selection methods that are robust against group mis-specification. Also, it is desirable to select groups as well as individual variables in many applications. We propose a class of concave 1-norm group penalties that is robust to grouping structure and can perform bi-level selection. A coordinate descent algorithm is developed to calculate solutions of the proposed group selection method. Theoretical convergence of the algorithm is proved under certain regularity conditions. Comparison with other methods suggests the proposed method is the most robust approach under membership mis-specification. Simulation studies and real data application indicate that the 1-norm concave group selection approach achieves better control of false discovery rates. An R package grppenalty implementing the proposed method is available at CRAN.

Keywords: Bi-level selection, Concave penalties, Coordinate descent, Sparse group Lasso, p > n problems

1. Introduction

Grouping structures exist in many high-dimensional problems. For example, genes in the same biological pathway naturally form a group. In genome-wide association studies, single-nucleotide polymorphisms (SNPs) from the same exon can also be considered as a group. Typically, variables of the same membership share similar characteristics. Hence higher within-group correlations are often observed for group members. Incorporation of group information can improve model fitting and lead to better interpretation. Possible applications of grouped approaches include, but are not limited to, (i) genetic studies assessing association between biomarkers (such as gene expression level, SNP mutation, and copy number variation) and phenotypes of interest; and (ii) studies using multiple questions (instruments) to measure a particular feature. For example, multiple cognitive tests are generally used in Alzheimer's studies to quantify cognitive function. The membership of variables can be determined either by analytical methods or by knowledge from the relevant scientific field.

Yuan and Lin (2006) proposed the group Lasso for group selection. Meier and others (2008) extended the method to logistic regression and applied it to detect splice sites in DNA sequences. Breheny and Huang (2009) proposed a class of bi-level selection methods using concave composite penalties. Huang and others (2009) proposed a group bridge approach for group and individual variable selection. Friedman and others (2010) and Simon and others (2012) proposed the sparse group Lasso (SGL). The SGL bridges the individual selection feature of the Lasso and the group selection nature of the group Lasso via a convex combination. Huang and others (2012) reviewed several group selection methods, including the 2-norm concave group selection methods. Both the group Lasso and the concave 2-norm group penalty select variables at the group level, that is, the members of a group are either all selected or all dropped. Therefore, the grouping structure has a great impact on the results. The true grouping structure is, however, difficult to specify or not available in many applications. Hence, it is important to develop a group selection method that is robust to possibly mis-specified membership.

We propose a class of concave 1-norm group selection methods for high-dimensional linear and generalized linear models in which the number of covariates can exceed the sample size. These methods have two attractive features. First, they are capable of selecting variables at both group and individual levels, that is, they have the bi-level selection property. Second, they are robust against possibly mis-specified grouping structure. These methods can be efficiently implemented via a coordinate descent type algorithm. Our convergence analysis shows that this algorithm is guaranteed to converge to a minimum of the objective function.

The rest of the article is organized as follows. Section 2 first provides a brief review of related penalties, then proposes the concave 1-norm group penalty. Section 2.3 shows the robustness by two examples and establishes the bi-level selection feature by two propositions. Section 3 details the computation of the concave 1-norm group penalized solution using the coordinate descent algorithm (CDA). Section 3.3 extends the concave 1-norm group penalty to generalized linear models (GLMs) and develops an algorithm for computing solutions in GLMs based on the majorization minimization (MM) approach and CDA. Section 3.4 establishes the theoretical convergence of the CDA for linear models and GLMs. Section 4 performs simulation studies to understand the robustness of the concave 1-norm group penalty and compares it with the concave 2-norm group penalty and the SGL. A comprehensive comparison with related methods is also conducted to study the empirical behavior of the proposed method. Section 5 applies the 1-norm group penalty to a motivating example and compares the results with other methods. Section 6 concludes the article with a discussion.

2. Methods

2.1. A brief review of group penalties

We briefly review the existing group selection methods, namely the group Lasso, the concave 2-norm group penalty and the sparse group Lasso (SGL). Denote the coefficients of a group of variables as $\theta = (\theta_1, \ldots, \theta_d)^T$ and let $d$ be its dimensionality. The group Lasso (Yuan and Lin, 2006) is defined as

$$\lambda \sqrt{d}\, \|\theta\|_2, \qquad (2.1)$$

with $\|\cdot\|_2$ being the Euclidean norm. When the group size $d = 1$, the group Lasso reduces to the Lasso penalty. By imposing a concave penalty $\rho$ on the Euclidean norm of $\theta$, Huang and others (2012) proposed the concave 2-norm group penalty, which has the form

$$\rho(\|\theta\|_2; \sqrt{d}\,\lambda, \gamma). \qquad (2.2)$$

The concave 2-norm group penalty reduces to the standard concave penalty when $d = 1$. The group Lasso can be viewed as a special case of the 2-norm concave group penalty with $\gamma = \infty$. Both the group Lasso and the concave 2-norm group penalty rely on the non-differentiability of $\|\theta\|_2$ at $\theta = 0$ to perform group-only selection. The tuning parameter $\lambda$ controls model sparsity. As group selection procedures, both the group Lasso and the 2-norm concave group penalty are sensitive to the specified membership. The SGL (Friedman and others, 2010; Simon and others, 2012) uses the penalty function

$$(1 - \alpha)\,\lambda \sqrt{d}\, \|\theta\|_2 + \alpha \lambda \|\theta\|_1, \qquad (2.3)$$

with $\|\theta\|_1 = \sum_{k=1}^{d} |\theta_k|$, i.e., the 1-norm of $\theta$, and $\theta_k$ the coefficient of an individual group member. The convex combination of the group Lasso and Lasso imposes both group and individual sparsity. The tuning parameter $\lambda$ controls the degree of sparsity and $\alpha \in [0, 1]$ controls the weight of the group Lasso and Lasso. The SGL becomes the Lasso when $\alpha = 1$ and the group Lasso when $\alpha = 0$. According to our results, the SGL is also sensitive to mis-specified group information. This may be due to the group Lasso component. The SGL method seems to prefer a larger model compared with the 1-norm group penalty, which could be related to the rate consistency property of the Lasso under the sparse Riesz condition (Zhang and Huang, 2008).

2.2. Concave 1-norm group penalty

Consider a linear regression model, $y = X\beta + \varepsilon$, where $y$ is an $n \times 1$ response vector, $X$ is an $n \times p$ design matrix and $\varepsilon$ is the vector of error terms. Here $\beta = (\beta_1, \ldots, \beta_p)^T$ is the coefficient vector. We are interested in cases where $p > n$ and $\beta$ is sparse in the sense that many of its elements are zero. Denote the $k$th covariate vector by $x_k$. Without loss of generality, we assume the response is centered and the covariates are standardized so that $x_k^T x_k / n = 1$. We also assume that the covariates are divided into $J$ groups and the size of the $j$th group is $d_j$. Denote the coefficients of the $j$th group by $\beta^{(j)}$ and the corresponding design matrix by $X^{(j)}$.

The concave 1-norm group penalized least squares criterion is defined as

$$\frac{1}{2n} \|y - X\beta\|_2^2 + \sum_{j=1}^{J} \rho\bigl(\|\beta^{(j)}\|_1; d_j \lambda, \gamma\bigr). \qquad (2.4)$$

The penalty level for the $j$th group is $d_j \lambda$, which is proportional to its group size. This avoids the situation where large groups overwhelm small groups.

Multiple penalties can be chosen for $\rho$. We use the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li, 2001) and the minimax concave penalty (MCP) (Zhang, 2010). Both the SCAD and MCP gradually reduce the degree of penalization for large coefficients. Such (nearly) unbiased estimation of coefficients enables the SCAD and MCP to correctly select important variables and estimate their coefficients with high probability under certain sparsity conditions and other appropriate regularity conditions, a property known as the oracle property. The SCAD penalty function is $\rho(t; \lambda, \gamma) = \lambda \int_0^{|t|} \bigl\{ 1(x \le \lambda) + \frac{(\gamma\lambda - x)_+}{(\gamma - 1)\lambda}\, 1(x > \lambda) \bigr\}\, dx$ with $\lambda \ge 0$ and $\gamma > 2$. Here $1(\cdot)$ is the indicator function and $x_+$ denotes the non-negative part of $x$. The MCP is defined as $\rho(t; \lambda, \gamma) = \lambda \int_0^{|t|} \bigl(1 - x/(\gamma\lambda)\bigr)_+\, dx$ for $\lambda \ge 0$ and $\gamma > 1$. The regularization parameter $\gamma$ controls the concavity of both penalties, with smaller $\gamma$ being more concave. When $\gamma \to \infty$, both the SCAD and MCP reduce to the Lasso penalty. Throughout the article, we abbreviate the 1-norm group SCAD and MCP as the 1-norm gSCAD and 1-norm gMCP.
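The two penalties have simple closed forms once the integrals are evaluated. The sketch below is illustrative only: the function names are hypothetical (they are not the grppenalty API), and the group-level scalings (penalty level `d * lam` for the 1-norm group penalty, `sqrt(d) * lam` for the 2-norm version) are assumptions taken from the surrounding description rather than from the package source.

```python
def scad(t, lam, gam):
    """SCAD penalty rho(t; lam, gam) in closed form (requires gam > 2)."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= gam * lam:
        return (2.0 * gam * lam * t - t * t - lam * lam) / (2.0 * (gam - 1.0))
    return lam * lam * (gam + 1.0) / 2.0


def mcp(t, lam, gam):
    """Minimax concave penalty rho(t; lam, gam) in closed form (requires gam > 1)."""
    t = abs(t)
    if t <= gam * lam:
        return lam * t - t * t / (2.0 * gam)
    return gam * lam * lam / 2.0


def group_penalty_1norm(beta_group, lam, gam, rho=mcp):
    """Concave penalty applied to the 1-norm of a group.

    Assumption: the group penalty level is d * lam, with d the group size.
    """
    d = len(beta_group)
    l1 = sum(abs(b) for b in beta_group)
    return rho(l1, d * lam, gam)


def group_penalty_2norm(beta_group, lam, gam, rho=mcp):
    """Concave penalty applied to the Euclidean norm of a group.

    Assumption: the group penalty level is sqrt(d) * lam.
    """
    d = len(beta_group)
    l2 = sum(b * b for b in beta_group) ** 0.5
    return rho(l2, d ** 0.5 * lam, gam)
```

Both penalties grow like the Lasso penalty near zero and flatten out for large arguments, which is the source of the reduced bias for large coefficients.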

Figure 1 shows solution paths of the 1-norm and 2-norm gMCP, group Lasso, and SGL for a simple example. The example has four groups of variables, with the first group containing one zero and two non-zero coefficients and the remaining three groups having all-zero coefficients. The bold solid line is the path of the zero coefficient (the 1st member) and the bold dashed lines are the paths of the non-zero coefficients (the 2nd and 3rd members) in the 1st group. The dotted lines are the paths of the remaining variables. The 1-norm gMCP and the SGL have bi-level selection features with properly chosen tuning parameters, while the 2-norm gMCP and the group Lasso perform group selection only.

Fig. 1. Solution paths of different group selection methods. The bold solid line is the path of the zero coefficient (the 1st member) and the bold dashed lines are the paths of the non-zero coefficients (the 2nd and 3rd members) in the first group. The dotted lines are the paths of the remaining variables. The 1-norm gMCP and the SGL have bi-level selection features with properly chosen tuning parameters, while the 2-norm gMCP and group Lasso perform group selection only.

2.3. Properties of the concave 1-norm group penalty

2.3.1. Robustness to mis-specified grouping structure

An advantage of the concave 1-norm group penalty is its robustness to mis-specified group information. This property is closely related to the bi-level selection feature discussed later. A group selection method, such as the 2-norm group penalty, has only two possible estimates for a group: either all of its coefficients are zero or all of them are non-zero. This obviously puts the 2-norm group penalty at a disadvantage when some null variables are mis-grouped with one or more non-zero variables. The method either misses non-zero variables or falsely identifies null variables as causal ones.

Table 1 illustrates the limitation of the 2-norm gMCP in some settings. Example A shows that the 2-norm gMCP fails to identify two null variables, x6–x7, when they are mis-grouped with a causal variable x5. This leads to the false discovery of x6–x7 as causal variables by the 2-norm gMCP method, while the 1-norm gMCP identifies x6–x7 correctly as null variables. Example B shows that when a null variable x4 and a causal one x5 are mis-grouped with a causal one x6, the 2-norm gMCP fails to identify x5–x6 as causal variables, while the 1-norm gMCP fails to identify only x6 as a causal variable.

Table 1.

Two examples

Example	Variables	True structure	Working structure	True value	2-norm gMCP estimate	1-norm gMCP estimate
A	x1	1	1	0.4	0.285	0.311
	x2	1	1	0.4	0.382	0.416
	x3	1	1	0.4	0.390	0.424
	x4	1	1	0.3	0.297	0.327
	x5	1	2	0.2	0.214	0.156
	x6	2	2	0	-0.023	0
	x7	2	2	0	-0.041	0
B	x1	1	1	0.3	0.227	0.212
	x2	1	1	0.3	0.338	0.305
	x3	1	1	0.3	0.345	0.309
	x4	1	2	0	0	0
	x5	1	2	0.1	0	0.093
	x6	2	2	0.05	0	0

Example A shows that the 2-norm gMCP falsely identifies the null variables x6–x7 as causal ones due to mis-specification. Example B shows that the 2-norm gMCP misses the causal variables x5–x6.

2.3.2. Bi-level selection feature

The following proposition shows that the concave 1-norm group penalty can have both zero and non-zero solutions within a group under proper conditions. Thus the method has the bi-level selection feature. Note that the right-hand side of the second expression could be zero. Under that scenario, the proposed penalty performs group selection only. Therefore, bi-level selection with the concave 1-norm group penalty requires properly chosen tuning parameters. We prefer a data-driven approach to select the optimal tuning parameters.

Proposition 1 (Bi-level selection) —

Let be the solution of the concave -norm group penalized regression as defined in (2.4), then a necessary condition for to be a minimizer is that

(2.5)

where is the first derivative of w.r.t. .
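As a hedged sketch, the kind of condition Proposition 1 refers to can be written out from first principles. The display below is not a transcription of equation (2.5); it is the coordinate-wise stationarity condition obtained under two assumptions: that the criterion is $\frac{1}{2n}\|y - X\beta\|_2^2 + \sum_j \rho(\|\beta^{(j)}\|_1; d_j\lambda, \gamma)$, and that $\rho$ is differentiable and non-decreasing in its first argument. The symbols $x_{jk}$ and $\hat\beta_{jk}$ (the $k$th covariate and coefficient of group $j$) are notation chosen here.

```latex
\begin{aligned}
\frac{1}{n}\, x_{jk}^{T}\bigl(y - X\hat\beta\bigr)
  &= \rho'\bigl(\|\hat\beta^{(j)}\|_1;\, d_j\lambda,\, \gamma\bigr)\,
     \operatorname{sgn}(\hat\beta_{jk}),
  && \hat\beta_{jk} \neq 0, \\
\Bigl|\frac{1}{n}\, x_{jk}^{T}\bigl(y - X\hat\beta\bigr)\Bigr|
  &\le \rho'\bigl(\|\hat\beta^{(j)}\|_1;\, d_j\lambda,\, \gamma\bigr),
  && \hat\beta_{jk} = 0.
\end{aligned}
```

For the MCP, $\rho'(t; d_j\lambda, \gamma) = (d_j\lambda - t/\gamma)_+$, so the right-hand side of the second line vanishes once $\|\hat\beta^{(j)}\|_1 \ge \gamma d_j \lambda$. In that case a zero coefficient inside a selected group would require an exactly zero correlation with the residual, which is consistent with the remark that the penalty can fall back to group-only selection.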

Proposition 2 (Invariance property of the gMCP) —

Given a group of standardized variables with size of , the gMCP has the following invariance property,

(2.6)

Notice that the 1-norm gSCAD does not have the invariance property. The proofs of both propositions are simple and thus omitted.

2.3.3. Model sparsity of the 1-norm group penalty

The following proposition shows that at the same penalty level $\lambda$, the concave 1-norm group penalty has a higher group sparsity than the 2-norm group penalty. This means that, in order to achieve the same level of group sparsity, we need a larger $\lambda$ for the concave 2-norm group penalty.

Proposition 3 (Model sparsity of 1-norm group penalty) —

Let be the coefficients of a group of variables with dimensionality , then given the same penalty level , implies .

This proposition holds because Inline graphic by the Cauchy–Schwarz inequality.

3. Computation

3.1. Coordinate descent algorithm

Over the last few years, CDA has been shown to be an efficient approach for solving high-dimensional penalized regression problems such as the Lasso (Wu and Lange, 2008; Friedman and others, 2007, 2010). We apply the idea of CDA to compute the solutions of the concave 1-norm group selection problems.

Let Inline graphic . We want to update to and update to within the th group. That is we want to update to using the proceeding notation. CDA minimizes the criterion function (2.4)

(3.1)

as a function of Inline graphic . The solution of for the gSCAD and gMCP are

(3.2)

(3.3)

where Inline graphic with , . The notation for is the soft-thresholding operator (Donoho and Johnstone, 1994). The solution form of (3.2) and (3.3) resembles a simple soft-thresholding operator if we set . The reflects the grouping effect in the penalty.

CDA for the concave 1-norm group penalty: We summarize the CDA for computing the solution of the concave 1-norm group penalized regression as follows:

Step 1. Given an initial value of the coefficient vector, CDA computes the corresponding residual.
Step 2. For each coordinate of the first group in turn, CDA updates the coefficient by using (3.2) or (3.3). It then repeats the same process for the other groups until every coefficient has been updated.
Step 3. CDA checks the convergence criterion. If the algorithm has converged, CDA stops; otherwise it repeats Step 2 until convergence.
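The three steps can be sketched in Python for the 1-norm gMCP. This is a minimal illustration under stated assumptions, not the grppenalty implementation: it assumes the criterion $\frac{1}{2n}\|y - X\beta\|_2^2 + \sum_j \rho(\|\beta^{(j)}\|_1; d_j\lambda, \gamma)$ with centered $y$ and columns scaled so that $x_k^T x_k / n = 1$, and all function names are hypothetical. Instead of transcribing the closed forms (3.2) and (3.3), each coordinate update compares a small set of candidate minimizers (zero, the unpenalized value, and a rescaled soft-thresholded value), which solves the same one-dimensional problem exactly.

```python
import numpy as np


def mcp(t, lam, gam):
    """MCP value rho(t; lam, gam) for t >= 0."""
    if t <= gam * lam:
        return lam * t - t * t / (2.0 * gam)
    return 0.5 * gam * lam * lam


def update_coordinate(z, c, lam_j, gam):
    """Minimise 0.5 * (b - z)**2 + mcp(c + |b|; lam_j, gam) over b.

    z is the partial-residual correlation for this coordinate and c is the
    sum of absolute values of the other coefficients in the same group.
    """
    def f(b):
        return 0.5 * (b - z) ** 2 + mcp(c + abs(b), lam_j, gam)

    candidates = [0.0, z]          # kink at zero, flat region of the penalty
    tau = lam_j - c / gam          # threshold lowered by the group's 1-norm
    if tau > 0:
        soft = np.sign(z) * max(abs(z) - tau, 0.0)
        candidates.append(soft / (1.0 - 1.0 / gam))
    return min(candidates, key=f)


def objective(X, y, beta, groups, lam, gam):
    r = y - X @ beta
    pen = sum(mcp(np.abs(beta[g]).sum(), len(g) * lam, gam) for g in groups)
    return r @ r / (2.0 * len(y)) + pen


def cda_gmcp(X, y, groups, lam, gam=3.0, max_sweeps=200, tol=1e-8):
    """Coordinate descent for the assumed 1-norm gMCP criterion.

    groups is a list of lists of column indices; gam=3.0 is an arbitrary
    default chosen for this sketch.
    """
    assert gam > 1.0
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()     # Step 1: residual for the initial beta = 0
    history = [objective(X, y, beta, groups, lam, gam)]
    for _ in range(max_sweeps):
        max_change = 0.0
        for g in groups:           # Step 2: sweep groups, then coordinates
            lam_j = len(g) * lam
            for k in g:
                z = X[:, k] @ r / n + beta[k]
                c = max(np.abs(beta[g]).sum() - abs(beta[k]), 0.0)
                b_new = update_coordinate(z, c, lam_j, gam)
                if b_new != beta[k]:
                    r -= X[:, k] * (b_new - beta[k])
                    max_change = max(max_change, abs(b_new - beta[k]))
                    beta[k] = b_new
        history.append(objective(X, y, beta, groups, lam, gam))
        if max_change < tol:       # Step 3: convergence check
            break
    return beta, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((60, 6))
    X = (X - X.mean(0)) / X.std(0)
    y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(60)
    beta_hat, hist = cda_gmcp(X, y - y.mean(), [[0, 1, 2], [3, 4, 5]], lam=0.05)
    print(np.round(beta_hat, 3), len(hist) - 1, "sweeps")
```

Because every coordinate update is an exact minimization of the criterion along that coordinate, the recorded objective values in `history` cannot increase from one sweep to the next, which gives a cheap sanity check on any implementation.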

3.2. Solution surface

It is common practice to compute a solution path for a sequence of $\lambda$ values with a chosen $\gamma$ when applying the standard concave penalties. For example, in linear models it has been suggested that one use $\gamma = 3.7$ for the SCAD (Fan and Li, 2001) and the value recommended by Zhang (2010) for the MCP. For the proposed group penalties, it is not clear which $\gamma$ is appropriate. Therefore, we treat $\lambda$ and $\gamma$ both as tuning parameters and compute the solution surface over a rectangle of $(\lambda, \gamma)$ values.

Let Inline graphic and the grid values of a rectangle in to be and . The number of grid points and are pre-specified with . It can be shown that . We let , with if and otherwise. Denote the solution corresponding to as . We first compute with as the initial value. Then for a given , we compute Inline graphic by using as the initial value. The solution surface calculated in this manner is referred as the solution surface along . In general, it provides a smoother fit than other alternatives. For more details of the solution surface along , we refer to Mazumder and others (2011) and Jiang and Huang (2014).

3.3. Extension to the generalized linear models

The concave 1-norm group penalty can be easily extended to other models by using a different loss function. In this article, we extend it to the GLM family, with a focus on the logistic model. For a GLM, the criterion is defined as

(3.4)

with Inline graphic being the vector of the th observation. The form depends on the specified GLM. For a logistic model . Direct application of CDA is possible but not stable for large in GLMs. Hence, we apply MM approach together with CDA to compute solutions of (3.4). The main idea of MM approach is to optimize a majorization function of Inline graphic such that each iteration forces downward until numerical minimum is reached. For more details about MM method, we refer to Hunter and Lange (2004), Hunter and Li (2005), and Lange and others (2000).

We assume the following two conditions hold in order to apply MM approach.

The second partial derivative of loss w.r.t. is uniformly bounded for standardized , i.e., there exists a real number such that for all , and .
, with being the second derivative of w.r.t .

For a logistic model, the condition (i) can be met by choosing Inline graphic . The condition (ii) is met by choosing for the gSCAD and for the gMCP. Some calculation shows that the coordinate-wise solution forms in GLM are as follows:

(3.5)

(3.6)

where Inline graphic , and being the first derivative of .
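To make the MM idea concrete for the logistic case, the sketch below replaces the logistic loss along one coordinate by a quadratic upper bound and then minimizes that bound plus the 1-norm gMCP term. Everything here is an assumption made for illustration rather than a transcription of (3.5) and (3.6): the loss is the average negative log-likelihood without an intercept, the curvature bound is $v = 1/4$ (because $\pi(1 - \pi) \le 1/4$ and $x_k^T x_k / n = 1$), the penalty level is $d_j \lambda$, and $\gamma > 1/v$ is required so that the surrogate stays convex on each side of zero. Function names are hypothetical.

```python
import numpy as np

V = 0.25  # assumed curvature bound for the logistic loss with standardized columns


def mcp(t, lam, gam):
    """MCP value rho(t; lam, gam) for t >= 0."""
    if t <= gam * lam:
        return lam * t - t * t / (2.0 * gam)
    return 0.5 * gam * lam * lam


def logistic_objective(X, y, beta, groups, lam, gam):
    eta = X @ beta
    loss = np.mean(np.logaddexp(0.0, eta) - y * eta)   # average negative log-likelihood
    pen = sum(mcp(np.abs(beta[g]).sum(), len(g) * lam, gam) for g in groups)
    return loss + pen


def mm_coordinate_step(u, c, lam_j, gam, v=V):
    """Minimise the surrogate 0.5 * v * (b - u)**2 + mcp(c + |b|; lam_j, gam)."""
    def h(b):
        return 0.5 * v * (b - u) ** 2 + mcp(c + abs(b), lam_j, gam)

    candidates = [0.0, u]
    tau = lam_j - c / gam
    if tau > 0:
        soft = np.sign(u) * max(abs(v * u) - tau, 0.0)
        candidates.append(soft / (v - 1.0 / gam))
    return min(candidates, key=h)


def mm_cda_logistic(X, y, groups, lam, gam=8.0, sweeps=50):
    """MM plus coordinate descent for the assumed logistic 1-norm gMCP criterion."""
    assert gam > 1.0 / V
    n, p = X.shape
    beta = np.zeros(p)
    eta = np.zeros(n)
    history = [logistic_objective(X, y, beta, groups, lam, gam)]
    for _ in range(sweeps):
        for g in groups:
            for k in g:
                prob = 1.0 / (1.0 + np.exp(-eta))
                grad = -X[:, k] @ (y - prob) / n        # derivative of the loss in beta_k
                u = beta[k] - grad / V                  # minimiser of the quadratic bound
                c = max(np.abs(beta[g]).sum() - abs(beta[k]), 0.0)
                b_new = mm_coordinate_step(u, c, len(g) * lam, gam)
                eta += X[:, k] * (b_new - beta[k])
                beta[k] = b_new
        history.append(logistic_objective(X, y, beta, groups, lam, gam))
    return beta, history
```

Since the quadratic bound touches the loss at the current value and lies above it elsewhere, minimizing the surrogate cannot increase the penalized objective, which is the descent property the MM approach relies on.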

3.4. Convergence analysis

Theorem 3.1 establishes that, under certain regularity conditions, CDA converges to a minimum of the objective function (2.4) for a concave 1-norm group penalized linear model. Theorem 3.2 states that the solution computed by the CDA and MM approach converges to a minimum of the objective function (3.4) for a concave 1-norm group penalized GLM. The proofs of both theorems are provided in the Appendix of the supplementary material available at Biostatistics online.

Theorem 3.1 (Convergence in linear model) —

Consider the objective function (2.4), where the given data lies on a compact set and no two columns of are identical. Suppose the penalty satisfies , is non-negative, uniformly bounded, with being the first derivative (assuming existence) of w.r.t .

Then the sequence generated by the CDA converges to a minimum of the function defined in (2.4).

Theorem 3.2 (Convergence in GLM) —

Consider the objective function (3.4), where the given data lie on a compact set and no two columns of are identical. Suppose the penalty satisfies , is non-negative, uniformly bounded, with being the first derivative (assuming existence) of w.r.t. . Also assume two conditions listed below hold.

The second partial derivative of loss w.r.t. is uniformly bounded for standardized , i.e., there exists a real number such that for all , and .

, with being the second derivative of w.r.t. .

Then the sequence generated by the aforementioned algorithm converges to a minimum of the function defined in (3.4).

4. Simulation studies

We first compare the 1-norm gMCP, 2-norm gMCP, and SGL under group mis-specification. A comprehensive comparison between the 1-norm gMCP and related penalties is then presented under correct group information. For both simulation studies, the penalized covariates . To avoid the complexity of tuning parameter selection, we use a validation approach to select the final model for comparison. That is, for each candidate value of the tuning parameters, we compute a predictive measure based on a validation dataset with . For a linear regression, we use the predictive mean square error (PMSE) defined as . For a logistic regression, we first compute the predictive probability by . Then based on , we compute the predictive area under the ROC curve (PAUC). For details of computing PAUC, we refer to Jiang and others (2013). The tuning parameter values corresponding to the smallest PMSE or the largest PAUC are selected for comparison across different methods.
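The validation step can be sketched as follows. This is an illustration under assumptions, since the exact definitions are not reproduced above: PMSE is taken to be the mean squared prediction error on the validation set, the predictive probability is the inverse-logit of the linear predictor without an intercept, and the AUC is the standard Mann–Whitney estimate, which may differ in detail from the computation in Jiang and others (2013). All function names are hypothetical.

```python
import numpy as np


def pmse(y_val, X_val, beta):
    """Mean squared prediction error on a validation set (assumed normalisation)."""
    resid = np.asarray(y_val, dtype=float) - np.asarray(X_val) @ np.asarray(beta)
    return float(np.mean(resid ** 2))


def predicted_probability(X_val, beta):
    """Inverse-logit of the linear predictor (no intercept in this sketch)."""
    return 1.0 / (1.0 + np.exp(-(np.asarray(X_val) @ np.asarray(beta))))


def auc(y_val, score):
    """Mann-Whitney estimate of the area under the ROC curve; ties count one half."""
    y_val = np.asarray(y_val)
    score = np.asarray(score, dtype=float)
    pos = score[y_val == 1]
    neg = score[y_val == 0]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))


def select_by_validation(candidates, measure, larger_is_better=False):
    """Pick the tuning-parameter key whose fitted beta has the best measure.

    candidates maps a tuning-parameter value (any hashable key) to a
    coefficient vector fitted on the training data.
    """
    scores = {key: measure(beta) for key, beta in candidates.items()}
    pick = max if larger_is_better else min
    best = pick(scores, key=scores.get)
    return best, scores[best]
```

For a linear model one would call `select_by_validation(fits, lambda b: pmse(y_val, X_val, b))`; for a logistic model the measure would be `lambda b: auc(y_val, predicted_probability(X_val, b))` with `larger_is_better=True`.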

4.1. Simulation with group mis-specification

Set Inline graphic , and , with , , , , and zero for the rest coefficients. Let , with being the covariance matrix for groups 1 and 2 and the covariance matrix for groups 3 and 4, and being the covariance matrix for group . We set such that within-group correlation is 0.5 and between-group correlation is Inline graphic for groups 1 and 2. Similarly, such that within-group correlation and between-group correlation are both 0.5 for groups 3 and 4. For , we choose a compound symmetry structure with . The working group information is mis-specified in the sense that the causal variables X9–X10 are grouped with the null variables X11–X20 and the causal variables X29–X30 are grouped with the null variables X31–X40. Hence, Inline graphic , , , and for the working group information.

Table 2 presents the results in linear and logistic models. We report the false discovery rate (FDR) as well as the percentage of X9–X20 and X29–X40 being selected over the Inline graphic replications. The results show that the 1-norm gMCP avoids the false-positive discovery of X11–X20 and X31–X40; hence, it achieves the lowest FDR. We do not report the results of the Lasso because of space limits and its similar performance to the SGL ().

Table 2.

Comparison of the 1-norm gMCP, 2-norm gMCP, and SGL with mis-specified group information

Model	Results	1-norm gMCP	2-norm gMCP	SGL	SGL	SGL	SGL
Linear	FDR	0.272	0.613	0.544	0.544	0.520	0.453
	Pct. X9	0.872	0.714	0.984	0.984	0.994	0.988
	Pct. X10	0.856	0.714	0.984	0.982	0.992	0.984
	Pct. range X11–X20	0.182–0.208	0.714	0.972–0.980	0.926–0.944	0.744–0.812	0.430–0.480
	Pct. X29	0.866	0.684	0.968	0.976	0.996	0.982
	Pct. X30	0.860	0.684	0.968	0.976	0.994	0.976
	Pct. range X31–X40	0.172–0.218	0.684	0.956–0.964	0.910–0.940	0.716–0.804	0.390–0.472
Logistic	FDR	0.167	0.369	0.452	0.438	0.429	0.477
	Pct. X9	0.274	0.372	0.652	0.654	0.686	0.730
	Pct. X10	0.260	0.372	0.654	0.660	0.684	0.724
	Pct. range X11–X20	0.054–0.100	0.372	0.634–0.652	0.594–0.640	0.534–0.596	0.364–0.422
	Pct. X29	0.284	0.372	0.624	0.620	0.656	0.704
	Pct. X30	0.288	0.372	0.624	0.620	0.648	0.692
	Pct. range X31–X40	0.064–0.092	0.372	0.606–0.620	0.578–0.604	0.530–0.588	0.348–0.428

The FDR and the percentages of X9–X20 and X29–X40 being selected are reported. Causal variables are X9–X10 (mis-grouped with the null variables X11–X20) and X29–X30 (mis-grouped with the null variables X31–X40). The 1-norm gMCP achieves the smallest FDR.

4.2. A comparison with related methods

We compare the proposed 1-norm group penalty with the SGL, group Lasso, the standard concave penalty and the 2-norm concave group penalty in this subsection.

4.2.1. Simulation models

Set Inline graphic and . The is a compound symmetric matrix with , representing a background correlation among predictors. The is a compound symmetric covariance matrix of the th group with as a median level of within-group correlation. We consider two scenarios (1) equal group size, with for , and (2) unequal group size with Inline graphic for , and for . For coefficients, set for and , with being a vector of length . The value of is chosen such that signal-to-noise ratio (SNR) is approximately in the range of . We consider five types of as listed below to represent five settings,

Setting 1, representing a situation where the effects of group members are relatively small but similar.
Setting 2, representing a situation where the effects of some group members are small but not zero.
Setting 3, representing a situation where only one or two members have a strong effect and the other members have small effects.
Setting 4, representing a situation where the effects of group members are moderate, with some null members having zero coefficients.
Setting 5, representing a situation where only one or two members have a strong effect and the other members have small or zero coefficients.

Denote the causal variables set as Inline graphic with dimension , and the estimated version as with dimension . Define the set of false-positive variables as and with dimension . Similar concepts are defined at group level. Let the causal group sets with dimension , and the estimated version as with dimension . Denote the set of false-positive groups as Inline graphic and with dimension . We report our results in terms of model size (), false discovery rate (), group model size (), and group false discovery rate () to evaluate selection performance together with PMSE/PAUC.
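The selection measures can be computed directly from the true and estimated supports. The sketch below assumes the usual definitions: model size is the number of non-zero estimated coefficients, the FDR is the share of selected variables that are not causal, a group counts as selected or causal if it contains at least one selected or causal variable, and an empty selection is given an FDR of zero. The function name and argument layout are hypothetical.

```python
def selection_metrics(beta_hat, true_support, groups):
    """Return model size, FDR, group model size and group FDR.

    beta_hat: sequence of estimated coefficients.
    true_support: set of indices of the causal variables.
    groups: list of lists of variable indices, one list per group.
    """
    selected = {k for k, b in enumerate(beta_hat) if b != 0}
    false_pos = selected - set(true_support)
    ms = len(selected)
    fdr = len(false_pos) / ms if ms else 0.0

    selected_groups = {j for j, g in enumerate(groups)
                       if any(k in selected for k in g)}
    causal_groups = {j for j, g in enumerate(groups)
                     if any(k in true_support for k in g)}
    false_groups = selected_groups - causal_groups
    gms = len(selected_groups)
    gfdr = len(false_groups) / gms if gms else 0.0
    return {"MS": ms, "FDR": fdr, "GMS": gms, "GFDR": gfdr}
```

For a group-only selector every member of a selected group is non-zero, so MS equals the total size of the selected groups, and FDR and GFDR tend to move together, as in the 2-norm gMCP rows of Tables 3 and 4.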

Tables 3 and 4 present the results from 500 replications in linear and logistic models under five different settings. For the sake of space, we only report the results with unequal group size with Inline graphic . For the same reason, we only report methods based on the MCP penalty, given the similarity between the SCAD and MCP penalties. The computation of the 1-norm and 2-norm group penalties and the group Lasso is done by the R package grppenalty, and that of the SGL by the R package SGL. Below we provide a summary of the major findings.

Table 3.

Comparison of the concave 1-norm group penalties with other methods in linear models

			PMSE	GMS	GFDR	MS	FDR
Setting	SNR	Method	()	()	()	()	()
1	2.88	1-norm gMCP	1.24 (2.6)	5.49 (0.6)	0.06 (6.0)	60.57 (0.7)	0.01 (1.0)
		SGL	2.01 (6.0)	19.58 (1.7)	0.73 (2.8)	78.84 (4.5)	0.35 (3.4)
		SGL	1.65 (4.8)	8.2 (1.0)	0.35 (7.5)	83.89 (8.4)	0.28 (6.6)
		SGL	1.55 (4.6)	6.6 (0.7)	0.21 (7.1)	79.62 (8.5)	0.21 (7.2)
		Lasso	1.91 (6.5)	36.82 (1.0)	0.86 (0.4)	133.76 (7.7)	0.59 (2.0)
		MCP	1.91 (6.5)	36.78 (1.1)	0.86 (0.5)	133.53 (7.8)	0.59 (2.1)
		Group Lasso	1.56 (5.2)	22.22 (0.6)	0.77 (0.7)	277.37 (7.4)	0.78 (0.6)
		2-norm gMCP	1.24 (2.6)	5.62 (0.6)	0.08 (6.1)	67.33 (7.0)	0.08 (6.1)
2	2.61	1-norm gMCP	1.24 (2.7)	5.53 (0.6)	0.06 (6.1)	60.61 (0.7)	0.01 (1.1)
		SGL	1.66 (5.6)	20.16 (1.8)	0.74 (2.8)	69.03 (4.8)	0.42 (3.8)
		SGL	1.52 (4.8)	10.78 (1.4)	0.50 (6.8)	96.06 (11.0)	0.44 (6.5)
		SGL	1.59 (5.2)	9.06 (1.1)	0.41 (7.3)	108.31 (13.9)	0.41 (7.5)
		Lasso	1.59 (5.2)	35.60 (1.1)	0.86 (0.5)	109.39 (6.1)	0.62 (2.0)
		MCP	1.51 (4.7)	26.86 (3.2)	0.79 (4.4)	72.28 (10.5)	0.47 (5.5)
		Group Lasso	1.54 (5.2)	22.15 (0.7)	0.77 (0.8)	276.42 (8.0)	0.78 (0.7)
		2-norm gMCP	1.24 (2.6)	5.74 (0.7)	0.09 (6.7)	68.89 (8.8)	0.09 (6.8)
3	2.76	1-norm gMCP	1.24 (2.6)	5.44 (0.9)	0.04 (5.5)	59.69 (1.7)	0.01 (2.0)
		SGL	1.46 (4.2)	16.21 (1.8)	0.67 (4.2)	51.27 (4.5)	0.39 (4.6)
		SGL	1.48 (4.6)	10.92 (1.4)	0.50 (6.8)	92.53 (11.4)	0.47 (6.7)
		SGL	1.69 (6.3)	9.95 (1.2)	0.46 (6.8)	118.77 (15.4)	0.46 (6.9)
		Lasso	1.40 (3.6)	32.44 (1.4)	0.84 (0.7)	83.70 (4.9)	0.61 (2.2)
		MCP	1.28 (2.0)	15.99 (2.7)	0.62 (8.7)	35.15 (6.2)	0.36 (6.3)
		Group Lasso	1.55 (5.3)	22.03 (0.7)	0.77 (0.8)	275.23 (8.5)	0.78 (0.8)
		2-norm gMCP	1.24 (2.6)	5.74 (0.7)	0.09 (7.0)	68.95 (9.3)	0.08 (7.1)
4	2.55	1-norm gMCP	1.24 (2.7)	5.57 (0.6)	0.07 (6.4)	60.66 (0.8)	0.41 (0.7)
		SGL	1.64 (5.4)	21.04 (1.8)	0.75 (2.4)	69.83 (4.9)	0.52 (3.3)
		SGL	1.50 (4.7)	11.58 (1.5)	0.53 (6.3)	100.72 (11.8)	0.62 (4.5)
		SGL	1.58 (5.1)	9.64 (1.2)	0.44 (7.1)	115.08 (14.9)	0.66 (4.4)
		Lasso	1.58 (5.1)	35.62 (1.1)	0.86 (0.5)	107.89 (6.1)	0.68 (1.6)
		MCP	1.50 (4.7)	26.44 (3.4)	0.78 (5.1)	69.95 (10.3)	0.50 (6.4)
5	2.71	1-norm gMCP	1.24 (2.6)	5.55 (1.1)	0.05 (6.1)	59.62 (1.9)	0.40 (1.4)
		SGL	1.44 (4.0)	16.97 (1.8)	0.68 (4.1)	51.21 (4.6)	0.51 (4.1)
		SGL	1.46 (4.5)	11.72 (1.5)	0.54 (6.3)	97.23 (12.1)	0.65 (4.5)
		SGL	1.67 (6.2)	10.63 (1.4)	0.49 (6.6)	126.79 (16.9)	0.69 (4.2)
		Lasso	1.39 (3.5)	32.30 (1.4)	0.84 (0.7)	80.96 (5.0)	0.68 (1.8)
		MCP	1.25 (1.9)	13.89 (2.8)	0.54 (11.1)	29.79 (5.8)	0.34 (7.9)

PMSE is the predictive mean square error, MS is model size, FDR is false discovery rate, GMS is group model size, and GFDR is the group false discovery rate. SE is the standard error computed from 500 replications.

Table 4.

Comparison of the concave 1-norm group penalties with other methods in logistic models

			PAUC	GMS	GFDR	MS	FDR
Setting	SNR	Method	()	()	()	()	()
1	2.88	1-norm gMCP	0.851 (0.78)	8.80 (2.1)	0.34 (10.8)	61.97 (8.7)	0.16 (7.0)
		SGL	0.841 (0.59)	16.94 (2.0)	0.68 (4.1)	53.5 (5.4)	0.39 (4.7)
		SGL	0.872 (0.46)	10.17 (1.4)	0.46 (7.4)	89.84 (11.0)	0.41 (6.9)
		SGL	0.879 (0.43)	10.48 (1.3)	0.49 (6.3)	125.35 (15.9)	0.49 (6.4)
		Lasso	0.832 (0.63)	21.42 (2.3)	0.75 (3.0)	52.10 (5.5)	0.43 (4.1)
		MCP	0.832 (0.63)	21.42 (2.3)	0.75 (3.0)	52.10 (5.5)	0.43 (4.1)
		Group Lasso	0.838 (0.74)	12.73 (1.5)	0.58 (5.2)	155.92 (19.6)	0.59 (5.3)
		2-norm gMCP	0.857 (0.70)	6.78 (1.2)	0.20 (9.6)	81.55 (15.2)	0.20 (9.7)
2	2.61	1-norm gMCP	0.832 (0.83)	9.98 (2.7)	0.37 (12.0)	58.32 (8.7)	0.19 (8.3)
		SGL	0.828 (0.67)	16.91 (2.3)	0.67 (4.8)	48.2 (6.4)	0.44 (5.5)
		SGL	0.849 (0.53)	12.62 (1.6)	0.57 (5.6)	104.33 (12.3)	0.53 (5.7)
		SGL	0.851 (0.52)	13.13 (1.4)	0.60 (4.6)	156.79 (17.5)	0.60 (4.8)
		Lasso	0.820 (0.72)	20.44 (2.6)	0.73 (3.6)	45.05 (6.1)	0.46 (4.7)
		MCP	0.820 (0.71)	20.31 (2.6)	0.73 (3.8)	44.67 (6.2)	0.46 (4.7)
		Group Lasso	0.820 (0.82)	12.42 (1.6)	0.57 (5.9)	151.71 (20.8)	0.57 (6.1)
		2-norm gMCP	0.841 (0.77)	6.56 (1.2)	0.17 (9.5)	78.48 (14.8)	0.17 (9.6)
3	2.76	1-norm gMCP	0.847 (0.77)	11.17 (3.4)	0.38 (13.0)	56.35 (6.8)	0.22 (10.3)
		SGL	0.849 (0.69)	17.82 (2.5)	0.69 (4.9)	46.38 (6.8)	0.50 (5.4)
		SGL	0.853 (0.62)	15.02 (1.6)	0.65 (4.2)	118.96 (13.1)	0.63 (4.4)
		SGL	0.846 (0.60)	15.61 (1.5)	0.66 (3.5)	187.17 (18.7)	0.67 (3.6)
		Lasso	0.843 (0.73)	20.57 (2.9)	0.73 (4.7)	41.63 (6.7)	0.50 (5.2)
		MCP	0.850 (0.85)	14.96 (3.3)	0.56 (10.5)	27.96 (6.8)	0.38 (8.4)
		Group Lasso	0.830 (0.80)	12.20 (1.4)	0.56 (5.3)	149.62 (18.2)	0.57 (5.4)
		2-norm gMCP	0.852 (0.68)	6.35 (1.1)	0.14 (8.8)	76.44 (13.8)	0.14 (8.9)
4	2.55	1-norm gMCP	0.827 (0.84)	9.72 (2.7)	0.36 (12.0)	56.97 (8.2)	0.48 (4.1)
		SGL	0.822 (0.68)	17.07 (2.4)	0.67 (4.7)	47.91 (6.6)	0.55 (4.8)
		SGL	0.843 (0.57)	12.99 (1.6)	0.58 (5.5)	107.22 (12.6)	0.69 (3.6)
		SGL	0.845 (0.55)	13.46 (1.5)	0.61 (4.5)	160.91 (17.7)	0.76 (2.7)
		Lasso	0.814 (0.73)	20.48 (2.8)	0.73 (3.9)	44.57 (6.7)	0.55 (4.7)
		MCP	0.814 (0.72)	20.10 (2.8)	0.72 (5.0)	43.59 (6.9)	0.54 (5.2)
5	2.71	1-norm gMCP	0.842 (0.82)	11.38 (3.4)	0.39 (13.0)	56.64 (7.2)	0.50 (5.1)
		SGL	0.845 (0.69)	17.86 (2.4)	0.69 (4.9)	45.86 (6.9)	0.60 (4.8)
		SGL	0.849 (0.64)	15.18 (1.6)	0.65 (4.1)	119.78 (13.1)	0.75 (2.9)
		SGL	0.840 (0.62)	15.84 (1.6)	0.68 (3.6)	189.59 (19.6)	0.80 (2.2)
		Lasso	0.839 (0.73)	20.90 (2.8)	0.73 (4.5)	41.80 (6.6)	0.59 (4.8)
		MCP	0.847 (0.87)	15.23 (3.3)	0.57 (10.6)	28.15 (6.9)	0.44 (9.2)

PAUC is the predictive area under ROC curve, MS is model size, FDR is false discovery rate, GMS is group model size, and GFDR is the group false discovery rate. SE is the standard error computed from 500 replications.

4.2.2. Comparison with the SGL

In linear models, the 1-norm gMCP achieves a smaller PMSE than the SGL, while in logistic models the PAUC values of these methods are similar. The concave 1-norm group penalties have smaller FDR and GFDR across all settings. The MS and GMS of the concave 1-norm group penalties are smaller than those of the SGL with and .

4.2.3. Comparison with the standard concave penalty

The PMSE of the concave 1-norm group penalties is smaller than that of the standard ones, while the PAUC of these methods is close. The 1-norm gMCP has smaller GMS and GFDR in all the settings. This is expected since the standard penalties do not make use of group information. The MS and FDR of the concave 1-norm group penalties are smaller than those of the standard concave penalties under settings 1–4. Under setting 5, with one or two dominating members, the standard MCP penalty ends up with a smaller MS.

4.2.4. Comparison with the group Lasso and the concave 2-norm group selection

We compare the concave 1-norm, the group Lasso and the 2-norm group penalties only under settings 1–3 because of the group selection property of the group Lasso and the 2-norm group penalties. The 2-norm group penalty in general has a smaller GMS, while the 1-norm group penalty has a smaller MS. The GFDR and FDR of the 1-norm and 2-norm group penalties are close to each other, and both are smaller than those of the group Lasso.

5. Data example

Our illustrative example comes from a published study exploring the association between genes and prognosis of breast cancer (van’t Veer and others, 2002; Van de Vijver and others, 2002). Tumor samples from Inline graphic women with breast cancer were selected for microarray expression profiling. The age at diagnosis was 52 years or younger for those women to be eligible. Fluorescence intensities of 25 000 human genes were quantified and normalized. Ratio of these values to the intensity of a reference pool was calculated for analysis purpose. Further details can be found in the references above.

For our purpose, a binary variable indicating whether a patient developed metastasis within 5 years of surgery is modeled as the outcome. Seventy-eight patients developed metastasis within 5 years. A total of Inline graphic genes with the top Spearman correlation coefficients with the outcome were used for illustrative purposes. (Note: the method can handle problems with and .) The group membership of the genes was determined by hierarchical clustering using the Gap statistic. The idea of the Gap statistic is as follows: (1) group the genes into K clusters and calculate the total within-block sum of squares; (2) create new resampled datasets by separately permuting the measurements of each gene, repeat Step (1) on the resampled datasets, and calculate the average total within-block sum of squares. Then find the K that maximizes the resulting Gap statistic. For details about the Gap statistic, we refer to Tibshirani and others (2001) and Ma and Huang (2007). In our example, the optimal K is 33. Hence, we have 33 groups with group sizes ranging from 2 to 68. We use the cross-validated area under the ROC curve (CV-AUC) approach to select the tuning parameters. This approach computes the average predictive AUC over the validation datasets created by cross-validation. We refer to Jiang and others (2013) for more details of the CV-AUC method.
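
For reference, one common way to write the Gap statistic described above is sketched below. The symbols W_K (total within-block sum of squares for K clusters), W*_{K,b} (the same quantity on the b-th resampled dataset), and B (number of resampled datasets) are notation assumed here, since the original symbols are not available in this text. The logarithmic scale follows Tibshirani and others (2001); the version used in this example may differ, for instance by working on the original scale.

```latex
% Assumed notation: W_K, W^{*}_{K,b}, and B are introduced for illustration only.
\mathrm{Gap}(K) \;=\; \frac{1}{B}\sum_{b=1}^{B} \log W^{*}_{K,b} \;-\; \log W_{K},
\qquad
\hat{K} \;=\; \arg\max_{K}\, \mathrm{Gap}(K).
```

The selection rule shown is the maximizer described in the text. Tibshirani and others (2001) also propose a rule that accounts for the simulation error of the resampled average.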

Table 5 presents the results of the different penalties based on 20 replications of 5-fold CV-AUC. The median and median absolute deviation (MAD) of CV-AUC, MS, and GMS are reported. The 1-norm gSCAD and gMCP have greater CV-AUC than the other methods. The 2-norm gSCAD and gMCP prefer models with small GMS. The SGL methods have similar results with three different choices of . The standard concave penalties (Lasso, SCAD, and MCP) have the smallest CV-AUC compared with the group methods. The results suggest that incorporating group information in general improves model predictive performance. Among the grouped approaches, the 1-norm group penalty outperformed the others.
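
The CV-AUC summary described above can be sketched as follows. The sketch assumes that predicted scores for each validation fold are already available. It computes the AUC of each fold by pairwise comparison, averages over the folds, and summarizes replicated results by the median and the unscaled MAD. The function names are hypothetical, and the exact CV-AUC definition in Jiang and others (2013) may differ in detail. Note also that R's mad() multiplies by 1.4826 by default, which this sketch does not do; the source does not state which convention was used.

```python
from statistics import median


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("each validation fold needs both outcome classes")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cv_auc(fold_scores, fold_labels):
    """Average the validation AUC over the folds of one cross-validation run."""
    fold_aucs = [auc(s, y) for s, y in zip(fold_scores, fold_labels)]
    return sum(fold_aucs) / len(fold_aucs)


def median_and_mad(values):
    """Median and unscaled median absolute deviation of replicated results."""
    center = median(values)
    return center, median([abs(v - center) for v in values])
```

In a tuning-parameter search, cv_auc would be evaluated once per candidate value and the candidate with the largest CV-AUC retained; median_and_mad would then be applied to the results of the repeated cross-validation runs.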

Table 5.

Results of different penalties in breast cancer study

Method	CV-AUC (MAD)	GMS (MAD)	MS (MAD)
Lasso	0.776 (0.013)	29 (2)	52 (6)
SCAD	0.776 (0.013)	29 (2)	55 (9)
MCP	0.776 (0.013)	29 (2)	52 (7)
SGL	0.782 (0.008)	27 (2)	74.5 (15)
SGL	0.803 (0.012)	28 (1)	80 (8.5)
SGL	0.810 (0.011)	30 (1)	94 (4)
Group Lasso	0.794 (0.009)	12 (1)	71 (11)
2-norm gSCAD	0.802 (0.007)	6 (1)	27 (7)
2-norm gMCP	0.802 (0.008)	5 (0)	20 (0)
1-norm gSCAD	0.825 (0.011)	11.5 (4)	44 (13)
1-norm gMCP	0.824 (0.011)	11.5 (4.5)	44 (13)

6. Discussion

The proposed concave 1-norm group penalty has a bi-level selection feature under proper conditions. Its robustness to membership mis-specification is of particular interest in practice, since the true group information is usually unavailable. The recent SGL method also has a bi-level selection feature. However, it is sensitive to mis-specification because of its group Lasso component. The robustness of the proposed method is related to the 1-norm penalty within groups. Assuming the same probability of being identified at the group level, the 1-norm still gives freedom to individual members, while the 2-norm does not. Individual-level selection protects against over-control at the group level. Hence, under mis-specification, a causal variable is still likely to be picked even if the group it belongs to is not identified. Likewise, a null variable is less likely to be identified even if its group is selected. More work is needed to better understand the theoretical properties of the method.
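
The contrast between the two norms can be seen in a simpler convex setting. The following sketch compares the proximal maps of the plain 1-norm and 2-norm on a single group: elementwise soft-thresholding can set individual members to zero while keeping others, whereas group soft-thresholding either removes the whole group or keeps every nonzero member. This is an illustration with convex penalties only; it is not the concave 1-norm group penalty or the coordinate descent algorithm of this paper, and the function names are hypothetical.

```python
import math


def soft_threshold(z, lam):
    """Proximal map of lam * ||z||_1: shrink each coordinate separately."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in z]


def group_soft_threshold(z, lam):
    """Proximal map of lam * ||z||_2: shrink the whole group by one factor."""
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= lam:
        # The entire group is removed at once
        return [0.0] * len(z)
    scale = 1.0 - lam / norm
    # Every nonzero member stays nonzero
    return [scale * v for v in z]


# One group with a dominating member and two weak members
z = [3.0, 0.2, -0.1]
within_group_sparse = soft_threshold(z, 0.5)    # weak members become zero
all_or_nothing = group_soft_threshold(z, 0.5)   # all members are kept
```

With one dominating member, the 1-norm map keeps that member and discards the weak ones, which mirrors the individual-level freedom discussed above.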

Compared with the standard concave penalty, the 1-norm group penalty incorporates the group information and thus achieves better control of the false discovery rate at both the group level and the individual level. When group information is correctly specified, the 1-norm group penalty is still capable of achieving better FDR and GFDR in most of the cases explored. The R package “grppenalty” implements the proposed algorithms with sufficient efficiency and stability. Hence, the 1-norm group penalty is a valuable tool for variable selection in p > n problems.

Although the 1-norm group penalty is robust to mis-specified group information, we still want to approximate the true grouping structure as accurately as possible. How to achieve this goal is still an open question. We offer several possibilities below. For studies that use a questionnaire to collect variable information, we suggest defining the group structure based on the design of the questionnaire. Most of these studies have a block of questions to measure similar quantities of study subjects. For example, questions attempting to quantify the intake of fat-rich foods are usually organized in one block and can be considered as a group. Such a grouping structure is consistent with the perception of researchers and is easy to interpret. However, no statistical procedure justifies the membership information. Another approach is to perform a numerical exploration using index statistics such as the Gap statistic. A group structure can then be established based on the index statistic. Such a structure accounts for the correlation among predictors; therefore, it in general leads to improved performance. A disadvantage of this grouping method is that it is difficult to interpret. A third way, which is more specific to genomic data, is to use available biological information about genes. Gene Ontology (GO) and the multiple databases on biological pathways would be a good starting point for collecting such group information.

7. Funding

The work of Huang is supported in part by NIH Grant R01CA142774 and NSF Grant DMS-1208225.

Supplementary Material

Supplementary material is available at http://biostatistics.oxfordjournals.org.

Acknowledgement

We thank the editor, associate editor, and two referees for their helpful comments, which led to considerable improvements in the revision of the paper.

Conflict of Interest: None declared.

References

Breheny P., Huang J. (2009). Penalized methods for bi-level variable selection. Statistics and Its Interface 2, 369–380.
Donoho D. L., Johnstone I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455.
Fan J., Li R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of American Statistical Association 96(456), 1348–1360.
Friedman J., Hastie T., Höfling H., Tibshirani R. (2007). Pathwise coordinate optimization. Annals of Applied Statistics 1(2), 302–332.
Friedman J., Hastie T., Tibshirani R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33(1), 1–22.
Friedman J., Hastie T., Tibshirani R. (2010). A note on the group lasso and a sparse group lasso. Technical report. http://arxiv.org/abs/1001.0736.
Huang J., Breheny P., Ma S. (2012). A selective review of group selection in high dimensional model. Statistical Science 27(4), 481–499.
Huang J., Ma S., Xie H. L., Zhang C.-H. (2009). A group bridge approach for variable selection. Biometrika 96, 339–355.
Hunter D. R., Lange K. (2004). A tutorial on MM algorithms. American Statistician 58(1), 30–37.
Hunter D. R., Li R. (2005). Variable selection using MM algorithms. Annals of Statistics 33(4), 1617–1642.
Jiang D., Huang J., Zhang Y. (2013). The cross-validated AUC for MCP-logistic regression with high-dimensional data. Statistical Methods in Medical Research 22(5), 505–518.
Jiang D., Huang J. (2014). Majorization minimization by coordinate descent for concave penalized generalized linear models. Statistics and Computing 24(5), 871–883.
Lange K., Hunter D., Yang I. (2000). Optimization transfer using surrogate objective functions (with discussion). Journal of Computational and Graphical Statistics 9(1), 1–59.
Ma S., Huang J. (2007). Clustering threshold gradient descent regularization: with applications to microarray studies. Bioinformatics 23(4), 466–472.
Mazumder R., Friedman J., Hastie T. (2011). SparseNet: coordinate descent with non-convex penalties. Journal of American Statistical Association 106(495), 1125–1138.
Meier L., van de Geer S., Bühlmann P. (2008). The group lasso for logistic regression. Journal of Royal Statistical Society Series B 70(1), 53–71.
Simon N., Friedman J., Hastie T., Tibshirani R. A sparse-group lasso. Technical report. Stanford University.
Tibshirani R., Walther G., Hastie T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of Royal Statistical Society Series B 63(2), 411–423.
van’t Veer L. J., Dai H., van de Vijver M. J. and others (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536.
van de Vijver M. J., He Y. D., van’t Veer L. J. and others (2002). A gene-expression signature as a predictor of survival in breast cancer. New England Journal of Medicine 347(25), 1999–2009.
Wu T. T., Lange K. (2008). Coordinate descent algorithms for Lasso penalized regression. Annals of Applied Statistics 2(1), 224–244.
Yuan M., Lin Y. (2006). Model selection and estimation in regression with grouped variables. Journal of Royal Statistical Society Series B 68(1), 49–67.
Zhang C. H. (2010). Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics 38(2), 894–942.
Zhang C. H., Huang J. (2008). The sparsity and bias of the Lasso selection in high-dimensional linear regression. Annals of Statistics 36(4), 1567–1594.
