Biostatistics (Oxford, England). 2014 Nov 21;16(2):252–267. doi: 10.1093/biostatistics/kxu050

Concave 1-norm group selection

Dingfeng Jiang 1,*, Jian Huang 2
PMCID: PMC4441102  PMID: 25417206

Abstract

Grouping structures arise naturally in many high-dimensional problems. Incorporation of such information can improve model fitting and variable selection. Existing group selection methods, such as the group Lasso, require correct group membership. However, in practice it can be difficult to correctly specify the group membership of all variables. Thus, it is important to develop group selection methods that are robust against group mis-specification. Also, it is desirable to select groups as well as individual variables in many applications. We propose a class of concave 1-norm group penalties that is robust to the grouping structure and can perform bi-level selection. A coordinate descent algorithm is developed to calculate solutions of the proposed group selection method. Theoretical convergence of the algorithm is proved under certain regularity conditions. Comparison with other methods suggests the proposed method is the most robust approach under membership mis-specification. Simulation studies and a real data application indicate that the 1-norm concave group selection approach achieves better control of false discovery rates. An R package grppenalty implementing the proposed method is available at CRAN.

Keywords: Bi-level selection, Concave penalties, Coordinate descent, Sparse group Lasso, p > n problems

1. Introduction

Grouping structures exist in many high-dimensional problems. For example, genes in the same biological pathway naturally form a group. In genome-wide association studies, single-nucleotide polymorphisms (SNPs) from the same exon can also be considered as a group. Typically, variables with the same membership share similar characteristics. Hence, higher within-group correlations are often observed for group members. Incorporation of group information can improve model fitting and lead to better interpretation. Possible applications of grouped approaches include, but are not limited to, (i) genetic studies assessing the association between biomarkers (such as gene expression level, SNP mutation, and copy number variation) and phenotypes of interest; and (ii) studies using multiple questions (instruments) to measure a particular feature. For example, multiple cognitive tests are generally used in Alzheimer's studies to quantify cognitive function. The membership of variables can be determined either by analytical methods or by knowledge from the field of study.

Yuan and Lin (2006) proposed the group Lasso for group selection. Meier and others (2008) extended the method to logistic regression and applied it to detect splice sites in DNA sequences. Breheny and Huang (2009) proposed a class of bi-level selection methods using concave composite penalties. Huang and others (2009) proposed a group bridge approach for group and individual variable selection. Friedman and others (2010) and Simon and others (2012) proposed the sparse group Lasso (SGL). The SGL bridges the individual selection feature of the Lasso and the group selection nature of the group Lasso via a convex combination. Huang and others (2012) reviewed several group selection methods, including the 2-norm concave group selection methods. Both the group Lasso and the concave 2-norm group penalty select variables at the group level, that is, the members of a group are either all selected or all dropped. Therefore, the grouping structure has a great impact on the results. The true grouping structure is, however, difficult to specify or not available in many applications. Hence, it is important to develop a group selection method that is robust to possibly mis-specified membership.

We propose a class of concave 1-norm group selection methods for high-dimensional linear and generalized linear models in which the number of covariates can exceed the sample size. These methods have two attractive features. First, they are capable of selecting variables at both the group and individual levels, that is, they have the bi-level selection property. Second, they are robust against a possibly mis-specified grouping structure. These methods can be efficiently implemented via a coordinate descent type algorithm. Our convergence analysis shows that this algorithm is guaranteed to converge to a minimum of the objective function.

The rest of the article is organized as follows. Section 2 first provides a brief review of related penalties, then proposes the concave 1-norm group penalty. Section 2.3 shows the robustness by two examples and establishes the bi-level selection feature by two propositions. Section 3 details the computation of the concave 1-norm group penalized solution using the coordinate descent algorithm (CDA). Section 3.3 extends the concave 1-norm group penalty to generalized linear models (GLMs) and develops an algorithm for computing solutions in GLMs based on the majorization minimization (MM) approach and CDA. Section 3.4 establishes the theoretical convergence of the CDA for linear models and GLMs. Section 4 performs simulation studies to understand the robustness of the concave 1-norm group penalty and compares it with the concave 2-norm group penalty and the SGL. A comprehensive comparison with related methods is also conducted to study the empirical behavior of the proposed method. Section 5 applies the 1-norm group penalty to a motivating example and compares the results with other methods. Section 6 concludes the article with a discussion.

2. Methods

2.1. A brief review of group penalties

We briefly review the existing group selection methods, namely the group Lasso, the concave 2-norm group penalty and the sparse group Lasso (SGL). Denote the coefficients of a group of variables as Inline graphic and let Inline graphic be its dimensionality. The group Lasso (Yuan and Lin, 2006) is defined as

2.1. (2.1)

with Inline graphic being the Euclidean norm. When the group size is 1, the group Lasso reduces to the Lasso penalty. By imposing a concave penalty Inline graphic on the Euclidean norm of Inline graphic, Huang and others (2012) proposed the concave 2-norm group penalty, which has the form

2.1. (2.2)

The concave 2-norm group penalty reduces to the standard concave penalty when the group size is 1. The group Lasso can be viewed as a special case of the 2-norm concave group penalty with Inline graphic. Both the group Lasso and the concave 2-norm group penalty rely on the non-differentiability of Inline graphic to perform group-only selection. The tuning parameter λ controls model sparsity. As group selection procedures, both the group Lasso and the 2-norm concave group penalty are sensitive to the specified membership. The SGL (Friedman and others, 2010; Simon and others, 2012) uses the penalty function

2.1. (2.3)

with Inline graphic, i.e., the Inline graphic norm of Inline graphic and Inline graphic coefficient of individual group member. The convex combination of the group Lasso and the Lasso imposes both group and individual sparsity. The tuning parameter Inline graphic controls the degree of sparsity and Inline graphic controls the relative weight of the group Lasso and the Lasso. The SGL becomes the Lasso when Inline graphic and the group Lasso when Inline graphic. According to our results, the SGL is also sensitive to mis-specified group information. This may be due to the group Lasso component. The SGL also seems to prefer a larger model than the 1-norm group penalty, which could be related to the rate consistency property of the Lasso under the sparse Riesz condition (Zhang and Huang, 2008).
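As a reference point, the three penalties reviewed in this subsection are commonly written as below in the cited sources (Yuan and Lin, 2006; Huang and others, 2012; Simon and others, 2012). This is a sketch for a single group only. The symbols $\beta_j$ (group coefficients), $p_j$ (group size), $\alpha$ (mixing weight) and the $\sqrt{p_j}$ group weights are assumptions made here, and the parameterization used in (2.1)–(2.3) of this article may differ.

```latex
\begin{aligned}
\text{group Lasso:}\quad
  & \lambda \sqrt{p_j}\,\lVert \beta_j \rVert_2, \\
\text{concave 2-norm group penalty:}\quad
  & \rho\bigl(\lVert \beta_j \rVert_2;\ \sqrt{p_j}\,\lambda,\ \gamma\bigr), \\
\text{SGL:}\quad
  & (1-\alpha)\,\lambda \sqrt{p_j}\,\lVert \beta_j \rVert_2
    + \alpha\,\lambda\,\lVert \beta_j \rVert_1 .
\end{aligned}
```

Under this particular convention, $\alpha = 1$ gives the Lasso and $\alpha = 0$ gives the group Lasso; other references swap the roles of $\alpha$ and $1-\alpha$.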

2.2. Concave 1-norm group penalty

Consider a linear regression model, Inline graphic, where Inline graphic is a response vector, Inline graphic is an Inline graphic design matrix and Inline graphic is the vector of error terms. Here Inline graphic is the coefficient vector. We are interested in cases where Inline graphic and Inline graphic is sparse in the sense that many of its elements are zero. Denote the Inline graphicth covariate vector by Inline graphic. Without loss of generality, we assume that the response is centered and the covariates are standardized so that Inline graphic. We also assume that the Inline graphic covariates are divided into Inline graphic groups and that the size of the Inline graphicth group is Inline graphic. Denote the coefficients of the Inline graphicth group by Inline graphic and the corresponding design matrix by Inline graphic.

The concave 1-norm group penalized least squares criterion is defined as

2.2. (2.4)

The penalty level for the Inline graphicth group is Inline graphic, which is proportional to its group size. This avoids the situation where large groups overwhelm small groups.

Multiple penalties can be chosen for Inline graphic. We use the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li, 2001) and the minimax concave penalty (MCP) (Zhang, 2010). Both the SCAD and the MCP gradually reduce the degree of penalization for large coefficients. Such (nearly) unbiased estimation of coefficients enables the SCAD and MCP to correctly select important variables and estimate their coefficients with high probability under certain sparsity conditions and other appropriate regularity conditions, a property known as the oracle property. The SCAD penalty function is Inline graphic with Inline graphic and Inline graphic. Here Inline graphic is the indicator function and Inline graphic denotes the non-negative part of Inline graphic. The MCP is defined as Inline graphic for Inline graphic and Inline graphic. The regularization parameter γ controls the concavity of both penalties, with a smaller γ being more concave. When γ → ∞, both the SCAD and the MCP reduce to the Lasso penalty. Throughout the article, we abbreviate the 1-norm group SCAD and MCP as the 1-norm gSCAD and 1-norm gMCP.
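To make the two penalties concrete, the sketch below evaluates the closed forms of the SCAD and MCP as they are usually stated in Fan and Li (2001) and Zhang (2010), and applies either one to the 1-norm of a group. The function names and the argument `lam` (the penalty level already assigned to the group) are illustrative assumptions, not part of grppenalty, and the article's own formulas may be parameterized differently.

```python
def mcp(t, lam, gamma):
    """Minimax concave penalty at |t|; assumes gamma > 1."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam  # flat beyond gamma * lam


def scad(t, lam, gamma):
    """SCAD penalty at |t|; assumes gamma > 2."""
    a = abs(t)
    if a <= lam:
        return lam * a  # Lasso-like near zero
    if a <= gamma * lam:
        return (2.0 * gamma * lam * a - a * a - lam * lam) / (2.0 * (gamma - 1.0))
    return 0.5 * (gamma + 1.0) * lam * lam  # flat beyond gamma * lam


def group_penalty_1norm(beta_group, lam, gamma, rho=mcp):
    """Concave penalty applied to the 1-norm of one group's coefficients."""
    return rho(sum(abs(b) for b in beta_group), lam, gamma)
```

For a very large `gamma` both functions approach `lam * abs(t)`, which is the Lasso limit mentioned in the text.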

Figure 1 shows the solution paths of the 1-norm and 2-norm gMCP, the group Lasso, and the SGL for a simple example. The example has four groups of variables, with the first group having coefficients Inline graphic and the remaining three groups all having zero coefficients. The bold solid line is the path of the zero coefficient (the 1st member) and the bold dashed lines are the paths of the non-zero coefficients (the 2nd and 3rd members) in the 1st group. The dotted lines are the paths of the remaining variables. The 1-norm gMCP and the SGL have bi-level selection features with a proper Inline graphic, while the 2-norm gMCP and the group Lasso perform group selection only.

Fig. 1.

Solution paths of different group selection methods. The bold solid line is the path of the zero coefficient (the 1st member) and the bold dashed lines are the paths of the non-zero coefficients (the 2nd and 3rd members) in the first group. The dotted lines are the paths of the remaining variables. The 1-norm gMCP and the SGL have bi-level selection features with a proper Inline graphic, while the 2-norm gMCP and the group Lasso perform group selection only.

2.3. Properties of the concave 1-norm group penalty

2.3.1. Robustness to mis-specified grouping structure

An advantage of the concave 1-norm group penalty is its robustness to mis-specified group information. This property is closely related to the bi-level selection feature discussed later. A group selection method, such as the 2-norm group penalty, has only two possible estimates for a group, Inline graphic or Inline graphic. This puts the 2-norm group penalty at a disadvantage when some null variables are mis-grouped with one or more non-zero variables: the method either misses non-zero variables or falsely identifies null variables as causal ones.

Table 1 illustrates the limitation of the 2-norm gMCP in some settings. Example A shows that the 2-norm gMCP fails to identify two null variables, x6–x7, when they are mis-grouped with a causal variable x5. This leads to the false discovery of x6–x7 as causal variables by the 2-norm gMCP, while the 1-norm gMCP correctly identifies x6–x7 as null variables. Example B shows that when a null variable x4 and a causal variable x5 are mis-grouped with a causal variable x6, the 2-norm gMCP fails to identify x5–x6 as causal variables, while the 1-norm gMCP fails to identify only x6.

Table 1.

Two examples

| Example | Variable | True structure | Working structure | True value | 2-norm gMCP estimate | 1-norm gMCP estimate |
|---|---|---|---|---|---|---|
| A | x1 | 1 | 1 | 0.4 | 0.285 | 0.311 |
| A | x2 | 1 | 1 | 0.4 | 0.382 | 0.416 |
| A | x3 | 1 | 1 | 0.4 | 0.390 | 0.424 |
| A | x4 | 1 | 1 | 0.3 | 0.297 | 0.327 |
| A | x5 | 1 | 2 | 0.2 | 0.214 | 0.156 |
| A | x6 | 2 | 2 | 0 | -0.023 | 0 |
| A | x7 | 2 | 2 | 0 | -0.041 | 0 |
| B | x1 | 1 | 1 | 0.3 | 0.227 | 0.212 |
| B | x2 | 1 | 1 | 0.3 | 0.338 | 0.305 |
| B | x3 | 1 | 1 | 0.3 | 0.345 | 0.309 |
| B | x4 | 1 | 2 | 0 | 0 | 0 |
| B | x5 | 1 | 2 | 0.1 | 0 | 0.093 |
| B | x6 | 2 | 2 | 0.05 | 0 | 0 |

Example A shows that the 2-norm gMCP falsely identifies the null variables x6–x7 as causal ones due to mis-specification. Example B shows that the 2-norm gMCP misses the causal variables x5–x6.

2.3.2. Bi-level selection feature

The following proposition shows that the concave 1-norm group penalty can have both zero and non-zero solutions within a group under proper conditions. Thus the method has the bi-level selection feature. Note that the right-hand side of the second expression could be zero. Under that scenario, the proposed penalty performs group selection only. Therefore, bi-level selection with the concave 1-norm group penalty requires a proper Inline graphic. We prefer a data-driven approach to select an optimal Inline graphic.

Proposition 1 (Bi-level selection) —

Let Inline graphic be the solution of the concave 1-norm group penalized regression as defined in (2.4), then a necessary condition for Inline graphic to be a minimizer is that

graphic file with name M122.gif (2.5)

where Inline graphic is the first derivative of Inline graphic w.r.t. Inline graphic.

Proposition 2 (Invariance property of the Inline graphic gMCP) —

Given a group of standardized variables with size of Inline graphic, the Inline graphic gMCP has the following invariance property,

graphic file with name M129.gif (2.6)

Notice that the Inline graphic gSCAD does not have the invariance property. Proof of both propositions is simple and thus skipped.

2.3.3. Model sparsity of the 1-norm group penalty

The following proposition shows that, at the same penalty level λ, the concave 1-norm group penalty has a higher group sparsity than the 2-norm group penalty. That is, in order to achieve the same level of group sparsity, we need a larger λ for the concave 2-norm group penalty.

Proposition 3 (Model sparsity of 1-norm group penalty) —

Let Inline graphic be the coefficients of a group of variables with dimensionality Inline graphic, then given the same penalty level Inline graphic, Inline graphic implies Inline graphic.

This proposition holds because Inline graphic by the Cauchy–Schwarz inequality.
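For reference, the two standard inequalities relating the 1-norm and the Euclidean norm of a vector are sketched below. The symbols $\beta \in \mathbb{R}^{p}$ are placeholders chosen here, and which of the two bounds the proposition's argument relies on is an assumption; the Cauchy–Schwarz step is the one that yields the upper bound.

```latex
\lVert \beta \rVert_2
\;\le\;
\lVert \beta \rVert_1
= \sum_{k=1}^{p} \lvert \beta_k \rvert \cdot 1
\;\le\;
\Bigl(\sum_{k=1}^{p} \beta_k^{2}\Bigr)^{1/2}
\Bigl(\sum_{k=1}^{p} 1\Bigr)^{1/2}
= \sqrt{p}\,\lVert \beta \rVert_2 .
```

Because the SCAD and MCP are non-decreasing in their first argument, the lower bound implies that, at the same penalty level, a concave penalty applied to the 1-norm of a group is at least as large as the same penalty applied to its 2-norm.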

3. Computation

3.1. Coordinate descent algorithm

Over the last few years, CDA has been shown to be an efficient approach for solving high-dimensional penalized regression problems such as the Lasso (Wu and Lange, 2008; Friedman and others, 2007, 2010). We apply the idea of CDA to compute the solutions of the concave 1-norm group selection problems.

Let Inline graphic. We want to update Inline graphic to Inline graphic and update Inline graphic to Inline graphic within the Inline graphicth group. That is, we want to update Inline graphic to Inline graphic using the preceding notation. CDA minimizes the criterion function (2.4)

3.1. (3.1)

as a function of Inline graphic. The solutions for Inline graphic under the 1-norm gSCAD and gMCP are

3.1. (3.2)
3.1. (3.3)

where Inline graphic with Inline graphic, Inline graphic. The notation Inline graphic for Inline graphic is the soft-thresholding operator (Donoho and Johnstone, 1994). The solution form of (3.2) and (3.3) resembles a simple soft-thresholding operator if we set Inline graphic. The Inline graphic reflects the grouping effect in the penalty.

CDA for the concave 1-norm group penalty: We summarize the CDA for computing the solution of the concave 1-norm group penalized regression as follows:

  1. Given an initial value of Inline graphic, CDA computes the corresponding residual Inline graphic.

  2. For Inline graphic, CDA updates Inline graphic to Inline graphic by using (3.2) or (3.3) for the Inline graphicth coordinate of the Inline graphicth group. Then repeat the same process for the other groups until Inline graphic is updated to Inline graphic.

  3. CDA checks the convergence criterion. If the algorithm converges then CDA stops iterations, otherwise it repeats Step 2 until the algorithm converges.
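The three steps above can be sketched in a few lines for the MCP case. The coordinate update used here is derived directly from the criterion (squared-error loss plus the MCP applied to the group 1-norm) under the assumptions that every column satisfies x_k'x_k/n = 1 and that γ > 1; it is meant as an illustration and is not claimed to reproduce (3.3) term by term. The function names and the per-group penalty levels `lams` are hypothetical, supplied by the caller.

```python
import math


def soft(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def mcp(t, lam, gamma):
    """MCP value at |t| for penalty level lam and concavity gamma > 1."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def standardize(X):
    """Center each column and scale it so that x_k'x_k / n = 1."""
    n, p = len(X), len(X[0])
    out = [[0.0] * p for _ in range(n)]
    for k in range(p):
        col = [row[k] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        for i in range(n):
            out[i][k] = (col[i] - mean) / sd
    return out


def objective(X, y, beta, groups, lams, gamma):
    """(1/2n)||y - X beta||^2 + sum_j MCP(||beta_j||_1; lam_j, gamma)."""
    n = len(y)
    rss = sum((y[i] - sum(x * b for x, b in zip(X[i], beta))) ** 2 for i in range(n))
    pen = sum(mcp(sum(abs(beta[k]) for k in g), lam, gamma)
              for g, lam in zip(groups, lams))
    return rss / (2.0 * n) + pen


def cda_1norm_gmcp(X, y, groups, lams, gamma, n_sweeps=100, tol=1e-8):
    """Cyclic coordinate descent; groups is a list of lists of column indices."""
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    r = list(y)  # Step 1: residual for the initial value beta = 0
    for _ in range(n_sweeps):
        max_change = 0.0
        for g, lam in zip(groups, lams):  # Step 2: sweep over groups and members
            for k in g:
                # 1-norm of the other members of the same group (grouping effect)
                c = sum(abs(beta[m]) for m in g if m != k)
                # inner product of x_k with the partial residual, divided by n
                z = sum(X[i][k] * r[i] for i in range(n)) / n + beta[k]
                lam_t = max(lam - c / gamma, 0.0)  # effective threshold
                if abs(z) <= gamma * lam_t:
                    new = soft(z, lam_t) / (1.0 - 1.0 / gamma)
                else:
                    new = z  # penalty is flat here, so no shrinkage
                d = new - beta[k]
                if d != 0.0:
                    for i in range(n):
                        r[i] -= X[i][k] * d
                    beta[k] = new
                    max_change = max(max_change, abs(d))
        if max_change < tol:  # Step 3: convergence check
            break
    return beta
```

With `c = 0` the update reduces to the usual univariate MCP thresholding rule, which matches the remark in the text that the other group members enter only through a shifted threshold. Each coordinate step minimizes the criterion exactly in that coordinate, so the objective cannot increase from one sweep to the next.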

3.2. Solution surface

It is common practice to compute a solution path for a sequence of λ values with a chosen γ when applying the standard concave penalties. For example, in linear models it has been suggested that one use Inline graphic for the SCAD (Fan and Li, 2001) and Inline graphic (Zhang, 2010) for the MCP. For the proposed group penalties, it is not clear which γ is appropriate. Therefore, we treat both λ and γ as tuning parameters and compute the solution surface over a rectangle of Inline graphic.

Let Inline graphic and the grid values of a rectangle in Inline graphic be Inline graphic and Inline graphic. The numbers of grid points Inline graphic and Inline graphic are pre-specified with Inline graphic. It can be shown that Inline graphic. We let Inline graphic, with Inline graphic if Inline graphic and Inline graphic otherwise. Denote the solution corresponding to Inline graphic as Inline graphic. We first compute Inline graphic with Inline graphic as the initial value. Then, for a given Inline graphic, we compute Inline graphic by using Inline graphic as the initial value. The solution surface calculated in this manner is referred to as the solution surface along Inline graphic. In general, it provides a smoother fit than other alternatives. For more details on the solution surface along Inline graphic, we refer to Mazumder and others (2011) and Jiang and Huang (2014).

3.3. Extension to the generalized linear models

The concave 1-norm group penalty can be easily extended to other models by using a different loss function. In this article, we extend it to the GLM family, with a focus on the logistic model. For a GLM, the criterion is defined as

3.3. (3.4)

with Inline graphic being the vector of the Inline graphicth observation. The form of Inline graphic depends on the specified GLM. For a logistic model, Inline graphic. Direct application of CDA is possible but not stable for large Inline graphic in GLMs. Hence, we apply the MM approach together with CDA to compute solutions of (3.4). The main idea of the MM approach is to optimize a majorization function of Inline graphic such that each iteration forces Inline graphic downward until a numerical minimum is reached. For more details about the MM method, we refer to Hunter and Lange (2004), Hunter and Li (2005), and Lange and others (2000).

We assume the following two conditions hold in order to apply MM approach.

  1. The second partial derivative of loss Inline graphic w.r.t. Inline graphic is uniformly bounded for standardized Inline graphic, i.e., there exists a real number Inline graphic such that Inline graphic for all Inline graphic, Inline graphic and Inline graphic.

  2. Inline graphic, with Inline graphic being the second derivative of Inline graphic w.r.t Inline graphic.

For a logistic model, condition (i) can be met by choosing Inline graphic. Condition (ii) is met by choosing Inline graphic for the 1-norm gSCAD and Inline graphic for the 1-norm gMCP. Some calculation shows that the coordinate-wise solutions in a GLM take the following forms:

3.3. (3.5)
3.3. (3.6)

where Inline graphic, Inline graphic and Inline graphic being the first derivative of Inline graphic.
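The combination of MM and CDA described above can be illustrated for the logistic model with the MCP. The sketch below assumes standardized columns (x_k'x_k/n = 1), no intercept, and the standard curvature bound v = 1/4 for the logistic loss; the coordinate update is derived from the quadratic majorizer under those assumptions and requires vγ > 1. It is not claimed to match (3.6) term by term, and all function names are hypothetical.

```python
import math


def soft(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def mcp(t, lam, gamma):
    """MCP value at |t| for penalty level lam and concavity gamma."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def penalized_loss(X, y, beta, groups, lams, gamma):
    """Average logistic loss plus the MCP applied to each group's 1-norm."""
    n = len(y)
    loss = 0.0
    for i in range(n):
        eta = sum(x * b for x, b in zip(X[i], beta))
        # log(1 + exp(eta)) evaluated in a numerically stable way
        loss += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - y[i] * eta
    pen = sum(mcp(sum(abs(beta[k]) for k in g), lam, gamma)
              for g, lam in zip(groups, lams))
    return loss / n + pen


def mm_cd_logistic(X, y, groups, lams, gamma, v=0.25, n_sweeps=50):
    """MM by coordinate descent; X is assumed standardized, y is 0/1."""
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    eta = [0.0] * n  # linear predictor, kept in sync with beta
    for _ in range(n_sweeps):
        for g, lam in zip(groups, lams):
            for k in g:
                # derivative of the average loss with respect to beta_k
                grad = -sum(X[i][k] * (y[i] - 1.0 / (1.0 + math.exp(-eta[i])))
                            for i in range(n)) / n
                tau = beta[k] - grad / v  # minimizer of the unpenalized majorizer
                c = sum(abs(beta[m]) for m in g if m != k)
                lam_t = max(lam - c / gamma, 0.0)
                if abs(tau) <= gamma * lam_t:
                    new = soft(v * tau, lam_t) / (v - 1.0 / gamma)
                else:
                    new = tau
                d = new - beta[k]
                if d != 0.0:
                    for i in range(n):
                        eta[i] += X[i][k] * d
                    beta[k] = new
    return beta
```

Because each coordinate step minimizes an upper bound that touches the criterion at the current iterate, the penalized loss cannot increase from sweep to sweep, which is the descent behavior the MM argument relies on.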

3.4. Convergence analysis

Theorem 3.1 establishes that, under certain regularity conditions, CDA converges to a minimum of the objective function (2.4) for a concave 1-norm group penalized linear model. Theorem 3.2 states that the solution computed by the CDA and MM approach converges to a minimum of the objective function for a concave 1-norm group penalized GLM. The proofs of both theorems are provided in the Appendix of the supplementary material available at Biostatistics online.

Theorem 3.1 (Convergence in linear model) —

Consider the objective function (2.4), where the given data Inline graphic lies on a compact set and no two columns of Inline graphic are identical. Suppose the penalty Inline graphic satisfies Inline graphic, Inline graphic is non-negative, uniformly bounded, with Inline graphic being the first derivative (assuming existence) of Inline graphic w.r.t Inline graphic.

Then the sequence Inline graphic generated by the CDA converges to a minimum of the function Inline graphic defined in (2.4).

Theorem 3.2 (Convergence in GLM) —

Consider the objective function (3.4), where the given data Inline graphic lie on a compact set and no two columns of Inline graphic are identical. Suppose the penalty Inline graphic satisfies Inline graphic, Inline graphic is non-negative, uniformly bounded, with Inline graphic being the first derivative (assuming existence) of Inline graphic w.r.t. Inline graphic. Also assume two conditions listed below hold.

  1. The second partial derivative of loss Inline graphic w.r.t. Inline graphic is uniformly bounded for standardized Inline graphic, i.e., there exists a real number Inline graphic such that Inline graphic for all Inline graphic, Inline graphic and Inline graphic.

  2. Inline graphic, with Inline graphic being the second derivative of Inline graphic w.r.t. Inline graphic.

Then the sequence Inline graphic generated by the aforementioned algorithm converges to a minimum of the function Inline graphic defined in (3.4).

4. Simulation studies

We first compare the 1-norm gMCP, the 2-norm gMCP, and the SGL under group mis-specification. A comprehensive comparison between the 1-norm gMCP and related penalties is then presented under correct group information. For both simulation studies, the Inline graphic penalized covariates Inline graphic. To avoid the complexity of tuning parameter selection, we use a validation approach to select the final model for comparison. That is, for each Inline graphic, we compute a predictive measure based on a validation dataset with Inline graphic. For a linear regression, we use the predictive mean square error (PMSE) defined as Inline graphic. For a logistic regression, we first compute the predictive probability by Inline graphic. Then, based on Inline graphic, we compute the predictive area under the ROC curve (PAUC). For details of computing the PAUC, we refer to Jiang and others (2013). The Inline graphic corresponding to the smallest PMSE or the largest PAUC is selected for comparison across different methods.
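The two validation measures can be sketched as follows. The definitions used here are the generic ones (mean squared prediction error on the validation set, and the AUC computed as the fraction of correctly ordered case/control pairs, with ties counted as one half); they are assumptions, and the exact formulas in this article and in Jiang and others (2013) may differ, for instance in how an intercept is handled. The function names are hypothetical.

```python
import math


def pmse(X_val, y_val, beta):
    """Mean squared prediction error on a validation set (no intercept)."""
    n = len(y_val)
    return sum((y - sum(x * b for x, b in zip(row, beta))) ** 2
               for row, y in zip(X_val, y_val)) / n


def predicted_prob(X_val, beta):
    """Logistic predicted probabilities exp(x'b) / (1 + exp(x'b))."""
    return [1.0 / (1.0 + math.exp(-sum(x * b for x, b in zip(row, beta))))
            for row in X_val]


def pauc(y_val, prob):
    """AUC as the share of (case, control) pairs ranked correctly."""
    pos = [p for p, y in zip(prob, y_val) if y == 1]
    neg = [p for p, y in zip(prob, y_val) if y == 0]
    score = sum(1.0 if a > b else (0.5 if a == b else 0.0)
                for a in pos for b in neg)
    return score / (len(pos) * len(neg))
```

In a validation scheme of this kind, each candidate pair of tuning parameters would be scored with `pmse` (smaller is better) or `pauc` (larger is better), and the best-scoring fit retained.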

4.1. Simulation with group mis-specification

Set Inline graphic, Inline graphic and Inline graphic, with Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic and zero for the rest coefficients. Let Inline graphic, with Inline graphic being the covariance matrix for groups 1 and 2 and Inline graphic the covariance matrix for groups 3 and 4, and Inline graphic being the covariance matrix for group Inline graphic. We set Inline graphic such that within-group correlation is 0.5 and between-group correlation is Inline graphic for groups 1 and 2. Similarly, Inline graphic such that within-group correlation and between-group correlation are both 0.5 for groups 3 and 4. For Inline graphic, we choose a compound symmetry structure with Inline graphic. The working group information is mis-specified in the sense that the causal variables X9–X10 are grouped with the null variables X11–X20 and the causal variables X29–X30 are grouped with the null variables X31–X40. Hence, Inline graphic, Inline graphic, Inline graphic, and Inline graphic for the working group information.

Table 2 presents the results in linear and logistic models. We report the false discovery rate (FDR) as well as the percentage of replications in which X9–X20 and X29–X40 are selected over the Inline graphic replications. The results show that the 1-norm gMCP avoids the false-positive discovery of X11–X20 and X31–X40; hence, it achieves the lowest FDR. We do not report the results of the Lasso because of space limits and because it performs similarly to the SGL (Inline graphic).

Table 2.

Comparison of the 1-norm gMCP, 2-norm gMCP, and SGL with mis-specified group information

| Model | Result | 1-norm gMCP | 2-norm gMCP | SGL (Inline graphic) | SGL (Inline graphic) | SGL (Inline graphic) | SGL (Inline graphic) |
|---|---|---|---|---|---|---|---|
| Linear | FDR | 0.272 | 0.613 | 0.544 | 0.544 | 0.520 | 0.453 |
| Linear | Pct. X9 | 0.872 | 0.714 | 0.984 | 0.984 | 0.994 | 0.988 |
| Linear | Pct. X10 | 0.856 | 0.714 | 0.984 | 0.982 | 0.992 | 0.984 |
| Linear | Pct. range X11–X20 | 0.182–0.208 | 0.714 | 0.972–0.980 | 0.926–0.944 | 0.744–0.812 | 0.430–0.480 |
| Linear | Pct. X29 | 0.866 | 0.684 | 0.968 | 0.976 | 0.996 | 0.982 |
| Linear | Pct. X30 | 0.860 | 0.684 | 0.968 | 0.976 | 0.994 | 0.976 |
| Linear | Pct. range X31–X40 | 0.172–0.218 | 0.684 | 0.956–0.964 | 0.910–0.940 | 0.716–0.804 | 0.390–0.472 |
| Logistic | FDR | 0.167 | 0.369 | 0.452 | 0.438 | 0.429 | 0.477 |
| Logistic | Pct. X9 | 0.274 | 0.372 | 0.652 | 0.654 | 0.686 | 0.730 |
| Logistic | Pct. X10 | 0.260 | 0.372 | 0.654 | 0.660 | 0.684 | 0.724 |
| Logistic | Pct. range X11–X20 | 0.054–0.100 | 0.372 | 0.634–0.652 | 0.594–0.640 | 0.534–0.596 | 0.364–0.422 |
| Logistic | Pct. X29 | 0.284 | 0.372 | 0.624 | 0.620 | 0.656 | 0.704 |
| Logistic | Pct. X30 | 0.288 | 0.372 | 0.624 | 0.620 | 0.648 | 0.692 |
| Logistic | Pct. range X31–X40 | 0.064–0.092 | 0.372 | 0.606–0.620 | 0.578–0.604 | 0.530–0.588 | 0.348–0.428 |

The FDR and the percentage of replications in which X9–X20 and X29–X40 are selected are reported. The causal variables are X9–X10 (mis-grouped with the null variables X11–X20) and X29–X30 (mis-grouped with the null variables X31–X40). The 1-norm gMCP achieves the smallest FDR.

4.2. A comparison with related methods

We compare the proposed 1-norm group penalty with the SGL, group Lasso, the standard concave penalty and the 2-norm concave group penalty in this subsection.

4.2.1. Simulation models

Set Inline graphic and Inline graphic. The Inline graphic is a Inline graphic compound symmetric matrix with Inline graphic, representing a background correlation among predictors. The Inline graphic is a Inline graphic compound symmetric covariance matrix of the Inline graphicth group with Inline graphic as a medium level of within-group correlation. We consider two scenarios: (1) equal group size, with Inline graphic for Inline graphic, and (2) unequal group size, with Inline graphic for Inline graphic, and Inline graphic for Inline graphic. For the coefficients, set Inline graphic for Inline graphic and Inline graphic, with Inline graphic being a vector of length Inline graphic. The value of Inline graphic is chosen such that the signal-to-noise ratio (SNR) is approximately in the range of Inline graphic. We consider the five types of Inline graphic listed below to represent five settings:

  1. Inline graphic, representing a situation where the effects of group members are relatively small but similar.

  2. Inline graphic, representing a situation where the effects of some group members are small but not zero.

  3. Inline graphic, representing a situation where only one or two members have a strong effect, with the other members having small effects.

  4. Inline graphic, representing a situation where the effects of group members are medium, with some null members having zero coefficients.

  5. Inline graphic, representing a situation where only one or two members have a strong effect, with the other members having small or zero coefficients.

Denote the causal variables set as Inline graphic with dimension Inline graphic, and the estimated version as Inline graphic with dimension Inline graphic. Define the set of false-positive variables as Inline graphic and Inline graphic with dimension Inline graphic. Similar concepts are defined at group level. Let the causal group sets Inline graphic with dimension Inline graphic, and the estimated version as Inline graphic with dimension Inline graphic. Denote the set of false-positive groups as Inline graphic and Inline graphic with dimension Inline graphic. We report our results in terms of model size (Inline graphic), false discovery rate (Inline graphic), group model size (Inline graphic), and group false discovery rate (Inline graphic) to evaluate selection performance together with PMSE/PAUC.
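The four selection summaries defined above can be computed as in the sketch below. Two conventions are assumptions made here rather than taken from the article: a group counts as causal (or selected) when it contains at least one causal (or selected) variable, and a false discovery rate is reported as 0 when nothing is selected. The function name and argument layout are hypothetical.

```python
def selection_metrics(beta_hat, causal, groups):
    """Return (MS, FDR, GMS, GFDR) for one fitted coefficient vector.

    beta_hat : list of estimated coefficients
    causal   : set of indices of the truly non-zero coefficients
    groups   : list of lists of column indices, one list per group
    """
    causal = set(causal)
    selected = {k for k, b in enumerate(beta_hat) if b != 0.0}
    false_pos = selected - causal

    causal_groups = {j for j, g in enumerate(groups) if any(k in causal for k in g)}
    selected_groups = {j for j, g in enumerate(groups) if any(k in selected for k in g)}
    false_pos_groups = selected_groups - causal_groups

    ms = len(selected)
    gms = len(selected_groups)
    fdr = len(false_pos) / ms if ms else 0.0
    gfdr = len(false_pos_groups) / gms if gms else 0.0
    return ms, fdr, gms, gfdr
```

For example, with `beta_hat = [1, 0, 0.5, 0, 0, 0.2]`, causal variables `{0, 1}` and three groups of two, the model size is 3 and two of the three selected variables (and groups) are false positives.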

Tables 3 and 4 present the results from 500 replications in linear and logistic models under the five settings. To save space, we only report the results for unequal group sizes with Inline graphic. For the same reason, and because the SCAD and MCP penalties behave similarly, we only report methods based on the MCP. The 1-norm and 2-norm group penalties and the group Lasso are computed with the R package grppenalty, and the SGL with the package SGL. Below we provide a summary of the major findings.

Table 3.

Comparison of the concave 1-norm group penalties with other methods in linear models

| Setting | SNR | Method | PMSE (Inline graphic) | GMS (Inline graphic) | GFDR (Inline graphic) | MS (Inline graphic) | FDR (Inline graphic) |
|---|---|---|---|---|---|---|---|
| 1 | 2.88 | 1-norm gMCP | 1.24 (2.6) | 5.49 (0.6) | 0.06 (6.0) | 60.57 (0.7) | 0.01 (1.0) |
| 1 | 2.88 | SGL (Inline graphic) | 2.01 (6.0) | 19.58 (1.7) | 0.73 (2.8) | 78.84 (4.5) | 0.35 (3.4) |
| 1 | 2.88 | SGL (Inline graphic) | 1.65 (4.8) | 8.2 (1.0) | 0.35 (7.5) | 83.89 (8.4) | 0.28 (6.6) |
| 1 | 2.88 | SGL (Inline graphic) | 1.55 (4.6) | 6.6 (0.7) | 0.21 (7.1) | 79.62 (8.5) | 0.21 (7.2) |
| 1 | 2.88 | Lasso | 1.91 (6.5) | 36.82 (1.0) | 0.86 (0.4) | 133.76 (7.7) | 0.59 (2.0) |
| 1 | 2.88 | MCP | 1.91 (6.5) | 36.78 (1.1) | 0.86 (0.5) | 133.53 (7.8) | 0.59 (2.1) |
| 1 | 2.88 | Group Lasso | 1.56 (5.2) | 22.22 (0.6) | 0.77 (0.7) | 277.37 (7.4) | 0.78 (0.6) |
| 1 | 2.88 | 2-norm gMCP | 1.24 (2.6) | 5.62 (0.6) | 0.08 (6.1) | 67.33 (7.0) | 0.08 (6.1) |
| 2 | 2.61 | 1-norm gMCP | 1.24 (2.7) | 5.53 (0.6) | 0.06 (6.1) | 60.61 (0.7) | 0.01 (1.1) |
| 2 | 2.61 | SGL (Inline graphic) | 1.66 (5.6) | 20.16 (1.8) | 0.74 (2.8) | 69.03 (4.8) | 0.42 (3.8) |
| 2 | 2.61 | SGL (Inline graphic) | 1.52 (4.8) | 10.78 (1.4) | 0.50 (6.8) | 96.06 (11.0) | 0.44 (6.5) |
| 2 | 2.61 | SGL (Inline graphic) | 1.59 (5.2) | 9.06 (1.1) | 0.41 (7.3) | 108.31 (13.9) | 0.41 (7.5) |
| 2 | 2.61 | Lasso | 1.59 (5.2) | 35.60 (1.1) | 0.86 (0.5) | 109.39 (6.1) | 0.62 (2.0) |
| 2 | 2.61 | MCP | 1.51 (4.7) | 26.86 (3.2) | 0.79 (4.4) | 72.28 (10.5) | 0.47 (5.5) |
| 2 | 2.61 | Group Lasso | 1.54 (5.2) | 22.15 (0.7) | 0.77 (0.8) | 276.42 (8.0) | 0.78 (0.7) |
| 2 | 2.61 | 2-norm gMCP | 1.24 (2.6) | 5.74 (0.7) | 0.09 (6.7) | 68.89 (8.8) | 0.09 (6.8) |
| 3 | 2.76 | 1-norm gMCP | 1.24 (2.6) | 5.44 (0.9) | 0.04 (5.5) | 59.69 (1.7) | 0.01 (2.0) |
| 3 | 2.76 | SGL (Inline graphic) | 1.46 (4.2) | 16.21 (1.8) | 0.67 (4.2) | 51.27 (4.5) | 0.39 (4.6) |
| 3 | 2.76 | SGL (Inline graphic) | 1.48 (4.6) | 10.92 (1.4) | 0.50 (6.8) | 92.53 (11.4) | 0.47 (6.7) |
| 3 | 2.76 | SGL (Inline graphic) | 1.69 (6.3) | 9.95 (1.2) | 0.46 (6.8) | 118.77 (15.4) | 0.46 (6.9) |
| 3 | 2.76 | Lasso | 1.40 (3.6) | 32.44 (1.4) | 0.84 (0.7) | 83.70 (4.9) | 0.61 (2.2) |
| 3 | 2.76 | MCP | 1.28 (2.0) | 15.99 (2.7) | 0.62 (8.7) | 35.15 (6.2) | 0.36 (6.3) |
| 3 | 2.76 | Group Lasso | 1.55 (5.3) | 22.03 (0.7) | 0.77 (0.8) | 275.23 (8.5) | 0.78 (0.8) |
| 3 | 2.76 | 2-norm gMCP | 1.24 (2.6) | 5.74 (0.7) | 0.09 (7.0) | 68.95 (9.3) | 0.08 (7.1) |
| 4 | 2.55 | 1-norm gMCP | 1.24 (2.7) | 5.57 (0.6) | 0.07 (6.4) | 60.66 (0.8) | 0.41 (0.7) |
| 4 | 2.55 | SGL (Inline graphic) | 1.64 (5.4) | 21.04 (1.8) | 0.75 (2.4) | 69.83 (4.9) | 0.52 (3.3) |
| 4 | 2.55 | SGL (Inline graphic) | 1.50 (4.7) | 11.58 (1.5) | 0.53 (6.3) | 100.72 (11.8) | 0.62 (4.5) |
| 4 | 2.55 | SGL (Inline graphic) | 1.58 (5.1) | 9.64 (1.2) | 0.44 (7.1) | 115.08 (14.9) | 0.66 (4.4) |
| 4 | 2.55 | Lasso | 1.58 (5.1) | 35.62 (1.1) | 0.86 (0.5) | 107.89 (6.1) | 0.68 (1.6) |
| 4 | 2.55 | MCP | 1.50 (4.7) | 26.44 (3.4) | 0.78 (5.1) | 69.95 (10.3) | 0.50 (6.4) |
| 5 | 2.71 | 1-norm gMCP | 1.24 (2.6) | 5.55 (1.1) | 0.05 (6.1) | 59.62 (1.9) | 0.40 (1.4) |
| 5 | 2.71 | SGL (Inline graphic) | 1.44 (4.0) | 16.97 (1.8) | 0.68 (4.1) | 51.21 (4.6) | 0.51 (4.1) |
| 5 | 2.71 | SGL (Inline graphic) | 1.46 (4.5) | 11.72 (1.5) | 0.54 (6.3) | 97.23 (12.1) | 0.65 (4.5) |
| 5 | 2.71 | SGL (Inline graphic) | 1.67 (6.2) | 10.63 (1.4) | 0.49 (6.6) | 126.79 (16.9) | 0.69 (4.2) |
| 5 | 2.71 | Lasso | 1.39 (3.5) | 32.30 (1.4) | 0.84 (0.7) | 80.96 (5.0) | 0.68 (1.8) |
| 5 | 2.71 | MCP | 1.25 (1.9) | 13.89 (2.8) | 0.54 (11.1) | 29.79 (5.8) | 0.34 (7.9) |

PMSE is the predictive mean square error, MS is model size, FDR is false discovery rate, GMS is group model size, and GFDR is the group false discovery rate. SE is the standard error computed from 500 replications.

Table 4.

Comparison of the concave 1-norm group penalties with other methods in logistic models

PAUC GMS GFDR MS FDR
Setting SNR Method (Inline graphic) (Inline graphic) (Inline graphic) (Inline graphic) (Inline graphic)
1 2.88 Inline graphic gMCP 0.851 (0.78) 8.80 (2.1) 0.34 (10.8) 61.97 (8.7) 0.16 (7.0)
SGLInline graphic 0.841 (0.59) 16.94 (2.0) 0.68 (4.1) 53.5 (5.4) 0.39 (4.7)
SGLInline graphic 0.872 (0.46) 10.17 (1.4) 0.46 (7.4) 89.84 (11.0) 0.41 (6.9)
SGLInline graphic 0.879 (0.43) 10.48 (1.3) 0.49 (6.3) 125.35 (15.9) 0.49 (6.4)
Lasso 0.832 (0.63) 21.42 (2.3) 0.75 (3.0) 52.10 (5.5) 0.43 (4.1)
MCP 0.832 (0.63) 21.42 (2.3) 0.75 (3.0) 52.10 (5.5) 0.43 (4.1)
Group Lasso 0.838 (0.74) 12.73 (1.5) 0.58 (5.2) 155.92 (19.6) 0.59 (5.3)
Inline graphic gMCP 0.857 (0.70) 6.78 (1.2) 0.20 (9.6) 81.55 (15.2) 0.20 (9.7)
2 2.61 Inline graphic gMCP 0.832 (0.83) 9.98 (2.7) 0.37 (12.0) 58.32 (8.7) 0.19 (8.3)
SGLInline graphic 0.828 (0.67) 16.91 (2.3) 0.67 (4.8) 48.2 (6.4) 0.44 (5.5)
SGLInline graphic 0.849 (0.53) 12.62 (1.6) 0.57 (5.6) 104.33 (12.3) 0.53 (5.7)
SGLInline graphic 0.851 (0.52) 13.13 (1.4) 0.60 (4.6) 156.79 (17.5) 0.60 (4.8)
Lasso 0.820 (0.72) 20.44 (2.6) 0.73 (3.6) 45.05 (6.1) 0.46 (4.7)
MCP 0.820 (0.71) 20.31 (2.6) 0.73 (3.8) 44.67 (6.2) 0.46 (4.7)
Group Lasso 0.820 (0.82) 12.42 (1.6) 0.57 (5.9) 151.71 (20.8) 0.57 (6.1)
Inline graphic gMCP 0.841 (0.77) 6.56 (1.2) 0.17 (9.5) 78.48 (14.8) 0.17 (9.6)
3 2.76 Inline graphic gMCP 0.847 (0.77) 11.17 (3.4) 0.38 (13.0) 56.35 (6.8) 0.22 (10.3)
SGLInline graphic 0.849 (0.69) 17.82 (2.5) 0.69 (4.9) 46.38 (6.8) 0.50 (5.4)
SGLInline graphic 0.853 (0.62) 15.02 (1.6) 0.65 (4.2) 118.96 (13.1) 0.63 (4.4)
SGLInline graphic 0.846 (0.60) 15.61 (1.5) 0.66 (3.5) 187.17 (18.7) 0.67 (3.6)
Lasso 0.843 (0.73) 20.57 (2.9) 0.73 (4.7) 41.63 (6.7) 0.50 (5.2)
MCP 0.850 (0.85) 14.96 (3.3) 0.56 (10.5) 27.96 (6.8) 0.38 (8.4)
Group Lasso 0.830 (0.80) 12.20 (1.4) 0.56 (5.3) 149.62 (18.2) 0.57 (5.4)
Inline graphic gMCP 0.852 (0.68) 6.35 (1.1) 0.14 (8.8) 76.44 (13.8) 0.14 (8.9)
4 2.55 Inline graphic gMCP 0.827 (0.84) 9.72 (2.7) 0.36 (12.0) 56.97 (8.2) 0.48 (4.1)
SGLInline graphic 0.822 (0.68) 17.07 (2.4) 0.67 (4.7) 47.91 (6.6) 0.55 (4.8)
SGLInline graphic 0.843 (0.57) 12.99 (1.6) 0.58 (5.5) 107.22 (12.6) 0.69 (3.6)
SGLInline graphic 0.845 (0.55) 13.46 (1.5) 0.61 (4.5) 160.91 (17.7) 0.76 (2.7)
Lasso 0.814 (0.73) 20.48 (2.8) 0.73 (3.9) 44.57 (6.7) 0.55 (4.7)
MCP 0.814 (0.72) 20.10 (2.8) 0.72 (5.0) 43.59 (6.9) 0.54 (5.2)
5 2.71 Inline graphic gMCP 0.842 (0.82) 11.38 (3.4) 0.39 (13.0) 56.64 (7.2) 0.50 (5.1)
SGLInline graphic 0.845 (0.69) 17.86 (2.4) 0.69 (4.9) 45.86 (6.9) 0.60 (4.8)
SGLInline graphic 0.849 (0.64) 15.18 (1.6) 0.65 (4.1) 119.78 (13.1) 0.75 (2.9)
SGLInline graphic 0.840 (0.62) 15.84 (1.6) 0.68 (3.6) 189.59 (19.6) 0.80 (2.2)
Lasso 0.839 (0.73) 20.90 (2.8) 0.73 (4.5) 41.80 (6.6) 0.59 (4.8)
MCP 0.847 (0.87) 15.23 (3.3) 0.57 (10.6) 28.15 (6.9) 0.44 (9.2)

PAUC is the predictive area under ROC curve, MS is model size, FDR is false discovery rate, GMS is group model size, and GFDR is the group false discovery rate. SE is the standard error computed from Inline graphic replications.
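The selection metrics reported in the tables above (MS, FDR, GMS, and GFDR) can be computed from a fitted coefficient vector as in the following minimal sketch. It rests on assumptions that this section does not state: a variable counts as selected when its estimated coefficient is nonzero, a group counts as selected when at least one member is selected, a group is truly active when it contains at least one truly nonzero coefficient, and a rate is set to 0 when nothing is selected. The function name and arguments are hypothetical.

```python
def selection_metrics(beta_hat, beta_true, groups):
    """Return (MS, FDR, GMS, GFDR) for one fitted model.

    beta_hat  : estimated coefficients
    beta_true : true coefficients (known in a simulation)
    groups    : group label of each variable (assumed convention)
    """
    selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
    truly_active = {j for j, b in enumerate(beta_true) if b != 0.0}

    # Individual level: model size and share of selected variables that are null.
    ms = len(selected)
    false_vars = sum(1 for j in selected if j not in truly_active)
    fdr = false_vars / ms if ms else 0.0

    # Group level: a group is selected if any of its members is selected.
    selected_groups = {groups[j] for j in selected}
    active_groups = {groups[j] for j in truly_active}
    gms = len(selected_groups)
    gfdr = len(selected_groups - active_groups) / gms if gms else 0.0
    return ms, fdr, gms, gfdr


# Three groups of two variables; variables 0 and 4 are truly active.
beta_true = [1.0, 0.0, 0.0, 0.0, 2.0, 0.0]
groups = [0, 0, 1, 1, 2, 2]
beta_hat = [0.5, 0.1, 0.0, 0.3, 0.0, 0.0]
ms, fdr, gms, gfdr = selection_metrics(beta_hat, beta_true, groups)
```

In a simulation study these quantities would be averaged over replications, as in the tables.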

4.2.2. Comparison with the SGL

In linear models, the 1-norm gMCP achieves a smaller PMSE than the SGL, while in logistic models the PAUCs of these methods are similar. The concave 1-norm group penalties have smaller FDR and GFDR across all settings. The MS and GMS of the concave 1-norm group penalties are smaller than those of the SGL with Inline graphic and Inline graphic.

4.2.3. Comparison with the standard concave penalty

The PMSE of the concave 1-norm group penalties is smaller than that of the standard concave penalties, while the PAUCs of these methods are close. The 1-norm gMCP has smaller GMS and GFDR in all settings. This is expected since the standard penalties do not make use of group information. The MS and FDR of the concave 1-norm group penalties are smaller than those of the standard concave penalties under settings 1–4. Under setting 5, with one or two dominating members, the standard MCP penalty ends up with a smaller MS.

4.2.4. Comparison with the group Lasso and the concave 2-norm group selection

We compare the concave 1-norm group penalties with the group Lasso and the concave 2-norm group penalties only under settings 1–3, because of the group selection property of the group Lasso and the 2-norm group penalties. The 2-norm group penalty in general has a smaller GMS, while the 1-norm group penalty has a smaller MS. The GFDR and FDR of the 1-norm and 2-norm group penalties are close to each other, and both are smaller than those of the group Lasso.

5. Data example

Our illustrative example comes from a published study exploring the association between genes and the prognosis of breast cancer (van’t Veer and others, 2002; van de Vijver and others, 2002). Tumor samples from Inline graphic women with breast cancer were selected for microarray expression profiling. To be eligible, women had to be 52 years or younger at diagnosis. Fluorescence intensities of approximately 25 000 human genes were quantified and normalized. The ratio of each intensity to that of a reference pool was calculated for analysis. Further details can be found in the references above.

For our purpose, a binary variable indicating whether a patient developed metastasis within 5 years of surgery is modeled as the outcome. A total of 78 patients developed metastasis within 5 years. A total of Inline graphic genes with the largest Spearman correlation coefficients with the outcome were used for illustration. (Note: the method can handle problems with Inline graphic and Inline graphic.) The group membership of the Inline graphic genes was determined by hierarchical clustering using the Gap statistic. The idea of the Gap statistic is as follows: (1) group the genes into Inline graphic clusters and calculate the total within-block sum of squares Inline graphic; (2) create new resampled datasets by separately permuting the measurements of each gene, repeat Step (1) on each resampled dataset, and calculate the average Inline graphic, Inline graphic. Then find the Inline graphic that maximizes Inline graphic. For details about the Gap statistic, we refer to Tibshirani and others (2001) and Ma and Huang (2007). In our example, the optimal Inline graphic. Hence, we have 33 groups with group sizes ranging from 2 to 68. We use the cross-validated area under the ROC curve (CV-AUC) approach to select the tuning parameters Inline graphic. This approach computes the average predictive AUC over the validation datasets created by cross-validation and uses it to select the tuning parameters. We refer to Jiang and others (2013) for more details of the CV-AUC method.
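The two Gap statistic steps described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the procedure used in the paper: the linkage rule (centroid linkage), the log form of the statistic (taken from Tibshirani and others, 2001), the number of permutations, and all function names are assumptions, since the exact formulas are not recoverable from the text here. Each row of `points` holds one gene's measurements across samples, and within-cluster sums of squares are assumed to be strictly positive.

```python
import math
import random


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroid(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def hierarchical_clusters(points, k):
    """Agglomerative clustering down to k clusters (centroid linkage assumed)."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            ca = centroid([points[i] for i in clusters[a]])
            for b in range(a + 1, len(clusters)):
                cb = centroid([points[i] for i in clusters[b]])
                d = sq_dist(ca, cb)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters


def within_ss(points, clusters):
    """Step (1): total within-cluster sum of squares."""
    total = 0.0
    for members in clusters:
        c = centroid([points[i] for i in members])
        total += sum(sq_dist(points[i], c) for i in members)
    return total


def gap_statistic(points, k_values, n_perm=5, seed=0):
    """Step (2): compare observed log W_K with its average over permuted data."""
    rng = random.Random(seed)
    gaps = {}
    for k in k_values:
        log_w = math.log(within_ss(points, hierarchical_clusters(points, k)))
        ref = []
        for _ in range(n_perm):
            # Permute each gene's measurements separately across samples.
            permuted = [rng.sample(row, len(row)) for row in points]
            clusters = hierarchical_clusters(permuted, k)
            ref.append(math.log(within_ss(permuted, clusters)))
        gaps[k] = sum(ref) / n_perm - log_w
    return gaps


# Toy data: 3 latent co-expression patterns, 4 genes each, 10 samples.
rng = random.Random(1)
patterns = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(3)]
genes = [[v + rng.gauss(0, 0.2) for v in patterns[g]]
         for g in range(3) for _ in range(4)]
gaps = gap_statistic(genes, k_values=[2, 3, 4, 5])
best_k = max(gaps, key=gaps.get)  # number of clusters with the largest gap
```

The naive merge loop is cubic in the number of genes, which is acceptable for a toy example but would be replaced by a library routine for data of realistic size.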

Table 5 presents the results of the different penalties based on 20 replications of 5-fold CV-AUC. The median and median absolute deviation (MAD) of CV-AUC, MS, and GMS are reported. The 1-norm gSCAD and gMCP have greater CV-AUC than the other methods. The 2-norm gSCAD and gMCP prefer models with small GMS. The SGL methods have similar results across the three choices of Inline graphic. The standard penalties (Lasso, SCAD, and MCP) have the smallest CV-AUC compared with the group methods. The results suggest that incorporating group information in general improves predictive performance. Among the grouped approaches, the 1-norm group penalty outperforms the others.
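As a rough illustration of how such a summary could be produced, the sketch below computes a 5-fold CV-AUC for several random splits and reports the median and MAD. Several details are assumptions rather than the paper's procedure: folds are stratified by outcome so that every fold contains both classes, the AUC is the Mann-Whitney estimate with ties counted as one half, the MAD is unscaled, and `fit` is a hypothetical callback that takes training data and returns a scoring function. The toy `fit_first_feature` stands in for a penalized logistic regression fit.

```python
import random
import statistics


def auc(scores, labels):
    """Mann-Whitney estimate of the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cv_auc(x, y, fit, n_folds=5, seed=0):
    """Average validation AUC over stratified folds (needs >= n_folds per class)."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    folds = [pos[f::n_folds] + neg[f::n_folds] for f in range(n_folds)]
    fold_aucs = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        predict = fit([x[i] for i in train], [y[i] for i in train])
        scores = [predict(x[i]) for i in held_out]
        fold_aucs.append(auc(scores, [y[i] for i in held_out]))
    return sum(fold_aucs) / n_folds


def median_and_mad(values):
    """Median and unscaled median absolute deviation."""
    values = list(values)
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med, mad


def fit_first_feature(x_train, y_train):
    """Hypothetical stand-in model: ignores training data, scores by feature 0."""
    return lambda row: row[0]


# Perfectly separable toy data, summarized over 20 random fold assignments.
x = [[float(i)] for i in range(20)]
y = [0] * 10 + [1] * 10
replicates = [cv_auc(x, y, fit_first_feature, seed=s) for s in range(20)]
summary = median_and_mad(replicates)
```

In practice `fit` would be called once per candidate tuning parameter, and the parameter with the largest CV-AUC would be retained.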

Table 5.

Results of different penalties in breast cancer study

Method CV-AUC (MAD) GMS (MAD) MS (MAD)
Lasso 0.776 (0.013) 29 (2) 52 (6)
SCAD 0.776 (0.013) 29 (2) 55 (9)
MCP 0.776 (0.013) 29 (2) 52 (7)
SGLInline graphic 0.782 (0.008) 27 (2) 74.5 (15)
SGLInline graphic 0.803 (0.012) 28 (1) 80 (8.5)
SGLInline graphic 0.810 (0.011) 30 (1) 94 (4)
Group Lasso 0.794 (0.009) 12 (1) 71 (11)
Inline graphic gSCAD 0.802 (0.007) 6 (1) 27 (7)
Inline graphic gMCP 0.802 (0.008) 5 (0) 20 (0)
Inline graphic gSCAD 0.825 (0.011) 11.5 (4) 44 (13)
Inline graphic gMCP 0.824 (0.011) 11.5 (4.5) 44 (13)

6. Discussion

The proposed concave 1-norm group penalty has a bi-level selection feature under proper conditions. Its robustness to membership mis-specification is of particular interest in practice, since true group information is usually unavailable. The recent SGL method also has a bi-level selection feature. However, it is sensitive to mis-specification because of its group Lasso component. The robustness of the proposed method is related to the 1-norm penalty within each group. Assuming the same probability of being identified at the group level, the 1-norm still gives freedom to individual members, while the 2-norm does not. Individual-level selection protects against over-control at the group level. Hence, under mis-specification, a causal variable is still likely to be picked even if the group it belongs to is not identified. Likewise, a null variable is less likely to be identified even if its group is selected. More work is needed to better understand the theoretical properties of the method.
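The contrast between the two norms can be made concrete with a small numerical sketch. It applies the MCP of Zhang (2010) to the 1-norm or the 2-norm of a group and measures, by a finite difference, how much the penalty increases when a member currently at zero becomes slightly nonzero. The composite form used here, without any group-size scaling of the tuning parameter, is an assumption for illustration and may differ from the exact penalty defined earlier in the paper; the function names are hypothetical.

```python
import math


def mcp(t, lam, gamma):
    """Minimax concave penalty evaluated at t >= 0."""
    if t <= gamma * lam:
        return lam * t - t * t / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def group_penalty(beta_group, lam, gamma, norm):
    """MCP applied to the 1-norm or 2-norm of one group (no size scaling)."""
    if norm == 1:
        size = sum(abs(b) for b in beta_group)
    else:
        size = math.sqrt(sum(b * b for b in beta_group))
    return mcp(size, lam, gamma)


def entry_cost(beta_group, k, lam, gamma, norm, eps=1e-6):
    """Penalty increase per unit when member k moves from 0 to eps."""
    moved = list(beta_group)
    moved[k] = eps
    before = group_penalty(beta_group, lam, gamma, norm)
    after = group_penalty(moved, lam, gamma, norm)
    return (after - before) / eps


# A selected group with one active member and two members at zero.
beta = [0.5, 0.0, 0.0]
cost_1norm = entry_cost(beta, 1, lam=1.0, gamma=3.0, norm=1)
cost_2norm = entry_cost(beta, 1, lam=1.0, gamma=3.0, norm=2)
```

Under the 1-norm the entry cost stays close to `lam - 0.5 / gamma`, so a weak member of a selected group can still be held at zero. Under the 2-norm the entry cost is essentially zero once the group norm is positive, so all members of a selected group tend to enter together. Note that the 1-norm entry cost also vanishes once the group 1-norm exceeds `gamma * lam`, which is one way to read the qualifier "under proper conditions".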

Compared with the standard concave penalty, the 1-norm group penalty incorporates group information and thus achieves better control of the false discovery rate at both the group level and the individual level. When group information is correctly specified, the 1-norm group penalty is still capable of achieving better FDR and GFDR in most of the cases explored. The R package “grppenalty” implements the proposed algorithms with sufficient efficiency and stability. Hence, the 1-norm group penalty is a valuable tool for variable selection in p > n problems.

Although the 1-norm group penalty is robust to mis-specified group information, we still want to approximate the true grouping structure as accurately as possible. How to achieve this goal is still an open question. We offer several possibilities below. For studies that use questionnaires to collect variable information, we suggest defining the group structure based on the design of the questionnaire. Most of these studies have a block of questions that measure similar quantities of the study subjects. For example, questions attempting to quantify intake of fat-rich foods are usually organized in one block and can be considered as a group. Such a grouping structure is consistent with the perception of researchers and is easy to interpret. However, no statistical procedure justifies such membership information. Another approach is to perform a numerical exploration using index statistics such as the Gap statistic. A group structure can then be established based on the index statistic. Such a structure takes the correlation among predictors into account and therefore in general leads to improved performance. A disadvantage of this grouping method is that it is difficult to interpret. A third way, which is more specific to genomic data, is to use available biological information about genes. Gene Ontology (GO) and the various databases on biological pathways would be a good starting point for collecting such group information.

7. Funding

The work of Huang is supported in part by NIH Grant R01CA142774 and NSF Grant DMS-1208225.

Supplementary Material

Supplementary material is available at http://biostatistics.oxfordjournals.org.

Acknowledgement

We thank the editor, associate editor, and two referees for their helpful comments, which led to considerable improvements in the revision of the paper. Conflict of Interest: None declared.

References

  1. Breheny P., Huang J. (2009). Penalized methods for bi-level variable selection. Statistics and Its Interface 2, 369–380.
  2. Donoho D. L., Johnstone I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455.
  3. Fan J., Li R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96(456), 1348–1360.
  4. Friedman J., Hastie T., Höfling H., Tibshirani R. (2007). Pathwise coordinate optimization. Annals of Applied Statistics 1(2), 302–332.
  5. Friedman J., Hastie T., Tibshirani R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33(1), 1–22.
  6. Friedman J., Hastie T., Tibshirani R. (2010). A note on the group lasso and a sparse group lasso. Technical report. http://arxiv.org/abs/1001.0736.
  7. Huang J., Breheny P., Ma S. (2012). A selective review of group selection in high dimensional model. Statistical Science 27(4), 481–499.
  8. Huang J., Ma S., Xie H. L., Zhang C.-H. (2009). A group bridge approach for variable selection. Biometrika 96, 339–355.
  9. Hunter D. R., Lange K. (2004). A tutorial on MM algorithms. American Statistician 58(1), 30–37.
  10. Hunter D. R., Li R. (2005). Variable selection using MM algorithms. Annals of Statistics 33(4), 1617–1642.
  11. Jiang D., Huang J., Zhang Y. (2013). The cross-validated AUC for MCP-logistic regression with high-dimensional data. Statistical Methods in Medical Research 22(5), 505–518.
  12. Jiang D., Huang J. (2014). Majorization minimization by coordinate descent for concave penalized generalized linear models. Statistics and Computing 24(5), 871–883.
  13. Lange K., Hunter D., Yang I. (2000). Optimization transfer using surrogate objective functions (with discussion). Journal of Computational and Graphical Statistics 9(1), 1–59.
  14. Ma S., Huang J. (2007). Clustering threshold gradient descent regularization: with applications to microarray studies. Bioinformatics 23(4), 466–472.
  15. Mazumder R., Friedman J., Hastie T. (2011). SparseNet coordinate descent with non-convex penalties. Journal of the American Statistical Association 106(495), 1125–1138.
  16. Meier L., van de Geer S., Bühlmann P. (2008). The group lasso for logistic regression. Journal of the Royal Statistical Society, Series B 70(1), 53–71.
  17. Simon N., Friedman J., Hastie T., Tibshirani R. A sparse-group lasso. Technical report. Stanford University.
  18. Tibshirani R., Walther G., Hastie T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society, Series B 63(2), 411–423.
  19. van’t Veer L. J., Dai H., van de Vijver M. J. and others (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature 415(6871), 530–536.
  20. van de Vijver M. J., He Y. D., van’t Veer L. J. and others (2002). A gene-expression signature as a predictor of survival in breast cancer. New England Journal of Medicine 347(25), 1999–2009.
  21. Wu T. T., Lange K. (2008). Coordinate descent algorithms for Lasso penalized regression. Annals of Applied Statistics 2(1), 224–244.
  22. Yuan M., Lin Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B 68(1), 49–67.
  23. Zhang C. H. (2010). Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics 38(2), 894–942.
  24. Zhang C. H., Huang J. (2008). The sparsity and bias of the Lasso selection in high-dimensional linear regression. Annals of Statistics 36(4), 1567–1594.

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press
