Summary
We propose a multivariate sparse group lasso variable selection and estimation method for data with high-dimensional predictors as well as high-dimensional response variables. The method is carried out through a penalized multivariate multiple linear regression model with an arbitrary group structure for the regression coefficient matrix. It is well suited to many biological studies that aim to detect associations between multiple traits and multiple predictors, with each trait and each predictor embedded in some biological functional group such as a gene, a pathway or a brain region. The method effectively removes unimportant groups as well as unimportant individual coefficients within important groups, particularly for large p small n problems, and is flexible in handling various complex group structures such as overlapping, nested, or multilevel hierarchical structures. The method is evaluated through extensive simulations with comparisons to the conventional lasso and group lasso methods, and is applied to an eQTL association study.
Keywords: coordinate descent algorithm, eQTL, high-dimensional data, genetic association, oracle inequalities, sparsity
1 Introduction
Genomic association studies with a single phenotype have been widely studied. Such association studies often encounter high-dimensional predictors with sparsity, i.e., only a small number of predictors are associated with the response. To select truly associated predictors, it is necessary to use regularization penalties to shrink the coefficients of irrelevant predictors to exactly zero. Popular penalties for regression models with a univariate response include the lasso (Tibshirani, 1996), the adaptive lasso (Zou, 2006), the elastic net (Zou and Hastie, 2005) and the smoothly clipped absolute deviation (Fan and Li, 2001), among many others.
An important characteristic of high-dimensional genomic predictors is their intrinsic group structure. For example, DNA markers, also known as single nucleotide polymorphisms (SNPs), can often be grouped into genes, and genes can be grouped into biological pathways. Such grouping strategies have been applied successfully to genomic studies in rare variant detection (Zhou et al., 2010; Biswas and Lin, 2012). For group variable selection, Yuan and Lin (2006) proposed the group lasso method for the univariate response case. It penalizes the L2 norm of each predictor group and selects important groups in an “all-in-all-out” fashion; that is, all the predictors in a group are included or excluded simultaneously. In real applications, however, this is rarely the case: oftentimes not all the variables in an important group are important. For example, a gene associated with a certain complex trait does not mean that all the variants within the gene are causal, and a pathway that regulates certain gene expressions does not necessarily indicate that all its components have regulatory effects. Recent efforts have been made to select both important groups and important within-group signals simultaneously. Huang et al. (2009) and Zhou and Zhu (2010) adopted an Lγ penalty, 0 < γ < 1, to select important groups while removing unimportant variables within them; Zhou et al. (2010) used a penalized logistic regression with a mixed L1/L2 penalty to select both common and rare variants in a genome-wide association study; and Simon et al. (2013) proposed the sparse group lasso for selecting both important groups and within-group predictors. However, all the above methods concern a univariate response.
Many other genomic data analyses focus on investigating the associations between high-dimensional response variables and high-dimensional covariates, such as gene-gene associations (Park and Hastie, 2008; Zhang et al., 2010), protein-DNA associations (Zamdborg and Ma, 2009) and brain fMRI-DNA (or gene) associations (Stein et al., 2010). Oftentimes pairwise associations are calculated in such studies. For example, many multivariate genome-wide association studies still test one association at a time, between a single marker and a single trait, and then correct for multiple hypothesis testing (Dudoit et al., 2003; Stein et al., 2010). However, when both responses and predictors are high dimensional, most family-wise type I error controlling procedures are too conservative and yield poor performance (Stein et al., 2010), and an adjusted analysis considering multiple variables simultaneously is often more appropriate.
High-dimensional responses also often have natural group structures, for example, pathway group structures for gene expression responses and brain functional regions for fMRI intensity responses. For multivariate responses, Peng et al. (2010) adopted the mixed L1/L2 penalty in an orthonormal setting for identifying hub covariates in a gene regulation network; Obozinski et al. (2011) and Bunea et al. (2011) studied joint support union recovery and joint rank selection, respectively; and Lounici et al. (2011) proved oracle inequalities for multitask learning. Despite all these efforts, little attention, to our knowledge, has been paid to cases where the responses also have a group structure, even though such cases are commonly encountered in biological studies. A possible strategy for multivariate-response analysis is to perform covariate selection for one response variable at a time. Such an analysis can account for the predictor group structure but overlooks the response group structure.
In this article, we propose a regularization method that makes good use of the intrinsic biological group structures on both covariates and responses to facilitate better variable selection for multivariate-response, multiple-predictor data by effectively removing unimportant blocks of regression coefficients. Both the predictor and response group structures, or in general, the block structure of the regression coefficient matrix, are assumed known. Many biologically confirmed group structures can be obtained from publicly available repositories, for example, RefSeq gene files from the NCBI Reference Sequence Database (http://www.ncbi.nlm.nih.gov/refseq/), KEGG pathway maps from the Kyoto Encyclopedia of Genes and Genomes (http://www.genome.jp/kegg/), and the Brodmann brain anatomic region atlas from https://surfer.nmr.mgh.harvard.edu/fswiki/BrodmannAreaMaps. The proposed method can handle cases where the number of variables in either responses or predictors is much greater than the sample size, as well as complex group structures such as overlapping groups where a variable belongs to multiple groups. The estimators enjoy finite sample oracle bounds for the prediction error, the estimation error, and the estimated sparsity of the regression coefficient matrix. Extensive simulations show that the proposed method outperforms competing regularization methods. We applied the proposed method to a yeast gene expression quantitative trait loci (eQTL) study, where the numbers of gene expression responses and genetic marker predictors are both much larger than the sample size. The gene expressions are grouped into biological pathways and the genetic markers are grouped into genes. We demonstrate that, by considering both group structures, the proposed method generates a much more interpretable and predictive eQTL network between the gene expressions and genetic markers, compared with several other commonly used regularized approaches.
2 Multivariate linear model with arbitrary grouping
We consider the multivariate linear model
Y = XB + W,   (1)
where Y = (y1,⋯, yq) ∈ ℝn×q is the response matrix of n samples and q variables, X = (x1,⋯, xp) ∈ ℝn×p is the covariate matrix of n samples and p variables, B = (βjk)p×q ∈ ℝp×q is the coefficient matrix, and W = (w1,⋯, wq) ∈ ℝn×q is the matrix of error terms with each wk ~ N(0, σk²In), k = 1,⋯, q. Assume Y and X are centered so that there is no intercept in B. We adopt the notational convention that the column vectors of X are indexed by j, the column vectors of Y and W are indexed by k, and the samples are indexed by i.
Assume B contains G groups, and each group, denoted as Bg where g ∈ {1,⋯, G}, is a subset of two or more elements in B. We denote the group structure by 𝒢 = {B1,⋯, BG}. We use B or Bg to denote either the set of all their elements or the numerical values of all their elements, depending on the context, which should not cause any confusion. Figure 1 illustrates a few examples of group structures, where each highlighted block indicates an important group in 𝒢 and each figure may represent several different group structures. Note that the group structures considered in this article are pre-defined by biological functions, such as gene or pathways. Also note that the union of all groups in 𝒢 does not need to contain all the elements of B, in other words, some βjk may not belong to any group. We say Bg1 is nested in Bg2 if Bg1 ⊂ Bg2; Bg1 and Bg2 are overlapping if Bg1 ∩ Bg2 is not empty. Obviously, nested groups are a special case of overlapping. A group structure with overlapping groups is common in biological studies. For example, when grouping genetic variants according to genes or pathways, different genes or pathways can overlap.
Figure 1.
B* group structures. Important groups are shaded. (a) X group structure, (b) XY group structure, (c) X+XY group structure (nesting group structure) and (d) overlapping group structure.
Though the proposed method works for an arbitrary group structure 𝒢 on B, in real applications, a biologically meaningful group structure on B is usually introduced from the group structures of both predictors and responses. Specifically, suppose X has m1 column groups and Y has m2 column groups, then they yield m1 ×m2 intersection block groups on B. We denote this intersection block group structure by 𝒢XY, the row block group structure only determined by the predictor groups by 𝒢X, and the nested group structure containing all groups in 𝒢XY and 𝒢X by 𝒢XY ∪ 𝒢X. In the eQTL association study, a nonzero group in 𝒢XY indicates that the corresponding gene group has SNPs associated with expressions in the corresponding pathway group. A nonzero group in 𝒢X indicates that the corresponding gene group has an effect on some or all of the expressions.
For an arbitrary group structure 𝒢 with G groups, let ∑g∈𝒢 ‖Bg‖2 be the total sum of L2 norms of every group in 𝒢, where ‖Bg‖2 = (∑βjk∈Bg βjk²)1/2. The group L2 norm reduces to the Frobenius norm ‖A‖2 = {tr(ATA)}1/2 for a matrix group A and to the vector L2 norm ‖a‖2 = {aTa}1/2 for a vector group a. Proofs of theoretical results in the following sections are provided in the web-based Supplementary Materials.
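To make the notation concrete, the following is a minimal Python sketch of how the mixed group/entrywise penalty can be evaluated for an arbitrary 𝒢, with each group encoded as a boolean mask over B; the mask encoding and all names are illustrative choices of ours, not part of the released package.

```python
import numpy as np

def msglasso_penalty(B, groups, lam_g, lam_jk):
    """Mixed penalty: sum_g lam_g * ||B_g||_2 + sum_{j,k} lam_jk * |b_jk|.

    groups : list of boolean (p, q) masks, one per group in G; masks may
             overlap, be nested, or leave some entries of B ungrouped.
    lam_g  : sequence of group tuning parameters, aligned with `groups`.
    lam_jk : (p, q) array of entrywise lasso tuning parameters.
    """
    group_term = sum(lg * np.sqrt(np.sum(B[m] ** 2))
                     for m, lg in zip(groups, lam_g))
    lasso_term = np.sum(lam_jk * np.abs(B))
    return group_term + lasso_term

# Example: one row-block group and one 2x2 block group on a 4x3 matrix.
B = np.arange(12, dtype=float).reshape(4, 3)
g1 = np.zeros((4, 3), dtype=bool); g1[0, :] = True       # a row group
g2 = np.zeros((4, 3), dtype=bool); g2[2:4, 0:2] = True   # a block group
print(msglasso_penalty(B, [g1, g2], lam_g=[1.0, 1.0],
                       lam_jk=np.full((4, 3), 0.1)))
```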
3 The regularization method and its properties
3.1 The multivariate sparse group lasso
For an arbitrary group structure 𝒢 on B, to simplify the notation, we denote {g: Bg ∈ 𝒢} by {g ∈ 𝒢} as long as it does not cause any confusion. For j = 1,…, p and k = 1, …, q, let λjk ≥ 0 be the adaptive lasso tuning parameter for βjk, with λjk = 0 if βjk is not penalized. Let λg ≥ 0 be the adaptive tuning parameter for group Bg ∈ 𝒢, with λg = 0 if group Bg is not penalized. We consider the following penalized optimization problem for a general regularized multivariate multiple linear regression:
B̂ = argminB {(1/2)‖Y − XB‖2² + ∑g∈𝒢 λg‖Bg‖2 + ∑j,k λjk|βjk|},   (2)
where the L2 penalty term aims to shrink unimportant groups to zero and the L1 penalty term aims to shrink unimportant entries within an important group to zero. We call it the multivariate sparse group lasso (MSGLasso). We exclude the trivial case that λg = 0 for all g ∈ 𝒢 and λjk = 0 for all j, k. To better understand the solution to (2), we develop the following theorem for βjk when all other elements in B are fixed.
Theorem 3.1
For an arbitrary group structure 𝒢 on B, let B̂ be the solution to (2) and β̂jk be its jk-th element. If for some group Bg0 ∈ 𝒢 with a tuning parameter λg0,
{∑βjk∈Bg0 [(|xjT(Y − XB̂(−g0))·k| − λjk)+]²}1/2 ≤ λg0,   (3)

where B̂(−g0) is B̂ with all the elements of Bg0 replaced by zeros,
then β̂jk = 0 for every βjk ∈ Bg0. Otherwise, β̂jk satisfies
β̂jk = (|Sjk| − λjk)+ sign(Sjk) / (xjTxj + ∑{g∈𝒢: βjk∈Bg, ‖B̂g‖2>0} λg/‖B̂g‖2),   (4)
where Sjk = xjT(Y − XB̂(−j))·k, with B̂(−j) being B̂ with its j-th row replaced by zeros, the subscript ·k refers to the k-th column of a matrix, and a+ = a if a > 0 and 0 otherwise.
Note that Theorem 3.1 gives a general solution form and applies to arbitrary group structures. If there is no group structure assigned on B, then 𝒢 is an empty set and (4) reduces to the lasso solution; if λjk = 0 for all j, k, then (3) and (4) provide the group lasso solution. It is of interest to consider certain special group structures that are intuitive and commonly used in many applications. Specifically, we consider model (2) with the following four group structures: (I) 𝒢 = ∅, no group structure assigned on B; (II) 𝒢X; (III) 𝒢XY; (IV) 𝒢XY ∪ 𝒢X. The corresponding optimization problems become
B̂ = argminB {(1/2)‖Y − XB‖2² + λ|B|1},   (5)

B̂ = argminB {(1/2)‖Y − XB‖2² + λ|B|1 + λ1 ∑g∈𝒢X ωg1 ‖Bg‖2},   (6)

B̂ = argminB {(1/2)‖Y − XB‖2² + λ|B|1 + λ2 ∑g∈𝒢XY ωg2 ‖Bg‖2},   (7)

B̂ = argminB {(1/2)‖Y − XB‖2² + λ|B|1 + λ1 ∑g∈𝒢X ωg1 ‖Bg‖2 + λ2 ∑g∈𝒢XY ωg2 ‖Bg‖2},   (8)
where |B|1 = ∑jk |βjk| is the L1 norm of B, and ωg1 and ωg2 are group-specific weights, in particular the group sizes. The tuning parameters are λjk = λ for all lasso penalties, λg = ωg1λ1 if g ∈ 𝒢X, and λg = ωg2λ2 if g ∈ 𝒢XY.
In the remainder of this article, we call (5) the Lasso model, (6) the Lasso+X model, (7) the Lasso+XY model, and (8) the Lasso+X+XY model.
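The structures 𝒢X, 𝒢XY and 𝒢XY ∪ 𝒢X in (6)–(8) can be built mechanically from the predictor and response groupings. A small sketch, using the boolean-mask encoding introduced above (all names are ours):

```python
import numpy as np

def make_group_structures(p, q, x_groups, y_groups):
    """Build the G_X row blocks and the G_XY intersection blocks on B.

    x_groups : list of predictor index lists (column groups of X).
    y_groups : list of response index lists (column groups of Y).
    Returns (G_X, G_XY); the nested structure for model (8) is G_X + G_XY.
    """
    G_X = []
    for rows in x_groups:
        m = np.zeros((p, q), dtype=bool)
        m[rows, :] = True                      # whole rows of B
        G_X.append(m)
    G_XY = []
    for rows in x_groups:
        for cols in y_groups:
            m = np.zeros((p, q), dtype=bool)
            m[np.ix_(rows, cols)] = True       # intersection block
            G_XY.append(m)
    return G_X, G_XY

# Example: ten predictor groups and ten response groups of size 20
# (the simulation layout of Section 5.1) give 10 + 100 groups for (8).
x_groups = [list(range(20 * i, 20 * (i + 1))) for i in range(10)]
y_groups = [list(range(20 * i, 20 * (i + 1))) for i in range(10)]
G_X, G_XY = make_group_structures(200, 200, x_groups, y_groups)
```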
Let B̂L, B̂LX, B̂LXY and B̂LXXY be the solutions to (5), (6), (7) and (8), respectively. Their corresponding expressions from Theorem 3.1 reduce to simpler forms under the orthonormal design; in particular, B̂LX and B̂LXY are further shrinkages of B̂L, and B̂LXXY is a further shrinkage of either B̂LX or B̂LXY. We are also interested in the group lasso cases where λ = 0 in (6), (7) and (8), with their solutions denoted by B̂GX, B̂GXY and B̂GXXY, respectively. The main theorems in Yuan and Lin (2006) and Peng et al. (2010) then become special cases.
In the eQTL example that we analyze later, method (5) does not take advantage of the known group structure. Method (6) concerns only the predictor group structure and therefore can select important gene groups, but it ignores which pathways those genes are associated with. Method (7) considers both predictor and response group structures and therefore can select gene-to-pathway association blocks. Method (8) retains the advantages of both (6) and (7) and is more robust to misspecified group structures.
3.2 Oracle inequalities
The lasso method has been shown to achieve the oracle bounds for both prediction and estimation in the multiple linear regression model, which are the error bounds one would obtain if the true model were given, see for example, Bickel et al. (2009). Similar bounds also hold for a total of pq regression coefficients in the multivariate multiple linear regression model with a multivariate mixed L1/L2 penalty. For notational simplicity, we consider the following special case of (2) with λjk = λ for all j, k:
B̂ = argminB {(1/2)‖Y − XB‖2² + ∑g∈𝒢 λg ‖Bg‖2 + λ|B|1}.   (9)
We follow the method of Bickel et al. (2009). Let J1(B) = {jk : |βjk| ≠ 0} be the index set of nonzero elements in B, and J2(B) = {g ∈ 𝒢 : ‖Bg‖2 ≠ 0} be the index set of nonzero groups in 𝒢. Define M1(B) = ∑jk I(βjk ≠ 0) = |J1(B)| and M2(B) = ∑g∈𝒢 I(‖Bg‖2 ≠ 0) = |J2(B)|. For any matrix Δ ∈ ℝp×q and any given index set J1 ⊆ {jk : 1 ≤ j ≤ p, 1 ≤ k ≤ q}, denote by ΔJ1 the projection of Δ on the index set J1, that is, the matrix with the same elements as Δ on the coordinates in J1 and zeros on the complementary coordinates. Also, for any group index set J2 ⊆ {1,⋯,|𝒢|}, denote by ΔJ2 the set of projections of Δ on each of {Bg : g ∈ J2}, that is, ΔJ2 = {ΔBg : g ∈ J2}. Denote M1(B*) = r and M2(B*) = s. We then impose a restricted eigenvalue assumption for the multivariate linear regression model with a multivariate mixed L1/L2 penalty, which leads to the desirable oracle inequalities.
Assumption 3.2
Let J1 ⊆ {jk : 1 ≤ j ≤ p, 1 ≤ k ≤ q} and J2 ⊆ {1,⋯,|𝒢|} be any index sets that satisfy |J1| ≤ r and |J2| ≤ s. Let ρ̃ = {ρg : g ∈ 𝒢} be a set of positive numbers. Then for any nontrivial matrix Δ ∈ ℝp×q that satisfies
the following minima exist and are positive, denoted by κ1 = κ1(r, s, ρ̃) and κ2 = κ2(r, s, ρ̃):
Theorem 3.3
Consider model (9). Let B* be the true coefficient matrix. Assume each column of the error matrix, wk, follows a multivariate normal distribution N(0, σk²In), and all the diagonal elements of the matrix XTX/n are equal to 1. Suppose M1(B*) = r and M2(B*) = s. Let ψmax be the largest eigenvalue of XTX/n, σ = max{σ1,⋯, σq}, λg = ρgλ for g ∈ 𝒢, ρ = min{1, ρg; g ∈ 𝒢}, c be the maximum number of duplicates of a coefficient in overlapping groups in 𝒢, and
λ = (2Aσ/ρ){log(pq)/n}1/2 for some constant A > 21/2. Furthermore, assume Assumption 3.2 holds with κ1 = κ1(r, s, ρ̃) and κ2 = κ2(r, s, ρ̃). Then with probability at least 1 − (pq)1−A²/2, we have oracle bounds for the prediction error, the estimation error and the order of sparsity.
The mean squared prediction error is bounded by a factor of order λ² ~ log(pq)/n, the l1 norm of the estimation error is bounded by a factor of order λ ~ {log(pq)/n}1/2, and the estimated order of sparsity is bounded by a constant related to Assumption 3.2. These results are similar to those in Bickel et al. (2009). Note that Theorem 3.3 still holds for flexible λjk in (2), as long as λjk > 0 for all j, k.
4 The mixed coordinate descent algorithm
Based on Theorem 3.1, the zero groups can be determined according to (3), and the entries in a nonzero group can be determined by solving for the fixed point solution of (4) using a coordinate descent algorithm. The coordinate descent algorithm updates one coefficient βjk at a time while fixing all the other coefficients at their current values. Theoretically, the coordinate descent algorithm would work if one could solve (4) for β̂jk exactly. Practically, since β̂jk also appears in the term ∑{g∈𝒢: βjk∈Bg, ‖B̂g‖2>0} λg/‖B̂g‖2 on the right hand side of (4), unlike the lasso, a closed form solution is usually not available, and numerically solving for β̂jk requires iteratively updating (4), which can be time consuming. Here we propose a mixed coordinate descent algorithm, which updates β̂jk only once, from β̂jk(m) to β̂jk(m+1), according to (4) without iteratively solving (4). In particular, the algorithm updates β̂jk as follows.
If any of the groups Bg ∈ 𝒢 containing βjk satisfies (3), then the entire group is estimated at zero. Otherwise β̂jk will be updated according to one of the situations (II)–(IV).
- If all the groups containing βjk satisfy ‖B̂g−(jk)‖2 = 0 at the current step, where B̂g−(jk) is B̂g with its jk-th element replaced by zero, then β̂jk is updated by

β̂jk(m+1) = (|Sjk| − λjk − ∑{g∈𝒢: βjk∈Bg} λg)+ sign(Sjk) / (xjTxj).

Notice that in this case, (4) becomes a closed form lasso solution.
- If all the groups containing βjk satisfy ‖B̂g−(jk)‖2 > 0 at the current step and λjk = 0, then β̂jk is updated by the group lasso formulation

β̂jk(m+1) = Sjk / (xjTxj + ∑{g∈𝒢: βjk∈Bg} λg/‖B̂g(m)‖2).

Notice in this case, all the entries in a group Bg with ‖B̂g−(jk)‖2 > 0 will enter as nonzero entries, or in other words, the whole group Bg will be selected as an important group.
- If some but not all groups containing βjk satisfy ‖B̂g−(jk)‖2 = 0 at the current step, then β̂jk belongs to a mixture of the lasso case (for groups with ‖B̂g−(jk)‖2 = 0) and the group lasso case (for groups with ‖B̂g−(jk)‖2 > 0), and it is updated by a mixture of the lasso and the group lasso through

β̂jk(m+1) = (|Sjk| − λjk − ∑{g: βjk∈Bg, ‖B̂g−(jk)‖2=0} λg)+ sign(Sjk) / (xjTxj + ∑{g: βjk∈Bg, ‖B̂g−(jk)‖2>0} λg/‖B̂g(m)‖2).
Specifically, for a fixed set of values of all the tuning parameters, the algorithm proceeds as follows.
- Step 1. Standardize the data such that ∑i xij = 0 and xjTxj = n for all j ∈ {1,⋯, p}, and ∑i yik = 0 for all k ∈ {1,⋯, q}.
In our numerical examples, we also standardize yk such that ykTyk = n, to minimize the impact of different scales of variations across yk on the regression coefficients for all k ∈ {1,⋯, q}. Step 2. Set initial values β̂jk(0) for all β̂jk and the iteration index m = 1. We use initial values β̂jk(0) = 0 in our numerical examples.
Step 3. For a given pair (j, k), fix βj′k′ at β̂j′k′(m) for all j′ ≠ j or k′ ≠ k. Then update β̂jk(m) to β̂jk(m+1) by (I) to (IV) accordingly.
Step 4. Repeat Step 3 for all j ∈ {1,⋯, p} and k ∈ {1,⋯, q}, and iterate until ‖B̂(m) − B̂(m−1)‖ reaches a prespecified precision level for some norm ‖·‖. We use the infinity norm in our numerical examples.
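To make the updates concrete, the following is a minimal NumPy sketch of the mixed coordinate descent under the standardization of Step 1. It is an illustration only, not the released C/C++ implementation: the names (msglasso_fit, soft), the boolean-mask group encoding, the B0 warm-start argument, and the choice to check the group rule (3) once per sweep are all our own simplifications.

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding operator: sign(z) * (|z| - t)_+."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def msglasso_fit(X, Y, groups, lam_g, lam_jk, B0=None, max_iter=200, tol=1e-5):
    """Sketch of the mixed coordinate descent for objective (2).

    X : (n, p), standardized so that x_j'x_j = n;  Y : (n, q), centered.
    groups : list of boolean (p, q) masks defining an arbitrary G
             (overlapping groups are allowed).
    lam_g  : per-group tuning parameters, aligned with `groups`.
    lam_jk : (p, q) array of entrywise lasso tuning parameters.
    B0     : optional warm start for B.
    """
    n, p = X.shape
    q = Y.shape[1]
    B = np.zeros((p, q)) if B0 is None else B0.copy()
    xtx = (X ** 2).sum(axis=0)                  # x_j' x_j, = n after scaling
    R = Y - X @ B                               # running residual matrix
    for _ in range(max_iter):
        B_old = B.copy()
        # Group-level rule (3): zero out whole groups, checked once per sweep.
        killed = np.zeros((p, q), dtype=bool)
        for mask, lg in zip(groups, lam_g):
            Rg = R + X @ np.where(mask, B, 0.0)            # Y - X B^(-g)
            jj, kk = np.nonzero(mask)
            score = np.einsum('ij,ij->j', X[:, jj], Rg[:, kk])
            t = np.maximum(np.abs(score) - lam_jk[jj, kk], 0.0)
            if np.sqrt((t ** 2).sum()) <= lg:              # condition (3) holds
                B[mask] = 0.0
                R = Rg
                killed |= mask
        # Entrywise updates: one pass of (4) per coordinate, cases (II)-(IV).
        for j in range(p):
            for k in range(q):
                if killed[j, k]:
                    continue
                s = X[:, j] @ R[:, k] + xtx[j] * B[j, k]   # S_jk in (4)
                thresh, denom = lam_jk[j, k], xtx[j]
                for mask, lg in zip(groups, lam_g):
                    if not mask[j, k]:
                        continue
                    nrm2 = (B[mask] ** 2).sum()
                    rest = max(nrm2 - B[j, k] ** 2, 0.0)   # ||B_g^{-(jk)}||^2
                    if rest == 0.0:
                        thresh += lg           # acts like an extra L1 penalty
                    else:
                        denom += lg / np.sqrt(nrm2)        # group shrinkage
                b_new = soft(s, thresh) / denom
                R[:, k] -= X[:, j] * (b_new - B[j, k])     # keep residual current
                B[j, k] = b_new
        if np.max(np.abs(B - B_old)) < tol:    # infinity-norm stopping rule
            break
    return B
```

Keeping the running residual matrix R makes each coordinate update O(n), the standard bookkeeping trick for lasso-type coordinate descent.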
Convergence of different types of coordinate descent algorithms has been studied in the literature. Tseng (2001) provided conditions for the convergence of the cyclic coordinate descent algorithm with general separable objective functions. Wu and Lange (2008) proved the convergence of the greedy coordinate descent algorithm with an L2 loss and the lasso penalty. Following Wu and Lange (2008), we show the convergence of our mixed coordinate descent algorithm in the following proposition. Details are provided in the Supplementary Materials, where we also illustrate that our mixed coordinate descent algorithm converges much faster than a coordinate descent algorithm that solves the fixed point solution to (4) with inner iterations.
Proposition 4.1
A sequence of coordinate estimates iteratively updated by the mixed coordinate descent algorithm converges to a global minimizer of the objective function.
We implemented the MSGLasso and the mixed coordinate descent algorithm in C/C++ and wrapped them into an R package. It is available in the web-based Supplementary Materials and will soon be uploaded to the CRAN repository.
5 Numerical studies
5.1 Simulations
In this section, we first investigate the numerical performance of the Lasso, Lasso+X, Lasso+XY and Lasso+X+XY methods and their group lasso counterparts when the true coefficient matrix B* takes a group structure of 𝒢X, 𝒢XY or 𝒢XY ∪ 𝒢X. We also compare the proposed MSGLasso method with the lasso and group lasso for an overlapping group structure.
All the true group structures considered in our simulations are given in Fig. 1(a)–1(d). For each group structure, we consider two scenarios: (i) “all-in-all-out”, where all the coefficients in an important group are important, and (ii) “not-all-in-all-out”, where only a subset of coefficients in an important group are important. Specifically, we generate B* by setting βjk* = 0 if it is from an unimportant group, and drawing its value from a uniform distribution on [−5,−1] ∪ [1, 5], then fixing it across the simulations, if it is from an important group. The proportion of nonzero coefficients in an important group in the “not all in all out” setting is randomly set between 1/6 and 1/4.
Each B* is of dimension 200 × 200. For a nonoverlapping group structure, each X row group is of dimension 20 × 200; each XY block group is of dimension 20 × 20. For the overlapping group structure, the groups start on coordinates (1, 21, 41, 61, 101, 121, 141, 181) and end on coordinates (20, 40, 70, 100, 120, 150, 180, 200), for both X and Y variables.
Covariates xi = (xi1,⋯, xip)T, i = 1,⋯, n, are generated from a multivariate normal distribution Np(0, ΣX), where ΣX = diag(Σg1,⋯, Σg10) is block diagonal and each block, corresponding to a group of X, has a first order autoregressive structure. Specifically, Σgi(j, k) = ρ^|j−k| for any pair (j, k) from the same group, i = 1,⋯, 10. The error terms wik are generated from a normal distribution N(0, σ²), where σ² is chosen to yield a signal to noise ratio of 2. Finally, the responses are generated from Y = XB* + W.
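For concreteness, a sketch of this data-generating mechanism under the stated settings; only one important block is shown, whereas the actual simulations place several important groups following Fig. 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q, rho, size = 100, 200, 200, 0.5, 20

# Block-diagonal AR(1) covariance: Sigma_g(j, k) = rho**|j - k| within
# each of the ten predictor groups, independence across groups.
idx = np.arange(size)
Sigma_g = rho ** np.abs(idx[:, None] - idx[None, :])
X = np.hstack([rng.multivariate_normal(np.zeros(size), Sigma_g, size=n)
               for _ in range(p // size)])

# B*: zeros outside important groups; inside an important block, a
# random ~1/4 of the entries are drawn from Uniform([-5,-1] u [1,5]).
B_star = np.zeros((p, q))
block = B_star[0:20, 0:20]                       # one illustrative block
nz = rng.random((20, 20)) < 0.25
block[nz] = rng.choice([-1, 1], nz.sum()) * rng.uniform(1, 5, nz.sum())

# Gaussian noise with sigma^2 chosen to give a signal-to-noise ratio of 2.
signal = X @ B_star
sigma = np.sqrt(signal.var() / 2.0)
W = rng.normal(scale=sigma, size=(n, q))
Y = signal + W
```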
The optimal values of the tuning parameters may be selected by different criteria. Since the degrees of freedom are difficult to determine for a penalty with multiple tuning parameters, we search for the optimal tuning parameter values using 5-fold cross-validation over a wide range of candidate values. The search starts with the largest candidate tuning parameter values, each by itself large enough to shrink all the coefficients to zero. The converged estimate B̂ obtained from the previous search step is used as the initial value for B in the next search step with a new set of tuning parameter values. We find this warm-start strategy very effective in reducing the computational cost.
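A sketch of the warm-start search, reusing the msglasso_fit sketch from Section 4 (its B0 argument is our own addition for warm starting); each visited (λ, λg) pair on the path would then be scored by 5-fold cross-validation error.

```python
import numpy as np

def warm_start_path(X, Y, groups, lam_grid, lamg_grid):
    """Grid search over (lambda, lambda_g) with warm starts (a sketch).

    Starts from penalties large enough to shrink B to zero and feeds each
    converged estimate in as the initial value for the next grid point.
    """
    p, q = X.shape[1], Y.shape[1]
    B0 = np.zeros((p, q))
    path = []
    for lam in sorted(lam_grid, reverse=True):        # largest values first
        for lg in sorted(lamg_grid, reverse=True):
            lam_jk = np.full((p, q), lam)
            lam_g = [lg] * len(groups)
            B0 = msglasso_fit(X, Y, groups, lam_g, lam_jk, B0=B0)
            path.append((lam, lg, B0.copy()))
    return path
```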
For each simulation setup, we run one hundred replications and calculate the averages of the variable selection sensitivity and specificity, defined through the numbers of correctly identified nonzero and zero entries of B*, and the prediction error on (Ytest, Xtest), an independently generated testing set of 100 samples.
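A sketch of how these quantities can be computed for one replication, assuming the selection measures are the usual support sensitivity and specificity (the exact normalizations are our reading of the text):

```python
import numpy as np

def evaluate(B_hat, B_star, X_test, Y_test):
    """Selection and prediction summaries for one replication (a sketch)."""
    true_nz = B_star != 0
    est_nz = B_hat != 0
    sensitivity = (est_nz & true_nz).sum() / true_nz.sum()     # nonzeros found
    specificity = (~est_nz & ~true_nz).sum() / (~true_nz).sum()  # zeros kept
    pred_err = np.sum((Y_test - X_test @ B_hat) ** 2) / len(Y_test)
    return sensitivity, specificity, pred_err
```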
Figure 2 summarizes these quantities for simulation setups with “not all in all out” coefficients for all the group structures in Fig. 1 at p = q = 200, n = 100, and ρ = 0.5. The proposed method using Lasso+X+XY for the nonoverlapping group structures 𝒢X, 𝒢XY and 𝒢XY ∪ 𝒢X as well as for the overlapping group structure is highlighted in black. The methods for the correctly specified group structures are highlighted in grey, except in Fig. 2(c) and Fig. 2(d), where the implemented group structures are themselves the correctly specified group structures. From Fig. 2 we see that correctly incorporating group structure improves both variable selection and prediction, and our proposed method Lasso+X+XY, or the MSGLasso, performs at least as well as, if not better than, the methods for the correct group structures and yields the lowest prediction errors.
Figure 2.
Simulation results, large p small n, “not all in all out” cases with n = 100, p = q = 200 and ρ = 0.5. SGL: the multivariate sparse group lasso; G: the multivariate group lasso.
Figure 3 illustrates fitted results for a data set randomly chosen from the one hundred replications, where B* has a “not all in all out” 𝒢XY ∪ 𝒢X or overlapping group structure with p = 200, q = 200 and ρ = 0.5. It clearly shows that the MSGLasso with the correctly specified group structure, in Fig. 3(e) and Fig. 3(k), yields the most desirable estimates. Methods without the lasso penalty yield too many false positives inside the important groups in the “not all in all out” case even when the groups are correctly specified, while methods with the lasso penalty but incorrectly specified groups yield too many false positives outside the important groups.
Figure 3.
Heatmaps of coefficient matrices, selection effects. (a)–(h): “Not all in all out” X+XY nonoverlapping group structure with n = 100, p = 200, q = 200, and ρ = 0.5. (a) B*; (b) B̂L; (c) B̂LX; (d) B̂LXY ; (e) B̂LXXY ; (f) B̂GX; (g) B̂GXY ; (h) B̂GXXY. (i)–(l): “Not all in all out” overlapping group structure with n = 100, p = 200, q = 200, and ρ = 0.5. (i) B*; (j) B̂L; (k) B̂SGL; (l) B̂G.
5.2 Yeast eQTL data analysis
In this section, we demonstrate our method by analyzing a yeast eQTL data set generated by Brem and Kruglyak (2005), see also Yin and Li (2011), where gene expressions are grouped into, possibly overlapping, pathways and the genetic markers are grouped into genes.
The data set contains expression measurements of 6216 yeast genes for 112 individual segregants. Genotypes of these 112 segregants at 2956 marker positions were also collected using GeneChip Yeast Genome S98 microarrays. The 6216 expressed genes are grouped by Kyoto Encyclopedia of Genes and Genomes pathways and the 2956 markers are grouped by genes, taking isoform genes as the same gene. To illustrate the method, the reported analysis includes only genes from the following four pathways: the mitogen-activated protein kinase (MAPK) pathway containing 54 genes, the cell cycle pathway containing 116 genes, the cancer pathway containing 20 genes and the ribosome pathway containing 137 genes. There are in total 315 distinct expressed genes in these pathways, with 5 genes overlapping between MAPK and cell cycle, 5 genes overlapping between MAPK and cancer, 3 genes overlapping between cell cycle and cancer, and 1 gene overlapping among MAPK, cell cycle and cancer. The ribosome pathway does not share genes with the other three pathways.
We follow a prescreening procedure similar to that of Yin and Li (2011), performing univariate linear regressions across all the 315 gene expressions and 2956 markers, and include in the final analysis the 395 markers with a p-value of 0.01 or smaller. These 395 markers are embedded in 45 distinct genes.
Since a marker within a gene being associated with some gene expression in a pathway does not necessarily imply that the gene is associated with all four pathways, we exclude the 𝒢X group structure and apply only an overlapping 𝒢XY group structure in the data analysis. We cross-validate the performance of the multivariate sparse group lasso, the multivariate lasso, the multivariate group lasso and the univariate lasso. In particular, we randomly divide the 112 samples into five approximately equal sized subsets, set one subset aside as the test set, and use the remaining four subsets as the training set. Then for each model, we run 5-fold cross-validation on the training set to estimate the coefficient matrix, and use the estimated model to compute the prediction error on the test set. We repeat the above procedure until each of the five subsets has been used as the test set once. The overall cross-validated prediction errors, measured as sums of squares, are reported in Table 1. The univariate lasso is conducted by first selecting variables on the training set using 315 separate lasso regressions, one for each gene expression variable, and then fitting a multivariate linear regression on only the selected covariates to obtain B̂. Our proposed method has the best performance. The univariate lasso gives the highest prediction error; this is expected because the relations among the responses are totally overlooked, which leads to high variability and over-fitting (Peng et al., 2010). The proposed method shows roughly a 10% decrease in cross-validated prediction error over the multivariate lasso, the second best of the four compared methods.
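The comparison protocol can be sketched as follows, where fit_with_cv stands for any one of the four fitting routines wrapped with its own internal 5-fold tuning; it is a hypothetical helper, not a function from the package.

```python
import numpy as np

def outer_cv_error(X, Y, fit_with_cv, n_folds=5, seed=1):
    """Cross-validated test error: hold out each fold once, tune on the
    remaining folds via internal CV, and accumulate the test sum of squares."""
    n = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    total = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        B_hat = fit_with_cv(X[train], Y[train])   # tunes by 5-fold CV inside
        total += np.sum((Y[test] - X[test] @ B_hat) ** 2)
    return total
```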
Table 1.
Comparison of prediction errors between different methods
| Method | MSG lasso | M lasso | MG lasso | lasso |
|---|---|---|---|---|
| Prediction error | 3094.5 | 3396.8 | 3557.4 | 3683.3 |
MSG lasso = multivariate sparse group lasso, M lasso = multivariate lasso, MG lasso = multivariate group lasso, lasso = univariate lassos.
We then apply the multivariate sparse group lasso to the entire data set with 315 gene expressions and 395 markers. The final tuning parameters, determined by 5-fold cross-validation, are λ = 7 × 10−2 and λ1 = 2 × 10−4. We also investigate the selection stability following Meinshausen and Bühlmann (2010) by calculating the selection frequencies of the top selected associations over one hundred bootstrap data sets. The top associations in terms of size, with selection frequency no less than 95%, are given in Table 2. The p-values in the last column are obtained from marginal simple linear regressions. Overall there are 1422 nonzero elements in the estimated coefficient matrix, an overall estimated sparsity of about 1%. There are 235 markers with nonzero coefficients related to genes in the MAPK pathway, 135 markers related to genes in the cell cycle pathway, 65 markers related to genes in the cancer pathway, and 65 markers related to genes in the ribosome pathway. Among those, 34 markers are related to genes in the overlap of the MAPK and cell cycle pathways, 23 markers are related to genes in the overlap of the MAPK and cancer pathways, and 5 markers are related to a gene in the overlap of the MAPK, cell cycle and cancer pathways.
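The selection frequencies can be reproduced in outline by refitting on bootstrap resamples at the chosen tuning parameters; fit_fixed below is a hypothetical refitting routine at fixed tuning parameters.

```python
import numpy as np

def bootstrap_selection_freq(X, Y, fit_fixed, n_boot=100, seed=2):
    """Proportion of bootstrap refits in which each coefficient is nonzero,
    in the spirit of Meinshausen and Buhlmann (2010)."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    freq = None
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample rows with replacement
        nz = fit_fixed(X[idx], Y[idx]) != 0
        freq = nz.astype(float) if freq is None else freq + nz
    return freq / n_boot                      # per-coefficient frequency
```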
Table 2.
Top selected expression-marker associations
| Index | β̂jk | Sel. Freq.* (%) | Expr.** name | Expr. pathways | Marker Chr:BP*** | Marker gene | p-value |
|---|---|---|---|---|---|---|---|
| 1 | −1.481 | 100 | YKL178C | MAPK | 3:201166 | YCR041W | 2.43e-51 |
| 2 | 1.465 | 100 | YFL026W | MAPK | 3:201166 | YCR041W | 2.81e-55 |
| 3 | −1.264 | 100 | YPL187W | MAPK | 3:201166 | YCR041W | 7.10e-45 |
| 4 | 1.061 | 100 | YNL145W | MAPK | 3:201166 | YCR041W | 5.54e-39 |
| 5 | −0.735 | 100 | YGL089C | MAPK | 3:201166 | YCR041W | 8.53e-20 |
| 6 | 0.650 | 100 | YFL026W | MAPK | 3:201167 | YCR041W | 2.81e-55 |
| 7 | −0.649 | 100 | YKL178C | MAPK | 3:201167 | YCR041W | 2.43e-51 |
| 8 | −0.554 | 98 | YPL187W | MAPK | 3:201167 | YCR041W | 7.10e-45 |
| 9 | 0.452 | 100 | YDR461W | MAPK | 3:201166 | YCR041W | 8.42e-14 |
| 10 | −0.385 | 98 | YPL187W | MAPK | 3:177850 | gCR02 | 1.65e-33 |
| 11 | 0.352 | 100 | YGR088W | MAPK | 15:170945 | gOL02 | 1.52e-10 |
| 12 | 0.346 | 100 | YGR088W | MAPK | 15:174364 | gOL02 | 1.51e-10 |
| 13 | −0.318 | 97 | YKL178C | MAPK | 3:177850 | gCR02 | 2.44e-37 |
| 14 | 0.257 | 98 | YGR088W | MAPK | 10:51003 | YJL204C | 0.044 |
| 15 | −0.175 | 95 | YGL089C | MAPK | 2:681361 | YML056C | 0.66 |
Sel. Freq. = Selection Frequency.
expr. = gene expression.
Marker is denoted by its physical position in the format of “chromosome:basepair”.
Table 3 lists the top pathway-gene groupwise associations in terms of the group L2 norms with a 100% group-wise selection frequency. Out of 180 block groups, 89 groups contain nonzero coefficients. Several top selected genes have been reported in the literature. For example, one of the isoforms of the YCR gene, YCR073C/SSK22, is part of the MAPK cascade involved in the osmosensory signaling pathway. Gene groups YJL and YGR in the Src homology 3 domains interact with gene Pbs2, one of the three kinase components in the MAPK pathway (Zarrinpar et al., 2003). The top association signals detected between the gene expressions in the overlap of the MAPK, cell cycle and cancer pathways and markers in the NHR gene group also confirm the regulatory effects of NHR genes on the cell cycle pathway and other autophagy-related genes (Nicole, 2011).
Table 3.
Top selected pathway-gene associations (with 100% selection frequency)
| Index | Pathway | Gene | ‖B̂g‖2 | Number of nonzero β̂jk in group | Top expr.* in pathway | Top marker** in gene | Top β̂jk in group |
|---|---|---|---|---|---|---|---|
| 1 | MAPK | YCR | 3.06 | 23 | YKL178C | 3:201166 | −1.481 |
| 2 | MAPK | gOL | 0.508 | 10 | YGR088W | 15:170945 | 0.352 |
| 3 | MAPK | gCR | 0.499 | 3 | YPL187W | 3:177850 | −0.385 |
| 4 | MAPK | YJL | 0.424 | 23 | YGR088W | 10:51003 | 0.257 |
| 5 | MAPK | NHR | 0.420 | 49 | YCL027W | 8:111686 | −0.184 |
| 6 | MAPK | NBR | 0.382 | 15 | YGL089C | 2:681361 | 0.207 |
| 7 | MAPK | YBR | 0.372 | 81 | YGR088W | 2:368060 | 0.165 |
| 8 | ribosome | YER | 0.342 | 119 | YER102W | 5:350744 | −0.063 |
| 9 | cancer | YLR | 0.286 | 14 | YJR048W | 12:674651 | 0.164 |
| 10 | MAPK | YGR | 0.275 | 3 | YGL089C | 7:916471 | −0.172 |
| 11 | MAPK | YPL | 0.274 | 18 | YGR088W | 12:428612 | 0.240 |
| 12 | MAPK | YLR | 0.252 | 62 | YCL027W | 12:957108 | 0.092 |
| 13 | MAPK | YER | 0.229 | 23 | YPL187W | 7:321714 | 0.135 |
| 14 | MAPK | YML | 0.214 | 23 | YGL098C | 13:164026 | −0.175 |
| 15 | MAPK | YHL | 0.205 | 15 | YKL178C | 8:98513 | −0.128 |
| 16 | MAPK | YNL | 0.183 | 23 | YGL089C | 14:418269 | −0.083 |
| 17 | MAPK | YCL | 0.176 | 27 | YCL027W | 3:64311 | 0.140 |
| 18 | MAPK; cell cycle | NHR | 0.175 | 44 | YJL157C | 8:111686 | −0.061 |
| 19 | MAPK | gJL | 0.131 | 9 | YFL026W | 10:259991 | 0.098 |
| 20 | MAPK | YOL | 0.125 | 26 | YPL187W | 15:193911 | 0.084 |
| 21 | MAPK; cell cycle; cancer | NHR | 0.098 | 5 | YBL016W | 8:111686 | −0.044 |
| 22 | cell cycle | YCR | 0.067 | 5 | YLR288C | 3:201166 | 0.046 |
| 23 | cell cycle | YCL | 0.063 | 16 | YDL003W | 3:64311 | −0.035 |
| 24 | cell cycle | YLR | 0.029 | 37 | YBR093C | 12:674651 | 0.012 |
expr. = gene expression.
Top marker in gene is denoted by its physical position in the format of “chromosome:basepair”.
It is worth noting that none of the association p-values from marginal simple linear regressions between gene YJL and pathway MAPK survives the Bonferroni correction for multiple comparisons. For example, the 14th signal in Table 2 has a univariate marginal p-value of 0.044 and is therefore unlikely to be picked up by the pairwise analysis. However, the MSGLasso successfully selected this signal in an adjusted analysis with high individual and group selection frequencies; see Tables 2 and 3. This finding is supported by Zarrinpar et al. (2003). It demonstrates that, besides the advantage of dimension reduction, the MSGLasso can also pick out important signals that would be missed by the pairwise method.
The stability selection results show that none of the top 40 selected signals contains zero within its 2.5%–97.5% bootstrap percentile band, and the bootstrap Q1–Q3 bands of the top 100 selected signals do not contain zero, indicating that the top signals selected by the proposed method have high selection frequencies across bootstrap samples.
6 Discussion
For a predetermined group structure, the MSGLasso effectively and efficiently selects the important groups and the important individual signals within those groups. There has been recent interest in learning the group structure and selecting the important variables simultaneously. For example, Yin and Li (2011) proposed a conditional Gaussian graphical model to select nonzero entries in the precision matrix conditional on simultaneously selected predictors. Applying the MSGLasso with a data-driven group structure would also be of interest, and the selection of the group structure itself is a topic for future research.
The L1/L2 penalty in the MSGLasso ensures that the objective function is convex in B. The convexity is essential for the proposed mixed coordinate descent algorithm. Replacing the L1 penalty by the SCAD penalty (Fan and Li, 2001) would be of interest, but the resulting optimization is non-convex and thus not guaranteed to converge to the global minimum. More research along this line is needed.
Supplementary Material
Acknowledgements
The authors thank Dr. Hongzhe Li for providing the yeast eQTL data and helpful discussions. The research was supported in part by the National Institute of Health grant R01-AG036802 and the National Science Foundation grants DMS-1007590 and DMS-0748389.
Footnotes
Web Appendices for the proofs of theoretical results referenced in Sections 3 and 4, computing cost comparison and MSGLasso package referenced in Section 4, and additional numerical results are available with this paper at the Biometrics website on Wiley Online Library.
References
- Bickel PJ, Ritov Y, Tsybakov AB. Simultaneous analysis of lasso and Dantzig selector. Ann. Stat. 2009;37:1705–1732.
- Biswas S, Lin S. Logistic Bayesian lasso for identifying association with rare haplotypes and application to age-related macular degeneration. Biometrics. 2012;68:587–597. doi: 10.1111/j.1541-0420.2011.01680.x.
- Brem RB, Kruglyak L. The landscape of genetic complexity across 5,700 gene expression traits in yeast. Proceedings of the National Academy of Sciences. 2005;102:1572–1577. doi: 10.1073/pnas.0408709102.
- Bunea F, She Y, Wegkamp M. Optimal selection of reduced rank estimators of high-dimensional matrices. Ann. Stat. 2011;39:1282–1309.
- Dudoit S, Shaffer J, Boldrick J. Multiple hypothesis testing in microarray experiments. Statistical Science. 2003;18:71–103.
- Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Statist. Assoc. 2001;96:1348–1360.
- Huang J, Ma S, Xie H, Zhang C. A group bridge approach for variable selection. Biometrika. 2009;96:339–355. doi: 10.1093/biomet/asp020.
- Lounici K, Pontil M, Tsybakov AB, van de Geer S. Oracle inequalities and optimal inference under group sparsity. Ann. Stat. 2011;39:2164–2204.
- Meinshausen N, Bühlmann P. Stability selection. J. R. Statist. Soc. B. 2010;72:417–473.
- Nicole A. Integration of nutritional status with germline proliferation: characterizing the roles of nhr-88 and nhr-49 in the C. elegans gonad. 2011.
- Obozinski G, Wainwright M, Jordan M. Support union recovery in high-dimensional multivariate regression. Ann. Stat. 2011;39:1–47.
- Park MY, Hastie T. Penalized logistic regression for detecting gene interactions. Biostatistics. 2008;9:30–50. doi: 10.1093/biostatistics/kxm010.
- Peng J, Zhu J, Bergamaschi A, Han W, Noh DY, Pollack JR, Wang P. Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer. Ann. Appl. Stat. 2010;4:53–77. doi: 10.1214/09-AOAS271SUPP.
- Simon N, Friedman J, Hastie T, Tibshirani R. A sparse-group lasso. Journal of Computational and Graphical Statistics. 2013;22:231–245.
- Stein J, Hua X, Lee S, Ho A, Leow A, Toga A, Saykin A, Shen L, Foroud T, Pankratz N, Huentelman M, Craig D, Gerber J, Allen A, Corneveaux J, Dechairo B, Potkin S, Weiner M, Thompson P, and the Alzheimer's Disease Neuroimaging Initiative. Voxelwise genome-wide association study (vGWAS). Neuroimage. 2010;53:1160–1174. doi: 10.1016/j.neuroimage.2010.02.032.
- Tibshirani R. Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B. 1996;58:267–288.
- Tseng P. Convergence of a block coordinate descent method for nondifferentiable minimization. Journal of Optimization Theory and Applications. 2001;109:475–494.
- Wu T, Lange K. Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2008;2:224–244.
- Yin J, Li H. A sparse conditional Gaussian graphical model for analysis of genetical genomics data. Ann. Appl. Stat. 2011;5:2630–2650. doi: 10.1214/11-AOAS494.
- Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. J. R. Statist. Soc. B. 2006;68:49–67.
- Zamdborg L, Ma P. Discovery of protein-DNA interactions by penalized multivariate regression. Nucl. Acids Res. 2009;37:5246–5254. doi: 10.1093/nar/gkp554.
- Zarrinpar A, Park SH, Lim WA. Optimization of specificity in a cellular protein interaction network by negative selection. Nature. 2003;426:676–680. doi: 10.1038/nature02178.
- Zhang S, Ching W, Tsing N, Leung H, Guo D. A new multiple regression approach for the construction of genetic regulatory networks. Artificial Intelligence in Medicine. 2010;48:153–160. doi: 10.1016/j.artmed.2009.11.001.
- Zhou H, Sehl ME, Sinsheimer JS, Lange K. Association screening of common and rare genetic variants by penalized regression. Bioinformatics. 2010;26:2375–2382. doi: 10.1093/bioinformatics/btq448.
- Zhou N, Zhu J. Group variable selection via a hierarchical lasso and its oracle property. Statistics and Its Interface. 2010;3:557–574.
- Zou H. The adaptive lasso and its oracle properties. J. Am. Statist. Assoc. 2006;101:1418–1429.
- Zou H, Hastie T. Regularization and variable selection via the elastic net. J. R. Statist. Soc. B. 2005;67:301–320.