Author manuscript; available in PMC: 2011 Jun 22.
Published in final edited form as: Biometrics. 2010 Jul 21;67(2):495–503. doi: 10.1111/j.1541-0420.2010.01463.x

Fixed and Random Effects Selection in Mixed Effects Models

Joseph G Ibrahim 1,*, Hongtu Zhu 1,**, Ramon I Garcia 1,***, Ruixin Guo 1,****
PMCID: PMC3041932  NIHMSID: NIHMS216083  PMID: 20662831

SUMMARY

We consider selecting both fixed and random effects in a general class of mixed effects models using maximum penalized likelihood (MPL) estimation along with the smoothly clipped absolute deviation (SCAD) and adaptive LASSO (ALASSO) penalty functions. The maximum penalized likelihood estimates are shown to possess consistency and sparsity properties and asymptotic normality. A model selection criterion, called the ICQ statistic, is proposed for selecting the penalty parameters (Ibrahim, Zhu and Tang, 2008). The variable selection procedure based on ICQ is shown to consistently select important fixed and random effects. The methodology is very general and can be applied to numerous situations involving random effects, including generalized linear mixed models. Simulation studies and a real data set from a Yale infant growth study are used to illustrate the proposed methodology.

Keywords: ALASSO, Cholesky decomposition, EM algorithm, ICQ criterion, Mixed Effects selection, Penalized likelihood, SCAD

1. Introduction

In the analysis of mixed effects models, a primary objective is to assess significant fixed effects and/or random effects of the outcome variable. For instance, when simultaneously selecting both random and fixed effects, that is, when selecting mixed effects, it is common to use a selection procedure (e.g., forward or backward elimination), coupled with a selection criterion, such as AIC and/or BIC based on the observed data log-likelihood, to compare a set of candidate models (Keselman et al., 1998; Gurka, 2006; Liang, Wu, and Zou, 2008; Ibrahim, Zhu, and Tang, 2008; Claeskens and Consentino, 2008). Zhu and Zhang (2006) proposed a testing procedure based on a class of test statistics for a general mixed effects model to test the homogeneity hypothesis that all of the variance components are zero. Such methods, however, suffer from a serious deficiency in that it is infeasible to simultaneously select significant random and fixed (mixed) effects from a large number of possible models (Fan and Li, 2001; Fan and Li, 2002). To overcome such a deficiency, variable selection procedures based on penalized likelihood methods, such as the Smoothly Clipped Absolute Deviation (SCAD) (Fan and Li, 2001) and the Adaptive Lasso (ALASSO) (Zou, 2006), may be developed to select mixed effects.

Compared to the large body of literature on variable selection procedures, we make several novel contributions in this paper. This is one of the few papers on developing selection methods for selecting mixed effects in a large class of mixed effects models. Most variable selection procedures are developed for various parametric models and semiparametric models with/without random effects and/or unobserved data (Fan and Li, 2002, 2004; Cai et al., 2005; Qu and Li, 2006; Zhang and Lu, 2007; Ni, Zhang, and Zhang, 2009; Johnson, Lin, and Zeng, 2008; Garcia, Ibrahim, and Zhu 2010a, 2010b), but all these procedures have only been used for the selection of significant fixed effects. The only exception is the recent work by Krishna (2009) and Bondell, Krishna, and Ghosh (2010), in which only the linear mixed model is considered. We use a novel reparametrization to reformulate the selection of mixed effects into the problem of grouped variable selection in models with heavy ‘missing’ data, where the missing data is represented by the random effects. This reparametrization makes it possible to use penalized likelihood methods to select both fixed and random effects. Compared to most variable selection methods for linear models, we must address additional challenges due to the presence of missing observations for each subject. A computational challenge here is to directly maximize the observed data log-likelihood function along with the SCAD or ALASSO penalties to select both fixed and random effects and to calculate their estimates. The observed data log-likelihood for complicated mixed effects models is often not available in closed form, and is computationally intractable because it may involve high dimensional integrals which are difficult to approximate. 
When selecting random effects, this maximization is further complicated because one must eliminate the corresponding row and column of an insignificant random effect and constrain the remaining matrix to be positive definite. Another challenge is to select appropriate penalty parameters in order to produce estimates having proper asymptotic properties (Fan and Li, 2001), whereas existing selection criteria (Kowalchuck et al., 2004; Gurka, 2006; Liang, Wu, and Zou, 2008; Claeskens and Consentino, 2008) are computationally difficult for general mixed effect models.

The goal of this paper is to develop a simultaneous fixed and random effects selection procedure based on the SCAD and ALASSO penalties for application to longitudinal models, correlated models, and/or mixed effects models. We reformulate the problem of selecting mixed effects and develop a method based on the ICQ criterion to select the penalty parameters. We also specify the penalty parameters in the SCAD and ALASSO penalty functions as a hyperparameter, and then we use the Expectation Maximization (EM) algorithm to simultaneously optimize the penalized likelihood function and estimate the penalty parameters. Under some regularity conditions, we establish the asymptotic properties of the maximum penalized likelihood estimator and the consistency of the ICQ-based penalty selection procedure.

To motivate the proposed methodology, we consider a dataset from a Yale infant growth study (Wasserman and Leventhal, 1993; Stier et al., 1993). The objective of this study is to investigate the relationship between maternal cocaine dependency and child maltreatment (physical abuse, sexual abuse, or neglect). This study had a total of 298 children from the cocaine exposed and unexposed groups. The outcome variable is infant weight (in pounds), which is obtained at several time points. Seven covariates were considered: day of visit, age of mother, gestational age of infant, race, previous pregnancies, gender of infant, and cocaine exposure. Each child had a different number and pattern of visits during the study. We consider a mixed effects model that uses the seven covariates as fixed effects and the first three covariates as random effects. Our objective in this analysis is to select the most important predictors of infant weight as well as the significant random effects. The selection of random effects is crucial in this application, as it is not at all clear whether a random intercept model will suffice or whether the longitudinal model should also contain random slope effects. Moreover, there is a large number of covariates to select from in the fixed effects component of the model. The selection can be done by our penalized likelihood method, which combines a penalty function (SCAD or ALASSO) with either the random effects or the ICQ penalty estimate. More details regarding the analysis of this data set are given in Section 5.

The rest of the paper is organized as follows. Section 2 introduces the general development for maximizing the penalized likelihood function and selecting the penalty parameters. Section 3 examines the asymptotic properties of the maximum penalized likelihood (MPL) estimator and the ICQ penalty selection procedure. Section 4 presents a simulation study to examine the finite sample performance of the maximum penalized likelihood estimate. A real data analysis of the Yale infant growth study is given in Section 5. Section 6 concludes the paper with some discussion.

2. Mixed effects selection for mixed effects models

2.1 Model Formulation

Suppose we observe n independent observations (y1, X1),…, (yn, Xn), where yi is an ni × 1 vector of responses or repeated measures and Xi is an ni × p matrix of fixed covariates for i = 1,…, n. We assume independence among the different (yi, Xi)’s and

$$E[y_i \mid b_i, X_i; \theta] = g(X_i\beta + Z_i\Gamma b_i), \qquad (1)$$

where $b_i$ is a q × 1 vector of unobserved random effects, θ denotes all the unknown parameters, Γ is a q × q lower triangular matrix, g(·) is a known link function, $\beta = (\beta_1,\ldots,\beta_p)^T$ is a p × 1 vector of regression coefficients, and $Z_i$ is an $n_i \times q$ matrix composed of the columns of $X_i$. In practice, it is common to assume that the conditional distribution of $y_i$ given $(b_i, X_i)$, denoted by $f(y_i \mid b_i, X_i; \theta)$, belongs to the exponential family, such as the binomial, normal, and Poisson (Little and Schluchter, 1985; Ibrahim and Lipsitz, 1996). For notational simplicity, the random effects $b_i \sim N_q(0, I_q)$ are assumed to follow a multivariate normal distribution with zero mean and the q × q identity covariance matrix $I_q$. Equivalently, $\Gamma b_i \sim N_q(0, D)$ with $D = \Gamma\Gamma^T$, and Γ is the Cholesky factor of the q × q matrix D. We allow the possibility of D being positive semi-definite so that certain components of $\Gamma b_i$ may not be random but 0 with probability 1.
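As a small numerical illustration of the parametrization D = ΓΓ^T (a sketch only; the entries of Γ below are made up and are not taken from the paper), the following Python snippet builds D from a lower triangular Γ whose second row is zero. The corresponding row and column of D vanish, which is the sense in which that component of Γb_i is 0 with probability 1.

```python
def gamma_to_cov(gamma):
    """Return D = Gamma Gamma^T for a square matrix Gamma given as a list of rows."""
    q = len(gamma)
    return [[sum(gamma[r][t] * gamma[s][t] for t in range(q)) for s in range(q)]
            for r in range(q)]

# Hypothetical 3 x 3 lower triangular factor; the second row is identically zero.
Gamma = [[1.0, 0.0, 0.0],
         [0.0, 0.0, 0.0],
         [0.5, 0.0, 2.0]]

D = gamma_to_cov(Gamma)
for row in D:
    print(row)
```

Because the second row of Γ is zero, the second variance component and all covariances involving it are zero, while the remaining block of D stays a valid covariance matrix.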

2.2 EM Algorithm for Maximizing the Penalized Likelihood

Selecting mixed effects involves identifying the nonzero components of β, determining the nonrandom elements of Γbi, and simultaneously estimating all nonzero parameters. We propose to maximize the penalized likelihood function given by

$$PL(\theta) = \ell(\theta) - n\sum_{j=1}^{p}\phi_{\lambda_j}(|\beta_j|) - n\sum_{k=1}^{q}\phi_{\lambda_{p+k}}(\|\gamma_k\|), \qquad (2)$$

where $\ell(\theta) = \sum_{i=1}^{n}\ell_i(\theta)$, in which $\ell_i(\theta) = \log\int f(y_i, b_i \mid X_i; \theta)\,db_i$ is the observed-data log-likelihood for the ith individual, $\lambda_j$ is the penalty parameter of $\beta_j$, and the penalty function $\phi_{\lambda_j}(\cdot)$ is a nonnegative, nondecreasing, and differentiable function on (0, ∞) (Fan and Li, 2001; Zou, 2006). In addition, the k × 1 vector $\gamma_k$ consists of all nonzero elements of the k-th row of the lower triangular q × q matrix Γ, $\|\gamma_k\| = (\gamma_k^T\gamma_k)^{1/2}$, and $\lambda_{p+k}$ is the group penalty parameter corresponding to the whole k-th row of Γ. The structure in (2) ensures that certain estimates of β are exactly zero (Fan and Li, 2001); the corresponding covariates are regarded as insignificant predictors of the outcome variable, and the remaining covariates as significant predictors. The penalization of $\gamma_k$ is performed in a group manner in order to preserve the positive definite constraint on D, so that the estimated elements of $\gamma_k$ are either all nonzero or all equal to zero (Yuan and Li, 2006). If all the elements of $\gamma_k$ are zero, then the k-th row of Γ is zero and the k-th element of $\Gamma b_i$ is not random.

Similar to Chen and Dunson (2003), we reparametrize the linear predictor as

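$$X_i\beta + Z_i\Gamma b_i = \left(X_i \;\; (b_i^T \otimes Z_i)J_q\right)\begin{pmatrix}\beta \\ \gamma\end{pmatrix} = U_i\delta, \qquad (3)$$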

where $J_q$ is the $q^2 \times q(q+1)/2$ matrix which transforms γ to vec(Γ), i.e. $\mathrm{vec}(\Gamma) = J_q\gamma$. With this reparametrization, the selection of mixed effects is equivalent to the problem of grouped variable selection in regression models with missing covariates, where the random effects in the design matrix $U_i$ play the role of the “missing covariates”. Using this reparametrization, we can apply the variable selection methods proposed in Garcia, Ibrahim, and Zhu (2010a; 2010b) to select important mixed effects.
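The identity behind (3) can be checked numerically. The sketch below is illustrative only: the helper names `build_J` and `kron_row` and all numbers are hypothetical, and it assumes that vec(·) stacks columns and that γ stacks the lower triangular rows of Γ. Under those assumptions, Z Γ b and (b^T ⊗ Z) J_q γ agree.

```python
def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def build_J(q):
    """q^2 x q(q+1)/2 zero-one matrix with vec(Gamma) = J gamma.

    Assumes column-major vec and gamma = (row 1 of Gamma, row 2, ...),
    keeping only the entries on or below the diagonal.
    """
    J = [[0.0] * (q * (q + 1) // 2) for _ in range(q * q)]
    for r in range(q):
        for c in range(r + 1):
            J[c * q + r][r * (r + 1) // 2 + c] = 1.0
    return J

def kron_row(b, Z):
    """(b^T kron Z): an n x q^2 matrix whose c-th block of q columns is b[c] * Z."""
    return [[bc * z for bc in b for z in row] for row in Z]

# Made-up example with q = 2 random effects and n_i = 3 observations.
q = 2
Gamma = [[1.5, 0.0], [-0.4, 0.8]]
gamma = [1.5, -0.4, 0.8]          # lower triangular rows of Gamma, stacked
b = [0.3, -1.2]
Z = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]

lhs = matvec(Z, matvec(Gamma, b))                         # Z Gamma b
rhs = matvec(kron_row(b, Z), matvec(build_J(q), gamma))   # (b^T kron Z) J_q gamma
print(lhs, rhs)
```

The design matrix U_i is then X_i placed side by side with `kron_row(b, Z)` multiplied by J_q, so the unobserved b_i enters the design matrix and δ = (β, γ) collects the coefficients.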

Because the observed-data log-likelihood function usually involves intractable integration, we develop a Monte Carlo EM algorithm to compute the maximum penalized likelihood estimator of θ, denoted by θ̂λ, for each λ = (λ1,…,λp+q). Denote the complete and observed data for subject i by dc,i = (yi, Xi, bi) and do,i = (yi, Xi), respectively, and the entire complete and observed data by dc and do, respectively. At the s-th iteration, given θ(s), the E step is to evaluate the penalized Q-function, given by

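$$Q_\lambda(\theta \mid \theta^{(s)}) = \sum_{i=1}^{n} E\{\log f(d_{i,c};\theta) \mid d_o; \theta^{(s)}\} - n\sum_{j=1}^{p}\phi_{\lambda_j}(|\beta_j|) - n\sum_{k=1}^{q}\phi_{\lambda_{p+k}}(\|\gamma_k\|) \qquad (4)$$
$$= Q_1(\theta \mid \theta^{(s)}) - n\sum_{j=1}^{p}\phi_{\lambda_j}(|\beta_j|) - n\sum_{k=1}^{q}\phi_{\lambda_{p+k}}(\|\gamma_k\|) + Q_2(\theta^{(s)}), \qquad (5)$$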

where $\theta = (\delta^T, \xi^T)^T$, in which ξ represents all parameters other than δ, $d_{i,c} = (y_i, b_i, X_i)$, and

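$$Q_1(\theta \mid \theta^{(s)}) = \sum_{i=1}^{n}\int \{\log f(y_i \mid b_i, X_i; \delta, \xi)\}\, f(b_i \mid d_{i,o}; \theta^{(s)})\,db_i, \qquad (6)$$
$$Q_2(\theta^{(s)}) = \sum_{i=1}^{n}\int \{\log f(b_i)\}\, f(b_i \mid d_{i,o}; \theta^{(s)})\,db_i. \qquad (7)$$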

Since the integrals in (6) and (7) are often intractable, we approximate these integrals by taking a Markov chain Monte Carlo (MCMC) sample of size L from the density $f(b_i \mid d_{i,o}; \theta^{(s)})$ (see Ibrahim, Chen, and Lipsitz, 1999). Let $b_i^{(s,l)}$ be the l-th simulated value at the s-th iteration of the algorithm. The integrals in (6) can be approximated as

$$Q_1(\theta \mid \theta^{(s)}) = \frac{1}{L}\sum_{l=1}^{L}\sum_{i=1}^{n}\log f(y_i \mid b_i^{(s,l)}, X_i; \theta). \qquad (8)$$
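The Monte Carlo average in (8) can be sketched as follows for the special case of a normal response with identity link. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the draws of b_i are taken as given (in the paper they come from an MCMC sampler), and the toy data are made up.

```python
import math

def gaussian_loglik(y, X, Z, beta, Gamma, sigma2, b):
    """log f(y | b, X; theta) for a normal linear mixed model (identity link)."""
    Gb = [sum(g * bk for g, bk in zip(row, b)) for row in Gamma]
    total = 0.0
    for yj, xj, zj in zip(y, X, Z):
        mean = sum(x * bt for x, bt in zip(xj, beta)) + sum(z * g for z, g in zip(zj, Gb))
        total += -0.5 * math.log(2 * math.pi * sigma2) - (yj - mean) ** 2 / (2 * sigma2)
    return total

def q1_monte_carlo(data, draws, beta, Gamma, sigma2):
    """Monte Carlo version of Q1: average over L draws of the summed log densities.

    data[i]  = (y_i, X_i, Z_i) for subject i
    draws[i] = list of L simulated values of b_i (assumed to be supplied)
    """
    L = len(draws[0])
    total = 0.0
    for (y, X, Z), b_draws in zip(data, draws):
        for b in b_draws:
            total += gaussian_loglik(y, X, Z, beta, Gamma, sigma2, b)
    return total / L

# Toy example: one subject, one observation, a scalar random effect, L = 3 draws.
data = [([1.0], [[1.0]], [[1.0]])]
draws = [[[0.0], [1.0], [-1.0]]]
q1_value = q1_monte_carlo(data, draws, beta=[1.0], Gamma=[[1.0]], sigma2=1.0)
print(q1_value)
```

In the toy example the three residuals are 0, -1 and 1, so the value equals the normal log-density constant minus 1/3.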

The M step involves maximizing

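$$Q_{1,\lambda}(\theta \mid \theta^{(s)}) = Q_1(\theta \mid \theta^{(s)}) - n\sum_{j=1}^{p}\phi_{\lambda_j}(|\beta_j|) - n\sum_{k=1}^{q}\phi_{\lambda_{p+k}}(\|\gamma_k\|) \qquad (9)$$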

with respect to (δ, ξ). Maximizing $Q_{1,\lambda}(\delta, \xi \mid \theta^{(s)})$ with respect to ξ is straightforward and can be done using a standard optimization algorithm, such as the Newton-Raphson algorithm (Little and Schluchter, 1985; Ibrahim, 1990; Ibrahim and Lipsitz, 1996). Maximizing $Q_{1,\lambda}$ with respect to δ is difficult because $Q_{1,\lambda}$ is a nondifferentiable and nonconcave function of δ (Zou and Li, 2008).

In order to maximize Q1,λ, following Fan and Li (2001), a second order Taylor’s series approximation of Q1,λ centered at the value δ(s) is used. Using this approximation, Q1,λ resembles a penalized weighted least squares regression, so algorithms for minimizing penalized least squares can be used (Fan and Li, 2001; Hunter and Li, 2005). We use a modification of the local linear approximation algorithm (LLA) (Zou and Li, 2008) to incorporate grouped penalization. For γk, we use an approximation centered at γk(s) as follows:

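$$\phi_{\lambda_{p+k}}(\|\gamma_k\|) \approx \sum_{t=1}^{k}\left\{\phi'_{\lambda_{p+k}}(\|\gamma_k^{(s)}\|)\,\frac{|\gamma_{kt}^{(s)}|}{\|\gamma_k^{(s)}\|}\right\}|\gamma_{kt}|, \qquad (10)$$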

where $\gamma_{kt}$ is the t-th element of the vector $\gamma_k$ and we assume $\|\gamma_k^{(s)}\| > 0$. If $\|\gamma_k^{(s)}\| = 0$, then we let $\gamma_k^{(s+1)} = 0$. Using this approximation, $Q_{1,\lambda}$ resembles a penalized regression with a weighted L1 penalty, so the methods for performing the lasso can be used to maximize $Q_{1,\lambda}$ (Tibshirani, 1996; Fu, 1998).
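To make the grouped local linear approximation concrete, the following sketch computes the weight that multiplies each |γ_kt| in the approximation, namely the penalty derivative at the current group norm times |γ_kt^(s)| divided by that norm. The function name `lla_group_weights` is hypothetical, the penalty derivative is passed in as an argument, and the numbers are made up; a group whose current norm is zero is flagged so that the caller can keep it at zero.

```python
import math

def lla_group_weights(gamma_s, dphi):
    """Per-element L1 weights for one row gamma_k at the current iterate gamma_s.

    dphi is the derivative of the penalty function, evaluated at the group norm.
    Returns None when the current norm is zero (the row then stays at zero).
    """
    norm = math.sqrt(sum(g * g for g in gamma_s))
    if norm == 0.0:
        return None
    slope = dphi(norm)
    return [slope * abs(g) / norm for g in gamma_s]

# Example with a lasso-type penalty phi(x) = lam * x, whose derivative is constant.
lam = 2.0
weights = lla_group_weights([3.0, -4.0], lambda x: lam)
zero_row = lla_group_weights([0.0, 0.0], lambda x: lam)
print(weights, zero_row)
```

With these weights the group penalty becomes a weighted sum of absolute values, which is why lasso-type solvers can be reused in the M step.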

Let $\xi^{(s+1)} = \arg\max_{\xi} Q_{1,\lambda}(\delta^{(s)}, \xi \mid \theta^{(s)})$ and $\delta^{(s+1)} = \arg\max_{\delta} Q_{1,\lambda}(\delta, \xi^{(s+1)} \mid \theta^{(s)})$. Due to the Taylor’s series approximation of $Q_1$ and the LLA of $\phi_{\lambda_j}$, $\theta^{(s+1)} = (\delta^{(s+1)}, \xi^{(s+1)})$ may not necessarily be the maximizer of $Q_\lambda(\theta \mid \theta^{(s)})$. By implementing the Expectation Conditional-Maximization (ECM) algorithm (Meng and Rubin, 1993), however, we can find a $\theta^{(s+1)}$ such that $Q_\lambda(\theta^{(s+1)} \mid \theta^{(s)}) \ge Q_\lambda(\theta^{(s)} \mid \theta^{(s)})$ instead of directly maximizing $Q_\lambda(\theta \mid \theta^{(s)})$. This process is iterated until convergence, and the value at convergence, denoted by $\hat\theta_\lambda$, maximizes the penalized observed-data log-likelihood function.

2.3 Penalty Parameter Selection Procedure

To ensure that θ̂λ has good properties, the penalty parameter λ has to be appropriately selected. Two commonly used criteria for selection of the penalty parameter include the Generalized Cross Validation (GCV) and BIC criteria (Wang et al., 2007). These criteria cannot be easily computed in the presence of random effects, because they are functions of observed data quantities whose expressions may require intractable integrals. Moreover, it has been shown in Wang et al. (2007) that even in the simple linear model, the GCV criterion can lead to significant overfit.

We propose two methods to select the penalty parameter: an ICQ criterion and a random effects penalty selection method. The ICQ criterion (Ibrahim, Zhu, and Tang, 2008) selects the optimal λ by minimizing

$$\mathrm{ICQ}(\lambda) = -2Q(\hat\theta_\lambda \mid \hat\theta_0) + c_n(\hat\theta_\lambda),$$

where $\hat\theta_0 = \arg\max_{\theta}\ell(\theta)$ is the unpenalized maximum likelihood estimate and $c_n(\theta)$ is a function of the data and the fitted model. For instance, if $c_n(\theta)$ equals twice the total number of parameters, then we obtain an AIC-type criterion; alternatively, we obtain a BIC-type criterion when $c_n(\theta) = \dim(\theta) \times \log n$. Moreover, in the absence of random effects, ICQ(λ) reduces to the usual AIC or BIC criteria. As in the EM algorithm, we can draw a set of samples from $f(b_i \mid d_{i,o}; \hat\theta_0)$ for i = 1,…, n in order to estimate $Q(\hat\theta_\lambda \mid \hat\theta_0)$ for any λ.
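A minimal sketch of how the criterion could be evaluated over a grid of candidate penalties is given below. It assumes that an estimate of the Q-function and the model dimension are already available for each candidate λ; the function names and all numbers are hypothetical, and the criterion is taken to be minus twice the Q-function plus c_n.

```python
import math

def icq(q_value, n_params, n, kind="bic"):
    """ICQ = -2 * Q + c_n, with c_n = 2 * dim (AIC-type) or dim * log(n) (BIC-type)."""
    if kind == "aic":
        c_n = 2.0 * n_params
    elif kind == "bic":
        c_n = n_params * math.log(n)
    else:
        raise ValueError("kind must be 'aic' or 'bic'")
    return -2.0 * q_value + c_n

def select_penalty(candidates, n, kind="bic"):
    """candidates: list of (lambda, estimated Q, model dimension); returns the best lambda."""
    return min(candidates, key=lambda c: icq(c[1], c[2], n, kind))[0]

# Made-up grid: larger penalties give sparser models but a lower Q-function value.
candidates = [(0.0, -100.0, 10), (0.1, -101.0, 5), (1.0, -140.0, 2)]
best = select_penalty(candidates, n=100, kind="bic")
print(best)
```

In this made-up grid the middle penalty wins: it loses little in the Q-function relative to the unpenalized fit while halving the number of parameters.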

The random effects penalty estimator is calculated under the assumption that δ is distributed as a random effect vector in a hierarchical model. The quantity λ can be regarded as a hyperparameter vector in the distribution of δ, denoted by f(δ|λ, n). Then, λ can be estimated by maximizing the marginal likelihood with respect to (ξ, λ), which is given by

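$$\int\left\{\prod_{i=1}^{n}\int f(y_i \mid X_i, b_i, \delta; \xi)\, f(b_i)\,db_i\right\} f(\delta \mid \lambda, n)\,d\delta = \int\left\{\prod_{i=1}^{n} f(d_{i,o} \mid \xi)\right\} f(\delta \mid \lambda, n)\,d\delta, \qquad (11)$$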

where f(δ|λ, n) is defined by

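$$f(\delta \mid \lambda, n) = \prod_{j=1}^{p}\exp\{-n\phi_{\lambda_j}(|\beta_j|)\}\prod_{k=1}^{q}\exp\{-n\phi_{\lambda_{p+k}}(\|\gamma_k\|)\}\Big/ C(\lambda, n),$$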

and C(λ, n) is the normalizing constant of f(δ|λ, n). The resulting estimate of λ, denoted by λ̂RE, from the maximization of (11), is the random effects penalty estimator. Treating δ as missing data, the Monte Carlo EM algorithm can be used to maximize (11) with respect to (ξ, λ).

We consider the SCAD and ALASSO penalty functions for determining λ. The ALASSO penalty is defined by

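$$\phi_{\lambda_j}(|\beta_j|) = \lambda_j|\beta_j| \ \text{ for } j = 1,\ldots,p, \qquad \phi_{\lambda_{p+k}}(\|\gamma_k\|) = \lambda_{p+k}\|\gamma_k\| \ \text{ for } k = 1,\ldots,q.$$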

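Typical values of $\lambda_j$ are $\lambda_j = \lambda_{01}|\hat\beta_j|^{-1}$ and $\lambda_{p+k} = \lambda_{02}\sqrt{k}\,\|\hat\gamma_k\|^{-1}$, where $\hat\beta_j$ and $\hat\gamma_k$ are the unpenalized maximum likelihood (ML) estimates. The multiplier $\sqrt{k}$ normalizes the penalty parameter for $\gamma_k$ in order to accommodate the varying sizes of $\gamma_k$. When $\lambda_j = \lambda_{01}$ and $\lambda_{p+k} = \lambda_{02}\sqrt{k}$, the ALASSO reduces to the LASSO penalty.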
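The adaptive weights can be computed directly from the unpenalized ML estimates. The sketch below is illustrative: the function name and inputs are hypothetical, the ML estimates are made up, and the group multiplier is taken to be the square root of the row size k (the usual group-lasso convention), which is an assumption of this example. It also presumes no ML estimate is exactly zero.

```python
import math

def alasso_penalty_parameters(beta_ml, gamma_rows_ml, lam01, lam02):
    """Adaptive LASSO penalty parameters from unpenalized ML estimates.

    lambda_j     = lam01 / |beta_j|                 for each fixed effect
    lambda_{p+k} = lam02 * sqrt(k) / ||gamma_k||    for row k of Gamma (k elements)
    """
    lam_beta = [lam01 / abs(b) for b in beta_ml]
    lam_gamma = []
    for k, row in enumerate(gamma_rows_ml, start=1):
        norm = math.sqrt(sum(g * g for g in row))
        lam_gamma.append(lam02 * math.sqrt(k) / norm)
    return lam_beta, lam_gamma

# Made-up ML estimates: two fixed effects and a 2 x 2 lower triangular Gamma.
lam_beta, lam_gamma = alasso_penalty_parameters(
    beta_ml=[2.0, -0.5], gamma_rows_ml=[[1.0], [3.0, 4.0]], lam01=1.0, lam02=1.0)
print(lam_beta, lam_gamma)
```

Effects with small ML estimates receive large penalty parameters and are shrunk harder, which is the mechanism that distinguishes the ALASSO from the LASSO.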

The SCAD penalty (Fan and Li, 2001) is a nonconcave function defined by $\phi_\lambda(0) = 0$ and, for $|\beta| > 0$, by its derivative $\phi'_\lambda(|\beta|) = \lambda 1(|\beta| \le \lambda) + \frac{(a\lambda - |\beta|)_+}{a - 1} 1(|\beta| > \lambda)$, where $t_+$ denotes the positive part of t and a = 3.7. Because the integral of the negative exponential of the ALASSO and SCAD penalties is not finite, i.e. $\int \exp\{-n\phi_\lambda(\|\gamma_k\|)\}\,d\gamma_k = \infty$, the expression $\exp\{-n\phi_\lambda(\|\gamma_k\|)\}$ is defined in a bounded space to ensure that $f(\delta \mid \lambda, n)$ is a proper density. Since a closed form expression of $\hat\lambda_{RE}$ is unavailable for both the ALASSO and SCAD penalties, we use the Newton-Raphson algorithm along with the ECM algorithm to estimate $\hat\lambda_{RE}$.
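For reference, the SCAD derivative and the penalty obtained by integrating it from zero can be coded in a few lines. This is a sketch of the standard Fan and Li (2001) form with a = 3.7; the function names are hypothetical.

```python
def scad_derivative(x, lam, a=3.7):
    """Derivative of the SCAD penalty at x > 0."""
    if x <= lam:
        return lam
    return max(a * lam - x, 0.0) / (a - 1.0)

def scad_penalty(x, lam, a=3.7):
    """SCAD penalty at |x|: linear, then quadratic, then constant."""
    x = abs(x)
    if x <= lam:
        return lam * x
    if x <= a * lam:
        return (2.0 * a * lam * x - x * x - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0

for value in (0.0, 0.5, 1.0, 2.0, 3.7, 10.0):
    print(value, scad_penalty(value, lam=1.0))
```

The penalty grows like the lasso near zero and is flat beyond aλ, so large coefficients are not shrunk, which is the source of the nonconcavity noted above.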

3. Theoretical Results

In this section, we establish the asymptotic theory of the MPL estimator and the consistency of the penalty selection procedure based on ICQ. Suppose $\beta = (\beta_{(1)}^T, \beta_{(2)}^T)^T$, where $\beta_{(1)}$ and $\beta_{(2)}$ are, respectively, $p_1 \times 1$ and $(p - p_1) \times 1$ subvectors. Let $\beta^* = (\beta_{(1)}^{*T}, \beta_{(2)}^{*T})^T$ denote the true value of β. Without loss of generality, we assume that $\beta_{(2)}^* = 0$ and all of the components of $\beta_{(1)}^*$ are not equal to zero. Similarly, let $\gamma = (\gamma_1^T,\ldots,\gamma_q^T)^T = (\gamma_{(1)}^T, \gamma_{(2)}^T)^T$, where $\gamma_{(1)} = (\gamma_1^T,\ldots,\gamma_{q_1}^T)^T$, $\gamma_{(2)} = (\gamma_{q_1+1}^T,\ldots,\gamma_q^T)^T$, and $\gamma_{(1)}$ and $\gamma_{(2)}$ are $q_1(q_1+1)/2 \times 1$ and $\{q(q+1)/2 - q_1(q_1+1)/2\} \times 1$ subvectors, respectively. Let $\gamma^* = (\gamma_{(1)}^{*T}, \gamma_{(2)}^{*T})^T$ denote the true value of γ. Without loss of generality, we assume that $\gamma_{(2)}^* = 0$ and some of the components of each $\gamma_k^*$ are not equal to zero for $k = 1,\ldots,q_1$.

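Let $\mathcal{S} = \{j_{11},\ldots,j_{1d_1}; j_{21},\ldots,j_{2d_2}\}$ be a candidate model containing the $j_{11}$-th, …, $j_{1d_1}$-th columns of X and the $j_{21}$-th, …, $j_{2d_2}$-th columns of Z. Thus, $\mathcal{S}_F = \{1,\ldots,p; 1,\ldots,q\}$ and $\mathcal{S}_T = \{1,\ldots,p_1; 1,\ldots,q_1\}$ denote the full and true covariate models, respectively. If $\mathcal{S}$ misses at least one important covariate, that is, $\mathcal{S} \not\supset \mathcal{S}_T$, then $\mathcal{S}$ is referred to as an underfitted model; however, if $\mathcal{S} \supset \mathcal{S}_T$ and $\mathcal{S} \ne \mathcal{S}_T$, then $\mathcal{S}$ is an overfitted model. The unpenalized and penalized ML estimators of $\theta = (\beta^T, \gamma^T, \xi^T)^T$, denoted by $\hat\theta_{\mathcal{S}}$ and $\hat\theta_\lambda$, respectively, are defined as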

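$$\hat\theta_{\mathcal{S}} = \arg\max_{\theta:\,\beta_j \ne 0,\ j \in \mathcal{S}} \ell(\theta) \quad\text{and}\quad \hat\theta_\lambda = \arg\max_{\theta}\left\{\ell(\theta) - n\sum_{j=1}^{p}\phi_{\lambda_j}(|\beta_j|) - n\sum_{k=1}^{q}\phi_{\lambda_{p+k}}(\|\gamma_k\|)\right\},$$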

and, in particular, $\hat\theta_{\mathcal{S}_F} = \hat\theta_0$. We obtain the following theorems, whose assumptions and proofs can be found in the web-based supplementary document.

THEOREM 1

Under assumptions (C1)–(C7) in the supplementary document, we have

  1. $\hat\theta_\lambda - \theta^* = O_p(n^{-1/2})$ as n → ∞, where θ* is the true value of θ;

  2. Sparsity: P(β̂(2)λ = 0, γ̂(2)λ = 0) → 1;

  3. Asymptotic normality: $\sqrt{n}\{(\hat\beta_{(1)\lambda}^T, \hat\gamma_{(1)\lambda}^T, \hat\xi_\lambda^T)^T - (\beta_{(1)}^{*T}, \gamma_{(1)}^{*T}, \xi^{*T})^T\}$ is asymptotically normal with mean and covariance matrix defined in the supplement.

Theorem 1 states that by appropriately choosing the penalty λ, there exists a root-n consistent estimator of θ, $\hat\theta_\lambda$, and that this estimator possesses the sparsity property, i.e. $\hat\beta_{(2)\lambda} = 0$ and $\hat\gamma_{(2)\lambda} = 0$ with probability tending to 1. Moreover, $(\hat\beta_{(1)\lambda}^T, \hat\gamma_{(1)\lambda}^T, \hat\xi_\lambda^T)^T$ is asymptotically normal.

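We investigate whether the ICQ(λ) criterion can consistently select the correct model. For each $\lambda \in R^{p+}$, $(\hat\beta_\lambda, \hat\gamma_\lambda)$ naturally defines a candidate model $\mathcal{S}_\lambda = \{j : \hat\beta_{\lambda,j} \ne 0;\ k : \|\hat\gamma_{\lambda,k}\| \ne 0\}$. Generally, $\mathcal{S}_\lambda$ can be underfitted, overfitted, or true. Therefore, $R^{p+}$ can be partitioned into three mutually exclusive regions $R_u^{p+} = \{\lambda \in R^{p+} : \mathcal{S}_\lambda \not\supset \mathcal{S}_T\}$, $R_t^{p+} = \{\lambda \in R^{p+} : \mathcal{S}_\lambda = \mathcal{S}_T\}$, and $R_o^{p+} = \{\lambda \in R^{p+} : \mathcal{S}_\lambda \supset \mathcal{S}_T, \mathcal{S}_\lambda \ne \mathcal{S}_T\}$. Furthermore, if we can choose a reference penalty parameter sequence $\{\lambda_n \in R^{p+}\}_{n=1}^{\infty}$ which satisfies the conditions of Theorem 1, then $\mathcal{S}_{\lambda_n} = \mathcal{S}_T$ in probability.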

To select λ we first calculate

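$$d\mathrm{ICQ}(\lambda_2, \lambda_1) = \mathrm{ICQ}(\lambda_2) - \mathrm{ICQ}(\lambda_1) = -2Q(\hat\theta_{\lambda_2} \mid \hat\theta_0) + c_n(\hat\theta_{\lambda_2}) + 2Q(\hat\theta_{\lambda_1} \mid \hat\theta_0) - c_n(\hat\theta_{\lambda_1})$$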

for any two λ1 and λ2. We assume 𝒮λ2 ⊃ 𝒮λ1 and choose the model 𝒮λ1 resulting from using the penalty value λ1 if dICQ(λ2, λ1) ≥ 0, otherwise we choose the model 𝒮λ2.

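Define $\delta_Q(\lambda_1, \lambda_2) = E\{Q(\theta^*_{\mathcal{S}_{\lambda_1}} \mid \theta^*)\} - E\{Q(\theta^*_{\mathcal{S}_{\lambda_2}} \mid \theta^*)\}$ and $\delta_c(\lambda_2, \lambda_1) = c_n(\hat\theta_{\lambda_2}) - c_n(\hat\theta_{\lambda_1})$, where $\theta^*_{\mathcal{S}}$ is defined in the supplementary document.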

THEOREM 2

Under assumptions (C1)–(C7) in the supplementary document, we have the following results.

  1. If for all $\mathcal{S}_\lambda \not\supset \mathcal{S}_T$, $\liminf_n \delta_Q(\lambda, 0)/n > 0$ and $\delta_c(\lambda, 0) = o_p(n)$, then $d\mathrm{ICQ}(\lambda, 0) > 0$ in probability.

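  2. If $E\{Q(\theta^*_{\mathcal{S}_{\lambda_1}} \mid \hat\theta_0)\} - E\{Q(\theta^*_{\mathcal{S}_{\lambda_2}} \mid \hat\theta_0)\} = O_p(n^{1/2})$ and $Q(\hat\theta_{\lambda_t} \mid \hat\theta_0) - E\{Q(\theta^*_{\mathcal{S}_{\lambda_t}} \mid \hat\theta_0)\} = O_p(n^{1/2})$ for t = 1, 2, then $d\mathrm{ICQ}(\lambda_2, \lambda_1) > 0$ in probability as $n^{-1/2}\delta_c(\lambda_2, \lambda_1)$ converges to ∞ in probability.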

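  3. If $Q(\hat\theta_{\lambda_1} \mid \hat\theta_0) - Q(\hat\theta_{\lambda_2} \mid \hat\theta_0) = O_p(1)$, then $d\mathrm{ICQ}(\lambda_2, \lambda_1) > 0$ in probability as $\delta_c(\lambda_2, \lambda_1)$ converges to ∞ in probability.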

Theorem 2 has some important implications. Part 1 of Theorem 2 shows that ICQ(λ) chooses all significant covariates with probability tending to 1: because $\mathcal{S}_0 \in R_t^{p+} \cup R_o^{p+}$ and $d\mathrm{ICQ}(\lambda, 0) > 0$ in probability for every underfitted model, minimizing ICQ(λ) will not select a λ with $\mathcal{S}_\lambda \not\supset \mathcal{S}_T$. Generally, the most commonly used $c_n(\theta)$, such as 2dim(θ), dim(θ) log(n), and K log log(n) (K > 0), satisfy the condition $\delta_c(\lambda, 0) = o_p(n)$. The condition $\liminf_n n^{-1}\delta_Q(\lambda, 0) > 0$ ensures that ICQ(λ) chooses a model with large $E\{Q(\theta^*_{\mathcal{S}} \mid \theta^*)\}$. This condition is analogous to condition 2 in Wang et al. (2007), which elucidates the effect of underfitted models. The term $n^{-1}E\{Q(\theta^* \mid \theta^*)\} - n^{-1}E\{Q(\theta^*_{\mathcal{S}} \mid \theta^*)\}$ can be written as

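$$n^{-1}\ell(\theta^*) - n^{-1}\ell(\theta^*_{\mathcal{S}}) + n^{-1}E\{H(\theta^* \mid \theta^*)\} - n^{-1}E\{H(\theta^*_{\mathcal{S}} \mid \theta^*)\}, \qquad (12)$$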

where

$$H(\theta_1 \mid \theta_2) = \sum_{i=1}^{n}\int \log\{f(b_i \mid d_{o,i}; \theta_1)\}\, f(b_i \mid d_{o,i}; \theta_2)\,db_i. \qquad (13)$$

By Jensen’s inequality, the difference between the third and fourth terms of (12) is nonnegative, and the difference between the first and second terms must be nonnegative for large n. Thus, $\liminf_n n^{-1}\delta_Q(\lambda, 0) \ge 0$ in probability.
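The Jensen step can be written out as follows. This is a sketch of the standard argument, assuming the two conditional densities share the same support; it uses the definition of H in (13) and the concavity of the logarithm, with the outer expectation taken over the observed data.

```latex
\begin{aligned}
E\{H(\theta^* \mid \theta^*)\} - E\{H(\theta^*_{\mathcal{S}} \mid \theta^*)\}
 &= \sum_{i=1}^{n} E\left[\int \log\frac{f(b_i \mid d_{o,i};\theta^*)}
      {f(b_i \mid d_{o,i};\theta^*_{\mathcal{S}})}\,
      f(b_i \mid d_{o,i};\theta^*)\,db_i\right] \\
 &\ge -\sum_{i=1}^{n} E\left[\log \int f(b_i \mid d_{o,i};\theta^*_{\mathcal{S}})\,db_i\right]
  = 0 .
\end{aligned}
```

Each summand is a Kullback-Leibler divergence between the conditional distributions of the random effects under θ* and under the restricted parameter, which is why it cannot be negative.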

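If $\lambda_1$ and $\lambda_2$ have the same average $n^{-1}E\{Q(\theta^*_{\mathcal{S}_\lambda} \mid \theta^*)\}$, that is, $\liminf_n n^{-1}\delta_Q(\lambda_2, \lambda_1) = 0$, then parts 2 and 3 of Theorem 2 indicate that ICQ(λ) picks out the smaller model $\mathcal{S}_{\lambda_1}$ when $\delta_c(\lambda_2, \lambda_1)$ increases to ∞ at a certain rate (e.g., log(n)). For example, for the BIC-type criterion, $\delta_c(\lambda_2, \lambda_1) = \{\dim(\hat\theta_{\mathcal{S}_{\lambda_2}}) - \dim(\hat\theta_{\mathcal{S}_{\lambda_1}})\}\log(n) \ge \log(n)$ since we assume $\mathcal{S}_{\lambda_2} \supset \mathcal{S}_{\lambda_1}$. The AIC-type criterion, for which $c_n(\theta) = 2 \times \dim(\theta)$, however, does not satisfy this condition. Thus, similar to the AIC criterion with no random effects, ICQ(λ) with $c_n(\theta) = 2 \times \dim(\theta)$ tends to overfit.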

4. Simulation Study

We use simulations to examine the finite sample performance of the maximum penalized likelihood estimates using our proposed penalty estimators and compare them to the unpenalized ML estimate. Our objectives for these simulations are 1) to compare the random effects and ICQ penalty estimators and 2) to compare the SCAD, LASSO, and ALASSO penalty functions.

To do this, we simulated a data set consisting of n independent observations according to the model $y_i = X_i\beta + Z_i\Gamma b_i + \sigma\varepsilon_i$, i = 1,…, n, where $b_i$ and $\varepsilon_i$ are independent and standard multivariate normal random vectors, and $\beta = (3, 2, 1.5, 0, 0, 0, 0, 0)^T$. Moreover, $\Gamma\Gamma^T = D$ is a 3 × 3 matrix, such that the (r, s) element of D is $\rho^{|r-s|}$. The matrix $X_i$ is a 12 × 8 matrix of independent rows, where each row of $X_i$ has mean zero and covariance matrix $\Sigma_{xx}$ whose (r, s) element is $\rho^{|r-s|}$. The matrix Zi was set equal to Xi.
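The covariance structure used in this design, with (r, s) element equal to ρ raised to the power |r − s|, and the corresponding lower triangular factor Γ can be constructed as in the sketch below. The helper names are hypothetical, and the plain Cholesky routine is a textbook version written out only to keep the example self-contained.

```python
import math

def ar1_cov(dim, rho):
    """Matrix whose (r, s) element is rho ** |r - s|."""
    return [[rho ** abs(r - s) for s in range(dim)] for r in range(dim)]

def cholesky_lower(A):
    """Lower triangular G with G G^T = A, for a symmetric positive definite A."""
    n = len(A)
    G = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            partial = sum(G[r][t] * G[c][t] for t in range(c))
            if r == c:
                G[r][c] = math.sqrt(A[r][r] - partial)
            else:
                G[r][c] = (A[r][c] - partial) / G[c][c]
    return G

D = ar1_cov(3, 0.5)            # random effects covariance with rho = 0.5
Gamma = cholesky_lower(D)      # lower triangular factor, Gamma Gamma^T = D
rebuilt = [[sum(Gamma[r][t] * Gamma[s][t] for t in range(3)) for s in range(3)]
           for r in range(3)]
print(D)
print(Gamma)
```

The same `ar1_cov` helper with dimension 8 gives the covariance matrix of the rows of the design matrix in this simulation setting.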

We considered six different settings: (n = 50, σ = 3), (n = 50, σ = 1), (n = 100, σ = 3), (n = 100, σ = 1), (n = 200, σ = 3), and (n = 200, σ = 1) with a value of ρ = .5 for all settings. For each setting, one design matrix was simulated and 100 data sets (yi, Xi) for i = 1,…, n were generated.

For each simulated data set, the maximum penalized likelihood (MPL) estimate using the SCAD, LASSO and ALASSO penalties was computed using the random effects and ICQ penalty estimates. These estimates are denoted as SCAD-RE, SCAD-ICQ, LASSO-RE, LASSO-ICQ, ALASSO-RE, and ALASSO-ICQ, respectively. For the ICQ estimate, the BIC-type criterion, $c_n(\theta) = \dim(\theta)\log n$, was used. For the Monte Carlo EM algorithm, 2000 Monte Carlo iterations were used within each iteration of EM. For the SCAD and LASSO penalties, we set $\lambda_j = \lambda_{01}$ for j = 1,…, 8 and $\lambda_{8+k} = \lambda_{02}\sqrt{k}$ for k = 1,…, 3, while for the ALASSO penalty, $\lambda_j = \lambda_{01}|\hat\beta_j|^{-1}$ for j = 1,…, 8 and $\lambda_{8+k} = \lambda_{02}\sqrt{k}\,\|\hat\gamma_k\|^{-1}$ for k = 1,…, 3, where $\hat\beta_j$ and $\hat\gamma_k$ are the unpenalized ML estimates of $\beta_j$ and $\gamma_k$, respectively, and the penalty $(\lambda_{01}, \lambda_{02})$ was estimated using the ICQ and random effects penalty selection methods.

For each estimate, the penalized estimates of β and D were computed, denoted as $\hat\beta_\lambda$ and $\hat D_\lambda$, respectively, and the model error $ME(\hat\beta_\lambda) = (\hat\beta_\lambda - \beta)^T\Sigma_{xx}(\hat\beta_\lambda - \beta)$ and the quadratic loss error $ME(\hat D_\lambda) = \mathrm{trace}[(\hat D_\lambda - D)^2]^{1/2}$ were computed. The ratios of the model error of the MPL estimate to that of the unpenalized ML estimate, $ME(\hat\beta_\lambda)/ME(\hat\beta_0)$ and $ME(\hat D_\lambda)/ME(\hat D_0)$, were computed for each data set, and the median of the ratios over the 100 simulated data sets, denoted as MRME, was calculated. The MRME of the true model is also reported. In addition, we report two types of errors regarding the fixed and random effects. ZERO1 is the mean number of type I errors (an effect is truly not significant or random but the corresponding MPL estimate indicates it is significant or random) and ZERO2 is the mean number of type II errors (an effect is truly significant or random but the corresponding MPL estimate indicates it is not significant or random).
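The error measures described above are simple to compute once the estimates are available. The sketch below uses hypothetical function names and made-up inputs; it evaluates the quadratic-form model error for β, the trace-based loss for D, and the median ratio of model errors across data sets.

```python
from statistics import median

def model_error_beta(beta_hat, beta_true, sigma_xx):
    """(beta_hat - beta)^T Sigma_xx (beta_hat - beta)."""
    d = [h - t for h, t in zip(beta_hat, beta_true)]
    p = len(d)
    return sum(d[r] * sigma_xx[r][s] * d[s] for r in range(p) for s in range(p))

def model_error_D(D_hat, D_true):
    """trace[(D_hat - D)^2] ** (1/2)."""
    q = len(D_true)
    E = [[D_hat[r][s] - D_true[r][s] for s in range(q)] for r in range(q)]
    trace_sq = sum(E[r][s] * E[s][r] for r in range(q) for s in range(q))
    return trace_sq ** 0.5

def mrme(me_penalized, me_ml):
    """Median over data sets of ME(penalized) / ME(unpenalized ML)."""
    return median([p / m for p, m in zip(me_penalized, me_ml)])

# Made-up inputs for illustration.
me_b = model_error_beta([1.0, 0.0], [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]])
me_d = model_error_D([[2.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
ratio = mrme([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
print(me_b, me_d, ratio)
```

An MRME below one means the penalized estimate has smaller model error than the unpenalized ML estimate in at least half of the simulated data sets.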

For the MPL estimates, MRME values greater than one indicate that the estimate performs worse than the ML estimate, values near one indicate it performs as well as the ML estimate, while values near the ‘true’ MRME value indicate optimal performance. The values ZERO1 and ZERO2 can be interpreted as estimates of the probability of overfit and underfit, respectively, and the value 1 − ZERO1 − ZERO2 is an estimate of the probability of selecting the true model. Ideally, one would like to have MPL estimates with small ZERO1 and ZERO2 values and small MRME values. Overall, the MRME values of all of the MPL estimates were less than or equal to one, which indicates that regardless of the sample size or noise level, the MPL estimates perform at least as well as the ML estimate. Across all sample sizes and noise levels, the MRME values of the MPL estimates using the random effects penalty estimates were higher than those of the MPL estimates using the ICQ penalty estimates. For the ICQ MPL estimates, as the noise level decreases from σ = 3 to σ = 1, the MRME values increase. For a fixed noise level, the MRME values at sample sizes of n = 50 and n = 200 are comparable but there is a slight decrease in the MRME values at sample sizes of n = 100. This indicates that the MPL estimates perform better, relative to the MLE, at low noise levels and near sample sizes of n = 100. The MPL estimates using the random effects penalty estimate tended to overfit significantly. On average, the MPL estimate using the ALASSO penalty function had smaller estimation error and less overfit than the LASSO estimate. For estimating fixed effects, the SCAD-ICQ estimate has, on average, smaller estimation error and less overfit than the other estimates. For estimating the random effects, the ALASSO-ICQ estimate has smaller error and less overfit.

5. Yale Infant Growth Study

We applied the proposed methodology to the Yale infant growth study of Wasserman and Leventhal (1993) and Stier et al. (1993). The Yale infant growth data were collected to study whether cocaine exposure during pregnancy leads to the maltreatment of infants after birth, such as physical and sexual abuse. A total of 298 children were recruited from two subject groups (cocaine exposure group and unexposed group). Different children had different numbers and patterns of visits during the study period. The multivariate response was the weight of the infant at each visit. Let yij denote the weight (in pounds) at the j-th visit of infant i, for i = 1,…, 298, j = 1,…, ni, and let yi = (yi1,…, yini). The covariates used were: xij1 = day of visit, xij2 = age (in years) of mother, xij3 = gestational age (in weeks) of infant, xij4 = race (2 levels: African American and other, coded as 1 and 0), xij5 = previous pregnancies (2 levels: no and yes, coded as 1 and 0), xij6 = gender of infant (2 levels: male and female, coded as 1 and 0), xij7 = cocaine exposure (2 levels: yes and no, coded as 1 and 0). The design matrix Xi is an ni × 8 matrix with the j-th row equal to (1, xij1, xij2, xij3, xij4, xij5, xij6, xij7), Zi is an ni × 3 matrix composed of the first 3 continuous covariates of Xi, i.e., the j-th row of Zi is (xij1, xij2, xij3), and therefore q = 3 here. All covariates were centered in the analysis for numerical stability. Further, we assume that [yi|Xi; β, D] is normally distributed with mean E(yi) = Xiβ + ZiΓbi, where $\Gamma\Gamma^T = D$, and that [yij|Xi; β, D] and [yij′|Xi; β, D] are independent for j ≠ j′.

The objective of this analysis was to determine the significant predictors of infant weight and the significant random effects. Because the ALASSO penalty outperformed the LASSO penalty in the simulations, only the SCAD and ALASSO penalty functions were used along with the ICQ and random effects penalty estimates. Note that the intercept term was not penalized. For the SCAD, $\lambda_j = \lambda_{01}$ for j = 2,…, 8 and $\lambda_{8+k} = \lambda_{02}\sqrt{k}$ for k = 1,…, 3, while for the ALASSO penalty, $\lambda_j = \lambda_{01}|\hat\beta_j|^{-1}$ for j = 2,…, 8 and $\lambda_{8+k} = \lambda_{02}\sqrt{k}\,\|\hat\gamma_k\|^{-1}$ for k = 1,…, 3, where $\hat\beta_j$ and $\hat\gamma_k$ are the unpenalized ML estimates of $\beta_j$ and $\gamma_k$, respectively, and $(\lambda_{01}, \lambda_{02})$ was estimated using the ICQ and random effects penalty selection methods.

The results of the analysis are presented in Table 2. The MPL estimates using the SCAD penalty identify visit, gestational age of infant, gender of infant and cocaine exposure as significant predictors of infant weight, and visit as a significant random effect. These estimates coincide with the results of the maximum likelihood analysis, which identifies the same fixed and random effects as significant (significant effects by MLE analysis are indicated by a * in Table 2). The results of using the SCAD with the two different sets of penalty estimates are similar. Although the estimates using SCAD with the ICQ penalty estimates do not shrink the random-effect variances for age and gestational age to 0, these variance estimates are small relative to that of the visit random effect, so the correct random effect is still identified. The MPL estimate using the ALASSO penalty shrank two more fixed-effect coefficients to zero: gender and cocaine. Although these two effects are identified as significant in the MLE analysis, their MLE estimates are small relative to those of the other significant fixed effects. The estimates using the ALASSO penalty with the ICQ penalty estimates are close to those using the RE penalty estimates. The MPL estimates using the ALASSO penalty identify visit and gestational age of infant as significant fixed effects, and visit as a significant random effect.

Table 2.

Maximum penalized likelihood estimates for the Yale infant growth data, comparing SCAD and ALASSO penalty functions with random effects (RE) and ICQ penalty estimates. Entries are: fixed effect estimate (a), with the variance estimate of the random effect (b) in parentheses.

| Variable | MLE (c) | SCAD-RE | SCAD-ICQ | ALASSO-RE | ALASSO-ICQ |
|---|---|---|---|---|---|
| Intercept | 7.002* (-) | 6.924 (-) | 6.988 (-) | 6.913 (-) | 6.913 (-) |
| Visit | 2.641* (0.230*) | 2.576 (0.087) | 2.617 (0.109) | 2.543 (0.040) | 2.548 (0.067) |
| Age | −0.035 (0.017) | 0.000 (0.000) | 0.000 (0.007) | 0.000 (0.000) | 0.000 (0.000) |
| Gestation | 0.528* (0.017) | 0.424 (0.000) | 0.455 (0.011) | 0.322 (0.000) | 0.424 (0.000) |
| Race | −0.060 (-) | 0.000 (-) | 0.000 (-) | 0.000 (-) | 0.000 (-) |
| Pregnant | −0.004 (-) | 0.000 (-) | 0.000 (-) | 0.000 (-) | 0.000 (-) |
| Gender | 0.139* (-) | 0.022 (-) | 0.033 (-) | 0.000 (-) | 0.000 (-) |
| Cocaine | 0.103* (-) | 0.016 (-) | 0.022 (-) | 0.000 (-) | 0.000 (-) |
| σ2 (d) | 0.512 (-) | 0.552 (-) | 0.527 (-) | 0.612 (-) | 0.594 (-) |
| ICQ (e) | 9223.7 | 11507.32 | 9660.013 | 11999.01 | 11773.25 |

(a) The fixed effect estimate is the estimate of β.
(b) The variance estimate of the random effect is the estimate of diag(D).
(c) * indicates significant effects by MLE analysis.
(d) σ2 is the variance estimate of the error term of the linear mixed model.
(e) ICQ is a measure of goodness of fit.
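As a small worked example, the selected models can be read directly off the penalized estimates in Table 2: an effect is retained when its estimate (or, for a random effect, its variance estimate) is not shrunk exactly to zero. The snippet below is only an illustrative sketch; the dictionary layout and the helper `selected` are not part of the original analysis.

```python
# Penalized estimates copied from Table 2 (intercept omitted; it was not penalized).
fixed_vars = ["Visit", "Age", "Gestation", "Race", "Pregnant", "Gender", "Cocaine"]
fixed_est = {
    "SCAD-RE":    [2.576, 0.000, 0.424, 0.000, 0.000, 0.022, 0.016],
    "SCAD-ICQ":   [2.617, 0.000, 0.455, 0.000, 0.000, 0.033, 0.022],
    "ALASSO-RE":  [2.543, 0.000, 0.322, 0.000, 0.000, 0.000, 0.000],
    "ALASSO-ICQ": [2.548, 0.000, 0.424, 0.000, 0.000, 0.000, 0.000],
}

# Variance estimates of the three candidate random effects, also from Table 2.
random_vars = ["Visit", "Age", "Gestation"]
random_var_est = {
    "SCAD-RE":    [0.087, 0.000, 0.000],
    "SCAD-ICQ":   [0.109, 0.007, 0.011],
    "ALASSO-RE":  [0.040, 0.000, 0.000],
    "ALASSO-ICQ": [0.067, 0.000, 0.000],
}

def selected(names, values):
    """Return the names whose estimate was not shrunk exactly to zero."""
    return [name for name, value in zip(names, values) if value != 0.0]

for method in fixed_est:
    print(method,
          "| fixed:", selected(fixed_vars, fixed_est[method]),
          "| random:", selected(random_vars, random_var_est[method]))
```

Reading the table this way makes the contrast explicit: both SCAD fits keep four fixed effects, both ALASSO fits keep two, and only SCAD-ICQ leaves all three random-effect variances nonzero.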

6. Discussion

We have proposed a general method which performs simultaneous fixed and random effects selection as well as estimation. Under certain regularity conditions and appropriate assumptions on the penalty parameters, the maximum penalized likelihood estimate possesses oracle properties. We have used two methods for estimating the penalty parameters, the random effects and ICQ penalty selection methods, and showed that under an appropriate choice of cn(θ), the ICQ penalty estimate chooses all the significant fixed and random effects with probability 1. Since penalized likelihood methods have been shown to perform poorly in finite samples, simulations were performed to examine the finite sample properties of the maximum penalized likelihood estimators and the performance of the Monte Carlo EM algorithm. In the simulations, the SCAD and ALASSO penalty functions using the ICQ penalty estimate performed best and had significantly less estimation error than the maximum likelihood estimate. Unlike previous implementations of the random effects penalty estimate (Garcia, Ibrahim, and Zhu, 2010a, 2010b), the simulations and real data analysis results show that for mixed effects regression models, the random effects penalty estimate has significant overfit. For estimating fixed effects, the SCAD-ICQ estimate had, on average, smaller estimation error and overfit, while for estimating random effects, the ALASSO-ICQ had smaller error and overfit.
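The idea of choosing the penalty parameters by optimizing a selection criterion can be sketched as a grid search over pairs (λ01, λ02). The code below is a hedged illustration only. The functions `toy_fit` and `toy_criterion` are hypothetical stand-ins: the first soft-thresholds made-up ML estimates instead of running the Monte Carlo EM algorithm, and the second is a generic "lack of fit plus log(n) times model size" criterion, not the ICQ statistic defined earlier in the paper.

```python
import itertools
import math

def select_penalty(grid01, grid02, fit, criterion):
    """Return (criterion value, (lam01, lam02), estimate) minimising the criterion on the grid."""
    best = None
    for lam01, lam02 in itertools.product(grid01, grid02):
        theta_hat = fit(lam01, lam02)        # penalized estimate at this pair
        value = criterion(theta_hat)         # smaller is better
        if best is None or value < best[0]:
            best = (value, (lam01, lam02), theta_hat)
    return best

# --- Toy stand-ins (assumptions for illustration, not the paper's procedure) ---
ML_BETA = [2.0, 0.05, 0.5]                   # made-up unpenalized fixed-effect estimates
ML_VAR = [0.3, 0.01]                         # made-up unpenalized variance components
N = 100                                      # made-up sample size

def soft(x, lam):
    """Soft-thresholding: shrink x toward zero by lam, setting small values to zero."""
    return math.copysign(max(abs(x) - lam, 0.0), x)

def toy_fit(lam01, lam02):
    return [soft(b, lam01) for b in ML_BETA] + [soft(v, lam02) for v in ML_VAR]

def toy_criterion(theta):
    ml = ML_BETA + ML_VAR
    lack_of_fit = N * sum((t - m) ** 2 for t, m in zip(theta, ml))
    model_size = sum(1 for t in theta if t != 0.0)
    return lack_of_fit + math.log(N) * model_size

best_value, best_pair, best_theta = select_penalty(
    [0.0, 0.1, 0.5], [0.0, 0.02, 0.2], toy_fit, toy_criterion)
```

Separating `fit` from `criterion` keeps the search loop unchanged when a different criterion is substituted, which is the kind of alternative the authors mention at the end of this section.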

Many aspects of this work warrant further research and investigation. Recent developments have shown that there may be more than one plausible scheme for formulating the grouped penalty in the penalized likelihood (Zhao et al., 2009; Breheny and Huang, 2009). Selecting significant random effects using a Cholesky parametrization of the covariance matrix of the random effects requires that each row of the Cholesky matrix be penalized as a group. Other parameters, however, can be grouped and penalized in various ways. For instance, it is possible to group parameters corresponding to the fixed effects if one is interested in determining whether a particular group of fixed effects is significant or not. It is also possible to use different penalty functions for each group of parameters.
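The reason a whole row of the Cholesky matrix must be penalized as a group can be seen in a small numerical sketch. Writing the random-effects covariance as D = ΓΓᵀ with Γ lower triangular (the symbol Γ and the numbers below are assumptions made for this illustration), a random effect drops out of the model only when its entire row of Γ is zero; zeroing the diagonal entry alone still leaves a positive variance.

```python
def gram(gamma):
    """Return D = Gamma Gamma^T for a square matrix given as a list of rows."""
    q = len(gamma)
    return [[sum(gamma[i][m] * gamma[j][m] for m in range(q)) for j in range(q)]
            for i in range(q)]

# Entire second row shrunk to zero: the second random effect is removed.
gamma_row_zero = [[0.5, 0.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [0.2, 0.0, 0.3]]
D_row_zero = gram(gamma_row_zero)

# Only the diagonal entry of the second row is zero: the variance stays positive.
gamma_diag_zero = [[0.5, 0.0, 0.0],
                   [0.4, 0.0, 0.0],
                   [0.2, 0.0, 0.3]]
D_diag_zero = gram(gamma_diag_zero)

print("variance of effect 2, row zeroed:     ", D_row_zero[1][1])
print("variance of effect 2, diagonal zeroed:", D_diag_zero[1][1])
```

In the first case the second row and column of D vanish entirely, so the effect has zero variance and zero covariance with the others. In the second case the off-diagonal entry of Γ still contributes to the variance, which is why element-wise penalties on Γ do not translate directly into random effects selection.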

The objective of this paper was to perform simultaneous selection of fixed and random effects. To the best of our knowledge, this is the first paper to propose this type of methodology. In the existing literature (Gurka, 2006; Chen and Dunson, 2003; Daniels and Kass, 1999, 2001), the predominant approach to mixed effects selection has been to fix either the mean model or the covariance structure of the random effects and then either test variance components or perform variable selection on the mean model (Keselman et al., 1998). Because this approach fixes certain parts of the model, it makes assumptions regarding the model structure which may not be appropriate. A possible reason that simultaneous mixed effects selection has not been pursued before is the numerical complexity inherent in the model fitting algorithms. With penalized likelihood methods, however, simultaneous mixed effects selection is straightforward to implement and no assumptions are necessary regarding any part of the model.

As it stands, calculating the ICQ penalty estimator is slightly demanding. An alternative to ICQ penalty parameter estimation is to select the penalty parameter which optimizes other criteria developed in mixed effects models such as those in Claeskens and Consentino (2008) and Liang, Wu, and Zou (2008). We will formally study these issues in future work.

Table 1.

Simulation results of linear mixed effects models comparing SCAD, LASSO, and ALASSO penalty functions with random effect (RE) and ICQ penalty estimates. Entries are: β estimate (D estimate).

| Model | Method | MRME | ZERO1 | ZERO2 |
|---|---|---|---|---|
| n = 50, σ = 3 | SCAD-RE | 0.576 (0.980) | 0.11 (0.94) | 0.00 (0.00) |
| n = 50, σ = 3 | SCAD-ICQ | 0.552 (0.259) | 0.01 (0.09) | 0.00 (0.01) |
| n = 50, σ = 3 | LASSO-RE | 0.983 (0.988) | 0.99 (1.00) | 0.00 (0.00) |
| n = 50, σ = 3 | LASSO-ICQ | 0.605 (0.241) | 0.04 (0.10) | 0.00 (0.01) |
| n = 50, σ = 3 | ALASSO-RE | 0.949 (0.983) | 0.80 (1.00) | 0.00 (0.00) |
| n = 50, σ = 3 | ALASSO-ICQ | 0.597 (0.263) | 0.01 (0.13) | 0.00 (0.01) |
| n = 50, σ = 3 | True | 0.559 (0.228) | 0.00 (0.00) | 0.00 (0.00) |
| n = 50, σ = 1 | SCAD-RE | 0.906 (0.803) | 0.58 (1.00) | 0.00 (0.00) |
| n = 50, σ = 1 | SCAD-ICQ | 0.869 (0.461) | 0.03 (0.13) | 0.00 (0.00) |
| n = 50, σ = 1 | LASSO-RE | 0.997 (0.996) | 0.99 (1.00) | 0.00 (0.00) |
| n = 50, σ = 1 | LASSO-ICQ | 0.884 (0.438) | 0.04 (0.08) | 0.00 (0.00) |
| n = 50, σ = 1 | ALASSO-RE | 0.983 (0.989) | 0.81 (1.00) | 0.00 (0.00) |
| n = 50, σ = 1 | ALASSO-ICQ | 0.858 (0.441) | 0.03 (0.10) | 0.00 (0.00) |
| n = 50, σ = 1 | True | 0.846 (0.439) | 0.00 (0.00) | 0.00 (0.00) |
| n = 100, σ = 3 | SCAD-RE | 0.571 (0.970) | 0.13 (0.93) | 0.00 (0.00) |
| n = 100, σ = 3 | SCAD-ICQ | 0.565 (0.219) | 0.01 (0.04) | 0.00 (0.00) |
| n = 100, σ = 3 | LASSO-RE | 0.993 (0.994) | 0.99 (1.00) | 0.00 (0.00) |
| n = 100, σ = 3 | LASSO-ICQ | 0.584 (0.232) | 0.01 (0.04) | 0.00 (0.00) |
| n = 100, σ = 3 | ALASSO-RE | 0.949 (0.987) | 0.81 (1.00) | 0.00 (0.00) |
| n = 100, σ = 3 | ALASSO-ICQ | 0.574 (0.205) | 0.01 (0.04) | 0.00 (0.00) |
| n = 100, σ = 3 | True | 0.513 (0.196) | 0.00 (0.00) | 0.00 (0.00) |
| n = 100, σ = 1 | SCAD-RE | 0.895 (0.803) | 0.57 (1.00) | 0.00 (0.00) |
| n = 100, σ = 1 | SCAD-ICQ | 0.820 (0.452) | 0.01 (0.07) | 0.00 (0.00) |
| n = 100, σ = 1 | LASSO-RE | 0.999 (0.997) | 0.99 (1.00) | 0.00 (0.00) |
| n = 100, σ = 1 | LASSO-ICQ | 0.835 (0.478) | 0.03 (0.08) | 0.00 (0.00) |
| n = 100, σ = 1 | ALASSO-RE | 0.982 (0.989) | 0.82 (1.00) | 0.00 (0.00) |
| n = 100, σ = 1 | ALASSO-ICQ | 0.839 (0.415) | 0.02 (0.06) | 0.00 (0.00) |
| n = 100, σ = 1 | True | 0.832 (0.392) | 0.00 (0.00) | 0.00 (0.00) |
| n = 200, σ = 3 | SCAD-RE | 0.553 (0.987) | 0.13 (0.94) | 0.00 (0.00) |
| n = 200, σ = 3 | SCAD-ICQ | 0.554 (0.245) | 0.01 (0.07) | 0.00 (0.00) |
| n = 200, σ = 3 | LASSO-RE | 0.995 (0.996) | 0.99 (1.00) | 0.00 (0.00) |
| n = 200, σ = 3 | LASSO-ICQ | 0.617 (0.244) | 0.05 (0.09) | 0.00 (0.00) |
| n = 200, σ = 3 | ALASSO-RE | 0.934 (0.992) | 0.78 (1.00) | 0.00 (0.00) |
| n = 200, σ = 3 | ALASSO-ICQ | 0.603 (0.237) | 0.02 (0.11) | 0.00 (0.00) |
| n = 200, σ = 3 | True | 0.546 (0.218) | 0.00 (0.00) | 0.00 (0.00) |
| n = 200, σ = 1 | SCAD-RE | 0.902 (0.833) | 0.55 (1.00) | 0.00 (0.00) |
| n = 200, σ = 1 | SCAD-ICQ | 0.853 (0.487) | 0.01 (0.12) | 0.00 (0.00) |
| n = 200, σ = 1 | LASSO-RE | 0.998 (0.998) | 0.99 (1.00) | 0.00 (0.00) |
| n = 200, σ = 1 | LASSO-ICQ | 0.873 (0.554) | 0.07 (0.20) | 0.00 (0.00) |
| n = 200, σ = 1 | ALASSO-RE | 0.982 (0.991) | 0.79 (1.00) | 0.00 (0.00) |
| n = 200, σ = 1 | ALASSO-ICQ | 0.871 (0.468) | 0.02 (0.11) | 0.00 (0.00) |
| n = 200, σ = 1 | True | 0.839 (0.408) | 0.00 (0.00) | 0.00 (0.00) |

Acknowledgments

The authors wish to thank the editor, associate editor and two referees for helpful comments and suggestions, which have led to an improvement of this article. This research was partially supported by NSF grant BCS-08-26844 and NIH grants GM 70335, CA 74015, RR025747-01, MH086633, AG033387, and P01CA142538-01.

Footnotes

Supplementary Materials

Web-based supplementary document referenced in Section 3 is available under the Paper Information link at the Biometrics website http://www.biometrics.tibs.org.

References

  1. Bondell HD, Krishna A, Ghosh SK. Joint variable selection for fixed and random effects in linear mixed-effects models. Biometrics. 2010; in press. doi: 10.1111/j.1541-0420.2010.01391.x.
  2. Breheny P, Huang J. Penalized methods for bi-level variable selection. Statistics and its Interface. 2009;2:369–380. doi: 10.4310/sii.2009.v2.n3.a10.
  3. Cai J, Fan J, Li R, Zhou H. Variable selection for multivariate failure time data. Biometrika. 2005;92:303–316. doi: 10.1093/biomet/92.2.303.
  4. Claeskens G, Consentino F. Variable selection with incomplete covariate data. Biometrics. 2008;64:1062–1096. doi: 10.1111/j.1541-0420.2008.01003.x.
  5. Chen Z, Dunson D. Random effects selection in linear mixed models. Biometrics. 2003;59:762–769. doi: 10.1111/j.0006-341x.2003.00089.x.
  6. Daniels MJ, Kass RE. Nonconjugate Bayesian estimation of covariance matrices and its use in hierarchical models. Journal of the American Statistical Association. 1999;94:1254–1263.
  7. Daniels MJ, Kass RE. Shrinkage estimators for covariance matrices. Biometrics. 2001;57:1173–1184. doi: 10.1111/j.0006-341x.2001.01173.x.
  8. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association. 2001;96:1348–1360.
  9. Fan J, Li R. Variable selection for Cox’s proportional hazards model and frailty model. Annals of Statistics. 2002;30:74–99.
  10. Fan J, Li R. New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. Journal of the American Statistical Association. 2004;99:710–723.
  11. Fu W. Penalized regressions: The bridge versus the lasso. Journal of Computational and Graphical Statistics. 1998;7:375–384.
  12. Garcia RI, Ibrahim JG, Zhu H. Variable selection for regression models with missing data. Statistica Sinica. 2010a;20:149–165.
  13. Garcia RI, Ibrahim JG, Zhu H. Variable selection in the Cox regression model with covariates missing at random. Biometrics. 2010b;66:97–104. doi: 10.1111/j.1541-0420.2009.01274.x.
  14. Gurka MJ. Selecting the best linear mixed model under REML. American Statistician. 2006;60:19–26.
  15. Hunter DR, Li R. Variable selection using MM algorithms. Annals of Statistics. 2005;33:1617–1642. doi: 10.1214/009053605000000200.
  16. Ibrahim JG. Incomplete data in generalized linear models. Journal of the American Statistical Association. 1990;85:765–769.
  17. Ibrahim JG, Chen MH, Lipsitz SR. Monte Carlo EM for missing covariates in parametric regression models. Biometrics. 1999;55:591–596. doi: 10.1111/j.0006-341x.1999.00591.x.
  18. Ibrahim JG, Lipsitz SR. Parameter estimation from incomplete data in binomial regression when the missing data mechanism is nonignorable. Biometrics. 1996;52:1071–1078.
  19. Ibrahim JG, Zhu H, Tang N. Model selection criteria for missing-data problems using the EM algorithm. Journal of the American Statistical Association. 2008;103:1648–1658. doi: 10.1198/016214508000001057.
  20. Johnson B, Lin DY, Zeng D. Penalized estimating functions and variable selection in semiparametric regression models. Journal of the American Statistical Association. 2008;103:672–680. doi: 10.1198/016214508000000184.
  21. Keselman HJ, Algina J, Kowalchuk RK, Wolfinger RD. A comparison of two approaches for selecting covariance structures in the analysis of repeated measurements. Communications in Statistics - Simulation and Computation. 1998;27:591–604.
  22. Kowalchuk RK, Keselman HJ, Algina J, Wolfinger RD. The analysis of repeated measurements with mixed-model adjusted F tests. Educational and Psychological Measurement. 2004;64:224–242.
  23. Krishna A. Joint variable selection of fixed and random effects in linear mixed-effects model and its oracle properties. Unpublished thesis. North Carolina State University; 2009.
  24. Leeb H, Pötscher BM. Sparse estimators and the oracle property, or the return of Hodges’ estimator. Journal of Econometrics. 2008;142:201–211.
  25. Liang H, Wu H, Zou G. A note on conditional AIC for linear mixed-effects models. Biometrika. 2008;95:773–778. doi: 10.1093/biomet/asn023.
  26. Lin X. Variance component testing in generalized linear models with random effects. Biometrika. 1997;84:309–326.
  27. Little RJA, Schluchter M. Maximum likelihood estimation for mixed continuous and categorical data with missing values. Biometrika. 1985;72:497–512.
  28. Meng XL, Rubin DB. Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika. 1993;80:267–278.
  29. Ni X, Zhang D, Zhang H. Variable selection for semiparametric mixed models in longitudinal studies. Biometrics. 2009;66:79–88. doi: 10.1111/j.1541-0420.2009.01240.x.
  30. Qu A, Li R. Quadratic inference functions for varying-coefficient models with longitudinal data. Biometrics. 2006;62:379–391. doi: 10.1111/j.1541-0420.2005.00490.x.
  31. Stier DM, Leventhal JM, Berg AT, Johnson L, Mezger J. Are children born to young mothers at increased risk of maltreatment? Pediatrics. 1993;91:642–648.
  32. Thall PF, Vail SX. Some covariance models for longitudinal count data with overdispersion. Biometrics. 1990;46:657–671.
  33. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B. 1996;58:267–288.
  34. Wasserman DR, Leventhal JM. Maltreatment of children born to cocaine-dependent mothers. American J. Diseases of Children. 1993;147:1324–1328. doi: 10.1001/archpedi.1993.02160360066021.
  35. Wang H, Li R, Tsai CL. Tuning parameter selector for the smoothly clipped absolute deviation method. Biometrika. 2007;94:553–568. doi: 10.1093/biomet/asm053.
  36. Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society B. 2006;68:49–67.
  37. Zhang H, Lu W. Adaptive-LASSO for Cox’s proportional hazards model. Biometrika. 2007;94:1–13.
  38. Zhao P, Rocha G, Yu B. The composite absolute penalties family for grouped and hierarchical variable selection. Annals of Statistics. 2009;37:3468–3497.
  39. Zhu HT, Zhang HP. Generalized score test for homogeneity for mixed effects models. Annals of Statistics. 2006;34:1545–1569.
  40. Zou H. The adaptive lasso and its oracle properties. Journal of the American Statistical Association. 2006;101:1418–1429.
  41. Zou H, Li R. One-step sparse estimates in nonconcave penalized likelihood models. Annals of Statistics. 2008;36:1509–1533. doi: 10.1214/009053607000000802.
