Fixed and Random Effects Selection in Mixed Effects Models

Joseph G Ibrahim; Hongtu Zhu; Ramon I Garcia; Ruixin Guo

doi:10.1111/j.1541-0420.2010.01463.x

Biometrics. Author manuscript; available in PMC: 2011 Jun 22.

Published in final edited form as: Biometrics. 2010 Jul 21;67(2):495–503. doi: 10.1111/j.1541-0420.2010.01463.x


PMCID: PMC3041932 NIHMSID: NIHMS216083 PMID: 20662831

SUMMARY

We consider selecting both fixed and random effects in a general class of mixed effects models using maximum penalized likelihood (MPL) estimation along with the smoothly clipped absolute deviation (SCAD) and adaptive LASSO (ALASSO) penalty functions. The maximum penalized likelihood estimates are shown to possess consistency, sparsity, and asymptotic normality. A model selection criterion, called the IC_Q statistic, is proposed for selecting the penalty parameters (Ibrahim, Zhu and Tang, 2008). The variable selection procedure based on IC_Q is shown to consistently select important fixed and random effects. The methodology is very general and can be applied to numerous situations involving random effects, including generalized linear mixed models. Simulation studies and a real data set from a Yale infant growth study are used to illustrate the proposed methodology.

Keywords: ALASSO, Cholesky decomposition, EM algorithm, IC_Q criterion, Mixed Effects selection, Penalized likelihood, SCAD

1. Introduction

In the analysis of mixed effects models, a primary objective is to identify the fixed and/or random effects that are significantly associated with the outcome variable. When selecting random and fixed effects simultaneously, that is, when selecting mixed effects, it is common to use a selection procedure (e.g., forward or backward elimination) coupled with a selection criterion, such as AIC and/or BIC based on the observed-data log-likelihood, to compare a set of candidate models (Keselman et al., 1998; Gurka, 2006; Liang, Wu, and Zou, 2008; Ibrahim, Zhu, and Tang, 2008; Claeskens and Consentino, 2008). Zhu and Zhang (2006) proposed a testing procedure based on a class of test statistics for a general mixed effects model to test the homogeneity hypothesis that all of the variance components are zero. Such methods, however, suffer from a serious deficiency: it is infeasible to use them to select significant random and fixed (mixed) effects simultaneously from a large number of possible models (Fan and Li, 2001; Fan and Li, 2002). To overcome this deficiency, variable selection procedures based on penalized likelihood methods, such as the Smoothly Clipped Absolute Deviation (SCAD) (Fan and Li, 2001) and the Adaptive Lasso (ALASSO) (Zou, 2006), can be developed to select mixed effects.

Compared to the large body of literature on variable selection procedures, we make several novel contributions in this paper. This is one of the few papers that develop methods for selecting mixed effects in a large class of mixed effects models. Most variable selection procedures have been developed for parametric and semiparametric models with or without random effects and/or unobserved data (Fan and Li, 2002, 2004; Cai et al., 2005; Qu and Li, 2006; Zhang and Lu, 2007; Ni, Zhang, and Zhang, 2009; Johnson, Lin, and Zeng, 2008; Garcia, Ibrahim, and Zhu 2010a, 2010b), but all of these procedures have been used only for selecting significant fixed effects. The only exceptions are the recent work of Krishna (2009) and Bondell, Krishna, and Ghosh (2010), which consider only the linear mixed model. We use a novel reparametrization to reformulate the selection of mixed effects as a grouped variable selection problem in models with heavy ‘missing’ data, where the random effects play the role of the missing data. This reparametrization makes it possible to use penalized likelihood methods to select both fixed and random effects. Compared to most variable selection methods for linear models, we must address additional challenges due to the presence of missing observations for each subject. One computational challenge is to directly maximize the observed-data log-likelihood function penalized by SCAD or ALASSO in order to select both fixed and random effects and compute their estimates. The observed-data log-likelihood for complicated mixed effects models is often not available in closed form and is computationally intractable because it may involve high-dimensional integrals that are difficult to approximate. When selecting random effects, this maximization is further complicated because one must eliminate the row and column of the random effects covariance matrix corresponding to an insignificant random effect while constraining the remaining matrix to be positive definite. Another challenge is to select penalty parameters that yield estimates with the desired asymptotic properties (Fan and Li, 2001), whereas existing selection criteria (Kowalchuck et al., 2004; Gurka, 2006; Liang, Wu, and Zou, 2008; Claeskens and Consentino, 2008) are computationally difficult to apply to general mixed effects models.

The goal of this paper is to develop a simultaneous fixed and random effects selection procedure based on the SCAD and ALASSO penalties for application to longitudinal models, correlated models, and/or mixed effects models. We reformulate the problem of selecting mixed effects and develop a method based on the IC_Q criterion to select the penalty parameters. We also treat the penalty parameters in the SCAD and ALASSO penalty functions as hyperparameters and use the Expectation Maximization (EM) algorithm to simultaneously optimize the penalized likelihood function and estimate the penalty parameters. Under some regularity conditions, we establish the asymptotic properties of the maximum penalized likelihood estimator and the consistency of the IC_Q-based penalty selection procedure.

To motivate the proposed methodology, we consider a dataset from a Yale infant growth study (Wasserman and Leventhal, 1993; Stier et al., 1993). The objective of this study is to investigate the relationship between maternal cocaine dependency and child maltreatment (physical abuse, sexual abuse, or neglect). The study included a total of 298 children from the cocaine-exposed and unexposed groups. The outcome variable is infant weight (in pounds), measured at several time points. Seven covariates were considered: day of visit, age of mother, gestational age of infant, race, previous pregnancies, gender of infant, and cocaine exposure. Children differed in the number and pattern of their visits during the study. We consider a mixed effects model that uses the seven covariates as fixed effects and the first three covariates as random effects. Our objective in this analysis is to select the most important predictors of infant weight as well as the significant random effects. The selection of random effects is crucial in this application, as it is not at all clear whether a random intercept model will suffice or whether the longitudinal model should also contain random slope effects. Moreover, there is a large number of covariates to select from in the fixed effects component of the model. The selection can be performed with our penalized likelihood method, which combines a penalty function (SCAD or ALASSO) with penalty parameters chosen by either the random effects or the IC_Q penalty estimator. Details of the analysis of these data are given in Section 5.

The rest of the paper is organized as follows. Section 2 introduces the general development for maximizing the penalized likelihood function and selecting the penalty parameters. Section 3 examines the asymptotic properties of the maximum penalized likelihood (MPL) estimator and the IC_Q penalty selection procedure. Section 4 presents a simulation study to examine the finite-sample performance of the maximum penalized likelihood estimate. A real data analysis of the Yale infant growth study is given in Section 5. Section 6 concludes the paper with some discussion.

2. Mixed effects selection for mixed effects models

2.1 Model Formulation

Suppose we observe n independent observations (y₁, X₁),…, (y_n, X_n), where y_i is an n_i × 1 vector of responses or repeated measures and X_i is an n_i × p matrix of fixed covariates for i = 1,…, n. We assume independence among the different (y_i, X_i)’s and

E [y_{i} | b_{i}, X_{i}; θ] = g (X_{i} β + Z_{i} Γ b_{i}),

(1)

where b_i is a q × 1 vector of unobserved random effects, θ denotes all the unknown parameters, Γ is a q × q lower triangular matrix, g(·) is a known link function, β = (β₁,…, β_p)^T is a p × 1 vector of regression coefficients, and Z_i is an n_i × q matrix composed of columns of X_i. In practice, it is common to assume that the conditional distribution of y_i given (b_i, X_i), denoted by f(y_i|b_i, X_i; θ), belongs to the exponential family, such as the binomial, normal, and Poisson distributions (Little and Schluchter, 1985; Ibrahim and Lipsitz, 1996). For notational simplicity, the random effects are assumed to follow a multivariate normal distribution with zero mean and identity covariance matrix, b_i ~ N_q(0, I_q). Equivalently, Γb_i ~ N_q(0, D = ΓΓ^T), where Γ is the Cholesky factor of the q × q matrix D. We allow D to be positive semi-definite, so that certain components of Γb_i may be nonrandom, that is, equal to 0 with probability 1.
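The effect of the Cholesky parametrization can be checked numerically. The sketch below uses a hypothetical 3 × 3 covariance matrix D that does not come from the paper. Zeroing the third row of Γ makes D positive semi-definite, leaves the block for the first two effects unchanged, and makes the third element of Γb_i identically zero, i.e. nonrandom.

```python
import numpy as np

# Hypothetical positive definite covariance of Gamma b_i (illustration only).
D = np.array([[1.0, 0.5, 0.25],
              [0.5, 1.0, 0.5],
              [0.25, 0.5, 1.0]])

# Lower triangular Cholesky factor: D = Gamma @ Gamma.T
Gamma = np.linalg.cholesky(D)

# Zero the third row of Gamma to drop the third random effect.
Gamma_red = Gamma.copy()
Gamma_red[2, :] = 0.0
D_red = Gamma_red @ Gamma_red.T  # positive semi-definite; last row/column are zero

# Draw b_i ~ N_3(0, I_3) for 1000 hypothetical subjects (one draw per row).
rng = np.random.default_rng(0)
b = rng.standard_normal((1000, 3))
effects = b @ Gamma_red.T  # row i holds (Gamma_red @ b_i)^T

print("D_red =\n", D_red)
print("smallest eigenvalue of D_red:", np.linalg.eigvalsh(D_red).min())
print("third random effect identically zero:", np.all(effects[:, 2] == 0.0))
```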

2.2 EM Algorithm for Maximizing the Penalized Likelihood

Selecting mixed effects involves identifying the nonzero components of β, determining the nonrandom elements of Γb_i, and simultaneously estimating all nonzero parameters. We propose to maximize the penalized likelihood function given by

PL (θ) = ℓ (θ) - n \sum_{j = 1}^{p} ϕ_{λ_{j}} (| β_{j} |) - n \sum_{k = 1}^{q} ϕ_{λ_{p + k}} (‖ γ_{k} ‖),

(2)

where $ℓ (θ) = \sum_{i = 1}^{n} ℓ_{i} (θ)$, in which ℓ_i(θ) = log ∫ f(y_i, b_i|X_i; θ) db_i is the observed-data log-likelihood for the ith individual, λ_j is the penalty parameter of β_j, and the penalty function ϕ_{λ_j}(·) is nonnegative, nondecreasing, and differentiable on (0, ∞) (Fan and Li, 2001; Zou, 2006). In addition, the k × 1 vector γ_k consists of the k lower-triangular (possibly nonzero) elements of the k-th row of the q × q matrix Γ, $‖ γ_{k} ‖ = {(γ_{k}^{T} γ_{k})}^{1 / 2}$, and λ_{p+k} is the group penalty parameter for the entire k-th row of Γ. The penalty on β in (2) allows some estimates of β to be exactly zero (Fan and Li, 2001); the corresponding covariates are insignificant predictors of the outcome variable, and the remaining covariates are significant predictors. The penalization of γ_k is performed in a group manner in order to preserve the positive definite constraint on D, so that the elements of γ_k are either all estimated as zero or kept in the model together (Yuan and Lin, 2006). If all the elements of γ_k are zero, then the k-th row of Γ is zero and the k-th element of Γb_i is not random.
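The grouping in (2) can be made concrete with a short sketch. It assumes a generic penalty function `phi(t, lam)` and uses the ALASSO form ϕ_λ(t) = λt for the toy call. The helper names and numbers are hypothetical. Each γ_k is read off row k of Γ, so a single norm, and therefore a single penalty term, covers the whole row.

```python
import numpy as np

def gamma_groups(Gamma):
    """Return [gamma_1, ..., gamma_q]; gamma_k holds the k lower-triangular entries of row k."""
    q = Gamma.shape[0]
    return [Gamma[k, : k + 1] for k in range(q)]

def penalty_term(beta, Gamma, lam, n, phi):
    """n * sum_j phi_{lam_j}(|beta_j|) + n * sum_k phi_{lam_{p+k}}(||gamma_k||), as in (2)."""
    p = len(beta)
    pen_beta = sum(phi(abs(b_j), lam[j]) for j, b_j in enumerate(beta))
    pen_gamma = sum(phi(np.linalg.norm(g_k), lam[p + k])
                    for k, g_k in enumerate(gamma_groups(Gamma)))
    return n * (pen_beta + pen_gamma)

def alasso_phi(t, lam):
    """ALASSO-type penalty phi_lam(t) = lam * t."""
    return lam * t

# Hypothetical toy values: p = 2 fixed effects, q = 2 random effects.
beta = np.array([1.0, -2.0])
Gamma = np.array([[1.0, 0.0],
                  [3.0, 4.0]])
lam = [0.1, 0.1, 0.2, 0.2]
pen = penalty_term(beta, Gamma, lam, n=10, phi=alasso_phi)
# PL(theta) in (2) would then be loglik(theta) - pen.
print(pen)
```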

Similar to Chen and Dunson (2003), we reparametrize the linear predictor as

X_{i} β + Z_{i} Γ b_{i} = \left[ X_{i}, \; (b_{i}^{T} \otimes Z_{i}) J_{q} \right] \begin{pmatrix} β \\ γ \end{pmatrix} = U_{i} δ,

(3)

where γ = (γ₁^T,…, γ_q^T)^T and J_q is the q² × q(q + 1)/2 matrix that maps γ to vec(Γ), i.e. vec(Γ) = J_qγ. By reparametrizing the linear predictor in this way, the selection of mixed effects becomes a grouped variable selection problem in regression models with missing covariates, in which the random effects in the design matrix U_i play the role of the “missing covariates”. Using this reparametrization, we can apply the variable selection methods proposed in Garcia, Ibrahim, and Zhu (2010a; 2010b) to select important mixed effects.
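The identity in (3) rests on vec(ZΓb) = (b^T ⊗ Z) vec(Γ). The sketch below is a numerical check, not the authors' code. It builds one possible J_q under two assumptions: γ stacks the rows γ₁,…, γ_q, and vec(·) stacks columns. It then confirms that X_iβ + Z_iΓb_i = U_iδ on random inputs.

```python
import numpy as np

def make_J(q):
    """q^2 x q(q+1)/2 matrix with vec(Gamma) = J @ gamma (vec stacks columns)."""
    J = np.zeros((q * q, q * (q + 1) // 2))
    for r in range(q):            # row of Gamma
        for c in range(r + 1):    # lower-triangular column within that row
            J[c * q + r, r * (r + 1) // 2 + c] = 1.0
    return J

def stack_gamma(Gamma):
    """gamma = (gamma_1^T, ..., gamma_q^T)^T with gamma_k = first k entries of row k."""
    return np.concatenate([Gamma[r, : r + 1] for r in range(Gamma.shape[0])])

def design_U(X, Z, b):
    """U_i = [X_i, (b_i^T kron Z_i) J_q]."""
    q = Z.shape[1]
    return np.hstack([X, np.kron(b.reshape(1, -1), Z) @ make_J(q)])

rng = np.random.default_rng(1)
n_i, p, q = 5, 4, 2
X = rng.standard_normal((n_i, p))
Z = X[:, :q]                               # Z_i built from columns of X_i
beta = rng.standard_normal(p)
Gamma = np.tril(rng.standard_normal((q, q)))
b = rng.standard_normal(q)

delta = np.concatenate([beta, stack_gamma(Gamma)])
lhs = X @ beta + Z @ Gamma @ b
rhs = design_U(X, Z, b) @ delta
print(np.allclose(lhs, rhs))
```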

Because the observed-data log-likelihood function usually involves intractable integrals, we develop a Monte Carlo EM algorithm to compute the maximum penalized likelihood estimator of θ, denoted by θ̂_λ, for each λ = (λ₁,…,λ_{p+q}). Denote the complete and observed data for subject i by d_{i,c} = (y_i, X_i, b_i) and d_{i,o} = (y_i, X_i), respectively, and the entire complete and observed data by d_c and d_o, respectively. At the s-th iteration, given θ^(s), the E step evaluates the penalized Q-function, given by

Q_{λ} (θ | θ^{(s)}) = \sum_{i = 1}^{n} E {log f (d_{i, c}; θ) | d_{o}; θ^{(s)}} - n \sum_{j = 1}^{p} ϕ_{λ_{j}} (| β_{j} |) - n \sum_{k = 1}^{q} ϕ_{λ_{p + k}} (‖ γ_{k} ‖)

(4)

= Q_{1} (θ | θ^{(s)}) - n \sum_{j = 1}^{p} ϕ_{λ_{j}} (| β_{j} |) - n \sum_{k = 1}^{q} ϕ_{λ_{p + k}} (‖ γ_{k} ‖) + Q_{2} (θ^{(s)}),

(5)

where θ = (δ^T, ξ^T)^T, in which ξ denotes all parameters other than δ, d_{i,c} = (y_i, b_i, X_i), and

Q_{1} (θ | θ^{(s)}) = \sum_{i = 1}^{n} \int {log f (y_{i} | b_{i}, X_{i}; δ, ξ)} f (b_{i} | d_{i, o}; θ^{(s)}) d b_{i},

(6)

Q_{2} (θ^{(s)}) = \sum_{i = 1}^{n} \int {log f (b_{i})} f (b_{i} | d_{i, o}; θ^{(s)}) d b_{i} .

(7)

Since the integrals in (6) and (7) are often intractable, we approximate them using a Markov chain Monte Carlo (MCMC) sample of size L from the density f(b_i|d_{i,o}; θ^(s)) (see Ibrahim, Chen, and Lipsitz, 1999). Let $b_{i}^{(s, l)}$ be the l-th simulated value at the s-th iteration of the algorithm. The integral in (6) can then be approximated as

Q_{1} (θ | θ^{(s)}) \approx \frac{1}{L} \sum_{l = 1}^{L} \sum_{i = 1}^{n} log f (y_{i} | b_{i}^{(s, l)}, X_{i}; θ) .

(8)
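In the special case of a Gaussian linear mixed model, y_i = X_iβ + Z_iΓb_i + σε_i with b_i ~ N_q(0, I_q), the conditional density f(b_i|d_{i,o}; θ^(s)) is itself multivariate normal. Exact draws can therefore replace the MCMC sample used in the general algorithm. The sketch below assumes that special case, represents θ as a hypothetical tuple (β, Γ, σ), and evaluates the Monte Carlo approximation (8).

```python
import numpy as np

def gaussian_loglik(y, mean, sigma):
    """log N(y; mean, sigma^2 I)."""
    resid = y - mean
    return -0.5 * y.size * np.log(2.0 * np.pi * sigma**2) - 0.5 * resid @ resid / sigma**2

def draw_b_given_y(y, X, Z, beta, Gamma, sigma, L, rng):
    """L exact draws from f(b_i | y_i, X_i; theta), assuming b_i ~ N(0, I) and Gaussian errors."""
    A = Z @ Gamma                                  # loading of b_i in the linear predictor
    q = Gamma.shape[1]
    V = np.linalg.inv(np.eye(q) + A.T @ A / sigma**2)
    V = 0.5 * (V + V.T)                            # symmetrize against round-off
    m = V @ A.T @ (y - X @ beta) / sigma**2
    return rng.multivariate_normal(m, V, size=L)   # shape (L, q)

def mc_q1(data, theta, theta_s, L=2000, seed=0):
    """Monte Carlo approximation (8) of Q1(theta | theta_s).

    data: list of (y_i, X_i, Z_i); theta, theta_s: (beta, Gamma, sigma) tuples (hypothetical layout).
    """
    rng = np.random.default_rng(seed)
    beta, Gamma, sigma = theta
    beta_s, Gamma_s, sigma_s = theta_s
    total = 0.0
    for y, X, Z in data:
        draws = draw_b_given_y(y, X, Z, beta_s, Gamma_s, sigma_s, L, rng)
        for b in draws:
            total += gaussian_loglik(y, X @ beta + Z @ Gamma @ b, sigma)
    return total / L                               # (1/L) sum_l sum_i
```

A quick sanity check: when Γ = 0 there are no random effects, so the approximation reduces to the ordinary Gaussian log-likelihood whatever draws are used.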

The M step involves maximizing

Q_{1, λ} (θ | θ^{(s)}) = Q_{1} (θ | θ^{(s)}) - n \sum_{j = 1}^{p} ϕ_{λ_{j}} (| β_{j} |) - n \sum_{k = 1}^{q} ϕ_{λ_{p + k}} (‖ γ_{k} ‖)

(9)

with respect to (δ, ξ). Maximizing Q_1,λ(δ, ξ|θ^(s)) with respect to ξ is straightforward and can be done using a standard optimization algorithm, such as the Newton-Raphson algorithm (Little and Schluchter, 1985; Ibrahim, 1990; Ibrahim and Lipsitz, 1996). Maximizing Q_1,λ with respect to δ is more difficult because Q_1,λ is a nondifferentiable and nonconcave function of δ (Zou and Li, 2008).

To maximize Q_1,λ, following Fan and Li (2001), we use a second-order Taylor series approximation of Q₁ centered at δ^(s). With this approximation, Q_1,λ resembles a penalized weighted least squares criterion, so algorithms for penalized least squares can be used (Fan and Li, 2001; Hunter and Li, 2005). We use a modification of the local linear approximation (LLA) algorithm (Zou and Li, 2008) to incorporate grouped penalization. For γ_k, we use the following approximation centered at $γ_{k}^{(s)}$:

ϕ_{λ_{p + k}} (‖ γ_{k} ‖) \approx \sum_{t = 1}^{k} {\frac{ϕ_{λ_{p + k}}^{'} (‖ γ_{k}^{(s)} ‖) | γ_{kt}^{(s)} |}{‖ γ_{k}^{(s)} ‖}} | γ_{kt} |,

(10)

where γ_kt is the t-th element of the vector γ_k, ϕ'_{λ_{p+k}} is the derivative of ϕ_{λ_{p+k}}, the approximation holds up to an additive constant that does not depend on γ_k, and we assume $‖ γ_{k}^{(s)} ‖ > 0$. If $‖ γ_{k}^{(s)} ‖ = 0$, then we set $γ_{k}^{(s + 1)} = 0$. Using this approximation, Q_1,λ resembles a penalized regression with a weighted L₁ penalty, so methods for computing the lasso can be used to maximize Q_1,λ (Tibshirani, 1996; Fu, 1998).
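The δ-update can be sketched under stated assumptions. Suppose the Taylor expansion of Q₁ has been rewritten as −½‖r − Aδ‖² for some working response r and working design A (hypothetical names). Take ϕ' to be the SCAD derivative with a = 3.7. The LLA weights are then n ϕ'(|β_j^(s)|) for β and n times the coefficients in (10) for γ_k. Coordinate descent with soft-thresholding, a standard lasso solver swapped in here for illustration, minimizes ½‖r − Aδ‖² + Σ w_m|δ_m|. An infinite weight keeps a group with ‖γ_k^(s)‖ = 0 at zero.

```python
import numpy as np

def scad_deriv(t, lam, a=3.7):
    """SCAD derivative phi'_lam(t) for t >= 0 (Fan and Li, 2001)."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1.0))

def lla_weights(beta_s, gamma_s, lam_beta, lam_gamma, n, deriv=scad_deriv):
    """L1 weights: n*phi'(|beta_j^(s)|) for beta and n times the coefficients of (10) for gamma."""
    w = [n * float(deriv(abs(b), l)) for b, l in zip(beta_s, lam_beta)]
    for g, l in zip(gamma_s, lam_gamma):
        g = np.asarray(g, dtype=float)
        norm = np.linalg.norm(g)
        if norm == 0.0:
            w.extend([np.inf] * g.size)          # gamma_k^(s) = 0 stays at 0
        else:
            w.extend(n * float(deriv(norm, l)) * np.abs(g) / norm)
    return np.asarray(w, dtype=float)

def weighted_lasso_cd(A, r, w, n_iter=1000, tol=1e-10):
    """Minimize 0.5*||r - A delta||^2 + sum_m w_m |delta_m| by cyclic coordinate descent."""
    A = np.asarray(A, dtype=float)
    r = np.asarray(r, dtype=float)
    w = np.asarray(w, dtype=float)
    delta = np.zeros(A.shape[1])
    col_sq = (A ** 2).sum(axis=0)
    resid = r.copy()                             # equals r - A @ delta while delta = 0
    for _ in range(n_iter):
        max_change = 0.0
        for m in range(A.shape[1]):
            if col_sq[m] == 0.0:
                continue
            rho = A[:, m] @ resid + col_sq[m] * delta[m]
            new = np.sign(rho) * max(abs(rho) - w[m], 0.0) / col_sq[m]   # soft-threshold
            change = new - delta[m]
            if change != 0.0:
                resid -= change * A[:, m]
                delta[m] = new
                max_change = max(max_change, abs(change))
        if max_change < tol:
            break
    return delta
```

With an orthonormal A, one sweep reproduces coordinatewise soft-thresholding of Aᵀr. A general A needs several sweeps.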

Let $ξ^{(s + 1)} = \underset{ξ}{argmax}\, Q_{1, λ} (δ^{(s)}, ξ | θ^{(s)})$ and $δ^{(s + 1)} = \underset{δ}{argmax}\, Q_{1, λ} (δ, ξ^{(s + 1)} | θ^{(s)})$. Because of the Taylor series approximation of Q₁ and the LLA of ϕ_{λ_j}, θ^(s+1) = (δ^(s+1), ξ^(s+1)) is not necessarily the maximizer of Q_λ(θ|θ^(s)). However, following the Expectation Conditional Maximization (ECM) algorithm (Meng and Rubin, 1993), it suffices to find a θ^(s+1) such that Q_λ(θ^(s+1)|θ^(s)) ≥ Q_λ(θ^(s)|θ^(s)) rather than to maximize Q_λ(θ|θ^(s)) directly. This process is iterated until convergence, and the value at convergence, denoted by θ̂_λ, is taken as the maximizer of the penalized observed-data log-likelihood function.

2.3 Penalty Parameter Selection Procedure

To ensure that θ̂_λ has good properties, the penalty parameter λ must be selected appropriately. Two commonly used criteria for selecting the penalty parameter are generalized cross-validation (GCV) and BIC (Wang et al., 2007). These criteria cannot easily be computed in the presence of random effects, because they depend on observed-data quantities whose expressions may involve intractable integrals. Moreover, Wang et al. (2007) showed that even in the simple linear model, GCV can lead to substantial overfitting.

We propose two methods to select the penalty parameter: an IC_Q criterion and a random effects penalty selection method. The IC_Q criterion (Ibrahim, Zhu, and Tang, 2008) selects the optimal λ by minimizing

{IC}_{Q} (λ) = - 2 Q ({\hat{θ}}_{λ} | {\hat{θ}}_{0}) + c_{n} ({\hat{θ}}_{λ}),

where ${\hat{θ}}_{0} = \underset{θ}{argmax} ℓ (θ)$ is the unpenalized maximum likelihood estimate and c_n(θ) is a function of the data and the fitted model. For instance, if c_n equals twice the total number of parameters, then we obtain an AIC-type criterion; alternatively, we obtain a BIC-type criterion when c_n(θ) = dim(θ) × log n. Moreover, in the absence of random effects, IC_Q(λ) reduces to the usual AIC or BIC criteria. As in the EM algorithm, we can draw a set of samples from f(b_i|d_i,o; θ̂₀) for i = 1,…, n in order to estimate Q(θ̂_λ|θ̂₀) for any λ.
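Once Q(θ̂_λ|θ̂₀) has been estimated for each candidate λ, selecting λ reduces to a minimization over a grid. The estimate can be formed as in (8), reusing one set of draws from f(b_i|d_{i,o}; θ̂₀). The minimal sketch below assumes each candidate is supplied as a hypothetical triple (λ, Q value, number of parameters in θ̂_λ). The numbers in the toy grid are made up and are not results from the paper.

```python
import numpy as np

def ic_q(q_value, dim_theta, n, kind="bic"):
    """IC_Q(lambda) = -2 Q(theta_hat_lambda | theta_hat_0) + c_n(theta_hat_lambda)."""
    if kind == "bic":
        c_n = dim_theta * np.log(n)       # BIC-type: dim(theta) log n
    elif kind == "aic":
        c_n = 2.0 * dim_theta             # AIC-type: 2 dim(theta)
    else:
        raise ValueError(f"unknown kind: {kind}")
    return -2.0 * q_value + c_n

def select_lambda(candidates, n, kind="bic"):
    """candidates: iterable of (lam, q_value, dim_theta); returns the lam minimizing IC_Q."""
    best_lam, best_score = None, np.inf
    for lam, q_value, dim_theta in candidates:
        score = ic_q(q_value, dim_theta, n, kind)
        if score < best_score:
            best_lam, best_score = lam, score
    return best_lam

# Hypothetical inputs for illustration only (not results from the paper).
grid = [(0.1, -100.0, 10), (0.5, -101.0, 7), (1.0, -150.0, 4)]
print(select_lambda(grid, n=100))
```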

The random effects penalty estimator is calculated under the assumption that δ is distributed as a random effect vector in a hierarchical model. The quantity λ can be regarded as a hyperparameter vector in the distribution of δ, denoted by f(δ|λ, n). Then, λ can be estimated by maximizing the marginal likelihood with respect to (ξ, λ), which is given by

\int \left\{ \prod_{i = 1}^{n} \int f (y_{i} | X_{i}, b_{i}, δ; ξ) f (b_{i}) d b_{i} \right\} f (δ | λ, n) d δ = \int \prod_{i = 1}^{n} f (d_{i, o} | δ; ξ) f (δ | λ, n) d δ,

(11)

where f(δ|λ, n) is defined by

f (δ | λ, n) = \prod_{j = 1}^{p} exp {- n ϕ_{λ_{j}} (| β_{j} |)} \prod_{k = 1}^{q} exp {- n ϕ_{λ_{p + k}} (‖ γ_{k} ‖)} / {C (λ, n)},

and C(λ, n) is the normalizing constant of f(δ|λ, n). The resulting estimate of λ, denoted by λ̂_RE, from the maximization of (11), is the random effects penalty estimator. Treating δ as missing data, the Monte Carlo EM algorithm can be used to maximize (11) with respect to (ξ, λ).

We consider the SCAD and ALASSO penalty functions. The ALASSO penalty is defined by

ϕ_{λ_{j}} (| β_{j} |) = λ_{j} | β_{j} | for j = 1, \dots, p, ϕ_{λ_{p + k}} (‖ γ_{k} ‖) = λ_{p + k} ‖ γ_{k} ‖ for k = 1, \dots, q .

Typical choices are λ_j = λ₀₁|β̂_j|⁻¹ and $λ_{p + k} = λ_{02} \sqrt{k} {‖ {\hat{γ}}_{k} ‖}^{- 1}$, where β̂_j and γ̂_k are the unpenalized maximum likelihood (ML) estimates. The multiplier $\sqrt{k}$ scales the penalty on γ_k to account for the differing sizes of the groups, since γ_k has k elements. When λ_j = λ₀₁ and $λ_{p + k} = λ_{02} \sqrt{k}$, the ALASSO reduces to the LASSO penalty.
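A small helper computes these penalty parameters from the unpenalized ML estimates β̂ and Γ̂, assuming both are available as arrays. The function name is hypothetical. With `adaptive=False` it returns the LASSO choices instead. An ML estimate that is exactly zero would give an infinite ALASSO weight.

```python
import numpy as np

def penalty_parameters(beta_hat, Gamma_hat, lam01, lam02, adaptive=True):
    """Return (lambda_1, ..., lambda_{p+q}).

    adaptive=True : ALASSO, lambda_j = lam01 / |beta_hat_j|, lambda_{p+k} = lam02 sqrt(k) / ||gamma_hat_k||
    adaptive=False: LASSO,  lambda_j = lam01,               lambda_{p+k} = lam02 sqrt(k)
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    Gamma_hat = np.asarray(Gamma_hat, dtype=float)
    q = Gamma_hat.shape[0]
    k = np.arange(1, q + 1)
    lam_beta = np.full(beta_hat.size, float(lam01))
    lam_gamma = lam02 * np.sqrt(k)
    if adaptive:
        # ||gamma_hat_k|| uses the k lower-triangular entries of row k of Gamma_hat.
        gamma_norms = np.array([np.linalg.norm(Gamma_hat[i, : i + 1]) for i in range(q)])
        lam_beta = lam_beta / np.abs(beta_hat)
        lam_gamma = lam_gamma / gamma_norms
    return np.concatenate([lam_beta, lam_gamma])

print(penalty_parameters([2.0, -0.5], [[1.0, 0.0], [3.0, 4.0]], lam01=1.0, lam02=1.0))
```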

The SCAD penalty (Fan and Li, 2001) is a nonconcave function defined by ϕ_λ(0) = 0 and, for |β| > 0, $ϕ_{λ}^{'} (| β |) = λ 1 (| β | \leq λ) + \frac{(a λ - | β |)_{+}}{a - 1} 1 (| β | > λ)$, where t₊ denotes the positive part of t and a = 3.7. Because the SCAD penalty is constant for |β| > aλ, the integral of its negative exponential is not finite, i.e. $\int_{R^{k}} exp \{- n ϕ_{λ} (‖ γ_{k} ‖)\} d γ_{k} = \infty$; the expression exp{−nϕ_λ(‖γ_k‖)} is therefore defined on a bounded space to ensure that f(δ|λ, n) is a proper density. Since a closed-form expression for λ̂_RE is unavailable for both the ALASSO and SCAD penalties, we use the Newton-Raphson algorithm along with the ECM algorithm to compute λ̂_RE.
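The SCAD penalty itself follows from integrating the derivative above, starting at ϕ_λ(0) = 0. The short sketch below codes both the penalty and its derivative with a = 3.7. It shows that the penalty is flat beyond aλ, which is why exp{−nϕ_λ} cannot be integrated over an unbounded space.

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty phi_lam(|t|), obtained by integrating phi'_lam from 0 (Fan and Li, 2001)."""
    t = np.abs(np.asarray(t, dtype=float))
    middle = -(t**2 - 2.0 * a * lam * t + lam**2) / (2.0 * (a - 1.0))
    return np.where(t <= lam, lam * t,
                    np.where(t <= a * lam, middle, (a + 1.0) * lam**2 / 2.0))

def scad_derivative(t, lam, a=3.7):
    """phi'_lam(|t|) = lam 1(|t| <= lam) + (a lam - |t|)_+ / (a - 1) 1(|t| > lam)."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1.0))

lam = 1.0
grid = np.array([0.5, 1.0, 2.0, 3.7, 10.0, 100.0])
print(scad_penalty(grid, lam))      # constant for |t| > a * lam
print(scad_derivative(grid, lam))   # zero for |t| > a * lam
```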

3. Theoretical Results

In this section, we establish the asymptotic theory of the MPL estimator and the consistency of the penalty selection procedure based on IC_Q. Suppose $β = {(β_{(1)}^{T}, β_{(2)}^{T})}^{T}$, where β₍₁₎ and β₍₂₎ are, respectively, p₁ × 1 and (p − p₁) × 1 subvectors. Let $β^{*} = {(β_{(1)}^{* T}, β_{(2)}^{* T})}^{T}$ denote the true value of β. Without loss of generality, we assume that $β_{(2)}^{*} = 0$ and that all components of $β_{(1)}^{*}$ are nonzero. Similarly, let $γ = {(γ_{1}^{T}, \dots, γ_{q}^{T})}^{T} = {(γ_{(1)}^{T}, γ_{(2)}^{T})}^{T}$, where $γ_{(1)} = {(γ_{1}^{T}, \dots, γ_{q_{1}}^{T})}^{T}$ and $γ_{(2)} = {(γ_{q_{1} + 1}^{T}, \dots, γ_{q}^{T})}^{T}$ are, respectively, q₁(q₁ + 1)/2 × 1 and {q(q + 1)/2 − q₁(q₁ + 1)/2} × 1 subvectors. Let $γ^{*} = {(γ_{(1)}^{* T}, γ_{(2)}^{* T})}^{T}$ denote the true value of γ. Without loss of generality, we assume that $γ_{(2)}^{*} = 0$ and that at least one component of each $γ_{k}^{*}$ is nonzero for k = 1,…, q₁.

Let 𝒮 = {j₁₁,…, j_{1d₁}; j₂₁,…, j_{2d₂}} be a candidate model containing the j₁₁-th,…, j_{1d₁}-th columns of X and the j₂₁-th,…, j_{2d₂}-th columns of Z. Thus, 𝒮_F = {1,…, p; 1,…, q} and 𝒮_T = {1,…, p₁; 1,…, q₁} denote the full and true covariate models, respectively. If 𝒮 misses at least one important covariate, that is, 𝒮 ⊅ 𝒮_T, then 𝒮 is referred to as an underfitted model; if 𝒮 ⊃ 𝒮_T and 𝒮 ≠ 𝒮_T, then 𝒮 is an overfitted model. The unpenalized and penalized ML estimators of θ = (β^T, γ^T, ξ^T)^T, denoted by θ̂_𝒮 and θ̂_λ, respectively, are defined as

{\hat{θ}}_{𝒮} = \underset{θ : β_{j} = 0 \ \forall j \notin 𝒮, \ γ_{k} = 0 \ \forall k \notin 𝒮}{argmax} ℓ (θ) \quad \text{and} \quad {\hat{θ}}_{λ} = \underset{θ}{argmax} \{ℓ (θ) - n \sum_{j = 1}^{p} ϕ_{λ_{j}} (| β_{j} |) - n \sum_{k = 1}^{q} ϕ_{λ_{p + k}} (‖ γ_{k} ‖)\},

and, in particular, θ̂_{𝒮_F} = θ̂₀. We obtain the following theorems, whose assumptions and proofs can be found in the web-based supplementary document.

THEOREM 1

Under assumptions (C1)–(C7) in the supplementary document, we have

(a) θ̂_λ − θ* = O_p(n^{−1/2}) as n → ∞, where θ* is the true value of θ;
(b) Sparsity: P(β̂_(2)λ = 0, γ̂_(2)λ = 0) → 1;
(c) Asymptotic normality: $\sqrt{n} \{{({\hat{β}}_{(1) λ}^{T}, {\hat{γ}}_{(1) λ}^{T}, {\hat{ξ}}_{λ}^{T})}^{T} - {(β_{(1)}^{* T}, γ_{(1)}^{* T}, ξ^{* T})}^{T}\}$ is asymptotically normal with mean and covariance matrix defined in the supplementary document.

Theorem 1 states that, with an appropriately chosen penalty λ, there exists a root-n consistent estimator θ̂_λ of θ, and that this estimator possesses the sparsity property, i.e. β̂_(2)λ = 0 and γ̂_(2)λ = 0 with probability tending to one. Moreover, ${({\hat{β}}_{(1) λ}^{T}, {\hat{γ}}_{(1) λ}^{T}, {\hat{ξ}}_{λ}^{T})}^{T}$ is asymptotically normal.

We now investigate whether the IC_Q(λ) criterion consistently selects the correct model. For each λ ∈ R^{p+}, (β̂_λ, γ̂_λ) naturally defines a candidate model 𝒮_λ = {j : β̂_{λ,j} ≠ 0; k : ‖γ̂_{λ,k}‖ ≠ 0}. In general, 𝒮_λ can be underfitted, overfitted, or true. Therefore, R^{p+} can be partitioned into three mutually exclusive regions: $R_{u}^{p +} = \{λ \in R^{p +} : 𝒮_{λ} ⊅ 𝒮_{T}\}$, $R_{t}^{p +} = \{λ \in R^{p +} : 𝒮_{λ} = 𝒮_{T}\}$, and $R_{o}^{p +} = \{λ \in R^{p +} : 𝒮_{λ} ⊃ 𝒮_{T}, 𝒮_{λ} \neq 𝒮_{T}\}$. Furthermore, if we choose a reference penalty parameter sequence $\{λ_{n} \in R^{p +}\}_{n = 1}^{\infty}$ that satisfies the conditions of Theorem 1, then 𝒮_{λ_n} = 𝒮_T with probability tending to one.

To select λ we first calculate

{dIC}_{Q} (λ_{2}, λ_{1}) = {IC}_{Q} (λ_{2}) - {IC}_{Q} (λ_{1}) = - 2 Q ({\hat{θ}}_{λ_{2}} | {\hat{θ}}_{0}) + c_{n} ({\hat{θ}}_{λ_{2}}) + 2 Q ({\hat{θ}}_{λ_{1}} | {\hat{θ}}_{0}) - c_{n} ({\hat{θ}}_{λ_{1}})

for any two penalty values λ₁ and λ₂. Assuming 𝒮_λ₂ ⊃ 𝒮_λ₁, we choose the smaller model 𝒮_λ₁ if dIC_Q(λ₂, λ₁) ≥ 0 and the larger model 𝒮_λ₂ otherwise.

Define $δ_{Q} (λ_{2}, λ_{1}) = E \{Q (θ_{𝒮_{λ_{1}}}^{*} | θ^{*})\} - E \{Q (θ_{𝒮_{λ_{2}}}^{*} | θ^{*})\}$ and δ_c(λ₂, λ₁) = c_n(θ̂_λ₂) − c_n(θ̂_λ₁), where $θ_{𝒮}^{*}$ is defined in the supplementary document.

THEOREM 2

Under assumptions (C1)–(C7) in the supplementary document, we have the following results.

(a) If, for all λ with 𝒮_λ ⊅ 𝒮_T, $\underset{n}{lim inf}\, δ_{Q} (λ, 0) / n > 0$ and δ_c(λ, 0) = o_p(n), then dIC_Q(λ, 0) > 0 in probability.
(b) If $E \{Q (θ_{𝒮_{λ_{1}}}^{*} | {\hat{θ}}_{0})\} - E \{Q (θ_{𝒮_{λ_{2}}}^{*} | {\hat{θ}}_{0})\} = O_{p} (n^{1 / 2})$ and $Q ({\hat{θ}}_{λ_{t}} | {\hat{θ}}_{0}) - E \{Q (θ_{𝒮_{λ_{t}}}^{*} | {\hat{θ}}_{0})\} = O_{p} (n^{1 / 2})$ for t = 1, 2, then dIC_Q(λ₂, λ₁) > 0 in probability, provided that n^{−1/2}δ_c(λ₂, λ₁) converges to ∞ in probability.
(c) If Q(θ̂_λ₁|θ̂₀) − Q(θ̂_λ₂|θ̂₀) = O_p(1), then dIC_Q(λ₂, λ₁) > 0 in probability, provided that δ_c(λ₂, λ₁) converges to ∞ in probability.

Theorem 2 has some important implications. Theorem 2(a) shows that IC_Q(λ) selects all significant covariates with probability tending to one. Because λ = 0 gives the unpenalized full model 𝒮_0 = 𝒮_F, we have $0 \in R_{t}^{p +} \cup R_{o}^{p +}$. Since dIC_Q(λ, 0) > 0 in probability for every λ with 𝒮_λ ⊅ 𝒮_T, minimizing IC_Q(λ) will not select such a λ. Generally, the most commonly used c_n(θ), such as 2dim(θ), dim(θ) log(n), and K log log(n) (K > 0), satisfy the condition δ_c(λ, 0) = o_p(n). The condition $\underset{n}{lim inf}\, n^{- 1} δ_{Q} (λ, 0) > 0$ ensures that IC_Q(λ) chooses a model with large $E \{Q (θ_{𝒮}^{*} | θ^{*})\}$. This condition is analogous to condition 2 of Wang et al. (2007), which characterizes the effect of underfitted models. The term $n^{- 1} E \{Q (θ^{*} | θ^{*})\} - n^{- 1} E \{Q (θ_{𝒮}^{*} | θ^{*})\}$ can be written as

n^{- 1} ℓ (θ^{*}) - n^{- 1} ℓ (θ_{S}^{*}) + n^{- 1} E {H (θ^{*} | θ^{*})} - n^{- 1} E {H (θ_{S}^{*} | θ^{*})},

(12)

where

H (θ_{1} | θ_{2}) = \sum_{i = 1}^{n} \int log \{f (b_{i} | d_{o, i}; θ_{1})\} f (b_{i} | d_{o, i}; θ_{2}) d b_{i} .

(13)

By Jensen’s inequality, the difference between the third and fourth terms of (12) is nonnegative, and the difference between the first and second terms must be nonnegative for large n. Thus, lim inf_n n⁻¹δ_Q(λ, 0) ≥ 0 in probability.
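Written out with definition (13), the Jensen step for the third and fourth terms of (12) becomes a sum of Kullback-Leibler divergences between conditional densities of the random effects. For densities h and g, ∫ log(g/h) h ≤ log ∫ g = 0, so each divergence is nonnegative:

```latex
\begin{aligned}
E\{H(θ^{*}|θ^{*})\} - E\{H(θ_{S}^{*}|θ^{*})\}
  &= E\Big[\sum_{i=1}^{n} \int \log\frac{f(b_i \mid d_{o,i}; θ^{*})}{f(b_i \mid d_{o,i}; θ_{S}^{*})}\,
     f(b_i \mid d_{o,i}; θ^{*})\, db_i\Big] \\
  &= E\Big[\sum_{i=1}^{n} \mathrm{KL}\{f(\cdot \mid d_{o,i}; θ^{*}) \,\|\, f(\cdot \mid d_{o,i}; θ_{S}^{*})\}\Big]
   \;\ge\; 0 .
\end{aligned}
```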

If λ₁ and λ₂ have asymptotically the same average $n^{- 1} E \{Q (θ_{𝒮_{λ}}^{*} | θ^{*})\}$, that is, lim inf_n n⁻¹δ_Q(λ₂, λ₁) = 0, then Theorem 2(b) and (c) indicate that IC_Q(λ) picks out the smaller model 𝒮_λ₁ when δ_c(λ₂, λ₁) increases to ∞ at a sufficient rate (e.g., log(n)). For example, for the BIC-type criterion, δ_c(λ₂, λ₁) = {dim(θ̂_{𝒮_λ₂}) − dim(θ̂_{𝒮_λ₁})} log(n) ≥ log(n), since we assume 𝒮_λ₂ ⊃ 𝒮_λ₁. The AIC-type criterion, for which c_n(θ) = 2 × dim(θ), does not satisfy this condition, because δ_c(λ₂, λ₁) = 2{dim(θ̂_{𝒮_λ₂}) − dim(θ̂_{𝒮_λ₁})} does not grow with n. Thus, similar to the AIC criterion without random effects, IC_Q(λ) with c_n(θ) = 2 × dim(θ) tends to overfit.
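The contrast between the two choices of c_n comes down to simple arithmetic on δ_c(λ₂, λ₁). The sketch assumes the larger model has one more parameter than the smaller one; the dimensions 12 and 11 are hypothetical. The BIC-type difference grows like log n, whereas the AIC-type difference stays at 2.

```python
import math

def delta_c(dim_large, dim_small, n, kind):
    """delta_c(lambda_2, lambda_1) = c_n(theta_hat_lambda_2) - c_n(theta_hat_lambda_1)."""
    extra = dim_large - dim_small
    if kind == "aic":
        return 2.0 * extra                 # c_n = 2 dim(theta)
    if kind == "bic":
        return extra * math.log(n)         # c_n = dim(theta) log(n)
    raise ValueError(f"unknown kind: {kind}")

for n in (50, 100, 200, 10_000):
    print(n, delta_c(12, 11, n, "aic"), round(delta_c(12, 11, n, "bic"), 2))
```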

4. Simulation Study

We use simulations to examine the finite-sample performance of the maximum penalized likelihood estimates obtained with our proposed penalty estimators and to compare them with the unpenalized ML estimate. Our objectives are (1) to compare the random effects and IC_Q penalty estimators and (2) to compare the SCAD, LASSO, and ALASSO penalty functions.

To do this, we simulated a data set consisting of n independent observations according to the model y_i = X_iβ+Z_iΓb_i+σε_i, i = 1,…, n, where b_i and ε_i are independent and standard multivariate normal random vectors, and β = (3, 2, 1.5, 0, 0, 0, 0, 0)^T. Moreover, ΓΓ^T = D is a 3 × 3 matrix, such that the (r, s) element of D is ρ^|r−s|. The matrix X_i is a 12 × 8 matrix of independent rows, where each row of X_i has mean zero and covariance matrix Σ_xx whose (r, s) element is ρ^|r−s|. The matrix Z_i was set equal to X_i.
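The simulation design can be sketched as follows. This is not the authors' code, and it rests on two assumptions. First, the rows of X_i are taken to be Gaussian, since the text gives only their mean and covariance. Second, the text states Z_i = X_i, but Γ is 3 × 3, so the sketch takes Z_i to be the first three columns of X_i. Following the next paragraph, one design is drawn and the responses are regenerated 100 times.

```python
import numpy as np

def ar1_cov(dim, rho):
    """Matrix whose (r, s) element is rho^|r - s|."""
    idx = np.arange(dim)
    return rho ** np.abs(np.subtract.outer(idx, idx))

def simulate_design(n, rng, p=8, n_i=12, rho=0.5):
    """n design matrices X_i (n_i x p) with independent N(0, Sigma_xx) rows (normality assumed)."""
    Sigma_xx = ar1_cov(p, rho)
    return [rng.multivariate_normal(np.zeros(p), Sigma_xx, size=n_i) for _ in range(n)]

def simulate_responses(X_list, beta, Gamma, sigma, rng):
    """y_i = X_i beta + Z_i Gamma b_i + sigma eps_i, with Z_i = first q columns of X_i (assumption)."""
    q = Gamma.shape[0]
    ys = []
    for X in X_list:
        Z = X[:, :q]
        b = rng.standard_normal(q)
        eps = rng.standard_normal(X.shape[0])
        ys.append(X @ beta + Z @ Gamma @ b + sigma * eps)
    return ys

rng = np.random.default_rng(2010)
beta = np.array([3, 2, 1.5, 0, 0, 0, 0, 0], dtype=float)
Gamma = np.linalg.cholesky(ar1_cov(3, 0.5))      # D = Gamma Gamma^T
X_list = simulate_design(n=50, rng=rng)          # one fixed design per setting
datasets = [simulate_responses(X_list, beta, Gamma, sigma=3.0, rng=rng) for _ in range(100)]
```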

We considered six different settings: (n = 50, σ = 3), (n = 50, σ = 1), (n = 100, σ = 3), (n = 100, σ = 1), (n = 200, σ = 3), and (n = 200, σ = 1), with ρ = 0.5 in all settings. For each setting, one design matrix was simulated and 100 data sets (y_i, X_i), i = 1,…, n, were generated.

For each simulated data set, the maximum penalized likelihood (MPL) estimates using the SCAD, LASSO, and ALASSO penalties were computed with both the random effects and the IC_Q penalty estimates. These estimates are denoted by SCAD-RE, SCAD-IC_Q, LASSO-RE, LASSO-IC_Q, ALASSO-RE, and ALASSO-IC_Q. For the IC_Q estimate, the BIC-type criterion, c_n(θ) = dim(θ) log n, was used. For the Monte Carlo EM algorithm, 2000 Monte Carlo iterations were used within each EM iteration. For the SCAD and LASSO penalties, we set λ_j = λ₀₁ for j = 1,…, 8 and $λ_{8 + k} = λ_{02} \sqrt{k}$ for k = 1,…, 3, while for the ALASSO penalty, λ_j = λ₀₁|β̂_j|⁻¹ for j = 1,…, 8 and $λ_{8 + k} = λ_{02} \sqrt{k} {‖ {\hat{γ}}_{k} ‖}^{- 1}$ for k = 1,…, 3, where β̂_j and γ̂_k are the unpenalized ML estimates of β_j and γ_k, respectively, and the penalty (λ₀₁, λ₀₂) was estimated using the IC_Q and random effects penalty selection methods.

For each estimate, the penalized estimates of β and D, denoted by β̂_λ and D̂_λ, respectively, were computed, along with the model error ME(β̂_λ) = (β̂_λ − β)^TΣ_xx(β̂_λ − β) and the quadratic loss error ME(D̂_λ) = [trace{(D̂_λ − D)²}]^{1/2}. The ratios of the model error of the MPL estimate to that of the unpenalized ML estimate, ME(β̂_λ)/ME(β̂₀) and ME(D̂_λ)/ME(D̂₀), were computed for each data set, and the median of these ratios over the 100 simulated data sets, denoted by MRME, was calculated. The MRME of the true model is also reported. In addition, we report two types of errors regarding the fixed and random effects. ZERO₁ is the mean number of type I errors (an effect that is truly not significant or not random is estimated by the MPL as significant or random), and ZERO₂ is the mean number of type II errors (an effect that is truly significant or random is estimated by the MPL as not significant or not random).
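The summary measures above can be computed with a few helpers. The function names are hypothetical. ZERO₁ and ZERO₂ would be the averages over data sets of the two counts returned by `selection_errors`, where each input is a boolean vector over the candidate fixed and random effects.

```python
import numpy as np

def model_error_beta(beta_hat, beta, Sigma_xx):
    """ME(beta_hat) = (beta_hat - beta)^T Sigma_xx (beta_hat - beta)."""
    d = np.asarray(beta_hat, dtype=float) - np.asarray(beta, dtype=float)
    return float(d @ Sigma_xx @ d)

def model_error_D(D_hat, D):
    """ME(D_hat) = [trace{(D_hat - D)^2}]^{1/2}, the Frobenius norm for symmetric matrices."""
    E = np.asarray(D_hat, dtype=float) - np.asarray(D, dtype=float)
    return float(np.sqrt(np.trace(E @ E)))

def mrme(me_mpl, me_ml):
    """Median over data sets of ME(MPL estimate) / ME(ML estimate)."""
    return float(np.median(np.asarray(me_mpl, dtype=float) / np.asarray(me_ml, dtype=float)))

def selection_errors(selected, truth):
    """(type I, type II) counts; True marks a selected / truly significant or random effect."""
    selected = np.asarray(selected, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    return int(np.sum(selected & ~truth)), int(np.sum(~selected & truth))
```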

For the MPL estimates, MRME values greater than one indicate that the estimate performs worse than the ML estimate, values near one indicate that it performs as well as the ML estimate, and values near the ‘true’ MRME value indicate optimal performance. The values ZERO₁ and ZERO₂ can be interpreted as estimates of the probability of overfitting and underfitting, respectively, and 1 − ZERO₁ − ZERO₂ is an estimate of the probability of selecting the true model. Ideally, MPL estimates should have small ZERO₁, ZERO₂, and MRME values. Overall, the MRME values of all of the MPL estimates were less than or equal to one, which indicates that, regardless of the sample size or noise level, the MPL estimates perform at least as well as the ML estimate. Across all sample sizes and noise levels, the MRME values of the MPL estimates using the random effects penalty estimates were higher than those of the MPL estimates using the IC_Q penalty estimates. For the IC_Q MPL estimates, the MRME values increase as the noise level decreases from σ = 3 to σ = 1. For a fixed noise level, the MRME values at sample sizes of n = 50 and n = 200 are comparable, with a slight decrease at n = 100. This indicates that the MPL estimates perform better, relative to the MLE, at low noise levels and near sample sizes of n = 100. The MPL estimates using the random effects penalty estimate tended to overfit substantially. On average, the MPL estimate using the ALASSO penalty function had smaller estimation error and less overfitting than the LASSO estimate. For estimating fixed effects, the SCAD-IC_Q estimate has, on average, smaller estimation error and less overfitting than the other estimates. For estimating the random effects, the ALASSO-IC_Q estimate has smaller error and less overfitting.

5. Yale Infant Growth Study

We applied the proposed methodology to the Yale infant growth study of Wasserman and Leventhal (1993) and Stier et al. (1993). The Yale infant growth data were collected to study whether cocaine exposure during pregnancy leads to the maltreatment of infants after birth, such as physical and sexual abuse. A total of 298 children were recruited from two subject groups (a cocaine exposure group and an unexposed group). Children differed in the number and pattern of their visits during the study period. The multivariate response was the weight of the infant at each visit. Let y_ij denote the weight (in pounds) at the j-th visit of infant i, for i = 1,…, 298, j = 1,…, n_i, and let y_i = (y_i1,…, y_{in_i})^T. The covariates used were: x_ij1 = day of visit, x_ij2 = age (in years) of mother, x_ij3 = gestational age (in weeks) of infant, x_ij4 = race (2 levels: African American and other, coded as 1 and 0), x_ij5 = previous pregnancies (2 levels: no and yes, coded as 1 and 0), x_ij6 = gender of infant (2 levels: male and female, coded as 1 and 0), and x_ij7 = cocaine exposure (2 levels: yes and no, coded as 1 and 0). The design matrix X_i is an n_i × 8 matrix whose j-th row is (1, x_ij1, x_ij2, x_ij3, x_ij4, x_ij5, x_ij6, x_ij7), and Z_i is an n_i × 3 matrix composed of the first three (continuous) covariates of X_i, i.e., the j-th row of Z_i is (x_ij1, x_ij2, x_ij3); therefore q = 3. All covariates were centered for numerical stability. Further, we assume that [y_i|X_i, b_i; β, D] is normally distributed with mean E(y_i|X_i, b_i) = X_iβ + Z_iΓb_i, where ΓΓ^T = D, and that y_ij and y_ij′ are conditionally independent given (X_i, b_i) for j ≠ j′.
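The sketch below, which is not the authors' code, shows how the X_i and Z_i described above could be assembled. It makes three assumptions. Each child's covariates are stored as an n_i × 7 array in the order listed. Centering uses means pooled over all visits, since the text does not say which means were used. The binary covariates are centered along with the continuous ones, following the statement that all covariates were centered.

```python
import numpy as np

def build_design(covariates_by_child):
    """Assemble X_i (n_i x 8, intercept first) and Z_i (n_i x 3) for each child.

    covariates_by_child: list of (n_i x 7) arrays with columns in the order
    (day of visit, mother's age, gestational age, race, previous pregnancies,
    gender, cocaine exposure). Column layout and pooled centering are assumptions.
    """
    pooled = np.vstack(covariates_by_child)
    means = pooled.mean(axis=0)               # centering over all visits (assumption)
    X_list, Z_list = [], []
    for C in covariates_by_child:
        Cc = np.asarray(C, dtype=float) - means
        X = np.column_stack([np.ones(Cc.shape[0]), Cc])
        X_list.append(X)
        Z_list.append(X[:, 1:4])              # day of visit, mother's age, gestational age
    return X_list, Z_list
```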

The objective of this analysis was to identify the significant predictors of infant weight and the significant random effects. Because the ALASSO penalty outperformed the LASSO penalty in the simulations, only the SCAD and ALASSO penalty functions were used, each combined with the IC_Q and random effects penalty estimates. The intercept was not penalized. For the SCAD penalty, λ_j = λ₀₁ for j = 2, …, 8 and λ_{8+k} = λ₀₂√k for k = 1, …, 3, while for the ALASSO penalty, λ_j = λ₀₁|β̂_j|⁻¹ for j = 2, …, 8 and λ_{8+k} = λ₀₂√k ‖γ̂_k‖⁻¹ for k = 1, …, 3, where β̂_j and γ̂_k are the unpenalized ML estimates of β_j and γ_k, respectively, and (λ₀₁, λ₀₂) was selected using the IC_Q and random effects penalty selection methods.
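The penalty weights just described can be sketched in Python. The SCAD function below is the standard form of Fan and Li (2001) with a = 3.7. Treating γ_k as the k-th row of the lower-triangular Cholesky factor Γ (so that ‖γ̂_k‖ = √D̂_kk, because D = ΓΓᵀ) and choosing λ₀₁ = λ₀₂ = 0.1 are assumptions for illustration, and the helper names are hypothetical. The ML estimates plugged in are the rounded values from the MLE column of Table 2.

```python
import numpy as np

def scad(theta, lam, a=3.7):
    """SCAD penalty p_lam(|theta|) of Fan and Li (2001); a = 3.7 is their suggested value."""
    t = abs(theta)
    if t <= lam:
        return lam * t                                              # lasso-like near zero
    if t <= a * lam:
        return -(t**2 - 2 * a * lam * t + lam**2) / (2 * (a - 1))  # quadratic transition
    return (a + 1) * lam**2 / 2                                     # constant: no extra shrinkage

def scad_lambdas(lam01, lam02, p=8, q=3):
    """SCAD weights: lambda_j = lam01 (j = 2..p), lambda_{p+k} = lam02 * sqrt(k) (k = 1..q)."""
    k = np.arange(1, q + 1)
    return np.full(p - 1, lam01), lam02 * np.sqrt(k)

def alasso_lambdas(lam01, lam02, beta_ml, gamma_norm_ml):
    """ALASSO weights: lam01 / |beta_ml_j| (j = 2..p), lam02 * sqrt(k) / ||gamma_ml_k||."""
    beta_ml = np.asarray(beta_ml, dtype=float)
    gamma_norm_ml = np.asarray(gamma_norm_ml, dtype=float)
    k = np.arange(1, len(gamma_norm_ml) + 1)
    return lam01 / np.abs(beta_ml[1:]), lam02 * np.sqrt(k) / gamma_norm_ml

def scad_total(beta, gamma_norms, lam_fixed, lam_random):
    """Sum of SCAD penalties on beta_2..beta_p and on the row norms ||gamma_k||."""
    fixed_part = sum(scad(b, l) for b, l in zip(beta[1:], lam_fixed))
    random_part = sum(scad(g, l) for g, l in zip(gamma_norms, lam_random))
    return fixed_part + random_part

# Rounded ML estimates from the MLE column of Table 2.
beta_ml = [7.002, 2.641, -0.035, 0.528, -0.060, -0.004, 0.139, 0.103]
d_ml = np.array([0.230, 0.017, 0.017])   # diag(D) for visit, age, gestation
gamma_norm_ml = np.sqrt(d_ml)            # D = Gamma Gamma^T, so D_kk = ||gamma_k||^2

# Hypothetical tuning values lam01 = lam02 = 0.1.
lam_f, lam_r = alasso_lambdas(0.1, 0.1, beta_ml, gamma_norm_ml)
print(np.round(lam_f, 3))
print(np.round(lam_r, 3))
```

The ALASSO weights illustrate the adaptive idea: a coefficient with a tiny unpenalized estimate, such as previous pregnancies (−0.004), receives a very large λ_j and is pushed toward zero, while a large one such as visit is barely penalized. The SCAD penalty instead becomes flat beyond aλ, so large coefficients are not shrunk further.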

The results of the analysis are presented in Table 2. The MPL estimates using the SCAD penalty identify visit, gestational age of infant, gender of infant, and cocaine exposure as significant predictors of infant weight, and visit as a significant random effect. These selections coincide with the maximum likelihood analysis, which identifies the same fixed and random effects as significant (effects significant in the ML analysis are marked with * in Table 2). The SCAD results with the two sets of penalty estimates are similar. Although SCAD with the IC_Q penalty estimates does not shrink the random-effect variances for age and gestational age to 0, these variance estimates (0.007 and 0.011) are much smaller than that of the visit random effect (0.109), so visit is still identified as the important random effect. The MPL estimates using the ALASSO penalty shrank two additional fixed-effect coefficients, gender and cocaine exposure, to zero. Although the ML analysis identifies these two effects as significant, their ML estimates (0.139 and 0.103) are small relative to those of the other significant fixed effects. The ALASSO estimates with the IC_Q penalty estimates are close to those with the RE penalty estimates. The MPL estimates using the ALASSO penalty identify visit and gestational age of infant as significant fixed effects, and visit as a significant random effect.
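The selection pattern described above can be read off Table 2 mechanically. The short sketch below transcribes the penalized estimates from the table, lists the nonzero fixed effects and random-effect variances for each method, and expresses each random-effect variance relative to that of visit. The dictionary layout and the helper name are illustrative only.

```python
# Penalized estimates transcribed from Table 2 (MLE column omitted).
fixed_names = ["Visit", "Age", "Gestation", "Race", "Pregnant", "Gender", "Cocaine"]
random_names = ["Visit", "Age", "Gestation"]

fixed_est = {
    "SCAD-RE":     [2.576, 0.000, 0.424, 0.000, 0.000, 0.022, 0.016],
    "SCAD-IC_Q":   [2.617, 0.000, 0.455, 0.000, 0.000, 0.033, 0.022],
    "ALASSO-RE":   [2.543, 0.000, 0.322, 0.000, 0.000, 0.000, 0.000],
    "ALASSO-IC_Q": [2.548, 0.000, 0.424, 0.000, 0.000, 0.000, 0.000],
}
random_var = {
    "SCAD-RE":     [0.087, 0.000, 0.000],
    "SCAD-IC_Q":   [0.109, 0.007, 0.011],
    "ALASSO-RE":   [0.040, 0.000, 0.000],
    "ALASSO-IC_Q": [0.067, 0.000, 0.000],
}

def nonzero(names, values):
    """Names whose estimate was not shrunk exactly to zero."""
    return [name for name, v in zip(names, values) if v != 0.0]

for method in fixed_est:
    var = random_var[method]
    relative = [round(v / var[0], 3) for v in var]   # variance relative to the visit effect
    print(method, nonzero(fixed_names, fixed_est[method]),
          nonzero(random_names, var), relative)
```

For SCAD-IC_Q, the age and gestational-age variances come out at roughly 6% and 10% of the visit variance, which is the sense in which they are "much smaller".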

Table 2.

Maximum penalized likelihood estimates for the Yale infant growth data comparing the SCAD and ALASSO penalty functions with random effects (RE) and IC_Q penalty estimates

Entries are fixed-effect estimates^a, with the variance estimates of the corresponding random effects^b in parentheses; (-) indicates not applicable.

Variable	MLE^c	SCAD-RE	SCAD-IC_Q	ALASSO-RE	ALASSO-IC_Q
Intercept	7.002* (-)	6.924 (-)	6.988 (-)	6.913 (-)	6.913 (-)
Visit	2.641* (0.230*)	2.576 (0.087)	2.617 (0.109)	2.543 (0.040)	2.548 (0.067)
Age	−0.035 (0.017)	0.000 (0.000)	0.000 (0.007)	0.000 (0.000)	0.000 (0.000)
Gestation	0.528* (0.017)	0.424 (0.000)	0.455 (0.011)	0.322 (0.000)	0.424 (0.000)
Race	−0.060 (-)	0.000 (-)	0.000 (-)	0.000 (-)	0.000 (-)
Pregnant	−0.004 (-)	0.000 (-)	0.000 (-)	0.000 (-)	0.000 (-)
Gender	0.139* (-)	0.022 (-)	0.033 (-)	0.000 (-)	0.000 (-)
Cocaine	0.103* (-)	0.016 (-)	0.022 (-)	0.000 (-)	0.000 (-)
σ²^d	0.512 (-)	0.552 (-)	0.527 (-)	0.612 (-)	0.594 (-)
IC_Q^e	9223.7	11507.32	9660.013	11999.01	11773.25


^a Estimate of β.

^b Estimate of diag(D).

^c * indicates effects that are significant in the ML analysis.

^d Variance estimate of the error term of the linear mixed model.

^e Measure of goodness of fit.

6. Discussion

We have proposed a general method that performs simultaneous fixed and random effects selection and estimation. Under certain regularity conditions and appropriate assumptions on the penalty parameters, the maximum penalized likelihood estimate possesses the oracle property. We used two methods for estimating the penalty parameters, the random effects and IC_Q penalty selection methods, and showed that, under an appropriate choice of c_n(θ), the IC_Q penalty estimate selects all of the significant fixed and random effects with probability tending to one. Because penalized likelihood methods can perform poorly in finite samples, we conducted simulations to examine the finite-sample properties of the maximum penalized likelihood estimators and the performance of the Monte Carlo EM algorithm. In the simulations, the SCAD and ALASSO penalty functions using the IC_Q penalty estimate performed best and had substantially smaller estimation error than the maximum likelihood estimate. In contrast to earlier applications of the random effects penalty estimate (Garcia, Ibrahim, and Zhu, 2010a, 2010b), both the simulations and the real data analysis show that, for mixed effects regression models, the random effects penalty estimate overfits substantially. For estimating fixed effects, the SCAD-IC_Q estimate had, on average, smaller estimation error and less overfit, while for estimating random effects, the ALASSO-IC_Q estimate had smaller error and less overfit.

Many aspects of this work warrant further investigation. Recent work has shown that there may be more than one plausible scheme for formulating the grouped penalty in the penalized likelihood (Zhao et al., 2009; Breheny and Huang, 2009). Selecting significant random effects under a Cholesky parametrization of the random-effects covariance matrix requires that each row of the Cholesky factor be penalized as a group. Other parameters, however, can be grouped and penalized in various ways. For instance, the parameters corresponding to the fixed effects can be grouped if one is interested in whether a particular group of fixed effects is significant. It is also possible to use a different penalty function for each group of parameters.
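The reason for penalizing row-wise can be checked numerically. Since D = ΓΓᵀ, the (k, l) entry of D is the inner product of rows k and l of Γ, so setting an entire row γ_k to zero removes the k-th variance together with all of its covariances, whereas zeroing a single element of a row leaves every variance positive and removes no random effect. The sketch below uses a made-up 3 × 3 lower-triangular Γ and assumes, as in the text, that the groups are the rows of the Cholesky factor.

```python
import numpy as np

# Hypothetical lower-triangular Cholesky factor for q = 3 random effects.
Gamma = np.array([[0.5, 0.0, 0.0],
                  [0.2, 0.3, 0.0],
                  [0.1, -0.4, 0.6]])
D_full = Gamma @ Gamma.T          # D_kl = inner product of rows k and l of Gamma

# Group-wise zeroing: set the entire row gamma_2 to zero.
Gamma_row = Gamma.copy()
Gamma_row[1, :] = 0.0
D_row = Gamma_row @ Gamma_row.T   # row 2 and column 2 of D vanish: random effect 2 is removed

# Element-wise zeroing: set a single entry of row 3 to zero.
Gamma_elem = Gamma.copy()
Gamma_elem[2, 1] = 0.0
D_elem = Gamma_elem @ Gamma_elem.T  # all variances stay positive: no random effect is removed

print(np.round(D_row, 3))
print(np.diag(D_elem))
```

Entries of D that do not involve the zeroed row are unchanged, so the row-wise group penalty removes exactly one random effect at a time.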

The objective of this paper was to perform simultaneous selection of fixed and random effects. To the best of our knowledge, this is the first paper to propose this type of methodology. In the existing literature (Gurka, 2006; Chen and Dunson, 2003; Daniels and Kass, 1999, 2001), the predominant approach to mixed effects selection has been to fix either the mean model or the covariance structure of the random effects and then either test variance components or perform variable selection on the mean model (Keselman et al., 1998). Because this approach fixes certain parts of the model, it makes assumptions about the model structure that may be inappropriate. One possible reason that simultaneous mixed effects selection has not been pursued before is the numerical complexity inherent in the model fitting algorithms. With penalized likelihood methods, however, simultaneous mixed effects selection is straightforward to implement, and neither the mean model nor the covariance structure of the random effects needs to be fixed in advance.

As it stands, computing the IC_Q penalty estimate is somewhat computationally demanding. An alternative to IC_Q penalty parameter estimation is to select the penalty parameters that optimize other criteria developed for mixed effects models, such as those in Claeskens and Consentino (2008) and Liang, Wu, and Zou (2008). We will formally study these issues in future work.

Supplementary Material

Supplementary material: NIHMS216083-supplement-Supp_material.pdf (178.4 KB, PDF)

Table 1.

Simulation results of linear mixed effects models comparing SCAD, LASSO, and ALASSO penalty functions with random effects (RE) and IC_Q penalty estimates

Entries refer to the β estimates, with the corresponding results for the D estimates in parentheses.

Model	Method	MRME	ZERO₁	ZERO₂
n = 50, σ = 3	SCAD-RE	0.576 (0.980)	0.11 (0.94)	0.00 (0.00)
	SCAD-IC_Q	0.552 (0.259)	0.01 (0.09)	0.00 (0.01)
	LASSO-RE	0.983 (0.988)	0.99 (1.00)	0.00 (0.00)
	LASSO-IC_Q	0.605 (0.241)	0.04 (0.10)	0.00 (0.01)
	ALASSO-RE	0.949 (0.983)	0.80 (1.00)	0.00 (0.00)
	ALASSO-IC_Q	0.597 (0.263)	0.01 (0.13)	0.00 (0.01)
	True	0.559 (0.228)	0.00 (0.00)	0.00 (0.00)
n = 50, σ = 1	SCAD-RE	0.906 (0.803)	0.58 (1.00)	0.00 (0.00)
	SCAD-IC_Q	0.869 (0.461)	0.03 (0.13)	0.00 (0.00)
	LASSO-RE	0.997 (0.996)	0.99 (1.00)	0.00 (0.00)
	LASSO-IC_Q	0.884 (0.438)	0.04 (0.08)	0.00 (0.00)
	ALASSO-RE	0.983 (0.989)	0.81 (1.00)	0.00 (0.00)
	ALASSO-IC_Q	0.858 (0.441)	0.03 (0.10)	0.00 (0.00)
	True	0.846 (0.439)	0.00 (0.00)	0.00 (0.00)
n = 100, σ = 3	SCAD-RE	0.571 (0.970)	0.13 (0.93)	0.00 (0.00)
	SCAD-IC_Q	0.565 (0.219)	0.01 (0.04)	0.00 (0.00)
	LASSO-RE	0.993 (0.994)	0.99 (1.00)	0.00 (0.00)
	LASSO-IC_Q	0.584 (0.232)	0.01 (0.04)	0.00 (0.00)
	ALASSO-RE	0.949 (0.987)	0.81 (1.00)	0.00 (0.00)
	ALASSO-IC_Q	0.574 (0.205)	0.01 (0.04)	0.00 (0.00)
	True	0.513 (0.196)	0.00 (0.00)	0.00 (0.00)
n = 100, σ = 1	SCAD-RE	0.895 (0.803)	0.57 (1.00)	0.00 (0.00)
	SCAD-IC_Q	0.820 (0.452)	0.01 (0.07)	0.00 (0.00)
	LASSO-RE	0.999 (0.997)	0.99 (1.00)	0.00 (0.00)
	LASSO-IC_Q	0.835 (0.478)	0.03 (0.08)	0.00 (0.00)
	ALASSO-RE	0.982 (0.989)	0.82 (1.00)	0.00 (0.00)
	ALASSO-IC_Q	0.839 (0.415)	0.02 (0.06)	0.00 (0.00)
	True	0.832 (0.392)	0.00 (0.00)	0.00 (0.00)
n = 200, σ = 3	SCAD-RE	0.553 (0.987)	0.13 (0.94)	0.00 (0.00)
	SCAD-IC_Q	0.554 (0.245)	0.01 (0.07)	0.00 (0.00)
	LASSO-RE	0.995 (0.996)	0.99 (1.00)	0.00 (0.00)
	LASSO-IC_Q	0.617 (0.244)	0.05 (0.09)	0.00 (0.00)
	ALASSO-RE	0.934 (0.992)	0.78 (1.00)	0.00 (0.00)
	ALASSO-IC_Q	0.603 (0.237)	0.02 (0.11)	0.00 (0.00)
	True	0.546 (0.218)	0.00 (0.00)	0.00 (0.00)
n = 200, σ = 1	SCAD-RE	0.902 (0.833)	0.55 (1.00)	0.00 (0.00)
	SCAD-IC_Q	0.853 (0.487)	0.01 (0.12)	0.00 (0.00)
	LASSO-RE	0.998 (0.998)	0.99 (1.00)	0.00 (0.00)
	LASSO-IC_Q	0.873 (0.554)	0.07 (0.20)	0.00 (0.00)
	ALASSO-RE	0.982 (0.991)	0.79 (1.00)	0.00 (0.00)
	ALASSO-IC_Q	0.871 (0.468)	0.02 (0.11)	0.00 (0.00)
	True	0.839 (0.408)	0.00 (0.00)	0.00 (0.00)


Acknowledgments

The authors wish to thank the editor, associate editor and two referees for helpful comments and suggestions, which have led to an improvement of this article. This research was partially supported by NSF grant BCS-08-26844 and NIH grants GM 70335, CA 74015, RR025747-01, MH086633, AG033387, and P01CA142538-01.

Footnotes

Supplementary Materials

Web-based supplementary document referenced in Section 3 is available under the Paper Information link at the Biometrics website http://www.biometrics.tibs.org.

References

Bondell HD, Krishna A, Ghosh SK. Joint variable selection for fixed and random effects in linear mixed-effects models. Biometrics. 2010, in press. doi: 10.1111/j.1541-0420.2010.01391.x.
Breheny P, Huang J. Penalized methods for bi-level variable selection. Statistics and Its Interface. 2009;2:369–380. doi: 10.4310/sii.2009.v2.n3.a10.
Cai J, Fan J, Li R, Zhou H. Variable selection for multivariate failure time data. Biometrika. 2005;92:303–316. doi: 10.1093/biomet/92.2.303.
Claeskens G, Consentino F. Variable selection with incomplete covariate data. Biometrics. 2008;64:1062–1096. doi: 10.1111/j.1541-0420.2008.01003.x.
Chen Z, Dunson D. Random effects selection in linear mixed models. Biometrics. 2003;59:762–769. doi: 10.1111/j.0006-341x.2003.00089.x.
Daniels MJ, Kass RE. Nonconjugate Bayesian estimation of covariance matrices and its use in hierarchical models. Journal of the American Statistical Association. 1999;94:1254–1263.
Daniels MJ, Kass RE. Shrinkage estimators for covariance matrices. Biometrics. 2001;57:1173–1184. doi: 10.1111/j.0006-341x.2001.01173.x.
Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association. 2001;96:1348–1360.
Fan J, Li R. Variable selection for Cox’s proportional hazards model and frailty model. Annals of Statistics. 2002;30:74–99.
Fan J, Li R. New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. Journal of the American Statistical Association. 2004;99:710–723.
Fu W. Penalized regressions: the bridge versus the lasso. Journal of Computational and Graphical Statistics. 1998;7:375–384.
Garcia RI, Ibrahim JG, Zhu H. Variable selection for regression models with missing data. Statistica Sinica. 2010a;20:149–165.
Garcia RI, Ibrahim JG, Zhu H. Variable selection in the Cox regression model with covariates missing at random. Biometrics. 2010b;66:97–104. doi: 10.1111/j.1541-0420.2009.01274.x.
Gurka MJ. Selecting the best linear mixed model under REML. The American Statistician. 2006;60:19–26.
Hunter DR, Li R. Variable selection using MM algorithms. Annals of Statistics. 2005;33:1617–1642. doi: 10.1214/009053605000000200.
Ibrahim JG. Incomplete data in generalized linear models. Journal of the American Statistical Association. 1990;85:765–769.
Ibrahim JG, Chen MH, Lipsitz SR. Monte Carlo EM for missing covariates in parametric regression models. Biometrics. 1999;55:591–596. doi: 10.1111/j.0006-341x.1999.00591.x.
Ibrahim JG, Lipsitz SR. Parameter estimation from incomplete data in binomial regression when the missing data mechanism is nonignorable. Biometrics. 1996;52:1071–1078.
Ibrahim JG, Zhu H, Tang N. Model selection criteria for missing-data problems using the EM algorithm. Journal of the American Statistical Association. 2008;103:1648–1658. doi: 10.1198/016214508000001057.
Johnson B, Lin DY, Zeng D. Penalized estimating functions and variable selection in semiparametric regression models. Journal of the American Statistical Association. 2008;103:672–680. doi: 10.1198/016214508000000184.
Keselman HJ, Algina J, Kowalchuk RK, Wolfinger RD. A comparison of two approaches for selecting covariance structures in the analysis of repeated measurements. Communications in Statistics - Simulation and Computation. 1998;27:591–604.
Kowalchuk RK, Keselman HJ, Algina J, Wolfinger RD. The analysis of repeated measurements with mixed-model adjusted F tests. Educational and Psychological Measurement. 2004;64:224–242.
Krishna A. Joint variable selection of fixed and random effects in linear mixed-effects model and its oracle properties. Unpublished thesis, North Carolina State University; 2009.
Leeb H, Pötscher BM. Sparse estimators and the oracle property, or the return of Hodges’ estimator. Journal of Econometrics. 2008;142:201–211.
Liang H, Wu H, Zou G. A note on conditional AIC for linear mixed-effects models. Biometrika. 2008;95:773–778. doi: 10.1093/biomet/asn023.
Lin X. Variance component testing in generalized linear models with random effects. Biometrika. 1997;84:309–326.
Little RJA, Schluchter M. Maximum likelihood estimation for mixed continuous and categorical data with missing values. Biometrika. 1985;72:497–512.
Meng XL, Rubin DB. Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika. 1993;80:267–278.
Ni X, Zhang D, Zhang H. Variable selection for semiparametric mixed models in longitudinal studies. Biometrics. 2009;66:79–88. doi: 10.1111/j.1541-0420.2009.01240.x.
Qu A, Li R. Quadratic inference functions for varying-coefficient models with longitudinal data. Biometrics. 2006;62:379–391. doi: 10.1111/j.1541-0420.2005.00490.x.
Stier DM, Leventhal JM, Berg AT, Johnson L, Mezger J. Are children born to young mothers at increased risk of maltreatment? Pediatrics. 1993;91:642–648.
Thall PF, Vail SX. Some covariance models for longitudinal count data with overdispersion. Biometrics. 1990;46:657–671.
Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B. 1996;58:267–288.
Wasserman DR, Leventhal JM. Maltreatment of children born to cocaine-dependent mothers. American Journal of Diseases of Children. 1993;147:1324–1328. doi: 10.1001/archpedi.1993.02160360066021.
Wang H, Li R, Tsai CL. Tuning parameter selector for the smoothly clipped absolute deviation method. Biometrika. 2007;94:553–568. doi: 10.1093/biomet/asm053.
Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society B. 2006;68:49–67.
Zhang H, Lu W. Adaptive-LASSO for Cox’s proportional hazards model. Biometrika. 2007;94:1–13.
Zhao P, Rocha G, Yu B. The composite absolute penalties family for grouped and hierarchical variable selection. Annals of Statistics. 2009;37:3468–3497.
Zhu HT, Zhang HP. Generalized score test of homogeneity for mixed effects models. Annals of Statistics. 2006;34:1545–1569.
Zou H. The adaptive lasso and its oracle properties. Journal of the American Statistical Association. 2006;101:1418–1429.
Zou H, Li R. One-step sparse estimates in nonconcave penalized likelihood models. Annals of Statistics. 2008;36:1509–1533. doi: 10.1214/009053607000000802.
