Author manuscript; available in PMC: 2023 Feb 14.
Published in final edited form as: J Am Stat Assoc. 2021 Apr 20;117(540):1835–1846. doi: 10.1080/01621459.2021.1888740

Inference for high-dimensional linear mixed-effects models: A quasi-likelihood approach

Sai Li 1, T Tony Cai 2, Hongzhe Li 3
PMCID: PMC9928173  NIHMSID: NIHMS1769025  PMID: 36793369

Abstract

Linear mixed-effects models are widely used in analyzing clustered or repeated measures data. We propose a quasi-likelihood approach for estimation and inference of the unknown parameters in linear mixed-effects models with high-dimensional fixed effects. The proposed method is applicable to general settings where the dimension of the random effects and the cluster sizes are possibly large. Regarding the fixed effects, we provide rate optimal estimators and valid inference procedures that do not rely on the structural information of the variance components. We also study the estimation of variance components with high-dimensional fixed effects in general settings. The algorithms are easy to implement and computationally fast. The proposed methods are assessed in various simulation settings and are applied to a real study regarding the associations between body mass index and genetic polymorphic markers in a heterogeneous stock mice population.

Keywords: clustered data, debiased Lasso, longitudinal data, random effects, variance components

1. Introduction

The results of scientific experiments are often subject to environmental effects, as experimental units can be grouped and placed in diverse environments, and the observations within the same group can be dependent as a cluster. Clustered data commonly arise in many fields, such as biology, genetics, and economics. Linear mixed-effects models provide a flexible tool for analyzing such clustered data, which include repeated measures data, longitudinal data, and multilevel data (Pinheiro and Bates 2000; Goldstein 2011). Linear mixed-effects models incorporate both fixed and random effects, where the random effects induce correlations among the observations within each cluster and accommodate the cluster structure. In many genomic and economic studies, the dimension of the covariates can be large and possibly much larger than the sample size. A variety of statistical models and approaches have been proposed and studied for analyzing high-dimensional data. However, most of them, such as linear models and generalized linear models, are restricted to independent observations. Statistical inference for high-dimensional linear mixed-effects models remains a challenging problem. In this work, we consider estimation and inference of unknown parameters in high-dimensional mixed-effects models.

For ease of presentation, we use the setting for clustered data to present a linear mixed-effects model. For repeated measurement data, the repeated measures form a cluster. Let $i = 1, \dots, n$ be the cluster indices. For the $i$-th cluster, we have a response vector $y^i \in \mathbb{R}^{m_i}$, a design matrix for the fixed effects $X^i \in \mathbb{R}^{m_i \times p}$, and a design matrix for the random effects $Z^i \in \mathbb{R}^{m_i \times q}$, where $m_i$ is the size of the $i$-th cluster. A linear mixed-effects model (Laird and Ware 1982) can be written as

$$y^i = X^i \beta^* + Z^i \gamma_i + \epsilon_i, \quad i = 1, \dots, n, \tag{1}$$

where $\beta^* \in \mathbb{R}^p$ is the vector of the fixed effects, $\gamma_i \in \mathbb{R}^q$ is the vector of the random effects of the $i$-th cluster, and $\epsilon_i \in \mathbb{R}^{m_i}$ is the noise vector of the $i$-th cluster. For $i = 1, \dots, n$, we assume $\gamma_i$ and $\epsilon_i$ are independently distributed with mean zero and variance $\Psi \in \mathbb{R}^{q \times q}$ and $\sigma_e^2 I_{m_i}$, respectively. Detailed assumptions are given in Sections 2 and 3.
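To make the notation concrete, the following minimal sketch simulates clustered data from model (1). The function name `simulate_lmm`, the Gaussian designs, the balanced cluster size, the choice of the random-effects covariance as 0.5 times the identity, and all default values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def simulate_lmm(n=50, m=6, p=20, q=2, s=3, sigma_e=1.0, seed=0):
    """Draw n clusters of size m from y^i = X^i beta + Z^i gamma_i + eps_i.

    Illustrative assumptions: Gaussian designs, balanced clusters,
    Psi = 0.5 * I_q, and the first s fixed effects equal to 1.
    """
    rng = np.random.default_rng(seed)
    beta = np.zeros(p)
    beta[:s] = 1.0                                            # s-sparse fixed effects
    Psi = 0.5 * np.eye(q)                                     # random-effects covariance
    X_blocks, Z_blocks, y_blocks = [], [], []
    for _ in range(n):
        Xi = rng.standard_normal((m, p))                      # fixed-effects design
        Zi = rng.standard_normal((m, q))                      # random-effects design
        gamma_i = rng.multivariate_normal(np.zeros(q), Psi)   # cluster-level random effect
        eps_i = sigma_e * rng.standard_normal(m)              # observation noise
        X_blocks.append(Xi)
        Z_blocks.append(Zi)
        y_blocks.append(Xi @ beta + Zi @ gamma_i + eps_i)
    return X_blocks, Z_blocks, y_blocks, beta

X_blocks, Z_blocks, y_blocks, beta_true = simulate_lmm()
N = sum(len(yi) for yi in y_blocks)  # total sample size, the sum of the cluster sizes
```

Storing the data as one block per cluster keeps the cluster structure explicit, which is convenient because every quantity introduced later is block diagonal across clusters.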

Much existing literature on linear mixed-effects models assumes that the number of random effects $q$ and the cluster sizes $m_i$ are fixed. Unless otherwise stated, we call the setting fixed-dimensional if $p$, $q$, and $\{m_i\}_{i=1}^n$ are all fixed numbers, and high-dimensional if $p$ is large and possibly much larger than $N$, where $N = \sum_{i=1}^n m_i$ is the total sample size. We refer to $\gamma_i$ and $\epsilon_i$ as the random components.

1.1. Related literature

In the fixed-dimensional setting, many methods have been proposed to jointly estimate the fixed effects and variance parameters. We refer to Gumedze and Dunne (2011) for a comprehensive review. Among them, the maximum likelihood estimators (MLEs) and restricted MLEs are the most popular for estimation and inference in linear mixed-effects models. Restricted MLEs can produce unbiased estimators of the variance components in the low-dimensional setting, but they are not applicable in the high-dimensional setting. Furthermore, these likelihood-based estimators rely heavily on the normality assumptions of the random components. Computationally, maximizing the likelihood generally leads to a nonconvex optimization problem that typically has multiple local maxima. Hence, the performance of likelihood-based methods lacks guarantees in real applications.

As an alternative, Sun et al. (2007) proposed moment estimators of the fixed effects and variance parameters for a random effect varying-coefficient model. Peng and Lu (2012) considered such moment estimators for fixed-dimensional linear mixed-effects models. Their proposed estimators have closed-form solutions and are computationally efficient. The consistency and asymptotic normality of these estimators are justified under certain conditions in the fixed-dimensional setting. Ahn et al. (2012) proposed another moment-based method for the estimation and selection of the variance components of the random effects in the fixed-dimensional setting. This method works especially well when the number of the random effects is as large as the cluster sizes, i.e., $m_1 = \dots = m_n = q$.

For inference of variance components in the fixed-dimensional setting, the likelihood ratio, score, and Wald tests (Stram and Lee 1994; Lin 1997; Verbeke and Molenberghs 2003; Demidenko 2004) are broadly used. However, when testing the existence of the random effects, the asymptotic distribution of the likelihood ratio is usually a mixture of chi-square distributions (Miller 1977; Self and Liang 1987). Since these methods are based on the MLEs or restricted MLEs as initial estimators, they also suffer from the drawbacks of likelihood-based methods discussed above.

In the high-dimensional setting, the problems are much more challenging. Assuming fixed cluster sizes, Schelldorfer et al. (2011) analyzed the rate of convergence for the global maximizer of the $\ell_1$-penalized likelihood with fixed designs. As mentioned before, the analysis for the global optimum may not apply to the solutions computed in practice due to the existence of local maxima. Fan and Li (2012) studied the fixed effects and random effects selection in a high-dimensional linear mixed-effects model when the cluster sizes are balanced, i.e., $(\max_i m_i) / (\min_i m_i) < \infty$. The selection consistency requires minimum signal strength conditions regarding the fixed effects and the random effects. Bradic et al. (2019) considered testing a single coefficient of the fixed effects in high-dimensional linear mixed-effects models with fixed cluster sizes, a fixed number of random effects, and sub-Gaussian designs. The theoretical analyses in all three aforementioned papers require the positive definiteness of the covariance matrix of the random effects. This condition requires prior knowledge of the existence of the random effects and can be hard to fulfill in applications. Moreover, the optimal convergence rate of parameter estimation remains unknown. In fact, the estimators of fixed effects in Schelldorfer et al. (2011) and Bradic et al. (2019) may not be rate optimal for estimation according to our analysis. Finally, estimation and inference of the variance components in the high-dimensional setting remain largely unknown.

The problems of estimation and inference of the fixed effects in linear mixed-effects models are related to high-dimensional linear models. Many penalized methods have been proposed for prediction, estimation, and variable selection in high-dimensional linear models; see, for example, Tibshirani (1996); Fan and Li (2001); Zou (2006); Candes and Tao (2007); Meinshausen and Bühlmann (2010); Zhang (2010). Statistical inference on a low-dimensional component of a high-dimensional regression vector has been considered and studied in linear models and generalized linear models with “debiased” estimators (Zhang and Zhang 2014; van de Geer et al. 2014; Javanmard and Montanari 2014), and the minimaxity and adaptivity of confidence intervals have been studied in Cai and Guo (2017) and Cai et al. (2020). The idea of debiasing has also been studied and extended to solve other statistical problems, such as statistical inference in Cox models (Fang et al. 2017), simultaneous inference (Zhang and Cheng 2017; Dezeure et al. 2017), and semi-supervised inference (Cai and Guo 2020).

1.2. Our contributions

In this paper, we develop a simple but powerful method for inference of the unknown parameters in high-dimensional linear mixed-effects models. Our method is applicable to the settings where the number of random effects can possibly be large and the cluster sizes can be either fixed or growing, balanced or unbalanced. The proposed method is easy to implement and the optimization in each step is either analytic or convex.

Based on a proxy of the true covariance matrix, we develop a penalized quasi-likelihood approach for fixed effects estimation. The proposed estimator is minimax rate optimal under general conditions. We further develop a debiased estimator for hypothesis testing and construction of confidence intervals for the fixed effects. The proposed estimator does not require normality assumptions or the structural assumptions on the variance components. We further apply the idea of quasi-likelihood to estimate the variance components and prove its optimality under certain conditions.

Our analysis provides a novel insight for understanding and simplifying the linear mixed-effects models by approximating the true unknown covariance matrix of the random components with some simple proxy matrices. In this way, one separates the tasks of estimating the fixed effects and variance components and avoids the nuisance parameters in each optimization step. This improves the computational efficiency and simplifies the theoretical analysis.

1.3. Notation

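Throughout the paper, we use $i$ to index the $i$-th cluster and $k$ to index the $k$-th observation in each cluster. Let $y$, $\gamma$, $\epsilon$, and $X$ be obtained by stacking the vectors $y^i$, $\gamma_i$, $\epsilon_i$, and the matrices $X^i$ underneath each other, respectively. Let $Z \in \mathbb{R}^{N \times (nq)}$ be a block diagonal matrix with the $i$-th block being $Z^i$. Let $\Sigma_\theta^i = Z^i \Psi (Z^i)^\top + \sigma_e^2 I_{m_i}$ and let $\Sigma_\theta \in \mathbb{R}^{N \times N}$ be a block diagonal matrix with the $i$-th block being $\Sigma_\theta^i$. Let $\Sigma_z^i = (Z^i)^\top Z^i / m_i$ and $\Sigma_{z,x}^i = (Z^i)^\top X^i / m_i$, $i = 1, \dots, n$. For a random variable $u$, define its sub-Gaussian norm as $\|u\|_{\psi_2} = \sup_{l \ge 1} l^{-1/2} E^{1/l}[|u|^l]$. We refer to $\|u\|_{\psi_2, Z} = \sup_{l \ge 1} l^{-1/2} E^{1/l}[|u|^l \mid Z]$ as the conditional sub-Gaussian norm of $u$. For a random vector $U \in \mathbb{R}^{n_0}$, define its sub-Gaussian norm as $\|U\|_{\psi_2} = \sup_{\|v\|_2 = 1,\, v \in \mathbb{R}^{n_0}} \|\langle U, v \rangle\|_{\psi_2}$. Define its conditional sub-Gaussian norm as $\|U\|_{\psi_2, Z} = \sup_{\|v\|_2 = 1,\, v \in \mathbb{R}^{n_0}} \|\langle U, v \rangle\|_{\psi_2, Z}$.

Let $A \in \mathbb{R}^{n_0 \times n_0}$ be a symmetric matrix. $A \succeq 0$ means that $A$ is positive semidefinite and $A \succ 0$ means that $A$ is positive definite. Let $\Lambda_{\max}(A)$ and $\Lambda_{\min}(A)$ denote the largest and smallest eigenvalues of $A$, respectively. Let $\|A\|_2$ denote $\Lambda_{\max}(A)$. Let $\|A\|_F^2 = \mathrm{Tr}(A^\top A)$, where $\mathrm{Tr}(A)$ is the trace of the matrix $A$. Let $c, c_0, c_1, \dots, C, C_0, C_1, \dots$ denote some generic positive constants that can vary in different statements.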

1.4. Organization of the paper

The rest of the paper is organized as follows. Section 2 introduces the idea of quasi-likelihood and a procedure for the fixed effects inference. Section 3 provides a theoretical analysis for the inference procedures proposed in Section 2. Section 4 introduces estimators for the variance components and provides upper and lower bounds. Numerical performance of the proposed methods is investigated in Section 5 in various simulation settings. The proposed methods are applied in Section 6 to analyze a real study on the associations between the body mass index and genetic variants in a stock mice population where the cage effect is modeled as a random effect. A discussion is given in Section 7. Proofs and more numerical results are provided in the Supplementary Materials.

2. Inference for fixed effects: the method

In many applications of the linear mixed-effects models, inference of the fixed effects is of main interest. In this section, we present our method for fixed effects inference and describe its motivations. We assume that the vector of fixed effects $\beta^*$ is sparse such that $\|\beta^*\|_0 \le s$ with $s$ unknown. We consider model (1) where $p$, $s$, and $q$ can grow and $p$ can be much larger than $N$. The cluster sizes $\{m_i\}_{i=1}^n$ can be either fixed or grow with $n$.

2.1. Motivations of the proposed method

For fixed effects estimation in model (1), the main challenges are posed by the high-dimensionality of the fixed effects and the clustered structure of the observations. Before developing a new method, it is helpful to understand the new challenges posed by the cluster structures in model (1) in terms of estimation and inference. For this purpose, we study the consequences of mis-specifying a linear mixed-effects model as a standard linear model.

Applying Lasso (Tibshirani 1996) directly to the observations generated from (1), we analyze

$$\hat\beta^{(lm)} = \arg\min_{b \in \mathbb{R}^p} \left\{ \frac{1}{2N} \|y - Xb\|_2^2 + \lambda^{(lm)} \|b\|_1 \right\} \tag{2}$$

for some tuning parameter $\lambda^{(lm)} > 0$. In a typical analysis of the Lasso, the convergence rate of $\hat\beta^{(lm)}$ depends on the restricted isometries of the sample covariance matrix, $X^\top X / N$, the sparsity of the true coefficients, and the so-called "empirical process" part of the problem, $\|X^\top (y - X\beta^*) / N\|_\infty$. It is known that for linear models with row-wise independent sub-Gaussian $(X, y)$, the "empirical process" part is of order $\sqrt{\log p / N}$, which gives the optimal convergence rate in $\ell_2$-norm. In the following proposition, we study the size of the "empirical process" part when the true model is (1).

Proposition 2.1 (The rate of Lasso for linear mixed-effects models).

Suppose that the responses $y^i$ are generated according to model (1) and each row of $X$ is independently generated with covariance matrix $\Sigma_{x|z}$ conditioning on $Z$. Then for any fixed $j \in \{1, \dots, p\}$,

$$E\left[ \left| \frac{1}{N} X_{.,j}^\top (Z\gamma + \epsilon) \right|^2 \,\Big|\, Z \right] = \frac{(\Sigma_{x|z})_{j,j}\, \sigma_e^2}{N} + \frac{(\Sigma_{x|z})_{j,j} \sum_{i=1}^n m_i \mathrm{Tr}(\Psi \Sigma_z^i)}{N^2} + \frac{\sum_{i=1}^n m_i^2 \left\| \Psi^{1/2} E[\Sigma_{z,x}^i \mid Z] \right\|_2^2}{N^2}. \tag{3}$$

If $\Psi$ is positive definite and $\{\Sigma_z^i\}_{i=1}^n$ have bounded diagonal elements, then the second term on the right-hand side of (3) is $\asymp q / N$. If it further holds that $E[\Sigma_{z,x}^i \mid Z] \ne 0$, i.e., $X$ and $Z$ are correlated, then the third term can be $\gtrsim \min_{1 \le i \le n} m_i / N$. That is, the Lasso may not be rate optimal for clustered data if either $q$ grows, or $\{m_i\}_{i=1}^n$ grow and $X$ and $Z$ are correlated. On the other hand, if $q$ and the $m_i$ are all constant, it is not hard to prove that the original Lasso is still rate optimal for model (1).
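As a brief sketch of where the first two terms of (3) come from, consider the special case $E[X \mid Z] = 0$, in which the third term vanishes. This is a reader's derivation under that assumption, not the proof given in the Supplementary Materials. Since the random components are independent of $X$ given $Z$ and the entries of $X_{.,j}$ are independent across rows with conditional variance $(\Sigma_{x|z})_{j,j}$,

$$E\left[ \left| \tfrac{1}{N} X_{.,j}^\top (Z\gamma + \epsilon) \right|^2 \Big| Z \right] = \frac{1}{N^2} E\left[ X_{.,j}^\top \Sigma_{\theta^*} X_{.,j} \mid Z \right] = \frac{(\Sigma_{x|z})_{j,j}}{N^2} \mathrm{Tr}(\Sigma_{\theta^*}) = \frac{(\Sigma_{x|z})_{j,j}}{N^2} \sum_{i=1}^n \left\{ \sigma_e^2 m_i + m_i \mathrm{Tr}(\Psi \Sigma_z^i) \right\},$$

where the last step uses $\mathrm{Tr}(Z^i \Psi (Z^i)^\top) = \mathrm{Tr}(\Psi (Z^i)^\top Z^i) = m_i \mathrm{Tr}(\Psi \Sigma_z^i)$. Because $\sum_{i=1}^n m_i = N$, the $\sigma_e^2$ part equals $(\Sigma_{x|z})_{j,j} \sigma_e^2 / N$, which is the usual linear-model rate, while the remaining part is the extra cost induced by the random effects.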

Therefore, proper methods need to be developed for high-dimensional linear mixed-effects models under general conditions on $q$ and $\{m_i\}_{i=1}^n$. The main challenge comes from the correlation among observations induced by the random effects. For the $i$-th block, the covariance of the random components is $\Sigma_{\theta^*}^i = Z^i \Psi (Z^i)^\top + \sigma_e^2 I_{m_i}$, which involves unknown parameters. We consider a proxy of $\Sigma_{\theta^*}^i$ as

$$\Sigma_a^i = a Z^i (Z^i)^\top + I_{m_i}$$

with some predetermined constant $a > 0$. The following proposition shows that this approximation is valid up to some scaling constant. Let $\Sigma_a \in \mathbb{R}^{N \times N}$ be the block diagonal matrix with the $i$-th block being $\Sigma_a^i$.

Proposition 2.2.

If Ψ is positive definite, then for any constant a > 0,

$$\min\left\{ \frac{1}{\sigma_e^2}, \frac{a}{\Lambda_{\max}(\Psi)} \right\} \Sigma_a^{-1} \preceq \Sigma_{\theta^*}^{-1} \preceq \max\left\{ \frac{1}{\sigma_e^2}, \frac{a}{\Lambda_{\min}(\Psi)} \right\} \Sigma_a^{-1}.$$

Therefore, if $\Psi$ has positive and bounded eigenvalues, $\Sigma_{\theta^*}^{-1}$ and $\Sigma_a^{-1}$ are of the same rate and only differ by constants. This property of $\Sigma_a^{-1}$ is crucial to achieve the general results in this work. A broader class of proxy matrices has been considered in Fan and Li (2012) for variable selection and in Bradic et al. (2019) for hypothesis testing, which include $\Sigma_a^{-1}$ as a special case. As reviewed in Section 1.1, the two aforementioned papers considered relatively restrictive scenarios in terms of group sizes and the dimension of the random effects. It is not clear whether the desired property proved in Proposition 2.2 holds for the general class of proxy matrices.
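The sandwich bound of Proposition 2.2 can be checked numerically on a single cluster. The sketch below is only a sanity check under assumed inputs: the helper name `proposition_2_2_gaps`, the Gaussian design, and the particular covariance of the random effects are hypothetical choices. It returns the smallest eigenvalue of each of the two matrix differences, which should be nonnegative up to rounding.

```python
import numpy as np

def proposition_2_2_gaps(Zi, Psi, sigma2, a):
    """Smallest eigenvalues of the two differences in the Proposition 2.2 sandwich."""
    m = Zi.shape[0]
    inv_theta = np.linalg.inv(Zi @ Psi @ Zi.T + sigma2 * np.eye(m))  # inverse true covariance
    inv_a = np.linalg.inv(a * Zi @ Zi.T + np.eye(m))                 # inverse proxy covariance
    psi_evals = np.linalg.eigvalsh(Psi)                              # ascending eigenvalues of Psi
    lo = min(1.0 / sigma2, a / psi_evals[-1])
    hi = max(1.0 / sigma2, a / psi_evals[0])
    lower_gap = np.linalg.eigvalsh(inv_theta - lo * inv_a).min()
    upper_gap = np.linalg.eigvalsh(hi * inv_a - inv_theta).min()
    return lower_gap, upper_gap

rng = np.random.default_rng(0)
Zi = rng.standard_normal((7, 3))
B = rng.standard_normal((3, 3))
Psi = B @ B.T + 0.1 * np.eye(3)          # a positive definite covariance (assumed for the check)
gaps = [proposition_2_2_gaps(Zi, Psi, sigma2=2.0, a=a) for a in (0.1, 1.0, 10.0)]
```

The check holds for every positive value of the constant tried here, which mirrors the statement that the proxy is valid up to scaling for any predetermined positive constant.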

2.2. The quasi-likelihood approach

We consider a quasi-likelihood approach which replaces $\Sigma_{\theta^*}^i$ with $\Sigma_a^i$ for some constant $a > 0$ in the likelihood function for Gaussian mixed-effects models. Specifically, let $X_a$ and $y_a$ denote the transformed observations such that $(X_a, y_a) = (\Sigma_a^{-1/2} X, \Sigma_a^{-1/2} y)$.

First, we estimate the fixed effects via the Lasso based on the transformed data. For some fixed $a > 0$, define

$$\hat\beta = \arg\min_{\beta \in \mathbb{R}^p} \left\{ \frac{1}{2\,\mathrm{Tr}(\Sigma_a^{-1})} \|y_a - X_a \beta\|_2^2 + \lambda \|\beta\|_1 \right\} \tag{4}$$

for some tuning parameter $\lambda > 0$. The quantity $\mathrm{Tr}(\Sigma_a^{-1})$ can be viewed as the effective sample size in the current problem and its magnitude is studied in Remark 3.1. The choice of $a$ will be studied theoretically in Section 3 and numerically in Section 5.

Given the task of making inference for $\beta_j^*$, we propose the following debiased estimator. For $\hat\beta$ defined in (4),

$$\hat\beta_j^{(db)} = \hat\beta_j + \frac{\hat w_j^\top (y_a - X_a \hat\beta)}{\hat w_j^\top (X_a)_{.,j}}, \tag{5}$$

where $\hat w_j \in \mathbb{R}^N$ can be viewed as a correction score. It can be computed via another Lasso regression (Zhang and Zhang 2014; van de Geer et al. 2014) or via a quadratic optimization (Zhang and Zhang 2014; Javanmard and Montanari 2014). For computational convenience, we consider the Lasso approach based on the transformed data. Define the correction score $\hat w_j = (X_a)_{.,j} - (X_a)_{.,-j} \hat\kappa_j$, where

$$\hat\kappa_j = \arg\min_{\kappa_j \in \mathbb{R}^{p-1}} \left\{ \frac{1}{2\,\mathrm{Tr}(\Sigma_a^{-1})} \|(X_a)_{.,j} - (X_a)_{.,-j} \kappa_j\|_2^2 + \lambda_j \|\kappa_j\|_1 \right\}, \tag{6}$$

for some tuning parameter $\lambda_j > 0$. A two-sided $100 \times (1 - \alpha)\%$ confidence interval for $\beta_j^*$ can be constructed as

$$\hat\beta_j^{(db)} \pm z_{\alpha/2} \sqrt{\hat V_j}, \tag{7}$$

where $z_\tau$ is the $\tau$-th quantile of a standard normal distribution and $\hat V_j$ is an estimator of the variance of $\hat\beta_j^{(db)}$. We propose to use the following empirical variance estimate

$$\hat V_j = \frac{\sum_{i=1}^n \left[ (\hat w_j^i)^\top (y_a^i - X_a^i \hat\beta) \right]^2}{\left( \hat w_j^\top (X_a)_{.,j} \right)^2}, \tag{8}$$

where $\hat\beta$ is the initial Lasso estimator (4), $\hat w_j^i \in \mathbb{R}^{m_i}$ is the $i$-th sub-vector of $\hat w_j$ such that $\hat w_j = ((\hat w_j^1)^\top, \dots, (\hat w_j^n)^\top)^\top$, and $y_a^i$ is the $i$-th sub-vector of $y_a$. The idea of an empirical variance estimator has been considered in Bühlmann and van de Geer (2015) to deal with misspecified linear models. The form of (8) is, however, different from the one for linear models because it is an average over $n$ groups rather than $N$ observations. In this work, $\hat V_j$ serves as a convenient alternative to limiting distribution-based variance estimators. In fact, the limiting distribution of $\hat\beta_j^{(db)}$ involves nuisance parameters coming from the complicated variance components. By using the empirical residuals of the transformed data, we bypass the estimation of the nuisance parameters.
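The steps (4) to (8) can be strung together as in the following minimal sketch. It is an illustration under stated assumptions rather than the authors' implementation: the function names, the plain coordinate-descent Lasso solver with a fixed number of sweeps, and the use of user-supplied tuning parameters are all choices made here for self-containment. The interval uses the upper normal quantile at level 1 minus alpha over 2, which gives the same symmetric interval as (7).

```python
import numpy as np
from statistics import NormalDist

def soft(x, t):
    """Soft-thresholding operator."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def lasso_cd(X, y, lam, scale, n_iter=300):
    """Coordinate descent for (1 / (2 * scale)) * ||y - X b||_2^2 + lam * ||b||_1."""
    b = np.zeros(X.shape[1])
    resid = y.astype(float).copy()
    col_sq = (X ** 2).sum(axis=0) / scale
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * b[j]                       # remove coordinate j from the fit
            b[j] = soft(X[:, j] @ resid / scale, lam) / col_sq[j]
            resid -= X[:, j] * b[j]                       # put the updated coordinate back
    return b

def transform(X_blocks, Z_blocks, y_blocks, a):
    """Return (X_a, y_a, Tr(Sigma_a^{-1})) with Sigma_a^i = a Z^i (Z^i)^T + I."""
    Xa, ya, tr = [], [], 0.0
    for Xi, Zi, yi in zip(X_blocks, Z_blocks, y_blocks):
        evals, evecs = np.linalg.eigh(a * Zi @ Zi.T + np.eye(len(yi)))
        W = (evecs / np.sqrt(evals)) @ evecs.T            # (Sigma_a^i)^{-1/2}
        Xa.append(W @ Xi)
        ya.append(W @ yi)
        tr += np.sum(1.0 / evals)
    return np.vstack(Xa), np.concatenate(ya), tr

def debiased_inference(X_blocks, Z_blocks, y_blocks, j, a, lam, lam_j, alpha=0.05):
    Xa, ya, tr = transform(X_blocks, Z_blocks, y_blocks, a)
    beta_hat = lasso_cd(Xa, ya, lam, tr)                  # initial estimator (4)
    others = np.delete(np.arange(Xa.shape[1]), j)
    kappa = lasso_cd(Xa[:, others], Xa[:, j], lam_j, tr)  # nodewise Lasso (6)
    w = Xa[:, j] - Xa[:, others] @ kappa                  # correction score
    resid = ya - Xa @ beta_hat
    denom = w @ Xa[:, j]
    beta_db = beta_hat[j] + (w @ resid) / denom           # debiased estimator (5)
    sizes = np.array([len(yi) for yi in y_blocks])
    starts = np.concatenate(([0], np.cumsum(sizes)[:-1]))
    cluster_sums = np.add.reduceat(w * resid, starts)     # one inner product per cluster
    V = np.sum(cluster_sums ** 2) / denom ** 2            # empirical variance (8)
    half = NormalDist().inv_cdf(1 - alpha / 2) * np.sqrt(V)
    return beta_db, V, (beta_db - half, beta_db + half)   # interval (7)
```

A call such as `debiased_inference(X_blocks, Z_blocks, y_blocks, j=0, a=1.0, lam=0.05, lam_j=0.05)` returns the debiased estimate, its estimated variance, and the interval. The tuning values in this call are placeholders; in practice they would be chosen by cross-validation or by the theoretical rates discussed in Section 3.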

3. Inference for fixed effects: theoretical guarantees

In this section, we provide theoretical guarantees for the estimators described in Section 2.2. We first detail our assumptions.

Condition 3.1 (Sub-Gaussian random components).

The random noises $\epsilon_{i,k}$, $i = 1, \dots, n$, $k = 1, \dots, m_i$, are independent with mean zero and variance $0 < \sigma_e^2 < K_0 < \infty$. The sub-Gaussian norms of $\epsilon_{i,k}$ are upper bounded by $K_0$. The random effects $\gamma_i \in \mathbb{R}^q$, $i = 1, \dots, n$, are independent with mean zero and covariance $\Psi \preceq K_1 I_q$ for some positive constant $K_1$. For $i = 1, \dots, n$, $\epsilon_i$ and $\gamma_i$ are independent of each other and are independent of $(X^i, Z^i)$. The sub-Gaussian norms of $\Sigma_{\theta^*}^{-1/2}(Z\gamma + \epsilon)$ are upper bounded by $K_0$.

Condition 3.1 assumes sub-Gaussian random components, whereas classical linear mixed-effects models always assume Gaussian random components (Pinheiro and Bates 2000). Hence, Condition 3.1 is less restrictive and more robust to model misspecification than the classical assumptions. In addition, we do not require $\Psi$ to be strictly positive definite. One scenario with singular $\Psi$ is that some components of the random effects do not exist, so that some diagonal elements of $\Psi$ are zero.

Regarding the conditions on the designs, the estimation and inference in the linear mixed-effects models are usually conditioning on Z in order to maintain the cluster structure. Schelldorfer et al. (2011) and Fan and Li (2012) assume both X and Z are fixed. Jiang et al. (2016) considers estimation and inference in a misspecified linear model when both X and Z are random. Bradic et al. (2019) assumes X is sub-Gaussian with mean zero and Z is fixed, which implies that X and Z are independent. In the current work, we consider random designs satisfying the following condition.

Condition 3.2 (Sub-Gaussian X conditioning on Z).

Conditioning on $Z$, each row of $X$ is independent with mean zero and covariance matrix $\Sigma_{x|z}$ such that $0 < K_* \le \Lambda_{\min}(\Sigma_{x|z}) \le \Lambda_{\max}(\Sigma_{x|z}) \le K^* < \infty$. Conditioning on $Z$, the conditional sub-Gaussian norms of $X^i_{k,.}$ are upper bounded by $K_0$.

In Condition 3.2, we assume for simplicity that the sub-Gaussian $X$ is mean independent of $Z$, i.e., $E[X \mid Z] = 0$. This is slightly weaker than assuming $X$ and $Z$ are mutually independent, and it holds when $Z$ is deterministic, which includes the random intercept model. In Section 3.4, we study the performance of our proposal when $E[X \mid Z] \ne 0$.

3.1. Fixed effects estimation

In this subsection, we analyze the theoretical performance of (4) under Conditions 3.1 and 3.2. Define

$$\lambda_a^* = \frac{\sqrt{\mathrm{Tr}(\Sigma_a^{-1} \Sigma_{\theta^*} \Sigma_a^{-1}) \log p}}{\mathrm{Tr}(\Sigma_a^{-1})}.$$

Lemma 3.1 (Fixed effects estimation with quasi-likelihood based Lasso).

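Assume that Conditions 3.1 and 3.2 hold true. There exists a constant $c_1$ such that for $\lambda \ge c_1 \lambda_a^*$ and $\mathrm{Tr}(\Sigma_a^{-1}) \gg s \log p$, we have with probability at least $1 - 2\exp(-\log p)$,

$$\|\hat\beta - \beta^*\|_1 \le C_1 s \lambda, \quad \|\hat\beta - \beta^*\|_2 \le C_2 \sqrt{s}\, \lambda, \quad \text{and} \quad \frac{1}{\mathrm{Tr}(\Sigma_a^{-1})} \|X_a(\hat\beta - \beta^*)\|_2^2 \le C_3 s \lambda^2 \tag{9}$$

for some positive constants $C_1$, $C_2$, and $C_3$. Moreover, for any $a > 0$,

$$\lambda_a^* \le \sqrt{\frac{(\Lambda_{\max}(\Psi)/a + \sigma_e^2) \log p}{\mathrm{Tr}(\Sigma_a^{-1})}}.$$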

Remark 3.1.

For any $a \ge 0$,

$$\sum_{i=1}^n \max\{m_i - q, 0\} \le \mathrm{Tr}(\Sigma_a^{-1}) \le N.$$

Lemma 3.1 provides upper bounds for the prediction error and the estimation errors in $\ell_1$-norm and $\ell_2$-norm. By setting $a$ to be a positive constant, the $\ell_2$-error of $\hat\beta$ is of order $\sqrt{s \log p / \mathrm{Tr}(\Sigma_a^{-1})}$. Remark 3.1 studies the magnitude of the effective sample size $\mathrm{Tr}(\Sigma_a^{-1})$. As pointed out by a reviewer, in the case of equal group sizes and $q / m \le c_0 < 1$, $\mathrm{Tr}(\Sigma_a^{-1}) \asymp N$, i.e., the convergence rates are the same as the rates in linear models. Recalling Proposition 2.1, this shows that $\hat\beta$ has a faster convergence rate than $\hat\beta^{(lm)}$ in the regime where $q$ grows but remains small relative to $m$.
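The bounds of Remark 3.1 on the effective sample size are easy to verify numerically. The following sketch uses hypothetical unbalanced cluster sizes and Gaussian random-effects designs, both of which are assumptions made only for illustration, and computes the trace of the inverse proxy matrix from the eigenvalues of each cluster block.

```python
import numpy as np

def effective_sample_size(Z_blocks, a):
    """Tr(Sigma_a^{-1}) for the block-diagonal proxy with blocks a Z^i (Z^i)^T + I."""
    total = 0.0
    for Zi in Z_blocks:
        d = np.clip(np.linalg.eigvalsh(Zi @ Zi.T), 0.0, None)  # eigenvalues of Z^i (Z^i)^T
        total += np.sum(1.0 / (a * d + 1.0))
    return total

rng = np.random.default_rng(0)
sizes, q = [3, 5, 8], 4                        # unbalanced clusters; one has m_i < q
Z_blocks = [rng.standard_normal((m, q)) for m in sizes]
lower = sum(max(m - q, 0) for m in sizes)      # lower bound in Remark 3.1
N = sum(sizes)                                 # upper bound in Remark 3.1
traces = {a: effective_sample_size(Z_blocks, a) for a in (0.0, 0.5, 10.0)}
```

With the constant set to zero the trace equals the total sample size, and it decreases toward the lower bound as the constant grows, since each direction spanned by a random-effects design is progressively downweighted.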

The results of Lemma 3.1 hold for any positive constant a. Different choices of a can affect the constants in the upper bound and the empirical performance of the method. To understand the optimal choice of a, we prove the following remark.

Remark 3.2 (Effect of a).

Suppose $\Psi = \eta^* I_q$. For any given $n$, $p$, $q$, $\{m_i\}_{i=1}^n$, $a = \eta^* / \sigma_e^2$ minimizes $\lambda_a^*$ for $a \in (0, \infty)$. If it further holds that $\eta^* \ne 0$ and $q < \max_{i \le n} m_i$, then $\lambda_0^* > \lambda_a^*$ for any $a \in (0, \lambda_a^*]$.

Remark 3.2 gives the optimal choice of $a$ for $\Psi = \eta^* I_q$. In this case, setting $a = \eta^* / \sigma_e^2$ minimizes $\lambda_a^*$ and hence minimizes the upper bound on the estimation errors when other parameters and constants are fixed. The optimal choice of $a$ is intuitive as it mimics the MLE procedure. Furthermore, when the random effects exist and $q < \max_{i \le n} m_i$, setting $a = 0$ is strictly worse than the proposed quasi-likelihood approach with $a \in (0, \lambda_a^*]$. We mention that the condition $q < \max_{i \le n} m_i$ is sufficient but not necessary. This remark sheds light on the choice of $a$ in general settings, as any positive semidefinite $\Psi$ can be upper and lower bounded by diagonal matrices. From the optimization perspective, we treat $a$ as a tuning parameter in the optimization (4). In Section 5, we carefully examine the effect of $a$ on estimation accuracy in numerical experiments.
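The first claim of Remark 3.2 can be illustrated by evaluating the threshold level over a grid of values of the constant. The sketch below assumes the random-effects covariance is a multiple of the identity, in which case the proxy and the true covariance share eigenvectors and both traces reduce to sums over the eigenvalues of each cluster block. The function name `lambda_star`, the Gaussian designs, and the parameter values are hypothetical.

```python
import numpy as np

def lambda_star(Z_blocks, eta, sigma2, a, p):
    """sqrt(Tr(S_a^{-1} S_theta S_a^{-1}) * log p) / Tr(S_a^{-1}) when Psi = eta * I_q."""
    num, tr = 0.0, 0.0
    for Zi in Z_blocks:
        d = np.clip(np.linalg.eigvalsh(Zi @ Zi.T), 0.0, None)  # eigenvalues of Z^i (Z^i)^T
        num += np.sum((eta * d + sigma2) / (a * d + 1.0) ** 2)   # trace of the sandwich term
        tr += np.sum(1.0 / (a * d + 1.0))                        # effective sample size
    return np.sqrt(num * np.log(p)) / tr

rng = np.random.default_rng(0)
Z_blocks = [rng.standard_normal((6, 2)) for _ in range(30)]
eta, sigma2, p = 0.6, 1.2, 100
grid = np.linspace(0.0, 2.0, 41)               # grid contains eta / sigma2 = 0.5
values = np.array([lambda_star(Z_blocks, eta, sigma2, a, p) for a in grid])
best_a = grid[np.argmin(values)]
```

On this grid the minimizer coincides with the ratio of the random-effects variance to the noise variance, and the value at zero, which corresponds to the ordinary Lasso on untransformed data, is strictly larger.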

3.2. Rate optimality of the proposed estimator

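In this subsection, we study the minimax optimality of the proposed estimator for the fixed effects. We consider

$$X^i_{k,.} \mid Z \overset{\text{i.i.d.}}{\sim} N(0, \Sigma_x), \quad \gamma_i \overset{\text{i.i.d.}}{\sim} N(0, \Psi), \quad \text{and} \quad \epsilon_{i,k} \overset{\text{i.i.d.}}{\sim} N(0, \sigma_e^2), \tag{10}$$

$k = 1, \dots, m_i$, $i = 1, \dots, n$. Consider the following parameter space

$$\Xi(s, Z) = \left\{ v = (\beta^*, \Psi, \sigma_e^2, \Sigma_x, Z) : \|\beta^*\|_0 \le s,\ 0 < \sigma_e^2 \le K_0,\ 0 \preceq \Psi \preceq K_1 I_q,\ 1/K^* \le \Lambda_{\min}(\Sigma_x) \le \Lambda_{\max}(\Sigma_x) \le K^* < \infty \right\}, \tag{11}$$

where $K^* \ge 1$. We see that (10) and (11) define a special case of Conditions 3.1 and 3.2. We prove the minimax optimal rate of convergence in $\ell_2$-norm with respect to $\Xi(s, Z)$.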

Theorem 3.1. (Minimax lower bounds for estimating the fixed effects).

Suppose that (1) and (10) are true. If $s \le c \min\{\mathrm{Tr}(\Sigma_a^{-1}) / \log p,\ p^\nu\}$ for $0 < \nu < 1/2$ and $c > 0$, then there exists some constant $c_1 > 0$ such that for any fixed $a > 0$,

$$\inf_{\hat\beta} \sup_{v \in \Xi(s, Z)} \mathbb{P}_v\left( \|\hat\beta - \beta^*\|_2^2 \ge c_1 \frac{s \log(p / s^2)}{\mathrm{Tr}(\Sigma_a^{-1})} \,\Big|\, Z \right) \ge \frac{1}{4}.$$

Together with (9), this shows that $\hat\beta$ is minimax rate optimal in $\ell_2$-error in the parameter space $\Xi(s, Z)$. In the proof, we use the minimax optimality of the $\ell_1$-penalized MLE, which has $\Sigma_{\theta^*}^{-1}$ as the weighting matrix, and use Proposition 2.2 to show the equivalence of the MLE and the proposed estimator in terms of convergence rate.

3.3. Statistical inference of the fixed effects

Debiased estimators can be used for statistical inference of linear combinations of regression coefficients in high-dimensional linear models (Zhang and Zhang 2014; van de Geer et al. 2014; Javanmard and Montanari 2014). Under certain conditions, the debiased estimators are asymptotically normal and can be used to construct confidence intervals with optimal lengths (Cai and Guo 2017). To make inference of $\beta_j^*$, we consider the debiased estimator proposed in (5). Let $H_j$ be the support of $(\Sigma_{x|z}^{-1})_{.,j}$.

Theorem 3.2. (Asymptotic normality of the debiased estimator).

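Assume Conditions 3.1 and 3.2. Let $\lambda \asymp \lambda_j \asymp c_1 \sqrt{\log p / \mathrm{Tr}(\Sigma_a^{-1})}$ with a large enough constant $c_1$. For $\hat\beta_j^{(db)}$ defined in (5), if $(s \log p)^2 \log n \max_i m_i \ll \mathrm{Tr}(\Sigma_a^{-1}) \Lambda_{\min}(\Sigma_a^{-1/2} \Sigma_{\theta^*} \Sigma_a^{-1/2})$ and $|H_j| \log p \ll \mathrm{Tr}(\Sigma_a^{-1})$, then it holds that

$$V_j^{-1/2} (\hat\beta_j^{(db)} - \beta_j^*) = R_j + o_P(1),$$

where $R_j \xrightarrow{D} N(0, 1)$ for

$$V_j = \frac{\hat w_j^\top \Sigma_a^{-1/2} \Sigma_{\theta^*} \Sigma_a^{-1/2} \hat w_j}{\{\hat w_j^\top (X_a)_{.,j}\}^2}.$$

The magnitude of $V_j$ satisfies

$$V_j = (\Sigma_{x|z}^{-1})_{j,j} \frac{\mathrm{Tr}(\Sigma_a^{-1} \Sigma_{\theta^*} \Sigma_a^{-1})}{\mathrm{Tr}^2(\Sigma_a^{-1})} (1 + o_P(1)). \tag{12}$$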

Theorem 3.2 shows the asymptotic normality of the proposed debiased estimator under the given conditions. The convergence rate of $\hat\beta_j^{(db)}$ is $V_j^{1/2}$ with magnitude provided in (12). Remark 3.2 helps to understand the effect of $a$ on the inference results. As $\Sigma_{x|z}$ is positive definite, $V_j$ is proportional to $(\lambda_a^*)^2$ for any given $p$. Hence, using the debiased Lasso for linear models, i.e., setting $a = 0$, can lead to large $V_j$ and low power in statistical inference. We verify these arguments numerically in Section 5. The sparsity of $(\Sigma_{x|z}^{-1})_{.,j}$ guarantees that $\hat\kappa_j$ converges to its probabilistic limit so that the central limit theorem can be justified. When $\Psi$ is positive definite, the sample size condition for asymptotic normality is $(s \log p)^2 \log n \max_i m_i \vee |H_j| \log p \ll \mathrm{Tr}(\Sigma_a^{-1})$. When $\Psi$ is singular, it is sufficient to require $(s \log p)^2 \max_i m_i \log n \max_i m_i^2 \vee |H_j| \log p \ll \mathrm{Tr}(\Sigma_a^{-1})$.

Theorem 3.2 is related to the results in some other works. When there are no random effect components, i.e., $Z = 0$, the conditions and conclusions of Theorem 3.2 recover the conditions and conclusions for the debiased Lasso in linear models, say, in Theorem 2.4 of van de Geer et al. (2014). In comparison to BCG19 of Bradic et al. (2019), the limiting distribution of their test statistic under the null hypothesis does not require sparse regression coefficients but requires $|H_j| = o(\sqrt{n} / \log p / \log n)$, using our notation. The power analysis for their test statistic requires $\max\{s, |H_j|\} = o(\sqrt{n} / \log p / \log n)$. In comparison, the sample size condition in our work (in the fixed $q$ and $m$ scenario) is $s = o(\sqrt{n} / \log p)$ and $|H_j| = o(n / \log p)$, which is weaker. More importantly, the realization of $\hat\beta_j^{(db)}$ does not rely on the null hypothesis and hence can be directly used to construct confidence intervals. We examine the empirical performance of these two different approaches in Section 5.

If one has a consistent estimator of $V_j$, the confidence intervals of the fixed effects can be constructed based on the limiting distribution of $\hat\beta_j^{(db)}$. However, a plug-in estimate of $V_j$ would require knowledge of the structures of the variance components and extra effort on estimation. In the following, we show that an empirical estimator of $V_j$, namely $\hat V_j$ defined in (8), is consistent under mild conditions. Let $c_n = \log n \max_i m_i / \Lambda_{\min}(\Sigma_a^{-1/2} \Sigma_{\theta^*} \Sigma_a^{-1/2})$.

Lemma 3.2 (Convergence rate of the variance estimator).

Under the conditions of Theorem 3.2, for $\hat V_j$ defined in (8),

$$|\hat V_j / V_j - 1| = O_P\left( \frac{c_n^{1/2}}{\mathrm{Tr}^{1/2}(\Sigma_a^{-1})} + \frac{c_n s \log p}{\mathrm{Tr}(\Sigma_a^{-1})} \right).$$

Lemma 3.2 implies that the proposed variance estimator is consistent if $c_n = o(\mathrm{Tr}^{1/2}(\Sigma_a^{-1}))$ and the conditions of Theorem 3.2 hold true. The quantity $c_n$ accounts for the correlations within clusters. The proposed $\hat V_j$ is robust in the sense that it does not rely on the specific structure of the variance components and is consistent under mild conditions. Based on Theorem 3.2 and Lemma 3.2, hypothesis testing and the construction of confidence intervals are both achievable. The asymptotic validity of the proposed confidence interval (7) is guaranteed.

We conclude this section with a further comment on the benefits of using the quasi-likelihood. In fact, even if we have consistent estimators of the variance parameters, say $\hat\theta$, using the proxy matrix $\Sigma_a$ to compute the debiased estimator is still preferable to using $\Sigma_{\hat\theta}$. First, using $\hat\theta$ can introduce a complex dependence structure into $R_j$, as the correction score would also depend on the random components. This makes it difficult to justify the asymptotic normality of $R_j$. Second, $\Sigma_{\hat\theta}^{-1}$ may not approximate $\Sigma_{\theta^*}^{-1}$ well enough, in the sense that the magnitude of the error in $\hat\theta$ can be larger than the magnitude of the bias of the debiased Lasso estimator. As a result, the sample size condition for the asymptotic normality may be impaired.

3.4. Results for possibly correlated X and Z

In this subsection, we consider a relaxed version of Condition 3.2 that allows for correlation between X and Z.

Condition 3.3.

Conditioning on $Z$, each row of $X$ is independently distributed with covariance matrix $\Sigma_{x|z}$ such that $0 < K_* \le \Lambda_{\min}(\Sigma_{x|z}) \le \Lambda_{\max}(\Sigma_{x|z}) \le K^* < \infty$. Conditioning on $Z$, the conditional sub-Gaussian norms of $X^i_{k,.}$ are upper bounded by $K_0$. Moreover, $\max_{1 \le j \le p} \|E[X_{.,j} \mid Z]\|_2^2 \le c_1 \mathrm{Tr}(\Sigma_a^{-1})$ for some large enough $c_1$.

Recalling that $\|X_{.,j}\|_2^2 \le CN$ and Remark 3.1, a sufficient condition for the last statement to hold is $q / (\min_i m_i) \le c_0 < 1$.

Lemma 3.3 (Fixed effects estimation with Lasso).

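Assume that Conditions 3.1 and 3.3 hold true. There exist large enough constants $c_1$ and $c_2$ such that for $\lambda \ge c_1 \sqrt{(\sigma_e^2 + K_1 / a) \log p / \mathrm{Tr}(\Sigma_a^{-1})}$ and $\mathrm{Tr}(\Sigma_a^{-1}) \gg s \log p$, we have with probability at least $1 - 2\exp(-c_2 \log p)$,

$$\|\hat\beta - \beta^*\|_1 \le C_1 s \lambda, \quad \|\hat\beta - \beta^*\|_2 \le C_2 \sqrt{s}\, \lambda, \quad \text{and} \quad \frac{1}{\mathrm{Tr}(\Sigma_a^{-1})} \|X_a(\hat\beta - \beta^*)\|_2^2 \le C_3 s \lambda^2 \tag{13}$$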

for some large enough constants C1, C2, and C3.

Under Condition 3.3, for any constant $a > 0$, the effective sample size for the proposed approach is still of order $\mathrm{Tr}(\Sigma_a^{-1})$. We do not have a clear understanding of the optimal choice of $a$ under these conditions, but $a$ can still be chosen by cross-validation in applications.

For inference of the fixed effects, one issue caused by the correlation between $X$ and $Z$ is that the limit of $\hat\kappa_j$ in (6) can depend on $a$ and its sparsity is hard to guarantee. If its limit is indeed sparse, then the central limit theory of Theorem 3.2 still holds. If its limit is not sparse, then one may consider the debiasing scheme for linear models with the initial estimator computed by the quasi-likelihood approach. We show in the Supplementary Materials (Theorem A) that our proposed debiased estimator with $a = 0$ in (5) and (6) is robust to the correlation between $X$ and $Z$. However, its asymptotic normality requires stronger sample size conditions when $\Psi$ is positive definite, and it can have significantly wider confidence intervals and hence lower statistical power. We examine this method numerically in Section 5.

4. Variance components estimation

In this section, we consider estimating the unknown parameters of variance components. While fixed effects inference can be of main interest in many problems, estimation of variance components can provide insights into the structure of the data. As far as we know, this problem has not been studied in existence of high-dimensional fixed effects. We will demonstrate that the idea of quasi-likelihood approach can be applied to estimating the variance components in high-dimensional linear mixed-effects models.

We parameterize Ψ as follows. Let $\eta^* \in \mathbb{R}^d$ be the true parameter such that

$$\Psi=\Psi_{\eta^*}=\sum_{j=1}^{d}\eta_j^* G_j \in \mathbb{R}^{q\times q}, \tag{14}$$

where $G_1,\dots,G_d$ are symmetric basis matrices that are linearly independent in the sense that

$$\sum_{j=1}^{d}c_j G_j=0 \quad\text{iff}\quad c_1=\dots=c_d=0. \tag{15}$$

The dimension d is allowed to grow to infinity. The structure of $\Psi_{\eta^*}$ in (14) incorporates most commonly used models in applications, such as the random intercept model and the models used in twin or family studies (Wang et al. 2011). One should note that any symmetric Ψ can be represented via (14) with η* being the vector of its upper-triangular elements (including the diagonal). Without loss of generality, we assume the basis matrices have a constant scale, i.e. $\max_{1\le j\le d}\|G_j\|_2 \le C<\infty$.
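The parameterization in (14) can be illustrated with a small numerical sketch. The code below is not from the paper: the function names and the 3 × 3 example matrix are hypothetical, and the basis has one matrix per upper-triangular entry, which is one way to read the remark that any symmetric Ψ can be represented via (14).

```python
import numpy as np

def symmetric_basis(q):
    """One basis matrix per upper-triangular entry (j <= k), in row-major order."""
    basis = []
    for j in range(q):
        for k in range(j, q):
            G = np.zeros((q, q))
            G[j, k] = G[k, j] = 1.0
            basis.append(G)
    return basis

def psi_from_eta(eta, basis):
    """Psi_eta = sum_j eta_j * G_j, as in (14)."""
    return sum(e * G for e, G in zip(eta, basis))

# Hypothetical symmetric matrix used only to check the round trip.
q = 3
Psi = np.array([[1.0, 0.2, 0.0],
                [0.2, 2.0, 0.5],
                [0.0, 0.5, 3.0]])
basis = symmetric_basis(q)
eta = Psi[np.triu_indices(q)]          # upper-triangular entries, row-major
Psi_rebuilt = psi_from_eta(eta, basis)
```

Here d = q(q + 1)/2 = 6, and `Psi_rebuilt` coincides with `Psi`. Structured models such as a random intercept would use far fewer basis matrices.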

4.1. Estimating the variance components

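A widely used approach for estimating the variance components is the Gaussian maximum likelihood method. However, this approach relies heavily on the Gaussian assumptions. We consider a different approach that deals with sub-Gaussian random components. We first split the data into two folds: let $I_1\cup I_2=[n]$, $I_1\cap I_2=\emptyset$, and $|I_1|\approx|I_2|\approx n/2$. Let $\hat\beta^{(2)}$ be an initial estimate of β* computed with the second half of the data $\{X^i,Z^i,y^i\}_{i\in I_2}$. We compute the residuals $\hat r_i=y^i-X^i\hat\beta^{(2)}$ for $i\in I_1$ and estimate $\sigma_e^2$ via

$$\hat\sigma_e^2=\frac{1}{\sum_{i\in I_1}\mathrm{Tr}(P_{Z^i}^{\perp})}\sum_{i\in I_1}\|P_{Z^i}^{\perp}\hat r_i\|_2^2, \tag{16}$$

where $P_{Z^i}^{\perp}$ denotes the projection onto the orthogonal complement of the column space of $Z^i$.

Next, we estimate η* via

$$\hat\eta=\mathop{\arg\min}_{\eta\in\mathbb{R}^d}\sum_{i\in I_1}\left\|(\Sigma_a^i)^{-1/2}\left(\hat r_i\hat r_i^{\top}-Z^i\Psi_\eta (Z^i)^{\top}-\hat\sigma_e^2 I_{m_i}\right)(\Sigma_a^i)^{-1/2}\right\|_F^2, \tag{17}$$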

where constant KK2 and σe2 is obtained via (16).

The rationale of (16) is that the observations $P_{Z^i}^{\perp}(y^i-X^i\beta^*)$ have covariance matrix $\sigma_e^2P_{Z^i}^{\perp}$, which only involves the target parameter $\sigma_e^2$. Replacing β* with its quasi-likelihood estimate gives (16). This estimator is meaningful only when $\sum_{i\in I_1}\mathrm{Tr}(P_{Z^i}^{\perp})>0$, i.e. $\sum_{i\in I_1}m_i\max\{0,1-q/m_i\}>0$. The rationale of (17) comes from the MLE. One can check that the derivative of the objective function in (17) would be the score function with respect to η if we replaced $\Sigma_a$ with the MLE of $\Sigma_{\theta^*}$. Unlike the MLE, we estimate $\sigma_e^2$ and η* separately, because a joint estimation of η* and $\sigma_e^2$ may perform poorly. Loosely speaking, the observed data involve N independent observations of the random noise and n independent observations of the random effects. When $N\gg n$, the convergence rates for estimating $\sigma_e^2$ and η* can have different magnitudes, and a joint estimation can lead to an ill-conditioned Hessian matrix and a non-sharp convergence rate. The sample splitting is for technical reasons: it is used to prove that the estimation error of $\hat\eta$ is independent of the error of the fixed-effects estimation.

Computationally, $\hat\sigma_e^2$ in (16) is a one-step estimator and (17) involves a convex optimization, which can be easily implemented. On the other hand, sample splitting can lead to sub-optimal finite-sample performance, and it is worthwhile to perform a cross-fitting step. That is, one can run another round of (16) and (17) with the samples in the two folds switched and report the average of the two estimates as the final estimate.
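A minimal numerical sketch of (16) and (17) follows. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the weighting matrix is taken to be Σ_a^i = a Z^i (Z^i)ᵀ + I as defined earlier in the paper, and the toy data treat β* as known so that the residuals are available without fitting a Lasso. Because (17) is a least-squares problem in η, the sketch solves its normal equations directly.

```python
import numpy as np

def inv_sqrt_sigma_a(Z, a):
    """(Sigma_a^i)^{-1/2}, ASSUMING Sigma_a^i = a * Z Z^T + I."""
    w, V = np.linalg.eigh(a * Z @ Z.T + np.eye(Z.shape[0]))
    return (V / np.sqrt(w)) @ V.T

def estimate_sigma_e2(residuals, Zs):
    """Equation (16): squared residuals after projecting out col(Z^i)."""
    num, den = 0.0, 0.0
    for r, Z in zip(residuals, Zs):
        P = np.eye(Z.shape[0]) - Z @ np.linalg.pinv(Z)  # projection onto col(Z)^perp
        num += np.sum((P @ r) ** 2)
        den += np.trace(P)
    if den <= 1e-8:
        raise ValueError("need at least one cluster with m_i > q")
    return num / den

def estimate_eta(residuals, Zs, basis, sigma_e2, a):
    """Equation (17): weighted Frobenius least squares via normal equations."""
    d = len(basis)
    H, g = np.zeros((d, d)), np.zeros(d)
    for r, Z in zip(residuals, Zs):
        W = inv_sqrt_sigma_a(Z, a)
        target = W @ (np.outer(r, r) - sigma_e2 * np.eye(len(r))) @ W
        A = [W @ Z @ G @ Z.T @ W for G in basis]
        for j in range(d):
            g[j] += np.sum(A[j] * target)
            for k in range(d):
                H[j, k] += np.sum(A[j] * A[k])
    return np.linalg.solve(H, g)

def fit_fold(residuals, Zs, basis, a):
    s2 = estimate_sigma_e2(residuals, Zs)
    return s2, estimate_eta(residuals, Zs, basis, s2, a)

# Toy data: beta* is treated as known, so r_i = Z^i gamma_i + eps_i.
rng = np.random.default_rng(0)
n, m, q = 600, 6, 2
basis = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
eta_true, sigma_e2_true = np.array([0.56, 0.30]), 0.25
Zs = [rng.standard_normal((m, q)) for _ in range(n)]
residuals = [Z @ (np.sqrt(eta_true) * rng.standard_normal(q))
             + np.sqrt(sigma_e2_true) * rng.standard_normal(m) for Z in Zs]

# Average the estimates from the two folds, mimicking the cross-fitting step.
half = n // 2
s2_a, eta_a = fit_fold(residuals[:half], Zs[:half], basis, a=2.0)
s2_b, eta_b = fit_fold(residuals[half:], Zs[half:], basis, a=2.0)
sigma_e2_hat = 0.5 * (s2_a + s2_b)
eta_hat = 0.5 * (eta_a + eta_b)
```

In an actual analysis the residuals on each fold would be computed from an initial fixed-effects estimate fitted on the other fold, which is the part this sketch leaves out.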

4.2. Upper bound analysis

In this subsection, we analyze the proposed estimator of the variance components. Let $D_G\in\mathbb{R}^{d\times d}$ be such that

$$\{D_G\}_{j,k}=\mathrm{Tr}(G_jG_k). \tag{18}$$

The matrix $D_G$ only depends on the pre-specified basis, and $\Lambda_{\min}(D_G)>0$ as $G_j$, $j=1,\dots,d$, are linearly independent.
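The link between linear independence and the smallest eigenvalue of the matrix in (18) can be checked numerically. The sketch below uses a hypothetical helper `gram_matrix` and two illustrative bases with q = 4: a pair of block-diagonal matrices, and the same pair with their sum appended, which violates (15).

```python
import numpy as np

def gram_matrix(basis):
    """D_G with entries Tr(G_j G_k), as in (18)."""
    d = len(basis)
    return np.array([[np.trace(basis[j] @ basis[k]) for k in range(d)]
                     for j in range(d)])

G1 = np.diag([1.0, 1.0, 0.0, 0.0])
G2 = np.diag([0.0, 0.0, 1.0, 1.0])

D_indep = gram_matrix([G1, G2])          # linearly independent basis
D_dep = gram_matrix([G1, G2, G1 + G2])   # third matrix is redundant

lam_indep = np.linalg.eigvalsh(D_indep).min()
lam_dep = np.linalg.eigvalsh(D_dep).min()
```

For the independent pair the smallest eigenvalue is q/2 = 2, while the redundant basis gives a singular Gram matrix, so its smallest eigenvalue is zero up to rounding.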

Theorem 4.1. (Convergence rate of variance components estimates)

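Assume that Conditions 3.1 and 3.2 hold and $\sum_{i\in I_2}\mathrm{Tr}((\Sigma_a^i)^{-1})\gg s\log p$. Then

$$|\hat\sigma_e^2-\sigma_e^2|=O_P\left(\Big(\sum_{i\in I_1}m_i\max\{0,1-q/m_i\}\Big)^{-1/2}+\frac{s\log p}{\sum_{i\in I_2}\mathrm{Tr}((\Sigma_a^i)^{-1})}\right).$$

If further $n\ge c_1\log d$ for some large enough $c_1$, $\min_{1\le i\le n}\Lambda_{\min}(\Sigma_z^i)\ge c_0/m_i>0$, and $0<c_1\le\Lambda_{\min}(D_G)\le\Lambda_{\max}(D_G)\le c_2<\infty$, then

$$\|\hat\eta-\eta^*\|_2=O_P\left(\sqrt{\frac{d\log d}{n}}\right).$$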

The convergence rate of $\hat\sigma_e^2$ depends on the effective sample size in $I_1$ as well as the estimation error of $\hat\beta^{(2)}$. In comparison to the rate of variance estimation in linear models (Verzelen 2012), the current result replaces the total sample size with the effective sample size. On the other hand, $\hat\eta$ has the typical parametric rate when there are d unknown parameters and n independent observations of random effects. The estimation error of $\hat\eta$ is independent of the error of the fixed effects estimation.

In terms of conditions, $D_G$ depends on pre-specified basis matrices and its eigenvalues are positive and bounded in many cases. Consider the class of basis matrices $G_{j,k}\in\mathbb{R}^{q\times q}$ such that $(G_{j,k})_{l,l'}=(G_{j,k})_{l',l}=1$ if $l=j$, $l'=k$, and $(G_{j,k})_{l,l'}=0$ otherwise. For any $1\le d\le q(q+1)/2$, it is easy to check that in this case $\Lambda_{\min}(D_G)=\Lambda_{\max}(D_G)=1$. For independent sub-Gaussian rows $Z^i_{k,\cdot}$, $\min_{1\le i\le n}\Lambda_{\min}(\Sigma_z^i)\ge c_0$ with high probability if $q\log n\ll\min_{1\le i\le n}m_i$. To summarize, the estimators proposed in Section 4.1 are mainly for the scenario where $q\le c_0\min_{1\le i\le n}m_i$ for sufficiently small $c_0$.

4.3. Rate optimality of estimating variance components

Now we turn to study the minimax lower bound for estimating the variance parameters. We consider the random components satisfying (10) and parameter space Ξ(s, Z) (11).

Theorem 4.2. (Minimax lower bounds for estimation of variance components.)

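Suppose that (1) and (10) are true. If $s\le c\min\{\mathrm{Tr}(\Sigma_a^{-1})/\log p,\ p^{\nu}\}$ for $0<\nu<1/2$ and $c>0$ for some $c_0>0$, then there exist some constants $c_1, c_2, c_3>0$ such that

$$\inf_{\hat\sigma_e^2}\sup_{v\in\Xi(s,Z)}\mathbb{P}_v\left(|\hat\sigma_e^2-\sigma_e^2|\ge c_1\mathrm{Tr}^{-1/2}(\Sigma_a^{-1})+c_2\frac{s\log(p/s^2)}{\mathrm{Tr}(\Sigma_a^{-1})}\ \Big|\ Z\right)\ge\frac{1}{4}.$$

If further $\Lambda_{\max}(D_G)\le C<\infty$,

$$\inf_{\hat\eta}\sup_{v\in\Xi(s,Z)}\mathbb{P}_v\left(\|\hat\eta-\eta^*\|_2\ge c_3n^{-1/2}\ \Big|\ Z\right)\ge\frac{1}{4}.$$

Theorem 4.2 and Theorem 4.1 together imply that $\hat\sigma_e^2$ is rate optimal in Ξ(s, Z) under the conditions of Theorem 4.2 when $\mathrm{Tr}(\Sigma_a^{-1})\asymp\sum_{i=1}^{n}m_i\max\{0,1-q/m_i\}$. As explained after Lemma 3.1, in the case where group sizes are equal and $q/m\le c_0<1$, $\sum_{i=1}^{n}\max\{0,m_i-q\}\asymp\mathrm{Tr}(\Sigma_a^{-1})\asymp N$. Moreover, $\hat\eta$ is rate optimal when d is finite. When d grows, regularized estimators of η* can have smaller estimation error than $\hat\eta$, similar to the famous Stein’s phenomenon.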

5. Simulation results

In this section, we present simulation results to evaluate the empirical performance of the proposed methods and compare it with some related methods. We examine the effect of a on estimation and inference of the fixed effects.

We generate data as follows. We set N = 144 and p = 300. The rows of (X, Z) are generated i.i.d. from a normal distribution with mean zero and covariance such that $\Sigma_x=I_p$, $\Sigma_z=I_q$, and $(\Sigma_{x,z})_{k,j}=\rho^{j}$ for $1\le j,k\le q$ and $(\Sigma_{x,z})_{k,\cdot}=0$ for $k>q$. That is, the correlation between $X_j$ and Z is nonzero if $j\le q$ and is 0 if $j>q$. The random noises are generated i.i.d. via $\epsilon_i\sim N(0,0.25I_{m_i})$ and the random effects are generated i.i.d. via $\gamma_i\sim N(0,\Psi)$. We consider q ∈ {2, 8, 14}. The matrix Ψ will be specified later. The responses y are generated via model (1) with s = 5, $\beta^*_{1:5}=(1,0.5,0.2,0.1,0.05)$, and equal cluster sizes, i.e. $m_1=\dots=m_n=m$. Each setting is replicated with 300 independent Monte Carlo simulations.
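The data-generating steps can be sketched as follows for the case ρ = 0, in which X and Z are uncorrelated. This is an illustrative sketch, not the authors' simulation code: the function name `simulate_lmm` is hypothetical, and the Ψ used in the example call is only a placeholder, since the paper specifies Ψ in Section 5.1.

```python
import numpy as np

def simulate_lmm(n, m, p, q, beta, Psi, sigma_e2=0.25, seed=0):
    """One draw from y^i = X^i beta + Z^i gamma_i + eps_i with rho = 0."""
    rng = np.random.default_rng(seed)
    N = n * m
    X = rng.standard_normal((N, p))        # Sigma_x = I_p
    Z = rng.standard_normal((N, q))        # Sigma_z = I_q, uncorrelated with X
    cluster = np.repeat(np.arange(n), m)   # cluster label of each row
    # One q-vector of random effects per cluster, gamma_i ~ N(0, Psi).
    gamma = rng.multivariate_normal(np.zeros(q), Psi, size=n)
    eps = np.sqrt(sigma_e2) * rng.standard_normal(N)
    y = X @ beta + np.sum(Z * gamma[cluster], axis=1) + eps
    return X, Z, y, cluster

p, q, m = 300, 2, 4
n = 144 // m                               # N = n * m = 144
beta = np.zeros(p)
beta[:5] = [1, 0.5, 0.2, 0.1, 0.05]        # s = 5 nonzero coefficients
Psi = 0.56 * np.eye(q)                     # placeholder covariance (assumption)
X, Z, y, cluster = simulate_lmm(n, m, p, q, beta, Psi)
```

Handling ρ ≠ 0 would require drawing (X, Z) jointly from the stated covariance, which this sketch does not attempt.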

5.1. Statistical inference for fixed effects

We first examine the empirical performance of the proposed confidence intervals (7) and hypothesis testing based on $\hat\beta_j^{(db)}$. We consider two covariance matrices of random effects, a “positive definite Ψ” where $\Psi_{j,k}=0.56^{|j-k|}$ for $1\le j,k\le q$, and a “singular Ψ” with a diagonal Ψ where $\Psi_{j,j}=0.56$ for $1\le j\le q/2$ and $\Psi_{j,j}=0$ otherwise. For the proposed method, we first choose a by cross-validation using the error criterion $\|y-X\hat\beta(a)\|_2^2$, where $\hat\beta(a)$ is the proposed estimate associated with a specific a. The tuning parameter λ is chosen as $\hat\sigma^{(init)}\sqrt{2\log p/N}$, where $\hat\sigma^{(init)}$ is computed via the scaled-Lasso (Sun and Zhang 2012) with observations $\{X_a,y_a\}$. For computing $\hat\beta_j^{(db)}$, the tuning parameters $\lambda_j$ are set to be $\hat\sigma_x\sqrt{2\log p/N}$, where $\hat\sigma_x$ is computed via the scaled-Lasso with observations $\{(X_a)_{\cdot,-j},(X_a)_{\cdot,j}\}$. The tuning parameters for BCG19 are chosen as in Section 5 of Bradic et al. (2019).

We see from Table 1 that the coverage probabilities of the proposed confidence intervals are close to the nominal level in most scenarios. It shows that the proposed method is robust to large m and q and singular Ψ. We see that the confidence intervals have shorter lengths when m increases. This is because when q is fixed and m grows, the effective sample size $\mathrm{Tr}(\Sigma_a^{-1})$ increases and the proposed estimators have smaller estimation errors. See Table 1 in the Supplementary Materials for details. When q grows and m is fixed, the effective sample size $\mathrm{Tr}(\Sigma_a^{-1})$ is smaller and the proposed estimators have larger estimation errors. The results for ρ = 0.2 are reported in Table 2 of the Supplementary Materials.
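The two choices of Ψ and the tuning-parameter formula can be written down directly. The sketch below reads the positive definite case as Ψ_{j,k} = 0.56^{|j−k|} and the tuning parameter as σ̂ √(2 log p / N); the function names are hypothetical, and the noise level passed to `universal_lambda` stands in for a scaled-Lasso estimate that is not computed here.

```python
import numpy as np

def psi_positive_definite(q, r=0.56):
    """Psi[j, k] = r ** |j - k| for 0 <= j, k < q."""
    idx = np.arange(q)
    return r ** np.abs(idx[:, None] - idx[None, :])

def psi_singular(q, v=0.56):
    """Diagonal Psi with v on the first q/2 entries and 0 elsewhere."""
    half = q // 2
    return np.diag([v] * half + [0.0] * (q - half))

def universal_lambda(sigma_hat, p, N):
    """sigma_hat * sqrt(2 log p / N); sigma_hat would come from the scaled Lasso."""
    return sigma_hat * np.sqrt(2.0 * np.log(p) / N)

Psi_pd = psi_positive_definite(8)
Psi_sing = psi_singular(8)
lam = universal_lambda(1.0, p=300, N=144)   # unit noise level, for illustration only
```

With q = 8 the singular Ψ has rank 4, so half of the random-effect coordinates are identically zero in that design.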

Table 1.

95% confidence intervals given by the proposed approach with positive definite Ψ (PD) and singular Ψ (Sing.) when ρ = 0. “cov(0.5)” and “cov(0)” denote the coverage probabilities for βj = 0.5 and βj = 0, respectively. “sd(0.5)” and “sd(0)” denote the standard deviations for βj = 0.5 and βj = 0, respectively.

| q | m | PD cov(0.5) | PD cov(0) | PD sd(0.5) | PD sd(0) | Sing. cov(0.5) | Sing. cov(0) | Sing. sd(0.5) | Sing. sd(0) |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 4 | 0.940 | 0.957 | 0.068 | 0.062 | 0.953 | 0.943 | 0.062 | 0.056 |
| 2 | 8 | 0.943 | 0.967 | 0.063 | 0.053 | 0.938 | 0.981 | 0.058 | 0.049 |
| 2 | 12 | 0.943 | 0.967 | 0.061 | 0.049 | 0.967 | 0.948 | 0.059 | 0.047 |
| 8 | 4 | 0.943 | 0.960 | 0.195 | 0.177 | 0.957 | 0.943 | 0.128 | 0.111 |
| 8 | 8 | 0.960 | 0.940 | 0.148 | 0.123 | 0.933 | 0.919 | 0.105 | 0.088 |
| 8 | 12 | 0.943 | 0.973 | 0.106 | 0.083 | 0.957 | 0.976 | 0.085 | 0.066 |
| 14 | 4 | 0.937 | 0.953 | 0.276 | 0.264 | 0.976 | 0.948 | 0.173 | 0.153 |
| 14 | 8 | 0.937 | 0.947 | 0.243 | 0.217 | 0.924 | 0.929 | 0.158 | 0.132 |
| 14 | 12 | 0.933 | 0.950 | 0.202 | 0.166 | 0.981 | 0.924 | 0.148 | 0.112 |
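To judge how close an empirical coverage is to the nominal level, it helps to know the Monte Carlo error of a proportion estimated from 300 replications. The short sketch below computes the binomial standard error at the nominal coverage 0.95; the two-standard-error band is a rule of thumb added here for illustration, not something reported in the paper.

```python
import math

def mc_standard_error(prob, reps):
    """Binomial standard error of an empirical proportion."""
    return math.sqrt(prob * (1.0 - prob) / reps)

se = mc_standard_error(0.95, 300)
band = (0.95 - 2 * se, 0.95 + 2 * se)   # rough two-standard-error band
```

The standard error is about 0.013, so empirical coverages between roughly 0.925 and 0.975 are compatible with the nominal level, which is where most entries of Table 1 fall.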

In Table 2, we report the type-I error and power of our proposed method and those of BCG19. The computational time for our proposal is around 8s per experiment and that for BCG19 is around 20s per experiment. Ideally, the rejection rate for the true null should be close to 5% and the rejection rates for $\beta_j^*\in\{1,0.5,0.2\}$ should be larger than 5%. We see that both our proposal and BCG19 are effective at controlling the type-I error. However, BCG19 is less powerful than our proposal when q is large and Ψ is positive definite. When Ψ is singular, the two methods have comparable performance in most scenarios.

Table 2.

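The rejection rate for testing $H_0:\beta_j^*=0$ at the 95% level for $\beta_j^*\in\{1,0.5,0.2,0\}$ with positive definite Ψ (p.d.) and singular Ψ when ρ = 0. The column headers give the method and the value of $\beta_j^*$.

| Ψ | q | m | Proposed: 1 | Proposed: 0.5 | Proposed: 0.2 | Proposed: 0 | BCG19: 1 | BCG19: 0.5 | BCG19: 0.2 | BCG19: 0 |
|---|---|---|---|---|---|---|---|---|---|---|
| p.d. | 2 | 4 | 1 | 1 | 0.793 | 0.043 | 1 | 1 | 0.627 | 0.043 |
| p.d. | 2 | 8 | 1 | 1 | 0.880 | 0.033 | 1 | 1 | 0.850 | 0.040 |
| p.d. | 2 | 12 | 1 | 1 | 0.940 | 0.033 | 1 | 1 | 0.936 | 0.040 |
| p.d. | 8 | 4 | 0.997 | 0.713 | 0.163 | 0.040 | 0.987 | 0.593 | 0.150 | 0.067 |
| p.d. | 8 | 8 | 1 | 0.923 | 0.243 | 0.060 | 0.987 | 0.747 | 0.173 | 0.060 |
| p.d. | 8 | 12 | 1 | 1 | 0.423 | 0.027 | 1 | 0.840 | 0.207 | 0.030 |
| p.d. | 14 | 4 | 0.943 | 0.437 | 0.117 | 0.047 | 0.867 | 0.327 | 0.123 | 0.037 |
| p.d. | 14 | 8 | 0.967 | 0.487 | 0.113 | 0.053 | 0.927 | 0.397 | 0.107 | 0.057 |
| p.d. | 14 | 12 | 0.993 | 0.610 | 0.157 | 0.050 | 0.930 | 0.477 | 0.150 | 0.060 |
| singular | 2 | 4 | 1 | 1 | 0.895 | 0.057 | 1 | 1 | 0.900 | 0.029 |
| singular | 2 | 8 | 1 | 1 | 0.952 | 0.020 | 1 | 1 | 0.943 | 0.024 |
| singular | 2 | 12 | 1 | 1 | 0.914 | 0.052 | 1 | 1 | 0.927 | 0.057 |
| singular | 8 | 4 | 1 | 0.995 | 0.362 | 0.057 | 1 | 0.976 | 0.371 | 0.052 |
| singular | 8 | 8 | 1 | 0.986 | 0.438 | 0.071 | 1 | 0.986 | 0.438 | 0.062 |
| singular | 8 | 12 | 1 | 1 | 0.638 | 0.024 | 1 | 1 | 0.567 | 0.047 |
| singular | 14 | 4 | 1 | 1 | 0.638 | 0.024 | 1 | 1 | 0.567 | 0.048 |
| singular | 14 | 8 | 1 | 0.895 | 0.228 | 0.043 | 1 | 0.890 | 0.233 | 0.052 |
| singular | 14 | 12 | 1 | 0.895 | 0.229 | 0.067 | 1 | 0.890 | 0.233 | 0.052 |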

Table 3 demonstrates the effect of a on estimation and inference of the fixed effects. The results for singular Ψ are reported in the Supplementary Materials. We see that choosing a = 0 can lead to large estimation errors and significantly wider confidence intervals. This implies that the Lasso for linear models is less accurate than our proposed methods with a > 0. In all the scenarios of (q, m), we see that the estimation error first decreases as a increases and then increases as a increases. This phenomenon agrees with Remark 3.2. For the inference results, the proposed confidence interval has the desired coverage probabilities as long as a is not too large. We see that setting a = 0 has coverage probabilities close to the nominal level but the confidence intervals are significantly wider than setting a > 0. This implies that using the linear debiased Lasso can lead to low power in hypothesis testing for mixed-effects models.

Table 3.

Effect of different a on the sum of squared errors (SSE) for estimating β* and on the accuracy of confidence intervals with positive definite Ψ and ρ = 0. “cov(0.5)” and “cov(0)” denote the coverage probabilities for βj = 0.5 and βj = 0, respectively. “sd(0.5)” and “sd(0)” denote the standard deviations for βj = 0.5 and βj = 0, respectively.

| (q, m) | a | SSE | $\mathrm{Tr}(\Sigma_a^{-1})$ | cov(0.5) | cov(0) | sd(0.5) | sd(0) |
|---|---|---|---|---|---|---|---|
| (2,4) | 0 | 0.321 | 144.0 | 0.948 | 0.967 | 0.138 | 0.122 |
| (2,4) | 2 | 0.134 | 87.3 | 0.962 | 0.962 | 0.074 | 0.065 |
| (2,4) | 4 | 0.113 | 81.4 | 0.933 | 0.952 | 0.070 | 0.062 |
| (2,4) | 8 | 0.111 | 77.7 | 0.933 | 0.957 | 0.068 | 0.062 |
| (2,4) | 16 | 0.103 | 75.2 | 0.900 | 0.967 | 0.065 | 0.060 |
| (2,4) | 32 | 0.105 | 73.8 | 0.890 | 0.933 | 0.065 | 0.060 |
| (8,8) | 0 | 0.753 | 144.0 | 0.943 | 0.943 | 0.261 | 0.235 |
| (8,8) | 2 | 0.338 | 34.5 | 0.952 | 0.962 | 0.150 | 0.122 |
| (8,8) | 4 | 0.342 | 26.2 | 0.948 | 0.933 | 0.148 | 0.121 |
| (8,8) | 8 | 0.349 | 19.6 | 0.943 | 0.967 | 0.144 | 0.119 |
| (8,8) | 16 | 0.350 | 14.8 | 0.910 | 0.962 | 0.143 | 0.116 |
| (8,8) | 32 | 0.402 | 10.9 | 0.895 | 0.967 | 0.142 | 0.119 |
| (14,12) | 0 | 0.961 | 144.0 | 0.948 | 0.967 | 0.344 | 0.316 |
| (14,12) | 2 | 0.531 | 19.9 | 0.981 | 0.948 | 0.211 | 0.167 |
| (14,12) | 4 | 0.501 | 12.9 | 0.938 | 0.967 | 0.199 | 0.162 |
| (14,12) | 8 | 0.515 | 8.1 | 0.938 | 0.971 | 0.200 | 0.167 |
| (14,12) | 16 | 0.504 | 5.0 | 0.948 | 0.957 | 0.200 | 0.161 |
| (14,12) | 32 | 0.551 | 2.9 | 0.905 | 0.967 | 0.190 | 0.155 |
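The effective sample size column of Table 3 can be mimicked with a few lines of code. The sketch assumes, as in the earlier sections of the paper, that the weighting matrix for cluster i is Σ_a^i = a Z^i (Z^i)ᵀ + I, so that the trace of its inverse equals the cluster size at a = 0 and shrinks as a grows. The function name and the random design are illustrative, so the numbers will not match the table exactly.

```python
import numpy as np

def effective_sample_size(Zs, a):
    """Sum over clusters of Tr((a Z Z^T + I)^{-1}); ASSUMED form of Sigma_a^i."""
    total = 0.0
    for Z in Zs:
        m = Z.shape[0]
        total += np.trace(np.linalg.inv(a * Z @ Z.T + np.eye(m)))
    return total

rng = np.random.default_rng(1)
n, m, q = 36, 4, 2                      # the (q, m) = (2, 4) design with N = 144
Zs = [rng.standard_normal((m, q)) for _ in range(n)]
a_grid = [0, 2, 4, 8, 16, 32]
ess = [effective_sample_size(Zs, a) for a in a_grid]
```

At a = 0 the value is N = 144, and for large a it approaches n(m − q) = 72 from above, which agrees with the pattern in the (2,4) rows of the table.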

5.2. Estimating variance components

In this subsection, we consider estimating variance components with the proposed method. The true fixed effects and data generation steps are the same as in Section 5.1. We use the whole data to estimate $\sigma_e^2$ and η*. We set $\sigma_e^2=0.25$. We first consider diagonal Ψ with d = 2. The basis matrices are set to be

$$G_1=\begin{pmatrix}I_{q/2}&0\\0&0\end{pmatrix}\quad\text{and}\quad G_2=\begin{pmatrix}0&0\\0&I_{q/2}\end{pmatrix}.$$

For the positive definite (full-rank diagonal) Ψ, η* = (0.56, 0.56). For the singular Ψ, η* = (0.56, 0). Table 4 shows the mean absolute errors for estimating $\sigma_e^2$ (mae.σe2), $\eta_1^*$ (mae.η1), and $\eta_2^*$ (mae.η2). A scenario with relatively large d is reported in the Supplementary Materials (Table 4).

Table 4.

Estimation of the variance components with the proposed method for positive definite and singular Ψ when ρ = 0.

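| m | q | PD mae.σe2 | PD mae.η1 | PD mae.η2 | Sing. mae.σe2 | Sing. mae.η1 | Sing. mae.η2 |
|---|---|---|---|---|---|---|---|
| 4 | 2 | 0.115 | 0.206 | 0.207 | 0.076 | 0.150 | 0.050 |
| 8 | 2 | 0.091 | 0.212 | 0.249 | 0.070 | 0.171 | 0.020 |
| 8 | 4 | 0.122 | 0.164 | 0.166 | 0.071 | 0.136 | 0.027 |
| 12 | 2 | 0.087 | 0.268 | 0.245 | 0.076 | 0.197 | 0.015 |
| 12 | 6 | 0.126 | 0.163 | 0.160 | 0.078 | 0.116 | 0.019 |

PD: positive definite Ψ. Sing.: singular Ψ.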

6. Application to a genome-wide association study in a mouse population

We apply the proposed method to estimate the effects of genetic variants on the body mass index (BMI) in a heterogeneous stock mice population generated by the Wellcome Trust Centre for Human Genetics (http://gscan.well.ox.ac.uk). The data are available in the R package “BGLR” (Perez and de los Campos 2014). The dataset consists of 1,814 mice, each genotyped over 10,346 polymorphic markers (SNPs), and has been used for genome-wide genetic association studies of multiple traits (Shifman et al. 2006; Valdar et al. 2006). This mice population consists of 8 litters, and the mice were housed in 523 different cages with varying numbers of mice per cage. The distribution of cage density is shown in Figure 1. We are interested in identifying the genetic variants that are associated with the BMI phenotype. The measurements of BMI are transformed as described in Valdar et al. (2006) so that the data distribution is close to normal. In many mice experiments, cages contribute significant environmental effects to phenotypes such as BMI, and mice in the same cage tend to be correlated in their phenotype measurements. It is therefore important to account for such cage effects in genetic association studies, and the linear mixed-effects model can be employed.

Fig. 1.

Cage density of the stock mice population. The average density of cages with at least two individuals is 3.70.

In the current analysis, we incorporate the effect of cages as a single random effect and consider the following model

$$Y_{i,k}=\beta_0+\sum_{j=1}^{10346}\beta_jX^i_{j,k}+\tau_1\,\mathrm{age}_{i,k}+\tau_2\,\mathrm{gender}_{i,k}+\gamma_i+\epsilon_{i,k},$$

where $Y_{i,k}$ is the BMI of the kth mouse in the ith cage, $X^i_{j,k}$ is the numerical genotype at genetic variant j for the kth mouse in cage i, $\beta_0$ and $\beta_j$ are the regression coefficients corresponding to the intercept and genetic variants, $\tau_1$ and $\tau_2$ are the regression coefficients for age and gender, and $\gamma_i$ is the cage-specific random effect for the ith cage. For cages with only one individual, we only fit the fixed effects.

The fixed effects are estimated via a weighted Lasso. To mitigate the relatively high correlation among the columns of the design, we first compute ridge regression estimates of the fixed effects, say $\hat\beta^{(rr)}$, with the tuning parameter chosen by cross-validation, and use the normalized $\{1/|\hat\beta_j^{(rr)}|\}_{j=1}^{p}$ (rescaled to sum to p) as the weights for the Lasso estimates. The regression coefficient $\hat\beta$ is obtained by fitting (4) with tuning parameter $\lambda=0.655\times\sqrt{2\log p/N}$, where 0.655 is the noise level estimated by the scaled Lasso. In terms of statistical inference, we compute the debiased Lasso estimates of the fixed effects via (5) and their variances according to (8). According to cross-validation, we set a = 2.
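The weighting step can be sketched as follows. This is an illustration rather than the authors' code: the function names are hypothetical, the ridge penalty `alpha` is a placeholder for the cross-validated value, and a small constant guards against division by zero. The last lines use the standard fact that a weighted Lasso is equivalent to an ordinary Lasso on a design whose jth column is divided by its weight, so any Lasso solver can be applied to `X_tilde` and the solution mapped back by dividing coordinate j by the same weight.

```python
import numpy as np

def ridge_dual(X, y, alpha):
    """Ridge estimate via the N x N dual system, convenient when p > N."""
    N = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + alpha * np.eye(N), y)

def lasso_weights(beta_rr, eps=1e-8):
    """Weights proportional to 1 / |beta_rr_j|, rescaled to sum to p."""
    w = 1.0 / (np.abs(beta_rr) + eps)   # eps is a numerical safeguard (assumption)
    return w * (len(w) / w.sum())

# Toy design with p > N, for illustration only.
rng = np.random.default_rng(2)
N, p = 20, 50
X = rng.standard_normal((N, p))
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.standard_normal(N)

beta_rr = ridge_dual(X, y, alpha=1.0)   # alpha would be chosen by cross-validation
w = lasso_weights(beta_rr)

# Penalty lambda * sum_j w_j |beta_j| becomes a plain l1 penalty after rescaling.
X_tilde = X / w                         # column j divided by w_j
```

Coefficients with large ridge estimates receive small weights and are therefore penalized less.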

6.1. Identification of BMI associated genetic variants

We control the false discovery rate (FDR) at 5% using the procedure proposed in Xia et al. (2018). Our method identifies 14 covariates with p-value threshold $6.7\times10^{-5}$. The QQ plot of the z-scores of all the covariates is given in the left panel of Figure 2. It shows some deviation from the standard normal distribution at both tails, indicating that some variants can be associated with BMI. Some of the genetic variants identified are in or near genes known to be associated with body growth, body size, metabolism or obesity. For example, SNP rs13478535 is a variant in the Auts2 gene, which has been shown to be related to either low birth weight or small stature in mice (Gao et al. 2014). SNP rs13481413 is one of the genetic variants in the gene Immp2l, which is associated with food intake and body weight (Han et al. 2013). cAMP response element binding protein (Crebbp) has been postulated to play an important role downstream of the melanocortin-4 receptor and may affect other pathways that are implicated in the regulation of body weight (Chiappini et al. 2011).
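The reported threshold can be related to the z-scores listed in Table 5 by converting each z-score to a two-sided normal p-value. The sketch below only performs this conversion; it does not implement the FDR procedure of Xia et al. (2018), and the variable names are illustrative.

```python
import math

def two_sided_p(z):
    """Two-sided standard normal p-value, 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# z-scores copied from Table 5.
z_scores = {
    "rs13476390": -5.11, "rs13478535": -4.42, "rs3023058": 4.22,
    "rs13480072": -4.52, "rs13481413": -4.22, "rs13481961": 6.27,
    "rs4139535": 4.51, "rs13482464": -4.01, "rs4152477": 4.17,
    "rs6251709": 4.49, "rs6185805": -4.22, "gnf18.028.738": -4.34,
    "gnfX.070.167": -4.24, "gender": 19.63,
}
threshold = 6.7e-5
p_values = {name: two_sided_p(z) for name, z in z_scores.items()}
selected = [name for name, pv in p_values.items() if pv < threshold]
```

All 14 listed covariates fall below the threshold, and the smallest absolute z-score in the table (4.01) gives a p-value just under it.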

Fig. 2.

The normal QQ-plots of the z-scores of the debiased Lasso estimators with the proposed approach (left) and the debiased Lasso estimates with a = 0 (right) for 10348 fixed effects. The straight reference line passes through the first and third quartiles of the z-scores.

We also consider applying the proposed procedure with a = 0. This is equivalent to applying the Lasso to fit the linear model without considering the random cage effects. The tuning parameters are chosen in the same way as above. Only gender is selected as nonzero at FDR level 0.05. This is possibly due to model misspecification and larger variances of the debiased Lasso estimators. The QQ-plot of the z-scores based on the debiased Lasso estimation of the linear model (right panel of Figure 2) shows that the z-scores clearly deviate from the standard normal distribution. These results indicate that the proposed estimation and inference methods for the linear mixed-effects model indeed provide an effective way of identifying important genetic variants associated with BMI in mice.

6.2. Evaluation of cage effect

For estimating the variance components, we only use the clusters with at least two observations. The estimated variance of the random effects is 0.202 and the estimated variance of the noise is 0.209. We compute the standard error of the estimated variance of the random effects assuming that the random components are normally distributed. The estimated standard deviation is 0.018, which indicates a strong cage effect.
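As a rough reading of these numbers, and relying on the normal approximation mentioned above, the estimate sits more than eleven standard errors away from zero. The interval and the variance share below are simple arithmetic on the reported values, added here for illustration and not reported in the paper:

$$\frac{0.202}{0.018}\approx 11.2,\qquad 0.202\pm1.96\times0.018\approx(0.167,\ 0.237),\qquad \frac{0.202}{0.202+0.209}\approx 0.49.$$

The last ratio is the share of the total (cage plus noise) variance attributed to the cage effect under the fitted model, i.e. an intraclass correlation of about one half.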

7. Discussion

The present paper considers estimation and inference of unknown parameters in a high-dimensional linear mixed-effects model. Optimal rate of convergence for estimation was established and rate-optimal estimators were developed. The proposed methods have general applicability in modeling repeated measures and longitudinal data, especially when the cluster sizes are large or heterogeneous. The desirable properties of the proposed estimators are mainly due to the proper approximations of the unknown oracle weighting matrix Σθ*. Our proposed estimation procedure is computationally efficient and does not require strong distributional assumptions on the random effects and error distributions.

The proposed methods have important applications in large-scale genetic association studies in humans, including both family-based studies where the kinship coefficients can be used to specify the random effects and population cohort studies where the random effects can be used to adjust for population stratification (Yang et al. 2014). Instead of considering one genetic variant at a time as in typical mixed-effects models in genetic association studies (Yang et al. 2014), our model considers all the variants jointly. We expect gain in power in detecting phenotype-associated genetic variants by allowing for flexible random effects and by considering all genetic variants jointly using high-dimensional mixed-effects models studied in this paper.

Supplementary Material

Supp 1
Supp 2
Supp 3

Table 5.

Selected covariates at FDR level 0.05. 13 SNPs and gender are selected at FDR ≤ 0.05 and their z-scores are reported. The genes where the SNPs are located are presented when they are available.

| SNP | Gene | z-score | SNP | Gene | z-score |
|---|---|---|---|---|---|
| rs13476390 | | −5.11 | rs13482464 | | −4.01 |
| rs13478535 | Auts2 | −4.42 | rs4152477 | Crebbp | 4.17 |
| rs3023058 | Srrm3 | 4.22 | rs6251709 | Csnka2ip | 4.49 |
| rs13480072 | | −4.52 | rs6185805 | Mtcl1 | −4.22 |
| rs13481413 | Immp2l | −4.22 | gnf18.028.738 | | −4.34 |
| rs13481961 | | 6.27 | gnfX.070.167 | | −4.24 |
| rs4139535 | | 4.51 | gender | | 19.63 |

FUNDING

This research was supported by NIH grants R01GM123056 and R01GM129781. Tony Cai’s research was also supported in part by NSF grants DMS-1712735 and DMS-2015259.

Footnotes

SUPPLEMENTARY MATERIALS

In the online Supplemental Materials, we provide proofs of all the theorems and lemmas and more numerical studies.

Contributor Information

Sai Li, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104.

T. Tony Cai, Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104.

Hongzhe Li, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104.

References

  1. Ahn M, Zhang HH, and Lu W (2012). Moment-Based Method for Random Effects Selection in Linear Mixed Models. Stat Sin 100 (2), 130–134.
  2. Bradic J, Claeskens G, and Gueuning T (2019). Fixed effects testing in high-dimensional linear mixed models. Journal of the American Statistical Association, 1–16.
  3. Bühlmann P and van de Geer S (2015). High-dimensional inference in misspecified linear models. Electronic Journal of Statistics 9 (1), 1449–1473.
  4. Cai TT and Guo Z (2017). Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity. Annals of Statistics 45 (2), 615–646.
  5. Cai TT and Guo Z (2020). Semi-supervised inference for explained variance in high-dimensional regression and its applications. Journal of the Royal Statistical Society: Series B 82, 391–419.
  6. Cai TT, Guo Z, and Ma R (2020). Statistical inference for high-dimensional generalized linear models with binary outcomes. Technical Report.
  7. Candes E and Tao T (2007). The Dantzig selector: Statistical estimation when p is much larger than n. The Annals of Statistics 35 (6), 2313–2351.
  8. Chiappini F, Cunha L, Harris J, and Hollenberg A (2011). Lack of cAMP-response element-binding protein 1 in the hypothalamus causes obesity. J Biol Chem. 286, 8094–8105.
  9. Demidenko E (2004). Mixed Models: Theory and Applications. Wiley.
  10. Dezeure R, Bühlmann P, and Zhang C-H (2017). High-dimensional simultaneous inference with the bootstrap. TEST 26 (4), 685–719.
  11. Fan J and Li R (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96 (456), 1348–1360.
  12. Fan Y and Li R (2012). Variable selection in linear mixed effects models. Annals of Statistics 40 (4), 2043–2068.
  13. Fang EX, Ning Y, and Liu H (2017). Testing and confidence intervals for high dimensional proportional hazards models. Journal of the Royal Statistical Society. Series B: Statistical Methodology 79 (5), 1415–1437.
  14. Gao Z, Lee P, and Stafford JM (2014). Auts2 confers gene activation to polycomb group proteins in the CNS. Nature 516, 349–354.
  15. Goldstein H (2011). Multilevel statistical models, Volume 922. John Wiley & Sons.
  16. Gumedze FN and Dunne TT (2011). Parameter estimation and inference in the linear mixed model. Linear Algebra and Its Applications 435 (8), 1920–1944.
  17. Han C, Zhao Q, and Lu B (2013). The role of nitric oxide signaling in food intake; insights from the inner mitochondrial membrane peptidase 2 mutant mice. Redox Biology 1 (1), 498–507.
  18. Javanmard A and Montanari A (2014). Confidence intervals and hypothesis testing for high-dimensional regression. The Journal of Machine Learning Research 15, 2869–2909.
  19. Jiang J, Li C, Paul D, Yang C, and Zhao H (2016). On high-dimensional misspecified mixed model analysis in genome-wide association study. The Annals of Statistics 44 (5), 2127–2160.
  20. Laird NM and Ware JH (1982). Random-effects models for longitudinal data. Biometrics 38 (4), 963–974.
  21. Lin X (1997). Variance component testing in generalised linear models with random effects. Biometrika 84 (2), 309–326.
  22. Meinshausen N and Bühlmann P (2010). Stability selection. Journal of the Royal Statistical Society. Series B: Statistical Methodology 72 (4), 417–473.
  23. Miller JJ (1977). Asymptotic properties of maximum likelihood estimates in the mixed model of the analysis of variance. The Annals of Statistics 5 (4), 746–762.
  24. Peng H and Lu Y (2012). Model selection in linear mixed effect models. Journal of Multivariate Analysis 109, 109–129.
  25. Perez P and de los Campos G (2014). Genome-wide regression and prediction with the BGLR statistical package. Genetics 198 (2), 483–495.
  26. Pinheiro JC and Bates DM (2000). Mixed-Effects Models in S and S-PLUS. Springer.
  27. Schelldorfer J, Bühlmann P, and van de Geer S (2011). Estimation for high-dimensional linear mixed-effects models using ℓ1-penalization. Scandinavian Journal of Statistics 38 (2), 197–214.
  28. Self SG and Liang K-Y (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Association 82 (398), 605.
  29. Shifman S, Bell J, and Copley R (2006). A high-resolution single nucleotide polymorphism genetic map of the mouse genome. PLoS Biology 4 (12), e395.
  30. Stram DO and Lee JW (1994). Variance components testing in the longitudinal mixed effects model. Biometrics, 1171–1177.
  31. Sun T and Zhang C-H (2012). Scaled sparse linear regression. Biometrika 99 (4), 879–898.
  32. Sun Y, Zhang W, and Tong H (2007). Estimation of the covariance matrix of random effects in longitudinal studies. Annals of Statistics 35 (6), 2795–2814.
  33. Tibshirani R (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58 (1), 267–288.
  34. Valdar W, Solberg L, Gauguier D, and others (2006). Genome-wide genetic association of complex traits in heterogeneous stock mice. Nature Genetics 38 (8), 879–887.
  35. van de Geer S, Bühlmann P, Ritov Y, and Dezeure R (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. Annals of Statistics 42 (3), 1166–1202.
  36. Verbeke G and Molenberghs G (2003). The use of score tests for inference on variance components. Biometrics 59 (2), 254–262.
  37. Verzelen N (2012). Minimax risks for sparse regressions: Ultra-high dimensional phenomenons. Electronic Journal of Statistics 6, 38–90.
  38. Wang X, Guo X, and He M (2011). Statistical inference in mixed models and analysis of twin and family data. Biometrics 67 (3), 987–995.
  39. Xia Y, Cai T, and Cai TT (2018). Two-sample tests for high-dimensional linear regression with an application to detecting interactions. Statistica Sinica 28, 63–92.
  40. Yang J, Zaitlen N, Goddard M, Visscher P, and Price A (2014). Advantages and pitfalls in the application of mixed model association methods. Nature Genetics 46, 100–106.
  41. Zhang CH (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics 38 (2), 894–942.
  42. Zhang C-H and Zhang SS (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society. Series B: Statistical Methodology 76 (1), 217–242.
  43. Zhang X and Cheng G (2017). Simultaneous inference for high-dimensional linear models. Journal of the American Statistical Association 112 (518), 757–768.
  44. Zou H (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101 (476), 1418–1429.
