Entropy. 2021 Oct 15;23(10):1348. doi: 10.3390/e23101348

Sparse Estimation Strategies in Linear Mixed Effect Models for High-Dimensional Data Application

Eugene A Opoku 1,*, Syed Ejaz Ahmed 2, Farouk S Nathoo 1
Editor: Matteo Convertino
PMCID: PMC8534815  PMID: 34682072

Abstract

In many business, biomedical, and epidemiological applications, multicollinearity among predictor variables is a frequent issue in longitudinal data analysis with linear mixed models (LMMs). We consider an efficient estimation strategy for high-dimensional data applications, where the dimension of the parameter vector is larger than the number of observations. In this paper, we are interested in estimating the fixed effects parameters of the LMM when it is assumed that some prior information is available in the form of linear restrictions on the parameters. We propose pretest and shrinkage estimation strategies using the ridge full model as the base estimator. We establish the asymptotic distributional bias and risks of the suggested estimators and investigate their relative performance with respect to the ridge full model estimator. Furthermore, we compare the numerical performance of the LASSO-type estimators with the pretest and shrinkage ridge estimators. The methodology is investigated using simulation studies and then demonstrated on an application exploring how effective brain connectivity in the default mode network (DMN) may be related to genetics within the context of Alzheimer’s disease.

Keywords: linear mixed model, ridge estimation, pretest and shrinkage estimation, multicollinearity, asymptotic bias and risk, LASSO estimation, high-dimensional data

1. Introduction

In many fields such as bio-informatics, physical biology, and epidemiology, the response of interest is represented by repeated measures of some variables of interest that are collected over a specified time period for different independent subjects or individuals. These types of data are commonly encountered in medical research where the responses are subject to various time-dependent and time-constant effects such as pre- and post-treatment types, gender effect, and baseline measures, among others. A widely-used statistical tool in the analysis and modeling of longitudinal and repeated measures data is the linear mixed effects model (LMM) [1,2]. This model provides an effective and flexible way to describe the means and the covariance structures of a response variable after accounting for within subject correlation.

The rapid growth in the size and scope of longitudinal data has created a need for innovative statistical strategies in longitudinal data analysis. Classical methods are based on the assumption that the number of predictors is less than the number of observations. However, there is an increasing demand for efficient prediction strategies for the analysis of high-dimensional data, where the number of observed data elements (the sample size) is smaller than the number of predictors in a linear model context. Existing techniques that deal with high-dimensional data mostly rely on various penalized estimators. Due to the trade-off between model complexity and model prediction, the statistical inference of model selection becomes an extremely important and challenging problem in high-dimensional data analysis.

Over the years, many penalized regularization approaches have been developed to perform variable selection and estimation simultaneously. Among them, the least absolute shrinkage and selection operator (LASSO) is commonly used [3]. It is a useful estimation technique in part due to its convexity and computational efficiency. The LASSO approach is based on an ℓ1 penalty for regularization of the regression parameters. Ref. [4] provides a comprehensive summary of the consistency properties of the LASSO approach. Related penalized likelihood methods have been extensively studied in the literature; see, for example, [5,6,7,8,9,10]. The penalized likelihood methods have a close connection to Bayesian procedures. Thus, the LASSO estimate corresponds to a Bayes method that puts a Laplacian (double-exponential) prior on the regression coefficients [11,12].

In this paper, our interest lies in estimating the fixed effect parameters of the LMM using a ridge estimation technique when it is assumed that some prior information is available in the form of potential linear restrictions on the parameters. One possible source of prior information is using a Bayesian approach. An alternative source of prior information may be obtained from previous studies or expert knowledge that search for or assume sparsity patterns.

We consider the problem of fixed effect parameter estimation for LMMs when there exist many predictors relative to the sample size. These predictors may be classified into two groups: sparse and non-sparse. Thus, there are two choices to be considered: a full model with all predictors, and a sub-model that contains only the non-sparse predictors. When the sub-model based on available subspace information is true (i.e., the assumed restriction holds), it provides more efficient statistical inferences than those based on a full model. In contrast, if the sub-model is not true, the estimates could become biased and inefficient. The consequences of incorporating subspace information therefore depend on the quality or reliability of the information being incorporated into the estimation procedure. One way to deal with uncertain subspace information is to use a pretest estimation strategy, in which the validity of the information is tested before incorporation into a final estimator. Another approach is shrinkage estimation, which shrinks the full model estimator toward the sub-model estimator by utilizing the subspace information. Besides these estimation strategies, there is a growing literature on simultaneous model selection and estimation. These approaches are known as penalty strategies. By shrinking some regression coefficients toward zero, the penalty methods simultaneously select a sub-model and estimate its regression parameters. Several authors have investigated pretest, shrinkage, and penalty estimation strategies in partial linear models, Poisson regression models, and Weibull censored regression models [13,14,15].

To formulate the problem, we suppose that the vector of the fixed effects parameter β in the LMM can be partitioned into two sub-vectors β=(β1,β2), where β1 is the coefficient vector of non-sparse predictors and β2 is the coefficient vector of sparse predictors. Our interest lies in the estimation of β1 when β2 is close to zero. To deal with this problem in the context of low dimensional data, ref. [16] propose an improved estimation strategy using sub-model selection and post-estimation for the LMM. Within this framework, linear shrinkage and shrinkage pretest estimation strategies are developed, which combine full model and sub-model estimators in an effective way as a trade-off between bias and variance. Ref. [17] extend this study by using a likelihood ratio test to develop James–Stein shrinkage and pretest estimation methods based on LMM for longitudinal data. In addition, the non-penalty estimators are compared with several penalty estimators (LASSO, adaptive LASSO and Elastic Net) for best performance.

In most real data situations, there is also the problem of multicollinearity among predictor variables for high-dimensional data. Various biased estimation techniques such as shrinkage estimation, partial least squares estimation [18] and Liu estimators [19] have been implemented to deal with this problem, but the widely used technique is ridge estimation [20]. The ridge estimator overcomes the weakness of the least squares estimator with a smaller mean squared error. To overcome and combat multicollinearity, ref. [21] propose pretest and Stein-type ridge regression estimators for linear and partially linear models. Furthermore, ref. [22] also develop shrinkage estimation based on Liu regression to overcome multicollinearity in linear models.

Our primary focus is on the estimation and prediction problem for linear mixed effect models when there are many potential predictors that have a weak or no influence on the response of interest. This method simultaneously controls overfitting using general least square estimation with a roughness penalty. We propose pretest and shrinkage estimation strategies using the ridge estimation technique as a base estimator and numerically compare their performance with the LASSO and adaptive LASSO estimators. Our proposed estimation strategy is applied to both high-dimensional and low-dimensional data.

The rest of this article is organized as follows. In Section 2, we present the linear mixed effect model and the proposed estimation techniques. We introduce the full and sub-model estimators based on ridge estimation. Thereafter, we construct the pretest and shrinkage ridge estimators. Section 3 provides the asymptotic bias and risk of these estimators. A Monte Carlo simulation is used to evaluate the performance of the estimators, including a comparison with the LASSO-type estimators, and the results are reported in Section 4. Section 5 presents a demonstration of the proposed methodology on high-dimensional resting-state effective brain connectivity and genetic data. We also illustrate the proposed estimation methods in an application to the low-dimensional Amsterdam Growth and Health Study. Section 6 presents a discussion with recommendations.

2. Model and Estimation Strategies

In this section, we present the linear mixed effect model and the proposed estimation strategies.

2.1. Linear Mixed Model

Suppose that we have a sample of n subjects. For the i-th subject, we collect the response variable yij at the j-th occasion, where i = 1, …, n; j = 1, …, ni, and N = Σ_{i=1}^n ni is the total number of observations. Let Yi = (yi1, …, yini)^T denote the ni × 1 vector of responses from the i-th subject. Let Xi = (xi1, …, xini)^T and Zi = (zi1, …, zini)^T be the ni × p and ni × q known fixed-effects and random-effects design matrices for the i-th subject, of full rank p and q, respectively. The linear mixed effect model [1] for the vector of repeated responses Yi on the i-th subject is assumed to have the form

Yi=Xiβ+Ziai+ϵi, (1)

where β = (β1, …, βp)^T is the p × 1 vector of unknown fixed-effect parameters (regression coefficients); ai is the q × 1 vector of unobservable random effects for the i-th subject, assumed to follow a multivariate normal distribution with mean zero and unknown q × q covariance matrix G; and ϵi is the ni × 1 vector of error terms, assumed to be normally distributed with mean zero and covariance matrix σ2Ini. Further, the ϵi are assumed to be independent of the random effects ai.

The marginal distribution of the response Yi is normal with mean Xiβ and covariance matrix Cov(Yi) = Zi G Zi^T + σ2Ini. By stacking the vectors, the mixed model can be expressed as Y = Xβ + Za + ϵ. From Equation (1), Y ~ N_N(Xβ, V), where E(Y) = Xβ and V is the block-diagonal covariance matrix with i-th block Vi = Zi G Zi^T + σ2Ini.
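To make the stacked form concrete, the following minimal Python sketch (with illustrative dimensions and values that are not taken from the paper) simulates the model and builds the marginal covariance V from the subject-level blocks Zi G Zi^T + σ2I:

```python
import numpy as np

rng = np.random.default_rng(0)

n, ni, p, q = 30, 4, 6, 2          # subjects, repeats per subject, fixed and random dims
sigma2, G = 1.0, 0.5 * np.eye(2)   # error variance and random-effect covariance

beta = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0])
X = rng.normal(size=(n * ni, p))

# block-diagonal random-effect design: each subject gets its own Z_i block
Z = np.zeros((n * ni, n * q))
for i in range(n):
    Z[i * ni:(i + 1) * ni, i * q:(i + 1) * q] = rng.normal(size=(ni, q))

a = rng.multivariate_normal(np.zeros(q), G, size=n).ravel()   # stacked random effects
eps = rng.normal(scale=np.sqrt(sigma2), size=n * ni)
Y = X @ beta + Z @ a + eps

# marginal covariance V = Z (I_n kron G) Z^T + sigma^2 I: block-diagonal over subjects
V = Z @ np.kron(np.eye(n), G) @ Z.T + sigma2 * np.eye(n * ni)
```

The Kronecker product stacks n copies of G so each subject's block reproduces Zi G Zi^T + σ2I on the diagonal of V.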

2.2. Ridge Full Model and Sub-Model Estimator

The generalized least squares (GLS) estimator is defined as β^GLS = (X^T V^{-1} X)^{-1} X^T V^{-1} Y, and the ridge full model estimator is obtained from the penalized criterion β^ = argmin_β {(Y − Xβ)^T V^{-1}(Y − Xβ) + k β^T β}, which yields β^Ridge = (X^T V^{-1} X + kI)^{-1} X^T V^{-1} Y, where β^Ridge is the ridge full model estimator and k ∈ [0, ∞) is the tuning parameter. If k = 0, β^Ridge reduces to the GLS estimator, and β^Ridge → 0 as k → ∞. We select the value of k using cross-validation.
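As a sanity check on these formulas, the sketch below (a minimal Python illustration under the simplifying assumption that V is known; the function name and toy data are ours) computes the ridge estimator for a given k and verifies that k = 0 recovers GLS:

```python
import numpy as np

def ridge_gls(X, Y, V, k):
    """Ridge estimator (X^T V^-1 X + k I)^-1 X^T V^-1 Y; k = 0 gives the GLS estimator."""
    Vinv = np.linalg.inv(V)
    p = X.shape[1]
    return np.linalg.solve(X.T @ Vinv @ X + k * np.eye(p), X.T @ Vinv @ Y)

rng = np.random.default_rng(1)
n, p = 50, 5
X = rng.normal(size=(n, p))
V = np.eye(n)                              # simplest case: independent errors
Y = X @ np.ones(p) + rng.normal(size=n)

b_gls = ridge_gls(X, Y, V, 0.0)            # coincides with generalized least squares
b_ridge = ridge_gls(X, Y, V, 10.0)         # shrunk toward zero
```

In practice k would be chosen on a grid by cross-validation, refitting the estimator for each candidate value.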

We let X = (X1, X2), where X1 is an n × p1 sub-matrix containing the non-sparse predictors and X2 is an n × p2 sub-matrix containing the sparse predictors. Accordingly, β = (β1^T, β2^T)^T, where β1 and β2 have dimensions p1 and p2, respectively, with p1 + p2 = p and pi ≥ 0 for i = 1, 2.

A sub-model is defined as Y = Xβ + Za + ϵ subject to β^Tβ ≤ ϕ and β2 = 0, which corresponds to Y = X1β1 + Za + ϵ subject to β1^Tβ1 ≤ ϕ. The sub-model ridge estimator β^1RSM of β1 has the form β^1RSM = (X1^T V^{-1} X1 + kI)^{-1} X1^T V^{-1} Y. We denote by β^1RFM the full model ridge estimator of β1, given as β^1RFM = (X1^T V^{-1/2} M_{X2} V^{-1/2} X1 + kI)^{-1} X1^T V^{-1/2} M_{X2} V^{-1/2} Y, where M_{X2} = I − P = I − V^{-1/2} X2 (X2^T V^{-1} X2)^{-1} X2^T V^{-1/2}.

2.3. Pretest Ridge Estimation Strategy

Generally, the sub-model estimator will be more efficient than the full model estimator if the information embodied in the imposed linear restrictions is valid, that is, if β2 is close to zero. However, if the information is not valid, the sub-model estimator is likely to be more biased and may have a higher risk than the full model estimator. There is, therefore, some doubt as to whether or not to impose the restrictions on the model parameters. In response to this uncertainty, a statistical test may be used to determine the validity of the proposed restrictions. Accordingly, the procedure followed in practice is to pretest the validity of the restrictions: if the outcome of the pretest suggests that they are correct, then the model parameters are estimated incorporating the restrictions; if the pretest rejects the restrictions, then the parameters are estimated from the sample information alone. This motivates the consideration of the pretest estimation strategy for the LMM.

The pretest estimator is a combination of the full model estimator β^1RFM and the sub-model estimator β^1RSM through an indicator function I(Ln ≤ dn,α), where Ln is an appropriate test statistic for testing H0: β2 = 0 versus HA: β2 ≠ 0. Moreover, dn,α is an α-level critical value based on the distribution of Ln under H0. We define the test statistic based on the log-likelihood ratio as Ln = 2{ℓ(β^RFM; Y) − ℓ(β^RSM; Y)}, where ℓ(·; Y) denotes the log-likelihood.

Under H0, the test statistic Ln asymptotically follows a chi-square distribution with p2 degrees of freedom. The pretest ridge estimator β^1RPT of β1 is then defined by

β^1RPT = β^1RFM − (β^1RFM − β^1RSM) I(Ln ≤ dn,α),  p2 ≥ 1.
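The pretest rule amounts to a hard switch between the two fits. A minimal Python sketch (the helper name and toy inputs are ours, not the paper's) is:

```python
import numpy as np
from scipy import stats

def pretest_ridge(b_full, b_sub, Ln, p2, alpha=0.05):
    """Return the sub-model estimate when the LRT accepts H0: beta_2 = 0,
    otherwise the full-model estimate."""
    d = stats.chi2.ppf(1 - alpha, df=p2)       # critical value d_{n,alpha}
    return np.asarray(b_sub) if Ln <= d else np.asarray(b_full)

b_full, b_sub = np.array([1.2, 0.8]), np.array([1.0, 1.0])
keep_sub = pretest_ridge(b_full, b_sub, Ln=2.5, p2=3)    # 2.5 < chi2_{3,0.05} ~ 7.81
keep_full = pretest_ridge(b_full, b_sub, Ln=12.0, p2=3)  # rejects H0
```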

2.4. Shrinkage Ridge Estimation Strategy

The pretest estimator is a discontinuous function of the sub-model estimator β^1RSM and the full model estimator β^1RFM, which depends on the hard threshold dn,α = χ2_{p2,α}. We address this limitation by defining the shrinkage ridge estimator based on soft thresholding. The shrinkage ridge estimator (RSE) of β1, denoted β^1RSE, is defined as

β^1RSE = β^1RSM + (β^1RFM − β^1RSM)(1 − (p2 − 2) Ln^{-1}),  p2 ≥ 3.

Here, β^1RSE is a linear combination of the full model estimate β^1RFM and the sub-model estimate β^1RSM. If Ln ≤ (p2 − 2), then a relatively large weight is placed on β^1RSM; otherwise, more weight is placed on β^1RFM. A drawback of β^1RSE is that it is not a convex combination of β^1RFM and β^1RSM. This can cause over-shrinkage, which gives the estimator the opposite sign to β^1RFM; this happens when (p2 − 2)Ln^{-1} is larger than one. To counter this, we use the positive-part shrinkage ridge estimator (RPS) defined as

β^1RPS = β^1RSM + (β^1RFM − β^1RSM)(1 − (p2 − 2) Ln^{-1})^+,  p2 ≥ 3,

where (1 − (p2 − 2)Ln^{-1})^+ = max(0, 1 − (p2 − 2)Ln^{-1}). The RPS estimator controls the possible over-shrinkage in the RSE estimator.
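Both RSE and RPS can be written as a weighted combination of the two fits; the sketch below (a hypothetical helper with toy numbers, not code from the paper) implements the weight 1 − (p2 − 2)/Ln and its positive-part truncation:

```python
import numpy as np

def shrinkage_ridge(b_full, b_sub, Ln, p2, positive_part=True):
    """RSE combines the two fits with weight 1 - (p2-2)/Ln on the full-model direction;
    RPS truncates a negative weight at zero to prevent over-shrinkage."""
    w = 1.0 - (p2 - 2) / Ln
    if positive_part:
        w = max(w, 0.0)
    return np.asarray(b_sub) + w * (np.asarray(b_full) - np.asarray(b_sub))

b_full, b_sub = np.array([1.2, 0.8]), np.array([1.0, 1.0])
# Ln = 1 with p2 = 5 gives w = -2: RSE overshoots past the sub-model estimate
rse = shrinkage_ridge(b_full, b_sub, Ln=1.0, p2=5, positive_part=False)
# RPS truncates w at 0 and falls back to the sub-model estimate
rps = shrinkage_ridge(b_full, b_sub, Ln=1.0, p2=5)
```

The example illustrates the over-shrinkage problem: with w = −2 the RSE estimate reverses the direction of the full-model correction, while RPS stays at the sub-model fit.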

3. Asymptotic Results

In this section, we derive the asymptotic distributional bias and risk of the estimators considered in Section 2. We examine the properties of the estimators for increasing n and as β2 approaches the null vector under the sequence of local alternatives defined as

Kn: β2 = β2(n) = κ/√n, (2)

where κ = (κ1, κ2, …, κp2)^T ∈ R^{p2} is a fixed vector. The vector κ/√n measures how far the local alternatives Kn deviate from the subspace information β2 = 0. In order to evaluate the performance of the estimators, we define the asymptotic distributional bias of an estimator β^1* as

ADB(β^1*) = lim_{n→∞} E[√n(β^1* − β1)].

In order to compute the risk functions, we first compute the asymptotic covariance of the estimators. The asymptotic covariance of an estimator β^1* is expressed as

Cov(β^1*) = lim_{n→∞} E[n(β^1* − β1)(β^1* − β1)^T].

Following the asymptotic covariance matrix, we define the asymptotic risk of an estimator β^1* as R(β^1*) = tr(Q Cov(β^1*)), where Q is a positive definite matrix of weights. We set Q = I in this study.

Assumption 1.

We make the following two regularity conditions to establish the asymptotic properties of the estimators.

1. (1/n) max_{1≤i≤n} xi^T (X^T V^{-1} X)^{-1} xi → 0 as n → ∞, where xi^T is the i-th row of X.

2. Bn = n^{-1} (X^T V^{-1} X) → B for some finite matrix

B = [ B11  B12
      B21  B22 ].

Theorem 1.

For k < ∞, if k/√n → λo and B is non-singular, the distribution of the full model ridge estimator β^nRFM is

√n(β^nRFM − β) →_D N(−λo B^{-1} β, B^{-1}),

where →_D denotes convergence in distribution; the mean shift −λo B^{-1}β reflects the asymptotic bias induced by the ridge penalty.

Proof. 

See Theorem 2 in [23]. □

Proposition 1.

Assuming Assumption 1 and Theorem 1 hold, under the local alternatives Kn, we have

(φ1, φ3)^T →_D N( (−μ11.2, δ)^T, [ B11.2^{-1}  Φ ; Φ  Φ ] ),
(φ3, φ2)^T →_D N( (δ, −γ)^T, [ Φ  0 ; 0  B11^{-1} ] ),

where φ1 = √n(β^1RFM − β1), φ2 = √n(β^1RSM − β1), φ3 = √n(β^1RFM − β^1RSM), γ = μ11.2 + δ, δ = B11^{-1}B12κ, Φ = B11^{-1}B12B22.1^{-1}B21B11^{-1}, B22.1 = B22 − B21B11^{-1}B12, μ = −λo B^{-1}β = (μ1^T, μ2^T)^T and μ11.2 = μ1 − B12B22^{-1}((β2 − κ) − μ2).

Proof. 

See Appendix A. □

Theorem 2.

Under the conditions of Theorem 1 and the local alternatives Kn, the ADBs of the proposed estimators are

ADB(β^1RFM) = −μ11.2,
ADB(β^1RSM) = −μ11.2 − δ = −γ,
ADB(β^1RPT) = −μ11.2 − δ H_{p2+2}(χ2_{p2,α}; Δ),
ADB(β^1RSE) = −μ11.2 − (p2 − 2) δ E[χ^{-2}_{p2+2}(Δ)],
ADB(β^1RPS) = −μ11.2 − δ H_{p2+2}(p2 − 2; Δ) − (p2 − 2) δ E[χ^{-2}_{p2+2}(Δ) I(χ2_{p2+2}(Δ) > p2 − 2)],

where Δ = κ^T B22.1 κ, B22.1 = B22 − B21B11^{-1}B12, Hv(x; Δ) is the cumulative distribution function of the non-central chi-squared distribution with v degrees of freedom and non-centrality parameter Δ, and E[χv^{-2j}(Δ)] is the expected value of the inverse of a non-central χ2 variate with v degrees of freedom and non-centrality parameter Δ,

E[χv^{-2j}(Δ)] = ∫0^∞ x^{-2j} dHv(x; Δ).

Proof. 

See Appendix B.1. □
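The inverse moments E[χv^{-2j}(Δ)] appearing in these expressions can be evaluated numerically; a short scipy sketch (our own helper, shown for the first inverse moment, which is finite for v > 2) is:

```python
import numpy as np
from scipy.stats import ncx2

def inv_ncx2_moment(v, delta, j=1):
    """E[(chi^2_v(delta))^{-j}], computed by integrating x^{-j} against the
    non-central chi-square density; finite for v > 2j."""
    return ncx2(df=v, nc=delta).expect(lambda x: x ** (-j))

m1 = inv_ncx2_moment(6, 1.0)   # e.g. p2 = 4 gives df = p2 + 2 = 6
m2 = inv_ncx2_moment(6, 5.0)   # larger non-centrality -> smaller inverse moment
```

Because the non-centrality shifts probability mass to the right, these moments decrease in Δ, which is what drives the bias and risk expressions toward their full-model limits as Δ grows. For the central case the first inverse moment equals 1/(v − 2), a useful check on the numerics.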

Since the ADBs of the estimators are in non-scalar form, we define the asymptotic quadratic distributional bias (AQDB) of β^1* by

AQDB(β^1*) = [ADB(β^1*)]^T B11.2 ADB(β^1*),

where B11.2 = B11 − B12B22^{-1}B21.

Corollary 1.

Suppose Theorem 2 holds. Then, under {Kn}, the AQDBs of the estimators are

AQDB(β^1RFM) = μ11.2^T B11.2 μ11.2,
AQDB(β^1RSM) = γ^T B11.2 γ,
AQDB(β^1RPT) = μ11.2^T B11.2 μ11.2 + μ11.2^T B11.2 δ H_{p2+2}(χ2_{p2,α}; Δ) + δ^T B11.2 μ11.2 H_{p2+2}(χ2_{p2,α}; Δ) + δ^T B11.2 δ H2_{p2+2}(χ2_{p2,α}; Δ),
AQDB(β^1RSE) = μ11.2^T B11.2 μ11.2 + (p2 − 2) μ11.2^T B11.2 δ E[χ^{-2}_{p2+2}(Δ)] + (p2 − 2) δ^T B11.2 μ11.2 E[χ^{-2}_{p2+2}(Δ)] + (p2 − 2)^2 δ^T B11.2 δ (E[χ^{-2}_{p2+2}(Δ)])^2,
AQDB(β^1RPS) = μ11.2^T B11.2 μ11.2 + (δ^T B11.2 μ11.2 + μ11.2^T B11.2 δ)[H_{p2+2}(p2 − 2; Δ) + (p2 − 2) E{χ^{-2}_{p2+2}(Δ) I(χ2_{p2+2}(Δ) > p2 − 2)}] + δ^T B11.2 δ [H_{p2+2}(p2 − 2; Δ) + (p2 − 2) E{χ^{-2}_{p2+2}(Δ) I(χ2_{p2+2}(Δ) > p2 − 2)}]^2.

When B11.2 = 0, the AQDBs of all estimators are equivalent, and the estimators are therefore asymptotically unbiased. If we assume that B11.2 ≠ 0, the results for the bias of the estimators can be summarized as follows:

  1. The AQDB of β^1RSM equals γ^T B11.2 γ, which is an unbounded function of Δ.

  2. The AQDB of β^1RPT starts from μ11.2TB11.2μ11.2 at Δ=0, and when Δ increases, it increases to the maximum and then decreases to zero.

  3. The characteristics of β^1RSE and β^1RPS are similar to those of β^1RPT. The AQDBs of β^1RSE and β^1RPS similarly start from μ11.2^T B11.2 μ11.2 at Δ = 0, increase to a point, and then decrease towards zero, since E[χ^{-2}_{p2+2}(Δ)] is a non-increasing function of Δ.

Theorem 3.

Suppose Theorem 1 holds. Under the local alternatives Kn, the covariance matrices of the estimators are

Cov(β^1RFM) = B11.2^{-1} + μ11.2 μ11.2^T,
Cov(β^1RSM) = B11^{-1} + γ γ^T,
Cov(β^1RPT) = B11.2^{-1} + μ11.2 μ11.2^T + 2 μ11.2 δ^T H_{p2+2}(χ2_{p2,α}; Δ) − Φ H_{p2+2}(χ2_{p2,α}; Δ) + δ δ^T [2 H_{p2+2}(χ2_{p2,α}; Δ) − H_{p2+4}(χ2_{p2,α}; Δ)],
Cov(β^1RSE) = B11.2^{-1} + μ11.2 μ11.2^T + 2 (p2 − 2) μ11.2 δ^T E[χ^{-2}_{p2+2}(Δ)] − (p2 − 2) Φ {2 E[χ^{-2}_{p2+2}(Δ)] − (p2 − 2) E[χ^{-4}_{p2+2}(Δ)]} + (p2 − 2) δ δ^T {2 E[χ^{-2}_{p2+2}(Δ)] − 2 E[χ^{-2}_{p2+4}(Δ)] + (p2 − 2) E[χ^{-4}_{p2+4}(Δ)]},
Cov(β^1RPS) = Cov(β^1RSE) + 2 δ μ11.2^T E[(1 − (p2 − 2) χ^{-2}_{p2+2}(Δ)) I(χ2_{p2+2}(Δ) ≤ p2 − 2)] − 2 Φ E[(1 − (p2 − 2) χ^{-2}_{p2+2}(Δ)) I(χ2_{p2+2}(Δ) ≤ p2 − 2)] − 2 δ δ^T E[(1 − (p2 − 2) χ^{-2}_{p2+4}(Δ)) I(χ2_{p2+4}(Δ) ≤ p2 − 2)] + 2 δ δ^T E[(1 − (p2 − 2) χ^{-2}_{p2+2}(Δ)) I(χ2_{p2+2}(Δ) ≤ p2 − 2)] − (p2 − 2)^2 Φ E[χ^{-4}_{p2+2}(Δ) I(χ2_{p2+2}(Δ) ≤ p2 − 2)] − (p2 − 2)^2 δ δ^T E[χ^{-4}_{p2+2}(Δ) I(χ2_{p2+2}(Δ) ≤ p2 − 2)] + Φ H_{p2+2}(p2 − 2; Δ) + δ δ^T H_{p2+4}(p2 − 2; Δ).

Proof. 

See Appendix B.2. □

Corollary 2.

Under the local alternatives (Kn) and from Theorem 3, the risks of the estimators are obtained as

R(β^1RFM) = tr(Q B11.2^{-1}) + μ11.2^T Q μ11.2,
R(β^1RSM) = tr(Q B11^{-1}) + γ^T Q γ,
R(β^1RPT) = tr(Q B11.2^{-1}) + μ11.2^T Q μ11.2 + 2 μ11.2^T Q δ H_{p2+2}(χ2_{p2,α}; Δ) − tr(Q Φ) H_{p2+2}(χ2_{p2,α}; Δ) + δ^T Q δ [2 H_{p2+2}(χ2_{p2,α}; Δ) − H_{p2+4}(χ2_{p2,α}; Δ)],
R(β^1RSE) = tr(Q B11.2^{-1}) + μ11.2^T Q μ11.2 + 2 (p2 − 2) μ11.2^T Q δ E[χ^{-2}_{p2+2}(Δ)] − (p2 − 2) tr(Q Φ) {2 E[χ^{-2}_{p2+2}(Δ)] − (p2 − 2) E[χ^{-4}_{p2+2}(Δ)]} + (p2 − 2) δ^T Q δ {2 E[χ^{-2}_{p2+2}(Δ)] − 2 E[χ^{-2}_{p2+4}(Δ)] + (p2 − 2) E[χ^{-4}_{p2+4}(Δ)]},
R(β^1RPS) = R(β^1RSE) + 2 δ^T Q μ11.2 E[(1 − (p2 − 2) χ^{-2}_{p2+2}(Δ)) I(χ2_{p2+2}(Δ) ≤ p2 − 2)] − 2 tr(Q Φ) E[(1 − (p2 − 2) χ^{-2}_{p2+2}(Δ)) I(χ2_{p2+2}(Δ) ≤ p2 − 2)] − 2 δ^T Q δ E[(1 − (p2 − 2) χ^{-2}_{p2+4}(Δ)) I(χ2_{p2+4}(Δ) ≤ p2 − 2)] + 2 δ^T Q δ E[(1 − (p2 − 2) χ^{-2}_{p2+2}(Δ)) I(χ2_{p2+2}(Δ) ≤ p2 − 2)] − (p2 − 2)^2 tr(Q Φ) E[χ^{-4}_{p2+2}(Δ) I(χ2_{p2+2}(Δ) ≤ p2 − 2)] − (p2 − 2)^2 δ^T Q δ E[χ^{-4}_{p2+2}(Δ) I(χ2_{p2+2}(Δ) ≤ p2 − 2)] + tr(Q Φ) H_{p2+2}(p2 − 2; Δ) + δ^T Q δ H_{p2+4}(p2 − 2; Δ).

From Corollary 2, when B12 = 0, the risks of the estimators β^1RSM, β^1RPT, β^1RSE, and β^1RPS all reduce to the common value tr(Q B11.2^{-1}) + μ11.2^T Q μ11.2, the risk of β^1RFM. If B12 ≠ 0, the results can be summarized as follows:

  1. The risk of β^1RFM remains constant, while the risk of β^1RSM is an unbounded function of Δ, since Δ ∈ [0, ∞).

  2. The risk of β^1RPT increases as Δ moves away from zero, achieves its maximum, and then decreases towards the risk of the full model estimator.

  3. The risk of β^1RPT is smaller than the risk of β^1RFM in a neighborhood of Δ = 0, thus R(β^1RFM) > R(β^1RPT) there; for intermediate values of Δ, β^1RFM outperforms β^1RPT, and the two risks converge for large Δ.

  4. Comparing the risks of β^1RSE and β^1RFM, it can be seen that β^1RSE outperforms β^1RFM, that is, R(β^1RSE) ≤ R(β^1RFM) for all Δ ≥ 0.

4. Simulation Studies

In this section, we conduct a simulation study to assess the performance of the suggested estimators in finite samples. The criterion for comparing the performance of any estimator in our study is the mean squared error. We simulate the response from the following LMM

Yi=Xiβ+Ziai+ϵi, (3)

where ϵi ~ N(0, σ2Ini) with σ2 = 1. We generate the random effects ai from a multivariate normal distribution with mean zero and covariance matrix G = 0.5 I2×2, where I2×2 is the 2 × 2 identity matrix. The design matrix Xi = (xi1, …, xini)^T is generated from a multivariate normal distribution with mean vector zero and covariance matrix Σx. Furthermore, we assume that the off-diagonal elements of Σx are equal to ρ, the coefficient of correlation between any two predictors, with ρ = 0.3, 0.7, 0.9. The ratio of the largest to the smallest eigenvalue of the matrix X^T V^{-1} X is calculated as a condition number index (CNI) [24], which assesses the existence of multicollinearity in the design matrix. If the CNI is larger than 30, then the model has significant multicollinearity. Our simulations are based on the linear mixed effects model in Equation (3) with n = 60 and 100 subjects.
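The CNI is straightforward to compute; the sketch below (an illustrative helper with V = I for simplicity, with our own names and toy data) contrasts a well-conditioned design with a nearly collinear one:

```python
import numpy as np

def condition_number_index(X, V):
    """CNI: ratio of the largest to smallest eigenvalue of X^T V^{-1} X;
    a value above 30 signals serious multicollinearity."""
    M = X.T @ np.linalg.solve(V, X)
    w = np.linalg.eigvalsh(M)            # eigenvalues in ascending order
    return w[-1] / w[0]

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
X_ok = np.column_stack([x1, rng.normal(size=n)])               # nearly orthogonal columns
X_bad = np.column_stack([x1, x1 + 0.01 * rng.normal(size=n)])  # nearly collinear columns
V = np.eye(n)

cni_ok = condition_number_index(X_ok, V)
cni_bad = condition_number_index(X_bad, V)
```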

We consider a situation in which the model is assumed to be sparse. In this study, our interest lies in testing the hypothesis H0: β2 = 0, and our goal is to estimate the fixed effect coefficient vector β1. We partition the fixed effects coefficients as β = (β1^T, β2^T)^T = (β1^T, 0p2^T)^T. The coefficients β1 and β2 are p1- and p2-dimensional vectors, respectively, with p = p1 + p2.

In order to investigate the behavior of the estimators, we define Δ* = ‖β − βo‖, where βo = (β1^T, 0p2^T)^T and ‖·‖ is the Euclidean norm. We considered Δ* values between 0 and 4. If Δ* = 0, then we have β = (1, 1, 1, 1, 0, 0, …, 0p2)^T to generate the response under the null hypothesis. On the other hand, when Δ* ≠ 0, say Δ* = 4, we have β = (1, 1, 1, 1, 4, 0, 0, …, 0p2−1)^T to generate the response under the local alternative hypothesis. In our simulation study, we consider the number of fixed effects or predictor variables as (p1, p2) ∈ {(5, 40), (5, 500), (5, 1000)}. Each realization is repeated 5000 times to obtain consistent results, and we compute the MSE of the suggested estimators with α = 0.05.

Based on the simulated data, we calculate the mean squared error (MSE) of each estimator as MSE(β^) = (1/5000) Σ_{j=1}^{5000} (β^(j) − β)^T (β^(j) − β), where β^(j) denotes any one of β^RSM, β^RPT, β^RSE and β^RPS in the j-th repetition. We use the relative mean squared efficiency (RMSE), the ratio of MSEs, for risk performance comparison. The RMSE of an estimator β^1* with respect to the baseline full model ridge estimator β^1RFM is defined as RMSE(β^1RFM : β^1*) = MSE(β^1RFM) / MSE(β^1*), where β^1* is one of the suggested estimators under consideration; an RMSE larger than one indicates that β^1* is more efficient than β^1RFM.
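As a simplified illustration of this Monte Carlo comparison (independent errors with V = I instead of the full LMM, far fewer replications, and our own helper names), the following sketch estimates the RMSE of the sub-model estimator relative to the full ridge fit when the restriction β2 = 0 is true:

```python
import numpy as np

rng = np.random.default_rng(42)
n, p1, p2, k, reps = 60, 4, 20, 1.0, 200
beta = np.r_[np.ones(p1), np.zeros(p2)]      # sparse truth: Delta* = 0

def ridge(X, Y, k):
    """Ridge fit (X^T X + k I)^{-1} X^T Y for independent errors (V = I)."""
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ Y)

mse_full = mse_sub = 0.0
for _ in range(reps):
    X = rng.normal(size=(n, p1 + p2))
    Y = X @ beta + rng.normal(size=n)
    b_full = ridge(X, Y, k)[:p1]             # beta_1 from the full ridge fit
    b_sub = ridge(X[:, :p1], Y, k)           # beta_1 from the restricted fit
    mse_full += np.sum((b_full - beta[:p1]) ** 2)
    mse_sub += np.sum((b_sub - beta[:p1]) ** 2)

rmse = mse_full / mse_sub                    # RMSE > 1 favours the sub-model
```

Because the restriction holds here, the sub-model fit avoids the variance cost of estimating the p2 sparse coefficients, so the RMSE exceeds one, in line with the Δ* = 0 rows of Table 1 and Table 2.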

4.1. Simulation Results

In this subsection, we present the results from our simulation study. The results for n = 60, 100 and p1 = 5 with different values of the correlation coefficient ρ are shown in Table 1 and Table 2. Furthermore, we plot the RMSEs against Δ* in Figure 1 and Figure 2. The findings can be summarized as follows:

  1. When Δ* = 0, the sub-model estimator RSM outperforms all other estimators. As Δ* moves away from zero, the RMSE of the sub-model estimator decreases and goes to zero.

  2. The pretest ridge estimator RPT outperforms the shrinkage ridge and positive-part Stein ridge estimators in the case Δ* = 0. However, for a large number of sparse predictors p2, with p1 and n kept fixed, RPT is less efficient than RPS and RSE. When Δ* is larger than zero, the RMSE of RPT decreases and remains below 1 for intermediate values of Δ*; thereafter, the RMSE of RPT increases and approaches one for larger values of Δ*.

  3. RPS performs better than RSE over the entire parameter space induced by Δ*, as presented in Table 1 and Table 2. Similarly, both shrinkage estimators RPS and RSE outperform the full model ridge estimator irrespective of whether the selected sub-model is correct. This is consistent with the asymptotic theory presented in Section 3.

  4. Since Δ* measures the degree of deviation of the true parameter from the assumed subspace, it is clear that one cannot go wrong with the use of shrinkage estimators even if the selected sub-model is wrongly specified. As evident from Table 1 and Table 2 and Figure 1 and Figure 2, if the selected sub-model is correct, that is, Δ* = 0, then the shrinkage estimators are relatively efficient compared with the ridge full model estimator. On the other hand, if the sub-model is misspecified, the gain slowly diminishes. However, in terms of risk, the shrinkage estimators are at least as good as the full model ridge estimator. Therefore, the use of shrinkage estimators makes sense in applications where a sub-model cannot be correctly specified.

  5. The RMSEs of the ridge-type estimators are increasing functions of the amount of multicollinearity. This indicates that the ridge-type estimators perform better than the classical estimator in the presence of multicollinearity among the predictor variables.

Table 1.

RMSEs of RSM, RPT, RSE, and RPS estimators with respect to β^RFM when Δ* ≥ 0 for p1 = 5 and n = 60.

ρ p2 Δ CNI RSM RPT RSE RPS
0.3 40 0 361 2.61 2.07 1.94 1.96
1 1.05 1.07 1.20 1.25
2 0.25 0.95 1.04 1.05
3 0.12 0.98 0.99 1.00
4 0.08 1.00 1.00 1.00
500 0 613 4.48 3.29 3.48 1.96
1 1.26 1.12 1.26 1.29
2 0.41 0.97 1.08 1.09
3 0.18 0.99 1.00 1.00
4 0.13 1.00 1.00 1.00
1000 0 693 5.36 4.53 4.67 4.71
1 1.53 1.21 1.35 1.39
2 0.49 1.01 1.13 1.14
3 0.28 0.99 0.99 0.99
4 0.10 1.00 1.00 1.00
0.7 40 0 1352 3.18 2.33 2.17 2.18
1 1.04 1.11 1.20 1.23
2 0.42 1.03 1.04 1.04
3 0.23 0.98 0.99 1.00
4 0.14 1.00 1.00 1.00
500 0 1789 4.48 2.76 2.94 3.02
1 1.08 1.43 1.52 1.53
2 0.67 1.03 1.07 1.06
3 0.35 0.98 1.00 1.00
4 0.19 1.00 1.00 1.00
1000 0 2134 6.82 5.24 5.30 3.02
1 1.16 1.32 1.42 1.53
2 0.75 1.10 1.15 1.16
3 0.39 0.99 1.00 1.00
4 0.11 1.00 1.00 1.00

Figure 1. RMSE of estimators as a function of the non-centrality parameter Δ when n = 60 and p1 = 5.

Figure 2. RMSE of estimators as a function of the non-centrality parameter Δ when n = 100 and p1 = 5.

Table 2.

RMSEs of RSM, RPT, RSE, and RPS estimators with respect to β^RFM when Δ* ≥ 0 for p1 = 5 and n = 100.

ρ p2 Δ CNI RSM RPT RSE RPS
0.3 40 0 150 2.38 2.09 1.88 1.90
1 0.89 1.01 1.05 1.08
2 0.21 0.94 1.01 1.02
3 0.06 0.94 0.99 1.00
4 0.02 1.00 1.00 1.00
500 0 340 4.15 2.65 2.99 3.17
1 0.87 1.08 1.18 1.21
2 0.14 0.96 1.03 1.05
3 0.06 0.99 0.99 1.00
4 0.03 1.00 1.00 1.00
1000 0 536 4.30 2.75 3.02 3.08
1 0.96 1.09 1.13 1.15
2 0.21 0.8 1.03 1.03
3 0.09 1.00 1.00 1.00
4 0.04 1.00 1.00 1.00
0.7 40 0 997 3.27 2.15 2.09 2.11
1 0.85 1.02 1.09 1.10
2 0.21 0.98 1.02 1.02
3 0.06 0.99 0.99 0.99
4 0.01 1.00 1.00 1.00
500 0 1589 4.13 2.22 2.35 2.39
1 1.04 1.19 1.21 1.20
2 0.30 0.97 1.05 1.05
3 0.14 1.00 1.00 1.00
4 0.08 1.00 1.00 1.00
1000 0 1751 5.17 3.71 4.03 4.09
1 1.01 1.15 1.24 1.25
2 0.39 1.04 1.07 1.06
3 0.16 0.99 1.00 1.00
4 0.11 1.00 1.00 1.00

4.2. Comparison with LASSO-Type Estimators

We compare our listed estimators with the LASSO and adaptive LASSO estimators. A 10-fold cross-validation is used for selecting the optimal value of the penalty parameters that minimizes the mean square errors for the LASSO-type estimators. The results for ρ=0.3,0.7,0.9, n=60,100, p1=10 and p2=50,500,1000,2000 are presented in Table 3. We observe the following from Table 3.

Table 3.

RMSEs of estimators with respect to β^RFM when Δ* = 0 for p1 = 10.

n ρ p2 CNI RSM RPT RSE RPS LASSO aLASSO
60 0.3 50 35.64 3.31 2.25 1.82 1.95 1.23 1.28
500 452.76 4.13 3.71 2.61 3.01 1.47 1.52
1000 1265.34 5.02 4.28 4.61 4.78 1.96 2.15
2000 4567.56 7.13 5.10 6.18 6.39 2.70 3.06
0.7 50 61.34 3.52 3.05 2.51 2.55 1.14 1.21
500 743.17 4.49 3.65 3.41 3.50 1.36 1.58
1000 2350.89 5.84 4.11 4.32 4.61 1.68 1.95
2000 6908.39 8.10 5.31 6.24 6.29 1.84 2.02
0.9 50 120.21 4.21 3.61 3.34 3.35 1.10 1.05
500 950.98 4.82 3.3.8 3.72 3.73 1.21 1.16
1000 5892.51 6.35 4.10 5.01 5.13 1.42 1.31
2000 8352.73 8.51 4.63 5.24 5.38 1.61 1.35
100 0.3 50 31.21 2.91 2.54 2.12 2.23 1.32 1.36
500 356.64 3.75 3.31 2.84 2.92 1.54 1.61
1000 975.32 4.25 2.53 3.42 3.61 1.92 2.06
2000 2764.84 5.61 4.25 4.91 5.08 2.31 2.46
0.7 50 52.79 3.18 2.61 2.30 2.37 1.28 1.53
500 578.43 4.28 3.05 3.52 3.59 1.46 2.07
1000 1281.66 5.10 3.26 3.78 3.82 1.84 2.52
2000 3498.30 6.12 3.01 4.26 4.33 2.27 2.41
0.9 50 79.41 4.11 3.41 3.21 3.28 1.28 1.21
500 681.43 4.35 3.55 3.41 3.50 1.43 1.51
1000 1470.32 5.82 3.18 4.01 4.14 1.72 1.79
2000 4105.90 7.04 4.57 5.22 5.32 1.87 1.96
  1. The performance of the sub-model estimator is the best among all estimators.

  2. The pretest ridge estimator performs better than the remaining estimators. However, for larger numbers of sparse predictors p2, the shrinkage estimators outperform the pretest estimator.

  3. The performance of the LASSO and aLASSO estimators is comparable when ρ is small. The pretest and shrinkage estimators remain stable for a given value of ρ.

  4. For a large number of sparse predictors p2, the shrinkage and pretest ridge estimators outperform the LASSO-type estimators. This indicates the superiority of the shrinkage estimators over the LASSO-type estimators. Therefore, shrinkage estimators are preferable when there is multicollinearity among the predictor variables.

5. Real Data Application

We consider two real data analyses using Amsterdam Growth and Health Data and a genetic and brain network connectivity edge weight data to illustrate the performance of the proposed estimators.

5.1. Amsterdam Growth and Health Data (AGHD)

The AGHD data are obtained from the Amsterdam Growth and Health Study [25]. The goal of this study is to investigate the relationship between lifestyle and health from adolescence into young adulthood. The response variable Y is the total serum cholesterol measured over six time points. There are five covariates: X1 is the baseline fitness level, measured as the maximum oxygen uptake on a treadmill; X2 is the amount of body fat, estimated by the sum of the thickness of four skinfolds; X3 is a smoking indicator (0 = no, 1 = yes); X4 is the gender (1 = female, 2 = male); and X5 is the time of measurement. Subject-specific random effects are also included.

A total of 147 subjects participated in the study, where all variables were measured at ni = 6 time occasions. In order to apply the proposed methods, we first apply an AIC-based variable selection procedure to choose the sub-model. For the AGHD data, we fit a linear mixed model with all five covariates as fixed effects and subject-specific random effects, using a two-stage selection procedure to choose both the random and fixed effects. The analysis found X2 and X5 to be significant covariates for prediction of the response variable serum cholesterol; the other variables are excluded since they are not significant. Based on this information, the sub-model contains X2 and X5, and the full model includes all the covariates. We construct the shrinkage estimators from the full model and sub-model. In terms of the null hypothesis, the restriction can be written as β2 = (β1, β3, β4) = (0, 0, 0) with p = 5, p1 = 2 and p2 = 3.

To evaluate the performance of the estimators, we obtain the mean squared prediction error (MSPE) using bootstrap samples. We draw 1000 bootstrap samples of the 147 subjects from the data matrix {(Y_ij, X_ij), i = 1, 2, …, 147; j = 1, 2, …, 6}. We then calculate the relative prediction error (RPE) of β̂1* with respect to β̂1^RFM, the full-model estimator. The RPE is defined as

$$
\mathrm{RPE}\left(\hat{\beta}_1^{RFM}:\hat{\beta}_1^{*}\right)=\frac{\mathrm{MSPE}\left(\hat{\beta}_1^{*}\right)}{\mathrm{MSPE}\left(\hat{\beta}_1^{RFM}\right)}=\frac{\left(Y-X_1\hat{\beta}_1^{*}\right)^{\top}\left(Y-X_1\hat{\beta}_1^{*}\right)}{\left(Y-X_1\hat{\beta}_1^{RFM}\right)^{\top}\left(Y-X_1\hat{\beta}_1^{RFM}\right)},
$$

where β̂1* is one of the listed estimators. If RPE < 1, then β̂1* outperforms β̂1^RFM.
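The bootstrap RPE computation described above can be sketched as follows. Here `ridge_fit` is a hypothetical plain ridge fitter standing in for the LMM-based estimators, and rows rather than whole subjects are resampled for brevity:

```python
import numpy as np

def mspe(Y, X, beta):
    """Squared prediction error of beta on the original data."""
    r = Y - X @ beta
    return float(r @ r)

def ridge_fit(lam):
    """Return a ridge estimator (Y, X) -> beta_hat for penalty lam."""
    return lambda Y, X: np.linalg.solve(
        X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

def bootstrap_rpe(Y, X, fit_candidate, fit_full, n_boot=1000, seed=0):
    """RPE of a candidate estimator relative to the full-model estimator:
    both are refit on bootstrap resamples and scored on the original data.
    Resampling is done on rows here; for longitudinal data one would
    resample whole subjects instead."""
    rng = np.random.default_rng(seed)
    num = den = 0.0
    for _ in range(n_boot):
        idx = rng.integers(0, len(Y), len(Y))
        num += mspe(Y, X, fit_candidate(Y[idx], X[idx]))
        den += mspe(Y, X, fit_full(Y[idx], X[idx]))
    return num / den              # RPE < 1 favours the candidate estimator

# sanity check: with identical fitters the RPE is exactly 1
Y = np.linspace(0.0, 9.0, 10)
X = np.column_stack([np.ones(10), Y])
rpe_same = bootstrap_rpe(Y, X, ridge_fit(0.5), ridge_fit(0.5), n_boot=20)
```

A candidate estimator would be compared against the full-model ridge fit as, e.g., `bootstrap_rpe(Y, X, ridge_fit(1.0), ridge_fit(0.1))`.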

Table 4 reports the estimates and standard errors of the non-sparse predictors and the RPEs of the estimators with respect to the full model. As expected, the sub-model ridge estimator β̂1^RSM has the minimum RPE because it is computed under the assumption that the sub-model is correct, that is, Δ* = 0. It is evident from the RPE values in Table 4 that the shrinkage estimators are superior to the LASSO-type estimators. Furthermore, the positive-part shrinkage estimator is more efficient than the shrinkage ridge estimator.

Table 4.

Estimates and standard errors for the active predictors, and RPEs of the estimators with respect to the full-model estimator, for the Amsterdam Growth and Health Study data.

                  RFM     RSM     RPT     RSE     RPS     LASSO   aLASSO
Estimate (β2)     0.381   0.395   0.392   0.389   0.390   0.624   0.611
Standard error    0.104   0.102   0.100   0.009   0.008   0.081   0.079
Estimate (β5)     0.137   0.125   0.131   0.130   0.133   0.101   0.105
Standard error    0.012   0.010   0.009   0.011   0.010   0.013   0.012
RPE               1.000   0.723   0.841   0.838   0.831   0.986   0.973

5.2. Resting-State Effective Brain Connectivity and Genetic Data

These data comprise a longitudinal resting-state functional magnetic resonance imaging (rs-fMRI) effective brain connectivity network and genetic study [26] obtained from a sample of 111 subjects with a total of 319 rs-fMRI scans from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. The 111 subjects comprise 36 cognitively normal (CN), 63 mild cognitive impairment (MCI) and 12 Alzheimer’s disease (AD) subjects. The response is a network connection between regions of interest estimated from an rs-fMRI scan within the Default Mode Network (DMN), and we observe a longitudinal sequence of such connections for each subject, with the number of repeated measurements varying across subjects. The DMN consists of a set of brain regions that tend to be active in the resting state, when a subject is mind-wandering with no intended task. For this data analysis, we consider the network edge weight from the left intraparietal cortex to the posterior cingulate cortex (LIPC → PCC) as our response. The genetic data are single nucleotide polymorphisms (SNPs) from the non-sex chromosomes, i.e., chromosomes 1 to 22. SNPs with a minor allele frequency less than 5% are removed, as are SNPs with a Hardy–Weinberg equilibrium p-value lower than 10^{-6} or a missing rate greater than 5%. After preprocessing, we are left with 1,220,955 SNPs and the longitudinal rs-fMRI effective connectivity networks for the 111 subjects. The SNPs enter the model as fixed effects, and subject-specific random effects are included.

In order to apply the proposed methods, we use a genome-wide association study (GWAS) to screen the genetic data down to 100 SNPs. We implement a second screening by applying multinomial logistic regression to identify a smaller subset of the 100 SNPs that are potentially associated with disease status (CN/MCI/AD). This yields a subset of the top 10 SNPs, which are retained as the most important predictors; the other 90 SNPs are treated as sparse. We thus have two models: the full model with all 100 SNPs and the sub-model with the 10 selected SNPs. Finally, we construct the pretest and shrinkage estimators from the full model and sub-model.
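A screening step of this kind can be sketched as below. For simplicity, this sketch ranks SNPs with a univariate one-way ANOVA F-test across diagnosis groups rather than the multinomial logistic regression used in the paper, and the genotype matrix is simulated:

```python
import numpy as np
from scipy.stats import f_oneway

def screen_snps(G, diagnosis, k):
    """Rank SNPs (columns of G, coded as 0/1/2 minor-allele counts) by a
    one-way ANOVA F-test of allele count across diagnosis groups and keep
    the k smallest p-values. An illustrative univariate stand-in for the
    multinomial-logistic screen described in the text."""
    groups = np.unique(diagnosis)
    pvals = np.array([
        f_oneway(*[G[diagnosis == g, j] for g in groups]).pvalue
        for j in range(G.shape[1])
    ])
    return np.argsort(pvals)[:k]

rng = np.random.default_rng(1)
G = rng.integers(0, 3, size=(111, 100)).astype(float)  # 111 subjects, 100 SNPs
diagnosis = rng.integers(0, 3, size=111)               # 0=CN, 1=MCI, 2=AD
top10 = screen_snps(G, diagnosis, k=10)                # retained SNP indices
```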

We draw 1000 bootstrap samples with replacement from the corresponding data matrix {(Y_ij, X_ij), i = 1, …, 111; j = 1, …, n_i}. We report the RPE of the estimators, based on the bootstrap simulation, with respect to the full-model ridge estimator in Table 5. We observe that the sub-model, pretest, shrinkage and positive shrinkage ridge estimators outperform the full-model estimator. Clearly, the sub-model ridge estimator has the smallest RPE since it is computed when the candidate sub-model is correct, i.e., Δ = 0. Both shrinkage ridge estimators outperform the pretest ridge estimator; in particular, the positive-part shrinkage estimator performs better than the shrinkage estimator. Both the shrinkage and pretest ridge estimators also outperform the LASSO-type estimators. Thus, the data analysis is in line with our simulation and theoretical findings.

Table 5.

RPEs of the estimators with respect to the full-model ridge estimator for the ADNI effective connectivity data.

       RFM     RSM     RPT     RSE     RPS     LASSO   aLASSO
RPE    1.000   0.802   0.947   0.932   0.928   1.051   1.190

6. Conclusions

In this paper, we present efficient estimation strategies for the linear mixed effects model when there is multicollinearity among the predictor variables in high-dimensional data applications. We considered estimation of the fixed effects parameters of the linear mixed model when some of the predictors may have a very weak influence on the response of interest. We introduced pretest and shrinkage estimation into our model using the ridge full-model estimator as the base estimator. In addition, we established the asymptotic properties of the pretest and shrinkage ridge estimators. Our theoretical findings demonstrate that the shrinkage ridge estimators outperform the full-model ridge estimator and perform relatively better than the sub-model estimator over a wide range of the parameter space.

Additionally, a Monte Carlo simulation was conducted to assess the finite-sample behavior of the proposed estimators when the model is sparse (the restrictions on the parameters hold). As expected, the sub-model ridge estimator outshines all other estimators when the restrictions hold. However, when this assumption is violated, the shrinkage and pretest ridge estimators outperform the sub-model estimator. Furthermore, when the number of sparse predictors is extremely large relative to the sample size, the shrinkage estimators outperform the pretest ridge estimator. These numerical results are consistent with our asymptotic results. We also assessed the relative performance of the LASSO-type estimators against our ridge-type estimators, and observed that the pretest and shrinkage ridge estimators are superior to the LASSO-type estimators when the predictors are highly correlated. In our real data applications, the shrinkage ridge estimators achieved smaller relative prediction errors than the LASSO-type estimators.

In summary, the results of the data analyses strongly confirm the findings of the simulation study and suggest the use of the shrinkage ridge estimation strategy when no prior information about the parameter subspace is available. The results of our simulation study and real data application are consistent with available results in [27,28,29].

In our future work, we will focus on other penalized estimators, such as the Elastic-Net, the minimax concave penalty (MCP), and the smoothly clipped absolute deviation (SCAD), as estimation strategies in the LMM for high-dimensional data. These estimators will be assessed and compared with the proposed ridge-type estimators. Another interesting extension will be to integrate two sub-models by incorporating ridge-type estimation strategies into linear mixed effects models. The goal is to improve the estimation accuracy of the non-sparse set of fixed effects parameters by combining an over-fitted model estimator with an under-fitted one [27,29]. This approach will include combining two sub-models produced by two different variable selection techniques in the LMM [28].

Acknowledgments

Research is supported by the Visual and Automated Disease Analytics (VADA) graduate training program.

Appendix A

Proof of Proposition 1. 

To derive the asymptotic relationship between the sub-model and full-model estimators of $\beta_1$, we work with the profiled response $\hat{Y}=Y-X_2\hat{\beta}_2^{RFM}$, for which

$$
\begin{aligned}
\hat{\beta}_1^{RFM}&=\operatorname*{arg\,min}_{\beta_1}\left\{\left(\hat{Y}-X_1\beta_1\right)^{\top}V^{-1}\left(\hat{Y}-X_1\beta_1\right)+\lambda\|\beta_1\|^2\right\}\\
&=\left(X_1^{\top}V^{-1}X_1+\lambda I_{p_1}\right)^{-1}X_1^{\top}V^{-1}\hat{Y}\\
&=\left(X_1^{\top}V^{-1}X_1+\lambda I_{p_1}\right)^{-1}X_1^{\top}V^{-1}Y-\left(X_1^{\top}V^{-1}X_1+\lambda I_{p_1}\right)^{-1}X_1^{\top}V^{-1}X_2\hat{\beta}_2^{RFM}\\
&=\hat{\beta}_1^{RSM}-B_{11}^{-1}B_{12}\hat{\beta}_2^{RFM}.
\end{aligned}
$$

From Theorem 1, we partition $\sqrt{n}(\hat{\beta}^{RFM}-\beta)$ as $\sqrt{n}(\hat{\beta}^{RFM}-\beta)=\left(\sqrt{n}(\hat{\beta}_1^{RFM}-\beta_1),\,\sqrt{n}(\hat{\beta}_2^{RFM}-\beta_2)\right)$. We obtain $\sqrt{n}(\hat{\beta}_1^{RFM}-\beta_1)\xrightarrow{D}N_{p_1}\left(\mu_{11.2},B_{11.2}^{-1}\right)$, where $B_{11.2}=B_{11}-B_{12}B_{22}^{-1}B_{21}$. We have shown that $\hat{\beta}_1^{RSM}=\hat{\beta}_1^{RFM}+B_{11}^{-1}B_{12}\hat{\beta}_2^{RFM}$. Using this expression and under the local alternative $\{K_n\}$, we obtain the following expressions:

$$
\begin{aligned}
\varphi_2&=\sqrt{n}\left(\hat{\beta}_1^{RSM}-\beta_1\right)=\sqrt{n}\left(\hat{\beta}_1^{RFM}+B_{11}^{-1}B_{12}\hat{\beta}_2^{RFM}-\beta_1\right)=\varphi_1+B_{11}^{-1}B_{12}\sqrt{n}\,\hat{\beta}_2^{RFM},\\
\varphi_3&=\sqrt{n}\left(\hat{\beta}_1^{RFM}-\hat{\beta}_1^{RSM}\right)=\sqrt{n}\left(\hat{\beta}_1^{RFM}-\beta_1\right)-\sqrt{n}\left(\hat{\beta}_1^{RSM}-\beta_1\right)=\varphi_1-\varphi_2.
\end{aligned}
$$
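The relation $\hat{\beta}_1^{RFM}=\hat{\beta}_1^{RSM}-B_{11}^{-1}B_{12}\hat{\beta}_2^{RFM}$ follows from the block normal equations of the ridge problem and can be verified numerically. In this sketch, $V$ is an arbitrary positive-definite matrix standing in for the LMM marginal covariance, and the ridge penalty is applied to both coefficient blocks:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p1, p2, lam = 50, 3, 4, 0.7
X1 = rng.standard_normal((n, p1))
X2 = rng.standard_normal((n, p2))
X = np.hstack([X1, X2])
Y = rng.standard_normal(n)
A = rng.standard_normal((n, n))
Vi = np.linalg.inv(A @ A.T + n * np.eye(n))   # V^{-1}, V positive-definite

# joint (full-model) ridge estimator of (beta1, beta2)
beta = np.linalg.solve(X.T @ Vi @ X + lam * np.eye(p1 + p2), X.T @ Vi @ Y)
b1_full, b2_full = beta[:p1], beta[p1:]

# sub-model ridge estimator of beta1, and the correction B11^{-1} B12 b2
B11 = X1.T @ Vi @ X1 + lam * np.eye(p1)       # note B11 includes lambda*I
B12 = X1.T @ Vi @ X2
b1_sub = np.linalg.solve(B11, X1.T @ Vi @ Y)
correction = np.linalg.solve(B11, B12 @ b2_full)
# the first block of the normal equations gives b1_full = b1_sub - correction
```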

Since $\varphi_2$ and $\varphi_3$ are linear functions of $\varphi_1$, as $n\to\infty$ they are also asymptotically normally distributed. Their mean vectors and covariance matrices are as follows:

$$
\begin{aligned}
E(\varphi_1)&=E\left[\sqrt{n}\left(\hat{\beta}_1^{RFM}-\beta_1\right)\right]=\mu_{11.2},\\
E(\varphi_2)&=E\left[\varphi_1+B_{11}^{-1}B_{12}\sqrt{n}\,\hat{\beta}_2^{RFM}\right]=E(\varphi_1)+B_{11}^{-1}B_{12}\sqrt{n}\,E\left(\hat{\beta}_2^{RFM}\right)=\mu_{11.2}+B_{11}^{-1}B_{12}\kappa=\mu_{11.2}+\delta=\gamma,\\
E(\varphi_3)&=E(\varphi_1-\varphi_2)=\mu_{11.2}-\left(\mu_{11.2}+\delta\right)=-\delta,\\
\mathrm{Var}(\varphi_1)&=B_{11.2}^{-1},\\
\mathrm{Var}(\varphi_2)&=\mathrm{Var}\left(\varphi_1+B_{11}^{-1}B_{12}\sqrt{n}\,\hat{\beta}_2^{RFM}\right)\\
&=\mathrm{Var}(\varphi_1)+B_{11}^{-1}B_{12}B_{22.1}^{-1}B_{21}B_{11}^{-1}+2\,\mathrm{Cov}\left(\sqrt{n}\left(\hat{\beta}_1^{RFM}-\beta_1\right),\sqrt{n}\left(\hat{\beta}_2^{RFM}-\beta_2\right)\right)\left(B_{11}^{-1}B_{12}\right)^{\top}\\
&=B_{11.2}^{-1}-B_{11}^{-1}B_{12}B_{22.1}^{-1}B_{21}B_{11}^{-1}=B_{11}^{-1},\\
\mathrm{Var}(\varphi_3)&=\mathrm{Var}\left[\sqrt{n}\left(\hat{\beta}_1^{RFM}-\hat{\beta}_1^{RSM}\right)\right]=\mathrm{Var}\left(B_{11}^{-1}B_{12}\sqrt{n}\,\hat{\beta}_2^{RFM}\right)=B_{11}^{-1}B_{12}B_{22.1}^{-1}B_{21}B_{11}^{-1}=\Phi,\\
\mathrm{Cov}(\varphi_1,\varphi_3)&=\mathrm{Cov}\left(\varphi_1,\varphi_1-\varphi_2\right)=\mathrm{Var}(\varphi_1)-\mathrm{Cov}(\varphi_1,\varphi_2)=B_{11}^{-1}B_{12}B_{22.1}^{-1}B_{21}B_{11}^{-1}=\Phi,\\
\mathrm{Cov}(\varphi_2,\varphi_3)&=\mathrm{Cov}(\varphi_2,\varphi_1)-\mathrm{Var}(\varphi_2)=\left(B_{11.2}^{-1}-\Phi\right)-B_{11}^{-1}=B_{11}^{-1}-B_{11}^{-1}=0.
\end{aligned}
$$

Therefore, the asymptotic distributions of the vectors φ2 and φ3 are obtained as follows:

$$
\varphi_2=\sqrt{n}\left(\hat{\beta}_1^{RSM}-\beta_1\right)\xrightarrow{D}N_{p_1}\left(\gamma,\,B_{11}^{-1}\right),\qquad
\varphi_3=\sqrt{n}\left(\hat{\beta}_1^{RFM}-\hat{\beta}_1^{RSM}\right)\xrightarrow{D}N_{p_1}\left(-\delta,\,\Phi\right).
$$

  □
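The simplification of $\mathrm{Var}(\varphi_2)$ in the proof above rests on the standard block-inverse (Schur complement) identity $B_{11.2}^{-1}-B_{11}^{-1}B_{12}B_{22.1}^{-1}B_{21}B_{11}^{-1}=B_{11}^{-1}$, which can be checked numerically on a random symmetric positive-definite matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
p1, p2 = 3, 4
M = rng.standard_normal((p1 + p2, p1 + p2))
B = M @ M.T + (p1 + p2) * np.eye(p1 + p2)       # well-conditioned SPD matrix
B11, B12 = B[:p1, :p1], B[:p1, p1:]
B21, B22 = B[p1:, :p1], B[p1:, p1:]

B11_2 = B11 - B12 @ np.linalg.solve(B22, B21)   # Schur complement B_{11.2}
B22_1 = B22 - B21 @ np.linalg.solve(B11, B12)   # Schur complement B_{22.1}

# identity: B_{11.2}^{-1} - B11^{-1} B12 B_{22.1}^{-1} B21 B11^{-1} = B11^{-1}
lhs = (np.linalg.inv(B11_2)
       - np.linalg.solve(B11, B12) @ np.linalg.inv(B22_1)
       @ B21 @ np.linalg.inv(B11))
rhs = np.linalg.inv(B11)
```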

Appendix B

We next introduce the following lemma, given in [30], to aid with the proofs of the bias and covariance expressions of the estimators.

Lemma A1.

Let $V=(V_1,V_2,\ldots,V_p)^{\top}$ be a $p$-dimensional normal vector distributed as $N_p(\mu_v,I_p)$. Then, for a measurable function $\Psi$, we have

$$
\begin{aligned}
E\left[V\,\Psi\left(V^{\top}V\right)\right]&=\mu_v\,E\left[\Psi\left(\chi^2_{p+2}(\Delta)\right)\right],\\
E\left[VV^{\top}\,\Psi\left(V^{\top}V\right)\right]&=I_p\,E\left[\Psi\left(\chi^2_{p+2}(\Delta)\right)\right]+\mu_v\mu_v^{\top}\,E\left[\Psi\left(\chi^2_{p+4}(\Delta)\right)\right],
\end{aligned}
$$

where $\chi^2_k(\Delta)$ is a non-central chi-square random variable with $k$ degrees of freedom and non-centrality parameter $\Delta=\mu_v^{\top}\mu_v$.
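The first identity of the lemma can be checked by Monte Carlo with, for example, $\Psi(x)=I(x\le c)$, so that the right-hand side becomes $\mu_v\,P\!\left(\chi^2_{p+2}(\Delta)\le c\right)$. The identity-covariance normal vector and the values of $\mu_v$ and $c$ below are illustrative assumptions:

```python
import numpy as np
from scipy.stats import ncx2

rng = np.random.default_rng(4)
p, c = 4, 9.0
mu = np.array([0.6, -0.3, 0.8, 0.1])
Delta = float(mu @ mu)                          # non-centrality mu_v' mu_v

V = mu + rng.standard_normal((1_000_000, p))    # draws from N_p(mu_v, I_p)
qform = np.einsum("ij,ij->i", V, V)             # V'V for each draw
lhs = (V * (qform <= c)[:, None]).mean(axis=0)  # Monte Carlo E[V Psi(V'V)]
rhs = mu * ncx2.cdf(c, df=p + 2, nc=Delta)      # mu_v E[Psi(chi2_{p+2}(Delta))]
# lhs and rhs agree up to Monte Carlo noise
```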

Appendix B.1

Proof of Theorem 2. 

$$
\mathrm{ADB}\left(\hat{\beta}_1^{RFM}\right)=E\left[\lim_{n\to\infty}\sqrt{n}\left(\hat{\beta}_1^{RFM}-\beta_1\right)\right]=\mu_{11.2}.
$$

$$
\begin{aligned}
\mathrm{ADB}\left(\hat{\beta}_1^{RSM}\right)&=E\left[\lim_{n\to\infty}\sqrt{n}\left(\hat{\beta}_1^{RSM}-\beta_1\right)\right]=E\left[\lim_{n\to\infty}\sqrt{n}\left(\hat{\beta}_1^{RFM}+B_{11}^{-1}B_{12}\hat{\beta}_2^{RFM}-\beta_1\right)\right]\\
&=E\left[\lim_{n\to\infty}\sqrt{n}\left(\hat{\beta}_1^{RFM}-\beta_1\right)\right]+E\left[\lim_{n\to\infty}\sqrt{n}\,B_{11}^{-1}B_{12}\hat{\beta}_2^{RFM}\right]\\
&=\mu_{11.2}+B_{11}^{-1}B_{12}\kappa=\mu_{11.2}+\delta=\gamma.
\end{aligned}
$$

Using Lemma A1,

$$
\begin{aligned}
\mathrm{ADB}\left(\hat{\beta}_1^{RPT}\right)&=E\left[\lim_{n\to\infty}\sqrt{n}\left(\hat{\beta}_1^{RPT}-\beta_1\right)\right]\\
&=E\left[\lim_{n\to\infty}\sqrt{n}\left(\hat{\beta}_1^{RFM}-\left(\hat{\beta}_1^{RFM}-\hat{\beta}_1^{RSM}\right)I\left(L_n\le d_{n,\alpha}\right)-\beta_1\right)\right]\\
&=\mu_{11.2}-E\left[\lim_{n\to\infty}\sqrt{n}\left(\hat{\beta}_1^{RFM}-\hat{\beta}_1^{RSM}\right)I\left(L_n\le d_{n,\alpha}\right)\right]\\
&=\mu_{11.2}+\delta\,H_{p_2+2}\left(\chi^2_{p_2,\alpha};\Delta\right).
\end{aligned}
$$

$$
\begin{aligned}
\mathrm{ADB}\left(\hat{\beta}_1^{RSE}\right)&=E\left[\lim_{n\to\infty}\sqrt{n}\left(\hat{\beta}_1^{RSE}-\beta_1\right)\right]\\
&=E\left[\lim_{n\to\infty}\sqrt{n}\left(\hat{\beta}_1^{RFM}-\left(\hat{\beta}_1^{RFM}-\hat{\beta}_1^{RSM}\right)(p_2-2)L_n^{-1}-\beta_1\right)\right]\\
&=\mu_{11.2}-E\left[\lim_{n\to\infty}\sqrt{n}\left(\hat{\beta}_1^{RFM}-\hat{\beta}_1^{RSM}\right)(p_2-2)L_n^{-1}\right]\\
&=\mu_{11.2}+(p_2-2)\,\delta\,E\left[\chi^{-2}_{p_2+2}(\Delta)\right].
\end{aligned}
$$

$$
\begin{aligned}
\mathrm{ADB}\left(\hat{\beta}_1^{RPS}\right)&=E\left[\lim_{n\to\infty}\sqrt{n}\left(\hat{\beta}_1^{RPS}-\beta_1\right)\right]\\
&=E\left[\lim_{n\to\infty}\sqrt{n}\left(\hat{\beta}_1^{RSM}+\left(\hat{\beta}_1^{RFM}-\hat{\beta}_1^{RSM}\right)\left(1-(p_2-2)L_n^{-1}\right)I\left(L_n>p_2-2\right)-\beta_1\right)\right]\\
&=E\left[\lim_{n\to\infty}\sqrt{n}\left(\hat{\beta}_1^{RFM}-\left(\hat{\beta}_1^{RFM}-\hat{\beta}_1^{RSM}\right)I\left(L_n\le p_2-2\right)-\left(\hat{\beta}_1^{RFM}-\hat{\beta}_1^{RSM}\right)(p_2-2)L_n^{-1}I\left(L_n>p_2-2\right)-\beta_1\right)\right]\\
&=\mu_{11.2}+\delta\,H_{p_2+2}\left(p_2-2;\Delta\right)+(p_2-2)\,\delta\,E\left[\chi^{-2}_{p_2+2}(\Delta)\,I\left(\chi^2_{p_2+2}(\Delta)>p_2-2\right)\right].
\end{aligned}
$$

    □
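The quantities entering the bias expressions above reduce to non-central chi-square cdf values and inverse moments, which can be evaluated numerically. The values of $p_2$, $\alpha$ and $\Delta$ below are arbitrary illustrations, and the inverse moment is approximated by Monte Carlo:

```python
import numpy as np
from scipy.stats import chi2, ncx2

p2, alpha, Delta = 8, 0.05, 2.0
crit = chi2.ppf(1 - alpha, df=p2)          # critical value chi2_{p2,alpha}
H = ncx2.cdf(crit, df=p2 + 2, nc=Delta)    # H_{p2+2}(chi2_{p2,alpha}; Delta)

rng = np.random.default_rng(5)
draws = ncx2.rvs(df=p2 + 2, nc=Delta, size=500_000, random_state=rng)
E_inv = float((1.0 / draws).mean())        # E[chi^{-2}_{p2+2}(Delta)]
```

For a central chi-square, $E[\chi^{-2}_q]=1/(q-2)$; the non-central inverse moment is strictly smaller for $\Delta>0$, which provides a quick sanity check on the Monte Carlo estimate.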

Appendix B.2

In order to compute the risk functions, we first compute the asymptotic covariance of the estimators. The asymptotic covariance of an estimator β^1* is expressed as

$$
\mathrm{Cov}\left(\hat{\beta}_1^{*}\right)=E\left[\lim_{n\to\infty}n\left(\hat{\beta}_1^{*}-\beta_1\right)\left(\hat{\beta}_1^{*}-\beta_1\right)^{\top}\right].
$$

Proof of Theorem 3. 

We first start by computing the asymptotic covariance of the estimator β^1RFM as:

$$
\begin{aligned}
\mathrm{Cov}\left(\hat{\beta}_1^{RFM}\right)&=E\left[\lim_{n\to\infty}\sqrt{n}\left(\hat{\beta}_1^{RFM}-\beta_1\right)\sqrt{n}\left(\hat{\beta}_1^{RFM}-\beta_1\right)^{\top}\right]=E\left(\varphi_1\varphi_1^{\top}\right)\\
&=\mathrm{Var}(\varphi_1)+E(\varphi_1)E(\varphi_1)^{\top}=B_{11.2}^{-1}+\mu_{11.2}\mu_{11.2}^{\top}.
\end{aligned}
$$

Similarly, the asymptotic covariance of the estimator β̂1^RSM is obtained as:

$$
\mathrm{Cov}\left(\hat{\beta}_1^{RSM}\right)=E\left[\lim_{n\to\infty}\sqrt{n}\left(\hat{\beta}_1^{RSM}-\beta_1\right)\sqrt{n}\left(\hat{\beta}_1^{RSM}-\beta_1\right)^{\top}\right]=E\left(\varphi_2\varphi_2^{\top}\right)=\mathrm{Var}(\varphi_2)+E(\varphi_2)E(\varphi_2)^{\top}=B_{11}^{-1}+\gamma\gamma^{\top}.
$$

The asymptotic covariance of the estimator β^1RPT is obtained as:

$$
\begin{aligned}
\mathrm{Cov}\left(\hat{\beta}_1^{RPT}\right)&=E\left[\lim_{n\to\infty}\sqrt{n}\left(\hat{\beta}_1^{RPT}-\beta_1\right)\sqrt{n}\left(\hat{\beta}_1^{RPT}-\beta_1\right)^{\top}\right]\\
&=E\left[\left\{\varphi_1-\varphi_3 I\left(L_n\le d_{n,\alpha}\right)\right\}\left\{\varphi_1-\varphi_3 I\left(L_n\le d_{n,\alpha}\right)\right\}^{\top}\right]\\
&=E\left[\varphi_1\varphi_1^{\top}-2\varphi_3\varphi_1^{\top}I\left(L_n\le d_{n,\alpha}\right)+\varphi_3\varphi_3^{\top}I\left(L_n\le d_{n,\alpha}\right)\right].
\end{aligned}
$$

Thus, we need to find $E\left[\varphi_1\varphi_1^{\top}\right]$, $E\left[\varphi_3\varphi_1^{\top}I\left(L_n\le d_{n,\alpha}\right)\right]$ and $E\left[\varphi_3\varphi_3^{\top}I\left(L_n\le d_{n,\alpha}\right)\right]$. The first term is $E\left[\varphi_1\varphi_1^{\top}\right]=B_{11.2}^{-1}+\mu_{11.2}\mu_{11.2}^{\top}$. Using Lemma A1, the third term is computed as:

$$
E\left[\varphi_3\varphi_3^{\top}I\left(L_n\le d_{n,\alpha}\right)\right]=\Phi\,H_{p_2+2}\left(\chi^2_{p_2,\alpha};\Delta\right)+\delta\delta^{\top}H_{p_2+4}\left(\chi^2_{p_2,\alpha};\Delta\right).
$$

The second term, $E\left[\varphi_3\varphi_1^{\top}I\left(L_n\le d_{n,\alpha}\right)\right]$, can be computed from normal theory as

$$
\begin{aligned}
E\left[\varphi_3\varphi_1^{\top}I\left(L_n\le d_{n,\alpha}\right)\right]&=E\left[E\left(\varphi_3\varphi_1^{\top}I\left(L_n\le d_{n,\alpha}\right)\mid\varphi_3\right)\right]=E\left[\varphi_3\,E\left(\varphi_1^{\top}\mid\varphi_3\right)I\left(L_n\le d_{n,\alpha}\right)\right]\\
&=E\left[\varphi_3\left\{\mu_{11.2}+\left(\varphi_3+\delta\right)\right\}^{\top}I\left(L_n\le d_{n,\alpha}\right)\right]\\
&=E\left[\varphi_3 I\left(L_n\le d_{n,\alpha}\right)\right]\left(\mu_{11.2}+\delta\right)^{\top}+E\left[\varphi_3\varphi_3^{\top}I\left(L_n\le d_{n,\alpha}\right)\right]\\
&=-\delta\mu_{11.2}^{\top}H_{p_2+2}\left(\chi^2_{p_2,\alpha};\Delta\right)-\delta\delta^{\top}H_{p_2+2}\left(\chi^2_{p_2,\alpha};\Delta\right)+\Phi\,H_{p_2+2}\left(\chi^2_{p_2,\alpha};\Delta\right)+\delta\delta^{\top}H_{p_2+4}\left(\chi^2_{p_2,\alpha};\Delta\right).
\end{aligned}
$$

Putting all the terms together and simplifying, we obtain

$$
\mathrm{Cov}\left(\hat{\beta}_1^{RPT}\right)=B_{11.2}^{-1}+\mu_{11.2}\mu_{11.2}^{\top}+2\delta\mu_{11.2}^{\top}H_{p_2+2}\left(\chi^2_{p_2,\alpha};\Delta\right)-\Phi\,H_{p_2+2}\left(\chi^2_{p_2,\alpha};\Delta\right)+\delta\delta^{\top}\left[2H_{p_2+2}\left(\chi^2_{p_2,\alpha};\Delta\right)-H_{p_2+4}\left(\chi^2_{p_2,\alpha};\Delta\right)\right].
$$

The asymptotic covariance of the estimator β^1RSE can be obtained as

$$
\begin{aligned}
\mathrm{Cov}\left(\hat{\beta}_1^{RSE}\right)&=E\left[\lim_{n\to\infty}\sqrt{n}\left(\hat{\beta}_1^{RSE}-\beta_1\right)\sqrt{n}\left(\hat{\beta}_1^{RSE}-\beta_1\right)^{\top}\right]\\
&=E\left[\left\{\varphi_1-(p_2-2)\varphi_3 L_n^{-1}\right\}\left\{\varphi_1-(p_2-2)\varphi_3 L_n^{-1}\right\}^{\top}\right]\\
&=E\left[\varphi_1\varphi_1^{\top}-2(p_2-2)\varphi_3\varphi_1^{\top}L_n^{-1}+(p_2-2)^2\varphi_3\varphi_3^{\top}L_n^{-2}\right].
\end{aligned}
$$

We need to compute $E\left[\varphi_3\varphi_3^{\top}L_n^{-2}\right]$ and $E\left[\varphi_3\varphi_1^{\top}L_n^{-1}\right]$. By Lemma A1, the first term is obtained as follows:

$$
E\left[\varphi_3\varphi_3^{\top}L_n^{-2}\right]=\Phi\,E\left[\chi^{-4}_{p_2+2}(\Delta)\right]+\delta\delta^{\top}E\left[\chi^{-4}_{p_2+4}(\Delta)\right].
$$

The second term is computed from normal theory

$$
\begin{aligned}
E\left[\varphi_3\varphi_1^{\top}L_n^{-1}\right]&=E\left[E\left(\varphi_3\varphi_1^{\top}L_n^{-1}\mid\varphi_3\right)\right]=E\left[\varphi_3\,E\left(\varphi_1^{\top}\mid\varphi_3\right)L_n^{-1}\right]\\
&=E\left[\varphi_3\left\{\mu_{11.2}+\left(\varphi_3+\delta\right)\right\}^{\top}L_n^{-1}\right]\\
&=E\left[\varphi_3 L_n^{-1}\right]\mu_{11.2}^{\top}+E\left[\varphi_3\varphi_3^{\top}L_n^{-1}\right]+E\left[\varphi_3 L_n^{-1}\right]\delta^{\top}.
\end{aligned}
$$

From Lemma A1, we can find $E\left[\varphi_3 L_n^{-1}\right]=-\delta\,E\left[\chi^{-2}_{p_2+2}(\Delta)\right]$ and $E\left[\varphi_3\varphi_3^{\top}L_n^{-1}\right]=\Phi\,E\left[\chi^{-2}_{p_2+2}(\Delta)\right]+\delta\delta^{\top}E\left[\chi^{-2}_{p_2+4}(\Delta)\right]$. Putting these terms together and simplifying, we obtain

$$
\begin{aligned}
\mathrm{Cov}\left(\hat{\beta}_1^{RSE}\right)&=B_{11.2}^{-1}+\mu_{11.2}\mu_{11.2}^{\top}+2(p_2-2)\,\delta\mu_{11.2}^{\top}E\left[\chi^{-2}_{p_2+2}(\Delta)\right]\\
&\quad-(p_2-2)\,\Phi\left\{2E\left[\chi^{-2}_{p_2+2}(\Delta)\right]-(p_2-2)E\left[\chi^{-4}_{p_2+2}(\Delta)\right]\right\}\\
&\quad+(p_2-2)\,\delta\delta^{\top}\left\{2E\left[\chi^{-2}_{p_2+2}(\Delta)\right]-2E\left[\chi^{-2}_{p_2+4}(\Delta)\right]+(p_2-2)E\left[\chi^{-4}_{p_2+4}(\Delta)\right]\right\}.
\end{aligned}
$$

Since

$$
\hat{\beta}_1^{RPS}=\hat{\beta}_1^{RSE}-\left(\hat{\beta}_1^{RFM}-\hat{\beta}_1^{RSM}\right)\left(1-(p_2-2)L_n^{-1}\right)I\left(L_n\le p_2-2\right),
$$

we derive the covariance of the estimator $\hat{\beta}_1^{RPS}$ as follows.

$$
\begin{aligned}
\mathrm{Cov}\left(\hat{\beta}_1^{RPS}\right)&=E\left[\lim_{n\to\infty}\sqrt{n}\left(\hat{\beta}_1^{RPS}-\beta_1\right)\sqrt{n}\left(\hat{\beta}_1^{RPS}-\beta_1\right)^{\top}\right]\\
&=E\Big[\lim_{n\to\infty}\left\{\sqrt{n}\left(\hat{\beta}_1^{RSE}-\beta_1\right)-\varphi_3\left(1-(p_2-2)L_n^{-1}\right)I\left(L_n\le p_2-2\right)\right\}\\
&\qquad\times\left\{\sqrt{n}\left(\hat{\beta}_1^{RSE}-\beta_1\right)-\varphi_3\left(1-(p_2-2)L_n^{-1}\right)I\left(L_n\le p_2-2\right)\right\}^{\top}\Big]\\
&=\mathrm{Cov}\left(\hat{\beta}_1^{RSE}\right)-2E\left[\lim_{n\to\infty}\varphi_3\,\sqrt{n}\left(\hat{\beta}_1^{RSE}-\beta_1\right)^{\top}\left(1-(p_2-2)L_n^{-1}\right)I\left(L_n\le p_2-2\right)\right]\\
&\qquad+E\left[\lim_{n\to\infty}\varphi_3\varphi_3^{\top}\left(1-(p_2-2)L_n^{-1}\right)^2 I\left(L_n\le p_2-2\right)\right]\\
&=\mathrm{Cov}\left(\hat{\beta}_1^{RSE}\right)-2E\left[\lim_{n\to\infty}\varphi_3\varphi_1^{\top}\left(1-(p_2-2)L_n^{-1}\right)I\left(L_n\le p_2-2\right)\right]\\
&\qquad-E\left[\lim_{n\to\infty}\varphi_3\varphi_3^{\top}(p_2-2)^2 L_n^{-2}I\left(L_n\le p_2-2\right)\right]+E\left[\lim_{n\to\infty}\varphi_3\varphi_3^{\top}I\left(L_n\le p_2-2\right)\right],
\end{aligned}
$$

using $\sqrt{n}\left(\hat{\beta}_1^{RSE}-\beta_1\right)=\varphi_1-(p_2-2)\varphi_3 L_n^{-1}$ in the cross term.

We first compute the last term in the equation above: by Lemma A1, $E\left[\varphi_3\varphi_3^{\top}I\left(L_n\le p_2-2\right)\right]=\Phi\,H_{p_2+2}\left(p_2-2;\Delta\right)+\delta\delta^{\top}H_{p_2+4}\left(p_2-2;\Delta\right)$. Using Lemma A1 and normal theory, we find:

$$
\begin{aligned}
&E\left[\varphi_3\varphi_1^{\top}\left\{1-(p_2-2)L_n^{-1}\right\}I\left(L_n\le p_2-2\right)\right]\\
&\quad=E\left[\varphi_3\,E\left(\varphi_1^{\top}\mid\varphi_3\right)\left\{1-(p_2-2)L_n^{-1}\right\}I\left(L_n\le p_2-2\right)\right]\\
&\quad=E\left[\varphi_3\left\{\mu_{11.2}+\left(\varphi_3+\delta\right)\right\}^{\top}\left\{1-(p_2-2)L_n^{-1}\right\}I\left(L_n\le p_2-2\right)\right]\\
&\quad=-\delta\left(\mu_{11.2}+\delta\right)^{\top}E\left[\left\{1-(p_2-2)\chi^{-2}_{p_2+2}(\Delta)\right\}I\left(\chi^2_{p_2+2}(\Delta)\le p_2-2\right)\right]\\
&\qquad+\Phi\,E\left[\left\{1-(p_2-2)\chi^{-2}_{p_2+2}(\Delta)\right\}I\left(\chi^2_{p_2+2}(\Delta)\le p_2-2\right)\right]+\delta\delta^{\top}E\left[\left\{1-(p_2-2)\chi^{-2}_{p_2+4}(\Delta)\right\}I\left(\chi^2_{p_2+4}(\Delta)\le p_2-2\right)\right],
\end{aligned}
$$

$$
E\left[\varphi_3\varphi_3^{\top}(p_2-2)^2 L_n^{-2}I\left(L_n\le p_2-2\right)\right]=(p_2-2)^2\,\Phi\,E\left[\chi^{-4}_{p_2+2}(\Delta)I\left(\chi^2_{p_2+2}(\Delta)\le p_2-2\right)\right]+(p_2-2)^2\,\delta\delta^{\top}E\left[\chi^{-4}_{p_2+4}(\Delta)I\left(\chi^2_{p_2+4}(\Delta)\le p_2-2\right)\right].
$$

Putting all the terms together, we obtain

$$
\begin{aligned}
\mathrm{Cov}\left(\hat{\beta}_1^{RPS}\right)&=\mathrm{Cov}\left(\hat{\beta}_1^{RSE}\right)+2\delta\mu_{11.2}^{\top}E\left[\left\{1-(p_2-2)\chi^{-2}_{p_2+2}(\Delta)\right\}I\left(\chi^2_{p_2+2}(\Delta)\le p_2-2\right)\right]\\
&\quad-2\Phi\,E\left[\left\{1-(p_2-2)\chi^{-2}_{p_2+2}(\Delta)\right\}I\left(\chi^2_{p_2+2}(\Delta)\le p_2-2\right)\right]\\
&\quad-2\delta\delta^{\top}E\left[\left\{1-(p_2-2)\chi^{-2}_{p_2+4}(\Delta)\right\}I\left(\chi^2_{p_2+4}(\Delta)\le p_2-2\right)\right]\\
&\quad+2\delta\delta^{\top}E\left[\left\{1-(p_2-2)\chi^{-2}_{p_2+2}(\Delta)\right\}I\left(\chi^2_{p_2+2}(\Delta)\le p_2-2\right)\right]\\
&\quad-(p_2-2)^2\,\Phi\,E\left[\chi^{-4}_{p_2+2}(\Delta)I\left(\chi^2_{p_2+2}(\Delta)\le p_2-2\right)\right]\\
&\quad-(p_2-2)^2\,\delta\delta^{\top}E\left[\chi^{-4}_{p_2+4}(\Delta)I\left(\chi^2_{p_2+4}(\Delta)\le p_2-2\right)\right]\\
&\quad+\Phi\,H_{p_2+2}\left(p_2-2;\Delta\right)+\delta\delta^{\top}H_{p_2+4}\left(p_2-2;\Delta\right).
\end{aligned}
$$

 □

Author Contributions

Conceptualization, E.A.O. and S.E.A.; methodology, E.A.O. and F.S.N.; formal analysis, E.A.O.; writing—original draft preparation, E.A.O.; writing—review and editing, E.A.O., S.E.A. and F.S.N.; supervision, F.S.N. and S.E.A.; funding acquisition, F.S.N. and S.E.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Natural Sciences and Engineering Research Council of Canada (NSERC).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here https://pubmed.ncbi.nlm.nih.gov/22434862/ (accessed on 20 April 2021).

Conflicts of Interest

The authors declare no conflict of interest.

Footnotes

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Laird N.M., Ware J.H. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974. doi: 10.2307/2529876. [DOI] [PubMed] [Google Scholar]
  • 2.Longford N. Regression analysis of multilevel data with measurement error. Br. J. Math. Stat. Psychol. 1993;46:301–311. doi: 10.1111/j.2044-8317.1993.tb01018.x. [DOI] [Google Scholar]
  • 3.Tibshirani R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996;58:267–288. doi: 10.1111/j.2517-6161.1996.tb02080.x. [DOI] [Google Scholar]
  • 4.Zou H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 2006;101:1418–1429. doi: 10.1198/016214506000000735. [DOI] [Google Scholar]
  • 5.Tran M.N. The loss rank criterion for variable selection in linear regression analysis. Scand. J. Stat. 2011;38:466–479. doi: 10.1111/j.1467-9469.2011.00732.x. [DOI] [Google Scholar]
  • 6.Huang J., Ma S., Zhang C.H. Adaptive Lasso for sparse high-dimensional regression models. Stat. Sin. 2008;18:1603–1618. [Google Scholar]
  • 7.Kim Y., Choi H., Oh H.S. Smoothly clipped absolute deviation on high dimensions. J. Am. Stat. Assoc. 2008;103:1665–1673. doi: 10.1198/016214508000001066. [DOI] [Google Scholar]
  • 8.Wang H., Leng C. Unified LASSO estimation by least squares approximation. J. Am. Stat. Assoc. 2007;102:1039–1048. doi: 10.1198/016214507000000509. [DOI] [Google Scholar]
  • 9.Yuan M., Lin Y. Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2006;68:49–67. doi: 10.1111/j.1467-9868.2005.00532.x. [DOI] [Google Scholar]
  • 10.Leng C., Lin Y., Wahba G. A note on the lasso and related procedures in model selection. Stat. Sin. 2006;16:1273–1284. [Google Scholar]
  • 11.Park T., Casella G. The bayesian lasso. J. Am. Stat. Assoc. 2008;103:681–686. doi: 10.1198/016214508000000337. [DOI] [Google Scholar]
  • 12.Greenlaw K., Szefer E., Graham J., Lesperance M., Nathoo F.S., Initiative A.D.N. A Bayesian group sparse multi-task regression model for imaging genetics. Bioinformatics. 2017;33:2513–2522. doi: 10.1093/bioinformatics/btx215. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Ahmed S.E., Nicol C.J. An application of shrinkage estimation to the nonlinear regression model. Comput. Stat. Data Anal. 2012;56:3309–3321. doi: 10.1016/j.csda.2010.07.022. [DOI] [Google Scholar]
  • 14.Ahmed S.E., Raheem S.E. Shrinkage and absolute penalty estimation in linear regression models. Wiley Interdiscip. Rev. Comput. Stat. 2012;4:541–553. doi: 10.1002/wics.1232. [DOI] [Google Scholar]
  • 15.Lisawadi S., Kashif Ali Shah M., Ejaz Ahmed S. Model selection and post estimation based on a pretest for logistic regression models. J. Stat. Comput. Simul. 2016;86:3495–3511. doi: 10.1080/00949655.2016.1167894. [DOI] [Google Scholar]
  • 16.Ahmed S.E., Opoku E.A. Proceedings of the Tenth International Conference on Management Science and Engineering Management. Springer; Berlin/Heidelberg, Germany: 2017. Submodel selection and post-estimation of the linear mixed models; pp. 633–646. [Google Scholar]
  • 17.Raheem S.E., Ahmed S.E., Doksum K.A. Absolute penalty and shrinkage estimation in partially linear models. Comput. Stat. Data Anal. 2012;56:874–891. doi: 10.1016/j.csda.2011.09.021. [DOI] [Google Scholar]
  • 18.Geladi P., Kowalski B.R. Partial least-squares regression: A tutorial. Anal. Chim. Acta. 1986;185:1–17. doi: 10.1016/0003-2670(86)80028-9. [DOI] [Google Scholar]
  • 19.Liu K. Using Liu-type estimator to combat collinearity. Commun. Stat.-Theory Methods. 2003;32:1009–1020. doi: 10.1081/STA-120019959. [DOI] [Google Scholar]
  • 20.Hoerl A.E., Kennard R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics. 1970;12:55–67. doi: 10.1080/00401706.1970.10488634. [DOI] [Google Scholar]
  • 21.Yüzbaşı B., Ejaz Ahmed S. Shrinkage and penalized estimation in semi-parametric models with multicollinear data. J. Stat. Comput. Simul. 2016;86:3543–3561. doi: 10.1080/00949655.2016.1171868. [DOI] [Google Scholar]
  • 22.Yüzbaşı B., Ahmed S.E., Güngör M. Improved penalty strategies in linear regression models. REVSTAT J. 2017;15:251–276. [Google Scholar]
  • 23.Knight K., Fu W. Asymptotics for lasso-type estimators. Ann. Stat. 2000;28:1356–1378. [Google Scholar]
  • 24.Belsley D.A. Conditioning Diagnostics: Collinearity and Weak Data in Regression. Wiley; Hoboken, NJ, USA: 1991. [Google Scholar]
  • 25.Twisk J., Kemper H., Mellenbergh G. Longitudinal development of lipoprotein levels in males and females aged 12–28 years: The Amsterdam Growth and Health Study. Int. J. Epidemiol. 1995;24:69–77. doi: 10.1093/ije/24.1.69. [DOI] [PubMed] [Google Scholar]
  • 26.Nie Y., Opoku E., Yasmin L., Song Y., Wang J., Wu S., Scarapicchia V., Gawryluk J., Wang L., Cao J., et al. Spectral dynamic causal modelling of resting-state fMRI: An exploratory study relating effective brain connectivity in the default mode network to genetics. Stat. Appl. Genet. Mol. Biol. 2020;19 doi: 10.1515/sagmb-2019-0058. [DOI] [PubMed] [Google Scholar]
  • 27.Ahmed S.E., Kim H., Yıldırım G., Yüzbaşı B. International Workshop on Matrices and Statistics. Springer; Berlin/Heidelberg, Germany: 2016. High-Dimensional Regression Under Correlated Design: An Extensive Simulation Study; pp. 145–175. [Google Scholar]
  • 28.Ejaz Ahmed S., Yüzbaşı B. Big data analytics: Integrating penalty strategies. Int. J. Manag. Sci. Eng. Manag. 2016;11:105–115. doi: 10.1080/17509653.2016.1153252. [DOI] [Google Scholar]
  • 29.Ahmed S.E., Yüzbaşı B. Big and Complex Data Analysis. Springer; Berlin/Heidelberg, Germany: 2017. High dimensional data analysis: Integrating submodels; pp. 285–304. [Google Scholar]
  • 30.Judge G.G., Bock M.E. The Statistical Implication of Pre-Test and Steinrule Estimators in Econometrics. Elsevier; Amsterdam, The Netherlands: 1978. [Google Scholar]


