J Inequal Appl. 2017 Jun 30;2017(1):158. doi: 10.1186/s13660-017-1432-x

Adaptive group bridge estimation for high-dimensional partially linear models

Xiuli Wang, Mingqiu Wang
PMCID: PMC5493733  PMID: 28725135

Abstract

This paper studies group selection for the partially linear model with a diverging number of parameters. We propose an adaptive group bridge method and study the consistency, convergence rate and asymptotic distribution of the global adaptive group bridge estimator under regularity conditions. Simulation studies and a real example show the finite sample performance of our method.

Keywords: adaptive group bridge, high dimension, partially linear model

Introduction

Consider the following model:

$$Y=x^T\beta+f(U)+\varepsilon, \tag{1}$$

where $x=(x_1^T,x_2^T,\ldots,x_{p_n}^T)^T$ is a covariate vector with $x_j=(X_{jk},k=1,\ldots,d_j)^T$ being a $d_j\times 1$ vector corresponding to the $j$th group in the linear part, $\beta=(\beta_j^T,j=1,\ldots,p_n)^T$ with $\beta_j$ being the $d_j\times 1$ vector of regression coefficients, $f$ is an unknown function of $U$, and $\varepsilon$ is the random error with mean zero. Without loss of generality, $U$ is scaled to $[0,1]$. Furthermore, $(x,U)$ and $\varepsilon$ are independent.

Variable selection for high-dimensional data is an active and important topic. Penalized regression methods have been widely used in the literature; see, for example, [1–5]. Among these methods, bridge regression, which includes the lasso and ridge regression as two well-known special cases, has been studied by many authors (e.g., [6–10]). [11] studied adaptive bridge estimation for high-dimensional linear models. In addition, group structures among variables arise in many contemporary statistical modeling problems. [12] proposed a group bridge method which not only effectively removes unimportant groups, but also maintains the flexibility of selecting variables within identified groups. [13] investigated an adaptive choice of the penalty order in group bridge regression.

Model (1) is the partially linear model, which originated in [14]. It is a common semiparametric model that combines interpretability with flexibility. Our contributions in this paper are as follows: (1) we propose an adaptive group bridge method to achieve group selection in a high-dimensional partially linear model; (2) we consider the choice of the index $\gamma$ in the adaptive group bridge and use leave-one-observation-out cross-validation (CV) to make this choice, with a closed-form expression for the CV error that significantly reduces the computational burden; (3) we establish the consistency, convergence rate and asymptotic distribution of the adaptive group bridge estimator, defined as the global minimizer of the objective function.

The rest of the article is organized as follows. Section 2 introduces the adaptive group bridge method. Section 3 states the assumptions and asymptotic results for the global adaptive group bridge estimator. Section 4 presents the computational algorithm and the selection of tuning parameters. Simulation studies and a real data example are presented in Section 5. Section 6 gives a short discussion. Technical proofs are relegated to the Appendix.

Adaptive group bridge in the partially linear model

Suppose that we have a collection of independent observations $\{(x_i,U_i,Y_i),1\le i\le n\}$ from model (1). That is,

$$Y_i=x_i^T\beta+f(U_i)+\varepsilon_i,\quad i=1,\ldots,n, \tag{2}$$

where $\varepsilon_1,\ldots,\varepsilon_n$ are i.i.d. random errors with mean zero and finite variance $\sigma^2<\infty$.

To obtain an estimate of the function $f(\cdot)$, we employ a B-spline basis. Denote by $S_n$ the space of polynomial splines of degree $m\ge 1$. Let $\{B_k(u),1\le k\le q_n\}$ be a normalized B-spline basis with $\|B_k\|_\infty\le 1$, where $\|\cdot\|_\infty$ is the sup norm. Then, for any $f_n\in S_n$, we have

$$f_n(u)=\sum_{j=1}^{q_n}B_j(u)\alpha_j\equiv B(u)^T\alpha.$$

Under some smoothness conditions, the nonparametric function $f$ can be well approximated by functions in $S_n$.

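Consider the following adaptive group bridge penalized objective function:

$$\sum_{i=1}^{n}\bigl(Y_i-x_i^T\beta-B(U_i)^T\alpha\bigr)^2+\sum_{j=1}^{p_n}\lambda_j\|\beta_j\|^{\gamma}, \tag{3}$$

where $\lambda_j$, $j=1,\ldots,p_n$, are the tuning parameters, and $\|\cdot\|$ denotes the $L_2$ norm on the Euclidean space. Let $Y=(Y_1,\ldots,Y_n)^T$, $X=(X_{ijk},1\le i\le n,1\le j\le p_n,1\le k\le d_j)=(x_1,\ldots,x_n)^T$ and $Z=(B(U_1),\ldots,B(U_n))^T$. Then (3) can be written as

$$L_n(\beta,\alpha)=\|Y-X\beta-Z\alpha\|^2+\sum_{j=1}^{p_n}\lambda_j\|\beta_j\|^{\gamma}. \tag{4}$$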

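For a given $\beta$, the optimal $\alpha$ minimizing $L_n(\beta,\alpha)$ satisfies the first-order condition

$$\partial L_n(\beta,\alpha)/\partial\alpha=0,$$

namely,

$$Z^TZ\alpha=Z^T(Y-X\beta).$$

Let $H=Z(Z^TZ)^{-1}Z^T$, and note that $H$ is a projection matrix. Substituting the optimal $\alpha$ into (4) gives $Y-X\beta-Z\alpha=(I-H)(Y-X\beta)$, so we can rewrite (4) as follows:

$$Q_n(\beta)=\|(I-H)(Y-X\beta)\|^2+\sum_{j=1}^{p_n}\lambda_j\|\beta_j\|^{\gamma}. \tag{5}$$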

For a fixed $\gamma>0$, define $\hat\beta=\arg\min_{\beta}Q_n(\beta)$; $\hat\beta$ is called the adaptive group bridge estimator. Once $\hat\beta$ is obtained, the normal equations above give $\hat\alpha=(Z^TZ)^{-1}Z^T(Y-X\hat\beta)$, and the nonparametric part is estimated by $\hat f_n(u)=B(u)^T\hat\alpha$.
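As a rough numerical illustration of (4) and (5), the sketch below profiles out the spline coefficients and evaluates $Q_n(\beta)$. It is only a sketch under stated assumptions: the function names (`spline_basis`, `residualize`, `objective`) are hypothetical, the knots are taken to be equally spaced, and a truncated power basis is used in place of the normalized B-spline basis. For given knots both bases span the same cubic spline space, so the projection $H$ is unchanged.

```python
import numpy as np

def spline_basis(u, n_interior=3, degree=3):
    """Truncated power basis on [0, 1] with equally spaced interior knots.

    Assumption: used instead of B-splines; it spans the same spline space,
    so the projection H onto its column space is the same.
    """
    u = np.asarray(u, dtype=float)
    knots = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    cols = [u ** p for p in range(degree + 1)]
    cols += [np.clip(u - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(cols)  # shape (n, degree + 1 + n_interior)

def residualize(Z, M):
    """Return (I - H) M, where H projects onto the column space of Z."""
    Q, _ = np.linalg.qr(Z)  # orthonormal basis of col(Z); H = Q Q^T
    return M - Q @ (Q.T @ M)

def objective(beta, X, Y, Z, groups, lam, gamma):
    """Q_n(beta) of (5): profiled residual sum of squares plus group bridge penalty."""
    r = residualize(Z, Y - X @ beta)
    penalty = sum(l * np.linalg.norm(beta[g]) ** gamma for l, g in zip(lam, groups))
    return float(r @ r + penalty)
```

With the defaults, `spline_basis` returns seven columns, which matches the choice $q_n=7$ made later in the paper. Using a QR factorization avoids forming the $n\times n$ matrix $H$ explicitly.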

Asymptotic properties

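In this section, we establish the oracle property of the parametric part. We first introduce some notation. Define $g(u)=E(x\mid U=u)$ and $\tilde x=x-E(x\mid U)$. Let $\Sigma(u)$ be the conditional covariance matrix of $\tilde x$, i.e., $\Sigma(u)=\operatorname{cov}(\tilde x\mid U=u)$. Denote by $\Omega$ the unconditional covariance matrix of $\tilde x$, i.e., $\Omega=E[\Sigma(U)]$. The corresponding sample versions are $G=(g(U_1),\ldots,g(U_n))^T$ with $g(U_i)=E(x_i\mid U_i)$ and $\tilde X=(\tilde x_1,\ldots,\tilde x_n)^T$ with $\tilde x_i=x_i-E(x_i\mid U_i)$.

Let the true parameter be $\beta_0=(\beta_{01}^T,\ldots,\beta_{0p_n}^T)^T\equiv(\beta_{10}^T,\beta_{20}^T)^T$. Let $A=\{1\le j\le p_n:\|\beta_{0j}\|\ne 0\}$ be the index set of the nonzero groups. Without loss of generality, we assume that the coefficients of the first $k_n$ groups are nonzero, i.e., $A=\{1,2,\ldots,k_n\}$. Let $|A|=k_n$ be the cardinality of the set $A$, which is allowed to increase with $n$. For $j\notin A$, $\|\beta_{0j}\|=0$. Define $\beta_{10}=(\beta_{0j}^T,j\in A)^T$ and $\beta_{20}=(\beta_{0j}^T,j\notin A)^T$. Let $d=\max_{1\le j\le p_n}d_j$, $\varphi_{n1}=\max\{\lambda_j,j\in A\}$ and $\varphi_{n2}=\min\{\lambda_j,j\notin A\}$.

Corresponding to the partition of $\beta_0$, denote $\hat\beta=(\hat\beta_{(1)}^T,\hat\beta_{(2)}^T)^T$ and decompose

$$X=(X_1\ X_2),\quad G=(G_1\ G_2),\quad \tilde X=(\tilde X_1\ \tilde X_2),\quad \Omega=\begin{pmatrix}\Omega_{11}&\Omega_{12}\\ \Omega_{21}&\Omega_{22}\end{pmatrix}.$$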

The following conditions are required for the B-spline approximation of the function $f$.

  (C1) The distribution of $U$ is absolutely continuous, and its density is bounded away from 0 and ∞.

  (C2) (Hölder conditions on $f(\cdot)$ and $g_j(\cdot)$, where $g_j$ is the $j$th component of $g$) Let $l$, $\delta$ and $M$ be real constants such that $0<\delta\le 1$ and $M>0$. $f(\cdot)$ and $g_j(\cdot)$ belong to the class of functions
    $$\mathcal{H}=\{h:|h^{(l)}(u_1)-h^{(l)}(u_2)|\le M|u_1-u_2|^{\delta},\ \text{for } 0\le u_1,u_2\le 1\},$$
    where $0<l\le m-1$ and $r=l+\delta$.

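The following conditions are needed to obtain the asymptotic results.

  (A1) Let $\lambda_{\max}(\Omega)$ and $\lambda_{\min}(\Omega)$ be the largest and smallest eigenvalues of $\Omega$, respectively. There exist constants $\tau_1$ and $\tau_2$ such that
    $$0<\tau_1\le\lambda_{\min}(\Omega)\le\lambda_{\max}(\Omega)\le\tau_2<\infty.$$
  (A2) There exist constants $0<b_0<b_1<\infty$ such that
    $$b_0\le\min\{\|\beta_{0j}\|,1\le j\le k_n\}\le\max\{\|\beta_{0j}\|,1\le j\le k_n\}\le b_1.$$
  (A3) $\|n^{-1}X^T(I-H)X-\Omega\|\stackrel{P}{\to}0$; $E[\operatorname{tr}(X^T(I-H)X)]=O(np_n)$.

  (A4) $d=O(1)$, $p_n^2/n\to 0$ and $n^{-1}\varphi_{n1}k_n\to 0$.

  (A5) (a) $\varphi_{n1}k_n^{1/2}/(\sqrt{np_n}+n\sqrt{p_n}q_n^{-r})\to 0$; (b) $\varphi_{n2}(\sqrt{n^{-1}p_n}+\sqrt{p_n}q_n^{-r})^{\gamma-2}/n\to\infty$.

  (A6) For every $1\le j\le p_n$ and $1\le k\le d_j$, $E[X_{1jk}-E(X_{1jk}\mid U_1)]^4$ is bounded. Furthermore, $E(\varepsilon^4)$ is bounded.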

Conditions (A1) and (A2) are commonly used. Condition (A3) holds under some conditions. The proof can be found in Lemmas 1 and 2 in [15]. Condition (A4) is used to obtain the consistency of the estimator. Condition (A5) is needed in the proof of convergence rate. Condition (A6) is necessary to attain the asymptotic distribution.

Theorem 3.1

Consistency

Suppose that $\gamma>0$ and conditions (A1)-(A4) hold. Then

$$\|\hat\beta-\beta_0\|^2=O_P(n^{-1}dp_n+q_n^{-2r}+n^{-1}\varphi_{n1}k_n),$$

namely, $\|\hat\beta-\beta_0\|\stackrel{P}{\to}0$.

Theorem 3.1 implies that under some conditions the estimators converge to the true values of parameters.

Theorem 3.2

Convergence rate

Suppose that conditions (A1)-(A5) hold. Then

$$\|\hat\beta-\beta_0\|=O_P\bigl(\sqrt{n^{-1}p_n}+\sqrt{p_n}\,q_n^{-r}\bigr).$$

This theorem shows that the adaptive group bridge can attain the optimal convergence rate even when $p_n\to\infty$.

Theorem 3.3

Oracle property

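Suppose that $0<\gamma<1$, $n^{-1}k_nq_n\to 0$ and $nq_n^{-2r}\to 0$. If conditions (A1)-(A6) are satisfied, then we have

  (i) $\Pr(\hat\beta_{(2)}=0)\to 1$ as $n\to\infty$;

  (ii) Let $u_n^2=n^2\omega_n^T(X_1^T(I-H)X_1)^{-1}\Omega_{11}(X_1^T(I-H)X_1)^{-1}\omega_n$ with $\omega_n$ being some $\sum_{j=1}^{k_n}d_j$-vector with $\|\omega_n\|^2=1$. Then
    $$n^{1/2}u_n^{-1}\omega_n^T(\hat\beta_{(1)}-\beta_{10})\stackrel{D}{\to}N(0,\sigma^2).$$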

This theorem states that the adaptive group bridge performs as well as the oracle [16].

Computational algorithm and selection of tuning parameters

Computational algorithm

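In this section, we apply the local quadratic approximation (LQA) algorithm proposed by [3] to compute the adaptive group bridge estimate.

We take the ordinary least squares estimate as the initial value $\beta^{(0)}$. The penalty term $p_{\lambda_j}(\|\beta_j\|)=\lambda_j\|\beta_j\|^{\gamma}$ can be approximated as

$$p_{\lambda_j}(\|\beta_j\|)\approx p_{\lambda_j}(\|\beta_j^{(0)}\|)+\frac{1}{2}\bigl\{p'_{\lambda_j}(\|\beta_j^{(0)}\|)/\|\beta_j^{(0)}\|\bigr\}\bigl(\|\beta_j\|^2-\|\beta_j^{(0)}\|^2\bigr),$$

when $\|\beta_j^{(0)}\|>0$. The following iterative expression for $\beta$ can be obtained:

$$\beta^{(1)}=\bigl[X^T(I-H)X+n\Sigma_{\lambda,\gamma}(\beta^{(0)})\bigr]^{-1}X^T(I-H)Y, \tag{6}$$

where

$$\Sigma_{\lambda,\gamma}(\beta^{(0)})=\operatorname{diag}\Bigl\{\frac{p'_{\lambda_j}(\|\beta_j^{(0)}\|)}{\|\beta_j^{(0)}\|}I_{d_j},\ j=1,\ldots,p_n\Bigr\},$$

with $I_{d_j}$ being the $d_j\times d_j$ identity matrix. If some $\|\beta_j^{(1)}\|$ is smaller than $10^{-3}$, then we set $\beta_j^{(1)}=0$. The final estimate is obtained by iterating formula (6) until convergence is achieved.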
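The iteration (6) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `lqa_group_bridge`, the stopping tolerance, and the iteration cap are hypothetical, the inputs are assumed to be the already projected quantities $(I-H)X$ and $(I-H)Y$, and a group that has been set to zero is simply removed from later iterations (an assumption made here because its LQA weight would be infinite). The factor $n$ in front of $\Sigma_{\lambda,\gamma}$ is kept exactly as in (6).

```python
import numpy as np

def lqa_group_bridge(Xs, Ys, groups, lam, gamma, tol=1e-3, max_iter=100):
    """Iterate formula (6). Xs, Ys stand for (I - H) X and (I - H) Y;
    groups is a list of index arrays, lam a list of lambda_j values."""
    n, p = Xs.shape
    beta = np.linalg.lstsq(Xs, Ys, rcond=None)[0]  # OLS initial value
    for _ in range(max_iter):
        weights = np.zeros(p)
        keep = np.zeros(p, dtype=bool)
        for j, g in enumerate(groups):
            norm = np.linalg.norm(beta[g])
            if norm >= tol:
                keep[g] = True
                # p'(t) / t for p(t) = lambda_j * t**gamma
                weights[g] = lam[j] * gamma * norm ** (gamma - 2.0)
        new = np.zeros(p)  # groups below the threshold stay at zero
        if keep.any():
            Xa = Xs[:, keep]
            lhs = Xa.T @ Xa + n * np.diag(weights[keep])
            new[keep] = np.linalg.solve(lhs, Xa.T @ Ys)
        converged = np.linalg.norm(new - beta) < 1e-8
        beta = new
        if converged:
            break
    for g in groups:  # final thresholding at 1e-3, as in the text
        if np.linalg.norm(beta[g]) < tol:
            beta[g] = 0.0
    return beta
```

Each pass is a ridge-type solve whose penalty weights grow as a group norm shrinks, which is what drives small groups to exactly zero after thresholding.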

Selection of the tuning parameters

For our method, $q_n$, $\gamma$ and $\lambda_j$ ($j=1,\ldots,p_n$) must be chosen. For convenience, a cubic spline basis ($m=4$) is used with $q_n=7$; simulation results demonstrate that this choice performs quite well. Although there are $p_n$ tuning parameters $\lambda_j$, only one needs to be selected once we set $\lambda_j=\lambda/\|\beta_j^{(0)}\|$. We use 'leave-one-observation-out' cross-validation (CV) to select $\lambda$ and $\gamma$. At convergence of the algorithm, we have

$$\hat\beta=\bigl[X^T(I-H)X+n\Sigma_{\lambda,\gamma}(\hat\beta)\bigr]^{-1}X^T(I-H)Y,$$

where $\hat\beta$ is obtained from the whole data set. Note that it is the solution of the ridge regression

$$\|Y^*-X^*\beta\|^2+n\beta^T\Sigma_{\lambda,\gamma}(\hat\beta)\beta, \tag{7}$$

where $Y^*=(I-H)Y$ and $X^*=(I-H)X$. Let $Y^*=(y_1^*,\ldots,y_n^*)^T$ and $X^*=(x_1^*,\ldots,x_n^*)^T$. The CV error is

$$CV(\lambda,\gamma)=\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i^*-x_i^{*T}\hat\beta^{-i}\bigr)^2,$$

where $\hat\beta^{-i}$ is obtained by solving (7) without the $i$th observation. Computing the CV error directly is expensive, so we use the following formula, which can be proved as in [17]:

$$CV(\lambda,\gamma)=\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i^*-x_i^{*T}\hat\beta\bigr)^2/(1-D_{ii}),$$

where $D_{ii}$ is the $(i,i)$th diagonal element of $(I-H)X[X^T(I-H)X+n\Sigma_{\lambda,\gamma}(\hat\beta)]^{-1}X^T(I-H)$. This avoids refitting the model $n$ times and therefore significantly reduces the computational burden.
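The computational saving can be checked numerically. The sketch below is an illustration under assumptions, with hypothetical function names: it treats the penalty matrix `P` (playing the role of $n\Sigma_{\lambda,\gamma}(\hat\beta)$) as fixed and compares the one-fit shortcut with brute-force refitting. It uses the standard leave-one-out identity for ridge regression with a fixed penalty matrix, in which each residual is divided by $1-D_{ii}$ before squaring; this is the form that reproduces the brute-force value exactly and it may differ from the displayed formula in how the denominator enters.

```python
import numpy as np

def loo_cv_shortcut(Xs, Ys, P):
    """Leave-one-out CV error from a single ridge fit with fixed penalty matrix P."""
    A = np.linalg.solve(Xs.T @ Xs + P, Xs.T)  # (p, n): maps Ys to beta_hat
    beta = A @ Ys
    d = np.einsum("ij,ji->i", Xs, A)          # diagonal of D = Xs @ A
    resid = Ys - Xs @ beta
    return float(np.mean((resid / (1.0 - d)) ** 2))

def loo_cv_brute(Xs, Ys, P):
    """Same quantity by refitting n times, each time dropping one observation."""
    n = len(Ys)
    errs = []
    for i in range(n):
        m = np.arange(n) != i
        b = np.linalg.solve(Xs[m].T @ Xs[m] + P, Xs[m].T @ Ys[m])
        errs.append((Ys[i] - Xs[i] @ b) ** 2)
    return float(np.mean(errs))
```

The shortcut needs one linear solve instead of $n$, which matters because the CV error is evaluated over a grid of $(\lambda,\gamma)$ values.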

Simulation studies and application

In this section, we investigate the finite sample performance of the adaptive group bridge method through simulations and a real data application.

Monte Carlo simulations

We simulate 100 datasets consisting of $n$ observations from the following partially linear model:

$$Y_i=\sum_{j=1}^{p_n}x_{ij}^T\beta_j+\cos(2\pi U_i)+\varepsilon_i,\quad i=1,\ldots,n,$$

where $n=500$, and the error $\varepsilon_i\sim N(0,\sigma^2)$ with $\sigma=0.5,1,4$. We consider $p_n$ groups with $p_n=10,30,50$, and each group consists of three variables. The true values of the parameters are $\beta_1^T=(0.5,1,1.5)$, $\beta_2^T=(1,1,1)$, $\beta_3^T=(0.5,0.5,0.5)$, $\beta_4^T=\cdots=\beta_{p_n}^T=(0,0,0)$. $U_i$ follows the uniform distribution on $[0,1]$. To generate the covariate $x=(x_1^T,x_2^T,\ldots,x_{p_n}^T)^T$ with $x_j=(X_{jk},k=1,2,3)^T$, we first simulate $R_1,\ldots,R_{3p_n}$ independently from the standard normal distribution. Next, we simulate $Z_j$, $j=1,\ldots,p_n$, from a multivariate normal distribution with mean zero and $\operatorname{Cov}(Z_j,Z_l)=0.6^{|j-l|}$. Then the covariates are generated as $X_{jk}=(Z_j+R_{3(j-1)+k})/\sqrt{2}$, $j=1,\ldots,p_n$, $k=1,2,3$.
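One replicate of this design can be generated as follows. This is a sketch, not the authors' code: the function name `simulate` and the seed handling are hypothetical, the coefficient values are taken as they appear in the text above, and the division by $\sqrt{2}$ (which gives each covariate unit variance) is an assumption about the intended scaling.

```python
import numpy as np

def simulate(n=500, p_groups=10, sigma=1.0, rho=0.6, seed=0):
    """Generate one data set (X, U, Y) and the true coefficient vector."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p_groups)
    cov = rho ** np.abs(idx[:, None] - idx[None, :])  # Cov(Z_j, Z_l) = rho^|j-l|
    Zlat = rng.multivariate_normal(np.zeros(p_groups), cov, size=n)
    R = rng.standard_normal((n, 3 * p_groups))
    # Column 3(j-1)+k uses the latent Z_j; the sqrt(2) scaling is assumed.
    X = (np.repeat(Zlat, 3, axis=1) + R) / np.sqrt(2.0)
    beta = np.zeros(3 * p_groups)
    beta[:9] = [0.5, 1.0, 1.5, 1.0, 1.0, 1.0, 0.5, 0.5, 0.5]
    U = rng.uniform(0.0, 1.0, size=n)
    Y = X @ beta + np.cos(2.0 * np.pi * U) + sigma * rng.standard_normal(n)
    return X, U, Y, beta
```

For example, `simulate(p_groups=30, sigma=4.0, seed=7)` would produce one data set for the setting $p_n=30$, $\sigma=4$.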

We compare the adaptive group bridge (AGB) with the group lasso (GL) and the group bridge (GB). The following three performance measures are calculated:

  1. $L_2$ loss of the parametric estimate, which is defined as $\|\hat\beta-\beta_0\|$.

  2. Average number of nonzero groups identified by the method (NN).

  3. Average number of nonzero groups identified by the method that are truly nonzero (NNT).
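For a single simulated data set, the three measures above can be computed as in the following sketch (the function name `selection_metrics` and its argument layout are hypothetical). NN and NNT in Table 1 would then be averages of the last two values over the 100 runs.

```python
import numpy as np

def selection_metrics(beta_hat, beta0, groups):
    """Return (L2 loss, number of selected groups, number of selected groups
    that are truly nonzero) for one fitted coefficient vector."""
    l2 = float(np.linalg.norm(beta_hat - beta0))
    selected = [bool(np.any(beta_hat[g] != 0)) for g in groups]
    truth = [bool(np.any(beta0[g] != 0)) for g in groups]
    nn = sum(selected)
    nnt = sum(s and t for s, t in zip(selected, truth))
    return l2, nn, nnt
```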

Group selection results are depicted in Table 1. The numbers in the parentheses in the columns labeled ‘NN’ and ‘NNT’ are the corresponding sample standard deviations based on the 100 runs. Boxplots of the L2 losses under different settings are given in Figures 1-3.

Figure 2. Boxplots of $L_2$ loss for $p_n=30$.

Table 1. Group selection results (sample standard deviations in parentheses)

p_n   Method   σ = 0.5                     σ = 1                       σ = 4
               NN              NNT         NN              NNT         NN              NNT
10    GL       7.80 (1.231)    3 (0)       7.59 (2.396)    3 (0)       6.02 (1.255)    3 (0)
10    GB       4.64 (1.259)    3 (0)       4.80 (1.356)    3 (0)       4.55 (1.258)    3 (0)
10    AGB      5.22 (1.605)    3 (0)       5.00 (1.735)    3 (0)       4.56 (1.131)    3 (0)
30    GL       20.47 (2.504)   3 (0)       11.49 (4.464)   3 (0)       14.46 (2.022)   3 (0)
30    GB       11.04 (3.643)   3 (0)       10.96 (3.784)   3 (0)       10.18 (2.350)   3 (0)
30    AGB      13.06 (5.510)   3 (0)       10.64 (5.921)   3 (0)       9.57 (2.046)    3 (0)
50    GL       33.17 (3.223)   3 (0)       16.48 (5.018)   3 (0)       22.37 (3.308)   3 (0)
50    GB       17.23 (4.608)   3 (0)       17.79 (3.952)   3 (0)       15.96 (3.284)   3 (0)
50    AGB      19.25 (8.437)   3 (0)       15.67 (8.385)   3 (0)       15.68 (3.684)   3 (0)

Figure 1. Boxplots of $L_2$ loss for $p_n=10$.

Figure 3. Boxplots of $L_2$ loss for $p_n=50$.

From Table 1, we make the following observations:

  1. GB and AGB generally perform better than GL. All three methods retain all the truly nonzero groups, but GL keeps more redundant groups that are unrelated to the response than GB and AGB in almost every setting (the exception is GB with $p_n=50$ and $\sigma=1$).

  2. AGB performs much better for larger $\sigma$ and $p_n$. When $p_n=50$, the number of groups selected by AGB for $\sigma=4$ is about 18.5% lower than for $\sigma=0.5$, whereas the number of groups selected by GB decreases by 7.37% in the same situation.

  3. For $p_n=10$, GB selects slightly fewer groups than AGB, but its selection is less stable than that of AGB for $\sigma=4$.
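The percentages in the second observation can be reproduced directly from the NN column of Table 1 for $p_n=50$, comparing $\sigma=0.5$ with $\sigma=4$:

$$\text{AGB: }\frac{19.25-15.68}{19.25}\approx 0.185,\qquad \text{GB: }\frac{17.23-15.96}{17.23}\approx 0.0737.$$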

Figures 1-3 present the $L_2$ losses for varying $\sigma$ and $p_n$. The estimation performance of GB and AGB is similar. For $p_n=30$ and 50, both GB and AGB perform better than GL. However, when $p_n=50$, the medians of the $L_2$ losses of all three methods are similar for $\sigma=0.5$ and 4, but the $L_2$ losses of GL fluctuate more widely.

Wage data analysis

The workers' wage data from Berndt [18] contain a random sample of 534 observations on 11 variables from the Current Population Survey of 1985. The data provide information on wages and other characteristics of the workers, including the continuous variables number of years of education, years of work experience and age, and the nominal variables race, sex, region of residence, occupational status, sector, marital status and union membership. Our goal is to identify the important factors for the wage, so it is reasonable to apply our proposed method to these data.

From the residual plot, we can easily see that the variance of wages is not a constant. So the log transformation is used to stabilize the variance of wages. Due to the multicollinearity problem between age and experience, we need to get rid of either age or experience. Here we remove the age variable from the model. Xie and Huang [15] analyzed these data without considering the transformation of Y. Furthermore, they did not consider group selection of factors. Similar to Xie and Huang [15], we fit these data using a partially linear model with U being ‘years of work experience’.

Table 2 reports the estimated regression coefficients of GL, GB and AGB. All three methods exclude marital status. We use the first 400 observations as a training dataset to select and fit the model, and the remaining 134 observations as a testing dataset to evaluate the prediction ability of the selected model. The prediction performance is measured by the median of $\{|y_i-\hat y_i|,i=1,2,\ldots,134\}$ for GL, GB and AGB on the testing data. Here the $y_i$ are the 134 observations in the testing dataset and the $\hat y_i$ are the corresponding predicted values. The median absolute prediction errors of GL, GB and AGB are 0.3072, 0.3062 and 0.3022, respectively. AGB thus gives the smallest prediction error on these data, which makes it an attractive technique for group selection.
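The prediction measure used here is simple to compute once test-set predictions are available. The helper below is a hypothetical sketch (the name `median_abs_pred_error` is not from the paper), and the numbers in the usage note are made up for illustration, not taken from the wage data.

```python
import numpy as np

def median_abs_pred_error(y_test, y_pred):
    """Median of |y_i - y_hat_i| over the testing observations."""
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.median(np.abs(y_test - y_pred)))
```

For instance, `median_abs_pred_error([1, 2, 3, 4], [1.5, 2, 2, 6])` takes the median of the absolute errors 0.5, 0, 1 and 2. In the setting of the text, `y_test` would hold the log wages of the last 134 observations and `y_pred` the fitted values from the model trained on the first 400.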

Table 2. Estimates of the wage data

Variable   Description                         GL         GB         AGB
edu        Number of years of education        0.0694     0.0668     0.0635
south      1 = southern region, 0 = other      −0.0723    −0.0679    −0.0490
sex        1 = Female, 0 = Male                −0.1999    −0.1983    −0.2031
union      1 = union member, 0 = nonmember     0.1951     0.1934     0.2030
race       1 = other, 0 = White                −0.0559    −0.0585    −0.0582
           1 = Hispanic, 0 = White             −0.0537    −0.0615    −0.0614
occup      1 = management, 0 = other           0.1874     0.2173     0.2516
           1 = sales, 0 = other                −0.0797    −0.0809    −0.0721
           1 = clerical, 0 = other             0.0166     0.0262     0.0430
           1 = service, 0 = other              −0.1171    −0.1173    −0.1104
           1 = professional, 0 = other         0.1533     0.1768     0.2061
sector     1 = manufacturing, 0 = other        0.0848     0.0912     0.0994
           1 = construction, 0 = other         0.0546     0.0622     0.0674
marr       1 = married, 0 = other              0.0000     0.0000     0.0000

Discussion

This paper studies group selection for the high-dimensional partially linear model with the adaptive group bridge method. We also consider the choice of $\gamma$ in the bridge penalty. It is worth mentioning that we use 'leave-one-observation-out' cross-validation to select both $\lambda$ and $\gamma$, together with a closed-form expression for the CV error that significantly reduces the computational burden. This is the first attempt to use this method in group selection for the partially linear model.

Acknowledgements

This research was supported by the National Natural Science Foundation of China (Grant No. 11401340).

Appendix

Proof of Theorem 3.1

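By the definition of $\hat\beta$, it is easy to get

$$\|(I-H)(Y-X\hat\beta)\|^2+\sum_{j=1}^{p_n}\lambda_j\|\hat\beta_j\|^{\gamma}\le\|(I-H)(Y-X\beta_0)\|^2+\sum_{j=1}^{p_n}\lambda_j\|\beta_{0j}\|^{\gamma},$$

that is,

$$\|(I-H)(Y-X\hat\beta)\|^2-\|(I-H)(Y-X\beta_0)\|^2\le\sum_{j=1}^{p_n}\lambda_j\|\beta_{0j}\|^{\gamma}.$$

As $Y=X\beta_0+f(U)+\varepsilon$ with $f(U)=(f(U_1),\ldots,f(U_n))^T$ and $\varepsilon=(\varepsilon_1,\ldots,\varepsilon_n)^T$, we can rewrite the above inequality as follows:

$$\|(I-H)X(\hat\beta-\beta_0)\|^2-2(f(U)+\varepsilon)^T(I-H)X(\hat\beta-\beta_0)\le\sum_{j=1}^{p_n}\lambda_j\|\beta_{0j}\|^{\gamma}.$$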

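Let

$$a_n=n^{-1/2}[X^T(I-H)X]^{1/2}(\hat\beta-\beta_0),\quad b_n=n^{-1/2}[X^T(I-H)X]^{-1/2}X^T(I-H)(f(U)+\varepsilon).$$

Then we have

$$\|a_n\|^2\le 2\bigl(\|a_n-b_n\|^2+\|b_n\|^2\bigr)\le\frac{2}{n}\sum_{j=1}^{p_n}\lambda_j\|\beta_{0j}\|^{\gamma}+4\|b_n\|^2.$$

Since $|A|=k_n$, under condition (A2),

$$\frac{2}{n}\sum_{j=1}^{p_n}\lambda_j\|\beta_{0j}\|^{\gamma}=O\Bigl(\frac{\varphi_{n1}k_n}{n}\Bigr).$$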
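For readers who want the intermediate step, the bound on $\|a_n\|^2$ can be unpacked as follows. This is a sketch that uses only the definitions of $a_n$ and $b_n$ and the fact that $I-H$ is a symmetric projection:

$$\|a_n\|^2=n^{-1}\|(I-H)X(\hat\beta-\beta_0)\|^2,\qquad a_n^Tb_n=n^{-1}(f(U)+\varepsilon)^T(I-H)X(\hat\beta-\beta_0).$$

Dividing the preceding inequality by $n$ therefore gives

$$\|a_n\|^2-2a_n^Tb_n\le\frac{1}{n}\sum_{j=1}^{p_n}\lambda_j\|\beta_{0j}\|^{\gamma}\quad\Longleftrightarrow\quad\|a_n-b_n\|^2\le\frac{1}{n}\sum_{j=1}^{p_n}\lambda_j\|\beta_{0j}\|^{\gamma}+\|b_n\|^2,$$

and combining this with $\|a_n\|^2\le 2\|a_n-b_n\|^2+2\|b_n\|^2$ yields the stated bound.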

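Moreover,

$$\|b_n\|^2=\frac{1}{n}(f(U)+\varepsilon)^T(I-H)X[X^T(I-H)X]^{-1}X^T(I-H)(f(U)+\varepsilon)\le\frac{2}{n}\varepsilon^TA\varepsilon+\frac{2}{n}f(U)^TAf(U), \tag{8}$$

where

$$A=(I-H)X[X^T(I-H)X]^{-1}X^T(I-H).$$

For the first term on the right-hand side of (8),

$$E\Bigl(\frac{1}{n}\varepsilon^TA\varepsilon\Bigr)=\frac{\sigma^2}{n}\operatorname{tr}(E(A))\le n^{-1}dp_n\sigma^2.$$

Thus

$$n^{-1}\varepsilon^TA\varepsilon=O_P(n^{-1}dp_n). \tag{9}$$

For the second term on the right-hand side of (8), by conditions (C1) and (C2),

$$E\Bigl(\frac{1}{n}f(U)^TAf(U)\Bigr)\le\frac{1}{n}E\bigl\{\lambda_{\max}\{(I-H)X[X^T(I-H)X]^{-1}X^T(I-H)\}\times\operatorname{tr}[f(U)^T(I-H)f(U)]\bigr\}=\frac{1}{n}E[f(U)^T(I-H)f(U)]=O(q_n^{-2r}). \tag{10}$$

Combining (9)-(10),

$$\|b_n\|^2=O_P(n^{-1}dp_n+q_n^{-2r}).$$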

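By conditions (A1) and (A3),

$$E\|a_n\|^2=\frac{1}{n}E[(\hat\beta-\beta_0)^TX^T(I-H)X(\hat\beta-\beta_0)]=E\Bigl[(\hat\beta-\beta_0)^T\Bigl(\frac{1}{n}X^T(I-H)X-\Omega\Bigr)(\hat\beta-\beta_0)\Bigr]+E[(\hat\beta-\beta_0)^T\Omega(\hat\beta-\beta_0)]\ge\frac{\tau_1}{2}E\|\hat\beta-\beta_0\|^2.$$

Therefore

$$\|\hat\beta-\beta_0\|^2=O_P(n^{-1}dp_n+q_n^{-2r}+n^{-1}\varphi_{n1}k_n).$$

Under condition (A4), we have

$$\|\hat\beta-\beta_0\|\stackrel{P}{\to}0.$$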

 □

Proof of Theorem 3.2

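Let $\mu_n=\sqrt{n^{-1}p_n}+q_n^{-r}+\sqrt{n^{-1}\varphi_{n1}k_n}$, and choose a sequence $\{r_n,r_n>0\}$ which satisfies $r_n\to 0$. Partition $\mathbb{R}^{\sum_{j=1}^{p_n}d_j}\setminus\{0\}$ into shells $\{S_{nj}:j=1,2,\ldots\}$, where $S_{nj}=\{\beta:2^{j-1}r_n\le\|\beta-\beta_0\|<2^jr_n\}$. For an arbitrary fixed constant $L\in\mathbb{R}^+$, if $\|\hat\beta-\beta_0\|$ is larger than $2^Lr_n$, then $\hat\beta$ is in one of the shells with $j\ge L$, and we have

$$\Pr(\|\hat\beta-\beta_0\|\ge 2^Lr_n)=\sum_{l>L,\,2^lr_n>2^{L_1}\mu_n}\Pr(\hat\beta\in S_{nl})+\sum_{l>L,\,2^lr_n\le 2^{L_1}\mu_n}\Pr(\hat\beta\in S_{nl})\quad(L_1\text{ is an arbitrary constant}),$$

where

$$\sum_{l>L,\,2^lr_n>2^{L_1}\mu_n}\Pr(\hat\beta\in S_{nl})\le\Pr(\|\hat\beta-\beta_0\|\ge 2^{L_1-1}\mu_n)=o(1),$$

and

$$\sum_{l>L,\,2^lr_n\le 2^{L_1}\mu_n}\Pr(\hat\beta\in S_{nl})=\sum_{l>L,\,2^lr_n\le 2^{L_1}\mu_n}\Pr\Bigl(\hat\beta\in S_{nl},\|\Delta_n\|\le\frac{\tau_1}{2}\Bigr)+\sum_{l>L,\,2^lr_n\le 2^{L_1}\mu_n}\Pr\Bigl(\hat\beta\in S_{nl},\|\Delta_n\|>\frac{\tau_1}{2}\Bigr),$$

where $\Delta_n=n^{-1}X^T(I-H)X-\Omega$. By condition (A3),

$$\sum_{l>L,\,2^lr_n\le 2^{L_1}\mu_n}\Pr\Bigl(\hat\beta\in S_{nl},\|\Delta_n\|>\frac{\tau_1}{2}\Bigr)\le\Pr\Bigl(\|\Delta_n\|>\frac{\tau_1}{2}\Bigr)=o(1).$$

Therefore,

$$\Pr(\|\hat\beta-\beta_0\|\ge 2^Lr_n)\le o(1)+\sum_{l>L,\,2^lr_n\le 2^{L_1}\mu_n}\Pr\Bigl(\inf_{\beta\in S_{nl}}(Q_n(\beta)-Q_n(\beta_0))\le 0,\ \|\Delta_n\|\le\frac{\tau_1}{2}\Bigr).$$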

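Note that

$$\begin{aligned}Q_n(\beta)-Q_n(\beta_0)&=\|(I-H)X(\beta-\beta_0)\|^2-2(f(U)+\varepsilon)^T(I-H)X(\beta-\beta_0)+\sum_{j=1}^{p_n}\lambda_j(\|\beta_j\|^{\gamma}-\|\beta_{0j}\|^{\gamma})\\&\ge\|(I-H)X(\beta-\beta_0)\|^2-2(f(U)+\varepsilon)^T(I-H)X(\beta-\beta_0)+\sum_{j=1}^{k_n}\lambda_j(\|\beta_j\|^{\gamma}-\|\beta_{0j}\|^{\gamma})\\&\triangleq I_{n1}+I_{n2}+I_{n3}.\end{aligned}$$

For $I_{n1}$, on the event $\|\Delta_n\|\le\tau_1/2$,

$$I_{n1}\ge\inf_{\beta\in S_{nl}}\frac{n\tau_1}{2}\|\beta-\beta_0\|^2.$$

For all $\beta\in S_{nl}$ we have $\|\beta-\beta_0\|^2\ge 2^{2l-2}r_n^2$, therefore $I_{n1}\ge n\tau_12^{2l-3}r_n^2$.

For $I_{n3}$, we have

$$|I_{n3}|=\Bigl|\sum_{j=1}^{k_n}\lambda_j\gamma\|\beta_j^*\|^{\gamma-1}(\|\beta_j\|-\|\beta_{0j}\|)\Bigr|\le\varphi_{n1}\gamma\sum_{j=1}^{k_n}\|\beta_j^*\|^{\gamma-1}\|\beta_j-\beta_{0j}\|,$$

where $\beta_j^*$ is between $\beta_j$ and $\beta_{0j}$. By condition (A2), and since we only need to consider $\beta\in S_{nl}$ with $2^lr_n\le 2^{L_1}\mu_n$, there exists a constant $C_3>0$ such that

$$|I_{n3}|\le C_3\varphi_{n1}\gamma\sum_{j=1}^{k_n}\|\beta_j-\beta_{0j}\|\le C_3\varphi_{n1}k_n^{1/2}\gamma\|\beta-\beta_0\|.$$

So for all $\beta\in S_{nl}$ we have $|I_{n3}|\le C_3\varphi_{n1}k_n^{1/2}\gamma 2^lr_n$, and by the Markov inequality,

$$\Pr\Bigl(\inf_{\beta\in S_{nl}}(Q_n(\beta)-Q_n(\beta_0))\le 0\Bigr)\le\Pr\Bigl(\sup_{\beta\in S_{nl}}|I_{n2}|\ge n\tau_12^{2l-3}r_n^2-C_3\varphi_{n1}k_n^{1/2}\gamma 2^lr_n\Bigr)\le\frac{E(\sup_{\beta\in S_{nl}}|I_{n2}|)}{n\tau_12^{2l-3}r_n^2-C_3\varphi_{n1}k_n^{1/2}\gamma 2^lr_n}.$$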

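Using the Cauchy-Schwarz inequality, we have

$$\begin{aligned}E\Bigl(\sup_{\beta\in S_{nl}}|I_{n2}|\Bigr)&\le 2\bigl[E\bigl((f(U)+\varepsilon)^T(I-H)XX^T(I-H)(f(U)+\varepsilon)\bigr)\bigr]^{1/2}\times\Bigl[E\Bigl(\sup_{\beta\in S_{nl}}\|\beta-\beta_0\|^2\Bigr)\Bigr]^{1/2}\\&\le 2^{l+3/2}r_n\bigl[E(\varepsilon^T(I-H)XX^T(I-H)\varepsilon)+E(f(U)^T(I-H)XX^T(I-H)f(U))\bigr]^{1/2},\end{aligned}$$

where

$$E(\varepsilon^T(I-H)XX^T(I-H)\varepsilon)=\sigma^2E(\operatorname{tr}((I-H)XX^T(I-H)))=O(np_n)$$

and

$$E[f(U)^T(I-H)XX^T(I-H)f(U)]\le E[\operatorname{tr}(X^T(I-H)X)\operatorname{tr}(f(U)^T(I-H)f(U))]=O(np_n)O(nq_n^{-2r})=O(n^2p_nq_n^{-2r}).$$

Accordingly,

$$E\Bigl(\sup_{\beta\in S_{nl}}|I_{n2}|\Bigr)\le C_42^lr_n\bigl(\sqrt{np_n}+n\sqrt{p_n}q_n^{-r}\bigr).$$

Then we get

$$\sum_{l>L,\,2^lr_n\le 2^{L_1}\mu_n}\Pr(\hat\beta\in S_{nl})\le\sum_{l>L}\frac{C_42^lr_n(\sqrt{np_n}+n\sqrt{p_n}q_n^{-r})}{n\tau_12^{2l-3}r_n^2-C_3\varphi_{n1}k_n^{1/2}\gamma 2^lr_n}.$$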

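Choosing $r_n=\sqrt{p_n/n}+\sqrt{p_n}q_n^{-r}$, so that $nr_n=\sqrt{np_n}+n\sqrt{p_n}q_n^{-r}$, we have

$$\sum_{l>L,\,2^lr_n\le 2^{L_1}\mu_n}\Pr(\hat\beta\in S_{nl})\le\sum_{l>L}\frac{C_4}{\tau_12^{l-3}-C_3\varphi_{n1}k_n^{1/2}\gamma/(\sqrt{np_n}+n\sqrt{p_n}q_n^{-r})}.$$

By condition (A5)(a), $\varphi_{n1}k_n^{1/2}/(\sqrt{np_n}+n\sqrt{p_n}q_n^{-r})\to 0$, so for sufficiently large $n$,

$$2^{l-3}-C_3\tau_1^{-1}\varphi_{n1}k_n^{1/2}\gamma/(\sqrt{np_n}+n\sqrt{p_n}q_n^{-r})\ge 2^{l-4}.$$

Thus, absorbing $\tau_1^{-1}$ into the constant $C_4$,

$$\sum_{l>L,\,2^lr_n\le 2^{L_1}\mu_n}\Pr(\hat\beta\in S_{nl})\le\sum_{l>L}C_42^{-(l-4)}\le C_42^{-(L-5)}.$$

Let $L\to\infty$; then

$$\sum_{l>L,\,2^lr_n\le 2^{L_1}\mu_n}\Pr(\hat\beta\in S_{nl})\to 0.$$

Hence

$$\|\hat\beta-\beta_0\|=O_P\bigl(\sqrt{n^{-1}p_n}+\sqrt{p_n}\,q_n^{-r}\bigr).$$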

 □

Proof of Theorem 3.3

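(i) By Theorem 3.2, for sufficiently large $C_5$, $\hat\beta$ lies in the ball $\{\beta:\|\beta-\beta_0\|\le v_nC_5\}$ with probability converging to 1, where $v_n=\sqrt{n^{-1}p_n}+\sqrt{p_n}q_n^{-r}$. Let $\beta_{(1)}=\beta_{10}+v_n\nu_1$ and $\beta_{(2)}=\beta_{20}+v_n\nu_2=v_n\nu_2$ with $\|\nu\|^2=\|\nu_1\|^2+\|\nu_2\|^2\le C_5^2$. Let

$$V_n(\nu_1,\nu_2)=Q_n(\beta_{(1)},\beta_{(2)})-Q_n(\beta_{10},0)=Q_n(\beta_{10}+v_n\nu_1,v_n\nu_2)-Q_n(\beta_{10},0).$$

Then $\hat\beta_{(1)}$ and $\hat\beta_{(2)}$ can be obtained by minimizing $V_n(\nu_1,\nu_2)$ over $\|\nu\|\le C_5$, except on an event with probability converging to zero. We only need to show that, for any $\nu_1$ and $\nu_2$ with $\|\nu\|\le C_5$, if $\|\nu_2\|>0$,

$$\Pr(V_n(\nu_1,\nu_2)-V_n(\nu_1,0)>0)\to 1,\quad n\to\infty.$$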

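Some simple calculations show that

$$\begin{aligned}V_n(\nu_1,\nu_2)-V_n(\nu_1,0)&=v_n^2\|(I-H)X_2\nu_2\|^2+2v_n^2(X_1\nu_1)^T(I-H)(X_2\nu_2)-2v_n(f(U)+\varepsilon)^T(I-H)(X_2\nu_2)+\sum_{j\notin A}\lambda_j\|v_n\nu_{2j}\|^{\gamma}\\&\triangleq II_{n1}+II_{n2}+II_{n3}+II_{n4}.\end{aligned}$$

For the first two terms $II_{n1}$ and $II_{n2}$,

$$II_{n1}+II_{n2}\ge -v_n^2\|(I-H)X_1\nu_1\|^2\ge -nv_n^2C_5^2(o_P(1)+\tau_2).$$

For $II_{n3}$, we have

$$\begin{aligned}E[(f(U)+\varepsilon)^T(I-H)X_2\nu_2]^2&\le 2\bigl\{E[f(U)^T(I-H)X_2\nu_2\nu_2^TX_2^T(I-H)f(U)+\varepsilon^T(I-H)X_2\nu_2\nu_2^TX_2^T(I-H)\varepsilon]\bigr\}\\&\le C_6\bigl\{E[\operatorname{tr}(X_2^T(I-H)X_2)\operatorname{tr}(f(U)^T(I-H)f(U))]+\sigma^2E[\operatorname{tr}(X_2^T(I-H)X_2)]\bigr\}\\&=O(n^2p_nq_n^{-2r}+np_n).\end{aligned}$$

Thus we have

$$II_{n3}=v_n\bigl(np_n^{1/2}q_n^{-r}+n^{1/2}p_n^{1/2}\bigr)O_P(1).$$

For $II_{n4}$, since $0<\gamma<1$,

$$\Bigl(\sum_{j\notin A}\|v_n\nu_{2j}\|^{\gamma}\Bigr)^{1/\gamma}\ge\Bigl(\sum_{j\notin A}\|v_n\nu_{2j}\|^2\Bigr)^{1/2}=v_n\|\nu_2\|.$$

Accordingly,

$$II_{n4}\ge\varphi_{n2}v_n^{\gamma}\|\nu_2\|^{\gamma}.$$

By condition (A5)(b), for $\|\nu_2\|>0$, we have

$$\Pr(V_n(\nu_1,\nu_2)-V_n(\nu_1,0)>0)\to 1.$$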

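(ii) Let $\omega_n$ be some $\sum_{j=1}^{k_n}d_j$-vector with $\|\omega_n\|^2=1$. By Theorem 3.3(i), with probability tending to 1, we have the following result:

$$\frac{\partial Q_n(\beta_{(1)})}{\partial\beta_{(1)}}\Big|_{\beta_{(1)}=\hat\beta_{(1)}}=X_1^T(I-H)X_1(\hat\beta_{(1)}-\beta_{10})-X_1^T(I-H)(f(U)+\varepsilon)+\xi_n=0,$$

where $\xi_n=(\lambda_1\gamma\|\hat\beta_1\|^{\gamma-2}\hat\beta_1^T,\ldots,\lambda_{k_n}\gamma\|\hat\beta_{k_n}\|^{\gamma-2}\hat\beta_{k_n}^T)^T$. We consider the limit distribution of

$$\begin{aligned}n^{-1/2}\omega_n^T\Omega_{11}^{-1/2}[X_1^T(I-H)X_1](\hat\beta_{(1)}-\beta_{10})&=n^{-1/2}\omega_n^T\Omega_{11}^{-1/2}X_1^T(I-H)f(U)+n^{-1/2}\omega_n^T\Omega_{11}^{-1/2}X_1^T(I-H)\varepsilon-n^{-1/2}\omega_n^T\Omega_{11}^{-1/2}\xi_n\\&\triangleq J_{n1}+J_{n2}+J_{n3}.\end{aligned}$$

For $J_{n1}$,

$$J_{n1}^2=n^{-1}|\omega_n^T\Omega_{11}^{-1/2}X_1^T(I-H)f(U)|^2=O_P(nq_n^{-2r}).$$

For $J_{n3}$, by conditions (A2) and (A4), we have

$$E(J_{n3}^2)\le n^{-1}\tau_1^{-1}\varphi_{n1}\gamma^2\sum_{j=1}^{k_n}E\|\hat\beta_j\|^{2(\gamma-1)}=O(n^{-1}\varphi_{n1}k_n).$$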

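For $J_{n2}$,

$$J_{n2}=n^{-1/2}\omega_n^T\Omega_{11}^{-1/2}G_1^T(I-H)\varepsilon+n^{-1/2}\omega_n^T\Omega_{11}^{-1/2}\tilde X_1^T\varepsilon-n^{-1/2}\omega_n^T\Omega_{11}^{-1/2}\tilde X_1^TH\varepsilon\triangleq K_{n1}+K_{n2}+K_{n3}.$$

Under conditions (C1) and (C2),

$$EK_{n1}^2=n^{-1}\omega_n^T\Omega_{11}^{-1/2}E[G_1^T(I-H)\varepsilon\varepsilon^T(I-H)G_1]\Omega_{11}^{-1/2}\omega_n=O(k_nq_n^{-2r}).$$

By condition (A6), we have

$$EK_{n3}^2=n^{-1}\omega_n^T\Omega_{11}^{-1/2}E[\tilde X_1^TH\varepsilon\varepsilon^TH\tilde X_1]\Omega_{11}^{-1/2}\omega_n=O(n^{-1}k_nq_n).$$

Now we focus on $K_{n2}$:

$$K_{n2}=n^{-1/2}\omega_n^T\Omega_{11}^{-1/2}\tilde X_1^T\varepsilon\triangleq\sum_{i=1}^{n}s_{ni}\varepsilon_i,$$

where $s_{ni}$ denotes $n^{-1/2}\omega_n^T\Omega_{11}^{-1/2}$ applied to the $i$th row of $\tilde X_1$. First,

$$E(s_{ni}\varepsilon_i)=0;\quad\operatorname{Var}\Bigl(\sum_{i=1}^{n}s_{ni}\varepsilon_i\Bigr)=\sum_{i=1}^{n}\operatorname{Var}(s_{ni}\varepsilon_i)=\sigma^2.$$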

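Next we verify the conditions of the Lindeberg-Feller central limit theorem. For any $\epsilon>0$,

$$\sum_{i=1}^{n}E[(s_{ni}^2\varepsilon_i^2)1(|s_{ni}\varepsilon_i|>\epsilon)]=nE[(s_{n1}^2\varepsilon_1^2)1(|s_{n1}\varepsilon_1|>\epsilon)]\le n[E(s_{n1}^4\varepsilon_1^4)]^{1/2}[\Pr(|s_{n1}\varepsilon_1|>\epsilon)]^{1/2}.$$

By condition (A6),

$$\begin{aligned}E(s_{n1}^4\varepsilon_1^4)&=n^{-2}E\{\omega_n^T\Omega_{11}^{-1/2}[x_1-E(x_1\mid U_1)][x_1-E(x_1\mid U_1)]^T\Omega_{11}^{-1/2}\omega_n\}^2E\varepsilon_1^4\\&\le n^{-2}\rho_{\min}^2(\omega_n\omega_n^T)\rho_{\max}^2(\Omega_{11}^{-1})E\{[x_1-E(x_1\mid U_1)]^T[x_1-E(x_1\mid U_1)]\}^2E\varepsilon_1^4\\&\le n^{-2}\rho_{\min}^2(\omega_n\omega_n^T)\rho_{\max}^2(\Omega_{11}^{-1})E\varepsilon_1^4\,k_nd\sum_{j=1}^{k_n}\sum_{k=1}^{d_j}E[X_{1jk}-E(X_{1jk}\mid U_1)]^4\\&=O(k_n^2n^{-2})\end{aligned}$$

and

$$\Pr(|s_{n1}\varepsilon_1|>\epsilon)\le\frac{1}{\epsilon^2}E(s_{n1}\varepsilon_1)^2=\frac{\sigma^2}{\epsilon^2}n^{-1}\omega_n^T\Omega_{11}^{-1/2}E\{[x_1-E(x_1\mid U_1)][x_1-E(x_1\mid U_1)]^T\}\Omega_{11}^{-1/2}\omega_n=\frac{\sigma^2}{\epsilon^2}n^{-1}=O(n^{-1}).$$

Thus we have

$$\sum_{i=1}^{n}E[(s_{ni}^2\varepsilon_i^2)1(|s_{ni}\varepsilon_i|>\epsilon)]=O(nk_nn^{-1}n^{-1/2})=o(1).$$

This means that $K_{n2}\stackrel{D}{\to}N(0,\sigma^2)$. Using Slutsky's theorem, we have

$$n^{-1/2}\omega_n^T\Omega_{11}^{-1/2}[X_1^T(I-H)X_1](\hat\beta_{(1)}-\beta_{10})\stackrel{D}{\to}N(0,\sigma^2).$$

Let $u_n^2=n^2\omega_n^T(X_1^T(I-H)X_1)^{-1}\Omega_{11}(X_1^T(I-H)X_1)^{-1}\omega_n$; then

$$n^{1/2}u_n^{-1}\omega_n^T(\hat\beta_{(1)}-\beta_{10})\stackrel{D}{\to}N(0,\sigma^2).$$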

 □

Footnotes

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All authors contributed equally to the writing of this paper. All authors read and approved the final manuscript.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Tibshirani R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B. 1996;58:267–288.
  2. Frank I, Friedman J. A statistical view of some chemometrics regression tools. Technometrics. 1993;35:109–148. doi: 10.1080/00401706.1993.10485033.
  3. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001;96:1348–1360. doi: 10.1198/016214501753382273.
  4. Zou H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 2006;101:1418–1429. doi: 10.1198/016214506000000735.
  5. Zou H, Hastie T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. B. 2005;67:301–320. doi: 10.1111/j.1467-9868.2005.00503.x.
  6. Fu W. Penalized regressions: the bridge versus the lasso. J. Comput. Graph. Stat. 1998;7:397–416.
  7. Knight K, Fu W. Asymptotics for lasso-type estimators. Ann. Stat. 2000;28:1356–1378.
  8. Liu Y, Zhang H, Park C, Ahn J. Support vector machines with adaptive lq penalty. Comput. Stat. Data Anal. 2007;51:6380–6394. doi: 10.1016/j.csda.2007.02.006.
  9. Huang J, Horowitz J, Ma S. Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Ann. Stat. 2008;36:587–613. doi: 10.1214/009053607000000875.
  10. Wang M, Song L, Wang X. Bridge estimation for generalized linear models with a diverging number of parameters. Stat. Probab. Lett. 2010;80:1584–1596. doi: 10.1016/j.spl.2010.06.012.
  11. Chen Z, Zhu Y, Zhu C. Adaptive bridge estimation for high-dimensional regression models. J. Inequal. Appl. 2016;2016. doi: 10.1186/s13660-016-1205-y.
  12. Huang J, Ma S, Xie H, Zhang C. A group bridge approach for variable selection. Biometrika. 2009;96:339–355. doi: 10.1093/biomet/asp020.
  13. Park C, Yoon Y. Bridge regression: adaptivity and group selection. J. Stat. Plan. Inference. 2011;141:3506–3519. doi: 10.1016/j.jspi.2011.05.004.
  14. Engle R, Granger C, Rice J, Weiss A. Semiparametric estimates of the relation between weather and electricity sales. J. Am. Stat. Assoc. 1986;81:310–320. doi: 10.1080/01621459.1986.10478274.
  15. Xie H, Huang J. SCAD-penalized regression in high-dimensional partially linear models. Ann. Stat. 2009;37:673–696. doi: 10.1214/07-AOS580.
  16. Donoho D, Johnstone I. Ideal spatial adaptation by wavelet shrinkage. Biometrika. 1994;81:425–455. doi: 10.1093/biomet/81.3.425.
  17. Wang L, Li H, Huang J. Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. J. Am. Stat. Assoc. 2008;103:1556–1569. doi: 10.1198/016214508000000788.
  18. Berndt ER. The Practice of Econometrics: Classical and Contemporary. Reading: Addison-Wesley; 1991.

Articles from Journal of Inequalities and Applications are provided here courtesy of Springer
