Journal of Applied Statistics. 2021 Mar 23;50(3):512–534. doi: 10.1080/02664763.2021.1904847

Model estimation and selection for partial linear varying coefficient EV models with longitudinal data

Mingtao Zhao a, Xiaoli Xu b, Yanling Zhu a, Kongsheng Zhang a, Yan Zhou c

Abstract

In this paper, we consider estimation and model selection for longitudinal partial linear varying coefficient errors-in-variables (EV) models in which the covariates are measured with additive errors. A bias-corrected penalized quadratic inference functions method is proposed, based on quadratic inference functions with two penalty terms. The proposed method not only handles the measurement errors of the covariates and the within-subject correlations, but also estimates and selects the significant non-zero parametric and nonparametric components simultaneously. Under some regularity conditions, the resulting estimators of the parameters are asymptotically normal and the estimators of the nonparametric varying coefficients achieve the optimal convergence rate. Furthermore, we present simulation studies and a real data analysis to evaluate the finite sample performance of the proposed method.

Keywords: Longitudinal data, variable selection, partial linear varying coefficient EV models, quadratic inference function

1. Introduction

Varying coefficient models [7] have better interpretability and flexibility than linear models and avoid the curse of dimensionality. They are widely applied to the analysis of longitudinal and clustered data. In a varying coefficient model, the regression coefficients are unknown nonparametric functions allowed to depend on time or other covariates, which facilitates the study of dynamic features. A survey of work on varying coefficient models can be found in [15].

In many applications, however, not all of the coefficients vary. We therefore consider the partial linear varying coefficient model [12] for longitudinal data. Suppose the longitudinal data

$$\{(Y_{ij}, X_{ij}, Z_{ij}, t_{ij}) : i = 1, 2, \ldots, n,\ j = 1, 2, \ldots, n_i\}$$

satisfy the partial linear varying coefficient model

$$Y_{ij} = X_{ij}^T\beta + Z_{ij}^T\alpha(t_{ij}) + \epsilon_{ij}, \qquad (1)$$

where $Y_{ij} \in \mathbb{R}$ and $(X_{ij}, Z_{ij}) \in \mathbb{R}^p \times \mathbb{R}^q$ are the response and covariates observed at time $t_{ij} \in [0, 1]$, $X_{ij} = (X_{ij1}, X_{ij2}, \ldots, X_{ijp})^T$, $Z_{ij} = (Z_{ij1}, Z_{ij2}, \ldots, Z_{ijq})^T$, and $\epsilon_{ij} \in \mathbb{R}$ is a zero-mean stochastic process independent of $(X_{ij}, Z_{ij})$. Here $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ is a regression parameter vector and $\alpha(t) = (\alpha_1(t), \alpha_2(t), \ldots, \alpha_q(t))^T$ is a vector of coefficient functions, each $\alpha_l(t)\ (l = 1, 2, \ldots, q)$ being an unknown smooth function of $t \in [0, 1]$. We further assume that the first two moments of model (1) satisfy $E(Y_{ij} \mid X_{ij}, Z_{ij}, t_{ij}) = \mu_{ij}$ and $\operatorname{var}(Y_{ij} \mid X_{ij}, Z_{ij}, t_{ij}) = \nu(\mu_{ij})$, where $E(\cdot)$ and $\operatorname{var}(\cdot)$ denote expectation and variance, respectively, and $\nu(\cdot)$ is a known function.

Model (1) combines the advantages of the linear model and the varying coefficient model: it reduces modeling bias and avoids the curse of dimensionality. It has recently been studied by various statistical methods, such as the local polynomial fitting method [32], the profile least squares method [4], the empirical likelihood method [10,30], the quantile regression method [23], and the penalized quadratic inference function (pQIF) method [20]. An important assumption in these methods is that the covariates are observed exactly.

In practice, however, covariates often cannot be measured exactly, especially important ones: whatever the data collection scheme, measurement errors are unavoidable, or some covariates are simply unobserved. Ignoring measurement errors may result in biased estimators or even incorrect conclusions. It is therefore meaningful to incorporate measurement errors into model (1). In view of this, we consider the case where $X$ and $Z$ in model (1) are measured with additive errors, which gives the partial linear varying coefficient errors-in-variables (EV) model

$$Y_{ij} = X_{ij}^T\beta + Z_{ij}^T\alpha(t_{ij}) + \epsilon_{ij}, \quad W_{ij} = X_{ij} + w_{ij}, \quad U_{ij} = Z_{ij} + u_{ij}, \qquad i = 1, 2, \ldots, n,\ j = 1, 2, \ldots, n_i, \qquad (2)$$

where $W_{ij} = (W_{ij1}, W_{ij2}, \ldots, W_{ijp})^T \in \mathbb{R}^p$ and $U_{ij} = (U_{ij1}, U_{ij2}, \ldots, U_{ijq})^T \in \mathbb{R}^q$ are directly observed, and $w_{ij} = (w_{ij1}, w_{ij2}, \ldots, w_{ijp})^T \in \mathbb{R}^p$ and $u_{ij} = (u_{ij1}, u_{ij2}, \ldots, u_{ijq})^T \in \mathbb{R}^q$ are zero-mean measurement errors with diagonal covariance matrices $\Sigma_w$ and $\Sigma_u$, respectively. In addition, we assume that $\operatorname{cov}(w_{ij_1}, w_{ij_2}) = 0$ and $\operatorname{cov}(u_{ij_1}, u_{ij_2}) = 0$ for $j_1 \neq j_2$, and that $w_{ij}$ and $u_{ij}$ are independent of each other and of $(X_{ij}, Z_{ij}, t_{ij}, \epsilon_{ij})$, where $\operatorname{cov}(\cdot)$ denotes the covariance operator. Although these assumptions are not the weakest possible, extra information about $\Sigma_w$ and $\Sigma_u$ is needed in practice to deal with measurement errors; for example, one usually assumes that $\Sigma_w$ and $\Sigma_u$ are known or can be estimated.

Model (2) has been studied extensively in the literature. For the case where only $X_{ij}$ is measured with additive error, You and Zhou [30] proposed two different estimators of the parametric and nonparametric components for cross-sectional data. Empirical likelihood inference can be found in Hu et al. [9], Zhao and Xue [35], Xia and Da [28], Zhou et al. [38], Fan et al. [3] and Wang et al. [25]. The case where some linear covariates are unobserved but ancillary variables are available was studied by Zhou and Liang [37]. A variable selection procedure for the high-dimensional situation was studied by Wang and Xue [26], and estimation and testing problems were treated by Zhang et al. [33]. Wang et al. [22] studied model averaging for model (2), and Wei [27] proposed a restricted modified profile least squares estimator of the parametric components.

For models where only $Z_{ij}$ is unobserved and measured with additive error, empirical likelihood inference and local bias-corrected restricted profile least squares estimators can be used for model estimation [2,6]. For the generalized partial linear varying coefficient model with error-prone linear covariates and available ancillary variables, Zhang et al. [31] proposed a variable selection method. Zhao and Xue [36] proposed a variable selection method for the case where $X_{ij}$ and $Z_{ij}$ are both measured with errors, based on cross-sectional data.

On the other hand, model selection is an important topic in longitudinal data analysis; see, for example, Tian and Xue [19] and Zhao et al. [34]. As far as we know, no study has been reported on model selection for model (2) with longitudinal data when $X_{ij}$ and $Z_{ij}$ are measured with additive errors simultaneously. Motivated by this issue, and inspired by [19,34], we study variable selection for model (2). In view of the advantages of quadratic inference functions (QIF) [16] over generalized estimating equations (GEE) [14], we propose a bias-corrected penalized quadratic inference functions (pQIF) method for model (2), which can estimate and select the non-zero regression parameters and coefficient functions simultaneously. Furthermore, we establish the asymptotic properties of the resulting estimators.

The rest of this paper is organized as follows. In Section 2, we propose the bias-corrected pQIF method. In Section 3, we study the asymptotic properties of the model estimation and selection results. Some issues in practical implementation are discussed in Section 4. Simulation studies and a real data analysis are presented in Section 5. Section 6 gives a brief conclusion and discussion. The proofs of the asymptotic results are provided in the Appendix.

2. Model estimation and selection method

Let $B(t) = (B_1(t), B_2(t), \ldots, B_L(t))^T$ denote the B-spline basis vector of order $d$, where $L = K + d$ and $K\ (> 0)$ is the number of interior knots. Following He et al. [8], $\alpha_l(t)\ (l = 1, 2, \ldots, q)$ can be approximated as

$$\alpha_l(t) \approx B(t)^T\gamma_l, \qquad l = 1, 2, \ldots, q, \qquad (3)$$

where $\gamma_l \in \mathbb{R}^{K+d}$ is a vector of B-spline regression coefficients.
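The basis in (3) is easy to construct with standard software. The sketch below is ours, not from the paper; it builds the $L = K + d$ basis functions with scipy and fits one coefficient function by least squares, with the knot count, order and test function being illustrative assumptions.

```python
# Sketch: B-spline basis of order d with K interior knots on [0, 1],
# and a least squares fit alpha(t) ~ B(t)^T gamma as in (3).
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(t, K=5, d=4):
    """Return the (len(t), K + d) design matrix of B-spline basis functions."""
    interior = np.linspace(0, 1, K + 2)[1:-1]          # K interior knots
    knots = np.r_[np.zeros(d), interior, np.ones(d)]   # clamped boundary knots
    L = K + d                                          # number of basis functions
    return BSpline(knots, np.eye(L), d - 1)(t)         # column l is B_l(t)

t = np.linspace(0, 1, 200)
B = bspline_basis(t)
alpha = np.sin(2 * np.pi * t)                          # a test coefficient function
gamma = np.linalg.lstsq(B, alpha, rcond=None)[0]       # spline coefficients gamma_l
approx_error = np.max(np.abs(B @ gamma - alpha))       # small for smooth alpha
```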

Replacing $\alpha_l(t)\ (l = 1, 2, \ldots, q)$ in model (2) by the approximation (3), we can rewrite model (2) as

$$Y_{ij} \approx X_{ij}^T\beta + \tilde Z_{ij}^T\gamma + \epsilon_{ij}, \quad W_{ij} = X_{ij} + w_{ij}, \quad \tilde U_{ij} = \tilde Z_{ij} + \tilde u_{ij}, \qquad (4)$$

where $\gamma = (\gamma_1^T, \gamma_2^T, \ldots, \gamma_q^T)^T$, $B_{ij} = I_q \otimes B(t_{ij})$, $\tilde Z_{ij} = B_{ij}Z_{ij}$, $\tilde U_{ij} = B_{ij}U_{ij}$, $\tilde u_{ij} = B_{ij}u_{ij}$, and $I_q$ is the $q \times q$ identity matrix. From the assumptions of model (2), $w_{ij}$ and $\tilde u_{ij}$ are independent of each other and of $(X_{ij}, Z_{ij}, t_{ij}, \epsilon_{ij})$, with $E(\tilde u_{ij}) = 0$, $\operatorname{cov}(\tilde u_{ij}) = \Sigma_{\tilde u} = B_{ij}\Sigma_uB_{ij}^T$, and $\operatorname{cov}(\tilde u_{ij_1}, \tilde u_{ij_2}) = 0$ for $j_1 \neq j_2$. From (4), the GEE for $\theta = (\beta^T, \gamma^T)^T$ is

$$\sum_{i=1}^n(W_i, \tilde U_i)^TV_i^{-1}\big(Y_i - (W_i, \tilde U_i)\theta\big) = 0. \qquad (5)$$

Taking expectations, we have

$$E\Big[\sum_{i=1}^n(W_i,\tilde U_i)^TV_i^{-1}\big(Y_i - (W_i,\tilde U_i)\theta\big)\Big] = E\Big[\sum_{i=1}^n(X_i,\tilde Z_i)^TV_i^{-1}\big(Y_i - (X_i,\tilde Z_i)\theta\big)\Big] - nE\big[(w_i,\tilde u_i)^TV_i^{-1}(w_i,\tilde u_i)\big]\theta = -nE\big[(w_i,\tilde u_i)^TV_i^{-1}(w_i,\tilde u_i)\big]\theta \neq 0.$$

This shows that equation (5) is biased. We therefore consider the bias-corrected GEE for $\theta$,

$$\sum_{i=1}^n\Big[(W_i,\tilde U_i)^TV_i^{-1}(Y_i - W_i\beta - \tilde U_i\gamma) + D_i\theta\Big] = 0, \qquad (6)$$

where $W_i = (W_{i1}, W_{i2}, \ldots, W_{in_i})^T$, $\tilde U_i = (\tilde U_{i1}, \tilde U_{i2}, \ldots, \tilde U_{in_i})^T$, $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})^T$, $D_i = E\big[(w_i, \tilde u_i)^TV_i^{-1}(w_i, \tilde u_i)\big]$, $w_i = (w_{i1}, w_{i2}, \ldots, w_{in_i})^T$, $\tilde u_i = (\tilde u_{i1}, \tilde u_{i2}, \ldots, \tilde u_{in_i})^T$, and $V_i$ is the covariance matrix of $Y_i$. Obviously, equation (6) is unbiased. As in the GEE method, we take $V_i = A_i^{1/2}R_i(\rho)A_i^{1/2}$, where $A_i = \operatorname{diag}(\operatorname{var}(Y_{i1}), \ldots, \operatorname{var}(Y_{in_i})) = \operatorname{diag}(\operatorname{var}(\epsilon_{i1}), \ldots, \operatorname{var}(\epsilon_{in_i}))$, $R_i(\rho)$ is a working correlation matrix, and $\rho$ is a nuisance parameter. Liang and Zeger [14] pointed out that a consistent estimator of $\rho$ may not exist in certain simple cases, which can invalidate the GEE method.

To overcome this drawback of the GEE, Qu et al. [17] proposed the QIF method for longitudinal data by assuming that $R_i^{-1}(\rho) \approx \sum_{\kappa=1}^sa_\kappa M_\kappa$, where the $M_\kappa\ (\kappa = 1, 2, \ldots, s)$ are simple known matrices and the $a_\kappa\ (\kappa = 1, 2, \ldots, s)$ are unknown constants treated as nuisance parameters [17]. Substituting this expansion into (6), we get the new bias-corrected GEE

$$\sum_{i=1}^n\bigg[(W_i,\tilde U_i)^TA_i^{-1/2}\Big(\sum_{\kappa=1}^sa_\kappa M_\kappa\Big)A_i^{-1/2}(Y_i - W_i\beta - \tilde U_i\gamma) + E\Big[(w_i,\tilde u_i)^TA_i^{-1/2}\Big(\sum_{\kappa=1}^sa_\kappa M_\kappa\Big)A_i^{-1/2}(w_i,\tilde u_i)\Big]\theta\bigg] = 0. \qquad (7)$$
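For concreteness, the basis matrices $M_\kappa$ commonly paired with the exchangeable and AR(1) working structures in the QIF literature [17] can be written down directly. The sketch below is a minimal illustration in this spirit, not code from the paper.

```python
# Sketch: standard basis matrices for R^{-1}(rho) ~ a_1 M_1 + a_2 M_2.
import numpy as np

def qif_basis(n0, structure="AR1"):
    """Basis matrices M_kappa for an n0 x n0 working correlation matrix."""
    M1 = np.eye(n0)
    if structure == "EX":    # exchangeable: 0 on the diagonal, 1 off the diagonal
        M2 = np.ones((n0, n0)) - np.eye(n0)
    elif structure == "AR1": # AR(1): 1 on the two first off-diagonals
        M2 = np.diag(np.ones(n0 - 1), 1) + np.diag(np.ones(n0 - 1), -1)
    else:
        raise ValueError("unknown structure")
    return [M1, M2]
```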

Unlike the GEE method, we do not need to estimate $a = (a_1, a_2, \ldots, a_s)$. Instead, we define the bias-corrected extended score function $\bar g_n(\theta)$ as

$$\bar g_n(\theta) = \frac1n\sum_{i=1}^ng_i(\theta) = \frac1n\sum_{i=1}^n\begin{pmatrix}(W_i,\tilde U_i)^TA_i^{-1/2}M_1A_i^{-1/2}(Y_i - W_i\beta - \tilde U_i\gamma) + D_i^{(1)}\theta\\ \vdots\\ (W_i,\tilde U_i)^TA_i^{-1/2}M_sA_i^{-1/2}(Y_i - W_i\beta - \tilde U_i\gamma) + D_i^{(s)}\theta\end{pmatrix}, \qquad (8)$$

where $D_i^{(\kappa)} = E\big[(w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(w_i,\tilde u_i)\big]$, $\kappa = 1, 2, \ldots, s$. Since $w_{ij}$ and $u_{ij}$ are independent of each other, $w_i$ and $\tilde u_i$ are independent, so $E(w_i^TA_i^{-1/2}M_\kappa A_i^{-1/2}\tilde u_i) = 0$ and $E(\tilde u_i^TA_i^{-1/2}M_\kappa A_i^{-1/2}w_i) = 0$. It follows that

$$D_i^{(\kappa)} = \begin{pmatrix}D_{11,i}^{(\kappa)} & 0\\ 0 & D_{22,i}^{(\kappa)}\end{pmatrix}, \qquad (9)$$

where $D_{11,i}^{(\kappa)} = E(w_i^TA_i^{-1/2}M_\kappa A_i^{-1/2}w_i)$ and $D_{22,i}^{(\kappa)} = E(\tilde u_i^TA_i^{-1/2}M_\kappa A_i^{-1/2}\tilde u_i)$, $\kappa = 1, 2, \ldots, s$. By some simple matrix calculations, following Zhao et al. [34], we have

$$D_{11,i}^{(\kappa)} = \operatorname{tr}\big(A_i^{-1/2}M_\kappa A_i^{-1/2}\big)\Sigma_w, \qquad (10)$$
$$D_{22,i}^{(\kappa)} = \Sigma_u \otimes \big(B_i\operatorname{diag}\big(A_i^{-1/2}M_\kappa A_i^{-1/2}\big)B_i^T\big), \qquad (11)$$

where $B_i = (B(t_{i1}), B(t_{i2}), \ldots, B(t_{in_i}))$ and $\operatorname{diag}(\cdot)$ extracts the diagonal part of a matrix. However, the covariance matrices $\Sigma_w$ and $\Sigma_u$ are usually unknown in advance and must be estimated in practice. Under some conditions, $\Sigma_w$ and $\Sigma_u$ can be estimated by partial replication, similar to [1].

Suppose the longitudinal data are balanced, that is, $n_i = n_0 < \infty\ (i = 1, 2, \ldots, n)$, and that $W_{ij}$ and $U_{ij}$ are observed $m_i$ times for the $i$th subject: $W_{ij}^{(r)} = X_{ij} + w_{ij}^{(r)}$, $U_{ij}^{(r)} = Z_{ij} + u_{ij}^{(r)}$, $r = 1, 2, \ldots, m_i$. Then two consistent, unbiased estimators $\hat\Sigma_w$ and $\hat\Sigma_u$ of $\Sigma_w$ and $\Sigma_u$ are given by

$$\hat\Sigma_w = \frac{1}{nn_0}\sum_{i=1}^n\sum_{j=1}^{n_0}\frac{1}{m_i - 1}\sum_{r=1}^{m_i}\big(W_{ij}^{(r)} - \bar W_{ij}\big)\big(W_{ij}^{(r)} - \bar W_{ij}\big)^T, \qquad (12)$$
$$\hat\Sigma_u = \frac{1}{nn_0}\sum_{i=1}^n\sum_{j=1}^{n_0}\frac{1}{m_i - 1}\sum_{r=1}^{m_i}\big(U_{ij}^{(r)} - \bar U_{ij}\big)\big(U_{ij}^{(r)} - \bar U_{ij}\big)^T, \qquad (13)$$

where $\bar W_{ij} = m_i^{-1}\sum_{r=1}^{m_i}W_{ij}^{(r)}$ and $\bar U_{ij} = m_i^{-1}\sum_{r=1}^{m_i}U_{ij}^{(r)}$. Furthermore, we obtain consistent, unbiased estimators $\hat D_{11,i}^{(\kappa)}$ and $\hat D_{22,i}^{(\kappa)}$ of $D_{11,i}^{(\kappa)}$ and $D_{22,i}^{(\kappa)}$, respectively, as

$$\hat D_{11,i}^{(\kappa)} = \operatorname{tr}\big(A_i^{-1/2}M_\kappa A_i^{-1/2}\big)\hat\Sigma_w, \qquad (14)$$
$$\hat D_{22,i}^{(\kappa)} = \hat\Sigma_u \otimes \big(B_i\operatorname{diag}\big(A_i^{-1/2}M_\kappa A_i^{-1/2}\big)B_i^T\big). \qquad (15)$$

Substituting (14) and (15) into (9), we obtain a consistent, unbiased estimator $\hat D_i^{(\kappa)}$ of $D_i^{(\kappa)}$:

$$\hat D_i^{(\kappa)} = \begin{pmatrix}\hat D_{11,i}^{(\kappa)} & 0\\ 0 & \hat D_{22,i}^{(\kappa)}\end{pmatrix}, \qquad \kappa = 1, 2, \ldots, s. \qquad (16)$$
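When replicates are available, the estimators (12)-(16) are straightforward averages. The following sketch is our illustration, not the authors' code; it computes $\hat\Sigma_w$ from an array of replicated measurements, and $\hat\Sigma_u$ is entirely analogous.

```python
# Sketch: partial-replication estimator (12) of Sigma_w.
# W has shape (n, n0, m, p): n subjects, n0 time points, m replicates,
# p error-prone covariates; m >= 2 is required.
import numpy as np

def sigma_w_hat(W):
    n, n0, m, p = W.shape
    Wbar = W.mean(axis=2, keepdims=True)       # replicate means W_bar_ij
    R = W - Wbar                               # centered replicates
    S = np.einsum('ijrp,ijrq->pq', R, R)       # sum over i, j, r of outer products
    return S / (n * n0 * (m - 1))              # matches (12) with equal m_i = m
```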

If the longitudinal data are unbalanced, they can be transformed into balanced data following Xue et al. [29]; the details are omitted here.

According to (16), we obtain an estimator $\hat{\bar g}_n(\theta)$ of $\bar g_n(\theta)$:

$$\hat{\bar g}_n(\theta) = \frac1n\sum_{i=1}^n\hat g_i(\theta) = \frac1n\sum_{i=1}^n\begin{pmatrix}(W_i,\tilde U_i)^TA_i^{-1/2}M_1A_i^{-1/2}(Y_i - W_i\beta - \tilde U_i\gamma) + \hat D_i^{(1)}\theta\\ \vdots\\ (W_i,\tilde U_i)^TA_i^{-1/2}M_sA_i^{-1/2}(Y_i - W_i\beta - \tilde U_i\gamma) + \hat D_i^{(s)}\theta\end{pmatrix}. \qquad (17)$$

Obviously, $\hat{\bar g}_n(\theta)$ is an $s(p + q(K+d)) \times 1$ vector, whereas $\theta$ is a $(p + q(K+d)) \times 1$ parameter vector, so the equation $E(\hat{\bar g}_n(\theta)) = 0$ is over-identified and cannot be solved directly for $\theta$. To address this, following Qu and Li [16], we construct the bias-corrected QIF for $\theta$ as

$$Q_n(\theta) = n\,\hat{\bar g}_n^T(\theta)\Omega_n^{-1}\hat{\bar g}_n(\theta), \qquad (18)$$

where $\Omega_n = \frac1n\sum_{i=1}^n\hat g_i(\theta)\hat g_i^T(\theta)$. The bias-corrected QIF estimator $\tilde\theta$ is then

$$\tilde\theta = \arg\min_\theta Q_n(\theta). \qquad (19)$$
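Numerically, (18)-(19) amount to a weighted quadratic form in the averaged extended scores. A hedged sketch follows; `extended_scores` is a user-supplied function returning the stacked $\hat g_i(\theta)$ of (17), which we assume rather than reproduce.

```python
# Sketch: evaluate the bias-corrected QIF (18) and minimize it as in (19).
import numpy as np
from scipy.optimize import minimize

def Qn(theta, extended_scores):
    """Q_n(theta) = n * gbar^T Omega_n^{-1} gbar, Omega_n = mean of g_i g_i^T."""
    G = extended_scores(theta)            # shape (n, s*(p + q*(K+d)))
    n = G.shape[0]
    gbar = G.mean(axis=0)
    Omega = G.T @ G / n
    return n * gbar @ np.linalg.solve(Omega, gbar)

# theta_tilde = minimize(Qn, theta_init, args=(extended_scores,), method="BFGS").x
```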

As mentioned above, the bias-corrected QIF corrects the bias of the estimating equations and handles within-subject correlations simultaneously. However, it estimates the nonparametric coefficient functions by spline regression, which tends to over-fit, and the true model is unknown in practice. To address these issues, we construct the bias-corrected pQIF, which estimates and selects the significant parameters and varying coefficients simultaneously, defined as

$$Q_p(\theta) = Q_n(\theta) + n\sum_{k=1}^pp_{\lambda_{1k}}(|\beta_k|) + n\sum_{l=1}^qp_{\lambda_{2l}}(\|\gamma_l\|_H), \qquad (20)$$

where $\|\gamma_l\|_H = (\gamma_l^TH\gamma_l)^{1/2}$, $H = (h_{ij})_{L\times L}$ with $h_{ij} = \int_0^1B_i(t)B_j(t)\,dt$, and $p_\lambda(\cdot)$ is the SCAD penalty function [5], whose first-order derivative is

$$p'_\lambda(w) = \lambda\Big\{I(w \le \lambda) + \frac{(a\lambda - w)_+}{(a - 1)\lambda}I(w > \lambda)\Big\}, \qquad (21)$$

where $a = 3.7$, $w > 0$ and $p_\lambda(0) = 0$; the tuning parameter $\lambda$ controls the amount of penalty. In (20), we denote the tuning parameters by $\lambda_{1k}\ (k = 1, 2, \ldots, p)$ for $\beta_k$ and $\lambda_{2l}\ (l = 1, 2, \ldots, q)$ for $\alpha_l(t)$, respectively. A direct transcription of (21) is sketched below.
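This transcription is ours, assuming only numpy:

```python
# Sketch: SCAD first-order derivative p'_lambda(w) from (21), with a = 3.7.
import numpy as np

def scad_deriv(w, lam, a=3.7):
    w = np.abs(np.asarray(w, dtype=float))
    return lam * (w <= lam) + np.maximum(a * lam - w, 0.0) / (a - 1) * (w > lam)
```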

The bias-corrected pQIF estimator $\hat\theta$ is then given by

$$\hat\theta = (\hat\beta^T, \hat\gamma^T)^T = \arg\min_\theta Q_p(\theta). \qquad (22)$$

Furthermore, the estimators of $\alpha_l(t)\ (l = 1, 2, \ldots, q)$ are obtained as

$$\hat\alpha_l(t) = B(t)^T\hat\gamma_l, \qquad l = 1, 2, \ldots, q. \qquad (23)$$

3. Asymptotic properties

We now establish the asymptotic properties of $\hat\beta_k\ (k = 1, 2, \ldots, p)$ and $\hat\alpha_l(t)\ (l = 1, 2, \ldots, q)$. Let $\beta_0 = (\beta_{10}, \beta_{20}, \ldots, \beta_{p0})^T$ and $\alpha_0(t) = (\alpha_{10}(t), \alpha_{20}(t), \ldots, \alpha_{q0}(t))^T$ be the true regression parameters and coefficient functions, and let $\gamma_{l0}\ (l = 1, 2, \ldots, q)$ be the B-spline coefficient vectors from the spline approximation to $\alpha_{l0}(t)$. Furthermore, we assume that

$$\beta_{k0} \neq 0\ (k = 1, 2, \ldots, p_1), \quad \beta_{k0} = 0\ (k = p_1+1, \ldots, p), \quad \alpha_{l0}(t) \not\equiv 0\ (l = 1, 2, \ldots, q_1), \quad \alpha_{l0}(t) \equiv 0\ (l = q_1+1, \ldots, q).$$

Some necessary regularity conditions for the asymptotic properties are as follows.

  • C1: $0 < n_i < \infty$ for $i = 1, 2, \ldots, n$.

  • C2: $\alpha_l(t)\ (l = 1, 2, \ldots, q)$ are $r$th continuously differentiable on $(0, 1)$, where $r \geq 2$.

  • C3: There exists a unique $\theta_0 \in \Theta$ satisfying $E(\hat{\bar g}_n(\theta_0)) = o(1)$, where $\Theta$ is the parameter space.

  • C4: There exists an invertible matrix $\Omega_0$ such that $\Omega_n \xrightarrow{a.s.} \Omega_0$.

  • C5: $E(\epsilon_i\epsilon_i^T) = V_i$ with $\sup_i\|V_i\| < \infty$, and there exists $\delta > 0$ such that $\sup_iE\{\|\epsilon_i\|^{2+\delta}\} < \infty$, $E\|w_i\|^8 < \infty$ and $E\|u_i\|^8 < \infty$, where $\|\cdot\|$ denotes the modulus of the largest singular value.

  • C6: $A_i > 0$ and $\sup_i\|A_i\| < \infty$.

  • C7: $E\|X_i\|^4 < \infty$ and $E\|Z_i\|^4 < \infty$, $i = 1, 2, \ldots, n$.

  • C8: The interior knots $\{\tau_i, i = 1, 2, \ldots, K\}$ satisfy $\max_{1\le i\le K}|\Delta\tau_{i+1} - \Delta\tau_i| = o(K^{-1})$ and $\Delta\tau_{\max}/\Delta\tau_{\min} \le C$ for some $C > 0$, where $\Delta\tau_{\max} = \max_{1\le i\le K+1}\Delta\tau_i$, $\Delta\tau_{\min} = \min_{1\le i\le K+1}\Delta\tau_i$, $\Delta\tau_i = \tau_i - \tau_{i-1}$, $\tau_0 = 0$ and $\tau_{K+1} = 1$.

  • C9: $\hat{\bar g}'_n(\theta) = \partial\hat{\bar g}_n(\theta)/\partial\theta$ exists and is continuous, and by the weak law of large numbers, when $\hat\theta \xrightarrow{p} \theta_0$, there exists $J_0$ such that
    $$\lim_{n\to\infty}\frac1n\sum_{i=1}^nE\begin{pmatrix}(W_i,\tilde U_i)^TA_i^{-1/2}M_1A_i^{-1/2}(W_i,\tilde U_i)\\ \vdots\\ (W_i,\tilde U_i)^TA_i^{-1/2}M_sA_i^{-1/2}(W_i,\tilde U_i)\end{pmatrix} = J_0. \qquad (24)$$

  • C10: $a_n = \max_{k,l}\big\{|p'_{\lambda_{1k}}(|\beta_{k0}|)|,\ |p'_{\lambda_{2l}}(\|\gamma_{l0}\|_H)| : \beta_{k0} \neq 0,\ \gamma_{l0} \neq 0\big\}$ satisfies $a_n \to 0$ as $n \to \infty$.

  • C11: $p_\lambda(t)$ satisfies
    $$\liminf_{n\to\infty}\liminf_{\beta_k\to0^+}\lambda_{1k}^{-1}p'_{\lambda_{1k}}(|\beta_k|) > 0, \qquad k = p_1+1, p_1+2, \ldots, p, \qquad (25)$$
    $$\liminf_{n\to\infty}\liminf_{\|\gamma_l\|_H\to0^+}\lambda_{2l}^{-1}p'_{\lambda_{2l}}(\|\gamma_l\|_H) > 0, \qquad l = q_1+1, q_1+2, \ldots, q. \qquad (26)$$

Remark 3.1

These conditions are commonly used in the literature on nonparametric and semiparametric inference. C1 implies $N = \sum_{i=1}^nn_i = O(n)$. C2 is a smoothness condition on $\alpha_l(t)\ (l = 1, 2, \ldots, q)$ and is necessary for studying the convergence rate of the B-spline estimator. C4 and C9 follow easily from the weak law of large numbers as $n \to \infty$. C3, C5-C7 and C9 can be found in [20]. C8 is a standard requirement on the knots of B-spline approximations [18]. C10 and C11 can be found in [5,20,36].

Under these conditions, the asymptotic properties of the resulting estimators are presented as follows.

Theorem 3.1

If C1-C11 hold and $K = O(N^{1/(2r+1)})$, then we have

$$\|\hat\alpha_l(\cdot) - \alpha_{l0}(\cdot)\| = O_p(n^{-r/(2r+1)}), \qquad l = 1, 2, \ldots, q. \qquad (27)$$

Theorem 3.2

If C1-C11 hold, $K = O(N^{1/(2r+1)})$, and $\lambda_{\max} = \max_{k,l}\{\lambda_{1k}, \lambda_{2l}\}$ and $\lambda_{\min} = \min_{k,l}\{\lambda_{1k}, \lambda_{2l}\}$ satisfy $\lambda_{\max} \to 0$ and $n^{r/(2r+1)}\lambda_{\min} \to +\infty$, then with probability tending to 1 we have

  1. $\hat\beta_k = 0$, $k = p_1+1, \ldots, p$;

  2. $\hat\alpha_l(\cdot) \equiv 0$, $l = q_1+1, \ldots, q$.

Theorem 3.3

Denote by $\hat\beta = (\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_{p_1})^T$ the estimator of $\beta = (\beta_1, \beta_2, \ldots, \beta_{p_1})^T$, the vector of non-zero parameters. If C1-C11 hold and $K = O(N^{1/(2r+1)})$, then we have

$$\sqrt n(\hat\beta - \beta_0) \xrightarrow{L} N\big(0,\ A_0(J_{\theta_0}\Omega_0^{-1}J_{\theta_0}^T)^{-1}A_0^T\big), \qquad (28)$$

where $A_0$ is defined in Equation (A11) of the Appendix, and "$\xrightarrow{L}$" denotes convergence in distribution.

Remark 3.2

Theorem 3.1 shows that the estimators of the varying coefficients achieve the optimal convergence rate, and Theorem 3.2 shows that the estimators of the constant and varying coefficients possess the sparsity property. Together, Theorems 3.1-3.3 show that the proposed method possesses the oracle property.

4. Computational algorithm and selection of tuning parameters

4.1. Computational algorithm

The estimator $\hat\theta$ defined by (22) has no closed form, and the penalty function is singular at the origin, so $\hat\theta$ can only be obtained numerically. $Q_n(\cdot)$ can be approximated around a given point $\theta^{(0)}$ by the Taylor expansion

$$Q_n(\theta) \approx Q_n(\theta^{(0)}) + \dot Q_n(\theta^{(0)})^T(\theta - \theta^{(0)}) + \frac12(\theta - \theta^{(0)})^T\ddot Q_n(\theta^{(0)})(\theta - \theta^{(0)}),$$

where $\dot Q_n(\cdot) = \partial Q_n(\cdot)/\partial\theta$ and $\ddot Q_n(\cdot) = \partial^2Q_n(\cdot)/\partial\theta\,\partial\theta^T$. On the other hand, $p_\lambda(\cdot)$ can be locally approximated by

$$p_\lambda(|t|) \approx p_\lambda(|t_0|) + \frac12\frac{p'_\lambda(|t_0|)}{|t_0|}(t^2 - t_0^2), \qquad t \approx t_0,$$

where $t_0$ is an initial value. Therefore, up to a constant, the bias-corrected pQIF can be represented as

$$Q_p(\theta) \approx Q_n(\theta^{(0)}) + \dot Q_n(\theta^{(0)})^T(\theta - \theta^{(0)}) + \frac12(\theta - \theta^{(0)})^T\ddot Q_n(\theta^{(0)})(\theta - \theta^{(0)}) + \frac n2\theta^T\Sigma_\lambda(\theta^{(0)})\theta, \qquad (29)$$

where

$$\Sigma_\lambda(\theta^{(0)}) = \operatorname{diag}\Big\{\frac{p'_{\lambda_{11}}(|\beta_1^{(0)}|)}{|\beta_1^{(0)}|}, \ldots, \frac{p'_{\lambda_{1p}}(|\beta_p^{(0)}|)}{|\beta_p^{(0)}|}, \frac{p'_{\lambda_{21}}(\|\gamma_1^{(0)}\|_H)}{\|\gamma_1^{(0)}\|_H}H, \ldots, \frac{p'_{\lambda_{2q}}(\|\gamma_q^{(0)}\|_H)}{\|\gamma_q^{(0)}\|_H}H\Big\}.$$

According to (29), $\hat\theta$ can be computed by the update

$$\theta^{(1)} \leftarrow \theta^{(0)} - \big\{\ddot Q_n(\theta^{(0)}) + n\Sigma_\lambda(\theta^{(0)})\big\}^{-1}\big\{\dot Q_n(\theta^{(0)}) + n\Sigma_\lambda(\theta^{(0)})\theta^{(0)}\big\}.$$

The detailed iterative algorithm is as follows.

  • Step 1: Take the bias-corrected QIF estimator $\tilde\theta$ defined by (19) as $\theta^{(0)}$.

  • Step 2: Update $\hat\theta$ at the $(k+1)$th iteration by
    $$\theta^{(k+1)} \leftarrow \theta^{(k)} - \big\{\ddot Q_n(\theta^{(k)}) + n\Sigma_\lambda(\theta^{(k)})\big\}^{-1}\big\{\dot Q_n(\theta^{(k)}) + n\Sigma_\lambda(\theta^{(k)})\theta^{(k)}\big\}.$$
  • Step 3: Repeat Step 2 until a convergence criterion is satisfied. A minimal sketch of this iteration is given after this list.
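In the sketch below, `grad_Qn`, `hess_Qn` and `Sigma_lam` are assumed to be user-supplied callables for $\dot Q_n$, $\ddot Q_n$ and $\Sigma_\lambda$ as defined above; the code is our illustration of Steps 1-3, not the authors' implementation.

```python
# Sketch: local quadratic approximation iteration from Steps 1-3.
import numpy as np

def pqif_iterate(theta0, grad_Qn, hess_Qn, Sigma_lam, n, tol=1e-6, max_iter=200):
    theta = np.asarray(theta0, dtype=float)     # Step 1: start from the QIF estimator
    for _ in range(max_iter):
        H = hess_Qn(theta) + n * Sigma_lam(theta)
        step = np.linalg.solve(H, grad_Qn(theta) + n * Sigma_lam(theta) @ theta)
        theta_new = theta - step                # Step 2: one update
        if np.linalg.norm(theta_new - theta) < tol:   # Step 3: convergence check
            break
        theta = theta_new
    return theta_new
```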

4.2. Selection of tuning parameters

As noted above, $\lambda_{1k}$ and $\lambda_{2l}$ control the amount of penalization through $p_{\lambda_{1k}}(\cdot)\ (k = 1, 2, \ldots, p)$ and $p_{\lambda_{2l}}(\cdot)\ (l = 1, 2, \ldots, q)$, but they are unknown in practice, so they indirectly determine the model estimation and selection results. Their selection is therefore important in implementation. As Wang et al. [21] showed, the BIC criterion for the SCAD estimator can select the true model with probability tending to one. In our work, we apply the BIC criterion to select the optimal tuning parameters $\lambda_{1k}$ and $\lambda_{2l}$.

However, selecting $p + q$ tuning parameters simultaneously is challenging in real applications. A sensible strategy is to assign a larger tuning parameter, and hence a heavier penalty, to a zero parameter or zero coefficient function than to a non-zero one; this favors selecting the significantly non-zero parameters and coefficient functions and reduces computational complexity. Such tuning parameters are usually called adaptive tuning parameters. With adaptive tuning parameters, the proposed method can estimate the large parameters and coefficient functions without bias while shrinking the small ones toward zero. Thus, we set

$$\lambda_{1k} = \frac{\lambda}{|\tilde\beta_k|}, \qquad \lambda_{2l} = \frac{\lambda}{\|\tilde\gamma_l\|_H},$$

where $\tilde\beta_k\ (k = 1, 2, \ldots, p)$ and $\tilde\gamma_l\ (l = 1, 2, \ldots, q)$ are defined by (19). Consequently, the selection of $\lambda_{1k}$ and $\lambda_{2l}$ reduces to the selection of the single parameter $\lambda$, an easier univariate problem that greatly reduces computational complexity. Define

$$\mathrm{BIC}(\lambda) = Q_n(\hat\theta_\lambda) + df_\lambda\log(n), \qquad (30)$$

where $\hat\theta_\lambda = (\hat\beta_\lambda^T, \hat\gamma_\lambda^T)^T$ is defined by (22) for a given $\lambda$, and $df_\lambda$ is the number of non-zero components among $\hat\beta_{1\lambda}, \hat\beta_{2\lambda}, \ldots, \hat\beta_{p\lambda}$ and $\|\hat\gamma_{1\lambda}\|_H, \|\hat\gamma_{2\lambda}\|_H, \ldots, \|\hat\gamma_{q\lambda}\|_H$, with $\hat\beta_\lambda = (\hat\beta_{1\lambda}, \ldots, \hat\beta_{p\lambda})^T$ and $\hat\gamma_\lambda = (\hat\gamma_{1\lambda}^T, \ldots, \hat\gamma_{q\lambda}^T)^T$. The optimal $\hat\lambda$ is then

$$\hat\lambda = \arg\min_\lambda\mathrm{BIC}(\lambda). \qquad (31)$$

In practice, $\hat\lambda$ can be obtained by a grid search, as sketched below.
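In the sketch, `fit_pqif` (returning $\hat\theta_\lambda$ for a given $\lambda$) and `df` (counting the non-zero components as defined below (30)) are assumed wrappers around the procedures above, not functions from the paper.

```python
# Sketch: BIC grid search (30)-(31) over a one-dimensional lambda grid.
import numpy as np

def select_lambda(grid, fit_pqif, Qn, df, n):
    best_lam, best_bic = None, np.inf
    for lam in grid:
        theta_hat = fit_pqif(lam)                      # penalized estimator for this lambda
        bic = Qn(theta_hat) + df(theta_hat) * np.log(n)
        if bic < best_bic:
            best_lam, best_bic = lam, bic
    return best_lam
```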

5. Numerical studies

5.1. Simulation studies

We conducted numerical simulations to assess the performance of the bias-corrected pQIF method in terms of estimation accuracy and selection performance in finite samples. First, the generalized mean square error (GMSE) [20,36] is defined as

$$\mathrm{GMSE} = (\hat\beta - \beta)^TE(XX^T)(\hat\beta - \beta).$$

Obviously, the smaller the GMSE, the better the estimation of $\beta$. The square root of average squared errors (RASE) is defined as

$$\mathrm{RASE} = \Big\{\frac1m\sum_{l=1}^q\sum_{\ell=1}^m\|\hat\alpha_l(t_\ell) - \alpha_l(t_\ell)\|^2\Big\}^{1/2}.$$

A smaller RASE indicates better estimation accuracy, that is, $\hat\alpha(t)$ is closer to the true function $\alpha(t)$. In our work, we set $m = 200$, with grid points $t_\ell\ (\ell = 1, 2, \ldots, m)$ equally spaced on $[0, 1]$. Both criteria reduce to a few lines of array arithmetic, as sketched below.
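In the sketch, `EXX` denotes a plug-in for $E(XX^T)$; the code is our illustration under the grid convention just stated.

```python
# Sketch: the GMSE and RASE accuracy measures of Section 5.1.
import numpy as np

def gmse(beta_hat, beta, EXX):
    d = beta_hat - beta
    return d @ EXX @ d          # (beta_hat - beta)^T E(XX^T) (beta_hat - beta)

def rase(alpha_hat, alpha_true):
    # alpha_hat, alpha_true: (m, q) values of the q coefficient curves on the grid
    m = alpha_hat.shape[0]
    return np.sqrt(np.sum((alpha_hat - alpha_true) ** 2) / m)
```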

In the tables below, 'C' denotes the average number of zero components correctly estimated as zero, i.e. $\hat\beta_k = 0\ (k = p_1+1, \ldots, p)$ or $\hat\alpha_l(t) \equiv 0\ (l = q_1+1, \ldots, q)$, and 'IC' denotes the average number of non-zero components incorrectly estimated as zero, i.e. $\hat\beta_k = 0\ (k = 1, 2, \ldots, p_1)$ or $\hat\alpha_l(t) \equiv 0\ (l = 1, 2, \ldots, q_1)$. Obviously, a larger 'C' and a smaller 'IC' indicate better model selection. The performance of the bias-corrected pQIF method is assessed by the GMSE, RASE, 'C' and 'IC' jointly.

In our simulation studies, for model (2), we let $\beta = (\beta_1, \beta_2, \beta_3, \beta_4)^T$ with $\beta_1 = 2$, $\beta_2 = 0.7$ and $\beta_k = 0\ (k = 3, 4)$, and $\alpha(t) = (\alpha_1(t), \alpha_2(t), \ldots, \alpha_6(t))^T$ with $\alpha_l(t) \equiv 0\ (l = 3, 4, 5, 6)$ and

$$\alpha_1(t) = 7.5 + 0.1\exp(3t - 1), \qquad \alpha_2(t) = \sin(2\pi t).$$

We took $X_{ij} \sim N(2, \sigma_X^2I_4)$, $Z_{ij} \sim N(2, \sigma_Z^2I_6)$, $w_{ij} \sim N(0, \sigma_w^2I_4)$ and $u_{ij} \sim N(0, \sigma_u^2I_6)$, where $j = 1, 2, \ldots, 10$, $\sigma_X = \sigma_Z = 2$, and $I_4$ and $I_6$ are the $4\times4$ and $6\times6$ identity matrices. We set $\sigma_w = \sigma_u$ to 0.2, 0.4 and 0.6, and $t_{ij} \sim U[0, 1]$. The errors $\epsilon_i = (\epsilon_{i1}, \epsilon_{i2}, \ldots, \epsilon_{in_i})^T \sim N(0, \sigma^2\mathrm{Corr}(\epsilon_i, \rho))$, where $\sigma^2 = 1$ and $\mathrm{Corr}(\epsilon_i, \rho)$ is a known correlation matrix with parameter $\rho$, so that $A_i = \operatorname{diag}(1, 1, \ldots, 1)$. We set $n_i = 10$ and considered first-order autoregressive (AR(1)) and exchangeable (EX) correlation structures for $\epsilon_i$ with $\rho = 0.3$ and $\rho = 0.7$, and generated $n$ = 150, 200, 300 subjects. The cubic B-spline basis was applied with equally spaced knots in $[0, 1]$ and $K = \lfloor c \times N^{1/5}\rfloor$, where $\lfloor c\rfloor$ denotes the largest integer not exceeding $c$ [8]. Following Tian et al. [20], we chose $c = 0.6$. A sketch of this data-generating process is given below.
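The following sketch generates one dataset under this setup (AR(1) case). It is our illustration of the data-generating process only, not the authors' simulation code, and the reconstructed form of $\alpha_1(t)$ follows the display above.

```python
# Sketch: data generation for the simulation design of Section 5.1.
import numpy as np

rng = np.random.default_rng(0)

def ar1_corr(n0, rho):
    idx = np.arange(n0)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def simulate(n=150, n0=10, rho=0.3, sigma_me=0.4):
    beta = np.array([2.0, 0.7, 0.0, 0.0])
    t = rng.uniform(0, 1, (n, n0))
    a1 = 7.5 + 0.1 * np.exp(3 * t - 1)              # alpha_1(t)
    a2 = np.sin(2 * np.pi * t)                      # alpha_2(t); alpha_3..6 are zero
    X = rng.normal(2.0, 2.0, (n, n0, 4))
    Z = rng.normal(2.0, 2.0, (n, n0, 6))
    eps = rng.multivariate_normal(np.zeros(n0), ar1_corr(n0, rho), size=n)
    Y = X @ beta + Z[..., 0] * a1 + Z[..., 1] * a2 + eps
    W = X + rng.normal(0.0, sigma_me, X.shape)      # error-prone version of X
    U = Z + rng.normal(0.0, sigma_me, Z.shape)      # error-prone version of Z
    return Y, W, U, t
```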

For each simulated longitudinal dataset, we compared the bias-corrected pQIF method with the LASSO and SCAD penalty functions against the method that neglects measurement errors with the SCAD penalty (denoted 'nSCAD'). For simplicity, the bias-corrected pQIF methods with the LASSO and SCAD penalties are denoted 'LASSO' and 'SCAD', respectively. The tuning parameters $\hat\lambda_{1k}\ (k = 1, 2, \ldots, p)$ and $\hat\lambda_{2l}\ (l = 1, 2, \ldots, q)$ were chosen by (31). We performed 500 simulation runs under each setup and report the median GMSE and RASE in the following tables.

In summary, from Tables 1-4 we can draw the following conclusions:

  1. The LASSO and SCAD methods perform much better than the nSCAD method in all cases, which implies that the proposed bias correction is valuable and that neglecting measurement errors leads to biased estimation and poor variable selection for model (2).

  2. Under the same conditions, the performance of the SCAD and LASSO methods improves as the sample size grows. Furthermore, the SCAD method outperforms the LASSO method in both estimation and selection for the parametric and nonparametric parts.

  3. Under the same conditions, the SCAD and LASSO methods deteriorate as the measurement error increases. There is little difference between them when the measurement error is small, but the SCAD method is significantly better than the LASSO method when the measurement error is large, which implies that the LASSO method is less robust than the SCAD method.

Table 2.

Variable selections for α() with the EX correlation structure.

      n = 150 n = 200 n = 300
ρ σu Method C IC RASE C IC RASE C IC RASE
0.3 0.2 LASSO 3.442 0 0.11303 3.798 0 0.09892 3.972 0 0.08785
    SCAD 3.502 0 0.11307 3.808 0 0.09879 3.974 0 0.08779
    nSCAD 3.410 0 0.13200 3.764 0 0.11801 3.948 0 0.10970
  0.4 LASSO 3.378 0 0.17904 3.696 0 0.14573 3.916 0 0.12109
    SCAD 3.414 0 0.17833 3.766 0 0.14346 3.956 0 0.11848
    nSCAD 3.134 0 0.30538 3.524 0 0.29042 3.850 0 0.27680
  0.6 LASSO 3.140 0 0.26483 3.628 0 0.20670 3.888 0 0.16369
    SCAD 3.282 0 0.25851 3.674 0 0.20247 3.922 0 0.15911
    nSCAD 2.834 0 0.60510 3.178 0 0.57348 3.588 0 0.56202
0.7 0.2 LASSO 3.520 0 0.10967 3.752 0 0.09818 3.970 0 0.08738
    SCAD 3.538 0 0.10946 3.776 0 0.09803 3.972 0 0.08711
    nSCAD 3.434 0 0.12746 3.804 0 0.11733 3.958 0 0.10913
  0.4 LASSO 3.334 0 0.17337 3.684 0 0.14602 3.940 0 0.12035
    SCAD 3.386 0 0.17244 3.710 0 0.14444 3.954 0 0.11865
    nSCAD 3.136 0 0.30833 3.502 0 0.29786 3.856 0 0.27834
  0.6 LASSO 3.180 0 0.26489 3.522 0 0.20862 3.900 0 0.16294
    SCAD 3.242 0 0.26246 3.662 0 0.20480 3.938 0 0.15718
    nSCAD 2.688 0 0.61163 3.132 0 0.58300 3.600 0 0.56400

Table 3.

Variable selections for β with the AR(1) correlation structure.

      n = 150 n = 200 n = 300
ρ σu Method C IC GMSE C IC GMSE C IC GMSE
0.3 0.2 LASSO 1.834 0 0.00027 1.950 0 0.00018 1.992 0 0.00010
    SCAD 1.842 0 0.00017 1.960 0 0.00011 1.992 0 7.6E-05
    nSCAD 1.796 0 0.00079 1.880 0 0.00061 1.964 0 0.00060
  0.4 LASSO 1.380 0 0.00676 1.586 0 0.00184 1.754 0 0.00601
    SCAD 1.396 0 0.00154 1.594 0 0.00113 1.758 0 0.00075
    nSCAD 0.988 0 0.03460 0.954 0 0.00621 0.892 0 0.00111
  0.6 LASSO 1.028 0 0.03337 1.150 0 0.03183 1.408 0 0.03737
    SCAD 1.060 0 0.00929 1.178 0 0.00516 1.410 0 0.00418
    nSCAD 0.448 0 0.24760 0.286 0 0.16020 0.162 0 0.09212
0.7 0.2 LASSO 1.884 0 0.00026 1.934 0 0.00015 1.992 0 0.00011
    SCAD 1.886 0 0.00014 1.936 0 0.00010 1.996 0 6.9E-05
    nSCAD 1.814 0 0.00092 1.904 0 0.00076 1.948 0 0.00062
  0.4 LASSO 1.420 0 0.00301 1.588 0 0.00662 1.794 0 0.00126
    SCAD 1.474 0 0.00168 1.614 0 0.00107 1.824 0 0.00084
    nSCAD 1.046 0 0.00856 0.982 0 0.00197 0.886 0 0.00628
  0.6 LASSO 0.984 0 0.03522 1.212 0 0.03437 1.428 0 0.04007
    SCAD 0.994 0 0.00885 1.236 0 0.00583 1.436 0 0.00406
    nSCAD 0.460 0 0.23031 0.308 0 0.16011 0.136 0 0.08534

Table 1.

Variable selections for β with the EX correlation structure.

      n = 150 n = 200 n = 300
ρ σu Method C IC GMSE C IC GMSE C IC GMSE
0.3 0.2 LASSO 1.834 0 0.00025 1.950 0 0.00017 1.988 0 0.00010
    SCAD 1.850 0 0.00014 1.960 0 9.7E-05 1.988 0 7.6E-05
    nSCAD 1.802 0 0.00089 1.878 0 0.00072 1.974 0 0.00068
  0.4 LASSO 1.420 0 0.00339 1.546 0 0.00200 1.784 0 0.00120
    SCAD 1.448 0 0.00151 1.580 0 0.00116 1.792 0 0.00077
    nSCAD 0.892 0 0.00762 1.000 0 0.00683 1.038 0 0.00534
  0.6 LASSO 1.046 0 0.01934 1.180 0 0.01422 1.404 0 0.00954
    SCAD 1.082 0 0.00820 1.200 0 0.00610 1.438 0 0.00435
    nSCAD 0.188 0 0.03413 0.314 0 0.03321 0.478 0 0.03836
0.7 0.2 LASSO 1.882 0 0.00028 1.962 0 0.00017 1.994 0 0.00012
    SCAD 1.892 0 0.00013 1.968 0 9.3E-05 1.996 0 6.1E-05
    nSCAD 1.870 0 0.00099 1.926 0 0.00085 1.984 0 0.00070
  0.4 LASSO 1.456 0 0.00340 1.626 0 0.00192 1.752 0 0.00121
    SCAD 1.466 0 0.00176 1.598 0 0.00117 1.784 0 0.00079
    nSCAD 0.974 0 0.00883 0.994 0 0.00723 1.000 0 0.00630
  0.6 LASSO 1.024 0 0.03529 1.262 0 0.01561 1.438 0 0.00424
    SCAD 1.068 0 0.00821 1.264 0 0.00594 1.448 0 0.00421
    nSCAD 0.170 0 0.03655 0.330 0 0.03483 0.496 0 0.02724

Table 4.

Variable selections for α() with the AR(1) correlation structure.

      n = 150 n = 200 n = 300
ρ σu Method C IC RASE C IC RASE C IC RASE
0.3 0.2 LASSO 3.446 0 0.11341 3.790 0 0.10114 3.952 0 0.08951
    SCAD 3.454 0 0.11330 3.816 0 0.10087 3.958 0 0.08931
    nSCAD 3.488 0 0.13156 3.716 0 0.11831 3.960 0 0.10979
  0.4 LASSO 3.324 0 0.18076 3.672 0 0.14687 3.924 0 0.11895
    SCAD 3.396 0 0.17887 3.744 0 0.14504 3.954 0 0.11742
    nSCAD 3.064 0 0.30761 3.546 0 0.29106 3.836 0 0.27361
  0.6 LASSO 3.132 0 0.26516 3.628 0 0.20858 3.868 0 0.16263
    SCAD 3.276 0 0.25738 3.732 0 0.19892 3.934 0 0.15858
    nSCAD 2.694 0 0.60629 2.986 0 0.57842 3.468 0 0.55978
0.7 0.2 LASSO 3.416 0 0.11313 3.780 0 0.09946 3.964 0 0.08908
    SCAD 3.474 0 0.11247 3.806 0 0.09909 3.966 0 0.08878
    nSCAD 3.396 0 0.13017 3.710 0 0.11741 3.968 0 0.10843
  0.4 LASSO 3.278 0 0.17854 3.706 0 0.14404 3.924 0 0.11779
    SCAD 3.372 0 0.17591 3.782 0 0.14356 3.950 0 0.11729
    nSCAD 3.150 0 0.30647 3.496 0 0.28994 3.808 0 0.27429
  0.6 LASSO 3.088 0 0.26305 3.574 0 0.20768 3.86 0 0.16407
    SCAD 3.226 0 0.26080 3.67 0 0.20441 3.92 0 0.15910
    nSCAD 2.656 0 0.6041 3.018 0 0.58127 3.522 0 0.56007

5.2. Real example analysis

We now illustrate the performance of the proposed method through an analysis of the AIDS dataset, which contains variables such as the mean CD4 percentage, smoking status, the pre-HIV-infection CD4 percentage (preCD4) and age. The data are unbalanced and available in the R package timereg. More details of the study design and medical implications can be found in [11]. The dataset has been used to illustrate partial linear varying coefficient models [20] and partial linear varying coefficient EV models [35]. Zhou and Liang [37] and Tian et al. [20] indicated that only the baseline function varies over time and that preCD4 has a constant effect. We now allow for measurement errors in the covariates and analyze this dataset using the proposed method.

For simplicity, following Zhao and Xue [36], we consider the following model. Let $Y$ be the individual's CD4 percentage, $X_1$ the centered preCD4 percentage, $X_2 = X_1^2$, $Z_1$ the centered age at HIV infection, and $Z_2 = Z_1^2$:

$$Y = X_1\beta_1 + X_2\beta_2 + \alpha_0(t) + Z_1\alpha_1(t) + Z_2\alpha_2(t) + \epsilon, \qquad (32)$$

where $\alpha_0(t)$ is the baseline CD4 percentage, $\beta_1$ and $\beta_2$ describe the first-order and second-order effects of the preCD4 percentage, $\alpha_1(t)$ and $\alpha_2(t)$ describe the first-order and second-order effects of the age at HIV infection, and $t$ is the visiting time for each patient.

For the AIDS dataset, we can neither obtain repeated measurements of the covariates nor estimate the variance of the measurement errors. Following Lin and Carroll (2000), a sensitivity analysis can be used to assess the practicability of the proposed method. Similar to Zhao and Xue [36], we assume that $X_1$ and $Z_1$ are subject to additive measurement errors:

$$W_1 = X_1 + w_1, \qquad U_1 = Z_1 + u_1,$$

where $w_1 \sim N(0, \sigma_w^2)$ and $u_1 \sim N(0, \sigma_u^2)$. We took $\sigma_w = \sigma_u = 0, 0.5, 1$ to represent different levels of measurement error; obviously, $\sigma_w = \sigma_u = 0$ corresponds to no measurement error.
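Operationally, the sensitivity analysis only requires contaminating the observed covariates at each noise level and refitting. A small sketch of that step follows; the fitting call is a hypothetical wrapper around the procedure of Section 4, not a function from the paper.

```python
# Sketch: contaminate X1 and Z1 at a given noise level for the sensitivity analysis.
import numpy as np

def contaminate(X1, Z1, sigma, rng=np.random.default_rng(1)):
    """Return W1 = X1 + w1 and U1 = Z1 + u1 with N(0, sigma^2) errors."""
    W1 = X1 + rng.normal(0.0, sigma, size=np.shape(X1))
    U1 = Z1 + rng.normal(0.0, sigma, size=np.shape(Z1))
    return W1, U1

# for sigma in (0.0, 0.5, 1.0):
#     W1, U1 = contaminate(X1, Z1, sigma)
#     fit_and_select(Y, W1, U1, t)   # hypothetical wrapper around Section 4
```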

We repeated the proposed model selection procedure, and under every measurement error level it identified the same two non-zero components, $\beta_1$ and $\alpha_0(t)$. This means that the first-order and second-order effects of age at HIV infection have no significant impact on the mean CD4 percentage, and the same holds for the second-order effect of the centered preCD4 percentage and the interaction between the preCD4 percentage and age at HIV infection. Our result agrees with Zhao and Xue [35].

Figure 1 shows the curve of $\hat\alpha_0(t)$ over time under the different measurement error levels. It shows that $\alpha_0(t)$ decreases quickly at the beginning of HIV infection and that the rate of decrease then slows down, similar to the findings in Zhao and Xue [36]. Furthermore, the estimated curve $\hat\alpha_0(t)$ preserves its shape under the different measurement errors, which indicates that our bias-corrected model selection scheme works well and further demonstrates the practical value of the proposed method.

Figure 1. The curve of $\hat\alpha_0(t)$ for the cases $\sigma_w = \sigma_u = 0$ (solid curve), $\sigma_w = \sigma_u = 0.5$ (dashed curve) and $\sigma_w = \sigma_u = 1$ (dotted curve).

6. Conclusion and discussion

Longitudinal data arise widely in scientific fields, and it is of great significance to account for measurement error in longitudinal research. Longitudinal data also exhibit unknown within-subject correlations, so handling within-subject correlation and measurement error together is an important problem in the analysis of longitudinal data with measurement errors. In our work, we considered the case where the covariates of model (2) have additive measurement errors. Valuable research on model (2) exists, such as [9,33,35-37]; however, no study had been reported on simultaneous model estimation and selection for model (2) with longitudinal data. We proposed a bias-corrected penalized quadratic inference functions method for this purpose. The method handles both within-subject correlation and measurement errors, and under some conditions it selects the significant non-zero parameters and varying coefficients. Furthermore, the estimators of the non-zero coefficient functions achieve the optimal convergence rate, and the estimators of the parameters are asymptotically normal. Numerical studies demonstrated the finite-sample performance of the proposed method. In conclusion, the proposed method has good theoretical and practical value for the estimation and selection of model (2).

The proposed method can also be applied to other models, such as generalized partial linear additive models, generalized partial linear single-index models and many others, as well as to other types of correlated data, such as panel data and clustered data. In future work, we will use this method to study more complex models.

Acknowledgements

This work is supported by grants from the Social Science Foundation of China (15CTJ008 to MZ), the Natural Science Foundation of Anhui Universities (KJ2017A433 to KZ), the Social Science Foundation of the Ministry of Education of China (19YJCZH250 to KZ), the National Science Foundation of China (12071305, 11871390 and 11871411 to YZ), the Excellent Young Talents Fund Program of Higher Education Institutions of Anhui Province (gxyqZD2019031 to YZ), and the National Science Foundation of China (71803001 to YZ). This paper is partially supported by the National Natural Science Foundation of China (11901401). All authors read and approved the final manuscript.

Appendix. Proofs of the theorems.

Lemma 1. If C1-C11 hold and $K = O(N^{1/(2r+1)})$, then we have

$$\hat{\bar g}'_n(\theta) \xrightarrow{p} -J_0, \qquad \sqrt n\,\hat{\bar g}_n(\theta_0) \xrightarrow{L} N(0, \Omega_0).$$

Proof.

According to (17), we have

$$\hat{\bar g}'_n(\theta) = -\frac1n\sum_{i=1}^n\begin{pmatrix}(W_i,\tilde U_i)^TA_i^{-1/2}M_1A_i^{-1/2}(W_i,\tilde U_i) - \hat D_i^{(1)}\\ \vdots\\ (W_i,\tilde U_i)^TA_i^{-1/2}M_sA_i^{-1/2}(W_i,\tilde U_i) - \hat D_i^{(s)}\end{pmatrix}.$$

Denote the $\kappa$th block of $\hat{\bar g}'_n(\theta)$ by $\hat{\bar g}'_{n\kappa}(\theta)$, $\kappa = 1, 2, \ldots, s$:

$$\begin{aligned}
\hat{\bar g}'_{n\kappa}(\theta) &= -\frac1n\sum_{i=1}^n\Big[(W_i,\tilde U_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(W_i,\tilde U_i) - \hat D_i^{(\kappa)}\Big]\\
&= -\frac1n\sum_{i=1}^n\Big[(X_i + w_i, \tilde Z_i + \tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(X_i + w_i, \tilde Z_i + \tilde u_i) - \hat D_i^{(\kappa)}\Big]\\
&= -\frac1n\sum_{i=1}^n\Big[(X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(X_i,\tilde Z_i) + (X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(w_i,\tilde u_i)\\
&\qquad + (w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(X_i,\tilde Z_i) + (w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(w_i,\tilde u_i) - \hat D_i^{(\kappa)}\Big]\\
&= -\Big(\Delta_1 + \Delta_2 + \Delta_3 + \Delta_4 - \frac1n\sum_{i=1}^n\hat D_i^{(\kappa)}\Big).
\end{aligned}$$

We first prove that $\Delta_4 - \frac1n\sum_{i=1}^n\hat D_i^{(\kappa)} \xrightarrow{p} 0$ as $n \to \infty$. Note that

$$\Delta_4 - \frac1n\sum_{i=1}^n\hat D_i^{(\kappa)} = \Big[\frac1n\sum_{i=1}^n(w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(w_i,\tilde u_i) - \frac1n\sum_{i=1}^nD_i^{(\kappa)}\Big] + \frac1n\sum_{i=1}^n\Big[D_i^{(\kappa)} - \hat D_i^{(\kappa)}\Big].$$

By the law of large numbers, both bracketed terms converge to zero in probability as $n \to \infty$, so $\Delta_4 - \frac1n\sum_{i=1}^n\hat D_i^{(\kappa)} \xrightarrow{p} 0$. Under C9, $\Delta_1 \xrightarrow{p} J_0^{(\kappa)}$. We now prove that $\Delta_2 \xrightarrow{p} 0$ and $\Delta_3 \xrightarrow{p} 0$.

Write $\Delta_2 = \frac1n\sum_{i=1}^n\xi_{i\kappa}$, where $\xi_{i\kappa} = (X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(w_i,\tilde u_i)$. Obviously $E(\xi_{i\kappa}) = 0$ and

$$\operatorname{cov}(\xi_{i\kappa}) = (X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}\,E\big[(w_i,\tilde u_i)(w_i,\tilde u_i)^T\big]\,A_i^{-1/2}M_\kappa A_i^{-1/2}(X_i,\tilde Z_i),$$

where $E\big[(w_i,\tilde u_i)(w_i,\tilde u_i)^T\big] = \operatorname{diag}\big(E(w_iw_i^T), E(\tilde u_i\tilde u_i^T)\big)$. From C4-C7, $E(w_iw_i^T)$ and $E(\tilde u_i\tilde u_i^T)$ are bounded. By the law of large numbers, $\Delta_3^T = \Delta_2 \xrightarrow{p} 0$. Thus $\hat{\bar g}'_{n\kappa}(\theta) \xrightarrow{p} -J_0^{(\kappa)}$ and $\hat{\bar g}'_n(\theta) \xrightarrow{p} -J_0$, where $J_0 = (J_0^{(1)T}, J_0^{(2)T}, \ldots, J_0^{(s)T})^T$.

Applying a Taylor expansion to $\hat{\bar g}_n(\theta)$ at $\theta_0$, we have

$$\hat{\bar g}_n(\theta) = \hat{\bar g}_n(\theta_0) + \hat{\bar g}'_n(\theta_0)(\theta - \theta_0) + o_p(\|\theta - \theta_0\|). \qquad (A1)$$

Denote the $\kappa$th block of $\hat{\bar g}_n(\theta_0)$ by $\hat{\bar g}_{n\kappa}(\theta_0)$, $\kappa = 1, 2, \ldots, s$:

$$\begin{aligned}
\hat{\bar g}_{n\kappa}(\theta_0) &= \frac1n\sum_{i=1}^n\Big[(W_i,\tilde U_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(Y_i - W_i\beta_0 - \tilde U_i\gamma_0) + \hat D_i^{(\kappa)}\theta_0\Big]\\
&= \frac1n\sum_{i=1}^n\Big[\big((X_i,\tilde Z_i) + (w_i,\tilde u_i)\big)^TA_i^{-1/2}M_\kappa A_i^{-1/2}\big(\epsilon_i - (w_i,\tilde u_i)\theta_0 + Z_iR(t_i)\big) + \hat D_i^{(\kappa)}\theta_0\Big]\\
&= \frac1n\sum_{i=1}^n(X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}\epsilon_i - \frac1n\sum_{i=1}^n(X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(w_i,\tilde u_i)\theta_0\\
&\quad + \frac1n\sum_{i=1}^n(w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}Z_iR(t_i) + \frac1n\sum_{i=1}^n(w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}\epsilon_i\\
&\quad + \frac1n\sum_{i=1}^n(X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}Z_iR(t_i) - \frac1n\sum_{i=1}^n(w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(w_i,\tilde u_i)\theta_0\\
&\quad + \frac1n\sum_{i=1}^n\hat D_i^{(\kappa)}\theta_0\\
&= J_1 - J_2 + J_3 + J_4 + J_5 - J_6 + \frac1n\sum_{i=1}^n\hat D_i^{(\kappa)}\theta_0,
\end{aligned}$$

where $R(t) = (R_1(t), R_2(t), \ldots, R_q(t))^T$ and $R_l(t) = \alpha_l(t) - B^T(t)\gamma_{l0}$, $l = 1, 2, \ldots, q$.

Write $J_1 = \frac1n\sum_{i=1}^n\varphi_i$, where $\varphi_i = (X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}\epsilon_i$. According to C5-C7, we have $E(\varphi_i) = 0$ and

$$\operatorname{cov}(\varphi_i) = (X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}V_iA_i^{-1/2}M_\kappa A_i^{-1/2}(X_i,\tilde Z_i) < \infty.$$

By the law of large numbers, $J_1 \xrightarrow{p} 0$. Similarly, $J_2 \xrightarrow{p} 0$ and $J_3 \xrightarrow{p} 0$.

Write $J_4 = \frac1n\sum_{i=1}^n\phi_i$, where $\phi_i = (w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}\epsilon_i$. Since $\epsilon_i$, $w_i$ and $\tilde u_i$ are mutually independent, $E(\phi_i) = 0$. By the Cauchy-Schwarz inequality and C5-C7, we have

$$\|\operatorname{cov}(\phi_i)\|^2 \le E\big[(w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(w_i,\tilde u_i)\big]\,E\big[\epsilon_i^TA_i^{-1/2}M_\kappa A_i^{-1/2}\epsilon_i\big] < \infty.$$

Thus $J_4 \xrightarrow{p} 0$. By the law of large numbers and the definition of $\hat D_i^{(\kappa)}$, we have $J_6 - \frac1n\sum_{i=1}^n\hat D_i^{(\kappa)}\theta_0 \xrightarrow{p} 0$. From C8 and the B-spline approximation bound (see (A5) below), we have $J_5 = O_p(n^{-1/2}K^{-r}) = o_p(n^{-1/2})$ and $J_3 = o_p(n^{-1/2})$. So, according to (A1), we have $\hat{\bar g}_n(\theta) \xrightarrow{p} J_0(\theta_0 - \theta)$ for $\theta \in \Theta$.

Following [20] and using the results above, we have

$$\begin{aligned}
\hat{\bar g}_{n\kappa}(\theta_0) &= \frac1n\sum_{i=1}^n\Big[\big((X_i,\tilde Z_i) + (w_i,\tilde u_i)\big)^TA_i^{-1/2}M_\kappa A_i^{-1/2}\big(\epsilon_i - (w_i,\tilde u_i)\theta_0\big) + \hat D_i^{(\kappa)}\theta_0\Big] + o_p(n^{-1/2})\\
&= \frac1n\sum_{i=1}^n\Big[(X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}\epsilon_i - (X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(w_i,\tilde u_i)\theta_0\\
&\qquad + (w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}\epsilon_i - \big((w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(w_i,\tilde u_i) - \hat D_i^{(\kappa)}\big)\theta_0\Big] + o_p(n^{-1/2})\\
&= \frac1n\sum_{i=1}^n(\psi_{i\kappa1} + \psi_{i\kappa2} + \psi_{i\kappa3} + \psi_{i\kappa4}) + o_p(n^{-1/2}) = \frac1n\sum_{i=1}^n\psi_{i\kappa} + o_p(n^{-1/2}),
\end{aligned}$$

where $\psi_i = (\psi_{i1}, \psi_{i2}, \ldots, \psi_{is})^T$ and $\psi_{i\kappa} = \psi_{i\kappa1} + \psi_{i\kappa2} + \psi_{i\kappa3} + \psi_{i\kappa4}$. So we have

$$\hat{\bar g}_n(\theta_0) = \frac1n\sum_{i=1}^n\psi_i + o_p(n^{-1/2}), \qquad \Omega_n(\theta_0) = \frac1n\sum_{i=1}^n\psi_i\psi_i^T + o_p(1).$$

From C5-C7, we get $E(\psi_{i\kappa m}) = 0$ and $\|\operatorname{cov}(\psi_{i\kappa m})\| < \infty$, $m = 1, 2, 3, 4$. By the properties of covariance matrices, we have

$$\|\operatorname{cov}(\psi_{i\kappa})\| \le \sum_{m=1}^4\|\operatorname{cov}(\psi_{i\kappa m})\| + \sum_{m\neq l}\|\operatorname{cov}(\psi_{i\kappa m}, \psi_{i\kappa l})\| < \infty.$$

For any $a \in \mathbb{R}^{s(p+q(K+d))}$ with $a^Ta = 1$, we have $E(a^T\psi_i) = 0$ and $\sup_iE|a^T\psi_i|^3 \le \sup_iE\|\psi_i\|^3 < \infty$.

So $a^T\psi_i$ satisfies the Lyapunov condition for the central limit theorem. Thus

$$\Big(a^T\sum_{i=1}^n\operatorname{cov}(\psi_i)\,a\Big)^{-1/2}\sum_{i=1}^na^T\psi_i \xrightarrow{L} N(0, 1).$$

By the Slutsky theorem, $\sqrt n\,\hat{\bar g}_n(\theta_0) \xrightarrow{L} N(0, \Omega_0)$ and $\hat{\bar g}_n(\theta_0) = O_p(n^{-1/2})$. The proof of Lemma 1 is complete.

Lemma 2

If C1-C11 hold and $K = O(n^{1/(2r+1)})$, then

$$n^{-1}\dot Q_n(\theta_0) - 2\hat{\bar g}'^T_n(\theta_0)\Omega_n^{-1}\hat{\bar g}_n(\theta_0) = O_p(n^{-1}), \qquad (A2)$$
$$n^{-1}\ddot Q_n(\theta_0) - 2\hat{\bar g}'^T_n(\theta_0)\Omega_n^{-1}\hat{\bar g}'_n(\theta_0) = o_p(1). \qquad (A3)$$

Proof.

The proof of Lemma 2 is similar to that of Lemma 2 in Tian et al. [20] and is omitted here.

Proof of Theorem 3.1.

Proof.

Let $\delta = n^{-r/(2r+1)}$, $\beta = \beta_0 + \delta C_1$, $\gamma = \gamma_0 + \delta C_2$ and $C = (C_1^T, C_2^T)^T$. To prove Theorem 3.1, it suffices to show that for any $\epsilon > 0$ there exists a large constant $C_0$ such that

$$P\Big\{\inf_{\|C\| = C_0}Q_p(\theta) \ge Q_p(\theta_0)\Big\} \ge 1 - \epsilon. \qquad (A4)$$

Obviously, when $\epsilon \ge 1$ we have $1 - \epsilon \le 0$ and (A4) holds trivially, so we consider $\epsilon \in (0, 1)$. Assume $\beta_k = 0\ (k = p_1+1, \ldots, p)$, $\alpha_l(\cdot) \equiv 0\ (l = q_1+1, \ldots, q)$ and recall $p_\lambda(0) = 0$. Let $\Delta(\beta, \gamma) = \frac1K[Q_p(\theta) - Q_p(\theta_0)]$ with $\theta_0 = (\beta_0^T, \gamma_0^T)^T$; then

$$\Delta(\beta,\gamma) = \frac1K[Q_n(\theta) - Q_n(\theta_0)] + \frac nK\sum_{k=1}^{p_1}\big[p_{\lambda_{1k}}(|\beta_k|) - p_{\lambda_{1k}}(|\beta_{k0}|)\big] + \frac nK\sum_{l=1}^{q_1}\big[p_{\lambda_{2l}}(\|\gamma_l\|_H) - p_{\lambda_{2l}}(\|\gamma_{l0}\|_H)\big] = \Delta_1 + \Delta_2 + \Delta_3.$$

Applying a Taylor expansion to $Q_n(\theta)$ at $\theta_0$, we have

$$Q_n(\theta) = Q_n(\theta_0 + \delta C) = Q_n(\theta_0) + \delta C^T\dot Q_n(\theta_0) + \frac12\delta^2C^T\ddot Q_n(\tilde\theta)C,$$

where $\tilde\theta$ lies between $\theta$ and $\theta_0$. According to Lemmas 1 and 2, we can get

$$\delta C^T\dot Q_n(\theta_0) = \delta C^T\big\{2n\hat{\bar g}'^T_n(\theta_0)\Omega_n^{-1}\hat{\bar g}_n(\theta_0) + nO_p(n^{-1})\big\} = \|C\|O_p(\sqrt n\,\delta) + \|C\|O_p(\delta),$$

and

$$\delta^2C^T\ddot Q_n(\theta_0)C = \delta^2C^T\big\{2n\hat{\bar g}'^T_n(\theta_0)\Omega_n^{-1}\hat{\bar g}'_n(\theta_0) + n\,o_p(1)\big\}C = 2n\delta^2C^T\hat{\bar g}'^T_n(\theta_0)\Omega_n^{-1}\hat{\bar g}'_n(\theta_0)C + n\delta^2\|C\|^2o_p(1).$$

Therefore, we have

$$\Delta_1 = \frac1K\big\{n\delta^2C^TJ_0^T\Omega_0^{-1}J_0C + \|C\|O_p(\sqrt n\,\delta) + \|C\|O_p(\delta) + n\delta^2\|C\|^2o_p(1)\big\}.$$

Obviously, $n\delta^2C^TJ_0^T\Omega_0^{-1}J_0C \ge 0$, and when $\|C\|$ is large enough,

$$n\delta^2C^TJ_0^T\Omega_0^{-1}J_0C \gg \|C\|O_p(\sqrt n\,\delta), \qquad n\delta^2C^TJ_0^T\Omega_0^{-1}J_0C \gg n\delta^2\|C\|^2o_p(1).$$

So when $\|C\|$ is large enough, $\Delta_1 > 0$. Next, by Taylor expansion, we get

$$\Delta_2 = \frac nK\sum_{k=1}^{p_1}\big[p_{\lambda_{1k}}(|\beta_k|) - p_{\lambda_{1k}}(|\beta_{k0}|)\big] = \frac1K\sum_{k=1}^{p_1}\big[n\delta\,p'_{\lambda_{1k}}(|\beta_{k0}|)\operatorname{sgn}(\beta_{k0})|C_1| + n\delta^2p''_{\lambda_{1k}}(|\beta_{k0}|)|C_1|^2(1 + o(1))\big] \le \frac1K\big\{\sqrt{p_1}\,n\delta\,a_n\|C\| + n\delta^2a_n\|C\|^2\big\}.$$

Then $\Delta_2$ is dominated by $\Delta_1$ uniformly in $\|C\| = C_0$ for a sufficiently large $C_0$.

Assume $\lambda_{1k} \to 0$, $\lambda_{2l} \to 0$ and $K = O(n^{1/(2r+1)})$. When $n$ is large enough, following Xue et al. [29], we have $\|\gamma_l\|_H \ge a\lambda_{2l}$ and $\|\gamma_{l0}\|_H \ge a\lambda_{2l}$. According to the definition of the penalty function, we get

$$p_{\lambda_{2l}}(\|\gamma_l\|_H) = p_{\lambda_{2l}}(\|\gamma_{l0}\|_H) = \frac{(1+a)\lambda_{2l}^2}{2}, \qquad \sum_{l=1}^{q_1}n\big[p_{\lambda_{2l}}(\|\gamma_l\|_H) - p_{\lambda_{2l}}(\|\gamma_{l0}\|_H)\big] = 0.$$

So for any $\epsilon > 0$ there exists a large enough $C_0$ satisfying (A4), which implies that there exists $\hat\theta$ with $\|\hat\theta - \theta_0\| = O_p(\delta) = O_p(n^{-r/(2r+1)})$. Note that

$$\begin{aligned}
\|\hat\alpha_l(t) - \alpha_l(t)\|^2 &= \int_0^1\big\{B^T(t)\hat\gamma_l - B^T(t)\gamma_{l0} + B^T(t)\gamma_{l0} - \alpha_l(t)\big\}^2dt\\
&\le 2\int_0^1\big\{B^T(t)\hat\gamma_l - B^T(t)\gamma_{l0}\big\}^2dt + 2\int_0^1\big\{\alpha_l(t) - B^T(t)\gamma_{l0}\big\}^2dt\\
&= 2(\hat\gamma_l - \gamma_{l0})^T\int_0^1B(t)B^T(t)\,dt\,(\hat\gamma_l - \gamma_{l0}) + 2\int_0^1R_l(t)^2dt\\
&= 2(\hat\gamma_l - \gamma_{l0})^TH(\hat\gamma_l - \gamma_{l0}) + 2\int_0^1R_l(t)^2dt.
\end{aligned}$$

By the same arguments as above, $\|\hat\gamma - \gamma_0\| = O_p(n^{-r/(2r+1)})$. Therefore, since $\|H\| = O(1)$, we have $(\hat\gamma_l - \gamma_{l0})^TH(\hat\gamma_l - \gamma_{l0}) = O_p(n^{-2r/(2r+1)})$.

Suppose C2 and C8 hold and $K = O(N^{1/(2r+1)})$. By Corollary 6.21 in [18], there exists a constant $c_0$ such that

$$\sup_{t\in[0,1]}\big|\alpha_l(t) - B^T(t)\gamma_{l0}\big| \le c_0K^{-r}, \qquad l = 1, 2, \ldots, q. \qquad (A5)$$

So $\int_0^1R_l(t)^2dt = O_p(n^{-2r/(2r+1)})$, and the proof of Theorem 3.1 is complete.

Proof of Theorem 3.2.

Proof.

Part (i). Write $Q_p(\theta) = Q_p(\beta, \gamma)$. By Theorem 3.1, similar to [20], it suffices to show that for any $\gamma$ satisfying $\|\gamma - \gamma_0\| = O_p(n^{-r/(2r+1)})$, any $\beta_k$ satisfying $|\beta_k - \beta_{k0}| = O_p(n^{-r/(2r+1)})\ (k = 1, 2, \ldots, p_1)$, and some small $\epsilon_n = O_p(n^{-r/(2r+1)})$, with probability tending to one as $n \to \infty$ we have

$$\frac{\partial Q_p(\beta,\gamma)}{\partial\beta_k} > 0 \quad\text{for } 0 < \beta_k < \epsilon_n,\ k = p_1+1, \ldots, p, \qquad (A6)$$

and

$$\frac{\partial Q_p(\beta,\gamma)}{\partial\beta_k} < 0 \quad\text{for } -\epsilon_n < \beta_k < 0,\ k = p_1+1, \ldots, p. \qquad (A7)$$

Obviously, (A6) and (A7) imply that the minimizer of $Q_p(\beta, \gamma)$ over $\beta_k$ is attained at $\hat\beta_k = 0\ (k = p_1+1, \ldots, p)$.

According to Lemma 2, we have

$$\frac{\partial Q_p(\beta,\gamma)}{\partial\beta_k} = 2n\frac{\partial\hat{\bar g}_n^T(\beta,\gamma)}{\partial\beta_k}\Omega_n^{-1}\hat{\bar g}_n(\beta,\gamma) + o_p(1) + np'_{\lambda_{1k}}(|\beta_k|)\operatorname{sgn}(\beta_k) = n\lambda_{1k}\Big\{2\lambda_{1k}^{-1}\frac{\partial\hat{\bar g}_n^T(\beta,\gamma)}{\partial\beta_k}\Omega_n^{-1}\hat{\bar g}_n(\beta,\gamma) + \lambda_{1k}^{-1}p'_{\lambda_{1k}}(|\beta_k|)\operatorname{sgn}(\beta_k)\Big\} + o_p(1).$$

Write $\frac{\partial\hat{\bar g}_n^T(\theta)}{\partial\theta} = \Big(\frac{\partial\hat{\bar g}_n^T(\beta,\gamma)}{\partial\beta_1}, \ldots, \frac{\partial\hat{\bar g}_n^T(\beta,\gamma)}{\partial\beta_p}, \frac{\partial\hat{\bar g}_n^T(\beta,\gamma)}{\partial\gamma_1}, \ldots, \frac{\partial\hat{\bar g}_n^T(\beta,\gamma)}{\partial\gamma_q}\Big)$. According to Lemma 1, $\frac{\partial\hat{\bar g}_n^T(\beta,\gamma)}{\partial\beta_k} \xrightarrow{p} J_{\beta_k}$, where $J_0 = (J_{\beta_1}, \ldots, J_{\beta_p}, J_{\gamma_1}, \ldots, J_{\gamma_q})$. Thus we get

$$\frac{\partial Q_p(\beta,\gamma)}{\partial\beta_k} = n\lambda_{1k}\big\{O_p(\lambda_{1k}^{-1}n^{-1/2}) + \lambda_{1k}^{-1}p'_{\lambda_{1k}}(|\beta_k|)\operatorname{sgn}(\beta_k)\big\} + o_p(1).$$

In addition, C11 implies that $\liminf_{n\to\infty}\liminf_{\beta_k\to0^+}\lambda_{1k}^{-1}p'_{\lambda_{1k}}(|\beta_k|) > 0$, and $\lambda_{1k}^{-1}n^{-1/2} \to 0$, so the sign of $\partial Q_p(\beta,\gamma)/\partial\beta_k$ is the same as that of $\beta_k$. Hence (A6) and (A7) hold, and the proof of part (i) is complete.

We then prove part (ii). Denote

$$\Theta_1 = \big\{\theta : \theta = (\beta^T, \gamma^T)^T,\ \gamma_l = 0,\ l = q_1+1, \ldots, q\big\},$$
$$\Theta_l = \big\{\theta : \theta = (\beta^T, 0^T, \ldots, 0^T, \gamma_l^T, 0^T, \ldots, 0^T)^T\big\}, \qquad l = q_1+1, \ldots, q,$$

where $0$ is the $(K+d)\times1$ vector with all components zero.

To prove part (ii), it suffices to show that for any $\theta \in \Theta_1$ and $\theta_l \in \Theta_l$, $Q_p(\theta + \theta_l) \ge Q_p(\theta)$ holds with probability tending to 1. We have

$$\begin{aligned}
Q_p(\theta + \theta_l) - Q_p(\theta) &= Q_n(\theta + \theta_l) - Q_n(\theta) + np_{\lambda_{2l}}(\|\gamma_l\|_H)\\
&= \theta_l^T\dot Q_n(\theta) + \frac12\theta_l^T\ddot Q_n(\hat\theta_l)\theta_l(1 + o_p(1)) + np_{\lambda_{2l}}(\|\gamma_l\|_H)\\
&= n\lambda_{2l}\Big\{\frac{\|B^T(t)\gamma_l\|R_l}{\lambda_{2l}} + \frac{p'_{\lambda_{2l}}(t^*)}{\lambda_{2l}}\Big\}(1 + o_p(1)),
\end{aligned}$$

where $\hat\theta_l$ lies between $\theta + \theta_l$ and $\theta$, and $t^* \in (0, \|\gamma_l\|_H)$. Furthermore, we get

$$R_l = \frac{\theta_l^Tn^{-1}\dot Q_n(\hat\theta_l) + \frac12\theta_l^Tn^{-1}\ddot Q_n(\hat\theta_l)\theta_l}{\|B^T(t)\gamma_l\|}.$$

Note that $\alpha_{l0} \equiv 0$ for $l = q_1+1, \ldots, q$; from Lemma 1 and [29], we have $\|B^T(t)\gamma_l\| = O(n^{-r/(2r+1)})$ and $\|B^T(t)\gamma_l\|/\lambda_{2l} = O(n^{-r/(2r+1)}/\lambda_{2l})$. According to Lemmas 1 and 2, we have

$$\theta_l^Tn^{-1}\dot Q_n(\hat\theta_l) = O_p(n^{-1/2}) = o_p(1), \qquad \theta_l^Tn^{-1}\ddot Q_n(\hat\theta_l)\theta_l = \theta_l^TJ_0^T\Omega_0^{-1}J_0\theta_l + o_p(1) < +\infty,$$
$$\frac{R_l}{\lambda_{2l}} = \frac{\theta_l^TJ_0^T\Omega_0^{-1}J_0\theta_l\,\|B^T(t)\gamma_l\|}{\lambda_{2l}} + o_p(1) \to 0.$$

From C10 and C11, for $t^*$ between 0 and $\|\gamma_l\|_H$,

$$\liminf_{n\to\infty}\liminf_{\|\gamma_l\|_H\to0^+}\frac{p'_{\lambda_{2l}}(t^*)}{\lambda_{2l}} > 0, \qquad l = q_1+1, \ldots, q.$$

Thus, for any $\theta \in \Theta_1$ and $\theta_l \in \Theta_l$, $Q_p(\theta + \theta_l) \ge Q_p(\theta)$ with probability tending to 1, which establishes part (ii). The proof of Theorem 3.2 is complete.

Proof of Theorem 3.3.

Proof.

Let $\beta_0$ be the true value of $\beta$, let $\alpha(t) = (\alpha_1(t), \alpha_2(t), \ldots, \alpha_{q_1}(t))^T$ with true value $\alpha_0(t)$, and let $\gamma$ and $\gamma_0$ be the spline coefficients of $\alpha(t)$ and $\alpha_0(t)$, respectively. Theorems 3.1 and 3.2 imply that $Q_p(\theta)$ attains its minimum at $(\hat\beta^T, 0^T)^T$ and $(\hat\gamma^T, 0^T)^T$.

Denote $\theta = (\beta^T, 0^T, \gamma^T, 0^T)^T$, $\theta_0 = (\beta_0^T, 0^T, \gamma_0^T, 0^T)^T$ and $\hat\theta = (\hat\beta^T, 0^T, \hat\gamma^T, 0^T)^T$, and write $\hat{\bar g}'_n(\theta) = \frac{\partial}{\partial\theta}\hat{\bar g}_n(\theta) = \big(\frac{\partial}{\partial\beta}\hat{\bar g}_n(\theta), \frac{\partial}{\partial\gamma}\hat{\bar g}_n(\theta)\big) = \big(\hat{\bar g}'_\beta(\theta), \hat{\bar g}'_\gamma(\theta)\big)$. We have

$$S_n(\theta) = \begin{pmatrix}\hat{\bar g}'^T_\beta(\theta)\Omega_n^{-1}\hat{\bar g}_n(\theta)\\ \hat{\bar g}'^T_\gamma(\theta)\Omega_n^{-1}\hat{\bar g}_n(\theta)\end{pmatrix}, \qquad H_n(\theta) = \begin{pmatrix}H_{11} & H_{12}\\ H_{21} & H_{22}\end{pmatrix} = \begin{pmatrix}\hat{\bar g}'^T_\beta(\theta)\Omega_n^{-1}\hat{\bar g}'_\beta(\theta) & \hat{\bar g}'^T_\beta(\theta)\Omega_n^{-1}\hat{\bar g}'_\gamma(\theta)\\ \hat{\bar g}'^T_\gamma(\theta)\Omega_n^{-1}\hat{\bar g}'_\beta(\theta) & \hat{\bar g}'^T_\gamma(\theta)\Omega_n^{-1}\hat{\bar g}'_\gamma(\theta)\end{pmatrix}.$$

Denote $p_\lambda(\theta) = \sum_{k=1}^pp_{\lambda_{1k}}(|\beta_k|) + \sum_{l=1}^qp_{\lambda_{2l}}(\|\gamma_l\|_H)$. According to (22), we have

$$\dot Q_p(\hat\theta) = \dot Q_n(\hat\theta) + n\dot p_\lambda(\hat\theta)\hat\theta = 0. \qquad (A8)$$

Applying a Taylor expansion to (A8), we have

$$\dot Q_n(\hat\theta) + n\dot p_\lambda(\hat\theta)\hat\theta = \dot Q_n(\theta_0) + n\dot p_\lambda(\theta_0)\theta_0 + \big\{\ddot Q_n(\theta_0) + n\ddot p_\lambda(\tilde\theta_0)\big\}(\hat\theta - \theta_0) = 0, \qquad (A9)$$

where $\tilde\theta_0$ lies between $\theta_0$ and $\hat\theta$. Therefore, we have

$$n^{-1}\big[\dot Q_n(\theta_0) + n\dot p_\lambda(\theta_0)\theta_0\big] = -n^{-1}\big\{\ddot Q_n(\theta_0) + n\ddot p_\lambda(\tilde\theta_0)\big\}(\hat\theta - \theta_0). \qquad (A10)$$

Note that $\dot p_{\lambda_1}(\beta) = \sum_{k=1}^{p_1}p'_{\lambda_{1k}}(|\hat\beta_k|)\operatorname{sgn}(\hat\beta_k)$, $\dot p_{\lambda_2}(\gamma) = \sum_{l=1}^{q_1}p'_{\lambda_{2l}}(\|\hat\gamma_l\|_H)\frac{H\hat\gamma_l}{\|\hat\gamma_l\|_H}$ and $\dot p_\lambda(\theta) = \dot p_{\lambda_1}(\beta) + \dot p_{\lambda_2}(\gamma)$. Applying a Taylor expansion to $p'_{\lambda_{1k}}(|\hat\beta_k|)$, we have

$$p'_{\lambda_{1k}}(|\hat\beta_k|) = p'_{\lambda_{1k}}(|\beta_{0k}|) + \big\{p''_{\lambda_{1k}}(|\beta_{0k}|) + o_p(1)\big\}(\hat\beta_k - \beta_{0k}).$$

C10 implies that $p''_{\lambda_{1k}}(|\beta_{0k}|) = o_p(1)$, and note that $p'_{\lambda_{1k}}(|\beta_{0k}|) = 0$ as $\lambda_{\max} \to 0$, so $\dot p_{\lambda_1}(\beta) = o_p(\hat\beta - \beta_0)$. Following [29], $\|\hat\gamma_l\|_H \ge a\lambda_{2l}$ for $n$ large enough, so $p'_{\lambda_{2l}}(\|\hat\gamma_l\|_H) = 0$ and $p''_{\lambda_{2l}}(\|\hat\gamma_l\|_H) = 0$, which imply $\dot p_{\lambda_2}(\gamma) = 0 = o_p(\hat\gamma - \gamma_0)$ and $\dot p_\lambda(\theta) = o_p(1)$. So we have

$$\sqrt nH_n(\hat\theta - \theta_0) = -\sqrt nS_n + o_p(1),$$
$$\sqrt n(\hat\beta - \beta_0) = -\big\{H_{11}(\theta_0) - H_{12}(\theta_0)H_{22}^{-1}(\theta_0)H_{21}(\theta_0)\big\}^{-1}\big(I, -H_{12}(\theta_0)H_{22}^{-1}(\theta_0)\big)\sqrt nS_n(\theta_0) + o_p(1).$$

From Lemma 2, we can get $n^{-1}\dot Q_n(\theta_0) - 2S_n(\theta_0) = O_p(n^{-1})$ and $n^{-1}\ddot Q_n(\theta_0) - 2H_n(\theta_0) = o_p(1)$. According to C9 and Lemma 1, we have

$$\frac{\partial\hat{\bar g}_n(\theta)}{\partial\theta} \xrightarrow{P} J_{0\theta}, \qquad \frac{\partial\hat{\bar g}_n(\theta)}{\partial\beta} \xrightarrow{P} J_{0\beta}, \qquad \frac{\partial\hat{\bar g}_n(\theta)}{\partial\gamma} \xrightarrow{P} J_{0\gamma},$$

and

$$\hat{\bar g}_n(\theta_0) = O_p(n^{-1/2}), \qquad \sqrt nS_n(\theta_0) = J_{0\theta}^T\Omega_0^{-1}\sqrt n\,\hat{\bar g}_n(\theta_0) + o_p(1) = O_p(1),$$
$$H_{11}(\theta_0) \xrightarrow{p} J_{0\beta}^T\Omega_0^{-1}J_{0\beta} = H_{110}, \qquad H_{22}(\theta_0) \xrightarrow{p} J_{0\gamma}^T\Omega_0^{-1}J_{0\gamma} = H_{220},$$
$$H_{12}(\theta_0) \xrightarrow{p} J_{0\beta}^T\Omega_0^{-1}J_{0\gamma} = H_{120}, \qquad H_{21}(\theta_0) \xrightarrow{p} J_{0\gamma}^T\Omega_0^{-1}J_{0\beta} = H_{210},$$

where $J_{0\theta} = (J_{0\beta}, J_{0\gamma})$.

Denote $A = \big\{H_{11}(\theta_0) - H_{12}(\theta_0)H_{22}^{-1}(\theta_0)H_{21}(\theta_0)\big\}^{-1}\big(I, -H_{12}(\theta_0)H_{22}^{-1}(\theta_0)\big)$. Hence we get

$$A \xrightarrow{p} \big\{H_{110} - H_{120}H_{220}^{-1}H_{210}\big\}^{-1}\big(I, -H_{120}H_{220}^{-1}\big) = A_0, \qquad (A11)$$
$$\sqrt nS_n(\theta_0) \xrightarrow{L} N\big(0, (J_{\theta_0}\Omega_0^{-1}J_{\theta_0}^T)^{-1}\big).$$

According to the Slutsky theorem, $\hat\beta$ is consistent and asymptotically normal:

$$\sqrt n(\hat\beta - \beta_0) \xrightarrow{L} N\big(0, A_0(J_{\theta_0}\Omega_0^{-1}J_{\theta_0}^T)^{-1}A_0^T\big). \qquad (A12)$$

This completes the proof of Theorem 3.3.


Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1. Carroll R.J., Ruppert D., Stefanski L.A. and Crainiceanu C.M., Measurement Error in Nonlinear Models: A Modern Perspective, Chapman and Hall/CRC, New York, 2006.
  • 2. Fan G.L., Xu H.X. and Huang Z.S., Empirical likelihood for semivarying coefficient model with measurement error in the nonparametric part, AStA Adv. Stat. Anal. 100 (2015), pp. 21–41.
  • 3. Fan G.L., Xu H.X. and Liang H.Y., Empirical likelihood inference for partially time-varying coefficient errors-in-variables models, Electron. J. Stat. 6 (2012), pp. 1040–1058.
  • 4. Fan J. and Huang T., Profile likelihood inferences on semiparametric varying-coefficient partially linear models, Bernoulli 11 (2005), pp. 1031–1057.
  • 5. Fan J. and Li R., Variable selection via nonconcave penalized likelihood and its oracle properties, J. Am. Statist. Assoc. 96 (2001), pp. 1348–1360.
  • 6. Feng S. and Xue L., Bias-corrected statistical inference for partially linear varying coefficient errors-in-variables models with restricted condition, Ann. Inst. Statist. Math. 66 (2014), pp. 121–140.
  • 7. Hastie T. and Tibshirani R., Varying coefficient models, J. R. Stat. Soc. Ser. B (Stat. Methodol.) 55 (1993), pp. 757–779.
  • 8. He X., Zhu Z.Y. and Fung W.K., Estimation in a semiparametric model for longitudinal data with unspecified dependence structure, Biometrika 89 (2002), pp. 579–590.
  • 9. Hu X., Wang Z. and Zhao Z., Empirical likelihood for semiparametric varying coefficient partially linear errors-in-variables models, Statist. Probab. Lett. 79 (2009), pp. 1044–1052.
  • 10. Huang Z. and Zhang R., Empirical likelihood for nonparametric parts in semiparametric varying coefficient partially linear models, Statist. Probab. Lett. 79 (2009), pp. 1798–1808.
  • 11. Kaslow R.A., Ostrow D.G., Detels R., Phair J.P., Polk B.F. and Rinaldo C.J., The multicenter AIDS cohort study: rationale, organization and selected characteristics of the participants, Am. J. Epidemiol. 126 (1987), pp. 310–318.
  • 12. Li Q., Huang C.J., Li D. and Fu T-T., Semiparametric smooth coefficient models, J. Bus. Econom. Statist. 20 (2002), pp. 412–422.
  • 13. Li R. and Liang H., Variable selection in semiparametric regression modeling, Ann. Stat. 36 (2008), pp. 261.
  • 14. Liang K.Y. and Zeger S.L., Longitudinal data analysis using generalized linear models, Biometrika 73 (1986), pp. 13–22.
  • 15. Park B.U., Mammen E., Lee Y.K. and Lee E.R., Varying coefficient regression models: a review and new developments, Int. Stat. Rev. 83 (2015), pp. 36–64.
  • 16. Qu A. and Li R., Quadratic inference functions for varying coefficient models with longitudinal data, Biometrics 62 (2006), pp. 379–391.
  • 17. Qu A., Lindsay B.G. and Li B., Improving generalised estimating equations using quadratic inference functions, Biometrika 87 (2000), pp. 823–836.
  • 18. Schumaker L., Spline Functions: Basic Theory, Cambridge University Press, New York, 2007.
  • 19. Tian R. and Xue L., Variable selection for semiparametric errors-in-variables regression model with longitudinal data, J. Stat. Comput. Simul. 19 (2013), pp. 1–16.
  • 20. Tian R., Xue L. and Liu C., Penalized quadratic inference functions for semiparametric varying coefficient partially linear models with longitudinal data, J. Multivariate Anal. 132 (2014), pp. 94–110.
  • 21. Wang H., Li R. and Tsai C-L., Tuning parameter selectors for the smoothly clipped absolute deviation method, Biometrika 94 (2007), pp. 553–568.
  • 22. Wang H., Zou G. and Wan A.T., Model averaging for varying-coefficient partially linear measurement error models, Electron. J. Stat. 6 (2012), pp. 1017–1039.
  • 23. Wang H.J., Zhu Z. and Zhou J., Quantile regression in partially linear varying coefficient models, Ann. Statist. 37 (2009), pp. 3841–3866.
  • 24. Wang L., Li H. and Huang J.Z., Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements, J. Am. Statist. Assoc. 103 (2008), pp. 1556–1569.
  • 25. Wang X., Li G. and Lin L., Empirical likelihood inference for semi-parametric varying-coefficient partially linear EV models, Metrika 73 (2011), pp. 171–185.
  • 26. Wang Z. and Xue L., Variable selection for high dimensional partially linear varying coefficient errors-in-variables models, Hacet. J. Math. Stat. 48 (2019), pp. 213–229.
  • 27. Wei C., Statistical inference for restricted partially linear varying coefficient errors-in-variables models, J. Statist. Plann. Inference 142 (2012), pp. 2464–2472.
  • 28. Xia Y. and Da H., Block empirical likelihood for semiparametric varying-coefficient partially linear errors-in-variables models with longitudinal data, J. Probab. Stat. 168 (2013), pp. 175–186.
  • 29. Xue L., Qu A. and Zhou J., Consistent model selection for marginal generalized additive model for correlated data, J. Am. Stat. Assoc. 105 (2010), pp. 1518–1530.
  • 30. You J. and Zhou Y., Empirical likelihood for semiparametric varying-coefficient partially linear regression models, Statist. Probab. Lett. 76 (2006), pp. 412–422.
  • 31. Zhang J., Feng Z., Xu P. and Liang H., Generalized varying coefficient partially linear measurement errors models, Ann. Inst. Statist. Math. 69 (2017), pp. 97–120.
  • 32. Zhang W., Lee S.Y. and Song X., Local polynomial fitting in semivarying coefficient model, J. Multivariate Anal. 82 (2002), pp. 166–188.
  • 33. Zhang W., Li G. and Xue L., Profile inference on partially linear varying-coefficient errors-in-variables models under restricted condition, Comput. Statist. Data Anal. 55 (2011), pp. 3027–3040.
  • 34. Zhao M., Gao Y. and Cui Y., Variable selection for longitudinal varying coefficient errors-in-variables models, Comm. Statist. Theory Methods 19 (2020), pp. 1–26.
  • 35. Zhao P. and Xue L., Empirical likelihood inferences for semiparametric varying coefficient partially linear errors-in-variables models with longitudinal data, J. Nonparametr. Stat. 21 (2009), pp. 907–923.
  • 36. Zhao P. and Xue L., Variable selection for semiparametric varying coefficient partially linear errors-in-variables models, J. Multivariate Anal. 101 (2010), pp. 1872–1883.
  • 37. Zhou X. and Liang H., Statistical inference for semiparametric varying coefficient partially linear models with error-prone linear covariates, Ann. Statist. 37 (2009), pp. 427–458.
  • 38. Zhou X., Zhao P. and Lin L., Empirical likelihood for parameters in an additive partially linear errors-in-variables model with longitudinal data, J. Korean Stat. Soc. 43 (2014), pp. 91–103.
