Journal of Applied Statistics. 2021 Mar 23;50(3):512–534. doi: 10.1080/02664763.2021.1904847

Model estimation and selection for partial linear varying coefficient EV models with longitudinal data

Mingtao Zhao a, Xiaoli Xu b, Yanling Zhu a, Kongsheng Zhang a, Yan Zhou c

Abstract

In this paper, we consider estimation and model selection for longitudinal partial linear varying coefficient errors-in-variables (EV) models in which the covariates are measured with additive errors. A bias-corrected penalized quadratic inference functions method is proposed, based on quadratic inference functions with two penalty terms. The proposed method not only handles the measurement errors of the covariates and the within-subject correlations, but also estimates and selects the significant non-zero parametric and nonparametric components simultaneously. Under some regularity conditions, the resulting estimators of the parameters are asymptotically normal and the estimators of the nonparametric varying coefficients achieve the optimal convergence rate. Furthermore, we present simulation studies and a real data analysis to evaluate the finite sample performance of the proposed method.

Keywords: Longitudinal data, variable selection, partial linear varying coefficient EV models, quadratic inference function

1. Introduction

Varying coefficient models [7] have better interpretability and flexibility than linear models and avoid the curse of dimensionality. They are widely applied to the analysis of longitudinal and clustered data. In a varying coefficient model, the regression coefficients are unknown nonparametric functions allowed to depend on time or other covariates, which facilitates the study of dynamic features. A survey of work on varying coefficient models can be found in [15].

In many applications, however, not all of the coefficients vary. We therefore consider the partial linear varying coefficient model [12] for longitudinal data. Suppose the longitudinal data

$$\{(Y_{ij}, X_{ij}, Z_{ij}, t_{ij}) : i = 1, 2, \ldots, n,\ j = 1, 2, \ldots, n_i\}$$

satisfy the partial linear varying coefficient model

$$Y_{ij} = X_{ij}^T\beta + Z_{ij}^T\alpha(t_{ij}) + \epsilon_{ij}, \qquad (1)$$

where $Y_{ij} \in \mathbb{R}$ and $(X_{ij}, Z_{ij}) \in \mathbb{R}^p \times \mathbb{R}^q$ are the response and covariates observed at time $t_{ij} \in [0, 1]$, $X_{ij} = (X_{ij1}, X_{ij2}, \ldots, X_{ijp})^T$, $Z_{ij} = (Z_{ij1}, Z_{ij2}, \ldots, Z_{ijq})^T$, and $\epsilon_{ij} \in \mathbb{R}$ is a zero-mean stochastic process independent of $(X_{ij}, Z_{ij})$. Here $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ is a regression parameter vector and $\alpha(t) = (\alpha_1(t), \alpha_2(t), \ldots, \alpha_q(t))^T$ is a vector of coefficient functions, each $\alpha_l(t)\ (l = 1, 2, \ldots, q)$ being an unknown smooth function of $t \in [0, 1]$. We further assume that the first two moments of model (1) satisfy $E(Y_{ij} \mid X_{ij}, Z_{ij}, t_{ij}) = \mu_{ij}$ and $\operatorname{var}(Y_{ij} \mid X_{ij}, Z_{ij}, t_{ij}) = \nu(\mu_{ij})$, where $E(\cdot)$ and $\operatorname{var}(\cdot)$ denote expectation and variance, respectively, and $\nu(\cdot)$ is a known function.

Model (1) combines the advantages of the linear model and the varying coefficient model: it reduces modeling bias and avoids the curse of dimensionality. It has recently been studied by various statistical methods, such as the local polynomial fitting method [32], the profile least squares method [4], the empirical likelihood method [10,30], the quantile regression method [23], and the penalized quadratic inference function (pQIF) method [20]. An important assumption in these methods is that the covariates are observed exactly.

In practice, however, covariates often cannot be measured exactly, especially important ones: whatever the data collection scheme, measurement errors are unavoidable, or some covariates are simply unobserved. Ignoring measurement errors may result in biased estimators or even incorrect conclusions. It is therefore meaningful to incorporate measurement errors into model (1). In view of this, we consider the case where $X$ and $Z$ in model (1) are measured with additive errors, which gives the partial linear varying coefficient errors-in-variables (EV) model

$$Y_{ij} = X_{ij}^T\beta + Z_{ij}^T\alpha(t_{ij}) + \epsilon_{ij}, \quad W_{ij} = X_{ij} + w_{ij}, \quad U_{ij} = Z_{ij} + u_{ij}, \qquad i = 1, 2, \ldots, n,\ j = 1, 2, \ldots, n_i, \qquad (2)$$

where $W_{ij} = (W_{ij1}, W_{ij2}, \ldots, W_{ijp})^T \in \mathbb{R}^p$ and $U_{ij} = (U_{ij1}, U_{ij2}, \ldots, U_{ijq})^T \in \mathbb{R}^q$ are directly observed, and $w_{ij} = (w_{ij1}, w_{ij2}, \ldots, w_{ijp})^T \in \mathbb{R}^p$ and $u_{ij} = (u_{ij1}, u_{ij2}, \ldots, u_{ijq})^T \in \mathbb{R}^q$ are zero-mean measurement errors with diagonal covariance matrices $\Sigma_w$ and $\Sigma_u$, respectively. In addition, we assume that $\operatorname{cov}(w_{ij_1}, w_{ij_2}) = 0$ and $\operatorname{cov}(u_{ij_1}, u_{ij_2}) = 0$ for $j_1 \neq j_2$, and that $w_{ij}$ and $u_{ij}$ are independent of each other and of $(X_{ij}, Z_{ij}, t_{ij}, \epsilon_{ij})$, where $\operatorname{cov}(\cdot)$ denotes the covariance operator. Although these assumptions are not the weakest possible, extra information about $\Sigma_w$ and $\Sigma_u$ is needed in practice to deal with measurement errors; for example, one usually assumes that $\Sigma_w$ and $\Sigma_u$ are known or can be estimated.

Model (2) has been studied extensively in the literature. For the case where only $X_{ij}$ is measured with additive error, You and Zhou [30] proposed two different estimators of the parametric and nonparametric components for cross-sectional data. Empirical likelihood inference can be found in Hu et al. [9], Zhao and Xue [35], Xia and Da [28], Zhou et al. [38], Fan et al. [3] and Wang et al. [25]. The case where some linear covariates are unobserved but ancillary variables are available was studied by Zhou and Liang [37]. A variable selection procedure for the high-dimensional situation was studied by Wang and Xue [26], and estimation and testing problems were treated by Zhang et al. [33]. Wang et al. [22] studied model averaging for model (2), and Wei [27] proposed a restricted modified profile least squares estimator of the parametric components.

For models where only $Z_{ij}$ is unobserved and measured with additive error, empirical likelihood inference and local bias-corrected restricted profile least squares estimators can be used for model estimation [2,6]. For the generalized partial linear varying coefficient model with error-prone linear covariates and available ancillary variables, Zhang et al. [31] proposed a variable selection method. Zhao and Xue [36] proposed a variable selection method for the case where $X_{ij}$ and $Z_{ij}$ are both measured with errors, based on cross-sectional data.

On the other hand, model selection is an important topic in longitudinal data analysis; see, for example, Tian and Xue [19] and Zhao et al. [34]. As far as we know, no study has been reported on model selection for model (2) with longitudinal data when $X_{ij}$ and $Z_{ij}$ are measured with additive errors simultaneously. Motivated by this issue, and inspired by [19,34], we study variable selection for model (2). In view of the advantages of quadratic inference functions (QIF) [16] over generalized estimating equations (GEE) [14], we propose a bias-corrected penalized quadratic inference functions (pQIF) method for model (2), which can estimate and select the non-zero regression parameters and coefficient functions simultaneously. Furthermore, we establish the asymptotic properties of the resulting estimators.

The rest of this paper is organized as follows. In Section 2, we propose the bias-corrected pQIF method. In Section 3, we study the asymptotic properties of the model estimation and selection results. Some issues in practical implementation are discussed in Section 4. Simulation studies and a real data analysis are presented in Section 5. Section 6 gives a brief conclusion and discussion. The proofs of the asymptotic results are provided in the Appendix.

2. Model estimation and selection method

Let $B(t) = (B_1(t), B_2(t), \ldots, B_L(t))^T$ denote the B-spline basis vector of order $d$, where $L = K + d$ and $K\ (> 0)$ is the number of interior knots. Following He et al. [8], $\alpha_l(t)\ (l = 1, 2, \ldots, q)$ can be approximated as

$$\alpha_l(t) \approx B(t)^T\gamma_l, \qquad l = 1, 2, \ldots, q, \qquad (3)$$

where $\gamma_l \in \mathbb{R}^{K+d}$ is a vector of B-spline regression coefficients.
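The basis in (3) is easy to construct with standard software. The sketch below is ours, not from the paper; it builds the $L = K + d$ basis functions with scipy and fits one coefficient function by least squares, with the knot count, order and test function being illustrative assumptions.

```python
# Sketch: B-spline basis of order d with K interior knots on [0, 1],
# and a least squares fit alpha(t) ~ B(t)^T gamma as in (3).
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(t, K=5, d=4):
    """Return the (len(t), K + d) design matrix of B-spline basis functions."""
    interior = np.linspace(0, 1, K + 2)[1:-1]          # K interior knots
    knots = np.r_[np.zeros(d), interior, np.ones(d)]   # clamped boundary knots
    L = K + d                                          # number of basis functions
    return BSpline(knots, np.eye(L), d - 1)(t)         # column l is B_l(t)

t = np.linspace(0, 1, 200)
B = bspline_basis(t)
alpha = np.sin(2 * np.pi * t)                          # a test coefficient function
gamma = np.linalg.lstsq(B, alpha, rcond=None)[0]       # spline coefficients gamma_l
approx_error = np.max(np.abs(B @ gamma - alpha))       # small for smooth alpha
```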

Replacing $\alpha_l(t)\ (l = 1, 2, \ldots, q)$ in model (2) by the approximation (3), we can rewrite model (2) as

$$Y_{ij} \approx X_{ij}^T\beta + \tilde Z_{ij}^T\gamma + \epsilon_{ij}, \quad W_{ij} = X_{ij} + w_{ij}, \quad \tilde U_{ij} = \tilde Z_{ij} + \tilde u_{ij}, \qquad (4)$$

where $\gamma = (\gamma_1^T, \gamma_2^T, \ldots, \gamma_q^T)^T$, $B_{ij} = I_q \otimes B(t_{ij})$, $\tilde Z_{ij} = B_{ij}Z_{ij}$, $\tilde U_{ij} = B_{ij}U_{ij}$, $\tilde u_{ij} = B_{ij}u_{ij}$, and $I_q$ is the $q \times q$ identity matrix. From the assumptions of model (2), $w_{ij}$ and $\tilde u_{ij}$ are independent of each other and of $(X_{ij}, Z_{ij}, t_{ij}, \epsilon_{ij})$, with $E(\tilde u_{ij}) = 0$, $\operatorname{cov}(\tilde u_{ij}) = \Sigma_{\tilde u} = B_{ij}\Sigma_uB_{ij}^T$, and $\operatorname{cov}(\tilde u_{ij_1}, \tilde u_{ij_2}) = 0$ for $j_1 \neq j_2$. From (4), the GEE for $\theta = (\beta^T, \gamma^T)^T$ is

$$\sum_{i=1}^n(W_i, \tilde U_i)^TV_i^{-1}\big(Y_i - (W_i, \tilde U_i)\theta\big) = 0. \qquad (5)$$

Taking expectations, we have

$$E\Big[\sum_{i=1}^n(W_i,\tilde U_i)^TV_i^{-1}\big(Y_i - (W_i,\tilde U_i)\theta\big)\Big] = E\Big[\sum_{i=1}^n(X_i,\tilde Z_i)^TV_i^{-1}\big(Y_i - (X_i,\tilde Z_i)\theta\big)\Big] - nE\big[(w_i,\tilde u_i)^TV_i^{-1}(w_i,\tilde u_i)\big]\theta = -nE\big[(w_i,\tilde u_i)^TV_i^{-1}(w_i,\tilde u_i)\big]\theta \neq 0.$$

This shows that equation (5) is biased. We therefore consider the bias-corrected GEE for $\theta$,

$$\sum_{i=1}^n\Big[(W_i,\tilde U_i)^TV_i^{-1}(Y_i - W_i\beta - \tilde U_i\gamma) + D_i\theta\Big] = 0, \qquad (6)$$

where $W_i = (W_{i1}, W_{i2}, \ldots, W_{in_i})^T$, $\tilde U_i = (\tilde U_{i1}, \tilde U_{i2}, \ldots, \tilde U_{in_i})^T$, $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})^T$, $D_i = E\big[(w_i, \tilde u_i)^TV_i^{-1}(w_i, \tilde u_i)\big]$, $w_i = (w_{i1}, w_{i2}, \ldots, w_{in_i})^T$, $\tilde u_i = (\tilde u_{i1}, \tilde u_{i2}, \ldots, \tilde u_{in_i})^T$, and $V_i$ is the covariance matrix of $Y_i$. Obviously, equation (6) is unbiased. As in the GEE method, we take $V_i = A_i^{1/2}R_i(\rho)A_i^{1/2}$, where $A_i = \operatorname{diag}(\operatorname{var}(Y_{i1}), \ldots, \operatorname{var}(Y_{in_i})) = \operatorname{diag}(\operatorname{var}(\epsilon_{i1}), \ldots, \operatorname{var}(\epsilon_{in_i}))$, $R_i(\rho)$ is a working correlation matrix, and $\rho$ is a nuisance parameter. Liang and Zeger [14] pointed out that a consistent estimator of $\rho$ may not exist in certain simple cases, which can invalidate the GEE method.

To overcome this drawback of the GEE, Qu et al. [17] proposed the QIF method for longitudinal data by assuming that $R_i^{-1}(\rho) \approx \sum_{\kappa=1}^sa_\kappa M_\kappa$, where the $M_\kappa\ (\kappa = 1, 2, \ldots, s)$ are simple known matrices and the $a_\kappa\ (\kappa = 1, 2, \ldots, s)$ are unknown constants treated as nuisance parameters [17]. Substituting this expansion into (6), we get the new bias-corrected GEE

$$\sum_{i=1}^n\bigg[(W_i,\tilde U_i)^TA_i^{-1/2}\Big(\sum_{\kappa=1}^sa_\kappa M_\kappa\Big)A_i^{-1/2}(Y_i - W_i\beta - \tilde U_i\gamma) + E\Big[(w_i,\tilde u_i)^TA_i^{-1/2}\Big(\sum_{\kappa=1}^sa_\kappa M_\kappa\Big)A_i^{-1/2}(w_i,\tilde u_i)\Big]\theta\bigg] = 0. \qquad (7)$$
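For concreteness, the basis matrices $M_\kappa$ commonly paired with the exchangeable and AR(1) working structures in the QIF literature [17] can be written down directly. The sketch below is a minimal illustration in this spirit, not code from the paper.

```python
# Sketch: standard basis matrices for R^{-1}(rho) ~ a_1 M_1 + a_2 M_2.
import numpy as np

def qif_basis(n0, structure="AR1"):
    """Basis matrices M_kappa for an n0 x n0 working correlation matrix."""
    M1 = np.eye(n0)
    if structure == "EX":    # exchangeable: 0 on the diagonal, 1 off the diagonal
        M2 = np.ones((n0, n0)) - np.eye(n0)
    elif structure == "AR1": # AR(1): 1 on the two first off-diagonals
        M2 = np.diag(np.ones(n0 - 1), 1) + np.diag(np.ones(n0 - 1), -1)
    else:
        raise ValueError("unknown structure")
    return [M1, M2]
```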

Unlike the GEE method, we do not need to estimate $a = (a_1, a_2, \ldots, a_s)$. Instead, we define the bias-corrected extended score function $\bar g_n(\theta)$ as

$$\bar g_n(\theta) = \frac1n\sum_{i=1}^ng_i(\theta) = \frac1n\sum_{i=1}^n\begin{pmatrix}(W_i,\tilde U_i)^TA_i^{-1/2}M_1A_i^{-1/2}(Y_i - W_i\beta - \tilde U_i\gamma) + D_i^{(1)}\theta\\ \vdots\\ (W_i,\tilde U_i)^TA_i^{-1/2}M_sA_i^{-1/2}(Y_i - W_i\beta - \tilde U_i\gamma) + D_i^{(s)}\theta\end{pmatrix}, \qquad (8)$$

where $D_i^{(\kappa)} = E\big[(w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(w_i,\tilde u_i)\big]$, $\kappa = 1, 2, \ldots, s$. Since $w_{ij}$ and $u_{ij}$ are independent of each other, $w_i$ and $\tilde u_i$ are independent, so $E(w_i^TA_i^{-1/2}M_\kappa A_i^{-1/2}\tilde u_i) = 0$ and $E(\tilde u_i^TA_i^{-1/2}M_\kappa A_i^{-1/2}w_i) = 0$. It follows that

$$D_i^{(\kappa)} = \begin{pmatrix}D_{11,i}^{(\kappa)} & 0\\ 0 & D_{22,i}^{(\kappa)}\end{pmatrix}, \qquad (9)$$

where $D_{11,i}^{(\kappa)} = E(w_i^TA_i^{-1/2}M_\kappa A_i^{-1/2}w_i)$ and $D_{22,i}^{(\kappa)} = E(\tilde u_i^TA_i^{-1/2}M_\kappa A_i^{-1/2}\tilde u_i)$, $\kappa = 1, 2, \ldots, s$. By some simple matrix calculations, following Zhao et al. [34], we have

$$D_{11,i}^{(\kappa)} = \operatorname{tr}\big(A_i^{-1/2}M_\kappa A_i^{-1/2}\big)\Sigma_w, \qquad (10)$$
$$D_{22,i}^{(\kappa)} = \Sigma_u \otimes \big(B_i\operatorname{diag}\big(A_i^{-1/2}M_\kappa A_i^{-1/2}\big)B_i^T\big), \qquad (11)$$

where $B_i = (B(t_{i1}), B(t_{i2}), \ldots, B(t_{in_i}))$ and $\operatorname{diag}(\cdot)$ extracts the diagonal part of a matrix. However, the covariance matrices $\Sigma_w$ and $\Sigma_u$ are usually unknown in advance and must be estimated in practice. Under some conditions, $\Sigma_w$ and $\Sigma_u$ can be estimated by partial replication, similar to [1].

Suppose the longitudinal data are balanced, that is, $n_i = n_0 < \infty\ (i = 1, 2, \ldots, n)$, and that $W_{ij}$ and $U_{ij}$ are observed $m_i$ times for the $i$th subject: $W_{ij}^{(r)} = X_{ij} + w_{ij}^{(r)}$, $U_{ij}^{(r)} = Z_{ij} + u_{ij}^{(r)}$, $r = 1, 2, \ldots, m_i$. Then two consistent, unbiased estimators $\hat\Sigma_w$ and $\hat\Sigma_u$ of $\Sigma_w$ and $\Sigma_u$ are given by

$$\hat\Sigma_w = \frac{1}{nn_0}\sum_{i=1}^n\sum_{j=1}^{n_0}\frac{1}{m_i - 1}\sum_{r=1}^{m_i}\big(W_{ij}^{(r)} - \bar W_{ij}\big)\big(W_{ij}^{(r)} - \bar W_{ij}\big)^T, \qquad (12)$$
$$\hat\Sigma_u = \frac{1}{nn_0}\sum_{i=1}^n\sum_{j=1}^{n_0}\frac{1}{m_i - 1}\sum_{r=1}^{m_i}\big(U_{ij}^{(r)} - \bar U_{ij}\big)\big(U_{ij}^{(r)} - \bar U_{ij}\big)^T, \qquad (13)$$

where $\bar W_{ij} = m_i^{-1}\sum_{r=1}^{m_i}W_{ij}^{(r)}$ and $\bar U_{ij} = m_i^{-1}\sum_{r=1}^{m_i}U_{ij}^{(r)}$. Furthermore, we obtain consistent, unbiased estimators $\hat D_{11,i}^{(\kappa)}$ and $\hat D_{22,i}^{(\kappa)}$ of $D_{11,i}^{(\kappa)}$ and $D_{22,i}^{(\kappa)}$, respectively, as

$$\hat D_{11,i}^{(\kappa)} = \operatorname{tr}\big(A_i^{-1/2}M_\kappa A_i^{-1/2}\big)\hat\Sigma_w, \qquad (14)$$
$$\hat D_{22,i}^{(\kappa)} = \hat\Sigma_u \otimes \big(B_i\operatorname{diag}\big(A_i^{-1/2}M_\kappa A_i^{-1/2}\big)B_i^T\big). \qquad (15)$$

Substituting (14) and (15) into (9), we obtain a consistent, unbiased estimator $\hat D_i^{(\kappa)}$ of $D_i^{(\kappa)}$:

$$\hat D_i^{(\kappa)} = \begin{pmatrix}\hat D_{11,i}^{(\kappa)} & 0\\ 0 & \hat D_{22,i}^{(\kappa)}\end{pmatrix}, \qquad \kappa = 1, 2, \ldots, s. \qquad (16)$$
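When replicates are available, the estimators (12)-(16) are straightforward averages. The following sketch is our illustration, not the authors' code; it computes $\hat\Sigma_w$ from an array of replicated measurements, and $\hat\Sigma_u$ is entirely analogous.

```python
# Sketch: partial-replication estimator (12) of Sigma_w.
# W has shape (n, n0, m, p): n subjects, n0 time points, m replicates,
# p error-prone covariates; m >= 2 is required.
import numpy as np

def sigma_w_hat(W):
    n, n0, m, p = W.shape
    Wbar = W.mean(axis=2, keepdims=True)       # replicate means W_bar_ij
    R = W - Wbar                               # centered replicates
    S = np.einsum('ijrp,ijrq->pq', R, R)       # sum over i, j, r of outer products
    return S / (n * n0 * (m - 1))              # matches (12) with equal m_i = m
```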

If the longitudinal data are unbalanced, they can be transformed into balanced data following Xue et al. [29]; the details are omitted here.

According to (16), we obtain an estimator $\hat{\bar g}_n(\theta)$ of $\bar g_n(\theta)$:

$$\hat{\bar g}_n(\theta) = \frac1n\sum_{i=1}^n\hat g_i(\theta) = \frac1n\sum_{i=1}^n\begin{pmatrix}(W_i,\tilde U_i)^TA_i^{-1/2}M_1A_i^{-1/2}(Y_i - W_i\beta - \tilde U_i\gamma) + \hat D_i^{(1)}\theta\\ \vdots\\ (W_i,\tilde U_i)^TA_i^{-1/2}M_sA_i^{-1/2}(Y_i - W_i\beta - \tilde U_i\gamma) + \hat D_i^{(s)}\theta\end{pmatrix}. \qquad (17)$$

Obviously, $\hat{\bar g}_n(\theta)$ is an $s(p + q(K+d)) \times 1$ vector, whereas $\theta$ is a $(p + q(K+d)) \times 1$ parameter vector, so the equation $E(\hat{\bar g}_n(\theta)) = 0$ is over-identified and cannot be solved directly for $\theta$. To address this, following Qu and Li [16], we construct the bias-corrected QIF for $\theta$ as

$$Q_n(\theta) = n\,\hat{\bar g}_n^T(\theta)\Omega_n^{-1}\hat{\bar g}_n(\theta), \qquad (18)$$

where $\Omega_n = \frac1n\sum_{i=1}^n\hat g_i(\theta)\hat g_i^T(\theta)$. The bias-corrected QIF estimator $\tilde\theta$ is then

$$\tilde\theta = \arg\min_\theta Q_n(\theta). \qquad (19)$$
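Numerically, (18)-(19) amount to a weighted quadratic form in the averaged extended scores. A hedged sketch follows; `extended_scores` is a user-supplied function returning the stacked $\hat g_i(\theta)$ of (17), which we assume rather than reproduce.

```python
# Sketch: evaluate the bias-corrected QIF (18) and minimize it as in (19).
import numpy as np
from scipy.optimize import minimize

def Qn(theta, extended_scores):
    """Q_n(theta) = n * gbar^T Omega_n^{-1} gbar, Omega_n = mean of g_i g_i^T."""
    G = extended_scores(theta)            # shape (n, s*(p + q*(K+d)))
    n = G.shape[0]
    gbar = G.mean(axis=0)
    Omega = G.T @ G / n
    return n * gbar @ np.linalg.solve(Omega, gbar)

# theta_tilde = minimize(Qn, theta_init, args=(extended_scores,), method="BFGS").x
```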

As mentioned above, the bias-corrected QIF corrects the bias of the estimating equations and handles within-subject correlations simultaneously. However, it estimates the nonparametric coefficient functions by spline regression, which tends to over-fit, and the true model is unknown in practice. To address these issues, we construct the bias-corrected pQIF, which estimates and selects the significant parameters and varying coefficients simultaneously, defined as

$$Q_p(\theta) = Q_n(\theta) + n\sum_{k=1}^pp_{\lambda_{1k}}(|\beta_k|) + n\sum_{l=1}^qp_{\lambda_{2l}}(\|\gamma_l\|_H), \qquad (20)$$

where $\|\gamma_l\|_H = (\gamma_l^TH\gamma_l)^{1/2}$, $H = (h_{ij})_{L\times L}$ with $h_{ij} = \int_0^1B_i(t)B_j(t)\,dt$, and $p_\lambda(\cdot)$ is the SCAD penalty function [5], whose first-order derivative is

$$p'_\lambda(w) = \lambda\Big\{I(w \le \lambda) + \frac{(a\lambda - w)_+}{(a - 1)\lambda}I(w > \lambda)\Big\}, \qquad (21)$$

where $a = 3.7$, $w > 0$ and $p_\lambda(0) = 0$; the tuning parameter $\lambda$ controls the amount of penalty. In (20), we denote the tuning parameters by $\lambda_{1k}\ (k = 1, 2, \ldots, p)$ for $\beta_k$ and $\lambda_{2l}\ (l = 1, 2, \ldots, q)$ for $\alpha_l(t)$, respectively. A direct transcription of (21) is sketched below.
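This transcription is ours, assuming only numpy:

```python
# Sketch: SCAD first-order derivative p'_lambda(w) from (21), with a = 3.7.
import numpy as np

def scad_deriv(w, lam, a=3.7):
    w = np.abs(np.asarray(w, dtype=float))
    return lam * (w <= lam) + np.maximum(a * lam - w, 0.0) / (a - 1) * (w > lam)
```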

The bias-corrected pQIF estimator $\hat\theta$ is then given by

$$\hat\theta = (\hat\beta^T, \hat\gamma^T)^T = \arg\min_\theta Q_p(\theta). \qquad (22)$$

Furthermore, the estimators of $\alpha_l(t)\ (l = 1, 2, \ldots, q)$ are obtained as

$$\hat\alpha_l(t) = B(t)^T\hat\gamma_l, \qquad l = 1, 2, \ldots, q. \qquad (23)$$

3. Asymptotic properties

We now establish the asymptotic properties of $\hat\beta_k\ (k = 1, 2, \ldots, p)$ and $\hat\alpha_l(t)\ (l = 1, 2, \ldots, q)$. Let $\beta_0 = (\beta_{10}, \beta_{20}, \ldots, \beta_{p0})^T$ and $\alpha_0(t) = (\alpha_{10}(t), \alpha_{20}(t), \ldots, \alpha_{q0}(t))^T$ be the true regression parameters and coefficient functions, and let $\gamma_{l0}\ (l = 1, 2, \ldots, q)$ be the B-spline coefficient vectors from the spline approximation to $\alpha_{l0}(t)$. Furthermore, we assume that

$$\beta_{k0} \neq 0\ (k = 1, 2, \ldots, p_1), \quad \beta_{k0} = 0\ (k = p_1+1, \ldots, p), \quad \alpha_{l0}(t) \not\equiv 0\ (l = 1, 2, \ldots, q_1), \quad \alpha_{l0}(t) \equiv 0\ (l = q_1+1, \ldots, q).$$

Some necessary regularity conditions for the asymptotic properties are as follows.

  • C1: $0 < n_i < \infty$ for $i = 1, 2, \ldots, n$.

  • C2: $\alpha_l(t)\ (l = 1, 2, \ldots, q)$ are $r$th continuously differentiable on $(0, 1)$, where $r \geq 2$.

  • C3: There exists a unique $\theta_0 \in \Theta$ satisfying $E(\hat{\bar g}_n(\theta_0)) = o(1)$, where $\Theta$ is the parameter space.

  • C4: There exists an invertible matrix $\Omega_0$ such that $\Omega_n \xrightarrow{a.s.} \Omega_0$.

  • C5: $E(\epsilon_i\epsilon_i^T) = V_i$ with $\sup_i\|V_i\| < \infty$, and there exists $\delta > 0$ such that $\sup_iE\{\|\epsilon_i\|^{2+\delta}\} < \infty$, $E\|w_i\|^8 < \infty$ and $E\|u_i\|^8 < \infty$, where $\|\cdot\|$ denotes the modulus of the largest singular value.

  • C6: $A_i > 0$ and $\sup_i\|A_i\| < \infty$.

  • C7: $E\|X_i\|^4 < \infty$ and $E\|Z_i\|^4 < \infty$, $i = 1, 2, \ldots, n$.

  • C8: The interior knots $\{\tau_i, i = 1, 2, \ldots, K\}$ satisfy $\max_{1\le i\le K}|\Delta\tau_{i+1} - \Delta\tau_i| = o(K^{-1})$ and $\Delta\tau_{\max}/\Delta\tau_{\min} \le C$ for some $C > 0$, where $\Delta\tau_{\max} = \max_{1\le i\le K+1}\Delta\tau_i$, $\Delta\tau_{\min} = \min_{1\le i\le K+1}\Delta\tau_i$, $\Delta\tau_i = \tau_i - \tau_{i-1}$, $\tau_0 = 0$ and $\tau_{K+1} = 1$.

  • C9: $\hat{\bar g}'_n(\theta) = \partial\hat{\bar g}_n(\theta)/\partial\theta$ exists and is continuous, and by the weak law of large numbers, when $\hat\theta \xrightarrow{p} \theta_0$, there exists $J_0$ such that
    $$\lim_{n\to\infty}\frac1n\sum_{i=1}^nE\begin{pmatrix}(W_i,\tilde U_i)^TA_i^{-1/2}M_1A_i^{-1/2}(W_i,\tilde U_i)\\ \vdots\\ (W_i,\tilde U_i)^TA_i^{-1/2}M_sA_i^{-1/2}(W_i,\tilde U_i)\end{pmatrix} = J_0. \qquad (24)$$

  • C10: $a_n = \max_{k,l}\big\{|p'_{\lambda_{1k}}(|\beta_{k0}|)|,\ |p'_{\lambda_{2l}}(\|\gamma_{l0}\|_H)| : \beta_{k0} \neq 0,\ \gamma_{l0} \neq 0\big\}$ satisfies $a_n \to 0$ as $n \to \infty$.

  • C11: $p_\lambda(t)$ satisfies
    $$\liminf_{n\to\infty}\liminf_{\beta_k\to0^+}\lambda_{1k}^{-1}p'_{\lambda_{1k}}(|\beta_k|) > 0, \qquad k = p_1+1, p_1+2, \ldots, p, \qquad (25)$$
    $$\liminf_{n\to\infty}\liminf_{\|\gamma_l\|_H\to0^+}\lambda_{2l}^{-1}p'_{\lambda_{2l}}(\|\gamma_l\|_H) > 0, \qquad l = q_1+1, q_1+2, \ldots, q. \qquad (26)$$

Remark 3.1

These conditions are commonly used in the literature on nonparametric and semiparametric inference. C1 implies $N = \sum_{i=1}^nn_i = O(n)$. C2 is a smoothness condition on $\alpha_l(t)\ (l = 1, 2, \ldots, q)$ and is necessary for studying the convergence rate of the B-spline estimator. C4 and C9 follow easily from the weak law of large numbers as $n \to \infty$. C3, C5-C7 and C9 can be found in [20]. C8 is a standard requirement on the knots of B-spline approximations [18]. C10 and C11 can be found in [5,20,36].

Under these conditions, the asymptotic properties of the resulting estimators are presented as follows.

Theorem 3.1

If C1-C11 hold and $K = O(N^{1/(2r+1)})$, then we have

$$\|\hat\alpha_l(\cdot) - \alpha_{l0}(\cdot)\| = O_p(n^{-r/(2r+1)}), \qquad l = 1, 2, \ldots, q. \qquad (27)$$

Theorem 3.2

If C1-C11 hold, $K = O(N^{1/(2r+1)})$, and $\lambda_{\max} = \max_{k,l}\{\lambda_{1k}, \lambda_{2l}\}$ and $\lambda_{\min} = \min_{k,l}\{\lambda_{1k}, \lambda_{2l}\}$ satisfy $\lambda_{\max} \to 0$ and $n^{r/(2r+1)}\lambda_{\min} \to +\infty$, then with probability tending to 1 we have

  1. $\hat\beta_k = 0$, $k = p_1+1, \ldots, p$;

  2. $\hat\alpha_l(\cdot) \equiv 0$, $l = q_1+1, \ldots, q$.

Theorem 3.3

Denote by $\hat\beta = (\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_{p_1})^T$ the estimator of $\beta = (\beta_1, \beta_2, \ldots, \beta_{p_1})^T$, the vector of non-zero parameters. If C1-C11 hold and $K = O(N^{1/(2r+1)})$, then we have

$$\sqrt n(\hat\beta - \beta_0) \xrightarrow{L} N\big(0,\ A_0(J_{\theta_0}\Omega_0^{-1}J_{\theta_0}^T)^{-1}A_0^T\big), \qquad (28)$$

where $A_0$ is defined in Equation (A11) of the Appendix, and "$\xrightarrow{L}$" denotes convergence in distribution.

Remark 3.2

Theorem 3.1 shows that the estimators of the varying coefficients achieve the optimal convergence rate, and Theorem 3.2 shows that the estimators of the constant and varying coefficients possess the sparsity property. Together, Theorems 3.1-3.3 show that the proposed method possesses the oracle property.

4. Computational algorithm and selection of tuning parameters

4.1. Computational algorithm

The estimator $\hat\theta$ defined by (22) has no closed form, and the penalty function is singular at the origin, so $\hat\theta$ can only be obtained numerically. $Q_n(\cdot)$ can be approximated around a given point $\theta^{(0)}$ by the Taylor expansion

$$Q_n(\theta) \approx Q_n(\theta^{(0)}) + \dot Q_n(\theta^{(0)})^T(\theta - \theta^{(0)}) + \frac12(\theta - \theta^{(0)})^T\ddot Q_n(\theta^{(0)})(\theta - \theta^{(0)}),$$

where $\dot Q_n(\cdot) = \partial Q_n(\cdot)/\partial\theta$ and $\ddot Q_n(\cdot) = \partial^2Q_n(\cdot)/\partial\theta\,\partial\theta^T$. On the other hand, $p_\lambda(\cdot)$ can be locally approximated by

$$p_\lambda(|t|) \approx p_\lambda(|t_0|) + \frac12\frac{p'_\lambda(|t_0|)}{|t_0|}(t^2 - t_0^2), \qquad t \approx t_0,$$

where $t_0$ is an initial value. Therefore, up to a constant, the bias-corrected pQIF can be represented as

$$Q_p(\theta) \approx Q_n(\theta^{(0)}) + \dot Q_n(\theta^{(0)})^T(\theta - \theta^{(0)}) + \frac12(\theta - \theta^{(0)})^T\ddot Q_n(\theta^{(0)})(\theta - \theta^{(0)}) + \frac n2\theta^T\Sigma_\lambda(\theta^{(0)})\theta, \qquad (29)$$

where

$$\Sigma_\lambda(\theta^{(0)}) = \operatorname{diag}\Big\{\frac{p'_{\lambda_{11}}(|\beta_1^{(0)}|)}{|\beta_1^{(0)}|}, \ldots, \frac{p'_{\lambda_{1p}}(|\beta_p^{(0)}|)}{|\beta_p^{(0)}|}, \frac{p'_{\lambda_{21}}(\|\gamma_1^{(0)}\|_H)}{\|\gamma_1^{(0)}\|_H}H, \ldots, \frac{p'_{\lambda_{2q}}(\|\gamma_q^{(0)}\|_H)}{\|\gamma_q^{(0)}\|_H}H\Big\}.$$

According to (29), $\hat\theta$ can be computed by the update

$$\theta^{(1)} \leftarrow \theta^{(0)} - \big\{\ddot Q_n(\theta^{(0)}) + n\Sigma_\lambda(\theta^{(0)})\big\}^{-1}\big\{\dot Q_n(\theta^{(0)}) + n\Sigma_\lambda(\theta^{(0)})\theta^{(0)}\big\}.$$

The detailed iterative algorithm is as follows.

  • Step 1: Take the bias-corrected QIF estimator $\tilde\theta$ defined by (19) as $\theta^{(0)}$.

  • Step 2: Update $\hat\theta$ at the $(k+1)$th iteration by
    $$\theta^{(k+1)} \leftarrow \theta^{(k)} - \big\{\ddot Q_n(\theta^{(k)}) + n\Sigma_\lambda(\theta^{(k)})\big\}^{-1}\big\{\dot Q_n(\theta^{(k)}) + n\Sigma_\lambda(\theta^{(k)})\theta^{(k)}\big\}.$$
  • Step 3: Repeat Step 2 until a convergence criterion is satisfied. A minimal sketch of this iteration is given after this list.
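In the sketch below, `grad_Qn`, `hess_Qn` and `Sigma_lam` are assumed to be user-supplied callables for $\dot Q_n$, $\ddot Q_n$ and $\Sigma_\lambda$ as defined above; the code is our illustration of Steps 1-3, not the authors' implementation.

```python
# Sketch: local quadratic approximation iteration from Steps 1-3.
import numpy as np

def pqif_iterate(theta0, grad_Qn, hess_Qn, Sigma_lam, n, tol=1e-6, max_iter=200):
    theta = np.asarray(theta0, dtype=float)     # Step 1: start from the QIF estimator
    for _ in range(max_iter):
        H = hess_Qn(theta) + n * Sigma_lam(theta)
        step = np.linalg.solve(H, grad_Qn(theta) + n * Sigma_lam(theta) @ theta)
        theta_new = theta - step                # Step 2: one update
        if np.linalg.norm(theta_new - theta) < tol:   # Step 3: convergence check
            break
        theta = theta_new
    return theta_new
```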

4.2. Selection of tuning parameters

As noted above, $\lambda_{1k}$ and $\lambda_{2l}$ control the amount of penalization through $p_{\lambda_{1k}}(\cdot)\ (k = 1, 2, \ldots, p)$ and $p_{\lambda_{2l}}(\cdot)\ (l = 1, 2, \ldots, q)$, but they are unknown in practice, so they indirectly determine the model estimation and selection results. Their selection is therefore important in implementation. As Wang et al. [21] showed, the BIC criterion for the SCAD estimator can select the true model with probability tending to one. In our work, we apply the BIC criterion to select the optimal tuning parameters $\lambda_{1k}$ and $\lambda_{2l}$.

However, selecting $p + q$ tuning parameters simultaneously is challenging in real applications. A sensible strategy is to assign a larger tuning parameter, and hence a heavier penalty, to a zero parameter or zero coefficient function than to a non-zero one; this favors selecting the significantly non-zero parameters and coefficient functions and reduces computational complexity. Such tuning parameters are usually called adaptive tuning parameters. With adaptive tuning parameters, the proposed method can estimate the large parameters and coefficient functions without bias while shrinking the small ones toward zero. Thus, we set

$$\lambda_{1k} = \frac{\lambda}{|\tilde\beta_k|}, \qquad \lambda_{2l} = \frac{\lambda}{\|\tilde\gamma_l\|_H},$$

where $\tilde\beta_k\ (k = 1, 2, \ldots, p)$ and $\tilde\gamma_l\ (l = 1, 2, \ldots, q)$ are defined by (19). Consequently, the selection of $\lambda_{1k}$ and $\lambda_{2l}$ reduces to the selection of the single parameter $\lambda$, an easier univariate problem that greatly reduces computational complexity. Define

$$\mathrm{BIC}(\lambda) = Q_n(\hat\theta_\lambda) + df_\lambda\log(n), \qquad (30)$$

where $\hat\theta_\lambda = (\hat\beta_\lambda^T, \hat\gamma_\lambda^T)^T$ is defined by (22) for a given $\lambda$, and $df_\lambda$ is the number of non-zero components among $\hat\beta_{1\lambda}, \hat\beta_{2\lambda}, \ldots, \hat\beta_{p\lambda}$ and $\|\hat\gamma_{1\lambda}\|_H, \|\hat\gamma_{2\lambda}\|_H, \ldots, \|\hat\gamma_{q\lambda}\|_H$, with $\hat\beta_\lambda = (\hat\beta_{1\lambda}, \ldots, \hat\beta_{p\lambda})^T$ and $\hat\gamma_\lambda = (\hat\gamma_{1\lambda}^T, \ldots, \hat\gamma_{q\lambda}^T)^T$. The optimal $\hat\lambda$ is then

$$\hat\lambda = \arg\min_\lambda\mathrm{BIC}(\lambda). \qquad (31)$$

In practice, $\hat\lambda$ can be obtained by a grid search, as sketched below.
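In the sketch, `fit_pqif` (returning $\hat\theta_\lambda$ for a given $\lambda$) and `df` (counting the non-zero components as defined below (30)) are assumed wrappers around the procedures above, not functions from the paper.

```python
# Sketch: BIC grid search (30)-(31) over a one-dimensional lambda grid.
import numpy as np

def select_lambda(grid, fit_pqif, Qn, df, n):
    best_lam, best_bic = None, np.inf
    for lam in grid:
        theta_hat = fit_pqif(lam)                      # penalized estimator for this lambda
        bic = Qn(theta_hat) + df(theta_hat) * np.log(n)
        if bic < best_bic:
            best_lam, best_bic = lam, bic
    return best_lam
```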

5. Numerical studies

5.1. Simulation studies

We conducted numerical simulations to assess the performance of the bias-corrected pQIF method in terms of estimation accuracy and selection performance in finite samples. First, the generalized mean square error (GMSE) [20,36] is defined as

$$\mathrm{GMSE} = (\hat\beta - \beta)^TE(XX^T)(\hat\beta - \beta).$$

Obviously, the smaller the GMSE, the better the estimation of $\beta$. The square root of average squared errors (RASE) is defined as

$$\mathrm{RASE} = \Big\{\frac1m\sum_{l=1}^q\sum_{\ell=1}^m\|\hat\alpha_l(t_\ell) - \alpha_l(t_\ell)\|^2\Big\}^{1/2}.$$

A smaller RASE indicates better estimation accuracy, that is, $\hat\alpha(t)$ is closer to the true function $\alpha(t)$. In our work, we set $m = 200$, with grid points $t_\ell\ (\ell = 1, 2, \ldots, m)$ equally spaced on $[0, 1]$. Both criteria reduce to a few lines of array arithmetic, as sketched below.
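In the sketch, `EXX` denotes a plug-in for $E(XX^T)$; the code is our illustration under the grid convention just stated.

```python
# Sketch: the GMSE and RASE accuracy measures of Section 5.1.
import numpy as np

def gmse(beta_hat, beta, EXX):
    d = beta_hat - beta
    return d @ EXX @ d          # (beta_hat - beta)^T E(XX^T) (beta_hat - beta)

def rase(alpha_hat, alpha_true):
    # alpha_hat, alpha_true: (m, q) values of the q coefficient curves on the grid
    m = alpha_hat.shape[0]
    return np.sqrt(np.sum((alpha_hat - alpha_true) ** 2) / m)
```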

In the tables below, 'C' denotes the average number of zero components correctly estimated as zero, i.e. $\hat\beta_k = 0\ (k = p_1+1, \ldots, p)$ or $\hat\alpha_l(t) \equiv 0\ (l = q_1+1, \ldots, q)$, and 'IC' denotes the average number of non-zero components incorrectly estimated as zero, i.e. $\hat\beta_k = 0\ (k = 1, 2, \ldots, p_1)$ or $\hat\alpha_l(t) \equiv 0\ (l = 1, 2, \ldots, q_1)$. Obviously, a larger 'C' and a smaller 'IC' indicate better model selection. The performance of the bias-corrected pQIF method is assessed by the GMSE, RASE, 'C' and 'IC' jointly.

In our simulation studies, for model (2), we let $\beta = (\beta_1, \beta_2, \beta_3, \beta_4)^T$ with $\beta_1 = 2$, $\beta_2 = 0.7$ and $\beta_k = 0\ (k = 3, 4)$, and $\alpha(t) = (\alpha_1(t), \alpha_2(t), \ldots, \alpha_6(t))^T$ with $\alpha_l(t) \equiv 0\ (l = 3, 4, 5, 6)$ and

$$\alpha_1(t) = 7.5 + 0.1\exp(3t - 1), \qquad \alpha_2(t) = \sin(2\pi t).$$

We took $X_{ij} \sim N(2, \sigma_X^2I_4)$, $Z_{ij} \sim N(2, \sigma_Z^2I_6)$, $w_{ij} \sim N(0, \sigma_w^2I_4)$ and $u_{ij} \sim N(0, \sigma_u^2I_6)$, where $j = 1, 2, \ldots, 10$, $\sigma_X = \sigma_Z = 2$, and $I_4$ and $I_6$ are the $4\times4$ and $6\times6$ identity matrices. We set $\sigma_w = \sigma_u$ to 0.2, 0.4 and 0.6, and $t_{ij} \sim U[0, 1]$. The errors $\epsilon_i = (\epsilon_{i1}, \epsilon_{i2}, \ldots, \epsilon_{in_i})^T \sim N(0, \sigma^2\mathrm{Corr}(\epsilon_i, \rho))$, where $\sigma^2 = 1$ and $\mathrm{Corr}(\epsilon_i, \rho)$ is a known correlation matrix with parameter $\rho$, so that $A_i = \operatorname{diag}(1, 1, \ldots, 1)$. We set $n_i = 10$ and considered first-order autoregressive (AR(1)) and exchangeable (EX) correlation structures for $\epsilon_i$ with $\rho = 0.3$ and $\rho = 0.7$, and generated $n$ = 150, 200, 300 subjects. The cubic B-spline basis was applied with equally spaced knots in $[0, 1]$ and $K = \lfloor c \times N^{1/5}\rfloor$, where $\lfloor c\rfloor$ denotes the largest integer not exceeding $c$ [8]. Following Tian et al. [20], we chose $c = 0.6$. A sketch of this data-generating process is given below.
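The following sketch generates one dataset under this setup (AR(1) case). It is our illustration of the data-generating process only, not the authors' simulation code, and the reconstructed form of $\alpha_1(t)$ follows the display above.

```python
# Sketch: data generation for the simulation design of Section 5.1.
import numpy as np

rng = np.random.default_rng(0)

def ar1_corr(n0, rho):
    idx = np.arange(n0)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def simulate(n=150, n0=10, rho=0.3, sigma_me=0.4):
    beta = np.array([2.0, 0.7, 0.0, 0.0])
    t = rng.uniform(0, 1, (n, n0))
    a1 = 7.5 + 0.1 * np.exp(3 * t - 1)              # alpha_1(t)
    a2 = np.sin(2 * np.pi * t)                      # alpha_2(t); alpha_3..6 are zero
    X = rng.normal(2.0, 2.0, (n, n0, 4))
    Z = rng.normal(2.0, 2.0, (n, n0, 6))
    eps = rng.multivariate_normal(np.zeros(n0), ar1_corr(n0, rho), size=n)
    Y = X @ beta + Z[..., 0] * a1 + Z[..., 1] * a2 + eps
    W = X + rng.normal(0.0, sigma_me, X.shape)      # error-prone version of X
    U = Z + rng.normal(0.0, sigma_me, Z.shape)      # error-prone version of Z
    return Y, W, U, t
```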

For each simulated longitudinal dataset, we compared the bias-corrected pQIF method with the LASSO and SCAD penalty functions against the method that neglects measurement errors with the SCAD penalty (denoted 'nSCAD'). For simplicity, the bias-corrected pQIF methods with the LASSO and SCAD penalties are denoted 'LASSO' and 'SCAD', respectively. The tuning parameters $\hat\lambda_{1k}\ (k = 1, 2, \ldots, p)$ and $\hat\lambda_{2l}\ (l = 1, 2, \ldots, q)$ were chosen by (31). We performed 500 simulation runs under each setup and report the median GMSE and RASE in the following tables.

In summary, from Tables 1-4 we can draw the following conclusions:

  1. The LASSO and SCAD methods perform much better than the nSCAD method in all cases, which implies that the proposed bias correction is valuable and that neglecting measurement errors leads to biased estimation and poor variable selection for model (2).

  2. Under the same conditions, the performance of the SCAD and LASSO methods improves as the sample size grows. Furthermore, the SCAD method outperforms the LASSO method in both estimation and selection for the parametric and nonparametric parts.

  3. Under the same conditions, the SCAD and LASSO methods deteriorate as the measurement error increases. There is little difference between them when the measurement error is small, but the SCAD method is significantly better than the LASSO method when the measurement error is large, which implies that the LASSO method is less robust than the SCAD method.

Table 2.

Variable selections for α() with the EX correlation structure.

      n = 150 n = 200 n = 300
ρ σu Method C IC RASE C IC RASE C IC RASE
0.3 0.2 LASSO 3.442 0 0.11303 3.798 0 0.09892 3.972 0 0.08785
    SCAD 3.502 0 0.11307 3.808 0 0.09879 3.974 0 0.08779
    nSCAD 3.410 0 0.13200 3.764 0 0.11801 3.948 0 0.10970
  0.4 LASSO 3.378 0 0.17904 3.696 0 0.14573 3.916 0 0.12109
    SCAD 3.414 0 0.17833 3.766 0 0.14346 3.956 0 0.11848
    nSCAD 3.134 0 0.30538 3.524 0 0.29042 3.850 0 0.27680
  0.6 LASSO 3.140 0 0.26483 3.628 0 0.20670 3.888 0 0.16369
    SCAD 3.282 0 0.25851 3.674 0 0.20247 3.922 0 0.15911
    nSCAD 2.834 0 0.60510 3.178 0 0.57348 3.588 0 0.56202
0.7 0.2 LASSO 3.520 0 0.10967 3.752 0 0.09818 3.970 0 0.08738
    SCAD 3.538 0 0.10946 3.776 0 0.09803 3.972 0 0.08711
    nSCAD 3.434 0 0.12746 3.804 0 0.11733 3.958 0 0.10913
  0.4 LASSO 3.334 0 0.17337 3.684 0 0.14602 3.940 0 0.12035
    SCAD 3.386 0 0.17244 3.710 0 0.14444 3.954 0 0.11865
    nSCAD 3.136 0 0.30833 3.502 0 0.29786 3.856 0 0.27834
  0.6 LASSO 3.180 0 0.26489 3.522 0 0.20862 3.900 0 0.16294
    SCAD 3.242 0 0.26246 3.662 0 0.20480 3.938 0 0.15718
    nSCAD 2.688 0 0.61163 3.132 0 0.58300 3.600 0 0.56400

Table 3.

Variable selections for β with the AR(1) correlation structure.

      n = 150 n = 200 n = 300
ρ σu Method C IC GMSE C IC GMSE C IC GMSE
0.3 0.2 LASSO 1.834 0 0.00027 1.950 0 0.00018 1.992 0 0.00010
    SCAD 1.842 0 0.00017 1.960 0 0.00011 1.992 0 7.6E-05
    nSCAD 1.796 0 0.00079 1.880 0 0.00061 1.964 0 0.00060
  0.4 LASSO 1.380 0 0.00676 1.586 0 0.00184 1.754 0 0.00601
    SCAD 1.396 0 0.00154 1.594 0 0.00113 1.758 0 0.00075
    nSCAD 0.988 0 0.03460 0.954 0 0.00621 0.892 0 0.00111
  0.6 LASSO 1.028 0 0.03337 1.150 0 0.03183 1.408 0 0.03737
    SCAD 1.060 0 0.00929 1.178 0 0.00516 1.410 0 0.00418
    nSCAD 0.448 0 0.24760 0.286 0 0.16020 0.162 0 0.09212
0.7 0.2 LASSO 1.884 0 0.00026 1.934 0 0.00015 1.992 0 0.00011
    SCAD 1.886 0 0.00014 1.936 0 0.00010 1.996 0 6.9E-05
    nSCAD 1.814 0 0.00092 1.904 0 0.00076 1.948 0 0.00062
  0.4 LASSO 1.420 0 0.00301 1.588 0 0.00662 1.794 0 0.00126
    SCAD 1.474 0 0.00168 1.614 0 0.00107 1.824 0 0.00084
    nSCAD 1.046 0 0.00856 0.982 0 0.00197 0.886 0 0.00628
  0.6 LASSO 0.984 0 0.03522 1.212 0 0.03437 1.428 0 0.04007
    SCAD 0.994 0 0.00885 1.236 0 0.00583 1.436 0 0.00406
    nSCAD 0.460 0 0.23031 0.308 0 0.16011 0.136 0 0.08534

Table 1.

Variable selections for β with the EX correlation structure.

      n = 150 n = 200 n = 300
ρ σu Method C IC GMSE C IC GMSE C IC GMSE
0.3 0.2 LASSO 1.834 0 0.00025 1.950 0 0.00017 1.988 0 0.00010
    SCAD 1.850 0 0.00014 1.960 0 9.7E-05 1.988 0 7.6E-05
    nSCAD 1.802 0 0.00089 1.878 0 0.00072 1.974 0 0.00068
  0.4 LASSO 1.420 0 0.00339 1.546 0 0.00200 1.784 0 0.00120
    SCAD 1.448 0 0.00151 1.580 0 0.00116 1.792 0 0.00077
    nSCAD 0.892 0 0.00762 1.000 0 0.00683 1.038 0 0.00534
  0.6 LASSO 1.046 0 0.01934 1.180 0 0.01422 1.404 0 0.00954
    SCAD 1.082 0 0.00820 1.200 0 0.00610 1.438 0 0.00435
    nSCAD 0.188 0 0.03413 0.314 0 0.03321 0.478 0 0.03836
0.7 0.2 LASSO 1.882 0 0.00028 1.962 0 0.00017 1.994 0 0.00012
    SCAD 1.892 0 0.00013 1.968 0 9.3E-05 1.996 0 6.1E-05
    nSCAD 1.870 0 0.00099 1.926 0 0.00085 1.984 0 0.00070
  0.4 LASSO 1.456 0 0.00340 1.626 0 0.00192 1.752 0 0.00121
    SCAD 1.466 0 0.00176 1.598 0 0.00117 1.784 0 0.00079
    nSCAD 0.974 0 0.00883 0.994 0 0.00723 1.000 0 0.00630
  0.6 LASSO 1.024 0 0.03529 1.262 0 0.01561 1.438 0 0.00424
    SCAD 1.068 0 0.00821 1.264 0 0.00594 1.448 0 0.00421
    nSCAD 0.170 0 0.03655 0.330 0 0.03483 0.496 0 0.02724

Table 4.

Variable selections for α() with the AR(1) correlation structure.

      n = 150 n = 200 n = 300
ρ σu Method C IC RASE C IC RASE C IC RASE
0.3 0.2 LASSO 3.446 0 0.11341 3.790 0 0.10114 3.952 0 0.08951
    SCAD 3.454 0 0.11330 3.816 0 0.10087 3.958 0 0.08931
    nSCAD 3.488 0 0.13156 3.716 0 0.11831 3.960 0 0.10979
  0.4 LASSO 3.324 0 0.18076 3.672 0 0.14687 3.924 0 0.11895
    SCAD 3.396 0 0.17887 3.744 0 0.14504 3.954 0 0.11742
    nSCAD 3.064 0 0.30761 3.546 0 0.29106 3.836 0 0.27361
  0.6 LASSO 3.132 0 0.26516 3.628 0 0.20858 3.868 0 0.16263
    SCAD 3.276 0 0.25738 3.732 0 0.19892 3.934 0 0.15858
    nSCAD 2.694 0 0.60629 2.986 0 0.57842 3.468 0 0.55978
0.7 0.2 LASSO 3.416 0 0.11313 3.780 0 0.09946 3.964 0 0.08908
    SCAD 3.474 0 0.11247 3.806 0 0.09909 3.966 0 0.08878
    nSCAD 3.396 0 0.13017 3.710 0 0.11741 3.968 0 0.10843
  0.4 LASSO 3.278 0 0.17854 3.706 0 0.14404 3.924 0 0.11779
    SCAD 3.372 0 0.17591 3.782 0 0.14356 3.950 0 0.11729
    nSCAD 3.150 0 0.30647 3.496 0 0.28994 3.808 0 0.27429
  0.6 LASSO 3.088 0 0.26305 3.574 0 0.20768 3.86 0 0.16407
    SCAD 3.226 0 0.26080 3.67 0 0.20441 3.92 0 0.15910
    nSCAD 2.656 0 0.6041 3.018 0 0.58127 3.522 0 0.56007

5.2. Real example analysis

We now illustrate the performance of the proposed method through an analysis of the AIDS dataset, which contains variables such as the mean CD4 percentage, smoking status, the pre-HIV-infection CD4 percentage (preCD4) and age. The data are unbalanced and available in the R package timereg. More details of the study design and medical implications can be found in [11]. The dataset has been used to illustrate partial linear varying coefficient models [20] and partial linear varying coefficient EV models [35]. Zhou and Liang [37] and Tian et al. [20] indicated that only the baseline function varies over time and that preCD4 has a constant effect. We now allow for measurement errors in the covariates and analyze this dataset using the proposed method.

For simplicity, following Zhao and Xue [36], we consider the following model. Let $Y$ be the individual's CD4 percentage, $X_1$ the centered preCD4 percentage, $X_2 = X_1^2$, $Z_1$ the centered age at HIV infection, and $Z_2 = Z_1^2$:

$$Y = X_1\beta_1 + X_2\beta_2 + \alpha_0(t) + Z_1\alpha_1(t) + Z_2\alpha_2(t) + \epsilon, \qquad (32)$$

where $\alpha_0(t)$ is the baseline CD4 percentage, $\beta_1$ and $\beta_2$ describe the first-order and second-order effects of the preCD4 percentage, $\alpha_1(t)$ and $\alpha_2(t)$ describe the first-order and second-order effects of the age at HIV infection, and $t$ is the visiting time for each patient.

For the AIDS dataset, we can neither obtain repeated measurements of the covariates nor estimate the variance of the measurement errors. Following Lin and Carroll (2000), a sensitivity analysis can be used to assess the practicability of the proposed method. Similar to Zhao and Xue [36], we assume that $X_1$ and $Z_1$ are subject to additive measurement errors:

$$W_1 = X_1 + w_1, \qquad U_1 = Z_1 + u_1,$$

where $w_1 \sim N(0, \sigma_w^2)$ and $u_1 \sim N(0, \sigma_u^2)$. We took $\sigma_w = \sigma_u = 0, 0.5, 1$ to represent different levels of measurement error; obviously, $\sigma_w = \sigma_u = 0$ corresponds to no measurement error.
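Operationally, the sensitivity analysis only requires contaminating the observed covariates at each noise level and refitting. A small sketch of that step follows; the fitting call is a hypothetical wrapper around the procedure of Section 4, not a function from the paper.

```python
# Sketch: contaminate X1 and Z1 at a given noise level for the sensitivity analysis.
import numpy as np

def contaminate(X1, Z1, sigma, rng=np.random.default_rng(1)):
    """Return W1 = X1 + w1 and U1 = Z1 + u1 with N(0, sigma^2) errors."""
    W1 = X1 + rng.normal(0.0, sigma, size=np.shape(X1))
    U1 = Z1 + rng.normal(0.0, sigma, size=np.shape(Z1))
    return W1, U1

# for sigma in (0.0, 0.5, 1.0):
#     W1, U1 = contaminate(X1, Z1, sigma)
#     fit_and_select(Y, W1, U1, t)   # hypothetical wrapper around Section 4
```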

We repeated the proposed model selection procedure, and under every measurement error level it identified the same two non-zero components, $\beta_1$ and $\alpha_0(t)$. This means that the first-order and second-order effects of age at HIV infection have no significant impact on the mean CD4 percentage, and the same holds for the second-order effect of the centered preCD4 percentage and the interaction between the preCD4 percentage and age at HIV infection. Our result agrees with Zhao and Xue [35].

Figure 1 shows the curve of $\hat\alpha_0(t)$ over time under the different measurement error levels. It shows that $\alpha_0(t)$ decreases quickly at the beginning of HIV infection and that the rate of decrease then slows down, similar to the findings in Zhao and Xue [36]. Furthermore, the estimated curve $\hat\alpha_0(t)$ preserves its shape under the different measurement errors, which indicates that our bias-corrected model selection scheme works well and further demonstrates the practical value of the proposed method.

Figure 1. The curve of $\hat\alpha_0(t)$ for the cases $\sigma_w = \sigma_u = 0$ (solid curve), $\sigma_w = \sigma_u = 0.5$ (dashed curve) and $\sigma_w = \sigma_u = 1$ (dotted curve).

6. Conclusion and discussion

Longitudinal data arise widely in scientific fields, and it is of great significance to account for measurement error in longitudinal research. Longitudinal data also exhibit unknown within-subject correlations, so handling within-subject correlation and measurement error together is an important problem in the analysis of longitudinal data with measurement errors. In our work, we considered the case where the covariates of model (2) have additive measurement errors. Valuable research on model (2) exists, such as [9,33,35-37]; however, no study had been reported on simultaneous model estimation and selection for model (2) with longitudinal data. We proposed a bias-corrected penalized quadratic inference functions method for this purpose. The method handles both within-subject correlation and measurement errors, and under some conditions it selects the significant non-zero parameters and varying coefficients. Furthermore, the estimators of the non-zero coefficient functions achieve the optimal convergence rate, and the estimators of the parameters are asymptotically normal. Numerical studies demonstrated the finite-sample performance of the proposed method. In conclusion, the proposed method has good theoretical and practical value for the estimation and selection of model (2).

The proposed method can also be applied to other models, such as generalized partial linear additive models, generalized partial linear single-index models and many others, as well as to other types of correlated data, such as panel data and clustered data. In future work, we will use this method to study more complex models.

Acknowledgements

This work is supported by grants from the Social Science Foundation of China (15CTJ008 to MZ), the Natural Science Foundation of Anhui Universities (KJ2017A433 to KZ), the Social Science Foundation of the Ministry of Education of China (19YJCZH250 to KZ), the National Science Foundation of China (12071305, 11871390 and 11871411 to YZ), the Excellent Young Talents Fund Program of Higher Education Institutions of Anhui Province (gxyqZD2019031 to YZ), and the National Science Foundation of China (71803001 to YZ). This paper is partially supported by the National Natural Science Foundation of China (11901401). All authors read and approved the final manuscript.

Appendix. Proofs of the theorems.

Lemma 1. If C1-C11 hold and $K = O(N^{1/(2r+1)})$, then we have

$$\hat{\bar g}'_n(\theta) \xrightarrow{p} -J_0, \qquad \sqrt n\,\hat{\bar g}_n(\theta_0) \xrightarrow{L} N(0, \Omega_0).$$

Proof.

According to (17), we have

$$\hat{\bar g}'_n(\theta) = -\frac1n\sum_{i=1}^n\begin{pmatrix}(W_i,\tilde U_i)^TA_i^{-1/2}M_1A_i^{-1/2}(W_i,\tilde U_i) - \hat D_i^{(1)}\\ \vdots\\ (W_i,\tilde U_i)^TA_i^{-1/2}M_sA_i^{-1/2}(W_i,\tilde U_i) - \hat D_i^{(s)}\end{pmatrix}.$$

Denote the $\kappa$th block of $\hat{\bar g}'_n(\theta)$ by $\hat{\bar g}'_{n\kappa}(\theta)$, $\kappa = 1, 2, \ldots, s$:

$$\begin{aligned}
\hat{\bar g}'_{n\kappa}(\theta) &= -\frac1n\sum_{i=1}^n\Big[(W_i,\tilde U_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(W_i,\tilde U_i) - \hat D_i^{(\kappa)}\Big]\\
&= -\frac1n\sum_{i=1}^n\Big[(X_i + w_i, \tilde Z_i + \tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(X_i + w_i, \tilde Z_i + \tilde u_i) - \hat D_i^{(\kappa)}\Big]\\
&= -\frac1n\sum_{i=1}^n\Big[(X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(X_i,\tilde Z_i) + (X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(w_i,\tilde u_i)\\
&\qquad + (w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(X_i,\tilde Z_i) + (w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(w_i,\tilde u_i) - \hat D_i^{(\kappa)}\Big]\\
&= -\Big(\Delta_1 + \Delta_2 + \Delta_3 + \Delta_4 - \frac1n\sum_{i=1}^n\hat D_i^{(\kappa)}\Big).
\end{aligned}$$

We first prove that $\Delta_4 - \frac1n\sum_{i=1}^n\hat D_i^{(\kappa)} \xrightarrow{p} 0$ as $n \to \infty$. Note that

$$\Delta_4 - \frac1n\sum_{i=1}^n\hat D_i^{(\kappa)} = \Big[\frac1n\sum_{i=1}^n(w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(w_i,\tilde u_i) - \frac1n\sum_{i=1}^nD_i^{(\kappa)}\Big] + \frac1n\sum_{i=1}^n\Big[D_i^{(\kappa)} - \hat D_i^{(\kappa)}\Big].$$

By the law of large numbers, both bracketed terms converge to zero in probability as $n \to \infty$, so $\Delta_4 - \frac1n\sum_{i=1}^n\hat D_i^{(\kappa)} \xrightarrow{p} 0$. Under C9, $\Delta_1 \xrightarrow{p} J_0^{(\kappa)}$. We now prove that $\Delta_2 \xrightarrow{p} 0$ and $\Delta_3 \xrightarrow{p} 0$.

Write $\Delta_2 = \frac1n\sum_{i=1}^n\xi_{i\kappa}$, where $\xi_{i\kappa} = (X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(w_i,\tilde u_i)$. Obviously $E(\xi_{i\kappa}) = 0$ and

$$\operatorname{cov}(\xi_{i\kappa}) = (X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}\,E\big[(w_i,\tilde u_i)(w_i,\tilde u_i)^T\big]\,A_i^{-1/2}M_\kappa A_i^{-1/2}(X_i,\tilde Z_i),$$

where $E\big[(w_i,\tilde u_i)(w_i,\tilde u_i)^T\big] = \operatorname{diag}\big(E(w_iw_i^T), E(\tilde u_i\tilde u_i^T)\big)$. From C4-C7, $E(w_iw_i^T)$ and $E(\tilde u_i\tilde u_i^T)$ are bounded. By the law of large numbers, $\Delta_3^T = \Delta_2 \xrightarrow{p} 0$. Thus $\hat{\bar g}'_{n\kappa}(\theta) \xrightarrow{p} -J_0^{(\kappa)}$ and $\hat{\bar g}'_n(\theta) \xrightarrow{p} -J_0$, where $J_0 = (J_0^{(1)T}, J_0^{(2)T}, \ldots, J_0^{(s)T})^T$.

Applying a Taylor expansion to $\hat{\bar g}_n(\theta)$ at $\theta_0$, we have

$$\hat{\bar g}_n(\theta) = \hat{\bar g}_n(\theta_0) + \hat{\bar g}'_n(\theta_0)(\theta - \theta_0) + o_p(\|\theta - \theta_0\|). \qquad (A1)$$

Denote the $\kappa$th block of $\hat{\bar g}_n(\theta_0)$ by $\hat{\bar g}_{n\kappa}(\theta_0)$, $\kappa = 1, 2, \ldots, s$:

$$\begin{aligned}
\hat{\bar g}_{n\kappa}(\theta_0) &= \frac1n\sum_{i=1}^n\Big[(W_i,\tilde U_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(Y_i - W_i\beta_0 - \tilde U_i\gamma_0) + \hat D_i^{(\kappa)}\theta_0\Big]\\
&= \frac1n\sum_{i=1}^n\Big[\big((X_i,\tilde Z_i) + (w_i,\tilde u_i)\big)^TA_i^{-1/2}M_\kappa A_i^{-1/2}\big(\epsilon_i - (w_i,\tilde u_i)\theta_0 + Z_iR(t_i)\big) + \hat D_i^{(\kappa)}\theta_0\Big]\\
&= \frac1n\sum_{i=1}^n(X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}\epsilon_i - \frac1n\sum_{i=1}^n(X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(w_i,\tilde u_i)\theta_0\\
&\quad + \frac1n\sum_{i=1}^n(w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}Z_iR(t_i) + \frac1n\sum_{i=1}^n(w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}\epsilon_i\\
&\quad + \frac1n\sum_{i=1}^n(X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}Z_iR(t_i) - \frac1n\sum_{i=1}^n(w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(w_i,\tilde u_i)\theta_0\\
&\quad + \frac1n\sum_{i=1}^n\hat D_i^{(\kappa)}\theta_0\\
&= J_1 - J_2 + J_3 + J_4 + J_5 - J_6 + \frac1n\sum_{i=1}^n\hat D_i^{(\kappa)}\theta_0,
\end{aligned}$$

where $R(t) = (R_1(t), R_2(t), \ldots, R_q(t))^T$ and $R_l(t) = \alpha_l(t) - B^T(t)\gamma_{l0}$, $l = 1, 2, \ldots, q$.

Write $J_1 = \frac1n\sum_{i=1}^n\varphi_i$, where $\varphi_i = (X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}\epsilon_i$. According to C5-C7, we have $E(\varphi_i) = 0$ and

$$\operatorname{cov}(\varphi_i) = (X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}V_iA_i^{-1/2}M_\kappa A_i^{-1/2}(X_i,\tilde Z_i) < \infty.$$

By the law of large numbers, $J_1 \xrightarrow{p} 0$. Similarly, $J_2 \xrightarrow{p} 0$ and $J_3 \xrightarrow{p} 0$.

Write $J_4 = \frac1n\sum_{i=1}^n\phi_i$, where $\phi_i = (w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}\epsilon_i$. Since $\epsilon_i$, $w_i$ and $\tilde u_i$ are mutually independent, $E(\phi_i) = 0$. By the Cauchy-Schwarz inequality and C5-C7, we have

$$\|\operatorname{cov}(\phi_i)\|^2 \le E\big[(w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(w_i,\tilde u_i)\big]\,E\big[\epsilon_i^TA_i^{-1/2}M_\kappa A_i^{-1/2}\epsilon_i\big] < \infty.$$

Thus $J_4 \xrightarrow{p} 0$. By the law of large numbers and the definition of $\hat D_i^{(\kappa)}$, we have $J_6 - \frac1n\sum_{i=1}^n\hat D_i^{(\kappa)}\theta_0 \xrightarrow{p} 0$. From C8 and the B-spline approximation bound (see (A5) below), we have $J_5 = O_p(n^{-1/2}K^{-r}) = o_p(n^{-1/2})$ and $J_3 = o_p(n^{-1/2})$. So, according to (A1), we have $\hat{\bar g}_n(\theta) \xrightarrow{p} J_0(\theta_0 - \theta)$ for $\theta \in \Theta$.

Following [20] and using the results above, we have

$$\begin{aligned}
\hat{\bar g}_{n\kappa}(\theta_0) &= \frac1n\sum_{i=1}^n\Big[\big((X_i,\tilde Z_i) + (w_i,\tilde u_i)\big)^TA_i^{-1/2}M_\kappa A_i^{-1/2}\big(\epsilon_i - (w_i,\tilde u_i)\theta_0\big) + \hat D_i^{(\kappa)}\theta_0\Big] + o_p(n^{-1/2})\\
&= \frac1n\sum_{i=1}^n\Big[(X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}\epsilon_i - (X_i,\tilde Z_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(w_i,\tilde u_i)\theta_0\\
&\qquad + (w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}\epsilon_i - \big((w_i,\tilde u_i)^TA_i^{-1/2}M_\kappa A_i^{-1/2}(w_i,\tilde u_i) - \hat D_i^{(\kappa)}\big)\theta_0\Big] + o_p(n^{-1/2})\\
&= \frac1n\sum_{i=1}^n(\psi_{i\kappa1} + \psi_{i\kappa2} + \psi_{i\kappa3} + \psi_{i\kappa4}) + o_p(n^{-1/2}) = \frac1n\sum_{i=1}^n\psi_{i\kappa} + o_p(n^{-1/2}),
\end{aligned}$$

where $\psi_i = (\psi_{i1}, \psi_{i2}, \ldots, \psi_{is})^T$ and $\psi_{i\kappa} = \psi_{i\kappa1} + \psi_{i\kappa2} + \psi_{i\kappa3} + \psi_{i\kappa4}$. So we have

$$\hat{\bar g}_n(\theta_0) = \frac1n\sum_{i=1}^n\psi_i + o_p(n^{-1/2}), \qquad \Omega_n(\theta_0) = \frac1n\sum_{i=1}^n\psi_i\psi_i^T + o_p(1).$$

From C5-C7, we get $E(\psi_{i\kappa m}) = 0$ and $\|\operatorname{cov}(\psi_{i\kappa m})\| < \infty$, $m = 1, 2, 3, 4$. By the properties of covariance matrices, we have

$$\|\operatorname{cov}(\psi_{i\kappa})\| \le \sum_{m=1}^4\|\operatorname{cov}(\psi_{i\kappa m})\| + \sum_{m\neq l}\|\operatorname{cov}(\psi_{i\kappa m}, \psi_{i\kappa l})\| < \infty.$$

For any $a \in \mathbb{R}^{s(p+q(K+d))}$ with $a^Ta = 1$, we have $E(a^T\psi_i) = 0$ and $\sup_iE|a^T\psi_i|^3 \le \sup_iE\|\psi_i\|^3 < \infty$.

So $a^T\psi_i$ satisfies the Lyapunov condition for the central limit theorem. Thus

$$\Big(a^T\sum_{i=1}^n\operatorname{cov}(\psi_i)\,a\Big)^{-1/2}\sum_{i=1}^na^T\psi_i \xrightarrow{L} N(0, 1).$$

By the Slutsky theorem, $\sqrt n\,\hat{\bar g}_n(\theta_0) \xrightarrow{L} N(0, \Omega_0)$ and $\hat{\bar g}_n(\theta_0) = O_p(n^{-1/2})$. The proof of Lemma 1 is complete.

Lemma 2

If C1-C11 hold and $K = O(n^{1/(2r+1)})$, then

$$n^{-1}\dot Q_n(\theta_0) - 2\hat{\bar g}'^T_n(\theta_0)\Omega_n^{-1}\hat{\bar g}_n(\theta_0) = O_p(n^{-1}), \qquad (A2)$$
$$n^{-1}\ddot Q_n(\theta_0) - 2\hat{\bar g}'^T_n(\theta_0)\Omega_n^{-1}\hat{\bar g}'_n(\theta_0) = o_p(1). \qquad (A3)$$

Proof.

The proof of Lemma 2 is similar to that of Lemma 2 in Tian et al. [20] and is omitted here.

Proof of Theorem 3.1.

Proof.

Let $\delta = n^{-r/(2r+1)}$, $\beta = \beta_0 + \delta C_1$, $\gamma = \gamma_0 + \delta C_2$ and $C = (C_1^T, C_2^T)^T$. To prove Theorem 3.1, it suffices to show that for any $\epsilon > 0$ there exists a large constant $C_0$ such that

$$P\Big\{\inf_{\|C\| = C_0}Q_p(\theta) \ge Q_p(\theta_0)\Big\} \ge 1 - \epsilon. \qquad (A4)$$

Obviously, when $\epsilon \ge 1$ we have $1 - \epsilon \le 0$ and (A4) holds trivially, so we consider $\epsilon \in (0, 1)$. Assume $\beta_k = 0\ (k = p_1+1, \ldots, p)$, $\alpha_l(\cdot) \equiv 0\ (l = q_1+1, \ldots, q)$ and recall $p_\lambda(0) = 0$. Let $\Delta(\beta, \gamma) = \frac1K[Q_p(\theta) - Q_p(\theta_0)]$ with $\theta_0 = (\beta_0^T, \gamma_0^T)^T$; then

$$\Delta(\beta,\gamma) = \frac1K[Q_n(\theta) - Q_n(\theta_0)] + \frac nK\sum_{k=1}^{p_1}\big[p_{\lambda_{1k}}(|\beta_k|) - p_{\lambda_{1k}}(|\beta_{k0}|)\big] + \frac nK\sum_{l=1}^{q_1}\big[p_{\lambda_{2l}}(\|\gamma_l\|_H) - p_{\lambda_{2l}}(\|\gamma_{l0}\|_H)\big] = \Delta_1 + \Delta_2 + \Delta_3.$$

Applying a Taylor expansion to $Q_n(\theta)$ at $\theta_0$, we have

$$Q_n(\theta) = Q_n(\theta_0 + \delta C) = Q_n(\theta_0) + \delta C^T\dot Q_n(\theta_0) + \frac12\delta^2C^T\ddot Q_n(\tilde\theta)C,$$

where $\tilde\theta$ lies between $\theta$ and $\theta_0$. According to Lemmas 1 and 2, we can get

$$\delta C^T\dot Q_n(\theta_0) = \delta C^T\big\{2n\hat{\bar g}'^T_n(\theta_0)\Omega_n^{-1}\hat{\bar g}_n(\theta_0) + nO_p(n^{-1})\big\} = \|C\|O_p(\sqrt n\,\delta) + \|C\|O_p(\delta),$$

and

$$\delta^2C^T\ddot Q_n(\theta_0)C = \delta^2C^T\big\{2n\hat{\bar g}'^T_n(\theta_0)\Omega_n^{-1}\hat{\bar g}'_n(\theta_0) + n\,o_p(1)\big\}C = 2n\delta^2C^T\hat{\bar g}'^T_n(\theta_0)\Omega_n^{-1}\hat{\bar g}'_n(\theta_0)C + n\delta^2\|C\|^2o_p(1).$$

Therefore, we have

$$\Delta_1 = \frac1K\big\{n\delta^2C^TJ_0^T\Omega_0^{-1}J_0C + \|C\|O_p(\sqrt n\,\delta) + \|C\|O_p(\delta) + n\delta^2\|C\|^2o_p(1)\big\}.$$

Obviously, $n\delta^2C^TJ_0^T\Omega_0^{-1}J_0C \ge 0$, and when $\|C\|$ is large enough,

$$n\delta^2C^TJ_0^T\Omega_0^{-1}J_0C \gg \|C\|O_p(\sqrt n\,\delta), \qquad n\delta^2C^TJ_0^T\Omega_0^{-1}J_0C \gg n\delta^2\|C\|^2o_p(1).$$

So when $\|C\|$ is large enough, $\Delta_1 > 0$. Next, by Taylor expansion, we get

$$\Delta_2 = \frac nK\sum_{k=1}^{p_1}\big[p_{\lambda_{1k}}(|\beta_k|) - p_{\lambda_{1k}}(|\beta_{k0}|)\big] = \frac1K\sum_{k=1}^{p_1}\big[n\delta\,p'_{\lambda_{1k}}(|\beta_{k0}|)\operatorname{sgn}(\beta_{k0})|C_1| + n\delta^2p''_{\lambda_{1k}}(|\beta_{k0}|)|C_1|^2(1 + o(1))\big] \le \frac1K\big\{\sqrt{p_1}\,n\delta\,a_n\|C\| + n\delta^2a_n\|C\|^2\big\}.$$

Then $\Delta_2$ is dominated by $\Delta_1$ uniformly in $\|C\| = C_0$ for a sufficiently large $C_0$.

Assume $\lambda_{1k} \to 0$, $\lambda_{2l} \to 0$ and $K = O(n^{1/(2r+1)})$. When $n$ is large enough, following Xue et al. [29], we have $\|\gamma_l\|_H \ge a\lambda_{2l}$ and $\|\gamma_{l0}\|_H \ge a\lambda_{2l}$. According to the definition of the penalty function, we get

$$p_{\lambda_{2l}}(\|\gamma_l\|_H) = p_{\lambda_{2l}}(\|\gamma_{l0}\|_H) = \frac{(1+a)\lambda_{2l}^2}{2}, \qquad \sum_{l=1}^{q_1}n\big[p_{\lambda_{2l}}(\|\gamma_l\|_H) - p_{\lambda_{2l}}(\|\gamma_{l0}\|_H)\big] = 0.$$

So for any $\epsilon > 0$ there exists a large enough $C_0$ satisfying (A4), which implies that there exists $\hat\theta$ with $\|\hat\theta - \theta_0\| = O_p(\delta) = O_p(n^{-r/(2r+1)})$. Note that

$$\begin{aligned}
\|\hat\alpha_l(t) - \alpha_l(t)\|^2 &= \int_0^1\big\{B^T(t)\hat\gamma_l - B^T(t)\gamma_{l0} + B^T(t)\gamma_{l0} - \alpha_l(t)\big\}^2dt\\
&\le 2\int_0^1\big\{B^T(t)\hat\gamma_l - B^T(t)\gamma_{l0}\big\}^2dt + 2\int_0^1\big\{\alpha_l(t) - B^T(t)\gamma_{l0}\big\}^2dt\\
&= 2(\hat\gamma_l - \gamma_{l0})^T\int_0^1B(t)B^T(t)\,dt\,(\hat\gamma_l - \gamma_{l0}) + 2\int_0^1R_l(t)^2dt\\
&= 2(\hat\gamma_l - \gamma_{l0})^TH(\hat\gamma_l - \gamma_{l0}) + 2\int_0^1R_l(t)^2dt.
\end{aligned}$$

By the same arguments as above, $\|\hat\gamma - \gamma_0\| = O_p(n^{-r/(2r+1)})$. Therefore, since $\|H\| = O(1)$, we have $(\hat\gamma_l - \gamma_{l0})^TH(\hat\gamma_l - \gamma_{l0}) = O_p(n^{-2r/(2r+1)})$.

Suppose C2 and C8 hold and $K = O(N^{1/(2r+1)})$. By Corollary 6.21 in [18], there exists a constant $c_0$ such that

$$\sup_{t\in[0,1]}\big|\alpha_l(t) - B^T(t)\gamma_{l0}\big| \le c_0K^{-r}, \qquad l = 1, 2, \ldots, q. \qquad (A5)$$

So $\int_0^1R_l(t)^2dt = O_p(n^{-2r/(2r+1)})$, and the proof of Theorem 3.1 is complete.

Proof of Theorem 3.2.

Proof.

Part (i). Write $Q_p(\theta) = Q_p(\beta, \gamma)$. By Theorem 3.1, similar to [20], it suffices to show that for any $\gamma$ satisfying $\|\gamma - \gamma_0\| = O_p(n^{-r/(2r+1)})$, any $\beta_k$ satisfying $|\beta_k - \beta_{k0}| = O_p(n^{-r/(2r+1)})\ (k = 1, 2, \ldots, p_1)$, and some small $\epsilon_n = O_p(n^{-r/(2r+1)})$, with probability tending to one as $n \to \infty$ we have

$$\frac{\partial Q_p(\beta,\gamma)}{\partial\beta_k} > 0 \quad\text{for } 0 < \beta_k < \epsilon_n,\ k = p_1+1, \ldots, p, \qquad (A6)$$

and

$$\frac{\partial Q_p(\beta,\gamma)}{\partial\beta_k} < 0 \quad\text{for } -\epsilon_n < \beta_k < 0,\ k = p_1+1, \ldots, p. \qquad (A7)$$

Obviously, (A6) and (A7) imply that the minimizer of $Q_p(\beta, \gamma)$ over $\beta_k$ is attained at $\hat\beta_k = 0\ (k = p_1+1, \ldots, p)$.

According to Lemma 2, we have

$$\frac{\partial Q_p(\beta,\gamma)}{\partial\beta_k} = 2n\frac{\partial\hat{\bar g}_n^T(\beta,\gamma)}{\partial\beta_k}\Omega_n^{-1}\hat{\bar g}_n(\beta,\gamma) + o_p(1) + np'_{\lambda_{1k}}(|\beta_k|)\operatorname{sgn}(\beta_k) = n\lambda_{1k}\Big\{2\lambda_{1k}^{-1}\frac{\partial\hat{\bar g}_n^T(\beta,\gamma)}{\partial\beta_k}\Omega_n^{-1}\hat{\bar g}_n(\beta,\gamma) + \lambda_{1k}^{-1}p'_{\lambda_{1k}}(|\beta_k|)\operatorname{sgn}(\beta_k)\Big\} + o_p(1).$$

Write $\frac{\partial\hat{\bar g}_n^T(\theta)}{\partial\theta} = \Big(\frac{\partial\hat{\bar g}_n^T(\beta,\gamma)}{\partial\beta_1}, \ldots, \frac{\partial\hat{\bar g}_n^T(\beta,\gamma)}{\partial\beta_p}, \frac{\partial\hat{\bar g}_n^T(\beta,\gamma)}{\partial\gamma_1}, \ldots, \frac{\partial\hat{\bar g}_n^T(\beta,\gamma)}{\partial\gamma_q}\Big)$. According to Lemma 1, $\frac{\partial\hat{\bar g}_n^T(\beta,\gamma)}{\partial\beta_k} \xrightarrow{p} J_{\beta_k}$, where $J_0 = (J_{\beta_1}, \ldots, J_{\beta_p}, J_{\gamma_1}, \ldots, J_{\gamma_q})$. Thus we get

$$\frac{\partial Q_p(\beta,\gamma)}{\partial\beta_k} = n\lambda_{1k}\big\{O_p(\lambda_{1k}^{-1}n^{-1/2}) + \lambda_{1k}^{-1}p'_{\lambda_{1k}}(|\beta_k|)\operatorname{sgn}(\beta_k)\big\} + o_p(1).$$

In addition, C11 implies that $\liminf_{n\to\infty}\liminf_{\beta_k\to0^+}\lambda_{1k}^{-1}p'_{\lambda_{1k}}(|\beta_k|) > 0$, and $\lambda_{1k}^{-1}n^{-1/2} \to 0$, so the sign of $\partial Q_p(\beta,\gamma)/\partial\beta_k$ is the same as that of $\beta_k$. Hence (A6) and (A7) hold, and the proof of part (i) is complete.

We then prove part (ii). Denote

$$\Theta_1 = \big\{\theta : \theta = (\beta^T, \gamma^T)^T,\ \gamma_l = 0,\ l = q_1+1, \ldots, q\big\},$$
$$\Theta_l = \big\{\theta : \theta = (\beta^T, 0^T, \ldots, 0^T, \gamma_l^T, 0^T, \ldots, 0^T)^T\big\}, \qquad l = q_1+1, \ldots, q,$$

where $0$ is the $(K+d)\times1$ vector with all components zero.

To prove part (ii), it suffices to show that for any $\theta \in \Theta_1$ and $\theta_l \in \Theta_l$, $Q_p(\theta + \theta_l) \ge Q_p(\theta)$ holds with probability tending to 1. We have

$$\begin{aligned}
Q_p(\theta + \theta_l) - Q_p(\theta) &= Q_n(\theta + \theta_l) - Q_n(\theta) + np_{\lambda_{2l}}(\|\gamma_l\|_H)\\
&= \theta_l^T\dot Q_n(\theta) + \frac12\theta_l^T\ddot Q_n(\hat\theta_l)\theta_l(1 + o_p(1)) + np_{\lambda_{2l}}(\|\gamma_l\|_H)\\
&= n\lambda_{2l}\Big\{\frac{\|B^T(t)\gamma_l\|R_l}{\lambda_{2l}} + \frac{p'_{\lambda_{2l}}(t^*)}{\lambda_{2l}}\Big\}(1 + o_p(1)),
\end{aligned}$$

where $\hat\theta_l$ lies between $\theta + \theta_l$ and $\theta$, and $t^* \in (0, \|\gamma_l\|_H)$. Furthermore, we get

$$R_l = \frac{\theta_l^Tn^{-1}\dot Q_n(\hat\theta_l) + \frac12\theta_l^Tn^{-1}\ddot Q_n(\hat\theta_l)\theta_l}{\|B^T(t)\gamma_l\|}.$$

Note that $\alpha_{l0} \equiv 0$ for $l = q_1+1, \ldots, q$; from Lemma 1 and [29], we have $\|B^T(t)\gamma_l\| = O(n^{-r/(2r+1)})$ and $\|B^T(t)\gamma_l\|/\lambda_{2l} = O(n^{-r/(2r+1)}/\lambda_{2l})$. According to Lemmas 1 and 2, we have

$$\theta_l^Tn^{-1}\dot Q_n(\hat\theta_l) = O_p(n^{-1/2}) = o_p(1), \qquad \theta_l^Tn^{-1}\ddot Q_n(\hat\theta_l)\theta_l = \theta_l^TJ_0^T\Omega_0^{-1}J_0\theta_l + o_p(1) < +\infty,$$
$$\frac{R_l}{\lambda_{2l}} = \frac{\theta_l^TJ_0^T\Omega_0^{-1}J_0\theta_l\,\|B^T(t)\gamma_l\|}{\lambda_{2l}} + o_p(1) \to 0.$$

From C10 and C11, for $t^*$ between 0 and $\|\gamma_l\|_H$,

$$\liminf_{n\to\infty}\liminf_{\|\gamma_l\|_H\to0^+}\frac{p'_{\lambda_{2l}}(t^*)}{\lambda_{2l}} > 0, \qquad l = q_1+1, \ldots, q.$$

Thus, for any $\theta \in \Theta_1$ and $\theta_l \in \Theta_l$, $Q_p(\theta + \theta_l) \ge Q_p(\theta)$ with probability tending to 1, which establishes part (ii). The proof of Theorem 3.2 is complete.

Proof of Theorem 3.3.

Proof.

Let $\beta_0$ be the true value of $\beta$, let $\alpha(t) = (\alpha_1(t), \alpha_2(t), \ldots, \alpha_{q_1}(t))^T$ with true value $\alpha_0(t)$, and let $\gamma$ and $\gamma_0$ be the spline coefficients of $\alpha(t)$ and $\alpha_0(t)$, respectively. Theorems 3.1 and 3.2 imply that $Q_p(\theta)$ attains its minimum at $(\hat\beta^T, 0^T)^T$ and $(\hat\gamma^T, 0^T)^T$.

Denote $\theta = (\beta^T, 0^T, \gamma^T, 0^T)^T$, $\theta_0 = (\beta_0^T, 0^T, \gamma_0^T, 0^T)^T$ and $\hat\theta = (\hat\beta^T, 0^T, \hat\gamma^T, 0^T)^T$, and write $\hat{\bar g}'_n(\theta) = \frac{\partial}{\partial\theta}\hat{\bar g}_n(\theta) = \big(\frac{\partial}{\partial\beta}\hat{\bar g}_n(\theta), \frac{\partial}{\partial\gamma}\hat{\bar g}_n(\theta)\big) = \big(\hat{\bar g}'_\beta(\theta), \hat{\bar g}'_\gamma(\theta)\big)$. We have

$$S_n(\theta) = \begin{pmatrix}\hat{\bar g}'^T_\beta(\theta)\Omega_n^{-1}\hat{\bar g}_n(\theta)\\ \hat{\bar g}'^T_\gamma(\theta)\Omega_n^{-1}\hat{\bar g}_n(\theta)\end{pmatrix}, \qquad H_n(\theta) = \begin{pmatrix}H_{11} & H_{12}\\ H_{21} & H_{22}\end{pmatrix} = \begin{pmatrix}\hat{\bar g}'^T_\beta(\theta)\Omega_n^{-1}\hat{\bar g}'_\beta(\theta) & \hat{\bar g}'^T_\beta(\theta)\Omega_n^{-1}\hat{\bar g}'_\gamma(\theta)\\ \hat{\bar g}'^T_\gamma(\theta)\Omega_n^{-1}\hat{\bar g}'_\beta(\theta) & \hat{\bar g}'^T_\gamma(\theta)\Omega_n^{-1}\hat{\bar g}'_\gamma(\theta)\end{pmatrix}.$$

Denote $p_\lambda(\theta) = \sum_{k=1}^pp_{\lambda_{1k}}(|\beta_k|) + \sum_{l=1}^qp_{\lambda_{2l}}(\|\gamma_l\|_H)$. According to (22), we have

$$\dot Q_p(\hat\theta) = \dot Q_n(\hat\theta) + n\dot p_\lambda(\hat\theta)\hat\theta = 0. \qquad (A8)$$

Applying a Taylor expansion to (A8), we have

$$\dot Q_n(\hat\theta) + n\dot p_\lambda(\hat\theta)\hat\theta = \dot Q_n(\theta_0) + n\dot p_\lambda(\theta_0)\theta_0 + \big\{\ddot Q_n(\theta_0) + n\ddot p_\lambda(\tilde\theta_0)\big\}(\hat\theta - \theta_0) = 0, \qquad (A9)$$

where $\tilde\theta_0$ lies between $\theta_0$ and $\hat\theta$. Therefore, we have

$$n^{-1}\big[\dot Q_n(\theta_0) + n\dot p_\lambda(\theta_0)\theta_0\big] = -n^{-1}\big\{\ddot Q_n(\theta_0) + n\ddot p_\lambda(\tilde\theta_0)\big\}(\hat\theta - \theta_0). \qquad (A10)$$

Note that $\dot p_{\lambda_1}(\beta) = \sum_{k=1}^{p_1}p'_{\lambda_{1k}}(|\hat\beta_k|)\operatorname{sgn}(\hat\beta_k)$, $\dot p_{\lambda_2}(\gamma) = \sum_{l=1}^{q_1}p'_{\lambda_{2l}}(\|\hat\gamma_l\|_H)\frac{H\hat\gamma_l}{\|\hat\gamma_l\|_H}$ and $\dot p_\lambda(\theta) = \dot p_{\lambda_1}(\beta) + \dot p_{\lambda_2}(\gamma)$. Applying a Taylor expansion to $p'_{\lambda_{1k}}(|\hat\beta_k|)$, we have

$$p'_{\lambda_{1k}}(|\hat\beta_k|) = p'_{\lambda_{1k}}(|\beta_{0k}|) + \big\{p''_{\lambda_{1k}}(|\beta_{0k}|) + o_p(1)\big\}(\hat\beta_k - \beta_{0k}).$$

C10 implies that $p''_{\lambda_{1k}}(|\beta_{0k}|) = o_p(1)$, and note that $p'_{\lambda_{1k}}(|\beta_{0k}|) = 0$ as $\lambda_{\max} \to 0$, so $\dot p_{\lambda_1}(\beta) = o_p(\hat\beta - \beta_0)$. Following [29], $\|\hat\gamma_l\|_H \ge a\lambda_{2l}$ for $n$ large enough, so $p'_{\lambda_{2l}}(\|\hat\gamma_l\|_H) = 0$ and $p''_{\lambda_{2l}}(\|\hat\gamma_l\|_H) = 0$, which imply $\dot p_{\lambda_2}(\gamma) = 0 = o_p(\hat\gamma - \gamma_0)$ and $\dot p_\lambda(\theta) = o_p(1)$. So we have

$$\sqrt nH_n(\hat\theta - \theta_0) = -\sqrt nS_n + o_p(1),$$
$$\sqrt n(\hat\beta - \beta_0) = -\big\{H_{11}(\theta_0) - H_{12}(\theta_0)H_{22}^{-1}(\theta_0)H_{21}(\theta_0)\big\}^{-1}\big(I, -H_{12}(\theta_0)H_{22}^{-1}(\theta_0)\big)\sqrt nS_n(\theta_0) + o_p(1).$$

From Lemma 2, we can get $n^{-1}\dot Q_n(\theta_0) - 2S_n(\theta_0) = O_p(n^{-1})$ and $n^{-1}\ddot Q_n(\theta_0) - 2H_n(\theta_0) = o_p(1)$. According to C9 and Lemma 1, we have

$$\frac{\partial\hat{\bar g}_n(\theta)}{\partial\theta} \xrightarrow{P} J_{0\theta}, \qquad \frac{\partial\hat{\bar g}_n(\theta)}{\partial\beta} \xrightarrow{P} J_{0\beta}, \qquad \frac{\partial\hat{\bar g}_n(\theta)}{\partial\gamma} \xrightarrow{P} J_{0\gamma},$$

and

$$\hat{\bar g}_n(\theta_0) = O_p(n^{-1/2}), \qquad \sqrt nS_n(\theta_0) = J_{0\theta}^T\Omega_0^{-1}\sqrt n\,\hat{\bar g}_n(\theta_0) + o_p(1) = O_p(1),$$
$$H_{11}(\theta_0) \xrightarrow{p} J_{0\beta}^T\Omega_0^{-1}J_{0\beta} = H_{110}, \qquad H_{22}(\theta_0) \xrightarrow{p} J_{0\gamma}^T\Omega_0^{-1}J_{0\gamma} = H_{220},$$
$$H_{12}(\theta_0) \xrightarrow{p} J_{0\beta}^T\Omega_0^{-1}J_{0\gamma} = H_{120}, \qquad H_{21}(\theta_0) \xrightarrow{p} J_{0\gamma}^T\Omega_0^{-1}J_{0\beta} = H_{210},$$

where $J_{0\theta} = (J_{0\beta}, J_{0\gamma})$.

Denote $A = \big\{H_{11}(\theta_0) - H_{12}(\theta_0)H_{22}^{-1}(\theta_0)H_{21}(\theta_0)\big\}^{-1}\big(I, -H_{12}(\theta_0)H_{22}^{-1}(\theta_0)\big)$. Hence we get

$$A \xrightarrow{p} \big\{H_{110} - H_{120}H_{220}^{-1}H_{210}\big\}^{-1}\big(I, -H_{120}H_{220}^{-1}\big) = A_0, \qquad (A11)$$
$$\sqrt nS_n(\theta_0) \xrightarrow{L} N\big(0, (J_{\theta_0}\Omega_0^{-1}J_{\theta_0}^T)^{-1}\big).$$

According to the Slutsky theorem, $\hat\beta$ is consistent and asymptotically normal:

$$\sqrt n(\hat\beta - \beta_0) \xrightarrow{L} N\big(0, A_0(J_{\theta_0}\Omega_0^{-1}J_{\theta_0}^T)^{-1}A_0^T\big). \qquad (A12)$$

This completes the proof of Theorem 3.3.


Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1. Carroll R.J., Ruppert D., Stefanski L.A. and Crainiceanu C.M., Measurement Error in Nonlinear Models: A Modern Perspective, Chapman and Hall/CRC, New York, 2006.
  • 2. Fan G.L., Xu H.X. and Huang Z.S., Empirical likelihood for semivarying coefficient model with measurement error in the nonparametric part, AStA Adv. Stat. Anal. 100 (2015), pp. 21–41.
  • 3. Fan G.L., Xu H.X. and Liang H.Y., Empirical likelihood inference for partially time-varying coefficient errors-in-variables models, Electron. J. Stat. 6 (2012), pp. 1040–1058.
  • 4. Fan J. and Huang T., Profile likelihood inferences on semiparametric varying-coefficient partially linear models, Bernoulli 11 (2005), pp. 1031–1057.
  • 5. Fan J. and Li R., Variable selection via nonconcave penalized likelihood and its oracle properties, J. Am. Statist. Assoc. 96 (2001), pp. 1348–1360.
  • 6. Feng S. and Xue L., Bias-corrected statistical inference for partially linear varying coefficient errors-in-variables models with restricted condition, Ann. Inst. Statist. Math. 66 (2014), pp. 121–140.
  • 7. Hastie T. and Tibshirani R., Varying coefficient models, J. R. Stat. Soc. Ser. B (Stat. Methodol.) 55 (1993), pp. 757–779.
  • 8. He X., Zhu Z.Y. and Fung W.K., Estimation in a semiparametric model for longitudinal data with unspecified dependence structure, Biometrika 89 (2002), pp. 579–590.
  • 9. Hu X., Wang Z. and Zhao Z., Empirical likelihood for semiparametric varying coefficient partially linear errors-in-variables models, Statist. Probab. Lett. 79 (2009), pp. 1044–1052.
  • 10. Huang Z. and Zhang R., Empirical likelihood for nonparametric parts in semiparametric varying coefficient partially linear models, Statist. Probab. Lett. 79 (2009), pp. 1798–1808.
  • 11. Kaslow R.A., Ostrow D.G., Detels R., Phair J.P., Polk B.F. and Rinaldo C.J., The multicenter AIDS cohort study: rationale, organization and selected characteristics of the participants, Am. J. Epidemiol. 126 (1987), pp. 310–318.
  • 12. Li Q., Huang C.J., Li D. and Fu T-T., Semiparametric smooth coefficient models, J. Bus. Econom. Statist. 20 (2002), pp. 412–422.
  • 13. Li R. and Liang H., Variable selection in semiparametric regression modeling, Ann. Stat. 36 (2008), pp. 261.
  • 14. Liang K.Y. and Zeger S.L., Longitudinal data analysis using generalized linear models, Biometrika 73 (1986), pp. 13–22.
  • 15. Park B.U., Mammen E., Lee Y.K. and Lee E.R., Varying coefficient regression models: a review and new developments, Int. Stat. Rev. 83 (2015), pp. 36–64.
  • 16. Qu A. and Li R., Quadratic inference functions for varying coefficient models with longitudinal data, Biometrics 62 (2006), pp. 379–391.
  • 17. Qu A., Lindsay B.G. and Li B., Improving generalised estimating equations using quadratic inference functions, Biometrika 87 (2000), pp. 823–836.
  • 18. Schumaker L., Spline Functions: Basic Theory, Cambridge University Press, New York, 2007.
  • 19. Tian R. and Xue L., Variable selection for semiparametric errors-in-variables regression model with longitudinal data, J. Stat. Comput. Simul. 19 (2013), pp. 1–16.
  • 20. Tian R., Xue L. and Liu C., Penalized quadratic inference functions for semiparametric varying coefficient partially linear models with longitudinal data, J. Multivariate Anal. 132 (2014), pp. 94–110.
  • 21. Wang H., Li R. and Tsai C-L., Tuning parameter selectors for the smoothly clipped absolute deviation method, Biometrika 94 (2007), pp. 553–568.
  • 22. Wang H., Zou G. and Wan A.T., Model averaging for varying-coefficient partially linear measurement error models, Electron. J. Stat. 6 (2012), pp. 1017–1039.
  • 23. Wang H.J., Zhu Z. and Zhou J., Quantile regression in partially linear varying coefficient models, Ann. Statist. 37 (2009), pp. 3841–3866.
  • 24. Wang L., Li H. and Huang J.Z., Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements, J. Am. Statist. Assoc. 103 (2008), pp. 1556–1569.
  • 25. Wang X., Li G. and Lin L., Empirical likelihood inference for semi-parametric varying-coefficient partially linear EV models, Metrika 73 (2011), pp. 171–185.
  • 26. Wang Z. and Xue L., Variable selection for high dimensional partially linear varying coefficient errors-in-variables models, Hacet. J. Math. Stat. 48 (2019), pp. 213–229.
  • 27. Wei C., Statistical inference for restricted partially linear varying coefficient errors-in-variables models, J. Statist. Plann. Inference 142 (2012), pp. 2464–2472.
  • 28. Xia Y. and Da H., Block empirical likelihood for semiparametric varying-coefficient partially linear errors-in-variables models with longitudinal data, J. Probab. Stat. 168 (2013), pp. 175–186.
  • 29. Xue L., Qu A. and Zhou J., Consistent model selection for marginal generalized additive model for correlated data, J. Am. Stat. Assoc. 105 (2010), pp. 1518–1530.
  • 30. You J. and Zhou Y., Empirical likelihood for semiparametric varying-coefficient partially linear regression models, Statist. Probab. Lett. 76 (2006), pp. 412–422.
  • 31. Zhang J., Feng Z., Xu P. and Liang H., Generalized varying coefficient partially linear measurement errors models, Ann. Inst. Statist. Math. 69 (2017), pp. 97–120.
  • 32. Zhang W., Lee S.Y. and Song X., Local polynomial fitting in semivarying coefficient model, J. Multivariate Anal. 82 (2002), pp. 166–188.
  • 33. Zhang W., Li G. and Xue L., Profile inference on partially linear varying-coefficient errors-in-variables models under restricted condition, Comput. Statist. Data Anal. 55 (2011), pp. 3027–3040.
  • 34. Zhao M., Gao Y. and Cui Y., Variable selection for longitudinal varying coefficient errors-in-variables models, Comm. Statist. Theory Methods 19 (2020), pp. 1–26.
  • 35. Zhao P. and Xue L., Empirical likelihood inferences for semiparametric varying coefficient partially linear errors-in-variables models with longitudinal data, J. Nonparametr. Stat. 21 (2009), pp. 907–923.
  • 36. Zhao P. and Xue L., Variable selection for semiparametric varying coefficient partially linear errors-in-variables models, J. Multivariate Anal. 101 (2010), pp. 1872–1883.
  • 37. Zhou X. and Liang H., Statistical inference for semiparametric varying coefficient partially linear models with error-prone linear covariates, Ann. Statist. 37 (2009), pp. 427–458.
  • 38. Zhou X., Zhao P. and Lin L., Empirical likelihood for parameters in an additive partially linear errors-in-variables model with longitudinal data, J. Korean Stat. Soc. 43 (2014), pp. 91–103.
