Profile local linear estimation of generalized semiparametric regression model for longitudinal data

Yanqing Sun; Liuquan Sun; Jie Zhou

doi:10.1007/s10985-013-9251-y

Author manuscript; available in PMC: 2014 Jul 1.

Published in final edited form as: Lifetime Data Anal. 2013 Mar 8;19(3):317–349. doi: 10.1007/s10985-013-9251-y


PMCID: PMC3710313 NIHMSID: NIHMS453735 PMID: 23471814

Abstract

This paper studies the generalized semiparametric regression model for longitudinal data where the covariate effects are constant for some covariates and time-varying for others. Different link functions can be used to allow more flexible modelling of longitudinal data. The nonparametric components of the model are estimated using a local linear estimating equation and the parametric components are estimated through a profile estimating function. The method automatically adjusts for heterogeneity of sampling times, allowing the sampling strategy to depend on the past sampling history as well as possibly time-dependent covariates without specifically modelling such dependence. A K-fold cross-validation bandwidth selection is proposed as a working tool for locating an appropriate bandwidth. A criterion for selecting the link function is proposed to provide a better fit of the data. Large sample properties of the proposed estimators are investigated. Large sample pointwise and simultaneous confidence intervals for the regression coefficients are constructed. Formal hypothesis testing procedures are proposed to check for the covariate effects and whether the effects are time-varying. A simulation study is conducted to examine the finite sample performances of the proposed estimation and hypothesis testing procedures. The methods are illustrated with a data example.

Keywords: Asymptotics, Kernel smoothing, Link function, Sampling adjusted estimation, Testing time-varying effects, Weighted least squares

1 Introduction

We study semiparametric modeling of covariate effects on a longitudinal response process based on repeated measurements observed at a series of sampling times. Suppose that there is a random sample of n subjects. For the ith subject, let Y_i (t) be the response process and let Z_i (t) and X_i (t) be the possibly time-dependent covariates of dimensions p × 1 and q × 1, respectively, over the time interval [0, τ]. We consider the following generalized semiparametric regression model for Y_i (t), 0 ≤ t ≤ τ,

μ_{i}(t) = E\{Y_{i}(t) \mid X_{i}(t), Z_{i}(t)\} = g^{-1}\{γ^{T}(t) X_{i}(t) + β^{T} Z_{i}(t)\}, \quad i = 1, \dots, n,

(1)

where g(·) is a known link function, β is a p-dimensional vector of unknown parameters and γ(t) is a q-dimensional vector of completely unspecified functions. The notation β^T denotes the transpose of a vector or matrix β. The first component of X_i (t) is set to be 1, which gives a nonparametric baseline function. Under model (1), the effects of some covariates are constant while others are time-varying. Different link functions can be selected to provide a richer family of models for longitudinal data.
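As a small illustration of model (1), the following Python sketch evaluates the mean response at a single time point for a few common link functions. The function name, the dictionary of links and the numerical values are hypothetical and serve only to make the notation concrete.

```python
import numpy as np

# Inverse link functions phi = g^{-1}; the keys are illustrative names only.
INVERSE_LINKS = {
    "identity": lambda eta: eta,
    "log": np.exp,
    "logit": lambda eta: 1.0 / (1.0 + np.exp(-eta)),
}

def mean_response(gamma_t, x_t, beta, z_t, link="identity"):
    """Evaluate mu(t) = g^{-1}{gamma(t)^T X(t) + beta^T Z(t)} at one time t."""
    eta = np.dot(gamma_t, x_t) + np.dot(beta, z_t)
    return INVERSE_LINKS[link](eta)

x_t = np.array([1.0, 0.5])      # first component is 1 (nonparametric baseline)
gamma_t = np.array([0.2, 1.0])  # hypothetical value of gamma(t) at this t
z_t = np.array([2.0])
beta = np.array([-0.1])

mu_identity = mean_response(gamma_t, x_t, beta, z_t, "identity")
mu_log = mean_response(gamma_t, x_t, beta, z_t, "log")
```

With the identity link the linear predictor is returned unchanged, while the log link corresponds to the proportional means form discussed below.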

When the link function g(·) is the identity function, model (1) is known as the semiparametric additive model. The semiparametric additive model with longitudinal data has been studied extensively in recent years. We refer to Hoover et al. (1998), Martinussen and Scheike (1999, 2000, 2001), Lin and Ying (2001), Wu and Liang (2004), Fan and Li (2004), Hu et al. (2004), Sun and Wu (2005) and Fan et al. (2007), among others. When the link function is the natural logarithm function and X_i (t) ≡ 1, model (1) becomes the proportional means model. Data collected on the individual response processes at a finite set of sampling times are also called panel data. For panel count data, the proportional means model has been studied by Sun and Wei (2000), Cheng and Wei (2000), Zhang (2002) and Hu et al. (2003). Model (1) unifies the semiparametric additive model and the proportional means model under the same umbrella.

Although model (1) has been extensively studied for cross-sectional data, few have studied it with longitudinal data. Lin and Carroll (2001) studied model (1) when X_i (t) ≡ 1 by using profile-based generalized estimating equations (GEE) and a local linear approach. Lin et al. (2007) proposed a local linear GEE method when all the regression coefficients are nonparametric functions of time. The GEE method with an appropriately selected working covariance structure of the longitudinal data can lead to improved efficiency (Fan et al. 2007). However, the selection of the working covariance can be difficult and the efficiency gain under an improperly selected working covariance structure is not clear. Further, there may be technical difficulties with the extension of the GEE method to more complicated sampling schemes. In both Lin and Carroll (2001) and Lin et al. (2007), the sampling times are assumed to be independent of covariates and the situation of possible dropouts of the subjects in the follow-up is not considered. The extensions of their methods to more general sampling and censoring schemes would make these methods more useful in practice.

This paper proposes a sampling adjusted profile local linear estimation method for the generalized semiparametric regression model (1). The paper has two main contributions. First, the proposed method automatically adjusts for heterogeneity of sampling times, allowing the sampling strategy to depend on the past sampling history as well as possibly time-dependent covariates without specifically modelling such dependence. Second, this paper presents a unified approach to the semiparametric model (1) with a general link function, which has never been exploited for longitudinal data to the best of our knowledge. This presents an opportunity for model selection of the link function. A criterion for selecting the link function is proposed to provide a better fit of the data. The proposed method does not require time-varying covariates to be observed at all times; only the values at the sampling times are needed. Some hypothesis testing procedures are proposed to check whether the effect of a covariate is time-varying. This can lead to more efficient estimation when the effects of some covariates are not really time-varying.

The rest of the paper is organized as follows. In Sect. 2, a sampling adjusted profile-based local linear estimation method is proposed for model (1). Large sample properties are investigated in Sect. 3. Large sample pointwise and simultaneous confidence intervals for the regression coefficients are constructed. This section also presents some formal hypothesis testing procedures to check whether the effect of a covariate is time-varying. The procedures for selecting the bandwidth and the link function are proposed in Sect. 3.4. A simulation study is conducted in Sect. 4 to examine the finite sample performances of the proposed statistical procedures. The proposed methods are illustrated with the analysis of an HIV-1 RNA data set from an AIDS clinical trial in Sect. 5. Some concluding remarks are made in Sect. 6. All proofs are given in the Appendix.

2 Profile local linear estimation approach

2.1 Preliminaries

Suppose that the observations of the response process Y_i (t) for the ith subject are taken at the sampling time points 0 ≤ t_{i1} < t_{i2} < · · · < t_{in_i} ≤ τ, where n_i is the total number of observations on the ith subject and τ is the end of follow-up time. The sampling times are often irregular and depend on covariates. In addition, some subjects may drop out of the study early. Let $N_{i}(t) = \sum_{j=1}^{n_{i}} I(t_{ij} \leq t)$ be the number of observations taken on the ith subject by time t, where I(·) is the indicator function. Let C_i be the end of follow-up time or the censoring time, whichever comes first. The responses for the ith subject can only be observed at the time points before C_i. Thus N_i (t) can be written as $N_{i}^{*}(t \land C_{i})$, where $N_{i}^{*}(t)$ is the counting process of sampling times. Let X_i (t) and Z_i (t) be the predictable covariate processes associated with the ith subject. We assume that {(Y_i (·), X_i (·), Z_i (·), N_i (·))}, i = 1, …, n, are independent identically distributed random processes. In this section, we propose an estimation procedure for model (1) based on the observations {(Y_i (t_ij), X_i (t_ij), Z_i (t_ij)); j = 1, …, n_i, i = 1, …, n}. These are the values of {(Y_i (t), X_i (t), Z_i (t)), 0 ≤ t ≤ τ} observed at the sampling times, that is, the jump time points of $N_{i}(t) = N_{i}^{*}(t \land C_{i})$, i = 1, …, n.
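The observed counting process can be made concrete with a few lines of Python. This is only a sketch of the definition $N_{i}(t) = N_{i}^{*}(t \land C_{i})$; the function name and the sampling and censoring times are hypothetical.

```python
import numpy as np

def observed_count(sampling_times, censor_time, t):
    """N_i(t) = N_i^*(t ^ C_i): number of sampling times at or before min(t, C_i)."""
    times = np.asarray(sampling_times, dtype=float)
    return int(np.sum(times <= min(t, censor_time)))

times_i = [0.5, 1.2, 2.0, 3.1]  # hypothetical sampling times for one subject
c_i = 2.5                       # hypothetical censoring time

# Counts at t = 1.0, 2.0 and 4.0; the visit at 3.1 falls after C_i and is never counted.
counts = [observed_count(times_i, c_i, t) for t in (1.0, 2.0, 4.0)]
```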

Let $\mathcal{F}_{t}$ be the σ-field representing the history of $N_{i}^{*}(\cdot)$, X_i (·) and Z_i (·) up to time t for 1 ≤ i ≤ n. Let λ_i (t) be the intensity process defined as follows

E\{dN_{i}^{*}(t) \mid \mathcal{F}_{t-}\} = λ_{i}(t) \, dt,

(2)

for 0 ≤ t ≤ τ. Thus λ_i (t) is the sampling rate at time t conditional on the past $\mathcal{F}_{t-}$. Let α_i (t) = α(t, X_i (t), Z_i (t)) be the conditional mean rate of the sampling times such that $E\{dN_{i}^{*}(t) \mid X_{i}(t), Z_{i}(t)\} = α(t, X_{i}(t), Z_{i}(t)) \, dt$. Then α_i (t) = E{λ_i (t)|X_i (t), Z_i (t)} by the double expectation property.

Many existing methods, such as those of Lin and Ying (2001) and Martinussen and Scheike (1999, 2000, 2001), take the approach of modelling α_i (t). Lin and Ying (2001) assumed that the sampling process follows a proportional mean rate model (Lin et al. 2000). Martinussen and Scheike (1999, 2000) assumed that the intensity of the sampling process follows a multiplicative Aalen model (Aalen 1978) λ_i(t) = η_i(t) α(t), where α(t) is an unknown deterministic function and η_i (t) is a predictable process. Martinussen and Scheike (2001) considered the sampling adjusted approach by assuming that the intensity follows a nonparametric additive regression model λ_i (t) = η_i (t)α(t)^T X_i (t), where η_i (t) is a predictable at risk indicator, α(t) is a vector of unspecified time-dependent regression functions and X_i (t) are predictable time-varying covariates. For all of the methods mentioned above, misspecification of the sampling model can lead to biased estimation of the mean longitudinal response since the expectations of the estimating equations may not be zero, which is also demonstrated in our simulation study in Sect. 4.

The method proposed in the following allows the sampling strategy to depend on the past $\mathcal{F}_{t-}$ as well as possibly time-dependent covariates without specifically modelling such dependence. The estimation procedure directly uses the sampling process $N_{i}(\cdot) = N_{i}^{*}(\cdot \land C_{i})$ without modelling λ_i (t) or α_i (t).

2.2 Estimation procedures

We adopt a profile approach for the estimation of model (1). First, assuming β is known, the nonparametric component, γ(t), of the model is estimated using the local linear estimating equations. The parametric component, β, is estimated through the weighted profile estimating equations. The details of the estimation procedure are described in the following.

At each t, let γ(s) = γ(t)+ γ̇(t)(s−t)+ O((s-t)²) be the first order Taylor expansion of γ(·) for s in a neighborhood of t, where γ̇(t) is the derivative of γ(t) with respect to t. Denote γ_a(t) = (γ^T(t), γ̇^T(t))^T and ${\tilde{X}}_{i} (s, s - t) = {(X_{i}^{T} (s), (s - t) X_{i}^{T} (s))}^{T}$ . Let ${\tilde{μ}}_{a} (s, γ_{a}, β ∣ X_{i}, Z_{i}) = ϕ {γ_{a}^{T} (t) {\tilde{X}}_{i} (s, s - t) + β^{T} Z_{i} (s)}$ , where ϕ(x) = g⁻¹(x) is the inverse function of the link function g(y). Let W_i (t) = W (t, X_i (t), Z_i (t)) be a nonnegative weight process that may depend on n. At each t and for fixed β, we consider the following estimating function for γ_a(t):

U_{a}(γ_{a}, β) = \sum_{i=1}^{n} \int_{0}^{τ} W_{i}(s) \{Y_{i}(s) - \tilde{μ}_{a}(s, γ_{a}, β \mid X_{i}, Z_{i})\} \tilde{X}_{i}(s, s-t) K_{h}(s-t) \, dN_{i}(s),

(3)

where K_h(·) = K(·/h)/h, K(·) is a kernel function that weights smoothly down the contributions of remote data points and h = h_n > 0 is the bandwidth parameter that controls the size of a local neighborhood. The root of the equation U_a(γ_a, β) = 0 is denoted by γ̃_a(t, β). Since the data used in (3) are localized in the neighborhood of t, a weight function for (3) will not have much effect on the local linear estimator.

Let ϕ̇(x) be the derivative of ϕ(x) = g⁻¹(x) with respect to x. The estimating function U_a(γ_a, β) can be obtained by setting $Q_{i}(s) = W_{i}(s) [\dot{ϕ}\{γ_{a}^{T}(t) \tilde{X}_{i}(s, s-t) + β^{T} Z_{i}(s)\}]^{-1}$ in the derivative of the local weighted sum of squares $ℓ_{a}(γ_{a}, β) = \sum_{i=1}^{n} \int_{0}^{τ} Q_{i}(s) \{Y_{i}(s) - \tilde{μ}_{a}(s, γ_{a}, β \mid X_{i}, Z_{i})\}^{2} K_{h}(s-t) \, dN_{i}(s)$ with respect to γ_a. The expectation of U_a(γ_a, β) is approximately zero for the true β and γ(·) as h → 0 under the assumptions given in the Appendix. Let $\tilde{E}_{xx}(t) = n^{-1} \sum_{i=1}^{n} \int_{0}^{τ} W_{i}(s) K_{h}(s-t) \{\tilde{X}_{i}(s, s-t)\}^{\otimes 2} \, dN_{i}(s)$ and $\tilde{E}_{zx}(t) = n^{-1} \sum_{i=1}^{n} \int_{0}^{τ} W_{i}(s) K_{h}(s-t) Z_{i}(s) \{\tilde{X}_{i}(s, s-t)\}^{T} \, dN_{i}(s)$, where v^⊗2 = vv^T for a column vector v. Ẽ_yx(t) is defined similarly to Ẽ_zx (t) by replacing Z_i(·) with Y_i (·). Under the identity link function g(x) = x, an explicit solution for (3) can be derived as $\tilde{γ}_{a}(t, β) = \tilde{Y}_{x}^{T}(t) - \tilde{Z}_{x}^{T}(t) β$, where Ỹ_x (t) = Ẽ_yx (t)(Ẽ_{x x} (t))⁻¹ and Z̃_x (t) = Ẽ_zx (t)(Ẽ_{x x} (t))⁻¹.
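For the identity link, the explicit solution is a kernel-weighted least squares fit, which can be sketched in Python as follows. The sketch assumes that the observations of all subjects have been pooled into arrays, that W_i ≡ 1 unless weights are supplied, and that K is the Epanechnikov kernel; the function names and the noise-free toy data are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1] (an illustrative choice of K)."""
    return 0.75 * np.maximum(1.0 - u**2, 0.0)

def local_linear_gamma(t0, beta, s, y, X, Z, h, w=None):
    """Solve U_a(gamma_a, beta) = 0 at t0 under the identity link.

    s: (m,) pooled sampling times; y: (m,) responses;
    X: (m, q) and Z: (m, p) covariates observed at those times.
    Returns gamma_a = (gamma(t0), gamma_dot(t0)) as a vector of length 2q.
    """
    s = np.asarray(s, dtype=float)
    w = np.ones_like(s) if w is None else np.asarray(w, dtype=float)
    k = w * epanechnikov((s - t0) / h) / h            # W_i(s) K_h(s - t0)
    x_tilde = np.hstack([X, (s - t0)[:, None] * X])   # (X^T, (s - t0) X^T)^T
    kx = x_tilde * k[:, None]
    e_xx = kx.T @ x_tilde                             # proportional to E_xx(t0)
    rhs = kx.T @ (y - Z @ beta)                       # E_yx^T - E_zx^T beta, same scaling
    return np.linalg.solve(e_xx, rhs)

rng = np.random.default_rng(1)
m = 200
s = np.sort(rng.uniform(0.0, 1.0, m))
X = np.column_stack([np.ones(m), rng.normal(size=m)])
Z = rng.normal(size=(m, 1))
beta = np.array([0.5])
# Noise-free responses with gamma(t) = (1 + 2t, 3 - t).
y = (1 + 2 * s) * X[:, 0] + (3 - s) * X[:, 1] + Z @ beta

gamma_a = local_linear_gamma(0.5, beta, s, y, X, Z, h=0.3)
```

Because the toy coefficient functions are linear in t, the local linear fit at t = 0.5 recovers the values (2, 2.5) and the derivatives (2, −1) up to rounding error.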

Let γ̃(t, β) be the first q components of γ̃_a(t, β). The profile estimating function for β is given by

U(β) = \sum_{i=1}^{n} \int_{t_{1}}^{t_{2}} W_{i}(s) [Y_{i}(s) - ϕ\{(\tilde{γ}(s, β))^{T} X_{i}(s) + β^{T} Z_{i}(s)\}] \times \left\{\frac{\partial \tilde{γ}(s, β)}{\partial β} X_{i}(s) + Z_{i}(s)\right\} dN_{i}(s),

(4)

where [t₁, t₂] ⊂ (0, τ). The subset [t₁, t₂] is considered to avoid possible instability of γ̃(t, β) near the boundary. In practice, this interval can be taken to be close to [0, τ]. We estimate β by β̂ that solves U(β̂) = 0 and γ(t) by γ̂(t) = γ̃(t, β̂).

The expression for the derivative $\frac{\partial \tilde{γ}(s, β)}{\partial β}$ in (4) is derived in the following. Since U_a(γ̃_a(t, β), β) ≡ 0_{2q}, γ̃_a(t, β) satisfies

\left. \left\{ \frac{\partial U_{a}(γ_{a}, β)}{\partial γ_{a}} \frac{\partial \tilde{γ}_{a}(t, β)}{\partial β} + \frac{\partial U_{a}(γ_{a}, β)}{\partial β} \right\} \right|_{γ_{a} = \tilde{γ}_{a}(t, β)} = 0_{2q}.

It follows that

\frac{\partial \tilde{γ}_{a}(t, β)}{\partial β} = \left. - \left\{ \frac{\partial U_{a}(γ_{a}, β)}{\partial γ_{a}} \right\}^{-1} \frac{\partial U_{a}(γ_{a}, β)}{\partial β} \right|_{γ_{a} = \tilde{γ}_{a}(t, β)},

(5)

where

- \frac{\partial U_{a}(γ_{a}, β)}{\partial γ_{a}} = \sum_{i=1}^{n} \int_{0}^{τ} W_{i}(s) \dot{ϕ}\{γ_{a}^{T} \tilde{X}_{i}(s, s-t) + β^{T} Z_{i}(s)\} \times \{\tilde{X}_{i}(s, s-t)\}^{\otimes 2} K_{h}(s-t) \, dN_{i}(s),

(6)

- \frac{\partial U_{a}(γ_{a}, β)}{\partial β} = \sum_{i=1}^{n} \int_{0}^{τ} W_{i}(s) \dot{ϕ}\{γ_{a}^{T} \tilde{X}_{i}(s, s-t) + β^{T} Z_{i}(s)\} \times \tilde{X}_{i}(s, s-t) (Z_{i}(s))^{T} K_{h}(s-t) \, dN_{i}(s).

(7)
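As a consistency check on (5) to (7), consider the identity link, for which ϕ̇ ≡ 1. A short verification, using only the definitions of Ẽ_{x x}(t) and Ẽ_zx(t) given in this section, suggests that (5) then reduces to the derivative of the explicit solution for γ̃_a(t, β) stated for the identity link:

```latex
-\frac{\partial U_{a}}{\partial \gamma_{a}} = n \tilde{E}_{xx}(t), \qquad
-\frac{\partial U_{a}}{\partial \beta} = n \{\tilde{E}_{zx}(t)\}^{T}
\quad \Longrightarrow \quad
\frac{\partial \tilde{\gamma}_{a}(t, \beta)}{\partial \beta}
  = -\{\tilde{E}_{xx}(t)\}^{-1} \{\tilde{E}_{zx}(t)\}^{T}
  = -\tilde{Z}_{x}^{T}(t),
```

where the last equality uses the symmetry of Ẽ_{x x}(t). In this special case the derivative does not depend on β.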

The estimator β̂ is a weighted least squares estimator since the estimating function U(β) can be obtained by setting Q_i (t) = W_i (t)[ϕ̇ {(γ̃(t, β))^T X_i (t) + β^TZ_i (t)}]⁻¹ in the derivative of the profile least squares function ℓ (β) with respect to β, where $ℓ(β) = \sum_{i=1}^{n} \int_{t_{1}}^{t_{2}} Q_{i}(s) [Y_{i}(s) - ϕ\{(\tilde{γ}(s, β))^{T} X_{i}(s) + β^{T} Z_{i}(s)\}]^{2} \, dN_{i}(s)$.

2.3 Computational algorithm

The estimators β̂ and γ̂(t) can be obtained through an iterated estimation procedure. Let $\hat{β}^{(m-1)}$ be the estimate of β at the (m − 1)th step. The mth step estimator $\hat{γ}_{a}^{(m)}(t) = \tilde{γ}_{a}(t, \hat{β}^{(m-1)})$ is the root of the estimating function (3) satisfying $U_{a}(\hat{γ}_{a}^{(m)}(t), \hat{β}^{(m-1)}) = 0$. The mth step estimator $\hat{β}^{(m)}$ is obtained by solving U_m(β) = 0 for β, where the estimating function is

U_{m}(β) = \sum_{i=1}^{n} \int_{t_{1}}^{t_{2}} W_{i}(s) [Y_{i}(s) - ϕ\{(\tilde{γ}(s, \hat{β}^{(m-1)}))^{T} X_{i}(s) + β^{T} Z_{i}(s)\}] \times \left\{\frac{\partial \tilde{γ}(s, \hat{β}^{(m-1)})}{\partial β} X_{i}(s) + Z_{i}(s)\right\} dN_{i}(s),

(8)

where $\frac{\partial \tilde{γ}(t, \hat{β}^{(m-1)})}{\partial β}$ is calculated using formula (5) at $β = \hat{β}^{(m-1)}$. The estimators $\hat{γ}_{a}^{(m)}(t)$ and $\hat{β}^{(m)}$ are updated at each iteration until convergence. The estimator γ̂(t) consists of the first q components of γ̂_a(t) = γ̃_a(t, β̂). The estimation of β requires that both $\hat{γ}_{a}^{(m)}(t)$ and $\frac{\partial \tilde{γ}(t, \hat{β}^{(m-1)})}{\partial β}$ be evaluated at the combined sampling points of all subjects, that is, the jump points of {N_i (·), i = 1, …, n}. The estimate γ̂(t) at the last iteration can be obtained by solving $U_{a}(\hat{γ}_{a}^{(m)}(t), \hat{β}^{(m-1)}) = 0$ on a grid of points fine enough that the plotted estimate looks reasonably smooth.
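The iteration can be sketched for the identity link, where the local fits are linear in β and can be precomputed. The Python code below is a simplified illustration rather than the authors' implementation: it assumes pooled observations, W_i ≡ 1 and an Epanechnikov kernel, and it applies the derivative of γ̃ in transposed form so that the dimensions conform. All function names and the noise-free toy data are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * np.maximum(1.0 - u**2, 0.0)

def local_fit(t0, s, X, Z, y, h):
    """Return E_xx^{-1} E_yx^T (length 2q) and E_xx^{-1} E_zx^T (2q x p) at t0."""
    k = epanechnikov((s - t0) / h) / h
    x_tilde = np.hstack([X, (s - t0)[:, None] * X])
    kx = x_tilde * k[:, None]
    e_xx = kx.T @ x_tilde
    return np.linalg.solve(e_xx, kx.T @ y), np.linalg.solve(e_xx, kx.T @ Z)

def profile_fit(s, X, Z, y, h, t1, t2, n_iter=50):
    """Iterate the gamma-step and the beta-step (8) under the identity link."""
    q, p = X.shape[1], Z.shape[1]
    idx = np.where((s >= t1) & (s <= t2))[0]      # sampling points in [t1, t2]
    fits = [local_fit(s[j], s, X, Z, y, h) for j in idx]
    yx = np.array([f[0][:q] for f in fits])       # (m, q)
    zx = np.array([f[1][:q] for f in fits])       # (m, q, p)
    # d gamma_tilde / d beta = -zx, so the score direction is Z - zx^T X.
    v = Z[idx] - np.einsum("jqp,jq->jp", zx, X[idx])
    beta = np.zeros(p)
    for _ in range(n_iter):
        gamma = yx - zx @ beta                    # gamma_tilde(s_j, beta^(m-1))
        r = y[idx] - np.sum(gamma * X[idx], axis=1)
        beta = np.linalg.solve(v.T @ Z[idx], v.T @ r)   # root of U_m(beta)
    return beta

rng = np.random.default_rng(0)
m = 200
s = np.sort(rng.uniform(0.0, 1.0, m))
X = np.column_stack([np.ones(m), rng.normal(size=m)])
Z = rng.normal(size=(m, 1))
y = (1 + 2 * s) * X[:, 0] + (3 - s) * X[:, 1] + 0.5 * Z[:, 0]

beta_hat = profile_fit(s, X, Z, y, h=0.3, t1=0.1, t2=0.9)
```

For a general link the two steps would each require a numerical root finder, and the derivative in (5) would need to be recomputed at every iteration.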

2.4 Estimation under the fixed designs

Equation (2) assumes the existence of an intensity for the counting processes that record the sampling time points. This formulation excludes sampling at predetermined time points, i.e., the fixed design. However, the method developed in Sect. 2.2 can be extended to the fixed designs with some modifications. Let t₁, …, t_k be the fixed sampling time points at which the responses and covariates may be observed. For the fixed designs, estimation of model (1) does not involve the kernel neighborhood smoothing. In particular, for the fixed designs, the counting process is $N_{i}(t) = \sum_{j=1}^{k} I(t_{j} \leq t \land C_{i})$, where C_i is the censoring time for subject i. Equation (3) should be replaced by

U_{a}(γ(t), β) = \sum_{i=1}^{n} \{Y_{i}(t) - ϕ(γ^{T}(t) X_{i}(t) + β^{T} Z_{i}(t))\} X_{i}(t) I(C_{i} \geq t).

(9)

Let γ̃(t, β) solve the equation U_a(γ(t), β) = 0 at the fixed time points t = t₁, …, t_k for each fixed β. The estimator β̂ solves U(β̂) = 0, where U(β) is the profile estimating function under the fixed design having the same expression as (4). The regression coefficient function γ(t) is estimated by γ̂(t) = γ̃(t, β̂).

3 Statistical inferences of semiparametric model

3.1 Asymptotic properties

This subsection investigates the asymptotic properties of the proposed estimators. These asymptotic results are used to construct confidence bands and formulate the test statistics for the regression coefficients in the subsequent subsections.

Let β₀ and γ₀(t) be the true values of β and γ(t) under model (1), respectively. Let $μ_{i}(t) = ϕ\{γ_{0}^{T}(t) X_{i}(t) + β_{0}^{T} Z_{i}(t)\}$ and $\dot{μ}_{i}(t) = \dot{ϕ}\{γ_{0}^{T}(t) X_{i}(t) + β_{0}^{T} Z_{i}(t)\}$. Let w(t, x, z) be the deterministic limit of W (t, x, z) in probability as n → ∞, and let w_i(t) = w(t, X_i (t), Z_i (t)). Define e_{x x} (t) = E[w_i(t)μ̇_i (t){X_i (t)}^⊗2 α_i(t)ξ_i(t)] and e_xz(t) = E[w_i(t)μ̇_i(t)X_i(t) {Z_i(t)}^T α_i(t)ξ_i(t)], where ξ_i(t) = I (C_i ≥ t). Let $A = E[\int_{t_{1}}^{t_{2}} w_{i}(s) \dot{μ}_{i}(s) \{Z_{i}(s) - (e_{xz}(s))^{T} (e_{xx}(s))^{-1} X_{i}(s)\}^{\otimes 2} \, dN_{i}(s)]$ and $Σ = E[\int_{t_{1}}^{t_{2}} w_{i}(s) \{Y_{i}(s) - μ_{i}(s)\} \{Z_{i}(s) - (e_{xz}(s))^{T} (e_{xx}(s))^{-1} X_{i}(s)\} \, dN_{i}(s)]^{\otimes 2}$.

Let μ̂_i (s) = ϕ{γ̂^T (s)X_i (s) + β̂^TZ_i(s)} and $\hat{\dot{μ}}_{i}(s) = \dot{ϕ}\{\hat{γ}^{T}(s) X_{i}(s) + \hat{β}^{T} Z_{i}(s)\}$. Let $\hat{E}_{xx}(t) = n^{-1} \sum_{i=1}^{n} \int_{0}^{τ} W_{i}(s) K_{h}(s-t) \hat{\dot{μ}}_{i}(s) (X_{i}(s))^{\otimes 2} \, dN_{i}(s)$ and $\hat{E}_{xz}(t) = n^{-1} \sum_{i=1}^{n} \int_{0}^{τ} W_{i}(s) K_{h}(s-t) \hat{\dot{μ}}_{i}(s) X_{i}(s) (Z_{i}(s))^{T} \, dN_{i}(s)$. The following theorem presents the consistency and asymptotic normality of β̂.

Theorem 1

Assume that Condition A holds. Then

$\hat{β} \overset{P}{\to} β_{0}$ as n → ∞;
$n^{1/2} (\hat{β} - β_{0}) \overset{D}{\to} N(0, A^{-1} Σ A^{-1})$ as nh² → ∞ and nh⁵ = O(1).

The matrix A can be consistently estimated by

\hat{A} = n^{-1} \sum_{i=1}^{n} \int_{t_{1}}^{t_{2}} W_{i}(s) \hat{\dot{μ}}_{i}(s) \{Z_{i}(s) - (\hat{E}_{xz}(s))^{T} (\hat{E}_{xx}(s))^{-1} X_{i}(s)\}^{\otimes 2} \, dN_{i}(s),

and Σ can be consistently estimated by

\hat{Σ} = n^{-1} \sum_{i=1}^{n} \left( \int_{t_{1}}^{t_{2}} W_{i}(s) \{Y_{i}(s) - \hat{μ}_{i}(s)\} \{Z_{i}(s) - (\hat{E}_{xz}(s))^{T} (\hat{E}_{xx}(s))^{-1} X_{i}(s)\} \, dN_{i}(s) \right)^{\otimes 2}.
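Once the matrix estimate of A and the per-subject integrals inside the estimate of Σ are available, the sandwich covariance of β̂ is a short computation. The Python sketch below assumes that these quantities have already been computed elsewhere; the function name and the toy inputs are hypothetical.

```python
import numpy as np

def sandwich_covariance(a_hat, subject_scores):
    """Approximate Cov(beta_hat) by A^{-1} Sigma A^{-1} / n.

    a_hat: (p, p) estimate of A.
    subject_scores: (n, p) array whose ith row is the integral over [t1, t2]
        of W_i {Y_i - mu_i} {Z_i - E_xz^T E_xx^{-1} X_i} dN_i for subject i.
    """
    u = np.asarray(subject_scores, dtype=float)
    n = u.shape[0]
    sigma_hat = u.T @ u / n              # n^{-1} sum_i u_i u_i^T
    a_inv = np.linalg.inv(a_hat)
    return a_inv @ sigma_hat @ a_inv.T / n

a_hat = 2.0 * np.eye(2)
scores = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
cov_beta = sandwich_covariance(a_hat, scores)
```

The division by n converts the asymptotic covariance of n^{1/2}(β̂ − β₀) in Theorem 1 into an approximate covariance for β̂ itself.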

Under Theorem 1, the proposed estimator β̂ is consistent and asymptotically normal as long as the weight process W (·) converges in probability to a deterministic function w(·). The selection of W (·) plays a role in the variance of the estimator β̂. Naturally, we would like to choose the optimal weight such that the asymptotic variance of β̂ is minimized. This selection is usually difficult. It depends on the correlation structure of the longitudinal data, among other things. Suppose that the repeated measurements of Y_i (·) within the same subject are independent and that Y_i (·) is independent of N_i (·) conditional on the covariates X_i (t) and Z_i (t). Let $σ_{ε}^{2}(t \mid X_{i}, Z_{i}) = Var\{Y_{i}(t) \mid X_{i}(t), Z_{i}(t)\}$ be the conditional variance of Y_i (t) given the covariates X_i (t) and Z_i (t) under model (1). Then the matrix $Σ = E[\int_{t_{1}}^{t_{2}} w_{i}^{2}(s) σ_{ε}^{2}(s \mid X_{i}, Z_{i}) \{Z_{i}(s) - (e_{xz}(s))^{T} (e_{xx}(s))^{-1} X_{i}(s)\}^{\otimes 2} α_{i}(s) ξ_{i}(s) \, ds]$. Let $Σ_{0} = E[\int_{t_{1}}^{t_{2}} \{\dot{μ}_{i}(s) / σ_{ε}(s \mid X_{i}, Z_{i})\}^{2} \{Z_{i}(s) - (e_{xz}(s))^{T} (e_{xx}(s))^{-1} X_{i}(s)\}^{\otimes 2} α_{i}(s) ξ_{i}(s) \, ds]$. We show in the Appendix that

A^{-1} Σ A^{-1} - Σ_{0}^{-1} \geq 0,

(10)

where B ≥ 0 means that the matrix B is nonnegative definite. When w_i (t) = μ̇_i (t)/{σ_ε (t| X_i, Z_i)}², A = Σ = Σ₀ and the equality in (10) holds. The situation often leads to asymptotically efficient estimators in many semiparametric models discussed by Bickel et al. (1993).

Next, we state an asymptotic result for the estimator γ̂(t). The result is useful for constructing confidence intervals for the mean response curve given the covariates. Denote γ̇₀(t), γ̈₀(t) the first and second derivatives of γ₀(t) with respect to t, respectively.

Theorem 2

Under Condition A, $\hat{γ} (t) \overset{P}{\to} γ_{0} (t)$ ,

\sqrt{n h} \left( \hat{γ}(t) - γ_{0}(t) - \frac{1}{2} μ_{2} h^{2} \ddot{γ}_{0}^{T}(t) \right) \overset{D}{\to} N(0, Σ_{γ}(t)),

as nh² → ∞ and nh⁵ = O(1) for t ∈ (0, τ), where $μ_{2} = \int_{-1}^{1} t^{2} K(t) \, dt$, Σ_γ (t) = (e_{x x} (t))⁻¹ Σ_e(t) (e_{x x} (t))⁻¹, and $Σ_{e}(t) = \lim_{n \to \infty} h E\{\int_{0}^{τ} w_{i}(s) \{Y_{i}(s) - μ_{i}(s)\} X_{i}(s) K_{h}(s-t) \, dN_{i}(s)\}^{\otimes 2}$. The covariance matrix Σ_γ (t) can be estimated consistently by $\hat{Σ}_{γ}(t) = n^{-1} \sum_{i=1}^{n} \{\hat{g}_{i}(t)\}^{\otimes 2}$, where

\begin{array}{l} \hat{g}_{i}(t) = h^{1/2} (\hat{E}_{xx}(t))^{-1} \int_{0}^{τ} W_{i}(s) K_{h}(s-t) X_{i}(s) \{Y_{i}(s) - \hat{μ}_{i}(s)\} \, dN_{i}(s) - h^{1/2} (\hat{E}_{xx}(t))^{-1} \hat{E}_{xz}(t) \\ \quad \times \hat{A}^{-1} \int_{t_{1}}^{t_{2}} W_{i}(s) \{Z_{i}(s) - (\hat{E}_{xz}(s))^{T} (\hat{E}_{xx}(s))^{-1} X_{i}(s)\} \{Y_{i}(s) - \hat{μ}_{i}(s)\} \, dN_{i}(s). \end{array}

When the link function is the identity function, Sun and Wu (2005) showed that the asymptotic bias of using the profile kernel smoothing for γ₀(t) is $\frac{1}{2} μ_{2} h^{2} \{\ddot{γ}_{0}^{T}(t) + 2 (e_{xx}(t))^{-1} \dot{e}_{xx}(t) \dot{γ}_{0}(t)\}$. This phenomenon parallels the situation described in Fan and Gijbels (1996, p. 17) for nonparametric regression with cross-sectional data, which compares the Nadaraya-Watson estimator and the local linear estimator. The extra term in the bias of γ̂(t) using profile kernel smoothing depends on $(e_{xx}(t))^{-1} \dot{e}_{xx}(t) \dot{γ}_{0}(t)$. The bias of the profile kernel smoothing estimator can be large in a highly asymmetric design where $(e_{xx}(t))^{-1} \dot{e}_{xx}(t) \dot{γ}_{0}(t)$ is large. On the other hand, the bias of the profile local linear smoothing estimator only involves the second derivative γ̈₀(t), and thus is design-adaptive. Another advantage of the local linear smoothing over the kernel smoothing, as discussed in Fan and Gijbels (1996), is the automatic boundary adaptation. The rate of convergence at boundary points using the local linear smoothing is the same as for the interior points, which can be shown to hold for model (1) with longitudinal data as well.

Let $Γ_{0}(t) = \int_{t_{1}}^{t} γ_{0}(s) \, ds$ and $\hat{Γ}(t) = \int_{t_{1}}^{t} \hat{γ}(s) \, ds$. The following theorem presents a weak convergence result for $G_{n}(t) = n^{1/2} (\hat{Γ}(t) - Γ_{0}(t))$ over t ∈ [t₁, t₂]. This result provides theoretical justification for testing the regression coefficient functions γ(t) and for the construction of simultaneous confidence bands for $Γ(t) = \int_{t_{1}}^{t} γ(s) \, ds$ developed later.

Theorem 3

Under Condition A, $G_{n} (t) = n^{- 1 / 2} \sum_{i = 1}^{n} H_{i} (t) + o_{p} (1)$ uniformly in t ∈ [t₁, t₂] ⊂ (0, τ) as nh² → ∞ and nh⁵ → 0, where

\begin{array}{l} H_{i}(t) = \int_{t_{1}}^{t} (e_{xx}(s))^{-1} \int_{0}^{τ} w_{i}(u) K_{h}(u-s) X_{i}(u) \{Y_{i}(u) - μ_{i}(u)\} \, dN_{i}(u) \, ds \\ \quad - \int_{t_{1}}^{t} (e_{xx}(s))^{-1} e_{xz}(s) \, ds \, A^{-1} \\ \quad \times \int_{t_{1}}^{t_{2}} w_{i}(s) \{Z_{i}(s) - (e_{xz}(s))^{T} (e_{xx}(s))^{-1} X_{i}(s)\} \{Y_{i}(s) - μ_{i}(s)\} \, dN_{i}(s). \end{array}

(11)

The process G_n(t) converges weakly to a zero-mean Gaussian process G(t) on [t₁, t₂]. The asymptotic covariance matrix of G_n(t) can be estimated consistently by $\hat{Σ}_{G}(t) = n^{-1} \sum_{i=1}^{n} \{\hat{H}_{i}(t)\}^{\otimes 2}$, where

\begin{array}{l} \hat{H}_{i}(t) = \int_{t_{1}}^{t} (\hat{E}_{xx}(s))^{-1} \int_{0}^{τ} W_{i}(u) K_{h}(u-s) X_{i}(u) \{Y_{i}(u) - \hat{μ}_{i}(u)\} \, dN_{i}(u) \, ds \\ \quad - \int_{t_{1}}^{t} (\hat{E}_{xx}(s))^{-1} \hat{E}_{xz}(s) \, ds \, \hat{A}^{-1} \\ \quad \times \int_{t_{1}}^{t_{2}} W_{i}(s) \{Z_{i}(s) - (\hat{E}_{xz}(s))^{T} (\hat{E}_{xx}(s))^{-1} X_{i}(s)\} \{Y_{i}(s) - \hat{μ}_{i}(s)\} \, dN_{i}(s). \end{array}

(12)

Remark

For the estimation of model (1) under the fixed designs, asymptotic results similar to those in Theorem 1 can be established. Without the kernel neighborhood smoothing, one needs to replace e_xz(t) = E[w_i (t)μ̇_i (t)X_i (t){Z_i (t)}^T α_i (t)ξ_i (t)] by e_xz(t) = E[μ̇_i (t)X_i (t){Z_i (t)}^T ξ_i (t)], and $\hat{E}_{xz}(t) = n^{-1} \sum_{i=1}^{n} \int_{0}^{τ} W_{i}(s) K_{h}(s-t) \hat{\dot{μ}}_{i}(s) X_{i}(s) (Z_{i}(s))^{T} \, dN_{i}(s)$ by $\hat{E}_{xz}(t) = n^{-1} \sum_{i=1}^{n} \hat{\dot{μ}}_{i}(t) X_{i}(t) (Z_{i}(t))^{T} ξ_{i}(t)$. Similar replacements hold for e_{x x} (t), Ê_{x x} (t), e_yx (t) and Ê_yx (t). The following asymptotic result can be established for γ̂(t) at t = t₁, …, t_k: $\sqrt{n} (\hat{γ}(t) - γ_{0}(t)) \overset{D}{\to} N(0, Σ_{γ}(t))$, where Σ_γ(t) = (e_{x x} (t))⁻¹ Σ_e(t) (e_{x x} (t))⁻¹ and Σ_e(t) = E{(Y_i (t) − μ_i (t))X_i (t)ξ_i (t)}^⊗2.

3.2 Confidence intervals and simultaneous confidence bands

Let $γ^{(k)}(t)$ be the kth component of γ(t). Similar notation is used throughout, with the superscript (k) denoting the kth component of the corresponding vector. Based on Theorem 2, under-smoothing with nh⁵ → 0 makes the bias term asymptotically negligible and avoids estimating the second derivative γ̈(t). A large sample pointwise confidence interval for $γ^{(k)}(t)$, 0 < t < τ, is given by

{\hat{γ}}^{(k)} (t) \pm {(n h)}^{- 1 / 2} z_{α / 2} {[n^{- 1} \sum_{i = 1}^{n} {{\hat{g}}_{i}^{(k)} (t)}^{2}]}^{1 / 2} .

(13)
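Interval (13) is simple to evaluate once the quantities ĝ_i^{(k)}(t) are available. The Python sketch below assumes they have been computed from the formula in Theorem 2 and passed in as an array; the function name and the toy inputs are hypothetical.

```python
from statistics import NormalDist

import numpy as np

def pointwise_ci(gamma_hat_k, g_hat_k, h, alpha=0.05):
    """Interval (13) for the kth component of gamma at a single time point.

    gamma_hat_k: scalar estimate of gamma^{(k)}(t).
    g_hat_k: (n,) values of g_hat_i^{(k)}(t), one per subject.
    h: bandwidth used in the local linear fit.
    """
    g = np.asarray(g_hat_k, dtype=float)
    n = g.size
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)      # z_{alpha/2}
    half_width = z * np.sqrt(np.mean(g**2)) / np.sqrt(n * h)
    return gamma_hat_k - half_width, gamma_hat_k + half_width

lower, upper = pointwise_ci(0.0, [1.0, -1.0, 1.0, -1.0], h=0.25)
```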

By Theorem 3, a pointwise confidence interval for $Γ^{(k)}(t)$, 0 < t < τ, is given by

{\hat{Γ}}^{(k)} (t) \pm n^{- 1 / 2} z_{α / 2} {[n^{- 1} \sum_{i = 1}^{n} {{\hat{H}}_{i}^{(k)} (t)}^{2}]}^{1 / 2} .

(14)

Furthermore, based on Theorem 3, simultaneous confidence bands and hypothesis tests related to the regression coefficient functions γ(t) can be constructed. A key component is the estimation of confidence coefficients and the critical values. The Gaussian multiplier resampling method of Lin et al. (1993) has been widely employed for this purpose and is described in the following.

Let $G_{n}^{*}(t) = n^{-1/2} \sum_{i=1}^{n} \hat{H}_{i}(t) ξ_{i}$, where ξ₁, ξ₂, …, ξ_n are independent identically distributed (iid) standard normal random variables independent of the observed data set. By Lemma 1 of Sun and Wu (2005), the processes G_n(t) and $G_{n}^{*}(t)$ given the observed data sequence converge weakly to the same zero-mean Gaussian process on [t₁, t₂]. To approximate the distribution of G_n(t), we simulate a large number of realizations from $G_{n}^{*}(t)$ by repeatedly generating (ξ₁, …, ξ_n) while fixing {(Y_i (t), X_i (t), Z_i (t), N_i (t)), t ≥ 0} at their observed values. Let c_α be the (1 − α)-quantile of $\sup_{t_{1} \leq t \leq t_{2}} | G_{n}^{*(k)}(t) / [\sum_{i=1}^{n} \{\hat{H}_{i}^{(k)}(t)\}^{2} / n]^{1/2} |$, which can be approximated by repeatedly generating independent normal samples (ξ₁, …, ξ_n). An asymptotic 1 − α simultaneous confidence band for $Γ^{(k)}(t)$ on [t₁, t₂] is given by

{\hat{Γ}}^{(k)} (t) \pm n^{- 1 / 2} c_{α} {[n^{- 1} \sum_{i = 1}^{n} {{\hat{H}}_{i}^{(k)} (t)}^{2}]}^{1 / 2} .

(15)
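The multiplier resampling step behind band (15) can be sketched in Python as follows. The sketch assumes that Ĥ_i^{(k)}(t) has been evaluated on a grid of time points over [t₁, t₂] and stored as an n × T array; the function name, the grid and the randomly generated inputs are hypothetical stand-ins for quantities computed from data.

```python
import numpy as np

def multiplier_band(gamma_cum_hat, h_hat, alpha=0.05, n_rep=1000, seed=0):
    """Simultaneous band (15) for the kth cumulative coefficient on a grid.

    gamma_cum_hat: (T,) estimate of Gamma^{(k)} on the grid.
    h_hat: (n, T) array of H_hat_i^{(k)}(t), one row per subject.
    Returns the lower band, the upper band and the critical value c_alpha.
    """
    h_hat = np.asarray(h_hat, dtype=float)
    n = h_hat.shape[0]
    sd = np.sqrt(np.mean(h_hat**2, axis=0))      # [n^{-1} sum_i H_i^2]^{1/2}
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal((n_rep, n))         # iid N(0, 1) multipliers
    g_star = xi @ h_hat / np.sqrt(n)             # realizations of G_n^*
    sup_stat = np.max(np.abs(g_star / sd), axis=1)
    c_alpha = float(np.quantile(sup_stat, 1.0 - alpha))
    half_width = c_alpha * sd / np.sqrt(n)
    return gamma_cum_hat - half_width, gamma_cum_hat + half_width, c_alpha

rng = np.random.default_rng(42)
h_hat_demo = rng.normal(size=(50, 20))           # stand-in for computed H_hat
gamma_cum_demo = np.linspace(0.0, 1.0, 20)
band_lo, band_hi, c_alpha = multiplier_band(gamma_cum_demo, h_hat_demo)
```

The supremum is taken over the grid only, so a finer grid gives a closer approximation to the supremum over [t₁, t₂].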

3.3 Hypothesis testing of regression coefficients

The generalized semiparametric regression model (1) postulates that the covariate effects are constant for some covariates and time-varying for others. A formal hypothesis testing procedure can be established to check whether the effect of a covariate is time-varying under model (1). This can lead to more efficient estimation when the effects of some covariates are not really time-varying. We consider testing the null hypothesis H₀₁ that $γ^{(k)}(t)$ is constant for 0 ≤ t ≤ τ.

Under H₀₁, $Γ^{(k)} (t) - \frac{t - t_{1}}{t_{2} - t_{1}} Γ^{(k)} (t_{2}) = 0$ for t ∈ [t₁, t₂]. By Theorem 3 and the continuous mapping theorem,

n^{1/2} \left\{ \hat{Γ}^{(k)}(t) - \frac{t - t_{1}}{t_{2} - t_{1}} \hat{Γ}^{(k)}(t_{2}) \right\} = n^{1/2} \{\hat{Γ}^{(k)}(t) - Γ^{(k)}(t)\} - \frac{t - t_{1}}{t_{2} - t_{1}} n^{1/2} \{\hat{Γ}^{(k)}(t_{2}) - Γ^{(k)}(t_{2})\}

converges weakly to $G^{(k)}(t) - \frac{t - t_{1}}{t_{2} - t_{1}} G^{(k)}(t_{2})$, where $G^{(k)}(t)$ is the kth component of the limiting Gaussian process G(t) of $n^{1/2} \{\hat{Γ}(t) - Γ(t)\}$. This rationale leads to the following test statistics:

S_{1} = \sup_{t_{1} \leq t \leq t_{2}} n^{1/2} \left| \hat{Γ}^{(k)}(t) - \frac{t - t_{1}}{t_{2} - t_{1}} \hat{Γ}^{(k)}(t_{2}) \right|

and

L_{1} = \int_{t_{1}}^{t_{2}} n \left\{ \hat{Γ}^{(k)}(t) - \frac{t - t_{1}}{t_{2} - t_{1}} \hat{Γ}^{(k)}(t_{2}) \right\}^{2} dt.

By the continuous mapping theorem, under H₀₁, the test statistic S₁ converges in distribution to $\sup_{t_{1} \leq t \leq t_{2}} | G^{(k)}(t) - \frac{t - t_{1}}{t_{2} - t_{1}} G^{(k)}(t_{2}) |$, and the test statistic L₁ converges in distribution to $\int_{t_{1}}^{t_{2}} \{G^{(k)}(t) - \frac{t - t_{1}}{t_{2} - t_{1}} G^{(k)}(t_{2})\}^{2} \, dt$. The two test statistics are commonly used in the statistics literature, with S₁ referred to as the supremum type and L₁ as the integrated square type; cf. Martinussen and Scheike (2006).

Let

S_{1}^{*} = \sup_{t_{1} \leq t \leq t_{2}} \left| G_{n}^{*(k)}(t) - \frac{t - t_{1}}{t_{2} - t_{1}} G_{n}^{*(k)}(t_{2}) \right|

and

L_{1}^{*} = \int_{t_{1}}^{t_{2}} n {G_{n}^{* (k)} (t) - \frac{t - t_{1}}{t_{2} - t_{1}} G_{n}^{* (k)} (t_{2})}^{2} d t .

The critical values of S₁ and L₁ can be approximated by simulating a number of copies of $S_{1}^{*}$ and $L_{1}^{*}$, obtained by repeatedly generating independent normal samples (ξ₁, …, ξ_n) while holding the observed data fixed. For example, the critical values of the test statistics S₁ and L₁ at significance level α can be estimated by the upper α quantiles of, say, 1,000 copies of $S_{1}^{*}$ and $L_{1}^{*}$, respectively. The p-values of the tests based on S₁ and L₁ are the proportions of copies of $S_{1}^{*}$ and $L_{1}^{*}$ exceeding S₁ and L₁, respectively. The null hypothesis is rejected if the p-value is less than α.
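The resampling step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that the observed process and each simulated copy have already been evaluated on a common grid over [t₁, t₂] and scaled by $n^{1 / 2}$, so that the two are directly comparable. How the copies are generated from (ξ₁, …, ξ_n) is not reproduced here, and all function names are hypothetical.

```python
import numpy as np

def detrend(path, grid):
    """Subtract (t - t1)/(t2 - t1) * path(t2), with t1 = grid[0] and t2 = grid[-1]."""
    t1, t2 = grid[0], grid[-1]
    return path - (grid - t1) / (t2 - t1) * path[-1]

def sup_stat(path, grid):
    """Supremum-type statistic: largest absolute detrended value on the grid."""
    return float(np.max(np.abs(detrend(path, grid))))

def int_sq_stat(path, grid):
    """Integrated-square statistic, approximated by the trapezoidal rule."""
    d2 = detrend(path, grid) ** 2
    return float(np.sum((d2[1:] + d2[:-1]) * np.diff(grid)) / 2.0)

def resampling_p_value(observed, copies, grid, stat):
    """Proportion of simulated copies whose statistic exceeds the observed one."""
    s_obs = stat(observed, grid)
    s_star = np.array([stat(c, grid) for c in copies])
    return float(np.mean(s_star > s_obs))
```

A path that is exactly linear in t − t₁ gives a statistic of zero, so its p-value is close to one, while a strongly nonlinear path gives a small p-value. The same `resampling_p_value` helper works for either statistic by passing `sup_stat` or `int_sq_stat` as `stat`.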

Tests of the null hypothesis H₀₂ that $γ^{(k)} (t) = 0$ for 0 ≤ t ≤ τ can be constructed similarly. In particular, one may consider the test statistics $S_{2} = \sup_{t_{1} \leq t \leq t_{2}} n^{1 / 2} | {\hat{Γ}}^{(k)} (t) |$ and $L_{2} = \int_{t_{1}}^{t_{2}} n \{{\hat{Γ}}^{(k)} (t)\}^{2} d t$. The reference distributions of S₂ and L₂ can be generated based on $S_{2}^{*} = \sup_{t_{1} \leq t \leq t_{2}} n^{1 / 2} | G_{n}^{* (k)} (t) |$ and $L_{2}^{*} = \int_{t_{1}}^{t_{2}} n \{G_{n}^{* (k)} (t)\}^{2} d t$, respectively.

3.4 Selections of bandwidth and link function

Let σ⁽^k⁾(t) be the (k, k)th element of Σ_γ(t). It follows from Theorem 2 that the mean integrated square error for estimating the kth component γ⁽^k⁾(t) over [t₁, t₂] is

\int_{t_{1}}^{t_{2}} [\frac{1}{4} μ_{2}^{2} {{\ddot{γ}}_{0}^{(k)} (t)}^{2} h^{4} + \frac{1}{n h} σ^{(k)} (t)] d t .

The asymptotic optimal bandwidth is given by

h_{opt, k} = {[\frac{\int_{t_{1}}^{t_{2}} σ^{(k)} (t) d t}{\int_{t_{1}}^{t_{2}} μ_{2}^{2} {{\ddot{γ}}_{0}^{(k)} (t)}^{2} d t}]}^{1 / 5} n^{- 1 / 5} .
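The bandwidth formula follows from minimizing the mean integrated square error in h. A short derivation is sketched below; the shorthand symbols A and B are introduced only for this illustration and do not appear in the paper.

```latex
% Shorthand (illustrative): A = \int_{t_1}^{t_2} \mu_2^2 \{\ddot{\gamma}_0^{(k)}(t)\}^2 dt, \quad B = \int_{t_1}^{t_2} \sigma^{(k)}(t)\,dt.
\begin{aligned}
\mathrm{MISE}(h) &= \frac{h^{4}}{4}\,A + \frac{1}{n h}\,B, \\
\frac{d}{dh}\,\mathrm{MISE}(h) &= h^{3} A - \frac{B}{n h^{2}} = 0
  \;\;\Longrightarrow\;\; h^{5} = \frac{B}{n A}
  \;\;\Longrightarrow\;\; h_{opt,k} = \left(\frac{B}{A}\right)^{1/5} n^{-1/5}, \\
\frac{d^{2}}{dh^{2}}\,\mathrm{MISE}(h) &= 3 h^{2} A + \frac{2 B}{n h^{3}} > 0 .
\end{aligned}
```

The positive second derivative confirms that the stationary point is a minimum whenever A > 0 and B > 0.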

The optimal theoretical bandwidth is difficult to achieve since it involves estimating the second derivative ${\ddot{γ}}_{0}^{(k)} (t)$ . In practice, the appropriate bandwidth selection can be based on a cross-validation method. This approach is widely used in nonparametric function estimation literature, see Rice and Silverman (1991) for leave-one-subject-out cross-validation approach and Tian et al. (2005) for K -fold cross-validation approach.

An analog of the K -fold cross-validation approach in the current setting is to divide the data into K equal-sized groups. Let D_k denote the kth subgroup of data, then the kth prediction error is given by

{P E}_{k} (h) = \sum_{i \in D_{k}} \int_{t_{1}}^{t_{2}} {[Y_{i} (t) - ϕ {{({\hat{γ}}_{(- k)} (t))}^{T} X_{i} (t) + {\hat{β}}_{(- k)}^{T} Z_{i} (t)}]}^{2} {d N}_{i} (t),

(16)

for k = 1, …, K, where ${\hat{γ}}_{(- k)} (t)$ and ${\hat{β}}_{(- k)}$ are the estimators of γ₀(t) and β₀ based on the data without the subgroup D_k. The data-driven bandwidth selection based on the K-fold cross-validation is to choose the bandwidth h that minimizes the total prediction error $P E (h) = \sum_{k = 1}^{K} {P E}_{k} (h)$. As we show in Sect. 5 in the analysis of an HIV-1 RNA data set from an AIDS clinical trial, the K-fold cross-validation bandwidth selection provides a working tool for locating an appropriate bandwidth.
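The K-fold search over candidate bandwidths can be organized as in the following Python sketch. It is a generic skeleton under stated assumptions: subjects (not individual observations) are split into folds, and the callables `fit` and `prediction_error` are hypothetical placeholders for the profile local linear estimator and for the prediction error (16), neither of which is implemented here.

```python
import numpy as np

def kfold_bandwidth(n_subjects, bandwidths, fit, prediction_error, K=5, seed=0):
    """Return the candidate bandwidth minimizing the total K-fold prediction error.

    fit(train_ids, h)                 -> fitted model using only the training subjects
    prediction_error(model, test_ids) -> prediction error on the held-out subjects
    Both callables are supplied by the user (hypothetical interface).
    """
    rng = np.random.default_rng(seed)
    # Randomly split subject indices into K groups of (nearly) equal size.
    folds = np.array_split(rng.permutation(n_subjects), K)
    all_ids = np.arange(n_subjects)
    total = []
    for h in bandwidths:
        pe = 0.0
        for held_out in folds:
            train = np.setdiff1d(all_ids, held_out)
            model = fit(train, h)
            pe += prediction_error(model, held_out)
        total.append(pe)
    total = np.asarray(total)
    return bandwidths[int(np.argmin(total))], total
```

Splitting by subject keeps all repeated measurements of one individual in the same fold, which matches the sum over i ∈ D_k in (16). The returned array of total prediction errors can be plotted against h to inspect how flat the criterion is near its minimum.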

Our estimation procedure for model (1) holds for a wide class of link functions. This presents an opportunity to select the most appropriate link function for a particular application. In some applications the choice may be based on prior knowledge, but more often it will be a pragmatic choice based on what gives the “best fit”. One natural criterion for assessing the model fit is the regression deviation defined as

R D (g (\cdot), h_{c v}) = \sum_{i = 1}^{n} \int_{t_{1}}^{t_{2}} {[Y_{i} (t) - g^{- 1} {{({\hat{γ}}_{g} (t))}^{T} X_{i} (t) + {\hat{β}}_{g}^{T} Z_{i} (t)}]}^{2} {d N}_{i} (t),

(17)

where h_cv is the bandwidth selected based on the K -fold cross-validation method for the given link function g(·) described above, and γ̂_g(t) and β̂_g are the estimators of γ₀(t) and β₀ under model (1) with the bandwidth h_cv. In practice, the link function g(·) can be selected to minimize the regression deviation. This approach is illustrated through a data example in Sect. 5.
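The comparison of link functions by regression deviation can be sketched as follows. The sketch assumes that, for each candidate link, the fitted linear predictor $({\hat{γ}}_{g} (t))^{T} X_{i} (t) + {\hat{β}}_{g}^{T} Z_{i} (t)$ has already been evaluated at every observed sampling time in [t₁, t₂], so that the integral with respect to N_i(t) in (17) reduces to a sum over observations. The function and dictionary names are hypothetical.

```python
import numpy as np

# Inverse link functions g^{-1} for the two links discussed in the text.
INVERSE_LINKS = {
    "identity": lambda eta: eta,
    "log": np.exp,
}

def regression_deviation(y, eta, inverse_link):
    """Sum of squared differences between responses and fitted means g^{-1}(eta)."""
    y = np.asarray(y, dtype=float)
    eta = np.asarray(eta, dtype=float)
    return float(np.sum((y - inverse_link(eta)) ** 2))

def select_link(y, eta_by_link):
    """Return the link name with the smallest regression deviation, and all deviations."""
    rd = {name: regression_deviation(y, eta, INVERSE_LINKS[name])
          for name, eta in eta_by_link.items()}
    return min(rd, key=rd.get), rd
```

Each entry of `eta_by_link` should come from a fit that uses its own cross-validated bandwidth, since the criterion in (17) is evaluated at h_cv for the given link.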

4 A simulation study

In this section, we examine finite sample properties of the estimation and hypothesis testing procedures proposed for model (1). The performances of the estimators for β and γ(t) at a fixed time t are measured through the bias, the sample mean of the estimated standard errors (ESE), the sample standard error of the estimators (SEE) and the 95 % empirical coverage probability (CP). To evaluate the overall performance of the estimator γ̂⁽^k⁾(t) on the interval [h, τ − h], we consider the square root of integrated mean square error ${RMSE}_{k} = {\frac{1}{N (τ - 2 h)} \sum_{j = 1}^{N} \int_{h}^{τ - h} {({\hat{γ}}_{j}^{(k)} (t) - γ_{0}^{(k)} (t))}^{2} d t}^{1 / 2}$ , where N is the repetition number, ${\hat{γ}}_{j}^{(k)} (t)$ is the jth estimate of γ⁽^k⁾(t) for j = 1, …, N. We use the unit weight function W_i (t) = 1 and the Epanechnikov kernel K (u) = 0.75(1− u²)I (|u| ≤ 1) throughout the simulation. We take t₁ = 0 and t₂ = τ in the estimating functions (4) and (8).
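The kernel and the RMSE summary used in the simulation can be checked numerically with a short Python sketch. It assumes that μ₂ denotes the second moment of the kernel, $\int u^{2} K (u) d u$, which is the usual convention, and that the estimates are stored on a common grid over [h, τ − h]; the helper names are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on |u| <= 1, zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)

def trapezoid(y, x):
    """Trapezoidal rule for samples y on grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def rmse(estimates, truth, grid):
    """Square root of the integrated mean square error over [grid[0], grid[-1]].

    estimates: array of shape (N, len(grid)), one row per simulation repetition.
    truth:     array of shape (len(grid),), the true coefficient on the grid.
    """
    sq = (np.asarray(estimates, dtype=float) - np.asarray(truth, dtype=float)) ** 2
    integrals = np.sum((sq[:, 1:] + sq[:, :-1]) * np.diff(grid), axis=1) / 2.0
    return float(np.sqrt(np.mean(integrals) / (grid[-1] - grid[0])))

u = np.linspace(-1.0, 1.0, 20001)
total_mass = trapezoid(epanechnikov(u), u)      # integral of K, approximately 1
mu2 = trapezoid(u ** 2 * epanechnikov(u), u)    # second moment of K, approximately 0.2
```

The value μ₂ = 0.2 follows analytically from $0.75 \int_{- 1}^{1} (u^{2} - u^{4}) d u = 0.75 (2 / 3 - 2 / 5)$. Dividing by the interval length in `rmse` mirrors the factor 1/(τ − 2h) in the definition of RMSE_k.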

The performance of the estimators is examined under the following setting of model (1), in which we take the link function g(x) = ln(x):

Y_{i} (t) = exp {0.5 \sqrt{t} + 0.5 sin (2 t) X_{i} + β Z_{i}} + ε_{i} (t), i = 1, \dots, n,

(18)

for 0 ≤ t ≤ τ with τ = 3.5, where X_i is a Bernoulli random variable with the success probability of 0.5, Z_i is uniformly distributed on (0, 1), ε_i (t) is N (φ_i, 0.5²) conditional on φ_i and φ_i is N (0, 1). Here γ(t) = (γ₁(t), γ₂(t))^T with γ₁(t) = 0.5t^1/2 and γ₂(t) = 0.5 sin(2t).

We consider three models for the sampling times. The first model is a Poisson process with the proportional mean rate

α (t ∣ X_{i}, Z_{i}) = 0.6 exp (0.7 Z_{i}), i = 1, \dots, n .

(19)

The second model is a Poisson process with the additive mean rate

α (t ∣ X_{i}, Z_{i}) = 0.4 + 0.9 Z_{i}, i = 1, \dots, n .

(20)

To examine the performance of the proposed method when the sampling strategy depends on the past history, we consider a nonhomogeneous Poisson process for the sampling times with the intensity function

λ (t ∣ Z_{i}, Z_{i}^{*}) = 0.12 t^{1 / 2} exp {2 Z_{i} + Z_{i}^{*} (t)},

(21)

where Z_i is uniform on (0, 1) and $Z_{i}^{*} (t) = 1$ if there was an event within the interval [t − 1, t) and 0 otherwise. For all three sampling models, the censoring times C_i are generated from U(1.5, 8). There are approximately 3 observations per subject in the interval [0, τ], and about 30% of the subjects are censored before τ = 3.5.
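One way to generate data from model (18) under the proportional sampling model (19) is sketched below in Python. It is an illustrative reading of the setup, not the authors' simulation code: the homogeneous Poisson process is simulated by drawing a Poisson count on [0, min(C_i, τ)] and placing the times uniformly, and the function names and dictionary layout are hypothetical. Sampling models (20) and (21) are not covered.

```python
import numpy as np

TAU = 3.5  # end of follow-up, as in model (18)

def simulate_subject(rng, beta=0.5):
    """Simulate one subject under model (18) with sampling model (19)."""
    x = rng.binomial(1, 0.5)            # Bernoulli(0.5) covariate
    z = rng.uniform(0.0, 1.0)           # Uniform(0, 1) covariate
    c = rng.uniform(1.5, 8.0)           # censoring time
    end = min(c, TAU)                   # observation stops at censoring or tau
    rate = 0.6 * np.exp(0.7 * z)        # proportional mean rate (19)
    n_obs = rng.poisson(rate * end)     # number of sampling times on [0, end]
    times = np.sort(rng.uniform(0.0, end, size=n_obs))
    phi = rng.normal(0.0, 1.0)          # subject-level random effect
    eps = rng.normal(phi, 0.5, size=n_obs)  # errors N(phi, 0.5^2) given phi
    mean = np.exp(0.5 * np.sqrt(times) + 0.5 * np.sin(2.0 * times) * x + beta * z)
    return {"x": x, "z": z, "c": c, "times": times, "y": mean + eps}

def simulate(n, seed=0, beta=0.5):
    """Simulate n independent subjects."""
    rng = np.random.default_rng(seed)
    return [simulate_subject(rng, beta) for _ in range(n)]
```

Under this setup the expected number of observations per subject is roughly 0.87 × 3.19 ≈ 2.8, and the probability that C_i < 3.5 is 2/6.5 ≈ 0.31, in line with the figures quoted in the text. The shared random effect φ_i induces correlation among the repeated measurements of a subject.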

Table 1 summarizes the bias, SEE, ESE and CP for β and the RMSE for γ(t) under the longitudinal model (18) with β = 0.5 and with the sampling times models (19)–(21). The integrals are evaluated on the grid points s_i = 0.05i, i = 1, 2, …, 69. The summaries of the performance of γ̂(t) at the time points 0.5j, j = 1, …, 6, are given in Table 2. Each entry of the tables is calculated based on 1,000 repetitions. Table 1 and Table 2 include the simulation results for β = 0.5 and n = 200 and 300. Expanded simulations for β = 0.0 and 1.5 and for n = 100 were also conducted but are not reported here. The simulation studies demonstrate that the proposed estimation procedures perform well for the three sampling situations considered here. It appears that the estimates are unbiased and that there is good agreement between the estimated and empirical standard errors. The empirical coverage probabilities are reasonable for both sample sizes 200 and 300. Plots of γ̂₁(t) and γ̂₂(t) for model (18) are depicted in Fig. 1 when β = 0.5 for n = 100 and h = 0.3. Figure 1a, b in the first row are under the proportional sampling model (19), Fig. 1c, d in the second row are under the additive sampling model (20) and Fig. 1e, f in the third row are under the sampling model (21). The estimators γ̂₁(t) and γ̂₂(t) are essentially unbiased. These figures also show that the proposed estimation procedures perform well for the nonparametric components under these three different sampling models.

Table 1.

Summary of bias, SEE, ESE and CP for β and RMSE for γ(t) under the longitudinal model (18) with β = 0.5, $γ_{1} (t) = 0.5 \sqrt{t}$ and γ₂(t) = 0.5 sin(2t) and with the sampling times models (19)–(21)

n	h	Bias	SEE	ESE	CP	RMSE₁	RMSE₂
Under sampling times model (19)
200	0.3	0.0012	0.0671	0.0657	94.1	0.0753	0.0960
	0.4	0.0020	0.0681	0.0660	94.7	0.0735	0.0907
	0.5	0.0031	0.0680	0.0664	94.4	0.0704	0.0878
300	0.3	0.0028	0.0562	0.0540	94.4	0.0678	0.0849
	0.4	0.0011	0.0566	0.0539	94.4	0.0608	0.0758
	0.5	−0.0013	0.0561	0.0542	93.2	0.0570	0.0750
Under sampling times model (20)
200	0.3	−0.0010	0.0676	0.0681	94.7	0.0754	0.0959
	0.4	0.0032	0.0709	0.0685	93.8	0.0758	0.0896
	0.5	0.0035	0.0693	0.0687	94.8	0.0695	0.0855
300	0.3	−0.0043	0.0580	0.0557	93.5	0.0690	0.0843
	0.4	−0.0020	0.0555	0.0560	95.9	0.0600	0.0743
	0.5	0.0005	0.0553	0.0561	94.7	0.0569	0.0730
Under sampling times model (21)
200	0.3	0.0004	0.0776	0.0703	0.923	0.0850	0.0914
	0.4	−0.0015	0.0730	0.0703	0.944	0.0803	0.0856
	0.5	−0.0025	0.0711	0.0705	0.948	0.0730	0.0799
300	0.3	0.0013	0.0595	0.0575	0.945	0.0679	0.0751
	0.4	0.0024	0.0603	0.0581	0.935	0.0668	0.0703
	0.5	−0.0007	0.0589	0.0577	0.943	0.0605	0.0702


Table 2.

Summary of bias, SEE, ESE and CP for γ (t) at t = 0.5, 1.0, 1.5, 2.0, 2.5, 3.0 for n = 200 and h = 0.4 under the longitudinal model (18) with β = 0.5, $γ_{1} (t) = 0.5 \sqrt{t}$ and γ₂(t) = 0.5 sin(2t) and with the sampling times models (19)–(21)

	γ₁(t) = 0.5√t				γ₂(t) = 0.5 sin(2t)
t	Bias	SEE	ESE	CP	Bias	SEE	ESE	CP
Under sampling times model (19)
0.5	−0.0164	0.0876	0.0862	0.940	−0.0181	0.0940	0.0909	0.929
1.0	−0.0110	0.0792	0.0766	0.946	−0.0207	0.0771	0.0772	0.927
1.5	−0.0079	0.0708	0.0707	0.941	−0.0012	0.0836	0.0780	0.930
2.0	−0.0030	0.0698	0.0674	0.944	0.0221	0.0931	0.0916	0.945
2.5	−0.0040	0.0661	0.0662	0.958	0.0284	0.1004	0.0938	0.911
3.0	−0.0038	0.0661	0.0648	0.944	0.0073	0.0792	0.0754	0.932
Under sampling times model (20)
0.5	−0.0148	0.0893	0.0869	0.949	−0.0213	0.0929	0.0892	0.930
1.0	−0.0057	0.0804	0.0776	0.932	−0.0283	0.0776	0.0758	0.921
1.5	−0.0050	0.0745	0.0724	0.933	−0.0042	0.0806	0.0774	0.942
2.0	−0.0026	0.0711	0.0689	0.936	0.0187	0.0963	0.0904	0.925
2.5	−0.0035	0.0699	0.0675	0.937	0.0230	0.0941	0.0925	0.928
3.0	−0.0046	0.0699	0.0668	0.931	0.0070	0.0808	0.0748	0.923
Under sampling times model (21)
0.5	−0.0097	0.1029	0.0947	0.927	−0.0193	0.1075	0.0961	0.921
1.0	−0.0078	0.0867	0.0808	0.920	−0.0225	0.0780	0.0749	0.927
1.5	−0.0033	0.0774	0.0754	0.936	−0.0028	0.0793	0.0739	0.933
2.0	−0.0023	0.0749	0.0706	0.924	0.0195	0.0864	0.0812	0.922
2.5	−0.0013	0.0715	0.0673	0.933	0.0318	0.0819	0.0765	0.895
3.0	−0.0001	0.0666	0.0649	0.941	0.0074	0.0606	0.0575	0.936


Fig. 1 — Plots of γ̂(t) for model (18) when γ₁(t) = 0.5t^1/2, γ₂(t) = 0.5 sin(2t) and β = 0.5 for n = 100 and h = 0.3. a, b in the *first row* are under the proportional sampling model (19), c, d in the *second row* are under the additive sampling model (20) and e, f in the *third row* are under the sampling model (21). The *solid lines* are the estimates and the *dashed lines* are the true *curves*

The following models are considered to evaluate the performance of the test statistics S₁ and L₁ for testing H₀₁:

Y_{i} (t) = exp {0.5 \sqrt{t} + {0.5 - θ sin (2 t)} X_{i} + 0.5 Z_{i}} + ε_{i} (t), i = 1, \dots, n,

(22)

for 0 ≤ t ≤ τ, where the distributions of X_i, Z_i and ε_i (t) are the same as those given in model (18). Different values of θ are selected to examine the power of the tests.

The observed sizes of the test statistics are calculated under θ = 0. The powers of the tests are evaluated at θ = 0.1, 0.15 and 0.2. Table 3 lists the empirical sizes and powers of the test statistics S₁ and L₁ at the significance level 0.05 under the sampling models (19)–(21). Each entry is based on 1,000 repetitions. Each p-value is estimated by generating 1,000 independent Gaussian random samples. The empirical sizes of both tests are reasonably close to the 0.05 nominal level. The empirical power increases as the sample size increases. The power also increases with θ, which represents a stronger time-varying effect under model (22). Again, the performance of the tests is robust to the model for the sampling times.

Table 3.

Empirical sizes and powers of the tests based on S₁ and L₁ at nominal level α = 0.05 under the longitudinal model (22) with the sampling times models (19)–(21)

n	h	Size		Power
		θ = 0		θ = 0.1		θ = 0.15		θ = 0.2
		S₁	L₁	S₁	L₁	S₁	L₁	S₁	L₁
Under sampling times model (19)
200	0.3	0.053	0.054	0.503	0.526	0.851	0.863	0.968	0.970
	0.4	0.055	0.057	0.542	0.566	0.851	0.859	0.980	0.981
	0.5	0.047	0.045	0.542	0.536	0.853	0.864	0.976	0.976
300	0.3	0.054	0.054	0.687	0.684	0.954	0.960	0.997	0.997
	0.4	0.060	0.054	0.699	0.702	0.950	0.952	0.999	0.998
	0.5	0.059	0.054	0.696	0.689	0.955	0.962	0.996	0.996
Under sampling times model (20)
200	0.3	0.057	0.056	0.535	0.554	0.986	0.987	0.979	0.984
	0.4	0.053	0.049	0.529	0.542	0.861	0.862	0.975	0.977
	0.5	0.066	0.058	0.565	0.558	0.872	0.867	0.974	0.973
300	0.3	0.064	0.055	0.695	0.703	0.952	0.959	0.997	0.997
	0.4	0.066	0.069	0.717	0.718	0.956	0.956	1.000	1.000
	0.5	0.052	0.045	0.710	0.717	0.958	0.965	0.999	0.999
Under sampling times model (21)
200	0.3	0.060	0.065	0.572	0.599	0.887	0.895	0.990	0.986
	0.4	0.065	0.066	0.584	0.578	0.886	0.885	0.986	0.989
	0.5	0.058	0.059	0.604	0.598	0.880	0.886	0.988	0.989
300	0.3	0.058	0.069	0.738	0.755	0.960	0.965	0.998	0.998
	0.4	0.065	0.066	0.760	0.768	0.967	0.968	0.999	0.998
	0.5	0.044	0.049	0.738	0.741	0.965	0.969	1.000	0.999


Finally, we conduct a small simulation study under the identity link function to compare with the joint modelling method of Lin and Ying (2001) in which the sampling times are modelled through the proportional mean rate model. We consider the following model for the longitudinal response

Y_{i} (t) = α (t) + β Z_{i} + ε_{i} (t),

(23)

where Z_i and ε_i (t) are the same as those for model (18), β = 1, and α(t) is taken to be 1 + t or 1 + t³. Table 4 lists the summaries of the estimation of β for the two choices of α(t) using the method of Lin and Ying (2001) (L&Y) and the proposed method with h = 0.3, 0.4 and 0.5 when the sampling times are generated from models (19)–(21). Each entry is based on 1,000 repetitions. The estimation of Lin and Ying (2001) has larger biases when the sampling model is misspecified under (20) and (21), especially when the sampling strategy depends on the past history and the intercept α(t) varies more. In all cases, the Lin and Ying (2001) estimation yields larger variances than the proposed method.

Table 4.

Comparisons of the estimation for β using the proposed method and the method of Lin and Ying (2001) under model (23) with β = 1 and two different choices of α(t) for n = 200

	α(t) = 1 + t				α(t) = 1 + t³
h	Bias	SEE	ESE	CP	Bias	SEE	ESE	CP
Under sampling times model (19)
0.3	0.0008	0.1665	0.1648	0.952	0.0065	0.1692	0.1675	0.944
0.4	0.0013	0.1660	0.1649	0.947	0.0072	0.1693	0.1714	0.951
0.5	0.0013	0.1659	0.1649	0.944	0.0076	0.1693	0.1786	0.962
L&Y	0.0024	0.1772	0.1786	0.947	−0.0390	0.9055	0.8756	0.943
Under sampling times model (20)
0.3	0.0052	0.1724	0.1728	0.954	−0.0019	0.1801	0.1756	0.942
0.4	0.0053	0.1720	0.1728	0.952	−0.0022	0.1793	0.1794	0.951
0.5	0.0051	0.1717	0.1728	0.953	−0.0021	0.1788	0.1867	0.958
L&Y	0.0033	0.1802	0.1879	0.952	0.0182	0.9088	0.9081	0.949
Under sampling times model (21)
0.3	0.0062	0.1779	0.1765	0.948	0.0068	0.1801	0.1796	0.938
0.4	0.0061	0.1775	0.1764	0.950	0.0068	0.1797	0.1863	0.949
0.5	0.0059	0.1770	0.1764	0.951	0.0063	0.1797	0.1985	0.964
L&Y	0.0905	0.1885	0.2026	0.931	0.7316	0.9190	0.9414	0.871


5 An application

In this section, we apply the proposed methods to a real data example. We demonstrate how to select the link function that provides a better fit to the data using the procedures given in Sect. 3.4. The estimation and inference are then carried out using the selected link function. We consider the analysis of an HIV-1 RNA data set from an AIDS clinical trial. In this study, all subjects initiated antiretroviral treatment at time 0 (the baseline). Some subjects received a single protease inhibitor (PI) regimen while others received a double-PI antiretroviral regimen. HIV-1 RNA levels in plasma were measured repeatedly during the follow-up. The scheduled visit times were at weeks 0, 2, 4, 8, 16 and 24, but the actual visit times of individuals may vary around the scheduled visit times. Some patients had prior antiviral treatment with non-nucleoside analogue reverse transcriptase inhibitors (NNRTI) and others did not. The prior NNRTI treatment is considered to be a factor that affects the antiviral response to the antiretroviral regimens in the current study.

A total of 481 patients were enrolled in the study, with 2,626 total visits. Owing to technical limitations, 175 measurements of HIV-1 RNA levels were censored below the detection limit, and three were censored above the detection limit. We restrict our analysis to those responses within the detectable range. This data set has been analyzed by Sun and Wu (2005). Here we use the same transformed time scale t = log₁₀(day of actual visit + 40) − log₁₀(32) of the actual visits so that the transformed sampling time points are more evenly distributed, which is suitable for bandwidth selection. The maximum of the transformed sampling times is τ = 0.88. The response variable Y (t) is the change of HIV-1 RNA level on a log₁₀ scale at time t ∈ [0, τ] from the baseline. We refer to Sun and Wu (2005) for detailed discussions of the transformations. Let X = 1 denote the patients who received a double-PI treatment and X = 0 the patients who received a single-PI treatment. Let Z be the indicator of the prior antiviral treatment with NNRTI, with 1 for having had NNRTI and 0 for not having received NNRTI.

The analysis of Sun and Wu (2005) shows that the effect of treatment (double-PI versus single-PI) is time-varying after adjusting for the prior NNRTI antiviral treatment experience under the semiparametric additive regression model. Here we consider fitting the following generalized semiparametric model

μ_{i} (t) = g^{- 1} {γ_{1} (t) + γ_{2} (t) X_{i} + β Z_{i}},

(24)

for 0 ≤ t ≤ τ, where g(·) is a known link function. In the following we illustrate the selection of g(·) based on criterion (17) between two commonly used link functions, the identity link function and the logarithm link function. We use the unit weight function W_i (t) = 1 and set t₁ = 0 and t₂ = τ in (4) for the estimation of β.

Based on the procedure given in Sect. 3.4, the K-fold cross-validation method with K = 50 for the identity link function yields the bandwidth h_cv = 0.06. The plot of the total prediction error is given in Fig. 2a. The regression deviation is RD = 3.3495 × 10³ using (17). For the logarithm link function, the K-fold cross-validation with K = 50 yields h_cv = 0.07. The plot of the total prediction error is similar to Fig. 2a. The corresponding regression deviation is RD = 3.3801 × 10³, which is larger than that under the identity link function. This suggests that model (24) with the identity link function provides a better fit to the data.

Fig. 2 — The curve of the total prediction error PE(h) is plotted against h in (a) and the change of β̂ with h is shown in (b) under the identity link function for the HIV-1 RNA data

Under the identity link function, the estimate β̂ is 0.6243 with standard error 0.0883 for h = 0.06. The estimates γ̂₁(t) and γ̂₂(t) and the 95 % pointwise confidence intervals are plotted in the first row of Fig. 3. The p-values for testing for time-dependence of γ₂(t) are 0.009 and 0.022 using the test statistics S₁ and L₁, respectively, based on 1,000 Gaussian samples. Our experience shows that the “optimal” bandwidth that minimizes the total prediction error tends to be a little too small to yield smooth curves for the nonparametric regression function estimates. The estimation of the parametric components is not greatly affected by the choice of the bandwidth. The plot of β̂ against h is given in Fig. 2b. For example, β̂ is 0.6150 with standard error 0.0888 for h = 0.09, and β̂ is 0.6064 with standard error 0.0894 for h = 0.12. The p-values for testing for time-dependence of γ₂(t) are 0.014 and 0.034 for h = 0.09 using the test statistics S₁ and L₁, respectively. The corresponding p-values are 0.039 and 0.022 for h = 0.12 based on the test statistics S₁ and L₁, respectively. The estimates γ̂₁(t) and γ̂₂(t) and the 95 % pointwise confidence intervals with h = 0.09 and 0.12 are plotted in the second and third rows of Fig. 3, respectively. Our hypothesis tests indicate that the treatment effect changes with time. The p-values for testing γ₂(t) = 0 using the test statistics S₂ and L₂ are 0.036 and 0.044, respectively, for h = 0.06. The double-PI antiretroviral regimen works better than the single-PI regimen in reducing viral load in HIV-infected patients, and this effect becomes stronger over time during the course of the study, as shown in Fig. 3. The patients who had prior antiviral treatment with NNRTIs tend to have a higher viral load than those who did not have the prior treatment.

Fig. 3 — The plots of the estimates γ̂₁(t) and γ̂₂(t) and the 95 % pointwise confidence intervals for three different bandwidths h = 0.06, 0.09 and 0.12 under the identity link function for the HIV-1 RNA data

6 Discussion

In this paper we study the generalized semiparametric regression model for longitudinal data. The semiparametric model (1) allows the covariate effects to be constant for some covariates and time-varying for others. We propose an estimation method that automatically adjusts for heterogeneity of sampling times. The nonparametric components of the model are estimated using the local linear estimating equations and the parametric components are estimated through the weighted profile estimating functions. Unlike the profile-based estimation methods of Lin and Carroll (2001), Lin et al. (2007) and Fan et al. (2007), the proposed method applies to situations where the conditional distributions of the sampling times may depend on the past sampling history as well as covariates and where the subjects may drop out during follow-up. Also unlike the joint modelling approaches of Martinussen and Scheike (1999), Martinussen and Scheike (2000, 2001), Lin and Ying (2001), Hu et al. (2003) and Fan and Li (2004), the proposed method utilizes the sampling process directly without having to model its intensity function λ_i (t) or the mean rate function α_i (t). Specifying models for λ_i (t) or α_i (t) can be very difficult and problematic, especially when the sampling strategy may depend on the past sampling history. Our simulation studies show that the proposed method works well under a variety of sampling models including the proportional mean rate model, the additive mean rate model and the sampling model that depends on the past sampling history. Simulations also show that, for some existing methods, misspecification of the sampling model can result in large estimation bias.

This paper presents a unified approach to the semiparametric model (1) with a general link function. Different link functions can be specified for more flexible modelling of longitudinal data. Both categorical and continuous longitudinal responses can be modelled with appropriately chosen link functions. For example, the identity and logarithm link functions can be used for continuous response variables while the logit link function can be used for binary responses. Model (1) presents an opportunity for model selection of the link function. A criterion is proposed for selecting the link function that provides a better fit. The procedures are illustrated with a data example.

The estimation method developed in Sects. 2.1–2.3 assumes the existence of an intensity for the counting processes that record the random sampling time points. The method is extended to fixed designs in Sect. 2.4, where data are observed at the planned sampling time points. The asymptotic properties of the estimators for both cases are derived in Sect. 3.1. In the situation of a “mixed design”, where some observations are made at fixed time points with positive probability while other observations are made at random sampling points, we believe that the estimating functions (3) and (4) are still valid and yield consistent estimators. However, the asymptotic properties need to be carefully worked out since the asymptotic rate of γ̂(t) is $(n h)^{- 1 / 2}$ under the random design and $n^{- 1 / 2}$ under the fixed design. The derivation of the asymptotic results requires a clear classification of whether an observation time point is from a planned visiting time or from a random sampling time.

The proposed estimation method is a marginal approach that does not take into account the correlations between the repeated measurements. Following the discussions in Sect. 3.1, the efficiency of the estimation can be improved by selecting W_i (t) = W (t, X_i (t), Z_i (t)), where W (t, x, z) converges in probability to w(t, x, z) = μ̇_i (t)/{σ_ε (t|x, z)}². Since model (1) does not specify the conditional variance structure for $σ_{ε}^{2} (t ∣ x, z)$, such a selection can be difficult. When the dimension of the covariates is small, a two-stage estimation procedure can be considered. In the first stage, estimate β and γ(t) with the identity weight function to obtain β̂_I and γ̂_I (t). In the second stage, the updated estimation of β and γ(t) is obtained by choosing the weight $W_{i} (t) = {\hat{\dot{μ}}}_{i} (t) / \{{\hat{σ}}_{ε} (t ∣ x, z)\}^{2}$, where ${\hat{\dot{μ}}}_{i} (t) = \dot{ϕ} \{{\hat{γ}}_{I}^{T} (t) X_{i} (t) + {\hat{β}}_{I}^{T} Z_{i} (t)\}$ and $\{{\hat{σ}}_{ε} (t ∣ x, z)\}^{2}$ equals

\frac{\sum_{j = 1}^{n} \int_{0}^{τ} {Y_{j} (s) - ϕ ({\hat{γ}}_{I}^{T} (s) X_{j} (s) + {\hat{β}}_{I}^{T} Z_{j} (s))}^{2} K_{h} (s - t) {\tilde{K}}_{B} (X_{j} - x, Z_{j} - z) {d N}_{j} (s)}{\sum_{j = 1}^{n} \int_{0}^{τ} K_{h} (s - t) {\tilde{K}}_{B} (X_{j} - x, Z_{j} - z) {d N}_{j} (s)},

where $\tilde{K}_{B} (u) = |B|^{- 1 / 2} \tilde{K} (B^{- 1 / 2} u)$, K̃(u) is a multivariate kernel function and B is the (p + q) × (p + q) positive definite bandwidth matrix. The strategy may be difficult to implement when the dimension of the covariates is large because of the curse of dimensionality. Further research is warranted.

Acknowledgments

The authors thank the reviewers for their constructive comments that have improved the presentation and content of the paper. The research of Yanqing Sun was partially supported by NSF grants DMS-0905777 and DMS-1208978, NIH NIAID grant 2 R37 AI054165-10 and a fund provided by UNC Charlotte. The research of Liuquan Sun was partly supported by the National Natural Science Foundation of China Grants (No. 10731010, 10971015 and 10721101), the National Basic Research Program of China (973 Program) (No. 2007CB814902) and Key Laboratory of RCSDS, CAS (No. 2008DP173182).

Appendix

We assume the following conditions throughout the paper:

Condition A

The covariate processes X_i (·) and Z_i (·) are left continuous; the censoring time C_i is noninformative in the sense that $E \{{d N}_{i}^{*} (t) ∣ X_{i} (t), Z_{i} (t), C_{i} \geq t\} = E \{{d N}_{i}^{*} (t) ∣ X_{i} (t), Z_{i} (t)\}$ and E{Y_i (t)|X_i (t), Z_i (t), C_i ≥ t} = E{Y_i (t)|X_i (t), Z_i (t)}; ${d N}_{i}^{*} (t)$ is independent of Y_i (t) conditional on X_i (t), Z_i (t) and C_i ≥ t; the processes Y_i (t), X_i (t), Z_i (t) and α_i (t), 0 ≤ t ≤ τ, are bounded and their total variations are bounded by a constant; E|N_i (t₂) − N_i (t₁)|² ≤ L(t₂ − t₁) for 0 ≤ t₁ ≤ t₂ ≤ τ, where L > 0 is a constant; the link function g(y) is monotone and its inverse function g⁻¹(x) is twice differentiable; γ₀(t), e_{x x} (t) and e_xz(t) are twice differentiable; (e_{x x} (t))⁻¹ is bounded over 0 ≤ t ≤ τ; the matrices A and Σ are positive definite; the weight process $W (t, x, z) \overset{P}{\to} w (t, x, z)$ uniformly in the range of (t, x, z); w(t, x, z) is differentiable with uniformly bounded partial derivatives; the kernel function K (·) is symmetric with compact support on [−1, 1] and bounded variation; the bandwidth h → 0; $E |N_{i} (t + h) - N_{i} (t - h)|^{2 + v} = O (h)$ for some v > 0; the limit $\lim_{n \to \infty} h E \{\int_{0}^{τ} w_{i} (s) \{Y_{i} (s) - μ_{i} (s)\} X_{i} (s) K_{h} (s - t) {d N}_{i} (s)\}^{\otimes 2} = Σ_{e} (t)$ exists and is finite.

Let $u_{a} (γ, β) = E ([ϕ \{γ_{0}^{T} (t) X_{i} (t) + β_{0}^{T} Z_{i} (t)\} - ϕ \{γ^{T} (t) X_{i} (t) + β^{T} Z_{i} (t)\}] X_{i} (t) ξ_{i} (t) α_{i} (t))$. Define γ_β (t) as the unique root such that u_a(γ_β, β) = 0 for β ∈ $N_{β}$, where $N_{β}$ is a neighborhood of β₀. Let $e_{β, x x} (t) = E [w_{i} (t) \dot{ϕ} \{γ_{β}^{T} (t) X_{i} (t) + β^{T} Z_{i} (t)\} \{X_{i} (t)\}^{\otimes 2} α_{i} (t) ξ_{i} (t)]$ and $e_{β, x z} (t) = E [w_{i} (t) \dot{ϕ} \{γ_{β}^{T} (t) X_{i} (t) + β^{T} Z_{i} (t)\} X_{i} (t) {(Z_{i} (t))}^{T} α_{i} (t) ξ_{i} (t)]$. When β = β₀, we have γ_β(t) = γ₀(t). In this case, $e_{β, x x} (t) = e_{x x} (t)$ and $e_{β, x z} (t) = e_{x z} (t)$. Let $γ_{a β} (t) = {(γ_{β}^{T} (t), 0_{q}^{T})}^{T}$, where 0_q is a q × 1 vector of zeros.

Let H = diag{I_q, h I_q }. The following lemmas are used in the proofs of the main theorems. The proofs of the lemmas make repeated applications of the Glivenko-Cantelli Theorem (Theorem 19.4 of van der Vaart 1998). A sufficient condition for applying the Glivenko-Cantelli Theorem can be checked by estimating the order of the bracketing number, similar to the proof of Lemma 2 of Sun et al. (2009). This sufficient condition holds under the conditions provided in Condition A. The details are omitted to save space.

Lemma 1

Assume that Condition A holds. Then as n → ∞, $H {\tilde{γ}}_{a} (t, β) \overset{P}{\to} γ_{a β} (t)$ ,

H \frac{\partial {\tilde{γ}}_{a} (t, β)}{\partial β} \overset{P}{\to} (\begin{matrix} - {(e_{β, x x} (t))}^{- 1} e_{β, x z} (t) \\ 0_{q} \end{matrix}),

and H∂²γ̃(t, β)/∂β² converges in probability to a deterministic function of (t, β) of bounded variation, uniformly in t ∈ [t₁, t₂] ⊂ (0, τ) and β ∈ $N_{β}$ at the rate $n^{- 1 / 2 + ν}$ for ν > 0.

Proof of Lemma 1

To simplify the presentations, we use the notations γ_aβ and γ_β for γ_aβ (t) and γ_β (t), respectively. Let θ = H(γ_a − γ_aβ) and θ̃ = H(γ̃_a(t, β) − γ_aβ). By (3), θ̃ is the root of the following estimating function for fixed β:

U_{a} (γ_{a β} + H^{- 1} θ, β) = \sum_{i = 1}^{n} \int_{0}^{τ} W_{i} (s) {Y_{i} (s) - {\tilde{μ}}_{a} (s, γ_{a β} + H^{- 1} θ, β ∣ X_{i}, Z_{i})} \times {\tilde{X}}_{i} (s, s - t) K_{h} (s - t) {d N}_{i} (s),

(25)

where ${\tilde{μ}}_{a} (s, γ_{a β} + H^{- 1} θ, β ∣ X_{i}, Z_{i}) = ϕ \{θ^{T} {\tilde{U}}_{i} (s, s - t) + γ_{a β}^{T} (t) {\tilde{X}}_{i} (s, s - t) + β^{T} Z_{i} (s)\}$ and ${\tilde{U}}_{i} (s, s - t) = H^{- 1} {\tilde{X}}_{i} (s, s - t)$.

By the Glivenko-Cantelli theorem,

\begin{array}{l} n^{- 1} {U_{a} (γ_{a β} + H^{- 1} θ, β) - U_{a} (γ_{a β}, β)} \\ = - n^{- 1} \sum_{i = 1}^{n} \int_{0}^{τ} W_{i} (s) {{\tilde{μ}}_{a} (s, γ_{a β} + H^{- 1} θ, β ∣ X_{i}, Z_{i}) - {\tilde{μ}}_{a} (s, γ_{a β}, β ∣ X_{i}, Z_{i})} {\tilde{X}}_{i} (s, s - t) K_{h} (s - t) {d N}_{i} (s) \\ \overset{P}{\to} - E (\int_{- 1}^{1} w_{i} (t) {\dot{μ}}_{i β} (t) θ^{T} {X_{i}^{T} (t), {u X}_{i}^{T} (t)}^{T} {X_{i}^{T} (t), 0}^{T} K (u) α_{i} (t) ξ_{i} (t) d u), \end{array}

uniformly in t ∈ [t₁, t₂], β ∈ $N_{β}$ and θ in a neighborhood of $0_{2 q} \in R^{2 q}$, where ${\dot{μ}}_{i β} (t) = \dot{ϕ} \{γ_{β}^{T} (t) X_{i} (t) + β^{T} Z_{i} (t)\}$. The limit has a unique root at $θ = 0_{2 q}$.

By the Glivenko-Cantelli theorem and (3), $n^{- 1} U_{a} (γ_{a β}, β) \overset{P}{\to} {u_{a}^{T} (γ_{β}, β), 0_{q}^{T}}^{T} = 0_{2 q}$ . It follows by Lemma 1 of Sun et al. (2009) that $\tilde{θ} \overset{P}{\to} 0_{2 q}$ uniformly in t and β. Thus

H \{\tilde{γ}_{a} (t, β) - γ_{a β} (t)\} \overset{P}{\to} 0_{2 q} \quad \text{uniformly in } t \in [t_{1}, t_{2}] \text{ and } β \in N_{β} .

(26)

Since $U_{a} (\tilde{γ}_{a} (t, β), β) \equiv 0_{2 q}$, γ̃_a(t, β) satisfies

\left. \left\{ \frac{\partial U_{a} (γ_{a}, β)}{\partial γ_{a}} \frac{\partial \tilde{γ}_{a} (t, β)}{\partial β} + \frac{\partial U_{a} (γ_{a}, β)}{\partial β} \right\} \right|_{γ_{a} = \tilde{γ}_{a} (t, β)} = 0_{2 q} .

(27)

Note that

\begin{array}{l} - n^{- 1} H^{- 2} \frac{\partial U_{a} (γ_{a}, β)}{\partial γ_{a}} = n^{- 1} \sum_{i = 1}^{n} \int_{0}^{τ} W_{i} (s) \dot{ϕ} \{γ_{a}^{T} \tilde{X}_{i} (s, s - t) + β^{T} Z_{i} (s)\} H^{- 2} \{\tilde{X}_{i} (s, s - t)\}^{\otimes 2} K_{h} (s - t) \, d N_{i} (s) \\ = n^{- 1} \sum_{i = 1}^{n} \int_{0}^{τ} W_{i} (s) \dot{ϕ} \{(H γ_{a})^{T} H^{- 1} \tilde{X}_{i} (s, s - t) + β^{T} Z_{i} (s)\} H^{- 2} \{\tilde{X}_{i} (s, s - t)\}^{\otimes 2} K_{h} (s - t) \, d N_{i} (s) . \end{array}

(28)

By the Glivenko-Cantelli theorem, the process

n^{- 1} \sum_{i = 1}^{n} \int_{0}^{τ} W_{i} (s) \dot{ϕ} \{η^{T} H^{- 1} \tilde{X}_{i} (s, s - t) + β^{T} Z_{i} (s)\} H^{- 2} \{\tilde{X}_{i} (s, s - t)\}^{\otimes 2} K_{h} (s - t) \, d N_{i} (s)

converges in probability to

E \left[ \int_{- 1}^{1} w_{i} (t) \dot{ϕ} [η^{T} \{X_{i}^{T} (t), u X_{i}^{T} (t)\}^{T} + β^{T} Z_{i} (t)] \begin{pmatrix} 1 & u \\ u & u^{2} \end{pmatrix} \otimes \{X_{i} (t)\}^{\otimes 2} ξ_{i} (t) α_{i} (t) K (u) \, d u \right],

uniformly in t ∈ [t₁, t₂], β ∈ N_β and η in a neighborhood of γ_aβ (t) at the rate $n^{- 1 / 2 + ν}$ for ν > 0.

It follows from (26) that

\begin{array}{l} - \left. n^{- 1} H^{- 2} \frac{\partial U_{a} (γ_{a}, β)}{\partial γ_{a}} \right|_{γ_{a} = \tilde{γ}_{a} (t, β)} \\ \overset{P}{\to} E \left[ w_{i} (t) \dot{ϕ} \{γ_{β}^{T} (t) X_{i} (t) + β^{T} Z_{i} (t)\} \begin{pmatrix} 1 & 0 \\ 0 & μ_{2} \end{pmatrix} \otimes \{X_{i} (t)\}^{\otimes 2} ξ_{i} (t) α_{i} (t) \right], \end{array}

uniformly in t ∈ [t₁, t₂] and β ∈ N_β at the rate $n^{- 1 / 2 + ν}$ for ν > 0.

Similarly,

\begin{array}{l} - \left. n^{- 1} H^{- 1} \frac{\partial U_{a} (γ_{a}, β)}{\partial β} \right|_{γ_{a} = \tilde{γ}_{a} (t, β)} \\ = \left. n^{- 1} \sum_{i = 1}^{n} \int_{0}^{τ} W_{i} (s) \dot{ϕ} \{γ_{a}^{T} \tilde{X}_{i} (s, s - t) + β^{T} Z_{i} (s)\} H^{- 1} \tilde{X}_{i} (s, s - t) (Z_{i} (s))^{T} K_{h} (s - t) \, d N_{i} (s) \right|_{γ_{a} = \tilde{γ}_{a} (t, β)} \\ \overset{P}{\to} \begin{pmatrix} E [w_{i} (t) \dot{ϕ} \{γ_{β}^{T} (t) X_{i} (t) + β^{T} Z_{i} (t)\} X_{i} (t) (Z_{i} (t))^{T} ξ_{i} (t) α_{i} (t)] \\ 0_{q} \end{pmatrix}, \end{array}

(29)

uniformly in t ∈ [t₁, t₂] and β ∈ N_β at the rate $n^{- 1 / 2 + ν}$ for ν > 0. It follows from (27) that

H \frac{\partial \tilde{γ}_{a} (t, β)}{\partial β} \overset{P}{\to} \begin{pmatrix} - (e_{β, x x} (t))^{- 1} e_{β, x z} (t) \\ 0_{q} \end{pmatrix},

(30)

at the rate $n^{- 1 / 2 + ν}$ for ν > 0, uniformly in t ∈ [t₁, t₂] and β ∈ N_β.

By a similar argument, H∂²γ̃(t, β)/∂β² converges in probability to a deterministic function of (t, β) of bounded variation, uniformly in t ∈ [t₁, t₂] and β ∈ N_β.

Lemma 2

Under Condition A, as nh → ∞ and nh⁵ = O(1),

(n h)^{1 / 2} \{\tilde{γ} (t, β_{0}) - γ_{0} (t) - \frac{1}{2} μ_{2} h^{2} \ddot{γ}_{0}^{T} (t)\} = (e_{x x} (t))^{- 1} (n h)^{1 / 2} n^{- 1} U_{γ} (γ_{0}, β_{0}) + o_{p} (1),

(31)

uniformly in t ∈ [t₁, t₂] ⊂ (0, τ), where $μ_{2} = \int_{- 1}^{1} t^{2} K (t) d t$ and

U_{γ} (γ_{0}, β_{0}) = \sum_{i = 1}^{n} \int_{0}^{τ} W_{i} (s) \{Y_{i} (s) - μ_{i} (s)\} X_{i} (s) K_{h} (s - t) \, d N_{i} (s) .

Further, $(n h)^{1 / 2} n^{- 1} U_{γ} (γ_{0}, β_{0}) = O_{p} (1)$ uniformly in t ∈ [t₁, t₂] ⊂ (0, τ).
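As a small numerical illustration of the constant μ₂ in Lemma 2, the sketch below evaluates the integral of t²K(t) over [−1, 1] for the Epanechnikov kernel. The kernel choice and the helper names `epanechnikov` and `second_moment` are assumptions made for this example only; the lemma is stated for a generic kernel K.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def second_moment(kernel, n_grid=20001):
    """Approximate mu_2 = integral of t^2 K(t) over [-1, 1] by the trapezoidal rule."""
    t = np.linspace(-1.0, 1.0, n_grid)
    f = t**2 * kernel(t)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))

# Illustrative kernel choice (an assumption of this sketch, not of the lemma).
mu2 = second_moment(epanechnikov)
```

For this particular kernel the exact value is μ₂ = 1/5, so the bias term ½μ₂h² γ̈₀(t) in (31) would reduce to h² γ̈₀(t)/10.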

Proof of Lemma 2

Let $γ_{0 a} (t) = (γ_{0}^{T} (t), \dot{γ}_{0}^{T} (t))^{T}$, $ρ_{n} = (n h)^{1 / 2}$ and $θ = ρ_{n} H (γ_{a} - γ_{0 a} (t))$. By a first-order Taylor expansion, we have

\begin{array}{l} n^{- 1} \{U_{a} (γ_{0 a} + ρ_{n}^{- 1} H^{- 1} θ, β_{0}) - U_{a} (γ_{0 a}, β_{0})\} \\ = - n^{- 1} \sum_{i = 1}^{n} \int_{0}^{τ} W_{i} (s) \{\tilde{μ}_{a} (s, γ_{0 a} + ρ_{n}^{- 1} H^{- 1} θ, β_{0} \mid X_{i}, Z_{i}) - \tilde{μ}_{a} (s, γ_{0 a}, β_{0} \mid X_{i}, Z_{i})\} \times \tilde{X}_{i} (s, s - t) K_{h} (s - t) \, d N_{i} (s) \\ = - n^{- 1} \sum_{i = 1}^{n} \int_{0}^{τ} W_{i} (s) \{ρ_{n}^{- 1} θ^{T} \tilde{U}_{i} (s, s - t)\} \dot{ϕ} \{γ_{0 a}^{T} \tilde{X}_{i} (s, s - t) + β_{0}^{T} Z_{i} (s)\} \times \tilde{X}_{i} (s, s - t) K_{h} (s - t) \, d N_{i} (s) + o_{p} (ρ_{n}^{- 1} θ) \\ = - n^{- 1} \sum_{i = 1}^{n} \int_{0}^{τ} W_{i} (s) (\tilde{X}_{i} (s, s - t))^{\otimes 2} ρ_{n}^{- 1} H^{- 1} θ \dot{ϕ} \{γ_{0 a}^{T} \tilde{X}_{i} (s, s - t) + β_{0}^{T} Z_{i} (s)\} K_{h} (s - t) \, d N_{i} (s) + o_{p} (ρ_{n}^{- 1} θ), \end{array}

which holds uniformly in t ∈ [t₁, t₂]. Since $\tilde{θ} = ρ_{n} H (\tilde{γ}_{a} (t, β_{0}) - γ_{0 a} (t))$ is the root of $U_{a} (γ_{0 a} + ρ_{n}^{- 1} H^{- 1} θ, β_{0})$, it follows that θ̃ equals

\left( n^{- 1} \sum_{i = 1}^{n} \int_{0}^{τ} W_{i} (s) (\tilde{X}_{i} (s, s - t))^{\otimes 2} H^{- 1} \dot{ϕ} \{γ_{0 a}^{T} \tilde{X}_{i} (s, s - t) + β_{0}^{T} Z_{i} (s)\} K_{h} (s - t) \, d N_{i} (s) + o_{p} (ρ_{n}^{- 1}) \right)^{- 1} \times ρ_{n} n^{- 1} U_{a} (γ_{0 a}, β_{0}) .

The first q components of θ̃ yield

ρ_{n} (\tilde{γ} (t, β_{0}) - γ_{0} (t)) = {(e_{x x} (t))}^{- 1} ρ_{n} n^{- 1} U_{1} (γ_{0 a}, β_{0}) + o_{p} (ρ_{n}^{- 1}),

(32)

uniformly in t ∈ [t₁, t₂], where

U_{1} (γ_{0 a}, β_{0}) = \sum_{i = 1}^{n} \int_{0}^{τ} W_{i} (s) \{Y_{i} (s) - \tilde{μ}_{a} (s, γ_{0 a}, β_{0} \mid X_{i}, Z_{i})\} X_{i} (s) K_{h} (s - t) \, d N_{i} (s) .

By the local linear approximation for γ₀(s) around t,

\begin{array}{l} μ_{i} (s) - \tilde{μ}_{a} (s, γ_{0 a}, β_{0} \mid X_{i}, Z_{i}) = ϕ \{γ_{0}^{T} (s) X_{i} (s) + β_{0}^{T} Z_{i} (s)\} - ϕ [\{γ_{0}^{T} (t) + \dot{γ}_{0}^{T} (t) (s - t)\} X_{i} (s) + β_{0}^{T} Z_{i} (s)] \\ = \dot{μ}_{i} (s) \{\frac{1}{2} \ddot{γ}_{0}^{T} (t) X_{i} (s) (s - t)^{2} + O ((s - t)^{3})\} (1 + o_{p} (1)), \end{array}

as s → t, where $\dot{μ}_{i} (s) = \dot{ϕ} \{γ_{0}^{T} (s) X_{i} (s) + β_{0}^{T} Z_{i} (s)\}$. It follows that

\begin{array}{l} ρ_{n} n^{- 1} \sum_{i = 1}^{n} \int_{0}^{τ} W_{i} (s) \{μ_{i} (s) - \tilde{μ}_{a} (s, γ_{0 a}, β_{0} \mid X_{i}, Z_{i})\} X_{i} (s) K_{h} (s - t) \, d N_{i} (s) \\ = \frac{1}{2} μ_{2} ρ_{n} h^{2} E \{w_{i} (t) \dot{μ}_{i} (t) X_{i} (t) X_{i}^{T} (t) ξ_{i} (t) α_{i} (t)\} \ddot{γ}_{0} (t) + o_{p} (ρ_{n} h^{2}) \\ = \frac{1}{2} μ_{2} ρ_{n} h^{2} e_{x x} (t) \ddot{γ}_{0} (t) + o_{p} (ρ_{n} h^{2}), \end{array}

uniformly in t ∈ [t₁, t₂]. Hence

\begin{array}{l} ρ_{n} n^{- 1} U_{1} (γ_{0 a}, β_{0}) = ρ_{n} n^{- 1} \sum_{i = 1}^{n} \int_{0}^{τ} W_{i} (s) [Y_{i} (s) - μ_{i} (s) + \{μ_{i} (s) - \tilde{μ}_{a} (s, γ_{0 a}, β_{0} \mid X_{i}, Z_{i})\}] X_{i} (s) K_{h} (s - t) \, d N_{i} (s) \\ = ρ_{n} n^{- 1} U_{γ} (γ_{0}, β_{0}) + \frac{1}{2} μ_{2} ρ_{n} h^{2} e_{x x} (t) \ddot{γ}_{0} (t) + o_{p} (ρ_{n} h^{2}), \end{array}

(33)

uniformly in t ∈ [t₁, t₂]. By (32) and (33),

ρ_{n} \{\tilde{γ} (t, β_{0}) - γ_{0} (t) - \frac{1}{2} μ_{2} h^{2} \ddot{γ}_{0}^{T} (t)\} = (e_{x x} (t))^{- 1} ρ_{n} n^{- 1} U_{γ} (γ_{0}, β_{0}) + o_{p} (ρ_{n}^{- 1}) + o_{p} (ρ_{n} h^{2}),

(34)

uniformly in t ∈ [t₁, t₂].

Following the same lines as the proof in Appendix A of Tian et al. (2005), we get $(n h)^{1 / 2} n^{- 1} U_{γ} (γ_{0}, β_{0}) = O_{p} (1)$ uniformly in t ∈ [t₁, t₂] ⊂ (0, τ).
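The leading bias term ½μ₂h² γ̈₀(t) appearing in (34) can be checked numerically in the simplest special case. The sketch below assumes an identity link, a single time-varying coefficient with X ≡ 1, no parametric part, and noise-free observations on a dense regular grid; the test function sin, the Epanechnikov kernel and the helper name `local_linear_intercept` are likewise assumptions of this illustration, not part of the setting of the paper.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1]; its second moment mu_2 equals 1/5."""
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

def local_linear_intercept(s, y, t, h):
    """Kernel-weighted least squares fit of y on (1, s - t); returns the intercept."""
    w = epanechnikov((s - t) / h) / h            # K_h(s - t)
    X = np.column_stack([np.ones_like(s), s - t])
    XtW = X.T * w                                # weights applied column-wise
    coef = np.linalg.solve(XtW @ X, XtW @ y)
    return coef[0]

# Hypothetical noise-free design: gamma_0(s) = sin(s) observed on a fine grid.
s = np.linspace(0.0, 3.0, 30001)
t, h, mu2 = 1.5, 0.2, 0.2
bias = local_linear_intercept(s, np.sin(s), t, h) - np.sin(t)
predicted = 0.5 * mu2 * h**2 * (-np.sin(t))      # (1/2) mu_2 h^2 gamma_0''(t)
```

In this toy setting `bias` and `predicted` agree up to a remainder of order h⁴, which is the behaviour the expansion describes when no sampling noise is present.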

Proof of Theorem 1

By Lemma 1 and application of the Glivenko-Cantelli theorem to the estimating function defined in (4), we have

\begin{array}{l} n^{- 1} U (β) \\ \overset{P}{\to} E \left\{ \int_{t_{1}}^{t_{2}} w_{i} (s) [Y_{i} (s) - ϕ \{(γ_{β} (s))^{T} X_{i} (s) + β^{T} Z_{i} (s)\}] \times [- (e_{β, x z} (s))^{T} (e_{β, x x} (s))^{- 1} X_{i} (s) + Z_{i} (s)] \, d N_{i} (s) \right\} \\ = E \left\{ \int_{t_{1}}^{t_{2}} w_{i} (s) [ϕ \{(γ_{0} (s))^{T} X_{i} (s) + β_{0}^{T} Z_{i} (s)\} - ϕ \{(γ_{β} (s))^{T} X_{i} (s) + β^{T} Z_{i} (s)\}] \times [- (e_{β, x z} (s))^{T} (e_{β, x x} (s))^{- 1} X_{i} (s) + Z_{i} (s)] ξ_{i} (s) α_{i} (s) \, d s \right\} \\ \equiv u (β), \end{array}

uniformly for β ∈ N_β. Since u(β₀) = 0 and A is positive definite, β₀ is the unique root of u(β). By Theorem 5.9 of van der Vaart (1998), $\hat{β} \overset{P}{\to} β_{0}$.

By Lemma 1 and the Glivenko-Cantelli theorem,

n^{- 1} \sum_{i = 1}^{n} \int_{t_{1}}^{t_{2}} W_{i} (s) [Y_{i} (s) - ϕ \{(\tilde{γ} (s, β_{0}))^{T} X_{i} (s) + β_{0}^{T} Z_{i} (s)\}] \frac{\partial^{2} \tilde{γ} (s, β_{0})}{\partial β^{2}} X_{i} (s) \, d N_{i} (s) \overset{P}{\to} 0 .

It follows that

\begin{array}{l} - \left. n^{- 1} \frac{\partial U (β)}{\partial β} \right|_{β = β_{0}} \\ = n^{- 1} \sum_{i = 1}^{n} \int_{t_{1}}^{t_{2}} W_{i} (s) \dot{ϕ} \{(\tilde{γ} (s, β_{0}))^{T} X_{i} (s) + β_{0}^{T} Z_{i} (s)\} \times \left\{ \left( \frac{\partial \tilde{γ} (s, β_{0})}{\partial β} \right)^{T} X_{i} (s) + Z_{i} (s) \right\}^{\otimes 2} d N_{i} (s) + o_{p} (1) \\ \overset{P}{\to} E \left\{ \int_{t_{1}}^{t_{2}} w_{i} (s) \dot{ϕ} \{(γ_{0} (s))^{T} X_{i} (s) + β_{0}^{T} Z_{i} (s)\} \{- (e_{x z} (s))^{T} (e_{x x} (s))^{- 1} X_{i} (s) + Z_{i} (s)\}^{\otimes 2} d N_{i} (s) \right\} \\ = A, \end{array}

(35)

uniformly in a neighborhood of β₀.

Now we show that $n^{- 1 / 2} U (β_{0})$ converges in distribution to a normal distribution. By Taylor expansion,

ϕ \{(\tilde{γ} (s, β_{0}))^{T} X_{i} (s) + β_{0}^{T} Z_{i} (s)\} - ϕ \{(γ_{0} (s))^{T} X_{i} (s) + β_{0}^{T} Z_{i} (s)\} = \dot{μ}_{i} (s) \{(\tilde{γ} (s, β_{0}))^{T} - (γ_{0} (s))^{T}\} X_{i} (s) + O_{p} (\| \tilde{γ} (s, β_{0}) - γ_{0} (s) \|^{2}) .

By Lemmas 1 and 2,

\begin{array}{l} n^{- 1 / 2} \sum_{i = 1}^{n} \int_{t_{1}}^{t_{2}} W_{i} (s) [ϕ \{(\tilde{γ} (s, β_{0}))^{T} X_{i} (s) + β_{0}^{T} Z_{i} (s)\} - ϕ \{(γ_{0} (s))^{T} X_{i} (s) + β_{0}^{T} Z_{i} (s)\}] \times \left\{ (X_{i} (s))^{T} \frac{\partial \tilde{γ} (s, β_{0})}{\partial β} + (Z_{i} (s))^{T} \right\} d N_{i} (s) \\ = n^{- 1 / 2} \sum_{i = 1}^{n} \int_{t_{1}}^{t_{2}} W_{i} (s) [(\tilde{γ} (s, β_{0}))^{T} - (γ_{0} (s))^{T}] \times \dot{μ}_{i} (s) X_{i} (s) \left\{ (X_{i} (s))^{T} \frac{\partial \tilde{γ} (s, β_{0})}{\partial β} + (Z_{i} (s))^{T} \right\} d N_{i} (s) + O_{p} ((n h^{2})^{- 1 / 2}) \\ = o_{p} (1), \quad \text{as } n h^{2} \to \infty . \end{array}

Hence

\begin{array}{l} n^{- 1 / 2} U (β_{0}) = n^{- 1 / 2} \sum_{i = 1}^{n} \int_{t_{1}}^{t_{2}} W_{i} (s) [Y_{i} (s) - μ_{i} (s)] \left\{ \left( \frac{\partial \tilde{γ} (s, β_{0})}{\partial β} \right)^{T} X_{i} (s) + Z_{i} (s) \right\} d N_{i} (s) + o_{p} (1) \\ = n^{- 1 / 2} \sum_{i = 1}^{n} \int_{t_{1}}^{t_{2}} w_{i} (s) [Y_{i} (s) - μ_{i} (s)] \{Z_{i} (s) - (e_{x z} (s))^{T} (e_{x x} (s))^{- 1} X_{i} (s)\} \, d N_{i} (s) + o_{p} (1), \end{array}

(36)

which converges in distribution to N (0, Σ), where

\Sigma = E \left( \int_{t_{1}}^{t_{2}} w_{i} (s) [Y_{i} (s) - μ_{i} (s)] \{Z_{i} (s) - (e_{x z} (s))^{T} (e_{x x} (s))^{- 1} X_{i} (s)\} \, d N_{i} (s) \right)^{\otimes 2} .

(37)

Since $n^{1 / 2} (\hat{β} - β_{0}) = - \left( n^{- 1} \frac{\partial U (β_{0})}{\partial β} \right)^{- 1} n^{- 1 / 2} U (β_{0}) + o_{p} (1)$, it follows from (35) and (36) that $n^{1 / 2} (\hat{β} - β_{0}) \overset{D}{\to} N (0, A^{- 1} \Sigma A^{- 1})$ as n → ∞.
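The limiting covariance A⁻¹ΣA⁻¹ has the usual sandwich form. As a rough sketch of how such a quantity could be computed by plug-in, the function below takes an estimate of A and an array of per-subject score contributions (estimates of the summands in (36)). The function name `sandwich_covariance`, its inputs and the toy numbers are hypothetical; how the contributions are actually estimated is not addressed here.

```python
import numpy as np

def sandwich_covariance(A_hat, score_contributions):
    """Plug-in estimate of A^{-1} Sigma A^{-1}.

    A_hat: (p, p) estimate of the matrix A in (35).
    score_contributions: (n, p) array whose i-th row is a hypothetical
        estimate of subject i's term in the sum (36).
    """
    scores = np.asarray(score_contributions, dtype=float)
    n = scores.shape[0]
    Sigma_hat = scores.T @ scores / n            # empirical second moment, cf. (37)
    A_inv = np.linalg.inv(np.asarray(A_hat, dtype=float))
    return A_inv @ Sigma_hat @ A_inv.T

# Toy inputs, chosen only so that the result is easy to verify by hand.
A_toy = 2.0 * np.eye(2)
scores_toy = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
cov_toy = sandwich_covariance(A_toy, scores_toy)
```

With these toy inputs the empirical second moment is 0.5·I and the result is 0.125·I. Dividing the output by n would give an approximate covariance for β̂ itself rather than for n^{1/2}(β̂ − β₀).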

Proof of Theorem 2

Since γ̂(t) = γ̃(t, β̂), we have $\hat{γ} (t) \overset{P}{\to} γ_{0} (t)$ uniformly in t ∈ [0, τ] by Theorem 1 and Lemma 1. It also follows that $\partial \tilde{γ} (t, β^{*}) / \partial β \overset{P}{\to} - (e_{x x} (t))^{- 1} e_{x z} (t)$ for β* on the line segment between β̂ and β₀. By Lemma 2 and (36),

\begin{array}{l} (n h)^{1 / 2} \{\hat{γ} (t) - γ_{0} (t) - \frac{1}{2} μ_{2} h^{2} \ddot{γ}_{0}^{T} (t)\} \\ = (n h)^{1 / 2} \{\tilde{γ} (t, β_{0}) - γ_{0} (t) - \frac{1}{2} μ_{2} h^{2} \ddot{γ}_{0}^{T} (t)\} - (n h)^{1 / 2} (e_{x x} (t))^{- 1} e_{x z} (t) (\hat{β} - β_{0}) + o_{p} (1) \\ = n^{- 1 / 2} \sum_{i = 1}^{n} g_{i} (t) + o_{p} (1), \end{array}

where

\begin{array}{l} g_{i} (t) = h^{1 / 2} (e_{x x} (t))^{- 1} \int_{0}^{τ} w_{i} (s) K_{h} (s - t) X_{i} (s) \{Y_{i} (s) - μ_{i} (s)\} \, d N_{i} (s) - h^{1 / 2} (e_{x x} (t))^{- 1} e_{x z} (t) \\ \times A^{- 1} \int_{t_{1}}^{t_{2}} w_{i} (s) \{Z_{i} (s) - (e_{x z} (s))^{T} (e_{x x} (s))^{- 1} X_{i} (s)\} \{Y_{i} (s) - μ_{i} (s)\} \, d N_{i} (s) . \end{array}

Following the arguments of Lemma 2 of Sun (2010),

(n h)^{1 / 2} (\hat{γ} (t) - γ_{0} (t) - \frac{1}{2} μ_{2} h^{2} \ddot{γ}_{0}^{T} (t)) \overset{D}{\to} N (0, \Sigma_{γ} (t)),

(38)

as nh² → ∞ and nh⁵ = O(1). The consistency of the variance estimator for Σ_γ(t) follows from the proof of Theorem 2 of Sun (2010).

Proof of Theorem 3

By (31), (35) and (36), we have

\begin{array}{l} G_{n} (t) = n^{1 / 2} \int_{t_{1}}^{t} (\tilde{γ} (s; β_{0}) - γ_{0} (s)) \, d s + n^{1 / 2} \int_{t_{1}}^{t} (\tilde{γ} (s; \hat{β}) - \tilde{γ} (s; β_{0})) \, d s \\ = n^{1 / 2} \int_{t_{1}}^{t} (\tilde{γ} (s; β_{0}) - γ_{0} (s)) \, d s - \int_{t_{1}}^{t} (e_{x x} (s))^{- 1} e_{x z} (s) \, d s \; n^{1 / 2} (\hat{β} - β_{0}) + o_{p} (1) \\ = n^{- 1 / 2} \sum_{i = 1}^{n} \Big\{ \int_{t_{1}}^{t} (e_{x x} (s))^{- 1} \int_{0}^{τ} K_{h} (u - s) w_{i} (s) X_{i} (u) \{Y_{i} (u) - μ_{i} (u)\} \, d N_{i} (u) \, d s \\ - \int_{t_{1}}^{t} (e_{x x} (s))^{- 1} e_{x z} (s) \, d s \, A^{- 1} \\ \times \int_{t_{1}}^{t_{2}} w_{i} (s) \{Z_{i} (s) - (e_{x z} (s))^{T} (e_{x x} (s))^{- 1} X_{i} (s)\} \{Y_{i} (s) - μ_{i} (s)\} \, d N_{i} (s) \Big\} + o_{p} (1), \end{array}

which converges weakly to a zero-mean Gaussian process by Lemma 1 of Sun and Wu (2005).

Proof of (10)

Note that $A = E [\int_{t_{1}}^{t_{2}} w_{i} (s) \dot{μ}_{i} (s) \{Z_{i} (s) - (e_{x z} (s))^{T} (e_{x x} (s))^{- 1} X_{i} (s)\}^{\otimes 2} α_{i} (s) \, d s]$. Let

\begin{array}{l} D (s) = A^{- 1} w_{i} (s) \{Z_{i} (s) - (e_{x z} (s))^{T} (e_{x x} (s))^{- 1} X_{i} (s)\}^{T} σ_{ε} (s \mid X_{i}, Z_{i}) α_{i}^{1 / 2} (s) \\ - \Sigma_{0}^{- 1} \{Z_{i} (s) - (e_{x z} (s))^{T} (e_{x x} (s))^{- 1} X_{i} (s)\}^{T} \{\dot{μ}_{i} (s) / σ_{ε} (s \mid X_{i}, Z_{i})\} α_{i}^{1 / 2} (s) . \end{array}

Then the matrix

\begin{array}{l} E \left( \int_{t_{1}}^{t_{2}} D (s) D (s)^{T} \, d s \right) = A^{- 1} \Sigma A^{- 1} - A^{- 1} A \Sigma_{0}^{- 1} - \Sigma_{0}^{- 1} A A^{- 1} + \Sigma_{0}^{- 1} \Sigma_{0} \Sigma_{0}^{- 1} \\ = A^{- 1} \Sigma A^{- 1} - \Sigma_{0}^{- 1} \end{array}

is nonnegative definite.
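The argument can be mimicked with finite sums. The sketch below replaces the expectation and time integral by an average over K hypothetical support points and checks that A⁻¹ΣA⁻¹ − Σ₀⁻¹ coincides with the average of DDᵀ, hence is nonnegative definite. The forms used for Σ and Σ₀ are those implied by the expansion above; all arrays (`g`, `w`, `sigma`, `mu_dot`, `alpha`) are randomly generated placeholders, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
K, p = 200, 3
g = rng.normal(size=(K, p))           # stands in for Z - e_xz^T e_xx^{-1} X
w = rng.uniform(0.5, 2.0, K)          # weight function values
sigma = rng.uniform(0.5, 2.0, K)      # conditional error standard deviation
mu_dot = rng.uniform(0.5, 2.0, K)     # derivative of the inverse link
alpha = rng.uniform(0.5, 2.0, K)      # sampling intensity

def weighted_outer(c):
    """Average of c_k * g_k g_k^T over the K support points."""
    return (g * c[:, None]).T @ g / K

A = weighted_outer(w * mu_dot * alpha)
Sigma = weighted_outer(w**2 * sigma**2 * alpha)
Sigma0 = weighted_outer(mu_dot**2 / sigma**2 * alpha)

A_inv, Sigma0_inv = np.linalg.inv(A), np.linalg.inv(Sigma0)
gap = A_inv @ Sigma @ A_inv - Sigma0_inv

# Row k of D is the discrete analogue of D(s) at the k-th support point.
c1 = w * sigma * np.sqrt(alpha)
c2 = mu_dot / sigma * np.sqrt(alpha)
D = (g * c1[:, None]) @ A_inv.T - (g * c2[:, None]) @ Sigma0_inv.T
direct = D.T @ D / K

min_eig = np.linalg.eigvalsh((gap + gap.T) / 2).min()
```

Here `gap` and `direct` agree up to rounding, and `min_eig` is nonnegative up to numerical error. Choosing `w = mu_dot / sigma**2` makes the two terms of D cancel, so the gap vanishes for that weight.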

Contributor Information

Yanqing Sun, Email: yasun@uncc.edu, Department of Mathematics and Statistics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA.

Liuquan Sun, Email: slq@amt.ac.cn, Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Beijing, China.

Jie Zhou, Email: zhoujie@amss.ac.cn, Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Beijing, China.

References

Aalen OO. Nonparametric inference for a family of counting processes. Ann Stat. 1978;6:701–726.
Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA. Efficient and adaptive estimation for semiparametric models. Springer; New York: 1993.
Cheng SC, Wei LJ. Inferences for a semiparametric model with panel data. Biometrika. 2000;87:89–97.
Fan J, Gijbels I. Local polynomial modelling and its applications. Chapman and Hall; London: 1996.
Fan J, Li R. New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J Am Stat Assoc. 2004;99:710–723.
Fan J, Huang T, Li R. Analysis of longitudinal data with semiparametric estimation of covariance function. J Am Stat Assoc. 2007;102:632–641. doi: 10.1198/016214507000000095.
Hoover DR, Rice JA, Wu CO, Yang LP. Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika. 1998;85:809–822.
Hu XJ, Sun J, Wei LJ. Regression parameter estimation from panel counts. Scand J Stat. 2003;30:25–43.
Hu Z, Wang N, Carroll RJ. Profile-kernel versus backfitting in the partially linear models for longitudinal/clustered data. Biometrika. 2004;91:251–262.
Lin X, Carroll RJ. Semiparametric regression for clustered data using generalized estimating equations. J Am Stat Assoc. 2001;96:1045–1056.
Lin DY, Ying Z. Semiparametric and nonparametric regression analysis of longitudinal data (with discussion). J Am Stat Assoc. 2001;96:103–113.
Lin DY, Wei LJ, Ying Z. Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika. 1993;80:557–572.
Lin DY, Wei LJ, Yang I, Ying Z. Semiparametric regression for the mean and rate functions of recurrent events. J R Stat Soc Ser B. 2000;62(Part 4):711–730.
Lin H, Song PX-K, Zhou QM. Varying-coefficient marginal models and applications in longitudinal data analysis. Sankhya. 2007;69:581–614.
Martinussen T, Scheike TH. A semiparametric additive regression model for longitudinal data. Biometrika. 1999;86:691–702.
Martinussen T, Scheike TH. A nonparametric dynamic additive regression model for longitudinal data. Ann Stat. 2000;28:1000–1025.
Martinussen T, Scheike TH. Sampling adjusted analysis of dynamic additive regression models for longitudinal data. Scand J Stat. 2001;28:303–323.
Martinussen T, Scheike TH. Dynamic regression models for survival data. Springer; New York: 2006.
Rice JA, Silverman BW. Estimating the mean and covariance structure nonparametrically when the data are curves. J R Stat Soc Ser B. 1991;53:233–243.
Sun Y. Estimation of semiparametric regression model with longitudinal data. Lifetime Data Anal. 2010;16:271–298. doi: 10.1007/s10985-009-9136-2.
Sun J, Wei LJ. Regression analysis of panel count data with covariate-dependent observation and censoring times. J R Stat Soc Ser B. 2000;62:293–302.
Sun Y, Wu H. Semiparametric time-varying coefficients regression model for longitudinal data. Scand J Stat. 2005;32:21–47.
Sun Y, Gilbert PB, McKeague IW. Proportional hazards models with continuous marks. Ann Stat. 2009;37:394–426. doi: 10.1214/07-AOS554.
Tian L, Zucker D, Wei LJ. On the Cox model with time-varying regression coefficients. J Am Stat Assoc. 2005;100:172–183.
Van der Vaart AW. Asymptotic statistics. Cambridge University Press; Cambridge: 1998.
Wu H, Liang H. Backfitting random varying-coefficient models with time-dependent smoothing covariates. Scand J Stat. 2004;31:3–19.
Zhang Y. A semiparametric pseudolikelihood estimation method for panel count data. Biometrika. 2002;89:39–48.

