Abstract
This paper studies the generalized semiparametric regression model for longitudinal data where the covariate effects are constant for some covariates and time-varying for others. Different link functions can be used to allow more flexible modelling of longitudinal data. The nonparametric components of the model are estimated using a local linear estimating equation and the parametric components are estimated through a profile estimating function. The method automatically adjusts for heterogeneity of sampling times, allowing the sampling strategy to depend on the past sampling history as well as possibly time-dependent covariates without specifically modelling such dependence. A K-fold cross-validation bandwidth selection is proposed as a working tool for locating an appropriate bandwidth. A criterion for selecting the link function is proposed to provide a better fit of the data. Large sample properties of the proposed estimators are investigated. Large sample pointwise and simultaneous confidence intervals for the regression coefficients are constructed. Formal hypothesis testing procedures are proposed to check for the covariate effects and whether the effects are time-varying. A simulation study is conducted to examine the finite sample performances of the proposed estimation and hypothesis testing procedures. The methods are illustrated with a data example.
Keywords: Asymptotics, Kernel smoothing, Link function, Sampling adjusted estimation, Testing time-varying effects, Weighted least squares
1 Introduction
We study semiparametric modeling of covariate effects on a longitudinal response process based on repeated measurements observed at a series of sampling times. Suppose that there is a random sample of n subjects. For the ith subject, let Yi (t) be the response process and let Zi (t) and Xi (t) be the possibly time-dependent covariates of dimensions p × 1 and q × 1, respectively, over the time interval [0, τ]. We consider the following generalized semiparametric regression model for Yi (t), 0 ≤ t ≤ τ,
$$\mu_i(t) = E\{Y_i(t) \mid X_i(t), Z_i(t)\} = g^{-1}\{\gamma^T(t) X_i(t) + \beta^T Z_i(t)\}, \tag{1}$$
where g(·) is a known link function, β is a p-dimensional vector of unknown parameters and γ(t) is a q-dimensional vector of completely unspecified functions. The notation βT represents the transpose of a vector or matrix β. The first component of Xi (t) is set to be 1, which gives a nonparametric baseline function. Under model (1), the effects of some covariates are constant while those of others are time-varying. Different link functions can be selected to provide a richer family of models for longitudinal data.
When the link function g(·) is the identity function, model (1) is known as the semiparametric additive model. The semiparametric additive model with longitudinal data has been studied extensively in recent years. We refer to Hoover et al. (1998), Martinussen and Scheike (1999, 2000, 2001), Lin and Ying (2001), Wu and Liang (2004), Fan and Li (2004), Hu et al. (2004), Sun and Wu (2005) and Fan et al. (2007), among others. When the link function is the natural logarithm function and Xi (t) ≡ 1, model (1) becomes the proportional means model. Data collected on the individual response processes at a finite set of sampling times are also called panel data. For panel count data, the proportional means model has been studied by Sun and Wei (2000), Cheng and Wei (2000), Zhang (2002) and Hu et al. (2003). Model (1) unifies the semiparametric additive model and the proportional means model under the same umbrella.
Although model (1) has been extensively studied for cross-sectional data, few have studied it with longitudinal data. Lin and Carroll (2001) studied model (1) when Xi (t) ≡ 1 by using profile-based generalized estimating equations (GEE) and a local linear approach. Lin et al. (2007) proposed a local linear GEE method when all the regression coefficients are nonparametric functions of time. The GEE method with an appropriately selected working covariance structure of the longitudinal data can lead to improved efficiency (Fan et al. 2007). However, the selection of the working covariance can be difficult and the efficiency gain under an improperly selected working covariance structure is not clear. Further, there may be technical difficulties with the extension of the GEE method to more complicated sampling schemes. In both Lin and Carroll (2001) and Lin et al. (2007), the sampling times are assumed to be independent of covariates and the situation of possible dropouts of the subjects in the follow-up is not considered. The extensions of their methods to more general sampling and censoring schemes would make these methods more useful in practice.
This paper proposes a sampling adjusted profile local linear estimation method for the generalized semiparametric regression model (1). The paper has two main contributions. First, the proposed method automatically adjusts for heterogeneity of sampling times, allowing the sampling strategy to depend on the past sampling history as well as possibly time-dependent covariates without specifically modelling such dependence. Second, this paper presents a unified approach to the semiparametric model (1) with a general link function, which to the best of our knowledge has never been exploited for longitudinal data. This presents an opportunity for model selection of the link function. A criterion for selecting the link function is proposed to provide a better fit of the data. The proposed method does not require time-varying covariates to be observed at all times; only the values at the sampling times are needed. Some hypothesis testing procedures are proposed to check whether the effect of a covariate is time-varying. This can lead to more efficient estimation when the effects of some covariates are not really time-varying.
The rest of the paper is organized as follows. In Sect. 2, a sampling adjusted profile-based local linear estimation method is proposed for model (1). Large sample properties are investigated in Sect. 3. Large sample pointwise and simultaneous confidence intervals for the regression coefficients are constructed. This section also presents some formal hypothesis testing procedures to check whether the effect of a covariate is time-varying. The procedures for selecting the bandwidth and the link function are proposed in Sect. 3.4. A simulation study is conducted in Sect. 4 to examine the finite sample performances of the proposed statistical procedures. The proposed methods are illustrated with the analysis of an HIV-1 RNA data set from an AIDS clinical trial in Sect. 5. Some concluding remarks are made in Sect. 6. All proofs are given in the Appendix.
2 Profile local linear estimation approach
2.1 Preliminaries
Suppose that the observations of the response process Yi (t) for the ith subject are taken at the sampling time points 0 ≤ ti1 < ti2 < · · · < tini ≤ τ, where ni is the total number of observations on the ith subject and τ is the end of follow-up time. The sampling times are often irregular and depend on covariates. In addition, some subjects may drop out of the study early. Let $N_i(t) = \sum_{j=1}^{n_i} I(t_{ij} \le t)$ be the number of observations taken on the ith subject by time t, where I (·) is the indicator function. Let Ci be the end of follow-up time or censoring time, whichever comes first. The responses for the ith subject can only be observed at the time points before Ci. Thus Ni (t) can be written as $N_i^*(t \wedge C_i)$, where $N_i^*(t)$ is the counting process of sampling times. Let Xi (t) and Zi (t) be the predictable covariate processes associated with the ith subject. We assume that {(Yi (·), Xi (·), Zi (·), Ni (·))}, i = 1, …, n, are independent identically distributed random processes. In this section, we propose an estimation procedure for model (1) based on the observations {(Yi (tij), Xi (tij), Zi (tij)); j = 1, …, ni, i = 1, …, n}. These are the values of {(Yi (t), Xi (t), Zi (t)), 0 ≤ t ≤ τ} observed at the sampling times, that is, the jump time points of Ni (·), i = 1, …, n.
Let $\mathcal{F}_t$ be the σ-field representing the history of $N_i^*(\cdot)$, Xi (·) and Zi (·) up to time t for 1 ≤ i ≤ n. Let λi (t) be the intensity process defined as follows
$$E\{dN_i^*(t) \mid \mathcal{F}_{t-}\} = \lambda_i(t)\,dt \tag{2}$$
for 0 ≤ t ≤ τ. Thus λi (t) is the sampling rate at time t conditional on the past $\mathcal{F}_{t-}$. Let αi (t) = α(t, Xi (t), Zi (t)) be the conditional mean rate of the sampling times such that $E\{dN_i^*(t) \mid X_i(t), Z_i(t)\} = \alpha_i(t)\,dt$. Then αi (t) = E{λi (t)|Xi (t), Zi (t)} by the double expectation property.
Many existing methods, such as those of Lin and Ying (2001) and Martinussen and Scheike (1999, 2000, 2001), take the approach of modelling αi (t). Lin and Ying (2001) assumed that the sampling process follows a proportional mean rate model (Lin et al. 2000). Martinussen and Scheike (1999, 2000) assumed that the intensity of the sampling process follows a multiplicative Aalen model (Aalen 1978) λi(t) = ηi(t) α(t), where α(t) is an unknown deterministic function and ηi (t) is a predictable process. Martinussen and Scheike (2001) considered the sampling adjusted approach by assuming that the intensity follows a nonparametric additive regression model λi (t) = ηi (t)α(t)T Xi (t), where ηi (t) is a predictable at-risk indicator, α(t) is a vector of unspecified time-dependent regression functions and Xi (t) are predictable time-varying covariates. For all the methods mentioned above, misspecification of the sampling model can lead to biased estimation of the mean longitudinal response since the expectations of the estimating equations may not be zero, which is also demonstrated in our simulation study in Sect. 4.
The proposed method in the following allows the sampling strategy to depend on the past sampling history as well as possibly time-dependent covariates without specifically modelling such dependence. The estimation procedure directly uses the sampling process Ni (·) without modelling λi (t) or αi (t).
2.2 Estimation procedures
We adopt a profile approach for the estimation of model (1). First, assuming β is known, the nonparametric component, γ(t), of the model is estimated using the local linear estimating equations. The parametric component, β, is estimated through the weighted profile estimating equations. The details of the estimation procedure are described in the following.
At each t, let γ(s) = γ(t)+ γ̇(t)(s−t)+ O((s-t)2) be the first order Taylor expansion of γ(·) for s in a neighborhood of t, where γ̇(t) is the derivative of γ(t) with respect to t. Denote γa(t) = (γT(t), γ̇T(t))T and . Let , where ϕ(x) = g−1(x) is the inverse function of the link function g(y). Let Wi (t) = W (t, Xi (t), Zi (t)) be a nonnegative weight process that may depend on n. At each t and for fixed β, we consider the following estimating function for γa(t):
| (3) |
where Kh(·) = K(·/h)/h, K(·) is a kernel function that smoothly downweights the contributions of remote data points and h = hn > 0 is the bandwidth parameter that controls the size of a local neighborhood. The root of the equation Ua(γa, β) = 0 is denoted by γ̃a(t, β). Since the data used in (3) are localized in the neighborhood of t, a weight function for (3) will not have much effect on the local linear estimator.
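As a small illustration of the scaled kernel Kh(·) = K(·/h)/h, the sketch below uses the Epanechnikov kernel that is adopted later in the simulation study. The function names `epanechnikov` and `scaled_kernel` and the example times are illustrative choices, not part of the paper.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on |u| <= 1, zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def scaled_kernel(u, h):
    """Scaled kernel K_h(u) = K(u / h) / h for a bandwidth h > 0."""
    return epanechnikov(np.asarray(u, dtype=float) / h) / h

# Weights given to observations at times s when estimating at t = 1.0 with h = 0.4.
# Observations farther than h from t receive zero weight.
s = np.array([0.5, 0.8, 1.0, 1.2, 1.5])
weights = scaled_kernel(s - 1.0, h=0.4)
```

A smaller h concentrates the weight on observations very close to t, and a larger h averages over a wider neighborhood.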
Let ϕ̇(x) be the derivative of ϕ(x) = g−1(x) with respect to x. The estimating function Ua(γa, β) can be obtained by setting in the derivative of the local weighted sum of the squares with respect to γa. The expectation of Ua(γa, β) is approximately zero for the true β and γ(·) as h → 0 under the assumptions given in the Appendix. Let and , where v⊗2 = vvT for a column vector v. Ẽyx(t) is defined similarly to Ẽzx (t) by replacing Zi(·) with Yi (·). Under the identity link function g(x) = x, an explicit solution for (3) can be derived as where Ỹx (t) = Ẽyx (t)(Ẽx x (t))−1 and Z̃x (t) = Ẽzx (t)(Ẽx x (t))−1.
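Under the identity link and for a fixed β, the local linear step amounts to a kernel-weighted least squares fit of the partial residuals Y − βTZ on X and X·(s − t). The following is a minimal sketch under simplifying assumptions: unit weight Wi(t) = 1, the Epanechnikov kernel, and all observations pooled across subjects into flat arrays. The function name `local_linear_gamma` and its argument layout are hypothetical.

```python
import numpy as np

def local_linear_gamma(t0, times, resid, X, h):
    """Local linear estimate of gamma(t0) and its derivative (identity link).

    times : (N,) pooled sampling times of all subjects
    resid : (N,) partial residuals Y - beta^T Z at those times
    X     : (N, q) covariates with time-varying effects
    h     : bandwidth
    """
    d = times - t0
    u = d / h
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0) / h  # K_h(s - t0)
    design = np.hstack([X, X * d[:, None]])       # columns for gamma and gamma-dot
    weighted = design * k[:, None]
    # Solve the kernel-weighted normal equations.
    coef, *_ = np.linalg.lstsq(weighted.T @ design, weighted.T @ resid, rcond=None)
    q = X.shape[1]
    return coef[:q], coef[q:]

# Toy usage with noise-free data whose coefficient functions are linear in time.
rng = np.random.default_rng(0)
times = np.linspace(0.0, 3.0, 200)
X = np.column_stack([np.ones(200), rng.binomial(1, 0.5, 200)])
resid = (1.0 + 2.0 * times) + (0.5 - times) * X[:, 1]
gamma_hat, gamma_dot_hat = local_linear_gamma(1.5, times, resid, X, h=0.5)
```

Summing over the pooled observations plays the role of integrating against the sampling process, so no model for the sampling rate enters the fit.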
Let γ̃(t, β) be the first q components of γ̃a(t, β). The profile estimating function for β is given by
| (4) |
where [t1, t2] ⊂ (0, τ). The subset [t1, t2] is considered to avoid possible instability of γ̃(t, β) near the boundary. In practice, this interval can be taken to be close to [0, τ]. We estimate β by β̂ that solves U(β̂) = 0 and γ(t) by γ̂(t) = γ̂(t, β̂).
The expression for the derivative in (4) is derived in the following. Since Ua(γ̃a(t, β), β) ≡ 02q, γ̃a(t, β) satisfies
It follows that
| (5) |
where
| (6) |
| (7) |
The estimator β̂ is a weighted least squares estimator since the estimating function U(β) can be obtained by setting Qi (t) = Wi (t)[ϕ̇ {(γ̃(t, β))T Xi (t) + βTZi (t)}]−1 in the derivative of the profile least squares function ℓ (β) with respect to β, where .
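For the identity link the profile step has a closed form, which the sketch below illustrates: Y and each column of Z are smoothed on X by local linear regression, and β is obtained by least squares on the resulting residuals. This is a simplified sketch, assuming unit weights, the Epanechnikov kernel, pooled observations, and summation over all sampling times rather than over a subinterval [t1, t2]. The names `local_linear_coef` and `profile_least_squares` are hypothetical.

```python
import numpy as np

def local_linear_coef(t0, times, V, X, h):
    """Local linear regression of each column of V on X around time t0.

    Returns a (q, m) array holding the fitted coefficient values at t0,
    one column per column of V.
    """
    d = times - t0
    u = d / h
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0) / h
    D = np.hstack([X, X * d[:, None]])
    WD = D * k[:, None]
    coef, *_ = np.linalg.lstsq(WD.T @ D, WD.T @ V, rcond=None)
    return coef[: X.shape[1]]

def profile_least_squares(times, Y, X, Z, h):
    """Profile least squares for beta under the identity link (sketch)."""
    V = np.column_stack([Y, Z])                   # smooth Y and Z together
    fitted = np.empty_like(V)
    for j in range(len(times)):
        G = local_linear_coef(times[j], times, V, X, h)
        fitted[j] = X[j] @ G                      # smoothed values at time t_j
    Y_res = Y - fitted[:, 0]
    Z_res = Z - fitted[:, 1:]
    beta, *_ = np.linalg.lstsq(Z_res, Y_res, rcond=None)

    def gamma_hat(t0):
        # Plug beta back in and smooth the partial residuals.
        R = (Y - Z @ beta)[:, None]
        return local_linear_coef(t0, times, R, X, h)[:, 0]

    return beta, gamma_hat

# Toy identity-link data reusing beta = 0.5 and the gamma functions of Sect. 4.
rng = np.random.default_rng(0)
N = 300
times = rng.uniform(0.0, 3.5, N)
X = np.column_stack([np.ones(N), rng.binomial(1, 0.5, N)])
Z = rng.uniform(0.0, 1.0, (N, 1))
gamma_true = np.column_stack([0.5 * np.sqrt(times), 0.5 * np.sin(2 * times)])
Y = np.sum(gamma_true * X, axis=1) + 0.5 * Z[:, 0] + rng.normal(0.0, 0.5, N)
beta_hat, gamma_hat = profile_least_squares(times, Y, X, Z, h=0.5)
```

For a general link the residuals depend on β nonlinearly, so the closed form is replaced by an iteration between the two estimating functions.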
2.3 Computational algorithm
The estimators β̂ and γ̂(t) can be obtained through an iterated estimation procedure. Let β̂{m−1} be the estimate of β at the (m − 1)th step. The mth step estimator is the root of the estimating function (3) satisfying . The mth step estimator β̂{m} is obtained by solving the estimating function for β:
| (8) |
where is calculated using the formula (5) at β = β̂{m−1}. The estimators and β̂{m} are updated at each iteration until convergence. The γ̂(t) is the first q components of γ̂a(t) = γ̃a(t, β̂). The estimation of β requires that both and be evaluated at the combined sampling points of all subjects or the jump points of {Ni (·), i = 1, …, n}. The estimate γ̂(t) at the last iteration can be obtained by solving at the grid points fine enough such that their plots look reasonably smooth.
2.4 Estimation under the fixed designs
Model (2) assumes the existence of an intensity for the counting processes that record the sampling time points. This formulation excludes sampling at predetermined time points, i.e., the fixed design. However, the method developed in Sect. 2.2 can be extended to the fixed designs with some modifications. Let t1, …, tk be the fixed sampling time points at which the responses and covariates may be observed. For the fixed designs, estimation of model (1) does not involve the kernel neighborhood smoothing. In particular, for the fixed designs, the counting process is , where Ci is the censoring time for subject i. Equation (3) should be replaced by
| (9) |
Let γ̃(t, β) solve the equation Ua(γ(t), β) = 0 at the fixed time points t = t1, …, tk for each fixed β. The estimator β̂ solves U(β̂) = 0, where U(β) is the profile estimating equation under the fixed design having the same expression as (4). The regression coefficient function γ(t) is estimated by γ̂(t) = γ̃(t, β̂).
3 Statistical inferences of semiparametric model
3.1 Asymptotic properties
This subsection investigates the asymptotic properties of the proposed estimators. These asymptotic results are used to construct confidence bands and formulate the test statistics for the regression coefficients in the subsequent subsections.
Let β0 and γ0(t) be the true values of β and γ(t) under model (1), respectively. Let and . Let w(t, x, z) be the deterministic limit of W (t, x, z) in probability as n → ∞. Define ex x (t) = E[wi(t)μ̇i (t){Xi (t)}⊗2αi(t)ξi(t)] and exz(t) = E[wi(t)μ̇i(t)Xi(t) {Zi(t)}Tαi(t)ξi(t)], where ξi(t) = I (Ci ≥ t). Let and , where wi(t) = w(t, Xi (t), Zi (t)).
Let μ̂i (s) = ϕ{γ̂T (s)Xi (s) + β̂TZi(s)} and . Let and . The following theorem presents the consistency and asymptotic normality of β̂.
Theorem 1
Assume that Condition A holds. Then
as n → ∞;
as nh2 → ∞ and nh5 = O(1).
The matrix A can be consistently estimated by
and Σ can be consistently estimated by
Under Theorem 1, the proposed estimator β̂ is consistent and asymptotically normal as long as the weight process W (·) converges in probability to a deterministic function w(·). The selection of W (·) plays a role in the variance of the estimator β̂. Naturally, we would like to choose the optimal weight such that the asymptotic variance of β̂ is minimized. This selection is usually difficult. It depends on the correlation structure of the longitudinal data among other things. Suppose that the repeated measurements of Yi (·) within the same subject are independent and that Yi (·) is independent of Ni (·) conditional on the covariates Xi (t) and Zi (t). Let be the conditional variance of Yi (t) given the covariates Xi (t) and Zi (t) under model (1). Then the matrix . Let . We show in the Appendix that
| (10) |
where B ≥ 0 means that the matrix B is nonnegative definite. When wi (t) = μ̇i (t)/{σε (t| Xi, Zi)}2, A = Σ = Σ0 and the equality in (10) holds. The situation often leads to asymptotically efficient estimators in many semiparametric models discussed by Bickel et al. (1993).
Next, we state an asymptotic result for the estimator γ̂(t). The result is useful for constructing confidence intervals for the mean response curve given the covariates. Denote γ̇0(t), γ̈0(t) the first and second derivatives of γ0(t) with respect to t, respectively.
Theorem 2
Under Condition A, ,
as nh2 → ∞ and nh5 = O(1) for t ∈ (0, τ), where , Σγ (t) = (ex x (t))−1 Σe(ex x (t))−1, . The covariance matrix Σγ (t) can be estimated consistently by , where
When the link function is the identity function, Sun and Wu (2005) showed that the asymptotic bias of using the profile kernel smoothing for γ0(t) is . This phenomenon parallels the situation described in Fan and Gijbels (1996, p. 17) for the nonparametric regression with cross-sectional data that compares the Nadaraya-Watson estimator and the local linear estimator. The extra term in the bias of γ̂(t) using profile kernel smoothing depends on (ex x (t))−1ėx x (t)γ̇0(t). The bias of the profile kernel smoothing estimator can be large in a highly asymmetric design where (ex x (t))−1ėx x (t)γ̇0(t) is large. On the other hand, the bias of the profile local linear smoothing estimator only involves the second derivative γ̈0(t), and thus is design-adaptive. Another advantage of the local linear smoothing over the kernel smoothing, as discussed in Fan and Gijbels (1996), is the automatic boundary adaptation. The rate of convergence at boundary points using the local linear smoothing is the same as for the interior points, which can be shown to hold for model (1) with longitudinal data as well.
Let and . The following theorem presents a weak convergence result for Gn(t) = n1/2(Γ̂(t) − Γ0(t)) over t ∈ [t1, t2]. This result provides theoretical justifications for testing the regression coefficient functions γ(t) and for the construction of simultaneous confidence bands of developed later.
Theorem 3
Under Condition A, uniformly in t ∈ [t1, t2] ⊂ (0, τ) as nh2 → ∞ and nh5 → 0, where
| (11) |
The process Gn(t) converges weakly to a zero-mean Gaussian process G(t) on [t1, t2]. The asymptotic covariance matrix of Gn(t) can be estimated consistently by , where
| (12) |
Remark
For the estimation of model (1) under the fixed designs, the asymptotic results similar to those in Theorem 1 can be established. Without the kernel neighborhood smoothing, one needs to replace, exz(t) = E[wi (t)μ̇i (t)Xi (t){Zi (t)}T αi (t)ξi (t)] by exz(t) = E[μ̇i (t)Xi (t){Zi (t)}T ξi (t)], and by . Similar replacements hold for ex x (t), Êx x (t), eyx (t) and Êyx (t). The following asymptotic results can be established for γ̂(t) at , where Σγ(t) = (ex x (t))−1 Σe(t) (ex x (t))−1, Σe(t) = E{(Yi (t) − μi (t))Xi (t)ξi (t)}⊗2.
3.2 Confidence intervals and simultaneous confidence bands
Let γ(k)(t) be the kth component of γ(t). Similar notations are used throughout with the superscript (k) denoting the kth component of the corresponding vector. Under the assumption nh5 → 0, the under-smoothing controls the size of the bias term in Theorem 2 and avoids estimating the second derivative γ̈(t). The large sample pointwise confidence interval for γ(k)(t), 0 < t < τ, is obtained by
| (13) |
By Theorem 3, the pointwise confidence interval for Γ(k)(t), 0 < t < τ, is given by
| (14) |
Furthermore, based on Theorem 3, simultaneous confidence bands and hypothesis tests related to the regression coefficient functions γ(t) can be constructed. A key component is the estimation of confidence coefficients and the critical values. The Gaussian multiplier resampling method of Lin et al. (1993) has been widely employed for this purpose and is described in the following.
Let , where ξ1, ξ2, …, ξn are independent identically distributed (iid) standard normal random variables independent of the observed data set. By Lemma 1 of Sun and Wu (2005), the processes Gn(t) and given the observed data sequence converge weakly to the same zero-mean Gaussian process on [t1, t2]. To approximate the distribution of Gn(t), we simulate a large number of realizations from by repeatedly generating (ξ1, …, ξn) while fixing {(Yi (t), Xi (t), Zi (t), Ni (t)), t ≥ 0} at their observed values. Let cα be the (1 − α)-quantile of , which can be approximated by repeatedly generating independent normal samples (ξ1, …, ξn). An asymptotic 1 − α simultaneous confidence band for Γ(k)(t) on [t1, t2] is given by
| (15) |
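The Gaussian multiplier resampling step can be sketched as follows. The sketch assumes that the per-subject contributions to the limiting process have already been estimated and evaluated on a time grid (the array `H` below), and that the critical value refers to the supremum of the standardized resampled process. Both the input layout and the function name `multiplier_sup_quantile` are assumptions made for illustration.

```python
import numpy as np

def multiplier_sup_quantile(H, se, alpha=0.05, n_rep=1000, seed=0):
    """Approximate the (1 - alpha)-quantile of sup_t |G*(t)| / se(t).

    H  : (n, T) estimated per-subject contributions on a time grid
         (hypothetical input; one row per subject)
    se : (T,) estimated standard errors of the limiting process on the grid
    """
    rng = np.random.default_rng(seed)
    n = H.shape[0]
    xi = rng.standard_normal((n_rep, n))          # iid N(0, 1) multipliers
    G_star = xi @ H / np.sqrt(n)                  # (n_rep, T) resampled paths
    sup_stats = np.max(np.abs(G_star) / se, axis=1)
    return float(np.quantile(sup_stats, 1.0 - alpha))

# Toy usage with simulated contributions for n = 100 subjects on 30 grid points.
rng = np.random.default_rng(1)
H = rng.standard_normal((100, 30))
se = H.std(axis=0, ddof=1)
c_alpha = multiplier_sup_quantile(H, se, alpha=0.05, n_rep=2000)
```

The observed data enter only through `H` and `se`, which stay fixed while the multipliers are redrawn, mirroring the description above.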
3.3 Hypothesis testing of regression coefficients
The generalized semiparametric regression model (1) postulates that the covariate effects are constant for some covariates and time-varying for others. A formal hypothesis testing procedure can be established to check whether the effect of a covariate is time-varying under model (1). This can lead to more efficient estimation when the effects of some covariates are not really time-varying. We consider testing the null hypothesis H01 that γ(k)(t) is constant for 0 ≤ t ≤ τ.
Under H01, for t ∈ [t1, t2]. By Theorem 3 and the continuous mapping theorem,
converges weakly to , where G(k)(t) is the kth component of the limiting Gaussian process G(t) of n1/2 {Γ̂(t) − Γ(t)}. The rationale leads to the following constructions of the test statistics:
and
By the continuous mapping theorem, under H01, the test statistic S1 converges in distribution to , and the test statistic L1 converges in distribution to . The two test statistics are commonly used in the statistics literature, with S1 referred to as the supremum type and L1 as the integrated square type; cf. Martinussen and Scheike (2006).
Let
and
The critical values of S1 and L1 can be approximated by simulating a large number of copies of and obtained by repeatedly generating independent normal samples (ξ1, …, ξn) while holding the observed data fixed. For example, the critical values of the test statistics S1 and L1 at the significance level α can be estimated by the upper α quantile of, say, 1,000 copies of and , respectively. The p-values of the tests based on S1 and L1 are the percentages of and exceeding S1 and L1, respectively. The null hypothesis is rejected if the p-values are less than α.
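The resampling-based p-values can be computed as in the sketch below, which also shows generic supremum-type and integrated-square-type functionals of a process evaluated on a grid. The exact processes entering S1 and L1 are not reproduced here; the function names and the trapezoidal approximation of the integral are illustrative assumptions.

```python
import numpy as np

def sup_statistic(path):
    """Supremum-type functional: largest absolute value of the path on the grid."""
    return float(np.max(np.abs(path)))

def integrated_square_statistic(path, grid):
    """Integrated-square-type functional via the trapezoidal rule."""
    sq = np.asarray(path, dtype=float) ** 2
    return float(np.sum(0.5 * (sq[1:] + sq[:-1]) * np.diff(grid)))

def resampling_p_value(observed, resampled):
    """Proportion of resampled statistics that exceed the observed statistic."""
    resampled = np.asarray(resampled, dtype=float)
    return float(np.mean(resampled > observed))

# Toy usage: compare an observed statistic with 1,000 resampled copies.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 51)
observed_path = 0.3 * np.sin(2 * np.pi * grid)
resampled_paths = rng.standard_normal((1000, grid.size)) * 0.1
s_obs = sup_statistic(observed_path)
s_star = [sup_statistic(p) for p in resampled_paths]
p_value = resampling_p_value(s_obs, s_star)
```

The null hypothesis is rejected at level α when the returned p-value is below α.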
The tests of the null hypothesis H02 that γ(k)(t) = 0 for 0 ≤ t ≤ τ can also be constructed similarly. In particular, one may consider the test statistics: $S_2 = \sup_{t_1 \le t \le t_2} n^{1/2} |\hat{\Gamma}^{(k)}(t)|$ and . The reference distributions of S2 and L2 can be generated based on and , respectively.
3.4 Selections of bandwidth and link function
Let σ(k)(t) be the (k, k)th element of Σγ(t). It follows from Theorem 2 that the mean integrated square error for estimating the kth component γ(k)(t) over [t1, t2] is
The asymptotic optimal bandwidth is given by
The optimal theoretical bandwidth is difficult to achieve since it involves estimating the second derivative γ̈0(t). In practice, the appropriate bandwidth selection can be based on a cross-validation method. This approach is widely used in the nonparametric function estimation literature; see Rice and Silverman (1991) for the leave-one-subject-out cross-validation approach and Tian et al. (2005) for the K-fold cross-validation approach.
An analog of the K-fold cross-validation approach in the current setting is to divide the data into K equal-sized groups. Let Dk denote the kth subgroup of data; then the kth prediction error is given by
| (16) |
for k = 1, …, K, where γ̂(−k)(t) and β̂(−k) are the estimators of γ0(t) and β0 based on the data without the subgroup Dk. The data-driven bandwidth selection based on the K-fold cross-validation is to choose the bandwidth h that minimizes the total prediction error . As we show in Sect. 5 in the analysis of an HIV-1 RNA data set from an AIDS clinical trial, the K-fold cross-validation bandwidth selection provides a working tool for locating an appropriate bandwidth.
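The K-fold search over candidate bandwidths can be organized as in the following sketch. It assumes that the folds are formed by subject, so that all observations of a subject fall in the same group, and that the fitting and prediction-error routines are supplied by the user. The callables `fit` and `prediction_error` and the function name `select_bandwidth_kfold` are hypothetical placeholders, not part of the paper.

```python
import numpy as np

def select_bandwidth_kfold(subject_ids, bandwidths, fit, prediction_error, K=5, seed=0):
    """Choose the bandwidth minimizing the total K-fold prediction error.

    subject_ids      : (N,) subject label of each pooled observation
    bandwidths       : candidate bandwidths
    fit              : callable (train_mask, h) -> fitted model   (user supplied)
    prediction_error : callable (model, test_mask) -> float       (user supplied)
    """
    subject_ids = np.asarray(subject_ids)
    rng = np.random.default_rng(seed)
    subjects = rng.permutation(np.unique(subject_ids))
    folds = np.array_split(subjects, K)           # K groups of nearly equal size
    totals = []
    for h in bandwidths:
        total = 0.0
        for fold in folds:
            test_mask = np.isin(subject_ids, fold)
            model = fit(~test_mask, h)            # estimate without the kth group
            total += prediction_error(model, test_mask)
        totals.append(total)
    totals = np.array(totals)
    return bandwidths[int(np.argmin(totals))], totals
```

In use, `fit` would run the profile local linear estimation on the training observations with bandwidth h, and `prediction_error` would evaluate the prediction error on the held-out group.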
Our estimation procedure for model (1) holds for a wide class of link functions. This presents an opportunity to select the most appropriate link function for a particular application. In some applications the choice may be based on prior knowledge, but more often it will be a pragmatic choice based on what gives the “best fit”. One natural criterion for assessing the model fit is the regression deviation defined as
| (17) |
where hcv is the bandwidth selected based on the K-fold cross-validation method for the given link function g(·) described above, and γ̂g(t) and β̂g are the estimators of γ0(t) and β0 under model (1) with the bandwidth hcv. In practice, the link function g(·) can be selected to minimize the regression deviation. This approach is illustrated through a data example in Sect. 5.
4 A simulation study
In this section, we examine finite sample properties of the estimation and hypothesis testing procedures proposed for model (1). The performances of the estimators for β and γ(t) at a fixed time t are measured through the bias, the sample mean of the estimated standard errors (ESE), the sample standard error of the estimators (SEE) and the 95 % empirical coverage probability (CP). To evaluate the overall performance of the estimator γ̂(k)(t) on the interval [h, τ − h], we consider the square root of integrated mean square error , where N is the repetition number, is the jth estimate of γ(k)(t) for j = 1, …, N. We use the unit weight function Wi (t) = 1 and the Epanechnikov kernel K (u) = 0.75(1− u2)I (|u| ≤ 1) throughout the simulation. We take t1 = 0 and t2 = τ in the estimating functions (4) and (8).
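The four summary measures for a scalar estimator can be computed from the simulation replicates as in the sketch below. It assumes Wald-type 95 % intervals of the form estimate ± 1.96 × estimated standard error; the function name `summarize_replicates` is an illustrative choice.

```python
import numpy as np

def summarize_replicates(estimates, std_errors, truth, z=1.96):
    """Bias, SEE, ESE and CP from repeated simulation runs.

    estimates  : (N,) point estimates, one per repetition
    std_errors : (N,) estimated standard errors, one per repetition
    truth      : true parameter value
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    return {
        "bias": float(est.mean() - truth),        # mean estimate minus truth
        "SEE": float(est.std(ddof=1)),            # sample standard error of the estimates
        "ESE": float(se.mean()),                  # mean of the estimated standard errors
        "CP": float(np.mean(np.abs(est - truth) <= z * se)),  # empirical coverage
    }

# Toy usage with three replicates.
summary = summarize_replicates([0.4, 0.5, 0.6], [0.1, 0.1, 0.01], truth=0.5)
```

Agreement between SEE and ESE indicates that the variance estimator tracks the actual sampling variability.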
The performance of the estimators is examined under the following selected setting of model (1), in which we take the link function g(x) = ln(x):
| (18) |
for 0 ≤ t ≤ τ with τ = 3.5, where Xi is a Bernoulli random variable with the success probability of 0.5, Zi is uniformly distributed on (0, 1), εi (t) is N (φi, 0.52) conditional on φi and φi is N (0, 1). Here γ(t) = (γ1(t), γ2(t))T with γ1(t) = 0.5t1/2 and γ2(t) = 0.5 sin(2t).
We consider three models for the sampling times. The first model is a Poisson process with the proportional mean rate
| (19) |
The second model is a Poisson process with the additive mean rate
| (20) |
To examine the performance of the proposed method when the sampling strategy depends on the past history, we consider a nonhomogeneous Poisson process for the sampling times with the intensity function
| (21) |
where Zi is uniform on (0, 1) and if there was an event within the interval [t − 1, t) and 0 otherwise. For all three sampling models, the censoring times Ci are generated from U(1.5, 8). There are approximately 3 observations per subject in the interval [0, τ] and about 30% of the subjects are censored before τ = 3.5.
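Sampling times from a Poisson process with a time-varying rate can be generated by thinning, as in the sketch below. The rate function used in the example is a hypothetical placeholder and is not one of the rates in (19)–(21). The sketch covers only rates that are deterministic given the covariates, so a history-dependent intensity would need a sequential version. The censoring distribution U(1.5, 8) and τ = 3.5 follow the text.

```python
import numpy as np

def thinning_sample_times(rate, rate_max, tau, rng):
    """Draw event times on [0, tau] from a Poisson process with rate(t) <= rate_max.

    Candidates come from a homogeneous process with rate rate_max and each
    candidate at time t is kept with probability rate(t) / rate_max.
    """
    n_cand = rng.poisson(rate_max * tau)
    cand = np.sort(rng.uniform(0.0, tau, n_cand))
    keep = rng.uniform(0.0, 1.0, n_cand) * rate_max < rate(cand)
    return cand[keep]

def observed_times(times, censoring_time):
    """Keep only the sampling times up to the censoring time."""
    return times[times <= censoring_time]

# Toy usage for one subject; the rate below is a made-up example.
rng = np.random.default_rng(0)
z_i = rng.uniform(0.0, 1.0)
example_rate = lambda t: 0.5 + 0.5 * z_i * np.ones_like(t)   # hypothetical rate
times_i = thinning_sample_times(example_rate, rate_max=1.0, tau=3.5, rng=rng)
c_i = rng.uniform(1.5, 8.0)
obs_i = observed_times(times_i, c_i)
```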
Table 1 summarizes the bias, SEE, ESE and CP for β and RMSE for γ(t) under the longitudinal model (18) with β = 0.5 and with the sampling times models (19)–(21). The integrals are evaluated on the grid points si = 0.05i, i = 1, 2, …, 69. The summaries of performance of γ̂(t) at time points 0.5 j, j = 1, …, 6, are given in Table 2. Each entry of the tables is calculated based on 1,000 repetitions. Table 1 and Table 2 include the simulation results for β = 0.5 and n = 200 and 300. The expanded simulations for β = 0.0 and 1.5 and at n = 100 are also conducted but not reported here. The simulation studies demonstrate that the proposed estimation procedures perform well for the three sampling situations considered here. It appears that the estimates are unbiased and there is a good agreement between the estimated and empirical standard errors. The empirical coverage probabilities are reasonable for both sample sizes 200 and 300. Plots of γ̂1(t) and γ̂2(t) for model (18) are depicted in Fig. 1 when β = 0.5 for n = 100 and h = 0.3. Figure 1a, b in the first row are under the proportional sampling model (19), Fig. 1c, d in the second row are under the additive sampling model (20) and Fig. 1e, f in the third row are under the sampling model (21). The estimators γ̂1(t) and γ̂2(t) are essentially unbiased. These figures also show that the proposed estimation procedures perform well for the nonparametric components under these three different sampling models.
Table 1.
Summary of bias, SEE, ESE and CP for β and RMSE for γ(t) under the longitudinal model (18) with β = 0.5, γ1(t) = 0.5t1/2 and γ2(t) = 0.5 sin(2t) and with the sampling times models (19)–(21)
| n | h | Bias | SEE | ESE | CP | RMSE1 | RMSE2 |
|---|---|---|---|---|---|---|---|
| Under sampling times model (19) | | | | | | | |
| 200 | 0.3 | 0.0012 | 0.0671 | 0.0657 | 0.941 | 0.0753 | 0.0960 |
| 200 | 0.4 | 0.0020 | 0.0681 | 0.0660 | 0.947 | 0.0735 | 0.0907 |
| 200 | 0.5 | 0.0031 | 0.0680 | 0.0664 | 0.944 | 0.0704 | 0.0878 |
| 300 | 0.3 | 0.0028 | 0.0562 | 0.0540 | 0.944 | 0.0678 | 0.0849 |
| 300 | 0.4 | 0.0011 | 0.0566 | 0.0539 | 0.944 | 0.0608 | 0.0758 |
| 300 | 0.5 | −0.0013 | 0.0561 | 0.0542 | 0.932 | 0.0570 | 0.0750 |
| Under sampling times model (20) | | | | | | | |
| 200 | 0.3 | −0.0010 | 0.0676 | 0.0681 | 0.947 | 0.0754 | 0.0959 |
| 200 | 0.4 | 0.0032 | 0.0709 | 0.0685 | 0.938 | 0.0758 | 0.0896 |
| 200 | 0.5 | 0.0035 | 0.0693 | 0.0687 | 0.948 | 0.0695 | 0.0855 |
| 300 | 0.3 | −0.0043 | 0.0580 | 0.0557 | 0.935 | 0.0690 | 0.0843 |
| 300 | 0.4 | −0.0020 | 0.0555 | 0.0560 | 0.959 | 0.0600 | 0.0743 |
| 300 | 0.5 | 0.0005 | 0.0553 | 0.0561 | 0.947 | 0.0569 | 0.0730 |
| Under sampling times model (21) | | | | | | | |
| 200 | 0.3 | 0.0004 | 0.0776 | 0.0703 | 0.923 | 0.0850 | 0.0914 |
| 200 | 0.4 | −0.0015 | 0.0730 | 0.0703 | 0.944 | 0.0803 | 0.0856 |
| 200 | 0.5 | −0.0025 | 0.0711 | 0.0705 | 0.948 | 0.0730 | 0.0799 |
| 300 | 0.3 | 0.0013 | 0.0595 | 0.0575 | 0.945 | 0.0679 | 0.0751 |
| 300 | 0.4 | 0.0024 | 0.0603 | 0.0581 | 0.935 | 0.0668 | 0.0703 |
| 300 | 0.5 | −0.0007 | 0.0589 | 0.0577 | 0.943 | 0.0605 | 0.0702 |
Table 2.
Summary of bias, SEE, ESE and CP for γ (t) at t = 0.5, 1.0, 1.5, 2.0, 2.5, 3.0 for n = 200 and h = 0.4 under the longitudinal model (18) with β = 0.5, γ1(t) = 0.5t1/2 and γ2(t) = 0.5 sin(2t) and with the sampling times models (19)–(21)
| t | γ1(t): Bias | γ1(t): SEE | γ1(t): ESE | γ1(t): CP | γ2(t): Bias | γ2(t): SEE | γ2(t): ESE | γ2(t): CP |
|---|---|---|---|---|---|---|---|---|
| Under sampling times model (19) | | | | | | | | |
| 0.5 | −0.0164 | 0.0876 | 0.0862 | 0.940 | −0.0181 | 0.0940 | 0.0909 | 0.929 |
| 1.0 | −0.0110 | 0.0792 | 0.0766 | 0.946 | −0.0207 | 0.0771 | 0.0772 | 0.927 |
| 1.5 | −0.0079 | 0.0708 | 0.0707 | 0.941 | −0.0012 | 0.0836 | 0.0780 | 0.930 |
| 2.0 | −0.0030 | 0.0698 | 0.0674 | 0.944 | 0.0221 | 0.0931 | 0.0916 | 0.945 |
| 2.5 | −0.0040 | 0.0661 | 0.0662 | 0.958 | 0.0284 | 0.1004 | 0.0938 | 0.911 |
| 3.0 | −0.0038 | 0.0661 | 0.0648 | 0.944 | 0.0073 | 0.0792 | 0.0754 | 0.932 |
| Under sampling times model (20) | | | | | | | | |
| 0.5 | −0.0148 | 0.0893 | 0.0869 | 0.949 | −0.0213 | 0.0929 | 0.0892 | 0.930 |
| 1.0 | −0.0057 | 0.0804 | 0.0776 | 0.932 | −0.0283 | 0.0776 | 0.0758 | 0.921 |
| 1.5 | −0.0050 | 0.0745 | 0.0724 | 0.933 | −0.0042 | 0.0806 | 0.0774 | 0.942 |
| 2.0 | −0.0026 | 0.0711 | 0.0689 | 0.936 | 0.0187 | 0.0963 | 0.0904 | 0.925 |
| 2.5 | −0.0035 | 0.0699 | 0.0675 | 0.937 | 0.0230 | 0.0941 | 0.0925 | 0.928 |
| 3.0 | −0.0046 | 0.0699 | 0.0668 | 0.931 | 0.0070 | 0.0808 | 0.0748 | 0.923 |
| Under sampling times model (21) | | | | | | | | |
| 0.5 | −0.0097 | 0.1029 | 0.0947 | 0.927 | −0.0193 | 0.1075 | 0.0961 | 0.921 |
| 1.0 | −0.0078 | 0.0867 | 0.0808 | 0.920 | −0.0225 | 0.0780 | 0.0749 | 0.927 |
| 1.5 | −0.0033 | 0.0774 | 0.0754 | 0.936 | −0.0028 | 0.0793 | 0.0739 | 0.933 |
| 2.0 | −0.0023 | 0.0749 | 0.0706 | 0.924 | 0.0195 | 0.0864 | 0.0812 | 0.922 |
| 2.5 | −0.0013 | 0.0715 | 0.0673 | 0.933 | 0.0318 | 0.0819 | 0.0765 | 0.895 |
| 3.0 | −0.0001 | 0.0666 | 0.0649 | 0.941 | 0.0074 | 0.0606 | 0.0575 | 0.936 |
Fig. 1.
Plots of γ̂(t) for model (18) when γ1(t) = 0.5t1/2, γ2(t) = 0.5 sin(2t) and β = 0.5 for n = 100 and h = 0.3. a, b in the first row are under the proportional sampling model (19), c, d in the second row are under the additive sampling model (20) and e, f in the third row are under the sampling model (21). The solid lines are the estimates and the dashed lines are the true curves
The following models are considered to evaluate the performance of the test statistics S1 and L1 for testing H01:
| (22) |
for 0 ≤ t ≤ τ, where the distributions of Xi, Zi and εi (t) are the same as those given in model (18). Different values of θ are selected to examine the power of the tests.
The observed sizes of the test statistics are calculated under θ = 0. The powers of the tests are evaluated at θ = 0.1, 0.15 and 0.2. Table 3 lists the empirical sizes and powers of the test statistics S1 and L1 at the significance level 0.05 under the sampling models (19)–(21). Each entry is based on 1,000 repetitions. Each p-value is estimated by generating 1,000 independent Gaussian random samples. The empirical sizes of both tests are reasonably close to the 0.05 nominal level. The empirical power increases as the sample size increases. The power also increases with θ, which represents a stronger time-varying effect under model (22). Again, the performance of the tests is robust to the model for the sampling times.
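To judge how close an empirical size is to the nominal level, it helps to know the Monte Carlo error implied by 1,000 repetitions. The following is a minimal sketch, not part of the original simulation code: the helper names `rejection_rate` and `monte_carlo_se` are hypothetical, and the band uses the usual binomial standard error with a normal approximation.

```python
import math

def rejection_rate(p_values, alpha=0.05):
    """Fraction of simulated p-values at or below alpha (empirical size or power)."""
    return sum(p <= alpha for p in p_values) / len(p_values)

def monte_carlo_se(rate, n_rep):
    """Binomial standard error of a rejection rate estimated from n_rep repetitions."""
    return math.sqrt(rate * (1.0 - rate) / n_rep)

# Band in which an empirical size should fall about 95% of the time
# when the true size equals the nominal level and 1,000 repetitions are used.
nominal, n_rep = 0.05, 1000
se = monte_carlo_se(nominal, n_rep)
lower, upper = nominal - 1.96 * se, nominal + 1.96 * se
print(f"Monte Carlo SE: {se:.4f}; band: ({lower:.4f}, {upper:.4f})")
```

Under these assumptions the standard error is about 0.007, so empirical sizes between roughly 0.036 and 0.064 are compatible with a true size of 0.05.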
Table 3.
Empirical sizes and powers of the tests based on S1 and L1 at nominal level α = 0.05 under the longitudinal model (22) with the sampling times models (19)–(21)
| n | h | Size, θ = 0: S1 | Size, θ = 0: L1 | Power, θ = 0.1: S1 | Power, θ = 0.1: L1 | Power, θ = 0.15: S1 | Power, θ = 0.15: L1 | Power, θ = 0.2: S1 | Power, θ = 0.2: L1 |
|---|---|---|---|---|---|---|---|---|---|
| Under sampling times model (19) | | | | | | | | | |
| 200 | 0.3 | 0.053 | 0.054 | 0.503 | 0.526 | 0.851 | 0.863 | 0.968 | 0.970 |
| 200 | 0.4 | 0.055 | 0.057 | 0.542 | 0.566 | 0.851 | 0.859 | 0.980 | 0.981 |
| 200 | 0.5 | 0.047 | 0.045 | 0.542 | 0.536 | 0.853 | 0.864 | 0.976 | 0.976 |
| 300 | 0.3 | 0.054 | 0.054 | 0.687 | 0.684 | 0.954 | 0.960 | 0.997 | 0.997 |
| 300 | 0.4 | 0.060 | 0.054 | 0.699 | 0.702 | 0.950 | 0.952 | 0.999 | 0.998 |
| 300 | 0.5 | 0.059 | 0.054 | 0.696 | 0.689 | 0.955 | 0.962 | 0.996 | 0.996 |
| Under sampling times model (20) | | | | | | | | | |
| 200 | 0.3 | 0.057 | 0.056 | 0.535 | 0.554 | 0.986 | 0.987 | 0.979 | 0.984 |
| 200 | 0.4 | 0.053 | 0.049 | 0.529 | 0.542 | 0.861 | 0.862 | 0.975 | 0.977 |
| 200 | 0.5 | 0.066 | 0.058 | 0.565 | 0.558 | 0.872 | 0.867 | 0.974 | 0.973 |
| 300 | 0.3 | 0.064 | 0.055 | 0.695 | 0.703 | 0.952 | 0.959 | 0.997 | 0.997 |
| 300 | 0.4 | 0.066 | 0.069 | 0.717 | 0.718 | 0.956 | 0.956 | 1.000 | 1.000 |
| 300 | 0.5 | 0.052 | 0.045 | 0.710 | 0.717 | 0.958 | 0.965 | 0.999 | 0.999 |
| Under sampling times model (21) | | | | | | | | | |
| 200 | 0.3 | 0.060 | 0.065 | 0.572 | 0.599 | 0.887 | 0.895 | 0.990 | 0.986 |
| 200 | 0.4 | 0.065 | 0.066 | 0.584 | 0.578 | 0.886 | 0.885 | 0.986 | 0.989 |
| 200 | 0.5 | 0.058 | 0.059 | 0.604 | 0.598 | 0.880 | 0.886 | 0.988 | 0.989 |
| 300 | 0.3 | 0.058 | 0.069 | 0.738 | 0.755 | 0.960 | 0.965 | 0.998 | 0.998 |
| 300 | 0.4 | 0.065 | 0.066 | 0.760 | 0.768 | 0.967 | 0.968 | 0.999 | 0.998 |
| 300 | 0.5 | 0.044 | 0.049 | 0.738 | 0.741 | 0.965 | 0.969 | 1.000 | 0.999 |
Finally, we conduct a small simulation study under the identity link function to compare the proposed method with the joint modelling method of Lin and Ying (2001), in which the sampling times are modelled through the proportional mean rate model. We consider the following model for the longitudinal response
| (23) |
where Zi and εi (t) are the same as those for model (18), β = 1, and α(t) is taken to be 1 + t or 1 + t^3. Table 4 lists the summaries of the estimation of β for the two choices of α(t) using the method of Lin and Ying (2001) (L&Y) and the proposed method with h = 0.3, 0.4 and 0.5 when the sampling times are generated from models (19)–(21). Each entry is based on 1,000 repetitions. The estimator of Lin and Ying (2001) has larger biases when the sampling model is misspecified under (20) and (21), especially when the sampling strategy depends on the past history and the intercept α(t) varies more. In all cases, the estimator of Lin and Ying (2001) has a larger variance than the proposed method.
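The summary columns reported for each scenario can be computed from the replicated estimates as sketched below. This is an illustration only: it assumes the conventional definitions (Bias is the mean estimate minus the true value, SEE is the sample standard deviation of the estimates, ESE is the mean of the estimated standard errors, and CP is the coverage of nominal 95 % Wald intervals), the function name `summarize` is hypothetical, and the replicates are synthetic draws rather than output of the proposed estimator.

```python
import numpy as np

def summarize(estimates, std_errors, true_value, z=1.96):
    """Bias, SEE, ESE and CP from replicated estimates and their standard errors."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    bias = estimates.mean() - true_value
    see = estimates.std(ddof=1)          # empirical spread of the estimates
    ese = std_errors.mean()              # average estimated standard error
    covered = np.abs(estimates - true_value) <= z * std_errors
    return {"Bias": bias, "SEE": see, "ESE": ese, "CP": covered.mean()}

# Synthetic stand-in for 1,000 replicated estimates of beta = 1.
rng = np.random.default_rng(0)
true_beta = 1.0
sim_est = true_beta + 0.17 * rng.standard_normal(1000)
sim_se = np.full(1000, 0.17)
print(summarize(sim_est, sim_se, true_beta))
```

When the standard error estimator is accurate, SEE and ESE should be close and CP should be near 0.95.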
Table 4.
Comparisons of the estimation for β using the proposed method and the method of Lin and Ying (2001) under model (23) with β = 1 and two different choices of α(t) for n = 200
| h | Bias, α(t) = 1 + t | SEE, α(t) = 1 + t | ESE, α(t) = 1 + t | CP, α(t) = 1 + t | Bias, α(t) = 1 + t^3 | SEE, α(t) = 1 + t^3 | ESE, α(t) = 1 + t^3 | CP, α(t) = 1 + t^3 |
|---|---|---|---|---|---|---|---|---|
| Under sampling times model (19) | ||||||||
| 0.3 | 0.0008 | 0.1665 | 0.1648 | 0.952 | 0.0065 | 0.1692 | 0.1675 | 0.944 |
| 0.4 | 0.0013 | 0.1660 | 0.1649 | 0.947 | 0.0072 | 0.1693 | 0.1714 | 0.951 |
| 0.5 | 0.0013 | 0.1659 | 0.1649 | 0.944 | 0.0076 | 0.1693 | 0.1786 | 0.962 |
| L&Y | 0.0024 | 0.1772 | 0.1786 | 0.947 | −0.0390 | 0.9055 | 0.8756 | 0.943 |
| Under sampling times model (20) | ||||||||
| 0.3 | 0.0052 | 0.1724 | 0.1728 | 0.954 | −0.0019 | 0.1801 | 0.1756 | 0.942 |
| 0.4 | 0.0053 | 0.1720 | 0.1728 | 0.952 | −0.0022 | 0.1793 | 0.1794 | 0.951 |
| 0.5 | 0.0051 | 0.1717 | 0.1728 | 0.953 | −0.0021 | 0.1788 | 0.1867 | 0.958 |
| L&Y | 0.0033 | 0.1802 | 0.1879 | 0.952 | 0.0182 | 0.9088 | 0.9081 | 0.949 |
| Under sampling times model (21) | ||||||||
| 0.3 | 0.0062 | 0.1779 | 0.1765 | 0.948 | 0.0068 | 0.1801 | 0.1796 | 0.938 |
| 0.4 | 0.0061 | 0.1775 | 0.1764 | 0.950 | 0.0068 | 0.1797 | 0.1863 | 0.949 |
| 0.5 | 0.0059 | 0.1770 | 0.1764 | 0.951 | 0.0063 | 0.1797 | 0.1985 | 0.964 |
| L&Y | 0.0905 | 0.1885 | 0.2026 | 0.931 | 0.7316 | 0.9190 | 0.9414 | 0.871 |
5 An application
In this section, we apply the proposed methods to a real data example. We demonstrate how to select the link function that provides a better fit to the data using the procedures given in Sect. 3.4. The estimation and inference are then carried out using the selected link function. We consider the analysis of an HIV-1 RNA data set from an AIDS clinical trial. In this study, all subjects initiated antiretroviral treatment at time 0 (the baseline). Some subjects received a single protease inhibitor (PI) regimen while others received a double-PI antiretroviral regimen. HIV-1 RNA levels in plasma were measured repeatedly during the follow-up. The scheduled visit times were at weeks 0, 2, 4, 8, 16 and 24, but the actual visit times of individuals varied around the scheduled visit times. Some patients had prior antiviral treatment with non-nucleoside analogue reverse transcriptase inhibitors (NNRTI) and others did not. The prior NNRTI treatment is considered to be a factor that affects the antiviral response to the antiretroviral regimens in the current study.
A total of 481 patients were enrolled in the study, with 2,626 total visits. Owing to technical limitations, 175 measurements of HIV-1 RNA levels were censored below the detection limit, and three were censored above the detection limit. We restrict our analysis to those responses within the detectable range. This data set has been analyzed by Sun and Wu (2005). Here we use the same transformed time scale t = log10(day of actual visit + 40) − log10(32) for the actual visits, so that the transformed sampling time points are more evenly distributed, which is better suited to bandwidth selection. The maximum of the transformed sampling times is τ = 0.88. The response variable Y (t) is the change of HIV-1 RNA level from the baseline, on a log10 scale, at time t ∈ [0, τ]. We refer to Sun and Wu (2005) for a detailed discussion of the transformations. Let X = 1 denote the patients who received a double-PI treatment and X = 0 the patients who received a single-PI treatment. Let Z be the indicator of prior antiviral treatment with NNRTI, with 1 for patients who had received NNRTI and 0 for those who had not.
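The effect of the time transformation can be seen by evaluating it at the scheduled visits. The sketch below is illustrative only: it assumes a scheduled visit at week w falls on day 7w after baseline, and the function names `transform_day` and `inverse_transform` are hypothetical.

```python
import math

def transform_day(day):
    """Transformed time t = log10(day + 40) - log10(32)."""
    return math.log10(day + 40.0) - math.log10(32.0)

def inverse_transform(t):
    """Day of visit that corresponds to transformed time t."""
    return 32.0 * 10.0 ** t - 40.0

# Scheduled visits, assuming week w corresponds to day 7 * w.
scheduled_weeks = [0, 2, 4, 8, 16, 24]
for week in scheduled_weeks:
    day = 7 * week
    print(f"week {week:2d} (day {day:3d}) -> t = {transform_day(day):.3f}")

# Day of follow-up implied by the reported maximum tau = 0.88.
print(f"tau = 0.88 corresponds to day {inverse_transform(0.88):.1f}")
```

On this scale the gaps between consecutive scheduled visits are far more similar than on the original day scale, where they grow from 14 to 56 days.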
The analysis of Sun and Wu (2005) shows that the effect of treatment (double-PI versus single-PI) is time-varying after adjusting for the prior NNRTI antiviral treatment experience under the semiparametric additive regression model. Here we consider fitting the following generalized semiparametric model
| (24) |
for 0 ≤ t ≤ τ, where g(·) is a known link function. In the following we illustrate the selection of g(·) based on the criterion (17) between two commonly used link functions, the identity link function and the logarithm link function. We use the unit weight function Wi (t) = 1 and set t1 = 0 and t2 = τ in (4) for the estimation of β.
Based on the procedure given in Sect. 3.4, the K-fold cross-validation method with K = 50 for the identity link function yields the bandwidth hcv = 0.06. The plot of the total prediction error is given in Fig. 2a. The regression deviation is RD = 3.3495 × 10^3 using (17). For the logarithm link function, the K-fold cross-validation with K = 50 yields hcv = 0.07. The plot of the total prediction error is similar to Fig. 2a. The corresponding regression deviation is RD = 3.3801 × 10^3, which is larger than that under the identity link function. This suggests that model (24) with the identity link function provides a better fit to the data.
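The general idea of choosing a bandwidth by K-fold cross-validation over subjects can be sketched as follows. This is a simplified illustration and not the procedure of Sect. 3.4: it smooths a single mean curve with an ordinary local linear fit instead of solving the estimating equations (3) and (4), it uses an Epanechnikov kernel and synthetic data, and all function names are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Symmetric kernel with compact support on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def local_linear(t0, times, y, h):
    """Local linear estimate of E[Y(t0)] by kernel-weighted least squares."""
    w = epanechnikov((times - t0) / h) / h
    if np.count_nonzero(w) < 2:
        return np.nan  # too few observations inside the window
    X = np.column_stack([np.ones_like(times), times - t0])
    XtW = X.T * w
    coef, *_ = np.linalg.lstsq(XtW @ X, XtW @ y, rcond=None)
    return coef[0]  # intercept = fitted value at t0

def cv_prediction_error(subject, times, y, h, n_folds, seed=0):
    """Total squared prediction error, leaving out whole subjects fold by fold."""
    rng = np.random.default_rng(seed)  # same seed, so same folds for every h
    folds = np.array_split(rng.permutation(np.unique(subject)), n_folds)
    total = 0.0
    for held_out in folds:
        test = np.isin(subject, held_out)
        for t0, y0 in zip(times[test], y[test]):
            pred = local_linear(t0, times[~test], y[~test], h)
            if np.isnan(pred):
                return np.inf  # bandwidth too small to predict everywhere
            total += (y0 - pred) ** 2
    return total

# Synthetic longitudinal data: 60 subjects with 5 visits each on [0, 0.88].
rng = np.random.default_rng(1)
subject = np.repeat(np.arange(60), 5)
times = rng.uniform(0.0, 0.88, size=subject.size)
y = np.sin(2.0 * times) + 0.3 * rng.standard_normal(subject.size)

grid = [0.05, 0.1, 0.2, 0.4]
errors = [cv_prediction_error(subject, times, y, h, n_folds=10) for h in grid]
h_cv = grid[int(np.argmin(errors))]
print(dict(zip(grid, np.round(errors, 2))), "selected h:", h_cv)
```

Splitting by subject rather than by observation keeps all repeated measurements of one subject in the same fold, so the prediction error is not flattered by within-subject correlation.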
Fig. 2.
The curve of the total prediction error PE(h) is plotted against h in (a) and the change of β̂ with h is shown in (b) under the identity link function for the HIV-1 RNA data
Under the identity link function, the estimate β̂ is 0.6243 with standard error 0.0883 for h = 0.06. The estimates γ̂1(t) and γ̂2(t) and their 95 % pointwise confidence intervals are plotted in the first row of Fig. 3. The p-values for testing the time-dependence of γ2(t) are 0.009 and 0.022 using the test statistics S1 and L1, respectively, based on 1,000 Gaussian samples. Our experience shows that the “optimal” bandwidth that minimizes the total prediction error tends to be slightly too small to yield smooth estimates of the nonparametric regression functions. The estimation of the parametric components is not greatly affected by the choice of the bandwidth, as the plot of β̂ against h in Fig. 2b shows. For example, β̂ is 0.6150 with standard error 0.0888 for h = 0.09, and β̂ is 0.6064 with standard error 0.0894 for h = 0.12. The p-values for testing the time-dependence of γ2(t) are 0.014 and 0.034 for h = 0.09 using the test statistics S1 and L1, respectively. The corresponding p-values are 0.039 and 0.022 for h = 0.12. The estimates γ̂1(t) and γ̂2(t) and their 95 % pointwise confidence intervals with h = 0.09 and 0.12 are plotted in the second and third rows of Fig. 3, respectively. Our hypothesis tests indicate that the treatment effect changes with time. The p-values for testing γ2(t) = 0 using the test statistics S2 and L2 are 0.036 and 0.044, respectively, for h = 0.06. The double-PI antiretroviral regimen works better than the single-PI regimen in reducing the viral load of HIV-infected patients, and this effect becomes stronger over the course of the study, as shown in Fig. 3. The patients who had prior antiviral treatment with NNRTIs tend to have a higher viral load than those who did not.
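The reported estimates and standard errors of β translate directly into large-sample confidence intervals. The sketch below assumes a Wald interval based on the asymptotic normality of β̂; the function name `wald_interval` is hypothetical, and the numerical inputs are the values quoted above for the three bandwidths.

```python
from statistics import NormalDist

def wald_interval(estimate, std_error, level=0.95):
    """Two-sided Wald interval: estimate +/- z * std_error."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error

# (beta_hat, standard error) reported for each bandwidth h.
reported = {0.06: (0.6243, 0.0883), 0.09: (0.6150, 0.0888), 0.12: (0.6064, 0.0894)}
for h, (beta_hat, se) in reported.items():
    lo, hi = wald_interval(beta_hat, se)
    print(f"h = {h:.2f}: beta_hat = {beta_hat:.4f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

The three intervals overlap almost entirely, which is one way to see that the estimate of β changes little across these bandwidths.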
Fig. 3.
The plots of the estimates γ̂1(t) and γ̂2(t) and the 95 % pointwise confidence intervals for three different bandwidths h = 0.06, 0.09 and 0.12 under the identity link function for the HIV-1 RNA data
6 Discussion
In this paper we study the generalized semiparametric regression model for longitudinal data. The semiparametric model (1) allows the covariate effects to be constant for some covariates and time-varying for others. We propose an estimation method that automatically adjusts for heterogeneity of sampling times. The nonparametric components of the model are estimated using the local linear estimating equations and the parametric components are estimated through the weighted profile estimating functions. Unlike the profile-based estimation methods of Lin and Carroll (2001), Lin et al. (2007) and Fan et al. (2007), the proposed method applies to situations where the conditional distributions of the sampling times may depend on the past sampling history as well as on covariates, and where subjects may drop out during follow-up. Also, unlike the joint modelling approaches of Martinussen and Scheike (1999, 2000, 2001), Lin and Ying (2001), Hu et al. (2003) and Fan and Li (2004), the proposed method utilizes the sampling process directly without having to model its intensity function λi (t) or the mean rate function αi (t). Specifying a model for λi (t) or αi (t) can be very difficult and problematic, especially when the sampling strategy may depend on the past sampling history. Our simulation studies show that the proposed method works well under a variety of sampling models, including the proportional mean rate model, the additive mean rate model and the sampling model that depends on the past sampling history. Simulations also show that, for some existing methods, misspecification of the sampling model can result in large estimation bias.
This paper presents a unified approach to the semiparametric model (1) with a general link function. Different link functions can be specified for more flexible modelling of longitudinal data. Both categorical and continuous longitudinal responses can be modelled with appropriately chosen link functions. For example, the identity and logarithm link functions can be used for continuous response variables, while the logit link function can be used for binary responses. Model (1) presents an opportunity for model selection of the link function. A criterion is proposed for selecting the link function that provides a better fit. The procedures are illustrated with a data example.
The estimation method developed in Sects. 2.1–2.3 assumes the existence of an intensity for the counting processes that record the random sampling time points. The method is extended to fixed designs in Sect. 2.4, where data are observed at the planned sampling time points. The asymptotic properties of the estimators for both cases are derived in Sect. 3.1. In the situation of a “mixed design”, where some observations are made at fixed time points with positive probability while other observations are made at random sampling points, we believe that the estimating functions (3) and (4) are still valid and yield consistent estimators. However, the asymptotic properties need to be carefully worked out, since the convergence rate of γ̂(t) is (nh)^{−1/2} under the random design and n^{−1/2} under the fixed design. The derivation of the asymptotic results requires a clear classification of whether an observation time point comes from a planned visit time or from a random sampling time.
The proposed estimation method is a marginal approach that does not take into account the correlations between the repeated measurements. Following the discussions in Sect. 3.1, the efficiency of the estimation can be improved by selecting Wi (t) = W (t, Xi (t), Zi (t)), where W (t, x, z) converges in probability to w(t, x, z) = μ̇i (t)/{σε (t|x, z)}^2. Since model (1) does not specify the conditional variance structure for , such a selection can be difficult. When the dimension of the covariates is small, a two-stage estimation procedure can be considered. In the first stage, estimate β and γ(t) with the identity weight function to obtain β̂I and γ̂I (t). In the second stage, the updated estimates of β and γ(t) are obtained by choosing the weight , where and {σε (t|x, z)}^2 equals
where K̃B (u) = |B|^{−1/2} K̃ (B^{−1/2}u), K̃(u) is a multivariate kernel function and B is a (p + q) × (p + q) positive definite bandwidth matrix. The strategy may be difficult to implement when the dimension of the covariates is large because of the curse of dimensionality. Further research is warranted.
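A rescaled multivariate kernel of the form |B|^{−1/2} K̃(B^{−1/2}u) can be evaluated as in the following sketch. It is an illustration under stated assumptions and not the authors' variance estimator: the Gaussian choice for K̃, the Nadaraya–Watson style average of squared residuals in `smoothed_variance`, and all function names are hypothetical, and the residuals in the usage example are synthetic.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard multivariate Gaussian kernel evaluated at each row of u."""
    d = u.shape[-1]
    return np.exp(-0.5 * np.sum(u ** 2, axis=-1)) / (2.0 * np.pi) ** (d / 2.0)

def inv_sqrt(B):
    """Symmetric inverse square root of a positive definite matrix B."""
    eigval, eigvec = np.linalg.eigh(B)
    return (eigvec / np.sqrt(eigval)) @ eigvec.T

def scaled_kernel(u, B):
    """K_B(u) = |B|^{-1/2} K(B^{-1/2} u) for each row of u."""
    u = np.atleast_2d(u)
    return gaussian_kernel(u @ inv_sqrt(B)) / np.sqrt(np.linalg.det(B))

def smoothed_variance(point, covariates, sq_resid, B):
    """Kernel-weighted average of squared residuals around `point` (assumed form)."""
    w = scaled_kernel(covariates - point, B)
    return np.sum(w * sq_resid) / np.sum(w)

# Synthetic example: the residual variance grows with the first covariate.
rng = np.random.default_rng(0)
covariates = rng.normal(size=(200, 2))
sq_resid = (1.0 + covariates[:, 0] ** 2) * rng.chisquare(1, size=200)
B = np.diag([0.25, 0.25])
print(smoothed_variance(np.zeros(2), covariates, sq_resid, B))
```

With a Gaussian K̃, the rescaled kernel is the normal density with covariance B, which gives a quick check of the scaling. The number of observations near any given point shrinks quickly as the dimension of the covariates grows, which is the practical face of the curse of dimensionality mentioned above.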
Acknowledgments
The authors thank the reviewers for their constructive comments that have improved the presentation and content of the paper. The research of Yanqing Sun was partially supported by NSF grants DMS-0905777 and DMS-1208978, NIH NIAID grant 2 R37 AI054165-10 and a fund provided by UNC Charlotte. The research of Liuquan Sun was partly supported by the National Natural Science Foundation of China Grants (No. 10731010, 10971015 and 10721101), the National Basic Research Program of China (973 Program) (No. 2007CB814902) and Key Laboratory of RCSDS, CAS (No. 2008DP173182).
Appendix
We assume the following conditions throughout the paper:
Condition A
The covariate processes Xi (·) and Zi (·) are left continuous; the censoring time Ci is noninformative in the sense that and E{Yi (t)|Xi (t), Zi (t), Ci ≥ t} = E{Yi (t)|Xi (t), Zi (t)}; is independent of Yi (t) conditional on Xi (t), Zi (t) and Ci ≥ t; the processes Yi (t), Xi (t), Zi (t) and αi (t), 0 ≤ t ≤ τ, are bounded and their total variations are bounded by a constant; E|Ni (t2) − Ni (t1)|^2 ≤ L(t2 − t1) for 0 ≤ t1 ≤ t2 = τ, where L > 0 is a constant; the link function g(y) is monotone and its inverse function g^{−1}(x) is twice differentiable; γ0(t), exx(t) and exz(t) are twice differentiable; (exx(t))^{−1} is bounded over 0 ≤ t ≤ τ; the matrices A and Σ are positive definite; the weight process uniformly in the range of (t, x, z); w(t, x, z) is differentiable with uniformly bounded partial derivatives; the kernel function K (·) is symmetric with compact support on [−1, 1] and bounded variation; bandwidth h → 0; E|Ni (t + h) − Ni (t − h)|^{2+v} = O(h), for some v > 0; the limit exists and is finite.
Let
. Define γβ (t) as the unique root such that ua(γβ, β) = 0 for β ∈
where
is a neighborhood of β0. Let
and
. When β = β0, we have γβ(t) = γ0(t). In this case, eβ,x x (t) = ex x (t) and eβ,xz(t) = exz(t). Let
where 0q is a q × 1 vector of zeros.
Let H = diag{Iq, h Iq }. The following lemmas are used in the proofs of the main theorems. The proofs of the lemmas make repeated applications of the Glivenko-Cantelli Theorem (Theorem 19.4 of van der Vaart 1998). A sufficient condition for applying the Glivenko-Cantelli Theorem can be checked by estimating the order of the bracketing number, similar to the proof of Lemma 2 of Sun et al. (2009). This sufficient condition holds under the conditions provided in Condition A. The details are omitted to save space.
Lemma 1
Assume that Condition A holds. Then as n → ∞, ,
and H∂^2γ̃(t, β)/∂β^2 converges in probability to a deterministic function of (t, β) of bounded variation, uniformly in t ∈ [t1, t2] ⊂ (0, τ) and β ∈
at the rate n^{−1/2+ν} for ν > 0.
Proof of Lemma 1
To simplify the presentation, we use the notation γaβ and γβ for γaβ (t) and γβ (t), respectively. Let θ = H(γa − γaβ) and θ̃ = H(γ̃a(t, β) − γaβ). By (3), θ̃ is the root of the following estimating function for fixed β:
| (25) |
where and Ũi (s, s − t) = H^{−1} Xi (s, s − t).
By the Glivenko-Cantelli theorem,
uniformly in t ∈ [t1, t2], β ∈
and θ ∈
, a neighborhood of 0_{2q} ∈ R^{2q}, where
. The limit has a unique root at θ = 0_{2q}.
By the Glivenko-Cantelli theorem and (3), . It follows by Lemma 1 of Sun et al. (2009) that uniformly in t and β. Thus
| (26) |
Since Ua(γ̃a(t, β), β) ≡ 0_{2q}, γ̃a(t, β) satisfies
| (27) |
Note that
| (28) |
By the Glivenko-Cantelli theorem, the process
converges in probability to
uniformly in t ∈ [t1, t2], β ∈
and η in a neighborhood of γaβ (t) at the rate n^{−1/2+ν} for ν > 0.
It follows from (26) that
uniformly in t ∈ [t1, t2] and β ∈
at the rate n^{−1/2+ν} for ν > 0.
Similarly,
| (29) |
uniformly in t ∈ [t1, t2] and β ∈
at the rate n^{−1/2+ν} for ν > 0. It follows from (27) that
| (30) |
at the rate n^{−1/2+ν} for ν > 0, uniformly in t ∈ [t1, t2] and β ∈
.
By a similar argument, H∂^2γ̃ (t, β)/∂β^2 converges in probability to a deterministic function of (t, β) of bounded variation, uniformly in t ∈ [t1, t2] and β ∈
.
Lemma 2
Under Condition A, as nh → ∞ and nh^5 = O(1),
| (31) |
uniformly in t ∈ [t1, t2] ⊂ (0, τ), where and
Further, (nh)^{1/2} n^{−1} Uγ(γ0, β0) = Op(1) uniformly in t ∈ [t1, t2] ⊂ (0, τ).
Proof of Lemma 2
Let , ρn = (nh)^{1/2} and θ = ρn H(γa − γ0a(t)). By the first-order Taylor expansion, we have
which holds uniformly in t ∈ [t1, t2]. Since θ̃ = ρn H (γ̃a(t, β0) − γ0a(t)) is the root of , it follows that θ̃ equals
The first q components of θ̃ yield
| (32) |
uniformly in t ∈ [t1, t2], where
By the local linear approximation for γ0(s) around t,
as s → t, where . It follows that
uniformly in t ∈ [t1, t2]. Hence
| (33) |
uniformly in t ∈ [t1, t2]. By (32) and (33),
| (34) |
uniformly in t ∈ [t1, t2].
Following the same lines as the proof in Appendix A of Tian et al. (2005), we get (nh)^{1/2} n^{−1} Uγ(γ0, β0) = Op(1) uniformly in t ∈ [t1, t2] ⊂ (0, τ).
Proof of Theorem 1
By Lemma 1 and application of the Glivenko-Cantelli theorem to the estimating function defined in (4), we have
uniformly for β ∈
. Since u(β0) = 0 and A is positive definite, β0 is the unique root of u(β). By Theorem 5.9 of van der Vaart (1998),
.
By Lemma 1 and the Glivenko-Cantelli theorem,
It follows that
| (35) |
uniformly in a neighborhood of β.
Now we show that n^{−1/2}U (β0) converges in distribution to a normal distribution. By a Taylor expansion,
By Lemmas 1 and 2,
Hence
| (36) |
which converges in distribution to N (0, Σ), where
| (37) |
Proof of Theorem 2
Since γ̂(t) = γ̃(t, β̂), we have uniformly in t ∈ [0, τ] by Theorem 1 and Lemma 1. It also follows that for β* on the line segment between β̂ and β0. By Lemma 2 and (36),
where
Following the arguments of Lemma 2 of Sun (2010),
| (38) |
as nh^2 → ∞ and nh^5 = O(1). The consistency of the variance estimator for Σγ(t) follows from the proof of Theorem 2 of Sun (2010).
Proof of Theorem 3
By (31), (35) and (36), we have
which converges weakly to a zero-mean Gaussian process by Lemma 1 of Sun and Wu (2005).
Proof of (10)
Note that . Let
Then the matrix
is nonnegative definite.
Contributor Information
Yanqing Sun, Email: yasun@uncc.edu, Department of Mathematics and Statistics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA.
Liuquan Sun, Email: slq@amt.ac.cn, Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Beijing, China.
Jie Zhou, Email: zhoujie@amss.ac.cn, Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Beijing, China.
References
- Aalen OO. Nonparametric inference for a family of counting processes. Ann Stat. 1978;6:701–726.
- Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA. Efficient and adaptive estimation for semiparametric models. Springer; New York: 1993.
- Cheng SC, Wei LJ. Inferences for a semiparametric model with panel data. Biometrika. 2000;87:89–97.
- Fan J, Gijbels I. Local polynomial modelling and its applications. Chapman and Hall; London: 1996.
- Fan J, Li R. New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J Am Stat Assoc. 2004;99:710–723.
- Fan J, Huang T, Li R. Analysis of longitudinal data with semiparametric estimation of covariance function. J Am Stat Assoc. 2007;102:632–641. doi: 10.1198/016214507000000095.
- Hoover DR, Rice JA, Wu CO, Yang LP. Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika. 1998;85:809–822.
- Hu XJ, Sun J, Wei LJ. Regression parameter estimation from panel counts. Scand J Stat. 2003;30:25–43.
- Hu Z, Wang N, Carroll RJ. Profile-kernel versus backfitting in the partially linear models for longitudinal/clustered data. Biometrika. 2004;91:251–262.
- Lin X, Carroll RJ. Semiparametric regression for clustered data using generalized estimating equations. J Am Stat Assoc. 2001;96:1045–1056.
- Lin DY, Ying Z. Semiparametric and nonparametric regression analysis of longitudinal data (with discussion). J Am Stat Assoc. 2001;96:103–113.
- Lin DY, Wei LJ, Ying Z. Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika. 1993;80:557–572.
- Lin DY, Wei LJ, Yang I, Ying Z. Semiparametric regression for the mean and rate functions of recurrent events. J R Stat Soc Ser B. 2000;62(Part 4):711–730.
- Lin H, Song PX-K, Zhou QM. Varying-coefficient marginal models and applications in longitudinal data analysis. Sankhya. 2007;69:581–614.
- Martinussen T, Scheike TH. A semiparametric additive regression model for longitudinal data. Biometrika. 1999;86:691–702.
- Martinussen T, Scheike TH. A nonparametric dynamic additive regression model for longitudinal data. Ann Stat. 2000;28:1000–1025.
- Martinussen T, Scheike TH. Sampling adjusted analysis of dynamic additive regression models for longitudinal data. Scand J Stat. 2001;28:303–323.
- Martinussen T, Scheike TH. Dynamic regression models for survival data. Springer; New York: 2006.
- Rice JA, Silverman BW. Estimating the mean and covariance structure nonparametrically when the data are curves. J R Stat Soc Ser B. 1991;53:233–243.
- Sun Y. Estimation of semiparametric regression model with longitudinal data. Lifetime Data Anal. 2010;16:271–298. doi: 10.1007/s10985-009-9136-2.
- Sun J, Wei LJ. Regression analysis of panel count data with covariate-dependent observation and censoring times. J R Stat Soc Ser B. 2000;62:293–302.
- Sun Y, Wu H. Semiparametric time-varying coefficients regression model for longitudinal data. Scand J Stat. 2005;32:21–47.
- Sun Y, Gilbert PB, McKeague IW. Proportional hazards models with continuous marks. Ann Stat. 2009;37:394–426. doi: 10.1214/07-AOS554.
- Tian L, Zucker D, Wei LJ. On the Cox model with time-varying regression coefficients. J Am Stat Assoc. 2005;100:172–183.
- Van der Vaart AW. Asymptotic statistics. Cambridge University Press; Cambridge: 1998.
- Wu H, Liang H. Backfitting random varying-coefficient models with time-dependent smoothing covariates. Scand J Stat. 2004;31:3–19.
- Zhang Y. A semiparametric pseudolikelihood estimation method for panel count data. Biometrika. 2002;89:39–48.



