Author manuscript; available in PMC: 2011 Apr 1.
Published in final edited form as: Lifetime Data Anal. 2009 Nov 5;16(2):271–298. doi: 10.1007/s10985-009-9136-2

Estimation of semiparametric regression model with longitudinal data

Yanqing Sun
PMCID: PMC3043558  NIHMSID: NIHMS161826  PMID: 19890712

Abstract

In a longitudinal study, an individual is followed up over a period of time. Repeated measurements on the response and some time-dependent covariates are taken at a series of sampling times. The sampling times are often irregular and depend on covariates. In this paper, we propose a sampling adjusted procedure for the estimation of the proportional mean model without having to specify a sampling model. Unlike existing procedures, the proposed method is robust to model misspecification of the sampling times. Large sample properties are investigated for the estimators of both regression coefficients and the baseline function. We show that the proposed estimation procedure is more efficient than the existing procedures. Large sample confidence intervals for the baseline function are also constructed by perturbing the estimation equations. A simulation study is conducted to examine the finite sample properties of the proposed estimators and to compare with some of the existing procedures. The method is illustrated with a data set from a recurrent bladder cancer study.

Keywords: Asymptotic efficiency, Data-driven bandwidth selection, Censored follow-up times, Kernel smoothing, Model misspecification, Panel count data, Proportional mean model, Sampling adjusted estimation, Weighted least squares estimator

1 Introduction

We consider semiparametric modeling of covariate effects on a longitudinal response process based on repeated measurements observed at a series of sampling times. Suppose that there is a random sample of n subjects. For the ith subject, let Yi (t) be the response process and let Zi (t) be the possibly time-dependent covariates of dimensions p × 1 over the time interval [0, τ]. For example, Yi (t) can be the response of the ith individual observed at time t, or the cumulative number of events that have occurred up to time t. We consider the following marginal proportional means model for Yi (t), 0 ≤ t ≤ τ,

$$\mu_i(t) = E\{Y_i(t) \mid Z_i(t)\} = \mu_0(t)\exp\{\beta^T Z_i(t)\}, \quad i = 1, \ldots, n, \tag{1}$$

where μ0(t) is a completely unspecified function and β is a p-dimensional vector of unknown parameters. The notation βT represents transpose of a vector or matrix β.

In longitudinal studies, subjects are followed over a period of time and the responses are taken at different time points. Suppose that the observations of $Y_i(t)$ are taken at the sampling time points $0 \le t_{i1} < t_{i2} < \cdots < t_{in_i} \le \tau$, where $n_i$ is the total number of observations on the ith subject. The sampling times are often irregular and depend on covariates. In addition, some subjects may drop out of the study early. Let $N_i(t) = \sum_{j=1}^{n_i} I(t_{ij} \le t)$ be the number of observations taken on the ith subject by time t, where $I(\cdot)$ is the indicator function. Let $C_i$ be the end of follow-up time or censoring time for the ith subject. Then the responses for the ith subject can only be observed at the time points before $C_i$. Thus $N_i(t)$ can be written as $N_i^*(t \wedge C_i)$, where $N_i^*(t)$ is a counting process for sampling times.

Nonparametric and semiparametric modeling of longitudinal data has been studied extensively in recent years. Examples include the semiparametric methods of Moyeed and Diggle (1994), Zeger and Diggle (1994) and Liang et al. (2003), and the nonparametric methods of Hoover et al. (1998), Wu et al. (1998), Wu and Zhang (2002), Wu and Liang (2004), Martinussen and Scheike (1999, 2000, 2001), Lin and Ying (2001), Scheike (2002), Sun and Wu (2005) and Fan and Li (2004). Most of the aforementioned works focus on additive time-varying coefficient regression models for the mean response.

When Yi (t), 0 ≤ t ≤ τ, is a continuously observed counting process subject to right censoring, model (1) has been studied by Andersen and Gill (1982); Lawless and Nadeau (1995), and Lin et al. (2000) among others. Data collected on the individual response processes Yi (t), 0 ≤ t ≤ τ, at a finite set of sampling times are also called panel data. Many authors have studied nonparametric estimation of panel count data in the absence of covariates; cf. Sun and Kalbfleisch (1995); Wellner and Zhang (2000); Lu et al. (2007); Hu et al. (2009). The semiparametric model (1) for the mean function allows the study of a treatment effect based on panel data, where the responses are not limited to count data. Zhang (2002) proposed a semiparametric pseudolikelihood method for model (1) under the assumption that Yi (t) is a nonhomogeneous Poisson process. For panel count data, model (1) has been studied by Sun and Wei (2000); Cheng and Wei (2000), and Hu et al. (2003). These works generally assume that the sampling times are independent of covariates or follow a proportional mean rate model to account for possible dependence on the covariates. However, misspecification of the sampling model may lead to misleading inferences for the response process. Assuming time-independent covariate processes, Hu et al. (2003, Sect. 2.2) proposed an estimation procedure that does not require modelling of the observation process but assumes that the observation processes are discrete and that a subject has observations at some time points with positive probability. This approach is useful for situations where the panel count processes are observed at a fixed set of time points; when that condition is not met, it requires discretizing the time scale in practice.

We propose a sampling adjusted procedure for estimation of model (1) without having to specify a sampling model for the observation times, whether it is a proportional, additive or transformation mean rate model; cf. Lin et al. (2000, 2001) and Scheike (2002). The proposed method applies to longitudinal response processes and is not limited to count data. Consider the following general mean rate model for sampling times:

$$E\{dN_i^*(t) \mid Z_i(t)\} = \alpha(t, Z_i(t))\,dt = \alpha_i(t)\,dt, \tag{2}$$

where, other than some imposed smoothness conditions, the function α(t, z) is completely unspecified. We focus on the statistical analysis of model (1) for modelling the mean response without having to worry about the implications of model misspecification of the sampling times. Thus our estimator for β is robust to model misspecification of the sampling times. We also show that the proposed estimator for β is more efficient than the estimators proposed by Cheng and Wei (2000) and by Hu et al. (2003). In addition, a sampling adjusted estimator for the baseline function μ0(t) is proposed and its pointwise confidence intervals are constructed using the perturbation method of Jin et al. (2001). When some of the covariates are time-dependent, an added advantage of the proposed method is that Zi (t) need not be observed at all times t. Only the values at the sampling times are needed, which is practically convenient since time-dependent covariates are usually observed at the sampling times.

The rest of the paper is organized as follows. In Sect. 2.1, we propose a sampling adjusted procedure for estimation of the proportional means model and investigate its large sample properties. Pointwise confidence intervals for the baseline function are constructed in Sect. 2.2. The asymptotic efficiency of the proposed estimator is compared to the existing procedures in Sect. 2.3. A simulation study is presented in Sect. 3. An application of the proposed method to a recurrent bladder cancer study is given in Sect. 4. All proofs are given in the Appendix.

2 Estimation of proportional mean regression model

In this section, we propose an estimation procedure for model (1) based on the observations $\{(Y_i(t_{ij}), Z_i(t_{ij}));\ j = 1, \ldots, n_i,\ i = 1, \ldots, n\}$. These are the values of $\{(Y_i(t), Z_i(t)),\ 0 \le t \le \tau\}$ observed at the sampling times, that is, at the jump time points of $N_i(t) = N_i^*(t \wedge C_i)$, $i = 1, \ldots, n$. Assume that the processes $N_i^*(\cdot)$ and $Y_i(\cdot)$ are independent given $Z_i(\cdot)$. The asymptotic results of the proposed estimators are proved. The asymptotic efficiency is discussed.

2.1 Estimation procedures and asymptotic properties

Under model (1) and Condition A given in the Appendix, $Y_i(t)\,dN_i(t) - \mu_0(t)\exp\{\beta^T Z_i(t)\}\,dN_i(t)$ has mean zero for $0 \le t \le \tau$. The estimation of $\mu_0(t)$ for fixed $\beta$ can be derived based on this property. Because the data are sparse at any single time t, we gather the data in a neighborhood of t through kernel smoothing. This motivates the following estimate for $\mu_0(t)$ when $\beta$ is known:

$$\tilde\mu_0(t;\beta) = \frac{\sum_{i=1}^{n}\int_0^{\tau} K_b(t-s)\,Y_i(s)\,dN_i(s)}{\sum_{i=1}^{n}\int_0^{\tau} K_b(t-s)\exp\{\beta^T Z_i(s)\}\,dN_i(s)}, \tag{3}$$

where Kb(x) = K (x/b)/b, K (·) is a kernel function and b is the bandwidth. Let

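$$\tilde S_y(t) = n^{-1}\sum_{i=1}^{n}\int_0^{\tau} K_b(t-s)\,Y_i(s)\,dN_i(s),$$
$$\tilde S^{(k)}(t;\beta) = n^{-1}\sum_{i=1}^{n}\int_0^{\tau} K_b(t-s)\,Z_i(s)^{\otimes k}\exp\{\beta^T Z_i(s)\}\,dN_i(s)$$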

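for k = 0, 1, 2, where $Z^{\otimes 2} = ZZ^T$, $Z^{\otimes 1} = Z$ and $Z^{\otimes 0} = 1$. Define $\tilde Z(t;\beta) = \tilde S^{(1)}(t;\beta)/\tilde S^{(0)}(t;\beta)$. We can write $\tilde\mu_0(t;\beta) = \tilde S_y(t)/\tilde S^{(0)}(t;\beta)$.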
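As an illustration, once the observations of all subjects are pooled, the estimator in (3) is a ratio of two kernel-weighted sums. The following is a minimal Python sketch, assuming NumPy is available. The function names and the flat-array layout (`times`, `y`, `Z`) are assumptions made here for illustration and do not come from the paper; the Epanechnikov kernel is the one used later in Sect. 3.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2)_+."""
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

def baseline_estimate(t, times, y, Z, beta, b):
    """Kernel-smoothed baseline estimate of eq. (3) at a single time t.

    times, y : 1-D arrays pooling every observation of every subject
    Z        : (n_obs, p) covariate values recorded at the sampling times
    beta     : (p,) regression coefficients, treated as known here
    b        : bandwidth
    """
    k = epanechnikov((t - times) / b) / b      # K_b(t - t_ij)
    risk = np.exp(Z @ beta)                    # exp{beta^T Z_i(t_ij)}
    # The common factor 1/n in S~_y and S~(0) cancels in the ratio.
    return np.sum(k * y) / np.sum(k * risk)
```

If the responses satisfy Y = c exp{βᵀZ} exactly for a constant c, this function returns c at any t with at least one sampling time within one bandwidth, which is a convenient sanity check.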

A profile weighted least squares type estimator for β can be obtained by minimizing

$$l(\beta) = \sum_{i=1}^{n}\int_{t_1}^{t_2} W_i(s)\left(Y_i(s) - \tilde\mu_0(s;\beta)\exp\{\beta^T Z_i(s)\}\right)^2 dN_i(s), \tag{4}$$

where $W_i(t)$ are weight processes and $[t_1, t_2]$ is a subinterval of $[0, \tau]$. The boundary points are trimmed to avoid the complicated boundary problems involved in the proofs of the asymptotic properties. Our simulations have shown that $t_1$ and $t_2$ can be taken to be very close to 0 and $\tau$, respectively, usually within one bandwidth of the endpoints. Note that $\partial\tilde\mu_0(t;\beta)/\partial\beta = -\tilde\mu_0(t;\beta)\tilde Z(t;\beta)$. Taking the partial derivative of $l(\beta)$ with respect to $\beta$, we have

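$$\frac{\partial l(\beta)}{\partial\beta} = -2\sum_{i=1}^{n}\int_{t_1}^{t_2} W_i(s)\,\tilde\mu_0(s;\beta)\exp\{\beta^T Z_i(s)\}\left(Z_i(s) - \tilde Z(s;\beta)\right)\left(Y_i(s) - \tilde\mu_0(s;\beta)\exp\{\beta^T Z_i(s)\}\right)dN_i(s). \tag{5}$$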
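The derivative identity used to obtain (5) can be checked directly from the definitions of $\tilde S_y$ and $\tilde S^{(k)}$. The short worked step below uses only the fact that $\partial\tilde S^{(0)}(t;\beta)/\partial\beta = \tilde S^{(1)}(t;\beta)$:

$$\begin{aligned}
\frac{\partial\tilde\mu_0(t;\beta)}{\partial\beta}
 &= \frac{\partial}{\partial\beta}\,\frac{\tilde S_y(t)}{\tilde S^{(0)}(t;\beta)}
  = -\frac{\tilde S_y(t)\,\tilde S^{(1)}(t;\beta)}{\{\tilde S^{(0)}(t;\beta)\}^{2}}
  = -\tilde\mu_0(t;\beta)\,\tilde Z(t;\beta), \\
\frac{\partial}{\partial\beta}\Big[\tilde\mu_0(s;\beta)\exp\{\beta^T Z_i(s)\}\Big]
 &= \tilde\mu_0(s;\beta)\exp\{\beta^T Z_i(s)\}\,\{Z_i(s) - \tilde Z(s;\beta)\}.
\end{aligned}$$

The second line is the factor that multiplies the residual in (5) when the weights $W_i(s)$ are held fixed.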

Let $W_i(t) = W(t)\left(\tilde\mu_0(t;\beta)\exp\{\beta^T Z_i(t)\}\right)^{-1}$, where $W(t)$ is some weight process not depending on $\beta$ and $i$. We obtain the following estimation function:

$$U(\beta) = \sum_{i=1}^{n}\int_{t_1}^{t_2} W(s)\left(Z_i(s) - \tilde Z(s;\beta)\right)\left(Y_i(s) - \tilde\mu_0(s;\beta)\exp\{\beta^T Z_i(s)\}\right)dN_i(s). \tag{6}$$

Let $\hat\beta$ be the estimator of $\beta$ such that $U(\hat\beta) = 0$. We show that $\hat\beta$ is consistent and asymptotically normal. An estimator for the baseline mean function is given by $\hat\mu_0(t) = \tilde\mu_0(t;\hat\beta)$.
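To make the estimating equation concrete, the following Python sketch evaluates U(β) in (6) with W(s) = 1 on pooled observations and looks for a root by Newton iterations with a finite-difference Jacobian. It is only an illustration under stated assumptions: NumPy is assumed, the names `estimating_function` and `solve_beta` and the flat-array data layout are hypothetical, and the numerical Jacobian is a convenience chosen here in place of the analytic derivative.

```python
import numpy as np

def epan(u):
    """Epanechnikov kernel 0.75 (1 - u^2)_+ (an assumed kernel choice)."""
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

def estimating_function(beta, times, y, Z, b, t1, t2):
    """U(beta) of eq. (6) with W(s) = 1, using pooled observations.

    times, y : 1-D arrays over all (subject, visit) pairs
    Z        : (n_obs, p) covariate values at the sampling times
    """
    # K[k, l] = K_b(t_k - t_l): smoothing weights evaluated at each sampling time
    K = epan((times[:, None] - times[None, :]) / b) / b
    risk = np.exp(Z @ beta)
    s0 = K @ risk                          # proportional to S~(0)(t_k; beta)
    s1 = K @ (Z * risk[:, None])           # proportional to S~(1)(t_k; beta)
    mu0 = (K @ y) / s0                     # mu~_0(t_k; beta), eq. (3)
    z_tilde = s1 / s0[:, None]             # Z~(t_k; beta)
    keep = (times >= t1) & (times <= t2)   # trimming to [t1, t2]
    resid = y - mu0 * risk
    return ((Z - z_tilde) * resid[:, None])[keep].sum(axis=0)

def solve_beta(times, y, Z, b, t1, t2, beta_init=None,
               n_iter=50, tol=1e-10, h=1e-6):
    """Newton iterations for U(beta) = 0 with a forward-difference Jacobian."""
    p = Z.shape[1]
    beta = np.zeros(p) if beta_init is None else np.asarray(beta_init, dtype=float)
    for _ in range(n_iter):
        u = estimating_function(beta, times, y, Z, b, t1, t2)
        jac = np.empty((p, p))
        for k in range(p):
            step = np.zeros(p)
            step[k] = h
            u_k = estimating_function(beta + step, times, y, Z, b, t1, t2)
            jac[:, k] = (u_k - u) / h
        delta = np.linalg.solve(jac, u)
        beta = beta - delta
        if np.max(np.abs(delta)) < tol:
            break
    return beta
```

The kernel matrix costs memory quadratic in the number of pooled observations, which is acceptable for data sets of the size considered in Sect. 3 and Sect. 4 but would need a sparse or binned version for much larger studies.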

As we shall see later in Sect. 2.3, the proposed estimator gains efficiency over the estimator of Hu et al. (2003) by centering Yi (·) around its estimated mean, thus reducing variance, while Hu et al. (2003) avoided dealing with the baseline function μ0(·) in their estimation of parametric components.

Let μ̃1(t; β) = ∂μ̃0(t; β)/∂β. Taking partial derivative of U (β) with respect to β, we have

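$$\frac{\partial U(\beta)}{\partial\beta^T} = -\sum_{i=1}^{n}\int_{t_1}^{t_2} W(s)\left(Z_i(s) - \tilde Z(s;\beta)\right)^{\otimes 2}\tilde\mu_0(s;\beta)\exp\{\beta^T Z_i(s)\}\,dN_i(s) - \sum_{i=1}^{n}\int_{t_1}^{t_2} W(s)\left(\frac{\tilde S^{(2)}(s;\beta)}{\tilde S^{(0)}(s;\beta)} - \tilde Z(s;\beta)^{\otimes 2}\right)\left(Y_i(s) - \tilde\mu_0(s;\beta)\exp\{\beta^T Z_i(s)\}\right)dN_i(s). \tag{7}$$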

By Lemma 1 given in the Appendix, the second term above is of order $o_p(n)$. Let $\beta_0$ be the true value of $\beta$ under model (1). By a first-order Taylor expansion, we have

$$n^{1/2}(\hat\beta - \beta_0) = -\left[n^{-1}\frac{\partial U(\beta^*)}{\partial\beta^T}\right]^{-1} n^{-1/2}\,U(\beta_0), \tag{8}$$

where β* is on the line segment between β̂ and β0.

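Let $\xi_i(t) = I(C_i \ge t)$, $s_y(t) = E(\xi_i(t)\alpha_i(t)Y_i(t))$ and $s^{(k)}(t) = E[\xi_i(t)\alpha_i(t)Z_i(t)^{\otimes k}\exp\{\beta^T Z_i(t)\}]$, for k = 0, 1, 2. Let $w(t)$ be a deterministic nonnegative function such that $W(t) \xrightarrow{P} w(t)$ uniformly in t. Define $\bar z(t) = s^{(1)}(t)/s^{(0)}(t)$, $A = \int_{t_1}^{t_2} w(u)\,\mu_0(u)\,s^{(0)}(u)\left(\frac{s^{(2)}(u)}{s^{(0)}(u)} - \bar z(u)^{\otimes 2}\right)du$, and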

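$$\Sigma = E\left[\left(\int_{t_1}^{t_2} w(s)\left(Z_i(s) - \bar z(s)\right)\left(Y_i(s) - \mu_0(s)\exp\{\beta^T Z_i(s)\}\right)dN_i(s)\right)^{\otimes 2}\right]. \tag{9}$$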

The following theorem presents the asymptotic consistency of the estimator $\hat\beta$. By (8), the asymptotic normality of $\hat\beta$ follows from the asymptotic consistency of $\hat\beta$, uniform convergence of $n^{-1}\partial U(\beta)/\partial\beta^T$ in a neighborhood of $\beta_0$ and asymptotic normality of $n^{-1/2}U(\beta_0)$.

Theorem 1

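Under Condition A given in the Appendix, $\hat\beta \xrightarrow{P} \beta_0$, $n^{1/2}(\hat\beta - \beta_0) \xrightarrow{\mathcal{D}} N(0, A^{-1}\Sigma A^{-1})$ and $\hat\mu_0(t) \xrightarrow{P} \mu_0(t)$ uniformly in $t \in [t_1, t_2]$. Consistent estimators for $A$ and $\Sigma$ are given by $\hat A = -n^{-1}\,\partial U(\hat\beta)/\partial\beta^T$ and

$$\hat\Sigma = n^{-1}\sum_{i=1}^{n}\left(\int_{t_1}^{t_2} W(s)\left(Z_i(s) - \tilde Z(s;\hat\beta)\right)\hat\epsilon_i(s)\,dN_i(s)\right)^{\otimes 2},$$

where $\hat\epsilon_i(s) = Y_i(s) - \hat\mu_0(s)\exp\{\hat\beta^T Z_i(s)\}$.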

From (9), we see that the weight process W (·) assigns the weight W (tij) to the difference Yi (tij) − μ0(tij)exp{βT Zi (tij)} at the sampling time tij. Taking W (·) = 1 implies that the longitudinal observations at all times are equally weighted for the estimation of β, which is what we recommend for most applications. On the other hand, one can choose to emphasize the early or late observations by selecting W (·) to be decreasing or increasing. The optimal choice of W (·) for the observed data, such that Σ defined in (9) is minimized, is a challenging problem and needs further exploration.

In practice, an appropriate bandwidth can be selected using the leave-one-subject-out cross validation approach suggested by Rice and Silverman (1991). In particular, let

$$PE(b) = n^{-1}\sum_{i=1}^{n}\left[\int_0^{\tau}\left(Y_i(t) - \hat\mu_0^{(-i)}(t)\exp\{\hat\beta^{(-i)T} Z_i(t)\}\right)dN_i(t)\right]^2, \tag{10}$$

where, for a given bandwidth b, $\hat\beta^{(-i)}$ and $\hat\mu_0^{(-i)}(t)$ are the estimators of $\beta$ and $\mu_0(t)$ based on the data without subject i. The data-driven bandwidth selection method is to choose the bandwidth b that minimizes the mean squares of fitted residuals PE(b). This data-driven bandwidth selection method is used in Sect. 4 for analyzing the bladder cancer data.
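The cross validation criterion in (10) can be sketched as a loop over subjects. The Python code below is a minimal illustration, assuming NumPy and pooled flat arrays with a subject label `subj` per observation. The argument `fit` is a hypothetical user-supplied routine, not something defined in the paper: it is assumed to take `(times, y, Z, b)` and return a pair `(beta_hat, mu0_hat)`, where `mu0_hat` maps an array of times to baseline values.

```python
import numpy as np

def prediction_error(b, times, y, Z, subj, fit):
    """PE(b) of eq. (10): leave-one-subject-out prediction error."""
    ids = np.unique(subj)
    total = 0.0
    for i in ids:
        held = subj == i
        # Refit without subject i, then predict that subject's responses.
        beta, mu0 = fit(times[~held], y[~held], Z[~held], b)
        resid = y[held] - mu0(times[held]) * np.exp(Z[held] @ beta)
        # Eq. (10) squares the integral, i.e. the subject's summed residuals.
        total += resid.sum() ** 2
    return total / len(ids)

def select_bandwidth(grid, times, y, Z, subj, fit):
    """Return the bandwidth in `grid` with the smallest PE(b)."""
    return min(grid, key=lambda b: prediction_error(b, times, y, Z, subj, fit))
```

Each evaluation of PE(b) refits the model n times, so in practice the grid of candidate bandwidths is usually kept coarse.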

Next, we present an asymptotic result for the estimator $\hat\mu_0(t)$ of the baseline function. The result is useful for constructing confidence intervals for the mean response curve given the covariates. Let $\mu_2 = \int u^2 K(u)\,du$ and $\eta(t) = (s^{(0)}(t))^{-1}\left[(s_y(t))'' - \mu_0(t)(s^{(0)}(t))''\right]$, where $(f(t))''$ denotes the second derivative of $f(t)$ with respect to t.

Theorem 2

Under Conditions A and B given in the Appendix,

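$$(nb)^{1/2}\left\{\hat\mu_0(t) - \mu_0(t) - \tfrac{1}{2}b^2\mu_2\,\eta(t)\right\} \xrightarrow{\mathcal{D}} N\left\{0,\ (s^{(0)}(t))^{-2}\sigma^2(t)\right\}$$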

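for $t \in [t_1, t_2]$. The asymptotic variance can be estimated consistently by $(\tilde S^{(0)}(t;\hat\beta))^{-2}\hat\sigma^2(t)$, where $\hat\sigma^2(t) = b\,n^{-1}\sum_{i=1}^{n}\left[\int_0^{\tau} K_b(t-s)\,\hat\epsilon_i(s)\,dN_i(s)\right]^2$.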
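A pointwise standard error for the baseline estimator can be computed by plugging the variance estimate of Theorem 2 into the (nb)^{1/2} normalization. The sketch below is an illustration under stated assumptions: NumPy is assumed, subjects are assumed to be labelled 0, …, n − 1 in the integer array `subj`, `resid` is assumed to hold the fitted residuals at the sampling times, the function name is hypothetical, and the smoothing bias term b²μ2η(t)/2 is ignored.

```python
import numpy as np

def epan(u):
    """Epanechnikov kernel 0.75 (1 - u^2)_+ (an assumed kernel choice)."""
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

def baseline_std_error(t, times, resid, Z, subj, beta, b):
    """Plug-in standard error of the baseline estimate at time t (Theorem 2)."""
    n = int(subj.max()) + 1
    k = epan((t - times) / b) / b                        # K_b(t - t_ij)
    # Per-subject integrals of K_b(t - s) * residual with respect to dN_i(s)
    per_subject = np.bincount(subj, weights=k * resid, minlength=n)
    sigma2 = b * np.mean(per_subject**2)                 # sigma^2-hat(t)
    s0 = np.sum(k * np.exp(Z @ beta)) / n                # S~(0)(t; beta)
    avar = sigma2 / s0**2                                # asymptotic variance
    return np.sqrt(avar / (n * b))
```

A normal-approximation interval would then be the estimate plus or minus a normal quantile times this standard error, although Sect. 2.2 argues that a perturbation-based interval tends to have more accurate coverage.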

2.2 Constructing confidence intervals for baseline function

The pointwise confidence intervals for the baseline function $\mu_0(t)$ can be constructed based on the asymptotic normality given in Theorem 2. However, since $\hat\mu_0(t)$ has a slow convergence rate of $(nb)^{-1/2}$, a resampling-based method yields more accurate coverage probability. Here we present a method based on perturbing the estimating equations. This method has been studied and shown to have good empirical properties by Jin et al. (2001) and by other authors. Let $\zeta_1, \ldots, \zeta_n$ be iid random variables with mean 1 and variance 1, for example exponential random variables with mean 1. The perturbed estimating equation for $\mu_0(t)$ for fixed $\beta$ is

$$\sum_{i=1}^{n}\zeta_i\left[Y_i(t)\,dN_i(t) - \mu_0(t)\exp\{\beta^T Z_i(t)\}\,dN_i(t)\right] = 0. \tag{11}$$

Similar to (3), we obtain the following smoothed estimator of μ0(t) for given β:

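$$\tilde\mu_0^*(t;\beta) = \frac{\sum_{i=1}^{n}\zeta_i\int_0^{\tau} K_b(t-s)\,Y_i(s)\,dN_i(s)}{\sum_{i=1}^{n}\zeta_i\int_0^{\tau} K_b(t-s)\exp\{\beta^T Z_i(s)\}\,dN_i(s)}. \tag{12}$$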

Perturbing the estimating equation (6) yields

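$$U^*(\beta) = \sum_{i=1}^{n}\zeta_i\int_{t_1}^{t_2} W(s)\left(Z_i(s) - \tilde Z(s;\beta)\right)\left(Y_i(s) - \tilde\mu_0^*(s;\beta)\exp\{\beta^T Z_i(s)\}\right)dN_i(s). \tag{13}$$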

Let $\hat\beta^*$ be the estimator of $\beta$ such that $U^*(\hat\beta^*) = 0$. Let $\hat\mu_0^*(t) = \tilde\mu_0^*(t;\hat\beta^*)$. Similar to Jin et al. (2001), the distributions of $n^{1/2}(\hat\beta - \beta)$ and $n^{1/2}(\hat\mu_0(t) - \mu_0(t))$ can be approximated by the distributions of $n^{1/2}(\hat\beta^* - \hat\beta)$ and $n^{1/2}(\hat\mu_0^*(t) - \hat\mu_0(t))$. A $100(1-\alpha)\%$ confidence interval for $\mu_0(t)$, $0 \le t \le \tau$, can be constructed as $(\hat\mu_0(t) - q_{1-\alpha/2}(t),\ \hat\mu_0(t) - q_{\alpha/2}(t))$, where $q_{\alpha/2}(t)$ and $q_{1-\alpha/2}(t)$ are the $\alpha/2$ and $1-\alpha/2$ quantiles of $\{\hat\mu_{0k}^*(t) - \hat\mu_0(t),\ k = 1, \ldots, B\}$ based on B sets of perturbed estimating equations. Alternatively, the confidence interval for $\mu_0(t)$ can be obtained as $\hat\mu_0(t) \pm z_{\alpha/2}\,SE^*(t)$, where $SE^*(t)$ is the standard deviation of the estimators $\{\hat\mu_{0k}^*(t),\ k = 1, \ldots, B\}$ based on B sets of perturbed estimating equations. Our simulations show that the two approaches have similar performances, so only the first is presented in Sect. 3.
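The quantile-based interval can be sketched as follows. This Python illustration assumes NumPy, pooled flat arrays, and subjects labelled 0, …, n − 1 in the integer array `subj`; the function names are hypothetical. As a deliberate simplification, the regression coefficient is held fixed at its estimate in every replicate, so the sketch perturbs only the baseline estimator (12) and ignores the variability from re-solving the perturbed equation (13), which the procedure described above does account for.

```python
import numpy as np

def epan(u):
    """Epanechnikov kernel 0.75 (1 - u^2)_+ (an assumed kernel choice)."""
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

def perturbed_baseline(t, times, y, Z, subj, beta, b, zeta):
    """Eq. (12): every term of subject i is multiplied by zeta[i]."""
    w = zeta[subj] * epan((t - times) / b) / b
    return np.sum(w * y) / np.sum(w * np.exp(Z @ beta))

def perturbation_interval(t, times, y, Z, subj, beta, b,
                          B=500, level=0.95, seed=0):
    """Interval (est - q_{1-a/2}, est - q_{a/2}) from B perturbed replicates."""
    rng = np.random.default_rng(seed)
    n = int(subj.max()) + 1
    # Unit weights reproduce the unperturbed estimator of eq. (3).
    est = perturbed_baseline(t, times, y, Z, subj, beta, b, np.ones(n))
    draws = np.array([
        perturbed_baseline(t, times, y, Z, subj, beta, b,
                           rng.exponential(1.0, n))   # mean 1, variance 1
        for _ in range(B)
    ])
    a = 1.0 - level
    q_lo, q_hi = np.quantile(draws - est, [a / 2, 1 - a / 2])
    return est - q_hi, est - q_lo
```

Because the multipliers are drawn per subject rather than per observation, the within-subject dependence of the repeated measurements is preserved in each replicate.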

2.3 Asymptotic efficiency considerations

Cheng and Wei (2000) studied model (1) by assuming that $\alpha_i(t) = \lambda_0(t)$ does not depend on $Z_i(t)$. Their estimator of $\beta$ has the asymptotic variance $A^{-1}\Sigma_{CW}A^{-1}$, where

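$$\Sigma_{CW} = E\left\{\int_{t_1}^{t_2} w(s)\left(Z_i(s) - \bar z(s)\right)\left[Y_i(s)\,dN_i(s) - \mu_0(s)\lambda_0(s)\exp\{\beta^T Z_i(s)\}\xi_i(s)\,ds\right]\right\}^{\otimes 2}.$$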

Under noninformative censoring and the independence between $Y_i(\cdot)$ and $N_i^*(\cdot)$ given $Z_i(\cdot)$,

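$$\Sigma_{CW} = \Sigma + E\left\{\int_{t_1}^{t_2} w(s)\left(Z_i(s) - \bar z(s)\right)\mu_0(s)\exp\{\beta^T Z_i(s)\}\left(dN_i(s) - \xi_i(s)\lambda_0(s)\,ds\right)\right\}^{\otimes 2},$$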

where Σ is defined in (9). Since the second term is positive semidefinite, our estimator β̂ has a smaller asymptotic variance and is thus more efficient than the estimator of Cheng and Wei (2000).

Hu et al. (2003) studied model (1) under the assumption that the observation process $N_i^*(t)$ follows the conditional proportional mean rate model

$$\alpha_i(t) = \lambda_0(t)\exp\{\alpha^T Z_i\} \tag{14}$$

with λ0(t) an unspecified baseline mean function, and that covariate Zi is time independent. Hu et al. (2003, Sect. 2.3) obtained a joint estimator (β̂M, α̂M) for (β, α). Here we show that the HSW procedure is also applicable to time-dependent covariate and that our robust estimator is more efficient than the HSW estimator. We use a different parametrization, expressing the estimation equations as the functions of (β, α) instead of (β̃, α) as in Hu et al. (2003), where β̃ = β+α. Of course, the different ways of parametrization do not change the estimator for β or its asymptotic variance. Our parametrization only makes it easier to show that the proposed estimator β̂ is more efficient than the HSW estimator β̂M.

Let

$$S_M^{(j)}(t;\beta,\alpha) = n^{-1}\sum_{i=1}^{n}\xi_i(t)\,Z_i(t)^{\otimes j}\exp\{(\beta+\alpha)^T Z_i(t)\}, \quad \text{for } j = 0, 1, 2,$$

and $\bar Z_M(t;\beta,\alpha) = S_M^{(1)}(t;\beta,\alpha)/S_M^{(0)}(t;\beta,\alpha)$. The joint estimating equations of Hu et al. (2003) for $(\beta, \alpha)$ using data on $[t_1, t_2] \subset [0, \tau]$ are

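$$U_{1M}(\beta,\alpha) = \sum_{i=1}^{n}\int_{t_1}^{t_2} W(s)\left(Z_i(s) - \bar Z_M(s;\beta,\alpha)\right)Y_i(s)\,dN_i(s),$$
$$U_{2M}(\alpha) = \sum_{i=1}^{n}\int_{t_1}^{t_2} W_O(s)\left(Z_i(s) - \bar Z_M(s;0,\alpha)\right)dN_i(s),$$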

where W(t) and WO(t) are the weight processes. Hu et al. (2003) have focused the investigation on the unit weights of W(t) = 1 and WO(t) = 1. Since WO(t) = 1 is the optimal weight for estimating α, we let WO(t) = 1.

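Let $\bar z_M(t;\beta,\alpha)$ be the limit of $\bar Z_M(t;\beta,\alpha)$ in probability. Let $M_i^O(t) = \int_{t_1}^{t}\xi_i(s)\left[dN_i(s) - \exp(\alpha^T Z_i(s))\lambda_0(s)\,ds\right]$, $M_i^M(t) = \int_{t_1}^{t}\xi_i(s)\left[Y_i(s)\,dN_i(s) - \mu_0(s)\exp((\beta+\alpha)^T Z_i(s))\lambda_0(s)\,ds\right]$ and $\Delta = E\left\{\int_{t_1}^{t_2}\xi_i(s)\,w(s)\left(Z_i(s) - \bar z_M(s;\beta,\alpha)\right)\mu_0(s)\exp\{\beta^T Z_i(s)\}\,dM_i^O(s)\right\}^{\otimes 2}$. Let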

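$$\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} = E\left\{\begin{bmatrix} \int_{t_1}^{t_2}\xi_i(s)\,w(s)\left(Z_i(s) - \bar z_M(s;\beta,\alpha)\right)dM_i^M(s) \\ \int_{t_1}^{t_2}\xi_i(s)\left(Z_i(s) - \bar z_M(s;0,\alpha)\right)dM_i^O(s) \end{bmatrix}^{\otimes 2}\right\} \tag{15}$$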

where $B_{11}$ is the variance matrix of the top term in the square bracket, $B_{12}$ is the covariance matrix of the top term and the bottom term, $B_{21} = B_{12}^T$ and $B_{22}$ is the variance matrix of the bottom term in the square bracket.

The following theorem shows that our estimator β̂ is more efficient than the HSW estimator β̂M for β.

Theorem 3

Under the assumption that the observation process $N_i^*(t)$ follows the conditional proportional mean rate model (14), the asymptotic variance of the HSW estimator $\hat\beta_M$ for $\beta$ is $A^{-1}\Sigma A^{-1} + D$, where $D = A^{-1}\Delta A^{-1} - A^{-1}B_{12}A_\alpha^{-1} - (A^{-1}B_{12}A_\alpha^{-1})^T + A_\alpha^{-1}B_{22}A_\alpha^{-1}$ is positive semidefinite and $A_\alpha$ is the usual information matrix associated with the proportional mean rate model defined in Condition (e) of Lin et al. (2000), which can be obtained by letting $w(t) = 1$, $\mu_0(t) = 1$ and $\beta = 0$ in $A$.

Hu et al. (2003) have taken [t1, t2] to be [0, τ] while we take [t1, t2] to be a subinterval of [0, τ] to avoid dealing with boundary problems. This small trimming of the data does not result in much efficiency loss in analyzing the data. In fact, t1 and t2 can be chosen arbitrarily close to 0 and τ, respectively.

When the observation process is a counting process, $M_i^O(t)$ is a martingale. It follows that $B_{22} = A_\alpha$ and

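$$\begin{aligned}
B_{12} &= E\left\{\left[\int_{t_1}^{t_2}\xi_i(s)\,w(s)\left(Z_i(s) - \bar z_M(s;\beta,\alpha)\right)\mu_0(s)\exp(\beta^T Z_i(s))\,dM_i^O(s)\right] \times \left[\int_{t_1}^{t_2}\xi_i(s)\left(Z_i(s) - \bar z_M(s;0,\alpha)\right)dM_i^O(s)\right]^T\right\} \\
&= E\left\{\int_{t_1}^{t_2}\xi_i(s)\,w(s)\left(Z_i(s) - \bar z_M(s;\beta,\alpha)\right)\left(Z_i(s) - \bar z_M(s;0,\alpha)\right)^T \mu_0(s)\lambda_0(s)\exp((\beta+\alpha)^T Z_i(s))\,ds\right\} \\
&= E\left\{\int_{t_1}^{t_2}\xi_i(s)\,w(s)\left(Z_i(s) - \bar z_M(s;\beta,\alpha)\right)^{\otimes 2}\mu_0(s)\lambda_0(s)\exp((\beta+\alpha)^T Z_i(s))\,ds\right\} \\
&\quad + E\left\{\int_{t_1}^{t_2}\xi_i(s)\,w(s)\left(Z_i(s) - \bar z_M(s;\beta,\alpha)\right)\left(\bar z_M(s;\beta,\alpha) - \bar z_M(s;0,\alpha)\right)^T \mu_0(s)\lambda_0(s)\exp((\beta+\alpha)^T Z_i(s))\,ds\right\} \\
&= E\left\{\int_{t_1}^{t_2}\xi_i(s)\,w(s)\left(Z_i(s) - \bar z_M(s;\beta,\alpha)\right)^{\otimes 2}\mu_0(s)\lambda_0(s)\exp((\beta+\alpha)^T Z_i(s))\,ds\right\} = A.
\end{aligned}$$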

By Theorem 3, the asymptotic variance of the HSW estimator $\hat\beta_M$ is $A^{-1}(\Sigma + \Delta)A^{-1} - A_\alpha^{-1}$. One can also see from the proof of Theorem 3 that $\hat\beta_M$ and $\hat\alpha_M$ are asymptotically independent. In this case, the matrix $A^{-1}\Delta A^{-1} - A_\alpha^{-1}$ is positive semidefinite. This expression shows that, as expected, the variance of the estimator $\hat\beta_M$ is reduced by modeling the observation process with the proportional mean rate model. However, the proposed estimator gains more efficiency by subtracting the mean of $Y_i(t)$ in the estimation function $U(\beta)$ in (6).

3 A simulation study

Some numerical simulation results are presented in this section to illustrate the feasibility and validity of the proposed methods. The responses are generated from the following model

$$Y_i(t) = \mu_0(t)\exp(\beta_1 Z_{1i} + \beta_2 Z_{2i}) + \epsilon_i(t), \quad i = 1, \ldots, n, \tag{16}$$

over the time interval $[0, \tau]$, where $\beta_1 = 1$, $\beta_2 = -0.5$ and $(Z_{1i}, Z_{2i})$ are independent and identically distributed. We take $\tau = 24$. The covariate $Z_{1i}$ has a Bernoulli distribution with $P(Z_{1i} = 1) = 0.4$ and $Z_{2i}$ is uniformly distributed on (0, 1). We consider two baseline functions, $\mu_0(t) = 0.2 + 0.5t$ and $\mu_0(t) = 0.05t^2$. Conditional on the ith subject, $\epsilon_i(t)$ has a normal distribution with mean $\phi_i$ and variance $\sigma_e^2 = 1$, and $\phi_i$ is normal with mean zero and variance $\sigma_\phi^2 = 1$. The counting process $N_i^*(t)$ is set to be a Poisson process with intensity rate $\alpha_i(t)$ over the interval [0, 24]. We consider two models for $\alpha_i(t)$. One is the proportional mean rate model with $\alpha_i(t) = 0.4\exp(0.3Z_{1i} + 0.6Z_{2i})$. The other is the additive mean rate model with $\alpha_i(t) = 0.48Z_{1i} + 0.96Z_{2i}$. We generate the censoring random variable $C_i$ from the uniform distribution on (0, 60). With this censoring distribution, about 36% of subjects have censored observations under both the proportional mean rate model and the additive mean rate model. The average number of uncensored observations per subject is about 12 under the proportional mean rate model and 13 under the additive mean rate model. Trimming away a small number of boundary points, we take $[t_1, t_2] = [1, 23]$. For simplicity, we take the weight function $W(t) = 1$ and use the Epanechnikov kernel $K(t) = 0.75(1 - t^2)_+$.
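The simulation design above can be sketched in a few lines of Python. This is an illustrative data generator, assuming NumPy, for the linear baseline 0.2 + 0.5t only; the function name `simulate` and the pooled-array output format are choices made here. It relies on the standard fact that, given the number of events, the event times of a homogeneous Poisson process on an interval are distributed as ordered uniform draws on that interval.

```python
import numpy as np

def simulate(n, rng, additive=False, tau=24.0):
    """Generate pooled panel data following the design of model (16)."""
    beta = np.array([1.0, -0.5])
    times, ys, Zs, subj = [], [], [], []
    for i in range(n):
        z = np.array([rng.binomial(1, 0.4), rng.uniform()])
        if additive:
            rate = 0.48 * z[0] + 0.96 * z[1]
        else:
            rate = 0.4 * np.exp(0.3 * z[0] + 0.6 * z[1])
        end = min(rng.uniform(0.0, 60.0), tau)   # follow-up ends at C_i or tau
        m = rng.poisson(rate * end)              # number of visits in [0, end]
        t = np.sort(rng.uniform(0.0, end, m))    # visit times given the count
        phi = rng.normal()                       # subject-level random effect
        eps = phi + rng.normal(size=m)           # errors with mean phi, variance 1
        y = (0.2 + 0.5 * t) * np.exp(z @ beta) + eps
        times.append(t)
        ys.append(y)
        Zs.append(np.repeat(z[None, :], m, axis=0))
        subj.append(np.full(m, i))
    return (np.concatenate(times), np.concatenate(ys),
            np.concatenate(Zs), np.concatenate(subj))
```

For example, `simulate(100, np.random.default_rng(0))` returns the pooled sampling times, responses, covariates and subject labels for one simulated data set; subjects with no visits simply contribute no rows.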

Table 1 summarizes some simulation results from the estimators of β1 and β2 for n = 100, 200, 300 under the proportional mean rate sampling model αi (t) = 0.4 exp(0.3Z1i + 0.6Z2i ) and for two different baseline functions. The bandwidths b = 1.5, 2.0 and 2.5 are used for the kernel smoothing. The simulation results based on the method of Hu et al. (2003) are indicated with HSW under the bandwidth column. The method of Hu et al. (2003) is developed under the proportional mean rate sampling model. In Table 2 we also list its simulation results under the misspecified additive mean rate sampling model αi (t) = 0.48Z1i + 0.96Z2i, to show its sensitivity to the model specification and to demonstrate the robustness of the proposed estimator. Each entry in Tables 1 and 2 is based on 1000 repetitions (samples), where, for k = 1, 2, under b = 1.5, 2.0 and 2.5, Bias(βk) is the average of the estimation bias of β̂k; SSE(βk) is the sampling standard error of β̂k; ESE(βk) is the estimated standard error of β̂k; CP(βk) is the coverage probability of the 95% confidence interval for βk using β̂k; and under b = HSW, they are the corresponding summary statistics for the estimators of Hu et al. (2003).

Table 1.

Summary statistics for the estimators β̂ and β̂HSW under the proportional mean sampling model αi (t) = 0.4 exp(0.3Z1i + 0.6Z2i)

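| μ0(t) | n | b | Bias(β1) | Bias(β2) | SSE(β1) | SSE(β2) | ESE(β1) | ESE(β2) | CP(β1) | CP(β2) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.2 + 0.5t | 100 | 1.5 | −0.0001 | −0.0008 | 0.0391 | 0.0539 | 0.0370 | 0.0513 | 0.933 | 0.937 |
| 0.2 + 0.5t | 100 | 2.0 | −0.0003 | −0.0000 | 0.0369 | 0.0532 | 0.0369 | 0.0510 | 0.947 | 0.935 |
| 0.2 + 0.5t | 100 | 2.5 | 0.0025 | 0.0011 | 0.0377 | 0.0542 | 0.0369 | 0.0507 | 0.944 | 0.923 |
| 0.2 + 0.5t | 100 | HSW | −0.0010 | 0.0018 | 0.0548 | 0.1026 | 0.0522 | 0.0976 | 0.940 | 0.920 |
| 0.2 + 0.5t | 200 | 1.5 | −0.0004 | −0.0003 | 0.0264 | 0.0366 | 0.0263 | 0.0364 | 0.946 | 0.947 |
| 0.2 + 0.5t | 200 | 2.0 | 0.0013 | 0.0009 | 0.0268 | 0.0379 | 0.0263 | 0.0361 | 0.936 | 0.932 |
| 0.2 + 0.5t | 200 | 2.5 | 0.0010 | 0.0003 | 0.0264 | 0.0372 | 0.0263 | 0.0360 | 0.949 | 0.935 |
| 0.2 + 0.5t | 200 | HSW | −0.0009 | 0.0015 | 0.0377 | 0.0713 | 0.0369 | 0.0694 | 0.942 | 0.930 |
| 0.2 + 0.5t | 300 | 1.5 | 0.0004 | 0.0005 | 0.0222 | 0.0306 | 0.0216 | 0.0297 | 0.940 | 0.952 |
| 0.2 + 0.5t | 300 | 2.0 | 0.0007 | 0.0002 | 0.0210 | 0.0301 | 0.0216 | 0.0296 | 0.960 | 0.946 |
| 0.2 + 0.5t | 300 | 2.5 | 0.0002 | −0.0001 | 0.0215 | 0.0294 | 0.0216 | 0.0296 | 0.948 | 0.952 |
| 0.2 + 0.5t | 300 | HSW | −0.0001 | 0.0027 | 0.0309 | 0.0570 | 0.0301 | 0.0568 | 0.943 | 0.952 |
| 0.05t² | 100 | 1.5 | −0.0001 | −0.0005 | 0.0276 | 0.0381 | 0.0262 | 0.0364 | 0.932 | 0.939 |
| 0.05t² | 100 | 2.0 | −0.0003 | −0.0000 | 0.0261 | 0.0378 | 0.0261 | 0.0362 | 0.947 | 0.936 |
| 0.05t² | 100 | 2.5 | 0.0018 | 0.0008 | 0.0267 | 0.0385 | 0.0261 | 0.0360 | 0.942 | 0.925 |
| 0.05t² | 100 | HSW | −0.0015 | 0.0032 | 0.0662 | 0.1284 | 0.0631 | 0.1230 | 0.939 | 0.928 |
| 0.05t² | 200 | 1.5 | −0.0003 | −0.0002 | 0.0187 | 0.0258 | 0.0186 | 0.0257 | 0.945 | 0.948 |
| 0.05t² | 200 | 2.0 | 0.0009 | 0.0006 | 0.0189 | 0.0268 | 0.0186 | 0.0256 | 0.938 | 0.933 |
| 0.05t² | 200 | 2.5 | 0.0006 | 0.0002 | 0.0187 | 0.0264 | 0.0186 | 0.0255 | 0.949 | 0.932 |
| 0.05t² | 200 | HSW | −0.0011 | 0.0025 | 0.0462 | 0.0904 | 0.0447 | 0.0880 | 0.938 | 0.934 |
| 0.05t² | 300 | 1.5 | 0.0003 | 0.0003 | 0.0156 | 0.0216 | 0.0152 | 0.0209 | 0.940 | 0.951 |
| 0.05t² | 300 | 2.0 | 0.0004 | 0.0001 | 0.0149 | 0.0213 | 0.0152 | 0.0209 | 0.960 | 0.946 |
| 0.05t² | 300 | 2.5 | 0.0001 | −0.0001 | 0.0151 | 0.0208 | 0.0152 | 0.0209 | 0.949 | 0.950 |
| 0.05t² | 300 | HSW | −0.0006 | −0.0036 | 0.0375 | 0.0708 | 0.0365 | 0.0723 | 0.938 | 0.947 |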

Table 2.

Summary statistics for the estimators β̂ and β̂HSW under the additive mean sampling model with αi (t) = 0.48Z1i + 0.96Z2i.

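| μ0(t) | n | b | Bias(β1) | Bias(β2) | SSE(β1) | SSE(β2) | ESE(β1) | ESE(β2) | CP(β1) | CP(β2) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.2 + 0.5t | 100 | 1.5 | 0.0012 | −0.0009 | 0.0446 | 0.0526 | 0.0431 | 0.0509 | 0.938 | 0.933 |
| 0.2 + 0.5t | 100 | 2.0 | 0.0005 | −0.0016 | 0.0443 | 0.0544 | 0.0428 | 0.0517 | 0.940 | 0.929 |
| 0.2 + 0.5t | 100 | 2.5 | 0.0009 | −0.0027 | 0.0433 | 0.0520 | 0.0427 | 0.0512 | 0.940 | 0.947 |
| 0.2 + 0.5t | 100 | HSW | 0.048 | −0.1956 | 0.0615 | 0.0984 | 0.0592 | 0.0955 | 0.861 | 0.490 |
| 0.2 + 0.5t | 200 | 1.5 | 0.0006 | −0.0008 | 0.0308 | 0.0379 | 0.0307 | 0.0366 | 0.938 | 0.938 |
| 0.2 + 0.5t | 200 | 2.0 | 0.0007 | −0.0014 | 0.0304 | 0.0366 | 0.0305 | 0.0366 | 0.951 | 0.946 |
| 0.2 + 0.5t | 200 | 2.5 | 0.0002 | 0.0018 | 0.0304 | 0.0382 | 0.0305 | 0.0366 | 0.948 | 0.932 |
| 0.2 + 0.5t | 200 | HSW | 0.0459 | −0.1962 | 0.0414 | 0.0676 | 0.0418 | 0.0683 | 0.828 | 0.177 |
| 0.2 + 0.5t | 300 | 1.5 | 0.0004 | −0.0017 | 0.0248 | 0.0295 | 0.0251 | 0.0300 | 0.954 | 0.957 |
| 0.2 + 0.5t | 300 | 2.0 | 0.0003 | 0.0013 | 0.0243 | 0.0308 | 0.0250 | 0.0300 | 0.946 | 0.936 |
| 0.2 + 0.5t | 300 | 2.5 | −0.0001 | 0.0006 | 0.0251 | 0.0307 | 0.0251 | 0.0298 | 0.952 | 0.929 |
| 0.2 + 0.5t | 300 | HSW | 0.0459 | −0.1959 | 0.0331 | 0.0549 | 0.0341 | 0.0559 | 0.750 | 0.055 |
| 0.05t² | 100 | 1.5 | 0.0008 | −0.0004 | 0.0316 | 0.0374 | 0.0305 | 0.0361 | 0.937 | 0.934 |
| 0.05t² | 100 | 2.0 | 0.0003 | −0.0010 | 0.0313 | 0.0387 | 0.0302 | 0.0367 | 0.940 | 0.932 |
| 0.05t² | 100 | 2.5 | 0.0005 | −0.0016 | 0.0306 | 0.0370 | 0.0301 | 0.0364 | 0.942 | 0.951 |
| 0.05t² | 100 | HSW | 0.0485 | −0.1939 | 0.0692 | 0.1233 | 0.0668 | 0.1201 | 0.884 | 0.640 |
| 0.05t² | 200 | 1.5 | 0.0005 | −0.0005 | 0.0217 | 0.0268 | 0.0217 | 0.0259 | 0.939 | 0.938 |
| 0.05t² | 200 | 2.0 | 0.0005 | −0.0009 | 0.0214 | 0.0259 | 0.0215 | 0.0259 | 0.952 | 0.949 |
| 0.05t² | 200 | 2.5 | 0.0002 | 0.0014 | 0.0215 | 0.0271 | 0.0216 | 0.0259 | 0.946 | 0.934 |
| 0.05t² | 200 | HSW | 0.0463 | −0.1951 | 0.0466 | 0.0856 | 0.0471 | 0.0860 | 0.854 | 0.392 |
| 0.05t² | 300 | 1.5 | 0.0003 | −0.0011 | 0.0176 | 0.0208 | 0.0177 | 0.0212 | 0.954 | 0.959 |
| 0.05t² | 300 | 2.0 | 0.0003 | 0.0011 | 0.0172 | 0.0218 | 0.0177 | 0.0212 | 0.947 | 0.937 |
| 0.05t² | 300 | 2.5 | −0.0001 | 0.0005 | 0.0177 | 0.0216 | 0.0177 | 0.0210 | 0.952 | 0.931 |
| 0.05t² | 300 | HSW | 0.0465 | −0.1933 | 0.0380 | 0.0706 | 0.0384 | 0.0706 | 0.773 | 0.223 |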

The biases for the proposed estimators are generally small and the coverage probabilities are close to 95%, indicating the appropriateness of the proposed estimation procedures for β. Under the proportional mean rate model, our estimator outperforms that of Hu et al. (2003) in terms of bias and standard error; see Table 1. When μ0(t) = 0.2 + 0.5t, the ratios of the standard errors of the proposed estimator to those of the HSW estimator are around 0.72 for β1 and 0.53 for β2. When μ0(t) = 0.05t², the ratios of the standard errors of the proposed estimator to those of the HSW estimator are around 0.42 for β1 and 0.30 for β2. Our estimators perform similarly well under the additive mean rate model, while the estimators of Hu et al. (2003) break down, with substantial bias and poor coverage; see Table 2. These numerical results are consistent with the large sample results derived in Sect. 2.
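The quoted ratios can be reproduced from Table 1 with simple arithmetic. The short Python sketch below assumes that the ratios are formed from the estimated standard errors (ESE) at bandwidth b = 1.5, which is an assumption made here because those columns give values close to the stated figures; the dictionary layout and the function name are illustrative only.

```python
# ESE(beta1), ESE(beta2) copied from Table 1: proposed method at b = 1.5
# versus the HSW rows, for each baseline function and sample size.
ese = {
    ("linear", 100): {"proposed": (0.0370, 0.0513), "hsw": (0.0522, 0.0976)},
    ("linear", 200): {"proposed": (0.0263, 0.0364), "hsw": (0.0369, 0.0694)},
    ("linear", 300): {"proposed": (0.0216, 0.0297), "hsw": (0.0301, 0.0568)},
    ("quadratic", 100): {"proposed": (0.0262, 0.0364), "hsw": (0.0631, 0.1230)},
    ("quadratic", 200): {"proposed": (0.0186, 0.0257), "hsw": (0.0447, 0.0880)},
    ("quadratic", 300): {"proposed": (0.0152, 0.0209), "hsw": (0.0365, 0.0723)},
}

def ratios(key):
    """Ratio of proposed to HSW standard errors for (beta1, beta2)."""
    row = ese[key]
    return tuple(p / h for p, h in zip(row["proposed"], row["hsw"]))

for key in ese:
    r1, r2 = ratios(key)
    print(key, round(r1, 2), round(r2, 2))
```

Here "linear" stands for the baseline 0.2 + 0.5t and "quadratic" for 0.05t². Squaring a ratio gives the corresponding ratio of variances, which is the scale on which the efficiency comparison of Sect. 2.3 is stated.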

The biases for the baseline function estimator μ̂0(t), and the lengths and coverage probabilities of the 95% confidence intervals for μ0(t) for n = 100 under the proportional mean rate sampling model αi (t) = 0.4 exp(0.3Z1i + 0.6Z2i) and the additive mean rate sampling model αi (t) = 0.48Z1i + 0.96Z2i at a number of grid points are given in Table 3. The perturbation method described in Sect. 2.2 is used to obtain the lengths and the coverage probabilities. The first numbers in parentheses are biases; the second and third numbers are coverage probabilities and lengths of 95% confidence intervals for μ0(t). Each entry in Table 3 is evaluated using 500 repetitions and 500 perturbation samples. We find in a simulation not presented here that the perturbation method produces better results than bootstrap resampling at the subject level. Perhaps this is due to the variability in the number of observations across subjects. Table 3 shows that the biases are generally small and the coverage probabilities are close to the 95% nominal level. The lengths of the 95% confidence intervals increase with t due to the increased variation in the estimator μ̂0(t).

Table 3.

Summary statistics for the estimator μ̂0(t) under the proportional mean sampling model (Cox) αi (t) = 0.4 exp(0.3Z1i + 0.6Z2i) and the additive mean sampling model (Aalen) αi (t) = 0.48Z1i + 0.96Z2i for n = 100. The first number in parentheses is the bias; the second and third numbers are the coverage probability (%) and length of a 95% confidence interval for μ0(t)

| μi(t) | αi(t) | b | t = 7 | t = 10 | t = 13 | t = 16 | t = 19 |
|---|---|---|---|---|---|---|---|
| (0.2 + 0.5t) exp(Z1i − 0.5Z2i) | Cox | 1.5 | (−0.0009, 92.4, 0.683) | (0.0020, 94.4, 0.856) | (0.0016, 94.0, 1.037) | (−0.0013, 94.2, 1.228) | (0.0016, 93.4, 1.425) |
| (0.2 + 0.5t) exp(Z1i − 0.5Z2i) | Cox | 2.0 | (−0.0036, 92.6, 0.668) | (−0.0012, 94.4, 0.843) | (−0.0026, 93.2, 1.026) | (−0.0054, 94.4, 1.217) | (−0.0037, 93.8, 1.416) |
| (0.2 + 0.5t) exp(Z1i − 0.5Z2i) | Cox | 2.5 | (−0.0135, 93.2, 0.657) | (−0.0008, 95.0, 0.836) | (−0.0099, 94.8, 1.021) | (−0.0048, 95.8, 1.220) | (−0.0159, 94.8, 1.414) |
| (0.2 + 0.5t) exp(Z1i − 0.5Z2i) | Aalen | 1.5 | (−0.0097, 95.0, 0.748) | (−0.0070, 94.4, 0.979) | (−0.0058, 94.6, 1.217) | (−0.0051, 92.2, 1.465) | (−0.0069, 93.6, 1.713) |
| (0.2 + 0.5t) exp(Z1i − 0.5Z2i) | Aalen | 2.0 | (−0.0120, 94.2, 0.736) | (−0.0103, 94.0, 0.970) | (−0.0079, 94.8, 1.209) | (−0.0095, 92.2, 1.459) | (−0.0104, 93.8, 1.711) |
| (0.2 + 0.5t) exp(Z1i − 0.5Z2i) | Aalen | 2.5 | (−0.0092, 92.8, 0.734) | (−0.0113, 92.4, 0.962) | (−0.0191, 93.2, 1.199) | (−0.0211, 92.0, 1.446) | (−0.0212, 93.2, 1.700) |
| 0.05t² exp(Z1i − 0.5Z2i) | Cox | 1.5 | (0.0191, 92.2, 0.526) | (0.0201, 92.4, 0.706) | (0.0158, 93.2, 0.992) | (0.0085, 95.2, 1.387) | (0.0191, 93.8, 1.881) |
| 0.05t² exp(Z1i − 0.5Z2i) | Cox | 2.0 | (0.0327, 91.8, 0.510) | (0.0307, 92.8, 0.699) | (0.0232, 92.6, 0.993) | (0.0138, 95.2, 1.395) | (0.0138, 92.8, 1.895) |
| 0.05t² exp(Z1i − 0.5Z2i) | Cox | 2.5 | (0.0427, 93.0, 0.502) | (0.0512, 94.6, 0.704) | (0.0322, 94.4, 1.010) | (0.0261, 95.6, 1.423) | (0.0003, 93.2, 1.927) |
| 0.05t² exp(Z1i − 0.5Z2i) | Aalen | 1.5 | (0.0115, 93.6, 0.526) | (0.0096, 94.0, 0.763) | (0.0059, 94.2, 1.142) | (0.0008, 92.8, 1.657) | (0.0006, 94.8, 2.290) |
| 0.05t² exp(Z1i − 0.5Z2i) | Aalen | 2.0 | (0.0252, 94.0, 0.513) | (0.0194, 94.2, 0.761) | (0.0162, 94.2, 1.145) | (0.0039, 94.4, 1.666) | (0.0020, 94.4, 2.305) |
| 0.05t² exp(Z1i − 0.5Z2i) | Aalen | 2.5 | (0.0503, 93.0, 0.513) | (0.0410, 92.4, 0.763) | (0.0173, 93.6, 1.144) | (0.0056, 93.2, 1.661) | (−0.0131, 93.4, 2.301) |

4 An Application

In this section, we apply the proposed method to analyze a data set from the bladder cancer study conducted by the Veterans Administration Cooperative Urological Research Group of the USA over a four-year period. All 121 subjects who entered the trial had superficial bladder tumors (Byar 1980). These tumors were removed transurethrally and the subjects were then randomly allocated to one of three treatments: placebo, thiotepa and pyridoxine. There were 47 subjects in the placebo group, 38 in the thiotepa group and the rest in the pyridoxine group. Identical tablets were given daily by mouth to the subjects in the placebo and pyridoxine groups, while for the subjects in the thiotepa group, thiotepa was instilled into the bladder for 2 hours once a week for 4 weeks and once a month thereafter. Many subjects had multiple new tumors during the study. The new tumors were removed at the clinical visits of the subjects. One of the study objectives was to evaluate the effectiveness of thiotepa by comparing the placebo and the thiotepa groups in tumor accumulation. Thus the number of subjects of interest is n = 85 and the follow-up time is τ = 48 months.

Let Yi(t) be the number of the accumulated new tumors for subject i by time t, t ∈ [0, τ]. The process Yi (·) is only observable at the subject's finite number of clinical visits. The times of clinical visits varied among individuals and it has been noticed that the thiotepa group tended to visit the clinics more frequently than the placebo group. Let Ni (·) be the counting process of the visiting times for subject i over the time period [0, τ]. Let Zi be a 3-D vector with the first component indicating whether the subject was in thiotepa group, and the second and third component being the number of tumors observed at the beginning of the study and the size of the largest initial tumors, respectively. The average number of tumors observed at the beginning of the study is 1.936 for the placebo group and 2.316 for the thiotepa group. The average size of the largest initial tumors is 2.085 for the placebo group and 1.921 for the thiotepa group. The exact censoring time Ci for subject i is not available and is taken to be the subject's last visit time. The bladder cancer data set is published in Hu et al. (2003). The plots of the new tumor accumulations against entry time are given in Fig. 1(a) for subjects in the placebo group and in Fig. 1(b) for subjects in the thiotepa group.

Fig. 1.

The plots of the new tumor accumulations against entry time; (a) for the placebo group and (b) for the thiotepa group

The bladder cancer data have been analyzed by Hu et al. (2003) under the models (1) and (14), where they obtained the estimate β̂M = (−1.482, 0.285, −0.083) of β with standard errors (0.329, 0.062, 0.105). To apply the proposed robust method, we first choose a bandwidth using the leave-one-subject-out cross-validation method described in Sect. 2. The plot of the mean squares of fitted residuals PE(b) defined in (10) against different bandwidths is given in Fig. 2. The minimal value of PE(b) is attained at the bandwidth b = 9.0.
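The leave-one-subject-out idea can be sketched as follows. This is a simplified illustration, not the criterion PE(b) of (10): it ignores covariates, replaces the model fit with a plain kernel-weighted average, and assumes an Epanechnikov kernel. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)


def loso_prediction_error(times, values, subject, b):
    """Mean squared prediction error when each subject is left out in turn."""
    sq_err, count = 0.0, 0
    for sid in np.unique(subject):
        held = subject == sid
        t_train, y_train = times[~held], values[~held]
        for t0, y0 in zip(times[held], values[held]):
            w = epanechnikov((t0 - t_train) / b)
            if w.sum() > 0:  # skip points with no training data in the window
                sq_err += (y0 - np.dot(w, y_train) / w.sum()) ** 2
                count += 1
    return sq_err / count if count else np.inf


def select_bandwidth(times, values, subject, grid):
    """Return the grid bandwidth with the smallest leave-one-subject-out error."""
    errors = [loso_prediction_error(times, values, subject, b) for b in grid]
    return grid[int(np.argmin(errors))], errors


# Synthetic example: 30 subjects with 6 irregular visit times each on [0, 48].
rng = np.random.default_rng(0)
subject = np.repeat(np.arange(30), 6)
times = rng.uniform(0.0, 48.0, size=subject.size)
values = 0.1 * times + rng.normal(0.0, 1.0, size=subject.size)
grid = [3.0, 6.0, 9.0, 12.0]
best_b, errors = select_bandwidth(times, values, subject, grid)
```

Leaving out whole subjects rather than single observations keeps correlated measurements from the same subject out of the training set.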

Fig. 2.

The plot of the mean squares of fitted residuals against different bandwidths

We trim a small part of the boundary points and take [t1, t2] = [1, 47]. With b = 9.0, our method yields β̂ = (−1.310, 0.248, −0.067) with standard errors (0.315, 0.062, 0.098). For this example, Hu et al. (2003) have shown that the proportional mean rate model (14) fits the clinical visit times well, and the two methods produce similar estimates. We have tried different bandwidths ranging from 3.0 to 14.0; the resulting estimates are very similar, differing only in the third decimal place. We also estimate the baseline function in the model (1) and its 95% pointwise confidence intervals; the plot is given in Fig. 3(a). The plots of mean accumulated new tumors for the placebo group and the thiotepa group, evaluated at the average number of tumors at the beginning of the study and the average size of the largest initial tumors, are given in Fig. 3(b), showing that the thiotepa group has far fewer accumulated new tumors on average. The residual plots of {ε̂i (t), i = 1, …, 85} versus entry time are given in Fig. 3(c).
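To make the reported coefficients easier to read, the sketch below turns each estimate and standard error into a Wald statistic, a 95% interval and the multiplicative effect exp(β). It assumes the normal approximation with critical value 1.96. The function name `wald_summary` is illustrative, and the inputs are the numbers quoted above.

```python
import math


def wald_summary(beta, se, z_crit=1.96):
    """Return (z statistic, 95% interval for beta, exp(beta)) per coefficient."""
    rows = []
    for b, s in zip(beta, se):
        rows.append((b / s, (b - z_crit * s, b + z_crit * s), math.exp(b)))
    return rows


# Order of coefficients: thiotepa indicator, initial tumor count, initial tumor size.
proposed = wald_summary((-1.310, 0.248, -0.067), (0.315, 0.062, 0.098))
hu_et_al = wald_summary((-1.482, 0.285, -0.083), (0.329, 0.062, 0.105))

for label, rows in (("proposed", proposed), ("Hu et al.", hu_et_al)):
    for z, (lo, hi), ratio in rows:
        print(f"{label}: z={z:.2f}, CI=({lo:.3f}, {hi:.3f}), exp(beta)={ratio:.3f}")
```

Under the proportional mean model, exp(β̂1) ≈ 0.27 for the proposed method reads as the thiotepa group having roughly 27% of the placebo group's mean accumulated tumor count at fixed initial number and size. The interval for the tumor-size coefficient covers zero under both methods.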

Fig. 3.

(a) The plot of the baseline function estimate μ̂0(t) and 95% pointwise confidence intervals with the bandwidth b = 9.0 months; (b) the plots of mean accumulated new tumors for the placebo group and the thiotepa group at the average number of tumors (2.106) at the beginning of the study and the average size of the largest initial tumors (2.012); (c) the residual plots of {ε̂i (t), i = 1, …, 85}

The plot shows that the distribution of ε̂i (t) for a given t is skewed to the right and is not symmetric around zero. The mean value of ε̂i (t) seems to be around zero. This is reasonable since the response process Yi (t) here is the accumulated new tumor count by time t, whose distribution exhibits a pattern similar to that of a Poisson distribution.

5 Discussion

The procedure proposed here has several advantages over the existing methods. It is more flexible: the covariate processes need not be observed at all times, which is a useful feature in practice when some of the covariates change with time. It is robust to the assumptions on the sampling model, since we do not need to specify a particular form for the conditional mean rate α(t, Zi (t)). This is an important improvement on, and generalization of, the existing methods. One does not need to know whether α(t, Zi (t)) follows the proportional model (Lin et al. 2000), the additive model (Scheike 2002), or the transformation model (Lin and Ying 2001). The proposed estimator is more efficient than the estimator of Cheng and Wei (2000), which assumes that α(t, Zi (t)) does not depend on Zi (t). It is also more efficient than the estimator of Hu et al. (2003), which assumes that α(t, Zi (t)) is proportional.

In situations where the response process is the number of recurrent events that have occurred by time t, the baseline function μ0(t) is expected to increase with time when the covariate Zi is time-independent. Although the proposed estimator μ̂0(t) is still a valid estimator of μ0(t), it is desirable to have an estimator that increases with time. Sun and Kalbfleisch (1995) proposed an estimator for the mean function of point processes of recurrent events based on isotonic regression, which is shown to be a pseudo-maximum likelihood estimator by Wellner and Zhang (2000). It would be interesting to construct an isotonic regression type estimator for μ0(t).
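For illustration only, one simple way to obtain a nondecreasing curve from values of μ̂0(t) on a time grid is the pool-adjacent-violators algorithm. This post-hoc monotonization is neither the estimator of Sun and Kalbfleisch (1995) nor a method proposed in this paper, and its properties are not studied here. The grid values below are hypothetical.

```python
import numpy as np


def pava_nondecreasing(y, weights=None):
    """Weighted least-squares nondecreasing fit via pool-adjacent-violators."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / wt, wt, n1 + n2])
    return np.concatenate([np.full(n, m) for m, _, n in blocks])


mu_hat = np.array([0.2, 0.5, 0.4, 0.9, 0.8, 1.3])  # hypothetical grid values
mu_iso = pava_nondecreasing(mu_hat)
```

In this toy input the two decreasing pairs are each replaced by their average, and the remaining values are left unchanged.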

6 Appendix

Large sample theory, such as presented by Van der Vaart (1998), together with an elegant technical lemma by Lin and Ying (2001), will be applied to prove the asymptotic results. We assume the following conditions throughout the paper:

Condition A. The censoring time $C_i$ is noninformative in the sense that $E\{dN_i(t)\mid Z_i(t), C_i \ge t\} = E\{dN_i(t)\mid Z_i(t)\}$ and $E\{Y_i(t)\mid Z_i(t), C_i \ge t\} = E\{Y_i(t)\mid Z_i(t)\}$; the censoring time $C_i$ is allowed to depend on the covariate process $Z_i(\cdot)$; the process $N_i(\cdot)$ is independent of $Y_i(\cdot)$ given $Z_i(\cdot)$; the processes $Y_i(t)$, $Z_i(t)$ and $\alpha_i(t)$, $0 \le t \le \tau$, are bounded and their total variations are bounded by a constant; $E|N_i(t_2) - N_i(t_1)|^2 \le L(t_2 - t_1)$ for $0 \le t_1 \le t_2 \le \tau$, where $L > 0$ is a constant; the weight function $W(t)$ can be written as a difference of two monotone functions, each of which converges in probability to a deterministic function, such that $W(t) \xrightarrow{P} w(t)$; the kernel function $K(\cdot)$ is symmetric with compact support on $[-1, 1]$ and bounded variation; $n \to \infty$, $nb^2 \to \infty$ and $nb^4 \to 0$; $s_y(t)$ and $s^{(k)}(t)$, $k = 0, 1, 2$, are twice differentiable; $(s^{(0)}(t))^{-1}$ is bounded over $0 \le t \le \tau$; and $A$ and $\Sigma$ are positive definite.

Condition B. $E|N_i(t + b) - N_i(t - b)|^{2+\nu} = O(b)$ for some $\nu > 0$; the limit $\sigma^2(t) = \lim_{n\to\infty} b\,E\big[\int_0^\tau K_b(t-s)\,\epsilon_i(s)\,dN_i(s)\big]^2$ exists and is finite, where $\epsilon_i(s) = Y_i(s) - \mu_0(s)\exp\{\beta^T Z_i(s)\}$.

We remark that, under the conditional independence of $N_i(\cdot)$ and $Y_i(\cdot)$ given $Z_i(\cdot)$ and assuming noninformative censoring $C_i$, Condition B holds if $N_i(\cdot)$ is a Poisson process and if $E\{(\epsilon_i(s))^2\xi_i(s)\alpha_i(s)\}$ is continuous in $s \in (t - b, t + b)$.

6.1 Technical lemmas

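In the rest of the section, we drop $\beta$ in the terms $\tilde S^{(k)}(t;\beta)$ and $\tilde Z(t;\beta)$ for ease of notation. Let $S^{(k)}(t) = n^{-1}\sum_{i=1}^n \xi_i(t)\alpha_i(t)Z_i^{(k)}(t)\exp\{\beta^T Z_i(t)\}$, $k = 0, 1, 2$. Let $\bar Z(t) = S^{(1)}(t)/S^{(0)}(t)$. Let $M_i(t) = N_i(t) - \int_0^t \xi_i(s)\alpha_i(s)\,ds$. The process $M_i(t)$ is a mean-zero process.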

Lemma 1

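Under Condition A, $\tilde S^{(k)}(t) \xrightarrow{P} s^{(k)}(t)$ for $k = 0, 1, 2$, $\tilde Z(t) \xrightarrow{P} z(t)$ and $\tilde\mu_0(t;\beta) \xrightarrow{P} \mu_0(t)$, uniformly in $t \in [t_1, t_2]$ and $\beta \in \mathcal N(\beta_0)$, a neighborhood of $\beta_0$.

Proof Similar to the proof given in the Appendix of Gilbert and Sun (2005), under Condition A, by Theorems 19.4 and 19.5 of Van der Vaart (1998), we have $n^{-1}\sum_{i=1}^n\int_0^t Z_i^{(k)}(u)\exp\{\beta^T Z_i(u)\}\,dN_i(u) \xrightarrow{P} s^{(k)}(t)$ for $k = 0, 1, 2$, and $n^{-1}\sum_{i=1}^n\int_0^t Y_i(u)\,dN_i(u) \xrightarrow{P} s_y(t)$, uniformly in $(t, \beta) \in [t_1, t_2] \times \mathcal N(\beta_0)$. Since $K(\cdot)$ has bounded variation, $nb^2 \to \infty$ and $nb^4 \to 0$, it follows by integration by parts that $\tilde S^{(k)}(t) \xrightarrow{P} s^{(k)}(t)$ and $\tilde S_y(t) \xrightarrow{P} s_y(t)$, uniformly in $(t, \beta) \in [t_1, t_2] \times \mathcal N(\beta_0)$. Thus, $\tilde Z(t) \xrightarrow{P} z(t)$ and $\tilde\mu_0(t;\beta) \xrightarrow{P} \mu_0(t)$, uniformly in $(t, \beta) \in [t_1, t_2] \times \mathcal N(\beta_0)$. □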

Lemma 2

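Let $\eta(t) = (s^{(0)}(t))^{-1}\big[(s_y(t))'' - \mu_0(t)(s^{(0)}(t))''\big]$.

(a) Under Conditions A and B, $(nb)^{1/2}\big(\tilde\mu_0(t;\beta) - \mu_0(t) - \tfrac12 b^2\mu_2\eta(t)\big) \xrightarrow{\mathcal D} N\big(0, (s^{(0)}(t))^{-2}\sigma^2(t)\big)$;

(b) Under Condition A, $\int_{t_1}^{t} n^{1/2}\big(\tilde\mu_0(s;\beta) - \mu_0(s)\big)\,ds$ converges weakly to a mean-zero Gaussian process on $[t_1, t_2]$, and $n^{-1/2}\sum_{i=1}^n\int_{t_1}^{t_2} W(s)\big(Z_i(s) - \tilde Z(s)\big)\exp\{\beta^T Z_i(s)\}\,dM_i(s)$ converges weakly to a mean-zero normal distribution.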

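Proof We begin with the proof of assertion (a). Since $\mu_0(t) = s_y(t)/s^{(0)}(t)$, we have

$$\begin{aligned}
n^{1/2}\big(\tilde\mu_0(t;\beta) - \mu_0(t)\big) &= n^{1/2}\left(\frac{\tilde S_y(t)}{\tilde S^{(0)}(t)} - \frac{s_y(t)}{s^{(0)}(t)}\right)\\
&= \big(\tilde S^{(0)}(t)\big)^{-1} n^{1/2}\big(\tilde S_y(t) - s_y(t)\big) - s_y(t)\big(\tilde S^{(0)}(t)\,s^{(0)}(t)\big)^{-1} n^{1/2}\big(\tilde S^{(0)}(t) - s^{(0)}(t)\big).
\end{aligned}\tag{17}$$

Note that $\int_0^\tau K_b(t-s)\,s_y(s)\,ds = s_y(t) + \tfrac12 b^2\mu_2 (s_y(t))'' + o(b^2)$ and $\int_0^\tau K_b(t-s)\,s^{(0)}(s)\,ds = s^{(0)}(t) + \tfrac12 b^2\mu_2 (s^{(0)}(t))'' + o(b^2)$, uniformly in $t \in [t_1, t_2]$. Let $S_y(t) = n^{-1}\sum_{i=1}^n \xi_i(t)\alpha_i(t)Y_i(t)$. We have

$$\begin{aligned}
n^{1/2}\big(\tilde S_y(t) - s_y(t)\big) ={}& \int_0^\tau K_b(t-s)\,n^{1/2}\big(S_y(s) - s_y(s)\big)\,ds + \int_0^\tau K_b(t-s)\,n^{-1/2}\sum_{i=1}^n Y_i(s)\,dM_i(s)\\
&+ \tfrac12 n^{1/2} b^2 \mu_2 (s_y(t))'' + o(n^{1/2} b^2),
\end{aligned}\tag{18}$$

and

$$\begin{aligned}
n^{1/2}\big(\tilde S^{(0)}(t) - s^{(0)}(t)\big) ={}& \int_0^\tau K_b(t-s)\,n^{1/2}\big(S^{(0)}(s) - s^{(0)}(s)\big)\,ds + \int_0^\tau K_b(t-s)\,n^{-1/2}\sum_{i=1}^n \exp\{\beta^T Z_i(s)\}\,dM_i(s)\\
&+ \tfrac12 n^{1/2} b^2 \mu_2 (s^{(0)}(t))'' + o(n^{1/2} b^2).
\end{aligned}\tag{19}$$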
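The second-order kernel expansion stated before (18) and (19) can be illustrated numerically. The sketch below assumes an Epanechnikov kernel, reads μ2 as the kernel's second moment ∫u²K(u)du and K_b(x) as K(x/b)/b (both are assumptions about notation defined earlier in the paper), and uses the sine function as a stand-in for a smooth function such as s_y. None of these choices come from the paper.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)


def smoothed(f, t, b, m=20001):
    """Approximate the integral of K_b(t - s) f(s) ds by the trapezoidal rule."""
    s = np.linspace(t - b, t + b, m)
    vals = epanechnikov((t - s) / b) / b * f(s)
    return np.sum((vals[1:] + vals[:-1]) * np.diff(s)) / 2.0


mu2 = 0.2  # second moment of the Epanechnikov kernel
t, b = 1.0, 0.1
exact = smoothed(np.sin, t, b)
# Second-order approximation; the second derivative of sin is -sin.
approx = np.sin(t) + 0.5 * b**2 * mu2 * (-np.sin(t))

print(abs(exact - np.sin(t)))  # error without the bias correction
print(abs(exact - approx))     # remainder after the b^2 correction
```

The remainder after the correction should be several orders of magnitude smaller than the uncorrected error, reflecting the o(b²) term.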

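By (17), (18), (19), the convergence of $(\tilde S^{(0)}(t))^{-1}$ in probability, the weak convergence of the processes $n^{1/2}(S_y(t) - s_y(t))$ and $n^{1/2}(S^{(0)}(t) - s^{(0)}(t))$ on the interval $[t_1, t_2]$, and Lemma 1 of Sun and Wu (2005), we have

$$\begin{aligned}
(nb)^{1/2}\big(\tilde\mu_0(t;\beta) - \mu_0(t)\big) ={}& n^{-1/2} b^{1/2} (s^{(0)}(t))^{-1}\int_0^\tau K_b(t-s)\sum_{i=1}^n \xi_i(s)\alpha_i(s)\epsilon_i(s)\,ds\\
&+ n^{-1/2} b^{1/2} (s^{(0)}(t))^{-1}\int_0^\tau K_b(t-s)\sum_{i=1}^n Y_i(s)\,dM_i(s)\\
&- n^{-1/2} b^{1/2} s_y(t)(s^{(0)}(t))^{-2}\int_0^\tau K_b(t-s)\sum_{i=1}^n \exp\{\beta^T Z_i(s)\}\,dM_i(s)\\
&+ \tfrac12 n^{1/2} b^{5/2}\mu_2 (s^{(0)}(t))^{-1}\big[(s_y(t))'' - \mu_0(t)(s^{(0)}(t))''\big] + o_p(b + n^{1/2} b^{5/2}).
\end{aligned}\tag{20}$$

Let $\phi_i(t) = b^{1/2}\int_0^\tau K_b(t-s)\,\epsilon_i(s)\,dN_i(s)$. It follows that

$$(nb)^{1/2}\big(\tilde\mu_0(t;\beta) - \mu_0(t) - \tfrac12 b^2\mu_2\eta(t)\big) = (s^{(0)}(t))^{-1} n^{-1/2}\sum_{i=1}^n \phi_i(t) + o_p(1). \tag{21}$$

Note that $E(\phi_i(t)) = 0$. Let $B_n^2 = n\sigma_n^2(t)$, where $\sigma_n^2(t) = \mathrm{Var}(\phi_i(t)) = b\,E\big[\int_0^\tau K_b(t-s)\,\epsilon_i(s)\,dN_i(s)\big]^2 \to \sigma^2(t)$. Under Conditions A and B, for $\nu > 0$,

$$\sum_{i=1}^n E|\phi_i(t)|^{2+\nu} = n b^{-(1+\nu/2)} E\left|\int_0^\tau K\big((t-s)/b\big)\,\epsilon_i(s)\,dN_i(s)\right|^{2+\nu} = O(1)\,n b^{-(1+\nu/2)}\,b = O(1)\,n b^{-\nu/2}.$$

Hence, $\sum_{i=1}^n E|\phi_i(t)|^{2+\nu}/B_n^{2+\nu} = O(1)(nb)^{-\nu/2} = o(1)$. It follows that $n^{-1/2}\sum_{i=1}^n \phi_i(t) \xrightarrow{\mathcal D} N(0, \sigma^2(t))$ by applying the Lindeberg-Feller central limit theorem for double arrays of random variables (cf. Serfling (1980), Corollary, page 32). Consequently,

$$(nb)^{1/2}\big(\tilde\mu_0(t;\beta) - \mu_0(t) - \tfrac12 b^2\mu_2\eta(t)\big) \xrightarrow{\mathcal D} N\big(0, (s^{(0)}(t))^{-2}\sigma^2(t)\big). \tag{22}$$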

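Next, we prove assertion (b). By Lemma 1, $(\tilde S^{(0)}(t))^{-1}$ converges in probability uniformly on the interval $[t_1, t_2]$. By Lemma 1 of Sun and Wu (2005), the processes $n^{1/2}(S_y(t) - s_y(t))$, $n^{-1/2}\sum_{i=1}^n\int_{t_1}^{t} Y_i(s)\,dM_i(s)$, $n^{1/2}(S^{(0)}(t) - s^{(0)}(t))$ and $n^{-1/2}\sum_{i=1}^n\int_{t_1}^{t}\exp\{\beta^T Z_i(s)\}\,dM_i(s)$ jointly converge in distribution to mean-zero Gaussian processes on $[t_1, t_2]$. By (17), (18) and (19), applying Lemma 2 of Sun and Wu (2005) and Lemma A.1 of Lin and Ying (2001), we have

$$\begin{aligned}
n^{1/2}\int_{t_1}^{t}\big(\tilde\mu_0(s,\beta) - \mu_0(s)\big)\,ds
={}& n^{-1/2}\sum_{i=1}^n\int_{t_1}^{t}(s^{(0)}(s))^{-1}\int_0^\tau K_b(s-u)\big(Y_i(u)\,dN_i(u) - s_y(u)\,du\big)\,ds\\
&- n^{-1/2}\sum_{i=1}^n\int_{t_1}^{t}s_y(s)(s^{(0)}(s))^{-2}\int_0^\tau K_b(s-u)\big(\exp\{\beta^T Z_i(u)\}\,dN_i(u) - s^{(0)}(u)\,du\big)\,ds + o_p(1)\\
={}& n^{-1/2}\sum_{i=1}^n\int_{t_1}^{t}(s^{(0)}(s))^{-1}\big(Y_i(s)\,dN_i(s) - s_y(s)\,ds\big)\\
&- n^{-1/2}\sum_{i=1}^n\int_{t_1}^{t}s_y(s)(s^{(0)}(s))^{-2}\big(\exp\{\beta^T Z_i(s)\}\,dN_i(s) - s^{(0)}(s)\,ds\big) + o_p(1)\\
={}& n^{-1/2}\sum_{i=1}^n\int_{t_1}^{t}(s^{(0)}(s))^{-1}\big(Y_i(s) - \mu_0(s)\exp\{\beta^T Z_i(s)\}\big)\,dN_i(s) + o_p(1),
\end{aligned}\tag{23}$$

which converges weakly to a mean-zero Gaussian process by Lemma 1 of Sun and Wu (2005).

The weak convergence of $n^{-1/2}\sum_{i=1}^n\int_{t_1}^{t_2} W(s)\big(Z_i(s) - \tilde Z(s)\big)\exp\{\beta^T Z_i(s)\}\,dM_i(s)$ follows from the uniform convergence in probability of $W(t)$ and $\tilde Z(t)$, the weak convergence of $n^{-1/2}\sum_{i=1}^n\int_{t_1}^{t_2} Z_i(s)\exp\{\beta^T Z_i(s)\}\,dM_i(s)$ and $n^{-1/2}\sum_{i=1}^n\int_{t_1}^{t_2}\exp\{\beta^T Z_i(s)\}\,dM_i(s)$, and applications of Lemma A.1 of Lin and Ying (2001).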

6.2 Proof of Theorem 1

Proof of the asymptotic consistency of β̂

We use an argument similar to that in Appendix A.1 of Lin et al. (2001). By Lemma 1 and Theorem 19.4 of Van der Vaart (1998), we have $n^{-1}U(\beta_0) \xrightarrow{P} 0$ as $n \to \infty$. Further, by (7), $-n^{-1}\partial U(\beta)/\partial\beta \xrightarrow{P} A$ uniformly in a neighborhood of $\beta_0$. Note that $A$ is continuous in $\beta$ and is nonsingular in a neighborhood of $\beta_0$. Then, for any $\delta > 0$, there exist an event $J$ with $P(J) < \delta$ and a small neighborhood of $\beta_0$ inside of which the eigenvalues of $-n^{-1}\partial U(\beta)/\partial\beta$ are bounded away from zero for all large $n$ on $J^c$, the complement of $J$. Thus, by the inverse function theorem (Goffman 1965, p. 92), we can find a small neighborhood of $\beta_0$ inside of which there exists a unique solution $\hat\beta$ to $U(\beta) = 0$ for every sufficiently large $n$ on $J^c$. Since $\delta$ can be arbitrarily small, it follows that a unique $\hat\beta$ exists in a neighborhood of $\beta_0$ with probability 1. The nonnegative definiteness of $-n^{-1}\partial U(\beta)/\partial\beta$ in the entire domain of $\beta$ implies the global uniqueness of $\hat\beta$.

Considering an $\epsilon$-neighborhood of $\beta_0$ in the preceding arguments for any $\epsilon > 0$, we see that $\hat\beta$ converges to $\beta_0$ on $J^c$. This follows by $P(\{\hat\beta \in \mathcal N_\epsilon(\beta_0)\} \cap J^c) \to 1$ as $n \to \infty$. Thus, $P(\hat\beta \in \mathcal N_\epsilon(\beta_0)) \to 1$ since $\delta$ can be arbitrarily small, where $\mathcal N_\epsilon(\beta_0)$ is an $\epsilon$-neighborhood of $\beta_0$. This proves the consistency of $\hat\beta$.

Proof of the asymptotic normality of β̂

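Taking the partial derivative of $U(\beta)$ with respect to $\beta$ and applying Lemma 1, we have

$$\begin{aligned}
\frac{\partial U(\beta)}{\partial\beta^T} &= -\sum_{i=1}^n\int_{t_1}^{t_2} W(s)\big[\tilde\mu_0(s;\beta)\exp\{\beta^T Z_i(s)\}Z_i(s) + \exp\{\beta^T Z_i(s)\}\tilde\mu_1(s;\beta)\big]\big(Z_i(s) - \tilde Z(s)\big)^T dN_i(s) + o_p(n)\\
&= -\sum_{i=1}^n\int_{t_1}^{t_2} W(s)\big(Z_i(s) - \tilde Z(s)\big)^{\otimes 2}\tilde\mu_0(s;\beta)\exp\{\beta^T Z_i(s)\}\,dN_i(s) + o_p(n)\\
&= -\sum_{i=1}^n\int_{t_1}^{t_2} w(s)\big(Z_i(s) - z(s)\big)^{\otimes 2}\mu_0(s)\exp\{\beta^T Z_i(s)\}\,dN_i(s) + o_p(n).
\end{aligned}$$

It follows that

$$-n^{-1}\frac{\partial U(\beta)}{\partial\beta^T} \xrightarrow{P} A. \tag{24}$$

Now, consider

$$\begin{aligned}
n^{-1/2}U(\beta) ={}& n^{-1/2}\sum_{i=1}^n\int_{t_1}^{t_2} W(s)\big(Z_i(s) - \tilde Z(s)\big)\big(Y_i(s) - \mu_0(s)\exp\{\beta^T Z_i(s)\}\big)\,dN_i(s)\\
&- n^{-1/2}\sum_{i=1}^n\int_{t_1}^{t_2} W(s)\big(Z_i(s) - \tilde Z(s)\big)\big(\tilde\mu_0(s;\beta) - \mu_0(s)\big)\exp\{\beta^T Z_i(s)\}\,dN_i(s).
\end{aligned}\tag{25}$$

By Lemmas 1 and 2 and repeated application of Lemma A.1 of Lin and Ying (2001), the second term of (25), apart from its sign, equals

$$\begin{aligned}
&n^{-1/2}\sum_{i=1}^n\int_{t_1}^{t_2} W(s)\big(\bar Z(s) - \tilde Z(s)\big)\big(\tilde\mu_0(s;\beta) - \mu_0(s)\big)S^{(0)}(s)\,ds\\
&\quad + n^{-1/2}\sum_{i=1}^n\int_{t_1}^{t_2} W(s)\big(Z_i(s) - \tilde Z(s)\big)\big(\tilde\mu_0(s;\beta) - \mu_0(s)\big)\exp\{\beta^T Z_i(s)\}\,dM_i(s) = o_p(1).
\end{aligned}$$

Hence,

$$n^{-1/2}U(\beta) = n^{-1/2}\sum_{i=1}^n\int_{t_1}^{t_2} w(s)\big(Z_i(s) - z(s)\big)\big(Y_i(s) - \mu_0(s)\exp\{\beta^T Z_i(s)\}\big)\,dN_i(s) + o_p(1), \tag{26}$$

which converges in distribution to a normal random vector with variance equal to $\Sigma$.

By (8), (24) and (26), we have

$$n^{1/2}(\hat\beta - \beta) \xrightarrow{\mathcal D} N(0, A^{-1}\Sigma A^{-1}). \tag{27}$$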

Proof of the uniform consistency of μ̂0(t)

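The uniform consistency $\hat\mu_0(t) \xrightarrow{P} \mu_0(t)$ for $t \in [t_1, t_2]$ follows from $\hat\beta \xrightarrow{P} \beta$ and the uniform convergence $\tilde\mu_0(t;\beta) \xrightarrow{P} \mu_0(t)$ on $t \in [t_1, t_2]$ and $\beta \in \mathcal N(\beta_0)$ in Lemma 1.

The consistency of $\hat A$ and $\hat\Sigma$ follows from the consistency of $\hat\beta$, the uniform consistency of $\hat\mu_0(t)$ and Lemma 1.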

6.3 Proof of Theorem 2

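Under Condition A, for $t \in [t_1, t_2]$, $(nb)^{1/2}\big(\tilde S^{(0)}(t,\hat\beta) - \tilde S^{(0)}(t,\beta)\big) \xrightarrow{P} 0$. We have $(nb)^{1/2}\big(\tilde\mu_0(t,\hat\beta) - \tilde\mu_0(t,\beta)\big) \xrightarrow{P} 0$. Thus, by Lemma 2, for $t \in [t_1, t_2]$,

$$(nb)^{1/2}\big(\tilde\mu_0(t;\hat\beta) - \mu_0(t) - \tfrac12 b^2\mu_2\eta(t)\big) \xrightarrow{\mathcal D} N\big(0, (s^{(0)}(t))^{-2}\sigma^2(t)\big).$$

Since $\tilde S^{(0)}(t) \xrightarrow{P} s^{(0)}(t)$ by Lemma 1, to prove the consistency of the variance estimator, it suffices to show

$$\hat\sigma^2(t) = b\,n^{-1}\sum_{i=1}^n\left[\int_0^\tau K_b(t-s)\,\hat\epsilon_i(s)\,dN_i(s)\right]^2 \xrightarrow{P} \sigma^2(t). \tag{28}$$

The left-hand side of (28) equals

$$b\,n^{-1}\sum_{i=1}^n\left[\int_0^\tau K_b(t-s)\,\epsilon_i(s)\,dN_i(s) + \int_0^\tau K_b(t-s)\big(\hat\epsilon_i(s) - \epsilon_i(s)\big)\,dN_i(s)\right]^2.$$

The limit (28) is implied by

$$b\,n^{-1}\sum_{i=1}^n\left[\int_0^\tau K_b(t-s)\,\epsilon_i(s)\,dN_i(s)\right]^2 \xrightarrow{P} \sigma^2(t), \tag{29}$$

$$b\,n^{-1}\sum_{i=1}^n\left[\int_0^\tau K_b(t-s)\big(\hat\epsilon_i(s) - \epsilon_i(s)\big)\,dN_i(s)\right]^2 \xrightarrow{P} 0. \tag{30}$$

Let $\phi_i(t) = b^{1/2}\int_0^\tau K_b(t-s)\,\epsilon_i(s)\,dN_i(s)$. The limit (29) holds by Chebyshev's inequality because $E\phi_i^2(t) \to \sigma^2(t)$ and $\mathrm{Var}\{n^{-1}\sum_{i=1}^n(\phi_i^2(t) - E\phi_i^2(t))\} = n^{-1}\{E\phi_i^4(t) - (E\phi_i^2(t))^2\} \le n^{-1}E\phi_i^4(t) = O((nb^2)^{-1})$ under Conditions A and B. To show (30), note that

$$b\,n^{-1}\sum_{i=1}^n\left[\int_0^\tau K_b(t-s)\big(\hat\epsilon_i(s) - \epsilon_i(s)\big)\,dN_i(s)\right]^2 \le \max_{1\le i\le n}\,\sup_{s\in[t-b,\,t+b]}\big|\hat\epsilon_i(s) - \epsilon_i(s)\big|^2\; b\,n^{-1}\sum_{i=1}^n\left[\int_0^\tau K_b(t-s)\,dN_i(s)\right]^2.$$

Under Conditions A and B, $b\,E\big[\int_0^\tau K_b(t-s)\,dN_i(s)\big]^2 = O(1)$, so $b\,n^{-1}\sum_{i=1}^n\big[\int_0^\tau K_b(t-s)\,dN_i(s)\big]^2 = O_p(1)$ by the Markov inequality. By the consistency of $\hat\beta$ and the uniform consistency of $\hat\mu_0(t)$ in Theorem 1, we have $\max_{1\le i\le n}\sup_{s\in[t-b,\,t+b]}|\hat\epsilon_i(s) - \epsilon_i(s)| \xrightarrow{P} 0$. Hence the limit (30) holds.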
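The variance estimator in (28) is straightforward to compute, because the integral against dN_i is a sum over subject i's visit times. The sketch below is one possible reading of Theorem 2 as a plug-in standard error. It assumes an Epanechnikov kernel with K_b(x) = K(x/b)/b, takes the fitted residuals and the value of S̃(0)(t) as given inputs, and ignores the bias term, which is asymptotically negligible when nb⁴ → 0. All function and argument names are hypothetical, and the intervals shown in Fig. 3(a) may have been computed differently.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)


def sigma2_hat(t, b, visit_times, residuals):
    """b * n^{-1} * sum_i [ sum_j K_b(t - s_ij) * eps_hat_i(s_ij) ]^2.

    visit_times[i] and residuals[i] hold subject i's visit times and the
    fitted residuals at those times.
    """
    n = len(visit_times)
    total = 0.0
    for s_i, e_i in zip(visit_times, residuals):
        s_i = np.asarray(s_i, dtype=float)
        e_i = np.asarray(e_i, dtype=float)
        k = epanechnikov((t - s_i) / b) / b
        total += np.sum(k * e_i) ** 2
    return b * total / n


def pointwise_se(t, b, visit_times, residuals, s0_tilde):
    """Plug-in standard error (nb)^{-1/2} * sigma_hat(t) / S0_tilde(t)."""
    n = len(visit_times)
    return np.sqrt(sigma2_hat(t, b, visit_times, residuals) / (n * b)) / s0_tilde


# Tiny hand-checkable example with two subjects and one visit each.
toy_times = [[0.0], [0.5]]
toy_resid = [[2.0], [-1.0]]
toy_sigma2 = sigma2_hat(0.0, 1.0, toy_times, toy_resid)
toy_se = pointwise_se(0.0, 1.0, toy_times, toy_resid, s0_tilde=1.0)
```

A 95% pointwise interval would then be the estimate of μ0(t) plus or minus 1.96 times this standard error, under the assumptions stated above.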

6.4 Proof of Theorem 3

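By Hu et al. (2003),

$$U_1^M(\beta,\alpha) = \sum_{i=1}^n\int_{t_1}^{t_2}\xi_i(s)\,w(s)\big(Z_i(s) - z_M(s;\beta,\alpha)\big)\,dM_i^M(s) + o_p(n^{1/2}), \tag{31}$$

$$U_2^M(\alpha) = \sum_{i=1}^n\int_{t_1}^{t_2}\xi_i(s)\big(Z_i(s) - z_M(s;0,\alpha)\big)\,dM_i^O(s) + o_p(n^{1/2}),$$

where $z_M(t;\beta,\alpha)$ is the limit in probability of $Z_M(t;\beta,\alpha)$.

Taking the partial derivatives of $U_1^M(\beta,\alpha)$ and $U_2^M(\alpha)$ with respect to $\beta$ and $\alpha$, Hu et al. (2003) showed that

$$\begin{aligned}
-n^{-1}\frac{\partial U_1^M(\beta,\alpha)}{\partial\beta^T} &= n^{-1}\sum_{i=1}^n\int_{t_1}^{t_2} W(s)\left(\frac{S_M^{(2)}(s;\beta,\alpha)}{S_M^{(0)}(s;\beta,\alpha)} - Z_M(s;\beta,\alpha)^{\otimes 2}\right) Y_i(s)\,dN_i(s) \xrightarrow{P} A,\\
-n^{-1}\frac{\partial U_2^M(\alpha)}{\partial\alpha^T} &= n^{-1}\sum_{i=1}^n\int_{t_1}^{t_2}\left(\frac{S_M^{(2)}(s;0,\alpha)}{S_M^{(0)}(s;0,\alpha)} - Z_M(s;0,\alpha)^{\otimes 2}\right) dN_i(s) \xrightarrow{P} A_\alpha,
\end{aligned}\tag{32}$$

and that $-n^{-1}\partial U_1^M(\beta,\alpha)/\partial\alpha^T = -n^{-1}\partial U_1^M(\beta,\alpha)/\partial\beta^T \xrightarrow{P} A$ and $\partial U_2^M(\alpha)/\partial\beta^T = 0$.

By (15), (31) and (32), the asymptotic covariance matrix of $(\hat\beta_M, \hat\alpha_M)$ is

$$\begin{aligned}
\Sigma_M &= \begin{bmatrix} A & A\\ 0 & A_\alpha\end{bmatrix}^{-1}\begin{bmatrix} B_{11} & B_{12}\\ B_{21} & B_{22}\end{bmatrix}\begin{bmatrix} A & 0\\ A & A_\alpha\end{bmatrix}^{-1}\\
&= \begin{bmatrix} A^{-1} & -A_\alpha^{-1}\\ 0 & A_\alpha^{-1}\end{bmatrix}\begin{bmatrix} B_{11} & B_{12}\\ B_{21} & B_{22}\end{bmatrix}\begin{bmatrix} A^{-1} & 0\\ -A_\alpha^{-1} & A_\alpha^{-1}\end{bmatrix} = \begin{bmatrix}\Sigma_{M11} & \Sigma_{M12}\\ \Sigma_{M21} & \Sigma_{M22}\end{bmatrix},
\end{aligned}\tag{33}$$

where

$$\Sigma_{M11} = A^{-1}B_{11}A^{-1} - A^{-1}B_{12}A_\alpha^{-1} - \big(A^{-1}B_{12}A_\alpha^{-1}\big)^T + A_\alpha^{-1}B_{22}A_\alpha^{-1} \tag{34}$$

is the asymptotic variance matrix of $\hat\beta_M$, $\Sigma_{M21} = A_\alpha^{-1}B_{21}A^{-1} - A_\alpha^{-1}B_{22}A_\alpha^{-1}$ is the asymptotic covariance matrix of $\hat\alpha_M$ and $\hat\beta_M$, $\Sigma_{M12} = (\Sigma_{M21})^T$, and $\Sigma_{M22} = A_\alpha^{-1}B_{22}A_\alpha^{-1}$ is the asymptotic variance matrix of $\hat\alpha_M$.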

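Note that

$$M_i^M(t) = \int_{t_1}^{t}\xi_i(s)\big[Y_i(s) - \mu_0(s)\exp(\beta^T Z_i(s))\big]\,dN_i(s) + \int_{t_1}^{t}\xi_i(s)\,\mu_0(s)\exp(\beta^T Z_i(s))\,dM_i^O(s). \tag{35}$$

We have the following decomposition for the top term in the square bracket in (15):

$$\begin{aligned}
\int_{t_1}^{t_2}\xi_i(s)\,w(s)\big(Z_i(s) - z_M(s;\beta,\alpha)\big)\,dM_i^M(s) ={}& \int_{t_1}^{t_2}\xi_i(s)\,w(s)\big(Z_i(s) - z_M(s;\beta,\alpha)\big)\big[Y_i(s) - \mu_0(s)\exp(\beta^T Z_i(s))\big]\,dN_i(s)\\
&+ \int_{t_1}^{t_2}\xi_i(s)\,w(s)\big(Z_i(s) - z_M(s;\beta,\alpha)\big)\mu_0(s)\exp(\beta^T Z_i(s))\,dM_i^O(s).
\end{aligned}\tag{36}$$

Since the two terms in (36) are uncorrelated, we have

$$B_{11} = \Sigma + E\left\{\int_{t_1}^{t_2}\xi_i(s)\,w(s)\big(Z_i(s) - z_M(s;\beta,\alpha)\big)\mu_0(s)\exp\{\beta^T Z_i(s)\}\,dM_i^O(s)\right\}^{\otimes 2} = \Sigma + \Delta. \tag{37}$$

Hence we can write

$$\Sigma_{M11} = A^{-1}\Sigma A^{-1} + A^{-1}\Delta A^{-1} - A^{-1}B_{12}A_\alpha^{-1} - \big(A^{-1}B_{12}A_\alpha^{-1}\big)^T + A_\alpha^{-1}B_{22}A_\alpha^{-1} = A^{-1}\Sigma A^{-1} + D. \tag{38}$$

Note that the first term is the asymptotic variance matrix of our estimator $\hat\beta$. It remains to show that the matrix $D$ is positive semidefinite.

Because the first term in (36) is uncorrelated with the second term in (36) and is also uncorrelated with the bottom term in the square bracket in (15), replacing the top term in (15) by the second term of its decomposition in (36), we have

$$E\left\{\begin{bmatrix}\int_{t_1}^{t_2}\xi_i(s)\,w(s)\big(Z_i(s) - z_M(s;\beta,\alpha)\big)\mu_0(s)\exp(\beta^T Z_i(s))\,dM_i^O(s)\\[4pt] \int_{t_1}^{t_2}\xi_i(s)\big(Z_i(s) - z_M(s;0,\alpha)\big)\,dM_i^O(s)\end{bmatrix}^{\otimes 2}\right\} = \begin{bmatrix}\Delta & B_{12}\\ B_{21} & B_{22}\end{bmatrix}. \tag{39}$$

Now replacing the middle matrix of (33) by the above matrix, we have that the matrix

$$\begin{bmatrix} A & A\\ 0 & A_\alpha\end{bmatrix}^{-1}\begin{bmatrix}\Delta & B_{12}\\ B_{21} & B_{22}\end{bmatrix}\begin{bmatrix} A & 0\\ A & A_\alpha\end{bmatrix}^{-1}$$

is positive semidefinite. Hence, its first diagonal block $D = A^{-1}\Delta A^{-1} - A^{-1}B_{12}A_\alpha^{-1} - \big(A^{-1}B_{12}A_\alpha^{-1}\big)^T + A_\alpha^{-1}B_{22}A_\alpha^{-1}$ is also positive semidefinite.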
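The block-matrix algebra behind (33), (34) and (38) can be checked numerically. The sketch below uses randomly generated matrices as stand-ins for A, A_α, Σ and the joint covariance in (39). It takes the derivative matrix to be the block upper-triangular matrix with rows (A, A) and (0, A_α), which is the reading consistent with the signs in (34). All numerical values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3


def random_spd(k):
    """Random symmetric positive definite k x k matrix."""
    m = rng.normal(size=(k, k))
    return m @ m.T + k * np.eye(k)


A, A_alpha, Sigma = random_spd(p), random_spd(p), random_spd(p)
G = rng.normal(size=(2 * p, 2 * p))
C = G @ G.T                      # plays the role of [[Delta, B12], [B21, B22]]
Delta, B12, B22 = C[:p, :p], C[:p, p:], C[p:, p:]
B = C.copy()
B[:p, :p] += Sigma               # B11 = Sigma + Delta, as in (37)

J = np.block([[A, A], [np.zeros((p, p)), A_alpha]])
J_inv = np.linalg.inv(J)
Sigma_M11 = (J_inv @ B @ J_inv.T)[:p, :p]   # top-left block of the sandwich
D_block = (J_inv @ C @ J_inv.T)[:p, :p]     # same block with Delta in place of B11

Ai, Aai = np.linalg.inv(A), np.linalg.inv(A_alpha)
cross = Ai @ B12 @ Aai
D_formula = Ai @ Delta @ Ai - cross - cross.T + Aai @ B22 @ Aai

print(np.allclose(D_block, D_formula))
print(np.allclose(Sigma_M11, Ai @ Sigma @ Ai + D_formula))
print(np.linalg.eigvalsh(D_formula).min() >= -1e-10)
```

Because D is a diagonal block of a congruence transform of a positive semidefinite matrix, its smallest eigenvalue should be nonnegative up to rounding. This is the efficiency comparison in (38) in numerical form.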

References

  1. Andersen PK, Gill RD. Cox's regression model for counting processes: a large sample study. Annal Stat. 1982;10:1100–1120.
  2. Byar DP. The veterans administration study of chemoprophylaxis for recurrent stage I bladder tumors: comparison of placebo, pyridoxine, and topical thiotepa. In: Pavone-Macaluso M, Smith PH, Edsmyn F, editors. Bladder tumors and other topics in urological oncology. Plenum; New York: 1980. pp. 363–370.
  3. Cheng SC, Wei LJ. Inferences for a semiparametric model with panel data. Biometrika. 2000;87:89–97.
  4. Fan J, Li R. New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J Am Stat Assoc. 2004;99:710–723.
  5. Gilbert PB, Sun Y. Failure time analysis of HIV vaccine effects on viral load and treatment initiation. Biostatistics. 2005;6:374–394. doi: 10.1093/biostatistics/kxi014.
  6. Goffman C. Calculus of several variables. Harper and Row; New York: 1965.
  7. Hoover DR, Rice JA, Wu CO, Yang LP. Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika. 1998;85:809–822.
  8. Hu XJ, Lagakos SW, Lockhart RA. Generalized least squares estimation of the mean function of a counting process based on panel counts. Stat Sinica. 2009;19:561–580.
  9. Hu XJ, Sun J, Wei LJ. Regression parameter estimation from panel counts. Scand J Stat. 2003;30:25–43.
  10. Jin Z, Ying Z, Wei LJ. A simple resampling method by perturbing the minimand. Biometrika. 2001;88:381–390.
  11. Lawless JF, Nadeau C. Some simple robust methods for the analysis of recurrent events. Technometrics. 1995;37:158–168.
  12. Liang H, Wu H, Carroll RJ. The relationship between virologic and immunologic responses in AIDS clinical research using mixed-effects varying-coefficient semiparametric models with measurement error. Biostatistics. 2003;4:297–312. doi: 10.1093/biostatistics/4.2.297.
  13. Lin DY, Wei LJ, Yang I, Ying Z. Semiparametric regression for the mean and rate functions of recurrent events. J Royal Stat Soc B. 2000;62:711–730.
  14. Lin DY, Wei LJ, Ying Z. Semiparametric transformation models for point processes. J Am Stat Assoc. 2001;96:620–628.
  15. Lin DY, Ying Z. Semiparametric and nonparametric regression analysis of longitudinal data (with discussion). J Am Stat Assoc. 2001;96:103–113.
  16. Lu M, Zhang Y, Huang J. Estimation of the mean function with panel count data using monotone polynomial splines. Biometrika. 2007;94:705–718.
  17. Martinussen T, Scheike TH. A semiparametric additive regression model for longitudinal data. Biometrika. 1999;86:691–702.
  18. Martinussen T, Scheike TH. A nonparametric dynamic additive regression model for longitudinal data. Annal Stat. 2000;28:1000–1025.
  19. Martinussen T, Scheike TH. Sampling adjusted analysis of dynamic additive regression models for longitudinal data. Scand J Stat. 2001;28:303–323.
  20. Moyeed RA, Diggle PJ. Rates of convergence in semiparametric modelling of longitudinal data. Aust J Stat. 1994;36:75–93.
  21. Rice JA, Silverman B. Estimating the mean and covariance structure nonparametrically when the data are curves. J Royal Stat Soc B. 1991;53:233–243.
  22. Scheike TH. The additive nonparametric and semiparametric Aalen model as the rate function for a counting process. Lifetime Data Anal. 2002;8:247–262. doi: 10.1023/a:1015849821021.
  23. Serfling RJ. Approximation theorems of mathematical statistics. Wiley; New York: 1980.
  24. Sun J, Kalbfleisch JD. Estimation of the mean function of point process based on panel count data. Stat Sinica. 1995;5:279–289.
  25. Sun Y, Wu H. Semiparametric time-varying coefficients regression model for longitudinal data. Scand J Stat. 2005;32:21–47.
  26. Sun J, Wei LJ. Regression analysis of panel count data with covariate-dependent observation and censoring times. J Royal Stat Soc B. 2000;62:293–302.
  27. Van der Vaart AW. Asymptotic statistics. Cambridge University Press; Cambridge: 1998.
  28. Wellner JA, Zhang Y. Two estimators of the mean of a counting process with panel count data. Annal Stat. 2000;28:779–814.
  29. Wu CO, Chiang CT, Hoover D. Asymptotic confidence regions for kernel smoothing of a time-varying coefficient model with longitudinal data. J Am Stat Assoc. 1998;88:1388–1402.
  30. Wu H, Liang H. Backfitting random varying-coefficient models with time-dependent smoothing covariates. Scand J Stat. 2004;31:3–19.
  31. Wu H, Zhang JT. Local polynomial mixed-effects models for longitudinal data. J Am Stat Assoc. 2002;97:883–897.
  32. Zeger SL, Diggle PJ. Semiparametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters. Biometrics. 1994;55:452–459.
  33. Zhang Y. A semiparametric pseudolikelihood estimation method for panel count data. Biometrika. 2002;89:39–48.
