Author manuscript; available in PMC: 2013 Sep 23.
Published in final edited form as: J Am Stat Assoc. 2004 Dec;99(468):1153–1165. doi: 10.1198/016214504000001033

Joint Modeling and Estimation for Recurrent Event Processes and Failure Time Data

Chiung-Yu Huang 1, Mei-Cheng Wang 2
PMCID: PMC3780991  NIHMSID: NIHMS511256  PMID: 24068850

Abstract

Recurrent event data are commonly encountered in longitudinal follow-up studies related to biomedical science, econometrics, reliability, and demography. In many studies, recurrent events serve as important measurements for evaluating disease progression, health deterioration, or insurance risk. When analyzing recurrent event data, an independent censoring condition is typically required for the construction of statistical methods. In some situations, however, the terminating time for observing recurrent events could be correlated with the recurrent event process, thus violating the assumption of independent censoring. In this article, we consider joint modeling of a recurrent event process and a failure time in which a common subject-specific latent variable is used to model the association between the intensity of the recurrent event process and the hazard of the failure time. The proposed joint model is flexible in that no parametric assumptions on the distributions of censoring times and latent variables are made, and under the model, informative censoring is allowed for observing both the recurrent events and failure times. We propose a “borrow-strength estimation procedure” by first estimating the value of the latent variable from recurrent event data, then using the estimated value in the failure time model. Some interesting implications and trajectories of the proposed model are presented. Properties of the regression parameter estimates and the estimated baseline cumulative hazard functions are also studied.

Keywords: Borrow-strength method, Frailty, Informative censoring, Joint model, Nonstationary Poisson process

1. INTRODUCTION

Recurrent event data are often collected in longitudinal follow-up studies. During the observation period, recurrent events, such as repeated tumor occurrences (Byar 1980), repeated hospitalizations (Eaton et al. 1992a,b), or recurrent injuries (Wassell, Wojciechowski, and Landen 1999), are recorded in the studies. The observation of recurrent events could be terminated (i.e., censored) by loss to follow-up, end of the study, or a failure event such as death. Conventional analysis usually focuses on either failure time data (Cox 1972; Cox and Oakes 1984) or recurrent event data [Prentice, Williams, and Peterson 1981; Andersen and Gill 1982; Pepe and Cai 1993; Lin, Wei, Yang, and Ying 2000; Wang, Qin, and Chiang (WQC) 2001]. In this article, the event process and the failure time are both of interest, and we consider the joint modeling of a recurrent event process and a failure time.

In analyzing recurrent event data, an independent censoring condition is usually required for the development of statistical methods under different types of models. When a failure event serves as a part of the censoring mechanism, the independent censoring assumption is violated whenever the recurrent event process is correlated with the failure time. Lancaster and Intrator (1998) considered a joint parametric model of the recurrent event process and the failure time, and demonstrated the use of their methodology using AIDS panel data. In their work, they used a latent variable to characterize the association between the recurrent event process and the failure time, and a common baseline function is shared by the intensity of the recurrent event process and the hazard of the failure time. In nonparametric and semiparametric settings, WQC (2001) proposed estimation procedures for estimating the cumulative rate function and regression parameters under multiplicative intensity models with dependent censoring. The WQC model focused on the distributional pattern of the recurrent event process where the censoring time was treated as a nuisance and the joint modeling of recurrent event process and failure time was not considered.

To jointly model recurrent events and failure time, Ghosh and Lin (2003) studied correlated marginal models for these two outcomes. At the cost of censoring some of the originally uncensored data, they developed estimation inferences with the correlation between recurrent events and failure time unspecified. Using a general censoring pattern, Huang and Wang (2002) proposed statistical methods to study two nested joint models of a recurrent event process and a failure time, where the correlation of the two outcomes is partially specified in the conditional distribution of the recurrent event process given the failure time. Note that neither of those two articles used frailty in their joint models. In this article we consider joint modeling of the recurrent event process and the failure time via frailty. This joint model has attractive features of frailty models, especially in its interpretation of correlation, and avoids parametric assumptions on the frailty term.

The article is organized as follows. In Section 2 we introduce a joint model of recurrent event process and failure time in which a common subject-specific latent variable (frailty) is used to model the association between intensity of the recurrent event process and hazard of the failure time. The proposed joint model is flexible in that no parametric assumptions on the distributions of censoring times and latent variables are made, and under the model, informative censoring is allowed for observing both the recurrent events and failure times. In Section 3 we present theoretical implications and trajectories of the proposed model. In Section 4 we study a “borrow-strength estimation procedure” by first estimating the value of the latent variable from recurrent event data, then using the estimated values in the failure time models. We explore properties of the regression parameter estimators and the estimated baseline cumulative hazard functions. In Section 5 we report results of simulation studies, along with the application to a Denmark schizophrenia case cohort study, and we conclude with discussion in Section 6.

2. NOTATION AND THE JOINT MODEL

Let N(t) denote the number of events occurring before or at time t, and let D be the failure time and C be the potential censoring time for reasons other than the failure event. The research interest is to derive inferential results on N(·) and D within a fixed time interval [0, T0], where the event process potentially could be observed beyond T0. Let X be a 1 × p vector of covariates. We then make the following model assumptions:

  • (M1)
    There exists a nonnegative-valued latent variable Z so that, given X = x and Z = z, the recurrent event process N(·) is a nonstationary Poisson process with intensity function
    $$\lambda(t) = z\lambda_0(t)\exp(x\alpha), \quad 0 \le t \le T_0,$$
    where α is a p × 1 vector of parameters and the baseline intensity function λ0(t) is a continuous function with $\Lambda_0(T_0) = \int_0^{T_0}\lambda_0(u)\,du = 1$. The latent variable Z satisfies E(Z|X) = E(Z).
  • (M2)
    Given (x, z), the hazard function of D takes the form
    $$h(t) = z h_0(t)\exp(x\beta),$$
    where β is a p × 1 vector of parameters and the baseline hazard function h0(t) is continuous.
  • (M3)
    Conditioning on (x, z), (N(·), D, C) are mutually independent.

The occurrence of recurrent events is modeled by a subject-specific Poisson process via a latent variable. Conditioning on z, the rate function equals the intensity function, because a Poisson process is memoryless. Under (M1), the baseline intensity function λ0(t) is shared by all subjects and is left unspecified. A multiplicative hazard function with the same latent variable but a different baseline function is assumed for the hazard of the failure event in (M2). Clearly, a large value of z inflates both the intensity of recurrent events and the hazard of the failure event. Under assumption (M3), D, C, and N(·) are allowed to be correlated via their connection with (x, z). This model relaxes the requirement that a common baseline function be shared by the intensity of N(·) and the hazard of D assumed by Lancaster and Intrator (1998), while keeping the semiparametric model features of WQC (2001). Define Y = min(C, D, T0), the time when the observation of the recurrent event process is terminated, and Δ = I(D ≤ Y), the observed censoring indicator. By further conditioning on z, the usual independent censoring condition that N(·) is independent of Y given x is relaxed for recurrent events, and, interestingly, the independent censoring condition that D be independent of C given x is also relaxed for failure time data.

Note that the rate function of event occurrence at time t in a random population, for study subjects with explanatory variable x, is μZλ0(t) exp(xα), where μZ = E[Z]. In many public health and biomedical studies, the rate function is preferred for analysis, especially in identifying treatment effects and risk factors, because of its marginal interpretation. For instance, the parameter α can be interpreted as the logarithm of the ratio of the rate function for every unit increase in the explanatory variable.

Under (M1)–(M3), the distribution of Z, the baseline functions λ0(t) and h0(t), and the distribution of C serve as nonparametric components in the model. In the next section we examine model implications with or without additional parametric assumptions on Z, yet with no parametric assumptions made on Z for our development of estimation inferences in Section 4.

3. MODEL IMPLICATIONS

Let H(t) = {N(u): 0 ≤ u ≤ t} be the event history up to t, and let t1 ≤ t2 ≤ … ≤ tN(t) be the ordered event times before or at t. Define fZ(·) as the probability density function of the latent variable Z, f(·) as a general probability density function, and f(·|·) as a general conditional probability density function. In this section we discuss model implications under the proposed joint model with or without additional parametric assumptions on Z. To simplify the discussion, we consider the reduced model without covariates. Similar results for regression models with covariates can be obtained by replacing (Λ0(t), λ0(t), H0(t), h0(t)) with (Λ0(t)e^{xα}, λ0(t)e^{xα}, H0(t)e^{xβ}, h0(t)e^{xβ}).

Proposition 1 (Posterior mean of Z). Given the observed recurrent event data, we show in Appendix A that the posterior mean of Z can be expressed as

$$E[Z \mid H(y), y] = \frac{N(y)+1}{\Lambda_0(y)} \times \frac{f(N(y)+1 \mid y)}{f(N(y) \mid y)},$$

where f(N(y)|y) is the conditional probability density function of N(Y) given Y. We can see that, given the follow-up time y, the posterior mean depends on the event history H(y) only through the number of observed events. The posterior mean can be used for individual-specific prediction when additional model assumptions are available for obtaining an explicit form of the formula.

Proposition 2 (Residual lifetime). Let t and s be nonnegative constants. For individuals who survive beyond time t (D ≥ t), the conditional probability for the residual lifetime to be longer than s units of time given H(t) is

$$P(D \ge t+s \mid H(t), D \ge t) = \frac{E\left[e^{-Z\{H_0(t+s)+\Lambda_0(t)\}} Z^{N(t)}\right]}{E\left[e^{-Z\{H_0(t)+\Lambda_0(t)\}} Z^{N(t)}\right]}.$$

The derivation is given in Appendix A. The computation implies that

$$P(D \ge t+s \mid N(t), D \ge t) = P(D \ge t+s \mid H(t), D \ge t);$$

that is, the residual lifetime probability depends on the event history only through the number of events occurring up to time t. Further, the median residual lifetime after time t can be obtained by solving P(D ≥ t + s | N(t), D ≥ t) = 1/2 for s. The residual lifetime unconditional on the event history has the survival function

$$P(D \ge t+s \mid D \ge t) = \frac{E\left[e^{-Z H_0(t+s)}\right]}{E\left[e^{-Z H_0(t)}\right]}.$$

For the specific case where Z is distributed as gamma(a, b) with mean a/b, the residual lifetime probability, given the event history, has the survivor function

$$P(D \ge t+s \mid N(t), D \ge t) = \left(\frac{b + H_0(t) + \Lambda_0(t)}{b + H_0(t+s) + \Lambda_0(t)}\right)^{N(t)+a}.$$

This conditional survival function has the following interesting interpretation. With each additional event occurrence in the time interval [0, t], the survival probability at time t + s is decreased by the constant factor {b + H0(t) + Λ0(t)}/{b + H0(t + s) + Λ0(t)}, which has a value between 0 and 1 and depends on (H0(t), H0(t + s), Λ0(t), b). In addition, with the assumption that Z is distributed as gamma(a, b), the survival function for the residual lifetime unconditional on the event history can be expressed as

$$P(D \ge t+s \mid D \ge t) = \left(\frac{b + H_0(t)}{b + H_0(t+s)}\right)^{a}.$$

It is then interesting to see that P(D ≥ t + s | N(t) = 0, D ≥ t) ≥ P(D ≥ t + s | D ≥ t), where the inequality becomes strict if Λ0(t) > 0 and H0(t + s) > H0(t) for s > 0. That is, survivors at time t who experienced no events before t have a higher probability of living s more units of time than survivors at time t in the general population.
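The gamma-frailty formulas above are simple enough to evaluate directly. The following is a minimal sketch, assuming Z is gamma distributed with shape a and rate b (so that the mean is a/b, as in the text); the function names and the numerical inputs in the usage note are hypothetical and chosen only for illustration.

```python
def residual_survival_given_events(a, b, H0_t, H0_ts, Lam0_t, n_events):
    """P(D >= t+s | N(t) = n_events, D >= t) under Z ~ gamma(shape a, rate b).

    H0_t and H0_ts are the baseline cumulative hazards H0(t) and H0(t+s);
    Lam0_t is the baseline cumulative intensity Lambda0(t).
    """
    factor = (b + H0_t + Lam0_t) / (b + H0_ts + Lam0_t)
    return factor ** (n_events + a)


def residual_survival_marginal(a, b, H0_t, H0_ts):
    """P(D >= t+s | D >= t) under the same gamma frailty, ignoring event history."""
    return ((b + H0_t) / (b + H0_ts)) ** a
```

For example, with the illustrative values a = 2, b = 0.2, H0(t) = 0.02, H0(t + s) = 0.045, and Λ0(t) = 0.4, calling `residual_survival_given_events` with 0 and then 1 observed events gives two probabilities whose ratio is the constant factor 0.62/0.645, and the zero-event probability is no smaller than the value returned by `residual_survival_marginal`.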

Proposition 3 (Residual lifetime for censored subjects). It is also possible to examine the residual lifetime of those who are censored at time t given the event history

$$P(D \ge t+s \mid Y = t, \Delta = 0, H(t)) = \frac{E\left[e^{-Z\{H_0(t+s)+\Lambda_0(t)\}} Z^{N(t)} f_c(t \mid Z)\right]}{E\left[e^{-Z\{H_0(t)+\Lambda_0(t)\}} Z^{N(t)} f_c(t \mid Z)\right]},$$

where Δ = I(D ≤ C) is the censoring indicator and fc(t|z) is the conditional probability density function of the censoring time, C, given Z. If we assume that the hazard function of C given Z is λc(t|z) = zg0(t) and that Z is distributed as gamma(a, b), then we have

$$P(D \ge t+s \mid N(t), Y = t, \Delta = 0) = \left(\frac{b + H_0(t) + \Lambda_0(t) + G_0(t)}{b + H_0(t+s) + \Lambda_0(t) + G_0(t)}\right)^{a+N(t)+1},$$

where $G_0(t) = \int_0^t g_0(u)\,du$ is the cumulative function of g0. With each additional event, the probability of surviving an extra s units of time after being censored at t is decreased by a constant factor, where the constant factor depends on (H0(t), H0(t + s), Λ0(t), G0(t), b).

Proposition 4 (Effect of failure time on recurrent events). We derive in Appendix A the mean function of the recurrent event process conditional on the failure time. For t ≥ s,

$$E[N(s) \mid D \ge t] = \frac{E\left[Z e^{-Z H_0(t)}\right]}{E\left[e^{-Z H_0(t)}\right]}\,\Lambda_0(s).$$

The mean function given failure time can be decomposed into two parts, one part depending on the baseline cumulative rate function, and the other part depending on the baseline cumulative hazard function and the frailty distribution. The function E[N(s) | D ≥ t] can be further shown to be decreasing in t, where t ≥ s. This result is intuitive, because our model implies that the subject-specific event occurrence rate is positively correlated with the risk of the failure event; subjects who survive longer tend to have lower event occurrence rates.
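As a worked illustration of Proposition 4, suppose (as an additional assumption, not required by the model) that Z is gamma distributed with shape a and rate b, as in the example following Proposition 2. The two expectations then follow from the Laplace transform of the gamma distribution, and the monotonicity in t can be read off directly:

```latex
\begin{align*}
E\left[e^{-Zc}\right] &= \left(\frac{b}{b+c}\right)^{a},
\qquad
E\left[Z e^{-Zc}\right] = -\frac{d}{dc}\left(\frac{b}{b+c}\right)^{a}
  = \frac{a}{b+c}\left(\frac{b}{b+c}\right)^{a}, \\
E[N(s) \mid D \ge t] &= \frac{a}{b + H_0(t)}\,\Lambda_0(s), \qquad t \ge s .
\end{align*}
```

Because H0(t) is nondecreasing in t, the factor a/{b + H0(t)} decreases from the unconditional frailty mean a/b at t = 0, which matches the qualitative statement above.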

4. ESTIMATION PROCEDURE AND ASYMPTOTIC PROPERTIES

4.1 A Brief Review

Let subscript i be the index for a subject, i = 1, 2, …, n. For subject i, let Xi denote the time-independent covariate, Zi denote the subject-specific latent variable, Yi denote the observed terminating time for observing the event process Ni(·), Di denote the failure time, and Δi = I(Di ≤ Yi) denote the censoring indicator. We further let mi denote the number of recurrent events occurring before time Yi and ti1, …, timi denote the observed event times for subject i. For ease of notation, we use mi and tij, i = 1, 2, …, n, j = 1, 2, …, mi, to denote either random variables or realized values. Assume that {(Xi, Zi, Ni(·), Di, Ci)} are iid, so that the observed {(Xi, Zi, mi, (ti1, …, timi), Yi)} are also iid.

Under assumption (M3), Y and N(·) are independent given the values of Z and X. The estimation procedure of WQC (2001) can then be adopted to estimate Λ0 and α. A key step of their estimation procedure is to observe that, conditional on (xi, yi, zi, mi), the observed event times, {ti1, ti2, …, timi}, are the order statistics of a set of iid random variables with the density function πi(t), where, for zi > 0,

$$\pi_i(t) = \frac{z_i\lambda_0(t)\exp(x_i\alpha)}{z_i\Lambda_0(y_i)\exp(x_i\alpha)} = \frac{\lambda_0(t)}{\Lambda_0(y_i)}, \quad 0 \le t \le y_i.$$

Note that πi(t) depends on neither zi nor xi, and it is a truncated density function of λ0(t) with observations truncated from the right side of yi. As a result, the conditional likelihood function Lc given (xi, yi, zi, mi), where

$$L_c \propto \prod_{i=1}^{n}\prod_{j=1}^{m_i} \frac{\lambda_0(t_{ij})}{\Lambda_0(y_i)},$$

does not require information on xi and the unobserved zi. Although the data are correlated, computationally the conditional likelihood has the form of the nonparametric likelihood for independently right-truncated data. The nonparametric maximum likelihood estimator (MLE) of Λ0, denoted by Λ̂0, based on randomly truncated data is known to have a product-limit representation (Wang, Jewell, and Tsai 1986),

$$\hat\Lambda_0(t) = \prod_{s_{(l)} > t}\left(1 - \frac{d_{(l)}}{R_{(l)}}\right),$$

where {s(l)} are the ordered and distinct values of the event times {tij}, d(l) is the number of events occurring at s(l), and R(l) is the total number of events with event time and observation terminating time satisfying {tij ≤ s(l) ≤ yi}.
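The product-limit representation can be computed directly from the pooled event times. The sketch below is a plain, unoptimized illustration under the assumption that every observed event time satisfies tij ≤ yi; the function name and the data layout (one list of event times per subject) are hypothetical choices made here, not part of the original procedure.

```python
def lambda0_hat(t, event_times, y):
    """Product-limit estimate of the baseline cumulative intensity at time t.

    event_times: list of lists, event_times[i] holds subject i's event times.
    y:           list of terminating times, y[i] >= every entry of event_times[i].
    """
    # Pool all (event time, terminating time) pairs across subjects.
    pairs = [(tij, yi) for times, yi in zip(event_times, y) for tij in times]
    estimate = 1.0
    for s in sorted({tij for tij, _ in pairs}):   # distinct ordered event times
        if s > t:
            d = sum(1 for tij, _ in pairs if tij == s)        # events at s
            r = sum(1 for tij, yi in pairs if tij <= s <= yi)  # events "at risk" at s
            estimate *= 1.0 - d / r
    return estimate
```

For instance, with three subjects having event times [1, 3], [2], and [] and terminating times 4, 3, and 5, the estimate is 1 at t = 3, 2/3 at t = 2, 1/3 at t = 1, and 0 at t = 0. The estimate equals 1 at or beyond the largest observed event time, in line with the normalization Λ0(T0) = 1.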

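It follows from E[mi | Xi, Yi, Zi] = Zi exp(Xiα)Λ0(Yi) that

$$E\left[m_i\Lambda_0^{-1}(Y_i) \mid X_i, Y_i\right] = E\left[E[m_i \mid X_i, Y_i, Z_i]\,\Lambda_0^{-1}(Y_i) \mid X_i, Y_i\right] = \mu_Z\exp(X_i\alpha).$$

Thus a class of estimating equations for α is defined as

$$n^{-1}\sum_{i=1}^{n} w_i \bar{X}_i^{T}\left(m_i\Lambda_0(Y_i)^{-1} - \exp(\bar{X}_i\gamma)\right) = 0, \tag{1}$$

where $\bar{X}_i = (1, X_i)$, γT = (ln(μZ), αT), and wi is a weight function depending on (Xi, γ, Λ0). An estimate of α, denoted by α̂, can be obtained by solving the estimating equation with Λ0(Yi) replaced by Λ̂0(Yi).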
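To make estimating equation (1) concrete, the following sketch solves it by Newton iterations in the simplest setting: a single scalar covariate and unit weights wi = 1. Both simplifications, the function name, and the handling of subjects with Λ̂0(Yi) = 0 through the convention 0/0 = 0 are assumptions made here for illustration only.

```python
import math


def solve_alpha(x, m, lam0_at_y, n_iter=50, tol=1e-10):
    """Solve sum_i (1, x_i)^T { m_i / Lambda0(Y_i) - exp(g0 + g1 * x_i) } = 0.

    x:         scalar covariates (not all equal).
    m:         numbers of observed recurrent events.
    lam0_at_y: estimated baseline cumulative intensity at each Y_i.
    Returns (estimate of mu_Z = exp(g0), estimate of alpha = g1).
    """
    # Ratio m_i / Lambda0(Y_i); set to 0 when Lambda0(Y_i) = 0 (assumed convention).
    r = [mi / li if li > 0 else 0.0 for mi, li in zip(m, lam0_at_y)]
    g0, g1 = math.log(sum(r) / len(r)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(g0 + g1 * xi) for xi in x]
        # Estimating function (two components).
        s0 = sum(ri - ui for ri, ui in zip(r, mu))
        s1 = sum(xi * (ri - ui) for xi, ri, ui in zip(x, r, mu))
        # Negative Jacobian, a symmetric 2 x 2 matrix.
        j00 = sum(mu)
        j01 = sum(xi * ui for xi, ui in zip(x, mu))
        j11 = sum(xi * xi * ui for xi, ui in zip(x, mu))
        det = j00 * j11 - j01 * j01
        d0 = (j11 * s0 - j01 * s1) / det
        d1 = (j00 * s1 - j01 * s0) / det
        g0, g1 = g0 + d0, g1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return math.exp(g0), g1
```

With a binary covariate the solution has a closed form: exp(g0) is the average of mi/Λ̂0(Yi) in the group with x = 0, and exp(g0 + g1) is the corresponding average in the group with x = 1, which gives a quick check of the iteration.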

It is clearly seen that the estimation focus of WQC (2001) was placed on the recurrent event process where the occurrence of the failure event is treated as a nuisance. In Section 4.2 we consider inferential results for the failure event as well as the joint model.

4.2 A Borrow-Strength Method

Let ε̂ and ε denote the sample empirical mean and the limit of the average expectation, respectively. More specifically, for any function a of (X, Y, Z, Δ), let $\hat\varepsilon\{a(X,Y,Z,\Delta)\} = n^{-1}\sum_{i=1}^{n} a(X_i,Y_i,Z_i,\Delta_i)$ and $\varepsilon\{a(X,Y,Z,\Delta)\} = \lim_{n\to\infty} n^{-1}\sum_{i=1}^{n} E[a(X_i,Y_i,Z_i,\Delta_i)]$, assuming existence of the limit.

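Conditional on {(Xi, Yi, Zi), i = 1, …, n}, under (M2) the score function derived from the partial likelihood can be expressed as

$$U(\beta) = \frac{1}{n}\sum_{i=1}^{n}\Delta_i\left\{X_i - \frac{\sum_{j=1}^{n} X_j Z_j\exp(X_j\beta)I(Y_j \ge Y_i)}{\sum_{j=1}^{n} Z_j\exp(X_j\beta)I(Y_j \ge Y_i)}\right\} I(Y_i \le T_0) = \hat\varepsilon\{X\Delta I(Y \le T_0)\} - \int_0^{T_0}\frac{\hat\varepsilon\{XZ\exp(X\beta)I(Y \ge s)\}}{\hat\varepsilon\{Z\exp(X\beta)I(Y \ge s)\}}\,d\hat\varepsilon\{\Delta I(Y \le s)\}. \tag{2}$$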

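U defines a functional of four empirical processes for each fixed β. It is known that under mild regularity conditions, U(β) converges almost surely, for each fixed β, to a deterministic limit, written here as $\mathcal{U}(\beta)$, where

$$\mathcal{U}(\beta) = \varepsilon\{X\Delta I(Y \le T_0)\} - \int_0^{T_0}\frac{\varepsilon\{XZ\exp(X\beta)I(Y \ge s)\}}{\varepsilon\{Z\exp(X\beta)I(Y \ge s)\}}\,d\varepsilon\{\Delta I(Y \le s)\}.$$

Under (M3) and minor regularity conditions, it can be proved that the two equalities

$$d\varepsilon\{\Delta I(Y \le s)\} = \varepsilon\{Z\exp(X\beta)I(Y \ge s)\}\,h_0(s)\,ds$$

and

$$d\varepsilon\{X\Delta I(Y \le s)\} = \varepsilon\{XZ\exp(X\beta)I(Y \ge s)\}\,h_0(s)\,ds$$

hold when β satisfies (M2). It follows that $\mathcal{U}(\beta) = 0$ if β is the true regression parameter. By applying the Cauchy–Schwarz inequality to the derivative of $\mathcal{U}$, it can be further shown that the true regression parameter is the unique root (zero-crossing) of $\mathcal{U}$.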

In reality, we are not able to observe the value of Z, and thus cannot use the score function U directly. Conditioning on (Xi, Yi, Zi), the expected value of mi is Zi exp(Xiα)Λ0(Yi). It is natural to estimate Zi by

$$\hat{Z}_i = \frac{m_i}{\hat\Lambda_0(Y_i)\,e^{X_i\hat\alpha}},$$

where Λ̂0(·) and α̂ are obtained from the estimation procedure discussed in the previous section. We propose a “borrow-strength estimation procedure” as follows. First, compute the individual frailty value Ẑi. Next, estimate the empirical processes in the score function (2) by plugging in (Ẑ1, …, Ẑn), and, in the final step, use this working score function to estimate β.

Note that the estimate of Λ0(t), and hence of Zi, is obtained from the entire collection of recurrent event data, and that Ẑi captures the subject-specific characteristics under model (M1)–(M3). The proposed estimator Ẑi has desirable moment properties; as we show in the next section, the two processes ε̂{Ẑ exp(Xβ)I(Y ≥ s)} and ε̂{XẐ exp(Xβ)I(Y ≥ s)} converge almost surely to the limits ε{Z exp(Xβ)I(Y ≥ s)} and ε{XZ exp(Xβ)I(Y ≥ s)}, respectively, for each fixed β. Therefore, this strength-borrowing method allows the working score function to attain the same limit, $\mathcal{U}$, as if the latent variable were observed. The zero-crossing of the working score function serves as an estimator of the zero-crossing of $\mathcal{U}$, that is, β. To be specific, the working score function Û, the plug-in version of U, is given by

$$\hat{U}(\beta) = \hat\varepsilon\{X\Delta I(Y \le T_0)\} - \int_0^{T_0}\frac{\hat\varepsilon\{X\hat{Z}\exp(X\beta)I(Y \ge s)\}}{\hat\varepsilon\{\hat{Z}\exp(X\beta)I(Y \ge s)\}}\,d\hat\varepsilon\{\Delta I(Y \le s)\}, \tag{3}$$

with the usual convention that 0/0 = 0. We show in Section 5 that Û converges to its limit $\mathcal{U}$ almost surely in a neighborhood of β. We then estimate β by β̂, where Û(β̂) = 0.
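The last two steps of the borrow-strength procedure can be sketched as follows for a scalar covariate. The code evaluates the working score (3) as a sum over observed failures and locates its zero-crossing by bisection, which relies on the score being monotone in β. The function names, the bisection bracket [-10, 10], and the restriction to a scalar covariate are assumptions of this sketch; the frailty estimates Ẑi are taken as given inputs.

```python
import math


def working_score(beta, x, y, delta, z_hat):
    """Working score (3) for scalar covariates, with the convention 0/0 = 0.

    Since Y = min(C, D, T0) never exceeds T0, the indicator I(Y <= T0) is omitted.
    """
    n = len(x)
    total = 0.0
    for i in range(n):
        if not delta[i]:
            continue
        s0 = s1 = 0.0
        for j in range(n):
            if y[j] >= y[i]:                      # subject j still under observation
                w = z_hat[j] * math.exp(x[j] * beta)
                s0 += w
                s1 += x[j] * w
        total += x[i] - (s1 / s0 if s0 > 0 else 0.0)
    return total / n


def solve_beta(x, y, delta, z_hat, lo=-10.0, hi=10.0, tol=1e-8):
    """Zero-crossing of the working score by bisection on [lo, hi]."""
    f_lo = working_score(lo, x, y, delta, z_hat)
    f_hi = working_score(hi, x, y, delta, z_hat)
    if f_lo * f_hi > 0:
        raise ValueError("working score does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = working_score(mid, x, y, delta, z_hat)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)
```

Setting every entry of `z_hat` to 1 reduces the working score to the usual partial likelihood score of the Cox model, which is a convenient way to compare the two approaches on the same data.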

If Z were observed, then the Breslow estimator H̃0(t) of the baseline cumulative hazard function, H0, would be

$$\tilde{H}_0(t) = \int_0^{t}\frac{d\hat\varepsilon\{\Delta I(Y \le s)\}}{\hat\varepsilon\{Z\exp(X\hat\beta)I(Y \ge s)\}},$$

which is a functional of two empirical processes. Under the conditional independence assumption of C and D, given (X, Z), we can show that the baseline cumulative hazard function, H0(t), is the limit of H̃0(t).

As with the estimation procedure for the regression parameters, we propose an estimator of H0(t) as

$$\hat{H}_0(t) = \int_0^{t}\frac{d\hat\varepsilon\{\Delta I(Y \le s)\}}{\hat\varepsilon\{\hat{Z}\exp(X\hat\beta)I(Y \ge s)\}}. \tag{4}$$

The limit of the estimator Ĥ0(t) can be shown to be the functional of the limits of the two processes in (4), that is, Ĥ0(t) → H0(t) almost surely. We study the asymptotic normality of the proposed estimator Ĥ0(t) in the next section.
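Because the factors n⁻¹ in the numerator and denominator of (4) cancel, the estimator is a sum over observed failure times of reciprocal weighted risk-set totals. The sketch below assumes a scalar covariate and takes β̂ and the Ẑi as given; the function name is hypothetical, and terms with an empty weighted risk set are dropped as an assumed reading of the 0/0 = 0 convention.

```python
import math


def baseline_cum_hazard(t, beta_hat, x, y, delta, z_hat):
    """Breslow-type estimate of H0(t) with estimated frailties plugged in."""
    n = len(y)
    total = 0.0
    for i in range(n):
        if delta[i] and y[i] <= t:                 # observed failure at or before t
            denom = sum(z_hat[j] * math.exp(x[j] * beta_hat)
                        for j in range(n) if y[j] >= y[i])
            if denom > 0:
                total += 1.0 / denom
    return total
```

When β̂ = 0 and all Ẑi equal 1, the function returns the Nelson–Aalen estimate; for terminating times 1, 2, 3, 4 with failures at the first three, its value at t = 2 is 1/4 + 1/3.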

5. LARGE-SAMPLE PROPERTIES

To study the large-sample properties of the proposed estimators, we impose the following regularity conditions:

  • (A1)

    Pr(Y ≥ T0, Z > 0) > 0.

  • (A2)

    X is uniformly bounded.

  • (A3)

    E[Z²] < ∞.

  • (A4)

    G(u) = E[Z I(Y ≥ u)] is a continuous function for u ∈ [0, T0].

Under these regularity conditions, the large-sample properties of Λ̂0 and α̂ were established by WQC (2001). Let tij denote the jth event time of the ith subject, and define the functions G(t) = E[Z1 I(Y1 ≥ t)], R(t) = G(t)Λ0(t), $Q(t) = \int_0^t G(u)\,d\Lambda_0(u)$, and, for i = 1, …, n,

$$b_i(t) = \sum_{j=1}^{m_i}\left\{\int_t^{T_0}\frac{I(t_{ij} \le u \le Y_i)\,dQ(u)}{R^2(u)} - \frac{I(t < t_{ij} \le T_0)}{R(t_{ij})}\right\}.$$

Under regularity conditions (A1)–(A4), it has been shown that $\hat\Lambda_0(t) - \Lambda_0(t) = n^{-1}\sum_{i=1}^{n}\Lambda_0(t)b_i(t) + o_p(n^{-1/2})$, for inf{y: Λ0(y) > 0} < t < T0, and that $\sqrt{n}\,(\hat\Lambda_0(t) - \Lambda_0(t))$ converges weakly to a normal distribution with mean 0 and variance $\Lambda_0(t)^2 E[b_1^2(t)]$.

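Define V* to be the joint probability measure of (w, X, m, Y) and

$$e_i = -\int w\,\bar{x}^{T}\,\frac{m\,b_i(y)}{\Lambda_0(y)}\,dV^{*}(w,x,m,y) + w_i\bar{x}_i^{T}\left\{m_i\Lambda_0(y_i)^{-1} - \exp(\bar{x}_i\gamma)\right\},$$

where $\bar{x} = (1, x)$. Then the left side of the estimating equation (1) can be expressed as $n^{-1}\sum_{i=1}^{n} e_i + o_p(n^{-1/2})$. Assuming that E[∂e1/∂γ] is nonsingular, we have $\sqrt{n}\,(\hat\alpha - \alpha) = n^{-1/2}\sum_{i=1}^{n} f_i(\alpha) + o_p(1)$, where fi(α) is the vector function E[−∂e1/∂γ]⁻¹ei without the first entry, and $\sqrt{n}\,(\hat\alpha - \alpha)$ converges to a multivariate normal distribution with mean 0 and variance $E[f_1^{2}]$.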

Note that in the WQC model, the baseline cumulative intensity function was not assumed to satisfy Λ0(T0) = 1 as we assumed in (M1). The aforementioned asymptotic representations have been modified to accommodate the current model assumptions.

The weak convergence of $\sqrt{n}\,(\hat\varepsilon\{X\Delta I(Y \le T_0)\} - \varepsilon\{X\Delta I(Y \le T_0)\})$ and $\sqrt{n}\,(\hat\varepsilon\{\Delta I(Y \le t)\} - \varepsilon\{\Delta I(Y \le t)\})$ follows from the classical central limit theorem and example 2.11.16 of van der Vaart and Wellner (1996). The two empirical processes converge weakly to a mean-0 normal distribution, W1, and a mean-0 Gaussian process, W2.

Furthermore, letting V denote the joint probability density function of (X, Y, m) and arguing as in the proof of theorem 1 of WQC (2001), we are able to show that $\sqrt{n}\,(\hat\varepsilon\{\hat{Z}\exp(Xb)I(Y \ge t)\} - \varepsilon\{Z\exp(Xb)I(Y \ge t)\}) = n^{-1/2}\sum_{i=1}^{n}\psi_{3i}(t;b) + o_p(1)$, where

$$\psi_{3i}(t;b) = -\int\left\{m\Lambda_0^{-1}(y)e^{x(b-\alpha)}I(y \ge t)\left(x f_i(\alpha) + b_i(y)\right)\right\}dV(x,y,m) + m_i\Lambda_0^{-1}(Y_i)e^{x_i(b-\alpha)}I(Y_i \ge t) - \varepsilon\{Ze^{Xb}I(Y \ge t)\},$$

with the usual convention 0/0 = 0. Note that the ψ3i’s are uncorrelated random variables, because ψ3i(t; b) depends only on observed data from the ith individual. It follows from the law of large numbers that $\hat\varepsilon\{\hat{Z}\exp(Xb)I(Y \ge t)\} - \varepsilon\{Z\exp(Xb)I(Y \ge t)\} \to 0$ almost surely, for each fixed b. Furthermore, by the central limit theorem, the process converges in finite dimension to a mean-0 Gaussian process W3 on the time interval [0, T0]. The explanatory variable X is assumed to be bounded, and without loss of generality we assume that X is a semi-positive definite matrix. Because items in ψ3i(t; b) are monotone processes for each b, the process ψ3i(t; b) is tight and converges weakly to W3 (see example 2.11.16 of van der Vaart and Wellner 1996). Similar arguments show that $\hat\varepsilon\{X\hat{Z}\exp(Xb)I(Y \ge t)\} - \varepsilon\{XZ\exp(Xb)I(Y \ge t)\} \to 0$ almost surely, and the process $\sqrt{n}\,(\hat\varepsilon\{X\hat{Z}\exp(Xb)I(Y \ge t)\} - \varepsilon\{XZ\exp(Xb)I(Y \ge t)\})$ has the asymptotically iid representation $n^{-1/2}\sum_{i=1}^{n}\psi_{4i}(t;b) + o_p(1)$, where ψ4i is defined by

$$\psi_{4i}(t;b) = -\int m\,x\,\Lambda_0^{-1}(y)e^{x(b-\alpha)}I(y \ge t)\left(x f_i(\alpha) + b_i(y)\right)dV(x,y,m) + m_i x_i\Lambda_0^{-1}(Y_i)e^{x_i(b-\alpha)}I(Y_i \ge t) - \varepsilon\{XZe^{Xb}I(Y \ge t)\}.$$

Moreover, the process converges weakly to a mean-0 Gaussian process, denoted by W4.

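We establish the consistency of β̂ as follows. Define the two functions

$$A_n(b) = \hat\varepsilon\{X\Delta I(Y \le T_0)\}(b - \beta) - \int_0^{T_0}\ln\left(\frac{\hat\varepsilon\{\hat{Z}\exp(Xb)I(Y \ge s)\}}{\hat\varepsilon\{\hat{Z}\exp(X\beta)I(Y \ge s)\}}\right)d\hat\varepsilon\{\Delta I(Y \le s)\}$$

and

$$A(b) = \varepsilon\{X\Delta I(Y \le T_0)\}(b - \beta) - \int_0^{T_0}\ln\left(\frac{\varepsilon\{Z\exp(Xb)I(Y \ge s)\}}{\varepsilon\{Z\exp(X\beta)I(Y \ge s)\}}\right)d\varepsilon\{\Delta I(Y \le s)\}.$$

We can easily verify that the working score Û(b) and its limit $\mathcal{U}(b)$ are the derivatives of An(b) and A(b), respectively, and that β is the unique maximum of A. Furthermore, β̂ can be shown to be the unique maximum of An.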

From the foregoing discussions, the four processes in Û have the $\sqrt{n}$-convergence rate; hence the four processes converge almost surely to their limits. Applying lemma 3 of Gill (1989) and the chain rule, we can show that the functional defined by Û is continuous with respect to the supremum norm under regularity conditions (A1)–(A4). Then, for some compact neighborhood B of β, as n → ∞, $\sup_{b \in B}|\hat{U}(b) - \mathcal{U}(b)| \to 0$ almost surely, where $\mathcal{U}$ denotes the limit of the score function. Applying Taylor expansion and using the fact that An(β) = A(β) = 0, we have that $A_n(b) - A(b) = \{\hat{U}(\beta^{*}) - \mathcal{U}(\beta^{*})\}(b - \beta)$, where β* lies between b and β. Now it is clear that as n → ∞, $\sup_{b \in B}|A_n(b) - A(b)| \to 0$ almost surely.

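Define $\hat\Gamma(b) = d\hat{U}(b)/db = d^2A_n(b)/db^2$ and $\Gamma(b) = d\mathcal{U}(b)/db = d^2A(b)/db^2$, where $\mathcal{U}$ is the limit of the score function; that is,

$$\hat\Gamma(b) = \int_0^{T_0}\left[-\frac{\hat\varepsilon\{X^{2}\hat{Z}e^{Xb}I(Y \ge s)\}}{\hat\varepsilon\{\hat{Z}e^{Xb}I(Y \ge s)\}} + \frac{\hat\varepsilon\{X\hat{Z}e^{Xb}I(Y \ge s)\}^{2}}{\hat\varepsilon\{\hat{Z}e^{Xb}I(Y \ge s)\}^{2}}\right]d\hat\varepsilon\{\Delta I(Y \le s)\}$$

and

$$\Gamma(b) = \int_0^{T_0}\left[-\frac{\varepsilon\{X^{2}Ze^{Xb}I(Y \ge s)\}}{\varepsilon\{Ze^{Xb}I(Y \ge s)\}} + \frac{\varepsilon\{XZe^{Xb}I(Y \ge s)\}^{2}}{\varepsilon\{Ze^{Xb}I(Y \ge s)\}^{2}}\right]d\varepsilon\{\Delta I(Y \le s)\}.$$

We can show that Γ̂(b) and Γ(b) are both negative definite, and it follows that An and A are concave. By Lenglart’s theorem (appendix II in Andersen and Gill 1982), the unique maximum of An, β̂, converges in probability to the unique maximum of A, that is, β. Hence, we establish the consistency of β̂.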

Note that {Ẑ1, …, Ẑn} are correlated because these values are estimated from the entire collection of recurrent event data; therefore, martingale theory does not apply to the working score function, Û. In this article we study the large-sample properties of β̂ and Ĥ0(t) by empirical process theory and the functional delta method. For convenience, we denote a² = aaT for any vector a. We present asymptotic theories in Lemmas 1–3, with the proofs given in Appendix B, and summarize these results in Theorem 1.

Lemma 1. Under regularity conditions (A1)–(A4) and the assumption that Ψ = E[∂e1/∂γ] is nonsingular, $n^{1/2}\hat{U}(\beta)$ is the sum of asymptotically uncorrelated random variables; $n^{1/2}\hat{U}(\beta) = n^{-1/2}\sum_{i=1}^{n}\psi_i(\beta) + o_p(1)$, where ψi(β) is defined in Appendix B. Moreover, $\sqrt{n}\,\hat{U}(\beta)$ converges weakly to a normal distribution with mean 0 and variance–covariance matrix Σ(β) = E[ψi(β)²].

Note that the variance–covariance matrix Σ can be consistently estimated by Σ̂(β̂), where Σ̂(β̂) is defined in Appendix B. To study the large-sample property of β̂, we further define $\Gamma(\beta) = \partial\mathcal{U}(\beta)/\partial\beta$, the derivative of the limit of the score function, and $\hat\Gamma(\beta) = \partial\hat{U}(\beta)/\partial\beta$.

Lemma 2. Assume that Ψ and Γ = Γ(β) are both nonsingular. Then, under regularity conditions (A1)–(A4), $\sqrt{n}\,(\hat\beta - \beta) = -n^{-1/2}\sum_{i=1}^{n}\Gamma^{-1}\psi_i(\beta) + o_p(1)$, where ψi(β) is defined in Appendix B. Thus $\sqrt{n}\,(\hat\beta - \beta)$ converges weakly to a normal distribution with mean 0 and variance–covariance matrix Γ⁻¹Σ(Γ⁻¹)T, which can be consistently estimated by $\hat\Gamma(\hat\beta)^{-1}\hat\Sigma(\hat\beta)\{\hat\Gamma(\hat\beta)^{-1}\}^{T}$.

Lemma 3. Under regularity conditions (A1)–(A4) and by assuming that Ψ and Γ are nonsingular, the estimator Ĥ0(t) of the cumulative hazard function can be expressed through a sum of asymptotically uncorrelated random variables, $n^{1/2}\{\hat{H}_0(t) - H_0(t)\} = n^{-1/2}\sum_{i=1}^{n}\phi_i(t) + o_p(1)$, where t ∈ [0, T0] and φi(t) is defined in Appendix B. Then $n^{1/2}\{\hat{H}_0(t) - H_0(t)\}$ converges weakly on [0, T0] to a mean-0 Gaussian process with variance–covariance function E[φ1(t1)φ1(t2)].

Along with the results stated in Section 4.1 and following directly from Lemmas 2 and 3, we state the main asymptotic theorem.

Theorem 1. Assume that Γ and Ψ are nonsingular. Under regularity conditions (A1)–(A4), for each fixed s, inf{y: Λ0(y) > 0} < s < T0, and fixed t, t ∈ [0, T0], the random vector $\sqrt{n}\,(\hat\alpha - \alpha,\ \hat\beta - \beta,\ \hat\Lambda_0(s) - \Lambda_0(s),\ \hat{H}_0(t) - H_0(t))$ converges weakly to a multivariate normal distribution with mean 0 and variance–covariance matrix $E[\eta_1^{2}]$, where the ηi’s are uncorrelated random vectors defined by ηi = (fi(α), −Γ⁻¹ψi(β), Λ0(s)bi(s), φi(t)).

6. SIMULATIONS AND DATA ANALYSIS

6.1 Monte Carlo Simulations

We conducted simulation studies to assess the performance of the proposed estimators. For all simulation studies, we generated 1,000 simulated datasets, each with n = 200 or n = 500 independent subjects. The explanatory variable X was generated from a Bernoulli distribution with P(X = 0) = P(X = 1) = .5, and the subject-specific latent variable Z was generated from either a discrete (Poisson with mean 10) or a continuous (gamma with mean 10 and variance 50) distribution. Given X = x and Z = z, the subject’s underlying recurrent event process {N(t), t ∈ [0, 10]} is a nonstationary Poisson process with the corresponding intensity function zλ0(t) exp(xα), and the subject’s failure time D has a hazard function zh0(t) exp(xβ). To examine the performance of proposed estimators under different choices of (α, β) and (λ0(·), h0(·)), we consider combinations corresponding to (α, β) = (0, 0) and (−1, −1.5) and the following two sets of functions for λ0(t) and h0(t).

  • Scenario I: λ0(t) = 1/10, and h0(t) = t/400;

  • Scenario II: λ0(t) = (t + 1)/60, and h0(t) = √t/200.

Finally, the censoring time C is either an exponential variable with mean 10 when x = 1 or an exponential variable with mean 300/z² when x = 0. Given (x, z), the triplets (N(·), D, C) are mutually independent.

Suppose that the censoring time C is the potential dropout time. The justification for such a design for the censoring variable is as follows. Suppose that the frailty is an unobserved health indicator. In the control group (X = 0), sick patients with a high occurrence rate of recurrent events drop out early due to large values of frailty; in the treatment group (X = 1), in contrast, because the treatment has effectively reduced the event occurrence rates, the dropout is noninformative for both the recurrent event process and the failure time.
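The data-generating design can be sketched in a few lines for Scenario I with gamma frailty. This is an illustrative reconstruction rather than the simulation code used for the reported results: the function name and the dictionary layout are hypothetical, the failure time is drawn by inverting the cumulative hazard z exp(xβ)t²/800 implied by h0(t) = t/400, and the recurrent events use exponential gaps because λ0(t) = 1/10 makes the conditional process homogeneous.

```python
import math
import random

T0 = 10.0  # end of the observation window [0, T0]


def simulate_subject(rng, alpha, beta):
    """Draw one subject under Scenario I with Z ~ gamma(shape 2, scale 5)."""
    x = 1 if rng.random() < 0.5 else 0
    z = rng.gammavariate(2.0, 5.0)               # mean 10, variance 50
    # Failure time: solve z * exp(x*beta) * d**2 / 800 = E with E ~ Exp(1).
    d = math.sqrt(800.0 * rng.expovariate(1.0) / (z * math.exp(x * beta)))
    # Censoring time: mean 10 if x = 1, mean 300 / z**2 if x = 0.
    c_mean = 10.0 if x == 1 else 300.0 / z ** 2
    c = rng.expovariate(1.0 / c_mean)
    y = min(c, d, T0)
    delta = 1 if d <= y else 0
    # Recurrent events: constant rate z * exp(x*alpha) / 10 on [0, y].
    rate = z * math.exp(x * alpha) / 10.0
    events = []
    t = rng.expovariate(rate)
    while t <= y:
        events.append(t)
        t += rng.expovariate(rate)
    return {"x": x, "y": y, "delta": delta, "events": events}
```

A dataset of size n is obtained by calling `simulate_subject` n times with a shared `random.Random` instance; the latent z is deliberately not returned, since it is unobserved in the analysis.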

As summarized in Table 1, the average death rate ranges from 13% to 28%, the average length of follow-up period ranges from 3.9 to 4.91, and the average number of observed recurrent events ranges from 1.57 to 3.65 in the conducted simulation studies. Note that the average follow-up time is approximately the same under different choices of (λ0, h0), but the average number of observed events is smaller under Scenario II. The results of the simulation studies are summarized in Table 2. For each simulation study, the empirical bias, standard error, and correlation coefficient of the proposed estimators were calculated based on 1,000 samples. Figures 1 and 2 show the estimates and the pointwise 95% confidence intervals of the baseline cumulative intensity function and baseline cumulative hazard function. As shown in Table 2 and Figures 1 and 2, the proposed estimators perform reasonably well; that is, the empirical biases in the estimates of the regression parameters are small, and the averages of Λ̂0(t) and Ĥ0(t) are almost indistinguishable from the true curves. Note that the parameter estimates under Z ~ poisson(10) have smaller standard errors than those under Z ~ gamma(2, 5), and the empirical correlation coefficients between α̂ and β̂ are smaller under the assumed Poisson distribution; this is because the poisson(10) distribution has smaller variability than the gamma(2, 5) distribution; that is, the defined population is more homogeneous under Z ~ poisson(10).

Table 1.

Summary of the Simulated Data

                  Z ~ poisson(10)              Z ~ gamma(2, 5)
(α, β)          P(death)    Y      m         P(death)    Y      m

Scenario I: λ0(t) = 1/10, h0(t) = t/400
(0, 0)            .27     3.90   3.64          .26     4.44   3.40
(−1, −1.5)        .14     4.41   2.34          .14     4.91   2.19

Scenario II: λ0(t) = (t + 1)/60, h0(t) = √t/200
(0, 0)            .28     3.80   2.40          .26     4.35   2.26
(−1, −1.5)        .15     4.36   1.59          .14     4.87   1.58

NOTE: P(death) is the average death rate; Y is the average terminating time; m is the average number of recurrent events.

Table 2.

Summary Statistics of the Simulation Studies

                             Z ~ poisson(10)                                  Z ~ gamma(2, 5)
(α, β)            Bα    Vα    Bβ    Vβ     ρ  Avg β^*   Vβ*        Bα    Vα    Bβ    Vβ     ρ  Avg β^*   Vβ*
Scenario I: λ0(t) = 1/10, h0(t) = t/400
n = 200
  (0, 0)           0   181    −3   383   .43     .383   334         4   274    −4   438   .61     .584   331
  (−1, −1.5)      −8   255   −25   487   .49   −1.228   406        −5   277   −25   484   .50    −.769   395
n = 500
  (0, 0)          −3   151    −1   243   .58     .244   191         7   199    21   283   .67     .599   207
  (−1, −1.5)       0   176    −6   307   .54   −1.211   249       −13   212    −1   324   .63    −.752   244
Scenario II: λ0(t) = (t + 1)/60, h0(t) = t/200
n = 200
  (0, 0)         −13   412     2   496   .75     .211   305        17   511    23   608   .83     .539   319
  (−1, −1.5)     −38   422   −44   558   .64   −1.253   400       −51   553   −17   681   .78    −.846   410
n = 500
  (0, 0)          10   243    15   303   .72     .208   197         4   358    16   399   .86     .538   192
  (−1, −1.5)     −23   287   −16   373   .71   −1.246   246       −18   369   −15   445   .81    −.822   246

NOTE: Bα and Bβ are the empirical bias (× 1,000) of α^ and β^; Vα and Vβ are the empirical standard error (× 1,000) of α^ and β^; ρ is the empirical correlation coefficient of α^ and β^; Avg β^* is the empirical average and Vβ* is the empirical standard error (× 1,000) of the estimator based on the Cox proportional hazards model.

Figure 1.

Plots of Estimated Λ^0(t) and H^0(t) With Pointwise 95% Confidence Intervals for n = 200. Scenario I: λ0(t) = 1/10, h0(t) = t/400; Scenario II: λ0(t) = (t + 1)/60, h0(t) = t/200 (solid line, true curve; dashed line, empirical average; dotted line, pointwise 95% confidence intervals).

Figure 2.

Plots of Estimated Λ^0(t) and H^0(t) With Pointwise 95% Confidence Intervals for n = 500. Scenario I: λ0(t) = 1/10, h0(t) = t/400; Scenario II: λ0(t) = (t + 1)/60, h0(t) = t/200 (solid line, true curve; dashed line, empirical average; dotted line, pointwise 95% confidence intervals).

With data generated by model (M1)–(M3), it is interesting to see results from the use of a popular but incorrect model, that is, the proportional hazards model, h(t) = h0(t)exp(xβ), for the failure time data. Using the partial likelihood method (Cox 1972), Table 2 also reports the average and empirical standard error of the 1,000 estimates of β*. Note that using the Cox proportional hazards model, which incorrectly assumes independent censoring, results in biased estimation of the treatment effect. This phenomenon can be explained as follows. In the simulated control group (X = 0), sicker patients with higher hazards tend to drop out at earlier times; thus risk sets are likely to consist of healthier patients at later time points. As a result, the estimates given by the Cox proportional hazards model, which are based on comparisons of subjects within risk sets, underestimate the treatment effect when treatment reduces the risk of death, and they suggest that treatment is associated with an increased risk of death when the treatment does not affect the mortality rate.

6.2 Data Analysis

A Danish registry dataset recorded initial and recurrent hospitalizations and associated patient information for 8,811 patients whose first schizophrenia-related hospitalization occurred between April 1, 1970 and March 25, 1988 (Eaton et al. 1992a,b). The catchment area for the register is the entire nation of Denmark. The dataset provides a large collection of repeated psychiatric measurements as well as recorded hospitalization episodes. All death records in Denmark are linked into the register.

Table 3 summarizes numbers of hospital admissions and deaths for subgroups by gender and age of onset. Comparing crude proportions seems to suggest that patients whose first hospitalization occurred after age 20 tend to have fewer hospitalizations but are more likely to die before the end of study. The hospitalizations and survival experiences do not look very different in males and females based on these summary statistics.

Table 3.

Hospital Admissions and Deaths for Different Subgroups

                    No. of     No. of        No. of hospital admissions since entry
Subgroup           patients    deaths       0       1      2      3      4      5     ≥6
Male                 3,318       368       984     581    394    331    200    157     671
  (%)                  100      11.1      29.7    17.5   11.9     10      6    4.7    20.2
Female               5,493       685     1,392     945    636    470    363    279   1,408
  (%)                  100      12.5      25.3    17.2   11.6    8.6    6.6    5.1    25.6
Onset age ≤20        1,065        76       187     130    144     90     82     59     373
  (%)                  100       7.1      17.6    12.2   13.5    8.5    7.7    5.5    35.0
Onset age >20        7,746       977     2,189   1,396    886    711    481    377   1,706
  (%)                  100      12.6      28.3    18.0   11.4    9.2    6.2    4.9    22.0

We apply the proposed joint model to the Denmark schizophrenia cohort data and investigate the effects of gender and age of onset on the rate of hospitalization and the risk of death. The gender indicator is set to be 1 for male and 0 for female, and the indicator of age of onset is set to be 1 for onset at 20 years of age or younger and 0 for onset after age 20, matching the subgroups in Table 3.

To estimate the standard errors of α^, β^, Λ^0(t), and H^0(t) at selected time points, we adopted a nonparametric bootstrap method for clustered data by repeatedly sampling 8,811 subjects with replacement, using subject as the sampling unit, from the schizophrenia cohort data. The results of the data analysis are summarized in Table 4. Estimates of Λ0(t) and H0(t) and their pointwise 95% bootstrap confidence intervals are given in Figure 3. Table 4 shows that patients with early onset (age ≤ 20 years) are hospitalized more often (21% higher) and have a lower risk of death (57% lower) than patients with later onset. Moreover, being male decreases one’s rate of hospitalization episodes by 16% (≈ 1 − e^{−.18}) and risk of death by 10% (≈ 1 − e^{−.11}). The estimated covariate effects are statistically significant, except for the gender effect on the risk of death, which is marginally significant. It is interesting to see that age of onset has opposite effects on the hospital admission rate and the hazard of death; this is not surprising, however, because young patients tend to have longer life expectancy. Also, the analysis confirms the theory in schizophrenia that patients with early onset age tend to be hospitalized more often than those with later onset age.
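The percentage statements above follow from exponentiating the regression coefficients. The short Python sketch below reproduces that arithmetic from the point estimates reported in Table 4; the dictionary layout and the function name `percent_change` are illustrative choices, not part of the original analysis.

```python
import math

# Point estimates copied from Table 4 (log rate ratio / log hazard ratio scale).
estimates = {
    ("hospital admissions", "onset age <= 20"): 0.19,
    ("hospital admissions", "male"): -0.18,
    ("death", "onset age <= 20"): -0.84,
    ("death", "male"): -0.11,
}

def percent_change(coef):
    """Percent change in the rate or hazard implied by a log-scale coefficient."""
    return 100.0 * (math.exp(coef) - 1.0)

changes = {key: percent_change(coef) for key, coef in estimates.items()}

for (outcome, factor), pct in changes.items():
    print(f"{outcome:20s} {factor:16s} {pct:+.1f}%")
```

Rounded to whole percentages, these values correspond to the 21% higher admission rate and 57% lower risk of death for early onset, and the 16% and 10% reductions for males.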

Table 4.

Summary of Denmark PCR Data Analysis

Risk factor             Estimate     SE     95% bootstrap CI
Hospital admissions
  Onset age ≤ 20           .19      .04     (.10, .27)
  Gender                  −.18      .04     (−.26, −.09)
Death
  Onset age ≤ 20          −.84      .13     (−1.10, −.62)
  Gender                  −.11      .07     (−.25, .01)

NOTE: SE, standard error of estimates from the 200 bootstrap samples; 95% bootstrap CI, (2.5%, 97.5%) quantiles of the 200 estimates.
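The subject-level bootstrap behind these standard errors and percentile intervals can be sketched as follows. This is a minimal illustration, not the authors' code: the function `subject_bootstrap`, the toy data, and the stand-in estimator `crude_rate` are all hypothetical, and in the actual analysis the estimator would be the full joint-model fit.

```python
import numpy as np

def subject_bootstrap(subjects, estimator, n_boot=200, seed=0):
    """Resample whole subjects with replacement and re-apply the estimator.

    Returns the bootstrap standard error and the (2.5%, 97.5%) percentile
    interval of the bootstrap estimates.
    """
    rng = np.random.default_rng(seed)
    n = len(subjects)
    boot = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # subject is the sampling unit
        boot.append(estimator([subjects[i] for i in idx]))
    boot = np.asarray(boot, dtype=float)
    se = boot.std(axis=0, ddof=1)
    lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)
    return se, lower, upper

# Toy data (hypothetical): each subject carries a follow-up time and an event count.
rng = np.random.default_rng(0)
subjects = []
for _ in range(100):
    y = rng.uniform(1.0, 5.0)
    subjects.append({"followup": y, "n_events": int(rng.poisson(0.8 * y))})

def crude_rate(sample):
    """Stand-in estimator: total events divided by total follow-up time."""
    return sum(s["n_events"] for s in sample) / sum(s["followup"] for s in sample)

se, lower, upper = subject_bootstrap(subjects, crude_rate, n_boot=200, seed=1)
```

Resampling subjects rather than individual events keeps each subject's recurrent events and failure time together, so the within-subject dependence is preserved in every bootstrap sample.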

Figure 3.

Plots of Λ^0(t) and H^0(t) for the Denmark Schizophrenia Cohort Data, With Pointwise 95% Bootstrap Confidence Intervals. (a) Baseline cumulative rate function; (b) baseline cumulative hazard function (solid line, estimates; dotted line, pointwise 95% CI).

In the case of a degenerate frailty, the Cox proportional hazards model gives estimates of −.74 [standard error (SE) = .12] and −.14 (SE = .06) for the effects of early onset and gender. The directions of the covariate effects estimated under the Cox proportional hazards model are consistent with the estimates under the proposed model.

7. DISCUSSION

Frailty models are commonly adopted in modeling multivariate survival time data (Clayton 1978; Oakes 1982) and in jointly modeling repeated measures and survival time data (Henderson, Diggle, and Dobson 2000; Lin, Turnbull, McCulloch, and Slate 2002). In this article, we propose a semiparametric joint model for the recurrent event process and failure time data. A latent variable (frailty) is assumed to act as a multiplicative factor in both the intensity function and the hazard function, and hence induces the correlation between the event process and the failure time. Unlike the usual setting of frailty models, where a parametric distribution is assumed for the frailty, a specific feature of our model is that the frailty distribution is treated as a nuisance parameter and no parametric assumptions are imposed. Additionally, via the use of frailty, the proposed model relaxes the independent censoring condition for observing both the recurrent event process and the failure time data.

For a semiparametric model like (M1)–(M3), model checking is expected to be a difficult task in general. In this article we do not intend to develop methods for formal model checking; we simply suggest possible approaches for validating model assumptions. A rigorous study of model checking methods will be done elsewhere. To test the assumption of a common baseline intensity function shared by all subjects, we use the fact that, under (M1) and conditioning on (mi, xi, yi, zi), the tij are iid with the cdf F(t)I(0 ≤ t ≤ yi)/F(yi). Define Vij = F(tij)I(0 ≤ tij ≤ yi)/F(yi); then the Vij are order statistics of iid uniform(0, 1) random variables. Let V^ij = F^(tij)I(0 ≤ tij ≤ yi)/F^(yi); then a necessary condition for the assumption of a common intensity function is that the empirical distribution of {V^ij: j = 1, …, mi; i = 1, …, n} is approximately uniform(0, 1). To check the proportional rate and hazards assumptions imposed by (M1) and (M2), replace Z with Z^ to derive the Schoenfeld residuals (Schoenfeld 1982). If the assumption of proportional hazards holds, then the derived residuals are expected to be randomly scattered around 0 and to gradually converge to 0 over time.
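The uniformity check on the V^ij can be sketched in a few lines of Python. The sketch below is an informal diagnostic under stated assumptions: the function names are hypothetical, the toy data use a constant baseline intensity so that F(t) = t/T0, and the true F is plugged in where an estimate F^ would be used in practice.

```python
import numpy as np

def truncated_scores(event_times, followup, F_hat):
    """Pool V_ij = F_hat(t_ij) / F_hat(y_i) over events with 0 <= t_ij <= y_i."""
    scores = []
    for times_i, y_i in zip(event_times, followup):
        times_i = np.asarray(times_i, dtype=float)
        keep = (times_i >= 0) & (times_i <= y_i)
        scores.extend(F_hat(times_i[keep]) / F_hat(y_i))
    return np.asarray(scores)

def ks_distance_to_uniform(v):
    """Kolmogorov-Smirnov distance between the empirical cdf of v and uniform(0, 1)."""
    v = np.sort(v)
    n = len(v)
    above = np.arange(1, n + 1) / n - v
    below = v - np.arange(0, n) / n
    return float(max(above.max(), below.max()))

# Toy data (hypothetical): constant baseline intensity, so F(t) = t / T0.
rng = np.random.default_rng(1)
T0 = 10.0
F_true = lambda t: np.asarray(t, dtype=float) / T0
followup = rng.uniform(2.0, T0, size=300)
counts = rng.poisson(followup)                    # one event per unit time on average
event_times = [rng.uniform(0.0, y, size=m) for y, m in zip(followup, counts)]

v = truncated_scores(event_times, followup, F_true)
ks = ks_distance_to_uniform(v)
print(f"{len(v)} pooled scores, KS distance to uniform(0, 1): {ks:.3f}")
```

A small distance is consistent with a shared baseline intensity. Because F^ is estimated from the same data in practice, the distance is better read as a descriptive summary, for example alongside a histogram of the V^ij, than as a formal test statistic.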

In this article we proposed a borrow-strength procedure that first estimates the value of the latent variable from recurrent event data and then uses the estimated value in the failure time model. The central idea of the estimation is to use moment properties of Z^ so that the partial score functions, with Z or Z^, converge to the same limiting function. The proposed Z^ requires no parametric assumption on Z and is easy to compute. An alternative would be to estimate Z by the posterior mean of Z given the observed recurrent event data; however, as shown in Proposition 1, the posterior mean does not have an explicit form in our model setting, and thus is not a useful choice in theory or application.

The proposed estimation procedure is not without constraints. It is applicable only to time-independent covariates. In some applications, it would be desirable to develop estimation procedures that allow for both time-invariant and time-dependent covariates. Also, the propositions and trajectories described in Section 3 help understand the general relationship between the recurrent event process and the failure time. However, the probability formulas established in Section 3 cannot be made explicit unless the unknown parameters in the formulas are known or estimable, and accomplishing such a task requires more parametric modeling and alternative estimation procedures. Such work will be considered elsewhere in the future. Finally, the proposed time-to-events models assume that a common baseline intensity/rate function is shared by all subjects and that the intensity/rate function does not change after the occurrence of an event. To characterize the possible change in the risk of event occurrence after each event time, techniques for time-between-events models by, say, Prentice et al. (1981) and Chang and Wang (1999), can be adopted.

Acknowledgments

The content of this article is based on the first author’s Ph.D. dissertation conducted at Johns Hopkins University. Part of the research was supported by National Institutes of Health grants R01 HD38209 and R01 MH56639. The authors thank Preben Bo Mortensen and William Eaton for generously providing anonymous Denmark schizophrenia data for illustrating the proposed methods.

APPENDIX A: PROOFS OF PROPOSITIONS

Proposition 1

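The probability density function of the event history $\mathcal{H}(y)$, given the value of the frailty and the termination time, can be expressed as

$$
f(\mathcal{H}(y) \mid z, y) = f(\mathcal{H}(y) \mid N(y), z, y)\, f(N(y) \mid z, y) = \left( \prod_{j=1}^{N(y)} \frac{\lambda_0(t_j)}{\Lambda_0(y)} \right) f(N(y) \mid z, y),
$$

where $f(N(y) \mid z, y)$ is the probability density function of the number of observed recurrent events given the value of the frailty and the termination time. Consequently,

$$
f(\mathcal{H}(y) \mid y) = \int f(\mathcal{H}(y) \mid z, y)\, f(z \mid y)\, dz = \left( \prod_{j=1}^{N(y)} \frac{\lambda_0(t_j)}{\Lambda_0(y)} \right) f(N(y) \mid y).
$$

Thus we can write the posterior mean of Z, given the observed recurrent event data, as

$$
\begin{aligned}
E[Z \mid \mathcal{H}(y), y]
&= \int z\, \frac{f(\mathcal{H}(y) \mid z, y)\, f(z \mid y)}{f(\mathcal{H}(y) \mid y)}\, dz
 = \int z\, \frac{f(N(y) \mid z, y)\, f(z \mid y)}{f(N(y) \mid y)}\, dz \\
&= \int z\, \exp(-z\Lambda_0(y))\, \frac{(z\Lambda_0(y))^{N(y)}}{N(y)!}\, \frac{f(z \mid y)}{f(N(y) \mid y)}\, dz \\
&= \frac{N(y)+1}{\Lambda_0(y)} \int \frac{f(N(y)+1 \mid z, y)\, f(z \mid y)}{f(N(y) \mid y)}\, dz
 = \frac{N(y)+1}{\Lambda_0(y)} \cdot \frac{f(N(y)+1 \mid y)}{f(N(y) \mid y)}.
\end{aligned}
$$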
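The final identity can be checked numerically in a special case. The sketch below assumes, purely for illustration, a gamma(shape, scale) frailty that does not depend on y, with N(y) given z following a Poisson distribution with mean zΛ0(y); the proposed model makes no such parametric assumption, and all function names are hypothetical. In this conjugate case the marginal count distribution is negative binomial and the posterior mean has a closed form to compare against.

```python
import math

def marginal_count_pmf(n, shape, scale, cum_intensity):
    """f(N = n | y) for Z ~ gamma(shape, scale) and N | z ~ Poisson(z * Lambda0(y))."""
    a = scale * cum_intensity
    log_pmf = (math.lgamma(shape + n) - math.lgamma(shape) - math.lgamma(n + 1)
               + n * math.log(a) - (shape + n) * math.log1p(a))
    return math.exp(log_pmf)

def posterior_mean_via_ratio(n, shape, scale, cum_intensity):
    """Posterior mean from the identity: (N + 1) / Lambda0(y) * f(N + 1 | y) / f(N | y)."""
    ratio = (marginal_count_pmf(n + 1, shape, scale, cum_intensity)
             / marginal_count_pmf(n, shape, scale, cum_intensity))
    return (n + 1) / cum_intensity * ratio

def posterior_mean_conjugate(n, shape, scale, cum_intensity):
    """Closed-form posterior mean under the gamma-Poisson conjugate pair."""
    return (shape + n) * scale / (1.0 + scale * cum_intensity)

# Illustrative values only: shape 2, scale 5, Lambda0(y) = 0.4.
checks = [(n,
           posterior_mean_via_ratio(n, 2.0, 5.0, 0.4),
           posterior_mean_conjugate(n, 2.0, 5.0, 0.4)) for n in range(6)]
for n, via_ratio, closed_form in checks:
    print(n, round(via_ratio, 6), round(closed_form, 6))
```

The two columns agree in this parametric case. Without a specified frailty distribution the marginal densities f(N(y) | y) are unknown, which is why the posterior mean has no explicit form in the semiparametric setting.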

Proposition 2

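For $0 \le t \le t + s \le T_0$, the survival function of the residual lifetime after time t, given the event history before and up to time t, can be expressed as $P(D \ge t+s \mid \mathcal{H}(t), D \ge t) = P(D \ge t+s, \mathcal{H}(t)) / P(D \ge t, \mathcal{H}(t))$, where

$$
\begin{aligned}
P(D \ge t+s, \mathcal{H}(t))
&= \int P(D \ge t+s \mid z)\, f(\mathcal{H}(t) \mid z)\, f_Z(z)\, dz \\
&= \int P(D \ge t+s \mid z)\, f(\mathcal{H}(t) \mid N(t), z)\, f(N(t) \mid z)\, f_Z(z)\, dz \\
&= \int e^{-zH_0(t+s)} \left( \prod_{j=1}^{N(t)} \frac{\lambda_0(t_j)}{\Lambda_0(t)} \right) e^{-z\Lambda_0(t)}\, \frac{(z\Lambda_0(t))^{N(t)}}{N(t)!}\, f_Z(z)\, dz \\
&= \frac{1}{N(t)!} \prod_{j=1}^{N(t)} \lambda_0(t_j)\; E\!\left[ e^{-Z\{H_0(t+s)+\Lambda_0(t)\}} Z^{N(t)} \right],
\end{aligned}
$$

and, similarly,

$$
P(D \ge t, \mathcal{H}(t)) = (N(t)!)^{-1} \prod_{j=1}^{N(t)} \lambda_0(t_j)\; E\!\left[ e^{-Z\{H_0(t)+\Lambda_0(t)\}} Z^{N(t)} \right].
$$

The formula then simplifies to

$$
P(D \ge t+s \mid \mathcal{H}(t), D \ge t) = \frac{E\!\left[ e^{-Z\{H_0(t+s)+\Lambda_0(t)\}} Z^{N(t)} \right]}{E\!\left[ e^{-Z\{H_0(t)+\Lambda_0(t)\}} Z^{N(t)} \right]}.
$$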

Proposition 4

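Following (M3), the mean function of the recurrent event process conditional on the failure time can be expressed as

$$
E[N(s) \mid D \ge t] = \int E[N(s) \mid z]\, \frac{P(D \ge t \mid z)}{P(D \ge t)}\, f_Z(z)\, dz = \int z\Lambda_0(s)\, \frac{e^{-zH_0(t)}}{E[e^{-ZH_0(t)}]}\, f_Z(z)\, dz = \frac{E[Ze^{-ZH_0(t)}]}{E[e^{-ZH_0(t)}]}\, \Lambda_0(s). \qquad \text{(A.1)}
$$

The partial derivative of the right side of (A.1) with respect to t can be derived as

$$
\frac{E[Ze^{-ZH_0(t)}]^2 - E[Z^2e^{-ZH_0(t)}]\, E[e^{-ZH_0(t)}]}{E[e^{-ZH_0(t)}]^2}\; h_0(t)\, \Lambda_0(s).
$$

The partial derivative is nonpositive by the Cauchy–Schwarz inequality applied to $Ze^{-ZH_0(t)/2}$ and $e^{-ZH_0(t)/2}$, which gives $E[Ze^{-ZH_0(t)}]^2 \le E[Z^2e^{-ZH_0(t)}]\, E[e^{-ZH_0(t)}]$. As a result, the mean function in (A.1) is decreasing in t for $t \ge s$.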

APPENDIX B: PROOFS OF LEMMAS

Proof of Lemma 1

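Straightforward algebra yields

$$
n^{1/2}\hat{U}(\beta) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \int_0^{T_0} \left[ X_i - \frac{\hat{\varepsilon}\{X\hat{Z}e^{X\beta}I(Y \ge s)\}}{\hat{\varepsilon}\{\hat{Z}e^{X\beta}I(Y \ge s)\}} \right] \times \left\{ d\,\Delta_i I(Y_i \le s) - \hat{Z}_i e^{X_i\beta} I(Y_i \ge s)\, h_0(s)\, ds \right\}.
$$

Because the mapping of $\hat{U}$ from the four empirical processes, under the regularity conditions, is compactly differentiable with respect to the supremum norm and the four empirical processes converge weakly to their limits, we apply the functional delta method to $\hat{U}$ and establish its asymptotic representation $n^{1/2}\hat{U}(\beta) = n^{-1/2}\sum_{i=1}^{n}\psi_i(\beta) + o_p(1)$, where

$$
\begin{aligned}
\psi_i(\beta) ={}& X_i\Delta_i I(Y_i \le T_0) - \varepsilon\{X\Delta I(Y \le T_0)\}
 + \int_0^{T_0} \psi_{3i}(s;\beta)\, \frac{\varepsilon\{XZe^{X\beta}I(Y \ge s)\}}{\varepsilon\{Ze^{X\beta}I(Y \ge s)\}^2}\, d\varepsilon\{\Delta I(Y \le s)\} \\
& - \int_0^{T_0} \frac{\psi_{4i}(s;\beta)}{\varepsilon\{Ze^{X\beta}I(Y \ge s)\}}\, d\varepsilon\{\Delta I(Y \le s)\}
 - \int_0^{T_0} \frac{\varepsilon\{XZe^{X\beta}I(Y \ge s)\}}{\varepsilon\{Ze^{X\beta}I(Y \ge s)\}}\, d\big(\Delta_i I(Y_i \le s) - \varepsilon\{\Delta I(Y \le s)\}\big).
\end{aligned}
$$

Note that the ψi’s are uncorrelated random variables, because ψi depends only on the observed data of the ith individual. Following the classical central limit theorem, $n^{1/2}\hat{U}(\beta)$ is asymptotically normal with mean 0 and variance–covariance matrix $\Sigma(\beta) = E[\psi_i(\beta)^2]$. Define $\hat{\psi}_i(\beta)$ by substituting empirical processes for their limits in ψi, and define $\hat{\Sigma}(\beta) = n^{-1}\sum_{i=1}^{n}\{\hat{\psi}_i(\beta) - \psi^{*}(\beta)\}\{\hat{\psi}_i(\beta) - \psi^{*}(\beta)\}^T$, where $\psi^{*}(\beta)$ is the average over $\hat{\psi}_1(\beta), \ldots, \hat{\psi}_n(\beta)$. It can be shown that the second moment of $\hat{\psi}_i(\beta)$ exists, and it follows from the strong law of large numbers that $\hat{\Sigma}(\beta)$ converges to its limit, Σ(β), uniformly. Arguing as in the proof for the consistency of $\hat{\beta}$, we can show that the functional defined by $\hat{\Sigma}$ satisfies $\sup_{b \in B}\|\hat{\Sigma}(b) - \Sigma(b)\| \to 0$ almost surely. By the consistency of $\hat{\beta}$, as well as the continuity of Σ(b) at β, we are able to show that $\hat{\Sigma}(\hat{\beta})$ is a consistent estimator of Σ(β). Moreover, in terms of the notation used before, we can rewrite the limit of $\sqrt{n}\hat{U}(\beta)$ as

$$
W_1 - \int_0^{T_0} W_4(s)\, h_0(s)\, ds - \int_0^{T_0} \frac{\varepsilon\{XZe^{X\beta}I(Y \ge s)\}}{\varepsilon\{Ze^{X\beta}I(Y \ge s)\}} \left\{ dW_2(s) - W_3(s)\, h_0(s)\, ds \right\}.
$$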

Proof of Lemma 2

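Define $\hat{\Gamma}(b) = d\hat{U}(b)/db$, that is,

$$
\hat{\Gamma}(b) = \int_0^{T_0} \left[ -\frac{\hat{\varepsilon}\{X^2\hat{Z}e^{Xb}I(Y \ge s)\}}{\hat{\varepsilon}\{\hat{Z}e^{Xb}I(Y \ge s)\}} + \frac{\hat{\varepsilon}\{X\hat{Z}e^{Xb}I(Y \ge s)\}^2}{\hat{\varepsilon}\{\hat{Z}e^{Xb}I(Y \ge s)\}^2} \right] d\hat{\varepsilon}\{\Delta I(Y \le s)\}.
$$

It can be shown that $\hat{\Gamma}(b)$ defines a functional of four empirical processes. Arguing as in the proof of consistency of $\hat{\beta}$, we can show that $\hat{\Gamma}(b) \to \Gamma(b)$ in a neighborhood B of β, where Γ(b) is the derivative of U and

$$
\Gamma(b) = \int_0^{T_0} \left[ -\frac{\varepsilon\{X^2Ze^{Xb}I(Y \ge s)\}}{\varepsilon\{Ze^{Xb}I(Y \ge s)\}} + \frac{\varepsilon\{XZe^{Xb}I(Y \ge s)\}^2}{\varepsilon\{Ze^{Xb}I(Y \ge s)\}^2} \right] d\varepsilon\{\Delta I(Y \le s)\}.
$$

Applying a Taylor expansion, we have $\hat{U}(\hat{\beta}) - \hat{U}(\beta) = \hat{\Gamma}(\beta^{*})(\hat{\beta} - \beta)$, where β* lies on the segment between $\hat{\beta}$ and β. In light of the consistency of $\hat{\beta}$, and therefore β*, for β, as well as the continuity of Γ(b) at β, $\hat{\Gamma}(\beta^{*})$ converges to Γ(β) almost surely. By Slutsky’s theorem, $\sqrt{n}(\hat{\beta} - \beta)$ converges to a normal distribution with mean 0 and covariance matrix $\Gamma(\beta)^{-1}\Sigma(\beta)\{\Gamma(\beta)^{-1}\}^T$, where $\Sigma(\beta) = E[\psi_1(\beta)\psi_1(\beta)^T]$. Arguing as before, Γ(β) can be consistently estimated by $\hat{\Gamma}(\hat{\beta})$, and, as a result, $\hat{\Gamma}(\hat{\beta})^{-1}\hat{\Sigma}(\hat{\beta})\{\hat{\Gamma}(\hat{\beta})^{-1}\}^T$ is a consistent variance estimator.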
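The sandwich form of this variance estimator can be sketched in Python as follows. The function name and its inputs are hypothetical: `gamma_hat` stands for the derivative matrix evaluated at the estimate and `psi_hat` for the n estimated influence terms, both assumed to have been computed already.

```python
import numpy as np

def sandwich_variance(gamma_hat, psi_hat):
    """Return Gamma^{-1} Sigma (Gamma^{-1})^T, with Sigma the centered second
    moment of the rows of psi_hat (one row per subject)."""
    gamma_hat = np.atleast_2d(np.asarray(gamma_hat, dtype=float))
    psi_hat = np.asarray(psi_hat, dtype=float)
    if psi_hat.ndim == 1:                       # scalar-parameter case
        psi_hat = psi_hat[:, None]
    centered = psi_hat - psi_hat.mean(axis=0)
    sigma_hat = centered.T @ centered / psi_hat.shape[0]
    gamma_inv = np.linalg.inv(gamma_hat)
    return gamma_inv @ sigma_hat @ gamma_inv.T

# Tiny worked case: Sigma = 1 and Gamma = -2 give a variance of 1/4.
example = sandwich_variance(-2.0, [1.0, -1.0, 1.0, -1.0])
print(example)
```

The returned matrix estimates the asymptotic variance of the root-n scaled estimator; dividing it by n gives an approximate variance for the estimate itself.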

Proof of Lemma 3

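Define the functions

$$
\hat{H}_0(t;b) = \int_0^t \frac{d\hat{\varepsilon}\{\Delta I(Y \le s)\}}{\hat{\varepsilon}\{\hat{Z}e^{Xb}I(Y \ge s)\}} \quad \text{and} \quad H_0(t;b) = \int_0^t \frac{d\varepsilon\{\Delta I(Y \le s)\}}{\varepsilon\{Ze^{Xb}I(Y \ge s)\}}.
$$

$\hat{H}_0(t;b)$ is a continuous functional of two processes because the denominator is bounded away from 0. The almost-sure convergence of the two processes can be established from the previous discussions. It can be shown that $\sup_{t \in [0,T_0],\, b \in B} |\hat{H}_0(t;b) - H_0(t;b)| \to 0$ almost surely. Then the consistency of $\hat{H}_0(t;\hat{\beta})$ for $H_0(t)$ follows from the strong consistency of $\hat{\beta}$ for β.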
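Because the empirical averages in the numerator and denominator share the factor 1/n, the estimator of the baseline cumulative hazard reduces to a sum over observed failure times of one over the weighted size of the risk set. The Python sketch below illustrates this under simplifying assumptions: a scalar covariate, frailty estimates supplied as an input rather than computed, and a hypothetical function name.

```python
import numpy as np

def baseline_cumulative_hazard(t, y, delta, x, z_hat, b):
    """Breslow-type estimate: sum over failures with Y_i <= t of
    1 / sum_j z_hat_j * exp(x_j * b) * I(Y_j >= Y_i)."""
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta)
    weights = np.asarray(z_hat, dtype=float) * np.exp(np.asarray(x, dtype=float) * b)
    total = 0.0
    for i in np.flatnonzero((delta == 1) & (y <= t)):
        at_risk = y >= y[i]                     # risk set at the failure time Y_i
        total += 1.0 / weights[at_risk].sum()
    return total

# Toy check (hypothetical data): three subjects, failures at times 1 and 3.
y = [1.0, 2.0, 3.0]
delta = [1, 0, 1]
x = [0.0, 0.0, 0.0]
z_hat = [1.0, 1.0, 1.0]
h_at_3 = baseline_cumulative_hazard(3.0, y, delta, x, z_hat, b=0.0)
h_at_1_5 = baseline_cumulative_hazard(1.5, y, delta, x, z_hat, b=0.0)
print(h_at_3, h_at_1_5)
```

With all frailty estimates equal to 1 and b = 0, as in the toy check, the sum coincides with the Nelson–Aalen estimator, which gives a convenient sanity check for an implementation.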

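A Taylor expansion of $\hat{H}_0(t;\hat{\beta})$ about β gives

$$
\hat{H}_0(t;\hat{\beta}) = \hat{H}_0(t;\beta) + \left.\frac{\partial \hat{H}_0(t;b)}{\partial b}\right|_{b=\beta_t} (\hat{\beta} - \beta), \qquad \text{(B.1)}
$$

where $\beta_t$ depends on t and lies on the line segment between $\hat{\beta}$ and β. By an argument similar to that used earlier, we can show that $\partial \hat{H}_0(t;b)/\partial b\,|_{b=\beta_t}$ converges in probability to $\partial H_0(t;b)/\partial b\,|_{b=\beta}$ for $t \in [0, T_0]$. Moreover, the functional delta method applied to $\hat{H}_0(t;\beta)$ yields

$$
\begin{aligned}
\sqrt{n}\{\hat{H}_0(t;\beta) - H_0(t;\beta)\} ={}& -\int_0^t \frac{\sqrt{n}\big(\hat{\varepsilon}\{\hat{Z}e^{X\beta}I(Y \ge s)\} - \varepsilon\{Ze^{X\beta}I(Y \ge s)\}\big)}{\varepsilon\{Ze^{X\beta}I(Y \ge s)\}^2}\, d\varepsilon\{\Delta I(Y \le s)\} \\
& + \int_0^t \frac{d\sqrt{n}\big(\hat{\varepsilon}\{\Delta I(Y \le s)\} - \varepsilon\{\Delta I(Y \le s)\}\big)}{\varepsilon\{Ze^{X\beta}I(Y \ge s)\}} + o_p(1).
\end{aligned}
$$

Following Theorem 2, $\sqrt{n}(\hat{\beta} - \beta) = -\Gamma(\beta)^{-1}\sqrt{n}\hat{U}(\beta) + o_p(1)$, and, by definition, $H_0(t;\beta) = H_0(t)$. From (B.1), the estimator of the baseline cumulative hazard function can be expressed as

$$
\sqrt{n}\{\hat{H}_0(t;\hat{\beta}) - H_0(t)\} = \sqrt{n}\{\hat{H}_0(t;\beta) - H_0(t;\beta)\} + \left.\frac{\partial \hat{H}_0(t;b)}{\partial b}\right|_{b=\beta_t} \sqrt{n}(\hat{\beta} - \beta) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\phi_i(t) + o_p(1),
$$

where $\phi_i(t)$ is defined by

$$
\phi_i(t) = -\int_0^t \frac{\psi_{3i}(s;\beta)\, d\varepsilon\{\Delta I(Y \le s)\}}{\varepsilon\{Ze^{X\beta}I(Y \ge s)\}^2} + \int_0^t \frac{d\big(\Delta_i I(Y_i \le s) - \varepsilon\{\Delta I(Y \le s)\}\big)}{\varepsilon\{Ze^{X\beta}I(Y \ge s)\}} - \frac{\partial H_0(t;\beta)}{\partial \beta}\, \Gamma(\beta)^{-1}\psi_i(\beta).
$$

Because $\phi_i(t)$ is a linear combination of monotone processes with bounded second moments, the weak convergence of $\sqrt{n}(\hat{H}_0(t;\hat{\beta}) - H_0(t))$ follows from example 2.11.16 of van der Vaart and Wellner (1996).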

Contributor Information

Chiung-Yu Huang, Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 (cyhuang@biostat.umn.edu).

Mei-Cheng Wang, Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205 (mcwang@jhsph.edu).

REFERENCES

  1. Andersen PK, Gill RD. Cox’s Regression Model for Counting Processes: A Large-Sample Study. The Annals of Statistics. 1982;10:1100–1120.
  2. Byar DP. The Veterans Administration Study of Chemoprophylaxis for Recurrent Stage I Bladder Tumors: Comparisons of Placebo, Pyridoxine, and Topical Thiotepa. In: Pavone-Maculuso M, Smith PH, Edsmyr F, editors. Bladder Tumors and Other Topics in Urological Oncology. Plenum; New York: 1980. pp. 363–370.
  3. Chang S-H, Wang M-C. Conditional Regression Analysis for Recurrence Time Data. Journal of the American Statistical Association. 1999;94:1221–1230.
  4. Clayton DG. A Model for Association in Bivariate Lifetables and Its Application in Epidemiological Studies of Family Tendency in Chronic Disease Incidence. Biometrika. 1978;65:141–151.
  5. Cox DR. Regression Models and Life Tables (with discussion). Journal of the Royal Statistical Society, Ser. B. 1972;34:187–220.
  6. Cox DR, Oakes DA. Analysis of Survival Data. Chapman & Hall; New York: 1984.
  7. Eaton WW, Mortensen PB, Herrman H, Freeman H, Bilker W, Burgess P, Wooff K. Long-Term Course of Hospitalization for Schizophrenia: Part I. Risk for Hospitalization. Schizophrenia Bulletin. 1992a;18:217–228. doi: 10.1093/schbul/18.2.217.
  8. Eaton WW, Bilker W, Haro JM, Herrman H, Mortensen PB, Freeman H, Burgess P. Long-Term Course of Hospitalization for Schizophrenia: Part II. Change With Passage of Time. Schizophrenia Bulletin. 1992b;18:229–241. doi: 10.1093/schbul/18.2.229.
  9. Ghosh D, Lin DY. Semiparametric Analysis of Recurrent Events in the Presence of Dependent Censoring. Biometrics. 2003;59:877–885. doi: 10.1111/j.0006-341x.2003.00102.x.
  10. Gill RD. Non- and Semi-Parametric Maximum Likelihood Estimators and the von Mises Method (Part 1). Scandinavian Journal of Statistics. 1989;16:97–124.
  11. Henderson R, Diggle P, Dobson A. Joint Modelling of Longitudinal Measurements and Event Time Data. Biostatistics. 2000;1:465–480. doi: 10.1093/biostatistics/1.4.465.
  12. Huang Y, Wang M-C. Frequency of Recurrent Events at Failure Time: Modeling and Inference. Journal of the American Statistical Association. 2002;98:663–670.
  13. Lancaster T, Intrator O. Panel Data With Survival: Hospitalization of HIV-Positive Patients. Journal of the American Statistical Association. 1998;93:46–53.
  14. Lin DY, Wei LI, Yang I, Ying Z. Semiparametric Regression for the Mean and Rate Functions of Recurrent Events. Journal of the Royal Statistical Society, Ser. B. 2000;62:711–730.
  15. Lin H, Turnbull BW, McCulloch CE, Slate EH. Latent Class Models for Joint Analysis of Longitudinal Biomarker and Event Process Data: Application to Longitudinal Prostate-Specific Antigen Readings and Prostate Cancer. Journal of the American Statistical Association. 2002;97:53–65.
  16. Oakes D. Bivariate Survival Models Induced by Frailties. Journal of the American Statistical Association. 1989;84:487–93.
  17. Pepe MS, Cai J. Some Graphical Displays and Marginal Regression Analyses for Recurrent Failure Times and Time-Dependent Covariates. Journal of the American Statistical Association. 1993;88:811–820.
  18. Prentice RL, Williams BJ, Peterson AV. On the Regression Analysis of Multivariate Failure Time Data. Biometrika. 1981;68:373–379.
  19. Schoenfeld D. Partial Residuals for the Proportional Hazards Regression Model. Biometrika. 1982;69:239–241.
  20. van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. Springer-Verlag; New York: 1996.
  21. Wang M-C, Jewell NP, Tsai W-Y. Asymptotic Properties of the Product Limit Estimate Under Random Truncation. The Annals of Statistics. 1986;14:1597–1650.
  22. Wang M-C, Qin J, Chiang C-T. Analyzing Recurrent Event Data With Informative Censoring. Journal of the American Statistical Association. 2001;96:1057–1065. doi: 10.1198/016214501753209031.
  23. Wassell JT, Wojciechowski WC, Landen DD. Recurrent Injury Event-Time Analysis. Statistics in Medicine. 1999;18:3355–3363. doi: 10.1002/(sici)1097-0258(19991215)18:23<3355::aid-sim322>3.0.co;2-3.
