Author manuscript; available in PMC: 2013 Jun 22.
Published in final edited form as: Int J Biostat. 2012 Jun 22;8(1):Article–15. doi: 10.1515/1557-4679.1233

Estimation in a semi-Markov transformation model

Dorota M Dabrowska 1
PMCID: PMC3405912  NIHMSID: NIHMS380985  PMID: 22740583

Abstract

Multi-state models provide a common tool for analysis of longitudinal failure time data. In biomedical applications, models of this kind are often used to describe the evolution of a disease and assume that a patient may move among a finite number of states representing different phases in the disease progression. Several authors developed extensions of the proportional hazard model for analysis of multi-state models in the presence of covariates. In this paper, we consider a general class of censored semi-Markov and modulated renewal processes and propose the use of transformation models for their analysis. Special cases include modulated renewal processes with interarrival times specified using transformation models, and semi-Markov processes with one-step transition probabilities defined using copula-transformation models. We discuss estimation of finite and infinite dimensional parameters of the model, and develop an extension of the Gaussian multiplier method for setting confidence bands for transition probabilities. A transplant outcome data set from the Center for International Blood and Marrow Transplant Research is used for illustrative purposes.

1 Introduction

We consider estimation in a semi-Markov regression model with a finite state space $\mathcal{J} = \{1, \ldots, r\}$. In the absence of covariates, the model can be described by a sequence $(T, J) = \{(T_n, J_n): n \ge 0\}$, where $T_0 < T_1 < T_2 < \ldots$ are consecutive times of entrances into the states $J_0, J_1, J_2, \ldots$, $J_n \in \mathcal{J}$. The sequence $J = \{J_n: n \ge 0\}$ of states visited forms a Markov chain and, given $J$, the sojourn times $T_1, T_2 - T_1, \ldots$ are independent with distributions depending on the adjoining states only. Alternatively, the distribution of the sojourn times $T_{n+1} - T_n$, $n \ge 0$, satisfies

$$P(T_{n+1} - T_n \le x,\ J_{n+1} = j \mid J_0, T_0, J_1, T_1, \ldots, J_n, T_n) = P(T_{n+1} - T_n \le x,\ J_{n+1} = j \mid J_n).$$

Properties of semi-Markov processes were discussed in some detail in classical papers of Pyke (1961a,b), Pyke and Schaufele (1964,1966), and textbooks of Cinlar (1975), Daley and Vere-Jones (1988), Karr (1991), Last and Brandt (1995) and Limnios and Oprisan (2001). Numerous examples of applications to areas such as reliability, insurance and finance were provided by Janssen (1999), Janssen and Manca (2006,2007) and Janssen and Limnios (2001), for instance. In such studies, it is most common to consider estimation methods assuming that a single realization of a semi-Markov process is observed over a finite time interval [0, τ] whose length tends to infinity (τ ↑ ∞). Greenwood and Wefelmeyer (1996) and Greenwood, Müller and Wefelmeyer (2004) developed a general framework for analysis of non- and semi-parametric semi-Markov processes in this setting. In particular, they studied properties of classical estimators of the jump frequency and the proportion of visits to a given state, as well as Moore and Pyke’s (1968) non-parametric estimator of the kernel of the process. Estimation of transition intensities and transition probabilities was considered by Ouhbi and Limnios (1996,1999).

In survival analysis, it is more common to consider estimation based on a large number of iid copies of a semi-Markov process observed over deterministic or random time intervals. Lagakos, Sommer and Zelen (1978), Gill (1980), Voelkel and Crowley (1984) and Phelan (1999) developed nonparametric estimators of the semi-Markov kernel of the process in the presence of random censoring. Examples of applications of these processes to analysis of survival data can be found in Commenges (1986), Keiding (1986), Dabrowska et al. (1994), Chang et al. (1994, 1999,2000), Cook and Lawless (2007), among others.

In this paper, we assume that the evolution of the process $(T_m, J_m)_{m \ge 0}$ depends also on an $R^d$-valued covariate $(Z_m)_{m \ge 0}$, $Z_m = [Z_{jm}: j \in \mathcal{J}]$, which represents either a vector of time independent covariates, or a vector of time dependent covariates changing at the successive renewal times. As an extension of the semi-Markov process to the regression setting, Cox (1973) proposed to consider a proportional hazards modulated renewal process. More precisely, let $\tilde N = \{\tilde N_j(t): t \ge 0,\ j = (j_1, j_2) \in \mathcal{J} \times \mathcal{J}\}$ be the counting process registering transitions among adjoining states of the model,

$$\tilde N_j(t) = \sum_{m \ge 0} 1(T_{m+1} \le t,\ J_{m+1} = j_2,\ J_m = j_1).$$

Cox’s model assumes that the compensator of this process, relative to the self-exciting filtration $\{\mathcal{F}_t\}_{t \ge 0}$, is given by $\tilde\Lambda_j(0) = 0$,

$$\tilde\Lambda_j(t) = \tilde\Lambda_j(T_m) + \int_0^{t - T_m} 1(J_m = j_1)\, e^{\beta^T Z_{j_1 m}}\, \Gamma_j(du)$$

for $t \in (T_m, T_{m+1}]$ and $j = (j_1, j_2) \in \mathcal{J} \times \mathcal{J}$. Here β is a regression coefficient and $\Gamma_j$ is an unknown cumulative hazard function. If covariates are time independent and $\Gamma_j(x) = \gamma_j x$, the process reduces to a Markov chain regression model. In the general case, the modulated renewal process allows the transition intensities to depend on the history of the process through the sequence of states visited and the length of time spent in each state. As a result, it has a more flexible structure than Markov chains.

The purpose of this paper is to extend Cox’s modulated process to a class of transformation models. In the case of single spell models, they provide a common alternative to the proportional hazard model. In particular, they may be more appropriate than the proportional hazard model if relative differences between covariates dissipate or diverge over time. As an extension to multistate models, we consider here a modulated renewal process assuming that the counting process Ñ has compensator given by Λj(0) = 0,

$$\tilde\Lambda_j(t) = \tilde\Lambda_j(T_m) + \int_0^{t - T_m} 1(J_m = j_1)\, \alpha_j(\Gamma_{(j_1,\cdot)}(u), \theta, Z_{j_1 m})\, \Gamma_j(du) \tag{1.1}$$

for $t \in (T_m, T_{m+1}]$ and $j = (j_1, j_2) \in \mathcal{J} \times \mathcal{J}$. For any such pair $j = (j_1, j_2)$, $\alpha_j$ is a hazard function dependent on an unknown Euclidean parameter θ and a vector of unknown increasing functions $\Gamma_{(j_1,\cdot)} = [\Gamma_j(x) = \int_0^x \gamma_j(u)\,du:\ j = (j_1, j_2) \in \mathcal{J} \times \mathcal{J},\ x \ge 0]$. The components of $\Gamma_{(j_1,\cdot)}$ depend on all states which can be reached from the state $j_1$ in one step. If covariates are time independent, then (1.1) includes as a special case renewal processes whose interarrival times satisfy common transformation models. Other choices include semi-Markov models with one-step transition probabilities defined using copula-graphic models (e.g. Zheng and Klein (1995), Rivest and Wells (2001), Lo and Wilke (2010)) or extensions of the dynamic Cox-McFadden model (Chintagunta and Prasad (1998)) combining transformation models and multinomial regression. These models are defined in more detail in Section 2, where covariates are also allowed to change at the renewal times of the process.

For purposes of estimation, we consider a modification of procedures studied by Bagdonovicius and Nikulin (1999,2004) and Dabrowska (2006) in the case of single spell transformation models. Section 3 provides properties of the estimates as well as an extension of the Gaussian multiplier method of Lin et al. (1994) for setting point-wise and simultaneous confidence bands for the unknown transformations and related parameters. In analogy to Cox’s model, the counting process Ñ has a compensator depending on the backwards recurrence time and as a result of this, it falls outside the class of multiplicative models studied by Andersen et al. (1993), for instance. In the case of Cox’s modulated renewal process or non-parametric semi-Markov models, estimation of the cumulative hazards of one-step transitions leads to a time transformation which arranges observations according to the length of time spent in each state rather than calendar time. As a result of the rearrangement of the time scale, usual counting process methods for analysis of large sample properties of stochastic integrals do not apply (Gill (1980), Oakes (1981), Oakes and Cui (1994)). To alleviate these problems, we use Hoeffding’s projection method and empirical processes in Section 5.

In Section 4, we consider a transplant outcome data set from the Center for International Blood and Marrow Transplant Research (CIBMTR). The example data set consists of patients who received an HLA-identical sibling transplant from 1995 to 2004 for acute myelogenous leukemia (AML) or acute lymphoblastic leukemia (ALL). Multistate models for analysis of the bone marrow transplant recovery process have been proposed by several authors. The early work in this area focused on competing risk models and goes back to Prentice et al. (1978), who discussed estimation of cause specific cumulative hazards in the proportional hazard model. More recent approaches towards analysis of leukemia transplant data are based on multistate models. They provide a convenient tool for evaluation of the impact of intermediate events in the transplant recovery process on the main outcome events corresponding to leukemia relapse and death in remission. However, analysis of multistate regression models leads to some difficulties in the interpretation of the results because there is no one-to-one correspondence between regression coefficients and transition probabilities. Each covariate may increase the risk of transition among some states of the model and at the same time decrease it among the others. Correspondingly, its overall impact on the outcome events is often not clear. To obviate these difficulties, Arjas and Eerola (1993) and Eerola (1994) proposed a set of graphical tools which can be used for purposes of interpretation of regression analyses based on multistate models. These included graphs of innovation gains and plots of the transition probabilities evaluated by conditioning on the follow-up history of a patient. The approach was illustrated using a proportional hazard model with time dependent covariates in Eerola (1994). Applications of these methods to proportional hazard Markov chain models were given in Klein et al. (1993), Keiding et al. (2001) and Andersen and Parme (2008), and to proportional hazard semi-Markov models in Dabrowska et al. (1993, 2006). Putter et al. (2007) discussed special cases of both models.

In this paper, we consider a data set involving patients who received either bone marrow (BMT) or peripheral blood stem cell transplant (PBSCT). Many clinical studies have reported that PBSCT may be beneficial during the early post-transplant period as it leads to faster engraftment and hematopoietic recovery than BMT (e.g. Flowers et al. 2002, Ringden et al. 2002). Several studies have also pointed out that differences between the two transplant types may dissipate over time (e.g. Friedrichs et al. 2010, Cutler et al. 2002ab). Such dissipating time effects are better captured by the proportional odds ratio model than the proportional hazard model, and in Section 5 we discuss an extension of it to semi-Markov models. In this section we also propose pointwise and simultaneous confidence bands for comparison of transition probabilities.
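The dissipating effect mentioned above can be seen in a small numerical sketch. It assumes a scalar covariate and compares the proportional hazards core α(y, θ, z) = exp(θz) with the proportional odds core α(y, θ, z) = exp(θz)/(1 + y exp(θz)), where y plays the role of the transformed time Γ(x). The function names and the value θ = log 2 are illustrative assumptions, not taken from the paper.

```python
import math

def alpha_ph(y, theta, z):
    """Proportional hazards core: does not depend on y."""
    return math.exp(theta * z)

def alpha_po(y, theta, z):
    """Proportional odds core: exp(theta*z) / (1 + y*exp(theta*z))."""
    r = math.exp(theta * z)
    return r / (1.0 + y * r)

def hazard_ratio(alpha, y, theta):
    """Ratio of hazards for z = 1 versus z = 0 at transformed time y."""
    return alpha(y, theta, 1.0) / alpha(y, theta, 0.0)

theta = math.log(2.0)  # illustrative value only
for y in (0.0, 1.0, 10.0, 100.0):
    print(y, hazard_ratio(alpha_ph, y, theta), hazard_ratio(alpha_po, y, theta))
```

Under the proportional odds core the hazard ratio starts at exp(θ) and decays towards 1 as y grows, whereas it stays constant under proportional hazards.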

2 The model

Throughout the paper we assume that $(\Omega, \mathcal{F}, P)$ is a complete probability space and $(T_m, V_m)_{m \ge 0}$ is a marked point process defined on it with marks taking on values in a separable measure space $(E, \mathcal{E})$ and enlarged by the empty mark Δ. Thus $T_0 < T_1 < \ldots < T_m < \ldots$ is a sequence of random time points registering occurrence of some events in time such that the $T_m$ are almost surely distinct and $T_m \uparrow \infty$ P-a.s. At time $T_m$ we observe a variable $V_m$ such that $V_m \in E$ if $T_m < \infty$, and $V_m = \Delta$ if $T_m = \infty$.

For any $B \in \mathcal{E}$, let $\tilde N(t, B) = \sum_{m \ge 0} 1(T_{m+1} \le t,\ V_{m+1} \in B)$ be the process counting observations falling into the set $[0, t] \times B$. The internal history of the process, $\{\mathcal{F}_t^N\}_{t \ge 0}$, represents information collected on $\tilde N$ until time t, and is given by $\mathcal{F}_t^N = \sigma(1(T_m \le s,\ V_m \in B):\ m \ge 1,\ s \le t,\ B \in \mathcal{E}) \vee \sigma(V_0)$. Let $\mathcal{F}_t = \mathcal{N} \vee \mathcal{F}_t^N$ be the self-exciting filtration associated with the process $\tilde N$, obtained by adjoining the P-null sets $\mathcal{N}$ to the internal history of the process. The compensator of the process $\tilde N$ with respect to $\{\mathcal{F}_t\}_{t \ge 0}$ is given by

$$\tilde\Lambda(t, B) = \tilde\Lambda(T_m, B) + \int_{(T_m, t] \times B} \frac{P_m(d(s, v))}{P_m([s, \infty];\ E_\Delta)} \quad \text{for } t \in (T_m, T_{m+1}],$$

where $P_m(d(s, v))$ is a version of a regular conditional distribution of $(T_{m+1}, V_{m+1})$ given $\mathcal{F}_{T_m}$ (Jacod (1975)).

In this paper we assume that the marks $V_m$ have the form $V_m = (J_m, \tilde Z_m)$, where $J_m \in \mathcal{J} = \{1, \ldots, r\}$ is a discrete variable representing the type of the event occurring at time $T_m$ and $\tilde Z_m$ are covariates taking on values in $R^d$. The covariate $\tilde Z_m$ may correspond to some measurements taken upon entrance into the state $J_m$. The process $\tilde N = [\tilde N_j,\ j = (j_1, j_2) \in \mathcal{J} \times \mathcal{J}]$,

$$\tilde N_j(t, B) = \sum_{m \ge 0} 1(T_{m+1} \le t,\ J_{m+1} = j_2,\ J_m = j_1,\ \tilde Z_{m+1} \in B),$$

has compensator given by

$$\tilde\Lambda_j(t, B) = \tilde\Lambda_j(T_m, B) + \int_0^{t - T_m} \mu_{m+1}(B, u + T_m, j)\, 1(J_m = j_1)\, \alpha_j(\Gamma_{(j_1,\cdot)}(u), \theta, Z_{j_1 m})\, \Gamma_j(du),$$

for $t \in (T_m, T_{m+1}]$. Here $\mu_{m+1}(B, T_{m+1}, J_m, J_{m+1})$ is the conditional probability of the event $\{\tilde Z_{m+1} \in B\}$ given $\sigma(\mathcal{F}_{T_m}, T_{m+1}, J_{m+1})$. Further, $Z_{j_1 m} = g_{j_1 m}(T_l, J_l, \tilde Z_l:\ l = 0, \ldots, m)$ is a fixed $R^d$-valued function, measurable with respect to $\mathcal{F}_{T_m}$. Finally, $\alpha_j$ denotes a hazard rate dependent on a Euclidean parameter θ and a vector of unknown monotone increasing functions $\Gamma_{(j_1,\cdot)} = [\Gamma_j:\ j = (j_1, j_2) \in \mathcal{J} \times \mathcal{J}]$. In particular, setting $B = R^d$ and using $\mu_{m+1}(R^d, T_{m+1}, J_m, J_{m+1})\, 1(T_{m+1} < \infty) = 1$ P-a.s., $\tilde\Lambda_j(t, R^d)$ reduces to (1.1) and represents the compensator of the “marginal” counting process

$$\tilde N_j(t) = \tilde N_j(t, R^d) = \sum_{m \ge 0} 1(T_{m+1} \le t,\ J_{m+1} = j_2,\ J_m = j_1) \tag{2.1}$$

registering transitions among the adjoining states of the model.

To give examples of the model, we assume first that the covariates are time independent. If events are of a single type ($|\mathcal{J}| = 1$), then (1.1) represents the compensator of a renewal regression model assuming that the interarrival times follow a transformation model. Thus in this case $\{\alpha(u, \theta, Z): \theta \in \Theta\}$ is a parametric family of hazard rates, and the model stipulates that, conditionally on Z, the interarrival times $X_{m+1}$ are independent and their conditional distribution has cumulative hazard function $A(\Gamma(x), \theta, Z)$.

Simple examples of multi-type processes are given by competing risk and semi-Markov regression models. In particular, a semi-Markov regression model assumes that one-step transition probabilities satisfy

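$$P(X_{m+1} \le x,\ J_{m+1} = j_2 \mid (T_\ell, J_\ell)_{\ell=0}^m, Z) = P(X_{m+1} \le x,\ J_{m+1} = j_2 \mid J_m, Z).$$

The matrix $[F_j,\ j = (j_1, j_2) \in \mathcal{J} \times \mathcal{J}]$,

$$F_j(x \mid Z) = P(X_{m+1} \le x,\ J_{m+1} = j_2 \mid J_m = j_1, Z),$$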

forms the kernel of the process. One way to define it is to consider latent variable models. Specifically, suppose that transitions originating from the state j1 have the same conditional distribution as the pair (U, V), where

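$$U = \min[U_j:\ j = (j_1, j_2) \in \mathcal{J} \times \mathcal{J}], \qquad V = [1(U = U_j):\ j = (j_1, j_2) \in \mathcal{J} \times \mathcal{J}],$$

and $[U_j:\ j = (j_1, j_2) \in \mathcal{J} \times \mathcal{J}]$ is a multivariate vector whose joint conditional survival function given Z is

$$S_{(j_1,\cdot)}(u, \theta, z) = S^0_{(j_1,\cdot)}([\Gamma_j(u_j)\, e^{\theta_j^T z}:\ j = (j_1, j_2) \in \mathcal{J} \times \mathcal{J}]).$$

Here $u = [u_j,\ j = (j_1, j_2) \in \mathcal{J} \times \mathcal{J}]$ and $S^0_{(j_1,\cdot)}$ is a known multivariate survival function with a density with respect to Lebesgue measure supported on the entire upper orthant of $R^{q_{j_1}}$, $q_{j_1} = |\{j_2: (j_1, j_2) \in \mathcal{J} \times \mathcal{J}\}|$. The functions $\alpha_j$ in (1.1) are equal to

$$-\frac{\partial}{\partial y_j} \log S^0_{(j_1,\cdot)}([y_j\, e^{\theta_j^T z}:\ j = (j_1, j_2) \in \mathcal{J} \times \mathcal{J}]).$$

With this choice the cumulative intensity (1.1) corresponds to a semi-Markov model whose kernel is given by

$$F_j(x \mid Z) = P(X_{m+1} \le x,\ J_{m+1} = j_2 \mid J_m = j_1, Z) = \int_0^x \bar F_{(j_1,\cdot)}(u \mid Z)\, \alpha_j(\Gamma_{(j_1,\cdot)}(u), \theta, Z)\, \Gamma_j(du), \tag{2.2}$$

where $j = (j_1, j_2) \in \mathcal{J} \times \mathcal{J}$ and $\bar F_{(j_1,\cdot)}(x \mid z)$ is the survival function of the sojourn time in state $j_1$,

$$\bar F_{(j_1,\cdot)}(x \mid z) = P(X_{m+1} > x \mid J_m = j_1, z) = \exp\Big[-\sum_{j = (j_1, j_2)} \int_0^x \alpha_j(\Gamma_{(j_1,\cdot)}(u), \theta, z)\, \Gamma_j(du)\Big] = S_{(j_1,\cdot)}(\Gamma_{(j_1,\cdot)}(x), \theta, z). \tag{2.3}$$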

If the state space of the process consists of one ephemeral state ($J_0 = 1$, say) and $q - 1$ absorbing states, $q \ge 3$, then the semi-Markov process reduces to a competing risk model. In this case transition probabilities (2.2) provide a regression analogue of copula-graphic models proposed for analysis of competing risks by Zheng and Klein (1995) and Rivest and Wells (2001). The special case of Archimedean copula models corresponds to the choice $S^0_{(j_1,\cdot)}(y_{(j_1,\cdot)}) = \bar S(\|y_{(j_1,\cdot)}\|_1)$, where $\bar S$ is a known survival function with a density supported on the positive half-line and $\|\cdot\|_1$ is the $\ell_1$-norm of a vector.
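For the Archimedean case, the hazards take the closed form α_j(y, θ, z) = exp(θ_j z) h̄(Σ_k y_k exp(θ_k z)), where h̄ = −(log S̄)′. The sketch below checks this against a finite-difference derivative of −log S⁰. The generator S̄(t) = (1 + ηt)^(−1/η), the value of η, the scalar covariate and all function names are assumptions made for illustration only.

```python
import numpy as np

ETA = 0.5  # hypothetical dependence parameter of the generator

def log_sbar(t):
    """log of S_bar(t) = (1 + ETA*t)**(-1/ETA), a known survival function."""
    return -np.log1p(ETA * t) / ETA

def alpha(y, theta, z):
    """alpha_j = exp(theta_j*z) * h_bar(sum_k y_k*exp(theta_k*z)), j = 1..q,
    where h_bar(t) = -d/dt log S_bar(t) = 1 / (1 + ETA*t)."""
    r = np.exp(theta * z)
    return r / (1.0 + ETA * np.dot(y, r))

def alpha_numeric(y, theta, z, eps=1e-6):
    """Central finite-difference check of -d/dy_j log S0(y * exp(theta*z))."""
    r = np.exp(theta * z)
    out = np.empty_like(y)
    for j in range(len(y)):
        e = np.zeros_like(y)
        e[j] = eps
        out[j] = -(log_sbar(np.dot(y + e, r)) - log_sbar(np.dot(y - e, r))) / (2 * eps)
    return out

y = np.array([0.3, 1.2])        # values of Gamma_j at some sojourn time
theta = np.array([0.5, -0.2])   # one coefficient per competing transition
print(alpha(y, theta, 1.0), alpha_numeric(y, theta, 1.0))
```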

Another example of a semi-Markov model is provided by the dynamic Cox-McFadden model (Chintagunta and Prasad, 1998). In this case, the distribution of the sojourn time in state $j_1 \in \mathcal{J}$ is specified by means of a transformation model for univariate failure time data, i.e. the survival function (2.3) is of the form $\bar F_{(j_1,\cdot)}(x \mid z) = \exp[-\tilde A_{j_1}(\Gamma_{j_1}(x), \theta_1, z)]$ for some univariate cumulative hazard function $\tilde A_{j_1}$. The kernel of the process is given by

$$F_j(x \mid z) = \int_0^x \pi_j(u, z, \theta_2)\, F_{(j_1,\cdot)}(du \mid z),$$

where $F_{(j_1,\cdot)}(\cdot \mid z) = 1 - \bar F_{(j_1,\cdot)}(\cdot \mid z)$ and for $j = (j_1, j_2)$,

$$\pi_j(X_{m+1}, Z, \theta_2) = P(J_{m+1} = j_2 \mid X_{m+1},\ J_m = j_1,\ Z) \tag{2.4}$$

are the one-step state transition probabilities. The state transition probabilities can be specified using multinomial regression models such as the logistic or probit model. If the state transition probabilities (2.4) do not depend on the length of the sojourn time Xm+1, the model reduces to a stationary process, i.e. conditionally on Z, the transition probabilities do not depend on m.
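As a minimal sketch of a multinomial logistic specification of (2.4), the code below computes the probabilities of the destinations reachable from a given state. The linear predictor a_k + b_k x + c_k z, the scalar covariate and the numerical coefficients are hypothetical choices, not the parametrisation used in the paper.

```python
import numpy as np

def transition_probs(x, z, theta2):
    """Multinomial logistic probabilities over the states reachable from j1.
    theta2 has one row (a_k, b_k, c_k) per destination; the linear predictor
    a_k + b_k*x + c_k*z is a hypothetical parametrisation."""
    theta2 = np.asarray(theta2, dtype=float)
    eta = theta2[:, 0] + theta2[:, 1] * x + theta2[:, 2] * z
    eta -= eta.max()            # stabilise the exponentials
    w = np.exp(eta)
    return w / w.sum()

theta2 = [[0.0, 0.0, 0.0], [0.5, -0.1, 0.3], [-1.0, 0.2, 0.0]]
p = transition_probs(x=2.0, z=1.0, theta2=theta2)
print(p, p.sum())
```

Setting every b_k to zero removes the dependence on the sojourn time x, which corresponds to the stationary special case described above.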

In practice, the assumptions of the semi-Markov process may be violated if transitions from a state j1 to a state j2 depend on the sequence or the time spent in states visited prior to the entrance into the state j1. Both models can accommodate this problem by allowing the covariates to depend on the internal history of the process. The time dependent covariates may represent for instance the total number of events occurring prior to the entrance into the state j2 or the length of time spent in states preceding entrance into the state j1. The time dependent covariates may also represent changing treatment types or levels of drugs.

We further assume that the process is subject to censoring and that the times at which the process is observed are determined by a process $C(t) = \sum_{m \ge 1} 1(C_{m-1} < t \le C_m)$, where $0 \le C_0 \le C_1 \le \ldots \le C_m \le \ldots$ is an increasing sequence such that $C_m \in [T_m, T_{m+1}]$ are stopping times with respect to a larger filtration $\{\mathcal{G}_t\}_{t \ge 0}$, $\mathcal{F}_t \subseteq \mathcal{G}_t$. If $T_m = C_m$ then no information is available on either the sojourn time $X_{m+1} = T_{m+1} - T_m$ or the marks $(V_m, V_{m+1})$. If $C_m = T_{m+1}$ then the sojourn time $X_{m+1} = T_{m+1} - T_m$ and the marks $(V_m, V_{m+1})$ are observable. Finally, if $T_m < C_m < T_{m+1}$ then the mark $V_m$ is visible while the sojourn time $X_{m+1}$ is only known to exceed $C_m - T_m$. Following Andersen et al. (1993), we assume that the compensator $\tilde\Lambda^{\mathcal{G}}$ of the marked point process $\tilde N$, relative to the filtration $\{\mathcal{G}_t\}_{t \ge 0}$, satisfies $\tilde\Lambda^{\mathcal{G}} = \tilde\Lambda$, P-a.s., and that the censoring process and the compensator $\tilde\Lambda$ depend on parameters which do not share components in common. We also make the assumption that the censoring process is monotone so that with probability 1, $T_m \le C_m < T_{m+1} \Rightarrow C_{m'} = T_{m'}$ for all $m' > m$. This condition stipulates that the process terminates once censoring takes place.

These conditions are satisfied in two common applications. The first assumes that the process is subject to censoring by a univariate failure time $T'$ such that $T'$ is independent of the sequence $(T_m, V_m)$, conditionally on the initial state of the process, $V_0$. In this case, $C_m = T_m + \min(T' - T_m, X_{m+1})\, 1(T' \ge T_m)$ and the augmented filtration is given by $\mathcal{G}_t = \mathcal{F}_t \vee \sigma(T')$.
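The first censoring scheme can be sketched as follows: given the entrance times, the visited states and a censoring time T′, the code returns the spells that remain observable, using C_m = T_m + min(T′ − T_m, X_{m+1}) for spells with T′ ≥ T_m. The function name and the tuple layout are illustrative assumptions.

```python
def observed_spells(T, J, t_cens):
    """Return (from_state, to_state_or_None, observed_sojourn, event) per spell.
    T: entrance times T_0 < T_1 < ...; J: states J_0, J_1, ...;
    t_cens: censoring time T'. The process stops at the first censored spell."""
    spells = []
    for m in range(len(T) - 1):
        if t_cens < T[m]:
            break                                   # spell m is never entered
        x = T[m + 1] - T[m]                         # sojourn time X_{m+1}
        c = T[m] + min(t_cens - T[m], x)            # censoring point C_m
        if c == T[m + 1]:
            spells.append((J[m], J[m + 1], x, True))    # transition observed
        else:
            if c > T[m]:
                # sojourn time only known to exceed C_m - T_m
                spells.append((J[m], None, c - T[m], False))
            break
    return spells

print(observed_spells([0, 2, 5, 9], [1, 2, 1, 3], t_cens=6))
```

In this example the first two transitions are fully observed and the third spell is censored after one time unit, in line with the monotone censoring assumption.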

The second example assumes that the state space of the process has an extra absorbing state corresponding to censoring, say {c}, which can be reached in one step from each transient state $j_1 \in \mathcal{J}$. The time T until entrance into the censoring state then forms a stopping time with respect to the filtration $\mathcal{G}_t = \mathcal{F}_t$. Consequently, there exist nonnegative variables $U_m$ such that on the event $\{T \ge T_m\}$, we have $T \wedge T_{m+1} = (T_m + U_m) \wedge T_{m+1}$, and $U_m$ is measurable with respect to $\mathcal{F}_{T_m}$. Correspondingly, $C_m = T_m + \min(U_m, X_{m+1})\, 1(T \ge T_m)$. In this setting, the assumption of non-informative censoring means that the compensators of one-step transitions into the censoring state depend on different parameters than the compensator of transitions among the remaining states of the model.

Let $\mathcal{J}_0 \subseteq \mathcal{J} \times \mathcal{J}$ be the set of pairs of adjacent states in the model, i.e. $j = (j_1, j_2) \in \mathcal{J}_0$ iff the subject may progress from state $j_1$ to state $j_2$ in one step. For $j = (j_1, j_2) \in \mathcal{J}_0$ and $m \ge 0$, let $N_{jm}(x) = 1(X_{m+1} \le x,\ J_m = j_1,\ J_{m+1} = j_2,\ T_{m+1} = C_m)$, $Y_{jm}(x) = 1(X_{m+1} \ge x,\ C_m - T_m \ge x,\ J_m = j_1)$ and set

$$M_{jm}(x, \theta) = N_{jm}(x) - \Lambda_{jm}(x, \theta), \qquad \Lambda_{jm}(x, \theta) = \int_0^x Y_{jm}(u)\, \alpha_j(\Gamma_{(j_1,\cdot)}(u), \theta, Z_{j_1 m})\, \Gamma_j(du).$$

The aggregate processes $N_{j.}$, $Y_{j.}$ and $M_{j.}$ are defined as $N_{j.} = \sum_m N_{jm}$, $Y_{j.} = \sum_m Y_{jm}$ and $M_{j.} = \sum_m M_{jm}$, respectively.
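The aggregate counting and at-risk processes can be evaluated directly from the observed spells of one subject. The sketch below assumes each spell is stored as a tuple (from_state, to_state_or_None, observed_sojourn, event); this layout and the function name are hypothetical.

```python
def aggregate_counts(spells, j, x):
    """spells: (from_state, to_state_or_None, sojourn, event) tuples of one subject.
    Returns (N_j.(x), Y_j.(x)) for the transition j = (j1, j2)."""
    j1, j2 = j
    # observed j1 -> j2 transitions whose sojourn time does not exceed x
    N = sum(1 for a, b, s, ev in spells if ev and a == j1 and b == j2 and s <= x)
    # spells in state j1 still under observation after a sojourn of length x
    Y = sum(1 for a, b, s, ev in spells if a == j1 and s >= x)
    return N, Y

spells = [(1, 2, 2.0, True), (2, 1, 3.0, True), (1, None, 1.0, False)]
for x in (0.5, 1.5, 2.0):
    print(x, aggregate_counts(spells, (1, 2), x))
```

Note that both processes are indexed by the time spent in the state, not by calendar time, so repeated visits to the same state contribute to the same risk set.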

Note that the model depends on two parameters, θ and Γ; however, we suppress the dependence on Γ in the notation. In analogy to single spell models in Bagdonovicius and Nikulin (1999,2004) and Dabrowska (2006), under regularity conditions stated in Section 5, we can associate, with any θ ∈ Θ, a vector Γθ of locally bounded increasing functions. For this purpose, we shall require only that the processes Nj. and Yj. have a finite expectation. To show asymptotic normality of estimates we shall require existence of the second moments of these processes. More precisely, we assume the following conditions.

Condition 2.1

For all $j \in \mathcal{J}_0$:

  1. The functions $EY_{j.}(x)$ have at most a finite number of discontinuity points and $EY_{j.}(0)^2 < \infty$.

  2. The functions $EN_{j.}(x)$ are continuous, $EN_{j.}(\tau)^2 < \infty$ and the point τ satisfies $\inf\{x: EN_{j.}(x) > 0\} < \tau < \tau_{j0}$, where $\tau_{j0} = \sup\{x: EY_{j.}(x) > 0\}$.

  3. We have $P(|Z_{J(t-),\, \tilde N_{..}(t-)}| \le C) = 1$, where C is a finite constant, $J(t)$ is the state occupied by the process at time t and $\tilde N_{..}(t) = \sum_j \tilde N_{j.}(t)$ is the total number of events observed in the interval [0, t].

Under the added assumption that the model corresponds to the censored modulated renewal process, and θ represents the true parameter, we have the following moment identities.

Lemma 2.1

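Let $L(t) = t - T_{\tilde N_{..}(t-)}$ be the backwards recurrence time of the process $\tilde N$ and let $\{\phi_m(x),\ m \ge 0,\ x \ge 0\}$ be a sequence of random functions such that the process $\phi \circ L$, $\phi \circ L(t) = \phi_{\tilde N_{..}(t-)}(t - T_{\tilde N_{..}(t-)})$, is predictable with respect to the filtration $\{\mathcal{G}_t\}_{t \ge 0}$ and $E\int_0^\infty [\phi \circ L]^2(s)\, \tilde\Lambda_j(ds, \theta) < \infty$. Then

$$E\sum_{m \ge 0} \int \phi_m(u)\, N_{jm}(du) = E\sum_{m \ge 0} \int \phi_m(u)\, \Lambda_{jm}(du, \theta), \qquad E\Big[\sum_{m \ge 0} \int \phi_m(u)\, M_{jm}(du, \theta)\Big]^2 = E\sum_{m \ge 0} \int \phi_m^2(u)\, \Lambda_{jm}(du, \theta).$$

In addition, if $\{\phi_{1m}: m \ge 0\}$ and $\{\phi_{2m}: m \ge 0\}$ are two such sequences, then

$$E\Big[\sum_{m \ge 0} \int \phi_{1m}(u)\, M_{jm}(du, \theta)\Big]\Big[\sum_{m \ge 0} \int \phi_{2m}(u)\, M_{j'm}(du, \theta)\Big] = 0$$

for pairs $j \ne j'$, $j, j' \in \mathcal{J}_0$.

Similarly to Gill (1980), this lemma follows from the dominated convergence theorem, martingale properties of the processes $\tilde M_j(t) = \tilde N_j(t) - \tilde\Lambda_j(t)$, and the identities

$$\int_0^\infty [\phi \circ L]^k(s)\, C(s)\, \tilde N_j(ds) = \sum_{m \ge 0} \int_0^\infty \phi_m^k(u)\, N_{jm}(du), \qquad \int_0^\infty [\phi \circ L]^k(s)\, C(s)\, \tilde\Lambda_j(ds, \theta) = \sum_{m \ge 0} \int_0^\infty \phi_m^k(u)\, \Lambda_{jm}(du, \theta).$$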

The identities hold almost surely for k = 1, 2. We omit the details.

3 Estimation

Throughout the remainder of this paper, we assume that we have an iid sample of size n of the censored modulated renewal process and covariates. The subscript ”i” refers to the i-th subject under study and Di represents the associated vector of observations. It corresponds to the sequence of states visited, duration of the time spent in each state, the initial covariate and its updates occurring at uncensored renewal times.

Further, let $q = |\mathcal{J}_0|$ be the total number of possible one-step transitions in the model. For each $j = 1, \ldots, q$, we let $(r(j), c(j)) = (j_1, j_2)$ if the pair $j \in \mathcal{J}_0$ corresponds to the one-step transition from state $j_1$ to the state $j_2$. For any such $j \in \mathcal{J}_0$, the covariate $Z_{j_1 m}$ is denoted as $Z_{jm}$. We shall also find it convenient to write $\Gamma = [\Gamma_1, \ldots, \Gamma_q]^T$ for the vector obtained by stacking the columns of the matrix $\Gamma = [\Gamma_j]_{j \in \mathcal{J} \times \mathcal{J}}$ on top of each other and deleting all entries corresponding to the pairs $(j_1, j_2) \notin \mathcal{J}_0$. For the sake of convenience, we shall write $\alpha_j(y, \theta, z)$ for each $j \in \mathcal{J}_0$ and $y = (y_1, \ldots, y_q)^T$, $y_j \in R_+$, $j = 1, \ldots, q$. However, it is tacitly assumed here that for $j = (j_1, j_2) \in \mathcal{J}_0$, the function $\alpha_j(y, \theta, z)$ may depend only on $y_k$’s such that $(r(k), c(k)) = (j_1, \ell)$ for some $(j_1, \ell) \in \mathcal{J}_0$.

Under assumptions stated in Section 5, the parameter θ varies over a bounded open subset Θ of $R^d$ and the functions $\ell_j(y, \theta, z) = \log \alpha_j(y, \theta, z)$, $y \in R^q$, are twice continuously differentiable with respect to $(y, \theta)$. We let $\ell'_j = (\ell_j^{\prime(1)}, \ldots, \ell_j^{\prime(q)})^T$ be a vector whose k-th component is equal to the partial derivative of $\ell_j(y, \theta, z)$ with respect to $y_k$, $k = 1, \ldots, q$. Likewise, $\dot\ell_j$ denotes the (column) vector of length d corresponding to the derivative of $\ell_j$ with respect to θ. We further set $S_j(y, \theta, x) = n^{-1} \sum_{i=1}^n \sum_m Y_{jmi}(x)\, \alpha_j(y, \theta, Z_{jmi})$, $y \in R^q$, and denote by $\dot S$, $S'$ the derivatives of these processes with respect to $(y, \theta)$. Here, $\dot S$ is a d × q matrix, whose j-th column is given by $\dot S_j(y, \theta, x)$, the derivative of $S_j$ with respect to θ. Further, $S' = [S_j^{\prime(k)}]_{j,k = 1, \ldots, q}$ is a q × q matrix, whose (k, j) entry is equal to the partial derivative $S_j^{\prime(k)}(y, \theta, x)$ of $S_j(y, \theta, x)$ with respect to $y_k$, $k = 1, \ldots, q$. Let $s$, $\dot s$ and $s'$ be the matrices of expected $S$, $\dot S$ and $S'$ processes. Finally, for each $j \in \mathcal{J}_0$, we let $N_{j..}(x) = n^{-1} \sum_{i=1}^n \sum_m N_{jmi}(x)$ be the averaged process counting transitions from the state $j_1 = r(j)$ to the state $j_2 = c(j)$ whose sojourn time in the state $j_1$ does not exceed x.

As an estimate of the unknown transformations Γ = [Γ1, …, Γq]T, we consider a vector valued analogue of the estimator proposed by Bagdonovicius and Nikulin (1999,2004) for analysis of single spell models. The estimator is given by

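$$\Gamma_{jn\theta}(x) = \int_0^x \frac{N_{j..}(du)}{S_j(\Gamma_{n\theta}(u-), \theta, u)}, \qquad \Gamma_{jn\theta}(0-) = 0, \quad \theta \in \Theta,\ x \ge 0,\ j \in \mathcal{J}_0. \tag{3.1}$$

For fixed θ, (3.1) forms a sample analogue of the non-linear vector-valued Volterra equation

$$\Gamma_{j\theta}(x) = \int_0^x \frac{EN_{j..}(du)}{s_j(\Gamma_\theta(u-), \theta, u)}, \qquad \Gamma_{j\theta}(0-) = 0, \quad x \ge 0,\ j \in \mathcal{J}_0. \tag{3.2}$$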
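Because the denominator in (3.1) involves only the left limit of the estimate, the estimator can be computed recursively over the ordered event times. The sketch below does this for spells that all leave one common state, with q competing transitions and a scalar covariate. It assumes distinct sojourn times; the function names and data layout are illustrative assumptions.

```python
import numpy as np

def gamma_hat(x, k, z, theta, alpha):
    """Recursive solution of (3.1) for spells leaving one common state.
    x: observed sojourn times; k: transition index in 0..q-1, or -1 if censored;
    z: scalar covariates; alpha(y, theta, z) -> length-q array of hazards.
    Returns the ordered event times and the q-vector Gamma after each of them."""
    x, k, z = np.asarray(x, float), np.asarray(k, int), np.asarray(z, float)
    q = k.max() + 1
    gam = np.zeros(q)                       # Gamma_{n theta}(0-) = 0
    times, path = [], []
    for i in np.argsort(x, kind="stable"):
        if k[i] < 0:
            continue                        # censored spells add no jump
        at_risk = x >= x[i]                 # Y(u) for u = x[i]
        # n * S_j(Gamma(u-), theta, u): at-risk sum of alpha_j; the 1/n cancels
        s = sum(alpha(gam, theta, zi)[k[i]] for zi in z[at_risk])
        gam = gam.copy()
        gam[k[i]] += 1.0 / s
        times.append(x[i])
        path.append(gam)
    return np.array(times), np.array(path)

const = lambda y, theta, z: np.ones_like(y)   # alpha = 1: no covariate effect
t, G = gamma_hat([1.0, 2.0, 3.0], [0, 0, 0], [0.0, 0.0, 0.0], None, const)
print(G[:, 0])
```

With α ≡ 1 the recursion reduces to the Nelson-Aalen estimator, which gives a simple check of the implementation.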

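Using arguments similar to Dabrowska (2006), we can show that under the regularity conditions stated in Section 5, the equation (3.2) has a unique solution $\Gamma_\theta = [\Gamma_{1\theta}, \ldots, \Gamma_{q\theta}]^T$ and its estimator (3.1) is uniformly consistent. Further, the function $\Theta \ni \theta \to \{\Gamma_\theta(x): x \in [0, \tau]\} \in C([0, \tau])^q$ is Fréchet differentiable with respect to θ. The derivative is a d × q matrix of continuous functions satisfying the matrix-valued linear Volterra equation

$$\dot\Gamma_\theta(x) = -\int_0^x \dot s(\Gamma_\theta(w-), \theta, w)\, C_\theta(dw) - \int_0^x \dot\Gamma_\theta(w-)\, Q_\theta(dw), \tag{3.3}$$

where $C_\theta(x)$ is the diagonal q × q matrix $C_\theta(x) = \mathrm{diag}[C_{1\theta}(x), \ldots, C_{q\theta}(x)]$ with entries

$$C_{j\theta}(x) = \int_0^x \frac{EN_{j..}(du)}{s_j^2(\Gamma_\theta(u-), \theta, u)}$$

and

$$Q_\theta(x) = \int_0^x s'(\Gamma_\theta(w-), \theta, w)\, C_\theta(dw).$$

The solution to the Volterra equation is given by

$$\dot\Gamma_\theta(x) = -\int_0^x \dot s(\Gamma_\theta(w-), \theta, w)\, C_\theta(dw)\, \mathcal{P}_\theta(w, x), \tag{3.4}$$

where $\mathcal{P}_\theta(w, x)$, $0 < w \le x$, is the Peano series (Gill and Johansen, 1990)

$$\mathcal{P}_\theta(u, x) = I + \sum_{m=1}^\infty \int_{u < w_1 < \cdots < w_m \le x} (-1)^m\, Q_\theta(dw_1) \cdots Q_\theta(dw_m). \tag{3.5}$$

Here I is the q × q identity matrix. A uniformly consistent estimate of $\{\dot\Gamma_\theta(x): x \in [0, \tau],\ \theta \in \Theta\}$ can be obtained by substituting the processes $N_{j..}$ and $S_j$, $\dot S_j$, $S'_j$ into the preceding expressions.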
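When Q is a step function, as is the case for a plug-in estimate, the Peano series (3.5) collapses to a finite ordered product of the factors I − ΔQ(w) over the jump points w in (u, x]. The sketch below checks this for two jumps; the jump matrices and the function name are arbitrary illustrative choices.

```python
import numpy as np

def peano_product(jumps):
    """P(u, x) = prod over ordered jump points w in (u, x] of (I - dQ(w)),
    for a step function Q with matrix jumps listed in increasing order of w."""
    q = jumps[0].shape[0]
    P = np.eye(q)
    for dQ in jumps:
        P = P @ (np.eye(q) - dQ)
    return P

A = np.array([[0.1, 0.0], [0.2, 0.05]])   # jump of Q at the earlier point
B = np.array([[0.0, 0.3], [0.1, 0.0]])    # jump of Q at the later point
I = np.eye(2)
series = I - (A + B) + A @ B     # terms m = 0, 1, 2 of (3.5) for two jumps
print(np.allclose(peano_product([A, B]), series))
```

The order of the factors matters because the jump matrices need not commute, which is why the jumps are multiplied in increasing order of w.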

To define the score equation for estimation of the Euclidean parameter, let

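$$e_j[f_j](u, \theta) = \frac{E\sum_m Y_{jmi}(u)\, [f_j \alpha_j](\Gamma_\theta(u), \theta, Z_{jmi})}{E\sum_m Y_{jmi}(u)\, \alpha_j(\Gamma_\theta(u), \theta, Z_{jmi})},$$

where $f_j(y, \theta, Z_{jmi})$ is a function of covariates, jointly continuous with respect to $(y, \theta)$ and bounded on every compact set of $R^q \times \Theta$. Likewise, for any two vectors $f_{1j}$ and $f_{2j}$ of such functions, define

$$\mathrm{cov}_j[f_{1j}, f_{2j}](u, \theta) = \big(e_j[f_{1j} f_{2j}^T] - e_j[f_{1j}]\, e_j[f_{2j}]^T\big)(u, \theta)$$

and set $\mathrm{var}_j[f_j](u, \theta) = \mathrm{cov}_j[f_j, f_j](u, \theta)$.

To estimate the parameter θ, we use a solution to the score equation $U_n(\theta) = U_{n\phi_n}(\theta) = o_P(n^{-1/2})$, where

$$U_{n\phi_n}(\theta) = \frac{1}{n} \sum_{i=1}^n \sum_j \sum_m \int_0^\tau \hat b_{jmi}(\Gamma_{n\theta}(u), \theta, u)\, N_{jmi}(du), \tag{3.6}$$

$\hat b_{jmi}(\Gamma_{n\theta}(u), \theta, u) = \hat b_{jm1i}(\Gamma_{n\theta}(u), \theta, u) - \phi_n(u)\, \hat b_{jm2i}(\Gamma_{n\theta}(u), \theta, u)$ and

$$\hat b_{jm1i}(y, \theta, u) = \dot\ell_j(y, \theta, Z_{jmi}) - [\dot S_j / S_j](y, \theta, u), \qquad \hat b_{jm2i}(y, \theta, u) = \ell'_j(y, \theta, Z_{jmi}) - [S'_j / S_j](y, \theta, u).$$

Here $\phi_n(x)$ is an estimate of a d × q matrix of bounded functions $\phi_\theta(x)$, whose j-th column is absolutely continuous with respect to $\Gamma_{j\theta}$.

We further define matrices

$$\begin{aligned}
\Sigma_0(\theta) &= \sum_j \int_0^\tau v_{j,\phi}(u, \theta)\, EN_{j..}(du), \\
\Sigma_1(\theta) &= \Sigma_0(\theta) + \sum_j \int_0^\tau \rho_{j,\phi}(u, \theta)\, EN_{j..}(du)\, [\dot\Gamma_\theta(u) + \phi_\theta(u)]^T, \\
\Sigma_2(\theta) &= \Sigma_0(\theta) + \int_0^\tau D_\phi(u, \theta)^T\, C_\theta(du)\, D_\phi(u, \theta),
\end{aligned}$$

where $v_{j,\phi}(u, \theta) = \mathrm{var}_j[\dot\ell_j - \phi_\theta \ell'_j](u, \theta)$, $\rho_{j,\phi}(u, \theta) = \mathrm{cov}_j[\dot\ell_j - \phi_\theta \ell'_j,\ \ell'_j](u, \theta)$ and

$$D_\phi(u, \theta) = \sum_j \int_u^\tau \mathcal{P}_\theta(u, w)\, EN_{j..}(dw)\, \rho_{j,\phi}(w, \theta)^T.$$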

Proposition 3.1

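Let $\varepsilon_n \downarrow 0$ be a sequence such that $\sqrt{n}\, \varepsilon_n \to \infty$ and let $\mathcal{B}(\theta_0, \varepsilon_n) = \{\theta: |\theta - \theta_0| \le \varepsilon_n\}$ be the ball of radius $\varepsilon_n$ centered at $\theta_0$. Suppose that the matrix $\Sigma_0(\theta_0)$ is positive definite and the matrix $\Sigma_1(\theta_0)$ is non-singular. Under conditions stated in Section 5, the score equation $U_n(\theta) = o_{P^*}(n^{-1/2})$ has a solution $\hat\theta$ in the ball $\mathcal{B}(\theta_0, \varepsilon_n)$, with (inner) probability tending to 1. Further, let $\hat\Xi = \sqrt{n}(\hat\theta - \theta_0)$ and $\hat W_0 = \sqrt{n}[(\Gamma_{n\hat\theta} - \Gamma_{\theta_0})^T - (\hat\theta - \theta_0)^T \dot\Gamma_{n\hat\theta}]$. Then $[\hat\Xi, \hat W_0]$ converges weakly in $R^d \times \ell^\infty([0, \tau] \times \mathcal{J}_0)$ to a tight mean zero Gaussian process $[\Xi, W_0]$ with covariance

$$\begin{aligned}
\mathrm{cov}\, \Xi &= \Sigma_1^{-1}(\theta_0)\, \Sigma_2(\theta_0)\, [\Sigma_1^{-1}(\theta_0)]^T, \\
\mathrm{cov}(W_0(x), W_0(x')) &= K_{\theta_0}(x, x'), \\
\mathrm{cov}(\Xi, W_0(x)) &= -\Sigma_1^{-1}(\theta_0) \sum_j \int_0^\tau \rho_{j,\phi}(u, \theta_0)\, EN_{j..}(du)\, K_{\theta_0}(u, x),
\end{aligned}$$

where $K_\theta$, $\theta \in \Theta$, is a q × q matrix

$$K_\theta(x, x') = \int_0^{x \wedge x'} \mathcal{P}_\theta^T(u, x)\, C_\theta(du)\, \mathcal{P}_\theta(u, x'). \tag{3.7}$$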

Here $\mathbb{X} = \ell^\infty([0, \tau] \times \mathcal{J}_0)$ denotes the space of bounded functions mapping the set $[0, \tau] \times \mathcal{J}_0$ into R, equipped with the uniform metric and Borel σ-field. The Borel σ-field of $\mathbb{W} = R^d \times \mathbb{X}$ is generated by open sets in the product topology of the Euclidean space $R^d$ and the space $\mathbb{X}$. It is equal to $\mathcal{B}(R^d) \otimes \mathcal{B}(\mathbb{X})$ because $R^d$ is a complete separable metric space. The process $X = (\Xi, W_0)$ has a version whose almost all paths are in the separable subspace of $\mathbb{W}$ corresponding to $R^d \times C_b([0, \tau] \times \mathcal{J}_0)$, where $C_b([0, \tau] \times \mathcal{J}_0)$ is the space of functions continuous with respect to the variance pseudometric. Weak convergence of the sequence $X_n = [\hat\Xi, \hat W_0]$ to $(\Xi, W_0)$ means that for all bounded continuous functions f on $\mathbb{W}$, we have $E^* f(X_n) - E f(X) \to 0$, where $E^*$ is the outer expectation. This implies that $X_n$ is asymptotically measurable. In particular, we have $E^* f(X_n) - E_* f(X_n) \to 0$ for all bounded continuous functions f on $\mathbb{W}$, where $E_* f(X_n) = -E^*(-f(X_n))$ is the inner expectation (van der Vaart and Wellner (1996), Dudley (1999)). We also note that the space $\mathbb{X} = \ell^\infty([0, \tau] \times \mathcal{J}_0)$ is isometric to the product space $\mathbb{Y} = \ell^\infty([0, \tau])^q$ equipped with the uniform metric $d_Y(x, y) = \max_j \sup_t |x_j(t) - y_j(t)|$, and the product topology of $\mathbb{Y}$ coincides with the topology induced by the metric $d_Y$. Under the assumptions of Section 5, the space $C_b([0, \tau] \times \mathcal{J}_0)$ is isometric to the space $C([0, \tau])^q$ and $W_0$ is a linear transformation of a vector of q independent time-transformed Brownian motions.

The M-estimator $\hat\theta$ depends on the specification of the matrix $\phi_\theta$ and its estimator $\phi_n$. Depending on the measurability properties of the estimator $\phi_n$, the solution to the score equation exists either with probability tending to 1, or with inner probability tending to 1 (Section 5). Two simple choices of the function $\phi_\theta$ correspond to $\phi_\theta \equiv 0$ and $\phi_\theta = -\dot\Gamma_\theta$. In particular, with the latter choice, the estimate $\hat\theta$ is an analogue of the pseudo-maximum likelihood estimators considered by Bagdonovicius and Nikulin (1999,2004) in the case of single spell models. Under regularity conditions, the optimal choice of this function corresponds to the solution of a system of Sturm-Liouville equations and yields an asymptotically efficient estimate of the Euclidean component of the model. If the process registers only events of one type (i.e. $|\mathcal{J}| = 1$) then the form of $\phi_\theta$ corresponding to the efficient estimate of θ is similar to the single spell version of this model and can be found in Bickel (1986) and Bickel and Ritov (1995) in the uncensored case, and in Dabrowska (2007) in the censored case. The estimate of the function $\phi_\theta$ can be obtained in this case by inverting a simple tridiagonal band-symmetric matrix. The form of the information bound and efficient score function for the general case ($|\mathcal{J}| > 1$) is postponed to a separate paper, where we consider it under additional compatibility conditions.

To set confidence bands for the baseline Γ vector and related parameters, we consider the Gaussian multiplier method of Lin, Fleming and Wei (1994). For this purpose, we shall need some additional notation.

  1. Let G0 be a vector of independent Inline graphic(0, Id×d) variables, and let Gi = (Gmi: m = 1, …, Ki), i = 1, …, n, Ki = Y..i(0), be standard normal variables, independent of G0 and mutually independent given the data D1, …, Dn.

  2. For jInline graphic, set
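    $$\hat V_j^{\#}(x) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \sum_{m} G_{mi} \int_0^x \frac{N_{jmi}(du)}{S_j(\Gamma_{n\hat\theta}(u-), \hat\theta, u)},$$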
  3. Put Ξ^#=Ξ^1#-Ξ^2#, where Ξ^1#=^1#(θ^)^0(θ^)1/2 and
    Ξ^2#=^1-1(θ^)j0τρ^j,ϕn(u,θ^)Nj..(du)W^0#(u)T,W^0#(x)=0xV^#(du)P^θ^(u,x)=V^#(x)-0xW^0#(u-)Q^θ^(du).

The estimates θ̂ and Inline graphic are plug-in analogues of the matrices defined in (3.3)–(3.5).

Proposition 3.2

Suppose that the conditions of Proposition 3.1 are satisfied. Then, unconditionally, $(\hat\Xi^{\#}, \hat W_0^{\#})$, with $\hat W_0^{\#} = \{\hat W_j^{\#}(x): x \in [0, \tau], j \in \mathcal{J}_0\}$, converges weakly in $R^d \times \ell^\infty([0, \tau] \times \mathcal{J}_0)$ to a mean zero Gaussian process $(\Xi^{\#}, W_0^{\#})$ with the same covariance function as (Ξ, W0). Moreover, (Ξ, W0) and $(\Xi^{\#}, W_0^{\#})$ are independent, while (Ξ̂, Ŵ0) and $(\hat\Xi^{\#}, \hat W_0^{\#})$ are asymptotically independent. Conditionally, the process $(\hat\Xi^{\#}, \hat W_0^{\#})$ converges weakly to $(\Xi^{\#}, W_0^{\#})$, in probability. As in van der Vaart and Wellner (1996, p. 181), conditional weak convergence means that $\sup_{h \in BL_1} |E_G h(\hat\Xi^{\#}, \hat W_0^{\#}) - E h(\Xi^{\#}, W_0^{\#})| \to_P 0$, where $E_G$ denotes expectation with respect to the G variables. Further, h varies over the class of bounded Lipschitz functions, and $BL_1$ is the set of Lipschitz functions whose norm is bounded by 1.

This proposition can be further extended to approximate the distribution of functionals Φ(θ, Γ). In sufficiently simple cases, functional delta method can be used for this purpose. In particular, we may consider estimation of the kernel F of a semi-Markov processes with a state space Inline graphic = {1, …, r}. In this case the covariates are time independent, and the entries of the matrix F(x|z) = [Fj(x|z)]jInline graphic are specified by (2.2)–(2.3). Under the assumed differentiability conditions on the hazard functions αj, the plug-in sample analogue of the matrix F has entries satisfying

W^F,j(xz)=n[F^j-Fj](xz)=Ξ^T0xf.j(Γ(j1·)(u),θ0,z)Γj(du)+0xW(j1·)(u)fj(Γ(j1·)(u),θ,z)Γj(du)+0xfj(Γ(j1·)(u),θ0,z)W0j(du)+OP(1),jJ0. (3.8)

For any j = (j1, j2) ∈ Inline graphic, Γ (j1,.) and (j1,.) denote subvectors Γ (j1,.) = {Γθ0j: j = (j1, ℓ) ∈ Inline graphic} and (j1,.) = {0j: j = (j1, ℓ) ∈ Inline graphic}, where

W0={n[Γnjθ^-Γjθ0]:jJ0}=W^0+Ξ^TΓ.nθ^+OP(1). (3.9)

Denote by $\hat W_F^{\#}$ the matrix obtained by replacing in (3.8)–(3.9) the process (Ξ̂, Ŵ0) by $(\hat\Xi^{\#}, \hat W_0^{\#})$ and the unknown parameters by their estimates (θ̂, Γnθ̂). Using integration by parts and Proposition 3.1, it is easy to verify that the process $\hat W_F = [\hat W_{F,j}(x|z): x \le \tau, j \in \mathcal{J}_0]$ converges weakly to a mean zero Gaussian process $W_F$ in $\ell^\infty([0, \tau])^{|\mathcal{J}_0|}$. In addition, the conclusions of Proposition 3.2 carry over to the process $\hat W_F^{\#} = [\hat W_{F,j}^{\#}(x|z): x \le \tau, j \in \mathcal{J}_0]$, i.e. unconditionally, $\hat W_F^{\#}$ converges weakly to a mean zero Gaussian process $W_F^{\#}$ with the same covariance function as the process $W_F$ and is independent of it. Conditionally, the process $\hat W_F^{\#}$ converges weakly to $W_F^{\#}$ in probability.

Another example of a functional may correspond to the cumulative residual process arising in goodness-of-fit testing. In particular, suppose that covariates are partitioned into k disjoint categories, I1, …, Ik. The cumulative residual process for the one-step transition j = (j1, j2) from state j1 to state j2 is given by

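$$\hat R_j(x, \ell) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \sum_{m} 1(Z_{jmi} \in I_\ell)\, \hat M_{jmi}(x) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \sum_{m} \int_0^x \left[ 1(Z_{jmi} \in I_\ell) - \frac{S_j^{\ell}}{S_j}(\Gamma_{n\hat\theta}(u-), \hat\theta, u) \right] N_{jmi}(du),$$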

where $S_j^{\ell}(\Gamma_{n\hat\theta}(u-), \hat\theta, u) = \sum_{i=1}^{n} \sum_{m} Y_{jmi}(u)\, 1(Z_{jmi} \in I_\ell)\, \alpha_j(\hat\Gamma_{n\hat\theta}(u-), \hat\theta, Z_{jmi})$ is the risk process corresponding to subjects in the group $I_\ell$. Under the assumption that residuals are consistent with the model, the process $\hat R = \{\hat R_j(t, \ell): t \in [0, \tau], j \in \mathcal{J}_0, \ell = 1, \ldots, k\}$ converges weakly to a mean zero Gaussian process and the Gaussian multiplier approximation to its distribution is given by

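$$\hat R_j^{\#}(x, \ell) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \sum_{m} \int_0^x \left[ 1(Z_{jmi} \in I_\ell) - \frac{S_j^{\ell}}{S_j}(\Gamma_{n\hat\theta}(u-), \hat\theta, u) \right] G_{mi}\, N_{jmi}(du) - (\hat\Xi^{\#})^T \int_0^x \left( \left[ \frac{\dot S_j^{\ell}}{S_j^{\ell}} - \frac{\dot S_j}{S_j} \right] \frac{S_j^{\ell}}{S_j} \right)(\Gamma_{n\hat\theta}, \hat\theta, u)\, N_{j..}(du) - \int_0^x W_{(j_1,\cdot)}^{\#}(u) \left( \left[ \frac{S_j^{\ell\prime}}{S_j^{\ell}} - \frac{S_j^{\prime}}{S_j} \right] \frac{S_j^{\ell}}{S_j} \right)(\Gamma_{n\hat\theta}, \hat\theta, u)\, N_{j..}(du).$$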

In analogy to Martinussen and Scheike (2006), the performance of residuals can be evaluated using Kolmogorov-Smirnov statistics such as $\sup_{x \in [\delta, \tau - \delta]} |\hat R_j(x, \ell)|$, and the Gaussian multiplier method can be used to obtain critical levels of tests. Alternative tests can be obtained by modifying the chi-squared tests in Aalen et al. (2008, p. 144) or tests based on Schoenfeld residuals.
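As a rough illustration of how a resampling p-value for such a Kolmogorov-Smirnov statistic can be read off, the sketch below assumes that the observed residual path and a set of multiplier realizations have already been evaluated on a common grid. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
def ks_statistic(path):
    """Supremum of |R(x)| over the grid points at which the path was evaluated."""
    return max(abs(value) for value in path)


def resampling_p_value(observed_path, resampled_paths):
    """Share of resampled paths whose supremum is at least the observed one."""
    observed = ks_statistic(observed_path)
    exceed = sum(1 for path in resampled_paths if ks_statistic(path) >= observed)
    return exceed / len(resampled_paths)


# Toy numbers, invented purely for illustration.
observed_path = [0.1, -0.4, 0.2]
resampled_paths = [
    [0.5, 0.0, 0.1],
    [0.1, -0.3, 0.2],
    [-0.4, 0.2, 0.0],
    [0.0, 0.39, -0.1],
]
p_value = resampling_p_value(observed_path, resampled_paths)
```

In practice the resampled paths would be realizations of the multiplier process, and the grid would be restricted to the interval [δ, τ − δ].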

4 Example

We consider a transplant outcome data set from the Center for International Blood and Marrow Transplant Research (CIBMTR). The CIBMTR is comprised of clinical and basic scientists who confidentially share data on their blood and bone marrow transplant patients with the CIBMTR Data Collection Center located at the Medical College of Wisconsin. The CIBMTR is a repository of information about results of transplants at more than 450 transplant centers worldwide. The example data set consists of patients who received an HLA-identical sibling transplant from 1995 to 2004 for acute myelogenous leukemia (AML) or acute lymphoblastic leukemia (ALL) and were transplanted in first remission. All patients received bone marrow transplantation or peripheral blood stem cell transplantation. Children under age 16 and all patients who received umbilical cord blood transplants were excluded, as risk factors are likely to vary in this group.

Allogeneic stem cell transplantation (ASCT) is an accepted treatment for leukemia patients. Transplant candidates receive high doses of chemotherapy and radiation which destroy malignant cells in the bone marrow and elsewhere. Because stem cells in the normal bone marrow are destroyed in this process as well, patients subsequently receive a transplant from a suitably matched donor. The transplant can be followed by several complications. In this study, fatal complications correspond to relapse of leukemia or death in remission (hereafter referred to as death). The most important intermediate event in ASCT is graft-versus-host disease (GVHD), in which transplanted immune cells recognize the recipient’s body tissues as foreign. Acute and chronic GVHD (AGVHD and CGVHD) are two forms of this disease. AGVHD occurs during the early post-transplant period and is defined here as moderate to severe using clinically established criteria. CGVHD occurs later in time and may be preceded by AGVHD.

The incidence of GVHD, leukemia relapse and death in remission depends on a number of variables characterizing the recipient, the donor and the transplant. The main variables considered in this paper include recipient’s age, donor-recipient gender match, disease type and graft source. Bone marrow was the first source of stem cells used in ASCT. Since the 1990s, peripheral blood stem cell transplants (PBSCT) have replaced bone marrow transplants (BMT) as the preferred source of stem cells because of a quicker hematologic recovery and relative ease of collection. Patients may also receive an infusion of both peripheral stem cells and bone marrow. Several studies have shown that PBSCT recipients may be at a higher risk of GVHD than BMT patients (e.g. Cutler et al. (2001), Flowers et al. (2002), Friedrichs et al. (2010)). A possible explanation of this phenomenon is that GVHD develops from the infusion of donor T cells and PBSCT recipients receive a significantly higher dose of T cells than BMT patients. As a result of the increased risk of GVHD, the patients who experience it may be at a higher risk of death in remission than BMT patients. GVHD is also more common among older patients and among male recipients receiving transplants from female donors (Gale et al. 1987).

For purposes of modeling, we consider a five state modulated renewal model proposed for analysis of the transplant recovery process in Dabrowska et al. (1994). Table 1 collects some information about the type and number of the observed transitions, their range and median. The model assumes that a patient remains in the transplant state (tx, state 1) until the time of the first adverse event, which may correspond to AGVHD (state 2), CGVHD (state 3), relapse (state 4) or death in remission (state 5). The model also takes into account that a patient who develops GVHD may subsequently relapse or die, and that CGVHD may be preceded by AGVHD. The observed model has an extra absorbing state corresponding to censoring (loss to follow-up). Further, age was categorized into 3 groups, each representing approximately one third of the patients. The baseline group corresponds to the age range [29.5, 42.5]. Transitions were also adjusted for the waiting time for transplant. Two continuous variables were used for this purpose: the length of time between leukemia diagnosis and first remission (DxCr) and the length of time between first remission and transplant (CrTx). Their medians, interquartile ranges (IQR) and ranges were: median(DxCr) = 1.38, IQR(DxCr) = 1.15, range(DxCr) = 221.45 months and median(CrTx) = 3.06, IQR(CrTx) = 2.5, range(CrTx) = 46.74 months. To obviate skewness of the distribution, the log transformation of these variables is used in the regression analysis.

Table 1.

Observed one-step transitions

Transition n median (in months) range (in months)
TX → AGVHD 491 .7 4.3
TX → CGVHD 372 5.5 106.4
TX → relapse 106 5.6 59.4
TX → death 179 2.9 131.9
TX → censoring 506 56.9 143.8

AGVHD → CGVHD 202 4.8 57.4
AGVHD → relapse 33 5.2 23.7
AGVHD → death 141 2.9 80.3
AGVHD → censoring 115 45.7 133.0

CGVHD → relapse 27 8.3 98.3
CGVHD → death 79 9.8 124.4
CGVHD → censoring 266 51.1 144.3

A+CGVHD → relapse 25 3.5 53.3
A+CGVHD → death 65 5.6 109.3
A+CGVHD → censoring 112 56.3 145.2

The modulated renewal process assumes that one-step transition probabilities are specified by means of a proportional odds ratio model. More precisely, hazard rates of one-step transitions originating from the transplant or AGVHD state are of the form

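$$\alpha_j(\Gamma_{(j_1,\cdot)}(x), \theta, Z)\,\gamma_j(x) = e^{\theta_j^T Z_j} \Big[ 1 + \sum_{k=j_1+1}^{5} 1(\ell = (j_1, k))\, \Gamma_\ell(x)\, e^{\theta_\ell^T Z_\ell} \Big]^{-1} \gamma_j(x),$$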

for j = (j1, j2) such that j1 = 1 or j1 = 2 and j1 + 1 ≤ j2 ≤ 5, where $\Gamma_j(x) = \int_0^x \gamma_j(u)\,du$. In the case of transition rates originating from the CGVHD state, we use the covariate ZC = (Z, ZA), where ZA is a binary variable equal to 1 if AGVHD preceded the onset of chronic graft-versus-host disease. The corresponding transition rates into the relapse and death states are given by

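$$\alpha_j(\Gamma_{(3,\cdot)}(x), \theta, Z^C)\,\gamma_j(x) = e^{\theta_j^T Z_j^C} \Big[ 1 + \sum_{k=4}^{5} 1(\ell = (3, k))\, \Gamma_\ell(x)\, e^{\theta_\ell^T Z_j^C} \Big]^{-1} \gamma_j(x)$$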

for j = (3, j2) and j2 = 4, 5. Here Zj and ZjC, j = (j1, j2), represent transition specific covariates, which correspond to subvectors of Z and ZC, respectively. Table 4 provides their entries as well as the estimates of the regression coefficients and standard errors. The estimates were obtained using the Fisher scoring algorithm applied to the score process (3.6) with ϕ = −Γ̇. Variable selection was based on backwards elimination and Wald testing. To assess adequacy of the model, we have used the Kolmogorov-Smirnov statistics described in Section 3. The results are summarized below and in Table 5.

Table 4.

Regression estimates

1 2 3 4 5 6 7 8 9
ALL vs AML .07 (.25) 1.32 (.36) .50 (.23) .45 (.38) .12 (.30) .90 (.31) .58 (.22)
Age1 −.25 (.16) −.68 (.28) −.49 (.32) −.45 (.26)
Age2 .27 (.20) .33 (.23) .51 (.59) .01 (.24) .72 (.21)
FM −.20 (.28) −.57 (.39)
PBSCT vs BMT .09 (.22) .01 (.29) .28 (.30) −.12 (.66) −.17 (.38) .55 (.35)
ALLxPBSCT .46 (.23) .92 (.43)
AMLxPBSCT −.33 (.30)
AMLxBMT −.30 (.22) −.50 (.43)
DxCr .12 (.08) .45 (.16) .22 (.12) .13 (.14)
CrTx −.21 (.07) −.26 (.09) −.33 (.15) −.24 (.17) −.10 (.12)
prior AGVHD .72 (.30) .72 (.20)
Age1xPBSCT −.44 (.34)
Age1xBMT −.57 (.33) .70 (.60) −.88 (.35) −.64 (.34)
Age2xPBSCT −.28 (.25) .13 (.37)
Age2xBMT −.37 (.24) .60 (.88)
Age0xPBSCT −.27 (.26) .40 (.31)
AMLxPBSCTxAge2 .25 (.22)
Age1xALL .61 (.27) −.87 (.48) −.60 (.42)
Age2xALL .57 (.30) −.97 (.47)
FMxALL .63 (.27)
FMxAML .64 (.17) .81 (.45)
FMxPBSCT .42 (.18) .44 (.25)
FxALL −.25 (.19)
FxAML −.19 (.14)

Columns: 1 = Tx → AGVHD; 2 = Tx → CGVHD; 3 = Tx → Relapse; 4 = Tx → Death; 5 = AGVHD → CGVHD; 6 = AGVHD → Relapse; 7 = AGVHD → Death; 8 = CGVHD → Relapse; 9 = CGVHD → Death.

Variable names: Age0 = age in the (29.5, 42.5] range, Age1 = age ≤ 29.5 years, Age2 = age > 42.5 years; F = female donor transplant; FM = female donor to male recipient vs other sex-match transplant.

Table 5.

Kolmogorov-Smirnov residual statistics

1 2 3 4 5 6 7 8 9
AML 8.51 (.97) 6.35 (.96) 7.88 (.69) 5.87 (.86) 4.40 (.96) 2.75 (.86) 4.14 (.97) 4.42 (.73) 6.70 (.82)
ALL 10.22 (.89) 7.66 (.83) 7.65 (.69) 6.09 (.73) 4.34 (.94) 2.66 (.85) 4.23 (.95) 4.49 (.71) 6.46 (.75)
Age0 12.99 (.81) 6.52 (.93) 3.30 (.95) 6.50 (.63) 4.55 (.92) 2.79 (.51) 4.37 (.93) 2.03 (.94) 8.17 (.47)
Age1 6.42 (.98) 5.82 (.95) 3.20 (.94) 5.18 (.82) 4.96 (.87) 3.66 (.71) 2.66 (.98) 2.04 (.95) 3.53 (.87)
Age2 10.56 (.90) 7.64 (.91) 6.40 (.60) 8.11 (.72) 6.35 (.83) 2.44 (.87) 2.92 (.99) 1.51 (.99) 8.07 (.76)
BMT 7.84 (.97) 7.73 (.91) 7.16 (.69) 6.04 (.61) 5.96 (.81) 1.36 (.99) 5.11 (.90) 2.94 (.79) 9.67 (.57)
PBSCT 9.44 (.95) 9.07 (.90) 7.49 (.73) 5.75 (.82) 6.66 (.88) 1.27 (.99) 4.84 (.95) 2.91 (.91) 9.39 (.69)
non-FM 5.46 (.99) 7.84 (.94) 1.95 (.99) 9.82 (.66) 4.22 (.97) 1.50 (.97) 4.57 (.96) 3.97 (.80) 4.20 (.95)
FM 6.83 (.96) 9.36 (.82) 2.32 (.96) 9.52 (.43) 5.06 (.88) 1.33 (.96) 5.07 (.84) 3.91 (.60) 4.37 (.88)
M donor 5.48 (.99) 1.29 (.84) 2.96 (.98) 6.00 (.84) 1.42 (.58) 2.02 (.94) 3.93 (.98) 2.06 (.97) 3.50 (.98)
F donor 6.84 (.98) 11.95 (.80) 2.61 (.99) 5.83 (.86) 11.50 (.53) 1.93 (.93) 4.43 (.95) 2.08 (.97) 3.47 (.97)
DxCr-1 9.73 (.86) 16.86 (.42) 5.67 (.56) 7.88 (.49) 4.92 (.85) 2.27 (.77) 5.17 (.81) 2.44 (.89) 5.34 (.74)
DxCr-2 8.47 (.88) 11.67 (.59) 2.56 (.94) 2.86 (.98) 6.80 (.61) 2.79 (.64) 5.93 (.73) 3.85 (.57) 4.72 (.74)
DxCr-3 4.03 (1.00) 12.92 (.48) 4.34 (.71) 5.03 (.71) 9.88 (.33) 2.21 (.75) 4.27 (.88) 4.38 (.40) 11.23 (.30)
DxCr-4 11.80 (.83) 15.05 (.43) 4.01 (.89) 6.71 (.69) 6.15 (.71) 5.67 (.27) 12.11 (.43) 2.83 (.78) 3.52 (.94)
CrTx-1 14.69 (.74) 11.23 (.52) 5.19 (.66) 5.21 (.74) 6.53 (.74) 3.97 (.55) 7.24 (.76) 3.17 (.75) 2.43 (1.00)
CrTx-2 9.92 (.85) 12.52 (.58) 3.96 (.81) 5.37 (.68) 7.05 (.58) 2.61 (.66) 6.13 (.73) 3.49 (.67) 7.57 (.54)
CrTx-3 12.08 (.76) 5.86 (.94) 5.96 (.62) 8.34 (.46) 4.17 (.89) 2.52 (.67) 2.62 (.98) 2.34 (.86) 4.84 (.75)
CrTx-4 8.84 (.87) 7.21 (.85) 2.94 (.92) 1.37 (.35) 5.73 (.71) 4.21 (.37) 5.71 (.73) 3.59 (.57) 4.54 (.80)
AML x BMT 5.38 (.99) 9.84 (.76) 5.15 (.70) 1.62 (.41) 5.12 (.82) 2.04 (.88) 5.46 (.78) 2.57 (.64) 7.96 (.50)
AML x PBSCT 7.25 (.96) 6.30 (.97) 1.03 (.34) 5.04 (.80) 5.18 (.92) 1.84 (.87) 5.77 (.88) 3.18 (.84) 4.60 (.91)
ALL x BMT 7.62 (.85) 2.92 (.99) 4.44 (.74) 7.03 (.42) 2.44 (.97) 1.10 (.99) 2.79 (.95) 2.46 (.70) 3.22 (.85)
ALL x PBSCT 6.20 (.93) 6.39 (.69) 3.88 (.85) 3.01 (.86) 4.62 (.82) 1.97 (.75) 3.24 (.95) 3.79 (.66) 5.92 (.62)
Age1 x PBSCT 14.92 (.71) 8.03 (.81) 6.58 (.63) 5.12 (.75) 7.56 (.58) 1.63 (.84) 5.31 (.86) 3.28 (.71) 6.58 (.57)
Age1 x BMT 6.45 (.93) 6.12 (.84) 3.11 (.88) 7.13 (.51) 4.22 (.78) 2.49 (.80) 2.01 (.96) 2.94 (.56) 3.09 (.76)
Age2 x PBSCT 7.31 (.94) 5.61 (.95) 7.69 (.34) 9.53 (.45) 4.36 (.93) 1.28 (.96) 3.74 (.95) 1.26 (1.00) 9.60 (.57)
Age2 x BMT 5.46 (.87) 6.96 (.68) 4.02 (.41) 4.58 (.76) 3.66 (.77) 1.95 (.78) 2.37 (.93) .93 (.91) 4.43 (.72)
Age0 x PBSCT 4.29 (.98) 8.10 (.71) 4.73 (.67) 5.45 (.49) 5.34 (.74) 1.88 (.61) 2.46 (.98) 1.70 (.94) 3.58 (.75)
Age0 x BMT 9.73 (.73) 9.06 (.59) 3.97 (.76) 8.60 (.22) 6.34 (.42) 1.76 (.58) 4.03 (.85) 3.31 (.24) 5.82 (.45)
Age1 x AML 6.87 (.89) 5.47 (.87) 2.52 (.91) 2.49 (.98) 3.22 (.94) 3.98 (.34) 2.73 (.86) 2.39 (.59) 3.29 (.68)
Age1 x ALL 4.51 (.97) 2.73 (.99) 2.70 (.89) 3.19 (.74) 2.59 (.96) 2.18 (.82) 3.95 (.78) 3.88 (.50) 2.06 (.93)
Age2 x AML 10.54 (.80) 7.95 (.83) 2.94 (.89) 3.69 (.88) 5.19 (.78) 3.98 (.17) 4.89 (.81) 1.78 (.92) 7.34 (.38)
Age2 x ALL 8.57 (.65) 5.01 (.60) 4.60 (.74) 3.34 (.74) 2.13 (.96) 1.74 (.29) 4.09 (.78) 1.94 (.67) 1.69 (.97)
Age0 x AML 9.70 (.88) 9.63 (.80) 7.64 (.34) 8.95 (.58) 4.92 (.89) 3.30 (.63) 5.20 (.86) 3.54 (.67) 4.12 (.95)
Age0 x ALL 3.75 (.95) 2.97 (.94) 3.75 (.52) 2.10 (.94) 3.15 (.76) 1.27 (.86) 3.16 (.81) 2.32 (.67) 5.76 (.52)

Columns: 1 = Tx → AGVHD; 2 = Tx → CGVHD; 3 = Tx → Relapse; 4 = Tx → Death; 5 = AGVHD → CGVHD; 6 = AGVHD → Relapse; 7 = AGVHD → Death; 8 = CGVHD → Relapse; 9 = CGVHD → Death.

Variable names: Age0 = age in the (29.5, 42.5] range, Age1 = age ≤ 29.5 years, Age2 = age > 42.5 years.

DxCr-i and CrTx-i, i = 1,2,3,4: DxCr and CrTx variables grouped according to quartiles.

F = female donor transplant; FM = female donor to male recipient vs other sexmatch transplant. Each column provides test statistics and p-values determined based on 5000 resampling experiments.

We note here that the transitions originating from the CGVHD state depend on whether or not AGVHD was experienced prior to entrance into the CGVHD state. This dependence violates the assumption that the sequence of states visited forms a Markov chain. However, this problem disappears if the state space of the process is enlarged to include an extra state A+CGVHD. This extra state is here denoted by 3̄. Conditionally on the time independent covariates, the resulting model has the structure of a semi-Markov process with kernel F(x|z) = [Fj(x|z)] specified in Table 3. The entries of the kernel matrix have a fairly explicit form. For transitions originating from the transplant (tx) or AGVHD state, we have

Table 3.

One-step transition probability matrix

From \ To: tx AGVHD CGVHD A+CGVHD rel death
tx 0 F12 F13 0 F14 F15
AGVHD 0 0 0 F23 F24 F25
CGVHD 0 0 0 0 F34 F35
A+CGVHD 0 0 0 0 F3̄4 F3̄5
rel 0 0 0 0 1 0
death 0 0 0 0 0 1
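
$$F_j(x|z) = \int_0^x e^{\theta_j^T Z_j} \Big[ 1 + \sum_{k=j_1+1}^{5} 1(\ell = (j_1, k))\, \Gamma_\ell(u)\, e^{\theta_\ell^T Z_\ell} \Big]^{-2} \Gamma_j(du)$$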

for j = (j1, j2), j1 = 1, 2 and j1 + 1 ≤ j2 ≤ 5. One-step transition probabilities originating from the CGVHD state are given by

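$$F_j(x|z) = 1(Z_A = 0) \int_0^x \Big[ 1 + \sum_{k=4}^{5} 1(\ell = (3, k))\, \Gamma_\ell(u)\, e^{\theta_\ell^T Z_j^C} \Big]^{-2} e^{\theta_j^T Z_j^C}\, \Gamma_j(du)$$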

for j = (3, j2) and j2 = 4, 5. One-step transition probabilities originating from the state A+CGVHD (labeled as “3̄”) have a similar form, with covariate ZA = 1.
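To make the structure of these one-step transition probabilities concrete, the following sketch evaluates them numerically for the transitions out of a single state. It assumes, purely for illustration, constant baseline intensities (so that each Γ_ℓ is linear in time) and made-up linear predictors; neither is an estimate from the CIBMTR data. Under that assumption the sojourn-time distribution has a closed form, which gives a simple sanity check: the one-step probabilities must add up to it.

```python
import math


def one_step_probs(x, rates, etas, n_grid=2000):
    """Trapezoid approximation of F_j(x|z) for competing transitions out of one state.

    rates[j]: hypothetical constant baseline intensity, so Gamma_j(u) = rates[j] * u
    etas[j]:  linear predictor theta_j^T Z_j of transition j
    """
    h = x / n_grid
    weights = [math.exp(eta) for eta in etas]

    def integrand(u, j):
        # exp(eta_j) * [1 + sum_l Gamma_l(u) exp(eta_l)]^(-2) * gamma_j(u)
        denom = 1.0 + sum(r * u * w for r, w in zip(rates, weights))
        return weights[j] * rates[j] / denom ** 2

    probs = []
    for j in range(len(rates)):
        total = 0.5 * (integrand(0.0, j) + integrand(x, j))
        total += sum(integrand(k * h, j) for k in range(1, n_grid))
        probs.append(total * h)
    return probs


# Illustrative values only: four competing transitions out of the transplant state.
rates = [0.30, 0.05, 0.02, 0.03]
etas = [0.1, -0.2, 0.5, 0.0]
probs = one_step_probs(12.0, rates, etas)

# With linear Gamma_l the survival function of the sojourn time is 1 / (1 + c * x).
c = sum(r * math.exp(eta) for r, eta in zip(rates, etas))
sojourn_cdf = 1.0 - 1.0 / (1.0 + c * 12.0)
```

The exponent −2 in the integrand is the product of the survival function [1 + Σ Γ_ℓ e^{θ_ℓᵀZ_ℓ}]^{-1} of the sojourn time and the transition-specific hazard, which carries a further factor of the same form.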

We also consider multi-step probabilities of transitions into the absorbing states, i.e. probabilities of transition into the relapse and death states along any possible path of the model. Let J(t) be the state occupied by the process at time t and let e denote either relapse or death in remission. By noting that a patient may move into an absorbing state by first passing through the GVHD states, these probabilities are given by

$$H_e(t|z) = P(J(t) = e \mid z) = \sum_{k=1}^{4} H_e^{(k)}(t|z),$$

where

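$$H_e^{(1)}(t|z) = P(\{J(t) = e\} \cap A^c \cap C^c \mid z), \quad H_e^{(2)}(t|z) = P(\{J(t) = e\} \cap A \cap C^c \mid z), \quad H_e^{(3)}(t|z) = P(\{J(t) = e\} \cap A^c \cap C \mid z), \quad H_e^{(4)}(t|z) = P(\{J(t) = e\} \cap A \cap C \mid z), \qquad (4.1)$$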

and the events A and C represent

$$A = \{\text{AGVHD occurs prior to the event } e\}, \qquad C = \{\text{CGVHD occurs prior to the event } e\}.$$

The first of these probabilities corresponds to a move from the transplant state to the state e in one step, so that $H_e^{(1)}(t|z) = F_{1e}(t|z)$ for e = 4, 5. The terms $H_e^{(2)}$ and $H_e^{(3)}$ provide the probabilities of transitions along the paths “tx → AGVHD → e” ($H_e^{(2)}$) and “tx → CGVHD → e” ($H_e^{(3)}$) and are given by $H_e^{(k)}(t|z) = (F_{1k} \ast F_{ke})(t|z)$, k = 2 or 3, e = 4 or 5. Here, for any two subdistribution functions F and F′ on the positive half-line, $F \ast F'$ is their convolution

$$(F \ast F')(t) = \int_0^t F(t - u)\, F'(du) = \int_0^t F(du)\, F'(t - u).$$

Lastly, transition along the path “tx → AGVHD → A+CGVHD → e” ($H_e^{(4)}$) contributes to the sum $H_e^{(4)}(t|z) = (F_{12} \ast F_{23} \ast F_{\bar 3 e})(t|z)$.
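The convolution of two subdistribution functions can be approximated on a grid by a Riemann-Stieltjes sum. The sketch below does this for a two-step path and compares the result with a closed form available for exponential-type subdistribution functions. The functions `F_12` and `F_2e` and their parameters are hypothetical stand-ins, not plug-in estimates from the model.

```python
import math


def convolve(F1, F2, t, n_grid=4000):
    """Approximate (F1 * F2)(t) = int_0^t F1(du) F2(t - u) by a midpoint sum."""
    h = t / n_grid
    total = 0.0
    for k in range(n_grid):
        lo, hi = k * h, (k + 1) * h
        # Mass that F1 puts on (lo, hi], times F2 at the remaining time.
        total += (F1(hi) - F1(lo)) * F2(t - 0.5 * (lo + hi))
    return total


# Hypothetical subdistribution functions: limiting mass p and rate of approach.
p1, a = 0.4, 0.8  # stands in for tx -> AGVHD
p2, b = 0.3, 0.1  # stands in for AGVHD -> e


def F_12(u):
    return p1 * (1.0 - math.exp(-a * u)) if u > 0 else 0.0


def F_2e(u):
    return p2 * (1.0 - math.exp(-b * u)) if u > 0 else 0.0


t = 24.0
two_step = convolve(F_12, F_2e, t)

# Closed form of the same convolution for these exponential-type choices.
closed_form = p1 * p2 * (
    (1.0 - math.exp(-a * t))
    - a * (math.exp(-a * t) - math.exp(-b * t)) / (b - a)
)
```

A three-step path such as tx → AGVHD → A+CGVHD → e can be handled in the same way by convolving the two-step result with the third subdistribution function.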

The multi-step transition probabilities can be estimated using the plug-in method. The estimates are consistent on time intervals [0, τ] strictly contained in the support of all sojourn time distributions. As an example, Figure 1 compares transition probabilities of hypothetical ALL patients receiving BMT and PBSCT transplant. The remaining covariates correspond to the age range 16–29.5 years and baseline subgroups specified in Table 2. The plots represent the four components of the multi-step transition probabilities defined in (4.1). PBSCT seems to reduce one-step transition probabilities of both relapse and death ($H_e^{(1)}$, black curves), and the effect is more pronounced in the case of the tx → death transition. The graphs also suggest that PBSCT is associated with a reduced probability of relapse preceded by AGVHD ($H_e^{(2)}$, red curves). At the same time, however, the probability of death in remission is higher than that of a BMT recipient. We also see an increase in the probability of relapse and death resulting from CGVHD without AGVHD ($H_e^{(3)}$, blue curves) and CGVHD with AGVHD ($H_e^{(4)}$, green curves).

Figure 1.

Transition probabilities of endpoint events of a young ALL patient receiving BMT (left panel) or PB (right panel). The remaining covariates correspond to the baseline. The curves represent one-step transitions tx → e (black), two-step transitions tx → AGVHD → e (red) and tx → CGVHD → e (blue), and three-step transitions tx → AGVHD → CGVHD → e (green).

Table 2.

Summary of covariates

Age n Graft source n Disease n
< 30 (young) 550 [BMT] 842 [AML] 1168
[30, 42.5] 534 PB/PB+BMT 803 ALL 477
> 42.5 (old) 561

Donor’s Gender n Gender-Match n

F 890 FM 441
[M] 755 [not FM] 1224

Baseline groups are marked in brackets.

FM represents a female to male transplant

To assess effects of covariates, we consider pointwise and simultaneous confidence bands for pairwise differences of one-step and multi-step transition probabilities. In the case of one-step transition probabilities, we consider functions

$$\Delta_j^F(t|z_1, z_2) = F_j(t|z_1) - F_j(t|z_2), \quad j \in \mathcal{J}_0,$$

where z1 and z2 are two covariate levels. We denote by $\hat\Delta_j^F$ the corresponding sample analogue of the function $\Delta_j^F$. Results of Section 5 imply that the normalized process $\hat W_{j,\Delta}^F = \{\sqrt{n}[\hat\Delta_j^F - \Delta_j^F](t|z_1, z_2): t \in [0, \tau]\}$ converges weakly to a mean zero Gaussian process $W_{j,\Delta}^F = \{W_j^F(t|z_1) - W_j^F(t|z_2): t \in [0, \tau]\}$.

To construct confidence bands, we note that each Δ function forms a difference of two subdistribution functions. Correspondingly, it assumes values between −1 and 1. Direct application of the Gaussian approximation to the limiting distribution of the process $W_{j,\Delta}^F$ may result in confidence intervals and confidence bands whose bounds fall outside the interval (−1, 1). To circumvent this problem, we use a transformation method.

Let Φ: R → (−1, 1) be a strictly increasing differentiable function with derivative ϕ satisfying ϕ(x) > 0 for all x ∈ R. By the delta method,

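$$\sqrt{n}\big[\Phi^{-1}(\hat\Delta_j^F(t|z_1, z_2)) - \Phi^{-1}(\Delta_j^F(t|z_1, z_2))\big] \Rightarrow \phi\big(\Phi^{-1}(\Delta_j^F(t|z_1, z_2))\big)^{-1}\, W_{j,\Delta}^F(t|z_1, z_2), \quad t \in [0, \tau].$$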

Let cα(t1, t2) be the upper α percentile of the distribution of

$$\sup_{t_1 \le t \le t_2} \left| \left[ \frac{W_{j,\Delta}^F}{\hat\sigma_{\Delta_j^F}} \right](t|z_1, z_2) \right|,$$

where $\hat\sigma_{\Delta_j^F}(t|z_1, z_2)$ is an estimate of the standard deviation of $\hat\Delta_j^F(t|z_1, z_2)$. Then, by the continuous mapping theorem, the 100 × (1 − α)% asymptotic confidence band for the Δ function has upper and lower bounds given by

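$$\Phi\left( \Phi^{-1}(\hat\Delta_j^F(t|z_1, z_2)) \pm c_\alpha(t_1, t_2)\, \frac{\hat\sigma_{\Delta_j^F}(t|z_1, z_2)}{\phi(\Phi^{-1}(\hat\Delta_j^F(t|z_1, z_2)))} \right). \qquad (4.2)$$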

The corresponding pointwise confidence intervals can be obtained by replacing the constant cα(t1, t2) by the upper α/2 percentile of the standard normal distribution.

A possible choice of the Φ function may correspond to Φ(x) = 2G(x) − 1, where G is a distribution function with density g supported on the whole real line. In analogy to the construction of the confidence bands for survival function in Andersen et al. (1993), we may consider the choice of the extreme value distribution G(x) = 1 − exp[−ex]. In this case Φ−1(u) = log[−log[(1 − u)/2]] and the bounds are given by

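$$1 - 2\left[ \frac{1 - \hat\Delta_j^F(t|z_1, z_2)}{2} \right]^{\exp[\pm c_\alpha(t_1, t_2)\,[h \hat\sigma_{\Delta_j^F}](t|z_1, z_2)]}, \qquad h(t|z_1, z_2) = \left[ \log\left[ \frac{1 - \hat\Delta_j^F(t|z_1, z_2)}{2} \right] \big(1 - \hat\Delta_j^F(t|z_1, z_2)\big) \right]^{-1}. \qquad (4.3)$$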

Another possible choice may correspond to the logistic distribution, G(x) = ex/[1+ex]. We have Φ−1(u) = log([1 + u]/[1 − u]), and the bounds take the form

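$$1 - 2\left( 1 + \frac{1 + \hat\Delta_j^F(t|z_1, z_2)}{1 - \hat\Delta_j^F(t|z_1, z_2)} \exp[\pm c_\alpha(t_1, t_2)\,[h \hat\sigma_{\Delta_j^F}](t|z_1, z_2)] \right)^{-1}, \qquad h(t|z_1, z_2) = 2\left[ \big(\hat\Delta_j^F(t|z_1, z_2) + 1\big)\big(1 - \hat\Delta_j^F(t|z_1, z_2)\big) \right]^{-1}. \qquad (4.4)$$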
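The two transformed limits can be computed pointwise from an estimate of the Δ function, its standard deviation and a critical level. The sketch below is a direct transcription of (4.3) and (4.4) at a single time point; the function names and the numerical inputs are hypothetical. The inputs are chosen so that the untransformed limits Δ̂ ± c σ̂ would fall below −1, while both transformed intervals stay inside (−1, 1).

```python
import math


def band_extreme_value(delta_hat, sigma_hat, c_alpha):
    """Limits (4.3), based on Phi(x) = 1 - 2 * exp(-exp(x))."""
    base = (1.0 - delta_hat) / 2.0
    h = 1.0 / (math.log(base) * (1.0 - delta_hat))
    limits = [
        1.0 - 2.0 * base ** math.exp(sign * c_alpha * h * sigma_hat)
        for sign in (1.0, -1.0)
    ]
    return min(limits), max(limits)


def band_logistic(delta_hat, sigma_hat, c_alpha):
    """Limits (4.4), based on Phi(x) = 2 * exp(x) / (1 + exp(x)) - 1."""
    odds = (1.0 + delta_hat) / (1.0 - delta_hat)
    h = 2.0 / ((delta_hat + 1.0) * (1.0 - delta_hat))
    limits = [
        1.0 - 2.0 / (1.0 + odds * math.exp(sign * c_alpha * h * sigma_hat))
        for sign in (1.0, -1.0)
    ]
    return min(limits), max(limits)


# Illustrative inputs only: an estimate close to the lower boundary -1.
delta_hat, sigma_hat, c_alpha = -0.9, 0.1, 3.0
naive_lower = delta_hat - c_alpha * sigma_hat  # equals -1.2, outside (-1, 1)
ev_lower, ev_upper = band_extreme_value(delta_hat, sigma_hat, c_alpha)
lg_lower, lg_upper = band_logistic(delta_hat, sigma_hat, c_alpha)
```

Both functions require −1 < Δ̂ < 1. The transformed intervals are not symmetric around Δ̂, which is the price paid for respecting the range of the Δ function.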

A similar approach can be applied towards comparison of multi-step transition probabilities. For any two covariate levels, z1 and z2, we set

$$\Delta_j^H(t|z_1, z_2) = H_j(t|z_1) - H_j(t|z_2), \quad j = 4, 5.$$

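The corresponding sample analogue is denoted by $\hat\Delta_j^H$. It is easy to see that $\{\hat W_{j,\Delta}^H(t|z_1, z_2) = \sqrt{n}[\hat\Delta_j^H - \Delta_j^H](t|z_1, z_2): t \in [0, \tau]\}$ converges weakly to a Gaussian process $W_{j,\Delta}^H(t|z_1, z_2) = W_j^H(t|z_1) - W_j^H(t|z_2)$, where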

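$$W_j^H(t|z) = W_{1j}^F(t|z) + \sum_{i=2}^{3} \big[ W_{1i}^F \ast F_{ij} + F_{1i} \ast W_{ij}^F \big](t|z) + \big[ W_{12}^F \ast F_{23} \ast F_{\bar 3 j} + F_{12} \ast W_{23}^F \ast F_{\bar 3 j} + F_{12} \ast F_{23} \ast W_{\bar 3 j}^F \big](t|z),$$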

and the integrals are defined by means of the convolution formula.

In Figures 2–5, we compare one-step and multi-step transition probabilities of relapse and death in remission for patients with selected covariate profiles. To obtain the bands, we first used the Gaussian multiplier method to estimate the approximate variance of the Δ function: the Monte Carlo variance of the Δ function was computed based on 5000 Monte Carlo experiments. A second application of the Gaussian multiplier method was then used to obtain an approximation of the critical level cα(t1, t2) based on 5000 Monte Carlo trials. The interval [t1, t2] was chosen to correspond to t1 = 1.5 and t2 = 60 months. The bounds (4.3) and (4.4) showed close numerical agreement, and the resolution of the graphs is too coarse to show the difference between the two choices. The difference between the upper/lower bounds did not exceed .07%, and the bands obtained using the logistic transformation were narrower.
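The two-pass resampling scheme described above can be sketched in a simplified setting. The code below assumes that the normalized process admits an approximation of the form n^{-1/2} Σ_i G_i ψ_i(t), where ψ_i(t) is a per-subject contribution evaluated on a time grid; this representation, the names `psi`, `multiplier_draws` and `critical_level`, and the toy inputs are all assumptions made for illustration, and the multiplier process of Section 3 has more structure than this. The number of draws is kept small here, whereas the analysis above used 5000.

```python
import math
import random


def multiplier_draws(psi, n_draws, rng):
    """Realizations of W#(t) = n^(-1/2) * sum_i G_i * psi[i][t], with G_i ~ N(0, 1)."""
    n, n_times = len(psi), len(psi[0])
    draws = []
    for _ in range(n_draws):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draws.append([
            sum(g[i] * psi[i][t] for i in range(n)) / math.sqrt(n)
            for t in range(n_times)
        ])
    return draws


def critical_level(psi, alpha=0.05, n_draws=500, seed=0):
    rng = random.Random(seed)
    n_times = len(psi[0])
    # First pass: Monte Carlo standard deviation at each time point.
    # The multiplier process has mean zero given the data, so no centering is applied.
    first = multiplier_draws(psi, n_draws, rng)
    sigma = [
        math.sqrt(sum(w[t] ** 2 for w in first) / n_draws) for t in range(n_times)
    ]
    # Second pass: empirical upper-alpha quantile of the standardized supremum.
    second = multiplier_draws(psi, n_draws, rng)
    sups = sorted(
        max(abs(w[t]) / sigma[t] for t in range(n_times)) for w in second
    )
    index = min(n_draws - 1, math.ceil((1.0 - alpha) * n_draws) - 1)
    return sigma, sups[index]


# Toy per-subject contributions: random-walk paths on 8 time points for 40 subjects.
data_rng = random.Random(1)
psi = []
for _ in range(40):
    running, path = 0.0, []
    for _ in range(8):
        running += data_rng.gauss(0.0, 1.0)
        path.append(running)
    psi.append(path)

sigma, c_alpha = critical_level(psi)
```

Because the supremum is taken over several time points, the resulting critical level is typically larger than the pointwise normal percentile, which is why simultaneous bands are wider than pointwise intervals.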

Figure 2.

Pointwise and simultaneous confidence bands for the one-step and multi-step Δ functions of ALL patients receiving BMT. Covariates: age z1 ≤ 29.5 and z2 = baseline age. The remaining covariates correspond to the baseline.

Younger age was associated with reduced probabilities of relapse and death of both AML and ALL patients. In Figure 2, we use the Δ function to compare transition probabilities of hypothetical younger (z1) and baseline age (z2) ALL bone marrow transplant recipients. The remaining covariates correspond to baseline groups specified in Table 2 and median waiting time variables DxCr and CrTx. The plots show that younger age has a “concordant” effect on endpoint probabilities, i.e. younger age is associated with reduced probability of both relapse and death. In the case of one-step tx → relapse transitions, the pointwise bands suggest that the differences are significant but the wider simultaneous bands show that this is not the case. Examination of the four possible paths leading to the relapse state showed that although younger patients have lower one-step relapse transition probabilities, they are at a higher risk of relapse preceded by AGVHD than patients in the baseline age group. This accounts for marginal differences in the multi-step relapse transition probabilities. Figure 2 also shows that multi-step transitions into the death state are significantly lower for a younger patient, since the upper bounds of both pointwise and simultaneous bands are below the horizontal line passing through 0. While in the case of one-step transition probabilities there is a marginal difference during the early post-transplant period, patients in the baseline age group had higher probabilities of death preceded by GVHD.

In Figure 3, we show the “discordant” effect of older age on the two endpoint probabilities. The graphs represent the Δ function for hypothetical ALL patients receiving peripheral blood stem cell transplant. The covariate z1 corresponds to the older age and z2 to the baseline age group. The remaining covariates correspond to baseline (Table 2). Older age was associated with lower transition probabilities into the relapse state. On the other hand, the role of the two covariates is reversed in the case of transitions into the death state. Plots of the four paths leading to the endpoint events showed that an older patient may have higher probabilities of death resulting from CGVHD (with or without AGVHD), while the probability of transition along the path tx → AGVHD → death is comparable to that of a patient in the baseline age group.

Figure 3.

Pointwise and simultaneous confidence bands for the one-step and multi-step Δ functions of ALL patients receiving PBSCT. Covariates: z1 = age > 42.5 years, z2 = baseline age. The remaining covariates correspond to the baseline.

In the next figure we show a “switching” treatment effect. Figure 4 compares two hypothetical young AML patients receiving PBSCT (z1) and BMT (z2). The one-step and multi-step relapse probabilities were lower in the case of PBSCT but the differences were not significant. On the other hand, we see that PBSCT is associated with a lower probability of one-step transition into the death state, while in the case of multi-step transitions the role of the two graft sources is reversed. This pattern is also seen in the case of young ALL patients in Figure 1, but in the case of AML patients the differences in the multi-step transition probabilities were more pronounced.

Figure 4.

Pointwise and simultaneous confidence bands for the one-step and multi-step Δ functions of young AML patients. Covariates: z1 = PBSCT, z2 = BMT. The remaining covariates correspond to the baseline.

A similar approach can be applied to compare transition probabilities evaluated by conditioning on the follow-up history of a patient. In particular, Arjas and Eerola (1993) and Eerola (1994) have suggested the use of graphs of the conditional probabilities

$$P(J(t) = e \mid \mathcal{H}_s), \quad s < t, \qquad (4.5)$$

where $\mathcal{H}_s$ represents the patient’s history up to time s. Examples of these graphs specialized to Markov chains and semi-Markov models were given in Klein et al. (1993), Keiding et al. (2001), Dabrowska et al. (1994) and Putter et al. (2007). Here we note only that in the case of Markov chain regression models, the predictions depend only on the state occupied by the patient at time s and estimation of (4.5) reduces to estimation of the transition probability matrix because

$$P(J(t) = e \mid \mathcal{H}_s) = P(J(t) = e \mid J(s) = i, Z) \quad \text{for } s < t. \qquad (4.6)$$

In the case of a semi-Markov model, the conditional probabilities $P(J(t) = e \mid \mathcal{H}_s)$ are given by the transition probability matrix of a delayed Markov renewal process, with delay determined by the length of time spent in the state occupied at time s. On the other hand, the right-hand side of (4.6) depends also on the initial state J0, and all possible transitions leading to the state e and passing through the state i on or prior to time s. The two models coincide only if the sojourn times in each state are exponentially distributed.

In Table 5 we report results from analysis of residuals of the main variables in the model. We considered Martinussen and Scheike’s Kolmogorov-Smirnov statistics for transitions between adjacent states of the model from each state. The test statistics were calculated in the range t ∈ [1, 90] months and the reported p-values were obtained using the Gaussian multiplier method based on 5000 Monte Carlo samples. The results were also compared with a larger model, which included length of time spent in the transplant and AGVHD states as time dependent covariates. The dependence on length of time spent in these states appeared to have marginal effect. In the case of the transitions originating from the CGVHD state, the latter may stem from a relatively small number of failures (relapse or death). On the other hand, AGVHD can occur only during the first 4 months and the state space of the process partially captures the dependence on the length of time spent in the transplant state. Although Table 5 shows an acceptable fit, there are several possible sources of departure from the model. In particular, they may be caused by calendar and center effects. For example, grading of acute and chronic GVHD is not uniform across centers. At the same time, the use of PBSCT in allogeneic transplants might have been more frequent towards the end of the study period than at its beginning. These factors were not taken into account in this study as they identify patients in the population. Further, transplant may result in many other complications, including infections, pneumonia, as well as secondary cancers, loss of vision and damage of other organs. We have not taken them into account due to lack of data.

There has been very little work on variable and model selection problems in multi-state models. Commenges et al. (2007) considered a flexible class of multi-state models which includes as special cases Markov chains and semi-Markov models. They extended the expected Kullback-Leibler (EKL) risk function to counting process models coarsened at random and proposed a leave-one-out cross-validation method for approximation of EKL based on penalized likelihoods. The approach was illustrated using a three state additive illness process, though the methodology applies to more complex situations as well. Another approach may be based on the focused information criteria and model averaging of Hjort and Claeskens (2003, 2006). Their approach is tailored towards selection of a model for given parameters of interest. In the case of single spell models, examples of such parameters include regression coefficients, quantiles, cumulative hazards or distribution functions evaluated at a fixed point or over a fixed interval. Extension of this method to multi-state regression models may include one-step and multi-step transition probabilities or other parameters arising in prediction problems.

5 Proofs

5.1 Assumptions and notation

We first recall that if A = [a_{kℓ}] is a rectangular d × q matrix then its ℓ1 and ℓ∞ norms are given by

$$\|A\|_1 = \max_{\ell}\sum_{k=1}^{d}|a_{k\ell}| \quad\text{and}\quad \|A\|_\infty = \max_{k}\sum_{\ell=1}^{q}|a_{k\ell}|,$$

and we have $\|A\|_1 = \sup\{\mu^T A\lambda : \|\mu\|_\infty \le 1, \|\lambda\|_1 \le 1\} = \|A^T\|_\infty$, where μ = (μ1, …, μd)^T and λ = (λ1, …, λq)^T. If A(s) = [a_{ij}(s)], s = (x, θ), is a d × q matrix of functions defined on 𝒯 = [0, τ] × Θ then $\|A\|_1 = \sup\{\|A(s)\|_1 : s \in \mathcal{T}\}$ is the corresponding supremum norm, and with some abuse of notation, we write $\|A\|_\infty = \sup\{\|A(s)\|_\infty : s \in \mathcal{T}\}$. We also use ||·|| to denote the supremum norm of scalar or vector-valued functions on [0, τ].
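As a small numerical illustration of these matrix norms (a sketch only; the helper names `norm_1`, `norm_inf` and `transpose` are ours), the ℓ1 norm is the largest absolute column sum, the ℓ∞ norm is the largest absolute row sum, and the two are exchanged by transposition.

```python
def norm_1(a):
    """Largest absolute column sum of a rectangular matrix (list of rows)."""
    n_cols = len(a[0])
    return max(sum(abs(row[col]) for row in a) for col in range(n_cols))


def norm_inf(a):
    """Largest absolute row sum of a rectangular matrix (list of rows)."""
    return max(sum(abs(x) for x in row) for row in a)


def transpose(a):
    """Transpose of a rectangular matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


A = [[1, -2, 3],
     [4, 5, -6]]  # d = 2 rows, q = 3 columns
print(norm_1(A), norm_inf(A), norm_inf(transpose(A)))
```

For this matrix the column sums are 5, 7, 9 and the row sums are 6, 15, so the duality between the norm of A and the norm of its transpose can be checked by hand.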

We shall assume the following regularity conditions on the hazard rates αj(y, θ, z), yRq, jInline graphic.

Condition 5.1

  1. The parameter set Θ ⊂ Rd is bounded and open.

  2. For fixed z ∈ R^d, the function ℓj(y, θ, z) = log αj(y, θ, z), j ∈ Inline graphic, is twice continuously differentiable with respect to (y, θ). The derivatives with respect to y (denoted by primes) and with respect to θ (denoted by dots) satisfy $\|\ell_j'(y,\theta,z)\|_1 \le \psi(\|y\|_1)$, $\|\ell_j''(y,\theta,z)\|_1 \le \psi(\|y\|_1)$, $\|\dot\ell_j(y,\theta,z)\|_1 \le \psi_1(\|y\|_1)$, $\|\ddot\ell_j(y,\theta,z)\|_1 \le \psi_2(\|y\|_1)$ and $\|g(y,\theta,z) - g(y',\theta',z)\|_1 \le \max(\psi_3(\|y\|_1), \psi_3(\|y'\|_1))\,[\|y - y'\|_1 + |\theta - \theta'|]$, where $g = \ddot\ell_j$, $\dot\ell_j'$ and $\ell_j''$. Here ψ is a constant or a continuous bounded decreasing function. The functions ψp, p = 1, 2, 3 satisfy ψp(0) < ∞, are continuous and locally bounded.

  3. For fixed θ ∈ Θ and yRq, the functions αj(y, θ, ·) and their logarithmic derivatives in (ii) are measurable with respect to the Borel σ-field of Rd.

  4. We have either a) m1 < αj(y, θ, z) < m2 for some 0 < m1 < m2 < ∞ or b) αj(y, θ, z) is a bounded coordinate-wise decreasing function such that αj(y1, …, yq, θ, z) ↓ 0 as yℓ ↑ ∞, ℓ = 1, …, q, and
    $$m_1[1 + c_1\|y\|_1]^{-e_1} \le \alpha_j(y,\theta,z) < m_2[1 + c_2 y_j]^{-e_2}, \quad j = 1, \ldots, q,$$

    for some c1, c2 > 0, e1 ∈ (0, 1], e2 ∈ [0, 1] and 0 < m1 < m2 < ∞.

The condition (ii) assumes that the function α(y, θ, z) and its derivatives are jointly continuous in the arguments (y, θ). Together with the condition (iii), this implies that they are measurable with respect to the Borel σ-field ℬ(R^q) ⊗ ℬ(Θ) ⊗ ℬ(R^d). The condition 5.1 (iii) serves to ensure that for each state j1 ∈ Inline graphic, the Volterra equation corresponding to the transitions j ∈ Inline graphic originating from the state j1 has a non-explosive solution on the interval [0, τj1], τj1 = sup{x: EYj(x) > 0}.

Let P be a distribution satisfying assumptions 2.1 of Section 2. For any j ∈ Inline graphic let

$$A_{jP}(x) = \int_0^x \frac{E_P N_{j.i}(du)}{E_P Y_{j.i}(u)}$$

and set $A_{\cdot P} = \sum_j A_{jP}$. In analogy to the single spell models in Dabrowska (2006), we can show that the condition 5.1 (iii-a) implies that the Volterra equation has a unique solution $\Gamma_\theta = [\Gamma_{1\theta}, \ldots, \Gamma_{q\theta}]^T$ such that $m_2^{-1}A_{jP}(x) \le \Gamma_{j\theta}(x) \le m_1^{-1}A_{jP}(x)$ for x ∈ [0, τ], θ ∈ Θ. In addition, there exist positive constants d1, d2, d3 such that

$$\|\Gamma_\theta(x) - \Gamma_{\theta'}(x)\|_1 \le |\theta - \theta'|\, d_1 \exp[d_2 A_{\cdot P}(x)], \qquad |\Gamma_{j\theta}(x) - \Gamma_{j\theta}(x')| \le d_3\, E_P N_{j.i}((x \wedge x', x \vee x']). \tag{5.1}$$

Similar inequalities hold also for the left continuous version of Γθ. On the other hand, under the condition 5.1 (iii-b), we have $\Phi_2(A_{jP}(x)) \le \Gamma_{j\theta}(x) \le \Phi_1(A_{\cdot P}(x))$, where $\Phi_q(u) = c_q^{-1}([1 + c_q u/\tilde m_q]^{1/(1-e_q)} - 1)$ for q = 1, 2 and $\tilde m_q = m_q/(1 - e_q)$ if $e_q \ne 1$, and $\Phi_q(u) = c_q^{-1}(e^{c_q u/m_q} - 1)$ if $e_q = 1$. The functions Φq are inverse cumulative hazards corresponding to the lower and upper bounds on hazard rates in the condition 5.1 (iii). The inequality (5.1) is in this case satisfied with the function $A_{\cdot P}$ replaced by $\Phi_1(A_{\cdot P})$.
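The statement that the functions Φq are inverse cumulative hazards can be checked numerically. The sketch below assumes the reading Φ(u) = c^{-1}([1 + cu/m̃]^{1/(1−e)} − 1) with m̃ = m/(1 − e) for e ≠ 1 and Φ(u) = c^{-1}(exp(cu/m) − 1) for e = 1, and verifies that Φ inverts the integrated bound y ↦ ∫_0^y m(1 + cs)^{−e} ds. The function names and the numerical values of m, c, e are illustrative assumptions.

```python
import math


def integrated_bound(y, m, c, e):
    """Closed form of the integral of m * (1 + c*s)**(-e) over s in [0, y]."""
    if e == 1:
        return (m / c) * math.log(1.0 + c * y)
    return m / (c * (1.0 - e)) * ((1.0 + c * y) ** (1.0 - e) - 1.0)


def phi(u, m, c, e):
    """Assumed form of Phi_q: the inverse of integrated_bound in its first argument."""
    if e == 1:
        return (math.exp(c * u / m) - 1.0) / c
    m_tilde = m / (1.0 - e)
    return ((1.0 + c * u / m_tilde) ** (1.0 / (1.0 - e)) - 1.0) / c


# Round trip: phi(integrated_bound(y)) should return y.
for e in (0.0, 0.5, 1):
    y = 2.0
    print(e, phi(integrated_bound(y, m=2.0, c=1.5, e=e), m=2.0, c=1.5, e=e))
```

With e = 0 the bound is a constant hazard m, and Φ reduces to u/m, which agrees with the bounds stated under the condition (iii-a).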

5.2 Some measurability issues

In section 2, we assumed that the observations D1, …, Dn of the censored modulated renewal process are defined on a common complete probability space (Ω, Inline graphic, P) and take on values in a separable measure space ( Inline graphic, Inline graphic). A measure space is here called separable if its σ-field is countably generated and contains all singletons. Any such space is measurably isomorphic to a subspace of the real line equipped with its Borel σ-field (e.g. Dellacherie-Meyer, 1975, p.15). Let ( Inline graphic, Inline graphic) and ( Inline graphic, Inline graphic) be the corresponding n-fold and infinite product spaces and let Pn and P be the corresponding product measures on Inline graphic and Inline graphic induced by (D1, …, Dn) and D = (D1, D2, …, Dn, …), respectively. We denote by SnP the sigma-field of subsets AInline graphic measurable in the completion of the product probability measure Pn and by Snu the universal sigma-field generated by Inline graphic, i.e. the sigma-field of subsets measurable in the completion of any probability measure Q on Inline graphic. We have SnSnuSnP. Whereas Inline graphic is not complete with respect to the product measure Pn, any set ASnP satisfies Pn(A)=Pn,(A) and gn-1(A)F for gn = (D1, …, Dn). The sigma-fields SP and Su have similar property. Without much loss of generality, we can assume therefore that (Ω, Inline graphic) = ( Inline graphic, Inline graphic) and, when necessary, require measurability with respect to these larger sigma-fields. With this choice the sequence D is the identity map on Inline graphic and (D1, …, Dn) are the corresponding coordinate projections on ( Inline graphic, Inline graphic).

Further, let (Ω0, Inline graphic) be an arbitrary measure space and let Inline graphic be a Polish space or a Borel subset of it. For any set A ⊂ Ω0 × Inline graphic, its projection on Ω0 is denoted by projΩ0(A) = {ω0: (ω0, z) ∈ A for some z ∈ Inline graphic}. A multifunction (or correspondence) is a set-valued function assigning to each ω0 ∈ Ω0 a subset of Inline graphic. We shall write H: Ω0 ↪ Inline graphic for such mappings to differentiate them from usual functions assigning to each ω0 a single value (h: Ω0 → Inline graphic). The domain and graph of a multifunction H are defined as

$$\mathrm{dom}\,H = \{\omega_0 : H(\omega_0) \ne \emptyset\} \quad\text{and}\quad \mathrm{graph}\,H = \{(\omega_0, z) : z \in H(\omega_0)\},$$

respectively. For any nonempty set B ⊂ Inline graphic, the inverse image of H is given by

$$H^{-1}(B) = \{\omega_0 : H(\omega_0) \cap B \ne \emptyset\} = \{\omega_0 : z \in H(\omega_0) \text{ for some } z \in B\}$$

and the right side is equal to the projection projΩ0(graph H ∩ (Ω0 × B)). Finally, by a selector we mean a function h: Ω0 → Inline graphic ∪ {z*} such that h(ω0) ∈ H(ω0) if ω0 ∈ dom H and h(ω0) = z* otherwise. (Here z* is an extra point attached to Inline graphic.)

A set-valued mapping H is here called measurable if graphH is jointly measurable with respect to Inline graphicB( Inline graphic). By measurable projection theorems (e.g. Dellacherie and Meyer, 1975, p.252, Pollard, 1984, p. 196–197 or Dudley, 1999, Chapter 5), the joint measurability of graph H entails that the inverse image H−1(B) of any Borel set BInline graphic( Inline graphic) belongs to the universal sigma field F0u generated by Inline graphic. Moreover, H admits at least one F0u -measurable selector. If Inline graphic is complete with respect to some probability measure then F0u=F0. For alternative conditions for this equality we refer to Wagner (1976).

Further, let Inline graphic be a Polish space and let {Xt: t ∈ Inline graphic} be an Rk-valued random element defined on Ω0. We refer to it as measurable if it forms a measurable stochastic process, i.e. the map Ω0 × Inline graphic ∋ (ω, t) → X(ω, t) ∈ Rk is jointly measurable with respect to the σ-fields Inline graphic ⊗ Inline graphic( Inline graphic) and Inline graphic(Rk). Correspondingly, the set valued function H: Ω0 ↪ Inline graphic = Inline graphic × Rk given by H(ω0) = {(t, X(ω0, t)): t ∈ Inline graphic} has a measurable graph and for any Borel sets B ∈ Inline graphic(Rk) and C ∈ Inline graphic( Inline graphic), we have {ω0: X(ω0, t) ∈ B for some t ∈ C} ∈ F0u. In section 5.3, we use that an Rk-valued process is measurable iff each of its components is measurable. Moreover, sums and products of such processes are measurable as well.

A class of scalar functions Inline graphic = {gt(s): tInline graphic} defined on Inline graphic, kn is called here measurable if it forms a measurable process in the above sense. Following Nolan and Pollard (1987) and Pollard (1990), a measurable class of functions Inline graphic is called Euclidean for an envelope G if |gt|(s) ≤ G(s) for all tInline graphic, and there exist constants A and V such that N(ε||G||Q,r, Inline graphic, ||·||Q,r) ≤ (A/ε)V for all ε ∈ (0, 1) and all probability measures Q on Inline graphic such that ||G||Q,r < ∞. Here N (η, Inline graphic, ||·||Q,r) is the minimal number of Lr(Q)– balls of radius η covering the class Inline graphic and ||·||Q,r is the Lr(Q) norm. We use r = 1, 2 in the sequel.

In our application the space ( Inline graphic, Inline graphic) can be taken as the complete separable metric space ( Inline graphic, Inline graphic) = (E0, Inline graphic(E0)) × (E1 × Inline graphic(E0)), where E0 = Inline graphic × Rd E1 = (+ × ( Inline graphic × Rd)∪ Δ). Here E0 represents possible initial realizations of the mark V0 = (J0, Z0) and E1 is the space of realizations of the censored modulated renewal process (Xm, Vm = (Jm, Zm))m≥1. Further, Inline graphic = [0, τ] × Θ, where τ is a finite point on the positive half-line and Θ is a bounded open subset of a Euclidean space. Here Inline graphic is a Polish space because Inline graphic forms a Gδ subset (a countable intersection of open sets) of a Polish space and Polishness is hereditary with respect to Gδ sets. Finally, all classes Inline graphic = {gt(s): t = (x, θ) ∈ Inline graphic} correspond to cádlág (or cáglád) functions such that for 0 ≤ x < x′ ≤ τ and θ, θ′ ∈ Θ, we have

$$|g_{x\theta}(s) - g_{x'\theta}(s)| \le C_1[G(x', s) - G(x, s)], \qquad |g_{x\theta}(s) - g_{x\theta'}(s)| \le C_2|\theta - \theta'|\,G(\tau, s), \tag{5.2}$$

where G(x, s) is a nonnegative monotone increasing càdlàg (respectively càglàd) function of x such that G(0, s) = 0 and ||G(τ, ·)||Q,r < ∞. In this case, the Euclidean property is satisfied with envelope G(s) = [C1 + C2 diam Θ]G(τ, s) + gx0θ0(s), where gx0θ0(s) is an arbitrary function from the class Inline graphic.

To verify measurability of the estimates, we shall need some properties of Carathéodory integrands and cádlág or cáglád functions. If Inline graphic and Inline graphic are Polish spaces then a function f: Ω0 × Inline graphicInline graphic is called a Carathéodory integrand if for fixed tInline graphic, f (·, t): Ω0Inline graphic is measurable, and for fixed ω0 ∈ Ω0, f (ω0, ·): Inline graphicInline graphicis continuous. Here (Ω0, Inline graphic) is an arbitrary measure space and we have

Lemma 5.1

Let f: Ω0 × Inline graphicInline graphic be a Carathéodory mapping. Then

  1. f is measurable with respect to Inline graphic × Inline graphic( Inline graphic).

  2. For any open set B of Inline graphic, let H(ω0) = {t: f (ω0, t) ∈ B}. Then for any closed or open C set of Inline graphic, we have H−1(C) = {ω0: f (ω0, t) ∈ B for some tC} ∈ Inline graphic.

  3. If g: Ω0Inline graphic is measurable, then the composite mapping fg: Ω0Inline graphic given by (fg)(ω0) = f (ω0, g(ω0)) is measurable.

  4. Suppose that Inline graphic is another Polish space and h: Ω0 × Inline graphic × Inline graphicInline graphic is a Carathéodory integrand. Then the composite map (hf): Ω0 × Inline graphicInline graphic given by (hf )(ω0, t) = h(ω0, t, f (ω0, t) is a Carathéodory integrand.

Part (i) remains valid even if Inline graphic is replaced by a nonseparable metric space (Kuratowski 1966 p. 378, or Himmelberg, 1975). In part (ii), if C is a closed set then

$$H^{-1}(C) = \bigcup_{q}\{\omega_0 : f(\omega_0, q) \in B\},$$

where the union is over a dense subset of C. If C is open then it can be represented as a countable increasing union of closed sets and part (ii) follows by noting that inverse images preserve unions of sets. Part (iii) follows from the definition of a measurable function and continuity of f with respect to t. Part (iv) follows from part (i) and (iii) and definition of a continuous function.

Part (i) of the lemma extends to functions f which are cádlág, cáglád, cád and cág in tInline graphic, Inline graphic = R+ or Inline graphic = [0, τ] and take on values in a complete separable metric space (e.g. Dellacherie and Meyer, 1975 p. 144). Any cádlág or cáglád function is also a pointwise limit of Carathéodory integrands.

Finally, suppose that 𝒯 = [0, τ] × Θ and f is a function such that (i) for fixed (x, θ) ∈ 𝒯, f(·, x, θ) is Inline graphic measurable and (ii) for fixed ω0 ∈ Ω0, it is jointly càdlàg with respect to (x, θ) and continuous with respect to θ. To see that f is jointly measurable, let {qk: k ≥ 1} be a dense set in Θ and for given integer m ≥ 1 let Bmk be balls of radius 1/m centered at qk covering Θ. Set $B'_{mk} = B_{mk} - \bigcup_{r=1}^{k-1} B_{mr}$ and

$$f_m(\omega_0, x, \theta) = \sum_{\ell=1}^{\infty}\sum_{k=1}^{\infty} f\Big(\omega_0, \frac{\ell}{m} \wedge \tau, q_k\Big)\, 1\Big(\frac{\ell - 1}{m} \le x < \frac{\ell}{m}\Big)\, 1(\theta \in B'_{mk}).$$

Then fm is jointly measurable and converges pointwise to f. Similarly, if f is a jointly càglàd rather than càdlàg function in (ii) then f is jointly measurable with respect to Inline graphic ⊗ Inline graphic( Inline graphic). Similarly to the single parameter case, functions of this type are pointwise limits of Carathéodory integrands. Part (ii) of the lemma remains valid for sets of the form C = I × C′, where C′ is an open or closed subset of Θ and I is an interval contained in [0, τ]. In particular, if f is a real valued càdlàg function of this type then its supremum is Inline graphic measurable.

5.3 Proof of Proposition 3.1

To show proposition 3.1, we shall first consider the process Γ(x), (x, θ) ∈ [0, τ] × Θ = Inline graphic.

Lemma 5.2

  1. The process Ŵ= {Ŵ(t) = [Ŵj(t): t = (x, θ) ∈ Inline graphic, jInline graphic}, W^j(x,θ)=n[Γnjθ-Γjθ](x), converges weakly in ℓ( Inline graphic × Inline graphic) to
    W(x,θ)=V(x,θ)-[0,x]V(u-,θ)s(Γθ(u-),θ,u)Cθ(du)Pθ(u,x),
    where {V (t) = [Vj(t): t = (x, θ) ∈ Inline graphic, jInline graphic} is a tight mean zero Gaussian process. Its covariance function is given by
    cov(Vj(x,θ),Vj(x,θ))=Emm0x0xMjm(du,θ)Mjm(dv,θ)sj(Γθ(u),θ,u)sj(Γθ(v),θ,v).

    In addition, under the assumption that observations correspond to a censored modulated renewal process and θ = θ0 is the true parameter, cov(Vj(x, θ), Vj(x′, θ)) = 1(j = j′)C(xx′).

  2. Let θ0 be an arbitrary point in Θ. If θ̂ is a √n-consistent estimate of it, then the process Ŵ0 = {Ŵ0(x): x ≤ τ}, $\hat W_0 = \sqrt{n}[\Gamma_{n\hat\theta} - \Gamma_{\theta_0} - (\hat\theta - \theta_0)^T\dot\Gamma_{n\hat\theta}]$, converges weakly in ℓ∞([0, τ] × Inline graphic) to W0 = W(·, θ0).

Here the space Inline graphic = ℓ( Inline graphic × Inline graphic) is equipped with uniform metric, dX(, ) = supt,j |(t, j) − (t, j)| and is isometric to the space Inline graphic = ℓ( Inline graphic)q equipped with metric dY (x, y) = maxj supt |xj(t) − yj(t)|. Apparently, the isometry is given by the mapping Φ assigning to each Inline graphic the vector of coordinate functions, Φ() = [(·, 1), …, (·, q)]T. Open sets of Inline graphic can be represented as arbitrary unions of balls Inline graphic(, ε) = {y: dX(x, y) < ε}. On the other hand, the product topology of Inline graphic coincides with the topology induced by the metric dY so that any open set in the product topology is an arbitrary union of balls Inline graphic(x, ε), where x = [x1, …, xq].

Proof

To show part (i), define Vn = [Vjn: j ∈ Inline graphic], where

$$V_{jn}(x,\theta) = \int_{(0,x]} \frac{N_{j..}(du)}{S_j(\Gamma_\theta(u-),\theta,u)} - \int_{(0,x]} \frac{EN_{j..}(du)}{s_j(\Gamma_\theta(u-),\theta,u)}.$$

Then Vjn = V1jn + remj, where

$$V_{1jn}(x,\theta) = \frac{1}{n}\sum_{i=1}^{n}\int_{(0,x]}\Big[\frac{N_{j.i}(du)}{s_j(\Gamma_\theta(u-),\theta,u)} - \frac{S_{ji}}{s_j^2}(\Gamma_\theta(u-),\theta,u)\,EN_{j..}(du)\Big]$$

and remj(x, θ) is a remainder term. Lemma 5.4 gives its form and shows that ||remj|| = oP(n^{−1/2}). Therefore the process V1n = [V1jn: j ∈ Inline graphic] satisfies also ||Vn − V1n|| = oP(n^{−1/2}).

Using the CLT and the Cramér-Wold device, the finite dimensional distributions of √n V1n converge in distribution to the finite dimensional distributions of V: for any distinct t1, …, tk ∈ 𝒯 and any numerical vector λ of length kq, the random variable λ^T vec[√n V1n(t1), …, √n V1n(tk)] converges in distribution to the corresponding linear combination of finite dimensional marginals of V.

For each j ∈ Inline graphic, the process V1jn can be represented as V1jn(x, θ) = [ℙn − P]g, where g varies over a class 𝒢j = {gtj: t = (x, θ) ∈ 𝒯} consisting of càdlàg functions such that each gtj is a difference of two càdlàg functions, increasing in x and Lipschitz continuous with respect to θ. Setting $G_j(D_i, x) = N_{j.i}(x) + \int_0^x Y_{j.i}(u)A_j(du)$, the condition (5.2) is satisfied with constants C1 and C2 determined by the functions ψ, ψ1 of the condition 5.1 (ii) and $g_{t_0} \equiv g_{\tau,\theta_0}$, say. Correspondingly, the class 𝒢j is Euclidean for a square integrable envelope Gj. From Pollard (1984, 1990) it follows that the process √n V1jn converges weakly in ℓ∞(𝒯) to Vj, the j-th component of the process V, because the class 𝒢j is totally bounded and asymptotically uniformly equicontinuous with respect to the variance pseudo-metric dj(t, t′) = sd(V1jn(t) − V1jn(t′)), t, t′ ∈ 𝒯. Joint weak convergence of the process √n Vn = √n(ℙn − P)g, g ∈ ∪j 𝒢j, follows from finite dimensional weak convergence and by noting that a union of a finite number of Euclidean classes of functions is also Euclidean (Pollard, 1990). In particular, the class ∪j 𝒢j is totally bounded and asymptotically equicontinuous with respect to the variance pseudo-metric d((t, j), (t′, j′)) = sd(V1nj(t) − V1nj′(t′)). Denoting by $V_n^-$ the left-continuous process (obtained by changing the integrals over (0, x] to integrals over intervals [0, x)), the process $\sqrt{n}V_n^-$ converges weakly to V as well because the jumps of the process Vn are of the order Op(1/n) uniformly in t ∈ 𝒯 and the functions ENj are continuous.

Finally, to show weak convergence of the standardized Γnθ process, we shall need bounds on the supremum of the norm of the vector Vn. Let ℋ denote the class of functions ℋ = {h(λ, t) = Σj λj gtj: gtj ∈ 𝒢j, |λj| ≤ 1, j = 1, …, q}. Then ℋ forms a Euclidean class for the envelope H = Σj Gj and we have

$$E\sup_{t\in\mathcal{T}}\|\sqrt{n}V_{1n}(t)\|_1 = E\sup_{h\in\mathcal{H}}|\sqrt{n}(\mathbb{P}_n - P)h| = O(1).$$

Similarly, $E\sup_{t\in\mathcal{T}}\|\sqrt{n}V_{1n}(t)\|_\infty = O(1)$ and the left-continuous versions of the process satisfy similar bounds.

To show consistency of the estimate Γnθ, we first assume the condition 5.1 (iii-a). Let Ajn be the Aalen-Nelson estimator. Let $A_{pjn} = m_p^{-1}A_{jn}$, p = 1, 2. Then A2jn(x) ≤ Γnjθ(x) ≤ A1jn(x) for all θ ∈ Θ and a similar algebra as in Dabrowska (2006) shows that

$$|\Gamma_{nj\theta}(x) - \Gamma_{j\theta}(x)| \le |V_{jn}(x,\theta)| + \int_{(0,x]}\|\Gamma_{n\theta} - \Gamma_\theta\|_1(u-)\,\rho_{jn}(du),$$

where ρjn = max(cj, 1)A1jn for some constant cj. Therefore ||Γnθ − Γθ||1(x) ≤ ||Vn(x, θ)||1 + ∫(0,x] ||Γnθ − Γθ||1(u−)ρn(du), where ρn = Σj ρjn. Gronwall’s inequality (Beesack, 1973, Dabrowska, 2006) implies that supx,θ exp[−ρn(x)]||Γnθ − Γθ||1(x) → 0 a.s., where the supremum is over θ ∈ Θ and x ∈ [0, τ]. In the case of the condition 5.1 (iii-b), the proof is the same, except that the function ρjn is replaced by ρj = max(cj, 1)Φ1(A.n), where A.n = Σj Ajn. Note that the Aalen-Nelson estimate is a measurable process, whereas measurability of the process Γnθ is verified below.
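For orientation, the Aalen-Nelson estimator that brackets Γnjθ in the argument above can be sketched in the simplest single-transition, right-censored setting. The function name and the data layout are assumptions made for illustration; the estimator Ajn of the paper aggregates counting and risk processes over the visits of a multi-state process, which this sketch does not attempt to reproduce.

```python
def nelson_aalen(times, events):
    """Aalen-Nelson cumulative hazard at the distinct observed event times.

    times[i]  : observed time of subject i (event or censoring)
    events[i] : 1 if an event was observed at times[i], 0 if censored
    """
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    cumulative = 0.0
    path = []
    for t in event_times:
        at_risk = sum(1 for s in times if s >= t)  # risk set size just before t
        jumps = sum(1 for s, d in zip(times, events) if s == t and d == 1)
        cumulative += jumps / at_risk
        path.append((t, cumulative))
    return path


print(nelson_aalen([1, 2, 2, 3], [1, 1, 0, 1]))
```

Under the condition (iii-a) the bounds A2jn ≤ Γnjθ ≤ A1jn are obtained by dividing such a path by the constants m2 and m1.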

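The process $\hat W(x,\theta) = \sqrt{n}[\Gamma_{n\theta} - \Gamma_\theta]^T(x)$ satisfies

$$\hat W(x,\theta) = \sqrt{n}V_n(x,\theta) - \int_{(0,x]}\hat W(u-,\theta)\,b_{n\theta}(u)\,\bar N(du),$$

where $\bar N(x)$ is the diagonal matrix $\bar N(x) = \mathrm{diag}[N_{1..}(x), \ldots, N_{q..}(x)]$, and $b_{n\theta}(u)$ is a q × q matrix with columns

$$b_{jn\theta}(u) = \Big[\int_0^1 (S_j'/S_j^2)(\theta, \Gamma_\theta(u-) + \lambda[\Gamma_{n\theta} - \Gamma_\theta](u-), u)\,d\lambda\Big].$$

Let $b_\theta(u)$ be a q × q matrix with columns $b_{j\theta}(u) = [s_j'/s_j^2](\Gamma_\theta(u),\theta,u)$. Using consistency of Γnθ and Lemma 5.3, we have $[b_{n\theta} - b_\theta](u) \to 0$ a.s. uniformly in (u, θ) ∈ 𝒯. Moreover, (5.1) and (5.2) imply also that ||R1n|| → 0 a.s., where

$$R_{1n}(x,\theta) = \int_{(0,x]} b_\theta(u)[\bar N - E\bar N](du).$$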

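Define

$$\tilde W(x,\theta) = \sqrt{n}V_n(x,\theta) - \int_{(0,x]}\tilde W(u-,\theta)\,b_\theta(u)\,E\bar N(du).$$

Then

$$\tilde W(x,\theta) = \sqrt{n}V_n(x,\theta) - \int_{(0,x]}\sqrt{n}V_n(u-,\theta)\,b_\theta(u)\,E\bar N(du)\,P_\theta(u,x) = \int_{(0,x]}\sqrt{n}V_n(du,\theta)\,P_\theta(u,x)$$

and

$$\hat W(x,\theta) - \tilde W(x,\theta) = -\int_{(0,x]}[\hat W - \tilde W](u-,\theta)\,b_{n\theta}(u)\,\bar N(du) + \mathrm{rem}(x,\theta),$$

where

$$\mathrm{rem}(x,\theta) = -\int_{(0,x]}\tilde W(u-,\theta)[b_{n\theta}(u)\,\bar N(du) - b_\theta(u)\,E\bar N(du)]. \tag{5.3}$$

Setting $v_n = \max(\|\sqrt{n}V_n\|_1, \|\sqrt{n}V_n^-\|_1) = O_P(1)$, we have

$$\max(\|\tilde W\|, \|\tilde W^-\|) \le v_n \exp\sup_\theta\int_0^\tau\|b_\theta(u)\|_1\,EN(du) = O_p(1).$$

The process $\tilde W$ is a sum of iid mean zero processes whose finite dimensional distributions are asymptotically normal and converge to the finite dimensional distributions of the process W in the statement of the proposition. Moreover, its components can be represented as empirical processes indexed by Euclidean classes of functions satisfying the condition (5.2). Therefore a similar argument as in the case of the process √n V1n shows that $\tilde W$ converges weakly to W. The remainder term (5.3) is bounded by $\sum_{p=2}^{4}R_{pn}(x,\theta)$, where

$$R_{2n}(x,\theta) = \int_{(0,x]}\sqrt{n}V_n(u-,\theta)\,R_{1n}(du,\theta), \qquad R_{3n}(x,\theta) = \int_{(0,x]}\sqrt{n}V_n(u-,\theta)\,b_\theta(u)\,E\bar N(du)\,J_n(u,x,\theta),$$

$$R_{4n}(x,\theta) = \int_{(0,\tau]}\|[b_{n\theta} - b_\theta](u)\|_1\,N(du)\,\|\tilde W(u-,\theta)\|_1, \qquad J_n(u,x,\theta) = \int_{(u,x]}P_\theta(u,w)\,R_{1n}(dw,\theta),$$

where N = Σj Nj... We have ||R2n|| = oP(1), by a similar V-process expansion as in Lemma 5.4 below. Using Kolmogorov equations for matrix product integrals (Gill and Johansen, 1990), we also have

$$J_n(u,x,\theta) = R_{1n}(x,\theta) - R_{1n}(u,\theta) - \int_{(u,x]}P_\theta(u,s-)\,b_\theta(s)\,E\bar N(ds)\,[R_{1n}(x,\theta) - R_{1n}(s,\theta)]$$

and

$$\|J_n(u,x,\theta)\|_1 \le 2\|R_{1n}\|\Big[1 + \int_{(u,x]}\|P_\theta(u,s-)\|_1\|b_\theta(s)\|_1\,EN(ds)\Big] \le 2\|R_{1n}\|\exp\int_{(u,x]}\|b_\theta(s)\|_1\,EN(ds) \le 2\|R_{1n}\|\exp\int_{(0,\tau]}\|b_\theta(s)\|_1\,EN(ds).$$

From this we also get ||R3n|| = oP(1), because bθ(u) is uniformly bounded. Finally, ||R4n|| = oP(1). Combining, the right-hand side of (5.3) is of the order oP(1), uniformly in (x, θ) ∈ 𝒯. For fixed (x, θ), we also have

$$\|\hat W(x,\theta) - \tilde W(x,\theta)\|_1 \le \|\mathrm{rem}(x,\theta)\|_1 + \int_{(0,x]}\|\hat W - \tilde W\|_1(u-,\theta)\,\rho_n(du)$$

and by uniform Gronwall’s inequality (Beesack, 1993, Dabrowska, 2006), we have $\hat W(x,\theta) = \tilde W(x,\theta) + o_P(1)$ uniformly in (x, θ) ∈ 𝒯.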

To complete the argument, we note that the processes V1n Vn, and the remainders Rpn, p = 1, …, 3 satisfy measurability conditions of section 5.2, whereas to show that Ŵ and R4n have this property, it is enough to show that the process Γ is measurable. However, the aggregate process N(x)=i=1nmNjmi(x) is measurable since it is cádlág increasing with respect to x and measurable with respect to Inline graphic for fixed x. For any integer k and ω0 = (s1, …, sn), Tk(ω0) = inf{x: N(ω0, x) ≥ k} is a random variable because {ω0: Tk(ω0) ≤ x} = {ω0: N(x, ω0) ≥ k}∈ Inline graphic. Similarly, the censored data ranks Rim = Σk 1(TkXim) are measurable. Define set valued mapping Hn: Inline graphicRq by setting Hn(ω0) = {(θ, x): Γ(ω0, x)) ∈ B} where B is an open set of Rq. Then Hn(ω0) =∪ℓ≥0 Hn(ω0) where

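$$H_{n\ell}(\omega_0) = \{(\theta, x) : \Gamma_{n\theta}(\omega_0, x) \in B \text{ and } N(\omega_0, \tau) = \ell\}.$$

On the set Aℓ = {ω0: N(ω0, τ) = ℓ} ∈ Inline graphic, the process Γnθ is a weighted sum

$$\Gamma_{n\theta}(x, \omega_0) = \sum_{k=1}^{\ell} 1(T_k(\omega_0) \le x)\,h_{n\theta}(\cdot, k, \omega_0)$$

and the weights form a finite composition of Carathéodory integrands. Suppressing dependence on ω0, hnθ(·, k) is the k-th column of a q × ℓ matrix hnθ with entries

$$h_{n\theta}(j, k) = \frac{\sum_{i,m} 1(R_{im} = k)\,1((J_{im}, J_{i,m+1}) = j)}{\sum_{i,m} 1(R_{im} \ge k, J_{im} = j_1)\,\alpha_j(g_{n\theta}(\cdot, k-1), \theta, Z_{im})},$$

where j = (j1, j2) ∈ Inline graphic and gnθ is a q × ℓ matrix with columns

$$g_{n\theta}(\cdot, 0) = 0, \qquad g_{n\theta}(\cdot, k) = g_{n\theta}(\cdot, k-1) + h_{n\theta}(\cdot, k).$$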

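Alternatively, $g_{n\theta} = g_{n\theta}^{(\ell)}$, where $g_{n\theta}^{(0)} \equiv 0$ and for r = 1, …, ℓ

$$g_{n\theta}^{(r)}(\cdot, k) = \sum_{p \le k} h_{n\theta}^{(r)}(\cdot, p),$$

where

$$h_{n\theta}^{(r)}(j, k) = \frac{\sum_{i,m} 1(R_{im} = k)\,1((J_{im}, J_{i,m+1}) = j)}{\sum_{i,m} 1(R_{im} \ge k, J_{im} = j_1)\,\alpha_j(g_{n\theta}^{(r-1)}(\cdot, k-1), \theta, Z_{im})}$$

for j = (j1, j2) ∈ Inline graphic. The indicators 1(Tk(ω0) ≤ x) are jointly measurable with respect to Inline graphic ⊗ Inline graphic( Inline graphic) and by Lemma 5.1, so are the weights hnθ and gnθ. Therefore the graph of Hnℓ is Inline graphic ⊗ Inline graphic( Inline graphic) measurable and

$$\{(\omega_0, x, \theta) : \Gamma_{n\theta}(\omega_0, x) \in B\} = \mathrm{graph}\,H_n = \bigcup_{\ell \ge 0}\mathrm{graph}(H_{n\ell}) \in S_n \otimes B(\mathcal{T}).$$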
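The recursion over jump times used in this measurability argument is also how the estimate is computed in practice. The sketch below specializes it to a single transition type (q = 1) with right-censored data: at each ordered event time the increment equals the number of events divided by the sum over the risk set of the hazard evaluated at the previous value of the estimate. The function name `gamma_hat`, the data layout and the example hazard 1/(1 + y) are illustrative assumptions, not part of the paper.

```python
def gamma_hat(times, events, covariates, theta, alpha):
    """Recursive estimate of the transformation for one transition type.

    alpha(y, theta, z) is a user-supplied hazard; the increment at the k-th
    event time uses the value of the estimate after the (k-1)-th event time.
    """
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    gamma = 0.0  # value before the first event time
    path = []
    for t in event_times:
        jumps = sum(1 for s, d in zip(times, events) if s == t and d == 1)
        denom = sum(
            alpha(gamma, theta, z)
            for s, z in zip(times, covariates)
            if s >= t  # subject still at risk at time t
        )
        gamma += jumps / denom
        path.append((t, gamma))
    return path


# Hypothetical decreasing hazard; theta and z are unused in this toy example.
path = gamma_hat([1, 2, 2, 3], [1, 1, 0, 1], [0, 0, 0, 0], None,
                 lambda y, theta, z: 1.0 / (1.0 + y))
print(path)
```

With a constant hazard equal to one the recursion reduces to the Aalen-Nelson estimator, and with a hazard of the form exp(θz) that does not depend on y it gives a Breslow-type estimator.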

A similar argument can be used to show measurability of the process Γ̇nθ in part (ii). Using arguments analogous to Dabrowska (2006), $\|\Gamma_{n,\theta_0+h_n} - \Gamma_{n\theta_0} - h_n\dot\Gamma_{n\theta_0}\| = O_P(\|h_n\|_1^2)$ and $\|\dot\Gamma_{n,\theta_0+h_n} - \dot\Gamma_{n\theta_0}\| = O_P(\|h_n\|_1) = o_P(1)$ for any deterministic sequence hn → 0 or a random SnP-measurable sequence hn → 0 in probability. Therefore if θ̂ is an SnP-measurable √n-consistent estimator of θ0, then setting hn = θ̂ − θ0, we have Ŵ0(x) = Ŵ(x, θ0) + remn(x), where $\mathrm{rem}_n = \sqrt{n}[\Gamma_{n\hat\theta} - \Gamma_{n\theta_0} - (\hat\theta - \theta_0)\dot\Gamma_{n\hat\theta}] = o_P(1)$. For non-measurable hn and θ̂n, convergence is in outer probability.

Let us assume now that fj(y, θ, z), j ∈ Inline graphic, are scalar Carathéodory integrands such that |fj(y, θ, z)| ≤ ψ̃(||y||1) and |fj(y, θ, z) − fj(y′, θ′, z)| ≤ [|θ − θ′| + ||y − y′||1] max(ψ̃′(||y||1), ψ̃′(||y′||1)), where ψ̃ = ψ1, ψ2 and ψ̃′ = ψ3 satisfy conditions 5.1. Put $S_j[f_j](u,\theta) = n^{-1}\sum_{i=1}^{n}S_{j.i}[f_j](u,\theta)$, where Sj.i[fj](u, θ) = Σm Yjmi(u)(fjαj)(Γθ(u), θ, Zjmi), and let sj[fj] = ESj[fj]. We write Sj[1] and sj[1] when fj ≡ 1, and set êj[fj] = Sj[fj]/Sj[1] and ej[fj] = sj[fj]/sj[1].

Lemma 5.3

We have ||Sj[fj]/sj[1] − sj[fj]/sj[1]|| → 0 a.s. for all jInline graphic.

Proof

We have ([Sj[fj]/sj[1]])(x, θ) = ℙng, where

$$g_{x\theta}(D_i) = \frac{\sum_m Y_{jmi}(x)(f_j\alpha_j)(\Gamma_\theta(x),\theta,Z_{jmi})}{E\sum_m Y_{jmi}(x)\,\alpha_j(\Gamma_\theta(x),\theta,Z_{jmi})}.$$

The conditions 5.1 imply that there exist constants C1 and C2 (dependent on the functions ψ̃, ψ̃′) such that

$$|g_{x\theta}(D_i) - g_{x'\theta}(D_i)| \le C_1\big[|Y_{j.i}(x) - Y_{j.i}(x')| + Y_{j.i}(0)\big(|EN_{j.i}(x) - EN_{j.i}(x')| + |EY_{j.i}(x) - EY_{j.i}(x')|\big)\big],$$

$$|g_{x\theta}(D_i) - g_{x\theta'}(D_i)| \le |\theta - \theta'|\,C_2\,Y_{j.i}(0)[1 + EN_{j.i}(\tau) + EY_{j.i}(0)].$$

Define G(Di) = Yj.i(0)[C2 diam Θ + C1][1 + ENj.i(τ) + EYj.i(0)] + gx0θ0(Di), where (x0, θ0) is an arbitrary point in Θ × [0, τ]. Let θp, p = 1, …, ℓ = O((diam Θ/ε)^d) be centers of balls B(θp, ε) of radius ε covering the set Θ. By noting that ENj.i is an increasing continuous function and EYj.i is a decreasing càglàd function, we can construct a finite partition 0 = x0 < x1 < … < xk = τ such that the intervals Ir = [xr−1, xr], r = 1, …, k satisfy ENj.i(Ir) ≤ εENj.i(τ) and E|Yj.i(Ir)| ≤ εEYj.i(0). Let xq be the center of the interval Ir. Then for each x ∈ Ir and θ ∈ B(θp, ε), we have ||gxθ(Di) − gxrθp(Di)||P,1 ≤ ε||G(Di)||P,1. It follows that the class of functions Inline graphic = {gxθ: x ∈ [0, τ], θ ∈ Θ} is Euclidean for the envelope G(Di) and Glivenko-Cantelli.

Lemma 5.4

For j ∈ Inline graphic, define remj(x, θ) = [Vjn − V1jn](x, θ) and

$$B_j(x,\theta) = \int_0^x[\hat e_j[f_j] - e_j[f_j]](u,\theta)\,M_{j..}(du,\theta),$$

where fj satisfies assumptions of Lemma 5.3. Then $\|\sqrt{n}\,\mathrm{rem}_j\| = o_P(1)$ and $\|\sqrt{n}B_j\| = o_P(1)$.

Proof

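For the sake of convenience write rem = remj and B = Bj. Put ηj(u, θ) = [Sj/sj](Γθ(u), θ, u) − 1. A little algebra shows that

$$\mathrm{rem}(x,\theta) = -\int_0^x\eta_j(u,\theta)\frac{[N_{j..} - EN_{j..}](du)}{s_j[1](u,\theta)} + \int_0^x\eta_j^2(u,\theta)\frac{N_{j..}(du)}{S_j[1](u,\theta)} = \mathrm{rem}_1(x,\theta) + \mathrm{rem}_2(x,\theta).$$

We have rem2(x, θ) = OP(1)rem3(τ, θ), where

$$\mathrm{rem}_3(x,\theta) = \int_0^x\eta_j^2(u,\theta)\frac{[N_{j..} - EN_{j..}](du)}{s_j[1](u,\theta)} + \int_0^x\eta_j^2(u,\theta)\frac{EN_{j..}(du)}{s_j[1](u,\theta)}.$$

In addition,

$$B(x,\theta) = \int_0^x\Big(\frac{S_j[f_j] - S_j[1]e_j[f_j]}{s_j[1]}\Big)(u,\theta)[N_{j..} - EN_{j..}](du) - \int_0^x\Big[\Big(\frac{S_j[f_j] - s_j[f_j]}{s_j[1]}\Big)\eta_j\Big](u,\theta)[N_{j..} - EN_{j..}](du)$$

$$\qquad - \int_0^x\Big[\Big(\frac{S_j[f_j] - s_j[f_j]}{s_j[1]}\Big)\eta_j\Big](u,\theta)\,EN_{j..}(du) + \int_0^x S_j[f_j](u,\theta)\,\mathrm{rem}_2(du,\theta) = \sum_{p=1}^{4}B_p(x,\theta).$$

We have B4(x, θ) = OP(1)B5(θ), where

$$B_5(\theta) = \int_0^\tau|(S_j[f_j] - s_j[f_j])(u,\theta)|\,\mathrm{rem}_3(du,\theta) + \int_0^\tau|s_j[f_j](u,\theta)|\,\mathrm{rem}_3(du,\theta).$$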

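These expressions can be rewritten as V processes of degree r + 1, r ≤ 3,

$$V_{n,r+1}(g) = \frac{1}{n^{r+1}}\sum_{\mathbf{i}_{r+1}} g(D_{\mathbf{i}_{r+1}}), \quad g \in \mathcal{G},$$

where the sum extends over (r + 1)-tuples $D_{\mathbf{i}_{r+1}} = (D_{i_1}, \ldots, D_{i_{r+1}})$, $\mathbf{i}_{r+1} = (i_1, \ldots, i_{r+1})$, $i_j \in \{1, \ldots, n\}$. The kernels g vary over the class 𝒢 = {gt: t ∈ 𝒯}, where for t = (x, θ) we have

$$g_t(D_{\mathbf{i}_{r+1}}) = \int_0^x\prod_{\ell=1}^{r}[h_\ell(D_{i_\ell},\theta,u) - Eh_\ell(D_{i_\ell},\theta,u)]\,[N_{j.i_{r+1}} - EN_{j.i_{r+1}}](du) \tag{5.4}$$

or

$$g_t(D_{\mathbf{i}_{r+1}}) = \int_0^x\prod_{\ell=1}^{r+1}[h_\ell(D_{i_\ell},\theta,u) - Eh_\ell(D_{i_\ell},\theta,u)]\,EN_{j.}(du). \tag{5.5}$$

Here $h_\ell(D_{i_\ell})$ are functions of the form Sj[fj]/sj[1], Sj[1]/s[1] and (sj[fj])Sj[1]/s[1]. In all cases, there exists a constant C such that $h_\ell(D_{i_\ell},\theta,u) \le CY_{j.i_\ell}(u)$ and $|h_\ell(D_{i_\ell},\theta,u) - h_\ell(D_{i_\ell},\theta',u)| \le |\theta - \theta'|\,CY_{j.i_\ell}(u)$. Therefore, for any sequence $D_{\mathbf{i}_{r+1}} = (D_{i_1}, \ldots, D_{i_{r+1}})$, we also have

$$|g_{x\theta} - g_{x'\theta}|(D_{\mathbf{i}_{r+1}}) \le G(D_{\mathbf{i}_{r+1}}, x') - G(D_{\mathbf{i}_{r+1}}, x), \qquad |g_{x\theta} - g_{x\theta'}|(D_{\mathbf{i}_{r+1}}) \le |\theta - \theta'|\,G(D_{\mathbf{i}_{r+1}}, \tau),$$

where

$$G(D_{\mathbf{i}_{r+1}}, x) = \int_0^x\prod_{\ell=1}^{r}[H_\ell(D_{i_\ell}, u) + EH_\ell(D_{i_\ell}, u)]\,[N_{j.i_{r+1}} + EN_{j.i_{r+1}}](du)$$

and $H_\ell(D_{i_\ell}, u) = CY_{j.i_\ell}(u)$, ℓ = 1, …, r for some constant C.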

Let {Ur+1,n(gt): t ∈ 𝒯} denote the U process associated with the kernels (5.4–5.5). It is easy to see that Ur+1,n(gt) forms a canonical process. For Dr+1 = (D1, …, Dr+1), we have EG^p(Dr+1) < ∞ for p = 1 + 1/(2r + 1). Therefore, by the Marcinkiewicz-Zygmund law in Teicher (1998) and Lemma A.1 in Dabrowska (2009), $\sqrt{n}\sup_t|U_{r+1,n}(g_t)| \to 0$ in probability. By the Marcinkiewicz-Zygmund theorem in de la Peña and Giné (1999), we also have $\sqrt{n}\sup_t|V_{r+1,n}(g_t) - U_{r+1,n}(g_t)| \to 0$ a.s. because

$$EG(D_{\mathbf{i}_{r+1}})^{2d(\mathbf{i}_{r+1})/(2r+1)} < \infty,$$

where $\mathbf{i}_{r+1} = (i_1, \ldots, i_{r+1})$ and $d = d(\mathbf{i}_{r+1})$ is the number of distinct coefficients among {i1, …, ir+1}, d = 1, …, r, r ≤ 3.
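The distinction between the V process and the associated U process is only in which index tuples enter the average: a V-statistic averages the kernel over all tuples, including those with repeated indices, while a U-statistic restricts to tuples of distinct indices. A minimal sketch with a scalar kernel follows; the function names and the toy kernel are assumptions for illustration and do not reproduce the kernels (5.4–5.5).

```python
from itertools import product


def v_statistic(data, kernel, degree):
    """Average of the kernel over all index tuples, repetitions allowed."""
    n = len(data)
    total = sum(
        kernel(*(data[i] for i in idx))
        for idx in product(range(n), repeat=degree)
    )
    return total / n ** degree


def u_statistic(data, kernel, degree):
    """Average of the kernel over index tuples with pairwise distinct entries."""
    n = len(data)
    values = [
        kernel(*(data[i] for i in idx))
        for idx in product(range(n), repeat=degree)
        if len(set(idx)) == degree
    ]
    return sum(values) / len(values)


sample = [1.0, 2.0, 3.0]
print(v_statistic(sample, lambda a, b: a * b, 2),
      u_statistic(sample, lambda a, b: a * b, 2))
```

The difference between the two averages comes entirely from the tuples with repeated indices, which is the part controlled by the moment condition on the envelope.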

We denote now by ||B||v the variation norm of a d × q-matrix of functions B(x) = [bkl(x)], x ∈ [0, τ]. For any interval I ⊆ [0, τ], $\|B\|_v(I) = \sup\sum_{i=1}^{m}\|B(x_i) - B(x_{i-1})\|_1$, where the supremum is taken over finite partitions x0 < x1 < … < xm of I.

Further, let B(θ0, εn) be a ball centered at θ0 of radius εn, εn ↓ 0, √n εn → ∞. Suppose that ϕθ(x) is a d × q matrix of functions, with columns of the form $\int_0^x g_{j\theta}\,d\Gamma_{\theta,j}$ such that ||ϕθ0||v = O(1). Let ϕnθ be a sequence of consistent estimators such that

  • (i)

    ϕ(x) is a càdlàg or càglàd function (jointly in (x, θ)), continuous with respect to θ;

  • (ii)

    lim supn sup{||ϕ||v: θInline graphic(θ0, εn)} = OP (1);

  • (iii)

    sup{||ϕϕθ0||: θInline graphic(θ0, εn)} = oP (1) or

  • (iii′)

    ϕϕ = (θθ′)ψ,θ where lim supn sup{||ψnθθ||v: θ, θ′ ∈ Inline graphic(θ0, εn)} = OP (1).

If ϕ is a jointly SnPB(T) measurable estimator then conditions (ii)–(iii) are assumed to hold in probability. If this is not the case then the conditions (ii)–(iii) are taken to hold in outer probability.

Lemma 5.5

  1. If ϕ(x) is a measurable process satisfying (i)–(ii) and (iii) or (iii′) then with probability tending to 1, the equation Un(θ) = 0 has a consistent root θ̂ in the ball Inline graphic(θ0, εn). In addition, under the condition (iii′), the score equation has a unique root in Inline graphic(θ0, εn), with probability tending to 1.

  2. If ϕ is not measurable, then statements in part (1) hold with inner probability tending to 1.

  3. If θ̃ is an arbitrary consistent estimator of θ0, then the equation Unϕ̃n(θ) = 0, where ϕ̃n(x) = ϕnθ̃(x) has a unique solution θ̂, with (inner) probability tending to 1, and Un(θ̂) = op*(n−1/2).

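In all three cases, $\hat\Xi = \sqrt{n}(\hat\theta - \theta_0)$ and the process $\hat W_0 = \{\sqrt{n}[\Gamma_{n\hat\theta} - \Gamma_{\theta_0} - (\hat\theta - \theta_0)^T\dot\Gamma_{n\hat\theta}](x) : x \le \tau\}$ converge weakly in R^d × ℓ∞([0, τ] × Inline graphic) to a mean zero Gaussian process defined in the statement of Proposition 3.1.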

Proof
Case (1)

Write Un(θ) = U n(θ) for short. Set b̃jmiθ(u), θ, u) = = b̃jmi1θ(u), θ, u) − ϕθ0(u)b̃jmi2θ(u), θ, u) where

bjmi1(Γθ(u),θ,u)=.j(Γθ(u),θ,Zjmi)-ej[.j](u,θ),bjmi2(Γθ(u),θ,u)=j(Γθ(u),θ,Zjmi)-ej[j](u,θ).

Define jmiθ(u), θ, u), jmi1θ(u)θ, u) and jmi2θ(u), θ, u) using similar expressions with ej[ℓ̇j] and ej[j] replaced by êj[ℓ̇j] and e^j[j]. We have Un(θ)=p=14Unp(θ), where

U1n(θ)=1ni=1njm0xbjmi(Γθ(u),θ,u)Mjmi(du,θ),U2n(θ)=q=120τrnq(du,θ)[Γnθ-Γθ]T(u)=q=12U2n;q(θ),U3n(θ)=-j0τ[(e^j[.j]-ej[.j])(u,θ)-ϕθ0(u)(e^j[j]-ej[j])(u,θ)]Mj..(du,θ),U4n(θ)=-1ni=1njm0τ[ϕnθ-ϕθ0](u)]b^jmi2(Γnθ(u),θ,u)Njmi(du),

and

rn1(x,θ)=1ni=1njm0xb¯jmi(Γθ(u),θ,u)Njmi(du),rn2(x,θ)=1ni=1njm010x[b¯jmi(Γnθλ(u),θ,u)Njmi(du)dλ-rn1(x,θ).

Here Γnθλ=Γθ+λ(Γnθ-Γθ) for λ ∈ (0, 1). We have U2n;2(θ0)=0τOP(||Γnθ0-Γθ0||2)jNj..(du)=oP(n-1/2). Moreover, r1n(x, θ0) converges almost surely to

r(x,θ0)=j0x[covj(j,.j)(u,θ0)-ϕθ0(u)covj(j,j)(u,θ0)]ENj..(du)

uniformly in x, x ≤ τ. Lemma 5.2 and integration by parts imply that the terms [√n U1n(θ0), √n U2n;1(θ0)] converge weakly to a pair of independent normal variables with mean zero and covariances Σ0(θ0) and Σ2(θ0) − Σ0(θ0), respectively. By Lemmas 5.3–5.4, we also have U3n(θ0) = oP(n^{−1/2}). Finally,

U4n(θ)=-p=130τ[ϕnθ-ϕθ0](u)]Bpn(du,θ)=p=13U4n;p(θ),

where

B1n(x,θ)=1ni=1njm0x[b^2jmi(Γnθ(u),θ,u)-b^2jmi(Γθ(u),θ,u)]Njmi(du),B2n(x,θ)=-j0x(e^j[j]-ej[j])(u,θ)Mj..(du,θ),B3n(x,θ)=1ni=1njm0xb2jmi(Γθ(u),θ,u)Mjmi(du,θ).

By Lemmas 5.2–5.4, we have nU4n;2(θ)=oP(1) and nU4n;1(θ)=j0τOP(n||Γnθ-Γθ||1(u)||ϕnθ-ϕθ0||1(u)Nj..(du)=oP(1), uniformly in θInline graphic( θ0, εn). On the other hand, at θ = θ0, { nB3n(x,θ0):xτ} is a sum of iid mean zero processes. The finite dimensional distributions are mean zero variables with finite variance-covariance matrix and converge weakly to mean zero Gaussian variables. Each component of B3n(x, θ0) is a measurable process which can be represented as a finite linear combination of càdlàg monotone functions of x with a square integrable envelope satisfying (5.2). The same argument as in Lemma 5.2 implies that the process is nB3n(x,θ0) converges weakly to a mean zero Gaussian process with sample paths continuous with respect to the variance semi-metric. The space of functions continuous with respect to the variance semi-metric is isometric to the space C([0, τ])q. By almost sure representation theorem and a similar integration by parts argument as in Bilias et al (1997) we have nU4n;3(θ0)=oP(1).

Set U^n(θ)=j=13Ujn(θ). Some elementary algebra shows that for θ, θ′ ∊ Inline graphic(θ0, εn), we have Ûn(θ) = Ûn(θ′) + (Σn(θ0) + rem0n(θ, θ′))(θθ′), where Σn(θ0) is a matrix which converges in probability −Σ1(θ0). The matrix Σ1(θ) is defined in Section 3 and is non-singular. Further, U4n(θ) − U4n(θ′) = rem2n(θ, θ′)(θθ′) + rem3n(θ, θ′) + O(|θθ0| ∨ |θ′ − θ0|)rem4n(θ, θ′). Setting rem1n(θ,θ)=I+1-1(θ0)[n(θ0)+rem0n(θ,θ)], and bqn = sup{|remqn(θ, θ′)|: θ, θ′ ∈ Inline graphic( θ0, εn)}, q = 1, …, 4, we have b1n = oP (1), b2n = oP (1). Under the condition (iii′), remnq ≡ 0 ≡ bqn, q = 3, 4, while under the condition (iii), b3n = oP (n−1/2) and b4n = oP (1).

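Put $a_n = b_{1n} + b_{2n} + b_{4n}$ and $A_n = b_{5n} + b_{3n}$, where $b_{5n} = |\Sigma(\theta_0)^{-1}\hat U_n(\theta_0)| = O_P(n^{-1/2})$. Let $0 < \eta < 1/2$ and $0 < \eta' < 1$ be given. By asymptotic tightness of $\sqrt{n}\,A_n$, we can find a compact set $K = K(\eta)$ and $n_0$ such that for all $n \ge n_0$ and all open sets $G$ containing $K$, we have $P_n(\sqrt{n}\,A_n \notin G) < \eta$ and $P_n(a_n > \eta') < \eta$. Therefore, we also have $P_n(\sqrt{n}\,A_n > M(1-\eta')) < \eta$ for all finite $M \ge M_0$, where $M_0 = M_0(\eta)$ is a large enough finite nonnegative constant. Since $\sqrt{n}\,\varepsilon_n \uparrow \infty$ and $\varepsilon_n \downarrow 0$, by eventually increasing $n_0$, we can assume that for $n \ge n_0$, we have $B(\theta_0, \varepsilon_n) \subset \Theta$ and $M < \sqrt{n}\,\varepsilon_n$. Consequently, the set $E_n \in \mathcal{S}_n^P$ given by $E_n = \{\omega_0 : A_n(\omega_0)/(1 - a_n(\omega_0)) < \varepsilon_n,\ a_n(\omega_0) \le \eta'\}$ satisfies $P_n(E_n) \ge 1 - 2\eta$ for all $n \ge n_0$.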

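For $n \ge n_0$, consider the set-valued mapping $H_n$ from the underlying sample space into $R^d$ given by

$$H_n(\omega_0) = \begin{cases} \bar B\!\left(\theta_0, \dfrac{A_n(\omega_0)}{1 - a_n(\omega_0)}\right) = \left\{\theta : |\theta - \theta_0| \le \dfrac{A_n(\omega_0)}{1 - a_n(\omega_0)}\right\} & \text{if } \omega_0 \in E_n,\\[1ex] \emptyset & \text{if } \omega_0 \notin E_n. \end{cases}$$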

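The graph of $H_n$, $\mathrm{graph}\,H_n = \{(\omega_0, \theta) : \theta \in H_n(\omega_0)\}$, is $\mathcal{S}_n^P \otimes \mathcal{B}(\Theta)$-measurable and $\mathrm{dom}\,H_n = E_n \in \mathcal{S}_n^P$. Further, let $g_n(\omega_0, \theta) = \theta + \Sigma_1^{-1}(\theta_0)\,U_n(\omega_0, \theta)$. Then $g_n$ is $\mathcal{S}_n^P \otimes \mathcal{B}(\Theta)$-measurable, because it is continuous with respect to $\theta$ for fixed $\omega_0$ and $\mathcal{S}_n^P$-measurable for fixed $\theta$. It follows that the set-valued mapping

$$C_n(\omega_0) = \begin{cases} \{\theta : g_n(\omega_0, \theta) = 0 \ \text{and}\ \theta \in H_n(\omega_0)\} & \text{for } \omega_0 \in E_n,\\ \emptyset & \text{for } \omega_0 \notin E_n \end{cases}$$

is closed-valued and has an $\mathcal{S}_n^P \otimes \mathcal{B}(\Theta)$-measurable graph. We have $\mathrm{dom}\,C_n = E_n$: for fixed $\omega_0 \in E_n$, $H_n(\omega_0)$ is a closed ball, and $g_n(\omega_0, \theta)$ is continuous and maps $H_n(\omega_0)$ into itself. By Brouwer's fixed point theorem, $C_n(\omega_0) \ne \emptyset$. Thus $E_n \subseteq \mathrm{dom}\,C_n$, while the reverse inclusion is obvious.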

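Further, for any root $\hat\theta$ in $\mathrm{dom}\,C_n$, we have $\sqrt{n}\,|\hat\theta - \theta_0| \le \sqrt{n}\,A_n/(1 - a_n) = O_P(1)$ and $\sqrt{n}(\hat\theta - \theta_0) = \Sigma(\theta_0)^{-1}\sqrt{n}\,\hat U_n(\theta_0) + o_P(n^{-1/2})$, so that $\sqrt{n}(\hat\theta - \theta_0)$ converges in law to the normal distribution given in Section 3. An argument similar to Bickel et al. (1993, p. 517) also shows that under condition (iii′), $g_n(\omega_0, \theta)$ is a contraction on $H_n(\omega_0)$, $\omega_0 \in E_n$, with contraction coefficient $a_n(\omega_0)$. Thus in this case, the root is unique: $C_n(\omega_0) = \{\hat\theta(\omega_0)\}$ for $\omega_0 \in E_n$ and $n \ge n_0$.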
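The fixed-point view of the score equation used above can be illustrated numerically. The sketch below is not the estimator of this paper: it assumes a toy one-dimensional Poisson regression score in place of $U_n$, uses the information evaluated at a starting value as a stand-in for $\Sigma_1(\theta_0)$, and all function names are hypothetical. It iterates $\theta \mapsto \theta + \Sigma_1^{-1} U_n(\theta)$ and stops at a fixed point, which is a root of the score.

```python
import math
import random

def poisson_draw(rng, lam):
    """Knuth's multiplication method for one Poisson(lam) draw (small lam only)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def score(theta, z, y):
    """Toy score U_n(theta) = n^{-1} * sum_i z_i * (y_i - exp(theta * z_i))."""
    return sum(zi * (yi - math.exp(theta * zi)) for zi, yi in zip(z, y)) / len(z)

def information(theta, z):
    """Minus the derivative of the toy score with respect to theta."""
    return sum(zi * zi * math.exp(theta * zi) for zi in z) / len(z)

def fixed_point_root(z, y, theta_start, tol=1e-12, max_iter=200):
    """Iterate g(theta) = theta + U_n(theta) / sigma1 with sigma1 held fixed."""
    sigma1 = information(theta_start, z)  # assumed stand-in for Sigma_1(theta_0)
    theta = theta_start
    for _ in range(max_iter):
        theta_next = theta + score(theta, z, y) / sigma1
        if abs(theta_next - theta) < tol:
            return theta_next
        theta = theta_next
    raise RuntimeError("fixed-point iteration did not converge")

rng = random.Random(0)
z = [rng.uniform(-1.0, 1.0) for _ in range(500)]          # hypothetical covariates
y = [poisson_draw(rng, math.exp(0.5 * zi)) for zi in z]   # true parameter 0.5
theta_hat = fixed_point_root(z, y, theta_start=0.0)
print(theta_hat, score(theta_hat, z, y))
```

Because the matrix is held fixed rather than updated, each step is cheap, and the iteration converges whenever the map is a contraction near the root, which mirrors the role of the coefficient $a_n$ in the argument above.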

Case (2)

If the estimators of $\phi$ are not $\mathcal{S}_n^P \otimes \mathcal{B}(T)$-measurable, then the score function splits into two parts: $\tilde U_n(\theta) = \hat U_n(\theta) + U_{4n}(\theta)$. The term $\hat U_n(\theta)$ remains $\mathcal{S}_n^P \otimes \mathcal{B}(\Theta)$-measurable, while the second term is not. However, $b_{3n} = o_{P^*}(n^{-1/2})$ and $a_n = o_{P^*}(1)$, while $b_{5n} = |\Sigma(\theta_0)^{-1}\hat U_n(\theta_0)| = O_P(n^{-1/2})$. In this case, the set $E_n$ satisfies $\liminf_n P_{n,*}(E_n) \ge 1 - 2\eta$ and the closed ball $\bar B(\theta_0, A_n/(1 - a_n))$ is contained in $B(\theta_0, \varepsilon_n)$ with inner probability tending to 1.

Case (3)

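We write $\tilde U_n(\theta)$ for the modified score function obtained by substituting $\tilde\phi_n(x) = \phi_{n\tilde\theta}(x)$ in place of $\phi$. Suppose that $\tilde\theta$ is $\mathcal{S}_n^P$-measurable and $\phi(x)$ is $\mathcal{S}_n^P \otimes \mathcal{B}(T)$-measurable. Then the plug-in estimator $\phi_{n\tilde\theta}(x)$ is $\mathcal{S}_n^P \otimes \mathcal{B}([0,\tau])$-measurable and the modified score process $\tilde U_n(\theta)$ is $\mathcal{S}_n^P \otimes \mathcal{B}(\Theta)$-measurable. Moreover, we have $\tilde U_n(\theta) = \hat U_n(\theta) + \tilde U_{4n}(\theta)$, where the remainder $\tilde U_{4n}(\theta)$ satisfies $\sqrt{n}\,[\tilde U_{4n}(\theta) - U_{4n}(\theta)] = o_P(1 + \sqrt{n}\,|\theta - \theta_0|)$, uniformly in $\theta \in B(\theta_0, \varepsilon_0)$, and $\tilde U_{4n}(\theta) - \tilde U_{4n}(\theta') = (\theta - \theta')\,\mathrm{rem}_{2n}(\theta,\theta')$ with $\sup\{|\mathrm{rem}_{2n}(\theta,\theta')| : \theta, \theta' \in B(\theta_0, \varepsilon_n)\} = o_P(1)$. With probability tending to 1, the modified equation has a unique root $\hat\theta$ in a compact random ball contained in $B(\theta_0, \varepsilon_n)$, and $U_n(\hat\theta) = o_{P^*}(n^{-1/2})$. On the other hand, if either $\tilde\theta$ or $\phi$ is not measurable, then this continues to hold, except that the modified equation has a unique solution with inner probability tending to one.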

Under the assumptions of part (1), measurable selection theorems (Wagner, 1977) ensure that there exists at least one function $\hat{\hat\theta} : S_n \to R^d$ such that $\hat{\hat\theta}(\omega_0) \in C_n(\omega_0)$ whenever $\omega_0 \in E_n$, and $\hat{\hat\theta}$ is measurable with respect to $\mathcal{S}_n^P$. This also applies to part (3), provided $\tilde\theta$ and $\phi$ are $\mathcal{S}_n^P$-measurable.

5.4 Proof of Proposition 3.2

With some abuse of notation, set $V = [V_j : j = 1, \ldots, q]$, where $V(x) = V(x, \theta_0)$ and $V(x, \theta)$ is the Gaussian process of Lemma 5.1. Under the assumption that $\theta_0$ is the true parameter of the modulated renewal process, the process $V$ corresponds to a vector of independent time-transformed Brownian motions with covariance

$$\mathrm{cov}(V_j(x), V_j(y)) = C_j(x \wedge y) \quad\text{and}\quad \mathrm{cov}(V_j(x), V_\ell(y)) = 0 \ \text{ if } j \ne \ell.$$
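A time-transformed Brownian motion with covariance $C(x \wedge y)$ can be simulated on a grid from independent Gaussian increments with variances $C(x_k) - C(x_{k-1})$. The sketch below assumes a hypothetical variance function $C(x) = x^2$, chosen only for illustration, and checks the covariance structure by Monte Carlo for a single component.

```python
import math
import random

def variance_fn(x):
    """Hypothetical variance function C(x) = x^2 (nondecreasing, C(0) = 0)."""
    return x * x

def simulate_path(grid, rng):
    """One path of V(x) = W(C(x)) on an increasing grid via independent increments."""
    path, value, previous = [], 0.0, 0.0
    for x in grid:
        current = variance_fn(x)
        value += rng.gauss(0.0, math.sqrt(current - previous))
        path.append(value)
        previous = current
    return path

rng = random.Random(2)
grid = [0.25, 0.5, 0.75, 1.0]
paths = [simulate_path(grid, rng) for _ in range(20000)]
# Empirical covariance of V(0.5) and V(1.0); the target is C(min(0.5, 1.0)) = 0.25.
empirical_cov = sum(p[1] * p[3] for p in paths) / len(paths)
print(empirical_cov)
```

Independent components $V_j$ and $V_\ell$, $j \ne \ell$, would be obtained by running the same routine with separate random draws and component-specific variance functions.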

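Similarly, let $\check V = [\check V_j : j = 1, \ldots, q]$ be equal to $\check V(x) = \sqrt{n}\,V_{1n}(x, \theta_0)$, where $V_{1n}(x, \theta)$ is defined as in Lemma 5.1. Thus the $j$-th component of $\check V$ is

$$\check V_j(x) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{m}\int_0^{x} \frac{M_{jmi}(du)}{s_j(\Gamma_{\theta_0}(u-), \theta_0, u)}.$$

Put $\check V^{\#} = [\check V_j^{\#} : j = 1, \ldots, q]$,

$$\check V_j^{\#}(x) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{m} G_{mi}\int_0^{x} \frac{N_{jmi}(du)}{s_j(\Gamma_{\theta_0}(u-), \theta_0, u)}.$$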

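Finally, let $G_0$ be a $\mathcal{N}(0, I_{d\times d})$ variable, independent of the $(D_i, G_i)$'s. Set $\Xi_1^{\#} = \Sigma_1^{-1}(\theta_0)\,\Sigma_0(\theta_0)^{1/2}\,G_0$ and $\hat\Xi_1^{\#} = \hat\Sigma_1^{-1}(\hat\theta)\,\hat\Sigma_0(\hat\theta)^{1/2}\,G_0$. We have $E\check V_j^{\#}(x) = 0 = E\check V_j(x)$,

$$\mathrm{cov}(\check V_j^{\#}(x), \check V_\ell^{\#}(x')) = \mathrm{cov}(\check V_j(x), \check V_\ell(x')) = \delta_{j\ell}\, C_{j\theta_0}(x \wedge x'), \qquad \mathrm{cov}(\check V_j^{\#}(x), \check V_\ell(x')) = 0. \tag{5.6}$$

Moreover, $\Xi_1^{\#}$ is independent of $D_1, \ldots, D_n$. This also means that it is independent of $(\check V^{\#}, \check V)$.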
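The multiplier construction can be sketched in a much simpler setting than the one treated here. The code below assumes a single transition type, uncensored sojourn times and a plain at-risk proportion in place of $s_j(\Gamma_{\theta_0}(u-), \theta_0, u)$; these simplifications and all names are assumptions made for illustration only. The data are held fixed while standard normal multipliers are redrawn, and the conditional variance of the resampled process is compared with its closed form.

```python
import math
import random

def multiplier_process(times, grid, multipliers):
    """V#(x) = n^{-1/2} * sum_i G_i * 1{T_i <= x} / s(T_i), evaluated on a grid.

    s(T_i) is the proportion of subjects still at risk at T_i; this plain
    at-risk proportion is a stand-in for the paper's s_j(...), not its definition.
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    values = []
    for x in grid:
        total = 0.0
        for rank, i in enumerate(order):
            if times[i] > x:
                break
            total += multipliers[i] * n / (n - rank)  # 1 / at-risk proportion
        values.append(total / math.sqrt(n))
    return values

def conditional_variance(times, x):
    """Variance of V#(x) given the data: n^{-1} * sum_{T_i <= x} s(T_i)^{-2}."""
    n = len(times)
    return sum((n / (n - rank)) ** 2
               for rank, t in enumerate(sorted(times)) if t <= x) / n

rng = random.Random(1)
times = [rng.expovariate(1.0) for _ in range(50)]   # hypothetical uncensored times
grid = [0.2, 0.4, 0.6, 0.8]
draws = []
for _ in range(2000):
    g = [rng.gauss(0.0, 1.0) for _ in times]        # fresh multipliers, data fixed
    draws.append(multiplier_process(times, grid, g))

# Sup-norm critical value: the kind of quantity a confidence band would use.
sups = sorted(max(abs(v) for v in path) for path in draws)
critical_value = sups[int(0.95 * len(sups))]
empirical_var = sum(path[-1] ** 2 for path in draws) / len(draws)
target_var = conditional_variance(times, grid[-1])
print(critical_value, empirical_var, target_var)
```

Given the data, each resampled path is exactly Gaussian, so the agreement between `empirical_var` and `target_var` depends only on the number of multiplier draws.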

We consider first unconditional weak convergence. By the central limit theorem and the strong law of large numbers, the finite dimensional distributions of the processes $(\check V, \check V^{\#})$ converge weakly to the finite dimensional distributions of $(V, V^{\#})$, two independent vectors of Brownian motions with variance functions $C_{j,\theta_0}$, $j = 1, \ldots, q$.

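For each $j = 1, \ldots, q$, the process $\check V_j^{\#}$ can be represented as $\check V_j^{\#}(x) = n^{-1/2}\sum_{i=1}^{n} f_x^{(j)}(G_i, D_i)$, where

$$f_x^{(j)}(G_i, D_i) = \sum_{m} G_{mi}\int_0^{x} \frac{N_{jmi}(du)}{s_j(\Gamma_{\theta_0}(u-), \theta_0, u)}.$$

The class of functions $\mathcal{F}_j = \{f_x^{(j)}(G_i, D_i) : x \in [0, \tau]\}$ has a square integrable envelope

$$F_j(G_i, D_i) = \sum_{m \ge 1} |G_{mi}|\int_0^{\tau} \frac{N_{jmi}(du)}{s_j(\Gamma_{\theta_0}(u-), \theta_0, u)}$$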

and is Euclidean for this envelope because each $f_x^{(j)} \in \mathcal{F}_j$ is a difference of two functions increasing in $x$ and bounded by $F_j(G_i, D_i)$. Thus $\mathcal{F}_j$ forms a Donsker class of functions. The union of these classes, $\mathcal{F} = \cup_j \mathcal{F}_j$, is Donsker as well. From Lemma 5.1, the process $\check V = \{\check V_j(x) : x \in [0, \tau],\ j = 1, \ldots, q\}$ can also be represented as an empirical process over a second Euclidean class of functions, and the union of the two classes forms a Donsker class. Using consistency of the estimates $(\hat\theta, \Gamma_{n\hat\theta})$, Lemma 5.5 and a couple of lines of integration by parts also yields $\|\hat V^{\#} - \check V^{\#}\| = o_P(1)$ in outer probability.

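Write $\check V^{\#}$ as the empirical process $\check V^{\#} = \mathbb{P}_n f$, $f \in \mathcal{F}$. Further, let $BL_1$ be the collection of Lipschitz functions $h$ from $R^d \times \ell^{\infty}(\mathcal{F})$ into $[0, 1]$ such that $|h(r, w) - h(r', w')| \le |r - r'| + \|w - w'\|$ for $r, r' \in R^d$ and $w, w' \in \ell^{\infty}(\mathcal{F})$. The set $\mathcal{F}$ is totally bounded with respect to the variance pseudo-metric $d$. Therefore, for fixed $\delta > 0$, it can be covered by a finite number of $d$-balls of radius $\delta$, say $B(f_\ell, \delta)$, $\ell = 1, \ldots, k = k(\delta)$. Set $\check V^{\#}\circ\pi_\delta = \mathbb{P}_n \pi_\delta(f)$, where $\pi_\delta(f) = f_\ell$ for $f \in B(f_\ell, \delta)$ (pick one $f_\ell$ for each $f \in \mathcal{F}$). By the triangle inequality, we have

$$\sup_{h \in BL_1} \big|E_G h(\hat\Xi_1^{\#}, \hat V^{\#}) - E h(\Xi_1^{\#}, V^{\#})\big| \le \sum_{r=1}^{4} I_r(\delta),$$

where

$$\begin{aligned}
I_1(\delta) &= \sup_{h \in BL_1} \big|E h(\Xi_1^{\#}, V^{\#}\circ\pi_\delta) - E h(\Xi_1^{\#}, V^{\#})\big|,\\
I_2(\delta) &= \sup_{h \in BL_1} \big|E h(\Xi_1^{\#}, V^{\#}\circ\pi_\delta) - E_G h(\Xi_1^{\#}, \check V^{\#}\circ\pi_\delta)\big|,\\
I_3(\delta) &= \sup_{h \in BL_1} \big|E_G h(\Xi_1^{\#}, \check V^{\#}\circ\pi_\delta) - E_G h(\Xi_1^{\#}, \check V^{\#})\big|,\\
I_4(\delta) &= \sup_{h \in BL_1} \big|E_G h(\hat\Xi_1^{\#}, \hat V^{\#}) - E_G h(\Xi_1^{\#}, \check V^{\#})\big|.
\end{aligned}$$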

For given $\varepsilon > 0$, we can choose $\delta_0$ so that $I_1(\delta) < \varepsilon$ for all $\delta < \delta_0$. The second term converges in outer probability to 0, for any $\delta$. This follows from weak convergence of the finite dimensional distributions of $\check V^{\#}$ and the same argument as in Van der Vaart and Wellner (1996, p. 182), except that in our setting, the Lindeberg condition of their Lemma 2.9.5 is not needed to verify conditional weak convergence of finite dimensional distributions. We also have $I_3(\delta) \le E_G\|\check V^{\#}\circ\pi_\delta - \check V^{\#}\|_{\mathcal{F}} \le E_G\|\check V^{\#}\|_{\mathcal{F}_\delta}$, where $\mathcal{F}_\delta = \{f - f' : f, f' \in \mathcal{F},\ d(f - f') < \delta\}$. Since $\mathcal{F}$ forms a Euclidean class of functions with a square integrable envelope, we have $\lim_{\delta \downarrow 0}\limsup_n E I_3(\delta) \le \lim_{\delta \downarrow 0}\limsup_n E E_G\|\check V^{\#}\|_{\mathcal{F}_\delta} = 0$. Finally, the term $I_4(\delta)$ does not depend on $\delta$, and we have $I_4(\delta) \le 2P_G(|\hat\Xi_1^{\#} - \Xi_1^{\#}| + \|\hat V^{\#} - \check V^{\#}\| > \varepsilon) + \varepsilon$. By unconditional convergence, we have $I_4(\delta) \to 0$ in outer probability.
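The finite cover by $\delta$-balls and the projection $\pi_\delta$ used in the argument above can be sketched for a finite collection of functions. The code below is an illustration under stated assumptions: functions are represented by their values on a hypothetical sample, the empirical $L_2$ distance stands in for the variance pseudo-metric $d$, and the greedy construction is one simple way to build a cover, not the construction of the cited reference.

```python
import math
import random

def l2_distance(f, g):
    """Empirical L2 distance between two functions given by their sample values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)) / len(f))

def greedy_net(functions, delta):
    """Greedy delta-net: returns indices of centers and, for each function,
    the position (in the center list) of the center it is projected to."""
    centers, assignment = [], []
    for index, f in enumerate(functions):
        for position, center in enumerate(centers):
            if l2_distance(f, functions[center]) < delta:
                assignment.append(position)
                break
        else:
            centers.append(index)            # f starts a new ball
            assignment.append(len(centers) - 1)
    return centers, assignment

rng = random.Random(3)
data = [rng.random() for _ in range(200)]    # hypothetical observations
# Indicator functions d -> 1{d <= x} for x on a grid, stored as value vectors.
functions = [[1.0 if d <= x / 50 else 0.0 for d in data] for x in range(51)]
delta = 0.2
centers, assignment = greedy_net(functions, delta)
projection_error = max(
    l2_distance(f, functions[centers[k]]) for f, k in zip(functions, assignment)
)
print(len(centers), projection_error)
```

Every function lies within $\delta$ of its assigned center, so replacing $f$ by $\pi_\delta(f)$ reduces the supremum over the whole class to a maximum over finitely many centers plus an oscillation term of the kind bounded by $I_3(\delta)$.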

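Finally, set $\Psi(\hat\Xi_1^{\#}, \hat V^{\#}) = [\check\Xi^{\#}, \check W_0^{\#}]$, where

$$\begin{aligned}
\check\Xi^{\#} &= \hat\Xi_1^{\#} - \Sigma_1^{-1}(\theta_0)\sum_{j}\int_0^{\tau} \rho_{j,\phi}(u, \theta_0)\, E N_{j\cdot\cdot}(du)\, \check W_0^{\#}(u)^{T},\\
\check W_0^{\#}(x) &= \int_0^{x} \hat V^{\#}(du)\, P_{\theta_0}(u, x) = \hat V^{\#}(x) - \int_0^{x} \hat V^{\#}(u-)\, Q_{\theta_0}(du)\, P_{\theta_0}(u, x).
\end{aligned}$$

The estimates $[\hat\Xi^{\#}, \hat W_0^{\#}]$ defined in Section 4 are $[\hat\Xi^{\#}, \hat W_0^{\#}] = \hat\Psi(\hat\Xi_1^{\#}, \hat V^{\#})$, where $\hat\Psi$ is the sample analogue of $\Psi$ obtained by replacing the unknown quantities in $\Psi$ with their estimates, among them $\hat\theta$ and $\rho_{j,\phi_n}(\cdot, \hat\theta)$. By the continuous mapping theorem, unconditionally, $\Psi(\hat\Xi_1^{\#}, \hat V^{\#}) \Rightarrow \Psi(\Xi_1^{\#}, V^{\#}) = (\Xi^{\#}, W_0^{\#})$. By the triangle inequality one more time, we have $\sup_{h \in BL_1} |E_G h(\hat\Xi^{\#}, \hat W_0^{\#}) - E h(\Xi^{\#}, W_0^{\#})| \le J_1 + J_2$, where

$$\begin{aligned}
J_1 &= \sup_{h \in BL_1} \big|E_G h(\hat\Xi^{\#}, \hat W_0^{\#}) - E_G h(\check\Xi^{\#}, \check W_0^{\#})\big|,\\
J_2 &= \sup_{h \in BL_1} \big|E_G h(\check\Xi^{\#}, \check W_0^{\#}) - E h(\Xi^{\#}, W_0^{\#})\big|.
\end{aligned}$$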

For any Lipschitz continuous function $h \in BL_1$, $h \circ \Psi \in BL_c$ for some constant $c$. Therefore the preceding implies that $J_2$ tends to 0 in outer probability. This also holds for the term $J_1$, because $\|\check\Xi^{\#} - \hat\Xi^{\#}\| \to_{P^*} 0$ and $\|\hat W_0^{\#} - \check W_0^{\#}\|_1 \to_{P} 0$, by consistency of the estimates $(\hat\theta, \Gamma_{n\hat\theta})$ and integration by parts.

Acknowledgments

The data presented here were obtained from the Statistical Center of the Center for International Blood and Marrow Transplant Research (CIBMTR). The analysis has not been reviewed or approved by the Advisory or Scientific Committee of the CIBMTR. The CIBMTR is comprised of clinical and basic scientists who confidentially share data on their blood and marrow transplant patients with the CIBMTR Data Collection Center located in the Medical College of Wisconsin. The CIBMTR is a repository of information about results of transplants at more than 450 transplant centers worldwide. I thank Mei-Jie Zhang for preparation of the data and some discussions. I also thank a reviewer and Editor Daniel Commenges for their comments. Research supported by the grant R01 AI067943 from the National Institute of Allergy and Infectious Diseases. The content is solely the responsibility of the author and does not necessarily represent the official views of NIAID, NIH or CIBMTR.

References

1. Andersen PK, Borgan O, Gill RD, Keiding N. Statistical Models Based on Counting Processes. Springer; New York: 1993.
2. Arjas E, Eerola M. On predictive causality in longitudinal studies. J Statist Planning and Inference. 1993;34:361–386.
3. Bagdonavicius V, Nikulin M. Generalized proportional hazards model based on modified partial likelihood. Lifetime Data Analysis. 1999;5:329–350. doi: 10.1023/a:1009688109364.
4. Bagdonavicius V, Hafdi MA, Nikulin M. Analysis of survival data with cross-effects of survival functions. Biostatistics. 2004;5:415–425. doi: 10.1093/biostatistics/5.3.415.
5. Beesack PR. Gronwall Inequalities. Carleton Math Lecture Notes. Vol. 11. Carleton University; Ottawa: 1973.
6. Bickel PJ. Efficient testing in a class of transformation models. Proceedings of the 45th Session of the International Statistical Institute; ISI, Amsterdam. 1986. pp. 23.3.63–23.3.81.
7. Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA. Efficient and adaptive estimation in semi-parametric models. Johns Hopkins University Press; Baltimore: 1993.
8. Bickel PJ, Ritov Y. Local asymptotic normality ranks and covariates in transformation models. In: Pollard D, Yang G, editors. Festschrift for L LeCam. Springer; New York: 1995.
9. Bilias Y, Gu M, Ying Z. Towards a general asymptotic theory for Cox model with staggered entry. Ann Statist. 1997;25:662–683.
10. Chang I-S, Hsiung CA. Information and asymptotic efficiency in some generalized proportional hazard models for counting processes. Ann Statist. 1994;22:1275–1298.
11. Chang I-S, Chuang Y-C, Hsiung CA. A class of nonparametric k-sample tests for semi-Markov processes. Statistica Sinica. 1999;9:211–277.
12. Chang I-S, Hsiung CA, Wu S-M. Estimation in a proportional hazard model for semi-Markov counting process. Statistica Sinica. 2000;10:1257–1266.
13. Chen K, Jin Z, Ying Z. Semi-parametric analysis of transformation models with censored data. Biometrika. 2002;89:659–668.
14. Chintagunta P, Prasad AR. An empirical investigation of the “Dynamic McFadden” model of purchase timing and brand choice: implications for market structure. J Business and Economic Statist. 1998;16:2–12.
15. Cinlar E. Introduction to Stochastic Processes. Prentice-Hall; New Jersey: 1975.
16. Cook RJ, Lawless JF. The Statistical Analysis of Recurrent Events. Springer; New York: 2007.
17. Commenges D. Semi-Markov and non-homogeneous Markov models in medical studies. In: Janssen J, editor. Semi-Markov models. Plenum Press; New York: 1986. pp. 411–422.
18. Commenges D, Joly P, Gégout-Petit A, Liquet B. Choice between semi-parametric estimators for Markov and non-Markov multistate models from coarsened observations. Scand J Statist. 2007;34:33–52.
19. Cox DR. The statistical analysis of dependencies in point processes. In: Lewis PAW, editor. Symposium on Point Processes. Wiley; New York: 1973.
20. Cutler C, Antin JH. Peripheral blood stem cells for allogeneic transplantation: a review. Stem Cells. 2001;19:108–117. doi: 10.1634/stemcells.19-2-108.
21. Cutler C, Giri S, Jeyapalan S, Paniagua D, Viswanathan A, Antin JH. Acute and chronic graft-versus-host disease after allogeneic peripheral blood stem-cell and bone marrow transplantation: a meta analysis. J Clin Oncol. 2001;19:3685–3691. doi: 10.1200/JCO.2001.19.16.3685.
22. Dabrowska DM, Sun G, Horowitz MM. Cox regression in a Markov renewal model: an application to the analysis of bone marrow transplant data. J Amer Statist Assoc. 1994;89:867–877.
23. Dabrowska DM. Estimation of transition probabilities and bootstrap in a semi-parametric Markov renewal model. J Nonparametric Statist. 1995;5:237–259.
24. Dabrowska DM. Estimation in a class of semi-parametric transformation models. In: Rojo J, editor. Second Erich L Lehmann Symposium - Optimality. Vol. 49. Institute of Mathematical Statistics; 2006. pp. 166–216. Lecture Notes and Monograph Series.
25. Dabrowska DM. Information bounds and efficient estimation in a class of censored transformation models. Acta Applicandae Mathematicae. 2007;96:177–201.
26. Dabrowska DM. Estimation in a semi-parametric two-stage renewal regression model. Statistica Sinica. 2009;19:981–996.
27. Daley DJ, Vere-Jones D. An Introduction to the Theory of Point Processes. Springer; New York: 1988.
28. Dellacherie C, Meyer PA. Probabilities and Potentiel. Hermann; Paris: 1975.
29. de la Peña V, Giné E. Decoupling: From Dependence to Independence. Springer; New York: 1999.
30. Dudley RM. Uniform Central Limit Theorems. Cambridge University Press; 1999.
31. Eerola M. Probabilistic causality in longitudinal studies. Springer; New York: 1994.
32. Friedrichs B, Tichelli A, Bacigalupo A, Russel NH, Ruutu T, Beksac M, Hasenclever D, Socié G, Schmitz N. Long-term outcome and late effects in patients transplanted with mobilised blood or bone marrow: a randomised trial. Lancet Oncology. 2010;11:331–338. doi: 10.1016/S1470-2045(09)70352-3.
33. Flowers MED, Parker PM, Johnston LJ, Matos AV, Storer B, Bensinger WI, Storb R, Appelbaum FR, Forman SJ, Blume KG, Martin PJ. Comparison of chronic graft-versus-host disease after transplantation of peripheral blood stem cells versus bone marrow in allogeneic recipients: long-term follow-up of a randomized trial. Blood. 2002;100:415–419. doi: 10.1182/blood-2002-01-0011.
34. Gale RP, Bortin MM, van Bekkum DW, Biggs JC, Dicke KA, Gluckman E, Good RA, Hoffman RG, Key HEM, Kersey JH, Marmont A, Masaoka T, Rimm AA, van Rood JJ, Zwaan FE. Risk factors for acute graft-versus-host disease. Br J Haematol. 1987;67:397–406. doi: 10.1111/j.1365-2141.1987.tb06160.x.
35. Gill RD. Nonparametric estimation based on censored observations of a Markov renewal process. Z Wahrscheinlichkeitstheorie verv Gebiete. 1980;53:97–116.
36. Gill RD, Johansen S. A survey of product integration with a view toward application in survival analysis. Ann Statist. 1990;18:1501–1555.
37. Greenwood P, Wefelmeyer W. Empirical estimators for semi-Markov processes. Math Meth Statist. 1996;5:299–315.
38. Greenwood P, Müller UU, Wefelmeyer W. Semi-Markov processes and their applications. Commun Stat Theory Methods. 2004;33:419–435.
39. Himmelberg CJ. Measurable relations. Fund Math. 1975;87:53–72.
40. Hjort NL, Claeskens G. Frequentist model average estimators. J Amer Statist Assoc. 2003;98:938–945.
41. Hjort NL, Claeskens G. Focused information criteria and model averaging for Cox’s hazard regression model. J Amer Statist Assoc. 2006;101:1449–1464.
42. Jacod J. Multivariate point processes: predictable projection, Radon-Nikodym derivatives, representation of martingales. Z Wahrscheinlichkeitstheorie verv Gebiete. 1975;31:235–254.
43. Janssen J. Semi-Markov Models: Theory and Applications. Springer; New York: 1999.
44. Janssen J, Manca R. Semi-Markov Risk Models for Finance, Insurance and Reliability. Springer; New York: 2007.
45. Janssen J, Manca R. Applied Semi-Markov Processes. Springer; New York: 2006.
46. Janssen J, Limnios N. International Symposium on Semi-Markov Models: Theory and Applications. Kluwer Academic Publishers; 2001.
47. Jones MP, Crowley JJ. Nonparametric tests of the Markov model for survival data. Biometrika. 1992;79:513–522.
48. Kalbfleisch JD, Prentice RL. Statistical Analysis of Failure Time Data. Wiley; 1981.
49. Karr AF. Point Processes and their Statistical Inference. Marcel Dekker; New York: 1991.
50. Keiding N. Statistical analysis of semi-Markov models based on the theory of counting processes. In: Janssen J, editor. Semi-Markov models: Theory and Applications. Plenum Press; 1986. pp. 301–315.
51. Keiding N, Klein JP, Horowitz MM. Multistate models and outcome prediction in bone marrow transplantation. Statist Med. 2001;20:1871–1885. doi: 10.1002/sim.810.
52. Klein JP, Keiding N, Copelan EA. Plotting summary predictions in multistate survival models: probabilities of relapse and death in remission for bone marrow transplantation patients. Statist Med. 1993;12:2315–2332. doi: 10.1002/sim.4780122408.
53. Filippov A. On certain questions in the theory of optimal control. Vestnik Moskov Univ Ser Mat Meh Astronom Fiz Him. 1962;2:25–32. (1959) English Translation 1 76–84.
54. Kosorok MR, Lee BL, Fine JP. Robust inference for univariate proportional hazard models. Ann Statist. 2004;32:1448–1491.
55. Kuratowski K. Topology. Academic Press; 1966.
56. Lagakos SW, Sommer CJ, Zelen M. Semi-Markov models for censored data. Biometrika. 1978;65:311–317.
57. Last G, Brandt A. Marked Point Processes on the Real Line: the Dynamic Approach. Springer; New York: 1995.
58. Limnios N, Oprisan G. Semi-Markov Processes and Reliability. Springer; 2001.
59. Lin DY, Fleming TR, Wei LJ. Confidence bands for survival curves under the proportional hazards model. Biometrika. 1994;81:73–81.
60. Lo SMS, Wilke RA. A copula model for dependent competing risks. Appl Statist. 2010;59:359–376.
61. Martinussen T, Scheike T. Dynamic Regression Models for Survival Data. Springer; New York: 2006.
62. Moore EM, Pyke R. Estimation of the transition distributions of a Markov renewal process. Ann Inst Stat Math. 1968;20:411–468.
63. Nolan D, Pollard D. U-processes: rates of convergence. Ann Statist. 1987;15:780–799.
64. Oakes D. Survival analysis: aspects of partial likelihood (with discussion). Int Statist Rev. 1981;49:235–264.
65. Oakes D, Cui L. On semi-parametric inference for modulated renewal processes. Biometrika. 1994;81:83–91.
66. Ouhbi L, Limnios N. Nonparametric estimation for semi-Markov kernels with applications to reliability analysis. Appl Stochastic Models and Data Analysis. 1996;12:209–220.
67. Ouhbi L, Limnios N. Nonparametric estimation for semi-Markov processes based on its hazard rate functions. Stat Inference Stoch Processes. 1999;2:151–173.
68. Pollard D. Convergence of Stochastic Processes. Springer Verlag; New York: 1984.
69. Pollard D. Empirical Processes: Theory and Applications. Inst Math Statist; Hayward: 1990.
70. Phelan MF. Bayes estimation from a Markov renewal process. Ann Statist. 1990;18:603–616.
71. Putter H, Fiocco M, Geskus RB. Tutorial in biostatistics: competing risks and multi-state models. Statist Med. 2007;26:2389–2430. doi: 10.1002/sim.2712.
72. Pyke R. Markov renewal processes: definitions and preliminary properties. Ann Math Statist. 1961a;32:1231–1242.
73. Pyke R. Markov renewal processes with finitely many states. Ann Math Statist. 1961b;32:1243–1259.
74. Pyke R, Schaufele R. Limit theorems for Markov renewal processes. Ann Math Statist. 1964;35:1746–1764.
75. Pyke R, Schaufele R. The existence and uniqueness of stationary measures for Markov renewal processes. Ann Math Statist. 1966;37:1439–1462.
76. Ringden O, Labopin M, Bacigalupo A, Arcese W, Schaefer UW, Willemze R, Koc H, Bunjes D, Gluckman E, Rocha V, Schattenberg A, Frassoni F. Transplantation of peripheral blood stem cell as compared with bone marrow from HLA-identical siblings in adult patients with acute myeloid leukemia and acute lymphoblastic leukemia. J Clin Oncol. 2002;20(24):4655–4664. doi: 10.1200/JCO.2002.12.049.
77. Rivest LP, Wells MT. A martingale approach to the copula-graphic estimator for the survival function under dependent censoring. J Multiv Analysis. 2001;79:138–155.
78. Teicher H. On the Marcinkiewicz-Zygmund strong law for U-statistics. J Theoret Probab. 1998;11:279–288.
79. van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes with Applications to Statistics. Springer; New York: 1996.
80. Voelkel JG, Crowley JJ. Nonparametric inference for a class of semi-Markov processes with censored observations. Ann Statist. 1984;12:142–160.
81. Wagner DH. Survey of measurable selection theorems. SIAM J Control and Optimization. 1977;15:859–903.
82. Weiss GH, Zelen M. A semi-Markov model for clinical trials. J Appl Probab. 1965;2:269–285.
83. Zheng M, Klein JP. Estimates of marginal survival for dependent competing risks based on an assumed copula model. Biometrika. 1995;82:127–138.
