Estimation in a semi-Markov transformation model

Dorota M Dabrowska

doi:10.1515/1557-4679.1233

Author manuscript; available in PMC: 2013 Jun 22.

Published in final edited form as: Int J Biostat. 2012 Jun 22;8(1):Article–15. doi: 10.1515/1557-4679.1233

PMCID: PMC3405912 NIHMSID: NIHMS380985 PMID: 22740583

Abstract

Multi-state models provide a common tool for analysis of longitudinal failure time data. In biomedical applications, models of this kind are often used to describe evolution of a disease and assume that a patient may move among a finite number of states representing different phases in the disease progression. Several authors developed extensions of the proportional hazard model for analysis of multi-state models in the presence of covariates. In this paper, we consider a general class of censored semi-Markov and modulated renewal processes and propose the use of transformation models for their analysis. Special cases include modulated renewal processes with interarrival times specified using transformation models, and semi-Markov processes with one-step transition probabilities defined using copula-transformation models. We discuss estimation of finite and infinite dimensional parameters of the model, and develop an extension of the Gaussian multiplier method for setting confidence bands for transition probabilities. A transplant outcome data set from the Center for International Blood and Marrow Transplant Research is used for illustrative purposes.

1 Introduction

We consider estimation in a semi-Markov regression model with a finite state space 𝒥 = {1, …, r}. In the absence of covariates, the model can be described by a sequence (T, J) = {(T_n, J_n): n ≥ 0}, where T₀ < T₁ < T₂ < … are consecutive times of entrances into the states J₀, J₁, J₂, …, J_n ∈ 𝒥 = {1, …, r}. The sequence J = {J_n: n ≥ 0} of states visited forms a Markov chain and, given J, the sojourn times T₁, T₂ − T₁, … are independent with distributions depending on the adjoining states only. Alternatively, the distribution of the sojourn times T_{n+1} − T_n, n ≥ 0, satisfies

P (T_{n + 1} - T_{n} \leq x, J_{n + 1} = j ∣ J_{0}, T_{0}, J_{1}, T_{1}, \dots, J_{n}, T_{n}) = P (T_{n + 1} - T_{n} \leq x, J_{n + 1} = j ∣ J_{n}) .
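The defining property can be illustrated by simulation. The sketch below is a minimal illustration, not part of the original analysis: it assumes exponential sojourn times whose mean depends on the pair of adjoining states, labels the states 0, …, r − 1 rather than 1, …, r, and uses hypothetical names (`simulate_semi_markov`, `p`, `sojourn_mean`).

```python
import random

def simulate_semi_markov(p, sojourn_mean, j0, n_steps, rng):
    """Simulate (T_n, J_n), n = 0, ..., n_steps, of a semi-Markov process.

    p[i][j]            : hypothetical transition probabilities of the chain J
    sojourn_mean[i][j] : hypothetical mean of an exponential sojourn time in
                         state i when the next state is j (an assumption)
    """
    states = list(range(len(p)))
    t, j = 0.0, j0
    path = [(t, j)]
    for _ in range(n_steps):
        # The next state depends on the current state only (Markov chain J).
        nxt = rng.choices(states, weights=p[j])[0]
        # Given the adjoining states (j, nxt), the sojourn time is drawn
        # independently of the rest of the history.
        t += rng.expovariate(1.0 / sojourn_mean[j][nxt])
        j = nxt
        path.append((t, j))
    return path
```

Replacing the exponential draw by any other distribution indexed by the pair (j, nxt) leaves the semi-Markov property intact; with exponential sojourn times whose mean depends on the current state only, the process is a Markov chain in continuous time.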

Properties of semi-Markov processes were discussed in some detail in classical papers of Pyke (1961a,b), Pyke and Schaufele (1964,1966), and textbooks of Cinlar (1975), Daley and Vere-Jones (1988), Karr (1991), Last and Brandt (1995) and Limnios and Oprisan (2001). Numerous examples of applications to areas such as reliability, insurance and finance were provided by Janssen (1999), Janssen and Manca (2006,2007) and Janssen and Limnios (2001), for instance. In such studies, it is most common to consider estimation methods assuming that a single realization of a semi-Markov process is observed over a finite time interval [0, τ] whose length tends to infinity (τ ↑ ∞). Greenwood and Wefelmeyer (1996) and Greenwood, Müller and Wefelmeyer (2004) developed a general framework for analysis of non- and semi-parametric semi-Markov processes in this setting. In particular, they studied properties of classical estimators of the jump frequency and the proportion of visits to a given state, as well as Moore and Pyke’s (1968) non-parametric estimator of the kernel of the process. Estimation of transition intensities and transition probabilities was considered by Ouhbi and Limnios (1996,1999).

In survival analysis, it is more common to consider estimation based on a large number of iid copies of a semi-Markov process observed over deterministic or random time intervals. Lagakos, Sommer and Zelen (1978), Gill (1980), Voelkel and Crowley (1984) and Phelan (1999) developed nonparametric estimators of the semi-Markov kernel of the process in the presence of random censoring. Examples of applications of these processes to analysis of survival data can be found in Commenges (1986), Keiding (1986), Dabrowska et al. (1994), Chang et al. (1994, 1999, 2000) and Cook and Lawless (2007), among others.

In this paper, we assume that the evolution of the process (T_m, J_m)_{m≥0} depends also on an R^d-valued covariate (Z_m)_{m≥0}, Z_m = [Z_{jm}: j ∈ 𝒥], which represents either a vector of time independent covariates, or a vector of time dependent covariates changing at the successive renewal times. As an extension of the semi-Markov process to the regression setting, Cox (1973) proposed to consider a proportional hazards modulated renewal process. More precisely, let Ñ = {Ñ_j(t): t ≥ 0, j = (j₁, j₂) ∈ 𝒥 × 𝒥} be the counting process registering transitions among adjoining states of the model,

{\tilde{N}}_{j} (t) = \sum_{m \geq 0} 1 (T_{m + 1} \leq t, J_{m + 1} = j_{2}, J_{m} = j_{1}) .

Cox’s model assumes that the compensator of this process, relative to the self-exciting filtration {ℱ_t}_{t≥0}, is given by Λ_j(0) = 0,

Λ_{j} (t) = Λ_{j} (T_{m}) + \int_{0}^{t - T_{m}} 1 (J_{m} = j_{1}) e^{β^{T} Z_{j_{1} m}} Γ_{j} (d u)

for t ∈ (T_m, T_{m+1}] and j = (j₁, j₂) ∈ 𝒥 × 𝒥. Here β is a regression coefficient and Γ_j is an unknown cumulative hazard function. If covariates are time independent and Γ_j(x) = γ_j x, the process reduces to a Markov chain regression model. In the general case, the modulated renewal process allows the transition intensities to depend on the history of the process through the sequence of states visited and the length of time spent in each state. As a result, it has a more flexible structure than Markov chains.

The purpose of this paper is to extend Cox’s modulated process to a class of transformation models. In the case of single spell models, they provide a common alternative to the proportional hazard model. In particular, they may be more appropriate than the proportional hazard model if relative differences between covariates dissipate or diverge over time. As an extension to multistate models, we consider here a modulated renewal process assuming that the counting process Ñ has compensator given by Λ_j(0) = 0,

Λ_{j} (t) = Λ_{j} (T_{m}) + \int_{0}^{t - T_{m}} 1 (J_{m} = j_{1}) α_{j} (Γ_{(j_{1}, .)} (u), θ, Z_{j_{1} m}) Γ_{j} (d u)

(1.1)

for t ∈ (T_m, T_{m+1}] and j = (j₁, j₂) ∈ 𝒥 × 𝒥. For any such pair j = (j₁, j₂), α_j is a hazard function dependent on an unknown Euclidean parameter θ and a vector of unknown increasing functions $Γ_{(j_{1}, .)} = [Γ_{j} (x) = \int_{0}^{x} γ_{j} (u) d u : j = (j_{1}, j_{2}) \in J \times J, x \geq 0]$. The components of Γ_{(j_1,.)} depend on all states which can be reached from the state j₁ in one step. If covariates are time independent, then (1.1) includes as a special case renewal processes whose interarrival times satisfy common transformation models. Other choices include semi-Markov models with one-step transition probabilities defined using copula-graphic models (e.g. Zheng and Klein (1995), Rivest and Wells (2001), Lo and Wilke (2010)) or extensions of the dynamic Cox-McFadden model (Chintagunta and Prasad (1998)) combining transformation models and multinomial regression. These models are defined in more detail in Section 2, where covariates are also allowed to change at the renewal times of the process.

For purposes of estimation, we consider a modification of procedures studied by Bagdonovicius and Nikulin (1999,2004) and Dabrowska (2006) in the case of single spell transformation models. Section 3 provides properties of the estimates as well as an extension of the Gaussian multiplier method of Lin et al. (1994) for setting point-wise and simultaneous confidence bands for the unknown transformations and related parameters. In analogy to Cox’s model, the counting process Ñ has a compensator depending on the backwards recurrence time and as a result of this, it falls outside the class of multiplicative models studied by Andersen et al. (1993), for instance. In the case of Cox’s modulated renewal process or non-parametric semi-Markov models, estimation of the cumulative hazards of one-step transitions leads to a time transformation which arranges observations according to the length of time spent in each state rather than calendar time. As a result of the rearrangement of the time scale, usual counting process methods for analysis of large sample properties of stochastic integrals do not apply (Gill (1980), Oakes (1981), Oakes and Cui (1994)). To alleviate these problems, we use Hoeffding’s projection method and empirical processes in Section 5.

In Section 4, we consider a transplant outcome data set from the Center for International Blood and Marrow Transplant Research (CIBMTR). The example data set consists of patients who received HLA-identical sibling transplant from 1995 to 2004 for acute myelogenous leukemia (AML) or acute lymphoblastic leukemia (ALL). Multistate models for analysis of the bone marrow transplant recovery process have been proposed by several authors. The early work in this area focused on competing risk models and goes back to Prentice et al. (1978), who discussed estimation of cause specific cumulative hazards in the proportional hazard model. More recent approaches towards analysis of leukemia transplant data are based on multistate models. They provide a convenient tool for evaluation of the impact of intermediate events in the transplant recovery process on the main outcome events corresponding to leukemia relapse and death in remission. However, analysis of multistate regression models leads to some difficulties in the interpretation of the results because there is no one-to-one correspondence between regression coefficients and transition probabilities. Each covariate may increase the risk of transition among some states of the model and at the same time decrease it among the others. Correspondingly, its overall impact on the outcome events is often not clear. To obviate these difficulties, Arjas and Eerola (1993) and Eerola (1994) proposed a set of graphical tools which can be used for purposes of interpretation of regression analyses based on multistate models. These included graphs of innovation gains and plots of the transition probabilities evaluated by conditioning on the follow-up history of a patient. The approach was illustrated using a proportional hazard model with time dependent covariates in Eerola (1994). Applications of these methods to proportional hazard Markov chain models were given in Klein et al. (1993), Keiding et al. (2001) and Andersen and Parme (2008), and to proportional hazard semi-Markov models in Dabrowska et al. (1993, 2006). Putter et al. (2007) discussed special cases of both models.

In this paper, we consider a data set involving patients who received either bone marrow (BMT) or peripheral blood stem cell transplant (PBSCT). Many clinical studies have reported that PBSCT may be beneficial during the early post-transplant period as it leads to faster engraftment and hematopoietic recovery than BMT (e.g. Flowers et al. 2002, Ringden et al. 2002). Several studies have also pointed out that differences between the two transplant types may dissipate over time (e.g. Friedrichs et al. 2010, Cutler et al. 2002ab). Such dissipating time effects are better captured by the proportional odds ratio model than the proportional hazard model, and in Section 5 we discuss an extension of it to semi-Markov models. In this section we also propose pointwise and simultaneous confidence bands for comparison of transition probabilities.

2 The model

Throughout the paper we assume that (Ω, ℱ, P) is a complete probability space and (T_m, V_m)_{m≥0} is a marked point process defined on it with marks taking on values in a separable measure space (E, ℰ) and enlarged by the empty mark Δ. Thus T₀ < T₁ < … < T_m < … is a sequence of random time points registering occurrence of some events in time such that the T_m are almost surely distinct and T_m ↑ ∞ P-a.s. At time T_m we observe a variable V_m such that V_m ∈ E if T_m < ∞, and V_m = Δ if T_m = ∞.

For any B ∈ ℰ, let Ñ(t, B) = Σ_{m≥0} 1(T_{m+1} ≤ t, V_{m+1} ∈ B) be the process counting observations falling into the set [0, t] × B. The internal history of the process, $\{\mathcal{F}_{t}^{N}\}_{t \geq 0}$, represents information collected on Ñ until time t, and is given by $\mathcal{F}_{t}^{N} = σ (1 (T_{m} \leq s, V_{m} \in B) : m \geq 1, s \leq t, B \in \mathcal{E}) \lor σ (V_{0})$. Let $\mathcal{F}_{t} = \mathcal{N} \lor \mathcal{F}_{t}^{N}$ be the self-exciting filtration associated with the process Ñ, obtained by adjoining the P-null sets 𝒩 to the internal history of the process. The compensator of the process Ñ with respect to {ℱ_t}_{t≥0} is given by

\tilde{Λ} (t, B) = \tilde{Λ} (T_{m}, B) + \int_{(T_{m}, t] \times B} \frac{P_{m} (d (s, v))}{P_{m} ([s, \infty]; E \cup Δ)} for t \in (T_{m}, T_{m + 1}],

where P_m(d(s, v)) is a version of a regular conditional distribution of (T_{m+1}, V_{m+1}) given ℱ_{T_m} (Jacod (1975)).

In this paper we assume that the marks V_m have the form V_m = (J_m, Z̃_m), where J_m ∈ 𝒥 = {1, …, r} is a discrete variable representing the type of the event occurring at time T_m and Z̃_m are covariates taking values in R^d. The covariate Z̃_m may correspond to some measurements taken upon entrance into the state J_m. The process Ñ = [Ñ_j, j = (j₁, j₂) ∈ 𝒥 × 𝒥],

{\tilde{N}}_{j} (t, B) = \sum_{m \geq 0} 1 (T_{m + 1} \leq t, J_{m + 1} = j_{2}, J_{m} = j_{1}, {\tilde{Z}}_{m + 1} \in B),

has compensator given by

{\tilde{Λ}}_{j} (t, B) = {\tilde{Λ}}_{j} (T_{m}, B) + \int_{0}^{t - T_{m}} μ_{m + 1} (B, u + T_{m}, j) 1 (J_{m} = j_{1}) α_{j} (Γ_{(j_{1}, .)} (u), θ, Z_{j_{1} m}) Γ_{j} (d u),

for t ∈ (T_m, T_{m+1}]. Here μ_{m+1}(B, T_{m+1}, J_m, J_{m+1}) is the conditional probability of the event {Z̃_{m+1} ∈ B} given σ(ℱ_{T_m}, T_{m+1}, J_{m+1}). Further, Z_{j₁m} = g_{j₁m}(T_l, J_l, Z̃_l: l = 0, …, m) is a fixed R^d-valued function, measurable with respect to ℱ_{T_m}. Finally, α_j denotes a hazard rate dependent on a Euclidean parameter θ and a vector of unknown monotone increasing functions Γ_{(j_1,.)} = [Γ_j: j = (j₁, j₂) ∈ 𝒥 × 𝒥]. In particular, setting B = R^d and using μ_{m+1}(R^d, T_{m+1}, J_m, J_{m+1})1(T_{m+1} < ∞) = 1 P-a.s., Λ̃_j(t, R^d) reduces to (1.1) and represents the compensator of the “marginal” counting process

{\tilde{N}}_{j} (t) = {\tilde{N}}_{j} (t, R^{d}) = \sum_{m \geq 0} 1 (T_{m + 1} \leq t, J_{m + 1} = j_{2}, J_{m} = j_{1})

(2.1)

registering transitions among the adjoining states of the model.

To give examples of the model, we assume first that the covariates are time independent. If events are of a single type (|𝒥| = 1), then (1.1) represents the compensator of a renewal regression model assuming that the interarrival times follow a transformation model. Thus in this case {α(u, θ, Z): θ ∈ Θ} is a parametric family of hazard rates, and the model stipulates that conditionally on Z, the interarrival times X_{m+1} are independent and their conditional survival function has cumulative hazard function A(Γ(x), θ, Z).

Simple examples of multi-type processes are given by competing risk and semi-Markov regression models. In particular, a semi-Markov regression model assumes that one-step transition probabilities satisfy

P (X_{m + 1} \leq x, J_{m + 1} = j_{2} ∣ {\{(T_{ℓ}, J_{ℓ})\}}_{ℓ = 0}^{m}, Z) = P (X_{m + 1} \leq x, J_{m + 1} = j_{2} ∣ J_{m}, Z) .

The matrix [F_j, j = (j₁, j₂) ∈ 𝒥 × 𝒥],

F_{j} (x ∣ Z) = P (X_{m + 1} \leq x, J_{m + 1} = j_{2} ∣ J_{m} = j_{1}, Z),

forms the kernel of the process. One way to define it is to consider latent variable models. Specifically, suppose that transitions originating from the state j₁ have the same conditional distribution as the pair (U, V), where

\begin{array}{l} U = min [U_{j} : j = (j_{1}, j_{2}) \in J \times J], \\ V = [1 (U = U_{j}) : j = (j_{1}, j_{2}) \in J \times J], \end{array}

and [U_j: j = (j₁, j₂) ∈ 𝒥 × 𝒥] is a multivariate vector whose joint conditional survival function given Z is

S_{(j_{1}, .)} (u, θ, z) = S_{(j_{1}, .)}^{0} ([Γ_{j} (u_{j}) e^{θ_{j}^{T} z} : j = (j_{1}, j_{2}) \in J \times J]) .

Here u = [u_j, j = (j₁, j₂) ∈ 𝒥 × 𝒥] and $S_{(j_{1}, .)}^{0}$ is a known multivariate survival function with a density with respect to Lebesgue measure supported on the entire upper orthant of R^{q_{j₁}}, q_{j₁} = |{j₂: (j₁, j₂) ∈ 𝒥 × 𝒥}|. The functions α_j in (1.1) are equal to

- \frac{\partial}{\partial y_{j}} log S_{(j_{1}, .)}^{0} ([y_{j} e^{θ_{j}^{T} z} : j = (j_{1}, j_{2}) \in J \times J]) .

With this choice the cumulative intensity (1.1) corresponds to a semi-Markov model whose kernel is given by

\begin{array}{l} F_{j} (x ∣ Z) = P (X_{m + 1} \leq x, J_{m + 1} = j_{2} ∣ J_{m} = j_{1}, Z) \\ = \int_{0}^{x} {\bar{F}}_{(j_{1}, .)} (u ∣ Z) α_{j} (Γ_{(j_{1}, .)} (u), θ, Z) Γ_{j} (d u), \end{array}

(2.2)

where j = (j₁, j₂) ∈ 𝒥 × 𝒥 and F̄_{(j_1,.)}(x|z) is the survival function of the sojourn time in state j₁,

\begin{array}{l} {\bar{F}}_{(j_{1}, .)} (x ∣ z) = P (X_{m + 1} > x ∣ J_{m} = j_{1}, z) \\ = exp [- \sum_{j = (j_{1}, j_{2})} \int_{0}^{x} α_{j} (Γ_{(j_{1}, .)} (u), θ, z) Γ_{j} (d u)] \\ = S_{(j_{1}, .)} (Γ_{(j_{1}, .)} (x), θ, z) . \end{array}

(2.3)

If the state space of the process consists of one ephemeral state (J₀ = 1, say) and q − 1 absorbing states, q ≥ 3, then the semi-Markov process reduces to a competing risk model. In this case transition probabilities (2.2) provide a regression analogue of copula-graphic models proposed for analysis of competing risks by Zheng and Klein (1995) and Rivest and Wells (2001). The special case of Archimedean copula models corresponds to the choice $S_{(j_{1}, .)}^{0} (y_{(j_{1}, .)}) = \bar{S} ({\| y_{(j_{1}, .)} \|}_{1})$, where S̄ is a known survival function with a density supported on the positive half-line and ||·||₁ is the ℓ₁-norm of a vector.
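As a worked illustration of the Archimedean case (the two choices of S̄ below are examples chosen here, not taken from the original text), write h̄ = −S̄′/S̄ for the hazard rate of S̄. Differentiating the logarithm of the survival function with respect to y_j gives

```latex
\alpha_{j}(y, \theta, z)
  = -\frac{\partial}{\partial y_{j}}
      \log \bar{S}\Big(\sum_{k} y_{k} e^{\theta_{k}^{T} z}\Big)
  = e^{\theta_{j}^{T} z}\,
      \bar{h}\Big(\sum_{k} y_{k} e^{\theta_{k}^{T} z}\Big),
\qquad \bar{h} = -\bar{S}'/\bar{S},
```

where the sum extends over the pairs k = (j₁, k₂) leaving the state j₁. With S̄(t) = e^{−t} we have h̄ ≡ 1 and α_j = e^{θ_j^T z}, so that the latent variables are conditionally independent and the model has the proportional hazards form of Cox's modulated renewal process. With S̄(t) = (1 + t)^{−1} we have h̄(t) = (1 + t)^{−1} and α_j = e^{θ_j^T z} / (1 + Σ_k y_k e^{θ_k^T z}), a hazard of proportional odds type in which covariate effects on the hazard scale diminish as the transformations grow.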

Another example of a semi-Markov model is provided by the dynamic Cox-McFadden model (Chintagunta and Prasad, 1998). In this case, the distribution of the sojourn time in state j₁ ∈ 𝒥 is specified by means of a transformation model for univariate failure time data, i.e. the survival function (2.3) is of the form F̄_{(j_1,.)}(x|z) = exp[−Ã_{j₁}(Γ_{j₁}(x), θ₁, z)] for some univariate cumulative hazard function Ã_{j₁}. The kernel of the process is given by

F_{j} (x ∣ z) = \int_{0}^{x} π_{j} (u, z, θ_{2}) F_{(j_{1}, .)} (d u ∣ z),

where F_{(j_1,.)}(·|z) = 1 − F̄_{(j_1,.)}(·|z) and for j = (j₁, j₂),

π_{j} (X_{m + 1}, Z, θ_{2}) = P (J_{m + 1} = j_{2} ∣ X_{m + 1}, J_{m} = j_{1}, Z)

(2.4)

are the one-step state transition probabilities. The state transition probabilities can be specified using multinomial regression models such as the logistic or probit model. If the state transition probabilities (2.4) do not depend on the length of the sojourn time X_m₊₁, the model reduces to a stationary process, i.e. conditionally on Z, the transition probabilities do not depend on m.
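A multinomial logistic specification of (2.4) can be sketched as follows. This is a minimal illustration under stated assumptions: the probabilities are taken not to depend on the sojourn time, each reachable state j₂ has its own coefficient vector, and the names `logistic_transition_probs` and `theta2` are hypothetical.

```python
import math

def logistic_transition_probs(z, theta2):
    """Multinomial logistic probabilities over the reachable next states.

    z      : covariate vector (list of floats)
    theta2 : dict mapping each reachable next state j2 to its coefficient
             vector (hypothetical parametrization)
    """
    scores = {j2: sum(b * x for b, x in zip(coef, z))
              for j2, coef in theta2.items()}
    shift = max(scores.values())  # subtract the maximum for numerical stability
    weights = {j2: math.exp(s - shift) for j2, s in scores.items()}
    total = sum(weights.values())
    return {j2: w / total for j2, w in weights.items()}
```

In practice one coefficient vector is usually fixed at zero for identifiability; the sketch leaves that choice to the caller. A dependence on the sojourn time could be introduced by appending functions of X_{m+1} to the vector z.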

In practice, the assumptions of the semi-Markov process may be violated if transitions from a state j₁ to a state j₂ depend on the sequence or the time spent in states visited prior to the entrance into the state j₁. Both models can accommodate this problem by allowing the covariates to depend on the internal history of the process. The time dependent covariates may represent for instance the total number of events occurring prior to the entrance into the state j₂ or the length of time spent in states preceding entrance into the state j₁. The time dependent covariates may also represent changing treatment types or levels of drugs.

We further assume that the process is subject to censoring and that the times at which the process is observed are determined by a process C(t) = Σ_{m≥1} 1(C_{m−1} < t ≤ C_m), where 0 ≤ C₀ ≤ C₁ ≤ … ≤ C_m ≤ … is an increasing sequence such that C_m ∈ [T_m, T_{m+1}] are stopping times with respect to a larger filtration {𝒢_t}_{t≥0}, ℱ_t ⊆ 𝒢_t. If T_m = C_m then no information is available on either the sojourn time X_{m+1} = T_{m+1} − T_m or the marks (V_m, V_{m+1}). If C_m = T_{m+1} then the sojourn time X_{m+1} = T_{m+1} − T_m and the marks (V_m, V_{m+1}) are observable. Finally, if T_m < C_m < T_{m+1} then the mark V_m is visible while the sojourn time X_{m+1} is only known to exceed C_m − T_m. Following Andersen et al. (1993), we assume that the compensator of the marked point process Ñ relative to the filtration {𝒢_t}_{t≥0} coincides P-a.s. with the compensator Λ̃ relative to {ℱ_t}_{t≥0}, and that the censoring process and the compensator Λ̃ depend on parameters which share no components. We also make the assumption that the censoring process is monotone so that with probability 1, T_m ≤ C_m < T_{m+1} ⇒ C_{m′} = T_{m′} for all m′ > m. This condition stipulates that the process terminates once censoring takes place.

These conditions are satisfied in two common applications. The first assumes that the process is subject to censoring by a univariate failure time T′ such that T′ is independent of the sequence (T_m, V_m), conditionally on the initial state of the process, V₀. In this case, C_m = T_m + min(T′ − T_m, X_{m+1})1(T′ ≥ T_m) and the augmented filtration is given by 𝒢_t = ℱ_t ∨ σ(T′).
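The formula for C_m in this first example translates directly into code. The sketch below assumes the entrance times of one path are available as a list and that `t_cens` plays the role of T′; the function name is hypothetical.

```python
def censoring_times(entry_times, t_cens):
    """Return [C_0, C_1, ...] with C_m = T_m + min(T' - T_m, X_{m+1}) 1(T' >= T_m).

    entry_times : increasing list [T_0, T_1, ...] of entrance times
    t_cens      : the censoring time T'
    """
    out = []
    for t_m, t_next in zip(entry_times, entry_times[1:]):
        x_next = t_next - t_m  # sojourn time X_{m+1}
        if t_cens >= t_m:
            out.append(t_m + min(t_cens - t_m, x_next))
        else:
            # Censoring occurred before T_m: C_m = T_m, nothing is observed.
            out.append(t_m)
    return out
```

For entrance times 0, 2, 5, 9 and T′ = 3 this gives C₀ = 2 (first sojourn fully observed), C₁ = 3 (second sojourn censored after one time unit) and C₂ = 5 = T₂, in line with the monotonicity requirement that the process terminates once censoring takes place.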

The second example assumes that the state space of the process has an extra absorbing state corresponding to censoring, say {c}, which can be reached in one step from each transient state j₁ ∈ 𝒥. The time T until entrance into the censoring state then forms a stopping time with respect to the filtration 𝒢_t = ℱ_t. Consequently, there exist nonnegative variables U_m such that on the event {T ≥ T_m}, we have T ∧ T_{m+1} = (T_m + U_m) ∧ T_{m+1}, and U_m is measurable with respect to ℱ_{T_m}. Correspondingly, C_m = T_m + min(U_m, X_{m+1})1(T ≥ T_m). In this setting, the assumption of non-informative censoring means that the compensators of one-step transitions into the censoring state depend on different parameters than the compensator of transitions among the remaining states of the model.

Let 𝒥₀ ⊂ 𝒥 × 𝒥 be the set of pairs of adjacent states in the model, i.e. j = (j₁, j₂) ∈ 𝒥₀ iff the subject may progress from state j₁ to state j₂ in one step. For j = (j₁, j₂) ∈ 𝒥₀ and m ≥ 0, let N_{jm}(x) = 1(X_{m+1} ≤ x, J_m = j₁, J_{m+1} = j₂, T_{m+1} = C_m), Y_{jm}(x) = 1(X_{m+1} ≥ x, C_m − T_m ≥ x, J_m = j₁) and set

\begin{array}{l} M_{j m} (x, θ) = N_{j m} (x) - Λ_{j m} (x, θ), \\ Λ_{j m} (x, θ) = \int_{0}^{x} Y_{j m} (u) α_{j} (Γ_{(j_{1}, .)} (u), θ, Z_{j_{1} m}) Γ_{j} (d u) . \end{array}

The aggregate processes N_j., Y_j. and M_j. are defined as N_j. = Σ_m N_jm, Y_j. = Σ_m Y_jm and M_j. =Σ_m M_jm, respectively.
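The indicators N_jm and Y_jm operate on the duration scale of each sojourn rather than on calendar time. A minimal sketch of their evaluation from observed data follows; the `Sojourn` record and the function names are hypothetical, and a censored sojourn is encoded by a missing next state.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Sojourn:
    state: int                 # J_m
    next_state: Optional[int]  # J_{m+1}, or None if the sojourn was censored
    duration: float            # C_m - T_m, the observed time spent in J_m

def n_jm(s: Sojourn, j: Tuple[int, int], x: float) -> int:
    """N_jm(x): observed j1 -> j2 transition after a sojourn of length <= x."""
    j1, j2 = j
    return int(s.state == j1 and s.next_state == j2 and s.duration <= x)

def y_jm(s: Sojourn, j: Tuple[int, int], x: float) -> int:
    """Y_jm(x): still under observation in state j1 at duration x."""
    return int(s.state == j[0] and s.duration >= x)

def aggregate(sojourns: List[Sojourn], j: Tuple[int, int], x: float) -> Tuple[int, int]:
    """Aggregate processes (N_j.(x), Y_j.(x)) for one subject."""
    return (sum(n_jm(s, j, x) for s in sojourns),
            sum(y_jm(s, j, x) for s in sojourns))
```

Because C_m ≤ T_{m+1}, the observed duration C_m − T_m never exceeds X_{m+1}, so the two conditions defining Y_jm reduce to the single comparison used in `y_jm`.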

Note that the model depends on two parameters, θ and Γ; however, we suppress the dependence on Γ in the notation. In analogy to single spell models in Bagdonovicius and Nikulin (1999,2004) and Dabrowska (2006), under regularity conditions stated in Section 5, we can associate, with any θ ∈ Θ, a vector Γ_θ of locally bounded increasing functions. For this purpose, we shall require only that the processes N_j. and Y_j. have a finite expectation. To show asymptotic normality of estimates we shall require existence of the second moments of these processes. More precisely, we assume the following conditions.

Condition 2.1

For all j ∈ 𝒥₀:

(i) The functions EY_j.(x) have at most a finite number of discontinuity points and EY_j.(0)² < ∞.
(ii) The functions EN_j.(x) are continuous, EN_j.(τ)² < ∞ and the point τ satisfies inf{x: EN_j.(x) > 0} < τ < τ_{j0}, where τ_{j0} = sup{x: EY_j.(x) > 0}.
(iii) We have P(|Z_{J(t−),Ñ_..(t−)}| ≤ C) = 1, where C is a finite constant, J(t) is the state occupied by the process at time t and Ñ_..(t) = Σ_j Ñ_j(t) is the total number of events observed in the interval [0, t].

Under the added assumption that the model corresponds to the censored modulated renewal process, and θ represents the true parameter, we have the following moment identities.

Lemma 2.1

Let L(t) = t − T_{Ñ_..(t−)} be the backwards recurrence time of the process Ñ and let {ϕ_m(x), m ≥ 0, x ≥ 0} be a sequence of random functions such that the process ϕ ∘ L, ϕ ∘ L(t) = ϕ_{Ñ_..(t−)}(t − T_{Ñ_..(t−)}), is predictable with respect to the larger filtration {𝒢_t}_{t≥0} and $E \int_{0}^{\infty} {[ϕ \circ L]}^{2} (s) {\tilde{Λ}}_{j} (d s, θ) < \infty$. Then

\begin{array}{r} E \sum_{m} \int_{0}^{\infty} ϕ_{m} (u) N_{j m} (d u) = E \sum_{m} \int_{0}^{\infty} ϕ_{m} (u) Λ_{j m} (d u, θ), \\ E {[\sum_{m} \int_{0}^{\infty} ϕ_{m} (u) M_{j m} (d u, θ)]}^{2} = E \sum_{m} \int_{0}^{\infty} ϕ_{m}^{2} (u) Λ_{j m} (d u, θ) . \end{array}

In addition, if {ϕ₁_m: m ≥ 0} and {ϕ₂_m: m ≥ 0} are two such sequences, then

E [\sum_{m} \int_{0}^{\infty} ϕ_{1 m} (u) M_{j m} (d u, θ)] [\sum_{m} \int_{0}^{\infty} ϕ_{2 m} (u) M_{j^{'} m} (d u, θ)] = 0

for pairs j ≠ j′, j, j′ ∈ 𝒥₀.

Similarly to Gill (1980), this lemma follows from the dominated convergence theorem, martingale properties of the processes M̃_j(t) = Ñ_j(t) − Λ̃_j(t), and the identities

\begin{array}{c} \int_{0}^{\infty} {[ϕ \circ L]}^{k} (s) C (s) {\tilde{N}}_{j} (d s) = \sum_{m \geq 0} \int_{0}^{\infty} ϕ_{m}^{k} (u) N_{j m} (d u), \\ \int_{0}^{\infty} {[ϕ \circ L]}^{k} (s) C (s) {\tilde{Λ}}_{j} (d s, θ) = \sum_{m \geq 0} \int_{0}^{\infty} ϕ_{m}^{k} (u) Λ_{j m} (d u, θ) . \end{array}

The identities hold almost surely for k = 1, 2. We omit the details.

3 Estimation

Throughout the remainder of this paper, we assume that we have an iid sample of size n of the censored modulated renewal process and covariates. The subscript “i” refers to the i-th subject under study and D_i represents the associated vector of observations. It corresponds to the sequence of states visited, duration of the time spent in each state, the initial covariate and its updates occurring at uncensored renewal times.

Further, let q = |𝒥₀| be the total number of possible one-step transitions in the model. For each j = 1, …, q, we let (r(j), c(j)) = (j₁, j₂) if the pair j ∈ 𝒥₀ corresponds to the one-step transition from state j₁ to the state j₂. For any such j ∈ 𝒥₀, the covariate Z_{j₁m} is denoted as Z_{jm}. We shall also find it convenient to write Γ = [Γ₁, …, Γ_q]^T for the vector obtained by stacking the columns of the matrix Γ = [Γ_j]_{j∈𝒥×𝒥} on top of each other and deleting all entries corresponding to the pairs (j₁, j₂) ∉ 𝒥₀. For the sake of convenience, we shall write α_j(y, θ, z) for each j ∈ 𝒥₀ and y = (y₁, …, y_q)^T, y_j ∈ R₊, j = 1, …, q. However, it is tacitly assumed here that for j = (j₁, j₂) ∈ 𝒥₀, the function α_j(y, θ, z) may depend only on y_k’s such that (r(k), c(k)) = (j₁, ℓ) for some (j₁, ℓ) ∈ 𝒥₀.

Under assumptions stated in Section 5, the parameter θ varies over a bounded open subset Θ of R^d and the functions ℓ_j(y, θ, z) = log α_j(y, θ, z), y ∈ R^q, are twice continuously differentiable with respect to (y, θ). We let $ℓ_{j}^{'} = {(ℓ_{j}^{(1)}, \dots, ℓ_{j}^{(q)})}^{T}$ be a vector whose k-th component is equal to the partial derivative of ℓ_j(y, θ, z) with respect to y_k, k = 1, …, q. Likewise, ℓ̇_j denotes the (column) vector of length d corresponding to the derivative of ℓ_j with respect to θ. We further set $S_{j} (y, θ, x) = n^{- 1} \sum_{i = 1}^{n} \sum_{m} Y_{jmi} (x) α_{j} (y, θ, Z_{jmi})$, y ∈ R^q, and denote by Ṡ, S′ the derivatives of these processes with respect to θ and y, respectively. Here, Ṡ is a d × q matrix, whose j-th column is given by Ṡ_j(y, θ, x), the derivative of S_j with respect to θ. Further, $S^{'} = {[S_{j}^{(k)}]}_{j, k = 1, \dots, q}$ is a q × q matrix, whose (k, j) entry is equal to the partial derivative $S_{j}^{(k)} (y, θ, x)$ of S_j(y, θ, x) with respect to y_k, k = 1, …, q. Let s_j = ES_j and let ṡ, s′ be the matrices of expected Ṡ and S′ processes. Finally, for each j ∈ 𝒥₀, we let $N_{j . .} (x) = n^{- 1} \sum_{i = 1}^{n} \sum_{m} N_{jmi} (x)$ be the averaged process counting transitions from the state j₁ = r(j) to the state j₂ = c(j) with sojourn time in the state j₁ not exceeding x.

As an estimate of the unknown transformations Γ = [Γ₁, …, Γ_q]^T, we consider a vector valued analogue of the estimator proposed by Bagdonovicius and Nikulin (1999,2004) for analysis of single spell models. The estimator is given by

\begin{array}{l} Γ_{j n θ} (x) = \int_{0}^{x} \frac{N_{j . .} (d u)}{S_{j} (Γ_{n θ} (u -), θ, u)}, \\ Γ_{j n θ} (0 -) = 0, θ \in Θ, x \geq 0, j \in J_{0} . \end{array}

(3.1)
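For a fixed θ, the estimator (3.1) is computed recursively over the ordered uncensored durations, since the denominator at u involves only the left limit Γ_nθ(u−). The sketch below is an illustration restricted to a single transition type (q = 1) with a scalar covariate; each record is one sojourn pooled over subjects and visits, and the function names and the proportional odds hazard `po_hazard` are assumptions made for this example.

```python
import math

def po_hazard(y, theta, z):
    """Illustrative proportional-odds-type hazard e^{theta z} / (1 + y e^{theta z})."""
    r = math.exp(theta * z)
    return r / (1.0 + y * r)

def transformation_estimate(durations, events, covariates, theta, alpha):
    """Recursive solution of (3.1) for q = 1 at the distinct uncensored durations.

    durations  : observed sojourn lengths
    events     : 1 if the sojourn ended with an observed transition, else 0
    covariates : scalar covariate attached to each sojourn
    alpha      : positive hazard function alpha(y, theta, z)
    Returns a list of (u, Gamma(u)) pairs.
    """
    n = len(durations)
    jump_times = sorted({x for x, d in zip(durations, events) if d})
    gamma_left = 0.0  # Gamma(u-), the value just before the current jump
    path = []
    for u in jump_times:
        # S(Gamma(u-), theta, u) = n^{-1} sum_i Y_i(u) alpha(Gamma(u-), theta, Z_i)
        s = sum(alpha(gamma_left, theta, z)
                for x, z in zip(durations, covariates) if x >= u) / n
        # N(du): fraction of uncensored sojourns ending exactly at u
        dn = sum(1 for x, d in zip(durations, events) if d and x == u) / n
        gamma_left += dn / s
        path.append((u, gamma_left))
    return path
```

When alpha is identically 1 the recursion reduces to the Nelson-Aalen estimator, which provides a convenient check. In the general case q > 1 the same loop runs over the pooled jump times of all transition types, with the whole vector Γ_nθ(u−) passed to each α_j.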

For fixed θ, (3.1) forms a sample analogue of the non-linear vector-valued Volterra equation

Γ_{j θ} (x) = \int_{0}^{x} \frac{{E N}_{j . .} (d u)}{s_{j} (Γ_{θ} (u -), θ, u)}, Γ_{j θ} (0 -) = 0, x \geq 0, j \in J_{0} .

(3.2)

Using arguments similar to Dabrowska (2006), we can show that under the regularity conditions stated in Section 5, the equation (3.2) has a unique solution Γ_θ = [Γ_{1θ}, …, Γ_{qθ}]^T and its estimator (3.1) is uniformly consistent. Further, the function Θ ∋ θ → {Γ_θ(x): x ∈ [0, τ]} ∈ C([0, τ])^q is Fréchet differentiable with respect to θ. The derivative is a d × q matrix of continuous functions satisfying the matrix-valued linear Volterra equation

{\dot{Γ}}_{θ} (x) = - \int_{0}^{x} \dot{s} (Γ_{θ} (w -), θ, w) C_{θ} (d w) - \int_{0}^{x} {\dot{Γ}}_{θ} (w -) Q_{θ} (d w),

(3.3)

where C_θ(x) is the diagonal q × q matrix C_θ(x) = diag [C_{1θ}(x), …, C_{qθ}(x)] with entries

C_{j θ} (x) = \int_{0}^{x} \frac{{E N}_{j . .} (d u)}{s_{j}^{2} (Γ_{θ} (u -), θ, u)}

and

Q_{θ} (x) = \int_{0}^{x} s^{'} (Γ_{θ} (w -), θ, w) C_{θ} (d w) .

The solution to the Volterra equation is given by

{\dot{Γ}}_{θ} (x) = - \int_{0}^{x} \dot{s} (Γ_{θ} (w -), θ, w) C_{θ} (d w) P_{θ} (w, x) .

(3.4)

where P_θ(w, x), 0 < w ≤ x, is the Peano series (Gill and Johansen, 1990)

P_{θ} (u, x) = I + \sum_{m = 1}^{\infty} \int_{u < w_{1} < \dots < w_{m} \leq x} {(- 1)}^{m} Q_{θ} ({d w}_{1}) \cdot \dots \cdot Q_{θ} ({d w}_{m}) .

(3.5)

Here I is the q × q identity matrix. A uniformly consistent estimate of {Γ̇_θ(x): x ∈ [0, τ], θ ∈ Θ} can be obtained by substituting the processes N_j.. and S_j, $S_{j}^{'}$ , Ṡ_j into the preceding expressions.
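When Q is a step function with finitely many jumps, as is the case for its sample analogue, the series (3.5) terminates and equals the time-ordered product of the factors I − ΔQ(w) over the jump times w in (u, x]. The following sketch evaluates this product in plain Python; the representation of Q by a list of (time, jump matrix) pairs and the function names are assumptions made for the illustration.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def peano_product(jumps, u, x, q):
    """P(u, x) for a step function Q with the given jumps.

    jumps : list of (w, dQ) pairs, dQ being the q x q jump of Q at time w
    Returns the ordered product of (I - dQ(w)) over u < w <= x, with
    earlier times on the left, matching the ordering w_1 < ... < w_m in (3.5).
    """
    p = [[float(i == j) for j in range(q)] for i in range(q)]
    for w, dq in sorted(jumps, key=lambda item: item[0]):
        if u < w <= x:
            factor = [[float(i == j) - dq[i][j] for j in range(q)]
                      for i in range(q)]
            p = mat_mul(p, factor)
    return p
```

With two jumps A₁ and A₂ the product expands to I − A₁ − A₂ + A₁A₂, which is exactly the sum of the terms m = 0, 1, 2 of the series. The order of the factors matters because the jump matrices need not commute.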

To define the score equation for estimation of the Euclidean parameter, let

e_{j} [f_{j}] (u, θ) = \frac{E \sum_{m} Y_{jmi} (u) [f_{j} α_{j}] (Γ_{θ} (u), θ, Z_{jmi})}{E \sum_{m} Y_{jmi} (u) α_{j} (Γ_{θ} (u), θ, Z_{jmi})},

where f_j(y, θ, Z_jmi) is a function of covariates, jointly continuous with respect to (y, θ) and bounded on every compact set of R^q × Θ. Likewise, for any two vectors f₁_j and f₂_j of such functions, define

{cov}_{j} [f_{1 j}, f_{2 j}] (u, θ) = (e_{j} [(f_{1 j} \otimes f_{2 j})] - (e_{j} [f_{1 j}] \otimes e_{j} [f_{2 j}])) (u, θ)

and set var_j[f_j](u, θ) = cov_j[f_j, f_j](u, θ).

To estimate the parameter θ, we use a solution to the score equation U_n(θ) = U_{nϕ_n}(θ) = o_P(n^{−1/2}), where

U_{n ϕ_{n}} (θ) = \frac{1}{n} \sum_{i = 1}^{n} \sum_{j} \sum_{m} \int_{0}^{τ} {\hat{b}}_{jmi} (Γ_{n θ} (u), θ, u) N_{jmi} (d u),

(3.6)

b̂_{jmi}(Γ_{nθ}(u), θ, u) = b̂_{jm1i}(Γ_{nθ}(u), θ, u) − ϕ_{nθ}(u) b̂_{jm2i}(Γ_{nθ}(u), θ, u) and

\begin{array}{l} {\hat{b}}_{j m 1 i} (y, θ, u) = {\dot{ℓ}}_{j} (y, θ, Z_{jmi}) - [{\dot{S}}_{j} / S_{j}] (y, θ, u), \\ {\hat{b}}_{j m 2 i} (y, θ, u) = ℓ_{j}^{'} (y, θ, Z_{jmi}) - [S_{j}^{'} / S_{j}] (y, θ, u) . \end{array}

Here ϕ_nθ(x) is an estimate of a d × q matrix of bounded functions ϕ_θ(x), whose j-th column is absolutely continuous with respect to Γ_jθ.

We further define matrices

\begin{array}{l} Σ_{0} (θ) = \sum_{j} \int_{0}^{τ} v_{j, ϕ} (u, θ) {E N}_{j . .} (d u), \\ Σ_{1} (θ) = Σ_{0} (θ) + \sum_{j} \int_{0}^{τ} ρ_{j, ϕ} (u, θ) {E N}_{j . .} (d u) {[{\dot{Γ}}_{θ} (u) + ϕ_{θ} (u)]}^{T}, \\ Σ_{2} (θ) = Σ_{0} (θ) + \int_{0}^{τ} D_{ϕ} {(u, θ)}^{T} C_{θ} (d u) D_{ϕ} (u, θ), \end{array}

where $v_{j, ϕ} (u, θ) = {var}_{j} [{\dot{ℓ}}_{j} - ϕ_{θ} ℓ_{j}^{'}] (u, θ), ρ_{j, ϕ} (u, θ) = {cov}_{j} [{\dot{ℓ}}_{j} - ϕ_{θ} ℓ_{j}^{'}, ℓ_{j}^{'}] (u, θ)$ and

D_{ϕ} (u, θ) = \sum_{j} \int_{u}^{τ} P_{θ} (u, w) {E N}_{j . .} (d w) ρ_{j ϕ} {(w, θ)}^{T} .

Proposition 3.1

Let ε_n ↓ 0 be a sequence such that $\sqrt{n} ε_{n} \to \infty$ and let B(θ₀, ε_n) = {θ: |θ − θ₀| ≤ ε_n} be the ball of radius ε_n centered at θ₀. Suppose that the matrix Σ₀(θ₀) is positive definite and the matrix Σ₁(θ₀) is non-singular. Under conditions stated in Section 5, the score equation U_{nϕ_n}(θ) = o_P^*(n^{−1/2}) has a solution θ̂ in the ball B(θ₀, ε_n), with (inner) probability tending to 1. Further, let $\hat{Ξ} = \sqrt{n} (\hat{θ} - θ_{0})$ and ${\hat{W}}_{0} = \sqrt{n} [{(Γ_{n \hat{θ}} - Γ_{θ_{0}})}^{T} - {(\hat{θ} - θ_{0})}^{T} {\dot{Γ}}_{n \hat{θ}}]$. Then [Ξ̂, Ŵ₀] converges weakly in R^d × ℓ^∞([0, τ] × 𝒥₀) to a tight mean zero Gaussian process [Ξ, W₀] with covariance

\begin{array}{l} cov Ξ = Σ_{1}^{- 1} (θ_{0}) Σ_{2} (θ_{0}) {[Σ_{1}^{- 1} (θ_{0})]}^{T}, \\ cov (W_{0} (x), W_{0} (x^{'})) = K_{θ_{0}} (x, x^{'}), \\ cov (Ξ, W_{0} (x)) = - Σ_{1}^{- 1} (θ_{0}) \sum_{j} \int_{0}^{τ} ρ_{j, ϕ} (u, θ_{0}) {E N}_{j . .} (d u) K_{θ_{0}} (u, x), \end{array}

where K_θ, θ ∈ Θ is a q × q matrix

K_{θ} (x, x^{'}) = \int_{0}^{x \land x^{'}} P_{θ}^{T} (u, x) C_{θ} (d u) P_{θ} (u, x^{'}) .

(3.7)

Here ℓ^∞([0, τ] × 𝒥₀) denotes the space of bounded functions mapping the set [0, τ] × 𝒥₀ into R, equipped with the uniform metric and Borel σ-field. The Borel σ-field of the product R^d × ℓ^∞([0, τ] × 𝒥₀) is generated by open sets in the product topology of the Euclidean space R^d and the space ℓ^∞([0, τ] × 𝒥₀). It is equal to the product of the Borel σ-fields of R^d and ℓ^∞([0, τ] × 𝒥₀) because R^d is a complete separable metric space. The process X = (Ξ, W₀) has a version whose almost all paths are in the separable subspace R^d × C_b([0, τ] × 𝒥₀), where C_b([0, τ] × 𝒥₀) is the space of functions continuous with respect to the variance pseudometric. Weak convergence of the sequence X_n = [Ξ̂, Ŵ₀] to (Ξ, W₀) means that for all bounded continuous functions f on R^d × ℓ^∞([0, τ] × 𝒥₀), we have E^*f(X_n) − Ef(X) → 0, where E^* is the outer expectation. This implies that X_n is asymptotically measurable. In particular, we have E^*f(X_n) − E_*f(X_n) → 0 for all bounded continuous functions f on R^d × ℓ^∞([0, τ] × 𝒥₀), where E_*f(X_n) = −E^*(−f(X_n)) is the inner expectation (van der Vaart and Wellner (1996), Dudley (1999)). We also note that the space ℓ^∞([0, τ] × 𝒥₀) is isometric to the product space ℓ^∞([0, τ])^q equipped with the uniform metric d_Y(x, y) = max_j sup_t |x_j(t) − y_j(t)|, and the product topology of the latter space coincides with the topology induced by the metric d_Y. Under the assumptions of Section 5, the space C_b([0, τ] × 𝒥₀) is isometric to the space C([0, τ])^q and W₀ is a linear transformation of a vector of q independent time-transformed Brownian motions.

The M-estimator θ̂ depends on the specification of the matrix ϕ_θ and its estimator ϕ_nθ. Depending on the measurability properties of the estimator ϕ_nθ, the solution to the score equation exists either with probability tending to 1, or with inner probability tending to 1 (Section 5). Two simple choices of the function ϕ_θ are ϕ_θ ≡ 0 and ϕ_θ = −Γ̇_θ. With the latter choice, the estimate θ̂ is an analogue of the pseudo-maximum likelihood estimators considered by Bagdonavičius and Nikulin (1999, 2004) in the case of single spell models. Under regularity conditions, the optimal choice of this function corresponds to the solution of a system of Sturm-Liouville equations and yields an asymptotically efficient estimate of the Euclidean component of the model. If the process registers only events of one type (i.e. |𝒥₀| = 1), then the form of ϕ_θ corresponding to the efficient estimate of θ is similar to the single spell version of this model and can be found in Bickel (1986) and Bickel and Ritov (1995) in the uncensored case, and in Dabrowska (2007) in the censored case. The estimate of the function ϕ_θ can be obtained in this case by inverting a simple tridiagonal band-symmetric matrix. The form of the information bound and efficient score function for the general case (|𝒥₀| > 1) is postponed to a separate paper, where we consider it under additional compatibility conditions.

To set confidence bands for the vector Γ of baseline functions and related parameters, we consider the Gaussian multiplier method of Lin, Fleming and Wei (1994). For this purpose, we shall need some additional notation.

Let G₀ be a vector of d independent standard normal variables, G₀ ~ N(0, I_{d×d}), and let G_i = (G_mi: m = 1, …, K_i), i = 1, …, n, K_i = Y_{..i}(0), be standard normal variables, independent of G₀ and mutually independent given the data D₁, …, D_n.
For j ∈ 𝒥₀, set
${\hat{V}}_{j}^{#} (x) = \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{m} G_{m i} \int_{0}^{x} \frac{N_{jmi} (d u)}{S_{j} (Γ_{n \hat{θ}} (u -), \hat{θ}, u)}.$
Put ${\hat{Ξ}}^{#} = {\hat{Ξ}}_{1}^{#} - {\hat{Ξ}}_{2}^{#}$ , where ${\hat{Ξ}}_{1}^{#} = {\hat{Σ}}_{1}^{- 1} (\hat{θ}) {\hat{Σ}}_{0} {(\hat{θ})}^{1 / 2} G_{0}$ and
$\begin{array}{l} {\hat{Ξ}}_{2}^{#} = {\hat{Σ}}_{1}^{- 1} (\hat{θ}) \sum_{j} \int_{0}^{τ} {\hat{ρ}}_{j, ϕ_{n}} (u, \hat{θ}) N_{j . .} (d u) {\hat{W}}_{0}^{#} {(u)}^{T}, \\ {\hat{W}}_{0}^{#} (x) = \int_{0}^{x} {\hat{V}}^{#} (d u) {\hat{P}}_{\hat{θ}} (u, x) = {\hat{V}}^{#} (x) - \int_{0}^{x} {\hat{W}}_{0}^{#} (u -) {\hat{Q}}_{\hat{θ}} (d u) . \end{array}$

The estimates Q̂_θ̂ and Inline graphic are plug-in analogues of the matrices defined in (3.3)–(3.5)

Proposition 3.2

Suppose that the conditions of Proposition 3.1 are satisfied. Then, unconditionally, the process (Ξ̂^#, ${\hat{W}}_{0}^{#}$ ), with ${\hat{W}}_{0}^{#} = [{\hat{W}}_{j}^{#} (x) : x \in [0, τ], j \in J_{0}]$ , converges weakly in R^d × ℓ^∞([0, τ] × 𝒥₀) to a mean zero Gaussian process (Ξ^#, $W_{0}^{#}$ ) with the same covariance function as (Ξ, W₀). Moreover, (Ξ, W₀) and (Ξ^#, $W_{0}^{#}$ ) are independent, while (Ξ̂, Ŵ₀) and (Ξ̂^#, ${\hat{W}}_{0}^{#}$ ) are asymptotically independent. Conditionally, the process (Ξ̂^#, ${\hat{W}}_{0}^{#}$ ) converges weakly to (Ξ^#, $W_{0}^{#}$ ), in probability. As in van der Vaart and Wellner (1996, p. 181), conditional weak convergence means that ${sup}_{h \in {B L}_{1}} ∣ E_{G} h ({\hat{Ξ}}^{#}, {\hat{W}}_{0}^{#}) - E h (Ξ^{#}, W_{0}^{#}) ∣ \to_{P^{*}} 0$ , where E_G denotes expectation with respect to the G variables, h varies over the class of bounded Lipschitz functions, and BL₁ is the set of Lipschitz functions whose norm is bounded by 1.

This proposition can be further extended to approximate the distribution of functionals Φ(θ, Γ). In sufficiently simple cases, the functional delta method can be used for this purpose. In particular, we may consider estimation of the kernel F of a semi-Markov process with state space 𝒥 = {1, …, r}. In this case the covariates are time independent, and the entries of the matrix F(x|z) = [F_j(x|z)]_{j∈𝒥₀} are specified by (2.2)–(2.3). Under the assumed differentiability conditions on the hazard functions α_j, the plug-in sample analogue F̂ of the matrix F has entries satisfying

\begin{array}{l} {\hat{W}}_{F, j} (x ∣ z) = \sqrt{n} [{\hat{F}}_{j} - F_{j}] (x ∣ z) \\ = {\hat{Ξ}}^{T} \int_{0}^{x} {\dot{f}}_{j} (Γ_{(j_{1 \cdot})} (u), θ_{0}, z) Γ_{j} (d u) + \int_{0}^{x} {\tilde{W}}_{(j_{1 \cdot})} (u) f_{j}^{'} (Γ_{(j_{1 \cdot})} (u), θ, z) Γ_{j} (d u) \\ + \int_{0}^{x} f_{j} (Γ_{(j_{1 \cdot})} (u), θ_{0}, z) {\tilde{W}}_{0 j} (d u) + O_{P^{*}} (1), j \in J_{0} . \end{array}

(3.8)

For any j = (j₁, j₂) ∈ 𝒥₀, Γ_{(j_1,.)} and W̃_{(j_1,.)} denote the subvectors Γ_{(j_1,.)} = {Γ_{θ₀j}: j = (j₁, ℓ) ∈ 𝒥₀} and W̃_{(j_1,.)} = {W̃_{0j}: j = (j₁, ℓ) ∈ 𝒥₀}, where

{\tilde{W}}_{0} = {\sqrt{n} [Γ_{n j \hat{θ}} - Γ_{j θ_{0}}] : j \in J_{0}} = {\hat{W}}_{0} + {\hat{Ξ}}^{T} {\dot{Γ}}_{n \hat{θ}} + O_{P^{*}} (1) .

(3.9)

Denote by ${\hat{W}}_{F}^{#}$ the matrix obtained by replacing in (3.8)–(3.9) the process (Ξ̂, Ŵ₀) by (Ξ̂^#, ${\hat{W}}_{0}^{#}$ ) and the unknown parameters by their estimates (θ̂, Γ_nθ̂). Using integration by parts and Proposition 3.1, it is easy to verify that the process Ŵ_F = [Ŵ_{F,j}(x|z): x ≤ τ, j ∈ 𝒥₀] converges weakly to a mean zero Gaussian process W_F in ℓ^∞([0, τ])^{|𝒥₀|}. In addition, the conclusions of Proposition 3.2 carry over to the process ${\hat{W}}_{F}^{#} = [{\hat{W}}_{F, j}^{#} (x ∣ z) : x \leq τ, j \in J_{0}]$ , i.e. unconditionally, ${\hat{W}}_{F}^{#}$ converges weakly to a mean zero Gaussian process $W_{F}^{#}$ which has the same covariance function as the process W_F and is independent of it. Conditionally, the process ${\hat{W}}_{F}^{#}$ converges weakly to $W_{F}^{#}$ in probability.

Another example of a functional may correspond to the cumulative residual process arising in goodness-of-fit testing. In particular, suppose that covariates are partitioned into k disjoint categories, I₁, …, I_k. The cumulative residual process for the one-step transition between states j₁ → j₂ is given by

\begin{array}{l} {\hat{R}}_{j} (x, ℓ) = \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{m} 1 (Z_{jmi} \in I_{ℓ}) {\hat{M}}_{jmi} (x) \\ = \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{m} \int_{0}^{x} [1 (Z_{jmi} \in I_{ℓ}) - \frac{S_{j ℓ}}{S_{j}} (Γ_{n \hat{θ}} (u -), \hat{θ}, u)] N_{jmi} (d u), \end{array}

where $S_{j ℓ} (Γ_{n \hat{θ}} (u -), \hat{θ}, u) = \sum_{i = 1}^{n} \sum_{m} Y_{jmi} (u) 1 (Z_{jmi} \in I_{ℓ}) α_{j} (Γ_{n \hat{θ}} (u -), \hat{θ}, Z_{jmi})$ is the risk process corresponding to subjects in the group I_ℓ. Under the assumption that the residuals are consistent with the model, the process R̂ = {R̂_j(t, ℓ): t ∈ [0, τ], j ∈ 𝒥₀, ℓ = 1, …, k} converges weakly to a mean zero Gaussian process, and the Gaussian multiplier approximation to its distribution is given by

\begin{array}{l} {\hat{R}}_{j}^{#} (x, ℓ) = \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{m} \int_{0}^{x} [1 (Z_{jmi} \in I_{ℓ}) - \frac{S_{j ℓ}}{S_{j}} (Γ_{n \hat{θ}} (u -), \hat{θ}, u)] G_{m i} N_{jmi} (d u) \\ - {({\hat{Ξ}}^{#})}^{T} \int_{0}^{x} ([\frac{{\dot{S}}_{j ℓ}}{S_{j ℓ}} - \frac{{\dot{S}}_{j}}{S_{j}}] \frac{S_{j ℓ}}{S_{j}}) (Γ_{n \hat{θ}}, \hat{θ}, u) N_{j . .} (d u) \\ - \int_{0}^{x} {\tilde{W}}_{(j_{1}, .)}^{#} (u) ([\frac{S_{j ℓ}^{'}}{S_{j ℓ}} - \frac{S_{j}^{'}}{S_{j}}] \frac{S_{j ℓ}}{S_{j}}) (Γ_{n \hat{θ}}, \hat{θ}, u) N_{j . .} (d u) . \end{array}

In analogy to Martinussen and Scheike (2006), the performance of residuals can be evaluated using Kolmogorov-Smirnov statistics such as sup_{x∈[δ, τ−δ]} |R̂_j(x, ℓ)|, and the Gaussian multiplier method can be used to obtain critical levels of the tests. Alternative tests can be obtained by modifying the chi-squared tests in Aalen et al. (2008, p. 144) or tests based on Schoenfeld residuals.
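The resampling step behind such a test can be sketched in Python. This is a simplified illustration and not the procedure used in the paper: it assumes that the contributions of each sojourn to the residual process have already been evaluated on a time grid (the hypothetical array `contrib`), and it omits the two correction terms of the multiplier process that account for estimation of θ and Γ. The function name and its arguments are illustrative.

```python
import numpy as np

def ks_multiplier_pvalue(contrib, n_resamples=5000, seed=0):
    """Sup-type residual statistic with a Gaussian multiplier p-value.

    contrib: array of shape (n_units, n_times). Row i is assumed to hold the
    contribution of unit i to the residual process on a fixed time grid.
    """
    rng = np.random.default_rng(seed)
    contrib = np.asarray(contrib, dtype=float)
    n = contrib.shape[0]
    # Observed process n^{-1/2} * sum_i contrib_i(x) and its sup norm.
    observed = np.abs(contrib.sum(axis=0)).max() / np.sqrt(n)
    # Each replicate reweights the fixed contributions by independent
    # standard normal multipliers G_i.
    G = rng.standard_normal((n_resamples, n))
    replicates = np.abs(G @ contrib).max(axis=1) / np.sqrt(n)
    # Share of replicates at least as large as the observed statistic.
    p_value = float(np.mean(replicates >= observed))
    return observed, p_value
```

A small p-value would indicate that the observed residual process is larger than expected under the fitted model; with the correction terms omitted, the sketch should be read as a description of the mechanics only.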

4 Example

We consider a transplant outcome data set from the Center for International Blood and Marrow Transplant Research (CIBMTR). The CIBMTR is comprised of clinical and basic scientists who confidentially share data on their blood and bone marrow transplant patients with the CIBMTR Data Collection Center located at the Medical College of Wisconsin. The CIBMTR is a repository of information about results of transplants at more than 450 transplant centers worldwide. The example data set consists of patients who received an HLA-identical sibling transplant from 1995 to 2004 for acute myelogenous leukemia (AML) or acute lymphoblastic leukemia (ALL) and were transplanted in first remission. All patients received bone marrow transplantation or peripheral blood stem cell transplantation. Children under age 16 and all patients who received umbilical cord blood transplants were excluded, as risk factors are likely to vary in this group.

Allogeneic stem cell transplantation (ASCT) is an accepted treatment for leukemia patients. Transplant candidates receive high doses of chemotherapy and radiation which destroy malignant cells in the bone marrow and elsewhere. Because stem cells in the normal bone marrow are destroyed in this process as well, patients subsequently receive a transplant from a suitably matched donor. The transplant can be followed by several complications. In this study, fatal complications correspond to relapse of leukemia or death in remission (hereafter referred to as death). The most important intermediate event in ASCT is graft-versus-host disease (GVHD), in which transplanted immune cells recognize the recipient’s body tissues as foreign. Acute and chronic GVHD (AGVHD and CGVHD) are two forms of this disease. AGVHD occurs during the early post-transplant period and is defined here as moderate to severe using clinically established criteria. CGVHD occurs later in time and may be preceded by AGVHD.

The incidence of GVHD, leukemia relapse and death in remission depends on a number of variables characterizing the recipient, the donor and the transplant. The main variables considered in this paper include the recipient’s age, donor-recipient gender match, disease type and graft source. Bone marrow was the first source of stem cells used in ASCT. Since the 1990s, peripheral blood stem cell transplants (PBSCT) have replaced bone marrow transplants (BMT) as the preferred source of stem cells because of a quicker hematologic recovery and relative ease of collection. Patients may also receive an infusion of both peripheral stem cells and bone marrow. Several studies have shown that PBSCT recipients may be at a higher risk of GVHD than BMT patients (e.g. Cutler et al. (2001), Flowers et al. (2002), Friedrichs et al. (2010)). A possible explanation of this phenomenon is that GVHD develops from the infusion of donor T cells, and PBSCT recipients receive a significantly higher dose of T cells than BMT patients. As a result of the increased risk of GVHD, the patients who experience it may be at a higher risk of death in remission than BMT patients. GVHD is also more common among older patients and among male recipients receiving transplants from female donors (Gale et al. 1987).

For purposes of modeling, we consider a five state modulated renewal model proposed for analysis of the transplant recovery process in Dabrowska et al. (1994). Table 1 collects some information about the type and number of the observed transitions, their range and median. The model assumes that a patient remains in the transplant state (tx, state 1) until the time of the first adverse event, which may correspond to AGVHD (state 2), CGVHD (state 3), relapse (state 4) or death in remission (state 5). The model also takes into account that a patient who develops GVHD may subsequently relapse or die, and that CGVHD may be preceded by AGVHD. The observed model has an extra absorbing state corresponding to censoring (loss to follow-up). Further, age was categorized into 3 groups, each representing approximately one third of the patients. The baseline group corresponds to the age range [29.5, 42.5]. Transitions were also adjusted for the waiting time for transplant. Two continuous variables were used for this purpose: the length of time between leukemia diagnosis and first remission (DxCr) and the length of time between first remission and transplant (CrTx). Their medians, interquartile ranges (IQR) and ranges were: median(DxCr) = 1.38, IQR(DxCr) = 1.15, range(DxCr) = 221.45 months and median(CrTx) = 3.06, IQR(CrTx) = 2.5, range(CrTx) = 46.74 months. To reduce the skewness of their distributions, the log transformation of these variables is used in the regression analysis.

Table 1.

Observed one-step transitions

	n	median (in months)	range (in months)
TX → AGVHD	491	.7	4.3
TX → CGVHD	372	5.5	106.4
TX → relapse	106	5.6	59.4
TX → death	179	2.9	131.9
TX → censoring	506	56.9	143.8

AGVHD → CGVHD	202	4.8	57.4
AGVHD → relapse	33	5.2	23.7
AGVHD → death	141	2.9	80.3
AGVHD → censoring	115	45.7	133.0

CGVHD → relapse	27	8.3	98.3
CGVHD → death	79	9.8	124.4
CGVHD → censoring	266	51.1	144.3

A+CGVHD → relapse	25	3.5	53.3
A+CGVHD → death	65	5.6	109.3
A+CGVHD → censoring	112	56.3	145.2


The modulated renewal process assumes that one-step transition probabilities are specified by means of a proportional odds ratio model. More precisely, hazard rates of one-step transitions originating from the transplant or AGVHD state are of the form

α_{j} (Γ_{(j_{1}, .)} (x), θ, Z) γ_{j} (x) = e^{θ_{j}^{T} Z_{j}} {[1 + \sum_{k = j_{1} + 1}^{5} 1 (ℓ = (j_{1}, k)) Γ_{ℓ} (x) e^{θ_{ℓ}^{T} Z_{ℓ}}]}^{- 1} γ_{j} (x),

for j =(j₁, j₂) such that j₁ = 1 or j₁ = 2 and j₁ + 1 ≤ j₂ ≤ 5, $Γ_{j} (x) = \int_{0}^{x} γ_{j} (u) d u$ . In the case of transition rates originating from the CGVHD state, we use covariate Z_C = (Z, Z_A), where Z_A is a binary variable indicating by 1 whether AGVHD preceded onset of chronic graft versus host disease. The corresponding transition rates into the relapse and death states are given by

α_{j} (Γ_{(3, .)} (x), θ, Z_{C}) γ_{j} (x) = e^{θ_{j}^{T} Z_{j C}} {[1 + \sum_{k = 4}^{5} 1 (ℓ = (3, k)) Γ_{ℓ} (x) e^{θ_{ℓ}^{T} Z_{j C}}]}^{- 1} γ_{j} (x)

for j = (3, j₂) and j₂ = 4, 5. Here Z_j and Z_jC, j = (j₁, j₂), represent transition specific covariates, which correspond to subvectors of Z and Z_C, respectively. Table 4 provides their entries as well as the estimates of the regression coefficients and standard errors. The estimates were obtained using the Fisher scoring algorithm applied to the score process (3.6) with ϕ_nθ = −Γ̇_nθ. Variable selection was based on backwards elimination and Wald testing. To assess adequacy of the model, we used the Kolmogorov-Smirnov statistics described in Section 3. The results are summarized below and in Table 5.

Table 4.

Regression estimates

	1	2	3	4	5	6	7	8	9
ALL vs AML	.07 (.25)		1.32 (.36)	.50 (.23)		.45 (.38)	.12 (.30)	.90 (.31)	.58 (.22)
Age1	−.25 (.16)	−.68 (.28)	−.49 (.32)	−.45 (.26)
Age2				.27 (.20)	.33 (.23)	.51 (.59)	.01 (.24)		.72 (.21)
FM			−.20 (.28)		−.57 (.39)
PBSCT vs BMT		.09 (.22)	.01 (.29)		.28 (.30)	−.12 (.66)	−.17 (.38)	.55 (.35)
ALLxPBSCT	.46 (.23)		.92 (.43)
AMLxPBSCT					−.33 (.30)
AMLxBMT	−.30 (.22)						−.50 (.43)
DxCr	.12 (.08)		.45 (.16)	.22 (.12)					.13 (.14)
CrTx	−.21 (.07)	−.26 (.09)	−.33 (.15)					−.24 (.17)	−.10 (.12)
prior AGVHD								.72 (.30)	.72 (.20)
Age1xPBSCT							−.44 (.34)
Age1xBMT		−.57 (.33)				.70 (.60)	−.88 (.35)		−.64 (.34)
Age2xPBSCT	−.28 (.25)							.13 (.37)
Age2xBMT		−.37 (.24)				.60 (.88)
Age0xPBSCT	−.27 (.26)				.40 (.31)
AMLxPBSCTxAge2		.25 (.22)
Age1xALL		.61 (.27)	−.87 (.48)	−.60 (.42)
Age2xALL		.57 (.30)	−.97 (.47)
FMxALL		.63 (.27)
FMxAML		.64 (.17)			.81 (.45)
FMxPBSCT	.42 (.18)			.44 (.25)
FxALL	−.25 (.19)
FxAML	−.19 (.14)


Columns: 1 = Tx → AGVHD; 2 = Tx → CGVHD; 3 = Tx → Relapse; 4 = Tx → Death; 5 = AGVHD → CGVHD; 6 = AGVHD → Relapse; 7 = AGVHD → Death; 8 = CGVHD → Relapse; 9 = CGVHD → Death. Entries are regression estimates with standard errors in parentheses.

Variable names: Age0 = age in the (29.5, 42.5] range, Age1 = age ≤ 29.5 years, Age2 = age > 42.5 years; F = female donor transplant; FM = female donor to male recipient vs other sex-match transplant.
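As a worked illustration of the Wald testing mentioned above, the following Python sketch computes the Wald z-statistic and a two-sided p-value from an estimate and its standard error. It assumes the usual normal approximation for a single coefficient; the function name `wald_test` is illustrative, and the two inputs are entries read from Table 4.

```python
import math

def wald_test(estimate, std_error):
    """Wald z-statistic and two-sided p-value under a normal approximation."""
    z = estimate / std_error
    # 2 * (1 - Phi(|z|)) expressed through the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# ALL vs AML effect on the Tx -> Relapse transition (Table 4, column 3).
z_rel, p_rel = wald_test(1.32, 0.36)
# ALL vs AML effect on the Tx -> AGVHD transition (Table 4, column 1).
z_agvhd, p_agvhd = wald_test(0.07, 0.25)
print(f"Tx -> Relapse: z = {z_rel:.2f}, p = {p_rel:.4f}")
print(f"Tx -> AGVHD:   z = {z_agvhd:.2f}, p = {p_agvhd:.4f}")
```

Under this approximation the first coefficient is clearly distinguishable from zero, while the second is not; the table retains some coefficients of the latter kind, presumably because they enter interaction terms.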

Table 5.

Kolmogorov-Smirnov residual statistics

	1	2	3	4	5	6	7	8	9
AML	8.51 (.97)	6.35 (.96)	7.88 (.69)	5.87 (.86)	4.40 (.96)	2.75 (.86)	4.14 (.97)	4.42 (.73)	6.70 (.82)
ALL	10.22 (.89)	7.66 (.83)	7.65 (.69)	6.09 (.73)	4.34 (.94)	2.66 (.85)	4.23 (.95)	4.49 (.71)	6.46 (.75)
Age0	12.99 (.81)	6.52 (.93)	3.30 (.95)	6.50 (.63)	4.55 (.92)	2.79 (.51)	4.37 (.93)	2.03 (.94)	8.17 (.47)
Age1	6.42 (.98)	5.82 (.95)	3.20 (.94)	5.18 (.82)	4.96 (.87)	3.66 (.71)	2.66 (.98)	2.04 (.95)	3.53 (.87)
Age2	10.56 (.90)	7.64 (.91)	6.40 (.60)	8.11 (.72)	6.35 (.83)	2.44 (.87)	2.92 (.99)	1.51 (.99)	8.07 (.76)
BMT	7.84 (.97)	7.73 (.91)	7.16 (.69)	6.04 (.61)	5.96 (.81)	1.36 (.99)	5.11 (.90)	2.94 (.79)	9.67 (.57)
PBSCT	9.44 (.95)	9.07 (.90)	7.49 (.73)	5.75 (.82)	6.66 (.88)	1.27 (.99)	4.84 (.95)	2.91 (.91)	9.39 (.69)
non-FM	5.46 (.99)	7.84 (.94)	1.95 (.99)	9.82 (.66)	4.22 (.97)	1.50 (.97)	4.57 (.96)	3.97 (.80)	4.20 (.95)
FM	6.83 (.96)	9.36 (.82)	2.32 (.96)	9.52 (.43)	5.06 (.88)	1.33 (.96)	5.07 (.84)	3.91 (.60)	4.37 (.88)
M donor	5.48 (.99)	1.29 (.84)	2.96 (.98)	6.00 (.84)	1.42 (.58)	2.02 (.94)	3.93 (.98)	2.06 (.97)	3.50 (.98)
F donor	6.84 (.98)	11.95 (.80)	2.61 (.99)	5.83 (.86)	11.50 (.53)	1.93 (.93)	4.43 (.95)	2.08 (.97)	3.47 (.97)
DxCr-1	9.73 (.86)	16.86 (.42)	5.67 (.56)	7.88 (.49)	4.92 (.85)	2.27 (.77)	5.17 (.81)	2.44 (.89)	5.34 (.74)
DxCr-2	8.47 (.88)	11.67 (.59)	2.56 (.94)	2.86 (.98)	6.80 (.61)	2.79 (.64)	5.93 (.73)	3.85 (.57)	4.72 (.74)
DxCr-3	4.03 (1.00)	12.92 (.48)	4.34 (.71)	5.03 (.71)	9.88 (.33)	2.21 (.75)	4.27 (.88)	4.38 (.40)	11.23 (.30)
DxCr-4	11.80 (.83)	15.05 (.43)	4.01 (.89)	6.71 (.69)	6.15 (.71)	5.67 (.27)	12.11 (.43)	2.83 (.78)	3.52 (.94)
CrTx-1	14.69 (.74)	11.23 (.52)	5.19 (.66)	5.21 (.74)	6.53 (.74)	3.97 (.55)	7.24 (.76)	3.17 (.75)	2.43 (1.00)
CrTx-2	9.92 (.85)	12.52 (.58)	3.96 (.81)	5.37 (.68)	7.05 (.58)	2.61 (.66)	6.13 (.73)	3.49 (.67)	7.57 (.54)
CrTx-3	12.08 (.76)	5.86 (.94)	5.96 (.62)	8.34 (.46)	4.17 (.89)	2.52 (.67)	2.62 (.98)	2.34 (.86)	4.84 (.75)
CrTx-4	8.84 (.87)	7.21 (.85)	2.94 (.92)	1.37 (.35)	5.73 (.71)	4.21 (.37)	5.71 (.73)	3.59 (.57)	4.54 (.80)
AML x BMT	5.38 (.99)	9.84 (.76)	5.15 (.70)	1.62 (.41)	5.12 (.82)	2.04 (.88)	5.46 (.78)	2.57 (.64)	7.96 (.50)
AML x PBSCT	7.25 (.96)	6.30 (.97)	1.03 (.34)	5.04 (.80)	5.18 (.92)	1.84 (.87)	5.77 (.88)	3.18 (.84)	4.60 (.91)
ALL x BMT	7.62 (.85)	2.92 (.99)	4.44 (.74)	7.03 (.42)	2.44 (.97)	1.10 (.99)	2.79 (.95)	2.46 (.70)	3.22 (.85)
ALL x PBSCT	6.20 (.93)	6.39 (.69)	3.88 (.85)	3.01 (.86)	4.62 (.82)	1.97 (.75)	3.24 (.95)	3.79 (.66)	5.92 (.62)
Age1 x PBSCT	14.92 (.71)	8.03 (.81)	6.58 (.63)	5.12 (.75)	7.56 (.58)	1.63 (.84)	5.31 (.86)	3.28 (.71)	6.58 (.57)
Age1 x BMT	6.45 (.93)	6.12 (.84)	3.11 (.88)	7.13 (.51)	4.22 (.78)	2.49 (.80)	2.01 (.96)	2.94 (.56)	3.09 (.76)
Age2 x PBSCT	7.31 (.94)	5.61 (.95)	7.69 (.34)	9.53 (.45)	4.36 (.93)	1.28 (.96)	3.74 (.95)	1.26 (1.00)	9.60 (.57)
Age2xBMT	5.46 (.87)	6.96 (.68)	4.02 (.41)	4.58 (.76)	3.66 (.77)	1.95 (.78)	2.37 (.93)	.93 (.91)	4.43 (.72)
Age0xPBSCT	4.29 (.98)	8.10 (.71)	4.73 (.67)	5.45 (.49)	5.34 (.74)	1.88 (.61)	2.46 (.98)	1.70 (.94)	3.58 (.75)
Age0 x BMT	9.73 (.73)	9.06 (.59)	3.97 (.76)	8.60 (.22)	6.34 (.42)	1.76 (.58)	4.03 (.85)	3.31 (.24)	5.82 (.45)
Age1 x AML	6.87 (.89)	5.47 (.87)	2.52 (.91)	2.49 (.98)	3.22 (.94)	3.98 (.34)	2.73 (.86)	2.39 (.59)	3.29 (.68)
Age1 x ALL	4.51 (.97)	2.73 (.99)	2.70 (.89)	3.19 (.74)	2.59 (.96)	2.18 (.82)	3.95 (.78)	3.88 (.50)	2.06 (.93)
Age2 x AML	10.54 (.80)	7.95 (.83)	2.94 (.89)	3.69 (.88)	5.19 (.78)	3.98 (.17)	4.89 (.81)	1.78 (.92)	7.34 (.38)
Age2 x ALL	8.57 (.65)	5.01 (.60)	4.60 (.74)	3.34 (.74)	2.13 (.96)	1.74 (.29)	4.09 (.78)	1.94 (.67)	1.69 (.97)
Age0 x AML	9.70 (.88)	9.63 (.80)	7.64 (.34)	8.95 (.58)	4.92 (.89)	3.30 (.63)	5.20 (.86)	3.54 (.67)	4.12 (.95)
Age0 x ALL	3.75 (.95)	2.97 (.94)	3.75 (.52)	2.10 (.94)	3.15 (.76)	1.27 (.86)	3.16 (.81)	2.32 (.67)	5.76 (.52)


Columns: 1 = Tx → AGVHD; 2 = Tx → CGVHD; 3 = Tx → Relapse; 4 = Tx → Death; 5 = AGVHD → CGVHD; 6 = AGVHD → Relapse; 7 = AGVHD → Death; 8 = CGVHD → Relapse; 9 = CGVHD → Death.

Variable names: Age0 = age in the (29.5, 42.5] range, Age1 = age ≤ 29.5 years, Age2 = age > 42.5 years.

DxCr-i and CrTx-i, i = 1, 2, 3, 4: DxCr and CrTx variables grouped according to quartiles.

F = female donor transplant; FM = female donor to male recipient vs other sexmatch transplant. Each column provides test statistics and p-values determined based on 5000 resampling experiments.

We note here that the transitions originating from the CGVHD state depend on whether or not AGVHD was experienced prior to the entrance to the CGVHD state. This dependence violates the assumption that the sequence of states visited forms a Markov chain. However, this problem disappears if the state space of the process is enlarged to include an extra state A+CGVHD. This extra state is here denoted by 3̄. Conditionally on the time independent covariates, the resulting model has the structure of a semi-Markov process with kernel F(x|z) = [F_j(x|z)] specified in Table 3. The entries of the kernel matrix have a fairly explicit form. For transitions originating from the transplant (tx) or AGVHD state, we have

Table 3.

One-step transition probability matrix

	AGVHD	CGVHD	A+CGVHD	rel	death
tx	F₁₂	F₁₃	0	F₁₄	F₁₅
AGVHD	0	0	F₂₃	F₂₄	F₂₅
CGVHD	0	0	0	F₃₄	F₃₅
A+CGVHD	0	0	0	F_3̄4	F_3̄5
rel	0	0	0	1	0
death	0	0	0	0	1


F_{j} (x ∣ z) = \int_{0}^{x} e^{θ_{j}^{T} Z_{j}} {[1 + \sum_{k = j_{1} + 1}^{5} 1 (ℓ = (j_{1}, k)) Γ_{ℓ} (u) e^{θ_{ℓ}^{T} Z_{ℓ}}]}^{- 2} Γ_{j} (d u)

for j = (j₁, j₂), j₁ = 1, 2 and j₁ + 1 ≤ j₂ ≤ 5. One-step transition probabilities originating from the CGVHD state are given by

F_{j} (x ∣ z) = 1 (Z_{A} = 0) \int_{0}^{x} {[1 + \sum_{k = 4}^{5} 1 (ℓ = (3, k)) Γ_{ℓ} (u) e^{θ_{ℓ}^{T} Z_{j C}}]}^{- 2} e^{θ_{j}^{T} Z_{j C}} Γ_{j} (d u)

for j = (3, j₂) and j₂ = 4, 5. One-step transition probabilities originating from the state A+CGVHD (labeled as “3̄”) have a similar form, with covariate Z_A = 1.
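The kernel entries above can be evaluated numerically once the baseline functions and regression coefficients are available. The Python sketch below does this for all transitions leaving one state, using a trapezoid rule on a time grid. It is a minimal illustration under stated assumptions: the names `one_step_kernel`, `Gamma` and `lin_pred` are hypothetical, the baseline functions are taken to be smooth and tabulated on the grid, and the step-function estimates Γ_nθ̂ of the paper would call for a sum over jump points instead.

```python
import numpy as np

def one_step_kernel(Gamma, lin_pred):
    """Evaluate F_j(x|z) for all destinations j out of one origin state.

    Gamma:    array (n_times, n_dest); column l holds Gamma_l on the grid,
              with Gamma[0] = 0 (assumed input).
    lin_pred: array (n_dest,) with the values theta_l^T Z_l.
    Returns an array (n_times, n_dest) of one-step transition probabilities.
    """
    Gamma = np.asarray(Gamma, dtype=float)
    w = np.exp(np.asarray(lin_pred, dtype=float))
    # Common factor [1 + sum_l Gamma_l(u) exp(theta_l^T Z_l)]^{-2}.
    integrand = (1.0 + Gamma @ w) ** -2
    mid = 0.5 * (integrand[1:] + integrand[:-1])   # trapezoid weights
    dGamma = np.diff(Gamma, axis=0)                # increments Gamma_j(du)
    increments = w * mid[:, None] * dGamma
    zero_row = np.zeros((1, Gamma.shape[1]))
    return np.vstack([zero_row, np.cumsum(increments, axis=0)])

# Toy example with linear baseline functions (illustrative values only).
t = np.linspace(0.0, 10.0, 1001)
Gamma = np.outer(t, [0.10, 0.05, 0.20])
F = one_step_kernel(Gamma, lin_pred=[0.2, -0.3, 0.0])
# The row sums should be close to 1 - [1 + sum_l Gamma_l e^{theta_l^T Z_l}]^{-1}.
print(F[-1], F[-1].sum())
```

The comparison of the row sums with the closed-form probability of leaving the state by time x gives a simple consistency check on the discretization.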

We also consider multi-step probabilities of transitions into the absorbing states, i.e. probabilities of transition into the relapse and death states along any possible path of the model. Let J(t) be the state occupied by the process at time t and let e denote either relapse or death in remission. By noting that a patient may move into an absorbing state by first passing through the GVHD states, these probabilities are given by

H_{e} (t ∣ z) = P (J (t) = e ∣ z) = \sum_{k = 1}^{4} H_{e}^{(k)} (t ∣ z),

where

\begin{array}{l} H_{e}^{(1)} (t ∣ z) = P ({J (t) = e} \cap A^{c} \cap C^{c} ∣ z), \\ H_{e}^{(2)} (t ∣ z) = P ({J (t) = e} \cap A \cap C^{c} ∣ z), \\ H_{e}^{(3)} (t ∣ z) = P ({J (t) = e} \cap A^{c} \cap C ∣ z), \\ H_{e}^{(4)} (t ∣ z) = P ({J (t) = e} \cap A \cap C ∣ z), \end{array}

(4.1)

and the events A and C represent

\begin{array}{l} A = {AGVHD occurs prior to the event e}, \\ C = {CGVHD occurs prior to the event e} . \end{array}

The first of these probabilities corresponds to a move from the transplant to the state e in one step so that $H_{e}^{(1)} (t ∣ z) = F_{1 e} (t ∣ z)$ for e = 4, 5. The terms $H_{e}^{(2)}$ and $H_{e}^{(3)}$ provide the probabilities of transitions along the paths “tx → AGVHD → e” ( $H_{e}^{(2)}$ ) and “tx → CGVHD → e” ( $H_{e}^{(3)}$ ) and are given by $H_{e}^{(k)} (t ∣ z) = (F_{1 k} ★ F_{k e}) (t ∣ z)$ , k = 2 or 3, e = 4 or 5. Here for any two subdistribution functions F and F′ on the positive half-line, F ★ F′ is their convolution

(F ★ F^{'}) (t) = \int_{0}^{t} F (t - u) F^{'} (d u) = \int_{0}^{t} F (d u) F^{'} (t - u) .

Lastly, transition along the path “tx → AGVHD → A+CGVHD → e” contributes the term $H_{e}^{(4)} (t ∣ z) = (F_{12} ★ F_{23} ★ F_{\bar{3} e}) (t ∣ z)$ .
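The convolution of two subdistribution functions can be approximated on a grid as in the following Python sketch. It assumes an equally spaced grid starting at 0 and averages F over each cell before weighting by the mass of F′ in that cell; the function name `convolve_subdist` is illustrative, and the exponential example is only a numerical check, not part of the transplant model.

```python
import numpy as np

def convolve_subdist(F, F_prime):
    """Approximate (F * F')(t) = int_0^t F(t - u) F'(du) on a common grid.

    F, F_prime: values of two subdistribution functions on the same equally
    spaced grid t_0 = 0 < t_1 < ... (assumed input).
    """
    F = np.asarray(F, dtype=float)
    dFp = np.diff(np.asarray(F_prime, dtype=float))  # mass of F' on (t_{k-1}, t_k]
    out = np.zeros(len(F))
    for n in range(1, len(F)):
        k = np.arange(1, n + 1)
        # For u in (t_{k-1}, t_k], t_n - u lies in [t_{n-k}, t_{n-k+1}).
        cell_avg = 0.5 * (F[n - k] + F[n - k + 1])
        out[n] = np.sum(cell_avg * dFp[k - 1])
    return out

# Check against a known case: the convolution of two Exp(1) distribution
# functions is the Gamma(2, 1) distribution function 1 - e^{-t} (1 + t).
t = np.linspace(0.0, 5.0, 501)
F_exp = 1.0 - np.exp(-t)
approx = convolve_subdist(F_exp, F_exp)
exact = 1.0 - np.exp(-t) * (1.0 + t)
print(np.max(np.abs(approx - exact)))
```

With kernel entries tabulated on such a grid, a two-step term such as the one for the path tx → AGVHD → e would be `convolve_subdist(F_12, F_2e)`, and the three-step term would apply the function twice.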

The multi-step transition probabilities can be estimated using the plug-in method. The estimates are consistent on time intervals [0, τ] strictly contained in the support of all sojourn time distributions. As an example, Figure 1 compares transition probabilities of hypothetical ALL patients receiving BMT and PBSCT transplants. The remaining covariates correspond to the age range 16–29.5 years and the baseline subgroups specified in Table 2. The plots represent the four components of the multi-step transition probabilities defined in (4.1). PBSCT seems to reduce one-step transition probabilities of both relapse and death ( $H_{e}^{(1)}$ , black curves), and the effect is more pronounced in the case of the tx → death transition. The graphs also suggest that PBSCT is associated with a reduced probability of relapse preceded by AGVHD ( $H_{e}^{(2)}$ , red curves). At the same time, however, the probability of death in remission preceded by AGVHD is higher than that of a BMT recipient. We also see an increase in the probability of relapse and death resulting from CGVHD without AGVHD ( $H_{e}^{(3)}$ , blue curves) and CGVHD with AGVHD ( $H_{e}^{(4)}$ , green curves).

Figure 1. Transition probabilities of endpoint events of a young ALL patient receiving BMT (left panel) or PB (right panel). The remaining covariates correspond to the baseline. The curves represent one-step transitions tx → e (black), two-step transitions tx → AGVHD → e (red) and tx → CGVHD → e (blue), and three-step transitions tx → AGVHD → CGVHD → e (green).

Table 2.

Summary of covariates

Age	n	Graft source	n	Disease	n
< 30 (young)	550	[BMT]	842	[AML]	1168
[30, 42.5]	534	PB/PB+BMT	803	ALL	477
> 42.5 (old)	561

Donor’s Gender	n	Gender-Match	n

F	890	FM	441
[M]	755	[not FM]	1224


Baseline groups are marked in brackets.

FM represents a female to male transplant

To assess effects of covariates, we consider pointwise and simultaneous confidence bands for pairwise differences of one-step and multi-step transition probabilities. In the case of one-step transition probabilities, we consider functions

Δ_{j}^{F} (t ∣ z_{1}, z_{2}) = F_{j} (t ∣ z_{1}) - F_{j} (t ∣ z_{2}), j \in J_{0},

where z₁ and z₂ are two covariate levels. We denote by ${\hat{Δ}}_{j}^{F}$ the corresponding sample analogue of the function $Δ_{j}^{F}$ . Results of Section 5 imply that the normalized process ${\hat{W}}_{j, Δ}^{F} = {\sqrt{n} [{\hat{Δ}}_{j}^{F} - Δ_{j}^{F}] (t ∣ z_{1}, z_{2}) : t \in [0, τ]}$ converges weakly to a mean zero Gaussian process $W_{j Δ}^{F} = {W_{j}^{F} (t ∣ z_{1}) - W_{j}^{F} (t ∣ z_{2}) : t \in [0, τ]}$ .

To construct confidence bands, we note that each Δ function is a difference of two subdistribution functions. Correspondingly, it assumes values between −1 and 1. Direct application of the Gaussian approximation to the limiting distribution of the process $W_{j Δ}^{F}$ may result in confidence intervals and confidence bands whose bounds fall outside the interval (−1, 1). To circumvent this problem, we use a transformation method.

Let Φ: R → (−1, 1) be a strictly increasing differentiable function with derivative ϕ satisfying ϕ(x) > 0 for all x ∈ R. By the delta method,

\sqrt{n} [Φ^{- 1} ({\hat{Δ}}_{j}^{F} (t ∣ z_{1}, z_{2})) - Φ^{- 1} (Δ_{j}^{F} (t ∣ z_{1}, z_{2}))] \Rightarrow ϕ {(Φ^{- 1} (Δ_{j}^{F} (t ∣ z_{1}, z_{2})))}^{- 1} W_{j, Δ}^{F} (t ∣ z_{1}, z_{2}), t \in [0, τ] .

Let c_α(t₁, t₂) be the upper α percentile of the distribution of

sup_{t_{1} \leq t \leq t_{2}} [\frac{∣ W_{j, Δ}^{F} ∣}{{\hat{σ}}_{Δ_{j}^{F}}}] (t ∣ z_{1}, z_{2}),

where ${\hat{σ}}_{Δ_{j}^{F}} (t ∣ z_{1}, z_{2})$ is an estimate of the standard deviation of ${\hat{Δ}}_{j}^{F} (t ∣ z_{1}, z_{2})$ . Then, by the continuous mapping theorem, the 100 × (1 − α)% asymptotic confidence band for the Δ function has upper and lower bounds given by

Φ (Φ^{- 1} ({\hat{Δ}}_{j}^{F} (t ∣ z_{1}, z_{2})) \pm c_{α} (t_{1}, t_{2}) \frac{{\hat{σ}}_{Δ_{j}^{F}} (t ∣ z_{1}, z_{2})}{ϕ (Φ^{- 1} ({\hat{Δ}}_{j}^{F} (t ∣ z_{1}, z_{2})))}) .

(4.2)

The corresponding pointwise confidence intervals can be obtained by replacing the constant c_α(t₁, t₂) by the upper α/2 percentile of the standard normal distribution.

A possible choice of the Φ function may correspond to Φ(x) = 2G(x) − 1, where G is a distribution function with density g supported on the whole real line. In analogy to the construction of the confidence bands for survival function in Andersen et al. (1993), we may consider the choice of the extreme value distribution G(x) = 1 − exp[−e^x]. In this case Φ⁻¹(u) = log[−log[(1 − u)/2]] and the bounds are given by

\begin{array}{l} 1 - 2 {[\frac{1 - {\hat{Δ}}_{j}^{F} (t ∣ z_{1}, z_{2})}{2}]}^{exp [\pm c_{α} (t_{1}, t_{2}) [h {\hat{σ}}_{Δ_{j}^{F}}] (t ∣ z_{1}, z_{2})]}, \\ h (t ∣ z_{1}, z_{2}) = {[∣ log [(1 - {\hat{Δ}}_{j}^{F} (t ∣ z_{1}, z_{2})) / 2] ∣ (1 - {\hat{Δ}}_{j}^{F} (t ∣ z_{1}, z_{2}))]}^{- 1} . \end{array}

(4.3)

Another possible choice may correspond to the logistic distribution, G(x) = e^x/[1+e^x]. We have Φ⁻¹(u) = log([1 + u]/[1 − u]), and the bounds assume form

\begin{array}{l} 1 - 2 {(1 + \frac{1 + {\hat{Δ}}_{j}^{F} (t ∣ z_{1}, z_{2})}{1 - {\hat{Δ}}_{j}^{F} (t ∣ z_{1}, z_{2})} exp [\pm c_{α} (t_{1}, t_{2}) [h {\hat{σ}}_{Δ_{j}^{F}}] (t ∣ z_{1}, z_{2})])}^{- 1}, \\ h (t ∣ z_{1}, z_{2}) = 2 {[(1 + {\hat{Δ}}_{j}^{F} (t ∣ z_{1}, z_{2})) (1 - {\hat{Δ}}_{j}^{F} (t ∣ z_{1}, z_{2}))]}^{- 1} . \end{array}

(4.4)
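The two sets of bounds can be computed directly from the point estimate, its estimated standard deviation and the critical level. The Python sketch below implements the closed forms (4.3) and (4.4); the function name `transformed_band` and its argument names are illustrative, and the inputs are assumed to satisfy −1 < Δ̂ < 1.

```python
import numpy as np

def transformed_band(delta_hat, sigma_hat, c, transform="logistic"):
    """Lower and upper bounds of a transformed band for a Delta function.

    delta_hat: estimates in (-1, 1); sigma_hat: estimated standard deviations;
    c: critical level (simultaneous) or normal percentile (pointwise).
    """
    d = np.asarray(delta_hat, dtype=float)
    s = np.asarray(sigma_hat, dtype=float)
    if transform == "logistic":          # bounds (4.4)
        h = 2.0 / ((1.0 + d) * (1.0 - d))
        odds = (1.0 + d) / (1.0 - d)
        lower = 1.0 - 2.0 / (1.0 + odds * np.exp(-c * h * s))
        upper = 1.0 - 2.0 / (1.0 + odds * np.exp(c * h * s))
    elif transform == "extreme":         # bounds (4.3)
        base = (1.0 - d) / 2.0
        h = 1.0 / (np.abs(np.log(base)) * (1.0 - d))
        lower = 1.0 - 2.0 * base ** np.exp(-c * h * s)
        upper = 1.0 - 2.0 * base ** np.exp(c * h * s)
    else:
        raise ValueError("transform must be 'logistic' or 'extreme'")
    return lower, upper

# Illustrative inputs only; these are not values from the data analysis.
d = np.array([-0.30, 0.00, 0.40])
for name in ("logistic", "extreme"):
    lo, up = transformed_band(d, sigma_hat=0.05, c=2.5, transform=name)
    print(name, np.round(lo, 3), np.round(up, 3))
```

By construction both transformations keep the bounds inside (−1, 1), and setting the critical level to zero returns the point estimate itself.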

A similar approach can be applied towards comparison of multi-step transition probabilities. For any two covariate levels, z₁ and z₂, we set

Δ_{j}^{H} (t ∣ z_{1}, z_{2}) = H_{j} (t ∣ z_{1}) - H_{j} (t ∣ z_{2}), j = 4, 5.

The corresponding sample analogue is denoted by ${\hat{Δ}}_{j}^{H}$ . It is easy to see that { ${\hat{W}}_{j, Δ}^{H} (t ∣ z_{1}, z_{2}) = \sqrt{n} [{\hat{Δ}}_{j}^{H} - Δ_{j}^{H}] (t ∣ z_{1}, z_{2}) : t \in [0, τ]$ } converges weakly to a Gaussian process $W_{j Δ}^{H} (t ∣ z_{1}, z_{2}) = W_{j}^{H} (t ∣ z_{1}) - W_{j}^{H} (t ∣ z_{2})$ , where

W_{j}^{H} (t ∣ z) = W_{1 j}^{F} (t ∣ z) + \sum_{i = 2}^{3} [W_{1 i}^{F} ★ F_{i j} + F_{1 i} ★ W_{i j}^{F}] (t ∣ z) + [W_{12}^{F} ★ F_{23} ★ F_{\bar{3} j} + F_{12} ★ W_{23}^{F} ★ F_{\bar{3} j} + F_{12} ★ F_{23} ★ W_{\bar{3} j}^{F}] (t ∣ z),

and the integrals are defined by means of the convolution formula.

In Figures 2–5, we compare one-step and multi-step transition probabilities of relapse and death in remission for patients with selected covariate profiles. To obtain the bands, we first used the Gaussian multiplier method to estimate the approximate variance of the Δ function: the Monte Carlo variance of the Δ function was computed based on 5000 Monte Carlo experiments. A second application of the Gaussian multiplier method was then used to obtain an approximation of the critical level c_α(t₁, t₂) based on 5000 Monte Carlo trials. The interval [t₁, t₂] was chosen to correspond to t₁ = 1.5 and t₂ = 60 months. The bounds (4.3) and (4.4) showed close numerical agreement, and the resolution of the graphs does not allow the difference between the two choices to be shown. The difference between the upper/lower bounds did not exceed .07%, and the bands obtained using the logistic transformation were narrower.
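The two-stage use of the multiplier replicates described above can be sketched in Python as follows. The sketch assumes that two independent sets of replicates of the normalized process have already been generated on a grid restricted to [t₁, t₂] (the hypothetical arrays `var_reps` and `crit_reps`); how these replicates are produced from the data is not shown, and the function name is illustrative.

```python
import numpy as np

def simultaneous_critical_level(var_reps, crit_reps, alpha=0.05):
    """Monte Carlo standard deviation and critical level for a confidence band.

    var_reps:  array (B1, n_times), replicates used to estimate the variance.
    crit_reps: array (B2, n_times), replicates used for the critical level.
    """
    var_reps = np.asarray(var_reps, dtype=float)
    crit_reps = np.asarray(crit_reps, dtype=float)
    # Stage 1: pointwise Monte Carlo standard deviation.
    sigma = var_reps.std(axis=0, ddof=1)
    # Stage 2: upper alpha percentile of the sup of the standardized process.
    sup_stat = np.max(np.abs(crit_reps) / sigma, axis=1)
    c_alpha = float(np.quantile(sup_stat, 1.0 - alpha))
    return sigma, c_alpha

# Synthetic replicates, used only to show the call; not data from the study.
rng = np.random.default_rng(0)
scale = np.linspace(0.5, 2.0, 30)
sigma, c = simultaneous_critical_level(
    rng.standard_normal((5000, 30)) * scale,
    rng.standard_normal((5000, 30)) * scale,
)
print(c)
```

Because the supremum is taken over the whole interval, the resulting level exceeds the pointwise normal percentile, which is why the simultaneous bands are wider than the pointwise ones.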

Figure 2. Pointwise and simultaneous confidence bands for the one-step and multi-step Δ functions of ALL patients receiving BMT. Covariates: z₁ = age ≤ 29.5 years, z₂ = baseline age. The remaining covariates correspond to the baseline.

Younger age was associated with reduced probabilities of relapse and death in both AML and ALL patients. In Figure 2, we use the Δ function to compare transition probabilities of hypothetical younger (z₁) and baseline age (z₂) ALL bone marrow transplant recipients. The remaining covariates correspond to the baseline groups specified in Table 2 and to the median values of the waiting time variables DxCr and CrTx. The plots show that younger age has a “concordant” effect on the endpoint probabilities, i.e. younger age is associated with a reduced probability of both relapse and death. In the case of one-step tx → relapse transitions, the pointwise bands suggest that the differences are significant, but the wider simultaneous bands show that this is not the case. Examination of the four possible paths leading to the relapse state showed that although younger patients have lower one-step relapse transition probabilities, they are at a higher risk of relapse preceded by AGVHD than patients in the baseline age group. This accounts for the marginal differences in the multi-step relapse transition probabilities. Figure 2 also shows that multi-step transition probabilities into the death state are significantly lower for a younger patient, since the upper bounds of both the pointwise and the simultaneous bands lie below the horizontal line passing through 0. While in the case of one-step transition probabilities there is a marginal difference during the early post-transplant period, patients in the baseline age group had higher probabilities of death preceded by GVHD.

In Figure 3, we show the “discordant” effect of older age on the two endpoint probabilities. The graphs represent the Δ function for hypothetical ALL patients receiving a peripheral blood stem cell transplant. The covariate z₁ corresponds to the older age group and z₂ to the baseline age group. The remaining covariates correspond to the baseline (Table 2). Older age is associated with lower transition probabilities into the relapse state. On the other hand, the role of the two covariates is reversed in the case of transitions into the death state. Plots of the four paths leading to the endpoint events showed that an older patient may have higher probabilities of death resulting from CGVHD (with or without AGVHD), while the probability of transition along the path tx → AGVHD → death is comparable to that of a patient in the baseline age group.

Figure 3. Pointwise and simultaneous confidence bands for the one-step and multi-step Δ functions of ALL patients receiving PBSCT. Covariates: z₁ = age > 42.5 years, z₂ = baseline age. The remaining covariates correspond to the baseline.

In the next figure we show a “switching” treatment effect. Figure 4 compares two hypothetical young AML patients receiving PBSCT (z₁) and BMT (z₂). The one-step and multi-step relapse probabilities were lower in the case of PBSCT, but the differences were not significant. On the other hand, we see that PBSCT is associated with a lower probability of one-step transition into the death state, while in the case of multi-step transitions the role of the two graft sources is reversed. This pattern is also seen in the case of young ALL patients in Figure 1, but in the case of AML patients the differences in the multi-step transition probabilities were more pronounced.

Figure 4. Pointwise and simultaneous confidence bands for the one-step and multi-step Δ functions of young AML patients. Covariates: z₁ = PBSCT, z₂ = BMT. The remaining covariates correspond to the baseline.

A similar approach can be applied to compare transition probabilities evaluated by conditioning on the follow-up history of a patient. In particular, Arjas and Eerola (1993) and Eerola (1994) have suggested the use of graphs of the conditional probabilities

P (J (t) = e ∣ H_{s}), s < t

(4.5)

where H_s represents the patient’s history up to time s. Examples of these graphs specialized to Markov chains and semi-Markov models were given in Klein et al. (1993), Keiding et al. (2001), Dabrowska et al. (1994) and Putter et al. (2007). Here we note only that in the case of Markov chain regression models, the predictions depend only on the state occupied by the patient at time s, and estimation of (4.5) reduces to estimation of the transition probability matrix because

P (J (t) = e ∣ H_{s}) = P (J (t) = e ∣ J (s) = i, Z) for s < t .

(4.6)

In the case of a semi-Markov model, the conditional probabilities P(J(t) = e ∣ H_s) are given by the transition probability matrix of a delayed Markov renewal process, with the delay determined by the length of time spent in the state occupied at time s. On the other hand, the right-hand side of (4.6) depends also on the initial state J₀ and on all possible transitions leading to the state e and passing through the state i on or prior to time s. The two models coincide only if the sojourn times in each state are exponentially distributed.
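The role of the exponential distribution in the last statement can be checked numerically: the residual sojourn time does not depend on the time already spent in the state only when the sojourn time is memoryless. The sketch below is an illustration with hypothetical helper names; the Weibull distribution is used merely as an example of a non-exponential sojourn time.

```python
import math


def residual_survival(survival, s, t):
    """P(T > s + t | T > s) for a sojourn time T with the given survival function."""
    return survival(s + t) / survival(s)


def exponential(rate):
    """Survival function of an exponential sojourn time."""
    return lambda x: math.exp(-rate * x)


def weibull(scale, shape):
    """Survival function of a Weibull sojourn time (shape 1 is exponential)."""
    return lambda x: math.exp(-((x / scale) ** shape))
```

For the exponential case `residual_survival(exponential(0.3), 2.0, 1.5)` agrees with `exponential(0.3)(1.5)`, whereas for a Weibull sojourn time with shape 2 the two quantities differ, so the delay enters the prediction.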

In Table 5 we report results from the analysis of residuals of the main variables in the model. We considered Martinussen and Scheike’s Kolmogorov-Smirnov statistics for transitions between adjacent states of the model from each state. The test statistics were calculated in the range t ∈ [1, 90] months and the reported p-values were obtained using the Gaussian multiplier method based on 5000 Monte Carlo samples. The results were also compared with a larger model, which included the length of time spent in the transplant and AGVHD states as time-dependent covariates. The dependence on the length of time spent in these states appeared to have a marginal effect. In the case of the transitions originating from the CGVHD state, the latter may stem from a relatively small number of failures (relapse or death). On the other hand, AGVHD can occur only during the first 4 months and the state space of the process partially captures the dependence on the length of time spent in the transplant state. Although Table 5 shows an acceptable fit, there are several possible sources of departure from the model. In particular, they may be caused by calendar and center effects. For example, grading of acute and chronic GVHD is not uniform across centers. At the same time, the use of PBSCT in allogeneic transplants might have been more frequent towards the end of the study period than at its beginning. These factors were not taken into account in this study as they identify patients in the population. Further, transplant may result in many other complications, including infections, pneumonia, as well as secondary cancers, loss of vision and damage to other organs. We have not taken them into account due to lack of data.

There has been very little work on variable and model selection problems in multi-state models. Commenges et al. (2007) considered a flexible class of multi-state models which includes Markov chains and semi-Markov models as special cases. They extended the expected Kullback-Leibler (EKL) risk function to counting process models coarsened at random and proposed a leave-one-out cross-validation method for approximation of EKL based on penalized likelihoods. The approach was illustrated using a three state additive illness process, though the methodology applies to more complex situations as well. Another approach may be based on the focused information criteria and model averaging of Hjort and Claeskens (2003, 2006). Their approach is tailored towards selection of a model for given parameters of interest. In the case of single spell models, examples of such parameters include regression coefficients, quantiles, cumulative hazards or distribution functions evaluated at a fixed point or over a fixed interval. Extension of this method to multi-state regression models may include one-step and multi-step transition probabilities or other parameters arising in prediction problems.

5 Proofs

5.1 Assumptions and notation

We first recall that if A = [a_{kℓ}] is a rectangular d × q matrix then its ℓ₁ and ℓ_∞ norms are given by

{\| A \|}_{1} = \max_{ℓ} \sum_{k = 1}^{d} ∣ a_{k ℓ} ∣ \quad \text{and} \quad {\| A \|}_{\infty} = \max_{k} \sum_{ℓ = 1}^{q} ∣ a_{k ℓ} ∣,

and we have ||A||₁ = sup{μ^T A λ: ||μ||_∞ ≤ 1, ||λ||₁ ≤ 1} = ||A^T||_∞, where μ = (μ₁, …, μ_d)^T and λ = (λ₁, …, λ_q)^T. If A(s) = [a_{ij}(s)], s = (x, θ), is a d × q matrix of functions defined on 𝒯 = [0, τ] × Θ then ||A|| = sup{||A(s)||₁: s ∈ 𝒯} is the corresponding supremum norm, and with some abuse of notation, we also write ||A|| = sup{||A(s)||_∞: s ∈ 𝒯}. We also use ||·|| to denote the supremum norm of scalar or vector-valued functions on [0, τ].
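The two matrix norms and the identity relating them can be checked on a small example. The sketch below uses plain lists of rows; the helper names are hypothetical.

```python
def norm_1(a):
    """Maximum absolute column sum of a matrix given as a list of rows."""
    return max(sum(abs(row[l]) for row in a) for l in range(len(a[0])))


def norm_inf(a):
    """Maximum absolute row sum of a matrix given as a list of rows."""
    return max(sum(abs(x) for x in row) for row in a)


def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]
```

For the 2 × 3 matrix with rows (1, −2, 3) and (4, 5, −6), the column sums are 5, 7, 9 and the row sums are 6, 15, so the ℓ₁ norm equals 9, the ℓ_∞ norm equals 15, and the ℓ₁ norm coincides with the ℓ_∞ norm of the transpose.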

We shall assume the following regularity conditions on the hazard rates α_j(y, θ, z), y ∈ R^q, of the transitions j.

Condition 5.1

The parameter set Θ ⊂ R^d is bounded and open.
For fixed z ∈ R^d and each transition j, the function ℓ_j(y, θ, z) = log α_j(y, θ, z) is twice continuously differentiable with respect to (y, θ). The derivatives with respect to y (denoted by primes) and with respect to θ (denoted by dots) satisfy ${| | ℓ_{j}^{'} (y, θ, z) | |}_{1} \leq ψ ({| | y | |}_{1}), {| | ℓ_{j}^{″} (y, θ, z) | |}_{1} \leq ψ ({| | y | |}_{1})$ , ||ℓ̇_j(y, θ, z)||₁ ≤ ψ₁(||y||₁), ||ℓ̈_j(y, θ, z)||₁ ≤ ψ₂(||y||₁) and ||g(y, θ, z) − g(y′, θ′, z)||₁ ≤ max(ψ₃(||y||₁), ψ₃(||y′||₁)) × [||y − y′||₁ + |θ − θ′|], where $g = {\ddot{ℓ}}_{j}, {\dot{ℓ}}_{j}^{'}$ and $ℓ_{j}^{″}$ . Here ψ is a constant or a continuous bounded decreasing function. The functions ψ_p, p = 1, 2, 3, are continuous, locally bounded and satisfy ψ_p(0) < ∞.
For fixed θ ∈ Θ and y ∈ R^q, the functions α_j(y, θ, ·) and their logarithmic derivatives in (ii) are measurable with respect to the Borel σ-field of R^d.
We have either a) m₁ < α_j(y, θ, z) < m₂ for some 0 < m₁ < m₂ < ∞ or b) α_j(y, θ, z) is a bounded coordinate-wise decreasing function such that α_j(y₁, …, y_k, θ, z) ↓ 0 as y_ℓ ↑ ∞, ℓ = 1, …, q, and
$m_{1} {[1 + c_{1} {| | y | |}_{1}]}^{- e_{1}} \leq α_{j} (y, θ, z) < m_{2} {[1 + c_{2} y_{j}]}^{- e_{2}}, j = 1, \dots, q$

for some c₁, c₂ > 0, e₁ ∈ (0, 1], e₂ ∈ [0, 1] and 0 < m₁ < m₂ < ∞.

The condition (ii) assumes that the function α(y, θ, z) and its derivatives are jointly continuous in the arguments (y, θ). Together with the condition (iii), this implies that they are measurable with respect to the product Borel σ-field ℬ(R^q) ⊗ ℬ(Θ) ⊗ ℬ(R^d). The condition 5.1 (iii) serves to ensure that for each state j₁, the Volterra equation corresponding to the transitions j originating from the state j₁ has a non-explosive solution on the interval [0, τ_{j₁}], where τ_{j₁} = sup{x: EY_j(x) > 0}.

Let P be a distribution satisfying assumptions 2.1 of Section 2. For any transition j, let

A_{j P} (x) = \int_{0}^{x} \frac{E_{P} N_{j . i} (d u)}{E_{P} Y_{j . i} (u)}

and set A_{.P} = Σ_j A_{jP}. In analogy to the single spell models in Dabrowska (2006), we can show that the condition 5.1 (iii-a) implies that the Volterra equation has a unique solution Γ_θ = [Γ_{1θ}, …, Γ_{qθ}]^T such that $m_{2}^{- 1} A_{j P} (x) \leq Γ_{j θ} (x) \leq m_{1}^{- 1} A_{j P} (x)$ for x ∈ [0, τ], θ ∈ Θ. In addition, there exist positive constants d₁, d₂, d₃ such that

\begin{array}{l} {| | Γ_{θ} (x) - Γ_{θ^{'}} (x) | |}_{1} \leq ∣ θ - θ^{'} ∣ d_{1} exp [d_{2} A_{. P} (x)], \\ ∣ Γ_{j θ} (x) - Γ_{j θ} (x^{'}) ∣ \leq d_{3} E_{P} N_{j . i} ((x \land x^{'}, x \lor x^{'}]) . \end{array}

(5.1)

Similar inequalities hold also for the left continuous version of Γ_{jθ}. On the other hand, under the condition 5.1 (iii-b), we have Φ₂(A_{jP}(x)) ≤ Γ_{jθ}(x) ≤ Φ₁(A_{.P}(x)), where $Φ_{q} (u) = c_{q}^{- 1} ({[1 + c_{q} u / m_{q}^{'}]}^{1 / (1 - e_{q})} - 1)$ for q = 1, 2 and $m_{q}^{'} = m_{q} / (1 - e_{q})$ if e_q ≠ 1, and $Φ_{q} (u) = c_{q}^{- 1} (e^{c_{q} u / m_{q}} - 1)$ if e_q = 1. The functions Φ_q are inverse cumulative hazards corresponding to the lower and upper bounds on hazard rates in the condition 5.1 (iii). The inequality (5.1) is in this case satisfied with the function A_{.P} replaced by Φ₁(A_{.P}).
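That the functions Φ_q are inverse cumulative hazards of the bounding hazards m(1 + cy)^{−e} can be verified directly. The following sketch is illustrative only; the function names are hypothetical, and the scalar arguments `m`, `c`, `e` stand for one pair of constants from the condition 5.1 (iii-b).

```python
import math


def cumulative_hazard_bound(y, m, c, e):
    """Integral over [0, y] of the bounding hazard m * (1 + c * s) ** (-e)."""
    if e == 1.0:
        return (m / c) * math.log(1.0 + c * y)
    return m / (c * (1.0 - e)) * ((1.0 + c * y) ** (1.0 - e) - 1.0)


def phi(u, m, c, e):
    """The function Phi of the text: inverse of cumulative_hazard_bound in y."""
    if e == 1.0:
        return (math.exp(c * u / m) - 1.0) / c
    m_prime = m / (1.0 - e)
    return ((1.0 + c * u / m_prime) ** (1.0 / (1.0 - e)) - 1.0) / c
```

Composing the two functions returns the original argument, for instance `phi(cumulative_hazard_bound(3.0, 2.0, 1.5, 0.5), 2.0, 1.5, 0.5)` recovers 3.0 up to rounding error.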

5.2 Some measurability issues

In section 2, we assumed that the observations D₁, …, D_n of the censored modulated renewal process are defined on a common complete probability space (Ω, ℱ, P) and take on values in a separable measure space (S, 𝒮). A measure space is here called separable if its σ-field is countably generated and contains all singletons. Any such space is measurably isomorphic to a subspace of the real line equipped with its Borel σ-field (e.g. Dellacherie-Meyer, 1975, p.15). Let (S^n, 𝒮_n) and (S^∞, 𝒮_∞) be the corresponding n-fold and infinite product spaces and let P_n and P_∞ be the corresponding product measures on 𝒮_n and 𝒮_∞ induced by (D₁, …, D_n) and D = (D₁, D₂, …, D_n, …), respectively. We denote by $\mathcal{S}_{n}^{P}$ the sigma-field of subsets A ⊂ S^n measurable in the completion of the product probability measure P_n and by $\mathcal{S}_{n}^{u}$ the universal sigma-field generated by 𝒮_n, i.e. the sigma-field of subsets measurable in the completion of any probability measure Q on 𝒮_n. We have $\mathcal{S}_{n} \subseteq \mathcal{S}_{n}^{u} \subseteq \mathcal{S}_{n}^{P}$ . Whereas 𝒮_n is not complete with respect to the product measure P_n, any set $A \in \mathcal{S}_{n}^{P}$ satisfies $P_{n}^{*} (A) = P_{n, *} (A)$ and $g_{n}^{- 1} (A) \in \mathcal{F}$ for g_n = (D₁, …, D_n). The sigma-fields $\mathcal{S}_{\infty}^{P}$ and $\mathcal{S}_{\infty}^{u}$ have a similar property. Without much loss of generality, we can assume therefore that (Ω, ℱ) = (S^∞, 𝒮_∞) and, when necessary, require measurability with respect to these larger sigma-fields. With this choice the sequence D is the identity map on S^∞ and (D₁, …, D_n) are the corresponding coordinate projections on (S^n, 𝒮_n).

Further, let (Ω₀, ℱ₀) be an arbitrary measure space and let 𝒵 be a Polish space or a Borel subset of it. For any set A ⊂ Ω₀ × 𝒵, its projection on Ω₀ is denoted by proj_{Ω₀}(A) = {ω₀: (ω₀, z) ∈ A for some z ∈ 𝒵}. A multifunction (or correspondence) is a set-valued function assigning to each ω₀ ∈ Ω₀ a subset of 𝒵. We shall write H: Ω₀ ↪ 𝒵 for such mappings to differentiate them from usual functions assigning to each ω₀ a single value (h: Ω₀ → 𝒵). The domain and graph of a multifunction H are defined as

dom H = {ω_{0} : H (ω_{0}) \neq \emptyset} and graph H = {(ω_{0}, z) : z \in H (ω_{0})},

respectively. For any nonempty set B ⊆ 𝒵, the inverse image of H is given by

H^{- 1} (B) = {ω_{0} : H (ω_{0}) \cap B \neq \emptyset} = {ω_{0} : z \in H (ω_{0}) for some z \in B}

and the right side is equal to the projection proj_{Ω₀}(graph H ∩ (Ω₀ × B)). Finally, by a selector we mean a function h: Ω₀ → 𝒵 ∪ {z^*} such that h(ω₀) ∈ H(ω₀) if ω₀ ∈ dom H and h(ω₀) = z^* otherwise. (Here z^* is an extra point attached to 𝒵.)
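On finite sets the notions of domain, graph, inverse image and selector can be made concrete. The sketch below represents a multifunction as a dictionary mapping each point to a set of values; the representation and the helper names are hypothetical and serve only to illustrate the definitions.

```python
def domain(H):
    """dom H: points whose image set is nonempty."""
    return {w for w, values in H.items() if values}


def graph(H):
    """graph H: all pairs (w, z) with z in H(w)."""
    return {(w, z) for w, values in H.items() for z in values}


def inverse_image(H, B):
    """H^{-1}(B): points whose image set meets B."""
    return {w for w, values in H.items() if values & B}


def projection(pairs):
    """Projection of a set of pairs (w, z) on the first coordinate."""
    return {w for w, _ in pairs}


def selector(H, extra="z*"):
    """Pick one element of H(w) when it is nonempty, else the extra point."""
    return {w: (min(values) if values else extra) for w, values in H.items()}
```

With `H = {"a": {1, 2}, "b": set(), "c": {3}}`, the inverse image of `{3}` is `{"c"}`, which coincides with the projection of the part of the graph lying over `{3}`.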

A set-valued mapping H is here called measurable if graph H is jointly measurable with respect to ℱ₀ ⊗ ℬ(𝒵). By measurable projection theorems (e.g. Dellacherie and Meyer, 1975, p.252, Pollard, 1984, p. 196–197 or Dudley, 1999, Chapter 5), the joint measurability of graph H entails that the inverse image H⁻¹(B) of any Borel set B ∈ ℬ(𝒵) belongs to the universal sigma-field $\mathcal{F}_{0}^{u}$ generated by ℱ₀. Moreover, H admits at least one $\mathcal{F}_{0}^{u}$ -measurable selector. If ℱ₀ is complete with respect to some probability measure then $\mathcal{F}_{0}^{u} = \mathcal{F}_{0}$ . For alternative conditions for this equality we refer to Wagner (1976).

Further, let 𝒯 be a Polish space and let {X_t: t ∈ 𝒯} be an R^k-valued random element defined on Ω₀. We refer to it as measurable if it forms a measurable stochastic process, i.e. the map Ω₀ × 𝒯 ∋ (ω, t) → X(ω, t) ∈ R^k is jointly measurable with respect to the σ-fields ℱ₀ ⊗ ℬ(𝒯) and ℬ(R^k). Correspondingly, the set-valued function H: Ω₀ ↪ 𝒵 = 𝒯 × R^k given by H(ω₀) = {(t, X(ω₀, t)): t ∈ 𝒯} has a measurable graph and for any Borel sets B ∈ ℬ(R^k) and C ∈ ℬ(𝒯), we have $\{ω_{0} : X (ω_{0}, t) \in B \text{ for some } t \in C\} \in \mathcal{F}_{0}^{u}$ . In section 5.3, we use the fact that an R^k-valued process is measurable iff each of its components is measurable. Moreover, sums and products of such processes are measurable as well.

A class of scalar functions 𝒢 = {g_t(s): t ∈ 𝒯} defined on S^k, k ≤ n, is called here measurable if it forms a measurable process in the above sense. Following Nolan and Pollard (1987) and Pollard (1990), a measurable class of functions is called Euclidean for an envelope G if |g_t|(s) ≤ G(s) for all t ∈ 𝒯, and there exist constants A and V such that N(ε||G||_{Q,r}, 𝒢, ||·||_{Q,r}) ≤ (A/ε)^V for all ε ∈ (0, 1) and all probability measures Q on S^k such that ||G||_{Q,r} < ∞. Here N(η, 𝒢, ||·||_{Q,r}) is the minimal number of L_r(Q)-balls of radius η covering the class 𝒢 and ||·||_{Q,r} is the L_r(Q) norm. We use r = 1, 2 in the sequel.

In our application the space ( Inline graphic , ) can be taken as the complete separable metric space ( , ) = (E₀, (E₀)) × (E₁ × (E₀))^ℕ, where E₀ = × R^d E₁ = (R̄⁺ × ( × R^d)∪ Δ)^ℕ. Here E₀ represents possible initial realizations of the mark V₀ = (J₀, Z₀) and E₁ is the space of realizations of the censored modulated renewal process (X_m, V_m = (J_m, Z_m))_m_≥1. Further, Inline graphic = [0, τ] × Θ, where τ is a finite point on the positive half-line and Θ is a bounded open subset of a Euclidean space. Here is a Polish space because forms a G_δ subset (a countable intersection of open sets) of a Polish space and Polishness is hereditary with respect to G_δ sets. Finally, all classes Inline graphic = {g_t(s): t = (x, θ) ∈ } correspond to cádlág (or cáglád) functions such that for 0 ≤ x < x′ ≤ τ and θ, θ′ ∈ Θ, we have

\begin{array}{l} ∣ g_{x θ} (s) - g_{x^{'} θ} (s) ∣ \leq C_{1} [\tilde{G} (x^{'}, s) - \tilde{G} (x, s)], \\ ∣ g_{x θ} (s) - g_{x θ^{'}} (s) ∣ \leq C_{2} ∣ θ - θ^{'} ∣ \tilde{G} (τ, s), \end{array}

(5.2)

where G̃(x, s) is a nonnegative monotone increasing cádlág (respectively cáglád) function of x such that G̃(0, s) = 0 and ||G̃(τ, ·)||_{Q,r} < ∞. In this case, the Euclidean property is satisfied with envelope G(s) = [C₁ + C₂ diam Θ]G̃(τ, s) + g_{x₀θ₀}(s), where g_{x₀θ₀}(s) is an arbitrary function from the class 𝒢.

To verify measurability of the estimates, we shall need some properties of Carathéodory integrands and cádlág or cáglád functions. If Inline graphic and are Polish spaces then a function f: Ω₀ × → is called a Carathéodory integrand if for fixed t ∈ , f (·, t): Ω₀ → is measurable, and for fixed ω₀ ∈ Ω₀, f (ω₀, ·): → is continuous. Here (Ω₀, ) is an arbitrary measure space and we have

Lemma 5.1

Let f: Ω₀ × Inline graphic → be a Carathéodory mapping. Then

f is measurable with respect to × ( ).
For any open set B of , let H(ω₀) = {t: f (ω₀, t) ∈ B}. Then for any closed or open C set of , we have H⁻¹(C) = {ω₀: f (ω₀, t) ∈ B for some t ∈ C} ∈ .
If g: Ω₀ → is measurable, then the composite mapping f ∘ g: Ω₀ → given by (f ∘ g)(ω₀) = f (ω₀, g(ω₀)) is measurable.
Suppose that is another Polish space and h: Ω₀ × × → is a Carathéodory integrand. Then the composite map (h ∘ f): Ω₀ × → given by (h ∘ f )(ω₀, t) = h(ω₀, t, f (ω₀, t) is a Carathéodory integrand.

Part (i) remains valid even if Inline graphic is replaced by a nonseparable metric (Kuratowski 1966 p. 378, or Himmelberg, 1975). In part (ii), if C is a closed set then

H^{- 1} (C) = \underset{q \in C}{\cup} {ω_{0} : f (ω_{0}, q) \in B},

where the union is over a dense subset of C. If C is open then it can be represented as a countable increasing union of closed sets and part (ii) follows by noting that inverse images preserve unions of sets. Part (iii) follows from the definition of a measurable function and continuity of f with respect to t. Part (iv) follows from part (i) and (iii) and definition of a continuous function.

Part (i) of the lemma extends to functions f which are cádlág, cáglád, cád and cág in t ∈ Inline graphic , = R₊ or = [0, τ] and take on values in a complete separable metric space (e.g. Dellacherie and Meyer, 1975 p. 144). Any cádlág or cáglád function is also a pointwise limit of Carathéodory integrands.

Finally, suppose that 𝒯 = [0, τ] × Θ and f is a function such that (i) for fixed (x, θ) ∈ 𝒯, f(·, x, θ) is measurable and (ii) for fixed ω₀ ∈ Ω₀, it is jointly cádlág with respect to (x, θ) and continuous with respect to θ. To see that f is jointly measurable, let {q_k: k ≥ 1} be a dense set in Θ and for a given integer m ≥ 1 let B_{mk} be balls of radius 1/m centered at q_k covering Θ. Set $B_{m k}^{'} = B_{m k} - \cup_{r = 1}^{k - 1} B_{m r}$ and

f_{m} (ω_{0}, x, θ) = \sum_{ℓ = 1}^{\infty} \sum_{k = 1}^{\infty} f (ω_{0}, \frac{ℓ}{m} \land τ, q_{k}) 1 (\frac{ℓ - 1}{m} \leq x < \frac{ℓ}{m}) 1 (θ \in B_{m k}^{'}) .

Then f_m is jointly measurable and converges pointwise to f. Similarly, if f is a jointly cáglád rather than cádlág function in (ii) then f is jointly measurable with respect to ℱ₀ ⊗ ℬ(𝒯). Similarly to the single parameter case, functions of this type are pointwise limits of Carathéodory integrands. Part (ii) of the lemma remains valid for sets of the form C = I × C′, where C′ is an open or closed subset of Θ and I is an interval contained in [0, τ]. In particular, if f is a real valued cádlág function of this type then its supremum is ℱ₀-measurable.
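The construction of f_m can be illustrated in the x argument alone, with ω₀ and θ held fixed. The sketch below is a simplification made for illustration (the partition of Θ into the sets B′_{mk} is omitted), and the function name is hypothetical.

```python
import math


def step_approximation(f, m, tau):
    """Return f_m with f_m(x) = f(min(l / m, tau)) for (l - 1) / m <= x < l / m."""
    def f_m(x):
        l = math.floor(x * m) + 1      # index of the grid cell containing x
        return f(min(l / m, tau))      # evaluate f at the right end of the cell
    return f_m
```

Because the evaluation point l/m decreases to x as m grows, right continuity of f gives pointwise convergence of f_m to f, including at jump points of f.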

5.3 Proof of Proposition 3.1

To show proposition 3.1, we shall first consider the process Γ_{nθ}(x), (x, θ) ∈ [0, τ] × Θ = 𝒯.

Lemma 5.2

The process Ŵ= {Ŵ(t) = [Ŵ_j(t): t = (x, θ) ∈ , j ∈ }, ${\hat{W}}_{j} (x, θ) = \sqrt{n} [Γ_{n j θ} - Γ_{j θ}] (x)$ , converges weakly in ℓ^∞( × ) to
$W (x, θ) = V (x, θ) - \int_{[0, x]} V (u -, θ) s^{'} (Γ_{θ} (u -), θ, u) C_{θ} (d u) P_{θ} (u, x),$

where {V (t) = [V_j(t): t = (x, θ) ∈ , j ∈ } is a tight mean zero Gaussian process. Its covariance function is given by
$cov (V_{j} (x, θ), V_{j^{'}} (x^{'}, θ^{'})) = E \sum_{m} \sum_{m^{'}} \int_{0}^{x} \int_{0}^{x^{'}} \frac{M_{j m} (d u, θ) M_{j^{'} m^{'}} (d v, θ^{'})}{s_{j} (Γ_{θ} (u), θ, u) s_{j^{'}} (Γ_{θ^{'}} (v), θ^{'}, v)} .$

In addition, under the assumption that observations correspond to a censored modulated renewal process and θ = θ₀ is the true parameter, cov(V_j(x, θ), V_{j′}(x′, θ)) = 1(j = j′)C_{jθ}(x ∧ x′).
Let θ₀ be an arbitrary point in Θ. If θ̂ is a $\sqrt{n}$ -consistent estimate of it, then the process Ŵ₀ = {Ŵ₀(x): x ≤ τ}, ${\hat{W}}_{0} = \sqrt{n} [Γ_{n \hat{θ}} - Γ_{θ_{0}} - {(\hat{θ} - θ_{0})}^{T} {\dot{Γ}}_{n \hat{θ}}]$ converges weakly in ℓ^∞([0, τ] × ) to W₀ = W (·, θ₀).

Here the space Inline graphic = ℓ^∞( × ) is equipped with uniform metric, d_X(x̃, ỹ) = sup_t_,_j |x̃(t, j) − ỹ(t, j)| and is isometric to the space = ℓ^∞( )^q equipped with metric d_Y (x, y) = max_j sup_t |x_j(t) − y_j(t)|. Apparently, the isometry is given by the mapping Φ assigning to each x̃ ∈ Inline graphic the vector of coordinate functions, Φ(x̃) = [x̃(·, 1), …, x̃(·, q)]^T. Open sets of can be represented as arbitrary unions of balls (x̃, ε) = {y: d_X(x, y) < ε}. On the other hand, the product topology of coincides with the topology induced by the metric d_Y so that any open set in the product topology is an arbitrary union of balls Inline graphic (x, ε), where x = [x₁, …, x_q].

Proof

To show part (i), define V_n = [V_jn: j ∈ Inline graphic ], where

V_{j n} (x, θ) = \int_{(0, x]} \frac{N_{j . .} (d u)}{S_{j} (Γ_{θ} (u -), θ, u)} - \int_{(0, x]} \frac{{E N}_{j . .} (d u)}{s_{j} (Γ_{θ} (u -), θ, u)} .

Then V_{jn} = V_{1jn} + rem_j, where

V_{1 j n} (x, θ) = \frac{1}{n} \sum_{i = 1}^{n} \int_{(0, x]} [\frac{N_{j . i} (d u)}{s_{j} (Γ_{θ} (u -), θ, u)} - \frac{S_{j i}}{s_{j}^{2}} (Γ_{θ} (u -), θ, u) {E N}_{j . .} (d u)]

and rem_j(x, θ) is a remainder term. Lemma 5.3 gives its form and shows that ||rem_j|| = o_P(n^{−1/2}). Therefore the process V_{1n}, with components V_{1jn}, also satisfies ||V_n − V_{1n}|| = o_P(n^{−1/2}).

Using the CLT and the Cramér-Wold device, the finite dimensional distributions of $\sqrt{n} V_{1 n}$ converge to the finite dimensional distributions of V: for any distinct t₁, …, t_k ∈ 𝒯 and any numerical vector λ of length kq, the random variable $λ^{T} \mathrm{vec} [\sqrt{n} V_{1 n} (t_{1}), \dots, \sqrt{n} V_{1 n} (t_{k})]$ converges in distribution to the corresponding linear combination of finite dimensional marginals of V.

For each transition j, the process V_{1jn} can be represented as V_{1jn}(x, θ) = [ℙ_n − P]g, where g varies over a class 𝒢_j = {g_{tj}: t = (x, θ) ∈ 𝒯} consisting of cádlág functions such that each g_{tj} is a difference of two càdlàg functions, increasing in x and Lipschitz continuous with respect to θ. Setting ${\tilde{G}}_{j} (D_{i}, x) = N_{j . i} (x) + \int_{0}^{x} Y_{j . i} (u) A_{j} (d u)$ , the condition (5.2) is satisfied with constants C₁ and C₂ determined by the functions ψ, ψ₁ of the condition 5.1 (ii) and g_{t₀} ≡ g_{τ, θ₀}, say. Correspondingly, the class 𝒢_j is Euclidean for a square integrable envelope G_j. From Pollard (1984, 1990) it follows that the process $\sqrt{n} V_{1 j n}$ converges weakly in ℓ^∞(𝒯) to V_j, the j-th component of the process V, because the class 𝒢_j is totally bounded and asymptotically uniformly equicontinuous with respect to the variance pseudo-metric d_j(t, t′) = sd(V_{1jn}(t) − V_{1jn}(t′)), t, t′ ∈ 𝒯. Joint weak convergence of the process $\sqrt{n} V_{n} = \sqrt{n} (P_{n} - P) g$ , g ∈ ∪_j 𝒢_j, follows from finite dimensional weak convergence and by noting that the union of a finite number of Euclidean classes of functions is also Euclidean (Pollard, 1990). In particular, the class ∪_j 𝒢_j is totally bounded and asymptotically equicontinuous with respect to the variance pseudo-metric d((t, j), (t′, j′)) = sd(V_{1nj}(t) − V_{1nj′}(t′)). Denoting by $V_{n}^{-}$ the left-continuous process (obtained by changing the integrals over (0, x] to integrals over the intervals [0, x)), the process $\sqrt{n} V_{n}^{-}$ converges weakly to V as well, because the jumps of the process V_n are of the order O_P(1/n) uniformly in t ∈ 𝒯 and the functions EN_j are continuous.

Finally, to show weak convergence of the standardized Γ_{nθ} process, we shall need bounds on the supremum of the norm of the vector V_n. Let ℋ = {h(λ, t) = Σ_{j=1}^{q} λ_j g_{tj}: g_{tj} ∈ 𝒢_j, |λ_j| ≤ 1, j = 1, …, q}. Then ℋ forms a Euclidean class for the envelope H = Σ_j G_j and we have

E sup_{t \in T} {| | \sqrt{n} V_{1 n} (t) | |}_{1} = E sup_{h \in H} \sqrt{n} ∣ P_{n} - P ∣ h = O (1) .

Similarly, $E {sup}_{t \in T} {| | \sqrt{n} V_{1 n} (t) | |}_{\infty} = O (1)$ and the left-continuous versions of the process satisfy similar bounds.

To show consistency of the estimate Γ_{nθ}, we first assume the condition 5.1 (iii-a). Let A_{jn} be the Aalen-Nelson estimator and let $A_{pjn} = m_{p}^{- 1} A_{j n}$ , p = 1, 2. Then A_{2jn}(x) ≤ Γ_{njθ}(x) ≤ A_{1jn}(x) for all θ ∈ Θ, and algebra similar to that in Dabrowska (2006) shows that

∣ Γ_{n j θ} (x) - Γ_{j θ} (x) ∣ \leq ∣ V_{j n} (x, θ) ∣ + \int_{(0, x]} {| | Γ_{n θ} - Γ_{θ} | |}_{1} (u -) ρ_{j n} (d u),

where ρ_{jn} = max(c_j, 1)A_{1jn} for some constant c_j. Therefore ||Γ_{nθ} − Γ_θ||₁(x) ≤ ||V_n(x, θ)||₁ + ∫_{(0, x]} ||Γ_{nθ} − Γ_θ||₁(u−)ρ_n(du), where ρ_n = Σ_j ρ_{jn}. Gronwall’s inequality (Beesack, 1973, Dabrowska, 2006) implies that sup_{x, θ} exp[−ρ_n(x)]||Γ_{nθ} − Γ_θ||₁(x) → 0 a.s., where the supremum is over θ ∈ Θ and x ∈ [0, τ]. In the case of the condition 5.1 (iii-b), the proof is the same, except that the function ρ_{jn} is replaced by ρ_j = max(c_j, 1)Φ₁(A_{.n}), where A_{.n} = Σ_j A_{jn}. Note that the Aalen-Nelson estimate is a measurable process, whereas measurability of the process Γ_{nθ} is verified below.
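For a single transition type, the Aalen-Nelson estimator and the resulting bounds on Γ_{njθ} under the condition 5.1 (iii-a) can be sketched as follows. The data layout (one sojourn time and one event indicator per observation) and the function names are assumptions made for illustration, and the multi-state bookkeeping of the paper is omitted.

```python
def nelson_aalen(times, events):
    """Return [(t, A(t))] at the distinct event times.

    events[i] is 1 for an observed transition and 0 for a censored sojourn.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    estimate, cumulative, pos = [], 0.0, 0
    while pos < len(order):
        t = times[order[pos]]
        tied = [i for i in order[pos:] if times[i] == t]
        d = sum(events[i] for i in tied)
        if d > 0:
            cumulative += d / at_risk      # jump = events / number at risk
            estimate.append((t, cumulative))
        at_risk -= len(tied)
        pos += len(tied)
    return estimate


def gamma_bounds(estimate, m1, m2):
    """Bounds A / m2 <= Gamma <= A / m1 implied by m1 < alpha < m2."""
    return [(t, a / m2, a / m1) for t, a in estimate]
```

The bounds returned by `gamma_bounds` do not depend on θ, which is what makes the Gronwall argument above uniform in θ.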

The process $\hat{W} (x, θ) = \sqrt{n} {[Γ_{n θ} - Γ_{θ}]}^{T} (x)$ satisfies

\hat{W} (x, θ) = \sqrt{n} V_{n} (x, θ) - \int_{(0, x]} \hat{W} (u -, θ) {\tilde{b}}_{n θ} (u) \bar{N} (d u),

where N̄(x) is the diagonal matrix N̄(x) = diag [N_1..(x), …, N_q..(x)], and b̃_nθ(u) is a q × q matrix with columns

{\tilde{b}}_{j n θ} (u) = [\int_{0}^{1} (S_{j}^{'} / S_{j}^{2}) (Γ_{θ} (u -) + λ [Γ_{n θ} - Γ_{θ}] (u -), θ, u) d λ] .

Let b_θ(u) be a q × q matrix with columns $b_{j θ} (u) = [s_{j}^{'} / s_{j}^{2}] (Γ_{θ} (u), θ, u)$ . Using consistency of Γ_{nθ} and Lemma 5.3, we have [b̃_{nθ} − b_θ](u) → 0 a.s. uniformly in (u, θ) ∈ 𝒯. Moreover, (5.1) and (5.2) imply also that ||R_{1n}|| → 0 a.s., where

R_{1 n} (x, θ) = \int_{(0, x]} b_{θ} (u) [\bar{N} - E \bar{N}] (d u) .

Define

\tilde{W} (x, θ) = \sqrt{n} V_{n} (x, θ) - \int_{(0, x]} \tilde{W} (u -, θ) b_{θ} (u) E \bar{N} (d u) .

Then

\begin{array}{l} \tilde{W} (x, θ) = \sqrt{n} V_{n} (x, θ) - \int_{(0, x]} \sqrt{n} V_{n} (u -, θ) b_{θ} (u) E \bar{N} (d u) P_{θ} (u, x) \\ = \int_{(0, x]} \sqrt{n} V_{n} (d u, θ) P_{θ} (u, x) \end{array}

and

\hat{W} (x, θ) - \tilde{W} (x, θ) = - \int_{(0, x]} [\hat{W} - \tilde{W}] (u -, θ) {\tilde{b}}_{n θ} (u) \bar{N} (d u) + rem (x, θ),

where

rem (x, θ) = - \int_{(0, x]} \tilde{W} (u -, θ) [{\tilde{b}}_{n θ} (u) \bar{N} (d u) - b_{θ} (u) E \bar{N} (d u)] .

(5.3)

Setting $v_{n} = max ({| | \sqrt{n} V_{n} | |}_{1}, {| | \sqrt{n} V_{n}^{-} | |}_{1}) = O_{P} (1)$ , we have

max (| | {\tilde{W}}_{n} | |, | | {\tilde{W}}_{n}^{-} | |) \leq v_{n} \exp [sup_{θ} \int_{0}^{τ} {| | b_{θ} (u) | |}_{1} {E N}_{\dots} (d u)] = O_{p} (1) .

The process W̃ is a sum of iid mean zero processes whose finite dimensional distributions are asymptotically normal and converge to the finite dimensional distributions of the process W in the statement of the proposition. Moreover, its components can be represented as empirical processes indexed by Euclidean classes of functions satisfying the condition (5.2). Therefore a similar argument as in the case of the process $\sqrt{n} V_{1 n}$ , shows that W̃ ⇒ W. The remainder term (5.3) is bounded by $\sum_{p = 2}^{4} R_{p n} (x, θ)$ , where

\begin{array}{l} R_{2 n} (x, θ) = \int_{(0, x]} \sqrt{n} V_{n} (u -, θ) R_{1 n} (d u, θ), \\ R_{3 n} (x, θ) = \int_{(0, x]} \sqrt{n} V_{n} (u -, θ) b_{θ} (u) E \bar{N} (d u) J_{n} (u, x, θ), \\ R_{4 n} (x, θ) = \int_{(0, τ]} {| | [{\tilde{b}}_{n θ} - b_{θ}] (u) | |}_{1} N_{\dots} (d u) {| | \tilde{W} (u -, θ) | |}_{1}, \\ J_{n} (u, x, θ) = \int_{(u, x]} P_{θ} (u, w) R_{1 n} (d w, θ), \end{array}

where N_… = Σ_j N_{j..}. We have ||R_{2n}|| = o_P(1), by a V-process expansion similar to that in Lemma 5.4 below. Using Kolmogorov equations for matrix product integrals (Gill and Johansen, 1990), we also have

J_{n} (u, x, θ) = R_{1 n} (x, θ) - R_{1 n} (u, θ) - \int_{(u, x]} P_{θ} (u, s -) b_{θ} (s) E \bar{N} (d s) [R_{1 n} (x, θ) - R_{1 n} (s, θ)]

and

\begin{array}{l} {| | J_{n} (u, x, θ) | |}_{1} \leq 2 | | R_{1 n} | | [1 + \int_{(u, x]} {| | P_{θ} (u, s -) | |}_{1} {| | b_{θ} (s) | |}_{1} {E N}_{\dots} (d s)] \\ \leq 2 | | R_{1 n} | | exp \int_{(u, x]} {| | b_{θ} (s) | |}_{1} {E N}_{\dots} (d s) \leq 2 | | R_{1 n} | | exp \int_{(0, τ]} {| | b_{θ} (s) | |}_{1} {E N}_{\dots} (d s) . \end{array}

From this we also get ||R_{3n}|| = o_P(1), because b_θ(u) is uniformly bounded. Finally, ||R_{4n}|| = o_P(1). Combining, the right-hand side of (5.3) is of the order o_P(1), uniformly in (x, θ) ∈ 𝒯. For fixed (x, θ), we also have

{| | \hat{W} (x, θ) - \tilde{W} (x, θ) | |}_{1} \leq {| | rem (x, θ) | |}_{1} + \int_{(0, x]} {| | \hat{W} - \tilde{W} | |}_{1} (u -, θ) ρ_{n} (d u)

and by the uniform Gronwall’s inequality (Beesack, 1993, Dabrowska, 2006), we have Ŵ(x, θ) = W̃(x, θ) + o_P(1) uniformly in (x, θ) ∈ 𝒯.

To complete the argument, we note that the processes V_{1n}, V_n, W̃ and the remainders R_{pn}, p = 1, …, 3, satisfy the measurability conditions of section 5.2, whereas to show that Ŵ and R_{4n} have this property, it is enough to show that the process Γ_{nθ} is measurable. However, the aggregate process $N_{\dots} (x) = \sum_{i = 1}^{n} \sum_{m} N_{jmi} (x)$ is measurable since it is cádlág increasing with respect to x and measurable with respect to 𝒮_n for fixed x. For any integer k and ω₀ = (s₁, …, s_n), T_k(ω₀) = inf{x: N_…(ω₀, x) ≥ k} is a random variable because {ω₀: T_k(ω₀) ≤ x} = {ω₀: N_…(ω₀, x) ≥ k} ∈ 𝒮_n. Similarly, the censored data ranks R_{im} = Σ_k 1(T_k ≤ X_{im}) are measurable. Define a set-valued mapping H_n on S^n by setting H_n(ω₀) = {(θ, x): Γ_{nθ}(ω₀, x) ∈ B}, where B is an open set of R^q. Then H_n(ω₀) = ∪_{ℓ≥0} H_{nℓ}(ω₀), where

H_{n ℓ} (ω_{0}) = {(θ, x) : Γ_{n θ} (ω_{0}, x) \in B and N_{\dots} (ω_{0}, \infty) = ℓ} .

On the set A_ℓ = {ω₀: N_…(ω₀, τ) = ℓ} ∈ Inline graphic , the process Γ_{nθ} is a weighted sum

Γ_{n θ} (x, ω_{0}) = \sum_{k = 1}^{ℓ} 1 (T_{k} (ω_{0}) \leq x) h_{n θ} (\cdot, k, ω_{0})

and the weights form a finite composition of Carathéodory integrands. Suppressing dependence on ω₀, h_nθ(·, k) is the k-th column of a q × ℓ matrix h_n with entries

h_{n θ} (j, k) = \frac{\sum_{i} \sum_{m} 1 (R_{i m} = k) 1 ((J_{i m}, J_{i m + 1}) = j)}{\sum_{i} \sum_{m} 1 (R_{i m} \geq k, J_{i m} = j_{1}) α_{j} (g_{n θ} (\cdot, k - 1), θ, Z_{i m})},

where j = (j₁, j₂) ∈ Inline graphic and g_nθ is a q × ℓ matrix with columns

g_{n θ} (\cdot, 0) = 0, \quad g_{n θ} (\cdot, k) = g_{n θ} (\cdot, k - 1) + h_{n θ} (\cdot, k) .

Alternatively, $g_{n θ} = g_{n θ}^{(ℓ)}$ , where $g_{n θ}^{(0)} \equiv 0$ and for r = 1, …, ℓ

g_{n θ}^{(r)} (\cdot, k) = \sum_{p \leq k} h_{n θ}^{(r)} (\cdot, p),

where

h_{n θ}^{(r)} (j, k) = \frac{\sum_{i} \sum_{m} 1 (R_{i m} = k) 1 ((J_{i m}, J_{i m + 1}) = j)}{\sum_{i} \sum_{m} 1 (R_{i m} \geq k, J_{i m} = j_{1}) α_{j} (g_{n θ}^{(r - 1)} (\cdot, k - 1), θ, Z_{i m})}

for j = (j₁, j₂) ∈ Inline graphic . The indicators 1(T_k(ω₀) ≤ x) are jointly measurable with respect to S_n ⊗ B(T) and, by Lemma 5.1, so are the weights h_{nθ} and g_{nθ}. Therefore the graph of H_{nℓ} is S_n ⊗ B(T)-measurable and

{(ω_{0}, x, θ) : Γ_{n θ} (ω_{0}, x) \in B} = graph H_{n} = \underset{l \geq 0}{\cup} graph (H_{n ℓ}) \in S_{n} \otimes B (T) .
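The recursion g_{nθ}(·, k) = g_{nθ}(·, k − 1) + h_{nθ}(·, k) can be sketched in Python for a single transition type. This is only an illustration under simplifying assumptions: one transition type (q = 1), scalar covariates, and a user-supplied function `alpha(y, theta, z)` standing in for α_j; the names `gamma_recursion` and `cox_alpha` are hypothetical. With the choice `cox_alpha`, which ignores its first argument, the recursion reduces to a Breslow-type estimator, and with θ = 0 to the Nelson-Aalen estimator.

```python
import math


def gamma_recursion(ranks, events, z, theta, alpha):
    """Compute g(0), ..., g(K) with g(0) = 0 and g(k) = g(k-1) + h(k).

    ranks  : censored data rank of each sojourn time (number of jump times <= X)
    events : True if the sojourn time ends with an observed transition
    z      : covariate of each sojourn time
    alpha  : function (y, theta, z) -> positive number (assumed model component)
    """
    n_jumps = max(ranks) if ranks else 0
    g = [0.0]
    for k in range(1, n_jumps + 1):
        # numerator: number of transitions observed at the k-th jump time
        num = sum(1 for r, d in zip(ranks, events) if r == k and d)
        # denominator: risk set at the k-th jump time, weighted by alpha at g(k-1)
        den = sum(alpha(g[k - 1], theta, zi) for r, zi in zip(ranks, z) if r >= k)
        h_k = num / den if den > 0 else 0.0
        g.append(g[k - 1] + h_k)
    return g


def cox_alpha(y, theta, z):
    """Proportional hazards choice: alpha does not depend on y."""
    return math.exp(theta * z)


if __name__ == "__main__":
    print(gamma_recursion([1, 2, 3], [True, True, True], [0.0, 0.0, 0.0], 0.0, cox_alpha))
```

Because each weight h(k) depends on the data only through finitely many indicators and the value g(k − 1), the estimator is a finite composition of the kind described in the text.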

A similar argument can be used to show measurability of the process Γ̇_nθ in part (ii). Using arguments analogous to Dabrowska (2006), $| | Γ_{n θ_{0} + h_{n}} - Γ_{n θ_{0}} - h_{n} {\dot{Γ}}_{n θ_{0}} | | = O_{P} ({| | h_{n} | |}_{1}^{2})$ and ||Γ̇_{nθ₀+h_n}− Γ̇_nθ₀|| = O_P (||h_n||₁) = o_P (1) for any deterministic sequence h_n → 0 or a random $S_{n}^{P}$ - measurable sequence h_n →_P0. Therefore if θ̂ is an $S_{n}^{P}$ -measurable $\sqrt{n}$ - consistent estimator of θ₀, then setting h_n = θ̂ − θ₀, we have Ŵ₀(x) = Ŵ(x, θ₀) + rem_n(x), where ${rem}_{n} = \sqrt{n} [Γ_{n \hat{θ}} - Γ_{n θ_{0}} - (\hat{θ} - θ_{0}) {\dot{Γ}}_{n \hat{θ}}] = o_{P} (1)$ . For non-measurable h_n and θ̂_n, convergence is in outer probability.

Let us now assume that f_j(y, θ, z), j ∈ Inline graphic , are scalar Carathéodory integrands such that |f_j(y, θ, z)| ≤ ψ̃(||y||₁) and |f_j(y, θ, z) − f_j(y′, θ′, z)| ≤ [|θ − θ′| + ||y − y′||₁] max(ψ̃′(||y||₁), ψ̃′(||y′||₁)), where ψ̃ = ψ₁, ψ₂ and ψ̃′ = ψ₃ satisfy conditions 5.1. Put $S_{j} [f_{j}] (u, θ) = n^{- 1} \sum_{i = 1}^{n} S_{j . i} [f_{j}] (u, θ)$ , where S_{j.i}[f_j](u, θ) = Σ_m Y_{jmi}(u)(f_jα_j)(Γ_θ(u), θ, Z_{jmi}), and let s_j[f_j] = ES_j[f_j]. We write S_j[1] and s_j[1] when f_j ≡ 1, and set ê_j[f_j] = S_j[f_j]/S_j[1] and e_j[f_j] = s_j[f_j]/s_j[1].

Lemma 5.3

We have ||S_j[f_j]/s_j[1] − s_j[f_j]/s_j[1]|| → 0 a.s. for all j ∈ Inline graphic .

Proof

We have (S_j[f_j]/s_j[1])(x, θ) = ℙ_n g_{xθ}, where

g_{x θ} (D_{i}) = \frac{\sum_{m} Y_{jmi} (x) (f_{j} α_{j}) (Γ_{θ} (x), θ, Z_{jmi})}{E \sum_{m} Y_{jmi} (x) α_{j} (Γ_{θ} (x), θ, Z_{jmi})} .

The conditions 5.1 imply that there exist constants C₁ and C₂ (dependent on the functions ψ̃, ψ̃′) such that

\begin{array}{l} ∣ g_{x θ} (D_{i}) - g_{x^{'} θ} (D_{i}) ∣ \leq C_{1} [∣ Y_{j . i} (x^{'}) - Y_{j . i} (x) ∣ + \\ Y_{j . i} (0) (∣ {E N}_{j . i} (x) - {E N}_{j . i} (x^{'}) ∣ + ∣ E Y_{j . i} (x) - E Y_{j . i} (x^{'}) ∣)], \\ ∣ g_{x θ} (D_{i}) - g_{x θ^{'}} (D_{i}) ∣ \leq ∣ θ - θ^{'} ∣ C_{2} Y_{j . i} (0) [1 + {E N}_{j . i} (τ) + E Y_{j . i} (0)] . \end{array}

Define G(D_i) = Y_{j.i}(0)[C₂ diam Θ + C₁][1 + EN_{j.i}(τ) + EY_{j.i}(0)] + g_{x₀θ₀}(D_i), where (x₀, θ₀) is an arbitrary point in [0, τ] × Θ. Let θ_p, p = 1, …, ℓ, with ℓ = O((diam Θ/ε)^d), be centers of balls B(θ_p, ε) of radius ε covering the set Θ. Since EN_{j.i} is an increasing continuous function and EY_{j.i} is a decreasing càglàd function, we can construct a finite partition 0 = x₀ < x₁ < … < x_k = τ such that the intervals I_r = [x_{r−1}, x_r], r = 1, …, k, satisfy EN_{j.i}(I_r) ≤ εEN_{j.i}(τ) and E|Y_{j.i}(I_r)| ≤ εEY_{j.i}(0). Let x_r be the center of the interval I_r. Then for each x ∈ I_r and θ ∈ B(θ_p, ε), we have ||g_{xθ}(D_i) − g_{x_rθ_p}(D_i)||_{P,1} ≤ ε||G(D_i)||_{P,1}. It follows that the class of functions Inline graphic = {g_{xθ}: x ∈ [0, τ], θ ∈ Θ} is Euclidean for the envelope G(D_i) and Glivenko-Cantelli.

Lemma 5.4

For j ∈ Inline graphic , define rem_j(x, θ) = [V_{jn} − V_{1jn}](x, θ) and

B_{j} (x, θ) = \int_{0}^{x} [{\hat{e}}_{j} [f_{j}] - e_{j} [f_{j}]] (u, θ) M_{j . .} (d u, θ),

where f_j satisfies assumptions of Lemma 5.3. Then $| | \sqrt{n} {rem}_{j} | | = o_{P} (1)$ and $| | \sqrt{n} B_{j} | | = o_{P} (1)$ .

Proof

For the sake of convenience write rem = rem_j and B = B_j. Put η_j(u, θ) = [S_j/s_j](Γ_θ(u), θ, u) − 1. A little algebra shows that

\begin{array}{l} rem (x, θ) = - \int_{0}^{x} η_{j} (u, θ) \frac{[N_{j . .} - {E N}_{j . .}] (d u)}{s_{j} [1] (u, θ)} + \int_{0}^{x} η_{j}^{2} (u, θ) \frac{N_{j . .} (d u)}{S_{j} [1] (u, θ)} \\ = {rem}_{1} (x, θ) + {rem}_{2} (x, θ) . \end{array}

We have rem₂(x, θ) = O_P (1)rem₃(τ, θ), where

{rem}_{3} (x, θ) = \int_{0}^{x} η_{j}^{2} (u, θ) \frac{[N_{j . .} - {E N}_{j . .}] (d u)}{s_{j} [1] (u, θ)} + \int_{0}^{x} η_{j}^{2} (u, θ) \frac{{E N}_{j . .} (d u)}{s_{j} [1] (u, θ)} .

In addition,

\begin{array}{l} B (x, θ) = \int_{0}^{x} (\frac{S_{j} [f_{j}] - S_{j} [1] e_{j} [f_{j}]}{s_{j} [1]}) (u, θ) [N_{j . .} - {E N}_{j . .}] (d u) \\ - \int_{0}^{x} [(\frac{S_{j} [f_{j}] - s_{j} [f_{j}]}{s_{j} [1]}) η_{j}] (u, θ) [N_{j . .} - {E N}_{j . .}] (d u) \\ - \int_{0}^{x} [(\frac{S_{j} [f_{j}] - s_{j} [f_{j}]}{s_{j} [1]}) η_{j}] (u, θ) {E N}_{j . .} (d u) \\ + \int_{0}^{x} S_{j} [f_{j}] (u, θ) {rem}_{2} (d u, θ) = \sum_{p = 1}^{4} B_{p} (x, θ) . \end{array}

We have B₄(x, θ) = O_P(1)B₅(θ), where

B_{5} (θ) = \int_{0}^{τ} (S_{j} [∣ f_{j} ∣] - s_{j} [∣ f_{j} ∣]) (u, θ) {rem}_{3} (d u, θ) + \int_{0}^{τ} s_{j} [∣ f_{j} ∣] (u, θ) {rem}_{3} (d u, θ) .

These expressions can be rewritten as V processes of degree r + 1, r ≤ 3

V_{n, r + 1} (g) = \frac{1}{n^{r + 1}} \sum_{i_{r + 1}} g (D_{i_{r + 1}}), g \in G,

where the sum extends over all (r + 1)-tuples D_{i_{r+1}} = (D_{i₁}, …, D_{i_{r+1}}), i_{r+1} = (i₁, …, i_{r+1}), i_j ∈ {1, …, n}. The kernels g vary over the class G = {g_t: t ∈ T}, where for t = (x, θ) we have

g_{t} (D_{i_{r + 1}}) = \int_{0}^{x} \prod_{ℓ = 1}^{r} [h_{ℓ} (D_{i_{ℓ}}, θ, u) - {E h}_{ℓ} (D_{i_{ℓ}}, θ, u)] [N_{j . i_{r + 1}} - {E N}_{j . i_{r + 1}}] (d u)

(5.4)

g_{t} (D_{i_{r + 1}}) = \int_{0}^{x} \prod_{ℓ = 1}^{r + 1} [h_{ℓ} (D_{i_{ℓ}}, θ, u) - {E h}_{ℓ} (D_{i_{ℓ}}, θ, u)] {E N}_{j .} (d u) .

(5.5)

Here h_ℓ(D_{i_ℓ}, θ, u) are functions of the form S_j[f_j]/s_j[1], S_j[1]/s[1] and $(\sqrt{s_{j} [∣ f_{j} ∣]}) S_{j} [1] / s [1]$ . In all cases, there exists a constant C such that h_ℓ(D_i, θ, u) ≤ CY_{j.i}(u) and |h_ℓ(D_i, θ, u) − h_ℓ(D_i, θ′, u)| ≤ |θ − θ′|CY_{j.i}(u). Therefore, for any sequence D_{i_{r+1}} = (D_{i₁}, …, D_{i_{r+1}}), we also have

\begin{array}{l} ∣ g_{x θ} - g_{x^{'} θ} ∣ (D_{i_{r + 1}}) \leq ∣ G (D_{i_{r + 1}}, x) - G (D_{i_{r + 1}}, x^{'}) ∣, \\ ∣ g_{x θ} - g_{x θ^{'}} ∣ (D_{i_{r + 1}}) \leq ∣ θ - θ^{'} ∣ G (D_{i_{r + 1}}, τ), \end{array}

where

G (D_{i_{r + 1}}, x) = \int_{0}^{x} \prod_{ℓ = 1}^{r} [H_{ℓ} (D_{i_{ℓ}}, u) + {E H}_{ℓ} (D_{i_{ℓ}}, u)] [N_{j . i_{r + 1}} + {E N}_{j . i_{r + 1}}] (d u)

and H_ℓ(D_i, u) = CY_{j.i_ℓ}(u), ℓ = 1, …, r for some constant C.

Let {U_{r+1,n}(g_t): t ∈ T} denote the U-process associated with the kernels (5.4)–(5.5). It is easy to see that U_{r+1,n}(g_t) forms a canonical process. For D_{r+1} = (D₁, …, D_{r+1}), we have EG^p(D_{r+1}) < ∞ for p = 1 + 1/(2r + 1). Therefore, by the Marcinkiewicz-Zygmund law in Teicher (1998) and Lemma A.1 in Dabrowska (2009), $\sqrt{n} {sup}_{t} ∣ U_{r + 1, n} (g_{t}) ∣ \to_{P} 0$ . By the Marcinkiewicz-Zygmund theorem in de la Peña and Giné (1999), we also have $\sqrt{n} {sup}_{t} ∣ V_{r + 1, n} (g_{t}) - U_{r + 1, n} (g_{t}) ∣ \to 0$ a.s. because

E G {(D_{i_{r + 1}})}^{2 d (i_{r + 1}) / (2 r + 1)} < \infty,

where i_{r+1} = (i₁, …, i_{r+1}) and d = d(i_{r+1}) is the number of distinct indices among {i₁, …, i_{r+1}}, d = 1, …, r, r ≤ 3.
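The difference between a V-statistic, which sums over all index tuples, and the corresponding U-statistic, which sums only over tuples of distinct indices, can be seen in a small degree-2 example. The Python sketch below is an illustration only; the kernel `half_squared_difference` and the data are hypothetical and unrelated to the kernels (5.4)–(5.5).

```python
from itertools import product


def v_statistic(kernel, data):
    """Degree-2 V-statistic: average of the kernel over all pairs (i, j), ties included."""
    n = len(data)
    return sum(kernel(a, b) for a, b in product(data, repeat=2)) / n ** 2


def u_statistic(kernel, data):
    """Degree-2 U-statistic: average of the kernel over pairs with distinct indices."""
    n = len(data)
    total = sum(
        kernel(data[i], data[j]) for i in range(n) for j in range(n) if i != j
    )
    return total / (n * (n - 1))


def half_squared_difference(a, b):
    """Kernel whose U-statistic is the unbiased sample variance."""
    return 0.5 * (a - b) ** 2


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0]
    print(u_statistic(half_squared_difference, sample))
    print(v_statistic(half_squared_difference, sample))
```

For this kernel the diagonal terms vanish, so the two statistics differ only by the normalizing factor (n − 1)/n; for general kernels the diagonal terms contribute as well, which is why a moment condition on the tuples with repeated indices is needed to compare the two processes.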

We now denote by ||B||_v the variation norm of a d × q matrix of functions B(x) = [b_{kl}(x)], x ∈ [0, τ]. For any interval I ⊆ [0, τ], ${| | B | |}_{v} (I) = sup \sum_{i = 1}^{m} {| | B (x_{i}) - B (x_{i - 1}) | |}_{1}$ , where the supremum is taken over finite partitions x₀ < x₁ < … < x_m of I.
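As a small illustration of the variation norm, the Python sketch below evaluates the sum Σ ||B(x_i) − B(x_{i−1})||₁ over one fixed partition. It assumes that ||·||₁ is the entrywise ℓ₁ norm of a matrix, which is an assumption made here for concreteness; the function names are hypothetical. For a fixed partition the sum is a lower bound for the supremum, and it is exact for a piecewise constant function when the partition contains all of its jump points.

```python
def l1_norm(matrix):
    """Entrywise l1 norm of a matrix given as a list of rows (assumed convention)."""
    return sum(abs(v) for row in matrix for v in row)


def variation_on_partition(values):
    """Sum of ||B(x_i) - B(x_{i-1})||_1 over consecutive partition points.

    values : list of matrices B(x_0), B(x_1), ..., B(x_m), each a list of rows
    """
    total = 0.0
    for prev, curr in zip(values, values[1:]):
        diff = [
            [c - p for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev, curr)
        ]
        total += l1_norm(diff)
    return total


if __name__ == "__main__":
    path = [[[0.0, 0.0]], [[1.0, -1.0]], [[1.0, 2.0]]]
    print(variation_on_partition(path))
```

Refining a partition can only increase the sum, by the triangle inequality, which is why the norm is defined as a supremum over partitions.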

Further, let B(θ₀, ε_n) be a ball centered at θ₀ of radius ε_n, where ε_n ↓ 0 and $\sqrt{n} ε_{n} ↑ \infty$ . Suppose that ϕ_θ(x) is a d × q matrix of functions, with columns of the form $\int_{0}^{x} g_{j θ} d Γ_{θ, j}$ , such that ||ϕ_{θ₀}||_v = O(1). Let ϕ_{nθ} be a sequence of consistent estimators such that

(i) ϕ_{nθ}(x) is a càdlàg or càglàd function (jointly in (x, θ)), continuous with respect to θ;
(ii) lim sup_n sup{||ϕ_{nθ}||_v: θ ∈ B(θ₀, ε_n)} = O_P(1);
(iii) sup{||ϕ_{nθ} − ϕ_{θ₀}||_∞: θ ∈ B(θ₀, ε_n)} = o_P(1) or
(iii′) ϕ_{nθ} − ϕ_{nθ′} = (θ − θ′)ψ_{nθ,θ′}, where lim sup_n sup{||ψ_{nθ,θ′}||_v: θ, θ′ ∈ B(θ₀, ε_n)} = O_P(1).

If ϕ_nθ is a jointly $S_{n}^{P} \otimes B (T)$ measurable estimator then conditions (ii)–(iii) are assumed to hold in probability. If this is not the case then the conditions (ii)–(iii) are taken to hold in outer probability.

Lemma 5.5

(1) If ϕ_{nθ}(x) is a measurable process satisfying (i)–(ii) and (iii) or (iii′), then with probability tending to 1, the equation U_{nϕ_n}(θ) = 0 has a consistent root θ̂ in the ball B(θ₀, ε_n). In addition, under the condition (iii′), the score equation has a unique root in B(θ₀, ε_n), with probability tending to 1.
(2) If ϕ_{nθ} is not measurable, then the statements in part (1) hold with inner probability tending to 1.
(3) If θ̃ is an arbitrary consistent estimator of θ₀, then the equation U_{nϕ̃_n}(θ) = 0, where ϕ̃_n(x) = ϕ_{nθ̃}(x), has a unique solution θ̂, with (inner) probability tending to 1, and U_{nϕ_n}(θ̂) = o_P^*(n^{−1/2}).

In all three cases, $\hat{Ξ} = \sqrt{n} (\hat{θ} - θ_{0})$ and the process ${\hat{W}}_{0} = {\sqrt{n} [Γ_{n \hat{θ}} - Γ_{θ_{0}} - {(\hat{θ} - θ_{0})}^{T} {\dot{Γ}}_{n \hat{θ}}] (x) : x \leq τ}$ converge weakly in R^d×ℓ^∞([0, τ]× Inline graphic ) to a mean zero Gaussian process defined in the statement of Proposition 3.1.

Proof

Case (1)

Write U_n(θ) = U_{nϕ_n}(θ) for short. Set b̃_{jmi}(Γ_θ(u), θ, u) = b̃_{jmi1}(Γ_θ(u), θ, u) − ϕ_{θ₀}(u)b̃_{jmi2}(Γ_θ(u), θ, u), where

\begin{array}{l} {\tilde{b}}_{jmi 1} (Γ_{θ} (u), θ, u) = {\dot{ℓ}}_{j} (Γ_{θ} (u), θ, Z_{jmi}) - e_{j} [{\dot{ℓ}}_{j}] (u, θ), \\ {\tilde{b}}_{jmi 2} (Γ_{θ} (u), θ, u) = ℓ_{j}^{'} (Γ_{θ} (u), θ, Z_{jmi}) - e_{j} [ℓ_{j}^{'}] (u, θ) . \end{array}

Define b̄_{jmi}(Γ_θ(u), θ, u), b̄_{jmi1}(Γ_θ(u), θ, u) and b̄_{jmi2}(Γ_θ(u), θ, u) using similar expressions with e_j[ℓ̇_j] and $e_{j} [ℓ_{j}^{'}]$ replaced by ê_j[ℓ̇_j] and ${\hat{e}}_{j} [ℓ_{j}^{'}]$ . We have $U_{n} (θ) = \sum_{p = 1}^{4} U_{p n} (θ)$ , where

\begin{array}{l} U_{1 n} (θ) = \frac{1}{n} \sum_{i = 1}^{n} \sum_{j} \sum_{m} \int_{0}^{τ} {\tilde{b}}_{jmi} (Γ_{θ} (u), θ, u) M_{jmi} (d u, θ), \\ U_{2 n} (θ) = \sum_{q = 1}^{2} \int_{0}^{τ} r_{n q} (d u, θ) {[Γ_{n θ} - Γ_{θ}]}^{T} (u) = \sum_{q = 1}^{2} U_{2 n; q} (θ), \\ U_{3 n} (θ) = - \sum_{j} \int_{0}^{τ} [({\hat{e}}_{j} [{\dot{ℓ}}_{j}] - e_{j} [{\dot{ℓ}}_{j}]) (u, θ) - ϕ_{θ_{0}} (u) ({\hat{e}}_{j} [ℓ_{j}^{'}] - e_{j} [ℓ_{j}^{'}]) (u, θ)] M_{j . .} (d u, θ), \\ U_{4 n} (θ) = - \frac{1}{n} \sum_{i = 1}^{n} \sum_{j} \sum_{m} \int_{0}^{τ} [ϕ_{n θ} - ϕ_{θ_{0}}] (u) {\hat{b}}_{jmi 2} (Γ_{n θ} (u), θ, u) N_{jmi} (d u), \end{array}

and

\begin{array}{l} r_{n 1} (x, θ) = \frac{1}{n} \sum_{i = 1}^{n} \sum_{j} \sum_{m} \int_{0}^{x} {\bar{b}}_{jmi}^{'} (Γ_{θ} (u), θ, u) N_{jmi} (d u), \\ r_{n 2} (x, θ) = \frac{1}{n} \sum_{i = 1}^{n} \sum_{j} \sum_{m} \int_{0}^{1} \int_{0}^{x} {\bar{b}}_{jmi}^{'} (Γ_{n θ}^{λ} (u), θ, u) N_{jmi} (d u) d λ - r_{n 1} (x, θ) . \end{array}

Here $Γ_{n θ}^{λ} = Γ_{θ} + λ (Γ_{n θ} - Γ_{θ})$ for λ ∈ (0, 1). We have $U_{2 n; 2} (θ_{0}) = \int_{0}^{τ} O_{P} ({| | Γ_{n θ_{0}} - Γ_{θ_{0}} | |}^{2}) \sum_{j} N_{j . .} (d u) = o_{P} (n^{- 1 / 2})$ . Moreover, r_{n1}(x, θ₀) converges almost surely to

r (x, θ_{0}) = \sum_{j} \int_{0}^{x} [{cov}_{j} (ℓ_{j}^{'}, {\dot{ℓ}}_{j}) (u, θ_{0}) - ϕ_{θ_{0}} (u) {cov}_{j} (ℓ_{j}^{'}, ℓ_{j}^{'}) (u, θ_{0})] {E N}_{j . .} (d u)

uniformly in x ≤ τ. Lemma 5.2 and integration by parts imply that the terms [ $\sqrt{n} U_{1 n} (θ_{0}), \sqrt{n} U_{2 n; 1} (θ_{0})$ ] converge weakly to a pair of independent normal variables with mean zero and covariances Σ₀(θ₀) and Σ₂(θ₀) − Σ₀(θ₀), respectively. By Lemmas 5.3–5.4, we also have U_{3n}(θ₀) = o_P(n^{−1/2}). Finally,

U_{4 n} (θ) = - \sum_{p = 1}^{3} \int_{0}^{τ} [ϕ_{n θ} - ϕ_{θ_{0}}] (u) B_{p n} (d u, θ) = \sum_{p = 1}^{3} U_{4 n; p} (θ),

where

\begin{array}{l} B_{1 n} (x, θ) = \frac{1}{n} \sum_{i = 1}^{n} \sum_{j} \sum_{m} \int_{0}^{x} [{\hat{b}}_{2 jmi} (Γ_{n θ} (u), θ, u) - {\hat{b}}_{2 jmi} (Γ_{θ} (u), θ, u)] N_{jmi} (d u), \\ B_{2 n} (x, θ) = - \sum_{j} \int_{0}^{x} ({\hat{e}}_{j} [ℓ_{j}^{'}] - e_{j} [ℓ_{j}^{'}]) (u, θ) M_{j . .} (d u, θ), \\ B_{3 n} (x, θ) = \frac{1}{n} \sum_{i = 1}^{n} \sum_{j} \sum_{m} \int_{0}^{x} {\tilde{b}}_{2 jmi} (Γ_{θ} (u), θ, u) M_{jmi} (d u, θ) . \end{array}

By Lemmas 5.2–5.4, we have $\sqrt{n} U_{4 n; 2} (θ) = o_{P} (1)$ and $\sqrt{n} U_{4 n; 1} (θ) = \sum_{j} \int_{0}^{τ} O_{P} (\sqrt{n} {| | Γ_{n θ} - Γ_{θ} | |}_{1} (u) {| | ϕ_{n θ} - ϕ_{θ_{0}} | |}_{1} (u)) N_{j . .} (d u) = o_{P} (1)$ , uniformly in θ ∈ B(θ₀, ε_n). On the other hand, at θ = θ₀, { $\sqrt{n} B_{3 n} (x, θ_{0}) : x \leq τ$ } is a sum of iid mean zero processes. Its finite dimensional distributions have mean zero and a finite variance-covariance matrix, and converge weakly to mean zero Gaussian variables. Each component of B_{3n}(x, θ₀) is a measurable process which can be represented as a finite linear combination of càdlàg monotone functions of x with a square integrable envelope satisfying (5.2). The same argument as in Lemma 5.2 implies that the process $\sqrt{n} B_{3 n} (x, θ_{0})$ converges weakly to a mean zero Gaussian process with sample paths continuous with respect to the variance semi-metric. The space of functions continuous with respect to the variance semi-metric is isometric to the space C([0, τ])^q. By the almost sure representation theorem and an integration by parts argument similar to that in Bilias et al. (1997), we have $\sqrt{n} U_{4 n; 3} (θ_{0}) = o_{P} (1)$ .

Set ${\hat{U}}_{n} (θ) = \sum_{j = 1}^{3} U_{j n} (θ)$ . Some elementary algebra shows that for θ, θ′ ∈ B(θ₀, ε_n), we have Û_n(θ) = Û_n(θ′) + (Σ_n(θ₀) + rem_{0n}(θ, θ′))(θ − θ′), where Σ_n(θ₀) is a matrix which converges in probability to −Σ₁(θ₀). The matrix Σ₁(θ) is defined in Section 3 and is non-singular. Further, U_{4n}(θ) − U_{4n}(θ′) = rem_{2n}(θ, θ′)(θ − θ′) + rem_{3n}(θ, θ′) + O(|θ − θ₀| ∨ |θ′ − θ₀|)rem_{4n}(θ, θ′). Setting ${rem}_{1 n} (θ, θ^{'}) = I + Σ_{1}^{- 1} (θ_{0}) [Σ_{n} (θ_{0}) + {rem}_{0 n} (θ, θ^{'})]$ and b_{qn} = sup{|rem_{qn}(θ, θ′)|: θ, θ′ ∈ B(θ₀, ε_n)}, q = 1, …, 4, we have b_{1n} = o_P(1) and b_{2n} = o_P(1). Under the condition (iii′), rem_{qn} ≡ 0 ≡ b_{qn}, q = 3, 4, while under the condition (iii), b_{3n} = o_P(n^{−1/2}) and b_{4n} = o_P(1).

Put a_n = b_{1n} + b_{2n} + b_{4n} and A_n = b_{5n} + b_{3n}, where b_{5n} = |Σ(θ₀)⁻¹Û_n(θ₀)| = O_P(n^{−1/2}). Let 0 < η < 1/2 and 0 < η′ < 1 be given. By asymptotic tightness of A_n, we can find a compact set K = K(η) and n₀ such that for all n ≥ n₀ and all open sets G containing K, we have $P_{n} (\sqrt{n} A_{n} \notin G) < η$ and P_n(a_n > η′) < η. Therefore, we also have $P_{n} (\sqrt{n} A_{n} > M (1 - η^{'})) < η$ for all finite M ≥ M₀, where M₀ = M₀(η) is a large enough finite nonnegative constant. Since $\sqrt{n} ε_{n} ↑ \infty$ and ε_n ↓ 0, by eventually increasing n₀, we can assume that for n ≥ n₀, we have B(θ₀, ε_n) ⊂ Θ and $M < \sqrt{n} ε_{n}$ . Consequently, the set E_n given by E_n = {ω₀: A_n(ω₀)/(1 − a_n(ω₀)) < ε_n, a_n(ω₀) ≤ η′} satisfies P_n(E_n) ≥ 1 − 2η for all n ≥ n₀.

For n ≥ n₀, consider the set-valued mapping H_n: Inline graphic ↪ R^d given by

\begin{array}{l} H_{n} (ω_{0}) = \bar{B} (θ_{0}, \frac{A_{n} (ω_{0})}{1 - a_{n} (ω_{0})}) = {θ : ∣ θ - θ_{0} ∣ \leq \frac{A_{n} (ω_{0})}{1 - a_{n} (ω_{0})}} if ω_{0} \in E_{n}, \\ = \emptyset if ω_{0} \notin E_{n} . \end{array}

The graph of H_n, graph H_n = {(ω₀, θ): θ ∈ H_n(ω₀)}, is $S_{n}^{P} \otimes B (Θ)$ -measurable and $dom H_{n} = E_{n} \in S_{n}^{P}$ . Further, let $g_{n} (ω_{0}, θ) = θ + Σ_{1}^{- 1} (θ_{0}) U_{n} (ω_{0}, θ)$ . Then g_n is $S_{n}^{P} \otimes B (Θ)$ measurable, because it is continuous with respect to θ for fixed ω₀ and $S_{n}^{P}$ -measurable for fixed θ. It follows that the set-valued mapping

\begin{array}{l} C_{n} (ω_{0}) = {θ : g_{n} (ω_{0}, θ) = 0 and θ \in H_{n} (ω_{0})} for ω_{0} \in E_{n}, \\ = \emptyset for ω_{0} \notin E_{n} \end{array}

is closed-valued and has an $S_{n}^{P} \otimes B (Θ)$ - measurable graph. We have domC_n = E_n: for fixed ω₀ ∈ E_n, H_n(ω₀) is a closed ball, g_n(ω₀, θ) is continuous and maps H_n(ω₀) into itself. By Brouwer’s fixed point theorem, C_n(ω₀) ≠ ∅. Thus E_n ⊆ domC_n, while the reversed inclusion is obvious.

Further, for any root θ̂ in domC_n, we have ${∣ \sqrt{n} (\hat{θ} - θ_{0}) ∣}^{*} \leq \sqrt{n} A_{n} / (1 - a_{n}) = O_{P} (1)$ , and $\sqrt{n} (\hat{θ} - θ_{0}) = Σ {(θ_{0})}^{- 1} \sqrt{n} {\hat{U}}_{n} (θ_{0}) + o_{P^{*}} (n^{- 1 / 2})$ , so that $\sqrt{n} (\hat{θ} - θ_{0})$ converges in law to the normal distribution given in Section 3. An argument similar to Bickel et al. (1993, p. 517) also shows that under the condition (iii′), g_n(ω₀, θ) is a contraction on H_n(ω₀), ω₀ ∈ E_n, with contraction coefficient a_n(ω₀). Thus in this case, the root is unique: C_n(ω₀) = {θ̂(ω₀)} for ω₀ ∈ E_n and n ≥ n₀.
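The role of the map θ ↦ θ + Σ₁⁻¹U_n(θ) can be illustrated by a scalar fixed-point iteration: a root of the score equation is a fixed point of this map, and when the map is a contraction the iteration converges to the unique root. The Python sketch below is a toy example under stated assumptions; the score function `toy_score` and the value 2.0 used for Σ₁ are hypothetical and chosen so that the contraction coefficient is at most 0.1.

```python
import math


def fixed_point_root(score, sigma1, theta0, tol=1e-12, max_iter=200):
    """Iterate theta -> theta + score(theta) / sigma1 until successive values agree.

    score  : scalar score function U(theta)
    sigma1 : positive constant approximating minus the derivative of the score
    """
    theta = theta0
    for _ in range(max_iter):
        theta_new = theta + score(theta) / sigma1
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    raise RuntimeError("fixed-point iteration did not converge")


def toy_score(theta):
    """Hypothetical score: linear part with slope -2 plus a small smooth perturbation."""
    return 2.0 * (1.5 - theta) + 0.2 * math.sin(theta)


if __name__ == "__main__":
    root = fixed_point_root(toy_score, sigma1=2.0, theta0=0.0)
    print(root, toy_score(root))
```

Here the iteration map equals 1.5 + 0.1 sin θ, so its derivative is bounded by 0.1 in absolute value, and the error shrinks by at least that factor at each step.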

Case (2)

If the estimators ϕ_{nθ} are not $S_{n}^{P} \otimes B (T)$ measurable, then the score function splits into two parts: Ũ_n(θ) = Û_n(θ) + U_{4n}(θ). The term Û_n(θ) remains $S_{n}^{P} \otimes B (Θ)$ measurable, while the second term is not. However, b_{3n} = o_P^*(n^{−1/2}) and a_n = o_{P^*}(1), while b_{5n} = |Σ(θ₀)⁻¹Û_n(θ₀)| = O_P(n^{−1/2}). In this case, the set E_n satisfies lim inf_n P_{n,*}(E_n) ≥ 1 − 2η and the closed ball B̄(θ₀, A_n/(1 − a_n)) is contained in B(θ₀, ε_n) with inner probability tending to 1.

Case (3)

We write Ũ_n(θ) for the modified score function obtained by substituting ϕ̃_n(x) = ϕ_{nθ̃}(x) in place of ϕ_{nθ}. Suppose that θ̃ is $S_{n}^{P}$ -measurable and ϕ_{nθ}(x) is $S_{n}^{P} \otimes B (T)$ measurable. Then the plug-in estimator ϕ_{nθ̃}(x) is $S_{n}^{P} \otimes B ([0, τ])$ measurable and the modified score process Ũ_n(θ) is $S_{n}^{P} \otimes B (Θ)$ measurable. Moreover, we have Ũ_n(θ) = Û_n(θ) + Ũ_{4n}(θ), where the remainder Ũ_{4n}(θ) satisfies $\sqrt{n} [U_{4 n} (θ) - {\tilde{U}}_{4 n} (θ)] = o_{P} (1 + \sqrt{n} ∣ θ - θ_{0} ∣)$ , uniformly in θ ∈ B(θ₀, ε_n), and ${\tilde{U}}_{4 n} (θ) - {\tilde{U}}_{4 n} (θ^{'}) = (θ - θ^{'}) {\tilde{rem}}_{2 n} (θ, θ^{'}), sup {{\tilde{rem}}_{2 n} (θ, θ^{'}) : θ, θ^{'} \in B (θ_{0}, ε_{n})} = o_{P} (1)$ . With probability tending to 1, the modified equation has a unique root θ̂ in a compact random ball contained in B(θ₀, ε_n) and U_n(θ̂) = o_P^*(n^{−1/2}). On the other hand, if either θ̃ or ϕ_{nθ} is not measurable, then this continues to hold, except that the modified equation has a unique solution with inner probability tending to one.

Under the assumptions of part (1), measurable selection theorems (Wagner, 1976) ensure that there exists at least one function $\hat{\hat{θ}} : S_{n} \to R^{d}$ such that $\hat{\hat{θ}} (ω_{0}) \in C_{n} (ω_{0})$ whenever ω₀ ∈ E_n and $\hat{\hat{θ}}$ is measurable with respect to $S_{n}^{P}$ . This also applies to part (3), provided θ̃ and ϕ_{nθ} are $S_{n}^{P}$ -measurable.

5.4 Proof of Proposition 3.2

With some abuse of notation, set V = [V_j, j ∈ Inline graphic ] where V (x) = V (x, θ₀) and V (x, θ) is the Gaussian process of Lemma 5.1. Under the assumption that θ₀ is the true parameter of the modulated renewal process, the process V corresponds to a vector of independent time-transformed Brownian motions with covariance

cov (V_{j} (x), V_{j} (y)) = C_{j} (x \land y) and cov (V_{j} (x), V_{ℓ} (y)) = 0 if j \neq ℓ .

Similarly, let V̌ = [V̌_j: j ∈ Inline graphic ] be equal to $\overset{ˇ}{V} (x) = \sqrt{n} V_{1 n} (x, θ_{0})$ , where V_{1n}(x, θ) is defined as in Lemma 5.1. Thus the j-th component of V̌ is

{\overset{ˇ}{V}}_{j} (x) = \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{m} \int_{0}^{x} \frac{M_{jmi} (d u)}{s_{j} (Γ_{θ_{0}} (u -), θ_{0}, u)} .

Put ${\overset{ˇ}{V}}^{#} = [{\overset{ˇ}{V}}_{j}^{#} : j = 1, \dots, q]$ ,

{\overset{ˇ}{V}}_{j}^{#} (x) = \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{m} G_{m i} \int_{0}^{x} \frac{N_{jmi} (d u)}{s_{j} (Γ_{θ_{0}} (u -), θ_{0}, u)} .

Finally, let G₀ be an N(0, I_{d×d}) variable, independent of the (D_i, G_i)’s. Set $Ξ_{1}^{#} = Σ_{1}^{- 1} (θ_{0}) Σ_{0} {(θ_{0})}^{1 / 2} G_{0}$ and ${\hat{Ξ}}_{1}^{#} = {\hat{Σ}}_{1}^{- 1} (\hat{θ}) {\hat{Σ}}_{0} {(\hat{θ})}^{1 / 2} G_{0}$ . We have $E {\overset{ˇ}{V}}_{j}^{#} (x) = 0 = E {\overset{ˇ}{V}}_{j} (x)$ ,

\begin{array}{l} cov ({\overset{ˇ}{V}}_{j}^{#} (x), {\overset{ˇ}{V}}_{ℓ}^{#} (x^{'})) = cov ({\overset{ˇ}{V}}_{j} (x), {\overset{ˇ}{V}}_{ℓ} (x^{'})) = δ_{j l} C_{j θ_{0}} (x \land x^{'}), \\ cov ({\overset{ˇ}{V}}_{j}^{#} (x), {\overset{ˇ}{V}}_{ℓ} (x^{'})) = 0. \end{array}

(5.6)

Moreover, $Ξ_{1}^{#}$ is independent of D₁, …, D_n. This also means that it is independent of (V̌^#, V̌).
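The multiplier construction can be illustrated with a simplified Python sketch. It assumes one increment per subject, so that the multiplier process at a fixed x reduces to n^{−1/2} Σ_i G_i a_i with iid standard normal multipliers G_i, where the numbers a_i play the role of the integrals ∫ N_{jmi}(du)/s_j and are hypothetical. Conditionally on the data, this sum is mean zero normal with variance n^{−1} Σ_i a_i², which the simulation checks.

```python
import math
import random


def multiplier_draw(increments, rng):
    """One draw of n^{-1/2} * sum_i G_i * a_i with iid N(0, 1) multipliers G_i."""
    n = len(increments)
    return sum(rng.gauss(0.0, 1.0) * a for a in increments) / math.sqrt(n)


def conditional_variance(increments):
    """Variance of the multiplier sum given the data: n^{-1} * sum_i a_i^2."""
    return sum(a * a for a in increments) / len(increments)


def simulate_variance(increments, n_draws=20000, seed=0):
    """Monte Carlo estimate of the conditional variance, holding the data fixed."""
    rng = random.Random(seed)
    draws = [multiplier_draw(increments, rng) for _ in range(n_draws)]
    mean = sum(draws) / n_draws
    return sum((d - mean) ** 2 for d in draws) / (n_draws - 1)


if __name__ == "__main__":
    data_increments = [0.5, 1.2, 0.0, 0.8, 2.0]
    print(conditional_variance(data_increments), simulate_variance(data_increments))
```

Only the multipliers are redrawn across replications while the data-dependent increments stay fixed, which is what makes the conditional distribution of the multiplier process usable for setting confidence bands.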

We first consider unconditional weak convergence. By the central limit theorem and the strong law of large numbers, the finite dimensional distributions of the processes (V̌, V̌^#) converge weakly to the finite dimensional distributions of (V, V^#), two independent vectors of Brownian motions with variance functions C_{j, θ₀}, j = 1, …, q.

For each j = 1, …, q, the process ${\overset{ˇ}{V}}_{j}^{#}$ can be represented as ${\overset{ˇ}{V}}_{j}^{#} (x) = n^{- 1 / 2} \sum_{i = 1}^{n} f_{x}^{(j)} (G_{i}, D_{i})$ , where

f_{x}^{(j)} (G_{i}, D_{i}) = \sum_{m} G_{m i} \int_{0}^{x} \frac{N_{jmi} (d u)}{s_{j} (Γ_{θ_{0}} (u -), θ_{0}, u)} .

The class of functions $F_{j} = {f_{x}^{(j)} (G_{i}, D_{i}) : x \in [0, τ]}$ has a square integrable envelope

F_{j} (G_{i}, D_{i}) = \sum_{m = 1} ∣ G_{m i} ∣ \int_{0}^{τ} \frac{N_{jmi} (d u)}{s_{j} (Γ_{θ_{0}} (u -), θ_{0}, u)}

and is Euclidean for this envelope because each $f_{x}^{(j)} \in F_{j}$ is a difference of two functions increasing in x and bounded by F_j(G_i, D_i). Thus the class F_j forms a Donsker class of functions. The union of these classes, F = ∪_j F_j, is Donsker as well. From Lemma 1, the process V̌ = {V̌_j(x): x ∈ [0, τ], j = 1, …, q} can also be represented as an empirical process over a Euclidean class of functions, and the union of the two classes forms a Donsker class. Consistency of the estimates (θ̂, Γ_{nθ̂}), Lemma 5.5 and a short integration by parts argument also yield ||V̂^# − V̌^#|| = o_P(1) in outer probability.

Write V̌^# as the empirical process V̌^# = ℙ_n f, f ∈ F. Further, let BL₁ be the collection of Lipschitz functions h from R^d × ℓ^∞(F) into [0, 1], such that |h(r, w) − h(r′, w′)| ≤ |r − r′| + ||w − w′|| for r, r′ ∈ R^d and w, w′ ∈ ℓ^∞(F). The set F is totally bounded with respect to the variance pseudo-metric d. Therefore, for fixed δ > 0, it can be covered by a finite number of d-balls of radius δ, say B(f_ℓ, δ), ℓ = 1, …, k = k(δ). Set V^# ∘ π_δ = ℙ_n π_δ(f), where π_δ(f) = f_ℓ for f ∈ B(f_ℓ, δ) (pick one f_ℓ for each f ∈ F). By the triangle inequality, we have

sup_{h \in {B L}_{1}} ∣ E_{G} h ({\hat{Ξ}}_{1}^{#}, {\hat{V}}^{#}) - E h (Ξ_{1}^{#}, V^{#}) ∣ \leq \sum_{r = 1}^{4} I_{r} (δ),

where

\begin{array}{l} I_{1} (δ) = sup_{h \in {B L}_{1}} ∣ E h (Ξ_{1}^{#}, V^{#} \circ π_{δ}) - E h (Ξ_{1}^{#}, V^{#}) ∣, \\ I_{2} (δ) = sup_{h \in {B L}_{1}} ∣ E h (Ξ_{1}^{#}, V^{#} \circ π_{δ}) - E_{G} h (Ξ_{1}^{#}, {\overset{ˇ}{V}}^{#} \circ π_{δ}) ∣, \\ I_{3} (δ) = sup_{h \in {B L}_{1}} ∣ E_{G} h (Ξ_{1}^{#}, {\overset{ˇ}{V}}^{#} \circ π_{δ}) - E_{G} h (Ξ_{1}^{#}, {\overset{ˇ}{V}}^{#}) ∣, \\ I_{4} (δ) = sup_{h \in {B L}_{1}} ∣ E_{G} h ({\hat{Ξ}}_{1}^{#}, {\hat{V}}^{#}) - E_{G} h (Ξ_{1}^{#}, {\overset{ˇ}{V}}^{#}) ∣ . \end{array}

For given ε > 0, we can choose δ₀ so that I₁(δ) < ε for all δ < δ₀. The second term converges in outer probability to 0, for any δ. This follows from weak convergence of finite dimensional distributions of V̌^# and the same argument as in Van der Vaart and Wellner (1996, p. 182), except that in our setting, the Lindeberg condition of their Lemma 2.9.5 is not needed to verify conditional weak convergence of finite dimensional distributions. We also have $I_{3} (δ) \leq E_{G}^{*} {| | {\overset{ˇ}{V}}^{#} \circ π_{δ} - {\overset{ˇ}{V}}^{#} | |}_{F_{δ}} \leq \sum E_{G}^{*} {| | {\overset{ˇ}{V}}^{#} | |}_{F_{δ}}$ , where F_δ = {f − f′: f, f′ ∈ F, d(f − f′) < δ}. Since F forms a Euclidean class of functions with a square integrable envelope, we have ${lim}_{δ ↓ 0} lim {sup}_{n} E^{*} I_{3} (δ) \leq {lim}_{δ ↓ 0} lim {sup}_{n} E^{*} E_{G}^{*} {| | {\overset{ˇ}{V}}^{#} | |}_{F_{δ}} = 0$ . Finally, the term I₄(δ) does not depend on δ, and we have $I_{4} (δ) \leq 2 P_{G}^{*} (∣ {\hat{Ξ}}_{1} - Ξ_{1} ∣ + | | {\hat{V}}^{#} - {\overset{ˇ}{V}}^{#} | | > ε) + ε$ . By unconditional convergence, we have I₄(δ) → 0 in outer probability.

Finally, set $Ψ ({\hat{Ξ}}_{1}^{#}, {\hat{V}}^{#}) = [{\overset{ˇ}{Ξ}}^{#}, {\overset{ˇ}{W}}_{0}^{#}]$ , where

\begin{array}{l} {\overset{ˇ}{Ξ}}^{#} = {\hat{Ξ}}_{1}^{#} - Σ_{1}^{- 1} (θ_{0}) \sum_{j} \int_{0}^{τ} ρ_{j, ϕ} (u, θ_{0}) {E N}_{j . .} (d u) {\overset{ˇ}{W}}_{0}^{#} {(u)}^{T}, \\ {\overset{ˇ}{W}}_{0}^{#} (x) = \int_{0}^{x} {\hat{V}}^{#} (d u) P_{θ_{0}} (u, x) \\ = {\hat{V}}^{#} (x) - \int_{0}^{x} {\hat{V}}^{#} (u -) Q_{θ_{0}} (d u) P_{θ_{0}} (u, x) . \end{array}

The estimates [Ξ̂^#, ${\hat{W}}_{0}^{#}$ ] defined in Section 4 are $[{\hat{Ξ}}^{#}, {\hat{W}}_{0}^{#}] = \hat{Ψ} ({\hat{Ξ}}_{1}^{#}, {\hat{V}}^{#})$ , where Ψ̂ is the sample analogue of Ψ obtained by plugging in the estimates Inline graphic , Q̂_θ̂, ρ_{j, ϕ_n}(·, θ̂₀). By the continuous mapping theorem, unconditionally, $Ψ ({\hat{Ξ}}_{1}^{#}, {\hat{V}}^{#}) \Rightarrow Ψ (Ξ_{1}^{#}, V^{#}) = (Ξ^{#}, W_{0}^{#})$ . Applying the triangle inequality once more, we have ${sup}_{h \in {B L}_{1}} ∣ E_{G} h ({\hat{Ξ}}^{#}, {\hat{W}}_{0}^{#}) - E h (Ξ^{#}, W_{0}^{#}) ∣ \leq J_{1} + J_{2}$ , where

\begin{array}{l} J_{1} = sup_{h \in {B L}_{1}} ∣ E_{G} h ({\hat{Ξ}}^{#}, {\hat{W}}_{0}^{#}) - E_{G} h ({\overset{ˇ}{Ξ}}^{#}, {\overset{ˇ}{W}}_{0}^{#}) ∣, \\ J_{2} = sup_{h \in {B L}_{1}} ∣ E_{G} h ({\overset{ˇ}{Ξ}}^{#}, {\overset{ˇ}{W}}_{0}^{#}) - E h (Ξ^{#}, W_{0}^{#}) ∣ . \end{array}

For any Lipschitz continuous function h ∈ BL₁, h ∘ Ψ ∈ BL_c for some constant c. Therefore the preceding implies that J₂ tends to 0 in outer probability. This also holds for the term J₁, because ||Ξ̌^# − Ξ̂^#||→_P^* 0 and ${| | {\hat{W}}_{0}^{#} - {\overset{ˇ}{W}}_{0}^{#} | |}_{1} \to_{P^{*}} 0$ , by consistency of the estimates (θ̂, Γ_nθ̂) and integration by parts.

Acknowledgments

The data presented here were obtained from the Statistical Center of the Center for International Blood and Marrow Transplant Research (CIBMTR). The analysis has not been reviewed or approved by the Advisory or Scientific Committee of the CIBMTR. The CIBMTR is comprised of clinical and basic scientists who confidentially share data on their blood and marrow transplant patients with the CIBMTR Data Collection Center located in the Medical College, Wisconsin. The CIBMTR is a repository of information about results of transplant at more than 450 transplant centers worldwide. I thank Mei-Jie Zhang for preparation of the data and some discussions. I also thank a reviewer and Editor Daniel Commenges for their comments. Research supported by the grant R01 AI067943 from the National Institute of Allergy and Infectious Diseases. The content is solely the responsibility of the author and does not necessarily represent the official views of NIAID, NIH or CIBMTR.

References

1.Andersen PK, Borgan O, Gill RD, Keiding N. Statistical Models Based on Counting Processes. Springer; New York: 1993. [Google Scholar]
2.Arjas E, Eerola M. On predictive causality in longitudinal studies. J Statist Planning and Inference. 1993;34:361–386. [Google Scholar]
3.Bagdonovicius V, Nikulin M. Generalized proportional hazards model based on modified partial likelihood. Lifetime Data Analysis. 1999;5:329–350. doi: 10.1023/a:1009688109364. [DOI] [PubMed] [Google Scholar]
4.Bagdonovicius M, Hafdi MA, Nikulin M. Analysis of survival data with cross-effects of survival functions. Biostatistics. 2004;5:415–425. doi: 10.1093/biostatistics/5.3.415. [DOI] [PubMed] [Google Scholar]
5.Beesack PR. Carlton Math Lecture Notes. Vol. 11. Carlton University; Ottawa: 1973. Gronwall Inequalities. [Google Scholar]
6.Bickel PJ. Efficient testing in a class of transformation models. Proceedings of the 45th Session of the International Statistical Institute; ISI, Amsterdam. 1986. pp. 23.3.63–23.3.81. [Google Scholar]
7.Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA. Efficient and adaptive estimation in semi-parametric models. Johns Hopkins University Press; Baltimore: 1993. [Google Scholar]
8.Bickel PJ, Ritov Y. Local asymptotic normality ranks and covariates in transformation models. In: Pollard D, Yang G, editors. Festschrift for L LeCam. Springer; New York: 1995. [Google Scholar]
9.Bilias Y, Gu M, Ying Z. Towards a general asymptotic theory for Cox model with staggered entry. Ann Statist. 1997;25:662–683. [Google Scholar]
10.Chang I-S, Hsiung CA. Information and asymptotic efficiency in some generalized proportional hazard models for counting processes. Ann Statist. 1994;22:1275–1298. [Google Scholar]
11.Chang I-S, Chuang Y-C, Hsiung CA. A class of nonparametric k-sample tests for semi-Markov processes. Statistica Sinica. 1999;9:211–277. [Google Scholar]
12.Chang I-S, Hsiung CA, Wu S-M. Estimation in a proportional hazard model for semi-Markov counting process. Statistica Sinica. 2000;10:1257–1266. [Google Scholar]
13.Chen K, Jin Z, Ying Z. Semi-parametric analysis of transformation models with censored data. Biometrika. 2002;89:659–668. [Google Scholar]
14.Chintagunta P, Prasad AR. An empirical investigation of the “Dynamic McFadden” model of purchase timing and brand choice: implications for market structure. J Business and Economic Statist. 1998;16:2–12. [Google Scholar]
15.Cinlar E. Introduction to Stochastic Processes. Prentice-Hall; New Jersey: 1975. [Google Scholar]
16.Cook RJ, Lawless JF. The Statistical Analysis of Recurrent Events. Springer; New York: 2007. [Google Scholar]
17.Commenges D. Semi-Markov and non-homogeneous Markov models in medical studies. In: Janssen J, editor. Semi-Markov models. Plenum Press; New York: 1986. pp. 411–422. [Google Scholar]
18.Commenges D, Joly P, Gégout-Petit A, Liquet B. Choice between semi-parametric estimators for Markov and non-Markov multistate models from coarsened observations. Scand J Statist. 2007;34:33–52. [Google Scholar]
19.Cox DR. The statistical analysis of dependencies in point processes. In: Lewis PAW, editor. Symposium on Point Processes. Wiley; New York: 1973. [Google Scholar]
20.Cutler C, Antin JH. Peripheral blood stem cells for allogeneic transplantation: a review. Stem Cells. 2001;19:108–117. doi: 10.1634/stemcells.19-2-108. [DOI] [PubMed] [Google Scholar]
21.Cutler C, Giri S, Jeyapalan S, Paniagua D, Viswanathan A, Antin JH. Acute and chronic graft-versus-host disease after allogeneic peripheral blood stem-cell and bone marrow transplantation: a meta analysis. J Clin Oncol. 2001;19:3685–3691. doi: 10.1200/JCO.2001.19.16.3685. [DOI] [PubMed] [Google Scholar]
22.Dabrowska DM, Sun G, Horowitz MM. Cox regression in a Markov renewal model: an application to the analysis of bone marrow transplant data. J Amer Statist Assoc. 1994;89:867–877. [Google Scholar]
23. Dabrowska DM. Estimation of transition probabilities and bootstrap in a semi-parametric Markov renewal model. J Nonparametric Statist. 1995;5:237–259.
24. Dabrowska DM. Estimation in a class of semi-parametric transformation models. In: Rojo J, editor. Second Erich L Lehmann Symposium - Optimality. Vol. 49. Institute of Mathematical Statistics; 2006. pp. 166–216. Lecture Notes and Monograph Series.
25. Dabrowska DM. Information bounds and efficient estimation in a class of censored transformation models. Acta Applicandae Mathematicae. 2007;96:177–201.
26. Dabrowska DM. Estimation in a semi-parametric two-stage renewal regression model. Statistica Sinica. 2009;19:981–996.
27. Daley DJ, Vere-Jones D. An Introduction to the Theory of Point Processes. Springer; New York: 1988.
28. Dellacherie C, Meyer PA. Probabilities and Potential. Hermann; Paris: 1975.
29. de la Peña V, Giné E. Decoupling: From Dependence to Independence. Springer; New York: 1999.
30. Dudley RM. Uniform Central Limit Theorems. Cambridge University Press; 1999.
31. Eerola M. Probabilistic causality in longitudinal studies. Springer; New York: 1994.
32. Friedrichs B, Tichelli A, Bacigalupo A, Russell NH, Ruutu T, Beksac M, Hasenclever D, Socié G, Schmitz N. Long-term outcome and late effects in patients transplanted with mobilised blood or bone marrow: a randomised trial. Lancet Oncology. 2010;11:331–338. doi: 10.1016/S1470-2045(09)70352-3.
33. Flowers MED, Parker PM, Johnston LJ, Matos AV, Storer B, Bensinger WI, Storb R, Appelbaum FR, Forman SJ, Blume KG, Martin PJ. Comparison of chronic graft-versus-host disease after transplantation of peripheral blood stem cells versus bone marrow in allogeneic recipients: long-term follow-up of a randomized trial. Blood. 2002;100:415–419. doi: 10.1182/blood-2002-01-0011.
34. Gale RP, Bortin MM, van Bekkum DW, Biggs JC, Dicke KA, Gluckman E, Good RA, Hoffman RG, Key HEM, Kersey JH, Marmont A, Masaoka T, Rimm AA, van Rood JJ, Zwaan FE. Risk factors for acute graft-versus-host disease. Br J Haematol. 1987;67:397–406. doi: 10.1111/j.1365-2141.1987.tb06160.x.
35. Gill RD. Nonparametric estimation based on censored observations of a Markov renewal process. Z Wahrscheinlichkeitstheorie verv Gebiete. 1980;53:97–116.
36. Gill RD, Johansen S. A survey of product integration with a view toward application in survival analysis. Ann Statist. 1990;18:1501–1555.
37. Greenwood P, Wefelmeyer W. Empirical estimators for semi-Markov processes. Math Meth Statist. 1996;5:299–315.
38. Greenwood P, Müller UU, Wefelmeyer W. Semi-Markov processes and their applications. Commun Stat Theory Methods. 2004;33:419–435.
39. Himmelberg CJ. Measurable relations. Fund Math. 1975;87:53–72.
40. Hjort NL, Claeskens G. Frequentist model average estimators. J Amer Statist Assoc. 2003;98:938–945.
41. Hjort NL, Claeskens G. Focused information criteria and model averaging for Cox’s hazard regression model. J Amer Statist Assoc. 2006;101:1449–1464.
42. Jacod J. Multivariate point processes: predictable projection, Radon-Nikodym derivatives, representation of martingales. Z Wahrscheinlichkeitstheorie verv Gebiete. 1975;31:235–254.
43. Janssen J. Semi-Markov Models: Theory and Applications. Springer; New York: 1999.
44. Janssen J, Manca R. Semi-Markov Risk Models for Finance, Insurance and Reliability. Springer; New York: 2007.
45. Janssen J, Manca R. Applied Semi-Markov Processes. Springer; New York: 2006.
46. Janssen J, Limnios N. International Symposium on Semi-Markov Models: Theory and Applications. Kluwer: Academic Press; 2001.
47. Jones MP, Crowley JJ. Nonparametric tests of the Markov model for survival data. Biometrika. 1992;79:513–522.
48. Kalbfleisch JD, Prentice RL. Statistical Analysis of Failure Time Data. Wiley; 1981.
49. Karr AF. Point Processes and their Statistical Inference. Marcel Dekker; New York: 1991.
50. Keiding N. Statistical analysis of semi-Markov models based on the theory of counting processes. In: Janssen J, editor. Semi-Markov Models: Theory and Applications. Plenum Press; 1986. pp. 301–315.
51. Keiding N, Klein JP, Horowitz MM. Multistate models and outcome prediction in bone marrow transplantation. Statist Med. 2001;20:1871–1885. doi: 10.1002/sim.810.
52. Klein JP, Keiding N, Copelan EA. Plotting summary predictions in multistate survival models: probabilities of relapse and death in remission for bone marrow transplantation patients. Statist Med. 1993;12:2315–2332. doi: 10.1002/sim.4780122408.
53. Filippov A. On certain questions in the theory of optimal control. Vestnik Moskov Univ Ser Mat Meh Astronom Fiz Him. 1962;2:25–32. (1959) English Translation 1 76–84.
54. Kosorok MR, Lee BL, Fine JP. Robust inference for univariate proportional hazard models. Ann Statist. 2004;32:1448–1491.
55. Kuratowski K. Topology. Academic Press; 1966.
56. Lagakos SW, Sommer CJ, Zelen M. Semi-Markov models for censored data. Biometrika. 1978;65:311–317.
57. Last G, Brandt A. Marked Point Processes on the Real Line: the Dynamic Approach. Springer; New York: 1995.
58. Limnios N, Oprisan G. Semi-Markov Processes and Reliability. Vol. 2001 Springer; 2001.
59. Lin DY, Fleming TR, Wei LJ. Confidence bands for survival curves under the proportional hazards model. Biometrika. 1994;81:73–81.
60. Lo SMS, Wilke RA. A copula model for dependent competing risks. Appl Statist. 2010;59:359–376.
61. Martinussen T, Scheike T. Dynamic Regression Models for Survival Data. Springer; New York: 2006.
62. Moore EM, Pyke R. Estimation of the transition distributions of a Markov renewal process. Ann Inst Stat Math. 1968;20:411–468.
63. Nolan D, Pollard D. U-processes: rates of convergence. Ann Statist. 1987;15:780–799.
64. Oakes D. Survival analysis: aspects of partial likelihood (with discussion). Int Statist Rev. 1981;49:235–264.
65. Oakes D, Cui L. On semi-parametric inference for modulated renewal processes. Biometrika. 1994;81:83–91.
66. Ouhbi L, Limnios N. Nonparametric estimation for semi-Markov kernels with applications to reliability analysis. Appl Stochastic Models and Data Analysis. 1996;12:209–220.
67. Ouhbi L, Limnios N. Nonparametric estimation for semi-Markov processes based on its hazard rate functions. Stat Inference Stoch Processes. 1999;2:151–173.
68. Pollard D. Convergence of Stochastic Processes. Springer Verlag; New York: 1984.
69. Pollard D. Empirical Processes: Theory and Applications. Inst Math Statist; Hayward: 1990.
70. Phelan MF. Bayes estimation from a Markov renewal process. Ann Statist. 1990;18:603–616.
71. Putter H, Fiocco M, Geskus RB. Tutorial in biostatistics: competing risks and multi-state models. Statist Med. 2007;26:2389–2430. doi: 10.1002/sim.2712.
72. Pyke R. Markov renewal processes: definitions and preliminary properties. Ann Math Statist. 1961a;32:1231–1242.
73. Pyke R. Markov renewal processes with finitely many states. Ann Math Statist. 1961b;32:1243–1259.
74. Pyke R, Schaufele R. Limit theorems for Markov renewal processes. Ann Math Statist. 1964;35:1746–1764.
75. Pyke R, Schaufele R. The existence and uniqueness of stationary measures for Markov renewal processes. Ann Math Statist. 1966;37:1439–1462.
76. Ringden O, Labopin M, Bacigalupo A, Arcese W, Schaefer UW, Willemze R, Koc H, Bunjes D, Gluckman E, Rocha V, Schattenberg A, Frassoni F. Transplantation of peripheral blood stem cell as compared with bone marrow from HLA-identical siblings in adult patients with acute myeloid leukemia and acute lymphoblastic leukemia. J Clin Oncol. 2002;20(24):4655–4664. doi: 10.1200/JCO.2002.12.049.
77. Rivest LP, Wells MT. A martingale approach to the copula-graphic estimator for the survival function under dependent censoring. J Multiv Analysis. 2001;79:138–155.
78. Teicher H. On the Marcinkiewicz-Zygmund strong law for U-statistics. J Theoret Probab. 1998;11:279–288.
79. van der Vaart AW, Wellner JA. Weak convergence and Empirical Processes with Applications to Statistics. Springer; New York: 1996.
80. Voelkel JG, Crowley JJ. Nonparametric inference for a class of semi-Markov processes with censored observations. Ann Statist. 1984;12:142–160.
81. Wagner DH. Survey of measurable selection theorems. SIAM, J Control and Optimization. 1977;15:859–903.
82. Weiss GH, Zelen M. A semi-Markov model for clinical trials. J Appl Probab. 1965;2:269–285.
83. Zheng M, Klein JP. Estimates of marginal survival for dependent competing risks based on an assumed copula model. Biometrika. 1995;82:127–138.
