Abstract
Multi-state models provide a common tool for the analysis of longitudinal failure time data. In biomedical applications, models of this kind are often used to describe the evolution of a disease and assume that a patient may move among a finite number of states representing different phases of disease progression. Several authors have developed extensions of the proportional hazard model for the analysis of multi-state models in the presence of covariates. In this paper, we consider a general class of censored semi-Markov and modulated renewal processes and propose the use of transformation models for their analysis. Special cases include modulated renewal processes with interarrival times specified using transformation models, and semi-Markov processes with one-step transition probabilities defined using copula-transformation models. We discuss estimation of the finite and infinite dimensional parameters of the model, and develop an extension of the Gaussian multiplier method for setting confidence bands for transition probabilities. A transplant outcome data set from the Center for International Blood and Marrow Transplant Research is used for illustrative purposes.
1 Introduction
We consider estimation in a semi-Markov regression model with a finite state space 𝒥 = {1, …, r}. In the absence of covariates, the model can be described by a sequence (T, J) = {(Tn, Jn): n ≥ 0}, where T0 < T1 < T2 < ⋯ are the consecutive times of entrance into the states J0, J1, J2, …, with Jn ∈ 𝒥. The sequence J = {Jn: n ≥ 0} of states visited forms a Markov chain and, given J, the sojourn times T1 − T0, T2 − T1, … are independent with distributions depending on the adjoining states only. Equivalently, the joint distribution of the sojourn times Tn+1 − Tn, n ≥ 0, and of the next state satisfies

$$P(T_{n+1} - T_n \le x,\ J_{n+1} = j \mid J_0, \dots, J_n,\ T_0, \dots, T_n) = P(T_{n+1} - T_n \le x,\ J_{n+1} = j \mid J_n).$$
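As an illustration only, the following Python sketch simulates one path {(Tn, Jn)} of a semi-Markov process without covariates: the next state is drawn from the embedded Markov chain, and the sojourn length from a distribution that depends only on the pair of adjoining states. The three-state transition matrix, the exponential and Weibull sojourn distributions and the function name `simulate_semi_markov` are hypothetical choices made for the example.

```python
import random

def simulate_semi_markov(P, sojourn, j0, t_max, rng):
    """Simulate one path {(T_n, J_n)} of a semi-Markov process with T_0 = 0.

    P[i]: dict {j: p_ij} of one-step transition probabilities of the embedded
          chain J (an empty dict marks an absorbing state).
    sojourn[(i, j)]: function rng -> sojourn length for a jump i -> j, so the
          sojourn distribution depends only on the adjoining states.
    Stops once the current state is absorbing or the last entrance time reaches t_max.
    """
    times, states = [0.0], [j0]
    while times[-1] < t_max and P[states[-1]]:
        i = states[-1]
        targets, probs = zip(*P[i].items())
        j = rng.choices(targets, weights=probs)[0]       # next state from the chain J
        times.append(times[-1] + sojourn[(i, j)](rng))   # sojourn given (J_n, J_{n+1})
        states.append(j)
    return times, states

# Hypothetical three-state example in which state 3 is absorbing.
P = {1: {2: 0.7, 3: 0.3}, 2: {1: 0.5, 3: 0.5}, 3: {}}
sojourn = {
    (1, 2): lambda r: r.expovariate(1.0),
    (1, 3): lambda r: r.weibullvariate(2.0, 1.5),
    (2, 1): lambda r: r.expovariate(0.5),
    (2, 3): lambda r: r.weibullvariate(1.0, 0.8),
}
times, states = simulate_semi_markov(P, sojourn, 1, 50.0, random.Random(1))
```

Given J, the sojourn lengths are drawn independently, which is exactly the conditional independence property stated above.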
Properties of semi-Markov processes were discussed in some detail in the classical papers of Pyke (1961a, b) and Pyke and Schaufele (1964, 1966), and in the textbooks of Cinlar (1975), Daley and Vere-Jones (1988), Karr (1991), Last and Brandt (1995) and Limnios and Oprisan (2001). Numerous examples of applications to areas such as reliability, insurance and finance were provided by Janssen (1999), Janssen and Manca (2006, 2007) and Janssen and Limnios (2001), for instance. In such studies, it is most common to consider estimation methods assuming that a single realization of a semi-Markov process is observed over a finite time interval [0, τ] whose length tends to infinity (τ ↑ ∞). Greenwood and Wefelmeyer (1996) and Greenwood, Müller and Wefelmeyer (2004) developed a general framework for the analysis of non- and semi-parametric semi-Markov processes in this setting. In particular, they studied properties of classical estimators of the jump frequency and the proportion of visits to a given state, as well as Moore and Pyke's (1968) non-parametric estimator of the kernel of the process. Estimation of transition intensities and transition probabilities was considered by Ouhbi and Limnios (1996, 1999).
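In the single-realization setting, a natural nonparametric estimate of the kernel is the proportion of completed visits to state i that end in a jump to j after a sojourn of length at most x; this is the type of empirical estimator studied by Moore and Pyke (1968). The Python sketch below assumes a fully observed path and simply ignores the final, incomplete sojourn; the function name `empirical_kernel` is hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

def empirical_kernel(times, states):
    """Empirical kernel from one path observed up to its last entrance time.

    Returns F(i, j, x): the proportion of completed sojourns in state i that
    ended with a jump to j after at most x time units. The final sojourn,
    which is still in progress, is not used.
    """
    visits = defaultdict(int)      # completed sojourns in each state i
    lengths = defaultdict(list)    # sojourn lengths of observed i -> j jumps
    for n in range(len(states) - 1):
        i, j = states[n], states[n + 1]
        visits[i] += 1
        lengths[(i, j)].append(times[n + 1] - times[n])
    for key in lengths:
        lengths[key].sort()

    def F(i, j, x):
        if visits.get(i, 0) == 0:
            return float("nan")    # state i never left: row of the kernel not estimable
        return bisect_right(lengths.get((i, j), []), x) / visits[i]

    return F

F = empirical_kernel([0.0, 1.0, 1.5, 3.5, 4.0], [1, 2, 1, 2, 1])
```

Here state 1 is left twice, each time for state 2, after sojourns of 1.0 and 2.0, so F(1, 2, 1.5) is one half.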
In survival analysis, it is more common to base estimation on a large number of iid copies of a semi-Markov process, each observed over a deterministic or random time interval. Lagakos, Sommer and Zelen (1978), Gill (1980), Voelkel and Crowley (1984) and Phelan (1999) developed nonparametric estimators of the semi-Markov kernel of the process in the presence of random censoring. Examples of applications of these processes to the analysis of survival data can be found in Commenges (1986), Keiding (1986), Dabrowska et al. (1994), Chang et al. (1994, 1999, 2000) and Cook and Lawless (2007), among others.
In this paper, we assume that the evolution of the process (Tm, Jm)m≥0 also depends on an Rd-valued covariate (Zm)m≥0, Zm = [Zjm: j ∈ 𝒥], which represents either a vector of time independent covariates or a vector of time dependent covariates changing at the successive renewal times. As an extension of the semi-Markov process to the regression setting, Cox (1973) proposed a proportional hazards modulated renewal process. More precisely, let Ñ = {Ñj(t): t ≥ 0, j = (j1, j2) ∈ 𝒥 × 𝒥} be the counting process registering transitions among adjoining states of the model,

$$\tilde N_j(t) = \sum_{m \ge 0} 1(T_{m+1} \le t,\ J_m = j_1,\ J_{m+1} = j_2).$$

Cox's model assumes that the compensator of this process, relative to the self-exciting filtration {𝓕t}t≥0, is given by Λj(0) = 0,

$$\Lambda_j(t) = \Lambda_j(T_m) + 1(J_m = j_1)\, e^{\beta^T Z_{j_1 m}}\, \Gamma_j(t - T_m),$$

for t ∈ (Tm, Tm+1] and j = (j1, j2) ∈ 𝒥 × 𝒥. Here β is a regression coefficient and Γj is an unknown cumulative hazard function. If covariates are time independent and Γj(x) = γjx, the process reduces to a Markov chain regression model. In general, the modulated renewal process allows the transition intensities to depend on the history of the process through the sequence of states visited and the length of time spent in each state. As a result, it has a more flexible structure than Markov chains.
The purpose of this paper is to extend Cox's modulated renewal process to a class of transformation models. In the case of single spell models, transformation models provide a common alternative to the proportional hazard model. In particular, they may be more appropriate than the proportional hazard model if relative differences between covariate groups dissipate or diverge over time. As an extension to multistate models, we consider here a modulated renewal process assuming that the counting process Ñ has compensator given by Λj(0) = 0,

$$\Lambda_j(t) = \Lambda_j(T_m) + 1(J_m = j_1) \int_0^{t - T_m} \alpha_j\big(\Gamma_{(j_1,\cdot)}(u), \theta, Z_{j_1 m}\big)\, d\Gamma_j(u), \tag{1.1}$$

for t ∈ (Tm, Tm+1] and j = (j1, j2) ∈ 𝒥 × 𝒥. For any such pair j = (j1, j2), αj is a hazard function depending on an unknown Euclidean parameter θ and a vector of unknown increasing functions Γ(j1,·) = [Γj: j = (j1, j2), j2 ∈ 𝒥]. The components of Γ(j1,·) correspond to all states that can be reached from the state j1 in one step. If covariates are time independent, then (1.1) includes as a special case renewal processes whose interarrival times satisfy common transformation models. Other choices include semi-Markov models with one-step transition probabilities defined using copula-graphic models (e.g. Zheng and Klein (1995), Rivest and Wells (2001), Lo and Wilke (2010)) or extensions of the dynamic Cox-McFadden model (Chintagunta and Prasad (1998)) combining transformation models and multinomial regression. These models are defined in more detail in Section 2, where covariates are also allowed to change at the renewal times of the process.
For purposes of estimation, we consider a modification of the procedures studied by Bagdonavičius and Nikulin (1999, 2004) and Dabrowska (2006) for single spell transformation models. Section 3 gives properties of the estimates as well as an extension of the Gaussian multiplier method of Lin et al. (1994) for setting pointwise and simultaneous confidence bands for the unknown transformations and related parameters. As in Cox's model, the counting process Ñ has a compensator depending on the backward recurrence time; as a result, it falls outside the class of multiplicative models studied by Andersen et al. (1993), for instance. In the case of Cox's modulated renewal process or non-parametric semi-Markov models, estimation of the cumulative hazards of one-step transitions leads to a time transformation which arranges observations according to the length of time spent in each state rather than calendar time. Because of this rearrangement of the time scale, the usual counting process methods for the analysis of large sample properties of stochastic integrals do not apply (Gill (1980), Oakes (1981), Oakes and Cui (1994)). To address these problems, we use Hoeffding's projection method and empirical process theory in Section 5.
In Section 4, we consider a transplant outcome data set from the Center for International Blood and Marrow Transplant Research (CIBMTR). The example data set consists of patients who received an HLA-identical sibling transplant from 1995 to 2004 for acute myelogenous leukemia (AML) or acute lymphoblastic leukemia (ALL). Multistate models for the analysis of the bone marrow transplant recovery process have been proposed by several authors. The early work in this area focused on competing risk models and goes back to Prentice et al. (1978), who discussed estimation of cause specific cumulative hazards in the proportional hazard model. More recent approaches to the analysis of leukemia transplant data are based on multistate models. They provide a convenient tool for evaluating the impact of intermediate events in the transplant recovery process on the main outcome events, leukemia relapse and death in remission. However, multistate regression models can be difficult to interpret because there is no one-to-one correspondence between regression coefficients and transition probabilities. A covariate may increase the risk of transition among some states of the model and at the same time decrease it among others, so that its overall impact on the outcome events is often unclear. To address these difficulties, Arjas and Eerola (1993) and Eerola (1994) proposed a set of graphical tools for interpreting regression analyses based on multistate models. These included graphs of innovation gains and plots of the transition probabilities evaluated by conditioning on the follow-up history of a patient. The approach was illustrated using a proportional hazard model with time dependent covariates in Eerola (1994). Applications of these methods to proportional hazard Markov chain models were given in Klein et al. (1993), Keiding et al. (2001) and Andersen and Perme (2008), and to proportional hazard semi-Markov models in Dabrowska et al. (1993, 2006). Putter et al. (2007) discussed special cases of both models.
In this paper, we consider a data set involving patients who received either bone marrow (BMT) or peripheral blood stem cell transplant (PBSCT). Many clinical studies have reported that PBSCT may be beneficial during the early post-transplant period as it leads to faster engraftment and hematopoietic recovery than BMT (e.g. Flowers et al. 2002, Ringden et al. 2002). Several studies have also pointed out that differences between the two transplant types may dissipate over time (e.g. Friedrichs et al. 2010, Cutler et al. 2002ab). Such dissipating time effects are better captured by the proportional odds ratio model than the proportional hazard model, and in Section 5 we discuss an extension of it to semi-Markov models. In this section we also propose pointwise and simultaneous confidence bands for comparison of transition probabilities.
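To see why dissipating effects favour the proportional odds model, note that under S(x|z) = 1/(1 + e^{βz}Γ(x)) the hazard is e^{βz}Γ′(x)/(1 + e^{βz}Γ(x)), so the hazard ratio for z = 1 versus z = 0 equals e^β(1 + Γ(x))/(1 + e^βΓ(x)). It starts at e^β when Γ(x) = 0 and tends to 1 as Γ(x) → ∞, whereas under the proportional hazard model it stays at e^β. The short Python sketch below evaluates both ratios for a hypothetical β = 1 on a grid of Γ values; it is a numerical illustration only, not part of the analysis of the transplant data.

```python
import math

def ph_hazard_ratio(beta, gamma_x):
    # Proportional hazards: the ratio exp(beta) does not depend on time.
    return math.exp(beta)

def po_hazard_ratio(beta, gamma_x):
    """Hazard ratio (z = 1 vs z = 0) under the proportional odds model
    S(x|z) = 1 / (1 + exp(beta * z) * Gamma(x)), evaluated where Gamma(x) = gamma_x."""
    c = math.exp(beta)
    return c * (1.0 + gamma_x) / (1.0 + c * gamma_x)

# Gamma is increasing, so larger gamma_x corresponds to later times.
grid = [0.0, 0.5, 2.0, 10.0, 100.0]
ratios = [po_hazard_ratio(1.0, g) for g in grid]
```

For β > 0 the proportional odds ratio decreases monotonically towards 1, which is the dissipating pattern reported for PBSCT versus BMT.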
2 The model
Throughout the paper we assume that (Ω, 𝓕, P) is a complete probability space and (Tm, Vm)m≥0 is a marked point process defined on it, with marks taking values in a separable measurable space (E, 𝓔) enlarged by the empty mark Δ. Thus T0 < T1 < ⋯ < Tm < ⋯ is a sequence of random time points registering the occurrence of events in time such that the Tm are almost surely distinct and Tm ↑ ∞ P-a.s. At time Tm we observe a variable Vm such that Vm ∈ E if Tm < ∞, and Vm = Δ if Tm = ∞.
For any B ∈ 𝓔, let Ñ(t, B) = Σm≥0 1(Tm+1 ≤ t, Vm+1 ∈ B) be the process counting observations falling into the set [0, t] × B. The internal history of the process, 𝓕t⁰, represents the information collected on Ñ up to time t, and is given by 𝓕t⁰ = σ(Ñ(s, B): s ≤ t, B ∈ 𝓔). Let {𝓕t}t≥0 be the self-exciting filtration associated with the process Ñ, obtained by adjoining the P-null sets to the internal history of the process. The compensator of the process Ñ with respect to {𝓕t}t≥0 is given by

$$\tilde\Lambda(t, B) = \sum_{m \ge 0} \int_{(T_m,\ T_{m+1} \wedge t]} \frac{P_m(ds \times B)}{P_m([s, \infty] \times E_\Delta)},$$

where E_Δ = E ∪ {Δ} and Pm(d(s, v)) is a version of a regular conditional distribution of (Tm+1, Vm+1) given 𝓕_{Tm} (Jacod (1975)).
In this paper we assume that the marks Vm have the form Vm = (Jm, Z̃m), where Jm ∈ 𝒥 = {1, …, r} is a discrete variable representing the type of the event occurring at time Tm and Z̃m are covariates taking values in Rd. The covariate Z̃m may correspond to measurements taken upon entrance into the state Jm. The process Ñ = [Ñj, j = (j1, j2) ∈ 𝒥 × 𝒥],

$$\tilde N_j(t, B) = \sum_{m \ge 0} 1(T_{m+1} \le t,\ J_m = j_1,\ J_{m+1} = j_2,\ \tilde Z_{m+1} \in B),$$

has compensator given by

$$\tilde\Lambda_j(t, B) = \tilde\Lambda_j(T_m, B) + 1(J_m = j_1) \int_{T_m}^{t} \mu_{m+1}(B, s, j_1, j_2)\, \alpha_j\big(\Gamma_{(j_1,\cdot)}(s - T_m), \theta, Z_{j_1 m}\big)\, d\Gamma_j(s - T_m)$$

for t ∈ (Tm, Tm+1]. Here μm+1(B, Tm+1, Jm, Jm+1) is the conditional probability of the event {Z̃m+1 ∈ B} given σ(𝓕_{Tm}, Tm+1, Jm+1). Further, Zj1m = gj1m(Tl, Jl, Z̃l: l = 0, …, m) is a fixed Rd-valued function, measurable with respect to 𝓕_{Tm}. Finally, αj denotes a hazard rate depending on a Euclidean parameter θ and a vector of unknown monotone increasing functions Γ(j1,·) = [Γj: j = (j1, j2) ∈ 𝒥 × 𝒥]. In particular, setting B = Rd and using μm+1(Rd, Tm+1, Jm, Jm+1)1(Tm+1 < ∞) = 1 P-a.s., Λ̃j(t, Rd) reduces to (1.1) and represents the compensator of the "marginal" counting process

$$\tilde N_j(t) = \tilde N_j(t, \mathbb R^d) = \sum_{m \ge 0} 1(T_{m+1} \le t,\ J_m = j_1,\ J_{m+1} = j_2), \tag{2.1}$$

registering transitions among the adjoining states of the model.
To give examples of the model, we assume first that the covariates are time independent. If events are of a single type (|𝒥| = 1), then (1.1) is the compensator of a renewal regression model whose interarrival times follow a transformation model. In this case {α(u, θ, Z): θ ∈ Θ} is a parametric family of hazard rates, and the model stipulates that, conditionally on Z, the interarrival times Xm+1 are independent with conditional survival function exp[−A(Γ(x), θ, Z)], where A(y, θ, Z) = ∫0^y α(u, θ, Z) du is the cumulative hazard function corresponding to α.
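A renewal regression model of this kind can be simulated by inverting the survival function: if E is standard exponential, then X = Γ⁻¹(A⁻¹(E, θ, Z)) satisfies P(X > x | Z) = exp[−A(Γ(x), θ, Z)], because A is increasing in its first argument. The Python sketch below does this for two common choices, the proportional hazard model A(y, θ, z) = e^{θz}y and the proportional odds model A(y, θ, z) = log(1 + e^{θz}y), with a scalar covariate and a hypothetical baseline transformation Γ(x) = x²; all function names are illustrative.

```python
import math
import random

def A_ph(y, theta, z):        # proportional hazards: A(y) = exp(theta z) y
    return math.exp(theta * z) * y

def A_po(y, theta, z):        # proportional odds: A(y) = log(1 + exp(theta z) y)
    return math.log1p(math.exp(theta * z) * y)

def A_inv_ph(e, theta, z):
    return e * math.exp(-theta * z)

def A_inv_po(e, theta, z):
    return math.expm1(e) * math.exp(-theta * z)

def gamma(x):                 # hypothetical baseline transformation Gamma(x) = x**2
    return x * x

def gamma_inv(y):
    return math.sqrt(y)

def sample_interarrival(A_inv, theta, z, rng):
    # P(X > x | z) = exp(-A(Gamma(x))), hence X = Gamma^{-1}(A^{-1}(E)) with E ~ Exp(1).
    e = rng.expovariate(1.0)
    return gamma_inv(A_inv(e, theta, z))

def survival(A, x, theta, z):
    return math.exp(-A(gamma(x), theta, z))

rng = random.Random(7)
draws = [sample_interarrival(A_inv_po, 0.5, 1.0, rng) for _ in range(20000)]
empirical = sum(x > 1.0 for x in draws) / len(draws)
model = survival(A_po, 1.0, 0.5, 1.0)
```

Independent draws of this kind, one per renewal, give the interarrival times of a renewal regression model with time independent covariates.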
Simple examples of multi-type processes are given by competing risk and semi-Markov regression models. In particular, a semi-Markov regression model assumes that the one-step transition probabilities satisfy

$$P(X_{m+1} \le x,\ J_{m+1} = j_2 \mid \mathcal F_{T_m}) = F_j(x \mid Z) \quad \text{on } \{J_m = j_1\},\ j = (j_1, j_2).$$

The matrix [Fj, j = (j1, j2) ∈ 𝒥 × 𝒥] forms the kernel of the process. One way to define it is to consider latent variable models. Specifically, suppose that transitions originating from the state j1 have the same conditional distribution as the pair (U, V), where U = min{Uj: j = (j1, j2) ∈ 𝒥 × 𝒥} and V = j2 if U = U(j1,j2), and [Uj: j = (j1, j2) ∈ 𝒥 × 𝒥] is a multivariate vector whose joint conditional survival function given Z is
Here u = [uj, j = (j1, j2) ∈ 𝒥 × 𝒥] and
is a known multivariate survival function with a density with respect to Lebesgue measure supported on the entire upper orthant of R^{qj1}, qj1 = |{j2: (j1, j2) ∈ 𝒥 × 𝒥}|. The functions αj in (1.1) are equal to
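With this choice the cumulative intensity (1.1) corresponds to a semi-Markov model whose kernel is given by

$$F_j(x \mid z) = \int_0^x \bar F_{(j_1,\cdot)}(u \mid z)\, \alpha_j\big(\Gamma_{(j_1,\cdot)}(u), \theta, z\big)\, d\Gamma_j(u), \tag{2.2}$$

where j = (j1, j2) ∈ 𝒥 × 𝒥 and F̄(j1,·)(x|z) is the survival function of the sojourn time in state j1,

$$\bar F_{(j_1,\cdot)}(x \mid z) = \exp\Big[-\sum_{\ell \in \mathcal J} \int_0^x \alpha_{(j_1,\ell)}\big(\Gamma_{(j_1,\cdot)}(u), \theta, z\big)\, d\Gamma_{(j_1,\ell)}(u)\Big]. \tag{2.3}$$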
If the state space of the process consists of one transient state (J0 = 1, say) and q − 1 absorbing states, q ≥ 3, then the semi-Markov process reduces to a competing risk model. In this case the transition probabilities (2.2) provide a regression analogue of the copula-graphic models proposed for the analysis of competing risks by Zheng and Klein (1995) and Rivest and Wells (2001). The special case of Archimedean copula models corresponds to taking the known multivariate survival function of the form S̄(‖y‖1), where S̄ is a known survival function with a density supported on the positive half-line and ‖·‖1 is the ℓ1-norm of a vector.
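As a numerical check of (2.2)–(2.3), the Python sketch below evaluates the sojourn survival function and the kernel entries for one origin state with two destinations by the trapezoid rule. It assumes the Archimedean choice S̄(s) = 1/(1 + s) without covariates, for which αj(y) = −∂ log S̄(‖y‖1)/∂yj = 1/(1 + y1 + y2) and the sojourn survival function has the closed form 1/(1 + Γ1(x) + Γ2(x)); the transformations Γ1(x) = x and Γ2(x) = x²/2 and the function name are hypothetical.

```python
import math

def sojourn_and_kernel(alphas, gammas, dgammas, x_max, n_grid=20000):
    """Trapezoid-rule evaluation of (2.3) and (2.2) for one origin state j1.

    alphas[k](y): alpha_k at the vector y = (Gamma_1(u), ..., Gamma_q(u)).
    gammas[k](u), dgammas[k](u): transformation Gamma_k and its derivative.
    Returns the grid, the sojourn survival function on the grid, and
    kernel[k][i], an approximation of F_k(grid[i]).
    """
    q = len(alphas)
    h = x_max / n_grid
    grid = [i * h for i in range(n_grid + 1)]
    # lam[i][k] = alpha_k(Gamma(u_i)) * Gamma_k'(u_i): crude hazard of destination k
    lam = []
    for u in grid:
        y = [g(u) for g in gammas]
        lam.append([alphas[k](y) * dgammas[k](u) for k in range(q)])
    surv, cum = [1.0], 0.0
    for i in range(1, n_grid + 1):
        cum += 0.5 * h * (sum(lam[i - 1]) + sum(lam[i]))
        surv.append(math.exp(-cum))                              # (2.3)
    kernel = []
    for k in range(q):
        acc, row = 0.0, [0.0]
        for i in range(1, n_grid + 1):
            acc += 0.5 * h * (surv[i - 1] * lam[i - 1][k] + surv[i] * lam[i][k])
            row.append(acc)                                      # (2.2)
        kernel.append(row)
    return grid, surv, kernel

# Archimedean example with S-bar(s) = 1 / (1 + s): alpha_k(y) = 1 / (1 + y_1 + y_2).
alphas = [lambda y: 1.0 / (1.0 + sum(y))] * 2
gammas = [lambda u: u, lambda u: 0.5 * u * u]
dgammas = [lambda u: 1.0, lambda u: u]
grid, surv, kernel = sojourn_and_kernel(alphas, gammas, dgammas, x_max=3.0)
```

At x = 3 the closed form gives 1/(1 + 3 + 4.5) = 1/8.5, and the kernel entries plus the sojourn survival function must add up to one.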
Another example of a semi-Markov model is provided by the dynamic Cox-McFadden model (Chintagunta and Prasad, 1998). In this case, the distribution of the sojourn time in state j1 ∈ 𝒥 is specified by means of a transformation model for univariate failure time data, i.e. the survival function (2.3) is of the form F̄(j1,·)(x|z) = exp[−Ãj1(Γj1(x), θ1, z)] for some univariate cumulative hazard function Ãj1. The kernel of the process is given by

$$F_j(x \mid z) = \int_0^x p_j(u \mid z)\, F_{(j_1,\cdot)}(du \mid z),$$

where F(j1,·)(·|z) = 1 − F̄(j1,·)(·|z) and, for j = (j1, j2),

$$p_j(x \mid z) = P(J_{m+1} = j_2 \mid J_m = j_1,\ X_{m+1} = x,\ Z = z) \tag{2.4}$$

are the one-step state transition probabilities. The state transition probabilities can be specified using multinomial regression models such as the logistic or probit model. If the state transition probabilities (2.4) do not depend on the length of the sojourn time Xm+1, the model reduces to a stationary process, i.e. conditionally on Z, the transition probabilities do not depend on m.
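The following Python sketch computes the Cox-McFadden kernel, the integral of the state transition probabilities (2.4) against the sojourn distribution F(j1,·)(du|z), by the trapezoid rule. It assumes a hypothetical proportional hazard sojourn model F̄(j1,·)(x|z) = exp(−e^{θ1 z}x), that is Γj1(x) = x, and multinomial logit state transition probabilities whose linear predictors depend linearly on the sojourn length u; the parameter names `eta` and `delta` are illustrative only. Because the transition probabilities sum to one, the kernel entries must add up to F(j1,·)(x|z).

```python
import math

def softmax(scores):
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    total = sum(w)
    return [v / total for v in w]

def cox_mcfadden_kernel(x, z, theta1, eta, delta, n_grid=10000):
    """Kernel entries F_j(x|z) for all destinations j of one origin state.

    Sojourn survival exp(-exp(theta1 * z) * Gamma(u)) with hypothetical Gamma(u) = u.
    Transition probabilities: multinomial logit, p_j(u|z) proportional to
    exp(eta[j] + delta[j] * u) (covariate effects folded into eta for brevity).
    """
    c = math.exp(theta1 * z)
    density = lambda u: c * math.exp(-c * u)          # dF(u|z)/du when Gamma(u) = u
    h = x / n_grid
    F = [0.0] * len(eta)
    for i in range(n_grid):
        u0, u1 = i * h, (i + 1) * h
        p0 = softmax([e + d * u0 for e, d in zip(eta, delta)])
        p1 = softmax([e + d * u1 for e, d in zip(eta, delta)])
        for j in range(len(eta)):
            F[j] += 0.5 * h * (p0[j] * density(u0) + p1[j] * density(u1))
    return F

F = cox_mcfadden_kernel(2.0, 1.0, 0.3, [0.0, -0.5], [0.0, 0.4])
G = cox_mcfadden_kernel(2.0, 1.0, 0.3, [0.0, -0.5], [0.0, 0.0])
```

With `delta` equal to zero (the case G), the transition probabilities do not depend on the sojourn length and each kernel entry is a fixed fraction of F(j1,·)(x|z).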
In practice, the assumptions of the semi-Markov process may be violated if transitions from a state j1 to a state j2 depend on the sequence of states visited before entrance into the state j1, or on the time spent in them. Both models can accommodate this problem by allowing the covariates to depend on the internal history of the process. The time dependent covariates may represent, for instance, the total number of events occurring prior to the entrance into the state j2 or the length of time spent in states preceding entrance into the state j1. They may also represent changing treatment types or drug levels.
We further assume that the process is subject to censoring and that the times at which the process is observed are determined by a process C(t) = Σm≥1 1(Cm−1 < t ≤ Cm), where 0 ≤ C0 ≤ C1 ≤ ⋯ ≤ Cm ≤ ⋯ is an increasing sequence such that the Cm ∈ [Tm, Tm+1] are stopping times with respect to a larger filtration {𝓖t}t≥0, 𝓕t ⊆ 𝓖t. If Tm = Cm, then no information is available on either the sojourn time Xm+1 = Tm+1 − Tm or the marks (Vm, Vm+1). If Cm = Tm+1, then the sojourn time Xm+1 = Tm+1 − Tm and the marks (Vm, Vm+1) are observable. Finally, if Tm < Cm < Tm+1, then the mark Vm is visible while the sojourn time Xm+1 is only known to exceed Cm − Tm. Following Andersen et al. (1993), we assume that the compensator Λ^𝓖 of the marked point process Ñ relative to the filtration {𝓖t}t≥0 satisfies Λ^𝓖 = Λ, P-a.s., and that the censoring process and the compensator Λ depend on parameters which have no components in common. We also assume that the censoring process is monotone, so that with probability 1, Tm ≤ Cm < Tm+1 ⇒ Cm′ = Tm′ for all m′ > m. This condition stipulates that the process terminates once censoring takes place.
These conditions are satisfied in two common applications. The first assumes that the process is subject to censoring by a univariate failure time T′ such that T′ is independent of the sequence (Tm, Vm), conditionally on the initial state of the process, V0. In this case, Cm = Tm + min(T′ − Tm, Xm+1)1(T′ ≥ Tm) and the augmented filtration is given by 𝓖t = 𝓕t ∨ σ(T′).
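For this first censoring scheme, the observed data of one subject follow mechanically from Cm = Tm + min(T′ − Tm, Xm+1)1(T′ ≥ Tm). The Python sketch below turns a path (T0 < T1 < ⋯, J0, J1, …) and a censoring time T′ into a list of observed spells; the tuple layout (from_state, to_state or None, observed_length, is_transition), the function name and the `absorbing` argument are hypothetical. It assumes that the supplied path either reaches an absorbing state or extends beyond T′.

```python
def censor_path(times, states, t_cens, absorbing=()):
    """Observed spells of one path censored at an independent time t_cens (T').

    Spell m is fully observed when C_m = T_{m+1} (that is, T_{m+1} <= T'),
    right-censored when T_m < C_m < T_{m+1}, and carries no information when
    C_m = T_m; by monotone censoring nothing is observed after that.
    """
    spells = []
    for m in range(len(states)):
        t_m, j1 = times[m], states[m]
        if j1 in absorbing or t_cens <= t_m:
            break                                        # no sojourn to observe
        if m + 1 < len(times) and times[m + 1] <= t_cens:
            spells.append((j1, states[m + 1], times[m + 1] - t_m, True))
        else:
            spells.append((j1, None, t_cens - t_m, False))  # X_{m+1} > C_m - T_m
            break
    return spells

path_times, path_states = [0.0, 1.0, 2.5, 4.0], [1, 2, 1, 3]
early = censor_path(path_times, path_states, 2.0, absorbing={3})
late = censor_path(path_times, path_states, 10.0, absorbing={3})
```

With T′ = 2 the first sojourn is observed and the second is censored after one time unit; with T′ = 10 the whole path up to absorption is observed.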
The second example assumes that the state space of the process has an extra absorbing state corresponding to censoring, say {c}, which can be reached in one step from each transient state j1 ∈ 𝒥. The time T of entrance into the censoring state is then a stopping time with respect to the filtration {𝓖t}t≥0 generated by the process with the censoring state included. Consequently, there exist nonnegative variables Um such that on the event {T ≥ Tm}, we have T ∧ Tm+1 = (Tm + Um) ∧ Tm+1, and Um is measurable with respect to 𝓖_{Tm}. Correspondingly, Cm = Tm + min(Um, Xm+1)1(T ≥ Tm). In this setting, the assumption of non-informative censoring means that the compensators of one-step transitions into the censoring state depend on different parameters than the compensators of transitions among the remaining states of the model.
Let 𝒯 ⊂ 𝒥 × 𝒥 be the set of pairs of adjacent states in the model, i.e. j = (j1, j2) ∈ 𝒯 iff the subject may progress from state j1 to state j2 in one step. For j = (j1, j2) ∈ 𝒯 and m ≥ 0, let Njm(x) = 1(Xm+1 ≤ x, Jm = j1, Jm+1 = j2, Cm = Tm+1), Yjm(x) = 1(Xm+1 ≥ x, Cm − Tm ≥ x, Jm = j1) and set

$$M_{jm}(x) = N_{jm}(x) - \int_0^x Y_{jm}(u)\, \alpha_j\big(\Gamma_{(j_1,\cdot)}(u), \theta, Z_{j_1 m}\big)\, d\Gamma_j(u).$$

The aggregate processes Nj., Yj. and Mj. are defined as Nj. = Σm Njm, Yj. = Σm Yjm and Mj. = Σm Mjm, respectively.
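In the nonparametric special case αj ≡ 1 (no covariates), the aggregate processes Nj. and Yj. lead to a Nelson-Aalen type estimate of Γj on the sojourn-time scale, Σ_{u ≤ x} ΔNj.(u)/Yj.(u), of the kind studied by Gill (1980) for semi-Markov models. The Python sketch below computes Nj.(x), Yj.(x) and this estimate from a pooled list of observed spells, each stored as a hypothetical (from_state, to_state or None, observed_length, is_transition) tuple; the function names are illustrative.

```python
def at_risk(spells, j1, x):
    """Y_j.(x): spells that start in state j1 and are still observed at sojourn length x."""
    return sum(1 for s in spells if s[0] == j1 and s[2] >= x)

def counting(spells, j, x):
    """N_j.(x): observed j1 -> j2 transitions with sojourn length at most x."""
    j1, j2 = j
    return sum(1 for s in spells if s[0] == j1 and s[1] == j2 and s[3] and s[2] <= x)

def nelson_aalen_sojourn(spells, j):
    """Estimate of Gamma_j when alpha_j = 1, at each observed transition length."""
    j1, j2 = j
    event_lengths = sorted({s[2] for s in spells if s[0] == j1 and s[1] == j2 and s[3]})
    path, cum = [], 0.0
    for t in event_lengths:
        d = sum(1 for s in spells if s[0] == j1 and s[1] == j2 and s[3] and s[2] == t)
        cum += d / at_risk(spells, j1, t)      # jump dN_j.(t) / Y_j.(t)
        path.append((t, cum))
    return path

spells = [(1, 2, 1.0, True), (1, 3, 2.0, True), (1, None, 1.5, False),
          (1, 2, 3.0, True), (2, 1, 0.5, True)]
gamma_12 = nelson_aalen_sojourn(spells, (1, 2))
```

Note that the time axis is the length of the sojourn, not calendar time, so spells from different visits to state 1 are pooled.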
The model depends on two parameters, θ and Γ; however, we suppress the dependence on Γ in the notation. In analogy to the single spell models in Bagdonavičius and Nikulin (1999, 2004) and Dabrowska (2006), under the regularity conditions stated in Section 5 we can associate with any θ ∈ Θ a vector Γθ of locally bounded increasing functions. For this purpose, we require only that the processes Nj. and Yj. have finite expectations. To show asymptotic normality of the estimates, we require existence of the second moments of these processes. More precisely, we assume the following conditions.
Condition 2.1
For all j ∈ 𝒯:
- The functions EYj.(x) have at most a finite number of discontinuity points and E Yj.(0)² < ∞.
- The functions ENj.(x) are continuous, E Nj.(τ)² < ∞, and the point τ satisfies inf{x: ENj.(x) > 0} < τ < τj0, where τj0 = sup{x: EYj.(x) > 0}.
- We have P(|Z_{J(t−), Ñ..(t−)}| ≤ C) = 1, where C is a finite constant, J(t) is the state occupied by the process at time t and Ñ..(t) = Σj Ñj.(t) is the total number of events observed in the interval [0, t].
Under the added assumption that the model corresponds to the censored modulated renewal process, and θ represents the true parameter, we have the following moment identities.
Lemma 2.1
Let L(t) = t − T_{Ñ..(t−)} be the backward recurrence time of the process Ñ and let {ϕm(x), m ≥ 0, x ≥ 0} be a sequence of random functions such that the process ϕ ∘ L, ϕ ∘ L(t) = ϕ_{Ñ..(t−)}(t − T_{Ñ..(t−)}), is predictable with respect to the filtration {
}t≥0 and
. Then
In addition, if {ϕ1m: m ≥ 0} and {ϕ2m: m ≥ 0} are two such sequences, then
for pairs j ≠ j′, j, j′ ∈
.
Similarly to Gill (1980), this lemma follows from the dominated convergence theorem, martingale properties of the processes M̃j = Ñj(t) − Λ̃j(t), and the identities
The identities hold almost surely for k = 1, 2. We omit the details.
3 Estimation
Throughout the remainder of this paper, we assume that we have an iid sample of size n of the censored modulated renewal process and covariates. The subscript "i" refers to the i-th subject under study and Di denotes the associated vector of observations: the sequence of states visited, the duration of the time spent in each state, the initial covariate and its updates occurring at uncensored renewal times.
Further, let q = |𝒯| be the total number of possible one-step transitions in the model. For each j = 1, …, q, we let (r(j), c(j)) = (j1, j2) if the pair j ∈ 𝒯 corresponds to the one-step transition from state j1 to state j2. For any such j ∈ 𝒯, the covariate Zj1m is denoted as Zjm. We shall also find it convenient to write Γ = [Γ1, …, Γq]T for the vector obtained by stacking the columns of the matrix Γ = [Γj]j∈𝒥×𝒥 on top of each other and deleting all entries corresponding to the pairs (j1, j2) ∉ 𝒯. For convenience, we write αj(y, θ, z) for each j ∈ 𝒯 and y = (y1, …, yq)T, yj ∈ R+, j = 1, …, q. However, it is tacitly assumed that for j = (j1, j2) ∈ 𝒯, the function αj(y, θ, z) may depend only on those yk for which (r(k), c(k)) = (j1, ℓ) for some (j1, ℓ) ∈ 𝒯.
Under the assumptions stated in Section 5, the parameter θ varies over a bounded open subset Θ of Rd and the functions ℓj(y, θ, z) = log αj(y, θ, z), y ∈ Rq, are twice continuously differentiable with respect to (y, θ). We let ℓ′j(y, θ, z) be a vector whose k-th component is equal to the partial derivative of ℓj(y, θ, z) with respect to yk, k = 1, …, q. Likewise, ℓ̇j denotes the (column) vector of length d corresponding to the derivative of ℓj with respect to θ. We further set

$$S_j(y, \theta, x) = \frac{1}{n} \sum_{i=1}^n \sum_{m \ge 0} Y_{jmi}(x)\, \alpha_j(y, \theta, Z_{jmi}), \quad y \in \mathbb R^q,$$

and denote by Ṡ, S′ the derivatives of these processes with respect to θ and y, respectively. Here, Ṡ is a d × q matrix whose j-th column is given by Ṡj(y, θ, x), the derivative of Sj with respect to θ. Further, S′ is a q × q matrix whose (k, j) entry is equal to the partial derivative of Sj(y, θ, x) with respect to yk, k = 1, …, q. Let s be the expected S process and let ṡ, s′ be the matrices of expected Ṡ and S′ processes. Finally, for each j ∈ 𝒯, we let

$$N_{j\cdot\cdot}(x) = \frac{1}{n} \sum_{i=1}^n \sum_{m \ge 0} N_{jmi}(x)$$

be the averaged process counting transitions from the state j1 = r(j) to the state j2 = c(j) whose sojourn time in the state j1 does not exceed x.
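As an estimate of the unknown transformations Γ = [Γ1, …, Γq]T, we consider a vector-valued analogue of the estimator proposed by Bagdonavičius and Nikulin (1999, 2004) for single spell models. The estimator Γnθ = [Γn1θ, …, Γnqθ]T is given by

$$\Gamma_{nj\theta}(x) = \int_0^x \frac{N_{j\cdot\cdot}(du)}{S_j(\Gamma_{n\theta}(u-), \theta, u)}, \quad j = 1, \dots, q. \tag{3.1}$$

For fixed θ, (3.1) is a sample analogue of the non-linear vector-valued Volterra equation

$$\Gamma_{j\theta}(x) = \int_0^x \frac{E N_{j\cdot}(du)}{s_j(\Gamma_{\theta}(u), \theta, u)}, \quad j = 1, \dots, q. \tag{3.2}$$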
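To make the recursion in (3.1) concrete, the Python sketch below solves a single-transition version (q = 1) for fixed θ: at each observed sojourn length u, the estimate jumps by ΔN(u)/S(Γnθ(u−), θ, u), where S sums α(Γnθ(u−), θ, Z) over the spells still at risk (the common factor 1/n cancels). The scalar-covariate hazards for the proportional hazard and proportional odds models are illustrative choices; with α ≡ 1 the recursion reduces to the Nelson-Aalen estimator. It is a sketch under these assumptions, not the authors' implementation.

```python
import math

def bn_estimator(spells, alpha, theta):
    """Single-transition version of (3.1) for fixed theta.

    spells: list of (observed_length, is_transition, z) for one origin state and
    one destination; alpha(y, theta, z) is the hazard of the transformation model.
    Returns [(u, Gamma(u))] at each observed transition length u.
    """
    event_times = sorted({t for t, d, _ in spells if d})
    gamma, path = 0.0, []
    for t in event_times:
        d_n = sum(1 for s, d, _ in spells if d and s == t)                 # dN(t)
        s_val = sum(alpha(gamma, theta, z) for s, _, z in spells if s >= t)  # S(Gamma(t-), theta, t)
        gamma += d_n / s_val
        path.append((t, gamma))
    return path

def alpha_ph(y, theta, z):      # proportional hazards: alpha does not depend on y
    return math.exp(theta * z)

def alpha_po(y, theta, z):      # proportional odds: alpha(y) = c / (1 + c y), c = exp(theta z)
    c = math.exp(theta * z)
    return c / (1.0 + c * y)

spells = [(1.0, True, 0.0), (2.0, False, 1.0), (3.0, True, 1.0)]
ph_path = bn_estimator(spells, alpha_ph, 0.0)
po_path = bn_estimator(spells, alpha_po, 0.0)
```

Because S is evaluated at Γnθ(u−), each jump only uses the estimate accumulated so far, which is what makes the Volterra-type equation solvable by forward recursion.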
Using arguments similar to those in Dabrowska (2006), we can show that under the regularity conditions stated in Section 5, the equation (3.2) has a unique solution Γθ = [Γ1θ, …, Γqθ]T and its estimator (3.1) is uniformly consistent. Further, the map Θ ∋ θ → {Γθ(x): x ∈ [0, τ]} ∈ C([0, τ])q is Fréchet differentiable with respect to θ. The derivative Γ̇θ is a d × q matrix of continuous functions satisfying the matrix-valued linear Volterra equation
(3.3) |
where Cθ(x) is the diagonal q × q matrix Cθ(x) = diag [C1θ(x), …, Cqθ(x)] with entries
and
The solution to the Volterra equation is given by
(3.4) |
where
(w, x), 0 < w ≤ x is the Peano series (Gill and Johansen, 1990)
(3.5) |
Here I is the q × q identity matrix. A uniformly consistent estimate of {Γ̇θ(x): x ∈ [0, τ], θ ∈ Θ} can be obtained by substituting the processes Nj.., Sj, S′j and Ṡj into the preceding expressions.
To define the score equation for estimation of the Euclidean parameter, let
where fj(y, θ, Zjmi) is a function of covariates, jointly continuous with respect to (y, θ) and bounded on every compact set of Rq × Θ. Likewise, for any two vectors f1j and f2j of such functions, define
and set varj[fj](u, θ) = covj[fj, fj](u, θ).
To estimate the parameter θ, we use a solution to the score equation Un(θ) = Unϕn(θ) = oP (n−1/2), where
(3.6) |
b̂jmi(Γnθ(u), u, θ) = b̂jm1i(Γnθ(u), u, θ) − ϕnθ(u)b̂jm2i(Γnθ(u), u, θ) and
Here ϕnθ(x) is an estimate of a d × q matrix of bounded functions ϕθ(x), whose j-th column is absolutely continuous with respect to Γjθ.
We further define matrices
where and
Proposition 3.1
Let εn ↓ 0 be a sequence such that
and let 𝓑(θ0, εn) = {θ: |θ − θ0| ≤ εn} be the ball of radius εn centered at θ0. Suppose that the matrix Σ0(θ0) is positive definite and the matrix Σ1(θ0) is non-singular. Under the conditions stated in Section 5, the score equation Unϕn(θ) = oP*(n−1/2) has a solution θ̂ in the ball 𝓑(θ0, εn), with (inner) probability tending to 1. Further, let
and
. Then [Ξ̂, Ŵ0] converges weakly in Rd × ℓ∞([0, τ] × 𝒯) to a tight mean zero Gaussian process [Ξ, W0] with covariance
where Kθ, θ ∈ Θ is a q × q matrix
(3.7) |
Here 𝔻 = ℓ∞([0, τ] × 𝒯) denotes the space of bounded functions mapping the set [0, τ] × 𝒯 into R, equipped with the uniform metric and Borel σ-field. The Borel σ-field of 𝔼 = Rd × 𝔻 is generated by the open sets in the product topology of the Euclidean space Rd and the space 𝔻. It is equal to 𝓑(Rd) ⊗ 𝓑(𝔻) because Rd is a complete separable metric space. The process X = (Ξ, W0) has a version whose almost all paths lie in the separable subspace of 𝔼 corresponding to Rd × Cb([0, τ] × 𝒯), where Cb([0, τ] × 𝒯) is the space of functions continuous with respect to the variance pseudometric. Weak convergence of the sequence Xn = [Ξ̂, Ŵ0] to X = (Ξ, W0) means that for all bounded continuous functions f on 𝔼, we have E*f(Xn) − Ef(X) → 0, where E* is the outer expectation. This implies that Xn is asymptotically measurable, i.e. E*f(Xn) − E_*f(Xn) → 0 for all bounded continuous functions f on 𝔼, where E_*f(Xn) = −E*(−f(Xn)) is the inner expectation (van der Vaart and Wellner (1996), Dudley (1999)). We also note that the space 𝔻 = ℓ∞([0, τ] × 𝒯) is isometric to the product space 𝕐 = ℓ∞([0, τ])q equipped with the uniform metric dY(x, y) = maxj supt |xj(t) − yj(t)|, and the product topology of 𝕐 coincides with the topology induced by the metric dY. Under the assumptions of Section 5, the space Cb([0, τ] × 𝒯) is isometric to the space C([0, τ])q and W0 is a linear transformation of a vector of q independent time-transformed Brownian motions.
The M-estimator θ̂ depends on the specification of the matrix ϕθ and its estimator ϕnθ. Depending on the measurability properties of the estimator ϕnθ, the solution to the score equation exists either with probability tending to 1 or with inner probability tending to 1 (Section 5). Two simple choices of the function ϕθ are ϕθ ≡ 0 and ϕθ = −Γ̇θ. With the latter choice, the estimate θ̂ is an analogue of the pseudo-maximum likelihood estimators considered by Bagdonavičius and Nikulin (1999, 2004) for single spell models. Under regularity conditions, the optimal choice of this function corresponds to the solution of a system of Sturm-Liouville equations and yields an asymptotically efficient estimate of the Euclidean component of the model. If the process registers only events of one type (i.e. |𝒥| = 1), then the form of ϕθ corresponding to the efficient estimate of θ is similar to that of the single spell version of this model and can be found in Bickel (1986) and Bickel and Ritov (1995) in the uncensored case, and in Dabrowska (2007) in the censored case. In this case the estimate of the function ϕθ can be obtained by inverting a simple tridiagonal band-symmetric matrix. The form of the information bound and the efficient score function for the general case (|𝒥| > 1) is postponed to a separate paper, where we consider it under additional compatibility conditions.
To set confidence bands for the baseline vector Γ and related parameters, we use the Gaussian multiplier method of Lin, Fleming and Wei (1994). For this purpose, we need some additional notation.
Let G0 be an N(0, Id×d) random vector and let Gi = (Gmi: m = 1, …, Ki), i = 1, …, n, Ki = Y..i(0), be standard normal variables, independent of G0 and mutually independent given the data D1, …, Dn.
- For j ∈ 𝒯, set
- Put , where and
The estimates Q̂θ̂ and
are plug-in analogues of the matrices defined in (3.3)–(3.5).
Proposition 3.2
Suppose that the conditions of Proposition 3.1 are satisfied. Then, unconditionally, ((Ξ̂#,
),
converges weakly in Rd×ℓ∞([0, τ]×
) to a mean zero Gaussian process (Ξ#,
) with the same covariance function as (Ξ, W0). Moreover, (Ξ, W0) and (Ξ#,
) are independent while (Ξ̂, Ŵ0) and (Ξ̂#,
) are asymptotically independent. Conditionally, the process (Ξ̂#,
) converges weakly to (Ξ#,
), in probability. As in van der Vaart and Wellner (1996, p. 181), conditional weak convergence means that
, where EG denotes expectation with respect to the G variables. Further, h varies over the class of bounded Lipschitz functions, and BL1 is the set of Lipschitz functions whose bounded Lipschitz norm is at most 1.
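The idea behind the multiplier method can be seen in the simplest case of a single transition with α ≡ 1, where the estimator (3.1) reduces to the Nelson-Aalen estimator on the sojourn-time scale. Attaching an independent standard normal multiplier to each spell with an observed transition gives W#(x) = Σ_{events u ≤ x} G/Y(u), whose conditional variance Σ 1/Y(u)² is the usual variance estimate, and the (1 − a) quantile of sup_x |W#(x)| over many draws yields a simultaneous band. The Python sketch below is a hypothetical simplification for illustration only; it omits the G0 and θ components of the method described above, and its function name and defaults are assumptions.

```python
import random

def multiplier_band(lengths, events, level=0.95, n_rep=2000, seed=0):
    """Nelson-Aalen estimate on the sojourn scale with a simultaneous band from
    Gaussian multipliers (one N(0, 1) multiplier per spell with an observed event).

    lengths[i]: observed sojourn length of spell i; events[i]: True if spell i
    ended with the transition of interest.
    Returns (times, estimate, lower, upper, half_width).
    """
    rng = random.Random(seed)
    ev = sorted((lengths[i], i) for i in range(len(lengths)) if events[i])
    times = sorted({t for t, _ in ev})
    risk = {t: sum(1 for s in lengths if s >= t) for t in times}     # Y(t)
    est, cum = [], 0.0
    for t in times:
        cum += sum(1 for s, _ in ev if s == t) / risk[t]
        est.append(cum)
    sups = []
    for _ in range(n_rep):
        w, sup, k = 0.0, 0.0, 0
        for t in times:
            while k < len(ev) and ev[k][0] == t:
                w += rng.gauss(0.0, 1.0) / risk[t]     # multiplier times dN / Y
                k += 1
            sup = max(sup, abs(w))
        sups.append(sup)
    sups.sort()
    c = sups[min(n_rep - 1, int(level * n_rep))]         # quantile of sup |W#|
    lower = [max(0.0, a - c) for a in est]                # truncated at zero
    upper = [a + c for a in est]
    return times, est, lower, upper, c

data_rng = random.Random(3)
lengths = [data_rng.expovariate(1.0) for _ in range(60)]
events = [data_rng.random() < 0.7 for _ in range(60)]
times, est, lower, upper, c = multiplier_band(lengths, events)
```

The band has constant width 2c on the cumulative hazard scale; weighted versions, or bands on a transformed scale, are obtained by rescaling W# before taking the supremum.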
This proposition can be further extended to approximate the distribution of functionals Φ(θ, Γ). In sufficiently simple cases, the functional delta method can be used for this purpose. In particular, we may consider estimation of the kernel F of a semi-Markov process with state space 𝒥 = {1, …, r}. In this case the covariates are time independent, and the entries of the matrix F(x|z) = [Fj(x|z)]j∈𝒯 are specified by (2.2)–(2.3). Under the assumed differentiability conditions on the hazard functions αj, the plug-in sample analogue F̂ of the matrix F has entries satisfying
(3.8) |
For any j = (j1, j2) ∈ 𝒯, Γ(j1,·) and W̃(j1,·) denote the subvectors Γ(j1,·) = {Γθ0j: j = (j1, ℓ) ∈ 𝒯} and W̃(j1,·) = {W̃0j: j = (j1, ℓ) ∈ 𝒯}, where
(3.9) |
Denote by
the matrix obtained by replacing in (3.8)–(3.9) the process (Ξ̂, Ŵ0) by (Ξ̂#,
) and the unknown parameters by their estimates (θ̂, Γnθ̂). Using integration by parts and Proposition 3.1, it is easy to verify that the process ŴF = [ŴF,j(x|z): x ≤ τ, j ∈ 𝒯] converges weakly to a mean zero Gaussian process WF in ℓ∞([0, τ])^{|𝒯|}. In addition, the conclusions of Proposition 3.2 carry over to the process
, i.e. unconditionally,
converges weakly to a mean zero Gaussian process
with the same covariance function as the process WF and is independent of it. Conditionally, the process
converges weakly to
in probability.
Another example of a functional is the cumulative residual process arising in goodness-of-fit testing. In particular, suppose that the covariates are partitioned into k disjoint categories, I1, …, Ik. The cumulative residual process for the one-step transition j1 → j2 is given by
where
is the risk process corresponding to subjects in the group Iℓ. If the model is correctly specified, the process R̂ = {R̂j(t, ℓ): t ∈ [0, τ], j ∈ 𝒯, ℓ = 1, …, k} converges weakly to a mean zero Gaussian process and the Gaussian multiplier approximation to its distribution is given by
Following Martinussen and Scheike (2006), the performance of the residuals can be evaluated using Kolmogorov-Smirnov statistics such as supx∈[δ,τ−δ] |R̂j(x, ℓ)|, and the Gaussian multiplier method can be used to obtain critical levels of the tests. Alternative tests can be obtained by modifying the chi-squared tests in Aalen et al. (2008, p. 144) or tests based on Schoenfeld residuals.
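A simplified version of this test can be sketched for a single transition when α ≡ 1: the residual for group Iℓ is Rℓ(x) = Nℓ(x) − Σ_{u ≤ x} ΔN(u)Yℓ(u)/Y(u), and replacing the contribution of each observed transition by a standard normal multiple gives draws from its approximate null distribution. The Python sketch below returns the Kolmogorov-Smirnov statistic (supremum over event times in [δ, τ − δ], with τ taken as the largest observed sojourn length) and a multiplier p-value; the function name, data layout and defaults are hypothetical.

```python
import random

def residual_ks_test(lengths, events, groups, group, delta=0.0, n_rep=1000, seed=0):
    """Kolmogorov-Smirnov check of one group's cumulative residual process on the
    sojourn scale (nonparametric case alpha = 1) with a Gaussian multiplier p-value.

    R(x) = N_group(x) - sum over event lengths u <= x of dN(u) * Y_group(u) / Y(u).
    """
    tau = max(lengths)
    ev = sorted((lengths[i], i) for i in range(len(lengths)) if events[i])
    times = {t for t, _ in ev}
    y_all = {t: sum(1 for s in lengths if s >= t) for t in times}
    y_grp = {t: sum(1 for s, g in zip(lengths, groups) if s >= t and g == group)
             for t in times}
    # jump of R at each observed event: 1(event in group) - Y_group(t) / Y(t)
    incr = [(t, (1.0 if groups[i] == group else 0.0) - y_grp[t] / y_all[t]) for t, i in ev]

    def sup_abs(weights):
        r, best = 0.0, 0.0
        for k, ((t, inc), w) in enumerate(zip(incr, weights)):
            r += w * inc
            last_at_t = k + 1 == len(incr) or incr[k + 1][0] != t   # evaluate after ties
            if last_at_t and delta <= t <= tau - delta:
                best = max(best, abs(r))
        return best

    observed = sup_abs([1.0] * len(incr))          # unit weights give the observed R
    rng = random.Random(seed)
    exceed = sum(sup_abs([rng.gauss(0.0, 1.0) for _ in incr]) >= observed
                 for _ in range(n_rep))
    return observed, (exceed + 1) / (n_rep + 1)

# Hypothetical data: all group-0 transitions occur before any group-1 transition.
lengths = [0.1 * k for k in range(1, 21)] + [5.0 + 0.1 * k for k in range(1, 21)]
events = [True] * 40
groups = [0] * 20 + [1] * 20
stat, p_value = residual_ks_test(lengths, events, groups, 0)
```

In the artificial example the two groups have clearly different sojourn distributions, so the residual for group 0 drifts far from zero and the multiplier p-value is small.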
4 Example
We consider a transplant outcome data set from the Center for International Blood and Marrow Transplant Research (CIBMTR). The CIBMTR comprises clinical and basic scientists who confidentially share data on their blood and bone marrow transplant patients with the CIBMTR Data Collection Center located at the Medical College of Wisconsin. The CIBMTR is a repository of information about the results of transplants at more than 450 transplant centers worldwide. The example data set consists of patients who received an HLA-identical sibling transplant from 1995 to 2004 for acute myelogenous leukemia (AML) or acute lymphoblastic leukemia (ALL) and were transplanted in first remission. All patients received bone marrow transplantation or peripheral blood stem cell transplantation. Children under age 16 and all patients who received umbilical cord blood transplants were excluded, as risk factors are likely to differ in these groups.
Allogeneic stem cell transplantation (ASCT) is an accepted treatment for leukemia patients. Transplant candidates receive high doses of chemotherapy and radiation that destroy malignant cells in the bone marrow and elsewhere. Because normal stem cells in the bone marrow are destroyed in this process as well, patients subsequently receive a transplant from a suitably matched donor. The transplant can be followed by several complications. In this study, fatal complications correspond to relapse of leukemia or death in remission (hereafter referred to as death). The most important intermediate event in ASCT is graft-versus-host disease (GVHD), in which transplanted immune cells recognize the recipient’s body tissues as foreign. Acute and chronic GVHD (AGVHD and CGVHD) are the two forms of this disease. AGVHD occurs during the early post-transplant period and is defined here as moderate to severe disease according to clinically established criteria. CGVHD occurs later and may be preceded by AGVHD.
The incidence of GVHD, leukemia relapse and death in remission depends on a number of variables characterizing the recipient, the donor and the transplant. The main variables considered in this paper include the recipient’s age, donor-recipient gender match, disease type and graft source. Bone marrow was the first source of stem cells used in ASCT. Since the 1990s, peripheral blood stem cells have replaced bone marrow as the preferred source of stem cells because of quicker hematologic recovery and relative ease of collection. Patients may also receive an infusion of both peripheral blood stem cells and bone marrow. Several studies have shown that recipients of peripheral blood stem cell transplants (PBSCT) may be at a higher risk of GVHD than bone marrow transplant (BMT) patients (e.g. Cutler et al., 2001; Flowers et al., 2002; Friedrichs et al., 2010). A possible explanation of this phenomenon is that GVHD develops from the infusion of donor T cells, and PBSCT recipients receive a significantly higher dose of T cells than BMT patients. As a result of the increased risk of GVHD, PBSCT recipients may also be at a higher risk of death in remission than BMT patients. GVHD is also more common among older patients and among male recipients of transplants from female donors (Gale et al. 1987).
For purposes of modeling, we consider a five-state modulated renewal model proposed for the analysis of the transplant recovery process in Dabrowska et al. (1994). Table 1 reports the number of observed transitions of each type, together with the median and range of the corresponding times. The model assumes that a patient remains in the transplant state (tx, state 1) until the time of the first adverse event, which may correspond to AGVHD (state 2), CGVHD (state 3), relapse (state 4) or death in remission (state 5). The model also takes into account that a patient who develops GVHD may subsequently relapse or die, and that CGVHD may be preceded by AGVHD. The observed model has an extra absorbing state corresponding to censoring (loss to follow-up). Age was categorized into 3 groups, each containing approximately one third of the patients; the baseline group corresponds to the age range [29.5, 42.5]. Transitions were also adjusted for the waiting time to transplant, using two continuous variables: the length of time between leukemia diagnosis and first remission (DxCr) and the length of time between first remission and transplant (CrTx). Their medians, interquartile ranges (IQR) and ranges were median(DxCr) = 1.38, IQR(DxCr) = 1.15 and range(DxCr) = 221.45 months, and median(CrTx) = 3.06, IQR(CrTx) = 2.5 and range(CrTx) = 46.74 months. To reduce skewness, log-transformed values of these variables are used in the regression analysis.
Table 1.
Observed one-step transitions
Transition | n | median (in months) | range (in months) |
---|---|---|---|
TX → AGVHD | 491 | .7 | 4.3 |
TX → CGVHD | 372 | 5.5 | 106.4 |
TX → relapse | 106 | 5.6 | 59.4 |
TX → death | 179 | 2.9 | 131.9 |
TX → censoring | 506 | 56.9 | 143.8 |
AGVHD → CGVHD | 202 | 4.8 | 57.4 |
AGVHD → relapse | 33 | 5.2 | 23.7 |
AGVHD → death | 141 | 2.9 | 80.3 |
AGVHD → censoring | 115 | 45.7 | 133.0 |
CGVHD → relapse | 27 | 8.3 | 98.3 |
CGVHD → death | 79 | 9.8 | 124.4 |
CGVHD → censoring | 266 | 51.1 | 144.3 |
A+CGVHD → relapse | 25 | 3.5 | 53.3 |
A+CGVHD → death | 65 | 5.6 | 109.3 |
A+CGVHD → censoring | 112 | 56.3 | 145.2 |
The modulated renewal process assumes that one-step transition probabilities are specified by means of a proportional odds ratio model. More precisely, hazard rates of one-step transitions originating from the transplant or AGVHD state are of the form
for j = (j1, j2) such that j1 = 1 or 2 and j1 + 1 ≤ j2 ≤ 5. For transition rates originating from the CGVHD state, we use the covariate ZC = (Z, ZA), where ZA is a binary variable equal to 1 if AGVHD preceded the onset of chronic graft-versus-host disease. The corresponding transition rates into the relapse and death states are given by
for j = (3, j2) and j2 = 4, 5. Here Zj and ZjC, j = (j1, j2), represent transition-specific covariates, which correspond to subvectors of Z and ZC, respectively. Table 4 lists these covariates together with the estimates of the regression coefficients and their standard errors. The estimates were obtained using the Fisher scoring algorithm applied to the score process (3.6) with ϕnθ = −Γ̇nθ. Variable selection was based on backward elimination and Wald testing. To assess the adequacy of the model, we used the Kolmogorov-Smirnov statistics described in Section 3. The results are summarized below and in Table 5.
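The Wald tests used in the backward elimination compare each estimate with its standard error. A minimal sketch, assuming single-coefficient tests with a standard normal reference distribution (joint tests of several coefficients would use a chi-squared statistic instead). The helper names are hypothetical, and the only numbers taken from Table 4 are the ALLxPBSCT entry .92 (.43). A full backward-elimination step would refit the model after each removal, which is not shown.

```python
import math

def wald_test(estimate, std_error):
    """Two-sided Wald test of H0: coefficient = 0 using the normal approximation."""
    z = estimate / std_error
    # P(|N(0, 1)| >= |z|) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

def least_significant(coefs, alpha=0.05):
    """Covariate a backward-elimination step would drop, or None if all p <= alpha.

    coefs: dict mapping a covariate name to (estimate, standard error).
    """
    p_values = {name: wald_test(b, se)[1] for name, (b, se) in coefs.items()}
    worst = max(p_values, key=p_values.get)
    return worst if p_values[worst] > alpha else None

# The ALLxPBSCT entry .92 (.43) from Table 4.
z, p = wald_test(0.92, 0.43)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```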
Table 4.
Regression estimates
Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
ALL vs AML | .07 (.25) | 1.32 (.36) | .50 (.23) | .45 (.38) | .12 (.30) | .90 (.31) | .58 (.22) | ||
Age1 | −.25 (.16) | −.68 (.28) | −.49 (.32) | −.45 (.26) | |||||
Age2 | .27 (.20) | .33 (.23) | .51 (.59) | .01 (.24) | .72 (.21) | ||||
FM | −.20 (.28) | −.57 (.39) | |||||||
PBSCT vs BMT | .09 (.22) | .01 (.29) | .28 (.30) | −.12 (.66) | −.17 (.38) | .55 (.35) | |||
ALLxPBSCT | .46 (.23) | .92 (.43) | |||||||
AMLxPBSCT | −.33 (.30) | ||||||||
AMLxBMT | −.30 (.22) | −.50 (.43) | |||||||
DxCr | .12 (.08) | .45 (.16) | .22 (.12) | .13 (.14) | |||||
CrTx | −.21 (.07) | −.26 (.09) | −.33 (.15) | −.24 (.17) | −.10 (.12) | ||||
prior AGVHD | .72 (.30) | .72 (.20) | |||||||
Age1xPBSCT | −.44 (.34) | ||||||||
Age1xBMT | −.57 (.33) | .70 (.60) | −.88 (.35) | −.64 (.34) | |||||
Age2xPBSCT | −.28 (.25) | .13 (.37) | |||||||
Age2xBMT | −.37 (.24) | .60 (.88) | |||||||
Age0xPBSCT | −.27 (.26) | .40 (.31) | |||||||
AMLxPBSCTxAge2 | .25 (.22) | ||||||||
Age1xALL | .61 (.27) | −.87 (.48) | −.60 (.42) | ||||||
Age2xALL | .57 (.30) | −.97 (.47) | |||||||
FMxALL | .63 (.27) | ||||||||
FMxAML | .64 (.17) | .81 (.45) | |||||||
FMxPBSCT | .42 (.18) | .44 (.25) | |||||||
FxALL | −.25 (.19) | ||||||||
FxAML | −.19 (.14) |
Columns: 1 = Tx → AGVHD; 2 = Tx → CGVHD; 3 = Tx → Relapse; 4 = Tx → Death; 5 = AGVHD → CGVHD; 6 = AGVHD → Relapse; 7 = AGVHD → Death; 8 = CGVHD → Relapse; 9 = CGVHD → Death.
Variable names: Age0 = age in the (29.5, 42.5] range, Age1 = age ≤ 29.5 years, Age2 = age > 42.5 years; F = female donor transplant; FM = female donor to male recipient vs other sex-match transplant.
Table 5.
Kolmogorov-Smirnov residual statistics
Group | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
AML | 8.51 (.97) | 6.35 (.96) | 7.88 (.69) | 5.87 (.86) | 4.40 (.96) | 2.75 (.86) | 4.14 (.97) | 4.42 (.73) | 6.70 (.82) |
ALL | 10.22 (.89) | 7.66 (.83) | 7.65 (.69) | 6.09 (.73) | 4.34 (.94) | 2.66 (.85) | 4.23 (.95) | 4.49 (.71) | 6.46 (.75) |
Age0 | 12.99 (.81) | 6.52 (.93) | 3.30 (.95) | 6.50 (.63) | 4.55 (.92) | 2.79 (.51) | 4.37 (.93) | 2.03 (.94) | 8.17 (.47) |
Age1 | 6.42 (.98) | 5.82 (.95) | 3.20 (.94) | 5.18 (.82) | 4.96 (.87) | 3.66 (.71) | 2.66 (.98) | 2.04 (.95) | 3.53 (.87) |
Age2 | 10.56 (.90) | 7.64 (.91) | 6.40 (.60) | 8.11 (.72) | 6.35 (.83) | 2.44 (.87) | 2.92 (.99) | 1.51 (.99) | 8.07 (.76) |
BMT | 7.84 (.97) | 7.73 (.91) | 7.16 (.69) | 6.04 (.61) | 5.96 (.81) | 1.36 (.99) | 5.11 (.90) | 2.94 (.79) | 9.67 (.57) |
PBSCT | 9.44 (.95) | 9.07 (.90) | 7.49 (.73) | 5.75 (.82) | 6.66 (.88) | 1.27 (.99) | 4.84 (.95) | 2.91 (.91) | 9.39 (.69) |
non-FM | 5.46 (.99) | 7.84 (.94) | 1.95 (.99) | 9.82 (.66) | 4.22 (.97) | 1.50 (.97) | 4.57 (.96) | 3.97 (.80) | 4.20 (.95) |
FM | 6.83 (.96) | 9.36 (.82) | 2.32 (.96) | 9.52 (.43) | 5.06 (.88) | 1.33 (.96) | 5.07 (.84) | 3.91 (.60) | 4.37 (.88) |
M donor | 5.48 (.99) | 1.29 (.84) | 2.96 (.98) | 6.00 (.84) | 1.42 (.58) | 2.02 (.94) | 3.93 (.98) | 2.06 (.97) | 3.50 (.98) |
F donor | 6.84 (.98) | 11.95 (.80) | 2.61 (.99) | 5.83 (.86) | 11.50 (.53) | 1.93 (.93) | 4.43 (.95) | 2.08 (.97) | 3.47 (.97) |
DxCr-1 | 9.73 (.86) | 16.86 (.42) | 5.67 (.56) | 7.88 (.49) | 4.92 (.85) | 2.27 (.77) | 5.17 (.81) | 2.44 (.89) | 5.34 (.74) |
DxCr-2 | 8.47 (.88) | 11.67 (.59) | 2.56 (.94) | 2.86 (.98) | 6.80 (.61) | 2.79 (.64) | 5.93 (.73) | 3.85 (.57) | 4.72 (.74) |
DxCr-3 | 4.03 (1.00) | 12.92 (.48) | 4.34 (.71) | 5.03 (.71) | 9.88 (.33) | 2.21 (.75) | 4.27 (.88) | 4.38 (.40) | 11.23 (.30) |
DxCr-4 | 11.80 (.83) | 15.05 (.43) | 4.01 (.89) | 6.71 (.69) | 6.15 (.71) | 5.67 (.27) | 12.11 (.43) | 2.83 (.78) | 3.52 (.94) |
CrTx-1 | 14.69 (.74) | 11.23 (.52) | 5.19 (.66) | 5.21 (.74) | 6.53 (.74) | 3.97 (.55) | 7.24 (.76) | 3.17 (.75) | 2.43 (1.00) |
CrTx-2 | 9.92 (.85) | 12.52 (.58) | 3.96 (.81) | 5.37 (.68) | 7.05 (.58) | 2.61 (.66) | 6.13 (.73) | 3.49 (.67) | 7.57 (.54) |
CrTx-3 | 12.08 (.76) | 5.86 (.94) | 5.96 (.62) | 8.34 (.46) | 4.17 (.89) | 2.52 (.67) | 2.62 (.98) | 2.34 (.86) | 4.84 (.75) |
CrTx-4 | 8.84 (.87) | 7.21 (.85) | 2.94 (.92) | 1.37 (.35) | 5.73 (.71) | 4.21 (.37) | 5.71 (.73) | 3.59 (.57) | 4.54 (.80) |
AML x BMT | 5.38 (.99) | 9.84 (.76) | 5.15 (.70) | 1.62 (.41) | 5.12 (.82) | 2.04 (.88) | 5.46 (.78) | 2.57 (.64) | 7.96 (.50) |
AML x PBSCT | 7.25 (.96) | 6.30 (.97) | 1.03 (.34) | 5.04 (.80) | 5.18 (.92) | 1.84 (.87) | 5.77 (.88) | 3.18 (.84) | 4.60 (.91) |
ALL x BMT | 7.62 (.85) | 2.92 (.99) | 4.44 (.74) | 7.03 (.42) | 2.44 (.97) | 1.10 (.99) | 2.79 (.95) | 2.46 (.70) | 3.22 (.85) |
ALL x PBSCT | 6.20 (.93) | 6.39 (.69) | 3.88 (.85) | 3.01 (.86) | 4.62 (.82) | 1.97 (.75) | 3.24 (.95) | 3.79 (.66) | 5.92 (.62) |
Age1 x PBSCT | 14.92 (.71) | 8.03 (.81) | 6.58 (.63) | 5.12 (.75) | 7.56 (.58) | 1.63 (.84) | 5.31 (.86) | 3.28 (.71) | 6.58 (.57) |
Age1 x BMT | 6.45 (.93) | 6.12 (.84) | 3.11 (.88) | 7.13 (.51) | 4.22 (.78) | 2.49 (.80) | 2.01 (.96) | 2.94 (.56) | 3.09 (.76) |
Age2 x PBSCT | 7.31 (.94) | 5.61 (.95) | 7.69 (.34) | 9.53 (.45) | 4.36 (.93) | 1.28 (.96) | 3.74 (.95) | 1.26 (1.00) | 9.60 (.57) |
Age2 x BMT | 5.46 (.87) | 6.96 (.68) | 4.02 (.41) | 4.58 (.76) | 3.66 (.77) | 1.95 (.78) | 2.37 (.93) | .93 (.91) | 4.43 (.72) |
Age0 x PBSCT | 4.29 (.98) | 8.10 (.71) | 4.73 (.67) | 5.45 (.49) | 5.34 (.74) | 1.88 (.61) | 2.46 (.98) | 1.70 (.94) | 3.58 (.75) |
Age0 x BMT | 9.73 (.73) | 9.06 (.59) | 3.97 (.76) | 8.60 (.22) | 6.34 (.42) | 1.76 (.58) | 4.03 (.85) | 3.31 (.24) | 5.82 (.45) |
Age1 x AML | 6.87 (.89) | 5.47 (.87) | 2.52 (.91) | 2.49 (.98) | 3.22 (.94) | 3.98 (.34) | 2.73 (.86) | 2.39 (.59) | 3.29 (.68) |
Age1 x ALL | 4.51 (.97) | 2.73 (.99) | 2.70 (.89) | 3.19 (.74) | 2.59 (.96) | 2.18 (.82) | 3.95 (.78) | 3.88 (.50) | 2.06 (.93) |
Age2 x AML | 10.54 (.80) | 7.95 (.83) | 2.94 (.89) | 3.69 (.88) | 5.19 (.78) | 3.98 (.17) | 4.89 (.81) | 1.78 (.92) | 7.34 (.38) |
Age2 x ALL | 8.57 (.65) | 5.01 (.60) | 4.60 (.74) | 3.34 (.74) | 2.13 (.96) | 1.74 (.29) | 4.09 (.78) | 1.94 (.67) | 1.69 (.97) |
Age0 x AML | 9.70 (.88) | 9.63 (.80) | 7.64 (.34) | 8.95 (.58) | 4.92 (.89) | 3.30 (.63) | 5.20 (.86) | 3.54 (.67) | 4.12 (.95) |
Age0 x ALL | 3.75 (.95) | 2.97 (.94) | 3.75 (.52) | 2.10 (.94) | 3.15 (.76) | 1.27 (.86) | 3.16 (.81) | 2.32 (.67) | 5.76 (.52) |
Columns: 1 = Tx → AGVHD; 2 = Tx → CGVHD; 3 = Tx → Relapse; 4 = Tx → Death; 5 = AGVHD → CGVHD; 6 = AGVHD → Relapse; 7 = AGVHD → Death; 8 = CGVHD → Relapse; 9 = CGVHD → Death.
Variable names: Age0 = age in the (29.5, 42.5] range, Age1 = age ≤ 29.5 years, Age2 = age > 42.5 years.
DxCr-i and CrTx-i, i = 1, 2, 3, 4: DxCr and CrTx variables grouped according to quartiles.
F = female donor transplant; FM = female donor to male recipient vs other sex-match transplant. Each cell gives the test statistic and, in parentheses, the p-value determined from 5000 resampling experiments.
We note here that the transitions originating from the CGVHD state depend on whether or not AGVHD was experienced prior to entrance into the CGVHD state. This dependence violates the assumption that the sequence of states visited forms a Markov chain. However, the problem disappears if the state space of the process is enlarged to include an extra state, A+CGVHD, denoted here by 3̄. Conditionally on the time-independent covariates, the resulting model has the structure of a semi-Markov process with kernel F(x|z) = [Fj(x|z)] specified in Table 3. The entries of the kernel matrix have a fairly explicit form. For transitions originating from the transplant (tx) or AGVHD state, we have
Table 3.
One-step transition probability matrix
from/to | tx | AGVHD | CGVHD | A+CGVHD | rel | death |
---|---|---|---|---|---|---|
tx | 0 | F12 | F13 | 0 | F14 | F15 |
AGVHD | 0 | 0 | 0 | F23 | F24 | F25 |
CGVHD | 0 | 0 | 0 | 0 | F34 | F35 |
A+CGVHD | 0 | 0 | 0 | 0 | F3̄4 | F3̄5 |
rel | 0 | 0 | 0 | 0 | 1 | 0 |
death | 0 | 0 | 0 | 0 | 0 | 1 |
for j = (j1, j2), j1 = 1, 2 and j1 + 1 ≤ j2 ≤ 5. One-step transition probabilities originating from the CGVHD state are given by
for j = (3, j2) and j2 = 4, 5. One-step transition probabilities originating from the state A+CGVHD (labeled “3̄”) have a similar form, with covariate ZA = 1.
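For a fixed covariate value, each one-step transition probability out of a state j1 is a subdistribution function determined by the hazard rates of all transitions leaving j1. Assuming continuous hazards, the standard competing-risks relation gives $F_j(x \mid z) = \int_0^x \exp\{-\sum_k A_{(j_1,k)}(u \mid z)\}\, \alpha_j(u \mid z)\, du$, where the A's are cumulative hazards. The sketch below evaluates this numerically from hazard values on a grid, so it does not depend on the proportional odds parametrization; the function names and the constant hazards in the example are hypothetical.

```python
import numpy as np

def _cumtrapz(values, grid):
    """Cumulative trapezoidal integral along the last axis, starting at 0."""
    steps = 0.5 * (values[..., 1:] + values[..., :-1]) * np.diff(grid)
    zeros = np.zeros(values.shape[:-1] + (1,))
    return np.concatenate([zeros, np.cumsum(steps, axis=-1)], axis=-1)

def one_step_probabilities(hazards, grid):
    """One-step subdistribution functions F_j(x) = int_0^x S(u) alpha_j(u) du.

    hazards: (m, len(grid)) hazard rates of the m transitions leaving one state
             (hypothetical inputs; any parametric model could produce them).
    grid   : increasing time grid starting at 0.
    """
    cum_haz = _cumtrapz(hazards, grid)            # A_j(x) for each transition
    surv = np.exp(-cum_haz.sum(axis=0))           # probability of no exit by time x
    return _cumtrapz(surv * hazards, grid)        # F_j(x), one row per transition

# Hypothetical constant hazards (per month) for two competing exits from a state.
grid = np.linspace(0.0, 24.0, 2401)
haz = np.vstack([np.full_like(grid, 0.10), np.full_like(grid, 0.30)])
F = one_step_probabilities(haz, grid)
```

With constant hazards the result can be checked against the closed form F_1(x) = 0.25(1 − e^{−0.4x}); the rows of F add up to at most one, mirroring the row sums of the kernel in Table 3.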
We also consider multi-step probabilities of transitions into the absorbing states, i.e. probabilities of transition into the relapse and death states along any possible path of the model. Let J(t) be the state occupied by the process at time t and let e denote either relapse or death in remission. Since a patient may move into an absorbing state by first passing through the GVHD states, these probabilities are given by
where
(4.1) |
and the events A and C represent
The first of these probabilities corresponds to a move from the transplant to the state e in one step so that for e = 4, 5. The terms and provide the probabilities of transitions along the paths “tx → AGVHD → e” ( ) and “tx → CGVHD → e” ( ) and are given by , k = 2 or 3, e = 4 or 5. Here, for any two subdistribution functions F and F′ on the positive half-line, F ★ F′ denotes their convolution, $(F \star F')(t) = \int_0^t F'(t - u)\, F(du)$.
Lastly, transition along the path “tx → AGVHD → A+CGVHD → e” ( ) contributes to the sum .
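Numerically, a convolution of this kind can be approximated on a uniform time grid. The sketch assumes, as is standard for a semi-Markov process with an absorbing endpoint, that the probability of having completed the path tx → k → e by time t is $(F_{1k} \star F_{ke})(t)$. The exponential-type subdistribution functions below are hypothetical stand-ins for estimated kernel entries, and the helper name is illustrative.

```python
import numpy as np

def convolve_subdistributions(F, G):
    """(F * G)(t_i) = int_0^{t_i} G(t_i - u) F(du) on a uniform grid t_i = i * h.

    F, G: values of two subdistribution functions on the same grid, with F(0) = 0.
    G(t_i - u) is replaced by its average over each grid cell.
    """
    dF = np.diff(F)                       # F-mass in each cell (t_{k-1}, t_k]
    G_mid = 0.5 * (G[:-1] + G[1:])        # G averaged over each cell
    # conv[i] = sum_{m < i} dF[m] * G_mid[i - 1 - m], a discrete convolution.
    conv = np.convolve(dF, G_mid)[: len(F) - 1]
    return np.concatenate(([0.0], conv))

# Path tx -> k -> e with hypothetical one-step kernels on a monthly time scale.
t = np.linspace(0.0, 60.0, 6001)
F_1k = 0.6 * (1.0 - np.exp(-0.5 * t))     # illustrative subdistribution 1 -> k
F_ke = 0.3 * (1.0 - np.exp(-0.5 * t))     # illustrative subdistribution k -> e
P_path = convolve_subdistributions(F_1k, F_ke)
```

For this choice the exact answer is 0.18[1 − e^{−0.5t}(1 + 0.5t)], which makes the discretization error easy to check. A three-step path such as tx → AGVHD → A+CGVHD → e would be handled by applying the same function twice.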
The multi-step transition probabilities can be estimated using the plug-in method. The estimates are consistent on time intervals [0, τ] strictly contained in the support of all sojourn time distributions. As an example, Figure 1 compares transition probabilities of hypothetical ALL patients receiving BMT or PBSCT. The remaining covariates correspond to the age range 16–29.5 years and to the baseline subgroups specified in Table 2. The plots represent the four components of the multi-step transition probabilities defined in (4.1). PBSCT appears to reduce the one-step transition probabilities of both relapse and death ( , black curves), and the effect is more pronounced for the tx → death transition. The graphs also suggest that PBSCT is associated with a reduced probability of relapse preceded by AGVHD ( , red curves). At the same time, however, the probability of death in remission is higher than that of a BMT recipient. We also see an increase in the probability of relapse and death resulting from CGVHD without AGVHD ( , blue curves) and CGVHD with AGVHD ( , green curves).
Figure 1.
Transition probabilities of endpoint events of a young ALL patient receiving BMT (left panel) or PBSCT (right panel). The remaining covariates correspond to the baseline. The curves represent one-step transitions tx → e (black), two-step transitions tx → AGVHD → e (red) and tx → CGVHD → e (blue), and three-step transitions tx → AGVHD → CGVHD → e (green).
Table 2.
Summary of covariates
Age | n | Graft source | n | Disease | n |
---|---|---|---|---|---|
< 30 (young) | 550 | [BMT] | 842 | [AML] | 1168 |
[30, 42.5] | 534 | PB/PB+BMT | 803 | ALL | 477 |
> 42.5 (old) | 561 | | | | |

Donor’s gender | n | Gender match | n |
---|---|---|---|
F | 890 | FM | 441 |
[M] | 755 | [not FM] | 1224 |

Baseline groups are marked in brackets.
FM denotes a transplant from a female donor to a male recipient.
To assess the effects of covariates, we consider pointwise and simultaneous confidence bands for pairwise differences of one-step and multi-step transition probabilities. In the case of one-step transition probabilities, we consider the functions
$$\Delta_j(t) = F_j(t \mid z_1) - F_j(t \mid z_2),$$
where z1 and z2 are two covariate levels. We denote by Δ̂j the corresponding sample analogue of the function Δj. Results of Section 5 imply that the normalized process √n(Δ̂j − Δj) converges weakly to a mean zero Gaussian process.
To construct confidence bands, we note that each Δ function is a difference of two subdistribution functions and therefore takes values between −1 and 1. Direct application of the Gaussian approximation to the limiting distribution of the process may result in confidence intervals and confidence bands whose bounds fall outside the interval (−1, 1). To circumvent this problem, we use a transformation method.
Let Φ: R → (−1, 1) be a strictly increasing differentiable function with derivative ϕ satisfying ϕ(x) > 0 for all x ∈ R. By the delta method,
Let cα(t1, t2) be the upper α percentile of the distribution of
where is an estimate of the standard deviation of . Then, by the continuous mapping theorem, the 100 × (1 − α)% asymptotic confidence band for the Δ function has upper and lower bounds given by
(4.2) |
The corresponding pointwise confidence intervals can be obtained by replacing the constant cα(t1, t2) by the upper α/2 percentile of the standard normal distribution.
One possible choice is Φ(x) = 2G(x) − 1, where G is a distribution function with a density g that is positive on the whole real line. In analogy to the construction of confidence bands for survival functions in Andersen et al. (1993), we may use the extreme value distribution G(x) = 1 − exp[−exp(x)]. In this case Φ−1(u) = log[−log[(1 − u)/2]] and the bounds are given by
(4.3) |
Another possible choice corresponds to the logistic distribution, G(x) = exp(x)/[1 + exp(x)]. We have Φ−1(u) = log([1 + u]/[1 − u]), and the bounds take the form
(4.4) |
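The transformed bounds can be computed as follows. This sketch assumes that the bounds take the usual delta-method form Φ(Φ−1(Δ̂(t)) ∓ c σ̂(t)/ϕ(Φ−1(Δ̂(t)))), with σ̂(t) taken here to be the standard error of Δ̂(t) itself, so that any sample-size normalization is absorbed into σ̂. For Φ = 2G − 1 one has ϕ(Φ−1(u)) = (1 − u²)/2 in the logistic case and −(1 − u) log[(1 − u)/2] in the extreme value case. The function and dictionary names are hypothetical.

```python
import numpy as np

# For Phi(x) = 2 G(x) - 1: (Phi, Phi^{-1}, phi evaluated at Phi^{-1}(u)).
TRANSFORMS = {
    "logistic": (
        lambda x: np.tanh(x / 2.0),                           # 2 e^x / (1 + e^x) - 1
        lambda u: np.log((1.0 + u) / (1.0 - u)),
        lambda u: (1.0 - u**2) / 2.0,
    ),
    "extreme_value": (
        lambda x: 1.0 - 2.0 * np.exp(-np.exp(x)),             # G(x) = 1 - exp(-e^x)
        lambda u: np.log(-np.log((1.0 - u) / 2.0)),
        lambda u: -(1.0 - u) * np.log((1.0 - u) / 2.0),
    ),
}

def transformed_band(delta_hat, se, crit, kind="logistic"):
    """Bounds Phi(Phi^{-1}(delta_hat) -/+ crit * se / phi(Phi^{-1}(delta_hat))).

    crit: c_alpha(t1, t2) for a simultaneous band, or the upper alpha/2 normal
          quantile (e.g. 1.96) for pointwise intervals.
    """
    Phi, Phi_inv, phi_at_inv = TRANSFORMS[kind]
    delta_hat = np.asarray(delta_hat, dtype=float)
    half_width = crit * np.asarray(se, dtype=float) / phi_at_inv(delta_hat)
    centre = Phi_inv(delta_hat)
    return Phi(centre - half_width), Phi(centre + half_width)

# Pointwise 95% bounds at three time points (illustrative numbers only).
lower, upper = transformed_band([-0.9, 0.0, 0.3], [0.08, 0.05, 0.1], 1.96)
```

Because Φ maps onto (−1, 1), the back-transformed bounds always stay inside that interval; near Δ̂ = ±0.9 they become visibly asymmetric, which is precisely where the untransformed interval would spill over.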
A similar approach can be applied to the comparison of multi-step transition probabilities. For any two covariate levels z1 and z2, we set
The corresponding sample analogue is denoted by . It is easy to see that { } converges weakly to a Gaussian process , where
and the integrals are defined by means of the convolution formula.
In Figures 2–5, we compare one-step and multi-step transition probabilities of relapse and death in remission for patients with selected covariate profiles. To obtain the bands, we first used the Gaussian multiplier method to estimate the variance of the Δ function: its Monte Carlo variance was computed from 5000 Monte Carlo experiments. A second application of the Gaussian multiplier method, based on 5000 Monte Carlo trials, was then used to approximate the critical level cα(t1, t2). The interval [t1, t2] was chosen with t1 = 1.5 and t2 = 60 months. The bounds (4.3) and (4.4) showed close numerical agreement, and the resolution of the graphs is not sufficient to show the difference between the two choices. The differences between the corresponding upper and lower bounds did not exceed .07%, and the bands obtained using the logistic transformation were narrower.
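The two Monte Carlo stages described above can be sketched as follows, assuming that Gaussian multiplier replicates of the centred process Δ* − Δ̂ are already available on a grid covering [t1, t2]; how they are generated depends on the multiplier formula and is not reproduced here. The first set of replicates gives the pointwise standard errors, and the second gives cα(t1, t2) as the upper α quantile of the studentized supremum. The function name is hypothetical.

```python
import numpy as np

def multiplier_se_and_crit(reps_var, reps_crit, alpha=0.05):
    """Two-stage Gaussian multiplier summaries on a grid covering [t1, t2].

    reps_var, reps_crit: (B, T) arrays holding two independent sets of
        hypothetical multiplier replicates of the centred process.
    Returns pointwise standard errors (stage 1) and the critical value
    c_alpha(t1, t2) for the studentized sup statistic (stage 2).
    """
    se = reps_var.std(axis=0, ddof=1)                    # stage 1: Monte Carlo SD
    sup_stat = np.max(np.abs(reps_crit) / se, axis=1)    # stage 2: sup_t |W*(t)| / se(t)
    c_alpha = float(np.quantile(sup_stat, 1.0 - alpha))
    return se, c_alpha

# Illustrative check with independent N(0, 1) replicates at a single time point:
# the "band" then reduces to a pointwise interval and c_alpha is close to 1.96.
rng = np.random.default_rng(0)
se, c = multiplier_se_and_crit(rng.standard_normal((20000, 1)),
                               rng.standard_normal((20000, 1)))
```

Using two independent replicate sets mirrors the two separate applications of the multiplier method in the text and avoids reusing the same draws for both the standard errors and the quantile.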
Figure 2.
Pointwise and simultaneous confidence bands for the one-step and multi-step Δ functions of ALL patients receiving BMT. Covariates: z1 = age ≤ 29.5 years, z2 = baseline age. The remaining covariates correspond to the baseline.
Younger age was associated with reduced probabilities of relapse and death for both AML and ALL patients. In Figure 2, we use the Δ function to compare transition probabilities of hypothetical younger (z1) and baseline-age (z2) ALL bone marrow transplant recipients. The remaining covariates correspond to the baseline groups specified in Table 2 and to the median values of the waiting time variables DxCr and CrTx. The plots show that younger age has a “concordant” effect on the endpoint probabilities, i.e. younger age is associated with reduced probabilities of both relapse and death. For the one-step tx → relapse transitions, the pointwise bands suggest that the differences are significant, but the wider simultaneous bands do not confirm this. Examination of the four possible paths leading to the relapse state showed that although younger patients have lower one-step relapse transition probabilities, they are at a higher risk of relapse preceded by AGVHD than patients in the baseline age group. This accounts for the marginal differences in the multi-step relapse transition probabilities. Figure 2 also shows that multi-step transition probabilities into the death state are significantly lower for a younger patient, since the upper bounds of both the pointwise and simultaneous bands lie below the horizontal line through 0. While for one-step transition probabilities there is a marginal difference during the early post-transplant period, patients in the baseline age group had higher probabilities of death preceded by GVHD.
In Figure 3, we show the “discordant” effect of older age on the two endpoint probabilities. The graphs represent the Δ function for hypothetical ALL patients receiving peripheral blood stem cell transplants. The covariate z1 corresponds to the older age group and z2 to the baseline age group. The remaining covariates correspond to the baseline (Table 2). Older age was associated with lower transition probabilities into the relapse state. On the other hand, the roles of the two age groups are reversed for transitions into the death state. Plots of the four paths leading to the endpoint events showed that an older patient may have higher probabilities of death resulting from CGVHD (with or without AGVHD), while the probability of transition along the path tx → AGVHD → death is comparable to that of a patient in the baseline age group.
Figure 3.
Pointwise and simultaneous confidence bands for the one-step and multi-step Δ functions of ALL patients receiving PBSCT. Covariates: z1 = age > 42.5 years, z2 = baseline age. The remaining covariates correspond to the baseline.
In the next figure we show a “switching” treatment effect. Figure 4 compares two hypothetical young AML patients receiving PBSCT (z1) and BMT (z2). The one-step and multi-step relapse probabilities were lower for PBSCT, but the differences were not significant. On the other hand, PBSCT is associated with a lower probability of one-step transition into the death state, while for multi-step transitions the roles of the two graft sources are reversed. This pattern is also seen for young ALL patients in Figure 1, but for AML patients the differences in the multi-step transition probabilities were more pronounced.
Figure 4.
Pointwise and simultaneous confidence bands for the one-step and multi-step Δ functions of young AML patients. Covariates: z1 = PBSCT, z2 = BMT. The remaining covariates correspond to the baseline.
A similar approach can be applied to compare transition probabilities evaluated by conditioning on the follow-up history of a patient. In particular, Arjas and Eerola (1993) and Eerola (1994) have suggested the use of graphs of the conditional probabilities
(4.5) |
where
represents the patient’s history up to time s. Examples of these graphs specialized to Markov chains and semi-Markov models were given in Klein et al. (1993), Keiding et al. (2001), Dabrowska et al. (1994) and Putter et al. (2007). Here we note only that in the case of Markov chain regression models, the predictions depend only on the state occupied by the patient at time s, and estimation of (4.5) reduces to estimation of the transition probability matrix because
(4.6) |
In the case of a semi-Markov model, the conditional probabilities P(J(t) = e|
) are given by the transition probability matrix of a delayed Markov renewal process, with delay determined by the length of time spent in the state occupied at time s. On the other hand, the right-hand side of (4.6) also depends on the initial state J0 and on all possible transitions leading to the state e and passing through the state i on or prior to time s. The two models coincide only if the sojourn times in each state are exponentially distributed.
In Table 5 we report results of the residual analysis for the main variables in the model. We computed Martinussen and Scheike’s Kolmogorov-Smirnov statistics for the one-step transitions out of each state of the model. The test statistics were calculated over the range t ∈ [1, 90] months, and the reported p-values were obtained using the Gaussian multiplier method based on 5000 Monte Carlo samples. The results were also compared with a larger model, which included the length of time spent in the transplant and AGVHD states as time-dependent covariates. The dependence on the length of time spent in these states appeared to have only a marginal effect. In the case of transitions originating from the CGVHD state, this may stem from the relatively small number of failures (relapse or death). On the other hand, AGVHD can occur only during the first 4 months, so the state space of the process partially captures the dependence on the length of time spent in the transplant state. Although Table 5 shows an acceptable fit, there are several possible sources of departure from the model. In particular, departures may be caused by calendar and center effects. For example, grading of acute and chronic GVHD is not uniform across centers. At the same time, PBSCT may have been used in allogeneic transplants more frequently towards the end of the study period than at its beginning. These factors were not taken into account in this study because they could identify patients in the population. Further, transplant may result in many other complications, including infections, pneumonia, secondary cancers, loss of vision and damage to other organs. We have not taken them into account due to lack of data.
There has been very little work on variable and model selection problems in multi-state models. Commenges et al. (2007) considered a flexible class of multi-state models which includes Markov chains and semi-Markov models as special cases. They extended the expected Kullback-Leibler (EKL) risk function to counting process models coarsened at random and proposed a leave-one-out cross-validation method for approximating EKL based on penalized likelihoods. The approach was illustrated using a three-state additive illness process, though the methodology applies to more complex situations as well. Another approach may be based on the focused information criteria and model averaging of Hjort and Claeskens (2003, 2006). Their approach is tailored towards selecting a model for given parameters of interest. In single-spell models, examples of such parameters include regression coefficients, quantiles, cumulative hazards or distribution functions evaluated at a fixed point or over a fixed interval. In an extension of this method to multi-state regression models, the parameters of interest may include one-step and multi-step transition probabilities or other parameters arising in prediction problems.
5 Proofs
5.1 Assumptions and notation
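We first recall that if A = [akℓ] is a rectangular d × q matrix then its ℓ1 and ℓ∞ norms are given by
$$\|A\|_1 = \max_{1 \le \ell \le q} \sum_{k=1}^{d} |a_{k\ell}|, \qquad \|A\|_\infty = \max_{1 \le k \le d} \sum_{\ell=1}^{q} |a_{k\ell}|,$$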
and we have ||A||1 = sup{μTAλ: ||μ||∞ ≤ 1, ||λ||1 ≤ 1} = ||AT||∞, where μ = (μ1, …, μd)T and λ = (λ1, …, λq)T. If A(s) = [aij(s)], s = (x, θ) is a d × q matrix of functions defined on
= [0, τ] × Θ then ||A||1 = sup{||A(s)||1: s ∈
} is the corresponding supremum norm, and, with some abuse of notation, we write ||A||∞ = sup{||A(s)||∞: s ∈
}. We also use ||·|| to denote the supremum norm of scalar or vector-valued functions on [0, τ].
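The identity ||A||1 = sup{μTAλ: ||μ||∞ ≤ 1, ||λ||1 ≤ 1} = ||AT||∞ implies that ||A||1 is the largest absolute column sum of A and ||A||∞ its largest absolute row sum. As a quick numerical check, the following sketch computes both norms and verifies the identity by brute force over the extreme points of the two unit balls, where a bilinear form attains its supremum. The helper names are hypothetical; NumPy's `np.linalg.norm(A, 1)` and `np.linalg.norm(A, np.inf)` compute the same matrix norms.

```python
import itertools
import numpy as np

def norm_l1(A):
    """||A||_1: largest absolute column sum."""
    return float(np.abs(A).sum(axis=0).max())

def norm_linf(A):
    """||A||_inf: largest absolute row sum."""
    return float(np.abs(A).sum(axis=1).max())

def sup_bilinear(A):
    """Brute-force sup{mu^T A lam : ||mu||_inf <= 1, ||lam||_1 <= 1}.

    The objective is linear in each argument, so it suffices to search the
    vertices mu in {-1, 1}^d and lam in {+e_l, -e_l : l = 1, ..., q}.
    """
    d, q = A.shape
    best = -np.inf
    for signs in itertools.product((-1.0, 1.0), repeat=d):
        mu = np.array(signs)
        for l in range(q):
            best = max(best, abs(mu @ A[:, l]))   # covers lam = +e_l and -e_l
    return float(best)

A = np.array([[1.0, -2.0, 3.0],
              [4.0, 5.0, -6.0]])
```

For this A the column sums are 5, 7 and 9 and the row sums are 6 and 15, so ||A||1 = 9 = ||AT||∞ and ||A||∞ = 15.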
We shall assume the following regularity conditions on the hazard rates αj(y, θ, z), y ∈ Rq, j ∈
.
Condition 5.1
The parameter set Θ ⊂ Rd is bounded and open.
For fixed z ∈ Rd, the function ℓj(y, θ, z) = log αj(y, θ, z), j ∈
is twice continuously differentiable with respect to (y, θ). The derivatives with respect to y (denoted by primes) and with respect to θ (denoted by dots) satisfy , ||ℓ̇j(y, θ, z)||1 ≤ ψ1(||y||1), ||ℓ̈j(y, θ, z)||1 ≤ ψ2(||y||1) and ||g(y, θ, z) − g(y′, θ′, z)||1 ≤ max(ψ3(||y||1), ψ3(||y′||1)) × × [||y − y′||1 + |θ − θ′|], where and . Here ψ is a constant or a continuous bounded decreasing function. The functions ψp, p = 1, 2, 3 satisfy ψp(0) < ∞, are continuous and locally bounded.
For fixed θ ∈ Θ and y ∈ Rq, the functions αj(y, θ, ·) and their logarithmic derivatives in (ii) are measurable with respect to the Borel σ-field of Rd.
We have either a) m1 < αj(y, θ, z) < m2 for some 0 < m1 < m2 < ∞ or b) αj(y, θ, z) is a bounded coordinate-wise decreasing function such that αj(y1, …, yq, θ, z) ↓ 0 as yℓ ↑ ∞, ℓ = 1, …, q, and
for some c1, c2 > 0, e1 ∈ (0, 1], e2 ∈ [0, 1] and 0 < m1 < m2 < ∞.
Condition (ii) assumes that the functions αj(y, θ, z) and their derivatives are jointly continuous in the arguments (y, θ). Together with condition (iii), this implies that they are measurable with respect to the σ-field $\mathcal{B}(\mathbb{R}^q) \otimes \mathcal{B}(\Theta) \otimes \mathcal{B}(\mathbb{R}^d)$. Condition 5.1 (iii) serves to ensure that for each state j1 ∈
, the Volterra equation corresponding to the transitions j ∈
originating from the state j1 has a non-explosive solution on the interval [0, τj1], where τj1 = sup{x: EYj(x) > 0}.
Let P be a distribution satisfying Assumption 2.1 of Section 2. For any j ∈
let
and set A.P = Σj∈
AjP. In analogy to the single-spell models in Dabrowska (2006), one can show that condition 5.1 (iii-a) implies that the Volterra equation has a unique solution Γθ = [Γ1θ, …, Γqθ]T such that
for x ∈ [0, τ], θ ∈ Θ. In addition, there exist positive constants d1, d2, d3 such that
(5.1) |
Similar inequalities also hold for the left-continuous version of Γjθ. On the other hand, under condition 5.1 (iii-b), we have Φ2(AjP(x)) ≤ Γjθ(x) ≤ Φ1(A.P(x)), where for q = 1, 2 and if eq ≠ 1, and if eq = 1. The functions Φq are inverse cumulative hazards corresponding to the lower and upper bounds on the hazard rates in condition 5.1 (iii). In this case the inequality (5.1) is satisfied with the function A.P replaced by Φ1(A.P).
5.2 Some measurability issues
In Section 2, we assumed that the observations D1, …, Dn of the censored modulated renewal process are defined on a common complete probability space (Ω,
, P) and take on values in a separable measure space (
,
). A measure space is here called separable if its σ-field is countably generated and contains all singletons. Any such space is measurably isomorphic to a subspace of the real line equipped with its Borel σ-field (e.g. Dellacherie and Meyer, 1975, p. 15). Let (
,
) and (
,
) be the corresponding n-fold and infinite product spaces and let Pn and P∞ be the corresponding product measures on
and
induced by (D1, …, Dn) and D = (D1, D2, …, Dn, …), respectively. We denote by
the sigma-field of subsets A ⊂
measurable in the completion of the product probability measure Pn and by
the universal sigma-field generated by
, i.e. the sigma-field of subsets measurable in the completion of any probability measure Q on
. We have
. Whereas
is not complete with respect to the product measure Pn, any set
satisfies
and
for gn = (D1, …, Dn). The sigma-fields
and
have a similar property. Without much loss of generality, we can therefore assume that (Ω,
) = (
,
) and, when necessary, require measurability with respect to these larger sigma-fields. With this choice the sequence D is the identity map on
and (D1, …, Dn) are the corresponding coordinate projections on (
,
).
Further, let (Ω0, 𝒜0) be an arbitrary measurable space and let 𝒵 be a Polish space or a Borel subset of one. For any set A ⊂ Ω0 × 𝒵, its projection on Ω0 is denoted by projΩ0(A) = {ω0: (ω0, z) ∈ A for some z ∈ 𝒵}. A multifunction (or correspondence) is a set-valued function assigning to each ω0 ∈ Ω0 a subset of 𝒵. We write H: Ω0 ↪ 𝒵 for such mappings to distinguish them from ordinary functions assigning to each ω0 a single value (h: Ω0 → 𝒵). The domain and graph of a multifunction H are defined as dom H = {ω0 ∈ Ω0: H(ω0) ≠ ∅} and graph H = {(ω0, z) ∈ Ω0 × 𝒵: z ∈ H(ω0)}, respectively. For any nonempty set B ⊆ 𝒵, the inverse image of B under H is given by H−1(B) = {ω0 ∈ Ω0: H(ω0) ∩ B ≠ ∅}, and the right side is equal to the projection projΩ0(graph H ∩ (Ω0 × B)). Finally, by a selector we mean a function h: Ω0 → 𝒵 ∪ {z*} such that h(ω0) ∈ H(ω0) if ω0 ∈ dom H and h(ω0) = z* otherwise. (Here z* is an extra point attached to 𝒵.)
A set-valued mapping H is here called measurable if graph H is jointly measurable with respect to 𝒜0 ⊗ ℬ(𝒵). By measurable projection theorems (e.g. Dellacherie and Meyer, 1975, p. 252, Pollard, 1984, p. 196–197, or Dudley, 1999, Chapter 5), joint measurability of graph H entails that the inverse image H−1(B) of any Borel set B ∈ ℬ(𝒵) belongs to the universal σ-field 𝒰(𝒜0) generated by 𝒜0. Moreover, H admits at least one 𝒰(𝒜0)-measurable selector. If 𝒜0 is complete with respect to some probability measure, then 𝒰(𝒜0) = 𝒜0. For alternative conditions for this equality we refer to Wagner (1976).
Further, let 𝒯 be a Polish space and let {Xt: t ∈ 𝒯} be an Rk-valued random element defined on Ω0. We call it measurable if it forms a measurable stochastic process, i.e. the map Ω0 × 𝒯 ∋ (ω0, t) ↦ X(ω0, t) ∈ Rk is jointly measurable with respect to the σ-fields 𝒜0 ⊗ ℬ(𝒯) and ℬ(Rk). Correspondingly, the set-valued function H: Ω0 ↪ 𝒵 = 𝒯 × Rk given by H(ω0) = {(t, X(ω0, t)): t ∈ 𝒯} has a measurable graph, and for any Borel sets B ∈ ℬ(Rk) and C ∈ ℬ(𝒯), we have H−1(C × B) = {ω0: X(ω0, t) ∈ B for some t ∈ C} ∈ 𝒰(𝒜0). In section 5.3, we use the fact that an Rk-valued process is measurable iff each of its components is measurable. Moreover, sums and products of such processes are measurable as well.
A class of scalar functions 𝒢 = {gt(s): t ∈ 𝒯} defined on 𝒳^k, k ≤ n, is here called measurable if it forms a measurable process in the above sense. Following Nolan and Pollard (1987) and Pollard (1990), a measurable class of functions 𝒢 is called Euclidean for an envelope G if |gt(s)| ≤ G(s) for all t ∈ 𝒯, and there exist constants A and V such that N(ε||G||Q,r, 𝒢, ||·||Q,r) ≤ (A/ε)^V for all ε ∈ (0, 1) and all probability measures Q on 𝒳^k such that ||G||Q,r < ∞. Here N(η, 𝒢, ||·||Q,r) is the minimal number of Lr(Q)-balls of radius η needed to cover the class 𝒢, and ||·||Q,r is the Lr(Q) norm. We use r = 1, 2 in the sequel.
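For a concrete instance of the Euclidean bound, consider the class of half-line indicators s ↦ 1(s ≤ t) with envelope G ≡ 1 and r = 1. The Python sketch below (all function names are hypothetical) computes its covering number exactly under an empirical measure Qn with n distinct atoms, where each indicator is determined by the number k of atoms not exceeding t, and L1(Qn)-distances are |k − k′|/n. It checks that the bound holds with A = 3 and V = 1 for such measures. This only illustrates the definition on empirical measures; the definition itself requires the bound for every probability measure Q.

```python
import math


def radius_in_atoms(n: int, eps: float) -> int:
    """Largest m with m / n <= eps: a closed L1(Q_n)-ball of radius eps spans m atoms each way."""
    return math.floor(eps * n)


def covering_number_halflines(n: int, eps: float) -> int:
    """Minimal number of closed L1(Q_n)-balls of radius eps covering {s -> 1(s <= t)}.

    Q_n puts mass 1/n on each of n distinct atoms.  On the sample, the indicator
    1(. <= t) is determined by k = #{atoms <= t} in {0, ..., n}, and two indicators
    with counts k, k' are at L1(Q_n)-distance |k - k'| / n.  A ball centred at k
    covers the 2m + 1 consecutive counts k - m, ..., k + m.
    """
    m = radius_in_atoms(n, eps)
    return math.ceil((n + 1) / (2 * m + 1))


def greedy_cover_halflines(n: int, eps: float) -> int:
    """Left-to-right greedy cover of {0, ..., n}; optimal for equal-length intervals."""
    m = radius_in_atoms(n, eps)
    count, first_uncovered = 0, 0
    while first_uncovered <= n:
        centre = first_uncovered + m       # this ball covers centre - m, ..., centre + m
        first_uncovered = centre + m + 1
        count += 1
    return count


if __name__ == "__main__":
    print(covering_number_halflines(100, 0.1))  # 5, compared with 3 / 0.1 = 30
```

The greedy cover is included as an independent check of the closed-form count; for intervals of equal length, placing each ball as far right as possible is optimal.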
In our application, the space (𝒳, 𝒜) can be taken as the complete separable metric space (E0, ℬ(E0)) × (E1, ℬ(E1)), where E0 = 𝒥 × Rd and E1 = (R̄+ × (𝒥 × Rd) ∪ Δ)ℕ. Here E0 represents possible initial realizations of the mark V0 = (J0, Z0), and E1 is the space of realizations of the censored modulated renewal process (Xm, Vm = (Jm, Zm))m≥1. Further, 𝒯 = [0, τ] × Θ, where τ is a finite point on the positive half-line and Θ is a bounded open subset of a Euclidean space. Here 𝒯 is a Polish space, because it is a Gδ subset (a countable intersection of open sets) of a Polish space and Polishness is hereditary with respect to Gδ sets. Finally, all classes 𝒢 = {gt(s): t = (x, θ) ∈ 𝒯} correspond to càdlàg (or càglàd) functions such that, for 0 ≤ x < x′ ≤ τ and θ, θ′ ∈ Θ, we have
(5.2)
where G̃(s, x) is a nonnegative, monotone increasing càdlàg (respectively càglàd) function of x such that G̃(s, 0) = 0 and ||G̃(·, τ)||Q,r < ∞. In this case, the Euclidean property is satisfied with envelope G(s) = [C1 + C2 diam Θ]G̃(s, τ) + gx0θ0(s), where gx0θ0(s) is an arbitrary function from the class 𝒢.
To verify measurability of the estimates, we shall need some properties of Carathéodory integrands and of càdlàg or càglàd functions. If 𝒯 and 𝒵 are Polish spaces, then a function f: Ω0 × 𝒯 → 𝒵 is called a Carathéodory integrand if, for fixed t ∈ 𝒯, f(·, t): Ω0 → 𝒵 is measurable and, for fixed ω0 ∈ Ω0, f(ω0, ·): 𝒯 → 𝒵 is continuous. Here (Ω0, 𝒜0) is an arbitrary measurable space, and we have the following lemma.
Lemma 5.1
Let f: Ω0 × 𝒯 → 𝒵 be a Carathéodory integrand. Then
(i) f is measurable with respect to 𝒜0 ⊗ ℬ(𝒯).
(ii) For any open set B of 𝒵, let H(ω0) = {t: f(ω0, t) ∈ B}. Then for any closed or open set C of 𝒯, we have H−1(C) = {ω0: f(ω0, t) ∈ B for some t ∈ C} ∈ 𝒜0.
(iii) If g: Ω0 → 𝒯 is measurable, then the composite mapping f ∘ g: Ω0 → 𝒵 given by (f ∘ g)(ω0) = f(ω0, g(ω0)) is measurable.
(iv) Suppose that 𝒲 is another Polish space and h: Ω0 × 𝒯 × 𝒵 → 𝒲 is a Carathéodory integrand. Then the composite map h ∘ f: Ω0 × 𝒯 → 𝒲 given by (h ∘ f)(ω0, t) = h(ω0, t, f(ω0, t)) is a Carathéodory integrand.
Part (i) remains valid even if 𝒵 is replaced by a nonseparable metric space (Kuratowski, 1966, p. 378, or Himmelberg, 1975). In part (ii), if C is a closed set, then H−1(C) = ∪t∈DC {ω0: f(ω0, t) ∈ B}, where the union is over a countable dense subset DC of C; this uses continuity of f(ω0, ·) and openness of B. If C is open, then it can be represented as a countable increasing union of closed sets, and part (ii) follows by noting that inverse images preserve unions of sets. Part (iii) follows from the definition of a measurable function and continuity of f with respect to t. Part (iv) follows from parts (i) and (iii) and the definition of a continuous function.
Part (i) of the lemma extends to functions f that are càdlàg, càglàd, càd or càg in t ∈ 𝒯, with 𝒯 = R+ or 𝒯 = [0, τ], and take on values in a complete separable metric space (e.g. Dellacherie and Meyer, 1975, p. 144). Any càdlàg or càglàd function is also a pointwise limit of Carathéodory integrands.
Finally, suppose that 𝒯 = [0, τ] × Θ and f is a function such that (i) for fixed (x, θ) ∈ 𝒯, f(·, x, θ) is 𝒜0-measurable, and (ii) for fixed ω0 ∈ Ω0, f(ω0, ·, ·) is jointly càdlàg with respect to (x, θ) and continuous with respect to θ. To see that f is jointly measurable, let {qk: k ≥ 1} be a dense set in Θ and, for a given integer m ≥ 1, let Bmk be balls of radius 1/m centered at qk covering Θ. Set Amk = Bmk \ ∪l<k Bml and fm(ω0, x, θ) = Σk f(ω0, x, qk)1(θ ∈ Amk). Then fm is jointly measurable and converges pointwise to f. Similarly, if f is jointly càglàd rather than càdlàg in (ii), then f is jointly measurable with respect to 𝒜0 ⊗ ℬ(𝒯). As in the single-parameter case, functions of this type are pointwise limits of Carathéodory integrands. Part (ii) of the lemma remains valid for sets of the form C = I × C′, where C′ is an open or closed subset of Θ and I is an interval contained in [0, τ]. In particular, if f is a real-valued càdlàg function of this type, then its supremum is 𝒜0-measurable.
5.3 Proof of Proposition 3.1
To prove Proposition 3.1, we first consider the process Γnθ(x), (x, θ) ∈ [0, τ] × Θ = 𝒯.
Lemma 5.2
(i)
The process Ŵ= {Ŵ(t) = [Ŵj(t): t = (x, θ) ∈
, j ∈
}, , converges weakly in ℓ∞(
×
) to
where {V (t) = [Vj(t): t = (x, θ) ∈, j ∈
} is a tight mean zero Gaussian process. Its covariance function is given by
In addition, under the assumption that observations correspond to a censored modulated renewal process and θ = θ0 is the true parameter, cov(Vj(x, θ), Vj′(x′, θ)) = 1(j = j′)Cjθ(x ∧ x′).
(ii) Let θ0 be an arbitrary point in Θ. If θ̂ is a √n-consistent estimate of θ0, then the process Ŵ0 = {Ŵ0(x): x ≤ τ} converges weakly in ℓ∞([0, τ] ×
) to W0 = W (·, θ0).
Here the space
= ℓ∞(
×
) is equipped with uniform metric, dX(x̃, ỹ) = supt,j |x̃(t, j) − ỹ(t, j)| and is isometric to the space
= ℓ∞(
)q equipped with the metric dY(x, y) = maxj supt |xj(t) − yj(t)|. Clearly, the isometry is given by the mapping Φ assigning to each x̃ ∈
the vector of coordinate functions, Φ(x̃) = [x̃(·, 1), …, x̃(·, q)]T. Open sets of
can be represented as arbitrary unions of balls
(x̃, ε) = {ỹ: dX(x̃, ỹ) < ε}. On the other hand, the product topology of
coincides with the topology induced by the metric dY so that any open set in the product topology is an arbitrary union of balls
(x, ε), where x = [x1, …, xq].
Proof
To show part (i), define Vn = [Vjn: j ∈
], where
Then Vjn = V1jn + remj, where
and remj(x, θ) is a remainder term. Lemma 5.4 gives its form and shows that ||remj|| = oP(n−1/2). Therefore the process V1n = [V1jn: j ∈
] also satisfies ||Vn − V1n|| = oP(n−1/2).
Using the central limit theorem and the Cramér-Wold device, the finite-dimensional distributions of
converge in distribution to the finite-dimensional distributions of V: for any distinct t1, …, tk ∈
and any numerical vector λ of length kq, the random variable λTvec[V1n(t1), …, V1n(tk)] converges in distribution to the corresponding linear combination of finite-dimensional marginals of V.
For each j ∈
, the process V1jn can be represented as V1jn(x, θ) = [ℙn − P ]g, where g varies over a class
= {gtj: t = (x, θ) ∈
} consisting of càdlàg functions such that each gtj is a difference of two càdlàg functions that are increasing in x and Lipschitz continuous with respect to θ. Setting
, the condition (5.2) is satisfied with constants C1 and C2 determined by the functions ψ, ψ1 of the condition 5.2 (ii) and gt0 ≡ gτ, θ0, say. Correspondingly, the class
is Euclidean for a square integrable envelope Gj. From Pollard (1984,1990) it follows that the process
converges weakly in ℓ∞(
) to Vj, the j-th component of the process V because the class
is totally bounded and asymptotically uniformly equicontinuous with respect to the variance pseudo-metric dj(t, t′) = sd(V1jn(t) − V1jn(t′)), t, t′ ∈
. Joint weak convergence of the process
, g ∈ ∪j Gj follows from finite-dimensional weak convergence and the fact that the union of a finite number of Euclidean classes of functions is also Euclidean (Pollard, 1990). In particular, the class
is totally bounded and asymptotically equicontinuous with respect to the variance pseudo-metric d((t, j), (t′, j′)) = sd(V1jn(t) − V1j′n(t′)). Denoting by
the left-continuous process (obtained by changing the integrals over (0, x] to integrals over intervals [0, x)), the process
converges weakly to V as well, because the jumps of the process Vn are of order OP(1/n) uniformly in t ∈
and the functions ENj are continuous.
Finally, to show weak convergence of the standardized Γnθ process, we shall need bounds on the supremum of the norm of the vector Vn. Let
denote the class of functions
= {h(λ, t) = Σj=1,…,q λj gtj: gtj ∈
, |λj| ≤ 1, j = 1, …, q}. Then
forms a Euclidean class for the envelope H = Σj Gj and we have
Similarly, and the left-continuous versions of the process satisfy similar bounds.
To show consistency of the estimate Γnθ, we first assume condition 5.1 (iii-a). Let Ajn be the Aalen-Nelson estimator. Let , p = 1, 2. Then A2jn(x) ≤ Γnjθ(x) ≤ A1jn(x) for all θ ∈ Θ, and an argument similar to that in Dabrowska (2006) shows that
where ρjn = max(cj, 1)A1jn for some constant cj. Therefore ||Γnθ − Γθ||1(x) ≤ ||Vn(x, θ)||1 + ∫(0,x] ||Γnθ − Γθ||1(u−)ρn(du), where ρn = Σj ρjn. Gronwall’s inequality (Beesack, 1973, Dabrowska, 2006) implies that supx,θ exp[−ρn(x)]||Γnθ − Γθ||1(x) → 0 a.s., where the supremum is over θ ∈ Θ and x ∈ [0, τ]. Under condition 5.1 (iii-b), the proof is the same, except that the function ρjn is replaced by max(cj, 1)Φ1(A.n), where A.n = Σj Ajn. Note that the Aalen-Nelson estimate is a measurable process, whereas measurability of the process Γnθ is verified below.
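The two ingredients of this step, the Aalen-Nelson estimator and the Gronwall bound, can be illustrated numerically. The Python sketch below (hypothetical helper names; one transition type, right-censored data, ties grouped) computes A(x) = Σt≤x ΔN(t)/Y(t) and then checks a discrete form of Gronwall's inequality: if u(x) = v(x) + ∫(0,x] u(s−)ρ(ds) with v ≥ 0 and ρ increasing, then u(x) ≤ sup|v| exp(ρ(x)). Purely for illustration, ρ is taken to be a constant multiple of the Aalen-Nelson estimate, mimicking ρjn = max(cj, 1)A1jn; the data and constant are made up.

```python
import math


def aalen_nelson(times, events):
    """Aalen-Nelson estimator A(x) = sum_{t <= x} dN(t) / Y(t) for right-censored data.

    times:  observed times X_i = min(T_i, C_i)
    events: 1 if X_i is an observed transition, 0 if it is censored
    Returns the distinct transition times and the estimator evaluated there.
    """
    data = sorted(zip(times, events))
    n = len(data)
    jump_times, cumhaz = [], []
    total, i = 0.0, 0
    while i < n:
        t = data[i][0]
        at_risk = n - i                    # Y(t) = #{k: X_k >= t}; i is the first index with X = t
        deaths, j = 0, i
        while j < n and data[j][0] == t:   # group tied observation times
            deaths += data[j][1]
            j += 1
        if deaths > 0:
            total += deaths / at_risk
            jump_times.append(t)
            cumhaz.append(total)
        i = j
    return jump_times, cumhaz


def gronwall_path(v, drho):
    """Solve u_k = v_k + sum_{i=1}^{k} u_{i-1} * drho_i, a discrete form of
    u(x) = v(x) + int_(0,x] u(s-) rho(ds), and pair each u_k with the Gronwall
    bound sup|v| * exp(rho_k), where rho_k = drho_1 + ... + drho_k.

    v[0] is the value at time 0 (where rho = 0), so len(v) == len(drho) + 1.
    """
    assert len(v) == len(drho) + 1 and min(v) >= 0 and min(drho) >= 0
    vmax = max(v)
    u, integral, rho = [v[0]], 0.0, 0.0
    out = [(v[0], vmax)]
    for k in range(1, len(v)):
        integral += u[k - 1] * drho[k - 1]   # adds u(x_{k-1}) * Delta rho(x_k)
        rho += drho[k - 1]
        u.append(v[k] + integral)
        out.append((u[k], vmax * math.exp(rho)))
    return out


if __name__ == "__main__":
    jt, A = aalen_nelson([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
    print(jt)  # [1, 2, 3]
    c = 2.0                                  # plays the role of max(c_j, 1)
    drho = [c * A[0]] + [c * (A[k] - A[k - 1]) for k in range(1, len(A))]
    for u_k, bound in gronwall_path([0.3, 0.1, 0.5, 0.2], drho):
        assert u_k <= bound
```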
The process satisfies
where N̄(x) is the diagonal matrix N̄(x) = diag [N1..(x), …, Nq..(x)], and b̃nθ(u) is a q × q matrix with columns
Let bθ(u) be a q × q matrix with columns
. Using consistency of Γnθ and Lemma 5.3, we have [b̃nθ − bθ](u) → 0 a.s. uniformly in (u, θ) ∈
. Moreover, (5.1) and (5.2) also imply that ||R1n|| → 0 a.s., where
Define
Then
and
where
(5.3)
Setting , we have
The process W̃ is a sum of iid mean-zero processes whose finite-dimensional distributions are asymptotically normal and converge to the finite-dimensional distributions of the process W in the statement of the proposition. Moreover, its components can be represented as empirical processes indexed by Euclidean classes of functions satisfying condition (5.2). Therefore, an argument similar to the one used above for V1n shows that W̃ ⇒ W. The remainder term (5.3) is bounded by , where
where N… = Σq Nq... We have ||R2n|| = oP(1) by a V-process expansion similar to that in Lemma 5.4 below. Using the Kolmogorov equations for matrix product integrals (Gill and Johansen, 1990), we also have
and
From this we also get ||R3n|| = oP(1), because bθ(u) is uniformly bounded. Finally, ||R4n|| = oP(1). Combining these bounds, the right-hand side of (5.3) is of order oP(1), uniformly in (x, θ) ∈
. For fixed (x, θ), we also have
and by the uniform Gronwall inequality (Beesack, 1973, Dabrowska, 2006), we have Ŵ(x, θ) = W̃(x, θ) + oP(1) uniformly in (x, θ) ∈
.
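The Kolmogorov equations for matrix product integrals invoked above are easiest to see in discrete time. The Python sketch below (helper names and the three-state increments are hypothetical) forms P(s, t) = ∏(s,t](I + ΔA) from increment matrices ΔA with nonnegative off-diagonal entries and zero row sums, in the spirit of the Aalen-Johansen estimator. By construction it satisfies the forward equation P(s, t) = P(s, t−)(I + ΔA(t)), and the test checks the Chapman-Kolmogorov identity P(s, u) = P(s, t)P(t, u). It is a generic illustration, not the estimator of transition probabilities used in this paper.

```python
def identity(q):
    return [[1.0 if i == j else 0.0 for j in range(q)] for i in range(q)]


def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def product_integral(increments, start=0, stop=None):
    """P = product over steps start, ..., stop - 1 of (I + dA_k), multiplied in time order.

    Each dA_k has nonnegative off-diagonal entries and zero row sums, so every
    factor I + dA_k is a transition matrix when its diagonal stays in [0, 1].
    """
    stop = len(increments) if stop is None else stop
    q = len(increments[0])
    p = identity(q)
    for d_a in increments[start:stop]:
        step = [[(1.0 if i == j else 0.0) + d_a[i][j] for j in range(q)] for i in range(q)]
        p = matmul(p, step)   # forward equation: P(s, t) = P(s, t-) (I + dA(t))
    return p


# Hypothetical illness-death increments: state 0 = healthy, 1 = ill, 2 = dead (absorbing).
INCREMENTS = [
    [[-0.10, 0.06, 0.04], [0.0, -0.20, 0.20], [0.0, 0.0, 0.0]],
    [[-0.15, 0.10, 0.05], [0.0, -0.25, 0.25], [0.0, 0.0, 0.0]],
    [[-0.05, 0.03, 0.02], [0.0, -0.30, 0.30], [0.0, 0.0, 0.0]],
]
```

For example, multiplying `product_integral(INCREMENTS, 0, 1)` by `product_integral(INCREMENTS, 1, 3)` reproduces `product_integral(INCREMENTS)` up to rounding, which is the discrete Chapman-Kolmogorov identity.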
To complete the argument, we note that the processes V1n, Vn, W̃ and the remainders Rpn, p = 1, …, 3, satisfy the measurability conditions of section 5.2, whereas to show that Ŵ and R4n have this property, it suffices to show that the process Γnθ is measurable. To this end, note that the aggregate process
is measurable, since it is càdlàg and increasing in x and measurable with respect to
for fixed x. For any integer k and ω0 = (s1, …, sn), Tk(ω0) = inf{x: N…(ω0, x) ≥ k} is a random variable, because {ω0: Tk(ω0) ≤ x} = {ω0: N…(ω0, x) ≥ k} ∈
Similarly, the censored-data ranks Rim = Σk 1(Tk ≤ Xim) are measurable. Define the set-valued mapping Hn:
↪ Θ × [0, τ] by setting Hn(ω0) = {(θ, x): Γnθ(ω0, x) ∈ B}, where B is an open set of Rq. Then Hn(ω0) = ∪ℓ≥0
Hnℓ(ω0), where
On the set Aℓ = {ω0: N…(ω0, τ) = ℓ} ∈
, the process Γnθ is a weighted sum
and the weights form a finite composition of Carathéodory integrands. Suppressing dependence on ω0, hnθ(·, k) is the k-th column of a q × ℓ matrix hn with entries
where j = (j1, j2) ∈
and gnθ is a q × ℓ matrix with columns
Alternatively, , where and for r = 1, …, ℓ
where
for j = (j1, j2) ∈
. The indicators 1(Tk(ω0) ≤ x) are jointly measurable with respect to
⊗
(
) and by Lemma 5.1, so are the weights hnθ and gnθ. Therefore the graph of Hnℓ is
⊗
(
)-measurable, and
A similar argument can be used to show measurability of the process Γ̇nθ in part (ii). Using arguments analogous to Dabrowska (2006), and ||Γ̇nθ0+hn− Γ̇nθ0|| = OP (||hn||1) = oP (1) for any deterministic sequence hn → 0 or a random - measurable sequence hn →P0. Therefore if θ̂ is an -measurable - consistent estimator of θ0, then setting hn = θ̂ − θ0, we have Ŵ0(x) = Ŵ(x, θ0) + remn(x), where . For non-measurable hn and θ̂n, convergence is in outer probability.
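The jump points Tk and the censored-data ranks Rim used in the measurability argument above are elementary functions of the data. The Python sketch below (hypothetical names; N… is taken to count all observed transitions, ties allowed) computes Tk = inf{x: N…(x) ≥ k} both by sorting and directly from the definition, and then the ranks R = Σk 1(Tk ≤ X).

```python
from bisect import bisect_right


def order_statistics(event_times):
    """T_1 <= T_2 <= ...: jump points of N(x) = #{observed transitions <= x}.

    With ties, inf{x: N(x) >= k} repeats the tied time, which sorting reproduces.
    """
    return sorted(event_times)


def order_statistic_by_definition(event_times, k):
    """T_k = inf{x: N(x) >= k}; N is a right-continuous step function, so the
    infimum is attained at one of the event times."""
    return min(x for x in event_times
               if sum(1 for s in event_times if s <= x) >= k)


def censored_ranks(observed, order_stats):
    """R_i = sum_k 1(T_k <= X_i): the number of order statistics not exceeding X_i."""
    return [bisect_right(order_stats, x) for x in observed]


if __name__ == "__main__":
    T = order_statistics([3.0, 1.0, 2.0, 2.0])
    print(censored_ranks([0.5, 2.0, 2.5, 3.0], T))  # [0, 3, 3, 4]
```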
Let us assume now that fj(y, θ, z), j ∈
are scalar Carathéodory integrands such that |fj(y, θ, z)| ≤ ψ̃(||y||1) and |fj(y, θ, z) − fj(y′, θ′, z)| ≤ [|θ − θ′| + ||y − y′||1] max(ψ̃′(||y||1), ψ̃′(||y′||1)), where ψ̃ = ψ1, ψ2 and ψ̃′ = ψ3 satisfy conditions 5.1. Put
, where Sj.i[fj](u, θ) = Σm Yjmi(u)(fjαj)( Γθ (u), θ, Zjmi), and let sj[fj] = ESj[fj]. We write Sj[1] and sj[1] when fj ≡ 1, and set êj[fj] = Sj[fj]/Sj[1] and ej[fj] = sj[fj]/sj[1].
Lemma 5.3
We have ||Sj[fj]/sj[1] − sj[fj]/sj[1]|| → 0 a.s. for all j ∈
.
Proof
We have ([Sj[fj]/sj[1]])(x, θ) = ℙngxθ, where
The conditions 5.1 imply that there exist constants C1 and C2 (dependent on the functions ψ̃, ψ̃′) such that
Define G(Di) = Yj.i(0)[C2 diam Θ + C1][1 + ENj.i(τ) + EYj.i(0)] + gx0θ0(Di), where (x0, θ0) is an arbitrary point in [0, τ] × Θ. Let θp, p = 1, …, ℓ = O((diam Θ/ε)^d), be centers of balls B(θp, ε) of radius ε covering the set Θ. Since ENj.i is an increasing continuous function and EYj.i is a decreasing càglàd function, we can construct a finite partition 0 = x0 < x1 < … < xk = τ such that the intervals Ir = [xr−1, xr], r = 1, …, k, satisfy ENj.i(Ir) ≤ εENj.i(τ) and E|Yj.i(Ir)| ≤ εEYj.i(0). Let x̄r be the center of the interval Ir. Then for each x ∈ Ir and θ ∈ B(θp, ε), we have ||gxθ(Di) − gx̄rθp(Di)||P,1 ≤ ε||G(Di)||P,1. It follows that the class of functions
= {gxθ: x ∈ [0, τ], θ ∈ Θ} is Euclidean for the envelope G(Di), and hence Glivenko-Cantelli.
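The finite partition used in this proof can be built greedily when the two monotone functions are known. In the Python sketch below, the hypothetical functions F and G stand in for ENj.i (increasing, continuous) and EYj.i (decreasing); for simplicity both are assumed continuous and [0, τ] is discretized on a fine grid. Each interval is extended as far as the constraints F(xr) − F(xr−1) ≤ εF(τ) and G(xr−1) − G(xr) ≤ εG(0) allow. Since each pair of consecutive intervals uses up more than ε of the total increase of F or of the total decrease of G, fewer than 4/ε + 1 intervals are produced when F(0) ≥ 0 and G(τ) ≥ 0.

```python
import math


def greedy_partition(F, G, tau, eps, grid_size=10_000):
    """Cut points 0 = x_0 < ... < x_k = tau with, on every I_r = [x_{r-1}, x_r],
    F(x_r) - F(x_{r-1}) <= eps * F(tau) and G(x_{r-1}) - G(x_r) <= eps * G(0).

    F is nondecreasing and G nonincreasing on [0, tau]; cut points are taken from a
    uniform grid, which must be fine enough for every single grid step to be admissible.
    """
    xs = [tau * i / grid_size for i in range(grid_size + 1)]
    f_tol, g_tol = eps * F(tau), eps * G(0.0)

    def fits(a, b):
        # monotonicity: the endpoint increments control the whole interval [xs[a], xs[b]]
        return F(xs[b]) - F(xs[a]) <= f_tol and G(xs[a]) - G(xs[b]) <= g_tol

    cuts, start = [xs[0]], 0
    for i in range(1, grid_size + 1):
        if not fits(start, i):
            if not fits(i - 1, i):
                raise ValueError("grid too coarse for the requested eps")
            cuts.append(xs[i - 1])      # close the current interval one step earlier
            start = i - 1
    cuts.append(xs[-1])
    return cuts


if __name__ == "__main__":
    # Hypothetical stand-ins: F(x) = x^2 for EN(x), G(x) = exp(-x) for EY(x).
    cuts = greedy_partition(lambda x: x ** 2, lambda x: math.exp(-x), 2.0, 0.1)
    print(len(cuts) - 1, "intervals")
```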
Lemma 5.4
For j ∈
, define remj(x, θ) = [Vjn − V1jn](x, θ) and
where fj satisfies assumptions of Lemma 5.3. Then and .
Proof
For the sake of convenience write rem = remj and B = Bj. Put ηj(u, θ) = [Sj/sj](Γθ(u), θ, u) − 1. A little algebra shows that
We have rem2(x, θ) = OP (1)rem3(τ, θ), where
In addition,
We have B4(x, θ) = OP (1)B5(θ),
These expressions can be rewritten as V-processes of degree r + 1, r ≤ 3,
where the sum extends over (r + 1)-tuples Dir+1 = (Di1, …, Dir+1), ir+1 = (i1, …, ir+1), ij ∈ {1, …, n}. The kernels g vary over the class
= {gt: t ∈
}, where for t = (x, θ) we have
(5.4)
or
(5.5)
Here hℓ(Diℓ) are functions of the form Sj[fj]/sj[1], Sj[1]/sj[1] and . In all cases, there exists a constant C such that |hℓ(Di, θ, u)| ≤ CYj.i(u) and |hℓ(Di, θ, u) − hℓ(Di, θ′, u)| ≤ |θ − θ′|CYj.i(u). Therefore, for any sequence Dir+1 = (Di1, …, Dir+1), we also have
where
and Hℓ(Di, u) = CYj.iℓ(u), ℓ = 1, …, r for some constant C.
Let {
(gt): t ∈
} denote the U-process associated with the kernels (5.4)–(5.5). It is easy to see that
(gt) forms a canonical process. For Dr+1 = (D1, …, Dr+1), we have EGp(Dr+1) < ∞ for p = 1 + 1/(2r + 1). Therefore, by the Marcinkiewicz-Zygmund law in Teicher (1998) and Lemma A.1 in Dabrowska (2009),
. By the Marcinkiewicz-Zygmund theorem in de la Peña and Giné (1999), we also have
a.s. because
where ir+1 = (i1, …, ir+1) and d = d(ir+1) is the number of distinct indices among {i1, …, ir+1}, d = 1, …, r, r ≤ 3.
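The reduction from V-processes to U-processes rests on the fact that terms with repeated indices form a vanishing fraction of the sum. For degree 2 this is elementary: if the kernel is bounded by M, then |Vn − Un| ≤ 2M/n, because Vn = ((n − 1)/n)Un + n−2 Σi h(Di, Di). The Python sketch below (the kernel and the simulated data are hypothetical) computes both statistics and checks the identity and the bound.

```python
import math
import random


def v_statistic(data, h):
    """V_n = n^{-2} sum_{i, j} h(D_i, D_j), repeated indices included."""
    n = len(data)
    return sum(h(a, b) for a in data for b in data) / n ** 2


def u_statistic(data, h):
    """U_n = [n(n - 1)]^{-1} sum_{i != j} h(D_i, D_j)."""
    n = len(data)
    total = sum(h(data[i], data[j]) for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def kernel(a, b):
    """A symmetric kernel bounded by M = 1."""
    return math.cos(a - b)


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(0.0, 1.0) for _ in range(60)]
    print(abs(v_statistic(sample, kernel) - u_statistic(sample, kernel)))
```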
We denote now by ||B||v the variation norm of a d × q-matrix of functions B(x) = [bkl(x)], x ∈ [0, τ]. For any interval I ⊆ [0, τ], , where the supremum is taken over finite partitions of I such that xi < xj.
Further, let B(θ0, εn) be a ball centered at θ0 of radius εn, εn ↓ 0,
. Suppose that ϕθ(x) is a d × q matrix of functions, with columns of the form
such that ||ϕθ0||v = O(1). Let ϕnθ be a sequence of consistent estimators such that
(i) ϕnθ(x) is a càdlàg or càglàd function (jointly in (x, θ)), continuous with respect to θ;
(ii) lim supn sup{||ϕnθ||v: θ ∈ B(θ0, εn)} = OP(1);
(iii) sup{||ϕnθ − ϕθ0||∞: θ ∈ B(θ0, εn)} = oP(1), or
(iii′) ϕnθ − ϕnθ′ = (θ − θ′)ψnθ,θ′, where lim supn sup{||ψnθ,θ′||v: θ, θ′ ∈ B(θ0, εn)} = OP(1).
If ϕnθ is a jointly measurable estimator then conditions (ii)–(iii) are assumed to hold in probability. If this is not the case then the conditions (ii)–(iii) are taken to hold in outer probability.
Lemma 5.5
(1) If ϕnθ(x) is a measurable process satisfying (i)–(ii) and (iii) or (iii′), then with probability tending to 1 the equation Unϕn(θ) = 0 has a consistent root θ̂ in the ball
B(θ0, εn). In addition, under condition (iii′), the score equation has a unique root in
B(θ0, εn), with probability tending to 1.
(2) If ϕnθ is not measurable, then the statements in part (1) hold with inner probability tending to 1.
(3) If θ̃ is an arbitrary consistent estimator of θ0, then the equation Unϕ̃n(θ) = 0, where ϕ̃n(x) = ϕnθ̃(x), has a unique solution θ̂ with (inner) probability tending to 1, and Unϕn(θ̂) = oP*(n−1/2).
In all three cases,
and the process
converge weakly in Rd×ℓ∞([0, τ]×
) to a mean zero Gaussian process defined in the statement of Proposition 3.1.
Proof
Case (1)
Write Un(θ) = Unϕn(θ) for short. Set b̃jmi(Γθ(u), θ, u) = b̃jmi1(Γθ(u), θ, u) − ϕθ0(u)b̃jmi2(Γθ(u), θ, u), where
Define b̄jmi(Γθ(u), θ, u), b̄jmi1(Γθ(u), θ, u) and b̄jmi2(Γθ(u), θ, u) using similar expressions with ej[ℓ̇j] and replaced by êj[ℓ̇j] and . We have , where
and
Here for λ ∈ (0, 1). We have . Moreover, r1n(x, θ0) converges almost surely to
uniformly in x ≤ τ. Lemma 5.2 and integration by parts imply that the terms [ ] converge weakly to a pair of independent normal variables with mean zero and covariances Σ0(θ0) and Σ2(θ0) − Σ0(θ0), respectively. By Lemmas 5.3–5.4, we also have U3n(θ0) = oP(n−1/2). Finally,
where
By Lemmas 5.2–5.4, we have
and
, uniformly in θ ∈
B(θ0, εn). On the other hand, at θ = θ0, {
} is a sum of iid mean-zero processes. Its finite-dimensional distributions have mean zero and finite variance-covariance matrices, and converge weakly to mean-zero Gaussian distributions. Each component of B3n(x, θ0) is a measurable process that can be represented as a finite linear combination of càdlàg monotone functions of x with a square-integrable envelope satisfying (5.2). The same argument as in Lemma 5.2 implies that the process
converges weakly to a mean-zero Gaussian process with sample paths continuous with respect to the variance semi-metric. The space of functions continuous with respect to the variance semi-metric is isometric to the space C([0, τ])q. By the almost sure representation theorem and an integration by parts argument similar to that in Bilias et al. (1997), we have
.
Set
. Some elementary algebra shows that for θ, θ′ ∈
B(θ0, εn), we have Ûn(θ) = Ûn(θ′) + (Σn(θ0) + rem0n(θ, θ′))(θ − θ′), where Σn(θ0) is a matrix that converges in probability to −Σ1(θ0). The matrix Σ1(θ) is defined in Section 3 and is nonsingular. Further, U4n(θ) − U4n(θ′) = rem2n(θ, θ′)(θ − θ′) + rem3n(θ, θ′) + O(|θ − θ0| ∨ |θ′ − θ0|)rem4n(θ, θ′). Setting
, and bqn = sup{|remqn(θ, θ′)|: θ, θ′ ∈
B(θ0, εn)}, q = 1, …, 4, we have b1n = oP(1) and b2n = oP(1). Under condition (iii′), remqn ≡ 0 ≡ bqn for q = 3, 4, while under condition (iii), b3n = oP(n−1/2) and b4n = oP(1).
Put an = b1n + b2n + b4n and An = b5n + b3n, where b5n = |Σ(θ0)−1Ûn(θ0)| = OP(n−1/2). Let 0 < η < 1/2 and 0 < η′ < 1 be given. By asymptotic tightness of An, we can find a compact set K = K(η) and n0 such that for all n ≥ n0 and all open sets G containing K, we have
and Pn(an > η′) < η. Therefore, we also have
for all finite M ≥ M0, where M0 = M0(η) is a large enough finite nonnegative constant. Since
and εn ↓ 0, by eventually increasing n0, we can assume that for n ≥ n0, we have
B(θ0, εn) ⊂ Θ and
. Consequently, the set En ⊂
given by En = {ω0: An(ω0)/(1 − an(ω0)) < εn, an(ω0) ≤ η′} satisfies Pn(En) ≥ 1 − 2η for all n ≥ n0.
For n ≥ n0, consider the set-valued mapping Hn:
↪ Rd given by
The graph of Hn, graphHn = {(ω0, θ): θ ∈ Hn(ω0)} is -measurable and . Further, let . Then gn is measurable, because it is continuous with respect to θ for fixed ω0 and -measurable for fixed θ. It follows that the set valued mapping
is closed-valued and has an - measurable graph. We have dom Cn = En: for fixed ω0 ∈ En, Hn(ω0) is a closed ball, and gn(ω0, ·) is continuous and maps Hn(ω0) into itself. By Brouwer’s fixed-point theorem, Cn(ω0) ≠ ∅. Thus En ⊆ dom Cn, while the reverse inclusion is obvious.
Further, for any root θ̂(ω0) ∈ Cn(ω0), ω0 ∈ dom Cn, we have , and so that converges in law to the normal distribution given in Section 3. An argument similar to Bickel et al. (1993, p. 517) also shows that, under condition (iii′), gn(ω0, ·) is a contraction on Hn(ω0), ω0 ∈ En, with contraction coefficient an(ω0). Thus, in this case, the root is unique: Cn(ω0) = {θ̂(ω0)} for ω0 ∈ En and n ≥ n0.
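The contraction step can be mimicked numerically. Assuming, for illustration only, a map of the form gn(θ) = θ − Σn−1Un(θ), with Σn playing the role of the matrix Σn(θ0) above, a root of Un(θ) = 0 is a fixed point of gn; when gn is a contraction, fixed-point iteration converges to the unique root. The Python sketch below uses a hypothetical two-dimensional score Un(θ) = b − Aθ + 0.1 sin(θ) (componentwise sine), so Σn = −A and gn(θ) = A−1b + 0.1A−1 sin(θ), whose Lipschitz constant is at most 0.1||A−1|| < 1. The names `A_MAT`, `B_VEC`, `score` and `g_map` are illustrative only.

```python
import math

A_MAT = [[2.0, 0.5], [0.5, 1.5]]   # hypothetical; Sigma_n = -A_MAT
B_VEC = [1.0, -0.5]


def solve2(m, v):
    """Solve the 2 x 2 linear system m x = v by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(v[0] * m[1][1] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - v[0] * m[1][0]) / det]


def score(theta):
    """Hypothetical score U_n(theta) = b - A theta + 0.1 sin(theta)."""
    return [B_VEC[i] - sum(A_MAT[i][j] * theta[j] for j in range(2))
            + 0.1 * math.sin(theta[i]) for i in range(2)]


def g_map(theta):
    """g(theta) = theta - Sigma_n^{-1} U_n(theta) = theta + A^{-1} U_n(theta)."""
    step = solve2(A_MAT, score(theta))
    return [theta[i] + step[i] for i in range(2)]


def fixed_point(theta, tol=1e-13, max_iter=200):
    """Iterate g until successive iterates differ by less than tol (sup norm)."""
    for _ in range(max_iter):
        new = g_map(theta)
        if max(abs(new[i] - theta[i]) for i in range(2)) < tol:
            return new
        theta = new
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    print(fixed_point([0.0, 0.0]))
```

Starting the iteration from different points and obtaining the same limit mirrors the uniqueness statement Cn(ω0) = {θ̂(ω0)} under condition (iii′).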
Case (2)
If the estimators ϕnθ are not
measurable, then the score function splits into two parts: Ũn(θ) = Ûn(θ) + U4n(θ). The term Ûn(θ) remains
measurable, while the second term is not. However, b3n = oP*(n−1/2) and an = oP*(1), while b5n = |Σ(θ0)−1Ûn(θ0)| = OP(n−1/2). In this case, the set En satisfies lim infn Pn,*(En) ≥ 1 − 2η, and the closed ball
B(θ0, An/(1 − an)) is contained in
B(θ0, εn) with inner probability tending to 1.
Case (3)
We write Ũn(θ) for the modified score function obtained by substituting ϕ̃n(x) = ϕnθ̃(x) in place of ϕnθ. Suppose that θ̃ is -measurable and ϕnθ(x) is measurable. Then the plug-in estimator ϕnθ̃(x) is measurable and the modified score process Ũn(θ) is measurable. Moreover, we have Ũn(θ) = Ûn(θ) + Ũ4n(θ), where the remainder Ũ4n(θ) satisfies , uniformly in θ ∈ B(θ0, εn) and . With probability tending to 1, the modified equation has a unique root θ̂ in a compact random ball contained in B(θ0, εn), and Un(θ̂) = oP*(n−1/2). On the other hand, if either θ̃ or ϕnθ is not measurable, then this continues to hold, except that the modified equation has a unique solution with inner probability tending to one.
Under assumptions of part (1), measurable selection theorems (Wagner, 1976) ensure that there exists at least one function such that whenever ω0 ∈ En and is measurable with respect to . This also applies to part (3), provided θ̃ and ϕnθ are - measurable.
5.4 Proof of Proposition 3.2
With some abuse of notation, set V = [Vj, j ∈
] where V(x) = V(x, θ0) and V(x, θ) is the Gaussian process of Lemma 5.2. Under the assumption that θ0 is the true parameter of the modulated renewal process, the process V corresponds to a vector of independent time-transformed Brownian motions with covariance
Similarly, let V̌ = [V̌j: j ∈
] be equal to
where V1n(x, θ) is defined as in the proof of Lemma 5.2. Thus the j-th component of V̌ is
Put ,
Finally, let G0 be an 𝒩(0, Id×d) variable, independent of the (Di, Gi)’s. Set
and
. We have
,
(5.6)
Moreover, is independent of D1, …, Dn. This also means that it is independent of (V̌#, V̌).
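The Gaussian multiplier idea behind V̌# can be illustrated in the simplest setting: a single cumulative hazard without covariates. Conditionally on the data, the process Σi Gi ∫(0,x] [dNi(u) − Yi(u)dÂ(u)]/Y(u), with iid 𝒩(0, 1) multipliers Gi, mimics the fluctuations of Â − A, and quantiles of its supremum give a simultaneous band for A. The Python sketch below is only an illustration of that generic idea under these simplifying assumptions (the function name, number of replicates and seed are hypothetical); it is not the band for transition probabilities constructed in this paper.

```python
import math
import random


def multiplier_band(times, events, n_rep=500, alpha=0.05, seed=0):
    """Aalen-Nelson estimate with a Gaussian-multiplier simultaneous band.

    Returns (jump_times, a_hat, lower, upper).  The half-width c is the
    (1 - alpha) quantile of sup_x |sum_i G_i int_0^x [dN_i - Y_i dA_hat] / Y|
    over n_rep draws of iid N(0, 1) multipliers G_1, ..., G_n.
    """
    rng = random.Random(seed)
    n = len(times)
    jt = sorted({t for t, d in zip(times, events) if d == 1})
    at_risk = [sum(1 for x in times if x >= t) for t in jt]
    d_n = [sum(1 for x, d in zip(times, events) if d == 1 and x == t) for t in jt]
    d_a = [dn / y for dn, y in zip(d_n, at_risk)]
    a_hat, total = [], 0.0
    for inc in d_a:
        total += inc
        a_hat.append(total)

    # resid[k][i] = (dN_i(t_k) - Y_i(t_k) dA_hat(t_k)) / Y(t_k)
    resid = [[((1.0 if (times[i] == t and events[i] == 1) else 0.0)
               - (d_a[k] if times[i] >= t else 0.0)) / at_risk[k]
              for i in range(n)]
             for k, t in enumerate(jt)]

    sups = []
    for _ in range(n_rep):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]   # multipliers, fixed data
        path, sup = 0.0, 0.0
        for row in resid:
            path += sum(gi * ri for gi, ri in zip(g, row))
            sup = max(sup, abs(path))
        sups.append(sup)
    sups.sort()
    c = sups[min(n_rep - 1, math.ceil((1 - alpha) * n_rep) - 1)]
    lower = [a - c for a in a_hat]
    upper = [a + c for a in a_hat]
    return jt, a_hat, lower, upper
```

Because the multipliers are drawn independently of the data, the replicates approximate the conditional law of the supremum given D1, …, Dn, which parallels the independence of the multipliers from the data noted above.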
We first consider unconditional weak convergence. By the central limit theorem and the strong law of large numbers, the finite-dimensional distributions of the processes (V̌, V̌#) converge weakly to the finite-dimensional distributions of (V, V#), two independent vectors of Brownian motions with variance functions Cjθ0, j = 1, …, q.
For each j = 1, …, q, the process can be represented as , where
The class of functions has a square integrable envelope
and is Euclidean for this envelope because each
is a difference of two functions increasing in x and bounded by Fj(Gi, Di). Thus
forms a Donsker class of functions. The union of these classes,
= ∪j
is Donsker as well. From Lemma 5.2, the process V̌ = {V̌j(x): x ∈ [0, τ], j ∈
} can be also represented as an empirical process over a Euclidean class of functions
and the union
∪
forms a Donsker class. Consistency of the estimates (θ̂, Γnθ̂), Lemma 5.5 and a few lines of integration by parts also yield ||V̂# − V̌#|| = oP(1) in outer probability.
Write V̌# as the empirical process V̌# = ℙnf, f ∈
. Further, let BL1 be the collection of Lipschitz functions h from Rd × ℓ∞(
) into [0, 1], such that |h(r, w) − h(r′, w′)| ≤ |r − r′| + ||w − w′|| for r, r′ ∈ Rd and w, w′ ∈ ℓ∞(
). The set
is totally bounded with respect to the variance pseudo-metric d. Therefore, for fixed δ > 0, it can be covered by a finite number of d-balls of radius δ, say
(fℓ, δ), ℓ = 1, …, k = k(δ). Set V̌# ∘ πδ = ℙnπδ(f), where πδ(f) = fℓ for f ∈
(fℓ, δ) (pick one fℓ for each f ∈
). By the triangle inequality, we have
where
For given ε > 0, we can choose δ0 so that I1(δ) < ε for all δ < δ0. The second term converges in outer probability to 0 for any δ. This follows from weak convergence of the finite-dimensional distributions of V̌# and the same argument as in van der Vaart and Wellner (1996, p. 182), except that in our setting the Lindeberg condition of their Lemma 2.9.5 is not needed to verify conditional weak convergence of finite-dimensional distributions.
where
= {f − f′: f, f′ ∈
, d(f, f′) < δ}. Since
forms a Euclidean class of functions with a square integrable envelope, we have
. Finally, the term I4(δ) does not depend on δ, and we have
. By unconditional convergence, we have I4(δ) → 0 in outer probability.
Finally, set , where
The estimates [Ξ̂#,
] defined in Section 4 are
, where Ψ̂ is the sample analogue of Ψ obtained by plugging in the estimates
,Q̂θ̂, ρj, ϕn(·, θ̂0). By the continuous mapping theorem, unconditionally,
. Applying the triangle inequality once more, we have
, where
For any Lipschitz continuous function h ∈ BL1, h ∘ Ψ ∈ BLc for some constant c. Therefore the preceding implies that J2 tends to 0 in outer probability. This also holds for the term J1, because ||Ξ̌# − Ξ̂#||→P* 0 and , by consistency of the estimates (θ̂, Γnθ̂) and integration by parts.
Acknowledgments
The data presented here were obtained from the Statistical Center of the Center for International Blood and Marrow Transplant Research (CIBMTR). The analysis has not been reviewed or approved by the Advisory or Scientific Committee of the CIBMTR. The CIBMTR comprises clinical and basic scientists who confidentially share data on their blood and marrow transplant patients with the CIBMTR Data Collection Center located at the Medical College of Wisconsin. The CIBMTR is a repository of information about the results of transplants at more than 450 transplant centers worldwide. I thank Mei-Jie Zhang for preparing the data and for discussions. I also thank a reviewer and the Editor, Daniel Commenges, for their comments. Research supported by grant R01 AI067943 from the National Institute of Allergy and Infectious Diseases. The content is solely the responsibility of the author and does not necessarily represent the official views of NIAID, NIH or CIBMTR.
References
- 1. Andersen PK, Borgan O, Gill RD, Keiding N. Statistical Models Based on Counting Processes. Springer; New York: 1993.
- 2. Arjas E, Eerola M. On predictive causality in longitudinal studies. J Statist Planning and Inference. 1993;34:361–386.
- 3. Bagdonavičius V, Nikulin M. Generalized proportional hazards model based on modified partial likelihood. Lifetime Data Analysis. 1999;5:329–350. doi: 10.1023/a:1009688109364.
- 4. Bagdonavičius V, Hafdi MA, Nikulin M. Analysis of survival data with cross-effects of survival functions. Biostatistics. 2004;5:415–425. doi: 10.1093/biostatistics/5.3.415.
- 5. Beesack PR. Gronwall Inequalities. Carleton Math Lecture Notes, Vol. 11. Carleton University; Ottawa: 1973.
- 6. Bickel PJ. Efficient testing in a class of transformation models. Proceedings of the 45th Session of the International Statistical Institute; ISI, Amsterdam. 1986. pp. 23.3.63–23.3.81.
- 7. Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA. Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press; Baltimore: 1993.
- 8. Bickel PJ, Ritov Y. Local asymptotic normality of ranks and covariates in transformation models. In: Pollard D, Yang G, editors. Festschrift for L LeCam. Springer; New York: 1995.
- 9. Bilias Y, Gu M, Ying Z. Towards a general asymptotic theory for Cox model with staggered entry. Ann Statist. 1997;25:662–683.
- 10. Chang I-S, Hsiung CA. Information and asymptotic efficiency in some generalized proportional hazard models for counting processes. Ann Statist. 1994;22:1275–1298.
- 11. Chang I-S, Chuang Y-C, Hsiung CA. A class of nonparametric k-sample tests for semi-Markov processes. Statistica Sinica. 1999;9:211–277.
- 12. Chang I-S, Hsiung CA, Wu S-M. Estimation in a proportional hazard model for semi-Markov counting process. Statistica Sinica. 2000;10:1257–1266.
- 13. Chen K, Jin Z, Ying Z. Semi-parametric analysis of transformation models with censored data. Biometrika. 2002;89:659–668.
- 14. Chintagunta P, Prasad AR. An empirical investigation of the “Dynamic McFadden” model of purchase timing and brand choice: implications for market structure. J Business and Economic Statist. 1998;16:2–12.
- 15. Çinlar E. Introduction to Stochastic Processes. Prentice-Hall; New Jersey: 1975.
- 16. Cook RJ, Lawless JF. The Statistical Analysis of Recurrent Events. Springer; New York: 2007.
- 17. Commenges D. Semi-Markov and non-homogeneous Markov models in medical studies. In: Janssen J, editor. Semi-Markov models. Plenum Press; New York: 1986. pp. 411–422.
- 18. Commenges D, Joly P, Gégout-Petit A, Liquet B. Choice between semi-parametric estimators for Markov and non-Markov multistate models from coarsened observations. Scand J Statist. 2007;34:33–52.
- 19. Cox DR. The statistical analysis of dependencies in point processes. In: Lewis PAW, editor. Symposium on Point Processes. Wiley; New York: 1973.
- 20. Cutler C, Antin JH. Peripheral blood stem cells for allogeneic transplantation: a review. Stem Cells. 2001;19:108–117. doi: 10.1634/stemcells.19-2-108.
- 21. Cutler C, Giri S, Jeyapalan S, Paniagua D, Viswanathan A, Antin JH. Acute and chronic graft-versus-host disease after allogeneic peripheral blood stem-cell and bone marrow transplantation: a meta analysis. J Clin Oncol. 2001;19:3685–3691. doi: 10.1200/JCO.2001.19.16.3685.
- 22. Dabrowska DM, Sun G, Horowitz MM. Cox regression in a Markov renewal model: an application to the analysis of bone marrow transplant data. J Amer Statist Assoc. 1994;89:867–877.
- 23. Dabrowska DM. Estimation of transition probabilities and bootstrap in a semi-parametric Markov renewal model. J Nonparametric Statist. 1995;5:237–259.
- 24. Dabrowska DM. Estimation in a class of semi-parametric transformation models. In: Rojo J, editor. Optimality: The Second Erich L. Lehmann Symposium. Institute of Mathematical Statistics; 2006. pp. 166–216. (IMS Lecture Notes–Monograph Series; vol. 49).
- 25. Dabrowska DM. Information bounds and efficient estimation in a class of censored transformation models. Acta Applicandae Mathematicae. 2007;96:177–201.
- 26. Dabrowska DM. Estimation in a semi-parametric two-stage renewal regression model. Statistica Sinica. 2009;19:981–996.
- 27. Daley DJ, Vere-Jones D. An Introduction to the Theory of Point Processes. Springer; New York: 1988.
- 28. Dellacherie C, Meyer PA. Probabilités et Potentiel. Hermann; Paris: 1975.
- 29. de la Peña V, Giné E. Decoupling: From Dependence to Independence. Springer; New York: 1999.
- 30. Dudley RM. Uniform Central Limit Theorems. Cambridge University Press; 1999.
- 31. Eerola M. Probabilistic causality in longitudinal studies. Springer; New York: 1994.
- 32. Friedrichs B, Tichelli A, Bacigalupo A, Russell NH, Ruutu T, Beksac M, Hasenclever D, Socié G, Schmitz N. Long-term outcome and late effects in patients transplanted with mobilised blood or bone marrow: a randomised trial. Lancet Oncology. 2010;11:331–338. doi: 10.1016/S1470-2045(09)70352-3.
- 33. Flowers MED, Parker PM, Johnston LJ, Matos AV, Storer B, Bensinger WI, Storb R, Appelbaum FR, Forman SJ, Blume KG, Martin PJ. Comparison of chronic graft-versus-host disease after transplantation of peripheral blood stem cells versus bone marrow in allogeneic recipients: long-term follow-up of a randomized trial. Blood. 2002;100:415–419. doi: 10.1182/blood-2002-01-0011.
- 34. Gale RP, Bortin MM, van Bekkum DW, Biggs JC, Dicke KA, Gluckman E, Good RA, Hoffman RG, Kay HEM, Kersey JH, Marmont A, Masaoka T, Rimm AA, van Rood JJ, Zwaan FE. Risk factors for acute graft-versus-host disease. Br J Haematol. 1987;67:397–406. doi: 10.1111/j.1365-2141.1987.tb06160.x.
- 35. Gill RD. Nonparametric estimation based on censored observations of a Markov renewal process. Z Wahrscheinlichkeitstheorie verw Gebiete. 1980;53:97–116.
- 36. Gill RD, Johansen S. A survey of product integration with a view toward application in survival analysis. Ann Statist. 1990;18:1501–1555.
- 37. Greenwood P, Wefelmeyer W. Empirical estimators for semi-Markov processes. Math Meth Statist. 1996;5:299–315.
- 38. Greenwood P, Müller UU, Wefelmeyer W. Semi-Markov processes and their applications. Commun Stat Theory Methods. 2004;33:419–435.
- 39. Himmelberg CJ. Measurable relations. Fund Math. 1975;87:53–72.
- 40. Hjort NL, Claeskens G. Frequentist model average estimators. J Amer Statist Assoc. 2003;98:938–945.
- 41. Hjort NL, Claeskens G. Focused information criteria and model averaging for Cox’s hazard regression model. J Amer Statist Assoc. 2006;101:1449–1464.
- 42. Jacod J. Multivariate point processes: predictable projection, Radon-Nikodym derivatives, representation of martingales. Z Wahrscheinlichkeitstheorie verw Gebiete. 1975;31:235–254.
- 43. Janssen J. Semi-Markov Models: Theory and Applications. Springer; New York: 1999.
- 44. Janssen J, Manca R. Semi-Markov Risk Models for Finance, Insurance and Reliability. Springer; New York: 2007.
- 45. Janssen J, Manca R. Applied Semi-Markov Processes. Springer; New York: 2006.
- 46. Janssen J, Limnios N. International Symposium on Semi-Markov Models: Theory and Applications. Kluwer: Academic Press; 2001.
- 47. Jones MP, Crowley JJ. Nonparametric tests of the Markov model for survival data. Biometrika. 1992;79:513–522.
- 48. Kalbfleisch JD, Prentice RL. Statistical Analysis of Failure Time Data. Wiley; 1981.
- 49. Karr AF. Point Processes and their Statistical Inference. Marcel Dekker; New York: 1991.
- 50. Keiding N. Statistical analysis of semi-Markov models based on the theory of counting processes. In: Janssen J, editor. Semi-Markov models Theory and Applications. Plenum Press; 1986. pp. 301–315.
- 51. Keiding N, Klein JP, Horowitz MM. Multistate models and outcome prediction in bone marrow transplantation. Statist Med. 2001;20:1871–1885. doi: 10.1002/sim.810.
- 52. Klein JP, Keiding N, Copelan EA. Plotting summary predictions in multistate survival models: probabilities of relapse and death in remission for bone marrow transplantation patients. Statist Med. 1993;12:2315–2332. doi: 10.1002/sim.4780122408.
- 53. Filippov AF. On certain questions in the theory of optimal control. Vestnik Moskov Univ Ser Mat Meh Astronom Fiz Him. 1959;2:25–32. English translation: SIAM J Control. 1962;1:76–84.
- 54. Kosorok MR, Lee BL, Fine JP. Robust inference for univariate proportional hazards frailty regression models. Ann Statist. 2004;32:1448–1491.
- 55. Kuratowski K. Topology. Academic Press; 1966.
- 56. Lagakos SW, Sommer CJ, Zelen M. Semi-Markov models for censored data. Biometrika. 1978;65:311–317.
- 57. Last G, Brandt A. Marked Point Processes on the Real Line: the Dynamic Approach. Springer; New York: 1995.
- 58. Limnios N, Oprisan G. Semi-Markov Processes and Reliability. Birkhäuser; Boston: 2001.
- 59. Lin DY, Fleming TR, Wei LJ. Confidence bands for survival curves under the proportional hazards model. Biometrika. 1994;81:73–81.
- 60. Lo SMS, Wilke RA. A copula model for dependent competing risks. Appl Statist. 2010;59:359–376.
- 61. Martinussen T, Scheike T. Dynamic Regression Models for Survival Data. Springer; New York: 2006.
- 62. Moore EM, Pyke R. Estimation of the transition distributions of a Markov renewal process. Ann Inst Stat Math. 1968;20:411–468.
- 63. Nolan D, Pollard D. U-processes: rates of convergence. Ann Statist. 1987;15:780–799.
- 64. Oakes D. Survival times: aspects of partial likelihood (with discussion). Int Statist Rev. 1981;49:235–264.
- 65. Oakes D, Cui L. On semi-parametric inference for modulated renewal processes. Biometrika. 1994;81:83–91.
- 66. Ouhbi B, Limnios N. Nonparametric estimation for semi-Markov kernels with applications to reliability analysis. Appl Stochastic Models and Data Analysis. 1996;12:209–220.
- 67. Ouhbi B, Limnios N. Nonparametric estimation for semi-Markov processes based on its hazard rate functions. Stat Inference Stoch Processes. 1999;2:151–173.
- 68. Pollard D. Convergence of Stochastic Processes. Springer Verlag; New York: 1984.
- 69. Pollard D. Empirical Processes: Theory and Applications. Institute of Mathematical Statistics; Hayward: 1990.
- 70. Phelan MF. Bayes estimation from a Markov renewal process. Ann Statist. 1990;18:603–616.
- 71. Putter H, Fiocco M, Geskus RB. Tutorial in biostatistics: competing risks and multi-state models. Statist Med. 2007;26:2389–2430. doi: 10.1002/sim.2712.
- 72. Pyke R. Markov renewal processes: definitions and preliminary properties. Ann Math Statist. 1961a;32:1231–1242.
- 73. Pyke R. Markov renewal processes with finitely many states. Ann Math Statist. 1961b;32:1243–1259.
- 74. Pyke R, Schaufele R. Limit theorems for Markov renewal processes. Ann Math Statist. 1964;35:1746–1764.
- 75. Pyke R, Schaufele R. The existence and uniqueness of stationary measures for Markov renewal processes. Ann Math Statist. 1966;37:1439–1462.
- 76. Ringden O, Labopin M, Bacigalupo A, Arcese W, Schaefer UW, Willemze R, Koc H, Bunjes D, Gluckman E, Rocha V, Schattenberg A, Frassoni F. Transplantation of peripheral blood stem cell as compared with bone marrow from HLA-identical siblings in adult patients with acute myeloid leukemia and acute lymphoblastic leukemia. J Clin Oncol. 2002;20(24):4655–4664. doi: 10.1200/JCO.2002.12.049.
- 77. Rivest LP, Wells MT. A martingale approach to the copula-graphic estimator for the survival function under dependent censoring. J Multiv Analysis. 2001;79:138–155.
- 78. Teicher H. On the Marcinkiewicz-Zygmund strong law for U-statistics. J Theoret Probab. 1998;11:279–288.
- 79. van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer; New York: 1996.
- 80. Voelkel JG, Crowley JJ. Nonparametric inference for a class of semi-Markov processes with censored observations. Ann Statist. 1984;12:142–160.
- 81. Wagner DH. Survey of measurable selection theorems. SIAM J Control Optim. 1977;15:859–903.
- 82. Weiss GH, Zelen M. A semi-Markov model for clinical trials. J Appl Probab. 1965;2:269–285.
- 83. Zheng M, Klein JP. Estimates of marginal survival for dependent competing risks based on an assumed copula model. Biometrika. 1995;82:127–138.