Abstract
We establish a general asymptotic theory for nonparametric maximum likelihood estimation in semiparametric regression models with right censored data. We identify a set of regularity conditions under which the nonparametric maximum likelihood estimators are consistent, asymptotically normal, and asymptotically efficient with a covariance matrix that can be consistently estimated by the inverse information matrix or the profile likelihood method. The general theory allows one to obtain the desired asymptotic properties of the nonparametric maximum likelihood estimators for any specific problem by verifying a set of conditions rather than by proving technical results from first principles. We demonstrate the usefulness of this powerful theory through a variety of examples.
Key words and phrases: Counting process, empirical process, multivariate failure times, nonparametric likelihood, profile likelihood, survival data
1. Introduction
Semiparametric regression models are highly useful in investigating the effects of covariates on potentially censored responses (e.g., failure times and repeated measures) in longitudinal studies. It is desirable to analyze such models by the nonparametric maximum likelihood approach, which generally yields consistent, asymptotically normal, and asymptotically efficient estimators. It is, however, technically difficult to establish the asymptotic properties of the nonparametric maximum likelihood estimators (NPMLEs), and thus far rigorous proofs exist only in some special cases.
In this paper, we develop a general asymptotic theory for the NPMLEs with right censored data. The theory is very encompassing in that it pertains to a generic form of likelihood rather than specific models. We prove that, under a set of mild regularity conditions, the NPMLEs are consistent, asymptotically normal, and asymptotically efficient with a limiting covariance matrix that can be consistently estimated by the inverse information matrix or the profile likelihood method.
This paper is the technical companion to Zeng and Lin (2007), in which several classes of models were proposed to unify and extend existing semiparametric regression models. The likelihoods for those models can all be written in the general form considered in this paper. For each class of models in Zeng and Lin (2007), we identify a set of conditions under which the regularity conditions for the general theory hold so that desired asymptotic properties are ensured.
2. Some Semiparametric Models
We describe briefly the three kinds of models considered in Zeng and Lin (2007). We assume that the censoring mechanism satisfies coarsening at random (Heitjan and Rubin (1991)).
2.1. Transformation Models for Counting Processes
Let N*(t) record the number of events that the subject has experienced by time t, and let Z(·) denote the corresponding covariate processes. Zeng and Lin (2007) proposed the following class of transformation models for the cumulative intensity function of N*(t)
where G is a continuously differentiable and strictly increasing function with G′(1) > 0 and G(∞) = ∞, R*(·) is an indicator process, Z̃ is a subset of Z, β and γ are regression parameters, and Λ(·) is an unspecified increasing function. The data consist of {Ni(t), Ri(t), Zi(t); t ∈ [0, τ]} (i = 1, …, n), where , Ci is the censoring time, and τ is a finite constant. The likelihood is
where dNi(t) = Ni(t) − Ni(t−).
2.2. Transformation Models With Random Effects for Dependent Failure Times
For i = 1, …, n, k = 1, …, K and l = 1, …, nik, let denote the number of the kth type of event experienced by the lth individual in the ith cluster, and Zikl(·) the corresponding covariate processes. Zeng and Lin (2007) assumed that the cumulative intensity for takes the form
where Gk, Λk, and are analogous to G, Λ, and R* of Section 2.1, Z̃ikl is a subset of Zikl plus the unit component, and bi is a vector of random effects with density f (b; γ). Let Cikl, Nikl, and Rikl be defined analogously to Ci, Ni, and Ri of Section 2.1. The likelihood is
2.3. Joint Models for Repeated Measures and Failure Times
For i = 1, …, n and j = 1, …, ni, let Yij be the response variable at time tij for the ith subject, and Xij the corresponding covariates. We assume that (Yi1, …, Yini) follows a generalized linear mixed model with density fy(y|Xij; bi), where bi is a set of random effects with density f (b; γ). We define and Zi as in Section 2.1, and assume that
where Z̃i is a subset of Zi plus the unit component, ψ is a vector of unknown constants, and v1 ◦ v2 is the component-wise product of two vectors v1 and v2. The likelihood is
For continuous measures, Zeng and Lin (2007) proposed the semiparametric linear mixed model
where H̃ is an unknown increasing function with H̃(−∞) = −∞, H̃(∞) = ∞, and H̃(0) = 0, α is a set of regression parameters, X̃ij is typically a subset of Xij, and εij (i = 1, …, n; j = 1, …, ni) are independent with density fε. Write Λ̃(y) = eH̃(y). The likelihood is
3. Nonparametric Maximum Likelihood Estimation
All the likelihood functions given in Section 2 can be expressed as
where θ is a d-vector of regression parameters and variance components, 𝒜 = (Λ1, …, ΛK), 𝒪i pertains to the observation on the ith cluster, and Ψ is a functional of 𝒪i, θ, and 𝒜. For nonparametric maximum likelihood estimation, we allow the Λk to be discontinuous with jumps at the observed failure times and maximize the modified likelihood function
where Λk{t} denotes the jump size of the monotone function Λk at t. Equivalently, we maximize the logarithm of the above function
(1)
We wish to establish an asymptotic theory for the resulting NPMLEs θ̂ and .
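To make the jump-size parametrization concrete, consider the simplest special case (an illustration only, not one of the models of Section 2): one-sample right-censored survival data with the survival function written as exp{−Λ(·)}, so that the log-likelihood in the jump sizes is Σ_j d_j log Λ{t_j} − Σ_i Λ(X_i) and its maximizer is the Nelson–Aalen increments dN(t)/Y(t). The sketch below verifies numerically that these closed-form jumps cannot be improved by perturbation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = rng.exponential(1.0, n)          # latent event times, hazard 1
c = rng.exponential(2.0, n)          # censoring times
x = np.minimum(t, c)                 # observed times
d = (t <= c).astype(float)           # event indicators

# Distinct observed event times and the closed-form maximizing jumps:
# maximizing sum_j d_j*log(jump_j) - sum_i Lambda(x_i) over the jumps
# gives jump_j = (# events at t_j) / (# at risk at t_j), i.e. Nelson-Aalen.
times = np.unique(x[d == 1])
deaths = np.array([np.sum((x == s) & (d == 1)) for s in times])
at_risk = np.array([np.sum(x >= s) for s in times])
jumps = deaths / at_risk

def loglik(j):
    # log-likelihood as a function of the jump sizes only
    Lam = np.array([np.sum(j[times <= xi]) for xi in x])  # Lambda(x_i)
    return float(np.sum(deaths * np.log(j)) - np.sum(Lam))

base = loglik(jumps)
# perturbing any jump strictly decreases the objective (concavity)
for k in range(len(jumps)):
    for eps in (-1e-3, 1e-3):
        pert = jumps.copy()
        pert[k] += eps
        assert loglik(pert) < base
print("Nelson-Aalen jumps maximize the jump-size likelihood")
```

The concavity in each jump size is what makes the stationary point a global maximum; the same device of differentiating (1) with respect to the jump sizes appears in Step 2 of the consistency proof below.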
4. Regularity Conditions
We impose the following conditions on the model and data structures.
(C1) The true value θ0 lies in the interior of a compact set Θ, and the true functions Λ0k are continuously differentiable in [0, τ] with , k = 1, …, K.
(C2) With probability one, P(infs∈[0,t] Rik·(s) ≥ 1|Zikl, l = 1, …, nik) > δ0 > 0 for all t ∈ [0, τ], where .
(C3) There exist a constant c1 > 0 and a random variable r1() > 0 such that E[log r1()] < ∞ and, for any θ ∈ Θ and any finite Λ1, …, ΛK,
almost surely, where . In addition, for any constant c2,
where ||h||V[0,τ] is the total variation of h(·) in [0, τ], and r2(), which may depend on c2, is a finite random variable with E[|log r2()|] < ∞.
We require certain smoothness of Ψ. Let Ψ̇θ denote the derivative of Ψ(; θ, ) with respect to θ, and let Ψ̇k[Hk] denote the derivative of Ψ(; θ, ) with respect to ε along the path (Λk + εHk), where Hk belongs to the set of functions such that Λk + εHk is increasing with bounded total variation.
(C4) For any θ(1), θ(2) ∈ Θ, and with uniformly bounded total variations, there exist a random variable ℱ() ∈ L4(P) and K stochastic processes μik(t; ) ∈ L6(P), k = 1, …, K, such that
In addition, μik(s; ) is non-decreasing, and E[ℱ()μik(s; )] is left-continuous with uniformly bounded left- and right-derivatives for any s ∈ [0, τ]. Here, the right-derivative for a function f(x) is defined as limh→0+(f (x + h) − f (x+))/h.
The following condition ensures identifiability of parameters.
(C5) (First Identifiability Condition) If
almost surely, then θ* = θ0 and for t ∈ [0, τ], k = 1, …, K.
The next assumption is more technical and will be used in proving the weak convergence of the NPMLEs. For any fixed (θ, ) in a small neighborhood of (θ0, ) in Rd × {BV[0, τ]}K, where BV[0, τ] denotes the space of functions with bounded total variation in [0, τ], (C4) implies that the linear functional
is continuous from BV[0, τ] to R. Thus, there exists a bounded function η0k(s; θ, ) such that
(C6) There exist functions ζ0k(s; θ0, ) ∈ BV[0, τ], k = 1, …, K, and a matrix ζ0θ(θ0, ) such that
In addition, for k = 1, …, K,
where η0km is a bounded bivariate function and η0kθ is a d-dimensional bounded function. Furthermore, there exists a constant c3 such that |η0km(s, t1; θ0, ) − η0km(s, t2; θ0, )| ≤ c3|t1 − t2| for any s ∈ [0, τ] and any t1, t2 ∈ [0, τ].
The final assumption ensures that the Fisher information matrix along any finite-dimensional sub-model is non-singular.
(C7) (Second Identifiability Condition) If with probability one,
for some constant vector v ∈ Rd and hk ∈ BV[0, τ], k = 1, …, K, then v = 0 and hk = 0 for k = 1, …, K.
Remark 1
(C1)–(C2) are standard assumptions in any analysis of censored data. (C3) pertains to the model structure, and (C4) and (C6) essentially impose the smoothness of this structure. Although they appear technical, these conditions are easy to verify in practice. (C5) and (C7) usually require some work to verify, but can be translated to simple conditions in specific cases.
5. Some Useful Lemmas
Lemma 1
For any constant c, the following classes of functions are P-Donsker:
Proof
We only prove that ℱ3k is P-Donsker; the proofs for the other two classes are similar. For k = 1, …, K, we define a measure μ̃k in [0, τ] such that, for any Borel set A ⊂ [0, τ],
Condition (C4) implies that μ̃k([0, τ]) ≤ ||ℱ()||L4(P)||μik(τ; ) − μik(0; )||L6(P). Thus, μ̃k is a finite measure. According to Theorem 2.7.5 of van der Vaart and Wellner (1996), the bracket covering number for any bounded set in BV[0, τ] is of order exp{O(1/ε)} in L2(μ̃k), k = 1, …, K. Thus, we can construct Nε ≡ (1/ε)^d × exp{O(K/ε)} × exp{O(1/ε)} brackets for the set of (θ, , H) in ℱ3k, denoted by
such that and
Any (θ, , H) must belong to one of these brackets. Obviously, the bracket functions
cover all the functions in ℱ3k. Since
where c is a constant depending on K, the L2(P)-distance within each bracket pair is O(ε). Hence, the bracket entropy integral of ℱ3k is finite, so that ℱ3k is P-Donsker.
Lemma 2
For any bounded random variable (θ, Λ) in Θ × BV[0, τ], the function g(s) ≡ |E[Ψ̇k(; θ, ) [I(· ≥ s)]/Ψ (; θ, )]| is left-continuous and satisfies that, for any s ∈ [0, τ], there exist δs, cs > 0 such that |g(s̃) − g(s)| ≤ cs|s̃ − s| for s̃ ∈ (s − δs, s) and |g(s̃) − g(s+)| ≤ cs|s̃ − s| for s̃ ∈ (s, s + δs).
Proof
Since μik(t; ) is non-decreasing in t, it follows from (C4) that for any s1 and s2,
Thus, g(s) is in BV[0, τ] and is left-continuous. In addition, the left- and right-differentiability of E[ℱ()μik(s; )] in (C4) implies that the second part of the lemma holds.
Lemma 3
For any h(s) ∈ BV[0, τ], the linear map is a bounded compact operator from BV[0, τ] to BV[0, τ].
Proof
It is clear from (C6) that this map sends any bounded set in BV[0, τ] into a bounded set consisting of Lipschitz-continuous functions. The result thus follows since any uniformly bounded collection of Lipschitz-continuous functions forms a totally bounded set in BV[0, τ] and the linear map is continuous.
6. Consistency
The following theorem states the consistency of θ̂ and Λ̂k, k = 1, …, K.
Theorem 1
Under (C1)–(C5), |θ̂ − θ0| → 0 and supt∈[0,τ] |Λ̂k(t) − Λ0k(t)| → 0 (k = 1, …, K) almost surely.
Proof
We fix a random sample in the probability space and assume that (C1)–(C5) hold for this sample. The set of such samples has probability one. We prove the result for this fixed sample. The entire proof consists of three steps.
Step 1
We show that the NPMLEs exist or, equivalently, Λ̂k(τ) < ∞ (k = 1, …, K) for large n. By (C3), the likelihood function is bounded by
If Λk(τ) = ∞ for some k, then (C2) implies that, with probability one, inft∈[0,τ] Rik·(t) ≥ 1 for some i, so that the above function is equal to zero. Thus, the maximum of the likelihood function can only be attained for Λ̂k(τ) < ∞.
Step 2
We show that lim supn Λ̂k(τ) < ∞ almost surely, i.e., Λ̂k(τ) is bounded uniformly for all large n. By differentiating the objective function (1) with respect to Λk{Yikl} for which and Rikl(Yikl) = 1, we note that Λ̂k{Yikl} satisfies
In other words,
To prove the boundedness of Λ̂k(τ), we construct another step function Λ̃k with jumps only at the Yikl for which and Rikl(Yikl) = 1,
that is,
We show that Λ̃k uniformly converges to Λ0k. By Lemma 1,
(2)
uniformly in s ∈ [0, τ]. Since the score function along the path Λk = Λ0k + εI(· ≥ s) with the other parameters fixed at their true values has zero expectation,
(3)
where δ(t = s) is the Dirac function. The submodel is not in the parameter space; however, we can always choose a sequence of submodels in the parameter space which approximates this submodel. Thus, the uniform limit of Λ̃k(t) is
That is, Λ̃k(t) uniformly converges to Λ0k(t).
We next show that the difference between the log-likelihood functions evaluated at (θ̂, ) and (θ0, ), where = (Λ̃1, …, Λ̃K), is negative eventually if some Λ̂k(τ) diverges, which will induce a contradiction. The key arguments are based on (C3). Clearly, n−1ℒn(θ̂, ) ≥ n−1ℒn(θ0, ). It follows from (2) and (3) that nΛ̃k{t} converges to , and is thus uniformly bounded away from zero, where t is an observed failure time. Therefore,
which is bounded away from − ∞ when n is large. That is,
where O(1) denotes a finite constant. On the other hand, (C3) implies that
where . Thus,
(4)
We now show that the right-hand side diverges to − ∞ if Λ̂k(τ) diverges for some k. The proof is based on the partitioning idea of Murphy (1994). Specifically, we construct a sequence t0k = τ > t1k > t2k > … in the following manner. First, we define
where R̄ik·(t) = infs∈[0,t] Rik·(s). Clearly, such a t1k exists, and the above inequality becomes an equality if t1k > 0. If t1k > 0, we choose a small constant ε0 such that
and define
Such a t2k exists. If t2k > 0, the inequality is an equality, and we define
We continue this process. The sequence eventually stops at some tNk,k = 0. If this is not true, then the sequence is infinite and strictly decreases to some t* ≥ 0. Since all the inequalities are equalities, we sum all the equations except the first one to obtain
which implies that
This contradicts the choice of ε0. Thus, the sequence stops at some tNk,k = 0.
If we write Iqk = [tq+1,k, tqk), then the right-hand side of (4) can be bounded by
(5)
Since log x is a concave function,
Therefore, (5) can be further bounded by
By (C2),
so that
According to the construction of the tqk’s, the coefficients in front of log Λ̂k(tqk) are all negative when n is large enough. Therefore, the corresponding terms cannot diverge to ∞. However, if Λ̂k(τ) → ∞, the first term in the summation goes to −∞. We conclude that for all n large enough, Λ̂k(τ) < ∞. Thus, lim supn Λ̂k(τ) < ∞.
Step 3
We obtain the consistency result from (C5). Since Λ̂k is bounded and monotone, the sequence {Λ̂k} is weakly compact. Helly’s Selection Theorem implies that, for any subsequence, we can always choose a further subsequence such that Λ̂k pointwise converges to some monotone function . Without loss of generality, we also assume that θ̂ converges to some θ*. The consistency will hold if we can show that and θ* = θ0. Since Λ0k is continuous, the pointwise convergence of Λ̂k to Λ0k can then be strengthened to uniform convergence in [0, τ].
Note that
(6)
Clearly, Λ̂k is absolutely continuous with respect to Λ̃k. By condition (C3),
since Λ̂k converges to and is bounded and {ℱ()μjk(t; ): t ∈ [0, τ]} is a P-Glivenko-Cantelli class. By Lemma 1 and the Glivenko-Cantelli Theorem,
The numerator and denominator in the integrand of (6) converge uniformly to deterministic functions, denoted by g1k(s) and g2k(s), respectively. It follows from (3) that is bounded away from zero. We claim that inf s∈[0,τ] g2k(s) > 0. If this is not true, then there exists some s* ∈ [0, τ] such that g2k(s*+) = 0 or g2k(s*) = 0. By Lemma 2, there exist δ* and c* such that |g2k(s)| ≤ c*|s − s*| for s ∈ (s*, s* + δ *) or s ∈ (s* − δ *, s*]. On the other hand, for any ε > 0,
Taking limits on both sides, we obtain . Let ε → 0. By the Monotone Convergence Theorem, , or . This is a contradiction since the right-hand side is infinite. The contradiction implies that the limit g2k(s) is uniformly positive. We can take limits on both sides of (6) to obtain . Thus, is also absolutely continuous with respect to Λ0k and . Since Λ0k(t) is differentiable with respect to t, so is . Write . The foregoing arguments show that dΛ̂k(t)/dΛ̃k(t) uniformly converges to , which is uniformly positive in [0, τ].
It follows from the inequality n−1ℒn(θ̂, ) ≥ n−1ℒn(θ0, ) that
In view of Lemma 1, the Glivenko-Cantelli Theorem and the uniform convergence of dΛ̂k/dΛ̃k, taking limits on both sides of the above inequality yields
The left-hand side is the negative Kullback-Leibler distance of the density indexed by (θ*, ). Thus, (C5) entails that θ* = θ0 and Λ* = Λ0.
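The final step rests on the fact that the Kullback–Leibler divergence is nonnegative and vanishes only when the two densities agree, which is exactly what lets (C5) pin down (θ*, Λ*). A quick numerical reminder of this fact for discrete distributions (illustration only, not part of the proof):

```python
import numpy as np

def kl(p, q):
    # Kullback-Leibler divergence between two discrete distributions
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(1)
for _ in range(100):
    p = rng.dirichlet(np.ones(5))
    q = rng.dirichlet(np.ones(5))
    assert kl(p, q) >= -1e-12        # Jensen's inequality: KL >= 0
assert abs(kl(p, p)) < 1e-12         # zero when the distributions coincide
print("KL(p||q) >= 0 with equality iff p = q")
```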
7. Weak Convergence and Asymptotic Efficiency
Define = {v ∈ Rd, |v| ≤ 1}, and = {h(t): ||h(t)||V [0,τ] ≤ 1 }. We identify (θ̂ − θ0, − ) as a random element in l∞( × ) through the definition .
Theorem 2
Under (C1)–(C7), n1/2(θ̂ − θ0, − ) →d ℊ in l∞( × ), where ℊ is a continuous zero-mean Gaussian process. Furthermore, the limiting covariance matrix of n1/2(θ̂ − θ0) attains the semiparametric efficiency bound.
Proof
The proof is based on the likelihood equation and follows the arguments of van der Vaart (1998, pp. 419–424). Let ℒ(θ, ) be the log-likelihood function from a single cluster, ℒ̇θ(θ, ) be the derivative of ℒ(θ, ) with respect to θ, and ℒ̇k(θ, )[Hk] be the path-wise derivative along the path Λk + εHk. We sometimes omit the arguments in these derivatives when θ = θ0 and = . Let be the empirical measure based on n i.i.d. observations, and be its expectation.
Let = (h1, …, hK) ∈ . The likelihood equation for (θ̂, ) along the path (θ̂+εv, +ε∫d), where v ∈ Rd and hk ∈ BV [0, τ], is given by
To be specific,
Since (θ0, ) maximizes [ℒ(θ, )],
These equations, combined with the likelihood equation for (θ̂, ), yield
Define , where δ0 is a small positive constant. When n is large enough, (θ̂, ) belongs to with probability one. By Lemma 1 and the Donsker Theorem,
(7)
where op(1) represents some random element converging in probability to zero in l∞( × ).
Under (C6), the first term on the right-hand side of (7) is
The second term is . It follows from (C6) that the above expression is
Thus, the right-hand side of (7) can be written as
where (B1, B21, …, B2K) are linear operators in Rd × {BV [0, τ]}K, and
(8)
(9)
It follows from the above derivation that
(10)
We can write (B1, B21, …, B2K)[v, ] as
We wish to prove that (B1, B21, …, B2K) is invertible. As shown at the end of this section, η0k(t; θ0, ) < 0, so that the first term of (B1, B21, …, B2K) is an invertible operator. It follows from Lemma 3 that the second term is a compact operator. Thus, (B1, B21, …, B2K) is a Fredholm operator, and the invertibility of (B1, …, B2K) is equivalent to the operator being one-to-one (Rudin (1973, pp. 99–103)). Suppose that B1[v, ] = 0, …, and B2K[v, ] = 0. It is easy to see from (10) that the derivative of along the path (θ0 + εv, + ε∫d) is zero. That is, the information along this path is zero, or almost surely. By (C7), v = 0 and = 0, so that (B1, B21, …, B2K) is one-to-one and invertible.
It follows from (7) that, for any (v, ) ∈ × ,
where (ṽ, h̃1, …, h̃K) = (B1, B21, …, B2K)−1(v, h1, …, hK). Since
we have
Thus, . Consequently,
We have proved that n1/2(θ̂− θ0, − ) converges weakly to a Gaussian process in l∞( × ). By choosing hk = 0 for k = 1, …, K, we see that vTθ̂ is an asymptotically linear estimator of vT θ0 with influence function . Since the influence function lies in the space spanned by the score functions, θ̂ is an efficient estimator for θ0.
It remains to verify that η0k(t; θ0, ) < 0. Under (C6), . The choice of Hk(s) = I(s ≥ t) yields [Ψ̇k(;θ0,)[I(· ≥ t)]/Ψ(;θ0,)] = η0k(t;θ0,). On the other hand, the score function along the path Λ0k + εI(· ≥ t) with the other parameters fixed at their true values has zero expectation. We expand this expectation to obtain
Thus, η0k(t; θ0, ) < 0.
8. Information Matrix
Theorem 2 implies that the functional parameter can be estimated at the same rate as the Euclidean parameter θ. Thus, we may treat (1) as a parametric log-likelihood with θ and the jump sizes of Λk, k = 1, …, K, at the observed failure times as the parameters and estimate the asymptotic covariance matrix of the NPMLEs for these parameters by inverting the information matrix. This result is formally stated in Theorem 3. We impose an additional assumption.
(C8) There exists a neighborhood of (θ0, ) such that for (θ, ) in this neighborhood, the first and second derivatives of log Ψ(; θ, ) with respect to θ and along the path Λk + εHk with respect to ε satisfy the inequality in (C4).
For any v ∈ and h1, …, hK ∈ , we consider the vector , where h⃗k is the vector consisting of the values of hk(·) at the observed failure times. Let ℐn be the negative Hessian matrix of (1) with respect to θ̂ and the jump sizes of (Λ̂1, …, Λ̂K).
Theorem 3
Assume (C1)–(C8). Then ℐn is invertible for large n, and
in probability, where AVar denotes the asymptotic variance.
Proof
The proof is similar to that of Theorem 3 in Parner (1998); see also van der Vaart (1998, pp. 419–424). First, (10) implies that, for any v ∈ and h1, …, hK ∈ ,
(11)
where ℒ̈ pertains to the second-order derivative of the log-likelihood function.
On the right-hand side of (10), we replace by to obtain two new linear operators Bn1 and Bn2k. It is easy to show that Bn1 and Bn2k converge uniformly to B1 and B2k, respectively. Under (C8), the results of Lemma 1 apply to the second-order derivatives ℒ̈ and the operators (B1, B21, …, B2K). By replacing θ0, Λ0k and on both sides of (11) with θ̂, Λ̂k and , we obtain
According to the proof of Theorem 2, (B1, B21, …, B2K) is invertible, and so is (Bn1, …, Bn2K) for large n. Note that can be written as for some matrix ℬn. Therefore ℬn is invertible, and so is ℐn. Furthermore,
According to Theorem 2, the asymptotic variance of is
where (ṽ, h̃1, …, h̃K) is (B1, B21, …, B2K)−1(v, h1, …, hK), which can be approximated by (Bn1, Bn21, …, Bn2K)−1(v, h1, …, hK). Hence, the asymptotic variance can be approximated uniformly in v and the hk’s by its empirical counterpart , which is further approximated by .
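As an elementary finite-dimensional illustration of Theorem 3's recipe (invert a numerically computed information matrix), consider right-censored exponential data, a toy parametric example rather than the semiparametric setting: the MLE of the rate λ is D/Σxᵢ with D the number of events, and the inverse observed information is λ̂²/D. The sketch below checks that a second-difference Hessian reproduces this.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
lam0 = 0.5
t = rng.exponential(1 / lam0, n)     # event times with rate 0.5
c = rng.uniform(0.5, 4.0, n)         # censoring times
x = np.minimum(t, c)
d = (t <= c).astype(float)

def loglik(lam):
    # right-censored exponential log-likelihood
    return float(np.sum(d * np.log(lam) - lam * x))

lam_hat = d.sum() / x.sum()          # closed-form MLE

# observed information by a central second difference of the log-likelihood
h = 1e-4
info = -(loglik(lam_hat + h) - 2 * loglik(lam_hat) + loglik(lam_hat - h)) / h**2
var_numeric = 1 / info               # inverse observed information
var_analytic = lam_hat**2 / d.sum()  # analytic inverse information
print(var_numeric, var_analytic)
```

In the semiparametric case the same inversion is carried out jointly in θ̂ and the jump sizes, which is why Theorem 2's n^1/2-rate for the functional parameter is needed first.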
9. Profile Likelihood
Theorem 4
Let pln(θ) be the profile log-likelihood function for θ, and assume (C1)–(C8). For any εn = Op(n−1/2) and any vector v,
where Σ is the limiting covariance matrix of n1/2(θ̂ − θ0). Furthermore, .
Proof
We appeal to Theorem 1 of Murphy and van der Vaart (2000). Specifically, we construct the least favorable submodel for θ0 and verify all the conditions in their Theorem 1. For notational simplicity, we assume that K = 1; the extension to K > 1 is straightforward.
It follows from the proof of Theorem 2 that
where B2 stands for the operator (B21, …, B2K), and ℒ̈ΛΛ[H1, H2] denotes the second-order derivative of ℒ(θ, Λ) with respect to Λ along the two directions H1 and H2. On the other hand,
where is the dual operator of ℒΛ in L2[0, τ]. Thus, if we choose h such that , then
By definition, ∫hdΛ0 is the least favorable direction for θ0 and ℒ̇θ − ℒ̇Λ [∫hd Λ0] is the efficient score function. Such an h exists since B2(0, ·) is invertible. In addition, h ∈ BV [0, τ]. Hence, we can construct the least favorable submodel at (θ, Λ) by ε ↦ (ε, Λε) with dΛε (θ, Λ) = {1 + (ε − θ) · h} dΛ. Clearly, Λθ (θ, Λ) = Λ and
If θ̃ →p θ0 and Λ̂θ̃ maximizes the objective function with θ̂ replaced by θ̃, we can use the arguments in the proof of Theorem 1 to show that Λ̂θ̃ is consistent. In the likelihood equation for Λ̂θ̃, we can use the arguments for the linearization of (7) to show that, uniformly in h ∈ ,
The arguments for proving the invertibility of (B1, B2) show that h ↦ B2(0, h) is invertible. Thus,
By condition (C6), we obtain the no-bias condition, i.e.,
We have verified conditions (8)–(11) of Murphy and van der Vaart (2000).
Condition (C4), together with Lemma 1, implies that the class
is P-Donsker and that the functions in the class are continuous at (θ0, Λ0) almost surely, while condition (C8) implies that the class
is P-Glivenko-Cantelli and is bounded in L2(P). Therefore, all the conditions in Murphy and van der Vaart (2000) hold, so that the desired results follow from their Theorem 1.
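Theorem 4 justifies estimating the efficient information for θ by a second-order difference of the profile log-likelihood with perturbations of size Op(n−1/2). A minimal parametric sketch of this recipe (a toy normal model with the variance profiled out, purely illustrative and not the paper's setting): the efficient information for the mean μ is 1/σ², and the curvature of pln recovers it.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
x = rng.normal(2.0, 1.5, n)

def profile_loglik(mu):
    # profile out sigma^2 analytically: sigma^2(mu) = mean((x - mu)^2)
    s2 = float(np.mean((x - mu) ** 2))
    return -0.5 * n * np.log(s2)     # up to an additive constant

mu_hat = float(np.mean(x))           # maximizer of the profile likelihood
h = n ** (-0.5)                      # perturbation of order n^{-1/2}

# second-difference estimate of the efficient information for mu
info = -(profile_loglik(mu_hat + h) - 2 * profile_loglik(mu_hat)
         + profile_loglik(mu_hat - h)) / (n * h ** 2)
sigma2_hat = float(np.mean((x - mu_hat) ** 2))
print(info, 1 / sigma2_hat)          # efficient information is 1/sigma^2
```

The attraction of this estimator in the semiparametric setting is that it requires only repeated maximization over the nuisance parameter, never an explicit expression for the efficient score.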
10. Applications
In this section, we apply the general results to the problems described in Section 2. We identify a set of conditions for each problem under which regularity conditions (C1)–(C8) are satisfied so that the desired asymptotic properties hold. These applications not only provide the theoretical justifications for the work of Zeng and Lin (2007), but also illustrate how the general theory can be applied to specific problems.
10.1. Transformation Models With Random Effects for Dependent Failure Times
We assume the following conditions.
(D1) The parameter value belongs to the interior of a compact set Θ in Rd, and for all t ∈ [0, τ], k = 1, …, K.
(D2) With probability one, Zikl(·) and Z̃ikl(·) are in BV[0, τ] and are left-continuous with bounded left- and right-derivatives in [0, τ].
(D3) With probability one, P (Cikl ≥ τ|Zikl) > δ0 > 0 for some constant δ0.
(D4) With probability one, nik is bounded by some integer n0. In addition, E[Nik·(τ)] < ∞.
(D5) For k = 1, …, K, Gk(x) is four-times differentiable such that Gk(0) = 0, , and for any integer m ≥ 0 and any sequence 0 < x1 < … < xm ≤ y,
for some constants μ0k and κ0k > 0. In addition, there exists a constant ρ0k such that
(D6) For any constant a1 > 0,
and there exists a constant a2 > 0 such that for any γ,
(D7) Consider two types of events: k ∈ indicates that event k is recurrent and k ∈ indicates that event k is survival time. For k ∈ ∪ , if there exist ck(t) and v such that with probability 1, ck(t) + vTZikl(t) = 0 for k ∈ and ck(0) + vTZikl(0) = 0 for k ∈ , then v = 0.
(D8) If there exist constants αk and α0k such that for any subset Lk ⊂ {1, …, nik} and for any ωkl and tkl,
then γ = γ0. In addition, if for k ∈ and for any t,
then Λ1 = Λ2. Furthermore, if for some vector v and constant αk,
then v = 0.
(D1)–(D4) are standard conditions for this type of problem. We show that (D5) holds for all commonly used transformations. We first consider the class of logarithmic transformations G(x) = ρ log(1 + rx) (ρ > 0, r > 0). Clearly,
Thus, in (D5), we can set μ0 to ρr(1 + 1/r)min(1, r)^{−ρ} and κ0 to ρ. We can verify the polynomial bounds for G″(x)/G(x), G(3)(x)/G(x) and G(4)(x)/G(x) by direct calculations. We next consider the class of Box-Cox transformations G(x) = {(1 + x)^ρ − 1}/ρ. Clearly,
Thus, we can set μ0 to 4ρ + exp(1/ρ) and κ0 to ρ. The polynomial bounds for G″(x)/G(x), G(3)(x)/G(x) and G(4)(x)/G(x) hold naturally. Finally, we consider the linear transformation model H(T) = βTZ + ε, where ε is standard normal. In this case, G(x) = −log{1 − Φ(log x)}, where Φ is the standard normal distribution function. We claim that there exists a constant ν0 > 0 such that φ(x) ≤ ν0{1 − Φ(x)}(1 + |x|). If x < 0, then φ(x) ≤ (2π)−1/2 ≤ 2(2π)−1/2{1 − Φ(x)}(1 + |x|). If x ≥ 0,
By l’Hôpital’s rule,
Therefore, φ(x)/[{1 − Φ(x)}(1 + x)] is bounded for x ≥ 0. Without loss of generality, assume that y > 1. Clearly,
Since (1 + x)φ(log x)/[x{1 − Φ(log x)}] is bounded when x is close to zero and is bounded by a multiple of (1 + log x) when x is close to ∞, we have (1 + x)φ(log x)/[x{1 − Φ(log x)}] ≤ ν01 + ν02 log(1 + x) for two constants ν01 and ν02. Therefore,
Since 1 − Φ(x) ≤ √2 exp(−x²/4) when x > 0, the above expression is bounded by
where all the ν’s are positive constants. The polynomial bounds for G″(x)/G(x), G(3)(x)/G(x) and G(4)(x)/G(x) follow from the fact that φ(x)/{1 − Φ(x)} ≤ O(1 + |x|).
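The claimed bound φ(x) ≤ ν0{1 − Φ(x)}(1 + |x|) can also be checked numerically. The sketch below (an illustration only) evaluates the ratio on a grid using the complementary error function for the normal tail, and suggests that ν0 = 1 already suffices on that range.

```python
import math

def phi(x):
    # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def surv(x):
    # 1 - Phi(x), computed stably via the complementary error function
    return 0.5 * math.erfc(x / math.sqrt(2))

# ratio phi(x) / [{1 - Phi(x)}(1 + |x|)] over a grid in [-10, 10]
grid = [i / 100 for i in range(-1000, 1001)]
ratios = [phi(x) / (surv(x) * (1 + abs(x))) for x in grid]
nu0 = max(ratios)
print(nu0)  # a valid choice of nu0 on this range
```

The ratio behaves like the inverse Mills ratio divided by (1 + x), which tends to 1 as x → ∞, consistent with the l'Hôpital argument above.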
Condition (D6) pertains to the tail behavior of the density function f(b; γ) for the random effects. For survival data, , so that the first half of condition (D6) is tantamount to requiring that the moment generating function of b exist everywhere. This condition holds naturally when b has a compact support or a Gaussian density tail. The second half of condition (D6) clearly holds for Gaussian density functions.
(D7) and (D8) are sufficient conditions to ensure parameter identifiability and non-singularity of the Fisher information matrix. In most applications, these conditions are tantamount to the linear independence of covariates and the unique parametrization of the random-effects distribution. Specifically, if Z̃ikl is time-independent, then the second condition in (D8) is not necessary; if Z̃ikl does not depend on k and l, and b has a normal distribution, then the other two conditions in (D8) hold as well provided that Z̃ikl is linearly independent with positive probability; if Z̃ikl is time-independent and is non-empty (i.e., at least one event is recurrent), then (D8) can be replaced by the linear independence of Z̃ikl for some k ∈ and the unique parametrization of f (b; γ).
We wish to show that (D1)–(D8) imply (C1)–(C8), so that the desired asymptotic properties hold. Conditions (C1) and (C2) follow naturally from (D1)–(D4). To verify (C3), we note that
where
and .
If ||Λk||V [0,τ] are bounded, then for any fixed constant B0 such that P(|b| ≤ B0) > 0. Thus, Ψ(; θ, ) is bounded from below by , so that the second half of (C3) holds. It follows from (D5) that
Since exp{βT Zikl(s) + bT Z̃ikl(s)} ≥ exp{−O(1 + |b|)}, we have , so that
Thus, the first half of (C3) holds as well.
We now verify (C4). Under (D5),
Thus, it follows from the Mean-Value Theorem that
where the last inequality follows from integration by parts and the fact that Zikl(t) and Z̃ikl(t) have bounded variations. It then follows from (D6) that |Ψ (; θ(1), ) − Ψ (; θ(2), )| is bounded by the right-hand side of the inequality in (C4). By the same arguments, we can verify the bounds for the other three terms in (C4).
To verify (C6), we calculate that
For (θ, ) in a neighborhood of (θ0, ),
Thus, for the second equation in (C6), η0km(s, t; θ0, ) is obtained from the derivative of η0k with respect to Λm along the direction Λm − Λ0m, and η0kθ is the derivative of η0k with respect to θ. Likewise, we can obtain the first equation in (C6). It is straightforward to verify the Lipschitz continuity of η0km.
The verification of (C8) is similar to that of (C4), relying on the explicit expressions of Ψ̈ θθ (; θ, ) and the first and second derivatives of Ψ(; θ, + εℋ) with respect to ε.
It remains to verify the two identifiability conditions under (D7) and (D8). To verify (C5), suppose that (β, γ, Λ1, …, Λk) yields the same likelihood as (β0, γ0, Λ10, …, Λk0). That is,
We perform the following operations on both sides sequentially for k = 1, …, K and l = 1, …, nik.
(a) If the kth type of event pertains to survival time, for the lth subject of this type of event, the first equation is obtained with Rikl(t) = 1 and for any t ≤ τ, i.e., the subject does not experience any event in [0, τ]. The second equation is obtained by integrating t from tkl to τ on both sides under the scenario that Rikl(t) = 1 and has a jump at t, i.e., the subject experiences the event at time tkl. We then take the difference between these two equations. In the resulting equation, the terms and are replaced by and , respectively.
(b) If the kth type of event is recurrent, for the lth subject of this type of event, we let Rikl(t) = 1 and let have jumps at s1, s2, …, sm and for any arbitrary (m + m′) times in [0, τ]. We integrate s1, …, sm from 0 to tkl and integrate from 0 to τ. In the resulting equation, is replaced by {Gk(qikl(tkl))}m{Gk(qikl(τ))}m′ on both sides. Note that m and m′ are arbitrary. We then multiply both sides by {(iωkl)m/m!}/m′! and sum over m, m′ = 0, 1, …. On both sides of the resulting equation, the terms associated with k and l are replaced by exp{iωklGk(qikl(tkl))}.
After these sequential operations, we obtain
For survival time, we can let any subject from the nik subjects have tkl = 0, which results in
where ξkl is any positive variable.
The above expression implies that {Gk(qikl(t)), k ∈ } as a function of
has the same distribution as {Gk(qikl0(t)), k ∈ } as a function of
so the same is true of {qikl(t)} and {qikl0(t)} because of the one-to-one mapping. Thus, the distributions of { } and { } should also agree, and they have the same expectation. Now let tkl = 0 for k ∈ . Since E[b1] = E[b2] = 0, we obtain for k ∈ . The above arguments also yield
We compare the coefficients of ξkl for k ∈ . This yields that for any subset Lk ⊂ {1, …, nik},
We differentiate the above expression with respect to tkl at 0 for k ∈ . It then follows from (D8) that log λk(0) − log λ0k(0) + (β − β0)T Zikl(0) = 0 and γ = γ0. Thus, (D7) implies that β = β0 and λk(t) = λ0k(t) for k ∈ . On the other hand, for any fixed k ∈ , we let tk′l′ = 0 if k′ ≠ k or l′ ≠ l. Thus, ∫b exp{−Gk(qikl(tkl))}f(b; γ0)db = ∫b exp{−Gk(q0ikl(tkl))}f(b; γ0)db. Therefore, Λk = Λ0k for k ∈ according to (D8).
To verify (C7), we write v = (vβ, vγ). We perform operations (a) and (b) on the score equation in (C7). The arguments used in proving the identifiability yield
(12)
where . We differentiate (12) with respect to tkl twice at 0 for k ∈ . Comparison of the coefficients for ωkl yields ∫b e2bTZ̃ikl(0) f′(b; γ0)T vγdb = 0. We also differentiate (12) with respect to tkl at 0 for k ∈ . Thus, for each k ∈ and l = 1, …, nik, . It then follows from (D8) that vγ = 0. For fixed k0 and l0, using the fact that vγ = 0, the score equation under operations (a) and (b), where in (a) we let for any t ≤ τ and in (b) we let m = 0 whenever k ≠ k0 or l ≠ l0, becomes a homogeneous integral equation for hk0(t) + Zik0l0(t)T vβ. This equation has only the trivial solution, so hk0(t) + Zik0l0(t)T vβ = 0. Since k0 and l0 are arbitrary, (D7) implies that hk = 0 and vβ = 0.
Remark 2
For survival time, (D5) is required to hold only for m = 0 and m = 1.
Remark 3
The above results do not apply directly to the proportional hazards model with gamma frailty because (D6) does not hold when b has a gamma distribution. This model is nevertheless mathematically convenient to handle because the marginal hazard function has an explicit form. The likelihood is a special case of ours with
in the notation of Parner (1998). Clearly, Ψ satisfies (C3) when θ > 0. The other conditions can be verified in the same manner as before.
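For orientation, the explicit marginal form arises from the gamma Laplace transform; the following is a sketch in our own notation (the elided display defines the relevant quantity in Parner's notation), with frailty b ∼ Gamma(1/θ, 1/θ), i.e. mean 1 and variance θ:

```latex
\int_0^\infty e^{-bx}\,
  \frac{(1/\theta)^{1/\theta}}{\Gamma(1/\theta)}\,
  b^{1/\theta-1}e^{-b/\theta}\,db
  \;=\;(1+\theta x)^{-1/\theta},
```

so with x = Λ(t)eβTZ the marginal survival function is {1 + θΛ(t)eβTZ}−1/θ, whose hazard is available in closed form; this is the sense in which the model remains tractable even though (D6) fails.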
Remark 4
Our theory does not cover the case in which the true parameter values lie on the boundary of Θ. The boundary problem is delicate to handle. One possible solution is to follow the idea of Parner (1998) by extending the definition of the likelihood function outside Θ and verifying (C2)–(C8) for the extended likelihood function.
Remark 5
We have assumed known transformations. We may allow Gk to belong to a parametric family of distributions, say Gk(·; φ), where φ is a parameter in a compact set. Then θ contains φ. Our results and proofs apply to this situation if (D5) holds uniformly in φ and the two identifiability conditions are satisfied.
10.2. Joint Models for Repeated Measures and Failure Times
For the (parametric) generalized linear mixed model, the likelihood can be viewed as a special case of that of Section 10.1 except that there is an additional parameter α in f(y|x; b). We assume that (D1)–(D8) hold but with (D6) replaced by the following condition.
(D6′) For any constant a1 > 0,
and there exists a constant a2 > 0 such that for any γ and α,
almost surely, where r3(·) is a random variable in L2(P).
Under these conditions, the desired asymptotic properties follow from the arguments of Section 10.1.
Under the semiparametric linear transformation model for continuous repeated measures, the likelihood takes the form of that of Section 2.2 with K = 2 and ni2 = ni, where the time to the second type of failure is defined by Yij (assuming without loss of generality that Yij ≥ 0). Thus, if we regard Yij as a right-censored observation when it is greater than a very large value (i.e., the upper limit of detection), then the asymptotic results given in Section 10.1 hold. When such an upper limit does not exist, the estimator for Λ̃ can be unbounded as the sample size goes to infinity, in which case our proof of Theorem 1 does not apply.
10.3. Transformation Models for Counting Processes
We verify (C1)–(C8) under the following conditions.
(E1) The parameter value belongs to the interior of a compact set Θ in Rd, and for all t ∈ [0, τ].
(E2) With probability one, P (C ≥ τ|Z) > δ0 > 0 for some constant δ0.
(E3) Condition (D5) holds.
(E4) With probability one, Z(·) and Z̃ are in BV [0, τ] and are left-continuous with bounded left- and right-derivatives in [0, τ].
(E5) If γT Z̃ is equal to a constant with probability one, then γ = 0. In addition, if βT Z(t) = c(t) for a deterministic function c(t) with probability one, then β = 0.
In this case,
By (D5),
for some constant μ1. Thus, (C3) follows from the boundedness of γT Z̃i. We can verify the other conditions by using the arguments of Section 10.1.
To verify the first identifiability condition, we assume that has jumps at x, x1, …, xm for some integer m. After integrating both sides of the equation in (C5) over x1, …, xm from 0 to τ and integrating x from x to τ, we obtain
Multiplying both sides of this equation by 1/m! and summing over m ≥ 0, we obtain
Setting in the likelihood function yields
Thus
Then Λ*(t) is absolutely continuous with respect to t. Differentiating both sides with respect to x and letting x = 0 yields λ*(0) > 0. As x converges to zero, the left-hand side is while the right-hand side is . Thus, . By (E5), γ0 = γ*. Furthermore, . It follows from (E5) that β0 = β* and Λ0 = Λ*.
To verify (C7), we assume that the score function along the path (β0 + εhβ, γ0 + εhγ, (1 + εh)dΛ0) is zero. Equivalently, if we let , then we obtain
We multiply both sides by the likelihood function and let have jumps at times t1, t2, …, tm. We integrate t1 from 0 to t and tl, 1 < l ≤ m from 0 to τ. By multiplying the resulting equation by 1/(m − k)! and summing over m = 1, 2, …, we obtain
Differentiation with respect to t then yields
Combining the above two equations, we have
This is a homogeneous integral equation for and has only the zero solution. That is, . It follows from (E5) that h(t) = 0 and hβ = 0. Thus, hγ = 0.
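The claim that the homogeneous equation admits only the zero solution is the standard Volterra-type iteration argument; a sketch under generic assumptions (our notation, with a kernel K bounded by a constant M on [0, τ]):

```latex
g(t)=\int_0^t K(t,s)\,g(s)\,ds
\;\Longrightarrow\;
|g(t)|\le \frac{(Mt)^{n}}{n!}\,\sup_{s\le\tau}|g(s)|
\quad\text{for every }n\ge 1,
```

and letting n → ∞ forces g ≡ 0 on [0, τ]. The same reasoning underlies the analogous step in the verification of (C7) in Section 10.1.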
11. Concluding Remarks
We have developed a general asymptotic theory for the NPMLEs with right censored data and shown that this theory applies to the models considered by Zeng and Lin (2007). This theory can also be used to establish the desired asymptotic properties for other existing semiparametric models, particularly the models mentioned in Sections 7.1–7.4 of Zeng and Lin (2007), as well as those that may be invented in the future. It is much simpler to verify the set of sufficient conditions identified in this paper than to prove the asymptotic results from scratch. Conditions (C1) and (C2) are standard conditions required in all censored-data regression problems; (C3), (C4) and (C6) are smoothness conditions that can be verified directly, as demonstrated in Section 10; (C5) and (C7) are two minimal identifiability conditions that need to be verified for any specific problem.
Although the basic structures of our proofs mimic those of Murphy (1994; 1995) and Parner (1998), our technical arguments are innovative and substantially more difficult because we deal with a very general form of likelihood function rather than specific problems. In all previous work, verification of the Donsker property relies on the specific expressions of the functions, whereas our Lemma 1 provides a universal way to verify this property. In verifying the invertibility of the information operator, all previous work requires an explicit expression of the information operator that is identified as the sum of an invertible operator and a compact operator, whereas we allow a very generic form of information operator obtained from the likelihood function (1). Murphy and van der Vaart (2001) stated that the consistency of NPMLEs needs to be proved on a case-by-case basis; however, we were able to prove the consistency for a very general likelihood function. Although we borrowed the partitioning idea of Murphy (1994), our technical arguments are very different because of the generic form of the likelihood.
In some applications, the failure times are subject to left truncation in addition to right censoring. To accommodate general censoring/truncation patterns, we define N(t) as the number of events observed by time t and R(t) as the at-risk indicator at time t, reflecting both left truncation and right censoring. Assume that the truncation time has positive mass at time 0, so that (C2) is satisfied. Then all the results continue to hold.
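One natural formalization of these processes under left truncation (our notation; the paper does not display it) is, with truncation time L, censoring time C, and event times Tj,

```latex
N(t)=\sum_{j} I\{T_j\le t,\; L\le T_j\le C\},
\qquad
R(t)=I\{L\le t\le C\},
```

so that R(t) = 0 before entry into the risk set and after censoring, and the assumption P(L = 0) > 0 plays the role of (C2).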
This paper is concerned with the theoretical aspects of the NPMLEs and complements the work of Zeng and Lin (2007). Interested readers are referred to that paper for the calculation of the NPMLEs and for the use of semiparametric regression models and NPMLEs in practice; it also provides the rationale for the kinds of models considered in Sections 2 and 10 of this paper. Although Zeng and Lin (2007) contains some theoretical elements, the present paper states the theory (especially the regularity conditions) in a more rigorous manner and provides all the proofs.
Contributor Information
Donglin Zeng, Email: dzeng@bios.unc.edu.
D. Y. Lin, Email: lin@bios.unc.edu.
References
- Heitjan DF, Rubin DB. Ignorability and coarse data. Ann Statist. 1991;19:2244–2253.
- Murphy SA. Consistency in a proportional hazards model incorporating a random effect. Ann Statist. 1994;22:712–731.
- Murphy SA. Asymptotic theory for the frailty model. Ann Statist. 1995;23:182–198.
- Murphy SA, van der Vaart AW. On profile likelihood. J Am Statist Assoc. 2000;95:449–485.
- Murphy SA, van der Vaart AW. Semiparametric mixtures in case-control studies. J Multivariate Anal. 2001;79:1–32.
- Parner E. Asymptotic theory for the correlated gamma-frailty model. Ann Statist. 1998;26:183–214.
- Rudin W. Functional Analysis. McGraw-Hill; New York: 1973.
- Van der Vaart AW. Asymptotic Statistics. Cambridge University Press; Cambridge: 1998.
- Van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. Springer; New York: 1996.
- Zeng D, Lin DY. Maximum likelihood estimation in semiparametric regression models with censored data (with discussion). J R Statist Soc, Ser B. 2007;69:507–564.