Author manuscript; available in PMC: 2010 Jul 1.
Published in final edited form as: Stat Sin. 2010 Apr;20(2):871–910.

A GENERAL ASYMPTOTIC THEORY FOR MAXIMUM LIKELIHOOD ESTIMATION IN SEMIPARAMETRIC REGRESSION MODELS WITH CENSORED DATA

Donglin Zeng 1, D Y Lin 1
PMCID: PMC2888521  NIHMSID: NIHMS197848  PMID: 20577580

Abstract

We establish a general asymptotic theory for nonparametric maximum likelihood estimation in semiparametric regression models with right censored data. We identify a set of regularity conditions under which the nonparametric maximum likelihood estimators are consistent, asymptotically normal, and asymptotically efficient with a covariance matrix that can be consistently estimated by the inverse information matrix or the profile likelihood method. The general theory allows one to obtain the desired asymptotic properties of the nonparametric maximum likelihood estimators for any specific problem by verifying a set of conditions rather than by proving technical results from first principles. We demonstrate the usefulness of this powerful theory through a variety of examples.

Key words and phrases: Counting process, empirical process, multivariate failure times, nonparametric likelihood, profile likelihood, survival data

1. Introduction

Semiparametric regression models are highly useful in investigating the effects of covariates on potentially censored responses (e.g. failure times and repeated measures) in longitudinal studies. It is desirable to analyze such models by the nonparametric maximum likelihood approach, which generally yields consistent, asymptotically normal, and asymptotically efficient estimators. It is, however, technically difficult to prove the asymptotic properties of the nonparametric maximum likelihood estimators (NPMLEs). Thus far, rigorous proofs exist only in some special cases.

In this paper, we develop a general asymptotic theory for the NPMLEs with right censored data. The theory is very encompassing in that it pertains to a generic form of likelihood rather than specific models. We prove that, under a set of mild regularity conditions, the NPMLEs are consistent, asymptotically normal, and asymptotically efficient with a limiting covariance matrix that can be consistently estimated by the inverse information matrix or the profile likelihood method.

This paper is the technical companion to Zeng and Lin (2007), in which several classes of models were proposed to unify and extend existing semiparametric regression models. The likelihoods for those models can all be written in the general form considered in this paper. For each class of models in Zeng and Lin (2007), we identify a set of conditions under which the regularity conditions for the general theory hold so that desired asymptotic properties are ensured.

2. Some Semiparametric Models

We describe briefly the three kinds of models considered in Zeng and Lin (2007). We assume that the censoring mechanism satisfies coarsening at random (Heitjan and Rubin (1991)).

2.1. Transformation Models for Counting Processes

Let N*(t) record the number of events that the subject has experienced by time t, and let Z(·) denote the corresponding covariate processes. Zeng and Lin (2007) proposed the following class of transformation models for the cumulative intensity function of N*(t)

$$\Lambda(t\mid Z)=G\left[\left\{1+\int_0^t R^*(s)e^{\beta^T Z(s)}\,d\Lambda(s)\right\}e^{\gamma^T\tilde Z}\right]-G(1),$$

where G is a continuously differentiable and strictly increasing function with G′(1) > 0 and G(∞) = ∞, R*(·) is an indicator process, Z̃ is a subset of Z, β and γ are regression parameters, and Λ(·) is an unspecified increasing function. The data consist of {Ni(t), Ri(t), Zi(t); t ∈ [0, τ]} (i = 1, …, n), where $R_i(t)=I(C_i\ge t)R_i^*(t)$, $N_i(t)=N_i^*(t\wedge C_i)$, Ci is the censoring time, and τ is a finite constant. The likelihood is

$$\prod_{i=1}^{n}\prod_{t\le\tau}\left\{R_i(t)\,d\Lambda(t\mid Z_i)\right\}^{dN_i(t)}\exp\left\{-\int_0^\tau R_i(t)\,d\Lambda(t\mid Z_i)\right\},$$

where dNi(t) = Ni(t) − Ni(t−).
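For orientation, the following reduction (our gloss, not part of the original text) shows how this class contains the proportional hazards model as a special case:

```latex
% Illustrative special case: with G(x) = x, R^*(\cdot) \equiv 1, and \gamma = 0,
% the transformation model reduces to the Cox proportional hazards model.
\Lambda(t \mid Z)
  = \Bigl\{1 + \int_0^t e^{\beta^T Z(s)}\,d\Lambda(s)\Bigr\} - 1
  = \int_0^t e^{\beta^T Z(s)}\,d\Lambda(s).
```

Other choices of G (e.g. logarithmic transformations) yield proportional odds-type models within the same framework.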

2.2. Transformation Models With Random Effects for Dependent Failure Times

For i = 1, …, n, k = 1, …, K and l = 1, …, nik, let Nikl(·) denote the number of the kth type of event experienced by the lth individual in the ith cluster, and Zikl(·) the corresponding covariate processes. Zeng and Lin (2007) assumed that the cumulative intensity for Nikl(t) takes the form

$$\Lambda_k(t\mid Z_{ikl};b_i)=G_k\left\{\int_0^t R^*_{ikl}(s)\,e^{\beta^T Z_{ikl}(s)+b_i^T\tilde Z_{ikl}(s)}\,d\Lambda_k(s)\right\},$$

where Gk, Λk, and R*ikl are analogous to G, Λ, and R* of Section 2.1, Z̃ikl is a subset of Zikl plus the unit component, and bi is a vector of random effects with density f(b; γ). Let Cikl, Nikl, and Rikl be defined analogously to Ci, Ni, and Ri of Section 2.1. The likelihood is

$$\prod_{i=1}^{n}\int_b\prod_{k=1}^{K}\prod_{l=1}^{n_{ik}}\prod_{t\le\tau}\left[R_{ikl}(t)\,e^{\beta^T Z_{ikl}(t)+b^T\tilde Z_{ikl}(t)}\,d\Lambda_k(t)\,G_k'\left\{\int_0^t R_{ikl}(s)e^{\beta^T Z_{ikl}(s)+b^T\tilde Z_{ikl}(s)}\,d\Lambda_k(s)\right\}\right]^{dN_{ikl}(t)}\times\exp\left[-G_k\left\{\int_0^\tau R_{ikl}(t)e^{\beta^T Z_{ikl}(t)+b^T\tilde Z_{ikl}(t)}\,d\Lambda_k(t)\right\}\right]f(b;\gamma)\,db.$$

2.3. Joint Models for Repeated Measures and Failure Times

For i = 1, …, n and j = 1, …, ni, let Yij be the response variable at time tij for the ith subject, and Xij the corresponding covariates. We assume that (Yi1, …, Yini) follows a generalized linear mixed model with density fy(y|Xij; bi), where bi is a set of random effects with density f (b; γ). We define Ni and Zi as in Section 2.1, and assume that

$$\Lambda(t\mid Z_i;b_i)=G\left\{\int_0^t R_i(s)\,e^{\beta^T Z_i(s)+(\psi\circ b_i)^T\tilde Z_i(s)}\,d\Lambda(s)\right\},$$

where Z̃i is a subset of Zi plus the unit component, ψ is a vector of unknown constants, and v1 ∘ v2 is the component-wise product of two vectors v1 and v2. The likelihood is

$$\prod_{i=1}^{n}\int_b\prod_{t\le\tau}\left\{R_i(t)\,d\Lambda(t\mid Z_i;b)\right\}^{dN_i(t)}\exp\left\{-\int_0^\tau R_i(t)\,d\Lambda(t\mid Z_i;b)\right\}\prod_{j=1}^{n_i}f_y(Y_{ij}\mid X_{ij};b)\,f(b;\gamma)\,db.$$

For continuous measures, Zeng and Lin (2007) proposed the semiparametric linear mixed model

$$H(Y_{ij})=\alpha^T X_{ij}+b_i^T\tilde X_{ij}+\epsilon_{ij},$$

where H is an unknown increasing function with H(−∞) = −∞, H(∞) = ∞, and H(0) = 0, α is a set of regression parameters, X̃ij is typically a subset of Xij, and εij (i = 1, …, n; j = 1, …, ni) are independent with density fε. Write Λ̃(y) = e^{H(y)}. The likelihood is

$$\prod_{i=1}^{n}\int_b\prod_{t\le\tau}\left\{R_i(t)\,d\Lambda(t\mid Z_i;b)\right\}^{dN_i(t)}\exp\left\{-\int_0^\tau R_i(t)\,d\Lambda(t\mid Z_i;b)\right\}\times\prod_{j=1}^{n_i}f_\epsilon\left(\log\tilde\Lambda(Y_{ij})-\alpha^T X_{ij}-b_i^T\tilde X_{ij}\right)\left\{d\log\tilde\Lambda(Y_{ij})/dy\right\}f(b;\gamma)\,db.$$

3. Nonparametric Maximum Likelihood Estimation

All the likelihood functions given in Section 2 can be expressed as

$$\prod_{i=1}^{n}\prod_{k=1}^{K}\prod_{l=1}^{n_{ik}}\prod_{t\le\tau}\lambda_k(t)^{R_{ikl}(t)\,dN_{ikl}(t)}\,\Psi(O_i;\theta,\mathcal{A}),$$

where λk(t) = Λ′k(t), θ is a d-vector of regression parameters and variance components, 𝒜 = (Λ1, …, ΛK), Oi pertains to the observation on the ith cluster, and Ψ is a functional of Oi, θ, and 𝒜. For nonparametric maximum likelihood estimation, we allow Λk to be discontinuous with jumps at the observed failure times and maximize the modified likelihood function

$$\prod_{i=1}^{n}\prod_{k=1}^{K}\prod_{l=1}^{n_{ik}}\prod_{t\le\tau}\Lambda_k\{t\}^{R_{ikl}(t)\,dN_{ikl}(t)}\,\Psi(O_i;\theta,\mathcal{A}),$$

where Λk{t} denotes the jump size of the monotone function Λk at t. Equivalently, we maximize the logarithm of the above function

$$L_n(\theta,\mathcal{A})=\sum_{i=1}^{n}\left[\sum_{k=1}^{K}\sum_{l=1}^{n_{ik}}\int_0^\tau R_{ikl}(t)\log\Lambda_k\{t\}\,dN_{ikl}(t)+\log\Psi(O_i;\theta,\mathcal{A})\right].\tag{1}$$

We wish to establish an asymptotic theory for the resulting NPMLEs θ̂ and 𝒜̂ = (Λ̂1, …, Λ̂K).
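To make the maximization of (1) concrete, here is a small self-contained sketch (ours, not from the paper) for the simplest special case: a single event type (K = 1), one subject per cluster, at most one event per subject, and the Cox model, where Ψ(O_i; β, Λ) = exp{−∫ R_i(t)e^{βZ_i} dΛ(t)}. Setting the derivative of (1) with respect to each jump size to zero gives Breslow-type jumps, and plugging them back in yields a profile likelihood in β. All data values and function names below are illustrative.

```python
import math

def npmle_jumps(times, events, z, beta):
    """Jump sizes of the NPMLE of Lambda at each observed event time, for fixed beta.

    Solving d/d(Lambda{t}) of the modified log-likelihood (1) = 0 gives
    Lambda{t} = (#events at t) / sum_{i at risk at t} exp(beta * z_i).
    """
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    jumps = {}
    for s in event_times:
        risk = sum(math.exp(beta * zi) for ti, zi in zip(times, z) if ti >= s)
        d_s = sum(1 for ti, di in zip(times, events) if di == 1 and ti == s)
        jumps[s] = d_s / risk
    return jumps

def profile_loglik(times, events, z, beta):
    """Log of (1) with the jump sizes of Lambda profiled out at their NPMLE."""
    jumps = npmle_jumps(times, events, z, beta)
    ll = 0.0
    for ti, di, zi in zip(times, events, z):
        if di == 1:
            ll += math.log(jumps[ti]) + beta * zi
        # minus the cumulative hazard exp(beta*z_i) * Lambda(t_i)
        ll -= math.exp(beta * zi) * sum(v for s, v in jumps.items() if s <= ti)
    return ll

# Toy right-censored sample (events == 1 means an observed failure).
times  = [2.0, 3.0, 5.0, 7.0, 8.0, 11.0]
events = [1, 1, 0, 1, 1, 0]
z      = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
# Maximize the profile log-likelihood over a coarse grid of beta values.
beta_hat = max((b / 100 for b in range(-300, 301)),
               key=lambda b: profile_loglik(times, events, z, b))
```

Profiling out the jumps here reproduces Cox's partial likelihood, which is why the NPMLE of β coincides with the familiar partial-likelihood estimator in this special case.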

4. Regularity Conditions

We impose the following conditions on the model and data structures.

(C1) The true value θ0 lies in the interior of a compact set Θ, and the true functions Λ0k are continuously differentiable in [0, τ] with $\Lambda_{0k}'(t)>0$, k = 1, …, K.

(C2) With probability one, P(infs∈[0,t] Rik·(s) ≥ 1|Zikl, l = 1, …, nik) > δ0 > 0 for all t ∈ [0, τ], where $R_{ik\cdot}(t)=\sum_{l=1}^{n_{ik}}R_{ikl}(t)$.

(C3) There exist a constant c1 > 0 and a random variable r1(Oi) > 0 such that E[log r1(Oi)] < ∞ and, for any θ ∈ Θ and any finite Λ1, …, ΛK,

$$\Psi(O_i;\theta,\mathcal{A})\le r_1(O_i)\prod_{k=1}^{K}\prod_{t\le\tau}\left\{1+\int_0^t R_{ik\cdot}(s)\,d\Lambda_k(s)\right\}^{dN_{ik\cdot}(t)}\left\{1+\int_0^\tau R_{ik\cdot}(t)\,d\Lambda_k(t)\right\}^{-c_1}$$

almost surely, where $N_{ik\cdot}(t)=\sum_{l=1}^{n_{ik}}N_{ikl}(t)$. In addition, for any constant c2,

$$\inf\left\{\Psi(O_i;\theta,\mathcal{A}):\ \|\Lambda_1\|_{V[0,\tau]}\le c_2,\ldots,\|\Lambda_K\|_{V[0,\tau]}\le c_2,\ \theta\in\Theta\right\}\ge r_2(O_i)>0,$$

where ||h||V[0,τ] is the total variation of h(·) in [0, τ], and r2(Oi), which may depend on c2, is a finite random variable with E[|log r2(Oi)|] < ∞.

We require certain smoothness of Ψ. Let Ψ̇θ denote the derivative of Ψ(Oi; θ, 𝒜) with respect to θ, and let Ψ̇k[Hk] denote the derivative of Ψ(Oi; θ, 𝒜) along the path (Λk + εHk), where Hk belongs to the set of functions for which Λk + εHk is increasing with bounded total variation.

(C4) For any (θ(1), θ(2)) ∈ Θ, and $(\Lambda_1^{(1)},\Lambda_1^{(2)}),\ldots,(\Lambda_K^{(1)},\Lambda_K^{(2)})$, $(H_1^{(1)},H_1^{(2)}),\ldots,(H_K^{(1)},H_K^{(2)})$ with uniformly bounded total variations, there exist a random variable ℱ(Oi) ∈ L4(P) and K stochastic processes μik(t; Oi) ∈ L6(P), k = 1, …, K, such that

$$
\begin{aligned}
&\left|\Psi(O_i;\theta^{(1)},\mathcal{A}^{(1)})-\Psi(O_i;\theta^{(2)},\mathcal{A}^{(2)})\right|
+\left|\dot\Psi_\theta(O_i;\theta^{(1)},\mathcal{A}^{(1)})-\dot\Psi_\theta(O_i;\theta^{(2)},\mathcal{A}^{(2)})\right|\\
&\quad+\sum_{k=1}^{K}\left|\dot\Psi_k(O_i;\theta^{(1)},\mathcal{A}^{(1)})[H_k^{(1)}]-\dot\Psi_k(O_i;\theta^{(2)},\mathcal{A}^{(2)})[H_k^{(2)}]\right|
+\sum_{k=1}^{K}\left|\frac{\dot\Psi_k(O_i;\theta^{(1)},\mathcal{A}^{(1)})[H_k^{(1)}]}{\Psi(O_i;\theta^{(1)},\mathcal{A}^{(1)})}-\frac{\dot\Psi_k(O_i;\theta^{(2)},\mathcal{A}^{(2)})[H_k^{(2)}]}{\Psi(O_i;\theta^{(2)},\mathcal{A}^{(2)})}\right|\\
&\le\mathcal{F}(O_i)\left[\|\theta^{(1)}-\theta^{(2)}\|+\sum_{k=1}^{K}\left\{\int_0^\tau\left|\Lambda_k^{(1)}(s)-\Lambda_k^{(2)}(s)\right|d\mu_{ik}(s;O_i)+\int_0^\tau\left|H_k^{(1)}(s)-H_k^{(2)}(s)\right|d\mu_{ik}(s;O_i)\right\}\right].
\end{aligned}
$$

In addition, μik(s; Oi) is non-decreasing, and E[ℱ(Oi)μik(s; Oi)] is left-continuous with uniformly bounded left- and right-derivatives for any s ∈ [0, τ]. Here, the right-derivative of a function f at x is defined as limh→0+(f(x + h) − f(x+))/h.

The following condition ensures identifiability of parameters.

(C5) (First Identifiability Condition) If

$$\left[\prod_{k=1}^{K}\prod_{l=1}^{n_{ik}}\prod_{t\le\tau}\lambda_k^*(t)^{R_{ikl}(t)\,dN_{ikl}(t)}\right]\Psi(O_i;\theta^*,\mathcal{A}^*)=\left[\prod_{k=1}^{K}\prod_{l=1}^{n_{ik}}\prod_{t\le\tau}\lambda_{0k}(t)^{R_{ikl}(t)\,dN_{ikl}(t)}\right]\Psi(O_i;\theta_0,\mathcal{A}_0)$$

almost surely, then θ* = θ0 and Λ*k(t) = Λ0k(t) for t ∈ [0, τ], k = 1, …, K.

The next assumption is more technical and will be used in proving the weak convergence of the NPMLEs. For any fixed (θ, 𝒜) in a small neighborhood of (θ0, 𝒜0) in Rd × {BV[0, τ]}K, where BV[0, τ] denotes the space of functions with bounded total variations in [0, τ], (C4) implies that the linear functional

$$H_k\mapsto E\left[\frac{\dot\Psi_k(O_i;\theta,\mathcal{A})[H_k]}{\Psi(O_i;\theta,\mathcal{A})}\right]$$

is continuous from BV[0, τ] to R. Thus, there exists a bounded function η0k(s; θ, 𝒜) such that

$$E\left[\frac{\dot\Psi_k(O_i;\theta,\mathcal{A})[H_k]}{\Psi(O_i;\theta,\mathcal{A})}\right]=\int_0^\tau\eta_{0k}(s;\theta,\mathcal{A})\,dH_k(s).$$

(C6) There exist functions ζ0k(s; θ0, 𝒜0) ∈ BV[0, τ], k = 1, …, K, and a matrix ζ0θ(θ0, 𝒜0) such that

$$\left|E\left[\frac{\dot\Psi_\theta(O_i;\theta,\mathcal{A})}{\Psi(O_i;\theta,\mathcal{A})}-\frac{\dot\Psi_\theta(O_i;\theta_0,\mathcal{A}_0)}{\Psi(O_i;\theta_0,\mathcal{A}_0)}\right]-\zeta_{0\theta}(\theta_0,\mathcal{A}_0)(\theta-\theta_0)-\sum_{k=1}^{K}\int_0^\tau\zeta_{0k}(s;\theta_0,\mathcal{A}_0)\,d(\Lambda_k-\Lambda_{0k})\right|=o\left(\|\theta-\theta_0\|+\sum_{k=1}^{K}\|\Lambda_k-\Lambda_{0k}\|_{V[0,\tau]}\right).$$

In addition, for k = 1, …, K,

$$\sum_{k=1}^{K}\sup_{s\in[0,\tau]}\left|\left\{\eta_{0k}(s;\theta,\mathcal{A})-\eta_{0k}(s;\theta_0,\mathcal{A}_0)\right\}-\eta_{0k\theta}(s;\theta_0,\mathcal{A}_0)(\theta-\theta_0)-\sum_{m=1}^{K}\int_0^\tau\eta_{0km}(s,t;\theta_0,\mathcal{A}_0)\,d(\Lambda_m-\Lambda_{0m})(t)\right|=o\left(\|\theta-\theta_0\|+\sum_{k=1}^{K}\|\Lambda_k-\Lambda_{0k}\|_{V[0,\tau]}\right),$$

where η0km is a bounded bivariate function and η0kθ is a d-dimensional bounded function. Furthermore, there exists a constant c3 such that |η0km(s, t1; θ0, 𝒜0) − η0km(s, t2; θ0, 𝒜0)| ≤ c3|t1 − t2| for any s ∈ [0, τ] and any t1, t2 ∈ [0, τ].

The final assumption ensures that the Fisher information matrix along any finite-dimensional sub-model is non-singular.

(C7) (Second Identifiability Condition) If with probability one,

$$\sum_{k=1}^{K}\sum_{l=1}^{n_{ik}}\int_0^\tau h_k(t)R_{ikl}(t)\,dN_{ikl}(t)+\frac{\dot\Psi_\theta(O_i;\theta_0,\mathcal{A}_0)^T v+\sum_{k=1}^{K}\dot\Psi_k(O_i;\theta_0,\mathcal{A}_0)\left[\int h_k\,d\Lambda_{0k}\right]}{\Psi(O_i;\theta_0,\mathcal{A}_0)}=0$$

for some constant vector vRd and hkBV[0, τ], k = 1, …, K, then v = 0 and hk = 0 for k = 1, …, K.

Remark 1

(C1)–(C2) are standard assumptions in any analysis of censored data. (C3) pertains to the model structure, and (C4) and (C6) essentially impose the smoothness of this structure. Although they appear technical, these conditions are easy to verify in practice. (C5) and (C7) usually require some work to verify, but can be translated to simple conditions in specific cases.
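As an illustration of how an identifiability condition such as (C5) translates into a primitive requirement (our gloss; the model-specific verifications are carried out in Zeng and Lin (2007)), consider the proportional hazards special case of Section 2.1:

```latex
% If the two sides of (C5) agree almost surely, then for all t in [0, tau]
%   \int_0^t e^{\beta^{*T} Z(s)}\,d\Lambda^*(s) = \int_0^t e^{\beta_0^T Z(s)}\,d\Lambda_0(s)
%   \quad \text{a.s.}
% Differentiating in t and taking the ratio at two covariate paths z_1, z_2 gives
%   (\beta^* - \beta_0)^T \{z_1(t) - z_2(t)\} = 0,
% so \beta^* = \beta_0 whenever the covariance matrix of Z(t) is positive definite,
% and then \Lambda^* = \Lambda_0 follows.
```

Conditions of this "non-degenerate covariate distribution" type are the usual primitive form to which (C5) and (C7) reduce in specific models.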

5. Some Useful Lemmas

Lemma 1

For any constant c, the following classes of functions are P-Donsker:

$$\begin{aligned}
\mathcal{F}_1&=\left\{\log\Psi(O_i;\theta,\mathcal{A}):\ \|\Lambda_k\|_{V[0,\tau]}\le c,\ k=1,\ldots,K,\ \theta\in\Theta\right\},\\
\mathcal{F}_2&=\left\{\frac{\dot\Psi_\theta(O_i;\theta,\mathcal{A})}{\Psi(O_i;\theta,\mathcal{A})}:\ \|\Lambda_k\|_{V[0,\tau]}\le c,\ k=1,\ldots,K,\ \theta\in\Theta\right\},\\
\mathcal{F}_{3k}&=\left\{\frac{\dot\Psi_k(O_i;\theta,\mathcal{A})[H]}{\Psi(O_i;\theta,\mathcal{A})}:\ \|\Lambda_m\|_{V[0,\tau]}\le c,\ m=1,\ldots,K,\ \theta\in\Theta,\ \|H\|_{V[0,\tau]}\le c\right\},\quad k=1,\ldots,K.
\end{aligned}$$

Proof

We only prove that ℱ3k is P-Donsker; the proofs for the other two classes are similar. For k = 1, …, K, we define a measure μ̃k on [0, τ] such that, for any Borel set A ⊂ [0, τ],

$$\tilde\mu_k(A)=\int_0^\tau I(t\in A)\,E\left[\mathcal{F}(O_i)^2\left\{\mu_{ik}(\tau;O_i)-\mu_{ik}(0;O_i)\right\}^2 d\mu_{ik}(t;O_i)\right].$$

Condition (C4) implies that μ̃k([0, τ]) ≤ ||ℱ(Oi)||²L4(P) ||μik(τ; Oi) − μik(0; Oi)||³L6(P) by Hölder's inequality. Thus, μ̃k is a finite measure. According to Theorem 2.7.5 of van der Vaart and Wellner (1996), the bracketing number for any bounded set in BV[0, τ] is of order exp{O(1/ε)} in L2(μ̃k), k = 1, …, K. Thus, we can construct Nε ≡ (1/ε)d × exp{O(K/ε)} × exp{O(1/ε)} brackets for the set of (θ, 𝒜, H) in ℱ3k, denoted by

$$[\theta_p^L,\theta_p^U]\times[\Lambda_{1p}^L,\Lambda_{1p}^U]\times\cdots\times[\Lambda_{Kp}^L,\Lambda_{Kp}^U]\times[H_p^L,H_p^U],\quad p=1,\ldots,N_\epsilon,$$

such that $\|\theta_p^U-\theta_p^L\|<\epsilon$ and

$$\int\left|\Lambda_{kp}^U-\Lambda_{kp}^L\right|^2 d\tilde\mu_k<\epsilon^2,\qquad\int\left|H_p^U-H_p^L\right|^2 d\tilde\mu_k<\epsilon^2,\quad k=1,\ldots,K.$$

Any (θ, 𝒜, H) must belong to one of these brackets. Obviously, the bracket functions

$$\frac{\dot\Psi_k(O_i;\theta_p^L,\mathcal{A}_p^L)[H_p^L]}{\Psi(O_i;\theta_p^L,\mathcal{A}_p^L)}\pm\mathcal{F}(O_i)\left\{\|\theta_p^U-\theta_p^L\|+\sum_{m=1}^{K}\int\left|\Lambda_{mp}^U(s)-\Lambda_{mp}^L(s)\right|d\mu_{im}(s;O_i)+\sum_{m=1}^{K}\int\left|H_p^U(s)-H_p^L(s)\right|d\mu_{im}(s;O_i)\right\},\quad p=1,\ldots,N_\epsilon,$$

cover all the functions in ℱ3k. Since

$$\begin{aligned}
&\left\|\mathcal{F}(O_i)\left\{\|\theta_p^U-\theta_p^L\|+\sum_{m=1}^{K}\int\left|\Lambda_{mp}^U(s)-\Lambda_{mp}^L(s)\right|d\mu_{im}(s;O_i)+\sum_{m=1}^{K}\int\left|H_p^U(s)-H_p^L(s)\right|d\mu_{im}(s;O_i)\right\}\right\|_{L_2(P)}\\
&\le c\left[\|\theta_p^U-\theta_p^L\|+\sum_{m=1}^{K}\left\{E\left(\int\left|\Lambda_{mp}^U(s)-\Lambda_{mp}^L(s)\right|d\mu_{im}\,\mathcal{F}(O_i)\right)^2\right\}^{1/2}+\sum_{m=1}^{K}\left\{\int_0^\tau\left|H_p^U(s)-H_p^L(s)\right|^2 d\tilde\mu_m\right\}^{1/2}\right]\\
&\le c\left[\|\theta_p^U-\theta_p^L\|+\sum_{m=1}^{K}\left\{\int\left|\Lambda_{mp}^U(s)-\Lambda_{mp}^L(s)\right|^2 d\tilde\mu_m\right\}^{1/2}+\sum_{m=1}^{K}\left\{\int_0^\tau\left|H_p^U(s)-H_p^L(s)\right|^2 d\tilde\mu_m\right\}^{1/2}\right],
\end{aligned}$$

where c is a constant depending on K, the L2(P)-distance within each bracket pair is O(ε). Hence, the bracket entropy integral of ℱ3k is finite, so that ℱ3k is P-Donsker.

Lemma 2

For any bounded random element (θ, Λ) of Θ × BV[0, τ], the function g(s) ≡ |E[Ψ̇k(Oi; θ, 𝒜)[I(· ≥ s)]/Ψ(Oi; θ, 𝒜)]| is left-continuous and satisfies the following: for any s ∈ [0, τ], there exist δs, cs > 0 such that |g(s̃) − g(s)| ≤ cs|s̃ − s| for s̃ ∈ (s − δs, s) and |g(s̃) − g(s+)| ≤ cs|s̃ − s| for s̃ ∈ (s, s + δs).

Proof

Since μik(t; Inline graphic) is non-decreasing in t, it follows from (C4) that for any s1 and s2,

$$\left|g(s_1)-g(s_2)\right|\le E\left[\mathcal{F}(O_i)\int\left|I(t\ge s_1)-I(t\ge s_2)\right|d\mu_{ik}(t;O_i)\right]=\left|E[\mathcal{F}(O_i)\mu_{ik}(s_1;O_i)]-E[\mathcal{F}(O_i)\mu_{ik}(s_2;O_i)]\right|.$$

Thus, g(s) is in BV[0, τ] and is left-continuous. In addition, the left- and right-differentiability of E[ℱ(Oi)μik(s; Oi)] in (C4) implies that the second part of the lemma holds.

Lemma 3

For any h(s) ∈ BV[0, τ], the linear map $h\mapsto\int_0^\tau h(t)\,\eta_{0km}(t,s;\theta_0,\mathcal{A}_0)\,d\Lambda_{0k}(t)$ is a compact operator from BV[0, τ] to BV[0, τ].

Proof

It is clear from (C6) that this map sends any bounded set in BV[0, τ] into a bounded set of Lipschitz-continuous functions. The result follows because a family of functions that is uniformly bounded and uniformly Lipschitz-continuous is totally bounded in BV[0, τ], and the linear map is continuous.

6. Consistency

The following theorem states the consistency of θ̂ and Λ̂k, k = 1, …, K.

Theorem 1

Under (C1)–(C5), $\|\hat\theta-\theta_0\|+\sum_{k=1}^{K}\sup_{t\in[0,\tau]}|\hat\Lambda_k(t)-\Lambda_{0k}(t)|\to 0$ almost surely.

Proof

We fix a random sample in the probability space and assume that (C1)–(C5) hold for this sample. The set of such samples has probability one. We prove the result for this fixed sample. The entire proof consists of three steps.

Step 1

We show that the NPMLEs exist or, equivalently, Λ̂k(τ) < ∞ (k = 1, …, K) for large n. By (C3), the likelihood function is bounded by

$$\prod_{i=1}^{n}r_1(O_i)\prod_{k=1}^{K}\prod_{t\le\tau}\left[\Lambda_k\{t\}R_{ik\cdot}(t)\left\{1+\int_0^t R_{ik\cdot}(s)\,d\Lambda_k(s)\right\}^{-1}\right]^{dN_{ik\cdot}(t)}\left\{1+\int_0^\tau R_{ik\cdot}(s)\,d\Lambda_k(s)\right\}^{-c_1}\le\prod_{i=1}^{n}r_1(O_i)\prod_{k=1}^{K}\left\{1+\int_0^\tau R_{ik\cdot}(s)\,d\Lambda_k(s)\right\}^{-c_1}.$$

If Λk(τ) = ∞ for some k, then (C2) implies that, with probability one, inft∈[0,τ] Rik·(t) ≥ 1 for some i, so that the above function is equal to zero. Thus, the maximum of the likelihood function can only be attained for Λ̂k(τ) < ∞.

Step 2

We show that lim supn Λ̂k(τ) < ∞ almost surely, i.e., Λ̂k(τ) is bounded uniformly for all large n. By differentiating the objective function (1) with respect to Λk{Yikl} for which dNikl(Yikl)=1 and Rikl(Yikl) = 1, we note that Λ̂k{Yikl} satisfies

$$\frac{1}{\hat\Lambda_k\{Y_{ikl}\}}=-\sum_{j=1}^{n}\frac{\dot\Psi_k(O_j;\hat\theta,\hat{\mathcal{A}})[I(\cdot\ge Y_{ikl})]}{\Psi(O_j;\hat\theta,\hat{\mathcal{A}})}.$$

In other words,

$$\hat\Lambda_k(t)=\sum_{i=1}^{n}\sum_{m=1}^{n_{ik}}\int_0^t\left\{-\sum_{j=1}^{n}\frac{\dot\Psi_k(O_j;\hat\theta,\hat{\mathcal{A}})[I(\cdot\ge s)]}{\Psi(O_j;\hat\theta,\hat{\mathcal{A}})}\right\}^{-1}R_{ikm}(s)\,dN_{ikm}(s).$$

To prove the boundedness of Λ̂k(τ), we construct another step function Λ̃k with jumps only at the Yikl for which dNikl(Yikl) = 1 and Rikl(Yikl) = 1,

$$\frac{1}{\tilde\Lambda_k\{Y_{ikl}\}}=-\sum_{j=1}^{n}\frac{\dot\Psi_k(O_j;\theta_0,\mathcal{A}_0)[I(\cdot\ge Y_{ikl})]}{\Psi(O_j;\theta_0,\mathcal{A}_0)},$$

that is,

$$\tilde\Lambda_k(t)=\sum_{i=1}^{n}\sum_{m=1}^{n_{ik}}\int_0^t\left\{-\sum_{j=1}^{n}\frac{\dot\Psi_k(O_j;\theta_0,\mathcal{A}_0)[I(\cdot\ge s)]}{\Psi(O_j;\theta_0,\mathcal{A}_0)}\right\}^{-1}R_{ikm}(s)\,dN_{ikm}(s).$$

We show that Λ̃k uniformly converges to Λ0k. By Lemma 1,

$$n^{-1}\sum_{j=1}^{n}\frac{\dot\Psi_k(O_j;\theta_0,\mathcal{A}_0)[I(\cdot\ge s)]}{\Psi(O_j;\theta_0,\mathcal{A}_0)}\to E\left[\frac{\dot\Psi_k(O_i;\theta_0,\mathcal{A}_0)[I(\cdot\ge s)]}{\Psi(O_i;\theta_0,\mathcal{A}_0)}\right]\tag{2}$$

uniformly in s ∈ [0, τ]. Since the score function along the path Λk = Λ0k + εI(· ≥ s) with the other parameters fixed at their true values has zero expectation,

$$0=E\left[\sum_{l=1}^{n_{ik}}\int\frac{\delta(t=s)}{\lambda_{0k}(t)}R_{ikl}(t)\,dN_{ikl}(t)\right]+E\left[\frac{\dot\Psi_k(O_i;\theta_0,\mathcal{A}_0)[I(\cdot\ge s)]}{\Psi(O_i;\theta_0,\mathcal{A}_0)}\right]=\frac{E\left[\sum_{l=1}^{n_{ik}}R_{ikl}(s)\,dN_{ikl}(s)/ds\right]}{\lambda_{0k}(s)}+E\left[\frac{\dot\Psi_k(O_i;\theta_0,\mathcal{A}_0)[I(\cdot\ge s)]}{\Psi(O_i;\theta_0,\mathcal{A}_0)}\right],\tag{3}$$

where δ(t = s) is the Dirac function. The submodel is not in the parameter space; however, we can always choose a sequence of submodels in the parameter space which approximates this submodel. Thus, the uniform limit of Λ̃k(t) is

$$E\left[\sum_{m=1}^{n_{ik}}\int_0^t\left\{\frac{E\left[\sum_{l=1}^{n_{ik}}R_{ikl}(s)\,dN_{ikl}(s)/ds\right]}{\lambda_{0k}(s)}\right\}^{-1}R_{ikm}(s)\,dN_{ikm}(s)\right]=\Lambda_{0k}(t).$$

That is, Λ̃k(t) uniformly converges to Λ0k(t).

We next show that the difference between the log-likelihood functions evaluated at (θ̂, 𝒜̂) and (θ0, 𝒜̃), where 𝒜̃ = (Λ̃1, …, Λ̃K), is eventually negative if some Λ̂k(τ) diverges, which will induce a contradiction. The key arguments are based on (C3). Clearly, n⁻¹Ln(θ̂, 𝒜̂) ≥ n⁻¹Ln(θ0, 𝒜̃). It follows from (2) and (3) that nΛ̃k{t} converges to λ0k(t)/E[∑l Rikl(t) dNikl(t)/dt], and is thus uniformly bounded away from zero, where t is an observed failure time. Therefore,

$$n^{-1}L_n(\theta_0,\tilde{\mathcal{A}})+n^{-1}\sum_{i=1}^{n}\sum_{k=1}^{K}\sum_{l=1}^{n_{ik}}\int R_{ikl}(t)\,dN_{ikl}(t)\log n=n^{-1}\sum_{i=1}^{n}\sum_{k=1}^{K}\sum_{l=1}^{n_{ik}}\int\log\left(n\tilde\Lambda_k\{t\}\right)R_{ikl}(t)\,dN_{ikl}(t)+n^{-1}\sum_{i=1}^{n}\log\Psi(O_i;\theta_0,\tilde{\mathcal{A}}),$$

which is bounded away from − ∞ when n is large. That is,

$$n^{-1}L_n(\theta_0,\tilde{\mathcal{A}})+n^{-1}\sum_{i=1}^{n}\sum_{k=1}^{K}\sum_{l=1}^{n_{ik}}\int R_{ikl}(t)\,dN_{ikl}(t)\log n=O(1),$$

where O(1) denotes a finite constant. On the other hand, (C3) implies that

$$\begin{aligned}
n^{-1}L_n(\hat\theta,\hat{\mathcal{A}})&=n^{-1}\sum_{i=1}^{n}\sum_{k=1}^{K}\sum_{l=1}^{n_{ik}}\int R_{ikl}(t)\log\hat\Lambda_k\{t\}\,dN_{ikl}(t)+n^{-1}\sum_{i=1}^{n}\log\Psi(O_i;\hat\theta,\hat{\mathcal{A}})\\
&\le n^{-1}\sum_{i=1}^{n}\log r_1(O_i)+n^{-1}\sum_{i=1}^{n}\sum_{k=1}^{K}\int I(R_{ik\cdot}(t)>0)\log\hat\Lambda_k\{t\}\,dN_{ik\cdot}(t)\\
&\quad-n^{-1}\sum_{i=1}^{n}\sum_{k=1}^{K}\int\log\left\{1+\int_0^t R_{ik\cdot}(s)\,d\hat\Lambda_k(s)\right\}dN_{ik\cdot}(t)-n^{-1}\sum_{i=1}^{n}\sum_{k=1}^{K}c_1\log\left\{1+\int_0^\tau R_{ik\cdot}(s)\,d\hat\Lambda_k(s)\right\},
\end{aligned}$$

where $dN_{ik\cdot}(t)=\sum_{l=1}^{n_{ik}}R_{ikl}(t)\,dN_{ikl}(t)$. Thus,

$$O(1)\le n^{-1}\sum_{i=1}^{n}\sum_{k=1}^{K}\int I(R_{ik\cdot}(t)>0)\log\left(n\hat\Lambda_k\{t\}\right)dN_{ik\cdot}(t)-n^{-1}\sum_{i=1}^{n}\sum_{k=1}^{K}\int\log\left\{1+\int_0^t R_{ik\cdot}(s)\,d\hat\Lambda_k(s)\right\}dN_{ik\cdot}(t)-n^{-1}\sum_{i=1}^{n}\sum_{k=1}^{K}c_1\log\left\{1+\int_0^\tau R_{ik\cdot}(s)\,d\hat\Lambda_k(s)\right\}.\tag{4}$$

We now show that the right-hand side diverges to − ∞ if Λ̂k(τ) diverges for some k. The proof is based on the partitioning idea of Murphy (1994). Specifically, we construct a sequence t0k = τ > t1k > t2k > … in the following manner. First, we define

$$t_{1k}=\arg\min\left\{t\in[0,t_{0k}):\ \frac{c_1}{2}E\left[I(\bar R_{ik\cdot}(\tau)>0)\right]\le E\left[I(\bar R_{ik\cdot}(t)>0,\bar R_{ik\cdot}(\tau)=0)\int_t^{t_{0k}}dN_{ik\cdot}(t)\right]\right\},$$

where R̄ik·(t) = infs∈[0,t] Rik·(s). Clearly, such a t1k exists, and the above inequality becomes an equality if t1k > 0. If t1k > 0, we choose a small constant ε0 such that

$$\frac{\epsilon_0}{1-\epsilon_0}<\frac{c_1 E\left[I(\bar R_{ik\cdot}(\tau)=0,\bar R_{ik\cdot}(t_{1k})>0)\right]}{E\left[I(\bar R_{ik\cdot}(t_{1k})=0,\bar R_{ik\cdot}(0)>0)\int_0^\tau dN_{ik\cdot}(t)\right]},$$

and define

$$t_{2k}=\arg\min\left\{t\in[0,t_{1k}):\ (1-\epsilon_0)E\left[\left\{c_1+\int_{t_{1k}}^{t_{0k}}dN_{ik\cdot}(t)\right\}I(\bar R_{ik\cdot}(t_{0k})=0,\bar R_{ik\cdot}(t_{1k})>0)\right]\le E\left[I(\bar R_{ik\cdot}(t_{1k})=0,\bar R_{ik\cdot}(t)>0)\int_t^{t_{1k}}dN_{ik\cdot}(t)\right]\right\}.$$

Such a t2k exists. If t2k > 0, the inequality is an equality, and we define

$$t_{3k}=\arg\min\left\{t\in[0,t_{2k}):\ (1-\epsilon_0)E\left[\left\{c_1+\int_{t_{2k}}^{t_{1k}}dN_{ik\cdot}(t)\right\}I(\bar R_{ik\cdot}(t_{1k})=0,\bar R_{ik\cdot}(t_{2k})>0)\right]\le E\left[I(\bar R_{ik\cdot}(t_{2k})=0,\bar R_{ik\cdot}(t)>0)\int_t^{t_{2k}}dN_{ik\cdot}(t)\right]\right\}.$$

We continue this process. The sequence eventually stops at some tNk,k = 0. If this is not true, then the sequence is infinite and strictly decreases to some t* ≥ 0. Since all the inequalities are equalities, we sum all the equations except the first one to obtain

$$(1-\epsilon_0)E\left[\left\{c_1+\int_{t^*}^{t_{0k}}dN_{ik\cdot}(t)\right\}I(\bar R_{ik\cdot}(t^*)>0,\bar R_{ik\cdot}(\tau)=0)\right]=E\left[I(\bar R_{ik\cdot}(t_{1k})=0,\bar R_{ik\cdot}(t^*)>0)\int_{t^*}^{t_{1k}}dN_{ik\cdot}(t)\right],$$

which implies that

$$c_1(1-\epsilon_0)E\left[I(\bar R_{ik\cdot}(\tau)=0,\bar R_{ik\cdot}(t_{1k})>0)\right]\le\epsilon_0 E\left[I(\bar R_{ik\cdot}(t_{1k})=0,\bar R_{ik\cdot}(0)>0)\int_0^\tau dN_{ik\cdot}(t)\right].$$

This contradicts the choice of ε0. Thus, the sequence stops at some tNk,k = 0.

If we write Iqk = [tq+1,k, tqk), then the right-hand side of (4) can be bounded by

$$\begin{aligned}
\sum_{k=1}^{K}\Bigg[&\,n^{-1}\sum_{i=1}^{n}\sum_{q=0}^{N_k-1}I(\bar R_{ik\cdot}(t_{qk})=0,\bar R_{ik\cdot}(t_{q+1,k})>0)\int_{t\in I_{qk}}\log\left(n\hat\Lambda_k\{t\}\right)dN_{ik\cdot}\\
&-n^{-1}\sum_{i=1}^{n}\sum_{q=0}^{N_k-1}I(\bar R_{ik\cdot}(t_{qk})=0,\bar R_{ik\cdot}(t_{q+1,k})>0)\int_{t\in I_{qk}}dN_{ik\cdot}\,\log\left\{1+\hat\Lambda_k(t_{q+1,k})\right\}\\
&-n^{-1}\sum_{i=1}^{n}\sum_{q=0}^{N_k-1}I(\bar R_{ik\cdot}(t_{qk})=0,\bar R_{ik\cdot}(t_{q+1,k})>0)\,c_1\log\left\{1+\hat\Lambda_k(t_{q+1,k})\right\}-n^{-1}\sum_{i=1}^{n}I(\bar R_{ik\cdot}(t_{0k})>0)\log\left\{1+\hat\Lambda_k(\tau)\right\}\Bigg].\tag{5}
\end{aligned}$$

Since log x is a concave function,

$$\begin{aligned}
&\sum_{i=1}^{n}I(\bar R_{ik\cdot}(t_{qk})=0,\bar R_{ik\cdot}(t_{q+1,k})>0)\int_{t\in I_{qk}}\log\left(n\hat\Lambda_k\{t\}\right)dN_{ik\cdot}(t)\\
&\le\left\{\sum_{i=1}^{n}I(\bar R_{ik\cdot}(t_{qk})=0,\bar R_{ik\cdot}(t_{q+1,k})>0)\int_{t\in I_{qk}}dN_{ik\cdot}\right\}\log\left[\frac{\sum_{i=1}^{n}I(\bar R_{ik\cdot}(t_{qk})=0,\bar R_{ik\cdot}(t_{q+1,k})>0)\int_{t\in I_{qk}}n\hat\Lambda_k\{t\}\,dN_{ik\cdot}(t)}{\sum_{i=1}^{n}I(\bar R_{ik\cdot}(t_{qk})=0,\bar R_{ik\cdot}(t_{q+1,k})>0)\int_{t\in I_{qk}}dN_{ik\cdot}(t)}\right]\\
&\le\left\{\sum_{i=1}^{n}I(\bar R_{ik\cdot}(t_{qk})=0,\bar R_{ik\cdot}(t_{q+1,k})>0)\int_{t\in I_{qk}}dN_{ik\cdot}\right\}\log\left[\frac{n\hat\Lambda_k(t_{qk})}{\sum_{i=1}^{n}I(\bar R_{ik\cdot}(t_{qk})=0,\bar R_{ik\cdot}(t_{q+1,k})>0)\int_{t\in I_{qk}}dN_{ik\cdot}(t)}\right].
\end{aligned}$$

Therefore, (5) can be further bounded by

$$\begin{aligned}
O(1)\le\sum_{k=1}^{K}\Bigg[&\sum_{q=0}^{N_k-1}n^{-1}\sum_{i=1}^{n}I(\bar R_{ik\cdot}(t_{qk})=0,\bar R_{ik\cdot}(t_{q+1,k})>0)\int_{t\in I_{qk}}dN_{ik\cdot}\times\log\left\{\frac{n}{\sum_{i=1}^{n}I(\bar R_{ik\cdot}(t_{qk})=0,\bar R_{ik\cdot}(t_{q+1,k})>0)\int_{t\in I_{qk}}dN_{ik\cdot}}\right\}\\
&+\sum_{q=0}^{N_k-1}\log\hat\Lambda_k(t_{qk})\left\{n^{-1}\sum_{i=1}^{n}I(\bar R_{ik\cdot}(t_{qk})=0,\bar R_{ik\cdot}(t_{q+1,k})>0)\int_{t\in I_{qk}}dN_{ik\cdot}\right\}\\
&-n^{-1}\sum_{i=1}^{n}\sum_{q=0}^{N_k-1}I(\bar R_{ik\cdot}(t_{qk})=0,\bar R_{ik\cdot}(t_{q+1,k})>0)\int_{t\in I_{qk}}dN_{ik\cdot}\,\log\left\{1+\hat\Lambda_k(t_{q+1,k})\right\}\\
&-\sum_{q=0}^{N_k-1}n^{-1}\sum_{i=1}^{n}I(\bar R_{ik\cdot}(t_{qk})=0,\bar R_{ik\cdot}(t_{q+1,k})>0)\,c_1\log\left\{1+\hat\Lambda_k(t_{q+1,k})\right\}-n^{-1}\sum_{i=1}^{n}I(\bar R_{ik\cdot}(t_{0k})>0)\log\left\{1+\hat\Lambda_k(\tau)\right\}\Bigg].
\end{aligned}$$

By (C2),

$$\frac{n}{\sum_{i=1}^{n}I(\bar R_{ik\cdot}(t_{qk})=0,\bar R_{ik\cdot}(t_{q+1,k})>0)\int_{t\in I_{qk}}dN_{ik\cdot}}\ \xrightarrow{a.s.}\ \left(E\left[I(\bar R_{ik\cdot}(t_{qk})=0,\bar R_{ik\cdot}(t_{q+1,k})>0)\int_{t\in I_{qk}}dN_{ik\cdot}\right]\right)^{-1}<\infty,$$

so that

$$\begin{aligned}
O(1)\le\sum_{k=1}^{K}\Bigg(&-n^{-1}\sum_{i=1}^{n}\frac{c_1}{2}I(\bar R_{ik\cdot}(t_{0k})>0)\log\left\{1+\hat\Lambda_k(\tau)\right\}\\
&-\left\{n^{-1}\sum_{i=1}^{n}\frac{c_1}{2}I(\bar R_{ik\cdot}(t_{0k})>0)-n^{-1}\sum_{i=1}^{n}I(\bar R_{ik\cdot}(t_{0k})=0,\bar R_{ik\cdot}(t_{1k})>0)\int_{t\in I_{0k}}dN_{ik\cdot}\right\}\times\log\left\{1+\hat\Lambda_k(t_{0k})\right\}\\
&-\sum_{q=1}^{N_k-1}\left[n^{-1}\sum_{i=1}^{n}I(\bar R_{ik\cdot}(t_{q-1,k})=0,\bar R_{ik\cdot}(t_{qk})>0)\left\{c_1+\int_{t\in I_{qk}}dN_{ik\cdot}\right\}-n^{-1}\sum_{i=1}^{n}I(\bar R_{ik\cdot}(t_{qk})=0,\bar R_{ik\cdot}(t_{q+1,k})>0)\int_{t\in I_{qk}}dN_{ik\cdot}\right]\left\{1+\log\hat\Lambda_k(t_{qk})\right\}\Bigg).
\end{aligned}$$

According to the construction of the tqk’s, the coefficients in front of log Λ̂k(tqk) are all negative when n is large enough. Therefore, the corresponding terms cannot diverge to ∞. However, if Λ̂k(τ) → ∞, the first term in the summation goes to −∞. We conclude that for all n large enough, Λ̂k(τ) < ∞. Thus, lim supn Λ̂k(τ) < ∞.

Step 3

We obtain the consistency result from (C5). Since Λ̂k is bounded and monotone, Λ̂k is weakly compact. Helly's Selection Theorem implies that, for any subsequence, we can always choose a further subsequence such that Λ̂k converges pointwise to some monotone function Λ*k. Without loss of generality, we also assume that θ̂ converges to some θ*. The consistency will hold if we can show that Λ*k = Λ0k and θ* = θ0. Since Λ0k is continuous, the pointwise convergence of Λ̂k to Λ0k can then be strengthened to uniform convergence of Λ̂k to Λ0k in [0, τ].

Note that

$$\hat\Lambda_k(t)=\int_0^t\frac{n^{-1}\sum_{j=1}^{n}\dot\Psi_k(O_j;\theta_0,\mathcal{A}_0)[I(\cdot\ge s)]/\Psi(O_j;\theta_0,\mathcal{A}_0)}{n^{-1}\sum_{j=1}^{n}\dot\Psi_k(O_j;\hat\theta,\hat{\mathcal{A}})[I(\cdot\ge s)]/\Psi(O_j;\hat\theta,\hat{\mathcal{A}})}\,d\tilde\Lambda_k(s).\tag{6}$$

Clearly, Λ̂k is absolutely continuous with respect to Λ̃k. By condition (C4),

$$\sup_{s\in[0,\tau]}\left|n^{-1}\sum_{j=1}^{n}\frac{\dot\Psi_k(O_j;\hat\theta,\hat{\mathcal{A}})[I(\cdot\ge s)]}{\Psi(O_j;\hat\theta,\hat{\mathcal{A}})}-n^{-1}\sum_{j=1}^{n}\frac{\dot\Psi_k(O_j;\theta^*,\mathcal{A}^*)[I(\cdot\ge s)]}{\Psi(O_j;\theta^*,\mathcal{A}^*)}\right|\le n^{-1}\sum_{j=1}^{n}\mathcal{F}(O_j)\left\{\|\hat\theta-\theta^*\|+\sum_{k=1}^{K}\int\left|\hat\Lambda_k(t)-\Lambda_k^*(t)\right|d\mu_{jk}(t;O_j)\right\}\to 0$$

since Λ̂k converges to Λ*k and is bounded, and {ℱ(Oj)μjk(t; Oj): t ∈ [0, τ]} is a P-Glivenko-Cantelli class. By Lemma 1 and the Glivenko-Cantelli Theorem,

$$n^{-1}\sum_{j=1}^{n}\frac{\dot\Psi_k(O_j;\theta^*,\mathcal{A}^*)[I(\cdot\ge s)]}{\Psi(O_j;\theta^*,\mathcal{A}^*)}\to E\left[\frac{\dot\Psi_k(O_j;\theta^*,\mathcal{A}^*)[I(\cdot\ge s)]}{\Psi(O_j;\theta^*,\mathcal{A}^*)}\right]\quad\text{uniformly in }s\in[0,\tau],$$
$$n^{-1}\sum_{j=1}^{n}\frac{\dot\Psi_k(O_j;\theta_0,\mathcal{A}_0)[I(\cdot\ge s)]}{\Psi(O_j;\theta_0,\mathcal{A}_0)}\to E\left[\frac{\dot\Psi_k(O_j;\theta_0,\mathcal{A}_0)[I(\cdot\ge s)]}{\Psi(O_j;\theta_0,\mathcal{A}_0)}\right]\quad\text{uniformly in }s\in[0,\tau].$$

The numerator and denominator in the integrand of (6) converge uniformly to deterministic functions, denoted by g1k(s) and g2k(s), respectively. It follows from (3) that |g1k(s)| = E[∑l Rikl(s) dNikl(s)/ds]/λ0k(s) is bounded away from zero. We claim that infs∈[0,τ] |g2k(s)| > 0. If this is not true, then there exists some s* ∈ [0, τ] such that g2k(s*+) = 0 or g2k(s*) = 0. By Lemma 2, there exist δ* and c* such that |g2k(s)| ≤ c*|s − s*| for s ∈ (s*, s* + δ*) or s ∈ (s* − δ*, s*]. On the other hand, for any ε > 0,

$$\hat\Lambda_k(\tau)\ge\int_0^\tau\frac{n^{-1}\sum_{j=1}^{n}\dot\Psi_k(O_j;\theta_0,\mathcal{A}_0)[I(\cdot\ge s)]/\Psi(O_j;\theta_0,\mathcal{A}_0)}{\epsilon+n^{-1}\sum_{j=1}^{n}\dot\Psi_k(O_j;\hat\theta,\hat{\mathcal{A}})[I(\cdot\ge s)]/\Psi(O_j;\hat\theta,\hat{\mathcal{A}})}\,d\tilde\Lambda_k(s).$$

Taking limits on both sides, we obtain $O(1)\ge\int_0^\tau\{\epsilon+g_{2k}(s)\}^{-1}g_{1k}(s)\,d\Lambda_{0k}(s)$. Let ε → 0. By the Monotone Convergence Theorem, $O(1)\ge\int_{s^*}^{s^*+\delta^*}\{c^*|s-s^*|\}^{-1}g_{1k}(s)\lambda_{0k}(s)\,ds$ or $O(1)\ge\int_{s^*-\delta^*}^{s^*}\{c^*|s-s^*|\}^{-1}g_{1k}(s)\lambda_{0k}(s)\,ds$. This is a contradiction since the right-hand side is infinite. The contradiction implies that the limit g2k(s) is uniformly positive. We can take limits on both sides of (6) to obtain $\Lambda_k^*(t)=\int_0^t g_{2k}^{-1}(s)g_{1k}(s)\,d\Lambda_{0k}(s)$. Thus, Λ*k is also absolutely continuous with respect to Λ0k and dΛ*k/dΛ0k = g1k/g2k. Since Λ0k(t) is differentiable with respect to t, so is Λ*k(t). Write (Λ*k)′(t) = λ*k(t). The foregoing arguments show that dΛ̂k(t)/dΛ̃k(t) converges uniformly to λ*k(t)/λ0k(t), which is uniformly positive in [0, τ].

It follows from the inequality n⁻¹Ln(θ̂, 𝒜̂) ≥ n⁻¹Ln(θ0, 𝒜̃) that

$$n^{-1}\sum_{i=1}^{n}\sum_{k=1}^{K}\sum_{l=1}^{n_{ik}}\int\log\frac{d\hat\Lambda_k(t)}{d\tilde\Lambda_k(t)}\,R_{ikl}(t)\,dN_{ikl}(t)+n^{-1}\sum_{i=1}^{n}\log\frac{\Psi(O_i;\hat\theta,\hat{\mathcal{A}})}{\Psi(O_i;\theta_0,\tilde{\mathcal{A}})}\ge 0.$$

In view of Lemma 1, the Glivenko-Cantelli Theorem and the uniform convergence of dΛ̂k/dΛ̃k, taking limits on both sides of the above inequality yields

$$E\left[\log\frac{\prod_{k=1}^{K}\prod_{l=1}^{n_{ik}}\prod_{t\le\tau}\{\lambda_k^*(t)\}^{R_{ikl}(t)\,dN_{ikl}(t)}\,\Psi(O_i;\theta^*,\mathcal{A}^*)}{\prod_{k=1}^{K}\prod_{l=1}^{n_{ik}}\prod_{t\le\tau}\{\lambda_{0k}(t)\}^{R_{ikl}(t)\,dN_{ikl}(t)}\,\Psi(O_i;\theta_0,\mathcal{A}_0)}\right]\ge 0.$$

The left-hand side is the negative Kullback-Leibler distance between the density indexed by (θ*, 𝒜*) and the true density. Thus, (C5) entails that θ* = θ0 and 𝒜* = 𝒜0.

7. Weak Convergence and Asymptotic Efficiency

Define 𝒱 = {v ∈ Rd: |v| ≤ 1} and 𝒬 = {h(t): ||h||V[0,τ] ≤ 1}. We identify (θ̂ − θ0, 𝒜̂ − 𝒜0) as a random element in l∞(𝒱 × 𝒬K) through the map $(v,h_1,\ldots,h_K)\mapsto(\hat\theta-\theta_0)^T v+\sum_{k=1}^{K}\int_0^\tau h_k(s)\,d(\hat\Lambda_k-\Lambda_{0k})(s)$.

Theorem 2

Under (C1)–(C7), n1/2(θ̂θ0, Inline graphicInline graphic) →d ℊ in l(Inline graphic × Inline graphic), where ℊ is a continuous zero-mean Gaussian process. Furthermore, the limiting covariance matrix of n1/2(θ̂θ0) attains the semiparametric efficiency bound.

Proof

The proof is based on the likelihood equation and follows the arguments of van der Vaart (1998, pp. 419–424). Let ℒ(θ, 𝒜) be the log-likelihood function from a single cluster, ℒ̇θ(θ, 𝒜) the derivative of ℒ(θ, 𝒜) with respect to θ, and ℒ̇k(θ, 𝒜)[Hk] the path-wise derivative along the path Λk + εHk. We sometimes omit the arguments in these derivatives when θ = θ0 and 𝒜 = 𝒜0. Let ℙn denote the empirical measure based on n i.i.d. observations, and P its expectation.

Let ℋ = (h1, …, hK) ∈ 𝒬K. The likelihood equation for (θ̂, 𝒜̂) along the path (θ̂ + εv, 𝒜̂ + ε∫ℋ d𝒜̂), where v ∈ Rd and hk ∈ BV[0, τ], is given by

$$0=\mathbb{P}_n\left[v^T\dot{\mathcal{L}}_\theta(\hat\theta,\hat{\mathcal{A}})+\sum_{k=1}^{K}\dot{\mathcal{L}}_k(\hat\theta,\hat{\mathcal{A}})\left[\int h_k\,d\hat\Lambda_k\right]\right].$$

To be specific,

$$0=\mathbb{P}_n\left[v^T\frac{\dot\Psi_\theta(O_i;\hat\theta,\hat{\mathcal{A}})}{\Psi(O_i;\hat\theta,\hat{\mathcal{A}})}\right]+\sum_{k=1}^{K}\mathbb{P}_n\left[\sum_{l=1}^{n_{ik}}\int h_k(t)R_{ikl}(t)\,dN_{ikl}(t)+\frac{\dot\Psi_k(O_i;\hat\theta,\hat{\mathcal{A}})\left[\int h_k\,d\hat\Lambda_k\right]}{\Psi(O_i;\hat\theta,\hat{\mathcal{A}})}\right].$$

Since (θ0, 𝒜0) maximizes P[ℒ(θ, 𝒜)],

$$0=P\left[v^T\dot{\mathcal{L}}_\theta(\theta_0,\mathcal{A}_0)\right],\qquad 0=P\left[\dot{\mathcal{L}}_k(\theta_0,\mathcal{A}_0)\left[\int h_k\,d\Lambda_{0k}\right]\right],\quad h_k\in\mathcal{Q},\ k=1,\ldots,K.$$

These equations, combined with the likelihood equation for (θ̂, 𝒜̂), yield

$$\begin{aligned}
&n^{1/2}(\mathbb{P}_n-P)\left[v^T\dot{\mathcal{L}}_\theta(\hat\theta,\hat{\mathcal{A}})+\sum_{k=1}^{K}\dot{\mathcal{L}}_k(\hat\theta,\hat{\mathcal{A}})\left[\int h_k\,d\hat\Lambda_k\right]\right]\\
&=-n^{1/2}P\left[v^T\frac{\dot\Psi_\theta(O_i;\hat\theta,\hat{\mathcal{A}})}{\Psi(O_i;\hat\theta,\hat{\mathcal{A}})}-v^T\frac{\dot\Psi_\theta(O_i;\theta_0,\mathcal{A}_0)}{\Psi(O_i;\theta_0,\mathcal{A}_0)}\right]-\sum_{k=1}^{K}n^{1/2}P\left[\frac{\dot\Psi_k(O_i;\hat\theta,\hat{\mathcal{A}})\left[\int h_k\,d\hat\Lambda_k\right]}{\Psi(O_i;\hat\theta,\hat{\mathcal{A}})}-\frac{\dot\Psi_k(O_i;\theta_0,\mathcal{A}_0)\left[\int h_k\,d\Lambda_{0k}\right]}{\Psi(O_i;\theta_0,\mathcal{A}_0)}\right].
\end{aligned}$$

Define $\mathcal{N}_0=\{(\theta,\mathcal{A}):\|\theta-\theta_0\|+\sum_{k=1}^{K}\|\Lambda_k-\Lambda_{0k}\|_{V[0,\tau]}<\delta_0\}$, where δ0 is a small positive constant. When n is large enough, (θ̂, 𝒜̂) belongs to 𝒩0 with probability one. By Lemma 1 and the Donsker Theorem,

$$\begin{aligned}
&o_p(1)+n^{1/2}(\mathbb{P}_n-P)\left[v^T\dot{\mathcal{L}}_\theta(\theta_0,\mathcal{A}_0)+\sum_{k=1}^{K}\dot{\mathcal{L}}_k(\theta_0,\mathcal{A}_0)\left[\int h_k\,d\Lambda_{0k}\right]\right]\\
&=-n^{1/2}P\left[v^T\frac{\dot\Psi_\theta(O_i;\hat\theta,\hat{\mathcal{A}})}{\Psi(O_i;\hat\theta,\hat{\mathcal{A}})}-v^T\frac{\dot\Psi_\theta(O_i;\theta_0,\mathcal{A}_0)}{\Psi(O_i;\theta_0,\mathcal{A}_0)}\right]-\sum_{k=1}^{K}n^{1/2}P\left[\frac{\dot\Psi_k(O_i;\hat\theta,\hat{\mathcal{A}})\left[\int h_k\,d\hat\Lambda_k\right]}{\Psi(O_i;\hat\theta,\hat{\mathcal{A}})}-\frac{\dot\Psi_k(O_i;\theta_0,\mathcal{A}_0)\left[\int h_k\,d\Lambda_{0k}\right]}{\Psi(O_i;\theta_0,\mathcal{A}_0)}\right],
\end{aligned}\tag{7}$$

where op(1) represents a random element converging in probability to zero in l∞(𝒱 × 𝒬K).

Under (C6), the first term on the right-hand side of (7) is

$$n^{1/2}\left\{\sum_{k=1}^{K}\int_0^\tau v^T\zeta_{0k}(s)\,d(\hat\Lambda_k-\Lambda_{0k})+v^T\zeta_{0\theta}(\hat\theta-\theta_0)\right\}+o\left(n^{1/2}\|\hat\theta-\theta_0\|+n^{1/2}\sum_{k=1}^{K}\|\hat\Lambda_k-\Lambda_{0k}\|_{V[0,\tau]}\right).$$

The second term is $\sum_{k=1}^{K}n^{1/2}\{\int_0^\tau h_k(t)\eta_{0k}(t;\hat\theta,\hat{\mathcal{A}})\,d\hat\Lambda_k(t)-\int_0^\tau h_k(t)\eta_{0k}(t;\theta_0,\mathcal{A}_0)\,d\Lambda_{0k}(t)\}$. It follows from (C6) that the above expression is

$$\begin{aligned}
&\sum_{k=1}^{K}n^{1/2}\left[\int_0^\tau h_k(t)\left\{\eta_{0k\theta}(t;\theta_0,\mathcal{A}_0)(\hat\theta-\theta_0)+\sum_{m=1}^{K}\int_0^\tau\eta_{0km}(s,t;\theta_0,\mathcal{A}_0)\,d(\hat\Lambda_m-\Lambda_{0m})(s)\right\}d\Lambda_{0k}(t)+\int_0^\tau h_k(t)\eta_{0k}(t;\theta_0,\mathcal{A}_0)\,d(\hat\Lambda_k(t)-\Lambda_{0k}(t))\right]\\
&\qquad+o\left(n^{1/2}\|\hat\theta-\theta_0\|+n^{1/2}\sum_{k=1}^{K}\|\hat\Lambda_k-\Lambda_{0k}\|_{V[0,\tau]}\right)\\
&=\sum_{k=1}^{K}n^{1/2}\left[(\hat\theta-\theta_0)^T\int_0^\tau h_k(t)\eta_{0k\theta}(t;\theta_0,\mathcal{A}_0)\,d\Lambda_{0k}(t)+\sum_{m=1}^{K}\int_0^\tau\left\{I(m=k)h_m(t)\eta_{0m}(t;\theta_0,\mathcal{A}_0)+\int_0^\tau\eta_{0km}(s,t;\theta_0,\mathcal{A}_0)h_k(s)\,d\Lambda_{0k}(s)\right\}d(\hat\Lambda_m(t)-\Lambda_{0m}(t))\right]\\
&\qquad+o\left(n^{1/2}\|\hat\theta-\theta_0\|+n^{1/2}\sum_{k=1}^{K}\|\hat\Lambda_k-\Lambda_{0k}\|_{V[0,\tau]}\right).
\end{aligned}$$

Thus, the right-hand side of (7) can be written as

$$n^{1/2}\left\{B_1[v,\mathcal{H}]^T(\hat\theta-\theta_0)+\sum_{k=1}^{K}\int B_{2k}[v,\mathcal{H}]\,d(\hat\Lambda_k-\Lambda_{0k})\right\}+o\left(n^{1/2}\|\hat\theta-\theta_0\|+n^{1/2}\sum_{k=1}^{K}\|\hat\Lambda_k-\Lambda_{0k}\|_{V[0,\tau]}\right),$$

where (B1, B21, …, B2K) are linear operators on Rd × {BV[0, τ]}K, and

$$B_1[v,\mathcal{H}]=v^T\zeta_{0\theta}(\theta_0,\mathcal{A}_0)+\sum_{k=1}^{K}\int_0^\tau h_k(t)\eta_{0k\theta}(t;\theta_0,\mathcal{A}_0)\,d\Lambda_{0k}(t),\tag{8}$$
$$B_{2k}[v,\mathcal{H}](t)=v^T\zeta_{0k}(t;\theta_0,\mathcal{A}_0)+h_k(t)\eta_{0k}(t;\theta_0,\mathcal{A}_0)+\sum_{m=1}^{K}\int_0^\tau\eta_{0mk}(s,t;\theta_0,\mathcal{A}_0)h_m(s)\,d\Lambda_{0m}(s),\quad k=1,\ldots,K.\tag{9}$$

It follows from the above derivation that

$$B_1[v,\mathcal{H}]^T v+\sum_{k=1}^{K}\int B_{2k}[v,\mathcal{H}]\,h_k\,d\Lambda_{0k}=-\frac{d}{d\epsilon}\bigg|_{\epsilon=0}P\left[v^T\dot{\mathcal{L}}_\theta\left(\theta_0+\epsilon v,\mathcal{A}_0+\epsilon\int\mathcal{H}\,d\mathcal{A}_0\right)+\sum_{k=1}^{K}\dot{\mathcal{L}}_k\left(\theta_0+\epsilon v,\mathcal{A}_0+\epsilon\int\mathcal{H}\,d\mathcal{A}_0\right)\left[\int h_k\,d\Lambda_{0k}\right]\right].\tag{10}$$

We can write (B1, B21, …, B2K)[v, ℋ] as

$$\begin{pmatrix}v\\ \eta_{01}(t;\theta_0,\mathcal{A}_0)\,h_1(t)\\ \vdots\\ \eta_{0K}(t;\theta_0,\mathcal{A}_0)\,h_K(t)\end{pmatrix}+\begin{pmatrix}v^T\zeta_{0\theta}(\theta_0,\mathcal{A}_0)+\sum_{k=1}^{K}\int_0^\tau h_k(t)\eta_{0k\theta}(t;\theta_0,\mathcal{A}_0)\,d\Lambda_{0k}(t)-v\\ v^T\zeta_{01}(t;\theta_0,\mathcal{A}_0)+\sum_{m=1}^{K}\int_0^\tau\eta_{0m1}(s,t;\theta_0,\mathcal{A}_0)h_m(s)\,d\Lambda_{0m}(s)\\ \vdots\\ v^T\zeta_{0K}(t;\theta_0,\mathcal{A}_0)+\sum_{m=1}^{K}\int_0^\tau\eta_{0mK}(s,t;\theta_0,\mathcal{A}_0)h_m(s)\,d\Lambda_{0m}(s)\end{pmatrix}.$$

We wish to prove that (B1, B21, …, B2K) is invertible. As shown at the end of this section, η0k(t; θ0, 𝒜0) < 0, so that the first term of (B1, B21, …, B2K) is an invertible operator. It follows from Lemma 3 that the second term is a compact operator. Thus, (B1, B21, …, B2K) is a Fredholm operator, and its invertibility is equivalent to the operator being one-to-one (Rudin (1973, pp. 99–103)). Suppose that B1[v, ℋ] = 0, …, and B2K[v, ℋ] = 0. It is easy to see from (10) that the derivative of $P[v^T\dot{\mathcal{L}}_\theta(\theta_0,\mathcal{A}_0)+\sum_{k=1}^{K}\dot{\mathcal{L}}_k(\theta_0,\mathcal{A}_0)[\int h_k\,d\Lambda_{0k}]]$ along the path (θ0 + εv, 𝒜0 + ε∫ℋ d𝒜0) is zero. That is, the information along this path is zero, or $v^T\dot{\mathcal{L}}_\theta(\theta_0,\mathcal{A}_0)+\sum_{k=1}^{K}\dot{\mathcal{L}}_k(\theta_0,\mathcal{A}_0)[\int h_k\,d\Lambda_{0k}]=0$ almost surely. By (C7), v = 0 and ℋ = 0, so that (B1, B21, …, B2K) is one-to-one and invertible.

It follows from (7) that, for any (v, ℋ) ∈ 𝒱 × 𝒬K,

$$n^{1/2}\left\{v^T(\hat\theta-\theta_0)+\sum_{k=1}^{K}\int_0^\tau h_k(t)\,d(\hat\Lambda_k(t)-\Lambda_{0k}(t))\right\}=n^{1/2}(\mathbb{P}_n-P)\left[\tilde v^T\dot{\mathcal{L}}_\theta(\theta_0,\mathcal{A}_0)+\sum_{k=1}^{K}\dot{\mathcal{L}}_k(\theta_0,\mathcal{A}_0)\left[\int\tilde h_k\,d\Lambda_{0k}\right]\right]+o\left(n^{1/2}\|\hat\theta-\theta_0\|+n^{1/2}\sum_{k=1}^{K}\|\hat\Lambda_k-\Lambda_{0k}\|_{V[0,\tau]}\right),$$

where (ṽ, h̃1, …, h̃K) = (B1, B21, …, B2K)⁻¹(v, h1, …, hK). Since

$$\|\hat\theta-\theta_0\|+\sum_{k=1}^{K}\|\hat\Lambda_k-\Lambda_{0k}\|_{V[0,\tau]}=\sup_{(v,h_1,\ldots,h_K)\in\mathcal{V}\times\mathcal{Q}^K}\left|v^T(\hat\theta-\theta_0)+\sum_{k=1}^{K}\int_0^\tau h_k(t)\,d(\hat\Lambda_k(t)-\Lambda_{0k}(t))\right|,$$

we have

$$n^{1/2}\left\{\|\hat\theta-\theta_0\|+\sum_{k=1}^{K}\|\hat\Lambda_k-\Lambda_{0k}\|_{V[0,\tau]}\right\}=O_p(1)+o\left(n^{1/2}\|\hat\theta-\theta_0\|+n^{1/2}\sum_{k=1}^{K}\|\hat\Lambda_k-\Lambda_{0k}\|_{V[0,\tau]}\right).$$

Thus, $n^{1/2}\{\|\hat\theta-\theta_0\|+\sum_{k=1}^{K}\|\hat\Lambda_k-\Lambda_{0k}\|_{V[0,\tau]}\}=O_p(1)$. Consequently,

$$n^{1/2}\left\{v^T(\hat\theta-\theta_0)+\sum_{k=1}^{K}\int_0^\tau h_k(t)\,d(\hat\Lambda_k(t)-\Lambda_{0k}(t))\right\}=n^{1/2}(\mathbb{P}_n-P)\left[\tilde v^T\dot{\mathcal{L}}_\theta(\theta_0,\mathcal{A}_0)+\sum_{k=1}^{K}\dot{\mathcal{L}}_k(\theta_0,\mathcal{A}_0)\left[\int\tilde h_k\,d\Lambda_{0k}\right]\right]+o_p(1).$$

We have proved that n1/2(θ̂ − θ0, 𝒜̂ − 𝒜0) converges weakly to a Gaussian process in l∞(𝒱 × 𝒬K). By choosing hk = 0 for k = 1, …, K, we see that vᵀθ̂ is an asymptotically linear estimator of vᵀθ0 with influence function $\tilde v^T\dot{\mathcal{L}}_\theta(\theta_0,\mathcal{A}_0)+\sum_{k=1}^{K}\dot{\mathcal{L}}_k(\theta_0,\mathcal{A}_0)[\int\tilde h_k\,d\Lambda_{0k}]$. Since the influence function lies in the space spanned by the score functions, θ̂ is an efficient estimator of θ0.

It remains to verify that η0k(t; θ0, 𝒜0) < 0. Under (C6), $P[\dot\Psi_k(O_i;\theta_0,\mathcal{A}_0)[H_k]/\Psi(O_i;\theta_0,\mathcal{A}_0)]=\int_0^\tau\eta_{0k}(s;\theta_0,\mathcal{A}_0)\,dH_k(s)$. The choice Hk(s) = I(s ≥ t) yields P[Ψ̇k(Oi; θ0, 𝒜0)[I(· ≥ t)]/Ψ(Oi; θ0, 𝒜0)] = η0k(t; θ0, 𝒜0). On the other hand, the score function along the path Λ0k + εI(· ≥ t), with the other parameters fixed at their true values, has zero expectation. We expand this expectation to obtain

$$P\left[\frac{\dot\Psi_k(O_i;\theta_0,\mathcal{A}_0)[I(\cdot\ge t)]}{\Psi(O_i;\theta_0,\mathcal{A}_0)}\right]=-\lambda_{0k}^{-1}(t)\,dE\left[I(R_{ik\cdot}(t)>0)\,N_{ik\cdot}(t)\right]/dt<0.$$

Thus, η0k(t; θ0, 𝒜0) < 0.

8. Information Matrix

Theorem 2 implies that the functional parameter 𝒜 can be estimated at the same rate as the Euclidean parameter θ. Thus, we may treat (1) as a parametric log-likelihood with θ and the jump sizes of Λk, k = 1, …, K, at the observed failure times as the parameters, and estimate the asymptotic covariance matrix of the NPMLEs for these parameters by inverting the information matrix. This result is formally stated in Theorem 3. We impose an additional assumption.

(C8) There exists a neighborhood of (θ0, 𝒜0) such that, for (θ, 𝒜) in this neighborhood, the first and second derivatives of log Ψ(Oi; θ, 𝒜) with respect to θ and along the path Λk + εHk with respect to ε satisfy the inequality in (C4).

For any v ∈ 𝒱 and h1, …, hK ∈ 𝒬, we consider the vector $(v^T,\vec h_1^T,\ldots,\vec h_K^T)^T$, where h⃗k is the vector consisting of the values of hk(·) at the observed failure times. Let ℐn be the negative Hessian matrix of (1) with respect to θ̂ and the jump sizes of (Λ̂1, …, Λ̂K).

Theorem 3

Assume (C1)–(C8). Then ℐn is invertible for large n, and

$$\sup_{v\in\mathcal{V},\,h_1,\ldots,h_K\in\mathcal{Q}}\left|n\,(v^T,\vec h_1^T,\ldots,\vec h_K^T)\,\mathcal{I}_n^{-1}\,(v^T,\vec h_1^T,\ldots,\vec h_K^T)^T-\operatorname{AVar}\left[n^{1/2}\left\{v^T(\hat\theta-\theta_0)+\sum_{k=1}^{K}\int h_k\,d(\hat\Lambda_k-\Lambda_{0k})\right\}\right]\right|\to 0$$

in probability, where AVar denotes the asymptotic variance.

Proof

The proof is similar to that of Theorem 3 in Parner (1998); see also van der Vaart (1998, pp. 419–424). First, (10) implies that, for any v ∈ 𝒱 and h1, …, hK ∈ 𝒬,

$$P\left(-\begin{pmatrix}\ddot{\mathcal{L}}_{\theta\theta}&\ddot{\mathcal{L}}_{\theta 1}&\cdots&\ddot{\mathcal{L}}_{\theta K}\\ \vdots& & &\vdots\\ \ddot{\mathcal{L}}_{K\theta}&\ddot{\mathcal{L}}_{K1}&\cdots&\ddot{\mathcal{L}}_{KK}\end{pmatrix}\left[\begin{pmatrix}v\\ \int h_1\,d\Lambda_{01}\\ \vdots\\ \int h_K\,d\Lambda_{0K}\end{pmatrix},\begin{pmatrix}v\\ \int h_1\,d\Lambda_{01}\\ \vdots\\ \int h_K\,d\Lambda_{0K}\end{pmatrix}\right]\right)=v^T B_1(v,h_1,\ldots,h_K)+\sum_{k=1}^{K}\int B_{2k}(v,h_1,\ldots,h_K)\,h_k\,d\Lambda_{0k},\tag{11}$$

where ℒ̈ pertains to the second-order derivative of the log-likelihood function.

On the right-hand side of (10), we replace P by ℙn to obtain two new linear operators Bn1 and Bn2k. It is easy to show that Bn1 and Bn2k converge uniformly to B1 and B2k, respectively. Under (C8), the results of Lemma 1 apply to the second-order derivatives ℒ̈ and the operators (B1, B21, …, B2K). By replacing θ0, Λ0k and P on both sides of (11) with θ̂, Λ̂k and ℙn, we obtain

$$(v^T,\vec h_1^T,\ldots,\vec h_K^T)\,\mathcal I_n\,(v^T,\vec h_1^T,\ldots,\vec h_K^T)^T=v^TB_{n1}(v,h_1,\ldots,h_K)+\sum_{k=1}^K\int B_{n2k}(v,h_1,\ldots,h_K)\,h_k\,d\hat\Lambda_k+o_p(1).$$

According to the proof of Theorem 2, (B1, B21, …, B2K) is invertible, and so is (Bn1, Bn21, …, Bn2K) for large n. Note that vᵀBn1(v, h1, …, hK) + Σ_{k=1}^K ∫Bn2k(v, h1, …, hK) hk dΛ̂k can be written as (vᵀ, h⃗1ᵀ, …, h⃗Kᵀ) ℬn (vᵀ, h⃗1ᵀ, …, h⃗Kᵀ)ᵀ for some matrix ℬn. Therefore ℬn is invertible, and so is ℐn. Furthermore,

$$\sup_{v\in\mathcal V,\,h_1,\ldots,h_K\in\mathcal Q}\big|(v^T,\vec h_1^T,\ldots,\vec h_K^T)\,\mathcal I_n\,(v^T,\vec h_1^T,\ldots,\vec h_K^T)^T-(v^T,\vec h_1^T,\ldots,\vec h_K^T)\,\mathcal B_n\,(v^T,\vec h_1^T,\ldots,\vec h_K^T)^T\big|\to 0.$$

According to Theorem 2, the asymptotic variance of $n^{1/2}\{v^T(\hat\theta-\theta_0)+\sum_{k=1}^K\int h_k\,d(\hat\Lambda_k-\Lambda_{0k})\}$ is

$$P\Big[\Big\{\dot{\mathcal L}_\theta^T\tilde v+\sum_{k=1}^K\dot{\mathcal L}_k\Big[\int\tilde h_k\,d\Lambda_{0k}\Big]\Big\}^2\Big]=-P\left\{\begin{pmatrix}\ddot{\mathcal L}_{\theta\theta}&\cdots&\ddot{\mathcal L}_{\theta K}\\ \vdots&&\vdots\\ \ddot{\mathcal L}_{K\theta}&\cdots&\ddot{\mathcal L}_{KK}\end{pmatrix}\left[\begin{pmatrix}\tilde v\\ \int\tilde h_1\,d\Lambda_{01}\\ \vdots\\ \int\tilde h_K\,d\Lambda_{0K}\end{pmatrix},\begin{pmatrix}\tilde v\\ \int\tilde h_1\,d\Lambda_{01}\\ \vdots\\ \int\tilde h_K\,d\Lambda_{0K}\end{pmatrix}\right]\right\},$$

where (ṽ, h̃1, …, h̃K) = (B1, B21, …, B2K)⁻¹(v, h1, …, hK), which can be approximated by (Bn1, Bn21, …, Bn2K)⁻¹(v, h1, …, hK). Hence, the asymptotic variance can be approximated uniformly in v and the hk's by its empirical counterpart (vᵀ, h⃗1ᵀ, …, h⃗Kᵀ)ℬn⁻¹ℐnℬn⁻¹(vᵀ, h⃗1ᵀ, …, h⃗Kᵀ)ᵀ, which is in turn approximated by (vᵀ, h⃗1ᵀ, …, h⃗Kᵀ)ℐn⁻¹(vᵀ, h⃗1ᵀ, …, h⃗Kᵀ)ᵀ.
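The inverse-information recipe licensed by Theorem 3 is, computationally, the familiar parametric one: form the negative Hessian of the log-likelihood at the maximizer and invert it. The sketch below illustrates this on a deliberately simple one-parameter example (an exponential rate, for which the observed information is n/λ̂², so the recipe can be checked exactly); the toy model and the function names are our illustrative assumptions, not constructs from the paper.

```python
import numpy as np

def neg_hessian(loglik, theta, eps=1e-5):
    """Negative Hessian of loglik at theta, by central differences."""
    p = len(theta)
    H = np.zeros((p, p))
    base = np.asarray(theta, dtype=float)
    for i in range(p):
        for j in range(p):
            def f(si, sj):
                t = base.copy()
                t[i] += si
                t[j] += sj
                return loglik(t)
            H[i, j] = -(f(eps, eps) - f(eps, -eps)
                        - f(-eps, eps) + f(-eps, -eps)) / (4 * eps ** 2)
    return H

# Toy likelihood: n i.i.d. exponential observations with rate lam.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=500)
loglik = lambda th: len(x) * np.log(th[0]) - th[0] * x.sum()

lam_hat = len(x) / x.sum()              # MLE of the rate
I_n = neg_hessian(loglik, [lam_hat])    # observed information
var_hat = np.linalg.inv(I_n)[0, 0]      # inverse-information variance estimate
```

For the semiparametric models of this paper, theta would be augmented with the jump sizes of the Λ̂k, and the same Hessian (analytic or finite-difference) would be inverted.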

9. Profile Likelihood

Theorem 4

Let pln(θ) be the profile log-likelihood function for θ, and assume (C1)–(C8). For any $\epsilon_n=O_p(n^{-1/2})$ and any vector v,

$$\frac{pl_n(\hat\theta+\epsilon_nv)-2\,pl_n(\hat\theta)+pl_n(\hat\theta-\epsilon_nv)}{n\epsilon_n^2}\xrightarrow{\;p\;}-v^T\Sigma^{-1}v,$$

where Σ is the limiting covariance matrix of $n^{1/2}(\hat\theta-\theta_0)$. Furthermore, $2\{pl_n(\hat\theta)-pl_n(\theta_0)\}\xrightarrow{\;d\;}\chi_d^2$.

Proof

We appeal to Theorem 1 of Murphy and van der Vaart (2000). Specifically, we construct the least favorable submodel for θ0 and verify all the conditions in their Theorem 1. For notational simplicity, we assume that K = 1; the extension to K > 1 is straightforward.

It follows from the proof of Theorem 2 that

$$\int_0^\tau B_2(0,h)\,h\,d\Lambda_0=-E\big[\ddot{\mathcal L}_{\Lambda\Lambda}\big[{\textstyle\int}h\,d\Lambda_0,{\textstyle\int}h\,d\Lambda_0\big]\big],$$

where B2 stands for the operator (B21, …, B2K), and ℒ̈ΛΛ[H1, H2] denotes the second-order derivative of ℒ(θ, Λ) with respect to Λ along the two directions H1 and H2. On the other hand,

$$E\big[\dot{\mathcal L}_\Lambda\big[{\textstyle\int}h\,d\Lambda_0\big]\,\dot{\mathcal L}_\theta\big]=\int_0^\tau h(s)\,\dot{\mathcal L}_\Lambda^*\dot{\mathcal L}_\theta\,d\Lambda_0(s),$$

where ℒ̇Λ* is the dual operator of ℒ̇Λ in L2[0, τ]. Thus, if we choose h such that B2(0, h) = ℒ̇Λ*ℒ̇θ, then

$$E\big[\dot{\mathcal L}_\Lambda\big[{\textstyle\int}h\,d\Lambda_0\big]\,\dot{\mathcal L}_\theta\big]=-E\big[\ddot{\mathcal L}_{\Lambda\Lambda}\big[{\textstyle\int}h\,d\Lambda_0,{\textstyle\int}h\,d\Lambda_0\big]\big].$$

By definition, ∫h dΛ0 is the least favorable direction for θ0, and ℒ̇θ − ℒ̇Λ[∫h dΛ0] is the efficient score function. Such an h exists since B2(0, ·) is invertible. In addition, h ∈ BV[0, τ]. Hence, we can construct the least favorable submodel at (θ, Λ) as ε ↦ (ε, Λε) with dΛε(θ, Λ) = {1 + (ε − θ)ᵀh} dΛ. Clearly, Λθ(θ, Λ) = Λ and

$$\frac{\partial\mathcal L(\epsilon,\Lambda_\epsilon)}{\partial\epsilon}\bigg|_{\epsilon=\theta_0,\,\theta=\theta_0,\,\Lambda=\Lambda_0}=\dot{\mathcal L}_\theta-\dot{\mathcal L}_\Lambda\big[{\textstyle\int}h\,d\Lambda_0\big].$$

If θ̃ →p θ0 and Λ̂θ̃ maximizes (1) with θ fixed at θ̃, we can use the arguments in the proof of Theorem 1 to show that Λ̂θ̃ is consistent. In the likelihood equation for Λ̂θ̃, we can use the arguments for the linearization of (7) to show that, uniformly in h ∈ 𝒬,

$$o_p(1)+n^{1/2}(\mathbb P_n-P)\big[\dot{\mathcal L}_\Lambda(\theta_0,\Lambda_0)\big[{\textstyle\int}h\,d\Lambda_0\big]\big]=n^{1/2}\int_0^\tau B_2(0,h)\,d(\hat\Lambda_{\tilde\theta}-\Lambda_0)+O_p(n^{1/2}\|\tilde\theta-\theta_0\|)+o_p(n^{1/2}\|\hat\Lambda_{\tilde\theta}-\Lambda_0\|_{V[0,\tau]}).$$

The arguments for proving the invertibility of (B1, B2) show that h ↦ B2(0, h) is invertible. Thus,

$$\|\hat\Lambda_{\tilde\theta}-\Lambda_0\|_{V[0,\tau]}=O_p(\|\tilde\theta-\theta_0\|+n^{-1/2}).$$

By condition (C6), we obtain the no-bias condition, i.e.,

$$E\bigg[\frac{\partial\mathcal L(\epsilon,\Lambda_\epsilon)}{\partial\epsilon}\bigg|_{\epsilon=\theta_0,\,\theta=\tilde\theta,\,\Lambda=\hat\Lambda_{\tilde\theta}}\bigg]=O_p(\|\tilde\theta-\theta_0\|+n^{-1/2}).$$

We have verified conditions (8)–(11) of Murphy and van der Vaart (2000).

Condition (C4), together with Lemma 1, implies that the class

$$\bigg\{\frac{\partial\mathcal L(\epsilon,\Lambda_\epsilon)}{\partial\epsilon}:\|\epsilon-\theta_0\|<\delta_0,\ (\theta,\Lambda)\in\mathcal N_0\bigg\}$$

is P-Donsker and that the functions in the class are continuous at (θ0, Λ0) almost surely, while condition (C8) implies that the class

$$\bigg\{\frac{\partial^2\mathcal L(\epsilon,\Lambda_\epsilon)}{\partial\epsilon^2}:\|\epsilon-\theta_0\|<\delta_0,\ (\theta,\Lambda)\in\mathcal N_0\bigg\}$$

is P-Glivenko–Cantelli and is bounded in L2(P). Therefore, all the conditions in Murphy and van der Vaart (2000) hold, so that the desired results follow from their Theorem 1.
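Theorem 4 also has a direct computational use: the curvature of the profile log-likelihood at θ̂, measured by a second difference with step εn = O(n^{−1/2}), consistently estimates vᵀΣ⁻¹v, and hence yields standard errors without differentiating the profile likelihood analytically. A minimal sketch on a toy model (a normal mean θ with the standard deviation profiled out in closed form; the model and names are our assumptions, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
sigma0 = 1.5
x = rng.normal(loc=3.0, scale=sigma0, size=n)

def profile_loglik(theta):
    """Profile log-likelihood of theta with sigma maximized out analytically."""
    s2 = np.mean((x - theta) ** 2)       # sigma_hat(theta)^2
    return -0.5 * n * np.log(s2) - 0.5 * n

theta_hat = x.mean()                     # profile MLE of theta
eps = n ** -0.5                          # epsilon_n = O(n^{-1/2})
second_diff = (profile_loglik(theta_hat + eps)
               - 2 * profile_loglik(theta_hat)
               + profile_loglik(theta_hat - eps))
Sigma_inv_hat = -second_diff / (n * eps ** 2)   # estimates Sigma^{-1}
se_hat = (1 / (n * Sigma_inv_hat)) ** 0.5       # standard error of theta_hat
```

Here Σ equals σ0², so Sigma_inv_hat should land near 1/σ0² and se_hat near σ0/√n.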

10. Applications

In this section, we apply the general results to the problems described in Section 2. We identify a set of conditions for each problem under which regularity conditions (C1)–(C8) are satisfied so that the desired asymptotic properties hold. These applications not only provide the theoretical justifications for the work of Zeng and Lin (2007), but also illustrate how the general theory can be applied to specific problems.

10.1. Transformation Models With Random Effects for Dependent Failure Times

We assume the following conditions.

(D1) The parameter value (β0ᵀ, γ0ᵀ)ᵀ belongs to the interior of a compact set Θ in Rᵈ, and Λ′0k(t) > 0 for all t ∈ [0, τ], k = 1, …, K.

(D2) With probability one, Zikl(·) and Z̃ikl(·) are in BV[0, τ] and are left-continuous with bounded left- and right-derivatives in [0, τ].

(D3) With probability one, P(Cikl ≥ τ | Zikl) > δ0 > 0 for some constant δ0.

(D4) With probability one, nik is bounded by some integer n0. In addition, E[Nik·(τ)] < ∞.

(D5) For k = 1, …, K, Gk(x) is four-times differentiable with Gk(0) = 0 and G′k(x) > 0, and for any integer m ≥ 0 and any sequence 0 < x1 < … < xm ≤ y,

$$\prod_{l=1}^m\{(1+x_l)G_k'(x_l)\}\exp\{-G_k(y)\}\le\mu_{0k}^m(1+y)^{-\kappa_{0k}}$$

for some constants μ0k and κ0k > 0. In addition, there exists a constant ρ0k such that

$$\sup_x\bigg\{\frac{|G_k''(x)|+|G_k^{(3)}(x)|+|G_k^{(4)}(x)|}{G_k'(x)}\,(1+x)^{-\rho_{0k}}\bigg\}<\infty.$$

(D6) For any constant a1 > 0,

$$\sup_\gamma E\bigg[\int_b\exp\{a_1(N_{ik\cdot}(\tau)+1)\|b\|\}\,f(b;\gamma)\,db\bigg]<\infty,$$

and there exists a constant a2 > 0 such that for any γ,

$$\bigg|\frac{\dot f_\gamma(b;\gamma)}{f(b;\gamma)}\bigg|+\bigg|\frac{\ddot f_\gamma(b;\gamma)}{f(b;\gamma)}\bigg|+\bigg|\frac{f_\gamma^{(3)}(b;\gamma)}{f(b;\gamma)}\bigg|\le O(1)\exp\{a_2(1+\|b\|)\}.$$

(D7) Consider two types of events: k ∈ 𝒦1 indicates that event k is recurrent, and k ∈ 𝒦2 indicates that event k is a survival time. For k ∈ 𝒦1 ∪ 𝒦2, if there exist ck(t) and v such that, with probability 1, ck(t) + vᵀZikl(t) = 0 for k ∈ 𝒦1 and ck(0) + vᵀZikl(0) = 0 for k ∈ 𝒦2, then v = 0.

(D8) If there exist constants αk and α0k such that, for any subset Lk ⊆ {1, …, nik} and for any ωkl and tkl,

$$\int_b\prod_{k\in\mathcal K_1}\prod_{l=1}^{n_{ik}}\exp\{i\omega_{kl}b^T\tilde Z_{ikl}(t_{kl})\}\prod_{k\in\mathcal K_2}\prod_{l\in L_k}\exp\{\alpha_k+b^T\tilde Z_{ikl}(0)\}\,f(b;\gamma)\,db=\int_b\prod_{k\in\mathcal K_1}\prod_{l=1}^{n_{ik}}\exp\{i\omega_{kl}b^T\tilde Z_{ikl}(t_{kl})\}\prod_{k\in\mathcal K_2}\prod_{l\in L_k}\exp\{\alpha_{0k}+b^T\tilde Z_{ikl}(0)\}\,f(b;\gamma_0)\,db,$$

then γ = γ0. In addition, if for k ∈ 𝒦2 and for any t,

$$\int_b\exp\Big\{-G_k\Big(\int_0^te^{b^T\tilde Z_{ikl}(s)}\,d\Lambda_1(s)\Big)\Big\}f(b;\gamma_0)\,db=\int_b\exp\Big\{-G_k\Big(\int_0^te^{b^T\tilde Z_{ikl}(s)}\,d\Lambda_2(s)\Big)\Big\}f(b;\gamma_0)\,db,$$

then Λ1 = Λ2. Furthermore, if for some vector v and constant αk,

$$I(k\in\mathcal K_1)\int_be^{2b^T\tilde Z_{ikl}(0)}\,\dot f(b;\gamma_0)^Tv\,db+I(k\in\mathcal K_2)\int_be^{b^T\tilde Z_{ikl}(0)}\big\{\alpha_kf(b;\gamma_0)-\dot f(b;\gamma_0)^Tv\big\}\,db=0,$$

then v = 0.

(D1)–(D4) are standard conditions for this type of problem. We show that (D5) holds for all commonly used transformations. We first consider the class of logarithmic transformations G(x) = ρ log(1 + rx) (ρ > 0, r > 0). Clearly,

$$\prod_{k=1}^m\{(1+x_k)G'(x_k)\}\exp\{-G(y)\}\le\prod_{k=1}^m\bigg\{\frac{\rho r(1+x_k)}{1+rx_k}\bigg\}(1+ry)^{-\rho}\le\{\rho r(1+1/r)\}^m\min(1,r)^{-\rho}(1+y)^{-\rho}.$$

Thus, in (D5), we can set μ0 to ρr(1 + 1/r) min(1, r)^{−ρ} and κ0 to ρ. We can verify the polynomial bounds for G″(x)/G′(x), G⁽³⁾(x)/G′(x) and G⁽⁴⁾(x)/G′(x) by direct calculations. We next consider the class of Box–Cox transformations G(x) = {(1 + x)^ρ − 1}/ρ. Clearly,

$$\prod_{k=1}^m\{(1+x_k)G'(x_k)\}\exp\{-G(y)\}\le\prod_{k=1}^m(1+x_k)^{\rho}\exp[-\{(1+y)^{\rho}-1\}/\rho]\le(1+y)^{m\rho}\exp\{-(1+y)^{\rho}/2\rho\}\exp\{-(1+y)^{\rho}/2\rho\}\exp(1/\rho)\le\{4\rho+\exp(1/\rho)\}^m(1+y)^{-\rho}.$$

Thus, we can set μ0 to 4ρ + exp(1/ρ) and κ0 to ρ. The polynomial bounds for G″(x)/G′(x), G⁽³⁾(x)/G′(x) and G⁽⁴⁾(x)/G′(x) hold naturally. Finally, we consider the linear transformation model H(T) = βᵀZ + ε, where ε is standard normal. In this case, G(x) = −log{1 − Φ(log x)}, where Φ is the standard normal distribution function. We claim that there exists a constant ν0 > 0 such that φ(x) ≤ ν0{1 − Φ(x)}(1 + |x|). If x < 0, then φ(x) ≤ (2π)^{−1/2} ≤ 2(2π)^{−1/2}{1 − Φ(x)}(1 + |x|). If x ≥ 0,

$$\lim_{x\to0}\frac{\varphi(x)}{\{1-\Phi(x)\}(1+x)}=2(2\pi)^{-1/2}.$$

By l'Hôpital's rule,

$$\lim_{x\to\infty}\frac{1-\Phi(x)}{\varphi(x)}=\lim_{x\to\infty}\frac{\varphi(x)}{x\varphi(x)}=0,\qquad\lim_{x\to\infty}\frac{\varphi(x)}{\{1-\Phi(x)\}(1+x)}=\lim_{x\to\infty}\frac{x\varphi(x)}{\varphi(x)(1+x)-\{1-\Phi(x)\}}=\lim_{x\to\infty}\frac{1}{(1+x)/x-\{1-\Phi(x)\}/\{x\varphi(x)\}}=1.$$

Therefore, φ(x)/[{1 − Φ(x)}(1 + x)] is bounded for x ≥ 0. Without loss of generality, assume that y > 1. Clearly,

$$\prod_{k=1}^m\{(1+x_k)G'(x_k)\}\exp\{-G(y)\}=\prod_{k=1}^m\bigg\{\frac{(1+x_k)\varphi(\log x_k)/x_k}{1-\Phi(\log x_k)}\bigg\}\{1-\Phi(\log y)\}.$$

Since (1 + x)φ(log x)/[x{1 − Φ(log x)}] is bounded when x is close to zero and is bounded by a multiple of (1 + log x) when x is close to ∞, we have (1 + x)φ(log x)/[x{1 − Φ(log x)}] ≤ ν01 + ν02 log(1 + x) for two constants ν01 and ν02. Therefore,

$$\prod_{k=1}^m\{(1+x_k)G'(x_k)\}\exp\{-G(y)\}\le\{\nu_{01}+\nu_{02}\log(1+y)\}^m\{1-\Phi(\log y)\}.$$

Since $1-\Phi(x)\le2^{1/2}\exp(-x^2/4)$ when x > 0, the above expression is bounded by

$$2^{1/2}\{\nu_{01}+\nu_{02}\log(1+y)\}^m\exp\{-(\log y)^2/4\}\le\nu_{03}\{\nu_{01}+\nu_{02}\log(1+y)\}^m\exp\{-\nu_{04}(\log(1+y))^2\}\le\nu_{05}^m(1+y)^{-\nu_{04}/2},$$

where all the ν’s are positive constants. The polynomial bounds for G″ (x)/G(x), G(3)(x)/G(x) and G(4)(x)/G(x) follow from the fact that φ(x)={1 − Φ (x)} ≤ O(1 + |x|).

Condition (D6) pertains to the tail behavior of the density function f(b; γ) for the random effects. For survival data, Nik·(τ) ≤ 1, so that the first half of condition (D6) is tantamount to the requirement that the moment generating function of ‖b‖ exist everywhere. This condition holds naturally when b has a compact support or a Gaussian density tail. The second half of condition (D6) clearly holds for Gaussian density functions.
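For a concrete illustration of the Gaussian case (a scalar standard normal b and a fixed value a standing in for a1(Nik·(τ) + 1), both our assumptions), the integral in the first half of (D6) has the closed form ∫ e^{a|b|}φ(b) db = 2e^{a²/2}Φ(a), which a direct quadrature reproduces:

```python
import math

def phi(b):
    """Standard normal density."""
    return math.exp(-0.5 * b * b) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard normal distribution function via erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

a = 3.0                                   # stands in for a1 * (N + 1)
lo, hi, n = -40.0, 40.0, 200000           # wide grid; integrand negligible outside
h = (hi - lo) / n
numeric = h * sum(math.exp(a * abs(lo + i * h)) * phi(lo + i * h)
                  for i in range(n + 1))
closed_form = 2.0 * math.exp(a * a / 2.0) * Phi(a)
```

The finiteness of this integral for every a is exactly the "moment generating function exists everywhere" requirement; a gamma-distributed b fails it, which is why Remark 3 below treats the gamma frailty model separately.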

(D7) and (D8) are sufficient conditions to ensure parameter identifiability and non-singularity of the Fisher information matrix. In most applications, these conditions amount to the linear independence of covariates and the unique parametrization of the random-effects distribution. Specifically, if Z̃ikl is time-independent, then the second condition in (D8) is not necessary; if Z̃ikl does not depend on k and l, and b has a normal distribution, then the other two conditions in (D8) hold as well provided that Z̃ikl is linearly independent with positive probability; if Z̃ikl is time-independent and 𝒦1 is non-empty (i.e., at least one event is recurrent), then (D8) can be replaced by the linear independence of Z̃ikl for some k ∈ 𝒦1 and the unique parametrization of f(b; γ).

We wish to show that (D1)–(D8) imply (C1)–(C8), so that the desired asymptotic properties hold. Conditions (C1) and (C2) follow naturally from (D1)–(D4). To verify (C3), we note that

$$\Psi(\mathcal O_i;\theta,\mathcal A)=\int_b\prod_{k=1}^K\prod_{l=1}^{n_{ik}}\Omega_{ikl}(b;\beta,\Lambda_k)\,f(b;\gamma)\,db,$$

where

$$\Omega_{ikl}(b;\beta,\Lambda_k)=\prod_{t\le\tau}\big\{R_{ikl}(t)\,e^{\beta^TZ_{ikl}(t)+b^T\tilde Z_{ikl}(t)}\,G_k'(q_{ikl}(t))\big\}^{dN_{ikl}(t)}\exp\{-G_k(q_{ikl}(\tau))\},$$

and $q_{ikl}(t)=\int_0^tR_{ikl}(s)\exp\{\beta^TZ_{ikl}(s)+b^T\tilde Z_{ikl}(s)\}\,d\Lambda_k(s)$.

If the ‖Λk‖V[0,τ] are bounded, then Ωikl(b; β, Λk) ≥ exp{−O(1)Nikl(τ)}I(‖b‖ ≤ B0) for any fixed constant B0 such that P(‖b‖ ≤ B0) > 0. Thus, Ψ(𝒪i; θ, 𝒜) is bounded from below by a constant multiple of exp{−O(1)Σk,l Nikl(τ)}, so that the second half of (C3) holds. It follows from (D5) that

$$\Omega_{ikl}(b;\beta,\Lambda_k)\le O(1)\prod_{t\le\tau}\big\{R_{ikl}(t)\,e^{b^T\tilde Z_{ikl}(t)}\big\}^{dN_{ikl}(t)}\,\mu_{0k}^{N_{ikl}(\tau)}\prod_{t\le\tau}\{1+q_{ikl}(t)\}^{-dN_{ikl}(t)}\{1+q_{ikl}(\tau)\}^{-\kappa_{0k}}.$$

Since exp{βᵀZikl(s) + bᵀZ̃ikl(s)} ≥ exp{−O(1 + ‖b‖)}, we have $1+q_{ikl}(t)\ge e^{-O(1+\|b\|)}\{1+\int_0^tR_{ik\cdot}(s)\,d\Lambda_k(s)\}$, so that

$$\Omega_{ikl}(b;\beta,\Lambda_k)\le O(1)\,\mu_{0k}^{N_{ikl}(\tau)}\,e^{O(1+N_{ikl}(\tau))\|b\|}\prod_{t\le\tau}\Big\{1+\int_0^tR_{ik\cdot}(s)\,d\Lambda_k(s)\Big\}^{-dN_{ikl}(t)}\Big\{1+\int_0^\tau R_{ikl}(s)\,d\Lambda_k(s)\Big\}^{-\kappa_{0k}}.$$

Thus, the first half of (C3) holds as well.

We now verify (C4). Under (D5),

$$\Omega_{ikl}(b;\beta,\Lambda_k)\le\exp\{O(1+N_{ikl}(\tau))\|b\|\},$$
$$\begin{aligned}\big|\nabla_\beta\Omega_{ikl}(b;\beta,\Lambda_k)\big|&=\bigg|\Omega_{ikl}(b;\beta,\Lambda_k)\bigg[\int\bigg\{R_{ikl}(t)Z_{ikl}(t)+R_{ikl}(t)\frac{G_k''(q_{ikl}(t))}{G_k'(q_{ikl}(t))}\int_0^tR_{ikl}(s)\,e^{\beta^TZ_{ikl}(s)+b^T\tilde Z_{ikl}(s)}Z_{ikl}(s)\,d\Lambda_k(s)\bigg\}\,dN_{ikl}(t)\\ &\qquad-G_k'(q_{ikl}(\tau))\int_0^\tau R_{ikl}(s)\,e^{\beta^TZ_{ikl}(s)+b^T\tilde Z_{ikl}(s)}Z_{ikl}(s)\,d\Lambda_k(s)\bigg]\bigg|\le\exp\{O(1+N_{ikl}(\tau))(1+\|b\|)\},\end{aligned}$$
$$\begin{aligned}\big|\nabla_{\Lambda_k}\Omega_{ikl}(b;\beta,\Lambda_k)[H_k]\big|&=\bigg|\Omega_{ikl}(b;\beta,\Lambda_k)\bigg[\int R_{ikl}(t)\frac{G_k''(q_{ikl}(t))}{G_k'(q_{ikl}(t))}\bigg\{\int_0^tR_{ikl}(s)\,e^{\beta^TZ_{ikl}(s)+b^T\tilde Z_{ikl}(s)}\,dH_k(s)\bigg\}\,dN_{ikl}(t)\\ &\qquad-G_k'(q_{ikl}(\tau))\int_0^\tau R_{ikl}(s)\,e^{\beta^TZ_{ikl}(s)+b^T\tilde Z_{ikl}(s)}\,dH_k(s)\bigg]\bigg|\le\exp\{O(1+N_{ikl}(\tau))(1+\|b\|)\}.\end{aligned}$$

Thus, it follows from the Mean-Value Theorem that

$$\big|\Omega_{ikl}(b;\beta^{(1)},\Lambda_k)-\Omega_{ikl}(b;\beta^{(2)},\Lambda_k)\big|=\big|\nabla_\beta\Omega_{ikl}(b;\beta^*,\Lambda_k)\big|\,\big\|\beta^{(1)}-\beta^{(2)}\big\|\le\exp\{O(1+N_{ikl}(\tau))\|b\|\}\,\big\|\beta^{(1)}-\beta^{(2)}\big\|,$$
$$\begin{aligned}\big|\Omega_{ikl}(b;\beta,\Lambda_k^{(1)})-\Omega_{ikl}(b;\beta,\Lambda_k^{(2)})\big|&=\big|\nabla_{\Lambda_k}\Omega_{ikl}(b;\beta,\Lambda_k^*)\big[\Lambda_k^{(1)}-\Lambda_k^{(2)}\big]\big|\\ &\le\exp\{O(1+N_{ikl}(\tau))\|b\|\}\bigg\{\int R_{ikl}(t)\Big|\int_0^te^{\beta^TZ_{ikl}(s)+b^T\tilde Z_{ikl}(s)}\,d(\Lambda_k^{(1)}-\Lambda_k^{(2)})(s)\Big|\,dN_{ikl}(t)\\ &\qquad+\Big|\int_0^\tau R_{ikl}(s)\,e^{\beta^TZ_{ikl}(s)+b^T\tilde Z_{ikl}(s)}\,d(\Lambda_k^{(1)}-\Lambda_k^{(2)})(s)\Big|\bigg\}\\ &\le\exp\{O(1+N_{ikl}(\tau))(1+\|b\|)\}\bigg\{\int R_{ikl}(t)\,\big|\Lambda_k^{(1)}(t)-\Lambda_k^{(2)}(t)\big|\,dN_{ikl}(t)+\int_0^\tau\big|\Lambda_k^{(1)}(s)-\Lambda_k^{(2)}(s)\big|\,ds\bigg\},$$

where β* and Λk* are intermediate values and the last inequality follows from integration by parts and the fact that Zikl(t) and Z̃ikl(t) have bounded variations. It then follows from (D6) that |Ψ(𝒪i; θ⁽¹⁾, 𝒜⁽¹⁾) − Ψ(𝒪i; θ⁽²⁾, 𝒜⁽²⁾)| is bounded by the right-hand side of the inequality in (C4). By the same arguments, we can verify the bounds for the other three terms in (C4).

To verify (C6), we calculate that

$$\eta_{0k}(s;\theta,\mathcal A)=E\Bigg[\int_b\frac{\prod_{m=1}^K\prod_{l'=1}^{n_{im}}\Omega_{iml'}(b;\beta,\Lambda_m)\,f(b;\gamma)}{\int_b\prod_{m=1}^K\prod_{l'=1}^{n_{im}}\Omega_{iml'}(b;\beta,\Lambda_m)\,f(b;\gamma)\,db}\sum_{l=1}^{n_{ik}}\bigg\{\int_{t\ge s}\frac{G_k''(q_{ikl}(t))}{G_k'(q_{ikl}(t))}\,dN_{ikl}(t)-G_k'(q_{ikl}(\tau))\bigg\}R_{ikl}(s)\,e^{\beta^TZ_{ikl}(s)+b^T\tilde Z_{ikl}(s)}\,db\Bigg].$$

For (θ, 𝒜) in a neighborhood of (θ0, 𝒜0),

$$\bigg|\eta_{0k}(s;\theta,\mathcal A)-\eta_{0k}(s;\theta_0,\mathcal A_0)-\nabla_\theta\eta_{0k}(s;\theta_0,\mathcal A_0)^T(\theta-\theta_0)-\sum_{m=1}^K\nabla_{\Lambda_m}\eta_{0k}(s;\theta_0,\mathcal A_0)[\Lambda_m-\Lambda_{0m}]\bigg|=o\bigg(\|\theta-\theta_0\|+\sum_{m=1}^K\|\Lambda_m-\Lambda_{0m}\|_{V[0,\tau]}\bigg).$$

Thus, for the second equation in (C6), η0km(s, t; θ0, 𝒜0) is obtained from the derivative of η0k with respect to Λm along the direction Λm − Λ0m, and the derivative of η0k with respect to θ gives the term multiplying (θ − θ0). Likewise, we can obtain the first equation in (C6). It is straightforward to verify the Lipschitz continuity of η0km.

The verification of (C8) is similar to that of (C4), relying on the explicit expressions of Ψ̈θθ(𝒪; θ, 𝒜) and the first and second derivatives of Ψ(𝒪; θ, 𝒜 + εℋ) with respect to ε.

It remains to verify the two identifiability conditions under (D7) and (D8). To verify (C5), suppose that (β, γ, Λ1, …, ΛK) yields the same likelihood as (β0, γ0, Λ01, …, Λ0K). That is,

$$\int_b\prod_{k=1}^K\prod_{l=1}^{n_{ik}}\prod_{t\le\tau}\{\lambda_k(t)\}^{dN_{ikl}(t)}\,\Omega_{ikl}(b;\beta,\Lambda_k)\,f(b;\gamma)\,db=\int_b\prod_{k=1}^K\prod_{l=1}^{n_{ik}}\prod_{t\le\tau}\{\lambda_{0k}(t)\}^{dN_{ikl}(t)}\,\Omega_{ikl}(b;\beta_0,\Lambda_{0k})\,f(b;\gamma_0)\,db.$$

We perform the following operations on both sides sequentially for k = 1, …, K and l = 1, …, nik.

(a) If the kth type of event pertains to survival time, then for the lth subject of this type of event, the first equation is obtained with Rikl(t) = 1 and dNikl(t) = 0 for all t ≤ τ, i.e., the subject does not experience any event in [0, τ]. The second equation is obtained by integrating t from tkl to τ on both sides under the scenario that Rikl(t) = 1 and Nikl(t) has a jump at t, i.e., the subject experiences the event at some time t ∈ [tkl, τ]. We then take the difference between these two equations. In the resulting equation, the terms $\{\lambda_k(t)\}^{dN_{ikl}(t)}\Omega_{ikl}(b;\beta,\Lambda_k)$ and $\{\lambda_{0k}(t)\}^{dN_{ikl}(t)}\Omega_{ikl}(b;\beta_0,\Lambda_{0k})$ are replaced by $\exp\{-G_k(\int_0^{t_{kl}}\exp\{\beta^TZ_{ikl}(s)+b^T\tilde Z_{ikl}(s)\}\,d\Lambda_k(s))\}$ and $\exp\{-G_k(\int_0^{t_{kl}}\exp\{\beta_0^TZ_{ikl}(s)+b^T\tilde Z_{ikl}(s)\}\,d\Lambda_{0k}(s))\}$, respectively.

(b) If the kth type of event is recurrent, then for the lth subject of this type of event, we let Rikl(t) = 1 and let Nikl(t) have jumps at s1, …, sm and s′1, …, s′m′ for arbitrary (m + m′) times in [0, τ]. We integrate s1, …, sm from 0 to tkl and integrate s′1, …, s′m′ from 0 to τ. In the resulting equation, $\{\lambda_k(t)\}^{dN_{ikl}(t)}\Omega_{ikl}(b;\beta,\Lambda_k)$ is replaced by $\{G_k(q_{ikl}(t_{kl}))\}^m\{G_k(q_{ikl}(\tau))\}^{m'}\exp\{-G_k(q_{ikl}(\tau))\}$ on both sides. Note that m and m′ are arbitrary. We then multiply both sides by $\{(i\omega_{kl})^m/m!\}/m'!$ and sum over m, m′ = 0, 1, …. On both sides of the resulting equation, the terms associated with k and l are replaced by $\exp\{i\omega_{kl}G_k(q_{ikl}(t_{kl}))\}$.

After these sequential operations, we obtain

$$\int_b\prod_{k\in\mathcal K_1}\prod_{l=1}^{n_{ik}}\exp\{i\omega_{kl}G_k(q_{ikl}(t_{kl}))\}\prod_{k\in\mathcal K_2}\prod_{l=1}^{n_{ik}}\exp\{-G_k(q_{ikl}(t_{kl}))\}\,f(b;\gamma)\,db=\int_b\prod_{k\in\mathcal K_1}\prod_{l=1}^{n_{ik}}\exp\{i\omega_{kl}G_k(q_{ikl}^0(t_{kl}))\}\prod_{k\in\mathcal K_2}\prod_{l=1}^{n_{ik}}\exp\{-G_k(q_{ikl}^0(t_{kl}))\}\,f(b;\gamma_0)\,db.$$

For survival times, we can let any subject among the nik subjects have tkl = 0, which results in

$$\int_b\prod_{k\in\mathcal K_1}\prod_{l=1}^{n_{ik}}\exp\{i\omega_{kl}G_k(q_{ikl}(t_{kl}))\}\prod_{k\in\mathcal K_2}\prod_{l=1}^{n_{ik}}\big[1-\xi_{kl}+\xi_{kl}\exp\{-G_k(q_{ikl}(t_{kl}))\}\big]\,f(b;\gamma)\,db=\int_b\prod_{k\in\mathcal K_1}\prod_{l=1}^{n_{ik}}\exp\{i\omega_{kl}G_k(q_{ikl}^0(t_{kl}))\}\prod_{k\in\mathcal K_2}\prod_{l=1}^{n_{ik}}\big[1-\xi_{kl}+\xi_{kl}\exp\{-G_k(q_{ikl}^0(t_{kl}))\}\big]\,f(b;\gamma_0)\,db,$$

where ξkl is any positive variable.

The above expression implies that {Gk(qikl(t)), k ∈ 𝒦1}, as a function of a random variable b1 with density proportional to

$$\prod_{k\in\mathcal K_2}\prod_{l=1}^{n_{ik}}\big[1-\xi_{kl}+\xi_{kl}\exp\{-G_k(q_{ikl}(t_{kl}))\}\big]\,f(b;\gamma),$$

has the same distribution as {Gk(q0ikl(t)), k ∈ 𝒦1} as a function of a random variable b2 with density proportional to

$$\prod_{k\in\mathcal K_2}\prod_{l=1}^{n_{ik}}\big[1-\xi_{kl}+\xi_{kl}\exp\{-G_k(q_{ikl}^0(t_{kl}))\}\big]\,f(b;\gamma_0).$$

The same is then true of {qikl(t)} and {q0ikl(t)} because of the one-to-one mapping. Thus, the distributions of {log q̇ikl(t)} and {log q̇0ikl(t)} should also agree, and in particular they have the same expectation. Now let tkl = 0 for k ∈ 𝒦2. Since E[b1] = E[b2] = 0, we obtain log λk(t) + βᵀZikl(t) = log λ0k(t) + β0ᵀZikl(t) for k ∈ 𝒦1. The above arguments also yield

$$\int_b\prod_{k\in\mathcal K_1}\prod_{l=1}^{n_{ik}}\exp\{i\omega_{kl}b^T\tilde Z_{ikl}(t_{kl})\}\prod_{k\in\mathcal K_2}\prod_{l=1}^{n_{ik}}\big[1-\xi_{kl}+\xi_{kl}\exp\{-G_k(q_{ikl}(t_{kl}))\}\big]\,f(b;\gamma)\,db=\int_b\prod_{k\in\mathcal K_1}\prod_{l=1}^{n_{ik}}\exp\{i\omega_{kl}b^T\tilde Z_{ikl}(t_{kl})\}\prod_{k\in\mathcal K_2}\prod_{l=1}^{n_{ik}}\big[1-\xi_{kl}+\xi_{kl}\exp\{-G_k(q_{ikl}^0(t_{kl}))\}\big]\,f(b;\gamma_0)\,db.$$

We compare the coefficients of the ξkl for k ∈ 𝒦2. This yields that, for any subset Lk ⊆ {1, …, nik},

$$\int_b\prod_{k\in\mathcal K_1}\prod_{l=1}^{n_{ik}}\exp\{i\omega_{kl}b^T\tilde Z_{ikl}(t_{kl})\}\prod_{k\in\mathcal K_2}\prod_{l\in L_k}\exp\{-G_k(q_{ikl}(t_{kl}))\}\,f(b;\gamma)\,db=\int_b\prod_{k\in\mathcal K_1}\prod_{l=1}^{n_{ik}}\exp\{i\omega_{kl}b^T\tilde Z_{ikl}(t_{kl})\}\prod_{k\in\mathcal K_2}\prod_{l\in L_k}\exp\{-G_k(q_{ikl}^0(t_{kl}))\}\,f(b;\gamma_0)\,db.$$

We differentiate the above expression with respect to tkl at 0 for k ∈ 𝒦2. It then follows from (D8) that log λk(0) − log λ0k(0) + (β − β0)ᵀZikl(0) = 0 and γ = γ0. Thus, (D7) implies that β = β0 and λk(t) = λ0k(t) for k ∈ 𝒦1. On the other hand, for any fixed k ∈ 𝒦2 and l, we let tk′l′ = 0 if k′ ≠ k or l′ ≠ l. Thus, $\int_b\exp\{-G_k(q_{ikl}(t_{kl}))\}f(b;\gamma_0)\,db=\int_b\exp\{-G_k(q_{ikl}^0(t_{kl}))\}f(b;\gamma_0)\,db$. Therefore, Λk = Λ0k for k ∈ 𝒦2 according to (D8).

To verify (C7), we write v = (vβᵀ, vγᵀ)ᵀ and perform operations (a) and (b) on the score equation in (C7). The arguments used in proving identifiability yield

$$\int_b\bigg[\sum_{k\in\mathcal K_1}\sum_{l=1}^{n_{ik}}i\omega_{kl}A_{ikl}(t_{kl})-\sum_{k\in\mathcal K_2}\sum_{l\in L_k}A_{ikl}(t_{kl})+\frac{\dot f(b;\gamma_0)^Tv_\gamma}{f(b;\gamma_0)}\bigg]\exp\bigg\{\sum_{k\in\mathcal K_1}\sum_{l=1}^{n_{ik}}i\omega_{kl}G_k(q_{ikl}^0(t_{kl}))-\sum_{k\in\mathcal K_2}\sum_{l\in L_k}G_k(q_{ikl}^0(t_{kl}))\bigg\}f(b;\gamma_0)\,db=0,\qquad(12)$$

where $A_{ikl}(t)=G_k'(q_{ikl}^0(t))\int_0^t\{h_k(s)+Z_{ikl}(s)^Tv_\beta\}e^{\beta_0^TZ_{ikl}(s)+b^T\tilde Z_{ikl}(s)}\,d\Lambda_{0k}(s)$. We differentiate (12) with respect to tkl twice at 0 for k ∈ 𝒦1. Comparison of the coefficients of ωkl yields $\int_be^{2b^T\tilde Z_{ikl}(0)}\dot f(b;\gamma_0)^Tv_\gamma\,db=0$. We also differentiate (12) with respect to tkl at 0 for k ∈ 𝒦2. Thus, for each k ∈ 𝒦2 and l = 1, …, nik, $\int_b\{h_k(0)+Z_{ikl}(0)^Tv_\beta\}e^{b^T\tilde Z_{ikl}(0)}f(b;\gamma_0)\,db=-\int_be^{b^T\tilde Z_{ikl}(0)}\dot f(b;\gamma_0)^Tv_\gamma\,db$. It then follows from (D8) that vγ = 0. For fixed k0 and l0, with vγ = 0, the score equation under operations (a) and (b), where in (a) we let dNik′l′(t) = 0 for all t ≤ τ and in (b) we let m = 0 whenever k ≠ k0 or l ≠ l0, becomes a homogeneous integral equation for hk0(t) + Zik0l0(t)ᵀvβ. The equation has only the trivial solution, so hk0(t) + Zik0l0(t)ᵀvβ = 0. Since k0 and l0 are arbitrary, (D7) implies that hk = 0 and vβ = 0.

Remark 2

For survival time, (D5) is required to hold only for m = 0 and m = 1.

Remark 3

The above results do not apply directly to the proportional hazards model with gamma frailty because (D6) does not hold when b has a gamma distribution. This model is nonetheless mathematically convenient to handle because the marginal likelihood has an explicit form; it is a special case of ours with

$$\Psi(\mathcal O_i;\theta,\Lambda)=\prod_{j=1}^{n_i}\prod_{t\le\tau}\{Y_{ij}(t;\beta)\}^{dN_{ij}(t)}\prod_{t\le\tau}\{1+\theta N_{i\cdot}(t-)\}^{dN_{i\cdot}(t)}\Big\{1+\theta\int_0^\tau Y_{i\cdot}(u;\beta)\,d\Lambda(u)\Big\}^{-(1/\theta+N_{i\cdot}(\tau))}$$

in the notation of Parner (1998). Clearly, Ψ satisfies (C3) when θ > 0. The other conditions can be verified in the same manner as before.

Remark 4

Our theory does not cover the case in which the true parameter values lie on the boundary of Θ. It is delicate to deal with the boundary problem. One possible solution is to follow the idea of Parner (1998) by extending the definition of the likelihood function outside Θ and verifying (C2)–(C8) for the extended likelihood function.

Remark 5

We have assumed known transformations. We may allow Gk to belong to a parametric family of distributions, say Gk(·; φ), where φ is a parameter in a compact set. Then θ contains φ. Our results and proofs apply to this situation if (D5) holds uniformly in φ and the two identifiability conditions are satisfied.

10.2. Joint Models for Repeated Measures and Failure Times

For the (parametric) generalized linear mixed model, the likelihood can be viewed as a special case of that of Section 10.1 except that there is an additional parameter α in f(y|x; b). We assume that (D1)–(D8) hold but with (D6) replaced by the following condition.

(D6′) For any constant a1 > 0,

$$\sup_{\alpha,\gamma}E\bigg[\int_b\exp\{a_1(N_i(\tau)+1)\|b\|\}\prod_{j=1}^{n_i}f(Y_{ij}\mid X_{ij};b)\,f(b;\gamma)\,db\bigg]<\infty,$$

and there exists a constant a2 > 0 such that for any γ and α,

$$\sum_{k=1}^3\bigg\{\bigg|\frac{f_\alpha^{(k)}(Y_{ij}\mid X_{ij},b)}{f(Y_{ij}\mid X_{ij},b)}\bigg|+\bigg|\frac{f_\gamma^{(k)}(b;\gamma)}{f(b;\gamma)}\bigg|\bigg\}\le r_3(\mathcal O_i)\exp\{a_2(1+\|b\|)\}$$

almost surely, where r3(𝒪i) is a random variable in L2(P).

Under these conditions, the desired asymptotic properties follow from the arguments of Section 10.1.

Under the semiparametric linear transformation model for continuous repeated measures, the likelihood takes the form of that of Section 2.2 with K = 2 and ni2 = ni, where the time to the second type of failure is defined by Yij (assuming, without loss of generality, that Yij ≥ 0). Thus, if we regard Yij as a right-censored observation when it is greater than a very large value (i.e., the upper limit of detection), then the asymptotic results of Section 10.1 hold. When such an upper limit does not exist, the estimator for Λ̃ can be unbounded as the sample size goes to infinity, and our proof of Theorem 1 does not apply.

10.3. Transformation Models for Counting Processes

We verify (C1)–(C8) under the following conditions.

(E1) The parameter value (β0ᵀ, γ0ᵀ)ᵀ belongs to the interior of a compact set Θ in Rᵈ, and Λ′0(t) > 0 for all t ∈ [0, τ].

(E2) With probability one, P(C ≥ τ | Z) > δ0 > 0 for some constant δ0.

(E3) Condition (D5) holds.

(E4) With probability one, Z(·) and Z̃ are in BV[0, τ] and are left-continuous with bounded left- and right-derivatives in [0, τ].

(E5) If γᵀZ̃ is equal to a constant with probability one, then γ = 0. In addition, if βᵀZ(t) = c(t) for a deterministic function c(t) with probability one, then β = 0.

In this case,

$$\Psi(\mathcal O_i;\theta,\Lambda)=\prod_{t\le\tau}\bigg(R_i(t)\,e^{\beta^TZ_i(t)+\gamma^T\tilde Z_i}\Big\{1+\int_0^tR_i(s)\,e^{\beta^TZ_i(s)}\,d\Lambda(s)\Big\}^{e^{\gamma^T\tilde Z_i}-1}G'\Big[\Big\{1+\int_0^tR_i(s)\,e^{\beta^TZ_i(s)}\,d\Lambda(s)\Big\}^{e^{\gamma^T\tilde Z_i}}\Big]\bigg)^{dN_i(t)}\exp\bigg(-G\Big[\Big\{1+\int_0^\tau R_i(s)\,e^{\beta^TZ_i(s)}\,d\Lambda(s)\Big\}^{e^{\gamma^T\tilde Z_i}}\Big]\bigg).$$

By (D5),

$$\begin{aligned}&\prod_{t\le\tau}\bigg(R_i(t)\,e^{\beta^TZ_i(t)+\gamma^T\tilde Z_i}\Big\{1+\int_0^tR_i(s)\,e^{\beta^TZ_i(s)}\,d\Lambda(s)\Big\}^{e^{\gamma^T\tilde Z_i}-1}G'\Big[\Big\{1+\int_0^tR_i(s)\,e^{\beta^TZ_i(s)}\,d\Lambda(s)\Big\}^{e^{\gamma^T\tilde Z_i}}\Big]\bigg)^{dN_i(t)}\exp\bigg(-G\Big[\Big\{1+\int_0^\tau R_i(s)\,e^{\beta^TZ_i(s)}\,d\Lambda(s)\Big\}^{e^{\gamma^T\tilde Z_i}}\Big]\bigg)\\ &\qquad\le\mu_1^{N_i(\tau)}\prod_{t\le\tau}\Big\{1+\int_0^tR_i(s)\,e^{\beta^TZ_i(s)}\,d\Lambda(s)\Big\}^{-dN_i(t)}\Big\{1+\int_0^\tau R_i(s)\,e^{\beta^TZ_i(s)}\,d\Lambda(s)\Big\}^{-\kappa e^{\gamma^T\tilde Z_i}}\end{aligned}$$

for some constant μ1. Thus, (C3) follows from the boundedness of γᵀZ̃i. We can verify the other conditions by using the arguments of Section 10.1.

To verify the first identifiability condition, we assume that Ni(t) has jumps at x̃, x1, …, xm for some integer m. After integrating both sides of the equation in (C5) over x1, …, xm from 0 to τ and integrating x̃ from x to τ, we obtain

$$\begin{aligned}&\Big(G\Big[\Big\{1+\int_0^\tau e^{\beta_0^TZ_i(t)}\,d\Lambda_0(t)\Big\}^{e^{\gamma_0^T\tilde Z_i}}\Big]-G\Big[\Big\{1+\int_0^xe^{\beta_0^TZ_i(t)}\,d\Lambda_0(t)\Big\}^{e^{\gamma_0^T\tilde Z_i}}\Big]\Big)\Big(G\Big[\Big\{1+\int_0^\tau e^{\beta_0^TZ_i(t)}\,d\Lambda_0(t)\Big\}^{e^{\gamma_0^T\tilde Z_i}}\Big]-G(1)\Big)^m\\ &\qquad\times\exp\Big(-G\Big[\Big\{1+\int_0^\tau e^{\beta_0^TZ_i(t)}\,d\Lambda_0(t)\Big\}^{e^{\gamma_0^T\tilde Z_i}}\Big]+G(1)\Big)\\ ={}&\Big(G\Big[\Big\{1+\int_0^\tau e^{\beta^{*T}Z_i(t)}\,d\Lambda^*(t)\Big\}^{e^{\gamma^{*T}\tilde Z_i}}\Big]-G\Big[\Big\{1+\int_0^xe^{\beta^{*T}Z_i(t)}\,d\Lambda^*(t)\Big\}^{e^{\gamma^{*T}\tilde Z_i}}\Big]\Big)\Big(G\Big[\Big\{1+\int_0^\tau e^{\beta^{*T}Z_i(t)}\,d\Lambda^*(t)\Big\}^{e^{\gamma^{*T}\tilde Z_i}}\Big]-G(1)\Big)^m\\ &\qquad\times\exp\Big(-G\Big[\Big\{1+\int_0^\tau e^{\beta^{*T}Z_i(t)}\,d\Lambda^*(t)\Big\}^{e^{\gamma^{*T}\tilde Z_i}}\Big]+G(1)\Big).\end{aligned}$$

Multiplying both sides of this equation by 1/m! and summing over m ≥ 0, we obtain

$$G\Big[\Big\{1+\int_0^\tau e^{\beta_0^TZ_i(t)}\,d\Lambda_0(t)\Big\}^{e^{\gamma_0^T\tilde Z_i}}\Big]-G\Big[\Big\{1+\int_0^xe^{\beta_0^TZ_i(t)}\,d\Lambda_0(t)\Big\}^{e^{\gamma_0^T\tilde Z_i}}\Big]=G\Big[\Big\{1+\int_0^\tau e^{\beta^{*T}Z_i(t)}\,d\Lambda^*(t)\Big\}^{e^{\gamma^{*T}\tilde Z_i}}\Big]-G\Big[\Big\{1+\int_0^xe^{\beta^{*T}Z_i(t)}\,d\Lambda^*(t)\Big\}^{e^{\gamma^{*T}\tilde Z_i}}\Big].$$

Setting Ni(τ)=0 in the likelihood function yields

$$G\Big[\Big\{1+\int_0^\tau e^{\beta_0^TZ_i(t)}\,d\Lambda_0(t)\Big\}^{e^{\gamma_0^T\tilde Z_i}}\Big]=G\Big[\Big\{1+\int_0^\tau e^{\beta^{*T}Z_i(t)}\,d\Lambda^*(t)\Big\}^{e^{\gamma^{*T}\tilde Z_i}}\Big].$$

Thus

$$\Big\{1+\int_0^xe^{\beta_0^TZ_i(t)}\,d\Lambda_0(t)\Big\}^{e^{\gamma_0^T\tilde Z_i}}=\Big\{1+\int_0^xe^{\beta^{*T}Z_i(t)}\,d\Lambda^*(t)\Big\}^{e^{\gamma^{*T}\tilde Z_i}}.$$

Then Λ*(t) is absolutely continuous with respect to t. Differentiating both sides with respect to x and letting x = 0 yields λ*(0) > 0. When x converges to zero, the left-hand side is $[\exp\{\beta_0^TZ_i(0)\}\lambda_0(0)x]^{e^{\gamma_0^T\tilde Z_i}}+o(x^{e^{\gamma_0^T\tilde Z_i}})$, while the right-hand side is $[\exp\{\beta^{*T}Z_i(0)\}\lambda^*(0)x]^{e^{\gamma^{*T}\tilde Z_i}}+o(x^{e^{\gamma^{*T}\tilde Z_i}})$. Thus, γ0ᵀZ̃i = γ*ᵀZ̃i. By (E5), γ0 = γ*. Furthermore, $e^{\beta_0^TZ_i(t)}\,d\Lambda_0(t)/dt=e^{\beta^{*T}Z_i(t)}\,d\Lambda^*(t)/dt$. It follows from (E5) that β0 = β* and Λ0 = Λ*.

To verify (C7), we assume that the score function along (β0 + εhβ, γ0 + εhγ, dΛ0 + εh dΛ0) is zero. Equivalently, if we let $g_0(t)=\{1+\int_0^te^{\beta_0^TZ_i(s)}\,d\Lambda_0(s)\}^{e^{\gamma_0^T\tilde Z_i}}$, then we obtain

$$\begin{aligned}0={}&\int h(t)R_i(t)\,dN_i(t)+\int R_i(t)\{h_\beta^TZ_i(t)+h_\gamma^T\tilde Z_i\}\,dN_i(t)\\ &+\int\frac{R_i(t)\,(e^{\gamma_0^T\tilde Z_i}-1)}{1+\int_0^te^{\beta_0^TZ_i(s)}\,d\Lambda_0(s)}\bigg[\int_0^te^{\beta_0^TZ_i(s)}\{h_\beta^TZ_i(s)+h(s)\}\,d\Lambda_0(s)\bigg]dN_i(t)\\ &+\int R_i(t)\,h_\gamma^T\tilde Z_i\,e^{\gamma_0^T\tilde Z_i}\log\Big\{1+\int_0^te^{\beta_0^TZ_i(s)}\,d\Lambda_0(s)\Big\}\,dN_i(t)\\ &+\int R_i(t)\,\frac{G''(g_0(t))}{G'(g_0(t))}\,g_0(t)\,h_\gamma^T\tilde Z_i\,e^{\gamma_0^T\tilde Z_i}\log\Big\{1+\int_0^te^{\beta_0^TZ_i(s)}\,d\Lambda_0(s)\Big\}\,dN_i(t)\\ &+\int R_i(t)\,\frac{G''(g_0(t))}{G'(g_0(t))}\,g_0(t)\bigg[\frac{e^{\gamma_0^T\tilde Z_i}\int_0^te^{\beta_0^TZ_i(s)}\{h_\beta^TZ_i(s)+h(s)\}\,d\Lambda_0(s)}{1+\int_0^te^{\beta_0^TZ_i(s)}\,d\Lambda_0(s)}\bigg]dN_i(t)\\ &-G'(g_0(\tau))\,g_0(\tau)\,h_\gamma^T\tilde Z_i\,e^{\gamma_0^T\tilde Z_i}\log\Big\{1+\int_0^\tau e^{\beta_0^TZ_i(s)}\,d\Lambda_0(s)\Big\}\\ &-G'(g_0(\tau))\,g_0(\tau)\,\frac{e^{\gamma_0^T\tilde Z_i}}{1+\int_0^\tau e^{\beta_0^TZ_i(s)}\,d\Lambda_0(s)}\int_0^\tau e^{\beta_0^TZ_i(s)}\{h_\beta^TZ_i(s)+h(s)\}\,d\Lambda_0(s).\end{aligned}$$

We multiply both sides by the likelihood function and let Ni(t) have jumps at times t1, t2, …, tm. We integrate t1 from 0 to t and tl, 1 < l ≤ m, from 0 to τ. Multiplying the resulting equation by 1/(m − 1)! and summing over m = 1, 2, …, we obtain

$$h_\gamma^T\tilde Z_i\log\Big\{1+\int_0^te^{\beta_0^TZ_i(s)}\,d\Lambda_0(s)\Big\}+\frac{\int_0^te^{\beta_0^TZ_i(s)}\{h_\beta^TZ_i(s)+h(s)\}\,d\Lambda_0(s)}{1+\int_0^te^{\beta_0^TZ_i(s)}\,d\Lambda_0(s)}=0.$$

Differentiation with respect to t then yields

$$h_\gamma^T\tilde Z_i+\{h_\beta^TZ_i(t)+h(t)\}-\frac{\int_0^te^{\beta_0^TZ_i(s)}\{h_\beta^TZ_i(s)+h(s)\}\,d\Lambda_0(s)}{1+\int_0^te^{\beta_0^TZ_i(s)}\,d\Lambda_0(s)}=0.$$

Combining the above two equations, we have

$$\{h_\beta^TZ_i(t)+h(t)\}-\frac{\int_0^te^{\beta_0^TZ_i(s)}\{h_\beta^TZ_i(s)+h(s)\}\,d\Lambda_0(s)}{1+\int_0^te^{\beta_0^TZ_i(s)}\,d\Lambda_0(s)}\bigg[1+\frac{1}{\log\{1+\int_0^te^{\beta_0^TZ_i(s)}\,d\Lambda_0(s)\}}\bigg]=0.$$

This is a homogeneous integral equation in hβᵀZi(t) + h(t) and has only the zero solution; that is, hβᵀZi(t) + h(t) = 0. It follows from (E5) that h(t) = 0 and hβ = 0. Thus, hγ = 0 as well.

11. Concluding Remarks

We have developed a general asymptotic theory for the NPMLEs with right censored data and shown that this theory applies to the models considered by Zeng and Lin (2007). This theory can also be used to establish the desired asymptotic properties for other existing semiparametric models, particularly the models mentioned in Sections 7.1–7.4 of Zeng and Lin (2007), as well as those that may be invented in the future. It is much simpler to verify the set of sufficient conditions identified in this paper than to prove the asymptotic results from scratch. Conditions (C1) and (C2) are standard conditions required in all censored-data regression; (C3), (C4) and (C6) are certain smoothness conditions that can be verified directly, as demonstrated in Section 10; (C5) and (C7) are two minimal identifiability conditions that need to be verified for any specific problem.

Although the basic structures of our proofs mimic those of Murphy (1994; 1995) and Parner (1998), our technical arguments are innovative and substantially more difficult because we deal with a very general form of likelihood function rather than specific problems. In all previous work, verification of the Donsker property relies on the specific expressions of the functions, whereas our Lemma 1 provides a universal way to verify this property. In verifying the invertibility of the information operator, all previous work requires an explicit expression of the information operator that is identified as the sum of an invertible operator and a compact operator, whereas we allow a very generic form of information operator obtained from the likelihood function (1). Murphy and van der Vaart (2001) stated that the consistency of NPMLEs needs to be proved on a case-by-case basis; however, we were able to prove the consistency for a very general likelihood function. Although we borrowed the partitioning idea of Murphy (1994), our technical arguments are very different because of the generic form of the likelihood.

In some applications, the failure times are subject to left truncation in addition to right censoring. To accommodate general censoring/truncation patterns, we define N(t) as the number of events observed by time t and R(t) as the at-risk indicator at time t, reflecting both left truncation and right censoring. Assume that the truncation time has positive mass at time 0, so that (C2) is satisfied. Then all the results continue to hold.

This paper is concerned with the theoretical aspects of the NPMLEs and complements the work of Zeng and Lin (2007). Interested readers are referred to the latter for the calculation of the NPMLEs and for the use of semiparametric regression models and NPMLEs in practice. The latter also provides the rationale for the kinds of models considered in Sections 2 and 10 of this paper. Although the latter contains some theoretical elements, this paper presents the theory (especially the regularity conditions) in a more rigorous manner and provides all the proofs.

Contributor Information

Donglin Zeng, Email: dzeng@bios.unc.edu.

D. Y. Lin, Email: lin@bios.unc.edu.

References

  1. Heitjan DF, Rubin DB. Ignorability and coarse data. Ann Statist. 1991;19:2244–2253. [Google Scholar]
  2. Murphy SA. Consistency in a proportional hazards model incorporating a random effect. Ann Statist. 1994;22:712–731. [Google Scholar]
  3. Murphy SA. Asymptotic theory for the frailty model. Ann Statist. 1995;23:182–198. [Google Scholar]
  4. Murphy SA, van der Vaart AW. On profile likelihood. J Am Statist Assoc. 2000;95:449–485. [Google Scholar]
  5. Murphy SA, van der Vaart AW. Semiparametric mixtures in case-control studies. J Multivariate Anal. 2001;79:1–32. [Google Scholar]
  6. Parner E. Asymptotic theory for the correlated gamma-frailty model. Ann Statist. 1998;26:183–214. [Google Scholar]
  7. Rudin W. Functional Analysis. McGraw-Hill; New York: 1973. [Google Scholar]
  8. Van der Vaart AW. Asymptotic Statistics. Cambridge University Press; Cambridge: 1998. [Google Scholar]
  9. Van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. Springer; New York: 1996. [Google Scholar]
  10. Zeng D, Lin DY. Maximum likelihood estimation in semiparametric regression models with censored data (with discussion) J R Statist Soc, Ser B. 2007;69:507–564. [Google Scholar]
