CURRENT STATUS DATA WITH COMPETING RISKS: LIMITING DISTRIBUTION OF THE MLE

Piet Groeneboom; Marloes H Maathuis; Jon A Wellner

doi:10.1214/009053607000000983

. Author manuscript; available in PMC: 2009 Nov 2.

Published in final edited form as: Ann Stat. 2008 Jan 1;36(3):1064–1089. doi: 10.1214/009053607000000983

CURRENT STATUS DATA WITH COMPETING RISKS: LIMITING DISTRIBUTION OF THE MLE

Piet Groeneboom ¹, Marloes H Maathuis ², Jon A Wellner ³

PMCID: PMC2771736 NIHMSID: NIHMS102010 PMID: 19888358

Abstract

We study nonparametric estimation for current status data with competing risks. Our main interest is in the nonparametric maximum likelihood estimator (MLE), and for comparison we also consider a simpler ‘naive estimator’. Groeneboom, Maathuis and Wellner [8] proved that both types of estimators converge globally and locally at rate n^1/3. We use these results to derive the local limiting distributions of the estimators. The limiting distribution of the naive estimator is given by the slopes of the convex minorants of correlated Brownian motion processes with parabolic drifts. The limiting distribution of the MLE involves a new self-induced limiting process. Finally, we present a simulation study showing that the MLE is superior to the naive estimator in terms of mean squared error, both for small sample sizes and asymptotically.

Keywords and phrases: Survival analysis, Current status data, Competing risks, Maximum likelihood, Limiting distribution

1. Introduction

We study nonparametric estimation for current status data with competing risks. The set-up is as follows. We analyze a system that can fail from K competing risks, where K ∈ ℕ is fixed. The random variables of interest are (X, Y), where X ∈ ℝ is the failure time of the system, and Y ∈ {1,…,K} is the corresponding failure cause. We cannot observe (X,Y) directly. Rather, we observe the ‘current status’ of the system at a single random observation time T ∈ ℝ, where T is independent of (X,Y). This means that at time T, we observe whether or not failure occurred, and if and only if failure occurred, we also observe the failure cause Y. Such data arise naturally in cross-sectional studies with several failure causes, and generalizations arise in HIV vaccine clinical trials [see 10].

We study nonparametric estimation of the sub-distribution functions F₀₁,…,F_0K, where F_0k(s) = P(X ≤ s,Y = k), k = 1,…,K. Various estimators for this purpose were introduced in [10, 12], including the nonparametric maximum likelihood estimator (MLE), which is our primary focus. For comparison we also consider the ‘naive estimator’, an alternative to the MLE discussed in [12]. Characterizations, consistency, and n^1/3 rates of convergence of these estimators were established in Groeneboom, Maathuis and Wellner [8]. In the current paper we use these results to derive the local limiting distributions of the estimators.

1.1. Notation

The following notation is used throughout. The observed data are denoted by (T, Δ), where T is the observation time and Δ = (Δ₁,…,Δ_K+1) is an indicator vector defined by Δ_k = 1{X ≤ T,Y = k} for k = 1,…,K, and Δ_K+1 = 1{X > T}. Let (T_i, Δⁱ), i = 1,…,n, be n i.i.d. observations of (T, Δ), where $Δ^{i} = (Δ_{1}^{i}, \dots, Δ_{K + 1}^{i}) .$ Note that we use the superscript i as the index of an observation, and not as a power. The order statistics of T₁,…,T_n are denoted by T₍₁₎,…,T_(n). Furthermore, G is the distribution of T, G_n is the empirical distribution of T_i, i = 1,…,n, and ℙ_n is the empirical distribution (T_i, Δⁱ), i = 1,…,n. For any vector (x₁,…,x_K) ∈ ℝ^K we define $x_{+} = \sum_{k = 1}^{K} x_{k},$ so that, for example, $Δ_{+} = \sum_{k = 1}^{K} Δ_{k}$ and $F_{0 +} (s) = \sum_{k = 1}^{K} F_{0 k} (s) .$ For any K-tuple F = (F₁,…,F_K) of sub-distribution functions, we define F_K+1(s) = ∫_u>sdF₊(u) = F₊(∞) – F₊(s).

We denote the right-continuous derivative of a function f : ℝ ⟼ ℝ by f′ (if it exists). For any function f : ℝ ⟼ ℝ, we define the convex minorant of f to be the largest convex function that is pointwise bounded by f. For any interval I, D(I) denotes the collection of cadlag functions on I. Finally, we use the following definition for integrals and indicator functions:

Definition 1.1

Let dA be a Lebesgue-Stieltjes measure, and let W be a Brownian motion process. For t < t₀, we define $1_{[t_{0}, t)} (u) = - 1_{[t, t_{0})} (u),$ $\int_{[t_{0}, t)} f (u) d A (u) = - \int_{[t, t_{0})} f (u) d A (u),$ and $\int_{t_{0}}^{t} f (u) d W (u) = - \int_{t}^{t_{0}} f (u) d W (u) .$

1.2. Assumptions

We prove the local limiting distributions of the estimators at a fixed point t₀, under the following conditions: (a) The observation time T is independent of the variables of interest (X,Y); (b) For each k = 1,…,K, 0 < F_0k(t₀) < F_0k(∞), and F_0k and G are continuously differentiable at t₀ with positive derivatives f_0k(t₀) and g(t₀); (c) The system cannot fail from two or more causes at the same time. Assumptions (a) and (b) are essential for the development of the theory. Assumption (c) ensures that the failure cause is well-defined. This assumption is always satisfied by defining simultaneous failure from several causes as a new failure cause.

1.3. The estimators

We first consider the MLE. The MLE F̂_n = (F̂_n1,…,F̂_nK) is defined by $l_{n} ({\hat{F}}_{n}) = \max_{F \in ℱ_{κ}} l_{n} (F),$ where

l_{n} (F) = \int {\sum_{k = 1}^{K} δ_{k} \log F_{k} (t) + (1 - δ_{+}) \log (1 - F_{+} (t))} d ℙ_{n} (t, δ),

(1)

and ℱ_K is the collection of K-tuples F = (F₁,…,F_K) of sub-distribution functions on ℝ with F₊ ≤ 1. The naive estimator F̃_n = (F̃_n1,…,F̃_nK) is defined by $l_{nk} ({\tilde{F}}_{nk}) = \max_{F_{k} \in ℱ} l_{nk} (F_{k}),$ for k = 1,…,K, where ℱ is the collection of distribution functions on ℝ, and

l_{n k} (F_{k}) = \int {δ_{k} \log F_{k} (t) + (1 - δ_{k}) \log (1 - F_{k} (t))} d ℙ_{n} (t, δ), k = 1, \dots, K .

(2)

Note that F̃_nk only uses the kth entry of the Δ-vector, and is simply the MLE for the reduced current status data (T, Δ_k). Thus, the naive estimator splits the optimization problem into K separate well-known problems. The MLE, on the other hand, estimates F₀₁,…,F_0K simultaneously, accounting for the fact that $\sum_{k = 1}^{K} F_{0 k} (s) = P (X \leq s)$ is the overall failure time distribution. This relation is incorporated both in the object function l_n(F) (via the term log(1 – F₊)) and in the space ℱ_K over which is maximized (via the constraint F₊ ≤ 1).

1.4. Main results

The main results in this paper are the local limiting distributions of the MLE and the naive estimator. The limiting distribution of F̃_nk corresponds to the limiting distribution of the MLE for the reduced current status data (T, Δ_k). Thus, it is given by the slope of the convex minorant of a two-sided Brownian motion process plus parabolic drift [9, Theorem 5.1, page 89], known as Chernoff's distribution. The joint limiting distribution of (F̃_n1,…,F̃_nK) follows by noting that the K Brownian motion processes have a multinomial covariance structure, since Δ|T ∼ Mult_K+1(1, (F₀₁(T),…,F_0,K+1(T))). The drifted Brownian motion processes and their convex minorants are specified in Definitions 1.2 and 1.5. The limiting distribution of the naive estimator is given in Theorem 1.6, and is simply a K-dimensional version of the limiting distribution for current status data. A formal proof of this result can be found in [14, Section 6.1].

Definition 1.2

Let W = (W₁,…,W_K) be a K-tuple of two-sided Brownian motion processes originating from zero, with mean zero and covariances

E {W_{j} (t) W_{k} (s)} = (| s | \land | t |) 1 {s t > 0} Σ_{j k}, s, t \in ℝ, j, k \in {1, \dots, K},

(3)

where Σ_jk = g(t₀)⁻¹ {1{j = k}F_0k(t₀) – F_0j(t₀)F_0k(t₀)}. Moreover, V = (V₁,…,V_K) is a vector of drifted Brownian motions, defined by

V_{k} (t) = W_{k} (t) + \frac{1}{2} f_{0 k} (t_{0}) t^{2}, k = 1, \dots, K .

(4)

Following the convention introduced in Section 1.1, we write $W_{+} = \sum_{k = 1}^{K} W_{k}$ and $V_{+} = \sum_{k = 1}^{K} V_{k} .$ Finally, we use the shorthand notation a_k = (F_0k(t₀))⁻¹, k = 1,…,K + 1.

Remark 1.3

Note that W is the limit of a rescaled version of W_n = (W_n1,…,W_nK), and that V is the limit of a recentered and rescaled version of V_n = (V_n1,…,V_nK), where W_nk and V_nk are defined by (17) and (6) of [8]:

\begin{matrix} W_{n k} (t) = \int_{u \leq t} {δ_{k} - F_{0 k} (t_{0})} d ℙ_{n} (u, δ), & t \in ℝ, k = 1, \dots, K, \\ V_{n k} (t) = \int_{u \leq t} δ_{k} d ℙ_{n} (u, δ), & t \in ℝ, k = 1, \dots, K . \end{matrix}

(5)

Remark 1.4

We define the correlation between Brownian motions W_j and W_k by

r_{j k} = \frac{Σ_{j k}}{\sqrt{Σ_{j j} Σ_{k k}}} = - \frac{\sqrt{F_{0 j} (t_{0}) F_{0 k} (t_{0})}}{\sqrt{(1 - F_{0 j} (t_{0})) (1 - F_{0 k} (t_{0}))}} .

Thus, the Brownian motions are negatively correlated, and this negative correlation becomes stronger as t₀ increases. In particular, it follows that r₁₂ → −1 as F₀₊(t₀)→ 1, in the case of K = 2 competing risks.

Definition 1.5

Let H̃ = (H̃₁,…,H̃_K) be the vector of convex minorants of V, i.e., H̃_k is the convex minorant of V_k, for k = 1,…,K. Let F̃ = (F̃₁,…,F̃_K) be the vector of right derivatives of H̃.

Theorem 1.6

Under the assumptions of Section 1.2, n^1/3{F̃_n(t₀ + n^−1/3t) – F₀(t₀)} →_d F̃(t) in the Skorohod topology on (D(ℝ))^K.

The limiting distribution of the MLE is given by the slopes of a new self-induced process Ĥ = (Ĥ₁,…,Ĥ_K), defined in Theorem 1.7. We say that the process Ĥ is ‘self-induced’, since each component Ĥ_k is defined in terms of the other components through ${\hat{H}}_{+} = \sum_{j = 1}^{K} {\hat{H}}_{j} .$ Due to this self-induced nature, existence and uniqueness of Ĥ need to be formally established (Theorem 1.7). The limiting distribution of the MLE is given in Theorem 1.8. These results are proved in the remainder of the paper.

Theorem 1.7

There exists an almost surely unique K-tuple Ĥ = (Ĥ₁,…,Ĥ_K) of convex functions with right-continuous derivatives F̂ = (F̂,…,F̂_K), satisfying the following three conditions:

a_kĤ_k(t) + a_K+1Ĥ₊(t) ≤ a_kV_k(t) + a_K+1V₊(t), for k = 1,…,K, t ∈ ℝ,
∫{a_kĤ_k(t) + a_K+1Ĥ₊(t) – a_kV_k(t) – a_K+1V₊(t)}dF̂_k(t) = 0, k = 1,…,K,
For all M > 0 and k = 1,…,K, there are points τ_1k < −M and τ_2k > M so that a_kĤ_k(t) + a_K+1Ĥ₊(t) = a_kV_k(t) + a_K+1V₊(t) for t = τ_1kand t = τ_2k.

Theorem 1.8

Under the assumptions of Section 1.2, n^1/3{F̂_n(t₀ + n^−1/3t) – F₀(t₀)} →_d F̂(t) in the Skorohod topology on (D(ℝ))^K. Thus, the limiting distributions of the MLE and the naive estimator are given by the slopes of the limiting processes Ĥ and H̃, respectively. In order to compare Ĥ and H̃ we note that the convex minorant H̃_k of V_k can be defined as the almost surely unique convex function H̃_k with right-continuous derivative F̃_k that satisfies (i) H̃_k(t) ≤ V_k(t) for all t ∈ ℝ, and (ii) ∫{H̃_k(t) – Ṽ_k(t)}dF̃_k(t) = 0. Comparing this to the definition of Ĥ_k in Theorem 1.7, we see that the definition of Ĥ_k contains the extra terms Ĥ₊ and V₊, which come from the term log(1 – F₊(t)) in the log likelihood (1). The presence of Ĥ₊ in Theorem 1.7 causes Ĥ to be self-induced. In contrast, the processes H̃_k for the naive estimator depend only on V_k, so that H̃ is not self-induced. However, note that the processes H̃₁,…,H̃_K are correlated, since the Brownian motions W₁,…,W_K are correlated (see Definition 1.2).

1.5. Outline

This paper is organized as follows. In Section 2 we discuss the new self-induced limiting processes Ĥ and F̂. We give various interpretations of these processes and prove the uniqueness part of Theorem 1.7. Section 3 establishes convergence of the MLE to its limiting distribution (Theorem 1.8). Moreover, in this proof we automatically obtain existence of Ĥ, hence completing the proof of Theorem 1.7. This approach to proving existence of the limiting processes is different from the one followed by [5, 6] for the estimation of convex functions, who establish existence and uniqueness of the limiting process before proving convergence. In Section 4 we compare the estimators in a simulation study, and show that the MLE is superior to the naive estimator in terms of mean squared error, both for small sample sizes and asymptotically. We also discuss computation of the estimators in Section 4. Technical proofs are collected in Section 5.

2. Limiting processes

We now discuss the new self-induced processes Ĥ and F̂ in more detail. In Section 2.1 we give several interpretations of these processes, and illustrate them graphically. In Section 2.2 we prove tightness of {F̂_k – f_0k(t₀)t} and {Ĥ_k(t) – V_k(t)}, for t ∈ ℝ. These results are used in Section 2.3 to prove almost sure uniqueness of Ĥ and F̂.

2.1. Interpretations of Ĥ and F̂

Let k ∈ {1,…,K}. Theorem 1.7 (i) and the convexity of Ĥ_k, imply that a_kĤ_k + a_K+1Ĥ₊ is a convex function that lies below a_kV_k + a_K+1V₊. Hence, a_kĤ_k + a_K+1Ĥ₊ is bounded above by the convex minorant of a_kV_k + a_K+1V₊. This observation leads directly to the following proposition about the points of touch between a_kĤ_k + a_K+1Ĥ₊ and a_kV_k + a_K+1V₊:

Proposition 2.1

For each k = 1,…,K, we define 𝒩_k and 𝒩̂_k by

N_{k} = {p o i n t s o f t o u c h b e t w e e n a_{k} V_{k} + a_{K + 1} V + a n d i t s c o n v e x m i n o r a n t},

(6)

{\hat{N}}_{k} = {p o i n t s o f t o u c h b e t w e e n a_{k} V_{k} + a_{K + 1} V_{+} a n d a_{k} {\hat{H}}_{k} + a_{K + 1} {\hat{H}}_{+}} .

(7)

Then the following properties hold: (i) 𝒩̂_k ⊂ 𝒩_k, and (ii) At points t ∈ 𝒩̂_k, the right and left derivatives of a_kĤ_k(t) + a_K+1Ĥ₊(t) are bounded above and below by the right and left derivatives of the convex minorant of a_kV_k(t) + a_K+1V₊(t).

Since a_kV_k + a_K+1V₊ is a Brownian motion process plus parabolic drift, the point process 𝒩_k is well-known from [4]. On the other hand, little is known about 𝒩̂_k, due to the self-induced nature of this process. However, Proposition 2.1 (i) relates 𝒩̂_k to 𝒩_k, and this allows us to deduce properties of 𝒩̂_k and the associated processes Ĥ_k and F̂_k. In particular, Proposition 2.1 (i) implies that F̂_k is piecewise constant, and that Ĥ_k is piecewise linear (Corollary 2.2). Moreover, Proposition 2.1 (i) is essential for the proof of Proposition 2.16, where it is used to establish expression (30). Proposition 2.1 (ii) is not used in the sequel.

Corollary 2.2

For each k ∈ {1,…,K}, the following properties hold almost surely: (i) 𝒩̂_k has no condensation points in a finite interval, and (ii) F̂_k is piecewise constant and Ĥ_k is piecewise linear.

Proof. 𝒩_k is a stationary point process which, with probability one, has no condensation points in a finite interval [see 4]. Together with Proposition 2.1 (i), this yields that with probability one, 𝒩̂_k has no condensation points in a finite interval. Conditions (i) and (ii) of Theorem 1.7 imply that F̂_k can only increase at points t ∈ 𝒩_k. Hence, F̂_k is piecewise constant and Ĥ_k is piecewise linear.

Thus, conditions (i) and (ii) of Theorem 1.7 imply that a_kĤ_k + a_K+1H^₊ is a piecewise linear convex function, lying below a_kV_k + a_K+1V₊, and touching a_kV_k + a_K+1V₊ whenever F̂_k jumps. We illustrate these processes using the following example with K = 2 competing risks:

Example 2.3 Let K = 2, and let T be independent of (X,Y). Let T,Y and X|Y be distributed as follows: G(t) = 1 – exp(−t), P(Y = k) = k/3 and P(X ≤ t|Y = k) = 1 – exp(−kt) for k = 1,2. This yields F_0k(t) = (k/3){1 – exp(−kt)} for k = 1,2.

Figure 1 shows the limiting processes a_kV_k + a_K+1V₊, a_kĤ_k + a_K+1Ĥ₊, and F̂_k, for this model with t₀ = 1. The relevant parameters at the point t₀ = 1 are:

F_{01} (1) = 0.21, F_{02} (1) = 0.58, f_{01} (1) = 0.12, f_{02} (1) = 0.18, g (1) = 0.37 .

F₀₁(1) = 0.21, F₀₂(1) = 0.58, f₀₁(1) = 0.12, f₀₂(1) = 0.18, g(1) = 0.37.

FIG 1 — Limiting processes for the model given in Example 2.3 for t₀ = 1. The top row shows the processes a_kV_k + a_K+1V₊ and a_kĤ_k + a_k+1Ĥ₊, around the dashed parabolic drifts a_kf_0k(t₀)t²/2 + a_K+1f₀₊(t₀)t²/2. The bottom row shows the slope processes F̂_k, around dashed lines with slope f_0k(t₀). The circles and crosses indicate jump points of F̂₁ and F̂₂, respectively. Note that a_kĤ_k + a_k+1Ĥ₊ touches a_kV_k + a_K+1V₊ whenever F̂_k has a jump, for k = 1, 2.

The processes shown in Figure 1 are approximations, obtained by computing the MLE for sample size n = 100,000 (using the algorithm described in Section 4), and then computing the localized processes $V_{n k}^{l o c}$ and ${\hat{H}}_{n k}^{l o c}$ (see Definition 3.1 ahead).

Note that F̂₁ has a jump around −3. This jump causes a change of slope in a_kĤ_k + a_K+1Ĥ₊ for both components k ∈ {1,2}, but only for k = 1 is there a touch between a_kĤ_k + a_K+1Ĥ₊ and a_kV_k + a_K+1V₊. Similarly, F̂₂ has a jump around −1. Again, this causes a change of slope in a_kĤ_k + a_K+1Ĥ₊ for both components k ∈ {1,2}, but only for k = 2 is there a touch between a_kĤ_k + a_K+1Ĥ₊ and a_kV_k + a_K+1V₊. The fact that a_kĤ_k + a_K+1Ĥ₊ has changes of slope without touching a_kV_k + a_K+1V₊ implies that a_kĤ_k + a_K+1Ĥ₊ is not the convex minorant of a_kV_k + a_K+1V₊.

It is possible to give convex minorant characterizations of Ĥ, but again these characterizations are self-induced. Proposition 2.4 (a) characterizes Ĥ_k in terms of $\sum_{j = 1}^{K} {\hat{H}}_{j},$ and Proposition 2.4 (b) characterizes Ĥ_k in terms of $\sum_{j = 1, j \neq k}^{K} {\hat{H}}_{j} .$

Proposition 2.4

Ĥ satisfies the following convex minorant characterizations:

For each k = 1,…,K, Ĥ_k(t) is the convex minorant of
$V_{k} (t) + \frac{a_{K + 1}}{a_{k}} {V_{+} (t) - {\hat{H}}_{+} (t)} .$ (8)
For each k = 1,…,K, Ĥ_k(t) is the convex minorant of
$V_{k} (t) + \frac{a_{K + 1}}{a_{k} + a_{K + 1}} {V_{+}^{(- k)} (t) - {\hat{H}}_{+}^{(- k)} (t)},$ (9)
where $V_{+}^{(- k)} (t) = \sum_{j = 1, j \neq k}^{K} V_{j} (t)$ and ${\hat{H}}_{+}^{(- k)} (t) = \sum_{j = 1, j \neq k}^{K} {\hat{H}}_{j} (t) .$

Proof. Conditions (i) and (ii) of Theorem 1.7 are equivalent to:

\begin{array}{l} {\hat{H}}_{k} (t) \leq V_{k} (t) + \frac{a_{K + 1}}{a_{k}} {V_{+} (t) - {\hat{H}}_{+} (t)}, t \in ℝ, \\ \int {{\hat{H}}_{k} (t) - V_{k} (t) - \frac{a_{K + 1}}{a_{k}} {V_{+} (t) - {\hat{H}}_{+} (t)}} d {\hat{F}}_{k} (t) = 0, \end{array}

for k = 1,…,K. This gives characterization (a). Similarly, characterization (b) holds since conditions (i) and (ii) of Theorem 1.7 are equivalent to:

\begin{array}{l} {\hat{H}}_{k} (t) \leq V_{k} (t) + \frac{a_{K + 1}}{a_{k} + a_{K + 1}} {V_{+}^{(- k)} (t) - {\hat{H}}_{+}^{(- k)} (t)}, t \in ℝ, \\ \int {{\hat{H}}_{k} (t) - V_{k} (t) - \frac{a_{K + 1}}{a_{k} + a_{K + 1}} {V_{+}^{(- k)} (t) - {\hat{H}}_{+}^{(- k)} (t)}} d {\hat{F}}_{k} (t) = 0, \end{array}

for k = 1,…,K.

Comparing the MLE and the naive estimator, we see that H̃_k is the convex minorant of V_k, and Ĥ_k is the convex minorant of V_k + (a_K+1/a_k){V₊ – Ĥ₊}. These processes are illustrated in Figure 2. The difference between the two estimators lies in the extra term (a_K+1/a_k){V₊ – Ĥ₊}, which is shown in the bottom row of Figure 2. Apart from the factor a_K+1/a_k, this term is the same for all k = 1,…,K. Furthermore, a_K+1/a_k = F_0k(t₀)/F_0,K+1(t₀) is an increasing function of t₀, so that the extra term (a_K+1/a_k){V₊ – Ĥ₊} is more important for large values of t₀. This provides an explanation for the simulation results shown in Figure 3 of Section 4, which indicate that MLE is superior to the naive estimator in terms of mean squared error, especially for large values of t. Finally, note that (a_K+1/a_k){V₊ – Ĥ₊} appears to be nonnegative in Figure 2. In Proposition 2.5 we prove that this is indeed the case. In turn, this result implies that H̃_k ≤ Ĥ_k (Corollary 2.6), as shown in the top row of Figure 2.

FIG 2 — Limiting processes for the model given in Example 2.3 for t₀ = 1. The top row shows the processes V_k and their convex minorants H̃_k (grey), together with V_k + (a_k+1/a_k)(V₊–Ĥ₊ and their convex minorants Ĥ_k (black). The dashed lines depict the parabolic drift f_0k(t₀)t²/2. The middle row shows the slope processes F̃_k (grey) and F̂_k (black), which follow the dashed lines with slope f_0k(t₀). The bottom row shows the ‘correction term’ (a_k+1/a_k)(V₊–Ĥ₊) for the MLE.

FIG 3 — Relative MSEs, computed by dividing the MSE of the MLE by the MSE of the other estimators. All MSEs were computed over 1000 simulations for each sample size, on the grid 0, 0.01, 0.02,…, 3.0.

Proposition 2.5

Ĥ₊ (t) ≤ V₊(t) for all t ∈ ℝ.

Proof. Theorem 1.7 (i) can be written as

{\hat{H}}_{k} (t) + \frac{F_{0 k} (t_{0})}{1 - F_{0 +} (t_{0})} {\hat{H}}_{+} (t) \leq V_{k} (t) + \frac{F_{0 k} (t_{0})}{1 - F_{0 +} (t_{0})} V_{+} (t), k = 1, \dots, K, t \in ℝ .

The statement then follows by summing over k = 1,…,K.

Corollary 2.6

H̃_k(t) ≤ Ĥ_k(t) for all k = 1,…,K and t ∈ ℝ.

Proof. Let k ∈ {1,…,K} and recall that H̃_k is the convex minorant of V_k. Since V₊ – Ĥ₊ ≥ 0 by Proposition 2.5, it follows that H̃_k is a convex function below V_k + (a_K+1/a_k){V₊ – Ĥ₊}. Hence, it is bounded above by the convex minorant Ĥ_k of V_k + (a_K+1/a_k){V₊ – Ĥ₊}.

Finally, we write the characterization of Theorem 1.7 in a way that is analogous to the characterization of the MLE in Proposition 4.8 of [8]. We do this to make a connection between the finite sample situation and the limiting situation. Using this connection, the proofs for the tightness results in Section 2.2 are similar to the proofs for the local rate of convergence in [8, Section 4.3]. We need the following definition:

Definition 2.7

For k = 1,…,K and t ∈ ℝ, we define

{\bar{F}}_{0 k} (t) = f_{0 k} (t_{0}) t and S_{k} (t) = a_{k} W_{k} (t) + a_{K + 1} W_{+} (t) .

(10)

Note that S_k is the limit of a rescaled version of the process S_nk = a_kW_nk + a_K+1W_n+, defined in (18) of [8].

Proposition 2.8

For all k = 1,…,K, for each point τ_k ∈ 𝒩̂_k (defined in (7)) and for all s ∈ ℝ we have:

\int_{τ_{k}}^{s} {a_{k} {{\hat{F}}_{k} (u) - {\bar{F}}_{0 k} (u)} + a_{K + 1} {{\hat{F}}_{+} (u) - {\bar{F}}_{0 +} (u)}} d u \leq \int_{τ_{k}}^{s} d S_{k} (u),

(11)

and equality must hold if s ∈ 𝒩̂_k.

Proof. Let k ∈ {1,…,K}. By Theorem 1.7 (i), we have

a_{k} {\hat{H}}_{k} (t) + a_{K + 1} {\hat{H}}_{+} (t) \leq a_{k} V_{k} (t) + a_{K + 1} V_{+} (t), t \in ℝ,

where equality holds at t = τ_k ∈ 𝒩̂_k. Subtracting this expression for t = τ_k from the expression for t = s, we get:

\int_{τ_{k}}^{s} {a_{k} {\hat{F}}_{k} (u) + a_{K + 1} {\hat{F}}_{+} (u)} d u \leq \int_{τ_{k}}^{s} {a_{k} d V_{k} (u) + a_{K + 1} d V_{+} (u)} .

The result then follows by subtracting $\int_{τ_{k}}^{s} {a_{k} {\bar{F}}_{0 k} (u) + a_{K + 1} {\bar{F}}_{0 +} (u)} d u$ from both sides, and using that dV_k(u) = F̄_0k(u)du + dW_k(u) (see (4)).

2.2. Tightness of Ĥ and F̂

The main results of this section are tightness of {F̂_k(t) – F̄_0k(t)} (Proposition 2.9) and {Ĥ_k(t) – V_k(t)} (Corollary 2.15), for t ∈ ℝ. These results are used in Section 2.3 to prove that Ĥ and F̂ are almost surely unique.

Proposition 2.9

For every ϵ > 0 there is an M > 0 such that

P (| {\hat{F}}_{k} (t) - {\bar{F}}_{0 k} (t) | \geq M) < ϵ, f o r k = 1, \dots, K, t \in ℝ .

Proposition 2.9 is the limit version of Theorem 4.17 of [8], which gave the n^1/3 local rate of convergence of F̂_nk. Hence, analogously to [8, Proof of Theorem 4.17], we first prove a stronger tightness result for the sum process {F̂₊(t) – F̄₀₊(t)}, t ∈ ℝ.

Proposition 2.10

Let β ∈ (0,1) and define

v (t) = {\begin{matrix} 1, & i f | t | \leq 1, \\ | t |^{β}, & i f | t | > 1 . \end{matrix}

(12)

Then for every ϵ > 0 there is an M > 0 such that

P (\sup_{t \in ℝ} \frac{| {\hat{F}}_{+} (t) - {\bar{F}}_{0 +} (t)}{v (t - s)} \geq M) < ϵ, f o r s \in ℝ .

Proof. The organization of this proof is similar to the proof of Theorem 4.10 of [8]. Let ϵ > 0. We only prove the result for s = 0, since the proof for s ≠ 0 is equivalent, due to stationarity of the increments of Brownian motion.

It is sufficient to show that we can choose M > 0 such that

\begin{array}{l} P (\exists t \in ℝ : {\hat{F}}_{+} (t) \notin ({\bar{F}}_{0 +} (t - M v (t)), {\bar{F}}_{0 +} (t + M v (t)))) \\ = P (\exists t \in ℝ : | {\hat{F}}_{+} (t) - {\bar{F}}_{0 +} (t) | \geq f_{0 +} (t_{0}) M v (t)) < ϵ . \end{array}

In fact, we only prove that there is an M such that

P (\exists t \in [0, \infty) : {\hat{F}}_{+} (t) \geq {\bar{F}}_{0 +} (t + M v (t))) < \frac{ϵ}{4},

since the proofs for the inequality F̂₊(t) ≤ F̄₀₊(t – Mv(t)) and the interval (−∞,0] are analogous. In turn, it is sufficient to show that there is an m₁ > 0 such that

P (\exists t \in [j, j + 1) : {\hat{F}}_{+} (t) \geq {\bar{F}}_{0 +} (t + M v (t))) \leq p_{j} M, j \in ℕ, M > m_{1},

(13)

where p_jM satisfies $\sum_{j = 0}^{\infty} p_{j M} \to 0$ as M → ∞. We prove (13) for

p_{j M} = d_{1} \exp {- d_{2} {(M v (j))}^{3}},

(14)

where d₁ and d₂ are positive constants. Using the monotonicity of F̂₊, we only need to show that P(A_jM) < p_jM for all j ∈ ℕ and M > m₁, where

A_{j M} = {\hat{F} + (j + 1) \geq {\bar{F}}_{0 +} (s_{j M})} and s_{j M} = j + M v (j) .

(15)

We now fix M > 0 and j ∈ ℕ, and define τ_kj = max{𝒩̂k ∩ (−∞,j + 1]}, for k = 1,…,K. These points are well defined by Theorem 1.7 (iii) and Corollary 2.2 (i). Without loss of generality, we assume that the sub-distribution functions are labeled so that τ_1j ≤ ⋯ ≤ τ_Kj. On the event A_jM, there is a k ∈ {1,…,K} such that F̂_k(j + 1) ≥ F̄_0k(s_jM). Hence, we can define ℓ ∈ {1,…,K} such that

{\hat{F}}_{k} (j + 1) < {\bar{F}}_{0 k} (s_{j M}), k = ℓ + 1, \dots, K,

(16)

{\hat{F}}_{ℓ} (j + 1) \geq {\bar{F}}_{0 ℓ} (s_{j M}) .

(17)

Recall that F̂ must satisfy (11). Hence, P(A_jM) equals

\begin{array}{l} P (\int_{τ_{ℓ j}}^{s_{j M}} {a_{ℓ} {{\hat{F}}_{ℓ} (u) - {\bar{F}}_{0 ℓ} (u)} + a_{K + 1} {{\hat{F}}_{+} (u) - {\bar{F}}_{0 +} (u)}} d u \leq \int_{τ_{ℓ j}}^{s_{j M}} d S_{ℓ} (u), A_{j M}) \\ \leq P (\int_{τ_{ℓ j}}^{s_{j M}} a_{ℓ} {{\hat{F}}_{ℓ} (u) - {\bar{F}}_{0 ℓ} (u)} d u \leq \int_{τ_{ℓ j}}^{s_{j M}} d S_{ℓ} (u), A_{j M}) \end{array}

(18)

+ P (\int_{τ_{ℓ j}}^{s_{j M}} {{\hat{F}}_{+} (u) - {\bar{F}}_{0 +} (u)} d u \leq 0, A_{j M}) .

(19)

Using the definition of τ_ℓj and the fact that F̂_ℓ is monotone nondecreasing and piecewise constant (Corollary 2.2), it follows that on the event A_jM we have, F̂_ℓ(u) ≥ F̂_ℓ(τ_ℓj) = F̂_ℓ(j + 1) ≥ F̄_0ℓ(s_jM), for u ≥ τ_ℓj. Hence, we can bound (18) above by

\begin{matrix} P (\int_{τ_{ℓ j}}^{s_{j M}} a_{ℓ} {{\bar{F}}_{0 ℓ} (s_{j M}) - {\bar{F}}_{0 ℓ} (u)} d u \leq \int_{τ_{ℓ j}}^{s_{j M}} d S_{ℓ} (u)) \\ = P (\frac{1}{2} f_{0 ℓ} (t_{0}) {(s_{j M} - τ_{ℓ j})}^{2} \leq \int_{τ_{ℓ j}}^{s_{j M}} d S_{ℓ} (u)) \\ \leq P (\inf_{w \leq j + 1} {\frac{1}{2} f_{0 ℓ} (t_{0}) {(s_{j M} - w)}^{2} - \int_{w}^{s_{j M}} d S_{ℓ} (u)} \leq 0) . \end{matrix}

For m₁ sufficiently large, this probability is bounded above by p_jM/2 for all M > m₁ and j ∈ ℕ, by Lemma 2.11 below. Similarly, (19) is bounded by p_jM/2, using Lemma 2.12 below.

Lemmas 2.11 and 2.12 are the key lemmas in the proof of Proposition 2.10. They are the limit versions of Lemmas 4.13 and 4.14 of [8], and their proofs are given in Section 5. The basic idea of Lemma 2.11 is that the positive quadratic drift b(s_jM – w)² dominates the Brownian motion process S_k and the term C(s_jM – w)^3/2. Note that the lemma also holds when C(s_jM – w)^3/2 is omitted, since this term is positive for M > 1. In fact, in the proof of Proposition 2.10 we only use the lemma without this term, but we need the term C(s_jM – w)^3/2 in the proof of Proposition 2.9 ahead. The proof of Lemma 2.12 relies on the system of component processes. Since it is very similar to the proof of Lemma 4.14, we only point out the differences in Section 5.

Lemma 2.11

Let C > 0 and b > 0. Then there exists an m₁ > 0 such that for all k = 1,…,K, M > m₁and j ∈ ℕ,

P (\inf_{w \leq j + 1} {b {(s_{j M} - w)}^{2} - \int_{w}^{s_{j M}} d S_{k} (u) - C {(s_{j M} - w)}^{3 / 2}} \leq 0) \leq p_{j M},

where s_jM = j + Mv(j), and S_k(·), v(·) and p_jM are defined by (10), (12) and (14), respectively.

Lemma 2.12

Let ℓ be defined by (16) and (17). There is an m₁ > 0 such that

P (\int_{τ_{ℓ j}}^{s_{j M}} {{\hat{F}}_{+} (u) - {\bar{F}}_{0 +} (u)} d u \leq 0, A_{j M}) \leq p_{j M}, f o r M > m_{1}, j \in ℕ,

where s_jM = j + Mv(j), τ_ℓj = max{𝒩̂_ℓ ∩ (−∞,j + 1]} and v(·), p_jM and A_jM are defined by (12), (14) and (15), respectively.

In order to prove tightness of {F̂_k(t) – F̄_0k(t), t ∈ ℝ, we only need Proposition 2.10 to hold for one value of β ∈ (0,1), analogously to [8, Remark 4.12]. We therefore fix β = 1/2, so that $v (t) = 1 \lor \sqrt{| t |} .$ Then Proposition 2.10 leads to the following corollary, which is a limit version of Corollary 4.16 of [8]:

Corollary 2.13

For every ϵ > 0 there is a C > 0 such that

P {\sup_{u \in ℝ_{+}} \frac{\int_{s - u}^{s} | {\hat{F}}_{+} (t) - {\bar{F}}_{0 +} (t) | d t}{u \lor u^{3 / 2}} \geq C} < ϵ, f o r s \in ℝ .

This corollary allows us to complete the proof of Proposition 2.9.

Proof of Proposition 2.9

Let ϵ > 0 and let k ∈ {1,…,K}. It is sufficient to show that there is an M > 0 such that P(F̂_k(t) ≥ F̄_0k(t + M)) < ϵ and P(F̂_k(t) ≤ F̄_0k(t – M)) < ϵ for all t ∈ ℝ. We only prove the first inequality, since the proof of the second one is analogous. Thus, let t ∈ ℝ and M > 1, and define

B_{k M} = {{\hat{F}}_{k} (t) \geq {\bar{F}}_{0 k} (t + M)} and τ_{k} = \max {{\hat{N}}_{k} \cap (- \infty, t]} .

Note that τ_k is well-defined because of Theorem 1.7 (iii) and Corollary 2.2 (i). We want to prove that P(B_kM) < ϵ. Recall that F̂ must satisfy (11). Hence,

\begin{matrix} P (B_{k M}) = P (\int_{τ k}^{t + M} {a_{k} {{\hat{F}}_{k} (u) - {\bar{F}}_{0 k} (u)} + a_{K + 1} {{\hat{F}}_{+} (u) - {\bar{F}}_{0 +} (u)}} d u \\ \leq \int_{τ k}^{t + M} d S_{k} (u), B_{k M}) . \end{matrix}

(20)

By Corollary 2.13, we can choose C > 0 such that, with high probability,

\int_{τ_{κ}}^{t + M} | {\hat{F}}_{+} (u) - {\bar{F}}_{0 +} (u) | d u \leq C {(t + M - τ_{k})}^{3 / 2},

(21)

uniformly in τ_k ≤ t, using that u^3/2 > u for u > 1. Moreover, on the event B_kM, we have $\int_{τ_{κ}}^{t + M} {{\hat{F}}_{k} (u) - {\bar{F}}_{0 k} (u)} d u \geq \int_{τ_{k}}^{t + M} {{\bar{F}}_{0 k} (t + M) - {\bar{F}}_{0 k} (u)} d u = f_{0 k} (t_{0}) {(t + M - τ_{k})}^{2} / 2,$ yielding a positive quadratic drift. The statement now follows by combining these facts with (20), and applying Lemma 2.11.

Proposition 2.9 leads to the following corollary about the distance between the jump points of F̂_k. The proof is analogous to the proof of Corollary 4.19 of [8], and is therefore omitted.

Corollary 2.14

For all k = 1,…,K, let $τ_{κ}^{-} (s)$ and $τ_{k}^{+} (s)$ be, respectively, the largest jump point ≤ s and the smallest jump point > s of F̂_k. Then for every ϵ > 0 there is a C > 0 such that $P (τ_{k}^{+} (s) - τ_{k}^{-} (s) > C) < ϵ,$ for k = 1,…,K, s ∈ ℝ. Combining Theorem 2.9 and Corollary 2.14 yields tightness of {Ĥ_k(t) – V_k(t)}:

Corollary 2.15

For every ϵ > 0 there is an M > 0 such that

P (| {\hat{H}}_{k} (t) - V_{k} (t) | > M) < ϵ, f o r t \in ℝ .

2.3. Uniqueness of Ĥ and F̂

We now use the tightness results of Section 2.2 to prove the uniqueness part of Theorem 1.7, as given in Proposition 2.16. The existence part of Theorem 1.7 will follow in Section 3.

Proposition 2.16Let Ĥ and H satisfy the conditions of Theorem 1.7. Then Ĥ ≡ H almost surely.

The proof of Proposition 2.16 relies on the following lemma:

Lemma 2.17

Let Ĥ = (Ĥ₁,…,Ĥ_K) and H = (H₁,…,H_K) satisfy the conditions of Theorem 1.7, and let F̂ = (F̂₁,…,F̂_K) and F = (F₁,…,F_K) be the corresponding derivatives. Then

\begin{array}{l} \sum_{k = 1}^{K} a_{k} \int {F_{k} (t) - {\hat{F}}_{k} (t)}^{2} d t + a_{K + 1} \int {F_{+} (t) - {\hat{F}}_{+} (t)}^{2} d t \\ \leq \underset{m \to \infty}{\lim \inf} \sum_{k = 1}^{K} {ψ_{k} (m) - ψ_{k} (- m)}, \end{array}

(22)

where ψ_k : ℝ → ℝ is defined by

ψ_{k} (t) = {F_{k} (t) - {\hat{F}}_{k} (t)} [a_{k} {H_{k} (t) - {\hat{H}}_{k} (t)} + a_{K + 1} {H_{+} (t) - {\hat{H}}_{+} (t)}] .

(23)

Proof. We define the following functional:

\begin{matrix} ϕ_{m} (F) = \sum_{k = 1}^{K} a_{k} {\frac{1}{2} \int_{- m}^{m} F_{k}^{2} (t) d t - \int_{- m}^{m} F_{k} (t) d V_{k} (t)} \\ + a_{K + 1} {\frac{1}{2} \int_{- m}^{m} F_{+}^{2} (t) d t - \int_{- m}^{m} F_{+} (t) d V + (t)}, m \in ℕ . \end{matrix}

Then, letting

D_{k} (t) = a_{k} {H_{k} (t) - V_{k} (t)} + a_{K + 1} {H_{+} (t) - V_{+} (t)},

(24)

{\hat{D}}_{k} (t) = a_{k} {{\hat{H}}_{k} (t) - V_{k} (t)} + a_{K + 1} {{\hat{H}}_{+} (t) - V_{+} (t)},

(25)

and using $F_{k}^{2} - {\hat{F}}_{k}^{2} = {(F_{k} - {\hat{F}}_{k})}^{2} + 2 {\hat{F}}_{k} (F_{k} - {\hat{F}}_{k}),$ we have

\begin{matrix} ϕ_{m} (F) - ϕ_{m} (\hat{F}) = \sum_{k = 1}^{K} \frac{a_{k}}{2} \int_{- m}^{m} {F_{k} (t) - {\hat{F}}_{k} (t)}^{2} d t + \frac{a_{K + 1}}{2} \int_{- m}^{m} {F_{+} (t) - {\hat{F}}_{+} (t)}^{2} d t \\ + \sum_{k = 1}^{K} \int_{- m}^{m} {F_{k} (t) - {\hat{F}}_{k} (t)} d {\hat{D}}_{k} (t) . \end{matrix}

(26)

Using integration by parts, we rewrite the last term of the right side of (26) as:

\begin{matrix} \sum_{k = 1}^{k} {F_{k} (t) - {\hat{F}}_{k} (t)} {\hat{D}}_{k} (t) |_{- m}^{m} - \sum_{k = 1}^{K} \int_{- m}^{m} {\hat{D}}_{k} (t) d {F_{k} (t) - {\hat{F}}_{k} (t)} \\ \geq \sum_{k = 1}^{K} {F_{k} (t) - {\hat{F}}_{k} (t)} {\hat{D}}_{k} (t) |_{- m}^{m} . \end{matrix}

(27)

The inequality on the last line follows from: (a) $\int_{- m}^{m} {\hat{D}}_{k} (t) d {\hat{F}}_{k} (t) = 0$ by Theorem 1.7 (ii), and (b) $\int_{- m}^{m} {\hat{D}}_{k} (t) d F_{k} (t) \leq 0,$ since D̂_k(t) ≤ 0 by Theorem 1.7 (i) and F_k is monotone nondecreasing. Combining (26) and (27), and using the same expressions with F and F̂ interchanged, yields

\begin{array}{l} 0 = φ_{m} (\hat{F}) - φ_{m} (F) + φ_{m} (F) - φ_{m} (\hat{F}) \\ \geq \sum_{k = 1}^{K} a_{k} \int_{- m}^{m} {F_{k} (t) - {\hat{F}}_{k} (t)}^{2} d t + a_{K + 1} \int_{- m}^{m} {F_{+} (t) - {\hat{F}}_{+} (t)}^{2} d t \\ + {\sum_{k = 1}^{K} {{\hat{F}}_{k} (t) - F_{k} (t)} D_{k} (t) |}_{- m}^{m} + {\sum_{k = 1}^{K} {F_{k} (t) - {\hat{F}}_{k} (t)} {\hat{D}}_{k} (t) |}_{- m}^{m} . \end{array}

By writing out the right side of this expression, we find that it is equivalent to

\begin{matrix} \sum_{k = 1}^{K} a_{k} \int_{- m}^{m} {F_{k} (t) - {\hat{F}}_{k} (t)}^{2} d t + a_{K + 1} \int_{- m}^{m} {F_{+} (t) - {\hat{F}}_{+} (t)}^{2} d t \\ \leq \sum_{k = 1}^{K} [{F_{k} (m) - {\hat{F}}_{k} (m)} {D_{k} (m) - {\hat{D}}_{k} (m)} \\ - {F_{k} (- m) - {\hat{F}}_{k} (- m)} {D_{k} (- m) - {\hat{D}}_{k} (- m)}] . \end{matrix}

(28)

This inequality holds for all m ∈ ℕ, and hence we can take lim inf_m→∞. The left side of (28) is a monotone sequence in m, so that we can replace lim inf_m→∞. by lim_m→∞. The result then follows from the definitions of ψ_k, D_k, and D̂_k in (23) – (25).

We are now ready to prove Proposition 2.16. The idea of the proof is to show that the right side of (22) is almost surely equal to zero. We prove this in two steps. First, we show that it is of order O_p(1), using the tightness results of Proposition 2.9 and Corollary 2.15. Next, we show that the right side is almost surely equal to zero.

Proof of Proposition 2.16

We first show that the right side of (22) is of order O_p(1). Let k ∈ {1,…,K}, and note that Proposition 2.9 yields that {F_k(m) – F̄_0k(m)} and {F̂_k(m) – F̄_0k(m)} are of order O_p(1), so that also {F_k(m) – F̂_k(m)} = O_p(1). Similarly, Corollary 2.15 implies that {H_k(m) – Ĥ_k(m)} = O_p(1). Using the same argument for −m, this proves that the right side of (22) is of order O_p(1).

We now show that the right side of (22) is almost surely equal to zero. Let k ∈ {1,…,K}. We only consider |F_k(m) – F̂_k(m)‖H_k(m) – Ĥ_k(m)|, since the term |F_k(m) – F̂_k(m)‖H₊(m) – Ĥ₊(m)| and the point –m can be treated analogously. It is sufficient to show that

\underset{m \to \infty}{lim inf} P (| F_{k} (m) - {\hat{F}}_{k} (m) | | H_{k} (m) - {\hat{H}}_{k} (m) | > η) = 0, for all η > 0 .

(29)

Let τ_mk be the last jump point of F_k before m, and let τ̂_mk be the last jump point of F̂_k before m. We define the following events

\begin{matrix} E_{m} = E_{m} (ϵ, δ, C) = E_{1 m} (ϵ) \cap E_{2 m} (δ) \cap E_{3 m} (C), where \\ E_{1 m} = E_{1 m} (ϵ) = {\int_{τ_{m k} \lor {\hat{τ}}_{m k}}^{\infty} {F_{k} (t) - {\hat{F}}_{k} (t)}^{2} d t < ϵ}, \\ E_{2 m} = E_{2 m} (δ) = {m - (τ_{m k} \lor {\hat{τ}}_{m k}) > δ}, \\ E_{3 m} = E_{3 m} (C) = {| H_{k} (m) - {\hat{H}}_{k} (m) | < C} . \end{matrix}

Let ϵ₁ > 0 and ϵ₂ > 0. Since the right side of (22) is of order O_p(1), it follows that ∫{F_k(t) – F̂_k(t)}²dt = O_p(1) for every k ∈ {1,…,K}. This implies that $\int_{m}^{\infty} {F_{k} (t) - {\hat{F}}_{k} (t)}^{2} d t \to_{p} 0$ as m → ∞ . Together with the fact that m – {τ_mk ∨ τ̂_mk} = O_p(1) (Corollary 2.14), this implies that there is an m₁ > 0 such that P(E_1m(ϵ₁)^c) < ϵ₁ for all m > m₁. Next, recall that the points of jump of F_k and F̂_k are contained in the set 𝒩_k, defined in Proposition 2.1. Letting τ′_mk = max{𝒩_k ∩ (–∞, m]}, we have

P (E_{2 m}^{c} (δ)) \leq P (m - {τ^{'}}_{m k} < δ) .

(30)

The distribution of m – τ′_mk is independent of m, non-degenerate and continuous [see 4]. Hence, we can choose δ > 0 such that the probabilities in (30) are bounded by ϵ₂/2 for all m. Furthermore, by tightness of {H_k(m) – Ĥ_k(m)}, there is a C > 0 such that P(E_3m(C)^c) < ϵ₂/2 for all m. This implies that P(E_m(ϵ₁,δ,C)^c) < ϵ₁ + ϵ₂ for m > m₁.

Returning to (29), we now have for η > 0:

\begin{matrix} \underset{m \to \infty}{lim inf} P (| F_{k} (m) - {\hat{F}}_{k} (m) | | H_{k} (m) - {\hat{H}}_{k} (m) | > η) \\ \leq ϵ_{1} + ϵ_{2} + \underset{m \to \infty}{\lim \inf} P (| F_{k} (m) - {\hat{F}}_{k} (m) | | H_{k} (m) - {\hat{H}}_{k} (m) | > η, E_{m} (ϵ_{1}, δ, C)) \\ \leq ϵ_{1} + ϵ_{2} + \underset{m \to \infty}{\lim \inf} P (| F_{k} (m) - {\hat{F}}_{k} (m) | > \frac{η}{C}, E_{m} (ϵ_{1}, δ, C)), \end{matrix}

using the definition of E_3m(C) in the last line. The probability in the last line equals zero for ϵ₁ small. To see this, note that F_k(m) – F̂_k(m) > η/C, m – {τ_mk ∨ τ̂_mk} > δ, and the fact that F_k and F̂_k are piecewise constant on m – {τ_km ∨ τ̂_mk} imply that

\int_{τ_{m k} \lor {\hat{τ}}_{m k}}^{\infty} {F_{k} (u) - {\hat{F}}_{k} (u)}^{2} d u \geq \int_{τ_{m k} \lor {\hat{τ}}_{m k}}^{m} {F_{k} (u) - {\hat{F}}_{k} (u)}^{2} d u > \frac{η^{2} δ}{C^{2}},

so that E_1m(ϵ₁) cannot hold for ϵ₁ < η²δ/C².

This proves that the right side of (22) equals zero, almost surely. Together with the right-continuity of F_k and F̂_k, this implies that F_k ≡ F̂_k almost surely, for k = 1,…,K. Since F_k and F̂_k are the right derivatives of H_k and Ĥ_k, this yields that H_k = Ĥ_k + c_k almost surely. Finally, both H_k and Ĥ_k satisfy conditions (i) and (ii) of Theorem 1.7 for k = 1,…,K, so that c₁ = ⋯ = c_K = 0 and H ≡ H^ almost surely.

3. Proof of the limiting distribution of the MLE

In this section we prove that the MLE converges to the limiting distribution given in Theorem 1.8. In the process, we also prove the existence part of Theorem 1.7.

First, we recall from [8, Section 2.2] that the naive estimators F˜_nk, k = 1,…,K, are unique at t ∈ {T₁,…,T_n}, and that the MLEs F̂_nk, k = 1,…,K, are unique at t ∈ 𝒯_K, where $T_{k} = {T_{i}, i = 1, \dots, n : Δ_{k}^{i} + Δ_{K + 1}^{i} > 0} \cup {T_{(n)}}$ for k = 1,…,K [see 8, Proposition 2.3]. To avoid issues with non-uniqueness, we adopt the convention that F̃_nk and F̂_nk, k = 1,…,K, are piecewise constant and right-continuous, with jumps only at the points at which they are uniquely defined. This convention does not affect the asymptotic properties of the estimators under the assumptions of Section 1.2. Recalling the definitions of G and G_n given in Section 1.1, we now define the following localized processes:

Definition 3.1

For each k = 1,…,K, we define:

{\hat{F}}_{n k}^{l o c} (t) = n^{1 / 3} {{\hat{F}}_{n k} (t_{0} + n^{- 1 / 3} t) - F_{0 k} (t_{0})},

(31)

V_{n k}^{l o c} (t) = \frac{n^{2 / 3}}{g (t_{0})} \int_{u \in (t_{0}, t_{0} + n^{- 1 / 3} t)} {δ_{k} - F_{0 k} (t_{0})} d ℙ_{n} (u, δ),,

(32)

{\bar{H}}_{n k}^{l o c} (t) = \frac{n^{2 / 3}}{g (t_{0})} \int_{t_{0}}^{t_{0} + n^{- 1 / 3} t} {{\hat{F}}_{n k} (u) - F_{0 k} (t_{0})} d G (u),

(33)

{\hat{H}}_{n k}^{l o c} (t) = {\bar{H}}_{n k}^{l o c} (t) + \frac{c_{n k}}{a_{k}} - F_{0 k} (t_{0}) \sum_{k = 1}^{K} \frac{c_{n k}}{a_{k}},

(34)

where c_nk is the difference between $a_{k} V_{n k}^{l o c} + a_{K + 1} V_{n +}^{l o c}$ and $a_{k} H_{n k}^{l o c} + a_{K + 1} H_{n +}^{l o c}$ at the last jump point τ_nk of ${\hat{F}}_{n k}^{l o c}$ before zero, i.e.,

c_{n k} = a_{k} V_{n k}^{l o c} (τ_{n k} -) + a_{K + 1} V_{n +}^{l o c} (τ_{n k} -) - a_{k} {\bar{H}}_{n k}^{l o c} (τ_{n k}) - a_{K + 1} {\bar{H}}_{n +}^{l o c} (τ_{n k}) .

(35)

Moreover, we define the vectors ${\hat{F}}_{n}^{l o c} = ({\hat{F}}_{n 1}^{l o c}, \dots, {\hat{F}}_{n K}^{l o c}), V_{n}^{l o c} = (V_{n 1}^{l o c}, \dots, V_{n K}^{l o c}),$ and ${\hat{H}}_{n}^{l o c} = ({\hat{H}}_{n 1}^{l o c}, \dots, {\hat{H}}_{n K}^{l o c}) .$

Note that ${\hat{H}}_{n k}^{l o c}$ differs from ${\bar{H}}_{n k}^{l o c}$ by a vertical shift, and that $({\hat{H}}_{n k}^{l o c})^{'} (t) = ({\bar{H}}_{n k}^{l o c})^{'} (t) = {\hat{F}}_{n k}^{l o c} (t) + o (1) .$ We now show that the MLE satisfies the characterization given in Proposition 3.2, which can be viewed as a recentered and rescaled version of the characterization in Proposition 4.8 of [8]. In the proof of Theorem 1.8 we will see that, as n → ∞, this characterization converges to the characterization of the limiting process given in Theorem 1.7.

Proposition 3.2

Let the assumptions of Section 1.2 hold, and let m > 0. then

\begin{array}{l} a_{k} {\hat{H}}_{n k}^{l o c} (t) + a_{K + 1} {\hat{H}}_{n +}^{l o c} (t) \leq a_{k} V_{n k}^{l o c} (t -) + a_{K + 1} V_{n +}^{l o c} (t -) + R_{n k}^{l o c} (t), f o r t \in [- m, m], \\ \int_{- m}^{m} {a_{k} V_{n k}^{l o c} (t -) + a_{K + 1} d V_{n +}^{l o c} (t -) + R_{n k}^{l o c} (t) - a_{k} {\hat{H}}_{n k}^{l o c} (t) - a_{K + 1} {\hat{H}}_{n +}^{l o c} (t)} d {\hat{F}}_{n k}^{l o c} (t) = 0, \end{array}

where $R_{n k}^{l o c} (t) = o_{p} (1),$ uniformly in t ∈ [–m, m].

Proof. Let m > 0 and let τ_nk be the last jump point of F̂_nk before t₀. It follows from the characterization of the MLE in Proposition 4.8 of [8] that

\begin{array}{l} \int_{τ_{n k}}^{s} {a_{k} {{\hat{F}}_{n k} (u) - F_{0 k} (u)} + a_{K + 1} {{\hat{F}}_{n +} (u) - F_{0 +} (u)}} d G (u) \\ \leq \int_{[τ_{n k}, s)} {a_{k} {δ_{k} - F_{0 k} (u)} + a_{K + 1} {δ_{+} - F_{0 +} (u)}} d ℙ_{n} (u, δ) + R_{n k} (τ_{n k}, s), \end{array}

(36)

where equality holds if s is a jump point of F̂_nk. Using that t₀ – τ_nk = O_p(n^−1/3) by [8, Corollary 4.19], it follows from [8, Corollary 4.20] that R_nk(τ_nk, s) = O_p(n^−2/3), uniformly in s ∈ [t₀ – m₁n^1/3,t₀ + m₁n^−1/3]. We now add

\int_{τ_{n k}}^{s} {a_{k} {F_{0 k} (u) - F_{0 k} (t_{0})} + a_{K + 1} {F_{0 +} (u) - F_{0 +} (t_{0})}} d G (u)

to both sides of (36). This gives

\begin{matrix} \int_{τ_{n k}}^{s} {a_{k} {{\hat{F}}_{n k} (u) - F_{0 k} (t_{0})} + a_{K + 1} {{\hat{F}}_{n +} (u) - F_{0 +} (t_{0})}} d G (u) \\ \leq \int_{[τ_{n k}, s)} {a_{k} {δ_{k} - F_{0 k} (t_{0})} + a_{K + 1} {δ_{+} - F_{0 +} (t_{0})}} d ℙ_{n} (u, δ) + R_{n k}^{'} (τ_{n k}, s), \end{matrix}

(37)

where equality holds if s is a jump point of F^_nk, and where

\begin{array}{l} {R^{'}}_{n k} (s, t) = R_{n k} (s, t) + ρ_{n k} (s, t), with \\ ρ_{n k} (s, t) = \int_{[s, t)} {a_{k} {F_{0 k} (t_{0}) - F_{0 k} (u)} + a_{K + 1} {F_{0 +} (t_{0}) - F_{0 +} (u)}} d (G_{n} - G) (u) . \end{array}

Note that ρ_nk(τ_nk, s) = o_p(n^−2/3), uniformly in s ∈ [t₀ – m₁n^−1/3,t₀ + m₁n^−1/3], using (29) in [8, Lemma 4.9] and t₀ – τ_nk = O_p(n^−1/3) by [8, Corollary 4.19]. Hence, the remainder term R′_nk in (37) is of the same order as R_nk. Next, consider (37), and write $\int_{[τ_{n k}, s)} = \int_{[τ_{n k}, t_{0})} + \int_{[t_{0}, s),}$ let s = t₀ + n^−1/3t, and multiply by n^2/3/g(t₀). This yields

c_{n k} + a_{k} {\bar{H}}_{n k} (t) + a_{K + 1} {\bar{H}}_{n +} (t) \leq R_{n k}^{l o c} (t) + a_{k} V_{n k}^{l o c} (t -) + a_{K + 1} V_{n +}^{l o c} (t -),

(38)

where equality holds if t is a jump point of ${\hat{F}}_{n k}^{l o c}$ and where

R_{n k}^{l o c} (t) = {n^{2 / 3} / g (t_{0})} {R^{'}}_{n k} (τ_{n k}, t_{0} + n^{- 1 / 3} t), k = 1, \dots, K .

(39)

Note that $R_{n k}^{l o c} (t) = o_{p} (1)$ uniformly in t ∈ [−m₁, m₁], using again that t₀ – τ_nk = O_p(n^−1/3). Moreover, note that $R_{n k}^{l o c}$ is left-continuous. We now remove the random variables c_nk by solving the following system of equations for H₁,…,H_K:

c_{n k} + a_{k} {\bar{H}}_{n k} (t) + a_{K + 1} {\bar{H}}_{n +} (t) = a_{k} H_{n k} (t) + a_{K + 1} H_{n +} (t), k = 1, \dots, K .

The unique solution is $H_{n k} (t) = {\bar{H}}_{n k} (t) + (c_{n k} / a_{k}) + \sum_{k = 1}^{K} (c_{n k} / a_{k}) \equiv {\hat{H}}_{n k}^{l o c} (t) .$

Definition 3.3

We define ${\hat{U}}_{n} = (R_{n}^{l o c}, V_{n}^{l o c}, {\hat{H}}_{n}^{l o c}, {\hat{F}}_{n}^{l o c}),$ where $R_{n}^{l o c} = (R_{n 1}^{l o c}, \dots, R_{n K}^{l o c})$ with $R_{n k}^{l o c}$ defined by (39), and where $V_{n}^{l o c}, {\hat{H}}_{n}^{l o c}$ and ${\hat{F}}_{n}^{l o c}$ are given in Definition 34. We use the notation ·|[−m,m] to denote that processes are restricted to [−m,m]. We now define a space for Û_n|[−m,m]:

Definition 3.4

For any interval I, let D⁻(I) be the collection of ‘caglad’ functions on I (left-continuous with right limits), and let C(I) denote the collection of continuous functions on I. For m ∈ ℕ, we define the space

\begin{matrix} E [- m, m] = {(D^{-} [- m, m])}^{K} \times {(D [- m, m])}^{K} \times {(C [- m, m])}^{K} \times {(D [- m, m])}^{K} \\ \equiv I \times I I \times I I I \times I V, \end{matrix}

endowed with the product topology induced by the uniform topology on I × II × III, and the Skorohod topology on IV.

Proof of Theorem 1.8

Analogously to the work of [6, Proof of Theorem 6.2] on the estimation of convex densities, we first show that Û_n|[−m,m] is tight in E[−m,m] for each m ∈ ℕ. Since $R_{n k}^{l o c} | [- m, m] = o_{p} (1)$ by Proposition 3.2, it follows that $R_{n}^{l o c}$ is tight in (D⁻[−m,m])^K endowed with the uniform topology. Next, note that the subset of D[−m,m] consisting of absolutely bounded nondecreasing functions is compact in the Skorohod topology. Hence, the local rate of convergence of the MLE [see 8, Theorem 4.17] and the monotonicity of ${\hat{F}}_{n k}^{l o c},$ k = 1,…,K, yield tightness of ${\hat{F}}_{n}^{l o c} | [- m, m]$ in the space (D[−m,m])^K endowed with the Skorohod topology. Moreover, since the set of absolutely bounded continuous functions with absolutely bounded derivatives is compact in C[−m,m] endowed with the uniform topology, it follows that ${\bar{H}}_{n}^{l o c} | [- m, m]$ is tight in (C[−m,m])^K endowed with the uniform topology. Furthermore, $V_{n}^{l o c} | [- m, m]$ is tight in (D[−m,m])^K endowed with the uniform topology, since $V_{n}^{l o c} (t) \to_{d} V (t)$ uniformly on compacta. Finally, c_n1,…,c_nK are tight since each c_nk is the difference of quantities that are tight, using that t₀ – τ_nk = O_p(n^−1/3) by [8, Corollary 4.19]. Hence, also ${\hat{H}}_{n}^{l o c} | [- m, m]$ is tight in (C[−m,m])^K endowed with the uniform topology. Combining everything, it follows that Û_n[−m,m] is tight in E[−m,m] for each m ∈ ℕ.

It now follows by a diagonal argument that any subsequence Û_n′ of Û_n has a further subsequence Û_n″ that converges in distribution to a limit

U = (0, V, H, F) \in {(C (ℝ))}^{K} \times {(C (ℝ))}^{K} \times {(C (ℝ))}^{K} \times {(D (ℝ))}^{K} .

Using a representation theorem (see, e.g., [2], [15, Representation Theorem 13, page 71], or [17, Theorem 1.10.4, page 59]), we can assume that Û_n″ →_a.s U. Hence, F = H′ at continuity points of F, since the derivatives of a sequence of convex functions converge together with the convex functions at points where the limit has a continuous derivative. Proposition 3.2 and the continuous mapping theorem imply that the vector (V,H,F) must satisfy

\begin{array}{l} inf_{[- m, m]} {a_{k} V_{k} (t) + a_{K + 1} V_{+} (t) - a_{k} H_{k} (t) - a_{K + 1} H_{+} (t)} \geq 0, \\ \int_{- m}^{m} {a_{k} V_{k} (t) + a_{K + 1} V_{+} (t) - a_{k} H_{k} (t) - a_{K + 1} H_{+} (t)} d F_{k} (t) = 0, \end{array}

for all m ∈ ℕ, where we replaced V_k(t–) by V_k(t), since V₁,…,V_K are continuous.

Letting m → ∞ it follows that H₁,…,H_K satisfy conditions (i) and (ii) of Theorem 1.7. Furthermore, Theorem 1.7 (iii) is satisfied since t₀ – τ_nk = O_p(n^−1/3) by [8, Corollary 4.19]. Hence, there exists a K-tuple of processes (H₁,…,H_K) that satisfies the conditions of Theorem 1.7. This proves the existence part of Theorem 1.7. Moreover, Proposition 2.16 implies that there is only one such K-tuple. Thus, each subsequence converges to the same limit H = (H₁,…,H_K) = (Ĥ₁,…,Ĥ_K) defined in Theorem 1.8. In particular, this implies that ${\hat{F}}_{n}^{l o c} (t) = n^{1 / 3} ({\hat{F}}_{n} (t_{0} + n^{- 1 / 3} t) - F_{0} (t_{0})) \to_{d} \hat{F} (t)$ in the Skorohod topology on (D(ℝ))^K.

4. Simulations

We simulated 1000 data sets of sizes n = 250, 2500 and 25000, from the model given in Example 2.3. For each data set, we computed the MLE and the naive estimator. For computation of the naive estimator, see [1, pages 13–15] and [9, pages 40–41]. Various algorithms for the computation of the MLE are proposed by [10, 11, 12]. However, in order to handle large data sets, we use a different approach. We view the problem as a bivariate censored data problem, and use a method based on sequential quadratic programming and the support reduction algorithm of [7]. Details are discussed in [13, Chapter 5]. As convergence criterion we used satisfaction of the characterization in [8, Corollary 2.8] within a tolerance of 10⁻¹⁰. Both estimators were assumed to be piecewise constant, as discussed in the beginning of Section 3.

It was suggested by [12] that the naive estimator can be improved by suitably modifying it when the sum of its components exceeds one. In order to investigate this idea, we define a ‘scaled naive estimator’ ${\tilde{F}}_{n k}^{s}$ by

{\tilde{F}}_{n k}^{s} (t) = {\begin{matrix} {\tilde{F}}_{n k} (t) & if {\tilde{F}}_{n +} (s_{0}) \leq 1, \\ {\tilde{F}}_{n k} (t) / {\tilde{F}}_{n +} (s_{0}) & if {\tilde{F}}_{n +} (s_{0}) > 1, \end{matrix}

for k = 1,…,K, where we take s₀ = 3. Note that ${\tilde{F}}_{n +}^{s} (t) \leq 1$ for t ≤ 3. We also defined a ‘truncated naive estimator’ ${\tilde{F}}_{n k}^{t} .$ If F̃_n+(T_(n)) ≤ 1, then ${\tilde{F}}_{n k}^{t} \equiv {\tilde{F}}_{n k}$ for all k = 1,…,K. Otherwise, we let s_n = min{t : F̃_n+(t) > 1} and define

\begin{matrix} {\tilde{F}}_{n k}^{t} (t) = {\begin{matrix} {\tilde{F}}_{n k} (t) & for t < s_{n}, \\ {\tilde{F}}_{n k} (t) + α_{n k} & for t \geq s_{n}, \end{matrix} \\ where α_{κ} = \frac{{\tilde{F}}_{n k} (s_{n}) - {\tilde{F}}_{n k} (s_{n} -)}{{\tilde{F}}_{n +} (s_{n}) - {\tilde{F}}_{n +} (s_{n} -)} {1 - {\tilde{F}}_{n +} (s_{n} -)}, \end{matrix}

for k = 1,…,K. Note that ${\tilde{F}}_{n +}^{t} (t) \leq 1$ for all t ∈ ℝ.

We computed the mean squared error (MSE) of all estimators on a grid with points 0, 0.01, 0.02,…,3.0. Subsequently, we computed relative MSEs by dividing the MSE of the MLE by the MSE of each estimator. The results are shown in Figure 3. Note that the MLE tends to have the best MSE, for all sample sizes and for all values of t. Only for sample size 250 and small values of t, the scaled naive estimator outperforms the other estimators; this anomaly is caused by the fact that this estimator is scaled down so much that it has a very small variance. The difference between the MLE and the naive estimators is most pronounced for large values of t. This was also observed by [12], and they explained this by noting that only the MLE is guaranteed to satisfy the constraint F₊(t) ≤ 1 at large values of t. We believe that this constraint is indeed important for small sample sizes, but the theory developed in this paper indicates that it does not play any role asymptotically. Asymptotically, the difference can be explained by the extra term (a_K+1/a_k){V₊ – Ĥ₊} in the limiting process of the MLE (see Proposition 2.4), since the factor a_K+1/a_k = F_0k(t)/F_0,K+1(t) is increasing in t.

Among the naive estimators, the truncated naive estimator behaves better than the naive estimator for sample sizes 250 and 2500, especially for large values of t. However, for sample size 25000 we can barely distinguish the three naive estimators. The latter can be explained by the fact that all versions of the naive estimator are asymptotically equivalent for t ∈ [0, 3], since consistency of the naive estimator ensures that lim_n→∞ F̃_n+(3) ≤ 1 almost surely. On the other hand, the three naive estimators are clearly less efficient than the MLE for sample size 25000. These results support our theoretical finding that the form of the likelihood (and not the constrained F₊ ≤ 1) causes the different asymptotic behavior of the MLE and the naive estimator.

Finally, we note that our simulations consider estimation of F_0k(t), for t on a grid. Alternatively, one can consider estimation of certain smooth functional of F_0k. The naive estimator was suggested to be asymptotically efficient for this purpose [12], and [14, Chapter 7] proved that the same is true for the MLE. A simulation study that compares the estimators in this setting is presented in [14, Chapter 8.2].

5. Technical proofs

Proof of Lemma 2.11

Let k ∈ {1,…,K} and j ∈ ℕ = {0,1,…}. Note that for M large, we have for all w ≤ j + 1:

C (s_{j M} - w \lor {(s_{j M} - w)}^{3 / 2} \leq \frac{1}{2} b {(s_{j M} - w)}^{2} .

Hence, the probability in the statement of Lemma 2.11 is bounded above by

P {\sup_{w \leq j + 1} {\int_{w}^{s_{j M}} d S_{k} (u) - \frac{1}{2} b {(s_{j M} - w)}^{2}} \geq 0} .

In turn, this probability is bounded above by

\sum_{q = 0}^{\infty} P {\sup_{w \in (j - q, j - q + 1]} \int_{w}^{s_{j M}} d S_{k} (u) \geq λ_{k j q}},

(40)

where λ_kjq = b(s_jM – (j – q + 1))²/2 = b(Mu(j) + q – 1)²/2.

We write the qth term in (40) as

\begin{array}{l} P (\sup_{w \in [j - q, j - q + 1)} S_{k} (s_{j M} - w) \geq λ_{k j q}) \leq P (\sup_{w \in [0, M v (j) + q)} S_{k} (w) \geq λ_{k j q}) \\ = P (\sup_{w \in [0, 1)} S_{k} (w) \geq \frac{λ_{k j q}}{\sqrt{M v (j) + q}}) \leq P (\sup_{w \in [0, 1]} B_{k} (w) \geq \frac{λ_{k j q}}{b_{k} \sqrt{M v (j) + q}}) \\ \leq 2 P (N (0, 1) \geq \frac{λ_{k j q}}{b_{k} \sqrt{M v (j) + q}}) \leq 2 b_{k j q} \exp (- \frac{1}{2} {(\frac{λ_{k j q}}{b_{k} \sqrt{M v (j) + q}})}^{2}), \end{array}

where b_k is the standard deviation of S_k(1) and $b_{j k q} = b_{k} \sqrt{M v (j) + q} / (λ_{k j q} \sqrt{2 π}),$ and B_k(·) is standard Brownian motion. Here we used standard properties of Brownian motion. The second to last inequality is given in for example [16, equation 6, page 33], and the last inequality follows from Mills’ ratio [see 3, Equation (10)]. Note that b_kjq ≤ d all j ∈ ℕ, for some d > 0 and all M > 3. It follows that (40) is bounded above by

\sum_{q = 0}^{\infty} d \exp (- \frac{1}{2} {(\frac{λ_{k j q}}{b_{k} \sqrt{M v (j) + q}})}^{2}) \approx \sum_{q = 0}^{\infty} d \exp (- \frac{1}{2} \frac{{(M v (j) + q)}^{3}}{b_{k}^{2}}),

which in turn is bounded above by d₁ exp(−d₂(Mu(j))³), for some constants d₁ and d₂, using (a + b)³ ≥ a³ + b³ for a, b ≥ 0.

Proof of Lemma 2.12

This proof is completely analogous to the proof of Lemma 4.14 of [8], upon replacing F̂_nk(u) by F^_k(u), F_0k(u) by F̄_0k(u), dG(u) by du, S_nk(·) by S_k(·), τ_nkj by τ_kj, s_njM by s_jM, and A_njM by A_jM. The only difference is that the second term on the right side of [8, equation (69)] vanishes, since this term comes from the remainder term R_nk(s,t), and we do not have such a remainder term in the limiting characterization given in Proposition 3.2.

Footnotes

Supported in part by NSF grant DMS-0203320

^†

Supported in part by NSF grants DMS-0203320 and DMS-0503822 and by NI-AID grant 2R01 AI291968-04

AMS 2000 subject classifications: Primary 62N01, 62G20; secondary 62G05

Contributor Information

Piet Groeneboom, Department of Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands, e-mail: p.groeneboom@its.tudelft.nl.

Marloes H. Maathuis, Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195, USA, e-mail: marloes@stat.washington.edu.

Jon A. Wellner, Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195, USA, e-mail: jaw@stat.washington.edu.

REFERENCES

1.Barlow RE, Bartholomew DJ, Bremner JM, Brunk HD. Statistical Inference Under Order Restrictions. The Theory and Application of Isotonic Regression. New York: John Wiley & Sons; 1972. [Google Scholar]
2.Dudley RM. Distances of probability measures and random variables. Ann. Math. Statist. 1968;39:1563–1572. [Google Scholar]
3.Gordon RD. Values of Mills’ ratio of area to bounding ordinate and of the normal probability integral for large values of the argument. Ann. Math. Statistics. 1941;12:364–366. [Google Scholar]
4.Groeneboom P. Brownian motion with a parabolic drift and Airy functions. Probability Theory and Related Fields. 1989;81:79–109. [Google Scholar]
5.Groeneboom P, Jongbloed G, Wellner JA. A canonical process for estimation of convex functions: The “invelope” of integrated Brownian motion + t4. Ann. Statist. 2001;29:1620–1652. [Google Scholar]
6.Groeneboom P, Jongbloed G, Wellner JA. Estimation of a convex function: Characterizations and asymptotic theory. Ann. Statist. 2001;29:1653–1698. [Google Scholar]
7.Groeneboom P, Jongbloed G, Wellner JA. Technical Report 2002–13. The Netherlands: Vrije Universiteit Amsterdam; 2002. The support reduction algorithm for computing nonparametric function estimates in mixture models. Available at arXiv:math/ST/0405511. [Google Scholar]
8.Groeneboom P, Maathuis MH, Wellner JA. Current status data with competing risks: consistency and rates of convergence of the MLE. Ann. Statist. 2007 doi: 10.1214/009053607000000983. accepted. [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Groeneboom P, Wellner JA. Information Bounds and Nonparametric Maximum Likelihood Estimation. Basel: Birkhäauser Verlag; 1992. [Google Scholar]
10.Hudgens MG, Satten GA, Longini IM. Nonparametric maximum likelihood estimation for competing risks survival data subject to interval censoring and truncation. Biometrics. 2001;57:74–80. doi: 10.1111/j.0006-341x.2001.00074.x. [DOI] [PubMed] [Google Scholar]
11.Jewell NP, Kalbfleisch JD. Maximum likelihood estimation of ordered multinomial parameters. Biostatistics. 2004;5:291–306. doi: 10.1093/biostatistics/5.2.291. [DOI] [PubMed] [Google Scholar]
12.Jewell NP, Van der Laan MJ, Henneman T. Nonparametric estimation from current status data with competing risks. Biometrika. 2003;90:183–197. [Google Scholar]
13.Maathuis MH. Master's thesis. The Netherlands: Delft University of Technology; 2003. Nonparametric Maximum Likelihood Estimation for Bivariate Censored Data. Available at http://www.stat.washington.edu/marloes/papers. [Google Scholar]
14.Maathuis MH. Ph.D. thesis. University of Washington; 2006. Nonparametric Estimation for Current Status Data with Competing Risks. Available at http://www.stat.washington.edu/marloes/papers. [Google Scholar]
15.Pollard D. Convergence of Stochastic Processes. New York: Springer-Verlag; 1984. Available at http://ameliabedelia.library.yale.edu/dbases/pollard1984.pdf. [Google Scholar]
16.Shorack GR, Wellner JA. Empirical Processes with Applications to Statistics. New York: John Wiley & Sons; 1986. [Google Scholar]
17.Van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes: With Applications to Statistics. New York: Springer-Verlag; 1996. [Google Scholar]

[R1] 1.Barlow RE, Bartholomew DJ, Bremner JM, Brunk HD. Statistical Inference Under Order Restrictions. The Theory and Application of Isotonic Regression. New York: John Wiley & Sons; 1972. [Google Scholar]

[R2] 2.Dudley RM. Distances of probability measures and random variables. Ann. Math. Statist. 1968;39:1563–1572. [Google Scholar]

[R3] 3.Gordon RD. Values of Mills’ ratio of area to bounding ordinate and of the normal probability integral for large values of the argument. Ann. Math. Statistics. 1941;12:364–366. [Google Scholar]

[R4] 4.Groeneboom P. Brownian motion with a parabolic drift and Airy functions. Probability Theory and Related Fields. 1989;81:79–109. [Google Scholar]

[R5] 5.Groeneboom P, Jongbloed G, Wellner JA. A canonical process for estimation of convex functions: The “invelope” of integrated Brownian motion + t4. Ann. Statist. 2001;29:1620–1652. [Google Scholar]

[R6] 6.Groeneboom P, Jongbloed G, Wellner JA. Estimation of a convex function: Characterizations and asymptotic theory. Ann. Statist. 2001;29:1653–1698. [Google Scholar]

[R7] 7.Groeneboom P, Jongbloed G, Wellner JA. Technical Report 2002–13. The Netherlands: Vrije Universiteit Amsterdam; 2002. The support reduction algorithm for computing nonparametric function estimates in mixture models. Available at arXiv:math/ST/0405511. [Google Scholar]

[R8] 8.Groeneboom P, Maathuis MH, Wellner JA. Current status data with competing risks: consistency and rates of convergence of the MLE. Ann. Statist. 2007 doi: 10.1214/009053607000000983. accepted. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] 9.Groeneboom P, Wellner JA. Information Bounds and Nonparametric Maximum Likelihood Estimation. Basel: Birkhäauser Verlag; 1992. [Google Scholar]

[R10] 10.Hudgens MG, Satten GA, Longini IM. Nonparametric maximum likelihood estimation for competing risks survival data subject to interval censoring and truncation. Biometrics. 2001;57:74–80. doi: 10.1111/j.0006-341x.2001.00074.x. [DOI] [PubMed] [Google Scholar]

[R11] 11.Jewell NP, Kalbfleisch JD. Maximum likelihood estimation of ordered multinomial parameters. Biostatistics. 2004;5:291–306. doi: 10.1093/biostatistics/5.2.291. [DOI] [PubMed] [Google Scholar]

[R12] 12.Jewell NP, Van der Laan MJ, Henneman T. Nonparametric estimation from current status data with competing risks. Biometrika. 2003;90:183–197. [Google Scholar]

[R13] 13.Maathuis MH. Master's thesis. The Netherlands: Delft University of Technology; 2003. Nonparametric Maximum Likelihood Estimation for Bivariate Censored Data. Available at http://www.stat.washington.edu/marloes/papers. [Google Scholar]

[R14] 14.Maathuis MH. Ph.D. thesis. University of Washington; 2006. Nonparametric Estimation for Current Status Data with Competing Risks. Available at http://www.stat.washington.edu/marloes/papers. [Google Scholar]

[R15] 15.Pollard D. Convergence of Stochastic Processes. New York: Springer-Verlag; 1984. Available at http://ameliabedelia.library.yale.edu/dbases/pollard1984.pdf. [Google Scholar]

[R16] 16.Shorack GR, Wellner JA. Empirical Processes with Applications to Statistics. New York: John Wiley & Sons; 1986. [Google Scholar]

[R17] 17.Van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes: With Applications to Statistics. New York: Springer-Verlag; 1996. [Google Scholar]

PERMALINK

CURRENT STATUS DATA WITH COMPETING RISKS: LIMITING DISTRIBUTION OF THE MLE

Piet Groeneboom

Marloes H Maathuis

Jon A Wellner

Abstract

1. Introduction

1.1. Notation

Definition 1.1

1.2. Assumptions

1.3. The estimators

1.4. Main results

Definition 1.2

Remark 1.3

Remark 1.4

Definition 1.5

Theorem 1.6

Theorem 1.7

Theorem 1.8

1.5. Outline

2. Limiting processes

2.1. Interpretations of Ĥ and F̂

Proposition 2.1

Corollary 2.2

FIG 1.

Proposition 2.4

FIG 2.

FIG 3.

Proposition 2.5

Corollary 2.6

Definition 2.7

Proposition 2.8

2.2. Tightness of Ĥ and F̂

Proposition 2.9

Proposition 2.10

Lemma 2.11

Lemma 2.12

Corollary 2.13

Proof of Proposition 2.9

Corollary 2.14

Corollary 2.15

2.3. Uniqueness of Ĥ and F̂

Lemma 2.17

Proof of Proposition 2.16

3. Proof of the limiting distribution of the MLE

Definition 3.1

Proposition 3.2

Definition 3.3

Definition 3.4

Proof of Theorem 1.8

4. Simulations

5. Technical proofs

Proof of Lemma 2.11

Proof of Lemma 2.12

Footnotes

Contributor Information

REFERENCES

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases