Abstract
In this paper, we study the nonparametric maximum likelihood estimator (MLE) of a convex hazard function. We show that the MLE is consistent and converges at a local rate of n^{2/5} at points x0 where the true hazard function is positive and strictly convex. Moreover, we establish the pointwise asymptotic distribution theory of our estimator under these same assumptions. One notable feature of the nonparametric MLE studied here is that no arbitrary choice of tuning parameter (or complicated data-adaptive selection of the tuning parameter) is required.
Keywords: antimode, bathtub, consistency, convex, failure rate, force of mortality, hazard rate, invelope process, limit distribution, nonparametric estimation, U-shaped
1. Introduction
Information on the behavior of time to a random event is of much interest in many fields. The random event could be failure of a material or machine, death, an earthquake or infection by a disease, to name but a few examples. Frequently, this type of data is called lifetime data, and it is natural to assume that it takes values in [0, ∞). If the lifetime distribution F has a density f , then a key quantity of interest is the hazard (or failure) rate h(t) = f(t)/(1 − F(t)). Heuristically, h(t) dt is the probability that, given survival until time t, the event will occur in the next duration of length dt. The hazard function is also known as the force of mortality in actuarial science or the intensity function in extreme value theory.
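As a quick numerical check of the definition h(t) = f(t)/(1 − F(t)) (the function names below are ours), the exponential distribution has constant hazard λ:

```python
import math

def hazard(f, F, t):
    """Hazard rate h(t) = f(t) / (1 - F(t)), defined where F(t) < 1."""
    return f(t) / (1.0 - F(t))

# Exponential(lam): f(t) = lam * exp(-lam * t), F(t) = 1 - exp(-lam * t),
# so the hazard is constant and equal to lam.
lam = 2.0
f_exp = lambda t: lam * math.exp(-lam * t)
F_exp = lambda t: 1.0 - math.exp(-lam * t)

print(hazard(f_exp, F_exp, 0.5))  # constant hazard: 2.0
```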
Certain shape restrictions arise quite naturally for hazard rates. In this work, we are particularly interested in the family of hazard functions which are convex. That is, we attach the additional smoothness constraint of convexity to the more traditional assumption of a bathtub-shaped failure rate (that is, first decreasing, then increasing). Heuristically, bathtub-shaped hazards correspond to lifetime distributions with high initial hazard (or infant mortality), lower and often rather constant hazard during the middle of life and then increasing hazard of failure (or wear-out) as aging proceeds; see [20,23].
Many other estimators of hazard functions (and solutions to the closely related problem of estimating the intensity of a Poisson process) with and without shape restrictions have been considered in the literature; see [24] for a partial review up to 2002. In recent years, the focus has shifted to construction of “adaptive” estimators over large scales of smoothness classes; see, for example, [5,6,25]. Virtually all of these other estimators require careful choice of penalty terms or tuning parameters, and computation of the adaptive estimators typically involves methods of combinatorial optimization. As far as we know, reliable algorithms for computing them are not yet available. Our estimators avoid the choices of tuning parameters or penalty terms by virtue of the shape constraint of convexity and are relatively straightforward to compute since the corresponding optimization problems are convex.
Recall the definition of a convex function. Let C ⊆ ℝ be a convex set. A function h : C → ℝ is convex (on C) if it satisfies
h(λx + (1 − λ)y) ≤ λh(x) + (1 − λ)h(y)
for all x, y ∈ C and all λ ∈ [0, 1]. Equivalently, a function is convex if its epigraph
epi(h) = {(x, y) ∈ C × ℝ : y ≥ h(x)}
is a convex set in ℝ² (see, for example, [27], Section 4). Thus, a convex function on C may be extended to a convex function on ℝ by setting h(x) = +∞ for x ∉ C.
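The defining inequality can be checked mechanically on a grid; the sketch below (function names are ours) accepts a convex bathtub-shaped quadratic and rejects a concave one:

```python
def is_convex_on_grid(h, xs, lambdas=(0.25, 0.5, 0.75), tol=1e-12):
    """Check h(l*x + (1-l)*y) <= l*h(x) + (1-l)*h(y) for all pairs of
    grid points and a few values of l in (0, 1)."""
    for x in xs:
        for y in xs:
            for l in lambdas:
                z = l * x + (1 - l) * y
                if h(z) > l * h(x) + (1 - l) * h(y) + tol:
                    return False
    return True

xs = [i / 10.0 for i in range(0, 51)]                          # grid on [0, 5]
print(is_convex_on_grid(lambda t: (t - 2.0) ** 2 + 0.1, xs))   # convex bathtub
print(is_convex_on_grid(lambda t: -(t - 2.0) ** 2, xs))        # concave: fails
```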
Suppose, then, that we observe i.i.d. variables X1, … , Xn from a distribution F0 with density f0 and hazard rate h0. We denote the true cumulative hazard function by H0 and the true survival function by S0 = 1 − F0. Also, 0 < X(1) < X(2) < ⋯ < X(n) denote the order statistics corresponding to X1, … , Xn.
To define the MLE of h0, denoted ĥn, we first consider the likelihood in terms of the hazard,
Ln(h) = ∏_{i=1}^{n} h(Xi) exp(−H(Xi)),
where H(t) = ∫₀ᵗ h(s) ds. This can be made arbitrarily large by increasing the value of h(X(n)). We therefore find ĥn by maximizing the modified likelihood
| L̃n(h) = ∏_{i=1}^{n−1} h(X(i)) ∏_{i=1}^{n} exp(−H(X(i))) | (1.1) |
over , the space of non-negative convex functions on [0, X(n)). The full MLE is then found by setting for all x ≥ X(n). This is the same approach as taken in [10]. Equivalently, one could first impose the constraint that h ≤ M, and then let M → ∞ (see, for example, [26], page 338).
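The modified criterion can be evaluated directly. The sketch below is a minimal illustration under one plausible reading of (1.1) — drop the log h(X(n)) term and keep all cumulative-hazard terms — with H(t) computed by the trapezoid rule; all function names are ours:

```python
import math

def cum_hazard(h, t, steps=2000):
    """H(t) = integral of h over [0, t], via the trapezoid rule."""
    if t == 0.0:
        return 0.0
    dt = t / steps
    s = 0.5 * (h(0.0) + h(t)) + sum(h(i * dt) for i in range(1, steps))
    return s * dt

def neg_modified_loglik(h, x):
    """Negative modified log-likelihood: the log h(X_(n)) term is dropped,
    since the full likelihood can be made arbitrarily large through it."""
    n = len(x)
    return (-sum(math.log(h(xi)) for xi in x[:n - 1])
            + sum(cum_hazard(h, xi) for xi in x))

x = sorted([0.3, 0.9, 1.4, 2.2, 3.1])
print(neg_modified_loglik(lambda t: 1.0, x))  # h = 1: log terms vanish, value = sum(x)
```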
To illustrate the proposed estimator, consider the distribution with density given by
This distribution was derived in [14] as a relatively simple model with bathtub-shaped hazards which also has an adequate ability to model lifetime behavior. We will call this the HS distribution, after the authors. It has convex hazards for all values of b in the parameter space (b > −1/2). Figure 1 shows the MLE for simulated data from this distribution with sample size n = 100.
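Simulating lifetime data with a convex hazard is straightforward via the standard fact that T = H⁻¹(E), with E ~ Exp(1), has cumulative hazard H. The quadratic bathtub hazard below is our illustration, not the HS model:

```python
import random

def sample_from_hazard(H, n, seed=0, upper=50.0):
    """Inverse-transform sampling: if E ~ Exp(1), then T = H^{-1}(E) has
    cumulative hazard H.  H must be continuous and strictly increasing."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        e = rng.expovariate(1.0)
        lo, hi = 0.0, upper
        for _ in range(80):            # bisection to solve H(t) = e
            mid = 0.5 * (lo + hi)
            if H(mid) < e:
                lo = mid
            else:
                hi = mid
        out.append(0.5 * (lo + hi))
    return out

# Generic convex bathtub hazard h(t) = 0.2 + (t - 1)^2 (our illustration),
# with closed-form cumulative hazard H(t) = 0.2 t + ((t - 1)^3 + 1) / 3.
H = lambda t: 0.2 * t + ((t - 1.0) ** 3 + 1.0) / 3.0
data = sample_from_hazard(H, 100)
print(len(data), min(data) > 0.0)  # 100 positive lifetimes
```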
Figure 1.
Examples of the estimator. Left: Estimating the HS hazard with b = 0 and A = 1 for a sample size of 100 (bold = true hazard, solid = MLE). Right: Estimation of the earthquake hazard from CPTI04 data (solid = MLE).
We also applied our estimators to the earthquake data of the Appennino Abruzzese region of Italy (Region 923) recently considered by [19], where Bayesian estimation methods are studied. The data come from the Gruppo di Lavoro CPTI (2004) catalog [9] and consist of 46 inter-quake times for Region 923, occurring after the year 1650 and with moment magnitude greater than 5.1 (details on the justification of these criteria are available in [19], page 14). Figure 1 shows the resulting estimator.
The main results of this paper are the characterizations, consistency and asymptotic behavior of the nonparametric MLE of a convex hazard function. The estimator is continuous and piecewise linear on [0, X(n)). Although we give a characterization of the MLE, the final form of the estimator is not explicit. We therefore propose an algorithm (based on the support reduction algorithm of [13]). This algorithm is discussed in a separate report, [16], and is available as the R package convexHaz [17].
To describe the local asymptotics of the MLE, we introduce the following process.
Definition 1.1
Let W(s) denote a standard two-sided Brownian motion, with W(0) = 0, and define Y(t) = ∫₀ᵗ W(s) ds + t⁴. The function I, the invelope of the process Y, is defined as follows:
| (1.2) |
| (1.3) |
| (1.4) |
It was shown in [11] that the invelope I exists and is almost surely uniquely defined. Moreover, with probability one, I is three times differentiable at t = 0. The asymptotic behavior of all of our estimators may be described in terms of the derivatives of the invelope at zero. The following theorem builds on the basic results in [12] concerning nonparametric estimation of a decreasing convex density.
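A discretized path of the driving process is easy to simulate. The sketch below assumes, as we read [11], that the process in Definition 1.1 is Y(t) = ∫₀ᵗ W(s) ds + t⁴; the step size and names are ours:

```python
import random, math

def simulate_Y(M=1.0, n=2000, seed=1):
    """Discretized path of Y(t) = int_0^t W(s) ds + t^4 on [0, M],
    using Euler approximation of the integrated Brownian motion."""
    rng = random.Random(seed)
    dt = M / n
    t, y = [0.0], [0.0]
    w, integral = 0.0, 0.0
    for i in range(1, n + 1):
        w += rng.gauss(0.0, math.sqrt(dt))   # Brownian increment
        integral += w * dt                   # approximates int_0^t W(s) ds
        ti = i * dt
        t.append(ti)
        y.append(integral + ti ** 4)
    return t, y

t, y = simulate_Y()
print(len(y), y[0])
```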
Theorem 1.2
Suppose that h0 is convex, x0 > 0 is a point which satisfies h0(x0) > 0 and h0″(x0) > 0, and h0″ is continuous in a neighborhood of x0. Then
where I″(0) and I‴(0) are the second and third derivatives at 0, respectively, of the invelope I of the process Y and where
The key to this result lies in Lemma 5.3, where we establish that the “touchpoints” (defined carefully in Section 2) cluster around x0 at a local scale of n^{−1/5}. The assumption that h0 is strictly convex at x0, with h0″ continuous in a neighborhood, is crucial in this step. If h0″(x0) = 0 (and h0″ is continuous in a neighborhood of x0), we conjecture that ĥn(x0) converges at the rate n^{1/2}. Similar behavior has been noted for monotone density estimators in [8]. If h0″ is discontinuous at x0, unpublished work of Cai and Low [7] suggests that ĥn(x0) converges to h0(x0) at rate n^{1/3}. The behavior of convex-constrained estimators in both of these situations remains unknown and is the subject of current research.
The limiting distributions of ĥn(x0) and ĥn′(x0) involve the constants c1 and c2, which depend on the (unknown) hazard function h0, as well as the random variables I″(0) and I‴(0), which have a universal distribution free of the parameters of the problem. Thus, Theorem 1.2 can be used, in principle, to form confidence intervals for h0(x0) and h0′(x0). This would involve estimation of the constants c1 = c1(h0, x0) and c2 = c2(h0, x0), respectively, both of which depend on h0″(x0), and appropriate quantiles of the distributions of I″(0) and I‴(0), respectively. Although virtually nothing is known analytically about the distribution of the invelope and its derivatives, the algorithms developed in [11] can easily be used to obtain simulated values of the needed quantiles. Other possible approaches to confidence intervals in this problem involve inversion of likelihood ratio tests (see [2–4] for this approach in the context of monotone or U-shaped function estimation) or resampling methods as in [22] and as discussed in [4] in the setting of nonparametric estimation of monotone functions. It should be noted that our Theorem 1.2 verifies one of the key hypotheses needed for validity of the general subsampling theory of [21,22], and therefore makes the subsampling approach to confidence intervals viable. The details and properties of all these approaches remain to be investigated.
The outline of this paper is as follows. Section 2 is dedicated to the proof of characterizations, existence and uniqueness of the MLE. Consistency is proved in Section 3, and Section 4 establishes lower bounds for the pointwise minimax risk of the MLE. Rates of convergence are established in Section 5, with Section 5.3 containing proofs of our main results concerning the limiting distribution at a fixed point. The companion technical report [15] also includes a detailed treatment of a least-squares estimator, as well as sketches of similar results for censored data and intensity functions of Poisson processes.
2. Characterizations, uniqueness and existence
Proposition 2.1
The function which minimizes φn over is piecewise linear. It has at most one change of slope between observations, except perhaps in one such interval, where, if the estimator touches zero, it may have two changes of slope (it is zero between these two changes). Also, between zero and X(1), the minimizer may have at most one change of slope, but this happens only if it touches zero, and in this case the estimator is increasing and equal to zero before the first change of slope. Between X(n−1) and X(n), the minimizer will also have at most one change of slope, and this only in the case where it is decreasing on [X(n−1), X(n)) and equal to zero after the change.
Proof
Consider any h and choose a convex g such that h(Xi) = g(Xi) for i = 1, … , n − 1 and h(x) ≥ g(x) ≥ 0 on [0, X(n)). It follows that φn(h) − φn(g) ≥ 0 if and only if H(Xi) ≥ G(Xi) for i = 1, … , n. Hence, the smaller we make g on [0, X(n)), the smaller φn(g) will become. It is not difficult to see that the smallest such g, with values of g(Xi) fixed, must have the prescribed form.
Since ĥn is piecewise linear, it may be expressed as
| ĥn(t) = â + ∑_{j=1}^{k} ν̂j (τj − t)₊ + ∑_{j=1}^{m} μ̂j (t − ηj)₊ | (2.1) |
where â ≥ 0 and ν̂j, μ̂j > 0. We let τj denote the points of change of slope of ĥn where ĥn is decreasing and let ηj > 0 denote the points of change of slope where ĥn is increasing. For simplicity, we assume that these are ordered. Also, we have τk ≤ η1. As seen in the next lemma, the τj’s and ηj’s correspond to “points of touch” or equality of processes defined on the one hand in terms of ĥn and the data, and on the other hand just in terms of the data. We therefore also refer to them as “touchpoints” repeatedly in the remainder of the paper.
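A piecewise-linear convex function of this kind can be evaluated from the kink basis (τj − t)₊ and (t − ηj)₊. The coefficient names below are ours, assuming positive masses ν, μ as required for convexity:

```python
def hhat(t, a, taus, nus, etas, mus):
    """Piecewise-linear convex function built from 'decreasing' kinks
    (tau_j - t)_+ and 'increasing' kinks (t - eta_j)_+; nus and mus
    must be positive for convexity."""
    val = a
    val += sum(nu * max(tau - t, 0.0) for tau, nu in zip(taus, nus))
    val += sum(mu * max(t - eta, 0.0) for eta, mu in zip(etas, mus))
    return val

# U-shape: slope change at tau = 1 (decreasing part) and eta = 3 (increasing).
print(hhat(0.0, 0.5, [1.0], [2.0], [3.0], [1.5]))  # 0.5 + 2*1 = 2.5
print(hhat(2.0, 0.5, [1.0], [2.0], [3.0], [1.5]))  # flat middle: 0.5
print(hhat(4.0, 0.5, [1.0], [2.0], [3.0], [1.5]))  # 0.5 + 1.5*1 = 2.0
```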
It is convenient to define the MLE in terms of the minimization of the criterion function
where denotes the empirical distribution function of the data,
We will also use the notation for the empirical survival function.
Lemma 2.2
Let . A function minimizes φn over (and hence is the MLE) if and only if:
| (2.2) |
for all x ≥ 0, with equality at τi for i = 1, … , k;
| (2.3) |
for all x ≥ 0, with equality at ηj for j = 1, … , m;
| (2.4) |
| (2.5) |
Moreover, the minimizer satisfies
| (2.6) |
for x ∈ {τ1, … , τk, η1, … , ηm}.
Remark 2.3
As we assume a priori that , we may rewrite the left-hand side terms in (2.2)–(2.4) via
for any set A. We will hereafter use this latter formulation.
Corollary 2.4
Let τi and ηj denote the change points of ĥn as in (2.1). It follows that
| (2.7) |
| (2.8) |
for i = 1, … , k and j = 1, … , m.
Proof
The function
is maximized at τi for i = 1, … , k. Since it is also differentiable, (2.7) follows. A similar argument proves (2.8).
Proof of Lemma 2.2
Consider any non-negative convex function h. It follows that there exists a non-negative constant a and non-negative measures ν and μ (these measures have supports with intersection containing at most one point) such that
For any function ĥ in , we calculate
since − log x ≥ 1 − x. Plugging in the explicit form of h from above, we find that the right-hand side is equal to
This is non-negative if is a function which satisfies conditions (2.2)–(2.5). It follows that these conditions are sufficient to describe a minimizer of φn.
We next show that these conditions are necessary. To do this, we first define the directional derivative
| (2.9) |
If minimizes φn, then for any γ such that is in for sufficiently small ε, we must have . If however, is in for sufficiently small ε, then .
If we choose, respectively, γ(t) ≡ 1, (t − y)+, (y − t)+, then is in and we obtain the inequalities in conditions (2.2)–(2.4). Since is also in , for sufficiently small ε, we obtain (2.5). Choosing γ = (τi − t)+, (t − ηj)+ yields the equalities in (2.2) and (2.3), respectively, since for each of these functions, is in .
Lastly, we prove (2.6). For any τi, define
Since (1 ± ε)γ is also in , it follows that and hence
Integration by parts and Corollary 2.4 yield (2.6) for x = τi. The case where x = ηj is obtained in a similar manner, but using and (2.5).
The next corollary allows us to extend the equalities of the characterization of the MLE to some extra touchpoints. The significance of these equations will become clear in Section 5, where we consider asymptotics of the estimator.
Corollary 2.5
Suppose that ĥn is strictly positive and recall the formulation given in (2.1). Then we also have that
| (2.10) |
| (2.11) |
| (2.12) |
| (2.13) |
Proof
The first two equalities follow by noting that if ĥn is strictly positive, then for ε sufficiently small, ĥn + εγ is feasible for γ(t) = (t − τk)+, (η1 − t)+. Arguing as for Corollary 2.4 proves the remaining identities.
Proposition 2.6
There exists a unique minimizer of φn over .
Proof
We will show that a minimizer exists by reducing the search to bounded positive convex functions on a compact domain. As this is a compact set, under the topology of uniform convergence, a minimizer of φn exists (see [27], Theorems 10.6, 10.8 and 27.3).
We must first handle the issue of a compact domain. As we assume a priori that , we are really looking for the minimizer of the modified negative of the log-likelihood with domain [0, X(n)). However, we have also argued that the minimizer must have the specific functional form as described in Proposition 2.1. Therefore, it is sufficient to reduce the domain to [0, X(n−1) + δ], for any δ > 0, since is then extended linearly beyond X(n−1) + δ in a unique manner. It will therefore be sufficient to show that we may reduce the search to functions bounded on [0, X(n−1)], with a derivative at X(n−1) which is bounded above.
Recall that the minimizer must satisfy (2.5). We therefore reduce our search to the class of functions which satisfy this condition. For any such h, write h = h+ + h−, where h+ is increasing and h− is decreasing. It follows that for any x,
A similar bound for h+ yields
| (2.14) |
for all x in (0, X(n)). Thus we know that h(x) must be bounded for x ∈ (0, X(n−1)].
To show that h is also bounded at zero, we need to show that h′(X(1)) is bounded from below. Assuming that it is negative, we may write, for 0 < x ≤ X(1),
Fixing x* > 0 and less than X(1), we then obtain that
from which it follows that h must be bounded on the set [0, X(n−1)].
By (2.5), we also have that
if h is increasing on [X(n−1), X(n)). This implies that h′(X(n−1)) is bounded above, completing the proof.
We now show uniqueness. Suppose that h1 and h2 both minimize φn. Then, by (2.5), φn(h1) and φn(h2) differ only in the term . However, this term is strictly convex and it follows that h1(X(i)) = h2(X(i)) for all i = 1, … , n − 1.
Let h‾ = (h1 + h2)/2. By linearity, we have that φn(h1) = φn(h2) = φn(h‾), which implies that h‾ is also a minimizer. However, this is possible only if h‾ also satisfies the conditions of Proposition 2.1. This implies that one of the following holds:
1. Both h1 and h2 are increasing and h1(0) = h2(0) = 0. In this case, they must have the same locations for their changes of slope, as otherwise h‾ violates Proposition 2.1.
2. Point 1 above does not hold. Then, by the same argument as above, if h1 and h2 have at least one change of slope in an interval between observations (or between zero and X(1)), then these locations of change of slope must be equal.
If the first case holds, then it is not difficult to see that h1 ≡ h2 on [0, X(n)), as h1(t) = h2(t) = 0 on [0, τ1] and h1(Xi) = h2(Xi) for all observation points.
In the second case, we use a different argument. We know that neither h1 nor h2 have touchpoints before X(1). Let t* denote the first touchpoint of h1 and (without loss of generality) assume that the first touchpoint of h2 is greater than t*. Hence, by (2.6),
Now, h‾ = (h1 + h2)/2 and h2 are also minimizers of the MLE criterion function φn. Also, h‾ has a touchpoint at t* and h‾(X(1)) = h2(X(1)).
Repeatedly averaging with h2 yields the functions h‾l = 2^{−l}(h1 − h2) + h2, which satisfy
Since h‾l → h2 pointwise, it follows from the dominated convergence theorem that . Therefore, since h1 and h2 are both linear on [0, t*] with h1(X(1)) = h2(X(1)) and , they must have both the same value and slope at X(1). That is, both h1(X(1)) = h2(X(1)) and h1′(X(1)) = h2′(X(1)) hold.
Now write
hj(t) = aj + bj t + ∑_{i=1}^{mj−1} νi,j (t − ti,j)₊, j = 1, 2,
where X(1) < t1,j < t2,j < ⋯ < tmj−1,j < X(n), j = 1, 2, and where h1(X(i)) = h2(X(i)) for i = 1, … , n. We also assume that νi,j > 0 for i = 1, … , mj − 1, j = 1, 2. This implies, in particular, that hj(t) = aj + bjt for t ≤ t1,j, j = 1, 2, and since X(1) < t1,j, j = 1, 2, h1(X(1)) = h2(X(1)). Thus a1 + b1X(1) = a2 + b2X(1). From the argument above, it follows that b1 = b2. We conclude that a1 = a2 and b1 = b2, so that h1(t) = h2(t) for 0 ≤ t ≤ t*. It also follows that t1,1 = t1,2.
Repeating this argument on the interval [t*, t**] with t** = min{t2,1, t2,2} shows that ν1,1 = ν1,2 or t2,1 = t2,2. Proceeding by induction yields νj,1 = νj,2 and tj+1,1 = tj+1,2 for j = 1, … , m1 − 1 = m2 − 1, hence uniqueness.
3. Consistency
Theorem 3.1
Suppose that X1, … , Xn are i.i.d. random variables with convex hazard function h0 and corresponding distribution function F0. Let T0 ≡ T0(F0) ≡ inf{t : F0(t) = 1}. The MLE is then consistent for all t ∈ (0, T0). Also, for all δ > 0,
if T0 < ∞. If T0 = ∞, the above statement holds with T0 − δ replaced by any K < ∞.
Remark 3.2
If h0 is increasing at 0, then one can show that the MLE is not consistent at zero. This is a frequently occurring difficulty of shape constrained estimators; see, for example, [1,12,30].
Proof
We first show that is bounded appropriately so that we can select convergent subsequences. Decompose into its decreasing and increasing parts: . Then, arguing as in (2.14), it follows from (2.5) that
| (3.1) |
where the right-hand side is almost surely bounded and, in fact, converges almost surely to for all x > 0. Also,
| (3.2) |
where the right-hand side is almost surely bounded for x ∈ (supp(F0))° and converges almost surely to .
Now, take γ = h0 in the directional derivative (2.9). It follows that
noting that , and hence
Fix any 0 < a < b < ∞ such that a, b ∈ (supp(F0))°. It follows that limn X(n) > b with probability one (this can be shown using the Borel–Cantelli theorem). Also, supₓ |Fn(x) − F0(x)| → 0 almost surely by the Glivenko–Cantelli lemma, where Fn denotes the empirical distribution function. Both of these events occur on the set Ω, with P(Ω) = 1. Fix ω ∈ Ω. We will show that ĥn converges for such an ω.
Let {n’} denote any subsequence of {n}. By the bounds in (3.1) and (3.2) (which are finite for our choice of ω), using a classical diagonalization argument and the continuity of convex functions, we may extract a further subsequence {n”} such that pointwise on [a, b], where the limit must be convex. We denote this subsequence {n} to simplify notation.
From Fatou’s lemma, it follows that
Note that this implies that if , then h0(t) = 0. By (2.5) and integration by parts, we see that . Therefore, again applying Fatou’s lemma,
It also follows that
Define for t ∉ [a, b], which allows us to let both 1/a and b → ∞ in the above display. Since , it follows that
and this implies that for all t ∈ [a, b].
We have thus shown that every subsequence has a further subsequence which converges to the true hazard function h0(x) pointwise, for all x ∈ (supp F0)°. It follows that converges to h0 pointwise. By Theorem 10.8, page 90, [27], this implies that the claimed uniform convergence on [a, b] also holds. As this happens for any ω ∈ Ω, and P(Ω) = 1, we have proven the result.
Corollary 3.3
Suppose that h0″ is continuous and strictly positive at x0 ∈ (supp F0)°. It follows that there exist touchpoints τn ≤ x0 ≤ ηn such that τn, ηn → x0 in probability.
Proof
Let ηn, τn be touchpoints such that τn ≤ x0 ≤ ηn. If no such τn exists, set τn = 0; if no such ηn exists, set ηn = ∞. Suppose that it is not the case that τn, ηn →p x0. It then follows that there exists an interval I = [a, b] with |I| > 0 such that x0 ∈ I, lim supn τn ≤ a and lim infn ηn ≥ b almost surely and, lastly, ĥn is linear on I. By Theorem 3.1, this implies that h0(t) is linear on I, which is a contradiction.
From consistency of the estimator, we also obtain consistency of the derivatives.
Corollary 3.4
Suppose that x ∈ (a, b) and supa≤t≤b |ĥn(t) − h0(t)| → 0 almost surely. Then ĥn′(x) → h0′(x) at all continuity points x of h0′ on (a, b).
This follows from the following simple result for convex functions, proved in [12].
Lemma 3.5
Suppose that h‾n is a sequence of convex functions satisfying supa≤t≤b |h‾n(t) − h0(t)| → 0 with probability one. Then (also with probability one), for all x ∈ (a, b),
4. Asymptotic lower bounds for the minimax risk
Define the class of densities by
We want to derive asymptotic lower bounds for the local minimax risks for estimating the convex hazard function h and its derivative at a fixed point. The L1-minimax risk for estimating a functional T of f0 based on a sample X1, … , Xn of size n from f0 which is known to be in a subset of is defined by
| (4.1) |
where the infimum ranges over all possible measurable functions Tn = tn(X1, … , Xn) mapping to . The shrinking classes used here are Hellinger balls centered at f0,
Consider estimation of
| (4.2) |
Let and x0 > 0 be fixed such that h0 is twice continuously differentiable at x0. Define, for ε > 0, the functions hε as follows:
Here, cε is chosen so that hε is continuous at x0 − ε. Using continuity of hε and a second order expansion of h0, it follows that cε = 3 + o(1) as ε → 0. Now, define fε by
where . It follows easily that
| (4.3) |
| (4.4) |
Furthermore, the following lemma holds.
Lemma 4.1
Under the above assumptions,
Proof
The lemma follows from Lemma 2 of [18] and
This is achieved by careful calculation.
Combining (4.3) and (4.4) with the lemma, it follows that
From these calculations, together with Lemma 5.1 of [12], we have the following result. Along with Theorem 1.2, it indicates that ĥn and ĥn′ achieve optimal rates and also have the correct dependence on the parameters h″(x0) and h(x0) (up to absolute constants).
Theorem 4.2 (Minimax risk lower bound)
For the functionals T1 and T2 as defined in (4.2), and with as defined in (4.1),
In particular, Theorem 1.2 shows that the MLE achieves the optimal pointwise rate of convergence, n^{2/5}, at points x0 with h″(x0) > 0. Convergence rates over the larger class of bathtub-shaped functions would be slower: the MLE of a U-shaped hazard is known to converge locally at rate n^{1/3}; see, for example, [2].
5. Rates of convergence
In this section, we identify the local rates of convergence of the MLE. Fix a point x0 ∈ (supp F0)°. To obtain the results, we assume that h0″ is continuous and strictly positive in a neighborhood of x0 and that h0(x0) > 0.
5.1. Some useful estimates
For 0 < x ≤ y, define
Lemma 5.1
Let x0 ∈ (supp F0)°. Then, for each ε > 0, there exist constants δ, c0, n0 and (positive) random variables Mn (independent of x, y), of order Op(1), such that for each |x − x0| < δ,
| (5.1) |
for all n ≥ n0.
Proof
Note that , where
and, in view of the consistency established in Theorem 3.1, is a convex function uniformly close to h0 on neighborhoods of x0. This leads to the consideration of the class of functions
with γ ≡ infx0 − δ≤x≤x0+δ+c0 h0(x)/2 and we define . The class has an envelope function Fx,R(z) = γ−1{(z − x)1[x,x+R](z)+2−1R1[x,x+R](z)} and hence the following second moment bound holds:
Furthermore, for some constant K by [29], Theorem 2.7.10, page 159, and a straightforward bracketing argument. It then follows from [29], Theorems 2.14.2 and 2.14.5, pages 240 and 244, that
| (5.2) |
Define Mn(ω) as the infimum (possibly +∞) of those values such that (5.1) holds and define A(n, j) to be the set [(j − 1)n−1/5, jn−1/5). Then, for m constant,
The jth summand is hence bounded by
due to (5.2). Thus it follows, using Theorem 3.1 to conclude that , that
where the sum in the bound is finite and converges to zero as m → ∞. This completes the proof of the claim.
A similar approach proves the following for the function
Lemma 5.2
Let x0 ∈ (supp F0)°. Then, for each ε > 0, there exist constants δ, c0 > 0 and (positive) random variables Mn (independent of x, y) of order Op(1) such that for each |x − x0| < δ,
| (5.3) |
5.2. Asymptotic behavior of touchpoints and resulting bounds
Lemma 5.3
Let x0 > 0 be a point at which h0 has a continuous and strictly positive second derivative, and where h0(x0) > 0. Let ξn be any sequence of numbers converging to x0 and define τn and ηn to be the largest touchpoint of ĥn smaller than ξn and the smallest touchpoint larger than ξn, respectively. Then
Proof
By Theorem 3.1, we know that ĥn is positive near x0 for large enough n. Also, it is either strictly increasing or strictly decreasing in a neighborhood of x0, or it is locally flat. If ĥn is decreasing between τn and ηn, then (2.7) and (2.2) with equality at both ηn and τn hold. If, instead, ĥn is increasing, then (2.8) and (2.3) with equality at both ηn and τn hold. There is only the potential for a problem in the locally flat case. However, since ĥn is strictly positive, by Corollary 2.5, we can extend the necessary equalities to this case as well. Therefore, we need only consider two cases: ĥn is either non-increasing or non-decreasing on [τn, ηn].
We first assume that is non-increasing on [τn, ηn]. Define
| (5.4) |
and let mn be the midpoint of [τn, ηn], mn = (τn + ηn)/2. We may then calculate
From (2.2), we know that . The equality in (2.2), together with (2.7), allows us to rewrite this as 0 ≥ L1,↓ + L2,↓, where L1,↓ is equal to
and
by integration by parts.
Now replace by the true F0 in the definition of L1,↓ to obtain
where
Next, using a Taylor expansion of order 2 on the function and about the point mn, we obtain
| (5.5) |
since both and are consistent by Theorem 3.1, on (τn, ηn) and because τn − ηn = op(1) by Corollary 3.3. Therefore, by Lemmas 5.1 and 5.2, together with the above calculations, we may write
We choose ε sufficiently small (so that the leading term in the last line of the above display is positive) and hence conclude that (ηn − τn) = Op(n−1/5). A similar approach proves the non-decreasing case.
Lemma 5.4
Let ξn be a sequence converging to x0. Then, for any ε > 0, there exists an M > 1 and a c > 0 such that, with probability greater than 1 − ε, we have that there exist change points τn < ξn < ηn of such that
for all n sufficiently large.
Proof
Fix ε > 0. From Lemma 5.3, it follows that there exist touchpoints ηn and τn, and an M > 1 such that ξn − Mn−1/5 ≤ τn ≤ ξn − n−1/5 ≤ ξn + n−1/5 ≤ ηn ≤ ξn + Mn−1/5.
Fix c > 0 and consider the event
| (5.6) |
First, assume that is non-increasing on [τn, ηn]. On this set, we have that
where B is some constant depending on x0. Using the definitions in (5.4), as well as the equality in condition (2.2) with (2.7), it follows that
where and . By the assumption on h0 and x0, and arguments similar to those used for Lemmas 5.1 and 5.2, we can show that
which is a contradiction to (5.6) if c is chosen large enough. A similar argument completes the proof for the non-decreasing case.
The next proposition is the key to proving tightness in the next section. The results follow from the previous lemmas and make extensive use of the underlying convexity.
Proposition 5.5
Under the assumptions of this section, we have that, for each M > 0,
| (5.7) |
| (5.8) |
Proof
For M, ε > 0 fixed, define ηn,1 to be the first point of touch after x0 + Mn−1/5 and ηn,i to be the first point of touch after ηn,i−1 + n−1/5, i = 2, 3. Define the points ηn,−i for i = 1, 2, 3 similarly, but working to the left of x0. By Lemma 5.4, there exist points ξn,i ∈ (ηn,i, ηn,i+1), i = 1, 2, −2, −3, and a constant c > 0 such that, with probability at least 1 − ε, we have that .
As is convex, it follows that for any t ∈ [x0 − Mn−1/5, x0 + Mn−1/5],
since ξn,2 − ξn,1 ≥ n−1/5, where denotes the right derivative at t . Because of the continuity of near x0, we may replace with for some new constant . The result follows. A similar argument shows the lower bound.
We now consider (5.7). By Lemma 5.3, there exists a constant K > M such that there exist two touchpoints in [x0 + Mn−1/5, x0 + Kn−1/5], n−1/5 apart, with probability 1 − ε. The same is the case in the interval [x0 − Kn−1/5, x0 − Mn−1/5]. From Lemma 5.4, it follows that there exist points ξn,1 ∈ [x0 + Mn−1/5, x0 + Kn−1/5] and ξn,2 ∈ [x0 − Kn−1/5, x0 − Mn−1/5] such that , for i = 1, 2, with probability at least 1 − ε and sufficiently large n. Lastly, we have already shown that there exists a c′ such that with probability at least 1 − ε,
Therefore, with probability at least 1 − 3ε, we have that for any t ∈ [x0 − Mn−1/5, x0 + Mn−1/5] and sufficiently large n,
for some constant B > 0. A similar argument proves the other direction.
5.3. Limit distribution theory at a fixed point
From Lemma 5.3, we know what rescaling is necessary to pick up a meaningful limit. The idea of the proof is now to write carefully a local version of the characterization of the MLE, Lemma 2.2, and to show that in the limit, these become the characterization of the invelope (Definition 1.1). The invelope is described in terms of the “driving” process Y(·). Our goal will then be to identify the two processes, one which converges to the invelope and another which converges to the driving process Y .
Note that at x0 (where h″(x0) > 0), we have three possibilities:
h0′(x0) > 0: By continuity, h0′ > 0 in a neighborhood of x0. It follows from the consistency of the MLE derivatives that ĥn′ > 0 there for sufficiently large n and hence all touchpoints to be considered are of the “increasing” kind.
h0′(x0) < 0: By the same argument, all touchpoints are decreasing.
h0′(x0) = 0: Since h0(x0) > 0, by Corollary 2.5, there is always at least one touchpoint which satisfies both the non-increasing and non-decreasing properties. The limiting process may then be “stitched” together in an appropriate manner.
Therefore, it will be sufficient to prove the asymptotic results for both types of touchpoints: non-increasing and non-decreasing. For the sake of brevity, we outline the argument only for the non-increasing setting.
For any interval , let D[a, b] denote the space of cadlag functions from [a, b] into endowed with the Skorohod topology and C[a, b] the space of continuous functions endowed with the uniform topology.
Driving process
Define
| (5.9) |
where is the empirical cumulative hazard function, defined by . From [28], Chapter 7, Theorem 7.4.1, page 307, we know that for t ∈ (0, T0) with T0 ≡ T0(F0) ≡ inf{x : F0(x) = 1}, in D[0, M] for M < T0, where B denotes a standard Brownian motion on [0, ∞) and C(t) = F0(t)/S0(t). Let xn(t) = x0 + n−1/5t and define
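For uncensored data, one standard choice of empirical cumulative hazard is the Nelson–Aalen form Hn(t) = ∑_{X(i) ≤ t} 1/(n − i + 1); a minimal sketch (function names are ours):

```python
def nelson_aalen(x):
    """Empirical cumulative hazard for an uncensored sample:
    each ordered observation X_(i) contributes 1 / (risk set size)."""
    xs = sorted(x)
    n = len(xs)
    def Hn(t):
        # xs[i] is X_(i+1); the risk set just before it has n - i members.
        return sum(1.0 / (n - i) for i in range(n) if xs[i] <= t)
    return Hn

Hn = nelson_aalen([0.5, 1.2, 2.0, 3.3])
print(Hn(0.0), Hn(1.5), Hn(10.0))  # 0.0, 1/4 + 1/3, 1/4 + 1/3 + 1/2 + 1
```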
| (5.10) |
It is not difficult to show that
in D[−M, M] for each fixed 0 < M < ∞, where W is a two-sided Brownian motion process starting at 0 and C’(t) = h0(t)/S0(t). Next, define
where . The derivative is not difficult to calculate. By consistency of and since for any M > 0,
| (5.11) |
is our driving process.
Invelope process
Recall definitions (5.4). Our initial candidate for the invelope is defined as
where
Notice that because of the presence of in its definition, is not three times differentiable. We therefore define
From Proposition 5.5, we have that for any M > 0,
| (5.12) |
The derivatives of will describe the limiting behavior of our estimators. First, though, we must show that this process converges to the invelope. To do this, define the vector
| (5.13) |
and fix M > 0. We will show that is tight in the product space
This will be done last. We first assume that has a weak limit and identify its unique limit. The two arguments together prove that , and hence , have the appropriate limiting distribution.
Identifying the limit
It is sufficient to show that satisfies (1.2)–(1.4) in the limit.
For condition (1.2), calculate
| (5.14) |
with equality at the (non-increasing) touchpoints of , using (2.2). By (5.12), it follows that satisfies (1.2) in the limit.
Next, the derivatives of are calculated as follows:
Due to Theorem 3.1 and Proposition 5.5, we have that
| (5.15) |
where is convex, and hence the limit of will be convex. Thus, (1.3) is satisfied in the limit.
Let . We may then write
Notice that sup|t|≤M |1 − Bn(x0 + n−1/5t)| →a.s. 0, with bounded. Therefore, from Proposition 5.5, it follows that
| (5.16) |
where is piecewise constant, with jumps at the touchpoints of . By consistency of , we have
where . We say that a process Xn(t) is uniformly Op(1) if sup|t|≤M |Xn(t)| is Op(1).
Next, fix a c > 0. Since the MLE is piecewise linear, it follows that dĝn puts mass only at the locations of its touchpoints. However, at these locations, by (5.14), the process is equal to zero. It follows that
Hence,
using Proposition 5.5, (5.12) and the fact that ĝn is increasing.
It remains to show that (1.4) is maintained under limits. This follows from the continuous mapping theorem since for any element z = {z1, z2, z3, z4, z5, z6} ∈ E[−M, M],
is continuous in z for z6 increasing. We have thus shown that satisfies the invelope conditions (1.2)–(1.4) asymptotically. This shows that the only possible limit of is the process .
Tightness
We already know that and are tight in C[−M, M] and D[−M, M], respectively. To address tightness of the invelope processes, note that bounded, increasing functions are compact in D[−M, M] and that bounded continuous functions with uniformly bounded derivatives are compact in C[−M, M]. These two facts allow us to address only stochastic boundedness of , i = 0, …, 3, to obtain tightness. Thus, Proposition 5.5, along with (5.15) and (5.16), says that and are tight. It remains to argue the same for and . However, this will follow by Proposition 5.5 and (5.12), if we can show that both and are tight.
Let τn be the largest touchpoint smaller than x0. By (2.7), and after careful calculations, we have
By Proposition 5.5, Lemma 5.3 and Theorem 3.1, the first two terms are tight in C[−M, M]. Arguments similar to those used in the proof of Lemma 5.1, along with Lemma 5.3, may be used to handle the remaining terms. Since by Lemma 2.2, it follows that is tight in C[−M, M], which, in turn, implies that is tight in the space E[−M, M].
From the invelope to Theorem 1.2
By (5.15) and (5.16), the limiting behavior of and is the same as that of the second and third derivatives of , which converge to the invelope of . Define k1, k2 by
For any a, b > 0, . Therefore, choose a, b so that a^4 b = k2 and a^{3/2} b = k1. It follows that
Applying this rescaling to all processes shows that
It is now straightforward to calculate the correct constants, c1 and c2, of Theorem 1.2.
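The choice of a and b can be made explicit (a small arithmetic sketch; k1 and k2 below are placeholder values, not the actual constants of Theorem 1.2): dividing a^4 b = k2 by a^{3/2} b = k1 gives a^{5/2} = k2/k1, so a = (k2/k1)^{2/5} and b = k1 a^{−3/2} = k1^{8/5} k2^{−3/5}.

```python
# Solve a^4 b = k2 and a^(3/2) b = k1 for a, b > 0.
# k1, k2 are placeholder values for illustration only.
k1, k2 = 2.0, 5.0
a = (k2 / k1) ** 0.4          # a = (k2/k1)^(2/5)
b = k1 ** 1.6 * k2 ** -0.6    # b = k1^(8/5) * k2^(-3/5)
print(a ** 4 * b, a ** 1.5 * b)  # recovers (k2, k1)
```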
Acknowledgements
The research of Hanna Jankowski was supported by NSERC; the research of Jon A. Wellner was supported in part by NSF Grant DMS-0503822 and NIAID Grant 2R01 AI291968-04.
References
- [1].Balabdaoui F. Consistent estimation of a convex density at the origin. Math. Methods Statist. 2007;16:77–95. [Google Scholar]
- [2].Banerjee M. Estimating monotone, unimodal and U-shaped failure rates using asymptotic pivots. Statist. Sinica. 2008;18:467–492. [Google Scholar]
- [3].Banerjee M, Wellner JA. Likelihood ratio tests for monotone functions. Ann. Statist. 2001;29:1699–1731. [Google Scholar]
- [4].Banerjee M, Wellner JA. Confidence intervals for current status data. Scand. J. Statist. 2005;32:405–424. [Google Scholar]
- [5].Baraud Y, Birgé L. Estimating the intensity of a random measure by histogram type estimators. Probab. Theory Related Fields. 2009;143:239–284. [Google Scholar]
- [6].Brunel E, Comte F. Adaptive nonparametric regression estimation in presence of right censoring. Math. Methods Statist. 2006;15:233–255. [Google Scholar]
- [7].Cai T, Low M. Technical report. Dept. Statistics, Univ. Pennsylvania; 2007. Adaptive estimation and confidence intervals for convex functions and monotone functions. [Google Scholar]
- [8].Carolan C, Dykstra R. Asymptotic behavior of the Grenander estimator at density flat regions. Canad. J. Statist. 1999;27:557–566. [Google Scholar]
- [9].Gruppo di Lavoro MPS. Catalogo parametrico dei terremoti italiani, versione 2004 (cpti04) INGV; Bologna: 2004. Available at http://emidius.mi.ingv.it/CPTI. in Italian. [Google Scholar]
- [10].Grenander U. On the theory of mortality measurement. II. Skand. Aktuarietidskr. 1956;39:125–153. [Google Scholar]
- [11].Groeneboom P, Jongbloed G, Wellner JA. A canonical process for estimation of convex functions: The “invelope” of integrated Brownian motion +t4. Ann. Statist. 2001;29:1620–1652. [Google Scholar]
- [12].Groeneboom P, Jongbloed G, Wellner JA. Estimation of a convex function: Characterizations and asymptotic theory. Ann. Statist. 2001;29:1653–1698. [Google Scholar]
- [13].Groeneboom P, Jongbloed G, Wellner JA. The support reduction algorithm for computing non-parametric function estimates in mixture models. Scand. J. Statist. 2008;35:385–399. doi: 10.1111/j.1467-9469.2007.00588.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [14].Haupt E, Schäbe H. The TTT transformation and a new bathtub distribution model. J. Statist. Plann. Inference. 1997;60:229–240. [Google Scholar]
- [15].Jankowski H, Wellner JA. Nonparametric estimation of a convex bathtub-shaped hazard function. Univ. Washington, Department of Statistics; 2007. Technical Report 521. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [16].Jankowski H, Wellner JA. Computation of nonparametric convex hazard estimators via profile methods. J. Nonparametr. Stat. 2009;21:505–518. doi: 10.1080/10485250902745359. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [17].Jankowski H, Wang X, McCague H, Wellner J. convexHaz: R functions for convex hazard rate estimation. R package version 0.2. 2008. [Google Scholar]
- [18].Jongbloed G. Minimax lower bounds and moduli of continuity. Statist. Probab. Lett. 2000;50:279–284. [Google Scholar]
- [19].La Rocca L. Bayesian non-parametric estimation of smooth hazard rates for seismic hazard assessment. Scand. J. Statist. 2008;35:524–539. [Google Scholar]
- [20].Lai CD, Xie M, Murthy NP. Bathtub-shaped failure rate life distributions. In: Balakrishnan N, Rao CR, editors. Advances in Reliability. Vol. 20. North-Holland Publishing; Amsterdam: 2001. pp. 69–104. Handbook of Statistics. [Google Scholar]
- [21].Politis DN, Romano JP. Large sample confidence regions based on subsamples under minimal assumptions. Ann. Statist. 1994;22:2031–2050. [Google Scholar]
- [22].Politis DN, Romano JP, Wolf M. Subsampling. Springer; New York: 1999. [Google Scholar]
- [23].Rajarshi S, Rajarshi MB. Bathtub distributions: A review. Comm. Statist. Theory Methods. 1988;17:2597–2621. [Google Scholar]
- [24].Reboul L. Estimation of a function under shape restrictions. Applications to reliability. Ann. Statist. 2005;33:1330–1356. [Google Scholar]
- [25].Reynaud-Bouret P. Adaptive estimation of the intensity of inhomogeneous Poisson processes via concentration inequalities. Probab. Theory Related Fields. 2003;126:103–153. [Google Scholar]
- [26].Robertson T, Wright FT, Dykstra RL. Order Restricted Statistical Inference. Wiley; Chichester: 1988. [Google Scholar]
- [27].Rockafellar RT. Convex Analysis. Princeton Mathematical Series. Vol. 28. Princeton Univ. Press; Princeton, NJ: 1970. [Google Scholar]
- [28].Shorack GR, Wellner JA. Empirical Processes with Applications to Statistics. Wiley; New York: 1986. [Google Scholar]
- [29].van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes with Applications in Statistics. Springer; New York: 1996. [Google Scholar]
- [30].Woodroofe M, Sun J. A penalized maximum likelihood estimate of f(0+) when f is nonincreasing. Statist. Sinica. 1993;3:501–515. [Google Scholar]