Abstract
In this paper, we study the nonparametric maximum likelihood estimator (MLE) of a convex hazard function. We show that the MLE is consistent and converges at a local rate of n^{2/5} at points x0 where the true hazard function is positive and strictly convex. Moreover, we establish the pointwise asymptotic distribution theory of our estimator under these same assumptions. One notable feature of the nonparametric MLE studied here is that no arbitrary choice of tuning parameter (or complicated data-adaptive selection of the tuning parameter) is required.
Keywords: antimode, bathtub, consistency, convex, failure rate, force of mortality, hazard rate, invelope process, limit distribution, nonparametric estimation, U-shaped
1. Introduction
Information on the behavior of time to a random event is of much interest in many fields. The random event could be failure of a material or machine, death, an earthquake or infection by a disease, to name but a few examples. Frequently, this type of data is called lifetime data, and it is natural to assume that it takes values in [0, ∞). If the lifetime distribution F has a density f , then a key quantity of interest is the hazard (or failure) rate h(t) = f(t)/(1 − F(t)). Heuristically, h(t) dt is the probability that, given survival until time t, the event will occur in the next duration of length dt. The hazard function is also known as the force of mortality in actuarial science or the intensity function in extreme value theory.
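As a quick numerical check of the definition h(t) = f(t)/(1 − F(t)) (the function names below are ours), the exponential distribution has constant hazard λ:

```python
import math

def hazard(f, F, t):
    """Hazard rate h(t) = f(t) / (1 - F(t)), defined where F(t) < 1."""
    return f(t) / (1.0 - F(t))

# Exponential(lam): f(t) = lam * exp(-lam * t), F(t) = 1 - exp(-lam * t),
# so the hazard is constant and equal to lam.
lam = 2.0
f_exp = lambda t: lam * math.exp(-lam * t)
F_exp = lambda t: 1.0 - math.exp(-lam * t)

print(hazard(f_exp, F_exp, 0.5))  # constant hazard: 2.0
```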
Certain shape restrictions arise quite naturally for hazard rates. In this work, we are particularly interested in the family of hazard functions which are convex. That is, we attach the additional smoothness constraint of convexity to the more traditional assumption of a bathtub-shaped failure rate (that is, first decreasing, then increasing). Heuristically, bathtub-shaped hazards correspond to lifetime distributions with high initial hazard (or infant mortality), lower and often rather constant hazard during the middle of life and then increasing hazard of failure (or wear-out) as aging proceeds; see [20,23].
Many other estimators of hazard functions (and solutions to the closely related problem of estimating the intensity of a Poisson process) with and without shape restrictions have been considered in the literature; see [24] for a partial review up to 2002. In recent years, the focus has shifted to construction of “adaptive” estimators over large scales of smoothness classes; see, for example, [5,6,25]. Virtually all of these other estimators require careful choice of penalty terms or tuning parameters, and computation of the adaptive estimators typically involves methods of combinatorial optimization. As far as we know, reliable algorithms for computing them are not yet available. Our estimators avoid the choices of tuning parameters or penalty terms by virtue of the shape constraint of convexity and are relatively straightforward to compute since the corresponding optimization problems are convex.
Recall the definition of a convex function. Let C ⊆ ℝ be a convex set. A function h : C → ℝ is convex (on C) if it satisfies
h(λx + (1 − λ)y) ≤ λh(x) + (1 − λ)h(y)
for all x, y ∈ C and all λ ∈ [0, 1]. Equivalently, a function is convex if its epigraph
epi(h) = {(x, y) ∈ C × ℝ : y ≥ h(x)}
is a convex set in ℝ² (see, for example, [27], Section 4). Thus, a convex function on C may be extended to a convex function on ℝ by setting h(x) = +∞ for x ∉ C.
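The defining inequality can be checked mechanically on a grid; the sketch below (function names are ours) accepts a convex bathtub-shaped quadratic and rejects a concave one:

```python
def is_convex_on_grid(h, xs, lambdas=(0.25, 0.5, 0.75), tol=1e-12):
    """Check h(l*x + (1-l)*y) <= l*h(x) + (1-l)*h(y) for all pairs of
    grid points and a few values of l in (0, 1)."""
    for x in xs:
        for y in xs:
            for l in lambdas:
                z = l * x + (1 - l) * y
                if h(z) > l * h(x) + (1 - l) * h(y) + tol:
                    return False
    return True

xs = [i / 10.0 for i in range(0, 51)]                          # grid on [0, 5]
print(is_convex_on_grid(lambda t: (t - 2.0) ** 2 + 0.1, xs))   # convex bathtub
print(is_convex_on_grid(lambda t: -(t - 2.0) ** 2, xs))        # concave: fails
```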
Suppose, then, that we observe i.i.d. variables X1, … , Xn from a distribution F0 with density f0 and hazard rate h0. We denote the true cumulative hazard function by H0 and the true survival function by S0 = 1 − F0. Also, 0 < X(1) < X(2) < ⋯ < X(n) denote the order statistics corresponding to X1, … , Xn.
To define the MLE of h0, denoted ĥn, we first consider the likelihood in terms of the hazard,
Ln(h) = ∏_{i=1}^{n} h(Xi) exp(−H(Xi)),
where H(t) = ∫₀ᵗ h(s) ds. This can be made arbitrarily large by increasing the value of h(X(n)). We therefore find ĥn by maximizing the modified likelihood
| L̃n(h) = ∏_{i=1}^{n−1} h(X(i)) ∏_{i=1}^{n} exp(−H(X(i))) | (1.1) |
over , the space of non-negative convex functions on [0, X(n)). The full MLE is then found by setting for all x ≥ X(n). This is the same approach as taken in [10]. Equivalently, one could first impose the constraint that h ≤ M, and then let M → ∞ (see, for example, [26], page 338).
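The modified criterion can be evaluated directly. The sketch below is a minimal illustration under one plausible reading of (1.1) — drop the log h(X(n)) term and keep all cumulative-hazard terms — with H(t) computed by the trapezoid rule; all function names are ours:

```python
import math

def cum_hazard(h, t, steps=2000):
    """H(t) = integral of h over [0, t], via the trapezoid rule."""
    if t == 0.0:
        return 0.0
    dt = t / steps
    s = 0.5 * (h(0.0) + h(t)) + sum(h(i * dt) for i in range(1, steps))
    return s * dt

def neg_modified_loglik(h, x):
    """Negative modified log-likelihood: the log h(X_(n)) term is dropped,
    since the full likelihood can be made arbitrarily large through it."""
    n = len(x)
    return (-sum(math.log(h(xi)) for xi in x[:n - 1])
            + sum(cum_hazard(h, xi) for xi in x))

x = sorted([0.3, 0.9, 1.4, 2.2, 3.1])
print(neg_modified_loglik(lambda t: 1.0, x))  # h = 1: log terms vanish, value = sum(x)
```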
To illustrate the proposed estimator, consider the distribution with density given by
This distribution was derived in [14] as a relatively simple model with bathtub-shaped hazards which also has an adequate ability to model lifetime behavior. We will call this the HS distribution, after the authors. It has convex hazards for all values of b in the parameter space (b > −1/2). Figure 1 shows the MLE for simulated data from this distribution with sample size n = 100.
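Simulating lifetime data with a convex hazard is straightforward via the standard fact that T = H⁻¹(E), with E ~ Exp(1), has cumulative hazard H. The quadratic bathtub hazard below is our illustration, not the HS model:

```python
import random

def sample_from_hazard(H, n, seed=0, upper=50.0):
    """Inverse-transform sampling: if E ~ Exp(1), then T = H^{-1}(E) has
    cumulative hazard H.  H must be continuous and strictly increasing."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        e = rng.expovariate(1.0)
        lo, hi = 0.0, upper
        for _ in range(80):            # bisection to solve H(t) = e
            mid = 0.5 * (lo + hi)
            if H(mid) < e:
                lo = mid
            else:
                hi = mid
        out.append(0.5 * (lo + hi))
    return out

# Generic convex bathtub hazard h(t) = 0.2 + (t - 1)^2 (our illustration),
# with closed-form cumulative hazard H(t) = 0.2 t + ((t - 1)^3 + 1) / 3.
H = lambda t: 0.2 * t + ((t - 1.0) ** 3 + 1.0) / 3.0
data = sample_from_hazard(H, 100)
print(len(data), min(data) > 0.0)  # 100 positive lifetimes
```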
Figure 1.
Examples of the estimator. Left: Estimating the HS hazard with b = 0 and A = 1 for a sample size of 100 (bold = true hazard, solid = MLE). Right: Estimation of the earthquake hazard from CPTI04 data (solid = MLE).
We also applied our estimators to the earthquake data of the Appennino Abruzzese region of Italy (Region 923) recently considered by [19], where Bayesian estimation methods are studied. The data come from the Gruppo di Lavoro CPTI (2004) catalog [9] and consist of 46 inter-quake times for Region 923, occurring after the year 1650 and with moment magnitude greater than 5.1 (details on the justification of these criteria are available in [19], page 14). Figure 1 shows the resulting estimator.
The main results of this paper are the characterizations, consistency and asymptotic behavior of the nonparametric MLE of a convex hazard function. The estimator is continuous and piecewise linear on [0, X(n)). Although we give a characterization of the MLE, the final form of the estimator is not explicit. We therefore propose an algorithm (based on the support reduction algorithm of [13]). This algorithm is discussed in a separate report, [16], and is available as the R package convexHaz [17].
To describe the local asymptotics of the MLE, we introduce the following process.
Definition 1.1
Let W(s) denote a standard two-sided Brownian motion, with W(0) = 0, and define Y(t) = ∫₀ᵗ W(s) ds + t⁴. The function I, the invelope of the process Y, is defined as follows:
| (1.2) |
| (1.3) |
| (1.4) |
It was shown in [11] that the invelope I exists and is almost surely uniquely defined. Moreover, with probability one, I is three times differentiable at t = 0. The asymptotic behavior of all of our estimators may be described in terms of the derivatives of the invelope at zero. The following theorem builds on the basic results in [12] concerning nonparametric estimation of a decreasing convex density.
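A discretized path of the driving process is easy to simulate. The sketch below assumes, as we read [11], that the process in Definition 1.1 is Y(t) = ∫₀ᵗ W(s) ds + t⁴; the step size and names are ours:

```python
import random, math

def simulate_Y(M=1.0, n=2000, seed=1):
    """Discretized path of Y(t) = int_0^t W(s) ds + t^4 on [0, M],
    using Euler approximation of the integrated Brownian motion."""
    rng = random.Random(seed)
    dt = M / n
    t, y = [0.0], [0.0]
    w, integral = 0.0, 0.0
    for i in range(1, n + 1):
        w += rng.gauss(0.0, math.sqrt(dt))   # Brownian increment
        integral += w * dt                   # approximates int_0^t W(s) ds
        ti = i * dt
        t.append(ti)
        y.append(integral + ti ** 4)
    return t, y

t, y = simulate_Y()
print(len(y), y[0])
```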
Theorem 1.2
Suppose that h0 is convex, x0 > 0 is a point which satisfies h0(x0) > 0 and h0″(x0) > 0, and h0″ is continuous in a neighborhood of x0. Then
where I″(0) and I‴(0) are the second and third derivatives at 0, respectively, of the invelope I of the process Y and where
The key to this result lies in Lemma 5.3, where we establish that the “touchpoints” (defined carefully in Section 2) cluster around x0 at a local scale of n^{−1/5}. The assumption that h0 is strictly convex at x0, with h0″ continuous in a neighborhood, is crucial in this step. If h0″(x0) = 0 (and h0″ is continuous in a neighborhood of x0), we conjecture that ĥn(x0) converges at the rate n^{1/2}. Similar behavior has been noted for monotone density estimators in [8]. If h0″ is discontinuous at x0, unpublished work of Cai and Low [7] suggests that ĥn(x0) converges to h0(x0) at rate n^{1/3}. The behavior of convex-constrained estimators in both of these situations remains unknown and is the subject of current research.
The limiting distributions of ĥn(x0) and ĥn′(x0) involve the constants c1 and c2, which depend on the (unknown) hazard function h0, as well as the random variables I″(0) and I‴(0), which have a universal distribution free of the parameters of the problem. Thus, Theorem 1.2 can be used, in principle, to form confidence intervals for h0(x0) and h0′(x0). This would involve estimation of the constants c1 = c1(h0, x0) and c2 = c2(h0, x0), respectively, both of which depend on h0″(x0), and appropriate quantiles of the distributions of I″(0) and I‴(0), respectively. Although virtually nothing is known analytically about the distribution of the invelope and its derivatives, the algorithms developed in [11] can easily be used to obtain simulated values of the needed quantiles. Other possible approaches to confidence intervals in this problem involve inversion of likelihood ratio tests (see [2–4] for this approach in the context of monotone or U-shaped function estimation) or resampling methods as in [22] and as discussed in [4] in the setting of nonparametric estimation of monotone functions. It should be noted that our Theorem 1.2 verifies one of the key hypotheses needed for validity of the general subsampling theory of [21,22], and therefore makes the subsampling approach to confidence intervals viable. The details and properties of all these approaches remain to be investigated.
The outline of this paper is as follows. Section 2 is dedicated to the proof of characterizations, existence and uniqueness of the MLE. Consistency is proved in Section 3, and Section 4 establishes lower bounds for the pointwise minimax risk of the MLE. Rates of convergence are established in Section 5, with Section 5.3 containing proofs of our main results concerning the limiting distribution at a fixed point. The companion technical report [15] also includes a detailed treatment of a least-squares estimator, as well as sketches of similar results for censored data and intensity functions of Poisson processes.
2. Characterizations, uniqueness and existence
Proposition 2.1
The function which minimizes φn over is piecewise linear. It has at most one change of slope between observations, except perhaps in one such interval, where, if the estimator touches zero, it may have two changes of slope (it is zero between these two changes). Also, between zero and X(1), the minimizer may have at most one change of slope, but this happens only if it touches zero, and in this case the estimator is increasing and equal to zero before the first change of slope. Between X(n−1) and X(n), the minimizer will also have at most one change of slope, and this only in the case where it is decreasing on [X(n−1), X(n)) and equal to zero after the change.
Proof
Consider any h and choose a convex g such that h(Xi) = g(Xi) for i = 1, … , n − 1 and h(x) ≥ g(x) ≥ 0 on [0, X(n)). It follows that φn(h) − φn(g) ≥ 0 if and only if H(Xi) ≥ G(Xi) for i = 1, … , n. Hence, the smaller we make g on [0, X(n)), the smaller φn(g) will become. It is not difficult to see that the smallest such g, with values of g(Xi) fixed, must have the prescribed form.
Since ĥn is piecewise linear, it may be expressed as
| ĥn(t) = â + ∑_{j=1}^{k} ν̂j (τj − t)₊ + ∑_{j=1}^{m} μ̂j (t − ηj)₊ | (2.1) |
where â ≥ 0 and ν̂j, μ̂j > 0. We let τj denote the points of change of slope of ĥn where ĥn is decreasing and let ηj > 0 denote the points of change of slope where ĥn is increasing. For simplicity, we assume that these are ordered. Also, we have τk ≤ η1. As seen in the next lemma, the τj’s and ηj’s correspond to “points of touch” or equality of processes defined on the one hand in terms of ĥn and the data, and on the other hand just in terms of the data. We therefore also refer to them as “touchpoints” repeatedly in the remainder of the paper.
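A piecewise-linear convex function of this kind can be evaluated from the kink basis (τj − t)₊ and (t − ηj)₊. The coefficient names below are ours, assuming positive masses ν, μ as required for convexity:

```python
def hhat(t, a, taus, nus, etas, mus):
    """Piecewise-linear convex function built from 'decreasing' kinks
    (tau_j - t)_+ and 'increasing' kinks (t - eta_j)_+; nus and mus
    must be positive for convexity."""
    val = a
    val += sum(nu * max(tau - t, 0.0) for tau, nu in zip(taus, nus))
    val += sum(mu * max(t - eta, 0.0) for eta, mu in zip(etas, mus))
    return val

# U-shape: slope change at tau = 1 (decreasing part) and eta = 3 (increasing).
print(hhat(0.0, 0.5, [1.0], [2.0], [3.0], [1.5]))  # 0.5 + 2*1 = 2.5
print(hhat(2.0, 0.5, [1.0], [2.0], [3.0], [1.5]))  # flat middle: 0.5
print(hhat(4.0, 0.5, [1.0], [2.0], [3.0], [1.5]))  # 0.5 + 1.5*1 = 2.0
```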
It is convenient to define the MLE in terms of the minimization of the criterion function
where denotes the empirical distribution function of the data,
We will also use the notation for the empirical survival function.
Lemma 2.2
Let . A function minimizes φn over (and hence is the MLE) if and only if:
| (2.2) |
for all x ≥ 0, with equality at τi for i = 1, … , k;
| (2.3) |
for all x ≥ 0, with equality at ηj for j = 1, … , m;
| (2.4) |
| (2.5) |
Moreover, the minimizer satisfies
| (2.6) |
for x ∈ {τ1, … , τk, η1, … , ηm}.
Remark 2.3
As we assume a priori that , we may rewrite the left-hand side terms in (2.2)–(2.4) via
for any set A. We will hereafter use this latter formulation.
Corollary 2.4
Let τi and ηj denote the change points of ĥn as in (2.1). It follows that
| (2.7) |
| (2.8) |
for i = 1, … , k and j = 1, … , m.
Proof
The function
is maximized at τi for i = 1, … , k. Since it is also differentiable, (2.7) follows. A similar argument proves (2.8).
Proof of Lemma 2.2
Consider any non-negative convex function h. It follows that there exists a non-negative constant a and non-negative measures ν and μ (these measures have supports with intersection containing at most one point) such that
For any function ĥ in , we calculate
since − log x ≥ 1 − x. Plugging in the explicit form of h from above, we find that the right-hand side is equal to
This is non-negative if is a function which satisfies conditions (2.2)–(2.5). It follows that these conditions are sufficient to describe a minimizer of φn.
We next show that these conditions are necessary. To do this, we first define the directional derivative
| (2.9) |
If minimizes φn, then for any γ such that is in for sufficiently small ε, we must have . If however, is in for sufficiently small ε, then .
If we choose, respectively, γ(t) ≡ 1, (t − y)+, (y − t)+, then is in and we obtain the inequalities in conditions (2.2)–(2.4). Since is also in , for sufficiently small ε, we obtain (2.5). Choosing γ = (τi − t)+, (t − ηj)+ yields the equalities in (2.2) and (2.3), respectively, since for each of these functions, is in .
Lastly, we prove (2.6). For any τi, define
Since (1 ± ε)γ is also in , it follows that and hence
Integration by parts and Corollary 2.4 yield (2.6) for x = τi. The case where x = ηj is obtained in a similar manner, but using and (2.5).
The next corollary allows us to extend the equalities of the characterization of the MLE to some extra touchpoints. The significance of these equations will become clear in Section 5, where we consider asymptotics of the estimator.
Corollary 2.5
Suppose that ĥn is strictly positive and recall the formulation given in (2.1). Then we also have that
| (2.10) |
| (2.11) |
| (2.12) |
| (2.13) |
Proof
The first two equalities follow by noting that if ĥn is strictly positive, then for ε sufficiently small, ĥn + εγ is feasible for γ(t) = (t − τk)+, (η1 − t)+. Arguing as for Corollary 2.4 proves the remaining identities.
Proposition 2.6
There exists a unique minimizer of φn over .
Proof
We will show that a minimizer exists by reducing the search to bounded positive convex functions on a compact domain. As this is a compact set, under the topology of uniform convergence, a minimizer of φn exists (see [27], Theorems 10.6, 10.8 and 27.3).
We must first handle the issue of a compact domain. As we assume a priori that , we are really looking for the minimizer of the modified negative of the log-likelihood with domain [0, X(n)). However, we have also argued that the minimizer must have the specific functional form as described in Proposition 2.1. Therefore, it is sufficient to reduce the domain to [0, X(n−1) + δ], for any δ > 0, since is then extended linearly beyond X(n−1) + δ in a unique manner. It will therefore be sufficient to show that we may reduce the search to functions bounded on [0, X(n−1)], with a derivative at X(n−1) which is bounded above.
Recall that the minimizer must satisfy (2.5). We therefore reduce our search to the class of functions which satisfy this condition. For any such h, write h = h+ + h−, where h+ is increasing and h− is decreasing. It follows that for any x,
A similar bound for h+ yields
| (2.14) |
for all x in (0, X(n)). Thus we know that h(x) must be bounded for x ∈ (0, X(n−1)].
To show that h is also bounded at zero, we need to show that h′(X(1)) is bounded from below. Assuming that it is negative, we may write, for 0 < x ≤ X(1),
Fixing x* > 0 and less than X(1), we then obtain that
from which it follows that h must be bounded on the set [0, X(n−1)].
By (2.5), we also have that
if h is increasing on [X(n−1), X(n)). This implies that h′(X(n−1)) is bounded above, completing the proof.
We now show uniqueness. Suppose that h1 and h2 both minimize φn. Then, by (2.5), φn(h1) and φn(h2) differ only in the term . However, this term is strictly convex and it follows that h1(X(i)) = h2(X(i)) for all i = 1, … , n − 1.
Let h‾ = (h1 + h2)/2. By linearity, we have that φn(h1) = φn(h2) = φn(h‾), which implies that h‾ is also a minimizer. However, this is possible only if h‾ also satisfies the conditions of Proposition 2.1. This implies that one of the following holds:
1. Both h1 and h2 are increasing and h1(0) = h2(0) = 0. In this case, they must have the same locations for their changes of slope, as otherwise h‾ violates Proposition 2.1.
2. Point 1 above does not hold. Then, by the same argument as above, if h1 and h2 have at least one change of slope in an interval between observations (or between zero and X(1)), then these locations of change of slope must be equal.
If the first case holds, then it is not difficult to see that h1 ≡ h2 on [0, X(n)), as h1(t) = h2(t) = 0 on [0, τ1] and h1(Xi) = h2(Xi) for all observation points.
In the second case, we use a different argument. We know that neither h1 nor h2 have touchpoints before X(1). Let t* denote the first touchpoint of h1 and (without loss of generality) assume that the first touchpoint of h2 is greater than t*. Hence, by (2.6),
Now, h‾ = (h1 + h2)/2 and h2 are also minimizers of the MLE criterion function φn. Also, h‾ has a touchpoint at t* and h‾(X(1)) = h2(X(1)).
Repeatedly averaging with h2 yields the functions h‾l = 2^{−l}(h1 − h2) + h2, which satisfy
Since h‾l → h2 pointwise, it follows from the dominated convergence theorem that . Therefore, since h1 and h2 are both linear on [0, t*] with h1(X(1)) = h2(X(1)) and , they must have both the same value and slope at X(1). That is, both h1(X(1)) = h2(X(1)) and h1′(X(1)) = h2′(X(1)) hold.
Now write
hj(t) = aj + bj t + ∑_{i=1}^{mj−1} νi,j (t − ti,j)₊, j = 1, 2,
where X(1) < t1,j < t2,j < ⋯ < tmj−1,j < X(n), j = 1, 2, and where h1(X(i)) = h2(X(i)) for i = 1, … , n. We also assume that νi,j > 0 for i = 1, … , mj − 1, j = 1, 2. This implies, in particular, that hj(t) = aj + bjt for t ≤ t1,j, j = 1, 2, and since X(1) < t1,j, j = 1, 2, h1(X(1)) = h2(X(1)). Thus a1 + b1X(1) = a2 + b2X(1). From the argument above, it follows that b1 = b2. We conclude that a1 = a2 and b1 = b2, so that h1(t) = h2(t) for 0 ≤ t ≤ t*. It also follows that t1,1 = t1,2.
Repeating this argument on the interval [t*, t**] with t** = min{t2,1, t2,2} shows that ν1,1 = ν1,2 or t2,1 = t2,2. Proceeding by induction yields νj,1 = νj,2 and tj+1,1 = tj+1,2 for j = 1, … , m1 − 1 = m2 − 1, hence uniqueness.
3. Consistency
Theorem 3.1
Suppose that X1, … , Xn are i.i.d. random variables with convex hazard function h0 and corresponding distribution function F0. Let T0 ≡ T0(F0) ≡ inf{t : F0(t) = 1}. The MLE is then consistent for all t ∈ (0, T0). Also, for all δ > 0,
if T0 < ∞. If T0 = ∞, the above statement holds with T0 − δ replaced by any K < ∞.
Remark 3.2
If h0 is increasing at 0, then one can show that the MLE is not consistent at zero. This is a frequently occurring difficulty of shape constrained estimators; see, for example, [1,12,30].
Proof
We first show that is bounded appropriately so that we can select convergent subsequences. Decompose into its decreasing and increasing parts: . Then, arguing as in (2.14), it follows from (2.5) that
| (3.1) |
where the right-hand side is almost surely bounded and, in fact, converges almost surely to for all x > 0. Also,
| (3.2) |
where the right-hand side is almost surely bounded for x ∈ (supp(F0))° and converges almost surely to .
Now, take γ = h0 in the directional derivative (2.9). It follows that
noting that , and hence
Fix any 0 < a < b < ∞ such that a, b ∈ (supp(F0))°. It follows that limn X(n) > b with probability one (this can be shown using the Borel–Cantelli theorem). Also, supₓ |Fn(x) − F0(x)| → 0 almost surely by the Glivenko–Cantelli lemma, where Fn denotes the empirical distribution function. Both of these events occur on the set Ω, with P(Ω) = 1. Fix ω ∈ Ω. We will show that ĥn converges for such an ω.
Let {n’} denote any subsequence of {n}. By the bounds in (3.1) and (3.2) (which are finite for our choice of ω), using a classical diagonalization argument and the continuity of convex functions, we may extract a further subsequence {n”} such that pointwise on [a, b], where the limit must be convex. We denote this subsequence {n} to simplify notation.
From Fatou’s lemma, it follows that
Note that this implies that if , then h0(t) = 0. By (2.5) and integration by parts, we see that . Therefore, again applying Fatou’s lemma,
It also follows that
Define for t ∉ [a, b], which allows us to let both 1/a and b → ∞ in the above display. Since , it follows that
and this implies that for all t ∈ [a, b].
We have thus shown that every subsequence has a further subsequence which converges to the true hazard function h0(x) pointwise, for all x ∈ (supp F0)°. It follows that converges to h0 pointwise. By Theorem 10.8, page 90, [27], this implies that the claimed uniform convergence on [a, b] also holds. As this happens for any ω ∈ Ω, and P(Ω) = 1, we have proven the result.
Corollary 3.3
Suppose that h0″ is continuous and strictly positive at x0 ∈ (supp F0)°. It follows that there exist touchpoints τn ≤ x0 ≤ ηn such that τn, ηn → x0 in probability.
Proof
Let ηn, τn be touchpoints such that τn ≤ x0 ≤ ηn. If no such τn exists, set τn = 0; if no such ηn exists, set ηn = ∞. Suppose that it is not the case that τn, ηn →p x0. It then follows that there exists an interval I = [a, b] with |I| > 0 such that x0 ∈ I, lim supn τn ≤ a and lim infn ηn ≥ b almost surely and, lastly, ĥn is linear on I. By Theorem 3.1, this implies that h0(t) is linear on I, which is a contradiction.
From consistency of the estimator, we also obtain consistency of the derivatives.
Corollary 3.4
Suppose that x ∈ (a, b) and supa≤t≤b |ĥn(t) − h0(t)| → 0 almost surely. Then ĥn′(x) → h0′(x) at all continuity points x of h0′ on (a, b).
This follows from the following simple result for convex functions, proved in [12].
Lemma 3.5
Suppose that h‾n is a sequence of convex functions satisfying supa≤t≤b |h‾n(t) − h0(t)| → 0 with probability one. Then (also with probability one), for all x ∈ (a, b),
4. Asymptotic lower bounds for the minimax risk
Define the class of densities by
We want to derive asymptotic lower bounds for the local minimax risks for estimating the convex hazard function h and its derivative at a fixed point. The L1-minimax risk for estimating a functional T of f0 based on a sample X1, … , Xn of size n from f0 which is known to be in a subset of is defined by
| (4.1) |
where the infimum ranges over all possible measurable functions Tn = tn(X1, … , Xn) mapping to . The shrinking classes used here are Hellinger balls centered at f0,
Consider estimation of
| (4.2) |
Let and x0 > 0 be fixed such that h0 is twice continuously differentiable at x0. Define, for ε > 0, the functions hε as follows:
Here, cε is chosen so that hε is continuous at x0 − ε. Using continuity of hε and a second order expansion of h0, it follows that cε = 3 + o(1) as ε → 0. Now, define fε by
where . It follows easily that
| (4.3) |
| (4.4) |
Furthermore, the following lemma holds.
Lemma 4.1
Under the above assumptions,
Proof
The lemma follows from Lemma 2 of [18] and
This is achieved by careful calculation.
Combining (4.3) and (4.4) with the lemma, it follows that
From these calculations, together with Lemma 5.1 of [12], we have the following result. Along with Theorem 1.2, it indicates that ĥn and ĥn′ achieve optimal rates and also have the correct dependence on the parameters h″(x0) and h(x0) (up to absolute constants).
Theorem 4.2 (Minimax risk lower bound)
For the functionals T1 and T2 as defined in (4.2), and with as defined in (4.1),
In particular, Theorem 1.2 shows that the MLE achieves the optimal pointwise rate of convergence, n^{2/5}, at points x0 with h″(x0) > 0. Convergence rates over the larger class of bathtub-shaped functions would be slower: the MLE of a U-shaped hazard is known to converge locally at rate n^{1/3}; see, for example, [2].
5. Rates of convergence
In this section, we identify the local rates of convergence of the MLE. Fix a point x0 ∈ (supp F0)°. To obtain the results, we assume that h0″ is continuous and strictly positive in a neighborhood of x0 and that h0(x0) > 0.
5.1. Some useful estimates
For 0 < x ≤ y, define
Lemma 5.1
Let x0 ∈ (supp F0)°. Then, for each ε > 0, there exist constants δ, c0, n0 and (positive) random variables Mn (independent of x, y), of order Op(1), such that for each |x − x0| < δ,
| (5.1) |
for all n ≥ n0.
Proof
Note that , where
and, in view of the consistency established in Theorem 3.1, is a convex function uniformly close to h0 on neighborhoods of x0. This leads to the consideration of the class of functions
with γ ≡ infx0 − δ≤x≤x0+δ+c0 h0(x)/2 and we define . The class has an envelope function Fx,R(z) = γ−1{(z − x)1[x,x+R](z)+2−1R1[x,x+R](z)} and hence the following second moment bound holds:
Furthermore, for some constant K by [29], Theorem 2.7.10, page 159, and a straightforward bracketing argument. It then follows from [29], Theorems 2.14.2 and 2.14.5, pages 240 and 244, that
| (5.2) |
Define Mn(ω) as the infimum (possibly +∞) of those values such that (5.1) holds and define A(n, j) to be the set [(j − 1)n−1/5, jn−1/5). Then, for m constant,
The jth summand is hence bounded by
due to (5.2). Thus it follows, using Theorem 3.1 to conclude that , that
where the sum in the bound is finite and converges to zero as m → ∞. This completes the proof of the claim.
A similar approach proves the following for the function
Lemma 5.2
Let x0 ∈ (supp F0)°. Then, for each ε > 0, there exist constants δ, c0 > 0 and (positive) random variables Mn (independent of x, y) of order Op(1) such that for each |x − x0| < δ,
| (5.3) |
5.2. Asymptotic behavior of touchpoints and resulting bounds
Lemma 5.3
Let x0 > 0 be a point at which h0 has a continuous and strictly positive second derivative, and where h0(x0) > 0. Let ξn be any sequence of numbers converging to x0 and define τn and ηn to be the largest touchpoint of ĥn smaller than ξn and the smallest touchpoint larger than ξn, respectively. Then
Proof
By Theorem 3.1, we know that ĥn is positive near x0 for large enough n. Also, it is either strictly increasing or strictly decreasing in a neighborhood of x0, or it is locally flat. If ĥn is decreasing between τn and ηn, then (2.7) and (2.2) with equality at both ηn and τn hold. If, instead, ĥn is increasing, then (2.8) and (2.3) with equality at both ηn and τn hold. There is only the potential for a problem in the locally flat case. However, since ĥn is strictly positive, by Corollary 2.5, we can extend the necessary equalities to this case as well. Therefore, we need only consider two cases: ĥn is either non-increasing or non-decreasing on [τn, ηn].
We first assume that is non-increasing on [τn, ηn]. Define
| (5.4) |
and let mn be the midpoint of [τn, ηn], mn = (τn + ηn)/2. We may then calculate
From (2.2), we know that . The equality in (2.2), together with (2.7), allows us to rewrite this as 0 ≥ L1,↓ + L2,↓, where L1,↓ is equal to
and
by integration by parts.
Now replace by the true F0 in the definition of L1,↓ to obtain
where
Next, using a Taylor expansion of order 2 on the function and about the point mn, we obtain
| (5.5) |
since both and are consistent by Theorem 3.1, on (τn, ηn) and because τn − ηn = op(1) by Corollary 3.3. Therefore, by Lemmas 5.1 and 5.2, together with the above calculations, we may write
We choose ε sufficiently small (so that the leading term in the last line of the above display is positive) and hence conclude that (ηn − τn) = Op(n−1/5). A similar approach proves the non-decreasing case.
Lemma 5.4
Let ξn be a sequence converging to x0. Then, for any ε > 0, there exists an M > 1 and a c > 0 such that, with probability greater than 1 − ε, we have that there exist change points τn < ξn < ηn of such that
for all n sufficiently large.
Proof
Fix ε > 0. From Lemma 5.3, it follows that there exist touchpoints ηn and τn, and an M > 1 such that ξn − Mn−1/5 ≤ τn ≤ ξn − n−1/5 ≤ ξn + n−1/5 ≤ ηn ≤ ξn + Mn−1/5.
Fix c > 0 and consider the event
| (5.6) |
First, assume that is non-increasing on [τn, ηn]. On this set, we have that
where B is some constant depending on x0. Using the definitions in (5.4), as well as the equality in condition (2.2) with (2.7), it follows that
where and . By the assumption on h0 and x0, and arguments similar to those used for Lemmas 5.1 and 5.2, we can show that
which is a contradiction to (5.6) if c is chosen large enough. A similar argument completes the proof for the non-decreasing case.
The next proposition is the key to proving tightness in the next section. The results follow from the previous lemmas and make extensive use of the underlying convexity.
Proposition 5.5
Under the assumptions of this section, we have that, for each M > 0,
| (5.7) |
| (5.8) |
Proof
For M, ε > 0 fixed, define ηn,1 to be the first point of touch after x0 + Mn−1/5 and ηn,i to be the first point of touch after ηn,i−1 + n−1/5, i = 2, 3. Define the points ηn,−i for i = 1, 2, 3 similarly, but working to the left of x0. By Lemma 5.4, there exist points ξn,i ∈ (ηn,i, ηn,i+1), i = 1, 2, −2, −3, and a constant c > 0 such that, with probability at least 1 − ε, we have that .
As is convex, it follows that for any t ∈ [x0 − Mn−1/5, x0 + Mn−1/5],
since ξn,2 − ξn,1 ≥ n−1/5, where denotes the right derivative at t . Because of the continuity of near x0, we may replace with for some new constant . The result follows. A similar argument shows the lower bound.
We now consider (5.7). By Lemma 5.3, there exists a constant K > M such that there exist two touchpoints in [x0 + Mn−1/5, x0 + Kn−1/5], n−1/5 apart, with probability 1 − ε. The same is the case in the interval [x0 − Kn−1/5, x0 − Mn−1/5]. From Lemma 5.4, it follows that there exist points ξn,1 ∈ [x0 + Mn−1/5, x0 + Kn−1/5] and ξn,2 ∈ [x0 − Kn−1/5, x0 − Mn−1/5] such that , for i = 1, 2, with probability at least 1 − ε and sufficiently large n. Lastly, we have already shown that there exists a c′ such that with probability at least 1 − ε,
Therefore, with probability at least 1 − 3ε, we have that for any t ∈ [x0 − Mn−1/5, x0 + Mn−1/5] and sufficiently large n,
for some constant B > 0. A similar argument proves the other direction.
5.3. Limit distribution theory at a fixed point
From Lemma 5.3, we know what rescaling is necessary to pick up a meaningful limit. The idea of the proof is now to write carefully a local version of the characterization of the MLE, Lemma 2.2, and to show that in the limit, these become the characterization of the invelope (Definition 1.1). The invelope is described in terms of the “driving” process Y(·). Our goal will then be to identify the two processes, one which converges to the invelope and another which converges to the driving process Y .
Note that at x0 (where h″(x0) > 0), we have three possibilities:
h0′(x0) > 0: By continuity, h0′ > 0 in a neighborhood of x0. It follows from the consistency of the MLE derivatives that ĥn′ > 0 there for sufficiently large n and hence all touchpoints to be considered are of the “increasing” kind.
h0′(x0) < 0: By the same argument, all touchpoints are decreasing.
h0′(x0) = 0: Since h0(x0) > 0, by Corollary 2.5, there is always at least one touchpoint which satisfies both the non-increasing and non-decreasing properties. The limiting process may then be “stitched” together in an appropriate manner.
Therefore, it will be sufficient to prove the asymptotic results for both types of touchpoints: non-increasing and non-decreasing. For the sake of brevity, we outline the argument only for the non-increasing setting.
For any interval , let D[a, b] denote the space of cadlag functions from [a, b] into endowed with the Skorohod topology and C[a, b] the space of continuous functions endowed with the uniform topology.
Driving process
Define
| (5.9) |
where is the empirical cumulative hazard function, defined by . From [28], Chapter 7, Theorem 7.4.1, page 307, we know that for t ∈ (0, T0) with T0 ≡ T0(F0) ≡ inf{x : F0(x) = 1}, in D[0, M] for M < T0, where B denotes a standard Brownian motion on [0, ∞) and C(t) = F0(t)/S0(t). Let xn(t) = x0 + n−1/5t and define
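For uncensored data, one standard choice of empirical cumulative hazard is the Nelson–Aalen form Hn(t) = ∑_{X(i) ≤ t} 1/(n − i + 1); a minimal sketch (function names are ours):

```python
def nelson_aalen(x):
    """Empirical cumulative hazard for an uncensored sample:
    each ordered observation X_(i) contributes 1 / (risk set size)."""
    xs = sorted(x)
    n = len(xs)
    def Hn(t):
        # xs[i] is X_(i+1); the risk set just before it has n - i members.
        return sum(1.0 / (n - i) for i in range(n) if xs[i] <= t)
    return Hn

Hn = nelson_aalen([0.5, 1.2, 2.0, 3.3])
print(Hn(0.0), Hn(1.5), Hn(10.0))  # 0.0, 1/4 + 1/3, 1/4 + 1/3 + 1/2 + 1
```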
| (5.10) |
It is not difficult to show that
in D[−M, M] for each fixed 0 < M < ∞, where W is a two-sided Brownian motion process starting at 0 and C’(t) = h0(t)/S0(t). Next, define
where . The derivative is not difficult to calculate. By consistency of and since for any M > 0,
| (5.11) |
is our driving process.
Invelope process
Recall definitions (5.4). Our initial candidate for the invelope is defined as
where
Notice that because of the presence of in its definition, is not three times differentiable. We therefore define
From Proposition 5.5, we have that for any M > 0,
| (5.12) |
The derivatives of will describe the limiting behavior of our estimators. First, though, we must show that this process converges to the invelope. To do this, define the vector
| (5.13) |
and fix M > 0. We will show that is tight in the product space
This will be done last. We first assume that has a weak limit and identify its unique limit. The two arguments together prove that , and hence , have the appropriate limiting distribution.
Identifying the limit
It is sufficient to show that satisfies (1.2)–(1.4) in the limit.
For condition (1.2), calculate
| (5.14) |
with equality at the (non-increasing) touchpoints of , using (2.2). By (5.12), it follows that satisfies (1.2) in the limit.
Next, the derivatives of are calculated as follows:
Due to Theorem 3.1 and Proposition 5.5, we have that
| (5.15) |
where is convex, and hence the limit of will be convex. Thus, (1.3) is satisfied in the limit.
Let . We may then write
Notice that sup|t|≤M |1 − Bn(x0 + n−1/5t)| →a.s. 0, with bounded. Therefore, from Proposition 5.5, it follows that
| (5.16) |
where is piecewise constant, with jumps at the touchpoints of . By consistency of , we have
where . We say that a process Xn(t) is uniformly Op(1) if sup|t|≤M |Xn(t)| is Op(1).
Next, fix a c > 0. Since the MLE is piecewise linear, it follows that dĝn puts mass only at the locations of its touchpoints. However, at these locations, by (5.14), the process is equal to zero. It follows that
Hence,
using Proposition 5.5, (5.12) and the fact that ĝn is increasing.
It remains to show that (1.4) is maintained under limits. This follows from the continuous mapping theorem since for any element z = {z1, z2, z3, z4, z5, z6} ∈ E[−M, M],
is continuous in z for z6 increasing. We have thus shown that satisfies the invelope conditions (1.2)–(1.4) asymptotically. This shows that the only possible limit of is the process .
Tightness
We already know that and are tight in C[−M, M] and D[−M, M], respectively. To address tightness of the invelope processes, note that bounded, increasing functions are compact in D[−M, M] and that bounded continuous functions with uniformly bounded derivatives are compact in C[−M, M]. These two facts allow us to address only stochastic boundedness of , i = 0, …, 3, to obtain tightness. Thus, Proposition 5.5, along with (5.15) and (5.16), says that and are tight. It remains to argue the same for and . However, this will follow by Proposition 5.5 and (5.12), if we can show that both and are tight.
Let τn be the largest touchpoint smaller than x0. By (2.7), and after careful calculations, we have
By Proposition 5.5, Lemma 5.3 and Theorem 3.1, the first two terms are tight in C[−M, M]. Arguments similar to those used in the proof of Lemma 5.1, along with Lemma 5.3, may be used to handle the remaining terms. Since by Lemma 2.2, it follows that is tight in C[−M, M], which, in turn, implies that is tight in the space E[−M, M].
From the invelope to Theorem 1.2
By (5.15) and (5.16), the limiting behavior of and is the same as that of the second and third derivatives of , which converge to the invelope of . Define k1, k2 by
For any a, b > 0, . Therefore, choose a, b so that a^4 b = k2 and a^{3/2} b = k1. It follows that
Applying this rescaling to all processes shows that
It is now straightforward to calculate the correct constants, c1 and c2, of Theorem 1.2.
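The choice of a and b can be made explicit (a small arithmetic sketch; k1 and k2 below are placeholder values, not the actual constants of Theorem 1.2): dividing a^4 b = k2 by a^{3/2} b = k1 gives a^{5/2} = k2/k1, so a = (k2/k1)^{2/5} and b = k1 a^{−3/2} = k1^{8/5} k2^{−3/5}.

```python
# Solve a^4 b = k2 and a^(3/2) b = k1 for a, b > 0.
# k1, k2 are placeholder values for illustration only.
k1, k2 = 2.0, 5.0
a = (k2 / k1) ** 0.4          # a = (k2/k1)^(2/5)
b = k1 ** 1.6 * k2 ** -0.6    # b = k1^(8/5) * k2^(-3/5)
print(a ** 4 * b, a ** 1.5 * b)  # recovers (k2, k1)
```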
Acknowledgements
The research of Hanna Jankowski was supported by NSERC; the research of Jon A. Wellner was supported in part by NSF Grant DMS-0503822 and NIAID Grant 2R01 AI291968-04.
References
- [1].Balabdaoui F. Consistent estimation of a convex density at the origin. Math. Methods Statist. 2007;16:77–95. [Google Scholar]
- [2].Banerjee M. Estimating monotone, unimodal and U-shaped failure rates using asymptotic pivots. Statist. Sinica. 2008;18:467–492. [Google Scholar]
- [3].Banerjee M, Wellner JA. Likelihood ratio tests for monotone functions. Ann. Statist. 2001;29:1699–1731. [Google Scholar]
- [4].Banerjee M, Wellner JA. Confidence intervals for current status data. Scand. J. Statist. 2005;32:405–424. [Google Scholar]
- [5].Baraud Y, Birgé L. Estimating the intensity of a random measure by histogram type estimators. Probab. Theory Related Fields. 2009;143:239–284. [Google Scholar]
- [6].Brunel E, Comte F. Adaptive nonparametric regression estimation in presence of right censoring. Math. Methods Statist. 2006;15:233–255. [Google Scholar]
- [7].Cai T, Low M. Technical report. Dept. Statistics, Univ. Pennsylvania; 2007. Adaptive estimation and confidence intervals for convex functions and monotone functions. [Google Scholar]
- [8].Carolan C, Dykstra R. Asymptotic behavior of the Grenander estimator at density flat regions. Canad. J. Statist. 1999;27:557–566. [Google Scholar]
- [9].Gruppo di Lavoro MPS. Catalogo parametrico dei terremoti italiani, versione 2004 (cpti04) INGV; Bologna: 2004. Available at http://emidius.mi.ingv.it/CPTI. in Italian. [Google Scholar]
- [10].Grenander U. On the theory of mortality measurement. II. Skand. Aktuarietidskr. 1956;39:125–153. [Google Scholar]
- [11].Groeneboom P, Jongbloed G, Wellner JA. A canonical process for estimation of convex functions: The “invelope” of integrated Brownian motion +t4. Ann. Statist. 2001;29:1620–1652. [Google Scholar]
- [12].Groeneboom P, Jongbloed G, Wellner JA. Estimation of a convex function: Characterizations and asymptotic theory. Ann. Statist. 2001;29:1653–1698. [Google Scholar]
- [13].Groeneboom P, Jongbloed G, Wellner JA. The support reduction algorithm for computing non-parametric function estimates in mixture models. Scand. J. Statist. 2008;35:385–399. doi: 10.1111/j.1467-9469.2007.00588.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [14].Haupt E, Schäbe H. The TTT transformation and a new bathtub distribution model. J. Statist. Plann. Inference. 1997;60:229–240. [Google Scholar]
- [15].Jankowski H, Wellner JA. Nonparametric estimation of a convex bathtub-shaped hazard function. Univ. Washington, Department of Statistics; 2007. Technical Report 521. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [16].Jankowski H, Wellner JA. Computation of nonparametric convex hazard estimators via profile methods. J. Nonparametr. Stat. 2009;21:505–518. doi: 10.1080/10485250902745359. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [17].Jankowski H, Wang X, McCague H, Wellner J. convexHaz: R functions for convex hazard rate estimation. R package version 0.2. 2008. [Google Scholar]
- [18].Jongbloed G. Minimax lower bounds and moduli of continuity. Statist. Probab. Lett. 2000;50:279–284. [Google Scholar]
- [19].La Rocca L. Bayesian non-parametric estimation of smooth hazard rates for seismic hazard assessment. Scand. J. Statist. 2008;35:524–539. [Google Scholar]
- [20].Lai CD, Xie M, Murthy NP. Bathtub-shaped failure rate life distributions. In: Balakrishnan N, Rao CR, editors. Advances in Reliability. Vol. 20. North-Holland Publishing; Amsterdam: 2001. pp. 69–104. Handbook of Statistics. [Google Scholar]
- [21].Politis DN, Romano JP. Large sample confidence regions based on subsamples under minimal assumptions. Ann. Statist. 1994;22:2031–2050. [Google Scholar]
- [22].Politis DN, Romano JP, Wolf M. Subsampling. Springer; New York: 1999. [Google Scholar]
- [23].Rajarshi S, Rajarshi MB. Bathtub distributions: A review. Comm. Statist. Theory Methods. 1988;17:2597–2621. [Google Scholar]
- [24].Reboul L. Estimation of a function under shape restrictions. Applications to reliability. Ann. Statist. 2005;33:1330–1356. [Google Scholar]
- [25].Reynaud-Bouret P. Adaptive estimation of the intensity of inhomogeneous Poisson processes via concentration inequalities. Probab. Theory Related Fields. 2003;126:103–153. [Google Scholar]
- [26].Robertson T, Wright FT, Dykstra RL. Order Restricted Statistical Inference. Wiley; Chichester: 1988. [Google Scholar]
- [27].Rockafellar RT. Convex Analysis. Princeton Mathematical Series. Vol. 28. Princeton Univ. Press; Princeton, NJ: 1970. [Google Scholar]
- [28].Shorack GR, Wellner JA. Empirical Processes with Applications to Statistics. Wiley; New York: 1986. [Google Scholar]
- [29].van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes with Applications in Statistics. Springer; New York: 1996. [Google Scholar]
- [30].Woodroofe M, Sun J. A penalized maximum likelihood estimate of f(0+) when f is nonincreasing. Statist. Sinica. 1993;3:501–515. [Google Scholar]