Abstract
We introduce new shape-constrained classes of distribution functions on ℝ, the bi-s*-concave classes. In parallel to results of Dümbgen et al. (2017) for what they called the class of bi-log-concave distribution functions, we show that every s-concave density f with s > −1 has a bi-s*-concave distribution function F for s* ≤ s/(s + 1).
Confidence bands building on existing nonparametric confidence bands, but accounting for the shape constraint of bi-s*-concavity, are also considered. The new bands extend those developed by Dümbgen et al. (2017) for the constraint of bi-log-concavity. We also make connections between bi-s*-concavity and finiteness of the Csörgő–Révész constant of F, which plays an important role in the theory of quantile processes.
Keywords: log-concave, bi-log-concave, shape constraint, s-concave, quantile process, Csörgő–Révész condition, hazard function
1. Introduction
Statistical methods based on shape constraints have been developing rapidly during the past 15–20 years. From the classical univariate methods based on monotonicity going back to the work of Grenander (1956) and van Eeden (1956) in the 1950s and 1960s, research has progressed to consideration of convexity type constraints in a variety of problems including estimation of density functions, regression functions, and other “nonparametric” functions such as hazard (rate) functions. See Samworth and Sen (2018) for a summary and overview of some of this recent activity.
One very appealing shape constraint is log-concavity: a (density) function f is log-concave if log f is concave (with log 0 = −∞). See Samworth (2018) for a recent review of the properties of log-concave densities and their relevance for statistical applications. While much of the current literature has focused on point estimation, our main focus here will be on inference for one-dimensional distribution functions and especially on (honest, exact) confidence bands for distribution functions which take advantage of shape constraints.
To this end, Dümbgen et al. (2017) introduced the class of bi-log-concave distribution functions defined as follows: a distribution function F on ℝ is bi-log-concave if both F and 1 − F are log-concave. They provided several different equivalent characterizations of this property, and noted (the previously known fact) that if f is a log-concave density, then the corresponding distribution function F and survival function 1 − F are both log-concave. But the converse is false: there are many bi-log-concave distribution functions F with density f which fail to be log-concave; see Section 2 below for an explicit example. Dümbgen et al. (2017) also showed how to construct confidence bands which exploit the bi-log-concave shape constraint and thereby obtain narrower bands, especially in the tails, with correct coverage when the bi-log-concave assumption holds.
However, a difficulty with the assumption of bi-log-concavity is that the corresponding density functions inherit the requirement of exponentially decaying tails of the class of log-concave densities, and this rules out distribution functions F with tails decaying more slowly than exponentially. Here we introduce new shape-constrained families of distribution functions F, which we call the bi-s*-concave distributions, with tails possibly decaying more slowly (or more rapidly) than exponentially. As the name indicates, these families involve a parameter s* ∈ (−∞, 1] which allows heavier than exponential tails when s* < 0, lighter than exponential tails when s* > 0, and which correspond to exactly the bi-log-concave class introduced by Dümbgen et al. (2017) when s* = 0.
Here is an outline of the rest of the paper. In Section 2 we give careful definitions of the new classes of bi-s*-concave distributions. We also present several helpful examples and discuss some basic properties of the new classes and their connections to the classes of s-concave densities studied by Borell (1975), Brascamp and Lieb (1976), and Rinott (1976). (See also Dharmadhikari and Joag-Dev (1988), and Gardner (2002).) Section 3 contains the main theoretical results of the paper. The connection between the bi-s*-concave class and a key condition in the theory of quantile processes, the Csörgő–Révész condition, is discussed in Corollary 4. Finally, we give two tail bounds for bi-s*-concave distribution functions; see Corollary 5.
In Section 4 we first introduce the new confidence bands for a distribution function assuming s* is known. We also discuss some of their theoretical properties: the consistency of the confidence bands is discussed in Theorem 7, and Theorem 9 provides a rate of convergence for linear functionals of bi-s*-concave distribution functions contained in the bands. This extends Theorem 5 of Dümbgen et al. (2017). We then briefly discuss the algorithms used to compute the new bands, and illustrate the new bands with real and artificial data. Section 5 gives a brief summary and statements of further problems. An especially important remaining problem concerns construction of confidence bands when s* is unknown. The proofs for all the results in Sections 2, 3, and 4 are given in Sections 6 and 7.
We conclude this section with some notation which will be used throughout the rest of the paper. The supremum norm of a function g is denoted by ‖g‖∞. For a function x ↦ f(x) we write f(x−) and f(x+) for the left and right limits of f at x, assuming that the indicated limits exist. In general, we use F and f to denote a distribution function and the corresponding density function with respect to Lebesgue measure, and we set J(F) ≡ {x ∈ ℝ : 0 < F(x) < 1}.
2. Definitions, Examples, and First Properties
As we discussed above, for distribution functions F on ℝ, Dümbgen et al. (2017) introduced a shape constraint they called bi-log-concavity by requiring that both F and 1 − F be log-concave.
In this paper, we generalize the bi-log-concave distribution functions by introducing and studying bi-s*-concave distributions defined as follows.
Definition 1.
For −∞ < s* < 0, a distribution function F is bi-s*-concave if both x ↦ Fs*(x) and x ↦ (1 − F(x))s* are convex functions from ℝ to [0, ∞].
For s* = 0, a distribution function F is bi-s*-concave (or bi-log-concave) if both x ↦ log(F(x)) and x ↦ log(1 − F(x)) are concave functions from ℝ to [−∞, 0].
For 0 < s* ≤ 1, a distribution function F is bi-s*-concave if x ↦ Fs*(x) is concave from (inf J(F), ∞) to [0, 1] and x ↦ (1 − F(x))s* is concave from (−∞, sup J(F)) to [0, 1].
The class of bi-s*-concave distribution functions is denoted by 𝒫s*.
Definition 2. (Alternative to Definition 1.)
A distribution function F is bi-s*-concave if it is continuous on ℝ and satisfies the following properties on J(F):
For −∞ < s* < 0, both x ↦ Fs*(x) and x ↦ (1 − F(x))s* are convex functions on J(F).
For s* = 0, both x ↦ log(F(x)) and x ↦ log (1 − F(x)) are concave functions on J(F).
For 0 < s* ≤ 1, both x ↦ Fs* (x) and x ↦ (1 − F(x))s* are concave functions on J(F).
See the Appendix, Section 7, for a proof of the equivalence of Definitions 1 and 2. The main benefit of the second definition is that it is immediately clear that any bi-s*-concave distribution function F is continuous, since continuity of F is explicitly required in Definition 2. Moreover, to verify bi-s*-concavity we only need to verify the convexity or concavity of Fs* and (1 − F)s* on the same interval J(F).
Recall that a density function f is s-concave if fs is convex for s < 0, fs is concave for s > 0, and log f is concave for s = 0. Two basic properties linking s-concave densities and bi-s*-concave distribution functions are given in the following two propositions. Proposition 1 generalizes the case s = 0 as noted above, while Proposition 2 generalizes the corresponding nestedness property of the classes of s-concave densities; see e.g. Dharmadhikari and Joag-Dev (1988), page 86, and Borell (1975), page 111.
Proposition 1. Suppose a density function f is s-concave with s ∈ (−1, ∞). Then the corresponding distribution function F is bi-s*-concave for all s* ≤ s/(1 + s).
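A quick numerical illustration of Proposition 1 (a sketch, not part of the paper's formal development): the standard Cauchy density is s-concave with s = −1/2, so its distribution function should be bi-s*-concave for s* = s/(1 + s) = −1; that is, both 1/F and 1/(1 − F) should be convex. The check below verifies this via discrete second differences on a grid.

```python
import math

def cauchy_cdf(x):
    # standard Cauchy distribution function: F(x) = 1/2 + arctan(x)/pi
    return 0.5 + math.atan(x) / math.pi

def min_second_difference(g, xs):
    # discrete second differences of g on the grid xs; all >= 0 means convex
    vals = [g(x) for x in xs]
    return min(vals[i - 1] - 2.0 * vals[i] + vals[i + 1] for i in range(1, len(vals) - 1))

s_star = -1.0  # s/(1 + s) for the Cauchy value s = -1/2
xs = [0.01 * k for k in range(-1000, 1001)]  # uniform grid on [-10, 10]

conv_F = min_second_difference(lambda x: cauchy_cdf(x) ** s_star, xs)
conv_S = min_second_difference(lambda x: (1.0 - cauchy_cdf(x)) ** s_star, xs)
assert conv_F > -1e-9 and conv_S > -1e-9  # both power transforms are convex
```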
Proposition 2. The bi-s*-concave classes are nested in the following sense:
𝒫s*2 ⊆ 𝒫s*1 whenever −∞ < s*1 ≤ s*2 ≤ 1. | (1) |
Moreover, the bi-s*-concave classes are continuous at s* = 0 in the following sense:
⋂s*<0 𝒫s* = 𝒫0. | (2) |
In view of the nesting property (1), for each F ∈ 𝒫s* for some s* we define
s0*(F) ≡ sup{s* ≤ 1 : F ∈ 𝒫s*}.
Similarly, if f is s-concave for some s we define
s0(f) ≡ sup{s : f is s-concave}.
We often drop the subscript 0 if the meaning is clear. For other basic properties of s-concave densities and bi-s*-concave distribution functions, including results concerning closure under convolution, see Borell (1975), Dharmadhikari and Joag-Dev (1988), and Saumard (2019).
Now we introduce two important parameters, one of which will appear in connection with our characterization of the class of bi-s*-concave distribution functions in the next section and in our examples below. The Csörgő–Révész constant of a bi-log-concave distribution function F, denoted by γ(F), is given by
γ(F) ≡ ess sup_{x ∈ J(F)} F(x)(1 − F(x))|f′(x)|/f(x)² | (3) |
provided that F is differentiable on J(F) with derivative f ≡ F′ and f is differentiable almost everywhere on J(F) with derivative f′ = F″. Here the essential supremum is with respect to Lebesgue measure. Alternatively (and suited for our characterization Theorem 3),
γ̃(F) ≡ ess sup_{x ∈ J(F)} {F(x) ∧ (1 − F(x))}|f′(x)|/f(x)² | (4) |
Note that since u ∧ (1 − u) ≤ 2u(1 − u) ≤ 2{u ∧ (1 − u)} it follows that γ(F) ≤ γ̃(F) ≤ 2γ(F), and hence finiteness of γ(F) is equivalent to finiteness of γ̃(F). The Csörgő–Révész constant arises in the study of quantile processes and transportation distances between empirical distributions and true distributions on ℝ: see Csörgő and Révész (1978), Shorack and Wellner (2009), del Barrio et al. (2005), and Bobkov and Ledoux (2019). It follows from the characterization Theorem 1(iv) of Dümbgen et al. (2017) that F is bi-log-concave if and only if −f²/(1 − F) ≤ f′ ≤ f²/F almost everywhere on J(F) (together with the differentiability conditions of that theorem). We will define and generalize this to the classes of bi-s*-concave distribution functions in Section 3.
Now we consider several examples of s-concave densities and bi-s*-concave distribution functions.
Example 1. (Student-t) Suppose x ↦ fr(x) is the density function of the Student-t distribution with r degrees of freedom defined as follows:
It is well-known (see e.g. Borell (1975)) that fr is s-concave for any s ≤ −1/(1 + r) = s0(fr). Note that s takes values in (−1, 0) since r ∈ (0, ∞). It follows from Proposition 1 that Frs* and (1 − Fr)s* are convex for s* ≤ −1/r, and hence Fr is bi-s*-concave for all 0 < r < ∞. Direct calculation shows that the Csörgő–Révész constant γ(Fr) = 1 − s* = 1 + (1/r) ∈ (1, ∞) for 0 < r < ∞.
In particular, this yields γ(F1) = γ(Cauchy) = 2. And it suggests that γ(F) ≤ 1/(1 + s) = 1 − s* for all bi-s*-concave distribution functions F where 1/(1 + s) varies from 1 to ∞ as s varies from 0 to −1. This is one of the characterizations of the bi-s*-concave class that we will prove in Section 3.
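The Cauchy value γ(F1) = 2 can be confirmed numerically (a sketch; a finite grid only approximates the essential supremum from below):

```python
import math

def cauchy_cdf(x):
    return 0.5 + math.atan(x) / math.pi

# For the Cauchy density, |f'(x)|/f(x)^2 = 2*pi*|x|, so the Csorgo-Revesz constant is
# gamma(F) = sup_x 2*pi*|x| F(x)(1 - F(x)); by symmetry it suffices to scan x > 0.
xs = [1.5 ** k for k in range(0, 40)]  # log-spaced grid out to roughly 7.6e6
gamma_est = max(2.0 * math.pi * x * cauchy_cdf(x) * (1.0 - cauchy_cdf(x)) for x in xs)
assert 1.99 < gamma_est < 2.0  # approaches gamma(Cauchy) = 2 from below
```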
Example 2. (Fa,b) Suppose that fa,b is the density of the family of F-distributions with “degrees of freedom” a > 0 and b > 0. (In statistical practice, if T has the density fa,b, this would usually be denoted by T ~ Fa,b, where a is the “numerator degrees of freedom” and b is the “denominator degrees of freedom”.) The density is given by
(In fact, C(a, b) = a^{a/2} b^{b/2} Beta(a/2, b/2), and fa,b(x) → gb(x) as a → ∞ where gb is the Gamma density with parameters b/2 and b/2.) It is well-known (see e.g. Borell (1975)) that fa,b belongs to the class of s-concave densities if s ≤ −1/(1 + a/2) = s0(fa,b) when a ≥ 2 and b ≥ 2. This implies that s ∈ [−1/2, 0), and the resulting s* = s/(1 + s) is in [−1, 0). By Proposition 1, it follows that Fs* and (1 − F)s* are convex; i.e. F is bi-s*-concave.
Example 3. (Pareto) Suppose that fa,b(x) = (a/b)(x/b)−(a+1)1[b,∞)(x), the Pareto density with parameters a > 0 and b > 0. In this case, fa,b is s-concave for each s ≤ −1/(1 + a) by noting the convexity of fa,bs on [b, ∞). Thus we take s = −1/(1 + a) ∈ (−1, 0) for a ∈ (0, ∞) and hence s* = s/(1 + s) equals −1/a. Furthermore, it is easily seen that
(CRR(·) represents the Csörgő–Révész function in the right tail.)
Thus the Pareto distribution is analogous to the exponential distribution in the log-concave case in the sense that x ↦ fs(x) = cx (with c = b−1(b/a)1/(1+a)) is linear.
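This linearity is easy to confirm numerically; the sketch below uses the illustrative choice a = 2, b = 1, so s = −1/(1 + a) = −1/3, and checks that the second differences of f^s vanish:

```python
a, b = 2.0, 1.0          # illustrative Pareto parameters
s = -1.0 / (1.0 + a)     # s-concavity index from Example 3

def pareto_density(x):
    return (a / b) * (x / b) ** (-(a + 1.0))

xs = [b + 0.1 * k for k in range(200)]                 # grid on [b, b + 19.9]
vals = [pareto_density(x) ** s for x in xs]            # f^s should be linear in x
second_diffs = [vals[i - 1] - 2.0 * vals[i] + vals[i + 1] for i in range(1, len(vals) - 1)]
assert max(abs(d) for d in second_diffs) < 1e-8        # numerically zero: f^s is linear
```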
Example 4. (Symmetrized Beta) Suppose that
fr(x) = Cr(1 − x²/r)+^{r/2}, x ∈ ℝ,
where Cr is the normalizing constant and r ∈ (0, ∞). Note that fr is an s-concave density with s = 2/r ∈ (0, ∞) since
fr^{2/r} ∝ (1 − x²/r)+
is concave, and hence the corresponding distribution function Fr is bi-s*-concave with s* = s/(1 + s) = 2/(2 + r). As r → ∞ it is easily seen that
fr(x) → (2π)^{−1/2} exp(−x²/2),
the standard normal density. Thus r = ∞ corresponds to s = 0 and s* = 0. On the other hand,
as r → 0. Thus r = 0 corresponds to s = ∞ and s* = 1.
Note that just as the class of bi-log-concave distributions is considerably larger than the class of log-concave distributions (as shown by Dümbgen et al. (2017)), the class of bi-s*-concave distributions is considerably larger than the class of s-concave distributions. In particular, multimodal distributions are allowed in both the bi-log-concave and the bi-s*-concave classes.
Example 5. (Exponential family; exponential tilt of U(0, 1)) Suppose that
ft(x) = exp(tx − K(t)) 1[0,1](x),
where
K(t) ≡ log((e^t − 1)/t) | (5) |
for −∞ < t < ∞ with K(0) ≡ 0, and further define Ft(x) ≡ ∫−∞^x ft(y) dy.
One can verify that ft is s-concave only for s ≤ 0 and hence Ft is bi-s*-concave for s* ≤ s/(1 + s) ≤ 0 by Proposition 1. However, this might not be optimal; i.e. there remains the possibility that Ft ∈ 𝒫s* for some s* > 0. In fact, by Theorem 3(iv) it follows that Ft ∈ 𝒫s* with s* = e^{−|t|}. (For an example involving a power-tilt of U(0, 1), see Dharmadhikari and Joag-Dev (1988) (iv), page 95.) This also implies that the converse of Proposition 1 does not hold here or in general. The following two examples also illustrate this point.
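Returning to Example 5, the critical exponent e^{−|t|} can be probed numerically. The sketch below assumes the tilted-uniform form ft(x) = exp(tx − K(t)) on [0, 1], so that Ft(x) = (e^{tx} − 1)/(e^t − 1); for t > 0 the survival side imposes no constraint, and Fts* should be concave exactly for s* ≤ e^{−t}:

```python
import math

t = 1.0  # tilt parameter; the claimed critical exponent is exp(-|t|)

def tilted_cdf(x):
    # distribution function of the exponentially tilted U(0,1): (e^{tx} - 1)/(e^t - 1)
    return (math.exp(t * x) - 1.0) / (math.exp(t) - 1.0)

def power_is_concave(s_star):
    # check concavity of x -> F_t(x)^{s*} on (0, 1) via discrete second differences
    xs = [0.001 * k for k in range(1, 1000)]
    vals = [tilted_cdf(x) ** s_star for x in xs]
    return all(vals[i - 1] - 2.0 * vals[i] + vals[i + 1] <= 1e-12
               for i in range(1, len(vals) - 1))

crit = math.exp(-t)
assert power_is_concave(crit - 0.01)        # concave just below exp(-|t|)
assert not power_is_concave(crit + 0.01)    # concavity fails just above exp(-|t|)
```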
Example 6. (Mixture of Gaussians shifted) (Dümbgen et al. (2017), pages 2–3) Suppose that fδ is the mixture (1/2)N(−δ, 1) + (1/2)N(δ, 1) with δ > 0. It is well-known that fδ is bimodal if δ > 1. Since all s-concave densities are unimodal (see e.g. Dharmadhikari and Joag-Dev (1988) page 86), it follows that fδ is not s-concave for any δ > 1. Dümbgen et al. (2017) showed (numerically) that the corresponding distribution function Fδ is bi-log-concave for δ ≤ 1.34 but not for δ ≥ 1.35. With δ = 1.8 this example also shows that strict inequality can occur in the second inequality in Corollary 4 below.
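The numerical finding of Dümbgen et al. (2017) is easy to reproduce with a grid check of the bi-log-concave derivative criterion −f²/(1 − F) ≤ f′ ≤ f²/F (a sketch; the grid, tolerance, and helper names are ours):

```python
import math

def phi(x):  # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):  # standard normal distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bi_log_concave_ok(delta):
    # grid check of -f^2/(1-F) <= f' <= f^2/F for the mixture (1/2)N(-delta,1) + (1/2)N(delta,1)
    for k in range(-600, 601):
        x = 0.01 * k
        f = 0.5 * (phi(x + delta) + phi(x - delta))
        fp = 0.5 * (-(x + delta) * phi(x + delta) - (x - delta) * phi(x - delta))
        F = 0.5 * (Phi(x + delta) + Phi(x - delta))
        if fp > f * f / F + 1e-9 or fp < -f * f / (1.0 - F) - 1e-9:
            return False
    return True

assert bi_log_concave_ok(0.5)        # log-concave mixture, hence bi-log-concave
assert not bi_log_concave_ok(1.8)    # bimodal case: bi-log-concavity fails
```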
Example 7. (Mixture of shifted Student-t) Now suppose that fδ is the mixture (1/2)t1(· − δ) + (1/2)t1(· + δ) with δ > 0, where tr is the standard Student-t density with r degrees of freedom as in Example 1. Since fδ is bimodal if δ > δ0 ≈ 0.6 and all s-concave densities are unimodal, it follows that fδ is not s-concave for any δ > δ0. For values of δ < δ0, fδ is s-concave with s = −1/2, so Proposition 1 applies and shows that Fδ is bi-s*-concave with s* = −1. By numerical calculation, for δ > δ0 the distribution functions Fδ are bi-s*-concave for some s* = s*(δ) ∈ (−∞, 1] which decreases (approximately linearly) for large δ.
Example 8. (Lévy with α = 1/2) This example is the completely asymmetric α-stable (or Lévy) law with α = 1/2. It gives the first passage time to the level a > 0 for a standard Brownian motion B (started at 0 and with no drift). See e.g. Durrett (2019), pages 372–374. The density is given by
fa(x) = (a/√(2π)) x^{−3/2} exp(−a²/(2x)) 1(0,∞)(x),
and the distribution function Fa(x) = 2(1 − Φ(a/√x)) for x > 0, where Φ denotes the standard normal distribution function. It is easily seen that fa is s-concave with s = −2/3, and hence Fa is bi-s*-concave with s* = −2. Thus γ(Fa) = 3.
The following table summarizes the examples:

| Example | s0(f) | s0*(F) |
| Student-t, r degrees of freedom | −1/(1 + r) | −1/r |
| Fa,b (a, b ≥ 2) | −1/(1 + a/2) | −2/a |
| Pareto(a, b) | −1/(1 + a) | −1/a |
| Symmetrized Beta, r | 2/r | 2/(2 + r) |
| Exponential tilt of U(0, 1), t | 0 | e^{−|t|} |
| Lévy, α = 1/2 | −2/3 | −2 |
Example 5 shows that strict inequality can hold in the inequality
s0*(F) ≥ s0(f)/(1 + s0(f)).
3. Main Theoretical Results
Here is our theorem characterizing bi-s*-concave distribution functions.
Theorem 3. Let s* ≤ 1. For a non-degenerate distribution function F, the following statements are equivalent:
(i) F is bi-s*-concave.
(ii) F is continuous on ℝ and differentiable on J(F) with derivative f = F′.
Moreover, for s* ≠ 0,
F(y) ≤ F(x){(1 + s*(y − x)f(x)/F(x))+}^{1/s*} and 1 − F(y) ≤ (1 − F(x)){(1 − s*(y − x)f(x)/(1 − F(x)))+}^{1/s*} | (6) |
while for s* = 0
F(y) ≤ F(x) exp((y − x)f(x)/F(x)) and 1 − F(y) ≤ (1 − F(x)) exp(−(y − x)f(x)/(1 − F(x))) | (7) |
for all x, y ∈ J(F).
(iii) F is continuous on ℝ and differentiable on J(F) with derivative f = F′ such that the s*-hazard function f/(1 − F)^{1−s*} is non-decreasing on J(F), and the reverse s*-hazard function f/F^{1−s*} is non-increasing on J(F).
(iv) F is continuous on ℝ and differentiable on J(F) with bounded and strictly positive derivative f = F′. Furthermore, f is differentiable almost everywhere on J(F) with derivative f′ = F″ satisfying
−(1 − s*)f(x)²/(1 − F(x)) ≤ f′(x) ≤ (1 − s*)f(x)²/F(x) for almost all x ∈ J(F). | (8) |
The following two remarks are immediate consequences of Theorem 3. See Section 6 for a proof of Remark 1.
Remark 1.
(i) The proof of Theorem 3(iv) implies that if s* > 1, then not both Fs* and (1 − F)s* can be concave.
(ii) If F is a bi-s*-concave distribution function for 0 < s* ≤ 1, then inf J(F) > −∞ and sup J(F) < ∞.
(iii) If F is a bi-s*-concave distribution function for s* < 0, then it follows that
| (9) |
with
| (10) |
Remark 2. Suppose that F is a bi-s*-concave distribution function, and define
Since f/F^{1−s*} is monotonically non-increasing on J(F), it follows that for any x, x0 ∈ J(F) with x < x0,
and hence
Analogously one can show that
Corollary 4. (Connection with the Csörgő - Révész constant.)
Suppose F is a bi-s*-concave distribution function for s* ≤ 1. Then with γ(F) and γ̃(F) as defined in (3) and (4), we have
2^{−1}γ̃(F) ≤ γ(F) ≤ γ̃(F) ≤ 1 − s*. | (11) |
Remark 3. By Theorem 3, one can verify that γ̃(F) is well-defined for any F ∈ 𝒫s*.
The first two inequalities in Corollary 4 follow (as we noted before) from 2^{−1}{u ∧ (1 − u)} ≤ u(1 − u) ≤ u ∧ (1 − u) for 0 ≤ u ≤ 1. Thus finiteness of γ̃(F) implies finiteness of γ(F) and vice versa. Examples show that strict inequality may hold in the inner inequalities in (11). On the other hand, if f is non-decreasing on (a, F−1(1/2)) and non-increasing on (F−1(1/2), b) where J(F) = (a, b), then the bounds in (11) can be sharpened by inspection of the proof.
Corollary 5. (Bounds for F, where s* ≠ 0.)
For any s* ∈ (−∞, 0) ∪ (0, 1] and x ∈ J(F),
FL(x) ≤ F(x) ≤ FU(x), | (12) |
where FL(x) ≡ 1 + (Fs*(x) − 1)/s* and FU(x) ≡ (1 − (1 − F(x))s*)/s*.
Moreover, FU(x) is a convex function on J(F), and FL(x) is a concave function on J(F). For s* = 0 and x ∈ J(F), (12) holds with FL(x) = 1 + log F(x) and FU(x) = −log(1 − F(x)).
4. Confidence bands for bi-s*-concave distribution functions
Our goal in this section is to define confidence bands for F which exploit the shape constraint F ∈ 𝒫s*. We start with some known unconstrained nonparametric bands and define new bands under the assumption that the true distribution function F satisfies the shape constraint F ∈ 𝒫s*, where s* ∈ (−∞, 1] is known.
4.1. Definitions and Basic Properties
Let X1, …, Xn be i.i.d. random variables with continuous distribution function F. A (1 − α)-confidence band (Ln, Un) for F consists of two monotonically non-decreasing functions Ln and Un on ℝ, depending on α and X1, …, Xn only, which satisfy Ln < 1, Un > 0 and
ℙ(Ln(x) ≤ F(x) ≤ Un(x) for all x ∈ ℝ) ≥ 1 − α.
The following two bands are discussed in Dümbgen et al. (2017) and we briefly restate them here.
Example (Kolmogorov-Smirnov band). A Kolmogorov-Smirnov band (Ln, Un) is given by
where 𝔽n is the empirical distribution function and κn,α denotes the (1 − α)-quantile of ‖𝔽n − F‖∞; see Shorack and Wellner (2009). Note that κn,α ≤ √(log(2/α)/(2n)) by Massart’s (1990) inequality.
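In code, a conservative version of this band is immediate (a sketch; the helper name ks_band is ours, and Massart's bound √(log(2/α)/(2n)) is used in place of the exact quantile, giving coverage at least 1 − α):

```python
import math

def ks_band(sample, alpha):
    # Conservative Kolmogorov-Smirnov band evaluated at the order statistics,
    # using Massart's (1990) bound kappa = sqrt(log(2/alpha)/(2n)) instead of
    # the exact (1 - alpha)-quantile of the supremum distance.
    xs = sorted(sample)
    n = len(xs)
    kappa = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    ecdf = [(i + 1.0) / n for i in range(n)]      # empirical distribution at the knots
    lower = [max(p - kappa, 0.0) for p in ecdf]
    upper = [min(p + kappa, 1.0) for p in ecdf]
    return xs, lower, upper

knots, lower, upper = ks_band([0.1 * k for k in range(1, 101)], alpha=0.05)
assert all(0.0 <= l <= u <= 1.0 for l, u in zip(lower, upper))
assert abs(math.sqrt(math.log(40.0) / 200.0) - 0.1358) < 1e-4  # kappa for n = 100, alpha = 0.05
```

Between the knots the empirical distribution function is constant, so these values determine the whole band.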
Example (Weighted Kolmogorov-Smirnov band). A weighted Kolmogorov-Smirnov band (Ln, Un) is as follows: for any γ ∈ [0, 1/2),
for i ∈ {0, 1, …, n} and x ∈ [X(i), X(i+1)), where X(1) ≤ ⋯ ≤ X(n) denote the order statistics of X1, …, Xn, X(0) ≡ −∞, X(n+1) ≡ ∞, ti ≡ i/(n + 1) for i = 1, …, n, and κn,α,γ denotes the (1 − α)-quantile of the following test statistic
Note that .
A further example of a nonparametric confidence band due to Owen (1995) and refined by Dümbgen and Wellner (2014) was considered by Dümbgen et al. (2017). We will not consider this third possibility further here due to space constraints.
Now we turn to confidence bands for bi-s*-concave distribution functions. Our approach will be to refine the unconstrained bands given in the examples above. Suppose F is a bi-s*-concave distribution function. A nonparametric (1 − α) confidence band (Ln, Un) for F may be refined as follows:
If there is no bi-s*-concave distribution function fitting into the band (Ln, Un), the refined band is void, and we conclude with confidence 1 − α that F is not bi-s*-concave. But in the case of F ∈ 𝒫s*, this happens with probability at most α.
The following lemma implies two properties of our shape-constrained band (Ln(s*), Un(s*)). The first one is that both Ln(s*) and Un(s*) are Lipschitz continuous on ℝ, unless the band is void. The second one is that Un(s*)(x) converges polynomially fast to 0 as x → −∞ and Ln(s*)(x) converges polynomially fast to 1 as x → ∞ as long as limx→∞ Ln(x) > limx→−∞ Un(x).
Lemma 6. For real numbers a < b, 0 < u < v < 1 and s* ∈ (−∞, 0) ∪ (0, 1], define
(i) If Ln(a) ≥ u and Un(b) ≤ v, then and are Lipschitz-continuous on with Lipschitz constant max{γ1, γ2}.
(ii) If Un(a) ≤ u and Ln(b) ≥ v, then
The following theorem implies the consistency of our proposed confidence band (Ln(s*), Un(s*)).
Theorem 7. Suppose that the original confidence band (Ln, Un) is consistent in the sense that for any fixed x ∈ ℝ, both Ln(x) and Un(x) tend to F(x) in probability.
(i) Suppose that . Then .
(ii) Suppose that F ∈ 𝒫s* with s* ≠ 0. Then , and
| (13) |
where sup(Ø) ≡ 0. Moreover, for any compact interval K ⊂ J(F),
| (14) |
where hG stands for any of the three functions G′, (Gs*)′, and ((1 − G)s*)′. Finally, for any fixed x1 ∈ J(F) and 0 < b1 < f(x1)/F^{1−s*}(x1),
| (15) |
while for any fixed x2 ∈ J(F) and 0 < b2 < f(x2)/(1 − F(x2))^{1−s*}
| (16) |
The following result provides the consistency of confidence bands for functionals ∫ ϕ dF of F with well-behaved integrands ϕ : ℝ → ℝ.
Corollary 8. Suppose that the original confidence band (Ln, Un) is consistent, and let F ∈ 𝒫s* with s* < 0. Let ϕ : ℝ → ℝ be absolutely continuous with a continuous derivative ϕ′ satisfying the following constraint: there exist constants a > 0 and k < −1/s* such that
Then
The following theorem provides rates of convergence, with the following condition on the original confidence band (Ln, Un):
Condition (*): For certain constants γ ∈ [0, 1/2) and κ, λ > 0,
on the interval .
As stated in Dümbgen et al. (2017), this condition is satisfied with γ = 0 in the case of the Kolmogorov-Smirnov band. In the case of the weighted Kolmogorov-Smirnov band, it is satisfied for the given value of γ ∈ [0, 1/2). For the refined version of Owen’s band, it is satisfied for any fixed number γ ∈ (0, 1/2).
Theorem 9. Suppose that F ∈ 𝒫s* with s* < 0 and let (Ln, Un) satisfy Condition (*). Let ϕ : ℝ → ℝ be absolutely continuous with a continuous derivative ϕ′.
Suppose that |ϕ′(x)| = O(|x|^{k−1}) as |x| → ∞ for some number k < −1/s*. Then
| (17) |
Remark: (i) From (17), one can verify that the convergence rate is n^{−1/2} as long as k < γ/(−s*).
(ii) From (17), one can verify that when γ = 0, the convergence rate is n^{−1/2+k/(−s*)} and we have a “power deficit” (or “polynomial rate deficit”) relative to n^{−1/2}.
4.2. Implementation and illustration of the confidence bands
In this section, we discuss the implementation of confidence bands for bi-s*-concave distribution functions. This extends the treatment of Dümbgen et al. (2017) from s* = 0 to general values s* ∈ (−∞, 1].
Recall the procedure ConcInt(·, ·) developed in Dümbgen et al. (2017). Given any finite set of real numbers t0 < t1 < ⋯ < tm and any pair (l, u) of functions with l < u pointwise and l(t) > −∞ for at least two different points t ∈ {t0, t1, …, tm}, this procedure computes the pair (lo, uo) where
First note that lo is the smallest concave majorant of l on [t0, tm]; thus it may be computed by a version of the pool-adjacent-violators algorithm; see for example Robertson et al. (1988). Then we obtain indices 0 ≤ j(0) < j(1) < ⋯ < j(b) ≤ m such that
With lo in hand, we then check to see if lo ≤ u on {t0, t1, …, tm}. If this fails, then there is no concave function lying between l and u, and the procedure returns an error message. If this test succeeds, then we compute uo(x) as
where . (The rest of the description of the procedure ConcInt(·, ·) is just as in Dümbgen et al. (2017).)
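The majorant step can be sketched as an upper-convex-hull scan, which enforces exactly the decreasing-slope condition that pool-adjacent-violators does (a sketch; the helper names are ours, not the paper's ConcInt internals):

```python
def cross(o, p, q):
    # z-component of (p - o) x (q - o); >= 0 means a left (counterclockwise) turn
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])

def concave_majorant(t, l):
    # Smallest concave majorant of the points (t_j, l(t_j)), evaluated at the t_j.
    # The upper hull keeps only right turns, so its slopes decrease: exactly the
    # concavity condition required of lo.
    pts = sorted(zip(t, l))
    hull = []
    for p in pts:
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    out, k = [], 0
    for x, _ in pts:  # interpolate the hull linearly back onto the knots
        while k + 1 < len(hull) and hull[k + 1][0] < x:
            k += 1
        (x0, y0), (x1, y1) = hull[k], hull[min(k + 1, len(hull) - 1)]
        out.append(y0 if x1 == x0 else y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return out

lo_vals = concave_majorant([0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 1.0, 3.0])
assert lo_vals == [0.0, 2.0, 2.5, 3.0]
```

The hull vertices are precisely the knots tj(0), …, tj(b) at which lo touches l.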
When s* < 0, let g(v; s*) ≡ g(v) ≡ −v^{s*} and h(v; s*) ≡ h(v) ≡ (−v)^{1/s*}. (This is the most important new case. When s = s* = 0, g(v) ≡ log(v), h(v) ≡ exp(v). When s* > 0, g(v) ≡ v^{s*} and h(v) ≡ v^{1/s*}.) Here is pseudocode for the computation of (Ln(s*), Un(s*)).
Illustration of the confidence bands
To get some feeling for the new confidence bands in a setting in which s* is known, we generated a sample of size n = 100 from the Student-t distribution with r = 1 degrees of freedom. This distribution belongs to 𝒫s* for every s* ≤ −1. We constructed Kolmogorov-Smirnov (KS) and weighted Kolmogorov-Smirnov (WKS) bands with γ = 0.4 as the initial starting bands (Ln, Un). We then computed and plotted our shape-constrained confidence bands (Ln(s*), Un(s*)) under the (correct) assumption that s* = −1 and the (incorrect) assumption that s* = 0 for both the KS and WKS bands as initial nonparametric bands with α = 0.05; see Figure 1 and Figure 2. To see the components of Figures 1 and 2 separately, see the Supplementary file, Figures 1–2 and 3–4 respectively.
Figure 1:

Confidence bands for bi-s*-concave distribution functions based on KS bands. The black curve is the distribution function of the Student-t distribution with 1 degree of freedom. The two gray-black lines give the KS band and lines in other colors are refined confidence bands under the bi-s*-concave assumption. The step function in the middle is the empirical distribution function.
Figure 2:

Confidence bands for bi-s*-concave distribution functions based on WKS bands. The black curve is the distribution function of the Student-t distribution with 1 degree of freedom. The two gray-black lines give the WKS band and lines in other colors are refined confidence bands under the bi-s*-concave assumption. The step function in the middle is the empirical distribution function.
Figure 3:

Confidence Bands for bi-s*-concave distribution functions from KS bands based on a sample of size 1000 from the Student-t distribution with 1 degree of freedom. The two gray-black lines give the initial bands, lines in other colors are refined confidence bands under the bi-s*-concave assumption. The step function (black) in the middle is the empirical distribution function.
Figure 4:

Confidence Bands for bi-s*-concave distribution functions from WKS bands based on a sample of size 1000 from the Student-t distribution with one degree of freedom. The two gray-black lines give the initial bands, lines in other colors are refined confidence bands under the bi-s*-concave assumption. The step function (black) in the middle is the empirical distribution function.
Note that when s* = 0, s* is misspecified and the resulting bands are not guaranteed to have coverage probability 0.95. An indication of this is that the shape-constrained bands computed under the assumption s* = 0 do not contain the empirical distribution function.
From these two plots, an immediate observation is that the confidence bands for smaller s* are wider than those for larger s*. This is a direct consequence of the nesting property of the bi-s*-concave classes; see Proposition 2. Also note that the shape-constrained band with s* = −1 does improve on the KS band, especially in the tails.
An Application
Dümbgen et al. (2017) gave an application of bi-log-concave confidence bands to a dataset from Wooldridge (2000). It contains approximate annual salaries of the CEOs of 177 randomly chosen companies in the U.S. The salary is rounded to multiples of 1000 USD. We denote the i-th observed approximate salary by Yi,raw. Dümbgen et al. (2017) assume that the unobserved true salary Yi,true lies within (Yi,raw − 1, Yi,raw + 1). Let us assume that Gtrue is the unknown distribution of Ytrue. For income data it is sometimes assumed that log10 Ytrue is Gaussian (see Kleiber and Kotz (2003)). Since Gaussian densities are all log-concave and hence have bi-log-concave distribution functions (by Proposition 1), it is natural to consider replacing the Gaussian assumption by the assumption of bi-log-concavity. Dümbgen et al. (2017) therefore assumed that X = log10 Ytrue is bi-log-concave and constructed 95% confidence bands (Ln, Un) (see Figure 4 of Dümbgen et al. (2017)) where Ln is computed with the empirical distribution of log10(Yi,raw + 1) and Un is computed with that of log10(Yi,raw − 1).
Here we assume that the distribution of X is bi-s*-concave for some s* and compute confidence bands for different values of s*. Now we are confronted with the issue of choosing s*: if we want narrower confidence bands we would assume some value of s* ∈ (0, 1], while if we are not willing to assume s* = 0 (the choice made by Dümbgen et al. (2017)), then we would assume some value of s* < 0 (leading to the larger classes 𝒫s* with s* < 0). It is of some interest to know if the CEO data could be modeled by use of the bi-s* classes with s* ∈ (0, 1] since this would result in still narrower confidence bands. But it is also of interest to try to use the data to choose s*.
Choosing s*
Since F can be a member of 𝒫s* for various values of s*, each s* leads to a different set of bands. However, due to the nesting property of the classes 𝒫s*, a larger s* always yields a narrower confidence band. Thus, it is of interest to estimate
since s0*(F) generates the narrowest bands at a given confidence level. If F is not bi-s*-concave for any s* ≤ 1, then we set s0*(F) ≡ −∞. Now s0*(F) is connected to the Csörgő–Révész constant since γ(F) ≤ 1 − s* when F ∈ 𝒫s* and s* ≤ 1. For example, the Student-t distribution with r “degrees of freedom” has s0*(F) = −1/r. However, this connection cannot be easily exploited for practical estimation purposes due to difficulties in estimating γ(F) or γ̃(F). So we take an alternative route to making inference about s0*(F).
Starting from an initial 1 − α band (Ln, Un), a bound on s0*(F) is given by
Clearly, for s* larger than this bound, there is no bi-s*-concave distribution function fitting into the band (Ln, Un). Since this happens with probability at most α ∈ (0, 1) when the true distribution function F ∈ 𝒫s*, it follows that the set of all s* not exceeding the bound is a confidence set for s0*(F) with coverage probability at least 1 − α. Our simulations suggest that this bound is generally considerably larger than s0*(F), and hence not suitable as an estimator, especially for α = 0.05.
Instead, we propose an estimator of s0*(F) based on the measure of the set where the empirical distribution function remains between the shape-constrained bands for s*. More formally, let Ln(s*) and Un(s*) denote the 1 − α level bi-s*-concave confidence bands based on the initial bands Ln and Un and the assumption F ∈ 𝒫s*. Define
A higher value of ω(s*) indicates that (Ln(s*), Un(s*)) contains a greater portion of the empirical distribution function. Since the bands (Ln(s*), Un(s*)) become narrower as s* increases, ω(s*) decreases in s*, and eventually becomes zero when no bi-s*-concave distribution function fits into the initial band. A plausible estimator of s0*(F) is therefore given by
| (18) |
where ρ is a threshold taking values in (0, 1). The calculation of the estimator thus depends on α and ρ.
In the case of the CEO data, taking α = 0.05 and ρ = 0.95 leads to the estimates for the KS and WKS initial bands. The resulting bands are given in Figures 5 and 6. Also see the Supplementary file, Figures 9–10 and Figures 11–12, for the steps in constructing Figures 5 and 6.
Figure 5:

Confidence Bands from an initial KS band for the CEO salary data. The step function in the middle is the empirical distribution function. The two gray-black lines give the KS band and lines in other colors are refined confidence bands under the bi-s*-concave assumption.
Figure 6:

Confidence Bands from an initial WKS band for the CEO salary data. The step function in the middle is the empirical distribution function. The two gray-black lines give the WKS band and lines in other colors are refined confidence bands under the bi-s*-concave assumption.
We should emphasize that our current theory says little about the coverage probabilities of the bands based on an estimated s*. Discussion of the consistency of the estimator is beyond the scope of the present paper, but this and further issues concerning inference for both s* and F seem to be interesting directions for future research.
5. Summary and further problems
In this paper we have:
• Defined new classes of shape-constrained distribution functions, the bi-s*-classes extending the bi-log-concave class of distribution functions defined by Dümbgen et al. (2017).
• Characterized the new classes and connected our characterization to an important parameter, the Csörgő - Révész constant associated with a distribution function F.
• Used the new bi-s*-concave classes to define refined confidence bands for distribution functions which exploit the shape constraint, thereby producing more accurate (narrower) bands with honest coverage when the shape constraint holds.
Thus we have shown that if the parameter s* ∈ (−∞, 1] determining the class is known and correct, we can construct refined confidence bands which improve on any given nonparametric confidence bands. It follows from the construction of our bands that they have conservative coverage probabilities under the (null) hypothesis that the true distribution function is in and that s* is correctly specified.
• What if we do not know s*? Can we estimate it from the data? As becomes clear from the discussion of the CEO data via Figures 5 and 6, our methods provide one-sided confidence bounds for the true s* of the form (−∞, ] under the assumption that for some s*. It remains to develop inference methods for s* and (s*, F) jointly. It will also be of interest to have a more complete understanding of the power behavior of tests related to and .
• The stable laws are known to be unimodal; see e.g. Hall (1984) for some history. In connection with Example 8 we have the following:
Conjecture:
the α-stable laws are s-concave with s = −1/(1 + α) for 0 < α < 2.
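As a numerical sanity check (not a proof) of the α = 1/2 case of this conjecture, the Lévy(0, 1) density f(x) ∝ x^(−3/2) e^(−1/(2x)) should be s-concave with s = −1/(1 + 1/2) = −2/3, in agreement with Example 8; since s < 0, this means f^(−2/3) should be convex on (0, ∞), which can be probed via second differences on a grid:

```python
import math

def levy_density(x, c=1.0):
    """Levy(0, c) density on (0, infinity)."""
    return math.sqrt(c / (2 * math.pi)) * math.exp(-c / (2 * x)) / x ** 1.5

s = -2.0 / 3.0    # conjectured s for alpha = 1/2
h = 1e-3          # step for second differences
grid = [0.05 + 0.01 * i for i in range(500)]   # x in [0.05, 5.04]

# For s < 0, f is s-concave iff f**s is convex: second differences >= 0.
second_diffs = [
    levy_density(x - h) ** s - 2 * levy_density(x) ** s + levy_density(x + h) ** s
    for x in grid
]
assert all(d >= 0 for d in second_diffs)
```

Here f^(−2/3)(x) is proportional to x·e^(1/(3x)), whose second derivative is positive, so the check passes with a comfortable margin over floating-point noise.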
6. Proofs
Proof of Theorem 3:
Throughout our proof we will denote inf J(F) and sup J(F) by a and b respectively. Moreover, we assume s* < 0 in the following proof and leave the case of s* > 0 for the Appendix. Note that the case s* = 0 is proved by Dümbgen et al. (2017).
(i) implies (ii):
Suppose . To prove that F is continuous on , we first note that x ↦ Fs*(x) and x ↦ (1 − F(x))s* are convex functions on . By Theorem 10.1 (page 82) of Rockafellar (1970), Fs* and (1 − F)s* are continuous on any open convex set in their effective domains. In particular, Fs* and (1 − F)s* are continuous on (a, ∞) and (−∞, b) respectively. This implies that F is continuous on (a, ∞) and (−∞, b), or equivalently, on (a, ∞) ∪ (−∞, b) = (−∞, ∞) since F is non-degenerate.
To prove that F is differentiable on J(F), note that J(F) = (a, b) since F is continuous on . By Theorem 23.1 (page 213) of Rockafellar (1970), for any x ∈ J(F), the convexity of Fs* on J(F) implies the existence of and . Moreover, by Theorem 24.1 (page 227) in Rockafellar (1970). Since F = (Fs*)1/s* on J(F), the chain rule guarantees the existence of and
Since F is continuous on J(F), then
Hence by noting that and s* < 0.
Similarly, one can prove by the convexity of (1 − F)s* on J(F). Thus for any x ∈ J(F), or equivalently, F is differentiable on J(F). The derivative of F is denoted by f, i.e. f ≡ F′.
To prove (6), note that the convexity of x ↦ Fs* (x) on J(F) implies that, for any x,y ∈ J(F),
or, with x+ = max{x, 0},
Hence,
or, equivalently,
Analogously, the convexity of (1 − F(x))s* on J(F) implies that
or, equivalently,
which yields
The proof of (6) is complete.
(i) implies (iii):
Applying (6) yields that for any x, y ∈ J(F) with x < y,
and
or, equivalently,
and
By defining h ≡ f/F1−s* on J(F), it follows that
and
After summing up the last two inequalities, it follows that
or, equivalently,
Hence h(x) ≥ h(y), or equivalently, h(·) is a monotonically non-increasing function on J(F).
The proof of the monotonicity of is similar and hence is omitted.
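For a concrete instance of the monotonicity in (iii), take the logistic distribution, where f = F(1 − F); then f/F^(1−s*) = F^(s*)(1 − F) is non-increasing and f/(1 − F)^(1−s*) = F(1 − F)^(s*) is non-decreasing for any s* ≤ 0. A numerical check with the illustrative choice s* = −1/2:

```python
import math

F = lambda x: 1.0 / (1.0 + math.exp(-x))   # logistic distribution function
f = lambda x: F(x) * (1.0 - F(x))          # logistic density
s_star = -0.5                              # illustrative s* < 0

xs = [i / 10 - 5.0 for i in range(101)]    # grid on [-5, 5]
h1 = [f(x) / F(x) ** (1 - s_star) for x in xs]          # should be non-increasing
h2 = [f(x) / (1 - F(x)) ** (1 - s_star) for x in xs]    # should be non-decreasing
assert all(a >= b for a, b in zip(h1, h1[1:]))
assert all(a <= b for a, b in zip(h2, h2[1:]))
```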
(iii) implies (iv):
If (iii) holds, it immediately follows that f > 0 on J(F) = (a, b). If not, suppose that f(x0) = 0 for some x0 ∈ J(F). It follows that h(x0) = f(x0)/F1−s*(x0) = 0. Since h is monotonically non-increasing on J(F), h(x) = 0 for all x ∈ [x0, b), or, equivalently, f = 0 on [x0, b). Similarly, the non-decreasing monotonicity of on J(F) implies that f = 0 on (a, x0]. Then f = 0 on J(F), which violates the continuity assumption in (iii) and hence f > 0 on J(F).
To prove f is bounded on J(F), note that the monotonicities of h and imply that for any x, x0 ∈ J(F),
Hence for any x, x0 ∈ J(F).
To prove that f is differentiable on J(F) almost everywhere, we first prove that f is Lipschitz continuous on (c, d) for any c, d ∈ J(F) with c < d.
By the non-increasing monotonicity of h on J(F), the following arguments yield an upper bound of (f(y) − f(x))/(y − x) for any x, y ∈ (c, d):
where the last equality follows from the mean value theorem and z is between x and y.
Since −s* > 0, it follows that F−s* < 1 and hence
for x, y ∈ (c, d).
Similar arguments imply that
Hence
The last display shows that f is Lipschitz continuous on (c, d).
By Proposition 4.1(iii) of Shorack (2017), page 82, f is absolutely continuous on (c, d), and hence f is differentiable on (c, d) almost everywhere.
Since (c, d) is an arbitrary interval in (a, b), the differentiability of f on (c, d) implies the differentiability of f on (a, b) and hence f is differentiable on (a, b) with f′ = F″ almost everywhere.
Since f is differentiable almost everywhere, the non-increasing monotonicity of h on J(F) implies that
or, equivalently,
Straightforward calculation yields that the last display is equivalent to
or,
which is the right hand side of (8).
Similarly, the non-decreasing monotonicity of implies the left hand side of (8).
(iv) implies (i):
Since F is continuous on , it suffices to prove that Fs* is convex on J(F) by Definition 2. Since we assume that F is differentiable on J(F) with derivative f = F′, the convexity of Fs* on J(F) can be proved by the non-decreasing monotonicity of the first derivative of Fs* on J(F). Since f is differentiable almost everywhere on J(F), the non-decreasing monotonicity of (Fs*)′ on J(F) can be proved by the non-negativity of (Fs*)″ on J(F) almost everywhere, which follows from
where f = F′, f′ = F″. The last inequality follows from the right hand side of (8).
Similarly, the convexity of (1 − F(x))s* , or , on J(F) can be proved by the following arguments:
where the last inequality follows from the left part of (8). □
Proof of Proposition 1:
First some background and definitions:
- Let a, b ≥ 0 and θ ∈ (0, 1). The generalized mean Ms(a, b; θ) of order s is defined by
- Let (M, d) be a metric space with Borel σ-field . A measure μ on is called t-concave if for nonempty sets and 0 < θ < 1 we have
where μ* is the inner measure corresponding to μ (which is needed in general in view of examples noted by Erdős and Stone (1970)).
- A non-negative real-valued function h on (M, d) is called s-concave if for x, y ∈ M and 0 < θ < 1 we have
See Chapter 3.3 in Dharmadhikari and Joag-Dev (1988) for more details of the definitions of Ms(a, b; θ), t-concave and s-concave.
- Suppose , k-dimensional Euclidean space with the usual Euclidean metric, suppose that f is an s-concave density function with respect to Lebesgue measure λ on , and consider the probability measure μ on defined by
Then by a theorem of Borell (1975), Brascamp and Lieb (1976) and Rinott (1976), the measure μ is s*-concave where s* = s/(1 + ks) if s ∈ (−1/k, ∞) and s* = 0 if s = 0.
- Here we are in the case k = 1. Thus for s ∈ (−1, ∞) the measure μ is s*-concave: for s ∈ (−1, ∞), , and 0 < θ < 1,
where μ* denotes the inner measure corresponding to μ. (19)
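The map s ↦ s* = s/(1 + ks) and the generalized mean Ms can be checked numerically. For the log-concave case s = 0 (so s* = 0, with M0 the geometric mean) and the standard normal measure on ℝ, the defining inequality on half-lines reduces to log-concavity of Φ. The weighting convention in `gen_mean` (weight θ on the second argument) is an illustrative assumption:

```python
import math

def s_star(s, k=1):
    """Concavity index of the measure induced by an s-concave density on R^k."""
    return 0.0 if s == 0 else s / (1.0 + k * s)

def gen_mean(a, b, theta, s):
    """Generalized mean M_s(a, b; theta); geometric mean at s = 0."""
    if s == 0:
        return a ** (1 - theta) * b ** theta
    if s < 0 and (a == 0 or b == 0):
        return 0.0   # limiting value for negative orders
    return ((1 - theta) * a ** s + theta * b ** s) ** (1.0 / s)

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal d.f.

# s = 0 (log-concave density) yields an s* = 0 (log-concave) measure:
# Phi((1-theta)x + theta*y) >= M_0(Phi(x), Phi(y); theta) on half-lines.
for x in [-2.0, -0.5, 1.0]:
    for y in [-1.0, 0.5, 2.5]:
        for theta in [0.25, 0.5, 0.75]:
            lhs = Phi((1 - theta) * x + theta * y)
            rhs = gen_mean(Phi(x), Phi(y), theta, s=0)
            assert lhs >= rhs
```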
With this preparation we can give our proof of Proposition 1: if A = (−∞, x] and B = (−∞, y] for x, y ∈ J(F), it is easily seen that
Therefore, with the second inequality following from (19),
i.e. F is s*-concave. Similarly, taking A = (x, ∞) and B = (y, ∞) it follows that 1 − F is s*-concave.
Note that this argument contains the case s* = 0. □
Proof of Proposition 2:
By Theorem 3, for any , F is continuous on and differentiable on J(F) with derivative f = F′. Furthermore, f is differentiable almost everywhere on J(F) with derivative f′ = F″ satisfying (8).
For any t* ≤ s*, by noting that 1 − s* ≤ 1 − t* and −(1 − s*) ≥ −(1 − t*), it follows that
almost everywhere on J(F). Hence by Theorem 3. This proves (1).
To prove (2), note that for any , F is continuous on and differentiable on J(F) with derivative f = F′. Furthermore, f is differentiable almost everywhere on J(F) with derivative f′ = F″ satisfying (8), i.e.
for all s* > 0. By taking s* → 0, it follows that
The last display is equivalent to by Theorem 3. This proves that the left hand side of (2) holds. Similarly, one can prove the right hand side of (2); the details are omitted. □
Proof of Corollary 4:
To prove the right part of (11), note that (8) implies that
almost everywhere on J(F), or equivalently,
Replacing ess supx∈J(F) F f′/f2 and ess supx∈J(F)−(1 − F)f′/f2 by and , it follows that
One can prove the left two inequalities of (11) by the following arguments:
where the last inequality holds since u ∧ (1 − u) ≥ u(1 − u) for 0 ≤ u ≤ 1. □
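The elementary bound u ∧ (1 − u) ≥ u(1 − u) used in the last step holds because the complementary factor u ∨ (1 − u) is at most 1; a quick numerical confirmation on a grid:

```python
# Check min(u, 1-u) >= u(1-u) on a fine grid of [0, 1].
us = [i / 1000 for i in range(1001)]
assert all(min(u, 1 - u) >= u * (1 - u) for u in us)
```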
Proof of Corollary 5:
Note that for s* < 0 and y > −1, we have (1 + y)s* ≥ 1 + s*y. Replacing y by −F(x), where x ∈ J(F), it follows that
or, by rearranging,
where FU is a convex function on J(F) if . This proves the right hand side of (12) for s* < 0. Similarly, replacing y by −(1 − F(x)), where x ∈ J(F), by rearranging terms, it follows that
which proves the left hand side of (12) for s* < 0.
Similarly, for 1 ≥ s* > 0 and y > −1, we have (1 + y)s* ≤ 1 + s*y. Replacing y by −F(x), where x ∈ J(F), it follows that
or, by rearranging,
where FU is a convex function on J(F) if . This proves the right hand side of (12) for s* > 0.
Similarly, replacing y by −(1 − F(x)), where x ∈ J(F), by rearranging terms, it follows that
which proves the left hand side of (12) for s* > 0. □
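Both directions of the Bernoulli-type inequality driving this proof, (1 + y)^(s*) ≥ 1 + s*y for s* < 0 and (1 + y)^(s*) ≤ 1 + s*y for 0 < s* ≤ 1 (with y > −1), follow from convexity, respectively concavity, of y ↦ (1 + y)^(s*). A grid check with the illustrative exponents s* = ±1/2:

```python
# Bernoulli-type inequality on a grid of y in (-1, 2].
ys = [-0.99 + i / 100 for i in range(300)]
for y in ys:
    assert (1 + y) ** -0.5 >= 1 - 0.5 * y   # s* = -1/2 < 0: convex case
    assert (1 + y) ** 0.5 <= 1 + 0.5 * y    # s* = 1/2 in (0, 1]: concave case
```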
Proof of Lemma 6:
If no bi-s*-concave distribution function fits between Ln and Un, then and , and the assertions in both (i) and (ii) are trivial. In the following proof, we let G be a bi-s*-concave distribution function such that Ln ≤ G ≤ Un.
(i) It suffices to prove that for any x ∈ J(G) the density function g = G′ satisfies g(x) ≤ max{γ1, γ2}, because this is equivalent to Lipschitz-continuity of G with the latter constant, and this property carries over to the pointwise infimum and supremum .
To prove g(x) ≤ max{γ1, γ2}, note that since g/G1−s* is monotonically non-increasing on J(G) (see Theorem 3(iii)), it follows that for x ≥ b
The last inequality follows from noting that x ↦ (1/s*)xs* is a monotonically non-decreasing function for all s* ≠ 0, G(b) ≤ Un(b) ≤ v and G(a) ≥ Ln(a) ≥ u. Hence
Similarly, by noting that g/(1 − G)1−s* is monotonically non-decreasing on J(G) (see Theorem 3(iii)), it follows that for x ≤ a
The last inequality follows from noting that x ↦ −(1/s*)(1 − x)s* is a monotonically non-decreasing function for all s* ≠ 0, G(b) ≤ v and G(a) ≥ u. Hence
For a < x < b, we analogously get the following two inequalities
and
Multiplying the former inequality by (x − a), the latter by (b − x), and adding yields
where
Since
it follows that h(y) is convex on (0, 1) and hence
Note that
and
Hence g(x) ≤ max{γ1, γ2} for a < x < b.
(ii) By Theorem 3(ii), it follows that for x ≤ a
By Theorem 3(iii), the non-increasing monotonicity of g/G1−s* implies that
The last inequality follows from noting that G(a) ≤ Un(a) ≤ u and G(b) ≥ Ln(b) ≥ v. Since x − a ≤ 0, it follows that
The last inequality follows from noting that G(a) ≤ u.
On the other hand, by Theorem 3(ii), it follows that for x ≥ b
The last inequality follows from noting that 1 − G(b) ≤ 1 − v. By Theorem 3(iii), the non-decreasing monotonicity of g/(1 − G)1−s* implies that
The last inequality follows from noting that G(a) ≤ Un(a) ≤ u and G(b) ≥ Ln(b) ≥ v. Since x − b ≥ 0, it follows that
Proof of Theorem 7:
The following proof is analogous to the proof of Theorem 3 in Dümbgen et al. (2017), in which they proved the result in the case s* = 0. In the following proof we assume that s* ≠ 0.
(i) Suppose s* > 0. Since F is not bi-s*-concave, it follows that Fs* or (1 − F)s* is not concave. Without loss of generality, we assume that Fs* is not concave and hence there exist real numbers x0 < x1 < x2 such that Fs*(x1) < (1 − λ)Fs* (x0) + λFs* (x2), where λ ≡ (x1 − x0)/(x2 − x0) ∈ (0, 1). By the consistency of Ln and Un, it follows that, with probability tending to one, and hence
for any G such that Ln ≤ G ≤ Un. Therefore, there are no bi-s*-concave distribution functions fitting between Ln and Un and hence and with probability tending to one.
The proof of the case s* < 0 is similar and hence is omitted.
(ii) Suppose . Note that since (Ln, Un) is a (1 − α) confidence band for F, it follows that .
If is empty, it follows that and and hence the assertions are trivial. In the following proof, we assume that is not empty.
To prove (13), we first prove that ‖Ln − F‖∞ →p 0 and ‖Un − F‖∞ →p 0. By the continuity of F, for any integer m ≥ 2, there exist real numbers such that F(xi) = i/m, i = 1, …, m − 1. Furthermore, define x0 = −∞ and xm = ∞.
By the non-decreasing monotonicity of Ln and F, it follows that for x ∈ [xi−1, xi]
and
Hence
for x ∈ [xi−1,xi]. Note that
it follows that
and hence pointwise convergence implies uniform convergence. An analogous proof shows that ‖Un − F‖∞ →p 0 and is omitted.
Combining ‖Ln − F‖∞ →p 0 and ‖Un − F‖∞ →p 0 implies that
To prove (14) in the case that hG = (Gs*)′, it suffices to prove that
| (20) |
Note that hG/s* = G′/G1−s*. Since K is a compact interval in J(F) and hF/s* = f/F1−s* is continuous and non-increasing on J(F), for any fixed ϵ > 0 there exist points a0 < a1 < ⋯ < am < am+1 in J(F) such that K ⊂ [a1, am] and
For with Ln ≤ G ≤ Un, for any x ∈ K it follows from the monotonicity of hF/s* and hG/s* that
Analogously,
Since ϵ > 0 is arbitrarily small, this shows that (20) holds.
The proof of (14) in the case that hG = ((1 − G)s*)′ is similar and hence is omitted.
Since G′ = G1−s* (Gs* / s*)′, it follows from (20) that (14) holds in the case that hG = G′.
Finally, let x1 < sup J(F) and b1 < f(x1)/F1−s*(x1). As in the proof of Lemma 6(ii) an analogous argument implies that for any , ,
for all x ≤ x′ ≤ x1.
Note that by the consistency of Ln and Un and letting , it follows that.
Hence with probability tending to one,
for all x ≤ x′ ≤ x1. The proof of (16) is similar and hence is omitted. □
Proof of Remark 1:
(i) By Theorem 3(ii), if s* > 0 and inf J(F) = −∞, it follows that for arbitrary x ∈ J(F),
for small enough y such that
This violates the assumption that inf J(F) = −∞ and hence inf J(F) > −∞. The finiteness of sup J(F) can be proved similarly and hence is omitted.
(ii) We first note that (9) holds automatically if inf J(F) > −∞ and sup J(F) < ∞.
In the following proof, we focus on the case that inf J(F) = −∞ and sup J(F) < ∞. To prove (9), it suffices to show that ∫ |x|tdF(x) is finite for t ∈ (0, (−1)/s*). Note that
Since sup J(F) is finite, the first term of the last display is finite and hence it suffices to prove that tat−1P(X < −a) is integrable for t < (−1)/s*.
It follows from Theorem 3(ii) that for any a large enough and x ∈ J(F),
Thus tat−1P(X < −a) is integrable for t < (−1)/s*, since
for a large enough and at+1/s*−1 is integrable for t < (−1)/s*.
For other cases, the proof is similar and hence is omitted. □
Proof of Corollary 8:
Suppose that x0 is a point in J(F). Notice that for any ,
and hence by Fubini’s theorem, it follows that
| (21) |
provided that
To prove the last display, note that for any b1 ∈ (0, T1(F)) and b2 ∈ (0, T2(F)), there exist points x1, x2 ∈ J(F) with x1 ≤ x0 ≤ x2 and
Then it follows from Theorem 7(ii) that with probability tending to one,
and
Hence for any c > max{|x1|, |x2|}, it follows that
Since |ϕ′(x)| ≤ a|x|k−1, it follows that the last display is no larger than
which is finite by noting that k − 1 + 1/s* < −1. Analogously, one can prove that for c > max{|x1|, |x2|},
Since ϕ′ is continuous on , it follows that for any c > max{|x1|, |x2|},
and hence
By (21), it follows that
which is not larger than
Note that the last two terms go to zero as c goes to infinity by their integrability and hence
Proof of Theorem 9: It follows from the proof of Corollary 8 that
and hence
It suffices to bound |G − F | on , where G is between and .
It follows from and Condition (*) that on the interval ,
To bound , it follows from Theorem 3.7.1, page 141, Shorack and Wellner (2009) that
by verifying that q(t) = (t(1 − t))γ with 0 ≤ γ < 1/2 is monotonically increasing on [0, 1/2], symmetric about 1/2, and , where is a Brownian bridge on [0, 1].
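The properties of the weight function q(t) = (t(1 − t))^γ invoked here (non-decreasing on [0, 1/2], symmetric about 1/2) are easy to confirm numerically, e.g. with the illustrative choice γ = 0.3:

```python
gamma = 0.3                              # any gamma in [0, 1/2)
q = lambda t: (t * (1 - t)) ** gamma     # Chibisov-O'Reilly-type weight

ts = [i / 1000 for i in range(501)]      # t in [0, 1/2]
vals = [q(t) for t in ts]
assert all(a <= b for a, b in zip(vals, vals[1:]))    # non-decreasing on [0, 1/2]
assert all(abs(q(t) - q(1 - t)) < 1e-12 for t in ts)  # symmetric about 1/2
```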
Hence for any fixed ϵ ∈ (0, 1) there exists a constant κϵ > 0 such that with probability at least 1 − ϵ,
on . Thus, it follows that on the interval ,
To bound by F(1 − F), note that
For a constant λϵ > 0 to be specified later, it follows from λϵn−1/(2−2γ) ≤ F ≤ 1 − λϵn−1/(2−2γ) and γ ∈ [0, 1/2) that
and
Hence
Thus, on the interval
where .
The following arguments show that for a large enough λϵ, the interval {λϵn−1/(2−2γ) ≤ F ≤ 1 − λϵn−1/(2−2γ)} is a subset of .
To see this, note that
and analogously,
it follows that by choosing a λϵ large enough such that , the interval {λϵn−1/(2−2γ) ≤ F ≤ 1 − λϵn−1/(2−2γ)} is a subset of and hence on the interval
Define xn1 and xn2 such that F(xn1) = λϵn−1/(2−2γ) and F(xn2) = 1 − λϵn−1/(2−2γ). Analogously, one can prove that F − G ≤ νϵn−1/2(F(1 − F))γ on [xn1, xn2] and hence
| (22) |
on [xn1,xn2]. Thus for G between and ,
From here, we can see that if F is bi-s*-concave with s* > 0, it follows from Remark 1(i) that J(F) is bounded and hence
as long as ϕ′ is bounded on J(F).
A similar argument works if F is bi-s*-concave with s* < 0 and J(F) is bounded. In the following proof, we return to our case that F is bi-s*-concave with s* < 0 and, without loss of generality, we assume J(F) = (−∞, ∞).
As in the proof of Corollary 8, for x0 ∈ J(F), b1 ∈ (0, T1(F)) and b2 ∈ (0, T2(F)), there exist points x1, x2 ∈ J(F) with x1 < x0 < x2 such that f(x1)/F1−s*(x1) > b1 and f(x2)/(1 − F(x2))1−s* > b2. Then it follows from Theorem 7(ii) that with asymptotic probability one,
| (23) |
and
Similarly, it follows from Theorem 3(ii) that
| (24) |
and
For large enough n, one can have [x1, x2] ⊂ [xn1, xn2] and hence
where
Note that . For the other terms, first note that F(xn1) = λϵn−1/(2−2γ) and hence it follows from (24) that
Analogously, one can prove that
Thus, it follows from (24) and the upper bound of |ϕ′| that
Analogously, one could show that
To bound , note that for x ≤ xn1, it follows from an analogous proof of (24) that
Analogously, it follows that for x ≤ xn1,
Note that it follows from (22) that
and hence for x ≤ xn1,
Thus,
Analogously, one could show that
Hence
Supplementary Material
Table 1:
Summary of Examples 1-8
| Name | Example | density f | d.f. F | s | s* = s/(1 + s) | |
|---|---|---|---|---|---|---|
| student-t | 1 | fr, r > 0 | Fr | − 1/(1 + r) | − 1/r | 1 + (1/r) |
| Fa,b | 2 | fa,b, a, b > 0 | Fa,b | − 1/(1 + a/2) | −2/a | 1 + 2/a |
| Pareto(a, b) | 3 | fa,b, a, b > 0 | Fa,b | − 1/(1 + a) | − 1/a | 1 + 1/a |
| Symmetric Beta | 4 | fr, r > 0 | Fr | 2/r | 2/(r + 2) | 1/(1 + 2/r) = r/(r + 2) |
| Expo family Tilted U(0, 1) | 5 | Ft | 0 | e −|t| | 1 − e−|t| | |
| Mixture, N(δ, 1), N(−δ, 1) | 6 | fδ | Fδ | not s-concave for δ > 1 | 0 for 0 < δ < 1.34 | 1 0 < δ < 1.34 |
| Mixture, T(δ, 1), T(−δ, 1) | 7 | fδ | Fδ | not s-concave δ > .6 | bi-s*-concave, some s* 0 < δ < ∞ | 2 δ small |
| Lévy α = 1/2 | 8 | fa | Fa | −2/3 | −2 | 3 |
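The s and s* = s/(1 + s) columns of Table 1 can be cross-checked with exact rational arithmetic; the parameter values below (r = 3, a = 2) are arbitrary illustrations:

```python
from fractions import Fraction

def to_s_star(s):
    """s* = s/(1 + s): map from density to d.f. concavity index (k = 1)."""
    return s / (1 + s)

r, a = Fraction(3), Fraction(2)            # illustrative parameter choices
# student-t_r: s = -1/(1+r)  ->  s* = -1/r
assert to_s_star(Fraction(-1) / (1 + r)) == -1 / r
# Pareto(a, b): s = -1/(1+a)  ->  s* = -1/a
assert to_s_star(Fraction(-1) / (1 + a)) == -1 / a
# symmetric Beta_r: s = 2/r  ->  s* = 2/(r+2)
assert to_s_star(2 / r) == 2 / (r + 2)
# Levy (alpha = 1/2): s = -2/3  ->  s* = -2
assert to_s_star(Fraction(-2, 3)) == -2
```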
Highlights:
• New classes of shape-constrained distributions are defined and studied.
• New confidence bands which exploit the shape constraints are defined and shown to improve on existing bands if the assumed shape constraint holds.
• The new classes of shape-constrained distribution functions, which we call bi-s*-concave, play an important role in the theory of quantile processes.
Acknowledgements:
We owe thanks to Lutz Dümbgen for several helpful discussions. We also thank two referees for their positive comments and suggestions.
The research of J. A. Wellner was partially supported by NSF grant DMS-1566514, NIAID grant 2R01 AI291968-04, a Simons Fellowship via the Newton Institute (INI-program STS 2018), Cambridge University, and the Saw Swee Hock Visiting Professorship of Statistics at the National University of Singapore (in 2019).
7. Appendix 1
Proof of the equivalence between Definition 1 and Definition 2.
Definition 1 implies Definition 2:
For any , Theorem 3 shows that F is a continuous function on . By noticing that J(F) ⊂ (inf J(F), ∞) and J(F) ⊂ (−∞, sup J(F)), the convexity or concavity of Fs* or (1 − F)s* on , (inf J(F), ∞) and (−∞, sup J(F)) implies the convexity or concavity of Fs* or (1 − F)s* on J(F). Hence, Definition 1 implies Definition 2.
Definition 2 implies Definition 1:
Suppose s* < 0. By Definition 2, for any , Fs* and (1 − F)s* are convex on J(F). Moreover, F is continuous on and hence J(F) = (a, b) where a ≡ inf J(F), b ≡ sup J(F).
To prove that Fs* is convex on , by continuity of F it suffices to prove that Fs* is mid-point convex: that is,
| (25) |
for any . Without loss of generality, we assume that x < y.
Note that if a = −∞ and b = ∞, then there is nothing to prove. Without loss of generality, we assume that a > −∞ and b < ∞.
Note that if x ∈ (−∞, a], then Fs*(x) = ∞ and hence (25) holds automatically. If x ∈ (a, b) and y ∈ (a, b), (25) holds by the convexity of Fs* on J(F). Moreover, by noticing the continuity of Fs* at b, (25) holds for any x ∈ (a, b) and y ∈ (a, b]. Since Fs*(y) = Fs*(b) = 1 for y ≥ b, (25) holds for any x ∈ (a, b) and y ∈ [b, ∞). If x, y ∈ [b, ∞), (25) holds automatically since Fs* (x) = Fs*(y) = 1.
The proof of the convexity of (1 − F)s* on is similar and hence is omitted. For the cases that s* ≥ 0, the proof is similar and hence is omitted. □
Proof of Theorem 3 (0 < s* ≤ 1):
Recall that a ≡ inf J(F) and b ≡ sup J(F). Suppose 1 ≥ s* > 0.
(i) implies (ii):
Suppose . To prove that F is continuous on , we first note that x ↦ Fs*(x) and x ↦ (1 − F(x))s* are concave functions on (a, ∞) and (−∞, b) respectively. By Theorem 10.1 (page 82) in Rockafellar (1970), Fs* and (1 − F(x))s* are continuous on any open convex set in their effective domains. In particular, Fs* and (1 − F)s* are continuous on (a, ∞) and (−∞, b) respectively. This yields that F is continuous on (a, ∞) and (−∞, b), or equivalently, on (a, ∞) ∪ (−∞, b) = (−∞, ∞) since F is non-degenerate.
To prove that F is differentiable on J(F), note that J(F) = (a, b) since F is continuous on . By Theorem 23.1 (page 213) in Rockafellar (1970), for any x ∈ J(F), the concavity of Fs* on J(F) implies the existence of and . Moreover, by Theorem 24.1 (page 227) in Rockafellar (1970). Since F = (Fs*)1/s* on J(F) the chain rule guarantees the existence of and
Since F is continuous on J(F), then
Hence by .
Similarly, one can prove by the concavity of (1 − F)s* on J(F). Thus for any x ∈ J(F), or equivalently, F is differentiable on J(F). The derivative of F is denoted by f, i.e. f ≡ F′.
To prove (6), note that the concavity of x ↦ Fs* (x) on J(F) implies that, for any x, y ∈ J(F),
or, with x+ = max{x, 0},
Hence
or, equivalently,
Analogously, the concavity of (1 − F(x))s* on J(F) implies that for any x, y ∈ J(F)
or, equivalently,
which yields
The proof of (6) is complete.
(ii) implies (iii):
Applying (6) yields that for any x, y ∈ J(F) with x < y,
and
or, equivalently,
and
By defining h ≡ f/F1−s* on J(F), it follows that
and
After summing up the last two inequalities, it follows that
or, equivalently,
Hence h(x) ≥ h(y), or equivalently, h(·) is a monotonically non-increasing function on J(F).
The proof of the monotonicity of is similar and hence is omitted.
(iii) implies (iv):
If (iii) holds, it immediately follows that f > 0 on J(F) = (a, b). If not, suppose that f(x0) = 0 for some x0 ∈ J(F). It follows that h(x0) = f(x0)/F1−s*(x0) = 0. Since h is monotonically non-increasing on J(F), h(x) = 0 for all x ∈ [x0, b), or, equivalently, f = 0 on [x0, b). Similarly, the non-decreasing monotonicity of on J(F) implies that f = 0 on (a, x0]. Then f = 0 on J(F), which violates the continuity assumption in (iii) and hence f > 0 on J(F).
To prove f is bounded on J(F), note that the monotonicities of h and imply that for any x, x0 ∈ J(F),
Hence for any x, x0 ∈ J(F).
To prove that f is differentiable on J(F) almost everywhere, we first prove that f is Lipschitz continuous on (c, d) for any c, d ∈ J(F) with c < d.
By noticing the non-increasing monotonicity of h on J(F), the following arguments yield an upper bound of (f(y) − f(x))/(y − x) for x, y ∈ (c, d):
where the last equality follows from the mean value theorem and z is between x and y.
Since − s* < 0, it follows that F−s*(z) < F−s* (c) and hence
for x, y ∈ (c, d).
Similar arguments imply that
Hence
The last display shows that f is Lipschitz continuous on (c, d).
By Proposition 4.1(iii) of Shorack (2017), page 82, f is absolutely continuous on (c, d), and hence f is differentiable on (c, d) almost everywhere.
Since (c, d) is an arbitrary interval in (a, b), the differentiability of f on (c, d) implies the differentiability of f on (a, b) and hence f is differentiable on (a, b) with f′ = F″ almost everywhere.
Since f is differentiable almost everywhere, the non-increasing monotonicity of h on J(F) implies that
or, equivalently,
Straightforward calculation yields that the last display is equivalent to
or,
which is the right hand side of (8).
Similarly, the non-decreasing monotonicity of implies the left hand side of (8).
(iv) implies (i):
Since F is continuous on , it suffices to prove that Fs* is concave on J(F) by Definition 2. Since we assume that F is differentiable on J(F) with derivative f = F′, the concavity of Fs* on J(F) can be proved by the non-increasing monotonicity of the first derivative of Fs* on J(F). Since f is differentiable almost everywhere on J(F), the non-increasing monotonicity of (Fs*)′ on J(F) can be proved by the non-positivity of (Fs*)″ on J(F) almost everywhere, which follows from
where f = F′, f′ = F″. The last inequality follows from the right hand side of (8).
Similarly, the concavity of (1 − F(x))s*, or , on J(F) can be proved by the following arguments:
where the last inequality follows from the left part of (8). □
References
- Barrio ED, Giné E and Utzet F (2005). Asymptotics for L2 functionals of the empirical quantile process, with applications to tests of fit based on weighted Wasserstein distances. Bernoulli 11 131–189.
- Bobkov S and Ledoux M (2019). One-dimensional empirical measures, order statistics, and Kantorovich transport distances. Mem. Amer. Math. Soc. 261.
- Borell C (1975). Convex set functions in d-space. Periodica Mathematica Hungarica 6(2) 111–136.
- Brascamp HJ and Lieb EH (1976). On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. Journal of Functional Analysis 22 366–389.
- Csörgő M and Révész P (1978). Strong approximations of the quantile process. Ann. Statist. 6 882–894.
- Dharmadhikari S and Joag-Dev K (1988). Unimodality, Convexity, and Applications. Academic Press.
- Dümbgen L, Kolesnyk P and Wilke RA (2017). Bi-log-concave distribution functions. Journal of Statistical Planning and Inference 184 1–17.
- Dümbgen L and Wellner JA (2014). Confidence bands for distribution functions: A new look at the law of the iterated logarithm. Tech. rep., Department of Statistics, University of Washington.
- Durrett R (2019). Probability: Theory and Examples, vol. 49 of Cambridge Series in Statistical and Probabilistic Mathematics. Fifth edition. Cambridge University Press, Cambridge.
- Erdős P and Stone A (1970). On the sum of two Borel sets. Proceedings of the American Mathematical Society 25 304–306.
- Gardner RJ (2002). The Brunn-Minkowski inequality. Bull. Amer. Math. Soc. (N.S.) 39 355–405.
- Grenander U (1956). On the theory of mortality measurement. II. Skand. Aktuarietidskr. 39 125–153 (1957).
- Hall P (1984). On unimodality and rates of convergence for stable laws. J. London Math. Soc. (2) 30 371–384.
- Kleiber C and Kotz S (2003). Statistical Size Distributions in Economics and Actuarial Sciences. John Wiley & Sons.
- Massart P (1990). The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Ann. Probab. 18 1269–1283.
- Owen AB (1995). Nonparametric likelihood confidence bands for a distribution function. Journal of the American Statistical Association 90 516–521.
- Rinott Y (1976). On convexity of measures. Ann. Probab. 4 1020–1026.
- Robertson T, Wright FT and Dykstra RL (1988). Order Restricted Statistical Inference. Wiley & Sons.
- Rockafellar RT (1970). Convex Analysis. Princeton University Press.
- Samworth RJ (2018). Recent progress in log-concave density estimation. Statist. Sci. 33 493–509.
- Samworth RJ and Sen B (2018). Editorial: special issue on “Nonparametric inference under shape constraints”. Statist. Sci. 33 469–472.
- Saumard A (2019). Bi-log-concavity: some properties and some remarks towards a multi-dimensional extension. Electron. Commun. Probab. 24, Paper No. 61, 8 pp.
- Shorack GR (2017). Probability for Statisticians. Springer.
- Shorack GR and Wellner JA (2009). Empirical Processes with Applications to Statistics, vol. 59 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA. Reprint of the 1986 original.
- van Eeden C (1956). Maximum likelihood estimation of ordered probabilities. Statist. Afdeling S 188 (VP 5), Math. Centrum Amsterdam.
- Wooldridge JM (2000). Instructional Stata datasets for econometrics. Boston College Department of Economics.