Author manuscript; available in PMC: 2022 Dec 1.
Published in final edited form as: J Stat Plan Inference. 2021 Mar 13;215:127–157. doi: 10.1016/j.jspi.2021.03.001

Bi-s*-Concave Distributions

Nilanjana Laha a, Zhen Miao b, Jon A Wellner c,*
PMCID: PMC8486153  NIHMSID: NIHMS1683197  PMID: 34602723

Abstract

We introduce new shape-constrained classes of distribution functions on R, the bi-s*-concave classes. In parallel to results of Dümbgen et al. (2017) for what they called the class of bi-log-concave distribution functions, we show that every s-concave density f has a bi-s*-concave distribution function F for s* ≤ s/(s + 1).

Confidence bands building on existing nonparametric confidence bands, but accounting for the shape constraint of bi-s*-concavity, are also considered. The new bands extend those developed by Dümbgen et al. (2017) for the constraint of bi-log-concavity. We also make connections between bi-s*-concavity and finiteness of the Csörgő - Révész constant of F which plays an important role in the theory of quantile processes.

Keywords: log-concave, bi-log-concave, shape constraint, s-concave, quantile process, Csörgő - Révész condition, hazard function

1. Introduction

Statistical methods based on shape constraints have been developing rapidly during the past 15 - 20 years. From the classical univariate methods based on monotonicity going back to the work of Grenander (1956) and van Eeden (1956) in the 1950’s and 1960’s, research has progressed to consideration of convexity type constraints in a variety of problems including estimation of density functions, regression functions, and other “nonparametric” functions such as hazard (rate) functions. See Samworth and Sen (2018) for a summary and overview of some of this recent activity.

One very appealing shape constraint is log-concavity: a (density) function f : ℝ^d → [0, ∞) is log-concave if log f is concave (with log 0 = −∞). See Samworth (2018) for a recent review of the properties of log-concave densities and their relevance for statistical applications. While much of the current literature has focused on point estimation, our main focus here will be on inference for one-dimensional distribution functions, and especially on (honest, exact) confidence bands for distribution functions which take advantage of shape constraints.

To this end, Dümbgen et al. (2017) introduced the class of bi-log-concave distribution functions defined as follows: a distribution function F on R is bi-log-concave if both F and 1 – F are log-concave. They provided several different equivalent characterizations of this property, and noted (the previously known fact) that if f is a log-concave density, then the corresponding distribution function F and survival function 1 – F are both log-concave. But the converse is false: there are many bi-log-concave distribution functions F with density f which fail to be log-concave; see Section 2 below for an explicit example. Dümbgen et al. (2017) also showed how to construct confidence bands which exploit the bi-log-concave shape constraint and thereby obtain narrower bands, especially in the tails, with correct coverage when the bi-log-concave assumption holds.

However, a difficulty with the assumption of bi-log-concavity is that the corresponding density functions inherit the requirement of exponentially decaying tails of the class of log-concave densities, and this rules out distribution functions F with tails decaying more slowly than exponentially. Here we introduce new shape-constrained families of distribution functions F, which we call the bi-s*-concave distributions, with tails possibly decaying more slowly (or more rapidly) than exponentially. As the name indicates, these families involve a parameter s* ∈ (−∞, 1] which allows heavier than exponential tails when s* < 0, lighter than exponential tails when s* > 0, and which correspond to exactly the bi-log-concave class introduced by Dümbgen et al. (2017) when s* = 0.

Here is an outline of the rest of the paper. In Section 2 we give careful definitions of the new classes of bi-s*-concave distributions. We also present several helpful examples and discuss some basic properties of the new classes and their connections to the classes of s-concave densities studied by Borell (1975), Brascamp and Lieb (1976), and Rinott (1976). (See also Dharmadhikari and Joag-Dev (1988), and Gardner (2002).) Section 3 contains the main theoretical results of the paper. The connection between the bi-s*-concave class and a key condition in the theory of quantile processes, the Csörgő - Révész condition, is discussed in Corollary 4. Finally, we give two tail bounds for distribution functions F ∈ P_{s*}; see Corollary 5.

In Section 4 we first introduce the new confidence bands for a distribution function F ∈ P_{s*}, assuming s* is known. We also discuss some of their theoretical properties: the consistency of the confidence bands is discussed in Theorem 7, and Theorem 9 provides a rate of convergence for linear functionals of bi-s*-concave distribution functions contained in the bands. This extends Theorem 5 of Dümbgen et al. (2017). We then briefly discuss the algorithms used to compute the new bands, and illustrate the new bands with real and artificial data. Section 5 gives a brief summary and statements of further problems. An especially important remaining problem concerns the construction of confidence bands when s* is unknown. The proofs of all the results in Sections 2, 3, and 4 are given in Sections 6 and 7.

We conclude this section with some notation which will be used throughout the rest of the paper. The supremum norm of a function h : ℝ → ℝ is denoted by ‖h‖_∞ ≡ sup_{x∈ℝ} |h(x)|, and for K ⊂ ℝ we write ‖h‖_{K,∞} ≡ sup_{x∈K} |h(x)|. For a function x ↦ f(x),

$$f'_+(x) \equiv \lim_{\lambda \downarrow 0} \frac{f(x+\lambda) - f(x)}{\lambda}, \qquad f'_-(x) \equiv \lim_{\lambda \uparrow 0} \frac{f(x+\lambda) - f(x)}{\lambda},$$

$$f(x+) \equiv \lim_{y \downarrow x} f(y), \qquad f(x-) \equiv \lim_{y \uparrow x} f(y),$$

assuming that the indicated limits exist. In general, we use F and f to denote a distribution function and the corresponding density function with respect to Lebesgue measure, and we set J(F) ≡ {x ∈ ℝ : 0 < F(x) < 1}.

2. Definitions, Examples, and First Properties

As we discussed above, for distribution functions F on R, Dümbgen et al. (2017) introduced a shape constraint they called bi-log-concavity by requiring that both F and 1 – F be log-concave.

In this paper, we generalize the bi-log-concave distribution functions by introducing and studying bi-s*-concave distributions defined as follows.

Definition 1.

For −∞ < s* < 0, a distribution function F is bi-s*-concave if both x ↦ F^{s*}(x) and x ↦ (1 − F(x))^{s*} are convex functions from ℝ to [0, ∞].

For s* = 0, a distribution function F is bi-s*-concave (or bi-log-concave) if both x ↦ log(F(x)) and x ↦ log (1 − F(x)) are concave functions from R to [−∞, 0].

For 0 < s* ≤ 1, a distribution function F is bi-s*-concave if x ↦ F^{s*}(x) is concave from (inf J(F), ∞) to [0, 1] and x ↦ (1 − F(x))^{s*} is concave from (−∞, sup J(F)) to [0, 1].

The class of bi-s*-concave distribution functions is denoted by P_{s*}, i.e.

$$\mathcal{P}_{s^*} \ \equiv\ \{F : F \text{ is bi-}s^*\text{-concave}\}.$$

Definition 2. (Alternative to Definition 1.)

A distribution function F is bi-s*-concave if it is continuous on R and satisfies the following properties on J(F):

  • For −∞ < s* < 0, both x ↦ F^{s*}(x) and x ↦ (1 − F(x))^{s*} are convex functions on J(F).

  • For s* = 0, both x ↦ log(F(x)) and x ↦ log (1 − F(x)) are concave functions on J(F).

  • For 0 < s* ≤ 1, both x ↦ F^{s*}(x) and x ↦ (1 − F(x))^{s*} are concave functions on J(F).

See the Appendix, Section 7, for a proof of the equivalence of Definitions 1 and 2. The main benefit of the second definition is that it is immediately clear that any bi-s*-concave distribution function F is continuous, since continuity of F is explicitly required in Definition 2. Moreover, to verify F ∈ P_{s*} we only need to verify the convexity or concavity of F^{s*} and (1 − F)^{s*} on the same interval J(F).
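As a sanity check of Definition 2, the following short sketch (ours, not part of the original paper) verifies numerically that the standard Cauchy distribution function, with s* = −1, has F^{s*} and (1 − F)^{s*} convex; the grid, tolerance, and helper names are our own choices.

```python
import math

# Numerical check of Definition 2 for the standard Cauchy d.f. with s* = -1:
# both F^{s*} = 1/F and (1-F)^{s*} = 1/(1-F) should be convex on J(F) = R.

def F(x):
    """Standard Cauchy distribution function."""
    return 0.5 + math.atan(x) / math.pi

def is_convex_on_grid(g, xs, tol=1e-9):
    """Convexity via non-negative second differences on an equispaced grid."""
    vals = [g(x) for x in xs]
    return all(vals[i - 1] - 2 * vals[i] + vals[i + 1] >= -tol
               for i in range(1, len(vals) - 1))

s_star = -1.0
xs = [i / 100 for i in range(-500, 501)]          # equispaced grid on [-5, 5]
ok_F = is_convex_on_grid(lambda x: F(x) ** s_star, xs)
ok_1mF = is_convex_on_grid(lambda x: (1 - F(x)) ** s_star, xs)
assert ok_F and ok_1mF
```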

Recall that a density function f is s-concave if f^s is convex for s < 0, f^s is concave for s > 0, and log f is concave for s = 0. Two basic properties linking s-concave densities and bi-s*-concave distribution functions are given in the following two propositions. Proposition 1 generalizes the case s = 0 as noted above, while Proposition 2 generalizes the corresponding nestedness property of the classes of s-concave densities; see e.g. Dharmadhikari and Joag-Dev (1988), page 86, and Borell (1975), page 111.

Proposition 1. Suppose a density function f is s-concave with s ∈ (−1, ∞). Then the corresponding distribution function F is bi-s*-concave for all s* ≤ s/(1 + s).

Proposition 2. The bi-s*-concave classes are nested in the following sense:

$$\mathcal{P}_{s^*} \subset \mathcal{P}_{t^*} \quad \text{whenever } t^* \le s^* \le 1. \qquad (1)$$

Moreover, the bi-s*-concave classes are continuous at s* = 0 in the following sense:

$$\bigcup_{s^*>0} \mathcal{P}_{s^*} \ \subset\ \mathcal{P}_0 \ =\ \bigcap_{s^*<0} \mathcal{P}_{s^*}. \qquad (2)$$

In view of the nesting property (1), for any F which is bi-s*-concave for some s* we define

$$s_0^*(F) \ \equiv\ \sup\{s^* : F \text{ is bi-}s^*\text{-concave}\}.$$

Similarly, if f is s-concave for some s we define

$$s_0(f) \ \equiv\ \sup\{s : f \text{ is } s\text{-concave}\}.$$

We often drop the subscript 0 if the meaning is clear. For other basic properties of s-concave densities and bi-s*-concave distribution functions, including results concerning closure under convolution, see Borell (1975), Dharmadhikari and Joag-Dev (1988), and Saumard (2019).

Now we introduce two important parameters, one of which will appear in connection with our characterization of the class of bi-s*-concave distribution functions in the next section and in our examples below. The Csörgő - Révész constant of a bi-log-concave distribution function F, denoted by γ̃(F), is given by

$$\tilde{\gamma}(F) \ \equiv\ \operatorname*{ess\,sup}_{x \in J(F)} \frac{F(x)\,(1 - F(x))\,|f'(x)|}{f^2(x)}, \qquad (3)$$

provided that F is differentiable on J(F) ≡ {x ∈ ℝ : 0 < F(x) < 1} with derivative f ≡ F′ and f is differentiable almost everywhere on J(F) with derivative f′ = F″. Here the essential supremum is with respect to Lebesgue measure. Alternatively (and suited for our characterization Theorem 3),

$$\gamma(F) \ \equiv\ \operatorname*{ess\,sup}_{x \in J(F)} \frac{\{F(x) \wedge (1 - F(x))\}\,|f'(x)|}{f^2(x)}. \qquad (4)$$

Note that since u ∧ (1 − u) ≤ 2u(1 − u) ≤ 2{u ∧ (1 − u)}, it follows that 2^{−1}γ(F) ≤ γ̃(F) ≤ γ(F), and hence finiteness of γ(F) is equivalent to finiteness of γ̃(F). The Csörgő - Révész constant γ̃(F) arises in the study of quantile processes and transportation distances between empirical distributions and true distributions on ℝ: see Csörgő and Révész (1978), Shorack and Wellner (2009), Barrio et al. (2005), and Bobkov and Ledoux (2019). It follows from the characterization Theorem 1(iv) of Dümbgen et al. (2017) that F is bi-log-concave if and only if γ̄(F) ≤ 1. We will define γ̄(F), which satisfies γ̄(F) ≥ γ(F), and generalize this to the classes of bi-s*-concave distribution functions in Section 3.
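The constant γ(F) in (4) is easy to approximate numerically. As an illustration (our own sketch, with our own grid choices), for the standard Cauchy distribution the integrand {F ∧ (1 − F)}|f′|/f² simplifies to 2π|x|·{F(x) ∧ (1 − F(x))}, and the essential supremum equals 2, consistent with Example 1 below:

```python
import math

# gamma(F) of (4) for the standard Cauchy distribution: here
# f(x) = 1/(pi(1+x^2)), f'(x) = -2x/(pi(1+x^2)^2), so |f'|/f^2 = 2*pi*|x|.

def F(x):
    return 0.5 + math.atan(x) / math.pi

def integrand(x):
    return min(F(x), 1 - F(x)) * 2 * math.pi * abs(x)

# the integrand is even in x and equals 2*x*arctan(1/x) for x > 0,
# which increases to 2 as x -> infinity
grid = (i / 10 for i in range(1, 20001))          # x in (0, 2000]
gamma_est = max(integrand(x) for x in grid)
assert 1.99 < gamma_est < 2.0
```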

Now we consider several examples of s-concave densities and bi-s*-concave distribution functions.

Example 1. (Student-t) Suppose x ↦ f_r(x) is the density function of the Student-t distribution with r degrees of freedom, defined as follows:

$$f_r(x) \ =\ \frac{\Gamma((r+1)/2)}{\sqrt{\pi r}\,\Gamma(r/2)}\,\Big(1 + \frac{x^2}{r}\Big)^{-(r+1)/2} \qquad \text{for } x \in \mathbb{R}.$$

It is well-known (see e.g. Borell (1975)) that f_r is s-concave for any s ≤ −1/(1 + r) = s_0(f_r). Note that s takes values in (−1, 0) since r ∈ (0, ∞). It follows from Proposition 1 that F_r^{s*} and (1 − F_r)^{s*} are convex for s* = s/(1 + s) = −1/r = s*_0(F_r) < 0, and hence F_r is bi-s*-concave for all 0 < r < ∞. Direct calculation shows that the Csörgő - Révész constant γ(F_r) = 1 − s* = 1 + (1/r) ∈ (1, ∞) for 0 < r < ∞.

In particular, this yields γ(F1) = γ(Cauchy) = 2. And it suggests that γ(F) ≤ 1/(1 + s) = 1 − s* for all bi-s*-concave distribution functions F where 1/(1 + s) varies from 1 to ∞ as s varies from 0 to −1. This is one of the characterizations of the bi-s*-concave class that we will prove in Section 3.

Example 2. (F_{a,b}) Suppose that f_{a,b} is the density of the F-distribution with "degrees of freedom" a > 0 and b > 0. (In statistical practice, if T has the density f_{a,b}, this would usually be denoted by T ~ F_{a,b}, where a is the "numerator degrees of freedom" and b is the "denominator degrees of freedom".) The density is given by

$$f_{a,b}(x) \ =\ C_{a,b}\, x^{b/2 - 1}\,(a + bx)^{-(a+b)/2} \qquad \text{for } x \ge 0.$$

(In fact, C_{a,b} = a^{a/2} b^{b/2}/Beta(a/2, b/2), and f_{a,b}(x) → g_b(x) as a → ∞, where g_b is the Gamma density with shape parameter b/2 and rate parameter b/2.) It is well-known (see e.g. Borell (1975)) that f_{a,b} belongs to the class of s-concave densities if s ≤ −1/(1 + a/2) = s_0(f_{a,b}), when a ≥ 2 and b ≥ 2. This implies that s ∈ [−1/2, 0), and the resulting s*_0 = s/(1 + s) = −2/a is in [−1, 0). By Proposition 1, it follows that F_{a,b}^{s*} and (1 − F_{a,b})^{s*} are convex; i.e. F_{a,b} is bi-s*-concave.

Example 3. (Pareto) Suppose that f_{a,b}(x) = (a/b)(x/b)^{−(a+1)} 1_{[b,∞)}(x), the Pareto density with parameters a > 0 and b > 0. In this case, f_{a,b} is s-concave for each s ≤ −1/(1 + a), by noting the convexity of f_{a,b}^{−1/(1+a)}(x) = (x/b)(b/a)^{1/(1+a)}. Thus we take s = −1/(1 + a) ∈ (−1, 0) for a ∈ (0, ∞), and hence s* = s/(1 + s) equals −1/a. Furthermore, it is easily seen that

$$CR_R(x) \ \equiv\ \frac{(1 - F(x))\,|f'(x)|}{f^2(x)} \ =\ 1 - s^* \ =\ 1 + 1/a \qquad \text{for all } x > b.$$

(CRR(·) represents the Csörgő - Révész function in the right tail.)

Thus the Pareto distribution is analogous to the exponential distribution in the log-concave case, in the sense that x ↦ f^s(x) = cx (with c = b^{−1}(b/a)^{1/(1+a)}) is linear.
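A direct numerical check of the display above (our own sketch; the helper name is hypothetical) confirms that the right-tail Csörgő - Révész function of the Pareto distribution is constant and equal to 1 + 1/a:

```python
# CR_R(x) = (1 - F(x))|f'(x)|/f(x)^2 for the Pareto(a, b) density
# f(x) = (a/b)(x/b)^{-(a+1)} on [b, infinity), where 1 - F(x) = (x/b)^{-a}.

def cr_right(x, a, b):
    f = (a / b) * (x / b) ** (-(a + 1))
    fprime = -(a * (a + 1) / b ** 2) * (x / b) ** (-(a + 2))
    one_minus_F = (x / b) ** (-a)
    return one_minus_F * abs(fprime) / f ** 2

for a, b in [(1.0, 1.0), (2.5, 3.0), (0.5, 2.0)]:
    for x in [1.5 * b, 10 * b, 100 * b]:
        assert abs(cr_right(x, a, b) - (1 + 1 / a)) < 1e-9
```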

Example 4. (Symmetrized Beta) Suppose that

$$f_r(x) \ =\ C_r\,(1 - x^2/r)^{r/2}\, 1_{[-\sqrt{r}, \sqrt{r}]}(x),$$

where

$$C_r \ =\ \Gamma((3+r)/2)\big/\big(\sqrt{\pi r}\,\Gamma(1 + r/2)\big)$$

and r ∈ (0, ∞). Note that f_r is an s-concave density with s = 2/r ∈ (0, ∞), since

$$f_r^{2/r}(x) \ =\ C_r^{2/r}\,(1 - x^2/r)\,1_{[-\sqrt{r}, \sqrt{r}]}(x)$$

is concave, and hence the corresponding distribution function F_r is bi-s*-concave with s* = s/(1 + s) = 2/(2 + r). As r → ∞ it is easily seen that

$$f_r(x) \ \to\ (2\pi)^{-1/2} \exp(-x^2/2),$$

the standard normal density. Thus r = ∞ corresponds to s = 0 and s* = 0. On the other hand,

$$g_r(x) \ \equiv\ \sqrt{r}\, f_r(\sqrt{r}\, x) \ =\ \sqrt{r}\, C_r\,(1 - x^2)^{r/2}\, 1_{[-1,1]}(x) \ \to\ 2^{-1}\, 1_{[-1,1]}(x)$$

as r → 0. Thus r = 0 corresponds to s = ∞ and s* = 1.
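Both limits in Example 4 are easy to confirm numerically. The sketch below (ours; it uses log-gamma to avoid overflow for large r) checks the normal limit at r = 10^6 and the uniform limit at r = 10^{-8}:

```python
import math

# C_r = Gamma((3+r)/2) / (sqrt(pi r) Gamma(1 + r/2)), computed via lgamma
def C(r):
    return math.exp(math.lgamma((3 + r) / 2) - math.lgamma(1 + r / 2)) \
        / math.sqrt(math.pi * r)

def f(r, x):
    return C(r) * (1 - x * x / r) ** (r / 2) if x * x < r else 0.0

# r -> infinity: f_r approaches the standard normal density
phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
err_normal = abs(f(1e6, 1.0) - phi(1.0))
assert err_normal < 1e-5

# r -> 0: g_r(x) = sqrt(r) f_r(sqrt(r) x) approaches the Uniform(-1,1) density 1/2
r = 1e-8
g_val = math.sqrt(r) * f(r, math.sqrt(r) * 0.3)
assert abs(g_val - 0.5) < 1e-3
```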

Note that just as the class of bi-log-concave distributions is considerably larger than the class of log-concave distributions (as shown by Dümbgen et al. (2017)), the class of bi-s*-concave distributions is considerably larger than the class of s-concave distributions. In particular, multimodal distributions are allowed in both the bi-log-concave and the bi-s*-concave classes.

Example 5. (Exponential family; exponential tilt of U(0, 1)) Suppose that

$$f_t(x) \ =\ \exp\big(tx - K(t)\big)\, 1_{[0,1]}(x)$$

where

$$K(t) \ \equiv\ \begin{cases} \log(e^t - 1) - \log t, & t > 0,\\ 0, & t = 0,\\ \log(1 - e^t) - \log(-t), & t < 0, \end{cases} \qquad (5)$$

for −∞ < t < ∞, and further define F_t(x) ≡ ∫_0^x f_t(y) dy.

One can verify that f_t is s-concave only for s ≤ 0, and hence F_t is bi-s*-concave for s* ≤ s/(1 + s) ≤ 0 by Proposition 1. However, this might not be optimal; i.e. there remains the possibility that F_t ∈ P_{s*} for some s* > 0. In fact, by Theorem 3(iv) it follows that F_t ∈ P_{s*} with s* = e^{−|t|}. (For an example involving a power-tilt of U(0, 1), see Dharmadhikari and Joag-Dev (1988) (iv), page 95.) This also implies that the converse of Proposition 1 does not hold here or in general. The following two examples also illustrate this point.
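As a quick numerical sanity check of the piecewise formula (5) (our own sketch, using a midpoint rule), e^{K(t)} should be the normalizing constant ∫_0^1 e^{tx} dx, so each f_t integrates to one:

```python
import math

# K(t) of (5); expm1 keeps log(e^t - 1) accurate for small positive t
def K(t):
    if t > 0:
        return math.log(math.expm1(t)) - math.log(t)
    if t < 0:
        return math.log(1 - math.exp(t)) - math.log(-t)
    return 0.0

def integral_ft(t, m=20000):
    """Midpoint-rule approximation of the integral of f_t over [0, 1]."""
    h = 1.0 / m
    return sum(math.exp(t * (j + 0.5) * h - K(t)) for j in range(m)) * h

for t in [-5.0, -0.5, 0.0, 0.5, 5.0]:
    assert abs(integral_ft(t) - 1.0) < 1e-6
```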

Example 6. (Mixture of shifted Gaussians) (Dümbgen et al. (2017), pages 2-3) Suppose that f_δ is the mixture (1/2)N(−δ, 1) + (1/2)N(δ, 1) with δ > 0. It is well-known that f_δ is bimodal if δ > 1. Since all s-concave densities are unimodal (see e.g. Dharmadhikari and Joag-Dev (1988), page 86), it follows that f_δ is not s-concave for any δ > 1. Dümbgen et al. (2017) showed (numerically) that the corresponding distribution function F_δ is bi-log-concave for δ ≤ 1.34 but not for δ ≥ 1.35. With δ = 1.8 this example also shows that strict inequality can occur in the second inequality in Corollary 4 below.

Example 7. (Mixture of shifted Student-t) Now suppose that f_δ is the mixture (1/2)t_1(· − δ) + (1/2)t_1(· + δ) with δ > 0, where t_r is the standard Student-t density with r degrees of freedom as in Example 1. Since f_δ is bimodal if δ > δ_0 ≈ 0.6 and all s-concave densities are unimodal, it follows that f_δ is not s-concave for any δ > δ_0. For values of δ < δ_0, f_δ is s-concave with s = −1/2, so Proposition 1 applies and shows that F_δ is bi-s*-concave with s* = −1. By numerical calculation, for δ > δ_0 the distribution functions F_δ are bi-s*-concave for some s* = s*(δ) ∈ (−∞, 1] which decreases (approximately linearly) for large δ.

Example 8. (Lévy with α = 1/2) This example is the completely asymmetric α-stable (or Lévy) law with α = 1/2. It gives the first passage time to the level a > 0 for a standard Brownian motion B (started at 0 and with no drift). See e.g. Durrett (2019), pages 372 - 374. The density is given by

$$f_a(t) \ =\ \frac{a}{\sqrt{2\pi t^3}}\, \exp\big(-a^2/(2t)\big)\, 1_{(0,\infty)}(t),$$

and the distribution function F_a(t) = 2P(B_t ≥ a) = 2(1 − Φ(a/√t)). It is easily seen that f_a is s-concave with s = −2/3, and hence F_a is bi-s*-concave with s* = −2. Thus γ(F_a) = 3.

The following table summarizes the examples:

Example 5 shows that strict inequality can hold in the inequality γ(F) ≤ γ̄(F).

3. Main Theoretical Results

Here is our theorem characterizing bi-s*-concave distribution functions.

Theorem 3. Let s* ≤ 1. For a non-degenerate distribution function F, the following statements are equivalent:

(i) F is bi-s*-concave.

(ii) F is continuous on R and differentiable on J(F) with derivative f = F′.

Moreover, for s* ≠ 0,

$$F(y) \ \begin{cases} \le\ F(x)\,\Big(1 + s^*\,\dfrac{f(x)}{F(x)}\,(y - x)\Big)_+^{1/s^*},\\[1.5ex] \ge\ 1 - (1 - F(x))\,\Big(1 - s^*\,\dfrac{f(x)}{1 - F(x)}\,(y - x)\Big)_+^{1/s^*}, \end{cases} \qquad (6)$$

while for s* = 0

$$F(y) \ \begin{cases} \le\ F(x)\,\exp\Big(\dfrac{f(x)}{F(x)}\,(y - x)\Big),\\[1.5ex] \ge\ 1 - (1 - F(x))\,\exp\Big(-\dfrac{f(x)}{1 - F(x)}\,(y - x)\Big), \end{cases} \qquad (7)$$

for all x, y ∈ J(F).

(iii) F is continuous on ℝ and differentiable on J(F) with derivative f = F′ such that the s*-hazard function f/(1 − F)^{1−s*} is non-decreasing on J(F), and the reverse s*-hazard function f/F^{1−s*} is non-increasing on J(F).

(iv) F is continuous on R and differentiable on J(F) with bounded and strictly positive derivative f = F′. Furthermore, f is differentiable almost everywhere on J(F) with derivative f′ = F″ satisfying

$$-(1 - s^*)\,\frac{f^2}{1 - F} \ \le\ f' \ \le\ (1 - s^*)\,\frac{f^2}{F} \qquad \text{almost everywhere on } J(F). \qquad (8)$$
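To make the envelopes in (6) concrete, the following sketch (ours, not from the paper) evaluates both bounds for the Pareto distribution of Example 3 with a = b = 1, for which s* = −1, F(x) = 1 − 1/x and f(x) = 1/x² on [1, ∞); when the base in (·)_+^{1/s*} is non-positive the bound is vacuous and the code returns the trivial bound instead:

```python
F = lambda x: 1 - 1 / x          # Pareto(1, 1) distribution function, x >= 1
f = lambda x: 1 / x ** 2
s = -1.0                         # s* = -1/a = -1 for this Pareto

def upper(x, y):
    base = 1 + s * (f(x) / F(x)) * (y - x)
    return F(x) * base ** (1 / s) if base > 0 else 1.0   # vacuous if base <= 0

def lower(x, y):
    base = 1 - s * (f(x) / (1 - F(x))) * (y - x)
    return 1 - (1 - F(x)) * base ** (1 / s) if base > 0 else 0.0

pts = [1.5, 2.0, 3.0, 5.0, 10.0]
eps = 1e-12
ok = all(lower(x, y) - eps <= F(y) <= upper(x, y) + eps for x in pts for y in pts)
assert ok
```

For this particular Pareto distribution the lower envelope is tight: 1/(1 − F) is linear, so lower(x, y) = F(y) exactly.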

The following two remarks are immediate consequences of Theorem 3. See Section 6 for a proof of Remark 1.

Remark 1.

(i) The proof of Theorem 3(iv) implies that if s* > 1, then not both F^{s*} and (1 − F)^{s*} can be concave.

(ii) If F is a bi-s*-concave distribution function for 0 < s* ≤ 1, then inf J(F) > −∞ and sup J(F) < ∞.

(iii) If F is a bi-s*-concave distribution function for s* < 0, then it follows that

$$(0, T(F)) \ \subset\ \Big\{t \in \mathbb{R}_+ : \int |x|^t\, dF(x) < \infty\Big\}, \qquad (9)$$

with

$$T(F) \ \equiv\ \begin{cases} \infty & \text{if } \inf J(F) > -\infty \text{ and } \sup J(F) < \infty,\\[0.5ex] -\dfrac{1}{s^*} & \text{otherwise.} \end{cases} \qquad (10)$$

Remark 2. Suppose that F is a bi-s*-concave distribution function, and define

$$T_1(F) \ \equiv\ \sup_{x \in J(F)} \frac{f}{F^{1-s^*}}(x), \qquad T_2(F) \ \equiv\ \sup_{x \in J(F)} \frac{f}{(1-F)^{1-s^*}}(x).$$

Since f/F^{1−s*} is monotonically non-increasing on J(F), it follows that for any x, x_0 ∈ J(F) with x < x_0,

$$\frac{f}{F^{1-s^*}}(x) \ \ge\ \frac{\frac{1}{s^*}F^{s^*}(x) - \frac{1}{s^*}F^{s^*}(x_0)}{x - x_0},$$

and hence

$$T_1(F) \ =\ \sup_{x \in J(F)} \frac{f}{F^{1-s^*}}(x) \ =\ \lim_{x \downarrow \inf J(F)} \frac{f}{F^{1-s^*}}(x)\ \begin{cases} > 0, & \\ = \infty & \text{if } \inf J(F) > -\infty. \end{cases}$$

Analogously one can show that

$$T_2(F)\ \begin{cases} > 0, & \\ = \infty & \text{if } \sup J(F) < \infty. \end{cases}$$

Corollary 4. (Connection with the Csörgő - Révész constant.)

Suppose F is a bi-s*-concave distribution function for s* ≤ 1. Then with γ~(F) and γ(F) as defined in (3) and (4), we have

$$\tfrac{1}{2}\gamma(F) \ \le\ \tilde{\gamma}(F) \ \le\ \gamma(F) \ \le\ \bar{\gamma}(F) \ \le\ 1 - s^*, \qquad (11)$$

where

$$\bar{\gamma}(F) \ \equiv\ \max\{\widetilde{CR}(F), \widetilde{CR}(\bar F)\}, \qquad \bar F \ \equiv\ 1 - F, \qquad \widetilde{CR}(F) \ \equiv\ \operatorname*{ess\,sup}_{x \in J(F)} \frac{F(x)\,F''(x)}{(F'(x))^2},$$

and

$$\gamma(F) \ \equiv\ \operatorname*{ess\,sup}_{x \in J(F)} \frac{\{F(x) \wedge (1 - F(x))\}\,|F''(x)|}{(F'(x))^2} \ =\ \operatorname*{ess\,sup}_{x \in J(F)} \frac{\{F(x) \wedge (1 - F(x))\}\,|f'(x)|}{(f(x))^2}.$$

Remark 3. By Theorem 3, one can verify that CR̃(F) is well-defined for any F ∈ P_{s*}. Note that

$$\widetilde{CR}(\bar F) \ =\ \operatorname*{ess\,sup}_{x \in J(F)} \frac{\bar F(x)\,(-F''(x))}{(F'(x))^2}.$$

The first two inequalities in Corollary 4 follow (as we noted before) from 2^{−1}{u ∧ (1 − u)} ≤ u(1 − u) ≤ u ∧ (1 − u) for 0 ≤ u ≤ 1. Thus finiteness of γ̃(F) implies finiteness of γ(F) and vice versa. Examples show that strict inequality may hold in the inner inequalities in (11). On the other hand, if f is non-decreasing on (a, F^{−1}(1/2)) and non-increasing on (F^{−1}(1/2), b), where J(F) = (a, b), then γ(F) = γ̄(F) by inspection of the proof of γ(F) ≤ γ̄(F).

Corollary 5. (Bounds for F ∈ P_{s*}, where s* ≠ 0.)

For any s* ∈ (−∞, 0) ∪ (0, 1] and F ∈ P_{s*},

$$F_L(x) \ \le\ F(x) \ \le\ F_U(x), \qquad (12)$$

where F_L(x) ≡ (1/s*)(F^{s*}(x) − (1 − s*)) and F_U(x) ≡ (1/s*)(1 − (1 − F(x))^{s*}).

Moreover, F_U(x) is a convex function on J(F), and F_L(x) is a concave function on J(F). For s* = 0 and F ∈ P_0, (12) holds with F_L(x) = 1 + log F(x) and F_U(x) = −log(1 − F(x)).
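With u = F(x), the bounds (12) reduce to the pointwise inequalities (u^{s*} − (1 − s*))/s* ≤ u ≤ (1 − (1 − u)^{s*})/s*, which can be checked directly (our own sketch):

```python
# Pointwise check of (12): F_L and F_U as functions of u = F(x) in (0, 1).

def F_L(u, s):
    return (u ** s - (1 - s)) / s

def F_U(u, s):
    return (1 - (1 - u) ** s) / s

us = [0.01 * i for i in range(1, 100)]
for s in [-2.0, -1.0, -0.25, 0.25, 0.5, 1.0]:
    # small tolerance: at s = 1 both bounds hold with equality
    assert all(F_L(u, s) - 1e-12 <= u <= F_U(u, s) + 1e-12 for u in us)
```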

4. Confidence bands for bi-s*-concave distribution functions

Our goal in this section is to define confidence bands for F which exploit the shape constraint F ∈ P_{s*_0}. We start with some known unconstrained nonparametric bands and define new bands under the assumption that the true distribution function F satisfies the shape constraint F ∈ P_{s*_0}, where s*_0 is known.

4.1. Definitions and Basic Properties

Let X_1, …, X_n be i.i.d. random variables with continuous distribution function F. A (1 − α)-confidence band (L_n, U_n) for F consists of two monotonically non-decreasing functions L_n and U_n on ℝ, depending only on α and X_1, …, X_n, which satisfy L_n < 1, U_n > 0 and

$$P\big(L_n(x) \le F(x) \le U_n(x)\ \text{for all } x \in \mathbb{R}\big) \ =\ 1 - \alpha.$$

The following two bands are discussed in Dümbgen et al. (2017) and we briefly restate them here.

Example (Kolmogorov-Smirnov band). A Kolmogorov-Smirnov band (L_n, U_n) is given by

$$[L_n(x), U_n(x)] \ \equiv\ \Big[F_n(x) - \frac{\kappa^{KS}_{\alpha,n}}{\sqrt{n}},\ F_n(x) + \frac{\kappa^{KS}_{\alpha,n}}{\sqrt{n}}\Big] \cap [0, 1],$$

where F_n is the empirical distribution function and κ^{KS}_{α,n} denotes the (1 − α)-quantile of sup_{x∈ℝ} n^{1/2}|F_n(x) − F(x)|; see Shorack and Wellner (2009). Note that κ^{KS}_{α,n} ≤ √(log(2/α)/2) by Massart's (1990) inequality.
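A minimal implementation of this band (our own sketch, using the conservative Massart constant √(log(2/α)/2) in place of the exact quantile κ^{KS}_{α,n}):

```python
import math

def ks_band(sample, alpha=0.05):
    """Kolmogorov-Smirnov band evaluated at the sorted sample points,
    with Massart's bound sqrt(log(2/alpha)/2) as a conservative kappa."""
    xs = sorted(sample)
    n = len(xs)
    kappa = math.sqrt(math.log(2 / alpha) / 2)
    band = []
    for i, x in enumerate(xs, start=1):
        Fn = i / n                                  # empirical d.f. at x
        lo = max(Fn - kappa / math.sqrt(n), 0.0)    # clip to [0, 1]
        hi = min(Fn + kappa / math.sqrt(n), 1.0)
        band.append((x, lo, hi))
    return band

band = ks_band([0.1, 0.4, 0.2, 0.9, 0.7], alpha=0.05)
assert all(0.0 <= lo <= hi <= 1.0 for _, lo, hi in band)
```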

Example (Weighted Kolmogorov-Smirnov band). A weighted Kolmogorov-Smirnov band (L_n, U_n) is as follows: for any γ ∈ [0, 1/2),

$$[L_n(x), U_n(x)] \ \equiv\ \Big[t_i - \frac{\kappa^{WKS}_{\alpha,n}}{\sqrt{n}}\,(t_i(1-t_i))^{\gamma},\ t_{i+1} + \frac{\kappa^{WKS}_{\alpha,n}}{\sqrt{n}}\,(t_{i+1}(1-t_{i+1}))^{\gamma}\Big] \cap [0,1],$$

for i ∈ {0, 1, …, n} and x ∈ [X_{(i)}, X_{(i+1)}), where {X_{(i)}}_{i=1}^n denotes the order statistics of {X_i}_{i=1}^n, X_{(0)} ≡ −∞, X_{(n+1)} ≡ ∞, t_i ≡ i/(n + 1) for i = 1, …, n, and κ^{WKS}_{α,n} denotes the (1 − α)-quantile of the test statistic

$$\sqrt{n}\ \max_{i=1,\dots,n}\ \frac{|F(X_{(i)}) - t_i|}{(t_i(1-t_i))^{\gamma}}.$$

Note that κ^{WKS}_{α,n} = O(1).

A further example of a nonparametric confidence band due to Owen (1995) and refined by Dümbgen and Wellner (2014) was considered by Dümbgen et al. (2017). We will not consider this third possibility further here due to space constraints.

Now we turn to confidence bands for bi-s*-concave distribution functions. Our approach will be to refine the unconstrained bands given in the examples above. Suppose F is a bi-s*-concave distribution function. A nonparametric (1 − α) confidence band (L_n, U_n) for F may be refined as follows:

$$L_n^o(x) \ \equiv\ \inf\{G(x) : G \in \mathcal{P}_{s^*},\ L_n \le G \le U_n\}, \qquad U_n^o(x) \ \equiv\ \sup\{G(x) : G \in \mathcal{P}_{s^*},\ L_n \le G \le U_n\}.$$

If there is no bi-s*-concave distribution function fitting into the band (L_n, U_n), we set L_n^o ≡ 1 and U_n^o ≡ 0, and we conclude with confidence 1 − α that F is not bi-s*-concave. But in the case F ∈ P_{s*}, this happens with probability at most α.

The following lemma implies two properties of our shape-constrained band (L_n^o, U_n^o). The first is that both L_n^o and U_n^o are Lipschitz continuous on ℝ unless inf{x ∈ ℝ : L_n(x) > 0} ≥ sup{x ∈ ℝ : U_n(x) < 1}. The second is that U_n^o(x) converges polynomially fast to 0 as x → −∞ and L_n^o(x) converges polynomially fast to 1 as x → ∞, as long as lim_{x→∞} L_n(x) > lim_{x→−∞} U_n(x).

Lemma 6. For real numbers a < b, 0 < u < v < 1 and s* ∈ (−∞, 0) ∪ (0, 1], define

$$\gamma_1 \ \equiv\ \frac{1}{s^*}\,\frac{v^{s^*} - u^{s^*}}{b - a} \qquad \text{and} \qquad \gamma_2 \ \equiv\ \frac{1}{s^*}\,\frac{(1-u)^{s^*} - (1-v)^{s^*}}{b - a}.$$

(i) If L_n(a) ≥ u and U_n(b) ≤ v, then L_n^o and U_n^o are Lipschitz-continuous on ℝ with Lipschitz constant max{γ_1, γ_2}.

(ii) If U_n(a) ≤ u and L_n(b) ≥ v, then

$$U_n^o(x) \ \le\ \big(u^{s^*} + s^* \gamma_1 (x - a)\big)_+^{1/s^*} \quad \text{for } x \le a,$$
$$1 - L_n^o(x) \ \le\ \big((1 - v)^{s^*} - s^* \gamma_2 (x - b)\big)_+^{1/s^*} \quad \text{for } x \ge b.$$

The following theorem implies the consistency of our proposed confidence band (L_n^o, U_n^o).

Theorem 7. Suppose that the original confidence band (Ln, Un) is consistent in the sense that for any fixed xR, both Ln(x) and Un(x) tend to F(x) in probability.

(i) Suppose that F ∉ P_{s*}. Then P(L_n^o ≤ U_n^o) → 0.

(ii) Suppose that F ∈ P_{s*} with s* ≠ 0. Then P(L_n^o ≤ U_n^o) ≥ 1 − α, and

$$\sup_{G \in \mathcal{P}_{s^*}:\ L_n \le G \le U_n} \|G - F\|_\infty \ \to_p\ 0, \qquad (13)$$

where sup(Ø) ≡ 0. Moreover, for any compact interval K ⊂ J(F),

$$\sup_{G \in \mathcal{P}_{s^*}:\ L_n \le G \le U_n} \|h_G - h_F\|_{K,\infty} \ \to_p\ 0, \qquad (14)$$

where h_G stands for any of the three functions G′, (G^{s*})′, and ((1 − G)^{s*})′. Finally, for any fixed x_1 ∈ J(F) and 0 < b_1 < f(x_1)/F^{1−s*}(x_1),

$$P\Big(U_n^o(x) \le \big(U_n^{s^*}(x_1) + s^* b_1 (x - x_1)\big)_+^{1/s^*}\ \text{for all } x \le x_1\Big) \ \to\ 1, \qquad (15)$$

while for any fixed x_2 ∈ J(F) and 0 < b_2 < f(x_2)/(1 − F(x_2))^{1−s*},

$$P\Big(1 - L_n^o(x) \le \big((1 - L_n(x_2))^{s^*} - s^* b_2 (x - x_2)\big)_+^{1/s^*}\ \text{for all } x \ge x_2\Big) \ \to\ 1. \qquad (16)$$

The following result provides the consistency of confidence bands for functionals ∫ϕ dF of F with well-behaved integrands ϕ : ℝ → ℝ.

Corollary 8. Suppose that the original confidence band (L_n, U_n) is consistent, and let F ∈ P_{s*} with s* < 0. Let ϕ : ℝ → ℝ be absolutely continuous with a continuous derivative ϕ′ satisfying the following constraint: there exist constants a > 0 and k < −1/s* such that

$$|\phi'(x)| \ \le\ a\,|x|^{k-1}.$$

Then

$$\sup_{G:\ L_n^o \le G \le U_n^o} \Big|\int \phi\, dG - \int \phi\, dF\Big| \ \to_p\ 0.$$

The following theorem provides rates of convergence, with the following condition on the original confidence band (Ln, Un):

Condition (*): For certain constants γ ∈ [0, 1/2) and κ, λ > 0,

$$\max\{F_n - L_n,\ U_n - F_n\} \ \le\ \kappa\, n^{-1/2}\big(F_n(1 - F_n)\big)^{\gamma}$$

on the interval {λ n^{−1/(2−2γ)} ≤ F_n ≤ 1 − λ n^{−1/(2−2γ)}}.

As stated in Dümbgen et al. (2017), this condition is satisfied with γ = 0 in the case of the Kolmogorov-Smirnov band. In the case of the weighted Kolmogorov-Smirnov band, it is satisfied for the given value of γ ∈ [0, 1/2). For the refined version of Owen’s band, it is satisfied for any fixed number γ ∈ (0, 1/2).

Theorem 9. Suppose that F ∈ P_{s*} with s* < 0 and let (L_n, U_n) satisfy Condition (*). Let ϕ : ℝ → ℝ be absolutely continuous with a continuous derivative ϕ′.

Suppose that |ϕ′(x)| = O(|x|^{k−1}) as |x| → ∞ for some number k < −1/s*. Then

$$\sup_{G:\ L_n^o \le G \le U_n^o}\Big|\int \phi\, dG - \int \phi\, dF\Big| \ =\ O_p\Big(n^{-\frac{1}{2}\big(1\,\wedge\,\frac{1 + k s^*}{1-\gamma}\big)}\Big). \qquad (17)$$

Remark: (i) From (17), one can verify that the convergence rate is n^{−1/2} as long as k < γ/(−s*).

(ii) From (17), one can verify that when γ = 0 the convergence rate is n^{−(1 + ks*)/2} = n^{−1/2 + k(−s*)/2}, so we have a "power deficit" (or "polynomial rate deficit") relative to n^{−1/2}.

4.2. Implementation and illustration of the confidence bands

In this section, we discuss the implementation of confidence bands for bi-s*-concave distribution functions. This extends the treatment of Dümbgen et al. (2017) from s* = 0 to general values s* ∈ (−∞, 1].

Recall the procedure ConcInt(·, ·) developed in Dümbgen et al. (2017). Given any finite set T = {t_0, …, t_m} of real numbers t_0 < t_1 < ⋯ < t_m and any pair (l, u) of functions l, u : T → [−∞, ∞) with l < u pointwise and l(t) > −∞ for at least two different points t ∈ T, this procedure computes the pair (l^o, u^o), where

$$l^o(x) \ \equiv\ \inf\{g(x) : g \text{ concave on } \mathbb{R},\ l \le g \le u \text{ on } T\}, \qquad u^o(x) \ \equiv\ \sup\{g(x) : g \text{ concave on } \mathbb{R},\ l \le g \le u \text{ on } T\}.$$

First note that l^o is the smallest concave majorant of l on T; thus it may be computed by a version of the pool-adjacent-violators algorithm; see for example Robertson et al. (1988). Then we obtain indices 0 ≤ j(0) < j(1) < ⋯ < j(b) ≤ m such that

$$l^o\ \begin{cases} \equiv\ -\infty & \text{on } \mathbb{R}\setminus[t_{j(0)}, t_{j(b)}],\\ \text{is linear on } [t_{j(a-1)}, t_{j(a)}] & \text{for } 1 \le a \le b,\\ \text{changes slope at } t_{j(a)} & \text{if } 1 \le a < b.\end{cases}$$

With l^o in hand, we then check whether l^o ≤ u on T. If this fails, then there is no concave function lying between l and u, and the procedure returns an error message. If this test succeeds, then we compute u^o(x) as

$$\min\Big\{u(s) + \frac{u(s) - l^o(r)}{s - r}\,(x - s)\ :\ r \in T^o,\ s \in T,\ r < s \le x \ \text{or}\ x \le s < r\Big\},$$

where T^o ≡ {t_{j(0)}, t_{j(1)}, …, t_{j(b)}}. (The rest of the description of the procedure ConcInt(·, ·) is just as in Dümbgen et al. (2017).)
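The first step of ConcInt, the smallest concave majorant l^o of l on the finite grid T, amounts to an upper convex hull and can be sketched as follows (our own code, not the authors' implementation; it plays the role of the pool-adjacent-violators step described above):

```python
def least_concave_majorant(t, l):
    """Knots (x, y) of the smallest concave function on R majorizing l on t;
    a stack-based sweep computing the upper convex hull of the points."""
    hull = []
    for x, y in zip(t, l):
        # pop the last knot while it lies on or below the chord to the new point
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (y - y2) * (x2 - x1) >= (y2 - y1) * (x - x2):
                hull.pop()
            else:
                break
        hull.append((x, y))
    return hull

def evaluate(knots, x):
    """Piecewise-linear interpolation between hull knots (l^o is -inf outside)."""
    for (x1, y1), (x2, y2) in zip(knots, knots[1:]):
        if x1 <= x <= x2:
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    raise ValueError("x outside [t_{j(0)}, t_{j(b)}]")

knots = least_concave_majorant([0, 1, 2, 3], [0, 2, 1, 3])
assert knots == [(0, 0), (1, 2), (3, 3)]
assert evaluate(knots, 2) == 2.5
```

The knots returned are exactly the points t_{j(0)}, …, t_{j(b)} described above, where l^o is linear between consecutive knots.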

When s* < 0, let g(v; s*) ≡ g(v) ≡ −v^{s*} and h(v; s*) ≡ h(v) ≡ (−v)^{1/s*}. (This is the most important new case. When s* = 0, g(v) ≡ log(v) and h(v) ≡ exp(v). When s* > 0, g(v) ≡ v^{s*} and h(v) ≡ v^{1/s*}.) Here is pseudocode for the computation of (L_n^o, U_n^o).

(L_n^o, U_n^o) ← (L_n, U_n)
(l^o, u^o) ← ConcInt(g(L_n^o), g(U_n^o))
(L̃_n^o, Ũ_n^o) ← (h(l^o), h(u^o))
(l^o, u^o) ← ConcInt(g(1 − Ũ_n^o), g(1 − L̃_n^o))
(L̃_n^o, Ũ_n^o) ← (1 − h(u^o), 1 − h(l^o))
while (L̃_n^o, Ũ_n^o) ≠ (L_n^o, U_n^o) do
    (L_n^o, U_n^o) ← (L̃_n^o, Ũ_n^o)
    (l^o, u^o) ← ConcInt(g(L_n^o), g(U_n^o))
    (L̃_n^o, Ũ_n^o) ← (h(l^o), h(u^o))
    (l^o, u^o) ← ConcInt(g(1 − Ũ_n^o), g(1 − L̃_n^o))
    (L̃_n^o, Ũ_n^o) ← (1 − h(u^o), 1 − h(l^o))
end while

Illustration of the confidence bands

To get some feeling for the new confidence bands in a setting in which s*_0 is known, we generated a sample of size n = 100 from the Student-t distribution with r = 1 degree of freedom. This distribution belongs to P_{s*} for every s* ≤ −1 = s*_0. We constructed Kolmogorov-Smirnov (KS) and weighted Kolmogorov-Smirnov (WKS) bands with γ = 0.4 as the initial bands (L_n, U_n). We then computed and plotted our shape-constrained confidence bands (L_n^o, U_n^o) under the (correct) assumption that s* = −1 and the (incorrect) assumption that s* = 0, using both the KS and WKS bands as initial nonparametric bands, with α = 0.05; see Figure 1 and Figure 2. To see the components of Figures 1 and 2 separately, see the Supplementary file, Figures 1-2 and 3-4 respectively.

Figure 1:

Confidence bands for bi-s*-concave distribution functions based on KS bands. The black curve is the distribution function of the Student-t distribution with 1 degree of freedom. The two gray-black lines give the KS band and lines in other colors are refined confidence bands under the bi-s*-concave assumption. The step function in the middle is the empirical distribution function.

Figure 2:

Confidence bands for bi-s*-concave distribution functions based on WKS bands. The black curve is the distribution function of the Student-t distribution with 1 degree of freedom. The two gray-black lines give the WKS band and lines in other colors are refined confidence bands under the bi-s*-concave assumption. The step function in the middle is the empirical distribution function.

Figure 3:

Confidence Bands for bi-s*-concave distribution functions from KS bands based on a sample of size 1000 from the Student-t distribution with 1 degree of freedom. The two gray-black lines give the initial bands, lines in other colors are refined confidence bands under the bi-s*-concave assumption. The step function (black) in the middle is the empirical distribution function.

Figure 4:

Confidence Bands for bi-s*-concave distribution functions from WKS bands based on a sample of size 1000 from the Student-t distribution with one degree of freedom. The two gray-black lines give the initial bands, lines in other colors are refined confidence bands under the bi-s*-concave assumption. The step function (black) in the middle is the empirical distribution function.

Note that when s* = 0 is assumed, s* is misspecified and the resulting bands are not guaranteed to have coverage probability 0.95. An indication of this is that the shape-constrained bands computed under the assumption s* = 0 do not contain the empirical distribution function.

From these two plots, an immediate observation is that the confidence bands for smaller s* are wider than those for larger s*. This is a direct consequence of the nesting property of the bi-s*-concave classes; see Proposition 2. Also note that the shape-constrained band with s* = −1 does improve on the KS band, especially in the tails.

An Application

Dümbgen et al. (2017) gave an application of bi-log-concave confidence bands to a dataset from Woolridge (2000). It contains approximate annual salaries of the CEOs of 177 randomly chosen U.S. companies, rounded to multiples of 1000 USD. We denote the i-th observed approximate salary by Y_{i,raw}. Dümbgen et al. (2017) assume that the unobserved true salary Y_{i,true} lies within (Y_{i,raw} − 1, Y_{i,raw} + 1). Let G_true denote the unknown distribution of Y_true. For income data it is sometimes assumed that log10 Y_true is Gaussian (see Kleiber and Kotz (2003)). Since Gaussian densities are log-concave and hence have bi-log-concave distribution functions (by Proposition 1), it is natural to replace the Gaussian assumption by the assumption of bi-log-concavity. Dümbgen et al. (2017) therefore assumed that X = log10 Y_true is bi-log-concave and constructed 95% confidence bands (L_n, U_n) (see Figure 4 of Dümbgen et al. (2017)), where L_n is computed from the empirical distribution of log10(Y_{i,raw} − 1), i = 1, …, n, and U_n from that of log10(Y_{i,raw} + 1), i = 1, …, n.

Here we assume that the distribution of X is bi-s*-concave for some s* and compute confidence bands for different values of s*. Now we are confronted with the issue of choosing s*: if we want narrower confidence bands we would assume some value of s* ∈ (0, 1], while if we are not willing to assume s* = 0 (the choice made by Dümbgen et al. (2017)), then we would assume some value of s* < 0 (leading to the larger classes P_{s*} with s* < 0). It is of some interest to know whether the CEO data could be modeled using the bi-s*-concave classes with s* ∈ (0, 1], since this would result in still narrower confidence bands. But it is also of interest to try to use the data to choose s*.

Choosing s*

Since F can be a member of P_{s*} for various values of s*, each s* leads to a different set of bands. However, due to the nesting property of the classes P_{s*}, a larger s* always yields a narrower confidence band. Thus, it is of interest to estimate

$$s_0^*(F) \equiv \sup\{s^* \in (-\infty, 1] : F \in \mathcal P_{s^*}\},$$

since s* = s*_0 generates the narrowest bands at a given confidence level. If F is not bi-s*-concave for any s* ≤ 1, then we set s*_0(F) = −∞. Now s*_0 is connected to the Csörgő - Révész constant, since s* = s*_0 when γ̄(F) = 1 − s* and F ∈ P_{s*}. For example, the Student-t distribution with r degrees of freedom has s*_0 = −1/r. However, this connection cannot easily be exploited for practical estimation purposes, due to difficulties in estimating γ(F) or γ̄(F). So we take an alternative route to making inference about s*_0.
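The claim s*_0 = −1/r for the Student-t distribution can be checked numerically: with s = −1/(1 + r), the function f_r^s should be convex (since s < 0), and then s* = s/(1 + s) = −1/r. A minimal sketch; the grid, tolerance, and unnormalized density are illustrative choices, not part of the paper's methodology:

```python
def t_density(x, r):
    """Student-t_r density up to its normalizing constant (the constant
    does not affect s-concavity checks)."""
    return (1.0 + x * x / r) ** (-(r + 1.0) / 2.0)

def midpoint_convex(g, grid, tol=1e-12):
    """Check midpoint convexity of g over all pairs from a grid."""
    return all(
        g((x + y) / 2.0) <= (g(x) + g(y)) / 2.0 + tol
        for x in grid for y in grid
    )

r = 3.0
s = -1.0 / (1.0 + r)                 # candidate s-concavity index for t_r
grid = [i * 0.25 for i in range(-20, 21)]

# for s < 0, f is s-concave iff f^s is convex
assert midpoint_convex(lambda x: t_density(x, r) ** s, grid)
# and then s* = s/(1 + s) = -1/r, matching s0*(t_r) = -1/r in the text
assert abs(s / (1.0 + s) - (-1.0 / r)) < 1e-12
```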

Starting from an initial 1 − α band (L_n, U_n), an upper bound for s*_0 is given by

$$\bar s_n^* = \sup\{s^* \in (-\infty, 1] : (L_n, U_n)\ \text{contains some d.f.}\ F \in \mathcal P_{s^*}\}.$$

Clearly, for s* > s̄*_n there is no bi-s*-concave distribution function fitting into the band (L_n, U_n). Since this happens with probability at most α ∈ (0, 1) when the true distribution function F ∈ P_{s*}, it follows that (−∞, s̄*_n] is a confidence set for s*_0 with coverage probability at least 1 − α. Our simulations suggest that s̄*_n is generally considerably larger than s*_0, and hence not suitable as an estimator, especially for α = 0.05.

Instead, we propose an estimator of s*_0 based on the 𝔽_n-measure of the set where the empirical distribution function remains inside the shape-constrained band for s*. More formally, let L^o_n(s*) and U^o_n(s*) denote the 1 − α level bi-s*-concave confidence bands based on the initial bands L_n and U_n and the assumption F ∈ P_{s*}. Define

$$\omega(s^*) \equiv n^{-1}\sum_{i=1}^n 1\{L_n^o(s^*)(X_i) \le \mathbb F_n(X_i) \le U_n^o(s^*)(X_i)\}\,1\{L_n^o(s^*)(X_i) \le U_n^o(s^*)(X_i)\} = \mathbb F_n\big(\{L_n^o(s^*) \le \mathbb F_n \le U_n^o(s^*)\} \cap \{L_n^o(s^*) \le U_n^o(s^*)\}\big).$$

A higher value of ω(s*) indicates that (L^o_n(s*), U^o_n(s*)) contains a greater portion of 𝔽_n. Since the bands (L^o_n(s*), U^o_n(s*)) become narrower as s* increases, ω(s*) decreases in s* and eventually becomes zero when s* > s̄*_n. A plausible estimator of s*_0 is therefore given by

$$\hat s_n^* = \max\{s^* \in (-\infty, \bar s_n^*] : \omega(s^*) > \rho\}, \qquad (18)$$

where ρ ∈ (0, 1) is a threshold. The calculation of ŝ*_n thus depends on α and ρ.
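A sketch of the selection rule (18): compute ω(s*) over a grid of candidate values and keep the largest grid value of s* whose refined band still captures more than a fraction ρ of the empirical distribution function. The band constructors and toy data below are hypothetical stand-ins, not the actual bi-s*-concave bands:

```python
def omega(s_star, data, L_band, U_band, Fn):
    """omega(s*): fraction of sample points at which the empirical d.f.
    lies inside the (nonempty) refined band for s*."""
    L, U = L_band(s_star), U_band(s_star)
    return sum(
        1 for x in data if L(x) <= U(x) and L(x) <= Fn(x) <= U(x)
    ) / len(data)

def select_s_star(s_grid, data, L_band, U_band, Fn, rho=0.95):
    """Largest s* on the grid with omega(s*) > rho (None if there is none)."""
    ok = [s for s in s_grid if omega(s, data, L_band, U_band, Fn) > rho]
    return max(ok) if ok else None

# toy illustration: true F(x) = x on [0, 1], with bands around F that
# narrow as s* grows -- hypothetical stand-ins for the refined bands
data = [(i - 0.5) / 10 for i in range(1, 11)]
Fn = lambda t: sum(1 for d in data if d <= t) / len(data)
width = lambda s: 0.3 - 0.27 * s
L_band = lambda s: (lambda x, s=s: x - width(s))
U_band = lambda s: (lambda x, s=s: x + width(s))
```

In this toy setup the widest candidate bands all capture 𝔽_n while the narrowest does not, so the rule picks an intermediate grid value.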

In the case of the CEO data, s̄*_n ≈ 0.23 for the KS initial band, and s̄*_n ≈ 0.18 for the WKS band. Taking α = 0.05 and ρ = 0.95 leads to ŝ*_n = 0.12 for both initial bands. The resulting bands are given in Figures 5 and 6. See also Figures 9-10 and 11-12 of the Supplementary file for the steps in constructing Figures 5 and 6.

Figure 5:


Confidence bands from an initial KS band for the CEO salary data. The step function in the middle is the empirical distribution function. The two gray-black lines give the KS band, and the lines in other colors are refined confidence bands under the bi-s*-concave assumption.

Figure 6:


Confidence bands from an initial WKS band for the CEO salary data. The step function in the middle is the empirical distribution function. The two gray-black lines give the WKS band, and the lines in other colors are refined confidence bands under the bi-s*-concave assumption.

We should emphasize that our current theory says little about the coverage probabilities of the bands (L^o_n(s*), U^o_n(s*)). Discussion of the consistency of ŝ*_n is beyond the scope of the present paper, but this and further issues concerning inference for both s* and F ∈ P_{s*} seem to be interesting directions for future research.

5. Summary and further problems

In this paper we have:

• Defined new classes of shape-constrained distribution functions, the bi-s*-concave classes, extending the bi-log-concave class of distribution functions defined by Dümbgen et al. (2017).

• Characterized the new classes and connected our characterization to an important parameter, the Csörgő - Révész constant associated with a distribution function F.

• Used the new bi-s*-concave classes to define refined confidence bands for distribution functions which exploit the shape constraint, thereby producing more accurate (narrower) bands with honest coverage when the shape constraint holds.

Thus we have shown that if we know the parameter s* ∈ (−∞, 1] determining the class, we can construct refined confidence bands which improve on any given nonparametric confidence band if the specified value of s* is correct. It follows from the construction of our bands that they have conservative coverage probabilities under the (null) hypothesis that the true distribution function is in P_{s*} and that s* is correctly specified.

• What if we do not know s*? Can we estimate it from the data? As becomes clear from the discussion of the CEO data via Figures 5 and 6, our methods provide one-sided confidence bounds for the true s* of the form (−∞, s̄*_n] under the assumption that F ∈ P_{s*} for some s*. It remains to develop inference methods for s* and (s*, F) jointly. It will also be of interest to have a more complete understanding of the power behavior of tests related to s̄*_n and ŝ*_n.

• The stable laws are known to be unimodal; see e.g. Hall (1984) for some history. In connection with Example 8 we have the following:

Conjecture:

The α-stable laws are s-concave with s = −1/(1 + α) for 0 < α < 2.

6. Proofs

Proof of Theorem 3:

Throughout our proof we will denote inf J(F) and sup J(F) by a and b, respectively. Moreover, we assume s* < 0 in the following proof and leave the case s* > 0 for the Appendix. Note that the case s* = 0 was proved by Dümbgen et al. (2017).

(i) implies (ii):

Suppose F ∈ P_{s*}. To prove that F is continuous on R, we first note that x ↦ F^{s*}(x) and x ↦ (1 − F(x))^{s*} are convex functions on R. By Theorem 10.1 (page 82) of Rockafellar (1970), F^{s*} and (1 − F)^{s*} are continuous on any open convex set in their effective domains. In particular, F^{s*} and (1 − F)^{s*} are continuous on (a, ∞) and (−∞, b) respectively. This implies that F is continuous on (a, ∞) and (−∞, b), or equivalently on (a, ∞) ∪ (−∞, b) = (−∞, ∞), since F is non-degenerate.

To prove that F is differentiable on J(F), note that J(F) = (a, b) since F is continuous on R. By Theorem 23.1 (page 213) of Rockafellar (1970), for any x ∈ J(F), the convexity of F^{s*} on J(F) implies the existence of the one-sided derivatives (F^{s*})'_+(x) and (F^{s*})'_−(x). Moreover, (F^{s*})'_−(x) ≤ (F^{s*})'_+(x) by Theorem 24.1 (page 227) of Rockafellar (1970). Since F = (F^{s*})^{1/s*} on J(F), the chain rule guarantees the existence of F'_±(x) with

$$F'_\pm(x) = \frac{1}{s^*}\,(F^{s^*})^{1/s^*-1}(x\pm)\,(F^{s^*})'_\pm(x).$$

Since F is continuous on J(F), it follows that

$$F'_\pm(x) = \frac{1}{s^*}\,(F^{s^*})^{1/s^*-1}(x)\,(F^{s^*})'_\pm(x).$$

Hence F'_−(x) ≥ F'_+(x), by noting that (F^{s*})'_−(x) ≤ (F^{s*})'_+(x) and s* < 0.

Similarly, one can prove F'_−(x) ≤ F'_+(x) by the convexity of (1 − F)^{s*} on J(F). Thus F'_−(x) = F'_+(x) = F'(x) for any x ∈ J(F); that is, F is differentiable on J(F). The derivative of F is denoted by f, i.e. f ≡ F'.

To prove (6), note that the convexity of x ↦ F^{s*}(x) on J(F) implies that, for any x, y ∈ J(F),

$$F^{s^*}(y) - F^{s^*}(x) \ge (y-x)\,(F^{s^*})'(x) = (y-x)\,s^*\,F^{s^*-1}(x)\,f(x),$$

or, with x₊ ≡ max{x, 0},

$$\frac{F^{s^*}(y)}{F^{s^*}(x)} \ge \Big(1 + s^*\,\frac{f(x)}{F(x)}\,(y-x)\Big)_+.$$

Hence,

$$\frac{F(y)}{F(x)} \le \Big(1 + s^*\,\frac{f(x)}{F(x)}\,(y-x)\Big)_+^{1/s^*},$$

or, equivalently,

$$F(y) \le F(x)\Big(1 + s^*\,\frac{f(x)}{F(x)}\,(y-x)\Big)_+^{1/s^*}.$$

Analogously, the convexity of x ↦ (1 − F(x))^{s*} on J(F) implies that

$$(1-F(y))^{s^*} - (1-F(x))^{s^*} \ge -(y-x)\,s^*\,(1-F(x))^{s^*-1}\,f(x),$$

or, equivalently,

$$\Big(\frac{1-F(y)}{1-F(x)}\Big)^{s^*} \ge \Big(1 - s^*\,\frac{f(x)}{1-F(x)}\,(y-x)\Big)_+,$$

which yields

$$F(y) \ge 1 - (1-F(x))\Big(1 - s^*\,\frac{f(x)}{1-F(x)}\,(y-x)\Big)_+^{1/s^*}.$$

The proof of (6) is complete.

(i) implies (iii):

Applying (6) yields that for any x, y ∈ J(F) with x < y,

$$\frac{F^{s^*}(x)}{F^{s^*}(y)} \ge 1 + s^*\,\frac{f(y)}{F(y)}\,(x-y)$$

and

$$\frac{F^{s^*}(y)}{F^{s^*}(x)} \ge 1 + s^*\,\frac{f(x)}{F(x)}\,(y-x),$$

or, equivalently,

$$F^{s^*}(x) \ge F^{s^*}(y) + s^*\,f(y)\,F^{s^*-1}(y)\,(x-y)$$

and

$$F^{s^*}(y) \ge F^{s^*}(x) + s^*\,f(x)\,F^{s^*-1}(x)\,(y-x).$$

By defining h ≡ f/F^{1−s*} on J(F), it follows that

$$F^{s^*}(x) \ge F^{s^*}(y) + s^*\,h(y)\,(x-y)$$

and

$$F^{s^*}(y) \ge F^{s^*}(x) + s^*\,h(x)\,(y-x).$$

Summing the last two inequalities, it follows that

$$F^{s^*}(x) + F^{s^*}(y) \ge F^{s^*}(y) + s^*\,h(y)\,(x-y) + F^{s^*}(x) + s^*\,h(x)\,(y-x),$$

or, equivalently,

$$0 \ge s^*\,(h(x) - h(y))\,(y-x).$$

Since s* < 0 and y − x > 0, this gives h(x) ≥ h(y); equivalently, h(·) is monotonically non-increasing on J(F).

The proof of the (non-decreasing) monotonicity of h̃ ≡ f/(1 − F)^{1−s*} is similar and hence is omitted.

(iii) implies (iv):

If (iii) holds, it immediately follows that f > 0 on J(F) = (a, b). If not, suppose that f(x₀) = 0 for some x₀ ∈ J(F). It follows that h(x₀) = f(x₀)/F^{1−s*}(x₀) = 0. Since h is monotonically non-increasing on J(F), h(x) = 0 for all x ∈ [x₀, b), or, equivalently, f = 0 on [x₀, b). Similarly, the non-decreasing monotonicity of x ↦ h̃(x) on J(F) implies that f = 0 on (a, x₀]. Then f = 0 on J(F), which violates the continuity assumption in (iii), and hence f > 0 on J(F).

To prove that f is bounded on J(F), note that the monotonicity of h and h̃ implies that, for any x, x₀ ∈ J(F),

$$f(x) = \begin{cases} F^{1-s^*}(x)\,h(x) \le h(x) \le h(x_0), & \text{if } x \ge x_0,\\ (1-F(x))^{1-s^*}\,\tilde h(x) \le \tilde h(x) \le \tilde h(x_0), & \text{if } x \le x_0. \end{cases}$$

Hence f(x) ≤ max{h(x₀), h̃(x₀)} for any x, x₀ ∈ J(F).

To prove that f is differentiable on J(F) almost everywhere, we first prove that f is Lipschitz continuous on (c, d) for any c, d ∈ J(F) with c < d.

By the non-increasing monotonicity of h on J(F), the following arguments yield an upper bound on (f(y) − f(x))/(y − x) for any x, y ∈ (c, d):

$$\begin{aligned}
\frac{f(y)-f(x)}{y-x} &= \frac{F^{1-s^*}(y)\,h(y) - F^{1-s^*}(x)\,h(x)}{y-x}
= h(y)\,\frac{F^{1-s^*}(y)-F^{1-s^*}(x)}{y-x} + F^{1-s^*}(x)\,\frac{h(y)-h(x)}{y-x}\\
&\le h(y)\,\frac{F^{1-s^*}(y)-F^{1-s^*}(x)}{y-x} = h(y)\,(1-s^*)\,f(z)\,F^{-s^*}(z),
\end{aligned}$$

where the last equality follows from the mean value theorem and z is between x and y.

Since −s* > 0, it follows that F^{−s*}(z) ≤ 1 and hence

$$\frac{f(y)-f(x)}{y-x} \le (1-s^*)\,f(z)\,h(y) \le (1-s^*)\,\max\{h(x_0), \tilde h(x_0)\}\,h(c)$$

for x, y ∈ (c, d).

Similar arguments (writing F̄ ≡ 1 − F) imply that

$$\begin{aligned}
\frac{f(y)-f(x)}{y-x} &= \frac{\bar F^{1-s^*}(y)\,\tilde h(y) - \bar F^{1-s^*}(x)\,\tilde h(x)}{y-x}
= \tilde h(y)\,\frac{\bar F^{1-s^*}(y)-\bar F^{1-s^*}(x)}{y-x} + \bar F^{1-s^*}(x)\,\frac{\tilde h(y)-\tilde h(x)}{y-x}\\
&\ge \tilde h(y)\,\frac{\bar F^{1-s^*}(y)-\bar F^{1-s^*}(x)}{y-x} = -\tilde h(y)\,(1-s^*)\,\bar F^{-s^*}(z)\,f(z)
\ge -(1-s^*)\,\max\{h(x_0), \tilde h(x_0)\}\,\tilde h(d).
\end{aligned}$$

Hence

$$\Big|\frac{f(y)-f(x)}{y-x}\Big| \le (1-s^*)\,\max\{h(x_0), \tilde h(x_0)\}\,\max\{h(c), \tilde h(d)\}.$$

The last display shows that f is Lipschitz continuous on (c, d).

By Proposition 4.1(iii) of Shorack (2017), page 82, f is absolutely continuous on (c, d), and hence f is differentiable on (c, d) almost everywhere.

Since (c, d) is an arbitrary subinterval of (a, b), f is differentiable almost everywhere on (a, b), with f′ = F″ almost everywhere.

Since f is differentiable almost everywhere, the non-increasing monotonicity of h on J(F) implies that

$$h'(x) \le 0 \quad\text{almost everywhere on } J(F),$$

or, equivalently,

$$(\log h)'(x) \le 0 \quad\text{almost everywhere on } J(F).$$

A straightforward calculation shows that the last display is equivalent to

$$\frac{f'}{f} - (1-s^*)\,\frac{f}{F} \le 0 \quad\text{almost everywhere on } J(F),$$

or,

$$f' \le (1-s^*)\,\frac{f^2}{F} \quad\text{almost everywhere on } J(F),$$

which is the right-hand side of (8).

Similarly, the non-decreasing monotonicity of h̃ implies the left-hand side of (8).

(iv) implies (i):

Since F is continuous on R, it suffices by Definition 2 to prove that F^{s*} is convex on J(F). Since we assume that F is differentiable on J(F) with derivative f = F′, the convexity of F^{s*} on J(F) follows from the non-decreasing monotonicity of (F^{s*})′ on J(F). Since f is differentiable almost everywhere on J(F), the latter follows from the non-negativity of (F^{s*})″ almost everywhere on J(F), which in turn follows from

$$(F^{s^*})''(x) = s^*\,F^{s^*-1}(x)\Big(f'(x) - (1-s^*)\,\frac{f^2(x)}{F(x)}\Big) \ge 0,$$

where f = F′ and f′ = F″. The last inequality follows from the right-hand side of (8).

Similarly, the convexity of (1 − F)^{s*} = F̄^{s*} on J(F) follows from

$$(\bar F^{s^*})''(x) = s^*\,\bar F^{s^*-1}(x)\Big(-(1-s^*)\,\frac{f^2(x)}{\bar F(x)} - f'(x)\Big) \ge 0,$$

where the last inequality follows from the left-hand side of (8). □

Proof of Proposition 1:

First some background and definitions:

  • Let a, b ≥ 0 and θ ∈ (0, 1). The generalized mean of order s ∈ R is defined by

    $$M_s(a,b;\theta) = \begin{cases} ((1-\theta)a^s + \theta b^s)^{1/s}, & \text{if } \pm s \in (0,\infty),\\ a^{1-\theta}\,b^{\theta}, & \text{if } s = 0,\\ \max\{a,b\}, & \text{if } s = \infty,\\ \min\{a,b\}, & \text{if } s = -\infty. \end{cases}$$

  • Let (M, d) be a metric space with Borel σ-field M. A measure μ on M is called t-concave if for nonempty sets A, B ∈ M and 0 < θ < 1 we have

    $$\mu_*((1-\theta)A + \theta B) \ge M_t(\mu_*(A), \mu_*(B); \theta),$$

    where μ_* is the inner measure corresponding to μ (which is needed in general in view of examples noted by Erdős and Stone (1970)).

  • A non-negative real-valued function h on (M, d) is called s-concave if for x, y ∈ M and 0 < θ < 1 we have

    $$h((1-\theta)x + \theta y) \ge M_s(h(x), h(y); \theta).$$

    See Chapter 3.3 in Dharmadhikari and Joag-Dev (1988) for more details of the definitions of Ms(a, b; θ), t-concave and s-concave.

  • Suppose (M, d) = (R^k, |·|), k-dimensional Euclidean space with the usual Euclidean metric, and suppose that f is an s-concave density function with respect to Lebesgue measure λ on B_k; consider the probability measure μ on B_k defined by

    $$\mu(B) = \int_B f\,d\lambda \quad\text{for all } B \in \mathcal B_k.$$

    Then by a theorem of Borell (1975), Brascamp and Lieb (1976) and Rinott (1976), the measure μ is s*-concave, where s* = s/(1 + ks) if s ∈ (−1/k, ∞) and s* = 0 if s = 0.

  • Here we are in the case k = 1. Thus for s ∈ (−1, ∞) the measure μ is s*-concave: for s ∈ (−1, ∞), A, B ∈ B₁, and 0 < θ < 1,

    $$\mu_*((1-\theta)A + \theta B) \ge M_{s^*}(\mu_*(A), \mu_*(B); \theta); \qquad (19)$$

    here μ_* denotes the inner measure corresponding to μ.
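The generalized mean M_s(a, b; θ) defined above is straightforward to compute. A small sketch, assuming a, b > 0 so that the s ≤ 0 cases are well defined:

```python
def generalized_mean(a, b, theta, s):
    """M_s(a, b; theta) as defined above; assumes a, b > 0 when s <= 0
    so that the powers and the geometric-mean limit are well defined."""
    if s == float("inf"):
        return max(a, b)
    if s == float("-inf"):
        return min(a, b)
    if s == 0:
        return a ** (1.0 - theta) * b ** theta   # geometric-mean limit
    return ((1.0 - theta) * a ** s + theta * b ** s) ** (1.0 / s)
```

M_s is non-decreasing in s, which is what makes the s*-concavity conditions nested across s*.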

With this preparation we can give our proof of Proposition 1: if A = (−∞, x] and B = (−∞, y] for x, y ∈ J(F), it is easily seen that

$$(1-\theta)A + \theta B = \{(1-\theta)x' + \theta y' : x' \le x,\ y' \le y\} \subset \{z : z \le (1-\theta)x + \theta y\} = (-\infty, (1-\theta)x + \theta y].$$

Therefore, with the second inequality following from (19),

$$\begin{aligned}
F((1-\theta)x+\theta y) &= \mu((-\infty, (1-\theta)x + \theta y]) \ge \mu((1-\theta)(-\infty,x] + \theta(-\infty,y])\\
&\ge M_{s^*}\big(\mu((-\infty,x]), \mu((-\infty,y]); \theta\big) = M_{s^*}(F(x), F(y); \theta);
\end{aligned}$$

i.e. F is s*-concave. Similarly, taking A = (x, ∞) and B = (y, ∞), it follows that 1 − F is s*-concave.

Note that this argument contains the case s* = 0. □

Proof of Proposition 2:

By Theorem 3, for any F ∈ P_{s*}, F is continuous on R and differentiable on J(F) with derivative f = F′. Furthermore, f is differentiable almost everywhere on J(F) with derivative f′ = F″ satisfying (8).

For any t* ≤ s*, noting that 1 − s* ≤ 1 − t* and −(1 − s*) ≥ −(1 − t*), it follows that

$$-(1-t^*)\,\frac{f^2}{1-F} \le -(1-s^*)\,\frac{f^2}{1-F} \le f' \le (1-s^*)\,\frac{f^2}{F} \le (1-t^*)\,\frac{f^2}{F}$$

almost everywhere on J(F). Hence F ∈ P_{t*} by Theorem 3. This proves (1).

To prove (2), note that for any F ∈ ⋂_{s*>0} P_{s*}, F is continuous on R and differentiable on J(F) with derivative f = F′. Furthermore, f is differentiable almost everywhere on J(F) with derivative f′ = F″ satisfying (8), i.e.

$$-(1-s^*)\,\frac{f^2}{1-F} \le f' \le (1-s^*)\,\frac{f^2}{F} \quad\text{almost everywhere on } J(F),$$

for all s* > 0. Letting s* → 0, it follows that

$$-\frac{f^2}{1-F} \le f' \le \frac{f^2}{F} \quad\text{almost everywhere on } J(F).$$

The last display is equivalent to F ∈ P₀ by Theorem 3. This proves that the left-hand side of (2) holds. Similarly, one can prove the right-hand side of (2); the details are omitted. □

Proof of Corollary 4:

To prove the right part of (11), note that (8) implies that

$$1 - s^* \ge \frac{F\,f'}{f^2} \quad\text{and}\quad 1 - s^* \ge -\frac{(1-F)\,f'}{f^2}$$

almost everywhere on J(F), or, equivalently,

$$1 - s^* \ge \max\Big\{\operatorname*{ess\,sup}_{x\in J(F)} \frac{F\,f'}{f^2},\ \operatorname*{ess\,sup}_{x\in J(F)}\Big(-\frac{(1-F)\,f'}{f^2}\Big)\Big\}.$$

Recognizing ess sup_{x∈J(F)} F f′/f² and ess sup_{x∈J(F)} −(1 − F)f′/f² as C̃R(F) and C̃R(F̄), it follows that

$$1 - s^* \ge \max\{\widetilde{CR}(F), \widetilde{CR}(\bar F)\} = \bar\gamma(F).$$

One can prove the left two inequalities of (11) by the following arguments:

$$\begin{aligned}
\bar\gamma(F) &= \max\{\widetilde{CR}(F), \widetilde{CR}(\bar F)\}
= \max\Big\{\operatorname*{ess\,sup}_{x\in J(F)} \frac{F(x)\,f'(x)}{f(x)^2},\ \operatorname*{ess\,sup}_{x\in J(F)} \frac{-(1-F(x))\,f'(x)}{f(x)^2}\Big\}\\
&= \max\Big\{\operatorname*{ess\,sup}_{x\in J(F)} \frac{F(x)\,|f'(x)|}{f(x)^2}\,1[f'(x)\ge 0],\ \operatorname*{ess\,sup}_{x\in J(F)} \frac{(1-F(x))\,|f'(x)|}{f(x)^2}\,1[f'(x)\le 0]\Big\}\\
&\ge \max\Big\{\operatorname*{ess\,sup}_{x\in J(F)} \frac{F(x)(1-F(x))\,|f'(x)|}{f(x)^2}\,1[f'(x)\ge 0],\ \operatorname*{ess\,sup}_{x\in J(F)} \frac{F(x)(1-F(x))\,|f'(x)|}{f(x)^2}\,1[f'(x)\le 0]\Big\}\\
&= \operatorname*{ess\,sup}_{x\in J(F)} \frac{F(x)(1-F(x))\,|f'(x)|}{f(x)^2} = \gamma(F) \ge \tilde\gamma(F),
\end{aligned}$$

where the first inequality holds since u ∧ (1 − u) ≥ u(1 − u) for 0 ≤ u ≤ 1. □

Proof of Corollary 5:

Note that for s* < 0 and y > −1, we have (1 + y)^{s*} ≥ 1 + s*y. Replacing y by −F(x), where x ∈ J(F), it follows that

$$(1 - F(x))^{s^*} \ge 1 - s^*\,F(x),$$

or, by rearranging,

$$F(x) \le \frac{1}{s^*}\big(1 - (1-F(x))^{s^*}\big) = F_U(x),$$

where F_U is a convex function on J(F) if F ∈ P_{s*}. This proves the right-hand side of (12) for s* < 0. Similarly, replacing y by −(1 − F(x)), where x ∈ J(F), and rearranging terms, it follows that

$$F(x) \ge \frac{1}{s^*}\big(F^{s^*}(x) - (1 - s^*)\big) = F_L(x),$$

which proves the left-hand side of (12) for s* < 0.

Similarly, for 1 ≥ s* > 0 and y > −1, we have (1 + y)^{s*} ≤ 1 + s*y. Replacing y by −F(x), where x ∈ J(F), it follows that

$$(1 - F(x))^{s^*} \le 1 - s^*\,F(x),$$

or, by rearranging,

$$F(x) \le \frac{1}{s^*}\big(1 - (1-F(x))^{s^*}\big) = F_U(x),$$

where F_U is a convex function on J(F) if F ∈ P_{s*}. This proves the right-hand side of (12) for s* > 0.

Similarly, replacing y by −(1 − F(x)), where x ∈ J(F), and rearranging terms, it follows that

$$F(x) \ge \frac{1}{s^*}\big(F^{s^*}(x) - (1 - s^*)\big) = F_L(x),$$

which proves the left-hand side of (12) for s* > 0. □
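The envelopes F_L and F_U in Corollary 5 follow from the elementary comparison of (1 + y)^{s*} with 1 + s*y, so the sandwich F_L ≤ F ≤ F_U can be verified pointwise at the level of u = F(x) ∈ (0, 1). A quick numerical check (the grids and tolerances are illustrative):

```python
def F_L(u, s):
    """Lower envelope evaluated at a point where F(x) = u, s* = s != 0."""
    return (u ** s - (1.0 - s)) / s

def F_U(u, s):
    """Upper envelope evaluated at a point where F(x) = u, s* = s != 0."""
    return (1.0 - (1.0 - u) ** s) / s

# check the sandwich F_L(u) <= u <= F_U(u) for negative and positive s*
for s in (-2.0, -0.5, 0.5, 1.0):
    for i in range(1, 100):
        u = i / 100.0
        assert F_L(u, s) <= u + 1e-12
        assert u <= F_U(u, s) + 1e-12
```

At s* = 1 both envelopes collapse to u itself, consistent with the bounds becoming equalities.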

Proof of Lemma 6:

If there is no G ∈ P_{s*} fitting in between L_n and U_n, then L^o_n ≡ 1 and U^o_n ≡ 0, and the assertions in both (i) and (ii) are trivial. In the following proof, we let G ∈ P_{s*} be such that L_n ≤ G ≤ U_n.

(i) It suffices to prove that for any xJ(G) the density function g = G′ satisfies g(x) ≤ max{γ1, γ2}, because this is equivalent to Lipschitz-continuity of G with the latter constant, and this property carries over to the pointwise infimum Lno and supremum Uno.

To prove g(x) ≤ max{γ₁, γ₂}, note that since g/G^{1−s*} is monotonically non-increasing on J(G) (see Theorem 3(iii)), it follows that for x ≥ b,

$$\frac{g(x)}{G^{1-s^*}(x)} \le \frac{g(b)}{G^{1-s^*}(b)} = \Big(\frac{1}{s^*}G^{s^*}\Big)'(b) \le \frac{\frac{1}{s^*}G^{s^*}(b) - \frac{1}{s^*}G^{s^*}(a)}{b-a} \le \frac{\frac{1}{s^*}\,(v^{s^*} - u^{s^*})}{b-a} = \gamma_1.$$

The last inequality follows from noting that x ↦ (1/s*)x^{s*} is a monotonically non-decreasing function for all s* ≠ 0, G(b) ≤ U_n(b) ≤ v, and G(a) ≥ L_n(a) ≥ u. Hence

$$g(x) \le G^{1-s^*}(x)\,\gamma_1 \le \gamma_1 \quad\text{for } x \ge b.$$

Similarly, by noting that g/(1 − G)^{1−s*} is monotonically non-decreasing on J(G) (see Theorem 3(iii)), it follows that for x ≤ a,

$$\frac{g(x)}{(1-G(x))^{1-s^*}} \le \frac{g(a)}{(1-G(a))^{1-s^*}} = \Big(-\frac{1}{s^*}(1-G)^{s^*}\Big)'(a) \le \frac{\frac{1}{s^*}(1-G(a))^{s^*} - \frac{1}{s^*}(1-G(b))^{s^*}}{b-a} \le \frac{-\frac{1}{s^*}\big((1-v)^{s^*} - (1-u)^{s^*}\big)}{b-a} = \gamma_2.$$

The last inequality follows from noting that x ↦ −(1/s*)(1 − x)^{s*} is a monotonically non-decreasing function for all s* ≠ 0, G(b) ≤ v, and G(a) ≥ u. Hence

$$g(x) \le (1-G(x))^{1-s^*}\,\gamma_2 \le \gamma_2 \quad\text{for } x \le a.$$

For a < x < b, we analogously obtain the two inequalities

$$g(x) = G^{1-s^*}(x)\,\frac{g(x)}{G^{1-s^*}(x)} \le G^{1-s^*}(x)\,\frac{\frac{1}{s^*}G^{s^*}(x) - \frac{1}{s^*}G^{s^*}(a)}{x-a} = \frac{1}{s^*}\,\frac{G(x) - G^{s^*}(a)\,G^{1-s^*}(x)}{x-a}$$

and

$$g(x) = (1-G(x))^{1-s^*}\,\frac{g(x)}{(1-G(x))^{1-s^*}} \le (1-G(x))^{1-s^*}\,\frac{\frac{1}{s^*}(1-G(x))^{s^*} - \frac{1}{s^*}(1-G(b))^{s^*}}{b-x} = \frac{1}{s^*}\,\frac{(1-G(x)) - (1-G(b))^{s^*}(1-G(x))^{1-s^*}}{b-x}.$$

Multiplying the former inequality by (x − a) and the latter by (b − x) and adding yields

$$g(x) \le \frac{1}{s^*}\,\frac{1 - G^{s^*}(a)\,G^{1-s^*}(x) - (1-G(b))^{s^*}(1-G(x))^{1-s^*}}{b-a} = \frac{h(G(x))}{b-a},$$

where

$$h(y) \equiv \frac{1}{s^*}\Big(1 - G^{s^*}(a)\,y^{1-s^*} - (1-G(b))^{s^*}\,(1-y)^{1-s^*}\Big) \quad\text{for } y\in(0,1).$$

Since

$$h''(y) = (1-s^*)\Big(G^{s^*}(a)\,y^{-s^*-1} + (1-G(b))^{s^*}\,(1-y)^{-s^*-1}\Big) \ge 0,$$

it follows that h(y) is convex on (0, 1) and hence

$$g(x) \le \frac{\max_{y\in\{G(a),\,G(b)\}} h(y)}{b-a} = \max\Big\{\frac{h(G(a))}{b-a},\ \frac{h(G(b))}{b-a}\Big\}.$$

Note that

$$\frac{h(G(a))}{b-a} = (1-G(a))^{1-s^*}\,\frac{\frac{1}{s^*}(1-G(a))^{s^*} - \frac{1}{s^*}(1-G(b))^{s^*}}{b-a} \le \gamma_2$$

and

$$\frac{h(G(b))}{b-a} = G^{1-s^*}(b)\,\frac{\frac{1}{s^*}G^{s^*}(b) - \frac{1}{s^*}G^{s^*}(a)}{b-a} \le \gamma_1.$$

Hence g(x) ≤ max{γ₁, γ₂} for a < x < b.

(ii) By Theorem 3(ii), it follows that for x ≤ a,

$$G(x) \le G(a)\Big(1 + s^*\,\frac{g(a)}{G(a)}\,(x-a)\Big)_+^{1/s^*} = \Big(G^{s^*}(a) + s^*\,g(a)\,G^{s^*-1}(a)\,(x-a)\Big)_+^{1/s^*}.$$

By Theorem 3(iii), the non-increasing monotonicity of g/G^{1−s*} implies that

$$\frac{g(a)}{G^{1-s^*}(a)} = \Big(\frac{1}{s^*}G^{s^*}\Big)'(a) \ge \frac{\frac{1}{s^*}G^{s^*}(b) - \frac{1}{s^*}G^{s^*}(a)}{b-a} \ge \frac{\frac{1}{s^*}v^{s^*} - \frac{1}{s^*}u^{s^*}}{b-a} = \gamma_1.$$

The last inequality follows from noting that G(a) ≤ U_n(a) ≤ u and G(b) ≥ L_n(b) ≥ v. Since x − a ≤ 0, it follows that

$$G(x) \le \Big(G^{s^*}(a) + s^*\,g(a)\,G^{s^*-1}(a)\,(x-a)\Big)_+^{1/s^*} \le \big(G^{s^*}(a) + s^*\,\gamma_1\,(x-a)\big)_+^{1/s^*} \le \big(u^{s^*} + s^*\,\gamma_1\,(x-a)\big)_+^{1/s^*}.$$

The last inequality follows from noting that G(a) ≤ u.

On the other hand, by Theorem 3(ii), it follows that for x ≥ b,

$$1 - G(x) \le (1-G(b))\Big(1 - s^*\,\frac{g(b)}{1-G(b)}\,(x-b)\Big)_+^{1/s^*} = \Big((1-G(b))^{s^*} - s^*\,g(b)\,(1-G(b))^{s^*-1}\,(x-b)\Big)_+^{1/s^*} \le \Big((1-v)^{s^*} - s^*\,g(b)\,(1-G(b))^{s^*-1}\,(x-b)\Big)_+^{1/s^*}.$$

The last inequality follows from noting that 1 − G(b) ≤ 1 − v. By Theorem 3(iii), the non-decreasing monotonicity of g/(1 − G)^{1−s*} implies that

$$\frac{g(b)}{(1-G(b))^{1-s^*}} = \Big(-\frac{1}{s^*}(1-G)^{s^*}\Big)'(b) \ge \frac{\frac{1}{s^*}(1-G(a))^{s^*} - \frac{1}{s^*}(1-G(b))^{s^*}}{b-a} \ge \frac{\frac{1}{s^*}\big((1-u)^{s^*} - (1-v)^{s^*}\big)}{b-a} = \gamma_2.$$

The last inequality follows from noting that G(a) ≤ U_n(a) ≤ u and G(b) ≥ L_n(b) ≥ v. Since x − b ≥ 0, it follows that

$$1 - G(x) \le \big((1-v)^{s^*} - s^*\,\gamma_2\,(x-b)\big)_+^{1/s^*}. \qquad\square$$

Proof of Theorem 7:

The following proof is analogous to the proof of Theorem 3 in Dümbgen et al. (2017), in which they proved the result in the case s* = 0. In the following proof we assume that s* ≠ 0.

(i) Suppose s* > 0. Since F is not bi-s*-concave, it follows that F^{s*} or (1 − F)^{s*} is not concave. Without loss of generality, we assume that F^{s*} is not concave, and hence there exist real numbers x₀ < x₁ < x₂ such that F^{s*}(x₁) < (1 − λ)F^{s*}(x₀) + λF^{s*}(x₂), where λ ≡ (x₁ − x₀)/(x₂ − x₀) ∈ (0, 1). By the consistency of L_n and U_n, it follows that, with probability tending to one, U_n^{s*}(x₁) < (1 − λ)L_n^{s*}(x₀) + λL_n^{s*}(x₂) and hence

$$G^{s^*}(x_1) < (1-\lambda)\,G^{s^*}(x_0) + \lambda\,G^{s^*}(x_2),$$

for any G such that L_n ≤ G ≤ U_n. Therefore, there are no bi-s*-concave distribution functions fitting between L_n and U_n, and hence L^o_n ≡ 1 and U^o_n ≡ 0 with probability tending to one.

The proof of the case s* < 0 is similar and hence is omitted.

(ii) Suppose F ∈ P_{s*}. Note that since (L_n, U_n) is a (1 − α) confidence band for F, it follows that P(L^o_n ≤ U^o_n) ≥ P(L_n ≤ F ≤ U_n) ≥ 1 − α.

If {G ∈ P_{s*} : L_n ≤ G ≤ U_n} is empty, then L^o_n ≡ 1 and U^o_n ≡ 0 and the assertions are trivial. In the following proof, we assume that {G ∈ P_{s*} : L_n ≤ G ≤ U_n} is not empty.

To prove (13), we first prove that ‖L_n − F‖ →_p 0 and ‖U_n − F‖ →_p 0. By the continuity of F, for any integer m ≥ 2 there exist real numbers {x_i}_{i=1}^{m−1} such that F(x_i) = i/m, i = 1, …, m − 1. Furthermore, define x₀ = −∞ and x_m = ∞.

By the non-decreasing monotonicity of L_n and F, it follows that for x ∈ [x_{i−1}, x_i],

$$L_n(x) - F(x) \le L_n(x_i) - F(x_{i-1}) = L_n(x_i) - \Big(F(x_i) - \frac{1}{m}\Big) = L_n(x_i) - F(x_i) + \frac{1}{m}$$

and

$$L_n(x) - F(x) \ge L_n(x_{i-1}) - F(x_i) = L_n(x_{i-1}) - \Big(F(x_{i-1}) + \frac{1}{m}\Big) = L_n(x_{i-1}) - F(x_{i-1}) - \frac{1}{m}.$$

Hence

$$|L_n(x) - F(x)| \le \max_{i=1,\dots,m-1} |L_n(x_i) - F(x_i)| + \frac{1}{m}$$

for x ∈ [x_{i−1}, x_i]. Noting that

$$\|L_n - F\| = \sup_{x\in\mathbb R} |L_n(x) - F(x)| = \max_{i=1,\dots,m}\ \sup_{x\in[x_{i-1},x_i]} |L_n(x) - F(x)|,$$

it follows that

$$\|L_n - F\| \le \max_{i=1,\dots,m-1} |L_n(x_i) - F(x_i)| + \frac{1}{m},$$

and hence pointwise convergence (in probability) implies uniform convergence. An analogous proof shows that ‖U_n − F‖ →_p 0 and is omitted.

Combining ‖L_n − F‖ →_p 0 and ‖U_n − F‖ →_p 0 implies that

$$\sup_{G\in\mathcal P_{s^*}:\, L_n \le G \le U_n} \|G - F\| \le \|L_n - F\| + \|U_n - F\| \to_p 0.$$

To prove (14) in the case h_G = (G^{s*})′, it suffices to prove that

$$\sup_{G\in\mathcal P_{s^*}:\,L_n\le G\le U_n} \big\|(G^{s^*}/s^*)' - (F^{s^*}/s^*)'\big\|_K \to_p 0. \qquad (20)$$

Note that h_G/s* = G′/G^{1−s*}. Since K is a compact interval in J(F) and h_F/s* = f/F^{1−s*} is continuous and non-increasing on J(F), for any fixed ϵ > 0 there exist points a₀ < a₁ < ⋯ < a_m < a_{m+1} in J(F) such that K ⊂ [a₁, a_m] and

$$0 \le \frac{1}{s^*}h_F(a_{i-1}) - \frac{1}{s^*}h_F(a_i) \le \epsilon \quad\text{for } 1 \le i \le m+1.$$

For G ∈ P_{s*} with L_n ≤ G ≤ U_n, it follows from the monotonicity of h_F/s* and h_G/s* that

$$\begin{aligned}
\sup_{x\in K}\Big(\frac{1}{s^*}h_G(x) - \frac{1}{s^*}h_F(x)\Big)
&\le \max_{i=1,\dots,m-1}\Big(\frac{1}{s^*}h_G(a_i) - \frac{1}{s^*}h_F(a_{i+1})\Big)\\
&\le \max_{i=1,\dots,m-1}\Big(\frac{\frac{1}{s^*}G^{s^*}(a_i) - \frac{1}{s^*}G^{s^*}(a_{i-1})}{a_i - a_{i-1}} - \frac{1}{s^*}h_F(a_{i+1})\Big)\\
&\le \max_{i=1,\dots,m-1}\Big(\frac{\frac{1}{s^*}U_n^{s^*}(a_i) - \frac{1}{s^*}L_n^{s^*}(a_{i-1})}{a_i - a_{i-1}} - \frac{1}{s^*}h_F(a_{i+1})\Big)\\
&= \max_{i=1,\dots,m-1}\Big(\frac{\frac{1}{s^*}F^{s^*}(a_i) - \frac{1}{s^*}F^{s^*}(a_{i-1})}{a_i - a_{i-1}} - \frac{1}{s^*}h_F(a_{i+1})\Big) + o_p(1)\\
&\le \max_{i=1,\dots,m-1}\Big(\frac{1}{s^*}h_F(a_{i-1}) - \frac{1}{s^*}h_F(a_{i+1})\Big) + o_p(1) \le 2\epsilon + o_p(1).
\end{aligned}$$

Analogously,

$$\sup_{x\in K}\Big(\frac{1}{s^*}h_F(x) - \frac{1}{s^*}h_G(x)\Big) \le \max_{i=1,\dots,m-1}\Big(\frac{1}{s^*}h_F(a_i) - \frac{1}{s^*}h_F(a_{i+2})\Big) + o_p(1) \le 2\epsilon + o_p(1).$$

Since ϵ > 0 is arbitrary, this shows that (20) holds.

The proof of (14) in the case h_G = ((1 − G)^{s*})′ is similar and hence is omitted.

Since G′ = G^{1−s*}(G^{s*}/s*)′, it follows from (20) that (14) holds in the case h_G = G′.

Finally, let x₁ < sup J(F) and b₁ < f(x₁)/F^{1−s*}(x₁). As in the proof of Lemma 6(ii), an analogous argument implies that for any x₁′ ∈ J(F) with x₁′ > x₁,

$$U_n^o(x) \le \Big(U_n^{s^*}(x_1) + s^*\,\frac{\frac{1}{s^*}L_n^{s^*}(x_1') - \frac{1}{s^*}U_n^{s^*}(x_1)}{x_1' - x_1}\,(x - x_1)\Big)_+^{1/s^*}$$

for all x ≤ x₁.

Note that by the consistency of L_n and U_n, and taking x₁′ sufficiently close to x₁, it follows that

$$\frac{\frac{1}{s^*}L_n^{s^*}(x_1') - \frac{1}{s^*}U_n^{s^*}(x_1)}{x_1' - x_1} \to_p \frac{\frac{1}{s^*}F^{s^*}(x_1') - \frac{1}{s^*}F^{s^*}(x_1)}{x_1' - x_1} > b_1.$$

Hence with probability tending to one,

$$U_n^o(x) \le \big(U_n^{s^*}(x_1) + s^*\,b_1\,(x - x_1)\big)_+^{1/s^*}$$

for all x ≤ x₁. The proof of (16) is similar and hence is omitted. □

Proof of Remark 1:

(i) By Theorem 3(ii), if s* > 0 and inf J(F) = −∞, it follows that for arbitrary x ∈ J(F),

$$F(y) \le F(x)\Big(1 + s^*\,\frac{f(x)}{F(x)}\,(y-x)\Big)_+^{1/s^*} = 0$$

for y small enough that

$$1 + s^*\,\frac{f(x)}{F(x)}\,(y-x) < 0.$$

This violates the assumption that inf J(F) = −∞, and hence inf J(F) > −∞. The finiteness of sup J(F) can be proved similarly and hence is omitted.

(ii) We first note that (9) holds automatically if inf J(F) > −∞ and sup J(F) < ∞.

In the following proof, we focus on the case that inf J(F) = −∞ and sup J(F) < ∞. To prove (9), it suffices to show that ∫|x|^t dF(x) is finite for t ∈ (0, −1/s*). Note that

$$\int |x|^t\,dF(x) = E|X|^t = \int_0^\infty P(|X|^t > a)\,da = \int_0^\infty P(|X| > a^{1/t})\,da = \int_0^\infty t\,a^{t-1}\,P(|X| > a)\,da = \int_0^\infty t\,a^{t-1}\,P(X > a)\,da + \int_0^\infty t\,a^{t-1}\,P(X < -a)\,da.$$

Since sup J(F) is finite, the first term of the last display is finite, and hence it suffices to prove that a ↦ t a^{t−1}P(X < −a) is integrable for t < −1/s*.

It follows from Theorem 3(ii) that for all sufficiently large a and x ∈ J(F),

$$P(X < -a) \le F(x)\Big(1 + s^*\,\frac{f(x)\,(-a-x)}{F(x)}\Big)_+^{1/s^*} = F(x)\Big(1 - s^*\,\frac{f(x)\,(a+x)}{F(x)}\Big)_+^{1/s^*}.$$

Thus t a^{t−1}P(X < −a) is integrable for t < −1/s*, since

$$t\,a^{t-1}\,P(X < -a) \le t\,F(x)\,a^{t-1}\Big(1 - s^*\,\frac{f(x)(a+x)}{F(x)}\Big)_+^{1/s^*} = t\,F(x)\Big(\frac{-s^*f(x)}{F(x)}\Big)^{1/s^*} a^{t-1}\Big(a + x + \frac{F(x)}{-s^*f(x)}\Big)_+^{1/s^*} \le 2^{-1/s^*}\,t\,F(x)\Big(\frac{-s^*f(x)}{F(x)}\Big)^{1/s^*} a^{t+1/s^*-1}$$

for all sufficiently large a, and a^{t+1/s*−1} is integrable over [1, ∞) for t < −1/s*.

For other cases, the proof is similar and hence is omitted. □

Proof of Corollary 8:

Suppose that x₀ is a point in J(F). Notice that for any z ∈ R,

$$\phi(z) - \phi(x_0) = \int_{\mathbb R}\big(1[x_0 \le x < z] - 1[z \le x < x_0]\big)\,\phi'(x)\,dx,$$

and hence by Fubini's theorem, it follows that

$$\int_{\mathbb R}\phi\,dG = \phi(x_0) + \int_{\mathbb R}\phi'(x)\,\big(1[x \ge x_0] - G(x)\big)\,dx, \qquad (21)$$

provided that

$$\int_{\mathbb R}|\phi'(x)|\,\big|1[x \ge x_0] - G(x)\big|\,dx < \infty.$$

To prove the last display, note that for any b₁ ∈ (0, T₁(F)) and b₂ ∈ (0, T₂(F)), there exist points x₁, x₂ ∈ J(F) with x₁ ≤ x₀ ≤ x₂ and

$$\frac{f}{F^{1-s^*}}(x_1) > b_1, \qquad \frac{f}{(1-F)^{1-s^*}}(x_2) > b_2.$$

Then it follows from Theorem 7(ii) that with probability tending to one,

$$U_n^o(x) \le \big(U_n^{s^*}(x_1) + s^*\,b_1\,(x - x_1)\big)_+^{1/s^*} \quad\text{for } x \le x_1,$$

and

$$1 - L_n^o(x) \le \big((1 - L_n(x_2))^{s^*} - s^*\,b_2\,(x - x_2)\big)_+^{1/s^*} \quad\text{for } x \ge x_2.$$

Hence for any c > max{|x₁|, |x₂|}, it follows that

$$\int_{-\infty}^{x_1-c} |\phi'(x)|\,\big|1[x\ge x_0] - G(x)\big|\,dx = \int_{-\infty}^{x_1-c}|\phi'(x)|\,G(x)\,dx \le \int_{-\infty}^{x_1-c}|\phi'(x)|\,U_n^o(x)\,dx \le \int_{-\infty}^{x_1-c}|\phi'(x)|\,\big(U_n^{s^*}(x_1) + s^*b_1(x-x_1)\big)_+^{1/s^*}\,dx = \int_{-\infty}^{x_1-c}|\phi'(x)|\,\big(U_n^{s^*}(x_1) + s^*b_1(x-x_1)\big)^{1/s^*}\,dx.$$

Since |φ′(x)| ≤ a|x|^{k−1}, it follows that the last display is no larger than

$$\int_{-\infty}^{x_1-c} a\,|x|^{k-1}\,\big(U_n^{s^*}(x_1) + s^*b_1(x-x_1)\big)^{1/s^*}\,dx,$$

which is finite by noting that k − 1 + 1/s* < −1. Analogously, one can prove that for c > max{|x₁|, |x₂|},

$$\int_{x_2+c}^{\infty}|\phi'(x)|\,\big|1[x\ge x_0] - G(x)\big|\,dx \le \int_{x_2+c}^{\infty}|\phi'(x)|\,\big|1 - L_n^o(x)\big|\,dx < \infty.$$

Since φ′ is continuous on R, it follows that for any c > max{|x₁|, |x₂|},

$$\int_{x_1-c}^{x_2+c}|\phi'(x)|\,\big|1[x\ge x_0] - G(x)\big|\,dx < \infty,$$

and hence

$$\int_{\mathbb R}|\phi'(x)|\,\big|1[x\ge x_0] - G(x)\big|\,dx < \infty.$$

By (21), it follows that

$$\sup_{G:\,L_n^o\le G\le U_n^o}\Big|\int\phi\,dG - \int\phi\,dF\Big| = \sup_{G:\,L_n^o\le G\le U_n^o}\Big|\int\phi'(x)\,(F-G)(x)\,dx\Big|,$$

which is not larger than

$$\sup_{G:\,L_n^o\le G\le U_n^o}\|G-F\|\int_{x_1-c}^{x_2+c}|\phi'(x)|\,dx + \int_{-\infty}^{x_1-c}|\phi'(x)|\,(F + U_n^o)(x)\,dx + \int_{x_2+c}^{\infty}|\phi'(x)|\,(1-F + 1-L_n^o)(x)\,dx \le o_p(1) + 2\int_{-\infty}^{x_1-c}|\phi'(x)|\,U_n^o(x)\,dx + 2\int_{x_2+c}^{\infty}|\phi'(x)|\,(1-L_n^o(x))\,dx.$$

Note that the last two terms go to zero as c goes to infinity by their integrability, and hence

$$\sup_{G:\,L_n^o\le G\le U_n^o}\Big|\int\phi\,dG - \int\phi\,dF\Big| = o_p(1). \qquad\square$$

Proof of Theorem 9: It follows from the proof of Corollary 8 that

$$\sup_{G:\,L_n^o\le G\le U_n^o}\Big|\int\phi\,dG - \int\phi\,dF\Big| = \sup_{G:\,L_n^o\le G\le U_n^o}\Big|\int\phi'(x)\,(F-G)(x)\,dx\Big|,$$

and hence

$$\sup_{G:\,L_n^o\le G\le U_n^o}\Big|\int\phi\,dG - \int\phi\,dF\Big| \le \sup_{G:\,L_n^o\le G\le U_n^o}\int|\phi'(x)|\,|(G-F)(x)|\,dx.$$

It suffices to bound |G − F| on R for G between L^o_n and U^o_n.

It follows from G ≤ U^o_n ≤ U_n and Condition (*) that on the interval {λn^{−1/(2−2γ)} ≤ 𝔽_n ≤ 1 − λn^{−1/(2−2γ)}},

$$G - F \le U_n^o - F \le U_n - F \le |U_n - \mathbb F_n| + |\mathbb F_n - F| \le \kappa\,n^{-1/2}\,(\mathbb F_n(1-\mathbb F_n))^{\gamma} + |\mathbb F_n - F|.$$

To bound 𝔽_n − F, it follows from Theorem 3.7.1, page 141, of Shorack and Wellner (2009) that

$$\Big\|\frac{\sqrt n\,(\mathbb F_n - F) - \mathbb U\circ F}{(F(1-F))^{\gamma}}\Big\| \to_p 0,$$

by verifying that q(t) = (t(1 − t))^γ with 0 ≤ γ < 1/2 is monotonically increasing on [0, 1/2], symmetric about 1/2, and satisfies ∫₀¹ q^{−2}(t) dt < ∞; here 𝕌 is a Brownian bridge on [0, 1].
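The integrability condition on the weight function is easy to check numerically: q^{−2}(t) = (t(1 − t))^{−2γ} is integrable on (0, 1) exactly when γ < 1/2, and for γ = 1/4 the integral equals B(1/2, 1/2) = π. A crude midpoint-rule check (the grid size and tolerance are illustrative):

```python
import math

def q_inv_sq_integral(gamma, n=200000):
    """Midpoint rule for the integral of (t(1-t))^(-2*gamma) over (0, 1)."""
    h = 1.0 / n
    return h * sum(
        ((i + 0.5) * h * (1.0 - (i + 0.5) * h)) ** (-2.0 * gamma)
        for i in range(n)
    )

# gamma = 1/4: integral of (t(1-t))^(-1/2) over (0, 1) is B(1/2, 1/2) = pi
assert abs(q_inv_sq_integral(0.25) - math.pi) < 1e-2
```

For γ ≥ 1/2 the integrand is no longer integrable near 0 and 1, which is why the theorem restricts to γ ∈ [0, 1/2).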

Hence for any fixed ϵ ∈ (0, 1) there exists a constant κ_ϵ > 0 such that with probability at least 1 − ϵ,

$$|\mathbb F_n - F| \le \kappa_\epsilon\,n^{-1/2}\,(F(1-F))^{\gamma}$$

on R. Thus, it follows that on the interval {λn^{−1/(2−2γ)} ≤ 𝔽_n ≤ 1 − λn^{−1/(2−2γ)}},

$$G - F \le \kappa\,n^{-1/2}\,(\mathbb F_n(1-\mathbb F_n))^{\gamma} + \kappa_\epsilon\,n^{-1/2}\,(F(1-F))^{\gamma}.$$

To bound 𝔽_n(1 − 𝔽_n) by F(1 − F), note that

$$\begin{aligned}
\mathbb F_n(1-\mathbb F_n) &= (\mathbb F_n - F + F)(1 - F + F - \mathbb F_n)\\
&= F(1-F) + (\mathbb F_n - F)(1 - 2F) - (\mathbb F_n - F)^2\\
&\le F(1-F) + |\mathbb F_n - F|\,|1 - 2F| + |\mathbb F_n - F|^2\\
&\le F(1-F) + 2\,|\mathbb F_n - F| = F(1-F)\Big(1 + \frac{2\,|\mathbb F_n - F|}{F(1-F)}\Big)\\
&\le F(1-F)\Big(1 + \frac{4\,|\mathbb F_n - F|}{\min\{F, 1-F\}}\Big) \qquad\text{since } F(1-F) \ge \min\{F, 1-F\}/2\\
&\le F(1-F)\Big(1 + \frac{4\,\kappa_\epsilon\,n^{-1/2}\,(F(1-F))^{\gamma}}{\min\{F, 1-F\}}\Big).
\end{aligned}$$

For a constant λ_ϵ > 0 to be specified later, it follows from λ_ϵn^{−1/(2−2γ)} ≤ F ≤ 1 − λ_ϵn^{−1/(2−2γ)} and γ ∈ [0, 1/2) that

$$\frac{(F(1-F))^{\gamma}}{F} = F^{\gamma-1}(1-F)^{\gamma} \le \lambda_\epsilon^{\gamma-1}\,n^{-(\gamma-1)/(2-2\gamma)} = \lambda_\epsilon^{\gamma-1}\,n^{1/2}$$

and

$$\frac{(F(1-F))^{\gamma}}{1-F} = F^{\gamma}(1-F)^{\gamma-1} \le \lambda_\epsilon^{\gamma-1}\,n^{-(\gamma-1)/(2-2\gamma)} = \lambda_\epsilon^{\gamma-1}\,n^{1/2}.$$

Hence

$$\mathbb F_n(1-\mathbb F_n) \le F(1-F)\big(1 + 4\,\kappa_\epsilon\,n^{-1/2}\,\lambda_\epsilon^{\gamma-1}\,n^{1/2}\big) = F(1-F)\big(1 + 4\,\kappa_\epsilon\,\lambda_\epsilon^{\gamma-1}\big).$$

Thus, on the intersection

$$\{\lambda n^{-1/(2-2\gamma)} \le \mathbb F_n \le 1 - \lambda n^{-1/(2-2\gamma)}\} \cap \{\lambda_\epsilon n^{-1/(2-2\gamma)} \le F \le 1 - \lambda_\epsilon n^{-1/(2-2\gamma)}\},$$

$$G - F \le \kappa\,n^{-1/2}\big(F(1-F)(1 + 4\kappa_\epsilon\lambda_\epsilon^{\gamma-1})\big)^{\gamma} + \kappa_\epsilon\,n^{-1/2}\,(F(1-F))^{\gamma} = \nu_\epsilon\,n^{-1/2}\,(F(1-F))^{\gamma},$$

where ν_ϵ = κ(1 + 4κ_ϵλ_ϵ^{γ−1})^γ + κ_ϵ.

The following arguments show that for λ_ϵ large enough, the interval {λ_ϵn^{−1/(2−2γ)} ≤ F ≤ 1 − λ_ϵn^{−1/(2−2γ)}} is a subset of {λn^{−1/(2−2γ)} ≤ 𝔽_n ≤ 1 − λn^{−1/(2−2γ)}}.

To see this, note that

$$\mathbb F_n = F + \mathbb F_n - F \ge F\Big(1 - \frac{|\mathbb F_n - F|}{F}\Big) \ge F\big(1 - \kappa_\epsilon n^{-1/2}F^{\gamma-1}(1-F)^{\gamma}\big) \ge F\big(1 - \kappa_\epsilon n^{-1/2}\lambda_\epsilon^{\gamma-1}n^{1/2}\big) \ge \big(\lambda_\epsilon - \kappa_\epsilon\lambda_\epsilon^{\gamma}\big)\,n^{-1/(2-2\gamma)}$$

and analogously,

$$1 - \mathbb F_n \ge \big(\lambda_\epsilon - \kappa_\epsilon\lambda_\epsilon^{\gamma}\big)\,n^{-1/(2-2\gamma)};$$

it follows that, by choosing λ_ϵ large enough that λ_ϵ − κ_ϵλ_ϵ^γ > λ, the interval {λ_ϵn^{−1/(2−2γ)} ≤ F ≤ 1 − λ_ϵn^{−1/(2−2γ)}} is a subset of {λn^{−1/(2−2γ)} ≤ 𝔽_n ≤ 1 − λn^{−1/(2−2γ)}}, and hence on the interval {λ_ϵn^{−1/(2−2γ)} ≤ F ≤ 1 − λ_ϵn^{−1/(2−2γ)}},

$$G - F \le \nu_\epsilon\,n^{-1/2}\,(F(1-F))^{\gamma}.$$

Define x_{n1} and x_{n2} such that F(x_{n1}) = λ_ϵn^{−1/(2−2γ)} and F(x_{n2}) = 1 − λ_ϵn^{−1/(2−2γ)}. Analogously, one can prove that F − G ≤ ν_ϵn^{−1/2}(F(1 − F))^γ on [x_{n1}, x_{n2}] and hence

$$|G - F| \le \nu_\epsilon\,n^{-1/2}\,(F(1-F))^{\gamma} \qquad (22)$$

on [x_{n1}, x_{n2}]. Thus for G between L^o_n and U^o_n,

$$\begin{aligned}
\sup_{G:\,L_n^o\le G\le U_n^o}\Big|\int\phi\,d(G-F)\Big| &= \sup_{G:\,L_n^o\le G\le U_n^o}\Big|\int\phi'(x)\,(F(x)-G(x))\,dx\Big|\\
&\le \nu_\epsilon n^{-1/2}\int_{x_{n1}}^{x_{n2}}|\phi'(x)|\,F^{\gamma}(x)(1-F(x))^{\gamma}\,dx + \int_{-\infty}^{x_{n1}}|\phi'(x)|\,(F(x)+U_n^o(x))\,dx + \int_{x_{n2}}^{\infty}|\phi'(x)|\,(2 - F(x) - L_n^o(x))\,dx.
\end{aligned}$$

From here, we can see that if F ∈ P_{s*} with s* > 0, it follows from Remark 1(i) that J(F) is bounded and hence

$$\sup_{G:\,L_n^o\le G\le U_n^o}\Big|\int\phi\,d(G-F)\Big| = O_p(n^{-1/2})$$

as long as φ′ is bounded on J(F).

A similar argument works if F ∈ P_{s*} with s* < 0 and J(F) is bounded. In the following proof, we return to the case that F ∈ P_{s*} with s* < 0 and, without loss of generality, assume J(F) = (−∞, ∞).

As in the proof of Corollary 8, for x₀ ∈ J(F), b₁ ∈ (0, T₁(F)) and b₂ ∈ (0, T₂(F)), there exist points x₁, x₂ ∈ J(F) with x₁ < x₀ < x₂ such that f(x₁)/F^{1−s*}(x₁) > b₁ and f(x₂)/(1 − F(x₂))^{1−s*} > b₂. Then it follows from Theorem 7(ii) that with asymptotic probability one,

$$U_n^o(x) \le \big(U_n^{s^*}(x_1) + s^*b_1(x-x_1)\big)_+^{1/s^*} = U_n(x_1)\Big(1 + s^*\,\frac{b_1}{U_n^{s^*}(x_1)}\,(x-x_1)\Big)^{1/s^*} \quad\text{for } x \le x_1, \qquad (23)$$

and

$$1 - L_n^o(x) \le \big((1-L_n(x_2))^{s^*} - s^*b_2(x-x_2)\big)_+^{1/s^*} = (1-L_n(x_2))\Big(1 - s^*\,\frac{b_2}{(1-L_n(x_2))^{s^*}}\,(x-x_2)\Big)^{1/s^*} \quad\text{for } x \ge x_2.$$

Similarly, it follows from Theorem 3(ii) that

$$F(x) \le F(x_1)\Big(1 + s^*\,\frac{f(x_1)}{F(x_1)}\,(x-x_1)\Big)_+^{1/s^*} \le F(x_1)\Big(1 + s^*\,\frac{b_1}{F^{s^*}(x_1)}\,(x-x_1)\Big)^{1/s^*} \quad\text{for } x \le x_1, \qquad (24)$$

and

$$1 - F(x) \le (1-F(x_2))\Big(1 - s^*\,\frac{f(x_2)}{1-F(x_2)}\,(x-x_2)\Big)_+^{1/s^*} \le (1-F(x_2))\Big(1 - s^*\,\frac{b_2}{(1-F(x_2))^{s^*}}\,(x-x_2)\Big)^{1/s^*} \quad\text{for } x \ge x_2.$$

For large enough n, one has [x₁, x₂] ⊂ [x_{n1}, x_{n2}] and hence

$$\sup_{G:\,L_n^o\le G\le U_n^o}\Big|\int\phi\,d(G-F)\Big| \le I_{n0} + I_{n1} + I'_{n1} + I_{n2} + I'_{n2},$$

where

$$\begin{aligned}
I_{n0} &\equiv \nu_\epsilon\,n^{-1/2}\int_{x_1}^{x_2}|\phi'(x)|\,F^{\gamma}(x)(1-F(x))^{\gamma}\,dx,\qquad
I_{n1} \equiv \nu_\epsilon\,n^{-1/2}\int_{x_{n1}}^{x_1}|\phi'(x)|\,F^{\gamma}(x)(1-F(x))^{\gamma}\,dx,\\
I_{n2} &\equiv \nu_\epsilon\,n^{-1/2}\int_{x_2}^{x_{n2}}|\phi'(x)|\,F^{\gamma}(x)(1-F(x))^{\gamma}\,dx,\qquad
I'_{n1} \equiv \int_{-\infty}^{x_{n1}}|\phi'(x)|\,(F(x)+U_n^o(x))\,dx,\\
I'_{n2} &\equiv \int_{x_{n2}}^{\infty}|\phi'(x)|\,(2-F(x)-L_n^o(x))\,dx.
\end{aligned}$$

Note that I_{n0} ≤ ν_ϵn^{−1/2}∫_{x₁}^{x₂}|φ′(x)| dx = O(n^{−1/2}). For the other terms, first note that F(x_{n1}) = λ_ϵn^{−1/(2−2γ)}, and hence it follows from (24) that

$$x_1 - x_{n1} \le \frac{\lambda_\epsilon^{s^*}\,n^{-s^*/(2-2\gamma)} - F^{s^*}(x_1)}{-s^*\,b_1} = O(1) + \frac{\lambda_\epsilon^{s^*}}{-s^*\,b_1}\,n^{-s^*/(2-2\gamma)}.$$

Analogously, one can prove that

$$x_{n2} - x_2 \le O(1) + \frac{\lambda_\epsilon^{s^*}}{-s^*\,b_2}\,n^{-s^*/(2-2\gamma)}.$$

Thus, it follows from (24) and the upper bound on |φ′| that

$$\begin{aligned}
I_{n1} &\le \nu_\epsilon\,n^{-1/2}\int_{x_{n1}}^{x_1}|\phi'(x)|\,F^{\gamma}(x)\,dx
= O\Big(n^{-1/2}\int_{x_{n1}}^{x_1}|\phi'(x)|\,\Big(1 + s^*\,\frac{b_1}{F^{s^*}(x_1)}\,(x-x_1)\Big)^{\gamma/s^*}dx\Big)\\
&= O\Big(n^{-1/2}\int_0^{O(n^{-s^*/(2-2\gamma)})}|\phi'(x_1-x)|\,\Big(1 - s^*\,\frac{b_1}{F^{s^*}(x_1)}\,x\Big)^{\gamma/s^*}dx\Big)
= O\Big(n^{-1/2}\int_0^{O(n^{-s^*/(2-2\gamma)})} x^{k-1}\,x^{\gamma/s^*}\,dx\Big)\\
&= O\big(n^{-1/2}\,n^{-(k+\gamma/s^*)\,s^*/(2-2\gamma)}\big)
= O\big(n^{-\frac{1}{2}\cdot\frac{1+s^*k}{1-\gamma}}\big).
\end{aligned}$$

Analogously, one can show that

$$I_{n2} = O\big(n^{-\frac{1}{2}\cdot\frac{1+s^*k}{1-\gamma}}\big).$$

To bound I′_{n1}, note that for x ≤ x_{n1}, it follows from a proof analogous to that of (24) that

$$F(x) \le \big(F^{s^*}(x_{n1}) + s^*b_1(x - x_{n1})\big)^{1/s^*} = \big(\lambda_\epsilon^{s^*}\,n^{-s^*/(2-2\gamma)} + s^*b_1(x-x_{n1})\big)^{1/s^*}.$$

Analogously, it follows that for x ≤ x_{n1},

$$U_n^o(x) \le \big(U_n^{s^*}(x_{n1}) + s^*b_1(x - x_{n1})\big)^{1/s^*}.$$

Note that it follows from (22) that

$$U_n(x_{n1}) = U_n(x_{n1}) - F(x_{n1}) + F(x_{n1}) \le \nu_\epsilon\,n^{-1/2}\big(F(x_{n1})(1-F(x_{n1}))\big)^{\gamma} + F(x_{n1}) \le \nu_\epsilon\,n^{-1/2}\,F^{\gamma}(x_{n1}) + F(x_{n1}) = \big(\nu_\epsilon\lambda_\epsilon^{\gamma} + \lambda_\epsilon\big)\,n^{-1/(2-2\gamma)}$$

and hence for x ≤ x_{n1},

$$U_n^o(x) \le \Big(\big(\nu_\epsilon\lambda_\epsilon^{\gamma}+\lambda_\epsilon\big)^{s^*}\,n^{-s^*/(2-2\gamma)} + s^*b_1(x - x_{n1})\Big)^{1/s^*}.$$

Thus,

$$\begin{aligned}
I'_{n1} &= \int_{-\infty}^{x_{n1}}|\phi'(x)|\,(F(x)+U_n^o(x))\,dx
= O\Big(\int_{-\infty}^{x_{n1}}|\phi'(x)|\,\big(n^{-s^*/(2-2\gamma)} + s^*b_1(x-x_{n1})\big)^{1/s^*}dx\Big)\\
&= O\Big(\int_{-\infty}^{0}|\phi'(x+x_{n1})|\,\big(n^{-s^*/(2-2\gamma)} + s^*b_1x\big)^{1/s^*}dx\Big)
= O\Big(\int_{-\infty}^{0}|x+x_{n1}|^{k-1}\,\big(n^{-s^*/(2-2\gamma)} + s^*b_1x\big)^{1/s^*}dx\Big)\\
&= O\Big(n^{-1/(2-2\gamma)}\,n^{-ks^*/(2-2\gamma)}\int_{-\infty}^{0}(1+|x|)^{k-1}\,\big(1+s^*b_1x\big)^{1/s^*}dx\Big)
= O\big(n^{-(ks^*+1)/(2-2\gamma)}\big),
\end{aligned}$$

where the penultimate step rescales x by n^{−s*/(2−2γ)} and uses |x_{n1}| = O(n^{−s*/(2−2γ)}), and the last integral is finite since k − 1 + 1/s* < −1.

Analogously, one can show that

$$I'_{n2} = O\big(n^{-(ks^*+1)/(2-2\gamma)}\big).$$

Hence

$$\sup_{G:\,L_n^o\le G\le U_n^o}\Big|\int\phi\,d(G-F)\Big| \le I_{n0} + I_{n1} + I'_{n1} + I_{n2} + I'_{n2} \le O_p(n^{-1/2}) + O_p\big(n^{-(ks^*+1)/(2-2\gamma)}\big). \qquad\square$$

Supplementary Material


Table 1:

Summary of Examples 1-8

| Name | Example | density f | d.f. F | s | s* = s/(1 + s) | γ̄(F) = 1 − s* |
|---|---|---|---|---|---|---|
| Student-t | 1 | f_r, r > 0 | F_r | −1/(1 + r) | −1/r | 1 + 1/r |
| F_{a,b} | 2 | f_{a,b}, a, b > 0 | F_{a,b} | −1/(1 + a/2) | −2/a | 1 + 2/a |
| Pareto(a, b) | 3 | f_{a,b}, a, b > 0 | F_{a,b} | −1/(1 + a) | −1/a | 1 + 1/a |
| Symmetric Beta | 4 | f_r, r > 0 | F_r | 2/r | 2/(r + 2) | r/(r + 2) |
| Expo family tilted U(0, 1) | 5 | f_t, t ∈ R | F_t | 0 | e^{−\|t\|} | 1 − e^{−\|t\|} |
| Mixture N(δ, 1), N(−δ, 1) | 6 | f_δ | F_δ | not s-concave for δ > 1 | 0 for 0 < δ < 1.34 | 1 for 0 < δ < 1.34 |
| Mixture T(δ, 1), T(−δ, 1) | 7 | f_δ | F_δ | not s-concave for δ > .6 | bi-s*-concave for some s*, 0 < δ < ∞ | 2 for δ small |
| Lévy, α = 1/2 | 8 | f_a | F_a | −2/3 | −2 | 3 |

Highlights:

  • New classes of shape-constrained distributions are defined and studied.

  • New confidence bands which exploit the shape constraints are defined and shown to improve on existing bands if the assumed shape constraint holds.

  • The new classes of shape-constrained distribution functions, which we call bi-s*-concave, play an important role in the theory of quantile processes.

Acknowledgements:

We owe thanks to Lutz Dümbgen for several helpful discussions. We also thank two referees for their positive comments and suggestions.

The research of J. A. Wellner was partially supported by NSF grant DMS-1566514, NIAID grant 2R01 AI291968-04, a Simons Fellowship via the Newton Institute (INI-program STS 2018), Cambridge University, and the Saw Swee Hock Visiting Professorship of Statistics at the National University of Singapore (in 2019).

7. Appendix 1

Proof of the equivalence between Definition 1 and Definition 2.

Definition 1 implies Definition 2:

For any F ∈ P_s, Theorem 3 shows that F is a continuous function on R. Since J(F) ⊂ R, J(F) ⊂ (inf J(F), ∞) and J(F) ⊂ (−∞, sup J(F)), the convexity or concavity of F^{s*} or (1 − F)^{s*} on R, on (inf J(F), ∞), or on (−∞, sup J(F)) implies the corresponding convexity or concavity of F^{s*} or (1 − F)^{s*} on J(F). Hence, Definition 1 implies Definition 2.

Definition 2 implies Definition 1:

Suppose s* < 0. By Definition 2, for any F ∈ P_s, F^{s*} and (1 − F)^{s*} are convex on J(F). Moreover, F is continuous on R, and hence J(F) = (a, b) where a ≡ inf J(F) and b ≡ sup J(F).

To prove that Fs* is convex on R, by continuity of F it suffices to prove that Fs* is mid-point convex: that is,

\[
F^{s_*}\Bigl(\frac{x + y}{2}\Bigr) \le \frac{1}{2}\, F^{s_*}(x) + \frac{1}{2}\, F^{s_*}(y) \tag{25}
\]

for any x, y ∈ R. Without loss of generality, we assume that x < y.

Note that if a = −∞ and b = ∞, then there is nothing to prove. Without loss of generality, we assume that a > −∞ and b < ∞.

Note that if x ∈ (−∞, a], then Fs*(x) = ∞ and hence (25) holds automatically. If x ∈ (a, b) and y ∈ (a, b), (25) holds by the convexity of Fs* on J(F). Moreover, by noticing the continuity of Fs* at b, (25) holds for any x ∈ (a, b) and y ∈ (a, b]. Since Fs*(y) = Fs*(b) = 1 for yb, (25) holds for any x ∈ (a, b) and y ∈ [b, ∞). If x, y ∈ [b, ∞), (25) holds automatically since Fs* (x) = Fs*(y) = 1.

The proof of the convexity of (1 − F)^{s*} on R is similar and hence omitted. The cases s* ≥ 0 are handled similarly. □

Proof of Theorem 3 (0 ≤ s* ≤ 1):

Recall that a ≡ inf J(F) and b ≡ sup J(F). Suppose 1 ≥ s* > 0.

(i) implies (ii):

Suppose F ∈ P_s. To prove that F is continuous on R, we first note that x ↦ F^{s*}(x) and x ↦ (1 − F(x))^{s*} are concave functions on (a, ∞) and (−∞, b) respectively. By Theorem 10.1 (page 82) in Rockafellar (1970), F^{s*} and (1 − F)^{s*} are continuous on any open convex set in their effective domains. In particular, F^{s*} and (1 − F)^{s*} are continuous on (a, ∞) and (−∞, b) respectively. This yields that F is continuous on (a, ∞) and (−∞, b), or equivalently, on (a, ∞) ∪ (−∞, b) = (−∞, ∞), since F is non-degenerate (so that a < b).

To prove that F is differentiable on J(F), note that J(F) = (a, b) since F is continuous on R. By Theorem 23.1 (page 213) in Rockafellar (1970), for any x ∈ J(F), the concavity of F^{s*} on J(F) implies the existence of the one-sided derivatives (F^{s*})′_+(x) and (F^{s*})′_−(x). Moreover, (F^{s*})′_−(x) ≥ (F^{s*})′_+(x) by Theorem 24.1 (page 227) in Rockafellar (1970). Since F = (F^{s*})^{1/s*} on J(F), the chain rule guarantees the existence of F′_±(x) and

\[
F'_\pm(x) = \frac{1}{s_*}\,\bigl(F^{s_*}\bigr)^{1/s_* - 1}(x\pm)\,\bigl(F^{s_*}\bigr)'_\pm(x).
\]

Since F is continuous on J(F), it follows that

\[
F'_\pm(x) = \frac{1}{s_*}\,\bigl(F^{s_*}\bigr)^{1/s_* - 1}(x)\,\bigl(F^{s_*}\bigr)'_\pm(x).
\]

Hence F′_−(x) ≥ F′_+(x), since (F^{s*})′_−(x) ≥ (F^{s*})′_+(x) and the factor (1/s*)(F^{s*})^{1/s*−1}(x) is positive.

Similarly, one can prove F′_−(x) ≤ F′_+(x) by the concavity of (1 − F)^{s*} on J(F). Thus F′_−(x) = F′_+(x) = F′(x) for any x ∈ J(F), or equivalently, F is differentiable on J(F). The derivative of F is denoted by f, i.e. f ≡ F′.

To prove (6), note that the concavity of xFs* (x) on J(F) implies that, for any x, yJ(F),

\[
F^{s_*}(y) - F^{s_*}(x) \le (y - x)\,\bigl(F^{s_*}\bigr)'(x) = (y - x)\, s_*\, F^{s_* - 1}(x)\, f(x),
\]

or, with x+ = max{x, 0},

\[
\frac{F^{s_*}(y)}{F^{s_*}(x)} \le \Bigl(1 + s_*\,\frac{f(x)}{F(x)}\,(y - x)\Bigr)_+ .
\]

Hence

\[
\frac{F(y)}{F(x)} \le \Bigl(1 + s_*\,\frac{f(x)}{F(x)}\,(y - x)\Bigr)_+^{1/s_*},
\]

or, equivalently,

\[
F(y) \le F(x)\,\Bigl(1 + s_*\,\frac{f(x)}{F(x)}\,(y - x)\Bigr)_+^{1/s_*}.
\]

Analogously, the concavity of (1 − F)^{s*} on J(F) implies that for any x, y ∈ J(F)

\[
(1 - F(y))^{s_*} - (1 - F(x))^{s_*} \le -(y - x)\, s_*\, (1 - F(x))^{s_* - 1}\, f(x),
\]

or, equivalently,

\[
\Bigl(\frac{1 - F(y)}{1 - F(x)}\Bigr)^{s_*} \le \Bigl(1 - s_*\,\frac{f(x)}{1 - F(x)}\,(y - x)\Bigr)_+ ,
\]

which yields

\[
F(y) \ge 1 - (1 - F(x))\,\Bigl(1 - s_*\,\frac{f(x)}{1 - F(x)}\,(y - x)\Bigr)_+^{1/s_*}.
\]

The proof of (6) is complete.
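The two-sided bound (6) can be sanity-checked numerically for a concrete member of the class. The sketch below uses the symmetric Beta example with r = 2 (Example 4), for which f(x) = (3/4)(1 − x²) and F(x) = 1/2 + (3/4)x − x³/4 on J(F) = (−1, 1), with s = 1 and s* = s/(1 + s) = 1/2; the grid and tolerance are illustrative choices.

```python
s_star = 0.5                      # = s/(1 + s) with s = 1 (symmetric Beta, r = 2)

def f(x):                         # density (3/4)(1 - x^2) on (-1, 1)
    return 0.75 * (1.0 - x * x)

def F(x):                         # its distribution function
    return 0.5 + 0.75 * x - 0.25 * x ** 3

grid = [-0.95 + 0.05 * k for k in range(39)]   # points inside J(F) = (-1, 1)
tol = 1e-9
for x in grid:
    for y in grid:
        # upper and lower envelopes of (6) anchored at x, evaluated at y
        up = F(x) * max(0.0, 1.0 + s_star * f(x) / F(x) * (y - x)) ** (1.0 / s_star)
        lo = 1.0 - (1.0 - F(x)) * max(0.0, 1.0 - s_star * f(x) / (1.0 - F(x)) * (y - x)) ** (1.0 / s_star)
        assert lo - tol <= F(y) <= up + tol
```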

(ii) implies (iii):

Applying (6) yields that for any x, yJ(F) with x < y,

\[
\frac{F^{s_*}(x)}{F^{s_*}(y)} \le 1 + s_*\,\frac{f(y)}{F(y)}\,(x - y),
\]

and

\[
\frac{F^{s_*}(y)}{F^{s_*}(x)} \le 1 + s_*\,\frac{f(x)}{F(x)}\,(y - x),
\]

or, equivalently,

\[
F^{s_*}(x) \le F^{s_*}(y) + s_*\, f(y)\, F^{s_* - 1}(y)\,(x - y),
\]

and

\[
F^{s_*}(y) \le F^{s_*}(x) + s_*\, f(x)\, F^{s_* - 1}(x)\,(y - x).
\]

By defining h ≡ f/F^{1−s*} on J(F), it follows that

\[
F^{s_*}(x) \le F^{s_*}(y) + s_*\, h(y)\,(x - y),
\]

and

\[
F^{s_*}(y) \le F^{s_*}(x) + s_*\, h(x)\,(y - x).
\]

Summing the last two inequalities, it follows that

\[
F^{s_*}(x) + F^{s_*}(y) \le F^{s_*}(y) + s_*\, h(y)\,(x - y) + F^{s_*}(x) + s_*\, h(x)\,(y - x),
\]

or, equivalently,

\[
0 \le s_*\,\bigl(h(x) - h(y)\bigr)\,(y - x).
\]

Hence h(x) ≥ h(y) whenever x < y, or equivalently, h(·) is a monotonically non-increasing function on J(F).

The proof of the (non-decreasing) monotonicity of h̃ ≡ f/(1 − F)^{1−s*} is similar and hence is omitted.

(iii) implies (iv):

If (iii) holds, then f > 0 on J(F) = (a, b). To see this, suppose that f(x0) = 0 for some x0 ∈ J(F). Then h(x0) = f(x0)/F^{1−s*}(x0) = 0. Since h is monotonically non-increasing on J(F), h(x) = 0 for all x ∈ [x0, b), or, equivalently, f = 0 on [x0, b). Similarly, since h̃ is monotonically non-decreasing on J(F), f = 0 on (a, x0]. Then f = 0 on J(F), so F is constant on (a, b); this contradicts the continuity of F together with F(a) = 0 and F(b) = 1. Hence f > 0 on J(F).

To prove f is bounded on J(F), note that the monotonicities of h and h̃ imply that for any x, x0 ∈ J(F),

\[
f(x) = \begin{cases}
F^{1 - s_*}(x)\, h(x) \le h(x) \le h(x_0), & \text{if } x \ge x_0, \\
(1 - F(x))^{1 - s_*}\, \tilde h(x) \le \tilde h(x) \le \tilde h(x_0), & \text{if } x \le x_0.
\end{cases}
\]

Hence f(x) ≤ max{h(x0), h̃(x0)} for any x, x0 ∈ J(F).
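The monotonicity of h and h̃ and this envelope bound can likewise be checked numerically for the symmetric Beta example with r = 2 (density f(x) = (3/4)(1 − x²), F(x) = 1/2 + (3/4)x − x³/4 on (−1, 1), s* = 1/2); a hedged sketch, with grid, tolerance, and reference point x0 = 0.3 chosen for illustration:

```python
s_star = 0.5

def f(x):                         # symmetric Beta (r = 2) density
    return 0.75 * (1.0 - x * x)

def F(x):                         # its distribution function
    return 0.5 + 0.75 * x - 0.25 * x ** 3

def h(x):                         # h = f / F^{1 - s*}
    return f(x) / F(x) ** (1.0 - s_star)

def h_tilde(x):                   # h-tilde = f / (1 - F)^{1 - s*}
    return f(x) / (1.0 - F(x)) ** (1.0 - s_star)

grid = [-0.95 + 0.05 * k for k in range(39)]
# h is non-increasing and h-tilde is non-decreasing on J(F)
for u, v in zip(grid, grid[1:]):
    assert h(u) >= h(v) - 1e-9
    assert h_tilde(u) <= h_tilde(v) + 1e-9
# envelope bound: f(x) <= max{h(x0), h-tilde(x0)} for all x on the grid
x0 = 0.3                          # arbitrary reference point in J(F)
assert all(f(x) <= max(h(x0), h_tilde(x0)) + 1e-9 for x in grid)
```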

To prove that f is differentiable almost everywhere on J(F), we first prove that f is Lipschitz continuous on (c, d) for any c, d ∈ J(F) with c < d.

Using the non-increasing monotonicity of h on J(F), the following arguments yield an upper bound on (f(y) − f(x))/(y − x) for x, y ∈ (c, d) with x < y:

\[
\begin{aligned}
\frac{f(y) - f(x)}{y - x}
&= \frac{F^{1 - s_*}(y)\, h(y) - F^{1 - s_*}(x)\, h(x)}{y - x} \\
&= h(y)\,\frac{F^{1 - s_*}(y) - F^{1 - s_*}(x)}{y - x} + F^{1 - s_*}(x)\,\frac{h(y) - h(x)}{y - x} \\
&\le h(y)\,\frac{F^{1 - s_*}(y) - F^{1 - s_*}(x)}{y - x}
= h(y)\,(1 - s_*)\, f(z)\, F^{-s_*}(z),
\end{aligned}
\]

where the last equality follows from the mean value theorem applied to F^{1−s*}, with z between x and y.

Since −s* < 0 and z > c, it follows that F^{−s*}(z) < F^{−s*}(c) and hence

\[
\frac{f(y) - f(x)}{y - x}
\le (1 - s_*)\, f(z)\, h(y)\, F^{-s_*}(z)
\le (1 - s_*)\, \max\{h(x_0), \tilde h(x_0)\}\, h(c)\, F^{-s_*}(c),
\]

for x, y ∈ (c, d).

Similar arguments imply that

\[
\begin{aligned}
\frac{f(y) - f(x)}{y - x}
&= \frac{\bar F^{1 - s_*}(y)\, \tilde h(y) - \bar F^{1 - s_*}(x)\, \tilde h(x)}{y - x} \\
&= \tilde h(y)\,\frac{\bar F^{1 - s_*}(y) - \bar F^{1 - s_*}(x)}{y - x} + \bar F^{1 - s_*}(x)\,\frac{\tilde h(y) - \tilde h(x)}{y - x} \\
&\ge \tilde h(y)\,\frac{\bar F^{1 - s_*}(y) - \bar F^{1 - s_*}(x)}{y - x}
= -\tilde h(y)\,(1 - s_*)\,\bar F^{-s_*}(z)\, f(z) \\
&\ge -(1 - s_*)\, \max\{h(x_0), \tilde h(x_0)\}\, \tilde h(d)\, \bar F^{-s_*}(d).
\end{aligned}
\]

Hence

\[
\Bigl|\frac{f(y) - f(x)}{y - x}\Bigr|
\le (1 - s_*)\, \max\{h(x_0), \tilde h(x_0)\}\, \max\bigl\{h(c)\, F^{-s_*}(c),\ \tilde h(d)\, \bar F^{-s_*}(d)\bigr\}.
\]

The last display shows that f is Lipschitz continuous on (c, d).

By Proposition 4.1(iii) of Shorack (2017), page 82, f is absolutely continuous on (c, d), and hence f is differentiable on (c, d) almost everywhere.

Since (c, d) is an arbitrary subinterval of (a, b), f is differentiable almost everywhere on (a, b), with f′ = F″ almost everywhere.

Since f is differentiable almost everywhere, the non-increasing monotonicity of h on J(F) implies that

\[
h'(x) \le 0 \quad \text{almost everywhere on } J(F),
\]

or, equivalently,

\[
(\log h)'(x) \le 0 \quad \text{almost everywhere on } J(F).
\]

Straightforward calculation yields that the last display is equivalent to

\[
\frac{f'}{f} - (1 - s_*)\,\frac{f}{F} \le 0 \quad \text{almost everywhere on } J(F),
\]

or,

\[
f' \le (1 - s_*)\,\frac{f^2}{F} \quad \text{almost everywhere on } J(F),
\]

which is the right hand side of (8).
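For completeness, the straightforward calculation behind the equivalence can be written out: since h = f/F^{1−s*} and f > 0 on J(F),

```latex
\log h = \log f - (1 - s_*)\,\log F,
\qquad\text{so}\qquad
(\log h)' = \frac{f'}{f} - (1 - s_*)\,\frac{f}{F},
```

and multiplying the inequality (log h)′ ≤ 0 through by f > 0 gives f′ ≤ (1 − s*) f²/F almost everywhere.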

Similarly, the non-decreasing monotonicity of h̃ implies the left hand side of (8).

(iv) implies (i):

Since F is continuous on R, by Definition 2 it suffices to prove that F^{s*} is concave on J(F). Since we assume that F is differentiable on J(F) with derivative f = F′, the concavity of F^{s*} on J(F) follows from the non-increasing monotonicity of the first derivative of F^{s*} on J(F). Since f is differentiable almost everywhere on J(F), the non-increasing monotonicity of (F^{s*})′ on J(F) follows from the non-positivity of (F^{s*})″ almost everywhere on J(F), which follows from

\[
(F^{s_*})''(x) = s_*\, F^{s_* - 1}(x)\,\Bigl(-(1 - s_*)\,\frac{f^2(x)}{F(x)} + f'(x)\Bigr) \le 0,
\]

where f = F′, f′ = F″. The last inequality follows from the right hand side of (8).

Similarly, the concavity of (1 − F(x))^{s*}, or F̄^{s*}, on J(F) follows from:

\[
(\bar F^{s_*})''(x) = -s_*\, \bar F^{s_* - 1}(x)\,\Bigl((1 - s_*)\,\frac{f^2(x)}{\bar F(x)} + f'(x)\Bigr) \le 0,
\]

where the last inequality follows from the left hand side of (8). □


References

  1. Barrio ED, Giné E and Utzet F (2005). Asymptotics for L2 functionals of the empirical quantile process, with applications to tests of fit based on weighted Wasserstein distances. Bernoulli 11 131–189.
  2. Bobkov S and Ledoux M (2019). One-dimensional empirical measures, order statistics, and Kantorovich transport distances, vol. 261. Mem. Amer. Math. Soc.
  3. Borell C (1975). Convex set functions in d-space. Periodica Mathematica Hungarica 6(2) 111–136.
  4. Brascamp HJ and Lieb EH (1976). On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. Journal of Functional Analysis 22 366–389.
  5. Csörgő M and Révész P (1978). Strong approximations of the quantile process. Ann. Statist. 6 882–894.
  6. Dharmadhikari S and Joag-Dev K (1988). Unimodality, Convexity, and Applications. Academic Press.
  7. Dümbgen L, Kolesnyk P and Wilke RA (2017). Bi-log-concave distribution functions. Journal of Statistical Planning and Inference 184 1–17.
  8. Dümbgen L and Wellner JA (2014). Confidence bands for distribution functions: A new look at the law of the iterated logarithm. Tech. rep., Department of Statistics, University of Washington.
  9. Durrett R (2019). Probability—Theory and Examples, vol. 49 of Cambridge Series in Statistical and Probabilistic Mathematics. Fifth edition. Cambridge University Press, Cambridge.
  10. Erdős P and Stone A (1970). On the sum of two Borel sets. Proceedings of the American Mathematical Society 25 304–306.
  11. Gardner RJ (2002). The Brunn-Minkowski inequality. Bull. Amer. Math. Soc. (N.S.) 39 355–405.
  12. Grenander U (1956). On the theory of mortality measurement. II. Skand. Aktuarietidskr. 39 125–153 (1957).
  13. Hall P (1984). On unimodality and rates of convergence for stable laws. J. London Math. Soc. (2) 30 371–384.
  14. Kleiber C and Kotz S (2003). Statistical Size Distributions in Economics and Actuarial Sciences. John Wiley & Sons.
  15. Massart P (1990). The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Ann. Probab. 18 1269–1283.
  16. Owen AB (1995). Nonparametric likelihood confidence bands for a distribution function. Journal of the American Statistical Association 90 516–521.
  17. Rinott Y (1976). On convexity of measures. Ann. Probab. 4 1020–1026.
  18. Robertson T, Wright FT and Dykstra RL (1988). Order Restricted Statistical Inference. John Wiley & Sons.
  19. Rockafellar RT (1970). Convex Analysis. Princeton University Press.
  20. Samworth RJ (2018). Recent progress in log-concave density estimation. Statist. Sci. 33 493–509.
  21. Samworth RJ and Sen B (2018). Editorial: special issue on “Nonparametric inference under shape constraints”. Statist. Sci. 33 469–472.
  22. Saumard A (2019). Bi-log-concavity: some properties and some remarks towards a multi-dimensional extension. Electron. Commun. Probab. 24, Paper No. 61, 8 pp.
  23. Shorack GR (2017). Probability for Statisticians. Springer.
  24. Shorack GR and Wellner JA (2009). Empirical Processes with Applications to Statistics, vol. 59 of Classics in Applied Mathematics. SIAM, Philadelphia, PA. Reprint of the 1986 original.
  25. van Eeden C (1956). Maximum likelihood estimation of ordered probabilities. Statist. Afdeling S 188 (VP 5), Math. Centrum Amsterdam.
  26. Wooldridge JM (2000). Instructional Stata datasets for econometrics. Boston College Department of Economics.
