Abstract
We introduce new shape-constrained classes of distribution functions on ℝ, the bi-s*-concave classes. In parallel to results of Dümbgen et al. (2017) for what they called the class of bi-log-concave distribution functions, we show that every s-concave density f with s > −1 has a bi-s*-concave distribution function F for s* ≤ s/(s + 1).
Confidence bands building on existing nonparametric confidence bands, but accounting for the shape constraint of bi-s*-concavity, are also considered. The new bands extend those developed by Dümbgen et al. (2017) for the constraint of bi-log-concavity. We also make connections between bi-s*-concavity and finiteness of the Csörgő–Révész constant of F, which plays an important role in the theory of quantile processes.
Keywords: log-concave, bi-log-concave, shape constraint, s-concave, quantile process, Csörgő–Révész condition, hazard function
1. Introduction
Statistical methods based on shape constraints have been developing rapidly during the past 15–20 years. From the classical univariate methods based on monotonicity going back to the work of Grenander (1956) and van Eeden (1956) in the 1950s and 1960s, research has progressed to consideration of convexity type constraints in a variety of problems including estimation of density functions, regression functions, and other “nonparametric” functions such as hazard (rate) functions. See Samworth and Sen (2018) for a summary and overview of some of this recent activity.
One very appealing shape constraint is log-concavity: a (density) function f is log-concave if log f is concave (with log 0 = −∞). See Samworth (2018) for a recent review of the properties of log-concave densities and their relevance for statistical applications. While much of the current literature has focused on point estimation, our main focus here will be on inference for one-dimensional distribution functions and especially on (honest, exact) confidence bands for distribution functions which take advantage of shape constraints.
To this end, Dümbgen et al. (2017) introduced the class of bi-log-concave distribution functions defined as follows: a distribution function F on ℝ is bi-log-concave if both F and 1 − F are log-concave. They provided several different equivalent characterizations of this property, and noted (the previously known fact) that if f is a log-concave density, then the corresponding distribution function F and survival function 1 − F are both log-concave. But the converse is false: there are many bi-log-concave distribution functions F with density f which fail to be log-concave; see Section 2 below for an explicit example. Dümbgen et al. (2017) also showed how to construct confidence bands which exploit the bi-log-concave shape constraint and thereby obtain narrower bands, especially in the tails, with correct coverage when the bi-log-concave assumption holds.
However, a difficulty with the assumption of bi-log-concavity is that the corresponding density functions inherit the requirement of exponentially decaying tails of the class of log-concave densities, and this rules out distribution functions F with tails decaying more slowly than exponentially. Here we introduce new shape-constrained families of distribution functions F, which we call the bi-s*-concave distributions, with tails possibly decaying more slowly (or more rapidly) than exponentially. As the name indicates, these families involve a parameter s* ∈ (−∞, 1] which allows heavier than exponential tails when s* < 0, lighter than exponential tails when s* > 0, and which correspond to exactly the bi-log-concave class introduced by Dümbgen et al. (2017) when s* = 0.
Here is an outline of the rest of the paper. In Section 2 we give careful definitions of the new classes of bi-s*-concave distributions. We also present several helpful examples and discuss some basic properties of the new classes and their connections to the classes of s-concave densities studied by Borell (1975), Brascamp and Lieb (1976), and Rinott (1976). (See also Dharmadhikari and Joag-Dev (1988), and Gardner (2002).) Section 3 contains the main theoretical results of the paper. The connection between the bi-s*-concave class and a key condition in the theory of quantile processes, the Csörgő–Révész condition, is discussed in Corollary 4. Finally, we give two tail bounds for bi-s*-concave distribution functions; see Corollary 5.
In Section 4 we first introduce the new confidence bands for a distribution function assuming s* is known. We also discuss some of their theoretical properties: the consistency of the confidence bands is discussed in Theorem 7, and Theorem 9 provides a rate of convergence for linear functionals of bi-s*-concave distribution functions contained in the bands. This extends Theorem 5 of Dümbgen et al. (2017). We then briefly discuss the algorithms used to compute the new bands, and illustrate the new bands with real and artificial data. Section 5 gives a brief summary and statements of further problems. An especially important remaining problem concerns construction of confidence bands when s* is unknown. The proofs for all the results in Sections 2, 3, and 4 are given in Sections 6 and 7.
We conclude this section with some notation which will be used throughout the rest of the paper. The supremum norm of a function g is denoted by ‖g‖∞. For a function x ↦ f(x) we write f(x−) and f(x+) for the left and right limits of f at x, assuming that the indicated limits exist. In general, we use F and f to denote a distribution function and the corresponding density function with respect to Lebesgue measure, and we set J(F) ≡ {x ∈ ℝ : 0 < F(x) < 1}.
2. Definitions, Examples, and First Properties
As we discussed above, for distribution functions F on ℝ, Dümbgen et al. (2017) introduced a shape constraint they called bi-log-concavity by requiring that both F and 1 − F be log-concave.
In this paper, we generalize the bi-log-concave distribution functions by introducing and studying bi-s*-concave distributions defined as follows.
Definition 1.
For −∞ < s* < 0, a distribution function F is bi-s*-concave if both x ↦ Fs*(x) and x ↦ (1 − F(x))s* are convex functions from ℝ to [0, ∞].
For s* = 0, a distribution function F is bi-s*-concave (or bi-log-concave) if both x ↦ log(F(x)) and x ↦ log(1 − F(x)) are concave functions from ℝ to [−∞, 0].
For 0 < s* ≤ 1, a distribution function F is bi-s*-concave if x ↦ Fs*(x) is concave from (inf J(F), ∞) to [0, 1] and x ↦ (1 − F(x))s* is concave from (−∞, sup J(F)) to [0, 1].
The class of bi-s*-concave distribution functions is denoted by 𝒫s*.
Definition 2. (Alternative to Definition 1.)
A distribution function F is bi-s*-concave if it is continuous on ℝ and satisfies the following properties on J(F):
For −∞ < s* < 0, both x ↦ Fs*(x) and x ↦ (1 − F(x))s* are convex functions on J(F).
For s* = 0, both x ↦ log(F(x)) and x ↦ log (1 − F(x)) are concave functions on J(F).
For 0 < s* ≤ 1, both x ↦ Fs* (x) and x ↦ (1 − F(x))s* are concave functions on J(F).
See the Appendix, Section 7, for a proof of the equivalence of Definitions 1 and 2. The main benefit of the second definition is that it is immediately clear that any bi-s*-concave distribution function F is continuous, since continuity of F is explicitly required in Definition 2. Moreover, to verify bi-s*-concavity we only need to verify the convexity or concavity of Fs* and (1 − F)s* on the same interval J(F).
Recall that a density function f is s-concave if fs is convex for s < 0, fs is concave for s > 0, and log f is concave for s = 0. Two basic properties linking s-concave densities and bi-s*-concave distribution functions are given in the following two propositions. Proposition 1 generalizes the case s = 0 as noted above, while Proposition 2 generalizes the corresponding nestedness property of the classes of s-concave densities; see e.g. Dharmadhikari and Joag-Dev (1988), page 86, and Borell (1975), page 111.
Proposition 1. Suppose a density function f is s-concave with s ∈ (−1, ∞). Then the corresponding distribution function F is bi-s*-concave for all s* ≤ s/(1 + s).
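A quick numerical illustration of Proposition 1 (a sketch, not part of the paper's formal development): the standard Cauchy density is s-concave with s = −1/2, so its distribution function should be bi-s*-concave for s* = s/(1 + s) = −1; that is, both 1/F and 1/(1 − F) should be convex. The check below verifies this via discrete second differences on a grid.

```python
import math

def cauchy_cdf(x):
    # standard Cauchy distribution function: F(x) = 1/2 + arctan(x)/pi
    return 0.5 + math.atan(x) / math.pi

def min_second_difference(g, xs):
    # discrete second differences of g on the grid xs; all >= 0 means convex
    vals = [g(x) for x in xs]
    return min(vals[i - 1] - 2.0 * vals[i] + vals[i + 1] for i in range(1, len(vals) - 1))

s_star = -1.0  # s/(1 + s) for the Cauchy value s = -1/2
xs = [0.01 * k for k in range(-1000, 1001)]  # uniform grid on [-10, 10]

conv_F = min_second_difference(lambda x: cauchy_cdf(x) ** s_star, xs)
conv_S = min_second_difference(lambda x: (1.0 - cauchy_cdf(x)) ** s_star, xs)
assert conv_F > -1e-9 and conv_S > -1e-9  # both power transforms are convex
```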
Proposition 2. The bi-s*-concave classes are nested in the following sense:
𝒫s*2 ⊆ 𝒫s*1 whenever −∞ < s*1 ≤ s*2 ≤ 1. | (1) |
Moreover, the bi-s*-concave classes are continuous at s* = 0 in the following sense:
⋂s*<0 𝒫s* = 𝒫0. | (2) |
In view of the nesting property (1), for each F ∈ 𝒫s* for some s* we define
s0*(F) ≡ sup{s* ≤ 1 : F ∈ 𝒫s*}.
Similarly, if f is s-concave for some s we define
s0(f) ≡ sup{s : f is s-concave}.
We often drop the subscript 0 if the meaning is clear. For other basic properties of s-concave densities and bi-s*-concave distribution functions, including results concerning closure under convolution, see Borell (1975), Dharmadhikari and Joag-Dev (1988), and Saumard (2019).
Now we introduce two important parameters, one of which will appear in connection with our characterization of the class of bi-s*-concave distribution functions in the next section and in our examples below. The Csörgő–Révész constant of a bi-log-concave distribution function F, denoted by γ(F), is given by
γ(F) ≡ ess sup_{x ∈ J(F)} F(x)(1 − F(x))|f′(x)|/f(x)² | (3) |
provided that F is differentiable on J(F) with derivative f ≡ F′ and f is differentiable almost everywhere on J(F) with derivative f′ = F″. Here the essential supremum is with respect to Lebesgue measure. Alternatively (and suited for our characterization Theorem 3),
γ̃(F) ≡ ess sup_{x ∈ J(F)} {F(x) ∧ (1 − F(x))}|f′(x)|/f(x)² | (4) |
Note that since u ∧ (1 − u) ≤ 2u(1 − u) ≤ 2{u ∧ (1 − u)} it follows that γ(F) ≤ γ̃(F) ≤ 2γ(F), and hence finiteness of γ(F) is equivalent to finiteness of γ̃(F). The Csörgő–Révész constant arises in the study of quantile processes and transportation distances between empirical distributions and true distributions on ℝ: see Csörgő and Révész (1978), Shorack and Wellner (2009), del Barrio et al. (2005), and Bobkov and Ledoux (2019). It follows from the characterization Theorem 1(iv) of Dümbgen et al. (2017) that F is bi-log-concave if and only if −f²/(1 − F) ≤ f′ ≤ f²/F almost everywhere on J(F) (together with the differentiability conditions of that theorem). We will define and generalize this to the classes of bi-s*-concave distribution functions in Section 3.
Now we consider several examples of s-concave densities and bi-s*-concave distribution functions.
Example 1. (Student-t) Suppose x ↦ fr(x) is the density function of the Student-t distribution with r degrees of freedom defined as follows:
It is well-known (see e.g. Borell (1975)) that fr is s-concave for any s ≤ −1/(1 + r) = s0(fr). Note that s takes values in (−1, 0) since r ∈ (0, ∞). It follows from Proposition 1 that Frs* and (1 − Fr)s* are convex for s* ≤ −1/r, and hence Fr is bi-s*-concave for all 0 < r < ∞. Direct calculation shows that the Csörgő–Révész constant γ(Fr) = 1 − s* = 1 + (1/r) ∈ (1, ∞) for 0 < r < ∞.
In particular, this yields γ(F1) = γ(Cauchy) = 2. And it suggests that γ(F) ≤ 1/(1 + s) = 1 − s* for all bi-s*-concave distribution functions F where 1/(1 + s) varies from 1 to ∞ as s varies from 0 to −1. This is one of the characterizations of the bi-s*-concave class that we will prove in Section 3.
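The Cauchy value γ(F1) = 2 can be confirmed numerically (a sketch; a finite grid only approximates the essential supremum from below):

```python
import math

def cauchy_cdf(x):
    return 0.5 + math.atan(x) / math.pi

# For the Cauchy density, |f'(x)|/f(x)^2 = 2*pi*|x|, so the Csorgo-Revesz constant is
# gamma(F) = sup_x 2*pi*|x| F(x)(1 - F(x)); by symmetry it suffices to scan x > 0.
xs = [1.5 ** k for k in range(0, 40)]  # log-spaced grid out to roughly 7.6e6
gamma_est = max(2.0 * math.pi * x * cauchy_cdf(x) * (1.0 - cauchy_cdf(x)) for x in xs)
assert 1.99 < gamma_est < 2.0  # approaches gamma(Cauchy) = 2 from below
```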
Example 2. (Fa,b) Suppose that fa,b is the density of the family of F-distributions with “degrees of freedom” a > 0 and b > 0. (In statistical practice, if T has the density fa,b, this would usually be denoted by T ~ Fa,b, where a is the “numerator degrees of freedom” and b is the “denominator degrees of freedom”.) The density is given by
(In fact, C(a, b) = a^{a/2} b^{b/2} Beta(a/2, b/2), and fa,b(x) → gb(x) as a → ∞ where gb is the Gamma density with parameters b/2 and b/2.) It is well-known (see e.g. Borell (1975)) that fa,b belongs to the class of s-concave densities if s ≤ −1/(1 + a/2) = s0(fa,b) when a ≥ 2 and b ≥ 2. This implies that s ∈ [−1/2, 0), and the resulting s* = s/(1 + s) is in [−1, 0). By Proposition 1, it follows that Fs* and (1 − F)s* are convex; i.e. F is bi-s*-concave.
Example 3. (Pareto) Suppose that fa,b(x) = (a/b)(x/b)−(a+1)1[b,∞)(x), the Pareto density with parameters a > 0 and b > 0. In this case, fa,b is s-concave for each s ≤ −1/(1 + a) by noting the convexity of fa,bs on [b, ∞). Thus we take s = −1/(1 + a) ∈ (−1, 0) for a ∈ (0, ∞) and hence s* = s/(1 + s) equals −1/a. Furthermore, it is easily seen that
(CRR(·) represents the Csörgő–Révész function in the right tail.)
Thus the Pareto distribution is analogous to the exponential distribution in the log-concave case in the sense that x ↦ fs(x) = cx (with c = b−1(b/a)1/(1+a)) is linear.
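This linearity is easy to confirm numerically; the sketch below uses the illustrative choice a = 2, b = 1, so s = −1/(1 + a) = −1/3, and checks that the second differences of f^s vanish:

```python
a, b = 2.0, 1.0          # illustrative Pareto parameters
s = -1.0 / (1.0 + a)     # s-concavity index from Example 3

def pareto_density(x):
    return (a / b) * (x / b) ** (-(a + 1.0))

xs = [b + 0.1 * k for k in range(200)]                 # grid on [b, b + 19.9]
vals = [pareto_density(x) ** s for x in xs]            # f^s should be linear in x
second_diffs = [vals[i - 1] - 2.0 * vals[i] + vals[i + 1] for i in range(1, len(vals) - 1)]
assert max(abs(d) for d in second_diffs) < 1e-8        # numerically zero: f^s is linear
```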
Example 4. (Symmetrized Beta) Suppose that
fr(x) = Cr(1 − x²/r)+^{r/2}, x ∈ ℝ,
where Cr is the normalizing constant and r ∈ (0, ∞). Note that fr is an s-concave density with s = 2/r ∈ (0, ∞) since
fr^{2/r} ∝ (1 − x²/r)+
is concave, and hence the corresponding distribution function Fr is bi-s*-concave with s* = s/(1 + s) = 2/(2 + r). As r → ∞ it is easily seen that
fr(x) → (2π)^{−1/2} exp(−x²/2),
the standard normal density. Thus r = ∞ corresponds to s = 0 and s* = 0. On the other hand,
as r → 0. Thus r = 0 corresponds to s = ∞ and s* = 1.
Note that just as the class of bi-log-concave distributions is considerably larger than the class of log-concave distributions (as shown by Dümbgen et al. (2017)), the class of bi-s*-concave distributions is considerably larger than the class of s-concave distributions. In particular, multimodal distributions are allowed in both the bi-log-concave and the bi-s*-concave classes.
Example 5. (Exponential family; exponential tilt of U(0, 1)) Suppose that
ft(x) = exp(tx − K(t)) 1[0,1](x),
where
K(t) ≡ log((e^t − 1)/t) | (5) |
for −∞ < t < ∞ with K(0) ≡ 0, and further define Ft(x) ≡ ∫−∞^x ft(y) dy.
One can verify that ft is s-concave only for s ≤ 0 and hence Ft is bi-s*-concave for s* ≤ s/(1 + s) ≤ 0 by Proposition 1. However, this might not be optimal; i.e. there remains the possibility that Ft ∈ 𝒫s* for some s* > 0. In fact, by Theorem 3(iv) it follows that Ft ∈ 𝒫s* with s* = e^{−|t|}. (For an example involving a power-tilt of U(0, 1), see Dharmadhikari and Joag-Dev (1988) (iv), page 95.) This also implies that the converse of Proposition 1 does not hold here or in general. The following two examples also illustrate this point.
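Returning to Example 5, the critical exponent e^{−|t|} can be probed numerically. The sketch below assumes the tilted-uniform form ft(x) = exp(tx − K(t)) on [0, 1], so that Ft(x) = (e^{tx} − 1)/(e^t − 1); for t > 0 the survival side imposes no constraint, and Fts* should be concave exactly for s* ≤ e^{−t}:

```python
import math

t = 1.0  # tilt parameter; the claimed critical exponent is exp(-|t|)

def tilted_cdf(x):
    # distribution function of the exponentially tilted U(0,1): (e^{tx} - 1)/(e^t - 1)
    return (math.exp(t * x) - 1.0) / (math.exp(t) - 1.0)

def power_is_concave(s_star):
    # check concavity of x -> F_t(x)^{s*} on (0, 1) via discrete second differences
    xs = [0.001 * k for k in range(1, 1000)]
    vals = [tilted_cdf(x) ** s_star for x in xs]
    return all(vals[i - 1] - 2.0 * vals[i] + vals[i + 1] <= 1e-12
               for i in range(1, len(vals) - 1))

crit = math.exp(-t)
assert power_is_concave(crit - 0.01)        # concave just below exp(-|t|)
assert not power_is_concave(crit + 0.01)    # concavity fails just above exp(-|t|)
```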
Example 6. (Mixture of Gaussians shifted) (Dümbgen et al. (2017), pages 2–3) Suppose that fδ is the mixture (1/2)N(−δ, 1) + (1/2)N(δ, 1) with δ > 0. It is well-known that fδ is bimodal if δ > 1. Since all s-concave densities are unimodal (see e.g. Dharmadhikari and Joag-Dev (1988) page 86), it follows that fδ is not s-concave for any δ > 1. Dümbgen et al. (2017) showed (numerically) that the corresponding distribution function Fδ is bi-log-concave for δ ≤ 1.34 but not for δ ≥ 1.35. With δ = 1.8 this example also shows that strict inequality can occur in the second inequality in Corollary 4 below.
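The numerical finding of Dümbgen et al. (2017) is easy to reproduce with a grid check of the bi-log-concave derivative criterion −f²/(1 − F) ≤ f′ ≤ f²/F (a sketch; the grid, tolerance, and helper names are ours):

```python
import math

def phi(x):  # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):  # standard normal distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bi_log_concave_ok(delta):
    # grid check of -f^2/(1-F) <= f' <= f^2/F for the mixture (1/2)N(-delta,1) + (1/2)N(delta,1)
    for k in range(-600, 601):
        x = 0.01 * k
        f = 0.5 * (phi(x + delta) + phi(x - delta))
        fp = 0.5 * (-(x + delta) * phi(x + delta) - (x - delta) * phi(x - delta))
        F = 0.5 * (Phi(x + delta) + Phi(x - delta))
        if fp > f * f / F + 1e-9 or fp < -f * f / (1.0 - F) - 1e-9:
            return False
    return True

assert bi_log_concave_ok(0.5)        # log-concave mixture, hence bi-log-concave
assert not bi_log_concave_ok(1.8)    # bimodal case: bi-log-concavity fails
```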
Example 7. (Mixture of shifted Student-t) Now suppose that fδ is the mixture (1/2)t1(· − δ) + (1/2)t1(· + δ) with δ > 0, where tr is the standard Student-t density with r degrees of freedom as in Example 1. Since fδ is bimodal if δ > δ0 ≈ 0.6 and all s-concave densities are unimodal, it follows that fδ is not s-concave for any δ > δ0. For values of δ < δ0, fδ is s-concave with s = −1/2, so Proposition 1 applies and shows that Fδ is bi-s*-concave with s* = −1. By numerical calculation, for δ > δ0 the distribution functions Fδ are bi-s*-concave for some s* = s*(δ) ∈ (−∞, 1] which decreases (approximately linearly) for large δ.
Example 8. (Lévy with α = 1/2) This example is the completely asymmetric α-stable (or Lévy) law with α = 1/2. It gives the first passage time to the level a > 0 for a standard Brownian motion B (started at 0 and with no drift). See e.g. Durrett (2019), pages 372–374. The density is given by
fa(x) = (a/√(2π)) x^{−3/2} exp(−a²/(2x)) 1(0,∞)(x),
and the distribution function Fa(x) = 2(1 − Φ(a/√x)) for x > 0, where Φ denotes the standard normal distribution function. It is easily seen that fa is s-concave with s = −2/3, and hence Fa is bi-s*-concave with s* = −2. Thus γ(Fa) = 3.
The following table summarizes the examples:

| Example | s0(f) | s0*(F) |
| Student-t, r degrees of freedom | −1/(1 + r) | −1/r |
| Fa,b (a, b ≥ 2) | −1/(1 + a/2) | −2/a |
| Pareto(a, b) | −1/(1 + a) | −1/a |
| Symmetrized Beta, r | 2/r | 2/(2 + r) |
| Exponential tilt of U(0, 1), t | 0 | e^{−|t|} |
| Lévy, α = 1/2 | −2/3 | −2 |
Example 5 shows that strict inequality can hold in the inequality
s0*(F) ≥ s0(f)/(1 + s0(f)).
3. Main Theoretical Results
Here is our theorem characterizing bi-s*-concave distribution functions.
Theorem 3. Let s* ≤ 1. For a non-degenerate distribution function F, the following statements are equivalent:
(i) F is bi-s*-concave.
(ii) F is continuous on ℝ and differentiable on J(F) with derivative f = F′.
Moreover, for s* ≠ 0,
F(y) ≤ F(x){(1 + s*(y − x)f(x)/F(x))+}^{1/s*} and 1 − F(y) ≤ (1 − F(x)){(1 − s*(y − x)f(x)/(1 − F(x)))+}^{1/s*} | (6) |
while for s* = 0
F(y) ≤ F(x) exp((y − x)f(x)/F(x)) and 1 − F(y) ≤ (1 − F(x)) exp(−(y − x)f(x)/(1 − F(x))) | (7) |
for all x, y ∈ J(F).
(iii) F is continuous on ℝ and differentiable on J(F) with derivative f = F′ such that the s*-hazard function f/(1 − F)^{1−s*} is non-decreasing on J(F), and the reverse s*-hazard function f/F^{1−s*} is non-increasing on J(F).
(iv) F is continuous on ℝ and differentiable on J(F) with bounded and strictly positive derivative f = F′. Furthermore, f is differentiable almost everywhere on J(F) with derivative f′ = F″ satisfying
−(1 − s*)f(x)²/(1 − F(x)) ≤ f′(x) ≤ (1 − s*)f(x)²/F(x) for almost all x ∈ J(F). | (8) |
The following two remarks are immediate consequences of Theorem 3. See Section 6 for a proof of Remark 1.
Remark 1.
(i) The proof of Theorem 3(iv) implies that if s* > 1, then not both Fs* and (1 − F)s* can be concave.
(ii) If F is a bi-s*-concave distribution function for 0 < s* ≤ 1, then inf J(F) > −∞ and sup J(F) < ∞.
(iii) If F is a bi-s*-concave distribution function for s* < 0, then it follows that
| (9) |
with
| (10) |
Remark 2. Suppose that F is a bi-s*-concave distribution function, and define
Since f/F^{1−s*} is monotonically non-increasing on J(F), it follows that for any x, x0 ∈ J(F) with x < x0,
and hence
Analogously one can show that
Corollary 4. (Connection with the Csörgő - Révész constant.)
Suppose F is a bi-s*-concave distribution function for s* ≤ 1. Then with γ(F) and γ̃(F) as defined in (3) and (4), we have
2^{−1}γ̃(F) ≤ γ(F) ≤ γ̃(F) ≤ 1 − s*. | (11) |
Remark 3. By Theorem 3, one can verify that γ̃(F) is well-defined for any F ∈ 𝒫s*.
The first two inequalities in Corollary 4 follow (as we noted before) from 2^{−1}{u ∧ (1 − u)} ≤ u(1 − u) ≤ u ∧ (1 − u) for 0 ≤ u ≤ 1. Thus finiteness of γ̃(F) implies finiteness of γ(F) and vice versa. Examples show that strict inequality may hold in the inner inequalities in (11). On the other hand, if f is non-decreasing on (a, F−1(1/2)) and non-increasing on (F−1(1/2), b) where J(F) = (a, b), then the bounds in (11) can be sharpened by inspection of the proof.
Corollary 5. (Bounds for F, where s* ≠ 0.)
For any s* ∈ (−∞, 0) ∪ (0, 1] and x ∈ J(F),
FL(x) ≤ F(x) ≤ FU(x), | (12) |
where FL(x) ≡ 1 + (Fs*(x) − 1)/s* and FU(x) ≡ (1 − (1 − F(x))s*)/s*.
Moreover, FU(x) is a convex function on J(F), and FL(x) is a concave function on J(F). For s* = 0 and x ∈ J(F), (12) holds with FL(x) = 1 + log F(x) and FU(x) = −log(1 − F(x)).
4. Confidence bands for bi-s*-concave distribution functions
Our goal in this section is to define confidence bands for F which exploit the shape constraint F ∈ 𝒫s*. We start with some known unconstrained nonparametric bands and define new bands under the assumption that the true distribution function F satisfies the shape constraint F ∈ 𝒫s*, where s* ∈ (−∞, 1] is known.
4.1. Definitions and Basic Properties
Let X1, …, Xn be i.i.d. random variables with continuous distribution function F. A (1 − α)-confidence band (Ln, Un) for F consists of two monotonically non-decreasing functions Ln and Un on ℝ, depending on α and X1, …, Xn only, which satisfy Ln < 1, Un > 0 and
ℙ(Ln(x) ≤ F(x) ≤ Un(x) for all x ∈ ℝ) ≥ 1 − α.
The following two bands are discussed in Dümbgen et al. (2017) and we briefly restate them here.
Example (Kolmogorov-Smirnov band). A Kolmogorov-Smirnov band (Ln, Un) is given by
where 𝔽n is the empirical distribution function and κn,α denotes the (1 − α)-quantile of ‖𝔽n − F‖∞; see Shorack and Wellner (2009). Note that κn,α ≤ √(log(2/α)/(2n)) by Massart’s (1990) inequality.
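In code, a conservative version of this band is immediate (a sketch; the helper name ks_band is ours, and Massart's bound √(log(2/α)/(2n)) is used in place of the exact quantile, giving coverage at least 1 − α):

```python
import math

def ks_band(sample, alpha):
    # Conservative Kolmogorov-Smirnov band evaluated at the order statistics,
    # using Massart's (1990) bound kappa = sqrt(log(2/alpha)/(2n)) instead of
    # the exact (1 - alpha)-quantile of the supremum distance.
    xs = sorted(sample)
    n = len(xs)
    kappa = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    ecdf = [(i + 1.0) / n for i in range(n)]      # empirical distribution at the knots
    lower = [max(p - kappa, 0.0) for p in ecdf]
    upper = [min(p + kappa, 1.0) for p in ecdf]
    return xs, lower, upper

knots, lower, upper = ks_band([0.1 * k for k in range(1, 101)], alpha=0.05)
assert all(0.0 <= l <= u <= 1.0 for l, u in zip(lower, upper))
assert abs(math.sqrt(math.log(40.0) / 200.0) - 0.1358) < 1e-4  # kappa for n = 100, alpha = 0.05
```

Between the knots the empirical distribution function is constant, so these values determine the whole band.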
Example (Weighted Kolmogorov-Smirnov band). A weighted Kolmogorov-Smirnov band (Ln, Un) is as follows: for any γ ∈ [0, 1/2),
for i ∈ {0, 1, …, n} and x ∈ [X(i), X(i+1)), where X(1) ≤ ⋯ ≤ X(n) denote the order statistics of X1, …, Xn, X(0) ≡ −∞, X(n+1) ≡ ∞, ti ≡ i/(n + 1) for i = 1, …, n, and κn,α,γ denotes the (1 − α)-quantile of the following test statistic
Note that .
A further example of a nonparametric confidence band due to Owen (1995) and refined by Dümbgen and Wellner (2014) was considered by Dümbgen et al. (2017). We will not consider this third possibility further here due to space constraints.
Now we turn to confidence bands for bi-s*-concave distribution functions. Our approach will be to refine the unconstrained bands given in the examples above. Suppose F is a bi-s*-concave distribution function. A nonparametric (1 − α) confidence band (Ln, Un) for F may be refined as follows:
If there is no bi-s*-concave distribution function fitting into the band (Ln, Un), the refined band is void, and we conclude with confidence 1 − α that F is not bi-s*-concave. But in the case of F ∈ 𝒫s*, this happens with probability at most α.
The following lemma implies two properties of our shape-constrained band (Ln(s*), Un(s*)). The first one is that both Ln(s*) and Un(s*) are Lipschitz continuous on ℝ, unless the band is void. The second one is that Un(s*)(x) converges polynomially fast to 0 as x → −∞ and Ln(s*)(x) converges polynomially fast to 1 as x → ∞ as long as limx→∞ Ln(x) > limx→−∞ Un(x).
Lemma 6. For real numbers a < b, 0 < u < v < 1 and s* ∈ (−∞, 0) ∪ (0, 1], define
(i) If Ln(a) ≥ u and Un(b) ≤ v, then and are Lipschitz-continuous on with Lipschitz constant max{γ1, γ2}.
(ii) If Un(a) ≤ u and Ln(b) ≥ v, then
The following theorem implies the consistency of our proposed confidence band (Ln(s*), Un(s*)).
Theorem 7. Suppose that the original confidence band (Ln, Un) is consistent in the sense that for any fixed x ∈ ℝ, both Ln(x) and Un(x) tend to F(x) in probability.
(i) Suppose that . Then .
(ii) Suppose that F ∈ 𝒫s* with s* ≠ 0. Then , and
| (13) |
where sup(Ø) ≡ 0. Moreover, for any compact interval K ⊂ J(F),
| (14) |
where hG stands for any of the three functions G′, (Gs*)′, and ((1 − G)s*)′. Finally, for any fixed x1 ∈ J(F) and 0 < b1 < f(x1)/F^{1−s*}(x1),
| (15) |
while for any fixed x2 ∈ J(F) and 0 < b2 < f(x2)/(1 − F(x2))^{1−s*}
| (16) |
The following result provides the consistency of confidence bands for functionals ∫ ϕ dF of F with well-behaved integrands ϕ : ℝ → ℝ.
Corollary 8. Suppose that the original confidence band (Ln, Un) is consistent, and let F ∈ 𝒫s* with s* < 0. Let ϕ : ℝ → ℝ be absolutely continuous with a continuous derivative ϕ′ satisfying the following constraint: there exist constants a > 0 and k < −1/s* such that
Then
The following theorem provides rates of convergence, with the following condition on the original confidence band (Ln, Un):
Condition (*): For certain constants γ ∈ [0, 1/2) and κ, λ > 0,
on the interval .
As stated in Dümbgen et al. (2017), this condition is satisfied with γ = 0 in the case of the Kolmogorov-Smirnov band. In the case of the weighted Kolmogorov-Smirnov band, it is satisfied for the given value of γ ∈ [0, 1/2). For the refined version of Owen’s band, it is satisfied for any fixed number γ ∈ (0, 1/2).
Theorem 9. Suppose that F ∈ 𝒫s* with s* < 0 and let (Ln, Un) satisfy Condition (*). Let ϕ : ℝ → ℝ be absolutely continuous with a continuous derivative ϕ′.
Suppose that |ϕ′(x)| = O(|x|^{k−1}) as |x| → ∞ for some number k < −1/s*. Then
| (17) |
Remark: (i) From (17), one can verify that the convergence rate is n^{−1/2} as long as k < γ/(−s*).
(ii) From (17), one can verify that when γ = 0, the convergence rate is n^{−1/2+k/(−s*)} and we have a “power deficit” (or “polynomial rate deficit”) relative to n^{−1/2}.
4.2. Implementation and illustration of the confidence bands
In this section, we discuss the implementation of confidence bands for bi-s*-concave distribution functions. This extends the treatment of Dümbgen et al. (2017) from s* = 0 to general values s* ∈ (−∞, 1].
Recall the procedure ConcInt(·, ·) developed in Dümbgen et al. (2017). Given any finite set of real numbers t0 < t1 < ⋯ < tm and any pair (l, u) of functions with l < u pointwise and l(t) > −∞ for at least two different points t ∈ {t0, t1, …, tm}, this procedure computes the pair (lo, uo) where
First note that lo is the smallest concave majorant of l on [t0, tm]; thus it may be computed by a version of the pool-adjacent-violators algorithm; see for example Robertson et al. (1988). Then we obtain indices 0 ≤ j(0) < j(1) < ⋯ < j(b) ≤ m such that
With lo in hand, we then check to see if lo ≤ u on {t0, t1, …, tm}. If this fails, then there is no concave function lying between l and u, and the procedure returns an error message. If this test succeeds, then we compute uo(x) as
where . (The rest of the description of the procedure ConcInt(·, ·) is just as in Dümbgen et al. (2017).)
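The majorant step can be sketched as an upper-convex-hull scan, which enforces exactly the decreasing-slope condition that pool-adjacent-violators does (a sketch; the helper names are ours, not the paper's ConcInt internals):

```python
def cross(o, p, q):
    # z-component of (p - o) x (q - o); >= 0 means a left (counterclockwise) turn
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])

def concave_majorant(t, l):
    # Smallest concave majorant of the points (t_j, l(t_j)), evaluated at the t_j.
    # The upper hull keeps only right turns, so its slopes decrease: exactly the
    # concavity condition required of lo.
    pts = sorted(zip(t, l))
    hull = []
    for p in pts:
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    out, k = [], 0
    for x, _ in pts:  # interpolate the hull linearly back onto the knots
        while k + 1 < len(hull) and hull[k + 1][0] < x:
            k += 1
        (x0, y0), (x1, y1) = hull[k], hull[min(k + 1, len(hull) - 1)]
        out.append(y0 if x1 == x0 else y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return out

lo_vals = concave_majorant([0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 1.0, 3.0])
assert lo_vals == [0.0, 2.0, 2.5, 3.0]
```

The hull vertices are precisely the knots tj(0), …, tj(b) at which lo touches l.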
When s* < 0, let g(v; s*) ≡ g(v) ≡ −v^{s*} and h(v; s*) ≡ h(v) ≡ (−v)^{1/s*}. (This is the most important new case. When s = s* = 0, g(v) ≡ log(v), h(v) ≡ exp(v). When s* > 0, g(v) ≡ v^{s*} and h(v) ≡ v^{1/s*}.) Here is pseudocode for the computation of (Ln(s*), Un(s*)).
Illustration of the confidence bands
To get some feeling for the new confidence bands in a setting in which s* is known, we generated a sample of size n = 100 from the Student-t distribution with r = 1 degrees of freedom. This distribution belongs to 𝒫s* for every s* ≤ −1. We constructed Kolmogorov-Smirnov (KS) and weighted Kolmogorov-Smirnov (WKS) bands with γ = 0.4 as the initial starting bands (Ln, Un). We then computed and plotted our shape-constrained confidence bands (Ln(s*), Un(s*)) under the (correct) assumption that s* = −1 and the (incorrect) assumption that s* = 0 for both the KS and WKS bands as initial nonparametric bands with α = 0.05; see Figure 1 and Figure 2. To see the components of Figures 1 and 2 separately, see the Supplementary file, Figures 1–2 and 3–4 respectively.
Figure 1:

Confidence bands for bi-s*-concave distribution functions based on KS bands. The black curve is the distribution function of the Student-t distribution with 1 degree of freedom. The two gray-black lines give the KS band and lines in other colors are refined confidence bands under the bi-s*-concave assumption. The step function in the middle is the empirical distribution function.
Figure 2:

Confidence bands for bi-s*-concave distribution functions based on WKS bands. The black curve is the distribution function of the Student-t distribution with 1 degree of freedom. The two gray-black lines give the WKS band and lines in other colors are refined confidence bands under the bi-s*-concave assumption. The step function in the middle is the empirical distribution function.
Figure 3:

Confidence Bands for bi-s*-concave distribution functions from KS bands based on a sample of size 1000 from the Student-t distribution with 1 degree of freedom. The two gray-black lines give the initial bands, lines in other colors are refined confidence bands under the bi-s*-concave assumption. The step function (black) in the middle is the empirical distribution function.
Figure 4:

Confidence Bands for bi-s*-concave distribution functions from WKS bands based on a sample of size 1000 from the Student-t distribution with one degree of freedom. The two gray-black lines give the initial bands, lines in other colors are refined confidence bands under the bi-s*-concave assumption. The step function (black) in the middle is the empirical distribution function.
Note that when s* = 0, s* is misspecified and the resulting bands are not guaranteed to have coverage probability 0.95. An indication of this is that the shape-constrained bands computed under the assumption s* = 0 do not contain the empirical distribution function.
From these two plots, an immediate observation is that the confidence bands for smaller s* are wider than those for larger s*. This is a direct consequence of the nesting property of the bi-s*-concave classes; see Proposition 2. Also note that the shape-constrained band with s* = −1 does improve on the KS band, especially in the tails.
An Application
Dümbgen et al. (2017) gave an application of bi-log-concave confidence bands to a dataset from Wooldridge (2000). It contains approximate annual salaries of the CEOs of 177 randomly chosen companies in the U.S. The salary is rounded to multiples of 1000 USD. We denote the i-th observed approximate salary by Yi,raw. Dümbgen et al. (2017) assume that the unobserved true salary Yi,true lies within (Yi,raw − 1, Yi,raw + 1). Let us assume that Gtrue is the unknown distribution of Ytrue. For income data it is sometimes assumed that log10 Ytrue is Gaussian (see Kleiber and Kotz (2003)). Since Gaussian densities are all log-concave and hence have bi-log-concave distribution functions (by Proposition 1), it is natural to consider replacing the Gaussian assumption by the assumption of bi-log-concavity. Dümbgen et al. (2017) therefore assumed that X = log10 Ytrue is bi-log-concave and constructed 95% confidence bands (Ln, Un) (see Figure 4 of Dümbgen et al. (2017)) where Ln is computed with the empirical distribution of log10(Yi,raw + 1) and Un is computed with that of log10(Yi,raw − 1).
Here we assume that the distribution of X is bi-s*-concave for some s* and compute confidence bands for different values of s*. Now we are confronted with the issue of choosing s*: if we want narrower confidence bands we would assume some value of s* ∈ (0, 1], while if we are not willing to assume s* = 0 (the choice made by Dümbgen et al. (2017)), then we would assume some value of s* < 0 (leading to the larger classes 𝒫s* with s* < 0). It is of some interest to know if the CEO data could be modeled by use of the bi-s* classes with s* ∈ (0, 1] since this would result in still narrower confidence bands. But it is also of interest to try to use the data to choose s*.
Choosing s*
Since F can be a member of 𝒫s* for various values of s*, each s* leads to a different set of bands. However, due to the nesting property of the classes 𝒫s*, a larger s* always yields a narrower confidence band. Thus, it is of interest to estimate
since s0*(F) generates the narrowest bands at a given confidence level. If F is not bi-s*-concave for any s* ≤ 1, then we set s0*(F) ≡ −∞. Now s0*(F) is connected to the Csörgő–Révész constant since γ(F) ≤ 1 − s* when F ∈ 𝒫s* and s* ≤ 1. For example, the Student-t distribution with r “degrees of freedom” has s0*(F) = −1/r. However, this connection cannot be easily exploited for practical estimation purposes due to difficulties in estimating γ(F) or γ̃(F). So we take an alternative route to making inference about s0*(F).
Starting from an initial 1 − α band (Ln, Un), a bound on s0*(F) is given by
Clearly, for s* larger than this bound, there is no bi-s*-concave distribution function fitting into the band (Ln, Un). Since this happens with probability at most α ∈ (0, 1) when the true distribution function F ∈ 𝒫s*, it follows that the set of all s* not exceeding the bound is a confidence set for s0*(F) with coverage probability at least 1 − α. Our simulations suggest that this bound is generally considerably larger than s0*(F), and hence not suitable as an estimator, especially for α = 0.05.
Instead, we propose an estimator of s0*(F) based on the measure of the set where the empirical distribution function remains between the shape-constrained bands for s*. More formally, let Ln(s*) and Un(s*) denote the 1 − α level bi-s*-concave confidence bands based on the initial bands Ln and Un and the assumption F ∈ 𝒫s*. Define
A higher value of ω(s*) indicates that (Ln(s*), Un(s*)) contains a greater portion of the empirical distribution function. Since the bands (Ln(s*), Un(s*)) become narrower as s* increases, ω(s*) decreases in s*, and eventually becomes zero when no bi-s*-concave distribution function fits into the initial band. A plausible estimator of s0*(F) is therefore given by
| (18) |
where ρ is a threshold taking values in (0, 1). The calculation of the estimator thus depends on α and ρ.
In the case of the CEO data, taking α = 0.05 and ρ = 0.95 leads to the estimates for the KS and WKS initial bands. The resulting bands are given in Figures 5 and 6. Also see the Supplementary file, Figures 9–10 and Figures 11–12, for the steps in constructing Figures 5 and 6.
Figure 5:

Confidence Bands from an initial KS band for the CEO salary data. The step function in the middle is the empirical distribution function. The two gray-black lines give the KS band and lines in other colors are refined confidence bands under the bi-s*-concave assumption.
Figure 6:

Confidence Bands from an initial WKS band for the CEO salary data. The step function in the middle is the empirical distribution function. The two gray-black lines give the WKS band and lines in other colors are refined confidence bands under the bi-s*-concave assumption.
We should emphasize that our current theory says little about the coverage probabilities of the bands based on an estimated s*. Discussion of the consistency of the estimator is beyond the scope of the present paper, but this and further issues concerning inference for both s* and F seem to be interesting directions for future research.
5. Summary and further problems
In this paper we have:
• Defined new classes of shape-constrained distribution functions, the bi-s*-classes extending the bi-log-concave class of distribution functions defined by Dümbgen et al. (2017).
• Characterized the new classes and connected our characterization to an important parameter, the Csörgő - Révész constant associated with a distribution function F.
• Used the new bi-s*-concave classes to define refined confidence bands for distribution functions which exploit the shape constraint, thereby producing more accurate (narrower) bands with honest coverage when the shape constraint holds.
Thus we have shown that if the parameter s* ∈ (−∞, 1] determining the class is known and correct, we can construct refined confidence bands which improve on any given nonparametric confidence bands. It follows from the construction of our bands that they have conservative coverage probabilities under the (null) hypothesis that the true distribution function is in and that s* is correctly specified.
• What if we do not know s*? Can we estimate it from the data? As becomes clear from the discussion of the CEO data via Figures 5 and 6, our methods provide one-sided confidence bounds for the true s* of the form (−∞, ] under the assumption that for some s*. It remains to develop inference methods for s* and (s*, F) jointly. It will also be of interest to have a more complete understanding of the power behavior of tests related to and .
• The stable laws are known to be unimodal; see e.g. Hall (1984) for some history. In connection with Example 8 we have the following:
Conjecture:
the α-stable laws are s-concave with s = −1/(1 + α) for 0 < α < 2.
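As a numerical sanity check (not a proof) of the α = 1/2 case of this conjecture, the Lévy(0, 1) density f(x) ∝ x^(−3/2) e^(−1/(2x)) should be s-concave with s = −1/(1 + 1/2) = −2/3, in agreement with Example 8; since s < 0, this means f^(−2/3) should be convex on (0, ∞), which can be probed via second differences on a grid:

```python
import math

def levy_density(x, c=1.0):
    """Levy(0, c) density on (0, infinity)."""
    return math.sqrt(c / (2 * math.pi)) * math.exp(-c / (2 * x)) / x ** 1.5

s = -2.0 / 3.0    # conjectured s for alpha = 1/2
h = 1e-3          # step for second differences
grid = [0.05 + 0.01 * i for i in range(500)]   # x in [0.05, 5.04]

# For s < 0, f is s-concave iff f**s is convex: second differences >= 0.
second_diffs = [
    levy_density(x - h) ** s - 2 * levy_density(x) ** s + levy_density(x + h) ** s
    for x in grid
]
assert all(d >= 0 for d in second_diffs)
```

Here f^(−2/3)(x) is proportional to x·e^(1/(3x)), whose second derivative is positive, so the check passes with a comfortable margin over floating-point noise.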
6. Proofs
Proof of Theorem 3:
Throughout our proof we will denote inf J(F) and sup J(F) by a and b respectively. Moreover, we assume s* < 0 in the following proof and leave the case of s* > 0 for the Appendix. Note that the case s* = 0 is proved by Dümbgen et al. (2017).
(i) implies (ii):
Suppose . To prove that F is continuous on , we first note that x ↦ Fs*(x) and x ↦ (1 − F(x))s* are convex functions on . By Theorem 10.1 (page 82) of Rockafellar (1970), Fs* and (1 − F)s* are continuous on any open convex set in their effective domains. In particular, Fs* and (1 − F)s* are continuous on (a, ∞) and (−∞, b) respectively. This implies that F is continuous on (a, ∞) and (−∞, b), or equivalently, on (a, ∞) ∪ (−∞, b) = (−∞, ∞) since F is non-degenerate.
To prove that F is differentiable on J(F), note that J(F) = (a, b) since F is continuous on . By Theorem 23.1 (page 213) of Rockafellar (1970), for any x ∈ J(F), the convexity of Fs* on J(F) implies the existence of and . Moreover, by Theorem 24.1 (page 227) in Rockafellar (1970). Since F = (Fs*)1/s* on J(F), the chain rule guarantees the existence of and
Since F is continuous on J(F), then
Hence by noting that and s* < 0.
Similarly, one can prove by the convexity of (1 − F)s* on J(F). Thus for any x ∈ J(F), or equivalently, F is differentiable on J(F). The derivative of F is denoted by f, i.e. f ≡ F′.
To prove (6), note that the convexity of x ↦ Fs* (x) on J(F) implies that, for any x,y ∈ J(F),
or, with x+ = max{x, 0},
Hence,
or, equivalently,
Analogously, the convexity of (1 − F(x))s* on J(F) implies that
or, equivalently,
which yields
The proof of (6) is complete.
(i) implies (iii):
Applying (6) yields that for any x, y ∈ J(F) with x < y,
and
or, equivalently,
and
By defining h ≡ f/F1−s* on J(F), it follows that
and
After summing up the last two inequalities, it follows that
or, equivalently,
Hence h(x) ≥ h(y), or equivalently, h(·) is a monotonically non-increasing function on J(F).
The proof of the monotonicity of is similar and hence is omitted.
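For a concrete instance of the monotonicity in (iii), take the logistic distribution, where f = F(1 − F); then f/F^(1−s*) = F^(s*)(1 − F) is non-increasing and f/(1 − F)^(1−s*) = F(1 − F)^(s*) is non-decreasing for any s* ≤ 0. A numerical check with the illustrative choice s* = −1/2:

```python
import math

F = lambda x: 1.0 / (1.0 + math.exp(-x))   # logistic distribution function
f = lambda x: F(x) * (1.0 - F(x))          # logistic density
s_star = -0.5                              # illustrative s* < 0

xs = [i / 10 - 5.0 for i in range(101)]    # grid on [-5, 5]
h1 = [f(x) / F(x) ** (1 - s_star) for x in xs]          # should be non-increasing
h2 = [f(x) / (1 - F(x)) ** (1 - s_star) for x in xs]    # should be non-decreasing
assert all(a >= b for a, b in zip(h1, h1[1:]))
assert all(a <= b for a, b in zip(h2, h2[1:]))
```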
(iii) implies (iv):
If (iii) holds, it immediately follows that f > 0 on J(F) = (a, b). If not, suppose that f(x0) = 0 for some x0 ∈ J(F). It follows that h(x0) = f(x0)/F1−s*(x0) = 0. Since h is monotonically non-increasing on J(F), h(x) = 0 for all x ∈ [x0, b), or, equivalently, f = 0 on [x0, b). Similarly, the non-decreasing monotonicity of on J(F) implies that f = 0 on (a, x0]. Then f = 0 on J(F), which violates the continuity assumption in (iii) and hence f > 0 on J(F).
To prove f is bounded on J(F), note that the monotonicities of h and imply that for any x, x0 ∈ J(F),
Hence for any x, x0 ∈ J(F).
To prove that f is differentiable on J(F) almost everywhere, we first prove that f is Lipschitz continuous on (c, d) for any c, d ∈ J(F) with c < d.
By the non-increasing monotonicity of h on J(F), the following arguments yield an upper bound of (f(y) − f(x))/(y − x) for any x, y ∈ (c, d):
where the last equality follows from the mean value theorem and z is between x and y.
Since −s* > 0, it follows that F−s* < 1 and hence
for x, y ∈ (c, d).
Similar arguments imply that
Hence
The last display shows that f is Lipschitz continuous on (c, d).
By Proposition 4.1(iii) of Shorack (2017), page 82, f is absolutely continuous on (c, d), and hence f is differentiable on (c, d) almost everywhere.
Since (c, d) is an arbitrary interval in (a, b), the differentiability of f on (c, d) implies the differentiability of f on (a, b) and hence f is differentiable on (a, b) with f′ = F″ almost everywhere.
Since f is differentiable almost everywhere, the non-increasing monotonicity of h on J(F) implies that
or, equivalently,
Straightforward calculation yields that the last display is equivalent to
or,
which is the right hand side of (8).
Similarly, the non-decreasing monotonicity of implies the left hand side of (8).
(iv) implies (i):
Since F is continuous on , it suffices to prove that Fs* is convex on J(F) by Definition 2. Since we assume that F is differentiable on J(F) with derivative f = F′, the convexity of Fs* on J(F) can be proved by the non-decreasing monotonicity of the first derivative of Fs* on J(F). Since f is differentiable almost everywhere on J(F), the non-decreasing monotonicity of (Fs*)′ on J(F) can be proved by the non-negativity of (Fs*)″ on J(F) almost everywhere, which follows from
where f = F′, f′ = F″. The last inequality follows from the right hand side of (8).
Similarly, the convexity of (1 − F(x))s* , or , on J(F) can be proved by the following arguments:
where the last inequality follows from the left part of (8). □
Proof of Proposition 1:
First some background and definitions:
- Let a, b ≥ 0 and θ ∈ (0, 1). The generalized mean Ms(a, b; θ) of order s is defined by
- Let (M, d) be a metric space with Borel σ-field . A measure μ on is called t-concave if for nonempty sets and 0 < θ < 1 we have
where μ* is the inner measure corresponding to μ (which is needed in general in view of examples noted by Erdős and Stone (1970)).
- A non-negative real-valued function h on (M, d) is called s-concave if for x, y ∈ M and 0 < θ < 1 we have
See Chapter 3.3 in Dharmadhikari and Joag-Dev (1988) for more details of the definitions of Ms(a, b; θ), t-concave and s-concave.
- Suppose , k-dimensional Euclidean space with the usual Euclidean metric, suppose that f is an s-concave density function with respect to Lebesgue measure λ on , and consider the probability measure μ on defined by
Then by a theorem of Borell (1975), Brascamp and Lieb (1976) and Rinott (1976), the measure μ is s*-concave where s* = s/(1 + ks) if s ∈ (−1/k, ∞) and s* = 0 if s = 0.
- Here we are in the case k = 1. Thus for s ∈ (−1, ∞) the measure μ is s*-concave: for s ∈ (−1, ∞), , and 0 < θ < 1,
where μ* denotes the inner measure corresponding to μ. (19)
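The map s ↦ s* = s/(1 + ks) and the generalized mean Ms can be checked numerically. For the log-concave case s = 0 (so s* = 0, with M0 the geometric mean) and the standard normal measure on ℝ, the defining inequality on half-lines reduces to log-concavity of Φ. The weighting convention in `gen_mean` (weight θ on the second argument) is an illustrative assumption:

```python
import math

def s_star(s, k=1):
    """Concavity index of the measure induced by an s-concave density on R^k."""
    return 0.0 if s == 0 else s / (1.0 + k * s)

def gen_mean(a, b, theta, s):
    """Generalized mean M_s(a, b; theta); geometric mean at s = 0."""
    if s == 0:
        return a ** (1 - theta) * b ** theta
    if s < 0 and (a == 0 or b == 0):
        return 0.0   # limiting value for negative orders
    return ((1 - theta) * a ** s + theta * b ** s) ** (1.0 / s)

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal d.f.

# s = 0 (log-concave density) yields an s* = 0 (log-concave) measure:
# Phi((1-theta)x + theta*y) >= M_0(Phi(x), Phi(y); theta) on half-lines.
for x in [-2.0, -0.5, 1.0]:
    for y in [-1.0, 0.5, 2.5]:
        for theta in [0.25, 0.5, 0.75]:
            lhs = Phi((1 - theta) * x + theta * y)
            rhs = gen_mean(Phi(x), Phi(y), theta, s=0)
            assert lhs >= rhs
```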
With this preparation we can give our proof of Proposition 1: if A = (−∞, x] and B = (−∞, y] for x, y ∈ J(F), it is easily seen that
Therefore, with the second inequality following from (19),
i.e. F is s*-concave. Similarly, taking A = (x, ∞) and B = (y, ∞) it follows that 1 − F is s*-concave.
Note that this argument contains the case s* = 0. □
Proof of Proposition 2:
By Theorem 3, for any , F is continuous on and differentiable on J(F) with derivative f = F′. Furthermore, f is differentiable almost everywhere on J(F) with derivative f′ = F″ satisfying (8).
For any t* ≤ s*, by noting that 1 − s* ≤ 1 − t* and −(1 − s*) ≥ −(1 − t*), it follows that
almost everywhere on J(F). Hence by Theorem 3. This proves (1).
To prove (2), note that for any , F is continuous on and differentiable on J(F) with derivative f = F′. Furthermore, f is differentiable almost everywhere on J(F) with derivative f′ = F″ satisfying (8), i.e.
for all s* > 0. By taking s* → 0, it follows that
The last display is equivalent to by Theorem 3. This proves that the left hand side of (2) holds. Similarly, one can prove the right hand side of (2); the details are omitted. □
Proof of Corollary 4:
To prove the right part of (11), note that (8) implies that
almost everywhere on J(F), or equivalently,
Replacing ess supx∈J(F) F f′/f2 and ess supx∈J(F)−(1 − F)f′/f2 by and , it follows that
One can prove the left two inequalities of (11) by the following arguments:
where the last inequality holds since u ∧ (1 − u) ≥ u(1 − u) for 0 ≤ u ≤ 1. □
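The elementary bound u ∧ (1 − u) ≥ u(1 − u) used in the last step holds because the complementary factor u ∨ (1 − u) is at most 1; a quick numerical confirmation on a grid:

```python
# Check min(u, 1-u) >= u(1-u) on a fine grid of [0, 1].
us = [i / 1000 for i in range(1001)]
assert all(min(u, 1 - u) >= u * (1 - u) for u in us)
```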
Proof of Corollary 5:
Note that for s* < 0 and y > −1, we have (1 + y)s* ≥ 1 + s*y. Replacing y by −F(x), where x ∈ J(F), it follows that
or, by rearranging,
where FU is a convex function on J(F) if . This proves the right hand side of (12) for s* < 0. Similarly, replacing y by −(1 − F(x)), where x ∈ J(F), by rearranging terms, it follows that
which proves the left hand side of (12) for s* < 0.
Similarly, for 1 ≥ s* > 0 and y > −1, we have (1 + y)s* ≤ 1 + s*y. Replacing y by −F(x), where x ∈ J(F), it follows that
or, by rearranging,
where FU is a convex function on J(F) if . This proves the right hand side of (12) for s* > 0.
Similarly, replacing y by −(1 − F(x)), where x ∈ J(F), by rearranging terms, it follows that
which proves the left hand side of (12) for s* > 0. □
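Both directions of the Bernoulli-type inequality driving this proof, (1 + y)^(s*) ≥ 1 + s*y for s* < 0 and (1 + y)^(s*) ≤ 1 + s*y for 0 < s* ≤ 1 (with y > −1), follow from convexity, respectively concavity, of y ↦ (1 + y)^(s*). A grid check with the illustrative exponents s* = ±1/2:

```python
# Bernoulli-type inequality on a grid of y in (-1, 2].
ys = [-0.99 + i / 100 for i in range(300)]
for y in ys:
    assert (1 + y) ** -0.5 >= 1 - 0.5 * y   # s* = -1/2 < 0: convex case
    assert (1 + y) ** 0.5 <= 1 + 0.5 * y    # s* = 1/2 in (0, 1]: concave case
```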
Proof of Lemma 6:
If no bi-s*-concave distribution function fits between Ln and Un, then and , and the assertions in both (i) and (ii) are trivial. In the following proof, we let G be a bi-s*-concave distribution function such that Ln ≤ G ≤ Un.
(i) It suffices to prove that for any x ∈ J(G) the density function g = G′ satisfies g(x) ≤ max{γ1, γ2}, because this is equivalent to Lipschitz-continuity of G with the latter constant, and this property carries over to the pointwise infimum and supremum .
To prove g(x) ≤ max{γ1, γ2}, note that since g/G1−s* is monotonically non-increasing on J(G) (see Theorem 3(iii)), it follows that for x ≥ b
The last inequality follows from noting that x ↦ (1/s*)xs* is a monotonically non-decreasing function for all s* ≠ 0, G(b) ≤ Un(b) ≤ v and G(a) ≥ Ln(a) ≥ u. Hence
Similarly, by noting that g/(1 − G)1−s* is monotonically non-decreasing on J(G) (see Theorem 3(iii)), it follows that for x ≤ a
The last inequality follows from noting that x ↦ −(1/s*)(1 − x)s* is a monotonically non-decreasing function for all s* ≠ 0, G(b) ≤ v and G(a) ≥ u. Hence
For a < x < b, we analogously get the following two inequalities
and
Multiplying the former inequality by (x − a), the latter by (b − x), and adding yields
where
Since
it follows that h(y) is convex on (0, 1) and hence
Note that
and
Hence g(x) ≤ max{γ1, γ2} for a < x < b.
(ii) By Theorem 3(ii), it follows that for x ≤ a
By Theorem 3(iii), the non-increasing monotonicity of g/G1−s* implies that
The last inequality follows from noting that G(a) ≤ Un(a) ≤ u and G(b) ≥ Ln(b) ≥ v. Since x − a ≤ 0, it follows that
The last inequality follows from noting that G(a) ≤ u.
On the other hand, by Theorem 3(ii), it follows that for x ≥ b
The last inequality follows from noting that 1 − G(b) ≤ 1 − v. By Theorem 3(iii), the non-decreasing monotonicity of g/(1 − G)1−s* implies that
The last inequality follows from noting that G(a) ≤ Un(a) ≤ u and G(b) ≥ Ln(b) ≥ v. Since x − b ≥ 0, it follows that
Proof of Theorem 7:
The following proof is analogous to the proof of Theorem 3 in Dümbgen et al. (2017), in which they proved the result in the case s* = 0. In the following proof we assume that s* ≠ 0.
(i) Suppose s* > 0. Since F is not bi-s*-concave, it follows that Fs* or (1 − F)s* is not concave. Without loss of generality, we assume that Fs* is not concave and hence there exist real numbers x0 < x1 < x2 such that Fs*(x1) < (1 − λ)Fs* (x0) + λFs* (x2), where λ ≡ (x1 − x0)/(x2 − x0) ∈ (0, 1). By the consistency of Ln and Un, it follows that, with probability tending to one, and hence
for any G such that Ln ≤ G ≤ Un. Therefore, there are no bi-s*-concave distribution functions fitting between Ln and Un and hence and with probability tending to one.
The proof of the case s* < 0 is similar and hence is omitted.
(ii) Suppose . Note that since (Ln, Un) is a (1 − α) confidence band for F, it follows that .
If is empty, it follows that and and hence the assertions are trivial. In the following proof, we assume that is not empty.
To prove (13), we first prove that ‖Ln − F‖∞ →p 0 and ‖Un − F‖∞ →p 0. By the continuity of F, for any integer m ≥ 2, there exist real numbers such that F(xi) = i/m, i = 1, …, m − 1. Furthermore, define x0 = −∞ and xm = ∞.
By the non-decreasing monotonicity of Ln and F, it follows that for x ∈ [xi−1, xi]
and
Hence
for x ∈ [xi−1,xi]. Note that
it follows that
and hence pointwise convergence implies uniform convergence. An analogous proof shows that ‖Un − F‖∞ →p 0 and is omitted.
Combining ‖Ln − F‖∞ →p 0 and ‖Un − F‖∞ →p 0 implies that
To prove (14) in the case that hG = (Gs*)′, it suffices to prove that
| (20) |
Note that hG/s* = G′/G1−s*. Since K is a compact interval in J(F) and hF/s* = f/F1−s* is continuous and non-increasing on J(F), for any fixed ϵ > 0 there exist points a0 < a1 < ⋯ < am < am+1 in J(F) such that K ⊂ [a1, am] and
For with Ln ≤ G ≤ Un, for any x ∈ K it follows from the monotonicity of hF/s* and hG/s* that
Analogously,
Since ϵ > 0 is arbitrarily small, this shows that (20) holds.
The proof of (14) in the case that hG = ((1 − G)s*)′ is similar and hence is omitted.
Since G′ = G1−s* (Gs* / s*)′, it follows from (20) that (14) holds in the case that hG = G′.
Finally, let x1 < sup J(F) and b1 < f(x1)/F1−s*(x1). As in the proof of Lemma 6(ii) an analogous argument implies that for any , ,
for all x ≤ x′ ≤ x1.
Note that by the consistency of Ln and Un and letting , it follows that.
Hence with probability tending to one,
for all x ≤ x′ ≤ x1. The proof of (16) is similar and hence is omitted. □
Proof of Remark 1:
(i) By Theorem 3(ii), if s* > 0 and inf J(F) = −∞, it follows that for arbitrary x ∈ J(F),
for small enough y such that
This violates the assumption that inf J(F) = −∞ and hence inf J(F) > −∞. The finiteness of sup J(F) can be proved similarly and hence is omitted.
(ii) We first note that (9) holds automatically if inf J(F) > −∞ and sup J(F) < ∞.
In the following proof, we focus on the case that inf J(F) = −∞ and sup J(F) < ∞. To prove (9), it suffices to show that ∫ |x|tdF(x) is finite for t ∈ (0, (−1)/s*). Note that
Since sup J(F) is finite, the first term of the last display is finite and hence it suffices to prove that tat−1P(X < −a) is integrable for t < (−1)/s*.
It follows from Theorem 3(ii) that for any a large enough and x ∈ J(F),
Thus tat−1P(X < −a) is integrable for t < (−1)/s*, since
for a large enough and at+1/s*−1 is integrable for t < (−1)/s*.
For other cases, the proof is similar and hence is omitted. □
Proof of Corollary 8:
Suppose that x0 is a point in J(F). Notice that for any ,
and hence by Fubini’s theorem, it follows that
| (21) |
provided that
To prove the last display, note that for any b1 ∈ (0, T1(F)) and b2 ∈ (0, T2(F)), there exist points x1, x2 ∈ J(F) with x1 ≤ x0 ≤ x2 and
Then it follows from Theorem 7(ii) that with probability tending to one,
and
Hence for any c > max{|x1|, |x2|}, it follows that
Since |ϕ′(x)| ≤ a|x|k−1, it follows that the last display is no larger than
which is finite by noting that k − 1 + 1/s* < −1. Analogously, one can prove that for c > max{|x1|, |x2|},
Since ϕ′ is continuous on , it follows that for any c > max{|x1|, |x2|},
and hence
By (21), it follows that
which is not larger than
Note that the last two terms go to zero as c goes to infinity by their integrability and hence
Proof of Theorem 9: It follows from the proof of Corollary 8 that
and hence
It suffices to bound |G − F | on , where G is between and .
It follows from and Condition (*) that on the interval ,
To bound , it follows from Theorem 3.7.1, page 141, Shorack and Wellner (2009) that
by verifying that q(t) = (t(1 − t))γ with 0 ≤ γ < 1/2 is monotonically increasing on [0, 1/2], symmetric about 1/2, and , where is a Brownian bridge on [0, 1].
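The properties of the weight function q(t) = (t(1 − t))^γ invoked here (non-decreasing on [0, 1/2], symmetric about 1/2) are easy to confirm numerically, e.g. with the illustrative choice γ = 0.3:

```python
gamma = 0.3                              # any gamma in [0, 1/2)
q = lambda t: (t * (1 - t)) ** gamma     # Chibisov-O'Reilly-type weight

ts = [i / 1000 for i in range(501)]      # t in [0, 1/2]
vals = [q(t) for t in ts]
assert all(a <= b for a, b in zip(vals, vals[1:]))    # non-decreasing on [0, 1/2]
assert all(abs(q(t) - q(1 - t)) < 1e-12 for t in ts)  # symmetric about 1/2
```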
Hence for any fixed ϵ ∈ (0, 1) there exists a constant κϵ > 0 such that with probability at least 1 − ϵ,
on . Thus, it follows that on the interval ,
To bound by F(1 − F), note that
For a constant λϵ > 0 to be specified later, it follows from λϵn−1/(2−2γ) ≤ F ≤ 1 − λϵn−1/(2−2γ) and γ ∈ [0, 1/2) that
and
Hence
Thus, on the interval
where .
The following arguments show that for a large enough λϵ, the interval {λϵn−1/(2−2γ) ≤ F ≤ 1 − λϵn−1/(2−2γ)} is a subset of .
To see this, note that
and analogously,
it follows that by choosing a λϵ large enough such that , the interval {λϵn−1/(2−2γ) ≤ F ≤ 1 − λϵn−1/(2−2γ)} is a subset of and hence on the interval
Define xn1 and xn2 such that F(xn1) = λϵn−1/(2−2γ) and F(xn2) = 1 − λϵn−1/(2−2γ). Analogously, one can prove that F − G ≤ νϵn−1/2(F(1 − F))γ on [xn1, xn2] and hence
| (22) |
on [xn1,xn2]. Thus for G between and ,
From here, we can see that if F is bi-s*-concave with s* > 0, it follows from Remark 1(i) that J(F) is bounded and hence
as long as ϕ′ is bounded on J(F).
A similar argument works if F is bi-s*-concave with s* < 0 and J(F) is bounded. In the following proof, we return to our case that F is bi-s*-concave with s* < 0 and, without loss of generality, we assume J(F) = (−∞, ∞).
As in the proof of Corollary 8, for x0 ∈ J(F), b1 ∈ (0, T1(F)) and b2 ∈ (0, T2(F)), there exist points x1, x2 ∈ J(F) with x1 < x0 < x2 such that f(x1)/F1−s*(x1) > b1 and f(x2)/(1 − F(x2))1−s* > b2. Then it follows from Theorem 7(ii) that with asymptotic probability one,
| (23) |
and
Similarly, it follows from Theorem 3(ii) that
| (24) |
and
For large enough n, one can have [x1, x2] ⊂ [xn1, xn2] and hence
where
Note that . For the other terms, first note that F(xn1) = λϵn−1/(2−2γ) and hence it follows from (24) that
Analogously, one can prove that
Thus, it follows from (24) and the upper bound of |ϕ′| that
Analogously, one could show that
To bound , note that for x ≤ xn1, it follows from an analogous proof of (24) that
Analogously, it follows that for x ≤ xn1,
Note that it follows from (22) that
and hence for x ≤ xn1,
Thus,
Analogously, one could show that
Hence
Supplementary Material
Table 1:
Summary of Examples 1-8
| Name | Example | density f | d.f. F | s | s* = s/(1 + s) | |
|---|---|---|---|---|---|---|
| student-t | 1 | fr, r > 0 | Fr | − 1/(1 + r) | − 1/r | 1 + (1/r) |
| Fa,b | 2 | fa,b, a, b > 0 | Fa,b | − 1/(1 + a/2) | −2/a | 1 + 2/a |
| Pareto(a, b) | 3 | fa,b, a, b > 0 | Fa,b | − 1/(1 + a) | − 1/a | 1 + 1/a |
| Symmetric Beta | 4 | fr, r > 0 | Fr | 2/r | 2/(r + 2) | 1/(1 + 2/r) = r/(r + 2) |
| Expo family Tilted U(0, 1) | 5 | Ft | 0 | e −|t| | 1 − e−|t| | |
| Mixture, N(δ, 1), N(−δ, 1) | 6 | fδ | Fδ | not s-concave for δ > 1 | 0 for 0 < δ < 1.34 | 1 0 < δ < 1.34 |
| Mixture, T(δ, 1), T(−δ, 1) | 7 | fδ | Fδ | not s-concave δ > .6 | bi-s*-concave, some s* 0 < δ < ∞ | 2 δ small |
| Lévy α = 1/2 | 8 | fa | Fa | −2/3 | −2 | 3 |
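The s and s* = s/(1 + s) columns of Table 1 can be cross-checked with exact rational arithmetic; the parameter values below (r = 3, a = 2) are arbitrary illustrations:

```python
from fractions import Fraction

def to_s_star(s):
    """s* = s/(1 + s): map from density to d.f. concavity index (k = 1)."""
    return s / (1 + s)

r, a = Fraction(3), Fraction(2)            # illustrative parameter choices
# student-t_r: s = -1/(1+r)  ->  s* = -1/r
assert to_s_star(Fraction(-1) / (1 + r)) == -1 / r
# Pareto(a, b): s = -1/(1+a)  ->  s* = -1/a
assert to_s_star(Fraction(-1) / (1 + a)) == -1 / a
# symmetric Beta_r: s = 2/r  ->  s* = 2/(r+2)
assert to_s_star(2 / r) == 2 / (r + 2)
# Levy (alpha = 1/2): s = -2/3  ->  s* = -2
assert to_s_star(Fraction(-2, 3)) == -2
```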
Highlights:
• New classes of shape-constrained distributions are defined and studied.
• New confidence bands which exploit the shape constraints are defined and shown to improve on existing bands if the assumed shape constraint holds.
• The new classes of shape-constrained distribution functions, which we call bi-s*-concave, play an important role in the theory of quantile processes.
Acknowledgements:
We owe thanks to Lutz Dümbgen for several helpful discussions. We also thank two referees for their positive comments and suggestions.
The research of J. A. Wellner was partially supported by NSF grant DMS-1566514, NIAID grant 2R01 AI291968-04, a Simons Fellowship via the Newton Institute (INI-program STS 2018), Cambridge University, and the Saw Swee Hock Visiting Professorship of Statistics at the National University of Singapore (in 2019).
7. Appendix 1
Proof of the equivalence between Definition 1 and Definition 2.
Definition 1 implies Definition 2:
For any , Theorem 3 shows that F is a continuous function on . By noticing that J(F) ⊂ (inf J(F), ∞) and J(F) ⊂ (−∞, sup J(F)), the convexity or concavity of Fs* or (1 − F)s* on , (inf J(F), ∞) and (−∞, sup J(F)) implies the convexity or concavity of Fs* or (1 − F)s* on J(F). Hence, Definition 1 implies Definition 2.
Definition 2 implies Definition 1:
Suppose s* < 0. By Definition 2, for any , Fs* and (1 − F)s* are convex on J(F). Moreover, F is continuous on and hence J(F) = (a, b) where a ≡ inf J(F), b ≡ sup J(F).
To prove that Fs* is convex on , by continuity of F it suffices to prove that Fs* is mid-point convex: that is,
| (25) |
for any . Without loss of generality, we assume that x < y.
Note that if a = −∞ and b = ∞, then there is nothing to prove. Without loss of generality, we assume that a > −∞ and b < ∞.
Note that if x ∈ (−∞, a], then Fs*(x) = ∞ and hence (25) holds automatically. If x ∈ (a, b) and y ∈ (a, b), (25) holds by the convexity of Fs* on J(F). Moreover, by noticing the continuity of Fs* at b, (25) holds for any x ∈ (a, b) and y ∈ (a, b]. Since Fs*(y) = Fs*(b) = 1 for y ≥ b, (25) holds for any x ∈ (a, b) and y ∈ [b, ∞). If x, y ∈ [b, ∞), (25) holds automatically since Fs* (x) = Fs*(y) = 1.
The proof of the convexity of (1 − F)s* on is similar and hence is omitted. For the cases that s* ≥ 0, the proof is similar and hence is omitted. □
Proof of Theorem 3 (0 < s* ≤ 1):
Recall that a ≡ inf J(F) and b ≡ sup J(F). Suppose 1 ≥ s* > 0.
(i) implies (ii):
Suppose . To prove that F is continuous on , we first note that x ↦ Fs*(x) and x ↦ (1 − F(x))s* are concave functions on (a, ∞) and (−∞, b) respectively. By Theorem 10.1 (page 82) in Rockafellar (1970), Fs* and (1 − F(x))s* are continuous on any open convex set in their effective domains. In particular, Fs* and (1 − F)s* are continuous on (a, ∞) and (−∞, b) respectively. This yields that F is continuous on (a, ∞) and (−∞, b), or equivalently, on (a, ∞) ∪ (−∞, b) = (−∞, ∞) since F is non-degenerate.
To prove that F is differentiable on J(F), note that J(F) = (a, b) since F is continuous on . By Theorem 23.1 (page 213) in Rockafellar (1970), for any x ∈ J(F), the concavity of Fs* on J(F) implies the existence of and . Moreover, by Theorem 24.1 (page 227) in Rockafellar (1970). Since F = (Fs*)1/s* on J(F) the chain rule guarantees the existence of and
Since F is continuous on J(F), then
Hence by .
Similarly, one can prove by the concavity of (1 − F)s* on J(F). Thus for any x ∈ J(F), or equivalently, F is differentiable on J(F). The derivative of F is denoted by f, i.e. f ≡ F′.
To prove (6), note that the concavity of x ↦ Fs* (x) on J(F) implies that, for any x, y ∈ J(F),
or, with x+ = max{x, 0},
Hence
or, equivalently,
Analogously, the concavity of (1 − F(x))s* on J(F) implies that for any x, y ∈ J(F)
or, equivalently,
which yields
The proof of (6) is complete.
(ii) implies (iii):
Applying (6) yields that for any x, y ∈ J(F) with x < y,
and
or, equivalently,
and
By defining h ≡ f/F1−s* on J(F), it follows that
and
After summing up the last two inequalities, it follows that
or, equivalently,
Hence h(x) ≥ h(y), or equivalently, h(·) is a monotonically non-increasing function on J(F).
The proof of the monotonicity of is similar and hence is omitted.
(iii) implies (iv):
If (iii) holds, it immediately follows that f > 0 on J(F) = (a, b). If not, suppose that f(x0) = 0 for some x0 ∈ J(F). It follows that h(x0) = f(x0)/F1−s*(x0) = 0. Since h is monotonically non-increasing on J(F), h(x) = 0 for all x ∈ [x0, b), or, equivalently, f = 0 on [x0, b). Similarly, the non-decreasing monotonicity of on J(F) implies that f = 0 on (a, x0]. Then f = 0 on J(F), which violates the continuity assumption in (iii) and hence f > 0 on J(F).
To prove f is bounded on J(F), note that the monotonicities of h and imply that for any x, x0 ∈ J(F),
Hence for any x, x0 ∈ J(F).
To prove that f is differentiable on J(F) almost everywhere, we first prove that f is Lipschitz continuous on (c, d) for any c, d ∈ J(F) with c < d.
By noticing the non-increasing monotonicity of h on J(F), the following arguments yield an upper bound of (f(y) − f(x))/(y − x) for x, y ∈ (c, d):
where the last equality follows from the mean value theorem and z is between x and y.
Since − s* < 0, it follows that F−s*(z) < F−s* (c) and hence
for x, y ∈ (c, d).
Similar arguments imply that
Hence
The last display shows that f is Lipschitz continuous on (c, d).
By Proposition 4.1(iii) of Shorack (2017), page 82, f is absolutely continuous on (c, d), and hence f is differentiable on (c, d) almost everywhere.
Since (c, d) is an arbitrary interval in (a, b), the differentiability of f on (c, d) implies the differentiability of f on (a, b) and hence f is differentiable on (a, b) with f′ = F″ almost everywhere.
Since f is differentiable almost everywhere, the non-increasing monotonicity of h on J(F) implies that
or, equivalently,
Straightforward calculation yields that the last display is equivalent to
or,
which is the right hand side of (8).
Similarly, the non-decreasing monotonicity of implies the left hand side of (8).
(iv) implies (i):
Since F is continuous on , it suffices to prove that Fs* is concave on J(F) by Definition 2. Since we assume that F is differentiable on J(F) with derivative f = F′, the concavity of Fs* on J(F) can be proved by the non-increasing monotonicity of the first derivative of Fs* on J(F). Since f is differentiable almost everywhere on J(F), the non-increasing monotonicity of (Fs*)′ on J(F) can be proved by the non-positivity of (Fs*)″ on J(F) almost everywhere, which follows from
where f = F′, f′ = F″. The last inequality follows from the right hand side of (8).
Similarly, the concavity of (1 − F(x))s*, or , on J(F) can be proved by the following arguments:
where the last inequality follows from the left part of (8). □
References
- Barrio ED, Giné E and Utzet F (2005). Asymptotics for L2 functionals of the empirical quantile process, with applications to tests of fit based on weighted Wasserstein distances. Bernoulli 11 131–189.
- Bobkov S and Ledoux M (2019). One-dimensional empirical measures, order statistics, and Kantorovich transport distances. Mem. Amer. Math. Soc. 261.
- Borell C (1975). Convex set functions in d-space. Periodica Mathematica Hungarica 6(2) 111–136.
- Brascamp HJ and Lieb EH (1976). On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. Journal of Functional Analysis 22 366–389.
- Csörgő M and Révész P (1978). Strong approximations of the quantile process. Ann. Statist. 6 882–894.
- Dharmadhikari S and Joag-Dev K (1988). Unimodality, Convexity, and Applications. Academic Press.
- Dümbgen L, Kolesnyk P and Wilke RA (2017). Bi-log-concave distribution functions. Journal of Statistical Planning and Inference 184 1–17.
- Dümbgen L and Wellner JA (2014). Confidence bands for distribution functions: A new look at the law of the iterated logarithm. Tech. rep., Department of Statistics, University of Washington.
- Durrett R (2019). Probability: Theory and Examples, vol. 49 of Cambridge Series in Statistical and Probabilistic Mathematics. Fifth edition. Cambridge University Press, Cambridge.
- Erdős P and Stone A (1970). On the sum of two Borel sets. Proceedings of the American Mathematical Society 25 304–306.
- Gardner RJ (2002). The Brunn-Minkowski inequality. Bull. Amer. Math. Soc. (N.S.) 39 355–405.
- Grenander U (1956). On the theory of mortality measurement. II. Skand. Aktuarietidskr. 39 125–153 (1957).
- Hall P (1984). On unimodality and rates of convergence for stable laws. J. London Math. Soc. (2) 30 371–384.
- Kleiber C and Kotz S (2003). Statistical Size Distributions in Economics and Actuarial Sciences. John Wiley & Sons.
- Massart P (1990). The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Ann. Probab. 18 1269–1283.
- Owen AB (1995). Nonparametric likelihood confidence bands for a distribution function. Journal of the American Statistical Association 90 516–521.
- Rinott Y (1976). On convexity of measures. Ann. Probab. 4 1020–1026.
- Robertson T, Wright FT and Dykstra RL (1988). Order Restricted Statistical Inference. Wiley & Sons.
- Rockafellar RT (1970). Convex Analysis. Princeton University Press.
- Samworth RJ (2018). Recent progress in log-concave density estimation. Statist. Sci. 33 493–509.
- Samworth RJ and Sen B (2018). Editorial: special issue on “Nonparametric inference under shape constraints”. Statist. Sci. 33 469–472.
- Saumard A (2019). Bi-log-concavity: some properties and some remarks towards a multi-dimensional extension. Electron. Commun. Probab. 24, Paper No. 61, 8 pp.
- Shorack GR (2017). Probability for Statisticians. Springer.
- Shorack GR and Wellner JA (2009). Empirical Processes with Applications to Statistics, vol. 59 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA. Reprint of the 1986 original.
- van Eeden C (1956). Maximum likelihood estimation of ordered probabilities. Statist. Afdeling S 188 (VP 5), Math. Centrum Amsterdam.
- Wooldridge JM (2000). Instructional Stata datasets for econometrics. Boston College Department of Economics.