Author manuscript; available in PMC 2018 Aug 17.
Published in final edited form as: Sankhya Ser A. 2017 May 30;79(2):355–383. doi: 10.1007/s13171-017-0108-4

The Bennett-Orlicz Norm

Jon A Wellner 1
PMCID: PMC6097809  NIHMSID: NIHMS944267  PMID: 30122872

Abstract

van de Geer and Lederer (Probab. Theory Related Fields 157(1-2), 225–250, 2013) introduced a new Orlicz norm, the Bernstein-Orlicz norm, which is connected to Bernstein type inequalities. Here we introduce another Orlicz norm, the Bennett-Orlicz norm, which is connected to Bennett type inequalities. The new Bennett-Orlicz norm yields inequalities for expectations of maxima which are potentially somewhat tighter than those resulting from the Bernstein-Orlicz norm when they are both applicable. We discuss cross connections between these norms, exponential inequalities of the Bernstein, Bennett, and Prokhorov types, and make comparisons with results of Talagrand (Ann. Probab., 17(4), 1546–1570, 1989, 1991), and Boucheron et al. (2013).

AMS (2000) subject classification: Primary: 60E15, 60F10; Secondary: 60G50, 33E20

Keywords and phrases: Bennett’s inequality, Exponential bound, Maximal inequality, Orlicz norm, Poisson, Prokhorov’s inequality

1 Orlicz Norms and Maximal Inequalities

Let Ψ be an increasing convex function from [0, ∞) onto [0, ∞). Such a function is called a Young-Orlicz modulus by Dudley (1999), and a Young modulus by de la Peña and Giné (1999). Let X be a random variable. The Orlicz norm $\|X\|_\Psi$ is defined by

$\|X\|_\Psi = \inf\{ c > 0 : E\,\Psi(|X|/c) \le 1 \},$

where the infimum over the empty set is ∞. By Jensen’s inequality it is easily shown that this does define a norm on the set of random variables for which $\|X\|_\Psi$ is finite. The most important functions Ψ for a variety of applications are those of the form $\Psi(x) = \exp(x^p) - 1 \equiv \Psi_p(x)$ for p ≥ 1, and in particular $\Psi_1$ and $\Psi_2$ corresponding to random variables which are “sub-exponential” or “sub-Gaussian” respectively. See Krasnoseľskiĭ and Rutickiĭ (1961), Dudley (1999), Arcones and Giné (1995), de la Peña and Giné (1999), & van der Vaart and Wellner (1996) for further background on Orlicz norms, and see Rao and Ren (1991), Krasnoseľskiĭ and Rutickiĭ (1961), & Hewitt and Stromberg (1975) for more information about Birnbaum-Orlicz spaces.
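As a concrete illustration of the definition (my own sketch, not part of the paper), the Orlicz norm can be computed by bisection over c whenever $E\,\Psi(|X|/c)$ is available. For the sub-Gaussian modulus $\Psi_2(x) = \exp(x^2) - 1$ and X standard normal, $E\exp(X^2/c^2) = (1 - 2/c^2)^{-1/2}$ in closed form, and the bisection recovers the classical value $\|X\|_{\Psi_2} = \sqrt{8/3}$ quoted later in Corollary 3.1. The function names here are illustrative, not from the paper.

```python
import math

def e_psi2_gaussian(c):
    """E Psi_2(|X|/c) for X ~ N(0,1), in closed form:
    E exp(X^2/c^2) = (1 - 2/c^2)^(-1/2), finite only for c^2 > 2."""
    return 1.0 / math.sqrt(1.0 - 2.0 / (c * c)) - 1.0

def orlicz_norm(e_psi, lo=1e-6, hi=1e6, tol=1e-12):
    """inf{c > 0 : E Psi(|X|/c) <= 1} by bisection; e_psi(c) is decreasing in c."""
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        try:
            too_big = e_psi(mid) > 1.0
        except (ValueError, ZeroDivisionError):  # c^2 <= 2: expectation is infinite
            too_big = True
        if too_big:
            lo = mid
        else:
            hi = mid
    return hi
```

The same `orlicz_norm` routine works for any modulus once the map c ↦ EΨ(|X|/c) is supplied, e.g. by a closed form as above or by numerical integration.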

The following useful lemmas are from van der Vaart and Wellner (1996), pages 95-97, and Arcones and Giné (1995) (see also de la Peña and Giné (1999), pages 188-190), respectively.

Lemma 1.1

Let Ψ be a convex, nondecreasing, nonzero function with Ψ(0) = 0 and $\limsup_{x,y\to\infty} \Psi(x)\Psi(y)/\Psi(cxy) < \infty$ for some constant c. Then, for any random variables $X_1, \dots, X_m$,

$\bigl\|\max_{1\le i\le m} X_i\bigr\|_\Psi \le K\,\Psi^{-1}(m)\,\max_{1\le i\le m}\|X_i\|_\Psi$ (1.1)

where K is a constant depending only on Ψ.

Lemma 1.2

Let Ψ be a Young Modulus satisfying

$\limsup_{x,y\to\infty} \frac{\Psi^{-1}(xy)}{\Psi^{-1}(x)\Psi^{-1}(y)} < \infty \quad\text{and}\quad \limsup_{x\to\infty} \frac{\Psi^{-1}(x^2)}{\Psi^{-1}(x)} < \infty.$

Then for some constant M depending only on Ψ and every sequence of random variables {Xk: k ≥ 1},

$\Bigl\|\sup_{k\ge 1}\frac{|X_k|}{\Psi^{-1}(k)}\Bigr\|_\Psi \le M \sup_{k\ge 1}\|X_k\|_\Psi.$ (1.2)

The inequality (1.1) shows that if Orlicz norms for individual random variables $\{X_i\}_{i=1}^m$ are under control, then the Ψ–Orlicz norm of the maximum of the Xi’s is controlled by a constant times $\Psi^{-1}(m)$ times the maximum of the individual Orlicz norms. The inequality (1.2) shows a stronger related Orlicz norm control of the supremum of an entire sequence $X_k$ divided by $\Psi^{-1}(k)$ if the supremum of the individual Orlicz norms is finite. Lemma 1.2 implies Lemma 1.1 for Young functions of exponential type (such as $\Psi_p(x) = \exp(x^p) - 1$ with p ≥ 1), but it does not hold for power type Young functions such as $\Psi(x) = x^p$, p ≥ 1. These latter Young functions continue to be covered by Lemma 1.1. Arcones and Giné (1995) carefully define Young moduli $\Psi_p(x) = \exp(x^p) - 1$ for all p > 0 and use Lemma 1.2 to establish laws of the iterated logarithm for U-statistics.

A general theme is that if $\Psi_a \le \Psi_b$ and we have control of the individual $\Psi_b$ Orlicz norms, then Lemma 1.1 or Lemma 1.2 applied with $\Psi = \Psi_b$ will yield a better bound than with $\Psi = \Psi_a$ in the sense that $\Psi_b^{-1}(m) \le \Psi_a^{-1}(m)$.

Here we are interested in functions Ψ of the form

$\Psi(x) = \exp(h(x)) - 1$ (1.3)

where h is a nondecreasing convex function with h(0) = 0 which is not of the form $x^p$. In fact, the particular functions h of interest here are (scaled versions of):

$h_0(x) = \frac{x^2}{2(1+x)},$
$h_1(x) = 1 + x - \sqrt{1 + 2x},$
$h_2(x) = h(1+x) = (1+x)\log(1+x) - x,$
$h_4(x) = (x/2)\,\mathrm{arcsinh}(x/2),$
$h_5(x) = x\,\mathrm{arcsinh}(x/2) - 2\bigl(\cosh(\mathrm{arcsinh}(x/2)) - 1\bigr)$

for the particular h(x) ≡ x(log x − 1) + 1. The functions h0 and h1 are related to Bernstein exponential bounds and refinements thereof due to Birgé and Massart (1998), while the function h2 is related to Bennett’s inequality (Bennett, 1962), and h4 is related to Prokhorov’s inequality (Prokhorov, 1959).
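These h-functions are all elementary, so the orderings among them that drive the rest of the paper (Eqs. 2.5 and 2.6 and Lemma 4.1(i)(a) below) can be spot-checked numerically. The following sketch is my own stdlib-Python code, not from the paper; it uses the identity $\cosh(\mathrm{arcsinh}(y)) = \sqrt{1+y^2}$ to simplify h5.

```python
import math

# The five h-functions (Bernstein, Bennett, Prokhorov, Kruglov moduli).
def h0(x): return x * x / (2.0 * (1.0 + x))
def h1(x): return 1.0 + x - math.sqrt(1.0 + 2.0 * x)
def h2(x): return (1.0 + x) * math.log(1.0 + x) - x          # h(1 + x): Bennett
def h4(x): return (x / 2.0) * math.asinh(x / 2.0)            # Prokhorov
def h5(x):                                                   # Kruglov
    # cosh(arcsinh(y)) = sqrt(1 + y^2)
    return x * math.asinh(x / 2.0) - 2.0 * (math.sqrt(1.0 + (x / 2.0) ** 2) - 1.0)
```

The test below checks, on a grid, the chain $h_0 \le h_1 \le 2h_0 \le h_0(2\,\cdot)$ of Eq. 2.6, the chain $9h_0(x/3) \le 9h_1(x/3) \le h_2(x)$ of Eq. 2.5, and the ordering $h_4 \le h_5 \le h_2$ of Lemma 4.1.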

van de Geer and Lederer (2013) studied the family of Orlicz norms defined in terms of scaled versions of h1, and called them Bernstein-Orlicz norms. Our primary goal here is to compare and contrast the Orlicz norms defined in terms of h0, h1, h2, and h4. We begin in the next section by reviewing the Bernstein-Orlicz norm(s) as defined by van de Geer and Lederer (2013). Section 3 gives corresponding results for what we call the Bennett-Orlicz norm(s) corresponding to the function h2. In Section 4 we give further comparisons and two applications.

2 The Bernstein-Orlicz Norm

For a given number L > 0, van de Geer and Lederer (2013) have defined the Bernstein-Orlicz norm $\|X\|_{\Psi_L}$ with

$\Psi_L(x) \equiv \Psi_1(x;L) \equiv \exp\Bigl\{\Bigl(\frac{\sqrt{1+2Lx}-1}{L}\Bigr)^2\Bigr\} - 1 = \exp\Bigl\{\frac{2}{L^2}h_1(Lx)\Bigr\} - 1.$ (2.1)

It is easily seen that

$\Psi_1(x;L) \sim \begin{cases} \exp(x^2) - 1 & \text{for } Lx \text{ small}, \\ \exp(2x/L) - 1 & \text{for } Lx \text{ large}. \end{cases}$

The following three lemmas of van de Geer and Lederer (2013) should be compared with the development on page 96 of van der Vaart and Wellner (1996).

Lemma 2.1

Let $\tau \equiv \|Z\|_{\Psi_1(\cdot;L)}$. Then

$P\bigl(|Z| > \tau\bigl[\sqrt{t} + 2^{-1}Lt\bigr]\bigr) \le 2e^{-t} \quad\text{for all } t > 0;$

or, equivalently, with $h_1^{-1}(y) = y + \sqrt{2y}$,

$P\bigl(|Z| > (\tau/L)\,h_1^{-1}(L^2t/2)\bigr) \le 2e^{-t} \quad\text{for all } t > 0;$

or

$P(|Z| > z) \le 2\exp\Bigl(-\frac{2}{L^2}\,h_1\Bigl(\frac{Lz}{\tau}\Bigr)\Bigr) \quad\text{for all } z > 0.$ (2.2)

Lemma 2.2

Suppose that for some τ and L > 0 we have

$P\bigl(|Z| \ge \tau\bigl[\sqrt{t} + 2^{-1}Lt\bigr]\bigr) \le 2e^{-t} \quad\text{for all } t > 0.$

Equivalently, the inequality (2.2) holds. Then $\|Z\|_{\Psi_1(\cdot;\sqrt{3}L)} \le \sqrt{3}\,\tau$.

Example 2.1

Suppose that X ~ Poisson(ν). Then it is well known (see e.g. Boucheron et al. (2013), page 23) that

$P(|X - \nu| \ge z) \le 2\exp\bigl(-\nu h_2(z/\nu)\bigr) \le 2\exp\bigl(-9\nu h_1(z/(3\nu))\bigr)$

where $h_2(x) = h(1+x) = (x+1)\log(x+1) - x$. Thus the inequality involving h1 holds with $9\nu = 2/L^2$ and $1/(3\nu) = L/\tau$; that is, $L = \sqrt{2/(9\nu)} = (1/3)\sqrt{2/\nu}$ and $\tau = 3\nu L = \sqrt{2\nu}$. We conclude from Lemma 2.2 that

$\|X - \nu\|_{\Psi_1(\cdot;\,\sqrt{2/(3\nu)})} \le \sqrt{6\nu}.$
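The Poisson tail bound at the start of this example can be checked directly, since the exact two-sided tail of a Poisson(ν) variable is a computable sum. The sketch below is my own illustration using only the Python standard library; the helper names are not from the paper.

```python
import math

def pois_pmf(k, nu):
    """Poisson(nu) pmf computed in log space for numerical stability."""
    return math.exp(-nu + k * math.log(nu) - math.lgamma(k + 1))

def two_sided_tail(nu, z):
    """P(|X - nu| >= z) for X ~ Poisson(nu), summed directly."""
    k_lo = math.floor(nu - z)   # lower tail: P(X <= nu - z), empty if nu - z < 0
    lower = sum(pois_pmf(k, nu) for k in range(k_lo + 1)) if k_lo >= 0 else 0.0
    k_hi = math.ceil(nu + z)    # upper tail: P(X >= nu + z)
    upper = sum(pois_pmf(k, nu) for k in range(k_hi, k_hi + 200))
    return lower + upper

def bennett_bound(nu, z):
    """The bound 2 exp(-nu h_2(z/nu)) with h_2(x) = (1+x)log(1+x) - x."""
    x = z / nu
    return 2.0 * math.exp(-nu * ((1.0 + x) * math.log(1.0 + x) - x))
```

Running the comparison for, say, ν = 5 shows the exact tail sitting below the Bennett-type bound at every deviation level.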

Pisier (1983) and Pollard (1990) showed how to bound the Orlicz norm of the maximum of random variables with bounded Orlicz norms; see also de la Peña and Giné (1999), section 4.3, and van der Vaart and Wellner (1996), Lemma 2.2.2, page 96. The following bound for the expectation of the maximum was given by van de Geer and Lederer (2013); also see Boucheron et al. (2013), Theorem 2.5, pages 32-33.

Lemma 2.3

Let τ and L be positive constants, and let Z1, …, Zm be random variables satisfying $\max_{1\le j\le m}\|Z_j\|_{\Psi_L} \le \tau$. Then

$E\Bigl\{\max_{1\le j\le m}|Z_j|\Bigr\} \le \tau\,\Psi_1^{-1}(m;L) = \tau\Bigl\{\sqrt{\log(1+m)} + \frac{L}{2}\log(1+m)\Bigr\}.$ (2.3)

Corollary 2.1

For m ≥ 2,

$E\Bigl\{\max_{1\le j\le m}|Z_j|\Bigr\} \le \tau\,\Psi_1^{-1}(m;L) \le 2\max\{\tau,\, L\tau/2\}\log(1+m).$

In particular, when $Z_j \sim$ Poisson(ν) for 1 ≤ j ≤ m,

$E\Bigl\{\max_{1\le j\le m}|Z_j - \nu|\Bigr\} \le \tau\,\Psi_1^{-1}(m;L) \le 2\max\{\sqrt{2\nu},\, 1/3\}\log(1+m).$

Proof

This follows from Lemma 2.3 since $\sqrt{x} \le x$ for x ≥ 1. The Poisson(ν) special case then follows from Example 2.1.

It will be helpful to relate Ψ1(·; L) to several functions appearing frequently in the theory of exponential bounds as follows: for x ≥ 0, we define

$h(x) = x(\log x - 1) + 1,$
$h_1(x) = 1 + x - \sqrt{1 + 2x},$ (2.4)
$h_0(x) = \frac{x^2}{2(1+x)}.$

It is easily shown (see e.g. Boucheron et al. (2013) Exercise 2.8, page 47) that

$9h_0(x/3) \le 9h_1(x/3) \le h(1+x).$ (2.5)

A trivial restatement of the inequality on the left above and some algebra and easy inequalities yield

$h_0(x) \le h_1(x) \le 2h_0(x) \le h_0(2x).$ (2.6)

The latter inequalities imply that the Orlicz norms based on h0 and h1 are equivalent up to constants.

One reason the functions h0 and h1 are so useful is that they both have explicit inverses: from Boucheron, Lugosi, and Massart (2013), page 29, for h1 and direct calculation for h0,

$h_1^{-1}(y) = y + \sqrt{2y}, \quad\text{for } y \ge 0,$
$h_0^{-1}(y) = y + \sqrt{y^2 + 2y}.$
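These closed-form inverses are easy to verify by a round-trip computation; the following is my own small stdlib-Python check, not part of the paper.

```python
import math

def h1(x): return 1.0 + x - math.sqrt(1.0 + 2.0 * x)
def h0(x): return x * x / (2.0 * (1.0 + x))

# The explicit inverses quoted in the text.
def h1_inv(y): return y + math.sqrt(2.0 * y)
def h0_inv(y): return y + math.sqrt(y * y + 2.0 * y)
```

For h1, for instance, $h_1(y + \sqrt{2y}) = 1 + y + \sqrt{2y} - \sqrt{(1+\sqrt{2y})^2} = y$, and similarly solving $x^2/(2(1+x)) = y$ as a quadratic in x gives the h0 inverse.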

To relate the inequalities in Lemmas 2.1 and 2.2 to more standard inequalities (with names) we note that

$\sqrt{t} + 2^{-1}Lt = \frac{1}{L}\,h_1^{-1}(L^2t/2).$

This implies immediately that the inequality in Lemma 2.2 can be rewritten as

$P(|Z| > z) \le 2\exp\Bigl(-\frac{2}{L^2}h_1\Bigl(\frac{Lz}{\tau}\Bigr)\Bigr) \le 2\exp\Bigl(-\frac{2}{L^2}h_0\Bigl(\frac{Lz}{\tau}\Bigr)\Bigr) = 2\exp\Bigl(-\frac{z^2}{\tau^2 + L\tau z}\Bigr) \quad\text{for all } z > 0.$

Here is a formal statement of a proposition relating exponential tail bounds in the traditional Bernstein form in terms of h0 to tail bounds in terms of the (larger) function h1.

Proposition 2.1

Suppose that a random variable Z satisfies

$P(|Z| > z) \le 2\exp\Bigl(-\frac{z^2}{2(A+Bz)}\Bigr) = 2\exp\Bigl(-\frac{A}{B^2}\,h_0\Bigl(\frac{Bz}{A}\Bigr)\Bigr) \quad\text{for all } z > 0$ (2.7)

for numbers A, B > 0. Then the hypothesis of Lemma 2.2 holds with L and τ given by $L^2 = 2B^2/A$ and $\tau = 2^{3/2}A^{1/2}$:

$P(|Z| > z) \le 2\exp\Bigl(-\frac{A}{B^2}\,h_1\Bigl(\frac{Bz}{2A}\Bigr)\Bigr)$ (2.8)
$= 2\exp\Bigl(-\frac{2}{L^2}\,h_1\Bigl(\frac{Lz}{\tau}\Bigr)\Bigr) \quad\text{for all } z > 0.$ (2.9)

Proof

This follows from Eq. 2.6 and elementary manipulations.

The classical route to proving inequalities of the form given in Eq. 2.7 for sums of independent random variables is via Bernstein’s inequality; see for example (van der Vaart and Wellner, 1996) Lemmas 2.2.9 and 2.2.11, pages 102 and 103, or Boucheron et al. (2013), Theorem 2.10, page 37. But the recent developments of concentration inequalities via Stein’s method yield inequalities of the form given in Eq. 2.7 for many random variables Z which are not sums of independent random variables: see, for example, Ghosh and Goldstein (2011a), Ghosh and Goldstein (2011b), & Goldstein and Iṡlak (2014). The point of the previous proposition is that (up to constants) these inequalities in terms of h0 can be re-expressed in terms of the (larger) function h1.
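The constant-matching in Proposition 2.1 is easy to verify numerically. The sketch below is my own (with arbitrary illustrative values of A and B): it checks the exponent identity $z^2/(2(A+Bz)) = (A/B^2)h_0(Bz/A)$, the identity $(2/L^2)h_1(Lz/\tau) = (A/B^2)h_1(Bz/(2A))$ under $L^2 = 2B^2/A$ and $\tau = 2^{3/2}A^{1/2}$, and the domination $h_1(x) \le h_0(2x)$ from Eq. 2.6 that makes (2.8) a consequence of (2.7).

```python
import math

def h0(x): return x * x / (2.0 * (1.0 + x))
def h1(x): return 1.0 + x - math.sqrt(1.0 + 2.0 * x)

A, B = 3.7, 0.9                           # arbitrary positive constants (illustration only)
L = math.sqrt(2.0) * B / math.sqrt(A)     # L^2 = 2 B^2 / A
tau = 2.0 ** 1.5 * math.sqrt(A)           # tau = 2^{3/2} A^{1/2}

def bernstein_exponent(z):
    """z^2 / (2 (A + B z)), the exponent in (2.7)."""
    return z * z / (2.0 * (A + B * z))

def h0_form(z):
    """(A / B^2) h0(B z / A), the equivalent h0-form of (2.7)."""
    return (A / B ** 2) * h0(B * z / A)

def h1_form(z):
    """(2 / L^2) h1(L z / tau), the exponent in (2.9)."""
    return (2.0 / L ** 2) * h1(L * z / tau)
```

Since `h1_form(z)` never exceeds `h0_form(z)`, the tail bound with the h1 exponent is implied by the one with the h0 exponent, exactly as the proposition asserts.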

3 Bennett’s Inequality and the Bennett-Orlicz Norm

We begin with a statement of a version of Bennett’s inequality for sums of bounded random variables; see Bennett (1962), Shorack and Wellner (1986), & Boucheron et al. (2013). Let h(x) ≡ x(log x − 1) + 1 and h2(x) ≡ h(1 + x). This function arises in Bennett’s inequality for bounded random variables and elsewhere; see e.g. Bennett (1962), Shorack and Wellner (1986), & Boucheron et al. (2013), page 35 (but note that their h is our h2 = h(1 + ·)). As noted in Example 2.1 above, the function h also appears in exponential bounds for Poisson random variables: see Shorack and Wellner (1986) page 485, and Boucheron et al. (2013) page 23.

Proposition 3.1

(Bennett) (i) Let X1, …, Xn be independent with $\max_{1\le j\le n}(X_j - \mu_j) \le b$, $E(X_j) = \mu_j$, $\mathrm{Var}(X_j) = \sigma_{n,j}^2$. Let $\mu \equiv \sum_{j=1}^n \mu_j/n$ and $\sigma_n^2 \equiv (\sigma_{n,1}^2 + \cdots + \sigma_{n,n}^2)/n$. Then with $\psi(x) \equiv 2h(1+x)/x^2$,

$P\bigl(\sqrt{n}(\bar{X}_n - \mu) \ge z\bigr) \le \exp\Bigl(-\frac{z^2}{2\sigma_n^2}\,\psi\Bigl(\frac{zb}{\sqrt{n}\sigma_n^2}\Bigr)\Bigr) = \exp\Bigl(-\frac{n\sigma_n^2}{b^2}\,h\Bigl(1 + \frac{zb}{\sqrt{n}\sigma_n^2}\Bigr)\Bigr) = \exp\Bigl(-\frac{n\sigma_n^2}{b^2}\,h_2\Bigl(\frac{zb}{\sqrt{n}\sigma_n^2}\Bigr)\Bigr)$ (3.1)

for all z > 0.

(ii) If, in addition, $\max_{1\le j\le n}|X_j - \mu_j| \le b$, then

$P\bigl(|\sqrt{n}(\bar{X}_n - \mu)| \ge z\bigr) \le 2\exp\Bigl(-\frac{z^2}{2\sigma_n^2}\,\psi\Bigl(\frac{zb}{\sqrt{n}\sigma_n^2}\Bigr)\Bigr) = 2\exp\Bigl(-\frac{n\sigma_n^2}{b^2}\,h\Bigl(1 + \frac{zb}{\sqrt{n}\sigma_n^2}\Bigr)\Bigr) = 2\exp\Bigl(-\frac{n\sigma_n^2}{b^2}\,h_2\Bigl(\frac{zb}{\sqrt{n}\sigma_n^2}\Bigr)\Bigr).$

Using the inequality $h(1+x) \ge 9h_1(x/3)$, it follows that

$P\bigl(|\sqrt{n}(\bar{X}_n - \mu)| \ge z\bigr) \le 2\exp\Bigl(-\frac{9n\sigma_n^2}{b^2}\,h_1\Bigl(\frac{zb}{3\sqrt{n}\sigma_n^2}\Bigr)\Bigr).$

Thus an inequality of the form of that in Lemma 2.1 holds with $2/L^2 = 9n\sigma_n^2/b^2$ and $L/\tau = b/(3\sqrt{n}\sigma_n^2)$. Thus $L = \sqrt{2/9}\,b/(\sqrt{n}\sigma_n)$ and $\tau = L\cdot 3\sqrt{n}\sigma_n^2/b = \sqrt{2}\,\sigma_n$. It follows from Lemma 2.2 that

$\|\sqrt{n}(\bar{X}_n - \mu)\|_{\Psi_1(\cdot;\sqrt{3}L)} \le \sqrt{6}\,\sigma_n,$

or

$\|\sqrt{n}(\bar{X}_n - \mu)\|_{\Psi_1(\cdot;\,\sqrt{2/3}\,b/(\sqrt{n}\sigma_n))} \le \sqrt{6}\,\sigma_n.$

But this bound has not taken advantage of the fact that the first bound above involves the function h (or h2) rather than h1. It would seem to be of potential interest to develop an Orlicz norm based on the function h2 ≡ h(1 + ·) rather than the function h1. Motivated by the first inequality in Proposition 3.1, we define for each L > 0 a new Orlicz norm based on the function h2 as follows.

$\Psi_2(x;L) \equiv \exp\Bigl(\frac{2}{L^2}\,h_2(Lx)\Bigr) - 1.$

Since h2 is convex, h2(0) = 0, and h2 is increasing on [0, ∞), it follows that Ψ2(·; L) defines a valid Orlicz norm (as defined in Section 1) for each L:

$\|X\|_{\Psi_2(\cdot;L)} = \inf\bigl\{c > 0 : E\,\Psi_2(|X|/c;\,L) \le 1\bigr\}.$ (3.2)

We call $\|X\|_{\Psi_2(\cdot;L)}$ the Bennett-Orlicz norm of X. Note that with $\psi(Lx) \equiv x^{-2}(2/L^2)h_2(Lx)$,

$\Psi_2(x;L) = \exp\bigl(x^2\psi(Lx)\bigr) - 1 \sim \begin{cases} \exp(x^2) - 1 & \text{for } Lx \text{ small}, \\ \exp\bigl(\tfrac{2x}{L}\log(Lx)\bigr) - 1 & \text{for } Lx \text{ large}. \end{cases}$

We first relate Ψ2(·; L) to Ψ1(·; L) and to the usual Gaussian Orlicz norm defined by $\Psi_2(x) = \exp(x^2) - 1$.

Proposition 3.2

  1. $\Psi_2(x;L) \le \exp(x^2) - 1 = \Psi_2(x)$ for all x ≥ 0.

  2. $\Psi_2(x;L) \ge \Psi_1(x;L/3)$ for x ≥ 0.

Proof

(i) follows since $\psi(x) \equiv 2x^{-2}h(1+x) \le 1$ for all x ≥ 0; see Shorack and Wellner (1986), Proposition 11.1.1, page 441. To show that (ii) holds, note that by Eq. 2.1

$\Psi_1(x;L/3) = \exp\Bigl(\frac{2\cdot 9}{L^2}\,h_1(Lx/3)\Bigr) - 1.$

Thus the claimed inequality in (ii) is equivalent to

$\frac{2}{L^2}h(1+Lx) \ge \frac{2\cdot 9}{L^2}h_1(Lx/3),$

or equivalently

$h(1+Lx) \ge 9h_1(Lx/3).$

But the inequality in the last display holds in view of Eq. 2.5.
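Both parts of Proposition 3.2 can also be confirmed numerically. The following is my own illustrative stdlib-Python code (the function names are not from the paper); it evaluates the three moduli directly from their definitions.

```python
import math

def h1(x): return 1.0 + x - math.sqrt(1.0 + 2.0 * x)
def h2(x): return (1.0 + x) * math.log(1.0 + x) - x

def Psi2_gauss(x):
    """Gaussian Orlicz modulus Psi_2(x) = exp(x^2) - 1."""
    return math.exp(x * x) - 1.0

def Psi1(x, L):
    """Bernstein-Orlicz modulus Psi_1(x; L) = exp((2/L^2) h1(Lx)) - 1."""
    return math.exp((2.0 / (L * L)) * h1(L * x)) - 1.0

def Psi2L(x, L):
    """Bennett-Orlicz modulus Psi_2(x; L) = exp((2/L^2) h2(Lx)) - 1."""
    return math.exp((2.0 / (L * L)) * h2(L * x)) - 1.0
```

On a grid of x and L one finds $\Psi_2(x;L) \le \Psi_2(x)$ and $\Psi_2(x;L) \ge \Psi_1(x;L/3)$, as claimed.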

Note that while h1 and Ψ1(·; L) have explicit inverses, given in terms of $\sqrt{v}$ and $\log(1+v)$ by Eqs. 2.7 and 2.3, inverses of the functions h2 and Ψ2(·; L) can only be written in terms of Lambert’s function (also called the product log function) W satisfying $W(z)\exp(W(z)) = z$; see Corless et al. (1996). But this slight difficulty is easily overcome by way of several nice inequalities for W. By use of W and the inequalities developed in the Appendix, Section 6, we obtain the following proposition concerning $\Psi_2^{-1}(\cdot;L)$.

Proposition 3.3

  1. $\Psi_2^{-1}(y;L) \le \Psi_1^{-1}(y;L/3) = \sqrt{\log(1+y)} + (L/6)\log(1+y)$ for y ≥ 0.

  2. Furthermore, with W denoting the Lambert W function,
    $\Psi_2^{-1}(y;L) = \frac{1}{L}\,h_2^{-1}\Bigl(\frac{L^2}{2}\log(1+y)\Bigr) = \frac{1}{L}\Bigl\{\frac{(L^2/2)\log(1+y) - 1}{W\bigl(\bigl((L^2/2)\log(1+y) - 1\bigr)/e\bigr)} - 1\Bigr\}.$
  3. If $(L^2/2)\log(1+y) \ge 1$, then
    $\Psi_2^{-1}(y;L) \le \frac{L\log(1+y)}{\log\bigl((L^2/2)\log(1+y) - 1\bigr)}.$
  4. If $(L^2/2)\log(1+y) \ge 5$, then
    $\Psi_2^{-1}(y;L) \le \frac{2L\log(1+y)}{\log\bigl((L^2/2)\log(1+y)\bigr)} \le \frac{2L\log(1+y)}{\log\log(1+y)} \quad\text{if also } L^2/2 \ge 1.$
  5. If $(L^2/2)\log(1+y) \le 9/4$, then
    $\Psi_2^{-1}(y;L) \le \sqrt{2\log(1+y)}.$
  6. $\Psi_2^{-1}(y;L) \le \begin{cases} (2.2/\sqrt{2})\sqrt{\log(1+y)} & \text{if } (L^2/2)\log(1+y) \le 1+e, \\ \frac{1}{L}\Bigl\{\frac{2\bigl((L^2/2)\log(1+y)-1\bigr)}{\log\bigl((L^2/2)\log(1+y)-1\bigr)} - 1\Bigr\} & \text{if } (L^2/2)\log(1+y) > 1+e. \end{cases}$

Proof

(i) follows immediately from Proposition 3.2. (ii) follows from the definition of Ψ2(·; L) and direct computation for the first part; the second part follows from Lemma 6.1. The inequality in (iii) follows from (ii) and Lemma 6.2. The first inequality in (iv) follows from (iii) since $\log(y-1) \ge (1/2)\log y$ for y ≥ 4. The second inequality in (iv) follows by noting that

$\log\bigl((L^2/2)\log(1+y)\bigr) = \log(L^2/2) + \log\log(1+y) \ge \log\log(1+y)$

if $L^2/2 \ge 1$. (v) follows from (ii) and Lemma 6.3, part (iii).

Lemmas 2.1 and 2.2 of van de Geer and Lederer (2013), as stated in Section 2, should be compared with the development on page 96 of van der Vaart and Wellner (1996). We now show that the following analogues of Lemmas 2.1–2.3 hold for $\|Z\|_{\Psi_2(\cdot;L)}$.

Lemma 3.1

Let $\tau \equiv \|Z\|_{\Psi_2(\cdot;L)}$. Then

$P\Bigl(|Z| > \frac{\tau}{L}\,h_2^{-1}(L^2t/2)\Bigr) \le 2e^{-t} \quad\text{for all } t > 0,$

where $h_2(x) \equiv h(1+x)$ and $h_2^{-1}$ is the inverse of $h_2$ (so that $h_2^{-1}(y) = h^{-1}(y) - 1$).

Proof

Let t > 0. Since $\Psi_2(x;L) = \exp\bigl((2/L^2)h_2(Lx)\bigr) - 1 = e^t - 1$ implies $h_2(Lx) = L^2t/2$, it follows that for any $c > \|Z\|_{\Psi_2(\cdot;L)}$ we have

$P\Bigl(\frac{|Z|}{c} > \frac{1}{L}h_2^{-1}(L^2t/2)\Bigr) = P\bigl(h_2(L|Z|/c) > L^2t/2\bigr) = P\Bigl(\frac{2}{L^2}h_2(L|Z|/c) > t\Bigr) = P\Bigl(\exp\Bigl(\frac{2}{L^2}h_2(L|Z|/c)\Bigr) - 1 > e^t - 1\Bigr) = P\bigl(\Psi_2(|Z|/c;L) > e^t - 1\bigr) \le \bigl(E\{\Psi_2(|Z|/c;L)\} + 1\bigr)e^{-t} \le 2e^{-t} \quad\text{as } c \downarrow \tau \equiv \|Z\|_{\Psi_2(\cdot;L)}.$

Lemma 3.2

Suppose that for some τ > 0 we have

$P\Bigl(|Z| \ge \frac{\tau}{L}\,h_2^{-1}(L^2t/2)\Bigr) \le 2e^{-t} \quad\text{for all } t > 0.$

Equivalently,

$P(|Z| > z) \le 2\exp\Bigl(-\frac{2}{L^2}h_2\Bigl(\frac{Lz}{\tau}\Bigr)\Bigr) = 2\exp\Bigl(-\frac{2}{L^2}h\Bigl(1 + \frac{Lz}{\tau}\Bigr)\Bigr) = 2\exp\Bigl(-\frac{z^2}{\tau^2}\psi\Bigl(\frac{Lz}{\tau}\Bigr)\Bigr) \quad\text{for all } z > 0.$

Then $\|Z\|_{\Psi_2(\cdot;\sqrt{3}L)} \le \sqrt{3}\,\tau$.

Proof

Let α, β > 0. We compute

$E\,\Psi_2\Bigl(\frac{|Z|}{\alpha\tau};\,\beta L\Bigr) = \int_0^\infty P\Bigl(\Psi_2\Bigl(\frac{|Z|}{\alpha\tau};\,\beta L\Bigr) \ge v\Bigr)\,dv = \int_0^\infty P\Bigl(\frac{2}{\beta^2L^2}\,h_2\Bigl(\frac{\beta L|Z|}{\alpha\tau}\Bigr) \ge \log(1+v)\Bigr)\,dv = \int_0^\infty P\Bigl(|Z| \ge \frac{\alpha\tau}{\beta L}\,h_2^{-1}\Bigl(\frac{\beta^2L^2}{2}\log(1+v)\Bigr)\Bigr)\,dv = \int_0^\infty P\Bigl(|Z| \ge \frac{\alpha\tau}{\beta L}\,h_2^{-1}\Bigl(\frac{\beta^2L^2}{2}t\Bigr)\Bigr)e^t\,dt.$

Choosing $\alpha = \beta = \sqrt{3}$ this yields

$E\,\Psi_2\Bigl(\frac{|Z|}{\sqrt{3}\tau};\,\sqrt{3}L\Bigr) = \int_0^\infty P\Bigl(|Z| \ge \frac{\tau}{L}\,h_2^{-1}\Bigl(\frac{L^2}{2}\,3t\Bigr)\Bigr)e^t\,dt \le \int_0^\infty 2\exp(-3t)\exp(t)\,dt = 1.$

Hence we conclude that $\|Z\|_{\Psi_2(\cdot;\sqrt{3}L)} \le \sqrt{3}\,\tau$.

Corollary 3.1

  1. If X ~ Poisson(ν), then $\|X - \nu\|_{\Psi_2(\cdot;\,\sqrt{6/\nu})} \le \sqrt{6\nu}$.

  2. If X1, …, Xn are i.i.d. Bernoulli(p), then
    $\|\sqrt{n}(\bar{X}_n - p)\|_{\Psi_2(\cdot;\,\sqrt{6/(np(1-p))})} \le \sqrt{6p(1-p)}.$
  3. If X ~ N(0, 1), then $\|X\|_{\Psi_2(\cdot;L)} \le \sqrt{6}$ for every L > 0. By taking the limit as L ↘ 0 and noting that $\Psi_2(z;L) \to \Psi_2(z) \equiv \exp(z^2) - 1$ as L ↘ 0, this yields $\|X\|_{\Psi_2} \le \sqrt{6}$. In this case it is known that $\|X\|_{\Psi_2} = \sqrt{8/3}$. (See van der Vaart and Wellner (1996), Exercise 2.2.1, page 105.)

Now for an inequality paralleling Lemma 2.3 for the Bernstein-Orlicz norm:

Lemma 3.3

Let τ and L be constants, and let Z1, …, Zm be random variables satisfying $\max_{1\le j\le m}\|Z_j\|_{\Psi_2(\cdot;L)} \le \tau$. Then

$E\Bigl\{\max_{1\le j\le m}|Z_j|\Bigr\} \le \tau\,\Psi_2^{-1}(m;L) \le \frac{2\tau L\log(1+m)}{\log\log(1+m)} \quad\text{if } L^2 \ge 2 \text{ and } \log(1+m) \ge 5.$

Furthermore,

$E\Bigl\{\max_{1\le j\le m}|Z_j|\Bigr\} \le \tau\,\Psi_2^{-1}(m;L) \le 2\tau\Bigl\{\frac{L\log(1+m)}{\log\log(1+m)} + \sqrt{\log(1+m)}\Bigr\}$

for all m such that $\log(1+m) \ge 5$ (or $m \ge e^5 - 1$).

Remark 3.1

The point of this last bound is that it gives an explicit trade-off between the Gaussian component (the term $\sqrt{\log(1+m)}$) and the Poisson component (the term $\log(1+m)/\log\log(1+m)$) governed by a Bennett type inequality. In contrast, the bounds obtained by van de Geer and Lederer (2013) yield a trade-off between the Gaussian world and the sub-exponential world governed by a Bernstein type inequality.

Proof

We write $\Psi_{2,L} \equiv \Psi_2(\cdot;L)$. Let $c > \tau$. Then by Jensen’s inequality

$E\Bigl\{\max_{1\le j\le m}|Z_j|\Bigr\} \le c\,\Psi_{2,L}^{-1}\Bigl(E\Bigl\{\Psi_{2,L}\Bigl(\max_{1\le j\le m}|Z_j|/c\Bigr)\Bigr\}\Bigr) = c\,\Psi_{2,L}^{-1}\Bigl(E\Bigl\{\max_{1\le j\le m}\Psi_{2,L}(|Z_j|/c)\Bigr\}\Bigr) \le c\,\Psi_{2,L}^{-1}\Bigl(\sum_{j=1}^m E\,\Psi_{2,L}(|Z_j|/c)\Bigr) \le c\,\Psi_{2,L}^{-1}\Bigl(m\max_{1\le j\le m}E\,\Psi_{2,L}(|Z_j|/c)\Bigr).$

Therefore,

$E\Bigl\{\max_{1\le j\le m}|Z_j|\Bigr\} \le \lim_{c\downarrow\tau}\Bigl\{c\,\Psi_{2,L}^{-1}\Bigl(m\max_{1\le j\le m}E\,\Psi_{2,L}(|Z_j|/c)\Bigr)\Bigr\}$ (3.3)
$\le \tau\,\Psi_{2,L}^{-1}(m) = \frac{\tau}{L}\,h_2^{-1}\Bigl(\frac{L^2}{2}\log(1+m)\Bigr).$ (3.4)

The remaining claims follow from Proposition 3.3.

Here are analogues of Lemmas 4 and 5 of van de Geer and Lederer (2013).

Lemma 3.4

Let Z1, …, Zm be random variables satisfying

$\max_{1\le j\le m}\|Z_j\|_{\Psi_2(\cdot;L)} \le \tau$ (3.5)

for some L and τ. Then, for all t > 0,

$P\Bigl(\max_{1\le j\le m}|Z_j| \ge \frac{\tau}{L}\bigl(h_2^{-1}(L^2t/2) + h_2^{-1}(L^2\log(1+m)/2)\bigr)\Bigr) \le 2e^{-t}.$

Proof

For any a > 0 and t > 0, concavity of $h_2^{-1}$ together with $h_2^{-1}(0) = 0$ imply that

$h_2^{-1}(a) + h_2^{-1}(t) \ge h_2^{-1}(a + t).$

Therefore, by using a union bound and Lemma 3.1,

$P\Bigl(\max_{1\le j\le m}|Z_j| \ge \frac{\tau}{L}\bigl(h_2^{-1}(L^2t/2) + h_2^{-1}(L^2\log(1+m)/2)\bigr)\Bigr) \le P\Bigl(\max_{1\le j\le m}|Z_j| \ge \frac{\tau}{L}\,h_2^{-1}\bigl(2^{-1}L^2(t + \log(1+m))\bigr)\Bigr) \le \sum_{j=1}^m P\Bigl(|Z_j| \ge \frac{\tau}{L}\,h_2^{-1}\bigl(2^{-1}L^2(t + \log(1+m))\bigr)\Bigr) \le 2m\exp\bigl(-(t + \log(1+m))\bigr) = \frac{2m}{m+1}\,e^{-t} \le 2e^{-t}.$
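The key superadditivity step $h_2^{-1}(a) + h_2^{-1}(t) \ge h_2^{-1}(a+t)$ can be spot-checked numerically by inverting h2 with a simple bisection; the following sketch is my own illustration, not from the paper.

```python
import math

def h2(x): return (1.0 + x) * math.log(1.0 + x) - x

def h2_inv(y, tol=1e-12):
    """Invert the increasing function h2 on [0, inf) by bisection."""
    lo, hi = 0.0, 1.0
    while h2(hi) < y:          # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * (1.0 + hi):
        mid = 0.5 * (lo + hi)
        if h2(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Superadditivity here is exactly the concavity-plus-$h_2^{-1}(0)=0$ argument in the proof: a concave function vanishing at zero is subadditive, so its inverse applied to a sum is dominated by the sum of the inverses.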

Lemma 3.5

Let Z1, …, Zm be random variables satisfying (3.5). Then

$\Bigl\|\Bigl(\max_{1\le j\le m}|Z_j| - \frac{\tau}{L}\,h_2^{-1}\Bigl(\frac{L^2}{2}\log(1+m)\Bigr)\Bigr)_+\Bigr\|_{\Psi_2(\cdot;\sqrt{3}L)} \le \sqrt{3}\,\tau.$

Proof

Let

$Z \equiv \Bigl(\max_{1\le j\le m}|Z_j| - \frac{\tau}{L}\,h_2^{-1}\Bigl(\frac{L^2}{2}\log(1+m)\Bigr)\Bigr)_+.$

Then Lemma 3.4 implies that

$P\Bigl(Z \ge \frac{\tau}{L}\,h_2^{-1}\Bigl(\frac{L^2t}{2}\Bigr)\Bigr) \le P\Bigl(\max_{1\le j\le m}|Z_j| \ge \frac{\tau}{L}\Bigl(h_2^{-1}\Bigl(\frac{L^2}{2}\log(1+m)\Bigr) + h_2^{-1}\Bigl(\frac{L^2t}{2}\Bigr)\Bigr)\Bigr) \le 2e^{-t}.$

Then the conclusion follows from Lemma 3.2.

4 Prokhorov’s “Arcsinh” Exponential Bound and Orlicz Norms

Another important exponential bound for sums of independent bounded random variables is due to Prokhorov (1959). As will be seen below, Prokhorov’s bound involves another function h4 (rather than h2 of Bennett’s inequality) given by

$h_4(x) = (x/2)\,\mathrm{arcsinh}(x/2) = (x/2)\log\bigl(x/2 + \sqrt{1 + (x/2)^2}\bigr).$ (4.1)

Suppose that X1, …, Xn are independent random variables with E(Xj) = μj and |Xj − μj| ≤ b for some b > 0. Let $S_n = X_1 + \cdots + X_n$, and set $\mu \equiv n^{-1}\sum_{j=1}^n \mu_j$, $\sigma_n^2 \equiv n^{-1}\mathrm{Var}(S_n)$. Prokhorov’s “arcsinh” exponential bound is as follows:

Proposition 4.1

(Prokhorov) If the Xj’s satisfy the above assumptions, then

$P(S_n - n\mu \ge z) \le \exp\Bigl(-\frac{z}{2b}\,\mathrm{arcsinh}\Bigl(\frac{zb}{2\,\mathrm{Var}(S_n)}\Bigr)\Bigr).$

Equivalently, with $\sigma_n^2 \equiv n^{-1}\mathrm{Var}(S_n)$ and $h_4(x) \equiv (x/2)\,\mathrm{arcsinh}(x/2)$,

$P\bigl(|\sqrt{n}(\bar{X}_n - \mu)| \ge z\bigr) \le 2\exp\Bigl(-\frac{z\sqrt{n}}{2b}\,\mathrm{arcsinh}\Bigl(\frac{zb}{2\sqrt{n}\sigma_n^2}\Bigr)\Bigr) = 2\exp\Bigl(-\frac{n\sigma_n^2}{b^2}\Bigl(\frac{zb}{2\sqrt{n}\sigma_n^2}\Bigr)\mathrm{arcsinh}\Bigl(\frac{zb}{2\sqrt{n}\sigma_n^2}\Bigr)\Bigr) = 2\exp\Bigl(-\frac{n\sigma_n^2}{b^2}\,h_4\Bigl(\frac{zb}{\sqrt{n}\sigma_n^2}\Bigr)\Bigr).$ (4.2)

See e.g. Prokhorov (1959), Stout (1974), de la Peña and Giné (1999), Johnson et al. (1985), & Kruglov (2006). Johnson et al. (1985) use Prokhorov’s inequality to control Orlicz norms for functions Ψ of the form Ψ(x) = exp(ψ(x)) with ψ(x) ≡ x log(1+x) and use the resulting inequalities to show that the optimal constants Dp in Rosenthal’s inequalities grow as p/log(p).

Kruglov (2006) gives an improvement of Prokhorov’s inequality which involves replacing h4 by

$h_5(x) \equiv x\,\mathrm{arcsinh}(x/2) - 2\bigl(\cosh(\mathrm{arcsinh}(x/2)) - 1\bigr) = x\,\mathrm{arcsinh}(x/2) - 2\bigl(\sqrt{1 + (x/2)^2} - 1\bigr).$

Note that Prokhorov’s inequality is of the same form as Bennett’s inequality (3.1) in Proposition 3.1, but with Bennett’s h2 replaced by Prokhorov’s h4.

Thus we want to compare Prokhorov’s inequality (and Kruglov’s improvement thereof) to Bennett’s inequality. As can be seen from the above development, this boils down to comparison of the functions h2, h4, and h5. The following lemma makes a number of comparisons and contrasts between the functions h2, h4, and h5.

Lemma 4.1

(Comparison of h2, h4, and h5)

  • (i)(a)

    h2(x) ≥ h5(x) ≥ h4(x) for all x ≥ 0.

  • (i)(b)

    $h_2^{-1}(y) \le h_5^{-1}(y) \le h_4^{-1}(y)$ for all y ≥ 0.

  • (ii)(a)

    h2(x) ≥ (x/2) log(1 + x) ≥ (x/2) log(1 + x/2) for all x ≥ 0.

  • (ii)(b)

    h4(x) ≥ (x/2) log(1 + x/2) for all x ≥ 0.

  • (ii)(c)

    h5(x) ≥ (x/2) log(1 + x/2) for all x ≥ 0.

  • (iii)(a)

    $h_2(x) \sim 2^{-1}x^2$ as x ↘ 0; $h_2(x) \sim x\log(x)$ as x → ∞.

  • (iii)(b)

    $h_4(x) \sim 4^{-1}x^2$ as x ↘ 0; $h_4(x) \sim (1/2)x\log(x)$ as x → ∞.

  • (iii)(c)

    $h_5(x) \sim 4^{-1}x^2$ as x ↘ 0; $h_5(x) \sim x\log(x)$ as x → ∞.

  • (iii)(d)

    $h_2(x) - h_4(x) \sim x^2/4$ as x ↘ 0; $h_2(x) - h_4(x) \sim (1/2)x\log x$ as x → ∞.

  • (iii)(e)

    $h_2(x) - h_5(x) \sim x^2/4$ as x ↘ 0; $h_2(x) - h_5(x) \sim \log x$ as x → ∞.

  • (iv)(a)
    $h_2(x) = 2^{-1}x^2\psi_2(x)$ where
    $\psi_2(x) \ge \begin{cases} x^{-1}\log(1+x), \\ (1+x/3)^{-1}. \end{cases}$
  • (iv)(b)
    $h_4(x) = 4^{-1}x^2\psi_4(x)$ where
    $\psi_4(x) \ge \begin{cases} 2x^{-1}\log(1+x/2), & \text{for } x \ge 0, \\ (1/2)/(1+x/2), & \text{for } x \ge 0, \\ (1-\delta)/(1+x/2), & \text{for } x \le 2\delta^{1/2}/(1/2-\delta)^{1/2}. \end{cases}$
  • (iv)(c)
    $h_5(x) = 4^{-1}x^2\psi_5(x)$ where
    $\psi_5(x) \ge \begin{cases} 2x^{-1}\log(1+x/2), & \text{for } x \ge 0, \\ 1/(1+x/2), & \text{for } x \ge 0. \end{cases}$

Proof

(i) We first prove that $h_2(x) \ge h_4(x)$. Let $g(x) = h_2(x) - h_4(x)$; thus

$g(x) = (1+x)\log(1+x) - x - (x/2)\log\bigl(x/2 + \sqrt{1+(x/2)^2}\bigr).$

Then g(0) = 0 and

$g'(x) = \log(1+x) - \frac{1}{2}\log\bigl(x/2 + \sqrt{1+(x/2)^2}\bigr) - \frac{x/4}{\sqrt{1+(x/2)^2}}$

also has g′(0) = 0. Note that $\sqrt{1+(x/2)^2} \le 1 + x/2$ and hence $x/2 + \sqrt{1+(x/2)^2} \le 1 + x$. Thus

$\log\bigl((x/2) + \sqrt{1+(x/2)^2}\bigr) \le \log(1+x),$ (4.3)

and hence

$g'(x) \ge \log(1+x) - (1/2)\log(1+x) - \frac{x/4}{\sqrt{1+(x/2)^2}} = (1/2)\log(1+x) - \frac{x/4}{\sqrt{1+(x/2)^2}},$

and it suffices to show that the right side is ≥ 0 for all x. Thus we let

$m(x) \equiv \frac{1}{2}\log(1+x) - \frac{1}{2}\cdot\frac{x/2}{\sqrt{1+(x/2)^2}} = \frac{1}{2}\Bigl\{\log(1+x) - \frac{x/2}{\sqrt{1+(x/2)^2}}\Bigr\}.$

Let $\bar{m}(x) \equiv 2m(2x) = \log(1+2x) - \frac{x}{\sqrt{1+x^2}}$. Then $\bar{m}(0) = 0$ and we compute

$\bar{m}'(x) = \frac{2}{1+2x} - \frac{1}{\sqrt{1+x^2}} + \frac{x^2}{(1+x^2)^{3/2}} = \frac{2}{1+2x} - \frac{1}{\sqrt{1+x^2}}\Bigl(1 - \frac{x^2}{1+x^2}\Bigr) = \frac{2(1+x^2)^{3/2} - (1+2x)}{(1+2x)(1+x^2)^{3/2}} \equiv \frac{j(x)}{(1+2x)(1+x^2)^{3/2}},$

so that $\bar{m}'(0) = 1$ and the numerator, j, is easily seen to be non-negative since $(1+x^2)^{3/2} \ge 1+x^2$ implies $2(1+x^2)^{3/2} \ge 2(1+x^2) \ge 1+2x$ for all x ≥ 0. Thus $h_2(x) \ge h_4(x)$.

Kruglov (2006) shows that $h_5(x) \ge h_4(x)$. Now we show that $h_2(x) \ge h_5(x)$. Note that with $g(x) \equiv h_2(x) - h_5(x)$,

$g'(x) = \log(1+x) - \mathrm{arcsinh}(x/2)$

has g(0) = 0 and g′(x) ≥ 0 (as was shown above in (4.3)). Thus $g(x) = \int_0^x g'(v)\,dv \ge 0$.

  • (i)(b)

    The inequalities for the inverse functions follow immediately from the inequalities for the functions themselves in (i)(a).

  • (ii)(a)
    To show that the first inequality holds, consider
    $g(x) \equiv h_2(x) - (x/2)\log(1+x) = (1+x)\log(1+x) - x - (x/2)\log(1+x) = (1+x/2)\log(1+x) - x.$
    Then g(0) = 0 and
    $g'(x) = \frac{1}{2}\log(1+x) + \frac{1+x/2}{1+x} - 1 = \frac{1/2}{1+x}\bigl\{(1+x)\log(1+x) - x\bigr\} = \frac{1/2}{1+x}\,h_2(x) \ge 0.$
    Thus g′(0) = 0 and $g(x) = \int_0^x g'(y)\,dy \ge 0$. The second inequality in (ii)(a) is trivial.
  • (ii)(b)

    This follows easily from $\mathrm{arcsinh}(v) = \log\bigl(v + \sqrt{1+v^2}\bigr) \ge \log(v+1)$ for all v ≥ 0.

  • (ii)(c)

    This follows from (i)(a) and (ii)(b).

  • (iii)(a)

    This follows from ψ2(x) ≡ ψ(x) → 1 as x ↘ 0; see Proposition 11.1.1, page 441, Shorack and Wellner (1986).

  • (iii)(b)
    Now
    $h_4'(x) = \frac{x}{4\sqrt{1+(x/2)^2}} + \frac{1}{2}\,\mathrm{arcsinh}(x/2),$
    with $h_4'(0) = 0$, and
    $h_4''(x) = \frac{1}{2\sqrt{1+(x/2)^2}} - \frac{x^2}{16(1+(x/2)^2)^{3/2}} = \frac{8+x^2}{2(4+x^2)^{3/2}}$
    with $h_4''(0) = 1/2$. Therefore
    $h_4(x) = h_4''(x^*)\,\frac{x^2}{2} \quad\text{for some } 0 \le x^* \le x,$
    and
    $4x^{-2}h_4(x) = 2h_4''(x^*) \to 1 \quad\text{as } x \ge x^* \downarrow 0.$
  • (iii)(c)
    Now
    $h_5'(x) = \mathrm{arcsinh}(x/2),$
    $h_5''(x) = \frac{1/2}{\sqrt{1+(x/2)^2}} \to \frac{1}{2} \quad\text{as } x \downarrow 0,$
    where $h_5'(0) = 0$ and $h_5''$ is decreasing. Thus $h_5(x) = (x^2/2)\,h_5''(x^*)$ for some 0 ≤ x* ≤ x and we conclude that $4x^{-2}h_5(x) \to 1$ as $x \ge x^* \downarrow 0$.
  • (iv)(a)

    The first part is a restatement of (ii)(a). The second part follows from Eq. 2.5: $h_2(x) = h(1+x) \ge 9h_0(x/3) = \frac{x^2}{2(1+x/3)}$, and the claim follows by definition of $\psi_2$.

  • (iv)(b)
    The first inequality is a restatement of (ii)(b). The second inequality follows since $h_4(x) = \frac{x^2}{2}h_4''(x^*)$ where $x \mapsto h_4''(x)$ is decreasing, so
    $4x^{-2}h_4(x) = 2h_4''(x^*) \ge 2h_4''(x) = \frac{1}{\sqrt{1+(x/2)^2}} - \frac{x^2}{8(1+(x/2)^2)^{3/2}} = \frac{1}{\sqrt{1+(x/2)^2}}\cdot\frac{1+x^2/8}{1+x^2/4} \ge \frac{1/2}{\sqrt{1+(x/2)^2}} \ge \frac{1/2}{1+x/2}.$
    To prove the third inequality, note that
    $\frac{1+x^2/8}{1+x^2/4} \ge c$
    holds if $1 + x^2/8 \ge c(1 + x^2/4)$, or if $1 - c \ge (x^2/4)(c - 1/2)$. Then rearrange and take c = 1 − δ for δ ∈ (0, 1/2).
  • (iv)(c)
    The first inequality follows from (ii)(c). The second inequality follows by arguing as in (iv)(b), but now without the complicating second factor: note that
    $4x^{-2}h_5(x) = 2h_5''(x^*) \ge 2h_5''(x) = \frac{1}{\sqrt{1+(x/2)^2}} \ge \frac{1}{1+x/2}$
    since $h_5''$ is decreasing.

Discussion

  1. Even though Kruglov’s inequality improves on Prokhorov’s inequality, (i)(a) of Lemma 4.1 shows that Bennett’s inequality dominates both Kruglov’s improvement of Prokhorov’s inequality and Prokhorov’s inequality itself: $h_2 \ge h_5 \ge h_4$.

  2. (ii) of Lemma 4.1 shows that all three of the inequalities, Bennett, Kruglov, and Prokhorov, are based on functions h2, h5, and h4 which are bounded below by $(x/2)\log(1+x/2)$ for all x ≥ 0. On the other hand, (iii)(d)-(e) show that h2 and h5 are very nearly equivalent for large x, but that although h4 grows at the same $x\log x$ rate as h2 and h5, h4 is smaller by a multiplicative factor of 1/2 as x → ∞.

  3. (iii)(a-c) of Lemma 4.1 shows that $h_2(x) \sim x^2/2$ as x ↘ 0 while $h_k(x) \sim x^2/4$ for both h5 and h4; thus h2 is larger near x = 0 by a factor of 2. Furthermore, the difference $h_2 - h_4$ is of order $(1/2)x\log x$ as x → ∞, while the difference $h_2 - h_5$ is only of order $\log x$ as x → ∞.

  4. (iv) of Lemma 4.1 re-expresses the behavior of the Kruglov and Prokhorov inequalities for small values of x in terms of the corresponding ψk functions. The upshot of all of these comparisons is that Bennett’s inequality dominates both the Kruglov and Prokhorov inequalities. Figures 1 and 2 give graphical versions of these comparisons as well as comparisons to the Bernstein type h–functions h0 and h1.

Figure 1.


Comparison of the hk functions h0, h1, h2, h4, and h5. The function h0 is plotted in magenta (tiny dashing), h1 in blue (medium dashing), h2 in red (no dashing), h4 in purple (large dashing), and h5 in black (medium dashing). For values of the argument larger than 1.4, h2 > h5 > h4 ≥ h1 > h0 (and all are below $x^2/2$), while for values of the argument smaller than 1, h2 > h1 > h0 ≥ h5 > h4

Figure 2.


The plot depicts (with the same colors and dashing as in Fig. 1) the ratios $x \mapsto h_k(x)/(x^2/2)$ for k ∈ {0, 1, 2, 4, 5}. This figure illustrates our finding that the Prokhorov type h–functions are smaller by a factor of 1/2 at x = 0, while they again dominate the Bernstein type h–functions for larger values of x, with the cross-overs occurring again between 1 and 1.4

5 Comparisons with Some Results of Talagrand

Our goal in this section is to give comparisons with some results of Talagrand (1989, 1994), especially his Theorem 3.5, page 45, and Proposition 6.5, page 58.

Talagrand (1994) defines a function φL,S as follows:

$\varphi_{L,S}(x) \equiv \begin{cases} \dfrac{x^2}{L^2S}, & \text{if } x \le LS, \\ \dfrac{x}{L}\Bigl(\log\Bigl(\dfrac{ex}{LS}\Bigr)\Bigr)^{1/2}, & \text{if } x \ge LS. \end{cases}$

Because of the square-root on the log term, this can be regarded as corresponding to a “sub - Bennett” type exponential bound. One of the interesting properties of φL,S established by Talagrand (1994) is given in the following lemma:

Lemma 5.1

There is a number K(L) depending on L only such that

$\varphi_{L,S}\bigl(K(L)\,x\sqrt{S}\bigr) \ge 11x^2 \quad\text{for all } x \ge K(L)\sqrt{S}.$

This is Lemma 3.6 of Talagrand (1994) page 47. Talagrand uses this Lemma to develop a Kiefer-type inequality: see also van der Vaart and Wellner (1996), Corollary A.6.3. In the basic Kiefer type inequality for Binomial random variables, van der Vaart and Wellner (1996), Corollary A.6.3, it follows that

$P\bigl(\sqrt{n}\,|\bar{Y}_n - p| \ge z\bigr) \le 2\exp(-11z^2)$

for $\log(1/p) - 1 \ge 11$; i.e. for $p \le e^{-12}$.

A similar fact holds for any exponential bound of the Bennett type under a certain boundedness hypothesis. Suppose that

$P(|Z| > z) \le 2\exp\Bigl(-\frac{z^2}{2\tau}\,\psi\Bigl(\frac{Lz}{\tau}\Bigr)\Bigr)$

and that P(|Z| ≥ v) = 0 for all v ≥ C. Then, since ψ is decreasing, for z ≤ C,

$P(|Z| > z) \le 2\exp\Bigl(-\frac{z^2}{2\tau}\,\psi\Bigl(\frac{LC}{\tau}\Bigr)\Bigr) = 2\exp\Bigl(-\frac{z^2}{2LC}\cdot\frac{LC}{\tau}\,\psi\Bigl(\frac{LC}{\tau}\Bigr)\Bigr) \le 2\exp\Bigl(-\frac{z^2}{LC}\log\Bigl(\frac{LC}{e\tau}\Bigr)\Bigr)$

where the log term can be made arbitrarily large by choosing τ sufficiently small. Here the second inequality follows from the fact that

$x\psi(x) \ge 2\log(x/e) = 2(\log x - 1).$ (5.1)

Proof of Eq. 5.1

Since $\psi(x) = 2x^{-2}h(1+x)$ where $h(x) = x(\log x - 1) + 1$, we can write, with $\underline{\psi}(x) \equiv 2x^{-1}\log(x/e)$,

$\frac{1}{2}\bigl(x\psi(x) - x\underline{\psi}(x)\bigr) = \frac{1}{x}\bigl\{(1+x)\log(1+x) - x\bigr\} - (\log x - 1) = \frac{1+x}{x}\log(1+x) - \log x = \log\Bigl(\frac{1+x}{x}\Bigr) + \frac{1}{x}\log(1+x)$

where both terms are clearly non-negative.
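Inequality (5.1) is also easy to confirm numerically over a wide range; the following is my own stdlib-Python sanity check, not from the paper.

```python
import math

def h2(x): return (1.0 + x) * math.log(1.0 + x) - x

def x_psi(x):
    """x * psi(x) = 2 h(1 + x) / x, with psi(x) = 2 x^{-2} h(1 + x)."""
    return 2.0 * h2(x) / x
```

The gap $\frac12\bigl(x\psi(x) - 2(\log x - 1)\bigr) = \log((1+x)/x) + x^{-1}\log(1+x)$ shrinks like $(1 + \log x)/x$ as x grows, so the bound is asymptotically tight but never violated.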

Now we consider another basic inequality due to Talagrand (1994). Suppose that

$\theta : \{C \in 2^{\mathcal{X}} : |C| < \infty\} \to \mathbb{R}$

satisfies the following three properties:

  1. C ⊂ D implies that θ(C) ≤ θ(D) for $C, D \in 2^{\mathcal{X}}$.

  2. θ(CD) ≤ θ(C) + θ(D).

  3. θ(C) ≤ |C| = #(C).

Then if X1, …, Xn are i.i.d. P non-atomic on $(\mathcal{X}, \mathcal{A})$ and $Z \equiv \theta(\{X_1, \dots, X_n\})$, for some universal constant $K_2$ we have, for $z \ge K_2E(Z)$,

$P(Z \ge z) \le \exp\Bigl(-\frac{z}{K_2}\log\Bigl(\frac{ez}{K_2E(Z)}\Bigr)\Bigr).$

As noted by Talagrand (1994), this follows from an isoperimetric inequality established in Talagrand (1989), but it is also a consequence of results of Talagrand (1991, 1995). Here we simply note that it can be rephrased as a Bennett type inequality: for all zK2E(Z)

$P(Z \ge z) \le \exp\Bigl(-E(Z)\,h\Bigl(1 + \frac{z}{K_2E(Z)}\Bigr)\Bigr).$

This follows by simply checking that

$E(Z)\,h\Bigl(1 + \frac{z}{K_2E(Z)}\Bigr) \le \frac{z}{K_2}\log\Bigl(\frac{ez}{K_2E(Z)}\Bigr)$

for $z \ge K_2E(Z)$.

Also see Ledoux (2001), Theorem 7.5, page 142 and Corollary 7.8, page 148; Massart (2000), and Boucheron et al. (2013), Theorem 6.12, page 182.

One further remark seems to be in order: Talagrand (1989), Theorem 2 and Proposition 12, shows that Orlicz norms of the Bennett type are “too large” to yield nice generalizations of the classical Hoffmann-Jørgensen inequality in the setting of sums of independent bounded sequences in a general Banach space. This follows by noting that Talagrand’s condition (2.11) fails for the Bennett-Orlicz norm Ψ2(·; L) as defined in Eq. 3.2.

Acknowledgments

I owe thanks to Evan Greene and Johannes Lederer for several helpful conversations and suggestions. Thanks are also due to Richard Nickl for a query concerning Prokhorov’s inequality.

Jon A. Wellner was supported in part by NSF Grants DMS-1104832 and DMS-1566514, and NIAID grant 2R01 AI291968-04.

Appendix 1: Lambert’s Function W; Inverses of h and h2

Let h(x) ≡ x(log x − 1) + 1 and $h_2(x) \equiv h(1+x)$ for x ≥ 0. The function h is convex, decreasing on [0, 1], increasing on [1, ∞), with h(1) = 0; see Shorack and Wellner (1986), page 439. The Lambert, or product log, function W (see e.g. Corless et al. (1996)) satisfies $W(x)e^{W(x)} = x$ for x ≥ −1/e. As noted by Boucheron et al. (2013), problem 2.18, the inverse functions $h^{-1}$ (for the function h: [1, ∞) → [0, ∞)) and $h_2^{-1}$ (for the function $h_2$: [0, ∞) → [0, ∞)) can be expressed in terms of the function W. Here are some facts about W:

Fact 1

W: [−1/e, ∞) → ℝ is multi-valued on [−1/e, 0) with two (real) branches W0 and W−1, where $W_0(x) \ge -1$, $W_{-1}(x) \le -1$, and $W_0(-1/e) = -1 = W_{-1}(-1/e)$.

Fact 2

W0 is monotone increasing on [−1/e, ∞) with W (0) = 0 and W′ (0) = 1.

See Roy and Olver (2010), section 4.13, page 111; and Corless et al. (1996).

In the following we simply write W for W0. The following lemma shows that the inverses of the functions h and h2 can be expressed in terms of W.

Lemma 6.1

(h and h2 inverses in terms of W)

  1. For y ≥ 0
    $h^{-1}(y) = e\,\frac{(y-1)/e}{W\bigl((y-1)/e\bigr)} = \frac{y-1}{W\bigl((y-1)/e\bigr)}.$ (6.1)
  2. For y ≥ 0
    $h_2^{-1}(y) = h^{-1}(y) - 1 = e\,\frac{(y-1)/e}{W\bigl((y-1)/e\bigr)} - 1 \ge 0.$ (6.2)

Proof

If $h^{-1}$ is as in the display we have, since $h(x) = x(\log x - 1) + 1$,

$h(h^{-1}(y)) = e\,\frac{(y-1)/e}{W((y-1)/e)}\Bigl(\log\Bigl(\frac{(y-1)/e}{W((y-1)/e)}\Bigr) + 1 - 1\Bigr) + 1 = e\,\frac{(y-1)/e}{W((y-1)/e)}\cdot W\Bigl(\frac{y-1}{e}\Bigr) + 1 \quad\bigl(\text{since } e^{W(x)} = x/W(x)\bigr) = (y-1) + 1 = y.$

Thus Eq. 6.1 holds. Then Eq. 6.2 follows immediately.

In view of Lemma 6.1, the following lower bounds on the function W will be useful in deriving upper bounds on $h^{-1}$ and $h_2^{-1}$.

Lemma 6.2

(A lower bound for W) For z > 0

$W(z) \ge \frac{1}{2}\log(ez) = 2^{-1}(1 + \log z).$ (6.3)

Proof

We first prove (6.3) for z ≥ 1/e. Since W(z) is increasing for z ≥ 0, the claimed inequality is equivalent to

$z = W(z)e^{W(z)} \ge \frac{1}{2}\log(ez)\exp\bigl((1/2)\log(ez)\bigr) = (ez)^{1/2}\log\bigl((ez)^{1/2}\bigr) \equiv y\log y,$

for ez ≥ 1, where $y \equiv (ez)^{1/2}$. But then the last display is equivalent to

$e^{-1}y^2 \ge y\log y \quad\text{for } y \ge 1,$

or

$g(y) \equiv y^2 - ey\log y \ge 0 \quad\text{for all } y \ge 1.$

Now g(1) = 1 > 0 and g(e) = 0, while $g'(y) = 2y - e - e\log y$ has g′(1) = 2 − e < 0 and g′(e) = 0; since $g''(y) = 2 - e/y > 0$ for y > e/2, g′ is increasing there, so g′ ≤ 0 on [1, e] and g′(y) > 0 for y > e. Thus g decreases on [1, e] to its minimum g(e) = 0 and increases thereafter, so the claimed bound holds for z ≥ 1/e. For 0 ≤ z < 1/e the bound holds trivially since W(z) ≥ 0 while $2^{-1}\log(ez) < 0$.
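Lemma 6.1 and the lower bound (6.3) are straightforward to test numerically. The sketch below is my own: it implements the principal branch W0 by Newton iteration on $we^w = z$ for z ≥ 0 (a library routine such as scipy's `lambertw` could be used instead), then checks the inverse formulas by a round trip through h.

```python
import math

def lambert_w(z, tol=1e-14):
    """Principal branch W0 via Newton iteration: solve w e^w = z for z >= 0.
    The start w = log(1 + z) lies above the root, so Newton descends monotonically."""
    w = math.log(1.0 + z)
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol * (1.0 + abs(w)):
            break
    return w

def h(x): return x * (math.log(x) - 1.0) + 1.0

def h_inv(y):
    """h^{-1}(y) = (y - 1) / W((y - 1)/e) for y > 1, as in Lemma 6.1."""
    return (y - 1.0) / lambert_w((y - 1.0) / math.e)

def h2_inv(y):
    """h_2^{-1}(y) = h^{-1}(y) - 1, as in Eq. 6.2."""
    return h_inv(y) - 1.0
```

The test also verifies $W(z) \ge \frac12(1 + \log z)$ on a grid, with equality (up to rounding) at z = e, exactly where the proof above locates the minimum.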

Combining Lemma 6.1 with the lower bounds for W given in Lemma 6.2 yields the following upper bounds for $h^{-1}$ and $h_2^{-1}$. The second and third parts of the following lemma are motivated by the fact that $h_2(x) = h(1+x) \equiv (x^2/2)\psi(x)$ where ψ(x) ↗ 1 as x ↘ 0; see Shorack and Wellner (1986), Proposition 11.1.1, page 441.

Lemma 6.3

(Upper bounds for $h^{-1}$ and $h_2^{-1}$)

  1. For y > 1 + e,
    $h^{-1}(y) \le \frac{2(y-1)}{\log(y-1)}.$ (6.4)
  2. For y > 1 + e,
    $h_2^{-1}(y) \le \frac{2(y-1)}{\log(y-1)} - 1.$ (6.5)
  3. For $0 \le y \le 9c^{-2}(c^2/2 - 1)^2$ with $c > \sqrt{2}$,
    $h_2^{-1}(y) \le c\sqrt{y}.$ (6.6)
    In particular, with c = 2, the bound holds for 0 ≤ y ≤ 9/4, and with c = 2.2, the bound holds for 0 ≤ y ≤ 1 + e.
  4. For 0 < y < ∞,
    $h_2^{-1}(y) \le \begin{cases} 2.2\sqrt{y}, & y \le 1 + e, \\ \dfrac{2(y-1)}{\log(y-1)} - 1, & y > 1 + e. \end{cases}$ (6.7)

Proof

  1. Follows from part 1 of Lemma 6.1 together with Lemma 6.2 applied with z = (y − 1)/e, which gives $W((y-1)/e) \ge \tfrac{1}{2}\log(y-1)$. Note that $g(x) \equiv x/\log x \ge e$ and g is increasing for x ≥ e.

  2. Follows from part 2 of Lemma 6.1 and Lemma 6.2 in the same way.

  3. To show that Eq. 6.6 holds, note that the inequality is equivalent to $y \le h_2(c\sqrt{y})$, and hence, by taking $x \equiv c\sqrt{y}$, to the inequality
     $$x^2/c^2 \le h_2(x) = h(1+x) = \frac{x^2}{2}\,\psi(x)$$
     where $\psi(x) \equiv (2/x^2)h(1+x) \ge 1/(1+x/3)$ by Lemma 4.1 (iva) (or by (10) of Proposition 11.4.1, Shorack and Wellner (1986), page 441). But then we have
     $$h_2(x) = h(1+x) \ge \frac{x^2}{2}\cdot\frac{1}{1+x/3} \ge \frac{x^2}{c^2}$$
     where the last inequality holds if $0 \le x \le 3(c^2/2-1)$. Hence the inequality in part 3 holds for $0 \le y = x^2/c^2 \le 9c^{-2}(c^2/2-1)^2$. Finally, part 4 holds by combining the bounds in parts 2 and 3.
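The piecewise bound of Eq. 6.7 can also be verified on a grid; the sketch below is our own check (again with W computed by a basic Newton iteration, not part of the paper).

```python
import math

def lambert_w(z):
    # principal branch of W for z > -1/e: Newton iteration for w * e^w = z
    w = math.log(1.0 + z) if z >= 0.0 else z
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) < 1e-14:
            break
    return w

def h2_inv(y):
    # Eq. (6.2): h_2^{-1}(y) = (y-1)/W((y-1)/e) - 1
    return (y - 1.0) / lambert_w((y - 1.0) / math.e) - 1.0

for k in range(1, 4000):
    y = 0.01 * k                         # grid on (0, 40)
    if abs(y - 1.0) < 1e-9:
        continue                         # formula reads 0/0 at y = 1 (limit is e - 1)
    bound = (2.2 * math.sqrt(y) if y <= 1.0 + math.e
             else 2.0 * (y - 1.0) / math.log(y - 1.0) - 1.0)
    assert h2_inv(y) <= bound + 1e-9
```

The second branch is fairly tight: at y = 10, for instance, $h_2^{-1}(y) \approx 7.17$ against a bound of about 7.19.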

Appendix 2: General versions of Lemmas 1-5

Now consider Young functions of the form $\Psi = e^{\psi} - 1$ where ψ is assumed to be convex and nondecreasing with ψ(0) = 0. (Note that we have changed notation in this section: the functions h and $h_j$ for j ∈ {0, 1, 2, 4, 6} in Sections 1–6 are denoted here by ψ.) Our goal in this section is to give general versions of Lemmas 1–5 of van de Geer and Lederer (2013) and Section 3 above. The advantage of this formulation is that the resulting lemmas apply to all the special cases treated in Sections 2 and 3, and more.

Lemma 7.1

Suppose that $\tau \equiv \|Z\|_\Psi < \infty$. Then for all t > 0,

$$P\left(|Z| > \tau\,\psi^{-1}(t)\right) \le 2e^{-t}.$$

For the general version of Lemma 2 we consider a scaled version of Ψ as follows:

$$\Psi(z; L) \equiv \Psi_L(z) \equiv \exp\left(\frac{2}{L^2}\,\psi(Lz)\right) - 1. \tag{7.1}$$

Lemma 7.2

Suppose that for some τ > 0 and L > 0,

$$P\left(|Z| \ge \frac{\tau}{L}\,\psi^{-1}\!\left(\frac{L^2 t}{2}\right)\right) \le 2e^{-t} \qquad\text{for all } t > 0.$$

Then $\|Z\|_{\Psi(\cdot;\,\sqrt{3}L)} \le \sqrt{3}\,\tau$.

Lemma 7.3

Suppose that Ψ is non-decreasing and convex with Ψ(0) = 0. Suppose that $Z_1, \ldots, Z_m$ are random variables with $\max_{1\le j\le m}\|Z_j\|_\Psi \le \tau < \infty$. Then

$$E\left(\max_{1\le j\le m}|Z_j|\right) \le \tau\,\Psi^{-1}(m).$$

Lemma 7.4

Suppose that Ψ is non-decreasing and convex with Ψ(0) = 0. Suppose that $Z_1, \ldots, Z_m$ are random variables with $\max_{1\le j\le m}\|Z_j\|_\Psi \le \tau < \infty$. Then for all t > 0,

$$P\left(\max_{1\le j\le m}|Z_j| \ge \tau\left(\psi^{-1}(\log(1+m)) + \psi^{-1}(t)\right)\right) \le 2e^{-t}.$$

Lemma 7.5

Suppose that Ψ is non-decreasing and convex with Ψ(0) = 0. Suppose that $Z_1, \ldots, Z_m$ are random variables with $\max_{1\le j\le m}\|Z_j\|_\Psi \le \tau < \infty$. Then

$$\left\|\left(\max_{1\le j\le m}|Z_j| - \tau\,\Psi^{-1}(m)\right)_+\right\|_{\Psi(\cdot;\,\sqrt{6})} \le \sqrt{6}\,\tau.$$

Proof of Lemma 7.1

For all $c > \|Z\|_\Psi$,

$$P\left(|Z|/c > \psi^{-1}(t)\right) = P\left(\psi(|Z|/c) > t\right) = P\left(e^{\psi(|Z|/c)} - 1 > e^t - 1\right) = P\left(\Psi(|Z|/c) > e^t - 1\right) \le \left(E\,\Psi(|Z|/c) + 1\right)e^{-t}.$$

Thus letting $c \downarrow \tau$ yields

$$P\left(|Z|/\tau > \psi^{-1}(t)\right) = \lim_{c\downarrow\tau} P\left(|Z|/c > \psi^{-1}(t)\right) \le \lim_{c\downarrow\tau}\left(E\,\Psi(|Z|/c) + 1\right)e^{-t} \le 2e^{-t}.$$
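To make Lemma 7.1 concrete, consider a worked special case of our own (not from the paper): take $\psi(x) = x^2$, so $\Psi = \Psi_2$ and the lemma asserts $P(|Z| > \tau\sqrt{t}) \le 2e^{-t}$. For Z standard normal, $E\exp(Z^2/c^2) = (1 - 2/c^2)^{-1/2}$ for $c^2 > 2$, which gives $\|Z\|_{\Psi_2} = \sqrt{8/3}$; the exact Gaussian tail can then be compared with the bound directly.

```python
import math

# psi(x) = x^2, Psi(x) = exp(x^2) - 1. For Z ~ N(0,1),
# E Psi(Z/c) = (1 - 2/c^2)^{-1/2} - 1, which equals 1 exactly when c^2 = 8/3.
tau = math.sqrt(8.0 / 3.0)             # = ||Z||_{Psi_2} for a standard normal Z

# sanity check of the norm computation
assert abs((1.0 - 2.0 / (tau * tau)) ** -0.5 - 2.0) < 1e-12

def gauss_tail(a):
    # P(|Z| > a) for Z ~ N(0,1)
    return math.erfc(a / math.sqrt(2.0))

# Lemma 7.1: P(|Z| > tau * psi^{-1}(t)) = P(|Z| > tau*sqrt(t)) <= 2*exp(-t)
for k in range(1, 200):
    t = 0.1 * k
    assert gauss_tail(tau * math.sqrt(t)) <= 2.0 * math.exp(-t)
```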

Proof of Lemma 7.2

Let α, β > 0. We compute

$$
\begin{aligned}
E\,\Psi\!\left(\frac{|Z|}{\alpha\tau};\,\beta L\right)
&= \int_0^\infty P\!\left(\Psi\!\left(\frac{|Z|}{\alpha\tau};\,\beta L\right) \ge v\right)dv
= \int_0^\infty P\!\left(\frac{2}{\beta^2 L^2}\,\psi\!\left(\frac{\beta L|Z|}{\alpha\tau}\right) \ge \log(1+v)\right)dv\\
&= \int_0^\infty P\!\left(\frac{\beta L|Z|}{\alpha\tau} \ge \psi^{-1}\!\left(\frac{\beta^2 L^2}{2}\log(1+v)\right)\right)dv
= \int_0^\infty P\!\left(|Z| \ge \frac{\alpha\tau}{\beta L}\,\psi^{-1}\!\left(\frac{\beta^2 L^2}{2}\log(1+v)\right)\right)dv\\
&= \int_0^\infty P\!\left(|Z| \ge \frac{\alpha\tau}{\beta L}\,\psi^{-1}\!\left(\frac{\beta^2 L^2}{2}\,t\right)\right)e^t\,dt
\le \int_0^\infty 2e^{-3t}e^t\,dt = 1
\end{aligned}
$$

by choosing $\alpha = \beta = \sqrt{3}$: then $\alpha\tau/(\beta L) = \tau/L$ and $\beta^2 L^2 t/2 = L^2(3t)/2$, so the hypothesis applies with t replaced by 3t.

Proof of Lemma 7.3

Let c > τ. Then by Jensen's inequality and convexity of Ψ,

$$
\begin{aligned}
\Psi\!\left(\frac{E\{\max_{1\le j\le m}|Z_j|\}}{c}\right)
&\le E\left\{\Psi\!\left(\max_{1\le j\le m}|Z_j|/c\right)\right\}
= E\left\{\max_{1\le j\le m}\Psi(|Z_j|/c)\right\}\\
&\le \sum_{j=1}^m E\,\Psi(|Z_j|/c)
\le m\cdot\max_{1\le j\le m} E\,\Psi(|Z_j|/c).
\end{aligned}
$$

Letting $c \downarrow \tau$ yields

$$E\left\{\max_{1\le j\le m}|Z_j|\right\} \le \tau\,\Psi^{-1}\!\left(m\cdot\max_{1\le j\le m} E\,\Psi(|Z_j|/\tau)\right) \le \tau\,\Psi^{-1}(m).$$
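A small concrete check of Lemma 7.3 (our own illustration, not from the paper): take $\psi(x) = x$, so $\Psi = \Psi_1 = e^x - 1$, and let $Z_1, \ldots, Z_m$ be i.i.d. with $P(Z_j = 1) = p = 1 - P(Z_j = 0)$. Then $E\,\Psi_1(Z_j/c) = p(e^{1/c} - 1)$, giving $\|Z_j\|_{\Psi_1} = 1/\log(1 + 1/p)$, and the exact value $E\max_j Z_j = 1 - (1-p)^m$ can be compared with $\tau\,\Psi_1^{-1}(m) = \tau\log(1+m)$.

```python
import math

p = 0.5
# Orlicz norm of each Z_j under Psi_1 = e^x - 1: E Psi_1(Z_j/tau) = p*(e^{1/tau} - 1) = 1
tau = 1.0 / math.log(1.0 + 1.0 / p)

for m in range(1, 200):
    e_max = 1.0 - (1.0 - p) ** m        # exact E max for i.i.d. {0,1}-valued Z_j
    assert e_max <= tau * math.log(1.0 + m)
```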

Proof of Lemma 7.4

For any u > 0 and v > 0, concavity of $\psi^{-1}$ (together with $\psi^{-1}(0) = 0$) implies that

$$\psi^{-1}(u) + \psi^{-1}(v) \ge \psi^{-1}(u+v).$$

Therefore, by using this with u = log(1 + m) and v = t, a union bound, and Lemma 7.1,

$$
\begin{aligned}
P\!\left(\max_{1\le j\le m}|Z_j| \ge \tau\left(\psi^{-1}(\log(1+m)) + \psi^{-1}(t)\right)\right)
&\le P\!\left(\max_{1\le j\le m}|Z_j| \ge \tau\,\psi^{-1}(\log(1+m)+t)\right)\\
&\le \sum_{j=1}^m P\!\left(|Z_j| \ge \tau\,\psi^{-1}(\log(1+m)+t)\right)\\
&\le 2m\exp\left(-(t+\log(1+m))\right) = \frac{2m}{m+1}\,e^{-t} \le 2e^{-t}.
\end{aligned}
$$
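The subadditivity step $\psi^{-1}(u) + \psi^{-1}(v) \ge \psi^{-1}(u+v)$ is generic for concave $\psi^{-1}$ with $\psi^{-1}(0) = 0$; for $\psi(x) = x^2$, for instance, it reads $\sqrt{u} + \sqrt{v} \ge \sqrt{u+v}$. A grid check of this instance (our own sketch):

```python
import math

# psi(x) = x^2 gives psi^{-1}(t) = sqrt(t), concave with sqrt(0) = 0; then
# (sqrt(u) + sqrt(v))^2 = u + v + 2*sqrt(u*v) >= u + v, i.e. subadditivity holds.
for i in range(1, 100):
    for j in range(1, 100):
        u, v = 0.1 * i, 0.1 * j
        assert math.sqrt(u) + math.sqrt(v) >= math.sqrt(u + v)
```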

Proof of Lemma 7.5

By Lemma 7.4,

$$P\!\left(\max_{1\le j\le m}|Z_j| \ge \tau\left(\psi^{-1}(\log(1+m)) + \psi^{-1}(t)\right)\right) \le 2e^{-t} \qquad\text{for all } t > 0,$$

so the hypothesis of Lemma 7.2 holds for

$$Z \equiv \left(\max_{1\le j\le m}|Z_j| - \tau\,\psi^{-1}(\log(1+m))\right)_+$$

with $L = \sqrt{2}$ and τ replaced by $\sqrt{2}\,\tau$. Thus the conclusion of Lemma 7.2 holds for Z with these choices of L and τ: $\|Z\|_{\Psi(\cdot;\,\sqrt{6})} \le \sqrt{6}\,\tau$.

References

  1. Arcones MA, Giné E. On the law of the iterated logarithm for canonical U-statistics and processes. Stochastic Process Appl. 1995;58(2):217–245.
  2. Bennett G. Probability inequalities for the sum of independent random variables. Journal of the American Statistical Association. 1962;57:33–45.
  3. Birgé L, Massart P. Minimum contrast estimators on sieves: exponential bounds and rates of convergence. Bernoulli. 1998;4(3):329–375.
  4. Boucheron S, Lugosi G, Massart P. Concentration Inequalities. Oxford University Press; Oxford: 2013.
  5. Corless RM, Gonnet GH, Hare DEG, Jeffrey DJ, Knuth DE. On the Lambert W function. Adv Comput Math. 1996;5(4):329–359.
  6. de la Peña VH, Giné E. Decoupling: From Dependence to Independence. Probability and its Applications (New York). Springer-Verlag; New York: 1999.
  7. Dudley RM. Uniform Central Limit Theorems. Cambridge Studies in Advanced Mathematics, vol. 63. Cambridge University Press; Cambridge: 1999.
  8. Ghosh S, Goldstein L. Applications of size biased couplings for concentration of measures. Electron Commun Probab. 2011a;16:70–83.
  9. Ghosh S, Goldstein L. Concentration of measures via size-biased couplings. Probab Theory Related Fields. 2011b;149(1-2):271–278.
  10. Goldstein L, Işlak Ü. Concentration inequalities via zero bias couplings. Statist Probab Lett. 2014;86:17–23.
  11. Hewitt E, Stromberg K. Real and Abstract Analysis: A Modern Treatment of the Theory of Functions of a Real Variable. Third printing. Graduate Texts in Mathematics, No. 25. Springer-Verlag; New York-Heidelberg: 1975.
  12. Johnson WB, Schechtman G, Zinn J. Best constants in moment inequalities for linear combinations of independent and exchangeable random variables. Ann Probab. 1985;13(1):234–253.
  13. Krasnoseľskiĭ MA, Rutickiĭ JB. Convex Functions and Orlicz Spaces. Translated from the first Russian edition by Leo F. Boron. P. Noordhoff Ltd.; Groningen: 1961.
  14. Kruglov VM. Strengthening of Prokhorov’s arcsine inequality. Theor Probab Appl. 2006;50:677–684. Translated from Teor. Veroyatn. Primen. 50 (2005).
  15. Ledoux M. The Concentration of Measure Phenomenon. Mathematical Surveys and Monographs, vol. 89. American Mathematical Society; Providence, RI: 2001.
  16. Massart P. About the constants in Talagrand’s concentration inequalities for empirical processes. Ann Probab. 2000;28(2):863–884.
  17. Pisier G. Some applications of the metric entropy condition to harmonic analysis. In: Banach Spaces, Harmonic Analysis, and Probability Theory (Storrs, Conn., 1980/1981). Lecture Notes in Math., vol. 995. Springer; Berlin: 1983. pp. 123–154.
  18. Pollard D. Empirical Processes: Theory and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics, vol. 2. Institute of Mathematical Statistics; Hayward, CA: 1990.
  19. Prokhorov YV. An extremal problem in probability theory. Theor Probability Appl. 1959;4:201–203.
  20. Rao MM, Ren ZD. Theory of Orlicz Spaces. Monographs and Textbooks in Pure and Applied Mathematics, vol. 146. Marcel Dekker, Inc.; New York: 1991.
  21. Roy R, Olver FWJ. Elementary functions. In: NIST Handbook of Mathematical Functions. U.S. Dept. Commerce; Washington, DC: 2010. pp. 103–134.
  22. Shorack GR, Wellner JA. Empirical Processes with Applications to Statistics. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, Inc.; New York: 1986.
  23. Stout WF. Almost Sure Convergence. Probability and Mathematical Statistics, vol. 24. Academic Press; New York-London: 1974.
  24. Talagrand M. Isoperimetry and integrability of the sum of independent Banach-space valued random variables. Ann Probab. 1989;17(4):1546–1570.
  25. Talagrand M. A new isoperimetric inequality and the concentration of measure phenomenon. In: Geometric Aspects of Functional Analysis (1989–90). Lecture Notes in Math., vol. 1469. Springer; Berlin: 1991. pp. 94–124.
  26. Talagrand M. Sharper bounds for Gaussian and empirical processes. Ann Probab. 1994;22(1):28–76.
  27. Talagrand M. Concentration of measure and isoperimetric inequalities in product spaces. Inst Hautes Études Sci Publ Math. 1995;81:73–205.
  28. van de Geer S, Lederer J. The Bernstein-Orlicz norm and deviation inequalities. Probab Theory Related Fields. 2013;157(1-2):225–250.
  29. van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. Springer Series in Statistics. Springer-Verlag; New York: 1996.
