Entropy. 2023 May 4;25(5):752. doi: 10.3390/e25050752

Some Families of Jensen-like Inequalities with Application to Information Theory

Neri Merhav
Editor: Raúl Alcaraz
PMCID: PMC10288939  PMID: 37238507

Abstract

It is well known that the traditional Jensen inequality is proved by lower bounding the given convex function, f(x), by the tangential affine function that passes through the point (E{X}, f(E{X})), where E{X} is the expectation of the random variable X. While this tangential affine function yields the tightest lower bound among all lower bounds induced by affine functions that are tangential to f, it turns out that when the function f is just part of a more complicated expression whose expectation is to be bounded, the tightest lower bound might belong to a tangential affine function that passes through a point different from (E{X}, f(E{X})). In this paper, we take advantage of this observation by optimizing the point of tangency with regard to the specific given expression in a variety of cases, and thereby derive several families of inequalities, henceforth referred to as “Jensen-like” inequalities, which, to the best of the author’s knowledge, are new. The degree of tightness and the potential usefulness of these inequalities are demonstrated in several application examples related to information theory.

Keywords: Jensen’s inequality, convex function, concave function, entropy, capacity, moment-generating function, cumulant-generating function


In memory of Jacob Ziv,
a shining star in the sky of information theory,
whose legacy as a researcher will continue to inspire me and many others
for years to come.

1. Introduction

As is well known, the Jensen inequality is one of the most fundamental and useful mathematical tools in a variety of fields, including information theory. Interestingly, it includes as special cases many other well-known inequalities that are important in their own right. Among many examples, we mention the Cauchy–Schwarz inequality (which in turn underlies uncertainty principles and the Cramér–Rao bound), the Lyapunov inequality, the Hölder inequality, and the inequalities among the harmonic, geometric and arithmetic means. In the field of information theory, the Jensen inequality underlies the information inequality (i.e., the non-negativity of the relative entropy), the data processing inequality (which in turn leads to the Fano inequality), and the inequality between conditional and unconditional entropies. Moreover, it plays a central role in the derivation of single-letter formulas in Shannon theory and in the theory of maximum entropy under moment constraints (see, for example, Chapter 12 of [1]).

During the last two decades, there have been many research efforts around Jensen’s inequality, including refinements [2,3,4,5], variations [6,7,8], improvements [9,10,11], and extensions [12], to name just a few. There have also been many derivations of reversed versions of the Jensen inequality. For a non-exhaustive list of works, see, e.g., ref. [13] for mixtures of exponential families, refs. [14,15,16,17] for global bounds on the difference between the two sides of Jensen’s inequality, ref. [18] for functions of self-adjoint operators in Hilbert spaces, refs. [19,20] for inequalities via Green functions, refs. [21,22] for inequalities via Chebyshev and Chernoff bounds, ref. [23] for quantum Simpson’s and quantum Newton’s inequalities, and ref. [24] for new quantum Hermite–Hadamard-like inequalities. Most of these works illustrate the derived inequalities in many applications, for instance, useful relationships between arithmetic and geometric means, converse bounds on the entropy, the relative entropy, and the more general f-divergence, converse forms of the Hölder inequality, and so on. In many of these works, the main results are given in the form of an upper bound on the difference E{f(X)} − f(E{X}), where f is a convex function, E{·} is the expectation operator, and X is a random variable. However, those bounds depend mostly on global parameters associated with f, for example, its range and domain, and not specifically on the probability function of X (the probability density function in the continuous case, or the probability mass function in the discrete case). For one thing, a desirable property of a reverse Jensen inequality would be that it is tight when X is well concentrated in the vicinity of its mean, just like the same well-known property of the ordinary Jensen inequality. In [22], there is an attempt to address this issue.

This paper revisits the Jensen inequality from a completely different angle. It is not meant to be another improvement of earlier bounds in an existing line of work; rather, it proposes a different approach for generating useful inequalities in the spirit of Jensen’s inequality. It is based on the following simple observation, which is rooted in the proof of Jensen’s inequality: the given convex function, f(x), is lower bounded by the tangential affine function ℓ(x) = f(a) + f′(a)(x − a), where a is an arbitrary number in the domain of x and f′(a) is the derivative of f at x = a (provided that f is differentiable at x = a). By selecting a = E{X} and taking expectations of both sides of the inequality f(X) ≥ ℓ(X), the Jensen inequality is readily proved. The point to be remembered is that here, a = E{X} is the optimal choice of a in the sense of maximizing E{ℓ(X)} over all possible values of a, thus yielding the tightest lower bound on E{f(X)} within this class of lower bounds. The optimal choice of a, however, might be different from E{X} when the function f(X) is only part of a more complicated expression whose expectation is to be lower bounded. For example, one might be interested in lower bounding E{g[f(X)]}, where g is a monotonically non-decreasing function, or E{f(X)g(X)}, where g is a non-negative and/or convex function, or a combination of both, etc.

To demonstrate this fact, consider the example (to be treated in detail in Section 2) of lower bounding E{f(X)g(X)}, where g is a non-negative function. In this case,

$$E\{f(X)g(X)\} \ge E\{[f(a)+f'(a)(X-a)]\,g(X)\}, \tag{1}$$

and by maximizing the right-hand side (r.h.s.) over a, we easily obtain that the optimal choice of a here is a=E{Xg(X)}/E{g(X)}, yielding the inequality,

$$E\{f(X)g(X)\} \ge f\left(\frac{E\{Xg(X)\}}{E\{g(X)\}}\right)\cdot E\{g(X)\}, \tag{2}$$

which is useful as long as g is such that both E{g(X)} and E{Xg(X)} can easily be calculated. While this particular inequality could also have been obtained by applying the ordinary Jensen inequality, E{f(X)} ≥ f(E{X}), with respect to the density $\tilde{p}(x) = p(x)g(x)/\int p(x')g(x')\,dx'$, we will see in the sequel various examples of inequalities that have no such simple interpretation. We henceforth refer to these classes of inequalities as Jensen-like inequalities, since they are derived using the same general idea that underlies the proof of the classical Jensen inequality. We will also demonstrate the usefulness of these inequalities in information theory.

Our contributions, in this work, have the following features:

  1. In many cases (such as the one above), the optimal value of the parameter(s) (e.g., the parameter a in the above discussion) can be found in closed form. In other cases, the resulting expressions may not lend themselves to closed-form optimization, and then there are two possibilities: (i) carry out the optimization numerically, or (ii) pick an arbitrary value of a and obtain a valid lower bound, bearing in mind that an educated guess can potentially result in a good bound.

  2. Our inequalities provide two types of bounds: (i) bounds that require the calculation of the first two moments (or, equivalently, the first two cumulants) of X, and (ii) bounds that require the calculation of the moment-generating function (MGF) of X and its derivative, or, equivalently, the cumulant-generating function (CGF) of X and its derivative. All of these quantities are often easily calculable in closed form, especially when X is a sum of independent and identically distributed (i.i.d.) random variables, a situation frequently encountered in information-theoretic applications.

  3. Most of our derivations extend to convex functions of more than one variable.

  4. The classes of Jensen-like inequalities that we consider are flexible enough to yield lower bounds on functions that are not necessarily convex, and even on some concave functions, and thereby open the door to another route toward reverse Jensen inequalities. This can be accomplished by representing the given function in one of the forms discussed (e.g., a product of a convex function and a non-negative function, a product of two non-negative convex functions, a composition of a monotone function and a convex function, etc.).

  5. We demonstrate the utility of the Jensen-like inequalities in several examples of information-theoretic relevance. We also display numerical results that exemplify the degree of tightness of these bounds.

  6. Our Jensen-like inequalities have the desirable property of becoming tighter as X becomes more and more concentrated around its mean, just like the ordinary Jensen inequality.

  7. Throughout the paper, we confine ourselves to lower bounds on expectations of expressions that include a convex function f, but it should be understood that they all continue to apply also if f is concave and the inequalities are reversed.

  8. It should be understood that the classes of Jensen-like inequalities that we derive in this work are just examples that demonstrate the basic underlying idea of optimizing the point of tangency to the given convex function for the specific expression at hand. It is conceivable that the same idea can be applied to many more situations of theoretical and practical interest.

In all forthcoming derivations, it will be assumed that the convex functions involved are (weakly) convex and differentiable. In other words, we will rely on the well-known fact that a differentiable convex function, f(x), is nowhere below the supporting line ℓ(x) = f(a) + f′(a)(x − a), for every value of the parameter a in the domain of the independent variable, x [25] (p. 69, eq. (3.2)). In order to show that the point of zero derivative of the lower bound (w.r.t. a) indeed yields a maximum (and not a minimum, etc.) of the lower bound, we will need to further assume that f is twice differentiable. This assumption does not limit the applicability of the claimed lower bound, because the lower bound applies to any value of a, including the point of zero derivative, even if this point cannot be proved to yield the maximum of the lower bound using the standard methods. Similar comments apply when the lower bound depends on more than one parameter.

In the remaining part of this article, each section is devoted to a different class of Jensen-like inequalities, which corresponds to a different form of an expression that includes the convex function, f.

2. A Product of a Convex Function and a Non-Negative Function

In this section, we focus on lower bounding expressions of the form E{f(X)g(X)}, where f is convex and g is non-negative. Specifically, let f: ℝ → ℝ be a convex function and let g: ℝ → ℝ₊ be a non-negative function. Then, for any a ∈ ℝ,

\begin{align}
E\{f(X)g(X)\} &\ge E\{[f(a)+f'(a)(X-a)]\,g(X)\} \tag{3}\\
&= [f(a)-af'(a)]\,E\{g(X)\} + f'(a)\,E\{Xg(X)\}. \tag{4}
\end{align}

To find the value of a that maximizes the r.h.s., we equate the derivative to zero and obtain:

$$[f'(a)-f'(a)-af''(a)]\,E\{g(X)\} + f''(a)\,E\{Xg(X)\} = 0, \tag{5}$$

or equivalently,

$$f''(a)\,[E\{Xg(X)\} - a\,E\{g(X)\}] = 0, \tag{6}$$

whose solution is readily obtained as

$$a = a^* = \frac{E\{Xg(X)\}}{E\{g(X)\}}, \tag{7}$$

and it is easy to verify that the second derivative at a = a* equals −f″(a*)E{g(X)}, which is negative whenever f″(a*) > 0 and E{g(X)} > 0; hence a* is a maximum (at least a local one). The resulting lower bound on E{f(X)g(X)} is then given by

$$E\{f(X)g(X)\} \ge f\left(\frac{E\{Xg(X)\}}{E\{g(X)\}}\right)\cdot E\{g(X)\}. \tag{8}$$

This result extends straightforwardly to the case where X is a vector, provided that f is jointly convex and differentiable in all components of X. In particular, it extends to the case where f and g act on different random variables, X and Y, with a joint distribution:

$$E\{f(X)g(Y)\} \ge f\left(\frac{E\{Xg(Y)\}}{E\{g(Y)\}}\right)\cdot E\{g(Y)\}. \tag{9}$$
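As a quick numerical sanity check of (7) and (8), the sketch below assumes a small hypothetical discrete distribution for X and the illustrative choices f(x) = x² and g(x) = e^x (none of these come from the text). It evaluates the tangent-line bound (4) at the optimized point a* and at the Jensen default a = E{X}, and scans a grid of tangency points; the helper names are ours.

```python
import math

# Hypothetical discrete distribution for X (illustrative values, not taken from the paper).
XS = [-1.0, 0.0, 0.5, 2.0]
PS = [0.2, 0.3, 0.3, 0.2]


def expect(h):
    """E{h(X)} under the discrete distribution above."""
    return sum(p * h(x) for x, p in zip(XS, PS))


def f(x):
    return x * x            # illustrative convex function


def f_prime(x):
    return 2.0 * x          # its derivative


def g(x):
    return math.exp(x)      # illustrative non-negative weight


def tangent_bound(a):
    """Right-hand side of (4): [f(a) - a f'(a)] E{g(X)} + f'(a) E{X g(X)}."""
    return (f(a) - a * f_prime(a)) * expect(g) + f_prime(a) * expect(lambda x: x * g(x))


exact = expect(lambda x: f(x) * g(x))
a_star = expect(lambda x: x * g(x)) / expect(g)    # optimal tangency point (7)
bound_opt = f(a_star) * expect(g)                   # closed form (8)
bound_mean = tangent_bound(expect(lambda x: x))     # tangency at the Jensen default a = E{X}

print(f"exact={exact:.4f}  bound at a*={bound_opt:.4f}  bound at E{{X}}={bound_mean:.4f}")
```

For this particular f, the right-hand side of (4) is a concave quadratic in a, so the grid scan simply confirms that a* is its maximizer and that the value there coincides with (8).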

We next consider several examples.

Example 1. 

Let f(x) = −ln x and g(x) = x, x > 0. Applying Inequality (8),

$$E\{-X\ln X\} \ge -E\{X\}\cdot\ln\frac{E\{X^2\}}{E\{X\}} = -E\{X\}\cdot\ln(E\{X\}) - E\{X\}\cdot\ln\left(1+\frac{\mathrm{Var}\{X\}}{[E\{X\}]^2}\right). \tag{10}$$

Note that the function −x ln x is concave, rather than convex, yet we have here a lower bound (rather than an upper bound) on its expectation, namely, a reversed Jensen inequality. The first term on the right-most side is the (ordinary) Jensen upper bound on E{−X ln X}, and the second term is the gap, which depends not only on the expectation of X but also on its variance, which quantifies the fluctuations around E{X}. Clearly, if Var{X} = 0, the second term vanishes, which makes sense, because when X is a degenerate random variable, Jensen’s inequality holds with equality and there is no gap. This inequality has an immediate application for obtaining a lower bound on the expectation of the empirical entropy of a sequence drawn from a memoryless source, which is relevant in the context of universal source coding [26]. Each term of the empirical entropy is of the form −X ln X, where X = N(u)/N and N(u) is the number of occurrences of a letter u in a randomly drawn N-tuple from a memoryless source, P, with a finite alphabet, U. Each N(u) is a binomial random variable with N trials and probability of success P(u). In this case, E{X} = P(u) and Var{X} = P(u)[1 − P(u)]/N. Thus, denoting the entropy and the empirical entropy, respectively, by

$$H = -\sum_{u\in\mathcal{U}} P(u)\ln P(u), \tag{11}$$
$$\hat{H} = -\sum_{u\in\mathcal{U}} \frac{N(u)}{N}\ln\frac{N(u)}{N}, \tag{12}$$

with the convention that 0ln0=0, we have:

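\begin{align}
E\{\hat{H}\} &\ge -\sum_{u\in\mathcal{U}} P(u)\ln P(u) - \sum_{u\in\mathcal{U}} P(u)\ln\left(1+\frac{P(u)[1-P(u)]/N}{P^2(u)}\right)\nonumber\\
&= H - \sum_{u\in\mathcal{U}} P(u)\ln\left(1+\frac{1-P(u)}{NP(u)}\right)\nonumber\\
&\ge H - \sum_{u\in\mathcal{U}} P(u)\cdot\frac{1-P(u)}{NP(u)} = H - \frac{1}{N}\sum_{u\in\mathcal{U}}[1-P(u)]\nonumber\\
&= H - \frac{|\mathcal{U}|-1}{N}, \tag{13}
\end{align}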

where |U| is the cardinality of U, and the second inequality in (13) uses ln(1 + x) ≤ x. The ordinary Jensen inequality yields an upper bound rather than a lower bound, E{Ĥ} ≤ H. We conclude that the expected empirical entropy, E{Ĥ}, is sandwiched between H − (|U| − 1)/N and H, which is reasonable because the variance of the empirical probabilities, N(u)/N, decays at the rate of 1/N.
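The sandwich H − (|U| − 1)/N ≤ E{Ĥ} ≤ H can be checked exactly for a binary alphabet, where N(1) is Binomial(N, p) and E{Ĥ} is a finite sum. The sketch below works in nats; the value p = 0.3 and the helper names are arbitrary illustrative choices.

```python
import math


def binary_entropy(x):
    """Entropy (nats) of the distribution (x, 1 - x), with the convention 0 ln 0 = 0."""
    return -sum(t * math.log(t) for t in (x, 1.0 - x) if t > 0.0)


def expected_empirical_entropy(p, n):
    """Exact E{H^} for a binary memoryless source with P(1) = p and block length n."""
    return sum(
        math.comb(n, k) * p**k * (1.0 - p) ** (n - k) * binary_entropy(k / n)
        for k in range(n + 1)
    )


p = 0.3
H = binary_entropy(p)
for n in (5, 20, 100):
    lower = H - (2 - 1) / n            # H - (|U| - 1)/N with |U| = 2
    print(n, round(lower, 4), round(expected_empirical_entropy(p, n), 4), round(H, 4))
```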

Example 2. 

Let s and t be two real numbers whose difference, s − t, is either negative or larger than unity, and let $g(x) = x^t$ and $f(x) = x^{s-t}$, x > 0, so that f is convex. Then,

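\begin{align}
E\{X^s\} = E\{X^t\cdot X^{s-t}\} &\ge \left(\frac{E\{X^{t+1}\}}{E\{X^t\}}\right)^{s-t}\cdot E\{X^t\}\nonumber\\
&= \frac{(E\{X^{t+1}\})^{s-t}}{(E\{X^t\})^{s-t-1}}. \tag{14}
\end{align}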

In particular, for t = 1 and s ∉ [1, 2], this becomes

$$E\{X^s\} \ge \frac{(E\{X^2\})^{s-1}}{(E\{X\})^{s-2}} = [E\{X\}]^s\cdot\left(1+\frac{\mathrm{Var}\{X\}}{[E\{X\}]^2}\right)^{s-1}, \tag{15}$$

which is, once again, a bound that depends only on the first two moments of X. For s ∈ (0, 1), the function $x^s$ is concave, and so this is a reversed version of the Jensen inequality. For s > 2, the function $x^s$ is convex and the exponent s − 1 is positive, and so this is an improved version of the Jensen inequality: while the first factor, $[E\{X\}]^s$, corresponds to the ordinary Jensen inequality, the second factor expresses the improvement, which depends on the relative fluctuation term, Var{X}/[E{X}]². (For s < 0, $x^s$ is also convex, but the second factor is then smaller than unity, so (15) does not improve on the ordinary Jensen bound.) If the variance vanishes, there is nothing to improve, because the ordinary Jensen inequality becomes an equality. On the other hand, the larger the variance, the larger the gap between the ordinary Jensen bound, $[E\{X\}]^s$, and the improved one. Accordingly, this also demonstrates the role of optimizing the parameter a, as opposed to the default choice a = E{X} of the ordinary Jensen inequality.
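To see (15) at work, the sketch below evaluates $E\{X^s\}$, the bound (15), and the plain Jensen value $[E\{X\}]^s$ for a hypothetical positive discrete random variable; the support and probabilities are illustrative assumptions, not data from the paper.

```python
# Hypothetical positive discrete random variable (illustrative support and probabilities).
XS = [0.5, 1.0, 2.0, 4.0]
PS = [0.1, 0.4, 0.3, 0.2]


def moment(r):
    """E{X^r} for real r."""
    return sum(p * x**r for x, p in zip(XS, PS))


M1, M2 = moment(1), moment(2)
REL_VAR = (M2 - M1**2) / M1**2          # Var{X} / [E{X}]^2


def bound_15(s):
    """Right-hand side of (15), valid for s outside [1, 2]."""
    return M1**s * (1.0 + REL_VAR) ** (s - 1)


for s in (0.5, 3.0):
    print(f"s={s}: E X^s={moment(s):.4f}  bound={bound_15(s):.4f}  Jensen value={M1**s:.4f}")
```

For s = 3 the bound lies above $[E\{X\}]^s$ (the improvement over Jensen); for s = 0.5 it is a lower bound that complements the concave-case Jensen upper bound.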

To particularize this example even further, consider the problem of randomized guessing under a distribution Q (see, e.g., [27] and many references therein). The probability of success of a single guess of a discrete-alphabet random variable, X, given that X = x (known to us, but not to the guesser), is Q(x). In sequential guessing until the first success, the number of guesses, G, is a geometric random variable with parameter p = Q(x), whose mean and variance are 1/p and (1 − p)/p², respectively. For s ∉ [1, 2],

$$E\{G^s\} \ge \frac{1}{p^s}\cdot\left(1+\frac{(1-p)/p^2}{1/p^2}\right)^{s-1} = \frac{(2-p)^{s-1}}{p^s} = \frac{[2-Q(x)]^{s-1}}{[Q(x)]^s}. \tag{16}$$
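A sketch under stated assumptions: the guessing bound (16) is compared against $E\{G^s\}$ computed by summing the geometric series term by term, truncated where the remaining terms are negligible for the parameters used here. The values of p and s are arbitrary illustrative choices.

```python
def geometric_moment(p, s, kmax=20_000):
    """E{G^s} for G ~ Geometric(p) on {1, 2, ...}, via a truncated series."""
    q = 1.0 - p
    return sum(p * q ** (k - 1) * k**s for k in range(1, kmax + 1))


def guessing_bound(p, s):
    """Right-hand side of (16): (2 - p)^(s - 1) / p^s, valid for s outside [1, 2]."""
    return (2.0 - p) ** (s - 1) / p**s


for p in (0.05, 0.3, 0.8):
    print(p, geometric_moment(p, 3.0), guessing_bound(p, 3.0))
```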

Example 3. 

Let f be an arbitrary convex function and let $g(x) = e^{sx}$, where s is a given real number. Then, Inequality (8) becomes:

$$E\{f(X)e^{sX}\} \ge f(\psi'(s))\cdot e^{\psi(s)}, \tag{17}$$

where

$$\psi(s) = \ln E\{e^{sX}\} \tag{18}$$

is the CGF of X and ψ′(s) is its derivative; here we used $E\{Xe^{sX}\}/E\{e^{sX}\} = \psi'(s)$ and $E\{e^{sX}\} = e^{\psi(s)}$. This gives a lower bound in terms of the CGF of X and its derivative. The ordinary Jensen inequality is obtained as the special case s = 0, where ψ(0) = 0 and ψ′(0) = E{X}.
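A minimal check of (17), assuming a hypothetical discrete distribution for X and the illustrative convex function f(x) = cosh x. The CGF and its derivative are computed directly from their definitions; at s = 0 the right-hand side reduces to the ordinary Jensen bound f(E{X}).

```python
import math

# Hypothetical discrete distribution for X (illustrative).
XS = [-1.0, 0.0, 1.0, 3.0]
PS = [0.25, 0.25, 0.3, 0.2]


def psi(s):
    """CGF (18): psi(s) = ln E{e^{sX}}."""
    return math.log(sum(p * math.exp(s * x) for x, p in zip(XS, PS)))


def psi_prime(s):
    """psi'(s) = E{X e^{sX}} / E{e^{sX}}."""
    num = sum(p * x * math.exp(s * x) for x, p in zip(XS, PS))
    den = sum(p * math.exp(s * x) for x, p in zip(XS, PS))
    return num / den


def f(x):
    return math.cosh(x)     # illustrative convex function


def lhs(s):
    """E{f(X) e^{sX}}."""
    return sum(p * f(x) * math.exp(s * x) for x, p in zip(XS, PS))


def rhs(s):
    """Right-hand side of (17): f(psi'(s)) e^{psi(s)}."""
    return f(psi_prime(s)) * math.exp(psi(s))


for s in (-1.0, 0.0, 1.0):
    print(s, round(lhs(s), 4), round(rhs(s), 4))
```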

3. A Composition of a Monotone Function and a Convex Function

Another family of Jensen-like inequalities corresponds to the need to lower bound an expression of the form E{g[f(X)]}, where f is convex as before and g is a monotonically non-decreasing function. The general idea is to carry out the optimization of the r.h.s. of the following inequality.

$$E\{g[f(X)]\} \ge \sup_a E\{g[f(a)+f'(a)(X-a)]\}. \tag{19}$$

In the important special case where $g(x) = e^x$, we have:

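\begin{align}
E\{e^{f(X)}\} &\ge \sup_a E\{e^{f(a)+f'(a)(X-a)}\} = \sup_a e^{f(a)-af'(a)}\,E\{e^{Xf'(a)}\}\nonumber\\
&= \exp\left\{\sup_a\left[f(a)-af'(a)+\psi(f'(a))\right]\right\}, \tag{20}
\end{align}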

where ψ(·) is again the CGF of X. The optimal value, a*, of a is the solution to the equation obtained by equating the derivative of the exponent, $f''(a)\{\psi'[f'(a)]-a\}$, to zero, i.e.,

$$\psi'[f'(a^*)] = a^*, \quad \text{provided that } f''(a^*)\,\psi''[f'(a^*)] < 1, \tag{21}$$

where ψ′(·) and ψ″(·) are the first and the second derivatives of ψ(·), respectively.

Example 4. 

Consider the case where $f(x) = sx^2/2$, with s > 0, and X ∼ N(μ, σ²), where σ² < 1/s, as otherwise $E\{e^{sX^2/2}\} = \infty$. In this case, f′(a) = sa and f″(a) = s, while ψ(t) = μt + σ²t²/2, so that ψ′(t) = μ + σ²t and ψ″(t) = σ². Hence, the condition f″(a)ψ″[f′(a)] < 1 is equivalent to σ² < 1/s, and ψ′[f′(a)] = μ + σ²sa. The equation for the optimal a then becomes

$$\mu + \sigma^2 s a = a, \tag{22}$$

whose solution is

$$a = a^* = \frac{\mu}{1-\sigma^2 s}, \tag{23}$$

which yields

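$$E\{e^{sX^2/2}\} \ge \exp\left\{\frac{s(a^*)^2}{2} - s(a^*)^2 + \mu s a^* + \frac{\sigma^2 s^2 (a^*)^2}{2}\right\} = \exp\left\{\frac{\mu^2 s}{2(1-\sigma^2 s)}\right\}. \tag{24}$$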

The ordinary Jensen inequality yields

$$E\{e^{sX^2/2}\} \ge \exp\{sE\{X^2\}/2\} = e^{s(\mu^2+\sigma^2)/2}, \tag{25}$$

which does not capture the singularity at s = 1/σ². The exact calculation yields

$$E\{e^{sX^2/2}\} = \frac{1}{\sqrt{1-\sigma^2 s}}\cdot\exp\left\{\frac{\mu^2 s}{2(1-\sigma^2 s)}\right\}, \tag{26}$$

namely, the Jensen-like bound (24) gives the correct exponential term (along with the singularity at s = 1/σ²) and differs from the exact quantity only in the pre-exponential factor, $1/\sqrt{1-\sigma^2 s}$. Once again, this demonstrates that optimizing the point of tangency, a, rather than using the default value, a = E{X}, can make a significant difference.
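A short numerical check of (24), (25) and (26), assuming μ = 1 and σ² = 1 as illustrative parameters: the closed form (26) is compared against a midpoint-rule integral (the integration range ±60 is our choice and is adequate only for parameters like these), and the ratio between the exact value and the Jensen-like bound (24) should equal the pre-exponential factor $1/\sqrt{1-\sigma^2 s}$.

```python
import math


def exact_26(mu, sigma2, s):
    return math.exp(mu**2 * s / (2 * (1 - sigma2 * s))) / math.sqrt(1 - sigma2 * s)


def bound_24(mu, sigma2, s):
    return math.exp(mu**2 * s / (2 * (1 - sigma2 * s)))


def jensen_25(mu, sigma2, s):
    return math.exp(s * (mu**2 + sigma2) / 2)


def numeric_mgf_sq(mu, sigma2, s, lo=-60.0, hi=60.0, n=120_000):
    """Midpoint-rule approximation of E{exp(s X^2 / 2)} for X ~ N(mu, sigma2), 0 < s < 1/sigma2."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        # Combine both exponents before calling exp to avoid overflow in the tails.
        total += math.exp(s * x * x / 2 - (x - mu) ** 2 / (2 * sigma2))
    return total * h / math.sqrt(2 * math.pi * sigma2)


for s in (0.3, 0.6, 0.9):
    print(s, jensen_25(1, 1, s), bound_24(1, 1, s), exact_26(1, 1, s), numeric_mgf_sq(1, 1, s))
```

Close to the singularity (s = 0.9 here) the Jensen value (25) stays bounded, while (24) tracks the exact value up to the factor $1/\sqrt{1-\sigma^2 s}$.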

4. A Product of a Convex Function and a Monotone-Convex Composition

Yet another class of Jensen-like inequalities corresponds to lower bounding the expectation of the product of two functions, where one is convex and the other is a composition of a non-negative monotonically non-decreasing function and a convex function, i.e.,

$$E\{h[f(X)]\,g(X)\} \ge \sup_{a,b} E\{h[f(a)+f'(a)(X-a)]\cdot[g(b)+g'(b)(X-b)]\}, \tag{27}$$

where f and g are convex and h is monotonically non-decreasing and non-negative. For the case where $h(x) = e^x$, we end up with a bound that depends on the CGF of X and its derivative:

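\begin{align}
E\{e^{f(X)}g(X)\} &\ge E\{e^{f(a)+f'(a)(X-a)}[g(b)+g'(b)(X-b)]\} \tag{28}\\
&= e^{f(a)-af'(a)}\,E\{e^{Xf'(a)}[g(b)-bg'(b)+g'(b)X]\} \tag{29}\\
&= \exp\{f(a)-af'(a)+\psi[f'(a)]\}\,\{g(b)+g'(b)(\psi'[f'(a)]-b)\}. \tag{30}
\end{align}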

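Maximizing with respect to b while a is kept fixed yields b = ψ′[f′(a)] (by the convexity of g, the tangent line at any b, evaluated at ψ′[f′(a)], never exceeds g(ψ′[f′(a)]), with equality at b = ψ′[f′(a)]), and we obtain: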

$$E\{e^{f(X)}g(X)\} \ge \sup_a \exp\{f(a)-af'(a)+\psi[f'(a)]\}\cdot g(\psi'[f'(a)]). \tag{31}$$

Example 5. 

Considering the case where f(x) = −ln x and g(x) = x ln x, we may obtain a reversed Jensen-like inequality, namely, a lower bound on the expectation of the concave function ln X:

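\begin{align}
E\{\ln X\} &= E\{e^{-\ln X}\cdot X\ln X\} \tag{32}\\
&\ge \sup_{a\ge 0}\exp\{-\ln a+1+\psi(-1/a)\}\cdot\psi'(-1/a)\ln\psi'(-1/a) \tag{33}\\
&= \sup_{\alpha\ge 0}\exp\{\ln\alpha+1+\psi(-\alpha)\}\,\psi'(-\alpha)\ln\psi'(-\alpha) \tag{34}\\
&= e\cdot\sup_{\alpha\ge 0}\alpha\,e^{\psi(-\alpha)}\,\psi'(-\alpha)\ln\psi'(-\alpha) \tag{35}\\
&= e\cdot\sup_{\alpha\ge 0}\alpha\,E\{Xe^{-\alpha X}\}\ln\frac{E\{Xe^{-\alpha X}\}}{E\{e^{-\alpha X}\}}. \tag{36}
\end{align}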

Defining the MGF $\phi(s) = E\{e^{sX}\} = e^{\psi(s)}$, we have:

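\begin{align}
E\{\ln X\} &\ge e\cdot\sup_{\alpha\ge 0}\alpha\,\phi'(-\alpha)\ln\psi'(-\alpha) \tag{37}\\
&= e\cdot\sup_{\alpha\ge 0}\alpha\,\phi(-\alpha)\,\psi'(-\alpha)\ln\psi'(-\alpha) \tag{38}\\
&= e\cdot\sup_{\alpha\ge 0}\alpha\,\phi'(-\alpha)\ln\frac{\phi'(-\alpha)}{\phi(-\alpha)}. \tag{39}
\end{align}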

We obtained a lower bound in terms of the MGF and its derivative (or, equivalently, the CGF and its derivative), which is appealing in cases where X is the sum of i.i.d. random variables.
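The bound (39) needs only the MGF and its derivative, so it can be coded once for any φ. The sketch below does this with a grid over α ∈ (0, 5] (the range and resolution are our assumptions) and tests it on a hypothetical discrete X supported on [1, ∞), matching the applications below where X is 1 plus a non-negative term; the function names are ours.

```python
import math

# Hypothetical discrete X supported on [1, inf) (illustrative values).
XS = [1.0, 1.5, 3.0, 10.0]
PS = [0.3, 0.3, 0.3, 0.1]


def mgf(s):
    """phi(s) = E{e^{sX}}."""
    return sum(p * math.exp(s * x) for x, p in zip(XS, PS))


def mgf_prime(s):
    """phi'(s) = E{X e^{sX}}."""
    return sum(p * x * math.exp(s * x) for x, p in zip(XS, PS))


def log_lower_bound(phi, dphi, alphas):
    """Grid approximation of (39): e * sup_alpha alpha * phi'(-alpha) * ln[phi'(-alpha) / phi(-alpha)]."""
    return max(math.e * a * dphi(-a) * math.log(dphi(-a) / phi(-a)) for a in alphas)


alphas = [i / 1000 for i in range(1, 5001)]      # grid over (0, 5]; an arbitrary choice
lower = log_lower_bound(mgf, mgf_prime, alphas)
exact = sum(p * math.log(x) for x, p in zip(XS, PS))
jensen_upper = math.log(sum(p * x for x, p in zip(XS, PS)))
print(lower, exact, jensen_upper)
```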

Accordingly, we now particularize this example further by examining the case where $X = 1+\sum_{i=1}^k Y_i^2$, with $Y_i \sim N(0,\sigma^2)$, i = 1, …, k, being independent random variables. The motivation for assessing an expression of the form $E\{\ln(1+\sum_{i=1}^k Y_i^2)\}$ is twofold. First, it is useful for bounding the ergodic capacity of the single-input, multiple-output (SIMO) channel, where {Y_i} designate random channel transfer coefficients (see, e.g., [22,28,29] and references therein). Second, it is relevant for bounding the joint differential entropy associated with the multivariate Cauchy density. Here, (Y_1, …, Y_k) are not Gaussian as defined above, but their multivariate Cauchy density can be represented as a continuous mixture of i.i.d. zero-mean Gaussian random variables, where the mixture is taken over all possible variances; see [22] (Example 6) for the details. In this case,

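\begin{align}
\phi(s) &= E\left\{\exp\left[s\left(1+\sum_{i=1}^k Y_i^2\right)\right]\right\} \tag{40}\\
&= e^s\left(E\{e^{sY_1^2}\}\right)^k \tag{41}\\
&= \frac{e^s}{(1-2s\sigma^2)^{k/2}}, \qquad s<\frac{1}{2\sigma^2}. \tag{42}
\end{align}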

Thus,

$$\psi(s) = s - \frac{k}{2}\ln(1-2s\sigma^2), \tag{43}$$

and

$$\psi'(s) = 1 + \frac{k\sigma^2}{1-2s\sigma^2}. \tag{44}$$

It follows that

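$$E\left\{\ln\left(1+\sum_{i=1}^k Y_i^2\right)\right\} \ge e\cdot\sup_{\alpha\ge 0}\frac{\alpha e^{-\alpha}}{(1+2\alpha\sigma^2)^{k/2}}\left(1+\frac{k\sigma^2}{1+2\alpha\sigma^2}\right)\ln\left(1+\frac{k\sigma^2}{1+2\alpha\sigma^2}\right). \tag{45}$$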

The Jensen upper bound, ln(1 + kσ²), and the lower bound (45) are displayed in Figure 1 for σ² = 1 and k = 1, 2, …, 100. As can be seen, the bounds are quite close. Interestingly, the choice α = 1/(kσ²) yields results that are very close to those of the optimal α.

Figure 1.


Upper and lower bounds on $E\{\ln(1+\sum_{i=1}^k Y_i^2)\}$, where $Y_i \sim N(0,\sigma^2)$ are i.i.d., for σ² = 1 and k = 1, 2, …, 100. The red curve is the upper bound, ln(1 + kσ²), which is obtained by applying the ordinary Jensen inequality. The blue curve is the lower bound of Equation (45), where the search over α was carried out with a resolution of 0.001.
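The Figure 1 comparison can be approximated with a simple grid search over α. In the sketch below, the resolution 0.001 follows the caption, while the upper end α ≤ 10 is our assumption (the text does not state the search range); the function names are ours.

```python
import math


def simo_lower_bound(k, sigma2=1.0, step=0.001, alpha_max=10.0):
    """Grid approximation of the lower bound (45) on E{ln(1 + sum_i Y_i^2)}, Y_i ~ N(0, sigma2) i.i.d."""
    best = 0.0
    for i in range(1, int(round(alpha_max / step)) + 1):
        a = i * step
        d = 1.0 + 2.0 * a * sigma2
        r = 1.0 + k * sigma2 / d                      # psi'(-alpha), from (44)
        val = math.e * a * math.exp(-a) * d ** (-k / 2) * r * math.log(r)
        best = max(best, val)
    return best


def jensen_upper(k, sigma2=1.0):
    """Ordinary Jensen upper bound ln(1 + k sigma2)."""
    return math.log(1.0 + k * sigma2)


for k in (1, 10, 100):
    print(k, round(simo_lower_bound(k), 4), round(jensen_upper(k), 4))
```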

Another instance of this example is the circularly symmetric complex Gaussian channel whose signal-to-noise ratio (SNR), Z, is a random variable (e.g., due to fading), which is known to both the transmitter and the receiver. The capacity is given by C=E{ln(1+gZ)}, where g is a certain deterministic gain factor and the expectation is with respect to the randomness of Z. For simplicity, let us assume that Z is distributed exponentially, i.e.,

$$p(z) = \begin{cases}\theta e^{-\theta z}, & z\ge 0,\\ 0, & z<0,\end{cases} \tag{46}$$

where the parameter θ > 0 is given. In this case, X = 1 + gZ, and

$$\phi(-\alpha) = \frac{\theta e^{-\alpha}}{\theta+g\alpha} \tag{47}$$

and

$$\psi(-\alpha) = \ln\theta - \ln(\theta+g\alpha) - \alpha, \tag{48}$$

and so,

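$$E\{\ln(1+gZ)\} \ge e\theta\cdot\sup_{\alpha\ge 0}\frac{\alpha e^{-\alpha}}{\theta+g\alpha}\left(1+\frac{g}{\theta+g\alpha}\right)\ln\left(1+\frac{g}{\theta+g\alpha}\right). \tag{49}$$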

In Figure 2, we plot this lower bound as a function of θ for g=5 and compare it to the Jensen upper bound, ln(1+g/θ) (red curve) and to the lower bound of [22] (Sect. 4.1, Example 1). As can be seen, the lower bound proposed here is considerably tighter, especially for small θ.

Figure 2.


Upper and lower bounds on E{ln(1 + gZ)}, where Z is exponentially distributed with parameter θ, as functions of θ, for g = 5. The red curve is the upper bound, ln(1 + g/θ), obtained by applying the ordinary Jensen inequality. The blue curve is the lower bound of Equation (49), where the search over α was carried out with a resolution of 0.001. The green curve is the lower bound of [22] (Example 1).
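For the exponential-SNR channel, the sketch below evaluates the lower bound (49) by a grid search (resolution 0.001 as in the caption; the upper end α ≤ 10 is our assumption) and compares it with a numerical value of E{ln(1 + gZ)} and with the Jensen upper bound ln(1 + g/θ), for g = 5 as in Figure 2. The quadrature settings and function names are ours.

```python
import math


def fading_lower_bound(theta, g=5.0, step=0.001, alpha_max=10.0):
    """Grid approximation of the lower bound (49) on E{ln(1 + gZ)}, Z ~ Exp(theta)."""
    best = 0.0
    for i in range(1, int(round(alpha_max / step)) + 1):
        a = i * step
        r = 1.0 + g / (theta + g * a)                 # psi'(-alpha) for X = 1 + gZ
        val = math.e * theta * a * math.exp(-a) / (theta + g * a) * r * math.log(r)
        best = max(best, val)
    return best


def fading_exact(theta, g=5.0, n=100_000, u_max=60.0):
    """E{ln(1 + gZ)} by the midpoint rule after substituting u = theta * z (density e^{-u})."""
    h = u_max / n
    return h * sum(math.log(1.0 + g * (i + 0.5) * h / theta) * math.exp(-(i + 0.5) * h) for i in range(n))


for theta in (0.5, 1.0, 4.0):
    print(theta, fading_lower_bound(theta), fading_exact(theta), math.log(1 + 5.0 / theta))
```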

Example 6. 

Yet another example of this family of Jensen-like inequalities is a lower bound on $E\{X^t\}$, where t is an arbitrary real. For a given t, let s ≥ 0 be either larger than 1 − t or smaller than −t, and consider the case where f(x) = −s ln x, $g(x) = x^{t+s}$ and $h(x) = e^x$, so that f and g are convex. Then,

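\begin{align}
E\{X^t\} &= E\{e^{-s\ln X}\,X^{t+s}\} \tag{50}\\
&\ge E\left\{\exp\left[-s\ln a - \frac{s}{a}(X-a)\right]\cdot\left[b^{t+s}+(t+s)b^{t+s-1}(X-b)\right]\right\} \tag{51}\\
&= e^{s[1-\ln a]}\,\phi\left(-\frac{s}{a}\right)\left[b^{t+s}+(t+s)b^{t+s-1}\left(\psi'\left(-\frac{s}{a}\right)-b\right)\right]. \tag{52}
\end{align}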

Choosing b = ψ′(−s/a), and changing the optimization variable from a to α = 1/a, we obtain

$$E\{X^t\} \ge \sup_{\alpha\ge 0}(\alpha e)^s\,\phi(-\alpha s)\,[\psi'(-\alpha s)]^{t+s}. \tag{53}$$

More specifically, if $X = \sum_{i=1}^n Y_i$, where {Y_i} are i.i.d. Bernoulli random variables with parameter p, then $\phi(s) = (pe^s+q)^n$, where q = 1 − p. We then obtain

$$E\{X^t\} \ge \sup_{\alpha\ge 0}(\alpha e)^s\,(pe^{-\alpha s}+q)^n\left(\frac{npe^{-\alpha s}}{pe^{-\alpha s}+q}\right)^{t+s}. \tag{54}$$

Selecting α=1/(np), we obtain

$$E\{X^t\} \ge (np)^t\cdot\frac{e^s\,(pe^{-s/(np)}+q)^n}{e^{s(t+s)/(np)}\,(pe^{-s/(np)}+q)^{t+s}}. \tag{55}$$

The first factor is $(E\{X\})^t$. The second factor tends to unity as n grows, because $pe^{-s/(np)}+q \approx p[1-s/(np)]+q = 1-s/n$, and so $(pe^{-s/(np)}+q)^n \approx (1-s/n)^n \to e^{-s}$, while the denominator tends to unity. For t ≥ 1 or t ≤ 0, the function $x^t$ is convex, and so $(E\{X\})^t$ is the ordinary Jensen lower bound. In this case, the bound is valuable if the multiplicative factor,

$$\frac{e^s\,(pe^{-s/(np)}+q)^n}{e^{s(t+s)/(np)}\,(pe^{-s/(np)}+q)^{t+s}},$$

is larger than unity. If 0 < t < 1, the function $x^t$ is concave, and then $(E\{X\})^t$ is an upper bound. Of course, the parameter s can be optimized, too. Some numerical results for t = 0.5 are depicted in Figure 3. As can be seen, the upper and the lower bounds are fairly close.

Figure 3.


Upper and lower bounds on $E\{(\sum_{i=1}^n Y_i)^{1/2}\}$ as functions of n, where {Y_i} are i.i.d. Bernoulli(0.2). The red curve is the Jensen upper bound, $\sqrt{np}$, and the blue curve is the proposed lower bound, where α is optimized in the range [0, 10] and s is optimized in the range [0.5, 10], both with a resolution of 0.01.
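The Figure 3 setting (t = 0.5, p = 0.2) can be reproduced roughly as follows. The exact moment is computed from the binomial pmf; the bound (54) is maximized over a grid that is coarser than the one in the caption (α step 0.02, s step 0.05), an assumption made only to keep the sketch fast. The function names are ours.

```python
import math


def exact_moment(n, p, t):
    """E{X^t} for X ~ Binomial(n, p), computed from the pmf (0**t = 0 for t > 0)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) * k**t for k in range(n + 1))


def bound_54(n, p, t, alpha, s):
    """Objective of (54) for fixed (alpha, s); each admissible pair gives a valid lower bound."""
    q = 1.0 - p
    e = math.exp(-alpha * s)
    base = p * e + q
    return (alpha * math.e) ** s * base**n * (n * p * e / base) ** (t + s)


def lower_bound(n, p, t=0.5):
    """Coarse grid search over alpha in (0, 10] and s in [1 - t, 10] (s >= 1 - t keeps x^(t+s) convex)."""
    alphas = [0.02 * i for i in range(1, 501)]
    ss = [1.0 - t + 0.05 * j for j in range(0, 191)]
    return max(bound_54(n, p, t, a, s) for a in alphas for s in ss)


for n in (5, 50):
    print(n, lower_bound(n, 0.2), exact_moment(n, 0.2, 0.5), math.sqrt(n * 0.2))
```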

Another application of this example is related to estimation theory. Let θ ∈ ℝ and let Y_1, …, Y_n be i.i.d. Gaussian random variables with mean θ and variance σ². Consider the t-th moment of the estimation error, $E_\theta\{|\frac{1}{n}\sum_{i=1}^n Y_i-\theta|^t\}$. Defining $X = (\frac{1}{n}\sum_{i=1}^n Y_i-\theta)^2$, we have

$$\phi(s) = \frac{1}{\sqrt{1-2s\sigma^2/n}}; \qquad \psi(s) = -\frac{1}{2}\ln\left(1-\frac{2s\sigma^2}{n}\right), \tag{56}$$

and so,

$$\phi(-\alpha s) = \frac{1}{\sqrt{1+2\alpha s\sigma^2/n}}; \qquad \psi'(-\alpha s) = \frac{\sigma^2/n}{1+2\alpha s\sigma^2/n}. \tag{57}$$
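Hence, applying (53) with t/2 in place of t,
\begin{align}
E_\theta\left\{\left|\frac{1}{n}\sum_{i=1}^n Y_i-\theta\right|^t\right\} = E_\theta\{X^{t/2}\} &\ge \frac{(\alpha e)^s}{\sqrt{1+2\alpha s\sigma^2/n}}\left(\frac{\sigma^2/n}{1+2\alpha s\sigma^2/n}\right)^{t/2+s} \tag{58}\\
&= \left(\frac{\sigma^2}{n}\right)^{t/2+s}\cdot\frac{(\alpha e)^s}{(1+2\alpha s\sigma^2/n)^{(t+1)/2+s}}, \tag{59}
\end{align}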

with either s > 1 − t/2 or s < −t/2. For α = ζn/σ² (ζ > 0 being a constant), we have:

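$$E_\theta\left\{\left|\frac{1}{n}\sum_{i=1}^n Y_i-\theta\right|^t\right\} \ge \frac{\sigma^t}{n^{t/2}}\cdot\sup_{\zeta>0,\,s>1-t/2}\frac{(\zeta e)^s}{(1+2\zeta s)^{(t+1)/2+s}}, \tag{60}$$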

where, for t ∈ [0, 2], the first factor, $\sigma^t/n^{t/2}$, is the Jensen upper bound. The second factor,

$$\mu_t = \sup_{\zeta>0,\,s>1-t/2}\frac{(\zeta e)^s}{(1+2\zeta s)^{(t+1)/2+s}}, \tag{61}$$

is the multiplicative gap between the Jensen upper bound and the proposed lower bound. In Figure 4, we display this factor. The result μ₂ = 1 is expected, because for t = 2 and s = 0, the calculation is trivially exact. Note that the maximization over ζ, for a given s, can be carried out in closed form by equating to zero the partial derivative of $\ln[(\zeta e)^s/(1+2\zeta s)^{(t+1)/2+s}]$ with respect to ζ. The optimal ζ turns out to be equal to 1/(t + 1) (independently of s), and so,

$$\mu_t = \sup_{s>1-t/2}\left(\frac{t+1}{t+2s+1}\right)^{(t+1)/2}\cdot\left(\frac{e}{t+2s+1}\right)^s. \tag{62}$$

Figure 4.


The gap factor, μ_t, as a function of t. The parameter s is optimized in the range [1 − t/2, 10] with a resolution of 0.001.
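Under the Gaussian model, the exact normalized moment $E_\theta\{|\frac{1}{n}\sum_i Y_i-\theta|^t\}/(\sigma^t/n^{t/2})$ equals $E|Z|^t$ for a standard normal Z, so μ_t should never exceed it. The sketch below computes μ_t from (62) by a grid search over s (mirroring the Figure 4 caption), checks the closed-form maximizer ζ* = 1/(t + 1) with a separate grid search, and compares μ_t with the standard formula $E|Z|^t = 2^{t/2}\Gamma((t+1)/2)/\sqrt{\pi}$. The function names are ours.

```python
import math


def mu_t(t, s_max=10.0, step=0.001):
    """Grid approximation of (62), with s ranging over (max(1 - t/2, 0), s_max]."""
    s0 = max(1.0 - t / 2.0, 0.0)
    best = 0.0
    for i in range(1, int(round((s_max - s0) / step)) + 1):
        s = s0 + i * step
        m = t + 2.0 * s + 1.0
        best = max(best, ((t + 1.0) / m) ** ((t + 1.0) / 2.0) * (math.e / m) ** s)
    return best


def best_zeta(t, s, step=0.0005, z_max=5.0):
    """Grid maximizer over zeta of ln[(zeta e)^s / (1 + 2 zeta s)^((t+1)/2 + s)]."""
    grid = [step * i for i in range(1, int(round(z_max / step)) + 1)]
    return max(grid, key=lambda z: s * math.log(z * math.e) - ((t + 1) / 2 + s) * math.log(1 + 2 * z * s))


def gaussian_abs_moment(t):
    """E|Z|^t for Z ~ N(0, 1): 2^(t/2) Gamma((t+1)/2) / sqrt(pi)."""
    return 2 ** (t / 2) * math.gamma((t + 1) / 2) / math.sqrt(math.pi)


for t in (0.5, 1.0, 2.0):
    print(t, round(mu_t(t), 4), round(gaussian_abs_moment(t), 4))
```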

Finally, it should be pointed out that this family of Jensen-like bounds also opens the door to lower bounds on expressions of the form E{f(X)/g(X)}, where f is non-negative and convex and g is positive and concave. Using the identity $1/s = \int_0^\infty e^{-st}\,dt$, valid for s > 0, we have:

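\begin{align}
E\left\{\frac{f(X)}{g(X)}\right\} &= E\left\{f(X)\cdot\int_0^\infty e^{-tg(X)}\,dt\right\} \tag{63}\\
&= \int_0^\infty E\{e^{-tg(X)}f(X)\}\,dt, \tag{64}
\end{align}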

and we can apply the same ideas as before to the integrand (note that −tg(x) is convex in x for every t ≥ 0), with the freedom to optimize the bound parameters separately for each value of t.

5. A Product of Two Non-Negative Convex Functions

The last family of Jensen-like bounds that we present in this work is associated with the product of two non-negative convex functions. Let f and g both be non-negative convex functions of x ≥ 0. Then,

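\begin{align}
E\{f(X)g(X)\} &\ge E\{[f(a)+f'(a)(X-a)]\cdot g(X)\} \tag{65}\\
&= [f(a)-af'(a)]\,E\{g(X)\}+f'(a)\,E\{Xg(X)\} \tag{66}\\
&\ge [f(a)-af'(a)]\,E\{g(b)+g'(b)(X-b)\}+f'(a)\,E\{X[g(c)+g'(c)(X-c)]\} \tag{67}\\
&= [f(a)-af'(a)]\cdot[g(b)-bg'(b)+g'(b)E\{X\}] + f'(a)\,[(g(c)-cg'(c))E\{X\}+g'(c)E\{X^2\}], \tag{68}
\end{align}
where the inequality (67) holds whenever f(a) − af′(a) ≥ 0 and f′(a) ≥ 0 (recall that X ≥ 0).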

The optimal b and c are b = E{X} and c = E{X²}/E{X}, respectively. Thus,

$$E\{f(X)g(X)\} \ge [f(a)-af'(a)]\cdot g(E\{X\}) + f'(a)\,E\{X\}\cdot g\left(\frac{E\{X^2\}}{E\{X\}}\right). \tag{69}$$

Let

$$a^* = E\{X\}\cdot\frac{g(E\{X^2\}/E\{X\})}{g(E\{X\})} \tag{70}$$

and assume that f(a*) − a*f′(a*) ≥ 0 and f′(a*) ≥ 0. Then, a* is the optimal value of a, which yields

$$E\{f(X)g(X)\} \ge f\left(E\{X\}\cdot\frac{g(E\{X^2\}/E\{X\})}{g(E\{X\})}\right)\cdot g(E\{X\}). \tag{71}$$

More generally, when X and Y are two random variables with a joint distribution, the above derivation easily extends to

$$E\{f(X)g(Y)\} \ge f\left(E\{X\}\cdot\frac{g(E\{XY\}/E\{X\})}{g(E\{Y\})}\right)\cdot g(E\{Y\}). \tag{72}$$

If f and g are both concave, rather than convex, then the inequalities are reversed.

Example 7. 

Consider again the example of the capacity of the AWGN channel with a random SNR, c(Z) = ln(1 + gZ), and suppose that we wish to bound the variance of c(Z) in order to assess its fluctuations (e.g., for the purpose of bounding the outage probability). Then, obviously,

$$\mathrm{Var}\{c(Z)\} = E\{c^2(Z)\}-[E\{c(Z)\}]^2 = E\{\ln^2(1+gZ)\}-[E\{\ln(1+gZ)\}]^2. \tag{73}$$

To upper bound Var{c(Z)}, we may derive an upper bound on E{ln²(1 + gZ)} and a lower bound on E{ln(1 + gZ)}. For the latter, a lower bound was already proposed in Example 5. For the former, we may use the present inequality in its reversed form for concave functions, with the choice f(z) = g(z) = ln(1 + gz) (the function g here should not be confused with the gain g), which can easily be shown to satisfy the requirements. We then obtain the following upper bound, which depends merely on the first two moments of Z:

$$E\{\ln^2(1+gZ)\} \le \ln(1+gE\{Z\})\cdot\ln\left(1+gE\{Z\}\cdot\frac{\ln(1+gE\{Z^2\}/E\{Z\})}{\ln(1+gE\{Z\})}\right). \tag{74}$$

Interestingly, the function ln²(1 + gx) is neither convex nor concave, yet our approach offers an upper bound, which is fairly easy to calculate provided that one can compute the first two moments of Z.
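Assuming, as in Example 5, that Z is exponential with parameter θ (so that E{Z} = 1/θ and E{Z²} = 2/θ²), the sketch below compares the two-moment upper bound (74) with a numerical value of E{ln²(1 + gZ)}, using the gain g = 5 from Figure 2. The quadrature settings and function names are ours.

```python
import math


def expect_exponential(h, theta, n=100_000, u_max=60.0):
    """E{h(Z)} for Z ~ Exp(theta), by the midpoint rule after substituting u = theta * z."""
    step = u_max / n
    return step * sum(h((i + 0.5) * step / theta) * math.exp(-(i + 0.5) * step) for i in range(n))


def upper_74(gain, m1, m2):
    """Right-hand side of (74), given E{Z} = m1 and E{Z^2} = m2."""
    c1 = math.log(1.0 + gain * m1)
    return c1 * math.log(1.0 + gain * m1 * math.log(1.0 + gain * m2 / m1) / c1)


gain = 5.0
for theta in (0.5, 1.0, 2.0):
    m1, m2 = 1.0 / theta, 2.0 / theta**2          # first two moments of Exp(theta)
    second = expect_exponential(lambda z: math.log(1.0 + gain * z) ** 2, theta)
    print(theta, round(second, 4), round(upper_74(gain, m1, m2), 4))
```

Combining (74) with the lower bound (49) on E{ln(1 + gZ)} then gives the variance bound suggested by (73).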

6. Conclusions

In this work, we have revisited the Jensen inequality by taking advantage of the freedom to optimize the choice of the supporting line that is tangential to the given convex function. This optimal choice might be different from the ordinary one when the convex function does not stand alone but is only part of a more complicated expression. This more complicated expression can sometimes be created artificially, as in Examples 2, 5 and 6. The resulting bounds depend either on the first two moments of the independent variable, X, or on its MGF and its derivative. Both types of quantities often lend themselves to relatively easy calculations. The proposed methodology can be used both for improving on the ordinary Jensen inequality (as in Examples 2 and 4) and for generating lower bounds on expectations of non-convex or even concave (rather than convex) functions (as in Examples 1, 2, 5 and 7). Several families of Jensen-like inequalities have been derived, along with numerical examples with applications to information theory, which also demonstrate the tightness of the inequalities obtained.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

Funding Statement

This research received no external funding.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

  • 1.Cover T.M., Thomas J.A. Elements of Information Theory. 2nd ed. John Wiley & Sons; Hoboken, NJ, USA: 2006. [Google Scholar]
  • 2.Xiao L., Lu G. A new refinement of Jensen’s inequality with applications in information theory. Open Math. 2020;18:1748–1759. doi: 10.1515/math-2020-0123. [DOI] [Google Scholar]
  • 3.Deng Y., Ullah H., Khan M.A., Iqbal S., Wu S. Refinements of Jensen’s inequality via majorization results with applications in information theory. Hindawi J. Math. 2021;2021:1951799. doi: 10.1155/2021/1951799. [DOI] [Google Scholar]
  • 4.Wu S., Khan M.A., Saeed T., Sayed Z.M.M.M. A refined Jensen inequality connected to an arbitrary positive finite sequence. Mathematics. 2022;10:4817. doi: 10.3390/math10244817. [DOI] [Google Scholar]
  • 5.Sayyari Y., Barsam H., Sattarzadeh A.R. On new refinement of the Jensen inequality using uniformly convex functions with applications. Appl. Anal. 2023 doi: 10.1080/00036811.2023.2171873. [DOI] [Google Scholar]
  • 6.Jaafari E., Asgari M.S., Hosseini M.S., Moosavi B. On the Jensen’s inequality and its variants. AIMS Math. 2020;5:1177–1185. doi: 10.3934/math.2020081. [DOI] [Google Scholar]
  • 7.Matković A., Pečarić J. A variant of Jensen’s inequality for convex functions of several variables. J. Math. 2007;1:45–51. doi: 10.7153/jmi-01-06. [DOI] [Google Scholar]
8. Bakula M.K., Matković A., Pečarić J. On a variant of Jensen’s inequality for functions of nondecreasing increments. J. Korean Math. Soc. 2008;45:821–834. doi: 10.4134/JKMS.2008.45.3.821.
9. Seuret A., Gouaisbaut F. Reducing the Gap of Jensen’s Inequality by Using Wirtinger Inequality. Preprint Submitted to Automatica. Jul 12, 2012. Available online: https://www.semanticscholar.org/paper/Reducing-the-gap-of-Jensen’s-inequality-by-using-Seuret-Gouaisbaut/1751273aa96d157ee143e3e7212fa04e1798ef11 (accessed on 23 March 2023).
10. Walker S.G. On a lower bound for the Jensen inequality. SIAM J. Math. Anal. 2014;46:3151–3157. doi: 10.1137/140954015.
11. Liao J.G., Berg A. Sharpening Jensen’s inequality. Am. Stat. 2019;73:278–281. doi: 10.1080/00031305.2017.1419145.
12. Simić S., Almohsen B. Some generalizations of Jensen’s inequality. Contemp. Math. 2021;2:1–14. doi: 10.37256/cm.212021686.
13. Jebara T., Pentland A. On reversing Jensen’s inequality; Proceedings of the 13th International Conference on Neural Information Processing Systems (NIPS 2000); Denver, CO, USA. 1 January 2000; pp. 213–219.
14. Budimir I., Dragomir S.S., Pečarić J. Further reverse results for Jensen’s discrete inequality and applications in information theory. J. Inequalities Pure Appl. Math. 2001;2:1–14.
15. Simić S. On an upper bound for Jensen’s inequality. J. Inequalities Pure Appl. Math. 2009;10:60.
16. Simić S. On a new converse of Jensen’s inequality. Publ. L’inst. Math. 2009;85:107–110. doi: 10.2298/PIM0999107S.
17. Dragomir S.S. Some reverses of the Jensen inequality with applications. Bull. Aust. Math. Soc. 2013;87:177–194. doi: 10.1017/S0004972712001098.
18. Dragomir S.S. Some reverses of the Jensen inequality for functions of selfadjoint operators in Hilbert spaces. J. Inequalities Appl. 2010;2010:496821. doi: 10.1155/2010/496821.
19. Khan S., Khan M.A., Chu Y.-M. New converses of Jensen inequality via Green functions with applications. Rev. Real Acad. Cienc. Exactas Fis. Nat. Ser. A Mat. 2020;114:1–14. doi: 10.1007/s13398-020-00843-1.
20. Khan S., Khan M.A., Chu Y.-M. Converses of Jensen inequality derived from the Green functions with applications in information theory. Math. Methods Appl. Sci. 2020;43:2577–2587. doi: 10.1002/mma.6066.
21. Wunder G., Groß B., Fritschek R., Schaefer R.F. A reverse Jensen inequality result with application to mutual information estimation; Proceedings of the 2021 IEEE Information Theory Workshop (ITW 2021); Kanazawa, Japan. 17–21 October 2021. Available online: https://arxiv.org/pdf/2111.06676.pdf (accessed on 25 March 2023).
22. Merhav N. Reversing Jensen’s inequality for information-theoretic analyses. Information. 2022;13:39. doi: 10.3390/info13010039.
23. Ali M.A., Budak H., Zhang Z. A new extension of quantum Simpson’s and quantum Newton’s inequalities for quantum differentiable convex functions. Math. Methods Appl. Sci. 2022;45:1845–1863. doi: 10.1002/mma.7889.
24. Budak H., Ali M.A., Tarhanaci M. Some new quantum Hermite-Hadamard like inequalities for co-ordinated convex functions. J. Optim. Theory Appl. 2020;186:899–910. doi: 10.1007/s10957-020-01726-6.
25. Boyd S., Vandenberghe L. Convex Optimization. Cambridge University Press; Cambridge, UK: 2004.
26. Krichevsky R.E., Trofimov V.K. The performance of universal encoding. IEEE Trans. Inform. Theory. 1981;27:199–207. doi: 10.1109/TIT.1981.1056331.
27. Merhav N., Cohen A. Universal randomized guessing with application to asynchronous decentralized brute-force attacks. IEEE Trans. Inform. Theory. 2020;66:114–129. doi: 10.1109/TIT.2019.2920538.
28. Dong A., Zhang H., Wu D., Yuan D. Logarithmic expectation of the sum of exponential random variables for wireless communication performance evaluation; Proceedings of the 2015 IEEE 82nd Vehicular Technology Conference (VTC2015-Fall); Boston, MA, USA. 6–9 September 2015.
29. Tse D., Viswanath P. Fundamentals of Wireless Communication. Cambridge University Press; Cambridge, UK: 2005.

Data Availability Statement

Not applicable.
