Proceedings of the National Academy of Sciences of the United States of America. 2007 Sep 19;104(39):15254–15258. doi: 10.1073/pnas.0706451104

Shannon's monotonicity problem for free and classical entropy

Dimitri Shlyakhtenko, Hanne Schultz
PMCID: PMC2000542  PMID: 17881587

Abstract

We give a short unified proof of the following theorem, valid in the context of both classical probability theory and Voiculescu's free probability theory: let $(X_j^{(1)}, \dots, X_j^{(n)})$ be independent (resp., freely independent) n-tuples of random variables. Let $Z_N^{(p)} = N^{-1/2}(X_1^{(p)} + \dots + X_N^{(p)})$ be their central limit sums. Then the entropy (resp., free entropy) of the n-tuple $(Z_N^{(1)}, \dots, Z_N^{(n)})$ is a monotone function of N. The classical case (for n = 1) is a celebrated result of Artstein, Ball, Barthe, and Naor, and our proof is an adaptation and simplification of their argument.

Keywords: probability theory, free probability, information theory


Voiculescu's free probability theory (1) is an amazing noncommutative parallel to classical probability theory. Classical probability theory deals with random variables $X_1, X_2, \dots$, which are commutative variables: $X_iX_j = X_jX_i$. Associated to any polynomial P in the variables $X_1, X_2, \dots$ one has its expected value E(P). A typical model for random variables is given by functions $X_j \in L^\infty(\mathfrak{X}, \mu)$ on a measure space $\mathfrak{X}$ endowed with a positive measure μ satisfying $\mu(\mathfrak{X}) = 1$. Then $E(P(X_1, X_2, \dots)) = \int P(X_1(\sigma), X_2(\sigma), \dots)\,d\mu(\sigma)$. Two groups of variables $X_1, X_2, \dots$ and $Y_1, Y_2, \dots$ are independent if E(PQ) = E(P)E(Q) whenever P is a polynomial depending only on $X_1, X_2, \dots$ and Q is a polynomial depending only on $Y_1, Y_2, \dots$.

In the noncommutative setting, one again has random variables, only now they are not assumed to commute; yet one still can associate an expected value τ(P) to any (noncommutative) polynomial P in $X_1, X_2, \dots$. A prototypical example is that of bounded operators $X_j$ on a Hilbert space H, with expectation functional given by $\tau(P) = \langle P(X_1, X_2, \dots)\xi, \xi\rangle$ for some fixed ξ ∈ H of unit norm. An amazing discovery made by Voiculescu (2) is that, in addition to a straightforward extension of the notion of independence, the noncommutative setting allows for a new form of independence, called free independence (see ref. 1 for an introduction). Free independence and free probability theory play an important role in the description of large N asymptotics of random multimatrix models.

Free probability thus provides a noncommutative parallel to classical probability theory; strangely, many statements and notions from classical probability have their free analogs (though the correspondence is far from being straightforward: Some classical statements fail to have free analogs, while others have stronger free analogs than one would expect). An example of this correspondence is the theory of free entropy and free information (3), which in many respects parallels the classical theory pioneered by Shannon.

This article is devoted to giving a proof of monotonicity of free and classical entropy computed on central limit sums of n-tuples of random variables. The classical statement was conjectured by Shannon in the 50s and proved (at least in the n = 1 case) by Artstein, Ball, Barthe, and Naor (4). In trying to adapt their proof to the free context, we discovered a simplification which at the same time allowed us to give a unified proof for the free and classical statement (5). In the present paper, we extend our proof to the case of n variables. Appendix 1, by Hanne Schultz, characterizes strict monotonicity.

Background

The Classical Case.

Let $X_1, \dots, X_n, \dots$ be independent identically distributed (iid) random variables, and assume that each $X_j$ is centered [i.e., its expectation $E(X_j)$ is zero] and that the variance $E(X_j^2) = \sigma^2$ is fixed. The classical central limit theorem then states that the central limit sums

$$Z_N = \frac{X_1 + \dots + X_N}{\sqrt{N}}$$

converge in law to a Gaussian variable G with variance $\sigma^2$, $d\mu_G = \frac{1}{\sqrt{2\pi}\,\sigma}\exp(-x^2/(2\sigma^2))\,dx$. Recall that if the law of a random variable X has density $d\mu_X(x) = p(x)\,dx$, then the entropy of X is defined to be

$$H(X) = \int p(x)\log p(x)\,dx.$$

If the law of X is not Lebesgue absolutely continuous, the entropy is defined to be +∞. The entropy H(X) is a measure of how far the law of X is from that of a Gaussian variable. In fact, if E(X) = 0 and $E(X^2) = \sigma^2$, then H(X) ≥ H(G), with equality iff $\mu_X = \mu_G$. One might thus expect that $H(Z_N)$ approaches H(G) as N → ∞ (6). This is false in general (e.g., the measure associated to $Z_N$ could be atomic for all N). The following conjecture, going back to Shannon (see refs. 7–9), was proved by Artstein, Ball, Barthe, and Naor (4):

Theorem.

(See ref. 4.) The function $N \mapsto H(Z_N)$ is monotone nonincreasing in N.
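A small numerical illustration of this statement may help. The sketch below is not part of the original argument: it assumes the $X_j$ are uniform on an interval of length 1 (so the sum of n of them has the explicit Irwin–Hall density), uses the sign convention $H = \int p\log p$ under which H(X) ≥ H(G), and approximates the integral by a midpoint rule. The function names are hypothetical.

```python
import math

def irwin_hall_pdf(x, n):
    """Density at x of the sum of n independent Uniform(0, 1) variables."""
    if x <= 0.0 or x >= n:
        return 0.0
    x = min(x, n - x)  # the density is symmetric about n/2; this limits cancellation
    total = sum((-1) ** k * math.comb(n, k) * (x - k) ** (n - 1)
                for k in range(int(math.floor(x)) + 1))
    return total / math.factorial(n - 1)

def entropy_of_clt_sum(n, cells_per_unit=2000):
    """H(Z_n) = integral of p log p for Z_n = (X_1 + ... + X_n - n/2) / sqrt(n).

    Rescaling by sqrt(n) adds (1/2) log n to the integral of f log f, where f is
    the density of the unnormalized sum; centering does not change the entropy.
    """
    h = 1.0 / cells_per_unit
    integral = 0.0
    for i in range(n * cells_per_unit):
        p = irwin_hall_pdf((i + 0.5) * h, n)  # midpoint of the i-th cell
        if p > 0.0:
            integral += p * math.log(p) * h
    return integral + 0.5 * math.log(n)

# Value of H for the Gaussian with the same variance 1/12 (the limiting lower bound).
gaussian_entropy = -0.5 * math.log(2.0 * math.pi * math.e / 12.0)

for n in range(1, 5):
    print(n, entropy_of_clt_sum(n), gaussian_entropy)
```

Under these assumptions the printed values decrease with n while staying above the Gaussian value, in line with the theorem.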

In the case of several variables $X_1, \dots, X_n$, their joint entropy can be expressed in terms of the joint law $p(x_1, \dots, x_n)\,dx_1\cdots dx_n$ by the formula

$$H(X_1, \dots, X_n) = \int p(x_1, \dots, x_n)\log p(x_1, \dots, x_n)\,dx_1\cdots dx_n.$$

We prove the following generalization of the main result of ref. 4 for n-tuples:

Theorem 1.

Let $(X_j^{(1)}, \dots, X_j^{(p)})$ be a sequence of p-tuples of random variables, so that $\{(X_j^{(1)}, \dots, X_j^{(p)}) : j = 1, 2, \dots\}$ are independent and identically distributed and have finite second moments. Let $Z_N^{(k)} = (X_1^{(k)} + \dots + X_N^{(k)})/\sqrt{N}$. Then the function $N \mapsto H(Z_N^{(1)}, \dots, Z_N^{(p)})$ is monotone nonincreasing.

Note that $(Z_N^{(1)}, \dots, Z_N^{(p)}) \to (G^{(1)}, \dots, G^{(p)})$ in law, where $G^{(j)}$ are correlated Gaussians determined by $E(G^{(i)}G^{(j)}) = E(X_1^{(i)}X_1^{(j)})$.

A version of this theorem also holds for conditionally independent n-tuples (see Relative Entropy before Appendix 1).

The Free Case.

We now describe the case of freely independent random variables (see ref. 1 for basic definitions).

Let $X_1, \dots, X_n, \dots \in (M, \tau)$ be freely independent identically distributed random variables, so that each $X_j$ is centered and $\tau(X_j^2) = \sigma^2$. Consider their central limit sums $Z_N = (X_1 + \dots + X_N)/\sqrt{N}$. According to Voiculescu's free central limit theorem (1, 2), $Z_N \to \sigma S$ in law, where S is a random variable with the semicircular law of variance 1, $d\mu_S(x) = \frac{1}{2\pi}\sqrt{4 - x^2}\,dx$ on [−2, 2].

For a single random variable X with law μX its free entropy was defined by Voiculescu (3, 10) to be

$$\chi(X) = \iint \log|s - t|\,d\mu_X(s)\,d\mu_X(t) + \frac{3}{4} + \frac{1}{2}\log 2\pi. \tag{1}$$

This quantity plays the role of the classical entropy H in free probability theory. An important difference, though, is its sign: χ(X) ∈ [−∞,∞).

For any random variable X with τ(X) = 0 and variance $\tau(X^2) = \sigma^2$, the free entropy of X satisfies χ(X) ≤ χ(σS), with equality if and only if X has the same law as σS.
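As a quick sanity check of this inequality, one can compare the uniform law with the semicircular law of the same variance. The computation below is an illustration only: it assumes Voiculescu's normalization $\chi(X) = \iint \log|s-t|\,d\mu_X(s)\,d\mu_X(t) + \frac34 + \frac12\log 2\pi$ for a single variable, takes σ = 1, and the uniform variable U is introduced here purely as an example.

```latex
% U uniform on [-\sqrt{3}, \sqrt{3}]: variance 1, interval length L = 2\sqrt{3}
\iint \log|s-t| \, d\mu_U(s)\, d\mu_U(t)
  = \log L + 2\int_0^1 (1-u)\log u \, du
  = \log(2\sqrt{3}) - \tfrac{3}{2},
\qquad
\chi(U) = \log(2\sqrt{3}) - \tfrac{3}{4} + \tfrac{1}{2}\log 2\pi \approx 1.4114 .

% S semicircular of variance 1: the logarithmic energy equals -1/4
\chi(S) = -\tfrac{1}{4} + \tfrac{3}{4} + \tfrac{1}{2}\log 2\pi
        = \tfrac{1}{2}\log(2\pi e) \approx 1.4189 > \chi(U).
```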

Theorem.

(See ref. 5.) Let $X_j$ be freely independent, identically distributed random variables, and let $Z_N = (X_1 + \dots + X_N)/\sqrt{N}$. Then the function $N \mapsto \chi(Z_N)$ is monotone nondecreasing.

In the case of several variables X1, …, Xn, Voiculescu introduced two definitions of free entropy, denoted χ(X1, …, Xn) and χ*(X1, …, Xn). It is not known whether these quantities are ever different; they are the same when n = 1. For n > 1, our proof works only for the free entropy χ* (see below for its definition).

Theorem 2.

Let $(X_j^{(1)}, \dots, X_j^{(p)})$ be a sequence of p-tuples of random variables, so that $\{(X_j^{(1)}, \dots, X_j^{(p)}) : j = 1, 2, \dots\}$ are freely independent and identically distributed and have finite second moments. Let $Z_N^{(k)} = (X_1^{(k)} + \dots + X_N^{(k)})/\sqrt{N}$. Then the function $N \mapsto \chi^*(Z_N^{(1)}, \dots, Z_N^{(p)})$ is monotone nondecreasing.

Preliminaries on Entropy

Let as before X1, …, Xn be some (perhaps noncommutative) random variables in an operator algebra A equipped with a positive tracial linear functional τ : A → ℂ [thus we write τ(X) for the expected value E(X) if X is a classical random variable]. We recall the definitions of the score function, Fisher information and entropy in the classical case (see, e.g., ref. 11 and references therein) and the free case (see refs. 3 and 12).

The Classical Case.

Consider the derivations $d_j : A \to L^2(A, \tau)$ given by $d_jX_k = 0$ for $j \neq k$ and $d_jX_j = 1$. If 1 is in the domain of $d_j^*$, one defines the score function by

$$f_j = f(X_j : X_1, \dots, \hat{X}_j, \dots, X_n) = d_j^*(1).$$

In other words, for any function $g \in L^2(A, \tau)$, one has $\langle g, f_j\rangle_{L^2(A,\tau)} = \langle d_j g, 1\rangle_{L^2(A,\tau)}$. That is, $\int f_j\,g\,d\mu(x_1, \dots, x_n) = \int \partial g/\partial x_j\,d\mu(x_1, \dots, x_n)$, where μ is the joint law of $X_1, \dots, X_n$. It is not hard to see that this exists iff μ is Lebesgue absolutely continuous; and if $p(x_1, \dots, x_n)$ is the density of μ, then 1 is in the domain of $d_j^*$ iff the following expression for $f_j$ is in $L^2(\mu)$:

$$f_j(x_1, \dots, x_n) = -\frac{1}{p(x_1, \dots, x_n)}\,\frac{\partial p}{\partial x_j}(x_1, \dots, x_n).$$

The Fisher information $F(X_1, \dots, X_n)$ is then defined by the equation $F(X_1, \dots, X_n) = \sum_j \|f_j\|^2_{L^2(A,\tau)}$. It turns out that the entropy of $X_1, \dots, X_n$ is up to a universal constant the expression

$$\frac{1}{2}\int_0^\infty \left(F(X_1^t, \dots, X_n^t) - \frac{n}{1+t}\right)dt,$$

where $X_j^t = X_j + \sqrt{t}\,G_j$ and $G_1, \dots, G_n$ are iid centered Gaussian random variables of variance 1, independent from $X_1, \dots, X_n$.
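To make the score function concrete, the following sketch checks the defining relation $E[f(X)g(X)] = E[g'(X)]$ numerically for a single centered Gaussian, whose score function is $f(x) = -p'(x)/p(x) = x/\sigma^2$ and whose Fisher information is $1/\sigma^2$. The choice σ = 1.5, the test function $g(x) = x^3$ and the midpoint-rule helper are assumptions made for illustration.

```python
import math

def gaussian_expectation(func, sigma, half_width=12.0, cells=20000):
    """Midpoint-rule approximation of E[func(X)] for X ~ N(0, sigma^2)."""
    a = -half_width * sigma
    h = 2.0 * half_width * sigma / cells
    total = 0.0
    for i in range(cells):
        x = a + (i + 0.5) * h
        density = math.exp(-x * x / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)
        total += func(x) * density * h
    return total

sigma = 1.5

def score(x):
    """Score function -p'(x)/p(x) of the N(0, sigma^2) density."""
    return x / sigma ** 2

def g(x):
    """Arbitrary smooth test function (an illustrative choice)."""
    return x ** 3

def g_prime(x):
    return 3.0 * x ** 2

lhs = gaussian_expectation(lambda x: score(x) * g(x), sigma)    # E[f(X) g(X)]
rhs = gaussian_expectation(g_prime, sigma)                       # E[g'(X)]
fisher = gaussian_expectation(lambda x: score(x) ** 2, sigma)    # squared L2 norm of f

print(lhs, rhs, fisher, 1.0 / sigma ** 2)
```

For a standard Gaussian X, the variable $X + \sqrt{t}\,G$ has variance 1 + t and hence Fisher information 1/(1 + t), which is why a term of the form n/(1 + t) appears as the Gaussian reference in the integral representation of entropy.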

The Free Case.

Consider the derivations $\partial_j : A \to L^2(A, \tau)\,\bar{\otimes}\,L^2(A, \tau)$ given by $\partial_jX_k = 0$ for $j \neq k$ and $\partial_jX_j = 1 \otimes 1$. If $1 \otimes 1$ is in the domain of $\partial_j^*$, one defines the free score function (also called conjugate variable) by

$$\xi_j = J(X_j : X_1, \dots, \hat{X}_j, \dots, X_n) = \partial_j^*(1 \otimes 1).$$

In other words, for any $g \in L^2(A, \tau)$, one has $\langle g, \xi_j\rangle = \langle \partial_j g, 1 \otimes 1\rangle_{L^2(A,\tau)\bar{\otimes}L^2(A,\tau)}$. In the case that n = 1 one can check that $1 \otimes 1$ is in the domain of $\partial^*$ iff the law of $X = X_1$ is Lebesgue absolutely continuous with density p, for which the Hilbert transform Hp lies in $L^2(\mu)$. There is no explicit description of $\xi_j$ in the case that n > 1, since the joint law of several noncommutative random variables is no longer encoded by a measure.

One defines the free Fisher information by $\Phi^*(X_1, \dots, X_n) = \sum_j \|\xi_j\|^2_{L^2(A,\tau)}$. The free entropy χ* of $X_1, \dots, X_n$ is up to a universal constant the expression

$$\frac{1}{2}\int_0^\infty \left(\frac{n}{1+t} - \Phi^*(X_1^t, \dots, X_n^t)\right)dt,$$

where $X_j^t = X_j + \sqrt{t}\,S_j$ and $S_1, \dots, S_n$ are freely independent centered semicircular random variables of variance 1, freely independent from $X_1, \dots, X_n$.

In the case that n = 1, this definition of free entropy χ*(X) coincides with the free entropy χ(X) defined in Eq. 1.
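For a single semicircular variable the free score function can be checked by hand: the candidate ξ = X satisfies $\tau(\xi X^m) = (\tau \otimes \tau)(\partial X^m)$ for every m. The sketch below verifies this with exact integer arithmetic. It assumes the standard facts that the Leibniz rule gives $\partial X^m = \sum_{k=0}^{m-1} X^k \otimes X^{m-1-k}$ and that the even moments of the variance-1 semicircular law are the Catalan numbers; the function names are hypothetical.

```python
from math import comb

def semicircle_moment(m):
    """m-th moment of the variance-1 semicircular law: Catalan(m/2) if m is even, else 0."""
    if m % 2 == 1:
        return 0
    k = m // 2
    return comb(2 * k, k) // (k + 1)

def difference_quotient_pairing(m):
    """(tau x tau)(dX^m), using dX^m = sum_{k=0}^{m-1} X^k (x) X^(m-1-k)."""
    return sum(semicircle_moment(k) * semicircle_moment(m - 1 - k) for k in range(m))

def check_conjugate_relation(max_degree):
    """True if tau(X * X^m) equals (tau x tau)(dX^m) for all m up to max_degree."""
    return all(semicircle_moment(m + 1) == difference_quotient_pairing(m)
               for m in range(max_degree + 1))

print(check_conjugate_relation(20))
```

With ξ = X one gets $\Phi^*(X) = \tau(X^2) = 1$ for the semicircular variable of variance 1, the free counterpart of the Gaussian value of the classical Fisher information.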

Properties of Free and Classical Fisher Information

Lemma 3.

Assume that Z is freely independent (resp., classically independent) from X,Y1, …, Ym. Then one has the equality J(X : Y1, …, Ym,Z) = J(X : Y1, …, Ym) [resp., f(X : Y1, …, Ym,Z) = f(X : Y1, …, Ym)].

We refer the reader to e.g., ref. 12, for the proof in the free case (the proof in the classical case is immediate from the explicit formula for the score function).

Lemma 4.

Assume that {Xj(k)}, k = 1, …, n, j = 1,2, … are (noncommutative) random variables. Then for each j = 1,2, …, N + 1 and each k = 1,2, …, n one has:

graphic file with name zpq03907-7418-m10.jpg

assuming that the score function appearing on the right-hand side of the respective equation exists.

Proof: Let $Y_k = \sum_{i=1}^{N+1} X_i^{(k)}$, $Y_k' = \sum_{i \neq j} X_i^{(k)}$. Thus $Y_k = Y_k' + X_j^{(k)}$. Let P be a polynomial in $Y_1, \dots, Y_n$, viewed also as a polynomial in $Y_1', \dots, Y_n', X_j^{(1)}, \dots, X_j^{(n)}$. Then

graphic file with name zpq03907-7418-m11.jpg

Indeed, the values of a derivation on an arbitrary polynomial P are determined by the Leibnitz rule and the values of the derivation on the generators. However, one has

graphic file with name zpq03907-7418-m12.jpg

and similarly for d. It follows that for any such P,

graphic file with name zpq03907-7418-m13.jpg

which, in view of the fact that $P \in W^*(\sum_{i=1}^{N+1} X_i^{(r)} : r = 1, 2, \dots, n)$, proves the lemma.

Monotonicity for Fisher Information

Theorem 5.

Let $(X_j^{(1)}, \dots, X_j^{(n)})$ be a sequence of n-tuples of random variables, so that $\{(X_j^{(1)}, \dots, X_j^{(n)}) : j = 1, 2, \dots\}$ are classically (resp., freely) independent and identically distributed and have finite second moments. Let $Z_N^{(k)} = (X_1^{(k)} + \dots + X_N^{(k)})/\sqrt{N}$. Then the function $N \mapsto F(Z_N^{(1)}, \dots, Z_N^{(n)})$ (resp., $N \mapsto \Phi^*(Z_N^{(1)}, \dots, Z_N^{(n)})$) is monotone nonincreasing.

Proof: We give the details in the free case. The argument in the classical case is the same, if one replaces everywhere J by f and Φ* by F and free independence by classical independence.

Let $M = W^*(Z_{N+1}^{(1)}, \dots, Z_{N+1}^{(n)})$. Then using Lemma 4, we have that for all k,

graphic file with name zpq03907-7418-m14.jpg

where in the last line we used Lemma 3 and free independence of Xj(k) and {Xi(k)x}ij. Thus,

graphic file with name zpq03907-7418-m15.jpg

Since $E_M$ is a contraction on $L^2$ we obtain that

graphic file with name zpq03907-7418-m16.jpg

where

graphic file with name zpq03907-7418-m17.jpg

Now let $M_j = W^*(X_j^{(1)}, \dots, X_j^{(n)})$, $M = W^*(\{M_j\}_{j=1}^{N+1})$ and let $E_j : M \to Q_j = *_{i \neq j} M_i$ be the conditional expectation. Then $E_j : L^2(M) \to L^2(M)$ are projections and moreover $\{E_j : j = 1, \dots, N + 1\}$ form a commuting family. Indeed, because of the freeness assumptions, we may write

graphic file with name zpq03907-7418-m18.jpg

Hence if i < j,

graphic file with name zpq03907-7418-m19.jpg

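In particular, note that $E_1 \circ \dots \circ E_{N+1} = \tau$. Since $\zeta_j \in Q_j$, $\zeta_j = E_j\zeta_j$ and $\tau(\zeta_j) = \langle 1, \zeta_j\rangle = \langle \partial_{\sum_{i \neq j} X_i^{(k)} : \{Z_{N+1}^{(r)} : r \neq k\}} 1, 1 \otimes 1\rangle = 0$, we may now apply lemma 5 in ref. 4 to conclude that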

$$\Big\|\sum_{j=1}^{N+1} \zeta_j\Big\|_2^2 \le N \sum_{j=1}^{N+1} \|\zeta_j\|_2^2.$$

On the other hand, because the joint distribution of $(X_i^{(1)}, \dots, X_i^{(n)})$ does not depend on i, we find that the $L^2$ norm of $\zeta_j$ is the same for all j; hence

$$\sum_{j=1}^{N+1} \|\zeta_j\|_2^2 = (N+1)\,\|\zeta_1\|_2^2.$$

Combining this with the previous estimates (Eqs. 2 and 3) we obtain

graphic file with name zpq03907-7418-m22.jpg

By summing over k we obtain the inequality $\Phi^*(Z_{N+1}^{(1)}, \dots, Z_{N+1}^{(n)}) \le \Phi^*(Z_N^{(1)}, \dots, Z_N^{(n)})$.
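The key estimate borrowed from lemma 5 in ref. 4 can be tested in a toy commutative model. The sketch below is an illustration, not the authors' construction: it assumes m independent fair coins, takes $E_j$ to be averaging over the j-th coin (the conditional expectation onto functions of the remaining coins), and checks $\|\sum_j \zeta_j\|^2 \le (m-1)\sum_j \|\zeta_j\|^2$ for random centered vectors with $\zeta_j = E_j\zeta_j$. All names are hypothetical.

```python
import itertools
import random

def average_out(f, j, m):
    """E_j: average f over coordinate j of {0,1}^m under the uniform product measure."""
    g = {}
    for x in itertools.product((0, 1), repeat=m):
        flipped = x[:j] + (1 - x[j],) + x[j + 1:]
        g[x] = 0.5 * (f[x] + f[flipped])
    return g

def sq_norm(f):
    """Squared L2 norm with respect to the uniform probability measure."""
    return sum(v * v for v in f.values()) / len(f)

def variance_drop_gap(m, seed=0):
    """Return (m - 1) * sum_j ||zeta_j||^2 - ||sum_j zeta_j||^2 for random zeta_j."""
    rng = random.Random(seed)
    points = list(itertools.product((0, 1), repeat=m))
    total = {x: 0.0 for x in points}
    sum_of_norms = 0.0
    for j in range(m):
        xi = {x: rng.gauss(0.0, 1.0) for x in points}
        mean = sum(xi.values()) / len(points)
        xi = {x: v - mean for x, v in xi.items()}  # centered, so E_1 ... E_m xi = 0
        zeta = average_out(xi, j, m)               # zeta_j = E_j zeta_j, still centered
        sum_of_norms += sq_norm(zeta)
        for x in points:
            total[x] += zeta[x]
    return (m - 1) * sum_of_norms - sq_norm(total)

for m in (2, 3, 4):
    print(m, variance_drop_gap(m))
```

In this model the gap is nonnegative up to rounding. For m = 2 it vanishes, because $\zeta_1$ and $\zeta_2$ are then centered functions of different coins and hence orthogonal.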

Proof of Theorems 1 and 2.

We now show that Theorem 5 implies Theorem 1 and Theorem 2. Indeed, let $X_j^{(k,t)} = X_j^{(k)} + \sqrt{t}\,Y_j^{(k)}$, where $\{Y_j^{(k)}\}_{j,k}$ are iid centered Gaussian random variables (resp., freely iid centered semicircular variables) of variance 1, which are independent (resp., freely independent) from $\{X_j^{(k)}\}_{j,k}$. Let $Z_N^{(k,t)} = N^{-1/2}(X_1^{(k,t)} + \dots + X_N^{(k,t)})$. Applying Theorem 5 for fixed t gives that

$$F(Z_{N+1}^{(1,t)}, \dots, Z_{N+1}^{(n,t)}) \le F(Z_N^{(1,t)}, \dots, Z_N^{(n,t)})$$

(and similarly for Φ* in the free case). But $Z_N^{(k,t)} = Z_N^{(k)} + \sqrt{t}\,Y^{(N,k)}$, where for each fixed N, $Y^{(N,k)} = N^{-1/2}(Y_1^{(k)} + \dots + Y_N^{(k)})$, k = 1, …, n, is a family of centered iid Gaussian random variables (resp., centered freely iid semicircular variables), independent (resp., freely independent) from $\{Z_N^{(k)}\}_k$ and having variance 1. Hence

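$$\int_0^\infty \left(F(Z_{N+1}^{(1,t)}, \dots, Z_{N+1}^{(n,t)}) - \frac{n}{1+t}\right)dt \le \int_0^\infty \left(F(Z_N^{(1,t)}, \dots, Z_N^{(n,t)}) - \frac{n}{1+t}\right)dt,$$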

implying Theorem 1. The argument is the same in the free case, except for a change of sign in the definition of entropy.

Relative Entropy.

The same argument implies the following result: for an arbitrary von Neumann algebra B, keeping the notation of Theorem 1 (resp., Theorem 2), and assuming that the n-tuples {(Xj(1), …, Xj(n)): j = 1,2, …} are conditionally independent over B (resp., free with amalgamation over B), the relative entropy H(ZN(1), …, ZN(n) : B) (resp., the relative free entropy, χ*(ZN(1), …, ZN(n) : B)) is monotone nonincreasing (resp., monotone nondecreasing).

Acknowledgments

This work was supported by National Science Foundation Grants DMS-0355336 and DMS-0555680.

Abbreviation

iid, independent identically distributed.

Appendix 1

We characterize, in the case n = 1 of single random variables, when equality holds in Theorem 1 or Theorem 2.

Theorem 6.

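Let $X_1, \dots, X_{N+1}$ be freely iid bounded random variables with $\chi(X_1) > -\infty$. If

$$\chi\Big(\frac{1}{\sqrt{N+1}}\sum_{i=1}^{N+1} X_i\Big) = \chi\Big(\frac{1}{\sqrt{N}}\sum_{i=1}^{N} X_i\Big), \tag{4}$$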

then X1 is semicircular.

Theorem 7.

Let $X_1, \dots, X_{N+1}$ be iid square integrable random variables with $H(X_1) < +\infty$. If

$$H\Big(\frac{1}{\sqrt{N+1}}\sum_{i=1}^{N+1} X_i\Big) = H\Big(\frac{1}{\sqrt{N}}\sum_{i=1}^{N} X_i\Big),$$

then X1 is Gaussian.

Proof in the Free Case.

We first note that if Eq. 4 holds, then by writing entropy as an integral of Fisher information, we find that it is sufficient to prove the theorem under the assumption that $\Phi^*(N^{-1/2}\sum_{i=1}^{N} X_i) = \Phi^*((N+1)^{-1/2}\sum_{i=1}^{N+1} X_i)$.

Let $\zeta_j = J(\sum_{i \neq j} X_i)$. Then equality in Eq. 4 entails

graphic file with name zpq03907-7418-m27.jpg

Let E be the orthogonal projection onto $W^*(\sum_{i=1}^{N+1} X_i)$. Then $E(\zeta_j) = J(\sum_{i=1}^{N+1} X_i)$ and as in the proof of Theorem 2 we have the inequalities

graphic file with name zpq03907-7418-m28.jpg

So equality in Eq. 4 forces $E(\sum_j \zeta_j) = \sum_j \zeta_j$ and $\|\sum_j \zeta_j\|_2^2 = N \sum_j \|\zeta_j\|_2^2$. We now use the following lemma (13):

Lemma 8.

Let $P_1, \dots, P_m$ be commuting projections on a Hilbert space ℋ. If $\xi_1, \dots, \xi_m \in \mathcal{H}$ satisfy that for all $1 \le i \le m$, $P_1P_2\cdots P_m\xi_i = 0$, and the equality

$$\Big\|\sum_{j=1}^{m} P_j\xi_j\Big\|^2 = (m-1)\sum_{j=1}^{m} \|\xi_j\|^2$$

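holds, then for each j, $\xi_j \in \bigoplus_{i \neq j} \mathcal{H}_i$, where we have set $\mathcal{H}_j = \big(\bigcap_{k \neq j} P_k(\mathcal{H})\big) \cap P_j(\mathcal{H})^{\perp}$.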

Applying Lemma 8 with m = N + 1, $\xi_j = \zeta_j$, $P_1 = E_1, \dots, P_{N+1} = E_{N+1}$, where $E_j$ is the projection onto $L^2(W^*(X_1, \dots, \hat{X}_j, \dots, X_{N+1}))$, and noticing that $\mathcal{H}_j = L^2(W^*(X_j)) \ominus \mathbb{C}1$, we obtain that

graphic file with name zpq03907-7418-m30.jpg

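Combining this with the equality $(N + 1)\,J(\sum_{j=1}^{N+1} X_j) = \sum_{j=1}^{N+1} \zeta_j$, we conclude from Eq. 7 that $J(\sum_{j=1}^{N+1} X_j) \in \bigoplus_{j=1}^{N+1} (L^2(W^*(X_j)) \ominus \mathbb{C}1)$. Now choose $\eta_j \in L^2(W^*(X_j)) \ominus \mathbb{C}1$ so that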

graphic file with name zpq03907-7418-m31.jpg

Then

graphic file with name zpq03907-7418-m32.jpg

A standard application of freeness shows that for (i, j) ≠ (k, l), the terms $X_i\eta_j - \eta_iX_j$ and $X_k\eta_l - \eta_kX_l$ are perpendicular elements of $L^2(M)$ [when $\tau(X_j) = 0$]. Thus, the above identity implies that for all $i \neq j$,

graphic file with name zpq03907-7418-m33.jpg

It follows from the unique decomposition within the free product that there is only one way that Eq. 9 can be fulfilled: there exist $c_1, \dots, c_{N+1} \in \mathbb{R}$ such that $\eta_j = c_jX_j$. Eq. 9 shows that all $c_j$ must be the same, so that for some C,

graphic file with name zpq03907-7418-m34.jpg

This implies that the variable $\sum_{j=1}^{N+1} X_j$ is semicircular (12). Since the $X_i$ are freely iid, one can conclude by additivity of the R-transform that this can only happen if $X_1$ is semicircular.

Modifications of the Proof in the Classical Case.

The proof in the classical case proceeds in the same way until we arrive at Eq. 8. Since we are in a commutative setting, the commutator trick which we applied in the free case does not work here. We use the following lemma (see ref. 13 for a proof) instead.

Lemma 9.

Let N ∈ ℕ. Then for every m ∈ ℕ, the mth Hermite polynomial, Hm, satisfies

graphic file with name zpq03907-7418-m35.jpg

Because of finiteness of Fisher information, we may assume (see lemma 13.3 in ref. 13 for details) that the variables $X_j$ are functions of iid Gaussian variables. Thus we can assume that the Hermite polynomials form a basis for the $L^2$-spaces with which we are working. That is, there exist scalars $(\alpha_m)_{m=1}^{\infty}$ and $(\beta_m)_{m=1}^{\infty}$ such that the score function f for $X_1 + \dots + X_{N+1}$ is given by

graphic file with name zpq03907-7418-m36.jpg

and that the vectors ηj in Eq. 8 are equal to

graphic file with name zpq03907-7418-m37.jpg

By Lemma 9, this implies that

graphic file with name zpq03907-7418-m38.jpg

The functions $(H_{k_1}(x_1)H_{k_2}(x_2)\cdots H_{k_{N+1}}(x_{N+1}))_{k_1, \dots, k_{N+1} \ge 0}$ are mutually perpendicular in $L^2(\mathbb{R}^{N+1}, \otimes_{j=1}^{N+1}\sigma_1)$, where $\sigma_1$ denotes the standard Gaussian law. Fix m ≥ 2, and take $k_1, \dots, k_{N+1}$ with $\sum_j k_j = m$ and $k_j \ge 1$ for at least two j. Then take the inner product with $H_{k_1}(x_1)H_{k_2}(x_2)\cdots H_{k_{N+1}}(x_{N+1})$ on both sides of Eq. 12 to see that $\alpha_m$ must be zero. Thus the score function of $X_1 + \dots + X_{N+1}$ must be proportional to $X_1 + \dots + X_{N+1}$. This shows that $X_1 + \dots + X_{N+1}$ is Gaussian. As in the free case, using additivity of the logarithm of the Fourier transform, this can only happen if $X_1$ is Gaussian.
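The final step can be spelled out as follows. This is a sketch of the standard argument rather than the authors' own wording; the symbols $\varphi_X$ (Fourier transform of the law of X), $S$, $a$ and $s^2$ (mean and variance of the Gaussian sum) are introduced here for illustration.

```latex
% \varphi_X(t) = E[e^{itX}];  X_1, \dots, X_{N+1} iid,  S = X_1 + \dots + X_{N+1}
\varphi_S(t) = \varphi_{X_1}(t)^{N+1}.

% If S is Gaussian with mean a and variance s^2, then \varphi_{X_1} never vanishes and
\varphi_{X_1}(t)^{N+1} = e^{\,i a t - s^2 t^2/2}
\;\Longrightarrow\;
\log\varphi_{X_1}(t) = \frac{1}{N+1}\,\log\varphi_S(t)
                     = \frac{i a t}{N+1} - \frac{s^2 t^2}{2(N+1)},

% where \log denotes the continuous branch with \log\varphi_{X_1}(0) = 0.
% Hence X_1 is Gaussian with mean a/(N+1) and variance s^2/(N+1).
```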

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

References

1. Voiculescu DV, Dykema K, Nica A. Free Random Variables, CRM Monograph Series. Vol 1. Providence, RI: Am Math Soc; 1992.
2. Voiculescu DV. Operator Algebras and Their Connections with Topology and Ergodic Theory, Lecture Notes in Mathematics. Vol 1132. New York: Springer; 1986. pp. 556–588.
3. Voiculescu DV. Bull London Math Soc. 2002;34:257–278.
4. Artstein S, Ball K, Barthe F, Naor A. J Am Math Soc. 2004;17:975–982.
5. Shlyakhtenko D. Adv Math. 2007;208:824–833.
6. Barron A, Johnson O. Prob Theor Rel Fields. 2004;129:391–409.
7. Shannon C, Weaver W. The Mathematical Theory of Communication. Urbana, IL: Univ of Illinois Press; 1949.
8. Stam A. Info Control. 1959;2:101–112.
9. Lieb EH. Commun Math Phys. 1978;62:35–41.
10. Voiculescu DV. Commun Math Phys. 1993;155:71–92.
11. Barron A. Ann Prob. 1986;14:336–342.
12. Voiculescu DV. Invent Math. 1998;132:189–227.
13. Schultz H. 2005 arXiv:math/0512492.
