Entropy. 2022 Jul 2;24(7):924. doi: 10.3390/e24070924

On the Distribution of the Information Density of Gaussian Random Vectors: Explicit Formulas and Tight Approximations

Jonathan E W Huffmann 1, Martin Mittelbach 2,*
Editor: Joaquín Abellán
PMCID: PMC9323744  PMID: 35885147

Abstract

Based on the canonical correlation analysis, we derive series representations of the probability density function (PDF) and the cumulative distribution function (CDF) of the information density of arbitrary Gaussian random vectors as well as a general formula to calculate the central moments. Using the general results, we give closed-form expressions of the PDF and CDF and explicit formulas of the central moments for important special cases. Furthermore, we derive recurrence formulas and tight approximations of the general series representations, which allow efficient numerical calculations with an arbitrarily high accuracy as demonstrated with an implementation in Python publicly available on GitLab. Finally, we discuss the (in)validity of Gaussian approximations of the information density.

Keywords: information density, information spectrum, probability density function, cumulative distribution function, central moments, Gaussian random vector, canonical correlation analysis

1. Introduction and Main Theorems

Let ξ and η be arbitrary random variables on an abstract probability space (Ω, F, P) such that the joint distribution P_ξη is absolutely continuous w. r. t. the product P_ξ ⊗ P_η of the marginal distributions P_ξ and P_η. If dP_ξη/d(P_ξ ⊗ P_η) denotes the Radon–Nikodym derivative of P_ξη w. r. t. P_ξ ⊗ P_η, then

i(\xi;\eta) = \log \frac{\mathrm{d}P_{\xi\eta}}{\mathrm{d}(P_\xi \otimes P_\eta)}(\xi,\eta)

is called the information density of ξ and η. The expectation E(i(ξ;η)) = I(ξ;η) of the information density, called mutual information, plays a key role in characterizing the asymptotic channel coding performance in terms of channel capacity. The non-asymptotic performance, however, is determined by the higher-order moments of the information density and its probability distribution. Achievability and converse bounds that allow a finite blocklength analysis of the optimum channel coding rate are closely related to the distribution function of the information density, also called the information spectrum by Han and Verdú [1,2]. Moreover, based on the variance of the information density, tight second-order finite blocklength approximations of the optimum code rate can be derived for various important channel models. The first work on a non-asymptotic information-theoretic analysis was published already in the early years of information theory by Shannon [3], Dobrushin [4], and Strassen [5], among others. Due to the seminal work of Polyanskiy et al. [6], considerable progress has been made in this area. The results of [6] on the one hand and the requirements of current and future wireless networks regarding latency and reliability on the other hand have stimulated significant new interest in this type of analysis (Durisi et al. [7]).

The information density i(ξ;η) in the case when ξ and η are jointly Gaussian is of special interest due to the prominent role of the Gaussian distribution. Let ξ = (ξ_1, ξ_2, …, ξ_p) and η = (η_1, η_2, …, η_q) be real-valued random vectors with nonsingular covariance matrices R_ξ and R_η and cross-covariance matrix R_ξη with rank r = rank(R_ξη). (For notational convenience, we write vectors as row vectors. However, in expressions where matrix or vector multiplications occur, we consider all vectors as column vectors.) Without loss of generality for the subsequent results, we assume the expectation of all random variables to be zero. If (ξ_1, ξ_2, …, ξ_p, η_1, η_2, …, η_q) is a Gaussian random vector, then Pinsker [8], Ch. 9.6, has shown that the distribution of the information density i(ξ;η) coincides with the distribution of the random variable

\nu = \frac{1}{2}\sum_{i=1}^{r} \varrho_i \bigl(\tilde{\xi}_i^2 - \tilde{\eta}_i^2\bigr) + I(\xi;\eta). (1)

In this representation, ξ̃_1, ξ̃_2, …, ξ̃_r, η̃_1, η̃_2, …, η̃_r are independent and identically distributed (i.i.d.) Gaussian random variables with zero mean and unit variance, and the mutual information I(ξ;η) in (1) has the form

I(\xi;\eta) = \frac{1}{2}\sum_{i=1}^{r} \log\frac{1}{1-\varrho_i^2}. (2)

Moreover, ϱ_1 ≥ ϱ_2 ≥ ⋯ ≥ ϱ_r > 0 denote the positive canonical correlations of ξ and η in descending order, which are obtained by a linear method called canonical correlation analysis that yields the maximum correlations between two sets of random variables (see Section 3). The rank r of the cross-covariance matrix R_ξη satisfies 0 ≤ r ≤ min{p, q}, and for r = 0 we have i(ξ;η) = 0 almost surely and I(ξ;η) = 0. This corresponds to P_ξη = P_ξ ⊗ P_η and the independence of ξ and η such that the resulting information density is deterministic. Throughout the rest of the paper, we exclude this degenerate case when the information density is considered and assume subsequently the setting and notation introduced above with r ≥ 1. As customary notation, we further write ℝ, ℕ_0, and ℕ to denote the set of real numbers, non-negative integers, and positive integers.
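As a sanity check of the representation in (1) and (2), the following Python sketch (our own minimal illustration, not part of the paper's GitLab implementation; the function name and the chosen correlations are ours) draws Monte Carlo samples of ν and compares the empirical mean with the mutual information:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_information_density(rho, n):
    # rho: positive canonical correlations, n: number of samples
    rho = np.asarray(rho, dtype=float)
    r = rho.size
    xi = rng.standard_normal((n, r))    # plays the role of the xi-tilde variables
    eta = rng.standard_normal((n, r))   # plays the role of the eta-tilde variables
    I = 0.5 * np.sum(np.log(1.0 / (1.0 - rho ** 2)))  # mutual information, Eq. (2)
    nu = 0.5 * (xi ** 2 - eta ** 2) @ rho + I         # Eq. (1)
    return nu, I

nu, I = sample_information_density([0.9, 0.5, 0.3], 200_000)
print(abs(nu.mean() - I))   # small: E(i(xi;eta)) = I(xi;eta)
```

The sample variance can likewise be compared with the sum of squared canonical correlations derived later in Section 2.2.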

Main contributions. Based on (1), we derive in Section 4 series representations of the probability density function (PDF) and the cumulative distribution function (CDF) as well as explicit general formulas for the central moments of the information density i(ξ;η) given subsequently in Theorems 1 to 3. The series representations are useful as they allow tight approximations with errors as low as desired by finite sums as shown in Section 5.2. Moreover, we derive recurrence formulas in Section 5.1 that allow efficient numerical calculations of the series representations in Theorems 1 and 2.

Theorem 1

(PDF of information density). The PDF f_{i(ξ;η)} of the information density i(ξ;η) is given by

f_{i(\xi;\eta)}(x) = \frac{1}{\varrho_r\sqrt{\pi}} \sum_{k_1=0}^{\infty}\sum_{k_2=0}^{\infty}\cdots\sum_{k_{r-1}=0}^{\infty} \left[\prod_{i=1}^{r-1} \frac{\varrho_r}{\varrho_i}\,\frac{(2k_i)!}{(k_i!)^2 4^{k_i}} \left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)^{\!k_i}\right] \times \frac{K_{\frac{r-1}{2}+k_1+k_2+\cdots+k_{r-1}}\!\left(\frac{|x-I(\xi;\eta)|}{\varrho_r}\right)}{\Gamma\!\left(\frac{r}{2}+k_1+k_2+\cdots+k_{r-1}\right)} \left(\frac{|x-I(\xi;\eta)|}{2\varrho_r}\right)^{\!\frac{r-1}{2}+k_1+k_2+\cdots+k_{r-1}}, \quad x \in \mathbb{R}\setminus\{I(\xi;\eta)\}, (3)

where Γ(·) denotes the gamma function [9], Sec. 5.2.1, and K_α(·) denotes the modified Bessel function of the second kind and order α [9], Sec. 10.25(ii). If r ≥ 2, then f_{i(ξ;η)}(x) is also well defined for x = I(ξ;η).

Theorem 2

(CDF of information density). The CDF F_{i(ξ;η)} of the information density i(ξ;η) is given by

F_{i(\xi;\eta)}(x) = \begin{cases} \frac{1}{2} - V\bigl(I(\xi;\eta)-x\bigr) & \text{if } x \le I(\xi;\eta) \\ \frac{1}{2} + V\bigl(x-I(\xi;\eta)\bigr) & \text{if } x > I(\xi;\eta), \end{cases}

with V(z) defined by

V(z) = \sum_{k_1=0}^{\infty}\sum_{k_2=0}^{\infty}\cdots\sum_{k_{r-1}=0}^{\infty} \left[\prod_{i=1}^{r-1} \frac{\varrho_r}{\varrho_i}\,\frac{(2k_i)!}{(k_i!)^2 4^{k_i}} \left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)^{\!k_i}\right] \frac{z}{2\varrho_r} \times \left[K_{\frac{r-1}{2}+k_1+k_2+\cdots+k_{r-1}}\!\left(\frac{z}{\varrho_r}\right) L_{\frac{r-3}{2}+k_1+k_2+\cdots+k_{r-1}}\!\left(\frac{z}{\varrho_r}\right) + K_{\frac{r-3}{2}+k_1+k_2+\cdots+k_{r-1}}\!\left(\frac{z}{\varrho_r}\right) L_{\frac{r-1}{2}+k_1+k_2+\cdots+k_{r-1}}\!\left(\frac{z}{\varrho_r}\right)\right], \quad z \ge 0, (4)

where L_α(·) denotes the modified Struve L function of order α [9], Sec. 11.2.

The method to obtain the result in Theorem 1 is adopted from Mathai [10], where a series representation of the PDF of the sum of independent gamma distributed random variables is derived. Previous work of Grad and Solomon [11] and Kotz et al. [12] goes in a similar direction as Mathai [10]; however, it is not directly applicable since only the restriction to positive series coefficients is considered there. Using Theorem 1, the series representation of the CDF of the information density in Theorem 2 is obtained. The details of the derivations of Theorems 1 and 2 are provided in Section 4.

Theorem 3

(Central moments of information density). The m-th central moment E([i(ξ;η) − I(ξ;η)]^m) of the information density i(ξ;η) is given by

E\bigl([i(\xi;\eta)-I(\xi;\eta)]^m\bigr) = \begin{cases} \displaystyle\sum_{(m_1,m_2,\ldots,m_r)\in K_{m,r}^{[2]}} m! \prod_{i=1}^{r} \frac{(2m_i)!}{4^{m_i}(m_i!)^2}\,\varrho_i^{2m_i} & \text{if } m = 2\tilde{m} \\ 0 & \text{if } m = 2\tilde{m}-1, \end{cases} (5)

for all m̃ ∈ ℕ, where K_{m,r}^{[2]} = {(m_1, m_2, …, m_r) ∈ ℕ_0^r : 2m_1 + 2m_2 + ⋯ + 2m_r = m}.

Pinsker [8], Eq. (9.6.17), provided a formula for the sum \sum_{i=1}^{r} E\bigl(\bigl[\tfrac{\varrho_i}{2}(\tilde{\xi}_i^2-\tilde{\eta}_i^2)\bigr]^m\bigr), which he called the "derived m-th central moment" of the information density, where ξ̃_i and η̃_i are given as in (1). These special moments coincide for m = 2 with the usual central moments considered in Theorem 3.

The rest of the paper is organized as follows: In Section 2, we discuss important special cases which allow simplified and explicit formulas. In Section 3, we provide some background on the canonical correlation analysis and its application to the calculation of the information density and mutual information for Gaussian random vectors. The proofs of the main Theorems 1 to 3 are given in Section 4. Recurrence formulas, finite sum approximations, and uniform bounds of the approximation error are derived in Section 5, which allow efficient and accurate numerical calculations of the PDF and CDF of the information density. Some examples and illustrations are provided in Section 6, where also the (in)validity of Gaussian approximations is discussed. Finally, Section 7 summarizes the paper. Note that a first version of this paper was published on arXiv as preprint [13].

2. Special Cases

2.1. Equal Canonical Correlations

A simple but important special case for which the series representations in Theorems 1 and 2 simplify to a single summand and the sum of products in Theorem 3 simplifies to a single product is considered in the following corollary.

Corollary 1

(PDF, CDF, and central moments of information density for equal canonical correlations). If all canonical correlations are equal, i.e.,

\varrho_1 = \varrho_2 = \cdots = \varrho_r,

then we have the following simplifications.

(i) The PDF fi(ξ;η) of the information density i(ξ;η) simplifies to

f_{i(\xi;\eta)}(x) = \frac{1}{\varrho_r\sqrt{\pi}\,\Gamma\!\left(\frac{r}{2}\right)} K_{\frac{r-1}{2}}\!\left(\frac{|x-I(\xi;\eta)|}{\varrho_r}\right) \left(\frac{|x-I(\xi;\eta)|}{2\varrho_r}\right)^{\!\frac{r-1}{2}}, \quad x \in \mathbb{R}\setminus\{I(\xi;\eta)\}, (6)

where I(ξ;η) is given by

I(\xi;\eta) = \frac{r}{2}\log\frac{1}{1-\varrho_r^2}.

If r ≥ 2, then f_{i(ξ;η)}(x) is also well defined for x = I(ξ;η).

(ii) The CDF Fi(ξ;η) of the information density i(ξ;η) is given by

F_{i(\xi;\eta)}(x) = \begin{cases} \frac{1}{2} - V\bigl(I(\xi;\eta)-x\bigr) & \text{if } x \le I(\xi;\eta) \\ \frac{1}{2} + V\bigl(x-I(\xi;\eta)\bigr) & \text{if } x > I(\xi;\eta), \end{cases} (7)

with V(z) defined by

V(z) = \frac{z}{2\varrho_r}\left[K_{\frac{r-1}{2}}\!\left(\frac{z}{\varrho_r}\right) L_{\frac{r-3}{2}}\!\left(\frac{z}{\varrho_r}\right) + K_{\frac{r-3}{2}}\!\left(\frac{z}{\varrho_r}\right) L_{\frac{r-1}{2}}\!\left(\frac{z}{\varrho_r}\right)\right], \quad z \ge 0. (8)

(iii) The m-th central moment E([i(ξ;η) − I(ξ;η)]^m) of the information density i(ξ;η) has the form

E\bigl([i(\xi;\eta)-I(\xi;\eta)]^m\bigr) = \begin{cases} \displaystyle\frac{m!}{(m/2)!}\prod_{j=1}^{m/2}\left(\frac{r}{2}+j-1\right)\varrho_r^m & \text{if } m = 2\tilde{m} \\ 0 & \text{if } m = 2\tilde{m}-1, \end{cases} (9)

for all m̃ ∈ ℕ.

Clearly, if all canonical correlations are equal, then the only nonzero terms in the series (3) and (4) occur for k_1 = k_2 = ⋯ = k_{r−1} = 0. For this single summand, the product in square brackets in (3) and (4) is equal to 1 by applying 0⁰ = 1, which yields the results of parts (i) and (ii) in Corollary 1. Details of the derivation of part (iii) of the corollary are provided in Section 4.

Note, if all canonical correlations are equal, then we can rewrite (1) as follows:

\nu = \frac{\varrho_r}{2}\left(\sum_{i=1}^{r}\tilde{\xi}_i^2 - \sum_{i=1}^{r}\tilde{\eta}_i^2\right) + I(\xi;\eta).

This implies that the distribution of ν coincides with the distribution of the random variable

\nu^* = \frac{\varrho_r}{2}\bigl(\zeta_1 - \zeta_2\bigr) + I(\xi;\eta),

where ζ1 and ζ2 are i.i.d. χ2-distributed random variables with r degrees of freedom. With this representation, we can obtain the expression of the PDF given in (6) also from [14], Sec. 4.A.4.
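This equivalence is easy to check numerically; the following sketch (our own illustration with arbitrarily chosen parameters, not from the paper's repository) compares the first two moments of both representations by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(1)
r, rho, n = 4, 0.6, 300_000
I = 0.5 * r * np.log(1.0 / (1.0 - rho ** 2))   # mutual information for equal correlations

# representation (1) with all canonical correlations equal to rho
xi = rng.standard_normal((n, r))
eta = rng.standard_normal((n, r))
nu = 0.5 * rho * (xi ** 2 - eta ** 2).sum(axis=1) + I

# chi-square representation nu* with zeta1, zeta2 i.i.d. chi^2 with r degrees of freedom
nu_star = 0.5 * rho * (rng.chisquare(r, n) - rng.chisquare(r, n)) + I

print(nu.mean(), nu_star.mean())   # both close to I
print(nu.var(), nu_star.var())     # both close to r * rho^2
```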

Special cases of Corollary 1. The case when all canonical correlations are equal is important because it occurs in various situations. The subsequent cases follow from the properties of canonical correlations given in Section 3.

(i) Assume that the random variables ξ_1, ξ_2, …, ξ_p, η_1, η_2, …, η_q are pairwise uncorrelated with the exception of the pairs (ξ_i, η_i), i = 1, 2, …, k ≤ min{p, q}, for which we have cor(ξ_i, η_i) = ρ ≠ 0, where cor(·,·) denotes the Pearson correlation coefficient. Then, r = k and ϱ_i = |ρ| for all i = 1, 2, …, r. Note, if p = q = k, then for the previous conditions to hold, it is sufficient that the two-dimensional random vectors (ξ_i, η_i) are i.i.d. However, the identical distribution of the (ξ_i, η_i)'s is not necessary. In Laneman [15], the distribution of the information density for an additive white Gaussian noise channel with i.i.d. Gaussian input is determined. This is a special case of the case with i.i.d. random vectors (ξ_i, η_i) just mentioned. In Wu and Jindal [16] and in Buckingham and Valenti [17], an approximation of the information density by a Gaussian random variable is considered for the setting in [15]. A special case very similar to that in [15] is also considered in Polyanskiy et al. [6], Sec. III.J. To the best of the authors' knowledge, explicit formulas for the general case as considered in this paper are not yet available in the literature.

(ii) Assume that the conditions of part (i) are satisfied. Furthermore, assume that Â is a real nonsingular matrix of dimension p×p and B̂ is a real nonsingular matrix of dimension q×q. Then, the random vectors

\hat{\xi} = \hat{A}\xi \quad\text{and}\quad \hat{\eta} = \hat{B}\eta

have the same canonical correlations as the random vectors ξ and η, i.e., ϱ_i = |ρ| for all i = 1, 2, …, k ≤ min{p, q}.

(iii) If r = 1, i.e., if the cross-covariance matrix R_ξη has rank 1, then Corollary 1 obviously applies. Clearly, the simplest special case with r = 1 occurs for p = q = 1, where ϱ_1 = |cor(ξ_1, η_1)|.

As a simple multivariate example, let the covariance matrix of the random vector (ξ_1, ξ_2, …, ξ_p, η_1, η_2, …, η_q) be given by the Kac–Murdock–Szegö matrix

\begin{pmatrix} R_\xi & R_{\xi\eta} \\ R_{\xi\eta}^T & R_\eta \end{pmatrix} = \bigl(\rho^{|i-j|}\bigr)_{i,j=1}^{p+q},

which is related to the covariance function of a first-order autoregressive process, where 0 < |ρ| < 1. Then, r = rank(R_ξη) = 1 and ϱ_1 = |ρ|.

(iv) As yet another example, assume p = q and R_ξη = ρ R_ξ^{1/2} R_η^{1/2} for some 0 < |ρ| < 1. Then, ϱ_i = |ρ| for i = 1, 2, …, r = q. Here, A^{1/2} denotes the square root of the real-valued positive semidefinite matrix A, i.e., the unique positive semidefinite matrix B such that BB = A.

2.2. More on Special Cases with Simplified Formulas

Let us further evaluate the formulas given in Corollary 1 and Theorem 3 for some relevant parameter values.

(i) Single canonical correlation coefficient. In the simplest case, there is only a single non-zero canonical correlation coefficient, i.e., r = 1. (Recall, at the beginning of the paper, we have excluded the degenerate case when all canonical correlations are zero.) Then, the formulas of the PDF and the m-th central moment in Corollary 1 simplify to the form

f_{i(\xi;\eta)}(x) = \frac{1}{\varrho_1\pi} K_0\!\left(\frac{|x-I(\xi;\eta)|}{\varrho_1}\right), \quad x \in \mathbb{R}\setminus\{I(\xi;\eta)\},

and

E\bigl([i(\xi;\eta)-I(\xi;\eta)]^m\bigr) = \begin{cases} \displaystyle\left(\frac{m!}{(m/2)!}\right)^{\!2}\left(\frac{\varrho_1}{2}\right)^{\!m} & \text{if } m = 2\tilde{m} \\ 0 & \text{if } m = 2\tilde{m}-1, \end{cases} (10)

for all m̃ ∈ ℕ. A formula equivalent to (10) is also provided by Pinsker [8], Lemma 9.6.1, who considered the special case p = q = 1, which implies r = 1.

(ii) Second and fourth central moment. To demonstrate how the general formula given in Theorem 3 is used, we first consider m = 2. In this case, the summation indices m_1, m_2, …, m_r have to satisfy m_i = 1 for a single i ∈ {1, 2, …, r}, whereas the remaining m_i's have to be zero. Thus, (5) evaluates for m = 2 to

E\bigl([i(\xi;\eta)-I(\xi;\eta)]^2\bigr) = \mathrm{var}\bigl(i(\xi;\eta)\bigr) = \sum_{i=1}^{r}\varrho_i^2. (11)

As a slightly more complex example, let m = 4. In this case, either we have m_i = 2 for a single i ∈ {1, 2, …, r}, whereas the remaining m_i's are zero, or we have m_{i_1} = m_{i_2} = 1 for two indices i_1 ≠ i_2 ∈ {1, 2, …, r}, whereas the remaining m_i's have to be zero. Thus, (5) evaluates for m = 4 to

E\bigl([i(\xi;\eta)-I(\xi;\eta)]^4\bigr) = 9\sum_{i=1}^{r}\varrho_i^4 + 6\sum_{i=2}^{r}\sum_{j=1}^{i-1}\varrho_i^2\varrho_j^2.
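The evaluation of (5) can be automated by enumerating the index set K^{[2]}_{m,r} directly; the short sketch below (our own helper, not from the paper's repository) reproduces the two examples above:

```python
from itertools import product
from math import factorial

def central_moment(m, rho):
    # m-th central moment of the information density via Theorem 3, Eq. (5)
    if m % 2 == 1:
        return 0.0
    total = 0.0
    # enumerate all (m_1, ..., m_r) with 2 m_1 + ... + 2 m_r = m
    for ms in product(range(m // 2 + 1), repeat=len(rho)):
        if 2 * sum(ms) != m:
            continue
        term = float(factorial(m))
        for mi, p in zip(ms, rho):
            term *= factorial(2 * mi) / (4 ** mi * factorial(mi) ** 2) * p ** (2 * mi)
        total += term
    return total

rho = [0.9, 0.5, 0.3]
var = sum(p ** 2 for p in rho)                       # Eq. (11)
m4 = 9 * sum(p ** 4 for p in rho) \
     + 6 * sum(rho[i] ** 2 * rho[j] ** 2 for i in range(1, 3) for j in range(i))
print(central_moment(2, rho) - var)   # zero up to rounding
print(central_moment(4, rho) - m4)    # zero up to rounding
```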

(iii) Even number of equal canonical correlations. As in Corollary 1, assume that all canonical correlations are equal and additionally assume that the number r of canonical correlations is even, i.e., r = 2r̃ for some r̃ ∈ ℕ. Then, we can use [9], Secs. 10.47.9, 10.49.1, and 10.49.12 to obtain the following relation for the modified Bessel function K_α(·) of the second kind and order α

K_{\frac{r-1}{2}}(y) = \sqrt{\frac{\pi}{2}}\,\exp(-y)\sum_{i=0}^{r/2-1}\frac{(r/2-1+i)!}{(r/2-1-i)!\,i!\,2^i}\,y^{-(i+\frac{1}{2})}, \quad y \in (0,\infty). (12)

Plugging (12) into (6) and rearranging terms yields the following expression for the PDF of the information density:

f_{i(\xi;\eta)}(x) = \frac{1}{\varrho_r 2^{r-1}(r/2-1)!}\exp\!\left(-\frac{|x-I(\xi;\eta)|}{\varrho_r}\right) \times \sum_{i=0}^{r/2-1}\frac{\bigl(2(r/2-1)-i\bigr)!\,2^i}{(r/2-1-i)!\,i!}\left(\frac{|x-I(\xi;\eta)|}{\varrho_r}\right)^{\!i}, \quad x \in \mathbb{R}.

By integration, we obtain for the function V(·) in (8) the expression

V(z) = \frac{1}{2} - \frac{1}{2^{r-1}(r/2-1)!}\exp\!\left(-\frac{z}{\varrho_r}\right) \times \sum_{i=0}^{r/2-1}\frac{\bigl(2(r/2-1)-i\bigr)!\,2^i}{(r/2-1-i)!}\sum_{j=0}^{i}\frac{1}{(i-j)!}\left(\frac{z}{\varrho_r}\right)^{\!i-j}, \quad z \ge 0.

Note that these special formulas can also be obtained directly from the results given in [14], Sec. 4.A.3.

To illustrate the principal behavior of the PDF and CDF of the information density for equal canonical correlations, it is instructive to consider the specific value r = 2 in the above formulas, which yields

f_{i(\xi;\eta)}(x) = \frac{1}{2\varrho_r}\exp\!\left(-\frac{|x-I(\xi;\eta)|}{\varrho_r}\right), \quad x \in \mathbb{R}, \qquad V(z) = \frac{1}{2}\left(1-\exp\!\left(-\frac{z}{\varrho_r}\right)\right), \quad z \ge 0,

and r = 4, for which we obtain

f_{i(\xi;\eta)}(x) = \frac{1}{4\varrho_r}\exp\!\left(-\frac{|x-I(\xi;\eta)|}{\varrho_r}\right)\left(1+\frac{|x-I(\xi;\eta)|}{\varrho_r}\right), \quad x \in \mathbb{R}, \qquad V(z) = \frac{1}{2}\left(1-\exp\!\left(-\frac{z}{\varrho_r}\right)\left(1+\frac{z}{2\varrho_r}\right)\right), \quad z \ge 0.
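Note that for r = 2 the information density is thus Laplace distributed with location I(ξ;η) and scale ϱ_r. A quick numerical check of these closed forms (our own sketch; ϱ_r = 0.7 and I(ξ;η) = 0 are arbitrary choices) verifies that the densities integrate to one and that, for r = 2, the mass P(|i − I| ≤ z) equals 2V(z):

```python
import numpy as np

rho = 0.7
x = np.linspace(-40.0, 40.0, 400_001)
dx = x[1] - x[0]

pdf_r2 = np.exp(-np.abs(x) / rho) / (2 * rho)                          # r = 2, I = 0
pdf_r4 = np.exp(-np.abs(x) / rho) * (1 + np.abs(x) / rho) / (4 * rho)  # r = 4, I = 0

print(pdf_r2.sum() * dx)   # close to 1
print(pdf_r4.sum() * dx)   # close to 1

# CDF check for r = 2: P(|i - I| <= z) = 2 V(z)
z = 3.0
mass = pdf_r2[np.abs(x) <= z].sum() * dx
V = 0.5 * (1 - np.exp(-z / rho))
print(mass, 2 * V)
```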

3. Mutual Information and Information Density in Terms of Canonical Correlations

First introduced by Hotelling [18], the canonical correlation analysis is a widely used linear method in multivariate statistics to determine the maximum correlations between two sets of random variables. It allows a particularly simple and useful representation of the mutual information and the information density of Gaussian random vectors in terms of the so-called canonical correlations. This representation was first obtained by Gelfand and Yaglom [19] and further extended by Pinsker [8], Ch. 9. For the convenience of the reader, we summarize in this section the essence of the canonical correlation analysis and demonstrate how it is applied to derive the representations in (1) and (2).

The formulation of the canonical correlation analysis given below is particularly suitable for implementations. The corresponding results are given without proof. Details and thorough discussions can be found, e.g., in Härdle and Simar [20], Koch [21], or Timm [22].

Based on the nonsingular covariance matrices R_ξ and R_η of the random vectors ξ = (ξ_1, ξ_2, …, ξ_p) and η = (η_1, η_2, …, η_q), and the cross-covariance matrix R_ξη with rank r = rank(R_ξη) satisfying 0 ≤ r ≤ min{p, q}, define the matrix

M = R_\xi^{-1/2} R_{\xi\eta} R_\eta^{-1/2},

where the inverse matrices R_ξ^{−1/2} = (R_ξ^{1/2})^{−1} and R_η^{−1/2} = (R_η^{1/2})^{−1} can be obtained by diagonalizing R_ξ and R_η. Then, the matrix M has a singular value decomposition

M = U D V^T,

where V^T denotes the transpose of V. The only non-zero entries d_{1,1}, d_{2,2}, …, d_{r,r} > 0 of the matrix D = (d_{i,j})_{i,j=1}^{p,q} are called canonical correlations of ξ and η, denoted by ϱ_i = d_{i,i}, i = 1, 2, …, r. The singular value decomposition can be chosen such that ϱ_1 ≥ ϱ_2 ≥ ⋯ ≥ ϱ_r holds, which is assumed throughout the paper.

Define the random vectors

\hat{\xi} = (\hat{\xi}_1, \hat{\xi}_2, \ldots, \hat{\xi}_p) = A\xi \quad\text{and}\quad \hat{\eta} = (\hat{\eta}_1, \hat{\eta}_2, \ldots, \hat{\eta}_q) = B\eta,

where the nonsingular matrices A and B are given by

A = U^T R_\xi^{-1/2} \quad\text{and}\quad B = V^T R_\eta^{-1/2}.

Then, the random variables ξ̂_1, ξ̂_2, …, ξ̂_p, η̂_1, η̂_2, …, η̂_q have unit variance, and they are pairwise uncorrelated with the exception of the pairs (ξ̂_i, η̂_i), i = 1, 2, …, r, for which we have cor(ξ̂_i, η̂_i) = ϱ_i.
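The procedure above translates directly into a few lines of NumPy (a sketch of ours, not the paper's GitLab code); the Kac–Murdock–Szegö example from Section 2.1 serves as a test case:

```python
import numpy as np

def canonical_correlations(R_xi, R_eta, R_cross):
    # canonical correlations = singular values of M = R_xi^{-1/2} R_cross R_eta^{-1/2}
    def inv_sqrt(R):
        w, Q = np.linalg.eigh(R)            # diagonalize the covariance matrix
        return Q @ np.diag(w ** -0.5) @ Q.T
    M = inv_sqrt(R_xi) @ R_cross @ inv_sqrt(R_eta)
    s = np.linalg.svd(M, compute_uv=False)  # in descending order
    return s[s > 1e-12]                     # keep the r positive canonical correlations

# Kac-Murdock-Szegoe covariance with p = q = 2 and rho = 0.5: rank(R_cross) = 1
rho, p, q = 0.5, 2, 2
R = np.array([[rho ** abs(i - j) for j in range(p + q)] for i in range(p + q)])
cc = canonical_correlations(R[:p, :p], R[p:, p:], R[:p, p:])
print(cc)   # a single canonical correlation equal to |rho| = 0.5
```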

Using these results, we obtain for the mutual information and the information density

I(\xi;\eta) = I(A\xi;B\eta) = I(\hat{\xi};\hat{\eta}) = \sum_{i=1}^{r} I(\hat{\xi}_i;\hat{\eta}_i) (13)
i(\xi;\eta) = i(A\xi;B\eta) = i(\hat{\xi};\hat{\eta}) = \sum_{i=1}^{r} i(\hat{\xi}_i;\hat{\eta}_i) \quad (P\text{-almost surely}). (14)

The first equality in (13) and (14) holds because A and B are nonsingular matrices, which follows, e.g., from Pinsker [8], Th. 3.7.1. Since we consider the case where ξ and η are jointly Gaussian, ξ̂ and η̂ are jointly Gaussian as well. Therefore, the correlation properties of ξ̂ and η̂ imply that all random variables ξ̂_i, η̂_j are independent except for the pairs (ξ̂_i, η̂_i), i = 1, 2, …, r. This implies the last equality in (13) and (14), where i(ξ̂_1;η̂_1), i(ξ̂_2;η̂_2), …, i(ξ̂_r;η̂_r) are independent. The sum representations follow from the chain rules of mutual information and information density and the equivalence between independence and vanishing mutual information and information density.

Since ξ^i and η^i are jointly Gaussian with correlation cor(ξ^i,η^i)=ϱi, we obtain from (13) and the formula of mutual information for the bivariate Gaussian case the identity (2). Additionally, with ξ^i and η^i having zero mean and unit variance, the information density i(ξ^i;η^i) is further given by

i(\hat{\xi}_i;\hat{\eta}_i) = -\frac{1}{2}\log(1-\varrho_i^2) - \frac{\varrho_i^2}{2(1-\varrho_i^2)}\left(\hat{\xi}_i^2 - \frac{2\hat{\xi}_i\hat{\eta}_i}{\varrho_i} + \hat{\eta}_i^2\right), \quad i = 1, 2, \ldots, r. (15)

Now assume ξ̃_1, ξ̃_2, …, ξ̃_r, η̃_1, η̃_2, …, η̃_r are i.i.d. Gaussian random variables with zero mean and unit variance. Then, the distribution of the random vector

\frac{1}{\sqrt{2}}\left(\sqrt{1+\varrho_i}\,\tilde{\xi}_i + \sqrt{1-\varrho_i}\,\tilde{\eta}_i,\; \sqrt{1+\varrho_i}\,\tilde{\xi}_i - \sqrt{1-\varrho_i}\,\tilde{\eta}_i\right)

coincides with the distribution of the random vector (ξ̂_i, η̂_i) for all i = 1, 2, …, r. Plugging this into (15), we obtain together with (14) that the distribution of the information density i(ξ;η) coincides with the distribution of (1).

4. Proof of Main Results

4.1. Auxiliary Results

To prove Theorem 1, the following lemma regarding the characteristic function of the information density is utilized. The results of the lemma are also used in Ibragimov and Rozanov [23] but without proof. Therefore, the proof is given below for completeness.

Lemma 1 

(Characteristic function of (shifted) information density). The characteristic function of the shifted information density i(ξ;η) − I(ξ;η) is equal to the characteristic function of the random variable

\tilde{\nu} = \frac{1}{2}\sum_{i=1}^{r}\varrho_i\bigl(\tilde{\xi}_i^2 - \tilde{\eta}_i^2\bigr), (16)

where ξ̃_1, ξ̃_2, …, ξ̃_r, η̃_1, η̃_2, …, η̃_r are i.i.d. Gaussian random variables with zero mean and unit variance, and ϱ_1, ϱ_2, …, ϱ_r are the canonical correlations of ξ and η. The characteristic function of ν̃ is given by

\varphi_{\tilde{\nu}}(t) = \prod_{i=1}^{r}\frac{1}{\sqrt{1+\varrho_i^2t^2}}, \quad t \in \mathbb{R}. (17)

Proof. 

Due to (1), the distribution of the shifted information density i(ξ;η) − I(ξ;η) coincides with the distribution of the random variable ν̃ in (16) such that the characteristic functions of i(ξ;η) − I(ξ;η) and ν̃ are equal.

It is a well-known fact that ξ̃_i² and η̃_i² in (16) are chi-squared distributed random variables with one degree of freedom, from which we obtain that the weighted random variables ϱ_iξ̃_i²/2 and ϱ_iη̃_i²/2 are gamma distributed with shape parameter 1/2 and scale parameter ϱ_i. The characteristic function of these random variables therefore admits the form

\varphi_{\frac{\varrho_i}{2}\tilde{\xi}_i^2}(t) = \bigl(1-\varrho_i\,\mathrm{j}t\bigr)^{-\frac{1}{2}}.

Further, from the identity \varphi_{-\frac{\varrho_i}{2}\tilde{\eta}_i^2}(t) = \varphi_{\frac{\varrho_i}{2}\tilde{\eta}_i^2}(-t) for the characteristic function and from the independence of ξ̃_i and η̃_i, we obtain the characteristic function of ν̃_i = ϱ_i(ξ̃_i² − η̃_i²)/2 to be given by

\varphi_{\tilde{\nu}_i}(t) = \bigl(1-\varrho_i\,\mathrm{j}t\bigr)^{-\frac{1}{2}}\bigl(1+\varrho_i\,\mathrm{j}t\bigr)^{-\frac{1}{2}} = \bigl(1+\varrho_i^2t^2\bigr)^{-\frac{1}{2}}.

Finally, because ν˜ in (16) is given by the sum of the independent random variables ν˜i, the characteristic function of ν˜ results from multiplying the individual characteristic functions of the random variables ν˜i. By doing so, we obtain (17). □
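Formula (17) can be checked against an empirical characteristic function (our own Monte Carlo sketch with arbitrarily chosen correlations):

```python
import numpy as np

rng = np.random.default_rng(2)
rho = np.array([0.8, 0.4])
n = 400_000

xi = rng.standard_normal((n, rho.size))
eta = rng.standard_normal((n, rho.size))
nu = 0.5 * (xi ** 2 - eta ** 2) @ rho      # shifted information density, Eq. (16)

t = np.array([0.5, 1.0, 2.0])
phi_emp = np.exp(1j * np.outer(t, nu)).mean(axis=1)                     # E[exp(j t nu)]
phi = np.prod(1.0 / np.sqrt(1.0 + np.outer(t ** 2, rho ** 2)), axis=1)  # Eq. (17)
print(np.max(np.abs(phi_emp - phi)))   # Monte Carlo error, of order n^(-1/2)
```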

As a further auxiliary result, the subsequent proposition providing properties of the modified Bessel function K_α of the second kind and order α will be used to prove the main results.

Proposition 1

(Properties related to the function K_α). For all α ∈ ℝ, the function

y \mapsto y^\alpha K_\alpha(y), \quad y \in (0,\infty),

where K_α(·) denotes the modified Bessel function of the second kind and order α [9], Sec. 10.25(ii), is strictly positive and strictly monotonically decreasing. Furthermore, if α > 0, then we have

\lim_{y\to+0} y^\alpha K_\alpha(y) = \sup_{y\in(0,\infty)} y^\alpha K_\alpha(y) = \Gamma(\alpha)\,2^{\alpha-1}. (18)

Proof. 

If α ∈ ℝ is fixed, then K_α(y) is strictly positive and strictly monotonically decreasing w. r. t. y ∈ (0,∞) due to [9], Secs. 10.27.3 and 10.37. Furthermore, we obtain

\frac{\mathrm{d}\bigl(y^\alpha K_\alpha(y)\bigr)}{\mathrm{d}y} = -y^\alpha K_{\alpha-1}(y), \quad y \in (0,\infty),

by applying the rules to calculate derivatives of Bessel functions given in [9], Sec. 10.29(ii). It follows that y^α K_α(y) is strictly positive and strictly monotonically decreasing w. r. t. y ∈ (0,∞) for all fixed α ∈ ℝ.

Consider now the Basset integral formula as given in [9], Sec. 10.32.11,

K_\alpha(yz) = \frac{\Gamma\bigl(\alpha+\frac{1}{2}\bigr)(2z)^\alpha}{\sqrt{\pi}\,y^\alpha}\int_{u=0}^{\infty}\frac{\cos(uy)}{(u^2+z^2)^{\alpha+\frac{1}{2}}}\,\mathrm{d}u (19)

for |arg(z)| < π/2, y > 0, α > −1/2, and the integral

\int_{u=0}^{\infty}\frac{1}{(u^2+1)^{\alpha+\frac{1}{2}}}\,\mathrm{d}u = \frac{\sqrt{\pi}\,\Gamma(\alpha)}{2\,\Gamma\bigl(\alpha+\frac{1}{2}\bigr)} (20)

for α > 0, where the equality holds due to [24], Secs. 3.251.2 and 8.384.1. Using (19) and (20), we obtain

\lim_{y\to+0} y^\alpha K_\alpha(y) = \lim_{y\to+0}\frac{\Gamma\bigl(\alpha+\frac{1}{2}\bigr)2^\alpha}{\sqrt{\pi}}\int_{u=0}^{\infty}\frac{\cos(uy)}{(u^2+1)^{\alpha+\frac{1}{2}}}\,\mathrm{d}u = \frac{\Gamma\bigl(\alpha+\frac{1}{2}\bigr)2^\alpha}{\sqrt{\pi}}\int_{u=0}^{\infty}\frac{1}{(u^2+1)^{\alpha+\frac{1}{2}}}\,\mathrm{d}u = \Gamma(\alpha)\,2^{\alpha-1}

for all α > 0, where we also applied the dominated convergence theorem, which is possible due to |cos(uy)|/(u²+1)^{α+1/2} ≤ 1/(u²+1)^{α+1/2}. Using the previously derived monotonicity, we obtain (18). □

4.2. Proof of Theorem 1

To prove Theorem 1, we calculate the PDF fν˜ of the random variable ν˜ introduced in Lemma 1 by inverting the characteristic function φν˜ given in (17) via the integral

f_{\tilde{\nu}}(v) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\varphi_{\tilde{\nu}}(t)\exp(-\mathrm{j}tv)\,\mathrm{d}t, \quad v \in \mathbb{R}. (21)

Shifting the PDF of ν̃ by I(ξ;η), we obtain the PDF f_{i(ξ;η)}(x) = f_ν̃(x − I(ξ;η)), x ∈ ℝ, of the information density i(ξ;η).

The method used subsequently is based on the work of Mathai [10]. To invert the characteristic function φν˜, we expand the factors in (17) as

\bigl(1+\varrho_i^2t^2\bigr)^{-\frac{1}{2}} = \bigl(1+\varrho_r^2t^2\bigr)^{-\frac{1}{2}}\,\frac{\varrho_r}{\varrho_i}\left(1-\left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)\frac{1}{1+\varrho_r^2t^2}\right)^{-\frac{1}{2}} (22)
= \bigl(1+\varrho_r^2t^2\bigr)^{-\frac{1}{2}}\,\frac{\varrho_r}{\varrho_i}\sum_{k=0}^{\infty}(-1)^k\binom{-1/2}{k}\left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)^{\!k}\bigl(1+\varrho_r^2t^2\bigr)^{-k}. (23)

In (23), we have used the binomial series

(1+y)^a = \sum_{k=0}^{\infty}\binom{a}{k}y^k, (24)

where a ∈ ℝ. The series is absolutely convergent for |y| < 1 and

\binom{a}{k} = \prod_{\ell=1}^{k}\frac{a-\ell+1}{\ell}, \quad k \in \mathbb{N}, (25)

denotes the generalized binomial coefficient with \binom{a}{0} = 1. Since

\left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)\frac{1}{1+\varrho_r^2t^2} < 1 (26)

holds for all t ∈ ℝ, the series in (23) is absolutely convergent for all t ∈ ℝ. Using the expansion in (23) and the absolute convergence together with the identity

\binom{-1/2}{k} = (-1)^k\,\frac{(2k)!}{(k!)^2 4^k}, (27)

we can rewrite the characteristic function φν˜ as

\varphi_{\tilde{\nu}}(t) = \sum_{k_1=0}^{\infty}\sum_{k_2=0}^{\infty}\cdots\sum_{k_{r-1}=0}^{\infty}\left[\prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i}\,\frac{(2k_i)!}{(k_i!)^2 4^{k_i}}\left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)^{\!k_i}\right] \times \bigl(1+\varrho_r^2t^2\bigr)^{-\left(\frac{r}{2}+k_1+k_2+\cdots+k_{r-1}\right)}, \quad t \in \mathbb{R}. (28)

To obtain the PDF f_ν̃, we evaluate the inversion integral (21) based on the series representation in (28). Since every series in (28) is absolutely convergent, we can exchange summation and integration. Let β = r/2 + k_1 + k_2 + ⋯ + k_{r−1}. Then, by symmetry, we have for the integral of a summand

\int_{t=-\infty}^{\infty}\frac{\exp(-\mathrm{j}tv)}{(1+\varrho_r^2t^2)^\beta}\,\mathrm{d}t = 2\int_{t=0}^{\infty}\frac{\cos(tv)}{(1+\varrho_r^2t^2)^\beta}\,\mathrm{d}t = \frac{2}{\varrho_r}\int_{u=0}^{\infty}\frac{\cos(uv/\varrho_r)}{(1+u^2)^\beta}\,\mathrm{d}u, (29)

where the second equality is a result of the substitution t = u/ϱ_r. By setting z = 1, α = β − 1/2 ≥ 0, and y = |v|/ϱ_r in the Basset integral formula given in (19) in the proof of Proposition 1 and using the symmetry with respect to v, we can evaluate (29) to the following form:

\int_{t=-\infty}^{\infty}\frac{\exp(-\mathrm{j}tv)}{(1+\varrho_r^2t^2)^\beta}\,\mathrm{d}t = \frac{\sqrt{\pi}}{\Gamma(\beta)\,2^{\beta-\frac{3}{2}}\,\varrho_r^{\beta+\frac{1}{2}}}\,K_{\beta-\frac{1}{2}}\!\left(\frac{|v|}{\varrho_r}\right)|v|^{\beta-\frac{1}{2}}, \quad v \in \mathbb{R}\setminus\{0\}. (30)

Combining (21), (28), and (30) yields

f_{\tilde{\nu}}(v) = \frac{1}{2\sqrt{\pi}}\sum_{k_1=0}^{\infty}\sum_{k_2=0}^{\infty}\cdots\sum_{k_{r-1}=0}^{\infty}\left[\prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i}\,\frac{(2k_i)!}{(k_i!)^2 4^{k_i}}\left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)^{\!k_i}\right] \times \frac{K_{\frac{r-1}{2}+k_1+k_2+\cdots+k_{r-1}}\!\left(\frac{|v|}{\varrho_r}\right)|v|^{\frac{r-1}{2}+k_1+k_2+\cdots+k_{r-1}}}{\Gamma\!\left(\frac{r}{2}+k_1+k_2+\cdots+k_{r-1}\right)2^{\frac{r-3}{2}+k_1+k_2+\cdots+k_{r-1}}\varrho_r^{\frac{r+1}{2}+k_1+k_2+\cdots+k_{r-1}}}, \quad v \in \mathbb{R}\setminus\{0\}. (31)

Slightly rearranging terms and shifting fν˜(·) by I(ξ;η) yields (3).

It remains to show that f_{i(ξ;η)}(x) is also well defined for x = I(ξ;η) if r ≥ 2. Indeed, if r ≥ 2, then we can use Proposition 1 to obtain

\lim_{x\to I(\xi;\eta)} f_{i(\xi;\eta)}(x) = \frac{1}{2\varrho_r\sqrt{\pi}}\sum_{k_1=0}^{\infty}\sum_{k_2=0}^{\infty}\cdots\sum_{k_{r-1}=0}^{\infty}\left[\prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i}\,\frac{(2k_i)!}{(k_i!)^2 4^{k_i}}\left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)^{\!k_i}\right] \times \frac{\Gamma\!\left(\frac{r-1}{2}+k_1+k_2+\cdots+k_{r-1}\right)}{\Gamma\!\left(\frac{r-1}{2}+k_1+k_2+\cdots+k_{r-1}+\frac{1}{2}\right)},

where we used the exchangeability of the limit and the summation due to the absolute convergence of the series. Since Γ(α)/Γ(α + 1/2) is decreasing w. r. t. α ≥ 1/2, we have

\frac{\Gamma\!\left(\frac{r-1}{2}+k_1+k_2+\cdots+k_{r-1}\right)}{\Gamma\!\left(\frac{r-1}{2}+k_1+k_2+\cdots+k_{r-1}+\frac{1}{2}\right)} \le \frac{\Gamma\!\left(\frac{r-1}{2}\right)}{\Gamma\!\left(\frac{r-1}{2}+\frac{1}{2}\right)} \le \sqrt{\pi}.

Then, with (69) in the proof of Theorem 4, it follows that limxI(ξ;η)fi(ξ;η)(x) exists and is finite. □

4.3. Proof of Theorem 2

To prove Theorem 2, we calculate the CDF F_ν̃ of the random variable ν̃ introduced in Lemma 1 by integrating the PDF f_ν̃ given in (31). Shifting the CDF of ν̃ by I(ξ;η), we obtain the CDF F_{i(ξ;η)}(x) = F_ν̃(x − I(ξ;η)), x ∈ ℝ, of the information density i(ξ;η). Using the symmetry of f_ν̃, we can write

F_{\tilde{\nu}}(z) = P(\tilde{\nu}\le z) = \begin{cases} \frac{1}{2} - \displaystyle\int_{v=0}^{-z} f_{\tilde{\nu}}(v)\,\mathrm{d}v & \text{for } z \le 0 \\ \frac{1}{2} + \displaystyle\int_{v=0}^{z} f_{\tilde{\nu}}(v)\,\mathrm{d}v & \text{for } z > 0. \end{cases}

It is therefore sufficient to evaluate the integral

V(z) := \int_{v=0}^{z} f_{\tilde{\nu}}(v)\,\mathrm{d}v (32)

for z ≥ 0. To calculate the integral (32), we plug (31) into (32) and exchange integration and summation, which is justified by the monotone convergence theorem. To evaluate the integral of a summand, consider the following identity

\int_{x=0}^{z} x^\alpha K_\alpha(x)\,\mathrm{d}x = 2^{\alpha-1}\sqrt{\pi}\,\Gamma\!\left(\alpha+\tfrac{1}{2}\right) z\bigl[K_\alpha(z)L_{\alpha-1}(z) + K_{\alpha-1}(z)L_\alpha(z)\bigr] (33)

for α > −1/2 given in [25], Sec. 1.12.1.3, where L_α(·) denotes the modified Struve L function of order α [9], Sec. 11.2. Using (33) with α = (r−1)/2 + k_1 + k_2 + ⋯ + k_{r−1} ≥ 0, we obtain (4). □
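The integral identity (33) can be verified numerically with SciPy's kv and modstruve functions (our own sketch with arbitrary α and z):

```python
import numpy as np
from scipy.special import kv, modstruve, gamma

alpha, z = 1.5, 2.3

# left-hand side of (33): numerical integral of x^alpha K_alpha(x) over (0, z)
x = np.linspace(1e-9, z, 200_001)
f = x ** alpha * kv(alpha, x)
lhs = np.sum((f[:-1] + f[1:]) / 2) * (x[1] - x[0])   # trapezoidal rule

# right-hand side of (33)
rhs = 2 ** (alpha - 1) * np.sqrt(np.pi) * gamma(alpha + 0.5) * z \
      * (kv(alpha, z) * modstruve(alpha - 1, z) + kv(alpha - 1, z) * modstruve(alpha, z))

print(lhs, rhs)   # agree to several digits
```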

4.4. Proof of Theorem 3

Using the random variable

\tilde{\nu} = \sum_{i=1}^{r}\tilde{\nu}_i \quad\text{with}\quad \tilde{\nu}_i = \frac{\varrho_i}{2}\bigl(\tilde{\xi}_i^2 - \tilde{\eta}_i^2\bigr)

introduced in Lemma 1 and the well-known multinomial theorem [9], Sec. 26.4.9,

\bigl(y_1+y_2+\cdots+y_r\bigr)^m = \sum_{(\ell_1,\ell_2,\ldots,\ell_r)\in K_{m,r}} m!\prod_{i=1}^{r}\frac{y_i^{\ell_i}}{\ell_i!},

where K_{m,r} = {(ℓ_1, ℓ_2, …, ℓ_r) ∈ ℕ_0^r : ℓ_1 + ℓ_2 + ⋯ + ℓ_r = m}, we can write the m-th central moment of the information density i(ξ;η) as

E\bigl([i(\xi;\eta)-I(\xi;\eta)]^m\bigr) = E\!\left(\Bigl(\sum_{i=1}^{r}\tilde{\nu}_i\Bigr)^{\!m}\right) = \sum_{(\ell_1,\ell_2,\ldots,\ell_r)\in K_{m,r}} m!\prod_{i=1}^{r}\frac{E\bigl(\tilde{\nu}_i^{\ell_i}\bigr)}{\ell_i!}. (34)

To obtain the second equality in (34), we have exchanged expectation and summation and additionally used the identity E(∏_{i=1}^r ν̃_i^{ℓ_i}) = ∏_{i=1}^r E(ν̃_i^{ℓ_i}), which holds due to the independence of the random variables ν̃_1, ν̃_2, …, ν̃_r.

Based on the relation between the ℓ-th moment of a random variable and the ℓ-th derivative of its characteristic function at 0, we further have

E\bigl(\tilde{\nu}_i^{\ell_i}\bigr) = (-\mathrm{j})^{\ell_i}\,\frac{\mathrm{d}^{\ell_i}}{\mathrm{d}t^{\ell_i}}\varphi_{\tilde{\nu}_i}(t)\Big|_{t=0}, (35)

where φ_ν̃i(t) = (1 + ϱ_i²t²)^{−1/2}, t ∈ ℝ, is the characteristic function of the random variable ν̃_i derived in the proof of Lemma 1. As in the proof of Theorem 1, consider now the binomial series expansion using (24)

\varphi_{\tilde{\nu}_i}(t) = \bigl(1+\varrho_i^2t^2\bigr)^{-\frac{1}{2}} = \sum_{m_i=0}^{\infty}\binom{-1/2}{m_i}(\varrho_i t)^{2m_i}.

The series is absolutely convergent for all |t| < ϱ_i^{−1}. Furthermore, consider the Taylor series expansion of the characteristic function φ_ν̃i at the point 0

\varphi_{\tilde{\nu}_i}(t) = \sum_{\ell_i=0}^{\infty}\frac{\mathrm{d}^{\ell_i}}{\mathrm{d}t^{\ell_i}}\varphi_{\tilde{\nu}_i}(t)\Big|_{t=0}\,\frac{t^{\ell_i}}{\ell_i!}.

Both series expansions must be identical in an open interval around 0 such that we obtain by comparing the series coefficients

\frac{\mathrm{d}^{\ell_i}}{\mathrm{d}t^{\ell_i}}\varphi_{\tilde{\nu}_i}(t)\Big|_{t=0} = \begin{cases} \ell_i!\,\binom{-1/2}{\ell_i/2}\,\varrho_i^{\ell_i} & \text{if } \ell_i = 2m_i \\ 0 & \text{if } \ell_i = 2m_i-1 \end{cases}

for all m_i ∈ ℕ. With this result, (35) evaluates to

E\bigl(\tilde{\nu}_i^{\ell_i}\bigr) = \begin{cases} \displaystyle\frac{(\ell_i!)^2}{((\ell_i/2)!)^2\,4^{\ell_i/2}}\,\varrho_i^{\ell_i} & \text{if } \ell_i = 2m_i \\ 0 & \text{if } \ell_i = 2m_i-1 \end{cases} (36)

for all m_i ∈ ℕ, where we have additionally used the identity (27).

From (34) and (36), we now obtain E([i(ξ;η) − I(ξ;η)]^m) = 0 for all m = 2m̃ − 1 with m̃ ∈ ℕ because, if m is odd, then for all (ℓ_1, ℓ_2, …, ℓ_r) ∈ K_{m,r} at least one of the ℓ_i's has to be odd. If m = 2m̃ with m̃ ∈ ℕ, we obtain from (34) and (36)

E\bigl([i(\xi;\eta)-I(\xi;\eta)]^m\bigr) = \sum_{(\ell_1,\ell_2,\ldots,\ell_r)\in K_{m,r}} m!\prod_{i=1}^{r}\frac{1}{\ell_i!}\,\frac{(\ell_i!)^2}{((\ell_i/2)!)^2\,4^{\ell_i/2}}\,\varrho_i^{\ell_i} = \sum_{(m_1,m_2,\ldots,m_r)\in K_{m,r}^{[2]}} m!\prod_{i=1}^{r}\frac{(2m_i)!}{(m_i!)^2\,4^{m_i}}\,\varrho_i^{2m_i}.

 □

4.5. Proof of Part (iii) of Corollary 1

Using the random variable ν̃ as in the proof of Theorem 3, we can write the m-th central moment of the information density i(ξ;η) as

E\bigl([i(\xi;\eta)-I(\xi;\eta)]^m\bigr) = E\bigl(\tilde{\nu}^m\bigr) = (-\mathrm{j})^m\,\frac{\mathrm{d}^m}{\mathrm{d}t^m}\varphi_{\tilde{\nu}}(t)\Big|_{t=0},

where the characteristic function φ_ν̃ of ν̃ is given by φ_ν̃(t) = (1 + ϱ_r²t²)^{−r/2}, t ∈ ℝ, due to Lemma 1 and the equality of all canonical correlations. Using the binomial series and the Taylor series expansion as in the proof of Theorem 3, we obtain

\frac{\mathrm{d}^m}{\mathrm{d}t^m}\varphi_{\tilde{\nu}}(t)\Big|_{t=0} = \begin{cases} m!\,\binom{-r/2}{m/2}\,\varrho_r^m & \text{if } m = 2\tilde{m} \\ 0 & \text{if } m = 2\tilde{m}-1 \end{cases}

for all m̃ ∈ ℕ. Collecting terms and additionally using the definition of the generalized binomial coefficient given in (25) in the proof of Theorem 1 yields (9). □

5. Recurrence Formulas and Finite Sum Approximations

If there are at least two distinct canonical correlations, then the PDF f_{i(ξ;η)} and CDF F_{i(ξ;η)} of the information density i(ξ;η) are given by the infinite series in Theorems 1 and 2. If we consider only a finite number of summands in these representations, then we obtain approximations that are particularly amenable to numerical calculations. However, a direct finite sum approximation of the series in (3) and (4) is rather inefficient since modified Bessel and Struve L functions have to be evaluated for every summand. Therefore, we derive in this section recursive representations, which allow efficient numerical calculations. Furthermore, we derive uniform bounds of the approximation error. Based on the recurrence relations and the error bounds, an implementation in the programming language Python has been developed, which provides an efficient tool to numerically calculate the PDF and CDF of the information density with a predefined accuracy as high as desired. The developed source code as well as illustrating examples are made publicly available in an open access repository on GitLab [26].

Subsequently, we adopt all the previous notation and assume r2 and at least two distinct canonical correlations (since otherwise we have the case of Corollary 1, where the series reduce to a single summand).

5.1. Recurrence Formulas

The recursive approach developed below is based on the work of Moschopoulos [27], which extended the work of Mathai [10]. First, we rewrite the series representations of the PDF and CDF of the information density given in Theorem 1 and Theorem 2 in a form which is suitable for recursive calculations. To begin with, we define two functions appearing in the series representations (3) and (4), which involve the modified Bessel function K_α of the second kind and order α and the modified Struve L function L_α of order α. Let us define for all k ∈ ℕ_0 the functions U_k and D_k by

U_k(z) = \frac{K_{\frac{r-1}{2}+k}(z)}{\Gamma\!\left(\frac{r}{2}+k\right)}\left(\frac{z}{2}\right)^{\!\frac{r-1}{2}+k}, \quad z \ge 0, (37)

and

D_k(z) = \frac{z}{2\varrho_r}\left[K_{\frac{r-1}{2}+k}\!\left(\frac{z}{\varrho_r}\right)L_{\frac{r-3}{2}+k}\!\left(\frac{z}{\varrho_r}\right) + K_{\frac{r-3}{2}+k}\!\left(\frac{z}{\varrho_r}\right)L_{\frac{r-1}{2}+k}\!\left(\frac{z}{\varrho_r}\right)\right], \quad z \ge 0. (38)

Furthermore, we define for all k ∈ ℕ_0 the coefficient δ_k by

\delta_k = \sum_{(k_1,k_2,\ldots,k_{r-1})\in K_{k,r-1}}\;\prod_{i=1}^{r-1}\frac{(2k_i)!}{(k_i!)^2 4^{k_i}}\left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)^{\!k_i}, (39)

where K_{k,r−1} = {(k_1, k_2, …, k_{r−1}) ∈ ℕ_0^{r−1} : k_1 + k_2 + ⋯ + k_{r−1} = k}. With these definitions, we obtain the following alternative series representations of (3) and (4) by observing that the multiple summations over the indices k_1, k_2, …, k_{r−1} can be shortened to one summation over the index k = k_1 + k_2 + ⋯ + k_{r−1}.
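For moderate k and r, the coefficient δ_k can also be computed directly from (39) by enumerating the set K_{k,r−1}; the sketch below (our own helper for illustration, the recurrence-based approach developed in this section is far more efficient) makes the definition concrete:

```python
from itertools import product
from math import factorial

def delta(k, rho):
    # delta_k of Eq. (39); rho lists the canonical correlations in descending order
    *rho_head, rho_r = rho           # rho_head = rho_1, ..., rho_{r-1}
    total = 0.0
    for ks in product(range(k + 1), repeat=len(rho_head)):
        if sum(ks) != k:             # keep only k_1 + ... + k_{r-1} = k
            continue
        term = 1.0
        for ki, p in zip(ks, rho_head):
            term *= factorial(2 * ki) / (factorial(ki) ** 2 * 4 ** ki) \
                    * (1 - rho_r ** 2 / p ** 2) ** ki
        total += term
    return total

rho = [0.9, 0.5, 0.3]
print([delta(k, rho) for k in range(4)])   # delta_0 = 1
```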

Proposition 2

(Alternative representation of PDF and CDF of the information density). The PDF fi(ξ;η) of the information density i(ξ;η) given in Theorem 1 has the alternative series representation

f_{i(\xi;\eta)}(x) = \frac{1}{\varrho_r\sqrt{\pi}}\left[\prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i}\right]\sum_{k=0}^{\infty}\delta_k\,U_k\!\left(\frac{|x-I(\xi;\eta)|}{\varrho_r}\right), \quad x \in \mathbb{R}. (40)

The function V(·) specifying the CDF Fi(ξ;η) of the information density i(ξ;η) as given in Theorem 2 has the alternative series representation

V(z) = \left[\prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i}\right]\sum_{k=0}^{\infty}\delta_k\,D_k(z), \quad z \ge 0. (41)

Based on the representations in Proposition 2 and with recursive formulas for U_k(·), D_k(·), and δ_k, we are in a position to calculate the PDF and CDF of the information density by a single summation over completely recursively defined terms. In the following, we derive recurrence relations for U_k(·), D_k(·), and δ_k, which allow the desired efficient calculations.

Lemma 2

(Recurrence formula of the function Uk). If for all kN0 the function Uk is defined by (37), then Uk(z) satisfies for all k2 and z0 the recurrence formula

U_k(z) = \frac{z^2}{(r+2k-2)(r+2k-4)}\,U_{k-2}(z) + \frac{r+2k-3}{r+2k-2}\,U_{k-1}(z). (42)

Proof. 

First, assume $z=0$. Based on Proposition 1, we obtain for all $k \in \mathbb{N}_0$

\lim_{z\to+0} U_k(z) = \frac{\Gamma\left(\frac{r-1}{2}+k\right)}{2\,\Gamma\left(\frac{r}{2}+k\right)}, (43)

such that $U_k(0)$ is well defined and finite. Using the recurrence relation $\Gamma(y+1)=y\,\Gamma(y)$ for the Gamma function ([24], Sec. 8.331.1), we have

\frac{\Gamma\left(\frac{r-1}{2}+k\right)}{2\,\Gamma\left(\frac{r}{2}+k\right)} = \frac{\frac{r-1}{2}+k-1}{\frac{r}{2}+k-1} \cdot \frac{\Gamma\left(\frac{r-1}{2}+k-1\right)}{2\,\Gamma\left(\frac{r}{2}+k-1\right)}.

Together with (43), this shows that the recurrence formula (42) holds for $U_k(0)$ and $k \ge 2$, since the first term of (42) vanishes at $z=0$.

Now, assume z>0 and consider the recurrence formula

z\,K_\alpha(z) = z\,K_{\alpha-2}(z) + 2(\alpha-1)\,K_{\alpha-1}(z) (44)

for the modified Bessel function of the second kind and order $\alpha$ ([24], Sec. 8.486.10). Plugging (44) into (37) for $\alpha = \frac{r-1}{2}+k$ yields for $k \ge 2$

U_k(z) = \frac{K_{\frac{r-1}{2}+k-2}(z)}{\Gamma\left(\frac{r}{2}+k\right)} \left(\frac{z}{2}\right)^{\frac{r-1}{2}+k-2} \left(\frac{z}{2}\right)^{2} + \left(\frac{r-1}{2}+k-1\right) \frac{K_{\frac{r-1}{2}+k-1}(z)}{\Gamma\left(\frac{r}{2}+k\right)} \left(\frac{z}{2}\right)^{\frac{r-1}{2}+k-1}. (45)

Using again the relation Γ(y+1)=yΓ(y), we obtain

\Gamma\left(\tfrac{r}{2}+k\right) = \left(\tfrac{r}{2}+k-1\right)\Gamma\left(\tfrac{r}{2}+k-1\right) = \left(\tfrac{r}{2}+k-1\right)\left(\tfrac{r}{2}+k-2\right)\Gamma\left(\tfrac{r}{2}+k-2\right),

which, together with (45) and (37), yields the recurrence formula (42) for $U_k(z)$ if $z>0$ and $k \ge 2$. □
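As a quick numerical illustration of Lemma 2, the following sketch (our own, not part of the Python implementation [26]; it assumes SciPy's `kv` for the modified Bessel function $K_\alpha$ and `gammaln` for $\log\Gamma$) evaluates $U_k$ directly from the definition (37) and compares it with the recurrence (42):

```python
import numpy as np
from math import gamma
from scipy.special import kv, gammaln

def U(k, z, r):
    """U_k(z) from (37): K_{(r-1)/2+k}(z) (z/2)^{(r-1)/2+k} / Gamma(r/2+k)."""
    a = (r - 1) / 2 + k
    return kv(a, z) * np.exp(a * np.log(z / 2) - gammaln(r / 2 + k))

r, z = 5, 1.3
for k in range(2, 8):
    lhs = U(k, z, r)
    rhs = (z**2 / ((r + 2*k - 2) * (r + 2*k - 4)) * U(k - 2, z, r)
           + (r + 2*k - 3) / (r + 2*k - 2) * U(k - 1, z, r))
    assert abs(lhs - rhs) < 1e-10 * abs(lhs)   # recurrence (42) holds
```

Because all terms in (42) are nonnegative, the forward recursion involves no cancellation and is numerically stable.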

Lemma 3

(Recurrence formula of the function $D_k$). If, for all $k \in \mathbb{N}_0$, the function $D_k$ is defined by (38), then $D_k(z)$ satisfies for all $k \ge 1$ and $z \ge 0$ the recurrence formula

D_k(z) = D_{k-1}(z) - \frac{1}{2\sqrt{\pi}\left(\frac{r}{2}+k-1\right)}\,\frac{z}{\varrho_r}\,U_{k-1}\!\left(\frac{z}{\varrho_r}\right), (46)

with Uk(·) as defined in (37).

Proof. 

First, assume $z=0$. We have $D_k(0)=0$ for all $k \in \mathbb{N}_0$, and from the proof of Lemma 2 we have $U_k(0)=\Gamma\left(\frac{r-1}{2}+k\right)\big/\left(2\,\Gamma\left(\frac{r}{2}+k\right)\right)$ for all $k \in \mathbb{N}_0$. Thus, the left-hand side and the right-hand side of (46) are both zero, which shows that (46) holds for $z=0$ and $k \ge 1$.

Now, assume z>0 and consider the recurrence formula

z\,\mathbf{L}_\alpha(z) = z\,\mathbf{L}_{\alpha-2}(z) - 2(\alpha-1)\,\mathbf{L}_{\alpha-1}(z) - \frac{2^{1-\alpha} z^{\alpha}}{\sqrt{\pi}\,\Gamma\left(\alpha+\frac{1}{2}\right)}

for the modified Struve function $\mathbf{L}_\alpha$ of order $\alpha$ ([9], Sec. 11.4.25). Together with the recurrence formula (44) for the modified Bessel function of the second kind and order $\alpha$, we obtain

z\,\mathbf{L}_{\alpha}(z)\,K_{\alpha-1}(z) = z\,\mathbf{L}_{\alpha-2}(z)\,K_{\alpha-1}(z) - 2(\alpha-1)\,\mathbf{L}_{\alpha-1}(z)\,K_{\alpha-1}(z) - \frac{2^{1-\alpha}z^{\alpha}}{\sqrt{\pi}\,\Gamma\left(\alpha+\frac{1}{2}\right)}\,K_{\alpha-1}(z), (47)

z\,K_{\alpha}(z)\,\mathbf{L}_{\alpha-1}(z) = z\,K_{\alpha-2}(z)\,\mathbf{L}_{\alpha-1}(z) + 2(\alpha-1)\,K_{\alpha-1}(z)\,\mathbf{L}_{\alpha-1}(z). (48)

Plugging (47) and (48) into (38) for $\alpha = \frac{r-1}{2}+k$ yields for $k \ge 1$

D_k(z) = \frac{z}{2\varrho_r}\left[ K_{\frac{r-1}{2}+k-1}\!\left(\frac{z}{\varrho_r}\right) \mathbf{L}_{\frac{r-3}{2}+k-1}\!\left(\frac{z}{\varrho_r}\right) + K_{\frac{r-3}{2}+k-1}\!\left(\frac{z}{\varrho_r}\right) \mathbf{L}_{\frac{r-1}{2}+k-1}\!\left(\frac{z}{\varrho_r}\right) \right] - \frac{1}{\sqrt{\pi}\,\Gamma\left(\frac{r}{2}+k\right)} \left(\frac{z}{2\varrho_r}\right)^{\frac{r-1}{2}+k} K_{\frac{r-1}{2}+k-1}\!\left(\frac{z}{\varrho_r}\right).

Together with (38), the identity $\Gamma\left(\frac{r}{2}+k\right) = \left(\frac{r}{2}+k-1\right)\Gamma\left(\frac{r}{2}+k-1\right)$, and the definition of the function $U_k(\cdot)$ in (37), we obtain the recurrence formula (46) for $D_k(z)$ if $z>0$ and $k \ge 1$. □
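Lemma 3 can be checked numerically in the same spirit; the sketch below (our own illustration, assuming SciPy's `modstruve` for the modified Struve function $\mathbf{L}_\nu$ and a dimension $r \ge 3$ so that all Struve orders are nonnegative) compares $D_k$ evaluated from the definition (38) with the recurrence (46):

```python
import numpy as np
from scipy.special import kv, modstruve, gammaln

def U(k, z, r):
    """U_k(z) from (37)."""
    a = (r - 1) / 2 + k
    return kv(a, z) * np.exp(a * np.log(z / 2) - gammaln(r / 2 + k))

def D(k, z, r, rho_r):
    """D_k(z) from (38); modstruve(v, x) is the modified Struve function L_v(x)."""
    a = (r - 1) / 2 + k
    u = z / rho_r
    return z / (2 * rho_r) * (kv(a, u) * modstruve(a - 1, u)
                              + kv(a - 1, u) * modstruve(a, u))

r, rho_r, z = 4, 0.6, 0.9
for k in range(1, 6):
    lhs = D(k, z, r, rho_r)
    rhs = (D(k - 1, z, r, rho_r)
           - (z / rho_r) * U(k - 1, z / rho_r, r)
             / (2 * np.sqrt(np.pi) * (r / 2 + k - 1)))
    assert abs(lhs - rhs) < 1e-10                # recurrence (46) holds
```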

Lemma 4

(Recursive formula of the coefficient $\delta_k$). The coefficient $\delta_k$ defined by (39) satisfies for all $k \in \mathbb{N}_0$ the recurrence formula

\delta_{k+1} = \frac{1}{k+1} \sum_{j=1}^{k+1} j\,\gamma_j\,\delta_{k+1-j}, (49)

where δ0=1 and

\gamma_j = \sum_{i=1}^{r-1} \frac{1}{2j} \left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)^{j}. (50)
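To illustrate Lemma 4, the following sketch (our own, not the GitLab implementation [26]; it assumes the canonical correlations are ordered so that $\varrho_r$ is the smallest) computes $\delta_k$ once by brute force from the multinomial sum (39) and once via the recurrence (49), and verifies that the two agree:

```python
from itertools import product as iproduct
from math import factorial

def deltas_recursive(rho, K):
    """delta_0..delta_K via the recurrence (49); rho sorted descending, rho[-1] = rho_r."""
    rr = rho[-1]
    g = [sum((1 - rr**2 / ri**2)**j / (2 * j) for ri in rho[:-1])   # gamma_j, (50)
         for j in range(1, K + 1)]
    d = [1.0]                                                       # delta_0 = 1
    for k in range(K):
        d.append(sum(j * g[j - 1] * d[k + 1 - j] for j in range(1, k + 2)) / (k + 1))
    return d

def delta_direct(rho, k):
    """delta_k via the multinomial sum (39) (exponential in r; for cross-checking only)."""
    rr, rest = rho[-1], rho[:-1]
    total = 0.0
    for ks in iproduct(range(k + 1), repeat=len(rest)):
        if sum(ks) != k:
            continue
        term = 1.0
        for ki, ri in zip(ks, rest):
            term *= factorial(2 * ki) / (factorial(ki)**2 * 4**ki) * (1 - rr**2 / ri**2)**ki
        total += term
    return total

rho = [0.9, 0.7, 0.4]                      # rho_r = 0.4 is the smallest
d = deltas_recursive(rho, 6)
for k in range(7):
    assert abs(d[k] - delta_direct(rho, k)) < 1e-12
```

The recursive route needs only $O(K^2)$ operations, whereas the direct sum grows exponentially in $r$.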

For the derivation of Lemma 4, we use an adapted version of the method of Moschopoulos [27] and the following auxiliary result.

Lemma 5.

For $k \in \mathbb{N}_0$, let $g$ be a real univariate $(k+1)$-times differentiable function. Then, we have the following recurrence relation for the $(k+1)$-th derivative of the composite function $h = \exp \circ g$:

h^{(k+1)} = \sum_{j=1}^{k+1} \binom{k}{j-1}\, g^{(j)}\, h^{(k-j+1)}, (51)

where $f^{(i)}$ denotes the $i$-th derivative of the function $f$, with $f^{(0)} = f$.

Proof. 

We prove the assertion of Lemma 5 by induction over k. First, consider the base case for k=0. In this case, formula (51) gives

h^{(1)} = g^{(1)}\,h,

which is easily seen to be true.

Assuming formula (51) holds for $h^{(k)}$, we continue with the case $k+1$. Application of the product rule leads to

h^{(k+1)} = \left(h^{(k)}\right)^{(1)} = \left(\sum_{j=1}^{k} \binom{k-1}{j-1} g^{(j)} h^{(k-j)}\right)^{(1)} = \sum_{j=1}^{k} \binom{k-1}{j-1} g^{(j+1)} h^{(k-j)} + \sum_{j=1}^{k} \binom{k-1}{j-1} g^{(j)} h^{(k-j+1)}.

Replacing $j+1$ by $j$ in the first sum gives

h^{(k+1)} = \sum_{j=2}^{k+1} \binom{k-1}{j-2} g^{(j)} h^{(k-j+1)} + \sum_{j=1}^{k} \binom{k-1}{j-1} g^{(j)} h^{(k-j+1)}.

With this representation and the identity

\binom{k-1}{j-2} + \binom{k-1}{j-1} = \binom{k}{j-1},

we finally have

h^{(k+1)} = g^{(1)} h^{(k)} + \sum_{j=2}^{k} \left[\binom{k-1}{j-1} + \binom{k-1}{j-2}\right] g^{(j)} h^{(k-j+1)} + g^{(k+1)} h = \binom{k}{0} g^{(1)} h^{(k)} + \sum_{j=2}^{k} \binom{k}{j-1} g^{(j)} h^{(k-j+1)} + \binom{k}{k} g^{(k+1)} h = \sum_{j=1}^{k+1} \binom{k}{j-1} g^{(j)} h^{(k-j+1)}.

This completes the proof of Lemma 5. □

Proof of Lemma 4.

To prove the recurrence formula (49), we consider the characteristic function

\varphi_{\tilde{\nu}}(t) = \prod_{i=1}^{r} \left(1+\varrho_i^2 t^2\right)^{-\frac{1}{2}}, \quad t \in \mathbb{R}, (52)

of the random variable $\tilde{\nu}$ introduced in Lemma 1. On the one hand, the series representation of $\varphi_{\tilde{\nu}}$ given in (28) in the proof of Theorem 1 can be rewritten as follows using the coefficient $\delta_k$ defined in (39):

\varphi_{\tilde{\nu}}(t) = \left(1+\varrho_r^2 t^2\right)^{-\frac{r}{2}} \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \sum_{\ell=0}^{\infty} \delta_\ell \left(1+\varrho_r^2 t^2\right)^{-\ell}, \quad t \in \mathbb{R}. (53)

On the other hand, recall the expansion of $\left(1+\varrho_i^2 t^2\right)^{-\frac{1}{2}}$ given in (22), which together with (52) and an application of the natural logarithm yields the identity

\log\varphi_{\tilde{\nu}}(t) = \log\left[\left(1+\varrho_r^2 t^2\right)^{-\frac{r}{2}} \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i}\right] + \sum_{i=1}^{r-1} \log\left(1+\left(\frac{\varrho_r^2}{\varrho_i^2}-1\right)\frac{1}{1+\varrho_r^2 t^2}\right)^{-\frac{1}{2}}. (54)

Now consider the power series

\log(1+y) = \sum_{\ell=1}^{\infty} \frac{(-1)^{\ell+1}}{\ell}\, y^{\ell}, (55)

which is absolutely convergent for |y|<1. With the same arguments as in the proof of Theorem 1, in particular due to (26), we can apply the series expansion (55) to the second term on the right-hand side of (54) to obtain the absolutely convergent series representation

\log\varphi_{\tilde{\nu}}(t) = \log\left[\left(1+\varrho_r^2 t^2\right)^{-\frac{r}{2}} \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i}\right] + \sum_{\ell=1}^{\infty} \gamma_\ell \left(1+\varrho_r^2 t^2\right)^{-\ell}, (56)

where we have further used the definition of $\gamma_\ell$ given in (50). Applying the exponential function to both sides of (56) then yields the following expression for the characteristic function $\varphi_{\tilde{\nu}}$:

\varphi_{\tilde{\nu}}(t) = \left(1+\varrho_r^2 t^2\right)^{-\frac{r}{2}} \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \exp\left(\sum_{\ell=1}^{\infty} \gamma_\ell \left(1+\varrho_r^2 t^2\right)^{-\ell}\right). (57)

Comparing (53) and (57) yields the identity

\sum_{\ell=0}^{\infty} \delta_\ell \left(1+\varrho_r^2 t^2\right)^{-\ell} = \exp\left(\sum_{\ell=1}^{\infty} \gamma_\ell \left(1+\varrho_r^2 t^2\right)^{-\ell}\right). (58)

We now define $x = \left(1+\varrho_r^2 t^2\right)^{-1}$ and take the $(k+1)$-th derivative w. r. t. $x$ on both sides of (58), using the identity

\frac{d^m}{dx^m} \sum_{\ell=0}^{\infty} a_\ell x^{\ell} = \frac{d^m}{dx^m} \sum_{\ell=1}^{\infty} a_\ell x^{\ell} = \sum_{\ell=m}^{\infty} \frac{\ell!}{(\ell-m)!}\, a_\ell\, x^{\ell-m} (59)

for the $m$-th derivative ($m \ge 1$) of a power series $\sum_{\ell=0}^{\infty} a_\ell x^{\ell}$. For the left-hand side of (58), we obtain

\frac{d^{k+1}}{dx^{k+1}} \sum_{\ell=0}^{\infty} \delta_\ell x^{\ell} = \sum_{\ell=k+1}^{\infty} \frac{\ell!}{(\ell-k-1)!}\, \delta_\ell\, x^{\ell-k-1}. (60)

For the right-hand side of (58), we obtain

\frac{d^{k+1}}{dx^{k+1}} \exp\left(\sum_{\ell=1}^{\infty} \gamma_\ell x^{\ell}\right) = \sum_{j=1}^{k+1} \binom{k}{j-1} \left[\frac{d^{j}}{dx^{j}} \sum_{\ell=1}^{\infty} \gamma_\ell x^{\ell}\right] \left[\frac{d^{k-j+1}}{dx^{k-j+1}} \exp\left(\sum_{\ell=1}^{\infty} \gamma_\ell x^{\ell}\right)\right] = \sum_{j=1}^{k+1} \binom{k}{j-1} \left[\frac{d^{j}}{dx^{j}} \sum_{\ell=1}^{\infty} \gamma_\ell x^{\ell}\right] \left[\frac{d^{k-j+1}}{dx^{k-j+1}} \sum_{\ell=0}^{\infty} \delta_\ell x^{\ell}\right] = \sum_{j=1}^{k+1} \binom{k}{j-1} \left[\sum_{\ell=j}^{\infty} \frac{\ell!\,\gamma_\ell}{(\ell-j)!}\, x^{\ell-j}\right] \left[\sum_{\ell=k+1-j}^{\infty} \frac{\ell!\,\delta_\ell}{(\ell-k+j-1)!}\, x^{\ell-k+j-1}\right], (61)

where we used Lemma 5 and the identities (58) and (59). From the equality

\frac{d^{k+1}}{dx^{k+1}} \sum_{\ell=0}^{\infty} \delta_\ell x^{\ell} = \frac{d^{k+1}}{dx^{k+1}} \exp\left(\sum_{\ell=1}^{\infty} \gamma_\ell x^{\ell}\right)

and the evaluation of the right-hand sides of (60) and (61), we obtain

(k+1)!\,\delta_{k+1}\,x^{0} + \ldots = \sum_{j=1}^{k+1} \binom{k}{j-1}\, j!\,\gamma_j\,(k+1-j)!\,\delta_{k+1-j}\,x^{0} + \ldots,

where the dots collect all terms containing the powers $x^1, x^2, \ldots$

Comparing the coefficients of $x^0$ finally yields

\delta_{k+1} = \frac{1}{(k+1)!} \sum_{j=1}^{k+1} \binom{k}{j-1}\, j!\,\gamma_j\,(k+1-j)!\,\delta_{k+1-j} = \frac{1}{(k+1)!} \sum_{j=1}^{k+1} \frac{k!}{(j-1)!\,(k+1-j)!}\, j!\,\gamma_j\,(k+1-j)!\,\delta_{k+1-j} = \frac{1}{k+1} \sum_{j=1}^{k+1} j\,\gamma_j\,\delta_{k+1-j}.

This completes the proof of Lemma 4. □

5.2. Finite Sum Approximations

The results of the previous Section 5.1 can be used for efficient numerical calculations in the following way. Consider

\hat{f}_{i(\xi;\eta)}(x,n) = \frac{1}{\varrho_r\sqrt{\pi}} \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \sum_{k=0}^{n} \delta_k\, U_k\!\left(\frac{|x-I(\xi;\eta)|}{\varrho_r}\right), \quad x \in \mathbb{R}, (62)

for $n \in \mathbb{N}_0$, i.e., the finite sum approximation of the PDF given in (40). To calculate $\hat{f}_{i(\xi;\eta)}(x,n)$, first calculate $U_0\!\left(|x-I(\xi;\eta)|/\varrho_r\right)$ and $U_1\!\left(|x-I(\xi;\eta)|/\varrho_r\right)$ using (37). Then, use the recurrence formulas (42) and (49) to calculate the remaining summands in (62). The great advantage of this approach is that only two evaluations of the modified Bessel function are required; the rest of the calculation relies on the efficient recursive formulas.
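The procedure just described can be sketched as follows (a minimal illustration assuming SciPy; the function name `pdf_hat` and the convention that the canonical correlations are sorted in descending order, so that $\varrho_r$ is the smallest, are ours and not taken from the GitLab implementation [26]). It evaluates (62) for $x \ne I(\xi;\eta)$ with exactly two Bessel evaluations:

```python
import numpy as np
from scipy.special import kv, gammaln

def pdf_hat(x, rho, I, n):
    """Finite sum approximation (62) of the information-density PDF.
    rho: canonical correlations (any order; sorted internally, rho_r = smallest),
    I: mutual information I(xi;eta), n >= 1: number of summands, x != I assumed."""
    assert n >= 1
    rho = np.sort(np.asarray(rho, dtype=float))[::-1]   # descending, rho[-1] = rho_r
    r, rr = len(rho), rho[-1]
    z = abs(x - I) / rr
    # U_0, U_1 from the definition (37), the rest via the recurrence (42)
    U = np.empty(n + 1)
    for k in (0, 1):
        a = (r - 1) / 2 + k
        U[k] = kv(a, z) * np.exp(a * np.log(z / 2) - gammaln(r / 2 + k))
    for k in range(2, n + 1):
        U[k] = (z**2 / ((r + 2*k - 2) * (r + 2*k - 4)) * U[k - 2]
                + (r + 2*k - 3) / (r + 2*k - 2) * U[k - 1])
    # delta_k via the recurrence (49) with gamma_j from (50)
    g = [sum((1 - rr**2 / ri**2)**j / (2 * j) for ri in rho[:-1]) for j in range(1, n + 1)]
    d = [1.0]
    for k in range(n):
        d.append(sum(j * g[j - 1] * d[k + 1 - j] for j in range(1, k + 2)) / (k + 1))
    pref = np.prod(rr / rho[:-1]) / (rr * np.sqrt(np.pi))
    return pref * sum(dk * Uk for dk, Uk in zip(d, U))
```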

Similarly, consider

\hat{F}_{i(\xi;\eta)}(x,n) = \begin{cases} \frac{1}{2}-\hat{V}\left(I(\xi;\eta)-x,\,n\right) & \text{if } x \le I(\xi;\eta),\\ \frac{1}{2}+\hat{V}\left(x-I(\xi;\eta),\,n\right) & \text{if } x > I(\xi;\eta), \end{cases} (63)

with

\hat{V}(z,n) = \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \sum_{k=0}^{n} \delta_k\, D_k(z), \quad z \ge 0, (64)

for $n \in \mathbb{N}_0$, i.e., the finite sum approximation of the alternative representation of the CDF of the information density, where $\hat{V}(z,n)$ is the finite sum approximation of the function $V(\cdot)$ given in (41). To calculate $\hat{F}_{i(\xi;\eta)}(x,n)$, first calculate $D_0(z)$, $U_0(z/\varrho_r)$, and $U_1(z/\varrho_r)$ for $z=I(\xi;\eta)-x$ or $z=x-I(\xi;\eta)$ using (37) and (38). Then, use the recurrence formulas (42), (46), and (49) to calculate the remaining summands in (64). This approach requires only three evaluations of modified Bessel and Struve functions, resulting in efficient numerical calculations also for the CDF of the information density.
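A corresponding sketch for the CDF (again our own illustration under the same assumptions as before; `modstruve` is SciPy's modified Struve function $\mathbf{L}_\nu$, and $r \ge 3$ is assumed so that all Struve orders are nonnegative):

```python
import numpy as np
from scipy.special import kv, modstruve, gammaln

def cdf_hat(x, rho, I, n):
    """Finite sum approximation (63)-(64) of the information-density CDF.
    rho: canonical correlations (sorted internally, rho_r = smallest),
    I: mutual information, n >= 1: number of summands. Moderate |x - I| assumed
    (very large arguments would overflow modstruve)."""
    assert n >= 1
    rho = np.sort(np.asarray(rho, dtype=float))[::-1]
    r, rr = len(rho), rho[-1]
    z = abs(x - I)
    if z == 0.0:
        return 0.5                      # V_hat(0, n) = 0 since D_k(0) = 0
    u = z / rr
    a = (r - 1) / 2
    # U_0, U_1 from (37); remaining U_k via (42)
    U = np.empty(n + 1)
    for k in (0, 1):
        ak = a + k
        U[k] = kv(ak, u) * np.exp(ak * np.log(u / 2) - gammaln(r / 2 + k))
    for k in range(2, n + 1):
        U[k] = (u**2 / ((r + 2*k - 2) * (r + 2*k - 4)) * U[k - 2]
                + (r + 2*k - 3) / (r + 2*k - 2) * U[k - 1])
    # D_0 from (38); remaining D_k via (46)
    D = np.empty(n + 1)
    D[0] = z / (2 * rr) * (kv(a, u) * modstruve(a - 1, u)
                           + kv(a - 1, u) * modstruve(a, u))
    for k in range(1, n + 1):
        D[k] = D[k - 1] - u * U[k - 1] / (2 * np.sqrt(np.pi) * (r / 2 + k - 1))
    # delta_k via (49)
    g = [sum((1 - rr**2 / ri**2)**j / (2 * j) for ri in rho[:-1]) for j in range(1, n + 1)]
    d = [1.0]
    for k in range(n):
        d.append(sum(j * g[j - 1] * d[k + 1 - j] for j in range(1, k + 2)) / (k + 1))
    V = np.prod(rr / rho[:-1]) * sum(dk * Dk for dk, Dk in zip(d, D))
    return 0.5 - V if x <= I else 0.5 + V
```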

The following theorem provides suitable bounds to evaluate and control the error related to the introduced finite sum approximations.

Theorem 4

(Bounds of the approximation error for the alternative representation of PDF and CDF). For the finite sum approximations in (62)–(64) of the alternative representation of the PDF and CDF of the information density as given in Proposition 2, we have for $n \in \mathbb{N}$ summands the error bounds

\left|f_{i(\xi;\eta)}(x) - \hat{f}_{i(\xi;\eta)}(x,n)\right| \le \frac{\Gamma\left(\frac{r-1}{2}+n\right)}{2\varrho_r\sqrt{\pi}\,\Gamma\left(\frac{r}{2}+n\right)} \left(1 - \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \sum_{k=0}^{n}\delta_k\right), \quad x \in \mathbb{R}, (65)

and

\left|V(z) - \hat{V}(z,n)\right| \le \frac{1}{2} \left(1 - \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \sum_{k=0}^{n}\delta_k\right), \quad z \ge 0. (66)

Proof. 

From the CDF given in Corollary 1 for the special case where all canonical correlations are equal, we can conclude that the function

z \mapsto z\left[K_\alpha(z)\,\mathbf{L}_{\alpha-1}(z) + K_{\alpha-1}(z)\,\mathbf{L}_{\alpha}(z)\right], \quad z \ge 0, (67)

is monotonically increasing for all $\alpha = (j-1)/2$, $j \in \mathbb{N}$, and that furthermore

\lim_{z\to\infty} z\left[K_\alpha(z)\,\mathbf{L}_{\alpha-1}(z) + K_{\alpha-1}(z)\,\mathbf{L}_{\alpha}(z)\right] = 1 (68)

holds. Using (68), we obtain from (4)

\lim_{z\to\infty} 2V(z) = \sum_{k_1=0}^{\infty} \sum_{k_2=0}^{\infty} \cdots \sum_{k_{r-1}=0}^{\infty} \prod_{i=1}^{r-1} \frac{\varrho_r}{\varrho_i}\, \frac{(2k_i)!}{(k_i!)^2\, 4^{k_i}} \left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)^{k_i}

by exchanging the limit and the summation, which is justified by the monotone convergence theorem. Due to the properties of the CDF, we have $\lim_{z\to\infty} 2V(z) = 1$, which implies

\prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \sum_{k=0}^{\infty} \delta_k = \sum_{k_1=0}^{\infty} \sum_{k_2=0}^{\infty} \cdots \sum_{k_{r-1}=0}^{\infty} \prod_{i=1}^{r-1} \frac{\varrho_r}{\varrho_i}\, \frac{(2k_i)!}{(k_i!)^2\, 4^{k_i}} \left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)^{k_i} = 1, (69)

where the first equality follows from the definition of the coefficient δk in (39).

We now obtain with (41) and (64)

\left|V(z) - \hat{V}(z,n)\right| = \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \sum_{k=n+1}^{\infty} \delta_k\, D_k(z) \le \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \sum_{k=n+1}^{\infty} \frac{1}{2}\,\delta_k.

The inequality follows from the definition of the function Dk(·) in (38), the monotonicity of the function in (67), and from (68). Then, (66) follows from (69).

Similarly, we obtain with (40) and (62)

\left|f_{i(\xi;\eta)}(x) - \hat{f}_{i(\xi;\eta)}(x,n)\right| = \frac{1}{\varrho_r\sqrt{\pi}} \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \sum_{k=n+1}^{\infty} \delta_k\, U_k\!\left(\frac{|x-I(\xi;\eta)|}{\varrho_r}\right) \le \frac{1}{\varrho_r\sqrt{\pi}} \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \sum_{k=n+1}^{\infty} \delta_k\, \frac{\Gamma\left(\frac{r-1}{2}+n\right)}{2\,\Gamma\left(\frac{r}{2}+n\right)}.

The inequality follows from the definition of the function $U_k(\cdot)$, Proposition 1, and the fact that $\Gamma\left(\frac{r-1}{2}+k\right)\big/\Gamma\left(\frac{r}{2}+k\right)$ is monotonically decreasing in $k \in \mathbb{N}_0$. Then, (65) follows from (69). □

Remark 1.

Note that the bound in (65) can be further simplified using the inequality $\Gamma(\alpha)/\Gamma\left(\alpha+\frac{1}{2}\right) \le \sqrt{\pi}$. Further note that the derived error bounds are uniform in the sense that they depend only on the parameters of the given Gaussian distribution and the number of summands considered. As can be seen from (69), the bounds converge to zero as the number $n$ of summands increases.
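In practice, one can fix a target accuracy first and then grow $n$ until the bound drops below it. A sketch using only the standard library (the function name and loop structure are our own, not the GitLab implementation [26]):

```python
from math import lgamma, exp, sqrt, pi

def n_for_tolerance(rho, eps, which="cdf", n_max=5000):
    """Smallest n for which the uniform error bound (66) ("cdf") or (65) ("pdf")
    falls below eps; rho are the canonical correlations."""
    rho = sorted(rho, reverse=True)           # rho[-1] = rho_r, the smallest
    r, rr = len(rho), rho[-1]
    pref = 1.0
    for ri in rho[:-1]:
        pref *= rr / ri
    c = [1 - rr**2 / ri**2 for ri in rho[:-1]]
    g, d = [], [1.0]
    S = pref                                  # pref * sum_{k<=n} delta_k
    for n in range(1, n_max + 1):
        g.append(sum(ci**n for ci in c) / (2 * n))                           # gamma_n, (50)
        d.append(sum(j * g[j - 1] * d[n - j] for j in range(1, n + 1)) / n)  # delta_n, (49)
        S += pref * d[n]
        if which == "cdf":
            bound = 0.5 * (1.0 - S)           # right-hand side of (66)
        else:
            a = (r - 1) / 2 + n               # right-hand side of (65), via lgamma
            bound = exp(lgamma(a) - lgamma(r / 2 + n)) / (2 * rr * sqrt(pi)) * (1.0 - S)
        if bound < eps:
            return n
    raise RuntimeError("n_max too small for the requested tolerance")
```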

Remark 2

(Relation to Bell polynomials). Interestingly, the coefficient $\delta_k$ can be expressed for all $k \in \mathbb{N}$ in the following form:

\delta_k = \frac{B_k\left(\gamma_1,\, 2\gamma_2,\, 6\gamma_3,\, \ldots,\, k!\,\gamma_k\right)}{k!},

where $\gamma_j$ is defined in (50) and $B_k$ denotes the complete Bell polynomial of order $k$ ([28], Sec. 3.3). Even though this connection to the Bell polynomials provides an explicit formula for $\delta_k$, the recursive formula given in Lemma 4 is more efficient for numerical calculations.
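The identity can be checked numerically with a self-contained implementation of the complete Bell polynomials via their standard recurrence $B_{m+1} = \sum_{i=0}^{m}\binom{m}{i}\,B_{m-i}\,x_{i+1}$ (our own sketch, using the argument convention $x_j = j!\,\gamma_j$ from the remark):

```python
from math import comb, factorial

def complete_bell(xs):
    """Complete Bell polynomials B_0..B_n evaluated at xs = [x_1, ..., x_n],
    via B_{m+1} = sum_i C(m, i) B_{m-i} x_{i+1}."""
    B = [1.0]
    for m in range(len(xs)):
        B.append(sum(comb(m, i) * B[m - i] * xs[i] for i in range(m + 1)))
    return B

rho = [0.9, 0.6, 0.5, 0.2]                 # rho_r = 0.2 is the smallest
rr, K = rho[-1], 8
g = [sum((1 - rr**2 / ri**2)**j / (2 * j) for ri in rho[:-1]) for j in range(1, K + 1)]
# delta_k via the recurrence of Lemma 4
d = [1.0]
for k in range(K):
    d.append(sum(j * g[j - 1] * d[k + 1 - j] for j in range(1, k + 2)) / (k + 1))
# delta_k via the Bell-polynomial identity of Remark 2
B = complete_bell([factorial(j) * g[j - 1] for j in range(1, K + 1)])
for k in range(K + 1):
    assert abs(d[k] - B[k] / factorial(k)) < 1e-9 * max(1.0, abs(d[k]))
```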

6. Numerical Examples and Illustrations

We illustrate the results of this paper with some examples, all of which can be verified with the Python implementation publicly available on GitLab [26].

Equal canonical correlations. First, we consider the special case of Corollary 1 in which all canonical correlations are equal. The PDF and CDF given by (6) and (7) are illustrated in Figures 1 and 2 in centered form, i.e., shifted by $I(\xi;\eta)$, for $r \in \{1,2,3,4,5\}$ and equal canonical correlations $\varrho_i = 0.9$, $i=1,\ldots,r$. In Figures 3 and 4, a fixed number of $r=5$ equal canonical correlations $\varrho_i \in \{0.1,0.2,0.5,0.7,0.9\}$, $i=1,\ldots,r$, is considered. When all canonical correlations are equal, the distribution of the information density $i(\xi;\eta)$ converges to a Gaussian distribution as $r \to \infty$ due to the central limit theorem. Figures 5 and 6 show, for $r \in \{5,10,20,40\}$ and equal canonical correlations $\varrho_i = 0.2$, $i=1,2,\ldots,r$, the PDF and CDF of the information density together with corresponding Gaussian approximations. The approximations are obtained by considering Gaussian distributions that have the same variance as the information density $i(\xi;\eta)$. Recall that the variance of the information density is given by (11), i.e., by the sum of the squared canonical correlations. The illustrations show that only for a large number of equal canonical correlations does the distribution of the information density become approximately Gaussian.

Figure 1. PDF $f_{i(\xi;\eta)-I(\xi;\eta)}$ for $r \in \{1,2,3,4,5\}$ equal canonical correlations $\varrho_i = 0.9$.

Figure 2. CDF $F_{i(\xi;\eta)-I(\xi;\eta)}$ for $r \in \{1,2,3,4,5\}$ equal canonical correlations $\varrho_i = 0.9$.

Figure 3. PDF $f_{i(\xi;\eta)-I(\xi;\eta)}$ for $r=5$ equal canonical correlations $\varrho_i \in \{0.1,0.2,0.5,0.7,0.9\}$.

Figure 4. CDF $F_{i(\xi;\eta)-I(\xi;\eta)}$ for $r=5$ equal canonical correlations $\varrho_i \in \{0.1,0.2,0.5,0.7,0.9\}$.

Figure 5. PDF $f_{i(\xi;\eta)-I(\xi;\eta)}$ for $r \in \{5,10,20,40\}$ equal canonical correlations $\varrho_i = 0.2$ vs. Gaussian approximation.

Figure 6. CDF $F_{i(\xi;\eta)-I(\xi;\eta)}$ for $r \in \{5,10,20,40\}$ equal canonical correlations $\varrho_i = 0.2$ vs. Gaussian approximation.

Different canonical correlations. To illustrate the case with different canonical correlations, let us consider two more examples.

(i) First, assume that the random vectors $\xi=(\xi_1,\xi_2,\ldots,\xi_p)$ and $\eta=(\eta_1,\eta_2,\ldots,\eta_q)$ have equal dimensions, i.e., $p=q$, and are related by

(\eta_1,\eta_2,\ldots,\eta_p) = (\xi_1+\zeta_1,\; \xi_2+\zeta_2,\; \ldots,\; \xi_p+\zeta_p),

where $\xi=(\xi_1,\xi_2,\ldots,\xi_p)$ and $\zeta=(\zeta_1,\zeta_2,\ldots,\zeta_p)$ are zero-mean Gaussian random vectors, independent of each other and with covariance matrices

R_\xi = \left(\rho^{|i-j|}\right)_{i,j=1}^{p} \quad\text{and}\quad R_\zeta = \sigma_z^2\, I_p,

for parameters $0 < |\rho| < 1$ and $\sigma_z^2 > 0$, where $I_p$ denotes the $p \times p$ identity matrix. The covariance matrix of the Gaussian random vector $(\xi_1,\ldots,\xi_p,\eta_1,\ldots,\eta_p)$, which is the basis of the canonical correlation analysis, is given by

\begin{pmatrix} R_\xi & R_{\xi\eta} \\ R_{\xi\eta}^{\top} & R_\eta \end{pmatrix} = \begin{pmatrix} R_\xi & R_\xi \\ R_\xi & R_\xi + R_\zeta \end{pmatrix}.

The specified situation corresponds to a discrete-time additive noise channel, where a stationary first-order Markov-Gaussian input process is corrupted by a stationary additive white Gaussian noise process. In this setting, a block of p consecutive input and output symbols is considered.

For given parameter values $\rho$ and $\sigma_z^2$, the canonical correlations can be calculated numerically with the method described in Section 3. However, the example at hand even allows the derivation of explicit formulas for the canonical correlations. Evaluating the approach of Section 3 analytically yields

\varrho_i(\rho,\sigma_z^2) = \sqrt{\frac{\lambda_i}{\lambda_i+\sigma_z^2}} \quad\text{with}\quad \lambda_i = \frac{1-\rho^2}{1-2\rho\cos(\theta_i)+\rho^2}, \quad i=1,2,\ldots,r=p, (70)

where $\theta_1,\theta_2,\ldots,\theta_r$ are the zeros of the function

g(\theta) = \sin((r+1)\theta) - 2\rho\sin(r\theta) + \rho^2\sin((r-1)\theta), \quad \theta \in (0,\pi).

In this representation, $\lambda_1,\lambda_2,\ldots,\lambda_r$ denote the eigenvalues of the covariance matrix $R_\xi = \left(\rho^{|i-j|}\right)_{i,j=1}^{p}$ as derived in [29], Sec. 5.3.
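The two routes to the canonical correlations can be cross-checked numerically. The sketch below (our own illustration; it assumes SciPy's `brentq` root finder and the relation $\varrho_i = \sqrt{\lambda_i/(\lambda_i+\sigma_z^2)}$, which follows from the fact that the squared canonical correlations of $(\xi, \xi+\zeta)$ are the eigenvalues of $(R_\xi+\sigma_z^2 I_p)^{-1}R_\xi$) compares the direct eigenvalue route with the closed-form route via the zeros of $g$:

```python
import numpy as np
from scipy.optimize import brentq

def can_corr_eig(rho, var_z, p):
    """Canonical correlations via the eigenvalues of R_xi (direct numerical route)."""
    R = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    lam = np.linalg.eigvalsh(R)
    return np.sort(np.sqrt(lam / (lam + var_z)))[::-1]

def can_corr_formula(rho, var_z, p):
    """Canonical correlations via (70), locating the zeros of g on (0, pi)."""
    def g(t):
        return np.sin((p + 1) * t) - 2 * rho * np.sin(p * t) + rho**2 * np.sin((p - 1) * t)
    grid = np.linspace(1e-9, np.pi - 1e-9, 200 * p)
    vals = g(grid)
    thetas = [brentq(g, lo, hi) for lo, hi, vlo, vhi
              in zip(grid[:-1], grid[1:], vals[:-1], vals[1:]) if vlo * vhi < 0]
    lam = (1 - rho**2) / (1 - 2 * rho * np.cos(np.array(thetas)) + rho**2)
    return np.sort(np.sqrt(lam / (lam + var_z)))[::-1]
```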

As numerical examples, Figures 7 and 8 show the approximated PDF $\hat{f}_{i(\xi;\eta)-I(\xi;\eta)}(\cdot,n)$ and CDF $\hat{F}_{i(\xi;\eta)-I(\xi;\eta)}(\cdot,n)$ for $p=r \in \{5,10,20,40\}$ and the parameter values $\rho=0.9$ and $\sigma_z^2=10$, computed using the finite sums (62) and (64). The bounds of the approximation error given in Theorem 4 are chosen to be smaller than $10^{-3}$ to obtain a high precision of the plotted curves. The number $n$ of summands required in (62) and (64) to achieve these error bounds for $r \in \{5,10,20,40\}$ is $n \in \{217,333,462,649\}$ for the PDF and $n \in \{282,444,618,847\}$ for the CDF. For this example, the distribution of the information density $i(\xi;\eta)$ converges to a Gaussian distribution as $r \to \infty$. However, Figures 7 and 8 show that, even for $r=40$, there is still a significant gap between the exact distribution and the corresponding Gaussian approximation.

Figure 7. Approximated PDF $\hat{f}_{i(\xi;\eta)-I(\xi;\eta)}(\cdot,n)$ for $r \in \{5,10,20,40\}$ canonical correlations $\varrho_i(\rho,\sigma_z^2)$ given in (70) for $\rho=0.9$ and $\sigma_z^2=10$ (approximation error $<10^{-3}$) vs. Gaussian approximation ($r \in \{20,40\}$).

Figure 8. Approximated CDF $\hat{F}_{i(\xi;\eta)-I(\xi;\eta)}(\cdot,n)$ for $r \in \{5,10,20,40\}$ canonical correlations $\varrho_i(\rho,\sigma_z^2)$ given in (70) for $\rho=0.9$ and $\sigma_z^2=10$ (approximation error $<10^{-3}$) vs. Gaussian approximation ($r \in \{20,40\}$).

(ii) As a second example with different canonical correlations, let us consider the sequence $\{\varrho_1(T),\varrho_2(T),\ldots,\varrho_r(T)\}$ with

\varrho_i(T) = \frac{T^2}{T^2+\pi^2\left(i-\frac{1}{2}\right)^2}, \quad i=1,2,\ldots,r. (71)

These canonical correlations are related to the information density of a continuous-time additive white Gaussian noise channel confined to a finite time interval $[0,T]$ with a Brownian motion as input signal (see, e.g., Huffmann [30], Sec. 8.1 for more details). Figures 9 and 10 show the approximated PDF $\hat{f}_{i(\xi;\eta)-I(\xi;\eta)}(\cdot,n)$ and CDF $\hat{F}_{i(\xi;\eta)-I(\xi;\eta)}(\cdot,n)$ for $r \in \{2,5,10,15\}$ and $T=1$, computed using the finite sums (62) and (64). The bounds of the approximation error given in Theorem 4 are chosen to be smaller than $10^{-2}$, such that no differences would be visible in the plotted curves by further lowering the approximation error. The number $n$ of summands required in (62) and (64) to achieve these error bounds for $r \in \{2,5,10,15\}$ is $n \in \{15,141,638,1688\}$ for the PDF and $n \in \{20,196,886,2071\}$ for the CDF. Choosing $r$ larger than 15 for the canonical correlations (71) with $T=1$ does not result in visible changes of the PDF and CDF compared to $r=15$. Together with Figures 9 and 10, this demonstrates that a Gaussian approximation is not valid for this example, even as $r \to \infty$.

Figure 9. Approximated PDF $\hat{f}_{i(\xi;\eta)-I(\xi;\eta)}(\cdot,n)$ for $r \in \{2,5,10,15\}$ canonical correlations $\varrho_i(T)$ given in (71) for $T=1$ (approximation error $<10^{-2}$) vs. Gaussian approximation ($r=15$).

Figure 10. Approximated CDF $\hat{F}_{i(\xi;\eta)-I(\xi;\eta)}(\cdot,n)$ for $r \in \{2,5,10,15\}$ canonical correlations $\varrho_i(T)$ given in (71) for $T=1$ (approximation error $<10^{-2}$) vs. Gaussian approximation ($r=15$).

Indeed, from Th. 9.6.1 and the comment above Eq. (9.6.45) in [8], one can conclude that, whenever the canonical correlations satisfy

\lim_{r\to\infty} \sum_{i=1}^{r} \varrho_i^2 < \infty,

then the distribution of the information density is not Gaussian.
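This criterion is easy to check for example (ii). Taking (71) at face value, the partial sums of the squared canonical correlations saturate almost immediately (a minimal sketch, our own illustration):

```python
import math

def sum_sq(T, r):
    """Partial sum of squared canonical correlations for example (ii), Eq. (71)."""
    return sum((T**2 / (T**2 + math.pi**2 * (i - 0.5)**2))**2 for i in range(1, r + 1))

# For T = 1 the partial sums converge, so the criterion applies and the
# information density has a non-Gaussian limit distribution.
s15, s10k = sum_sq(1.0, 15), sum_sq(1.0, 10000)
assert s10k - s15 < 1e-3          # tail beyond r = 15 is negligible
# In contrast, r equal correlations 0.2 give the sum 0.04 * r, which diverges,
# consistent with the Gaussian limit observed for equal canonical correlations.
```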

7. Summary of Contributions

We derived series representations of the PDF and CDF of the information density for arbitrary Gaussian random vectors as well as a general formula for the central moments using canonical correlation analysis. We provided simplified and closed-form expressions for important special cases, in particular when all canonical correlations are equal, and derived recurrence formulas and uniform error bounds for finite sum approximations of the general series representations. These approximations and recurrence formulas are suitable for efficient and arbitrarily accurate numerical calculations, where the approximation error can be easily controlled with the derived error bounds. Moreover, we provided examples showing the (in)validity of approximating the information density with a Gaussian random variable.

Author Contributions

J.E.W.H. and M.M. conceived this work, performed the analysis, validated the results, and wrote the manuscript. All authors have read and agreed to this version of the manuscript.

Data Availability Statement

An implementation in Python allowing efficient numerical calculations related to the main results of the paper is publicly available on GitLab: https://gitlab.com/infth/information-density (accessed on 24 June 2022).

Conflicts of Interest

The authors declare no conflict of interest.

Funding Statement

The work of M.M. was supported in part by the German Research Foundation (Deutsche Forschungsgemeinschaft) as part of Germany’s Excellence Strategy—EXC 2050/1—Project ID 390696704—Cluster of Excellence “Centre for Tactile Internet with Human-in-the-Loop” (CeTI) of Technische Universität Dresden. We acknowledge the open access publication funding granted by CeTI.

Footnotes

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Han T.S., Verdú S. Approximation Theory of Output Statistics. IEEE Trans. Inf. Theory. 1993;39:752–772. doi: 10.1109/18.256486. [DOI] [Google Scholar]
  • 2.Han T.S. Information-Spectrum Methods in Information Theory. Springer; Berlin/Heidelberg, Germany: 2003. [Google Scholar]
  • 3.Shannon C.E. Probability of Error for Optimal Codes in a Gaussian Channel. Bell Syst. Tech. J. 1959;38:611–659. doi: 10.1002/j.1538-7305.1959.tb03905.x. [DOI] [Google Scholar]
  • 4.Dobrushin R.L. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability. University of California Press; Berkeley, CA, USA: 1961. Mathematical Problems in the Shannon Theory of Optimal Coding of Information; pp. 211–252. Volume 1: Contributions to the Theory of Statistics. [Google Scholar]
  • 5.Strassen V. Transactions of the Third Prague Conference on Information Theory, Statistical Decision Functions, Random Processes (Held 1962) Czechoslovak Academy of Sciences; Prague, Czech Republic: 1964. Asymptotische Abschätzungen in Shannons Informationstheorie; pp. 689–723. [Google Scholar]
  • 6.Polyanskiy Y., Poor H.V., Verdú S. Channel Coding Rate in the Finite Blocklength Regime. IEEE Trans. Inf. Theory. 2010;56:2307–2359. doi: 10.1109/TIT.2010.2043769. [DOI] [Google Scholar]
  • 7.Durisi G., Koch T., Popovski P. Toward Massive, Ultrareliable, and Low-Latency Wireless Communication With Short Packets. Proc. IEEE. 2016;104:1711–1726. doi: 10.1109/JPROC.2016.2537298. [DOI] [Google Scholar]
  • 8.Pinsker M.S. Information and Information Stability of Random Variables and Processes. Holden-Day; San Francisco, CA, USA: 1964. [Google Scholar]
  • 9.Olver F.W.J., Lozier D.W., Boisvert R.F., Clark C.W., editors. NIST Handbook of Mathematical Functions. Cambridge University Press; Cambridge, UK: 2010. [Google Scholar]
  • 10.Mathai A.M. Storage Capacity of a Dam With Gamma Type Inputs. Ann. Inst. Stat. Math. 1982;34:591–597. doi: 10.1007/BF02481056. [DOI] [Google Scholar]
  • 11.Grad A., Solomon H. Distribution of Quadratic Forms and Some Applications. Ann. Math. Stat. 1955;26:464–477. doi: 10.1214/aoms/1177728491. [DOI] [Google Scholar]
  • 12.Kotz S., Johnson N.L., Boyd D.W. Series Representations of Distributions of Quadratic Forms in Normal Variables. I. Central Case. Ann. Math. Stat. 1967;38:823–837. doi: 10.1214/aoms/1177698877. [DOI] [Google Scholar]
  • 13.Huffmann J.E.W., Mittelbach M. On the Distribution of the Information Density of Gaussian Random Vectors: Explicit Formulas and Tight Approximations. Entropy. 2022;24:924. doi: 10.3390/e24070924. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Simon M.K. Probability Distributions Involving Gaussian Random Variables: A Handbook for Engineers and Scientists. Springer; Berlin/Heidelberg, Germany: 2006. [Google Scholar]
  • 15.Laneman J.N. On the Distribution of Mutual Information; Proceedings of the Workshop Information Theory and Its Applications (ITA); San Diego, CA, USA. 13 February 2006. [Google Scholar]
  • 16.Wu P., Jindal N. Coding Versus ARQ in Fading Channels: How Reliable Should the PHY Be? IEEE Trans. Commun. 2011;59:3363–3374. doi: 10.1109/TCOMM.2011.102011.100152. [DOI] [Google Scholar]
  • 17.Buckingham D., Valenti M.C. The Information-Outage Probability of Finite-Length Codes Over AWGN Channels; Proceedings of the 42nd Annual Conference on Information Sciences and Systems (CISS); Princeton, NJ, USA. 19–21 March 2008. [Google Scholar]
  • 18.Hotelling H. Relations Between Two Sets of Variates. Biometrika. 1936;28:321–377. doi: 10.1093/biomet/28.3-4.321. [DOI] [Google Scholar]
  • 19.Gelfand I.M., Yaglom A.M. AMS Translations, Series 2. Volume 12. AMS; Providence, RI, USA: 1959. Calculation of the Amount of Information About a Random Function Contained in Another Such Function; pp. 199–246. [Google Scholar]
  • 20.Härdle W.K., Simar L. Applied Multivariate Statistical Analysis. 4th ed. Springer; Berlin/Heidelberg, Germany: 2015. [Google Scholar]
  • 21.Koch I. Analysis of Multivariate and High-Dimensional Data. Cambridge University Press; Cambridge, UK: 2014. [Google Scholar]
  • 22.Timm N.H. Applied Multivariate Analysis. Springer; Berlin/Heidelberg, Germany: 2002. [Google Scholar]
  • 23.Ibragimov I.A., Rozanov Y.A. On the Connection Between Two Characteristics of Dependence of Gaussian Random Vectors. Theory Probab. Appl. 1970;15:295–299. doi: 10.1137/1115034. [DOI] [Google Scholar]
  • 24.Gradshteyn I.S., Ryzhik I.M. Table of Integrals, Series, and Products. 7th ed. Elsevier; Amsterdam, The Netherlands: 2007. [Google Scholar]
  • 25.Prudnikov A.P., Brychov Y.A., Marichev O.I. Integrals and Series, Volume 2: Special Functions. Gordon and Breach Science; New York, NY, USA: 1986. [Google Scholar]
  • 26.Huffmann J.E.W., Mittelbach M. Efficient Python Implementation to Numerically Calculate PDF, CDF, and Moments of the Information Density of Gaussian Random Vectors. 2021. Source code provided on GitLab. Available online: https://gitlab.com/infth/information-density (accessed on 24 June 2022).
  • 27.Moschopoulos P.G. The Distribution of the Sum of Independent Gamma Random Variables. Ann. Inst. Stat. Math. 1985;37:541–544. doi: 10.1007/BF02481123. [DOI] [Google Scholar]
  • 28.Comtet L. Advanced Combinatorics: The Art of Finite and Infinite Expansions. Revised and Enlarged ed. D. Reidel Publishing Company; Dordrecht, The Netherlands: 1974. [Google Scholar]
  • 29.Grenander U., Szegö G. Toeplitz Forms and Their Applications. University of California Press; Berkeley, CA, USA: 1958. [Google Scholar]
  • 30.Huffmann J.E.W. Diploma Thesis. Department of Electrical Engineering and Information Technology, Technische Universität Dresden; Dresden, Germany: 2021. [(accessed on 24 June 2022)]. Canonical Correlation and the Calculation of Information Measures for Infinite-Dimensional Distributions. Available online: https://nbn-resolving.org/urn:nbn:de:bsz:14-qucosa2-742541. [Google Scholar]
