Entropy. 2022 Jul 2;24(7):924. doi: 10.3390/e24070924

On the Distribution of the Information Density of Gaussian Random Vectors: Explicit Formulas and Tight Approximations

Jonathan E W Huffmann 1, Martin Mittelbach 2,*
Editor: Joaquín Abellán
PMCID: PMC9323744  PMID: 35885147

Abstract

Based on the canonical correlation analysis, we derive series representations of the probability density function (PDF) and the cumulative distribution function (CDF) of the information density of arbitrary Gaussian random vectors as well as a general formula to calculate the central moments. Using the general results, we give closed-form expressions of the PDF and CDF and explicit formulas of the central moments for important special cases. Furthermore, we derive recurrence formulas and tight approximations of the general series representations, which allow efficient numerical calculations with an arbitrarily high accuracy as demonstrated with an implementation in Python publicly available on GitLab. Finally, we discuss the (in)validity of Gaussian approximations of the information density.

Keywords: information density, information spectrum, probability density function, cumulative distribution function, central moments, Gaussian random vector, canonical correlation analysis

1. Introduction and Main Theorems

Let ξ and η be arbitrary random variables on an abstract probability space (Ω, F, P) such that the joint distribution P_ξη is absolutely continuous w. r. t. the product P_ξ ⊗ P_η of the marginal distributions P_ξ and P_η. If dP_ξη/d(P_ξ ⊗ P_η) denotes the Radon–Nikodym derivative of P_ξη w. r. t. P_ξ ⊗ P_η, then

i(\xi;\eta) = \log \frac{\mathrm{d}P_{\xi\eta}}{\mathrm{d}(P_\xi \otimes P_\eta)}(\xi,\eta)

is called the information density of ξ and η. The expectation E(i(ξ;η)) = I(ξ;η) of the information density, called mutual information, plays a key role in characterizing the asymptotic channel coding performance in terms of channel capacity. The non-asymptotic performance, however, is determined by the higher-order moments of the information density and its probability distribution. Achievability and converse bounds that allow a finite blocklength analysis of the optimum channel coding rate are closely related to the distribution function of the information density, also called the information spectrum by Han and Verdú [1,2]. Moreover, based on the variance of the information density, tight second-order finite blocklength approximations of the optimum code rate can be derived for various important channel models. The first work on a non-asymptotic information-theoretic analysis was published already in the early years of information theory by Shannon [3], Dobrushin [4], and Strassen [5], among others. Due to the seminal work of Polyanskiy et al. [6], considerable progress has been made in this area. The results of [6] on the one hand and the requirements of current and future wireless networks regarding latency and reliability on the other hand have stimulated significant new interest in this type of analysis (Durisi et al. [7]).

The information density i(ξ;η) in the case when ξ and η are jointly Gaussian is of special interest due to the prominent role of the Gaussian distribution. Let ξ = (ξ_1, ξ_2, …, ξ_p) and η = (η_1, η_2, …, η_q) be real-valued random vectors with nonsingular covariance matrices R_ξ and R_η and cross-covariance matrix R_ξη with rank r = rank(R_ξη). (For notational convenience, we write vectors as row vectors. However, in expressions where matrix or vector multiplications occur, we consider all vectors as column vectors.) Without loss of generality for the subsequent results, we assume the expectation of all random variables to be zero. If (ξ_1, ξ_2, …, ξ_p, η_1, η_2, …, η_q) is a Gaussian random vector, then Pinsker [8], Ch. 9.6, has shown that the distribution of the information density i(ξ;η) coincides with the distribution of the random variable

\nu = \frac{1}{2}\sum_{i=1}^{r} \varrho_i \bigl(\tilde{\xi}_i^2 - \tilde{\eta}_i^2\bigr) + I(\xi;\eta). (1)

In this representation, ξ̃_1, ξ̃_2, …, ξ̃_r, η̃_1, η̃_2, …, η̃_r are independent and identically distributed (i.i.d.) Gaussian random variables with zero mean and unit variance, and the mutual information I(ξ;η) in (1) has the form

I(\xi;\eta) = \frac{1}{2}\sum_{i=1}^{r} \log\frac{1}{1-\varrho_i^2}. (2)

Moreover, ϱ_1 ≥ ϱ_2 ≥ ⋯ ≥ ϱ_r > 0 denote the positive canonical correlations of ξ and η in descending order, which are obtained by a linear method called canonical correlation analysis that yields the maximum correlations between two sets of random variables (see Section 3). The rank r of the cross-covariance matrix R_ξη satisfies 0 ≤ r ≤ min{p, q}, and for r = 0 we have i(ξ;η) = 0 almost surely and I(ξ;η) = 0. This corresponds to P_ξη = P_ξ ⊗ P_η and the independence of ξ and η such that the resulting information density is deterministic. Throughout the rest of the paper, we exclude this degenerate case when the information density is considered and assume subsequently the setting and notation introduced above with r ≥ 1. As customary notation, we further write ℝ, ℕ_0, and ℕ to denote the set of real numbers, non-negative integers, and positive integers.
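As a sanity check of the representation in (1) and (2), the following Python sketch (our own minimal illustration, not part of the paper's GitLab implementation; the function name and the chosen correlations are ours) draws Monte Carlo samples of ν and compares the empirical mean with the mutual information:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_information_density(rho, n):
    # rho: positive canonical correlations, n: number of samples
    rho = np.asarray(rho, dtype=float)
    r = rho.size
    xi = rng.standard_normal((n, r))    # plays the role of the xi-tilde variables
    eta = rng.standard_normal((n, r))   # plays the role of the eta-tilde variables
    I = 0.5 * np.sum(np.log(1.0 / (1.0 - rho ** 2)))  # mutual information, Eq. (2)
    nu = 0.5 * (xi ** 2 - eta ** 2) @ rho + I         # Eq. (1)
    return nu, I

nu, I = sample_information_density([0.9, 0.5, 0.3], 200_000)
print(abs(nu.mean() - I))   # small: E(i(xi;eta)) = I(xi;eta)
```

The sample variance can likewise be compared with the sum of squared canonical correlations derived later in Section 2.2.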

Main contributions. Based on (1), we derive in Section 4 series representations of the probability density function (PDF) and the cumulative distribution function (CDF) as well as explicit general formulas for the central moments of the information density i(ξ;η) given subsequently in Theorems 1 to 3. The series representations are useful as they allow tight approximations with errors as low as desired by finite sums as shown in Section 5.2. Moreover, we derive recurrence formulas in Section 5.1 that allow efficient numerical calculations of the series representations in Theorems 1 and 2.

Theorem 1

(PDF of information density). The PDF f_{i(ξ;η)} of the information density i(ξ;η) is given by

f_{i(\xi;\eta)}(x) = \frac{1}{\varrho_r\sqrt{\pi}} \sum_{k_1=0}^{\infty}\sum_{k_2=0}^{\infty}\cdots\sum_{k_{r-1}=0}^{\infty} \left[\prod_{i=1}^{r-1} \frac{\varrho_r}{\varrho_i}\,\frac{(2k_i)!}{(k_i!)^2 4^{k_i}} \left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)^{\!k_i}\right] \times \frac{K_{\frac{r-1}{2}+k_1+k_2+\cdots+k_{r-1}}\!\left(\frac{|x-I(\xi;\eta)|}{\varrho_r}\right)}{\Gamma\!\left(\frac{r}{2}+k_1+k_2+\cdots+k_{r-1}\right)} \left(\frac{|x-I(\xi;\eta)|}{2\varrho_r}\right)^{\!\frac{r-1}{2}+k_1+k_2+\cdots+k_{r-1}}, \quad x \in \mathbb{R}\setminus\{I(\xi;\eta)\}, (3)

where Γ(·) denotes the gamma function [9], Sec. 5.2.1, and K_α(·) denotes the modified Bessel function of the second kind and order α [9], Sec. 10.25(ii). If r ≥ 2, then f_{i(ξ;η)}(x) is also well defined for x = I(ξ;η).

Theorem 2

(CDF of information density). The CDF F_{i(ξ;η)} of the information density i(ξ;η) is given by

F_{i(\xi;\eta)}(x) = \begin{cases} \frac{1}{2} - V\bigl(I(\xi;\eta)-x\bigr) & \text{if } x \le I(\xi;\eta) \\ \frac{1}{2} + V\bigl(x-I(\xi;\eta)\bigr) & \text{if } x > I(\xi;\eta), \end{cases}

with V(z) defined by

V(z) = \sum_{k_1=0}^{\infty}\sum_{k_2=0}^{\infty}\cdots\sum_{k_{r-1}=0}^{\infty} \left[\prod_{i=1}^{r-1} \frac{\varrho_r}{\varrho_i}\,\frac{(2k_i)!}{(k_i!)^2 4^{k_i}} \left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)^{\!k_i}\right] \frac{z}{2\varrho_r} \times \left[K_{\frac{r-1}{2}+k_1+k_2+\cdots+k_{r-1}}\!\left(\frac{z}{\varrho_r}\right) L_{\frac{r-3}{2}+k_1+k_2+\cdots+k_{r-1}}\!\left(\frac{z}{\varrho_r}\right) + K_{\frac{r-3}{2}+k_1+k_2+\cdots+k_{r-1}}\!\left(\frac{z}{\varrho_r}\right) L_{\frac{r-1}{2}+k_1+k_2+\cdots+k_{r-1}}\!\left(\frac{z}{\varrho_r}\right)\right], \quad z \ge 0, (4)

where L_α(·) denotes the modified Struve L function of order α [9], Sec. 11.2.

The method to obtain the result in Theorem 1 is adopted from Mathai [10], where a series representation of the PDF of the sum of independent gamma distributed random variables is derived. Previous work of Grad and Solomon [11] and Kotz et al. [12] goes in a similar direction as Mathai [10]; however, it is not directly applicable since only the restriction to positive series coefficients is considered there. Using Theorem 1, the series representation of the CDF of the information density in Theorem 2 is obtained. The details of the derivations of Theorems 1 and 2 are provided in Section 4.

Theorem 3

(Central moments of information density). The m-th central moment E([i(ξ;η) − I(ξ;η)]^m) of the information density i(ξ;η) is given by

E\bigl([i(\xi;\eta)-I(\xi;\eta)]^m\bigr) = \begin{cases} \displaystyle\sum_{(m_1,m_2,\ldots,m_r)\in K_{m,r}^{[2]}} m! \prod_{i=1}^{r} \frac{(2m_i)!}{4^{m_i}(m_i!)^2}\,\varrho_i^{2m_i} & \text{if } m = 2\tilde{m} \\ 0 & \text{if } m = 2\tilde{m}-1, \end{cases} (5)

for all m̃ ∈ ℕ, where K_{m,r}^{[2]} = {(m_1, m_2, …, m_r) ∈ ℕ_0^r : 2m_1 + 2m_2 + ⋯ + 2m_r = m}.

Pinsker [8], Eq. (9.6.17), provided a formula for the sum \sum_{i=1}^{r} E\bigl(\bigl[\tfrac{\varrho_i}{2}(\tilde{\xi}_i^2-\tilde{\eta}_i^2)\bigr]^m\bigr), which he called the "derived m-th central moment" of the information density, where ξ̃_i and η̃_i are given as in (1). These special moments coincide for m = 2 with the usual central moments considered in Theorem 3.

The rest of the paper is organized as follows: In Section 2, we discuss important special cases which allow simplified and explicit formulas. In Section 3, we provide some background on the canonical correlation analysis and its application to the calculation of the information density and mutual information for Gaussian random vectors. The proofs of the main Theorems 1 to 3 are given in Section 4. Recurrence formulas, finite sum approximations, and uniform bounds of the approximation error are derived in Section 5, which allow efficient and accurate numerical calculations of the PDF and CDF of the information density. Some examples and illustrations are provided in Section 6, where also the (in)validity of Gaussian approximations is discussed. Finally, Section 7 summarizes the paper. Note that a first version of this paper was published on arXiv as preprint [13].

2. Special Cases

2.1. Equal Canonical Correlations

A simple but important special case for which the series representations in Theorems 1 and 2 simplify to a single summand and the sum of products in Theorem 3 simplifies to a single product is considered in the following corollary.

Corollary 1

(PDF, CDF, and central moments of information density for equal canonical correlations). If all canonical correlations are equal, i.e.,

\varrho_1 = \varrho_2 = \cdots = \varrho_r,

then we have the following simplifications.

(i) The PDF fi(ξ;η) of the information density i(ξ;η) simplifies to

f_{i(\xi;\eta)}(x) = \frac{1}{\varrho_r\sqrt{\pi}\,\Gamma\!\left(\frac{r}{2}\right)} K_{\frac{r-1}{2}}\!\left(\frac{|x-I(\xi;\eta)|}{\varrho_r}\right) \left(\frac{|x-I(\xi;\eta)|}{2\varrho_r}\right)^{\!\frac{r-1}{2}}, \quad x \in \mathbb{R}\setminus\{I(\xi;\eta)\}, (6)

where I(ξ;η) is given by

I(\xi;\eta) = \frac{r}{2}\log\frac{1}{1-\varrho_r^2}.

If r ≥ 2, then f_{i(ξ;η)}(x) is also well defined for x = I(ξ;η).

(ii) The CDF Fi(ξ;η) of the information density i(ξ;η) is given by

F_{i(\xi;\eta)}(x) = \begin{cases} \frac{1}{2} - V\bigl(I(\xi;\eta)-x\bigr) & \text{if } x \le I(\xi;\eta) \\ \frac{1}{2} + V\bigl(x-I(\xi;\eta)\bigr) & \text{if } x > I(\xi;\eta), \end{cases} (7)

with V(z) defined by

V(z) = \frac{z}{2\varrho_r}\left[K_{\frac{r-1}{2}}\!\left(\frac{z}{\varrho_r}\right) L_{\frac{r-3}{2}}\!\left(\frac{z}{\varrho_r}\right) + K_{\frac{r-3}{2}}\!\left(\frac{z}{\varrho_r}\right) L_{\frac{r-1}{2}}\!\left(\frac{z}{\varrho_r}\right)\right], \quad z \ge 0. (8)

(iii) The m-th central moment E([i(ξ;η) − I(ξ;η)]^m) of the information density i(ξ;η) has the form

E\bigl([i(\xi;\eta)-I(\xi;\eta)]^m\bigr) = \begin{cases} \displaystyle\frac{m!}{(m/2)!}\prod_{j=1}^{m/2}\left(\frac{r}{2}+j-1\right)\varrho_r^m & \text{if } m = 2\tilde{m} \\ 0 & \text{if } m = 2\tilde{m}-1, \end{cases} (9)

for all m̃ ∈ ℕ.

Clearly, if all canonical correlations are equal, then the only nonzero terms in the series (3) and (4) occur for k_1 = k_2 = ⋯ = k_{r−1} = 0. For this single summand, the product in square brackets in (3) and (4) is equal to 1 by applying 0⁰ = 1, which yields the results of parts (i) and (ii) in Corollary 1. Details of the derivation of part (iii) of the corollary are provided in Section 4.

Note, if all canonical correlations are equal, then we can rewrite (1) as follows:

\nu = \frac{\varrho_r}{2}\left(\sum_{i=1}^{r}\tilde{\xi}_i^2 - \sum_{i=1}^{r}\tilde{\eta}_i^2\right) + I(\xi;\eta).

This implies that the distribution of ν coincides with the distribution of the random variable

\nu^* = \frac{\varrho_r}{2}\bigl(\zeta_1 - \zeta_2\bigr) + I(\xi;\eta),

where ζ1 and ζ2 are i.i.d. χ2-distributed random variables with r degrees of freedom. With this representation, we can obtain the expression of the PDF given in (6) also from [14], Sec. 4.A.4.
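This equivalence is easy to check numerically; the following sketch (our own illustration with arbitrarily chosen parameters, not from the paper's repository) compares the first two moments of both representations by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(1)
r, rho, n = 4, 0.6, 300_000
I = 0.5 * r * np.log(1.0 / (1.0 - rho ** 2))   # mutual information for equal correlations

# representation (1) with all canonical correlations equal to rho
xi = rng.standard_normal((n, r))
eta = rng.standard_normal((n, r))
nu = 0.5 * rho * (xi ** 2 - eta ** 2).sum(axis=1) + I

# chi-square representation nu* with zeta1, zeta2 i.i.d. chi^2 with r degrees of freedom
nu_star = 0.5 * rho * (rng.chisquare(r, n) - rng.chisquare(r, n)) + I

print(nu.mean(), nu_star.mean())   # both close to I
print(nu.var(), nu_star.var())     # both close to r * rho^2
```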

Special cases of Corollary 1. The case when all canonical correlations are equal is important because it occurs in various situations. The subsequent cases follow from the properties of canonical correlations given in Section 3.

(i) Assume that the random variables ξ_1, ξ_2, …, ξ_p, η_1, η_2, …, η_q are pairwise uncorrelated with the exception of the pairs (ξ_i, η_i), i = 1, 2, …, k ≤ min{p, q}, for which we have cor(ξ_i, η_i) = ρ ≠ 0, where cor(·,·) denotes the Pearson correlation coefficient. Then, r = k and ϱ_i = |ρ| for all i = 1, 2, …, r. Note, if p = q = k, then for the previous conditions to hold, it is sufficient that the two-dimensional random vectors (ξ_i, η_i) are i.i.d. However, the identical distribution of the (ξ_i, η_i)'s is not necessary. In Laneman [15], the distribution of the information density for an additive white Gaussian noise channel with i.i.d. Gaussian input is determined. This is a special case of the case with i.i.d. random vectors (ξ_i, η_i) just mentioned. In Wu and Jindal [16] and in Buckingham and Valenti [17], an approximation of the information density by a Gaussian random variable is considered for the setting in [15]. A special case very similar to that in [15] is also considered in Polyanskiy et al. [6], Sec. III.J. To the best of the authors' knowledge, explicit formulas for the general case as considered in this paper are not yet available in the literature.

(ii) Assume that the conditions of part (i) are satisfied. Furthermore, assume that Â is a real nonsingular matrix of dimension p×p and B̂ is a real nonsingular matrix of dimension q×q. Then, the random vectors

\hat{\xi} = \hat{A}\xi \quad\text{and}\quad \hat{\eta} = \hat{B}\eta

have the same canonical correlations as the random vectors ξ and η, i.e., ϱ_i = |ρ| for all i = 1, 2, …, k ≤ min{p, q}.

(iii) If r = 1, i.e., if the cross-covariance matrix R_ξη has rank 1, then Corollary 1 obviously applies. Clearly, the simplest special case with r = 1 occurs for p = q = 1, where ϱ_1 = |cor(ξ_1, η_1)|.

As a simple multivariate example, let the covariance matrix of the random vector (ξ_1, ξ_2, …, ξ_p, η_1, η_2, …, η_q) be given by the Kac–Murdock–Szegö matrix

\begin{pmatrix} R_\xi & R_{\xi\eta} \\ R_{\xi\eta}^T & R_\eta \end{pmatrix} = \bigl(\rho^{|i-j|}\bigr)_{i,j=1}^{p+q},

which is related to the covariance function of a first-order autoregressive process, where 0 < |ρ| < 1. Then, r = rank(R_ξη) = 1 and ϱ_1 = |ρ|.

(iv) As yet another example, assume p = q and R_ξη = ρ R_ξ^{1/2} R_η^{1/2} for some 0 < |ρ| < 1. Then, ϱ_i = |ρ| for i = 1, 2, …, r = q. Here, A^{1/2} denotes the square root of the real-valued positive semidefinite matrix A, i.e., the unique positive semidefinite matrix B such that BB = A.

2.2. More on Special Cases with Simplified Formulas

Let us further evaluate the formulas given in Corollary 1 and Theorem 3 for some relevant parameter values.

(i) Single canonical correlation coefficient. In the simplest case, there is only a single non-zero canonical correlation coefficient, i.e., r = 1. (Recall, at the beginning of the paper, we have excluded the degenerate case when all canonical correlations are zero.) Then, the formulas of the PDF and the m-th central moment in Corollary 1 simplify to the form

f_{i(\xi;\eta)}(x) = \frac{1}{\varrho_1\pi} K_0\!\left(\frac{|x-I(\xi;\eta)|}{\varrho_1}\right), \quad x \in \mathbb{R}\setminus\{I(\xi;\eta)\},

and

E\bigl([i(\xi;\eta)-I(\xi;\eta)]^m\bigr) = \begin{cases} \displaystyle\left(\frac{m!}{(m/2)!}\right)^{\!2}\left(\frac{\varrho_1}{2}\right)^{\!m} & \text{if } m = 2\tilde{m} \\ 0 & \text{if } m = 2\tilde{m}-1, \end{cases} (10)

for all m̃ ∈ ℕ. A formula equivalent to (10) is also provided by Pinsker [8], Lemma 9.6.1, who considered the special case p = q = 1, which implies r = 1.

(ii) Second and fourth central moment. To demonstrate how the general formula given in Theorem 3 is used, we first consider m = 2. In this case, the summation indices m_1, m_2, …, m_r have to satisfy m_i = 1 for a single i ∈ {1, 2, …, r}, whereas the remaining m_i's have to be zero. Thus, (5) evaluates for m = 2 to

E\bigl([i(\xi;\eta)-I(\xi;\eta)]^2\bigr) = \mathrm{var}\bigl(i(\xi;\eta)\bigr) = \sum_{i=1}^{r}\varrho_i^2. (11)

As a slightly more complex example, let m = 4. In this case, either we have m_i = 2 for a single i ∈ {1, 2, …, r}, whereas the remaining m_i's are zero, or we have m_{i_1} = m_{i_2} = 1 for two indices i_1 ≠ i_2 ∈ {1, 2, …, r}, whereas the remaining m_i's have to be zero. Thus, (5) evaluates for m = 4 to

E\bigl([i(\xi;\eta)-I(\xi;\eta)]^4\bigr) = 9\sum_{i=1}^{r}\varrho_i^4 + 6\sum_{i=2}^{r}\sum_{j=1}^{i-1}\varrho_i^2\varrho_j^2.
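The evaluation of (5) can be automated by enumerating the index set K^{[2]}_{m,r} directly; the short sketch below (our own helper, not from the paper's repository) reproduces the two examples above:

```python
from itertools import product
from math import factorial

def central_moment(m, rho):
    # m-th central moment of the information density via Theorem 3, Eq. (5)
    if m % 2 == 1:
        return 0.0
    total = 0.0
    # enumerate all (m_1, ..., m_r) with 2 m_1 + ... + 2 m_r = m
    for ms in product(range(m // 2 + 1), repeat=len(rho)):
        if 2 * sum(ms) != m:
            continue
        term = float(factorial(m))
        for mi, p in zip(ms, rho):
            term *= factorial(2 * mi) / (4 ** mi * factorial(mi) ** 2) * p ** (2 * mi)
        total += term
    return total

rho = [0.9, 0.5, 0.3]
var = sum(p ** 2 for p in rho)                       # Eq. (11)
m4 = 9 * sum(p ** 4 for p in rho) \
     + 6 * sum(rho[i] ** 2 * rho[j] ** 2 for i in range(1, 3) for j in range(i))
print(central_moment(2, rho) - var)   # zero up to rounding
print(central_moment(4, rho) - m4)    # zero up to rounding
```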

(iii) Even number of equal canonical correlations. As in Corollary 1, assume that all canonical correlations are equal and additionally assume that the number r of canonical correlations is even, i.e., r = 2r̃ for some r̃ ∈ ℕ. Then, we can use [9], Secs. 10.47.9, 10.49.1, and 10.49.12 to obtain the following relation for the modified Bessel function K_α(·) of the second kind and order α

K_{\frac{r-1}{2}}(y) = \sqrt{\frac{\pi}{2}}\,\exp(-y)\sum_{i=0}^{r/2-1}\frac{(r/2-1+i)!}{(r/2-1-i)!\,i!\,2^i}\,y^{-(i+\frac{1}{2})}, \quad y \in (0,\infty). (12)

Plugging (12) into (6) and rearranging terms yields the following expression for the PDF of the information density:

f_{i(\xi;\eta)}(x) = \frac{1}{\varrho_r 2^{r-1}(r/2-1)!}\exp\!\left(-\frac{|x-I(\xi;\eta)|}{\varrho_r}\right) \times \sum_{i=0}^{r/2-1}\frac{\bigl(2(r/2-1)-i\bigr)!\,2^i}{(r/2-1-i)!\,i!}\left(\frac{|x-I(\xi;\eta)|}{\varrho_r}\right)^{\!i}, \quad x \in \mathbb{R}.

By integration, we obtain for the function V(·) in (8) the expression

V(z) = \frac{1}{2} - \frac{1}{2^{r-1}(r/2-1)!}\exp\!\left(-\frac{z}{\varrho_r}\right) \times \sum_{i=0}^{r/2-1}\frac{\bigl(2(r/2-1)-i\bigr)!\,2^i}{(r/2-1-i)!}\sum_{j=0}^{i}\frac{1}{(i-j)!}\left(\frac{z}{\varrho_r}\right)^{\!i-j}, \quad z \ge 0.

Note that these special formulas can also be obtained directly from the results given in [14], Sec. 4.A.3.

To illustrate the principal behavior of the PDF and CDF of the information density for equal canonical correlations, it is instructive to consider the specific value r = 2 in the above formulas, which yields

f_{i(\xi;\eta)}(x) = \frac{1}{2\varrho_r}\exp\!\left(-\frac{|x-I(\xi;\eta)|}{\varrho_r}\right), \quad x \in \mathbb{R}, \qquad V(z) = \frac{1}{2}\left(1-\exp\!\left(-\frac{z}{\varrho_r}\right)\right), \quad z \ge 0,

and r = 4, for which we obtain

f_{i(\xi;\eta)}(x) = \frac{1}{4\varrho_r}\exp\!\left(-\frac{|x-I(\xi;\eta)|}{\varrho_r}\right)\left(1+\frac{|x-I(\xi;\eta)|}{\varrho_r}\right), \quad x \in \mathbb{R}, \qquad V(z) = \frac{1}{2}\left(1-\exp\!\left(-\frac{z}{\varrho_r}\right)\left(1+\frac{z}{2\varrho_r}\right)\right), \quad z \ge 0.
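Note that for r = 2 the information density is thus Laplace distributed with location I(ξ;η) and scale ϱ_r. A quick numerical check of these closed forms (our own sketch; ϱ_r = 0.7 and I(ξ;η) = 0 are arbitrary choices) verifies that the densities integrate to one and that, for r = 2, the mass P(|i − I| ≤ z) equals 2V(z):

```python
import numpy as np

rho = 0.7
x = np.linspace(-40.0, 40.0, 400_001)
dx = x[1] - x[0]

pdf_r2 = np.exp(-np.abs(x) / rho) / (2 * rho)                          # r = 2, I = 0
pdf_r4 = np.exp(-np.abs(x) / rho) * (1 + np.abs(x) / rho) / (4 * rho)  # r = 4, I = 0

print(pdf_r2.sum() * dx)   # close to 1
print(pdf_r4.sum() * dx)   # close to 1

# CDF check for r = 2: P(|i - I| <= z) = 2 V(z)
z = 3.0
mass = pdf_r2[np.abs(x) <= z].sum() * dx
V = 0.5 * (1 - np.exp(-z / rho))
print(mass, 2 * V)
```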

3. Mutual Information and Information Density in Terms of Canonical Correlations

First introduced by Hotelling [18], the canonical correlation analysis is a widely used linear method in multivariate statistics to determine the maximum correlations between two sets of random variables. It allows a particularly simple and useful representation of the mutual information and the information density of Gaussian random vectors in terms of the so-called canonical correlations. This representation was first obtained by Gelfand and Yaglom [19] and further extended by Pinsker [8], Ch. 9. For the convenience of the reader, we summarize in this section the essence of the canonical correlation analysis and demonstrate how it is applied to derive the representations in (1) and (2).

The formulation of the canonical correlation analysis given below is particularly suitable for implementations. The corresponding results are given without proof. Details and thorough discussions can be found, e.g., in Härdle and Simar [20], Koch [21], or Timm [22].

Based on the nonsingular covariance matrices R_ξ and R_η of the random vectors ξ = (ξ_1, ξ_2, …, ξ_p) and η = (η_1, η_2, …, η_q), and the cross-covariance matrix R_ξη with rank r = rank(R_ξη) satisfying 0 ≤ r ≤ min{p, q}, define the matrix

M = R_\xi^{-1/2} R_{\xi\eta} R_\eta^{-1/2},

where the inverse matrices R_ξ^{−1/2} = (R_ξ^{1/2})^{−1} and R_η^{−1/2} = (R_η^{1/2})^{−1} can be obtained by diagonalizing R_ξ and R_η. Then, the matrix M has a singular value decomposition

M = U D V^T,

where V^T denotes the transpose of V. The only non-zero entries d_{1,1}, d_{2,2}, …, d_{r,r} > 0 of the matrix D = (d_{i,j})_{i,j=1}^{p,q} are called canonical correlations of ξ and η, denoted by ϱ_i = d_{i,i}, i = 1, 2, …, r. The singular value decomposition can be chosen such that ϱ_1 ≥ ϱ_2 ≥ ⋯ ≥ ϱ_r holds, which is assumed throughout the paper.

Define the random vectors

\hat{\xi} = (\hat{\xi}_1, \hat{\xi}_2, \ldots, \hat{\xi}_p) = A\xi \quad\text{and}\quad \hat{\eta} = (\hat{\eta}_1, \hat{\eta}_2, \ldots, \hat{\eta}_q) = B\eta,

where the nonsingular matrices A and B are given by

A = U^T R_\xi^{-1/2} \quad\text{and}\quad B = V^T R_\eta^{-1/2}.

Then, the random variables ξ̂_1, ξ̂_2, …, ξ̂_p, η̂_1, η̂_2, …, η̂_q have unit variance, and they are pairwise uncorrelated with the exception of the pairs (ξ̂_i, η̂_i), i = 1, 2, …, r, for which we have cor(ξ̂_i, η̂_i) = ϱ_i.
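The procedure above translates directly into a few lines of NumPy (a sketch of ours, not the paper's GitLab code); the Kac–Murdock–Szegö example from Section 2.1 serves as a test case:

```python
import numpy as np

def canonical_correlations(R_xi, R_eta, R_cross):
    # canonical correlations = singular values of M = R_xi^{-1/2} R_cross R_eta^{-1/2}
    def inv_sqrt(R):
        w, Q = np.linalg.eigh(R)            # diagonalize the covariance matrix
        return Q @ np.diag(w ** -0.5) @ Q.T
    M = inv_sqrt(R_xi) @ R_cross @ inv_sqrt(R_eta)
    s = np.linalg.svd(M, compute_uv=False)  # in descending order
    return s[s > 1e-12]                     # keep the r positive canonical correlations

# Kac-Murdock-Szegoe covariance with p = q = 2 and rho = 0.5: rank(R_cross) = 1
rho, p, q = 0.5, 2, 2
R = np.array([[rho ** abs(i - j) for j in range(p + q)] for i in range(p + q)])
cc = canonical_correlations(R[:p, :p], R[p:, p:], R[:p, p:])
print(cc)   # a single canonical correlation equal to |rho| = 0.5
```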

Using these results, we obtain for the mutual information and the information density

I(\xi;\eta) = I(A\xi;B\eta) = I(\hat{\xi};\hat{\eta}) = \sum_{i=1}^{r} I(\hat{\xi}_i;\hat{\eta}_i) (13)
i(\xi;\eta) = i(A\xi;B\eta) = i(\hat{\xi};\hat{\eta}) = \sum_{i=1}^{r} i(\hat{\xi}_i;\hat{\eta}_i) \quad (P\text{-almost surely}). (14)

The first equality in (13) and (14) holds because A and B are nonsingular matrices, which follows, e.g., from Pinsker [8], Th. 3.7.1. Since we consider the case where ξ and η are jointly Gaussian, ξ̂ and η̂ are jointly Gaussian as well. Therefore, the correlation properties of ξ̂ and η̂ imply that all random variables ξ̂_i, η̂_j are independent except for the pairs (ξ̂_i, η̂_i), i = 1, 2, …, r. This implies the last equality in (13) and (14), where i(ξ̂_1;η̂_1), i(ξ̂_2;η̂_2), …, i(ξ̂_r;η̂_r) are independent. The sum representations follow from the chain rules of mutual information and information density and the equivalence between independence and vanishing mutual information and information density.

Since ξ^i and η^i are jointly Gaussian with correlation cor(ξ^i,η^i)=ϱi, we obtain from (13) and the formula of mutual information for the bivariate Gaussian case the identity (2). Additionally, with ξ^i and η^i having zero mean and unit variance, the information density i(ξ^i;η^i) is further given by

i(\hat{\xi}_i;\hat{\eta}_i) = -\frac{1}{2}\log(1-\varrho_i^2) - \frac{\varrho_i^2}{2(1-\varrho_i^2)}\left(\hat{\xi}_i^2 - \frac{2\hat{\xi}_i\hat{\eta}_i}{\varrho_i} + \hat{\eta}_i^2\right), \quad i = 1, 2, \ldots, r. (15)

Now assume ξ̃_1, ξ̃_2, …, ξ̃_r, η̃_1, η̃_2, …, η̃_r are i.i.d. Gaussian random variables with zero mean and unit variance. Then, the distribution of the random vector

\frac{1}{\sqrt{2}}\left(\sqrt{1+\varrho_i}\,\tilde{\xi}_i + \sqrt{1-\varrho_i}\,\tilde{\eta}_i,\; \sqrt{1+\varrho_i}\,\tilde{\xi}_i - \sqrt{1-\varrho_i}\,\tilde{\eta}_i\right)

coincides with the distribution of the random vector (ξ̂_i, η̂_i) for all i = 1, 2, …, r. Plugging this into (15), we obtain together with (14) that the distribution of the information density i(ξ;η) coincides with the distribution of (1).

4. Proof of Main Results

4.1. Auxiliary Results

To prove Theorem 1, the following lemma regarding the characteristic function of the information density is utilized. The results of the lemma are also used in Ibragimov and Rozanov [23] but without proof. Therefore, the proof is given below for completeness.

Lemma 1 

(Characteristic function of (shifted) information density). The characteristic function of the shifted information density i(ξ;η) − I(ξ;η) is equal to the characteristic function of the random variable

\tilde{\nu} = \frac{1}{2}\sum_{i=1}^{r}\varrho_i\bigl(\tilde{\xi}_i^2 - \tilde{\eta}_i^2\bigr), (16)

where ξ̃_1, ξ̃_2, …, ξ̃_r, η̃_1, η̃_2, …, η̃_r are i.i.d. Gaussian random variables with zero mean and unit variance, and ϱ_1, ϱ_2, …, ϱ_r are the canonical correlations of ξ and η. The characteristic function of ν̃ is given by

\varphi_{\tilde{\nu}}(t) = \prod_{i=1}^{r}\frac{1}{\sqrt{1+\varrho_i^2t^2}}, \quad t \in \mathbb{R}. (17)

Proof. 

Due to (1), the distribution of the shifted information density i(ξ;η) − I(ξ;η) coincides with the distribution of the random variable ν̃ in (16) such that the characteristic functions of i(ξ;η) − I(ξ;η) and ν̃ are equal.

It is a well-known fact that ξ̃_i² and η̃_i² in (16) are chi-squared distributed random variables with one degree of freedom, from which we obtain that the weighted random variables ϱ_iξ̃_i²/2 and ϱ_iη̃_i²/2 are gamma distributed with shape parameter 1/2 and scale parameter ϱ_i. The characteristic function of these random variables therefore admits the form

\varphi_{\frac{\varrho_i}{2}\tilde{\xi}_i^2}(t) = \bigl(1-\varrho_i\,\mathrm{j}t\bigr)^{-\frac{1}{2}}.

Further, from the identity \varphi_{-\frac{\varrho_i}{2}\tilde{\eta}_i^2}(t) = \varphi_{\frac{\varrho_i}{2}\tilde{\eta}_i^2}(-t) for the characteristic function and from the independence of ξ̃_i and η̃_i, we obtain the characteristic function of ν̃_i = ϱ_i(ξ̃_i² − η̃_i²)/2 to be given by

\varphi_{\tilde{\nu}_i}(t) = \bigl(1-\varrho_i\,\mathrm{j}t\bigr)^{-\frac{1}{2}}\bigl(1+\varrho_i\,\mathrm{j}t\bigr)^{-\frac{1}{2}} = \bigl(1+\varrho_i^2t^2\bigr)^{-\frac{1}{2}}.

Finally, because ν˜ in (16) is given by the sum of the independent random variables ν˜i, the characteristic function of ν˜ results from multiplying the individual characteristic functions of the random variables ν˜i. By doing so, we obtain (17). □
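Formula (17) can be checked against an empirical characteristic function (our own Monte Carlo sketch with arbitrarily chosen correlations):

```python
import numpy as np

rng = np.random.default_rng(2)
rho = np.array([0.8, 0.4])
n = 400_000

xi = rng.standard_normal((n, rho.size))
eta = rng.standard_normal((n, rho.size))
nu = 0.5 * (xi ** 2 - eta ** 2) @ rho      # shifted information density, Eq. (16)

t = np.array([0.5, 1.0, 2.0])
phi_emp = np.exp(1j * np.outer(t, nu)).mean(axis=1)                     # E[exp(j t nu)]
phi = np.prod(1.0 / np.sqrt(1.0 + np.outer(t ** 2, rho ** 2)), axis=1)  # Eq. (17)
print(np.max(np.abs(phi_emp - phi)))   # Monte Carlo error, of order n^(-1/2)
```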

As a further auxiliary result, the subsequent proposition providing properties of the modified Bessel function K_α of the second kind and order α will be used to prove the main results.

Proposition 1

(Properties related to the function K_α). For all α ∈ ℝ, the function

y \mapsto y^\alpha K_\alpha(y), \quad y \in (0,\infty),

where K_α(·) denotes the modified Bessel function of the second kind and order α [9], Sec. 10.25(ii), is strictly positive and strictly monotonically decreasing. Furthermore, if α > 0, then we have

\lim_{y\to+0} y^\alpha K_\alpha(y) = \sup_{y\in(0,\infty)} y^\alpha K_\alpha(y) = \Gamma(\alpha)\,2^{\alpha-1}. (18)

Proof. 

If α ∈ ℝ is fixed, then K_α(y) is strictly positive and strictly monotonically decreasing w. r. t. y ∈ (0,∞) due to [9], Secs. 10.27.3 and 10.37. Furthermore, we obtain

\frac{\mathrm{d}\bigl(y^\alpha K_\alpha(y)\bigr)}{\mathrm{d}y} = -y^\alpha K_{\alpha-1}(y), \quad y \in (0,\infty),

by applying the rules to calculate derivatives of Bessel functions given in [9], Sec. 10.29(ii). It follows that y^α K_α(y) is strictly positive and strictly monotonically decreasing w. r. t. y ∈ (0,∞) for all fixed α ∈ ℝ.

Consider now the Basset integral formula as given in [9], Sec. 10.32.11,

K_\alpha(yz) = \frac{\Gamma\bigl(\alpha+\frac{1}{2}\bigr)(2z)^\alpha}{\sqrt{\pi}\,y^\alpha}\int_{u=0}^{\infty}\frac{\cos(uy)}{(u^2+z^2)^{\alpha+\frac{1}{2}}}\,\mathrm{d}u (19)

for |arg(z)| < π/2, y > 0, α > −1/2, and the integral

\int_{u=0}^{\infty}\frac{1}{(u^2+1)^{\alpha+\frac{1}{2}}}\,\mathrm{d}u = \frac{\sqrt{\pi}\,\Gamma(\alpha)}{2\,\Gamma\bigl(\alpha+\frac{1}{2}\bigr)} (20)

for α > 0, where the equality holds due to [24], Secs. 3.251.2 and 8.384.1. Using (19) and (20), we obtain

\lim_{y\to+0} y^\alpha K_\alpha(y) = \lim_{y\to+0}\frac{\Gamma\bigl(\alpha+\frac{1}{2}\bigr)2^\alpha}{\sqrt{\pi}}\int_{u=0}^{\infty}\frac{\cos(uy)}{(u^2+1)^{\alpha+\frac{1}{2}}}\,\mathrm{d}u = \frac{\Gamma\bigl(\alpha+\frac{1}{2}\bigr)2^\alpha}{\sqrt{\pi}}\int_{u=0}^{\infty}\frac{1}{(u^2+1)^{\alpha+\frac{1}{2}}}\,\mathrm{d}u = \Gamma(\alpha)\,2^{\alpha-1}

for all α > 0, where we also applied the dominated convergence theorem, which is possible due to |cos(uy)|/(u²+1)^{α+1/2} ≤ 1/(u²+1)^{α+1/2}. Using the previously derived monotonicity, we obtain (18). □

4.2. Proof of Theorem 1

To prove Theorem 1, we calculate the PDF fν˜ of the random variable ν˜ introduced in Lemma 1 by inverting the characteristic function φν˜ given in (17) via the integral

f_{\tilde{\nu}}(v) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\varphi_{\tilde{\nu}}(t)\exp(-\mathrm{j}tv)\,\mathrm{d}t, \quad v \in \mathbb{R}. (21)

Shifting the PDF of ν̃ by I(ξ;η), we obtain the PDF f_{i(ξ;η)}(x) = f_ν̃(x − I(ξ;η)), x ∈ ℝ, of the information density i(ξ;η).

The method used subsequently is based on the work of Mathai [10]. To invert the characteristic function φν˜, we expand the factors in (17) as

\bigl(1+\varrho_i^2t^2\bigr)^{-\frac{1}{2}} = \bigl(1+\varrho_r^2t^2\bigr)^{-\frac{1}{2}}\,\frac{\varrho_r}{\varrho_i}\left(1-\left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)\frac{1}{1+\varrho_r^2t^2}\right)^{-\frac{1}{2}} (22)
= \bigl(1+\varrho_r^2t^2\bigr)^{-\frac{1}{2}}\,\frac{\varrho_r}{\varrho_i}\sum_{k=0}^{\infty}(-1)^k\binom{-1/2}{k}\left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)^{\!k}\bigl(1+\varrho_r^2t^2\bigr)^{-k}. (23)

In (23), we have used the binomial series

(1+y)^a = \sum_{k=0}^{\infty}\binom{a}{k}y^k, (24)

where a ∈ ℝ. The series is absolutely convergent for |y| < 1 and

\binom{a}{k} = \prod_{\ell=1}^{k}\frac{a-\ell+1}{\ell}, \quad k \in \mathbb{N}, (25)

denotes the generalized binomial coefficient with \binom{a}{0} = 1. Since

\left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)\frac{1}{1+\varrho_r^2t^2} < 1 (26)

holds for all t ∈ ℝ, the series in (23) is absolutely convergent for all t ∈ ℝ. Using the expansion in (23) and the absolute convergence together with the identity

\binom{-1/2}{k} = (-1)^k\,\frac{(2k)!}{(k!)^2 4^k}, (27)

we can rewrite the characteristic function φν˜ as

\varphi_{\tilde{\nu}}(t) = \sum_{k_1=0}^{\infty}\sum_{k_2=0}^{\infty}\cdots\sum_{k_{r-1}=0}^{\infty}\left[\prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i}\,\frac{(2k_i)!}{(k_i!)^2 4^{k_i}}\left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)^{\!k_i}\right] \times \bigl(1+\varrho_r^2t^2\bigr)^{-\left(\frac{r}{2}+k_1+k_2+\cdots+k_{r-1}\right)}, \quad t \in \mathbb{R}. (28)

To obtain the PDF f_ν̃, we evaluate the inversion integral (21) based on the series representation in (28). Since every series in (28) is absolutely convergent, we can exchange summation and integration. Let β = r/2 + k_1 + k_2 + ⋯ + k_{r−1}. Then, by symmetry, we have for the integral of a summand

\int_{t=-\infty}^{\infty}\frac{\exp(-\mathrm{j}tv)}{(1+\varrho_r^2t^2)^\beta}\,\mathrm{d}t = 2\int_{t=0}^{\infty}\frac{\cos(tv)}{(1+\varrho_r^2t^2)^\beta}\,\mathrm{d}t = \frac{2}{\varrho_r}\int_{u=0}^{\infty}\frac{\cos(uv/\varrho_r)}{(1+u^2)^\beta}\,\mathrm{d}u, (29)

where the second equality is a result of the substitution t = u/ϱ_r. By setting z = 1, α = β − 1/2 ≥ 0, and y = |v|/ϱ_r in the Basset integral formula given in (19) in the proof of Proposition 1 and using the symmetry with respect to v, we can evaluate (29) to the following form:

\int_{t=-\infty}^{\infty}\frac{\exp(-\mathrm{j}tv)}{(1+\varrho_r^2t^2)^\beta}\,\mathrm{d}t = \frac{\sqrt{\pi}}{\Gamma(\beta)\,2^{\beta-\frac{3}{2}}\,\varrho_r^{\beta+\frac{1}{2}}}\,K_{\beta-\frac{1}{2}}\!\left(\frac{|v|}{\varrho_r}\right)|v|^{\beta-\frac{1}{2}}, \quad v \in \mathbb{R}\setminus\{0\}. (30)

Combining (21), (28), and (30) yields

f_{\tilde{\nu}}(v) = \frac{1}{2\sqrt{\pi}}\sum_{k_1=0}^{\infty}\sum_{k_2=0}^{\infty}\cdots\sum_{k_{r-1}=0}^{\infty}\left[\prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i}\,\frac{(2k_i)!}{(k_i!)^2 4^{k_i}}\left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)^{\!k_i}\right] \times \frac{K_{\frac{r-1}{2}+k_1+k_2+\cdots+k_{r-1}}\!\left(\frac{|v|}{\varrho_r}\right)|v|^{\frac{r-1}{2}+k_1+k_2+\cdots+k_{r-1}}}{\Gamma\!\left(\frac{r}{2}+k_1+k_2+\cdots+k_{r-1}\right)2^{\frac{r-3}{2}+k_1+k_2+\cdots+k_{r-1}}\varrho_r^{\frac{r+1}{2}+k_1+k_2+\cdots+k_{r-1}}}, \quad v \in \mathbb{R}\setminus\{0\}. (31)

Slightly rearranging terms and shifting fν˜(·) by I(ξ;η) yields (3).

It remains to show that f_{i(ξ;η)}(x) is also well defined for x = I(ξ;η) if r ≥ 2. Indeed, if r ≥ 2, then we can use Proposition 1 to obtain

\lim_{x\to I(\xi;\eta)} f_{i(\xi;\eta)}(x) = \frac{1}{2\varrho_r\sqrt{\pi}}\sum_{k_1=0}^{\infty}\sum_{k_2=0}^{\infty}\cdots\sum_{k_{r-1}=0}^{\infty}\left[\prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i}\,\frac{(2k_i)!}{(k_i!)^2 4^{k_i}}\left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)^{\!k_i}\right] \times \frac{\Gamma\!\left(\frac{r-1}{2}+k_1+k_2+\cdots+k_{r-1}\right)}{\Gamma\!\left(\frac{r-1}{2}+k_1+k_2+\cdots+k_{r-1}+\frac{1}{2}\right)},

where we used the exchangeability of the limit and the summation due to the absolute convergence of the series. Since Γ(α)/Γ(α + 1/2) is decreasing w. r. t. α ≥ 1/2, we have

\frac{\Gamma\!\left(\frac{r-1}{2}+k_1+k_2+\cdots+k_{r-1}\right)}{\Gamma\!\left(\frac{r-1}{2}+k_1+k_2+\cdots+k_{r-1}+\frac{1}{2}\right)} \le \frac{\Gamma\!\left(\frac{r-1}{2}\right)}{\Gamma\!\left(\frac{r-1}{2}+\frac{1}{2}\right)} \le \sqrt{\pi}.

Then, with (69) in the proof of Theorem 4, it follows that limxI(ξ;η)fi(ξ;η)(x) exists and is finite. □

4.3. Proof of Theorem 2

To prove Theorem 2, we calculate the CDF F_ν̃ of the random variable ν̃ introduced in Lemma 1 by integrating the PDF f_ν̃ given in (31). Shifting the CDF of ν̃ by I(ξ;η), we obtain the CDF F_{i(ξ;η)}(x) = F_ν̃(x − I(ξ;η)), x ∈ ℝ, of the information density i(ξ;η). Using the symmetry of f_ν̃, we can write

F_{\tilde{\nu}}(z) = P(\tilde{\nu}\le z) = \begin{cases} \frac{1}{2} - \displaystyle\int_{v=0}^{-z} f_{\tilde{\nu}}(v)\,\mathrm{d}v & \text{for } z \le 0 \\ \frac{1}{2} + \displaystyle\int_{v=0}^{z} f_{\tilde{\nu}}(v)\,\mathrm{d}v & \text{for } z > 0. \end{cases}

It is therefore sufficient to evaluate the integral

V(z) := \int_{v=0}^{z} f_{\tilde{\nu}}(v)\,\mathrm{d}v (32)

for z ≥ 0. To calculate the integral (32), we plug (31) into (32) and exchange integration and summation, which is justified by the monotone convergence theorem. To evaluate the integral of a summand, consider the following identity

\int_{x=0}^{z} x^\alpha K_\alpha(x)\,\mathrm{d}x = 2^{\alpha-1}\sqrt{\pi}\,\Gamma\!\left(\alpha+\tfrac{1}{2}\right) z\bigl[K_\alpha(z)L_{\alpha-1}(z) + K_{\alpha-1}(z)L_\alpha(z)\bigr] (33)

for α > −1/2 given in [25], Sec. 1.12.1.3, where L_α(·) denotes the modified Struve L function of order α [9], Sec. 11.2. Using (33) with α = (r−1)/2 + k_1 + k_2 + ⋯ + k_{r−1} ≥ 0, we obtain (4). □
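The integral identity (33) can be verified numerically with SciPy's kv and modstruve functions (our own sketch with arbitrary α and z):

```python
import numpy as np
from scipy.special import kv, modstruve, gamma

alpha, z = 1.5, 2.3

# left-hand side of (33): numerical integral of x^alpha K_alpha(x) over (0, z)
x = np.linspace(1e-9, z, 200_001)
f = x ** alpha * kv(alpha, x)
lhs = np.sum((f[:-1] + f[1:]) / 2) * (x[1] - x[0])   # trapezoidal rule

# right-hand side of (33)
rhs = 2 ** (alpha - 1) * np.sqrt(np.pi) * gamma(alpha + 0.5) * z \
      * (kv(alpha, z) * modstruve(alpha - 1, z) + kv(alpha - 1, z) * modstruve(alpha, z))

print(lhs, rhs)   # agree to several digits
```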

4.4. Proof of Theorem 3

Using the random variable

\tilde{\nu} = \sum_{i=1}^{r}\tilde{\nu}_i \quad\text{with}\quad \tilde{\nu}_i = \frac{\varrho_i}{2}\bigl(\tilde{\xi}_i^2 - \tilde{\eta}_i^2\bigr)

introduced in Lemma 1 and the well-known multinomial theorem [9], Sec. 26.4.9,

\bigl(y_1+y_2+\cdots+y_r\bigr)^m = \sum_{(\ell_1,\ell_2,\ldots,\ell_r)\in K_{m,r}} m!\prod_{i=1}^{r}\frac{y_i^{\ell_i}}{\ell_i!},

where K_{m,r} = {(ℓ_1, ℓ_2, …, ℓ_r) ∈ ℕ_0^r : ℓ_1 + ℓ_2 + ⋯ + ℓ_r = m}, we can write the m-th central moment of the information density i(ξ;η) as

E\bigl([i(\xi;\eta)-I(\xi;\eta)]^m\bigr) = E\!\left(\Bigl(\sum_{i=1}^{r}\tilde{\nu}_i\Bigr)^{\!m}\right) = \sum_{(\ell_1,\ell_2,\ldots,\ell_r)\in K_{m,r}} m!\prod_{i=1}^{r}\frac{E\bigl(\tilde{\nu}_i^{\ell_i}\bigr)}{\ell_i!}. (34)

To obtain the second equality in (34), we have exchanged expectation and summation and additionally used the identity E(∏_{i=1}^r ν̃_i^{ℓ_i}) = ∏_{i=1}^r E(ν̃_i^{ℓ_i}), which holds due to the independence of the random variables ν̃_1, ν̃_2, …, ν̃_r.

Based on the relation between the ℓ-th moment of a random variable and the ℓ-th derivative of its characteristic function at 0, we further have

E\bigl(\tilde{\nu}_i^{\ell_i}\bigr) = (-\mathrm{j})^{\ell_i}\,\frac{\mathrm{d}^{\ell_i}}{\mathrm{d}t^{\ell_i}}\varphi_{\tilde{\nu}_i}(t)\Big|_{t=0}, (35)

where φ_ν̃i(t) = (1 + ϱ_i²t²)^{−1/2}, t ∈ ℝ, is the characteristic function of the random variable ν̃_i derived in the proof of Lemma 1. As in the proof of Theorem 1, consider now the binomial series expansion using (24)

\varphi_{\tilde{\nu}_i}(t) = \bigl(1+\varrho_i^2t^2\bigr)^{-\frac{1}{2}} = \sum_{m_i=0}^{\infty}\binom{-1/2}{m_i}(\varrho_i t)^{2m_i}.

The series is absolutely convergent for all |t| < ϱ_i^{−1}. Furthermore, consider the Taylor series expansion of the characteristic function φ_ν̃i at the point 0

\varphi_{\tilde{\nu}_i}(t) = \sum_{\ell_i=0}^{\infty}\frac{\mathrm{d}^{\ell_i}}{\mathrm{d}t^{\ell_i}}\varphi_{\tilde{\nu}_i}(t)\Big|_{t=0}\,\frac{t^{\ell_i}}{\ell_i!}.

Both series expansions must be identical in an open interval around 0 such that we obtain by comparing the series coefficients

\frac{\mathrm{d}^{\ell_i}}{\mathrm{d}t^{\ell_i}}\varphi_{\tilde{\nu}_i}(t)\Big|_{t=0} = \begin{cases} \ell_i!\,\binom{-1/2}{\ell_i/2}\,\varrho_i^{\ell_i} & \text{if } \ell_i = 2m_i \\ 0 & \text{if } \ell_i = 2m_i-1 \end{cases}

for all m_i ∈ ℕ. With this result, (35) evaluates to

E\bigl(\tilde{\nu}_i^{\ell_i}\bigr) = \begin{cases} \displaystyle\frac{(\ell_i!)^2}{((\ell_i/2)!)^2\,4^{\ell_i/2}}\,\varrho_i^{\ell_i} & \text{if } \ell_i = 2m_i \\ 0 & \text{if } \ell_i = 2m_i-1 \end{cases} (36)

for all m_i ∈ ℕ, where we have additionally used the identity (27).

From (34) and (36), we now obtain E([i(ξ;η) − I(ξ;η)]^m) = 0 for all m = 2m̃ − 1 with m̃ ∈ ℕ because, if m is odd, then for all (ℓ_1, ℓ_2, …, ℓ_r) ∈ K_{m,r} at least one of the ℓ_i's has to be odd. If m = 2m̃ with m̃ ∈ ℕ, we obtain from (34) and (36)

E\bigl([i(\xi;\eta)-I(\xi;\eta)]^m\bigr) = \sum_{(\ell_1,\ell_2,\ldots,\ell_r)\in K_{m,r}} m!\prod_{i=1}^{r}\frac{1}{\ell_i!}\,\frac{(\ell_i!)^2}{((\ell_i/2)!)^2\,4^{\ell_i/2}}\,\varrho_i^{\ell_i} = \sum_{(m_1,m_2,\ldots,m_r)\in K_{m,r}^{[2]}} m!\prod_{i=1}^{r}\frac{(2m_i)!}{(m_i!)^2\,4^{m_i}}\,\varrho_i^{2m_i}.

 □

4.5. Proof of Part (iii) of Corollary 1

Using the random variable ν̃ as in the proof of Theorem 3, we can write the m-th central moment of the information density i(ξ;η) as

E\bigl([i(\xi;\eta)-I(\xi;\eta)]^m\bigr) = E\bigl(\tilde{\nu}^m\bigr) = (-\mathrm{j})^m\,\frac{\mathrm{d}^m}{\mathrm{d}t^m}\varphi_{\tilde{\nu}}(t)\Big|_{t=0},

where the characteristic function φ_ν̃ of ν̃ is given by φ_ν̃(t) = (1 + ϱ_r²t²)^{−r/2}, t ∈ ℝ, due to Lemma 1 and the equality of all canonical correlations. Using the binomial series and the Taylor series expansion as in the proof of Theorem 3, we obtain

\frac{\mathrm{d}^m}{\mathrm{d}t^m}\varphi_{\tilde{\nu}}(t)\Big|_{t=0} = \begin{cases} m!\,\binom{-r/2}{m/2}\,\varrho_r^m & \text{if } m = 2\tilde{m} \\ 0 & \text{if } m = 2\tilde{m}-1 \end{cases}

for all m̃ ∈ ℕ. Collecting terms and additionally using the definition of the generalized binomial coefficient given in (25) in the proof of Theorem 1 yields (9). □

5. Recurrence Formulas and Finite Sum Approximations

If there are at least two distinct canonical correlations, then the PDF f_{i(ξ;η)} and CDF F_{i(ξ;η)} of the information density i(ξ;η) are given by the infinite series in Theorems 1 and 2. If we consider only a finite number of summands in these representations, then we obtain approximations that are particularly amenable to numerical calculations. However, a direct finite sum approximation of the series in (3) and (4) is rather inefficient since modified Bessel and Struve L functions have to be evaluated for every summand. Therefore, we derive in this section recursive representations, which allow efficient numerical calculations. Furthermore, we derive uniform bounds of the approximation error. Based on the recurrence relations and the error bounds, an implementation in the programming language Python has been developed, which provides an efficient tool to numerically calculate the PDF and CDF of the information density with a predefined accuracy as high as desired. The developed source code as well as illustrating examples are made publicly available in an open access repository on GitLab [26].

Subsequently, we adopt all the previous notation and assume r2 and at least two distinct canonical correlations (since otherwise we have the case of Corollary 1, where the series reduce to a single summand).

5.1. Recurrence Formulas

The recursive approach developed below is based on the work of Moschopoulos [27], which extended the work of Mathai [10]. First, we rewrite the series representations of the PDF and CDF of the information density given in Theorem 1 and Theorem 2 in a form which is suitable for recursive calculations. To begin with, we define two functions appearing in the series representations (3) and (4), which involve the modified Bessel function K_α of the second kind and order α and the modified Struve L function L_α of order α. Let us define for all k ∈ ℕ_0 the functions U_k and D_k by

U_k(z) = \frac{K_{\frac{r-1}{2}+k}(z)}{\Gamma\!\left(\frac{r}{2}+k\right)}\left(\frac{z}{2}\right)^{\!\frac{r-1}{2}+k}, \quad z \ge 0, (37)

and

D_k(z) = \frac{z}{2\varrho_r}\left[K_{\frac{r-1}{2}+k}\!\left(\frac{z}{\varrho_r}\right)L_{\frac{r-3}{2}+k}\!\left(\frac{z}{\varrho_r}\right) + K_{\frac{r-3}{2}+k}\!\left(\frac{z}{\varrho_r}\right)L_{\frac{r-1}{2}+k}\!\left(\frac{z}{\varrho_r}\right)\right], \quad z \ge 0. (38)

Furthermore, we define for all k ∈ ℕ_0 the coefficient δ_k by

\delta_k = \sum_{(k_1,k_2,\ldots,k_{r-1})\in K_{k,r-1}}\;\prod_{i=1}^{r-1}\frac{(2k_i)!}{(k_i!)^2 4^{k_i}}\left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)^{\!k_i}, (39)

where K_{k,r−1} = {(k_1, k_2, …, k_{r−1}) ∈ ℕ_0^{r−1} : k_1 + k_2 + ⋯ + k_{r−1} = k}. With these definitions, we obtain the following alternative series representations of (3) and (4) by observing that the multiple summations over the indices k_1, k_2, …, k_{r−1} can be shortened to one summation over the index k = k_1 + k_2 + ⋯ + k_{r−1}.
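For moderate k and r, the coefficient δ_k can also be computed directly from (39) by enumerating the set K_{k,r−1}; the sketch below (our own helper for illustration, the recurrence-based approach developed in this section is far more efficient) makes the definition concrete:

```python
from itertools import product
from math import factorial

def delta(k, rho):
    # delta_k of Eq. (39); rho lists the canonical correlations in descending order
    *rho_head, rho_r = rho           # rho_head = rho_1, ..., rho_{r-1}
    total = 0.0
    for ks in product(range(k + 1), repeat=len(rho_head)):
        if sum(ks) != k:             # keep only k_1 + ... + k_{r-1} = k
            continue
        term = 1.0
        for ki, p in zip(ks, rho_head):
            term *= factorial(2 * ki) / (factorial(ki) ** 2 * 4 ** ki) \
                    * (1 - rho_r ** 2 / p ** 2) ** ki
        total += term
    return total

rho = [0.9, 0.5, 0.3]
print([delta(k, rho) for k in range(4)])   # delta_0 = 1
```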

Proposition 2

(Alternative representation of PDF and CDF of the information density). The PDF fi(ξ;η) of the information density i(ξ;η) given in Theorem 1 has the alternative series representation

f_{i(\xi;\eta)}(x) = \frac{1}{\varrho_r\sqrt{\pi}}\left[\prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i}\right]\sum_{k=0}^{\infty}\delta_k\,U_k\!\left(\frac{|x-I(\xi;\eta)|}{\varrho_r}\right), \quad x \in \mathbb{R}. (40)

The function V(·) specifying the CDF Fi(ξ;η) of the information density i(ξ;η) as given in Theorem 2 has the alternative series representation

V(z) = \left[\prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i}\right]\sum_{k=0}^{\infty}\delta_k\,D_k(z), \quad z \ge 0. (41)

Based on the representations in Proposition 2 and with recursive formulas for U_k(·), D_k(·), and δ_k, we are in a position to calculate the PDF and CDF of the information density by a single summation over completely recursively defined terms. In the following, we derive recurrence relations for U_k(·), D_k(·), and δ_k, which allow the desired efficient calculations.

Lemma 2

(Recurrence formula of the function Uk). If for all kN0 the function Uk is defined by (37), then Uk(z) satisfies for all k2 and z0 the recurrence formula

U_k(z) = \frac{z^2}{(r+2k-2)(r+2k-4)}\,U_{k-2}(z) + \frac{r+2k-3}{r+2k-2}\,U_{k-1}(z). (42)

Proof. 

First, assume $z=0$. Based on Proposition 1, we obtain for all $k \in \mathbb{N}_0$

\lim_{z\to+0} U_k(z) = \frac{\Gamma\left(\frac{r-1}{2}+k\right)}{2\,\Gamma\left(\frac{r}{2}+k\right)}, (43)

such that $U_k(0)$ is well defined and finite. Using the recurrence relation $\Gamma(y+1)=y\,\Gamma(y)$ for the Gamma function ([24], Sec. 8.331.1), we have

\frac{\Gamma\left(\frac{r-1}{2}+k\right)}{2\,\Gamma\left(\frac{r}{2}+k\right)} = \frac{\frac{r-1}{2}+k-1}{\frac{r}{2}+k-1} \cdot \frac{\Gamma\left(\frac{r-1}{2}+k-1\right)}{2\,\Gamma\left(\frac{r}{2}+k-1\right)}.

Together with (43), this shows that the recurrence formula (42) holds for $U_k(0)$ and $k \ge 2$, since the first term of (42) vanishes at $z=0$.

Now, assume z>0 and consider the recurrence formula

z\,K_\alpha(z) = z\,K_{\alpha-2}(z) + 2(\alpha-1)\,K_{\alpha-1}(z) (44)

for the modified Bessel function of the second kind and order $\alpha$ ([24], Sec. 8.486.10). Plugging (44) into (37) for $\alpha = \frac{r-1}{2}+k$ yields for $k \ge 2$

U_k(z) = \frac{K_{\frac{r-1}{2}+k-2}(z)}{\Gamma\left(\frac{r}{2}+k\right)} \left(\frac{z}{2}\right)^{\frac{r-1}{2}+k-2} \left(\frac{z}{2}\right)^{2} + \left(\frac{r-1}{2}+k-1\right) \frac{K_{\frac{r-1}{2}+k-1}(z)}{\Gamma\left(\frac{r}{2}+k\right)} \left(\frac{z}{2}\right)^{\frac{r-1}{2}+k-1}. (45)

Using again the relation Γ(y+1)=yΓ(y), we obtain

\Gamma\left(\tfrac{r}{2}+k\right) = \left(\tfrac{r}{2}+k-1\right)\Gamma\left(\tfrac{r}{2}+k-1\right) = \left(\tfrac{r}{2}+k-1\right)\left(\tfrac{r}{2}+k-2\right)\Gamma\left(\tfrac{r}{2}+k-2\right),

which, together with (45) and (37), yields the recurrence formula (42) for $U_k(z)$ if $z>0$ and $k \ge 2$. □
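As a quick numerical illustration of Lemma 2, the following sketch (our own, not part of the Python implementation [26]; it assumes SciPy's `kv` for the modified Bessel function $K_\alpha$ and `gammaln` for $\log\Gamma$) evaluates $U_k$ directly from the definition (37) and compares it with the recurrence (42):

```python
import numpy as np
from math import gamma
from scipy.special import kv, gammaln

def U(k, z, r):
    """U_k(z) from (37): K_{(r-1)/2+k}(z) (z/2)^{(r-1)/2+k} / Gamma(r/2+k)."""
    a = (r - 1) / 2 + k
    return kv(a, z) * np.exp(a * np.log(z / 2) - gammaln(r / 2 + k))

r, z = 5, 1.3
for k in range(2, 8):
    lhs = U(k, z, r)
    rhs = (z**2 / ((r + 2*k - 2) * (r + 2*k - 4)) * U(k - 2, z, r)
           + (r + 2*k - 3) / (r + 2*k - 2) * U(k - 1, z, r))
    assert abs(lhs - rhs) < 1e-10 * abs(lhs)   # recurrence (42) holds
```

Because all terms in (42) are nonnegative, the forward recursion involves no cancellation and is numerically stable.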

Lemma 3

(Recurrence formula of the function $D_k$). If, for all $k \in \mathbb{N}_0$, the function $D_k$ is defined by (38), then $D_k(z)$ satisfies for all $k \ge 1$ and $z \ge 0$ the recurrence formula

D_k(z) = D_{k-1}(z) - \frac{1}{2\sqrt{\pi}\left(\frac{r}{2}+k-1\right)}\,\frac{z}{\varrho_r}\,U_{k-1}\!\left(\frac{z}{\varrho_r}\right), (46)

with Uk(·) as defined in (37).

Proof. 

First, assume $z=0$. We have $D_k(0)=0$ for all $k \in \mathbb{N}_0$, and from the proof of Lemma 2 we have $U_k(0)=\Gamma\left(\frac{r-1}{2}+k\right)\big/\left(2\,\Gamma\left(\frac{r}{2}+k\right)\right)$ for all $k \in \mathbb{N}_0$. Thus, the left-hand side and the right-hand side of (46) are both zero, which shows that (46) holds for $z=0$ and $k \ge 1$.

Now, assume z>0 and consider the recurrence formula

z\,\mathbf{L}_\alpha(z) = z\,\mathbf{L}_{\alpha-2}(z) - 2(\alpha-1)\,\mathbf{L}_{\alpha-1}(z) - \frac{2^{1-\alpha} z^{\alpha}}{\sqrt{\pi}\,\Gamma\left(\alpha+\frac{1}{2}\right)}

for the modified Struve function $\mathbf{L}_\alpha$ of order $\alpha$ ([9], Sec. 11.4.25). Together with the recurrence formula (44) for the modified Bessel function of the second kind and order $\alpha$, we obtain

z\,\mathbf{L}_{\alpha}(z)\,K_{\alpha-1}(z) = z\,\mathbf{L}_{\alpha-2}(z)\,K_{\alpha-1}(z) - 2(\alpha-1)\,\mathbf{L}_{\alpha-1}(z)\,K_{\alpha-1}(z) - \frac{2^{1-\alpha}z^{\alpha}}{\sqrt{\pi}\,\Gamma\left(\alpha+\frac{1}{2}\right)}\,K_{\alpha-1}(z), (47)

z\,K_{\alpha}(z)\,\mathbf{L}_{\alpha-1}(z) = z\,K_{\alpha-2}(z)\,\mathbf{L}_{\alpha-1}(z) + 2(\alpha-1)\,K_{\alpha-1}(z)\,\mathbf{L}_{\alpha-1}(z). (48)

Plugging (47) and (48) into (38) for $\alpha = \frac{r-1}{2}+k$ yields for $k \ge 1$

D_k(z) = \frac{z}{2\varrho_r}\left[ K_{\frac{r-1}{2}+k-1}\!\left(\frac{z}{\varrho_r}\right) \mathbf{L}_{\frac{r-3}{2}+k-1}\!\left(\frac{z}{\varrho_r}\right) + K_{\frac{r-3}{2}+k-1}\!\left(\frac{z}{\varrho_r}\right) \mathbf{L}_{\frac{r-1}{2}+k-1}\!\left(\frac{z}{\varrho_r}\right) \right] - \frac{1}{\sqrt{\pi}\,\Gamma\left(\frac{r}{2}+k\right)} \left(\frac{z}{2\varrho_r}\right)^{\frac{r-1}{2}+k} K_{\frac{r-1}{2}+k-1}\!\left(\frac{z}{\varrho_r}\right).

Together with (38), the identity $\Gamma\left(\frac{r}{2}+k\right) = \left(\frac{r}{2}+k-1\right)\Gamma\left(\frac{r}{2}+k-1\right)$, and the definition of the function $U_k(\cdot)$ in (37), we obtain the recurrence formula (46) for $D_k(z)$ if $z>0$ and $k \ge 1$. □
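Lemma 3 can be checked numerically in the same spirit; the sketch below (our own illustration, assuming SciPy's `modstruve` for the modified Struve function $\mathbf{L}_\nu$ and a dimension $r \ge 3$ so that all Struve orders are nonnegative) compares $D_k$ evaluated from the definition (38) with the recurrence (46):

```python
import numpy as np
from scipy.special import kv, modstruve, gammaln

def U(k, z, r):
    """U_k(z) from (37)."""
    a = (r - 1) / 2 + k
    return kv(a, z) * np.exp(a * np.log(z / 2) - gammaln(r / 2 + k))

def D(k, z, r, rho_r):
    """D_k(z) from (38); modstruve(v, x) is the modified Struve function L_v(x)."""
    a = (r - 1) / 2 + k
    u = z / rho_r
    return z / (2 * rho_r) * (kv(a, u) * modstruve(a - 1, u)
                              + kv(a - 1, u) * modstruve(a, u))

r, rho_r, z = 4, 0.6, 0.9
for k in range(1, 6):
    lhs = D(k, z, r, rho_r)
    rhs = (D(k - 1, z, r, rho_r)
           - (z / rho_r) * U(k - 1, z / rho_r, r)
             / (2 * np.sqrt(np.pi) * (r / 2 + k - 1)))
    assert abs(lhs - rhs) < 1e-10                # recurrence (46) holds
```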

Lemma 4

(Recursive formula of the coefficient $\delta_k$). The coefficient $\delta_k$ defined by (39) satisfies for all $k \in \mathbb{N}_0$ the recurrence formula

\delta_{k+1} = \frac{1}{k+1} \sum_{j=1}^{k+1} j\,\gamma_j\,\delta_{k+1-j}, (49)

where δ0=1 and

\gamma_j = \sum_{i=1}^{r-1} \frac{1}{2j} \left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)^{j}. (50)
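To illustrate Lemma 4, the following sketch (our own, not the GitLab implementation [26]; it assumes the canonical correlations are ordered so that $\varrho_r$ is the smallest) computes $\delta_k$ once by brute force from the multinomial sum (39) and once via the recurrence (49), and verifies that the two agree:

```python
from itertools import product as iproduct
from math import factorial

def deltas_recursive(rho, K):
    """delta_0..delta_K via the recurrence (49); rho sorted descending, rho[-1] = rho_r."""
    rr = rho[-1]
    g = [sum((1 - rr**2 / ri**2)**j / (2 * j) for ri in rho[:-1])   # gamma_j, (50)
         for j in range(1, K + 1)]
    d = [1.0]                                                       # delta_0 = 1
    for k in range(K):
        d.append(sum(j * g[j - 1] * d[k + 1 - j] for j in range(1, k + 2)) / (k + 1))
    return d

def delta_direct(rho, k):
    """delta_k via the multinomial sum (39) (exponential in r; for cross-checking only)."""
    rr, rest = rho[-1], rho[:-1]
    total = 0.0
    for ks in iproduct(range(k + 1), repeat=len(rest)):
        if sum(ks) != k:
            continue
        term = 1.0
        for ki, ri in zip(ks, rest):
            term *= factorial(2 * ki) / (factorial(ki)**2 * 4**ki) * (1 - rr**2 / ri**2)**ki
        total += term
    return total

rho = [0.9, 0.7, 0.4]                      # rho_r = 0.4 is the smallest
d = deltas_recursive(rho, 6)
for k in range(7):
    assert abs(d[k] - delta_direct(rho, k)) < 1e-12
```

The recursive route needs only $O(K^2)$ operations, whereas the direct sum grows exponentially in $r$.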

For the derivation of Lemma 4, we use an adapted version of the method of Moschopoulos [27] and the following auxiliary result.

Lemma 5.

For $k \in \mathbb{N}_0$, let $g$ be a real univariate $(k+1)$-times differentiable function. Then, we have the following recurrence relation for the $(k+1)$-th derivative of the composite function $h = \exp \circ g$:

h^{(k+1)} = \sum_{j=1}^{k+1} \binom{k}{j-1}\, g^{(j)}\, h^{(k-j+1)}, (51)

where $f^{(i)}$ denotes the $i$-th derivative of the function $f$, with $f^{(0)} = f$.

Proof. 

We prove the assertion of Lemma 5 by induction over k. First, consider the base case for k=0. In this case, formula (51) gives

h^{(1)} = g^{(1)}\,h,

which is easily seen to be true.

Assuming formula (51) holds for $h^{(k)}$, we continue with the case $k+1$. Application of the product rule leads to

h^{(k+1)} = \left(h^{(k)}\right)^{(1)} = \left(\sum_{j=1}^{k} \binom{k-1}{j-1} g^{(j)} h^{(k-j)}\right)^{(1)} = \sum_{j=1}^{k} \binom{k-1}{j-1} g^{(j+1)} h^{(k-j)} + \sum_{j=1}^{k} \binom{k-1}{j-1} g^{(j)} h^{(k-j+1)}.

Replacing $j+1$ by $j$ in the first sum gives

h^{(k+1)} = \sum_{j=2}^{k+1} \binom{k-1}{j-2} g^{(j)} h^{(k-j+1)} + \sum_{j=1}^{k} \binom{k-1}{j-1} g^{(j)} h^{(k-j+1)}.

With this representation and the identity

\binom{k-1}{j-2} + \binom{k-1}{j-1} = \binom{k}{j-1},

we finally have

h^{(k+1)} = g^{(1)} h^{(k)} + \sum_{j=2}^{k} \left[\binom{k-1}{j-1} + \binom{k-1}{j-2}\right] g^{(j)} h^{(k-j+1)} + g^{(k+1)} h = \binom{k}{0} g^{(1)} h^{(k)} + \sum_{j=2}^{k} \binom{k}{j-1} g^{(j)} h^{(k-j+1)} + \binom{k}{k} g^{(k+1)} h = \sum_{j=1}^{k+1} \binom{k}{j-1} g^{(j)} h^{(k-j+1)}.

This completes the proof of Lemma 5. □

Proof of Lemma 4.

To prove the recurrence formula (49), we consider the characteristic function

\varphi_{\tilde{\nu}}(t) = \prod_{i=1}^{r} \left(1+\varrho_i^2 t^2\right)^{-\frac{1}{2}}, \quad t \in \mathbb{R}, (52)

of the random variable $\tilde{\nu}$ introduced in Lemma 1. On the one hand, the series representation of $\varphi_{\tilde{\nu}}$ given in (28) in the proof of Theorem 1 can be rewritten as follows using the coefficient $\delta_k$ defined in (39):

\varphi_{\tilde{\nu}}(t) = \left(1+\varrho_r^2 t^2\right)^{-\frac{r}{2}} \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \sum_{\ell=0}^{\infty} \delta_\ell \left(1+\varrho_r^2 t^2\right)^{-\ell}, \quad t \in \mathbb{R}. (53)

On the other hand, recall the expansion of $\left(1+\varrho_i^2 t^2\right)^{-\frac{1}{2}}$ given in (22), which together with (52) and an application of the natural logarithm yields the identity

\log\varphi_{\tilde{\nu}}(t) = \log\left[\left(1+\varrho_r^2 t^2\right)^{-\frac{r}{2}} \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i}\right] + \sum_{i=1}^{r-1} \log\left(1+\left(\frac{\varrho_r^2}{\varrho_i^2}-1\right)\frac{1}{1+\varrho_r^2 t^2}\right)^{-\frac{1}{2}}. (54)

Now consider the power series

\log(1+y) = \sum_{\ell=1}^{\infty} \frac{(-1)^{\ell+1}}{\ell}\, y^{\ell}, (55)

which is absolutely convergent for |y|<1. With the same arguments as in the proof of Theorem 1, in particular due to (26), we can apply the series expansion (55) to the second term on the right-hand side of (54) to obtain the absolutely convergent series representation

\log\varphi_{\tilde{\nu}}(t) = \log\left[\left(1+\varrho_r^2 t^2\right)^{-\frac{r}{2}} \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i}\right] + \sum_{\ell=1}^{\infty} \gamma_\ell \left(1+\varrho_r^2 t^2\right)^{-\ell}, (56)

where we have further used the definition of $\gamma_\ell$ given in (50). Applying the exponential function to both sides of (56) then yields the following expression for the characteristic function $\varphi_{\tilde{\nu}}$:

\varphi_{\tilde{\nu}}(t) = \left(1+\varrho_r^2 t^2\right)^{-\frac{r}{2}} \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \exp\left(\sum_{\ell=1}^{\infty} \gamma_\ell \left(1+\varrho_r^2 t^2\right)^{-\ell}\right). (57)

Comparing (53) and (57) yields the identity

\sum_{\ell=0}^{\infty} \delta_\ell \left(1+\varrho_r^2 t^2\right)^{-\ell} = \exp\left(\sum_{\ell=1}^{\infty} \gamma_\ell \left(1+\varrho_r^2 t^2\right)^{-\ell}\right). (58)

We now define $x = \left(1+\varrho_r^2 t^2\right)^{-1}$ and take the $(k+1)$-th derivative w. r. t. $x$ on both sides of (58), using the identity

\frac{d^m}{dx^m} \sum_{\ell=0}^{\infty} a_\ell x^{\ell} = \frac{d^m}{dx^m} \sum_{\ell=1}^{\infty} a_\ell x^{\ell} = \sum_{\ell=m}^{\infty} \frac{\ell!}{(\ell-m)!}\, a_\ell\, x^{\ell-m} (59)

for the $m$-th derivative ($m \ge 1$) of a power series $\sum_{\ell=0}^{\infty} a_\ell x^{\ell}$. For the left-hand side of (58), we obtain

\frac{d^{k+1}}{dx^{k+1}} \sum_{\ell=0}^{\infty} \delta_\ell x^{\ell} = \sum_{\ell=k+1}^{\infty} \frac{\ell!}{(\ell-k-1)!}\, \delta_\ell\, x^{\ell-k-1}. (60)

For the right-hand side of (58), we obtain

\frac{d^{k+1}}{dx^{k+1}} \exp\left(\sum_{\ell=1}^{\infty} \gamma_\ell x^{\ell}\right) = \sum_{j=1}^{k+1} \binom{k}{j-1} \left[\frac{d^{j}}{dx^{j}} \sum_{\ell=1}^{\infty} \gamma_\ell x^{\ell}\right] \left[\frac{d^{k-j+1}}{dx^{k-j+1}} \exp\left(\sum_{\ell=1}^{\infty} \gamma_\ell x^{\ell}\right)\right] = \sum_{j=1}^{k+1} \binom{k}{j-1} \left[\frac{d^{j}}{dx^{j}} \sum_{\ell=1}^{\infty} \gamma_\ell x^{\ell}\right] \left[\frac{d^{k-j+1}}{dx^{k-j+1}} \sum_{\ell=0}^{\infty} \delta_\ell x^{\ell}\right] = \sum_{j=1}^{k+1} \binom{k}{j-1} \left[\sum_{\ell=j}^{\infty} \frac{\ell!\,\gamma_\ell}{(\ell-j)!}\, x^{\ell-j}\right] \left[\sum_{\ell=k+1-j}^{\infty} \frac{\ell!\,\delta_\ell}{(\ell-k+j-1)!}\, x^{\ell-k+j-1}\right], (61)

where we used Lemma 5 and the identities (58) and (59). From the equality

\frac{d^{k+1}}{dx^{k+1}} \sum_{\ell=0}^{\infty} \delta_\ell x^{\ell} = \frac{d^{k+1}}{dx^{k+1}} \exp\left(\sum_{\ell=1}^{\infty} \gamma_\ell x^{\ell}\right)

and the evaluation of the right-hand sides of (60) and (61), we obtain

(k+1)!\,\delta_{k+1}\,x^{0} + \ldots = \sum_{j=1}^{k+1} \binom{k}{j-1}\, j!\,\gamma_j\,(k+1-j)!\,\delta_{k+1-j}\,x^{0} + \ldots,

where the dots collect all terms containing the powers $x^1, x^2, \ldots$

Comparing the coefficients of $x^0$ finally yields

\delta_{k+1} = \frac{1}{(k+1)!} \sum_{j=1}^{k+1} \binom{k}{j-1}\, j!\,\gamma_j\,(k+1-j)!\,\delta_{k+1-j} = \frac{1}{(k+1)!} \sum_{j=1}^{k+1} \frac{k!}{(j-1)!\,(k+1-j)!}\, j!\,\gamma_j\,(k+1-j)!\,\delta_{k+1-j} = \frac{1}{k+1} \sum_{j=1}^{k+1} j\,\gamma_j\,\delta_{k+1-j}.

This completes the proof of Lemma 4. □

5.2. Finite Sum Approximations

The results of the previous Section 5.1 can be used for efficient numerical calculations in the following way. Consider

\hat{f}_{i(\xi;\eta)}(x,n) = \frac{1}{\varrho_r\sqrt{\pi}} \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \sum_{k=0}^{n} \delta_k\, U_k\!\left(\frac{|x-I(\xi;\eta)|}{\varrho_r}\right), \quad x \in \mathbb{R}, (62)

for $n \in \mathbb{N}_0$, i.e., the finite sum approximation of the PDF given in (40). To calculate $\hat{f}_{i(\xi;\eta)}(x,n)$, first calculate $U_0\!\left(|x-I(\xi;\eta)|/\varrho_r\right)$ and $U_1\!\left(|x-I(\xi;\eta)|/\varrho_r\right)$ using (37). Then, use the recurrence formulas (42) and (49) to calculate the remaining summands in (62). The great advantage of this approach is that only two evaluations of the modified Bessel function are required; the rest of the calculation relies on the efficient recursive formulas.
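The procedure just described can be sketched as follows (a minimal illustration assuming SciPy; the function name `pdf_hat` and the convention that the canonical correlations are sorted in descending order, so that $\varrho_r$ is the smallest, are ours and not taken from the GitLab implementation [26]). It evaluates (62) for $x \ne I(\xi;\eta)$ with exactly two Bessel evaluations:

```python
import numpy as np
from scipy.special import kv, gammaln

def pdf_hat(x, rho, I, n):
    """Finite sum approximation (62) of the information-density PDF.
    rho: canonical correlations (any order; sorted internally, rho_r = smallest),
    I: mutual information I(xi;eta), n >= 1: number of summands, x != I assumed."""
    assert n >= 1
    rho = np.sort(np.asarray(rho, dtype=float))[::-1]   # descending, rho[-1] = rho_r
    r, rr = len(rho), rho[-1]
    z = abs(x - I) / rr
    # U_0, U_1 from the definition (37), the rest via the recurrence (42)
    U = np.empty(n + 1)
    for k in (0, 1):
        a = (r - 1) / 2 + k
        U[k] = kv(a, z) * np.exp(a * np.log(z / 2) - gammaln(r / 2 + k))
    for k in range(2, n + 1):
        U[k] = (z**2 / ((r + 2*k - 2) * (r + 2*k - 4)) * U[k - 2]
                + (r + 2*k - 3) / (r + 2*k - 2) * U[k - 1])
    # delta_k via the recurrence (49) with gamma_j from (50)
    g = [sum((1 - rr**2 / ri**2)**j / (2 * j) for ri in rho[:-1]) for j in range(1, n + 1)]
    d = [1.0]
    for k in range(n):
        d.append(sum(j * g[j - 1] * d[k + 1 - j] for j in range(1, k + 2)) / (k + 1))
    pref = np.prod(rr / rho[:-1]) / (rr * np.sqrt(np.pi))
    return pref * sum(dk * Uk for dk, Uk in zip(d, U))
```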

Similarly, consider

\hat{F}_{i(\xi;\eta)}(x,n) = \begin{cases} \frac{1}{2}-\hat{V}\left(I(\xi;\eta)-x,\,n\right) & \text{if } x \le I(\xi;\eta),\\ \frac{1}{2}+\hat{V}\left(x-I(\xi;\eta),\,n\right) & \text{if } x > I(\xi;\eta), \end{cases} (63)

with

\hat{V}(z,n) = \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \sum_{k=0}^{n} \delta_k\, D_k(z), \quad z \ge 0, (64)

for $n \in \mathbb{N}_0$, i.e., the finite sum approximation of the alternative representation of the CDF of the information density, where $\hat{V}(z,n)$ is the finite sum approximation of the function $V(\cdot)$ given in (41). To calculate $\hat{F}_{i(\xi;\eta)}(x,n)$, first calculate $D_0(z)$, $U_0(z/\varrho_r)$, and $U_1(z/\varrho_r)$ for $z=I(\xi;\eta)-x$ or $z=x-I(\xi;\eta)$ using (37) and (38). Then, use the recurrence formulas (42), (46), and (49) to calculate the remaining summands in (64). This approach requires only three evaluations of modified Bessel and Struve functions, resulting in efficient numerical calculations also for the CDF of the information density.
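A corresponding sketch for the CDF (again our own illustration under the same assumptions as before; `modstruve` is SciPy's modified Struve function $\mathbf{L}_\nu$, and $r \ge 3$ is assumed so that all Struve orders are nonnegative):

```python
import numpy as np
from scipy.special import kv, modstruve, gammaln

def cdf_hat(x, rho, I, n):
    """Finite sum approximation (63)-(64) of the information-density CDF.
    rho: canonical correlations (sorted internally, rho_r = smallest),
    I: mutual information, n >= 1: number of summands. Moderate |x - I| assumed
    (very large arguments would overflow modstruve)."""
    assert n >= 1
    rho = np.sort(np.asarray(rho, dtype=float))[::-1]
    r, rr = len(rho), rho[-1]
    z = abs(x - I)
    if z == 0.0:
        return 0.5                      # V_hat(0, n) = 0 since D_k(0) = 0
    u = z / rr
    a = (r - 1) / 2
    # U_0, U_1 from (37); remaining U_k via (42)
    U = np.empty(n + 1)
    for k in (0, 1):
        ak = a + k
        U[k] = kv(ak, u) * np.exp(ak * np.log(u / 2) - gammaln(r / 2 + k))
    for k in range(2, n + 1):
        U[k] = (u**2 / ((r + 2*k - 2) * (r + 2*k - 4)) * U[k - 2]
                + (r + 2*k - 3) / (r + 2*k - 2) * U[k - 1])
    # D_0 from (38); remaining D_k via (46)
    D = np.empty(n + 1)
    D[0] = z / (2 * rr) * (kv(a, u) * modstruve(a - 1, u)
                           + kv(a - 1, u) * modstruve(a, u))
    for k in range(1, n + 1):
        D[k] = D[k - 1] - u * U[k - 1] / (2 * np.sqrt(np.pi) * (r / 2 + k - 1))
    # delta_k via (49)
    g = [sum((1 - rr**2 / ri**2)**j / (2 * j) for ri in rho[:-1]) for j in range(1, n + 1)]
    d = [1.0]
    for k in range(n):
        d.append(sum(j * g[j - 1] * d[k + 1 - j] for j in range(1, k + 2)) / (k + 1))
    V = np.prod(rr / rho[:-1]) * sum(dk * Dk for dk, Dk in zip(d, D))
    return 0.5 - V if x <= I else 0.5 + V
```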

The following theorem provides suitable bounds to evaluate and control the error related to the introduced finite sum approximations.

Theorem 4

(Bounds of the approximation error for the alternative representation of PDF and CDF). For the finite sum approximations in (62)–(64) of the alternative representation of the PDF and CDF of the information density as given in Proposition 2, we have for $n \in \mathbb{N}$ summands the error bounds

\left|f_{i(\xi;\eta)}(x) - \hat{f}_{i(\xi;\eta)}(x,n)\right| \le \frac{\Gamma\left(\frac{r-1}{2}+n\right)}{2\varrho_r\sqrt{\pi}\,\Gamma\left(\frac{r}{2}+n\right)} \left(1 - \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \sum_{k=0}^{n}\delta_k\right), \quad x \in \mathbb{R}, (65)

and

\left|V(z) - \hat{V}(z,n)\right| \le \frac{1}{2} \left(1 - \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \sum_{k=0}^{n}\delta_k\right), \quad z \ge 0. (66)

Proof. 

From the CDF given in Corollary 1 for the special case where all canonical correlations are equal, we can conclude that the function

z \mapsto z\left[K_\alpha(z)\,\mathbf{L}_{\alpha-1}(z) + K_{\alpha-1}(z)\,\mathbf{L}_{\alpha}(z)\right], \quad z \ge 0, (67)

is monotonically increasing for all $\alpha = (j-1)/2$, $j \in \mathbb{N}$, and that furthermore

\lim_{z\to\infty} z\left[K_\alpha(z)\,\mathbf{L}_{\alpha-1}(z) + K_{\alpha-1}(z)\,\mathbf{L}_{\alpha}(z)\right] = 1 (68)

holds. Using (68), we obtain from (4)

\lim_{z\to\infty} 2V(z) = \sum_{k_1=0}^{\infty} \sum_{k_2=0}^{\infty} \cdots \sum_{k_{r-1}=0}^{\infty} \prod_{i=1}^{r-1} \frac{\varrho_r}{\varrho_i}\, \frac{(2k_i)!}{(k_i!)^2\, 4^{k_i}} \left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)^{k_i}

by exchanging the limit and the summation, which is justified by the monotone convergence theorem. Due to the properties of the CDF, we have $\lim_{z\to\infty} 2V(z) = 1$, which implies

\prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \sum_{k=0}^{\infty} \delta_k = \sum_{k_1=0}^{\infty} \sum_{k_2=0}^{\infty} \cdots \sum_{k_{r-1}=0}^{\infty} \prod_{i=1}^{r-1} \frac{\varrho_r}{\varrho_i}\, \frac{(2k_i)!}{(k_i!)^2\, 4^{k_i}} \left(1-\frac{\varrho_r^2}{\varrho_i^2}\right)^{k_i} = 1, (69)

where the first equality follows from the definition of the coefficient δk in (39).

We now obtain with (41) and (64)

\left|V(z) - \hat{V}(z,n)\right| = \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \sum_{k=n+1}^{\infty} \delta_k\, D_k(z) \le \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \sum_{k=n+1}^{\infty} \frac{1}{2}\,\delta_k.

The inequality follows from the definition of the function Dk(·) in (38), the monotonicity of the function in (67), and from (68). Then, (66) follows from (69).

Similarly, we obtain with (40) and (62)

\left|f_{i(\xi;\eta)}(x) - \hat{f}_{i(\xi;\eta)}(x,n)\right| = \frac{1}{\varrho_r\sqrt{\pi}} \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \sum_{k=n+1}^{\infty} \delta_k\, U_k\!\left(\frac{|x-I(\xi;\eta)|}{\varrho_r}\right) \le \frac{1}{\varrho_r\sqrt{\pi}} \prod_{i=1}^{r-1}\frac{\varrho_r}{\varrho_i} \sum_{k=n+1}^{\infty} \delta_k\, \frac{\Gamma\left(\frac{r-1}{2}+n\right)}{2\,\Gamma\left(\frac{r}{2}+n\right)}.

The inequality follows from the definition of the function $U_k(\cdot)$, Proposition 1, and the fact that $\Gamma\left(\frac{r-1}{2}+k\right)\big/\Gamma\left(\frac{r}{2}+k\right)$ is monotonically decreasing in $k \in \mathbb{N}_0$. Then, (65) follows from (69). □

Remark 1.

Note that the bound in (65) can be further simplified using the inequality $\Gamma(\alpha)/\Gamma\left(\alpha+\frac{1}{2}\right) \le \sqrt{\pi}$. Further note that the derived error bounds are uniform in the sense that they depend only on the parameters of the given Gaussian distribution and the number of summands considered. As can be seen from (69), the bounds converge to zero as the number $n$ of summands increases.
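In practice, one can fix a target accuracy first and then grow $n$ until the bound drops below it. A sketch using only the standard library (the function name and loop structure are our own, not the GitLab implementation [26]):

```python
from math import lgamma, exp, sqrt, pi

def n_for_tolerance(rho, eps, which="cdf", n_max=5000):
    """Smallest n for which the uniform error bound (66) ("cdf") or (65) ("pdf")
    falls below eps; rho are the canonical correlations."""
    rho = sorted(rho, reverse=True)           # rho[-1] = rho_r, the smallest
    r, rr = len(rho), rho[-1]
    pref = 1.0
    for ri in rho[:-1]:
        pref *= rr / ri
    c = [1 - rr**2 / ri**2 for ri in rho[:-1]]
    g, d = [], [1.0]
    S = pref                                  # pref * sum_{k<=n} delta_k
    for n in range(1, n_max + 1):
        g.append(sum(ci**n for ci in c) / (2 * n))                           # gamma_n, (50)
        d.append(sum(j * g[j - 1] * d[n - j] for j in range(1, n + 1)) / n)  # delta_n, (49)
        S += pref * d[n]
        if which == "cdf":
            bound = 0.5 * (1.0 - S)           # right-hand side of (66)
        else:
            a = (r - 1) / 2 + n               # right-hand side of (65), via lgamma
            bound = exp(lgamma(a) - lgamma(r / 2 + n)) / (2 * rr * sqrt(pi)) * (1.0 - S)
        if bound < eps:
            return n
    raise RuntimeError("n_max too small for the requested tolerance")
```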

Remark 2

(Relation to Bell polynomials). Interestingly, the coefficient $\delta_k$ can be expressed for all $k \in \mathbb{N}$ in the following form:

\delta_k = \frac{B_k\left(\gamma_1,\, 2\gamma_2,\, 6\gamma_3,\, \ldots,\, k!\,\gamma_k\right)}{k!},

where $\gamma_j$ is defined in (50) and $B_k$ denotes the complete Bell polynomial of order $k$ ([28], Sec. 3.3). Even though this connection to the Bell polynomials provides an explicit formula for $\delta_k$, the recursive formula given in Lemma 4 is more efficient for numerical calculations.
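The identity can be checked numerically with a self-contained implementation of the complete Bell polynomials via their standard recurrence $B_{m+1} = \sum_{i=0}^{m}\binom{m}{i}\,B_{m-i}\,x_{i+1}$ (our own sketch, using the argument convention $x_j = j!\,\gamma_j$ from the remark):

```python
from math import comb, factorial

def complete_bell(xs):
    """Complete Bell polynomials B_0..B_n evaluated at xs = [x_1, ..., x_n],
    via B_{m+1} = sum_i C(m, i) B_{m-i} x_{i+1}."""
    B = [1.0]
    for m in range(len(xs)):
        B.append(sum(comb(m, i) * B[m - i] * xs[i] for i in range(m + 1)))
    return B

rho = [0.9, 0.6, 0.5, 0.2]                 # rho_r = 0.2 is the smallest
rr, K = rho[-1], 8
g = [sum((1 - rr**2 / ri**2)**j / (2 * j) for ri in rho[:-1]) for j in range(1, K + 1)]
# delta_k via the recurrence of Lemma 4
d = [1.0]
for k in range(K):
    d.append(sum(j * g[j - 1] * d[k + 1 - j] for j in range(1, k + 2)) / (k + 1))
# delta_k via the Bell-polynomial identity of Remark 2
B = complete_bell([factorial(j) * g[j - 1] for j in range(1, K + 1)])
for k in range(K + 1):
    assert abs(d[k] - B[k] / factorial(k)) < 1e-9 * max(1.0, abs(d[k]))
```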

6. Numerical Examples and Illustrations

We illustrate the results of this paper with some examples, all of which can be verified with the Python implementation publicly available on GitLab [26].

Equal canonical correlations. First, we consider the special case of Corollary 1 in which all canonical correlations are equal. The PDF and CDF given by (6) and (7) are illustrated in Figures 1 and 2 in centered form, i.e., shifted by $I(\xi;\eta)$, for $r \in \{1,2,3,4,5\}$ and equal canonical correlations $\varrho_i = 0.9$, $i=1,\ldots,r$. In Figures 3 and 4, a fixed number of $r=5$ equal canonical correlations $\varrho_i \in \{0.1,0.2,0.5,0.7,0.9\}$, $i=1,\ldots,r$, is considered. When all canonical correlations are equal, the distribution of the information density $i(\xi;\eta)$ converges to a Gaussian distribution as $r \to \infty$ due to the central limit theorem. Figures 5 and 6 show, for $r \in \{5,10,20,40\}$ and equal canonical correlations $\varrho_i = 0.2$, $i=1,2,\ldots,r$, the PDF and CDF of the information density together with corresponding Gaussian approximations. The approximations are obtained by considering Gaussian distributions that have the same variance as the information density $i(\xi;\eta)$. Recall that the variance of the information density is given by (11), i.e., by the sum of the squared canonical correlations. The illustrations show that only for a large number of equal canonical correlations does the distribution of the information density become approximately Gaussian.

Figure 1. PDF $f_{i(\xi;\eta)-I(\xi;\eta)}$ for $r \in \{1,2,3,4,5\}$ equal canonical correlations $\varrho_i = 0.9$.

Figure 2. CDF $F_{i(\xi;\eta)-I(\xi;\eta)}$ for $r \in \{1,2,3,4,5\}$ equal canonical correlations $\varrho_i = 0.9$.

Figure 3. PDF $f_{i(\xi;\eta)-I(\xi;\eta)}$ for $r=5$ equal canonical correlations $\varrho_i \in \{0.1,0.2,0.5,0.7,0.9\}$.

Figure 4. CDF $F_{i(\xi;\eta)-I(\xi;\eta)}$ for $r=5$ equal canonical correlations $\varrho_i \in \{0.1,0.2,0.5,0.7,0.9\}$.

Figure 5. PDF $f_{i(\xi;\eta)-I(\xi;\eta)}$ for $r \in \{5,10,20,40\}$ equal canonical correlations $\varrho_i = 0.2$ vs. Gaussian approximation.

Figure 6. CDF $F_{i(\xi;\eta)-I(\xi;\eta)}$ for $r \in \{5,10,20,40\}$ equal canonical correlations $\varrho_i = 0.2$ vs. Gaussian approximation.

Different canonical correlations. To illustrate the case with different canonical correlations, let us consider two more examples.

(i) First, assume that the random vectors $\xi=(\xi_1,\xi_2,\ldots,\xi_p)$ and $\eta=(\eta_1,\eta_2,\ldots,\eta_q)$ have equal dimensions, i.e., $p=q$, and are related by

(\eta_1,\eta_2,\ldots,\eta_p) = (\xi_1+\zeta_1,\; \xi_2+\zeta_2,\; \ldots,\; \xi_p+\zeta_p),

where $\xi=(\xi_1,\xi_2,\ldots,\xi_p)$ and $\zeta=(\zeta_1,\zeta_2,\ldots,\zeta_p)$ are zero-mean Gaussian random vectors, independent of each other and with covariance matrices

R_\xi = \left(\rho^{|i-j|}\right)_{i,j=1}^{p} \quad\text{and}\quad R_\zeta = \sigma_z^2\, I_p,

for parameters $0 < |\rho| < 1$ and $\sigma_z^2 > 0$, where $I_p$ denotes the $p \times p$ identity matrix. The covariance matrix of the Gaussian random vector $(\xi_1,\ldots,\xi_p,\eta_1,\ldots,\eta_p)$, which is the basis of the canonical correlation analysis, is given by

\begin{pmatrix} R_\xi & R_{\xi\eta} \\ R_{\xi\eta}^{\top} & R_\eta \end{pmatrix} = \begin{pmatrix} R_\xi & R_\xi \\ R_\xi & R_\xi + R_\zeta \end{pmatrix}.

The specified situation corresponds to a discrete-time additive noise channel, where a stationary first-order Markov-Gaussian input process is corrupted by a stationary additive white Gaussian noise process. In this setting, a block of p consecutive input and output symbols is considered.

For given parameter values $\rho$ and $\sigma_z^2$, the canonical correlations can be calculated numerically with the method described in Section 3. However, the example at hand even allows the derivation of explicit formulas for the canonical correlations. Evaluating the approach of Section 3 analytically yields

\varrho_i(\rho,\sigma_z^2) = \sqrt{\frac{\lambda_i}{\lambda_i+\sigma_z^2}} \quad\text{with}\quad \lambda_i = \frac{1-\rho^2}{1-2\rho\cos(\theta_i)+\rho^2}, \quad i=1,2,\ldots,r=p, (70)

where $\theta_1,\theta_2,\ldots,\theta_r$ are the zeros of the function

g(\theta) = \sin((r+1)\theta) - 2\rho\sin(r\theta) + \rho^2\sin((r-1)\theta), \quad \theta \in (0,\pi).

In this representation, $\lambda_1,\lambda_2,\ldots,\lambda_r$ denote the eigenvalues of the covariance matrix $R_\xi = \left(\rho^{|i-j|}\right)_{i,j=1}^{p}$ as derived in [29], Sec. 5.3.
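The two routes to the canonical correlations can be cross-checked numerically. The sketch below (our own illustration; it assumes SciPy's `brentq` root finder and the relation $\varrho_i = \sqrt{\lambda_i/(\lambda_i+\sigma_z^2)}$, which follows from the fact that the squared canonical correlations of $(\xi, \xi+\zeta)$ are the eigenvalues of $(R_\xi+\sigma_z^2 I_p)^{-1}R_\xi$) compares the direct eigenvalue route with the closed-form route via the zeros of $g$:

```python
import numpy as np
from scipy.optimize import brentq

def can_corr_eig(rho, var_z, p):
    """Canonical correlations via the eigenvalues of R_xi (direct numerical route)."""
    R = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    lam = np.linalg.eigvalsh(R)
    return np.sort(np.sqrt(lam / (lam + var_z)))[::-1]

def can_corr_formula(rho, var_z, p):
    """Canonical correlations via (70), locating the zeros of g on (0, pi)."""
    def g(t):
        return np.sin((p + 1) * t) - 2 * rho * np.sin(p * t) + rho**2 * np.sin((p - 1) * t)
    grid = np.linspace(1e-9, np.pi - 1e-9, 200 * p)
    vals = g(grid)
    thetas = [brentq(g, lo, hi) for lo, hi, vlo, vhi
              in zip(grid[:-1], grid[1:], vals[:-1], vals[1:]) if vlo * vhi < 0]
    lam = (1 - rho**2) / (1 - 2 * rho * np.cos(np.array(thetas)) + rho**2)
    return np.sort(np.sqrt(lam / (lam + var_z)))[::-1]
```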

As numerical examples, Figures 7 and 8 show the approximated PDF $\hat{f}_{i(\xi;\eta)-I(\xi;\eta)}(\cdot,n)$ and CDF $\hat{F}_{i(\xi;\eta)-I(\xi;\eta)}(\cdot,n)$ for $p=r \in \{5,10,20,40\}$ and the parameter values $\rho=0.9$ and $\sigma_z^2=10$, computed using the finite sums (62) and (64). The bounds of the approximation error given in Theorem 4 are chosen to be smaller than $10^{-3}$ to obtain a high precision of the plotted curves. The number $n$ of summands required in (62) and (64) to achieve these error bounds for $r \in \{5,10,20,40\}$ is $n \in \{217,333,462,649\}$ for the PDF and $n \in \{282,444,618,847\}$ for the CDF. For this example, the distribution of the information density $i(\xi;\eta)$ converges to a Gaussian distribution as $r \to \infty$. However, Figures 7 and 8 show that, even for $r=40$, there is still a significant gap between the exact distribution and the corresponding Gaussian approximation.

Figure 7. Approximated PDF $\hat{f}_{i(\xi;\eta)-I(\xi;\eta)}(\cdot,n)$ for $r \in \{5,10,20,40\}$ canonical correlations $\varrho_i(\rho,\sigma_z^2)$ given in (70) for $\rho=0.9$ and $\sigma_z^2=10$ (approximation error $<10^{-3}$) vs. Gaussian approximation ($r \in \{20,40\}$).

Figure 8. Approximated CDF $\hat{F}_{i(\xi;\eta)-I(\xi;\eta)}(\cdot,n)$ for $r \in \{5,10,20,40\}$ canonical correlations $\varrho_i(\rho,\sigma_z^2)$ given in (70) for $\rho=0.9$ and $\sigma_z^2=10$ (approximation error $<10^{-3}$) vs. Gaussian approximation ($r \in \{20,40\}$).

(ii) As a second example with different canonical correlations, let us consider the sequence $\{\varrho_1(T),\varrho_2(T),\ldots,\varrho_r(T)\}$ with

\varrho_i(T) = \frac{T^2}{T^2+\pi^2\left(i-\frac{1}{2}\right)^2}, \quad i=1,2,\ldots,r. (71)

These canonical correlations are related to the information density of a continuous-time additive white Gaussian noise channel confined to a finite time interval $[0,T]$ with a Brownian motion as input signal (see, e.g., Huffmann [30], Sec. 8.1 for more details). Figures 9 and 10 show the approximated PDF $\hat{f}_{i(\xi;\eta)-I(\xi;\eta)}(\cdot,n)$ and CDF $\hat{F}_{i(\xi;\eta)-I(\xi;\eta)}(\cdot,n)$ for $r \in \{2,5,10,15\}$ and $T=1$, computed using the finite sums (62) and (64). The bounds of the approximation error given in Theorem 4 are chosen to be smaller than $10^{-2}$, such that no differences would be visible in the plotted curves by further lowering the approximation error. The number $n$ of summands required in (62) and (64) to achieve these error bounds for $r \in \{2,5,10,15\}$ is $n \in \{15,141,638,1688\}$ for the PDF and $n \in \{20,196,886,2071\}$ for the CDF. Choosing $r$ larger than 15 for the canonical correlations (71) with $T=1$ does not result in visible changes of the PDF and CDF compared to $r=15$. Together with Figures 9 and 10, this demonstrates that a Gaussian approximation is not valid for this example, even as $r \to \infty$.

Figure 9. Approximated PDF $\hat{f}_{i(\xi;\eta)-I(\xi;\eta)}(\cdot,n)$ for $r \in \{2,5,10,15\}$ canonical correlations $\varrho_i(T)$ given in (71) for $T=1$ (approximation error $<10^{-2}$) vs. Gaussian approximation ($r=15$).

Figure 10. Approximated CDF $\hat{F}_{i(\xi;\eta)-I(\xi;\eta)}(\cdot,n)$ for $r \in \{2,5,10,15\}$ canonical correlations $\varrho_i(T)$ given in (71) for $T=1$ (approximation error $<10^{-2}$) vs. Gaussian approximation ($r=15$).

Indeed, from Th. 9.6.1 and the comment above Eq. (9.6.45) in [8], one can conclude that, whenever the canonical correlations satisfy

\lim_{r\to\infty} \sum_{i=1}^{r} \varrho_i^2 < \infty,

then the distribution of the information density is not Gaussian.
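This criterion is easy to check for example (ii). Taking (71) at face value, the partial sums of the squared canonical correlations saturate almost immediately (a minimal sketch, our own illustration):

```python
import math

def sum_sq(T, r):
    """Partial sum of squared canonical correlations for example (ii), Eq. (71)."""
    return sum((T**2 / (T**2 + math.pi**2 * (i - 0.5)**2))**2 for i in range(1, r + 1))

# For T = 1 the partial sums converge, so the criterion applies and the
# information density has a non-Gaussian limit distribution.
s15, s10k = sum_sq(1.0, 15), sum_sq(1.0, 10000)
assert s10k - s15 < 1e-3          # tail beyond r = 15 is negligible
# In contrast, r equal correlations 0.2 give the sum 0.04 * r, which diverges,
# consistent with the Gaussian limit observed for equal canonical correlations.
```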

7. Summary of Contributions

We derived series representations of the PDF and CDF of the information density for arbitrary Gaussian random vectors as well as a general formula for the central moments using canonical correlation analysis. We provided simplified and closed-form expressions for important special cases, in particular when all canonical correlations are equal, and derived recurrence formulas and uniform error bounds for finite sum approximations of the general series representations. These approximations and recurrence formulas are suitable for efficient and arbitrarily accurate numerical calculations, where the approximation error can be easily controlled with the derived error bounds. Moreover, we provided examples showing the (in)validity of approximating the information density with a Gaussian random variable.

Author Contributions

J.E.W.H. and M.M. conceived this work, performed the analysis, validated the results, and wrote the manuscript. All authors have read and agreed to this version of the manuscript.

Data Availability Statement

An implementation in Python allowing efficient numerical calculations related to the main results of the paper is publicly available on GitLab: https://gitlab.com/infth/information-density (accessed on 24 June 2022).

Conflicts of Interest

The authors declare no conflict of interest.

Funding Statement

The work of M.M. was supported in part by the German Research Foundation (Deutsche Forschungsgemeinschaft) as part of Germany’s Excellence Strategy—EXC 2050/1—Project ID 390696704—Cluster of Excellence “Centre for Tactile Internet with Human-in-the-Loop” (CeTI) of Technische Universität Dresden. We acknowledge the open access publication funding granted by CeTI.

Footnotes

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Han T.S., Verdú S. Approximation Theory of Output Statistics. IEEE Trans. Inf. Theory. 1993;39:752–772. doi: 10.1109/18.256486. [DOI] [Google Scholar]
  • 2.Han T.S. Information-Spectrum Methods in Information Theory. Springer; Berlin/Heidelberg, Germany: 2003. [Google Scholar]
  • 3.Shannon C.E. Probability of Error for Optimal Codes in a Gaussian Channel. Bell Syst. Tech. J. 1959;38:611–659. doi: 10.1002/j.1538-7305.1959.tb03905.x. [DOI] [Google Scholar]
  • 4.Dobrushin R.L. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability. University of California Press; Berkeley, CA, USA: 1961. Mathematical Problems in the Shannon Theory of Optimal Coding of Information; pp. 211–252. Volume 1: Contributions to the Theory of Statistics. [Google Scholar]
  • 5.Strassen V. Transactions of the Third Prague Conference on Information Theory, Statistical Decision Functions, Random Processes (Held 1962) Czechoslovak Academy of Sciences; Prague, Czech Republic: 1964. Asymptotische Abschätzungen in Shannons Informationstheorie; pp. 689–723. [Google Scholar]
  • 6.Polyanskiy Y., Poor H.V., Verdú S. Channel Coding Rate in the Finite Blocklength Regime. IEEE Trans. Inf. Theory. 2010;56:2307–2359. doi: 10.1109/TIT.2010.2043769. [DOI] [Google Scholar]
  • 7.Durisi G., Koch T., Popovski P. Toward Massive, Ultrareliable, and Low-Latency Wireless Communication With Short Packets. Proc. IEEE. 2016;104:1711–1726. doi: 10.1109/JPROC.2016.2537298. [DOI] [Google Scholar]
  • 8.Pinsker M.S. Information and Information Stability of Random Variables and Processes. Holden-Day; San Francisco, CA, USA: 1964. [Google Scholar]
  • 9.Olver F.W.J., Lozier D.W., Boisvert R.F., Clark C.W., editors. NIST Handbook of Mathematical Functions. Cambridge University Press; Cambridge, UK: 2010. [Google Scholar]
  • 10.Mathai A.M. Storage Capacity of a Dam With Gamma Type Inputs. Ann. Inst. Stat. Math. 1982;34:591–597. doi: 10.1007/BF02481056. [DOI] [Google Scholar]
  • 11.Grad A., Solomon H. Distribution of Quadratic Forms and Some Applications. Ann. Math. Stat. 1955;26:464–477. doi: 10.1214/aoms/1177728491. [DOI] [Google Scholar]
  • 12.Kotz S., Johnson N.L., Boyd D.W. Series Representations of Distributions of Quadratic Forms in Normal Variables. I. Central Case. Ann. Math. Stat. 1967;38:823–837. doi: 10.1214/aoms/1177698877. [DOI] [Google Scholar]
  • 13.Huffmann J.E.W., Mittelbach M. On the Distribution of the Information Density of Gaussian Random Vectors: Explicit Formulas and Tight Approximations. Entropy. 2022;24:924. doi: 10.3390/e24070924. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Simon M.K. Probability Distributions Involving Gaussian Random Variables: A Handbook for Engineers and Scientists. Springer; Berlin/Heidelberg, Germany: 2006. [Google Scholar]
  • 15.Laneman J.N. On the Distribution of Mutual Information; Proceedings of the Workshop Information Theory and Its Applications (ITA); San Diego, CA, USA. 13 February 2006. [Google Scholar]
  • 16.Wu P., Jindal N. Coding Versus ARQ in Fading Channels: How Reliable Should the PHY Be? IEEE Trans. Commun. 2011;59:3363–3374. doi: 10.1109/TCOMM.2011.102011.100152. [DOI] [Google Scholar]
  • 17.Buckingham D., Valenti M.C. The Information-Outage Probability of Finite-Length Codes Over AWGN Channels; Proceedings of the 42nd Annual Conference on Information Sciences and Systems (CISS); Princeton, NJ, USA. 19–21 March 2008. [Google Scholar]
  • 18.Hotelling H. Relations Between Two Sets of Variates. Biometrika. 1936;28:321–377. doi: 10.1093/biomet/28.3-4.321. [DOI] [Google Scholar]
  • 19.Gelfand I.M., Yaglom A.M. AMS Translations, Series 2. Volume 12. AMS; Providence, RI, USA: 1959. Calculation of the Amount of Information About a Random Function Contained in Another Such Function; pp. 199–246. [Google Scholar]
  • 20.Härdle W.K., Simar L. Applied Multivariate Statistical Analysis. 4th ed. Springer; Berlin/Heidelberg, Germany: 2015. [Google Scholar]
  • 21.Koch I. Analysis of Multivariate and High-Dimensional Data. Cambridge University Press; Cambridge, UK: 2014. [Google Scholar]
  • 22.Timm N.H. Applied Multivariate Analysis. Springer; Berlin/Heidelberg, Germany: 2002. [Google Scholar]
  • 23.Ibragimov I.A., Rozanov Y.A. On the Connection Between Two Characteristics of Dependence of Gaussian Random Vectors. Theory Probab. Appl. 1970;15:295–299. doi: 10.1137/1115034. [DOI] [Google Scholar]
  • 24.Gradshteyn I.S., Ryzhik I.M. Table of Integrals, Series, and Products. 7th ed. Elsevier; Amsterdam, The Netherlands: 2007. [Google Scholar]
  • 25.Prudnikov A.P., Brychov Y.A., Marichev O.I. Integrals and Series, Volume 2: Special Functions. Gordon and Breach Science; New York, NY, USA: 1986. [Google Scholar]
  • 26.Huffmann J.E.W., Mittelbach M. Efficient Python Implementation to Numerically Calculate PDF, CDF, and Moments of the Information Density of Gaussian Random Vectors. 2021. Source code provided on GitLab. Available online: https://gitlab.com/infth/information-density (accessed on 24 June 2022).
  • 27.Moschopoulos P.G. The Distribution of the Sum of Independent Gamma Random Variables. Ann. Inst. Stat. Math. 1985;37:541–544. doi: 10.1007/BF02481123. [DOI] [Google Scholar]
  • 28.Comtet L. Advanced Combinatorics: The Art of Finite and Infinite Expansions. Revised and Enlarged ed. D. Reidel Publishing Company; Dordrecht, The Netherlands: 1974. [Google Scholar]
  • 29.Grenander U., Szegö G. Toeplitz Forms and Their Applications. University of California Press; Berkeley, CA, USA: 1958. [Google Scholar]
  • 30.Huffmann J.E.W. Diploma Thesis. Department of Electrical Engineering and Information Technology, Technische Universität Dresden; Dresden, Germany: 2021. [(accessed on 24 June 2022)]. Canonical Correlation and the Calculation of Information Measures for Infinite-Dimensional Distributions. Available online: https://nbn-resolving.org/urn:nbn:de:bsz:14-qucosa2-742541. [Google Scholar]
