Conditional Rényi Entropy and the Relationships between Rényi Capacities

Gautam Aishwarya; Mokshay Madiman


2020 May 6;22(5):526. doi: 10.3390/e22050526

PMCID: PMC7517019 PMID: 33286298

Abstract

The analogues of Arimoto’s definition of conditional Rényi entropy and Rényi mutual information are explored for abstract alphabets. These quantities, although dependent on the reference measure, have some useful properties similar to those known in the discrete setting. In addition to laying out some such basic properties and the relations to Rényi divergences, the relationships between the families of mutual informations defined by Sibson, Augustin-Csiszár, and Lapidoth-Pfister, as well as the corresponding capacities, are explored.

Keywords: conditional Rényi entropy, Rényi mutual information, Rényi capacity, Sibson mutual information

1. Introduction

Shannon’s information measures (entropy, conditional entropy, the Kullback divergence or relative entropy, and mutual information) are ubiquitous both because they arise as operational fundamental limits of various communication or statistical inference problems, and because they are functionals that have become fundamental in the development and advancement of probability theory itself. More than half a century ago, Rényi [1] introduced a family of information measures extending those of Shannon [2], parametrized by an order parameter $α \in [0, \infty]$ . Rényi’s information measures are also fundamental; indeed, they are (for $α > 1$ ) just monotone functions of $L^{p}$ -norms, whose relevance in any field that relies on analysis needs no justification. Furthermore, they appear in probability theory, PDE, functional analysis, additive combinatorics, and convex geometry (see, e.g., [3,4,5,6,7,8,9]) in ways where understanding them as information measures, rather than simply as monotone functions of $L^{p}$ -norms, is fruitful. For example, there is an intricate story of parallels between entropy power inequalities (see, e.g., [10,11,12]), Brunn-Minkowski-type volume inequalities (see, e.g., [13,14,15]) and sumset cardinality (see, e.g., [16,17,18,19,20]), which is clarified by considering logarithms of volumes and Shannon entropies as members of the larger class of Rényi entropies. It is also now recognized that Rényi’s information measures arise as fundamental operational limits in a range of information-theoretic and statistical problems (see, e.g., [21,22,23,24,25,26,27]). Consequently, there has been considerable interest in developing the theory of Rényi’s information measures (which is far less developed than in the Shannon case), and a steady stream of recent papers [27,28,29,30,31,32,33,34,35,36] has elucidated their properties beyond the early work of [37,38,39,40].
This paper, part of which was presented at ISIT 2019 [41], is a further contribution along these lines.

More specifically, three notions of Rényi mutual information have been considered in the literature for discrete alphabets (usually named after Sibson, Arimoto and Csiszár). Sibson’s definition has also been considered for abstract alphabets, but Arimoto’s has not. Indeed, Verdú [31] asserts: “One shortcoming of Arimoto’s proposal is that its generalization to non-discrete alphabets is not self-evident.” It is not self-evident because, although there is an obvious generalized definition, the resulting mutual information depends on the choice of reference measure on the abstract alphabet, which is not a desirable property. Nonetheless, the perspective taken in this note is that it is still worthwhile to develop the properties of the abstract Arimoto conditional Rényi entropy while keeping this dependence on the reference measure in mind. Sibson’s definition is then simply the special case of Arimoto’s definition obtained for a particular choice of reference measure.

Our main motivation comes from considering various notions of Rényi capacity. Certain equivalences between such notions have been shown by Csiszár [21] for finite alphabets and by Nakiboğlu [36,42] for abstract alphabets; in this note, these equivalences and relationships are extended further.

This paper is organized as follows. In Section 2 we define the conditional Rényi entropy for random variables taking values in a Polish space. Section 3 presents a variational formula for the conditional Rényi entropy in terms of Rényi divergence, which will be a key ingredient in several later results. Section 4 proves basic properties that the abstract conditional Rényi entropy shares with its discrete version, including descriptions for the special orders $0, 1$ and ∞, monotonicity in the order, reduction of entropy upon conditioning, and a version of the chain rule. Section 5 discusses and compares several notions of $α$ -mutual information. Section 6 studies the notions of channel capacity arising from these different notions of $α$ -mutual information and compares them using the results of the preceding section.

2. Definition of Conditional Rényi Entropies

Let S be a Polish space and $B_{S}$ its Borel $σ$ -algebra. We fix a $σ$ -finite reference measure $γ$ on $(S, B_{S})$ . Unless stated otherwise, entropies and all $L^{p}$ spaces below are taken with respect to this measure space.

Definition 1.

Let X be an S-valued random variable with density f with respect to γ. We define the Rényi entropy of X of order $α \in (0, 1) \cup (1, \infty)$ by

$h_{α}^{γ} (X) = \frac{α}{1 - α} \log {∥ f ∥}_{α} .$

It will be convenient to write down Rényi entropy as

$h_{α}^{γ} (X) = - \log {Ren}_{α}^{γ} (X),$

where ${Ren}_{α}^{γ} (X) = {∥ f ∥}_{α}^{\frac{α}{α - 1}}$ will be called the Rényi probability of order $α$ of X.
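For a finite alphabet, Definition 1 and the Rényi-probability form can be checked numerically. The following Python (NumPy) sketch assumes that S is finite and that γ is given by positive point masses `gamma` (all ones for counting measure); the function names `renyi_entropy` and `renyi_probability` are illustrative, not taken from the paper.

```python
import numpy as np

def renyi_entropy(f, gamma, alpha):
    """h_alpha^gamma(X) = alpha/(1-alpha) * log ||f||_alpha on a finite set.

    f: density of X w.r.t. gamma; gamma: point masses of the reference measure.
    Valid for alpha in (0, 1) or (1, inf).
    """
    norm = np.sum(gamma * f**alpha) ** (1.0 / alpha)    # ||f||_alpha in L^alpha(gamma)
    return alpha / (1.0 - alpha) * np.log(norm)

def renyi_probability(f, gamma, alpha):
    """Ren_alpha^gamma(X) = ||f||_alpha^(alpha/(alpha-1))."""
    norm = np.sum(gamma * f**alpha) ** (1.0 / alpha)
    return norm ** (alpha / (alpha - 1.0))

gamma = np.ones(4)                      # counting measure on {0, 1, 2, 3}
f = np.array([0.4, 0.3, 0.2, 0.1])      # under counting measure the density is the pmf
for alpha in (0.5, 2.0, 3.0):
    # h = -log Ren, as in the display above
    assert np.isclose(renyi_entropy(f, gamma, alpha),
                      -np.log(renyi_probability(f, gamma, alpha)))
```

Rescaling the reference measure shifts the entropy: with reference `2 * gamma` the density becomes `f / 2` and the entropy grows by log 2, a first glimpse of the reference-measure dependence discussed in the Introduction.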

Let T be another Polish space with a fixed measure $η$ on its Borel $σ$ -algebra $B_{T}$ . Now suppose $X, Y$ are, respectively, $S$ - and $T$ -valued random variables with a joint density $F : S \times T \to R$ w.r.t. the reference measure $γ \otimes η$ . We denote the marginals of F on S and T by f and g, respectively; in particular, X has density f w.r.t. $γ$ and Y has density g w.r.t. $η$ . Just as for the Rényi probability of X, one can define the Rényi probability of X conditioned on $Y = y$ by ${Ren}_{α}^{γ} (X | Y = y) = {∥ \frac{F (\cdot, y)}{g (y)} ∥}_{α}^{\frac{α}{α - 1}}$ . The following generalizes ([30], Definition 2).

Definition 2.

Let $α \in (0, 1) \cup (1, \infty)$ . We define the conditional Rényi entropy $h_{α}^{γ} (X | Y)$ in terms of a weighted mean of conditional Rényi probabilities ${Ren}_{α}^{γ} (X | Y = y)$ ,

$h_{α}^{γ} (X | Y) = - \log {Ren}_{α}^{γ} (X | Y),$

where

${Ren}_{α}^{γ} (X | Y) = {(\int_{T} {Ren}_{α}^{γ} {(X | Y = y)}^{\frac{α - 1}{α}} d P_{Y})}^{\frac{α}{α - 1}} .$

We can re-write ${Ren}_{α}^{γ} (X | Y)$ as

${Ren}_{α}^{γ} (X | Y) = {(\int_{supp (P_{Y})} g (y) {(\int_{S} {(\frac{F (x, y)}{g (y)})}^{α} d γ (x))}^{\frac{1}{α}} d η (y))}^{\frac{α}{α - 1}},$

which is the expected $L^{α} (S, γ)$ norm of the conditional density under the measure $P_{Y}$ raised to a power which is the Hölder conjugate of $α$ . Using Fubini’s theorem the formula for ${Ren}_{α}^{γ} (X | Y)$ can be further written down only in terms of the joint density,

${Ren}_{α}^{γ} (X | Y) = {(\int_{T} {(\int_{S} F {(x, y)}^{α} d γ (x))}^{\frac{1}{α}} d η (y))}^{\frac{α}{α - 1}} .$
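The agreement between Definition 2 and the joint-density formula can be checked numerically. This minimal sketch assumes finite alphabets S and T with reference point masses `gamma` and `eta` drawn at random purely for illustration; the helper names `ren_cond_def2` and `ren_cond_joint` are hypothetical.

```python
import numpy as np

def ren_cond_def2(F, gamma, eta, alpha):
    """Ren_alpha^gamma(X|Y) via Definition 2: P_Y-weighted mean of conditional Renyi probabilities."""
    g = gamma @ F                                   # g(y) = sum_x F(x, y) gamma(x)
    p_y = g * eta                                   # P_Y as point masses
    s = p_y > 0                                     # restrict to supp(P_Y)
    cond = F[:, s] / g[s]                           # conditional densities F(., y) / g(y)
    norms = (gamma @ cond**alpha) ** (1.0 / alpha)  # ||F(., y) / g(y)||_alpha
    ren_y = norms ** (alpha / (alpha - 1.0))        # Ren_alpha^gamma(X | Y = y)
    mean = np.sum(p_y[s] * ren_y ** ((alpha - 1.0) / alpha))
    return mean ** (alpha / (alpha - 1.0))

def ren_cond_joint(F, gamma, eta, alpha):
    """Ren_alpha^gamma(X|Y) via the joint-density formula (after Fubini)."""
    inner = (gamma @ F**alpha) ** (1.0 / alpha)     # (int_S F(x, y)^alpha dgamma(x))^(1/alpha)
    return np.sum(eta * inner) ** (alpha / (alpha - 1.0))

rng = np.random.default_rng(0)
gamma = rng.uniform(0.5, 2.0, size=3)               # illustrative reference weights on S
eta = rng.uniform(0.5, 2.0, size=4)                 # illustrative reference weights on T
P = rng.random((3, 4)); P /= P.sum()                # joint pmf of (X, Y)
F = P / np.outer(gamma, eta)                        # joint density w.r.t. gamma x eta
for alpha in (0.5, 2.0, 4.0):
    assert np.isclose(ren_cond_def2(F, gamma, eta, alpha),
                      ren_cond_joint(F, gamma, eta, alpha))
```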

Remark 1.

For each $y \in supp (g)$ , let $P_{X | Y = y}$ denote the conditional distribution of X given $Y = y$ , i.e., the probability measure on S with density $F (x, y) / g (y)$ with respect to γ. Then the conditional Rényi entropy can be written as

$h_{α}^{γ} (X | Y) = \frac{α}{1 - α} \log \int_{T} e^{- \frac{1 - α}{α} D_{α} (P_{X | Y = y} ∥ γ)} d P_{Y} (y),$

where $D_{α} (\cdot ∥ \cdot)$ denotes Rényi divergence (see Definition 3).

When X and Y are independent random variables one can easily check that ${Ren}_{α}^{γ} (X | Y) = {Ren}_{α}^{γ} (X)$ , therefore $h_{α}^{γ} (X | Y) = h_{α}^{γ} (X)$ as expected. Since the independence of X and Y means that all the conditionals $P_{X | Y = y}$ are equal to $P_{X}$ , the fact that $h_{α}^{γ} (X | Y) = h_{α}^{γ} (X)$ in this case can also be verified from the expression in Remark 1. The converse is also true, i.e., $h_{α}^{γ} (X | Y) = h_{α}^{γ} (X)$ implies the independence of X and Y, if $α \neq 0, \infty$ . This is noted later in Corollary 2.

Clearly, unlike conditional Shannon entropy, the conditional Rényi entropy is not the average Rényi entropy of the conditional distribution. The average Rényi entropy of the conditional distribution,

${\tilde{h}}_{α}^{γ} (X | Y) := E_{Y} h_{α}^{γ} (X | Y = y),$

has been proposed as a candidate for conditional Rényi entropy; however, it does not satisfy some properties one would expect of such a notion, such as monotonicity (see [30]). When $α \geq 1$ it follows from Jensen’s inequality that $h_{α}^{γ} (X | Y) \leq {\tilde{h}}_{α}^{γ} (X | Y)$ , while the inequality is reversed when $0 < α < 1$ .
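The Jensen comparison between $h_{α}^{γ} (X | Y)$ and ${\tilde{h}}_{α}^{γ} (X | Y)$ can be sanity-checked on a random joint pmf. This sketch assumes finite alphabets with counting measure as both reference measures; the helper names are illustrative.

```python
import numpy as np

def renyi_entropy_pmf(p, alpha):
    """Renyi entropy of a pmf w.r.t. counting measure: log(sum p^alpha) / (1 - alpha)."""
    return np.log(np.sum(p**alpha)) / (1.0 - alpha)

def arimoto_cond_entropy(P, alpha):
    """h_alpha(X|Y) for a joint pmf P[x, y], counting measures on both alphabets."""
    inner = np.sum(P**alpha, axis=0) ** (1.0 / alpha)   # one term per y
    return alpha / (1.0 - alpha) * np.log(np.sum(inner))

def average_cond_entropy(P, alpha):
    """tilde-h_alpha(X|Y) = E_Y h_alpha(X | Y = y)."""
    p_y = P.sum(axis=0)
    return sum(p_y[j] * renyi_entropy_pmf(P[:, j] / p_y[j], alpha)
               for j in range(P.shape[1]) if p_y[j] > 0)

rng = np.random.default_rng(1)
P = rng.random((3, 4)); P /= P.sum()
for alpha in (0.3, 0.7, 1.5, 3.0):
    h, h_avg = arimoto_cond_entropy(P, alpha), average_cond_entropy(P, alpha)
    if alpha > 1:
        assert h <= h_avg + 1e-12      # Jensen: h_alpha <= tilde-h_alpha for alpha > 1
    else:
        assert h >= h_avg - 1e-12      # reversed for 0 < alpha < 1
```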

3. Relation to Rényi Divergence

We continue to consider an S-valued random variable X and a T-valued random variable Y with a given joint distribution $P_{X, Y}$ with density F with respect to $γ \otimes η$ . Densities, etc., are taken with respect to the fixed reference measures on the state spaces, unless mentioned otherwise.

Let $μ$ be a Borel probability measure with density p on a Polish space $Ω$ and let $ν$ be a Borel measure with density q on the same space with respect to a common measure $γ$ .

Definition 3 (Rényi divergence).

Suppose $α \in (0, 1) \cup (1, \infty)$ . Then the Rényi divergence of order α between measures μ and ν is defined as

$D_{α} (μ ∥ ν) = \log {(\int_{Ω} p {(x)}^{α} q {(x)}^{1 - α} d γ (x))}^{\frac{1}{α - 1}} .$

For order $0, 1, \infty,$ the Rényi divergence is defined by the respective limits.

Definition 4.

1.
$D_{0} (μ ∥ ν) : = {lim}_{α \to 0} D_{α} (μ ∥ ν) = - \log ν (supp (p))$ ;

2.
$D_{1} (μ ∥ ν) : = D (μ ∥ ν) = \int p (x) \log \frac{p (x)}{q (x)} d γ (x)$ ; and

3.
$D_{\infty} (μ ∥ ν) : = {lim}_{α \to \infty} D_{α} (μ ∥ ν) = \log (ess \sup_{μ} \frac{p (x)}{q (x)}) .$

Remark 2.

These definitions are independent of the reference measure γ.

Remark 3.

$\lim_{α \to 1} D_{α} (μ ∥ ν) = D (μ ∥ ν)$ if $D_{α} (μ ∥ ν) < \infty$ for some $α > 1$ . See [29].
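The following sketch evaluates Definitions 3 and 4 for probability mass functions on a finite set (reference measure: counting measure, which by Remark 2 does not matter), assuming q > 0 on the support of p. It illustrates that $D_{α}$ is nondecreasing in α and approaches $D_{0}$ and $D$ at the corresponding limits; the example vectors are arbitrary.

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """D_alpha(p || q) for pmfs on a finite set; orders 0, 1 and inf as in Definition 4.

    Assumes q > 0 wherever p > 0.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    s = p > 0                                        # only supp(p) contributes
    if alpha == 0:
        return -np.log(np.sum(q[s]))                 # -log nu(supp(p))
    if alpha == 1:
        return np.sum(p[s] * np.log(p[s] / q[s]))    # Kullback-Leibler divergence
    if np.isinf(alpha):
        return np.log(np.max(p[s] / q[s]))           # log ess sup_mu p/q
    return np.log(np.sum(p[s]**alpha * q[s]**(1.0 - alpha))) / (alpha - 1.0)

p = [0.5, 0.3, 0.2, 0.0]
q = [0.25, 0.25, 0.25, 0.25]
orders = [0, 1e-9, 0.5, 1 - 1e-6, 1, 1 + 1e-6, 2, 10, 50, np.inf]
values = [renyi_divergence(p, q, a) for a in orders]
assert np.all(np.diff(values) >= -1e-9)              # nondecreasing in the order
```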

The conditional Rényi entropy can be written in terms of the Rényi divergence from the joint distribution using a generalized Sibson’s identity we learnt from B. Nakiboğlu [43] (see also [36], and [38], from which this identity for $α \neq 1$ appears to originate). The proof for abstract alphabets presented here is also due to B. Nakiboğlu [43]; it simplifies our original proof [41] of the second formula below.

Theorem 1.

Let $X, Y$ be random variables taking values in spaces $S, T$ , respectively. We assume they are jointly distributed with density F with respect to the product reference measure $γ \otimes η$ . For $α \in (0, \infty)$ and any probability measure λ absolutely continuous with respect to η, we have

$D_{α} (P_{X, Y} ∥ γ \otimes λ) = D_{α} (P_{X, Y} ∥ γ \otimes q_{⋆}) + D_{α} (q_{⋆} ∥ λ) = - h_{α}^{γ} (X | Y) + D_{α} (q_{⋆} ∥ λ),$

where $q_{⋆} = \frac{μ_{⋆}}{∥ μ_{⋆} ∥}$ , $μ_{⋆}$ is the measure having density $ϕ (y) = {(\int_{S} F {(x, y)}^{α} d γ (x))}^{\frac{1}{α}}$ with respect to η, and $∥ μ_{⋆} ∥ = \int_{T} {(\int_{S} F {(x, y)}^{α} d γ (x))}^{\frac{1}{α}} d η (y)$ is the normalization factor.

As a consequence, we have

$h_{α}^{γ} (X | Y) = - \min_{λ \in P (T)} D_{α} (P_{X, Y} ∥ γ \otimes λ) .$

Proof.

Suppose $λ$ has density h with respect to $η$ . Then $γ \otimes λ$ has density $h (y)$ with respect to $d γ (x) d η (y)$ . Now, for $α \neq 1$ ,

$\begin{aligned} D_{α} (P_{X, Y} ∥ γ \otimes λ) & = \frac{1}{α - 1} \log \int_{T} \int_{S} F {(x, y)}^{α} h {(y)}^{1 - α} d γ (x) d η (y) \\ & = \frac{1}{α - 1} \log \int_{T} h {(y)}^{1 - α} \int_{S} F {(x, y)}^{α} d γ (x) d η (y) \\ & = \frac{1}{α - 1} \log \int_{T} h {(y)}^{1 - α} ϕ {(y)}^{α} d η (y) \\ & = \frac{1}{α - 1} \log \int_{T} {(\frac{d λ}{d η})}^{1 - α} {(\frac{d μ_{⋆}}{d η})}^{α} d η (y) \\ & = \frac{1}{α - 1} \log \int_{T} {(\frac{d λ}{d η})}^{1 - α} {(\frac{d q_{⋆}}{d η})}^{α} {∥ μ_{⋆} ∥}^{α} d η (y) \\ & = \frac{α}{α - 1} \log ∥ μ_{⋆} ∥ + \frac{1}{α - 1} \log \int_{T} {(\frac{d λ}{d η})}^{1 - α} {(\frac{d q_{⋆}}{d η})}^{α} d η (y) \\ & = \frac{α}{α - 1} \log ∥ μ_{⋆} ∥ + D_{α} (q_{⋆} ∥ λ) \\ & = - h_{α}^{γ} (X | Y) + D_{α} (q_{⋆} ∥ λ) . \end{aligned}$

The case $α = 1$ is straightforward and well-known, and the optimal $q_{⋆}$ in this case is the distribution of Y.□

Remark 4.

The identities above and the measure $q_{⋆}$ are independent of the reference measure η. η is only used to write out the Rényi divergence concretely in terms of densities.
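To see Theorem 1 at work, here is a minimal Python (NumPy) sketch on finite alphabets, assuming η is counting measure on T and γ is given by arbitrary positive point masses (chosen at random purely for illustration). It checks that λ = $q_{⋆}$ gives $-h_{α}^{γ} (X | Y)$ and that any other λ exceeds this value by exactly $D_{α} (q_{⋆} ∥ λ)$.

```python
import numpy as np

def renyi_div(P, Q, alpha):
    """D_alpha(P || Q) for arrays of point masses; Q need not be normalized."""
    s = P > 0
    return np.log(np.sum(P[s]**alpha * Q[s]**(1.0 - alpha))) / (alpha - 1.0)

rng = np.random.default_rng(2)
gamma = rng.uniform(0.5, 2.0, size=3)          # illustrative reference weights on S
P = rng.random((3, 4)); P /= P.sum()           # joint pmf of (X, Y); eta = counting measure on T
F = P / gamma[:, None]                         # density of P_{X,Y} w.r.t. gamma x eta

for alpha in (0.5, 2.5):
    phi = (gamma @ F**alpha) ** (1.0 / alpha)          # density of mu_star w.r.t. eta
    q_star = phi / phi.sum()                           # q_star = mu_star / ||mu_star||
    h_cond = alpha / (1.0 - alpha) * np.log(phi.sum()) # h_alpha^gamma(X|Y)

    # lambda = q_star: D_alpha(P || gamma x q_star) = -h_alpha^gamma(X|Y)
    assert np.isclose(renyi_div(P, np.outer(gamma, q_star), alpha), -h_cond)

    # other lambda: the excess over the minimum is exactly D_alpha(q_star || lambda) >= 0
    for _ in range(5):
        lam = rng.random(4); lam /= lam.sum()
        lhs = renyi_div(P, np.outer(gamma, lam), alpha)
        assert np.isclose(lhs, -h_cond + renyi_div(q_star, lam, alpha))
        assert lhs >= -h_cond - 1e-12
```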

4. Basic Properties

4.1. Special Orders

We will now look at some basic properties of the conditional Rényi entropy we have defined above. First we see that the conditional Rényi entropy is consistent with the notion of conditional Shannon entropy of X given Y defined by

$h^{γ} (X | Y) = - \int_{T} \int_{S} F (x, y) \log \frac{F (x, y)}{g (y)} d γ (x) d η (y) .$

Proposition 1.

$\lim_{α \to 1^{+}} h_{α}^{γ} (X | Y) = h_{1}^{γ} (X | Y) = h^{γ} (X | Y),$

if $h_{α}^{γ} (X | Y) < \infty$ for some $α > 1$ .

Proof.

We will use the formula in Theorem 1. By the monotonicity of $h_{α}^{γ} (X | Y)$ in the order $α$ (Proposition 3), the limits $\lim_{α \to 1^{+}}$ , $\lim_{α \to 1^{-}}$ and $\lim_{α \to 0} = \lim_{α \to 0^{+}}$ all exist. Furthermore, for every probability measure $η$ on T,

$\min_{λ} D_{α} (P_{X, Y} ∥ γ \otimes λ) \leq D_{α} (P_{X, Y} ∥ γ \otimes η),$

so

$\lim_{α \to 1^{+}} \min_{λ} D_{α} (P_{X, Y} ∥ γ \otimes λ) \leq \lim_{α \to 1^{+}} D_{α} (P_{X, Y} ∥ γ \otimes η),$

that is (using Remark 3),

$- \lim_{α \to 1^{+}} h_{α}^{γ} (X | Y) \leq D (P_{X, Y} ∥ γ \otimes η) .$

Minimizing over $η$ and multiplying both sides by $-1$ yields

$\lim_{α \to 1^{+}} h_{α}^{γ} (X | Y) \geq h^{γ} (X | Y) .$

Conversely, suppose $α \geq 1$ . Since the Rényi divergence is nondecreasing in its order, for every $λ$ we have

$D_{α} (P_{X, Y} ∥ γ \otimes λ) \geq D (P_{X, Y} ∥ γ \otimes λ),$

and so, minimizing over $λ$ and multiplying by $-1$ , we obtain $h_{α}^{γ} (X | Y) \leq h^{γ} (X | Y)$ . This shows that

$\lim_{α \to 1^{+}} h_{α}^{γ} (X | Y) = h^{γ} (X | Y) .$

□

We can extend our definition of Rényi probability of order $α$ to $α = 0$ by taking limits, thereby obtaining

${Ren}_{0}^{γ} (X) = \frac{1}{γ (supp (f))} .$

In the next proposition we define the conditional Rényi entropy of order 0 and record a consequence.

Proposition 2.

$h_{0}^{γ} (X | Y) : = {∥ h_{0}^{γ} (X | Y = y) ∥}_{L^{\infty} (P_{Y})} = \lim_{α \to 0} h_{α}^{γ} (X | Y)$

Proof.

We again use the formula from Theorem 1. Just as in the previous proof, for every probability measure $η$ ,

$\min_{λ} D_{α} (P_{X, Y} ∥ γ \otimes λ) \leq D_{α} (P_{X, Y} ∥ γ \otimes η),$

so

$\lim_{α \to 0} \min_{λ} D_{α} (P_{X, Y} ∥ γ \otimes λ) \leq \lim_{α \to 0} D_{α} (P_{X, Y} ∥ γ \otimes η),$

that is,

$- \lim_{α \to 0} h_{α}^{γ} (X | Y) \leq D_{0} (P_{X, Y} ∥ γ \otimes η) .$

Minimizing over $η$ and multiplying both sides by $-1$ yields

$\lim_{α \to 0} h_{α}^{γ} (X | Y) \geq h_{0}^{γ} (X | Y) .$

Conversely, suppose $α \geq 0$ . Since the Rényi divergence is nondecreasing in its order, for every $λ$ we have

$D_{α} (P_{X, Y} ∥ γ \otimes λ) \geq D_{0} (P_{X, Y} ∥ γ \otimes λ),$

and so, minimizing over $λ$ and multiplying by $-1$ , we obtain $h_{α}^{γ} (X | Y) \leq h_{0}^{γ} (X | Y)$ . This shows that

$\lim_{α \to 0} h_{α}^{γ} (X | Y) = h_{0}^{γ} (X | Y) .$

□

4.2. Monotonicity in Order

The (unconditional) Rényi entropy decreases with order, and the same is true of the conditional version.

Proposition 3.

For random variables X and Y,

$h_{β}^{γ} (X | Y) \leq h_{α}^{γ} (X | Y),$

for all $0 < α \leq β \leq \infty$ .

The proof is essentially the same as in the discrete setting, and follows from Jensen’s inequality.

Proof.

We prove the cases $1 < α \leq β < \infty$ and $0 < α \leq β < 1$ separately; the full proposition then follows by taking limits and using the transitivity of inequality. For positive $α \leq β$ (both different from 1), set $e = \frac{β - 1}{β} \frac{α}{α - 1}$ and consider the following chain.

$\begin{aligned} {Ren}_{β}^{γ} (X | Y) & = {(\int_{supp (P_{Y})} {Ren}_{β}^{γ} {(X | Y = y)}^{\frac{β - 1}{β}} d P_{Y})}^{\frac{β}{β - 1}} \\ & \geq {(\int_{supp (P_{Y})} {Ren}_{α}^{γ} {(X | Y = y)}^{\frac{β - 1}{β}} d P_{Y})}^{\frac{β}{β - 1}} \\ & = {(\int_{supp (P_{Y})} {Ren}_{α}^{γ} {(X | Y = y)}^{\frac{α - 1}{α} e} d P_{Y})}^{\frac{β}{β - 1}} \\ & \geq {(\int_{supp (P_{Y})} {Ren}_{α}^{γ} {(X | Y = y)}^{\frac{α - 1}{α}} d P_{Y})}^{e \frac{β}{β - 1}} \\ & = {Ren}_{α}^{γ} (X | Y) . \end{aligned}$

The first inequality above follows from the fact that the unconditional Rényi entropy (probability) decreases (increases) with order. Note that $e \geq 1$ when $1 < α \leq β < \infty$ and hence the function $r \mapsto r^{e}$ is convex making the second inequality an application of Jensen’s inequality in this case. When $0 < α \leq β < 1$ , the exponent satisfies $0 < e \leq 1$ so the function $r \mapsto r^{e}$ is concave but the outer exponent $\frac{β}{β - 1}$ is negative which turns the (second) inequality in the desired direction. □
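A quick numerical illustration of Proposition 3 (and of the limit in Proposition 1), assuming finite alphabets and counting reference measures; the random joint pmf and helper names are only an example.

```python
import numpy as np

def arimoto_cond_entropy(P, alpha):
    """h_alpha(X|Y) for a joint pmf P[x, y] (counting reference measures)."""
    inner = np.sum(P**alpha, axis=0) ** (1.0 / alpha)
    return alpha / (1.0 - alpha) * np.log(np.sum(inner))

def shannon_cond_entropy(P):
    """H(X|Y) = -sum_{x,y} P(x, y) log(P(x, y) / P_Y(y))."""
    p_y = P.sum(axis=0)
    s = P > 0
    return -np.sum(P[s] * np.log((P / p_y)[s]))

rng = np.random.default_rng(3)
P = rng.random((4, 3)); P /= P.sum()
orders = [0.2, 0.5, 0.9, 0.999, 1.001, 1.5, 3.0, 10.0]
h = np.array([arimoto_cond_entropy(P, a) for a in orders])
assert np.all(np.diff(h) <= 1e-9)            # nonincreasing in the order
H = shannon_cond_entropy(P)
assert h[4] <= H + 1e-9 and H <= h[3] + 1e-9 # Shannon value sits between orders 1.001 and 0.999
```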

4.3. Conditioning Reduces Rényi Entropy

As is the case for Shannon entropy, we find that the conditional Rényi entropy obeys monotonicity too; the proof of the theorem below adapts the approach for the discrete case taken in [30] by using Minkowski’s integral inequality.

Theorem 2 (Monotonicity).

Let $α \in [0, \infty]$ , let X be S-valued, and let $Y, Z$ be T-valued random variables. Then,

$h_{α}^{γ} (X | Y Z) \leq h_{α}^{γ} (X | Z) .$

Proof.

We begin by proving the result for an empty Z.

First, we deal with the case $1 < α < \infty$ . In terms of Rényi probabilities we must show that conditioning increases Rényi probability. Indeed,

$\begin{aligned} {Ren}_{α}^{γ} (X | Y) & = {(\int_{T} {[\int_{S} F {(x, y)}^{α} d γ (x)]}^{\frac{1}{α}} d η (y))}^{\frac{α}{α - 1}} \\ & \geq {[\int_{S} {(\int_{T} F (x, y) d η (y))}^{α} d γ (x)]}^{\frac{1}{α} \frac{α}{α - 1}} \\ & = {(\int_{S} f {(x)}^{α} d γ (x))}^{\frac{1}{α - 1}} = {Ren}_{α}^{γ} (X) . \end{aligned}$ (1)

The inequality above is a direct application of Minkowski’s integral inequality ([44], Theorem 2.4), which generalizes the summation in the standard triangle inequality to integrals against a measure.

For the case $0 < α < 1$ , we apply Minkowski’s integral inequality with exponent $1 / α > 1$ to the function $F^{α}$ ; since $\frac{1}{α - 1}$ is now negative, raising to this power flips the inequality to the desired direction:

$\begin{aligned} {Ren}_{α}^{γ} (X | Y) & = {(\int_{T} {[\int_{S} F {(x, y)}^{α} d γ (x)]}^{\frac{1}{α}} d η (y))}^{\frac{α}{α - 1}} \\ & \geq {[\int_{S} {(\int_{T} {(F {(x, y)}^{α})}^{\frac{1}{α}} d η (y))}^{α} d γ (x)]}^{\frac{1}{α - 1}} \\ & = {(\int_{S} f {(x)}^{α} d γ (x))}^{\frac{1}{α - 1}} = {Ren}_{α}^{γ} (X) . \end{aligned}$

Now to extend this to non-empty Z we observe the following. Suppose V is an S-valued random variable, W is a T-valued random variable and $h \in R$ is such that

$h_{α}^{γ} (V | W, Z = z) \geq h_{α}^{γ} (X | Y, Z = z) + h,$

for every z in the support of $P_{Z}$ . Then

$h_{α}^{γ} (V | W Z) \geq h_{α}^{γ} (X | Y Z) + h .$

In terms of Rényi probabilities, this means that if for every $z \in supp (P_{Z})$ ,

${Ren}_{α}^{γ} (V | W, Z = z) \leq e^{- h} {Ren}_{α}^{γ} (X | Y, Z = z),$

then

${Ren}_{α}^{γ} (V | W Z) \leq e^{- h} {Ren}_{α}^{γ} (X | Y Z) .$

Indeed,

$\begin{aligned} {Ren}_{α}^{γ} (V | W Z) & = {(\int_{supp (Z)} {Ren}_{α}^{γ} {(V | W, Z = z)}^{\frac{α - 1}{α}} d P_{Z} (z))}^{\frac{α}{α - 1}} \\ & \leq {(\int_{supp (Z)} {(e^{- h} {Ren}_{α}^{γ} (X | Y, Z = z))}^{\frac{α - 1}{α}} d P_{Z} (z))}^{\frac{α}{α - 1}} \\ & = e^{- h} {Ren}_{α}^{γ} (X | Y Z), \end{aligned}$

since the functions $r \mapsto r^{\frac{α - 1}{α}}$ and $r \mapsto r^{\frac{α}{α - 1}}$ are either both strictly increasing or both strictly decreasing, depending on the value of $α$ . Finally, the claim for non-empty Z follows from this observation (with $V = X$ , W constant and $h = 0$ ), given that we have already shown $h_{α}^{γ} (X | Y, Z = z) \leq h_{α}^{γ} (X | Z = z)$ throughout $supp (Z)$ . The cases $α = 0, \infty$ are obtained by taking limits. For $α = 1$ the result is well known.□
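Theorem 2 can be checked on a random three-variable pmf by treating the pair (Y, Z) as a single discrete variable. This is a sketch under the assumption of finite alphabets and counting reference measures; the helper name is illustrative.

```python
import numpy as np

def arimoto_cond_entropy(P, alpha):
    """h_alpha(X|Y) for a joint pmf P[x, y] (counting reference measures)."""
    inner = np.sum(P**alpha, axis=0) ** (1.0 / alpha)
    return alpha / (1.0 - alpha) * np.log(np.sum(inner))

rng = np.random.default_rng(4)
P_xyz = rng.random((3, 4, 2)); P_xyz /= P_xyz.sum()   # joint pmf of (X, Y, Z)
P_x_yz = P_xyz.reshape(3, -1)                         # columns indexed by the pair (y, z)
P_xz = P_xyz.sum(axis=1)                              # marginalize out Y
for alpha in (0.3, 0.8, 2.0, 5.0):
    # h_alpha(X | Y Z) <= h_alpha(X | Z)
    assert arimoto_cond_entropy(P_x_yz, alpha) <= arimoto_cond_entropy(P_xz, alpha) + 1e-12
```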

Corollary 1.

If $X \to Y \to Z$ forms a Markov chain, then $h_{α}^{γ} (X | Y) \leq h_{α}^{γ} (X | Z)$ .

When we specialize Theorem 2 to “empty Z” (i.e., Z constant, so that the $σ$ -field it generates is trivial; in other words, not conditioning on anything), we find that “conditioning reduces Rényi entropy”.

Corollary 2.

Let $α \in (0, \infty)$ . Then $h_{α}^{γ} (X | Y) \leq h_{α}^{γ} (X)$ , with equality iff X and Y are independent.

While the inequality in Corollary 2 follows immediately from Theorem 2, the conditions for equality follow from those for Minkowski’s inequality (which is the key inequality used in the proof of Theorem 2, see, e.g., ([44], Theorem 2.4)): given the finiteness of both sides in the display (1), equality holds if and only if $F (x, y) = ϕ (x) ψ (y)$ $γ \otimes η$ -a.e. for some functions $ϕ$ and $ψ$ . In our case, this means that equality holds in $h_{α}^{γ} (X | Y) \leq h_{α}^{γ} (X)$ if and only if X and Y are independent ( $α \in (0, 1) \cup (1, \infty)$ ). The corresponding statement for $α = 1$ is well-known.
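The equality condition of Corollary 2 can be seen numerically: replacing a random joint pmf by the product of its marginals closes the gap $h_{α}^{γ} (X) - h_{α}^{γ} (X | Y)$ . This sketch assumes finite alphabets with counting reference measures; the helper names are illustrative.

```python
import numpy as np

def renyi_entropy_pmf(p, alpha):
    """Renyi entropy of a pmf w.r.t. counting measure."""
    return np.log(np.sum(p**alpha)) / (1.0 - alpha)

def arimoto_cond_entropy(P, alpha):
    """h_alpha(X|Y) for a joint pmf P[x, y] (counting reference measures)."""
    inner = np.sum(P**alpha, axis=0) ** (1.0 / alpha)
    return alpha / (1.0 - alpha) * np.log(np.sum(inner))

rng = np.random.default_rng(5)
P = rng.random((3, 4)); P /= P.sum()
p_x, p_y = P.sum(axis=1), P.sum(axis=0)
P_ind = np.outer(p_x, p_y)                   # same marginals, X and Y independent
for alpha in (0.5, 2.0):
    gap = renyi_entropy_pmf(p_x, alpha) - arimoto_cond_entropy(P, alpha)
    gap_ind = renyi_entropy_pmf(p_x, alpha) - arimoto_cond_entropy(P_ind, alpha)
    assert gap > 0                           # strict: this random P is not a product
    assert np.isclose(gap_ind, 0.0, atol=1e-12)   # equality under independence
```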

Since ${\tilde{h}}_{α}^{γ} (X | Y) \leq h_{α}^{γ} (X | Y)$ when $0 < α < 1$ , as noted in Section 2, we have “conditioning reduces Rényi entropy” in this case as well.

Corollary 3.

${\tilde{h}}_{α}^{γ} (X | Y) \leq h_{α}^{γ} (X)$ , when $0 < α < 1$ .

Remark 5.

The above corollary is not true for large values of α. For a counter-example, see ([28], Theorem 7).

Specializing to the case where Y is a discrete random variable taking finitely many values $y_{i}$ with probabilities $p_{i}$ , $1 \leq i \leq n$ , and the conditional density of X given $Y = y_{i}$ is $f_{i} (x)$ , we obtain the concavity of Rényi entropy for orders below 1, which is already known in the literature.

Corollary 4.

Let $0 \leq α \leq 1$ . Suppose $f_{i}$ are densities on S and $p_{i}$ non-negative numbers, $1 \leq i \leq n$ , such that $\sum_{i} p_{i} = 1$ . Then, $h_{α}^{γ} (\sum_{i = 1}^{n} p_{i} f_{i}) \geq \sum_{i = 1}^{n} p_{i} h_{α}^{γ} (f_{i})$ .
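Corollary 4 can be illustrated with a random mixture of three pmfs on a five-point set (counting reference measure); the weights and pmfs below are arbitrary, and the helper name is illustrative.

```python
import numpy as np

def renyi_entropy_pmf(p, alpha):
    """Renyi entropy of a pmf w.r.t. counting measure, alpha in (0, 1)."""
    return np.log(np.sum(p**alpha)) / (1.0 - alpha)

rng = np.random.default_rng(6)
f = rng.random((3, 5)); f /= f.sum(axis=1, keepdims=True)   # three pmfs f_1, f_2, f_3
p = np.array([0.2, 0.5, 0.3])                               # mixture weights p_i
mixture = p @ f                                             # sum_i p_i f_i
for alpha in (0.2, 0.5, 0.9):
    lhs = renyi_entropy_pmf(mixture, alpha)
    rhs = np.sum([p_i * renyi_entropy_pmf(f_i, alpha) for p_i, f_i in zip(p, f)])
    assert lhs >= rhs - 1e-12                               # concavity for orders below 1
```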

4.4. A chain Rule

In this subsection we deduce a version of the chain rule from Theorem 1. For the discrete case, this has been done by Fehr and Berens in ([30], Theorem 3). If $η$ is a probability measure, then Theorem 1 gives $h_{α}^{γ} (X | Y) \geq - D_{α} (P_{X, Y} ∥ γ \otimes η) = h_{α}^{γ \otimes η} (X, Y)$ . If we relax the condition on $η$ , requiring only that $P_{Y}$ be absolutely continuous with respect to $η$ and supported on a set of finite $η$ -measure, we obtain $h_{α}^{γ} (X | Y) \geq h_{α}^{γ \otimes η} (X, Y) - h_{0}^{η} (Y)$ . Since this inequality holds trivially when Y is supported on a set of infinite $η$ -measure, we have proved the following inequality, which (although weaker, being an inequality rather than an identity) is reminiscent of the chain rule for Shannon entropy.

Proposition 4.

For any $α > 0$ ,

$h_{α}^{γ \otimes η} (X, Y) \leq h_{α}^{γ} (X | Y) + h_{0}^{η} (Y) .$

Proof.

Recall that

${Ren}_{α}^{γ} (X | Y) = {[\int_{T} {(\int_{S} F {(x, y)}^{α} d γ (x))}^{\frac{1}{α}} d η (y)]}^{\frac{α}{α - 1}},$

where the outer integral can be restricted to the support of $P_{Y}$ , which we will keep emphasizing in the first few steps.

$\begin{aligned} {Ren}_{α}^{γ} (X | Y) & = {[\int_{supp (P_{Y})} {(\int_{S} F {(x, y)}^{α} d γ (x))}^{\frac{1}{α}} d η (y)]}^{\frac{α}{α - 1}} \\ & = {[\int_{supp (P_{Y})} {(η {(supp (P_{Y}))}^{α} \int_{S} F {(x, y)}^{α} d γ (x))}^{\frac{1}{α}} \frac{d η (y)}{η (supp (P_{Y}))}]}^{\frac{α}{α - 1}} \\ & = {[\int_{supp (P_{Y})} {(\int_{S} {(η (supp (P_{Y})) F (x, y))}^{α} d γ (x))}^{\frac{1}{α}} \frac{d η (y)}{η (supp (P_{Y}))}]}^{\frac{α}{α - 1}} . \end{aligned}$

By Jensen’s inequality, when $α > 1$ ,

$\begin{aligned} {Ren}_{α}^{γ} (X | Y) & \leq {[\int_{supp (P_{Y})} (\int_{S} {(η (supp (P_{Y})) F (x, y))}^{α} d γ (x)) \frac{d η (y)}{η (supp (P_{Y}))}]}^{\frac{1}{α - 1}} \\ & = η (supp (P_{Y})) {[\int_{supp (P_{Y})} \int_{S} F {(x, y)}^{α} d γ (x) d η (y)]}^{\frac{1}{α - 1}} \\ & = η (supp (P_{Y})) {Ren}_{α}^{γ \otimes η} (X, Y) . \end{aligned}$

Note that the above calculation also holds when $α \in (0, 1)$ : Jensen’s inequality is reversed because $r \mapsto r^{1 / α}$ is now convex, but the inequality flips again, to the desired side, because the exponent $\frac{α}{α - 1}$ is negative. Taking logarithms and multiplying both sides by $-1$ concludes the proof.□

Remark 6.

These inequalities are tight: equality is attained when X and Y are independent and Y is uniformly distributed (with respect to $η$ ) on a set of finite $η$ -measure.
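A numerical sketch of Proposition 4 and of the equality case in Remark 6, assuming finite alphabets with counting measures, so that $h_{0}^{η} (Y)$ is the logarithm of the number of points in the support of $P_{Y}$ ; helper names are illustrative.

```python
import numpy as np

def joint_renyi_entropy(P, alpha):
    """h_alpha(X, Y) w.r.t. counting measure on the product alphabet."""
    return np.log(np.sum(P**alpha)) / (1.0 - alpha)

def arimoto_cond_entropy(P, alpha):
    """h_alpha(X|Y) for a joint pmf P[x, y] (counting reference measures)."""
    inner = np.sum(P**alpha, axis=0) ** (1.0 / alpha)
    return alpha / (1.0 - alpha) * np.log(np.sum(inner))

def h0(p_y):
    """h_0(Y) = log of the counting measure of supp(P_Y)."""
    return np.log(np.count_nonzero(p_y))

rng = np.random.default_rng(7)
P = rng.random((3, 4)); P[:, 3] = 0.0; P /= P.sum()     # Y supported on 3 of 4 points
for alpha in (0.5, 2.0, 4.0):
    assert joint_renyi_entropy(P, alpha) <= arimoto_cond_entropy(P, alpha) + h0(P.sum(axis=0)) + 1e-12

# Equality case: X independent of Y, Y uniform on its finite support
P_eq = np.outer([0.6, 0.3, 0.1], [1/3, 1/3, 1/3, 0.0])
for alpha in (0.5, 2.0):
    assert np.isclose(joint_renyi_entropy(P_eq, alpha),
                      arimoto_cond_entropy(P_eq, alpha) + h0(P_eq.sum(axis=0)))
```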

4.5. Sensitivity to Reference Measure

Proposition 5.

Suppose $γ$ is absolutely continuous with respect to $μ$ and $\frac{d γ}{d μ} = ψ$ is bounded above by a number M. Then $h_{α}^{μ} (X | Y) \geq h_{α}^{γ} (X | Y) - \log M .$

Proof.

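If the joint density of $(X, Y)$ with respect to $γ \otimes γ$ is $F (x, y)$ , then its joint density with respect to $μ \otimes μ$ is $F (x, y) ψ (x) ψ (y)$ ; here the same reference measure is used on both coordinates, which by Remark 4 does not affect the conditional entropy. Now suppose $α \geq 1$ . Then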

$\begin{aligned} h_{α}^{μ} (X | Y) & = \frac{- α}{α - 1} \log [\int_{T} {(\int_{S} F {(x, y)}^{α} ψ {(x)}^{α} ψ {(y)}^{α} d μ (x))}^{\frac{1}{α}} d μ (y)] \\ & = \frac{- α}{α - 1} \log [\int_{T} {(\int_{S} F {(x, y)}^{α} ψ {(x)}^{α} d μ (x))}^{\frac{1}{α}} ψ (y) d μ (y)] \\ & = \frac{- α}{α - 1} \log [\int_{T} {(\int_{S} F {(x, y)}^{α} ψ {(x)}^{α - 1} d γ (x))}^{\frac{1}{α}} d γ (y)] \\ & \geq \frac{- α}{α - 1} (\log [\int_{T} {(\int_{S} F {(x, y)}^{α} d γ (x))}^{\frac{1}{α}} d γ (y)] + \log M^{\frac{α - 1}{α}}) \\ & = h_{α}^{γ} (X | Y) - \log M . \end{aligned}$

If $α \in (0, 1)$ , the same inequality holds if $μ$ is also absolutely continuous w.r.t. $γ$ .□
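Proposition 5 can be checked with μ taken to be counting measure on a finite S and γ given by a positive weight vector ψ, so that ψ is bounded by its maximum M; following Remark 4, counting measure is used on T. The weights are random and purely illustrative, and the helper name is hypothetical.

```python
import numpy as np

def arimoto_cond_entropy_ref(P, ref, alpha):
    """h_alpha^ref(X|Y) for a joint pmf P[x, y]; ref = point masses on S, counting measure on T."""
    F = P / ref[:, None]                                 # density w.r.t. ref x counting
    inner = (ref @ F**alpha) ** (1.0 / alpha)
    return alpha / (1.0 - alpha) * np.log(np.sum(inner))

rng = np.random.default_rng(8)
P = rng.random((4, 3)); P /= P.sum()
mu = np.ones(4)                                          # mu = counting measure on S
psi = rng.uniform(0.1, 3.0, size=4)                      # d(gamma)/d(mu), strictly positive
M = psi.max()                                            # bound on psi
gamma = psi * mu
for alpha in (0.4, 0.8, 1.5, 3.0):
    assert (arimoto_cond_entropy_ref(P, mu, alpha)
            >= arimoto_cond_entropy_ref(P, gamma, alpha) - np.log(M) - 1e-12)
```

Since ψ > 0 everywhere here, μ is also absolutely continuous with respect to γ, so the check covers orders below 1 as well.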

5. Notions of $α$ -Mutual Information

Arimoto [39] used his conditional Rényi entropy to define a mutual information that we extend to the general setting as follows.

Definition 5.

Let X be an S-valued random variable and let Y be a T-valued random variable, with a given joint distribution. Then, we define

$I_{α}^{(γ)} (X ⇝ Y) = h_{α}^{γ} (X) - h_{α}^{γ} (X | Y) .$

We use the squiggly arrow to emphasize the lack of symmetry in X and Y, while distinguishing it from the notation for directed mutual information, which is usually written with a straight arrow. By Corollary 2, for $α \in (0, \infty)$ , $I_{α}^{(γ)} (X ⇝ Y) = 0$ if and only if X and Y are independent. Therefore $I_{α}^{(γ)} (X ⇝ Y)$ , for any choice of reference measure $γ$ , can be seen as a measure of dependence between X and Y.

Let us discuss further the validity of $I_{α}^{(γ)} (X ⇝ Y)$ as a dependence measure. If the conditional distributions are denoted by $P_{X | Y = y}$ as in Remark 1, then, using the fact that $h_{α}^{ν} (Z) = - D_{α} (P_{Z} ∥ ν)$ for any random variable Z, we have for any $α \in (0, 1) \cup (1, \infty)$ that

$\begin{aligned} I_{α}^{(γ)} (X ⇝ Y) & = h_{α}^{γ} (X) - h_{α}^{γ} (X | Y) \\ & = \frac{α}{1 - α} \log e^{- \frac{1 - α}{α} D_{α} (P_{X} ∥ γ)} - \frac{α}{1 - α} \log \int_{T} e^{- \frac{1 - α}{α} D_{α} (P_{X | Y = y} ∥ γ)} d P_{Y} (y) \\ & = - \frac{α}{1 - α} \log \int_{T} e^{- \frac{1 - α}{α} [D_{α} (P_{X | Y = y} ∥ γ) - D_{α} (P_{X} ∥ γ)]} d P_{Y} (y) . \end{aligned}$

Furthermore, when $α \in (0, 1)$ , by ([29], Proposition 2), we may also write

$I_{α}^{(γ)} (X ⇝ Y) = - \frac{α}{1 - α} \log \int_{T} e^{- [D_{1 - α} (γ ∥ P_{X | Y = y}) - D_{1 - α} (γ ∥ P_{X})]} d P_{Y} (y) .$

Note that Rényi divergence is convex in the second argument (see [29], Theorem 12) when $α \in (0, 1)$ , and the last equation suggests that Arimoto’s mutual information can be seen as a quantification of this convexity gap.

One can also see clearly from the above expressions why this quantity controls, at least for $α \in (0, 1)$ , the dependence between X and Y: indeed, one has for any $α \in (0, 1)$ and any $t > 0$ that,

$P_{Y} \{ D_{α} (P_{X | Y} ∥ γ) - D_{α} (P_{X} ∥ γ) < t \} = P_{Y} \{ e^{- β [D_{α} (P_{X | Y} ∥ γ) - D_{α} (P_{X} ∥ γ)]} > e^{- β t} \} \leq e^{β t} e^{- β I_{α}^{(γ)} (X ⇝ Y)},$

where the inequality follows from Markov’s inequality and $β = \frac{1 - α}{α}$ . Thus, when $I_{α}^{(γ)} (X ⇝ Y)$ is large, the probability that the conditional distributions of X given Y cluster around the same “Rényi divergence” distance from the reference measure $γ$ as the unconditional distribution of X (which is of course a mixture of the conditional distributions) is small, suggesting a significant “spread” of the conditional distributions and therefore strong dependence. This is illustrated in Figure 1. Thus, despite the dependence of $I_{α}^{(γ)} (X ⇝ Y)$ on the reference measure $γ$ , it does guarantee strong dependence when it is large (at least for $α < 1$ ). When $α \to 1^{-}$ we have $β \to 0$ , and consequently the upper bound $e^{β t} e^{- β I_{α}^{(γ)} (X ⇝ Y)} \to 1$ , making the inequality trivial.

Figure 1. A schematic diagram showing how a large value of $I_{α}^{(γ)} (X ⇝ Y)$ , for a fixed $α < 1$ , demonstrates strong dependence between X and Y. The space depicted is the space of probability measures on S, including $γ$ , $P_{X}$ , and the red dots representing the conditional distributions of X given that Y takes different values in T. The $D_{α}$ -balls around $γ$ are represented by ellipses to emphasize that the geometry is non-Euclidean and, in fact, non-metric. When $I_{α}^{(γ)} (X ⇝ Y)$ is large, there is a significant probability that Y takes values such that the corresponding conditional distributions of X lie outside the larger $D_{α}$ -ball, and therefore far from the (unconditional) distribution $P_{X}$ of X.
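The divergence form of $I_{α}^{(γ)} (X ⇝ Y)$ and the Markov-type tail bound above can be checked numerically for α = 1/2. The sketch assumes finite alphabets, counting measure on T, and an arbitrary positive reference measure γ on S drawn at random; the variable names are illustrative.

```python
import numpy as np

def renyi_div(p, q, alpha):
    """D_alpha(p || q) for point masses; q may be unnormalized (e.g., a reference measure)."""
    s = p > 0
    return np.log(np.sum(p[s]**alpha * q[s]**(1.0 - alpha))) / (alpha - 1.0)

rng = np.random.default_rng(9)
gamma = rng.uniform(0.5, 2.0, size=4)              # illustrative reference measure on S
P = rng.random((4, 5)); P /= P.sum()               # joint pmf of (X, Y)
alpha = 0.5
beta = (1.0 - alpha) / alpha

p_x, p_y = P.sum(axis=1), P.sum(axis=0)
d_x = renyi_div(p_x, gamma, alpha)                             # D_alpha(P_X || gamma)
d_y = np.array([renyi_div(P[:, j] / p_y[j], gamma, alpha)      # D_alpha(P_{X|Y=y} || gamma)
                for j in range(P.shape[1])])

# Arimoto mutual information, once from entropies (Remark 1) and once from the divergence form
h_x = -d_x                                                     # h_alpha^gamma(X)
h_x_given_y = alpha / (1.0 - alpha) * np.log(np.sum(p_y * np.exp(-beta * d_y)))
I_arimoto = h_x - h_x_given_y
I_div = -np.log(np.sum(p_y * np.exp(-beta * (d_y - d_x)))) / beta
assert np.isclose(I_arimoto, I_div)

# Markov bound: P_Y{ D(P_{X|Y} || gamma) - D(P_X || gamma) < t } <= exp(beta * (t - I))
for t in np.linspace(0.01, 2.0, 20):
    assert np.sum(p_y[d_y - d_x < t]) <= np.exp(beta * (t - I_arimoto)) + 1e-12
```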

The “mutual information” quantity $I_{α}^{(γ)} (X ⇝ Y)$ clearly depends on the choice of the reference measure $γ$ . Nonetheless, there are three families of Rényi mutual information that are independent of the choice of reference measure, which we now introduce.

Definition 6.

Fix $α \geq 0$ .

1.
The Lapidoth-Pfister α-mutual information is defined as
$J_{α} (X; Y) : = \min_{μ \in P (S), ν \in P (T)} D_{α} (P_{X, Y} ∥ μ \otimes ν) .$

2.
The Augustin-Csiszár α-mutual information is defined as
$K_{α}^{X ⇝ Y} (P_{X, Y}) = \min_{μ \in P (T)} E_{X} D_{α} (P_{Y | X} (\cdot | X) ∥ μ) .$

3.
Sibson’s α-mutual information is defined as
$I_{α}^{X ⇝ Y} (P_{X, Y}) = \min_{μ \in P (T)} D_{α} (P_{X, Y} ∥ P_{X} \otimes μ) .$

The quantity $J_{α}$ was recently introduced by Lapidoth and Pfister as a measure of independence in [45] (cf. [25,27,32]). The Augustin-Csiszár mutual information was originally introduced in [40] by Udo Augustin with a slightly different parametrization, and gained much popularity following Csiszár’s work in [21]. For a discussion of early work on this quantity and of applications, see also [42] and references therein. Both [40] and [42] treat abstract alphabets; however, the former is limited to $α \in (0, 1)$ while the latter treats all $α \in (0, \infty)$ . Sibson’s definition originates in [38], where he introduces $I_{α}$ in the form of an information radius (see, e.g., [33]), which is often written in terms of Gallager’s function (from [46]). Since all the quantities in the above definition are stated in terms of Rényi divergences not involving the reference measure $γ$ , they are themselves independent of the reference measure. Their relationship with the Rényi divergence also shows that all of them are non-negative. Moreover, when $X, Y$ are independent, putting $μ = P_{X}, ν = P_{Y}$ in the expression for $J_{α}$ and $μ = P_{Y}$ in the expressions for $K_{α}$ and $I_{α}$ shows that they all vanish under independence.

While these notions of mutual information are certainly not equal to $I_{α}^{(γ)}$ in general when $α \neq 1$ , they do have a direct relationship with conditional Rényi entropies, by varying the reference measure.

Since $\min_{μ} \min_{ν} D_{α} (P_{X, Y} ∥ μ \otimes ν) = \min_{μ} \bigl(- h_{α}^{μ} (X | Y)\bigr) = - \max_{μ} h_{α}^{μ} (X | Y)$ , where all optimizations are over probability measures ( $μ \in P (S)$ , $ν \in P (T)$ ), we can write Lapidoth and Pfister’s mutual information as

$J_{α} (X; Y) = - \max_{μ} h_{α}^{μ} (X | Y) = \min_{μ} \Bigl(- \frac{α}{1 - α} \log \int_{T} e^{- \frac{1 - α}{α} D_{α} (P_{X | Y = y} ∥ μ)} d P_{Y} (y)\Bigr) .$
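For finite alphabets, the identity $\min_{ν} D_{α} (P_{X, Y} ∥ μ \otimes ν) = - h_{α}^{μ} (X | Y)$ behind the last display can be checked numerically. The following Python sketch is illustrative only: the helper names, the restriction to a two-letter alphabet for $Y$, the grid search over $ν$, and the example distributions are assumptions made for this check, not constructions from the paper.

```python
import numpy as np

def renyi_div(p, q, alpha):
    """D_alpha(p || q) in nats for finite pmfs (alpha > 0, alpha != 1, q > 0)."""
    return np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0)

def neg_cond_entropy(P, mu, alpha):
    """-h_alpha^mu(X|Y) from the integral formula above; P[x, y] is the joint pmf."""
    P_Y = P.sum(axis=0)
    total = 0.0
    for y, py in enumerate(P_Y):
        if py > 0:
            d = renyi_div(P[:, y] / py, mu, alpha)         # D_alpha(P_{X|Y=y} || mu)
            total += py * np.exp(-(1.0 - alpha) / alpha * d)
    return -alpha / (1.0 - alpha) * np.log(total)

def min_over_nu(P, mu, alpha, n=200_001):
    """Grid search for min over nu of D_alpha(P_{X,Y} || mu x nu); assumes Y has two letters."""
    t = np.linspace(1e-6, 1.0 - 1e-6, n)
    nu = np.stack([t, 1.0 - t], axis=1)                    # candidate nu's, shape (n, 2)
    # a[y] = sum_x P(x, y)^alpha mu(x)^(1 - alpha), so the double sum factors over y
    a = (P**alpha * mu[:, None]**(1.0 - alpha)).sum(axis=0)
    vals = np.log(nu**(1.0 - alpha) @ a) / (alpha - 1.0)
    return vals.min()

P = np.array([[0.3, 0.1], [0.2, 0.4]])   # illustrative joint pmf: rows x in S, columns y in T
mu = np.array([0.2, 0.8])                # illustrative reference probability measure on S
for alpha in (0.5, 2.0):
    print(alpha, neg_cond_entropy(P, mu, alpha), min_over_nu(P, mu, alpha))
```

The two printed values agree up to the resolution of the grid.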

Note that $J_{α}$ is symmetric by definition, $J_{α} (X; Y) = J_{α} (Y; X)$ , which is why we do not use squiggly arrows to denote it. Writing the Rényi divergence as a negative Rényi entropy with respect to the reference measure, we can recast Augustin-Csiszár’s $K_{α}$ in a similar form, this time using the average Rényi entropy of the conditionals instead of Arimoto’s conditional Rényi entropy:

$K_{α}^{X ⇝ Y} = - \max_{μ} E_{X} h_{α}^{μ} (Y | X = x) = - \max_{μ} {\tilde{h}}_{α}^{μ} (Y | X) .$

In light of Theorem 1, Sibson’s mutual information can be written in terms of conditional Rényi entropy as

$I_{α}^{X ⇝ Y} = - h_{α}^{P_{X}} (X | Y) .$

Consequently, Sibson’s mutual information is a special case of Arimoto’s mutual information, obtained by taking the reference measure to be the distribution of $X$ : since $h_{α}^{P_{X}} (X) = - D_{α} (P_{X} ∥ P_{X}) = 0$ ,

$I_{α}^{X ⇝ Y} = - h_{α}^{P_{X}} (X | Y) = h_{α}^{P_{X}} (X) - h_{α}^{P_{X}} (X | Y) = I_{α}^{(P_{X})} (X ⇝ Y) .$
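For finite alphabets, Sibson’s mutual information has the well-known closed form $I_{α}^{X ⇝ Y} = \frac{α}{α - 1} \log \sum_{y} \bigl(\sum_{x} P_{X} (x) P_{Y | X} (y | x)^{α}\bigr)^{1 / α}$ . As a minimal numerical sketch (hypothetical helper names, a randomly drawn joint pmf, NumPy assumed), one can compare it with Arimoto’s mutual information $h_{α}^{γ} (X) - h_{α}^{γ} (X | Y)$ evaluated at $γ = P_{X}$ , using $h_{α}^{γ} (X) = - D_{α} (P_{X} ∥ γ)$ and the integral expression for the conditional entropy.

```python
import numpy as np

def renyi_div(p, q, alpha):
    """D_alpha(p || q) in nats (alpha > 0, alpha != 1, q > 0 where p > 0)."""
    return np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0)

def cond_entropy(P, gamma, alpha):
    """h_alpha^gamma(X|Y) for a joint pmf P[x, y] via the integral expression."""
    P_Y = P.sum(axis=0)
    s = sum(py * np.exp(-(1.0 - alpha) / alpha * renyi_div(P[:, y] / py, gamma, alpha))
            for y, py in enumerate(P_Y) if py > 0)
    return alpha / (1.0 - alpha) * np.log(s)

def arimoto_mi(P, gamma, alpha):
    """I_alpha^{(gamma)}(X ~> Y) = h_alpha^gamma(X) - h_alpha^gamma(X|Y)."""
    P_X = P.sum(axis=1)
    h_X = -renyi_div(P_X, gamma, alpha)           # h_alpha^gamma(X) = -D_alpha(P_X || gamma)
    return h_X - cond_entropy(P, gamma, alpha)

def sibson_closed_form(P, alpha):
    """Finite-alphabet closed form of Sibson's I_alpha^{X ~> Y}."""
    P_X = P.sum(axis=1)
    W = P / P_X[:, None]                          # channel rows W(.|x) = P_{Y|X=x}
    inner = (P_X[:, None] * W**alpha).sum(axis=0) ** (1.0 / alpha)
    return alpha / (alpha - 1.0) * np.log(inner.sum())

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(12)).reshape(3, 4)     # illustrative random joint pmf on 3 x 4 letters
for alpha in (0.5, 2.0, 4.0):
    print(alpha, sibson_closed_form(P, alpha), arimoto_mi(P, P.sum(axis=1), alpha))
```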

For comparison with the corresponding expression for $I_{α}^{(γ)} (X ⇝ Y)$ , we also write $I_{α}^{X ⇝ Y}$ as

$I_{α}^{X ⇝ Y} = - \frac{α}{1 - α} \log \int_{T} e^{- \frac{1 - α}{α} D_{α} (P_{X | Y = y} ∥ P_{X})} d P_{Y} (y) .$

The following inequality, which relates the three families when $α \geq 1$ , turns out to be quite fruitful.

Theorem 3.

For $α \geq 1$ , we have

$K_{α}^{X ⇝ Y} \leq J_{α} (X; Y) \leq I_{α}^{X ⇝ Y} .$

Proof.

Suppose $α \geq 1$ . Then, by Jensen’s inequality, $h_{α}^{μ} (Y | X) \leq {\tilde{h}}_{α}^{μ} (Y | X)$ for every $μ$ , so that

$K_{α}^{X ⇝ Y} = - \max_{μ} {\tilde{h}}_{α}^{μ} (Y | X) \leq - \max_{μ} h_{α}^{μ} (Y | X) = J_{α} (Y; X) = J_{α} (X; Y) .$

Moreover,

$J_{α} (X; Y) = - \max_{μ} h_{α}^{μ} (X | Y) \leq - h_{α}^{P_{X}} (X | Y) = I_{α}^{X ⇝ Y},$

which completes the proof. □
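The key step in the first half of the proof is the comparison $h_{α}^{μ} (Y | X) \leq {\tilde{h}}_{α}^{μ} (Y | X)$ , which is Jensen’s inequality applied to the exponential average defining Arimoto’s conditional entropy, with $h_{α}^{μ} (Y | X = x) = - D_{α} (P_{Y | X = x} ∥ μ)$ . The Python sketch below (illustrative helper names, randomly drawn finite distributions) checks it numerically; for $α \in (0, 1)$ the same argument gives the reverse inequality, which can be checked the same way.

```python
import numpy as np

def renyi_div(p, q, alpha):
    """D_alpha(p || q) in nats for finite pmfs (alpha > 0, alpha != 1, q > 0)."""
    return np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0)

def arimoto_and_average(P, mu, alpha):
    """Return (h_alpha^mu(Y|X), tilde h_alpha^mu(Y|X)) for a joint pmf P[x, y]."""
    P_X = P.sum(axis=1)
    # h_alpha^mu(Y | X = x) = -D_alpha(P_{Y|X=x} || mu)
    h_x = np.array([-renyi_div(P[x] / P_X[x], mu, alpha) for x in range(len(P_X))])
    t = (1.0 - alpha) / alpha
    arimoto = np.log(P_X @ np.exp(t * h_x)) / t   # (alpha/(1-alpha)) log E exp(t h_x)
    average = P_X @ h_x                           # E_X h_alpha^mu(Y | X = x)
    return arimoto, average

rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(6)).reshape(2, 3)   # illustrative random joint pmf on 2 x 3 letters
mu = rng.dirichlet(np.ones(3))                # illustrative reference probability measure on T
for alpha in (0.5, 2.0):
    print(alpha, arimoto_and_average(P, mu, alpha))
```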

Remark 7.

When $α \in (0, 1)$ , the inequality $J_{α} (X; Y) \leq I_{α}^{X ⇝ Y}$ still holds, because the second part of the proof above does not use $α \geq 1$ ; combined with [21], this gives $J_{α} (X; Y) \leq I_{α}^{X ⇝ Y} \leq K_{α}^{X ⇝ Y}$ .
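Theorem 3 and Remark 7 can be illustrated numerically on a $2 \times 2$ alphabet. In the following Python sketch, which is a brute-force illustration rather than a method from the paper, $I_{α}$ uses its standard finite-alphabet closed form, while $K_{α}$ and $J_{α}$ (the latter via $J_{α} = \min_{μ} - h_{α}^{μ} (X | Y)$ ) are minimized over a dense grid of full-support probability measures; the grid, the helper names, and the example joint pmf are assumptions.

```python
import numpy as np

# Dense grid of full-support pmfs on a two-letter alphabet (illustrative resolution).
t = np.linspace(1e-6, 1.0 - 1e-6, 200_001)
GRID = np.stack([t, 1.0 - t], axis=1)

def div_to_grid(p, alpha):
    """D_alpha(p || q) in nats for every q in GRID."""
    return np.log((p**alpha * GRID**(1.0 - alpha)).sum(axis=1)) / (alpha - 1.0)

def sibson(P, alpha):
    """Closed form of I_alpha^{X ~> Y} for a joint pmf P[x, y]."""
    P_X = P.sum(axis=1)
    W = P / P_X[:, None]
    inner = (P_X[:, None] * W**alpha).sum(axis=0) ** (1.0 / alpha)
    return alpha / (alpha - 1.0) * np.log(inner.sum())

def augustin(P, alpha):
    """K_alpha^{X ~> Y}: minimize E_X D_alpha(P_{Y|X} || mu) over mu in GRID (Y two-letter)."""
    P_X = P.sum(axis=1)
    avg = sum(px * div_to_grid(P[x] / px, alpha) for x, px in enumerate(P_X))
    return avg.min()

def lapidoth_pfister(P, alpha):
    """J_alpha(X;Y) = min over mu in GRID of -h_alpha^mu(X|Y) (X two-letter)."""
    P_Y = P.sum(axis=0)
    s = sum(py * np.exp((alpha - 1.0) / alpha * div_to_grid(P[:, y] / py, alpha))
            for y, py in enumerate(P_Y))
    return (alpha / (alpha - 1.0) * np.log(s)).min()

P = np.array([[0.3, 0.1], [0.2, 0.4]])   # illustrative joint pmf on a 2 x 2 alphabet
for alpha in (0.5, 2.0):
    K, J, I = augustin(P, alpha), lapidoth_pfister(P, alpha), sibson(P, alpha)
    print(f"alpha={alpha}: K={K:.5f}  J={J:.5f}  I={I:.5f}")
```

For $α \geq 1$ the printed values should satisfy $K \leq J \leq I$ , and for $α \in (0, 1)$ they should satisfy $J \leq I \leq K$ , up to grid resolution.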

We note that the relation between $K_{α}^{X ⇝ Y}$ and $I_{α}^{X ⇝ Y}$ (in the finite alphabet case) goes back to Csiszár [21]. In the next and final section, we explore the implications of Theorem 3 for various notions of capacity.

6. Channels and Capacities

We begin by defining channels and capacities. Throughout this section, assume $α \geq 1$ unless stated otherwise.

Definition 7.

Let $(A, \mathcal{A}), (B, \mathcal{B})$ be measurable spaces. A function $W : A \times \mathcal{B} \to \mathbb{R}$ is called a probability kernel or a channel from the input space $(A, \mathcal{A})$ to the output space $(B, \mathcal{B})$ if

1.
For all $a \in A$ , the function $W (\cdot | a) : \mathcal{B} \to \mathbb{R}$ is a probability measure on $(B, \mathcal{B})$ , and

2.
For every $V \in \mathcal{B}$ , the function $W (V | \cdot) : A \to \mathbb{R}$ is $\mathcal{A}$ -measurable.

In our setting, the conditional distributions of $X$ given $Y = y$ define a channel $W$ from $\mathrm{supp} (P_{Y})$ to $S$ . In terms of this channel, one can write a density-free expression for the conditional Rényi entropy:

$h_{α}^{γ} (X | Y) = \frac{α}{1 - α} \log \int_{T} e^{- \frac{1 - α}{α} D_{α} (W (\cdot | y) ∥ γ)} d P_{Y} (y) .$
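Expanding $D_{α}$ in this expression gives, for finite alphabets, $h_{α}^{γ} (X | Y) = \frac{α}{1 - α} \log \sum_{y} \bigl(\sum_{x} γ(x)^{1 - α} P_{X, Y} (x, y)^{α}\bigr)^{1 / α}$ ; with $γ$ the counting measure this is Arimoto’s discrete conditional Rényi entropy. A minimal Python sketch comparing the two forms, with hypothetical helper names and example weights $γ$ that need not sum to one:

```python
import numpy as np

def renyi_div(p, gamma, alpha):
    """D_alpha(p || gamma) for a pmf p and positive weights gamma (not necessarily normalized)."""
    return np.log(np.sum(p**alpha * gamma**(1.0 - alpha))) / (alpha - 1.0)

def h_cond_channel_form(P, gamma, alpha):
    """h_alpha^gamma(X|Y) via the channel W(.|y) = P_{X|Y=y}, following the display above."""
    P_Y = P.sum(axis=0)
    s = sum(py * np.exp(-(1.0 - alpha) / alpha * renyi_div(P[:, y] / py, gamma, alpha))
            for y, py in enumerate(P_Y) if py > 0)
    return alpha / (1.0 - alpha) * np.log(s)

def h_cond_expanded(P, gamma, alpha):
    """Same quantity with D_alpha expanded into sums over x and y."""
    inner = (gamma[:, None]**(1.0 - alpha) * P**alpha).sum(axis=0)   # one entry per y
    return alpha / (1.0 - alpha) * np.log(np.sum(inner**(1.0 / alpha)))

rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(12)).reshape(3, 4)   # illustrative random joint pmf on 3 x 4 letters
counting = np.ones(3)                          # counting measure on S
weights = np.array([0.5, 1.0, 2.0])            # illustrative non-normalized reference measure
for alpha in (0.5, 2.0):
    for gamma in (counting, weights):
        print(alpha, h_cond_channel_form(P, gamma, alpha), h_cond_expanded(P, gamma, alpha))
```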

Definition 8.

Let $(B, \mathcal{B})$ be a measurable space and $W \subseteq P (B)$ a set of probability measures on $B$ . Following [36], define the order-α Rényi radius of $W$ relative to $q \in P (B)$ by

$S_{α, W} (q) = \sup_{w \in W} D_{α} (w ∥ q) .$

The order-α Rényi radius of $W$ is defined as

$S_{α} (W) = \inf_{q \in P (B)} S_{α, W} (q) .$
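On a two-letter space $B$ , the Rényi radius can be approximated by a direct search over candidate centers $q$ . The Python sketch below is only an illustration under that assumption (the grid resolution, helper names, and example set are hypothetical); it also returns the minimizing grid point, an approximate minimizer of $S_{α, W}$ .

```python
import numpy as np

t = np.linspace(1e-6, 1.0 - 1e-6, 200_001)
Q_GRID = np.stack([t, 1.0 - t], axis=1)     # candidate centers q on a two-letter B

def radius_relative(Wset, q, alpha):
    """S_{alpha,W}(q) = max over rows w of Wset of D_alpha(w || q) (finite set W)."""
    d = np.log((Wset**alpha * q**(1.0 - alpha)).sum(axis=1)) / (alpha - 1.0)
    return d.max()

def renyi_radius(Wset, alpha):
    """S_alpha(W) approximated by minimizing over Q_GRID; returns (radius, minimizing q)."""
    # D[i, j] = D_alpha(Wset[j] || Q_GRID[i])
    D = np.log((Wset[None, :, :]**alpha * Q_GRID[:, None, :]**(1.0 - alpha)).sum(axis=2)) / (alpha - 1.0)
    worst = D.max(axis=1)                   # S_{alpha,W}(q) for every grid point q
    i = worst.argmin()
    return worst[i], Q_GRID[i]

Wset = np.array([[0.9, 0.1], [0.2, 0.8]])   # illustrative set W of two pmfs on B
S, center = renyi_radius(Wset, 2.0)
print(S, center, radius_relative(Wset, np.array([0.5, 0.5]), 2.0))
```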

Given a joint distribution $P_{X, Y}$ of $(X, Y)$ , one can regard the quantities $I_{α}^{X ⇝ Y}, I_{α}^{Y ⇝ X}, K_{α}^{X ⇝ Y}, K_{α}^{Y ⇝ X}$ and $J_{α} (X; Y)$ as functions of an “input” distribution $P_{X}$ and the channel from $X$ to $Y$ (i.e., the probability kernel formed by the conditional distributions of $Y$ given $X = x$ ). For example, one can consider $I_{α}^{Y ⇝ X} = \inf_{ν \in P (S)} D_{α} (P_{X, Y} ∥ ν \otimes P_{Y})$ as a function of $P_{X}$ and the probability kernel formed by the conditional distributions of $Y$ given $X = x$ . With this interpretation, we define the following capacities.

Definition 9.

Given a channel W from X to Y, we define capacities of order α by

1.
$C_{K, α}^{X ⇝ Y} (W) = \sup_{P \in P (S)} K_{α}^{X ⇝ Y} (P, W)$ ,

2.
$C_{J, α} (W) = \sup_{P \in P (S)} J_{α} (P, W)$ , and

3.
$C_{I, α}^{X ⇝ Y} (W) = \sup_{P \in P (S)} I_{α}^{X ⇝ Y} (P, W)$ .

Theorem 3 allows us to extend ([31], Theorem 5) to include the capacity based on the Lapidoth-Pfister mutual information.

Theorem 4.

Let $α \geq 1$ , fix a channel $W$ from $X$ to $Y$ , and let $S_{α} (W)$ denote the Rényi radius of the set $\{W (\cdot | x) : x \in S\}$ . Then,

$C_{K, α}^{Y ⇝ X} (W) \leq C_{I, α}^{X ⇝ Y} (W) = C_{J, α} (W) = C_{K, α}^{X ⇝ Y} (W) = S_{α} (W) \leq C_{I, α}^{Y ⇝ X} (W) .$

Proof.

Theorem 3 implies that,

$\sup_{P \in P (S)} K_{α}^{X ⇝ Y} (P, W) \leq \sup_{P \in P (S)} J_{α} (P, W) \leq \sup_{P \in P (S)} I_{α}^{X ⇝ Y} (P, W),$

that is,

$C_{K, α}^{X ⇝ Y} (W) \leq C_{J, α} (W) \leq C_{I, α}^{X ⇝ Y} (W) .$

It was shown by Csiszár [21] in the finite alphabet setting (in fact, he showed this for all $α > 0$ ) that $C_{I, α}^{X ⇝ Y} (W) = C_{K, α}^{X ⇝ Y} (W)$ . Nakiboğlu demonstrates $C_{I, α}^{X ⇝ Y} (W) = S_{α} (W)$ in [36] and $C_{K, α}^{X ⇝ Y} (W) = S_{α} (W)$ in [42] for abstract alphabets. Putting all this together, we have

$C_{I, α}^{X ⇝ Y} (W) = C_{J, α} (W) = C_{K, α}^{X ⇝ Y} (W) = S_{α} (W) .$

Finally, applying Theorem 3 with the roles of $X$ and $Y$ exchanged and using the symmetry of $J_{α}$ , we get

$C_{K, α}^{Y ⇝ X} (W) \leq C_{J, α} (W) \leq C_{I, α}^{Y ⇝ X} (W) .$

This completes the proof.□
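For a binary-input, binary-output channel, the equalities in Theorem 4 can be observed numerically. The following Python sketch is a brute-force illustration; the grids, the helper names, and the example channel are assumptions. It takes the supremum over a grid of input distributions, computes $I_{α}$ from its standard closed form, minimizes over a grid of $μ$ for $K_{α}$ and for $J_{α}$ (the latter through the exponential form of $J_{α} (X; Y) = - \max_{μ} h_{α}^{μ} (Y | X)$ with $β = α / (α - 1)$ ), and approximates $S_{α} (W)$ on a finer grid.

```python
import numpy as np

t = np.linspace(1e-6, 1.0 - 1e-6, 2_001)
MU = np.stack([t, 1.0 - t], axis=1)                # grid of output distributions mu
s = np.linspace(0.0, 1.0, 501)
INPUTS = np.stack([s, 1.0 - s], axis=1)            # grid of input distributions P
t_fine = np.linspace(1e-6, 1.0 - 1e-6, 200_001)
Q_FINE = np.stack([t_fine, 1.0 - t_fine], axis=1)  # finer grid for the radius

def divs(W, Q, alpha):
    """D[i, x] = D_alpha(W(.|x) || Q[i]) in nats; W[x, y] = W(y|x)."""
    return np.log((W[None]**alpha * Q[:, None]**(1.0 - alpha)).sum(axis=2)) / (alpha - 1.0)

def capacities(W, alpha):
    """Grid approximations of (C_I, C_J, C_K, S_alpha) for a 2 x 2 channel W."""
    D = divs(W, MU, alpha)                          # shape (len(MU), 2)
    beta = alpha / (alpha - 1.0)
    # Sibson: closed form for each input P, then sup over the input grid
    inner = (INPUTS[:, :, None] * W[None]**alpha).sum(axis=1) ** (1.0 / alpha)
    C_I = (beta * np.log(inner.sum(axis=1))).max()
    # Augustin-Csiszar: min over mu of E_P D_alpha(W(.|X) || mu), then sup over P
    C_K = (INPUTS @ D.T).min(axis=1).max()
    # Lapidoth-Pfister: min over mu of beta log E_P exp(D / beta), then sup over P
    C_J = (beta * np.log(INPUTS @ np.exp(D / beta).T)).min(axis=1).max()
    S = divs(W, Q_FINE, alpha).max(axis=1).min()    # Renyi radius of {W(.|x)}
    return C_I, C_J, C_K, S

W = np.array([[0.9, 0.1], [0.3, 0.7]])   # illustrative binary channel, rows W(.|x)
for alpha in (1.5, 2.0, 3.0):
    print(alpha, capacities(W, alpha))
```

The four printed values should agree up to the grid resolution.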

The two inequalities in the last theorem cannot, in general, be improved to equalities; this follows from a counterexample communicated to the authors by C. Pfister. Note that this theorem corrects Theorem V.1 in [41].

Since $J_{α} (X; Y)$ is no longer sandwiched between $K_{α}^{X ⇝ Y}$ and $I_{α}^{X ⇝ Y}$ when $α \in (0, 1)$ , the same argument cannot be used to deduce the equality of the various capacities in this case. However, when $α \in [1 / 2, 1)$ , a direct argument shows that the Lapidoth-Pfister capacity of a channel equals its Rényi radius when the state spaces are finite.

Theorem 5.

Let $α \in [\frac{1}{2}, 1)$ , and fix a channel $W$ from $X$ to $Y$ , where $X$ and $Y$ take values in finite sets $S$ and $T$ , respectively. Then,

$\sup_{P \in P (S)} J_{α} (P, W) = S_{α} (W) .$

Proof.

We continue to use integral notation in place of summation. Note that

$J_{α} (X; Y) = - \max_{μ \in P (T)} \Bigl(- β \log \int_{S} e^{\frac{1}{β} D_{α} (W (x) ∥ μ)} d P (x)\Bigr) = \min_{μ \in P (T)} β \log \int_{S} e^{\frac{1}{β} D_{α} (W (x) ∥ μ)} d P (x),$

where $β = \frac{α}{α - 1}$ . We consider the function $f (P, μ) = β \log \int_{S} e^{\frac{1}{β} D_{α} (W (x) ∥ μ)} d P (x)$ defined on $P (S) \times P (T)$ . Since $β < 0$ for $α \in [\frac{1}{2}, 1)$ , the map $t \mapsto - e^{t / β}$ is strictly increasing, so the function $g (P, μ) = - e^{\frac{1}{β} f (P, μ)} = - \int_{S} e^{\frac{1}{β} D_{α} (W (x) ∥ μ)} d P (x)$ has the same minimax properties as $f$ . We make the following observations about this function.

1.
$g$ is linear in $P$ .

2.
$g$ is convex in $μ$ ; this follows from the proof of ([27], Lemma 17).

3.
$g$ is continuous in each of the variables $P$ and $μ$ . Continuity in $μ$ follows from the continuity of $D_{α}$ in its second argument (see, e.g., [29]), whereas continuity in $P$ is a consequence of the linearity of the integral (summation).

Since $P (S)$ and $P (T)$ are compact convex sets (the state spaces being finite), the above observations allow us to apply von Neumann’s convex minimax theorem to $g$ , and therefore to $f$ , to conclude that

$\sup_{P} J_{α} (P, W) = \sup_{P} \min_{μ} f (P, μ) = \min_{μ} \sup_{P} f (P, μ) .$

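For a fixed $μ$ , however, $\sup_{P} f (P, μ) = \sup_{P} β \log \int e^{\frac{1}{β} D_{α} (W (x) ∥ μ)} d P (x) = \sup_{x} D_{α} (W (x) ∥ μ)$ . Indeed, $f (P, μ)$ is a quasi-arithmetic (exponential) mean of $D_{α} (W (x) ∥ μ)$ under $P$ and therefore cannot exceed its supremum; for the other direction, use the measures $P = δ_{x_{n}}$ , where $(x_{n})$ is a sequence along which $D_{α} (W (x_{n}) ∥ μ)$ approaches the supremum. Hence $\sup_{P} J_{α} (P, W) = \min_{μ} \sup_{x} D_{α} (W (x) ∥ μ) = S_{α} (W)$ . Together with Theorem 4, this shows that for every $α \geq 1 / 2$ the capacity coming from $J_{α} (X; Y)$ equals the Rényi radius when the state spaces are finite.□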
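A numerical illustration of Theorem 5 on binary alphabets: the Python sketch below evaluates $\sup_{P} \min_{μ} f (P, μ)$ by brute force over grids of $P$ and $μ$ and compares it with the Rényi radius $\min_{q} \max_{x} D_{α} (W (x) ∥ q)$ . The grids, the helper names, and the example channel are assumptions of this sketch, and grid search only approximates the optimizations.

```python
import numpy as np

t = np.linspace(1e-6, 1.0 - 1e-6, 2_001)
MU = np.stack([t, 1.0 - t], axis=1)                # grid of mu in P(T), T = {0, 1}
s = np.linspace(0.0, 1.0, 501)
INPUTS = np.stack([s, 1.0 - s], axis=1)            # grid of input distributions P in P(S)
t_fine = np.linspace(1e-6, 1.0 - 1e-6, 200_001)
Q_FINE = np.stack([t_fine, 1.0 - t_fine], axis=1)  # finer grid for the radius

def divs(W, Q, alpha):
    """D[i, x] = D_alpha(W(.|x) || Q[i]) in nats; W[x, y] = W(y|x)."""
    return np.log((W[None]**alpha * Q[:, None]**(1.0 - alpha)).sum(axis=2)) / (alpha - 1.0)

def lp_capacity(W, alpha):
    """sup_P min_mu f(P, mu), f(P, mu) = beta log sum_x P(x) exp(D_alpha(W(x)||mu) / beta)."""
    beta = alpha / (alpha - 1.0)                    # negative for alpha in [1/2, 1)
    D = divs(W, MU, alpha)
    f = beta * np.log(INPUTS @ np.exp(D / beta).T)  # f[index of P, index of mu]
    return f.min(axis=1).max()

def renyi_radius(W, alpha):
    """S_alpha(W) = min_q max_x D_alpha(W(x) || q), over the fine grid."""
    return divs(W, Q_FINE, alpha).max(axis=1).min()

W = np.array([[0.9, 0.1], [0.3, 0.7]])   # illustrative binary channel, rows W(.|x)
for alpha in (0.5, 0.75):
    print(alpha, lp_capacity(W, alpha), renyi_radius(W, alpha))
```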

Although we do not treat capacities based on Arimoto’s mutual information in this paper, because of its dependence on a reference measure, one remark can be made in this regard, following B. Nakiboğlu’s observation [43] that Arimoto’s mutual information w.r.t. $γ$ of a pair $(X, Y)$ can be written as Sibson’s mutual information of some input probability measure $P$ and the channel $W$ from $X$ to $Y$ corresponding to $P_{X, Y}$ . Let $P_{X}, P_{Y}$ denote the marginals of $P_{X, Y}$ . As before, let $γ$ be a reference measure on the state space $S$ of $X$ . Let $P$ denote the probability measure on $S$ with density $\frac{d P}{d γ} = e^{(1 - α) D_{α} (P_{X} ∥ γ)} {(\frac{d P_{X}}{d γ})}^{α}$ . Then a calculation shows that

$I_{α}^{(γ)} (X ⇝ Y) = I_{α}^{X ⇝ Y} (P, W) .$

Therefore, if a reference measure $γ$ is fixed, the capacity of order $α$ of a channel $W$ computed from Arimoto’s mutual information is at most the capacity based on Sibson’s mutual information (which, by Theorem 4, equals the Rényi radius of $W$ ).
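Nakiboğlu’s identity can be checked numerically on finite alphabets. In the Python sketch below (hypothetical helper names; a random joint pmf and arbitrary positive reference weights $γ$ are assumed), the tilted input $P$ is built from the density given above, Arimoto’s mutual information is computed as $h_{α}^{γ} (X) - h_{α}^{γ} (X | Y)$ , and Sibson’s mutual information of $(P, W)$ uses its standard finite-alphabet closed form.

```python
import numpy as np

def renyi_div(p, gamma, alpha):
    """D_alpha(p || gamma) for a pmf p and positive weights gamma."""
    return np.log(np.sum(p**alpha * gamma**(1.0 - alpha))) / (alpha - 1.0)

def arimoto_mi(P, gamma, alpha):
    """I_alpha^{(gamma)}(X ~> Y) = h_alpha^gamma(X) - h_alpha^gamma(X|Y), joint pmf P[x, y]."""
    P_X, P_Y = P.sum(axis=1), P.sum(axis=0)
    h_X = -renyi_div(P_X, gamma, alpha)
    s = sum(py * np.exp(-(1.0 - alpha) / alpha * renyi_div(P[:, y] / py, gamma, alpha))
            for y, py in enumerate(P_Y))
    return h_X - alpha / (1.0 - alpha) * np.log(s)

def sibson_mi(P_in, W, alpha):
    """Sibson's I_alpha^{X ~> Y}(P_in, W) via its closed form; W[x, y] = W(y|x)."""
    inner = (P_in[:, None] * W**alpha).sum(axis=0) ** (1.0 / alpha)
    return alpha / (alpha - 1.0) * np.log(inner.sum())

def tilted_input(P_X, gamma, alpha):
    """pmf with dP/dgamma = exp((1-alpha) D_alpha(P_X||gamma)) (dP_X/dgamma)^alpha."""
    density = np.exp((1.0 - alpha) * renyi_div(P_X, gamma, alpha)) * (P_X / gamma)**alpha
    return gamma * density

rng = np.random.default_rng(3)
P = rng.dirichlet(np.ones(12)).reshape(3, 4)   # illustrative random joint pmf
gamma = rng.uniform(0.5, 2.0, size=3)           # illustrative positive reference weights
P_X = P.sum(axis=1)
W = P / P_X[:, None]                            # channel from X to Y
for alpha in (0.5, 2.0):
    P_tilt = tilted_input(P_X, gamma, alpha)
    print(alpha, P_tilt.sum(), arimoto_mi(P, gamma, alpha), sibson_mi(P_tilt, W, alpha))
```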

Acknowledgments

We thank C. Pfister for many useful comments, especially for pointing out an error in [41] that is corrected here, and for a simpler proof of Theorem 5. We also thank B. Nakiboğlu for detailed suggestions, which include but are not limited to pointing us towards useful references, an easier proof of Theorem 1, and the observation discussed at the very end of the last section.

Author Contributions

Conceptualization, G.A. and M.M.; investigation, G.A. and M.M.; writing—original draft preparation, G.A. and M.M.; writing—review and editing, G.A. and M.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by NSF grant DMS-1409504.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Rényi A. On measures of entropy and information. In: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability. Volume I. University of California Press; Berkeley, CA, USA: 1961. pp. 547–561.
2. Shannon C. A mathematical theory of communication. Bell Syst. Tech. J. 1948;27:379–423, 623–656. doi: 10.1002/j.1538-7305.1948.tb01338.x.
3. Toscani G. Rényi entropies and nonlinear diffusion equations. Acta Appl. Math. 2014;132:595–604. doi: 10.1007/s10440-014-9933-9.
4. Wang L., Madiman M. Beyond the entropy power inequality, via rearrangements. IEEE Trans. Inform. Theory. 2014;60:5116–5137. doi: 10.1109/TIT.2014.2338852.
5. Madiman M., Wang L., Woo J.O. Majorization and Rényi entropy inequalities via Sperner theory. Discret. Math. 2019;342:2911–2923. doi: 10.1016/j.disc.2019.03.002.
6. Wang L., Woo J.O., Madiman M. A lower bound on the Rényi entropy of convolutions in the integers. Proceedings of the 2014 IEEE International Symposium on Information Theory; Honolulu, HI, USA. 29 June–4 July 2014; pp. 2829–2833.
7. Madiman M., Wang L., Woo J.O. Rényi entropy inequalities for sums in prime cyclic groups. arXiv 2017, arXiv:1710.00812.
8. Madiman M., Melbourne J., Xu P. Forward and Reverse Entropy Power Inequalities in Convex Geometry. In: Carlen E., Madiman M., Werner E.M., editors. Convexity and Concentration. Volume 161. Springer; Berlin, Germany: 2017. pp. 427–485.
9. Fradelizi M., Li J., Madiman M. Concentration of information content for convex measures. Electron. J. Probab. 2020;25:1–22. doi: 10.1214/20-EJP416.
10. Stam A. Some inequalities satisfied by the quantities of information of Fisher and Shannon. Inf. Control. 1959;2:101–112. doi: 10.1016/S0019-9958(59)90348-1.
11. Madiman M., Barron A. The Monotonicity of Information in the Central Limit Theorem and Entropy Power Inequalities. Proceedings of the 2006 IEEE International Symposium on Information Theory; Seattle, WA, USA. 9–14 July 2006; pp. 1021–1025.
12. Madiman M., Ghassemi F. Combinatorial Entropy Power Inequalities: A Preliminary Study of the Stam region. IEEE Trans. Inform. Theory. 2019;65:1375–1386. doi: 10.1109/TIT.2018.2854545.
13. Bobkov S.G., Madiman M., Wang L. Fractional generalizations of Young and Brunn-Minkowski inequalities. In: Houdré C., Ledoux M., Milman E., Milman M., editors. Concentration, Functional Inequalities and Isoperimetry. Volume 545. American Mathematical Society; Providence, RI, USA: 2011. pp. 35–53.
14. Fradelizi M., Madiman M., Marsiglietti A., Zvavitch A. The convexification effect of Minkowski summation. EMS Surv. Math. Sci. 2019;5:1–64. doi: 10.4171/EMSS/26.
15. Madiman M., Kontoyiannis I. Entropy bounds on abelian groups and the Ruzsa divergence. IEEE Trans. Inform. Theory. 2018;64:77–92. doi: 10.1109/TIT.2016.2620470.
16. Ruzsa I.Z. Sumsets and entropy. Random Struct. Algorithms. 2009;34:1–10. doi: 10.1002/rsa.20248.
17. Madiman M., Marcus A., Tetali P. Information-theoretic Inequalities in Additive Combinatorics. Proceedings of the 2010 IEEE Information Theory Workshop (ITW 2010); Cairo, Egypt. 6–8 January 2010.
18. Tao T. Sumset and inverse sumset theory for Shannon entropy. Comb. Probab. Comput. 2010;19:603–639. doi: 10.1017/S0963548309990642.
19. Madiman M., Marcus A., Tetali P. Entropy and set cardinality inequalities for partition-determined functions. Random Struct. Algorithms. 2012;40:399–424. doi: 10.1002/rsa.20385.
20. Kontoyiannis I., Madiman M. Sumset and Inverse Sumset Inequalities for Differential Entropy and Mutual Information. IEEE Trans. Inform. Theory. 2014;60:4503–4514. doi: 10.1109/TIT.2014.2322861.
21. Csiszár I. Generalized cutoff rates and Rényi’s information measures. IEEE Trans. Inform. Theory. 1995;41:26–34. doi: 10.1109/18.370121.
22. Arikan E. An inequality on guessing and its application to sequential decoding. IEEE Trans. Inform. Theory. 1996;42:99–105. doi: 10.1109/18.481781.
23. Bunte C., Lapidoth A. Encoding tasks and Rényi entropy. IEEE Trans. Inform. Theory. 2014;60:5065–5076. doi: 10.1109/TIT.2014.2329490.
24. Bunte C., Lapidoth A. Rényi Entropy and Quantization for Densities. Proceedings of the 2014 IEEE Information Theory Workshop (ITW 2014); Hobart, TAS, Australia. 2–5 November 2014; pp. 258–262.
25. Tomamichel M., Hayashi M. Operational interpretation of Rényi information measures via composite hypothesis testing against product and Markov distributions. IEEE Trans. Inform. Theory. 2018;64:1064–1082. doi: 10.1109/TIT.2017.2776900.
26. Tan V.Y.F., Hayashi M. Analysis of remaining uncertainties and exponents under various conditional Rényi entropies. IEEE Trans. Inform. Theory. 2018;64:3734–3755. doi: 10.1109/TIT.2018.2792495.
27. Lapidoth A., Pfister C. Two measures of dependence. Entropy. 2019;21:778. doi: 10.3390/e21080778.
28. Teixeira A., Matos A., Antunes L. Conditional Rényi entropies. IEEE Trans. Inform. Theory. 2012;58:4273–4277. doi: 10.1109/TIT.2012.2192713.
29. van Erven T., Harremoës P. Rényi divergence and Kullback-Leibler divergence. IEEE Trans. Inform. Theory. 2014;60:3797–3820. doi: 10.1109/TIT.2014.2320500.
30. Fehr S., Berens S. On the conditional Rényi entropy. IEEE Trans. Inform. Theory. 2014;60:6801–6810. doi: 10.1109/TIT.2014.2357799.
31. Verdú S. α-mutual information. Proceedings of the Information Theory and Applications Workshop (ITA); San Diego, CA, USA. 1–6 February 2015.
32. Lapidoth A., Pfister C. Testing Against Independence and a Rényi Information Measure. Proceedings of the 2018 IEEE Information Theory Workshop; Guangzhou, China. 25–29 November 2018; pp. 1–5.
33. Sason I., Verdú S. Arimoto-Rényi conditional entropy and Bayesian M-ary hypothesis testing. IEEE Trans. Inform. Theory. 2018;64:4–25. doi: 10.1109/TIT.2017.2757496.
34. Sason I., Verdú S. Improved bounds on lossless source coding and guessing moments via Rényi measures. IEEE Trans. Inform. Theory. 2018;64:4323–4346. doi: 10.1109/TIT.2018.2803162.
35. Saito S., Matsushima T. Non-Asymptotic Fundamental Limits of Guessing Subject to Distortion. arXiv 2019, arXiv:1808.06190v2.
36. Nakiboğlu B. The Rényi Capacity and Center. IEEE Trans. Inform. Theory. 2019;65:841–860. doi: 10.1109/TIT.2018.2861002.
37. Campbell L.L. A coding theorem and Rényi’s entropy. Inf. Control. 1965;8:423–429. doi: 10.1016/S0019-9958(65)90332-3.
38. Sibson R. Information radius. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete. 1969;14:149–160. doi: 10.1007/BF00537520.
39. Arimoto S. Information measures and capacity of order α for discrete memoryless channels. Top. Inf. Theory. 1977;16:41–52.
40. Augustin U. Noisy Channels. University Erlangen-Nürnberg; Erlangen, Germany: 1978.
41. Aishwarya G., Madiman M. Remarks on Rényi versions of conditional entropy and mutual information. Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT); Paris, France. 7–12 July 2019; pp. 1117–1121.
42. Nakiboğlu B. The Augustin Capacity and Center. Probl. Inf. Transm. 2019;55:299–342. doi: 10.1134/S003294601904001X.
43. Nakiboğlu B. (Middle East Technical University (METU), Ankara, Turkey). Personal communication, 2020. Email on 9 February 2020.
44. Lieb E.H., Loss M. Analysis. 2nd ed. Graduate Studies in Mathematics, Volume 14. American Mathematical Society; Providence, RI, USA: 2001.
45. Lapidoth A., Pfister C. Two measures of dependence. Proceedings of the 2016 IEEE International Conference on the Science of Electrical Engineering (ICSEE); Eilat, Israel. 16–18 November 2016; pp. 1–5.
46. Gallager R.G. A simple derivation of the coding theorem and some applications. IEEE Trans. Inform. Theory. 1965;IT-11:3–18. doi: 10.1109/TIT.1965.1053730.

