Abstract
We derive new characterizations of the matrix Φ-entropy functionals introduced by Chen & Tropp (Chen, Tropp 2014 Electron. J. Prob. 19, 1–30. (doi:10.1214/ejp.v19-2964)). These characterizations help us to better understand the properties of matrix Φ-entropies, and they are a powerful tool for establishing matrix concentration inequalities for random matrices. We then propose an operator-valued generalization of matrix Φ-entropy functionals, and prove their subadditivity under the Löwner partial ordering. Our results demonstrate that the subadditivity of operator-valued Φ-entropies is equivalent to their convexity. As an application, we derive the operator Efron–Stein inequality.
Keywords: Φ-entropy, matrix concentration inequalities, Efron–Stein inequality
1. Introduction
The introduction of Φ-entropy functionals can be traced back to the early days of information theory [1,2] and convex analysis [3–6], where the notion of ϕ-divergence was defined. Formally, given a non-negative real random variable Z and a smooth convex function Φ, the Φ-entropy functional refers to

HΦ(Z) := E[Φ(Z)] − Φ(E[Z]).
By Jensen's inequality, it is not hard to see that the quantity HΦ(Z) is non-negative. Hence, the Φ-entropy functional can be used as an entropic measure to characterize the uncertainty of the random variable Z.
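To make the non-negativity concrete, here is a minimal numerical sketch (our illustration, not part of the original text), using Φ(u)=u log u and a three-point distribution:

```python
import math

# A minimal sketch (our illustration, not from the paper): H_Phi(Z) >= 0
# for the convex function Phi(u) = u*log(u) and a three-point discrete Z.
def phi(u):
    return u * math.log(u) if u > 0 else 0.0

dist = [(0.5, 1.0), (0.3, 2.0), (0.2, 5.0)]   # (probability, value) pairs

mean_phi = sum(p * phi(z) for p, z in dist)   # E[Phi(Z)]
phi_mean = phi(sum(p * z for p, z in dist))   # Phi(E[Z])
h_phi = mean_phi - phi_mean                   # H_Phi(Z)

assert h_phi >= 0.0  # non-negativity, by Jensen's inequality
```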
The investigation of general properties of classical Φ-entropies has enjoyed great success in physics, probability theory, information theory and computer science. Of these, the subadditivity (or tensorization) property [7–9] has led to derivations of the logarithmic Sobolev [10], Φ-Sobolev [11] and Poincaré inequalities [12], which, in turn, are a crucial step towards the powerful entropy method in concentration inequalities [13–15] and the analysis of Markov semigroups [16].
Let Z=f(X1,…,Xn) be a random variable defined on n independent random variables (X1,…,Xn). We say HΦ(Z) is subadditive if

HΦ(Z) ≤ ∑i=1n E[Ei[Φ(Z)] − Φ(Ei[Z])],

where Ei denotes the conditional expectation with respect to Xi (integrating over Xi while fixing the remaining variables). Gross [10] first observed that the ordinary entropy functional is subadditive in his seminal paper. Later on, equivalent characterizations of the subadditive entropy class (see theorem 2.2) were established [11,17,18], which prove to be useful in other contexts such as stochastic processes [17,18].
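The tensorization property can be illustrated numerically; the following sketch is our own addition (Φ(u)=u log u, two independent Bernoulli inputs and an arbitrary non-negative f are our choices):

```python
import math
from itertools import product

# A numerical sketch of tensorization (our illustration, not from the
# paper): Phi(u) = u*log(u), Z = f(X1, X2) with independent Bernoulli
# inputs, and an arbitrary non-negative function f.
def phi(u):
    return u * math.log(u) if u > 0 else 0.0

p1, p2 = 0.4, 0.7                                     # P(X1=1), P(X2=1)
f = lambda x1, x2: 1.0 + 2.0 * x1 + 3.0 * x2 + x1 * x2

def prob(x1, x2):
    return (p1 if x1 else 1 - p1) * (p2 if x2 else 1 - p2)

# Unconditional entropy H_Phi(Z) = E[Phi(Z)] - Phi(E[Z]).
ez = sum(prob(a, b) * f(a, b) for a, b in product((0, 1), repeat=2))
ephi = sum(prob(a, b) * phi(f(a, b)) for a, b in product((0, 1), repeat=2))
h_total = ephi - phi(ez)

def h_cond_1():
    # E[H_Phi^(1)(Z)]: entropy in X1 alone, averaged over X2.
    total = 0.0
    for b in (0, 1):
        e1 = (1 - p1) * f(0, b) + p1 * f(1, b)
        e1phi = (1 - p1) * phi(f(0, b)) + p1 * phi(f(1, b))
        total += (p2 if b else 1 - p2) * (e1phi - phi(e1))
    return total

def h_cond_2():
    # E[H_Phi^(2)(Z)]: entropy in X2 alone, averaged over X1.
    total = 0.0
    for a in (0, 1):
        e2 = (1 - p2) * f(a, 0) + p2 * f(a, 1)
        e2phi = (1 - p2) * phi(f(a, 0)) + p2 * phi(f(a, 1))
        total += (p1 if a else 1 - p1) * (e2phi - phi(e2))
    return total

assert h_total <= h_cond_1() + h_cond_2() + 1e-12     # subadditivity
```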
Parallel to the classical Φ-entropies, Chen & Tropp [19] introduced the notion of matrix Φ-entropy functionals. Namely, for a positive semi-definite random matrix Z, the matrix Φ-entropy functional is defined as

HΦ(Z) := E[tr Φ(Z)] − tr Φ(E[Z]),

where tr[·] := (1/d)Tr[·] is the normalized trace. The class of subadditive matrix Φ-entropy functionals is characterized in terms of the second derivative of their representing functions. Unlike its classical counterpart, only a few connections between the matrix Φ-entropy functionals and other convex forms of the same functions had been established [20,21] prior to this work.
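A small numerical sketch of this definition (our illustration; Φ(u)=u² and the two-point distribution are our choices):

```python
import numpy as np

# Sketch (our illustration): for Phi(u) = u^2 and a two-point random PSD
# matrix Z, the matrix Phi-entropy
# H_Phi(Z) = E[trbar Phi(Z)] - trbar[Phi(E Z)] is non-negative.
rng = np.random.default_rng(0)
d = 3
trbar = lambda m: np.trace(m) / d            # normalized trace Tr/d

def rand_psd():
    a = rng.standard_normal((d, d))
    return a @ a.T

z0, z1 = rand_psd(), rand_psd()              # Z = z0 or z1, each w.p. 1/2

e_phi = 0.5 * trbar(z0 @ z0) + 0.5 * trbar(z1 @ z1)  # E[trbar Z^2]
ezm = (z0 + z1) / 2                                   # E[Z]
h_phi = e_phi - trbar(ezm @ ezm)                      # H_Phi(Z)

assert h_phi >= 0.0
```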
In this paper, we establish equivalent characterizations of the matrix Φ-entropy functionals defined in [19]. Our results show that matrix Φ-entropy functionals satisfy all known equivalent statements that classical Φ-entropy functionals satisfy [15,17,18], providing additional justification for the original definition of the matrix Φ-entropy functionals (table 1). The equivalences between matrix Φ-entropy functionals and other convex forms of the function Φ advance our understanding of this class of entropy functions. Moreover, they allow us to unify the study of matrix concentration inequalities and matrix Φ-Sobolev inequalities [22,23].
Table 1.
Comparison between the equivalent characterizations of the Φ-entropy functional class (C1) (definition 2.1) and the matrix Φ-entropy functional class (C2) (definition 3.1).
| | classical Φ-entropy functional class (C1) | matrix Φ-entropy functional class (C2) |
|---|---|---|
| (a) | Φ is affine or Φ′′>0 and 1/Φ′′ is concave | Φ is affine or DΦ′ is invertible and (DΦ′)−1 is concave |
| (b) | convexity of (u,v)↦Φ(u+v)−Φ(u)−Φ′(u)v | convexity of (u,v)↦Tr[Φ(u+v)−Φ(u)−DΦ[u](v)] |
| (c) | convexity of (u,v)↦(Φ′(u+v)−Φ′(u))v | convexity of (u,v)↦Tr[DΦ[u+v](v)−DΦ[u](v)] |
| (d) | convexity of (u,v)↦Φ′′(u)v2 | convexity of (u,v)↦Tr[D2Φ[u](v,v)] |
| (e) | Φ is affine or Φ′′>0 and Φ′′′′Φ′′≥2(Φ′′′)2 | equation (3.2) |
| (f) | convexity of (u,v)↦tΦ(u)+(1−t)Φ(v)−Φ(tu+(1−t)v) for any 0≤t≤1 | convexity of (u,v)↦ Tr[tΦ(u)+(1−t)Φ(v)−Φ(tu+(1−t)v)] for any 0≤t≤1 |
| (g) | ||
| (h) | HΦ(Z) is a convex function of Z | HΦ(Z) is a convex function of Z |
| (i) | ||
| (j) |
Furthermore, we consider the following operator-valued generalization of matrix Φ-entropy functionals:
A special case of this operator-valued Φ-entropy functional is the operator-valued variance Var(Z) defined in [24,25], where Φ is the square function. The equivalent conditions for the subadditivity under the Löwner partial ordering are derived (theorem 4.5). In particular, we show that the subadditivity of the operator-valued Φ-entropies is equivalent to the convexity of HΦ(Z) as a function of Z.
Our result directly yields the operator Efron–Stein inequality, which recovers the well-known Efron–Stein inequality [26,27] when random matrices reduce to real random variables.
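For Φ(u)=u², the operator-valued Φ-entropy reduces to the operator-valued variance, which is positive semi-definite; a minimal numerical sketch (our illustration; the uniform five-point distribution is an arbitrary choice):

```python
import numpy as np

# Operator-valued variance sketch (our illustration): for Phi(u) = u^2 the
# operator-valued Phi-entropy is Var(Z) = E[Z^2] - (E Z)^2, a PSD matrix.
rng = np.random.default_rng(1)
d = 4
zs = [rng.standard_normal((d, d)) for _ in range(5)]
zs = [a @ a.T for a in zs]                   # Z uniform on five PSD values

ez = sum(zs) / len(zs)                       # E[Z]
ez2 = sum(z @ z for z in zs) / len(zs)       # E[Z^2]
var = ez2 - ez @ ez                          # H_{u^2}(Z) = Var(Z)

# Var(Z) = E[(Z - E Z)^2] is an average of squares of symmetric matrices,
# hence positive semi-definite.
assert np.min(np.linalg.eigvalsh(var)) >= -1e-9
```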
(a). Our results
We summarize our results here. First, we derive equivalent characterizations for the matrix Φ-entropy functionals in table 1 (see theorem 3.3). Notably, all known equivalent characterizations for the classical Φ-entropies can be generalized to their matrix correspondences. We emphasize that additional characterizations of the Φ-entropies prove to be useful in many instances. The characterizations (b)–(d) in (C1) were exploited by Chafaï [18] to derive several entropic inequalities for M/M/∞ queueing processes, which are not diffusions. With the characterizations (b)–(d), the difficulty of lacking the diffusion property can be circumvented and replaced by convexity. Moreover, as shown in corollary 4.8, item (f) in table 1 can be used to demonstrate an interesting result in quantum information theory: the matrix Φ-entropy functional of a quantum ensemble (i.e. a set of quantum states with some prior distribution) is monotone under any unital quantum channel. This property motivates us to study the dynamical evolution of a quantum ensemble and its mixing time, a fundamentally important problem in quantum computation (see our follow-up work [23] for further details).
Second, we define and derive equivalent characterizations for operator-valued Φ-entropies in table 2 (see theorem 4.5). Note that the only known statement in table 1 that is missing in table 2 is condition (e). In other words, we are not able to generalize (e) in table 1 to the non-commutative case. Finally, we employ the subadditivity of operator-valued Φ-entropies to show the operator Efron–Stein inequality in theorem 5.1.
Table 2.
Equivalent statements of the operator-valued Φ-entropy class (C3) (definition 4.1).
| | operator-valued Φ-entropy class (C3) |
|---|---|
| (a) | the second-order Fréchet derivative D2Φ[u](v,v) is jointly convex in (u,v) |
| (b) | Φ(u+v)−Φ(u)−DΦ[u](v) is jointly convex in (u,v) |
| (c) | DΦ[u+v](v)−DΦ[u](v) is jointly convex in (u,v) |
| (d) | D2Φ[u](v,v) is jointly convex in (u,v) |
| (e) | tΦ(u)+(1−t)Φ(v)−Φ(tu+(1−t)v) is jointly convex in (u,v) for any 0≤t≤1 |
| (f) | |
| (g) | HΦ(Z) is a convex function of Z |
| (h) | |
| (i) |
(b). Prior work
For the history of the equivalent characterizations in the class (C1), we refer to an excellent textbook [15] and papers [17,18].
The original definition of the matrix Φ-entropy class, namely (a) in (C2), was proposed by Chen & Tropp in 2014 [19]. In the same paper, they also established the subadditivity property (j) through (i) and (g): (a)⇒(i)⇒(g)⇒(j) in table 1. Shortly after, the equivalence between (a) and the joint convexity of the matrix Brégman divergence (b) was proved in [21]. The equivalence between (a) and (d) follows almost immediately from the result in [20] (see the detailed discussion in the proof of theorem 3.3). The convexity of HΦ(Z), (h), is noted in [20]. Here, we provide transparent evidence for it, namely the joint convexity of (f).
We organize the paper as follows. We collect the necessary background on matrix algebra in §2. The equivalent characterizations of the matrix Φ-entropy functionals are provided in §3. We define the operator-valued Φ-entropies and derive their equivalent statements in §4. Section 5 shows an application of the subadditivity: the operator Efron–Stein inequality. The proofs of the main results are collected in §6 and §7. Finally, we conclude the paper in §8.
2. Preliminaries
We first introduce basic notation.
The set refers to the subspace of self-adjoint operators on some separable Hilbert space. We denote by (resp. ) the set of positive semi-definite (resp. positive-definite) operators in . If the dimension d of a Hilbert space needs special attention, then we highlight it in subscripts, e.g. denotes the Banach space of d×d complex matrices. The trace function is defined as the summation of eigenvalues. The normalized trace function for every d×d matrix M is denoted by . For , the Schatten p-norm of an operator M is denoted as , where {λi(M)} are the singular values of M. The Hilbert–Schmidt inner product is defined as . For , means that A−B is positive semi-definite. Similarly, A≻B means A−B is positive-definite. Throughout this paper, italic capital letters (e.g. X) are used to denote operators.
Denote a probability space . A random matrix Z defined on the probability space means that it is a matrix-valued random variable defined on Ω. We denote the expectation of Z with respect to by
where the integral is the Bochner integral [28,29]. We note that the results derived in this paper are universal for all probability spaces. Hence, we will omit the subscript of the expectation. If we consider a sample space Ω1×Ω2 with joint distribution , then we denote the conditional expectation of Z with respect to the first space Ω1 by , where is the marginal distribution on Ω1.
Let U and V be real Banach spaces. The Fréchet derivative of a function f: U→V at a point X∈U, if it exists, is the unique bounded linear mapping Df[X]: U→V such that
where is a norm in (resp. ). The notation then is interpreted as ‘the Fréchet derivative of at X in the direction E’. The partial Fréchet derivative of multivariate functions can be defined as follows. Let and be real Banach spaces, . For a fixed , is a function of u whose derivative at u0, if it exists, is called the partial Fréchet derivative of with respect to u, and is denoted by . The partial Fréchet derivative is defined similarly. Likewise, the mth Fréchet derivative is a unique multi-linear map from (m times) to that satisfies
for each . The Fréchet derivative enjoys several properties as in standard derivatives. We provide these facts in appendix A.
A function is called operator convex if, for each and 0≤t≤1,
Similarly, a function is called operator monotone if, for each ,
(a). Classical Φ-entropy functionals
Let (C1) denote the class of functions Φ that are continuous and convex on [0,∞), twice differentiable on (0,∞), and such that either Φ is affine, or Φ′′ is strictly positive and 1/Φ′′ is concave.
Definition 2.1 (Classical Φ-entropies) —
Let Φ be a convex function. For every non-negative integrable random variable Z such that E|Z|<∞ and E|Φ(Z)|<∞, the classical Φ-entropy HΦ(Z) is defined as

HΦ(Z) := E[Φ(Z)] − Φ(E[Z]).
In particular, we are interested in Z=f(X1,…,Xn), where X1,…,Xn are independent random variables and f≥0 is a measurable function.
We say HΦ(Z) is subadditive [9] if

HΦ(Z) ≤ ∑i=1n E[HΦ(Z|X−i)],

where HΦ(Z|X−i) := Ei[Φ(Z)] − Φ(Ei[Z]) is the conditional Φ-entropy, and Ei denotes the conditional expectation conditioned on the n−1 random variables X−i := (X1,…,Xi−1,Xi+1,…,Xn).
It is a well-known result that, for any function Φ∈(C1), HΦ(Z) is subadditive [11, corollary 3] (see also [13, section 3]).
Theorem 2.2 establishes equivalent characterizations of classical Φ-entropies.
Theorem 2.2 [18, theorem 4.4] —
The following statements are equivalent:
(a) Φ∈(C1): Φ is affine, or Φ′′>0 and 1/Φ′′ is concave;
(b) Brégman divergence (u,v)↦Φ(u+v)−Φ(u)−Φ′(u)v is convex;
(c) (u,v)↦(Φ′(u+v)−Φ′(u))v is convex;
(d) (u,v)↦Φ′′(u)v2 is convex;
(e) Φ is affine or Φ′′>0 and Φ′′′′Φ′′≥2(Φ′′′)2;
(f) (u,v)↦tΦ(u)+(1−t)Φ(v)−Φ(tu+(1−t)v) is convex for any 0≤t≤1;
(g)
(h) HΦ(Z) is a convex function of Z;
(i) ; and
(j) .
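For a concrete instance of condition (e) (our illustration, not part of the theorem), take Φ(u)=u log u: then Φ′′=1/u, Φ′′′=−1/u2 and Φ′′′′=2/u3, so Φ′′′′Φ′′=2/u4=2(Φ′′′)2, and (e) holds with equality:

```python
import math

# Condition (e) for Phi(u) = u*log(u) (our illustration): Phi'' = 1/u,
# Phi''' = -1/u^2, Phi'''' = 2/u^3, so Phi''''*Phi'' = 2/u^4 = 2*(Phi''')^2,
# i.e. the inequality holds with equality.
for u in (0.1, 1.0, 3.7, 100.0):
    d2 = 1.0 / u
    d3 = -1.0 / u ** 2
    d4 = 2.0 / u ** 3
    assert math.isclose(d4 * d2, 2.0 * d3 ** 2, rel_tol=1e-12)
```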
3. Equivalent characterizations of matrix Φ-entropy functionals
Here, we first introduce matrix Φ-entropy functionals, and present the main result (theorem 3.3) of this section, namely new characterizations of the matrix Φ-entropy functionals.
Chen & Tropp [19] introduced the class of matrix Φ-entropies and proved its subadditivity in 2014. Here, we will show that all equivalent characterizations of classical Φ-entropies in theorem 2.2 have a one-to-one correspondence for the class of matrix Φ-entropies.
Let d be a natural number. The class Φd contains each function that is either affine or satisfies the following three conditions.
(i) Φ is convex and continuous at zero.
(ii) Φ is twice continuously differentiable.
(iii) Define Ψ(t)=Φ′(t) for t>0. The Fréchet derivative DΨ of the standard matrix function is an invertible linear map on the space of d×d self-adjoint matrices, and the map A↦(DΨ[A])−1 is concave with respect to the Löwner partial ordering on positive-definite matrices.
Define the class (C2) := ∩d≥1 Φd.
Definition 3.1 (Matrix Φ-entropy functional [19]) —
Let Φ be a convex function. Consider a random positive semi-definite matrix Z with E∥Z∥<∞ and E∥Φ(Z)∥<∞. The matrix Φ-entropy HΦ(Z) is defined as

HΦ(Z) := E[tr Φ(Z)] − tr Φ(E[Z]).
The corresponding conditional matrix Φ-entropy can be defined with respect to a sub-σ-algebra.
Theorem 3.2 (Subadditivity of the matrix Φ-entropy functional [19, theorem 2.5]) —
Let Φ∈(C2), and assume Z is a measurable function of (X1,…,Xn). We have

HΦ(Z) ≤ ∑i=1n E[HΦ(Z|X−i)], (3.1)

where HΦ(Z|X−i) is the conditional entropy, and E[·|X−i] denotes the conditional expectation conditioned on the n−1 random variables X−i := (X1,…,Xi−1,Xi+1,…,Xn).
Theorem 3.3 is the main result of this section. We show that all the equivalent conditions in theorem 2.2 also hold for the class of matrix Φ-entropy functionals. Hence, we have a more comprehensive understanding of the class of matrix Φ-entropy functionals.
Theorem 3.3 —
The following statements are equivalent:
(a) Φ∈(C2): Φ is affine or DΨ is invertible and A↦(DΨ[A])−1 is operator concave;
(b) matrix Brégman divergence: (A,B)↦Tr[Φ(A+B)−Φ(A)−DΦ[A](B)] is convex;
(c) (A,B)↦Tr[DΦ[A+B](B)−DΦ[A](B)] is convex;
(d) (A,B)↦Tr[D2Φ[A](B,B)] is convex;
(e) Φ is affine, or Φ′′>0 and equation (3.2) holds;
(f) (A,B)↦Tr[tΦ(A)+(1−t)Φ(B)−Φ(tA+(1−t)B)] is convex for any 0≤t≤1;
(g)
(h) HΦ(Z) is a convex function of Z;
(i) ; and
(j) .
We note that the chain of implications (a)⇒(i)⇒(g)⇒(j) was proved by Chen & Tropp [19]. The equivalence (a)⇔(b) was shown in [21, theorem 2]. Hansen & Zhang [20] established an equivalence of item (a) and the convexity of the following map:
| 3.3 |
From lemma A.5, it is not hard to observe that equation (3.3) is equivalent to item (d), i.e.
We provide the detailed proof of the remaining equivalence statements in §6.
4. Operator-valued Φ-entropies
Here, we extend the notion of matrix Φ-entropy functionals (i.e. real-valued) to operator-valued Φ-entropies.
Definition 4.1 (Operator-valued entropy class) —
Let d be a natural number. The class Φd contains each function whose second-order Fréchet derivative exists and for which the following map satisfies the joint convexity condition (under the Löwner partial ordering):

(A,B) ↦ D2Φ[A](B,B). (4.1)

We denote the class of operator-valued Φ-entropies by (C3).
Definition 4.2 (Operator-valued Φ-entropies) —
Let Φ be a convex function. Consider a random matrix Z taking values in the positive semi-definite cone, with E|Z| and E|Φ(Z)| finite. That is, the random matrices Z and Φ(Z) are Bochner integrable [28,29] (hence, E[Z] and E[Φ(Z)] exist and are well defined). The operator-valued Φ-entropy HΦ is defined as

HΦ(Z) := E[Φ(Z)] − Φ(E[Z]). (4.2)

The corresponding conditional terms can be defined with respect to a sub-σ-algebra.
It is worth mentioning that the matrix Φ-entropy functional [19] in §3 is non-negative for every convex function Φ, owing to the convexity of the trace function A↦Tr[Φ(A)] [32] (see, for example, [33, section 2.2]). However, according to the operator Jensen inequality [34, theorem 3.2], only an operator convex function Φ ensures that the operator-valued Φ-entropy is non-negative.
In the following, we show that the entropy class (C3) is not an empty set.
Proposition 4.3 —
The square function Φ(u)=u2 belongs to (C3).
Proof. —
It suffices to verify the joint convexity of the map

(u,v) ↦ D2Φ[u](v,v) = 2v2,

where we use the identity for the second-order Fréchet derivative of the square function [35, example X.4.6]. Because the square function is operator convex, the map (u,v)↦2v2 is jointly convex, and hence Φ(u)=u2 belongs to the operator-valued Φ-entropy class (C3). ▪
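The identity D2Φ[u](v,v)=2v2 can be checked by a symmetric second difference, which is exact for a quadratic map (a numerical sketch of our own, not from the paper):

```python
import numpy as np

# Numerical check (our sketch) of D^2 Phi[u](v, v) = 2 v^2 for Phi(u) = u^2:
# since Phi is quadratic, the symmetric second difference in the direction v
# recovers the second-order Frechet derivative exactly (up to rounding).
rng = np.random.default_rng(2)
d = 3
u = rng.standard_normal((d, d)); u = (u + u.T) / 2   # self-adjoint point
v = rng.standard_normal((d, d)); v = (v + v.T) / 2   # self-adjoint direction

t = 1e-3
second_diff = ((u + t * v) @ (u + t * v)
               - 2 * (u @ u)
               + (u - t * v) @ (u - t * v)) / t ** 2

assert np.allclose(second_diff, 2 * v @ v, atol=1e-5)
```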
(a). Subadditivity of operator-valued Φ-entropies
Denote by X := (X1,…,Xn) a sequence of independent random variables taking values in a Polish space, and let Z = Z(X1,…,Xn) be a positive semi-definite random matrix that depends on X. Throughout this paper, we assume that the random matrix Z satisfies the integrability conditions: |Z| and |Φ(Z)| are Bochner integrable for Φ∈(C3).
Theorem 4.4 (Subadditivity of the operator-valued Φ-entropy) —
Fix a function Φ∈(C3). Under the prevailing assumptions,

HΦ(Z) ⪯ ∑i=1n E[HΦ(Z|X−i)], (4.3)

where HΦ(Z|X−i) is the conditional operator-valued Φ-entropy and X−i := (X1,…,Xi−1,Xi+1,…,Xn).
The proof is given in §7.
(b). Equivalent characterizations of operator-valued Φ-entropies
Here, we derive alternative characterizations of the class (C3) in theorem 4.5. As an application of the entropy class, we show that if the function Φ belongs to (C3), then the operator-valued Φ-entropy is monotone under any unital completely positive map.
Theorem 4.5 —
The following statements are equivalent:
(a) Φ∈(C3): convexity of (A,B)↦D2Φ[A](B,B);
(b) operator-valued Brégman divergence: (A,B)↦Φ(A+B)−Φ(A)−DΦ[A](B) is convex;
(c) (A,B)↦DΦ[A+B](B)−DΦ[A](B) is convex;
(d) (A,B)↦D2Φ[A](B,B) is convex;
(e) convexity of (A,B)↦ tΦ(A)+(1−t)Φ(B)−Φ(tA+(1−t)B) for any 0≤t≤1;
(f) ;
(g) {HΦ(Z)}Φ∈(C3) forms a convex set of convex functions;
(h) ; and
(i) .
The proof is omitted, because it directly follows from that of theorem 3.3 without taking traces.
Remark 4.6 —
In item (g) of theorem 4.5, we introduce a supremum representation for the operator-valued Φ-entropies. The supremum is defined as the least upper bound (under Löwner partial ordering) among the set of operators. In general, the supremum might not exist owing to matrix partial ordering; however, the supremum in (g) exists and is attained when T≡Z.
In the following, we demonstrate a monotone property of operator-valued Φ-entropies when Φ∈(C3).
Proposition 4.7 (Monotonicity of operator-valued Φ-entropies) —
Fix a convex function Φ∈(C3). Then the operator-valued Φ-entropy HΦ(Z) is monotone under any unital completely positive map N, i.e. HΦ(N(Z)) ⪯ HΦ(Z) for any positive semi-definite random matrix Z.
Proof. —
If Φ∈(C3), then by item (e) in theorem 4.5 we have the joint convexity of the map

(A,B) ↦ tΦ(A)+(1−t)Φ(B)−Φ(tA+(1−t)B)

for any 0≤t≤1. Let X=(A,B) denote the pair of matrices.
For any completely positive unital map N, it can be expressed in the following form [36]:

N(A) = ∑i Ki A Ki†, with ∑i Ki Ki† = I

(the identity matrix), where the dagger denotes the adjoint. Hence, the operator Jensen inequality (proposition A.7) yields HΦ(N(Z)) ⪯ HΦ(Z) for any completely positive unital map N, which implies the monotonicity of HΦ(Z). ▪
Following the same argument, the matrix Φ-entropy functional satisfies the monotone property if Φ∈(C2).
Corollary 4.8 (Monotonicity of matrix Φ-entropy functionals) —
Fix a convex function Φ∈(C2). Then the matrix Φ-entropy functional HΦ(Z) is monotone under any unital completely positive map N, i.e. HΦ(N(Z))≤HΦ(Z) for any positive semi-definite random matrix Z.
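As an illustrative check of corollary 4.8 (our sketch; the random-unitary channel and Φ(u)=u2 are our choices, both satisfying the hypotheses):

```python
import numpy as np

# Illustration of the monotonicity (our sketch): Phi(u) = u^2 is in (C2),
# and the random-unitary channel N(A) = (A + U A U^T)/2 is unital and
# completely positive, so H_Phi(N(Z)) <= H_Phi(Z) should hold.
rng = np.random.default_rng(3)
d = 3
trbar = lambda m: np.trace(m) / d                 # normalized trace

def rand_psd():
    a = rng.standard_normal((d, d))
    return a @ a.T

q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # a random orthogonal U
channel = lambda a: 0.5 * (a + q @ a @ q.T)       # unital CP map

zs = [rand_psd() for _ in range(4)]               # Z uniform on four values

def h_phi(samples):
    # H_Phi(Z) = E[trbar Z^2] - trbar[(E Z)^2] for Phi(u) = u^2.
    e_phi = sum(trbar(z @ z) for z in samples) / len(samples)
    ez = sum(samples) / len(samples)
    return e_phi - trbar(ez @ ez)

assert h_phi([channel(z) for z in zs]) <= h_phi(zs) + 1e-9
```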
We remark that, prior to this work, the monotonicity of a quantum ensemble was only known for Φ(u)=u log u. This is the famous result in quantum information theory, namely the monotone property of the Holevo quantity [37]. Our corollary 4.8 extends the monotonicity of a quantum ensemble to any function Φ∈(C2).
5. Applications: operator Efron–Stein inequality
Here, we employ the operator subadditivity of Hu→u2(Z) to prove the operator Efron–Stein inequality. For 1≤i≤n, let X1′,…,Xn′ be independent copies of X1,…,Xn, and denote X(i) := (X1,…,Xi−1,Xi′,Xi+1,…,Xn), i.e. X with its ith component replaced by the independent copy Xi′.
Define the quantity
and denote the operator-valued variance of a random matrix A by

Var(A) := E[(A−E[A])2] = E[A2] − (E[A])2.
Theorem 5.1 (Operator Efron–Stein inequality) —
With the prevailing assumptions, we have
Proof. —
Theorem 5.1 is a direct consequence of the subadditivity of operator-valued Φ-entropies, namely theorem 4.4 with Φ(u)=u2.
For two independent and identically distributed random matrices A and B, a direct calculation yields

E[(A−B)2] = 2 Var(A).

Observe that Zi′ := Z(X1,…,Xi−1,Xi′,Xi+1,…,Xn) is an independent copy of Z conditioned on X−i, for each i=1,…,n. Then
Finally, theorem 4.4 and proposition 4.3 lead to
▪
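The identity used in the proof can be verified numerically; the following sketch (our illustration, with an arbitrary two-point distribution) checks E[(A−B)2] = 2 Var(A) exactly over the distribution:

```python
import numpy as np

# A check of the identity E[(A - B)^2] = 2 Var(A) for i.i.d. random matrices
# (our sketch; the two-point distribution is an arbitrary choice), computed
# exactly over the distribution rather than by sampling.
rng = np.random.default_rng(4)
d = 3
m0 = rng.standard_normal((d, d)); m0 = m0 @ m0.T
m1 = rng.standard_normal((d, d)); m1 = m1 @ m1.T
vals, probs = [m0, m1], [0.3, 0.7]

ea = sum(p * m for p, m in zip(probs, vals))                     # E[A]
var = sum(p * (m - ea) @ (m - ea) for p, m in zip(probs, vals))  # Var(A)

# E[(A - B)^2] over independent copies A and B.
lhs = sum(pa * pb * (a - b) @ (a - b)
          for pa, a in zip(probs, vals)
          for pb, b in zip(probs, vals))

assert np.allclose(lhs, 2 * var)
```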
Note that the established operator Efron–Stein inequality leads directly to a matrix polynomial Efron–Stein inequality.
Corollary 5.2 (Matrix polynomial Efron–Stein) —
With the prevailing assumptions, for each natural number p≥1, we have
Corollary 5.2 is a variant of the matrix polynomial Efron–Stein inequality derived in [25, theorem 4.2].
6. Proof of theorem 3.3
Proof. —
(a)⇒(i)⇒(g)⇒(j) This chain of implications is proved by Chen & Tropp in [19].
(a)⇔(b) This equivalent statement is proved in [21, theorem 2].
(a)⇔(d) Theorem 2.1 in [20] proved the equivalence of (a) and the following convexity lemma.
Lemma 6.1 (Convexity lemma [19, lemma 4.2]) —
Fix a function Φ∈(C2), and let Ψ=Φ′. Suppose that A is a random matrix taking values in and let X be a random matrix taking values in . Assume that ∥A∥, ∥X∥ are integrable. Then
What remains is to establish equivalence between the convexity lemma and condition (d). This follows easily from lemma A.5,
Remark 6.2 —
In [19, lemma 4.2], it is shown that the concavity of the map,
implies the joint convexity of the map (i.e. lemma 6.1),
6.1 (b)⇔(c)⇔(d) Define the maps

AΦ(u,v) := Tr[Φ(u+v)−Φ(u)−DΦ[u](v)],
BΦ(u,v) := Tr[DΦ[u+v](v)−DΦ[u](v)]
and CΦ(u,v) := Tr[D2Φ[u](v,v)].
Following [18], we can establish the following relations: for any pair (u,v),
6.2 and
6.3 and, for small enough ϵ>0,
6.4 and
6.5 Equation (6.2) is exactly the integral representation for the matrix Brégman divergence proved in [21]. Similarly, equation (6.3) follows from
Equations (6.4) and (6.5) can be obtained by Taylor expansion at (u,0). That is,
Following the same argument,
We can observe from equations (6.2) and (6.3) that the joint convexity of (u,v)↦AΦ(u,v) and (u,v)↦BΦ(u,v) follows from that of (u,v)↦CΦ(u,v). In other words, we proved that conditions (d)⇒(b) and (d)⇒(c).
Conversely, equations (6.4) and (6.5) show that (b)⇒(d) and condition (c)⇒(d). To be more specific, the joint convexity of (u,v)↦AΦ(u,ϵv) implies
6.6 for each , t∈[0,1], ϵ>0, and u≡tu1+(1−t)u2, v≡tv1+(1−t)v2. Invoking equation (6.4) gives
and
Hence, equation (6.6) is equivalent to
The joint convexity of (u,v)↦CΦ(u,v) follows by dividing both sides by ϵ2 and letting ϵ↓0. The joint convexity of (u,v)↦BΦ(u,ϵv) can be obtained in a similar way using equation (6.5).
(a)⇔(e) It is trivial if Φ is affine; hence, we assume Φ′′>0. We start from the convexity of the map,
6.7 To ease the burden of notation, we denote and by the isometric isomorphism between super-operators and matrices. Then, equation (6.7) can be rewritten as
which is equivalent to the non-negativity of the second derivative (see proposition A.2),
Now, recall the chain rule of the Fréchet derivative in proposition A.1,
and the formula of the differentiation of the inverse function (see lemma A.6),
we can compute the following identities by taking and u≡k:
and
Therefore, we reach the expression (3.2), and statement (a) is true if and only if (3.2) holds. Recall that, in the scalar case (i.e. d=1), the Fréchet derivative can be expressed as the product of the differential and the direction [38, theorem 3.11]
Hence, equation (3.2) reduces to
for all a>0 and h∈ℝ. In other words, equation (3.2) can be viewed as a non-commutative generalization of the classical statement Φ′′′′Φ′′≥2(Φ′′′)2.
(d)⇔(f) For any t∈[0,1], define as
By taking x≡(X,Y) and h≡(h,k) in proposition A.2, the convexity of the twice Fréchet differentiable function Ft is equivalent to
Then, with the help of the partial Fréchet derivative defined in proposition A.3, the second-order Fréchet derivative of Ft(X,Y) can be evaluated as
6.8 Taking trace on both sides of (6.8) and invoking lemma A.5, we have
6.9 Because both the trace and the second-order Fréchet derivative are bilinear, we have the following result:
6.10 Similarly,
6.11 Combining equations (6.10) and (6.11), equation (6.9) can be expressed as
Then, it is not hard to observe that the non-negativity of Tr[D2Ft[X,Y](h,k)] for every , and t∈[0,1] is equivalent to the joint convexity of the map
(j)⇒(g) Considering n=2, the subadditivity means that
Then, we have
(f)⇔(h) Let s∈[0,1]. Define a pair of positive semi-definite random matrices (X,Y) taking values (x,y) with probability s and (x′,y′) with probability (1−s). Then the convexity of HΦ implies that
6.12 for every t∈[0,1]. Now, define as
Then, it follows that
which means that the convexity of the pair (u,v)↦Ft(u,v) is equivalent to the convexity of HΦ, i.e. equation (6.12).
(g)⇔(h) Define a positive semi-definite random matrix , which depends on two random variables X1,X2 on a Polish space. Denote by ZX1 the random matrix Z conditioned on X1. According to the convexity of HΦ, it follows that
Conversely, define a positive semi-definite random matrix where s is a random variable. Now, let s be Bernoulli distributed with parameter t∈[0,1]. Then, for all t∈[0,1], the inequality coincides,
▪
7. Proof of theorem 4.4
Our approach to proving operator subadditivity (theorem 4.4) parallels [19, theorem 2.5] and [13, section 3.1]. The strategy is as follows. First, we prove the supremum representation for the operator-valued Φ-entropies in §7a. Second, we establish a conditional operator Jensen inequality in §7b. Finally, we arrive at the proof of theorem 4.4 in §7c.
(a). Representation of operator-valued Φ-entropy
Theorem 7.1 (Supremum representation for operator-valued Φ-entropies) —
Fix a function Φ∈(C3). Assume Z is a random positive definite matrix for which |Z| and |Φ(Z)| are Bochner integrable. Then the operator-valued Φ-entropy can be represented as
7.1 The range of the supremum contains each random positive definite matrix T for which |T| and |Φ(T)| are Bochner integrable. In particular, the normalized matrix Φ-entropy can be written in the dual form
7.2 where is a linear map of Z and .
Proof. —
Observe that, when T=Z, the right-hand side of equation (7.1) equals HΦ(Z). Then, it remains to confirm the inequality
7.3 for each random positive definite matrix T that satisfies the integrability conditions. We follow the interpolation argument as in [19, lemma 4.1]. For s∈[0,1], define the matrix-valued function
where
Note that F(0)=HΦ(Z), and F(1) matches the right-hand side of equation (7.3). As a result, it suffices to show that F′(s)≤0 for s∈[0,1] in order to verify equation (7.3). By the replacement Z−Ts=−s⋅(T−Z), the function F(s) can be rephrased as
Differentiate the above function to arrive at
7.4a
7.4b where we cancel the last two terms in equation (7.4a) and the second equation (7.4b) follows from the bilinearity of the second-order Fréchet differentiation.
Invoking the joint convexity of the function D2Φ[Ts](T−Z,T−Z) (see equation (4.1)), we establish that the above derivative is negative semi-definite, i.e. F′(s)⪯0 for s∈[0,1], and thus complete the proof. ▪
(b). A conditional operator Jensen inequality
Lemma 7.2 (Conditional operator Jensen inequality for operator-valued Φ-entropy) —
Suppose that (X1,X2) is a pair of independent random matrices taking values in a Polish space, and let Z=Z(X1,X2) be a positive definite random matrix for which |Z| and |Φ(Z)| are Bochner integrable. Then
where E1 denotes the expectation with respect to the first matrix X1.
Proof. —
Let E2 refer to the expectation with respect to the second matrix X2. In the following, we use T(X2) to emphasize that the matrix T depends only on the randomness in X2. Recall the supremum representation, equation (7.2); we have
The second relation follows from the Fubini theorem, which allows us to interchange the order of E1 and E2. In the third line, we use the convexity of the supremum. (Note that this is not always true under the partial ordering; however, it holds in our case, because the supremum in the second line is attained when T(X2) ≡ E1[Z].) The last identity is exactly the supremum representation equation (7.2) in the conditional form. ▪
It is worth emphasizing that the conditional Jensen inequality can also be achieved by item (d) in theorem 4.5 (cf. (f)⇔(g)⇔(h) in theorem 3.3).
(c). Subadditivity of operator-valued Φ-entropies
Now we are in a position to prove the subadditivity of the operator-valued Φ-entropies.
Proof. —
By adding and subtracting the term , the operator-valued Φ-entropy can be expressed as
7.5 where the last inequality results from lemma 7.2, because X1 is independent of X−1.
Following the same reasoning, we obtain the operator-valued Φ-entropy conditioned on X1,
By plugging the expression into equation (7.5), we get
Finally, by repeating this procedure, we achieve the subadditivity of the operator-valued Φ-entropy
which completes our claim. ▪
8. Conclusion
In this paper, we extend the results of Chen & Tropp [19], Pitrik & Virosztek [21] and Hansen & Zhang [20] to complete the characterizations of the matrix Φ-entropy functionals. Moreover, we generalize the matrix Φ-entropy functionals to operator-valued Φ-entropies, and show that this generalization preserves the subadditivity property. Additionally, we prove that the set of operator-valued Φ-entropies is non-empty and contains at least the square function. Equivalent characterizations of the operator-valued Φ-entropies are also derived. This result demonstrates that the subadditivity of HΦ(Z) is equivalent to the operator convexity of HΦ(Z) on the convex cone of positive semi-definite operators. Finally, we exploit the subadditivity to prove the operator Efron–Stein inequality. We expect that the proposed results can also be used to derive the matrix exponential Efron–Stein inequality (cf. [25, theorem 4.3]) and moment inequalities for random matrices; see [13] and [15, ch. 15].
The subadditivity of matrix Φ-entropies leads to a series of important inequalities: matrix Poincaré inequalities with respect to binomial and Gaussian distributions, and the related matrix logarithmic Sobolev inequalities [22]. In [23], the subadditivity and the operator Efron–Stein inequality can be exploited to estimate the mixing time of a quantum random graph. It enables us to better understand the dynamics and long-term behaviours of a quantum system undergoing Markovian processes. We believe the proposed results will lead to more matrix functional inequalities, and have a substantial impact on operator algebra and quantum information science.
Finally, we remark that the results on operator-valued Φ-entropies and the operator Efron–Stein inequality hold in the infinite-dimensional setting. This is not hard to verify, because the tools (such as Fréchet derivatives) employed in the proofs remain valid in infinite dimensions.
Acknowledgements
H.-C.C. sincerely thanks Marco Tomamichel for the helpful discussion about the operator-valued Φ-entropies.
Appendix A. Miscellaneous lemmas
Proposition A.1 (Properties of Fréchet derivatives [38, theorem 3.4]) —
Let U, V and W be real Banach spaces, let f: U→V be Fréchet differentiable at A∈U, and let g: V→W be Fréchet differentiable at f(A). Define h := g∘f (i.e. h(X) = g(f(X))). Then h is Fréchet differentiable at A and Dh[A] = Dg[f(A)] ∘ Df[A].
Proposition A.2 (Convexity of twice Fréchet differentiable matrix functions [39, proposition 2.2]) —
Let U be an open convex subset of a real Banach space, and let W also be a real Banach space. Then a twice Fréchet differentiable function f: U→W is convex if and only if D2f[X](h,h) ≥ 0 for each X∈U and every direction h.
Proposition A.3 (Partial Fréchet derivative [40, proposition 5.3.15]) —
If is Fréchet differentiable at then the partial Fréchet derivatives and exist, and
Proposition A.4 [41, theorem 2.2] —
Let A be a self-adjoint matrix and X a self-adjoint direction. Assume f is a continuously differentiable function defined on an interval I, and assume that the eigenvalues of A+tX lie in I. Then

(d/dt) Tr[f(A+tX)] = Tr[X f′(A+tX)].
Proposition A.4 directly leads to the following lemma.
Lemma A.5 —
Let A be a self-adjoint matrix and X a self-adjoint direction. Assume f is a continuously differentiable function defined on an interval I, and assume that the eigenvalues of A+tX lie in I. Then

Tr[Df[A](X)] = Tr[X f′(A)].
Lemma A.6 (Second-order Fréchet derivative of the inversion function) —
Let $f$ be second-order Fréchet differentiable at $A$, and let $f(A)$ be invertible. Then, for each self-adjoint $h$, we have
$$ \mathsf{D}^2\big(f(A)^{-1}\big)[h,h] \;=\; 2\, f(A)^{-1}\, \mathsf{D}f(A)[h]\, f(A)^{-1}\, \mathsf{D}f(A)[h]\, f(A)^{-1} \;-\; f(A)^{-1}\, \mathsf{D}^2 f(A)[h,h]\, f(A)^{-1}. $$
Proof. —
Denote by $\iota(B) := B^{-1}$ the inversion function. Recall the chain rule for second-order Fréchet derivatives:
$$ \mathsf{D}^2(\iota \circ f)(A)[h,h] \;=\; \mathsf{D}^2\iota\big(f(A)\big)\big[\mathsf{D}f(A)[h],\, \mathsf{D}f(A)[h]\big] + \mathsf{D}\iota\big(f(A)\big)\big[\mathsf{D}^2 f(A)[h,h]\big]. $$
Applying the formulae for the Fréchet derivatives of the inversion function (see, for example, [35, example X.4.2; 42, exercise 3.27]),
$$ \mathsf{D}\iota(B)[h] = -B^{-1} h B^{-1}, \qquad \mathsf{D}^2\iota(B)[h_1, h_2] = B^{-1} h_1 B^{-1} h_2 B^{-1} + B^{-1} h_2 B^{-1} h_1 B^{-1}, $$
concludes the desired result. ▪
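The second-derivative formula of lemma A.6 can likewise be checked against a second-order central difference; in this sketch (our illustration) we take $f(A) = A^2$, so $\mathsf{D}f(A)[h] = Ah + hA$ and $\mathsf{D}^2 f(A)[h,h] = 2h^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2 + n * np.eye(n)  # positive definite
h = rng.standard_normal((n, n)); h = (h + h.T) / 2
h /= np.linalg.norm(h)                 # small direction for a stable difference

inv = np.linalg.inv
f = lambda X: X @ X                    # f(A) = A^2
Df = lambda X, H: X @ H + H @ X        # first Fréchet derivative
D2f = lambda X, H: 2 * H @ H           # second Fréchet derivative of A -> A^2

B = inv(f(A))
# right-hand side of lemma A.6
second = 2 * B @ Df(A, h) @ B @ Df(A, h) @ B - B @ D2f(A, h) @ B

# second-order central difference of t -> (f(A + t h))^{-1} at t = 0
t = 1e-4
fd = (inv(f(A + t * h)) - 2 * B + inv(f(A - t * h))) / t**2

rel_err = np.linalg.norm(second - fd) / np.linalg.norm(second)
print(rel_err < 1e-3)
```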
Proposition A.7 (Operator Jensen inequality [34,43–45]) —
Let $(\Omega, \Sigma)$ be a measurable space carrying a measure $\mu$, and suppose that $I \subseteq \mathbb{R}$ is an open interval. Assume that, for every $x \in \Omega$, $K(x)$ is a (finite- or infinite-dimensional) square matrix and satisfies
$$ \int_\Omega K(x)^* K(x)\, \mathrm{d}\mu(x) \;=\; \mathbb{1} $$
(the identity matrix). If $f$ is a measurable matrix-valued function for which $\sigma(f(x)) \subset I$, for every $x \in \Omega$, then
$$ g\left( \int_\Omega K(x)^* f(x) K(x)\, \mathrm{d}\mu(x) \right) \;\preceq\; \int_\Omega K(x)^* g\big(f(x)\big) K(x)\, \mathrm{d}\mu(x) $$
for every operator convex function $g: I \to \mathbb{R}$. Moreover,
$$ \mathrm{Tr}\, g\left( \int_\Omega K(x)^* f(x) K(x)\, \mathrm{d}\mu(x) \right) \;\le\; \mathrm{Tr} \int_\Omega K(x)^* g\big(f(x)\big) K(x)\, \mathrm{d}\mu(x) $$
for every convex function $g: I \to \mathbb{R}$.
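The trace form of the operator Jensen inequality can be illustrated with a discrete measure: matrices $K_1, \dots, K_m$ with $\sum_i K_i^* K_i = \mathbb{1}$ and the convex function $x \mapsto x^2$. This is a sketch of ours; building the $K_i$ from a QR factorization is only one convenient way to satisfy the normalization:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 3

# Build K_1, ..., K_m with sum_i K_i^T K_i = I: stack them as the
# orthonormal columns of a tall QR factor (Q^T Q = I_n).
V = rng.standard_normal((n * m, n))
Q, _ = np.linalg.qr(V)
Ks = [Q[i * n:(i + 1) * n, :] for i in range(m)]

# Self-adjoint matrices A_1, ..., A_m playing the role of f(x)
As = []
for _ in range(m):
    S = rng.standard_normal((n, n))
    As.append((S + S.T) / 2)

sq = lambda X: X @ X   # x -> x^2 is convex (indeed operator convex)

lhs = np.trace(sq(sum(K.T @ A @ K for K, A in zip(Ks, As))))
rhs = np.trace(sum(K.T @ sq(A) @ K for K, A in zip(Ks, As)))

# trace Jensen: Tr f(Σ K* A K) ≤ Tr Σ K* f(A) K
print(lhs <= rhs + 1e-10)
```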
Data accessibility
This work does not have any experimental data.
Authors' contributions
Both authors contributed equally to this paper.
Competing interests
We have no competing interests.
Funding
M.-H.H. is supported by an ARC Future Fellowship under grant no. FT140100574.
References
- 1. Csiszár I. 1963. Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Magyar Tud. Akad. Mat. Kutató Int. Közl. 8, 85–108.
- 2. Csiszár I. 1967. Information-type measures of difference of probability distributions and indirect observations. Stud. Sci. Math. Hung. 2, 299–318.
- 3. Ali SM, Silvey SD. 1966. A general class of coefficients of divergence of one distribution from another. J. Roy. Stat. Soc. B 28, 131–142.
- 4. Burbea J, Rao CR. 1982. Entropy differential metric, distance and divergence measures in probability spaces: a unified approach. J. Multivar. Anal. 12, 575–596. (doi:10.1016/0047-259X(82)90065-3)
- 5. Burbea J, Rao CR. 1982. On the convexity of higher order Jensen differences based on entropy functions. IEEE Trans. Inf. Theory 28, 961–963. (doi:10.1109/TIT.1982.1056573)
- 6. Burbea J, Rao CR. 1982. On the convexity of some divergence measures based on entropy functions. IEEE Trans. Inf. Theory 28, 489–495. (doi:10.1109/TIT.1982.1056497)
- 7. Han TS. 1978. Nonnegative entropy measures of multivariate symmetric correlations. Inf. Control 36, 133–156. (doi:10.1016/S0019-9958(78)90275-9)
- 8. Bobkov S, Ledoux M. 1997. Poincaré's inequalities and Talagrand's concentration phenomenon for the exponential distribution. Probab. Theory Relat. Fields 107, 383–400. (doi:10.1007/s004400050090)
- 9. Ledoux M. 1997. On Talagrand's deviation inequalities for product measures. ESAIM Probab. Stat. 1, 63–87. (doi:10.1051/ps:1997103)
- 10. Gross L. 1975. Logarithmic Sobolev inequalities. Am. J. Math. 97, 1061–1083. (doi:10.2307/2373688)
- 11. Latała R, Oleszkiewicz K. 2000. Between Sobolev and Poincaré. In Geometric aspects of functional analysis. Lecture Notes in Mathematics, vol. 1745, pp. 147–168. Berlin, Germany: Springer. (doi:10.1007/BFb0107213)
- 12. Ané C, Blachère S, Fougères P, Gentil I, Malrieu F, Roberto C, Scheffer G. 2000. Sur les inégalités de Sobolev logarithmiques. Panoramas et Synthèses, vol. 10. Paris, France: Société Mathématique de France. [In French.]
- 13. Boucheron S, Bousquet O, Lugosi G, Massart P. 2005. Moment inequalities for functions of independent random variables. Ann. Prob. 33, 514–560. (doi:10.1214/009117904000000856)
- 14. Massart P (ed.). 2007. Concentration inequalities and model selection. Berlin, Germany: Springer. (doi:10.1007/978-3-540-48503-2)
- 15. Boucheron S, Lugosi G, Massart P. 2013. Concentration inequalities: a nonasymptotic theory of independence. Oxford, UK: Oxford University Press. (doi:10.1093/acprof:oso/9780199535255.001.0001)
- 16. Bakry D, Gentil I, Ledoux M. 2013. Analysis and geometry of Markov diffusion operators. Berlin, Germany: Springer. (doi:10.1007/978-3-319-00227-9)
- 17. Chafaï D. 2004. Entropies, convexity, and functional inequalities: on Φ-entropies and Φ-Sobolev inequalities. J. Math. Kyoto Univ. 44, 325–363. (http://arxiv.org/abs/math/0211103v2)
- 18. Chafaï D. 2006. Binomial-Poisson entropic inequalities and the M/M/∞ queue. ESAIM Probab. Stat. 10, 317–339. (doi:10.1051/ps:2006013)
- 19. Chen RY, Tropp JA. 2014. Subadditivity of matrix φ-entropy and concentration of random matrices. Electron. J. Probab. 19, 1–30. (doi:10.1214/ejp.v19-2964)
- 20. Hansen F, Zhang Z. 2015. Characterisation of matrix entropies. Lett. Math. Phys. 105, 1399–1411. (doi:10.1007/s11005-015-0784-8)
- 21. Pitrik J, Virosztek D. 2015. On the joint convexity of the Bregman divergence of matrices. Lett. Math. Phys. 105, 675–692. (doi:10.1007/s11005-015-0757-y)
- 22. Cheng H-C, Hsieh M-H. 2015. New characterizations of matrix Φ-entropies, Poincaré and Sobolev inequalities and an upper bound to Holevo quantity. (http://arxiv.org/abs/1506.06801)
- 23. Cheng H-C, Hsieh M-H, Tomamichel M. 2015. Exponential decay of matrix Φ-entropies on Markov semigroups with applications to dynamical evolutions of quantum ensembles. (http://arxiv.org/abs/1511.02627)
- 24. Tropp JA. 2015. An introduction to matrix concentration inequalities. Found. Trends Mach. Learn. 8, 1–230. (doi:10.1561/2200000048)
- 25. Paulin D, Mackey L, Tropp JA. 2014. Efron–Stein inequalities for random matrices. (http://arxiv.org/abs/1408.3470)
- 26. Efron B, Stein C. 1981. The jackknife estimate of variance. Ann. Stat. 9, 586–596. (doi:10.1214/aos/1176345462)
- 27. Steele JM. 1986. An Efron–Stein inequality for nonsymmetric statistics. Ann. Stat. 14, 753–758. (doi:10.1214/aos/1176349952)
- 28. Diestel J, Uhl J. 1977. Vector measures. Providence, RI: American Mathematical Society. (doi:10.1090/surv/015)
- 29. Mikusiński J. 1978. The Bochner integral. Berlin, Germany: Springer. (doi:10.1007/978-3-0348-5567-9)
- 30. Peller VV. 1985. Hankel operators in the perturbation theory of unitary and self-adjoint operators. Funct. Anal. Appl. 19, 111–123. (doi:10.1007/BF01078390)
- 31. Bickel K. 2007. Differentiating matrix functions. Oper. Matrices 7, 71–90. (doi:10.7153/oam-07-03)
- 32. von Neumann J. 1955. Mathematical foundations of quantum mechanics. Princeton, NJ: Princeton University Press.
- 33. Carlen E. 2010. Trace inequalities and quantum entropy: an introductory course. In Entropy and the quantum (eds R Sims, D Ueltschi), pp. 73–140. Contemporary Mathematics, vol. 529. Providence, RI: American Mathematical Society. (doi:10.1090/conm/529/10428)
- 34. Farenick DR, Zhou F. 2007. Jensen's inequality relative to matrix-valued measures. J. Math. Anal. Appl. 327, 919–929. (doi:10.1016/j.jmaa.2006.05.008)
- 35. Bhatia R. 1997. Matrix analysis. Berlin, Germany: Springer. (doi:10.1007/978-1-4612-0653-8)
- 36. Mendl CB, Wolf MM. 2009. Unital quantum channels—convex structure and revivals of Birkhoff's theorem. Commun. Math. Phys. 289, 1057–1086. (doi:10.1007/s00220-009-0824-2)
- 37. Petz D. 2003. Monotonicity of quantum relative entropy revisited. Rev. Math. Phys. 15, 79–91. (doi:10.1142/S0129055X03001576)
- 38. Higham NJ. 2008. Functions of matrices: theory and computation. Philadelphia, PA: Society for Industrial & Applied Mathematics. (doi:10.1137/1.9780898717778)
- 39. Hansen F. 1997. Operator convex functions of several variables. Publ. Res. Inst. Math. Sci. 33, 443–463. (doi:10.2977/prims/1195145324)
- 40. Atkinson K, Han W. 2009. Theoretical numerical analysis: a functional analysis framework. Berlin, Germany: Springer. (doi:10.1007/978-1-4419-0458-4)
- 41. Hansen F, Pedersen GK. 1995. Perturbation formulas for traces on C∗-algebras. Publ. Res. Inst. Math. Sci. 31, 169–178. (doi:10.2977/prims/1195164797)
- 42. Hiai F, Petz D. 2014. Introduction to matrix analysis and applications. Berlin, Germany: Springer. (doi:10.1007/978-3-319-04150-6)
- 43. Davis C. 1957. A Schwarz inequality for convex operator functions. Proc. Am. Math. Soc. 8, 42–44. (doi:10.1090/S0002-9939-1957-0084120-4)
- 44. Choi MD. 1974. A Schwarz inequality for positive linear maps on C∗-algebras. Illinois J. Math. 18, 565–574.
- 45. Hansen F, Pedersen GK. 2003. Jensen's operator inequality. Bull. Lond. Math. Soc. 35, 553–564. (doi:10.1112/S0024609303002200)
