Abstract
We derive new characterizations of the matrix Φ-entropy functionals introduced by Chen & Tropp (Chen, Tropp 2014 Electron. J. Prob. 19, 1–30. (doi:10.1214/ejp.v19-2964)). These characterizations help us to better understand the properties of matrix Φ-entropies, and they are a powerful tool for establishing matrix concentration inequalities for random matrices. We then propose an operator-valued generalization of matrix Φ-entropy functionals, and prove their subadditivity under the Löwner partial ordering. Our results demonstrate that the subadditivity of operator-valued Φ-entropies is equivalent to their convexity. As an application, we derive the operator Efron–Stein inequality.
Keywords: Φ-entropy, matrix concentration inequalities, Efron–Stein inequality
1. Introduction
The introduction of Φ-entropy functionals can be traced back to the early days of information theory [1,2] and convex analysis [3–6], where the notion of ϕ-divergence was defined. Formally, given a non-negative real random variable Z and a smooth convex function Φ, the Φ-entropy functional refers to

HΦ(Z) := E[Φ(Z)] − Φ(E[Z]).
By Jensen's inequality, it is not hard to see that the quantity HΦ(Z) is non-negative. Hence, the Φ-entropy functional can be used as an entropic measure to characterize the uncertainty of the random variable Z.
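To make the non-negativity concrete, here is a minimal numerical sketch (our illustration, not part of the original text), using Φ(u)=u log u and a three-point distribution:

```python
import math

# A minimal sketch (our illustration, not from the paper): H_Phi(Z) >= 0
# for the convex function Phi(u) = u*log(u) and a three-point discrete Z.
def phi(u):
    return u * math.log(u) if u > 0 else 0.0

dist = [(0.5, 1.0), (0.3, 2.0), (0.2, 5.0)]   # (probability, value) pairs

mean_phi = sum(p * phi(z) for p, z in dist)   # E[Phi(Z)]
phi_mean = phi(sum(p * z for p, z in dist))   # Phi(E[Z])
h_phi = mean_phi - phi_mean                   # H_Phi(Z)

assert h_phi >= 0.0  # non-negativity, by Jensen's inequality
```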
The investigation of general properties of classical Φ-entropies has enjoyed great success in physics, probability theory, information theory and computer science. Of these, the subadditivity (or tensorization) property [7–9] has led to derivations of the logarithmic Sobolev [10], Φ-Sobolev [11] and Poincaré inequalities [12], which, in turn, are a crucial step towards the powerful entropy method in concentration inequalities [13–15] and the analysis of Markov semigroups [16].
Let Z=f(X1,…,Xn) be a random variable defined on n independent random variables (X1,…,Xn). We say HΦ(Z) is subadditive if

HΦ(Z) ≤ ∑i=1n E[Ei[Φ(Z)] − Φ(Ei[Z])],

where Ei denotes the conditional expectation with respect to Xi (integrating over Xi while fixing the remaining variables). Gross [10] first observed that the ordinary entropy functional is subadditive in his seminal paper. Later on, equivalent characterizations of the subadditive entropy class (see theorem 2.2) were established [11,17,18], which prove to be useful in other contexts such as stochastic processes [17,18].
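The tensorization property can be illustrated numerically; the following sketch is our own addition (Φ(u)=u log u, two independent Bernoulli inputs and an arbitrary non-negative f are our choices):

```python
import math
from itertools import product

# A numerical sketch of tensorization (our illustration, not from the
# paper): Phi(u) = u*log(u), Z = f(X1, X2) with independent Bernoulli
# inputs, and an arbitrary non-negative function f.
def phi(u):
    return u * math.log(u) if u > 0 else 0.0

p1, p2 = 0.4, 0.7                                     # P(X1=1), P(X2=1)
f = lambda x1, x2: 1.0 + 2.0 * x1 + 3.0 * x2 + x1 * x2

def prob(x1, x2):
    return (p1 if x1 else 1 - p1) * (p2 if x2 else 1 - p2)

# Unconditional entropy H_Phi(Z) = E[Phi(Z)] - Phi(E[Z]).
ez = sum(prob(a, b) * f(a, b) for a, b in product((0, 1), repeat=2))
ephi = sum(prob(a, b) * phi(f(a, b)) for a, b in product((0, 1), repeat=2))
h_total = ephi - phi(ez)

def h_cond_1():
    # E[H_Phi^(1)(Z)]: entropy in X1 alone, averaged over X2.
    total = 0.0
    for b in (0, 1):
        e1 = (1 - p1) * f(0, b) + p1 * f(1, b)
        e1phi = (1 - p1) * phi(f(0, b)) + p1 * phi(f(1, b))
        total += (p2 if b else 1 - p2) * (e1phi - phi(e1))
    return total

def h_cond_2():
    # E[H_Phi^(2)(Z)]: entropy in X2 alone, averaged over X1.
    total = 0.0
    for a in (0, 1):
        e2 = (1 - p2) * f(a, 0) + p2 * f(a, 1)
        e2phi = (1 - p2) * phi(f(a, 0)) + p2 * phi(f(a, 1))
        total += (p1 if a else 1 - p1) * (e2phi - phi(e2))
    return total

assert h_total <= h_cond_1() + h_cond_2() + 1e-12     # subadditivity
```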
Parallel to the classical Φ-entropies, Chen & Tropp [19] introduced the notion of matrix Φ-entropy functionals. Namely, for a positive semi-definite random matrix Z, the matrix Φ-entropy functional is defined as

HΦ(Z) := E[tr Φ(Z)] − tr Φ(E[Z]),

where tr[·] := (1/d)Tr[·] is the normalized trace. The class of subadditive matrix Φ-entropy functionals is characterized in terms of the second derivative of their representing functions. Unlike its classical counterpart, only a few connections between the matrix Φ-entropy functionals and other convex forms of the same functions had been established [20,21] prior to this work.
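A small numerical sketch of this definition (our illustration; Φ(u)=u² and the two-point distribution are our choices):

```python
import numpy as np

# Sketch (our illustration): for Phi(u) = u^2 and a two-point random PSD
# matrix Z, the matrix Phi-entropy
# H_Phi(Z) = E[trbar Phi(Z)] - trbar[Phi(E Z)] is non-negative.
rng = np.random.default_rng(0)
d = 3
trbar = lambda m: np.trace(m) / d            # normalized trace Tr/d

def rand_psd():
    a = rng.standard_normal((d, d))
    return a @ a.T

z0, z1 = rand_psd(), rand_psd()              # Z = z0 or z1, each w.p. 1/2

e_phi = 0.5 * trbar(z0 @ z0) + 0.5 * trbar(z1 @ z1)  # E[trbar Z^2]
ezm = (z0 + z1) / 2                                   # E[Z]
h_phi = e_phi - trbar(ezm @ ezm)                      # H_Phi(Z)

assert h_phi >= 0.0
```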
In this paper, we establish equivalent characterizations of the matrix Φ-entropy functionals defined in [19]. Our results show that matrix Φ-entropy functionals satisfy all known equivalent statements that classical Φ-entropy functionals satisfy [15,17,18], providing additional justification for the original definition of the matrix Φ-entropy functionals (table 1). The equivalences between matrix Φ-entropy functionals and other convex forms of the function Φ advance our understanding of this class of entropy functions. Moreover, they allow us to unify the study of matrix concentration inequalities and matrix Φ-Sobolev inequalities [22,23].
Table 1.
Comparison between the equivalent characterizations of the Φ-entropy functional class (C1) (definition 2.1) and the matrix Φ-entropy functional class (C2) (definition 3.1).
| | classical Φ-entropy functional class (C1) | matrix Φ-entropy functional class (C2) |
|---|---|---|
| (a) | Φ is affine or Φ′′>0 and 1/Φ′′ is concave | Φ is affine or DΦ′ is invertible and (DΦ′)−1 is concave |
| (b) | convexity of (u,v)↦Φ(u+v)−Φ(u)−Φ′(u)v | convexity of (u,v)↦Tr[Φ(u+v)−Φ(u)−DΦ[u](v)] |
| (c) | convexity of (u,v)↦(Φ′(u+v)−Φ′(u))v | convexity of (u,v)↦Tr[DΦ[u+v](v)−DΦ[u](v)] |
| (d) | convexity of (u,v)↦Φ′′(u)v2 | convexity of (u,v)↦Tr[D2Φ[u](v,v)] |
| (e) | Φ is affine or Φ′′>0 and Φ′′′′Φ′′≥2(Φ′′′)2 | equation (3.2) |
| (f) | convexity of (u,v)↦tΦ(u)+(1−t)Φ(v)−Φ(tu+(1−t)v) for any 0≤t≤1 | convexity of (u,v)↦ Tr[tΦ(u)+(1−t)Φ(v)−Φ(tu+(1−t)v)] for any 0≤t≤1 |
| (g) | ||
| (h) | HΦ(Z) is a convex function of Z | HΦ(Z) is a convex function of Z |
| (i) | ||
| (j) |
Furthermore, we consider the following operator-valued generalization of matrix Φ-entropy functionals:
A special case of this operator-valued Φ-entropy functional is the operator-valued variance Var(Z) defined in [24,25], where Φ is the square function. The equivalent conditions for the subadditivity under the Löwner partial ordering are derived (theorem 4.5). In particular, we show that the subadditivity of the operator-valued Φ-entropies is equivalent to the convexity of HΦ(Z) as a function of Z.
Our result directly yields the operator Efron–Stein inequality, which recovers the well-known Efron–Stein inequality [26,27] when random matrices reduce to real random variables.
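For Φ(u)=u², the operator-valued Φ-entropy reduces to the operator-valued variance, which is positive semi-definite; a minimal numerical sketch (our illustration; the uniform five-point distribution is an arbitrary choice):

```python
import numpy as np

# Operator-valued variance sketch (our illustration): for Phi(u) = u^2 the
# operator-valued Phi-entropy is Var(Z) = E[Z^2] - (E Z)^2, a PSD matrix.
rng = np.random.default_rng(1)
d = 4
zs = [rng.standard_normal((d, d)) for _ in range(5)]
zs = [a @ a.T for a in zs]                   # Z uniform on five PSD values

ez = sum(zs) / len(zs)                       # E[Z]
ez2 = sum(z @ z for z in zs) / len(zs)       # E[Z^2]
var = ez2 - ez @ ez                          # H_{u^2}(Z) = Var(Z)

# Var(Z) = E[(Z - E Z)^2] is an average of squares of symmetric matrices,
# hence positive semi-definite.
assert np.min(np.linalg.eigvalsh(var)) >= -1e-9
```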
(a). Our results
We summarize our results here. First, we derive equivalent characterizations for the matrix Φ-entropy functionals in table 1 (see theorem 3.3). Notably, all known equivalent characterizations for the classical Φ-entropies can be generalized to their matrix correspondences. We emphasize that additional characterizations of the Φ-entropies prove to be useful in many instances. The characterizations (b)–(d) in (C1) were exploited by Chafaï [18] to derive several entropic inequalities for M/M/∞ queueing processes, which are not diffusions. With the characterizations (b)–(d), the difficulty of lacking the diffusion property can be circumvented and replaced by convexity. Moreover, as shown in corollary 4.8, item (f) in table 1 can be used to demonstrate an interesting result in quantum information theory: the matrix Φ-entropy functional of a quantum ensemble (i.e. a set of quantum states with some prior distribution) is monotone under any unital quantum channel. This property motivates us to study the dynamical evolution of a quantum ensemble and its mixing time, a fundamentally important problem in quantum computation (see our follow-up work [23] for further details).
Second, we define and derive equivalent characterizations for operator-valued Φ-entropies in table 2 (see theorem 4.5). Note that the only known statement in table 1 that is missing in table 2 is condition (e). In other words, we are not able to generalize (e) in table 1 to the non-commutative case. Finally, we employ the subadditivity of operator-valued Φ-entropies to show the operator Efron–Stein inequality in theorem 5.1.
Table 2.
Equivalent statements of the operator-valued Φ-entropy class (C3) (definition 4.1).
| | operator-valued Φ-entropy class (C3) |
|---|---|
| (a) | the second-order Fréchet derivative D2Φ[u](v,v) is jointly convex in (u,v) |
| (b) | Φ(u+v)−Φ(u)−DΦ[u](v) is jointly convex in (u,v) |
| (c) | DΦ[u+v](v)−DΦ[u](v) is jointly convex in (u,v) |
| (d) | D2Φ[u](v,v) is jointly convex in (u,v) |
| (e) | tΦ(u)+(1−t)Φ(v)−Φ(tu+(1−t)v) is jointly convex in (u,v) for any 0≤t≤1 |
| (f) | |
| (g) | HΦ(Z) is a convex function of Z |
| (h) | |
| (i) |
(b). Prior work
For the history of the equivalent characterizations in the class (C1), we refer to an excellent textbook [15] and papers [17,18].
The original definition of the matrix Φ-entropy class, namely (a) in (C2), was proposed by Chen & Tropp in 2014 [19]. In the same paper, they also established the subadditivity property (j) through (i) and (g): (a)⇒(i)⇒(g)⇒(j) in table 1. Shortly after, the equivalence between (a) and the joint convexity of the matrix Brégman divergence (b) was proved in [21]. The equivalence between (a) and (d) follows almost immediately from the result in [20] (see the detailed discussion in the proof of theorem 3.3). The convexity of HΦ(Z), (h), is noted in [20]. Here, we provide transparent evidence for it, namely the joint convexity of (f).
We organize the paper as follows. We collect the necessary background on matrix algebra in §2. The equivalent characterizations of the matrix Φ-entropy functionals are provided in §3. We define the operator-valued Φ-entropies and derive their equivalent statements in §4. Section 5 shows an application of the subadditivity: the operator Efron–Stein inequality. The proofs of the main results are collected in §6 and §7. Finally, we conclude the paper in §8.
2. Preliminaries
We first introduce basic notation.
The set refers to the subspace of self-adjoint operators on some separable Hilbert space. We denote by (resp. ) the set of positive semi-definite (resp. positive-definite) operators in . If the dimension d of a Hilbert space needs special attention, then we highlight it in subscripts, e.g. denotes the Banach space of d×d complex matrices. The trace function is defined as the summation of eigenvalues. The normalized trace function for every d×d matrix M is denoted by . For , the Schatten p-norm of an operator M is denoted as , where {λi(M)} are the singular values of M. The Hilbert–Schmidt inner product is defined as . For , means that A−B is positive semi-definite. Similarly, A≻B means A−B is positive-definite. Throughout this paper, italic capital letters (e.g. X) are used to denote operators.
Denote a probability space . A random matrix Z defined on the probability space means that it is a matrix-valued random variable defined on Ω. We denote the expectation of Z with respect to by
where the integral is the Bochner integral [28,29]. We note that the results derived in this paper are universal for all probability spaces. Hence, we will omit the subscript of the expectation. If we consider a sample space Ω1×Ω2 with joint distribution , then we denote the conditional expectation of Z with respect to the first space Ω1 by , where is the marginal distribution on Ω1.
Let U and V be real Banach spaces. The Fréchet derivative of a function f: U→V at a point X∈U, if it exists, is the unique bounded linear mapping Df[X]: U→V such that
where is a norm in (resp. ). The notation then is interpreted as ‘the Fréchet derivative of at X in the direction E’. The partial Fréchet derivative of multivariate functions can be defined as follows. Let and be real Banach spaces, . For a fixed , is a function of u whose derivative at u0, if it exists, is called the partial Fréchet derivative of with respect to u, and is denoted by . The partial Fréchet derivative is defined similarly. Likewise, the mth Fréchet derivative is a unique multi-linear map from (m times) to that satisfies
for each . The Fréchet derivative enjoys several properties as in standard derivatives. We provide these facts in appendix A.
A function is called operator convex if, for each and 0≤t≤1,
Similarly, a function is called operator monotone if, for each ,
(a). Classical Φ-entropy functionals
Let (C1) denote the class of functions Φ that are continuous and convex on [0,∞), twice differentiable on (0,∞), and such that either Φ is affine, or Φ′′ is strictly positive and 1/Φ′′ is concave.
Definition 2.1 (Classical Φ-entropies) —
Let Φ be a convex function. For every non-negative integrable random variable Z such that E|Z|<∞ and E|Φ(Z)|<∞, the classical Φ-entropy HΦ(Z) is defined as

HΦ(Z) := E[Φ(Z)] − Φ(E[Z]).
In particular, we are interested in Z=f(X1,…,Xn), where X1,…,Xn are independent random variables and f≥0 is a measurable function.
We say HΦ(Z) is subadditive [9] if

HΦ(Z) ≤ ∑i=1n E[HΦ(Z|X−i)],

where HΦ(Z|X−i) := Ei[Φ(Z)] − Φ(Ei[Z]) is the conditional Φ-entropy, and Ei denotes the conditional expectation conditioned on the n−1 random variables X−i := (X1,…,Xi−1,Xi+1,…,Xn).
It is a well-known result that, for any function Φ∈(C1), HΦ(Z) is subadditive [11, corollary 3] (see also [13, section 3]).
Theorem 2.2 establishes equivalent characterizations of classical Φ-entropies.
Theorem 2.2 [18, theorem 4.4] —
The following statements are equivalent:
(a) Φ∈(C1): Φ is affine, or Φ′′>0 and 1/Φ′′ is concave;
(b) Brégman divergence (u,v)↦Φ(u+v)−Φ(u)−Φ′(u)v is convex;
(c) (u,v)↦(Φ′(u+v)−Φ′(u))v is convex;
(d) (u,v)↦Φ′′(u)v2 is convex;
(e) Φ is affine or Φ′′>0 and Φ′′′′Φ′′≥2(Φ′′′)2;
(f) (u,v)↦tΦ(u)+(1−t)Φ(v)−Φ(tu+(1−t)v) is convex for any 0≤t≤1;
(g)
(h) HΦ(Z) is a convex function of Z;
(i) ; and
(j) .
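For a concrete instance of condition (e) (our illustration, not part of the theorem), take Φ(u)=u log u: then Φ′′=1/u, Φ′′′=−1/u2 and Φ′′′′=2/u3, so Φ′′′′Φ′′=2/u4=2(Φ′′′)2, and (e) holds with equality:

```python
import math

# Condition (e) for Phi(u) = u*log(u) (our illustration): Phi'' = 1/u,
# Phi''' = -1/u^2, Phi'''' = 2/u^3, so Phi''''*Phi'' = 2/u^4 = 2*(Phi''')^2,
# i.e. the inequality holds with equality.
for u in (0.1, 1.0, 3.7, 100.0):
    d2 = 1.0 / u
    d3 = -1.0 / u ** 2
    d4 = 2.0 / u ** 3
    assert math.isclose(d4 * d2, 2.0 * d3 ** 2, rel_tol=1e-12)
```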
3. Equivalent characterizations of matrix Φ-entropy functionals
Here, we first introduce matrix Φ-entropy functionals, and present the main result (theorem 3.3) of this section, namely new characterizations of the matrix Φ-entropy functionals.
Chen & Tropp [19] introduced the class of matrix Φ-entropies and proved its subadditivity in 2014. Here, we will show that all equivalent characterizations of classical Φ-entropies in theorem 2.2 have a one-to-one correspondence for the class of matrix Φ-entropies.
Let d be a natural number. The class Φd contains each function that is either affine or satisfies the following three conditions.
(i) Φ is convex and continuous at zero.
(ii) Φ is twice continuously differentiable.
(iii) Define Ψ(t)=Φ′(t) for t>0. The Fréchet derivative DΨ of the standard matrix function is an invertible linear map on the space of d×d self-adjoint matrices, and the map A↦(DΨ[A])−1 is concave with respect to the Löwner partial ordering on positive-definite matrices.
Define the class (C2) := ∩d≥1 Φd.
Definition 3.1 (Matrix Φ-entropy functional [19]) —
Let Φ be a convex function. Consider a random positive semi-definite matrix Z with E∥Z∥<∞ and E∥Φ(Z)∥<∞. The matrix Φ-entropy HΦ(Z) is defined as

HΦ(Z) := E[tr Φ(Z)] − tr Φ(E[Z]).
The corresponding conditional matrix Φ-entropy can be defined with respect to a sub-σ-algebra.
Theorem 3.2 (Subadditivity of the matrix Φ-entropy functional [19, theorem 2.5]) —
Let Φ∈(C2), and assume Z is a measurable function of (X1,…,Xn). We have

HΦ(Z) ≤ ∑i=1n E[HΦ(Z|X−i)], (3.1)

where HΦ(Z|X−i) is the conditional entropy, and E[·|X−i] denotes the conditional expectation conditioned on the n−1 random variables X−i := (X1,…,Xi−1,Xi+1,…,Xn).
Theorem 3.3 is the main result of this section. We show that all the equivalent conditions in theorem 2.2 also hold for the class of matrix Φ-entropy functionals. Hence, we have a more comprehensive understanding of the class of matrix Φ-entropy functionals.
Theorem 3.3 —
The following statements are equivalent:
(a) Φ∈(C2): Φ is affine or DΨ is invertible and A↦(DΨ[A])−1 is operator concave;
(b) matrix Brégman divergence: (A,B)↦Tr[Φ(A+B)−Φ(A)−DΦ[A](B)] is convex;
(c) (A,B)↦Tr[DΦ[A+B](B)−DΦ[A](B)] is convex;
(d) (A,B)↦Tr[D2Φ[A](B,B)] is convex;
(e) Φ is affine, or Φ′′>0 and equation (3.2) holds;
(f) (A,B)↦Tr[tΦ(A)+(1−t)Φ(B)−Φ(tA+(1−t)B)] is convex for any 0≤t≤1;
(g)
(h) HΦ(Z) is a convex function of Z;
(i) ; and
(j) .
We note that the chain of implications (a)⇒(i)⇒(g)⇒(j) was proved by Chen & Tropp [19]. The equivalence (a)⇔(b) was shown in [21, theorem 2]. Hansen & Zhang [20] established an equivalence of item (a) and the convexity of the following map:
| 3.3 |
From lemma A.5, it is not hard to observe that equation (3.3) is equivalent to item (d), i.e.
We provide the detailed proof of the remaining equivalence statements in §6.
4. Operator-valued Φ-entropies
Here, we extend the notion of matrix Φ-entropy functionals (i.e. real-valued) to operator-valued Φ-entropies.
Definition 4.1 (Operator-valued entropy class) —
Let d be a natural number. The class Φd contains each function whose second-order Fréchet derivative exists and for which the following map satisfies the joint convexity condition (under the Löwner partial ordering):

(A,B) ↦ D2Φ[A](B,B). (4.1)

We denote the class of operator-valued Φ-entropies by (C3).
Definition 4.2 (Operator-valued Φ-entropies) —
Let Φ be a convex function. Consider a random matrix Z taking values in the positive semi-definite cone, with E|Z| and E|Φ(Z)| finite. That is, the random matrices Z and Φ(Z) are Bochner integrable [28,29] (hence, E[Z] and E[Φ(Z)] exist and are well defined). The operator-valued Φ-entropy HΦ is defined as

HΦ(Z) := E[Φ(Z)] − Φ(E[Z]). (4.2)

The corresponding conditional terms can be defined with respect to a sub-σ-algebra.
It is worth mentioning that the matrix Φ-entropy functional [19] in §3 is non-negative for every convex function Φ, owing to the convexity of the trace function A↦Tr[Φ(A)] [32] (see, for example, [33, section 2.2]). However, according to the operator Jensen inequality [34, theorem 3.2], only an operator convex function Φ ensures that the operator-valued Φ-entropy is non-negative.
In the following, we show that the entropy class (C3) is not an empty set.
Proposition 4.3 —
The square function Φ(u)=u2 belongs to (C3).
Proof. —
It suffices to verify the joint convexity of the map

(u,v) ↦ D2Φ[u](v,v) = 2v2,

where we use the identity for the second-order Fréchet derivative of the square function [35, example X.4.6]. Because the square function is operator convex, the map (u,v)↦2v2 is jointly convex, and hence Φ(u)=u2 belongs to the operator-valued Φ-entropy class (C3). ▪
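The identity D2Φ[u](v,v)=2v2 can be checked by a symmetric second difference, which is exact for a quadratic map (a numerical sketch of our own, not from the paper):

```python
import numpy as np

# Numerical check (our sketch) of D^2 Phi[u](v, v) = 2 v^2 for Phi(u) = u^2:
# since Phi is quadratic, the symmetric second difference in the direction v
# recovers the second-order Frechet derivative exactly (up to rounding).
rng = np.random.default_rng(2)
d = 3
u = rng.standard_normal((d, d)); u = (u + u.T) / 2   # self-adjoint point
v = rng.standard_normal((d, d)); v = (v + v.T) / 2   # self-adjoint direction

t = 1e-3
second_diff = ((u + t * v) @ (u + t * v)
               - 2 * (u @ u)
               + (u - t * v) @ (u - t * v)) / t ** 2

assert np.allclose(second_diff, 2 * v @ v, atol=1e-5)
```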
(a). Subadditivity of operator-valued Φ-entropies
Denote by X := (X1,…,Xn) a sequence of independent random variables taking values in a Polish space, and let Z = Z(X1,…,Xn) be a positive semi-definite random matrix that depends on X. Throughout this paper, we assume that the random matrix Z satisfies the integrability conditions: |Z| and |Φ(Z)| are Bochner integrable for Φ∈(C3).
Theorem 4.4 (Subadditivity of the operator-valued Φ-entropy) —
Fix a function Φ∈(C3). Under the prevailing assumptions,

HΦ(Z) ⪯ ∑i=1n E[HΦ(Z|X−i)], (4.3)

where HΦ(Z|X−i) is the conditional operator-valued Φ-entropy and X−i := (X1,…,Xi−1,Xi+1,…,Xn).
The proof is given in §7.
(b). Equivalent characterizations of operator-valued Φ-entropies
Here, we derive alternative characterizations of the class (C3) in theorem 4.5. As an application of the entropy class, we show that if the function Φ belongs to (C3), then the operator-valued Φ-entropy is monotone under any unital completely positive map.
Theorem 4.5 —
The following statements are equivalent:
(a) Φ∈(C3): convexity of (A,B)↦D2Φ[A](B,B);
(b) operator-valued Brégman divergence: (A,B)↦Φ(A+B)−Φ(A)−DΦ[A](B) is convex;
(c) (A,B)↦DΦ[A+B](B)−DΦ[A](B) is convex;
(d) (A,B)↦D2Φ[A](B,B) is convex;
(e) convexity of (A,B)↦ tΦ(A)+(1−t)Φ(B)−Φ(tA+(1−t)B) for any 0≤t≤1;
(f) ;
(g) {HΦ(Z)}Φ∈(C3) forms a convex set of convex functions;
(h) ; and
(i) .
The proof is omitted, because it directly follows from that of theorem 3.3 without taking traces.
Remark 4.6 —
In item (g) of theorem 4.5, we introduce a supremum representation for the operator-valued Φ-entropies. The supremum is defined as the least upper bound (under Löwner partial ordering) among the set of operators. In general, the supremum might not exist owing to matrix partial ordering; however, the supremum in (g) exists and is attained when T≡Z.
In the following, we demonstrate a monotone property of operator-valued Φ-entropies when Φ∈(C3).
Proposition 4.7 (Monotonicity of operator-valued Φ-entropies) —
Fix a convex function Φ∈(C3). Then the operator-valued Φ-entropy HΦ(Z) is monotone under any unital completely positive map N, i.e. HΦ(N(Z)) ⪯ HΦ(Z) for any positive semi-definite random matrix Z.
Proof. —
If Φ∈(C3), then by item (e) in theorem 4.5 we have the joint convexity of the map

(A,B) ↦ tΦ(A)+(1−t)Φ(B)−Φ(tA+(1−t)B)

for any 0≤t≤1. Let X=(A,B) denote the pair of matrices.
For any completely positive unital map N, it can be expressed in the following form [36]:

N(A) = ∑i Ki A Ki†, with ∑i Ki Ki† = I

(the identity matrix), where the dagger denotes the adjoint. Hence, the operator Jensen inequality (proposition A.7) yields HΦ(N(Z)) ⪯ HΦ(Z) for any completely positive unital map N, which implies the monotonicity of HΦ(Z). ▪
Following the same argument, the matrix Φ-entropy functional satisfies the monotone property if Φ∈(C2).
Corollary 4.8 (Monotonicity of matrix Φ-entropy functionals) —
Fix a convex function Φ∈(C2). Then the matrix Φ-entropy functional HΦ(Z) is monotone under any unital completely positive map N, i.e. HΦ(N(Z))≤HΦ(Z) for any positive semi-definite random matrix Z.
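As an illustrative check of corollary 4.8 (our sketch; the random-unitary channel and Φ(u)=u2 are our choices, both satisfying the hypotheses):

```python
import numpy as np

# Illustration of the monotonicity (our sketch): Phi(u) = u^2 is in (C2),
# and the random-unitary channel N(A) = (A + U A U^T)/2 is unital and
# completely positive, so H_Phi(N(Z)) <= H_Phi(Z) should hold.
rng = np.random.default_rng(3)
d = 3
trbar = lambda m: np.trace(m) / d                 # normalized trace

def rand_psd():
    a = rng.standard_normal((d, d))
    return a @ a.T

q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # a random orthogonal U
channel = lambda a: 0.5 * (a + q @ a @ q.T)       # unital CP map

zs = [rand_psd() for _ in range(4)]               # Z uniform on four values

def h_phi(samples):
    # H_Phi(Z) = E[trbar Z^2] - trbar[(E Z)^2] for Phi(u) = u^2.
    e_phi = sum(trbar(z @ z) for z in samples) / len(samples)
    ez = sum(samples) / len(samples)
    return e_phi - trbar(ez @ ez)

assert h_phi([channel(z) for z in zs]) <= h_phi(zs) + 1e-9
```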
We remark that, prior to this work, the monotonicity of a quantum ensemble was only known for Φ(u)=u log u. This is the famous result in quantum information theory, namely the monotone property of the Holevo quantity [37]. Our corollary 4.8 extends the monotonicity of a quantum ensemble to any function Φ∈(C2).
5. Applications: operator Efron–Stein inequality
Here, we employ the operator subadditivity of Hu→u2(Z) to prove the operator Efron–Stein inequality. For 1≤i≤n, let X1′,…,Xn′ be independent copies of X1,…,Xn, and denote X(i) := (X1,…,Xi−1,Xi′,Xi+1,…,Xn), i.e. X with its ith component replaced by the independent copy Xi′.
Define the quantity
and denote the operator-valued variance of a random matrix A by

Var(A) := E[(A−E[A])2] = E[A2] − (E[A])2.
Theorem 5.1 (Operator Efron–Stein inequality) —
With the prevailing assumptions, we have
Proof. —
Theorem 5.1 is a direct consequence of the subadditivity of operator-valued Φ-entropies, namely theorem 4.4 with Φ(u)=u2.
For two independent and identically distributed random matrices A and B, a direct calculation yields

E[(A−B)2] = 2 Var(A).

Observe that Zi′ := Z(X1,…,Xi−1,Xi′,Xi+1,…,Xn) is an independent copy of Z conditioned on X−i, for each i=1,…,n. Then
Finally, theorem 4.4 and proposition 4.3 lead to
▪
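The identity used in the proof can be verified numerically; the following sketch (our illustration, with an arbitrary two-point distribution) checks E[(A−B)2] = 2 Var(A) exactly over the distribution:

```python
import numpy as np

# A check of the identity E[(A - B)^2] = 2 Var(A) for i.i.d. random matrices
# (our sketch; the two-point distribution is an arbitrary choice), computed
# exactly over the distribution rather than by sampling.
rng = np.random.default_rng(4)
d = 3
m0 = rng.standard_normal((d, d)); m0 = m0 @ m0.T
m1 = rng.standard_normal((d, d)); m1 = m1 @ m1.T
vals, probs = [m0, m1], [0.3, 0.7]

ea = sum(p * m for p, m in zip(probs, vals))                     # E[A]
var = sum(p * (m - ea) @ (m - ea) for p, m in zip(probs, vals))  # Var(A)

# E[(A - B)^2] over independent copies A and B.
lhs = sum(pa * pb * (a - b) @ (a - b)
          for pa, a in zip(probs, vals)
          for pb, b in zip(probs, vals))

assert np.allclose(lhs, 2 * var)
```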
Note that the established operator Efron–Stein inequality leads directly to a matrix polynomial Efron–Stein inequality.
Corollary 5.2 (Matrix polynomial Efron–Stein) —
With the prevailing assumptions, for each natural number p≥1, we have
Corollary 5.2 is a variant of the matrix polynomial Efron–Stein inequality derived in [25, theorem 4.2].
6. Proof of theorem 3.3
Proof. —
(a)⇒(i)⇒(g)⇒(j) This chain of implications is proved by Chen & Tropp in [19].
(a)⇔(b) This equivalent statement is proved in [21, theorem 2].
(a)⇔(d) Theorem 2.1 in [20] proved the equivalence of (a) and the following convexity lemma.
Lemma 6.1 (Convexity lemma [19, lemma 4.2]) —
Fix a function Φ∈(C2), and let Ψ=Φ′. Suppose that A is a random matrix taking values in and let X be a random matrix taking values in . Assume that ∥A∥, ∥X∥ are integrable. Then
What remains is to establish equivalence between the convexity lemma and condition (d). This follows easily from lemma A.5,
Remark 6.2 —
In [19, lemma 4.2], it is shown that the concavity of the map,
implies the joint convexity of the map (i.e. lemma 6.1),
6.1 (b)⇔(c)⇔(d) Define the maps

AΦ(u,v) := Tr[Φ(u+v)−Φ(u)−DΦ[u](v)],
BΦ(u,v) := Tr[DΦ[u+v](v)−DΦ[u](v)]
and CΦ(u,v) := Tr[D2Φ[u](v,v)].
Following [18], we can establish the following relations: for any pair (u,v),
6.2 and
6.3 and, for small enough ϵ>0,
6.4 and
6.5 Equation (6.2) is exactly the integral representation for the matrix Brégman divergence proved in [21]. Similarly, equation (6.3) follows from
Equations (6.4) and (6.5) can be obtained by Taylor expansion at (u,0). That is,
Following the same argument,
We can observe from equations (6.2) and (6.3) that the joint convexity of (u,v)↦AΦ(u,v) and (u,v)↦BΦ(u,v) follows from that of (u,v)↦CΦ(u,v). In other words, we proved that conditions (d)⇒(b) and (d)⇒(c).
Conversely, equations (6.4) and (6.5) show that (b)⇒(d) and condition (c)⇒(d). To be more specific, the joint convexity of (u,v)↦AΦ(u,ϵv) implies
6.6 for each , t∈[0,1], ϵ>0, and u≡tu1+(1−t)u2, v≡tv1+(1−t)v2. Invoking equation (6.4) gives
and
Hence, equation (6.6) is equivalent to
The joint convexity of (u,v)↦CΦ(u,v) follows by dividing both sides by ϵ2 and letting ϵ↓0. The joint convexity of (u,v)↦BΦ(u,ϵv) can be obtained in a similar way using equation (6.5).
(a)⇔(e) It is trivial if Φ is affine; hence, we assume Φ′′>0. We start from the convexity of the map,
6.7 To ease the burden of notation, we denote and by the isometric isomorphism between super-operators and matrices. Then, equation (6.7) can be rewritten as
which is equivalent to the non-negativity of the second derivative (see proposition A.2),
Now, recall the chain rule of the Fréchet derivative in proposition A.1,
and the formula of the differentiation of the inverse function (see lemma A.6),
we can compute the following identities by taking and u≡k:
and
Therefore, we reach the expression (3.2), and statement (a) is true if and only if (3.2) holds. Recall that, in the scalar case (i.e. d=1), the Fréchet derivative can be expressed as the product of the differential and the direction [38, theorem 3.11]
Hence, equation (3.2) reduces to
for all a>0 and h∈ℝ. In other words, equation (3.2) can be viewed as a non-commutative generalization of the classical statement Φ′′′′Φ′′≥2(Φ′′′)2.
(d)⇔(f) For any t∈[0,1], define as
By taking x≡(X,Y) and h≡(h,k) in proposition A.2, the convexity of the twice Fréchet differentiable function Ft is equivalent to
Then, with the help of the partial Fréchet derivative defined in proposition A.3, the second-order Fréchet derivative of Ft(X,Y) can be evaluated as
6.8 Taking trace on both sides of (6.8) and invoking lemma A.5, we have
6.9 Because both the trace and the second-order Fréchet derivative are bilinear, we have the following result:
6.10 Similarly,
6.11 Combining equations (6.10) and (6.11), equation (6.9) can be expressed as
Then, it is not hard to observe that the non-negativity of Tr[D2Ft[X,Y](h,k)] for every , and t∈[0,1] is equivalent to the joint convexity of the map
(j)⇒(g) Considering n=2, the subadditivity means that
Then, we have
(f)⇔(h) Let s∈[0,1]. Define a pair of positive semi-definite random matrices (X,Y) taking values (x,y) with probability s and (x′,y′) with probability (1−s). Then the convexity of HΦ implies that
6.12 for every t∈[0,1]. Now, define as
Then, it follows that
which means that the convexity of the pair (u,v)↦Ft(u,v) is equivalent to the convexity of HΦ, i.e. equation (6.12).
(g)⇔(h) Define a positive semi-definite random matrix , which depends on two random variables X1,X2 on a Polish space. Denote by ZX1 the random matrix Z conditioned on X1. According to the convexity of HΦ, it follows that
Conversely, define a positive semi-definite random matrix where s is a random variable. Now, let s be Bernoulli distributed with parameter t∈[0,1]. Then, for all t∈[0,1], the inequality coincides,
▪
7. Proof of theorem 4.4
Our approach to proving operator subadditivity (theorem 4.4) parallels [19, theorem 2.5] and [13, section 3.1]. The strategy is as follows. First, we prove the supremum representation for the operator-valued Φ-entropies in §7a. Second, we establish a conditional operator Jensen inequality in §7b. Finally, we arrive at the proof of theorem 4.4 in §7c.
(a). Representation of operator-valued Φ-entropy
Theorem 7.1 (Supremum representation for operator-valued Φ-entropies) —
Fix a function Φ∈(C3). Assume Z is a random positive definite matrix for which |Z| and |Φ(Z)| are Bochner integrable. Then the operator-valued Φ-entropy can be represented as
7.1 The range of the supremum contains each random positive definite matrix T for which |T| and |Φ(T)| are Bochner integrable. In particular, the normalized matrix Φ-entropy can be written in the dual form
7.2 where is a linear map of Z and .
Proof. —
Observe that, when T=Z, the right-hand side of equation (7.1) equals HΦ(Z). Then, it remains to confirm the inequality
7.3 for each random positive definite matrix T that satisfies the integrability conditions. We follow the interpolation argument as in [19, lemma 4.1]. For s∈[0,1], define the matrix-valued function
where
Note that F(0)=HΦ(Z), and F(1) matches the right-hand side of equation (7.3). As a result, it suffices to show that F′(s)≤0 for s∈[0,1] in order to verify equation (7.3). By the replacement Z−Ts=−s⋅(T−Z), the function F(s) can be rephrased as
Differentiate the above function to arrive at
7.4a
7.4b where we cancel the last two terms in equation (7.4a) and the second equation (7.4b) follows from the bilinearity of the second-order Fréchet differentiation.
Invoking the joint convexity of the function D2Φ[Ts](T−Z,T−Z) (see equation (4.1)), we establish that the above derivative is negative semi-definite, i.e. F′(s)⪯0 for s∈[0,1], and thus complete the proof. ▪
(b). A conditional operator Jensen inequality
Lemma 7.2 (Conditional operator Jensen inequality for operator-valued Φ-entropy) —
Suppose that (X1,X2) is a pair of independent random matrices taking values in a Polish space, and let Z=Z(X1,X2) be a positive definite random matrix for which |Z| and |Φ(Z)| are Bochner integrable. Then
where E1 denotes the expectation with respect to the first matrix X1.
Proof. —
Let E2 refer to the expectation with respect to the second matrix X2. In the following, we use T(X2) to emphasize that the matrix T depends only on the randomness in X2. Recall the supremum representation, equation (7.2); we have
The second relation follows from the Fubini theorem, which allows us to interchange the order of E1 and E2. In the third line, we use the convexity of the supremum. (Note that this is not always true under the partial ordering; however, it holds in our case, because the supremum in the second line is attained when T(X2) ≡ E1[Z].) The last identity is exactly the supremum representation equation (7.2) in the conditional form. ▪
It is worth emphasizing that the conditional Jensen inequality can also be achieved by item (d) in theorem 4.5 (cf. (f)⇔(g)⇔(h) in theorem 3.3).
(c). Subadditivity of operator-valued Φ-entropies
Now we are in a position to prove the subadditivity of the operator-valued Φ-entropies.
Proof. —
By adding and subtracting the term , the operator-valued Φ-entropy can be expressed as
7.5 where the last inequality results from lemma 7.2, because X1 is independent of X−1.
Following the same reasoning, we obtain the operator-valued Φ-entropy conditioned on X1,
By plugging the expression into equation (7.5), we get
Finally, by repeating this procedure, we achieve the subadditivity of the operator-valued Φ-entropy
which completes our claim. ▪
8. Conclusion
In this paper, we extend the results of Chen & Tropp [19], Pitrik & Virosztek [21] and Hansen & Zhang [20] to complete the characterizations of the matrix Φ-entropy functionals. Moreover, we generalize the matrix Φ-entropy functionals to operator-valued Φ-entropies, and show that this generalization preserves the subadditivity property. Additionally, we prove that the set of operator-valued Φ-entropies is non-empty and contains at least the square function. Equivalent characterizations of the operator-valued Φ-entropies are also derived. This result demonstrates that the subadditivity of HΦ(Z) is equivalent to the operator convexity of HΦ(Z) on the convex cone of positive semi-definite operators. Finally, we exploit the subadditivity to prove the operator Efron–Stein inequality. We expect that the proposed results can also be used to derive the matrix exponential Efron–Stein inequality (cf. [25, theorem 4.3]) and moment inequalities for random matrices; see [13] and [15, ch. 15].
The subadditivity of matrix Φ-entropies leads to a series of important inequalities: matrix Poincaré inequalities with respect to binomial and Gaussian distributions, and the related matrix logarithmic Sobolev inequalities [22]. In [23], the subadditivity and the operator Efron–Stein inequality can be exploited to estimate the mixing time of a quantum random graph. It enables us to better understand the dynamics and long-term behaviours of a quantum system undergoing Markovian processes. We believe the proposed results will lead to more matrix functional inequalities, and have a substantial impact on operator algebra and quantum information science.
Finally, we remark that the results on operator-valued Φ-entropies and the operator Efron–Stein inequality hold in the infinite-dimensional setting. This is not hard to verify, because the tools (such as Fréchet derivatives) employed in the proofs remain valid in infinite dimensions.
Acknowledgements
H.-C.C. sincerely thanks Marco Tomamichel for the helpful discussion about the operator-valued Φ-entropies.
Appendix A. Miscellaneous lemmas
Proposition A.1 (Properties of Fréchet derivatives [38, theorem 3.4]) —
Let U, V and W be real Banach spaces, let f: U→V be Fréchet differentiable at A∈U, and let g: V→W be Fréchet differentiable at f(A). Define h := g∘f (i.e. h(X) = g(f(X))). Then h is Fréchet differentiable at A and Dh[A] = Dg[f(A)] ∘ Df[A].
Proposition A.2 (Convexity of twice Fréchet differentiable matrix functions [39, proposition 2.2]) —
Let U be an open convex subset of a real Banach space, and let W also be a real Banach space. Then a twice Fréchet differentiable function f: U→W is convex if and only if D2f[X](h,h) ≥ 0 for each X∈U and every direction h.
Proposition A.3 (Partial Fréchet derivative [40, proposition 5.3.15]) —
If is Fréchet differentiable at then the partial Fréchet derivatives and exist, and
Proposition A.4 [41, theorem 2.2] —
Let A be a self-adjoint matrix and X a self-adjoint direction. Assume f is a continuously differentiable function defined on an interval I, and assume that the eigenvalues of A+tX lie in I. Then

(d/dt) Tr[f(A+tX)] = Tr[X f′(A+tX)].
Proposition A.4 directly leads to the following lemma.
Lemma A.5 —
Let A be a self-adjoint matrix and X a self-adjoint direction. Assume f is a continuously differentiable function defined on an interval I, and assume that the eigenvalues of A+tX lie in I. Then

Tr[Df[A](X)] = Tr[X f′(A)].
Lemma A.6 (Second-order Fréchet derivative of the inversion function) —
Let $f$ be second-order Fréchet differentiable at $A$, and let $f(A)$ be invertible. Then, for each self-adjoint $h$, we have
$$ \mathsf{D}^2\big(f(A)^{-1}\big)[h,h] \;=\; 2\, f(A)^{-1}\, \mathsf{D}f(A)[h]\, f(A)^{-1}\, \mathsf{D}f(A)[h]\, f(A)^{-1} \;-\; f(A)^{-1}\, \mathsf{D}^2 f(A)[h,h]\, f(A)^{-1}. $$
Proof. —
Denote by $\iota(B) := B^{-1}$ the inversion function. Recall the chain rule for second-order Fréchet derivatives:
$$ \mathsf{D}^2(\iota \circ f)(A)[h,h] \;=\; \mathsf{D}^2\iota\big(f(A)\big)\big[\mathsf{D}f(A)[h],\, \mathsf{D}f(A)[h]\big] + \mathsf{D}\iota\big(f(A)\big)\big[\mathsf{D}^2 f(A)[h,h]\big]. $$
Applying the formulae for the Fréchet derivatives of the inversion function (see, for example, [35, example X.4.2; 42, exercise 3.27]),
$$ \mathsf{D}\iota(B)[h] = -B^{-1} h B^{-1}, \qquad \mathsf{D}^2\iota(B)[h_1, h_2] = B^{-1} h_1 B^{-1} h_2 B^{-1} + B^{-1} h_2 B^{-1} h_1 B^{-1}, $$
concludes the desired result. ▪
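The second-derivative formula of lemma A.6 can likewise be checked against a second-order central difference; in this sketch (our illustration) we take $f(A) = A^2$, so $\mathsf{D}f(A)[h] = Ah + hA$ and $\mathsf{D}^2 f(A)[h,h] = 2h^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2 + n * np.eye(n)  # positive definite
h = rng.standard_normal((n, n)); h = (h + h.T) / 2
h /= np.linalg.norm(h)                 # small direction for a stable difference

inv = np.linalg.inv
f = lambda X: X @ X                    # f(A) = A^2
Df = lambda X, H: X @ H + H @ X        # first Fréchet derivative
D2f = lambda X, H: 2 * H @ H           # second Fréchet derivative of A -> A^2

B = inv(f(A))
# right-hand side of lemma A.6
second = 2 * B @ Df(A, h) @ B @ Df(A, h) @ B - B @ D2f(A, h) @ B

# second-order central difference of t -> (f(A + t h))^{-1} at t = 0
t = 1e-4
fd = (inv(f(A + t * h)) - 2 * B + inv(f(A - t * h))) / t**2

rel_err = np.linalg.norm(second - fd) / np.linalg.norm(second)
print(rel_err < 1e-3)
```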
Proposition A.7 (Operator Jensen inequality [34,43–45]) —
Let $(\Omega, \Sigma)$ be a measurable space carrying a measure $\mu$, and suppose that $I \subseteq \mathbb{R}$ is an open interval. Assume that, for every $x \in \Omega$, $K(x)$ is a (finite- or infinite-dimensional) square matrix and satisfies
$$ \int_\Omega K(x)^* K(x)\, \mathrm{d}\mu(x) \;=\; \mathbb{1} $$
(the identity matrix). If $f$ is a measurable matrix-valued function for which $\sigma(f(x)) \subset I$, for every $x \in \Omega$, then
$$ g\left( \int_\Omega K(x)^* f(x) K(x)\, \mathrm{d}\mu(x) \right) \;\preceq\; \int_\Omega K(x)^* g\big(f(x)\big) K(x)\, \mathrm{d}\mu(x) $$
for every operator convex function $g: I \to \mathbb{R}$. Moreover,
$$ \mathrm{Tr}\, g\left( \int_\Omega K(x)^* f(x) K(x)\, \mathrm{d}\mu(x) \right) \;\le\; \mathrm{Tr} \int_\Omega K(x)^* g\big(f(x)\big) K(x)\, \mathrm{d}\mu(x) $$
for every convex function $g: I \to \mathbb{R}$.
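The trace form of the operator Jensen inequality can be illustrated with a discrete measure: matrices $K_1, \dots, K_m$ with $\sum_i K_i^* K_i = \mathbb{1}$ and the convex function $x \mapsto x^2$. This is a sketch of ours; building the $K_i$ from a QR factorization is only one convenient way to satisfy the normalization:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 3

# Build K_1, ..., K_m with sum_i K_i^T K_i = I: stack them as the
# orthonormal columns of a tall QR factor (Q^T Q = I_n).
V = rng.standard_normal((n * m, n))
Q, _ = np.linalg.qr(V)
Ks = [Q[i * n:(i + 1) * n, :] for i in range(m)]

# Self-adjoint matrices A_1, ..., A_m playing the role of f(x)
As = []
for _ in range(m):
    S = rng.standard_normal((n, n))
    As.append((S + S.T) / 2)

sq = lambda X: X @ X   # x -> x^2 is convex (indeed operator convex)

lhs = np.trace(sq(sum(K.T @ A @ K for K, A in zip(Ks, As))))
rhs = np.trace(sum(K.T @ sq(A) @ K for K, A in zip(Ks, As)))

# trace Jensen: Tr f(Σ K* A K) ≤ Tr Σ K* f(A) K
print(lhs <= rhs + 1e-10)
```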
Data accessibility
This work does not have any experimental data.
Authors' contributions
Both authors contributed equally to this paper.
Competing interests
We have no competing interests.
Funding
M.-H.H. is supported by an ARC Future Fellowship under grant no. FT140100574.
References
- 1. Csiszár I. 1963. Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Magyar Tud. Akad. Mat. Kutató Int. Közl. 8, 85–108.
- 2. Csiszár I. 1967. Information-type measures of difference of probability distributions and indirect observations. Stud. Sci. Math. Hung. 2, 299–318.
- 3. Ali SM, Silvey SD. 1966. A general class of coefficients of divergence of one distribution from another. J. Roy. Stat. Soc. B 28, 131–142.
- 4. Burbea J, Rao CR. 1982. Entropy differential metric, distance and divergence measures in probability spaces: a unified approach. J. Multivar. Anal. 12, 575–596. (doi:10.1016/0047-259X(82)90065-3)
- 5. Burbea J, Rao CR. 1982. On the convexity of higher order Jensen differences based on entropy functions. IEEE Trans. Inf. Theory 28, 961–963. (doi:10.1109/TIT.1982.1056573)
- 6. Burbea J, Rao CR. 1982. On the convexity of some divergence measures based on entropy functions. IEEE Trans. Inf. Theory 28, 489–495. (doi:10.1109/TIT.1982.1056497)
- 7. Han TS. 1978. Nonnegative entropy measures of multivariate symmetric correlations. Inf. Control 36, 133–156. (doi:10.1016/S0019-9958(78)90275-9)
- 8. Bobkov S, Ledoux M. 1997. Poincaré's inequalities and Talagrand's concentration phenomenon for the exponential distribution. Probab. Theory Relat. Fields 107, 383–400. (doi:10.1007/s004400050090)
- 9. Ledoux M. 1997. On Talagrand's deviation inequalities for product measures. ESAIM Probab. Stat. 1, 63–87. (doi:10.1051/ps:1997103)
- 10. Gross L. 1975. Logarithmic Sobolev inequalities. Am. J. Math. 97, 1061–1083. (doi:10.2307/2373688)
- 11. Latała R, Oleszkiewicz K. 2000. Between Sobolev and Poincaré. In Geometric aspects of functional analysis. Lecture Notes in Mathematics, vol. 1745, pp. 147–168. Berlin, Germany: Springer. (doi:10.1007/BFb0107213)
- 12. Ané C, Blachère S, Fougères P, Gentil I, Malrieu F, Roberto C, Scheffer G. 2000. Sur les inégalités de Sobolev logarithmiques. Panoramas et Synthèses, vol. 10. Paris, France: Société Mathématique de France. [In French.]
- 13. Boucheron S, Bousquet O, Lugosi G, Massart P. 2005. Moment inequalities for functions of independent random variables. Ann. Prob. 33, 514–560. (doi:10.1214/009117904000000856)
- 14. Massart P (ed.). 2007. Concentration inequalities and model selection. Berlin, Germany: Springer. (doi:10.1007/978-3-540-48503-2)
- 15. Boucheron S, Lugosi G, Massart P. 2013. Concentration inequalities: a nonasymptotic theory of independence. Oxford, UK: Oxford University Press. (doi:10.1093/acprof:oso/9780199535255.001.0001)
- 16. Bakry D, Gentil I, Ledoux M. 2013. Analysis and geometry of Markov diffusion operators. Berlin, Germany: Springer. (doi:10.1007/978-3-319-00227-9)
- 17. Chafaï D. 2004. Entropies, convexity, and functional inequalities: on Φ-entropies and Φ-Sobolev inequalities. J. Math. Kyoto Univ. 44, 325–363. (http://arxiv.org/abs/math/0211103v2)
- 18. Chafaï D. 2006. Binomial-Poisson entropic inequalities and the M/M/∞ queue. ESAIM Probab. Stat. 10, 317–339. (doi:10.1051/ps:2006013)
- 19. Chen RY, Tropp JA. 2014. Subadditivity of matrix φ-entropy and concentration of random matrices. Electron. J. Probab. 19, 1–30. (doi:10.1214/ejp.v19-2964)
- 20. Hansen F, Zhang Z. 2015. Characterisation of matrix entropies. Lett. Math. Phys. 105, 1399–1411. (doi:10.1007/s11005-015-0784-8)
- 21. Pitrik J, Virosztek D. 2015. On the joint convexity of the Bregman divergence of matrices. Lett. Math. Phys. 105, 675–692. (doi:10.1007/s11005-015-0757-y)
- 22. Cheng H-C, Hsieh M-H. 2015. New characterizations of matrix Φ-entropies, Poincaré and Sobolev inequalities and an upper bound to Holevo quantity. (http://arxiv.org/abs/1506.06801)
- 23. Cheng H-C, Hsieh M-H, Tomamichel M. 2015. Exponential decay of matrix Φ-entropies on Markov semigroups with applications to dynamical evolutions of quantum ensembles. (http://arxiv.org/abs/1511.02627)
- 24. Tropp JA. 2015. An introduction to matrix concentration inequalities. Found. Trends Mach. Learn. 8, 1–230. (doi:10.1561/2200000048)
- 25. Paulin D, Mackey L, Tropp JA. 2014. Efron–Stein inequalities for random matrices. (http://arxiv.org/abs/1408.3470)
- 26. Efron B, Stein C. 1981. The jackknife estimate of variance. Ann. Stat. 9, 586–596. (doi:10.1214/aos/1176345462)
- 27. Steele JM. 1986. An Efron–Stein inequality for nonsymmetric statistics. Ann. Stat. 14, 753–758. (doi:10.1214/aos/1176349952)
- 28. Diestel J, Uhl J. 1977. Vector measures. Providence, RI: American Mathematical Society. (doi:10.1090/surv/015)
- 29. Mikusiński J. 1978. The Bochner integral. Berlin, Germany: Springer. (doi:10.1007/978-3-0348-5567-9)
- 30. Peller VV. 1985. Hankel operators in the perturbation theory of unitary and self-adjoint operators. Funct. Anal. Appl. 19, 111–123. (doi:10.1007/BF01078390)
- 31. Bickel K. 2007. Differentiating matrix functions. Oper. Matrices 7, 71–90. (doi:10.7153/oam-07-03)
- 32. von Neumann J. 1955. Mathematical foundations of quantum mechanics. Princeton, NJ: Princeton University Press.
- 33. Carlen E. 2010. Trace inequalities and quantum entropy: an introductory course. In Entropy and the quantum (eds R Sims, D Ueltschi), pp. 73–140. Contemporary Mathematics, vol. 529. Providence, RI: American Mathematical Society. (doi:10.1090/conm/529/10428)
- 34. Farenick DR, Zhou F. 2007. Jensen's inequality relative to matrix-valued measures. J. Math. Anal. Appl. 327, 919–929. (doi:10.1016/j.jmaa.2006.05.008)
- 35. Bhatia R. 1997. Matrix analysis. Berlin, Germany: Springer. (doi:10.1007/978-1-4612-0653-8)
- 36. Mendl CB, Wolf MM. 2009. Unital quantum channels—convex structure and revivals of Birkhoff's theorem. Commun. Math. Phys. 289, 1057–1086. (doi:10.1007/s00220-009-0824-2)
- 37. Petz D. 2003. Monotonicity of quantum relative entropy revisited. Rev. Math. Phys. 15, 79–91. (doi:10.1142/S0129055X03001576)
- 38. Higham NJ. 2008. Functions of matrices: theory and computation. Philadelphia, PA: Society for Industrial & Applied Mathematics. (doi:10.1137/1.9780898717778)
- 39. Hansen F. 1997. Operator convex functions of several variables. Publ. Res. Inst. Math. Sci. 33, 443–463. (doi:10.2977/prims/1195145324)
- 40. Atkinson K, Han W. 2009. Theoretical numerical analysis: a functional analysis framework. Berlin, Germany: Springer. (doi:10.1007/978-1-4419-0458-4)
- 41. Hansen F, Pedersen GK. 1995. Perturbation formulas for traces on C∗-algebras. Publ. Res. Inst. Math. Sci. 31, 169–178. (doi:10.2977/prims/1195164797)
- 42. Hiai F, Petz D. 2014. Introduction to matrix analysis and applications. Berlin, Germany: Springer. (doi:10.1007/978-3-319-04150-6)
- 43. Davis C. 1957. A Schwarz inequality for convex operator functions. Proc. Am. Math. Soc. 8, 42–44. (doi:10.1090/S0002-9939-1957-0084120-4)
- 44. Choi MD. 1974. A Schwarz inequality for positive linear maps on C∗-algebras. Illinois J. Math. 18, 565–574.
- 45. Hansen F, Pedersen GK. 2003. Jensen's operator inequality. Bull. Lond. Math. Soc. 35, 553–564. (doi:10.1112/S0024609303002200)
