Proceedings. Mathematical, Physical, and Engineering Sciences. 2016 Mar; 472(2187): 20150563. doi: 10.1098/rspa.2015.0563

Characterizations of matrix and operator-valued Φ-entropies, and operator Efron–Stein inequalities

Hao-Chung Cheng 1,2 and Min-Hsiu Hsieh 2
PMCID: PMC4841475  PMID: 27118909

Abstract

We derive new characterizations of the matrix Φ-entropy functionals introduced in Chen & Tropp (Chen, Tropp 2014 Electron. J. Prob. 19, 1–30. (doi:10.1214/ejp.v19-2964)). These characterizations help us to better understand the properties of matrix Φ-entropies, and are a powerful tool for establishing matrix concentration inequalities for random matrices. Then, we propose an operator-valued generalization of matrix Φ-entropy functionals, and prove the subadditivity under Löwner partial ordering. Our results demonstrate that the subadditivity of operator-valued Φ-entropies is equivalent to the convexity. As an application, we derive the operator Efron–Stein inequality.

Keywords: Φ-entropy, matrix concentration inequalities, Efron–Stein inequality

1. Introduction

The introduction of Φ-entropy functionals can be traced back to the early days of information theory [1,2] and convex analysis [3–6], where the notion of ϕ-divergence is defined. Formally, given a non-negative real random variable Z and a smooth convex function Φ, the Φ-entropy functional refers to

H_\Phi(Z) = \mathbb{E}\,\Phi(Z) - \Phi(\mathbb{E}Z).

By the Jensen inequality, it is not hard to see that the quantity HΦ(Z) is non-negative. Hence, the Φ-entropy functional can be used as an entropic measure to characterize the uncertainty of the random variable Z.

The investigation of general properties of classical Φ-entropies has enjoyed great success in physics, probability theory, information theory and computer science. Of these, the subadditivity (or the tensorization) property [7–9] has led to the derivations of the logarithmic Sobolev [10], Φ-Sobolev [11] and Poincaré inequalities [12], which, in turn, is a crucial step towards the powerful entropy method in concentration inequalities [13–15] and analysis of Markov semigroups [16].

Let Z=f(X1,…,Xn) be a random variable defined on n independent random variables (X1,…,Xn). We say HΦ(Z) is subadditive if

H_\Phi(Z) \le \sum_{i=1}^{n} \mathbb{E}\bigl[\mathbb{E}_i\Phi(Z) - \Phi(\mathbb{E}_i Z)\bigr],

where Ei denotes the conditional expectation with respect to Xi. Gross [10] first observed that the ordinary entropy functional H_{u log u}(Z) is subadditive in his seminal paper. Later on, equivalent characterizations of the subadditive entropy class (see theorem 2.2) were established [11,17,18], which prove to be useful in other contexts such as stochastic processes [17,18].
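To make the definitions above concrete, here is a minimal numerical sketch (not taken from the paper) that evaluates HΦ(Z) for the representative choice Φ(u) = u log u on a finite product distribution and checks the subadditivity bound for n = 2; the function f and the finite supports below are arbitrary choices for illustration.

```python
import numpy as np
from itertools import product

phi = lambda u: u * np.log(u)            # representative convex function

# Independent X1, X2 uniform on small finite sets; Z = f(X1, X2) > 0.
x1_vals = np.array([0.5, 1.0, 2.0])
x2_vals = np.array([1.0, 3.0])
f = lambda a, b: a + b**2                # any positive measurable function

def H(weights, values):
    """Classical Phi-entropy of a finite distribution: E Phi(Z) - Phi(E Z)."""
    m = np.sum(weights * values)
    return np.sum(weights * phi(values)) - phi(m)

# Left-hand side: H_Phi(Z) over the full product distribution.
pairs = list(product(x1_vals, x2_vals))
w = np.full(len(pairs), 1.0 / len(pairs))
z = np.array([f(a, b) for a, b in pairs])
lhs = H(w, z)

# Right-hand side: sum_i E[E_i Phi(Z) - Phi(E_i Z)], where E_i averages over X_i
# with the other coordinate held fixed.
w1 = np.full(len(x1_vals), 1.0 / len(x1_vals))
w2 = np.full(len(x2_vals), 1.0 / len(x2_vals))
term1 = np.dot(w2, [H(w1, np.array([f(a, b) for a in x1_vals])) for b in x2_vals])
term2 = np.dot(w1, [H(w2, np.array([f(a, b) for b in x2_vals])) for a in x1_vals])

print(lhs, term1 + term2, bool(lhs <= term1 + term2 + 1e-12))   # subadditivity
```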

Parallel to the classical Φ-entropies, Chen & Tropp [19] introduced the notion of matrix Φ-entropy functionals. Namely, for a positive semi-definite random matrix Z, the matrix Φ-entropy functional is defined as

H_\Phi(Z) := \bar{\mathrm{tr}}\bigl[\mathbb{E}\,\Phi(Z) - \Phi(\mathbb{E}Z)\bigr],

where tr¯ is the normalized trace. The class of subadditive matrix Φ-entropy functionals is characterized in terms of the second derivative of their representing functions. Unlike its classical counterpart, only a few connections between the matrix Φ-entropy functionals and other convex forms of the same functions have been established [20,21] prior to this work.

In this paper, we establish equivalent characterizations of the matrix Φ-entropy functionals defined in [19]. Our results show that matrix Φ-entropy functionals satisfy all known equivalent statements that classical Φ-entropy functions satisfy [15,17,18]. Our results provide additional justification for the original definition of the matrix Φ-entropy functionals (table 1). The equivalences between matrix Φ-entropy functionals and other convex forms of the function Φ advance our understanding of the class of entropy functions. Moreover, it allows us to unify the study of matrix concentration inequalities and matrix Φ-Sobolev inequalities [22,23].

Table 1.

Comparison between the equivalent characterizations of the Φ-entropy functional class (C1) (definition 2.1) and the matrix Φ-entropy functional class (C2) (definition 3.1).

classical Φ-entropy functional class (C1) | matrix Φ-entropy functional class (C2)
(a) Φ is affine, or Φ′′>0 and 1/Φ′′ is concave | Φ is affine, or DΦ′ is invertible and (DΦ′)−1 is concave
(b) convexity of (u,v)↦Φ(u+v)−Φ(u)−Φ′(u)v | convexity of (u,v)↦Tr[Φ(u+v)−Φ(u)−DΦ[u](v)]
(c) convexity of (u,v)↦(Φ′(u+v)−Φ′(u))v | convexity of (u,v)↦Tr[DΦ[u+v](v)−DΦ[u](v)]
(d) convexity of (u,v)↦Φ′′(u)v2 | convexity of (u,v)↦Tr[D2Φ[u](v,v)]
(e) Φ is affine, or Φ′′>0 and Φ′′′′Φ′′≥2Φ′′′2 | equation (3.2)
(f) convexity of (u,v)↦tΦ(u)+(1−t)Φ(v)−Φ(tu+(1−t)v) for any 0≤t≤1 | convexity of (u,v)↦Tr[tΦ(u)+(1−t)Φ(v)−Φ(tu+(1−t)v)] for any 0≤t≤1
(g) E1HΦ(Z|X1) ≥ HΦ(E1Z) | E1HΦ(Z|X1) ≥ HΦ(E1Z)
(h) HΦ(Z) is a convex function of Z | HΦ(Z) is a convex function of Z
(i) HΦ(Z) = supT>0{E[Υ1(T)Z+Υ2(T)]} | HΦ(Z) = supT≻0{tr¯E[Υ1(T)Z+Υ2(T)]}
(j) HΦ(Z) ≤ ∑_{i=1}^{n} EHΦ(i)(Z) | HΦ(Z) ≤ ∑_{i=1}^{n} EHΦ(i)(Z)

Furthermore, we consider the following operator-valued generalization of matrix Φ-entropy functionals:

H_\Phi(Z) := \mathbb{E}\,\Phi(Z) - \Phi(\mathbb{E}Z).

A special case of this operator-valued Φ-entropy functional is the operator-valued variance Var(Z) defined in [24,25], where Φ is the square function. The equivalent conditions for the subadditivity under Löwner partial ordering are derived (theorem 4.5). In particular, we show that subadditivity of the operator-valued Φ-entropies is equivalent to the convexity,

subadditivity of HΦ(Z) ⟺ HΦ(Z) is convex in Z.

Our result directly yields the operator Efron–Stein inequality, which recovers the well-known Efron–Stein inequality [26,27] when random matrices reduce to real random variables.

(a). Our results

We summarize our results here. First, we derive equivalent characterizations for the matrix Φ-entropy functionals in table 1 (see theorem 3.3). Notably, all known equivalent characterizations for the classical Φ-entropies can be generalized to their matrix correspondences. We emphasize that additional characterizations of the Φ-entropies prove to be useful in many instances. The characterizations (b)–(d) in (C1) are explored by Chafaï [18] to derive several entropic inequalities for M/M/∞ queueing processes that are not diffusions. With the characterizations (b)–(d), the difficulty of lacking the diffusion property can be circumvented and replaced by convexity. Moreover, as shown in corollary 4.8, item (f) in table 1 can be used to demonstrate an interesting result in quantum information theory: the matrix Φ-entropy functional of a quantum ensemble (i.e. a set of quantum states with some prior distribution) is monotone under any unital quantum channel. This property motivates us to study the dynamical evolution of a quantum ensemble and its mixing time, a fundamentally important problem in quantum computation (see our follow-up work [23] for further details).

Second, we define and derive equivalent characterizations for operator-valued Φ-entropies in table 2 (see theorem 4.5). Note that the only known statement in table 1 that is missing in table 2 is condition (e). In other words, we are not able to generalize (e) in table 1 to the non-commutative case. Finally, we employ the subadditivity of operator-valued Φ-entropies to show the operator Efron–Stein inequality in theorem 5.1.

Table 2.

Equivalent statements of the operator-valued Φ-entropy class (C3) (definition 4.1).

operator-valued Φ-entropy class (C3)
(a) the second-order Fréchet derivative D2Φ[u](v,v) is jointly convex in (u,v)
(b) Φ(u+v)−Φ(u)−DΦ[u](v) is jointly convex in (u,v)
(c) DΦ[u+v](v)−DΦ[u](v) is jointly convex in (u,v)
(d) D2Φ[u](v,v) is jointly convex in (u,v)
(e) tΦ(u)+(1−t)Φ(v)−Φ(tu+(1−t)v) is jointly convex in (u,v) for any 0≤t≤1
(f) E1HΦ(Z|X1) ⪰ HΦ(E1Z)
(g) HΦ(Z) is a convex function of Z
(h) HΦ(Z) = supT≻0{E[Υ1(T)Z+Υ2(T)]}
(i) HΦ(Z) ⪯ ∑_{i=1}^{n} EHΦ(i)(Z)

(b). Prior work

For the history of the equivalent characterizations in the class (C1), we refer to an excellent textbook [15] and papers [17,18].

The original definition of the matrix Φ-entropy class, namely (a) in (C2), was proposed by Chen & Tropp in 2014 [19]. In the same paper, they also established the subadditivity property (j) through (i) and (g): (a)⇒(i)⇒(g)⇒(j) in table 1. Shortly after, the equivalent relation between (a) and the joint convexity of the matrix Brégman divergence (b) was proved in [21]. The equivalent relation between (a) and (d) was almost immediately implied by the result in [20] (see the detailed discussion in the proof of theorem 3.3). The convexity of HΦ(Z), (h), is noted in [20]. Here, we provide transparent evidence—the joint convexity of (f).

We organize the paper in the following way. We collect the necessary information for the matrix algebra in §2. The equivalent characterizations of the matrix Φ-entropy functionals are provided in §3. We define the operator-valued Φ-entropies and derive their equivalent statements in §4. Section 5 shows an application of the subadditivity—the operator Efron–Stein inequality. The proofs of the main results are collected in §6 and §7, respectively. Finally, we conclude the paper.

2. Preliminaries

We first introduce basic notation.

The set Msa refers to the subspace of self-adjoint operators on some separable Hilbert space. We denote by M+ (resp. M++) the set of positive semi-definite (resp. positive-definite) operators in Msa. If the dimension d of a Hilbert space needs special attention, then we highlight it in subscripts, e.g. Md denotes the Banach space of d×d complex matrices. The trace function Tr : ℂ^{d×d} → ℂ is defined as the summation of eigenvalues. The normalized trace function tr¯ for every d×d matrix M is denoted by tr¯[M] ≔ (1/d)Tr[M]. For p ∈ [1,∞), the Schatten p-norm of an operator M is denoted as ∥M∥p ≔ (∑i |λi(M)|^p)^{1/p}, where {λi(M)} are the singular values of M. The Hilbert–Schmidt inner product is defined as ⟨A,B⟩ ≔ Tr[A†B]. For A, B ∈ Msa, A ⪰ B means that A−B is positive semi-definite. Similarly, A ≻ B means A−B is positive-definite. Throughout this paper, italic capital letters (e.g. X) are used to denote operators.

Denote a probability space (Ω,Σ,P). A random matrix Z defined on the probability space (Ω,Σ,P) means that it is a matrix-valued random variable defined on Ω. We denote the expectation of Z with respect to P by

\mathbb{E}_{\mathbb{P}}[Z] := \int_\Omega Z\,\mathrm{d}\mathbb{P} = \int_{x\in\Omega} Z(x)\,\mathbb{P}(\mathrm{d}x),

where the integral is the Bochner integral [28,29]. We note that the results derived in this paper are universal for all probability spaces. Hence, we will omit the subscript P of the expectation. If we consider a sample space Ω1×Ω2 with joint distribution P, then we denote the conditional expectation of Z with respect to the first space Ω1 by E1[Z] ≔ ∫_{x1∈Ω1} Z(x1, X2) P1(dx1), where P1(dx1) = ∫_{x2∈Ω2} P(dx1, dx2) is the marginal distribution on Ω1.

Let U, W be real Banach spaces. The Fréchet derivative of a function L : U → W at a point X ∈ U, if it exists,1 is the unique linear mapping DL[X] : U → W such that

\|L(X+E) - L(X) - \mathrm{D}L[X](E)\|_W = o(\|E\|_U),

where ∥⋅∥U (resp. ∥⋅∥W) is the norm on U (resp. W). The notation DL[X](E) is then interpreted as ‘the Fréchet derivative of L at X in the direction E’. The partial Fréchet derivative of multivariate functions can be defined as follows. Let U, V and W be real Banach spaces, and let L : U×V → W. For a fixed v0 ∈ V, L(u,v0) is a function of u whose derivative at u0, if it exists, is called the partial Fréchet derivative of L with respect to u, and is denoted by DuL[u0,v0]. The partial Fréchet derivative DvL[u0,v0] is defined similarly. Likewise, the mth Fréchet derivative DmL[X] is the unique multi-linear map from U^m ≔ U×⋯×U (m times) to W that satisfies

\|\mathrm{D}^{m-1}L[X+E_m](E_1,\ldots,E_{m-1}) - \mathrm{D}^{m-1}L[X](E_1,\ldots,E_{m-1}) - \mathrm{D}^m L[X](E_1,\ldots,E_m)\|_W = o(\|E_m\|_U)

for each Ei ∈ U, i = 1,…,m. The Fréchet derivative enjoys several properties analogous to those of standard derivatives. We provide these facts in appendix A.
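As a quick sanity check on the definition (not part of the paper), one can compare a finite-difference quotient against the known closed form of the Fréchet derivative of the matrix square, DΦ[A](E) = AE + EA for Φ(u) = u²; the dimension and random matrices below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
A = rng.standard_normal((d, d)); A = (A + A.T) / 2      # self-adjoint base point
E = rng.standard_normal((d, d)); E = (E + E.T) / 2      # self-adjoint direction

t = 1e-6
finite_diff = ((A + t * E) @ (A + t * E) - A @ A) / t   # (Phi(A+tE) - Phi(A)) / t
closed_form = A @ E + E @ A                             # DPhi[A](E) for Phi(u)=u^2

print(np.linalg.norm(finite_diff - closed_form))        # O(t), essentially zero
```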

A function f : I → ℝ is called operator convex if, for each A, B ∈ Msa(I) and 0≤t≤1,

t\,f(A) + (1-t)\,f(B) \succeq f(tA + (1-t)B).

Similarly, a function f : I → ℝ is called operator monotone if, for each A, B ∈ Msa(I),

A \preceq B \;\Longrightarrow\; f(A) \preceq f(B).

(a). Classical Φ-entropy functionals

Let (C1) denote the class of functions Φ : [0,∞) → ℝ that are continuous, convex on [0,∞), twice differentiable on (0,∞), and either Φ is affine or Φ′′ is strictly positive and 1/Φ′′ is concave.

Definition 2.1 (Classical Φ-entropies) —

Let Φ : [0,∞) → ℝ be a convex function. For every non-negative integrable random variable Z, such that E|Z| < ∞ and E|Φ(Z)| < ∞, the classical Φ-entropy HΦ(Z) is defined as

H_\Phi(Z) = \mathbb{E}\,\Phi(Z) - \Phi(\mathbb{E}Z).

In particular, we are interested in Z=f(X1,…,Xn), where X1,…,Xn are independent random variables and f≥0 is a measurable function.

We say HΦ(Z) is subadditive [9] if

H_\Phi(Z) \le \sum_{i=1}^{n} \mathbb{E}\bigl[H_\Phi^{(i)}(Z)\bigr],

where HΦ(i)(Z) = EiΦ(Z) − Φ(EiZ) is the conditional Φ-entropy, and Ei denotes the conditional expectation conditioned on the n−1 random variables X−i ≔ (X1,…,Xi−1,Xi+1,…,Xn). Sometimes, we also denote HΦ(i)(Z) by HΦ(Z|Xi).

It is a well-known result that, for any function Φ∈(C1), HΦ(Z) is subadditive [11, corollary 3] (see also [13, section 3]).

Theorem 2.2 establishes equivalent characterizations of classical Φ-entropies.

Theorem 2.2 [18, theorem 4.4] —

The following statements are equivalent:

  • (a) Φ∈(C1): Φ is affine, or Φ′′>0 and 1/Φ′′ is concave;

  • (b) Brégman divergence (u,v)↦Φ(u+v)−Φ(u)−Φ′(u)v is convex;

  • (c) (u,v)↦(Φ′(u+v)−Φ′(u))v is convex;

  • (d) (u,v)↦Φ′′(u)v2 is convex;

  • (e) Φ is affine or Φ′′>0 and Φ′′′′Φ′′≥2Φ′′′2;

  • (f) (u,v)↦tΦ(u)+(1−t)Φ(v)−Φ(tu+(1−t)v) is convex for any 0≤t≤1;

  • (g) E1HΦ(Z|X1) ≥ HΦ(E1Z);

  • (h) {HΦ(Z)}Φ∈(C1) forms a convex set;

  • (i) HΦ(Z) = supT>0(E[(Φ′(T) − Φ′(ET))(Z − T)] + HΦ(T)); and

  • (j) HΦ(Z) ≤ ∑_{i=1}^{n} EHΦ(i)(Z).
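As a small illustration of conditions (a) and (e) (not from the paper), a symbolic computation confirms that the representative entropy Φ(u) = u log u satisfies Φ′′′′Φ′′ = 2Φ′′′², i.e. (e) with equality, while 1/Φ′′ = u is trivially concave, so Φ ∈ (C1).

```python
import sympy as sp

u = sp.symbols('u', positive=True)
Phi = u * sp.log(u)

Phi2 = sp.diff(Phi, u, 2)   # 1/u
Phi3 = sp.diff(Phi, u, 3)   # -1/u**2
Phi4 = sp.diff(Phi, u, 4)   # 2/u**3

print(sp.simplify(Phi4 * Phi2 - 2 * Phi3**2))   # 0, i.e. Phi'''' Phi'' = 2 Phi'''^2
print(sp.diff(1 / Phi2, u, 2))                  # 0, so 1/Phi'' is (trivially) concave
```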

3. Equivalent characterizations of matrix Φ-entropy functionals

Here, we first introduce matrix Φ-entropy functionals, and present the main result (theorem 3.3) of this section, namely new characterizations of the matrix Φ-entropy functionals.

Chen & Tropp [19] introduced the class of matrix Φ-entropies and proved its subadditivity in 2014. Here, we will show that all equivalent characterizations of classical Φ-entropies in theorem 2.2 have a one-to-one correspondence for the class of matrix Φ-entropies.

Let d be a natural number. The class Φd contains each function Φ : (0,∞) → ℝ that is either affine or satisfies the following three conditions.

  • (i) Φ is convex and continuous at zero.

  • (ii) Φ is twice continuously differentiable.

  • (iii) Define Ψ(t) = Φ′(t) for t>0. The Fréchet derivative DΨ[A] of the standard matrix function Ψ : Md++ → Mdsa is an invertible linear map for each A ∈ Md++, and the map A↦(DΨ[A])−1 is concave with respect to the Löwner partial ordering on positive definite matrices.

Define (C2) ≔ ⋂_{d=1}^{∞} Φd.

Definition 3.1 (Matrix Φ-entropy functional [19]) —

Let Φ : [0,∞) → ℝ be a convex function. Consider a random matrix Z ∈ Md+ with E∥Z∥ < ∞ and E∥Φ(Z)∥ < ∞. The matrix Φ-entropy HΦ(Z) is defined as

H_\Phi(Z) := \bar{\mathrm{tr}}\bigl[\mathbb{E}\,\Phi(Z) - \Phi(\mathbb{E}Z)\bigr].

The corresponding conditional matrix Φ-entropy can be defined with respect to a sub-σ-algebra.
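For concreteness, the following sketch (illustration only, not from the paper) evaluates the matrix Φ-entropy of definition 3.1 for Φ(u) = u log u on a uniform finite ensemble of random positive definite matrices, computing the standard matrix function through an eigendecomposition; the dimension and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 3, 50                                   # dimension, number of sample points

def phi_matrix(M):
    """Standard matrix function u -> u*log(u) via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * (w * np.log(w))) @ V.conj().T

# Z takes m equally likely positive definite values.
samples = []
for _ in range(m):
    G = rng.standard_normal((d, d))
    samples.append(G @ G.T + 0.1 * np.eye(d))  # positive definite

E_phi_Z = sum(phi_matrix(S) for S in samples) / m
E_Z = sum(samples) / m
H_phi = np.trace(E_phi_Z - phi_matrix(E_Z)).real / d   # normalized trace

print(H_phi, H_phi >= 0)                       # matrix Phi-entropy is non-negative
```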

Theorem 3.2 (Subadditivity of the matrix Φ-entropy functional [19, theorem 2.5]) —

Let Φ∈(C2), and assume Z is a measurable function of (X1,…,Xn). We have

H_\Phi(Z) \le \sum_{i=1}^{n} \mathbb{E}\bigl[H_\Phi^{(i)}(Z)\bigr], \quad (3.1)

where HΦ(i)(Z) = EiΦ(Z) − Φ(EiZ) is the conditional entropy, and Ei denotes the conditional expectation conditioned on the n−1 random matrices X−i ≔ (X1,…,Xi−1,Xi+1,…,Xn).

Theorem 3.3 is the main result of this section. We show that all the equivalent conditions in theorem 2.2 also hold for the class of matrix Φ-entropy functionals. Hence, we have a more comprehensive understanding of the class of matrix Φ-entropy functionals.

Theorem 3.3 —

The following statements are equivalent:

  • (a) Φ∈(C2): Φ is affine or DΨ is invertible and A↦(DΨ[A])−1 is operator concave;

  • (b) matrix Brégman divergence: (A,B)↦Tr[Φ(A+B)−Φ(A)−DΦ[A](B)] is convex;

  • (c) (A,B)↦Tr[DΦ[A+B](B)−DΦ[A](B)] is convex;

  • (d) (A,B)↦Tr[D2Φ[A](B,B)] is convex;

  • (e) Φ is affine or Φ′′>0 and
    \mathrm{Tr}\bigl[h\,(\mathrm{D}\Psi[A])^{-1}\bigl(\mathrm{D}^{3}\Psi[A](k,k,(\mathrm{D}\Psi[A])^{-1}(h))\bigr)\bigr] \ge 2\,\mathrm{Tr}\bigl[h\,(\mathrm{D}\Psi[A])^{-1}\bigl(\mathrm{D}^{2}\Psi[A](k,(\mathrm{D}\Psi[A])^{-1}(\mathrm{D}^{2}\Psi[A](k,(\mathrm{D}\Psi[A])^{-1}(h))))\bigr)\bigr], \quad (3.2)
    for each A ≻ 0 and h, k ∈ Mdsa;
  • (f) (A,B)↦ Tr[tΦ(A)+(1−t)Φ(B)−Φ(tA+(1−t)B)] is convex for any 0≤t≤1;

  • (g) E1HΦ(Z|X1) ≥ HΦ(E1Z);

  • (h) {HΦ(Z)}Φ∈(C2) forms a convex set of convex functions;

  • (i) HΦ(Z) = supT≻0{tr¯E[(Φ′(T) − Φ′(ET))(Z − T)] + HΦ(T)}; and

  • (j) HΦ(Z) ≤ ∑_{i=1}^{n} EHΦ(i)(Z).

We note that the chain of implications (a)⇒(i)⇒(g)⇒(j) was proved by Chen & Tropp [19]. The equivalence (a)⇔(b) was shown in [21, theorem 2]. Hansen and Zhang [20] established the equivalence of item (a) and the convexity of the following map:

(A, X) \mapsto \langle X, \mathrm{D}\Phi'[A](X)\rangle. \quad (3.3)

From lemma A.5, it is not hard to observe that equation (3.3) is equivalent to item (d), i.e.

\mathrm{Tr}\bigl(\mathrm{D}^2\Phi[A](X,X)\bigr) = \langle X, \mathrm{D}\Phi'[A](X)\rangle.

We provide the detailed proof of the remaining equivalence statements in §6.

4. Operator-valued Φ-entropies

Here, we extend the notion of matrix Φ-entropy functionals (i.e. real-valued) to operator-valued Φ-entropies.

Definition 4.1 (Operator-valued entropy class) —

Let d be a natural number. The class Φd contains each function Φ : [0,∞) → ℝ such that its second-order Fréchet derivative exists and the following map satisfies the joint convexity (under the Löwner partial ordering) condition:

(A, X) \mapsto \mathrm{D}^2\Phi[A](X,X), \quad A \in \mathcal{M}_d^{\mathrm{sa}},\ X \in \mathcal{M}_d. \quad (4.1)

We denote the class of operator-valued Φ-entropies (C3) ≔ ⋂_{d=1}^{∞} Φd.

Definition 4.2 (Operator-valued Φ-entropies) —

Let Φ : [0,∞) → ℝ be a convex function. Consider a random matrix Z taking values in M+, with E∥Z∥ < ∞ and E∥Φ(Z)∥ < ∞. That is, the random matrix Z and Φ(Z) are Bochner integrable [28,29] (hence, EZ and EΦ(Z) exist and are well defined). The operator-valued Φ-entropy HΦ is defined as

H_\Phi(Z) := \mathbb{E}\,\Phi(Z) - \Phi(\mathbb{E}Z). \quad (4.2)

The corresponding conditional terms can be defined with respect to a sub-σ-algebra.
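A minimal sketch (assuming a finite, uniformly weighted ensemble stands in for the random matrix Z): for Φ(u) = u², the operator-valued Φ-entropy is the operator-valued variance Var(Z) = EZ² − (EZ)², and, since the square function is operator convex, it is positive semi-definite in the Löwner order.

```python
import numpy as np

rng = np.random.default_rng(3)
d, m = 4, 30
samples = []
for _ in range(m):
    G = rng.standard_normal((d, d))
    samples.append(G @ G.T)                    # positive semi-definite values of Z

E_Z = sum(samples) / m
E_Z2 = sum(S @ S for S in samples) / m
var_Z = E_Z2 - E_Z @ E_Z                       # H_{u^2}(Z) in operator form

print(bool(np.linalg.eigvalsh(var_Z).min() >= -1e-10))   # Loewner non-negativity
```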

It is worth mentioning that the matrix Φ-entropy functional [19] in §3 is non-negative for every convex function Φ due to the fact that the trace function tr¯Φ is also convex [32] (or see, for example, [33, section 2.2]). However, according to the operator Jensen inequality [34, theorem 3.2], only the operator convex function Φ ensures that the operator-valued Φ-entropy is non-negative.

In the following, we show that the entropy class (C3) is not an empty set.

Proposition 4.3 —

The square function Φ(u)=u2 belongs to (C3).

Proof. —

It suffices to verify the joint convexity of the map

\mathrm{D}^2\Phi[A](X,X) = 2X^2 \quad \text{for all } A, X \in \mathcal{M}^+,

where we use the identity of the second-order Fréchet derivative [35, example X.4.6] of the square function. Because the square function is operator convex, Φ(u)=u2 belongs to the operator-valued Φ-entropy class (C3). ▪
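The identity used in the proof can also be checked numerically (illustration only, not from the paper): a central second difference of Φ(A) = A² reproduces D²Φ[A](X,X) = 2X² at an arbitrary base point A.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 4
A = rng.standard_normal((d, d)); A = A @ A.T          # positive semi-definite base point
X = rng.standard_normal((d, d)); X = (X + X.T) / 2    # self-adjoint direction

t = 1e-4
phi = lambda M: M @ M
second_diff = (phi(A + t * X) - 2 * phi(A) + phi(A - t * X)) / t**2

print(np.linalg.norm(second_diff - 2 * X @ X))        # ~0 (exact up to round-off)
```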

(a). Subadditivity of operator-valued Φ-entropies

Denote by X ≔ (X1,…,Xn) a sequence of independent random variables taking values in a Polish space, and let

X_{-i} := (X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n).

Let a positive semi-definite matrix Z that depends on the sequence of random variables X be defined as

Z := Z(X_1,\ldots,X_n) \in \mathcal{M}^+.

Throughout this paper, we assume that the random matrix Z satisfies the integrability conditions: ∥Z∥, ∥Z∥2 and ∥Φ(Z)∥ are Bochner integrable for Φ∈(C3).

Theorem 4.4 (Subadditivity of the operator-valued Φ-entropy) —

Fix a function Φ∈(C3). Under the prevailing assumptions,

H_\Phi(Z) \preceq \sum_{i=1}^{n} \mathbb{E}\bigl[H_\Phi^{(i)}(Z)\bigr], \quad (4.3)

where HΦ(i)(Z) = HΦ(Z|Xi) ≔ EiΦ(Z) − Φ(EiZ).

The proof is given in §7.

(b). Equivalent characterizations of operator-valued Φ-entropies

Here, we derive alternative characterizations of the class (C3) in theorem 4.5. As an application of the entropy class, we show that if the function Φ belongs to (C3), then the operator-valued Φ-entropy is monotone under any unital completely positive map.

Theorem 4.5 —

The following statements are equivalent:

  • (a) Φ∈(C3): convexity of (A,B)↦D2Φ[A](B,B);

  • (b) operator-valued Brégman divergence: (A,B)↦Φ(A+B)−Φ(A)−DΦ[A](B) is convex;

  • (c) (A,B)↦DΦ[A+B](B)−DΦ[A](B) is convex;

  • (d) (A,B)↦D2Φ[A](B,B) is convex;

  • (e) convexity of (A,B)↦ tΦ(A)+(1−t)Φ(B)−Φ(tA+(1−t)B) for any 0≤t≤1;

  • (f) E1HΦ(Z|X1) ⪰ HΦ(E1Z);

  • (g) {HΦ(Z)}Φ∈(C3) forms a convex set of convex functions;

  • (h) HΦ(Z) = supT≻0{E[DΦ[T](Z−T) − DΦ[ET](Z−T)] + HΦ(T)}; and

  • (i) HΦ(Z) ⪯ ∑_{i=1}^{n} EHΦ(i)(Z).

The proof is omitted, because it directly follows from that of theorem 3.3 without taking traces.

Remark 4.6 —

In item (h) of theorem 4.5, we introduce a supremum representation for the operator-valued Φ-entropies. The supremum is defined as the least upper bound (under Löwner partial ordering) among the set of operators. In general, the supremum might not exist owing to the matrix partial ordering; however, the supremum in (h) exists and is attained when T ≡ Z.

In the following, we demonstrate a monotone property of operator-valued Φ-entropies when Φ∈(C3).

Proposition 4.7 (Monotonicity of operator-valued Φ-entropies) —

Fix a convex function Φ∈(C3), then the operator-valued Φ-entropy HΦ(Z) is monotone under any unital completely positive map N, i.e.

H_\Phi(\mathcal{N}(Z)) \preceq H_\Phi(Z)

for any random matrix Z taking values in M+.

Proof. —

If Φ∈(C3), by item (e) in theorem 4.5, then we have the joint convexity of the map:

F_t(A,B) := t\,\Phi(A) + (1-t)\,\Phi(B) - \Phi(tA + (1-t)B)

for any 0≤t≤1. Let X=(A,B) denote the pair of matrices.

Any completely positive unital map N can be expressed in the following form [36]:

\mathcal{N}(A) = \sum_i K_i\, A\, K_i^\dagger,

where ∑i KiKi† = I (the identity matrix in Msa), and the dagger denotes the Hermitian adjoint. Hence, by the operator Jensen inequality, proposition A.7 yields

F_t(\mathcal{N}(X)) \preceq F_t(X), \quad 0 \le t \le 1

for any completely positive unital map N, which implies the monotonicity of HΦ(Z). ▪

Following the same argument, the matrix Φ-entropy functional satisfies the monotone property if Φ∈(C2).

Corollary 4.8 (Monotonicity of matrix Φ-entropy functionals) —

Fix a convex function Φ∈(C2), then the matrix Φ-entropy functional HΦ(Z) is monotone under any unital completely positive map N: HΦ(N(Z))≤HΦ(Z) for any random matrix Z taking values in M+.

We remark that the monotonicity of the Φ-entropy of a quantum ensemble was previously known only for Φ(x) = x log x. This is the famous result in quantum information theory, namely the monotone property of the Holevo quantity [37]. Our corollary 4.8 extends this monotonicity to any function Φ∈(C2).
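The following sketch (not from the paper) illustrates corollary 4.8 numerically: a mixed-unitary map N(A) = ∑j pj Uj A Uj† is unital and completely positive, so the matrix Φ-entropy of a finite ensemble of density matrices should not increase under N. The ensemble, weights and unitaries are random choices made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
d, m = 3, 40

def phi_matrix(M, eps=1e-12):                     # u -> u*log(u) as a matrix function
    w, V = np.linalg.eigh(M)
    w = np.clip(w, eps, None)
    return (V * (w * np.log(w))) @ V.conj().T

def H(ensemble):                                  # tr_bar[E Phi(Z) - Phi(E Z)]
    E_phi = sum(phi_matrix(S) for S in ensemble) / len(ensemble)
    E_Z = sum(ensemble) / len(ensemble)
    return np.trace(E_phi - phi_matrix(E_Z)).real / d

# Finite ensemble of density matrices with uniform prior.
ensemble = []
for _ in range(m):
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    rho = G @ G.conj().T
    ensemble.append(rho / np.trace(rho).real)

# Mixed-unitary (hence unital, completely positive) map built from 3 unitaries.
ps = np.array([0.5, 0.3, 0.2])
Us = [np.linalg.qr(rng.standard_normal((d, d))
                   + 1j * rng.standard_normal((d, d)))[0] for _ in ps]
N = lambda A: sum(p * U @ A @ U.conj().T for p, U in zip(ps, Us))

print(bool(H([N(S) for S in ensemble]) <= H(ensemble) + 1e-10))   # monotonicity
```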

5. Applications: operator Efron–Stein inequality

Here, we employ the operator subadditivity of H_{u↦u²}(Z) to prove the operator Efron–Stein inequality. For 1 ≤ i ≤ n, let X1′,…,Xn′ be independent copies of X1,…,Xn, and denote X̃(i) ≔ (X1,…,Xi−1,Xi′,Xi+1,…,Xn), i.e. replacing the ith component of X by the independent copy Xi′.

Define the quantity2

\mathcal{E}(L) := \tfrac{1}{2}\,\mathbb{E}\Bigl[\sum_{i=1}^{n}\bigl(L(X) - L(\tilde{X}^{(i)})\bigr)^2\Bigr],

and denote the operator-valued variance of a random matrix A (taking values in Msa) by

\mathrm{Var}(A) := \mathbb{E}(A - \mathbb{E}A)^2 = \mathbb{E}A^2 - (\mathbb{E}A)^2.

Theorem 5.1 (Operator Efron–Stein inequality) —

With the prevailing assumptions, we have

\mathrm{Var}(Z) \preceq \mathcal{E}(Z).

Proof. —

Theorem 5.1 is a direct consequence of the subadditivity of operator-valued Φ-entropies, namely theorem 4.4 with Φ(u)=u2.

For two independent and identically distributed random matrices A, B, direct calculation yields

\tfrac{1}{2}\,\mathbb{E}\bigl[(A-B)^2\bigr] = \tfrac{1}{2}\,\mathbb{E}\bigl[A^2 - AB - BA + B^2\bigr] = \mathrm{Var}(A).

Observe that Zi′ is an independent copy of Z conditioned on X−i. Denote Var(i)(Z) ≔ Ei(Z − EiZ)2 for all i=1,…,n. Then

\mathrm{Var}^{(i)}(Z) = \tfrac{1}{2}\,\mathbb{E}_i\bigl[(Z - Z_i')^2\bigr].

Finally, theorem 4.4 and proposition 4.3 lead to

\mathrm{Var}(Z) = H_{u\mapsto u^2}(Z) \preceq \sum_{i=1}^{n} \mathbb{E}\bigl[H_{u\mapsto u^2}^{(i)}(Z)\bigr] = \sum_{i=1}^{n} \mathbb{E}\bigl[\mathrm{Var}^{(i)}(Z)\bigr] = \mathcal{E}(Z).

 ▪
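As a numerical illustration of theorem 5.1 (not from the paper), the following sketch checks Var(Z) ⪯ E(Z) exactly for a small example: n = 3 independent Bernoulli(1/2) variables and a hand-picked positive semi-definite Z(x), with both sides computed by exhaustive enumeration rather than sampling.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(6)
d, n = 3, 3
A = [rng.standard_normal((d, d)) for _ in range(n)]
A = [a @ a.T for a in A]                                  # fixed PSD coefficients

def Z(x):                                                 # PSD and nonlinear in x
    S = sum(xi * Ai for xi, Ai in zip(x, A)) + np.eye(d)
    return S @ S

outcomes = list(product([0, 1], repeat=n))                # uniform over {0,1}^n
EZ  = sum(Z(x) for x in outcomes) / len(outcomes)
EZ2 = sum(Z(x) @ Z(x) for x in outcomes) / len(outcomes)
var = EZ2 - EZ @ EZ                                       # Var(Z)

# E(Z) = (1/2) * sum_i E[(Z(X) - Z(X with X_i resampled))^2], exact enumeration.
ES = np.zeros((d, d))
for x in outcomes:
    for i in range(n):
        for xi_new in (0, 1):
            x_tilde = list(x); x_tilde[i] = xi_new
            D = Z(x) - Z(tuple(x_tilde))
            ES += D @ D / (2 * len(outcomes) * 2)         # 1/2 * joint uniform weight

print(bool(np.linalg.eigvalsh(ES - var).min() >= -1e-10))  # Var(Z) <= E(Z) in Loewner order
```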

Note that the established operator Efron–Stein inequality leads directly to a matrix polynomial Efron–Stein inequality.

Corollary 5.2 (Matrix polynomial Efron–Stein) —

With the prevailing assumptions, for each natural number p≥1, we have

\bigl\|\mathbb{E}(Z - \mathbb{E}Z)^2\bigr\|_p^p \le \Bigl\|\tfrac{1}{2}\sum_{i=1}^{n}\mathbb{E}\bigl[(Z - Z_i')^2\bigr]\Bigr\|_p^p.

Corollary 5.2 is a variant of the matrix polynomial Efron–Stein inequality derived in [25, theorem 4.2].

6. Proof of theorem 3.3

Proof. —

(a)⇒(i)⇒(g)⇒(j) This statement is proved by Chen & Tropp in [19].

(a)⇔(b) This equivalent statement is proved in [21, theorem 2].

(a)⇔(d) Theorem 2.1 in [20] proved the equivalence of (a) and the following convexity lemma.

Lemma 6.1 (Convexity lemma [19, lemma 4.2]) —

Fix a function Φ∈(C2), and let Ψ=Φ′. Suppose that A is a random matrix taking values in Md++, and let X be a random matrix taking values in Mdsa. Assume that ∥A∥ and ∥X∥ are integrable. Then

\mathbb{E}\,\langle X, \mathrm{D}\Psi[A](X)\rangle \ge \langle \mathbb{E}X, \mathrm{D}\Psi[\mathbb{E}A](\mathbb{E}X)\rangle.

What remains is to establish equivalence between the convexity lemma and condition (d). This follows easily from lemma A.5,

\mathrm{Tr}\bigl(\mathrm{D}^2\Phi[A](X,X)\bigr) = \langle X, \mathrm{D}\Phi'[A](X)\rangle.

Remark 6.2 —

In [19, lemma 4.2], it is shown that the concavity of the map,

A \mapsto \langle X, (\mathrm{D}\Psi[A])^{-1}(X)\rangle, \quad X \in \mathcal{M}_d^{\mathrm{sa}},

implies the joint convexity of the map (i.e. lemma 6.1),

(X, A) \mapsto \langle X, \mathrm{D}\Psi[A](X)\rangle. \quad (6.1)

(b)⇔(c)⇔(d) Define AΦ, BΦ, CΦ : Md+ × Md+ → ℝ as

A_\Phi(u,v) := \mathrm{Tr}\bigl[\Phi(u+v) - \Phi(u) - \mathrm{D}\Phi[u](v)\bigr], \quad B_\Phi(u,v) := \mathrm{Tr}\bigl[\mathrm{D}\Phi[u+v](v) - \mathrm{D}\Phi[u](v)\bigr], \quad C_\Phi(u,v) := \mathrm{Tr}\bigl[\mathrm{D}^2\Phi[u](v,v)\bigr].

Following from [18], we can establish the following relations: for any (u,v) ∈ Md+ × Md+,

A_\Phi(u,v) = \int_0^1 (1-s)\,C_\Phi(u+sv,\,v)\,\mathrm{d}s \quad (6.2)

and

B_\Phi(u,v) = \int_0^1 C_\Phi(u+sv,\,v)\,\mathrm{d}s, \quad (6.3)

and, for small enough ϵ>0,

A_\Phi(u,\epsilon v) = \tfrac{1}{2}\,C_\Phi(u,v)\,\epsilon^2 + o(\epsilon^2) \quad (6.4)

and

B_\Phi(u,\epsilon v) = C_\Phi(u,v)\,\epsilon^2 + o(\epsilon^2). \quad (6.5)

Equation (6.2) is exactly the integral representation for the matrix Brégman divergence proved in [21]. Similarly, equation (6.3) follows from

B_\Phi(u,v) = \frac{\mathrm{d}}{\mathrm{d}s}\mathrm{Tr}[\Phi(u+sv)]\Big|_{s=1} - \frac{\mathrm{d}}{\mathrm{d}s}\mathrm{Tr}[\Phi(u+sv)]\Big|_{s=0} = \int_0^1 \frac{\mathrm{d}}{\mathrm{d}s}\Bigl(\frac{\mathrm{d}}{\mathrm{d}s}\mathrm{Tr}[\Phi(u+sv)]\Bigr)\mathrm{d}s = \int_0^1 C_\Phi(u+sv,\,v)\,\mathrm{d}s.

Equations (6.4) and (6.5) can be obtained by Taylor expansion at (u,0). That is,

A_\Phi(u,\epsilon v) = A_\Phi(u,0) + \mathrm{D}_u A_\Phi[u,0](0) + \mathrm{D}_v A_\Phi[u,0](\epsilon v) + \tfrac{1}{2}\bigl(\mathrm{D}_u^2 A_\Phi[u,0](0,0) + 2\,\mathrm{D}_u\mathrm{D}_v A_\Phi[u,0](0,\epsilon v) + \mathrm{D}_v^2 A_\Phi[u,0](\epsilon v,\epsilon v)\bigr) + o(\epsilon^2) = \mathrm{Tr}\bigl[\mathrm{D}\Phi[u+0](\epsilon v) - \mathrm{D}\Phi[u](\epsilon v) + \tfrac{1}{2}\mathrm{D}^2\Phi[u+0](\epsilon v,\epsilon v)\bigr] + o(\epsilon^2) = \tfrac{1}{2}\,C_\Phi(u,v)\,\epsilon^2 + o(\epsilon^2).

Following the same argument,

B_\Phi(u,\epsilon v) = B_\Phi(u,0) + \mathrm{Tr}\bigl[\mathrm{D}^2\Phi[u+0](0,\epsilon v) + \mathrm{D}\Phi[u+0](\epsilon v) - \mathrm{D}\Phi[u](\epsilon v)\bigr] + \tfrac{1}{2}\mathrm{Tr}\bigl[\mathrm{D}^3\Phi[u+0](0,\epsilon v,\epsilon v) + 2\,\mathrm{D}^2\Phi[u+0](\epsilon v,\epsilon v)\bigr] + o(\epsilon^2) = C_\Phi(u,v)\,\epsilon^2 + o(\epsilon^2).

We can observe from equations (6.2) and (6.3) that the joint convexity of (u,v)↦AΦ(u,v) and (u,v)↦BΦ(u,v) follows from that of (u,v)↦CΦ(u,v). In other words, we proved that conditions (d)⇒(b) and (d)⇒(c).

Conversely, equations (6.4) and (6.5) show that (b)⇒(d) and condition (c)⇒(d). To be more specific, the joint convexity of (u,v)↦AΦ(u,ϵv) implies

t\,A_\Phi(u_1,\epsilon v_1) + (1-t)\,A_\Phi(u_2,\epsilon v_2) \ge A_\Phi(u,\epsilon v), \quad (6.6)

for each u1, u2, v1, v2 ∈ Md+, t∈[0,1], ϵ>0, and u ≔ tu1+(1−t)u2, v ≔ tv1+(1−t)v2. Invoking equation (6.4) gives

t\,A_\Phi(u_1,\epsilon v_1) + (1-t)\,A_\Phi(u_2,\epsilon v_2) = \frac{t\,C_\Phi(u_1,v_1) + (1-t)\,C_\Phi(u_2,v_2)}{2}\,\epsilon^2 + o(\epsilon^2)

and

A_\Phi(u,\epsilon v) = \tfrac{1}{2}\,C_\Phi(u,v)\,\epsilon^2 + o(\epsilon^2).

Hence, equation (6.6) is equivalent to

t\,C_\Phi(u_1,v_1)\,\epsilon^2 + (1-t)\,C_\Phi(u_2,v_2)\,\epsilon^2 + o(\epsilon^2) \ge C_\Phi(u,v)\,\epsilon^2 + o(\epsilon^2).

The joint convexity of (u,v)↦CΦ(u,v) follows by dividing both sides by ϵ2 and letting ϵ → 0. The implication (c)⇒(d) follows in a similar way from the joint convexity of (u,v)↦BΦ(u,ϵv), using equation (6.5).

(a)⇔(e) It is trivial if Φ is affine; hence, we assume Φ′′>0. We start from the concavity of the map

A \mapsto \mathrm{Tr}\bigl[h\,(\mathrm{D}\Psi[A])^{-1}(h)\bigr], \quad \text{for all } h \in \mathcal{M}_d^{\mathrm{sa}}. \quad (6.7)

To ease the burden of notation, we denote TA ≔ DΨ[A] ∈ ℂ^{d²×d²} and ĥ ≔ h ∈ ℂ^{d²×1} by the isometric isomorphism between super-operators and matrices. Then, equation (6.7) can be rewritten as

A \mapsto \hat{h}^\dagger\, T_A^{-1}\,\hat{h}, \quad \text{for all } \hat{h} \in \mathbb{C}^{d^2\times 1},

which is equivalent to the non-positivity of the second derivative (see proposition A.2),

\mathrm{D}_A^2\bigl[\hat{h}^\dagger T_A^{-1}\hat{h}\bigr](k,k) = \hat{h}^\dagger\,\mathrm{D}_A^2\bigl[T_A^{-1}\bigr](k,k)\,\hat{h} \le 0, \quad \text{for all } A \succ 0,\ \hat{h} \in \mathbb{C}^{d^2\times 1},\ k \in \mathcal{M}_d^{\mathrm{sa}}.

Now, recall the chain rule of the Fréchet derivative in proposition A.1,

\mathrm{D}(F\circ G)[A](u) = \mathrm{D}F[G(A)]\bigl(\mathrm{D}G[A](u)\bigr); \quad \mathrm{D}^2(F\circ G)[A](u,v) = \mathrm{D}^2F[G(A)]\bigl(\mathrm{D}G[A](u),\,\mathrm{D}G[A](v)\bigr) + \mathrm{D}F[G(A)]\bigl(\mathrm{D}^2G[A](u,v)\bigr),

and the formula of the differentiation of the inverse function (see lemma A.6),

\mathrm{D}G[A]^{-1}(u) = -\,G(A)^{-1}\,\mathrm{D}G[A](u)\,G(A)^{-1}; \quad \mathrm{D}^2G[A]^{-1}(u,u) = 2\,G(A)^{-1}\mathrm{D}G[A](u)\,G(A)^{-1}\mathrm{D}G[A](u)\,G(A)^{-1} - G(A)^{-1}\,\mathrm{D}^2G[A](u,u)\,G(A)^{-1},

we can compute the following identities by taking G(A) ≔ TA and u ≔ k:

\mathrm{D}_A\bigl[T_A^{-1}\bigr](k) = -\,T_A^{-1}\,\mathrm{D}_A[T_A](k)\,T_A^{-1}

and

\mathrm{D}_A^2\bigl[T_A^{-1}\bigr](k,k) = 2\,T_A^{-1}\mathrm{D}_A[T_A](k)\,T_A^{-1}\mathrm{D}_A[T_A](k)\,T_A^{-1} - T_A^{-1}\,\mathrm{D}_A^2[T_A](k,k)\,T_A^{-1}.

Therefore, we reach the expression (3.2), and statement (a) is true if and only if (3.2) holds. Recall that, in the scalar case (i.e. d=1), the Fréchet derivative can be expressed as the product of the differential and the direction [38, theorem 3.11]

\mathrm{D}\Psi[a](h) = \Psi'(a)\,h.

Hence, equation (3.2) reduces to

h\,(\Psi'(a))^{-1}\,\Psi'''(a)\,k^2\,(\Psi'(a))^{-1}\,h = \frac{\Phi''''(a)\,k^2 h^2}{\Phi''(a)^2} \;\ge\; 2\,h\,(\Psi'(a))^{-1}\Psi''(a)\,k\,(\Psi'(a))^{-1}\Psi''(a)\,k\,(\Psi'(a))^{-1}\,h = \frac{2\,\Phi'''(a)^2\,k^2 h^2}{\Phi''(a)^3},

for all a>0 and h,kR. In other words, equation (3.2) can be viewed as a non-commutative generalization of the classical statement: Φ′′′′Φ′′≥2Φ′′′2.

(d)⇔(f) For any t∈[0,1], define Ft : Md+ × Md+ → Mdsa as

F_t(X,Y) := t\,\Phi(X) + (1-t)\,\Phi(Y) - \Phi(tX + (1-t)Y).

By taking x≡(X,Y) and h≡(h,k) in proposition A.2, the convexity of the twice Fréchet differentiable function Ft is equivalent to

\mathrm{D}^2F_t[X,Y](h,k) \succeq 0 \quad \text{for all } X, Y \in \mathcal{M}_d^+ \text{ and } h, k \in \mathcal{M}_d^{\mathrm{sa}}.

Then, with the help of the partial Fréchet derivative defined in proposition A.3, the second-order Fréchet derivative of Ft(X,Y) can be evaluated as

\mathrm{D}^2F_t[X,Y](h,k) = \mathrm{D}_X^2F_t[X,Y](h,h) + \mathrm{D}_Y\mathrm{D}_XF_t[X,Y](h,k) + \mathrm{D}_X\mathrm{D}_YF_t[X,Y](k,h) + \mathrm{D}_Y^2F_t[X,Y](k,k) = t\,\mathrm{D}^2\Phi[X](h,h) - t^2\,\mathrm{D}^2\Phi[tX+(1-t)Y](h,h) - t(1-t)\,\mathrm{D}^2\Phi[tX+(1-t)Y](h,k) - t(1-t)\,\mathrm{D}^2\Phi[tX+(1-t)Y](k,h) + (1-t)\,\mathrm{D}^2\Phi[Y](k,k) - (1-t)^2\,\mathrm{D}^2\Phi[tX+(1-t)Y](k,k). \quad (6.8)

Taking trace on both sides of (6.8) and invoking lemma A.5, we have

\mathrm{Tr}\bigl[\mathrm{D}^2F_t[X,Y](h,k)\bigr] = \mathrm{Tr}\bigl[t\,h\,\mathrm{D}\Psi[X](h) - t^2\,h\,\mathrm{D}\Psi[tX+(1-t)Y](h)\bigr] - \mathrm{Tr}\bigl[t(1-t)\,h\,\mathrm{D}\Psi[tX+(1-t)Y](k) + t(1-t)\,k\,\mathrm{D}\Psi[tX+(1-t)Y](h)\bigr] + \mathrm{Tr}\bigl[(1-t)\,k\,\mathrm{D}\Psi[Y](k) - (1-t)^2\,k\,\mathrm{D}\Psi[tX+(1-t)Y](k)\bigr]. \quad (6.9)

Because both the trace and the second-order Fréchet derivative are bilinear, we have the following result:

\mathrm{Tr}\bigl[t^2\,h\,\mathrm{D}\Psi[tX+(1-t)Y](h) + t(1-t)\,k\,\mathrm{D}\Psi[tX+(1-t)Y](h)\bigr] = \langle th, \mathrm{D}\Psi[tX+(1-t)Y](th)\rangle + \langle (1-t)k, \mathrm{D}\Psi[tX+(1-t)Y](th)\rangle = \langle th+(1-t)k,\ \mathrm{D}\Psi[tX+(1-t)Y](th)\rangle. \quad (6.10)

Similarly,

\mathrm{Tr}\bigl[t(1-t)\,h\,\mathrm{D}\Psi[tX+(1-t)Y](k) + (1-t)^2\,k\,\mathrm{D}\Psi[tX+(1-t)Y](k)\bigr] = \langle th+(1-t)k,\ \mathrm{D}\Psi[tX+(1-t)Y]((1-t)k)\rangle. \quad (6.11)

Combining equations (6.10) and (6.11), equation (6.9) can be expressed as

\mathrm{Tr}\bigl[\mathrm{D}^2F_t[X,Y](h,k)\bigr] = t\,\langle h, \mathrm{D}\Psi[X](h)\rangle + (1-t)\,\langle k, \mathrm{D}\Psi[Y](k)\rangle - \langle th+(1-t)k,\ \mathrm{D}\Psi[tX+(1-t)Y](th+(1-t)k)\rangle.

Then, it is not hard to observe that the non-negativity of Tr[D2Ft[X,Y](h,k)] for every X, Y ∈ Md+, h, k ∈ Mdsa and t∈[0,1] is equivalent to the joint convexity of the map

(X, A) \mapsto \langle X, \mathrm{D}\Psi[A](X)\rangle = \mathrm{Tr}\bigl[\mathrm{D}^2\Phi[A](X,X)\bigr].

(j)⇒(g) Considering n=2, the subadditivity means that

H_\Phi(Z) \le \mathbb{E}_1 H_\Phi^{(2)}(Z) + \mathbb{E}_2 H_\Phi^{(1)}(Z).

Then, we have

\mathbb{E}_1 H_\Phi^{(2)}(Z) \ge H_\Phi(Z) - \mathbb{E}_2 H_\Phi^{(1)}(Z) = \mathbb{E}\,\Phi(Z) - \Phi(\mathbb{E}Z) - \mathbb{E}_2\mathbb{E}_1\Phi(Z) + \mathbb{E}_2\Phi(\mathbb{E}_1 Z) = \mathbb{E}_2\Phi(\mathbb{E}_1 Z) - \Phi(\mathbb{E}_2\mathbb{E}_1 Z) = H_\Phi(\mathbb{E}_1 Z).

(f)⇔(h) Let s∈[0,1]. Define a pair of positive semi-definite random matrices (X,Y) taking values (x,y) with probability s and (x′,y′) with probability (1−s). Then the convexity of HΦ implies that

H_\Phi(tX + (1-t)Y) \le t\,H_\Phi(X) + (1-t)\,H_\Phi(Y) \quad (6.12)

for every t∈[0,1]. Now, define Ft : Md+ × Md+ → ℝ as

F_t(u,v) := \mathrm{Tr}\bigl[t\,\Phi(u) + (1-t)\,\Phi(v) - \Phi(tu + (1-t)v)\bigr].

Then, it follows that

s\,F_t(x,y) + (1-s)\,F_t(x',y') - F_t\bigl(s(x,y) + (1-s)(x',y')\bigr) = t\,\mathbb{E}\Phi(X) - t\,\Phi(\mathbb{E}X) + (1-t)\,\mathbb{E}\Phi(Y) - (1-t)\,\Phi(\mathbb{E}Y) - \mathbb{E}\Phi(tX+(1-t)Y) + \Phi(t\,\mathbb{E}X + (1-t)\,\mathbb{E}Y) = t\,H_\Phi(X) + (1-t)\,H_\Phi(Y) - H_\Phi(tX+(1-t)Y),

which means that the convexity of the pair (u,v)↦Ft(u,v) is equivalent to the convexity of HΦ, i.e. equation (6.12).

(g)⇔(h) Define a positive semi-definite random matrix Z ≔ f(X1,X2), which depends on two random variables X1, X2 on a Polish space. Denote by ZX1 the random matrix Z conditioned on X1. According to the convexity of HΦ, it follows that

\mathbb{E}_1 H_\Phi(Z|X_1) = \mathbb{E}_1 H_\Phi(Z_{X_1}) = \mathbb{E}_1\bigl[\bar{\mathrm{tr}}\bigl(\mathbb{E}_2\Phi(Z_{X_1}) - \Phi(\mathbb{E}_2 Z_{X_1})\bigr)\bigr] \ge \bar{\mathrm{tr}}\,\mathbb{E}_2\Phi(\mathbb{E}_1 Z_{X_1}) - \bar{\mathrm{tr}}\bigl[\Phi(\mathbb{E}_1\mathbb{E}_2 Z_{X_1})\bigr] = H_\Phi(\mathbb{E}_1 Z).

Conversely, define a positive semi-definite random matrix Z(s,X,Y) ≔ sX + (1−s)Y, where s is a random variable. Now, let s be Bernoulli distributed with parameter t∈[0,1]. Then, for all t∈[0,1], the inequality E1HΦ(Z|s) ≥ HΦ(E1Z) coincides with

H_\Phi(tX + (1-t)Y) \le t\,H_\Phi(X) + (1-t)\,H_\Phi(Y).

 ▪

7. Proof of theorem 4.4

Our approach to proving operator subadditivity (theorem 4.4) parallels [19, theorem 2.5] and [13, section 3.1]. The strategy is as follows. First, we prove the supremum representation for the operator-valued Φ-entropies in §7a. Second, we establish a conditional operator Jensen inequality in §7b. Finally, we arrive at the proof of theorem 4.4 in §7c.

(a). Representation of operator-valued Φ-entropy

Theorem 7.1 (Supremum representation for operator-valued Φ-entropies) —

Fix a function Φ∈(C3). Assume Z ∈ Md++ is a random positive definite matrix for which ∥Z∥ and ∥Φ(Z)∥ are Bochner integrable. Then the operator-valued Φ-entropy can be represented as

H_\Phi(Z) = \sup_{T\succ 0}\,\mathbb{E}\bigl[\mathrm{D}\Phi[T](Z-T) - \mathrm{D}\Phi[\mathbb{E}T](Z-T) + \Phi(T) - \Phi(\mathbb{E}T)\bigr]. \quad (7.1)

The range of the supremum contains each random positive definite matrix T for which ∥T∥ and ∥Φ(T)∥ are Bochner integrable. In particular, the operator-valued Φ-entropy can be written in the dual form

H_\Phi(Z) = \sup_{T\succ 0}\,\mathbb{E}\bigl[\Upsilon_1(T,Z) + \Upsilon_2(T)\bigr], \quad (7.2)

where Υ1(T,Z) ≔ DΦ[T](Z) − DΦ[ET](Z) is linear in Z and Υ2(T) ≔ −DΦ[T](T) + DΦ[ET](T) + (Φ(T) − Φ(ET)).

Proof. —

Observe that, when T=Z, the right-hand side of equation (7.1) equals HΦ(Z). Then, it remains to confirm the inequality

H_\Phi(Z) \succeq \mathbb{E}\bigl[\mathrm{D}\Phi[T](Z-T) - \mathrm{D}\Phi[\mathbb{E}T](Z-T) + \Phi(T) - \Phi(\mathbb{E}T)\bigr] \quad (7.3)

for each random positive definite matrix T that satisfies the integrability conditions. We follow the interpolation argument as in [19, lemma 4.1]. For s∈[0,1], define the matrix-valued function

F(s) = \mathbb{E}\bigl[\mathrm{D}\Phi[T_s](Z-T_s) - \mathrm{D}\Phi[\mathbb{E}T_s](Z-T_s)\bigr] + H_\Phi(T_s),

where

T_s := (1-s)\,Z + s\,T \quad \text{for } s \in [0,1].

Note that F(0)=HΦ(Z), and F(1) matches the right-hand side of equation (7.3). As a result, it suffices to show that F′(s) ⪯ 0 for s∈[0,1] in order to verify equation (7.3). By the replacement Z − Ts = −s⋅(T − Z), the function F(s) can be rephrased as

F(s) = -s\,\mathbb{E}\bigl[\mathrm{D}\Phi[T_s](T-Z) - \mathrm{D}\Phi[\mathbb{E}T_s](T-Z)\bigr] + \mathbb{E}\bigl[\Phi(T_s) - \Phi(\mathbb{E}T_s)\bigr].

Differentiate the above function to arrive at

F'(s) = -s\,\mathbb{E}\bigl[\mathrm{D}^2\Phi[T_s](T-Z,\,T-Z)\bigr] + s\,\mathbb{E}\bigl[\mathrm{D}^2\Phi[\mathbb{E}T_s](T-Z,\,\mathbb{E}(T-Z))\bigr] - \mathbb{E}\bigl[\mathrm{D}\Phi[T_s](T-Z) - \mathrm{D}\Phi[\mathbb{E}T_s](T-Z)\bigr] + \mathbb{E}\bigl[\mathrm{D}\Phi[T_s](T-Z) - \mathrm{D}\Phi[\mathbb{E}T_s](T-Z)\bigr] \quad (7.4a)
= -s\,\mathbb{E}\bigl[\mathrm{D}^2\Phi[T_s](T-Z,\,T-Z)\bigr] + s\,\mathrm{D}^2\Phi[\mathbb{E}T_s]\bigl(\mathbb{E}(T-Z),\,\mathbb{E}(T-Z)\bigr), \quad (7.4b)

where we cancel the last two terms in equation (7.4a), and the second equality (7.4b) follows from the bilinearity of the second-order Fréchet differentiation.

Invoking the joint convexity of the function D2Φ[Ts](T−Z, T−Z) in its two arguments (see equation (4.1)), we establish that the above derivative is negative semi-definite, i.e. F′(s) ⪯ 0 for s∈[0,1], and thus complete the proof. ▪

(b). A conditional operator Jensen inequality

Lemma 7.2 (Conditional operator Jensen inequality for operator-valued Φ-entropy) —

Suppose that (X1,X2) is a pair of independent random matrices taking values in a Polish space, and let Z=Z(X1,X2) be a positive definite random matrix for which ∥Z∥ and ∥Φ(Z)∥ are Bochner integrable. Then

H_\Phi(\mathbb{E}_1 Z) \preceq \mathbb{E}\,H_\Phi(Z|X_1),

where E1 is the expectation with respect to the first matrix X1.

Proof. —

Let E2 refer to the expectation with respect to the second matrix X2. In the following, we use T(X2) to emphasize that the matrix T depends only on the randomness in X2. Recall the supremum representation, equation (7.2); we have

H_\Phi(\mathbb{E}_1 Z) = \sup_T \mathbb{E}_2\bigl[\Upsilon_1(T(X_2), \mathbb{E}_1 Z) + \Upsilon_2(T(X_2))\bigr] = \sup_T \mathbb{E}_1\mathbb{E}_2\bigl[\Upsilon_1(T(X_2), Z) + \Upsilon_2(T(X_2))\bigr] \preceq \mathbb{E}_1 \sup_T \mathbb{E}_2\bigl[\Upsilon_1(T(X_2), Z) + \Upsilon_2(T(X_2))\bigr] = \mathbb{E}_1 \sup_T \mathbb{E}\bigl[\Upsilon_1(T(X_2), Z) + \Upsilon_2(T(X_2)) \,\big|\, X_1\bigr] = \mathbb{E}_1 H_\Phi(Z|X_1).

The second relation follows from the Fubini theorem to interchange the order of E1 and E2. In the third line, we use the convexity of the supremum. (Note that it is not always true under partial ordering. However, it holds in our case, because the supremum is attained when T ≡ E1Z in the second line.) The last identity is exactly the supremum representation equation (7.2) in the conditional form. ▪

It is worth emphasizing that the conditional Jensen inequality can also be achieved by item (d) in theorem 4.5 (cf. (f)⇔(g)⇔(h) in theorem 3.3).

(c). Subadditivity of operator-valued Φ-entropies

Now we are in a position to prove the subadditivity of the operator-valued Φ-entropies.

Proof. —

By adding and subtracting the term Φ(E1Z), the operator-valued Φ-entropy can be expressed as

H_\Phi(Z) = \mathbb{E}\bigl[\Phi(Z) - \Phi(\mathbb{E}_1 Z) + \Phi(\mathbb{E}_1 Z) - \Phi(\mathbb{E}Z)\bigr] = \mathbb{E}\bigl[\mathbb{E}_1\Phi(Z) - \Phi(\mathbb{E}_1 Z)\bigr] + \bigl[\mathbb{E}\,\Phi(\mathbb{E}_1 Z) - \Phi(\mathbb{E}\mathbb{E}_1 Z)\bigr] = \mathbb{E}\,H_\Phi(Z|X_1) + H_\Phi(\mathbb{E}_1 Z) \preceq \mathbb{E}\,H_\Phi(Z|X_1) + \mathbb{E}_1 H_\Phi(Z|X_1), \quad (7.5)

where the last inequality results from lemma 7.2, because X1 is independent from X−1.

Following the same reasoning, we obtain the operator-valued Φ-entropy conditioned on X1,

H_\Phi(Z|X_1) \preceq \mathbb{E}\bigl[H_\Phi(Z|X_2)\,\big|\,X_1\bigr] + \mathbb{E}_2 H_\Phi(Z|X_1,X_2).

By plugging the expression into equation (7.5), we get

H_\Phi(Z) \preceq \sum_{i=1}^{2} \mathbb{E}\,H_\Phi(Z|X_i) + \mathbb{E}_1\mathbb{E}_2 H_\Phi(Z|X_1,X_2).

Finally, by repeating this procedure, we achieve the subadditivity of the operator-valued Φ-entropy

H_\Phi(Z) \preceq \sum_{i=1}^{n} \mathbb{E}\bigl[H_\Phi(Z|X_i)\bigr],

which completes our claim. ▪

8. Conclusion

In this paper, we extend the results of Chen & Tropp [19], Pitrik & Virosztek [21] and Hansen & Zhang [20] to complete the characterizations of the matrix Φ-entropy functionals. Moreover, we generalize the matrix Φ-entropy functionals to the operator-valued Φ-entropies, and show that this generalization preserves the subadditivity property. Additionally, we prove that the set of operator-valued Φ-entropies is not empty and contains at least the square function. Equivalent characterizations of the operator-valued Φ-entropies are also derived. This result demonstrates that the subadditivity of HΦ(Z) is equivalent to the operator convexity of HΦ(Z) on the convex cone of positive semi-definite operators. Finally, we exploit the subadditivity to prove the operator Efron–Stein inequality. It is promising that the proposed results can also be used to derive the matrix exponential Efron–Stein inequality (cf. [25, theorem 4.3]) and moment inequalities for random matrices; see [13] and [15, ch. 15].

The subadditivity of matrix Φ-entropies leads to a series of important inequalities: matrix Poincaré inequalities with respect to binomial and Gaussian distributions, and the related matrix logarithmic Sobolev inequalities [22]. In [23], the subadditivity and the operator Efron–Stein inequality can be exploited to estimate the mixing time of a quantum random graph. It enables us to better understand the dynamics and long-term behaviours of a quantum system undergoing Markovian processes. We believe the proposed results will lead to more matrix functional inequalities, and have a substantial impact on operator algebra and quantum information science.

Finally, we remark that the results of operator-valued Φ-entropies and the operator Efron–Stein inequalities hold in the infinite-dimensional setting. This is not hard to verify, because the tools (such as Fréchet derivatives) employed in the proofs hold in the infinite dimension.

Acknowledgements

H.-C.C. sincerely thanks Marco Tomamichel for the helpful discussion about the operator-valued Φ-entropies.

Appendix A. Miscellaneous lemmas

Proposition A.1 (Properties of Fréchet derivatives [38, theorem 3.4]) —

Let U, V and W be real Banach spaces. Let L1 : U → V and L2 : V → W be Fréchet differentiable at A ∈ U and L1(A), respectively, and let L = L2∘L1 (i.e. L(A) = L2(L1(A))). Then L is Fréchet differentiable at A and DL[A](E) = DL2[L1(A)](DL1[A](E)).

Proposition A.2 (Convexity of twice Fréchet differentiable matrix functions [39, proposition 2.2]) —

Let U be an open convex subset of a real Banach space, and let W also be a real Banach space. Then a twice Fréchet differentiable function L : U → W is convex if and only if D2L[X](h,h) ⪰ 0 for each X ∈ U and each direction h in the underlying Banach space.

Proposition A.3 (Partial Fréchet derivative [40, proposition 5.3.15]) —

If L : U×V → W is Fréchet differentiable at (X,Y) ∈ U×V, then the partial Fréchet derivatives DXL[X,Y] and DYL[X,Y] exist, and

DL[X,Y](h,k)=DXL[X,Y](h)+DYL[X,Y](k).

Proposition A.4 [41, theorem 2.2] —

Let A, X ∈ Msa and t ∈ ℝ. Assume f : I → ℝ is a continuously differentiable function defined on an interval I, and assume that the eigenvalues of A+tX lie in I. Then

\frac{\mathrm{d}}{\mathrm{d}t}\,\mathrm{Tr}\,f(A+tX)\Big|_{t=t_0} = \mathrm{Tr}\bigl[X\,f'(A+t_0X)\bigr].

Proposition A.4 directly leads to the following lemma.

Lemma A.5 —

Let A, X, Y ∈ Msa and t ∈ ℝ. Assume f : I → ℝ is a continuously differentiable function defined on an interval I, and assume that the eigenvalues of A+tX lie in I. Then

\mathrm{Tr}\bigl(\mathrm{D}^2 f[A](X,Y)\bigr) = \langle X, \mathrm{D}f'[A](Y)\rangle = \langle Y, \mathrm{D}f'[A](X)\rangle.

Lemma A.6 (Second-order Fréchet derivative of inversion function) —

Let G : M → M be second-order Fréchet differentiable at A ∈ M, and let G(A) be invertible. Then, for each h, k ∈ M, we have

\mathrm{D}G[A]^{-1}(h) = -\,G(A)^{-1}\,\mathrm{D}G[A](h)\,G(A)^{-1}; \quad \mathrm{D}^2G[A]^{-1}(h,k) = G(A)^{-1}\mathrm{D}G[A](h)\,G(A)^{-1}\mathrm{D}G[A](k)\,G(A)^{-1} + G(A)^{-1}\mathrm{D}G[A](k)\,G(A)^{-1}\mathrm{D}G[A](h)\,G(A)^{-1} - G(A)^{-1}\,\mathrm{D}^2G[A](h,k)\,G(A)^{-1}.

Proof. —

Denote by F : A ↦ A−1 the inversion function. Recall the chain rule of the Fréchet derivative,

\mathrm{D}(F\circ G)[A](h) = \mathrm{D}F[G(A)]\bigl(\mathrm{D}G[A](h)\bigr); \quad \mathrm{D}^2(F\circ G)[A](h,k) = \mathrm{D}^2F[G(A)]\bigl(\mathrm{D}G[A](h),\,\mathrm{D}G[A](k)\bigr) + \mathrm{D}F[G(A)]\bigl(\mathrm{D}^2G[A](h,k)\bigr).

Applying the formulae of the Fréchet derivative of the inversion function (see, for example, [35, example X.4.2] and [42, exercise 3.27]),

\mathrm{D}[X]^{-1}(Y) = -\,X^{-1}\,Y\,X^{-1}, \quad \mathrm{D}^2[X]^{-1}(Y_1,Y_2) = X^{-1}Y_1X^{-1}Y_2X^{-1} + X^{-1}Y_2X^{-1}Y_1X^{-1},

concludes the desired results. ▪
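A finite-difference sanity check (illustration only, not from the paper) of the first formula above: for the inversion map X ↦ X⁻¹, the derivative in direction Y is −X⁻¹YX⁻¹; the matrices below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
d = 4
X = rng.standard_normal((d, d)); X = X @ X.T + np.eye(d)   # invertible base point
Y = rng.standard_normal((d, d)); Y = (Y + Y.T) / 2          # direction

t = 1e-6
Xinv = np.linalg.inv(X)
finite_diff = (np.linalg.inv(X + t * Y) - Xinv) / t
closed_form = -Xinv @ Y @ Xinv

print(np.linalg.norm(finite_diff - closed_form))            # O(t), essentially zero
```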

Proposition A.7 (Operator Jensen inequality [34,4345]) —

Let (Ω,Σ) be a measurable space and suppose that I ⊂ ℝ is an open interval. Assume that, for every x ∈ Ω, K(x) is a (finite or infinite dimensional) square matrix and satisfies

\int_{x\in\Omega} K(\mathrm{d}x)^\dagger\, K(\mathrm{d}x) = I

(identity matrix in Msa). If f : Ω → Msa is a measurable function for which σ(f(x)) ⊂ I, for every x ∈ Ω, then

\phi\Bigl(\int_{x\in\Omega} K(\mathrm{d}x)^\dagger\, f(x)\, K(\mathrm{d}x)\Bigr) \preceq \int_{x\in\Omega} K(\mathrm{d}x)^\dagger\, \phi(f(x))\, K(\mathrm{d}x)

for every operator convex function ϕ : I → ℝ. Moreover,

\mathrm{Tr}\Bigl[\phi\Bigl(\int_{x\in\Omega} K(\mathrm{d}x)^\dagger f(x)\, K(\mathrm{d}x)\Bigr)\Bigr] \le \mathrm{Tr}\Bigl[\int_{x\in\Omega} K(\mathrm{d}x)^\dagger\, \phi(f(x))\, K(\mathrm{d}x)\Bigr]

for every convex function ϕ : I → ℝ.

Footnotes

1

We assume that the functions considered in the paper are Fréchet differentiable. The reader can refer to, for example, [30,31] for conditions when a function is Fréchet differentiable.

2

Note that we will use notation E(L) and E(Z) interchangeably.

Data accessibility

This work does not have any experimental data.

Authors' contributions

Both authors contributed equally to this paper.

Competing interests

We have no competing interests.

Funding

M.-H.H. is supported by an ARC Future Fellowship under grant no. FT140100574.

References


