Entropy. 2024 Mar 29;26(4):306. doi: 10.3390/e26040306

Structural Properties of the Wyner–Ziv Rate Distortion Function: Applications for Multivariate Gaussian Sources

Michail Gkagkos 1, Charalambos D Charalambous 2,*
Editor: Eduard Jorswieck
PMCID: PMC11048919  PMID: 38667860

Abstract

The main focus of this paper is the derivation of the structural properties of the test channels of Wyner's operational information rate distortion function (RDF), R̄(Δ_X), for arbitrary abstract sources and, subsequently, the derivation of additional properties for a tuple of multivariate correlated, jointly independent and identically distributed Gaussian random variables, {(X_t, Y_t)}_{t=1}^∞, X_t : Ω → R^{n_x}, Y_t : Ω → R^{n_y}, with average mean-square error at the decoder and the side information, {Y_t}_{t=1}^∞, available only at the decoder. For the tuple of multivariate correlated Gaussian sources, we construct optimal test channel realizations which achieve the informational RDF, R̄(Δ_X) = inf_{M(Δ_X)} I(X; Z|Y), where M(Δ_X) is the set of auxiliary RVs Z such that P_{Z|X,Y} = P_{Z|X}, X̂ = f(Y, Z), and E{‖X − X̂‖²} ≤ Δ_X. We show the following fundamental structural properties: (1) Optimal test channel realizations that achieve the RDF and satisfy the conditional independence P_{X|X̂,Y,Z} = P_{X|X̂,Y} = P_{X|X̂} and E[X|X̂, Y, Z] = E[X|X̂] = X̂. (2) Similarly, for the conditional RDF, R_{X|Y}(Δ_X), when the side information is available to both the encoder and the decoder, we show the equality R̄(Δ_X) = R_{X|Y}(Δ_X). (3) We derive the water-filling solution for R_{X|Y}(Δ_X).

Keywords: Wyner’s side information, multivariate Gaussian sources, test channel distributions

1. Introduction, Problem Statement, and Main Results

1.1. The Wyner and Ziv Lossy Compression Problem and Generalizations

Wyner and Ziv [1] derived an operational information definition for the lossy compression problem in Figure 1 with respect to a single-letter fidelity of reconstruction. The joint sequence of random variables (RVs) {(X_t, Y_t) : t = 1, 2, …} takes values in sets of finite cardinality, {X, Y}, and it is generated independently according to the joint probability distribution function P_{X,Y}. Wyner [2] generalized [1] to RVs {(X_t, Y_t) : t = 1, 2, …} that take values in abstract alphabet spaces {X, Y} and hence include continuous-valued RVs.

Figure 1. The Wyner and Ziv [1] block diagram of lossy compression. If switch A is closed, then the side information is available at both the encoder and the decoder; if switch A is open, the side information is only available at the decoder.

(A) Switch “A” Closed: When the side information {Y_t : t = 1, 2, …} is available non-causally at both the encoder and the decoder, Wyner [2] (see also Berger [3]) characterized the infimum of all achievable operational rates (denoted by R̄_1(Δ_X) in [2]), subject to a single-letter fidelity with average distortion less than or equal to Δ_X ∈ [0, ∞). The rate is given by the single-letter operational information theoretic conditional RDF:

R_{X|Y}(Δ_X) = inf_{M_0(Δ_X)} I(X; X̂|Y) ∈ [0, ∞], Δ_X ∈ [0, ∞) (1)
= inf_{P_{X̂|X,Y} : E{d_X(X, X̂)} ≤ Δ_X} I(X; X̂|Y), (2)

where M0(ΔX) is the set specified by

M_0(Δ_X) = {X̂ : Ω → X̂ : P_{X,Y,X̂} is the joint measure on X × Y × X̂, E{d_X(X, X̂)} ≤ Δ_X}, (3)

and X^ is the reproduction of X. I(X;X^|Y) is the conditional mutual information between X and X^ conditioned on Y, and dX(·,·) is the fidelity criterion between x and x^. The infimum in (1) is over all elements of M0(ΔX) with induced joint distributions PX,Y,X^ of the RVs (X,Y,X^) such that the marginal distribution PX,Y is the fixed joint distribution of the source (X,Y). This problem is equivalent to (2) [4].

(B) Switch “A” Open: When the side information is available non-causally only at the decoder, Wyner [2] characterized the infimum of all achievable operational rates (denoted by R̄(Δ_X) in [2]), subject to a single-letter fidelity with average distortion less than or equal to Δ_X. The rate is given by the single-letter operational information theoretic RDF as a function of an auxiliary RV Z : Ω → Z:

R̄(Δ_X) = inf_{M(Δ_X)} [I(X; Z) − I(Y; Z)] ∈ [0, ∞], Δ_X ∈ [0, ∞) (4)
= inf_{M(Δ_X)} I(X; Z|Y), (5)

where M(ΔX) is specified by the set of auxiliary RVs Z and defined as:

M(Δ_X) = {Z : Ω → Z : P_{X,Y,Z,X̂} is the joint measure on X × Y × Z × X̂, P_{Z|X,Y} = P_{Z|X}, ∃ measurable function f : Y × Z → X̂, X̂ = f(Y, Z), E{d_X(X, X̂)} ≤ Δ_X}. (6)

Wyner’s realization of the joint measure PX,Y,Z,X^ induced by the RVs (X,Y,Z,X^) is illustrated in Figure 2, where Z is the output of the “test channel”, PZ|X. Clearly, R¯(ΔX) involves two strategies, i.e., f(·,·) and PZ|X,Y=PZ|X. This makes it a much more complex problem compared to RX|Y(ΔX) (which involves only PX^|X,Y).

Figure 2. Test channel when side information is only available to the decoder.

Throughout [2], the following assumption is imposed.

Assumption 1. 

I(X; Y) < ∞ (see [2]).

Wyner [2] considered scalar-valued jointly Gaussian RVs (X, Y) with square-error distortion and constructed the optimal realizations X̂ and (Z, X̂) and the function f(Y, Z) from the sets M_0(Δ_X) and M(Δ_X), respectively. Also, it is shown that these realizations achieve the characterizations of the RDFs R_{X|Y}(Δ_X) and R̄(Δ_X), respectively, and that the two rates are equal, i.e., R̄(Δ_X) = R_{X|Y}(Δ_X).

(C) Marginal RDF: If there is no side information, {Y_t : t = 1, 2, …}, or the side information is independent of the source, {X_t : t = 1, 2, …}, the RDFs R_{X|Y}(Δ_X) and R̄(Δ_X) degenerate to the marginal RDF R_X(Δ_X), defined by

R_X(Δ_X) = inf_{P_{X̂|X} : E{d_X(X, X̂)} ≤ Δ_X} I(X; X̂) ∈ [0, ∞], Δ_X ∈ [0, ∞). (7)

(D) Gray’s Lower Bounds: A lower bound on RX|Y(ΔX) is given by Gray in [4] [Theorem 3.1]. This bound connects RX|Y(ΔX) with the marginal RDF and the mutual information between X and Y as follows:

R_{X|Y}(Δ_X) ≥ R_X(Δ_X) − I(X; Y). (8)

Clearly, the lower bound is trivial for values of Δ_X ∈ [0, ∞) such that R_X(Δ_X) − I(X; Y) < 0.

1.2. Main Contributions of the Paper

We first consider Wyner's [2] RDFs R_{X|Y}(Δ_X) and R̄(Δ_X) for arbitrary RVs (X, Y) defined on abstract alphabet spaces, and we derive structural properties of the realizations that achieve the two optimal test channels. Subsequently, we generalize Wyner's [2] results to multivariate-valued jointly Gaussian RVs (X, Y). In other words, we construct the optimal multivariate-valued realizations X̂ and (X̂, Z) and the function f(Y, Z) which achieve the RDFs R_{X|Y}(Δ_X) and R̄(Δ_X), respectively. In the literature, this is often called achievability of the converse coding theorem. In addition, we use the realizations to prove the equality R̄(Δ_X) = R_{X|Y}(Δ_X) and to derive the water-filling solution. Along the way, we verify that our results reproduce, for scalar-valued RVs (X, Y), Wyner's [2] RDFs and optimal realizations. However, to our surprise, the existing results from the literature [[5], Theorem 4 and Abstract, and [6], Theorem 3A], which deal with the more general multivariate-valued remote sensor problem (the RDF of the remote sensor problem is a generalization of Wyner's RDF R̄(Δ_X), with the encoder observing a noisy version of the RVs generated by the source), do not degenerate to Wyner's [2] RDFs when specialized to scalar-valued RVs (we verify this in Remark 5, also checking the correction suggested in https://tiangroup.engr.tamu.edu/publications/, accessed on 3 January 2024). In Section 1.3, we give a detailed account of the main results of this paper. We should emphasize that preliminary results of this paper appeared in [7], mostly without the details of the proofs. This paper extends [7] and contains complete proofs of the preliminary results of [7], which in some cases are lengthy (see, for example, Section 4, proofs of Theorems 3–5, Corollaries 1 and 2, etc.).

1.3. Problem Statement and Main Results

(a) We consider a tuple of jointly independent and identically distributed (i.i.d.) arbitrary RVs (X^n, Y^n) = {(X_t, Y_t) : t = 1, 2, …, n} defined on abstract alphabet spaces, and we derive the following results.

(a.1) Lemma 1: Achievable lower bound on the conditional mutual information I(X;X^|Y), which strengthens Gray’s lower bound (8) [[4], Theorem 3.1].

(a.2) Theorem 2: Structural properties of the optimal reconstruction X^, which achieves a lower bound on RX|Y(ΔX) for mean-square error distortion. Theorem 2 strengthens the conditions for the equality to hold, RX|Y(ΔX)=R¯(ΔX), given by Wyner [2] [Remarks, p. 65] (see Remark 1). However, for finite-alphabet-valued sources with Hamming distance distortion, it might be the case that RX|Y(ΔX)<R¯(ΔX), as pointed out by Wyner and Ziv [1] [Section 3] for the doubly symmetric binary source.

(b) We consider a tuple of jointly i.i.d. multivariate Gaussian RVs (Xn,Yn)={(Xt,Yt):t=1,2,,n}, with respect to the square-error fidelity, as defined below.

X_t : Ω → R^{n_x} = X, Y_t : Ω → R^{n_y} = Y, t = 1, 2, …, n, (9)
X_t ∼ N(0, Q_X), Y_t ∼ N(0, Q_Y), (10)
Q_{(X_t, Y_t)} = E{ [X_t; Y_t] [X_t; Y_t]^T } = [Q_X, Q_{X,Y}; Q_{X,Y}^T, Q_Y], (11)
P_{X_t, Y_t} = P_{X,Y} a multivariate Gaussian distribution, (12)
X̂_t : Ω → R^{n_x} = X̂, t = 1, 2, …, n, (13)
D_X(x^n, x̂^n) = (1/n) Σ_{t=1}^n ‖x_t − x̂_t‖²_{R^{n_x}}, (14)

where n_x, n_y are arbitrary positive integers, X ∼ N(0, Q_X) means X is a Gaussian RV with zero mean and covariance matrix Q_X, and ‖·‖²_{R^{n_x}} is the squared Euclidean distance on R^{n_x}. To give additional insight, we often consider the following realization of the side information (the condition DD^T ≻ 0 ensures I(X; Y) < ∞, and hence Assumption 1 is respected).

Y_t = C X_t + D V_t, (15)
V_t ∼ N(0, Q_V), (16)
C ∈ R^{n_y×n_x}, D ∈ R^{n_y×n_y}, DD^T ≻ 0, Q_V = I_{n_y}, (17)
V^n independent of X^n, (18)

where Iny denotes the ny×ny identity matrix. For the above specification of the source and distortion criterion, we derive the following results.

(b.1) Theorems 3 and 4: Structural properties of the optimal realization of X̂, which achieves R_{X|Y}(Δ_X), and its closed-form expression.

(b.2) Theorem 5: Structural properties of the optimal realization of (Z, X̂) with X̂ = f(Y, Z), which achieve R̄(Δ_X), and the closed-form expression of R̄(Δ_X).

(b.3) A proof that R̄(Δ_X) and R_{X|Y}(Δ_X) coincide, and calculation of the distortion region such that Gray's lower bound (8) holds with equality.

In Remark 4, we consider the tuple of scalar-valued, jointly Gaussian RVs (X,Y) with square error distortion function and verify that our optimal realizations of X^ and the closed form expressions for RX|Y(ΔX) and R¯(ΔX) are identical to Wyner’s [2] realizations and RDFs.

We should emphasize that our methodology is different from past studies in the sense that we focus on the structural properties of the realizations of the test channels, that achieve the characterizations of the two RDFs (i.e., verification of the converse coding theorem). Our derivations are generic and bring new insight into the construction of realizations that induce the optimal test channels of other distributed source coding problems (i.e., establishing the achievability of the converse coding theorem).

1.4. Additional Generalizations of the Wyner–Ziv [1] and Wyner [2] RDFs

Below, we discuss additional generalizations of Wyner and Ziv [1] and Wyner’s [2] RDFs.

(A) Draper and Wornell [8] Distributed Remote Source Coding Problem: Draper and Wornell [8] generalized the RDF R̄(Δ_X) to the case when the source to be estimated at the decoder, S : Ω → S, is not directly observed at the encoder. Rather, the encoder observes a RV X : Ω → X (which is correlated with S), while the decoder observes another RV as side information, Y : Ω → Y, which provides information on (S, X). The aim is to reconstruct S at the decoder by Ŝ : Ω → Ŝ, subject to an average distortion E{d_S(S, Ŝ)} ≤ Δ_S, by a function Ŝ = f(Y, Z). The RDF for this problem, called the distributed remote source coding problem, is defined by [8]

R̄^{PO}(Δ_S) = inf_{M^{PO}(Δ_S)} I(X; Z|Y) ∈ [0, ∞], (19)

where M^{PO}(Δ_S) is specified by the set of auxiliary RVs Z and defined as:

M^{PO}(Δ_S) = {Z : Ω → Z : P_{S,X,Y,Z,Ŝ} is the joint measure on S × X × Y × Z × Ŝ, P_{Z|S,X,Y} = P_{Z|X}, ∃ measurable function f^{PO} : Y × Z → Ŝ, Ŝ = f^{PO}(Y, Z), E{d_S(S, Ŝ)} ≤ Δ_S}. (20)

Clearly, if S = X a.s. (almost surely), then R̄^{PO}(Δ_S) degenerates to R̄(Δ_X) (this implies that the optimal test channel that achieves the characterization of the RDF R̄^{PO}(Δ_S) should degenerate to the optimal test channel that achieves the characterization of the RDF R̄(Δ_X)). For scalar-valued jointly Gaussian RVs (S, X, Y, Z, Ŝ) with square-error distortion, Draper and Wornell [8] [Equation (3) and Appendix A.1] derived the characterization of the RDF R̄^{PO}(Δ_S) and constructed the optimal realization Ŝ = f^{PO}(Y, Z), which achieves this characterization.

In [5,6], the authors investigated the RDF R̄^{PO}(Δ_S) of [8] for multivariate jointly Gaussian RVs (S, X, Y, Z, Ŝ) with square-error distortion and derived a characterization of the RDF R̄^{PO}(Δ_S) in [[5], Theorem 4] and [[6], Theorem 3A] (see [[6], Equation (26)]). However, it will become apparent in Remark 5 that, when S = X almost surely (a.s.), and hence R̄^{PO}(Δ_S) = R̄(Δ_X), the RDFs given in [[5], Theorem 4] and [[6], Theorem 3A] do not produce Wyner's [2] value. We also show in Remark 5 that the same technical issues occur for the correction suggested in https://tiangroup.engr.tamu.edu/publications/ (accessed on 3 January 2024). Similarly, when S = X a.s. and Y = X a.s., [[5], Theorem 4] and [[6], Theorem 3A] do not produce the classical RDF R_X(Δ_X) of the Gaussian source X.

(B) Additional Literature Review: The formulation of Figure 1 is generalized to other multiterminal or distributed lossy compression problems, such as relay networks, sensor networks, etc., under various code formulations and assumptions. Oohama [9] analyzed lossy compression problems for a tuple of scalar correlated Gaussian memoryless sources with square error distortion criterion. Also, he determined the rate-distortion region, in the special case when one source provides partial side information to the other source. Furthermore, Oohama in [10] analyzed separate lossy compression problems for L+1 scalar correlated Gaussian memoryless sources, when L of the sources provide partial side information at the decoder for the reconstruction of the remaining source and gave a partial answer to the rate distortion region. Additionally, ref. [10] proved that the problem of [10] includes, as a special case, the additive white Gaussian CEO problem analyzed by Viswanathan and Berger [11]. Extensions of [10] are derived by Ekrem and Ulukus [12] and Wang and Chen [13], where an outer bound on the rate region is derived for the vector Gaussian multiterminal source. Additional works are [14,15,16] and the references therein.

The vast literature on multiterminal or distributed lossy compression of jointly Gaussian sources with square-error distortion (including the references mentioned above) is often confined to scalar-valued correlated RVs. Moreover, as is easily verified, not much emphasis is given in the literature to the structural properties of the realizations of RVs that induce the optimal test channels achieving the characterizations of the RDFs.

The rest of the paper is organized as follows. In Section 2, we review Wyner’s [2] operational definition of lossy compression. We also state a fundamental theorem on mean-square estimation that we use throughout the paper regarding the analysis of (b). The main Theorems are presented in Section 3; some of the proofs, including the structural properties, are given in Section 4. Connections between our results and the past literature are provided in Section 5. A simulation to show the gap between the two rates is given in the same section.

2. Preliminaries

In this section, we review the Wyner [2] source coding problems with fidelity in Figure 1. We begin with the notation, which follows closely [2].

2.1. Notation

Let Z = {…, −1, 0, 1, …} denote the set of all integers, N = {0, 1, 2, …} the set of natural integers, and Z_+ = {1, 2, …} the set of positive integers. For n ∈ Z_+, denote the finite subset Z_n = {1, 2, …, n}. Denote the real numbers by R and the sets of nonnegative and strictly positive real numbers by R_+ = [0, ∞) and R_{++} = (0, ∞), respectively.

For any matrix A ∈ R^{p×m}, (p, m) ∈ Z_+ × Z_+, we denote its kernel by ker(A), its transpose by A^T, and, for m = p, its trace by trace(A); diag{A} denotes the matrix with diagonal entries A_{ii}, i ∈ Z_p, and zeros elsewhere. The determinant of a square matrix A is denoted by det(A). The identity matrix with dimensions p × p is designated I_p. Denote an arbitrary set or space by U and the product space formed by n copies of it by U^n = ×_{t=1}^n U; u^n ∈ U^n denotes an n-tuple u^n = (u_1, u_2, …, u_n), where u_k ∈ U, k ∈ Z_n, are its coordinates. Denote a probability space by (Ω, F, P). For a sub-σ-field G ⊆ F and A ∈ F, denote by P(A|G) the conditional probability of A given G; i.e., P(A|G) = P(A|G)(ω), ω ∈ Ω, is a measurable function on Ω.

On the above probability space, consider two real-valued random variables (RVs) X : Ω → X, Y : Ω → Y, where (X, B(X)), (Y, B(Y)) are arbitrary measurable spaces. The measure (or joint distribution if X, Y are Euclidean spaces) induced by (X, Y) on X × Y is denoted by P_{X,Y} or P(dx, dy), and their marginals on X and Y by P_X and P_Y, respectively. The conditional measure of RV X conditioned on Y is denoted by P_{X|Y}, or P(dx|y) when Y = y is fixed. On the above probability space, consider three real-valued RVs X : Ω → X, Y : Ω → Y, Z : Ω → Z. We say that the RVs (Y, Z) are conditionally independent given the RV X if P_{Y,Z|X} = P_{Y|X} P_{Z|X} a.s. (almost surely) or, equivalently, P_{Z|X,Y} = P_{Z|X} a.s.; the specification a.s. is often omitted. We often denote the above conditional independence by the Markov chain (MC) Y ↔ X ↔ Z.

Finally, for RVs X, Y, etc., H(X) denotes the differential entropy of X, H(X|Y) the conditional differential entropy of X given Y, and I(X; Y) the mutual information between X and Y, as defined in standard books on information theory [17,18]. We use log(·) to denote the natural logarithm. The notation X ∼ N(0, Q_X) means X is a Gaussian distributed RV with zero mean and covariance Q_X ⪰ 0, where Q_X ⪰ 0 (resp. Q_X ≻ 0) means Q_X is positive semidefinite (resp. positive definite). We denote the covariance of X and Y by

QX,Y=covX,Y. (21)

We denote the covariance of X conditioned on Y by

Q_{X|Y} = cov(X, X|Y) = E{(X − E[X|Y])(X − E[X|Y])^T} if (X, Y) is jointly Gaussian, (22)

where the second equality is due to a property of jointly Gaussian RVs.

2.2. Mean-Square Estimation of Conditionally Gaussian RVs

Below, we state a well-known property of conditionally Gaussian RVs from [19], which we use in our derivations.

Proposition 1. 

Conditionally Gaussian RVs [19]. Consider a pair of multivariate RVs X = (X_1, …, X_{n_x})^T : Ω → R^{n_x} and Y = (Y_1, …, Y_{n_y})^T : Ω → R^{n_y}, (n_x, n_y) ∈ Z_+ × Z_+, defined on some probability space (Ω, F, P). Let G ⊆ F be a sub-σ-algebra. Assume the conditional distribution of (X, Y) conditioned on G, i.e., P(dx, dy|G), is P-a.s. (almost surely) Gaussian, with conditional means

μX|G=EX|G,μY|G=EY|G, (23)

and conditional covariances

QX|G=covX,X|G,QY|G=covY,Y|G, (24)
QX,Y|G=covX,Y|G. (25)

Then, the vectors of conditional expectations μ_{X|Y,G} = E[X|Y, G] and the matrices of conditional covariances Q_{X|Y,G} = cov(X, X|Y, G) are given, P-a.s., by the following expressions (if Q_{Y|G} ≻ 0, then the inverse exists and the pseudoinverse is Q_{Y|G}^† = Q_{Y|G}^{-1}):

μ_{X|Y,G} = μ_{X|G} + Q_{X,Y|G} Q_{Y|G}^† (Y − μ_{Y|G}), (26)
Q_{X|Y,G} = Q_{X|G} − Q_{X,Y|G} Q_{Y|G}^† Q_{X,Y|G}^T. (27)

If G is the trivial information, i.e., G = {Ω, ∅}, then G is removed from the above expressions.

Note that, if G = {Ω, ∅}, then (26) and (27) reduce to the well-known conditional mean and conditional covariance of X conditioned on Y.
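Proposition 1 is straightforward to evaluate numerically. The following minimal Python sketch (ours, not part of [19]; all names are illustrative) computes (26) and (27) for the case of trivial G and applies them to the side-information model (15)–(18):

```python
import numpy as np

def conditional_mean_cov(mu_x, mu_y, Q_x, Q_y, Q_xy, y):
    """Equations (26)-(27) with trivial G: conditional mean and covariance
    of X given Y = y for jointly Gaussian (X, Y). The pseudoinverse
    reduces to the inverse when Q_y is positive definite."""
    Q_y_pinv = np.linalg.pinv(Q_y)
    mu_x_given_y = mu_x + Q_xy @ Q_y_pinv @ (y - mu_y)
    Q_x_given_y = Q_x - Q_xy @ Q_y_pinv @ Q_xy.T
    return mu_x_given_y, Q_x_given_y

# Illustrative use with the model (15)-(18): Y = C X + D V, V ~ N(0, I).
rng = np.random.default_rng(0)
nx, ny = 3, 2
A = rng.standard_normal((nx, nx)); Q_x = A @ A.T   # a valid Q_X
C = rng.standard_normal((ny, nx)); D = np.eye(ny)
Q_y = C @ Q_x @ C.T + D @ D.T                      # cf. (57)
Q_xy = Q_x @ C.T
mu, Q_cond = conditional_mean_cov(np.zeros(nx), np.zeros(ny),
                                  Q_x, Q_y, Q_xy, np.ones(ny))
```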

For Gaussian RVs, we make use of the following properties.

Proposition 2. 

Let X : Ω → R^n, n ∈ Z_+, X ∼ N(0, Q_X), Q_X ⪰ 0, S ∈ R^{n_1×n}, n_1 ∈ Z_+, and denote by F_X and F_{SX} the σ-algebras generated by the RVs X and SX, respectively. The following hold.

(a) F_{SX} ⊆ F_X.

(b) F_{SX} = F_X if and only if ker(Q_X) = ker(SQ_X).

Proof. 

This is well-known in measure theory, see [20]. □

Proposition 3. 

Let X : Ω → R^n, n ∈ Z_+, X ∼ N(0, Q_X), Q_X ⪰ 0, rank(Q_X) = n_1, n_1 ∈ Z_+, n_1 < n. Then, there exists a linear transformation S ∈ R^{n_1×n} such that, if X_1 : Ω → R^{n_1}, X_1 = SX, then X_1 ∼ N(0, Q_{X_1}), Q_{X_1} ≻ 0, and F_X = F_{X_1}.

Proof. 

This is well-known in probability theory, see [20]. □

2.3. Wyner’s Coding Theorems with Side Information at the Decoder

For the sake of completeness, we introduce certain results from Wyner's work in [2], which we use in this paper. On a probability space (Ω, F, P), consider a tuple of jointly i.i.d. RVs (X^n, Y^n) = {(X_t, Y_t) : t ∈ Z_n},

X_t : Ω → X, Y_t : Ω → Y, t ∈ Z_n, (28)

with induced distribution P_{X_t,Y_t} = P_{X,Y}, t ∈ Z_n. Consider also the measurable function d_X : X × X̂ → [0, ∞), for a measurable space X̂. Let

I_M = {0, 1, …, M − 1}, M ∈ Z_+, (29)

be a finite set.

A code (n,M,DX), when switch “A” is open (see Figure 1), is defined by two measurable functions, the encoder FE and the decoder FD, with average distortion, as follows.

F_E : X^n → I_M, F_D : I_M × Y^n → X̂^n, (30)
(1/n) E{Σ_{t=1}^n d_X(X_t, X̂_t)} = D_X, (31)

where X̂^n is again a sequence of RVs, X̂^n = F_D(F_E(X^n), Y^n) ∈ X̂^n. A non-negative rate distortion pair (R, Δ_X) is said to be achievable if, for every ε > 0 and n sufficiently large, there exists a code (n, M, D_X) such that

M ≤ 2^{n(R+ε)}, D_X ≤ Δ_X + ε. (32)

Let R denote the set of all achievable pairs (R, Δ_X), and define, for Δ_X ≥ 0, the infimum of all achievable rates by

R(Δ_X) = inf_{(R, Δ_X) ∈ R} R. (33)

If for some Δ_X there is no R < ∞ such that (R, Δ_X) ∈ R, then set R(Δ_X) = ∞. For arbitrary abstract spaces, Wyner [2] characterized the infimum of all achievable rates R(Δ_X) by the single-letter RDF R̄(Δ_X) given by (5) and (6), in terms of an auxiliary RV Z : Ω → Z. Wyner's realization of the joint measure P_{X,Y,Z,X̂} induced by the RVs (X, Y, Z, X̂) is illustrated in Figure 2, where Z is the output of the “test channel” P_{Z|X}. Wyner proved the following coding theorems.

Theorem 1. 

Wyner [[2], Theorems, pp. 64–65]. Suppose Assumption 1 holds.

(a) Converse Theorem. For any Δ_X ≥ 0, R(Δ_X) ≥ R̄(Δ_X).

(b) Direct Theorem. If the conditions stated in [2] (pages 64–65, (i), (ii)) hold, then R(Δ_X) ≤ R̄(Δ_X), 0 ≤ Δ_X < ∞.

In Figure 1, when switch A is closed and the tuple of jointly independent and identically distributed RVs (Xn,Yn) is defined as in Section 2.3, Wyner [2] generalized Berger’s [3] characterization of all achievable pairs (R,ΔX), from finite alphabet spaces to abstract alphabet spaces.

A code (n,M,DX), when switch “A” is closed, (see Figure 1), is defined as in Section 2.3, with the encoder FE, replaced by

F_E : X^n × Y^n → I_M. (34)

Let R_1 denote the set of all achievable pairs (R, Δ_X), again as defined in Section 2.3. For Δ_X ≥ 0, define the infimum of all achievable rates by

R̄_1(Δ_X) = inf_{(R, Δ_X) ∈ R_1} R. (35)

Wyner [2] characterized the infimum of all achievable rates R̄_1(Δ_X) by the single-letter RDF R_{X|Y}(Δ_X) given by (1) and (3). The coding theorems are given by Theorem 1 with R(Δ_X) and R̄(Δ_X) replaced by R̄_1(Δ_X) and R_{X|Y}(Δ_X), respectively; that is, R̄_1(Δ_X) = R_{X|Y}(Δ_X) (using Wyner's notation [[2], Appendix A.1]). These coding theorems generalized earlier work of Berger [3] for finite alphabet spaces. Wyner also derived a fundamental lower bound on R̄(Δ_X) in terms of R_{X|Y}(Δ_X), as stated in the next remark.

Remark 1. 

Wyner [[2], Remarks, p. 65]

(A) For Z ∈ M(Δ_X), X̂ = f(Y, Z) and P_{Z|X,Y} = P_{Z|X}. Then, by a property of conditional mutual information and the data processing inequality:

I(X; Z|Y) = I(X; Z, f(Y, Z)|Y) ≥ I(X; X̂|Y) ≥ R_{X|Y}(Δ_X), (36)

where the last inequality holds since X̂ ∈ M_0(Δ_X) (see [[2], Remarks, p. 65]). Moreover, minimizing (36) over Z ∈ M(Δ_X) gives

R̄(Δ_X) ≥ R_{X|Y}(Δ_X). (37)

(B) Inequality (37) holds with equality, i.e., R̄(Δ_X) = R_{X|Y}(Δ_X), if the X̂ ∈ M_0(Δ_X) that achieves I(X; X̂|Y) = R_{X|Y}(Δ_X) can be generated as in Figure 2 with I(X; Z|Y) = I(X; X̂|Y). This occurs if and only if I(X; Z|X̂, Y) = 0, and follows from the identity and lower bound

I(X; Z|Y) = I(X; Z, X̂|Y) = I(X; Z|Y, X̂) + I(X; X̂|Y) (38)
≥ I(X; X̂|Y), (39)

where the inequality holds with equality if and only if I(X; Z|X̂, Y) = 0.

3. Main Theorems and Discussion

In this section, we state the main results of this paper. These are the achievable lower bounds of Lemma 1 and Theorem 2, which hold for RVs defined on general abstract alphabet spaces, and Theorems 4 and 5, which hold for multivariate Gaussian RVs.

3.1. Side Information at Encoder and Decoder for an Arbitrary Source

We start with the following achievable lower bound on the conditional mutual information I(X;X^|Y), which appears in the definition of RX|Y(ΔX) of (1); this strengthens Gray’s lower bound (8) [[4], Theorem 3.1].

Lemma 1. 

Achievable lower bound on conditional mutual information. Let (X,Y,X^) be a triple of arbitrary RVs taking values in the abstract spaces X×Y×X^, with distribution PX,Y,X^ and joint marginal the fixed distribution PX,Y of (X,Y). Then, the following hold.

(a) The inequality holds:

I(X; X̂|Y) ≥ I(X; X̂) − I(X; Y). (40)

Moreover, the equality holds

I(X; X̂|Y) = I(X; X̂) − I(X; Y) ∈ [0, ∞), (41)

if and only if

P_{X|X̂,Y} = P_{X|X̂} a.s., or equivalently, Y ↔ X̂ ↔ X is a MC. (42)

(b) If Y ↔ X̂ ↔ X is a Markov chain, then the equality holds

R_{X|Y}(Δ_X) = R_X(Δ_X) − I(X; Y), ∀Δ_X ∈ D_C(X|Y), (43)

i.e., for all Δ_X that belong to the strictly positive set D_C(X|Y) ⊆ [0, ∞).

Proof. 

See Appendix A.1. □

The next theorem which holds for arbitrary RVs is further used to derive the characterization of RX|Y(ΔX) for multivariate Gaussian RVs.

Theorem 2. 

Achievable lower bound on conditional mutual information and mean-square error estimation

(a) Let (X,Y,X^) be a triple of arbitrary RVs on the abstract spaces X×Y×X^, with distribution PX,Y,X^ and joint marginal the fixed distribution PX,Y of (X,Y).

Define the conditional mean of X conditioned on (X^,Y) by

X̄^{cm} ≜ E[X|Y, X̂] = e(Y, X̂), (44)

for some measurable function e : Y × X̂ → X.

(1) The inequality holds:

I(X; X̂|Y) ≥ I(X; X̄^{cm}|Y). (45)

(2) The equality I(X; X̂|Y) = I(X; X̄^{cm}|Y) holds if any one of the conditions (i) or (ii) holds:

(i) X̄^{cm} = X̂ a.s.; (ii) for a fixed y ∈ Y, the function e(y, ·) : X̂ → X, e(y, x̂) = x̄^{cm}, uniquely defines x̂, (46)
i.e., e(y, ·) is an injective function on the support of x̂. (47)

(b) In part (a), let (X, Y, X̂) be a triple of arbitrary RVs on X × Y × X̂ = R^{n_x} × R^{n_y} × R^{n_x}, (n_x, n_y) ∈ Z_+ × Z_+.

For all measurable functions (y, x̂) ↦ g(y, x̂) ∈ R^{n_x}, the mean-square error satisfies

E‖X − g(Y, X̂)‖²_{R^{n_x}} ≥ E‖X − E[X|Y, X̂]‖²_{R^{n_x}}, ∀g(·). (48)

Proof. 

See Appendix A.2. □

3.2. Side Information at Encoder and Decoder for Multivariate Gaussian Source

The characterizations of the RDFs R_{X|Y}(Δ_X) and R̄(Δ_X) for a multivariate Gaussian source are encapsulated in Theorems 3–5, which are proved in Section 4. These theorems include the structural properties of the optimal test channels, i.e., the realizations of (X̂, Z) that induce the optimal joint distributions and achieve the RDFs; the closed-form expressions of the RDFs are based on water-filling. The realization of the optimal test channel of R_{X|Y}(Δ_X) is shown in Figure 3.

Figure 3. R_{X|Y}(Δ_X): A realization of the optimal reproduction X̂ over parallel additive Gaussian noise channels of Theorem 4, where h_i = 1 − δ_i/λ_i ≥ 0, i = 1, …, n_x, are the diagonal elements of the spectral decomposition of the matrix H = U diag{h_1, …, h_{n_x}} U^T, and W_i ∼ N(0, h_i δ_i), i = 1, …, n_x, is the additive noise introduced due to compression.

The following theorem gives a parametric realization of optimal test channel that achieves the characterization of the RDF RX|Y(ΔX).

Theorem 3. 

Characterization of RX|Y(ΔX) by test channel realization. Consider the RDF RX|Y(ΔX) defined by (1), for the multivariate Gaussian source with mean-square error distortion defined by (9)–(18). The following hold.

(a) The optimal realization X̂ that achieves R_{X|Y}(Δ_X) is parametrized by the matrices (H, Q_W) and represented by

X̂ = H(X − Q_{X,Y}Q_Y^{-1}Y) + Q_{X,Y}Q_Y^{-1}Y + W (49)
= H(X − Q_{X,Y}Q_Y^{-1}Y) + Q_{X,Y}Q_Y^{-1}Y + HΨ, if H^{-1} exists, (50)

where

H Q_{X|Y} = Q_{X|Y} H^T = Q_{X|Y} − Σ_Δ ⪰ 0, (51)
W independent of (X, Y), W ∼ N(0, Q_W), (52)
Q_W = H Q_{X|Y} − H Q_{X|Y} H^T = H Σ_Δ = Σ_Δ − Σ_Δ Q_{X|Y}^{-1} Σ_Δ = Σ_Δ H^T ⪰ 0, (53)
W = HΨ, Ψ ∼ N(0, Q_Ψ), Q_Ψ = Σ_Δ H^{-1} = H^{-1} Σ_Δ, if H^{-1} exists, (54)
Σ_Δ = E{(X − X̂)(X − X̂)^T}, (55)
Q_{X̂|Y} = Q_{X|Y} − Σ_Δ ⪰ 0, (56)
Q_{X|Y} = Q_X − Q_{X,Y}Q_Y^{-1}Q_{X,Y}^T ≻ 0, Q_{X,Y} = Q_X C^T, Q_Y = C Q_X C^T + DD^T. (57)

Moreover, the optimal parametric realization of X^ satisfies the following structural properties.

(i) P_{X|X̂,Y} = P_{X|X̂}, if Q_X ⪰ Σ_Δ, (58)
(ii) E[X|Y] = E[X̂|Y], if Q_X ⪰ Σ_Δ, (59)
(iii) cov(X, X̂|Y) = cov(X̂, X̂|Y), if Q_{X|Y} ⪰ Σ_Δ, (60)
(iv) E[X|X̂, Y] = E[X|X̂] = X̂, if Q_{X|Y} ⪰ Σ_Δ. (61)

(b) The RDF RX|Y(ΔX) is given by

R_{X|Y}(Δ_X) = inf_{Σ_Δ ⪰ 0, Q_{X|Y} − Σ_Δ ⪰ 0, trace(Σ_Δ) ≤ Δ_X} ½ log max{1, det(Q_{X|Y} Σ_Δ^{-1})}. (62)

Proof. 

The proof is given in Section 4. □
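Before moving to the water-filling form, the parametrization of Theorem 3a can be sanity-checked numerically. The short sketch below (ours, with arbitrary illustrative matrices, not taken from the paper) verifies that the choice H = I − Σ_Δ Q_{X|Y}^{-1} and Q_W = HΣ_Δ reproduces the error covariance (55) and the reduction (56):

```python
import numpy as np

# Sanity check of Theorem 3a: for the translated channel Xhat' = H X' + W,
# the error covariance equals Sigma_D (cf. (55)) and
# Q_{Xhat|Y} = Q_{X|Y} - Sigma_D (cf. (56)).
rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
Q_xgy = A @ A.T + n * np.eye(n)        # an illustrative Q_{X|Y} > 0
Sigma_D = 0.5 * np.eye(n)              # any 0 < Sigma_D <= Q_{X|Y}

H = np.eye(n) - Sigma_D @ np.linalg.inv(Q_xgy)   # cf. (A40)
Q_W = H @ Sigma_D                                # cf. (53)

# X' - Xhat' = (I - H) X' - W, so its covariance is:
err_cov = (np.eye(n) - H) @ Q_xgy @ (np.eye(n) - H).T + Q_W
assert np.allclose(err_cov, Sigma_D)                          # (55)
assert np.allclose(H @ Q_xgy @ H.T + Q_W, Q_xgy - Sigma_D)    # (56)
```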

The next theorem gives additional structural properties of the optimal test channel realization of Theorem 3 and uses these properties to characterize RDF RX|Y(ΔX) via a water-filling solution.

Theorem 4. 

Characterization of RX|Y(ΔX) via water-filling solution. Consider the RDF RX|Y(ΔX) defined by (1), for the multivariate Gaussian source with mean-square error distortion defined by (9)–(18), and its characterization in Theorem 3. The following hold.

(a) The matrices of the parametric realization of X^,

{Σ_Δ, Q_{X|Y}, H, Q_W} have spectral decompositions with respect to the same unitary matrix U, UU^T = I_{n_x}, U^T U = I_{n_x}, (63)

where the realization coefficients are

Q_W = H Σ_Δ = U diag(σ²_{W_1}, …, σ²_{W_{n_x}}) U^T, Σ_Δ = U diag(δ_1, …, δ_{n_x}) U^T, (64)
H = I_{n_x} − Q_{X|Y}^{-1} Σ_Δ = U diag(h_1, …, h_{n_x}) U^T, Q_{X|Y} = U diag(λ_1, …, λ_{n_x}) U^T, (65)
λ_1 ≥ λ_2 ≥ … ≥ λ_{n_x} > 0, δ_1 ≥ δ_2 ≥ … ≥ δ_{n_x} > 0, (66)
σ²_{W_1} ≥ σ²_{W_2} ≥ … ≥ σ²_{W_{n_x}} ≥ 0, h_1 ≥ h_2 ≥ … ≥ h_{n_x} ≥ 0, σ²_{W_i} = h_i δ_i, h_i = 1 − δ_i/λ_i, (67)

and the eigenvalues σ²_{W_i} and h_i are given by

σ²_{W_i} = min(λ_i, δ_i)(λ_i − min(λ_i, δ_i))/λ_i, h_i = (λ_i − min(λ_i, δ_i))/λ_i, Σ_{i=1}^{n_x} min(λ_i, δ_i) = Δ_X. (68)

Moreover, if σ²_{W_i} = 0, then h_i = 0, and vice versa.

(b) The RDF RX|Y(ΔX) is given by the water-filling solution:

R_{X|Y}(Δ_X) = ½ log max{1, det(Q_{X|Y} Σ_Δ^{-1})} = ½ Σ_{i=1}^{n_x} log(λ_i/δ_i), (69)

where

E‖X − X̂‖²_{R^{n_x}} = trace(Σ_Δ) = Σ_{i=1}^{n_x} δ_i = Δ_X, δ_i = { μ, if μ < λ_i; λ_i, if μ ≥ λ_i }, (70)

and μ ∈ (0, ∞) is a Lagrange multiplier (obtained from the Kuhn–Tucker conditions).

(c) Figure 3 depicts the parallel channel scheme that realizes the optimal X^ of parts (a), (b), which achieves RX|Y(ΔX).

(d) If X and Y are independent, or Y is replaced by a RV that generates the trivial information, i.e., the σ-algebra of Y is σ{Y} = {Ω, ∅} (or C = 0 in (15)), then (a)–(c) hold with Q_{X|Y} = Q_X, Q_{X,Y} = 0, and R_{X|Y}(Δ_X) = R_X(Δ_X); i.e., the RDF reduces to the marginal RDF of X.

Proof. 

The proof is given in Section 4. □
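The water-filling in (69) and (70) is easy to compute. The following Python sketch (ours; bisection on the water level μ is one of several possible implementations) returns R_{X|Y}(Δ_X) in nats, consistent with the natural-logarithm convention of Section 2.1, together with the per-channel distortions δ_i:

```python
import numpy as np

def conditional_rdf_waterfill(eigenvalues, Delta_X, tol=1e-12):
    """Water-filling of Theorem 4(b): `eigenvalues` are the lambda_i of
    Q_{X|Y}; returns (rate in nats, per-channel distortions delta_i of (70)).
    Bisects on the water level mu so that sum_i min(mu, lambda_i) = Delta_X."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    if Delta_X >= lam.sum():                 # every component fully distorted
        return 0.0, lam
    lo, hi = 0.0, lam.max()
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        if np.minimum(mu, lam).sum() > Delta_X:
            hi = mu                          # water level too high
        else:
            lo = mu
    delta = np.minimum(0.5 * (lo + hi), lam) # delta_i = min(mu, lambda_i)
    rate = 0.5 * np.sum(np.log(lam / delta)) # (69)
    return rate, delta
```

For instance, for the eigenvalues {0.7538, 0.2} of the example in Section 5.2 and Δ_X = 0.4, the sketch gives μ = 0.2, δ = (0.2, 0.2), and R_{X|Y}(0.4) = ½ log(0.7538/0.2) ≈ 0.66 nats.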

The proof of Theorem 4 (see Section 4) is based on the identification of structural properties of the test channel distribution. Some of the implications are briefly described below.

Conclusion 1: The construction and the structural properties of the optimal test channel PX|X^,Y that achieves the water-filling characterization of the RDF RX|Y(ΔX) of Theorems 3 and 4 are not documented elsewhere in the literature.

(i) Structural properties (58) and (61) strengthen Gray's inequality [[4], Theorem 3.1] (see (8)) to an equality. That is, structural property (58) implies that Gray's [[4], Theorem 3.1] lower bound (8) holds with equality on a strictly positive surface (see Gray [4] for the definition) Δ_X ∈ D_C(X|Y) ⊆ [0, ∞), i.e.,

R_{X|Y}(Δ_X) = R_X(Δ_X) − I(X; Y), ∀Δ_X ∈ D_C(X|Y) = {Δ_X ∈ [0, ∞) : Δ_X ≤ n_x λ_{n_x}}. (71)

The set DC(X|Y) excludes values of ΔX[0,) for which water-filling is active in (69) and (70).

By the realization of the optimal reproduction X̂, it follows that the subtraction of the equal quantities E[X|Y] at the encoder and decoder does not affect the information measure, noting that E[X|Y] = E[X̂|Y].

Theorem 4 points (a) and (b) are obtained with the aid of Theorem 3 and Hadamard’s inequality, which shows QX|Y and ΣΔ have the same eigenvectors.

(ii) Structural properties of realizations of Theorems 3 and 4: The matrices {Σ_Δ, Q_{X|Y}, H, Q_W} are nonnegative symmetric and have a spectral decomposition with respect to the same unitary matrix U, UU^T = I_{n_x} [21]. This implies that the test channel is equivalently represented by parallel additive Gaussian noise channels (subject to pre-processing and post-processing at the encoder and decoder).

(iii) In Remark 4, we show that the realization of optimal X^ in Figure 3, which achieves the RDF of Theorem 4, degenerates to Wyner’s [2] optimal realization, which attains the RDF RX|Y(ΔX), for the tuple of scalar-valued, jointly Gaussian RVs (X,Y) with square error distortion function.

3.3. Side Information Only at Decoder for Multivariate Gaussian Source

Theorem 5 gives the optimal test channel that achieves the characterization of the RDF R̄(Δ_X) and further states that there is no loss of compression rate if the side information is only available at the decoder. That is, although in general R̄(Δ_X) ≥ R_{X|Y}(Δ_X), an optimal reproduction X̂ = f(Y, Z) of X, where f(·,·) is linear, is constructed such that the inequality holds with equality.

Theorem 5. 

Characterization and water-filling solution of R¯(ΔX). Consider the RDF R¯(ΔX) defined by (5) for the multivariate Gaussian source with mean-square error distortion, defined by (9)–(18). Then, the following hold.

(a) The characterization of the RDF, R¯(ΔX) satisfies

R̄(Δ_X) ≥ R_{X|Y}(Δ_X), (72)

where RX|Y(ΔX) is given in Theorem 4b.

(b) The optimal realization X^=f(Y,Z), which achieves the lower bound in (72), i.e., R¯(ΔX)=RX|Y(ΔX), is represented by

X̂ = f(Y, Z) (73)
= (I − H) Q_{X,Y}Q_Y^{-1} Y + Z, (74)
Z = HX + W, (75)
(H, Q_W) given by (51)–(57), and (63) holds. (76)

Moreover, the following structural properties hold:

(1) The optimal test channel satisfies

(i) P_{X|X̂,Y,Z} = P_{X|X̂,Y} =^{(α)} P_{X|X̂}, where (α) holds if Q_X ⪰ Σ_Δ, (77)
(ii) E[X|X̂, Y, Z] = E[X|X̂, Y] =^{(β)} E[X|X̂] =^{(γ)} X̂, where (β), (γ) hold if Q_{X|Y} ⪰ Σ_Δ, (78)
(iii) P_{Z|X,Y} = P_{Z|X}. (79)

(2) Structural property (2) of Theorem 4a holds.

Proof. 

It is given in Section 4. □
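As a numerical check (ours, with illustrative matrices) that the realization (73)–(76) attains R̄(Δ_X) = R_{X|Y}(Δ_X), one can evaluate I(X; Z|Y) = I(X; Z) − I(Y; Z) in closed Gaussian form for the test channel Z = HX + W and compare it with ½ log det(Q_{X|Y}Σ_Δ^{-1}), choosing Σ_Δ small enough that no water-filling is active (so that Q_W ≻ 0):

```python
import numpy as np

rng = np.random.default_rng(2)
n, ny = 3, 2
A = rng.standard_normal((n, n)); Q_x = A @ A.T + n * np.eye(n)
C = rng.standard_normal((ny, n))
Q_y = C @ Q_x @ C.T + np.eye(ny)                   # D D^T = I, cf. (57)
Q_xy = Q_x @ C.T
Q_xgy = Q_x - Q_xy @ np.linalg.inv(Q_y) @ Q_xy.T   # Q_{X|Y}

Sigma_D = 0.01 * np.eye(n)                # 0 < Sigma_D < Q_{X|Y}
H = np.eye(n) - Sigma_D @ np.linalg.inv(Q_xgy)
Q_w = H @ Sigma_D                         # (H, Q_W) of (76)

ld = lambda M: np.linalg.slogdet(M)[1]    # log-determinant
Q_z = H @ Q_x @ H.T + Q_w                 # covariance of Z = H X + W, (75)
Q_zgy = H @ Q_xgy @ H.T + Q_w             # covariance of Z given Y
I_xz = 0.5 * (ld(Q_z) - ld(Q_w))          # I(X;Z)
I_yz = 0.5 * (ld(Q_z) - ld(Q_zgy))        # I(Y;Z)
rate = 0.5 * (ld(Q_xgy) - ld(Sigma_D))    # (1/2) log det(Q_{X|Y} Sigma_D^{-1})
assert np.isclose(I_xz - I_yz, rate)      # Rbar = R_{X|Y} for this test channel
```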

The proof of Theorem 5 is based on the derivation of the structural properties and Theorem 4. Some implications are discussed below.

Conclusion 2: The optimal reproduction X̂ = f(Y, Z), and the test channel distribution P_{X|X̂,Y,Z} which achieves R̄(Δ_X) of Theorem 5, are not reported in the literature.

(i) From the structural property (1) of Theorem 5, i.e., (77), it follows that the lower bound R̄(Δ_X) ≥ R_{X|Y}(Δ_X) is achieved by the realization X̂ = f(Y, Z) of Theorem 5b; i.e., for a given Y = y, X̂ uniquely defines Z.

(ii) If X is independent of Y, or Y generates trivial information, then the RDFs R̄(Δ_X) = R_{X|Y}(Δ_X) degenerate to the classical RDF of the source X, i.e., R_X(Δ_X), as expected. This is easily verified from (73) and (76), i.e., Q_{X,Y} = 0, which implies X̂ = Z.

For scalar-valued RVs X : Ω → R, Y : Ω → R, X ∼ N(0, σ²_X), with X independent of Y, the optimal realization reduces to

X̂ = Z = (1 − Δ_X/σ²_X) X + √((1 − Δ_X/σ²_X) Δ_X) W̄, W̄ ∼ N(0, 1), σ²_X ≥ Δ_X, (80)
Q_{X̂} = Q_Z = σ²_{X̂} = σ²_X − Δ_X ≥ 0, (81)

as expected.
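A quick Monte Carlo sanity check (ours, with the illustrative values σ²_X = 4 and Δ_X = 1) confirms (80) and (81):

```python
import numpy as np

# Check of (80)-(81): Xhat = Z = (1 - D/s2) X + sqrt((1 - D/s2) D) Wbar
# should satisfy var(Xhat) = s2 - D and mean-square error exactly D.
rng = np.random.default_rng(1)
s2, D, N = 4.0, 1.0, 10**6                # illustrative values, s2 >= D
X = rng.normal(0.0, np.sqrt(s2), N)
Wbar = rng.normal(0.0, 1.0, N)
a = 1.0 - D / s2
Xhat = a * X + np.sqrt(a * D) * Wbar
print(np.var(Xhat), "should be close to", s2 - D)         # ~3.0
print(np.mean((X - Xhat) ** 2), "should be close to", D)  # ~1.0
```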

(iii) In Remark 4, we show that the realization of optimal X^=f(Y,Z), which achieves the RDF R¯(ΔX) of Theorem 5, degenerates to Wyner’s [2] realization that attains the RDF R¯(ΔX), of the tuple of scalar-valued, jointly Gaussian RVs (X,Y), with the square error distortion function.

4. Proofs of Theorems 3–5

In this section, we derive the statements of Theorems 3–5 by making use of Theorem 2 (which holds for general abstract alphabet spaces) by restricting attention to multivariate jointly Gaussian (X,Y).

4.1. Side Information at Encoder and Decoder

For jointly Gaussian RVs (X,Y,X^), in the next theorem we identify simple sufficient conditions for the lower bound of Theorem 2 to be achievable.

Theorem 6. 

Sufficient conditions for the lower bounds of Theorem 2 to be achievable. Consider the statement of Theorem 2 for a triple of jointly Gaussian RVs (X, Y, X̂) on R^{n_x} × R^{n_y} × R^{n_x}, (n_x, n_y) ∈ Z_+ × Z_+, i.e., P_{X,Y,X̂} = P^G_{X,Y,X̂}, with joint marginal the fixed Gaussian distribution P_{X,Y} = P^G_{X,Y} of (X, Y).

Then,

X̄^{cm} = E[X|Y, X̂] = e^G(Y, X̂), (82)
e^G(Y, X̂) = E[X|Y] + cov(X, X̂|Y) cov(X̂, X̂|Y)^† (X̂ − E[X̂|Y]). (83)

Moreover, the following hold.

Case (i). cov(X̂, X̂|Y) ≻ 0, that is, rank(Q_{X̂|Y}) = n_x. Condition (84) is sufficient for I(X; X̂|Y) = I(X; X̄^{cm}|Y):

X̄^{cm} = E[X|Y, X̂] = e^G(Y, X̂) = X̂ a.s. (84)

In addition, Conditions 1 and 2 below are sufficient for (84) to hold.

Condition 1. E[X|Y] = E[X̂|Y]. (85)
Condition 2. cov(X, X̂|Y) cov(X̂, X̂|Y)^{-1} = I_{n_x}. (86)

Case (ii). cov(X̂, X̂|Y) ⪰ 0 but not cov(X̂, X̂|Y) ≻ 0; that is, rank(Q_{X̂|Y}) = n_1 < n_x. Condition (87) is sufficient for I(X; X̂|Y) = I(X; X̄^{cm}|Y):

e^G(·,·) defined by (83) satisfies (47). (87)

In addition, a sufficient condition for (87) to hold is that, for a fixed Y = y ∈ Y, the σ-algebras satisfy F_{X̂} = F_{e^G(y,X̂)}.

Proof. 

Note that identity (83) follows from Proposition 1, (26), by replacing Y with X̂ and letting G be the information generated by Y. Consider Case (i); if (84) holds, then I(X; X̂|X̄^{cm}, Y) = 0. By (83), Conditions 1 and 2 are sufficient for (84) to hold. Consider Case (ii); sufficient condition (87) follows from Theorem 2 and implies I(X; X̂|X̄^{cm}, Y) = 0. The statement below (87) follows from Proposition 2. □

Now, we turn our attention to the optimization problem RX|Y(ΔX) defined by (1) for the multivariate Gaussian source with mean-square error distortion defined by (9)–(18). In the next lemma, we derive a preliminary parametrization of the optimal reproduction distribution PX^|X,Y of the RDF RX|Y(ΔX).

Lemma 2. 

Preliminary parametrization of the optimal reproduction distribution of R_{X|Y}(Δ_X). Consider the RDF R_{X|Y}(Δ_X) defined by (1) for the multivariate Gaussian source, i.e., P_{X,Y} = P^G_{X,Y}, with mean-square error distortion defined by (9)–(18).

(a) For every joint distribution P_{X,Y,X̂} there exists a jointly Gaussian distribution, denoted by P^G_{X,Y,X̂}, with marginal the fixed distribution P^G_{X,Y}, which minimizes I(X; X̂|Y) and satisfies the average distortion constraint with d_X(x, x̂) = ‖x − x̂‖²_{R^{n_x}}.

(b) The conditional reproduction distribution of the RDF RX|Y(ΔX) is PX^|X,Y=PX^|X,YG and induced by the parametric realization of X^ (in terms of H,G,QW),

X̂ = HX + GY + W, (88)
H ∈ R^{n_x×n_x}, G ∈ R^{n_x×n_y}, (89)
W ∼ N(0, Q_W), Q_W ⪰ 0, (90)
W independent of (X, Y), (91)

and X^ is a Gaussian RV.

(c) RX|Y(ΔX) is characterized by the optimization problem.

R_{X|Y}(Δ_X) = inf_{M^G_0(Δ_X)} I(X; X̂|Y), Δ_X ∈ [0, ∞), (92)

where M0G(ΔX) is specified by the set

M^G_0(Δ_X) = {X̂ : Ω → X̂ : (88)–(91) hold, and E‖X − X̂‖²_{R^{n_x}} ≤ Δ_X}. (93)

(d) If there exists (H, G, Q_W) such that (84) or (87) holds, then a further lower bound on R_{X|Y}(Δ_X) is achieved in the subset M^{G,o}_0(Δ_X) ⊆ M^G_0(Δ_X) defined by

M^{G,o}_0(Δ_X) = {X̂ : Ω → X̂ : (88)–(91) hold, (84) or (87) holds, E‖X − X̂‖²_{R^{n_x}} ≤ Δ_X}, (94)

and the corresponding characterization of the RDF is

R_{X|Y}(Δ_X) = inf_{M^{G,o}_0(Δ_X)} I(X; X̂|Y), Δ_X ∈ [0, ∞). (95)

Proof. 

(a) This is omitted since it is similar to the classical unconditional RDF RX(ΔX) of a Gaussian message XN(0,QX). (b) By (a), the conditional distribution PX^|X,YG is such that, its conditional mean is linear in (X,Y), its conditional covariance is nonrandom, i.e., constant, and for fixed (X,Y)=(x,y), PX^|X,YG is Gaussian. Such a distribution is induced by the parametric realization (88)–(91). (c) Follows from parts (a) and (b). (d) Follows from Theorem 6 and (48) due to the achievability of the lower bounds. □

In the next theorem, we identify the optimal triple (H,G,QW) such that (84) or (87) hold (i.e., establish its existence), characterize the RDF by RX|Y(ΔX)=infM0G,o(ΔX)I(X;X^|Y), and construct a realization X^ that achieves it.

Theorem 7. 

Characterization of RDF RX|Y(ΔX). Consider the RDF RX|Y(ΔX), defined by (1), for the multivariate Gaussian source with mean-square error distortion, defined by (9)–(18). The characterization of the RDF RX|Y(ΔX) is

R_{X|Y}(Δ_X) = inf_{Q(Δ_X)} I(X; X̂|Y) (96)
= inf_{M^{G,o}_0(Δ_X)} I(X; X̂|Y) (97)
= inf_{Q(Δ_X)} ½ log det(Q_{X|Y} Σ_Δ^{-1}), (98)

where

Q(Δ_X) = {Σ_Δ ⪰ 0 : Q_{X|Y} − Σ_Δ ⪰ 0, trace(Σ_Δ) ≤ Δ_X}, (99)
Σ_Δ = E{(X − X̂)(X − X̂)^T}, (100)
Q_{X|Y} = Q_X − Q_{X,Y}Q_Y^{-1}Q_{X,Y}^T, (101)
Q_{X,Y} = Q_X C^T, Q_Y = C Q_X C^T + DD^T. (102)

The realization of the optimal reproduction X̂ ∈ M^{G,o}_0(Δ_X), which achieves R_{X|Y}(Δ_X), is given in Theorem 3a and satisfies the properties (i)–(iv) stated there.

Proof. 

See Appendix A.3. □

Remark 2. 

Structural properties of the optimal realization of Theorem 4a. For the characterization of the RDF R_{X|Y}(Δ_X) of Theorem 7, which is achieved by the X̂ defined in Theorem 3a in terms of the matrices (Σ_Δ, Q_{X|Y}, H, Q_W), we show in Corollary 2 the statements of Theorem 4a, i.e.,

(i) H = H^T ⪰ 0, (103)
(ii) Σ_Δ, Q_{X|Y}, H, Q_W have spectral representations with respect to the same unitary matrix U, UU^T = I_{n_x}. (104)

To prove the structural property of Remark 2, we use the next corollary, which is a degenerate case of [[22], Lemma 2] (i.e., of the structural properties of the test channel of the Gorbunov and Pinsker [23] nonanticipatory RDF of Markov sources).

Corollary 1. 

Structural properties of the realization of the optimal X̂ of Theorem 4a. Consider the characterization of the RDF R_{X|Y}(Δ_X) of Theorem 7. Suppose Q_{X|Y} ≻ 0 and Σ_Δ ⪰ 0 commute, that is,

Q_{X|Y} Σ_Δ = Σ_Δ Q_{X|Y}. (105)

Then,

(1) H = I_{n_x} − Σ_Δ Q_{X|Y}^{-1} = H^T, Q_W = Σ_Δ H^T = Σ_Δ H = H Σ_Δ = Q_W^T ⪰ 0; (2) Σ_Δ, Q_{X|Y}, H, Q_W have spectral (106)
decompositions with respect to the same unitary matrix U, UU^T = I_{n_x}, U^T U = I_{n_x}; (107)

that is, the following hold.

Q_{X|Y} = U diag{λ_1, …, λ_{n_x}} U^T, λ_1 ≥ λ_2 ≥ … ≥ λ_{n_x} > 0, (108)
Σ_Δ = U diag{δ_1, …, δ_{n_x}} U^T, δ_1 ≥ δ_2 ≥ … ≥ δ_{n_x} ≥ 0, (109)
H = U diag{1 − δ_1/λ_1, …, 1 − δ_{n_x}/λ_{n_x}} U^T, (110)
Q_W = U diag{(1 − δ_1/λ_1)δ_1, …, (1 − δ_{n_x}/λ_{n_x})δ_{n_x}} U^T, and (1 − δ_k/λ_k)δ_k ≥ 0. (111)

Proof. 

See Appendix A.4. □

In the next corollary, we re-express the realization of X̂ of Theorem 4a, which characterizes the RDF of Theorem 7, using a translation of X and X̂ obtained by subtracting their conditional means with respect to Y, making use of the property E[X|Y] = E[X̂|Y] of (59). This is the realization shown in Figure 3.

Corollary 2. 

Equivalent characterization of R_{X|Y}(Δ_X). Consider the characterization of the RDF R_{X|Y}(Δ_X) of Theorem 7 and the realization of X̂ of Theorems 3a and 4a. Define the translated RVs

X′ = X − E[X|Y] = X − Q_{X,Y}Q_Y^{-1}Y, X̂′ = X̂ − E[X|Y] = X̂ − Q_{X,Y}Q_Y^{-1}Y. (112)

Let

Q_{X|Y} = U diag{λ_1, …, λ_{n_x}} U^T, UU^T = I_{n_x}, U^T U = I_{n_x}, λ_1 ≥ λ_2 ≥ … ≥ λ_{n_x}, (113)
X̄ = U^T X′, X̄̂ = U^T X̂′. (114)

Then,

X̂′ = H X′ + W, (115)
I(X; X̂|Y) = I(X′; X̂′) = I(U^T X′; U^T X̂′), (116)
E‖X − X̂‖²_{R^{n_x}} = E‖X′ − X̂′‖²_{R^{n_x}} = E‖U^T X′ − U^T X̂′‖²_{R^{n_x}} = trace(Σ_Δ), (117)

where (H,QW) are given in Theorem 3a.

Further, the characterization of the RDF RX|Y(ΔX) (98) satisfies the following equalities and inequality:

R_{X|Y}(Δ_X) = inf_{Q(Δ_X)} I(X; X̂|Y) = inf_{Q(Δ_X)} ½ log max{1, det(Q_{X|Y} Σ_Δ^{-1})} (118)
= inf_{E‖X′ − X̂′‖²_{R^{n_x}} ≤ Δ_X} I(X′; X̂′) (119)
= inf_{E‖U^T X′ − U^T X̂′‖²_{R^{n_x}} ≤ Δ_X} I(U^T X′; U^T X̂′) (120)
≥ inf_{E‖U^T X′ − U^T X̂′‖²_{R^{n_x}} ≤ Δ_X} Σ_{t=1}^{n_x} I(X̄_t; X̄̂_t). (121)

Moreover, the inequality (121) is achieved with equality if Q_{X|Y} ≻ 0 and Σ_Δ ⪰ 0 commute; that is, if (105) holds, then

R_{X|Y}(Δ_X) = inf_{Σ_{i=1}^{n_x} δ_i ≤ Δ_X} ½ Σ_{i=1}^{n_x} log max{1, λ_i/δ_i}, (122)

where

diag{E[(U^T X′ − U^T X̂′)(U^T X′ − U^T X̂′)^T]} = diag{δ_1, δ_2, …, δ_{n_x}}. (123)

Proof. 

By Theorem 3a,

X^=HX+IHQX,YQY1Y+W (124)
=HXQX,YQY1Y+QX,YQY1Y+W (125)
X^QX,YQY1Y=HXQX,YQY1Y+W (126)
X^=HX+W. (127)

The last equation establishes (115). By properties of conditional mutual information and the properties of optimal realization X^, the following equalities hold.

I(X; X̂|Y) = I(X − Q_{X,Y}Q_Y^{-1}Y; X̂ − Q_{X,Y}Q_Y^{-1}Y | Y) (128)
= I(X′; X̂′|Y), by (112) (129)
= H(X̂′|Y) − H(X̂′|Y, X′) (130)
= H(X̂′) − H(X̂′|Y, X′), by independence of (X′, W) and Y (131)
= H(X̂′) − H(X̂′|X′), by independence of W and Y for fixed X′ (132)
= I(X′; X̂′) (133)
= I(U^T X′; U^T X̂′) (134)
= I(X̄_1, X̄_2, …, X̄_{n_x}; X̄̂_1, X̄̂_2, …, X̄̂_{n_x}) (135)
≥ Σ_{t=1}^{n_x} I(X̄_t; X̄̂_t), by mutual independence of X̄_t, t = 1, 2, …, n_x. (136)

Moreover, inequality (136) holds with equality if (X̄_t, X̄̂_t), t = 1, 2, …, n_x, are jointly independent. The average distortion function is then given by

E‖X − X̂‖²_{R^{n_x}} = E‖(X − Q_{X,Y}Q_Y^{-1}Y) − (X̂ − Q_{X,Y}Q_Y^{-1}Y)‖²_{R^{n_x}} (137)
= E‖X′ − X̂′‖²_{R^{n_x}}, by (112) (138)
= E‖U^T X′ − U^T X̂′‖²_{R^{n_x}} = trace(Σ_Δ), by UU^T = I_{n_x}. (139)

By Corollary 1, if (105) holds, that is, Q_{X|Y} ≻ 0 and Σ_Δ ⪰ 0 satisfy Q_{X|Y}Σ_Δ = Σ_Δ Q_{X|Y} (i.e., they commute), then (106)–(111) hold, and by (115) we obtain

X̄̂ = U^T X̂′ = U^T(H X′ + W) = U^T H U U^T X′ + U^T W (140)
= U^T H U X̄ + U^T W, where U^T H U is diagonal and U^T W has independent components. (141)

Hence, if (105) holds, then the lower bound in (136) holds with equality because (X̄_t, X̄̂_t), t ∈ Z_{n_x}, are jointly independent. Moreover, if (105) holds, then from, say, (118), the expressions (122) and (123) are obtained. The above equations establish all claims. □

Proposition 4. 

Theorem 4 is correct.

Proof. 

By invoking Corollary 2, Theorem 7 and the convexity of RX|Y(ΔX) given by (122), then we arrive at the statements of Theorem 4, which completely characterize the RDF RX|Y(ΔX) and construct a realization of the optimal X^ that achieves it. □

Next, we discuss the degenerate case, when the statements of Theorems 3, 4 and 7 reduce to the RDF R_X(Δ_X) of a Gaussian RV X with square-error distortion function. We illustrate that the identified structural property of the realization matrices (Σ_Δ, Q_{X|Y}, H, Q_W) leads to the well-known water-filling solution.

Remark 3. 

Degenerate case of Theorem 7 and the realization X̂ of Theorem 4a. Consider the characterization of the RDF R_{X|Y}(Δ_X) of Theorem 7 and the realization of X̂ of Theorem 3a, and assume X and Y are independent or Y generates the trivial information; i.e., the σ-algebra of Y is σ{Y} = {Ω, ∅}, or C = 0 in (15)–(18).

(a) By the definitions of Q_{X,Y} and Q_{X|Y},

QX,Y=0,QX|Y=QX. (142)

Substituting (142) into the expressions of Theorem 7, the RDF RX|Y(ΔX) reduces to

R_{X|Y}(Δ_X) = R_X(Δ_X) = inf_{Q_m(Δ_X)} I(X; X̂) (143)
= inf_{Q_m(Δ_X)} ½ log det(Q_X Σ_Δ^{-1}), (144)

where

Q_m(Δ_X) = {Σ_Δ ⪰ 0 : Q_X − Σ_Δ ⪰ 0, trace(Σ_Δ) ≤ Δ_X}, (145)

and the optimal reproduction X^ reduces to

X̂ = (I_{n_x} − Σ_Δ Q_X^{-1}) X + W, Q_X ⪰ Σ_Δ, (146)
Q_W = (I_{n_x} − Σ_Δ Q_X^{-1}) Σ_Δ ⪰ 0. (147)

Thus, RX(ΔX) is the well-known RDF of a multivariate memoryless Gaussian RV X with square-error distortion.

(b) For the RDF RX(ΔX) of part (a), it is known [24] that ΣΔ and QX have a spectral decomposition with respect to the same unitary matrix, that is,

Q_X = U Λ_X U^T, Σ_Δ = U Δ U^T, UU^T = I, (148)
Λ_X = diag{λ_{X,1}, …, λ_{X,n_x}}, Δ = diag{δ_1, …, δ_{n_x}}, (149)

where the entries of (ΛX,Δ) are in decreasing order.

Define

X_p = U^T X, X̂_p = U^T X̂, W_p = U^T W. (150)

Then, a parallel channel realization of the optimal reproduction X^p is obtained as follows:

X̂_p = H X_p + W_p, (151)
H = I_{n_x} − Δ Λ_X^{-1} = diag{1 − δ_1/λ_{X,1}, …, 1 − δ_{n_x}/λ_{X,n_x}}, (152)
Q_{W_p} = HΔ = diag{(1 − δ_1/λ_{X,1})δ_1, …, (1 − δ_{n_x}/λ_{X,n_x})δ_{n_x}}. (153)

The RDF RX(ΔX) is then computed from the reverse water-filling equations as follows.

R_X(Δ_X) = ½ Σ_{i=1}^{n_x} log(λ_{X,i}/δ_i), (154)

where

Σ_{i=1}^{n_x} δ_i = Δ_X, δ_i = { μ, if μ < λ_{X,i}; λ_{X,i}, if μ ≥ λ_{X,i} }, (155)

and μ ∈ [0, ∞) is a Lagrange multiplier (obtained from the Kuhn–Tucker conditions).

4.2. Side Information Only at Decoder

In general, when the side information is available only at the decoder, the achievable operational rate R(Δ_X) is greater than or equal to the achievable operational rate R̄_1(Δ_X) when the side information is available to both the encoder and the decoder [2]. By Remark 1, R̄(Δ_X) ≥ R_{X|Y}(Δ_X), and equality holds if I(X; Z|X̂, Y) = 0.

In view of the characterization of RX|Y(ΔX) and the realization of the optimal reproduction X^ of Theorem 3, which is presented in Figure 3, we observe that we can re-write (49) as follows.

X̂ = HX + (I_{n_x} − H) Q_{X,Y}Q_Y^{-1}Y + W (156)
= (I_{n_x} − H) Q_{X,Y}Q_Y^{-1}Y + Z (157)
= f(Y, Z), (158)
Z = HX + W, (159)
H = I_{n_x} − Σ_Δ Q_{X|Y}^{-1}, Q_W = H Σ_Δ, defined by (51)–(63), (160)
P_{Z|X,Y} = P_{Z|X}, and (X̂, Y) uniquely define Z, which implies I(X; Z|X̂, Y) = 0. (161)

Proposition 5. 

Theorem 5 is correct.

Proof. 

From the above realization of X̂ = f(Y, Z), we have the following. (a) By Wyner (see Remark 1), the inequalities (36) and (37) hold, and equalities hold if I(X; Z|X̂, Y) = 0. That is, for any X̂ = f(Y, Z), by the properties of conditional mutual information,

I(X; Z|Y) =^{(α)} I(X; Z, X̂|Y) (162)
=^{(β)} I(X; Z|X̂, Y) + I(X; X̂|Y) (163)
≥^{(γ)} I(X; X̂|Y), (164)

where (α) is due to X̂ = f(Y, Z), (β) is due to the chain rule of mutual information, and (γ) is due to I(X; Z|X̂, Y) ≥ 0. Hence, (72) is obtained (as in Wyner [2] for a tuple of scalar jointly Gaussian RVs). (b) Equality holds in (164) if there exists an X̂ = f(Y, Z) such that I(X; Z|X̂, Y) = 0 and the average distortion is satisfied. Taking X̂ = f(Y, Z) = (I_{n_x} − H)Q_{X,Y}Q_Y^{-1}Y + Z, where Z = g(X, W) is specified by (156)–(160), then I(X; Z|X̂, Y) = 0 and the average distortion is satisfied. Since the realization (156)–(160) is identical to the realization (73)–(76), part (b) is also shown. (c) This follows directly from the optimal realization. □

5. Connection with Other Works and Simulations

In this section, we illustrate that, for the special case of scalar-valued jointly Gaussian RVs (X, Y), our results reproduce Wyner's [2] results. In addition, we show that the characterizations of the RDFs of the more general problems considered in [5,6] (i.e., where a noisy version of the source is available at the encoder) do not reproduce Wyner's [2] results. Finally, we present simulations.

5.1. Connection with Other Works

Remark 4. 

The degenerate case to Wyner's [2] optimal test channel realizations. We now verify that, for the tuple of scalar-valued, jointly Gaussian RVs (X, Y) with the square error distortion function specified below, our optimal realizations of X̂ and closed-form expressions for R_{X|Y}(Δ_X) and R̄(Δ_X) are identical to Wyner's [2] realizations and RDFs (see Figure 4). Let us define:

X : Ω → X = R, Y : Ω → Y = R, X̂ : Ω → X̂ = R, (165)
d_X(x, x̂) = (x − x̂)², (166)
X ∼ N(0, σ²_X), σ²_X > 0, Y = α(X + U), (167)
U ∼ N(0, σ²_U), σ²_U > 0, α > 0. (168)

(a) RDF RX|Y(ΔX): By Theorem 4a applied to (165)–(168), we obtain

Q_X = σ²_X, Q_{X,Y} = ασ²_X, Q_Y = σ²_Y = α²σ²_X + α²σ²_U, Q_{X|Y} = cσ²_U, c = σ²_X/(σ²_X + σ²_U), (169)
H = 1 − Δ_X Q_{X|Y}^{-1} = (cσ²_U − Δ_X)/(cσ²_U) ≜ a, Q_{X,Y}Q_Y^{-1} = c/α, H Q_{X,Y}Q_Y^{-1} = ac/α, (170)
W = HΨ = aΨ, Q_Ψ = H^{-1}Δ_X = Δ_X/a = cσ²_U Δ_X/(cσ²_U − Δ_X), cσ²_U − Δ_X > 0. (171)

Moreover, by Theorem 4b, the optimal reproduction X̂ ∈ M_0(Δ_X) and R_{X|Y}(Δ_X) are

X̂ = a(X − (c/α)Y) + (c/α)Y + aΨ, cσ²_U − Δ_X > 0, (172)
R_{X|Y}(Δ_X) = { ½ log(cσ²_U/Δ_X), 0 < Δ_X < cσ²_U; 0, Δ_X ≥ cσ²_U }. (173)

This shows our realization of Figure 3 degenerates to Wyner’s [2] realization of Figure 4a.

(b) RDF R¯(ΔX): By Theorem 5b applied to (165)–(168), and using the calculations (169)–(172), we obtain

X̂ = f(Y, Z) = (c/α)(1 − a)Y + Z, by (172) and (175), (174)
Z = aX + Ψ, (a, Ψ) defined in (170) and (171), (175)
R̄(Δ_X) = R_{X|Y}(Δ_X) = (173), by evaluating I(X; Z) − I(Y; Z), using (4) and (175). (176)

This shows our value of R¯(ΔX) and optimal realization X^=f(Y,Z) reproduce Wyner’s optimal realization and the value of R¯(ΔX) given in [2] (i.e., Figure 4b).

Figure 4. Wyner's realizations of the optimal reproductions for the RDFs R_{X|Y}(Δ_X) and R̄(Δ_X). (a) RDF R_{X|Y}(Δ_X): Wyner's [2] optimal realization of X̂ for the RDF R_{X|Y}(Δ_X) of (165)–(168). (b) RDF R̄(Δ_X): Wyner's [2] optimal realization X̂ = f(Y, Z) for the RDF R̄(Δ_X) of (165)–(168).

In the following remark, we show that, when S = X a.s., the realization of the auxiliary RV Z which is used in the proofs in [5,6] to show the converse coding theorem does not coincide with Wyner's realization [2]. Also, their realizations do not reproduce Wyner's RDF (this observation is verified also for the modified realization given, without proof, in the correction note at https://tiangroup.engr.tamu.edu/publications/, accessed on 3 January 2024). The deficiency of the realizations in [5,6] to show the converse was first pointed out in [7], using an alternative proof.

Remark 5. 

Optimal test channel realization of [5,6].

(a) The derivation of [[5], Theorem 4] uses the following representation of RVs (see [[5], Equation (4)], adapted to our notation using (19)):

X = (K_{xs}K_{sy} + K_{xy})Y + K_{xs}N_1 + N_2, S = K_{sy}Y + N_1,

where N_1 and N_2 are independent Gaussian RVs with zero mean, N_1 is independent of Y, and N_2 is independent of (S, Y).

To reduce [5,6] to the Wyner and Ziv RDF, we set X = S a.s., which then implies K_{xs} = I, N_2 = 0 a.s., and K_{xy} = 0. According to the derivation of the converse in [[5], Theorem 4] (see [[5], 3 lines above Equation (32)], using our notation), the optimal realization of the auxiliary RV Z_T used to achieve the RDF is

ZT=UTX+N3, (177)

where Q_{X|Y} = U diag(λ_1, …, λ_n) U^T, U is a unitary matrix, and N_3 ∼ N(0, Q_{N_3}), where Q_{N_3} is a diagonal covariance matrix with elements given below (for the value of σ²_{3,i}, we considered the one given in the correction note at https://tiangroup.engr.tamu.edu/publications/, accessed on 3 January 2024 (although no derivation is given there), where it is stated that the σ²_{3,i} that appeared in the derivation of [[5], proof of Theorem 4] should be multiplied by λ_i):

σ²_{3,i} = min(λ_i, δ_i) λ_i/(λ_i − min(λ_i, δ_i)), Σ_{i=1}^n min(λ_i, δ_i) = Δ_X. (178)

(b) It is easy to verify that the above realization of ZT that uses the correction of footnote 6 is precisely the realization given in [[6], Theorem 3A].

(c) Special Case: For scalar-valued RVs the auxiliary RV ZT reduces to

Z_T = X + N_3, N_3 ∼ N(0, Δ_X Q_{X|Y}/(Q_{X|Y} − Δ_X)), Q_{X|Y} > Δ_X. (179)

Now, we examine whether the realization (179) corresponds to Wyner's realization and induces Wyner's RDF. Recall that Wyner's [2] RDF, denoted by R_{X;Z|Y}(Δ_X) and corresponding to the auxiliary RV Z, is

Z = HX + W, H = (Q_{X|Y} − Δ_X)/Q_{X|Y}, W ∼ N(0, HΔ_X), (180)
R_{X;Z|Y}(Δ_X) = I(X; Z|Y) = ½ log(Q_{X|Y}/Δ_X), Δ_X ≤ Q_{X|Y}. (181)

Clearly, the two realizations (179) and (180) are different. Let R̂_{X;Z_T|Y}(Δ_X) denote the RDF corresponding to the realization Z_T. Then, R̂_{X;Z_T|Y}(Δ_X) can be computed using I(X; Z_T|Y) = I(X; Z_T) − I(Y; Z_T) = −H(Z_T|X) + H(Z_T|Y), where H(·|·) denotes the conditional differential entropy. Then, by using

Q_{Z_T|X} = Q_{N_3} = Δ_X Q_{X|Y}/(Q_{X|Y} − Δ_X), (182)
Q_{Z_T|Y} = Q_{N_3} + Q_{X|Y}, (183)

it is straightforward to show that

R̂_{X;Z_T|Y}(Δ_X) = −H(Z_T|X) + H(Z_T|Y) (184)
= −½ log(2πe Δ_X Q_{X|Y}/(Q_{X|Y} − Δ_X)) + ½ log(2πe Q²_{X|Y}/(Q_{X|Y} − Δ_X)), Δ_X < Q_{X|Y}. (185)

However, we note that (i) unlike Wyner's RDF given in (181), which gives R_{X;Z|Y}(Δ_X) = 0 at Δ_X = Q_{X|Y}, the corresponding R̂_{X;Z_T|Y}(Δ_X) = +∞ at Δ_X = Q_{X|Y}, and (ii) Wyner's test channel realization is Z = HX + W, H = (Q_{X|Y} − Δ_X)/Q_{X|Y}, and W ∼ N(0, HΔ_X), which is different from the test channel realization in (179). In particular, if Q_{X|Y} = Δ_X, then H = 0, W ∼ N(0, 0), and Z = 0 a.s. On the other hand, for the test channel in (179), if Q_{X|Y} = Δ_X, then N_3 ∼ N(0, +∞), and thus the variance of Z_T in (179) is not zero.

Further, in Proposition 6, we prove that for the multi-dimensional source, the test channel realization in (179) does not achieve the RDF when water-filling is active, i.e., when at least one component of the source is not reproduced.

(d) Special Case, Classical RDF: The classical RDF is obtained as a special case if we assume X and Y are independent, or Y generates the trivial information {Ω, ∅}; i.e., Y is nonrandom. Clearly, in this case, the RDF R̂_{X;Z_T|Y}(Δ_X) should degenerate to the classical RDF of the source X, i.e., R_X(Δ_X), and it should be that X̂ = Z_T. However, for this case, (179) gives Q_{Z_T} = Q_X + Δ_X Q_X/(Q_X − Δ_X) = Q_X²/(Q_X − Δ_X), which is fundamentally different from Wyner's degenerate, and correct, value Q_{X̂} = Q_Z = max{0, Q_X − Δ_X}.
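The contrast in item (ii) above is easy to visualize numerically. The scalar sketch below (ours, with the illustrative value Q_{X|Y} = 1) tracks both test channels as Δ_X → Q_{X|Y}: Wyner's channel (180) collapses to Z = 0 a.s., whereas the noise variance of Z_T in (179) diverges:

```python
Q = 1.0                              # Q_{X|Y}, an illustrative value
for D in (0.5, 0.9, 0.99, 0.999):    # distortions approaching Q_{X|Y}
    H = (Q - D) / Q                  # Wyner's gain in (180): -> 0
    var_W = H * D                    # variance of W in (180): -> 0
    var_N3 = D * Q / (Q - D)         # variance of N_3 in (179): -> infinity
    print(f"D={D}: H={H:.4g}, var(W)={var_W:.4g}, var(N3)={var_N3:.4g}")
```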

Proposition 6. 

When S = X a.s., Wyner's [2] auxiliary RV Z and the auxiliary RV Z_T given in (177), i.e., the degenerate case of [5,6] (with the correction of footnote 6), are not related by an invertible function. As a result, the computed RDFs based on the two realizations are different.

Proof. 

Recall that, if the two auxiliary RVs Z_T and Z are not related by an invertible function, i.e., Z = f(Z_T), where f(·) is invertible and both f and its inverse are measurable, then I(X; Z_T) − I(Y; Z_T) ≠ I(X; Z) − I(Y; Z). It was shown earlier in this paper (and also in [7]) that, for the multivariate Wyner RDF, the auxiliary RV takes the form

Z = HX + W, W ∼ N(0, Q_W), (186)

where Q_W = HΣ_Δ = U diag(σ²_{w,1}, …, σ²_{w,n}) U^T, Σ_Δ = U diag(δ_1, …, δ_n) U^T, H = I − Q_{X|Y}^{-1}Σ_Δ = U diag(h_1, …, h_n) U^T, and Q_{X|Y} = U diag(λ_1, …, λ_n) U^T, where U is a unitary matrix. The eigenvalues σ²_{w,i} and h_i are given by

σ²_{w,i} = min(λ_i, δ_i)(λ_i − min(λ_i, δ_i))/λ_i, (187)
h_i = (λ_i − min(λ_i, δ_i))/λ_i, (188)

where Σ_{i=1}^n min(λ_i, δ_i) = Δ_X. Hence, Equations (186), (187), and (188) imply that, if σ²_{w,i} = 0, then h_i = 0, and vice versa. Such zero values correspond to compression with water-filling. On the other hand, from (177) and (178), if water-filling is active, then σ²_{3,i} = λ_i²/(λ_i − λ_i) = +∞. Moreover, by comparing Equations (187) with (178) and (188) with (177), it is straightforward to show that f(·) = HU. If HU is not an invertible matrix for all values of the distortion Δ_X, then I(X; Z_T) − I(Y; Z_T) ≠ I(X; Z) − I(Y; Z).

By (188), it is easy to show that, if min(λ_i, δ_i) = λ_i, then HU is not invertible. This implies I(X; Z_T) − I(Y; Z_T) ≠ I(X; Z) − I(Y; Z). □

5.2. Simulations

In this section, we provide an example to show the gap between the classical RDF R_X(Δ_X) defined in (154) and the conditional RDF R_{X|Y}(Δ_X) of (69), and to verify the validity of Gray's lower bound (8). Note that, in Theorem 5, it is shown that R_{X|Y}(Δ_X) = R̄(Δ_X); hence, the plot for R̄(Δ_X) is omitted. For the evaluation, we pick a joint covariance matrix (11) given by

Q_{(X,Y)} = [2.5000, 1.1250, 0.4750, 0.6125; 1.1250, 0.8125, 0.2750, 0.3063; 0.4750, 0.2750, 0.1525, 0.1625; 0.6125, 0.3063, 0.1625, 0.2031], X : Ω → R², Y : Ω → R².

In order to compute the rates, we first have to find QX,QY,QXY and QX|Y. From the definition of Q(X,Y) given in (11), it is easy to see that the covariance of X, Y, and the joint covariance of X and Y are equal to

Q_X = [2.5000, 1.1250; 1.1250, 0.8125], Q_Y = [0.1525, 0.1625; 0.1625, 0.2031], Q_{X,Y} = [0.4750, 0.6125; 0.2750, 0.3063].

Then, the conditional covariance QX|Y, which appears in RX|Y(ΔX), can be computed from (27). Using Singular Value Decomposition (SVD), we can calculate the eigenvalues of QX|Y. For this case, the eigenvalues of the conditional covariance are {0.7538,0.2}. Similarly, the eigenvalues of QX can be determined. Finally, the eigenvalues of QX and QX|Y are passed to the water-filling to compute the RX(ΔX) and RX|Y(ΔX), respectively.

The classical RDF, the conditional RDF, and Gray's lower bound for the joint covariance above are illustrated in Figure 5. It is clear that R_{X|Y}(Δ_X) is smaller, and, as the distortion Δ_X increases, the gap between the classical and conditional RDF becomes larger. Gray's lower bound is achievable for some positive distortion values, as provided in (71), i.e., for Δ_X ∈ {Δ_X ∈ [0, ∞) : Δ_X ≤ n_x λ_{n_x}}. Recall that the set of eigenvalues of Q_{X|Y} is {0.7538, 0.2}, and the lower bound is achievable for Δ_X ≤ 2·0.2 = 0.4; i.e., for these values, R_{X|Y}(Δ_X) = R_X(Δ_X) − I(X; Y).
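The quantities behind Figure 5 can be reproduced with a few lines of Python (our sketch; `conditional_rdf_waterfill` is the water-filling routine sketched after Theorem 4, and rates are in nats):

```python
import numpy as np

Q = np.array([[2.5000, 1.1250, 0.4750, 0.6125],
              [1.1250, 0.8125, 0.2750, 0.3063],
              [0.4750, 0.2750, 0.1525, 0.1625],
              [0.6125, 0.3063, 0.1625, 0.2031]])    # Q_(X,Y) above
Q_x, Q_y, Q_xy = Q[:2, :2], Q[2:, 2:], Q[:2, 2:]
Q_xgy = Q_x - Q_xy @ np.linalg.inv(Q_y) @ Q_xy.T    # via (27)

lam_x = np.linalg.eigvalsh(Q_x)[::-1]               # eigenvalues of Q_X
lam_xgy = np.linalg.eigvalsh(Q_xgy)[::-1]           # ~ {0.7538, 0.2}
I_xy = 0.5 * (np.linalg.slogdet(Q_x)[1]
              - np.linalg.slogdet(Q_xgy)[1])        # I(X;Y) for Gaussians

for D in (0.1, 0.2, 0.4, 0.8):
    R_x, _ = conditional_rdf_waterfill(lam_x, D)    # classical RDF R_X
    R_c, _ = conditional_rdf_waterfill(lam_xgy, D)  # conditional RDF R_{X|Y}
    print(f"D={D}: R_X={R_x:.3f}, R_X|Y={R_c:.3f}, "
          f"Gray bound={max(R_x - I_xy, 0.0):.3f}")
```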

Figure 5. Comparison of the classical RDF R_X(Δ_X), the conditional RDF R_{X|Y}(Δ_X) = R̄(Δ_X), and Gray's lower bound R_X(Δ_X) − I(X; Y) (solid green line).

6. Conclusions

We derived nontrivial structural properties of the optimal test channel realizations that achieve the optimal test channel distributions of the characterizations of the RDFs for a tuple of multivariate, jointly independent and identically distributed Gaussian random variables with mean-square error fidelity, for two cases: first, when the side information is available at both the encoder and the decoder, and second, when it is only available at the decoder. Using the realizations of the optimal test channels, we showed that knowing the side information at both the encoder and the decoder does not achieve better compression than knowing it only at the decoder.

Appendix A

Appendix A.1. Proof of Lemma 1

(a) By the chain rule of mutual information,

I(X; X̂, Y) = I(X; Y|X̂) + I(X; X̂) (A1)
= I(X; X̂|Y) + I(X; Y). (A2)

Since I(X; Y|X̂) ≥ 0, it follows from the above that

I(X; X̂) ≤ I(X; X̂|Y) + I(X; Y) (A3)
⟹ I(X; X̂|Y) ≥ I(X; X̂) − I(X; Y). (A4)

The above shows (40). Moreover, the inequality holds with equality if and only if I(X; Y|X̂) = 0, and this quantity is zero if and only if P_{X|X̂,Y} = P_{X|X̂}. Alternatively, we note the following:

I(X; X̂|Y) = E{log(P_{X|X̂,Y}/P_{X|Y})} = E{log((P_{X|X̂,Y}/P_X)(P_X/P_{X|Y}))} = E{log(P_{X|X̂,Y}/P_X) − log(P_{X|Y}/P_X)} = E{log(P_{X|X̂}/P_X) − log(P_{X|Y}/P_X)}, if and only if P_{X|X̂,Y} = P_{X|X̂}.

This completes the statement of equality of (40); i.e., it establishes equality (41). (b) Consider a test channel $P_{X|\widehat{X},Y}$ such that $\mathbf{E}\{\|X-\widehat{X}\|^2_{\mathbb{R}^{n_x}}\}\leq\Delta_X$, i.e., $\widehat{X}\in\mathcal{M}_0(\Delta_X)$, and such that $P_{X|\widehat{X},Y}=P_{X|\widehat{X}}$, for $\Delta_X\in D_C(X|Y)\subseteq[0,\infty)$. Taking the infimum of both sides of (41) over $\widehat{X}\in\mathcal{M}_0(\Delta_X)$ such that $P_{X|\widehat{X},Y}=P_{X|\widehat{X}}$, then (43) is obtained on a nontrivial surface, i.e., on $\Delta_X\in D_C(X|Y)$, which exists due to the continuity and convexity of $R_X(\Delta_X)$ for $\Delta_X\in(0,\infty)$. This completes the proof.
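The chain-rule identity (A1)–(A2) and the resulting inequality (A4) can be checked numerically for jointly Gaussian scalars, using the closed form $I(A;B|C)=\tfrac{1}{2}\log\big(\mathrm{var}(A|C)/\mathrm{var}(A|B,C)\big)$. A minimal sketch; the covariance is arbitrary and the helper functions are ours:

```python
import numpy as np

def cond_var(S, i, rest):
    """Conditional variance of component i given components 'rest'
    for a zero-mean Gaussian with covariance S (Schur complement)."""
    rest = list(rest)
    if not rest:
        return S[i, i]
    Srr = S[np.ix_(rest, rest)]
    Sir = S[i, rest]
    return S[i, i] - Sir @ np.linalg.solve(Srr, Sir)

def mi(S, i, j, cond=()):
    """Gaussian (conditional) mutual information I(Z_i; Z_j | Z_cond) in nats."""
    return 0.5 * np.log(cond_var(S, i, list(cond)) / cond_var(S, i, [j, *cond]))

# A generic jointly Gaussian triple (X, Y, Xhat): indices 0, 1, 2.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
S = A @ A.T + np.eye(3)                 # arbitrary positive definite covariance

lhs = mi(S, 0, 2, cond=(1,))            # I(X; Xhat | Y)
rhs = mi(S, 0, 2) + mi(S, 0, 1, cond=(2,)) - mi(S, 0, 1)  # rearranged (A1)-(A2)
print(lhs, rhs)                         # equal up to floating point; (A4) follows since
                                        # I(X; Y | Xhat) >= 0
```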

Appendix A.2. Proof of Theorem 2

(a) (1) By properties of conditional mutual information [18],

\begin{align}
I(X;\widehat{X}|Y)&\overset{(\alpha)}{=}I(X;\widehat{X},\overline{X}^{cm}|Y) \tag{A5}\\
&\overset{(\beta)}{=}I(X;\widehat{X}|\overline{X}^{cm},Y)+I(X;\overline{X}^{cm}|Y) \tag{A6}\\
&\overset{(\gamma)}{\geq}I(X;\overline{X}^{cm}|Y) \tag{A7}
\end{align}

where $(\alpha)$ is due to $\overline{X}^{cm}$ being a function of $(Y,\widehat{X})$ and a well-known property of mutual information [18]; $(\beta)$ is due to the chain rule of mutual information [18]; and $(\gamma)$ is due to $I(X;\widehat{X}|\overline{X}^{cm},Y)\geq 0$. Hence, inequality (45) is shown. (2) If (i) holds, i.e., $\widehat{X}=\overline{X}^{cm}$ a.s., then $I(X;\widehat{X}|\overline{X}^{cm},Y)=0$, and hence inequality (45) becomes an equality. If (ii) holds, since for fixed $y\in\mathcal{Y}$ the function $e(y,\cdot):\widehat{\mathcal{X}}\to\mathcal{X}$, $e(y,\widehat{x})=\overline{x}^{cm}$ uniquely defines $\widehat{x}$, then $I(X;\widehat{X}|\overline{X}^{cm},Y)=0$, and inequality (45) becomes an equality.

(b) The inequality (48) is well known; it follows from the orthogonal projection theorem.

Appendix A.3. Proof of Theorem 7

Consider the realization (88). We identify the triple $(H,G,Q_W)$ such that (84) or (87) holds; i.e., we characterize the set $\mathcal{M}_0^{G,o}(\Delta_X)$.

Case (i). $\mathrm{cov}(\widehat{X},\widehat{X}|Y)\succ 0$, that is, $\mathrm{rank}(Q_{\widehat{X}|Y})=n_x$. By Theorem 6, Case (i), we seek the triple $(H,G,Q_W)$ such that (84) holds, i.e., $\overline{X}^{cm}=\widehat{X}$ a.s. Recall that Conditions 1 and 2 of Theorem 6 are sufficient for $\widehat{X}=\overline{X}^{cm}$.

Condition 1, i.e., (85). The left-hand side of (85) is given by the following (this follows from mean-square estimation theory, or an application of (26) with $\mathcal{G}=\{\Omega,\emptyset\}$):

\begin{align}
\mathbf{E}\{X|Y\}&=\mathbf{E}\{X\}+\mathrm{cov}(X,Y)\,\mathrm{cov}(Y,Y)^{-1}\big(Y-\mathbf{E}\{Y\}\big)=\mathrm{cov}(X,Y)\,\mathrm{cov}(Y,Y)^{-1}Y \tag{A8}\\
&=Q_{X,Y}Q_Y^{-1}Y \tag{A9}\\
&=Q_XC^TQ_Y^{-1}Y\quad\text{by model (15)--(18).} \tag{A10}
\end{align}

Similarly, the right-hand side of (85) is given by

\begin{align}
\mathbf{E}\{\widehat{X}|Y\}&=\mathbf{E}\{\widehat{X}\}+\mathrm{cov}(\widehat{X},Y)\,\mathrm{cov}(Y,Y)^{-1}\big(Y-\mathbf{E}\{Y\}\big)=\mathrm{cov}(\widehat{X},Y)\,\mathrm{cov}(Y,Y)^{-1}Y \tag{A11}\\
&=\big(HQ_{X,Y}+GQ_Y\big)Q_Y^{-1}Y \tag{A12}\\
&=\big(HQ_XC^T+GQ_Y\big)Q_Y^{-1}Y\quad\text{by (15)--(18).} \tag{A13}
\end{align}

Equating (A9) and (A12) yields

\begin{align}
\mathbf{E}\{X|Y\}&=\mathbf{E}\{\widehat{X}|Y\} \tag{A14}\\
\Longrightarrow\; Q_{X,Y}Q_Y^{-1}Y&=\big(HQ_{X,Y}+GQ_Y\big)Q_Y^{-1}Y\quad\text{by (A12)} \tag{A15}\\
\Longrightarrow\; G&=\big(I-H\big)Q_{X,Y}Q_Y^{-1} \tag{A16}\\
\Longrightarrow\; G&=\big(I-H\big)Q_XC^TQ_Y^{-1}\quad\text{by (15)--(18).} \tag{A17}
\end{align}

Hence, G is obtained, and the reproduction is represented by

\begin{align}
\widehat{X}&=HX+\big(I-H\big)Q_{X,Y}Q_Y^{-1}Y+W, \tag{A18}\\
\mathbf{E}\{\widehat{X}|Y\}&=Q_{X,Y}Q_Y^{-1}Y=\mathbf{E}\{X|Y\}, \tag{A19}\\
\widehat{X}-\mathbf{E}\{\widehat{X}|Y\}&=HX-HQ_{X,Y}Q_Y^{-1}Y+W. \tag{A20}
\end{align}

Condition 2, i.e., (86). To apply (86), the following calculations are needed.

\begin{align}
Q_{X|Y}=\mathrm{cov}(X,X|Y)&=\mathbf{E}\big\{\big(X-\mathbf{E}\{X|Y\}\big)\big(X-\mathbf{E}\{X|Y\}\big)^T\big\} \tag{A21}\\
&=Q_X-Q_{X,Y}Q_Y^{-1}Q_{X,Y}^T \tag{A22}\\
&=Q_X-Q_XC^TQ_Y^{-1}CQ_X\quad\text{by (15)--(18).} \tag{A23}\\
\mathrm{cov}(X,\widehat{X}|Y)&=\mathbf{E}\big\{\big(X-\mathbf{E}\{X|Y\}\big)\big(\widehat{X}-\mathbf{E}\{\widehat{X}|Y\}\big)^T\big\}\notag\\
&=\mathbf{E}\big\{\big(X-\mathbf{E}\{X|Y\}\big)\big(\widehat{X}-\mathbf{E}\{X|Y\}\big)^T\big\}\quad\text{by (A19)} \tag{A24}\\
&=\mathbf{E}\big\{\big(X-\mathbf{E}\{X|Y\}\big)\widehat{X}^T\big\}\quad\text{by orthogonality} \tag{A25}\\
&=Q_XH^T-Q_{X,Y}Q_Y^{-1}Q_{Y,X}H^T\quad\text{by (A18), (A19)} \tag{A26}\\
&=Q_XH^T-Q_XC^TQ_Y^{-1}CQ_XH^T=\big(Q_X-Q_XC^TQ_Y^{-1}CQ_X\big)H^T\quad\text{by (15)--(18)} \tag{A27}\\
&=Q_{X|Y}H^T. \tag{A28}\\
\mathrm{cov}(\widehat{X},\widehat{X}|Y)&=\mathbf{E}\big\{\big(\widehat{X}-\mathbf{E}\{\widehat{X}|Y\}\big)\big(\widehat{X}-\mathbf{E}\{\widehat{X}|Y\}\big)^T\big\}=HQ_XH^T+Q_W-HQ_{X,Y}Q_Y^{-1}Q_{Y,X}H^T\quad\text{by (A20)} \tag{A29}\\
&=HQ_XH^T+Q_W-HQ_XC^TQ_Y^{-1}CQ_XH^T=H\big(Q_X-Q_XC^TQ_Y^{-1}CQ_X\big)H^T+Q_W\quad\text{by (15)--(18)} \tag{A30}\\
&=HQ_{X|Y}H^T+Q_W. \tag{A31}
\end{align}

By Condition 2 and (A28) and (A31),

\begin{align}
\mathrm{cov}(X,\widehat{X}|Y)\,\mathrm{cov}(\widehat{X},\widehat{X}|Y)^{-1}=I_{n_x}\;&\Longrightarrow\; Q_{X|Y}H^T\big(HQ_{X|Y}H^T+Q_W\big)^{-1}=I_{n_x} \tag{A32}\\
&\Longrightarrow\; Q_W=Q_{X|Y}H^T-HQ_{X|Y}H^T \tag{A33}\\
&\Longrightarrow\; Q_W=\big(I_{n_x}-H\big)Q_{X|Y}H^T. \tag{A34}
\end{align}

It remains to show $Q_W=Q_W^T$; this follows shortly, once the equation for $H$ is identified. Conditions 1 and 2 imply

\begin{align}
\Sigma_\Delta&=\mathrm{cov}(X,X|Y,\widehat{X}) \tag{A35}\\
&=\mathrm{cov}(X,X|Y)-\mathrm{cov}(X,\widehat{X}|Y)\,\mathrm{cov}(\widehat{X},\widehat{X}|Y)^{-1}\,\mathrm{cov}(X,\widehat{X}|Y)^T\quad\text{by Proposition 1, (26)} \tag{A36}\\
&=\mathrm{cov}(X,X|Y)-\mathrm{cov}(X,\widehat{X}|Y)^T\quad\text{by (86)} \tag{A37}\\
&=Q_{X|Y}-HQ_{X|Y}\quad\text{by (A28).} \tag{A38}\\
\Longrightarrow\; HQ_{X|Y}&=Q_{X|Y}-\Sigma_\Delta \tag{A39}\\
\Longrightarrow\; H&=I-\Sigma_\Delta Q_{X|Y}^{-1}. \tag{A40}
\end{align}

By (A39), it then follows from (A33) that $Q_W=Q_W^T$. From the specification of $G$, the equations for $Q_W$ given by (A33) and (A34), and the equations for $HQ_{X|Y}$ and $H$ given by (A39) and (A40), the realization of Theorem 4.(a) follows for the case $Q_{X|Y}-\Sigma_\Delta\succ 0$. Properties (58)–(61) are easily verified.
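The Case (i) construction can be verified numerically. A minimal sketch with hypothetical numbers: $Q_{X|Y}$ is rounded from the example of Section 5.2, and $\Sigma_\Delta=0.1\,I$ is chosen so that $Q_{X|Y}-\Sigma_\Delta\succ 0$:

```python
import numpy as np

# Hypothetical inputs: Q_{X|Y} rounded from the Section 5.2 example, and a
# feasible distortion matrix Sigma_Delta with Q_{X|Y} - Sigma_Delta > 0.
QX_Y = np.array([[0.6427, 0.2213], [0.2213, 0.3107]])
SigD = 0.1 * np.eye(2)

H  = np.eye(2) - SigD @ np.linalg.inv(QX_Y)   # (A40)
QW = SigD @ H.T                               # Theorem 4.(a); equals (A33)-(A34)

cov_X_Xhat    = QX_Y @ H.T                    # (A28)
cov_Xhat_Xhat = H @ QX_Y @ H.T + QW           # (A31)

# Condition 2, (A32): cov(X, Xhat | Y) cov(Xhat, Xhat | Y)^{-1} = I.
print(cov_X_Xhat @ np.linalg.inv(cov_Xhat_Xhat))  # identity up to floating point
# (A38): Sigma_Delta = Q_{X|Y} - H Q_{X|Y}.
print(QX_Y - H @ QX_Y - SigD)                     # zero matrix up to floating point
```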

Case (ii). $\mathrm{cov}(\widehat{X},\widehat{X}|Y)\succeq 0$ but not $\mathrm{cov}(\widehat{X},\widehat{X}|Y)\succ 0$; that is, $\mathrm{rank}(Q_{\widehat{X}|Y})=n_1<n_x$. We can verify that the stated realization in Theorem 4.(a) is such that Condition (87) holds. By (83) and the above calculations, we have

\begin{align}
\overline{X}^{cm}=e^{\mathcal{G}}(Y,\widehat{X})&=\mathbf{E}\{X|Y\}+\mathrm{cov}(X,\widehat{X}|Y)\,\mathrm{cov}(\widehat{X},\widehat{X}|Y)^{\dagger}\big(\widehat{X}-\mathbf{E}\{\widehat{X}|Y\}\big) \tag{A41}\\
&=\mathbf{E}\{X|Y\}+\big(Q_{X|Y}-\Sigma_\Delta\big)\big(Q_{X|Y}-\Sigma_\Delta\big)^{\dagger}\big(\widehat{X}-\mathbf{E}\{\widehat{X}|Y\}\big). \tag{A42}
\end{align}

Since $Q_{\widehat{X}|Y}=Q_{X|Y}-\Sigma_\Delta$, $\mathbf{E}\big\{\big(\widehat{X}-\mathbf{E}\{\widehat{X}|Y\}\big)\big(\widehat{X}-\mathbf{E}\{\widehat{X}|Y\}\big)^T\big\}=Q_{X|Y}-\Sigma_\Delta$, and $\mathrm{rank}(L)=n_1$, where $L=\big(Q_{X|Y}-\Sigma_\Delta\big)\big(Q_{X|Y}-\Sigma_\Delta\big)^{\dagger}$ and $(\cdot)^{\dagger}$ denotes the Moore–Penrose pseudoinverse, an application of Proposition 3 implies that Condition (87) holds. Thus, we have established Theorem 3.(a) and the properties (i)–(iv) stated under Theorem 4.(a). Finally, (96)–(101) are obtained from the realization, and hence Theorem 3.(b) follows.

Appendix A.4. Proof of Corollary 1

(a) This part is a special case of a related statement in [22]; however, we include it for completeness. By linear algebra [21], given two matrices $A\in\mathbb{S}_+^{k\times k}$, $B\in\mathbb{S}_+^{k\times k}$, the following statements are equivalent: (1) $AB$ is normal, and (2) $AB\succeq 0$, where $AB$ normal means $(AB)(AB)^T=(AB)^T(AB)$. Note that $AB$ is normal if and only if $AB=BA$; i.e., $A$ and $B$ commute. Let $A=U_AD_AU_A^T$, $B=U_BD_BU_B^T$, $U_AU_A^T=I_k$, $U_BU_B^T=I_k$; i.e., there exists a spectral representation of $A$ and $B$ in terms of unitary matrices $U_A$, $U_B$ and diagonal matrices $D_A$, $D_B$. Then, $AB\succeq 0$ if and only if the matrices $A$ and $B$ commute, i.e., $AB=BA$, and $A$ and $B$ commute if and only if $U_A=U_B$.

Suppose (105) holds. Letting $A=Q_{X|Y}$, $B=\Sigma_\Delta$, then $A=U_AD_AU_A^T$, $B=U_BD_BU_B^T$, $U_AU_A^T=I_{n_x}$, $U_BU_B^T=I_{n_x}$, $U_A=U_B$. Since $Q_{X|Y}^{-1}=A^{-1}=U_AD_A^{-1}U_A^T$, then $\Sigma_\Delta Q_{X|Y}^{-1}=Q_{X|Y}^{-1}\Sigma_\Delta$; i.e., they commute. Hence,

\begin{align}
H^T=\big(I_{n_x}-\Sigma_\Delta Q_{X|Y}^{-1}\big)^T&=I_{n_x}-\big(Q_{X|Y}^{-1}\big)^T\Sigma_\Delta^T=I_{n_x}-Q_{X|Y}^{-1}\Sigma_\Delta\notag\\
&=I_{n_x}-\Sigma_\Delta Q_{X|Y}^{-1}=H,\quad\text{since } Q_{X|Y}\text{ and }\Sigma_\Delta\text{ commute.} \tag{A43}
\end{align}

By the definition of QW given in Theorem 4.(a), we have

\begin{align}
Q_W=\Sigma_\Delta H^T,\qquad Q_W^T=H\Sigma_\Delta. \tag{A44}
\end{align}

Substituting (A43) into (A44), then

\begin{align}
Q_W=\Sigma_\Delta H. \tag{A45}
\end{align}

Hence, $\{\Sigma_\Delta, Q_{X|Y}, H, Q_W\}$ are all elements of $\mathbb{S}_+^{n_x\times n_x}$ having a spectral decomposition with respect to the same unitary matrix $U$, $UU^T=I_{n_x}$.
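The commutation argument can also be checked numerically. A minimal sketch with a hypothetical common eigenbasis $U$ for $Q_{X|Y}$ and $\Sigma_\Delta$:

```python
import numpy as np

# Hypothetical check of Corollary 1: pick Q_{X|Y} and Sigma_Delta with a common
# eigenbasis U (so they commute) and verify that H and Q_W inherit that basis.
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((3, 3)))     # a common unitary eigenbasis
QX_Y = U @ np.diag([1.0, 0.6, 0.3]) @ U.T
SigD = U @ np.diag([0.2, 0.1, 0.05]) @ U.T           # feasible: Q_{X|Y} - Sigma_Delta > 0

print(np.allclose(QX_Y @ SigD, SigD @ QX_Y))         # True: they commute

H  = np.eye(3) - SigD @ np.linalg.inv(QX_Y)          # (A40)
QW = SigD @ H.T                                      # Theorem 4.(a)

print(np.allclose(H, H.T), np.allclose(QW, QW.T))    # (A43), (A44): both symmetric
print(np.allclose(QW, SigD @ H))                     # (A45)
```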

Author Contributions

M.G. and C.D.C. contributed to the conceptualization, methodology, and writing of the manuscript. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Funding Statement

The work of M.G. and C.D.C. was co-funded by the European Regional Development Fund and the Republic of Cyprus through the Research and Innovation Foundation (Project: EXCELLENCE/1216/0296).

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

1. Wyner A., Ziv J. The rate-distortion function for source coding with side information at the decoder. IEEE Trans. Inf. Theory. 1976;22:1–10. doi: 10.1109/TIT.1976.1055508.
2. Wyner A. The rate-distortion function for source coding with side information at the decoder-II: General sources. Inf. Control. 1978;38:60–80. doi: 10.1016/S0019-9958(78)90034-7.
3. Berger T. Rate Distortion Theory: A Mathematical Basis for Data Compression. Prentice-Hall; Englewood Cliffs, NJ, USA: 1971.
4. Gray R.M. A new class of lower bounds to information rates of stationary sources via conditional rate-distortion functions. IEEE Trans. Inf. Theory. 1973;19:480–489. doi: 10.1109/TIT.1973.1055050.
5. Tian C., Chen J. Remote Vector Gaussian Source Coding With Decoder Side Information Under Mutual Information and Distortion Constraints. IEEE Trans. Inf. Theory. 2009;55:4676–4680. doi: 10.1109/TIT.2009.2027519.
6. Zahedi A., Ostergaard J., Jensen S.H., Naylor P., Bech S. Distributed remote vector Gaussian source coding with covariance distortion constraints. Proceedings of the 2014 IEEE International Symposium on Information Theory; Honolulu, HI, USA; 29 June–4 July 2014; pp. 586–590.
7. Gkagkos M., Charalambous C.D. Structural Properties of Test Channels of the RDF for Gaussian Multivariate Distributed Sources. Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT); Melbourne, Australia; 12–20 July 2021; pp. 2631–2636.
8. Draper S.C., Wornell G.W. Side information aware coding strategies for sensor networks. IEEE J. Sel. Areas Commun. 2004;22:966–976. doi: 10.1109/JSAC.2004.830875.
9. Oohama Y. Gaussian multiterminal source coding. IEEE Trans. Inf. Theory. 1997;43:1912–1923. doi: 10.1109/18.641555.
10. Oohama Y. Rate-distortion theory for Gaussian multiterminal source coding systems with several side informations at the decoder. IEEE Trans. Inf. Theory. 2005;51:2577–2593. doi: 10.1109/TIT.2005.850110.
11. Viswanathan H., Berger T. The quadratic Gaussian CEO problem. IEEE Trans. Inf. Theory. 1997;43:1549–1559. doi: 10.1109/18.623151.
12. Ekrem E., Ulukus S. An outer bound for the vector Gaussian CEO problem. Proceedings of the 2012 IEEE International Symposium on Information Theory; Cambridge, MA, USA; 1–6 July 2012; pp. 576–580.
13. Wang J., Chen J. Vector Gaussian Multiterminal Source Coding. IEEE Trans. Inf. Theory. 2014;60:5533–5552. doi: 10.1109/TIT.2014.2333473.
14. Xu Y., Guang X., Lu J., Chen J. Vector Gaussian Successive Refinement With Degraded Side Information. IEEE Trans. Inf. Theory. 2021;67:6963–6982. doi: 10.1109/TIT.2021.3107215.
15. Renna F., Wang L., Yuan X., Yang J., Reeves G., Calderbank R., Carin L., Rodrigues M.R.D. Classification and Reconstruction of High-Dimensional Signals From Low-Dimensional Features in the Presence of Side Information. IEEE Trans. Inf. Theory. 2016;62:6459–6492. doi: 10.1109/TIT.2016.2606646.
16. Salehkalaibar S., Phan B., Khisti A., Yu W. Rate-Distortion-Perception Tradeoff Based on the Conditional Perception Measure. Proceedings of the 2023 Biennial Symposium on Communications (BSC); Montreal, QC, Canada; 4–7 July 2023; pp. 31–37.
17. Gallager R.G. Information Theory and Reliable Communication. John Wiley & Sons, Inc.; New York, NY, USA: 1968.
18. Pinsker M.S. The Information Stability of Gaussian Random Variables and Processes. Volume 133. Holden-Day, Inc.; San Francisco, CA, USA: 1964. pp. 28–30.
19. Aries A., Liptser R., Shiryayev A. Statistics of Random Processes II: Applications. Stochastic Modelling and Applied Probability. Springer; New York, NY, USA: 2013.
20. van Schuppen J. Control and System Theory of Discrete-Time Stochastic Systems. Number 923 in Communications and Control Engineering. Springer; Berlin/Heidelberg, Germany: 2021.
21. Horn R.A., Johnson C.R., editors. Matrix Analysis. 2nd ed. Cambridge University Press; New York, NY, USA: 2013.
22. Charalambous C., Charalambous T., Kourtellaris C., van Schuppen J. Structural Properties of Nonanticipatory Epsilon Entropy of Multivariate Gaussian Sources. Proceedings of the 2020 IEEE International Symposium on Information Theory; Los Angeles, CA, USA; 21–26 June 2020; pp. 586–590.
23. Gorbunov A.K., Pinsker M.S. Prognostic Epsilon Entropy of a Gaussian Message and a Gaussian Source. Probl. Inf. Transm. 1974;10:93–109.
24. Ihara S. Information Theory for Continuous Systems. World Scientific; Singapore: 1993.
