. 2024 Feb 20;26(3):178. doi: 10.3390/e26030178

Blahut–Arimoto Algorithms for Inner and Outer Bounds on Capacity Regions of Broadcast Channels

Yanan Dou 1, Yanqing Liu 2, Xueyan Niu 3, Bo Bai 3, Wei Han 3, Yanlin Geng 1,*
Editor: Eduard Jorswieck
PMCID: PMC10969477  PMID: 38539690

Abstract

The celebrated Blahut–Arimoto algorithm computes the capacity of a discrete memoryless point-to-point channel by alternately maximizing the objective function of a maximization problem. This algorithm has been applied to degraded broadcast channels, in which the supporting hyperplanes of the capacity region are again cast as maximization problems. In this work, we consider general broadcast channels and extend this algorithm to compute inner and outer bounds on the capacity regions. Our main contributions are as follows: first, we show that the optimization problems are max–min problems and that the exchange of minimum and maximum holds; second, we design Blahut–Arimoto algorithms for the maximization part and gradient descent algorithms for the minimization part; third, we provide convergence analysis for both parts. Numerical experiments validate the effectiveness of our algorithms.

Keywords: Blahut–Arimoto algorithm, broadcast channel, capacity region, superposition coding inner bound, Marton’s inner bound, UV outer bound

1. Introduction

In 1972, Cover [1] introduced the two-receiver discrete memoryless broadcast channel p(y,z|x) to model a system of downlink communication in which X is the sender and (Y,Z) are the receivers. In the same paper, he proposed a coding scheme which resulted in the superposition coding inner bound (SCIB). It turns out that the SCIB is indeed the capacity region for two-receiver broadcast channels in which the receivers are comparable in the following partial orders: degraded [2], less noisy [3], and more capable [3]. However, for a general broadcast channel the single-letter capacity region remains open.

To characterize the capacity region of a broadcast channel, a standard approach is to show that one inner bound matches another outer bound. Currently, the best inner bound for general broadcast channels is Marton’s inner bound (MIB) [4], while the UV outer bound (UVOB) [5] was the best outer bound until recently, when a better one called J version outer bound was proposed in [6].

The evaluation of inner and outer bounds is critical in the following aspects: (1) the evaluation of an inner bound usually results in an optimal input distribution which can help in the design of practical coding schemes; (2) the identification of the capacity region of a particular broadcast channel through the comparison of one inner bound and one outer bound relies on the evaluation of these two bounds; and (3) when claiming to establish a new bound, it is necessary to show that the new bound strictly improves on old ones through the evaluation of bounds on a particular channel.

Remark 1.

This is the full version of conference papers accepted by ISIT 2022 and 2023 [7,8].

However, this evaluation is usually difficult due to its non-convexity [9]. To alleviate this issue, there exist a number of generic optimization algorithms, such as interior point [10], active set [10], and sequential quadratic programming [11]. However, efficient algorithms should use the domain knowledge of information theory as well; from this viewpoint, we consider the Blahut–Arimoto (BA) algorithm, which is specially customized for information theory.

The original BA algorithm was independently developed by Blahut [12] and Arimoto [13] to calculate the channel capacity C=maxq(x)I(X;Y) for a general point-to-point channel p(y|x). The algorithm transforms the original maximization problem into an alternating maximization problem:

\max_{q(x)} \sum_{x,y} q(x)\,p(y|x) \ln\frac{q(x|y)}{q(x)} \quad\longrightarrow\quad \max_{q(x),\,Q(x|y)} \sum_{x,y} q(x)\,p(y|x) \ln\frac{Q(x|y)}{q(x)},

where the updating formulae are explicit within each iteration.

There have been numerous extensions of the BA algorithm to various scenarios in information theory. For example, [14] applied the BA algorithm to compute the sum rate of the multiple-access channel. Later, using the idea from the BA algorithm, the whole capacity region of the multiple-access channel was formulated in [15] as a rank-one constrained problem and solved by relaxation methods. It is beyond the scope of this paper to list all of these references. Instead, we discuss those papers closely related to computing the bounds on capacity regions of broadcast channels.

In [16], the authors considered the capacity region of a degraded broadcast channel p(y,z|x), where receiver Z is a degraded version of Y.

In this scenario, the capacity region of the rate pairs (RY,RZ) is known, and can be achieved by the simplified version of superposition coding. The supporting hyperplanes can be characterized as

\theta R_Y + (1-\theta) R_Z = \max_{q(u,x)} \theta I(X;Y|U) + (1-\theta) I(U;Z).

Using a similar idea to that of the BA algorithm, the authors designed an algorithm to alternately maximize the objective function.

The method in [16] is directly applicable to less noisy broadcast channels, as the characterization of the capacity region is the same as that of the degraded case. However, this equivalence no longer holds for the more capable case, as this time the value of the supporting hyperplane θR_Y + θ̄R_Z is characterized as a max–min optimization problem (e.g., see Equation (14)). As a matter of fact, the supporting hyperplanes of the above-mentioned bounds, that is, SCIB, MIB, and UVOB, are all of the max–min form. The main issue is that the minimization part is inside the maximization part, which prevents the application of the BA algorithm to the whole problem.

The algorithms for calculating inner bounds and outer bounds for general broadcast channels are very limited. The authors of [17] considered MIB (see Section 3.2) and designed a BA algorithm to compute the sum rate R_Y + R_Z of the simplified version in which the auxiliary random variable W is set to be empty (W = ∅). The objective function,

\sum_{u,v,x,y,z} q(u,v)\,q(x|u,v)\,p(y,z|x) \ln\frac{Q(u|y)\,Q(v|z)}{q(u,v)},

is convex in q(x|u,v), which means that the maximizing input X is a deterministic function of (U,V). Noticing this, the authors performed the optimization over all fixed mappings q(x|u,v). However, discarding W can result in a strictly smaller sum rate [18], making it necessary to consider the complete version of MIB.

In this paper, we seek to design BA algorithms for general broadcast channels in order to compute the following inner and outer bounds: SCIB, MIB, and UVOB. The key difference here is that the optimization problems are max–min problems, rather than only containing a maximization part. In Table 1, we provide an intuitive comparison of related references.

Table 1.

Comparison of typical scenarios related to the BA algorithm.

Channel Reference Objective Form Algorithm
point-to-point [12,13] capacity max BA
multiple access [14] sum-rate max BA
multiple access [15] inner/outer bounds max relaxation
degraded broadcast [16] capacity region max BA
general broadcast this paper inner/outer bounds max–min BA + gradient

The notation we use is as follows. p denotes a fixed (conditional) probability distribution such as p(y,z|x), while q and Q are used for (conditional) probabilities that are changeable. Calligraphic letters such as S are used to denote sets. The use of square brackets in the function f[g] means that f is specified by the variable g; θ̄ denotes θ̄ := 1 − θ; and unless otherwise specified, we use the natural logarithm. To make the mathematical expressions more concise, we use the following abbreviations for the Kullback–Leibler divergences:

D_Y(p\|q) := \sum_y p(y) \ln\frac{p(y)}{q(y)}, \quad D_{Y|x}(p\|q) := \sum_y p(y|x) \ln\frac{p(y|x)}{q(y|x)}, \quad D_{Y|X}(p\|q) := \sum_x p(x)\, D_{Y|x}(p\|q).

The organization is as follows. First, in Section 2 we introduce the necessary background on the BA algorithm and its extension in [16]. Then, in Section 3 we extend the BA algorithm to the evaluation of SCIB, MIB, and UVOB. Convergence analyses of these algorithms are presented in Section 4. Finally, in Section 5 we perform numerical experiments to validate the effectiveness and efficiency of our algorithms.

2. Mathematical Background of Blahut–Arimoto Algorithms

We first introduce the standard BA algorithm in Section 2.1, as we will rely on several of its properties later. Then, in Section 2.2 we discuss why the method in [16] cannot be applied to general broadcast channels.

2.1. Blahut–Arimoto Algorithm for Point-to-Point Channel

For a point-to-point channel p(y|x), the capacity C is the maximum of the mutual information C=maxq(x)I(X;Y)=maxq(x)C(q), where

C(q) = H(X) - H(X|Y) = \sum_{x,y} q(x)\,p(y|x) \ln\frac{q(x|y)}{q(x)}. (1)

By replacing q(x|y) with a free variable Q(x|y), the BA algorithm performs alternating maximization maxq,QC(q,Q), where

C(q,Q) = \sum_{x,y} q(x)\,p(y|x) \ln\frac{Q(x|y)}{q(x)}. (2)

Notice here that we abuse the notation of C(·), which should not cause confusion in general.

The above objective function can be reformulated as follows:

C(q,Q) = \sum_x q(x)\big( d[Q](x) - \ln q(x) \big), (3)
where \quad d[Q](x) = \sum_y p(y|x) \ln Q(x|y). (4)

We call this the basic form. For different scenarios, BA algorithms mainly differ in the distribution q(·) and function d[Q](·).

The following theorem (see proof in Appendix A) provides the explicit formulae for the maximum Q given q (denoted as Q[q]) and maximum q given Q (denoted as q[Q]).

Theorem 1.

The following properties hold for the problem maxq,QC(q,Q).

  1. Given a fixed q, C(q,Q) is concave in Q, and the maximum point Q[q] is induced by the input and the channel,
    Q[q](x|y) = q(x|y) = \frac{q(x)\,p(y|x)}{\sum_{x'} q(x')\,p(y|x')}. (5)
    Further, the function values satisfy
    C(q) = C(q, Q[q]) \ge C(q, Q). (6)
  2. Given a fixed Q, C(q,Q) is concave in q, and the maximum point q[Q] is obtained via the Lagrangian,
    q[Q](x) = \frac{\exp\{d[Q](x)\}}{\sum_{x'} \exp\{d[Q](x')\}}. (7)
    Further, evaluating the function value results in
    C(q[Q], Q) = \ln \sum_x \exp\{d[Q](x)\} = d[Q](x) - \ln q[Q](x), \quad \forall x. (8)
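The two update formulae can be checked numerically. The following sketch (pure Python; the channel and the input distribution are randomly generated, purely for illustration) verifies that the induced posterior of Equation (5) attains C(q, Q[q]) = I(X;Y) and dominates an arbitrary Q, as in Equation (6):

```python
import math
import random

random.seed(0)

# A random 3-input, 4-output channel p(y|x) and a fixed input q(x).
nx, ny = 3, 4
p = [[random.random() for _ in range(ny)] for _ in range(nx)]
p = [[v / sum(row) for v in row] for row in p]   # normalize rows
q = [0.2, 0.5, 0.3]

def C(q, Q):
    # C(q, Q) = sum_{x,y} q(x) p(y|x) ln( Q(x|y) / q(x) ), Equation (2);
    # here Q[y][x] stores Q(x|y).
    return sum(q[x] * p[x][y] * math.log(Q[y][x] / q[x])
               for x in range(nx) for y in range(ny))

# Equation (5): Q[q](x|y) is the posterior induced by q and the channel.
qy = [sum(q[x] * p[x][y] for x in range(nx)) for y in range(ny)]
Qopt = [[q[x] * p[x][y] / qy[y] for x in range(nx)] for y in range(ny)]

# C(q, Q[q]) recovers the mutual information I(X;Y) under q ...
mi = sum(q[x] * p[x][y] * math.log(p[x][y] / qy[y])
         for x in range(nx) for y in range(ny))
assert abs(C(q, Qopt) - mi) < 1e-12

# ... and dominates any other conditional Q(x|y), as in Equation (6).
Qunif = [[1.0 / nx] * nx for _ in range(ny)]
assert C(q, Qopt) >= C(q, Qunif)
```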

Starting from an initial q0(x)>0, the BA algorithm performs alternating maximization and produces a sequence of points

q_0 \to Q_1 \to q_1 \to Q_2 \to \cdots \to q_n \to Q_{n+1} \to \cdots

where

Q_n = Q[q_{n-1}], \quad q_n = q[Q_n]

according to Equations (5) and (7), respectively.

The criterion for stopping the iterations is based on the following result [12,13].

Proposition 1

(Proposition 1 in [13], Theorem 2 in [12]). It is the case that q maximizes C(q) if and only if the following holds for Q[q] and some scalar D:

d[Q[q]](x) - \ln q(x) \begin{cases} = D, & q(x) > 0, \\ \le D, & q(x) = 0. \end{cases}

Remark 2.

It should be mentioned that, in order to avoid infinity minus infinity for those x with q(x) = 0 in the above proposition, the following equivalent formulae can be used:

d[Q[q]](x) - \ln q(x) = \sum_y p(y|x) \ln\frac{q(x|y)}{q(x)} = \sum_y p(y|x) \ln\frac{p(y|x)}{q(y)}. (9)

This is not an issue in the BA algorithm itself, as q_n(x) > 0 according to Equation (7).

Thus, at the end of the n-th step, if the difference

\max_x \{ d[Q[q_{n-1}]](x) - \ln q_{n-1}(x) \} - C(q_n, Q_n) = \max_x \{ d[Q_n](x) - \ln q_{n-1}(x) \} - \ln \sum_x \exp\{d[Q_n](x)\}

is small enough then the iteration is stopped.

Summarizing the above details, we arrive at the BA algorithm depicted in Algorithm 1. The convergence of the resulting sequence C(q_n, Q_n) is characterized in the following theorem (see proof in Appendix A).

Theorem 2

(Theorem 1 in [13], Theorem 3 in [12]). If q_0 > 0, then the value C(q_n, Q_n) converges monotonically from below to the capacity C.

Algorithm 1: Computing channel capacity
Input: p(y|x), maximum iterations N, threshold ϵ > 0;
Initialization: q_0(x) > 0, ϵ_0 > ϵ, n = 0;
while n < N and ϵ_n > ϵ do
    n ← n + 1;
    Q_n = Q[q_{n-1}] using Equation (5);
    q_n = q[Q_n] using Equation (7);
    C(q_n, Q_n) = ln Σ_x exp{d[Q_n](x)} using Equations (8) and (4);
    ϵ_n = max_x {d[Q_n](x) − ln q_{n-1}(x)} − C(q_n, Q_n) using Equation (4);
end
Output: q_n(x), Q_n(x|y), C(q_n, Q_n)
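Algorithm 1 is short enough to transcribe directly. The sketch below (pure Python, natural logarithms; the tolerances and the test channel are illustrative, and every output symbol is assumed reachable so that the induced q(y) stays positive) implements the loop above and checks it on a binary symmetric channel, whose capacity ln 2 − h(0.1) is known in closed form:

```python
import math

def ba_capacity(p, N=1000, eps=1e-10):
    """Blahut-Arimoto for a point-to-point channel p[x][y] = p(y|x).

    Returns (q, C) with C in nats."""
    nx, ny = len(p), len(p[0])
    q = [1.0 / nx] * nx                       # q0(x) > 0
    C = 0.0
    for _ in range(N):
        # Equation (5): Q(x|y) = q(x) p(y|x) / sum_x' q(x') p(y|x')
        qy = [sum(q[x] * p[x][y] for x in range(nx)) for y in range(ny)]
        # Equation (4): d[Q](x) = sum_y p(y|x) ln Q(x|y)
        d = [sum(p[x][y] * math.log(q[x] * p[x][y] / qy[y])
                 for y in range(ny) if p[x][y] > 0) for x in range(nx)]
        z = sum(math.exp(dx) for dx in d)
        C = math.log(z)                       # Equation (8)
        # stopping criterion: max_x {d[Q](x) - ln q(x)} - C(q_n, Q_n)
        gap = max(d[x] - math.log(q[x]) for x in range(nx)) - C
        q = [math.exp(dx) / z for dx in d]    # Equation (7)
        if gap < eps:
            break
    return q, C

# Binary symmetric channel with crossover 0.1: C = ln 2 - h(0.1) nats.
q, C = ba_capacity([[0.9, 0.1], [0.1, 0.9]])
h = -0.1 * math.log(0.1) - 0.9 * math.log(0.9)
assert abs(C - (math.log(2) - h)) < 1e-8
```

Note that the quantity `gap` is exactly the optimality gap from Proposition 1, so the reported value is within ϵ of the true capacity whenever the loop exits early.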

2.2. Blahut–Arimoto Algorithm for Degraded Broadcast Channel

In [16], the authors considered the capacity region of the degraded broadcast channel. The original objective function for the value of the supporting hyperplane θR_Y + θ̄R_Z (where θ < 1/2) is:

F(q(u,x)) = \theta I(X;Y|U) + \bar\theta I(U;Z) = \bar\theta\big( I(X;Y|U) + I(U;Z) \big) - (\bar\theta - \theta) I(X;Y|U) (10)
= \bar\theta\big( H(U,X) - H(X|Y,U) - H(U|Z) \big) - (\bar\theta - \theta)\big( H(Y|U) - H(Y|X) \big). (11)

Similar to the BA algorithm, the new objective function is

F(q,Q) = \bar\theta \sum_{u,x} q(u,x)\big( d[Q](u,x) - \ln q(u,x) \big), (12)

where

d[Q](u,x) = \sum_{y,z} p(y,z|x) \left[ \ln Q(x|y,u)\,Q(u|z) + \frac{\bar\theta - \theta}{\bar\theta} \ln\frac{Q(y|u)}{p(y|x)} \right]. (13)

Then, the authors designed an extended BA algorithm that alternately maximizes F(q,Q) and analysed its convergence.

However, the above method does not generalize to allow for evaluating the capacity bounds of general broadcast channels. The main reason can be summarized in just one sentence: the minimum of expectations is, in general, greater than the expectation of the minimum. Taking SCIB as an example, using the representation A(U,X) in Lemma 1, the supporting hyperplane is

\theta R_Y + \bar\theta R_Z = \max_{q(u,x)} \theta I(X;Y|U) + \bar\theta \min\{ I(U;Z),\, I(U;Y) \} = \max_{q(u,x)} \Big( \bar\theta\big( H(U,X) - H(X|Y,U) + \min\{ -H(U|Z),\, -H(U|Y) \} \big) - (\bar\theta - \theta)\big( H(Y|U) - H(Y|X) \big) \Big).

A direct extension might try to reformulate the last expression in the form of Equation (12) by letting

d[Q](u,x) = \sum_{y,z} p(y,z|x) \left[ \ln Q(x|y,u) + \min\{ \ln Q(u|z),\, \ln Q(u|y) \} + \frac{\bar\theta - \theta}{\bar\theta} \ln\frac{Q(y|u)}{p(y|x)} \right],

however, this is not an equivalent reformulation, as

\min\{ -H(U|Z),\, -H(U|Y) \} \ge \sum_{u,x} q(u,x) \sum_{y,z} p(y,z|x) \min\{ \ln Q(u|z),\, \ln Q(u|y) \}.

In the following section, we first use the fact that \min\{f,g\} = \min_{\alpha\in[0,1]} \{ \alpha f + \bar\alpha g \} to parameterize the minimum, then show that the max–min exchanges so that we can apply the BA algorithm in the maximization part.
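Both facts used above can be illustrated with a toy numerical check (made-up vectors, with a uniform weight vector playing the role of the expectation): the minimum of expectations dominates the expectation of the pointwise minimum, and the parameterization min{f,g} = min_α {αf + ᾱg} is attained at an endpoint because the objective is linear in α.

```python
import random

random.seed(1)

n = 5
w = [1.0 / n] * n                       # a probability (weight) vector
f = [random.random() for _ in range(n)]
g = [random.random() for _ in range(n)]

Ef = sum(wi * fi for wi, fi in zip(w, f))
Eg = sum(wi * gi for wi, gi in zip(w, g))
E_min = sum(wi * min(fi, gi) for wi, fi, gi in zip(w, f, g))

# The minimum of the expectations dominates the expectation of the
# pointwise minimum:
assert min(Ef, Eg) >= E_min

# And min{f, g} = min_{alpha in [0,1]} {alpha*f + (1-alpha)*g}: since the
# objective is linear in alpha, a grid containing the endpoints finds it.
grid = [k / 100 for k in range(101)]
assert abs(min(Ef, Eg) - min(a * Ef + (1 - a) * Eg for a in grid)) < 1e-12
```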

3. Blahut–Arimoto Algorithms for Capacity Bounds of Broadcast Channels

In this section, we first introduce two inner bounds and one outer bound on the capacity region of the broadcast channel. We characterize their supporting hyperplanes as max–min problems and show that the maximum and minimum can be exchanged. Then, we design BA algorithms for the maximization parts and gradient descent algorithms for the minimization parts.

3.1. Superposition Coding Inner Bound

The superposition coding inner bound was proposed by Cover in [1]; it corresponds to region B in the following lemma (see proof in Appendix B). This region has three equivalent characterizations.

Lemma 1

(folklore). The following regions A, B, and C are equivalent characterizations of the superposition coding inner bound:

\mathcal{A} := \bigcup_{q(u,x)} \mathcal{A}(U,X), \quad \mathcal{A}(U,X) := \{ R_Y \le I(X;Y|U),\; R_Z \le \min\{ I(U;Z),\, I(U;Y) \} \},
\mathcal{B} := \bigcup_{q(u,x)} \mathcal{B}(U,X), \quad \mathcal{B}(U,X) := \{ R_Y \le I(X;Y|U),\; R_Z \le I(U;Z),\; R_Y + R_Z \le I(X;Y) \},
\mathcal{C} := \bigcup_{q(u,x)} \mathcal{C}(U,X), \quad \mathcal{C}(U,X) := \{ R_Z \le I(U;Z),\; R_Y + R_Z \le \min\{ I(X;Y|U) + I(U;Z),\, I(X;Y) \} \}.

To characterize the supporting hyperplane θR_Y + θ̄R_Z = F, we choose to use the representation A(U,X). It is clear that

F = \max_{q(u,x)} \theta I(X;Y|U) + \bar\theta \min\{ I(U;Z),\, I(U;Y) \}
= \max_{q(u,x)} \theta I(X;Y|U) + \bar\theta \min_{\alpha\in[0,1]} \{ \alpha I(U;Z) + \bar\alpha I(U;Y) \}
= \max_{q(u,x)} \min_{\alpha\in[0,1]} \theta I(X;Y|U) + \bar\theta\alpha I(U;Z) + \bar\theta\bar\alpha I(U;Y). (14)

Notice here that we cannot use the BA algorithm directly, as there is a minimum inside the maximum. If we are able to swap the orders of the maximum and minimum, then we can adopt the BA algorithm in the maximization part.

To show this kind of exchange, we first introduce a Terkelsen-type min–max result in the following lemma.

Lemma 2

(Corollary 2 in Appendix A of [19]). Let \Lambda_d be the d-dimensional simplex, i.e., \lambda_i \ge 0 and \sum_{i=1}^d \lambda_i = 1, let \mathcal{P} be a set of probability distributions p(u), and let T_i(p(u)), i = 1, \ldots, d, be functions such that the set \mathcal{T} defined by

\mathcal{T} = \{ (a_1, \ldots, a_d) \in \mathbb{R}^d : a_i \le T_i(p(u)) \text{ for some } p(u) \in \mathcal{P} \}

is convex; then,

\sup_{p(u)\in\mathcal{P}} \min_{\lambda\in\Lambda_d} \sum_i \lambda_i T_i(p(u)) = \min_{\lambda\in\Lambda_d} \sup_{p(u)\in\mathcal{P}} \sum_i \lambda_i T_i(p(u)).

With this lemma, we can establish the following theorem (proof in Appendix B).

Theorem 3.

The supporting hyperplane \theta R_Y + \bar\theta R_Z = F of the superposition coding inner bound is as follows: if \theta \in [1/2, 1], then F = \max_{q(x)} \theta I(X;Y); otherwise, if \theta \in [0, 1/2), then

F = \min\left\{ \min_{\alpha \le \frac{1-2\theta}{1-\theta}} \max_{q(x)} \bar\theta\bar\alpha I(X;Y) + \bar\theta\alpha I(X;Z),\;\; \min_{\alpha > \frac{1-2\theta}{1-\theta}} \max_{q(u,x)} \theta I(X;Y|U) + \bar\theta\bar\alpha I(U;Y) + \bar\theta\alpha I(U;Z) \right\}.

Further, it suffices to consider the cardinality bound |U| \le |X|.

For the maximization part, in the nontrivial case where θ ∈ (0, 1/2), two types of BA algorithms can be designed following the above theorem, according to the value of α.

When \alpha \in \big( \frac{1-2\theta}{1-\theta}, 1 \big], the original objective function is

F(\alpha, q) = \theta I(X;Y|U) + \bar\theta\bar\alpha I(U;Y) + \bar\theta\alpha I(U;Z) = \bar\theta\big( I(X;Y|U) + \bar\alpha I(U;Y) + \alpha I(U;Z) \big) - (\bar\theta - \theta) I(X;Y|U) = \bar\theta\big( H(U,X) - H(X|Y,U) - \bar\alpha H(U|Y) - \alpha H(U|Z) \big) (15)
+ (\bar\theta - \theta)\big( H(Y|X) - H(Y|U) \big) = \bar\theta \sum_{u,x,y,z} q(u,x)\,p(y,z|x) \left[ \ln\frac{ q(x|y,u)\, q(u|y)^{\bar\alpha}\, q(u|z)^{\alpha} }{ q(u,x) } + \frac{\bar\theta - \theta}{\bar\theta} \ln\frac{q(y|u)}{p(y|x)} \right]. (16)

By replacing the conditional qs with free variables Qs, we have the new objective function

F(\alpha, q, Q) = \bar\theta \sum_{u,x} q(u,x)\big( d[Q](u,x) - \ln q(u,x) \big), (17)

where

d[Q](u,x) = \sum_{y,z} p(y,z|x) \left[ \ln Q(x|y,u)\, Q(u|y)^{\bar\alpha}\, Q(u|z)^{\alpha} + \frac{\bar\theta - \theta}{\bar\theta} \ln\frac{Q(y|u)}{p(y|x)} \right]. (18)

When \alpha \in \big[ 0, \frac{1-2\theta}{1-\theta} \big], similar to Equation (4), the new objective function is

F(\alpha, q, Q) = \bar\theta \sum_x q(x)\big( d[Q](x) - \ln q(x) \big), (19)

where

d[Q](x) = \sum_{y,z} p(y,z|x) \big( \bar\alpha \ln Q(x|y) + \alpha \ln Q(x|z) \big). (20)

For the minimization part, it is possible to use the optimal q and Qs obtained in the maximization part to update α. Because the values of the optimal q and Qs may vary greatly when α changes, we propose changing α locally in a neighbourhood. A candidate approach is to use the gradient descent method, as follows:

\alpha_{k+1} = \alpha_k - \tau_k \cdot \partial_\alpha F(\alpha_k, q), (21)

where

\frac{\partial F(\alpha, q)}{\partial \alpha} = \begin{cases} \bar\theta \cdot \big( I(X;Z) - I(X;Y) \big), & 0 \le \alpha \le \frac{1-2\theta}{1-\theta}, \\ \bar\theta \cdot \big( I(U;Z) - I(U;Y) \big), & \frac{1-2\theta}{1-\theta} < \alpha \le 1. \end{cases} (22)

If the change in α is sufficiently small, we consider the optimization over α to have converged and stop the iteration.

We summarize the above procedures in Algorithm 2. Note that the updating rules for the q and Qs depend on the interval in which the value of α falls.

Algorithm 2: Computing the superposition coding inner bound for θ ∈ (0, 1/2)
Input: p(y,z|x), maximum iterations K, N, thresholds η, ϵ > 0, step size τ > 0;
Initialization: α_0 ∈ (0,1), q_0(u,x) > 0, η_α > η, k = 0;
while k < K and η_α > η do
    initialize ϵ_q > ϵ, n = 0;
    while n < N and ϵ_q > ϵ do
        n ← n + 1;
        Q_n = Q[q_{n-1}] using Equation (5) similarly;
        q_n = q[Q_n] using Equation (7) similarly;
        F(α_k, q_n, Q_n) = θ̄·ln Σ_{u,x} exp{d[Q_n]} using Equation (20) or (18);
        ϵ_q = θ̄·max{d[Q_n] − ln q_{n-1}} − F(α_k, q_n, Q_n);
    end
    k ← k + 1;
    calculate α_k using Equations (21) and (22);
    α_k ← min{1, max{0, α_k}};
    η_α = |α_k − α_{k-1}|;
    q_0 ← q_n;
end
Output: α_k, q_n(u,x), Q_n, F(α_{k-1}, q_n, Q_n)
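To make the two-level structure concrete, the following self-contained sketch runs the inner BA loop and the outer gradient step on the BSSC of Section 5, restricted for simplicity to the first branch of Theorem 3 (α ≤ (1−2θ)/(1−θ), objective θ̄ᾱI(X;Y) + θ̄αI(X;Z), where no auxiliary U is needed). The step size, tolerances, and clipping of α to this branch are all illustrative choices, not part of Algorithm 2 itself:

```python
import math

theta = 0.4
tb = 1 - theta                           # theta-bar

# Marginals of the binary skew-symmetric broadcast channel (Section 5).
py = [[1.0, 0.0], [0.5, 0.5]]            # p(y|x)
pz = [[0.5, 0.5], [0.0, 1.0]]            # p(z|x)

def mutual_info(q, p):
    ny = len(p[0])
    qy = [sum(q[x] * p[x][y] for x in range(len(q))) for y in range(ny)]
    return sum(q[x] * p[x][y] * math.log(p[x][y] / qy[y])
               for x in range(len(q)) for y in range(ny) if p[x][y] > 0)

def inner_ba(alpha, q, N=500, eps=1e-10):
    """BA loop for max over q(x) of tb*((1-alpha) I(X;Y) + alpha I(X;Z))."""
    nx = len(q)
    F = 0.0
    for _ in range(N):
        qy = [sum(q[x] * py[x][y] for x in range(nx)) for y in range(2)]
        qz = [sum(q[x] * pz[x][z] for x in range(nx)) for z in range(2)]
        # Equation (20) with the induced posteriors Q(x|y), Q(x|z):
        d = [(1 - alpha) * sum(py[x][y] * math.log(q[x] * py[x][y] / qy[y])
                               for y in range(2) if py[x][y] > 0)
             + alpha * sum(pz[x][z] * math.log(q[x] * pz[x][z] / qz[z])
                           for z in range(2) if pz[x][z] > 0)
             for x in range(nx)]
        lz = math.log(sum(math.exp(dx) for dx in d))
        F = tb * lz                              # analogue of Equation (8)
        gap = max(d[x] - math.log(q[x]) for x in range(nx)) - lz
        q = [math.exp(dx - lz) for dx in d]      # analogue of Equation (7)
        if gap < eps:
            break
    return q, F

# Outer loop: gradient descent on alpha, Equations (21) and (22),
# clipped to [0, (1-2*theta)/(1-theta)] to stay on this branch.
amax = (1 - 2 * theta) / (1 - theta)
alpha, tau = amax / 2, 0.5
q = [0.5, 0.5]
for _ in range(100):
    q, F = inner_ba(alpha, q)
    grad = tb * (mutual_info(q, pz) - mutual_info(q, py))
    new_alpha = min(amax, max(0.0, alpha - tau * grad))
    if abs(new_alpha - alpha) < 1e-9:
        alpha = new_alpha
        break
    alpha = new_alpha
```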

Remark 3.

Given αk, according to Equation (8),

F(\alpha_k, q_n, Q_n) = \bar\theta \ln \sum_{u,x} \exp\{d[Q_n](u,x)\}, \quad \exp\left\{ \frac{1}{\bar\theta} F(\alpha_k, q_n, Q_n) \right\} = \sum_{u,x} \exp\{d[Q_n](u,x)\}.

According to Equation (18) or (20), \sum_{u,x} \exp\{d[Q_n]\} is a sum of exponentials in α_k. This is a simple function of α_k; thus, we might wonder whether we can minimize F(α_k, q_n, Q_n) over α_k globally in order to update it. It turns out that this kind of global updating rule can result in an oscillating effect, as can be observed from Figure 1 in [7]. The main reason for this is that q[Q] depends locally on α; therefore, it is not suitable to update α globally.

3.2. Marton’s Inner Bound

Marton’s inner bound [4] refers to the union over q(u,v,w,x) (such that (U,V,W) → X → (Y,Z) forms a Markov chain) of the non-negative rate pairs (R_Y, R_Z) satisfying

R_Y \le I(U,W;Y), \quad R_Z \le I(V,W;Z), \quad R_Y + R_Z \le \min\{ I(W;Y),\, I(W;Z) \} + I(U;Y|W) + I(V;Z|W) - I(U;V|W).

For general broadcast channels, this is the best known inner bound.

In the following, we characterize the supporting hyperplane θR_Y + θ̄R_Z = M of MIB. Because the expressions in MIB are symmetric in Y and Z, without loss of generality we can assume that θ ≤ θ̄, i.e., θ ∈ [0, 1/2]. According to [20], the supporting hyperplane is stated in the following lemma.

Lemma 3.

(Equations (2) and (5) in [20]). The supporting hyperplane θR_Y + θ̄R_Z = M of Marton’s inner bound, where θ ∈ [0, 1/2], is

M = \max_{q(u,v,w,x)} \min_{\alpha\in[0,1]} M(\alpha, q(u,v,w,x)) = \min_{\alpha\in[0,1]} \max_{q(u,v,w,x)} M(\alpha, q(u,v,w,x)),

where

M(\alpha, q) = (\bar\theta - \alpha\theta) I(W;Z) + \alpha\theta I(W;Y) + \bar\theta I(V;Z|W) + \theta I(U;Y|W) - \theta I(U;V|W). (23)

Further, it suffices to consider the following cardinality bounds: |U|, |V| \le |X| and |W| \le |X| + 4.

To compute the value of this supporting hyperplane, we can reformulate M(α,q) as follows:

M(\alpha, q) = \bar\theta H(U,V,W,X) - (\bar\theta - \alpha\theta) H(W|Z) - \alpha\theta H(W|Y) - \bar\theta H(V|W,Z) - \theta H(U|W,Y) - (\bar\theta - \theta) H(U|V,W) - \bar\theta H(X|U,V,W). (24)

Then, the objective function M(α,q,Q) can be expressed as

M(\alpha, q, Q) = \bar\theta \sum_{u,v,w,x} q(u,v,w,x)\big( d[Q](u,v,w,x) - \ln q(u,v,w,x) \big), (25)

where

d[Q] = \sum_{y,z} p(y,z|x) \left( \frac{\bar\theta - \alpha\theta}{\bar\theta} \ln Q(w|z) + \frac{\alpha\theta}{\bar\theta} \ln Q(w|y) + \ln Q(v|w,z) + \frac{\theta}{\bar\theta} \ln Q(u|w,y) + \frac{\bar\theta - \theta}{\bar\theta} \ln Q(u|v,w) + \ln Q(x|u,v,w) \right). (26)

For minimization over α, similar to Section 3.1, we update α along the gradient

\alpha_{k+1} = \alpha_k - \tau_k \cdot \partial_\alpha M(\alpha_k, q) = \alpha_k - \tau_k \cdot \theta \cdot \big( I(W;Y) - I(W;Z) \big). (27)
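A single step of the update (27) only requires the two mutual informations induced by the current joint distribution. A minimal sketch (the joint q(w,x), the step size, and the initial α are made-up illustrative values; the channels are the BSSC marginals of Section 5):

```python
import math

theta, tau = 0.4, 0.5
py = [[1.0, 0.0], [0.5, 0.5]]            # p(y|x), BSSC of Section 5
pz = [[0.5, 0.5], [0.0, 1.0]]            # p(z|x)

# An arbitrary illustrative joint distribution q(w, x) with |W| = 2.
q_wx = [[0.3, 0.2], [0.1, 0.4]]

def mi_w(q_wx, p):
    """I(W;Y) computed from q(w,x) and the channel p(y|x)."""
    nw, nx, ny = len(q_wx), len(q_wx[0]), len(p[0])
    j = [[sum(q_wx[w][x] * p[x][y] for x in range(nx)) for y in range(ny)]
         for w in range(nw)]             # joint q(w, y)
    pw = [sum(row) for row in j]
    pym = [sum(j[w][y] for w in range(nw)) for y in range(ny)]
    return sum(j[w][y] * math.log(j[w][y] / (pw[w] * pym[y]))
               for w in range(nw) for y in range(ny) if j[w][y] > 0)

# Equation (27): one gradient step on alpha, clipped to [0, 1].
alpha = 0.7
grad = theta * (mi_w(q_wx, py) - mi_w(q_wx, pz))
alpha = min(1.0, max(0.0, alpha - tau * grad))
```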

Similar to Algorithm 2, we summarize the algorithm for MIB in Algorithm 3.

Algorithm 3: Computing Marton’s inner bound for θ ∈ (0, 1/2]
Input: p(y,z|x), maximum iterations K, N, thresholds η, ϵ > 0, step size τ > 0;
Initialization: α_0 ∈ (0,1), q_0(u,v,w,x) > 0, η_α > η, k = 0;
while k < K and η_α > η do
    initialize ϵ_q > ϵ, n = 0;
    while n < N and ϵ_q > ϵ do
        n ← n + 1;
        Q_n = Q[q_{n-1}] using Equation (5) similarly;
        q_n = q[Q_n] using Equation (7) similarly;
        M(α_k, q_n, Q_n) = θ̄·ln Σ_{u,v,w,x} exp{d[Q_n]} using Equation (26);
        ϵ_q = θ̄·max{d[Q_n] − ln q_{n-1}} − M(α_k, q_n, Q_n);
    end
    k ← k + 1;
    calculate α_k using Equation (27);
    α_k ← min{1, max{0, α_k}};
    η_α = |α_k − α_{k-1}|;
    q_0 ← q_n;
end
Output: α_k, q_n(u,v,w,x), Q_n, M(α_{k-1}, q_n, Q_n)

3.3. UV Outer Bound

The UV outer bound [5] refers to the union over q(u,v,x) of non-negative rate pairs (RY,RZ) satisfying

R_Y \le I(U;Y), \quad R_Z \le I(V;Z), \quad R_Y + R_Z \le I(U;Y) + I(X;Z|U), \quad R_Y + R_Z \le I(V;Z) + I(X;Y|V).

For general broadcast channels, this was the best outer bound until [6] strictly improved upon it over an erasure Blackwell channel. The following theorem (proof in Appendix B) characterizes the supporting hyperplanes.

Theorem 4

(Claim 2 and Remark 1 in [21]). The supporting hyperplane of the UV outer bound is

\theta R_Y + \bar\theta R_Z = \min_{\alpha,\beta} \max_{q(u,v,x)} \bar\theta\alpha I(V;Z) + \bar\theta\bar\alpha I(X;Z|U) + \theta\beta I(U;Y) + \theta\bar\beta I(X;Y|V), (28)

where \alpha, \beta \in [0,1] satisfy \bar\theta\alpha + \theta\beta \ge \max\{\theta, \bar\theta\}. Further, it suffices to consider the cardinality bounds |U|, |V| \le |X|.

The original objective function can be reformulated as

G(\alpha, \beta, q) = \bar\theta\alpha\big( I(X;Y|V) + I(V;Z) \big) - (\bar\theta\alpha - \theta\bar\beta) I(X;Y|V) + \theta\beta\big( I(X;Z|U) + I(U;Y) \big) - (\theta\beta - \bar\theta\bar\alpha) I(X;Z|U).

The right-hand side contains two parts, both of which are similar to Equation (10), i.e., the objective function of the degraded broadcast channel. It seems workable to apply the BA algorithm twice; however, it should be noted that these two parts are coupled through the same q(x).

Observe that the first part depends only on q(v,x), while the other depends on q(u,x). It suffices to consider the subset of distributions such that q(u,v,x)=q(x)q(v|x)q(u|x). Thus, it is natural to decouple these two parts by fixing q(x) and applying the BA algorithm separately to q(v|x) and q(u|x). After some manipulations, we have

G(\alpha, \beta, q, Q) = (\bar\theta\alpha + \theta\beta) H(X) + \bar\theta\alpha\big( H(V|X) - H(X|Y,V) - H(V|Z) \big) - (\bar\theta\alpha - \theta\bar\beta)\big( H(Y|V) - H(Y|X) \big) (29)
+ \theta\beta\big( H(U|X) - H(X|Z,U) - H(U|Y) \big) - (\theta\beta - \bar\theta\bar\alpha)\big( H(Z|U) - H(Z|X) \big) = -(\bar\theta\alpha + \theta\beta) \sum_x q(x) \ln q(x) + \bar\theta\alpha \sum_{x,v} q(x)\,q(v|x)\big( d_1[Q](v,x) - \ln q(v|x) \big) + \theta\beta \sum_{x,u} q(x)\,q(u|x)\big( d_2[Q](u,x) - \ln q(u|x) \big). (30)

The functions d1 and d2 in the above are

d_1[Q](v,x) = \sum_{y,z} p(y,z|x) \left[ \ln Q(x|y,v)\,Q(v|z) + \frac{\bar\theta\alpha - \theta\bar\beta}{\bar\theta\alpha} \ln\frac{Q(y|v)}{p(y|x)} \right], (31)
d_2[Q](u,x) = \sum_{y,z} p(y,z|x) \left[ \ln Q(x|z,u)\,Q(u|y) + \frac{\theta\beta - \bar\theta\bar\alpha}{\theta\beta} \ln\frac{Q(z|u)}{p(z|x)} \right]. (32)

For fixed q(x)q(v|x)q(u|x), according to Equation (5), the optimal Qs are the induced Q[q]s. For fixed Qs, according to Equation (7), for each x we have

q[Q](v|x) = \frac{\exp\{d_1[Q](v,x)\}}{\sum_{v'} \exp\{d_1[Q](v',x)\}}, (33)
q[Q](u|x) = \frac{\exp\{d_2[Q](u,x)\}}{\sum_{u'} \exp\{d_2[Q](u',x)\}}. (34)

The value of the objective function is

G(\alpha, \beta, q(x), q[Q](v|x), q[Q](u|x), Q) = (\bar\theta\alpha + \theta\beta) \sum_x q(x)\big( d[Q](x) - \ln q(x) \big), (35)

where

d[Q](x) = \frac{\bar\theta\alpha}{\bar\theta\alpha + \theta\beta} \ln \sum_v \exp\{d_1[Q](v,x)\} + \frac{\theta\beta}{\bar\theta\alpha + \theta\beta} \ln \sum_u \exp\{d_2[Q](u,x)\}. (36)

Again, according to Equation (7) the optimal q[Q](x) and corresponding function value are

q[Q](x) = \frac{\exp\{d[Q](x)\}}{\sum_{x'} \exp\{d[Q](x')\}}, (37)
G(\alpha, \beta, q[Q], Q) = (\bar\theta\alpha + \theta\beta) \ln \sum_x \exp\{d[Q](x)\}. (38)

For minimization over (α,β), similar to Section 3.1, we update (α,β) along the gradient:

\alpha_{k+1} = \alpha_k - \tau_k \cdot \partial_\alpha G(\alpha_k, \beta_k, q) = \alpha_k - \tau_k \cdot \bar\theta \cdot \big( I(V;Z) - I(X;Z|U) \big), (39)
\beta_{k+1} = \beta_k - \tau_k \cdot \partial_\beta G(\alpha_k, \beta_k, q) = \beta_k - \tau_k \cdot \theta \cdot \big( I(U;Y) - I(X;Y|V) \big). (40)

Here, it should be mentioned that (α, β) must satisfy the constraint θ̄α + θβ ≥ max{θ, θ̄}. Thus, if the resulting (α_{k+1}, β_{k+1}) violates this constraint, then we need to scale θ̄α_{k+1} + θβ_{k+1} up to (at least) max{θ, θ̄}. One way to accomplish this is to keep the equality active, making β depend on α, in which case the gradient descent update becomes α_{k+1} = α_k − τ_k·d_k, β_{k+1} = β_k + τ_k·(θ̄/θ)·d_k, where

d_k = \partial_\alpha G(\alpha_k, \beta_k, q) - \frac{\bar\theta}{\theta}\, \partial_\beta G(\alpha_k, \beta_k, q)

is the derivative of G along the boundary θ̄α + θβ = max{θ, θ̄}.
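The coupled update moves along the boundary of the constraint set: a one-step numerical check (with made-up partial derivatives and starting point, purely for illustration) confirms that θ̄α + θβ is preserved exactly.

```python
theta = 0.4
tb = 1 - theta
c = max(theta, tb)                      # the boundary value max{theta, theta-bar}

# Suppose the constraint tb*alpha + theta*beta = c is active, and the
# partial derivatives of G take these made-up illustrative values:
dG_da, dG_db = 0.3, -0.1
tau = 0.05

alpha = 0.5
beta = (c - tb * alpha) / theta         # chosen so that equality holds

d = dG_da - (tb / theta) * dG_db        # direction along the boundary
alpha2 = alpha - tau * d
beta2 = beta + tau * (tb / theta) * d

# The update moves along the boundary: tb*alpha + theta*beta is unchanged.
assert abs((tb * alpha2 + theta * beta2) - c) < 1e-12
```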

Similar to Algorithm 2, we summarize the algorithm for UVOB in Algorithm 4.

Algorithm 4: Computing the UV outer bound
Input: p(y,z|x), maximum iterations K, N, thresholds η, ϵ > 0, step size τ > 0;
Initialization: α_0, β_0 ∈ (0,1) with θ̄α_0 + θβ_0 ≥ max{θ, θ̄}, q_0(u,v,x) > 0, η_α, η_β > η, k = 0;
while k < K and max{η_α, η_β} > η do
    initialize ϵ_q > ϵ, n = 0;
    while n < N and ϵ_q > ϵ do
        n ← n + 1;
        Q_n = Q[q_{n-1}] using Equation (5) similarly;
        q_n = q[Q_n] using Equations (33), (34) and (37);
        G(α_k, β_k, q_n, Q_n) = (θ̄α_k + θβ_k)·ln Σ_x exp{d[Q_n](x)} using Equation (36);
        ϵ_q = (θ̄α_k + θβ_k)·max_x{d[Q_n](x) − ln q_{n-1}(x)} − G(α_k, β_k, q_n, Q_n);
    end
    k ← k + 1;
    calculate α_k and β_k using Equations (39) and (40);
    α_k ← min{1, max{0, α_k}};
    β_k ← min{1, max{0, β_k}};
    if θ̄α_k + θβ_k < max{θ, θ̄}, scale up to equality;
    η_α = |α_k − α_{k-1}|;
    η_β = |β_k − β_{k-1}|;
    q_0 ← q_n;
end
Output: α_k, β_k, q_n(u,v,x), Q_n, G(α_{k-1}, β_{k-1}, q_n, Q_n)

4. Convergence Analysis

Here, we aim to show that certain convergence results hold if q_n lies in a proper convex set containing the global maximizer q^⋆. For this purpose, we first introduce the first-order characterization of a concave function.

Lemma 4

(Lemma 3 in [22]). Given a convex set S, a differentiable function f is concave in S if and only if, for all x,yS,

f(y) - f(x) - (y - x)^{\mathrm{T}} \nabla f\big|_x \le 0. (41)

Similar to [22], we use the superlevel set to construct the convex set S. Let SF(α,k) be the superlevel set of the objective function F(α,q) of SCIB:

S_F(\alpha, k) := \{ q \mid F(\alpha, q) \ge k \}. (42)

For a fixed k, it is possible for S_F(α,k) to contain more than one connected set. For q ∈ S_F(α,k), we denote the connected set that contains q as T_F(α,k,q).

Similarly, for MIB and UVOB we define the corresponding (connected) superlevel sets: SM(α,k), TM(α,k,q), SG(α,β,k), TG(α,β,k,q). Note that k here should not be confused with the notation indicating the number of iterations in the algorithms.

4.1. Superposition Coding Inner Bound

According to Theorem 3, the expression of the objective function F(α,q) of SCIB depends on the value of α. Without loss of generality, we can consider the objective function depicted in Equation (16). An equivalent condition for F(α,q) to be concave is provided in the following lemma.

Lemma 5.

Given a convex set S of distributions q(u,x), F(α,q) as depicted in Equation (16) is concave in S if and only if, for all q_1, q_2 ∈ S, we have

\bar\theta\big( D_{UX} + D_{X|YU} + \bar\alpha D_{U|Y} + \alpha D_{U|Z} \big) + (\bar\theta - \theta) D_{Y|U} \ge 0,

where D_{A|B} denotes D_{A|B}(q_2 \| q_1).

The following lemma shows that qn+1 lies in the same connected superlevel set as that of qn. The proof (see Appendix C) is similar to that for Lemma 4 in [16].

Lemma 6.

In Algorithm 2, if q_n(u,x) ∈ S_F(α,k), then q_{n+1} ∈ T_F(α,k,q_n).

Fixing α and letting q^⋆(u,x) be the maximizer, the following theorem states that the function values F(α,q_n,Q_n) converge. The proof (see Appendix C) is similar to that of Theorem 2.

Theorem 5.

If q^⋆, q_0 ∈ T_F(α,k,q̃) for some k and q̃, and if F(α,q) is concave in T_F(α,k,q̃), then the sequence F(α,q_n,Q_n) generated by Algorithm 2 converges monotonically from below to F(α,q^⋆).

The following corollary is implied by the proof of Theorem 5.

Corollary 1.

If q^⋆, q_0 ∈ T_F(α,k,q̃) for some k and q̃, and if F(α,q) is concave in T_F(α,k,q̃), then

F(\alpha, q^\star) - F(\alpha, q_N, Q_N) \le \frac{\bar\theta}{N}\, D_{UX}(q^\star \| q_0).

The above analyses deal with F(α,qn,Qn) for a fixed α. When αm changes to αm+1, the estimation for the one-step change in the function value is presented in the following proposition (see proof in Appendix C).

Proposition 2.

Given αm, suppose that Algorithm 2 converges to the optimal variables q˜ and Q˜ such that q˜=q[Q˜] and Q˜=Q[q˜]. Letting αm+1 be updated using Equation (22) and letting q0=q˜ be the initial point for the next round, we have

F(\alpha_{m+1}, q_1, Q_1) \ge F(\alpha_m, \tilde q) - \frac{(\alpha_{m+1} - \alpha_m)^2}{\tau_m}.

4.2. Marton’s Inner Bound

Next, we present the convergence results of the BA algorithm for MIB. The proofs are omitted, as they are similar to those for SCIB.

Lemma 7.

Given a convex set S of distributions q(u,v,w,x), M(α,q) as depicted in Equation (24) is concave in S if and only if, for all q_1, q_2 ∈ S, we have

\bar\theta D_{UVWX} + (\bar\theta - \alpha\theta) D_{W|Z} + \alpha\theta D_{W|Y} + \bar\theta D_{V|WZ} + \theta D_{U|WY} + (\bar\theta - \theta) D_{U|VW} + \bar\theta D_{X|UVW} \ge 0,

where D_{A|B} denotes D_{A|B}(q_2 \| q_1).

Lemma 8.

In Algorithm 3, if q_n(u,v,w,x) ∈ S_M(α,k), then q_{n+1} ∈ T_M(α,k,q_n).

Fixing α and letting q^⋆(u,v,w,x) be the maximizer, the following theorem states that the function values M(α,q_n,Q_n) converge.

Theorem 6.

If q^⋆, q_0 ∈ T_M(α,k,q̃) for some k and q̃, and if M(α,q) is concave in T_M(α,k,q̃), then the sequence M(α,q_n,Q_n) generated by Algorithm 3 converges monotonically from below to M(α,q^⋆).

The following corollary is implied by the proof of Theorem 6.

Corollary 2.

If q^⋆, q_0 ∈ T_M(α,k,q̃) for some k and q̃, and if M(α,q) is concave in T_M(α,k,q̃), then

M(\alpha, q^\star) - M(\alpha, q_N, Q_N) \le \frac{\bar\theta}{N}\, D_{UVWX}(q^\star \| q_0).

The estimation for the one-step change in the function value for MIB is presented in the following proposition.

Proposition 3.

Given αm, suppose that Algorithm 3 converges to the optimal variables q˜ and Q˜ such that q˜=q[Q˜] and Q˜=Q[q˜]. Letting αm+1 be updated using Equation (27) and letting q0=q˜ be the initial point for the next round, we have

M(\alpha_{m+1}, q_1, Q_1) \ge M(\alpha_m, \tilde q) - \frac{(\alpha_{m+1} - \alpha_m)^2}{\tau_m}.

4.3. UV Outer Bound

Now, we present the convergence results of the BA algorithm for UVOB. The proofs are again omitted, as they are similar to those for SCIB.

Lemma 9.

Given a convex set S of distributions q(u,v,x), G(α,β,q) as depicted in Equation (29) is concave in S if and only if, for all q_1, q_2 ∈ S, we have

(\bar\theta\alpha + \theta\beta) D_X + \bar\theta\alpha\big( D_{V|X} + D_{X|YV} + D_{V|Z} \big) + (\bar\theta\alpha - \theta\bar\beta) D_{Y|V} + \theta\beta\big( D_{U|X} + D_{X|ZU} + D_{U|Y} \big) + (\theta\beta - \bar\theta\bar\alpha) D_{Z|U} \ge 0,

where D_{A|B} denotes D_{A|B}(q_2 \| q_1).

Lemma 10.

In Algorithm 4, if q_n(u,v,x) ∈ S_G(α,β,k), then q_{n+1} ∈ T_G(α,β,k,q_n).

Fixing (α,β) and letting q^⋆(u,v,x) be the maximizer, the following theorem states that the function values G(α,β,q_n,Q_n) converge.

Theorem 7.

If q^⋆, q_0 ∈ T_G(α,β,k,q̃) for some k and q̃, and if G(α,β,q) is concave in T_G(α,β,k,q̃), then the sequence G(α,β,q_n,Q_n) generated by Algorithm 4 converges monotonically from below to G(α,β,q^⋆).

The following corollary is implied by the proof of Theorem 7.

Corollary 3.

If q^⋆, q_0 ∈ T_G(α,β,k,q̃) for some k and q̃, and if G(α,β,q) is concave in T_G(α,β,k,q̃), then

G(\alpha, \beta, q^\star) - G(\alpha, \beta, q_N, Q_N) \le \frac{\bar\theta\alpha + \theta\beta}{N}\, D_{UVX}(q^\star \| q_0).

The estimation for the one-step change in the function value for UVOB is presented in the following proposition.

Proposition 4.

Given (αm,βm), suppose that Algorithm 4 converges to the optimal variables q˜ and Q˜ such that q˜=q[Q˜] and Q˜=Q[q˜]. Letting (αm+1,βm+1) be updated using Equations (39) and (40) and letting q0=q˜ be the initial point for the next round, we have

G(\alpha_{m+1}, \beta_{m+1}, q_1, Q_1) \ge G(\alpha_m, \beta_m, \tilde q) - \frac{(\alpha_{m+1} - \alpha_m)^2}{\tau_m} - \frac{(\beta_{m+1} - \beta_m)^2}{\tau_m}. (43)

5. Numerical Results

We take the binary skew-symmetric broadcast channel p(y,z|x) as the test channel. The conditional probability matrices are

P_{Y|X} = \begin{pmatrix} 1 & 0 \\ 0.5 & 0.5 \end{pmatrix}, \quad P_{Z|X} = \begin{pmatrix} 0.5 & 0.5 \\ 0 & 1 \end{pmatrix}.

This is perhaps the simplest broadcast channel for which the capacity region is still unknown.
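Each marginal of the BSSC is a Z-channel with crossover probability 1/2, whose capacity is known in closed form: ln(5/4) nats (about 0.3219 bits), achieved at P(X = 1) = 0.4. Running the BA algorithm of Section 2.1 on P_{Y|X} reproduces this value; a pure-Python sketch in the multiplicative form of Remark 2 (iteration counts and tolerances are illustrative):

```python
import math

def ba_capacity(p, N=5000, eps=1e-12):
    # Blahut-Arimoto in the form of Remark 2: here
    # t[x] = d[Q](x) - ln q(x) = sum_y p(y|x) ln( p(y|x) / q(y) ).
    nx, ny = len(p), len(p[0])
    q = [1.0 / nx] * nx
    C = 0.0
    for _ in range(N):
        qy = [sum(q[x] * p[x][y] for x in range(nx)) for y in range(ny)]
        t = [sum(p[x][y] * math.log(p[x][y] / qy[y])
                 for y in range(ny) if p[x][y] > 0) for x in range(nx)]
        z = sum(q[x] * math.exp(t[x]) for x in range(nx))
        C = math.log(z)
        if max(t) - C < eps:             # optimality gap of Proposition 1
            break
        q = [q[x] * math.exp(t[x]) / z for x in range(nx)]
    return q, C

q, C = ba_capacity([[1.0, 0.0], [0.5, 0.5]])   # P_{Y|X} of the BSSC
assert abs(C - math.log(1.25)) < 1e-9          # ln(5/4) nats
assert abs(q[1] - 0.4) < 1e-3                  # optimal P(X = 1)
```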

This broadcast channel plays a very important role in research on capacity bounds. It was first studied in [23] to show that the time-sharing random variable is useful for the Cover–van der Meulen inner bound [24,25]. Later, [26,27,28] demonstrated that the sum rate of the UVOB for this broadcast channel is strictly larger than that of the MIB, showing for the first time that at least one of these two bounds is suboptimal.

Our algorithms are important in at least the following sense: suppose it is not known whether the MIB matches the UVOB (or whether two other bounds match in a new scenario) and we want to check this; we can then perform an exhaustive search over channel matrices of size 2×2 (or of higher dimensions) and check whether the bounds match. According to the results shown below in Section 5.5, this does not take much time compared with generic algorithms.

In the following, we apply the algorithms to compute the value of the supporting hyperplane θR_Y + θ̄R_Z, where θ = 0.4. The initial values of α and β are α_0 = β_0 = 0.7. This set of parameters is feasible for UVOB, as θ̄α_0 + θβ_0 = 0.7 ≥ max{θ, θ̄} = 0.6.

We demonstrate the algorithms in the following aspects: (1) the maximization part; (2) the minimization part; (3) the change from the maximization part to the minimization part; (4) the superlevel set; and (5) comparison with generic non-convex algorithms.

5.1. Maximization Part

In this part, we fix α and β at their initial values and run the BA algorithms for N = 200 iterations. The results are presented in Figure 1. Because this is the maximization part, the function values increase as the iterations proceed. It is clear that the function values behave properly for fixed α and β.

Figure 1. The maximization parts in the algorithms for BSSC with fixed values α_0 = β_0 = 0.7: (a) the objective function values and (b) P(X = 0).

5.2. Minimization Part

In this part, we start with the initial α0 and β0, then let the algorithms run for K = 200 iterations. The results for (αk, βk) are presented in Figure 2. Because this is the minimization part, the function values decrease as the iterations proceed. It is clear that αk in SCIB and MIB gradually changes as k grows. For UVOB, it is necessary to ensure that θ̄α + θβ ≥ max{θ, θ̄}. When the updated (α_{k+1}, β_{k+1}) makes θ̄α + θβ fall below this value, it becomes necessary to scale it back. This happens starting from approximately k = 5.

Figure 2. The minimization parts in the algorithms for BSSC with initial values α0 = β0 = 0.7: (a) the objective function values and (b) (αk, βk).
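The scale-back step can be sketched as follows. This is a hypothetical implementation: the text only says that (α, β) is scaled back onto the feasible region, so the proportional rescaling below is our choice, not necessarily the paper's:

```python
theta = 0.4
theta_bar = 1 - theta
floor = max(theta, theta_bar)        # feasibility floor for theta_bar*a + theta*b

def scale_back(alpha, beta):
    """If a gradient step makes theta_bar*alpha + theta*beta fall below the
    floor, rescale (alpha, beta) proportionally to restore feasibility."""
    s = theta_bar * alpha + theta * beta
    if s < floor:
        factor = floor / s
        alpha, beta = min(1.0, alpha * factor), min(1.0, beta * factor)
    return alpha, beta

print(scale_back(0.5, 0.5))   # infeasible (0.5 < 0.6): rescaled to about (0.6, 0.6)
print(scale_back(0.9, 0.9))   # already feasible: returned unchanged
```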

5.3. Change from Maximization to Minimization

In this part, we consider UVOB and set K = 100 in the algorithm. Figure 3 plots the following three values in Equation (43):

\[
G(\alpha_k,\beta_k,\tilde q),\qquad
\mathrm{LHS}:=G(\alpha_{k+1},\beta_{k+1},q_1,Q_1),\qquad
\mathrm{RHS}:=G(\alpha_k,\beta_k,\tilde q)-\frac{(\alpha_{k+1}-\alpha_k)^2}{\tau_k}-\frac{(\beta_{k+1}-\beta_k)^2}{\tau_k}.
\]

As the algorithm iterates, the estimate in Equation (43) becomes more and more accurate, since exp{x} ≈ 1 + x and ln(1 + x) ≈ x for small x.

Figure 3. Function values of UVOB for BSSC with initial values α0 = β0 = 0.7.
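These two approximations compound into the first-order estimate ln Σᵢ wᵢ exp{dᵢ} ≈ Σᵢ wᵢ dᵢ for a probability vector w and small dᵢ, which is exactly the shape of the estimate above. A quick numeric illustration (the values of w and d are arbitrary):

```python
import math

w = [0.2, 0.3, 0.5]              # a probability vector
d = [0.01, -0.02, 0.005]         # small perturbations

exact = math.log(sum(wi * math.exp(di) for wi, di in zip(w, d)))
first_order = sum(wi * di for wi, di in zip(w, d))
print(abs(exact - first_order))  # small (below 1e-3), confirming the estimate
```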

5.4. Superlevel Set

To visualize the convergence of q_n and its relation to the superlevel set, we take SCIB as an example and fix q(u) so that q(x|u) has two free variables. We reformulate the objective function of SCIB given in Equation (16) as follows:

\[
\begin{aligned}
\tilde F(\alpha,q(x|u),Q)
&=\bar\theta H(U)+\bar\theta\big(H(X|U)-H(X|Y,U)-\bar\alpha H(U|Y)-\alpha H(U|Z)\big)-(\bar\theta-\theta)\big(H(Y|U)-H(Y|X)\big)\\
&=\bar\theta H(U)+\bar\theta\sum_{x,u} q(u)q(x|u)\big(d[Q](u,x)-\ln q(x|u)\big),
\end{aligned}
\]

where

\[
d[Q](u,x)=\sum_{y,z} p(y,z|x)\left[\ln\!\big(Q(x|y,u)\,Q(u|y)^{\bar\alpha}\,Q(u|z)^{\alpha}\big)+\frac{\bar\theta-\theta}{\bar\theta}\ln\frac{Q(y|u)}{p(y|x)}\right].
\]

In particular, we fix α0 = 0.7 and P_U = (0.3, 0.7), then use the algorithm to find the values of P(X=0|U=0) and P(X=0|U=1). The results are shown in Figure 4. In this case, q_n for large enough n lies in the concave part of the superlevel set, meaning that the algorithm converges. Here, it should be mentioned that the algorithm may not converge to the optimal point for some initial points q_0 that do not lie in the concave part.

Figure 4. Function values of SCIB for BSSC with the initial value α0 = 0.7 and fixed probability vector P_U = (0.3, 0.7): (a) 3D view and (b) contour view.
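To reproduce the flavor of this experiment without the full BA machinery, the SCIB hyperplane objective for fixed α and P_U can be evaluated directly on a grid of the two free variables. The sketch below is ours: it assumes θ = 0.4 as in this section, and evaluates θI(X;Y|U) + θ̄ᾱI(U;Y) + θ̄αI(U;Z) by brute force rather than by the BA iteration:

```python
import math

PY = [[1.0, 0.0], [0.5, 0.5]]    # BSSC p(y|x)
PZ = [[0.5, 0.5], [0.0, 1.0]]    # BSSC p(z|x)
theta, alpha = 0.4, 0.7
qU = [0.3, 0.7]                  # fixed P_U

def mi(qx, ch):
    """Mutual information (nats) between the input of `ch` and its output."""
    qy = [sum(qx[x] * ch[x][y] for x in range(2)) for y in range(2)]
    return sum(qx[x] * ch[x][y] * math.log(ch[x][y] / qy[y])
               for x in range(2) for y in range(2)
               if qx[x] > 0 and ch[x][y] > 0)

def objective(p0, p1):
    """theta*I(X;Y|U) + (1-theta)*((1-alpha)*I(U;Y) + alpha*I(U;Z)) with
    P(X=0|U=0) = p0 and P(X=0|U=1) = p1."""
    cond = [[p0, 1 - p0], [p1, 1 - p1]]                      # q(x|u)
    chY = [[sum(cond[u][x] * PY[x][y] for x in range(2)) for y in range(2)]
           for u in range(2)]                                # p(y|u)
    chZ = [[sum(cond[u][x] * PZ[x][z] for x in range(2)) for z in range(2)]
           for u in range(2)]                                # p(z|u)
    i_xy_u = sum(qU[u] * mi(cond[u], PY) for u in range(2))  # I(X;Y|U)
    return (theta * i_xy_u
            + (1 - theta) * ((1 - alpha) * mi(qU, chY) + alpha * mi(qU, chZ)))

# Coarse grid over the two free variables, as in the superlevel-set figure.
best = max((objective(a / 20, b / 20), a / 20, b / 20)
           for a in range(21) for b in range(21))
print(best)   # (best value, best p0, best p1)
```

A finer grid, or the BA iteration itself, refines the maximizer; the grid view is what produces the surface and contour plots of Figure 4.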

5.5. Comparison with Generic Non-Convex Algorithms

Here, we compare our algorithms with the following generic algorithms implemented using the “fmincon” MATLAB function: interior-point, active-set, and sequential quadratic programming (sqp). For simplicity, we only compare the sum rate of MIB, for which the optimal value is 0.2506717… nats (0.3616428… bits). The optimization problem for computing the sum rate is

\[
\max_{q(u,v,w,x)}\ \min\{I(W;Y),\,I(W;Z)\}+I(U;Y|W)+I(V;Z|W)-I(U;V|W).
\]

According to Lemma 3, the cardinality size is |U|·|V|·|W|·|X| = |X|^3(|X|+4) = 48.
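One reading consistent with this formula is |U| = |V| = |X| and |W| = |X| + 4 (our interpretation, matching known cardinality bounds for Marton's region); the dimensions quoted here and in Section 5.5 then follow directly:

```python
# Search-space dimension |X|^3 (|X|+4) for q(u,v,w,x), for the alphabet
# sizes used in this section and in Table 3.
for n in (2, 3, 4, 5, 6):
    print(n, n ** 3 * (n + 4))   # 2 -> 48, 3 -> 189, 4 -> 512, 5 -> 1125, 6 -> 2160
```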

Notice that we do not carry out a comparison with the method in [16], as it cannot be applied to cases where there is a minimum. For scenarios in which [16] can be used, our algorithms degenerate to the method in [16].

The initial point of q(u,v,w,x) is randomly generated for all the algorithms. Table 2 lists the experimental results. For the first three algorithms, a randomly picked starting point usually does not yield a good enough result; thus, we ran each of them multiple times until the best function value reached 0.2506 in order to test their effectiveness. It is clear from the table that only sqp can be considered comparable to our algorithms.

Table 2.

Comparison with generic non-convex algorithms on BSSC.

Method Time (Seconds) Sum-Rate of MIB (Nats)
interior-point 513.82 0.25060…
active-set 2438.57 0.25061…
sqp 1.7621 0.25067…
this paper 0.0629 0.25067…

For further comparison with sqp, we randomly generated broadcast channels with cardinalities |X| = 3, 4, 5, 6 and |Y| = |Z| = |X|. The corresponding dimensions are |X|^3(|X|+4) = 189, 512, 1125, 2160. Because the optimal sum rates are not known, we ran sqp once and recorded the running time. The results in Table 3 suggest that our algorithms are highly scalable. This meets our expectations, as the updating formulae in Equations (5) and (7) are all explicit and can be computed rapidly.

Table 3.

Comparison with sqp on random channels with alphabet sizes |X|=3,4,5,6.

Method       Time (Seconds)                         Sum-Rate of MIB (Nats)
             |X|=3    |X|=4    |X|=5    |X|=6       |X|=3    |X|=4    |X|=5    |X|=6
sqp          2.6342   22.44    168.12   1065.51     0.1840   0.2348   0.1983   0.2351
this paper   0.0771   0.1031   0.1450   0.2086      0.1863   0.2375   0.2019   0.2423

6. Discussion and Conclusions

6.1. Initial Points of Algorithms

Taking MIB as an example, we next discuss how to choose the initial points. When there is no prior knowledge on the optimization problem, the initial point is usually generated randomly. In this paper, Theorem 6 and Lemma 7 provide some guidance on the choice of the initial point p0(u,v,w,x). A possibly workable method is to randomly generate an initial point and slightly perturb it to check whether these two points satisfy the inequality in Lemma 7. If the answer is no, then it is possible that the objective function is not concave in the neighbourhood of this point, and we continue to generate new initial points.

For the initial point α0, because it lies in [0,1], it is affordable to perform a grid search, especially when |X| is small. For example, we can use a grid spacing of 0.1 and try each α0 ∈ {0, 0.1, 0.2, …, 1}. This approach can, to some extent, help us avoid becoming stuck at local extreme points.
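The grid search can be sketched as follows. Here `run_algorithm` is a hypothetical stand-in for one full run of the max–min algorithm from a given α0; it is not part of the paper, and the placeholder objective is arbitrary:

```python
def run_algorithm(alpha0):
    # Placeholder standing in for the hyperplane value reached from this
    # initial point; replace with a real run of the max-min algorithm.
    return -(alpha0 - 0.37) ** 2

# Try each alpha0 in {0, 0.1, ..., 1} and keep the best run.
best_value, best_alpha0 = max((run_algorithm(a / 10), a / 10) for a in range(11))
print(best_alpha0)   # 0.4, the grid point closest to the placeholder's maximizer
```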

6.2. J Version Outer Bound

As mentioned earlier, the best general outer bound is the J version outer bound proposed in [6]. However, the evaluation of this outer bound turns out to be even harder, as there are additional constraints on the free variables and the auxiliary channel with the joint distribution

\[
q(x)\,q(u,v,w|x)\,q(\tilde u,\tilde v,\tilde w|x)\,q(\hat u,\hat v,\hat w|x)\,p(y,z|x)\,T(j|x,y,z).
\]

These constraints are presented in Equations (18a)–(18c) and (19a)–(19c) in [6]. Taking Equations (18a) and (19a) as an example,

\[
\begin{aligned}
\text{(18a)}:\quad & I(\tilde W;Z)-I(\tilde W;J)+I(\hat W;J)-I(\hat W;Y)=I(W;Z)-I(W;Y),\\
\text{(19a)}:\quad & 0\le I(X;Z|\tilde U,\tilde W)-I(X;J|\tilde U,\tilde W)-\big(I(\tilde V;Z|\tilde W)-I(\tilde V;J|\tilde W)\big).
\end{aligned}
\]

Direct application of Equation (7) does not yield an updated q[Q] guaranteed to satisfy these constraints; thus, the design of BA algorithms for the J version outer bound should carefully address this kind of problem. We leave this for future research.

Finally, to conclude the paper: extending the BA algorithm to inner and outer bounds for general broadcast channels leads to max–min problems. We have shown that the max–min order can be exchanged with min–max. Based on this observation, we have designed BA algorithms for the maximization parts and gradient descent algorithms for the minimization parts, then performed convergence analysis and numerical experiments to support our analysis. We have compared our algorithms with the following generic non-convex algorithms: interior-point, active-set, and sequential quadratic programming. The results show that our algorithms are both effective and efficient.

Appendix A. Proofs of Results in Section 1

Proof of Theorem 1.

For the first property, for fixed q, the concavity is clear, as ln Q is concave in Q. The equality in Equation (6) is easy to show: if we take Q[q](x|y) to be q(x|y), then the function C(q,Q[q]) in Equation (2) reduces to C(q) in Equation (1). The maximum can be proven using the Kullback–Leibler divergence:

\[
C(q,Q[q])-C(q,Q)=C(q)-C(q,Q)=\sum_{x,y} q(x)p(y|x)\ln\frac{q(x|y)}{Q(x|y)}=\sum_{x,y} q(y)q(x|y)\ln\frac{q(x|y)}{Q(x|y)}=D_{X|Y}(q\|Q)\ge 0.
\]

For the second property, we can reformulate C(q,Q) as

\[
C(q,Q)=H(X)+\sum_x q(x)\Big(\sum_y p(y|x)\ln Q(x|y)\Big).
\]

The concavity holds, as the first term H(X) is concave and the second term is linear. To find the maximum q[Q], because there is a constraint xq(x)=1, we consider the derivative of the Lagrangian:

\[
0=\frac{\partial}{\partial q(x)}L(\lambda,q,Q)=\frac{\partial}{\partial q(x)}\left[\sum_x q(x)\big(d[Q](x)-\ln q(x)\big)+\lambda\Big(1-\sum_x q(x)\Big)\right]=d[Q](x)-\ln q(x)-1-\lambda.
\]

This implies that q(x) = exp{d[Q](x) − 1 − λ} for all x. The common term 1 + λ can be eliminated by normalization, as depicted in Equation (7). We can then verify the function value as follows:

\[
C(q[Q],Q)=\sum_x q[Q](x)\big(d[Q](x)-\ln q[Q](x)\big)\overset{(a)}{=}\sum_x q[Q](x)\Big(\ln q[Q](x)+\ln\sum_{x'}\exp\{d[Q](x')\}-\ln q[Q](x)\Big)=\ln\sum_x\exp\{d[Q](x)\},
\]

where (a) is due to Equation (7), which gives d[Q](x) = ln q[Q](x) + ln Σ_{x'} exp{d[Q](x')}. □
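The two update rules just derived yield the classic BA iteration for a point-to-point channel. The sketch below is written by us to illustrate Equations (6) and (7); `ba_capacity` is not code from the paper:

```python
import math

def ba_capacity(p, iters=200):
    """Alternating maximization of C(q, Q): the Q-step sets Q(x|y) = q(x|y),
    and the q-step sets q(x) proportional to exp{d[Q](x)} with
    d[Q](x) = sum_y p(y|x) ln Q(x|y).  Returns C(q[Q], Q) = ln sum_x exp{d[Q](x)}."""
    nx, ny = len(p), len(p[0])
    q = [1.0 / nx] * nx
    val = 0.0
    for _ in range(iters):
        qy = [sum(q[x] * p[x][y] for x in range(nx)) for y in range(ny)]
        # Q-step: Q(x|y) = q(x) p(y|x) / q(y), folded directly into d[Q](x).
        d = [sum(p[x][y] * math.log(q[x] * p[x][y] / qy[y])
                 for y in range(ny) if p[x][y] > 0) for x in range(nx)]
        s = sum(math.exp(dx) for dx in d)
        val = math.log(s)                    # C(q[Q], Q)
        q = [math.exp(dx) / s for dx in d]   # q-step: Equation (7)
    return val

# Binary symmetric channel with crossover 0.1: capacity ln 2 - H_b(0.1) nats.
bsc = [[0.9, 0.1], [0.1, 0.9]]
print(ba_capacity(bsc))   # about 0.3680 nats
```

For symmetric channels the uniform input is already optimal, so the iteration converges immediately; for asymmetric channels the monotone convergence proved in Theorem 2 below is what guarantees the loop reaches capacity.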

Proof of Theorem 2.

The basic idea is to show that the decreasing positive numbers C − C(q_n, Q_n) have a bounded sum.

The monotonicity C(qn+1,Qn+1)C(qn,Qn) is clear, as we are performing alternating maximization; thus,

\[
C(q_{n+1},Q_{n+1})=C(q[Q_{n+1}],Q_{n+1})\ge C(q_n,Q_{n+1})=C(q_n,Q[q_n])\ge C(q_n,Q_n).
\]

The positiveness of C − C(q_n, Q_n) follows because C ≥ C(q_n) ≥ C(q_n, Q_n).

Let q* achieve the maximum of C(q); then,

\[
\begin{aligned}
C-C(q_n,Q_n)-\sum_x q^*(x)\ln\frac{q_n(x)}{q_{n-1}(x)}
&=C(q^*,Q[q^*])-C(q_n,Q_n)-\sum_x q^*(x)\ln\frac{q_n(x)}{q_{n-1}(x)}\\
&\overset{(a)}{=}\sum_x q^*(x)\Big(d[Q[q^*]](x)-\ln q^*(x)-d[Q_n](x)+\ln q_n(x)-\ln\frac{q_n(x)}{q_{n-1}(x)}\Big)\\
&=\sum_x q^*(x)\Big(d[Q[q^*]](x)-\ln q^*(x)-d[Q[q_{n-1}]](x)+\ln q_{n-1}(x)\Big)\\
&\overset{(b)}{=}\sum_x q^*(x)\sum_y p(y|x)\Big(\ln\frac{p(y|x)}{q^*(y)}-\ln\frac{p(y|x)}{q_{n-1}(y)}\Big)\\
&=\sum_{x,y} q^*(x)p(y|x)\ln\frac{q_{n-1}(y)}{q^*(y)}=-D_Y(q^*\|q_{n-1})\le 0,
\end{aligned}
\]

where (a) is due to Equation (8) and (b) holds according to Equation (9). Now, we can bound the sum as follows:

\[
\sum_{n=1}^N\big(C-C(q_n,Q_n)\big)\le\sum_{n=1}^N\sum_x q^*(x)\ln\frac{q_n(x)}{q_{n-1}(x)}=\sum_x q^*(x)\ln\frac{q_N(x)}{q_0(x)}=\sum_x q^*(x)\ln\Big(\frac{q_N(x)}{q^*(x)}\cdot\frac{q^*(x)}{q_0(x)}\Big)=D_X(q^*\|q_0)-D_X(q^*\|q_N)\le D_X(q^*\|q_0).
\]

The last term is finite, as q_0 > 0. This implies C(q_n, Q_n) → C. □

Appendix B. Proofs of Results in Section 3

Proof of Lemma 1.

The equivalence B = C is stated without proof in Sections 5.3 and 5.6 of [29]. We present the proof here for completeness.

It is clear that A ⊆ B ⊆ C. Thus, we only need to prove C ⊆ A. It suffices to show that the corner points of C(U,X) lie inside A.

Because C(U,X) is a trapezoid with one corner point (0,0), there are at most three nontrivial corner points, as follows:

  1. Lower right (r1, 0), where r1 = min{I(X;Y), I(X;Y|U) + I(U;Z)}. Because r1 ≤ I(X;Y), we have (r1, 0) ∈ A(∅, X).

  2. Upper left (0, r2), where r2 = min{I(U;Z), I(X;Y)}. Clearly, (0, r2) ∈ A(X, X).

  3. Upper right (min{I(X;Y|U), I(X;Y) − I(U;Z)}, I(U;Z)); this corner point exists when I(U;Z) ≤ I(X;Y). We can consider two cases:
    • (a) I(U;Y) ≥ I(U;Z): the corner point is (I(X;Y|U), I(U;Z)), which lies inside A(U, X).
    • (b) I(U;Y) ≤ I(U;Z): the corner point is (I(X;Y) − I(U;Z), I(U;Z)). It suffices to consider I(U;Y) ≤ I(U;Z) ≤ I(X;Y), as otherwise this point does not exist. Now, let α be such that (1−α)I(X;Y) + αI(U;Y) = I(U;Z). We can construct a pair (Û, X̂) such that this corner point is inside A(Û, X̂). Considering the random variables Q ~ Bernoulli(α), Ũ = X when Q = 0 and Ũ = U when Q = 1, we can let Û = (Ũ, Q) and X̂ = X; then, I(X̂;Y|Û) = (1−α)·0 + αI(X;Y|U) = I(X;Y) − I(U;Z). Now, we have I(Û;Z) ≥ I(Ũ;Z|Q) ≥ I(U;Z) and I(Û;Y) ≥ I(Ũ;Y|Q) = I(U;Z); hence, min{I(Û;Z), I(Û;Y)} ≥ I(U;Z).
  The proof is finished. □

Proof of Theorem 3.

We can use A(U,X) in Lemma 1 to compute the supporting hyperplanes.

When θ ∈ [1/2, 1], θR_Y + θ̄R_Z is bounded from above by

\[
\max_{q(u,x)}\theta I(X;Y|U)+\bar\theta I(U;Y)=\max_{q(u,x)}\theta\big(I(X;Y)-I(U;Y)\big)+\bar\theta I(U;Y)=\max_{q(u,x)}\theta I(X;Y)+(1-2\theta)I(U;Y)\le\max_{q(x)}\theta I(X;Y).
\]

On the other hand, this upper bound is achieved by setting U = ∅ in A(U, X).

When θ ∈ [0, 1/2), we first have

\[
F=\max_{p(u,x)}\min_{\alpha\in[0,1]}\theta I(X;Y|U)+\bar\theta\alpha I(U;Z)+\bar\theta\bar\alpha I(U;Y).
\]

To show that the maximum and minimum can be exchanged, we use Lemma 2; in particular, letting d=2, λ1=α, and λ2=α¯, we have

\[
T_1(q(u,x))=\theta I(X;Y|U)+\bar\theta I(U;Z),\qquad T_2(q(u,x))=\theta I(X;Y|U)+\bar\theta I(U;Y).
\]

Then, the objective function F(α, q) equals λ1T1 + λ2T2. It remains to prove that T is convex. Assume that (a1, a2) ∈ T for some q_a(u,x) and that (b1, b2) ∈ T for some q_b(u,x). Letting Ũ = (U, Q), where Q ~ Bernoulli(β) and (U, X) ~ q_a(u,x) if Q = 1 and (U, X) ~ q_b(u,x) if Q = 0, we then have

\[
T_1(\beta q_a+\bar\beta q_b)=T_1(q(\tilde u,x))=\theta I(X;Y|U,Q)+\bar\theta I(U,Q;Z)\ge\theta I(X;Y|U,Q)+\bar\theta I(U;Z|Q)=\beta T_1(q_a)+\bar\beta T_1(q_b).
\]

Similar inequalities hold for T2. This proves the convexity, and hence the exchange.

Now, we can show that the expression of F can be simplified for θ ∈ [0, 1/2) and α ≤ (1−2θ)/(1−θ). Noting that I(U;Y) = I(X;Y) − I(X;Y|U) and I(U;Z) = I(X;Z) − I(X;Z|U), we have

\[
\theta I(X;Y|U)+\bar\theta\bar\alpha I(U;Y)+\bar\theta\alpha I(U;Z)=\bar\theta\bar\alpha I(X;Y)+\bar\theta\alpha I(X;Z)+(\theta-\bar\theta\bar\alpha)I(X;Y|U)-\bar\theta\alpha I(X;Z|U).
\]

When α ≤ (1−2θ)/(1−θ), i.e., θ − θ̄ᾱ ≤ 0, the last two terms above have a non-positive sum. The maximum value of this sum equals zero and can be achieved by taking U = X. This finishes the proof of the expression.

The cardinality can be proved as follows:

\[
\theta I(X;Y|U)+\bar\theta\bar\alpha I(U;Y)+\bar\theta\alpha I(U;Z)=\bar\theta\bar\alpha I(X;Y)+\bar\theta\alpha I(X;Z)+(\theta-\bar\theta\bar\alpha)I(X;Y|U)-\bar\theta\alpha I(X;Z|U)=f(q(x))+\sum_u q(u)\,g(q(x|u)),
\]

where f and g are continuous functions corresponding to the mutual information terms. Subject to a fixed marginal q(x), the maximum of Σ_u q(u)g(q(x|u)) over all feasible q(u) and q(x|u) is the upper concave envelope of the function g evaluated at q(x). Notice that as the degree of freedom of the distribution q(x) is |X| − 1, it suffices to consider |U| ≤ |X| − 1 + 1 = |X| for evaluating the envelope. □

Proof of Theorem 4.

For fixed q(u,v,x), in the pentagon of the UVOB there are at most two corner points in the first quadrant, namely, the upper left and lower right ones. The line connecting these two points has slope −1. We need to compute the supporting hyperplane value G = max_{(R_Y,R_Z)∈UVOB} θR_Y + θ̄R_Z, where θ ∈ (0,1).

For the case θ ∈ (0, 1/2], note that the slope of the line θR_Y + θ̄R_Z = G is −θ/θ̄ ≥ −1; thus, it suffices to consider the upper left corner point. The expression of this point differs according to which of the following two sets q(u,v,x) falls into:

\[
S_1=\{q(u,v,x):I(V;Z)\le I(U;Y)+I(X;Z|U)\},\qquad S_2=\{q(u,v,x):I(V;Z)\ge I(U;Y)+I(X;Z|U)\}.
\]

When q ∈ S1, this corner point and the corresponding expression of l1(q) := θR_Y + θ̄R_Z are

\[
\Big(\min\{I(U;Y),\,I(U;Y)+I(X;Z|U)-I(V;Z),\,I(X;Y|V)\},\ I(V;Z)\Big),\qquad
l_1(q)=\theta\cdot\min\{I(U;Y),\,I(U;Y)+I(X;Z|U)-I(V;Z),\,I(X;Y|V)\}+\bar\theta I(V;Z).
\]

Otherwise, when q ∈ S2, this corner point and the corresponding expression of l2(q) := θR_Y + θ̄R_Z are

\[
\big(0,\ I(U;Y)+I(X;Z|U)\big),\qquad l_2(q)=\bar\theta\big(I(U;Y)+I(X;Z|U)\big).
\]

Now, the supporting hyperplane value is

\[
G=\max\Big\{\max_{q\in S_1}l_1(q),\ \max_{q\in S_2}l_2(q)\Big\}.
\]

We want to show that

\[
\max_{q\in S_2}l_2(q)=\max_{q\in S_2}l_1(q).
\]

For the left-hand side term, we have

\[
\max_{q\in S_2}l_2(q)=\max_{q\in S_2}\bar\theta\big(I(U;Y)+I(X;Z|U)\big)\le\max_{q\in S_2}\bar\theta I(V;Z)\le\max_{q\in S_2}\bar\theta I(X;Z),
\]

where the equalities hold with U = ∅ and V = X, and this choice is in S2. For the other term,

\[
\max_{q\in S_2}l_1(q)=\max_{q\in S_2}\theta\big(I(U;Y)+I(X;Z|U)-I(V;Z)\big)+\bar\theta I(V;Z)\le\max_{q\in S_2}\bar\theta I(V;Z)\le\max_{q\in S_2}\bar\theta I(X;Z),
\]

where the equalities hold using the same settings as above. Hence, we have the supporting hyperplane value

\[
G=\max\Big\{\max_{q\in S_1}l_1(q),\ \max_{q\in S_2}l_1(q)\Big\}=\max_q l_1(q).
\]

We can simplify this expression as follows:

\[
\begin{aligned}
\max_q l_1(q)&=\max_q\min_{a,b\in[0,1]}\theta(1-a)I(U;Y)+\theta a(1-b)\big(I(U;Y)+I(X;Z|U)-I(V;Z)\big)+\theta ab\,I(X;Y|V)+\bar\theta I(V;Z)\\
&=\max_q\min_{a,b\in[0,1]}\theta(1-ab)I(U;Y)+\theta a(1-b)I(X;Z|U)+\big(\bar\theta-\theta a(1-b)\big)I(V;Z)+\theta ab\,I(X;Y|V).
\end{aligned}
\]

Letting α = 1 − θa(1−b)/θ̄ and β = 1 − ab, we have α, β ∈ [0,1], θ̄α + θβ = θ̄ + θ(1−a), and the supporting hyperplane value is

\[
\theta R_Y+\bar\theta R_Z=\max_q\min_{\alpha,\beta\in[0,1]}\bar\theta\alpha I(V;Z)+\bar\theta\bar\alpha I(X;Z|U)+\theta\beta I(U;Y)+\theta\bar\beta I(X;Y|V).
\]

Notice that the range of θ̄α + θβ is [max{θ, θ̄}, 1]. Within this range, the reverse mapping from (α, β) to (a, b) is

\[
a=\frac{1-(\bar\theta\alpha+\theta\beta)}{\theta},\qquad b=\frac{\theta\bar\beta}{\theta\bar\beta+\bar\theta\bar\alpha}.
\]
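A quick numeric round-trip check of the mapping (a, b) → (α, β) and its inverse (the choice θ = 0.4 and the point (a, b) below are arbitrary illustrative values, ours rather than the paper's):

```python
theta = 0.4
theta_bar = 1 - theta
a, b = 0.75, 0.3

# Forward map used in the proof.
alpha = 1 - theta * a * (1 - b) / theta_bar
beta = 1 - a * b

# Stated inverse.
a_back = (1 - (theta_bar * alpha + theta * beta)) / theta
b_back = theta * (1 - beta) / (theta * (1 - beta) + theta_bar * (1 - alpha))

print(abs(a - a_back) < 1e-12, abs(b - b_back) < 1e-12)   # True True
```

Note that for this point θ̄α + θβ = θ̄ + θ(1−a) = 0.7, which indeed lies in the range [max{θ, θ̄}, 1].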

Notice that when θ ∈ [1/2, 1), we have θ̄ ∈ (0, 1/2]; we can use similar reasoning as above (by swapping Y and Z, R_Y and R_Z, U and V, θ and θ̄, and finally α and β) to obtain

\[
\bar\theta R_Z+\theta R_Y=\max_q\min_{\alpha,\beta\in[0,1]}\theta\beta I(U;Y)+\theta\bar\beta I(X;Y|V)+\bar\theta\alpha I(V;Z)+\bar\theta\bar\alpha I(X;Z|U).
\]

The constraint then becomes θβ + θ̄α = θ + θ̄(1−a), for which the range is again [max{θ̄, θ}, 1]. Thus, the expression and the constraints are the same as in the case θ ∈ (0, 1/2]. Putting these two cases together, we obtain the characterization of the supporting hyperplanes.

To exchange the max–min, we again use Lemma 2. The proof is similar to that of Theorem 3, so we omit the details and only provide the setting for θ ∈ (0, 1/2], where G = max_q l1(q): d = 3, and the functions are

\[
T_1(q)=\theta I(U;Y)+\bar\theta I(V;Z),\quad T_2(q)=\theta\big(I(U;Y)+I(X;Z|U)-I(V;Z)\big)+\bar\theta I(V;Z),\quad T_3(q)=\theta I(X;Y|V)+\bar\theta I(V;Z).
\]

The proof of the cardinality bounds is similar to that of Theorem 3. □

Appendix C. Proofs of Results in Section 4

Proof of Lemma 5.

Let A represent a generic random variable. The gradient of H(A) can be calculated as follows:

\[
\frac{\partial H(A)}{\partial q(u,x)}=-\frac{\partial}{\partial q(u,x)}\sum_a q(a)\ln q(a)=-\sum_a\frac{\partial q(a)}{\partial q(u,x)}\ln q(a)-\sum_a q(a)\frac{1}{q(a)}\frac{\partial q(a)}{\partial q(u,x)}=-\sum_a\frac{\partial q(a)}{\partial q(u,x)}\big(1+\ln q(a)\big).
\]

For the particular term H(U|Y) in F(α,q), the gradient is

\[
\frac{\partial\big(H(U,Y)-H(Y)\big)}{\partial q(u,x)}=-\sum_{u',y}\frac{\partial q(u',y)}{\partial q(u,x)}\big(1+\ln q(u',y)\big)+\sum_y\frac{\partial q(y)}{\partial q(u,x)}\big(1+\ln q(y)\big)=-\sum_y p(y|x)\big(1+\ln q(u,y)\big)+\sum_y p(y|x)\big(1+\ln q(y)\big)=-\sum_y p(y|x)\ln q(u|y).
\]

Now, we can calculate the first-order term in Equation (41):

\[
(q_2-q_1)^T\nabla H_{q_1}(U|Y)=-\sum_{u,x,y}\big(q_2(u,x)-q_1(u,x)\big)p(y|x)\ln q_1(u|y)=-\sum_{u,y}q_2(u,y)\ln q_1(u|y)-H_{q_1}(U|Y).
\]

Finally, the left-hand side of Equation (41) equals

\[
H_{q_2}(U|Y)-H_{q_1}(U|Y)-(q_2-q_1)^T\nabla H_{q_1}(U|Y)=H_{q_2}(U|Y)+\sum_{u,y}q_2(u,y)\ln q_1(u|y)=-D_{U|Y}(q_2\|q_1).
\]

For the other terms in Equation (16), we can perform similar calculations to obtain the desired inequality. □
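The gradient formula for H(U|Y) can be checked numerically with a central finite difference. This is a verification sketch we add; the joint distribution and the 2×2 channel are arbitrary small examples, and, as in the derivation above, the coordinates q(u,x) are treated as free (unnormalized) variables:

```python
import math

p = [[1.0, 0.0], [0.5, 0.5]]                     # p(y|x), a 2x2 example

def h_u_given_y(q):
    """H(U|Y) in nats for a joint q[u][x] pushed through p(y|x)."""
    quy = [[sum(q[u][x] * p[x][y] for x in range(2)) for y in range(2)]
           for u in range(2)]
    qy = [sum(quy[u][y] for u in range(2)) for y in range(2)]
    return -sum(quy[u][y] * math.log(quy[u][y] / qy[y])
                for u in range(2) for y in range(2) if quy[u][y] > 0)

q = [[0.1, 0.2], [0.3, 0.4]]
u, x, eps = 0, 1, 1e-6

# Central finite difference in the coordinate q(u, x).
qp = [row[:] for row in q]; qp[u][x] += eps
qm = [row[:] for row in q]; qm[u][x] -= eps
numeric = (h_u_given_y(qp) - h_u_given_y(qm)) / (2 * eps)

# Analytic gradient from the derivation above: -sum_y p(y|x) ln q(u|y).
quy = [[sum(q[v][w] * p[w][y] for w in range(2)) for y in range(2)]
       for v in range(2)]
qy = [sum(quy[v][y] for v in range(2)) for y in range(2)]
analytic = -sum(p[x][y] * math.log(quy[u][y] / qy[y])
                for y in range(2) if p[x][y] > 0)

print(abs(numeric - analytic))   # tiny (finite-difference error only)
```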

Proof of Lemma 6.

Consider the superlevel set S̃_F(α,k) of the function F(α,q,Q[q_n]). According to Equation (6), F(α,q,Q[q_n]) ≤ F(α,q) for all q; thus, S̃_F(α,k) ⊆ S_F(α,k). From Equation (6), F(α,q_n,Q[q_n]) = F(α,q_n), we have q_n ∈ S̃_F(α,k); further, because F(α,q,Q[q_n]) is concave in q, S̃_F(α,k) is a convex set, and as such is connected. This implies that S̃_F(α,k) ⊆ T_F(α,k,q_n). Because q_{n+1} makes the function value F(α,q,Q[q_n]) larger than that of q_n, it must lie in S̃_F(α,k), and hence in T_F(α,k,q_n). □

Proof of Theorem 5.

The proof is similar to that of Theorem 2. We perform the following manipulations:

\[
\begin{aligned}
F(\alpha,q^*)-F(\alpha,q_n,Q_n)-\bar\theta\sum_{u,x}q^*(u,x)\ln\frac{q_n(u,x)}{q_{n-1}(u,x)}
&=F(\alpha,q^*,Q[q^*])-F(\alpha,q_n,Q_n)-\bar\theta\sum_{u,x}q^*(u,x)\ln\frac{q_n(u,x)}{q_{n-1}(u,x)}\\
&\overset{(a)}{=}\bar\theta\sum_{u,x}q^*(u,x)\Big(d[Q[q^*]]-\ln q^*-d[Q_n]+\ln q_n-\ln\frac{q_n}{q_{n-1}}\Big)\\
&=\bar\theta\sum_{u,x}q^*(u,x)\Big(d[Q[q^*]]-\ln q^*-d[Q[q_{n-1}]]+\ln q_{n-1}\Big)\\
&\overset{(b)}{=}\bar\theta\big(-D_{UX}(q^*\|q_{n-1})+D_{X|YU}+\bar\alpha D_{U|Y}+\alpha D_{U|Z}\big)+(\bar\theta-\theta)D_{Y|U}\overset{(c)}{\le}0,
\end{aligned}
\]

where (a) holds from Equation (8), (b) is due to

\[
\sum_{u,x}q^*\big(d[Q[q^*]]-d[Q[q_{n-1}]]\big)=\sum_{u,x,y,z}q^*(u,x)p(y,z|x)\left(\ln\frac{q^*(x|y,u)}{q_{n-1}(x|y,u)}\Big(\frac{q^*(u|y)}{q_{n-1}(u|y)}\Big)^{\bar\alpha}\Big(\frac{q^*(u|z)}{q_{n-1}(u|z)}\Big)^{\alpha}+\frac{\bar\theta-\theta}{\bar\theta}\ln\frac{q^*(y|u)}{q_{n-1}(y|u)}\right)=D_{X|YU}(q^*\|q_{n-1})+\bar\alpha D_{U|Y}+\alpha D_{U|Z}+\frac{\bar\theta-\theta}{\bar\theta}D_{Y|U},
\]

and (c) is from Lemma 5. This implies that

\[
\sum_{n=1}^N\Big(F(\alpha,q^*)-F(\alpha,q_n,Q_n)\Big)\le\bar\theta\sum_{u,x}q^*(u,x)\ln\frac{q_N(u,x)}{q_0(u,x)}=\bar\theta\sum_{u,x}q^*(u,x)\ln\Big(\frac{q_N(u,x)}{q^*(u,x)}\cdot\frac{q^*(u,x)}{q_0(u,x)}\Big)=\bar\theta\big(D_{UX}(q^*\|q_0)-D_{UX}(q^*\|q_N)\big)\le\bar\theta D_{UX}(q^*\|q_0).
\]

The last term is finite and positive, as q_0 > 0. Finally, a sequence of positive terms with a finite sum converges to zero, which implies that F(α,q_n,Q_n) → F(α,q*). □

Proof of Proposition 2.

Let Cm and Δd(u,x) be

\[
C_m=\sum_{u,x}\exp\{d_m[\tilde Q](u,x)\},\qquad \Delta d(u,x)=d_{m+1}[\tilde Q](u,x)-d_m[\tilde Q](u,x),
\]

where dk[Q] is as in Equation (18) with α=αk. According to Equations (7) and (8),

\[
q_0=\tilde q=\frac{\exp\{d_m[\tilde Q]\}}{\sum_{u,x}\exp\{d_m[\tilde Q]\}}=\frac{\exp\{d_m[\tilde Q]\}}{C_m},\qquad F(\alpha_m,\tilde q)=\bar\theta\ln C_m.
\]

Noting that Q1=Q[q0]=Q[q˜]=Q˜, we estimate the difference as follows:

\[
\begin{aligned}
F(\alpha_{m+1},q_1,Q_1)-F(\alpha_m,\tilde q)
&\overset{(a)}{=}\bar\theta\ln\sum_{u,x}\exp\{d_{m+1}[\tilde Q]\}-\bar\theta\ln C_m
=\bar\theta\ln\sum_{u,x}\exp\{d_m[\tilde Q]\}\exp\{\Delta d\}-\bar\theta\ln C_m\\
&=\bar\theta\ln\Big(C_m\sum_{u,x}\tilde q\exp\{\Delta d\}\Big)-\bar\theta\ln C_m
=\bar\theta\ln\sum_{u,x}\tilde q\exp\{\Delta d\}\\
&\approx\bar\theta\ln\sum_{u,x}\tilde q(1+\Delta d)=\bar\theta\ln\Big(1+\sum_{u,x}\tilde q\,\Delta d\Big)\approx\bar\theta\sum_{u,x}\tilde q\,\Delta d,
\end{aligned}
\]

where (a) holds from Equation (8) and from the fact that Q1=Q˜.

According to the definition of d[Q] in Equation (19),

\[
\Delta d=(\alpha_{m+1}-\alpha_m)\sum_{y,z}p(y,z|x)\ln\frac{\tilde Q(u|z)}{\tilde Q(u|y)}.
\]

The expectation of this difference is

\[
\sum_{u,x}\tilde q\,\Delta d=(\alpha_{m+1}-\alpha_m)\big(I(U;Z)-I(U;Y)\big)\overset{(b)}{=}(\alpha_{m+1}-\alpha_m)\cdot\Big(-\frac{\alpha_{m+1}-\alpha_m}{\tau_m\bar\theta}\Big)=-\frac{(\alpha_{m+1}-\alpha_m)^2}{\tau_m\bar\theta},
\]

where (b) is due to Equation (22). The proof is finished. □

Author Contributions

Methodology and analysis, Y.G.; software and visualization, Y.D. and Y.L.; validation, X.N., B.B. and W.H.; writing—original draft preparation, Y.G.; writing—review and editing, X.N., B.B. and W.H. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

Yanqing Liu is employed by Bank of China. Xueyan Niu, Bo Bai, and Wei Han are employed by Huawei Tech. Co., Ltd. All authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Funding Statement

This research was funded in part by the National Key R&D Program of China (Grant No. 2021YFA1000500) and in part by Huawei Tech. Co., Ltd.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

1. Cover T. Broadcast Channels. IEEE Trans. Inform. Theory. 1972;18:2–14. doi: 10.1109/TIT.1972.1054727.
2. Bergmans P. Random coding theorem for broadcast channels with degraded components. IEEE Trans. Inform. Theory. 1973;19:197–207. doi: 10.1109/TIT.1973.1054980.
3. Körner J., Marton K. Comparison of two noisy channels. In: Csiszár I., Elias P., editors. Topics in Information Theory. North-Holland; Amsterdam, The Netherlands: 1977. pp. 411–423.
4. Marton K. A coding theorem for the discrete memoryless broadcast channel. IEEE Trans. Inform. Theory. 1979;25:306–311. doi: 10.1109/TIT.1979.1056046.
5. Nair C., El Gamal A. An outer bound to the capacity region of the broadcast channel. IEEE Trans. Inform. Theory. 2007;53:350–355. doi: 10.1109/TIT.2006.887492.
6. Gohari A., Nair C. Outer bounds for multiuser settings: The auxiliary receiver approach. IEEE Trans. Inform. Theory. 2022;68:701–736. doi: 10.1109/TIT.2021.3128136.
7. Liu Y., Geng Y. Blahut–Arimoto Algorithms for Computing Capacity Bounds of Broadcast Channels. Proceedings of the IEEE International Symposium on Information Theory; Espoo, Finland, 26 June–1 July 2022; pp. 1145–1150.
8. Liu Y., Dou Y., Geng Y. Blahut–Arimoto Algorithm for Marton’s Inner Bound. Proceedings of the IEEE International Symposium on Information Theory; Taipei, Taiwan, 25–30 June 2023; pp. 2159–2164.
9. Calvo E., Palomar D.P., Fonollosa J.R., Vidal J. The computation of the capacity region of the discrete degraded BC is a nonconvex DC problem. Proceedings of the IEEE International Symposium on Information Theory; Toronto, ON, Canada, 6–11 July 2008; pp. 1721–1725.
10. Nocedal J., Wright S.J. Numerical Optimization. 2nd ed. Springer; New York, NY, USA: 2006.
11. Wilson R.B. A Simplicial Algorithm for Concave Programming. Ph.D. Thesis. Graduate School of Business Administration, Harvard University; Cambridge, MA, USA: 1963.
12. Blahut R. Computation of channel capacity and rate distortion functions. IEEE Trans. Inform. Theory. 1972;18:460–473. doi: 10.1109/TIT.1972.1054855.
13. Arimoto S. An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Trans. Inform. Theory. 1972;18:14–20. doi: 10.1109/TIT.1972.1054753.
14. Rezaeian M., Grant A. Computation of Total Capacity for Discrete Memoryless Multiple-Access Channels. IEEE Trans. Inform. Theory. 2004;50:2779–2784. doi: 10.1109/TIT.2004.836661.
15. Calvo E., Palomar D.P., Fonollosa J.R., Vidal J. On the Computation of the Capacity Region of the Discrete MAC. IEEE Trans. Commun. 2010;58:3512–3525. doi: 10.1109/TCOMM.2010.091710.090239.
16. Yasui K., Matsushima T. Toward Computing the Capacity Region of Degraded Broadcast Channel. Proceedings of the IEEE International Symposium on Information Theory; Austin, TX, USA, 13–18 June 2010; pp. 570–574.
17. Dupuis F., Yu W., Willems F.M.J. Blahut–Arimoto algorithms for computing channel capacity and rate-distortion with side information. Proceedings of the IEEE International Symposium on Information Theory; Chicago, IL, USA, 27 June–2 July 2004; p. 179. Available online: https://www.comm.utoronto.ca/~weiyu/ab_isit04.pdf (accessed on 8 January 2024).
18. Gohari A., El Gamal A., Anantharam V. On Marton’s Inner Bound for the General Broadcast Channel. IEEE Trans. Inform. Theory. 2014;60:3748–3762. doi: 10.1109/TIT.2014.2321384.
19. Geng Y., Gohari A., Nair C., Yu Y. On Marton’s Inner Bound and Its Optimality for Classes of Product Broadcast Channels. IEEE Trans. Inform. Theory. 2014;60:22–41. doi: 10.1109/TIT.2013.2285925.
20. Anantharam V., Gohari A., Nair C. On the Evaluation of Marton’s Inner Bound for Two-Receiver Broadcast Channels. IEEE Trans. Inform. Theory. 2019;65:1361–1371. doi: 10.1109/TIT.2018.2880241.
21. Geng Y. Single-Letterization of Supporting Hyperplanes to Outer Bounds for Broadcast Channels. Proceedings of the IEEE/CIC International Conference on Communications in China; Changchun, China, 11–13 August 2019; pp. 70–74.
22. Gowtham K.R., Thangaraj A. Computation of secrecy capacity for more-capable channel pairs. Proceedings of the IEEE International Symposium on Information Theory; Toronto, ON, Canada, 6–11 July 2008; pp. 529–533.
23. Hajek B.E., Pursley M.B. Evaluation of an achievable rate region for the broadcast channel. IEEE Trans. Inform. Theory. 1979;25:36–46. doi: 10.1109/TIT.1979.1055989.
24. Cover T.M. An achievable rate region for the broadcast channel. IEEE Trans. Inform. Theory. 1975;21:399–404. doi: 10.1109/TIT.1975.1055418.
25. van der Meulen E.C. Random coding theorems for the general discrete memoryless broadcast channel. IEEE Trans. Inform. Theory. 1975;21:180–190. doi: 10.1109/TIT.1975.1055347.
26. Nair C., Wang Z.V. On the inner and outer bounds for 2-receiver discrete memoryless broadcast channels. Proceedings of the Information Theory and Applications Workshop; San Diego, CA, USA, 27 January–1 February 2008; pp. 226–229.
27. Gohari A., Anantharam V. Evaluation of Marton’s inner bound for the general broadcast channel. Proceedings of the IEEE International Symposium on Information Theory; Seoul, Republic of Korea, 28 June–3 July 2009; pp. 2462–2466.
28. Jog V., Nair C. An information inequality for the BSSC channel. Proceedings of the Information Theory and Applications Workshop; La Jolla, CA, USA, 31 January–5 February 2010; pp. 1–8.
29. El Gamal A., Kim Y.-H. Network Information Theory. Cambridge University Press; Cambridge, UK: 2011.


