Abstract
The celebrated Blahut–Arimoto algorithm computes the capacity of a discrete memoryless point-to-point channel by recasting the capacity as a double maximization and alternately maximizing the objective function over the two sets of variables. This algorithm has been applied to degraded broadcast channels, in which the supporting hyperplanes of the capacity region are again cast as maximization problems. In this work, we consider general broadcast channels and extend this algorithm to compute inner and outer bounds on the capacity regions. Our main contributions are as follows: first, we show that the optimization problems are max–min problems and that the exchange of minimum and maximum holds; second, we design Blahut–Arimoto algorithms for the maximization part and gradient descent algorithms for the minimization part; third, we provide convergence analysis for both parts. Numerical experiments validate the effectiveness of our algorithms.
Keywords: Blahut–Arimoto algorithm, broadcast channel, capacity region, superposition coding inner bound, Marton’s inner bound, UV outer bound
1. Introduction
In 1972, Cover [1] introduced the two-receiver discrete memoryless broadcast channel to model a downlink communication system in which X is the sender and Y and Z are the receivers. In the same paper, he proposed a coding scheme which resulted in the superposition coding inner bound (SCIB). It turns out that the SCIB is indeed the capacity region for two-receiver broadcast channels in which the receivers are comparable under one of the following partial orders: degraded [2], less noisy [3], and more capable [3]. However, for a general broadcast channel the single-letter capacity region remains open.
To characterize the capacity region of a broadcast channel, a standard approach is to show that one inner bound matches another outer bound. Currently, the best inner bound for general broadcast channels is Marton’s inner bound (MIB) [4], while the UV outer bound (UVOB) [5] was the best outer bound until recently, when a better one called J version outer bound was proposed in [6].
The evaluation of inner and outer bounds is critical in the following aspects: (1) the evaluation of an inner bound usually results in an optimal input distribution which can help in the design of practical coding schemes; (2) the identification of the capacity region of a particular broadcast channel through the comparison of one inner bound and one outer bound relies on the evaluation of these two bounds; and (3) when claiming to establish a new bound, it is necessary to show that the new bound strictly improves on old ones through the evaluation of bounds on a particular channel.
Remark 1.
This is the full version of the conference papers presented at ISIT 2022 and 2023 [7,8].
However, this evaluation is usually difficult due to its non-convexity [9]. To alleviate this issue, there exist a number of generic optimization algorithms, such as interior point [10], active set [10], and sequential quadratic programming [11]. However, efficient algorithms should use the domain knowledge of information theory as well; from this viewpoint, we consider the Blahut–Arimoto (BA) algorithm, which is specially customized for information theory.
The original BA algorithm was independently developed by Blahut [12] and Arimoto [13] to calculate the capacity of a general point-to-point channel p(y|x). The algorithm transforms the original maximization problem into a double maximization that can be solved by alternating maximization, with explicit updating formulae within each iteration.
There have been numerous extensions of the BA algorithm to various scenarios in information theory. For example, [14] applied the BA algorithm to compute the sum rate of the multiple-access channel. Later, using the idea from the BA algorithm, the whole capacity region of the multiple-access channel was formulated in [15] as a rank-one constrained problem and solved by relaxation methods. It is beyond the scope of this paper to list all of these references. Instead, we discuss those papers closely related to computing the bounds on capacity regions of broadcast channels.
In [16], the authors considered the capacity region of a degraded broadcast channel , where receiver Z is a degraded version of Y.
In this scenario, the capacity region of the rate pairs is known, and can be achieved by the simplified version of superposition coding. The supporting hyperplanes can be characterized as
Using a similar idea to that of the BA algorithm, the authors designed an algorithm to alternately maximize the objective function.
The method in [16] is directly applicable to less noisy broadcast channels, as the characterization of the capacity region is the same as that of the degraded case. However, this equivalence no longer holds for the more capable case, as the value of the supporting hyperplane is then characterized as a max–min optimization problem (e.g., see Equation (14)). As a matter of fact, the supporting hyperplanes of all the above-mentioned bounds, that is, SCIB, MIB, and UVOB, are of the max–min form. The main issue is that the minimization part sits inside the maximization part, which prevents the application of the BA algorithm to the whole problem.
The algorithms for calculating inner bounds and outer bounds for general broadcast channels are very limited. The authors of [17] considered MIB (see Section 3.2) and designed a BA algorithm to compute the sum rate of a simplified version in which the common auxiliary random variable W is discarded. They observed that the objective function is convex in the conditional distribution of the input X given the auxiliary variables, which means that the optimal X is a deterministic function of the auxiliaries. Noticing this, the authors performed the optimization over all fixed mappings from the auxiliaries to the input. However, discarding W can result in a strictly smaller sum rate [18], making it necessary to consider the complete version of MIB.
In this paper, we seek to design BA algorithms for general broadcast channels in order to compute the following inner and outer bounds: SCIB, MIB, and UVOB. The key difference here is that the optimization problems are max–min problems, rather than only containing a maximization part. In Table 1, we provide an intuitive comparison of related references.
Table 1.
Comparison of typical scenarios related to the BA algorithm.
The notation we use is as follows: p denotes a fixed (conditional) probability distribution, such as the channel p(y|x), while q and Q are used for (conditional) probability distributions that are allowed to change. Calligraphic letters are used to denote sets. The use of square brackets, as in f[g], means that f is specified by the variable g. Unless otherwise specified, we use the natural logarithm, and we use abbreviated notation for the Kullback–Leibler divergences that appear frequently.
The organization is as follows. First, in Section 2 we introduce the necessary background on the BA algorithm and its extension in [16]. Then, in Section 3 we extend the BA algorithm to the evaluation of SCIB, MIB, and UVOB. Convergence analyses of these algorithms are presented in Section 4. Finally, in Section 5 we perform numerical experiments to validate the effectiveness and efficiency of our algorithms.
2. Mathematical Background of Blahut–Arimoto Algorithms
We first introduce the standard BA algorithm in Section 2.1, as we will rely on several of its properties later. Then, in Section 2.2 we discuss why the method in [16] cannot be applied to general broadcast channels.
2.1. Blahut–Arimoto Algorithm for Point-to-Point Channel
For a point-to-point channel p(y|x), the capacity C is the maximum of the mutual information I(X;Y) over the input distribution, where

C = \max_{p(x)} I(X;Y), \qquad I(X;Y) = \sum_{x,y} p(x)\, p(y|x) \log \frac{p(y|x)}{\sum_{x'} p(x')\, p(y|x')}.  (1)

By replacing the backward channel induced by the input and p(y|x) with a free variable Q(x|y), the BA algorithm performs the alternating maximization C = \max_q \max_Q J(q, Q), where

J(q, Q) = \sum_{x,y} q(x)\, p(y|x) \log \frac{Q(x|y)}{q(x)}.  (2)
Notice that we slightly abuse notation here, which should not cause confusion in general.
The above objective function can be reformulated as follows:
| (3) |
| (4) |
We call this the basic form. For different scenarios, BA algorithms mainly differ in the choice of this distribution and function.
The following theorem (see proof in Appendix A) provides the explicit formulae for the maximizing Q given q (denoted Q^*[q]) and the maximizing q given Q (denoted q^*[Q]).
Theorem 1.
The following properties hold for the problem \max_q \max_Q J(q, Q).
Given a fixed q, J(q, Q) is concave in Q, and the maximum point is induced by the input and the channel,

Q^*[q](x|y) = \frac{q(x)\, p(y|x)}{\sum_{x'} q(x')\, p(y|x')}.  (5)

Further, the function values satisfy

J(q, Q^*[q]) = I(X;Y)\big|_{p(x) = q(x)}.  (6)

Given a fixed Q, J(q, Q) is concave in q, and the maximum point is obtained by the Lagrangian,

q^*[Q](x) = \frac{\exp\big(\sum_y p(y|x) \log Q(x|y)\big)}{\sum_{x'} \exp\big(\sum_y p(y|x') \log Q(x'|y)\big)}.  (7)

Further, evaluation of the function value results in

J(q^*[Q], Q) = \log \sum_x \exp\Big(\sum_y p(y|x) \log Q(x|y)\Big).  (8)
Starting from an initial q^{(0)}, the BA algorithm performs alternating maximization and produces a sequence of points

q^{(0)},\; Q^{(1)},\; q^{(1)},\; Q^{(2)},\; q^{(2)},\; \ldots,

where

Q^{(n)} = Q^*[q^{(n-1)}] \quad \text{and} \quad q^{(n)} = q^*[Q^{(n)}]

according to Equations (5) and (7), respectively.
The criterion for stopping the iterations is based on the following result [12,13].
Proposition 1
(Proposition 1 in [13], Theorem 2 in [12]). A distribution q maximizes J(q, Q^*[q]) if and only if the following holds for every x and some scalar D:

\sum_y p(y|x) \log \frac{Q^*[q](x|y)}{q(x)} \;=\; D \quad \text{if } q(x) > 0, \qquad \le\; D \quad \text{if } q(x) = 0.
Remark 2.
It should be mentioned that in order to avoid infinity minus infinity for those x with q(x) = 0 in the above proposition, the following equivalent formula can be used:

\sum_y p(y|x) \log \frac{Q^*[q](x|y)}{q(x)} = \sum_y p(y|x) \log \frac{p(y|x)}{\sum_{x'} q(x')\, p(y|x')}.  (9)

This is not an issue within the BA algorithm itself, as q^{(n)}(x) > 0 for every x according to Equation (7).
Thus, at the end of the n-th step, if the difference
is small enough then the iteration is stopped.
Summarizing the above details, we arrive at the BA algorithm depicted in Algorithm 1. The convergence of the resulting sequence of function values is characterized in the following theorem (see proof in Appendix A).
Theorem 2
(Theorem 1 in [13], Theorem 3 in [12]). If the initial distribution satisfies q^{(0)}(x) > 0 for all x, then the value J(q^{(n)}, Q^{(n)}) converges monotonically from below to the capacity C.
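For concreteness, the following is a compact reference implementation of the alternating maximization just described (a sketch in our own notation; the stopping rule uses the classical lower and upper bounds on C rather than the exact criterion of Algorithm 1):

```python
import numpy as np

def blahut_arimoto(W, tol=1e-9, max_iter=10_000):
    """Standard BA iteration for a point-to-point channel with W[x, y] = p(y|x).
    Returns (capacity estimate in nats, optimizing input distribution q)."""
    nx = W.shape[0]
    q = np.full(nx, 1.0 / nx)                # q^(0): uniform, hence strictly positive
    lower = 0.0
    for _ in range(max_iter):
        py = q @ W                           # output distribution induced by q
        py_safe = np.where(py > 0, py, 1.0)
        with np.errstate(divide="ignore", invalid="ignore"):
            # D_x = sum_y p(y|x) log(p(y|x)/p_q(y)), cf. the condition in Proposition 1
            d = np.where(W > 0, W * np.log(W / py_safe), 0.0).sum(axis=1)
        lower, upper = float(q @ d), float(d.max())   # I(q) <= C <= max_x D_x
        if upper - lower < tol:
            break
        q = q * np.exp(d)                    # multiplicative update combining (5) and (7)
        q /= q.sum()
    return lower, q
```

For instance, blahut_arimoto(np.array([[0.9, 0.1], [0.1, 0.9]])) returns approximately 0.368 nats (about 0.531 bits), the capacity of a binary symmetric channel with crossover probability 0.1.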
2.2. Blahut–Arimoto Algorithm for Degraded Broadcast Channel
In [16], the authors considered the capacity region of the degraded broadcast channel. The original objective function for the value of the supporting hyperplane (where ) is:
| (10) |
| (11) |
Similar to the BA algorithm, the new objective function is
| (12) |
where
| (13) |
Then, the authors designed an extended BA algorithm that alternately maximizes this objective function and analysed the convergence.
However, the above method does not generalize to allow for evaluating the capacity bounds of general broadcast channels. The main reason can be summarized in just one sentence: the minimum of the expectation is, in general, greater than the expectation of the minimum. Taking SCIB as an example, using the representation in Lemma 1, the supporting hyperplane is
A direct extension might try to reformulate the last expression in the form of Equation (12) by letting
however, this is not an equivalent reformulation, as
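For instance (a two-point example of our own to make the gap concrete), let X be uniform on {0, 1} and take f_1(0) = 1, f_1(1) = 0 and f_2(0) = 0, f_2(1) = 1. Then

\min_i \mathbb{E}[f_i(X)] = \min\{1/2,\, 1/2\} = 1/2, \qquad \mathbb{E}\big[\min_i f_i(X)\big] = \mathbb{E}[0] = 0,

so moving the minimum inside the expectation can strictly decrease the value, which is exactly why the reformulation above is not equivalent.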
In the following section, we first use the fact that \min\{a, b\} = \min_{\alpha \in [0,1]} \big[\alpha a + (1-\alpha) b\big] to parameterize the minimum, then show that the maximum and minimum can be exchanged so that we can apply the BA algorithm to the maximization part.
3. Blahut–Arimoto Algorithms for Capacity Bounds of Broadcast Channels
In this section, we first introduce two inner bounds and one outer bound on the capacity region of the broadcast channel. We characterize their supporting hyperplanes as max–min problems and show that the maximum and minimum can be exchanged. Then, we design BA algorithms for the maximization parts and gradient descent algorithms for the minimization parts.
3.1. Superposition Coding Inner Bound
The superposition coding inner bound was proposed by Cover in [1], which corresponds to region in the following lemma (see proof in Appendix B). This region actually has three equivalent characterizations.
Lemma 1
(folklore). The following regions , , and are equivalent characterizations of the superposition coding inner bound:
To characterize the supporting hyperplane , we choose to use the representation . It is clear that
| (14) |
Notice here that we cannot use the BA algorithm directly, as there is a minimum inside the maximum. If we are able to swap the orders of the maximum and minimum, then we can adopt the BA algorithm in the maximization part.
To show this kind of exchange, we first introduce a Terkelsen-type min–max result in the following lemma.
Lemma 2
(Corollary 2 in Appendix A of [19]). Let be the d-dimensional simplex, i.e., and , let be a set of probability distributions , and let , be a set of functions such that the set defined by
is a convex set; then,
With this lemma, we can establish the following theorem (proof in Appendix B).
Theorem 3.
The supporting hyperplane of the superposition coding inner bound is as follows: if , then ; otherwise, if then
Further, it suffices to consider the cardinality size: .
For the maximization part and the nontrivial case where , following the above theorem, two types of BA algorithms can be designed according to the value of .
When , the original objective function is
| (15) |
| (16) |
By replacing the conditional qs with free variables Qs, we have the new objective function
| (17) |
where
| (18) |
When , similar to Equation (4), the new objective function is
| (19) |
where
| (20) |
For the minimization part, it is possible to use the optimal q and Qs obtained in the maximization part to update α. Because the values of the optimal q and Qs may vary greatly when α changes, we propose changing α only locally within its neighbourhood. A candidate approach is to use the gradient descent method, as follows:
| (21) |
where
| (22) |
If the change in α is sufficiently small, it can be assumed that the optimization with respect to α has converged, and the iteration is then stopped.
We summarize the above procedures in Algorithm 2. Note that the updating rules for the q and Qs depend on the interval in which the value of falls.
| Algorithm 2: Computing the superposition coding inner bound for | ||||
| Input: , maximum iterations K, N, thresholds , , step size ; | ||||
| Initialization: , , , ; | ||||
| while and do | ||||
| initialize , ; | ||||
| while and do | ||||
| ; | ||||
| using Equation (5) similarly; | ||||
| using Equation (7) similarly; | ||||
| using Equation (20) or (18); | ||||
| ; | ||||
| end | ||||
| ; | ||||
| calculate using Equations (21) and (22); | ||||
| ; | ||||
| ; | ||||
| ; | ||||
| end | ||||
| Output: , , , | ||||
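To make the two-level max–min structure concrete, the following self-contained sketch applies the same strategy to the simpler objective max_{p(x)} min{I(X;Y), I(X;Z)}, i.e., without the auxiliary variable of the SCIB: a weighted BA maximization in the inner loop and a clipped gradient step on α in the outer loop, with a Danskin-type gradient I(X;Y) − I(X;Z) evaluated at the inner optimum. It mirrors the roles of Equations (17)–(22) but is not a verbatim implementation of Algorithm 2.

```python
import numpy as np

def kl_rows(q, W):
    """D( W[x, :] || q @ W ) for every input symbol x, with W[x, y] = p(y|x)."""
    py = q @ W
    py_safe = np.where(py > 0, py, 1.0)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(W > 0, W * np.log(W / py_safe), 0.0)
    return terms.sum(axis=1)

def weighted_ba(Wy, Wz, alpha, iters=500):
    """Inner loop: maximize alpha*I(X;Y) + (1-alpha)*I(X;Z) over the input pmf q
    via the BA-style multiplicative update q(x) <- q(x)*exp(alpha*Dy(x) + (1-alpha)*Dz(x))."""
    q = np.full(Wy.shape[0], 1.0 / Wy.shape[0])
    for _ in range(iters):
        d = alpha * kl_rows(q, Wy) + (1.0 - alpha) * kl_rows(q, Wz)
        q = q * np.exp(d)
        q /= q.sum()
    return q, float(q @ kl_rows(q, Wy)), float(q @ kl_rows(q, Wz))  # q, I(X;Y), I(X;Z)

def maxmin_rate(Wy, Wz, outer_iters=200, step=0.1):
    """Outer loop: gradient descent on alpha in [0, 1] for
    min_alpha max_q [alpha*I(X;Y) + (1-alpha)*I(X;Z)] = max_q min{I(X;Y), I(X;Z)}."""
    alpha = 0.5
    for _ in range(outer_iters):
        q, i_y, i_z = weighted_ba(Wy, Wz, alpha)
        alpha = float(np.clip(alpha - step * (i_y - i_z), 0.0, 1.0))  # Danskin-type step
    q, i_y, i_z = weighted_ba(Wy, Wz, alpha)
    return alpha, q, min(i_y, i_z)
```

Algorithm 2 has the same two-level structure; only the inner objective and its update formulae are richer because of the auxiliary variable.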
Remark 3.
Given , according to Equation (8),
According to Equation (18) or (20), this value equals a sum of exponentials as a function of α. This is a simple function; thus, we might wonder whether we can minimize it globally to update α. It turns out that this kind of global updating rule can result in an oscillating effect, as can be observed from Figure 1 in [7]. The main reason is that the expression is built from the q and Qs optimized at the current α and is therefore only locally valid in α; as such, it is not suitable to update α globally.
3.2. Marton’s Inner Bound
Marton’s inner bound [4] refers to the union, over all p(u, v, w, x) such that (U, V, W) → X → (Y, Z) forms a Markov chain, of the non-negative rate pairs (R_1, R_2) satisfying

R_1 \le I(U, W; Y),
R_2 \le I(V, W; Z),
R_1 + R_2 \le \min\{I(W; Y), I(W; Z)\} + I(U; Y|W) + I(V; Z|W) - I(U; V|W).
For general broadcast channels, this is the most well known inner bound.
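As a small utility of our own (a sketch, not part of Algorithm 3), the following evaluates the three rate expressions above for a given joint pmf p(u, v, w, x) and channels p(y|x), p(z|x); it can be used to sanity-check candidate distributions returned by the algorithm below:

```python
import numpy as np

def mi(pab):
    """Mutual information (in nats) of a joint pmf given as a 2-D array."""
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    mask = pab > 0
    return float(np.sum(pab[mask] * np.log(pab[mask] / (pa * pb)[mask])))

def marton_rates(p_uvwx, Wy, Wz):
    """Evaluate R1, R2 and the sum-rate expression of Marton's inner bound for a
    joint pmf p(u,v,w,x) (4-D array) and channels Wy[x,y] = p(y|x), Wz[x,z] = p(z|x)."""
    nu, nv, nw, nx = p_uvwx.shape
    p_y = p_uvwx[..., None] * Wy[None, None, None, :, :]   # p(u,v,w,x,y)
    p_z = p_uvwx[..., None] * Wz[None, None, None, :, :]   # p(u,v,w,x,z)
    p_uw_y = p_y.sum(axis=(1, 3)).reshape(nu * nw, -1)     # joint of (U,W) and Y
    p_vw_z = p_z.sum(axis=(0, 3)).reshape(nv * nw, -1)     # joint of (V,W) and Z
    p_w_y = p_y.sum(axis=(0, 1, 3))                        # joint of W and Y
    p_w_z = p_z.sum(axis=(0, 1, 3))                        # joint of W and Z
    p_u_vw = p_uvwx.sum(axis=3).reshape(nu, -1)            # joint of U and (V,W)
    p_u_w = p_uvwx.sum(axis=(1, 3))                        # joint of U and W
    r1 = mi(p_uw_y)                                        # I(U,W;Y)
    r2 = mi(p_vw_z)                                        # I(V,W;Z)
    i_uy_w = mi(p_uw_y) - mi(p_w_y)                        # I(U;Y|W) via the chain rule
    i_vz_w = mi(p_vw_z) - mi(p_w_z)                        # I(V;Z|W)
    i_uv_w = mi(p_u_vw) - mi(p_u_w)                        # I(U;V|W)
    r_sum = min(mi(p_w_y), mi(p_w_z)) + i_uy_w + i_vz_w - i_uv_w
    return r1, r2, r_sum
```

Feeding this function the distribution returned by Algorithm 3 gives a direct check of the reported rates.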
In the following, we characterize the supporting hyperplane of MIB. Because the expressions in MIB have symmetry in Y and Z, without loss of generality we can assume that , i.e., . According to [20], the supporting hyperplane is stated in the following lemma.
Lemma 3.
(Equations (2) and (5) in [20]). The supporting hyperplane of Marton’s inner bound, where , is
where
(23) Further, it suffices to consider the following cardinalities: , , and .
To compute the value of this supporting hyperplane, we can reformulate as follows:
| (24) |
Then, the objective function can be expressed as
| (25) |
where
| (26) |
For minimization over , similar to Section 3.1, we update along the gradient
| (27) |
Similar to Algorithm 2, we summarize the algorithm for MIB in Algorithm 3.
| Algorithm 3: Computing Marton’s inner bound for | ||||
| Input: , maximum iterations K, N, thresholds , , step size ; | ||||
| Initialization: , , , ; | ||||
| while and do | ||||
| initialize , ; | ||||
| while and do | ||||
| ; | ||||
| using Equation (5) similarly; | ||||
| using Equation (7) similarly; | ||||
| using Equation (26); | ||||
| ; | ||||
| end | ||||
| ; | ||||
| calculate using Equation (27); | ||||
| ; | ||||
| ; | ||||
| ; | ||||
| end | ||||
| Output: , , , | ||||
3.3. UV Outer Bound
The UV outer bound [5] refers to the union, over all p(u, v, x) such that (U, V) → X → (Y, Z) forms a Markov chain, of the non-negative rate pairs (R_1, R_2) satisfying

R_1 \le I(U; Y),
R_2 \le I(V; Z),
R_1 + R_2 \le \min\{I(U; Y) + I(V; Z|U),\; I(V; Z) + I(U; Y|V)\}.
For general broadcast channels, this was the best outer bound until [6] strictly improved upon it over an erasure Blackwell channel. The following theorem (proof in Appendix B) characterizes the supporting hyperplanes.
Theorem 4
(Claim 2 and Remark 1 in [21]). The supporting hyperplane of the UV outer bound is
(28) where satisfy . Further, it suffices to consider the cardinality sizes .
The original objective function can be reformulated as
The right-hand side contains two parts, both of which are similar to Equation (10), i.e., the objective function of the degraded broadcast channel. It seems workable to apply the BA algorithm twice; however, it should be noted that these two parts are coupled through the same underlying distribution.
Observe that the first part depends only on , while the other depends on . It suffices to consider the subset of distributions such that . Thus, it is natural to decouple these two parts by fixing and applying the BA algorithm separately to and . After some manipulations, we have
| (29) |
| (30) |
The functions and in the above are
| (31) |
| (32) |
For a fixed input distribution, according to Equation (5), the optimal Qs are the induced ones. For fixed Qs, according to Equation (7), for each x we have
| (33) |
| (34) |
The value of the objective function is
| (35) |
where
| (36) |
Again, according to Equation (7) the optimal and corresponding function value are
| (37) |
| (38) |
For minimization over , similar to Section 3.1, we update along the gradient:
| (39) |
| (40) |
Here, it should be mentioned that must satisfy the constraint . Thus, if the resulting violate this constraint, then we need to scale up to be (at least) equal to . One way to accomplish this is to use the equality to make dependent on , in which case the gradient descent update becomes , , where
Similar to Algorithm 2, we summarize the algorithm for UVOB in Algorithm 4.
| Algorithm 4: Computing the UV outer bound | ||||
| Input: , maximum iterations K, N, thresholds , , step size ; | ||||
| Initialization: , , , , | ||||
| ; | ||||
| while and do | ||||
| initialize , ; | ||||
| while and do | ||||
| ; | ||||
| using Equation (5) similarly; | ||||
| using Equations (33), (34) and (37); | ||||
| using Equation (36); | ||||
| ; | ||||
| end | ||||
| ; | ||||
| calculate and using Equations (39) and (40); | ||||
| ; | ||||
| ; | ||||
| if , scale up to equality; | ||||
| ; | ||||
| ; | ||||
| end | ||||
| Output: , , , , | ||||
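Analogously to the MIB helper in Section 3.2 (again a sketch of our own, reusing the mi() helper defined there), the UVOB rate expressions can be evaluated directly for a given p(u, v, x):

```python
import numpy as np  # assumes the mi() helper from the sketch in Section 3.2 is in scope

def uv_outer_rates(p_uvx, Wy, Wz):
    """Evaluate R1, R2 and the sum-rate expression of the UV outer bound for a joint
    pmf p(u,v,x) (3-D array) and channels Wy[x,y] = p(y|x), Wz[x,z] = p(z|x)."""
    nu, nv, nx = p_uvx.shape
    p_y = p_uvx[..., None] * Wy[None, None, :, :]    # p(u, v, x, y)
    p_z = p_uvx[..., None] * Wz[None, None, :, :]    # p(u, v, x, z)
    p_u_y = p_y.sum(axis=(1, 2))                     # joint of U and Y
    p_v_y = p_y.sum(axis=(0, 2))                     # joint of V and Y
    p_u_z = p_z.sum(axis=(1, 2))                     # joint of U and Z
    p_v_z = p_z.sum(axis=(0, 2))                     # joint of V and Z
    p_uv_y = p_y.sum(axis=2).reshape(nu * nv, -1)    # joint of (U,V) and Y
    p_uv_z = p_z.sum(axis=2).reshape(nu * nv, -1)    # joint of (U,V) and Z
    r1 = mi(p_u_y)                                   # I(U;Y)
    r2 = mi(p_v_z)                                   # I(V;Z)
    sum1 = r1 + (mi(p_uv_z) - mi(p_u_z))             # I(U;Y) + I(V;Z|U)
    sum2 = r2 + (mi(p_uv_y) - mi(p_v_y))             # I(V;Z) + I(U;Y|V)
    return r1, r2, min(sum1, sum2)
```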
4. Convergence Analysis
Here, we aim to show that certain convergence results hold if lies in a proper convex set which contains the global maximizer . For this purpose, we first introduce the first-order characterization of a concave function.
Lemma 4
(Lemma 3 in [22]). Given a convex set , a differentiable function f is concave in if and only if, for all ,
(41)
Similar to [22], we use the superlevel set to construct the convex set . Let be the superlevel set of the objective function of SCIB:
| (42) |
For a fixed k, it is possible for to contain more than one connected set. For , we denote the connected set that contains q as .
Similarly, for MIB and UVOB we define the corresponding (connected) superlevel sets: , , , . Note that k here should not be confused with the notation indicating the number of iterations in the algorithms.
4.1. Superposition Coding Inner Bound
According to Theorem 3, the expression of the objective function of SCIB depends on the value of . Without loss of generality, we can consider the objective function depicted in Equation (16). An equivalent condition for to be concave is provided in the following lemma.
Lemma 5.
Given a convex set with a distribution , then as depicted in Equation (16) is concave in if and only if, for all , , we have
where denotes .
The following lemma shows that lies in the same connected superlevel set as that of . The proof (see Appendix C) is similar to that for Lemma 4 in [16].
Lemma 6.
In Algorithm 2, if , then .
Fixing and letting be the maximizer, the following theorem states that the function values converge. The proof (see Appendix C) is similar to that of Theorem 2.
Theorem 5.
If for some k and , and if is concave in , then the sequence generated by Algorithm 2 converges monotonically from below to .
The following corollary is implied by the proof of Theorem 5.
Corollary 1.
If for some k and , and if is concave in , then
The above analyses deal with for a fixed . When changes to , the estimation for the one-step change in the function value is presented in the following proposition (see proof in Appendix C).
Proposition 2.
Given , suppose that Algorithm 2 converges to the optimal variables and such that and . Letting be updated using Equation (22) and letting be the initial point for the next round, we have
4.2. Marton’s Inner Bound
Next, we present the convergence results of the BA algorithm for MIB. The proofs are omitted, as they are similar to those for SCIB.
Lemma 7.
Given a convex set with a distribution , as depicted in Equation (24) is concave in if and only if, for all , , we have
where denotes .
Lemma 8.
In Algorithm 3, if , then .
Fixing and letting be the maximizer, the following theorem states that the function values converge.
Theorem 6.
If for some k and and if is concave in , then the sequence generated by Algorithm 3 converges monotonically from below to .
The following corollary is implied by the proof of Theorem 6.
Corollary 2.
If for some k and , and if is concave in , then
The estimation for the one-step change in the function value for MIB is presented in the following proposition.
Proposition 3.
Given , suppose that Algorithm 3 converges to the optimal variables and such that and . Letting be updated using Equation (27) and letting be the initial point for the next round, we have
4.3. UV Outer Bound
Now, we present the convergence results of the BA algorithm for UVOB. The proofs are again omitted, as they are similar to those for SCIB.
Lemma 9.
Given a convex set of distribution , as depicted in Equation (29) is concave in if and only if, for all , , we have
where denotes .
Lemma 10.
In Algorithm 4, if , then .
Fixing and letting be the maximizer, the following theorem states that the function values converge.
Theorem 7.
If for some k and , and if is concave in , then the sequence generated by Algorithm 4 converges monotonically from below to .
The following corollary is implied by the proof of Theorem 7.
Corollary 3.
If for some k and , and if is concave in , then
The estimation for the one-step change in the function value for UVOB is presented in the following proposition.
Proposition 4.
Given , suppose that Algorithm 4 converges to the optimal variables and such that and . Letting be updated using Equations (39) and (40) and letting be the initial point for the next round, we have
(43)
5. Numerical Results
We take the binary skew-symmetric broadcast channel as the test channel. The conditional probability matrices are
This is perhaps the simplest broadcast channel for which the capacity region is still unknown.
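For concreteness, the binary skew-symmetric channel studied in [23,26,27,28] is usually taken with skew parameter 1/2; under that convention (our assumption here, consistent with the sum-rate values quoted in Section 5.5), the two component channels are

p(y|x) = \begin{pmatrix} 1 & 0 \\ 1/2 & 1/2 \end{pmatrix}, \qquad p(z|x) = \begin{pmatrix} 1/2 & 1/2 \\ 0 & 1 \end{pmatrix},

with rows indexed by the input x ∈ {0, 1} and columns by the respective output symbol; that is, Y observes X through a Z-channel and Z observes X through the mirrored channel, each with crossover probability 1/2.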
This broadcast channel plays a very important role in research on capacity bounds. It was first studied in [23] to show that the time-sharing random variable is useful for the Cover–van der Meulen inner bound [24,25]. Later, [26,27,28] demonstrated that the sum rate of the UVOB for this broadcast channel is strictly larger than that of the MIB, showing for the first time that at least one of these two bounds is suboptimal.
Our algorithms are important in at least the following sense: suppose that it is not known whether the MIB matches the UVOB (or whether two other bounds match in some new scenario) and we want to check this; we can then perform an exhaustive search over channel matrices of a given size (or of higher dimensions) and compare the two bounds numerically. According to the results shown below in Section 5.5, this does not take very much time compared with generic algorithms.
In the following, we apply the algorithms to compute the value of the supporting hyperplane , where . The initial values of and are . This set of parameters is feasible for UVOB, as .
We demonstrate the algorithms in the following aspects: (1) the maximization part; (2) the minimization part; (3) the change from the maximum part to the minimization part; (4) the superlevel set; and (5) comparison with generic non-convex algorithms.
5.1. Maximization Part
In this part, we fix and to the initial values and let the BA algorithms iterate for times. The results are presented in Figure 1. Because this is the maximization part, the function values increase as the iterations proceed. It is clear that the function values behave properly for fixed and .
Figure 1.
The maximization parts in the algorithms for BSSC with fixed values : (a) the objective function values and (b) .
5.2. Minimization Part
In this part, we start with the initial and , then let the algorithms iterate for times. The results for are presented in Figure 2. Because this is the minimization part, the function values decrease as the iterations proceed. It is clear that in SCIB and MIB gradually changes as k grows. For UVOB, it is necessary to ensure that . When the updated makes fall below this value, it becomes necessary to scale it back. This happens approximately starting from .
Figure 2.
The minimization parts in the algorithms for BSSC with initial values : (a) the objective function values and (b) .
5.3. Change from Maximization to Minimization
In this part, we consider UVOB and let K in the algorithm be . Figure 3 plots the following three values in Equation (43):
As the algorithm iterates, the estimate in Equation (43) becomes more and more accurate, as and for small x.
Figure 3.
Function values of UVOB for BSSC with initial values .
5.4. Superlevel Set
To visualize the convergence of and its relation with the superlevel set, we take SCIB as an example and fix such that has two free variables. We reformulate the objective function of SCIB depicted in Equation (16) as follows:
where
In particular, we fix and , then use the algorithm to find the values of and . The results are shown in Figure 4. In this case, for large enough n lies in the concave part of the superlevel set, meaning that the algorithm converges. Here, it should be mentioned that it is possible that the algorithm may not converge to the optimal point for some initial s that do not lie in the concave part.
Figure 4.
Function values of SCIB for BSSC with the initial value and fixed probability vector : (a) 3D view and (b) contour view.
5.5. Comparison with Generic Non-Convex Algorithms
Here, we compare our algorithms with the following generic algorithms implemented using the “fmincon” MATLAB function: interior-point, active-set, and sequential quadratic programming (sqp). For simplicity, we only compare the sum rate of MIB, for which the optimal value is 0.2506717… nats (0.3616428… bits). The optimization problem for computing the sum rate is

\max_{p(w,u,v,x)} \; \min\{I(W;Y), I(W;Z)\} + I(U;Y|W) + I(V;Z|W) - I(U;V|W).
According to Lemma 3, the cardinality size is .
Notice that we do not carry out a comparison with the method in [16], as it cannot be applied to cases where there is a minimum. For scenarios in which [16] can be used, our algorithms degenerate to the method in [16].
The initial point of is randomly generated for all the algorithms. Table 2 lists the experimental results. For the first three algorithms, a randomly picked starting point usually does not provide a good enough result. Thus, we ran the first three algorithms multiple times until the best function value hit 0.2506 in order to test their effectiveness. It is clear from the table that only sqp can be considered comparable to our algorithms.
Table 2.
Comparison with generic non-convex algorithms on BSSC.
| Method | Time (Seconds) | Sum-Rate of MIB (Nats) |
|---|---|---|
| interior-point | 513.82 | 0.25060… |
| active-set | 2438.57 | 0.25061… |
| sqp | 1.7621 | 0.25067… |
| this paper | 0.0629 | 0.25067… |
For further comparison with sqp, we randomly generated broadcast channels with common alphabet sizes 3, 4, 5, and 6; the corresponding problem dimensions for the three largest cases are 512, 1125, and 2160. Because the optimal sum rate is not yet known, we ran sqp once to record the running time. The results in Table 3 suggest that our algorithms are highly scalable. This meets our expectation, as the updating formulae in Equations (5) and (7) are all explicit and can be computed rapidly.
Table 3.
Comparison with sqp on random channels with alphabet sizes 3, 4, 5, and 6.
| Method | Time (Seconds) | | | | Sum-Rate of MIB (Nats) | | | |
|---|---|---|---|---|---|---|---|---|
| | size 3 | size 4 | size 5 | size 6 | size 3 | size 4 | size 5 | size 6 |
| sqp | 2.6342 | 22.44 | 168.12 | 1065.51 | 0.1840 | 0.2348 | 0.1983 | 0.2351 |
| this paper | 0.0771 | 0.1031 | 0.1450 | 0.2086 | 0.1863 | 0.2375 | 0.2019 | 0.2423 |
6. Discussion and Conclusions
6.1. Initial Points of Algorithms
Taking MIB as an example, we next discuss how to choose the initial points. When there is no prior knowledge on the optimization problem, the initial point is usually generated randomly. In this paper, Theorem 6 and Lemma 7 provide some guidance on the choice of the initial point . A possibly workable method is to randomly generate an initial point and slightly perturb it to check whether these two points satisfy the inequality in Lemma 7. If the answer is no, then it is possible that the objective function is not concave in the neighbourhood of this point, and we continue to generate new initial points.
For the initial point of the minimization variable, because it lies in a bounded set, it is affordable to perform a grid search, especially when its dimension is small. For example, we can take 0.1 as the grid spacing and try each grid point. This approach can, to some extent, help us avoid becoming stuck at local extreme points.
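As an illustration on the toy objective from the sketch in Section 3.1 (this reuses the weighted_ba helper defined there and is not the exact MIB routine), such a grid search over α can be written as follows; the best grid point then seeds the local gradient descent:

```python
import numpy as np

def grid_search_alpha(Wy, Wz, spacing=0.1):
    """Coarse grid search over alpha in [0, 1]; reuses weighted_ba from Section 3.1."""
    best_alpha, best_val = 0.0, np.inf
    for alpha in np.arange(0.0, 1.0 + 1e-9, spacing):
        _, i_y, i_z = weighted_ba(Wy, Wz, float(alpha))
        val = alpha * i_y + (1.0 - alpha) * i_z      # outer objective to be minimized
        if val < best_val:
            best_alpha, best_val = float(alpha), float(val)
    return best_alpha, best_val
```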
6.2. J Version Outer Bound
As mentioned earlier, the best general outer bound is the J version outer bound proposed in [6]. However, the evaluation of this outer bound turns out to be even harder, as there are additional constraints on the free variables and the auxiliary channel with the joint distribution
These constraints are presented in Equations (18a)–(18c) and (19a)–(19c) in [6]. Taking Equations (18a) and (19a) as an example,
Direct application of Equation (7) does not yield an updated distribution that is guaranteed to satisfy these constraints; thus, the design of BA algorithms for the J version outer bound should carefully address this kind of problem. We leave this for future research.
Finally, to conclude our paper, the extension of the BA algorithm to inner and outer bounds for general broadcast channels encounters max–min problems. We have shown that the max–min order can be changed to min–max. Based on this observation, we have designed BA algorithms for the maximization parts and gradient descent algorithms for the minimization parts, then performed convergence analysis and numerical experiments to support our analysis. We have compared our algorithms to the following generic non-convex algorithms: interior-point, active-set, and sequential quadratic programming. The results show that our algorithms are both effective and efficient.
Appendix A. Proofs of Results in Section 1
Proof of Theorem 1.
For the first property, for a fixed q, the concavity is clear, as the objective is a non-negative combination of terms that are concave in Q. The equality in Equation (6) is easy to show: if we take Q to be Q^*[q], then the function J in Equation (2) reduces to the mutual information in Equation (1). The maximum can be proved using the Kullback–Leibler divergence:
For the second property, we can reformulate as
The concavity holds, as the first term is concave and the second term is linear. To find the maximum , because there is a constraint , we consider the derivative of the Lagrangian:
This implies that for all x. The common term can be eliminated by normalization, as depicted in Equation (7). We can then verify the function value as follows:
where is due to Equation (7). The second equality is again per Equation (7). □
Proof of Theorem 2.
The basic idea is to show that the sum of the decreasing, positive differences is bounded.
The monotonicity is clear, as we are performing alternating maximization; thus,
The positiveness of is because .
Let achieve the maximum of ; then,
where is due to Equation (8) and holds according to Equation (9). Now, we can bound the sum as follows:
The last term is finite, as . This implies . □
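For orientation, the classical form of this telescoping bound, written in the notation of Section 2.1, is

C - J\big(q^{(n)}, Q^{(n)}\big) \le D\big(q^{\star} \,\|\, q^{(n-1)}\big) - D\big(q^{\star} \,\|\, q^{(n)}\big), \qquad \sum_{n \ge 1} \Big( C - J\big(q^{(n)}, Q^{(n)}\big) \Big) \le D\big(q^{\star} \,\|\, q^{(0)}\big) < \infty,

where q^{\star} is a capacity-achieving input distribution; since the differences C - J(q^{(n)}, Q^{(n)}) are non-negative and summable, they converge to zero.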
Appendix B. Proofs of Results in Section 3
Proof of Lemma 1.
The equivalence is already stated without proof in Sections 5.3 and 5.6 of [29]. We present the proof here for completeness.
It is clear that . Thus, we only need to prove . It suffices to show that the corner points of lie inside .
Because is a trapezoid with one corner point , there are at most three nontrivial corner points, as follows:
Lower right , where . Because , we have .
Upper left , where . Clearly, .
Upper right ; this corner point exists when . We can consider two cases:
- (a) : the corner point is inside .
- (b) : the corner point is . It suffices to consider , as otherwise this point does not exist. Now, let be such that . We can construct a pair such that this corner point is inside . Considering the random variables , when and when , we can let , ; then, . Now, we have and ; hence, .
The proof is finished. □
Proof of Theorem 3.
We can use in Lemma 1 to compute the supporting hyperplanes.
When , then is bounded from above by
On the other hand, this upper bound is achieved by setting in .
When , we first have
To show that the maximum and minimum can be exchanged, we use Lemma 2; in particular, letting , , and , we have
Then, the objective function equals . It remains to prove that is convex. Assuming that for some and that for some while letting , where and if and if , we then have
Similar inequalities hold for . This proves the convexity, and hence the exchange.
Now, we can show that the expression of F can be simplified for and . Noting that and , we have
When , i.e., , the last two terms above have a non-positive sum. The maximum value equals zero, and can be achieved by taking . This finishes the proof of the expression.
The cardinality can be proved as follows:
where f and g are some continuous functions corresponding to the mutual information. Subject to fixed marginal , the maximum of over all feasible and is the upper concave envelope of the function g evaluated at . Notice that as the degree of freedom of the distribution is , it suffices to consider for evaluating the envelope. □
Proof of Theorem 4.
For fixed , in the pentagon of the UVOB there are at most two corner points in the first quadrant, namely, the upper left and lower right ones. The line connecting these two points has slope . We need to compute the supporting hyperplane value , where .
For the case , note that the slope of the line is ; thus, it suffices to consider the upper left corner point. The expression of this point is different when falls into one of the following two sets:
When , this corner point and the corresponding expression of are
Otherwise, when , this corner point and corresponding expression of are
Now, the supporting hyperplane value is
We want to show that
For the left-hand side term, we have
where the equalities hold with and this choice is in . For the other term,
where the equalities hold using the same settings as above. Hence, we have the supporting hyperplane value
We can simplify this expression as follows:
Letting , , we have , , and the supporting hyperplane value is
Notice that the range of is . Within this range, the reverse mapping from to is
Notice that when , we have ; we can use similar reasoning as above (by swapping Y and Z, and , U and V, and , and finally and ) to obtain
The constraint then becomes , for which the range is again . Thus, the expression and the constraints are the same as for the case where . Putting these two cases together, we have the characterization of the supporting hyperplanes.
To exchange the max–min, we again use Lemma 2. The proof is similar to that of Theorem 3, and as such we omit the details, providing only the setting for , where : and the functions are
The proof of the cardinality bounds is similar to that of Theorem 3. □
Appendix C. Proofs of Results in Section 4
Proof of Lemma 5.
Let A represent a generic random variable. The gradient of can be calculated as follows:
For the particular term in , the gradient is
Now, we can calculate the first-order term in Equation (41):
Finally, the left-hand side of Equation (41) equals
For the other terms in Equation (16), we can perform similar calculations to obtain the desired inequality. □
Proof of Lemma 6.
Consider the superlevel set of the function . According to Equation (6), for all q; thus, . From Equation (6), , we have ; further, because is concave in q, is a convex set, and as such is connected. This implies that . Because makes the function value larger than that of , it must lie in , and as such in . □
Proof of Theorem 5.
The proof is similar to that of Theorem 2. We perform the following manipulations:
where holds from Equation (8), is due to
and is from Lemma 5. This implies that
The last term is finite and positive, as . Finally, the sum of the positive terms is finite, which implies that the terms converge to zero, i.e., . □
Proof of Proposition 2.
Let and be
where is as in Equation (18) with . According to Equations (7) and (8),
Noting that , we estimate the difference as follows:
where holds from Equation (8) and from the fact that .
According to the definition of in Equation (19),
The expectation of this difference is
where is due to Equation (22). The proof is finished. □
Author Contributions
Methodology and analysis, Y.G.; software and visualization, Y.D. and Y.L.; validation, X.N., B.B. and W.H.; writing—original draft preparation, Y.G.; writing—review and editing, X.N., B.B. and W.H. All authors have read and agreed to the published version of the manuscript.
Institutional Review Board Statement
Not applicable.
Data Availability Statement
The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.
Conflicts of Interest
Yanqing Liu is employed by Bank of China. Xueyan Niu, Bo Bai, Wei Han are employed by Huawei Tech. Co., Ltd. All authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Funding Statement
This research was funded in part by the National Key R&D Program of China (Grant No. 2021YFA1000500) and in part by Huawei Tech. Co., Ltd.
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
References
- 1. Cover T. Broadcast Channels. IEEE Trans. Inform. Theory. 1972;18:2–14. doi: 10.1109/TIT.1972.1054727.
- 2. Bergmans P. Random coding theorem for broadcast channels with degraded components. IEEE Trans. Inform. Theory. 1973;19:197–207. doi: 10.1109/TIT.1973.1054980.
- 3. Körner J., Marton K. Comparison of two noisy channels. In: Csiszár I., Elias P., editors. Topics in Information Theory. North-Holland; Amsterdam, The Netherlands: 1977. pp. 411–423.
- 4. Marton K. A coding theorem for the discrete memoryless broadcast channel. IEEE Trans. Inform. Theory. 1979;25:306–311. doi: 10.1109/TIT.1979.1056046.
- 5. Nair C., El Gamal A. An outer bound to the capacity region of the broadcast channel. IEEE Trans. Inform. Theory. 2007;53:350–355. doi: 10.1109/TIT.2006.887492.
- 6. Gohari A., Nair C. Outer bounds for multiuser settings: The auxiliary receiver approach. IEEE Trans. Inform. Theory. 2022;68:701–736. doi: 10.1109/TIT.2021.3128136.
- 7. Liu Y., Geng Y. Blahut-Arimoto Algorithms for Computing Capacity Bounds of Broadcast Channels; Proceedings of the IEEE International Symposium on Information Theory; Espoo, Finland. 26 June–1 July 2022; pp. 1145–1150.
- 8. Liu Y., Dou Y., Geng Y. Blahut-Arimoto Algorithm for Marton’s Inner Bound; Proceedings of the IEEE International Symposium on Information Theory; Taipei, Taiwan. 25–30 June 2023; pp. 2159–2164.
- 9. Calvo E., Palomar D.P., Fonollosa J.R., Vidal J. The computation of the capacity region of the discrete degraded BC is a nonconvex DC problem; Proceedings of the IEEE International Symposium on Information Theory; Toronto, ON, Canada. 6–11 July 2008; pp. 1721–1725.
- 10. Nocedal J., Wright S.J. Numerical Optimization. 2nd ed. Springer; New York, NY, USA: 2006.
- 11. Wilson R.B. A Simplicial Algorithm for Concave Programming. Ph.D. Thesis. Graduate School of Business Administration, Harvard University; Cambridge, MA, USA: 1963.
- 12. Blahut R. Computation of channel capacity and rate distortion functions. IEEE Trans. Inform. Theory. 1972;18:460–473. doi: 10.1109/TIT.1972.1054855.
- 13. Arimoto S. An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Trans. Inform. Theory. 1972;18:14–20. doi: 10.1109/TIT.1972.1054753.
- 14. Rezaeian M., Grant A. Computation of Total Capacity for Discrete Memoryless Multiple-Access Channels. IEEE Trans. Inform. Theory. 2004;50:2779–2784. doi: 10.1109/TIT.2004.836661.
- 15. Calvo E., Palomar D.P., Fonollosa J.R., Vidal J. On the Computation of the Capacity Region of the Discrete MAC. IEEE Trans. Commun. 2010;58:3512–3525. doi: 10.1109/TCOMM.2010.091710.090239.
- 16. Yasui K., Matsushima T. Toward Computing the Capacity Region of Degraded Broadcast Channel; Proceedings of the IEEE International Symposium on Information Theory; Austin, TX, USA. 13–18 June 2010; pp. 570–574.
- 17. Dupuis F., Yu W., Willems F.M.J. Blahut-Arimoto algorithms for computing channel capacity and rate-distortion with side information; Proceedings of the IEEE International Symposium on Information Theory; Chicago, IL, USA. 27 June–2 July 2004; p. 179. Available online: https://www.comm.utoronto.ca/~weiyu/ab_isit04.pdf (accessed on 8 January 2024).
- 18. Gohari A., El Gamal A., Anantharam V. On Marton’s Inner Bound for the General Broadcast Channel. IEEE Trans. Inform. Theory. 2014;60:3748–3762. doi: 10.1109/TIT.2014.2321384.
- 19. Geng Y., Gohari A., Nair C., Yu Y. On Marton’s Inner Bound and Its Optimality for Classes of Product Broadcast Channels. IEEE Trans. Inform. Theory. 2014;60:22–41. doi: 10.1109/TIT.2013.2285925.
- 20. Anantharam V., Gohari A., Nair C. On the Evaluation of Marton’s Inner Bound for Two-Receiver Broadcast Channels. IEEE Trans. Inform. Theory. 2019;65:1361–1371. doi: 10.1109/TIT.2018.2880241.
- 21. Geng Y. Single-Letterization of Supporting Hyperplanes to Outer Bounds for Broadcast Channels; Proceedings of the IEEE/CIC International Conference on Communications in China; Changchun, China. 11–13 August 2019; pp. 70–74.
- 22. Gowtham K.R., Thangaraj A. Computation of secrecy capacity for more-capable channel pairs; Proceedings of the IEEE International Symposium on Information Theory; Toronto, ON, Canada. 6–11 July 2008; pp. 529–533.
- 23. Hajek B.E., Pursley M.B. Evaluation of an achievable rate region for the broadcast channel. IEEE Trans. Inform. Theory. 1979;25:36–46. doi: 10.1109/TIT.1979.1055989.
- 24. Cover T.M. An achievable rate region for the broadcast channel. IEEE Trans. Inform. Theory. 1975;21:399–404. doi: 10.1109/TIT.1975.1055418.
- 25. van der Meulen E.C. Random coding theorems for the general discrete memoryless broadcast channel. IEEE Trans. Inform. Theory. 1975;21:180–190. doi: 10.1109/TIT.1975.1055347.
- 26. Nair C., Wang Z.V. On the inner and outer bounds for 2-receiver discrete memoryless broadcast channels; Proceedings of the Information Theory and Applications Workshop; San Diego, CA, USA. 27 January–1 February 2008; pp. 226–229.
- 27. Gohari A., Anantharam V. Evaluation of Marton’s inner bound for the general broadcast channel; Proceedings of the IEEE International Symposium on Information Theory; Seoul, Republic of Korea. 28 June–3 July 2009; pp. 2462–2466.
- 28. Jog V., Nair C. An information inequality for the BSSC channel; Proceedings of the Information Theory and Applications Workshop; La Jolla, CA, USA. 31 January–5 February 2010; pp. 1–8.
- 29. El Gamal A., Kim Y.-H. Network Information Theory. Cambridge University Press; Cambridge, UK: 2011.