Skip to main content
Entropy logoLink to Entropy
. 2020 Feb 16;22(2):222. doi: 10.3390/e22020222

Computing Classical-Quantum Channel Capacity Using Blahut–Arimoto Type Algorithm: A Theoretical and Numerical Analysis

Haobo Li 1,2,3,*, Ning Cai 1,*
PMCID: PMC7516652  PMID: 33285996

Abstract

Based on Arimoto’s work in 1972, we propose an iterative algorithm for computing the capacity of a discrete memoryless classical-quantum channel with a finite input alphabet and a finite dimensional output, which we call the Blahut–Arimoto algorithm for classical-quantum channel, and an input cost constraint is considered. We show that, to reach ε accuracy, the iteration complexity of the algorithm is upper bounded by lognlogεε where n is the size of the input alphabet. In particular, when the output state {ρx}xX is linearly independent in complex matrix space, the algorithm has a geometric convergence. We also show that the algorithm reaches an ε accurate solution with a complexity of O(m3lognlogεε), and O(m3logεlog(1δ)εD(p*||pN0)) in the special case, where m is the output dimension, D(p*||pN0) is the relative entropy of two distributions, and δ is a positive number. Numerical experiments were performed and an approximate solution for the binary two-dimensional case was analysed.

Keywords: capacity, classical-quantum channel, Blahut–Arimoto type algorithm, convergence speed

1. Introduction

The computation of channel capacity has always been a core problem in information theory. The very well-known Blahut-Arimoto algorithm [1,2] was proposed in 1972 to compute the discrete memoryless classical channel. Inspired by this algorithm, we propose an algorithm of Blahut-Arimoto type to compute the capacity of discrete memoryless classical-quantum channel. The classical-quantum channel [3] can be considered as a mapping xρx of an input alphabet X={1,2,,|X|} to a set of quantum states in a finite dimensional Hilbert space H. The state of a quantum system is given by a density operator ρ, which is a positive semi-definite operator with trace equal to one. Let Dm denote the set of all density operators acting on a Hilbert space H of dimension m. If the source emits a letter x with probability px, the output would be ρx, thus the output would form an ensemble: {px:ρx}xX.

In 1998, Holevo showed [4] that the classical capacity of the classical-quantum channel is the maximization of a quantity called the Holevo information over all input distributions. The Holevo information χ of an ensemble {px:ρx}xX is defined as

χ({px:ρx}xX)=H(xpxρx)xpxH(ρx), (1)

where H(·) is the von Neumann entropy, which is defined on positive semidefinite matrices:

H(ρ)=Tr(ρlogρ). (2)

Due to the concavity of von Neumann entropy [5], the Holevo information is always non-negative. The Holevo quantity is concave in the input distribution [5], and thus the maximization of Equation (1) over p is a convex optimization problem. However, it is not a straightforward convex optimization problem. In 2014, Sutter et al. [6] promoted an algorithm based on duality of convex programming and smoothing techniques [7] with a complexity of O((nm)m3(logn)1/2ε), where nm=max{n,m}.

For discrete memoryless classical channels, the capacity can be computed efficiently by using an algorithm called Blahut–Arimoto (BA) algorithm [1,2,8]. In 1998, H. Nagaoka [9] proposed a quantum version of BA algorithm. In his work, he considered the quantum-quantum channel and this problem was proved to be NP-complete in 2008 [10]. Despite the NP-completeness, in [11], an example is given of a qubit quantum channel which requires four inputs to maximize the Holevo capacity. Further research of Nagaoka’s algorithm was presented in [12], where the algorithm was implemented to check the additivity of quantum channels. In [9], Nagaoka mentioned an algorithm concerning classical-quantum channel; however, its speed of convergence was not studied there and the details of the proof were not presented either. In this paper, we show that, with proper manipulations, the BA algorithm can be applied to computing the capacity of classical-quantum channel with an input constraint efficiently. The remainder of this article is structured as follows. In Section 2, we propose the algorithm and show how the algorithm works. In Section 3, we provide the convergence analysis of the algorithm. In Section 4, we show the numerical experiments of BA algorithm to see how well this algorithm performs. In Section 5, we propose an approximate solution for a special case, which is the binary input, two-dimensional output case.

Notations: The logarithm with basis 2 is denoted by log(·). The space of all Hermitian operators of dimension m is denoted by Hm. The set of all density matrices of dimension m is denoted by Dm{ρHm:ρ0,Trρ=1}. Each letter xX is mapped to a density matrix ρx, thus the classical-quantum channel can be represented as a set of density matrices {ρx}xX. The set of all probability distributions of length n is denoted by Δn{pRn:px0,x=1npx=1}. The von Neumann entropy of a density matrix ρ is denoted by H(ρ)=Tr[ρlogρ]. The relative entropy between p,qΔn, if supp(p)supp(q), is denoted by D(p||q)=xpx(logpxlogqx) and + otherwise. The relative entropy between ρ,σDm, if supp(ρ)supp(σ), is denoted by D(ρ||σ)=Tr[ρ(logρlogσ)] and + otherwise.

2. Blahut–Arimoto Algorithm for Classical-Quantum Channel

First, we write down the primal optimization problem:

Primal:maxpH(xpxρx)xpxH(ρx),subjecttosTpS;pΔn, (3)

where ρxDm, sRn is a positive real vector, S>0. We denote the maximal value of Equation (3) as C(S). In this optimization problem, we are to maximize the Holevo quantity with respect to the input distribution {px}xX. Practically, the preparation of different signal state x has different cost, which is represented by s=(s1,s2,,sn). We would like to bound the expected cost of the resource within some quantity, which is represented by the inequality constraint in Equation (3).

Lemma 1.

[6] Let a set G be defined as G:=argmaxpΔnχ({px:ρx}xX) and Smax:=minpGsTp. Then, if SSmax, the inequality constraint in the primal problem is inactive; and, if S<Smax, the inequality constraint in the primal problem is equivalent to sTp=S.

Now, we assume that min{sx}xXSSmax. The Lagrange dual problem of Equation (3) is

Dual:minλ0maxpH(xpxρx)xpxH(ρx)λ(sTpS)subjecttopΔn. (4)

Lemma 2.

Strong duality holds between Equations (3) and (4).

Proof. 

The lemma follows from standard strong duality result of convex optimization theory ([13], Chapter 5.2.3). □

Define functions. Let

fλ(p,p)=xTr{pxρx[log(pxρx)log(pxρ)]}λsTp, (5)
F(λ)=maxpmaxpf(p,p). (6)

where ρ=xpxρx.

Lemma 3.

For fixed p, argmaxpfλ(p,p)=p.

Proof. 

Actually, we can prove a stronger lemma (the following lemma was proposed in [9], but no proof was given, perhaps due to the space limitation). We now restate the lemma in [9] and give the proof. □

Lemma 4.

For fixed {px:ρx}xX, we have

max{qx:σx}xXD(p||q)+xpxTr{ρx[logσxlogσ]}=xpxTr{ρx[logρxlogρ]},i.e.,argmax{qx:σx}xXD(p||q)+xpxTr{ρx[logσxlogσ]}={px:ρx}xX, (7)

where p,qΔn,σxDm and ρ=xpxρx,σ=xqxσx.

Proof. 

Considering Equation (7), we have

RHSLHS=D(p||q)+xpxD(ρx||σx)D(ρ||σ)=D(ρXB||σXB)D(ρ||σ),

where ρXB=xpx|xx|Xρx and σXB=xqx|xx|Xσx are classical-quantum state [5]. Let the quantum channel N be the partial trace channel on X system; then, by the monotonicity of quantum relative entropy ([5], Theorem 11.8.1), we have

D(ρXB||σXB)D(N(ρXB)||N(σXB))=D(ρ||σ).

Notice that, if we let σx=ρx in Equation (7), with some calculation, Equation (7) becomes Lemma 3. Thus, Lemma 3 is a straightforward corollary of Lemma 4

Theorem 1.

The dual problem in Equation (4) is equivalent to

minλ0F(λ)+λS. (8)

Proof. 

It follows from Equation (5) and Lemma 3 that

maxpfλ(p,p)=fλ(p,p)=H(ρ)xpxH(ρx)λsTp.

Hence,

minλ0maxpH(ρ)xpxH(ρx)λ(sTpS)=minλ0maxpmaxpfλ(p,p)+λS=minλ0F(λ)+λS.

The BA algorithm is an alternating optimization algorithm, i.e., to optimize fλ(p,p), each iteration step would fix one variable and optimize the function over the other variable. Now, we use BA algorithm to find F(λ). The iteration procedure is

px0>0,pxt=pxt,pt+1=argmaxpxTr{pxρx[log(pxtρx)log(pxρt)]}λsTp,

where ρt=xpxtρx.

To get pt+1, we can use the Lagrange function:

L=xTr{pxρx[log(pxtρx)log(pxρt)]}λsTpν(xpx1),

setting the gradient with respect to px to zero. By combining the normalization condition, we can have (taking the natural logarithm for convenience):

pxt+1=rxtxrxt, (9)
whererxt=exp(Tr{ρx[log(pxtρx)logρt]}sxλ),ρt=xpxtρx. (10)

Thus, we can summarize the Algorithm 1 below.

Algorithm 1 Blahut–Arimoto algorithm for discrete memoryless classical-quantum channel.
  • set px0=1|X|, xX;

  • repeat

  •     pxt=pxt;

  •     pxt+1=rxtxrxt, where rxt=exp(Tr{ρx[log(pxtρx)logρt]}sxλ);   

  • until convergence.

Lemma 5.

Let p*(λ)=argmaxpf(p,p) for a given λ; then, sTp*(λ) is a decreasing function of λ.

Proof. 

For convenience, we denote χ({px:ρx}xX) as χ(p). Notice that fλ(p,p)=χ(p)λsTp by definition of f(p,p).

For λ1<λ2, if p*(λ1)s<sTp*(λ2), then, by the definition of p*(λ), we have:

χ(p*(λ1))λ1sTp*(λ1)χ(p*(λ2))λ1sTp*(λ2)χ(p*(λ2))χ(p*(λ1))λ1sT(p*(λ2)p*(λ1))<λ2sT(p*(λ2)p*(λ1))χ(p*(λ1))λ2sTp*(λ1)>χ(p*(λ2))λ2sTp*(λ2),

which is a contradiction to the fact that p*(λ2) is an optimizer of χ(p)λ2sTp. Thus, we must have sTp*(λ1)sTp*(λ2) if λ1<λ2. □

We do not need to solve the optimization problem in Equation (8), because from Lemma 1 we can see that the statement “p* is an optimal solution” is equivalent to “sTp*=S and p* maximizes fλ(p,p)+λS=χ({px,ρx}xX)λ(sTpS)”, which is also equivalent to “sTp*=S and p* maximizes fλ(p,p)", thus, if for some λ0, a p maximizes fλ(p,p) and sTp=S, then the capacity C(S)=F(λ)+λS, and such λ is easy to find since sTp is a decreasing function of λ, and, to reach an ε accuracy, we need

O(logε) (11)

steps using bisection method.

3. Convergence Analysis

Next, we show that the algorithm indeed converges to F(λ) and then provide an analysis of the speed of the convergence.

3.1. The Convergence Is Guaranteed

Corollary 1.

fλ(pt+1,pt)=log(xrxt).

Proof. 

fλ(pt+1,pt)=xTr{pxt+1ρxlogpxt+1}+xTr{pxt+1ρx[log(pxtρx)log(ρt)]}λsTpt+1=xpxt+1logpxt+1+xpxt+1log(rxt)=xpxt+1log(rxtpxt+1)=log(xrxt).

The first equality comes from a manipulation of Equation (5). The second equality follows from Equation (10). The last equality follows from Equation (9). □

Corollary 2.

For arbitrary distribution {px}xX, we have

χ({px,ρx}xX)λsTpf(pt+1,pt)xpxlog(pxt+1pxt(x)).

Proof. 

Define ρ=xpxρx. Then, we have

xpxlog(pxt+1pxt)=xpxlog(1pxtrxtxrxt)=fλ(pt+1,pt)+xpxlogrxtpxt=fλ(pt+1,pt)+xpxTr{ρx[log(pxtρx)logρt]sxλlogpxt}=fλ(pt+1,pt)+xpxTr{ρx[logρxlogρt]}λsTp=fλ(pt+1,pt)+xpxTr{ρx[logρxlogρ+logρlogρt]}λsTp=fλ(pt+1,pt)+χ({px,ρx}xX)λsTp+D(ρ||ρt). (12)

The first equality follows from Equation (9). The second equality follows from Corollary 1. The third equality follows from Equation (10). The last equality follows from Equation (1). Since the relative entropy D(ρX||ρt) is always non-negative [5], we have

χ({px,ρx}xX)λsTpfλ(pt+1,pt)xpxlog(pxt+1pxt(x)).

Theorem 2.

fλ(pt+1,pt) converges to F(λ) as t.

Proof. 

Let p* be an optimal solution that achieves F(λ); then, we have the following inequality

t=0N[F(λ)fλ(pt+1,pt)]=t=0N[χ({px*,ρx}xX)λsTp*fλ(pt+1,pt)]t=0Nxpx*log(pxt+1pxt)=xpx*t=0Nlog(pxt+1pxt)=xpx*log(pxN+1px0)=xpx*log(px*px0)+xpx*log(pxN+1p*(x))=D(p*||p0)D(p*||pN+1) (13)
D(p*||p0). (14)

The first equality follows from Equations (5), (6), and (1). The first inequality follows from Corollary 2. The last inequality follows from the non-negativity of relative entropy.

Thus, let N and with F(λ)fλ(pt+1,pt)0, we have

0t=0[F(λ)fλ(pt+1,pt)]D(p*||p0), (15)

Notice we take the initial p0 to be uniform distribution, so the right hand side of Equation (15) is finite. Combine with the fact that fλ(pt+1,pt) is a non-decreasing sequence, this means fλ(pt+1,pt) converges to F(λ). □

Theorem 3.

The probability distribution {pt}t=0 also converges.

Proof. 

Remove the summation over t in Equations (13) and (14); then, we have

0F(λ)fλ(pt+1,pt)xpx*log(pxt+1pxt)=D(p*||pt)D(p*||pt+1). (16)

Now that the sequence {pt}t=0 is a bounded sequence, there exists a subsequence {ptk}k=0 that converges. Let us say it converges to p¯. Then, clearly, we have f(p¯,p¯)=F(λ) (or f(pt+1,pt) would not converge). Substituting p*=p¯ in Equation (16), we have

0D(p¯||pt)D(p¯||pt+1).

Thus, the sequence {D(p¯||pt)}t=0 is a decreasing sequence and there exists a subsequence {D(p¯||ptk)}k=0 that converges to zero. Therefore, we can conclude that {D(p¯||pt)}t=0 converges to zero, which means {pt}t=0 converges to p¯. □

3.2. The Speed of Convergence

Theorem 4.

To reach ε accuracy to F(λ), the algorithm needs an iteration complexity less than lognε.

Proof. 

From the proof of Theorem 2, we know

t=0N[F(λ)fλ(pt+1,pt)]D(p*||p0)=xpx*log(px*px0)=lognH(p*)<logn,

and [F(λ)fλ(pt+1,pt)] is non-increasing in t, thus

F(λ)fλ(pt+1,pt)<lognt.

Next, we show that in some special cases the algorithm has a better convergence performance, which is a geometric speed of convergence.

Assumption 1.

The channel matrices {ρx}xX are linearly independent, i.e., there does not exist a vector cRn such that

xcxρx=0.

Remark 1.

Assumption 1 is equivalent to: The output state ρ=xpxρx is uniquely determined by the input distribution p.

Theorem 5.

Under Assumption 1, the optimal solution p* is unique.

Proof. 

Notice that the von Neumann entropy in Equation (2) is strictly concave [14], thus, for distributions pp, ρ=xpxρxxpxρx=ρ, which is followed from Assumption refas1. Thus, this means H(ρ) is strictly concave in p. Thus, Holevo quantity in Equation (1) is strictly concave in p, which means the optimal solution p* is unique. □

We also need the following theorem:

Theorem 6.

[15] The relative entropy satisfies

D(ρ||σ)12Tr(ρσ)2.

Now, we state the theorem of convergence:

Theorem 7.

Suppose start from some initial point p0, then, under Assumption 1, the algorithm converges to the optimal point p*, and p0 converges to p* at a geometric speed, i.e., there exist N0 and δ>0, where N and δ are independent, such that, for any t>N0, we have

D(p*||pt)(1δ)tN0D(p*||pN0).

Proof. 

Define dx=px*pxt and the real vector |d=(d1,d2,,dn)T. Using Taylor expansion, we have

D(p*||pt)=xpx*log(px*pxt)=xpx*log(1dxpx*)=12d|P|d+xO(dx3),

where P=diag(p1*,p2*,,pn*). Now, pt converges to p*, i.e., |d converges to zero, thus there exists a N0 such that, for any t>N0, we have

D(p*||pt)23d|P|d. (17)

From Theorem 6, we have

D(ρ*||ρt)12Tr{[xdxρx]2}=12d|M|d, (18)

where MRn×n:

Mij=Tr(ρiρj).

From Equation (18), we know that, under Assumption 1, M is positive definite. Thus, there exists a δ>0 such that

12M>δ23P12d|M|d>δ23d|P|d.

Thus, for any t>N0, it follows from Equations (17) and (18) that

D(ρ*||ρt)δD(p*||pt). (19)

From Equation (12), we know

xpx*log(pxt+1pxt)D(ρ*||ρt)D(p*||pt+1)D(p*||pt)D(ρ*||ρt),

combined with Equation (19), we have

D(p*||pt+1)D(p*||pt)δD(p*||pt)=(1δ)D(p*||pt)D(p*||pt)(1δ)tN0D(p*||pN0) (20)

for any t>N0. □

Remark 2.

(Complexity). Denote n,m as the size of input alphabet and output state dimension, respectively. A closer look at Algorithm 1 reveals that, for each iteration, a matrix logarithm logρt needs to be calculated, and the rest are just multiplication of matrices and multiplication of numbers. The matrix logarithm can be done with complexity O(m3) [16], thus, by Theorem 4 and Equation (11), the complexity to reach ε-close to the true capacity using Algorithm 1 is O(m3lognlogεε). With extra condition of the channel {ρx}xX, which is Assumption 1, the complexity to reach an ε-close solution (i.e., D(p*||pt)<ε) using Algorithm 1 is O(m3logεlog(1δ)εD(p*||pN0)). Usually, we do not need ε to be too small (no smaller than 106), thus, in either case, the complexity is better than O((nm)m3(logn)1/2ε) in [6] when nm is big, where nm=max{n,m}.

4. Numerical Experiments on BA Algorithm

We only performed experiments on BA algorithm with no input constraint (BA algorithm with input constraint is some combination of BA algorithm with no input constraint.) We studied the relations between iteration complexity and n,m (i.e., the input size and output dimension) when the algorithm reaches certain accuracy. Since we do not know the true capacity of a certain channel, we used the following theorem to bound the error of the algorithm.

Theorem 8.

With the iteration procedure in the BA Algorithm 1, maxx{D(ρx||ρt)λsx} converges to F(λ) from above.

Proof. 

Following from Algorithm 1, Corollary 1, and Theorem 3, we have

limtpxt+1pxt=exp[D(ρx||ρ*)λsxF(λ)],

where ρ*=xpx*ρx, p* is an optimal distribution. The limit above is 1 if px*>0 and does not exceed 1 if px*=0. Thus,

D(ρx||ρ*)λsxF(λ)

for every xX, with equality if px*>0. This proves

maxx{D(ρx||ρt)λsx}F(λ).

For any pt and any optimal distribution p*, we have

maxx[D(ρx||ρt)λsx]xpx*[D(ρx||ρt)λsx]=xpx*D(ρx||ρ*)+D(ρ*||ρt)λsTp*=F(λ)+D(ρ*||ρt)F(λ).

The first equality requires some calculation and the second equality follows since p* is an optimal distribution. This means maxx{D(ρx||ρt)λsx} converges to F(λ) from above. □

Thus, our accuracy criterion was: for a given classical-quantum channel, we ran the BA algorithm (with no input constraint), until [maxx{D(ρx||ρt)}f(pt,pt)] was less than 10k, and recorded the number of iteration. At this time, the accuracy was of order 10(k+1) at most since maxx{D(ρx||ρt)} and f(pt,pt)] converged to the true capacity from above and below, respectively.

We performed the following numerical experiments: for given values of input size n, output dimension m and accuracy, we generated 200 classical-quantum channels randomly, recorded the numbers of iterations, and then calculated the average number of iterations of these 200 experiments. The results are shown in Figure 1. Note that the accuracy 10k in Figure 1 means we ran the BA algorithm until [maxx{D(ρx||ρt)}f(pt,pt)] was less than 10k, and the error between the true capacity and the computed value was of order 10(k+1) at most.

Figure 1.

Figure 1

The number of iterations needed to reach certain accuracy.

We can see in Figure 1 that the iteration complexity scales better as accuracy and input dimension increase. We can also see for given input size n and accuracy, the output dimension has vary little influence on iteration complexity, which means the iteration complexity also scales better as the output dimension m increases. Compared with our theoretical analysis of iteration complexity in Theorem 4: to reach ϵ accuracy, we needed lognϵ iterations; the numerical experiments showed that the number of iterations was far smaller than lognϵ to reach ϵ accuracy, whether the output quantum states were independent or not (cases in (n,m)=(6,2),(10,2)). The reason for this is that the inequalities in the proof of Theorem 4 are quite loose. Thus, Theorem 4 only provides a very loose upper bound on iteration complexity. We can also guess that maybe the relation in Equation (20) holds generally and we just cannot prove it yet.

Next, we needed to see the running time of the BA algorithm. There were three methods to compute the classical-quantum channel capacity: BA algorithm, the duality and smoothing technique [6], and the method created by Hamza Fawzi et al. [17]. In [17], a Matlab code package called CvxQuad is provided which accurately approximates the relative entropy function via semidefinite programming so that many quantum capacity quantities can be computed using a convex optimization tool called CVX. Here, we compared the running time of the above three methods. For different input size n and output dimension m, we generated a classical-quantum channel randomly and computed the channel capacity using the above three methods and then recorded the running time of each method. The results are shown in Figure 2.

Figure 2.

Figure 2

The caparison of the running time of three methods.

In Figure 2, we can see the BA algorithm was the fastest method. The duality and smoothing method was rather slow and we did not record the running time of the duality and smoothing method when n=30 because it took too long. We can also notice that the running time of the CvxQuad method was extremely sensitive to the output dimension, which is not a surprise because CVX is a second-order method. Thus, our BA algorithm was significantly faster than the other two methods when n and m became big.

5. An Approximate Solution of p in Binary Two Dimensional Case

In this section, we provide an approximate optimal input distribution for the case of the input size and output dimension are both 2:

{p1:ρ1;p2:ρ2},p1+p2=1,ρ1,ρ2D2.

5.1. Use Bloch Sphere to Get an Approximate Solution

Any two-dimensional density matrix can be represented as a point in the Bloch sphere [5], as shown in the following:

Any density matrix can be represented as a vector in the Bloch sphere starting from the origin. Suppose ρ1,ρ2 can be represented as r1,r2 respectively, as shown in Figure 3; then, the two eigenvalues would be 0.5±r1/2 and 0.5±r2/2, respectively. Extending r1, we get two intersections on the surface of the Bloch sphere; then, these two intersections represent the two eigenvectors of ρ1 (the points on the surface of the sphere represent pure state and the interior points represent mixed states). A probabilistic combination of ρ1,ρ2 can be represented as p1ρ1+p2ρ2=p1r1+p2r2 ([5] Exercise 4.4.13). Any point on the surface of Bloch sphere can be represented as

cosα2|0+sinα2eiϕ|1,

where α is the angle to the Z-axis and ϕ is the angle of the X-axis to the projection of the point on the XY plane.

Figure 3.

Figure 3

Bloch sphere.

By symmetry, it is obvious that the Holevo quantity is only related to r1,r2,θ,p1, where θ is the angle between r1,r2. One interesting result is that the angle θ has very little influence on p*, where p* is the optimal distribution that maximizes Holevo quantity. If we know λ1,λ2 (the bigger eigenvalues of ρ1,ρ2, respectively), θ and p1, then the Holevo quantity can be written as

χ(λ1,λ2,θ,p1)=S(12+12||p1r1+(1p1)r2||2)[p1S(12+12r1)+(1p1)S(12+12r2)], (21)

where S(·) is the binary entropy (S(x)=(xlogx+(1x)log(1x)) and ri=2λi1.

Using Cosine Theorem to calculate ||p1r1+(1p1)r2||2), the gradient of χ with respect to p1 can be calculated directly, denoted as

p1χ(λ1,λ2,θ,p1).

If we can find a p1 such that p1χ(λ1,λ2,θ,p1)=0, then this p1 is the optimal solution (because χ(λ1,λ2,θ,p1) is concave in p1). However, we cannot solve the equation p1χ(λ1,λ2,θ,p1)=0 with respect to p1 when θ0. Now that θ has little influence on p*, let θ=0 (this is actually the classical case), and let

p1χ(λ1,λ2,θ=0,p1)=0,

the above equation is easy to solve and we get a solution p^1:

p^1=1c1+cr2r1r2,wherec=2S(λ1)S(λ2)(r1r2)/2, (22)

where we assume r1r2. (It can be easily seen from the Bloch sphere that if r1=r2, the optimal distribution would be {12,12}.)

This p^1 can be used as an approximate optimal solution for all θ[0,π]. Next, we need numerical experiments to see how accurate p^1 is.

5.2. Numerical Experiments on the Approximated Solution p^1

It is obvious that the maximum of Holevo quantity only depends on r1,r2 and θ, thus, without loss of generality, we let ρ1 be on the Z-axis and ρ2 be on the XZ plane:

ρ1=λ1|00|+(1λ1)|11|;ρ2=λ2|ψ0ψ0|+(1λ2)|ψ1ψ1|,

where

|ψ0=cosθ2|0+sinθ2|1;|ψ1=sinθ2|0+cosθ2|1,

which means the angle between ρ1 and ρ2 (i.e., r1 and r2) is θ [5].

In the numerical experiments, we let λ1,λ2 range from 0.5 to 1, and θ range from 0 to π. For each value of (λ1,λ2,θ), we substituted (λ1,λ2) into Equation (22) to compute p^1. Then, we substituted (λ1,λ2,θ,p^1) into Equation (21) to get the approximate maximum of Holevo quantity over p1: χ(λ1,λ2,θ,p^1). To see how accurate this approximate maximum is, we need a BA algorithm to provide an accurate maximum. The termination criterion for the iteration process of BA algorithm is stopping when [maxx{D(ρx||ρt)}f(pt,pt)] is less than 106; then, the BA algorithm outputs a value of Holevo quantity χBA(λ1,λ2,θ). We can compute the error of χ(λ1,λ2,θ,p^1) then take the maximum over θ[0,π]

Error(λ1,λ2)=maxθ[0,π]|χ(λ1,λ2,θ,p^1)χBA(λ1,λ2,θ)|.

Figure 4 is the numerical result, which is a plot of (λ1,λ2,Error(λ1,λ2)).

Figure 4.

Figure 4

Error of the approximated method.

In Figure 4, we can see that, if λ1,λ2 are not “too big", the error can be upper bounded by 103. To see this more directly, we take the maximum of Error(λ1,λ2) for different ranges of λ1,λ2:

maxλ1,λ2[0.5,R]Error(λ1,λ2).

Figure 5 is a plot of (R,maxλ1,λ2[0.5,R]Error(λ1,λ2)).

Figure 5.

Figure 5

Error of the approximate method.

In Figure 5, we can see that, if λ1,λ2<0.95, the error of approximate maximum of Holevo quantity can be upper bounded by 3×104. Thus, we can conclude that, when the bigger eigenvalues of ρ1,ρ2 are not too big (no bigger than 0.95), Equation (22) can make the error of the maximum of Holevo quantity smaller than 3×104.

The approximate solution is an interesting phenomenon. The reason the angle θ has such little influence on the maximum of Holevo quantity is unclear.

6. Discussion

In this paper, we provide an algorithm which computes the capacity of classical-quantum channel. We analyzed the speed of convergence theoretically and numerical experiments showed that our algorithm outperforms the existing methods [6,17]. We also provide an approximated method to compute the capacity of binary two-dimensional classical-quantum channel, which shows high accuracy. As mentioned in the Introduction, for a general quantum-quantum channel, maximizing Holevo quantity with respect to both the input distribution and output quantum state is NP-complete and this is not a convex optimization problem because Holevo quantity is concave with respect to input distribution and convex with respect to output quantum states. Thus, it remains open whether there exists an efficient algorithm to solve this problem. However, for classical-quantum channel, Holevo quantity has an upper bound, thus one future work would be to maximize Holevo quantity with respect to output states with given input distribution. It remains open if alternating optimization algorithms, in particular of Blahut–Arimoto type, can also be given for other optimization problems in terms of quantum entropy.

The authors declare no conflict of interest.

Author Contributions

Formal analysis, H.L., N.C.; Investigation, H.L.; Project administration, H.L.; Supervision, N.C.; Writing—original draft, H.L.; Writing—review & editing, N.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  • 1.Arimoto S. An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Trans. Inf. Theory. 1972;18:14–20. doi: 10.1109/TIT.1972.1054753. [DOI] [Google Scholar]
  • 2.Blahut R. Computation of channel capacity and rate-distortion functions. IEEE Trans. Inf. Theory. 1972;18:460–473. doi: 10.1109/TIT.1972.1054855. [DOI] [Google Scholar]
  • 3.Holevo A. Problems in the mathematical theory of quantum communication channels. Rep. Math. Phys. 1977;12:273–278. doi: 10.1016/0034-4877(77)90010-6. [DOI] [Google Scholar]
  • 4.Holevo A.S. The capacity of the quantum channel with general signal states. IEEE Trans. Inf. Theory. 1998;44:269–273. doi: 10.1109/18.651037. [DOI] [Google Scholar]
  • 5.Wilde M. From Classical to Quantum Shannon Theory. Cambridge University Press; Cambridge, UK: 2016. [Google Scholar]
  • 6.Sutter D., Sutter T., Esfahani P.M., Renner R. Efficient Approximation of Quantum Channel Capacities. IEEE Trans. Inf. Theory. 2016;62:578–598. doi: 10.1109/TIT.2015.2503755. [DOI] [Google Scholar]
  • 7.Nesterov Y. Smooth minimization of non-smooth functions. Math. Program. 2005;103:127–152. doi: 10.1007/s10107-004-0552-5. [DOI] [Google Scholar]
  • 8.Yeung R.W. Information Theory and Network Coding. Springer; Berlin/Heidelberger, Germany: 2008. [Google Scholar]
  • 9.Nagaoka H. Algorithms of Arimoto-Blahut type for computing quantum channel capacity; Proceedings of the IEEE International Symposium on Information Theory (ISIT); Cambridge, MA, USA. 16–21 August 1998; p. 354. [Google Scholar]
  • 10.Beigi S., Shor P.W. On the Complexity of Computing Zero-Error and Holevo Capacity of Quantum Channels. arXiv. 20070709.2090 [Google Scholar]
  • 11.Osawa S., Nagaoka H. Numerical Experiments on the Capacity of Quantum Channel with Entangled Input States. IEICE TRANSACTIONS Fundam. Electron. Commun. Comput. 2001;E84-A:2583–2590. [Google Scholar]
  • 12.Hayashi M., Imai H., Matsumoto K., Ruskai M.B., Shimono T. Qubit Channels Which Require Four Inputs to Achieve Capacity: Implications for Additivity Conjectures. Quantum Inf. Comput. 2005;5:13–31. [Google Scholar]
  • 13.Boyd S., Vandenberghe L. Convex Optimization. Cambridge University Press; Cambridge, UK: 2004. [Google Scholar]
  • 14.Bengtsson I., Zyczkowski K. GEOMETRY OF QUANTUM STATES: An Introduction to Quantum Entanglement. Cambridge University Press; Cambridge, UK: 2006. [Google Scholar]
  • 15.Hiai F., Petz D. Introduction to Matrix Analysis and Applications. Springer; Berlin/Heidelberger, Germany: 2014. [Google Scholar]
  • 16.Moler C., Van Loan C. Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later. SIAM Rev. 2003;45:3–49. doi: 10.1137/S00361445024180. [DOI] [Google Scholar]
  • 17.Hamza Fawzi O.F. Efficient optimization of the quantum relative entropy. J. Phys. A Math. Theor. 2018;51:409601. doi: 10.1088/1751-8121/aadb70. [DOI] [Google Scholar]

Articles from Entropy are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)

RESOURCES