Abstract
In a recent paper by T. Strohmer and R. Vershynin [“A Randomized Kaczmarz Algorithm with Exponential Convergence”, Journal of Fourier Analysis and Applications, published online on April 25, 2008] a “randomized Kaczmarz algorithm” is proposed for solving systems of linear equations Ax = b. In that algorithm the next equation to be used in an iterative Kaczmarz process is selected with probability proportional to ‖ai‖2. The paper illustrates the superiority of this selection method for the reconstruction of a bandlimited function from its nonuniformly spaced sampling values.
In this note we point out that the reported success of the algorithm of Strohmer and Vershynin in their numerical simulation depends on the specific choices that are made in translating the underlying problem, whose geometrical nature is “find a common point of a set of hyperplanes”, into a system of algebraic equations. If this translation is carefully done, as in the numerical simulation provided by Strohmer and Vershynin for the reconstruction of a bandlimited function from its nonuniformly spaced sampling values, then indeed good performance may result. However, there will always be legitimate algebraic representations of the underlying problem (so that the set of solutions of the system of algebraic equations is exactly the set of points in the intersection of the hyperplanes), for which the selection method of Strohmer and Vershynin will perform in an inferior manner.
Keywords: Kaczmarz algorithm, projection method, rate of convergence
1 Introduction
Kaczmarz’s algorithm [21] is a sequential projection method for the solution of linear systems of equations of the form Ax = b. It was rediscovered in the field of image reconstruction from projections in [13], where it is called the Algebraic Reconstruction Technique (ART), see, e.g., [5, 7, 17], and it is also obtained by applying the method of successive projections onto convex sets to the hyperplanes described by the linear system. The latter method is called in the literature POCS (for projections onto convex sets), see, e.g., [25]; it was originally published by Bregman [3] and further studied by Gubin, Polyak and Raik [15], see, e.g., Bauschke and Borwein [1] and Combettes [9] for reviews. Kaczmarz’s algorithm also belongs to the class of row-action methods described in [6], see also [7]. The literature on Kaczmarz’s algorithm is vast and ranges from beautiful mathematical theoretical results to many successful applications in important real-world problems. Kaczmarz’s method is also a special case of several other algorithmic structures, see, e.g., [4, 8, 12, 20].
In a recent paper [24] (see also [23]) Strohmer and Vershynin propose a randomized Kaczmarz algorithm and claim that “it outperforms all previously known methods on general extremely overdetermined systems.” In this note we point out that the reported success of the algorithm of [24] in their numerical simulation depends on the specific choices that are made in translating the underlying problem, whose geometrical nature is “find a common point of a set of hyperplanes”, into a system of algebraic equations. If this translation is carefully done, as in the numerical simulation provided by [24] for the reconstruction of a bandlimited function from its nonuniformly spaced sampling values, then indeed good performance may result. However, there will always be legitimate algebraic representations of the underlying problem (so that the set of solutions of the system of algebraic equations is exactly the set of points in the intersection of the hyperplanes), for which the selection method of [24] will perform in an inferior manner.
2 The randomized Kaczmarz algorithm
Each equation of the m × n system Ax = b gives rise to a hyperplane

Hi = {x ∈ Rn | 〈ai, x〉 = bi},    (1)

in the n-dimensional Euclidean space Rn, where ai ∈ Rn and bi ∈ R are the i-th row of the matrix A and the i-th component of the right-hand side vector b ∈ Rm of the linear system, respectively. The Euclidean inner product is 〈a, x〉 = a1x1 + a2x2 + ⋯ + anxn for a = (a1, a2, …, an) and x = (x1, x2, …, xn) in Rn, and ‖ai‖ denotes the Euclidean norm. The randomized Kaczmarz algorithm of Strohmer and Vershynin is as follows.
Algorithm 1 [24, Algorithm 1]
Initialization
x0 ∈ Rn is arbitrary.
Iterative Step
Given the current iterate xk, calculate the next iterate xk+1 by
xk+1 = xk + [(br(i) − 〈ar(i), xk〉) / ‖ar(i)‖2] ar(i),    (2)

where r(i) is chosen from the set {1, 2, …, m} at random with probability proportional to ‖ar(i)‖2.
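For concreteness, here is a minimal Python sketch of Algorithm 1 (the function name, its parameters, and the NumPy-based setting are our illustrative choices, not part of [24]); it selects rows with probability proportional to ‖ai‖2 and projects the current iterate onto the selected hyperplane.

```python
import numpy as np

def randomized_kaczmarz(A, b, n_iters=1000, x0=None, seed=None):
    """Sketch of Algorithm 1: pick row r(i) with probability proportional to
    ||a_{r(i)}||^2 and project the current iterate onto <a_{r(i)}, x> = b_{r(i)}."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    sq_norms = np.sum(A * A, axis=1)        # ||a_i||^2 for each row
    probs = sq_norms / sq_norms.sum()       # selection probabilities of Algorithm 1
    for _ in range(n_iters):
        i = rng.choice(m, p=probs)          # random row index r(i)
        x = x + (b[i] - A[i] @ x) / sq_norms[i] * A[i]   # projection step (2)
    return x

# Example on a random consistent overdetermined system.
rng = np.random.default_rng(0)
A = rng.standard_normal((300, 100))
x_true = rng.standard_normal(100)
b = A @ x_true
print(np.linalg.norm(randomized_kaczmarz(A, b, n_iters=5000, seed=1) - x_true))
```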
Obviously, the equations can be scaled, i.e., both sides of each equation 〈ai, x〉 = bi can be divided through by an arbitrary nonzero real number, say, ci, without changing the hyperplane Hi but changing only its algebraic representation. The question is how such permissible changes affect the behavior of Algorithm 1, if at all, and what that means for the superiority claim made for this algorithm in [24].
3 The results of Strohmer and Vershynin
Strohmer and Vershynin use the scaled condition number of a matrix A, introduced by Demmel [10],
κ(A) = ‖A‖F ‖A−1‖2,    (3)

where ‖·‖2 and ‖·‖F denote the spectral norm and the Frobenius norm, respectively. Their main result [24, Theorem 2] states that any iterative sequence {xk} generated by Algorithm 1 converges to the solution of the system Ax = b with an expected exponential rate, and that the rate of convergence depends on the scaled condition number κ(A).
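As a quick numerical illustration (our own sketch, not part of [24]): for a full-column-rank A, the quantity ‖A−1‖2 in (3) can be computed as the reciprocal of the smallest singular value, and [24, Theorem 2] bounds the expected squared error after k iterations of Algorithm 1 by (1 − κ(A)−2)k times the initial squared error.

```python
import numpy as np

def scaled_condition_number(A):
    """Demmel's scaled condition number kappa(A) = ||A||_F * ||A^{-1}||_2,
    computed from the singular values (assumes A has full column rank)."""
    sigma = np.linalg.svd(A, compute_uv=False)
    return np.linalg.norm(A, "fro") / sigma[-1]   # ||A^{-1}||_2 = 1 / sigma_min

A = np.random.default_rng(0).standard_normal((300, 100))
kappa = scaled_condition_number(A)
# Expected per-iteration contraction factor of the squared error in [24, Theorem 2].
print(kappa, 1.0 - kappa**-2)
```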
In their numerical experiments they compare what they call the standard Kaczmarz method (in which the rows of A are selected in their “natural” order 1, 2, …), a simple randomized Kaczmarz method (in which the rows of A are selected at random with equal probability), and the randomized Kaczmarz method of Algorithm 1 (in which the rows of A are selected at random with probability proportional to the squared Euclidean norms of the rows). They plot the least-squares error versus the number of projections for each algorithm and claim that “Algorithm 1 significantly outperforms the other Kaczmarz methods, demonstrating not only the power of choosing the projections at random, but also the importance of choosing the projections according to their relevance.”
4 Theory and practice
Clearly, Kaczmarz’s method is a geometric algorithm (this term was used by Gordon and Mansour [14]) in the sense that the sequence of iterates generated by it depends only on the hyperplanes defined by the equations and not on any particular algebraic representation of the hyperplanes. One can always choose real nonzero numbers, say ci, i = 1, 2, …, m, and divide through each equation 〈ai, x〉 = bi by ci without affecting the hyperplanes defined by the equations or the solution set of the system. Therefore, the rate of convergence of a geometric algorithm must depend on properties of the underlying geometric objects (the hyperplanes) and not on their algebraic representation. This geometric approach has been mathematically studied in several works, see, e.g., Hamaker and Solmon [16], Kayalar and Weinert [22], Bauschke, Borwein and Lewis [2] and Deutsch and Hundal [11], which consider, in various settings, the connection between the angles among the underlying sets and the rate of convergence of the alternating projections method. When specialized to hyperplanes, that method coincides with Kaczmarz’s method and the results apply to the linear system of equations Ax = b.
A scaling of the equations will change the system matrix A and its scaled condition number κ(A) and, in the light of [24], it might be tempting to think that the convergence rate of Kaczmarz’s method can be controlled in this way. However, the geometric nature of Kaczmarz’s method precludes such a possibility. Given any probability distribution whatsoever to be used for selecting the next hyperplane in Algorithm 1, we can always use real nonzero numbers to divide through the equations so that the row-selection rule of Algorithm 1 follows exactly that probability distribution.1 Thus, the rule proposed in Algorithm 1 cannot be in general optimal (or even in any sense superior).
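The scaling construction of footnote 1 is easy to check numerically. The following sketch (variable names are our own) rescales each equation so that the norm-proportional rule of Algorithm 1 reproduces an arbitrary prescribed distribution λ, while the projection onto each hyperplane, and hence the geometric behavior of the method, is unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Any prescribed selection distribution lambda_i > 0.
lam = np.array([0.5, 0.2, 0.1, 0.1, 0.1])

# Divide equation i by c_i = ||a_i|| / sqrt(lambda_i), as in footnote 1.
row_norms = np.linalg.norm(A, axis=1)
scale = np.sqrt(lam) / row_norms
A_tilde = A * scale[:, None]
b_tilde = b * scale

# Now ||a~_i||^2 = lambda_i, so Algorithm 1 applied to the scaled system
# selects row i with probability lambda_i.
print(np.allclose(np.sum(A_tilde**2, axis=1), lam))

# The Kaczmarz projection onto hyperplane H_i is unaffected by the scaling.
x = rng.standard_normal(n)
i = 2
step_original = x + (b[i] - A[i] @ x) / row_norms[i]**2 * A[i]
step_scaled = x + (b_tilde[i] - A_tilde[i] @ x) / lam[i] * A_tilde[i]
print(np.allclose(step_original, step_scaled))
```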
Thus, the results shown in Figure 1 of [24] cannot be fully explained by the theory presented there. It is true that the convergence behavior depends on the order-selection of hyperplanes (as is well known), but this dependence is determined by the geometry of the hyperplanes (the angles among them) and not by their algebraic representation, see, e.g., [2, 11, 16, 18, 19, 22].
Stated differently, if, for any reason whatsoever, we believe that at iteration k hyperplane Hi should be picked with higher probability than hyperplane Hj, then this is the case however the equations representing Hi and Hj are scaled (since the step of projecting onto the selected hyperplane is independent of the scaling). Hence, a claim that an order-selection methodology that depends on the norms of the vectors ai in the representation of the hyperplanes is generally superior to another order-selection methodology must be inherently false.
To avoid inferior behavior by the randomized Kaczmarz algorithm, it is essential that the system of algebraic equations representing the set of hyperplanes be carefully chosen. Indeed, this was done by Strohmer and Vershynin [24] in their numerical simulation in Section 4.1, see their equation (18). Had they selected a different algebraic representation, they would have obtained different convergence behavior. In particular, if they had scaled their equations so that ‖ai‖ = 1 after scaling, then the reported difference between the behaviors of simple randomized Kaczmarz and of Algorithm 1 would have disappeared. And if, by the luck of the draw, the algebraic representation happened to be such that the norm associated with one equation is very much larger than the norms associated with the other equations, then the progress made by Algorithm 1 towards a solution would be poor, because the random selection would keep selecting that same equation most of the time.
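A small experiment along these lines (our own sketch on random data, not the simulation of [24]) runs Algorithm 1 on three algebraic representations of the same hyperplanes: the original equations, equations scaled to unit row norms, and equations in which one row is given a much larger norm than the others.

```python
import numpy as np

def final_error(A, b, x_true, n_iters=3000, seed=0):
    """Run Algorithm 1 on the given algebraic representation; return the final error."""
    rng = np.random.default_rng(seed)
    sq = np.sum(A * A, axis=1)
    p = sq / sq.sum()
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        i = rng.choice(len(b), p=p)
        x = x + (b[i] - A[i] @ x) / sq[i] * A[i]
    return np.linalg.norm(x - x_true)

rng = np.random.default_rng(42)
A = rng.standard_normal((200, 50))
x_true = rng.standard_normal(50)
b = A @ x_true

# Unit row norms: Algorithm 1 reduces to simple (uniform) randomized Kaczmarz.
d = 1.0 / np.linalg.norm(A, axis=1)
A_unit, b_unit = A * d[:, None], b * d

# Same hyperplanes, but one equation scaled to a huge norm, so Algorithm 1
# keeps re-projecting onto that single hyperplane and makes little progress.
A_dom, b_dom = A_unit.copy(), b_unit.copy()
A_dom[0] *= 1e3
b_dom[0] *= 1e3

for name, AA, bb in [("original", A, b),
                     ("unit row norms", A_unit, b_unit),
                     ("one dominant row", A_dom, b_dom)]:
    print(name, final_error(AA, bb, x_true))
```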
Acknowledgments
This work was supported by grant No. 2003275 of the United States-Israel Binational Science Foundation (BSF), by a Chinese NKBRSF grant No. 2003CB716101 and by National Science Foundation of China (NSFC) grants No. 60325101, 60532080, and 60628102 and by Award Number R01HL070472 from the National Heart, Lung, And Blood Institute. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Heart, Lung, And Blood Institute or the National Institutes of Health.
Footnotes
This is an author-created, un-copyedited version of an article accepted for publication in Journal of Fourier Analysis and Applications. The definitive publisher-authenticated version is available online at http://www.springerlink.com/content/t861180p278v/?p=9c1df8db5f2348dab0f861d9c4ee911d&pi=0
1. For any distribution λi > 0, i = 1, 2, …, m, the equations may be scaled as follows. Let ãi = (√λi/‖ai‖) ai and b̃i = (√λi/‖ai‖) bi. Then ‖ãi‖2 = λi for the scaled equation. This scaling method also works for any other norm in the Euclidean space.
Contributor Information
Yair Censor, Email: yair@math.haifa.ac.il.
Gabor T. Herman, Email: gabortherman@yahoo.com.
Ming Jiang, Email: ming-jiang@pku.edu.cn.
References
1. Bauschke HH, Borwein JM. On projection algorithms for solving convex feasibility problems. SIAM Review. 1996;38:367–426.
2. Bauschke HH, Borwein JM, Lewis AS. The method of cyclic projections for closed convex sets in Hilbert space. Contemporary Mathematics. 1997;204:1–38.
3. Bregman LM. The method of successive projections for finding a common point of convex sets. Soviet Mathematics Doklady. 1965;6:688–692.
4. Butnariu D, Davidi R, Herman GT, Kazantsev IG. Stable convergence behavior under summable perturbations of a class of projection methods for convex feasibility and optimization problems. IEEE Journal of Selected Topics in Signal Processing. 2007;1:540–547.
5. Byrne CL. Applied Iterative Methods. Wellesley, MA, USA: A. K. Peters Ltd.; 2008.
6. Censor Y. Row-action methods for huge and sparse systems and their applications. SIAM Review. 1981;23:444–466.
7. Censor Y, Zenios SA. Parallel Optimization: Theory, Algorithms, and Applications. New York, NY, USA: Oxford University Press; 1997.
8. Censor Y, Elfving T, Herman GT, Nikazad T. On diagonally relaxed orthogonal projection methods. SIAM Journal on Scientific Computing. 2008;30:473–504.
9. Combettes PL. The convex feasibility problem in image recovery. Advances in Imaging and Electron Physics. 1996;95:155–270.
10. Demmel JW. The probability that a numerical analysis problem is difficult. Mathematics of Computation. 1988;50:449–480.
11. Deutsch F, Hundal H. The rate of convergence for the cyclic projections algorithm I: Angles between convex sets. Journal of Approximation Theory. 2006;142:36–55.
12. Eggermont PPB, Herman GT, Lent A. Iterative algorithms for large partitioned linear systems, with applications to image reconstruction. Linear Algebra and Its Applications. 1981;40:37–67.
13. Gordon R, Bender R, Herman GT. Algebraic reconstruction techniques (ART) for three-dimensional electron microscopy and X-ray photography. Journal of Theoretical Biology. 1970;29:471–481. doi:10.1016/0022-5193(70)90109-8.
14. Gordon D, Mansour R. A geometric approach to quadratic optimization: an improved method for solving strongly underdetermined systems in CT. Inverse Problems in Science & Engineering. 2007;15:811–826.
15. Gubin L, Polyak B, Raik E. The method of projections for finding the common point of convex sets. USSR Computational Mathematics and Mathematical Physics. 1967;7:1–24.
16. Hamaker C, Solmon D. The angles between the null spaces of X-rays. Journal of Mathematical Analysis and Applications. 1978;62:1–23.
17. Herman GT. Image Reconstruction From Projections: The Fundamentals of Computerized Tomography. New York, NY, USA: Academic Press; 1980.
18. Herman GT. Algebraic reconstruction techniques in medical imaging. In: Leondes CT, editor. Medical Imaging Systems Techniques and Applications: Computational Techniques. Amsterdam, The Netherlands: Gordon and Breach, Overseas Publishers Association (OPA); 1998. pp. 1–42.
19. Herman GT, Meyer LB. Algebraic reconstruction techniques can be made computationally efficient. IEEE Transactions on Medical Imaging. 1993;12:600–609. doi:10.1109/42.241889.
20. Jiang M, Wang G. Convergence studies on iterative algorithms for image reconstruction. IEEE Transactions on Medical Imaging. 2003;22:569–579. doi:10.1109/TMI.2003.812253.
21. Kaczmarz S. Angenäherte Auflösung von Systemen linearer Gleichungen. Bulletin de l’Académie Polonaise des Sciences et Lettres. 1937;A35:355–357.
22. Kayalar S, Weinert HL. Error bounds for the method of alternating projections. Mathematics of Control, Signals, and Systems. 1988;1:43–59.
23. Strohmer T, Vershynin R. A randomized solver for linear systems with exponential convergence. In: Diaz J, Jansen K, Rolim JDP, Zwick U, editors. Approximation, Randomization and Combinatorial Optimization: Algorithms and Techniques, Lecture Notes in Computer Science (LNCS), Vol. 4110. Berlin, Heidelberg, Germany: Springer-Verlag; 2006. pp. 499–507.
24. Strohmer T, Vershynin R. A randomized Kaczmarz algorithm with exponential convergence. Journal of Fourier Analysis and Applications, to appear. The OnlineFirst version is available at: http://www.springerlink.com/content/802836566684l332/
25. Youla DC. Mathematical theory of image restoration by the method of convex projections. In: Stark H, editor. Image Recovery: Theory and Applications. Orlando, Florida, USA: Academic Press; 1987. pp. 29–77.