Journal of Research of the National Institute of Standards and Technology. 2019 Oct 9;124:1–6. doi: 10.6028/jres.124.028

A Purely Algebraic Justification of the Kabsch-Umeyama Algorithm

Jim Lawrence 1,2, Javier Bernal 1, Christoph Witzgall 1
PMCID: PMC7340555  PMID: 34877177

Abstract

The constrained orthogonal Procrustes problem is the least-squares problem that calls for a rotation matrix that optimally aligns two matrices of the same order. Over past decades, the algorithm of choice for solving this problem has been the Kabsch-Umeyama algorithm, which is effectively no more than the computation of the singular value decomposition of a particular matrix. Its justification, as presented separately by Kabsch and Umeyama, is not totally algebraic since it is based on solving the minimization problem via Lagrange multipliers. In order to provide a more transparent alternative, it is the main purpose of this paper to present a purely algebraic justification of the algorithm through the exclusive use of simple concepts from linear algebra. For the sake of completeness, a proof is also included of the well-known and widely used fact that the orientation-preserving rigid motion problem, i.e., the least-squares problem that calls for an orientation-preserving rigid motion that optimally aligns two corresponding sets of points in d-dimensional Euclidean space, reduces to the constrained orthogonal Procrustes problem.

Keywords: constrained, least-squares, orthogonal, Procrustes, rigid motion, rotation, singular value decomposition, trace

1. Introduction

In the orthogonal Procrustes problem [1, 2], given real matrices P and Q of size d × n, the problem is that of finding a d × d orthogonal matrix U that minimizes $\|UQ - P\|_F$, where $\|\cdot\|_F$ denotes the Frobenius norm of a matrix. On the other hand, in the constrained orthogonal Procrustes problem [3–5], the same function is minimized but U is constrained to be a rotation matrix, i.e., an orthogonal matrix of determinant 1. By letting $p_i, q_i$, i = 1, …, n, be the vectors in $\mathbb{R}^d$ that are the columns from left to right of P and Q, respectively, since clearly $\|UQ - P\|_F^2 = \sum_{i=1}^n \|Uq_i - p_i\|^2$, where $\|\cdot\|$ denotes the d-dimensional Euclidean norm, an alternative formulation of the two problems above is that of finding an orthogonal matrix U (of determinant 1 for the constrained problem) that minimizes $\sum_{i=1}^n \|Uq_i - p_i\|^2$. We note that minimizing matrices do exist for the two problems, as the function being minimized is continuous and both the set of orthogonal matrices and the set of rotation matrices are compact (as subsets of $\mathbb{R}^{d \times d}$). Finally, in the same vein, another problem of interest is the orientation-preserving rigid motion problem, which is that of finding an orientation-preserving rigid motion ϕ of $\mathbb{R}^d$ that minimizes $\sum_{i=1}^n \|\phi(q_i) - p_i\|^2$. An affine linear function $\phi : \mathbb{R}^d \to \mathbb{R}^d$ is a rigid motion of $\mathbb{R}^d$ if it is of the form $\phi(q) = Uq + t$ for $q \in \mathbb{R}^d$, where U is a d × d orthogonal matrix and t is a vector in $\mathbb{R}^d$. The rigid motion ϕ is orientation preserving if det(U) = 1, i.e., the determinant of U equals 1. With $\bar p, \bar q$ denoting the centroids of $\{p_i\}$, $\{q_i\}$, respectively, as will be shown in Section 3 of this paper, this problem can be reduced to the constrained orthogonal Procrustes problem by translating $\{p_i\}$, $\{q_i\}$ to become $\{p_i - \bar p\}$, $\{q_i - \bar q\}$, respectively, so that the centroid of each set becomes the zero vector in $\mathbb{R}^d$.

With P, Q, $p_i$, $q_i$, i = 1, …, n, as above, in this paper we focus our attention mostly on the constrained orthogonal Procrustes problem, and therefore wish to find a d × d rotation matrix U that minimizes $\sum_{i=1}^n \|Uq_i - p_i\|^2$.

With this purpose in mind, we rewrite $\sum_{i=1}^n \|Uq_i - p_i\|^2$ as follows, where given a square matrix R, tr(R) stands for the trace of R:

$$
\begin{aligned}
\sum_{i=1}^n \|Uq_i - p_i\|^2 &= \sum_{i=1}^n (Uq_i - p_i)^T (Uq_i - p_i) = \operatorname{tr}\big((UQ - P)^T (UQ - P)\big) \\
&= \operatorname{tr}\big((Q^T U^T - P^T)(UQ - P)\big) = \operatorname{tr}\big(Q^T Q + P^T P - Q^T U^T P - P^T U Q\big) \\
&= \operatorname{tr}(Q^T Q) + \operatorname{tr}(P^T P) - 2\operatorname{tr}(P^T U Q).
\end{aligned}
$$

Since only the third term in the last line above depends on U, it suffices to find a d × d rotation matrix U that maximizes $\operatorname{tr}(P^T U Q)$. Since $\operatorname{tr}(P^T U Q) = \operatorname{tr}(U Q P^T)$ (note that in general tr(AB) = tr(BA) for A an n × d matrix and B a d × n matrix), denoting the d × d matrix $QP^T$ by M, this problem is equivalent to finding a d × d rotation matrix U that maximizes tr(UM), and it is well known that one such U can be computed from the singular value decomposition of M [3–5]. This is done with the Kabsch-Umeyama algorithm [3–5] (see Algorithm Kabsch-Umeyama below, where diag{s₁, …, s_d} is the d × d diagonal matrix with numbers s₁, …, s_d as the elements of the diagonal, in that order running from the upper left to the lower right of the matrix). A singular value decomposition (SVD) [6] of M is a representation of the form $M = VSW^T$, where V and W are d × d orthogonal matrices and S is a d × d diagonal matrix with the singular values of M, which are nonnegative real numbers, appearing in the diagonal of S in descending order from the upper left to the lower right of S. Finally, note that any matrix, not necessarily square, has a singular value decomposition, which is not necessarily unique [6].
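The identity derived above, $\|UQ - P\|_F^2 = \operatorname{tr}(Q^TQ) + \operatorname{tr}(P^TP) - 2\operatorname{tr}(P^TUQ)$, can be spot-checked numerically. The following sketch (an illustration, not part of the paper) uses NumPy with a random rotation built by orthonormalizing a Gaussian matrix via QR factorization:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 7
P = rng.standard_normal((d, n))
Q = rng.standard_normal((d, n))

# Random rotation: orthonormalize a Gaussian matrix, then force det(U) = +1.
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
if np.linalg.det(U) < 0:
    U[:, 0] *= -1.0

lhs = np.linalg.norm(U @ Q - P, "fro") ** 2
rhs = np.trace(Q.T @ Q) + np.trace(P.T @ P) - 2.0 * np.trace(P.T @ U @ Q)
assert np.isclose(lhs, rhs)
```

Since the first two traces do not involve U, minimizing the left side over rotations indeed amounts to maximizing $\operatorname{tr}(P^TUQ)$.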

Algorithm Kabsch-Umeyama (input: a d × d matrix M; output: a d × d rotation matrix U maximizing tr(UM)):
1. Compute a singular value decomposition $M = VSW^T$.
2. If det(VW) > 0, set s = 1; otherwise set s = −1.
3. Return $U = W\,\mathrm{diag}\{1, \ldots, 1, s\}\,V^T$.
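The algorithm is short enough to state as code. The following NumPy sketch (illustrative only, not the authors' implementation; the function name kabsch_umeyama is ours) computes the maximizing rotation from the SVD of M and checks it on a synthetic example in which P is an exact rotation of Q, so that the minimizer of $\|UQ - P\|_F$, obtained from $M = QP^T$, should recover that rotation:

```python
import numpy as np

def kabsch_umeyama(M):
    """Rotation U maximizing tr(U M): U = W diag(1, ..., 1, s) V^T, s = sign(det(VW))."""
    V, _, Wt = np.linalg.svd(M)                             # M = V S W^T, with Wt = W^T
    s = np.ones(M.shape[0])
    s[-1] = np.sign(np.linalg.det(V) * np.linalg.det(Wt))   # det(Wt) = det(W)
    return Wt.T @ np.diag(s) @ V.T

# Sanity check: P is an exact rotation of Q, so the algorithm applied to
# M = Q P^T should recover the rotation R0 exactly.
rng = np.random.default_rng(1)
d, n = 3, 10
Q = rng.standard_normal((d, n))
R0, _ = np.linalg.qr(rng.standard_normal((d, d)))
if np.linalg.det(R0) < 0:
    R0[:, 0] *= -1.0
P = R0 @ Q
U = kabsch_umeyama(Q @ P.T)
assert np.isclose(np.linalg.det(U), 1.0)
assert np.allclose(U, R0)
```

Note that `np.linalg.svd` already returns the singular values in descending order, matching the convention used in the text.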

Algorithm Kabsch-Umeyama has existed for several decades [3–5]; however, the known justifications of the algorithm [3–5] are not totally algebraic, as they are based on the optimization technique of Lagrange multipliers. It is the main purpose of this paper to justify the algorithm in a purely algebraic manner through the exclusive use of simple concepts from linear algebra. This is done in Section 2 of the paper. Finally, we note that applications of the algorithm can be found, notably in the field of functional and shape data analysis [7, 8], where, in particular, the shapes of two curves are compared, in part by optimally rotating one curve to match the other.

2. Algebraic Justification of the Kabsch-Umeyama Algorithm

We justify Algorithm Kabsch-Umeyama using exclusively simple concepts from linear algebra, mostly in the proof of the following useful proposition. We note that most of the proof of the proposition is concerned with proving part (3). Thus, it seems reasonable to say that any justification of the algorithm that requires the conclusion in (3) but lacks a proof of it is not exactly complete. See page 47 of the otherwise excellent thesis in [9] for an example of this situation; see [10] for an outline of that dissertation.

Proposition 1:

If $D = \mathrm{diag}\{\sigma_1, \ldots, \sigma_d\}$, $\sigma_j \ge 0$, j = 1, …, d, and W is a d × d orthogonal matrix, then

  1. $\operatorname{tr}(WD) \le \sum_{j=1}^d \sigma_j$.

  2. If B is a d × d orthogonal matrix and $S = B^T D B$, then $\operatorname{tr}(WS) \le \operatorname{tr}(S)$.

  3. If det(W) = −1 and $\sigma_d \le \sigma_j$, j = 1, …, d − 1, then $\operatorname{tr}(WD) \le \sum_{j=1}^{d-1} \sigma_j - \sigma_d$.

Proof:

Since W is orthogonal, if $W_{kj}$, k, j = 1, …, d, are the entries of W, then, in particular, $W_{jj} \le 1$, j = 1, …, d, so that $\operatorname{tr}(WD) = \sum_{j=1}^d W_{jj}\sigma_j \le \sum_{j=1}^d \sigma_j$, and therefore statement (1) holds.

Accordingly, assuming B is a d × d orthogonal matrix, since $BWB^T$ is also orthogonal, it follows from (1) that $\operatorname{tr}(WS) = \operatorname{tr}(WB^TDB) = \operatorname{tr}(BWB^TD) \le \sum_{j=1}^d \sigma_j = \operatorname{tr}(D) = \operatorname{tr}(S)$, and therefore (2) holds.

If det(W) = −1, we show next that a d × d orthogonal matrix B can be identified so that with $\bar W = B^T W B$, then $\bar W = \begin{pmatrix} W_0 & O \\ O^T & -1 \end{pmatrix}$, where $W_0$ is interpreted as the upper leftmost (d − 1) × (d − 1) entries of $\bar W$ and as a (d − 1) × (d − 1) matrix as well, and O is interpreted as a vertical column or vector of d − 1 zeroes.

With I as the d × d identity matrix, det(W) = −1 implies det(W + I) = −det(W)det(W + I) = −det(Wᵀ)det(W + I) = −det(Wᵀ(W + I)) = −det(I + Wᵀ) = −det((I + W)ᵀ) = −det(I + W), which implies det(W + I) = 0, so that x ≠ 0 exists in $\mathbb{R}^d$ with Wx = −x. It also follows that $W^TWx = W^T(-x)$, which gives $x = -W^Tx$, so that $W^Tx = -x$ as well.

Letting $b_d = x$, vectors $b_1, \ldots, b_{d-1}$ can be obtained so that $b_1, \ldots, b_d$ form a basis of $\mathbb{R}^d$, and by the Gram-Schmidt process starting with $b_d$, we may assume $b_1, \ldots, b_d$ form an orthonormal basis of $\mathbb{R}^d$ with $Wb_d = W^Tb_d = -b_d$. Letting $B = (b_1, \ldots, b_d)$, interpreted as a d × d matrix with columns $b_1, \ldots, b_d$ in that order, it then follows that B is orthogonal, and with $\bar W = B^TWB$ and $W_0$, O as previously described, noting $B^TWb_d = B^T(-b_d) = \begin{pmatrix} O \\ -1 \end{pmatrix}$ and $b_d^TWB = (W^Tb_d)^TB = (-b_d)^TB = (O^T\;{-1})$, then $\bar W = \begin{pmatrix} W_0 & O \\ O^T & -1 \end{pmatrix}$. Note $\bar W$ is orthogonal and therefore so is the (d − 1) × (d − 1) matrix $W_0$.

Let $S = B^TDB$ and write $S = \begin{pmatrix} S_0 & a \\ b^T & \gamma \end{pmatrix}$, where $S_0$ is interpreted as the upper leftmost (d − 1) × (d − 1) entries of S and as a (d − 1) × (d − 1) matrix as well, a and b are interpreted as vertical columns or vectors of d − 1 entries, and γ as a scalar. Note $\operatorname{tr}(WD) = \operatorname{tr}(B^TWDB) = \operatorname{tr}(B^TWBB^TDB) = \operatorname{tr}(\bar WS)$, so that $\bar WS = \begin{pmatrix} W_0 & O \\ O^T & -1 \end{pmatrix}\begin{pmatrix} S_0 & a \\ b^T & \gamma \end{pmatrix} = \begin{pmatrix} W_0S_0 & W_0a \\ -b^T & -\gamma \end{pmatrix}$ gives $\operatorname{tr}(WD) = \operatorname{tr}(W_0S_0) - \gamma$.

We show $\operatorname{tr}(W_0S_0) \le \operatorname{tr}(S_0)$. For this purpose let $\hat W = \begin{pmatrix} W_0 & O \\ O^T & 1 \end{pmatrix}$, with $W_0$ and O as above. Since $W_0$ is orthogonal, clearly $\hat W$ is a d × d orthogonal matrix, and by (2), $\operatorname{tr}(\hat WS) \le \operatorname{tr}(S)$, so that $\hat WS = \begin{pmatrix} W_0 & O \\ O^T & 1 \end{pmatrix}\begin{pmatrix} S_0 & a \\ b^T & \gamma \end{pmatrix} = \begin{pmatrix} W_0S_0 & W_0a \\ b^T & \gamma \end{pmatrix}$ gives $\operatorname{tr}(W_0S_0) + \gamma = \operatorname{tr}(\hat WS) \le \operatorname{tr}(S) = \operatorname{tr}(S_0) + \gamma$. Thus, $\operatorname{tr}(W_0S_0) \le \operatorname{tr}(S_0)$.

Note $\operatorname{tr}(S_0) + \gamma = \operatorname{tr}(S) = \operatorname{tr}(D)$, and if $B_{kj}$, k, j = 1, …, d, are the entries of B, then $\gamma = \sum_{k=1}^d B_{kd}^2\sigma_k$, a convex combination of the $\sigma_k$'s, so that $\gamma \ge \sigma_d$. It then follows that $\operatorname{tr}(WD) = \operatorname{tr}(W_0S_0) - \gamma \le \operatorname{tr}(S_0) - \gamma = \operatorname{tr}(D) - \gamma - \gamma \le \sum_{j=1}^d \sigma_j - 2\sigma_d = \sum_{j=1}^{d-1}\sigma_j - \sigma_d$, and therefore (3) holds. □
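The inequalities of Proposition 1 are easy to probe numerically. The sketch below (illustrative only, not part of the paper) draws random orthogonal matrices W via QR factorization and checks bound (1) for every W, and the sharper bound (3) whenever det(W) = −1, against a fixed diagonal D with descending nonnegative entries:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
sigma = np.sort(rng.uniform(0.0, 1.0, d))[::-1]   # sigma_1 >= ... >= sigma_d >= 0
D = np.diag(sigma)

for _ in range(200):
    # Random orthogonal W (det is +1 or -1 depending on the draw).
    W, _ = np.linalg.qr(rng.standard_normal((d, d)))
    t = np.trace(W @ D)
    assert t <= sigma.sum() + 1e-12                          # inequality (1)
    if np.linalg.det(W) < 0:
        assert t <= sigma[:-1].sum() - sigma[-1] + 1e-12     # inequality (3)
```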

Finally, the following theorem, a consequence of Proposition 1, justifies the Kabsch-Umeyama algorithm.

Theorem 1:

Given a d × d matrix M, let V, S, W be d × d matrices such that a singular value decomposition of M gives $M = VSW^T$. If det(VW) > 0, then $U = WV^T$ maximizes tr(UM) over all d × d rotation matrices U. Otherwise, if det(VW) < 0, then with $\tilde S = \mathrm{diag}\{s_1, \ldots, s_d\}$, $s_1 = \cdots = s_{d-1} = 1$, $s_d = -1$, the matrix $U = W\tilde SV^T$ maximizes tr(UM) over all d × d rotation matrices U.

Proof:

Let $\sigma_j$, j = 1, …, d, with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_d \ge 0$, be the singular values of M, so that $S = \mathrm{diag}\{\sigma_1, \ldots, \sigma_d\}$.

Assume det(VW) > 0. If U is any rotation matrix, then U is orthogonal. From (1) of Proposition 1, since $W^TUV$ is orthogonal, $\operatorname{tr}(UM) = \operatorname{tr}(UVSW^T) = \operatorname{tr}(W^TUVS) \le \sum_{j=1}^d \sigma_j$.

On the other hand, if $U = WV^T$, then U is clearly orthogonal, det(U) = 1, and $\operatorname{tr}(UM) = \operatorname{tr}(WV^TVSW^T) = \operatorname{tr}(WSW^T) = \operatorname{tr}(S) = \sum_{j=1}^d \sigma_j$.

Thus, $U = WV^T$ maximizes tr(UM) over all d × d rotation matrices U.

Finally, assume det(VW) < 0. If U is any rotation matrix, then U is orthogonal and det(U) = 1. From (3) of Proposition 1, since $W^TUV$ is orthogonal and $\det(W^TUV) = -1$, $\operatorname{tr}(UM) = \operatorname{tr}(UVSW^T) = \operatorname{tr}(W^TUVS) \le \sum_{j=1}^{d-1}\sigma_j - \sigma_d$.

On the other hand, if $U = W\tilde SV^T$, then U is clearly orthogonal, det(U) = 1, and $\operatorname{tr}(UM) = \operatorname{tr}(W\tilde SV^TVSW^T) = \operatorname{tr}(W\tilde SSW^T) = \operatorname{tr}(\tilde SS) = \sum_{j=1}^{d-1}\sigma_j - \sigma_d$.

Thus, $U = W\tilde SV^T$ maximizes tr(UM) over all d × d rotation matrices U. □
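Theorem 1 can likewise be spot-checked: the rotation built from the SVD should achieve a trace at least as large as that of any other rotation. The sketch below (illustrative; the function name rotation_maximizing_trace is ours) restates the construction and compares it against randomly sampled rotations:

```python
import numpy as np

def rotation_maximizing_trace(M):
    """U = W diag(1, ..., 1, sign(det(VW))) V^T from an SVD M = V S W^T."""
    V, _, Wt = np.linalg.svd(M)
    s = np.ones(M.shape[0])
    s[-1] = np.sign(np.linalg.det(V) * np.linalg.det(Wt))
    return Wt.T @ np.diag(s) @ V.T

rng = np.random.default_rng(3)
d = 3
M = rng.standard_normal((d, d))
U_star = rotation_maximizing_trace(M)
assert np.isclose(np.linalg.det(U_star), 1.0)    # U_star is a rotation

best = np.trace(U_star @ M)
for _ in range(500):
    R, _ = np.linalg.qr(rng.standard_normal((d, d)))
    if np.linalg.det(R) < 0:
        R[:, [0, 1]] = R[:, [1, 0]]   # swapping two columns flips the determinant to +1
    assert np.trace(R @ M) <= best + 1e-9
```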

3. Reduction of the Orientation-Preserving Rigid Motion Problem to the Constrained Orthogonal Procrustes Problem

Although not exactly related to the main goal of this paper, for the sake of completeness, we show the orientation-preserving rigid motion problem reduces to the constrained orthogonal Procrustes problem. For this purpose, let $\bar q$ and $\bar p$ denote the centroids of the sets $\{q_i\}_{i=1}^n$ and $\{p_i\}_{i=1}^n$ in $\mathbb{R}^d$, respectively:

$$\bar q = \frac{1}{n}\sum_{i=1}^n q_i \quad\text{and}\quad \bar p = \frac{1}{n}\sum_{i=1}^n p_i.$$

First we prove a proposition that shows, in particular, that if $\hat\phi(\bar q) \ne \bar p$, then $\phi = \hat\phi$ does not minimize

$$\Delta(\phi) = \sum_{i=1}^n \|\phi(q_i) - p_i\|^2,$$

the minimization occurring over either the set of all rigid motions ϕ of $\mathbb{R}^d$ or the smaller set of rigid motions ϕ of $\mathbb{R}^d$ that are orientation preserving.

Proposition 2:

Let ϕ be a rigid motion of $\mathbb{R}^d$ with $\phi(\bar q) \ne \bar p$, and define an affine linear function $\tau : \mathbb{R}^d \to \mathbb{R}^d$ by $\tau(q) = \phi(q) - \phi(\bar q) + \bar p$ for $q \in \mathbb{R}^d$. Then τ is a rigid motion of $\mathbb{R}^d$, $\tau(\bar q) = \bar p$, Δ(τ) < Δ(ϕ), and if ϕ is orientation preserving, then so is τ.

Proof:

Clearly $\tau(\bar q) = \bar p$. Let U be a d × d orthogonal matrix and $t \in \mathbb{R}^d$ be such that $\phi(q) = Uq + t$ for $q \in \mathbb{R}^d$. Then $\tau(q) = Uq - U\bar q + \bar p$, so that τ is a rigid motion of $\mathbb{R}^d$, τ is orientation preserving if ϕ is, and for 1 ≤ i ≤ n, we have

$$
\begin{aligned}
\|\phi(q_i) - p_i\|^2 - \|\tau(q_i) - p_i\|^2 &= (Uq_i + t - p_i)^T(Uq_i + t - p_i) - (Uq_i - U\bar q + \bar p - p_i)^T(Uq_i - U\bar q + \bar p - p_i) \\
&= \big((Uq_i - p_i)^T(Uq_i - p_i) + 2(Uq_i - p_i)^Tt + t^Tt\big) \\
&\quad - \big((Uq_i - p_i)^T(Uq_i - p_i) - 2(Uq_i - p_i)^T(U\bar q - \bar p) + (U\bar q - \bar p)^T(U\bar q - \bar p)\big) \\
&= 2(Uq_i - p_i + t)^T(U\bar q - \bar p + t) - (U\bar q - \bar p + t)^T(U\bar q - \bar p + t).
\end{aligned}
$$

It then follows that

$$
\begin{aligned}
\Delta(\phi) - \Delta(\tau) &= \sum_{i=1}^n \Big(2(Uq_i - p_i + t)^T(U\bar q - \bar p + t) - (U\bar q - \bar p + t)^T(U\bar q - \bar p + t)\Big) \\
&= n\,\|U\bar q - \bar p + t\|^2 = n\,\|\phi(\bar q) - \bar p\|^2 > 0
\end{aligned}
$$

as $\phi(\bar q) - \bar p$ is nonzero. Thus Δ(τ) < Δ(ϕ). □

Finally, the following corollary, a consequence of Proposition 2, shows that the problem of finding an orientation-preserving rigid motion ϕ of $\mathbb{R}^d$ that minimizes $\sum_{i=1}^n \|\phi(q_i) - p_i\|^2$ can be reduced to a constrained orthogonal Procrustes problem, which, of course, can then be solved with the Kabsch-Umeyama algorithm. Here $r_i = p_i - \bar p$, $s_i = q_i - \bar q$ for i = 1, …, n, and if $\bar r = \frac{1}{n}\sum_{i=1}^n r_i$, $\bar s = \frac{1}{n}\sum_{i=1}^n s_i$, then clearly $\bar r = \bar s = 0$.

Corollary 1:

Let $\hat U$ be such that $U = \hat U$ minimizes $\sum_{i=1}^n \|Us_i - r_i\|^2$ over all d × d rotation matrices U. Let $\hat t = \bar p - \hat U\bar q$, and let $\hat\phi$ be given by $\hat\phi(q) = \hat Uq + \hat t$ for $q \in \mathbb{R}^d$. Then $\phi = \hat\phi$ minimizes $\sum_{i=1}^n \|\phi(q_i) - p_i\|^2$ over all orientation-preserving rigid motions ϕ of $\mathbb{R}^d$.

Proof:

One such $\hat U$ can be computed with the Kabsch-Umeyama algorithm.

By Proposition 2, in order to minimize $\sum_{i=1}^n \|\phi(q_i) - p_i\|^2$ over all orientation-preserving rigid motions ϕ of $\mathbb{R}^d$, it suffices to do so over those for which $\phi(\bar q) = \bar p$. Therefore, it suffices to minimize $\sum_{i=1}^n \|Uq_i + t - p_i\|^2$ with $t = \bar p - U\bar q$ over all d × d rotation matrices U, i.e., it suffices to minimize

$$\sum_{i=1}^n \|Uq_i + \bar p - U\bar q - p_i\|^2 = \sum_{i=1}^n \|U(q_i - \bar q) - (p_i - \bar p)\|^2$$

over all d × d rotation matrices U. But minimizing the last expression is equivalent to minimizing $\sum_{i=1}^n \|Us_i - r_i\|^2$ over all d × d rotation matrices U. Since $U = \hat U$ is a solution to this last problem, it then follows that $U = \hat U$ minimizes $\sum_{i=1}^n \|Uq_i + \bar p - U\bar q - p_i\|^2 = \sum_{i=1}^n \|Uq_i + t - p_i\|^2$ with $t = \bar p - U\bar q$ over all d × d rotation matrices U. Consequently, if $\hat t = \bar p - \hat U\bar q$ and $\hat\phi$ is given by $\hat\phi(q) = \hat Uq + \hat t$ for $q \in \mathbb{R}^d$, then $\phi = \hat\phi$ clearly minimizes $\sum_{i=1}^n \|\phi(q_i) - p_i\|^2$ over all orientation-preserving rigid motions ϕ of $\mathbb{R}^d$. □
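Corollary 1 translates directly into a procedure: center both point sets at their centroids, solve the constrained orthogonal Procrustes problem on the centered sets, then recover the translation as $\hat t = \bar p - \hat U\bar q$. A self-contained NumPy sketch follows (illustrative only; the function name optimal_rigid_motion is ours), tested on point sets related by an exact rigid motion:

```python
import numpy as np

def optimal_rigid_motion(P, Q):
    """Orientation-preserving rigid motion q -> U q + t minimizing sum ||U q_i + t - p_i||^2.

    P and Q are d x n arrays whose columns are corresponding points p_i, q_i."""
    p_bar = P.mean(axis=1, keepdims=True)
    q_bar = Q.mean(axis=1, keepdims=True)
    R, S = P - p_bar, Q - q_bar                 # centered point sets r_i, s_i
    M = S @ R.T                                 # M = sum_i s_i r_i^T
    V, _, Wt = np.linalg.svd(M)                 # Kabsch-Umeyama on M
    s = np.ones(P.shape[0])
    s[-1] = np.sign(np.linalg.det(V) * np.linalg.det(Wt))
    U = Wt.T @ np.diag(s) @ V.T
    t = p_bar - U @ q_bar                       # t_hat = p_bar - U_hat q_bar
    return U, t

# Test: P is an exact rigid motion of Q, which the procedure should recover.
rng = np.random.default_rng(4)
d, n = 3, 12
Q = rng.standard_normal((d, n))
R0, _ = np.linalg.qr(rng.standard_normal((d, d)))
if np.linalg.det(R0) < 0:
    R0[:, 0] *= -1.0
t0 = rng.standard_normal((d, 1))
P = R0 @ Q + t0
U, t = optimal_rigid_motion(P, Q)
assert np.isclose(np.linalg.det(U), 1.0)
assert np.allclose(U @ Q + t, P)
```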

Biography

About the authors: James F. Lawrence is a professor of mathematics at George Mason University. He is a respected authority in the field of convex and combinatorial geometry. In his position as faculty appointee in the Applied and Computational Mathematics Division at NIST he serves as an expert on combinatorial issues. He received a B.S. in mathematics in 1972 from Oklahoma State University and a Ph.D. in mathematics in 1975 from the University of Washington. He was a National Research Council Postdoctoral Fellow at NIST and held positions at the University of Texas at Austin, the University of Massachusetts at Boston, and the University of Kentucky.

Javier Bernal is a mathematician in the Applied and Computational Mathematics Division of the NIST Information Technology Laboratory in Gaithersburg, MD. He received his Ph.D. in mathematics in 1980 from Catholic University in Washington, DC, the same year he joined NIST. His research interests include the development, analysis and implementation of algorithms in computational geometry, functional and shape data analysis, image processing, optimization, and applied linear algebra.

Christoph Witzgall holds the designation of Scientist Emeritus from the NIST Information Technology Laboratory (ITL). He is a respected authority in the fields of operations research, optimization and numerical analysis. He received his Ph.D. in mathematics in 1958 from the Ludwig-Maximilians-Universität München. Although he retired from government service in 2003, he continues to serve as a guest researcher in the ITL Applied and Computational Mathematics Division. Christoph has been associated with NIST since 1962, serving as Acting Chief of its Operations Research Division from 1979 to 1982. He received the Department of Commerce Silver Medal for meritorious Federal service.

The National Institute of Standards and Technology is an agency of the U.S. Department of Commerce.

4. References

  • [1] Gower JC, Dijksterhuis GB (2004) Procrustes Problems (Oxford Statistical Science Series) (Oxford University Press), 1st Ed.
  • [2] Schönemann PH (1966) A generalized solution of the orthogonal Procrustes problem. Psychometrika 31(1):1–10. https://doi.org/10.1007/BF02289451
  • [3] Kabsch W (1976) A solution for the best rotation to relate two sets of vectors. Acta Crystallographica Section A: Crystal Physics 32(5):922–923. https://doi.org/10.1107/S0567739476001873
  • [4] Kabsch W (1978) A discussion of the solution for the best rotation to relate two sets of vectors. Acta Crystallographica Section A: Crystal Physics 34(5):827–828. https://doi.org/10.1107/S0567739478001680
  • [5] Umeyama S (1991) Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 13(4):376–380. https://doi.org/10.1109/34.88573
  • [6] Lay DC, Lay SR, McDonald JJ (2016) Linear Algebra and Its Applications (Pearson Education, Boston), 5th Ed.
  • [7] Doğan G, Bernal J, Hagwood C (2015) FFT-based alignment of 2D closed curves with application to elastic shape analysis. Proceedings of the 1st International Workshop on Differential Geometry in Computer Vision for Analysis of Shapes, Images and Trajectories (DIFF-CV 2015), Swansea, Wales, eds Drira H, Kurtek S, Turaga P (BMVA Press), Vol. 12, pp 1–10. https://doi.org/10.5244/C.29.DIFFCV.12
  • [8] Srivastava A, Klassen EP (2016) Functional and Shape Data Analysis (Springer-Verlag, New York), 1st Ed.
  • [9] Papadimitriou P (1993) Parallel Solution of SVD-Related Problems, with Applications. Ph.D. thesis, University of Manchester, Manchester, England. Available at https://www.maths.manchester.ac.uk/~higham/misc/past-students.php
  • [10] Higham N (1994) Matrix Procrustes problems. Available at https://www.maths.manchester.ac.uk/~higham/talks
