Author manuscript; available in PMC 2012 Jan 30. Published in final edited form as: Appl. Comput. Harmon. Anal. 2011 Jan 30;30(1):20–36. doi: 10.1016/j.acha.2010.02.001

Angular Synchronization by Eigenvectors and Semidefinite Programming

A. Singer
PMCID: PMC3003935  NIHMSID: NIHMS181381  PMID: 21179593

Abstract

The angular synchronization problem is to obtain an accurate estimation (up to a constant additive phase) for a set of unknown angles θ1, …, θn from m noisy measurements of their offsets θi − θj mod 2π. Of particular interest is angle recovery in the presence of many outlier measurements that are uniformly distributed in [0, 2π) and carry no information on the true offsets. We introduce an efficient recovery algorithm for the unknown angles from the top eigenvector of a specially designed Hermitian matrix. The eigenvector method is extremely stable and succeeds even when the number of outliers is exceedingly large. For example, we successfully estimate n = 400 angles from a full set of $m = \binom{400}{2}$ offset measurements of which 90% are outliers in less than a second on a commercial laptop. The performance of the method is analyzed using random matrix theory and information theory. We discuss the relation of the synchronization problem to the combinatorial optimization problem Max-2-Lin mod L and present a semidefinite relaxation for angle recovery, drawing similarities with the Goemans-Williamson algorithm for finding the maximum cut in a weighted graph. We present extensions of the eigenvector method to other synchronization problems that involve different group structures and their applications, such as the time synchronization problem in distributed networks and the surface reconstruction problems in computer vision and optics.

1 Introduction

The angular synchronization problem is to estimate n unknown angles θ1, …, θn ∈ [0, 2π) from m noisy measurements δij of their offsets θi − θj mod 2π. In general, only a subset of all possible $\binom{n}{2}$ offsets are measured. The set E of pairs {i, j} for which offset measurements exist can be realized as the edge set of a graph G = (V, E) with vertices corresponding to angles and edges corresponding to measurements.

When all offset measurements are exact with zero measurement error, it is possible to solve the angular synchronization problem iff the graph G is connected. Indeed, if G is connected, then it contains a spanning tree and all angles are sequentially determined by traversing the tree while summing the offsets modulo 2π. The angles are uniquely determined up to an additive phase, e.g., the angle of the root. On the other hand, if G is disconnected, then it is impossible to determine the offset between angles that belong to disjoint components of the graph.

Sequential algorithms that integrate the measured offsets over a particular spanning tree of the graph are very sensitive to measurement errors, due to accumulation of the errors. It is therefore desirable to integrate all offset measurements in a globally consistent way. The need for such a globally consistent integration method comes up in a variety of applications. One such application is the time synchronization of distributed networks [17, 23], where clocks measure noisy time offsets ti − tj from which the determination of t1, …, tn ∈ ℝ is required. Other applications include the surface reconstruction problems in computer vision [13, 1] and optics [30], where the surface is to be reconstructed from noisy measurements of its gradient, and the graph of measurements is typically the two-dimensional regular grid. The most common approach in the above mentioned applications for a self consistent global integration is least squares. The least squares solution is most suitable when the offset measurements have a small Gaussian additive error; it can be computed efficiently and analyzed mathematically in terms of the Laplacian of the underlying measurement graph.

There are many possible models for the measurement errors; we are mainly interested in models that allow many outliers. An outlier is an offset measurement that is uniformly distributed on [0, 2π), regardless of the true value of the offset. In addition to outliers that carry no information on the true angle values, there of course also exist good measurements whose errors are relatively small. We have no a priori knowledge, however, of which measurements are good and which are bad (outliers).

In our model, the edges of E can be split into a set of good edges Egood and a set of bad edges Ebad, of sizes mgood and mbad respectively (with m = |E| = mgood + mbad), such that

$$\delta_{ij} = \theta_i - \theta_j \quad \text{for } \{i,j\} \in E_{good}, \qquad \delta_{ij} \sim \text{Uniform}([0, 2\pi)) \quad \text{for } \{i,j\} \in E_{bad}. \tag{1}$$

Perhaps it would be more realistic to allow a small discretization error for the good offsets, for example by letting them have the wrapped normal distribution on the circle with mean θi − θj and variance σ² (where σ is a typical discretization error). This discretization error can be incorporated into the mathematical analysis of Section 4 with little extra difficulty. However, the effect of the discretization error is negligible compared to that of the outliers, so we choose to ignore it in order to keep the presentation as simple as possible.

It is trivial to find a solution to (1) if some oracle whispers to our ears which equations are good and which are bad (in fact, all we need in that case is that Egood contains a spanning tree of G). In reality, we have to be able to tell the good from the bad on our own.

The overdetermined system of linear equations (modulo 2π)

$$\theta_i - \theta_j = \delta_{ij} \mod 2\pi, \quad \text{for } \{i,j\} \in E \tag{2}$$

can be solved by the method of least squares as follows. Introducing the complex-valued variables $z_i = e^{\iota\theta_i}$, the system (2) is equivalent to

$$z_i - e^{\iota\delta_{ij}} z_j = 0, \quad \text{for } \{i,j\} \in E, \tag{3}$$

which is an overdetermined system of homogeneous linear equations over ℂ. To prevent the solution from collapsing to the trivial solution z1 = z2 = ··· = zn = 0, we set z1 = 1 (recall that the angles are determined up to a global additive phase, so we may choose θ1 = 0), and look for the solution z2, …, zn of (3) with minimal ℓ2-norm residual. However, the sum of squared errors is expected to be overwhelmingly dominated by the outlier equations, making least squares least favorable to succeed when the proportion of bad equations is large (see numerical results involving least squares in Table 3). We therefore seek a solution method that is more robust to outliers.
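To make the least squares formulation concrete, here is a minimal numpy sketch (the function and variable names are ours, not the paper's; the paper's experiments used MATLAB's lsqr instead of a dense solve):

```python
import numpy as np

def least_squares_angles(n, edges, deltas):
    """Least squares for z_i - exp(i*delta_ij) z_j = 0 over {i,j} in E, with z_1 = 1.

    edges: list of 0-indexed pairs (i, j); deltas: measured offsets delta_ij.
    Dense, so meant only as an illustration for small n.
    """
    A = np.zeros((len(edges), n), dtype=complex)
    for row, ((i, j), d) in enumerate(zip(edges, deltas)):
        A[row, i] = 1.0
        A[row, j] = -np.exp(1j * d)        # equation (3)
    # Fixing z_1 = 1 removes the trivial solution; move column 0 to the right-hand side.
    b = -A[:, 0]
    z_rest, *_ = np.linalg.lstsq(A[:, 1:], b, rcond=None)
    z = np.concatenate(([1.0 + 0j], z_rest))
    return np.mod(np.angle(z), 2 * np.pi)  # angles, up to the global phase theta_1 = 0
```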

Table 3.

Comparison between the correlations obtained by the eigenvector method ρeig, by the SDP method ρsdp and by the least squares method ρlsqr for different values of p (small world graph on S2, n = 200, ε = 0.3, m ≈ 3000). The SDP tends to find low-rank matrices despite the fact that the rank-one constraint on Θ is not included in the SDP. The rightmost column gives the rank of the Θ matrices that were found by the SDP. To solve the SDP (8)–(10) we used SDPLR, a package for solving large-scale SDP problems [5]. The least squares solution was obtained using MATLAB’s lsqr function. As expected, the least squares method yields poor correlations compared to the eigenvector and the SDP methods.

p      ρ_lsqr   ρ_eig   ρ_sdp   rank Θ
1      1        1       1       1
0.7    0.787    0.977   0.986   1
0.4    0.046    0.839   0.893   3
0.3    0.103    0.560   0.767   3
0.2    0.227    0.314   0.308   4
0.15   0.091    0.114   0.102   5

Maximum likelihood is an obvious step in that direction. The maximum likelihood solution to (1) is simply the set of angles θ1, …, θn that satisfies as many equations of (2) as possible. We may therefore define the self consistency error (SCE) of θ1, …, θn as the number of equations that are not satisfied:

$$\text{SCE}(\theta_1, \ldots, \theta_n) = \#\big\{\{i,j\} \in E : \theta_i - \theta_j \neq \delta_{ij} \mod 2\pi\big\}. \tag{4}$$

As even the good equations contain some error (due to angular discretization and noise), a more suitable self consistency error is SCEf, which incorporates a penalty function f:

$$\text{SCE}_f(\theta_1, \ldots, \theta_n) = \sum_{\{i,j\} \in E} f(\theta_i - \theta_j - \delta_{ij}), \tag{5}$$

where f : [0, 2π) → ℝ is a smooth periodic function with f(0) = 0 and f(θ) = 1 for |θ| > θ0, where θ0 is the allowed discretization error. The minimization of (5) is equivalent to maximizing the log likelihood under a different probabilistic error model.

The maximum likelihood approach suffers from a major drawback, though. It is virtually impossible to find the global minimizer θ1, …, θn when dealing with large scale problems (n ≫ 1), because the minimization of either (4) or (5) is a non-convex optimization problem in a huge parameter space. It is like finding a needle in a haystack.

In this paper we take a different approach and introduce two different estimators for the angles. The first estimator is based on an eigenvector computation while the second estimator is based on a semidefinite program (SDP) [38]. Our eigenvector estimator θ̂1, …, θ̂n is obtained by the following two-step recipe. In the first step, we construct an n × n complex-valued matrix H whose entries are

$$H_{ij} = \begin{cases} e^{\iota\delta_{ij}} & \{i,j\} \in E \\ 0 & \{i,j\} \notin E, \end{cases} \tag{6}$$

where $\iota = \sqrt{-1}$. The matrix H is Hermitian, i.e., $H_{ij} = \bar{H}_{ji}$, because the offsets are skew-symmetric: $\delta_{ij} = -\delta_{ji} \mod 2\pi$. As H is Hermitian, its eigenvalues are real. The second step is to compute the top eigenvector v1 of H with maximal eigenvalue, and to define the estimator in terms of this top eigenvector as

$$e^{\iota\hat\theta_i} = \frac{v_1(i)}{|v_1(i)|}, \quad i = 1, \ldots, n. \tag{7}$$

The philosophy leading to the eigenvector method is explained in Section 2.
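For illustration, the two-step recipe fits in a few lines of numpy (our sketch, with hypothetical helper names; it builds H as in (6) and rounds the top eigenvector as in (7)):

```python
import numpy as np

def eigenvector_angles(n, edges, deltas):
    """Eigenvector estimator: build H as in (6), round its top eigenvector as in (7)."""
    H = np.zeros((n, n), dtype=complex)
    for (i, j), d in zip(edges, deltas):
        H[i, j] = np.exp(1j * d)        # H_ij = e^{i delta_ij}
        H[j, i] = np.exp(-1j * d)       # Hermitian, since delta_ji = -delta_ij mod 2*pi
    evals, evecs = np.linalg.eigh(H)    # eigh returns ascending eigenvalues
    v1 = evecs[:, -1]                   # top eigenvector
    return np.angle(v1 / np.abs(v1))    # rounding (7); defined up to one global phase
```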

The second estimator is based on the following SDP

$$\max_{\Theta \in \mathbb{C}^{n \times n}} \text{trace}(H\Theta) \tag{8}$$
$$\text{s.t. } \Theta \succeq 0 \tag{9}$$
$$\Theta_{ii} = 1, \quad i = 1, 2, \ldots, n, \tag{10}$$

where Θ ≽ 0 is shorthand notation for Θ being a Hermitian positive semidefinite matrix. The only difference between this SDP and the Goemans-Williamson algorithm for finding the maximum cut in a weighted graph [18] is that the maximization is taken over all positive semidefinite Hermitian matrices with complex-valued entries, rather than just the real-valued symmetric matrices. The SDP-based estimator θ̂1, …, θ̂n is derived from the normalized top eigenvector v1 of Θ by the same rounding procedure (7). Our numerical experiments show that the accuracies of the eigenvector method and the SDP method are comparable. Since the eigenvector method is much faster, we prefer using it for large scale problems. The eigenvector method is also numerically appealing, because in the regime of interest the spectral gap is large, rendering the simple power method an efficient and numerically stable way of computing the top eigenvector. The SDP method is summarized in Section 3.

In Section 4 we use random matrix theory to analyze the eigenvector method for two different measurement graphs: the complete graph and “small-world” graphs [39]. Our analysis shows that in the complete graph case the top eigenvector of H has a non-trivial correlation with the vector of true angles as soon as the proportion p of good offset measurements becomes greater than $1/\sqrt{n}$. In particular, the correlation goes to 1 as np² → ∞, meaning a successful recovery of the angles. Our numerical simulations confirm these results and demonstrate the robustness of the estimator (7) to outliers.

In Section 5 we prove that the eigenvector method is asymptotically nearly optimal in the sense that it achieves the information theoretic Shannon bound up to a multiplicative factor that depends only on the discretization error of the measurements 2π/L, but not on m and n. In other words, no method whatsoever can accurately estimate the angles if the proportion of good measurements is $o(\sqrt{n/m})$. The connection between the angular synchronization problem and Max-2-Lin mod L [3] is explored in Section 6. Finally, Section 7 is a summary and discussion of further applications of the eigenvector method to other synchronization problems over different groups.

2 The Eigenvector Method

Our approach to finding the self consistent solution for θ1, …, θn starts with forming the following n × n matrix H

$$H_{ij} = \begin{cases} e^{\iota\delta_{ij}} & \{i,j\} \in E \\ 0 & \{i,j\} \notin E, \end{cases} \tag{11}$$

where $\iota = \sqrt{-1}$. Since

$$\delta_{ji} = -\delta_{ij} \mod 2\pi, \quad \text{for all } i, j = 1, \ldots, n, \tag{12}$$

it follows that $H_{ij} = \bar{H}_{ji}$, where for any complex number z = a + ιb we denote by z̄ = a − ιb its complex conjugate. In other words, the matrix H is Hermitian, i.e., H* = H. We choose to set the diagonal elements of H to 0 (i.e., Hii = 0).

Next, we consider the maximization problem

$$\max_{\theta_1, \ldots, \theta_n \in [0,2\pi)} \sum_{i,j=1}^{n} e^{-\iota\theta_i} H_{ij}\, e^{\iota\theta_j}, \tag{13}$$

and explain the philosophy behind it. For the correct set of angles θ1, …, θn, each good edge contributes

$$e^{-\iota\theta_i}\, e^{\iota(\theta_i - \theta_j)}\, e^{\iota\theta_j} = 1$$

to the sum in (13). The total contribution of the good edges is just a sum of ones, piling up to exactly the total number of good edges mgood. On the other hand, the contribution of each bad edge is uniformly distributed on the unit circle in the complex plane. Adding up the terms due to bad edges can be thought of as a discrete planar random walk where each bad edge corresponds to a unit size step in a uniformly random direction. These random steps mostly cancel each other out, so that the total contribution of the mbad bad edges is only $O(\sqrt{m_{bad}})$. It follows that the objective function in (13) has the desired property of diminishing the contribution of the bad edges to the square root of their number, relative to the linear contribution of the good edges.
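This square-root cancellation is easy to see numerically; the snippet below (ours, purely illustrative) sums unit steps with uniformly random directions:

```python
import numpy as np

rng = np.random.default_rng(0)
for m_bad in (100, 10_000, 1_000_000):
    steps = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, m_bad))  # unit steps, random directions
    print(m_bad, abs(steps.sum()), np.sqrt(m_bad))             # |sum| is O(sqrt(m_bad))
```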

Still, the maximization problem (13) is a non-convex maximization problem which is quite difficult to solve in practice. We therefore introduce the following relaxation of the problem

$$\max_{\substack{z_1, \ldots, z_n \in \mathbb{C} \\ \sum_{i=1}^n |z_i|^2 = n}} \; \sum_{i,j=1}^{n} \bar{z}_i H_{ij} z_j. \tag{14}$$

That is, we replace the n individual constraints $|z_i| = 1$ (one for each variable $z_i = e^{\iota\theta_i}$) by a single, much weaker constraint requiring the sum of squared magnitudes to be n. The maximization problem (14) is that of a quadratic form, whose solution is given by the top eigenvector of the Hermitian matrix H. Indeed, the spectral theorem implies that the eigenvectors v1, v2, …, vn of H form an orthonormal basis for ℂn with corresponding real eigenvalues λ1 ≥ λ2 ≥ … ≥ λn satisfying Hvi = λivi. Rewriting the constrained maximization problem (14) as

$$\max_{\|z\|^2 = n} z^* H z, \tag{15}$$

it becomes clear that the maximizer z is given by z = v1, where v1 is the top eigenvector satisfying Hv1 = λ1v1 and ‖v1‖² = n, with λ1 being the largest eigenvalue. The components of the eigenvector v1 are not necessarily of unit magnitude, so we normalize them and define the estimated angles by

$$e^{\iota\hat\theta_i} = \frac{v_1(i)}{|v_1(i)|}, \quad \text{for } i = 1, \ldots, n \tag{16}$$

(see also equation (7)).

The top eigenvector can be efficiently computed by the power iteration method, which starts from a randomly chosen vector b0 and iterates $b_{k+1} = \frac{Hb_k}{\|Hb_k\|}$. Each iteration requires just a matrix-vector multiplication, which takes O(n²) operations for dense matrices but only O(m) operations for sparse matrices, where m = |E| is the number of non-zero entries of H corresponding to edges in the graph. The number of iterations required by the power method decreases as the spectral gap grows; this gap indeed exists and is analyzed in detail in Section 4.
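A sketch of the power iteration (ours; a fixed iteration count stands in for a proper convergence test) needs only matrix-vector products, so it works equally well when H is a scipy.sparse matrix:

```python
import numpy as np

def power_top_eigenvector(H, n_iter=200, seed=0):
    """Power iteration b <- Hb/||Hb||; assumes the top eigenvalue dominates in
    magnitude, which holds in the useful regime where the spectral gap is large."""
    rng = np.random.default_rng(seed)
    b = rng.standard_normal(H.shape[0]) + 1j * rng.standard_normal(H.shape[0])
    for _ in range(n_iter):
        b = H @ b                     # O(m) per iteration for a sparse H
        b /= np.linalg.norm(b)
    return b
```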

Note that cycles in the graph of good edges lead to consistency relations between the offset measurements. For example, if the three edges {i, j}, {j, k}, {k, i} are a triangle of good edges, then the corresponding offset angles δij, δjk and δki must satisfy

$$\delta_{ij} + \delta_{jk} + \delta_{ki} = 0 \mod 2\pi, \tag{17}$$

because

$$\delta_{ij} + \delta_{jk} + \delta_{ki} = \theta_i - \theta_j + \theta_j - \theta_k + \theta_k - \theta_i = 0 \mod 2\pi.$$

A closer look into the power iteration method reveals that multiplying the matrix H by itself integrates the information in the consistency relation of triplets, while higher order iterations exploit consistency relations of longer cycles. Indeed,

$$(H^2)_{ij} = \sum_{k=1}^{n} H_{ik} H_{kj} = \sum_{k:\, \{i,k\},\{j,k\} \in E} e^{\iota\delta_{ik}}\, e^{\iota\delta_{kj}} = \#\{k : \{i,k\} \text{ and } \{j,k\} \in E_{good}\}\, e^{\iota(\theta_i - \theta_j)} + \sum_{k:\, \{i,k\} \text{ or } \{j,k\} \in E_{bad}} e^{\iota(\delta_{ik} + \delta_{kj})}. \tag{18}$$

The top eigenvector therefore integrates the consistency relations of all cycles.

3 The semidefinite program approach

A different natural relaxation of the optimization problem (13) is obtained using SDP. Indeed, the objective function in (13) can be written as

$$\sum_{i,j=1}^{n} e^{-\iota\theta_i} H_{ij}\, e^{\iota\theta_j} = \text{trace}(H\Theta), \tag{19}$$

where Θ is the n × n complex-valued rank-one Hermitian matrix

$$\Theta_{ij} = e^{\iota(\theta_i - \theta_j)}. \tag{20}$$

Note that Θ has ones on its diagonal

$$\Theta_{ii} = 1, \quad i = 1, 2, \ldots, n. \tag{21}$$

Except for the non-convex rank-one constraint implied by (20), all other constraints are convex and lead to the natural SDP relaxation (8)–(10). This program is almost identical to the Goemans-Williamson SDP for finding the maximum cut in a weighted graph. The only difference is that here we maximize over all possible complex-valued Hermitian matrices, not just the symmetric real matrices. The SDP-based estimator corresponding to (8)–(10) is then obtained from the best rank-one approximation of the optimal matrix Θ using the Cholesky decomposition.
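For illustration, the relaxation (8)–(10) can be written in a few lines of cvxpy (our sketch, not the paper's setup: the paper's experiments used SDPT3 and SDPLR, and here we round with the top eigenvector of Θ per (7) rather than a Cholesky factor):

```python
import cvxpy as cp
import numpy as np

def sdp_angles(H):
    """Solve max trace(H Theta) s.t. Theta PSD with unit diagonal, then round."""
    n = H.shape[0]
    Theta = cp.Variable((n, n), hermitian=True)
    constraints = [Theta >> 0, cp.diag(Theta) == 1]
    cp.Problem(cp.Maximize(cp.real(cp.trace(H @ Theta))), constraints).solve()
    _, evecs = np.linalg.eigh(Theta.value)   # best rank-one direction of Theta
    return np.angle(evecs[:, -1])            # estimator, up to a global phase
```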

The SDP method may seem preferable to the eigenvector method, as it explicitly imposes the unit magnitude constraint for $e^{\iota\theta_i}$. Our numerical experiments show that the two methods give similar results (see Table 3). Since the eigenvector method is much faster, it is also the method of choice for large scale problems.

4 Connections with random matrix theory and spectral graph theory

In this section we analyze the eigenvector method using tools from random matrix theory and spectral graph theory.

4.1 Analysis of the complete graph angular synchronization problem

We first consider the angular synchronization problem in which all $\binom{n}{2}$ angle offsets are given, so that the corresponding graph is the complete graph Kn on n vertices. We also assume that the probability for each edge to be good is p, independently of all other edges. This probabilistic model for the graph of good edges is known as the Erdős-Rényi random graph G(n, p) [9]. We refer to this model as the complete graph angular synchronization model.

The elements of H in the complete graph angular synchronization model are random variables given by the following mixture model. With probability p the edge {i, j} is good and $H_{ij} = e^{\iota(\theta_i - \theta_j)}$, whereas with probability 1 − p the edge is bad and $H_{ij} \sim \text{Uniform}(S^1)$. It is convenient to define the diagonal elements as Hii = p.

The matrix H is Hermitian and the expected value of its elements is

$$\mathbb{E}H_{ij} = p\, e^{\iota(\theta_i - \theta_j)}. \tag{22}$$

In other words, the expected value of H is the rank-one matrix

$$\mathbb{E}H = np\, zz^*, \tag{23}$$

where z is the normalized vector (||z|| = 1) given by

$$z_i = \frac{1}{\sqrt{n}}\, e^{\iota\theta_i}, \quad i = 1, \ldots, n. \tag{24}$$

The matrix H can be decomposed as

$$H = np\, zz^* + R, \tag{25}$$

where $R = H - \mathbb{E}H$ is a random matrix whose elements have zero mean, with Rii = 0 and, for i ≠ j,

$$R_{ij} = \begin{cases} (1-p)\, e^{\iota(\theta_i - \theta_j)} & \text{with probability } p \\ e^{\iota\phi} - p\, e^{\iota(\theta_i - \theta_j)} & \text{with probability } 1-p, \text{ with } \phi \sim \text{Uniform}([0, 2\pi)). \end{cases} \tag{26}$$

The variance of Rij is

$$\mathbb{E}|R_{ij}|^2 = (1-p)^2 p + (1+p^2)(1-p) = 1 - p^2 \tag{27}$$

for ij and 0 for the diagonal elements. Note that for p = 1 the variance vanishes as all edges become good.

The distribution of the eigenvalues of the random matrix R follows Wigner's semi-circle law [40, 41], whose support is $[-2\sqrt{n(1-p^2)},\; 2\sqrt{n(1-p^2)}]$. The largest eigenvalue of R, denoted λ1(R), is concentrated near the right edge of the support [2], and the universality of the edge of the spectrum [34] implies that it follows the Tracy-Widom distribution [36] even when the entries of R are non-Gaussian. For our purposes, the approximation

$$\lambda_1(R) \approx 2\sqrt{n(1-p^2)} \tag{28}$$

will suffice, with the probabilistic error bound given in [2].

The matrix $H = np\,zz^* + R$ can be considered as a rank-one perturbation of a random matrix. The distribution of the largest eigenvalue of such perturbed random matrices was investigated in [29, 11, 15] for the particular case where z is proportional to the all-ones vector (1 1 ··· 1)^T. Although our vector z given by (24) is different, without loss of generality we can assume θ1 = θ2 = … = θn = 0, because the matrix zz* can be reduced to the all-ones matrix by conjugation with the n × n diagonal matrix Z whose diagonal elements are Zii = zi, i = 1, …, n. Thus, adapting [11, Theorem 1.1] to H gives that for

$$np > \sqrt{n(1-p^2)} \tag{29}$$

the largest eigenvalue λ1(H) jumps outside the support of the semi-circle law and is normally distributed with mean μ and variance σ² given by

$$\lambda_1(H) \sim \mathcal{N}(\mu, \sigma^2), \quad \mu = \frac{np}{\sqrt{1-p^2}} + \frac{\sqrt{1-p^2}}{p}, \quad \sigma^2 = \frac{(n+1)p^2 - 1}{np^2}\,(1-p^2), \tag{30}$$

whereas for $np < \sqrt{n(1-p^2)}$, λ1(H) still tends to the right edge of the semi-circle at $2\sqrt{n(1-p^2)}$.

Note that the factor of 2 that appears in (28) has disappeared from (29), which is perhaps somewhat non-intuitive: one expects λ1(H) > λ1(R) whenever np > λ1(R), but the theorem guarantees λ1(H) > λ1(R) already for $np > \frac{1}{2}\lambda_1(R)$.

The condition (29) also implies a lower bound on the correlation between the normalized top eigenvector v1 of H and the vector z. To that end, consider the eigenvector equation satisfied by v1:

$$\lambda_1(H)\, v_1 = H v_1 = (np\, zz^* + R)\, v_1. \tag{31}$$

Taking the dot product with v1 yields

$$\lambda_1(H) = np\, |\langle z, v_1 \rangle|^2 + v_1^* R v_1. \tag{32}$$

From $v_1^* R v_1 \le \lambda_1(R)$ we obtain the lower bound

$$|\langle z, v_1 \rangle|^2 \ge \frac{\lambda_1(H) - \lambda_1(R)}{np}, \tag{33}$$

with λ1(H) and λ1(R) given by (30) and (28), respectively. Thus, if the spectral gap λ1(H) − λ1(R) is large enough, then v1 must be close to z, in which case the eigenvector method successfully recovers the unknown angles. Since the variance of the correlation of two random unit vectors in ℝn is 1/n, the eigenvector method gives above-random correlation values whenever

$$\frac{\lambda_1(H) - \lambda_1(R)}{np} > \frac{1}{\sqrt{n}}. \tag{34}$$

Replacing λ1(H) in (34) by μ from (30) and λ1(R) by (28), and multiplying by pn, yields the condition

$$\frac{np}{\sqrt{1-p^2}} + \frac{\sqrt{1-p^2}}{p} - 2\sqrt{n(1-p^2)} > p\sqrt{n}. \tag{35}$$

Since $\frac{np}{\sqrt{1-p^2}} + \frac{\sqrt{1-p^2}}{p} \ge 2\sqrt{n}$, it follows that (35) is satisfied for

$$p > \frac{1}{\sqrt{n}}. \tag{36}$$

Thus, already for $p > 1/\sqrt{n}$ we should obtain above-random correlations between the vector of angles z and the top eigenvector v1. We therefore define the threshold probability pc as

$$p_c = \frac{1}{\sqrt{n}}. \tag{37}$$

When $np \gg \lambda_1(R)$, the correlation between v1 and z can be predicted using regular perturbation theory, solving the eigenvector equation (31) in an asymptotic expansion with the small parameter $\varepsilon = \frac{\lambda_1(R)}{np}$. Such perturbations are derived in standard textbooks on quantum mechanics aiming to find approximations to the energy levels and eigenstates of perturbed time-independent Hamiltonians (see, e.g., [20, Chapter 6]). In our case, the resulting asymptotic expansions of the non-normalized eigenvector v1 and of the eigenvalue λ1(H) are given by

$$v_1 \approx z + \frac{1}{np}\left[Rz - (z^* R z)\, z\right] + \cdots, \tag{38}$$

and

$$\lambda_1(H) \approx np + z^* R z + \cdots. \tag{39}$$

Note that the first order term in (38) is perpendicular to the leading order term z, from which it follows that the angle α between the eigenvector v1 and the vector of true angles z satisfies the asymptotic relation

$$\tan^2\alpha \approx \frac{\|Rz\|^2 - (z^* R z)^2}{(np)^2} + \cdots, \tag{40}$$

because $\|Rz - (z^* R z)\, z\|^2 = \|Rz\|^2 - (z^* R z)^2$. The expected values of the numerator terms in (40) are given by

$$\mathbb{E}\|Rz\|^2 = \mathbb{E}\sum_{i=1}^{n}\Big|\sum_{j=1}^{n} R_{ij} z_j\Big|^2 = \sum_{i,j=1}^{n} \text{Var}(R_{ij} z_j) = \sum_{i=1}^{n}\sum_{j \ne i} |z_j|^2 (1-p^2) = (n-1)(1-p^2), \tag{41}$$

and

$$\mathbb{E}(z^* R z)^2 = \mathbb{E}\Big[\sum_{i,j=1}^{n} R_{ij} \bar{z}_i z_j\Big]^2 = \sum_{i,j=1}^{n} \text{Var}(R_{ij} \bar{z}_i z_j) = (1-p^2)\sum_{i \ne j} |z_i|^2 |z_j|^2 = (1-p^2)\Big[\Big(\sum_{i=1}^{n}|z_i|^2\Big)^2 - \sum_{i=1}^{n}|z_i|^4\Big] = (1-p^2)\Big(1 - \frac{1}{n}\Big), \tag{42}$$

where we used that the Rij are i.i.d. zero mean random variables with variance given by (27) and that $|z_i|^2 = \frac{1}{n}$. Substituting (41)–(42) into (40) results in

$$\mathbb{E}\tan^2\alpha \approx \frac{(n-1)^2(1-p^2)}{n^3 p^2} + \cdots, \tag{43}$$

which for p ≪ 1 and n ≫ 1 reads

$$\mathbb{E}\tan^2\alpha \approx \frac{1}{np^2} + \cdots. \tag{44}$$

This expression shows that as np² goes to infinity, the angle between v1 and z goes to zero and the correlation between them goes to 1. For np² ≫ 1, the leading order term in the expected squared correlation $\mathbb{E}\cos^2\alpha$ is given by

$$\mathbb{E}\cos^2\alpha = \mathbb{E}\frac{1}{1+\tan^2\alpha} \approx \frac{1}{1+\frac{1}{np^2}} + \cdots. \tag{45}$$

We conclude that even for very small p values, the eigenvector method successfully recovers the angles if there are enough equations, that is, if np2 is large enough.
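The prediction (45) is easy to test numerically. The sketch below (ours) draws one instance of the complete-graph mixture model and compares the correlation ρ2 = |⟨z, v1⟩| of (46) with the leading-order value $(1+\frac{1}{np^2})^{-1/2}$:

```python
import numpy as np

def complete_graph_trial(n=400, p=0.15, seed=0):
    """One draw of the complete graph model: each offset good w.p. p, else uniform."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    good = rng.random((n, n)) < p
    delta = np.where(good, theta[:, None] - theta[None, :],
                     rng.uniform(0.0, 2.0 * np.pi, (n, n)))
    H = np.triu(np.exp(1j * delta), 1)       # one measurement per pair {i, j}
    H = H + H.conj().T                       # Hermitian completion, zero diagonal
    _, evecs = np.linalg.eigh(H)
    v1 = evecs[:, -1]                        # top eigenvector (unit norm)
    z = np.exp(1j * theta) / np.sqrt(n)
    rho2 = abs(np.vdot(z, v1))
    return rho2, (1.0 + 1.0 / (n * p**2)) ** -0.5   # observed vs. (45)
```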

Figure 1 shows the distribution of the eigenvalues of the matrix H for n = 400 and different values of p. The spectral gap decreases as p gets smaller. From (29) we expect a spectral gap for p ≳ pc, where the critical value is $p_c = 1/\sqrt{400} = 0.05$. The experimental values of λ1(H) also agree with (30). For example, for n = 400 and p = 0.15, the expected value of the largest eigenvalue is μ = 67.28 and its standard deviation is σ = 0.93, while for p = 0.1 we get μ = 50.15 and σ = 0.86; these values are in full agreement with the location of the largest eigenvalues in Figures 1(a)–1(b). Note that the right edge of the semi-circle is smaller than $2\sqrt{n} = 40$, so the spectral gap is significant even when p = 0.1.

Figure 1.

Figure 1

Histogram of the eigenvalues of the matrix H in the complete graph model for n = 400 and different values of p.

The skeptical reader may wonder whether the existence of a visible spectral gap necessarily implies that the normalized top eigenvector v1 correctly recovers the original set of angles θ1, …, θn (up to a constant phase). To that end, we compute the following two measures ρ1 and ρ2 of the correlation between the vector of true angles z and the computed normalized top eigenvector v1:

$$\rho_1 = \Big|\frac{1}{n}\sum_{i=1}^{n} e^{-\iota\theta_i}\frac{v_1(i)}{|v_1(i)|}\Big|, \qquad \rho_2 = \Big|\frac{1}{\sqrt{n}}\sum_{i=1}^{n} e^{-\iota\theta_i} v_1(i)\Big| = |\langle z, v_1 \rangle|. \tag{46}$$

The correlation ρ1 takes into account the rounding procedure (16), while ρ2 is simply the dot product between v1 and z without applying any rounding. Clearly, ρ1, ρ2 ≤ 1 (Cauchy-Schwarz), and ρ1 = 1 iff the two sets of angles are the same up to a rotation. Note that it is possible to have ρ1 = 1 with ρ2 < 1. This happens when the angles implied by v1(i) are all correct, but the magnitudes |v1(i)| are not all the same. Table 1 summarizes the experimentally obtained correlations ρ1, ρ2 for different values of p with n = 100 (Table 1(a)) and n = 400 (Table 1(b)). The experimental results show that for large values of np² the correlation is very close to 1, indicating a successful recovery of the angles. The third column, giving the values of $(1+\frac{1}{np^2})^{-1/2}$, is motivated by the asymptotic expansion (45) and provides a very good approximation for ρ2 when np² ≫ 1, with deviations attributed to higher order terms of the asymptotic expansion and to statistical fluctuations around the mean value. Below the threshold probability (ending rows of Tables 1(a) and 1(b) with np² < 1), the correlations take values near $1/\sqrt{n}$, as expected from the correlation of two random unit vectors in ℝn ($1/\sqrt{100} = 0.1$ and $1/\sqrt{400} = 0.05$).

Table 1.

Correlations between the top eigenvector v1 of H and the vector z of true angles for different values of p in the complete graph model.

(a) n = 100

p      np²    (1+1/np²)^(−1/2)   ρ1     ρ2
0.4    16     0.97               0.99   0.98
0.3    9      0.95               0.97   0.95
0.2    4      0.89               0.90   0.88
0.15   2.25   0.83               0.75   0.81
0.1    1      0.71               0.34   0.35
0.05   0.25   0.45               0.13   0.12

(b) n = 400

p      np²    (1+1/np²)^(−1/2)   ρ1     ρ2
0.2    16     0.97               0.99   0.97
0.15   9      0.95               0.97   0.95
0.1    4      0.89               0.90   0.87
0.075  2.25   0.83               0.77   0.76
0.05   1      0.71               0.28   0.32
0.025  0.25   0.45               0.06   0.07

From the practical point of view, the most important fact is that the eigenvector method successfully recovers the angles even when a large portion of the offset measurements consists of just outliers. For example, for n = 400, the correlation obtained when 85% of the offset measurements were outliers (only 15% are good measurements) was ρ1 = 0.97.

4.2 Analysis of the angular synchronization problem in general

We turn to analyze the eigenvector method for general measurement graphs, where the graph of good measurements is assumed to be connected, while the graph of bad edges is assumed to consist of edges drawn uniformly at random from the edges of the complete graph that remain once the good edges have been removed. Our analysis is based on generalizing the decomposition given in (25).

Let A be the adjacency matrix for the set of good edges Egood:

$$A_{ij} = \begin{cases} 1 & \{i,j\} \in E_{good} \\ 0 & \{i,j\} \notin E_{good}. \end{cases} \tag{47}$$

As the matrix A is symmetric, it has a complete set of real eigenvalues λ1 ≥ λ2 ≥ … ≥ λn and corresponding real orthonormal eigenvectors ψ1, …, ψn such that

$$A = \sum_{l=1}^{n} \lambda_l\, \psi_l \psi_l^T. \tag{48}$$

Let Z be the n × n diagonal matrix whose diagonal elements are $Z_{ii} = e^{\iota\theta_i}$. Clearly, Z is a unitary matrix (ZZ* = I). Define the Hermitian matrix B by conjugating A with Z:

$$B = ZAZ^*. \tag{49}$$

It follows that the eigenvalues of B are equal to the eigenvalues λ1, …, λn of A, and the corresponding eigenvectors $\{\varphi_l\}_{l=1}^n$ of B, satisfying $B\varphi_l = \lambda_l \varphi_l$, are given by

$$\varphi_l = Z\psi_l, \quad l = 1, \ldots, n. \tag{50}$$

From (49) it follows that

$$B_{ij} = \begin{cases} e^{\iota(\theta_i - \theta_j)} & \{i,j\} \in E_{good} \\ 0 & \{i,j\} \notin E_{good}. \end{cases} \tag{51}$$

We are now ready to decompose the matrix H defined in (11) as

$$H = B + R, \tag{52}$$

where R is a random matrix whose elements are given by

$$R_{ij} = \begin{cases} e^{\iota\delta_{ij}} & \{i,j\} \in E_{bad} \\ 0 & \{i,j\} \notin E_{bad}, \end{cases} \tag{53}$$

where δij ~ Uniform([0, 2π)) for {i, j} ∈ Ebad. The decomposition (52) is extremely useful, because it sheds light on the eigen-structure of H in terms of the much simpler eigen-structures of B and R.

First, consider the matrix B defined in (49), which shares the same spectrum as A and whose eigenvectors φ1, …, φn are phase modulations of the eigenvectors ψ1, …, ψn of A. If the graph of good measurements is connected, as it must be in order to have a unique solution for the angular synchronization problem (see the second paragraph of Section 1), then the Perron-Frobenius theorem (see, e.g., [22, Chapter 8]) for the non-negative matrix A implies that the entries of ψ1 are all positive

$$\psi_1(i) > 0, \quad \text{for all } i = 1, 2, \ldots, n, \tag{54}$$

and therefore the complex phases of the coordinates of the top eigenvector φ1 = Zψ1 of B are identical to the true angles, that is, $e^{\iota\theta_i} = \frac{\varphi_1(i)}{|\varphi_1(i)|}$. Hence, if the top eigenvector of H is highly correlated with the top eigenvector of B, then the angles are estimated with high accuracy. We will shortly derive the precise condition that guarantees such a high correlation between the eigenvectors of H and B.

The spectral gap Δgood of the good graph is the difference between its first and second eigenvalues, i.e., Δgood = λ1(A) − λ2(A). The Perron-Frobenius theorem and the connectivity of the graph of good measurements also imply that Δgood > 0.

Next, we turn to analyze the spectrum of the random matrix R given in (53). We assume that the mbad bad edges were drawn uniformly at random from the remaining edges of the complete graph on n vertices that are not already good edges. There are only 2mbad nonzero elements in R, which makes R a sparse matrix with an average number of 2mbad/n nonzero entries per row. The nonzero entries of R have zero mean and unit variance. The spectral norm of such sparse random matrices was studied in [25, 24] where it was shown that with probability 1,

$$\limsup_{n\to\infty} \sqrt{\frac{n}{2m_{bad}}}\; \lambda_1(R) \le 2$$

as long as $m_{bad} \gg n\log n$ as n → ∞. The implication of this result is that we can approximate λ1(R) by

$$\lambda_1(R) \approx 2\sqrt{\frac{2m_{bad}}{n}}. \tag{55}$$

Similar to the spectral gap condition (29), requiring

$$\Delta_{good} > \frac{1}{2}\lambda_1(R), \tag{56}$$

ensures that with high probability, the top eigenvector of H would be highly correlated with the top eigenvector of B. Plugging (55) into (56), we get the condition

$$\Delta_{good} > \sqrt{\frac{2m_{bad}}{n}}. \tag{57}$$

We illustrate the above analysis for the small world graph, starting with a neighborhood graph on the unit sphere S² with n vertices corresponding to points on the sphere and m edges, and rewiring each edge at random with probability 1 − p, resulting in shortcut edges. The shortcut edges are considered bad edges, while unperturbed edges are the good edges. As the original m edges of the small world graph are rewired with probability 1 − p, the expected number of bad edges $\mathbb{E}m_{bad}$ and the expected number of good edges $\mathbb{E}m_{good}$ are given by

$$\mathbb{E}m_{good} = pm, \qquad \mathbb{E}m_{bad} = (1-p)m, \tag{58}$$

with relatively small fluctuations of $O(\sqrt{mp(1-p)})$.

The average degree of the original unperturbed graph is $\bar d = \frac{2m}{n}$. Assuming uniform sampling of points on the sphere, the average area of the spherical cap covered by the neighboring points is $4\pi\frac{\bar d}{n} = \frac{8\pi m}{n^2}$. The average opening angle η of this cap satisfies $2\pi(1-\cos\eta) = \frac{8\pi m}{n^2}$, or $1 - \cos\eta = \frac{4m}{n^2}$. Consider the limit m, n → ∞ while keeping the ratio c = 4m/n² constant. By the law of large numbers, the matrix $\frac{1}{n}A$ converges in this limit to the integral convolution operator $\mathcal{K}$ on S² (see, e.g., [7]), given by

$$(\mathcal{K}f)(\beta) = \frac{p}{4\pi}\int_{S^2} \chi_{[1-c,\,1]}(\langle \beta, \beta' \rangle)\, f(\beta')\, dS_{\beta'}, \quad \beta \in S^2, \tag{59}$$

where $\chi_I$ is the characteristic function of the interval I.

The classical Funk-Hecke theorem (see, e.g., [28, p. 195]) asserts that the spherical harmonics are the eigenfunctions of convolution operators over the sphere, and the eigenvalues λl are given by

$$\lambda_l(\mathcal{K}) = \frac{p}{2}\int_{-1}^{1} \chi_{[1-c,\,1]}(t)\, P_l(t)\, dt = \frac{p}{2}\int_{1-c}^{1} P_l(t)\, dt,$$

and have multiplicities 2l + 1 (l = 0, 1, 2, …), where the Pl are the Legendre polynomials (P0(t) = 1, P1(t) = t, …). In particular, $\lambda_0(\mathcal{K}) = \frac{pc}{2}$, $\lambda_1(\mathcal{K}) = \frac{pc}{2}\left(1-\frac{1}{2}c\right)$, and the spectral gap of $\mathcal{K}$ is $\Delta(\mathcal{K}) = \frac{pc^2}{4}$. The spectral gap of A is approximately

$$\Delta_{good} \approx n\Delta(\mathcal{K}) = \frac{npc^2}{4} = \frac{4m^2 p}{n^3}. \tag{60}$$
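The Funk-Hecke eigenvalues are easy to verify numerically; the following quick scipy check (ours, with arbitrary illustrative values of p and c) confirms the closed forms above:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_legendre

p, c = 0.8, 0.1                                    # illustrative values only
for l in range(3):
    integral, _ = quad(lambda t, l=l: eval_legendre(l, t), 1.0 - c, 1.0)
    print(l, p * integral / 2.0)                   # lambda_l(K) = (p/2) int_{1-c}^1 P_l
# Closed forms: lambda_0 = p*c/2, lambda_1 = (p*c/2)*(1 - c/2), gap = p*c**2/4.
print(p * c / 2.0, (p * c / 2.0) * (1.0 - c / 2.0), p * c**2 / 4.0)
```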

Plugging (58) and (60) into (57) yields the condition

$$\frac{4m^2 p}{n^3} > \sqrt{\frac{2(1-p)m}{n}}, \tag{61}$$

which is satisfied for p > pc, where pc is the threshold probability

$$p_c \approx \sqrt{\frac{n^5}{8m^3}}. \tag{62}$$

We note that this estimate for the threshold probability is far from being tight and can be improved in principle by taking into account the entire spectrum of the good graph rather than just the spectral gap between the top eigenvalues, but we do not attempt to derive tighter bounds here.

We end this section by describing the results of a few numerical experiments. Figure 2 shows the histogram of the eigenvalues of the matrix H for small-world graphs on S2. Each graph was generated by sampling n points β1, …, βn on the unit sphere S2 in ℝ3 from the uniform distribution as well as n random rotation angles θ1, …, θn uniformly distributed in [0, 2π). An edge between i and j exists iff 〈βi, βj 〉 > 1 − ε, where ε is a small parameter that determines the connectivity (average degree) of the graph. The resulting graph is a neighborhood graph on S2. The small world graphs were obtained by randomly rewiring the edges of the neighborhood graph. Every edge is rewired with probability 1 − p, so that the expected proportion of good edges is p.
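A sketch of this experimental setup follows (ours; for simplicity, rewired edges are modeled directly as outlier offsets rather than literally re-attached to new endpoints):

```python
import numpy as np

def small_world_s2(n=400, eps=0.2, p=0.5, seed=0):
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal((n, 3))
    beta /= np.linalg.norm(beta, axis=1, keepdims=True)   # uniform points on S^2
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    edges, deltas = [], []
    for i in range(n):
        for j in range(i + 1, n):
            if beta[i] @ beta[j] > 1.0 - eps:             # neighborhood graph edge
                edges.append((i, j))
                if rng.random() < p:                      # unperturbed (good) edge
                    deltas.append((theta[i] - theta[j]) % (2.0 * np.pi))
                else:                                     # rewired edge -> outlier offset
                    deltas.append(rng.uniform(0.0, 2.0 * np.pi))
    return theta, edges, deltas
```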

Figure 2.

Figure 2

Histogram of the eigenvalues of the matrix H in the small-world model for n = 400, ε = 0.2, m ≈ 8000, and different values of p.

The histograms of Figure 2 for the eigenvalues of H seem much more exotic than the ones obtained in the complete graph case shown in Figure 1. In particular, there seems to be a long tail of large eigenvalues, rather than a single eigenvalue that stands out from all the others. But now we understand that these eigenvalues are nothing but the top eigenvalues of the adjacency matrix of the good graph, related to the spherical harmonics. This behavior is more clearly visible in Figure 3.

Figure 3.

Figure 3

Bar plot of the 25 largest eigenvalues of the matrix H in the small-world model for n = 4000, ε = 0.2, m ≈ 8 · 10⁵, and different values of p. The multiplicities 1, 3, 5, 7, 9 corresponding to the spherical harmonics are evident as long as p is not too small. As p decreases, the highly oscillatory spherical harmonics get “swallowed” by the semi-circle.

The experimental correlations given in Table 2 indicate jumps in the correlation values that occur between p = 0.15 and p = 0.2 for n = 100 and between p = 0.1 and p = 0.12 for n = 400. The experimental threshold values seem to follow the law $p_c \approx \sqrt{\frac{n}{2m}}$ that holds for the complete graph case (36), for which $m = \binom{n}{2}$. As mentioned earlier, (62) is a rather pessimistic estimate of the threshold probability.

Table 2.

Correlations between the top eigenvector of H and the vector of true angles for different values of p in the small-world S2 model.

(a) n = 100, ε = 0.3, m ≈ 750

p     2mp²/n   ρ1
0.8   9.6      0.923
0.6   5.4      0.775
0.4   2.4      0.563
0.3   1.4      0.314
0.2   0.6      0.095

(b) n = 400, ε = 0.2, m ≈ 8000

p     2mp²/n   ρ1
0.8   26       0.960
0.4   6.4      0.817
0.3   3.6      0.643
0.2   1.6      0.282
0.1   0.4      0.145

Also evident from Table 2 is that the correlation goes to 1 as 2mp²/n → ∞. We remark that using regular perturbation theory and the relation of the eigenstructure of B to the spherical harmonics, it should be possible to obtain an asymptotic series for the correlation in terms of the large parameter 2mp²/n, similar to the asymptotic expansion (45).

The comparison between the eigenvector and SDP methods (as well as the least squares method of Section 1) is summarized in Table 3 showing the numerical correlations for n = 200, ε = 0.3 (number of edges m ≈ 3000) and for different values of p. Although the SDP is slightly more accurate, the eigenvector method runs faster.

5 Information Theoretic Analysis

The optimal solution to the angular synchronization problem can be considered as the set of angles that maximizes the log-likelihood. Unfortunately, the log-likelihood is a non-convex function and the maximum likelihood cannot be found in polynomial time. Both the eigenvector method and the SDP method are polynomial-time relaxations of the maximum log-likelihood problem. In the previous section we showed that the eigenvector method fails to recover the true angles when p is below the threshold probability $p_c^{eig}$. It is clear that even the maximum likelihood solution would fail to recover the correct set of angles below some (perhaps lower) threshold. It is therefore natural to ask whether the threshold value of the polynomial-time eigenvector method gets close to the optimal threshold value of the exponential-time maximum likelihood exhaustive search. In this section we provide a positive answer to this question using the information theoretic Shannon bound [8]. Specifically, we show that the threshold probability for the eigenvector method is asymptotically larger by just a multiplicative factor compared to the threshold probability of the optimal recovery algorithm. The multiplicative factor is a function of the angular discretization resolution, but not of n and m. The eigenvector method becomes less optimal as the discretization resolution improves.

We start the analysis by recalling that, from the information theoretic point of view, the uncertainty in the values of the angles is measured by their entropy. The noisy offset measurements carry some bits of information on the angle values, thereby decreasing their uncertainty, which is measured by the conditional entropy that we now need to estimate.

The angles θ1, …, θn can take any real value in the interval [0, 2π). However, an infinite number of bits is required to describe real numbers, so we cannot hope to determine the angles with arbitrary precision. Moreover, the offset measurements are often also discretized. We therefore seek to determine the angles only up to some discretization precision 2π/L, where L is the number of subintervals obtained by dividing the unit circle into L equally sized pieces.

Before observing any of the offset measurements, the angles are uniformly distributed on {0, 1, …, L − 1}, that is, each of them falls with equal probability 1/L to any of the L subintervals. It follows that the entropy of the i’th angle θi is given by

$$H(\theta_i) = -\sum_{l=0}^{L-1} \frac{1}{L}\log_2\frac{1}{L} = \log_2 L, \quad \text{for } i = 1, 2, \ldots, n. \tag{63}$$

We denote by θn = (θ1, …, θn) the vector of angles. Since θ1, …, θn are independent, their joint entropy H(θn) is given by

$$H(\theta^n) = \sum_{i=1}^{n} H(\theta_i) = n\log_2 L, \tag{64}$$

reflecting the fact that the configuration space is of size $L^n = 2^{n\log_2 L}$.

Let δij be the random variable for the outcome of the noisy offset measurement of θi and θj. The random variable δij is also discretized and takes values in {0, 1, …, L − 1}. We denote by δm = (δi1j1, …, δimjm) the vector of all offset measurements. Conditioned on the values of θi and θj, the random variable δij has the following conditional probability distribution

$$\Pr\{\delta_{ij} \mid \theta_i, \theta_j\} = \begin{cases} \frac{1-p}{L} & \delta_{ij} \ne \theta_i - \theta_j \mod L, \\ p + \frac{1-p}{L} & \delta_{ij} = \theta_i - \theta_j \mod L, \end{cases} \tag{65}$$

because with probability 1 − p the measurement δij is an outlier that takes each of the L possibilities with equal probability 1/L, and with probability p it is a good measurement that equals θi − θj. It follows that the conditional entropy H(δij | θi, θj) is

$$H(\delta_{ij} \mid \theta_i, \theta_j) = -(L-1)\frac{1-p}{L}\log_2\frac{1-p}{L} - \Big(p + \frac{1-p}{L}\Big)\log_2\Big(p + \frac{1-p}{L}\Big). \tag{66}$$

We denote this entropy by H(L, p) and its deviation from log2 L by I(L, p), that is,

$$H(L,p) \equiv -(L-1)\frac{1-p}{L}\log_2\frac{1-p}{L} - \Big(p + \frac{1-p}{L}\Big)\log_2\Big(p + \frac{1-p}{L}\Big) \tag{67}$$

and

$$I(L,p) \equiv \log_2 L - H(L,p). \tag{68}$$
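In code, H(L, p) and I(L, p) are one-liners (our sketch):

```python
import numpy as np

def H_Lp(L, p):
    """Conditional entropy H(L, p) of one offset measurement, equation (67)."""
    q = (1.0 - p) / L
    return -(L - 1) * q * np.log2(q) - (p + q) * np.log2(p + q)

def I_Lp(L, p):
    """Information I(L, p) = log2(L) - H(L, p) carried by one measurement, equation (68)."""
    return np.log2(L) - H_Lp(L, p)

print(I_Lp(2, 0.1), I_Lp(4, 0.1))   # small for small p: a noisy offset says little
```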

Without conditioning, the random variable δij is uniformly distributed on {0, …, L − 1} and has entropy

$$H(\delta_{ij}) = \log_2 L. \tag{69}$$

It follows that the mutual information I(δij; θi, θj) between the offset measurement δij and the angle values θi and θj is

$$I(\delta_{ij}; \theta_i, \theta_j) = H(\delta_{ij}) - H(\delta_{ij} \mid \theta_i, \theta_j) = \log_2 L - H(L,p) = I(L,p). \tag{70}$$

This mutual information measures the reduction in the uncertainty of the random variable δij from knowledge of θi and θj. Due to the symmetry of the mutual information,

$$I(\delta_{ij}; \theta_i, \theta_j) = H(\delta_{ij}) - H(\delta_{ij} \mid \theta_i, \theta_j) = H(\theta_i, \theta_j) - H(\theta_i, \theta_j \mid \delta_{ij}), \tag{71}$$

the mutual information is also the reduction in uncertainty of the angles θi and θj given the noisy measurement of their offset δij. Thus,

$$H(\theta_i, \theta_j \mid \delta_{ij}) = H(\theta_i, \theta_j) - I(\delta_{ij}; \theta_i, \theta_j). \tag{72}$$

Similarly, given all m offset measurements δm, the uncertainty in θn is given by

$$H(\theta^n \mid \delta^m) = H(\theta^n) - I(\delta^m; \theta^n), \tag{73}$$

with

$$I(\delta^m; \theta^n) = H(\delta^m) - H(\delta^m \mid \theta^n). \tag{74}$$

A simple upper bound for this mutual information is obtained by explicit evaluation of the conditional entropy H(δm|θn) combined with a simple upper bound on the joint entropy term H(δm). First, note that given the values of θ1, …, θn, the offsets become independent random variables. That is, knowledge of δi1j1 (given θi1, θj1) does not give any new information on the value of δi2j2 (given θi2, θj2). The conditional probability distribution of the offsets is completely determined by (65), and the conditional entropy is therefore the sum of m identical entropies of the form (66)

$$H(\delta^m \mid \theta^n) = mH(L,p). \tag{75}$$

Next, bounding the joint entropy H(δm) by the logarithm of its configuration space size $L^m$ yields

$$H(\delta^m) \le m\log_2 L. \tag{76}$$

Note that this simple upper bound ignores the dependencies among the offsets which we know to exist, as implied, for example, by the triplet consistency relation (17). As such, (76) is certainly not a tight bound, but still good enough to prove our claim about the nearly optimal performance of the eigenvector method.

Plugging (75) and (76) in (74) yields the desired upper bound on the mutual information

$$I(\delta^m; \theta^n) \le m\log_2 L - mH(L,p) = mI(L,p). \tag{77}$$

Now, substituting the bound (77) and the equality (64) in (73) gives a lower bound for the conditional entropy

$$H(\theta^n \mid \delta^m) \ge n\log_2 L - mI(L,p). \tag{78}$$

We may interpret this bound in the following way. Before seeing any offset measurement the entropy of the angles is n log2 L, and each of the m offset measurements can decrease the conditional entropy by at most I(L, p), the information that it carries.

The bound (78) demonstrates, for example, that for fixed n, p and L, the conditional entropy is bounded from below by a linearly decreasing function of m. It follows that unless m is large enough, the uncertainty in the angles will be too large. Information theory says that a successful recovery of all θ1, …, θn is possible only when their uncertainty, as expressed by the conditional entropy, is small enough. The last statement can be made precise by Fano's inequality and Wolfowitz' converse, also known as the weak and strong converse theorems to the coding theorem, which provide a lower bound on the error probability in terms of the conditional entropy; see, e.g., [8, Chapter 8.9, pages 204–207] and [16, Chapter 5.8, pages 173–176].

In the language of coding, we may think of θn as a codeword that we are trying to decode from the noisy vector of offsets δm, which is probabilistically related to θn. The codeword θn is originally uniformly distributed on $\{1, 2, \ldots, 2^{n\log_2 L}\}$, and from δm we estimate θn as one of the $2^{n\log_2 L}$ possibilities. Let the estimate be θ̂n and define the probability of error as Pe = Pr{θ̂n ≠ θn}. Fano's inequality [8, Lemma 8.9.1, page 205] gives the following lower bound on the error probability

$$H(\theta^n \mid \delta^m) \le 1 + P_e\, n\log_2 L. \tag{79}$$

Combining (79) with the lower bound for the conditional entropy (78) we obtain a weak lower bound on the error probability

$$P_e \ge 1 - \frac{m}{n}\frac{I(L,p)}{\log_2 L} - \frac{1}{n\log_2 L}. \tag{80}$$

This lower bound on the probability of error applies to all decoding algorithms, not just the eigenvector method. For large n, we see that for any β < 1,

$$\frac{m}{n}\frac{I(L,p)}{\log_2 L} < \beta \implies P_e \ge 1 - \beta + o(1). \tag{81}$$

We are mainly interested in the limit m, n → ∞ and p → 0 with L being fixed. The Taylor expansion of I(L, p) (given by (67)–(68)) near p = 0 reads

$$I(L,p) = \tfrac{1}{2}(L-1)p^2 + O(p^3). \tag{82}$$

Combining (81) and (82) we obtain that

$$p = \sqrt{\frac{n}{m}}\sqrt{\frac{2\beta\log_2 L}{L-1}} \implies P_e \ge 1 - \beta + o(1), \quad \text{as } n, m \to \infty,\ n/m \to 0. \tag{83}$$

Note that n/m → 0, because m ≳ n log n is required to ensure the connectivity of the measurement graph G with high probability. The bound (83) was derived using the weak converse theorem (Fano's inequality). It is also possible to show that the probability of error goes to 1 exponentially fast (using Wolfowitz' converse and the Chernoff bound, see [16, Theorem 5.8.5, pages 173–176]).

The above discussion shows that no decoding algorithm can have a small probability of error for values of p below the threshold probability $p_c^{inf}$ given by

$$p_c^{inf} = \sqrt{\frac{n}{m}}\sqrt{\frac{2\log_2 L}{L-1}}. \tag{84}$$

Note that for L = 2, the threshold probability $p_c^{eig} = 1/\sqrt{n}$ of the eigenvector method in the complete graph case, for which $m = \binom{n}{2}$, is 2 times smaller than $p_c^{inf}$. This is not a violation of information theory, because the fact that the top eigenvector has a non-trivial correlation with the vector of true angles does not mean that all angles are recovered correctly by the eigenvector.

We turn to shed some light on why it is possible to partially recover the angles below the information theoretic bound. The main issue here is that it is perhaps too harsh to measure the success of the decoding algorithm by Pe = Pr{θ̂n ≠ θn}. For example, when the decoding algorithm decodes 999 out of n = 1000 angles correctly, making just a single mistake, we still count it as a failure. It may be more natural to consider the probability of error in the estimation of the individual angles. We proceed to show that this measure of error leads to a threshold probability which is smaller than (84) by just a constant factor.

Let $P_e^{(1)} = \Pr\{\hat\theta_1 \ne \theta_1\}$ be the probability of error in the estimation of θ1. Again, we want to use Fano's inequality to bound the probability of error by bounding the conditional entropy. A simple lower bound on the conditional entropy H(θ1 | δm) is obtained by conditioning on the remaining n − 1 angles

$$H(\theta_1 \mid \delta^m) \ge H(\theta_1 \mid \delta^m, \theta_2, \theta_3, \ldots, \theta_n). \tag{85}$$

Suppose that there are d1 noisy offset measurements of the form θ1 − θj; that is, d1 is the degree of node 1 in the measurement graph G. Let the neighbors of node 1 be j1, j2, …, j_{d1}, with corresponding offset measurements δ1j1, …, δ1j_{d1}. Given the values of all other angles θ2, …, θn, and in particular the values of θj1, …, θj_{d1}, these d1 equations become noisy equations for the single variable θ1. We denote these transformed equations for θ1 alone by δ̃1, …, δ̃_{d1}. All other m − d1 equations do not involve θ1 and therefore carry no information on its value. It follows that

$$H(\theta_1 \mid \delta^m, \theta_2, \theta_3, \ldots, \theta_n) = H(\theta_1 \mid \tilde\delta_1, \ldots, \tilde\delta_{d_1}). \tag{86}$$

We have

$$H(\tilde\delta_1, \ldots, \tilde\delta_{d_1} \mid \theta_1) = d_1 H(L,p), \tag{87}$$

because given θ1 these d1 equations are i.i.d. random variables with entropy H(L, p). Also, a simple upper bound on the entropy of the d1 equations (without conditioning) is

$$H(\tilde\delta_1, \ldots, \tilde\delta_{d_1}) \le d_1\log_2 L, \tag{88}$$

ignoring possible dependencies among the outcomes. From (87)–(88) we get an upper bound for the mutual information between θ1 and the transformed equations

$$I(\theta_1; \tilde\delta_1, \ldots, \tilde\delta_{d_1}) \le d_1\left[\log_2 L - H(L,p)\right] = d_1 I(L,p). \tag{89}$$

Combining (85), (86), (89) and (63) we get

$$H(\theta_1 \mid \delta^m) \ge H(\theta_1 \mid \delta^m, \theta_2, \theta_3, \ldots, \theta_n) = H(\theta_1 \mid \tilde\delta_1, \ldots, \tilde\delta_{d_1}) = H(\theta_1) - I(\theta_1; \tilde\delta_1, \ldots, \tilde\delta_{d_1}) \ge \log_2 L - d_1 I(L,p). \tag{90}$$

This lower bound on the conditional entropy translates, via Fano's inequality, into a lower bound on the probability of error $P_e^{(1)}$, and it follows that

$$d_1 I(L,p) > \log_2 L \tag{91}$$

is a necessary condition for having a small $P_e^{(1)}$. Similarly, the condition for a small probability of error in decoding θi is

$$d_i I(L,p) > \log_2 L, \tag{92}$$

where di is the degree of vertex i in the measurement graph. This condition suggests that we should have more success in decoding angles of high degree. The average degree in a graph with n vertices and m edges is $\bar d = \frac{2m}{n}$. The condition for successful decoding of angles with degree $\bar d$ is

$$\frac{2m}{n} I(L,p) > \log_2 L. \tag{93}$$

In particular, this would be the condition for all vertices in a regular graph, or in a graph whose degree distribution is concentrated near $\bar d$.

Substituting the Taylor expansion (82) into (93) results in the condition

$$p > \sqrt{\frac{n}{m}}\sqrt{\frac{\log_2 L}{L-1}}. \tag{94}$$

This means that successful decoding of the individual angles may be possible already for $p > p_c^{ind}$, where

$$p_c^{ind} = \sqrt{\frac{n}{m}}\sqrt{\frac{\log_2 L}{L-1}}, \tag{95}$$

but the estimation of the individual angles must contain some error when $p < p_c^{ind}$. Note that $p_c^{ind} < p_c^{inf}$, so while for p values between $p_c^{ind}$ and $p_c^{inf}$ it is impossible to successfully decode all angles, it may still be possible to decode some of them.

In the complete graph case, comparing the threshold probability of the eigenvector method $p_c^{eig} = 1/\sqrt{n}$ given by (36) with the information theoretic threshold probability $p_c^{ind}$ of (95), below which no algorithm can successfully recover individual angles, we find that their ratio is asymptotically independent of n and m:

$$\frac{p_c^{eig}}{p_c^{ind}} = \sqrt{\frac{L-1}{2\log_2 L}} + o(1). \tag{96}$$

Note that the threshold probability $p_c^{eig}$ is smaller than $p_c^{ind}$ for L ≤ 6. Thus, we may regard the eigenvector method as a very successful recovery algorithm for offset equations with a small modulus L.

For L ≥ 7, equation (96) implies a gap between the threshold probabilities $p_c^{eig}$ and $p_c^{ind}$, suggesting that the exhaustive exponential search for the maximum likelihood would perform better than the polynomial time eigenvector method. Note, however, that the gap would be significant only for very large values of L that correspond to very fine angular resolutions. For example, even for L = 100 the threshold probability of the eigenvector method would only be $\sqrt{\frac{99}{2\log_2 100}} \approx 2.73$ times larger than that of the maximum likelihood. The exponential complexity $O(mL^n)$ of the exhaustive search for the maximum likelihood makes it impractical even for moderate-scale problems. On the other hand, the eigenvector method has a polynomial running time and can handle large scale problems with relative ease.

6 Connection with Max-2-Lin mod L and Unique Games

The angular synchronization problem is related to the combinatorial optimization problem Max-2-Lin mod L for maximizing the number of satisfied linear equations mod L with exactly 2 variables in each equation, because the discretized offset equations θi − θj = δij mod L are exactly of this form. Max-2-Lin mod L is a problem mainly studied in theoretical computer science; we prefer the notation “mod L” instead of the more common “mod p” to avoid confusion between the size of the modulus and the proportion of good measurements.

Note that a random assignment of the angles would satisfy a 1/L fraction of the offset equations. Andersson, Engebretsen, and Håstad [3] considered SDP based algorithms for Max-2-Lin mod L and showed that they yield a $\frac{1}{L}(1+\kappa(L))$-approximation, where κ(L) > 0 is a constant that depends on L. In particular, they gave a very weak proven performance guarantee of $\frac{1}{L}(1+10^{-8})$, though they concluded that their bounds can most likely be improved significantly. Moreover, for L = 3 they numerically found the approximation ratio to be $\frac{1}{1.27} \approx 0.79$, and later Goemans and Williamson [19] proved a 0.793733-approximation. The SDP based algorithms in [3] are similar in their formulation to the SDP based algorithm of Frieze and Jerrum for Max-k-Cut [14], but with a different rounding procedure. In these SDP models, L vectors are assigned to each of the n angle variables, so that the total number of vectors is nL. The resulting nL × nL matrix of inner products is required to be positive semidefinite, along with another set of O(n²L²) linear equality and inequality constraints. Due to the large size of the inner product matrix and the large number of constraints, our numerical experiments with these SDP models were limited to relatively small problems (such as n = 20 and L = 7), from which it was difficult to get a good understanding of their performance. In the small scale problems that we did manage to test, we did not find any supporting evidence that these SDP algorithms perform consistently better than the eigenvector method, despite their extensive running times and memory requirements. For our SDP experiments we used the software SDPT3 [35, 37] and SDPLR [5] in MATLAB. In [3] it is also shown that it is NP-hard to approximate Max-2-Lin mod L within a constant ratio, independent of L. Thus, we should expect an L-dependent gap similar to (96) for any polynomial time algorithm, not just for the eigenvector method.

Max-2-Lin is an instance of what is known as unique games [10], described below. One distinguishing feature of the offset equations is that every constraint corresponds to a bijection between the values of the associated variables. That is, for every possible value of θi, there is a unique value of θj that satisfies the constraint θi − θj = δij. Unique games are systems of constraints, a generalization of the offset equations, that have this uniqueness property, so that every constraint corresponds to some permutation.

As in the setting of offset equations, instances of unique games where all constraints are satisfiable are easy to handle. Given an instance where a 1 − ε fraction of constraints is satisfiable, the Unique Games Conjecture (UGC) of Khot [26] says that it is hard to satisfy even a γ > 0 fraction of the constraints. The UGC has been shown to imply a number of inapproximability results for fundamental problems that seem difficult to obtain by more standard complexity assumptions. Note that in our angular synchronization problem the fraction of constraints that are satisfiable is $1-\varepsilon = p + \frac{1-p}{L}$.

Charikar, Makarychev and Makarychev [6] presented improved approximation algorithms for unique games. For instances with domain size L where the optimal solution satisfies a 1 − ε fraction of all constraints, their algorithms satisfy roughly a $L^{-\varepsilon/(2-\varepsilon)}$ and a $1 - O(\sqrt{\varepsilon\log L})$ fraction of all constraints. Their algorithms are based on SDP, also with an underlying inner product matrix of size nL × nL, but their constraints and rounding procedure are different from those of [3]. Given the results of [27], the algorithms in [6] are near optimal if the UGC is true; that is, any improvement (beyond low order terms) would refute the conjecture. We have not tested their SDP based algorithm in practice because, like the SDP of [3], it is also expected to be limited to relatively small scale problems.

7 Summary and Further Applications

In this paper we presented an eigenvector method and an SDP approach for solving the angular synchronization problem. We used random matrix theory to prove that the eigenvector method finds an accurate estimate for the angles even in the presence of a large number of outlier measurements.

The idea of synchronization by eigenvectors can be applied to other problems exhibiting a group structure and noisy measurements of ratios of group elements. In this paper we specialized to the synchronization problem over the group SO(2). In the general case we may consider a group G other than SO(2), for which we have good and bad measurements gij of ratios between group elements

$$g_{ij} = g_i g_j^{-1}, \quad g_i, g_j \in G. \tag{97}$$

For example, in the general case, the triplet consistency relation (17) simply reads

$$g_{ij}\, g_{jk}\, g_{ki} = g_i g_j^{-1}\, g_j g_k^{-1}\, g_k g_i^{-1} = e, \tag{98}$$

where e is the identity element of G.

Whenever the group G is compact and has a complex or real representation (for example, the rotation group SO(3) has a real representation using 3 × 3 rotation matrices), we may construct a Hermitian matrix that is a matrix of matrices: the ij element is either the matrix representation of the measurement gij or the zero matrix if there is no direct measurement for the ratio of gi and gj. Once the matrix is formed, one can compute its top eigenvectors (or solve the corresponding SDP) and estimate the group elements from them.
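As a concrete illustration for G = SO(3), here is a minimal sketch (ours, assuming noiseless or mildly noisy ratio measurements, and using an SVD projection for the rounding step):

```python
import numpy as np

def so3_synchronize(n, ratios):
    """ratios[(i, j)] = R_i @ R_j.T (3x3 numpy arrays) for measured pairs i < j.
    Estimates R_1, ..., R_n from the top three eigenvectors of the 3n x 3n
    block measurement matrix; the solution is defined up to one global rotation."""
    H = np.zeros((3 * n, 3 * n))
    for (i, j), Rij in ratios.items():
        H[3*i:3*i+3, 3*j:3*j+3] = Rij
        H[3*j:3*j+3, 3*i:3*i+3] = Rij.T            # g_ji = g_ij^{-1} = R_ij^T
    _, evecs = np.linalg.eigh(H)
    U = evecs[:, -3:]                              # top three eigenvectors, block-stacked
    if np.linalg.det(U[:3, :]) < 0:                # fix the global reflection once
        U[:, -1] *= -1.0
    rotations = []
    for i in range(n):
        u, _, vt = np.linalg.svd(U[3*i:3*i+3, :])  # project each block onto the
        rotations.append(u @ vt)                   # nearest rotation (polar factor)
    return rotations
```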

In some cases the eigenvector and the SDP methods can be applied even when there is only partial information for the group ratios. This problem arises naturally in the determination of the three-dimensional structure of a macromolecule in cryo-electron microscopy [12]. In [32] we show that the common lines between projection images give partial information for the group ratios between elements in SO(3) that can be estimated accurately using the eigenvector and SDP methods. In [33] we explore the close connection between the angular synchronization problem and the class averaging problem in cryo-electron microscopy [12]. Other possible applications of the synchronization problem over SO(3) include the distance geometry problem in NMR spectroscopy [42, 21] and the localization of sensor networks [4, 31].

The eigenvector method can also be applied to non-compact groups that can be “compactified”. For example, consider the group of real numbers ℝ with addition. One may consider the synchronization problem of clocks that measure noisy time differences of the form

$$t_i - t_j = t_{ij}, \quad t_i, t_j \in \mathbb{R}. \tag{99}$$

We compactify the group ℝ by mapping it to the unit circle via $t \mapsto e^{\iota\omega t}$, where ω ∈ ℝ is a parameter to be chosen not too small and not too large, as we now explain. There may be two kinds of measurement errors in (99). The first kind is a small discretization error (e.g., a small Gaussian noise) of typical size Δ. The second is a large error that can be regarded as an outlier. For example, in some practical application an error of size 10Δ may be considered an outlier. We therefore want ω to satisfy $\omega \gg \frac{1}{10}\Delta^{-1}$ (not too small) and $\omega \ll \Delta^{-1}$ (not too large), so that when constructing the matrix

H_{ij} = \begin{cases} e^{ιω t_{ij}} & \{i, j\} ∈ E, \\ 0 & \{i, j\} ∉ E, \end{cases} (100)

each good equation will contribute approximately 1, while the contributions of the bad equations will be uniformly distributed on the unit circle. One may even try several different values of the “frequency” ω, in analogy to the Fourier transform. An overdetermined linear system of the form (99) can also be solved by least squares, which is also the maximum likelihood estimator when the measurement errors are Gaussian. In the many-outliers model, however, the contribution of the outlier equations dominates the sum of squared errors: each outlier equation with error 10Δ contributes as much as 100 good equations with error Δ, since (10Δ)² = 100Δ². The compactification of the group combined with the eigenvector method has the appealing effect of reducing the impact of the outlier equations. This may open the way for the eigenvector method based on (100) to be useful for the surface reconstruction problems in computer vision [13, 1] and optics [30], where current methods succeed only in the presence of a limited number of outliers.
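
As a toy illustration of (99)–(100) (a sketch with made-up sizes and noise levels, not the paper's experiments), the following builds H for n clocks with 70% outlier offsets and measures how well the top eigenvector recovers the phases e^{ιωt_i}:

    import numpy as np

    rng = np.random.default_rng(1)
    n, delta, p_out = 200, 0.01, 0.7
    omega = 0.3 / delta                    # between (1/10)*delta**(-1) and delta**(-1)

    t = rng.uniform(0.0, 1.0, n)           # true clock times (arbitrary units)
    H = np.zeros((n, n), dtype=complex)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p_out:       # outlier: offset carries no information
                t_ij = rng.uniform(-1.0, 1.0)
            else:                          # good: small discretization error of size delta
                t_ij = t[i] - t[j] + delta * rng.standard_normal()
            H[i, j] = np.exp(1j * omega * t_ij)
            H[j, i] = np.conj(H[i, j])

    _, vecs = np.linalg.eigh(H)
    v = vecs[:, -1]                        # top eigenvector (unit norm)
    z = np.exp(1j * omega * t) / np.sqrt(n)
    # Correlation close to 1 means the phases are recovered up to a global phase;
    # the times themselves are then determined modulo 2*pi/omega.
    print("correlation =", abs(np.vdot(z, v)))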

Acknowledgments

The author would like to thank Yoel Shkolnisky, Fred Sigworth and Ronald Coifman for many stimulating discussions regarding the cryo-electron microscopy problem; Boaz Barak for references to the vast literature on Max-2-Lin mod L and unique games; Amir Bennatan for pointers to the weak and strong converse theorems to the coding theorem; Robert Ghrist and Michael Robinson for valuable discussions at UPenn and for the reference to [17]; and Steven (Shlomo) Gortler, Yosi Keller and Ben Sonday for reviewing an earlier version of the manuscript and for their helpful suggestions.

The project described was supported by Award Number DMS-0914892 from the NSF, by Award Number FA9550-09-1-0551 from AFOSR, and by Award Number R01GM090200 from the National Institute of General Medical Sciences. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences or the National Institutes of Health.

References

1. Agrawal AK, Raskar R, Chellappa R. What is the range of surface reconstructions from a gradient field? Computer Vision – ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, May 7–13, 2006, Proceedings, Part IV (Lecture Notes in Computer Science); pp. 578–591.
2. Alon N, Krivelevich M, Vu VH. On the concentration of eigenvalues of random symmetric matrices. Israel Journal of Mathematics. 2002;131(1):259–267.
3. Andersson G, Engebretsen L, Håstad J. A new way of using semidefinite programming with applications to linear equations mod p. Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms; 1999. pp. 41–50.
4. Biswas P, Liang TC, Toh KC, Wang TC, Ye Y. Semidefinite programming approaches for sensor network localization with noisy distance measurements. IEEE Transactions on Automation Science and Engineering. 2006;3(4):360–371.
5. Burer S, Monteiro RDC. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical Programming (Series B). 2003;95(2):329–357.
6. Charikar M, Makarychev K, Makarychev Y. Near-optimal algorithms for unique games. Proceedings of the 38th Annual ACM Symposium on Theory of Computing; 2006. pp. 205–214.
7. Coifman RR, Lafon S. Diffusion maps. Applied and Computational Harmonic Analysis. 2006;21(1):5–30.
8. Cover TM, Thomas JA. Elements of Information Theory. Wiley; New York: 1991.
9. Erdős P, Rényi A. On random graphs. Publicationes Mathematicae. 1959;6:290–297.
10. Feige U, Lovász L. Two-prover one-round proof systems: their power and their problems. Proceedings of the 24th ACM Symposium on Theory of Computing; 1992. pp. 733–741.
11. Féral D, Péché S. The largest eigenvalue of rank one deformation of large Wigner matrices. Communications in Mathematical Physics. 2007;272(1):185–228.
12. Frank J. Three-Dimensional Electron Microscopy of Macromolecular Assemblies: Visualization of Biological Molecules in Their Native State. Oxford University Press; 2006.
13. Frankot RT, Chellappa R. A method for enforcing integrability in shape from shading algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1988;10(4):439–451.
14. Frieze A, Jerrum M. Improved approximation algorithms for MAX k-CUT and MAX BISECTION. Algorithmica. 1997;18(1):67–81.
15. Füredi Z, Komlós J. The eigenvalues of random symmetric matrices. Combinatorica. 1981;1:233–241.
16. Gallager RG. Information Theory and Reliable Communication. Wiley; New York: 1968.
17. Giridhar A, Kumar PR. Distributed clock synchronization over wireless networks: algorithms and analysis. 45th IEEE Conference on Decision and Control; 2006. pp. 4915–4920.
18. Goemans MX, Williamson DP. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM. 1995;42(6):1115–1145.
19. Goemans MX, Williamson DP. Approximation algorithms for Max-3-Cut and other problems via complex semidefinite programming. Proceedings of the 33rd Annual ACM Symposium on Theory of Computing; 2001. pp. 443–452.
20. Griffiths DJ. Introduction to Quantum Mechanics. Prentice Hall; NJ: 1994.
21. Havel TF, Wuthrich K. An evaluation of the combined use of nuclear magnetic resonance and distance geometry for the determination of protein conformation in solution. J Mol Biol. 1985;182:281–294. doi: 10.1016/0022-2836(85)90346-8.
22. Horn RA, Johnson CR. Matrix Analysis. Cambridge University Press; 1990.
23. Karp R, Elson J, Estrin D, Shenker S. Optimal and global time synchronization in sensornets. Technical Report, Center for Embedded Networked Sensing, University of California, Los Angeles; 2003.
24. Khorunzhiy O. Rooted trees and moments of large sparse random matrices. Discrete Mathematics and Theoretical Computer Science AC. 2003:145–154.
25. Khorunzhy A. Sparse random matrices: spectral edge and statistics of rooted trees. Advances in Applied Probability. 2001;33(1):124–140.
26. Khot S. On the power of unique 2-prover 1-round games. Proceedings of the ACM Symposium on Theory of Computing; 2002. pp. 767–775.
27. Khot S, Kindler G, Mossel E, O'Donnell R. Optimal inapproximability results for MAX-CUT and other two-variable CSPs? SIAM Journal on Computing. 2007;37(1):319–357.
28. Natterer F. The Mathematics of Computerized Tomography. SIAM: Society for Industrial and Applied Mathematics, Classics in Applied Mathematics; 2001.
29. Péché S. The largest eigenvalues of small rank perturbations of Hermitian random matrices. Probability Theory and Related Fields. 2006;134(1):127–174.
30. Rubinstein J, Wolansky G. Reconstruction of optical surfaces from ray data. Optical Review. 2001;8(4):281–283.
31. Singer A. A remark on global positioning from local distances. Proceedings of the National Academy of Sciences. 2008;105(28):9507–9511. doi: 10.1073/pnas.0709842104.
32. Singer A, Shkolnisky Y. Three-dimensional structure determination from common lines in cryo-EM by eigenvectors and semidefinite programming. doi: 10.1137/090767777. (submitted for publication)
33. Singer A, Shkolnisky Y, Hadani R. Viewing angle classification of cryo-electron microscopy images using eigenvectors. doi: 10.1137/090778390. (submitted for publication)
34. Soshnikov A. Universality at the edge of the spectrum in Wigner random matrices. Communications in Mathematical Physics. 1999;207:697–733.
35. Toh KC, Todd MJ, Tutuncu RH. SDPT3 — a Matlab software package for semidefinite programming. Optimization Methods and Software. 1999;11:545–581.
36. Tracy CA, Widom H. Level-spacing distributions and the Airy kernel. Communications in Mathematical Physics. 1994;159(1):151–174.
37. Tutuncu RH, Toh KC, Todd MJ. Solving semidefinite-quadratic-linear programs using SDPT3. Mathematical Programming (Series B). 2003;95:189–217.
38. Vandenberghe L, Boyd S. Semidefinite programming. SIAM Review. 1996;38(1):49–95.
39. Watts DJ, Strogatz SH. Collective dynamics of small-world networks. Nature. 1998;393:440–442. doi: 10.1038/30918.
40. Wigner EP. Characteristic vectors of bordered matrices with infinite dimensions. Annals of Mathematics. 1955;62:548–564.
41. Wigner EP. On the distribution of the roots of certain symmetric matrices. Annals of Mathematics. 1958;67:325–328.
42. Wuthrich K. NMR studies of structure and function of biological macromolecules (Nobel Lecture). J Biomol NMR. 2003;27:13–39. doi: 10.1023/a:1024733922459.
