Abstract
Single particle cryo-electron microscopy (EM) is a method for determining the 3-D structure of macromolecules from many noisy 2-D projection images of individual macromolecules whose orientations and positions are random and unknown. The problem of orientation assignment for the images motivated work on general multireference alignment. The recently introduced non-unique games framework provides a representation theoretic approach to alignment over compact groups, and offers a convex relaxation which is formulated as semidefinite programs with certificates of global optimality under certain circumstances. One of the great opportunities in cryo-EM is studying heterogeneous samples, containing two or more distinct classes or conformations of molecules. Taking advantage of this opportunity presents an algorithmic challenge: determining both the class and orientation of each particle. We generalize multireference alignment to a problem of alignment and classification, and we propose to extend non-unique games to the problem of simultaneous alignment and classification with the goal of simultaneously classifying cryo-EM images and aligning them within their respective classes.
Keywords: alignment, rotation group, classification, synchronization, graph-cut, SDP, cryo-EM, heterogeneity, heterogeneous multireference alignment
1. Introduction
A Non-Unique Game (NUG) is an optimization problem, or a statistical estimation problem, of inferring $N$ elements $g_1, \ldots, g_N$ of a group $G$ by minimizing an expression of the form
(1) $\sum_{i,j=1}^{N} f_{ij}\left(g_i g_j^{-1}\right)$
where $f_{ij}$ are loss functions for particular pairwise relations between elements. This problem arises in multireference alignment (MRA), discussed in [1], and in more general settings discussed in [2]. A convex relaxation of the problem, proposed in [2], can be solved using semidefinite programming (SDP). A NUG over a finite cyclic group, and its SDP relaxation, coincide with classification problems and graph-cut algorithms (e.g., [3, 4]).
One of the applications where NUGs and associated algorithms have been of particular interest is single particle cryo-electron microscopy (cryo-EM) [5, 6], where multiple noisy 2-D projection images of individual, ideally identical, frozen-hydrated 3-D macromolecules, whose orientations and positions are random and unknown, must be aligned over SO(3) as a step in reconstructing the macromolecule. Some of the popular software packages used for the analysis of cryo-EM data (e.g. [7]) alternate between updating an estimate of the scattering density of the molecule and updating an estimate of the particle orientations, and are sensitive to the scattering density chosen for initialization. One approach to ab-initio modeling of molecules from cryo-EM data, discussed in further detail in Section 2.1 and in [8, 9, 10, 11], uses common lines in the Fourier transforms of particle images to align them. The Fourier transforms of these 2-D particle images are slices of the 3-D Fourier transform of the scattering density of the molecule; therefore, every two slices have a common line where they intersect. This approach to ab-initio modeling has been formulated as a NUG [2], with the potential advantage of certificates of global optimality and no need for an approximate structure for initialization; the software is under development at the time of writing this paper. Such ab-initio models are used to initialize other algorithms, and to recover smaller structures.
Cryo-EM was the subject of the 2017 Nobel Prize in Chemistry, due to the breakthroughs that the method facilitated in mapping the structure of molecules that are difficult to crystallize. Cryo-EM does not require the crystallization necessary for X-ray crystallography, and unlike NMR it is not limited to small molecules. An additional great opportunity in cryo-EM is to overcome heterogeneity in the sample [12]: in practice, many samples contain two (or more) distinct types of molecules (or different conformations of the same molecule); methods like X-ray crystallography and NMR, which measure ensembles of particles, have difficulty distinguishing between these different types. In the case of heterogeneous samples, we know neither the orientation nor the conformation of the molecule in each particle image; it is therefore natural to ask how MRA can be accomplished before classification of the particle images, or how classification can be accomplished before the particle images are aligned.
In this paper we discuss the problem of joint alignment and classification, referred to as heterogeneous MRA, and propose to solve the two tasks simultaneously. This approach is based on the observation that both alignment and classification are problems over compact groups, and that the direct product of these groups is also a compact group.
We reformulate the problem as an optimization problem over the direct product of the groups, and reduce it to a NUG. In addition, we discuss some of the symmetries in the problem, which are exploited to reduce the size of the resulting SDPs. Furthermore, we propose an approach for controlling the size of the classes. The approach can be generalized to simultaneous alignment and parametrization, in the case of continuous heterogeneity (which will be discussed in a future paper). We demonstrate the applicability of this approach with examples of simultaneous alignment and classification over SO(2), comparing the quality of classification to a method of alignment-free classification based on invariant features.
Our approach is also applicable to other recently proposed methods of optimization over groups (e.g. [13, 14, 15]). The idea introduced in this work to treat alignment and classification/heterogeneity as two sides of the same coin in MRA has since been generalized to algorithms that take other perspectives on MRA and heterogeneity (inter alia [16, 17, 18, 19]).
This work suggests that the NUG solver for cryo-EM (developed for the homogeneous case) can be extended to solve the alignment and classification problems simultaneously in the case of heterogeneous cryo-EM samples; the algorithm inherits the properties of the NUG solver, such as certificates of global optimality in some cases and not requiring an approximate structure for initialization. Analogously to the role of ab-initio modeling in the homogeneous case, we envision this type of algorithm being used in ab-initio modeling of heterogeneous samples, providing multiple 3-D structures, or classification and alignment, as initialization for refinement algorithms.
This paper is organized as follows. In Sections 2.1 and 2.2 we present a brief overview of cryo-EM as motivation for the discussion of simultaneous alignment and classification. The remainder of Section 2 summarizes some standard results used in this paper, as well as some previous work on NUGs. Section 3 contains a more detailed discussion of the problem, and the derivation of the main arguments in this paper. In Section 4 we propose SDP algorithms for simultaneous alignment and classification. Section 5 contains experimental results for the case of simultaneous alignment and classification over SO(2). In Section 6 we summarize our conclusions and briefly discuss generalizations and future work.
2. Preliminaries
2.1. The Cryo-EM Problem
Electron microscopy is an important tool for recovering the 3-D structure of molecules. Of particular interest in the context of this paper is single particle reconstruction (SPR), and more specifically, cryo-EM, where multiple noisy 2-D projections, ideally of identical particles in different orientations, are used in order to recover the 3-D structure. The following formula is a simplified imaging model of SPR
(2) |
where , is some random rotation matrix in SO(3), is the scattering density of the molecule, and is the projection operator. In other words, the model is that the molecule is rotated in a random direction, and the image obtained is the top-view projection of the rotated molecule, integrating out the axis. Indeed, one of the characteristic properties of cryo-EM SPR that sets it apart from other tomography techniques is that the orientation of the molecule in each image is unknown in cryo-EM, whereas in other tomography techniques the rotation angles are typically recorded with the measurements. The analysis of cryo-EM images is further complicated by extremely high level of noise, far exceeding the signal in magnitude, which makes it difficult not only to analyze the particles in the images, but also to locate the particles in the micrographs produced. Sample images are presented in fig. 1. More detailed discussions of these challenges, and various other challenges, such as the contrast transfer functions (CTF) applied to the images in the imaging process, can be found, for example, in [5].
The reconstruction of the molecule (or, more precisely, of its scattering density) from the images obtained in cryo-EM requires an estimate of the rotation angles of the images. The Fourier slice theorem (see, for example, [20]) provides a way to estimate these rotations from the common lines between the images (see, for example, [8, 9, 10, 11], and fig. 2). In the context of this paper, we assume that for every pair of images $i$ and $j$, we have some function $f_{ij}(g)$ which corresponds to the “incompatibility” between the images $i$ and $j$ for every relative orientation $g \in SO(3)$; this function is a measure of the discrepancy between the radial line in the Fourier transform of image $i$ and the radial line in the Fourier transform of image $j$ which would have corresponded to the common line between the plane of $i$ and the plane of $j$, if the relative orientation of the two images had been $g$. Had there not been noise, we would have expected that $f_{ij}(g) = 0$ for the true relative rotation $g = g_i g_j^{-1}$ between image $i$ and image $j$, and $f_{ij}(g) > 0$ for every other $g$ (in fact, $f_{ij}(g) = 0$ for every $g$ that yields the same common lines for the pair of images as $g_i g_j^{-1}$, since various rotations can yield the same common line; the ambiguity is resolved, up to reflections, only by adding a third image). In practice, due to the high levels of noise, $f_{ij}(g)$ need not be 0 at $g = g_i g_j^{-1}$, and in fact, $f_{ij}$ may not even be minimized at $g_i g_j^{-1}$. However, the expected value of $f_{ij}(g)$ is lower for the true relative rotation than it is for other relative rotations. For more details about this loss function in the context of this paper, see [2].
2.2. The Heterogeneity Problem in Cryo-EM
So far, we have assumed that all the molecules being imaged in an experiment are identical copies of each other, so that all the images are projections of identical copies, from different directions. However, in practice, the molecules in a given sample may differ from one another for various reasons. For example, the sample may contain several types of different molecules due to some contamination or feature of the experiment. Alternatively, the molecules which are studied may have several different conformations or states, or some local variability (see example in fig. 3). The heterogeneity may be discrete (e.g. in the case of distinct different molecules) or continuous (in the case of molecules with continuous variability).
When there is heterogeneity in the samples, high resolution reconstruction of the molecules requires not only an estimate of the rotation of each image, but also classification of the images into clusters, each corresponding to a different molecule which is to be reconstructed separately. Some of the existing SPR analysis methods rely on some prior knowledge of the underlying molecules and on iterative processes of estimating the structure of the molecules and matching images to those estimates (e.g. [22, 23, 7]), and others require some method of recovering the rotation of the images although the images reflect mixtures of projections of different molecules (e.g. [24, 25]). A recent independent work [26] proposes to iterate between estimating the orientations and estimating the class labels based on pairwise relations between images.
2.3. Irreducible Representations of Groups
The purpose of sections 2.3 to 2.5 is to briefly review some standard results in group theory and harmonic analysis; more detailed discussions of these facts can be found, inter alia, in [28, 29, 30].
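Suppose that $G$ is a compact group and $f \in L^2(G)$; then, by the Peter-Weyl Theorem [31], the generalized Fourier expansion of $f$ is
(3) $f(g) = \sum_{k=0}^{\infty} d_k \operatorname{tr}\left[ \hat{f}(k) \rho_k(g) \right]$
where the matrices $\rho_k(g)$ are the irreducible representations of $G$, $d_k$ is the dimensionality of the $k$th representation, and the matrices $\hat{f}(k)$ are the Fourier coefficients of $f$, defined by the formula
(4) $\hat{f}(k) = \int_G f(g) \rho_k\left(g^{-1}\right) dg$
with $dg$ the Haar measure on $G$, normalized so that
(5) $\int_G dg = 1$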
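Remark 1. For abelian groups, such as SO(2) (shifts on a circle), the dimensionality of every irreducible representation is $d_k = 1$ for all $k$. However, in SO(3), which is of particular interest in the cryo-EM application, $d_k = 2k + 1$ with $k = 0, 1, 2, \ldots$.
The integration of any irreducible representation with respect to the Haar measure yields the zero matrix, except for the case of the trivial constant irreducible representation $\rho_0(g) = 1$:
(6) $\int_G \rho_k(g) \, dg = \begin{cases} 1 & k = 0 \\ 0 & \text{otherwise} \end{cases}$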
The following are well known properties of irreducible and unitary representations of compact groups:
(7) |
(8) |
2.4. Special Cases: SO(2) and
In the special case where (discrete cyclic group of elements), there is a finite set of irreducible representations, and all the irreducible representations are of dimensionality one (scalar rather than a matrix). The irreducible representations of are
(9) |
The Fourier coefficients of a function over are simply the discrete Fourier transform (DFT) of the function (with the appropriate normalization eq. (5)).
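The relation between Fourier coefficients over a cyclic group and the DFT can be checked numerically. The following is a minimal sketch, not the authors' code: the group order `order`, the random function `f`, and the sign convention `exp(2*pi*i*k*a/order)` for the irreducible representations are assumptions made for illustration, and the opposite sign convention would simply conjugate the coefficients.

```python
import numpy as np

order = 8                      # hypothetical order of the cyclic group
rng = np.random.default_rng(0)
f = rng.standard_normal(order) # an arbitrary real function on the group
a = np.arange(order)           # group elements written as integers

def rho(k, a):
    """Assumed scalar irreducible representation of the cyclic group."""
    return np.exp(2j * np.pi * k * a / order)

# Fourier coefficient k: average of f(a) * rho_k(a^{-1}) over the group,
# i.e. integration against the counting measure normalized to total mass 1.
f_hat = np.array([np.mean(f * rho(k, -a)) for k in range(order)])

# Expansion back into the irreducible representations (all of dimension 1).
ks = np.arange(order)
f_rec = np.array([np.sum(f_hat * rho(ks, x)) for x in a])

print(np.allclose(f_hat, np.fft.fft(f) / order))
print(np.allclose(f_rec, f))
```

With this normalization the coefficients are the output of `np.fft.fft` divided by the group order, which is the "appropriate normalization" that makes the measure of the whole group equal to one.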
In the special case where , there is an infinite set of irreducible representations, and all the irreducible representations are of dimensionality one. The irreducible representations of SO(2) are
(10) |
Remark 2. For the sake of brevity, and with a small abuse of notation, we will use elements of the groups and SO(2) and integers and angles interchangeably. For example, in eq. (10), the variable “a” can denote an element of SO(2) or an angle. Therefore, would mean the same as , with the former in group notation and the latter in angle notation; (where e is the identity element) in group notation means the same as in angle notations. The appropriate interpretation, group element or integers and angles, is obvious from the context or does not matter.
2.5. Direct Products of Groups
The direct product of two compact groups and is also a compact group, which has the elements . In this paper, we are particularly interested in the case .
The product of two elements of is defined in terms of elements in and by the following formula
(11) |
It follows that
(12) |
If is an irreducible representation of and is an irreducible representation of , then , defined by the formula
(13) |
is an irreducible representation of . The irreducible representations of are summarized in table 2; in table 3 we substitute and for the trivial irreducible representations of and respectively. By remark 1, the irreducible representations of abelian groups, such as the irreducible representations of , are one dimensional, so in this special case, the tensor product can be replaced with the trivial product between the scalar valued function and the (possibly) matrix valued function , as summarized in table 4.
Table 2:
Table 3:
Table 4:
2.6. Non-Unique Games (NUG)
Let $G$ be a compact group, and for every $1 \le i, j \le N$ let $f_{ij} : G \to \mathbb{R}$; non-unique games (NUG) are problems of the form eq. (1).
Remark 3. The solutions to non-unique games are not unique: if $g_1, \ldots, g_N$ is a solution, then $g_1 g, \ldots, g_N g$ is also a solution for any $g \in G$, because $(g_i g)(g_j g)^{-1} = g_i g_j^{-1}$. The solution is therefore unique at most up to a global group element; the relative pairwise ratios $g_i g_j^{-1}$ may be unique.
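The global ambiguity described in Remark 3 is easy to observe on a toy instance. The sketch below assumes a cyclic group of hypothetical order `order`, written additively, and random loss tables `f[i, j, d]`; none of these values come from the paper.

```python
import numpy as np

order, n_items = 5, 6          # hypothetical group order and number of elements
rng = np.random.default_rng(1)
# f[i, j, d]: loss of pair (i, j) when the relative element g_i - g_j equals d.
f = rng.random((n_items, n_items, order))

def nug_objective(g):
    """Sum of pairwise losses evaluated at the relative elements g_i - g_j."""
    d = (g[:, None] - g[None, :]) % order
    i, j = np.indices((n_items, n_items))
    return f[i, j, d].sum()

g = rng.integers(0, order, size=n_items)
# Apply every global group element to the whole assignment.
values = [nug_objective((g + s) % order) for s in range(order)]
print(values)
```

All entries of `values` coincide, because a global shift leaves every relative element unchanged; only the pairwise ratios are determined by the objective.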
2.6.1. Fourier Expansion of a NUG, and a Matrix Form
Using the Fourier expansion (see eq. (3)) of ,
(14) |
we rephrase eq. (1) in the Fourier expansion form:
(15) |
For example, in the case of , the Fourier coefficients of are given by its DFT, and the NUG becomes
(16) |
Plugging eq. (7) into eq. (15) yields
(17) |
The same expression can be rewritten in a block matrix form:
(18) |
where,
(19) |
Indeed, the block of the matrix , which we denote by , is
(20) |
Therefore, recovering the matrices which take the above form is equivalent to recovering the ratio between pairs, which allows us to recover up to a global element (see remark 3). In other words, we have “lifted” the problem from the original variables to the block matrices, where each block is associated with the ratio between a pair.
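To make the lifting concrete, the following sketch builds the lifted matrices for SO(2), where every irreducible representation is a scalar. The convention `exp(1j * k * theta)` for the representations, the truncation level `max_k`, and the random angles are assumptions for illustration only.

```python
import numpy as np

n_items, max_k = 5, 3          # hypothetical problem size and truncation
rng = np.random.default_rng(2)
theta = rng.uniform(0.0, 2.0 * np.pi, n_items)  # group elements as angles

def lifted_block_matrix(k, theta):
    """Matrix whose (i, j) entry is rho_k(g_i g_j^{-1}) for SO(2)."""
    v = np.exp(1j * k * theta)          # rho_k(g_i) for every i
    return np.outer(v, v.conj())        # rho_k(g_i) * conj(rho_k(g_j))

X = [lifted_block_matrix(k, theta) for k in range(max_k + 1)]

relative = np.exp(1j * (theta[:, None] - theta[None, :]))
print(np.allclose(X[0], 1.0))                  # trivial representation
print(np.allclose(X[1], relative))             # entries encode relative angles
print(np.linalg.matrix_rank(X[2]))             # rank one
print(np.linalg.eigvalsh(X[2]).min() > -1e-9)  # positive semidefinite
```

Each matrix is Hermitian, rank one, positive semidefinite, and has ones on its diagonal. These are the properties that the convex relaxation retains or relaxes.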
2.6.2. Convex Relaxation of NUG
We would like to convexify the NUG problem in order to use convex optimization theory and algorithms; in this section we consider the convex relaxation of eq. (18) and eq. (19):
(21) |
where the solution matrices are in the convex hull of the matrices defined in eq. (19).
The following SDP relaxation has been proposed in [2]:
(22) |
where,
(23) |
The constraints in eq. (22) are designed to restrict the solution matrices in eq. (22) to the convex hull of the matrices in eq. (19).
Remark 4. When the expansion of the irreducible representations of is infinite, it must be truncated in practice. The implementation of the non-negativity constraint is not trivial. The problem is discussed in [2], where is sampled and a non-negative kernel is applied. In some cases, sum-of-squares (SOS) constraints can also be used. The constraint, and possible improvements of it, are the subject of ongoing work.
3. NUG Formulation for Simultaneous Classification and Alignment
The purpose of this section is to introduce the problem of classification and alignment (heterogeneous MRA), demonstrate that it can be formulated as a NUG, and discuss the properties of the NUG SDP in this case. A simple case of heterogeneous MRA is provided in Section 3.1 as a motivating example, followed by a formal problem formulation in Section 3.2. In Sections 3.3 and 3.4 we review some known aspects of NUG and SDPs from a perspective that is useful in the discussion of the connection between clustering and the convex relaxation of NUG. In Section 3.5, we introduce an extension of NUG that provides control on distributions and class sizes, which is used as an optional component in the remainder of the discussion. In Section 3.6, we write the problem introduced in Section 3.2 explicitly as a NUG, and infer the SDP associated with it. In Sections 3.7 and 3.8 we discuss finer properties of this SDP, generalizing some of the properties discussed in Sections 3.3 and 3.4.
3.1. Motivating Example: Classification and Alignment over SO(2)
In this section we present the problem of MRA over SO(2), and a heterogeneity problem associated with it. This problem turns out to be simpler than the cryo-EM problem in some fundamental ways, which we discuss in section 5: there are tools available for approaching this problem that are not available in cryo-EM. However, in the context of the NUG formulation, the problem has many of the features of the cryo-EM problem.
Suppose that we have some periodic function over SO(2), and suppose that we are given multiple copies of this function, each shifted by some arbitrary angle. An example of such shifted copies is given in fig. 4. If we want to recover the original function (up to cyclic shifts), we may choose an arbitrary copy, because all the copies are identical to the original function up to shifts.
Next, suppose that we have noisy shifted copies of the function (fig. 5(a)). If we wish to approximate the original function (up to shifts), we would align the noisy copies (fig. 5(b)) and then average them to cancel out the noise (fig. 5(c)). Of course, in order to do this we must somehow recover the correct shifts of all the copies together (up to some global shift). In the following sections, we will use a loss function for the different possible pairwise alignments; for each pair of copies, we can define a “compatibility penalty” for each possible alignment, for example (with slight abuse of notation), via the formula
(24) |
An example of such a compatibility loss function is given in fig. 6. When the shifts are unknown, the problem of aligning the signals is a NUG (see [2, 1]).
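A pairwise compatibility penalty of this kind can be sketched for signals sampled on a discrete grid. The example assumes a squared-distance penalty between one copy and every cyclic shift of the other; this is a common choice but not necessarily the exact form of eq. (24). The prototype signal, the shifts, and the noise level are hypothetical, and the noise is kept low so that the minimum lands on the true relative shift.

```python
import numpy as np

n_samples = 64
rng = np.random.default_rng(3)
t = np.arange(n_samples) / n_samples
# Hypothetical prototype signal on the circle.
signal = np.sin(2 * np.pi * t) + 0.5 * np.cos(2 * np.pi * 3 * t)

def noisy_shifted_copy(shift, sigma):
    """Cyclically shift the prototype and add white Gaussian noise."""
    return np.roll(signal, shift) + sigma * rng.standard_normal(n_samples)

def pairwise_penalty(x_i, x_j):
    """Penalty for every cyclic shift s: || x_i - shift_s(x_j) ||^2 (assumed form)."""
    return np.array([np.sum((x_i - np.roll(x_j, s)) ** 2)
                     for s in range(n_samples)])

x1 = noisy_shifted_copy(10, sigma=0.02)
x2 = noisy_shifted_copy(3, sigma=0.02)
f12 = pairwise_penalty(x1, x2)
best_shift = int(np.argmin(f12))
print(best_shift)  # relative shift that best aligns x2 to x1
```

At higher noise levels the minimizer of a single pairwise penalty is often wrong, which is why the penalties of all pairs are combined in one optimization rather than aligned pair by pair.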
In the heterogeneity problem we have a mixture of prototype signals; in this simplified example, let us assume that we have a mixture of noisy shifted versions of two classes of functions and , so that each sample is a shifted noisy version of either or as illustrated in the example in fig. 7(a). If we knew both the class and shift of each sample, we could divide the samples into two classes, and align them within each class (fig. 7(b),(c)), so that we could average within each class and approximate the two original signals (fig. 7 (d),(e)).
We know neither the shift nor the class of the samples; we study the extension of MRA and NUG to this case of alignment in the presence of heterogeneity.
3.2. Problem Formulation
We would like to find the optimal way to divide the samples into classes, so that we can best align them within each class. More formally, we would like to optimize the rotations and classification together:
(25) |
Remark 5. In this formulation, it is typically assumed that the loss is non-negative, and typically larger when and do not belong to the same class, so that there is an incentive to distribute the samples among clusters, and align them within each cluster.
We will also discuss the problem of controlling the distribution to different clusters; for example, we will discuss the case where all the clusters are required to be of equal size:
(26) |
3.3. Ambiguity
In some cases, there is a degree of ambiguity in a solution of a NUG (in addition to the inherent global ambiguity discussed in remark 3). Suppose that one assignment of group elements is a solution of the NUG in eq. (18), with its corresponding lifted matrices, and suppose that there exists another solution, with its own corresponding matrices, that achieves the same optimization objective. We are particularly interested in the case where the second solution cannot be obtained by applying some group element to the first (the case discussed in remark 3), so that in general the two sets of matrices differ. In the convex formulation of the problem in eq. (21), if both sets of matrices are solutions, then so is every convex combination of them, even if there is no “physical” solution which corresponds to that combination. In some cases, where the form of the ambiguity is known, we can use this property to enforce a solution of a certain form. An example is provided in the next section.
3.4. Reducing k-clustering to a NUG
In this section we discuss the NUG formulation of the problem of clustering the vertices of a graph into k communities, to which we refer as k-clustering or k-classification. In particular, we discuss the max-k-cut problem and the balanced version of the problem (where each cluster contains an equal number of vertices). The SDP relaxation of max-k-cut has been studied in [3, 4] and the closely related min-k-cut problem has been studied as a NUG in [2]. For completeness, we present a slightly different formulation and derivation of the max-k-cut problem to illustrate some aspects of the NUG SDP which are useful in the remainder of the discussion of alignment and classification. Since k is often reserved for denoting indices of irreducible representations, we use a separate symbol for the number of clusters.
Given an undirected weighted graph (e.g. fig. 8(a)), the max-k-cut problem is to divide the vertices of a graph into clusters (e.g. fig. 8(b)), cutting the most edges between clusters
(27) |
with the weight of the edge between vertices and . In other words, the problem is to divide the graph into clusters retaining the minimal sum of edge weights:
(28) |
where . We can view the weight of each edge as a measure of incompatibility or “distance,” and attempt to classify the vertices into clusters which are the least incompatible; i.e. the goal is to minimize the sum of intra-cluster weights retained, by finding a clustering that removes as many inter-cluster edges as possible.
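The equivalence between maximizing the cut and minimizing the retained intra-cluster weight can be verified by brute force on a small graph. This is an illustrative sketch with a hypothetical random weight matrix `W`; exhaustive search is only feasible for toy sizes and is not the SDP approach discussed in the paper.

```python
import itertools
import numpy as np

n_vertices, n_clusters = 6, 2   # hypothetical toy problem
rng = np.random.default_rng(4)
W = np.triu(rng.random((n_vertices, n_vertices)), 1)
W = W + W.T                      # symmetric weights, zero diagonal

def retained_weight(labels):
    """Sum of weights of edges whose endpoints share a cluster."""
    same = labels[:, None] == labels[None, :]
    return W[same].sum() / 2     # each undirected edge counted once

def cut_weight(labels):
    """Sum of weights of edges whose endpoints are in different clusters."""
    different = labels[:, None] != labels[None, :]
    return W[different].sum() / 2

total = W.sum() / 2
labelings = [np.array(l)
             for l in itertools.product(range(n_clusters), repeat=n_vertices)]
best_labels = min(labelings, key=retained_weight)
print(best_labels, retained_weight(best_labels), cut_weight(best_labels))
```

Because the retained weight and the cut weight always sum to the total edge weight, the labeling that minimizes one maximizes the other.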
The following SDP relaxation has been proposed in [3, 4],
(29) |
where is the matrix of edge weights. In a solution that corresponds to a “physical” solution (a valid classification, rather than, for example, a convex combination of classifications), if and are in the same cluster, and otherwise. A derivation for the related min-k-cut problem, in the context of NUG, is provided in [2]. We discuss an additional derivation which we will generalize in the following sections.
We consider the group of cyclic shifts. A function over this group can be written explicitly as a vector of length , indexed . We define the function by the following formula
(30) |
where is the weight of the edge between and . We denote by the class assignment of the element, so that
(31) |
or, in group notation
(32) |
where e is the identity element. This is precisely the loss function in eq. (28).
The discrete Fourier transform (DFT) of (with the appropriate choice of normalization) is
(33) |
These coefficients coincide with the coefficients of the expansion of in the irreducible representation of :
(34) |
Rewriting the clustering problem eq. (28) as a NUG over yields
(35) |
and substituting eq. (33) and eq. (35) into the block matrix formulation in eq. (18) yields
(36) |
subject to having the structure in eq. (19). The scalar irreducible representations here are , so that for every , the matrix is an matrix with in position . The matrix is a matrix of the coefficients in the DFT of ; by eq.(33), , for all . For some solution of the NUG, we have for every pair , with
(37) |
where we again use as the group elements and the angle.
After writing the problem in the block matrix form, we turn our attention to the convex version of this formulation (see eq. (21)). In particular, we discuss the ambiguity in the solution, which results in convex combinations of equivalent solutions, as discussed in section 3.3. Obviously, the solution to eq. (28) is unique at most up to any permutation (and not only cyclic shifts) of the labels assigned to each class. For example, Class 1 can be renamed Class 2 and vice versa, without changing the graph cut, as illustrated in fig. 8(b). In other words, the loss function depends only on whether or not and are in the same class, so it is invariant to permutations: for any permutation ,
(38) |
It follows that in the convexified formulation we can average all the different permutations, as discussed in section 3.3. If and are assigned to the same class in the solution, then so (or in integer notation ) and by eq. (37)
(39) |
However, if and are not assigned to the same class in the solution, we can average all the solutions for all permutations, where the solution for a permutation is
(40) |
A simple computation yields the averaged (equally weighted convex combination) solution for all , when and are not assigned to the same class,
(41) |
In other words, is the all-ones matrix, and the matrices for all are equal:
(42) |
with the element of these matrices with :
(43) |
Since is fixed, it can be ignored in the loss term of eq. (36), so the optimization is reduced to
(44) |
Using eq. (42) and eq. (33), the optimization is further reduced to
(45) |
which is scaled to
(46) |
Setting , we have the optimization term in eq. (29), with the other conditions in eq. (29) following from the derivation above.
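The averaging over label permutations used in this section can be reproduced numerically for a small number of classes. The sketch assumes scalar representations of the form `exp(2*pi*i*k*a/n_classes)` for the cyclic group and a hypothetical `n_classes = 4`; it enumerates all permutations of the labels and averages the resulting matrix entry for one pair of vertices.

```python
import itertools
import numpy as np

n_classes = 4   # hypothetical number of classes

def averaged_entry(k, a_i, a_j):
    """Average of rho_k(sigma(a_i) - sigma(a_j)) over all label permutations."""
    values = [np.exp(2j * np.pi * k * (perm[a_i] - perm[a_j]) / n_classes)
              for perm in itertools.permutations(range(n_classes))]
    return np.mean(values)

same_class = [averaged_entry(k, 2, 2) for k in range(n_classes)]
different_class = [averaged_entry(k, 0, 3) for k in range(n_classes)]
print(np.round(same_class, 6))
print(np.round(different_class, 6))
```

For a pair in the same class every entry equals 1. For a pair in different classes the entry for the trivial representation is 1, and the entries for all other representations are equal to each other, with value -1/(n_classes - 1). This matches the lower bound that appears in the standard max-k-cut relaxation.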
3.5. Controlling Cluster Size or Distributions
The purpose of this section is to extend the NUG framework by adding constraints on the distribution of solutions over the group.
In some cases it is useful to restrict the clusters in a graph cut problem to be of equal size (for example, see discussion of min-k-cut in [32]), i.e.
(47) |
The NUG formulation does not have a mechanism to enforce such a constraint. We first consider the extension of the NUG in eq. (29) for the max-k-cut problem to the case of balanced cluster size. We add the constraint that for ,
(48) |
(for , the matrix is the trivial all ones matrix). Indeed, for any valid balanced solution, every vertex has vertices (including itself) in the same cluster, and for these vertices ; every vertex also has vertices in different classes, for these vertices . Therefore, the sum of these elements is 0. This solution resembles the algorithm proposed in [32].
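The zero-sum property of a balanced solution can be checked directly. The sketch assumes the standard max-k-cut normalization, in which the matrix entry is 1 for a pair in the same cluster and -1/(n_classes - 1) otherwise; the sizes `n_points` and `n_classes` are hypothetical.

```python
import numpy as np

n_points, n_classes = 12, 3                      # hypothetical sizes
labels = np.repeat(np.arange(n_classes), n_points // n_classes)  # balanced labels

same = labels[:, None] == labels[None, :]
# Assumed normalization: 1 within a cluster, -1/(n_classes - 1) across clusters.
X = np.where(same, 1.0, -1.0 / (n_classes - 1))

row_sums = X.sum(axis=1)
eigenvalues = np.linalg.eigvalsh(X)
print(row_sums)            # all zeros for a balanced clustering
print(eigenvalues.min())   # non-negative up to rounding
```

An unbalanced labeling yields nonzero row sums, so requiring each row to sum to zero excludes it while keeping every balanced labeling feasible.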
This idea is a special case of a more general framework that enforces constant distribution over the group by enforcing eq. (48). The strict constraint on the distribution can be relaxed to an approximation, and therefore extended beyond discrete groups by relaxing the condition to one of the following constraints
(49) |
(50) |
(51) |
or by adding a similar constraint as a regularizer in the optimization (with the obvious extension where the irreducible representation is a matrix). This approach, which views the irreducible representations and their sum as an approximation of the Haar measure of the group (or appropriate variation when a prior is available), will be discussed in more detail in a future paper.
3.6. The Direct Product of Alignment and Classification (Product NUG)
The purpose of this section is to formulate the problem of simultaneous alignment and classification as a NUG. We revisit eq. (25) and rewrite the summation in the optimization:
(52) |
where
(53) |
With a small abuse of notation, we rewrite the class labels as elements in ; the expression can also be written as (where is the identity element of ), so, we can also write eq. (53) as:
(54) |
We introduce the function , defined as
(55) |
Using the identity eq. (12), we obtain
(56) |
and observe that the resulting loss is now simply a function over the compact direct-product group. Therefore, the expression in eq. (25) is reduced to the explicit form of a NUG
(57) |
The block matrix formulation eq. (18) of this product NUG is
(58) |
where
(59) |
with the Fourier coefficient of corresponding to the irreducible representation , and the dimensionality of that irreducible representation. The irreducible representations of are enumerated in table 4; they are referenced by two indices, and .
As in the general discussion of NUG, we are interested in the convex relaxation of eq. (58):
(60) |
where the solution matrices are in the convex hull of the matrices defined in eq. (59).
The relaxation of the form eq. (22) is
(61) |
In the following sections, we turn our attention to the ambiguities and symmetries of the convexified formulation eq. (60).
3.7. The 0 Order Representation of Alignment, and the Clustering Label Ambiguity
As discussed in section 3.3, when there is ambiguity in the solution of the NUG, it is manifested as convex combinations of solutions in the convexified formulation eq. (60). As discussed in section 3.4, there is ambiguity in the assignment of class labels, which leads to symmetries in the NUG for the clustering problem.
We observe that the irreducible representations of , enumerated in the first row of table 4, are simply the irreducible representations of which appear in the max-k-cut problem, as are the coefficients of the expansion of . Therefore, the same argument used in section 3.4 can be used here to identify the desired form of the first row in the solution of the convex simultaneous alignment and classification problem eq. (60). In fact, the same argument applies to all rows, which can be averaged in the same way; the form of the averaged solution of each block is summarized in table 5, for the two cases: either and are in the same class (a), or they are in different classes (b).
Table 5:
3.8. Inter-Class Invariance
In addition to the class label ambiguity, there is another type of ambiguity which emerges in the simultaneous clustering and alignment product NUG. We observe that the solution is invariant to a group action on one class (without applying the same action to the other classes, so this is not a group action of ).
Lemma 1. Let and . Suppose that a and are some arbitrary class and rotation, and suppose that
(62) |
Then, the objective value in eq. (25) is the same for and :
(63) |
In other words, if is a solution of eq. (25), then so is .
Proof. For any (the clusters that have not been rotated), , so that
(64) |
For , we have
(65) |
so that eq. (64) holds for as well. □
It follows that when , we may average over all inter-class rotations. By eq. (6), using the Haar measure for the possible alignments yields 0 for all elements with . The form of the averaged solution of each block is summarized in table 6, for the two cases: either and are in the same cluster, or they are in different clusters.
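The inter-class invariance and the effect of Haar averaging can be illustrated on a discretized toy problem. The sketch rests on two assumptions that are not taken from the paper's equations: the objective sums the pairwise losses only over pairs assigned to the same class, and rotations are discretized to `n_rot` equally spaced angles. The loss tables are random and hypothetical.

```python
import numpy as np

n_items, n_classes, n_rot = 8, 2, 16   # hypothetical sizes
rng = np.random.default_rng(5)
# f[i, j, r]: loss of pair (i, j) at relative rotation index r.
f = rng.random((n_items, n_items, n_rot))

def objective(classes, rots):
    """Assumed objective: pairwise losses summed over same-class pairs only."""
    r = (rots[:, None] - rots[None, :]) % n_rot
    i, j = np.indices((n_items, n_items))
    same = classes[:, None] == classes[None, :]
    return f[i, j, r][same].sum()

classes = rng.integers(0, n_classes, n_items)
rots = rng.integers(0, n_rot, n_items)
# Rotate only the members of class 0 by a fixed amount.
rots_moved = np.where(classes == 0, (rots + 5) % n_rot, rots)
print(objective(classes, rots), objective(classes, rots_moved))

# Averaging scalar SO(2) representations over all rotations.
angles = 2 * np.pi * np.arange(n_rot) / n_rot
haar_average = [np.mean(np.exp(1j * k * angles)) for k in range(4)]
print(np.round(haar_average, 6))
```

The two objective values agree because relative rotations within each class are unchanged. The averages of the nontrivial representations vanish, which is why the averaged inter-class blocks are zero for every nontrivial alignment representation.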
Table 6:
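To make the averaging step concrete, the following worked equation illustrates it in the simplest setting of SO(2), where the irreducible representations are the characters $e^{ik\theta}$. This is an illustrative special case under that assumption, not a restatement of eq. (6); the symbols $k$, $\theta$, $\rho$ and $G$ are introduced here only for the illustration.

```latex
% Averaging a character of SO(2) over the normalized Haar measure:
\frac{1}{2\pi}\int_0^{2\pi} e^{ik\theta}\,d\theta =
\begin{cases}
  1, & k = 0,\\
  0, & k \neq 0.
\end{cases}
% More generally, for a compact group G with normalized Haar measure dg
% and any nontrivial irreducible representation \rho of G:
\int_G \rho(g)\,dg = 0 .
```

Only the trivial representation survives the averaging, which is why the averaged blocks between different clusters retain a single nonzero component.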
4. Algorithms
Substituting the results of section 3.6 into eq. (22) we obtain the following SDP:
(66) |
The coefficient in the matrix can be obtained from the original alignment problem, when no clustering is required; suppose that the coefficients in that problem are , then for all and , the coefficients are
(67) |
We observe that due to the structure discussed in sections 3.7 and 3.8, regardless of whether or ,
(68) |
Taking these observations into account, eq. (66) is reduced to
(69) |
In fact, the requirement for non-negativity over is redundant, due to the following lemma.
Lemma 2. Suppose that (where is the identity element of ). If the other constraints in eq. (69) are satisfied, then for all ,
(70) |
for all and all .
Proof. Due to the other constraints in eq. (69), for all , we have , so that for all
(71) |
where the last step is due to the fact that for that are not the identity.
For , we have and , so that
(72) |
Using this lemma, eq. (69) is reduced to
(73) |
where is the identity element of .
4.1. Controlling Class Size
When the size of the classes is known to be equal, the constraint eq. (48) of section 3.5 is added to the SDP. Considering all the symmetries, the constraint takes the form
(74) |
4.2. Variables and Constraints Accounting
The purpose of this section is to discuss the number of free variables remaining in the formulation eq. (73), and the number of constraints. We note that the only remaining matrix variables are and . The matrix is the trivial all-ones matrix, and every other matrix is set to be equal to the appropriate matrix of those listed above (see eq. (68)). We observe that the matrix has exactly the same form as the matrix in the max-cut classification SDP, and the constraints on it are similar. The matrices .. have the same form as the matrices in the alignment problem, and also have similar constraints.
Suppose that the overall number of matrix elements (summing over all ) for each pair in the irreducible representation for alignment is (if the expansion is truncated at representations, the number of elements in the case of SO(2) is and in the case of SO(3) it is ), and suppose that the number of elements in the irreducible representations for classification is the number of classes . Then, there would be overall approximately matrix elements in the matrices for all and in the formulation in eq. (66). Instead, the formulation in eq. (73) implies that the remaining variables are and , so that the overall number of matrix elements is ( variables in and one in ). In other words, the number of free variables and constraints in eq. (73) is smaller than in the straightforward formulation that does not take the symmetries into account.
5. Experimental Results
In this section we present experiments with the simplified case of alignment and clustering of noisy functions over SO(2) (also discussed in section 3.1). We generated 4 complex-valued prototype functions over , each low-bandwidth and represented by 11 coefficients in the Fourier domain. For each prototype function we generated 15 copies; each copy was shifted randomly on SO(2), and random noise was added to each of the shifted copies, yielding a dataset of 60 signals. The problem is now to align and cluster the signals in the dataset.
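The dataset construction described above can be sketched as follows. This is a minimal illustration, not the Matlab code used for the experiments: the Gaussian distribution of the prototype coefficients, the Gaussian noise added in the Fourier domain, the value of `noise_std`, the sign convention of the phase, and the function name `make_dataset` are all assumptions made for the example.

```python
import cmath
import math
import random


def make_dataset(n_classes=4, n_copies=15, n_coeffs=11, noise_std=0.1, seed=0):
    """Return (signals, labels, shifts).

    Each signal is a list of n_coeffs complex Fourier coefficients,
    indexed by the frequencies k = -half, ..., half.
    """
    rng = random.Random(seed)
    half = n_coeffs // 2
    freqs = list(range(-half, half + 1))

    # One random band-limited prototype per class (assumed Gaussian).
    prototypes = [
        [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in freqs]
        for _ in range(n_classes)
    ]

    signals, labels, shifts = [], [], []
    for c, proto in enumerate(prototypes):
        for _ in range(n_copies):
            # Random element of SO(2), represented by an angle.
            theta = rng.uniform(0.0, 2.0 * math.pi)
            # Rotation multiplies the k-th coefficient by exp(-i k theta)
            # (sign convention assumed for this sketch).
            shifted = [a * cmath.exp(-1j * k * theta) for a, k in zip(proto, freqs)]
            # Additive complex Gaussian noise (assumed noise model).
            noisy = [
                a + complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
                for a in shifted
            ]
            signals.append(noisy)
            labels.append(c)
            shifts.append(theta)
    return signals, labels, shifts


signals, labels, shifts = make_dataset()
```

With the default arguments this yields 4 × 15 signals of 11 coefficients each; the true labels and shifts are returned only so that the output of an alignment and clustering method can be evaluated against them.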
We implemented the SDP in eq. (73) with balanced classes (eq. (74)) in Matlab, using CVX [33, 34]. For every pair of signals and we compute :
(75) |
where is the signal rotated by . The rotation is implemented by multiplication by the appropriate phase in the Fourier domain. We construct the matrices of coefficients (the matrices for MRA without classification); the element in the position in the matrix is the element in the DFT of .
(76) |
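The computation of the pairwise coefficients described above can be sketched as follows. The sketch assumes a squared-distance loss between one signal and a rotated copy of the other, which may differ from the exact expression in eq. (75); the helper names `rotate`, `pairwise_loss` and `dft`, the grid size, and the sign convention of the phase are hypothetical choices made for the illustration.

```python
import cmath
import math


def rotate(coeffs, freqs, theta):
    """Rotate a band-limited signal by theta via phases in the Fourier domain."""
    return [a * cmath.exp(-1j * k * theta) for a, k in zip(coeffs, freqs)]


def pairwise_loss(xi, xj, freqs, n_grid=32):
    """Sample the assumed loss ||x_i - theta . x_j||^2 on a uniform grid of SO(2)."""
    losses = []
    for m in range(n_grid):
        theta = 2.0 * math.pi * m / n_grid
        rotated = rotate(xj, freqs, theta)
        losses.append(sum(abs(a - b) ** 2 for a, b in zip(xi, rotated)))
    return losses


def dft(values):
    """Normalized discrete Fourier transform of the sampled loss."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * m / n) for m, v in enumerate(values)) / n
        for k in range(n)
    ]


# Small noiseless example: y is x rotated by the grid angle with index 5.
freqs = [-2, -1, 0, 1, 2]
x = [1 + 0j, 2j, 0.5 + 0j, -1 + 1j, 0.3j]
y = rotate(x, freqs, 2.0 * math.pi * 5 / 32)

loss = pairwise_loss(y, x, freqs)
coeffs = dft(loss)                      # entries that would populate the coefficient matrices
best = min(range(len(loss)), key=loss.__getitem__)
```

In this noiseless example the loss vanishes at the grid index of the true relative rotation, and because the signals are band-limited the loss itself is band-limited, so only a few of its DFT coefficients are nonzero.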
The non-negativity constraint is implemented using the Fejér kernel (see [2]).
To study the performance of the algorithm, we examine the classification performance, which we can compare to a benchmark clustering method based on shift-invariant signatures. The auto-correlation and bispectrum [35] of signals are invariant to rotations; therefore, in the absence of noise, we can compute the auto-correlation or bispectrum of each signal in our dataset, and use these as signatures by which to cluster. In the presence of noise, these signatures are distorted, leading to possible errors in clustering. We computed the bispectrum of each signal in the dataset and also solved the SDP for the product NUG of this dataset. In the case of low noise, the classification can be read directly from the matrix ; however, as the noise increases, a rounding procedure is required to recover an approximate classification based on the output of the SDP. For simplicity, we used standard k-means to cluster the signals: first by the bispectral signature of each signal, and then by the columns of the matrix obtained by the SDP (as a simple rounding method for the SDP); we did not enforce equal cluster sizes in the k-means. We measured the fraction of signals that were misclassified. Since the clusters are recovered only up to permutation (even if k-means finds the correct clusters, the class labels are assigned arbitrarily), we computed the minimum error over all permutations of class labels. This experiment examines only the classification properties of the algorithm, but given a good classification the problem is reduced to the NUG of alignment within each class. Special cases of alignment problems allow specialized rounding procedures using the output of this algorithm directly.
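The error measure described above, the misclassification rate minimized over all relabelings of the clusters, can be sketched as follows. The function name `misclassification_rate` is hypothetical, and the brute-force search over permutations is a simple choice that is only practical for a small number of classes such as the 4 used here.

```python
from itertools import permutations


def misclassification_rate(true_labels, predicted_labels, n_classes):
    """Fraction of misclassified items, minimized over label permutations."""
    n = len(true_labels)
    best_errors = n
    for perm in permutations(range(n_classes)):
        # perm[p] is the true class that predicted cluster p is mapped to.
        errors = sum(
            1 for t, p in zip(true_labels, predicted_labels) if perm[p] != t
        )
        best_errors = min(best_errors, errors)
    return best_errors / n


# The same partition under different label names gives zero error.
perfect = misclassification_rate([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1], 3)
# One item placed in the wrong cluster gives an error of 1/6.
one_wrong = misclassification_rate([0, 0, 1, 1, 2, 2], [2, 2, 0, 1, 1, 1], 3)
```

For larger numbers of classes, the same minimum could be obtained with an assignment solver instead of enumerating all permutations.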
We repeated the experiment 20 times for every noise level. The results are presented in fig. 9. We experimented with both auto-correlation and bispectrum; since the results were very similar in the two cases, we present only the results for the bispectrum here. The experiment demonstrates that the product NUG achieves considerably better classification results in the presence of noise.
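The rotation invariance of the bispectrum signature used as the benchmark can be checked numerically with a short sketch. It assumes signals represented by Fourier coefficients indexed by integer frequencies and the standard definition of the bispectrum as a triple product of coefficients; the helper names `rotate` and `bispectrum` are hypothetical.

```python
import cmath


def rotate(coeffs, freqs, theta):
    """Rotate a band-limited signal by theta via phases in the Fourier domain."""
    return [a * cmath.exp(-1j * k * theta) for a, k in zip(coeffs, freqs)]


def bispectrum(coeffs, freqs):
    """Return B(k1, k2) = a_{k1} * a_{k2} * conj(a_{k1 + k2}) for all valid pairs."""
    index = {k: i for i, k in enumerate(freqs)}
    result = {}
    for k1 in freqs:
        for k2 in freqs:
            if k1 + k2 in index:  # keep only triples inside the band limit
                result[(k1, k2)] = (
                    coeffs[index[k1]]
                    * coeffs[index[k2]]
                    * coeffs[index[k1 + k2]].conjugate()
                )
    return result


freqs = [-2, -1, 0, 1, 2]
x = [1 + 0j, 2j, 0.5 + 0j, -1 + 1j, 0.3j]

b_original = bispectrum(x, freqs)
b_rotated = bispectrum(rotate(x, freqs, 0.7), freqs)
max_diff = max(abs(b_original[key] - b_rotated[key]) for key in b_original)
```

The phases introduced by the rotation cancel in each triple product, so `max_diff` is zero up to floating-point error; with additive noise the signature is perturbed, which is the source of the clustering errors in the benchmark.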
Remark 6. In the cryo-EM problem, the images which we wish to align are different projections of the molecule . While bispectrum and auto-correlation have been used to find images from the same plane (see [36]), these signatures are not invariant to projections. Therefore, in the cryo-EM problem, these signatures cannot be used for classification, so they do not provide an alternative for the product NUG discussed here.
In other words, although the product NUG achieves better results than invariant-signature-based clustering in these experiments, its true importance is in cases where such alternative methods cannot be used. An implementation of the NUG for alignment over SO(3) in the special case of cryo-EM (even without classification) was not yet available at the time of writing this paper.
6. Summary and Future Work
The problem of simultaneous alignment and classification has been formulated as a non-unique game, and an algorithm has been presented for solving a convex relaxation of the problem. The algorithm has been demonstrated for the case of simultaneous alignment and classification of mixed signals over SO(2), and it is currently being adapted for the heterogeneity problem of cryo-EM. It should be noted that SDPs like the one proposed here are difficult to scale using off-the-shelf solvers to very large problems, such as alignment of hundreds of thousands of images produced in modern cryo-EM experiments. Nevertheless, special-purpose solvers provide more scalability; the SDPs offer, in some circumstances, certificates of global optimality for solutions found using other approaches; they provide a benchmark for approximate optimizations; and they can be applied to reduced datasets (e.g. class averages of images). Furthermore, the approach can be used with other recent methods for optimization over groups.
The approach discussed here can be generalized to the case of continuous heterogeneity, where the molecules are not classified to distinct classes, but rather lie on a continuum of states that can be parametrized (alternatively, the states are distinct, but related to some degree). In this case, we follow similar ideas to those in this paper; however, there are some additional details that require consideration in the choice of underlying groups and the structure of ; this case will be discussed in more detail in a future paper.
As discussed in section 3.5, there are several variations of the control over the size of clusters. Furthermore, the same ideas can be used to control the distribution of the recovered rotation angles (for example, when the images can be assumed to come from approximately uniform distribution over SO(3)).
Table 1:
the complex conjugate transpose of the matrix | |
the cyclic group of order | |
the direct product between group and group | |
the action of on a function | |
the trace of the matrix | |
the Kronecker (tensor) product of the matrix A and the matrix |
Acknowledgments
The authors would like to thank Joakim Andén, Afonso Bandeira, Tejal Bhamre, Yutong Chen and Justin Solomon for their help.
The authors were partially supported by Award Number R01GM090200 from the NIGMS, FA9550-12-1-0317 and FA9550-13-1-0076 from AFOSR, Simons Foundation Investigator Award, Simons Collaboration on Algorithms and Geometry, and the Moore Foundation Data-Driven Discovery Investigator Award. Part of the work by RRL was done while visiting the Hausdorff Research Institute for Mathematics, as part of the Mathematics of Signal Processing trimester.
References
- [1] Bandeira AS, Charikar M, Singer A, Zhu A, Multireference alignment using semidefinite programming, in: Proceedings of the 5th conference on Innovations in theoretical computer science, ACM, 2014, pp. 459–470.
- [2] Bandeira AS, Chen Y, Singer A, Non-unique games over compact groups and orientation estimation in cryo-em, arXiv preprint arXiv:1505.03840.
- [3] Goemans MX, Williamson DP, Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, Journal of the ACM (JACM) 42 (6) (1995) 1115–1145.
- [4] Frieze A, Jerrum M, Improved approximation algorithms for max k-cut and max bisection, in: Integer Programming and Combinatorial Optimization, Springer, 1995, pp. 1–13.
- [5] Frank J, Three-dimensional electron microscopy of macromolecular assemblies: visualization of biological molecules in their native state, Oxford University Press, 2006.
- [6] van Heel M, Gowen B, Matadeen R, Orlova EV, Finn R, Pape T, Cohen D, Stark H, Schmidt R, Schatz M, et al., Single-particle electron cryo-microscopy: towards atomic resolution, Quarterly reviews of biophysics 33 (04) (2000) 307–369.
- [7] Scheres S, RELION: Implementation of a Bayesian approach to cryo-EM structure determination, J. Struct. Biol 180 (3) (2012) 519–530. doi: 10.1016/j.jsb.2012.09.006
- [8] Van Heel M, Angular reconstitution: a posteriori assignment of projection directions for 3d reconstruction, Ultramicroscopy 21 (2) (1987) 111–123.
- [9] Singer A, Coifman RR, Sigworth FJ, Chester DW, Shkolnisky Y, Detecting consistent common lines in cryo-em by voting, Journal of structural biology 169 (3) (2010) 312–322.
- [10] Singer A, Shkolnisky Y, Three-dimensional structure determination from common lines in cryo-em by eigenvectors and semidefinite programming, SIAM journal on imaging sciences 4 (2) (2011) 543–572.
- [11] Shkolnisky Y, Singer A, Viewing direction estimation in cryo-em using synchronization, SIAM journal on imaging sciences 5 (3) (2012) 1088–1110.
- [12] Nogales E, The development of cryo-em into a mainstream structural biology technique, Nature Methods 13 (1) (2016) 24–27.
- [13] Perry A, Wein AS, Bandeira AS, Moitra A, Message-passing algorithms for synchronization problems over compact groups, arXiv preprint arXiv:1610.04583.
- [14] Boumal N, Nonconvex phase synchronization, arXiv preprint arXiv:1601.06114.
- [15] Chen Y, Candes E, The projected power method: An efficient algorithm for joint alignment from pairwise differences, arXiv preprint arXiv:1609.05820.
- [16] Lederman RR, Singer A, Continuously heterogeneous hyper-objects in cryo-em and 3-d movies of many temporal dimensions, arXiv preprint arXiv:1704.02899.
- [17] Bandeira AS, Blum-Smith B, Kileel J, Perry A, Weed J, Wein AS, Estimation under group actions: recovering orbits from invariants, arXiv preprint arXiv:1712.10163.
- [18] Boumal N, Bendory T, Lederman RR, Singer A, Heterogeneous multireference alignment: A single pass approach, in: 2018 52nd Annual Conference on Information Sciences and Systems (CISS), IEEE, 2018, pp. 1–6.
- [19] Ma C, Bendory T, Boumal N, Sigworth F, Singer A, Heterogeneous multireference alignment for images with application to 2-d classification in single particle reconstruction, arXiv preprint arXiv:1811.10382.
- [20] Natterer F, The mathematics of computerized tomography, Vol. 32, SIAM, 1986.
- [21] Liao M, Cao E, Julius D, Cheng Y, Structure of the trpv1 ion channel determined by electron cryo-microscopy, Nature 504 (7478) (2013) 107–112.
- [22] Sigworth FJ, Doerschuk PC, Carazo J-M, Scheres SH, Chapter ten - an introduction to maximum-likelihood methods in cryo-em, Methods in enzymology 482 (2010) 263–294.
- [23] Scheres SH, Chapter eleven - classification of structural heterogeneity by maximum-likelihood methods, Methods in enzymology 482 (2010) 295–320.
- [24] Katsevich E, Katsevich A, Singer A, Covariance matrix estimation for the cryo-em heterogeneity problem, SIAM journal on imaging sciences 8 (1) (2015) 126–185.
- [25] Andén J, Katsevich E, Singer A, Covariance estimation using conjugate gradient for 3d classification in cryo-em, in: Biomedical Imaging (ISBI), 2015 IEEE 12th International Symposium on, IEEE, 2015, pp. 200–204.
- [26] Aizenbud Y, Shkolnisky Y, A max-cut approach to heterogeneity in cryo-electron microscopy, arXiv preprint arXiv:1609.01100.
- [27] Liao HY, Frank J, Classification by bootstrapping in single particle methods, in: 2010 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, IEEE, 2010, pp. 169–172.
- [28] Coifman RR, Weiss G, Representations of compact groups and spherical harmonics, Enseignement math 14 (1968) 121–173.
- [29] Sternberg S, Group theory and physics, Cambridge University Press, 1995.
- [30] Dym H, McKean HP, Fourier series and integrals, Academic press, 1985.
- [31] Peter F, Weyl H, Die vollständigkeit der primitiven darstellungen einer geschlossenen kontinuierlichen gruppe, Mathematische Annalen 97 (1) (1927) 737–755.
- [32] Agarwal N, Bandeira AS, Koiliaris K, Kolla A, Multisection in the stochastic block model using semidefinite programming, arXiv preprint arXiv:1507.02323.
- [33] Grant M, Boyd S, CVX: Matlab software for disciplined convex programming, version 2.1, http://cvxr.com/cvx (Mar. 2014).
- [34] Grant M, Boyd S, Graph implementations for nonsmooth convex programs, in: Blondel V, Boyd S, Kimura H (Eds.), Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences, Springer-Verlag Limited, 2008, pp. 95–110.
- [35] Sadler BM, Giannakis GB, Shift-and rotation-invariant object reconstruction using the bispectrum, JOSA A 9 (1) (1992) 57–69.
- [36] Zhao Z, Singer A, Rotationally invariant image representation for viewing direction classification in cryo-em, Journal of structural biology 186 (1) (2014) 153–166.