Abstract
We propose a Geometric unsupervised matching Network (Gum-Net) for finding the geometric correspondence between two images, with application to 3D subtomogram alignment and averaging. Subtomogram alignment is the most important task in cryo-electron tomography (cryo-ET), a revolutionary 3D imaging technique for visualizing the molecular organization of unperturbed cellular landscapes in single cells. However, subtomogram alignment and averaging are very challenging due to severe imaging limits such as noise and missing wedge effects. We introduce an end-to-end trainable architecture with three novel modules specifically designed to preserve feature spatial information and propagate feature matching information. Training is performed in a fully unsupervised fashion to optimize a matching metric: neither ground-truth transformation parameters nor category-level or instance-level matching supervision is needed. In systematic assessments on six real and nine simulated datasets, Gum-Net reduced the alignment error by 40 to 50% and improved the averaging resolution by 10%. Gum-Net also achieved a 70 to 110 times speedup in practice with GPU acceleration compared to state-of-the-art subtomogram alignment methods. Our work is the first 3D unsupervised geometric matching method for images of strong transformation variation and high noise level. The training code, trained model, and datasets are available in our open-source software AITom¹.
1. Introduction
Given a transformation model, geometric matching aims to estimate the geometric correspondence between related images. In two and three dimensions, geometric matching is widely applied in fields such as pattern recognition [1, 2], 3D image reconstruction [3, 4], medical image alignment and registration [5, 6], and computational chemistry [7]. Finding the globally optimal parameters consistent with a geometric transformation model, such as an affine or rigid transformation, faces a fundamental bottleneck: the parametric space must be searched exhaustively, which is computationally infeasible [8]. Many popular methods alleviate the computational cost by detecting and matching hand-crafted local features [9, 10, 11] to estimate the global geometric transformation robustly [12, 13, 14, 15].
Recently, end-to-end trainable image alignment has attracted attention. It has two major advantages over traditional non-trainable methods: (1) a properly trained convolutional neural network (CNN) model can process a large amount of data in significantly less time, and (2) as more data are collected, the performance of a deep learning model can improve progressively through better feature learning [16].
In this paper, we focus on an important application field of geometric matching: cryo-electron tomography (cryo-ET). In recent years, cryo-ET has emerged as a revolutionary in situ 3D structural biology imaging technique for studying macromolecular complexes in single cells, the nano-machines that govern cellular biological processes [17]. Cryo-ET captures the 3D native structure and spatial distribution of all macromolecular complexes, together with other subcellular components, without disrupting the cell [18]. Nevertheless, cryo-ET data are heavily affected by a low signal-to-noise ratio (SNR) (example input data and mathematical definition in Supplementary Section S3) due to the complex cytoplasm environment and missing wedge effects². Therefore, the macromolecular structures in a 3D tomogram need to be detected and recovered for further biomedical interpretation.
A subtomogram from a tomogram is a small cubic subvolume generally containing one macromolecular complex. Subtomogram alignment is the most critical cryo-ET data processing technique for two reasons: first, high-resolution macromolecular structures can be recovered through alignment-based subtomogram averaging; second, the spatial distribution of a certain structure can be detected through alignment. To recover the structure, subtomograms containing the same macromolecular structure but in different poses must be iteratively aligned and averaged. Subtomogram averaging improves resolution by reducing noise and missing wedge artifacts [19]. Subtomogram alignment is a considerably more challenging geometric matching task than related tasks such as 3D deformable medical image registration in two respects. First, there is strong transformation variation: the structure inside a subtomogram has a completely random orientation and displacement. Second, medical images are relatively clean tissue images, whereas subtomograms are cellular images with a low SNR (around 0.01 to 0.1) due to the complex cytoplasm environment and the low electron dose used for imaging [20] (example input data in Supplementary Section S3).
Given the 3D rigid transformation model, subtomogram alignment computes six parameters (three rotational and three translational). We and others have proposed methods [21, 22] that approximate the constrained correlation objective function [23] with heuristics to keep the computational time feasible. However, it is now possible to collect, within several days, a set of tomograms containing millions of subtomograms [24]. Existing state-of-the-art subtomogram alignment methods [21, 22] generally align a pair of subtomograms on the scale of several seconds, which is too slow for processing such a large amount of data. Moreover, their accuracy is limited because they are approximation methods.
We propose Gum-Net (Geometric unsupervised matching Network), a deep architecture for 3D subtomogram alignment and averaging through unsupervised rigid geometric matching. Integrating three novel modules, Gum-Net takes two subtomograms as input and estimates the transformation parameters by extracting and matching convolutional features. Gum-Net achieved significant improvements in efficiency (70 to 110 times speedup) and accuracy (40 to 50% reduction in alignment error) over two state-of-the-art subtomogram alignment methods [21, 22]. The improvements from the proposed modules were demonstrated in three ablation studies.
Main contributions.
Our work is the first 3D unsupervised geometric matching method for images of strong transformation variation and high noise level. We integrated three novel modules (Figure 1): (1) We observe that the max pooling and average pooling operations in the standard deep feature extraction process seek local transformation invariance, which makes them unsuitable for accurate geometric matching, because feature spatial locations need to be preserved to a large extent during feature extraction. Therefore, we introduce a feature extraction module with spectral operations, including pooling and filtering, that preserve the spatial locations of extracted features. (2) We propose a novel Siamese matching module that improves spatial correlation information propagation by processing two feature correlation maps in parallel. (3) We incorporate a modified spatial transformer network [25] with a differentiable missing wedge imputation strategy into the alignment module. We achieve fully unsupervised training by feeding in random pairs of subtomograms regardless of their structural class information. Therefore, in contrast to other weakly-supervised geometric matching methods [26, 27, 28, 29, 30], no supervision such as instance-level or category-level matching information is needed.
Figure 1:
Gum-Net model pipeline. The model is unsupervised and feed-forward. The model takes two subtomograms sa and sb as input (underlying structures are shown in isosurface representation) and outputs the transformed subtomogram that geometrically matches sa, in addition to the transformation model parameters ϕtr and ϕrot. The dashed line denotes that the parameters are shared between the two feature extractors.
2. Related Work
2.1. 2D image alignment based on CNN
2D image alignment usually consists of two steps: (1) obtaining image feature descriptors and (2) matching feature descriptors according to a geometric model. Recently, some methods have employed pre-trained [31] or trainable [32, 33] CNN-based feature extractors. Specifically, [34] proposed a hierarchical metric learning strategy to learn better feature descriptors for geometric matching. However, all of these networks are combined with traditional matching methods.
In 2017, Rocco et al. proposed the first end-to-end convolutional neural network for geometric matching of 2D images [35]. This fully supervised model utilizes a pre-trained network [36] to extract features from the two images to be matched. Then a correlation layer matches the features followed by a network to regress to the known transformation parameters for supervised training. Later, they extended this model to be weakly-supervised for finding category-level [27] and instance-level correspondence [26]. Other weakly supervised methods have been proposed for similar tasks including semantic attribute matching [28], simultaneous alignment and segmentation [29], and alignment under large intra-class variations [30]. However, they still require additional training supervision such as matching image pairs on the instance level or category level.
2.2. Unsupervised optical flow estimation
Optical flow estimation describes the small displacements of pixels in a sequence of 2D images using a dense or sparse vector field. Early unsupervised methods have used the gated restricted Boltzmann machine to learn image transformations [37, 38]. Recent CNN-based methods applied techniques such as frame interpolation [39], occlusion reasoning [40], and unsupervised losses in terms of brightness constancy [41] or bidirectional census [42]. Although these methods are all unsupervised, they require their input images to be highly similar with only small pixel shifts.
2.3. Unsupervised deformable medical image registration
3D image registration is the 3D analog of 2D optical flow estimation. Deformable image registration has been extensively applied to 3D medical images such as brain MRI [43, 44], CT [45, 46], and cardiac images [47, 48]. Recent works present unsupervised CNN models based on spatial transformation functions [49, 50, 51] or generative adversarial networks [52, 53]. Similar to optical flow estimation, these methods require the input pair of fixed and moving volumes to be similar. The information from the two volumes is integrated by stacking them as one input to the CNN models. However, simply stacking the input image pairs works poorly when there is strong transformation variation, because the image similarity comparison is spatially constrained to a local neighborhood [54].
2.4. Non-learning-based subtomogram alignment
Early works used exhaustive grid search over rotations and translations with fixed intervals, such as 1 voxel and 5°, to align subtomograms [55, 56, 57]. To reduce the computational cost of exhaustively searching the 6D parametric space, the high-throughput alignment proposed in [21] applied the fast rotational matching algorithm [58]. The fast and accurate alignment proposed in [22] also used the fast rotational matching algorithm and incorporates additional information, including amplitude and phase, into its procedure. Another approach is to collaboratively align multiple subtomograms together based on the nuclear norm [59].
In this paper, we focus on pairwise subtomogram alignment and compare our method against the two most popular subtomogram alignment methods as baselines [21, 22].
3. Method
Our model is shown in Figure 1 (detailed architecture in Supplementary Section S2). Two subtomograms (3D grayscale cubic images) sa and sb are processed by feature extractors with shared weights to produce two feature maps va and vb. A Siamese matching module then computes two correlation maps cab and cba. At a specific position (i, j, k), cab contains the similarity between va at that position and all the features of vb; cba is defined symmetrically. cab and cba are processed with the same network architecture and are later concatenated to estimate the transformation parameters. The six transformation parameters, which consist of ϕtr = {qx, qy, qz} for 3D translation and ϕrot = {qα, qβ, qγ} for 3D rotation in ZYZ convention, are fed into a differentiable spatial transformer network to compute the output, a transformed subtomogram s′b with the missing wedge region imputed (Section 3.3). A spectral data imputation technique is integrated into the spatial transformer network to compensate for the missing wedge effects. In the training process, we do not have ground truth transformation parameters to regress to as in [35]. Therefore, to assess the geometric matching performance, our objective is to find the 3D rigid transformation parameters that maximize the cross-correlation between sa and s′b in an unsupervised fashion. The cross-correlation-based loss is back-propagated to update the model weights.
3.1. Feature extraction module
Feature extraction is a dimensionality reduction process to efficiently learn a compact feature vector representation of interesting parts of raw images. There are various popular feature extraction techniques such as DenseNet [60], InceptionNet [61], and ResNet [62]. Subsampling methods such as max pooling and average pooling are used in these convolutional neural networks to reduce feature map dimensionality and facilitate computation. Compared with max pooling and average pooling, spectral representation for convolutional neural networks preserves considerably more spatial information per parameter and enables flexibility in the pooling output dimensionality [63]. 2D spectral pooling layers that perform dimension reduction in the frequency domain have been proposed based on discrete Fourier transform (DFT) [63], discrete cosine transform (DCT) [64], and Hartley transform [65]. However, these methods are designed for 2D images and do not take into account image noise.
We propose a 3D DCT-based spectral layer with pooling and filtering operations. Since our inputs are 3D noisy images, the novel filtering operation reduces high-frequency noise in the feature maps, and the pooling operation reduces feature map dimensionality. We choose the DCT because it stores only real-valued coefficients and compacts more energy into a smaller portion of the spectrum compared to the DFT [66].
For an input feature map $v \in \mathbb{R}^{L \times W \times H}$, its 3D type-II DCT is defined as [67]:

$$\hat{v}_{lwh} = \epsilon_l \epsilon_w \epsilon_h \sum_{i=0}^{L-1} \sum_{j=0}^{W-1} \sum_{k=0}^{H-1} v_{ijk} \cos\!\left[\frac{\pi(2i+1)l}{2L}\right] \cos\!\left[\frac{\pi(2j+1)w}{2W}\right] \cos\!\left[\frac{\pi(2k+1)h}{2H}\right] \tag{1}$$

where $\epsilon_l = \sqrt{1/L}$ for $l = 0$ and $\epsilon_l = \sqrt{2/L}$ otherwise, ∀l ∈ {0, …, L − 1}. ϵh and ϵw are similarly defined ∀h ∈ {0, …, H − 1}, ∀w ∈ {0, …, W − 1}. The inverse transform of the 3D type-II DCT is well defined as the 3D type-III DCT [67]. Therefore, the pooled and filtered representation in the frequency domain can be transformed back through the type-III DCT to the spatial domain as the output of the layer.
We use the DCT to perform subsampling in which the input is transformed to the frequency domain and cropped there. The output with reduced dimensionality is computed by transforming the cropped spectrum back into the spatial domain. The spectral pooling operation has been shown to achieve better spatial information preservation per parameter in terms of the l2 norm compared to the max pooling operation [63]. Figure 2 shows the image reconstruction from max pooling, average pooling, and DCT spectral pooling at different subsampling factors. Compared to other pooling operations, a major advantage of using spectral pooling & filtering layers for geometric matching tasks is that the spatial locations of features in the two images are significantly better preserved for accurate matching. For example, during max pooling, the maximum of the receptive field is selected to achieve local rotation and translation invariance, with the intuition that the exact location of a feature does not matter for the final classification. By contrast, during feature extraction for geometric matching, the exact feature spatial location is critical, and the information loss would lead to inaccurate downstream matching.
Figure 2:
Image reconstruction from the max pooling, average pooling, and DCT spectral pooling scheme at different subsampling factors. DCT spectral pooling retains substantially greater spatial information of features from the original image and offers arbitrary output map dimensionality.
We implement the 3D DCT spectral pooling & filtering as differentiable layers in the feature extractor. The low-pass filtering is performed by masking out the high-frequency regions dominated by noise. The forward and back-propagation procedures of the 3D DCT spectral pooling & filtering layer are outlined in Algorithms 1 and 2.
The arbitrary output size of spectral pooling & filtering layers offers another major advantage for geometric matching tasks. If the two output feature maps are of size L × W × H with C channels, the Siamese correlation layer (Section 3.2) will create two correlation maps, each of size L × W × H with LWH channels. The output feature map size from the feature extraction module to the Siamese matching module needs to be carefully controlled, especially for 3D images. If the output feature map is too small, such as 3 × 3 × 3, too much information is lost for matching. If it is too large, such as 20 × 20 × 20, the resulting correlation maps will be of size 20 × 20 × 20 × 8000, which is too large to process. Unlike max pooling or average pooling layers, which aggressively halve each dimension and thereby remove 87.5% of the information, spectral pooling & filtering layers can gradually reduce the feature map size to the desired feature extraction module output size. Therefore, no additional spatial cropping or padding layer is needed to control the feature map size.
Algorithm 1: DCT spectral pooling & filtering.

Algorithm 2: DCT spectral pooling & filtering back-propagation.
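To make the layer's mechanics concrete, below is a minimal NumPy/SciPy sketch of the forward pass described in Algorithm 1: transform, crop the low-frequency corner (pooling), mask the highest retained frequencies (filtering), and invert. The function name, the `cutoff` parameter, and the single-channel shape are illustrative assumptions, not the exact Gum-Net implementation.

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_spectral_pool_filter(x, out_shape, cutoff=0.75):
    """Forward pass of 3D DCT spectral pooling & filtering (sketch).

    x         : input feature map of shape (L, W, H), one channel
    out_shape : desired pooled shape (L', W', H'), each <= input dim
    cutoff    : fraction of the retained spectrum kept by the low-pass
                filter; higher-frequency coefficients are zeroed as noise
    """
    # Type-II DCT to the frequency domain (orthonormal scaling).
    spec = dctn(x, type=2, norm='ortho')
    # Pooling: keep the low-frequency corner, where the DCT compacts energy.
    lp, wp, hp = out_shape
    spec = spec[:lp, :wp, :hp]
    # Filtering: mask out the highest-frequency (noise-dominated) region.
    mask = np.zeros_like(spec)
    mask[:int(lp * cutoff), :int(wp * cutoff), :int(hp * cutoff)] = 1.0
    # Inverse (type-III) DCT back to the spatial domain at the reduced size.
    return idctn(spec * mask, type=2, norm='ortho')
```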
3.2. Siamese matching module
The matching of extracted features from images is usually performed as an independent post-processing step [68, 69, 70, 71, 72]. The 2D correlation layer proposed in [35] achieved the state of the art for integrating the matching information from two images. It is essentially a normalized cross-correlation function over feature vectors. One of the input feature maps, va, is first flattened along its spatial axes into N feature vectors, where N = HW, in order to keep the output correlation map 2D. Then for each feature (pixel) in va and vb, the dot product is computed over all the channels (as feature descriptors) to obtain the correlation, which is later normalized. Nevertheless, to control the dimension of the output correlation map, the spatial axes of one input feature map are collapsed and cast into the channels of the output, whereas the spatial axes of the other input feature map are preserved.
We propose a novel Siamese matching module for pairwise 3D feature matching. To better utilize and process the feature correlation information, we design a Siamese correlation layer. Different from the correlation layer in [35], which computes only cab, the Siamese correlation layer is intuitive and symmetrically designed: it computes two correlation maps cab and cba, each preserving the spatial coordinates of one input feature map. Using two correlation maps propagates more feature spatial correlation information for the transformation parameter estimation. The element of cab at a specific position (l, w, h, c) is defined as:
$$c_{ab}^{lwhc} = \frac{\left\langle v_a^{(l,w,h)},\, v_b^{(l_c,w_c,h_c)} \right\rangle}{\left\lVert v_a^{(l,w,h)} \right\rVert_2 \left\lVert v_b^{(l_c,w_c,h_c)} \right\rVert_2} \tag{2}$$

where $v^{(l,w,h)}$ denotes the C-dimensional feature vector at spatial position (l, w, h) and $(l_c, w_c, h_c)$ is the spatial position of vb indexed by the output channel c ∈ {1, …, LWH}; cba is defined symmetrically.
The two correlation maps are fed into a pseudo-Siamese network consisting of convolution layers; they are convolved separately but later concatenated for one fully connected layer. After another fully connected layer, the Siamese matching module outputs the estimated rigid transformation parameters ϕtr and ϕrot. The detailed model architecture can be found in Supplementary Section S2.
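A minimal NumPy sketch of the Siamese correlation layer in Eq. (2) follows, assuming single (unbatched) feature maps; the epsilon constant and memory layout are our illustrative choices.

```python
import numpy as np

def siamese_correlation(va, vb, eps=1e-8):
    """Siamese correlation layer of Eq. (2) (sketch, unbatched).

    va, vb : feature maps of shape (L, W, H, C)
    Returns c_ab and c_ba, each of shape (L, W, H, LWH). c_ab keeps the
    spatial grid of va; its channel c indexes a spatial position of vb.
    """
    L, W, H, C = va.shape
    # L2-normalize every C-dimensional feature vector.
    na = va / (np.linalg.norm(va, axis=-1, keepdims=True) + eps)
    nb = vb / (np.linalg.norm(vb, axis=-1, keepdims=True) + eps)
    fa = na.reshape(-1, C)                    # (LWH, C)
    fb = nb.reshape(-1, C)                    # (LWH, C)
    sim = fa @ fb.T                           # all-pairs similarity, (LWH, LWH)
    c_ab = sim.reshape(L, W, H, L * W * H)    # preserves coordinates of va
    c_ba = sim.T.reshape(L, W, H, L * W * H)  # preserves coordinates of vb
    return c_ab, c_ba
```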
3.3. Unsupervised geometric alignment module
Existing subtomogram alignment methods optimize a matching metric [21, 22, 57, 73]. In practice, preparing ground-truth subtomogram alignments for training is extremely time-consuming (it requires exhaustively searching the 6D parametric space). Therefore, a deep model for this task should be unsupervised. To achieve this goal, we propose an unsupervised geometric alignment module utilizing the spatial transformer network [25] with spectral data imputation designed specifically for subtomogram data.
In a tomogram with fixed voxel spacing (around 1 nm), a certain type of macromolecular structure does not scale or reflect. Therefore, we restrict ourselves to 3D rigid transformation. Denoting the transformation matrix generated by the 3D rigid transformation parameters as Mθ [74] and the 3D warping operation as $\mathcal{T}_{M_\theta}$, we have:

$$\begin{pmatrix} x_i^s \\ y_i^s \\ z_i^s \end{pmatrix} = \mathcal{T}_{M_\theta}\!\begin{pmatrix} x_i^t \\ y_i^t \\ z_i^t \end{pmatrix} = \begin{bmatrix} \theta_{11} & \theta_{12} & \theta_{13} & \theta_{14} \\ \theta_{21} & \theta_{22} & \theta_{23} & \theta_{24} \\ \theta_{31} & \theta_{32} & \theta_{33} & \theta_{34} \end{bmatrix} \begin{pmatrix} x_i^t \\ y_i^t \\ z_i^t \\ 1 \end{pmatrix} \tag{3}$$

where $(x_i^t, y_i^t, z_i^t)$ are the target coordinates on the transformed output 3D image and $(x_i^s, y_i^s, z_i^s)$ are the source coordinates on the input 3D image. θ is an element of the transformation matrix: the 3 × 3 orthogonal rotation matrix spans θ11 to θ33, and the displacement along each axis is specified by θ14, θ24, and θ34. The 3D warping is differentiable and can therefore be trained end-to-end.
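For concreteness, below is a sketch of how the 3 × 4 matrix Mθ in Eq. (3) can be assembled from ϕrot (ZYZ convention) and ϕtr; the function name and argument layout are our assumptions.

```python
import numpy as np

def rigid_matrix(phi_rot, phi_tr):
    """Assemble the 3x4 rigid matrix M_theta of Eq. (3) (sketch).

    phi_rot : ZYZ Euler angles (alpha, beta, gamma) in radians
    phi_tr  : translation (qx, qy, qz) in voxels
    """
    a, b, g = phi_rot
    rz1 = np.array([[np.cos(a), -np.sin(a), 0],
                    [np.sin(a),  np.cos(a), 0],
                    [0,          0,         1]])
    ry  = np.array([[ np.cos(b), 0, np.sin(b)],
                    [ 0,         1, 0        ],
                    [-np.sin(b), 0, np.cos(b)]])
    rz2 = np.array([[np.cos(g), -np.sin(g), 0],
                    [np.sin(g),  np.cos(g), 0],
                    [0,          0,         1]])
    rot = rz1 @ ry @ rz2                       # ZYZ rotation: theta_11..theta_33
    t = np.asarray(phi_tr, dtype=float).reshape(3, 1)
    return np.hstack([rot, t])                 # last column: theta_14, theta_24, theta_34
```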
In order to compensate for the missing wedge effects and thus decrease the bias introduced, we integrate a spectral data imputation strategy from our previous work [75] into the spatial transformer network. For a subtomogram, we use its current estimated transformation to compute the rotated missing wedge mask m, an indicator function representing whether the Fourier coefficients in certain regions are valid or missing, and impute the missing ones with those from its transformation target subtomogram sa. We can form a transformed and imputed subtomogram s′b such that:

$$\mathcal{F}(s_b')(\xi) = m(\xi)\,\mathcal{F}\!\left(\mathcal{T}_{M_\theta}(s_b)\right)\!(\xi) + \left(1 - m(\xi)\right)\mathcal{F}(s_a)(\xi) \tag{4}$$

where $\mathcal{F}$ is the Fourier transform operator, $\xi$ is a Fourier space location, and m(ξ) is the missing wedge mask rotated according to ϕrot. Since the magnitude of the Fourier transform is translation-invariant, we only need to rotate m(ξ) without using ϕtr [23]. The imputation operation facilitates the unsupervised geometric matching task because the imputed data results in the highest consistency with the transformed subtomogram only when the optimal alignment is obtained.
We note that since the rotation of the missing wedge mask m is implemented along with the transformation of the input subtomogram in the differentiable spatial transformer network and the inverse discrete Fourier transformation is well defined, this spectral data imputation step is differentiable in a similar manner as Algorithm 2.
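A minimal NumPy sketch of the imputation in Eq. (4) follows, assuming the rotated mask m is given on the centered (fftshifted) Fourier grid; Gum-Net implements the equivalent operation with differentiable layers.

```python
import numpy as np

def impute_missing_wedge(sb_transformed, sa, m):
    """Spectral imputation of Eq. (4) (sketch).

    sb_transformed : the warped subtomogram T_{M_theta}(s_b)
    sa             : the alignment target subtomogram
    m              : rotated binary missing wedge mask on the centered
                     (fftshifted) Fourier grid; 1 where coefficients are valid
    """
    fb = np.fft.fftshift(np.fft.fftn(sb_transformed))
    fa = np.fft.fftshift(np.fft.fftn(sa))
    # Valid coefficients come from the transformed s_b; missing ones from s_a.
    f_imp = m * fb + (1.0 - m) * fa
    return np.real(np.fft.ifftn(np.fft.ifftshift(f_imp)))
```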
Loss function.
Pearson's correlation and its variants are widely used for assessing the alignment between two subtomograms [21, 22, 23, 57, 73] because of their simplicity and effectiveness. We implement the negative Pearson's correlation between sa and s′b as the loss function for Gum-Net:

$$\mathcal{L}(s_a, s_b') = -\frac{\sum_{i=1}^{N} \left(s_a^i - \bar{s}_a\right)\left(s_b'^{\,i} - \bar{s}_b'\right)}{\sqrt{\sum_{i=1}^{N} \left(s_a^i - \bar{s}_a\right)^2}\,\sqrt{\sum_{i=1}^{N} \left(s_b'^{\,i} - \bar{s}_b'\right)^2}} \tag{5}$$

where $s^i$ denotes the i-th voxel value, $\bar{s}$ the mean voxel value, and N is the total number of voxels in an input subtomogram. Compared to existing methods [21, 22], which utilize a translation-invariant upper bound to approximate the Pearson's correlation objective to reduce the computational cost, Gum-Net optimizes Pearson's correlation directly for more accurate alignment.
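A minimal sketch of Eq. (5) as a batched Keras/TensorFlow loss follows; the epsilon for numerical stability is our addition.

```python
import tensorflow as tf

def correlation_loss(s_a, s_b_imp, eps=1e-8):
    """Negative Pearson correlation loss of Eq. (5) (sketch, batched)."""
    a = tf.reshape(s_a, [tf.shape(s_a)[0], -1])
    b = tf.reshape(s_b_imp, [tf.shape(s_b_imp)[0], -1])
    a = a - tf.reduce_mean(a, axis=1, keepdims=True)   # center each volume
    b = b - tf.reduce_mean(b, axis=1, keepdims=True)
    num = tf.reduce_sum(a * b, axis=1)
    den = tf.norm(a, axis=1) * tf.norm(b, axis=1) + eps
    return -tf.reduce_mean(num / den)   # minimizing maximizes the correlation
```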
3.4. Baseline methods
We implemented the two most popular state-of-the-art subtomogram alignment methods for comparison: H-T align [21] and F&A align [22]. We performed three ablation studies with existing modules: Gum-Net Max Pooling (Gum-Net MP), Gum-Net Average Pooling (Gum-Net AP), and Gum-Net Single Correlation (Gum-Net SC). Detailed implementations can be found in Supplementary Section S2.
4. Experiments
Gum-Net was evaluated on six real and nine realistically simulated datasets at different SNRs. On the simulated datasets, the accuracy of subtomogram alignment was evaluated by comparing the estimated transformation parameters ϕtr and ϕrot to the ground truth. On the real datasets, the transformation ground truth is not available; in practice, the optimal transformation is usually obtained by exhaustive grid search over the parametric space to optimize the cross-correlation between sa and s′b. Therefore, we compared the cross-correlation between sa and s′b computed by Gum-Net and the baseline methods as an indirect indicator of alignment accuracy. Visualizations of subtomograms in the different datasets can be found in Supplementary Section S3.
4.1. Datasets
4.1.1. Real datasets
GroEL/GroES dataset: this dataset contains 786 experimental subtomograms of purified GroEL and GroEL/GroES complexes from 24 tomograms [23]. Each subtomogram is rescaled to size 32³ with voxel size 0.933 nm and a 25° missing wedge.
Rat neuron culture dataset: this recent dataset is a set of tomograms from rat neuron culture [76]. In total, 1095 ribosome subtomograms and 1527 capped proteasome subtomograms were extracted by template matching [55] and biology expert annotation. Each subtomogram is of size 32³ with voxel size 1.368 nm and a 30° missing wedge.
S. cerevisiae 80S ribosome dataset: this dataset contains 3120 subtomograms extracted from 7 tomograms of purified S. cerevisiae 80S ribosomes [77]. Each subtomogram is rescaled to size 32³ with voxel size 1.365 nm and a 30° missing wedge.
TMV dataset: this dataset contains 2742 Tobacco Mosaic Virus (TMV) subtomograms, a type of helical virus [78]. Each subtomogram is binned to size 32³ with voxel size 1.080 nm and a 30° missing wedge.
Aldolase dataset: this recent dataset contains 400 purified rabbit muscle aldolase subtomograms [79]. Each subtomogram is rescaled to size 32³ with voxel size 0.750 nm and a 30° missing wedge.
Insulin receptor dataset: this recent dataset contains 400 purified human insulin-bound insulin receptor subtomograms [80]. Each subtomogram is rescaled to size 32³ with voxel size 0.876 nm and a 45° missing wedge.
4.1.2. Simulated datasets
The subtomogram dataset simulation followed a standard procedure [81, 82] that takes into account the tomographic reconstruction process with missing wedges and the contrast transfer function (detailed simulation procedure in Supplementary Section S3). We chose five representative macromolecular complexes: spliceosome (PDB ID: 5LQW), RNA polymerase-rifampicin complex (1I6V), RNA polymerase II elongation complex (6A5L), ribosome (5T2C), and capped proteasome (5MPA). All five structures are asymmetric, so there exists only one alignment ground truth. We simulated five datasets, one relatively clean (SNR 100) and four with SNRs close to experimental conditions (0.1, 0.05, 0.03, and 0.01), each consisting of 2100 subtomogram pairs per structure (10500 subtomogram pairs in total). From each dataset, 5000 subtomogram pairs were used for training and 500 pairs for validation; the remaining 5000 subtomogram pairs were used for testing. In each pair, one structure is a randomly transformed copy of the other, and the two structures were processed independently to obtain their tomographic image distortions. Each subtomogram is of size 32³ with voxel size 1.2 nm. The sb in each pair has a typical 30° missing wedge, while sa has no missing wedge.
For subtomogram averaging, we simulated four datasets of 500 ribosomes (PDB ID: 5T2C) in the same manner at SNR 0.1, 0.05, 0.03, and 0.01.
4.2. Implementation
The deep models were implemented in Keras [83] with custom layers, using TensorFlow [84] as the backend. All inputs have size 32³. We note that, due to the flexible input and output sizes of the DCT spectral pooling & filtering layers, the input size can be arbitrary. Higher resolution can be achieved with larger input subtomogram sizes. Detailed implementations of Gum-Net and the baselines can be found in Supplementary Section S2.
For each epoch, we randomly draw 5000 subtomogram pairs sa and sb from the training dataset regardless of their structural class information. Therefore, Gum-Net is fully unsupervised, without the instance-level or category-level matching information used for weak supervision in other geometric matching methods [26, 27, 28, 29, 30]. For a simulated dataset, there are 5000² possible image pairs. As a result, we did not observe any overfitting issue.
4.3. Subtomogram alignment
Given the transformation ground truth, we measure the alignment accuracy with two metrics: (1) the translation error, defined as the Euclidean distance between the estimated translation and the ground truth, and (2) the rotation error, defined as the Euclidean distance between the flattened estimated and ground-truth rotation matrices.
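Both metrics can be computed directly from the estimated and ground-truth parameters; a minimal sketch follows (rotation matrices assumed 3 × 3, the function name is ours).

```python
import numpy as np

def alignment_errors(rot_est, tr_est, rot_gt, tr_gt):
    """The two evaluation metrics (sketch).

    Translation error: Euclidean distance between translation vectors.
    Rotation error: Euclidean distance between flattened 3x3 rotation matrices.
    """
    tr_err = np.linalg.norm(np.asarray(tr_est) - np.asarray(tr_gt))
    rot_err = np.linalg.norm(np.asarray(rot_est).ravel()
                             - np.asarray(rot_gt).ravel())
    return rot_err, tr_err
```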
On simulated datasets:
Table 1 shows the alignment accuracy. Gum-Net achieved performance comparable to the baselines on the clean dataset (SNR 100). As max pooling achieves more local transformation invariance [85], Gum-Net MP performs worse than Gum-Net AP in all settings, as expected. When the SNR is close to experimental conditions (the real datasets have SNRs around 0.01 to 0.1), CNN-based methods generally perform better than traditional methods. In particular, Gum-Net outperformed all the baseline methods, demonstrating the improvement from the proposed modules.
In our experiments, the training, validation, and testing datasets are independent, which guards against overfitting. Moreover, since Gum-Net is fully unsupervised, even if the testing dataset comes from a different domain, such as data collected under different imaging conditions, a trained model can be fine-tuned on the testing dataset (with no ground truth) for adaptation. In terms of speed, with a trained model, Gum-Net takes only 17.6 seconds to align 1000 subtomograms on a single GPU. Training takes less than 10 hours. Since no GPU-accelerated versions of the traditional algorithms are available, H-T align and F&A align take 1916.4 and 1251.2 seconds, respectively, to align 1000 subtomograms on a CPU core. In practice, this results in a 70 to 110 times speedup over the traditional methods.
On real datasets:
We split the GroEL/GroES dataset into a training set of 617 subtomograms, a validation set of 69 subtomograms, and a testing set of 100 subtomograms. There are 4950 pairs of subtomograms in the testing set. We aligned them pairwise with Gum-Net, H-T align, and F&A align and calculated the cross-correlation. Gum-Net achieved a mean cross-correlation of 0.0908±0.0204, significantly better (p < 0.001) than H-T align (0.0756±0.0194) and F&A align (0.0838±0.0204).
We split the rat neuron culture dataset into a training set of 2270 subtomograms, a validation set of 252 subtomograms, and a testing set of 100 ribosome and 100 capped proteasome subtomograms. There are 19900 pairs of subtomograms in the testing set. Gum-Net achieved a mean cross-correlation of 0.0615±0.0187, significantly better (p < 0.001) than H-T align (0.0541±0.0235) and F&A align (0.0607±0.0199). We used the pairwise correlation matrix to cluster the subtomograms, defining the pairwise distance as 1 − the pairwise correlation. Applying the complete-linkage hierarchical clustering algorithm with k = 2 (a sketch of this step is shown below), Gum-Net achieved an accuracy of 92%, better than F&A align (65%) and H-T align (53.5%).
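A SciPy sketch of the clustering step, assuming `corr` is the symmetric pairwise cross-correlation matrix produced by the alignment:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_by_correlation(corr, k=2):
    """Complete-linkage clustering on 1 - correlation distances (sketch).

    corr : symmetric pairwise cross-correlation matrix from alignment
    """
    dist = 1.0 - corr
    np.fill_diagonal(dist, 0.0)                   # zero self-distances
    condensed = squareform(dist, checks=False)    # condensed distance vector
    return fcluster(linkage(condensed, method='complete'),
                    t=k, criterion='maxclust')
```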
4.4. Non-parametric reference-free subtomogram averaging
Structures present in multiple noisy copies (usually thousands) in a tomogram must be averaged through geometric transformation to obtain higher-resolution 3D views [19]. To eliminate potential bias, subtomogram averaging is often done without any external structural reference. One major approach to reference-free subtomogram averaging is non-parametric alignment-based averaging, in which all subtomograms are iteratively aligned to their average and re-averaged for the next iteration [86]. Figure 4 illustrates such a process, in which the initial average is generated by simply averaging all the subtomograms without any transformation. The structural resolution of the subtomogram average is gradually improved through the iterative process.
Figure 4:
Illustration of alignment-based subtomogram averaging using Gum-Net. On the left are five example input subtomograms at SNR 0.1 in our experiment. On the right are subtomogram averages at different iterations and the true structure. 2D slice representations are shown in Supplementary Section S3.
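A minimal sketch of this iterative loop follows; `align_fn` (e.g., one forward pass of a trained Gum-Net) and `transform_fn` are assumed interfaces, not AITom APIs.

```python
import numpy as np

def reference_free_average(subtomograms, align_fn, transform_fn, n_iter=10):
    """Iterative alignment-based reference-free averaging (sketch).

    align_fn(avg, s)                 -> (phi_rot, phi_tr)
    transform_fn(s, phi_rot, phi_tr) -> transformed (and imputed) s
    """
    avg = np.mean(subtomograms, axis=0)   # initial, unaligned average
    for _ in range(n_iter):
        aligned = [transform_fn(s, *align_fn(avg, s)) for s in subtomograms]
        avg = np.mean(aligned, axis=0)    # re-average for the next iteration
    return avg
```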
The iterative alignment-based non-parametric reference-free subtomogram averaging was tested using the proposed and baseline methods. The standard resolution measurement for assessing subtomogram averaging is the Fourier shell correlation (FSC) [87] (mathematical definition in Supplementary Section S3), which measures the agreement of structure factors between the subtomogram average and the true structure over Fourier shells; the resulting resolution is reported in nm, and the smaller the value, the better. As shown in Table 2, Gum-Net achieved the overall best averaging performance and improved the resolution by around 10%.
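For reference, below is a minimal NumPy sketch of computing an FSC curve between two cubic volumes; converting the curve into a nm resolution additionally requires a threshold criterion [87] and the voxel size, which this sketch omits.

```python
import numpy as np

def fourier_shell_correlation(v1, v2, n_shells=16):
    """FSC curve between two cubic volumes (sketch).

    Correlates the Fourier coefficients of v1 and v2 within concentric
    shells of increasing spatial frequency.
    """
    f1 = np.fft.fftshift(np.fft.fftn(v1))
    f2 = np.fft.fftshift(np.fft.fftn(v2))
    n = v1.shape[0]
    grid = np.indices(v1.shape) - n // 2
    radius = np.sqrt((grid ** 2).sum(axis=0))   # distance from the DC component
    edges = np.linspace(0, n // 2, n_shells + 1)
    fsc = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        shell = (radius >= lo) & (radius < hi)
        num = np.real(np.sum(f1[shell] * np.conj(f2[shell])))
        den = np.sqrt(np.sum(np.abs(f1[shell]) ** 2)
                      * np.sum(np.abs(f2[shell]) ** 2)) + 1e-8
        fsc.append(num / den)
    return np.array(fsc)
```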
5. Conclusion
Cryo-ET subtomogram alignment and averaging are revolutionizing the discovery of 3D native macromolecular structural details in single cells. Such information provides critical insights into the precise function and dysfunction of cellular processes. However, with the rapidly increasing amount of cryo-ET data collected, there is an urgent need to drastically improve the efficiency of subtomogram alignment methods. We developed the first unsupervised deep learning approach for 3D subtomogram alignment and averaging. Using the three proposed modules, Gum-Net achieves fast and accurate alignment with end-to-end unsupervised learning. Gum-Net opens up the possibility of continued improvement in subtomogram alignment and averaging efficiency and accuracy with better model design and training. This work is an important step toward in situ high-throughput detection and recovery of macromolecular structures for a better understanding of the molecular machinery of cellular processes.
Gum-Net can be integrated into existing cryo-ET analysis software in several ways. For example, EMAN2 [81] performs an exhaustive 3D rotational and translational search followed by local refinement for alignment-based averaging. RELION [77] maximizes the likelihood of a model under a Gaussian noise assumption by exhaustively scanning the 3D rigid transformation space. Gum-Net improves the accuracy and efficiency of subtomogram alignment, especially for large amounts of cryo-ET data. Therefore, integrating Gum-Net with existing software can speed up their alignment steps or quickly generate initial structural models for averaging refinement. Gum-Net can also be easily extended to related tasks, including tomographic tilt series alignment [88] and cryo-electron microscopy single-particle reconstruction [89]. The proposed modules can be adapted to other geometric matching tasks for images of strong transformation variation, such as face alignment under pose variations [90, 91], or of high noise level, such as synthetic aperture radar imaging [92, 93] and sonar imaging [94, 95].
Figure 3:
Example alignment inputs and outputs at SNR 100. 2D slice representations are shown in Supplementary Section S3.
Table 1:
Subtomogram alignment accuracy on five datasets with SNR specified. In each cell, the first term is the mean and standard deviation of the rotation error and the second term, the translation error. We highlighted Gum-Net results that are significantly better (p < 0.001) than all baselines by the paired sample t-test. More detailed results and analysis can be found in Supplementary Section S3.
| Method | SNR 100 | SNR 0.1 | SNR 0.05 | SNR 0.03 | SNR 0.01 |
|---|---|---|---|---|---|
| H-T align | 0.30±0.68, 1.82±2.69 | 1.22±1.07, 4.76±4.56 | 1.93±0.98, 7.26±4.77 | 2.22±0.77, 8.86±4.72 | 2.38±0.57, 11.33±5.02 |
| F&A align | 0.33±0.70, 1.93±2.86 | 1.34±1.13, 5.39±4.90 | 1.95±0.98, 7.54±4.94 | 2.22±0.77, 8.99±4.81 | 2.38±0.57, 11.32±4.92 |
| Gum-Net MP | 0.90±0.87, 3.34±3.41 | 1.30±0.79, 4.93±3.36 | 1.44±0.79, 5.46±3.38 | 1.53±0.78, 5.96±3.34 | 1.67±0.77, 7.28±3.38 |
| Gum-Net AP | 0.60±0.71, 2.32±2.71 | 1.09±0.73, 4.20±2.96 | 1.30±0.77, 5.00±3.15 | 1.45±0.77, 5.70±3.25 | 1.65±0.78, 7.18±3.35 |
| Gum-Net SC | 0.70±0.75, 2.63±2.86 | 1.16±0.77, 4.41±3.23 | 1.36±0.79, 5.13±3.34 | 1.48±0.78, 5.75±3.34 | 1.67±0.77, 7.24±3.46 |
| Gum-Net | 0.41±0.70, 1.59±2.63 | 0.62±0.69, 2.41±2.61 | 0.87±0.74, 3.20±2.78 | 1.13±0.75, 4.29±2.75 | 1.50±0.78, 6.78±4.22 |
Table 2:
Subtomogram averaging results in FSC resolution (nm). ‘0.1’ denotes simulated dataset at SNR 0.1. ‘80S’, ‘TMV’, ‘Aldolase’, and ‘Insulin’ denote the real datasets. The best resolution is highlighted.
| Method | 0.1 | 0.05 | 0.03 | 0.01 | 80S | TMV | Aldolase | Insulin |
|---|---|---|---|---|---|---|---|---|
| H-T align | 2.89 | 3.79 | 4.92 | 4.41 | 3.05 | 2.23 | 2.34 | 1.90 |
| F&A align | 2.78 | 4.36 | 3.81 | 4.53 | 2.77 | 2.52 | 3.13 | 2.18 |
| Gum-Net | 2.78 | 2.95 | 4.01 | 4.22 | 2.73 | 2.16 | 1.97 | 1.77 |
Acknowledgements
This work was supported by U.S. National Science Foundation (NSF) grant DBI-1949629 and in part by U.S. National Institutes of Health (NIH) grant P41 GM103712. XZ was supported by a fellowship from Carnegie Mellon University’s Center for Machine Learning and Health. We thank Hongyu Zheng, Dr. Benjamin Chidester, and Jennifer Williams at our Department for proof-reading the paper.
Footnotes
² Partial sampling of images due to limited tilt angle ranges (description in Supplementary Section S1).
References
- [1].Li Xinchao, Larson Martha, and Hanjalic Alan. Pairwise geometric matching for large-scale object retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5153–5161, 2015.
- [2].Sattler Torsten, Maddern Will, Toft Carl, Torii Akihiko, Hammarstrand Lars, Stenborg Erik, Safari Daniel, Okutomi Masatoshi, Pollefeys Marc, Sivic Josef, et al. Benchmarking 6dof outdoor visual localization in changing conditions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8601–8610, 2018.
- [3].Huang Qi-Xing, Flöry Simon, Gelfand Natasha, Hofer Michael, and Pottmann Helmut. Reassembling fractured objects by geometric matching. ACM Transactions on Graphics (TOG), 25(3):569–578, 2006.
- [4].Han Renmin, Wan Xiaohua, Wang Zihao, Hao Yu, Zhang Jingrong, Chen Yu, Gao Xin, Liu Zhiyong, Ren Fei, Sun Fei, et al. Autom: a novel automatic platform for electron tomography reconstruction. Journal of structural biology, 199(3):196–208, 2017.
- [5].Declerck Jérôme, Feldmar Jacques, Betting Fabienne, and Goris Michael L. Automatic registration and alignment on a template of cardiac stress and rest spect images. In Proceedings of the Workshop on Mathematical Methods in Biomedical Image Analysis, pages 212–221. IEEE, 1996.
- [6].Guéziec André P, Pennec Xavier, and Ayache Nicholas. Medical image registration using geometric hashing. IEEE Computational Science and Engineering, 4(4):29–41, 1997.
- [7].Wolber Gerhard, Seidel Thomas, Bendix Fabian, and Langer Thierry. Molecule-pharmacophore superpositioning and pattern matching in computational drug design. Drug discovery today, 13(1-2):23–29, 2008.
- [8].Indyk Piotr, Motwani Rajeev, and Venkatasubramanian Suresh. Geometric matching under noise: Combinatorial bounds and algorithms. In SODA, pages 457–465, 1999.
- [9].Lowe David G. Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2):91–110, 2004.
- [10].Dalal Navneet and Triggs Bill. Histograms of oriented gradients for human detection. In International Conference on Computer Vision & Pattern Recognition (CVPR'05), volume 1, pages 886–893. IEEE Computer Society, 2005.
- [11].Rublee Ethan, Rabaud Vincent, Konolige Kurt, and Bradski Gary R. Orb: An efficient alternative to sift or surf. In ICCV, volume 11, page 2. Citeseer, 2011.
- [12].Fischler Martin A and Bolles Robert C. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.
- [13].Philbin James, Chum Ondrej, Isard Michael, Sivic Josef, and Zisserman Andrew. Object retrieval with large vocabularies and fast spatial matching. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8. IEEE, 2007.
- [14].Lazebnik Svetlana, Schmid Cordelia, and Ponce Jean. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), volume 2, pages 2169–2178. IEEE, 2006.
- [15].Ma Jiayi, Zhou Huabing, Zhao Ji, Gao Yuan, Jiang Junjun, and Tian Jinwen. Robust feature matching for remote sensing image registration via locally linear transforming. IEEE Transactions on Geoscience and Remote Sensing, 53(12):6469–6481, 2015.
- [16].Najafabadi Maryam M, Villanustre Flavio, Khoshgoftaar Taghi M, Seliya Naeem, Wald Randall, and Muharemagic Edin. Deep learning applications and challenges in big data analytics. Journal of Big Data, 2(1):1, 2015.
- [17].Kühlbrandt Werner. The resolution revolution. Science, 343(6178):1443–1444, 2014.
- [18].Chang Juan, Liu Xiangan, Rochat Ryan H, Baker Matthew L, and Chiu Wah. Reconstructing virus structures from nanometer to near-atomic resolutions with cryo-electron microscopy and tomography. In Viral Molecular Machines, pages 49–90. Springer, 2012.
- [19].Wan W and Briggs JAG. Cryo-electron tomography and subtomogram averaging. In Methods in enzymology, volume 579, pages 329–367. Elsevier, 2016.
- [20].Danev Radostin, Kanamaru Shuji, Marko Michael, and Nagayama Kuniaki. Zernike phase contrast cryo-electron tomography. Journal of structural biology, 171(2):174–181, 2010.
- [21].Xu Min, Beck Martin, and Alber Frank. High-throughput subtomogram alignment and classification by fourier space constrained fast volumetric matching. Journal of structural biology, 178(2):152–164, 2012.
- [22].Chen Yuxiang, Pfeffer Stefan, Hrabe Thomas, Schuller Jan Michael, and Förster Friedrich. Fast and accurate reference-free alignment of subtomograms. Journal of structural biology, 182(3):235–245, 2013.
- [23].Förster Friedrich, Pruggnaller Sabine, Seybert Anja, and Frangakis Achilleas S. Classification of cryo-electron sub-tomograms using constrained correlation. Journal of structural biology, 161(3):276–286, 2008.
- [24].Baldwin Philip R, Zi Tan Yong, Eng Edward T, Rice William J, Noble Alex J, Negro Carl J, Cianfrocco Michael A, Potter Clinton S, and Carragher Bridget. Big data in cryoem: automated collection, processing and accessibility of em data. Current opinion in microbiology, 43:1–8, 2018.
- [25].Jaderberg Max, Simonyan Karen, Zisserman Andrew, et al. Spatial transformer networks. In Advances in Neural Information Processing Systems, pages 2017–2025, 2015.
- [26].Rocco Ignacio, Cimpoi Mircea, Arandjelovic Relja, Torii Akihiko, Pajdla Tomas, and Sivic Josef. Neighbourhood consensus networks. In Advances in Neural Information Processing Systems, pages 1651–1662, 2018.
- [27].Rocco Ignacio, Arandjelovic Relja, and Sivic Josef. End-to-end weakly-supervised semantic alignment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6917–6925, 2018.
- [28].Kim Seungryong, Min Dongbo, Jeong Somi, Kim Sunok, Jeon Sangryul, and Sohn Kwanghoon. Semantic attribute matching networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 12339–12348, 2019.
- [29].Ufer Nikolai, Lui Kam To, Schwarz Katja, Warkentin Paul, and Ommer Björn. Weakly supervised learning of dense semantic correspondences and segmentation. In German Conference on Pattern Recognition, pages 456–470. Springer, 2019.
- [30].Min Juhong, Lee Jongmin, Ponce Jean, and Cho Minsu. Hyperpixel flow: Semantic correspondence with multi-layer neural features. In Proceedings of the IEEE International Conference on Computer Vision, pages 3395–3404, 2019.
- [31].Ufer Nikolai and Ommer Bjorn. Deep semantic feature matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6914–6923, 2017.
- [32].Kim Seungryong, Min Dongbo, Ham Bumsub, Jeon Sangryul, Lin Stephen, and Sohn Kwanghoon. Fcss: Fully convolutional self-similarity for dense semantic correspondence. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6560–6569, 2017.
- [33].Novotny David, Larlus Diane, and Vedaldi Andrea. Anchornet: A weakly supervised network to learn geometry-sensitive features for semantic matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5277–5286, 2017.
- [34].Fathy Mohammed E, Tran Quoc-Huy, Zeeshan Zia M, Vernaza Paul, and Chandraker Manmohan. Hierarchical metric learning and matching for 2d and 3d geometric correspondences. In Proceedings of the European Conference on Computer Vision (ECCV), pages 803–819, 2018.
- [35].Rocco Ignacio, Arandjelovic Relja, and Sivic Josef. Convolutional neural network architecture for geometric matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6148–6157, 2017.
- [36].Simonyan Karen and Zisserman Andrew. Very deep convolutional networks for large-scale image recognition. In 3rd International Conference on Learning Representations (ICLR), 2015.
- [37].Memisevic Roland and Hinton Geoffrey. Unsupervised learning of image transformations. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8. IEEE, 2007.
- [38].Memisevic Roland and Hinton Geoffrey E. Learning to represent spatial transformations with factored higher-order boltzmann machines. Neural computation, 22(6):1473–1492, 2010.
- [39].Long Gucan, Kneip Laurent, Alvarez Jose M, Li Hongdong, Zhang Xiaohu, and Yu Qifeng. Learning image matching by simply watching video. In European Conference on Computer Vision, pages 434–450. Springer, 2016.
- [40].Janai Joel, Guney Fatma, Ranjan Anurag, Black Michael, and Geiger Andreas. Unsupervised learning of multi-frame optical flow with occlusions. In Proceedings of the European Conference on Computer Vision (ECCV), pages 690–706, 2018.
- [41].Yu Jason J, Harley Adam W, and Derpanis Konstantinos G. Back to basics: Unsupervised learning of optical flow via brightness constancy and motion smoothness. In European Conference on Computer Vision, pages 3–10. Springer, 2016.
- [42].Meister Simon, Hur Junhwa, and Roth Stefan. Unflow: Unsupervised learning of optical flow with a bidirectional census loss. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
- [43].Wittek Adam, Miller Karol, Kikinis Ron, and Warfield Simon K. Patient-specific model of brain deformation: Application to medical image registration. Journal of biomechanics, 40(4):919–929, 2007.
- [44].Mohamed Ashraf, Zacharaki Evangelia I, Shen Dinggang, and Davatzikos Christos. Deformable registration of brain tumor images via a statistical model of tumor-induced deformation. Medical image analysis, 10(5):752–763, 2006.
- [45].Hou Jidong, Guerrero Mariana, Chen Wenjuan, and D'Souza Warren D. Deformable planning ct to cone-beam ct image registration in head-and-neck cancer. Medical physics, 38(4):2088–2094, 2011.
- [46].Schreibmann Eduard, Nye Jonathon A, Schuster David M, Martin Diego R, Votaw John, and Fox Tim. Mr-based attenuation correction for hybrid pet-mr brain imaging systems using deformable image registration. Medical physics, 37(5):2101–2109, 2010.
- [47].Zagrodsky Vladimir, Walimbe Vivek, Castro-Pareja Carlos R, Qin Jian Xin, Song Jong-Min, and Shekhar Raj. Registration-assisted segmentation of real-time 3-d echocardiographic data using deformable models. IEEE Transactions on Medical Imaging, 24(9):1089–1099, 2005.
- [48].Rohé Marc-Michel, Datar Manasi, Heimann Tobias, Sermesant Maxime, and Pennec Xavier. Svf-net: Learning deformable image registration using shape matching. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 266–274. Springer, 2017.
- [49].de Vos Bob D, Berendsen Floris F, Viergever Max A, Staring Marius, and Išgum Ivana. End-to-end unsupervised deformable image registration with a convolutional neural network. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pages 204–212. Springer, 2017.
- [50].Balakrishnan Guha, Zhao Amy, Sabuncu Mert R, Guttag John, and Dalca Adrian V. An unsupervised learning model for deformable medical image registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9252–9260, 2018.
- [51].de Vos Bob D, Berendsen Floris F, Viergever Max A, Staring Marius, and Išgum Ivana. A deep learning framework for unsupervised affine and deformable image registration. Medical image analysis, 52:128–143, 2019.
- [52].Mahapatra Dwarikanath, Antony Bhavna, Sedai Suman, and Garnavi Rahil. Deformable medical image registration using generative adversarial networks. In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), pages 1449–1453. IEEE, 2018.
- [53].Kim Boah, Kim Jieun, Lee June-Goo, Kim Dong Hwan, Park Seong Ho, and Chul Ye Jong. Unsupervised deformable image registration using cycle-consistent cnn. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 166–174. Springer, 2019.
- [54].Melekhov Iaroslav, Tiulpin Aleksei, Sattler Torsten, Pollefeys Marc, Rahtu Esa, and Kannala Juho. Dgc-net: Dense geometric correspondence network. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1034–1042. IEEE, 2019.
- [55].Böhm Jochen, Frangakis Achilleas S, Hegerl Reiner, Nickell Stephan, Typke Dieter, and Baumeister Wolfgang. Toward detecting and identifying macromolecules in a cellular context: template matching applied to electron tomograms. Proceedings of the National Academy of Sciences, 97(26):14245–14250, 2000.
- [56].Förster Friedrich and Hegerl Reiner. Structure determination in situ by averaging of tomograms. Methods in cell biology, 79:741–767, 2007.
- [57].Amat Fernando, Comolli Luis R, Moussavi Farshid, Smit John, Downing Kenneth H, and Horowitz Mark. Subtomogram alignment by adaptive fourier coefficient thresholding. Journal of structural biology, 171(3):332–344, 2010.
- [58].Kovacs Julio A and Wriggers Willy. Fast rotational matching. Acta Crystallographica Section D: Biological Crystallography, 58(8):1282–1286, 2002.
- [59].Kuybeda Oleg, Frank Gabriel A, Bartesaghi Alberto, Borgnia Mario, Subramaniam Sriram, and Sapiro Guillermo. A collaborative framework for 3d alignment and classification of heterogeneous subvolumes in cryo-electron tomography. Journal of structural biology, 181(2):116–127, 2013.
- [60].Huang Gao, Liu Zhuang, Van Der Maaten Laurens, and Weinberger Kilian Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017.
- [61].Szegedy Christian, Liu Wei, Jia Yangqing, Sermanet Pierre, Reed Scott, Anguelov Dragomir, Erhan Dumitru, Vanhoucke Vincent, and Rabinovich Andrew. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.
- [62].He Kaiming, Zhang Xiangyu, Ren Shaoqing, and Sun Jian. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
- [63].Rippel Oren, Snoek Jasper, and Adams Ryan P. Spectral representations for convolutional neural networks. In Advances in Neural Information Processing Systems, pages 2449–2457, 2015.
- [64].Smith James S and Wilamowski Bogdan M. Discrete cosine transform spectral pooling layers for convolutional neural networks. In International Conference on Artificial Intelligence and Soft Computing, pages 235–246. Springer, 2018.
- [65].Zhang Hao and Ma Jianwei. Hartley spectral pooling for deep learning. arXiv preprint arXiv:1810.04028, 2018.
- [66].Watson Andrew B. Image compression using the discrete cosine transform. Mathematica journal, 4(1):81, 1994.
- [67].Alshibami O and Boussakta Said. Fast algorithm for the 3d dct. In 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No. 01CH37221), volume 3, pages 1945–1948. IEEE, 2001.
- [68].Han Xufeng, Leung Thomas, Jia Yangqing, Sukthankar Rahul, and Berg Alexander C. Matchnet: Unifying feature and metric learning for patch-based matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3279–3286, 2015.
- [69].Yi Kwang Moo, Trulls Eduard, Lepetit Vincent, and Fua Pascal. Lift: Learned invariant feature transform. In European Conference on Computer Vision, pages 467–483. Springer, 2016.
- [70].Schmidt Tanner, Newcombe Richard, and Fox Dieter. Self-supervised visual descriptor learning for dense correspondence. IEEE Robotics and Automation Letters, 2(2):420–427, 2016.
- [71].Melekhov Iaroslav, Kannala Juho, and Rahtu Esa. Image patch matching using convolutional descriptors with euclidean distance. In Asian Conference on Computer Vision, pages 638–653. Springer, 2016.
- [72].Song Hyun Oh, Jegelka Stefanie, Rathod Vivek, and Murphy Kevin. Deep metric learning via facility location. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5382–5390, 2017.
- [73].Bartesaghi Alberto, Sprechmann P, Liu J, Randall G, Sapiro G, and Subramaniam Sriram. Classification and 3d averaging with missing wedge correction in biological electron tomography. Journal of structural biology, 162(3):436–450, 2008.
- [74].Eggert David W, Lorusso Adele, and Fisher Robert B. Estimating 3-d rigid body transformations: a comparison of four major algorithms. Machine vision and applications, 9(5-6):272–290, 1997.
- [75].Xu Min, Singla Jitin, Tocheva Elitza I, Chang Yi-Wei, Stevens Raymond C, Jensen Grant J, and Alber Frank. De novo structural pattern mining in cellular electron cryotomograms. Structure, 2019.
- [76].Guo Qiang, Lehmer Carina, Martínez-Sánchez Antonio, Rudack Till, Beck Florian, Hartmann Hannelore, Pérez-Berlanga Manuela, Frottin Frédéric, Hipp Mark S, Hartl F Ulrich, et al. In situ structure of neuronal c9orf72 poly-ga aggregates reveals proteasome recruitment. Cell, 172(4):696–705, 2018.
- [77].Bharat Tanmay AM and Scheres Sjors HW. Resolving macromolecular structures from electron cryo-tomography data using subtomogram averaging in relion. Nature protocols, 11(11):2054, 2016.
- [78].Kunz Michael, Yu Zhou, and Frangakis Achilleas S. M-free: Mask-independent scoring of the reference bias. Journal of structural biology, 192(2):307–311, 2015.
- [79].Noble Alex J, Dandey Venkata P, Wei Hui, Brasch Julia, Chase Jillian, Acharya Priyamvada, Zi Tan Yong, Zhang Zhening, Kim Laura Y, Scapin Giovanna, et al. Routine single particle cryoem sample and grid characterization by tomography. Elife, 7:e34257, 2018.
- [80].Noble Alex J, Wei Hui, Dandey Venkata P, Zhang Zhening, Zi Tan Yong, Potter Clinton S, and Carragher Bridget. Reducing effects of particle adsorption to the air–water interface in cryo-em. Nature methods, 15(10):793, 2018.
- [81].Galaz-Montoya Jesús G, Flanagan John, Schmid Michael F, and Ludtke Steven J. Single particle tomography in eman2. Journal of structural biology, 190(3):279–290, 2015.
- [82].Pei Long, Xu Min, Frazier Zachary, and Alber Frank. Simulating cryo electron tomograms of crowded cell cytoplasm for assessment of automated particle picking. BMC bioinformatics, 17(1):405, 2016.
- [83].Chollet François et al. Keras, 2015.
- [84].Abadi Martín, Barham Paul, Chen Jianmin, Chen Zhifeng, Davis Andy, Dean Jeffrey, Devin Matthieu, Ghemawat Sanjay, Irving Geoffrey, Isard Michael, et al. Tensorflow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283, 2016.
- [85].Zhou Jiahuan, Xu Weiqi, and Chellali Ryad. Analysing the effects of pooling combinations on invariance to position and deformation in convolutional neural networks. In 2017 IEEE International Conference on Cyborg and Bionic Systems (CBS), pages 226–230. IEEE, 2017.
- [86].Briggs John AG. Structural biology in situ—the potential of subtomogram averaging. Current opinion in structural biology, 23(2):261–267, 2013.
- [87].Van Heel Marin and Schatz Michael. Fourier shell correlation threshold criteria. Journal of structural biology, 151(3):250–262, 2005.
- [88].Han Renmin, Wang Liansan, Liu Zhiyong, Sun Fei, and Zhang Fa. A novel fully automatic scheme for fiducial marker-based alignment in electron tomography. Journal of structural biology, 192(3):403–417, 2015.
- [89].Zhou Z Hong. Towards atomic resolution structural determination by single-particle cryo-electron microscopy. Current opinion in structural biology, 18(2):218–228, 2008.
- [90].Drira Hassen, Ben Amor Boulbaba, Srivastava Anuj, Daoudi Mohamed, and Slama Rim. 3d face recognition under expressions, occlusions, and pose variations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(9):2270–2283, 2013.
- [91].Zhu Xiangyu, Lei Zhen, Liu Xiaoming, Shi Hailin, and Li Stan Z. Face alignment across large poses: A 3d solution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 146–155, 2016.
- [92].Ye Yuanxin and Shen Li. Hopc: A novel similarity metric based on geometric structural properties for multi-modal remote sensing image matching. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 3:9, 2016.
- [93].Chen Min, Habib Ayman, He Haiqing, Zhu Qing, and Zhang Wei. Robust feature matching method for sar and optical images by using gaussian-gamma-shaped bi-windows-based descriptor and geometric constraint. Remote Sensing, 9(9):882, 2017.
- [94].Chailloux Cyril, Le Caillec Jean-Marc, Gueriot Didier, and Zerr Benoit. Intensity-based block matching algorithm for mosaicing sonar images. IEEE Journal of Oceanic Engineering, 36(4):627–645, 2011.
- [95].Pham Minh Tân and Gueriot Didier. Guided block-matching for sonar image registration using unsupervised kohonen neural networks. In 2013 OCEANS-San Diego, pages 1–5. IEEE, 2013.