This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2024 Apr 15:arXiv:2404.10178v1. [Version 1]

CryoMAE: Few-Shot Cryo-EM Particle Picking with Masked Autoencoders

Chentianye Xu 1, Xueying Zhan 2, Min Xu 2
PMCID: PMC11065045  PMID: 38699171

Abstract

Cryo-electron microscopy (cryo-EM) emerges as a pivotal technology for determining the architecture of cells, viruses, and protein assemblies at near-atomic resolution. Traditional particle picking, a key step in cryo-EM, struggles with manual effort and automated methods’ sensitivity to low signal-to-noise ratio (SNR) and varied particle orientations. Furthermore, existing neural network (NN)-based approaches often require extensive labeled datasets, limiting their practicality. To overcome these obstacles, we introduce cryoMAE, a novel approach based on few-shot learning that harnesses the capabilities of Masked Autoencoders (MAE) to enable efficient selection of single particles in cryo-EM images. Contrary to conventional NN-based techniques, cryoMAE requires only a minimal set of positive particle images for training yet demonstrates high performance in particle detection. Furthermore, the implementation of a self-cross similarity loss ensures distinct features for particle and background regions, thereby enhancing the discrimination capability of cryoMAE. Experiments on large-scale cryo-EM datasets show that cryoMAE outperforms existing state-of-the-art (SOTA) methods, improving 3D reconstruction resolution by up to 22.4%.

Keywords: Cryo-electron microscopy, particle picking, few-shot learning, masked autoencoder

1. Introduction

Cryo-EM is vital for obtaining high-resolution images of biological entities, such as cells, viruses, and proteins, at cryogenic temperatures, significantly minimizing radiation damage. It has revolutionized structural biology, especially through single-particle analysis (SPA), allowing for the detailed examination of molecular structures in their near-native state [12]. The process starts with sample preparation, where specimens are vitrified in a thin ice layer to maintain their native state. Researchers then use a transmission electron microscope to gather multiple 2D projection images from different angles. Image processing includes denoising and identifying particles for 3D reconstruction. Fig. 1 presents a simplified workflow of SPA using cryo-EM [25].

Figure 1:

In cryo-EM with SPA, electron beams capture numerous 2D images of proteins within a cryogenically preserved sample. These images are subsequently denoised and subjected to particle picking, facilitating the reconstruction of the 3D structure of the protein.

Particle picking is a pivotal step in cryo-EM for isolating individual protein particles from micrographs for further analysis. The quality of particle picking significantly influences the accuracy and resolution of the reconstructed particle structure in the following steps. Challenges in particle picking include the low SNR and varied particle orientations in cryo-EM micrographs, necessitating a large sample size for accurate 3D reconstructions [1]. Moreover, manual picking is inefficient, time-consuming, labor-intensive, error-prone, and introduces dataset inconsistencies [4]. Mis-identifications, or false positives, further compromise reconstruction quality. These issues highlight the need for improved particle selection techniques to enhance both the efficiency of particle identification and the overall quality of cryo-EM reconstructions, emphasizing the reduction of false positives and the increase of true positives [11].

Various semi-automated and automated cryo-EM particle picking methods have been developed in response to this need. Traditional methods are categorized into template-free [13] and template-based methods [14, 16, 17, 19]. Template-free methods like the Difference of Gaussians (DoG) [21] are noise-sensitive and less effective for irregular particles. Template-based approaches struggle with particle variability and are ill-suited for novel structures, limiting their efficacy in complex cryo-EM analysis. With the advent of deep learning, NN-based particle picking methods [1, 22, 23, 26] have been proposed, marking a significant evolution in the field. These advanced techniques leverage the powerful pattern recognition capabilities of deep learning models to enhance the accuracy and efficiency of particle picking. Among these methods, crYOLO [22] and Topaz [1] are notable for their widespread application. While crYOLO is recognized for its efficiency in particle detection, it occasionally misses real particles. Topaz, though capable of identifying particles with limited labeled data, is susceptible to false positives and duplicates. Despite claims of minimal data requirements, these methods still often require large-scale labeled datasets for improved performance. Moreover, they exhibit limited generalization to unseen data, restricting their applicability in diverse cryo-EM research settings.

In this study, we present cryoMAE, a cryo-EM particle picking approach that draws inspiration from MAE [7]. Leveraging the few-shot learning paradigm, cryoMAE first learns representative particle features efficiently from a limited set of cryo-EM particle regions. It then detects and extracts particles from query micrographs by comparing the latent features generated for exemplars against those from regions within the query micrographs. The operation of cryoMAE unfolds in two distinct stages. In the first stage, it trains on a curated set of particle regions and a broader selection of unlabeled regions from a reference micrograph, using a self-supervised approach. We introduce a self-cross similarity loss that ensures the cryoMAE encoder generates distinct latent features for particle and non-particle areas. In the second stage, the trained encoder analyzes query micrographs, extracting latent features and comparing them to exemplar features to ascertain particle locations through similarity scoring.

The performance of cryoMAE was rigorously evaluated using the CryoPPP cryo-EM particle picking dataset [4], showcasing significant enhancements in 3D particle reconstruction resolution. Particles selected by our model from this dataset exhibit up to 22.4% (average 11.1%) improvement in resolution compared to those picked using current SOTA models. Remarkably, these results were achieved using just a few labeled exemplars (e.g., 15) per protein type, highlighting cryoMAE’s efficient use of limited data.

Our contributions are summarized as follows:

  1. We introduce cryoMAE, an innovative two-stage few-shot learning method specifically designed for the SPA cryo-EM particle picking task. This approach markedly reduces the reliance on extensive labeled datasets.

  2. We propose a novel formulation of self-cross similarity loss that improves the model’s capacity to differentiate between particle objects and background regions.

  3. Our experimental findings indicate that cryoMAE achieves up to 22.4% improvement in the resolution of 3D particle reconstructions compared to SOTA NN-based particle picking methods.

2. Related Work

2.0.1. Particle Picking.

As introduced in Section 1, a variety of approaches exist for cryo-EM particle picking, ranging from automated to semi-automated techniques. Template-matching methods use predefined reference images, or “templates”, and cross-correlation for particle identification [20, 24]; they perform best with known particle structures but are limited by template quality and diversity. In contrast, template-free methods bypass the need for templates, employing computer vision techniques to distinguish particles. For instance, the DoG method emphasizes particles by contrasting two differently blurred versions of the image, improving visibility but risking noise amplification in low-SNR scenarios.

NN-based particle picking methods [1, 22, 23, 26] provide significant advances in cryo-EM, offering more accurate, efficient, and accessible solutions. These methods can learn from a wide range of particle shapes, sizes, and orientations directly from the training data, making them more adaptable to different datasets without the need for specific templates. CrYOLO [22] and Topaz [1] are distinguished for their advanced particle picking capabilities in cryo-EM. CrYOLO leverages the You Only Look Once framework [15] for particle detection, and Topaz employs convolutional neural networks (CNNs) with positive-unlabeled (PU) learning. Despite their strengths, crYOLO may overlook true particles, while Topaz is prone to recognizing numerous false positives and duplicates [5]. They require extensive labeled datasets, demanding significant time and resources. Our cryoMAE, utilizing few-shot learning, offers high efficiency using a minimal number of exemplars. It effectively reduces false negatives and positives, and minimizes reliance on large labeled datasets, representing a significant leap in cryo-EM particle picking technology.

2.0.2. Masked Autoencoders.

MAEs were initially introduced by He et al. [7], drawing inspiration from the BERT model [3], a transformative approach in natural language processing. MAEs bring the innovative concept of masking into the realm of computer vision, a technique where random sections of an image are obscured (masked) before being processed by an encoder. Subsequently, a decoder attempts to reconstruct these masked sections. [7] demonstrated that masking a substantial portion of the image (up to 75%) compels the model to learn deeper and more comprehensive representations of the data. In our study, we harness the exceptional feature extraction capabilities of MAEs to discern unique features of particles, thereby enhancing the efficiency and accuracy of particle picking in cryo-EM.

2.0.3. Contrastive Learning.

Contrastive Learning has been a transformative force in unsupervised learning, concentrating on increasing the similarity between representations of positive pairs while simultaneously differentiating those of negative pairs. Pioneering this approach, the concept of contrastive loss was introduced for dimensionality reduction and embedding learning, aiming to preserve semantic similarity [6]. Further advances have been made with the development of SimCLR [2], which utilizes data augmentation techniques to enhance the robustness of visual representations. Moreover, He et al. [8] introduced Momentum Contrast, a methodology for building dynamic dictionaries in contrastive learning, which refines the application of contrastive loss. This refinement ensures the consistency of the representations for negative samples across the learning process. In our research, we leverage the principles of contrastive learning to develop a unique contrastive loss mechanism called self-cross similarity loss. This innovation enables our model to effectively discriminate between regions containing particles and background regions.

3. Methodology

In this section, we detail cryoMAE, starting with defining the few-shot cryo-EM particle picking problem, followed by our two-stage framework.

3.1. Overview

3.1.1. Problem setup.

Given a set of micrographs containing the target particles for analysis, we first randomly select a reference micrograph $R$ and manually label $m$ particle regions $x_i^l$ as exemplars, $X_L = \{x_i^l\}_{i=1}^{m}$, where $m$ is a small number (e.g., 15). We then randomly crop $n$ additional regions $x_j^u$ from the same micrograph as unlabeled regions, $X_U = \{x_j^u\}_{j=1}^{n}$. The remaining micrographs containing the same particle form the query micrograph set $Q$. Our goal is to leverage the limited set of exemplars $X_L$ and the unlabeled regions $X_U$ extracted from $R$ to detect the particles within $R$ and $Q$.

As depicted in Fig. 2, our framework unfolds in two distinct stages. In stage 1, cryoMAE is trained using a mixture of labeled exemplars $X_L$ and unlabeled regions $X_U$ from $R$. This training process is guided by both a mean squared error reconstruction loss and a novel self-cross similarity loss, which helps the model distinguish between regions with and without particles. In stage 2, the trained MAE encoder scans query micrographs to identify particles, comparing latent features of regions against those of exemplars to determine similarity scores. Regions with higher similarity scores are identified as more likely to contain particles, facilitating accurate particle picking.

Figure 2:

Overview of the two-stage cryoMAE framework: stage 1 illustrates the training phase with a mix of labeled particle and unlabeled regions, employing reconstruction loss and self-cross similarity loss. Stage 2 depicts the particle picking process, where the trained MAE encoder assesses query micrographs, leveraging latent feature comparisons to identify particle positions accurately.

3.2. Stage 1: Training on One Reference Micrograph

3.2.1. Model training.

For each protein type represented by multiple micrographs, we select a reference micrograph $R$ with manually annotated regions $X_L$ as exemplars and crop random unlabeled regions $X_U$ from the remaining parts of $R$. As discussed in [1], particle regions are sparse within micrographs, making most unlabeled regions likely non-particle areas. These images are resized to 224 × 224 and divided into 16 × 16 patches during training, which are then randomly masked at a rate of 75%. This process transforms the exemplar and unlabeled regions into $\hat{x}_i^l$ and $\hat{x}_j^u$, respectively. The cryoMAE encoder then generates latent features for these regions, denoted $E(\hat{x}_i^l)$ and $E(\hat{x}_j^u)$. Subsequently, the MAE decoder uses the generated latent features to reconstruct the original input images. This reconstruction is self-supervised, with the original images serving as the supervisory signal. Masking encourages the model to focus on global features of cryo-EM images, enhancing its understanding of particle structures and its generalization across conditions. Such a focus is crucial for overcoming the limited training data available in the cryo-EM field, improving the model’s performance in particle detection and generalization.
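The patching and masking step described above can be sketched as follows. This is a minimal NumPy illustration only: `patchify` and `random_mask` are hypothetical helper names, the input is a synthetic single-channel region, and the encoder and decoder themselves are omitted.

```python
import numpy as np

def patchify(image, patch=16):
    """Split an (H, W) image into non-overlapping, flattened patch rows."""
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0
    gh, gw = h // patch, w // patch
    # (gh, patch, gw, patch) -> (gh, gw, patch, patch) -> (gh*gw, patch*patch)
    blocks = image.reshape(gh, patch, gw, patch).transpose(0, 2, 1, 3)
    return blocks.reshape(gh * gw, patch * patch)

def random_mask(patches, mask_ratio=0.75, rng=None):
    """Keep a random subset of patches; mask is True where a patch is hidden."""
    rng = np.random.default_rng() if rng is None else rng
    num = patches.shape[0]
    num_keep = int(round(num * (1.0 - mask_ratio)))
    keep_idx = np.sort(rng.permutation(num)[:num_keep])
    mask = np.ones(num, dtype=bool)
    mask[keep_idx] = False
    return patches[keep_idx], keep_idx, mask

# Synthetic stand-in for one resized 224 x 224 region (assumption: 1 channel).
region = np.random.default_rng(0).normal(size=(224, 224))
patches = patchify(region)
visible, keep_idx, mask = random_mask(patches, rng=np.random.default_rng(1))
print(patches.shape, visible.shape, int(mask.sum()))  # (196, 256) (49, 256) 147
```

Only the 49 visible patches would be passed to the encoder; the decoder would then be asked to reconstruct the 147 hidden ones.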

Training cryoMAE incorporates both particle and unlabeled regions to bolster model robustness. Training exclusively on particle images could lead the MAE to converge towards a homogeneous latent feature space for any given input, potentially raising the false positive rate by assigning high similarity scores indiscriminately, including to background regions. By including unlabeled regions, cryoMAE learns to recognize features of non-particle spaces, avoiding overfitting to a solely particle-focused feature space. This broader training approach refines the model’s ability to distinguish between particle and background regions, markedly lowering false positive rates by assigning more accurate similarity scores to non-particle areas. However, adding unlabeled regions poses two challenges: 1) the background noise in cryo-EM is diverse, ranging from crystalline ice contamination and malformed particles to grayscale background regions, which demands a nuanced approach for accurate differentiation; 2) merely incorporating unlabeled data might not prompt the model to learn features unique to particles against complex backgrounds. To improve the training efficiency of cryoMAE on few-shot particle datasets and reduce overfitting risks, while also accounting for a wide range of background noise, we introduce a pre-training phase. Pre-training cryoMAE on a broader set of unlabeled regions better represents background variability. In addition, the self-cross similarity loss specifically addresses these noise issues, enhancing the model’s ability to discern particles from backgrounds.

3.2.2. Self-cross similarity.

Drawing from the self-similarity concept [18], we develop a self-cross similarity loss to foster distinct latent features for particles and background within cryo-EM images, enhancing the model’s ability to differentiate between these regions. This approach aims to increase the disparity in the feature space, thereby improving the precision of particle identification. As illustrated in Fig. 2, the MAE encoder’s latent features are used not only for image reconstruction by the decoder but are also evaluated using the self-cross similarity loss, further detailed in Fig. 3. The cosine similarity between two feature vectors $a$ and $b$ is calculated as $S_{\cos}(a, b) = \frac{a^{\top} b}{\|a\|\,\|b\|}$.

Figure 3:

Self-cross similarity loss.

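The self-similarity $S_{\text{self}}$ is calculated as the mean cosine similarity among the features of positive regions, formalized as:

$$S_{\text{self}} = \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m} S_{\cos}\big(E(\hat{x}_i^l), E(\hat{x}_j^l)\big). \tag{1}$$

Similarly, the cross similarity $S_{\text{cross}}$ is the mean cosine similarity between features of positive and unlabeled regions:

$$S_{\text{cross}} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} S_{\cos}\big(E(\hat{x}_i^l), E(\hat{x}_j^u)\big). \tag{2}$$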

$S_{\text{self}}$ measures the similarity among latent features of exemplars, reflecting the internal consistency of particle features. This is crucial for the model to identify and enhance particle-specific patterns, facilitating better distinction from background noise. The goal is for $S_{\text{self}}$ to increase, indicating stronger similarity within particle groups. Conversely, $S_{\text{cross}}$ assesses the similarity between exemplar features and those of unlabeled (negative) regions, aiming to capture the distinctiveness between particles and background. The objective is for $S_{\text{cross}}$ to decrease, signifying reduced feature similarity between particles and background. The self-cross similarity loss $L_{SCS}$ is designed to optimize these dynamics, thereby improving the model’s ability to differentiate between particles and backgrounds:

$$L_{SCS}(S_{\text{cross}}, S_{\text{self}}) = \max\big(\tau,\; 1 + \alpha S_{\text{cross}} - (1-\alpha) S_{\text{self}}\big). \tag{3}$$

α balances self and cross-similarity contributions, and τ sets a minimum difference threshold between them, limiting further distinction efforts beyond it.
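Equations (1) to (3) can be sketched in a few lines of NumPy. This is an illustration rather than the authors' implementation: the function names are hypothetical, the features are hand-made vectors instead of encoder outputs, and the values `alpha=0.5` and `tau=0.0` are placeholders (the text above does not state the values used).

```python
import numpy as np

def cosine_matrix(a, b):
    """Pairwise cosine similarities between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def self_cross_similarity_loss(z_l, z_u, alpha=0.5, tau=0.0):
    """z_l: (m, d) exemplar features; z_u: (n, d) unlabeled-region features."""
    s_self = float(cosine_matrix(z_l, z_l).mean())    # Eq. (1): mean over m*m pairs
    s_cross = float(cosine_matrix(z_l, z_u).mean())   # Eq. (2): mean over m*n pairs
    loss = max(tau, 1.0 + alpha * s_cross - (1.0 - alpha) * s_self)  # Eq. (3)
    return s_self, s_cross, loss

# Toy features: exemplars share one direction, unlabeled regions an orthogonal one.
z_l = np.array([[1.0, 0.0], [2.0, 0.0]])
z_u = np.array([[0.0, 1.0], [0.0, 3.0], [0.0, 2.0]])
s_self, s_cross, loss = self_cross_similarity_loss(z_l, z_u)
print(s_self, s_cross, loss)  # 1.0 0.0 0.5
```

In this toy case the exemplars are perfectly self-similar and orthogonal to the background, so the loss sits at its lowest value for the chosen `alpha`; features that mix the two groups would raise `s_cross`, lower `s_self`, and increase the loss.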

3.2.3. PU learning.

Inspired by [1], we identify a limitation in the loss function design above, which treats unlabeled data as negative. Randomly cropped training regions may unintentionally include particles, potentially confusing the model’s distinction between labeled particles and background noise. This overlap complicates training, as the model could wrongly link particle features with the background, undermining our strategy to reduce background similarity scores and challenging the model’s ability to learn discriminatively. To enhance the loss formulation, we accommodate the potential inclusion of particles in unlabeled regions. Acknowledging that a certain proportion ($\hat{\pi}$) of these samples may harbor particles, we modify the representation of features for these unlabeled samples.

We adjust feature representation by implementing a weighting scheme grounded in the estimated probability $\hat{\pi}$ that an unlabeled region harbors a particle, alongside a complementary weight $1-\hat{\pi}$ for regions likely devoid of particles. This probabilistic approach enhances the model’s capacity to differentiate between particle-laden regions and pure background, optimizing the use of unlabeled data in training and improving particle identification accuracy. The presence of particles in unlabeled regions necessitates a recalibration of the similarity calculations, which now account for self-similarity among potential positives and their cross-similarity with potential negatives within the unlabeled data:

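$$\hat{S}_{\text{self}} = \frac{1}{(m+n)^2}\left(S_{ll} + 2\hat{\pi} S_{lu} + \hat{\pi}^2 S_{uu}\right), \tag{4}$$

$$\hat{S}_{\text{cross}} = \frac{1}{(m+n)\times n}\left((1-\hat{\pi}) S_{lu} + \hat{\pi}(1-\hat{\pi}) S_{uu}\right). \tag{5}$$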

$S_{ll}$, $S_{lu}$, and $S_{uu}$ denote the sums of cosine similarities among exemplars, between exemplars and unlabeled regions, and among unlabeled regions, respectively. In these formulas we do not rescale $n$, because each latent feature adjustment is treated as a weighting: we view the unlabeled set as $n$ latent features weighted by $\hat{\pi}$ and $1-\hat{\pi}$, rather than as $\hat{\pi}n$ particle regions and $(1-\hat{\pi})n$ background regions. This keeps the formulation aligned with Fig. 3. The refined self-cross similarity loss, $\hat{L}_{SCS}(\hat{S}_{\text{cross}}, \hat{S}_{\text{self}})$, thus accounts for particles that may be present in the unlabeled data, yielding a more discriminative training signal.
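A small NumPy sketch of the PU-adjusted similarities in Eqs. (4) and (5) is given below. The function names and the toy features are hypothetical, and the sketch assumes the refined loss reuses the form of Eq. (3) with the adjusted similarities substituted in; `alpha`, `tau`, and the value of `pi_hat` are placeholders.

```python
import numpy as np

def cosine_matrix(a, b):
    """Pairwise cosine similarities between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def pu_adjusted_similarities(z_l, z_u, pi_hat):
    """Return (S_self_hat, S_cross_hat) for exemplars z_l and unlabeled z_u."""
    m, n = len(z_l), len(z_u)
    s_ll = cosine_matrix(z_l, z_l).sum()   # sum among exemplars
    s_lu = cosine_matrix(z_l, z_u).sum()   # sum between exemplars and unlabeled
    s_uu = cosine_matrix(z_u, z_u).sum()   # sum among unlabeled regions
    s_self_hat = (s_ll + 2 * pi_hat * s_lu + pi_hat**2 * s_uu) / (m + n) ** 2   # Eq. (4)
    s_cross_hat = ((1 - pi_hat) * s_lu + pi_hat * (1 - pi_hat) * s_uu) / ((m + n) * n)  # Eq. (5)
    return float(s_self_hat), float(s_cross_hat)

def refined_loss(s_cross_hat, s_self_hat, alpha=0.5, tau=0.0):
    # Assumption: same functional form as Eq. (3), applied to adjusted terms.
    return max(tau, 1.0 + alpha * s_cross_hat - (1.0 - alpha) * s_self_hat)

z_l = np.array([[1.0, 0.0], [2.0, 0.0]])               # m = 2 exemplars
z_u = np.array([[0.0, 1.0], [0.0, 3.0], [0.0, 2.0]])   # n = 3 unlabeled regions
s_self_hat, s_cross_hat = pu_adjusted_similarities(z_l, z_u, pi_hat=0.5)
print(s_self_hat, s_cross_hat, refined_loss(s_cross_hat, s_self_hat))
```

Setting `pi_hat=0` removes every term involving the unlabeled-unlabeled sum, which corresponds to treating all unlabeled regions as background, although the normalizers still differ from those in Eqs. (1) and (2).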

The total loss of cryoMAE combines the reconstruction loss with the refined self-cross similarity loss:

$$L_{\text{total}} = L_{\text{MSE}} + \beta \hat{L}_{SCS}. \tag{6}$$

Here β adjusts the weight of the self-cross similarity loss in the overall loss function, balancing reconstruction accuracy with discriminative learning.

3.3. Stage 2: Particle Picking on Query Micrographs

In stage 2, our model undertakes particle picking by utilizing the MAE encoder to scan query micrographs and extract features from each sliding window region, as detailed in Stage 2 of Fig. 2. This stage does not employ masking for the input regions. The extracted latent features are then matched against those of exemplars through cosine similarity, assigning similarity scores to each region based on the highest similarity. Following the completion of the sliding process on a micrograph, these similarity scores are ranked. It is crucial to recognize the variability in the imaging states of different micrographs, where a single threshold does not work well. Therefore, we adopt a density-based method to determine the most suitable cutoff threshold for each micrograph automatically. This process involves calculating the average distance of each score to its k nearest neighbors, and finding the score where the rate of change in these average distances is maximized as the cutoff threshold. Coordinates of all regions with similarity scores exceeding this threshold, along with the micrograph filenames, are recorded in a .star file. The .star format is widely used in cryo-EM to document particle coordinates, aiding in subsequent steps like 3D reconstruction using CryoSPARC.
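The sliding-window scoring described above can be sketched as follows. This is a toy illustration under stated assumptions: `score_micrograph` is a hypothetical name, `toy_encode` is a placeholder that merely mean-centres pixels and stands in for the trained MAE encoder, and the window and stride values are chosen for the small synthetic image rather than taken from the paper.

```python
import numpy as np

def score_micrograph(micrograph, exemplar_feats, encode, window, stride):
    """Score each sliding-window region by its highest cosine similarity
    to any exemplar feature. Returns a list of (x, y, score) tuples."""
    ex = exemplar_feats / np.linalg.norm(exemplar_feats, axis=1, keepdims=True)
    h, w = micrograph.shape
    scored = []
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            feat = encode(micrograph[y:y + window, x:x + window])
            feat = feat / (np.linalg.norm(feat) + 1e-12)  # guard against zero norm
            scored.append((x, y, float(np.max(ex @ feat))))
    return scored

def toy_encode(region):
    # Placeholder for the trained encoder (assumption): mean-centred pixels.
    return (region - region.mean()).ravel()

rng = np.random.default_rng(0)
particle = rng.normal(size=(16, 16))
micrograph = 0.01 * rng.normal(size=(64, 64))   # faint synthetic background
micrograph[16:32, 32:48] += particle            # plant one "particle"
exemplars = np.stack([toy_encode(particle)])
scored = score_micrograph(micrograph, exemplars, toy_encode, window=16, stride=8)
best = max(scored, key=lambda r: r[2])
print(best[:2])  # (32, 16): the window where the particle was planted
```

The resulting per-window scores are what a thresholding step would then rank and cut.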

4. Experiments

This section evaluates cryoMAE against SOTA particle picking methods using the CryoPPP dataset, including ablation studies, sensitivity analysis, and qualitative visualizations to demonstrate its effectiveness.

4.1. Experimental Setup

4.1.1. Datasets.

We evaluated cryoMAE using five distinct particle datasets from CryoPPP [4], which were obtained from the Electron Microscopy Public Image Archive (EMPIAR) database [10]. EMPIAR is a publicly accessible resource that offers raw, high-resolution cryo-EM images for research and benchmarking in the field of electron microscopy. The datasets used in our experiments, identified by EMPIAR IDs 10081, 10093, 10345, 10532, and 11056, comprise 300, 300, 300, 300, and 361 micrographs, respectively, each accompanied by particle coordinate information. Each EMPIAR ID corresponds to a unique protein type, facilitating targeted analysis within our SPA framework.

4.1.2. Baselines.

In this study, we used crYOLO [22] and Topaz [1], introduced in Section 2, as our baselines. For crYOLO, we employed the general model pre-trained on more than 40 datasets, which is claimed in [22] to select particles of previously unseen macromolecular species. For Topaz, we used a pre-trained model based on ResNet [9] (16 layers, each with 64 units) trained on large-scale cryo-EM datasets.

4.1.3. Evaluation metrics.

Our evaluation metrics include precision, recall, and F1 scores. A true positive occurs when a picked particle region overlaps with a ground truth region, achieving an intersection over union (IoU) of 0.5 or higher, with each ground truth accounted for only once. False positives include picked regions that either have an IoU less than 0.5 with any ground truth region or represent multiple detections for a single ground truth. False negatives are ground truth regions that remain undetected.
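The matching rule above can be illustrated with a short sketch. It assumes axis-aligned boxes given as `(x1, y1, x2, y2)` and a greedy one-to-one assignment in pick order; the paper does not specify the assignment procedure, so that part and the function names are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def evaluate(picks, truths, iou_thr=0.5):
    """Greedy matching: each ground truth can be claimed by at most one pick."""
    matched, tp = set(), 0
    for p in picks:
        best_j, best_iou = None, 0.0
        for j, t in enumerate(truths):
            if j in matched:
                continue  # already accounted for; a repeat detection becomes a FP
            v = iou(p, t)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_j is not None and best_iou >= iou_thr:
            matched.add(best_j)
            tp += 1
    fp = len(picks) - tp
    fn = len(truths) - tp
    precision = tp / (tp + fp) if picks else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1

truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
picks = [(1, 0, 11, 10),    # IoU 90/110 with the first truth: true positive
         (0, 1, 10, 11),    # second detection of the same truth: false positive
         (50, 50, 60, 60)]  # no overlap with any truth: false positive
precision, recall, f1 = evaluate(picks, truths)
print(precision, recall, f1)
```

Here one of three picks is a true positive and one of two ground truths is found, giving precision 1/3, recall 1/2, and an F1 score of 0.4.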

4.1.4. Particle picking.

The cryoMAE encoder slides on and processes query images in stage 2 with a stride of 28, extracting features for each sub-region. These features are matched against exemplar features, assigning the highest similarity score to each region. Following the sliding process, scores are ordered, and a density-based approach determines the cut-off threshold by identifying a sharp change in the 5 nearest neighbor average distance list. Coordinates from regions above this threshold are pinpointed as particle locations.
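One possible reading of the density-based cutoff is sketched below. The text describes the idea (average distance of each score to its 5 nearest neighbors, then the point of sharpest change), but not the exact procedure, so the details here are assumptions: scores are sorted in ascending order, neighbors are searched among adjacent sorted values, and the cutoff is the score just after the largest jump in average distance. The function names are hypothetical.

```python
import numpy as np

def knn_mean_distance(sorted_scores, k=5):
    """Mean distance from each score to its k nearest neighbours (1D, sorted input)."""
    n = len(sorted_scores)
    assert n > k, "need more scores than neighbours"
    out = np.empty(n)
    for i in range(n):
        # In a sorted 1D array the k nearest neighbours lie within k positions.
        lo, hi = max(0, i - k), min(n, i + k + 1)
        d = np.sort(np.abs(sorted_scores[lo:hi] - sorted_scores[i]))
        out[i] = d[1:k + 1].mean()  # d[0] is the zero self-distance
    return out

def density_threshold(scores, k=5):
    """Cutoff at the score where the k-NN mean distance changes most sharply."""
    s = np.sort(np.asarray(scores, dtype=float))
    avg = knn_mean_distance(s, k)
    jump = int(np.argmax(np.abs(np.diff(avg))))
    return s[jump + 1]

# Synthetic scores: a dense background cluster and a sparser particle cluster.
rng = np.random.default_rng(0)
scores = np.concatenate([np.linspace(0.10, 0.30, 200), np.linspace(0.80, 0.90, 20)])
rng.shuffle(scores)
thr = density_threshold(scores, k=5)
picked = scores >= thr
print(thr, int(picked.sum()))
```

On this synthetic example the cutoff lands at the lowest score of the sparse high-similarity cluster, so the 20 high-scoring regions are kept and the dense background cluster is discarded. Real score distributions are less cleanly separated, which is why the cutoff is computed per micrograph.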

4.1.5. 3D reconstruction.

We utilized CryoSPARC [14] to conduct 3D reconstructions on particles selected by various methods and compared the resolutions of the reconstructed particles. The workflow, from particle picking to reconstructed structure, encompasses essential steps: contrast transfer function (CTF) estimation, 2D classification, 2D class selection, ab initio reconstruction, and homogeneous refinement. CTF estimation corrects for the microscope’s phase contrast, crucial for high-resolution reconstructions. 2D classification sorts particles into classes, removing aberrant particles to improve data quality. 2D class selection further ensures only high-quality particles are used, followed by ab initio reconstruction for an initial 3D model creation without prior knowledge.

4.2. Overall Performance

The performance comparison of crYOLO, Topaz, and cryoMAE in particle picking is detailed in Tables 1 and 2, with 3D reconstruction outcomes visualized in Fig. 4. Table 1 reveals the high precision of crYOLO but also its tendency to miss many particles. Its precision is highest on EMPIAR-10081, where crYOLO performs strongly because its pre-training data included EMPIAR-10081 particles. On the other particle types its performance drops significantly, which points to a generalization issue in the pre-trained crYOLO model. Topaz scores well in recall, albeit with a high false positive rate. In contrast, cryoMAE excels in both precision and recall, outperforming Topaz in all evaluated metrics for all five particle types and showing better recall than crYOLO on every dataset. It also surpasses crYOLO in precision and F1 score on every dataset except EMPIAR-10081. Importantly, cryoMAE leads to the highest 3D reconstruction resolution on CryoPPP dataset particles, indicating an average 11% resolution improvement over baselines.

Table 1:

Performance comparison of cryoMAE, crYOLO, and Topaz on CryoPPP.

| EMPIAR ID | Image size (px) | Particle diameter (px) | Precision: crYOLO | Precision: Topaz | Precision: Ours | Recall: crYOLO | Recall: Topaz | Recall: Ours | F1 score: crYOLO | F1 score: Topaz | F1 score: Ours |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 10081 | (3710, 3838) | 154 | 0.705 | 0.412 | 0.645 | 0.867 | 0.855 | 0.939 | 0.777 | 0.556 | 0.765 |
| 10093 | (3838, 3710) | 172 | 0.380 | 0.328 | 0.383 | 0.355 | 0.209 | 0.497 | 0.367 | 0.255 | 0.433 |
| 10345 | (3838, 3710) | 149 | 0.441 | 0.195 | 0.473 | 0.561 | 0.732 | 0.733 | 0.494 | 0.308 | 0.575 |
| 10532 | (4096, 4096) | 174 | 0.501 | 0.387 | 0.503 | 0.231 | 0.311 | 0.497 | 0.316 | 0.345 | 0.500 |
| 11056 | (5760, 4092) | 164 | 0.690 | 0.453 | 0.694 | 0.465 | 0.578 | 0.671 | 0.556 | 0.507 | 0.682 |
| Average | - | - | 0.543 | 0.355 | 0.540 | 0.496 | 0.537 | 0.667 | 0.502 | 0.394 | 0.591 |

Table 2:

Ab-initio reconstruction resolution comparison of cryoMAE, crYOLO, and Topaz across EMPIAR datasets from CryoPPP.

| EMPIAR ID | Protein type | # of micrographs | # of GT particles | Resolution (Å): crYOLO | Resolution (Å): Topaz | Resolution (Å): Ours |
|---|---|---|---|---|---|---|
| 10081 | Transport protein | 300 | 39,352 | 12.25 | 12.72 | 11.32 |
| 10093 | Membrane protein | 300 | 56,394 | 11.64 | 11.62 | 9.02 |
| 10345 | Signaling protein | 300 | 15,894 | 11.63 | 10.39 | 10.27 |
| 10532 | Viral protein | 300 | 87,933 | 12.86 | 10.85 | 9.92 |
| Average | - | - | - | 12.10 | 11.40 | 10.13 |

Figure 4:

3D reconstructions for EMPIAR-10081 and EMPIAR-10093 using crYOLO, Topaz, and cryoMAE: (a)-(c) for 10081, (d)-(f) for 10093.

4.3. Ablation Studies

Ablation studies validate the contributions of key cryoMAE components: self-cross similarity loss, pre-training, and exemplar similarity matching.

4.3.1. W/ and w/o self-cross similarity loss.

We assessed the performance of cryoMAE across three configurations of the self-cross similarity loss (without self-cross similarity loss, with unadjusted self-cross similarity loss, and with adjusted self-cross similarity loss) in Table 3, revealing optimal performance with the adjusted loss. This finding highlights the crucial impact of self-cross similarity loss in enhancing feature extraction, making cryoMAE more discerning in particle selection and greatly lowering the chance of incorrect region identification. CryoMAE without self-cross similarity loss incorrectly scores many non-particle regions highly, as is evident from the widespread white areas in Fig. 5(b) and (e). In contrast, with this loss, cryoMAE’s specificity improves: it accurately identifies particle regions, as shown in Fig. 5(c) and (f), and reduces false scores for background areas. Further insights are shown in Fig. 6, which displays a cosine similarity matrix for 12 regions, including 4 exemplars (1–4) and 8 unlabeled areas (5–12), with region 10 being a particle region. The matrix demonstrates high similarity among particle regions and lower similarity between particle and background regions, highlighting the model’s ability to group particle regions closely in the feature space and distinguish them from the background. This is key to the success of the self-cross similarity loss, enabling the model to significantly reduce similarity scores for non-target areas and concentrate high scores on central particle regions, thus reducing false positives. Conversely, models trained without this loss struggle to separate particle regions from backgrounds, leading to increased false positives.

Table 3:

Comparison of cryoMAE supervised w/o self-cross similarity loss, w/ unadjusted self-cross similarity loss $L_{SCS}$, and w/ adjusted self-cross similarity loss $\hat{L}_{SCS}$.

| EMPIAR ID | Precision: w/o | Precision: w/ $L_{SCS}$ | Precision: w/ $\hat{L}_{SCS}$ | Recall: w/o | Recall: w/ $L_{SCS}$ | Recall: w/ $\hat{L}_{SCS}$ | F1 score: w/o | F1 score: w/ $L_{SCS}$ | F1 score: w/ $\hat{L}_{SCS}$ |
|---|---|---|---|---|---|---|---|---|---|
| 10081 | 0.225 | 0.639 | 0.645 | 0.652 | 0.928 | 0.939 | 0.335 | 0.757 | 0.765 |
| 10093 | 0.143 | 0.376 | 0.383 | 0.216 | 0.493 | 0.497 | 0.172 | 0.427 | 0.433 |
| 10345 | 0.180 | 0.474 | 0.473 | 0.547 | 0.724 | 0.733 | 0.271 | 0.573 | 0.575 |
| 10532 | 0.177 | 0.495 | 0.503 | 0.269 | 0.478 | 0.497 | 0.213 | 0.486 | 0.500 |
| 11056 | 0.154 | 0.691 | 0.694 | 0.327 | 0.639 | 0.671 | 0.209 | 0.664 | 0.682 |
| Average | 0.176 | 0.535 | 0.540 | 0.402 | 0.652 | 0.667 | 0.240 | 0.581 | 0.591 |

Figure 5:

Similarity maps generated from query micrographs by cryoMAE, w/ and w/o adjusted self-cross similarity loss. (a),(d) original micrographs; (b),(e) similarity map w/o adjusted self-cross similarity loss; (c),(f) w/ adjusted self-cross similarity loss.

Figure 6:

Cosine similarity matrix for 12 regions, comprising 4 exemplars (1–4) and 8 unlabeled regions (5–12). Entries at the intersection of row i and column j denote the cosine similarity between latent features of regions i and j.

We also conduct 2D t-SNE visualizations to analyze the latent features of cryoMAE under varying conditions: trained on a dataset without unlabeled regions, trained on a dataset with unlabeled regions without the adjusted self-cross similarity loss, and trained on a dataset with unlabeled regions with the adjusted self-cross similarity loss. For each visualization, we randomly select a consistent set of 60 exemplars and 360 unlabeled regions from EMPIAR-10081 to ensure comparability across the three scenarios. The visualizations are in Fig. 7a, Fig. 7b and Fig. 7c, respectively. As demonstrated in Fig. 7a, training exclusively on particle regions leads cryoMAE to generate homogeneous latent features for any input. This approach risks elevating the false positive rate by indiscriminately assigning high similarity scores, including to background regions. Fig. 7b illustrates that incorporating unlabeled regions enables cryoMAE to discern features of non-particle regions, thus mitigating over-fitting to a particle-exclusive feature space. Consequently, the model acquires a preliminary capability to differentiate between particle and background regions, although with limited clarity (as observed in the latent feature space 2D visualization, where the blue and yellow clusters are approximately but not distinctly separated). Further advancements are evident in Fig. 7c, where the introduction of adjusted self-cross similarity loss significantly enhances the model’s ability to distinguish between background regions and particles. This improvement is illustrated by the distinct separation between the two clusters in the figure, despite the presence of some yellow points within the blue cluster. These exceptions, representing particle-containing regions within unlabeled areas, are considered reasonable.

Figure 7:

2D t-SNE visualizations of cryoMAE latent feature space (on EMPIAR-10081).

4.3.2. W/ and w/o pre-training.

Table 4 shows that pre-training markedly improves the performance of cryoMAE. Relative to training without pre-training, average precision rises by 41.4% (0.382 to 0.540), average recall by 14.8% (0.581 to 0.667), and average F1 score by 36.8% (0.432 to 0.591). Without pre-training, cryoMAE is less effective, likely because it overfits the limited training data, which hinders generalization, especially in recognizing varied particle orientations and background noise variations.

Table 4:

W/ and w/o pre-training.

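| EMPIAR ID | Precision (w/) | Precision (w/o) | Recall (w/) | Recall (w/o) | F1 Score (w/) | F1 Score (w/o) |
|---|---|---|---|---|---|---|
| 10081 | 0.645 | 0.352 | 0.939 | 0.919 | 0.765 | 0.509 |
| 10093 | 0.383 | 0.281 | 0.497 | 0.479 | 0.433 | 0.354 |
| 10345 | 0.473 | 0.209 | 0.733 | 0.581 | 0.575 | 0.307 |
| 10532 | 0.503 | 0.404 | 0.497 | 0.347 | 0.500 | 0.373 |
| 11056 | 0.694 | 0.664 | 0.671 | 0.579 | 0.682 | 0.619 |
| Average | 0.540 | 0.382 | 0.667 | 0.581 | 0.591 | 0.432 |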
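
The relative gains from pre-training can be recomputed from the Average row of Table 4. The sketch below does only this arithmetic; the helper name `relative_gain` is an illustrative assumption.

```python
def relative_gain(with_pretraining, without_pretraining):
    """Relative improvement, in percent, of the first value over the second."""
    return (with_pretraining - without_pretraining) / without_pretraining * 100.0

# Average row of Table 4: (w/ pre-training, w/o pre-training).
averages = {
    "precision": (0.540, 0.382),
    "recall": (0.667, 0.581),
    "f1": (0.591, 0.432),
}

# Round to one decimal place, matching the precision used in the text.
gains = {name: round(relative_gain(w, wo), 1) for name, (w, wo) in averages.items()}
```

With these averages, the computed gains are 41.4% for precision, 14.8% for recall, and 36.8% for F1 score.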

4.3.3. Max and mean matching strategies.

Table 5 compares two ways of computing the similarity score when matching sliding regions against exemplar latent features: maximum vs. average cosine similarity. Maximum cosine similarity outperforms average cosine similarity in precision (0.540 vs. 0.492 on average) and F1 score (0.591 vs. 0.562), while average recall is marginally higher with the mean strategy (0.673 vs. 0.667). The advantage of the maximum is linked to the varied orientation distributions among particle exemplars. Maximum cosine similarity matches each region to its closest exemplar across orientations, so a particle receives a high score whenever at least one exemplar shares its orientation. Conversely, average cosine similarity dilutes scores for particles with diverse orientations, because it averages across all exemplars, including those whose orientations differ markedly from that of the target region. This dilution lowers the similarity scores of such particles, reduces their distinctiveness from the background, and makes accurate particle identification harder amidst noise.
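
The difference between the two matching strategies can be illustrated with a toy example. This is a sketch under stated assumptions: the 2-D feature vectors are hypothetical, and `max_score` and `mean_score` are illustrative names rather than functions from the cryoMAE code.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def max_score(region, exemplars):
    """Score a region by its single best-matching exemplar."""
    return max(cosine_similarity(region, e) for e in exemplars)

def mean_score(region, exemplars):
    """Score a region by its average similarity to all exemplars."""
    return sum(cosine_similarity(region, e) for e in exemplars) / len(exemplars)

# Hypothetical features: two exemplars with very different orientations.
exemplars = [[1.0, 0.0], [0.0, 1.0]]
particle = [1.0, 0.1]    # closely matches the first exemplar only
background = [0.6, 0.6]  # moderately similar to both exemplars

particle_max, background_max = max_score(particle, exemplars), max_score(background, exemplars)
particle_mean, background_mean = mean_score(particle, exemplars), mean_score(background, exemplars)
```

In this toy setting the maximum ranks the particle above the background, whereas the mean ranks the background higher, because the particle's low similarity to the second exemplar drags its average down.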

Table 5:

Max and mean matching.

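| EMPIAR ID | Precision (Max) | Precision (Mean) | Recall (Max) | Recall (Mean) | F1 Score (Max) | F1 Score (Mean) |
|---|---|---|---|---|---|---|
| 10081 | 0.645 | 0.595 | 0.939 | 0.946 | 0.765 | 0.731 |
| 10093 | 0.383 | 0.367 | 0.497 | 0.548 | 0.433 | 0.440 |
| 10345 | 0.473 | 0.396 | 0.733 | 0.718 | 0.575 | 0.510 |
| 10532 | 0.503 | 0.498 | 0.497 | 0.502 | 0.500 | 0.500 |
| 11056 | 0.694 | 0.606 | 0.671 | 0.650 | 0.682 | 0.628 |
| Average | 0.540 | 0.492 | 0.667 | 0.673 | 0.591 | 0.562 |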

4.4. Sensitivity Analysis

In this section, we conduct a sensitivity analysis to examine the impact of the number of exemplars and the sliding stride on model performance.

4.4.1. Number of exemplars.

Table 6 shows how the model performance of cryoMAE varies with the number of exemplars used. As expected, adding more exemplars generally improves performance, owing to a more comprehensive representation of particle orientations in the similarity scoring process. This is particularly beneficial for particles with diverse orientations, as more exemplars increase the chance of capturing regions across different orientation states, improving recall. However, the performance improvement plateaus after a certain number of exemplars, with precision potentially decreasing. This is because particle orientations are limited, and once the diversity of these states is adequately covered, additional exemplars offer little benefit and may even raise false positives by increasing the likelihood of background regions being mistakenly scored highly. Thus, considering the diminishing returns beyond 15 exemplars, we identify this count as the optimal number for our few-shot learning approach.

Table 6:

CryoMAE with various exemplar number settings {1,5,15,25}.

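| EMPIAR ID | Precision (1) | Precision (5) | Precision (15) | Precision (25) | Recall (1) | Recall (5) | Recall (15) | Recall (25) | F1 Score (1) | F1 Score (5) | F1 Score (15) | F1 Score (25) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10081 | 0.167 | 0.401 | 0.645 | 0.653 | 0.774 | 0.896 | 0.939 | 0.943 | 0.275 | 0.554 | 0.765 | 0.772 |
| 10093 | 0.068 | 0.243 | 0.383 | 0.373 | 0.139 | 0.361 | 0.497 | 0.508 | 0.091 | 0.290 | 0.433 | 0.430 |
| 10345 | 0.102 | 0.183 | 0.473 | 0.460 | 0.585 | 0.706 | 0.733 | 0.772 | 0.174 | 0.291 | 0.575 | 0.576 |
| 10532 | 0.138 | 0.296 | 0.503 | 0.481 | 0.251 | 0.478 | 0.497 | 0.507 | 0.178 | 0.366 | 0.500 | 0.494 |
| 11056 | 0.287 | 0.289 | 0.694 | 0.712 | 0.353 | 0.485 | 0.671 | 0.680 | 0.317 | 0.362 | 0.682 | 0.696 |
| Average | 0.152 | 0.282 | 0.540 | 0.536 | 0.420 | 0.585 | 0.667 | 0.682 | 0.207 | 0.373 | 0.591 | 0.594 |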
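
The diminishing returns can be quantified from the Average row of Table 6 as the F1 gained per additional exemplar between consecutive settings. This is a sketch of that arithmetic only; `marginal_gains` is an illustrative helper name.

```python
# Exemplar counts and average F1 scores from the Average row of Table 6.
exemplar_counts = [1, 5, 15, 25]
average_f1 = [0.207, 0.373, 0.591, 0.594]

def marginal_gains(counts, scores):
    """F1 gained per additional exemplar between consecutive settings."""
    return [
        (scores[i + 1] - scores[i]) / (counts[i + 1] - counts[i])
        for i in range(len(counts) - 1)
    ]

gains = marginal_gains(exemplar_counts, average_f1)
```

The per-exemplar gain falls from roughly 0.04 (1 to 5 exemplars) to roughly 0.02 (5 to 15) and then to about 0.0003 (15 to 25), which is consistent with the plateau described above.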

4.4.2. Sliding strides.

Table 7 reports the performance of cryoMAE across sliding strides. Decreasing the stride from 56 to 14 typically boosts recall but lowers precision. With larger strides, a given particle appears in fewer windows, which minimizes duplicate detections and raises precision. However, particles are then more likely to lie close to window edges, which lowers their similarity scores, reduces their likelihood of being selected, and decreases recall. The F1 score, the harmonic mean of precision and recall, tends to improve with smaller strides. Yet reducing the stride substantially lengthens the processing time per query image, from 20.8 s at a stride of 56 to 356.4 s at a stride of 14 on average. Considering the trade-off between time efficiency and accuracy, we adopt a 28-pixel stride as the best balance.
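
For reference, the F1 score combines precision and recall as their harmonic mean, F1 = 2PR / (P + R). The short sketch below applies this standard formula to one row of Table 7; the function name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# EMPIAR-10081 at a stride of 28 (precision 0.645, recall 0.939 in Table 7).
f1_10081 = f1_score(0.645, 0.939)

# EMPIAR-10093 at a stride of 56 (precision 0.411, recall 0.157 in Table 7).
f1_10093 = f1_score(0.411, 0.157)
```

Rounded to three decimals, these reproduce the tabulated F1 scores of 0.765 and 0.227. Because the harmonic mean is dominated by the smaller of the two values, the sharp drop in recall at large strides outweighs the small gain in precision.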

Table 7:

CryoMAE with various sliding strides {14, 28, 42, 56}.

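| EMPIAR ID | Precision (14) | Precision (28) | Precision (42) | Precision (56) | Recall (14) | Recall (28) | Recall (42) | Recall (56) | F1 Score (14) | F1 Score (28) | F1 Score (42) | F1 Score (56) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10081 | 0.584 | 0.645 | 0.665 | 0.689 | 0.942 | 0.939 | 0.856 | 0.772 | 0.721 | 0.765 | 0.748 | 0.728 |
| 10093 | 0.376 | 0.383 | 0.399 | 0.411 | 0.489 | 0.497 | 0.311 | 0.157 | 0.425 | 0.433 | 0.350 | 0.227 |
| 10345 | 0.446 | 0.473 | 0.478 | 0.479 | 0.746 | 0.733 | 0.551 | 0.502 | 0.558 | 0.575 | 0.512 | 0.490 |
| 10532 | 0.500 | 0.503 | 0.501 | 0.503 | 0.604 | 0.497 | 0.430 | 0.343 | 0.547 | 0.500 | 0.463 | 0.408 |
| 11056 | 0.689 | 0.694 | 0.692 | 0.695 | 0.702 | 0.671 | 0.606 | 0.522 | 0.695 | 0.682 | 0.646 | 0.596 |
| Average | 0.520 | 0.540 | 0.547 | 0.555 | 0.697 | 0.667 | 0.551 | 0.459 | 0.589 | 0.591 | 0.544 | 0.490 |
| Average time (s) | 356.4 | 87.2 | 38.0 | 20.8 | - | - | - | - | - | - | - | - |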
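
The growth in processing time is roughly what one would expect from the number of sliding windows, which scales with the inverse square of the stride. The sketch below counts windows for a hypothetical 4096 × 4096 micrograph and a hypothetical 224-pixel window; both sizes are assumptions chosen for illustration, not values stated in this section.

```python
def count_windows(height, width, window, stride):
    """Number of window positions when sliding without padding."""
    rows = (height - window) // stride + 1
    cols = (width - window) // stride + 1
    return rows * cols

IMAGE_SIZE = 4096  # hypothetical micrograph side length, in pixels
WINDOW = 224       # hypothetical window side length, in pixels

strides = [14, 28, 42, 56]
reported_time_s = {14: 356.4, 28: 87.2, 42: 38.0, 56: 20.8}  # Table 7

windows = {s: count_windows(IMAGE_SIZE, IMAGE_SIZE, WINDOW, s) for s in strides}

# Halving the stride from 28 to 14 roughly quadruples both quantities.
window_ratio = windows[14] / windows[28]
time_ratio = reported_time_s[14] / reported_time_s[28]
```

Under these assumptions, halving the stride from 28 to 14 multiplies the window count by about 4.0, close to the reported increase in average time of about 4.1×.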

5. Conclusion

We introduce cryoMAE, a pioneering approach in few-shot learning tailored specifically for the cryo-EM field, significantly reducing the dependence on extensive labeled datasets for accurate particle picking. By harnessing the power of MAE and integrating a novel self-cross similarity loss, cryoMAE achieves superior performance in identifying particle-containing regions amidst the challenges posed by low SNR and diverse particle orientations. Validations on the CryoPPP dataset demonstrate cryoMAE’s superiority over existing NN-based methods, marking a significant advancement in the cryo-EM analysis pipeline. This innovation not only streamlines the process of high-resolution protein structure determination but also makes it more accessible to a wider scientific audience, promising to accelerate discoveries in structural biology.

6. Acknowledgement

This work was supported in part by U.S. NIH grants R01GM134020 and P41GM103712, NSF grants DBI-1949629, DBI-2238093, IIS-2007595, IIS-2211597, and MCB-2205148. This work was supported in part by Oracle Cloud credits and related resources provided by Oracle for Research, and the computational resources support from AMD HPC Fund.

References

  • [1] Bepler T., Morin A., Rapp M., Brasch J., Shapiro L., Noble A.J., Berger B.: Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs. Nature methods 16(11), 1153–1160 (2019)
  • [2] Chen T., Kornblith S., Norouzi M., Hinton G.: A simple framework for contrastive learning of visual representations. In: International conference on machine learning. pp. 1597–1607. PMLR (2020)
  • [3] Devlin J., Chang M.W., Lee K., Toutanova K.: Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
  • [4] Dhakal A., Gyawali R., Wang L., Cheng J.: A large expert-curated cryo-em image dataset for machine learning protein particle picking. Scientific Data 10(1), 392 (2023)
  • [5] Gyawali R., Dhakal A., Wang L., Cheng J.: Accurate cryo-em protein particle picking by integrating the foundational ai image segmentation model and specialized u-net (2023)
  • [6] Hadsell R., Chopra S., LeCun Y.: Dimensionality reduction by learning an invariant mapping. In: 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR’06). vol. 2, pp. 1735–1742. IEEE (2006)
  • [7] He K., Chen X., Xie S., Li Y., Dollár P., Girshick R.: Masked autoencoders are scalable vision learners. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 16000–16009 (2022)
  • [8] He K., Fan H., Wu Y., Xie S., Girshick R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 9729–9738 (2020)
  • [9] He K., Zhang X., Ren S., Sun J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)
  • [10] Iudin A., Korir P.K., Somasundharam S., Weyand S., Cattavitello C., Fonseca N., Salih O., Kleywegt G.J., Patwardhan A.: Empiar: the electron microscopy public image archive. Nucleic Acids Research 51(D1), D1503–D1511 (2023)
  • [11] Li H., Chen G., Gao S., Li J., Wan X., Zhang F.: A transfer learning-based classification model for particle pruning in cryo-electron microscopy. Journal of Computational Biology 29(10), 1117–1131 (2022)
  • [12] Milne J.L., Borgnia M.J., Bartesaghi A., Tran E.E., Earl L.A., Schauder D.M., Lengyel J., Pierson J., Patwardhan A., Subramaniam S.: Cryo-electron microscopy–a primer for the non-microscopist. The FEBS journal 280(1), 28–45 (2013)
  • [13] Pei L., Xu M., Frazier Z., Alber F.: Simulating cryo electron tomograms of crowded cell cytoplasm for assessment of automated particle picking. BMC bioinformatics 17, 1–13 (2016)
  • [14] Punjani A., Rubinstein J.L., Fleet D.J., Brubaker M.A.: cryosparc: algorithms for rapid unsupervised cryo-em structure determination. Nature methods 14(3), 290–296 (2017)
  • [15] Redmon J., Divvala S., Girshick R., Farhadi A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 779–788 (2016)
  • [16] Rigort A., Günther D., Hegerl R., Baum D., Weber B., Prohaska S., Medalia O., Baumeister W., Hege H.C.: Automated segmentation of electron tomograms for a quantitative description of actin filament networks. Journal of structural biology 177(1), 135–144 (2012)
  • [17] Scheres S.H.: Semi-automated selection of cryo-em particles in relion-1.3. Journal of structural biology 189(2), 114–122 (2015)
  • [18] Shi M., Lu H., Feng C., Liu C., Cao Z.: Represent, compare, and learn: A similarity-aware framework for class-agnostic counting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9529–9538 (2022)
  • [19] Tang G., Peng L., Baldwin P.R., Mann D.S., Jiang W., Rees I., Ludtke S.J.: Eman2: an extensible image processing suite for electron microscopy. Journal of structural biology 157(1), 38–46 (2007)
  • [20] Vinzenz M., Nemethova M., Schur F., Mueller J., Narita A., Urban E., Winkler C., Schmeiser C., Koestler S.A., Rottner K., et al.: Actin branching in the initiation and maintenance of lamellipodia. Journal of cell science 125(11), 2775–2785 (2012)
  • [21] Voss N., Yoshioka C., Radermacher M., Potter C., Carragher B.: Dog picker and tiltpicker: software tools to facilitate particle selection in single particle electron microscopy. Journal of structural biology 166(2), 205–213 (2009)
  • [22] Wagner T., Merino F., Stabrin M., Moriya T., Antoni C., Apelbaum A., Hagel P., Sitsel O., Raisch T., Prumbaum D., et al.: Sphire-cryolo is a fast and accurate fully automated particle picker for cryo-em. Communications biology 2(1), 218 (2019)
  • [23] Wang F., Gong H., Liu G., Li M., Yan C., Xia T., Li X., Zeng J.: Deeppicker: A deep learning approach for fully automated particle picking in cryo-em. Journal of structural biology 195(3), 325–336 (2016)
  • [24] Zeng X., Kahng A., Xue L., Mahamid J., Chang Y.W., Xu M.: High-throughput cryo-et structural pattern mining by unsupervised deep iterative subtomogram clustering. Proceedings of the National Academy of Sciences 120(15), e2213149120 (2023)
  • [25] Zhang J., Chen Q., Zeng Y., Gao W., He X., Liu Z., Yu J.: Genem: Physics-informed generative cryo-electron microscopy. arXiv preprint arXiv:2312.02235 (2023)
  • [26] Zhu Y., Ouyang Q., Mao Y.: A deep convolutional neural network approach to single-particle recognition in cryo-electron microscopy. BMC bioinformatics 18, 1–10 (2017)
