Biophysical Reviews. 2020 Mar 11;12(2):349–354. doi: 10.1007/s12551-020-00669-6

Development of a deep learning-based method to identify “good” regions of a cryo-electron microscopy grid

Yuichi Yokoyama 1, Tohru Terada 2,3, Kentaro Shimizu 2,3, Kouki Nishikawa 4,5, Daisuke Kozai 6, Atsuhiro Shimada 7, Akira Mizoguchi 8, Yoshinori Fujiyoshi 4,5, Kazutoshi Tani 8
PMCID: PMC7242580  PMID: 32162215

Abstract

Recent advances in cryo-electron microscopy (cryo-EM) have enabled protein structure determination at atomic resolutions. Cryo-EM specimens are prepared by rapidly freezing a protein solution on a metal grid coated with a holey carbon film; this results in the formation of an ice film on each hole. The thickness of the ice film is a critical factor for high-resolution structure determination; ice that is too thick degrades the contrast of the protein image, while ice that is too thin excludes the protein from the hole or denatures the protein. Therefore, trained researchers need to manually select “good” regions with appropriate ice thicknesses for imaging. To reduce the time spent on such tasks, we developed a deep learning program consisting of a “detector” and a “classifier” to identify good regions from low-magnification EM images. In our method, the detector detects the holes in a low-magnification EM image, and the classifier classifies the ice image on each hole as either good or bad. The detector detected more than 95% of the holes regardless of the type of sample. The classifier was trained separately for each type of sample because the appropriate ice thickness varies between sample types. The accuracies of the classifiers were 93.8% for a soluble protein sample (β-galactosidase) and 95.3% for a membrane protein sample (bovine heart cytochrome c oxidase). In addition, we found that a training data set containing ~2100 hole images from 300 low-magnification EM images was sufficient to obtain an accuracy higher than 90%. We expect that the throughput of the cryo-EM data collection step will be greatly improved by using our method.

Keywords: Cryo-EM, Grid, Ice thickness, Deep learning, Low-magnification image

Introduction

Cryo-electron microscopy (cryo-EM) has attracted growing attention due to its ability to determine the structures of proteins and their complexes at high resolution. A cryo-EM specimen is prepared by applying a protein solution to a metal grid. The metal grid is rapidly frozen to embed the protein molecules in vitreous ice. As a result, the native structure of the protein is maintained under high-vacuum conditions, which enables high-resolution structure determination (Dubochet et al. 1982; Cheng 2015; Fernandez-Leiro and Scheres 2016). The grid is loaded into an electron microscope, and image data are collected. Then, the data are analyzed and the 3D structure of the protein is reconstructed. In the process of the single-particle analysis of cryo-EM data, sample preparation and data collection are the rate-limiting steps. It is therefore necessary to develop methods to overcome these bottlenecks to improve the throughput of protein structure determination via cryo-EM.

Sample preparation for cryo-EM is difficult and time-consuming because the thickness of the ice is a critical factor for high-resolution structure determination, and much trial and error is often required to find conditions under which ice films of appropriate thickness are consistently formed on a grid. The grid is coated with a carbon film with a regular array of holes, and an ice film forms within each hole (Fig. 1a). Because the difference in density between the protein and the water is small, a thick film of ice results in a noisy protein image (Fig. 1b). Conversely, a thin film of ice results in the protein molecules being excluded from the hole. In addition, when the ice film is too thin, some protein molecules may be denatured due to destruction of the hydration structure around the protein or be deformed by the surface tension of the water layer (Unwin 2013; Noble et al. 2018) (Fig. 1b). Because the ice film thickness varies between different regions on a grid, researchers must find the “good” holes, which have ice films with appropriate thicknesses, before acquiring hole images at high magnification. However, considerable experience is needed to manually select good holes. One option is to acquire all hole images at high magnification. However, this would require one or more months and would result in an enormous amount of image data from which useful images would need to be extracted. Therefore, acquiring all hole images at high magnification is impractical.

Fig. 1

(a) A 200-mesh grid with a diameter of 3 mm used in this study and its EM images at different magnifications. The middle image shows a square in the mesh. The carbon film on the grid has regularly arranged holes with a diameter of 2 μm. The bottom image shows an enlarged view of the holes of the carbon film with a sample. (b) Schematic illustrations (top) and corresponding EM images (bottom) of holes with thick, good, and thin ice films. In the illustrations, the carbon film and the ice regions are colored gray and blue, respectively, and the solid green and red circles represent native and denatured protein particles, respectively. In each image, the 2-μm-diameter hole at the center of the view is enclosed in a red dashed circle. The black dotted circle within the hole with a thin ice film indicates a region from which protein molecules are excluded

Because a thicker ice film transmits fewer electrons and a thinner ice film transmits more electrons, it may be possible to identify good holes based on the brightness of the images inside the holes (Tan et al. 2016). Some software packages that assist in automated data collection, such as AutoEMation (Lei and Frank 2005) and Leginon (Suloway et al. 2005), provide a function to identify holes brighter than a given threshold to skip empty holes and holes with ice that is too thin. However, it is still difficult to find a threshold value that ensures the identification of good holes due to the variability in the ice images of good holes. Therefore, these functions are not sufficient to improve the efficiency of the data collection.
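As a rough illustration of the threshold-based screening described above, the following Python sketch flags holes whose mean brightness exceeds a cutoff. It is not the AutoEMation or Leginon implementation; the function name `flag_too_bright`, the pixel values, and the threshold of 0.9 are hypothetical and chosen only for illustration.

```python
from statistics import mean

def flag_too_bright(hole_pixels, threshold):
    """Return indices of holes whose mean brightness exceeds `threshold`.

    `hole_pixels` is a list of per-hole pixel-intensity lists (hypothetical
    input format); brighter holes are assumed to be empty or to have very
    thin ice, so they would be skipped.
    """
    return [i for i, px in enumerate(hole_pixels) if mean(px) > threshold]

# Made-up intensities on a 0-1 scale, for illustration only.
holes = [
    [0.20, 0.25, 0.22],  # dark: thick ice
    [0.55, 0.60, 0.58],  # intermediate brightness
    [0.95, 0.97, 0.96],  # very bright: empty hole or very thin ice
]
skipped = flag_too_bright(holes, threshold=0.9)
```

A single cutoff of this kind can exclude the brightest holes, but it does not by itself separate good holes from thick ones, which is the limitation noted in the text.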

In this study, we developed a deep learning-based method to identify “good” holes from low-magnification EM images. This method enables the automatic selection of holes suitable for high-resolution structure determination. In this way, high-magnification images are only acquired for the selected holes. It is therefore expected that the throughput of cryo-EM data collection will be greatly improved using this method.

Materials and methods

Protein preparation

The apo form of an E. coli β-galactosidase lyophilized sample (Sigma-Aldrich, UK, G5635-1KU) was suspended in a buffer containing 25-mM Tris-HCl (pH 8.0), 50-mM NaCl, 2-mM MgCl2, and 1-mM tris-(2-carboxyethyl)-phosphine hydrochloride (TCEP). A bovine heart cytochrome c oxidase solution was prepared as described in a previous study (Tsukihara et al. 1995). The β-galactosidase and cytochrome c oxidase samples are referred to as sample A and sample B, respectively.

Cryo-EM data acquisition

For cryo-EM, 200-mesh Quantifoil R 2/2 molybdenum grids (Quantifoil Micro Tools GmbH, Germany) were pre-irradiated overnight with an electron beam and glow discharge. An aliquot of 1.5 μl of the β-galactosidase (0.5–1 mg ml−1) or cytochrome c oxidase (3–4 mg ml−1) solution was placed on the grids. After blotting the excess solution, the grids were manually plunged into liquid ethane using a KF-80 plunge-freezing device (Leica Microsystems, Austria).

Data collection was performed on a JEM-Z300CF cryo-electron microscope (JEOL, Japan) equipped with a side entry stage cooled by liquid nitrogen and an automated specimen exchange system at 300 kV with a cold field emission gun. All low-magnification images used in the classification system were recorded during the search mode using a OneView 4k × 4k CMOS camera (GATAN, USA) in 2 × 2 binning mode. During the search mode, the microscope was set to a highly defocused diffraction mode at a camera length of 150 cm without inserting a selected aperture. We set the apparent magnification of the experiment to correspond to ×1000–1500 in the Low-MAG mode. After searching the area, high-magnification (×40,000) images were also recorded using a K2 Summit camera (GATAN, USA) to determine whether proteins were contained in the holes.

Detection and classification system

Figure 2 outlines the detection and classification system developed in this study. A low-magnification image is input to the system. Each image contains several holes (seven on average). For each hole in the image, the system decides whether the ice on the hole is good or bad. The system is composed of a detector that detects holes in the low-magnification image and a classifier that classifies the detected holes as either good or bad based on the image of the ice film within each hole. Bad holes include empty holes and holes with thick, thin, or cracked ice films. The detector and the classifier were implemented using the YOLOv3 (Redmon and Farhadi 2018) and Xception (Chollet 2017) deep learning methods, respectively. YOLOv3 is a fast, real-time object detection method, and Xception is an image classification method based on a parameter-efficient convolutional neural network. The original YOLOv3 network was extended to allow 2048 × 2048 input images. The detector was trained with a set of low-magnification images with bounding boxes that enclose the holes in the images. The classifier was trained with a set of hole images, which were manually labeled as good or bad.
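The two-stage flow described above (detect holes, crop each hole, classify the crop) can be sketched in Python as follows. This is a minimal, dependency-free sketch and not the authors' code: the `detect` and `classify` callables stand in for the trained YOLOv3 and Xception models, and the bounding-box format, the 0.5 cutoff, and the toy stand-ins are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in pixels (assumed format)
Image = List[List[float]]         # rows of pixel intensities

def crop(image: Image, box: Box) -> Image:
    """Cut the region enclosed by `box` out of `image`."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

def find_good_holes(image: Image,
                    detect: Callable[[Image], List[Box]],
                    classify: Callable[[Image], float],
                    cutoff: float = 0.5) -> List[Tuple[Box, str]]:
    """Run the detector, then label each detected hole as good or bad."""
    results = []
    for box in detect(image):
        p_good = classify(crop(image, box))  # score in [0, 1] that the hole is good
        results.append((box, "good" if p_good >= cutoff else "bad"))
    return results

# Toy stand-ins (hypothetical): a real system would call the trained models here.
def toy_detect(image: Image) -> List[Box]:
    return [(0, 0, 2, 2), (2, 0, 4, 2)]

def toy_classify(hole: Image) -> float:
    pixels = [p for row in hole for p in row]
    return sum(pixels) / len(pixels)  # placeholder score, not a real classifier

image = [[0.9, 0.8, 0.1, 0.2],
         [0.7, 0.8, 0.2, 0.1]]
labels = find_good_holes(image, toy_detect, toy_classify)
```

Keeping the detector and classifier as separate callables mirrors the design in the text, where the detector is shared across samples and only the classifier is retrained.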

Fig. 2

Outline of the developed system. A low-magnification cryo-EM image is put into the system composed of a detector and a classifier implemented using deep learning methods. The detector, which was implemented using an extended version of YOLOv3, detects holes in the low-magnification image. The classifier, which was implemented using Xception, classifies the detected holes as either good or bad based on the image of the ice film within each hole. The classifier was trained with a set of hole images, which were manually labeled as good or bad

Results and discussion

Performance of the detector

The detector was trained and validated using 125 and 10 low-magnification images of sample A, respectively. First, we evaluated the performance of the detector using 1270 low-magnification images of sample A. We found that 95.4% of the holes in the test set images were correctly detected. Next, we applied the detector to the images of sample B. The detector correctly detected 97.5% of the holes in 519 low-magnification images of sample B. Therefore, the detector did not show sample dependency. Even though the detector failed to detect a small number of holes, most of the undetected holes were contaminated and, therefore, were not suitable for data collection. Based on these results, we concluded that the detector achieved a sufficiently high accuracy.

Performance of the classifier

The classifier was first trained using low-magnification images of sample A. A total of 7707 hole images extracted from 1096 low-magnification images were used for the training, and 693 hole images extracted from 100 low-magnification images were used for the validation. Then, the performance of the classifier was evaluated using 678 hole images extracted from 100 low-magnification images of sample A. All of the hole images were manually labeled as either good or bad. The performance was evaluated according to the accuracy, precision, recall, and area under the curve (AUC). The accuracy, precision, and recall were calculated as

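$$\mathrm{Accuracy} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{FP} + \mathrm{FN} + \mathrm{TN}}, \quad \mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \quad \mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}},$$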

where TP, FP, FN, and TN are the number of good holes correctly classified as good (true positives), the number of bad holes incorrectly classified as good (false positives), the number of good holes incorrectly classified as bad (false negatives), and the number of bad holes correctly classified as bad (true negatives), respectively. Recall is also referred to as the true positive rate. AUC represents the area under the receiver operating characteristic (ROC) curve, which was drawn by plotting the true positive rate (i.e., recall) against the false positive rate at different classification thresholds. The false positive rate is calculated as

$$\mathrm{False\ positive\ rate} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}.$$
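The four quantities defined above can be computed directly from the confusion-matrix counts. The short Python sketch below does so; the function name and the example counts are hypothetical and are not taken from the test sets in this study. It assumes every denominator is nonzero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and false positive rate.

    Assumes all denominators are nonzero (no guarding against empty classes).
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),               # also the true positive rate
        "false_positive_rate": fp / (fp + tn),
    }

# Made-up counts for illustration only.
m = classification_metrics(tp=90, fp=10, fn=5, tn=45)
```

With these illustrative counts, a rise in false positives lowers precision and accuracy while leaving recall unchanged, because recall depends only on TP and FN.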

The accuracy, precision, and recall values for sample A were 93.8%, 95.6%, and 95.2%, respectively. Figure 3a shows the ROC curve. The curve passed close to the top-left corner of the graph area, indicating a good classification performance (Fig. 3a). The AUC value of this plot is 0.981. All of these results indicate that the holes were accurately classified.
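For readers unfamiliar with how an ROC curve and its AUC are obtained from classifier scores, the following Python sketch sweeps the classification threshold over the observed scores and integrates the resulting curve with the trapezoidal rule. The scores and labels are invented for illustration, the function names are hypothetical, and this is not necessarily how the values reported here were computed.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs.

    `labels` uses 1 for good holes and 0 for bad holes. The threshold is
    swept from the highest score to the lowest; a hole is called good when
    its score is at or above the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Made-up scores for six holes (three good, three bad).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
area = auc(roc_points(scores, labels))
```

An AUC of 1 corresponds to a curve that passes through the top-left corner, and an AUC of 0.5 corresponds to random ranking of good and bad holes.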

Fig. 3

ROC curves of the classifier that was trained with the sample A training set and then applied to (a) the sample A test set and (b) the sample B test set. Typical images of a good hole for (c) sample A and (d) sample B. (e) An example of the output of the developed system applied to a low-magnification EM image of sample A. The good hole is enclosed in a red circle, and the bad holes are enclosed in blue squares

Sample dependence of the classifier

Next, we applied this classifier to the images of sample B. The performance was evaluated using 678 hole images extracted from 100 low-magnification images of sample B. The accuracy, precision, and recall values were 75.7%, 76.8%, and 95.6%, respectively. The accuracy and precision values were significantly degraded because the number of false positives increased. This can be seen from the ROC curve, which has an AUC value of 0.392 (Fig. 3b).

This result indicates that the classifier trained with the images of sample A cannot correctly classify the images of sample B. Figure 3c and d show hole images labeled as good for samples A and B, respectively. Both good holes have whitish regions at their centers, and the size of the whitish region is larger in the images of sample A than in the images of sample B. This difference is caused by differences in the particle distribution. In sample A, the protein particles of β-galactosidase were evenly distributed in the ice film within the hole. Conversely, the cytochrome c oxidase in sample B tended to be located near the rim of the hole because cytochrome c oxidase prefers thicker ice films, and ice films are thicker near the rim. The whitish region represents the region where the ice film is thin. For sample A, a hole with a thin ice film is good for data collection, and the image of such a hole has a large whitish region. Conversely, for sample B, a hole with a slightly thicker ice film is good for data collection, and the image of such a hole has a smaller whitish region. Therefore, there is a discrepancy between what constitutes a good hole for samples A and B. This is why the classifier trained using only images of sample A produced large numbers of false positives when applied to images of sample B.

To address this problem, we trained a classifier using images of sample B. A total of 2322 hole images extracted from 319 low-magnification images were used for the training, and 708 hole images extracted from 100 low-magnification images were used for the validation. Then, the performance was evaluated using 678 hole images extracted from 100 low-magnification images. The accuracy, precision, and AUC values were greatly improved to 95.3%, 95.1%, and 0.966, respectively. The recall also increased to 98.6%. These results indicate that the holes were accurately classified with the classifier trained using the images of sample B.

Comparison of the performances of versatile and dedicated classifiers

To explore the possibility of constructing a versatile classifier that is applicable to a wide range of samples, we trained a classifier using the images of both sample A and sample B. Training and validation were performed using the same sets of hole images as those used previously. The trained classifier was first applied to the test set of the hole images of sample A. The accuracy, precision, recall, and AUC values were 91.6%, 92.2%, 95.6%, and 0.951, respectively. We next applied the classifier to the test set of the hole images of sample B. The accuracy, precision, recall, and AUC values were 91.0%, 90.0%, 98.6%, and 0.945, respectively. Even though fairly good performances were obtained, the number of false positives increased slightly. As a result, the accuracy, precision, and AUC values were slightly worse than those obtained from the classifiers that were trained with the images of just one sample and applied to images of that same sample.

These results suggest that a dedicated classifier that is trained with images of a single sample will likely achieve a higher performance than a versatile classifier. To train a dedicated classifier, the user needs to create a training data set by manually labeling the hole images, which is a time-consuming and tedious task. To estimate the number of images required for training, we evaluated the performance of the classifier using training data sets of different sizes. We used the images of sample A in this experiment. The accuracy of a classifier trained with the hole images extracted from 500 low-magnification images was 92.0%, which was comparable to the accuracy of a classifier trained using 1096 low-magnification images (93.8%). The accuracy did not change much (92.2%) when the number of low-magnification images was reduced to 300. However, when the number of low-magnification images was reduced to 100, the accuracy degraded to 88.7%. The same tendency was observed for the recall values. Therefore, 300 low-magnification images, containing approximately 2100 holes, appear to be necessary and sufficient to train a classifier with good performance.
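The figure of approximately 2100 holes follows from the average number of holes per low-magnification image. As a quick arithmetic check in Python, using the sample A training-set counts reported earlier in this section (7707 hole images from 1096 low-magnification images):

```python
# Sample A training set: 7707 hole images from 1096 low-magnification images.
holes_per_image = 7707 / 1096            # about 7 holes per image

# Expected number of hole images in 300 low-magnification images.
estimated_holes = round(300 * holes_per_image)
```

The estimate lands close to 2100, consistent with the average of seven holes per image mentioned in the description of the system.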

Usage of the system

Finally, we combined the detector and the classifier. Figure 3e shows the output of the combined system. In this figure, the good hole is enclosed in a red circle and bad holes are enclosed in blue squares.

To maximize portability, the detector and classifier were implemented using one of the most popular deep learning frameworks, TensorFlow (https://www.tensorflow.org/), via Keras (https://keras.io/), the Python deep learning library. Using a personal computer equipped with an NVIDIA graphics processing unit, the time required for training was approximately 10 min, and the time required for the detection and classification of the holes in a low-magnification image was approximately 0.1 s. The procedure for using the proposed system is as follows:

  1. Take at least 300 low-magnification images of a sample grid.

  2. Detect the holes using the system detector.

  3. Manually label each hole in the low-magnification images as either good or bad.

  4. Train the classifier using the hole images.

  5. Take a low-magnification image of the grid, identify good holes in the image using the trained classifier, and acquire high-magnification images of the good holes.

  6. Repeat step 5 until a sufficient amount of data is obtained.
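Steps 5 and 6 of the procedure above amount to a simple acquisition loop, which might be sketched in Python as follows. All names here (`collect_data`, `take_low_mag`, `find_good_holes`, `acquire_high_mag`, `max_images`) are hypothetical placeholders for microscope-control and model calls that are not described in this article; the `max_images` guard is an added assumption so that the loop ends even if no good holes are found.

```python
def collect_data(take_low_mag, find_good_holes, acquire_high_mag,
                 target_count, max_images=1000):
    """Repeat step 5 until `target_count` high-magnification images exist.

    take_low_mag():        returns one low-magnification image
    find_good_holes(img):  returns the holes classified as good
    acquire_high_mag(h):   returns a high-magnification image of hole h
    """
    acquired = []
    for _ in range(max_images):
        if len(acquired) >= target_count:
            break
        image = take_low_mag()
        for hole in find_good_holes(image):
            acquired.append(acquire_high_mag(hole))
            if len(acquired) >= target_count:
                break
    return acquired

# Toy stand-ins (hypothetical) so the sketch can run without a microscope.
frames = iter(range(10))
result = collect_data(
    take_low_mag=lambda: next(frames),
    find_good_holes=lambda img: [f"hole-{img}-a", f"hole-{img}-b"],
    acquire_high_mag=lambda hole: f"hi-mag:{hole}",
    target_count=3,
    max_images=10,
)
```

Only holes returned by the classifier are imaged at high magnification, which is where the time saving described in the text would come from.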

We are currently developing a program with a graphical user interface that assists users in the manual labeling of the hole images (Step 3). With this system, time will not be wasted on acquiring high-magnification images of useless holes. We hope that this system will contribute to improving the throughput of cryo-EM data collection.

Conclusions

We developed a method to identify holes that have ice films with appropriate thicknesses in low-magnification EM images. This method sequentially implements two deep learning methods to detect the holes in an image and to classify the hole images as either good or bad. The detector accurately detected the holes in low-magnification EM images regardless of the type of protein in the sample. The performance of the classifier was also high but was found to depend on the type of sample. Therefore, to achieve higher performance, a dedicated classifier that is trained with images of the sample of interest needs to be used. The training data set is created by manually labeling the hole images in low-magnification images as either good or bad. We found that a training data set containing hole images from 300 low-magnification images was necessary and sufficient for good performance. We are currently developing a program to assist users in creating a training data set. We expect our method to facilitate the process of protein structure determination using cryo-EM.

Acknowledgments

We thank Mr. Itto Higuchi for his assistance in the early stage of this work. We also thank Dr. Takeshi Kaneko, Mr. Isamu Ishikawa, and Mr. Yoshihiro Ohkura for setting up the cryo-electron microscope.

Funding information

This research was partially supported by the Platform Project for Supporting Drug Discovery and Life Science Research (Basis for Supporting Innovative Drug Discovery and Life Science Research (BINDS)) from AMED under Grant Number JP19am0101107. This work was also supported by JSPS KAKENHI Grant Number 15H05775 and by AMED under Grant Number JP19ae0101046.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Cheng Y. Single-particle cryo-EM at crystallographic resolution. Cell. 2015;161:450–457. doi: 10.1016/j.cell.2015.03.049.
  2. Chollet F. Xception: deep learning with depthwise separable convolutions. arXiv:1610.02357v3. 2017. https://arxiv.org/abs/1610.02357v3
  3. Dubochet J, Chang J-J, Freeman R, Lepault J, McDowall AW. Frozen aqueous suspensions. Ultramicroscopy. 1982;10:55–61. doi: 10.1016/0304-3991(82)90187-5.
  4. Fernandez-Leiro R, Scheres SH. Unravelling biological macromolecules with cryo-electron microscopy. Nature. 2016;537:339–346. doi: 10.1038/nature19948.
  5. Lei J, Frank J. Automated acquisition of cryo-electron micrographs for single particle reconstruction on an FEI Tecnai electron microscope. J Struct Biol. 2005;150:69–80. doi: 10.1016/j.jsb.2005.01.002.
  6. Noble AJ, Wei H, Dandey VP, Zhang Z, Tan YZ, Potter CS, Carragher B. Reducing effects of particle adsorption to the air-water interface in cryo-EM. Nat Methods. 2018;15:793–795. doi: 10.1038/s41592-018-0139-3.
  7. Redmon J, Farhadi A. YOLOv3: an incremental improvement. arXiv:1804.02767. 2018. https://arxiv.org/abs/1804.02767
  8. Suloway C, Pulokas J, Fellmann D, Cheng A, Guerra F, Quispe J, Stagg S, Potter CS, Carragher B. Automated molecular microscopy: the new Leginon system. J Struct Biol. 2005;151:41–60. doi: 10.1016/j.jsb.2005.03.010.
  9. Tan YZ, Cheng A, Potter CS, Carragher B. Automated data collection in single particle electron microscopy. Microscopy. 2016;65:43–56. doi: 10.1093/jmicro/dfv369.
  10. Tsukihara T, Aoyama H, Yamashita E, Tomizaki T, Yamaguchi H, Shinzawa-Itoh K, Nakashima R, Yaono R, Yoshikawa S. Structures of metal sites of oxidized bovine heart cytochrome c oxidase at 2.8 Å. Science. 1995;269:1069–1074. doi: 10.1126/science.7652554.
  11. Unwin N. Nicotinic acetylcholine receptor and the structural basis of neuromuscular transmission: insights from Torpedo postsynaptic membranes. Q Rev Biophys. 2013;46:283–322. doi: 10.1017/S0033583513000061.

Articles from Biophysical Reviews are provided here courtesy of Springer