Abstract
Cryo-EM maps are valuable sources of information for protein structure modeling. However, due to the loss of contrast at high frequencies, they generally need to be post-processed to improve their interpretability. Most popular approaches, based on global B-factor correction, suffer from limitations. For instance, they ignore the heterogeneity in the map local quality that reconstructions tend to exhibit. Aiming to overcome these problems, we present DeepEMhancer, a deep learning approach designed to perform automatic post-processing of cryo-EM maps. Trained on a dataset of pairs of experimental maps and maps sharpened using their respective atomic models, DeepEMhancer has learned how to post-process experimental maps performing masking-like and sharpening-like operations in a single step. DeepEMhancer was evaluated on a testing set of 20 different experimental maps, showing its ability to reduce noise levels and obtain more detailed versions of the experimental maps. Additionally, we illustrated the benefits of DeepEMhancer on the structure of the SARS-CoV-2 RNA polymerase.
Subject terms: Cryoelectron microscopy, Data processing
Sanchez-Garcia et al. present DeepEMhancer, a deep learning-based method that can automatically perform post-processing of raw cryo-electron microscopy density maps. The authors report that DeepEMhancer globally improves local quality of density maps, and may represent a useful tool for novel structures where PDB models are not readily available.
Introduction
Almost one decade after the beginning of the so-called “resolution revolution”, cryogenic electron microscopy (cryo-EM) has become one of the most versatile tools in the field of structural biology. Starting from thousands of single-particle projection images, cryo-EM workflows are capable of obtaining three-dimensional (3D) reconstructions of many macromolecules at “near-atomic” resolution levels. However, the ultimate goal of cryo-EM single-particle analysis is not to obtain 3D maps but to reach a detailed atomic understanding through the derivation of atomic models.
During the atomic model building process, raw 3D maps are rarely employed, as they suffer from loss of contrast at high resolution1, which makes it difficult to detect and interpret residues and secondary structure. Fortunately, loss of contrast can be alleviated using different contrast restoration algorithms, which are usually known as sharpening methods. The first sharpening approach for cryo-EM maps was introduced by Rosenthal and Henderson1, and their formulation, based on global B-factor correction, still underlies the most commonly employed sharpening methods, including RELION postprocessing2,3 and Phenix AutoSharpen4. These algorithms correct the raw maps by boosting the amplitude of their high-frequency Fourier components. The strength of the amplitude boost at each frequency depends on the frequency itself and on a single number, the B-factor, that measures the global loss of contrast. Thus, although the different global B-factor-based methods differ in the procedures employed to determine the B-factor that is applied, they modify the volume globally in a similar manner.
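To make the idea of a global, frequency-dependent amplitude boost concrete, the following minimal sketch scales every Fourier component by exp(-B·s²/4), where s is the spatial frequency in 1/Å. This is the usual textbook form of B-factor weighting and is not taken from the code of any of the cited programs; the function name and arguments are illustrative assumptions, and practical tools additionally low-pass filter the result at the estimated resolution.

```python
import numpy as np

def sharpen_global_bfactor(volume, voxel_size, b_factor):
    """Scale Fourier amplitudes by exp(-B * s^2 / 4), with s in 1/Angstrom.

    A negative b_factor boosts high frequencies (sharpening);
    a positive one damps them (blurring). Illustrative sketch only.
    """
    # Spatial frequency (1/Angstrom) along each axis of the grid.
    freqs = [np.fft.fftfreq(n, d=voxel_size) for n in volume.shape]
    fx, fy, fz = np.meshgrid(*freqs, indexing="ij")
    s2 = fx**2 + fy**2 + fz**2
    # One weight per Fourier component, controlled by a single number B.
    weights = np.exp(-b_factor * s2 / 4.0)
    ft = np.fft.fftn(volume)
    return np.real(np.fft.ifftn(ft * weights))
```

Because the same weight is applied to every voxel of the map, regions of different local quality are all modified in the same way, which is the limitation discussed next.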
Despite being widely used, global B-factor-based approaches present an important limitation: they do not consider the differences in quality that different parts of the map may present and they produce density maps that do not correspond to the scattering properties of biological macromolecules5. Consequently, for maps that exhibit heterogeneous local resolution, some regions could be undersharpened whereas others could be oversharpened. Recently, local sharpening algorithms that alleviate this shortcoming have been proposed. For instance, the LocScale6 algorithm uses the information contained in an atomic model to locally scale up a map. Such transformation is achieved by means of a sliding window approach in which the amplitudes of the map region that lie inside the window are scaled up to agree with the atomic model provided. Following a totally different strategy, the LocalDeblur7 algorithm employs a Wiener filtering approach that performs local deblurring with a strength proportional to an estimation of the local resolution, which has to be pre-computed. Similarly, LocSpiral8 employs the spiral phase transformation to factorize the volume and then performs a local enhancement based on the normalization and thresholding of the amplitudes.
Despite their benefits, current local sharpening approaches present some drawbacks. Both LocSpiral and LocalDeblur depend on masks to distinguish the macromolecule from the noise, and LocalDeblur also requires an estimation of the local resolution of the map. On the other hand, the main strength of LocScale, its ability to employ the structural information of atomic models, could also be regarded as its main weakness, since the availability of atomic models limits its applicability.
With the aim of overcoming these shortcomings, in this work, we present Deep cryo-EM Map Enhancer (DeepEMhancer), a fully automatic deep learning-based approach that performs cryo-EM volume post-processing. Deep learning has revolutionized the field of artificial intelligence and its impact has been felt in many others including cryo-EM. Deep learning in cryo-EM was first applied to the problem of particle picking9–11 and since then, it has evolved to deal with other questions such as map reconstruction12,13, map segmentation14,15, or local resolution determination16,17. As in most of those methods, our approach relies on a convolutional neural network (CNN) that is trained on massive quantities of data. Particularly, our development, which follows a simple image super-resolution setup18, exploits the vast amount of structural information that is contained in the Electron Microscopy Data Bank (EMDB) database19 in order to mimic the local sharpening effect of the LocScale algorithm. However, DeepEMhancer does not require any atomic model to function and, contrary to previous methods, it also performs automatic (tight) masking of input maps. Our results show that DeepEMhancer, which works in a fully automatic manner, is able to largely improve the interpretability of the maps contained in our benchmark, performing better than classical global B-factor approaches.
Results
DeepEMhancer is based on an end-to-end U-net architecture20 trained in a supervised manner. In particular, we implemented a 3D U-net consisting of three downsampling blocks and three upsampling blocks that process cubic chunks of the input map (see Supplementary Table 1 for more details). Training was performed using pairs of input maps and target maps, consisting of experimental cryo-EM maps and tightly masked LocScale post-processed maps. Although other alternatives exist (e.g., LocalDeblur), LocScale was chosen as the method to produce targets because it makes use of atomic model information, which tends to produce high-quality results. For a complete description of the data preparation, training, and evaluation processes see the “Methods” section.
DeepEMhancer performance on the testing set
In order to assess the quality of DeepEMhancer predictions, we first compared them against the target maps generated by LocScale. Thus, for DeepEMhancer maps, we measured a median correlation coefficient of 0.9 against LocScale maps in contrast to 0.6 for input maps (see Supplementary Fig. 1). Such an important increase in the correlation coefficient implies that DeepEMhancer has learned to accurately reproduce the effect of LocScale sharpening with one important advantage: no atomic models are required to employ DeepEMhancer.
Although reproducing the LocScale-sharpening effect was our main objective, the ultimate goal of map post-processing is to simplify the process of atomic model building. To study whether DeepEMhancer also contributes to that purpose, we next explored whether DeepEMhancer post-processed maps were more similar to the actual atomic models. To do so, we computed, for all the maps included in the testing set, the Fourier shell correlation coefficient (FSC) resolution between the input (half maps average) and post-processed maps against the reference maps obtained from the atomic models. As shown in Fig. 1, for all the examples included in the testing set, the application of DeepEMhancer increased the similarity of the input maps with respect to the references (blue and green bars). Particularly, the median improvement achieved by DeepEMhancer was ~0.6 Å (~14% in the frequency domain). Such an important improvement confirms that the maps computed by DeepEMhancer are more similar to the target maps.
The DeepEMhancer post-processing operation performs a non-linear transformation of the experimental volume that produces a set of effects that could be broadly classified as masking/denoising and sharpening-like feature enhancement. In order to disentangle the contribution of the different effects, we have also computed the FSC of the input and post-processed maps using a tight mask derived from the atomic model. As can be observed in Fig. 1, the FSC resolution obtained for the post-processed maps tends to be better than the values computed for the input independently of the mask application (green and red bars vs orange bar), which implies that the masking effect is of high quality, as the resolutions for the unmasked DeepEMhancer results tend to be better than the ones for the masked input maps.
Similarly, although the trend is not as strong as in the previous experiment, DeepEMhancer also tends to improve the resolution of the masked regions (Fig. 1, orange vs. red bars), which implies an enhancement of the map features. Leaving aside some problematic examples such as EMD-705521, which is discussed in Supplementary Note 1 and Supplementary Fig. 2, most of the evaluated maps exhibit a non-negligible improvement in resolution, especially notable when compared to B-factor-based results (see next section), with a median value of ~0.3 Å.
Alternatively, with the aim of obtaining a complementary measurement of improvement, we computed the DeepRes local resolution for the input and post-processed maps. As can be appreciated in Fig. 2, all test cases treated with DeepEMhancer improved in terms of DeepRes local resolution, with dramatic improvements of more than 0.7 Å and a median improvement of ~0.4 Å. Again, those figures, consistent with the FSC-based measurements, indicate that DeepEMhancer is improving the interpretability of the maps.
Comparison with other methods
With the aim of comparing DeepEMhancer with the commonly employed global B-factor-based sharpening methods, we repeated the same experiments using the post-processed maps obtained with the Relion postprocessing algorithm2,3. It is important to note first that, unlike that of DeepEMhancer, Relion automatic masking is a simple process, and thus, in order to make the comparison more interesting, we used instead the masks derived from the atomic models.
Still, when we evaluated the FSC for the masked regions, only a few maps improved, while many others worsened, leading to a median improvement that was negligible (<0.05 Å) for both FSC and median DeepRes resolution (see Figs. 2 and 3).
We acknowledge that the automatic determination of the B-factor can lead to less accurate results than manual selection, and that this may be the reason behind the poor observed performance. Thus, we have also included in the comparison the post-processed maps deposited in EMDB, in which the estimation of the B-factor was carried out by the authors. In this case, the improvement in resolution, with median values of ~0.15 and ~0.1 Å for DeepRes and FSC, respectively, although closer to the values obtained using DeepEMhancer, is still considerably smaller (see Figs. 2 and 3). Such a difference in performance can be partially explained by the ability of local sharpening methods to deal better with low-quality regions of input maps, as shown in Supplementary Figure 3 and discussed in Supplementary Note 2.
In the light of these results, we can state that DeepEMhancer maps tend to be more similar to the atomic models than the ones obtained using global B-factor-based methods and thus, more useful for the process of model building. Finally, for the sake of completeness, we also computed FSC curves to compare our approach with other state-of-the-art sharpening approaches, showing that our fully automatic approach produces competitive, if not better, results in many cases (see Supplementary Note 3 and Supplementary Figs. 4–9).
Visual inspection of testing maps
The purpose of this section is to further explore the results obtained with DeepEMhancer for some of the maps included in the testing set with the aim of illustrating how the improvements in global quality measurements translate to tangible improvements in the quality of the maps.
EMD-7099
The EMD-709922 is a high-resolution volume (global resolution 3.1 Å) of a multidrug resistance ATP-driven pump. EMD-7099 presents 17 transmembrane helices and, although the overall quality of the map is excellent, visualizing the transmembrane regions is challenging because of the signal that comes from the lipids. As a result, important parts of the protein are not traced. Because DeepEMhancer was trained to ignore the signal coming from lipid layers, this example illustrates the unique characteristics of DeepEMhancer when applied to membrane proteins. Thus, as can be observed in Fig. 4a–d, DeepEMhancer has been able to suppress the signal coming from the lipid layer in a much simpler and more effective way than adjusting the threshold in the raw map or the B-factor-based sharpened maps. The noise suppression effect simplifies the process of model building, as the researchers do not have to deal with masks or larger thresholds that make the visualization of near-to-noise level features more difficult. Yet DeepEMhancer not only reduces noise but is also able to enhance some parts of the map that seem noisy and disconnected under B-factor-based sharpening. Such improvement, although observed in several regions of the map, is most noticeable in the transmembrane region. The most important enhancement is depicted in Fig. 4e, f, in which an important part of the backbone of the protein has been traced de novo thanks to the DeepEMhancer enhancement, which has restored the densities corresponding to residues A195 to I203 in chain A of PDB 6bhu. Although this region was present in the raw map, its intensity range was so close to that of the lipid layers that, after conventional B-factor post-processing, the region was too damaged for modeling to be possible. In contrast, DeepEMhancer was not only able to suppress most of the signal coming from the lipid layer but also to restore the density of the region so that it looks smooth and continuous.
EMD-4997
The EMD-499723 is a medium-high resolution volume (4.0 Å) for a murine epithelial anion transporter. As in the previous example, the overall quality of the map is quite good, yet it presents lower quality regions. Figure 5a shows an overview of the published map, displayed at the recommended threshold, and the map obtained with DeepEMhancer. Although the published map and the post-processed map look very similar, there are important differences. Firstly, the map processed with DeepEMhancer is cleaner than the published one. One example is the removal of the artifacts that the published map presents near the elbow of the complex (see Fig. 5a, red box). More importantly, there are also many regions in which the DeepEMhancer post-processed volume better resolves the individual residues. One such example can be found near the N-terminal end of the protein complex. As shown in Fig. 5b, the densities that correspond to the strands of the β-sheet are better separated than in the published volume. It is important to note that this better separation is not a consequence of the employed thresholds, as shown by the fact that raising the threshold makes the densities corresponding to the backbone discontinuous before the densities for the two strands separate (see Fig. 5b). As a result, we can affirm that the quality of this region has been improved by the usage of DeepEMhancer.
Another similar example is displayed in Fig. 5c. In this case, two non-contiguous aromatic residues, Y361 and H121, seem connected in the published map. However, when DeepEMhancer is applied, the densities corresponding to the two residues look separated while the backbone remains continuous.
Use case EMD-30178 from SARS-CoV-2 RNA-dependent RNA polymerase
In order to further explore the benefits of the DeepEMhancer algorithm, we analyzed in more depth the post-processing of the EMD-30178 map from Gao et al.24, corresponding to the SARS-CoV-2 RNA-dependent RNA polymerase. The published map presents detailed structure up to 2.9 Å resolution; however, as is often the case in cryo-EM, the resolution of the map is highly heterogeneous. We have chosen this map not only for the current importance of this structure but also because the heterogeneous quality of the map density makes it an ideal case for the DeepEMhancer software. As shown in Fig. 6a, the application of the algorithm reduces the noise and improves the consistency and depiction of the map. To better illustrate these differences, we have chosen two different regions in chains A and D where the differences between the published and the DeepEMhancer map can be appreciated (Fig. 6b and c). While the density in the published map looks noisy or discontinuous depending on the displayed threshold (Fig. 6b and c, left and middle panel), the application of the DeepEMhancer software results in a well-defined continuous density where the side chains are nicely depicted (Fig. 6b and c, right panel). This improvement in the map density allowed us to close the loop between residues in the β-sheet V115 to I132 from chain D, tracing three new residues that were not traced in the published structure (Fig. 6b). The improvement of the density is not only applicable to the edges of the map but can also be appreciated in its core. Residues H362–L366 in chain A, traced on the published map, were positioned more accurately on the density after map post-processing (Fig. 6c).
Discussion
The number of deposited high-resolution cryo-EM maps has soared since the beginning of the ‘resolution revolution’. As a result, there is an increasing number of atomic models that are being built using cryo-EM as the primary source of information. However, building atomic models directly from the raw maps is generally not possible. Instead, maps are post-processed in order to enhance the contrast of their high-resolution features.
In this work, we have presented DeepEMhancer, a map post-processing method based on deep learning. Trained on pairs of experimental cryo-EM maps and post-processed maps constructed with LocScale using atomic models, DeepEMhancer has learned how to perform a high-quality post-processing operation that reproduces the effects of masking and local sharpening in an automatic fashion.
Although DeepEMhancer could have been trained on other targets, for instance, the simulated maps obtained directly from the atomic models, we discarded this alternative for two reasons. The first reason is that we wanted to reproduce the state-of-the-art local sharpening effect and not a new type of post-processing that might not be compatible with downstream atomic modeling tools. The second is empirical: we obtained better results when targets were produced with LocScale than when the targets were directly obtained from the atomic models. As discussed in Supplementary Note 4 and illustrated in Supplementary Figs. 10–13, our neural network tends to suffer from underfitting when trained on maps derived from atomic models and thus, the results are blurrier than the ones obtained when using LocScale maps as a target. One possible explanation for such behavior could be the fact that, when using LocScale, the input and target maps, although different, still share some similar properties such as intensity ranges or local quality, which are not necessarily preserved when using simulated maps as targets. As a consequence, it is reasonable to believe that as the input and target maps become more similar, the training process should also become easier. For these reasons, we expect that super-resolution approaches trained on maps derived from atomic models will only become feasible when more powerful models are employed, at the cost of more computational resources and larger datasets.
The performance of our algorithm has been assessed using a testing set of 20 experimental maps that were used neither for training nor during the trial-and-error process required for its implementation. In all cases, the similarity between the maps obtained from the atomic models and the experimental maps improved after the application of DeepEMhancer. Additionally, we evaluated in detail the performance of DeepEMhancer on two of those maps, showing that DeepEMhancer not only facilitates the visualization of cryo-EM maps but can also unveil some details that are not easily recognizable in the raw maps.
Nevertheless, it is important to highlight that DeepEMhancer is not the ultimate solution and that different examples will benefit from considering different post-processing techniques simultaneously. This is of special importance for some of the cases in which DeepEMhancer, owing to dataset scarcity, presents limitations, for instance, when dealing with uncommon posttranslational modifications (see Supplementary Note 5 and Supplementary Figs. 14 and 15).
Another important caveat that all methods intended to enhance maps need to face is the problem of model validation. Although the results here presented have been validated using as ground truth the published models, in real-world scenarios such ground truth models are not available, and thus, the goodness of the results should be addressed by the users. To that end, we recommend trying and comparing different approaches since orthogonal methods should reveal inconsistencies. On the contrary, we discourage users from trying to estimate the resolution of post-processed maps, as there is no obvious way of doing it without ground-truth and even in those cases, masking effects could be challenging (see Supplementary Note 3).
Finally, with the aim of illustrating how beneficial DeepEMhancer could be in real-world scenarios, we have employed it on a map of the RNA polymerase of the SARS-CoV-2 virus, improving the quality of both the map and the associated atomic model.
Methods
Raw data collection
DeepEMhancer has been trained and evaluated using as input a subset of cryo-EM maps obtained from the EMDB19 that meet the following requirements: (1) resolution better than 7 Å; (2) have one and only one atomic model associated; (3) correlation between the atomic model and the map better than 0.6; and (4) half maps available. As a result, an original list of 415 maps was compiled. However, this initial list is highly redundant and, in order to avoid biases in both the training and evaluation procedures, this list was further filtered to reduce its redundancy (see subsection “Redundancy control”). Finally, after a visual inspection aimed at removing problematic cases that survived the automatic filtering procedure, a total of 147 maps, with an average reported resolution of 3.8 Å, were selected.
Since the main objective of DeepEMhancer is to perform a sharpening-like post-processing transformation, it is important to ensure that the maps used in this study were not previously sharpened. Given that most of the maps deposited in EMDB are sharpened and many are also masked, we decided to employ only the half maps available in EMDB (condition number 4). Due to the lack of an appropriate search tool in EMDB and of a file name convention, we had to analyze all the map file names included in the database looking for the substring “half” to recover the half maps. Full maps were obtained by averaging the respective half maps.
As learning targets, we employed the output generated by LocScale using as input the aforementioned maps and their associated atomic models. Additionally, the output maps were tightly masked using as masks the maps simulated from the atomic models after a thresholding operation (see Supplementary Note 6 and Supplementary Fig. 16).
Data preparation
Due to the fact that the monomers (amino acids, nucleotides, etc.) that compose the macromolecules have fixed size but the deposited maps vary in voxel size, both the input and the target maps were resampled to 1 Å/voxel size with the aim of facilitating the learning process. After that, the intensity of each volume was normalized using the classical cryo-EM approach by which the map noise statistics are forced to adopt a fixed mean and standard deviation (0 and 0.1, respectively). Finally, due to GPU memory limitations, the maps were chunked into 64 × 64 × 64 cubes, the maximum size that our computing systems were able to efficiently manage. As a result, more than 70k volume cubes, including both signal cubes and noise-only cubes were used for training.
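The normalization and chunking steps described above can be sketched as follows. This is an illustrative outline rather than the DeepEMhancer implementation: the noise statistics are estimated here from the voxels lying outside the largest sphere inscribed in the box, which is an assumption about what "the classical cryo-EM approach" means, and the cubes are cut without overlap and with zero padding, which the text does not specify. The resampling to 1 Å/voxel is omitted because it would require an interpolation routine (for example `scipy.ndimage.zoom`). The function names are hypothetical.

```python
import numpy as np

def normalize_by_noise(volume, target_mean=0.0, target_std=0.1):
    """Shift and scale the map so that its noise has the target statistics."""
    shape = np.array(volume.shape)
    center = (shape - 1) / 2.0
    grids = np.ogrid[tuple(slice(0, n) for n in volume.shape)]
    r2 = sum((g - c) ** 2 for g, c in zip(grids, center))
    # Assumption: voxels outside the inscribed sphere contain only noise.
    noise = volume[r2 > (shape.min() / 2.0) ** 2]
    return (volume - noise.mean()) / noise.std() * target_std + target_mean

def chunk_volume(volume, size=64):
    """Cut a volume into non-overlapping size^3 cubes (zero-padded at the end)."""
    pad = [(0, (-n) % size) for n in volume.shape]
    padded = np.pad(volume, pad)  # zeros match the normalized noise mean
    cubes = [
        padded[i:i + size, j:j + size, k:k + size]
        for i in range(0, padded.shape[0], size)
        for j in range(0, padded.shape[1], size)
        for k in range(0, padded.shape[2], size)
    ]
    return np.stack(cubes)
```

Normalizing before chunking means that the zero padding added at the box edges is consistent with the noise mean of the normalized map.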
Redundancy control
In order to perform the train/validation/test split used to develop and evaluate our method, it is important to consider that the universe of proteins is highly redundant and that the EMDB entries are even more redundant. The ribosome, for example, accounts for ~10% of all EMDB entries. Thus, in order to avoid an over-optimistic performance estimation, we have ensured that the train, test, and validation sets are mutually exclusive in the sense that their intersections are empty under a certain equivalence criterion. Particularly, we consider that two EMDB entries are equivalent if they share one sequence that belongs to the same 30% sequence identity cluster. Similarly, with the aim of eliminating potential bias in the evaluation, we have guaranteed that only one member per cluster is included in the testing and validation sets. In contrast, we have relaxed our quite strict redundancy control policy in the training set, allowing up to five representatives per cluster in an attempt to increase the size of this set. This decision is founded on the fact that even maps of the same exact protein may present different statistics due to the intrinsic variability of cryo-EM reconstruction workflows and thus, limiting their presence in the training set may hinder the generalization of the neural network.
As a result, 107, 21, and 20 maps were used for training, validation, and testing, respectively. The full list of the EMDB entries used can be found in Supplementary Note 7 and Supplementary Data 1.
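One way to implement such a redundancy-aware split is sketched below. It is an illustration under stated assumptions, not the procedure actually used: each EMDB entry is assumed to come with the set of 30% sequence identity cluster identifiers of its chains, entries are merged transitively whenever they share a cluster, and whole groups are then assigned to a single set. The function names and the input format are hypothetical.

```python
import random
from collections import defaultdict

def group_entries(entry_clusters):
    """Group entries that share at least one sequence cluster (transitively)."""
    parent = {entry: entry for entry in entry_clusters}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    first_seen = {}
    for entry, clusters in entry_clusters.items():
        for cluster in clusters:
            if cluster in first_seen:
                parent[find(entry)] = find(first_seen[cluster])
            else:
                first_seen[cluster] = entry
    groups = defaultdict(list)
    for entry in entry_clusters:
        groups[find(entry)].append(entry)
    return list(groups.values())

def split_by_cluster(entry_clusters, n_test, n_val, max_train_per_group=5, seed=0):
    """Return (train, val, test) lists whose sequence clusters do not overlap."""
    rng = random.Random(seed)
    groups = group_entries(entry_clusters)
    rng.shuffle(groups)
    # One representative per group in the evaluation sets.
    test = [rng.choice(g) for g in groups[:n_test]]
    val = [rng.choice(g) for g in groups[n_test:n_test + n_val]]
    # Relaxed policy for training: up to `max_train_per_group` members per group.
    train = []
    for g in groups[n_test + n_val:]:
        train.extend(rng.sample(g, min(len(g), max_train_per_group)))
    return train, val, test
```

Assigning whole groups, rather than individual entries, is what guarantees that no sequence cluster appears in more than one of the three sets.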
Neural network architecture
We have employed a 3D U-net-like neural network20 as a regression model for the estimation of post-processed maps. Our neural network consists of three downsampling blocks and three upsampling blocks with skip connections. Each block contains three convolutional layers followed by group normalization25 and PReLU activation26. The number of filters for each block is 3 × 32, 3 × 64, and 3 × 128, respectively. Downsampling is carried out using strided convolution and upsampling is performed via transposed convolution. See Supplementary Table 1 for additional details.
Neural network training
Our neural network was trained using stochastic gradient descent with a batch size of 8 cubes. The initial learning rate was set to 10⁻³ and decreased by a factor of 0.5 when the validation loss did not improve for 5 epochs. The mean absolute error was employed as the loss function. Data augmentation, consisting of random 90° rotations, Gaussian blurring, and patch corruption, was applied to the training data.
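The loss, the learning rate schedule, and the rotation augmentation can be sketched independently of any deep learning framework. The snippet below is a minimal illustration, not the training code of DeepEMhancer: the class and function names are hypothetical, and the exact meaning of "did not improve for 5 epochs" (here, five consecutive epochs without a new best validation loss) is an assumption. Gaussian blurring and patch corruption are omitted.

```python
import numpy as np

def mean_absolute_error(pred, target):
    """Mean absolute voxel-wise difference between prediction and target."""
    return float(np.mean(np.abs(pred - target)))

class ReduceOnPlateau:
    """Halve the learning rate when the validation loss stalls for `patience` epochs."""

    def __init__(self, lr=1e-3, factor=0.5, patience=5):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("inf")
        self.stale_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.stale_epochs = val_loss, 0
        else:
            self.stale_epochs += 1
            if self.stale_epochs >= self.patience:
                self.lr *= self.factor
                self.stale_epochs = 0
        return self.lr

def random_rot90(cube_in, cube_target, rng):
    """Apply the same random multiple-of-90-degree rotation to input and target."""
    axes = tuple(int(a) for a in rng.choice(3, size=2, replace=False))
    k = int(rng.integers(0, 4))
    return np.rot90(cube_in, k, axes), np.rot90(cube_target, k, axes)
```

Rotating the input and target cubes with the same parameters keeps each training pair aligned voxel by voxel, which a regression loss such as the mean absolute error requires.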
Neural network inference
In order to perform volume post-processing, the input volume is pre-processed as described in the “Data preparation” subsection. Then, the resized and normalized volume is chunked into overlapping cubes of size 64 × 64 × 64 with strides of 16 voxels. Each cube is individually processed by the trained neural network, yielding post-processed cubes. After that, the post-processed cubes are re-assembled into the final volume by averaging the overlapping parts. Finally, the processed volume is resized to the size of the original volume, thus restoring the original sampling rate.
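The overlapping-cube inference scheme can be illustrated with the following sketch, in which `predict_cube` stands for the trained network (any callable that maps a cube to a cube of the same shape). The function names and the zero-padding strategy used to make the sliding window cover the whole volume are assumptions made for the example; resizing and normalization are left out.

```python
import numpy as np

def _padded_length(n, size, stride):
    """Smallest length >= n that the sliding window covers exactly."""
    extra = max(n - size, 0)
    return size + -(-extra // stride) * stride  # ceil division

def predict_volume(volume, predict_cube, size=64, stride=16):
    """Process overlapping cubes and average the predictions where they overlap."""
    target_shape = [_padded_length(n, size, stride) for n in volume.shape]
    padded = np.pad(volume, [(0, p - n) for p, n in zip(target_shape, volume.shape)])
    accum = np.zeros(padded.shape, dtype=np.float64)
    counts = np.zeros(padded.shape, dtype=np.float64)
    for i in range(0, target_shape[0] - size + 1, stride):
        for j in range(0, target_shape[1] - size + 1, stride):
            for k in range(0, target_shape[2] - size + 1, stride):
                window = (slice(i, i + size), slice(j, j + size), slice(k, k + size))
                accum[window] += predict_cube(padded[window])
                counts[window] += 1
    averaged = accum / counts
    # Crop the padding so the output has the shape of the input.
    return averaged[tuple(slice(0, n) for n in volume.shape)]
```

With a stride of 16 and cubes of 64 voxels, most interior voxels are predicted many times from different contexts, and averaging those predictions reduces artifacts at the cube borders.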
Evaluation
With the aim of guiding the cross-validation process, we computed the correlation coefficient between the maps produced by DeepEMhancer and the maps used as learning targets (masked LocScale post-processed maps). Once the final model was selected, the quality of DeepEMhancer predictions was assessed comparing the input and processed maps against the reference maps obtained from the atomic models. Specifically, we computed the FSC between them and we estimated the resolution using 0.5 as the threshold. Due to the fact that DeepEMhancer performs a non-conventional post-processing operation, including masking and enhancement operations, in order to disentangle the two effects, the FSC was also computed after masking the maps to compare with a tight mask derived from the atomic model.
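For readers unfamiliar with the metric, a minimal FSC computation between two cubic maps is sketched below. It is an illustrative implementation rather than the one used to produce the reported figures: shells are defined by rounding the radial frequency index to the nearest integer, and the resolution is taken at the first shell whose FSC falls below the 0.5 threshold, without interpolation; both choices, as well as the function names, are assumptions.

```python
import numpy as np

def fsc_curve(map_a, map_b, voxel_size):
    """Return (spatial frequencies in 1/Angstrom, FSC per shell) for two cubic maps."""
    n = map_a.shape[0]
    fa, fb = np.fft.fftn(map_a), np.fft.fftn(map_b)
    freq = np.fft.fftfreq(n) * n  # integer frequency index along each axis
    fx, fy, fz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(fx**2 + fy**2 + fz**2)).astype(int).ravel()
    # Accumulate the cross term and both power terms shell by shell.
    cross = np.bincount(shell, weights=np.real(fa * np.conj(fb)).ravel())
    pow_a = np.bincount(shell, weights=(np.abs(fa) ** 2).ravel())
    pow_b = np.bincount(shell, weights=(np.abs(fb) ** 2).ravel())
    shells = np.arange(1, n // 2 + 1)  # skip the DC term, stop at Nyquist
    fsc = cross[shells] / np.sqrt(pow_a[shells] * pow_b[shells])
    return shells / (n * voxel_size), fsc

def fsc_resolution(freqs, fsc, threshold=0.5):
    """Resolution (Angstrom) at the first shell where the FSC drops below threshold."""
    below = np.nonzero(fsc < threshold)[0]
    if below.size == 0:
        return 1.0 / freqs[-1]  # the curve never drops: Nyquist-limited
    return 1.0 / freqs[below[0]]
```

To obtain the masked variant described above, both maps would simply be multiplied by the tight mask before calling `fsc_curve`.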
As a complementary metric, we also applied DeepRes17 over the input and processed maps. DeepRes is a deep learning-based local resolution method that, contrary to others, is sensitive to the sharpening process and thus, it can provide an alternative estimation of the post-processing effect.
Finally, for comparison purposes, we repeated the FSC and DeepRes experiments using the Relion postprocessing program2,3. As Relion automatic masking is very simple, in order to make the comparison more interesting, we decided to execute the postprocessing algorithm using the mask derived from the atomic models. Similarly, since the automatic determination of the B-factor can produce worse results than a manually selected one, in addition to the maps computed using an automatically determined B-factor by Relion, we also considered the sharpened map deposited in EMDB.
EMD-30178 map evaluation and atomic model modification
DeepEMhancer was applied to the half maps deposited in EMDB entry EMD-30178. The published and post-processed maps were visually inspected using Coot27 and Chimera28, and chosen regions on the 7btf PDB were newly built or modified using Coot.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Acknowledgements
The authors would like to acknowledge financial support from: The Spanish Ministry of Science and Innovation through Grants: Proyectos de I+D+i - RTI Tipo A PID2019-108850RA-I00, SEV 2017-0712, PID2019-104757RB-I00/ AEI/10.13039/501100011033; the “Comunidad Autónoma de Madrid” through Grant S2017/BMD-3817; CSIC: PIE/COVID-19 number 202020E079; European Union (EU) and Horizon 2020 through grants EOSC Life (INFRAEOSC-04-2018, Proposal: 824087) and HighResCells (ERC - 2018- SyG, Proposal: 810057). J.V. acknowledges financial support from the Ramón y Cajal 2018 program (RYC2018-024087-I).
Author contributions
Conceptualization: J.V., C.O.S.S., R.S.-G.; Methodology: R.S.-G., J.V., and C.O.S.S.; Software implementation: R.S.-G., J.G.; Evaluation: R.S.-G., J.V., A.C.; Writing: R.S.-G., A.C., J.V., C.O.S.S. and J.M.C.; Supervision: J.V., C.O.S.S.; Funding acquisition: J.M.C., J.V.
Data availability
All training and testing examples used in this work can be found in the EMDB and PDB databases. Accession codes are included in Supplementary Note 7 and Supplementary Data 1. Post-processed map examples and trained models are freely available at http://campins.cnb.csic.es/deepEMhancer/examples. Data used during figure preparation are available in Supplementary Data 2 and 3. All other data are available from the corresponding authors upon reasonable request.
Code availability
DeepEMhancer is freely available at https://github.com/rsanchezgarc/deepEMhancer and as an Xmipp protocol for Scipion v3 (https://github.com/I2PC/scipion-em-xmipp).
Competing interests
The authors declare no competing interests.
Footnotes
Peer review information Communications Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available. Primary handling editors: Jung-Eun Lee, Christina Karlsson Rosenthal, George Inglis.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Carlos Oscar S. Sorzano, Email: coss@cnb.csic.es
Javier Vargas, Email: jvargas@fis.ucm.es
Supplementary information
The online version contains supplementary material available at 10.1038/s42003-021-02399-1.
References
- 1. Rosenthal PB, Henderson R. Optimal determination of particle orientation, absolute hand, and contrast loss in single-particle electron cryomicroscopy. J. Mol. Biol. 2003;333:721–745. doi: 10.1016/j.jmb.2003.07.013.
- 2. Kimanius D, Forsberg BO, Scheres SH, Lindahl E. Accelerated cryo-EM structure determination with parallelisation using GPUs in RELION-2. Elife. 2016;5:e18722. doi: 10.7554/eLife.18722.
- 3. Zivanov J, et al. New tools for automated high-resolution cryo-EM structure determination in RELION-3. Elife. 2018;7:e42166. doi: 10.7554/eLife.42166.
- 4. Terwilliger TC, Sobolev OV, Afonine PV, Adams PD. Automated map sharpening by maximization of detail and connectivity. Acta Crystallogr. Sect. D. 2018;74:545–559. doi: 10.1107/S2059798318004655.
- 5. Vilas JL, et al. Re-examining the spectra of macromolecules. Current practice of spectral quasi B-factor flattening. J. Struct. Biol. 2020;209:107447. doi: 10.1016/j.jsb.2020.107447.
- 6. Jakobi AJ, Wilmanns M, Sachse C. Model-based local density sharpening of cryo-EM maps. Elife. 2017;6:e27131. doi: 10.7554/eLife.27131.
- 7. Ramírez-Aportela E, et al. Automatic local resolution-based sharpening of cryo-EM maps. Bioinformatics. 2020;36:765–772. doi: 10.1093/bioinformatics/btz671.
- 8. Kaur S, et al. Local computational methods to improve the interpretability and analysis of cryo-EM maps. Nat. Commun. 2021;12:1240. doi: 10.1038/s41467-021-21509-5.
- 9. Wagner T, et al. SPHIRE-crYOLO is a fast and accurate fully automated particle picker for cryo-EM. Commun. Biol. 2019;2:218. doi: 10.1038/s42003-019-0437-z.
- 10. Wang F, et al. DeepPicker: A deep learning approach for fully automated particle picking in cryo-EM. J. Struct. Biol. 2016;195:325–336. doi: 10.1016/j.jsb.2016.07.006.
- 11. Zhu Y, Ouyang Q, Mao Y. A deep convolutional neural network approach to single-particle recognition in cryo-electron microscopy. BMC Bioinforma. 2017;18:348. doi: 10.1186/s12859-017-1757-y.
- 12. Gupta, H., McCann, M. T., Donati, L. & Unser, M. CryoGAN: a new reconstruction paradigm for single-particle cryo-EM via deep adversarial learning. Preprint at bioRxiv 10.1101/2020.03.20.001016 (2020).
- 13. Zhong, E. D., Bepler, T., Davis, J. H. & Berger, B. Reconstructing continuous distributions of 3D protein structure from cryo-EM images. Preprint at arXiv https://arxiv.org/abs/1909.05215v3 (2019).
- 14. Maddhuri Venkata Subramaniya SR, Terashi G, Kihara D. Protein secondary structure detection in intermediate-resolution cryo-EM maps using deep learning. Nat. Methods. 2019;16:911–917. doi: 10.1038/s41592-019-0500-1.
- 15. Si D, et al. Deep learning to predict protein backbone structure from high-resolution cryo-EM density maps. Sci. Rep. 2020;10:1–22. doi: 10.1038/s41598-019-56847-4.
- 16. Avramov T, et al. Deep learning for validating and estimating resolution of cryo-electron microscopy density maps. Molecules. 2019;24:1181. doi: 10.3390/molecules24061181.
- 17. Ramírez-Aportela E, Mota J, Conesa P, Carazo JM, Sorzano COS. DeepRes: a new deep-learning- and aspect-based local resolution method for electron-microscopy maps. IUCrJ. 2019;6:1054–1063. doi: 10.1107/S2052252519011692.
- 18. Yang W, et al. Deep learning for single image super-resolution: a brief review. IEEE Trans. Multimed. 2019;21:3106–3121. doi: 10.1109/TMM.2019.2919431.
- 19. Lawson CL, et al. EMDataBank unified data resource for 3DEM. Nucleic Acids Res. 2015;44:D396–D403. doi: 10.1093/nar/gkv1126.
- 20. Ronneberger, O., Fischer, P. & Brox, T. U-net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention-MICCAI, Vol. 9351, (eds Navab, N., Hornegger, J., Wells, W. M., Frangi, A. F.) 234–241 (2015).
- 21. Tenthorey JL, et al. The structural basis of flagellin detection by NAIP5: a strategy to limit pathogen immune evasion. Science. 2017;358:888–893. doi: 10.1126/science.aao1140.
- 22. Johnson ZL, Chen J. ATP Binding enables substrate release from multidrug resistance protein 1. Cell. 2018;172:81–89.e10. doi: 10.1016/j.cell.2017.12.005.
- 23. Walter JD, Sawicka M, Dutzler R. Cryo-EM structures and functional characterization of murine Slc26a9 reveal mechanism of uncoupled chloride transport. Elife. 2019;8:e46986. doi: 10.7554/eLife.46986.
- 24. Gao Y, et al. Structure of the RNA-dependent RNA polymerase from COVID-19 virus. Science. 2020;368:779–782. doi: 10.1126/science.abb7498.
- 25. Wu Y, He K. Group normalization. Int. J. Comput. Vis. 2020;128:742–755. doi: 10.1007/s11263-019-01198-w.
- 26. He, K., Zhang, X., Ren, S. & Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proc. 2015 IEEE International Conference on Computer Vision, 1026–1034 (2015).
- 27. Emsley P, Cowtan K. Coot: model-building tools for molecular graphics. Acta Crystallogr. Sect. D. 2004;60:2126–2132. doi: 10.1107/S0907444904019158.
- 28. Pettersen EF, et al. UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 2004;25:1605–1612. doi: 10.1002/jcc.20084.