Abstract
Cryo-electron microscopy (cryo-EM) maps usually show heterogeneous distributions of B-factors and electron density occupancies and are typically B-factor sharpened to improve their contrast and interpretability at high resolution. However, ‘over-sharpening’ due to the application of a single global B-factor can distort processed maps, causing connected densities to appear broken and disconnected. This issue limits the interpretability of cryo-EM maps, e.g. for ab initio modelling. In this work, we propose 1) approaches to enhance high-resolution features of cryo-EM maps while preventing map distortions and 2) methods to obtain local B-factor and electron density occupancy maps. These algorithms, called LocSpiral, LocBSharpen, LocBFactor and LocOccupancy, share a common basis: the spiral phase transform. Our results, which include improved maps of recent SARS-CoV-2 structures, show that our methods can improve the interpretability and analysis of the obtained reconstructions.
Subject terms: Structural biology, Cryoelectron microscopy, Software
Here, the authors present two local methods for analyzing cryo-EM maps, LocSpiral and LocBSharpen, which enhance high-resolution features of cryo-EM maps while preventing map distortions. They also introduce LocBFactor and LocOccupancy, which estimate local B-factor and electron density occupancy maps from cryo-EM reconstructions. Using several test cases, among them recent SARS-CoV-2 spike glycoprotein structures, the authors demonstrate that these methods improve the interpretability and analysis of cryo-EM maps.
Introduction
Cryo-electron microscopy (cryo-EM) has become a mainstream technique for determining the structures of macromolecular complexes at close-to-atomic resolution and, ultimately, for building atomic models1,2. With its unique ability to reconstruct multiple conformations and compositions of macromolecular complexes, cryo-EM allows the structural and assembly dynamics of these complexes to be studied in their native conditions3–5. However, heterogeneity in cryo-EM maps leads to large variations in resolution between different regions of the same map, which causes challenges and errors when building an atomic model from a cryo-EM reconstruction. Additionally, current cryo-EM reconstructions do not provide information that is essential for building accurate ab initio atomic models, such as atomic Debye–Waller factors (B-factors) or atomic occupancies, whereas X-ray crystallography provides these by analysing the attenuation of the scattered intensity measured at the Bragg peaks.
Cryo-EM structures exhibit a loss of contrast at high resolution that arises from many sources, including molecular motions, heterogeneity and/or signal damping by the contrast transfer function (CTF) of the electron microscope. Interpreting high-resolution features in cryo-EM maps is essential to understanding the biological functions of macromolecules, so approaches that compensate for this contrast loss and improve map visibility at high resolution are crucial. This process is usually referred to as ‘sharpening’ and is typically performed by applying a uniform B-factor to the cryo-EM map, which boosts the map signal amplitudes within a defined resolution range. As the map is sharpened with B-factors of increasing magnitude, the clarity and map details initially improve, but eventually the map deteriorates: connectivity is lost, and the densities appear broken and noisy. In the global sharpening approach6–8, the B-factor is computed automatically by determining the line that best fits the decay of the spherically averaged, noise-weighted structure-factor amplitudes (the Guinier plot) within a resolution range of [15–10 Å, Rmax], where Rmax is the maximum resolution of the map given by the Fourier Shell Correlation (FSC). More recently, the AutoSharpen method within Phenix9 calculates a single B-factor that maximises both the connectivity and the level of detail of the resulting sharpened map. This trade-off is optimised by maximising the surface area of the contours in the sharpened map.
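The global Guinier fit described above can be sketched in a few lines of Python. This is a minimal illustration, not the Relion or Phenix implementation: it assumes the spherically averaged amplitudes |F(s)| have already been computed per Fourier shell at spatial frequencies s = 1/d (in Å⁻¹), it omits the noise weighting, and the function name `guinier_bfactor` is hypothetical. Because amplitudes attenuated by a B-factor decay as exp(−B s²/4), the slope of ln|F| against s² is −B/4, so the B-factor applied for sharpening is four times the fitted slope (the same convention as the −84.56 Å² / −21.14 Å² pair reported for the PC2 channel in the Results).

```python
import numpy as np

def guinier_bfactor(s, amplitudes, d_low=10.0, d_high=3.0):
    """Fit ln|F| vs s^2 within [1/d_low, 1/d_high] and return (B, slope).

    s          : spatial frequencies in 1/Å (one per Fourier shell)
    amplitudes : spherically averaged structure-factor amplitudes per shell
    The returned B (= 4 * slope) is the value one would apply for sharpening;
    it is negative when the amplitudes decay with resolution.
    """
    s = np.asarray(s, dtype=float)
    amplitudes = np.asarray(amplitudes, dtype=float)
    sel = (s >= 1.0 / d_low) & (s <= 1.0 / d_high) & (amplitudes > 0)
    slope, _intercept = np.polyfit(s[sel] ** 2, np.log(amplitudes[sel]), 1)
    return 4.0 * slope, slope

# Synthetic check: amplitudes decaying as exp(-B_true * s^2 / 4).
s = np.linspace(0.01, 0.4, 200)
B_true = 84.56
amps = 50.0 * np.exp(-B_true * s ** 2 / 4.0)
B, slope = guinier_bfactor(s, amps, d_low=15.0, d_high=2.96)
print(round(B, 2), round(slope, 2))  # -84.56 -21.14
```

With real maps the fitted line is only an approximation, which is why the choice of the lower bound (15–10 Å) matters.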
The approaches presented above are global, so the same amplitude scaling is applied to map regions that may exhibit very different signal-to-noise ratios (SNRs) at medium/high resolutions. Thus, cryo-EM maps with inhomogeneous SNRs (and resolutions) can yield sharpened maps that contain both over-sharpened and under-sharpened regions. The former may be strongly affected by noise and broken densities, while the latter may show reduced contrast at high resolution. Both cases make it difficult or even impossible to interpret the biological relevance of these regions, or even of the whole map10. Local sharpening methods have therefore been proposed to overcome these limitations11,12. The LocScale approach11 compares, inside moving windows, radial averages of structure-factor amplitudes between the experimental map and a density map computed from an atomic model. The method then locally rescales the amplitudes of the experimental map in Fourier space to match those of the atomic-model map. This approach requires as input a complete atomic model (without major gaps) fitted to the cryo-EM map to be sharpened, which is not always available. In addition, the size of the moving window must be provided, and, depending on the quality of the map to be sharpened, this process may lead to overfitting. More recently, the LocalDeblur method12 proposed local map sharpening using an estimate of the local resolution as input. The method assumes that the local map density values result from the convolution of the actual map with a local isotropic low-pass filter. This local filter is assumed to be Gaussian, with a frequency cutoff given by the local resolution estimate.
In X-ray crystallography, the B-factor (also called the temperature factor or Debye–Waller factor) describes the degree to which the electron density is spread out, reflecting the true static or dynamic mobility of an atom and/or positions where errors may exist in the built model. The B-factor is given by $B_i = 8\pi^2\langle u_i^2\rangle$, where $\langle u_i^2\rangle$ is the mean square displacement of atom i. These atomic B-factors can be measured experimentally in X-ray crystallography, where they are introduced as a correction factor in the structure-factor calculations because oscillating atoms scatter X-rays less effectively than atoms at rest13. B-factors can be further refined by model-building packages, e.g. Phenix14 or Refmac15, to improve the quality and accuracy of atomic models. Besides being essential for ‘sharpening’ cryo-EM maps at high resolution, B-factors also provide key information for analysing cryo-EM reconstructions. Effective B-factors are used to model the combined effects of issues such as molecular drift due to charging, macromolecular flexibility or possible errors in the reconstruction workflow that lead to a signal fall-off6,16,17. However, cryo-EM maps are usually analysed with a single B-factor, even though B-factors may differ substantially between regions of a map. Methods to determine local B-factors are therefore much needed to accurately analyse cryo-EM maps and improve the quality of fitted atomic models. Another local parameter that X-ray crystallography usually provides, in contrast with cryo-EM, is the atomic occupancy (or Q-value). The occupancy estimates the fraction of molecules in which an atom is present at its mean position and ranges from 0.0 to 1.0. Note that occupancies can also be refined by model-building packages if the electron density map is of sufficient resolution.
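As a small worked example of $B_i = 8\pi^2\langle u_i^2\rangle$, the sketch below (function names are illustrative) converts a mean square displacement into a B-factor and evaluates the standard Debye–Waller attenuation of a structure-factor amplitude at resolution d, $\exp(-B/4d^2)$. An RMS displacement of only 0.5 Å already gives B ≈ 19.7 Å² and reduces amplitudes at 2 Å to about 29% of their value for an atom at rest.

```python
import math

def bfactor_from_msd(msd):
    """B = 8 * pi^2 * <u^2>, with the mean square displacement <u^2> in Å^2."""
    return 8.0 * math.pi ** 2 * msd

def amplitude_attenuation(b, d):
    """Debye–Waller amplitude factor exp(-B / (4 d^2)) at resolution d (Å)."""
    return math.exp(-b / (4.0 * d ** 2))

b = bfactor_from_msd(0.25)                       # RMS displacement of 0.5 Å
print(round(b, 2))                               # 19.74
print(round(amplitude_attenuation(b, 2.0), 3))   # 0.291
```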
To our knowledge, no method is currently available to estimate local occupancies from cryo-EM maps, even though this information (in addition to local B-factors) is essential to build accurate atomic models. For example, the authors of ref. 18 found that 31% of all models examined possess unrealistic occupancy and/or B-factor values, such as all being set to zero or other unlikely values. They also reported that 40% of the analysed models show cross-correlations below 0.5 with their respective cryo-EM maps, and suggested incomplete optimisation of the model parameters (coordinates, occupancies and B-factors) as a possible cause.
In this work, we propose semi-automated methods that enhance high-resolution map features to improve their visibility and interpretability. Importantly, these approaches do not require inputs such as fitted atomic models or local resolution maps, which reduces the possibility of overfitting. In particular, our local map enhancement approach (LocSpiral) is robust to maps with inhomogeneous local resolutions/SNRs and therefore strongly improves the interpretability of such maps. We also propose approaches to determine local B-factor and density occupancy maps to improve the analysis of cryo-EM reconstructions. All the proposed approaches rely on the spiral phase transform to estimate a modulation (amplitude) map of the cryo-EM reconstruction at different resolutions.
Results
In this section, we first provide a brief overview of the approaches developed in this work; a more detailed technical explanation is given in the ‘Methods’ section at the end of the manuscript. We then present results obtained with our approaches in a variety of situations. We tested the proposed methods on five different samples, ranging from near-atomic single-particle reconstructions (∼1.54 Å) to maps with more modest resolutions (∼6.5 Å). In all cases, we compared our results with those provided by the Relion postprocessing approach7,19.
Overview of the proposed methods
The inputs to the different methods (LocSpiral, LocBSharpen, LocBFactor and LocOccupancy) are the unfiltered map to process, a resolution range [Rmin, Rmax] and, in some cases, a tight solvent mask. Each algorithm starts by filtering the input map to a given resolution 1/ω within the resolution range. The 3D spiral phase transform is then calculated to factorise the filtered map $f_\omega(\mathbf{r})$, in real space, into an amplitude map and a phase map as
$$f_\omega(\mathbf{r}) = m_\omega(\mathbf{r})\cos\varphi_\omega(\mathbf{r}). \qquad (1)$$
The amplitude map $m_\omega(\mathbf{r})$ is related to the ‘strength’ of the local map signal at resolution 1/ω, while the phase map $\cos\varphi_\omega(\mathbf{r})$ describes its shape and is bounded to the range [−1, +1]. The proposed methods are all based on the analysis of the amplitude maps. In some cases, the approaches compute new amplitude maps, $\tilde{m}_\omega(\mathbf{r})$, which are used to determine a sharpened map (LocSpiral, LocBSharpen) as
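$$\tilde{f}(\mathbf{r}) = \sum_{\omega} \tilde{m}_\omega(\mathbf{r})\cos\varphi_\omega(\mathbf{r}), \qquad (2)$$
where the sum runs over the resolution range and an SNR weighting parameter is also applied (see ‘Methods’ section). In other cases, the amplitude maps are further analysed to provide local B-factor maps (LocBFactor) or a local occupancy map estimate (LocOccupancy).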
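To illustrate the factorisation in Eq. (1), the following sketch band-pass filters a map around a frequency ω and computes $m_\omega(\mathbf{r})$ and $\cos\varphi_\omega(\mathbf{r})$. As a stand-in for the authors' 3D spiral phase transform, it assumes the closely related Riesz transform (monogenic signal), in which the quadrature components are obtained by multiplying the filtered spectrum by −i k_j/|k|; the Gaussian shell filter, its width `sigma` and the function name are hypothetical. For a single plane wave, the recovered amplitude is constant and the phase map reproduces the cosine of the wave, which the usage example checks.

```python
import numpy as np

def spiral_amplitude_phase(vol, omega, sigma):
    """Band-pass `vol` around `omega` (cycles/voxel) and split the result into a
    local amplitude map m and a phase map cos(phi) bounded to [-1, 1].

    Assumption: Riesz-transform quadrature with a Gaussian radial shell filter.
    """
    ks = np.meshgrid(*[np.fft.fftfreq(n) for n in vol.shape], indexing="ij")
    k_norm = np.sqrt(sum(k ** 2 for k in ks))
    F = np.fft.fftn(vol)
    # Gaussian radial band-pass centred on the frequency omega
    Ff = F * np.exp(-((k_norm - omega) ** 2) / (2.0 * sigma ** 2))
    f = np.real(np.fft.ifftn(Ff))
    # Riesz components: multiply by -i k_j / |k| in Fourier space (0 at k = 0)
    safe = np.where(k_norm > 0, k_norm, 1.0)
    quad_sq = np.zeros_like(f)
    for k in ks:
        quad_sq += np.real(np.fft.ifftn(Ff * (-1j * k / safe))) ** 2
    m = np.sqrt(f ** 2 + quad_sq)
    cos_phi = np.divide(f, m, out=np.zeros_like(f), where=m > 0)
    return m, cos_phi

# Plane wave along x with amplitude 2 at frequency 4/32 cycles/voxel.
n = 32
x = np.arange(n)[:, None, None] * np.ones((1, n, n))
theta = 2 * np.pi * 4 * x / n
vol = 2.0 * np.cos(theta)
m, cos_phi = spiral_amplitude_phase(vol, omega=4 / n, sigma=0.02)
print(np.allclose(m, 2.0), np.allclose(cos_phi, np.cos(theta)))  # True True
```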
In LocSpiral, at every resolution inside the resolution range, the amplitude map is compared locally with a noise threshold computed from the 90–95% quantile of the empirical noise/background distribution at that resolution. This empirical distribution is generated by collecting the amplitude values at resolution 1/ω for all voxels outside the tight solvent mask. The computed noise threshold is then used to obtain a new normalised and filtered amplitude map, $\tilde{m}_\omega(\mathbf{r})$, which is used to reconstruct the sharpened map as shown in Eq. (2).
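A minimal sketch of these LocSpiral steps, assuming the amplitude and phase maps have already been computed for each resolution. The exact normalisation and the SNR weighting parameter are specified in ‘Methods’ and are not reproduced here; the rule below (subtract the noise threshold, clip at zero, and scale each resolution to a maximum of 1) is a hypothetical stand-in, and all function names are illustrative. The final sum corresponds to Eq. (2) without the SNR weighting.

```python
import numpy as np

def noise_threshold(m, mask, q=0.95):
    """Quantile q (0.90-0.95 in the text) of amplitudes outside the tight solvent mask."""
    return float(np.quantile(m[~mask], q))

def normalise_amplitude(m, mask, q=0.95):
    """Hypothetical normalisation: keep only amplitude above the noise threshold
    and scale the result so that its maximum is 1 at this resolution."""
    thr = noise_threshold(m, mask, q)
    m_tilde = np.where(m > thr, m - thr, 0.0)
    peak = m_tilde.max()
    return m_tilde / peak if peak > 0 else m_tilde

def locspiral_sketch(amplitudes, cos_phases, mask, q=0.95):
    """Sum of normalised amplitudes times phase maps over all resolutions."""
    out = np.zeros_like(cos_phases[0], dtype=float)
    for m, cos_phi in zip(amplitudes, cos_phases):
        out += normalise_amplitude(m, mask, q) * cos_phi
    return out

# Toy example: two resolutions, macromolecule in the first half of the box.
mask = np.zeros((4, 4, 4), dtype=bool)
mask[:2] = True
m_low = np.where(mask, 10.0, 1.0)   # strong low-resolution signal
m_high = np.where(mask, 2.0, 1.0)   # weak high-resolution signal
ones = np.ones_like(m_low)
sharp = locspiral_sketch([m_low, m_high], [ones, ones], mask)
print(sharp[mask].min(), sharp[~mask].max())  # 2.0 0.0
```

After normalisation both resolutions contribute equally inside the mask, so the amplitude fall-off is flattened voxel by voxel rather than with one global B-factor, while background voxels at the noise level are suppressed.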
In LocBSharpen, the amplitude map $m_{\omega_0}(\mathbf{r})$ at a resolution 1/ω0 is stored. The resolution 1/ω0 is provided by the user and is typically 15–10 Å. When building the sharpened map, the new amplitude map $\tilde{m}_\omega(\mathbf{r})$ at any resolution equal to or higher than 1/ω0 is set to $m_{\omega_0}(\mathbf{r})$, while for the remaining resolutions inside the resolution range it is set to $m_\omega(\mathbf{r})$.
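The LocBSharpen substitution can be sketched as follows, assuming the amplitude maps are given as a list ordered alongside their spatial frequencies ω (in Å⁻¹). The function names are hypothetical, ω0 is snapped to the closest sampled frequency (our assumption), and the reconstruction omits the SNR weighting parameter.

```python
import numpy as np

def locbsharpen_amplitudes(omegas, amplitudes, omega0):
    """Return new amplitude maps: m_omega0(r) for omega >= omega0, else m_omega(r).

    omegas     : spatial frequencies (1/Å), one per amplitude map
    amplitudes : list of amplitude maps m_omega(r) with identical shapes
    omega0     : user-chosen frequency, typically 1/15 to 1/10 Å^-1
    """
    omegas = np.asarray(omegas, dtype=float)
    i0 = int(np.argmin(np.abs(omegas - omega0)))  # closest sampled frequency
    ref = amplitudes[i0]
    return [ref if w >= omegas[i0] else m for w, m in zip(omegas, amplitudes)]

def reconstruct(new_amplitudes, cos_phases):
    """Sum of m~_omega(r) * cos(phi_omega(r)) over resolutions (no SNR weighting)."""
    return sum(m * c for m, c in zip(new_amplitudes, cos_phases))

omegas = [1 / 20, 1 / 15, 1 / 8, 1 / 4]
amps = [np.full((2, 2, 2), a) for a in (8.0, 4.0, 2.0, 1.0)]  # decaying signal
new = locbsharpen_amplitudes(omegas, amps, omega0=1 / 15)
print([float(m[0, 0, 0]) for m in new])  # [8.0, 4.0, 4.0, 4.0]
```

Because the local amplitude at 1/ω0 is reused at every higher resolution, each voxel's own signal fall-off is removed, which acts like a locally adapted B-factor correction.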
In LocBFactor, the amplitude maps at different resolutions inside the defined resolution range are used to estimate local map B-factors. A typical resolution range is [15, Rmax] Å, where Rmax is the global map resolution. To compute the local B-factors, the method performs, at every voxel r, a linear fit between $\ln m_\omega(\mathbf{r})$ and ω² within the resolution range. The method outputs the B-map (local B-factor map) and the A-map (local values of the logarithm of the structure-factor amplitudes at 15 Å).
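The voxel-wise fit can be written as a vectorised least-squares regression over the resolution axis. This is a sketch under stated assumptions: the function name is hypothetical, a small `eps` guards the logarithm, and the A-map is taken here as the fitted (rather than measured) log-amplitude at the lowest frequency of the range. Multiplying the slope map by 4 gives the B-factor used for sharpening, following the convention used in the Results.

```python
import numpy as np

def local_bfactor(omegas, amplitudes, eps=1e-12):
    """Voxel-wise least-squares fit of ln m_omega(r) against omega^2.

    Returns (slope_map, a_map). slope_map is the local Guinier slope
    (multiply by 4 for the B-factor used for sharpening); a_map is the fitted
    ln-amplitude at the lowest frequency in `omegas` (e.g. 1/15 Å^-1).
    """
    x = np.asarray(omegas, dtype=float) ** 2                 # omega^2, shape (K,)
    y = np.log(np.stack(amplitudes, axis=0) + eps)           # shape (K, *map)
    dx = (x - x.mean()).reshape((-1,) + (1,) * (y.ndim - 1))  # broadcastable
    y_mean = y.mean(axis=0)
    slope = (dx * (y - y_mean)).sum(axis=0) / (dx ** 2).sum()
    intercept = y_mean - slope * x.mean()
    a_map = intercept + slope * x.min()
    return slope, a_map

# Synthetic 1x2 "map" whose voxels decay with Guinier slopes of -20 and -50 Å^2.
omegas = 1.0 / np.linspace(15.0, 4.0, 8)                   # 15 Å ... 4 Å
true_slope = np.array([[-20.0, -50.0]])
amps = [np.exp(true_slope * w ** 2) for w in omegas]
slope, a_map = local_bfactor(omegas, amps)
print(np.allclose(4 * slope, [[-80.0, -200.0]]))  # True
```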
In LocOccupancy, the local occupancy map is estimated by comparing the amplitude map with a macromolecule density threshold at every resolution inside the defined resolution range. The macromolecule density threshold at a given resolution is the density value above which we are confident that the electron density occupancy is 100% at that resolution. This threshold is obtained from the empirical probability distribution of macromolecule amplitudes at frequency ω, which is calculated from the values at voxels inside the solvent mask. The threshold is taken as the amplitude value corresponding to the 25% quantile of this distribution. Then, for every voxel and resolution within the resolution range, the amplitude map is compared with this threshold, yielding a value between 0 and 1. Finally, the average over all resolutions is computed and provided as an estimate of the map occupancy within the resolution range.
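A minimal sketch of this occupancy estimate, assuming precomputed amplitude maps and a boolean solvent mask. The comparison rule used here, min(m/t, 1) with t the 25% quantile inside the mask, is a hypothetical choice that yields values in [0, 1]; the exact rule is described in ‘Methods’.

```python
import numpy as np

def local_occupancy(amplitudes, mask, q=0.25):
    """Average over resolutions of min(m_omega(r) / t_omega, 1).

    t_omega is the q-quantile (25 % in the text) of the amplitudes inside the
    solvent mask; the clipped ratio is a hypothetical comparison rule.
    """
    occ = np.zeros(amplitudes[0].shape, dtype=float)
    for m in amplitudes:
        t = max(float(np.quantile(m[mask], q)), np.finfo(float).tiny)  # avoid /0
        occ += np.clip(m / t, 0.0, 1.0)
    return occ / len(amplitudes)

mask = np.array([True, True, True, True, False, False])
m1 = np.array([4.0, 4.0, 4.0, 4.0, 1.0, 0.0])
m2 = np.array([2.0, 2.0, 2.0, 2.0, 2.0, 0.0])
print(local_occupancy([m1, m2], mask).tolist())  # [1.0, 1.0, 1.0, 1.0, 0.625, 0.0]
```

Averaging over resolutions means a voxel reaches full occupancy only if its amplitude exceeds the macromolecule threshold across the whole range.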
Polycystin-2 (PC2) TRP channel
First, we analysed a single-particle reconstruction of the polycystin-2 (PC2) TRP channel (EMDataBank: EMD-10418)20. In this case, we focussed on the capabilities of LocSpiral although, for consistency, we also show the B-factor and occupancy maps obtained. The original publication reports a resolution of 2.96 Å and a sharpening B-factor of −84.56 Å2 (Guinier plot slope of −21.14 Å2).
In Fig. 1A, we show maps at high threshold values obtained by LocSpiral and by the postprocessing method of Relion 3 (refs. 7,19). The map densities are similar in the inner core of the protein, as can be seen in the solid red rectangle, which shows a zoomed view of the LocSpiral and Relion maps in the region indicated by the red rectangles over the maps. However, the map densities are quite different in the outer regions, where the Relion map shows thin and broken densities. In addition, we compare the fitted densities of two α-helices and one loop with the corresponding atomic model (PDB ID: 6t9n). The asterisks label results obtained by LocSpiral. The residues marked with a red arrow were used to match the threshold values between maps. These comparisons show that the LocSpiral map has fewer fragmented and broken densities and better coverage of the atomic model, which helps in interpreting the maps and in building accurate atomic models. Supplementary Fig. 1 shows additional comparisons between the LocSpiral and Relion postprocessing maps.
We also compared the performance of LocSpiral with other methods, including LocalDeblur, our proposed local B-factor correction method (LocBSharpen) and the global B-factor correction approach implemented in Relion. The results are shown in Supplementary Note 1, Supplementary Table 1 and Supplementary Fig. 2, where we also provide results obtained by the LocBFactor and LocOccupancy methods.
Pre-catalytic spliceosome
Next, we processed the Saccharomyces cerevisiae pre-catalytic B complex spliceosome single-particle dataset deposited in EMPIAR (EMPIAR 10180)4,21. This dataset exhibits a high degree of conformational heterogeneity, which makes it an ideal test case for our approaches. We used the approach described in ref. 22 to obtain a reconstruction at 4.28 Å resolution after Relion postprocessing7,19. A detailed description of the image processing workflow used to obtain this reconstruction is provided in the ‘Methods’ section. The unfiltered map provided by Relion autorefine was used as input to LocSpiral, LocBFactor and LocOccupancy.
We first show the results obtained by the LocSpiral method for this highly heterogeneous case. In Fig. 1B, we show maps obtained by LocSpiral and by the postprocessing method of Relion 3 (refs. 7,19) at different orientations and similar threshold values. As before, the LocSpiral map shows fewer fragmented and broken densities, especially in the flexible parts of the spliceosome reconstruction, and enhanced details in the central core, improving the visibility of the reconstruction.
We then focus on the LocBFactor method. In Fig. 2A, we show a central slice along the Z axis of this map with several points marked by coloured squares. These points correspond to clear spliceosome densities (green and red), flexible and low-resolution spliceosomal regions (yellow and blue) and background (magenta). Figure 2B shows the corresponding Guinier plots at these locations. Solid lines represent measured values of the logarithm of the SNR-weighted structure-factor amplitudes, while dashed lines show the fitted curves. This figure also provides the B-factors obtained for the different curves. The Guinier plots and B-factors are determined within a resolution range from 15 Å to the FSC resolution of 4.28 Å. As can be seen from Fig. 2B, the red and green curves, which correspond to clear spliceosomal densities, present high amplitude values at 15 Å, while the yellow, blue and magenta curves show low amplitudes at 15 Å and a flat profile within the resolution range. For reference, Fig. 2B also shows (black curve) the Guinier plot of the noise/background amplitudes obtained from the 90–95% quantile of the empirical noise/background distribution; the dashed black line indicates the linear fit of this noise Guinier plot. Comparing the yellow, blue, magenta and black curves, it is clear that these plots lie below our noise level and that their shape is similar to that of the noise curve. Thus, these B-factors mainly describe how the noise signal falls off inside the resolution range (noise B-factors), and they should be filtered out of our B-factor map. Moreover, Fig. 2C shows the spliceosome map coloured according to the occupancy map obtained by LocOccupancy using a resolution range of [30, 10] Å. From Fig. 2C, we see that the flexible and moving parts of the spliceosome, such as those indicated by the yellow and blue points in Fig. 2A, show low occupancies (close to zero) within the resolution range used. Figure 2D shows the spliceosome map coloured with the obtained B-factor map to be used for sharpening (slope of the local Guinier plot multiplied by 4). In Fig. 2D, the noise B-factors (B-factors obtained from amplitudes below the noise level for the resolution range used) are filtered out and appear in black. Note that Guinier plots at regions with amplitudes below the noise level are dominated by the noise signal and describe its fall-off inside the resolution range used. Because noise typically has a flat spectrum, these regions yield artefactual, close-to-zero B-factors, which are inconsistent with the concept of the B-factor as a measure of positional uncertainty or disorder. Figure 2E shows the corresponding local resolution map of the spliceosome reconstruction obtained by ResMap23. As can be seen from this figure, the local resolution values of the flexible parts (helicase and SF3b domains) are lower than those of the other regions and lie within [10, 15] Å. Consequently, the amplitudes obtained for these flexible parts within the resolution range of [15, 4.28] Å are dominated by the noise/background signal. The average of the signal B-factors (B-factors obtained from amplitude values above the noise level for the resolution range used) inside a solvent mask is −567.62 Å2, while the value reported by Relion postprocessing is −158.08 Å2. Note that Relion postprocessing does not filter out regions dominated by noise/background when computing the global B-factor. As mentioned before, regions dominated by the noise signal within the resolution range used present artefactual low B-factors; consequently, this global B-factor may be overestimated. A more detailed description of this point is given in Supplementary Note 3: B-factor analysis of low and high-resolution maps. In Fig. 2F, we show the local values of the logarithm of the structure-factor amplitudes at 15 Å (A map).
As expected, this map shows low amplitudes at highly flexible and moving regions. We also recalculated the B-factors using a resolution range of [20, 10] Å; the results are shown in Fig. 2G–I. As can be seen from these figures, the flexible parts now show unfiltered low signal B-factors and low amplitudes at 20 Å. However, it is important to note that in this resolution range the B-factors are dominated by the molecular shape and solvent contrast rather than by resolution-limiting factors such as errors in the reconstruction procedure (e.g. the presence of heterogeneity), radiation damage or imaging imperfections6. Consequently, we do not recommend using a resolution range of [20, 10] Å, as the B-factors obtained cannot be reliably used to evaluate map quality.
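The filtering of noise-dominated B-factors described for the spliceosome can be sketched as follows, assuming per-voxel Guinier slopes and amplitude maps are available and that one noise threshold per resolution has been taken from the 90–95% quantile of background amplitudes. Treating a voxel as signal only when its amplitude exceeds the noise level at every resolution in the range is our assumption; the exact criterion used by LocBFactor may differ, and the function names are hypothetical. In the toy example, keeping the flat, noise-like third voxel (B = −8 Å²) would pull the masked mean from −500 Å² towards zero, which is the bias discussed above for unfiltered global estimates.

```python
import numpy as np

def signal_bfactors(slope_map, amplitudes, noise_thresholds):
    """Sharpening B-factors (4 x local Guinier slope), NaN where noise dominates.

    noise_thresholds: one value per resolution (90-95 % background quantile).
    A voxel counts as signal only if its amplitude exceeds the noise threshold
    at every resolution in the range (hypothetical criterion).
    """
    amps = np.stack(amplitudes, axis=0)
    thr = np.asarray(noise_thresholds, dtype=float)
    thr = thr.reshape((-1,) + (1,) * (amps.ndim - 1))
    is_signal = np.all(amps > thr, axis=0)
    return np.where(is_signal, 4.0 * slope_map, np.nan)

def mean_signal_bfactor(b_map, solvent_mask):
    """Average of the unfiltered (signal) B-factors inside the solvent mask."""
    return float(np.nanmean(b_map[solvent_mask]))

slope_map = np.array([-100.0, -150.0, -2.0])  # third voxel: flat, noise-like
amps = [np.array([5.0, 6.0, 0.5]), np.array([3.0, 2.0, 0.4])]
b_map = signal_bfactors(slope_map, amps, noise_thresholds=[1.0, 1.0])
print(b_map.tolist(), mean_signal_bfactor(b_map, np.ones(3, dtype=bool)))
# [-400.0, -600.0, nan] -500.0
```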
Apoferritin
We have also applied these techniques to recently reported high-resolution cryo-EM reconstructions of mouse apoferritin: EMD-9865 and EMD-21024. The reported global resolution of these reconstructions is 1.54 and 1.75 Å for EMD-9865 and EMD-21024, respectively.
In Fig. 3A, B, we show the results obtained by LocBFactor (B and A maps) and LocOccupancy (occupancy maps). The B and A maps were estimated using a resolution range from 15 Å to the reported global resolution in both cases, while the occupancy maps of these high-resolution reconstructions were calculated from 5 Å to the global resolution. As can be seen from Fig. 3B, EMD-9865 shows lower B-factors and higher local amplitudes than EMD-21024, indicating a better-quality reconstruction; nevertheless, the low values of both B maps indicate the high quality of both reconstructions. In both cases, the highest B-factors are in the outer regions of the protein. Moreover, the local occupancy maps are similar in both cases, with occupancies as low as approximately 0.5 in the outer part, indicating flexibility of these outer residues. The average and standard deviation of the B-factors inside a solvent mask are −56 and 7.20 Å2 (EMD-9865) and −78 and 8.93 Å2 (EMD-21024), respectively, which reflects the high quality of these reconstructions.
In Fig. 3C, we show sharpened maps obtained from EMD-9865 by LocSpiral and by the postprocessing method of Relion 3. The LocSpiral map is shown in grey, while the Relion map is rendered in red. The solid black rectangle shows a zoomed view of the outer region of the protein indicated by the dashed black rectangles. Supplementary Fig. 3 shows that the extra densities that appear in the LocSpiral map correspond to residues that are missing in EMD-9865. Additionally, on the right of Fig. 3C, we show the corresponding occupancy map obtained by LocOccupancy in the same orientation as the sharpened maps. As can be seen from Fig. 3C, the LocSpiral map shows fewer fragmented and broken densities, especially in the parts of the map that show low occupancies. We also computed EMRINGER and MolProbity scores24 between these maps (EMD-9865 and LocSpiral) and the atomic model (PDB 6v21) after refining the structure against the corresponding maps with Phenix real_space_refine25 using 5 refinement iterations. The results are shown in Supplementary Table S1.
Immature prokaryote ribosomes
We processed immature ribosomal maps of the bacterial large subunit3. These maps were obtained after depletion of the bL17 ribosomal protein and are publicly available from the Electron Microscopy Data Bank (EMDB) (EMD-8440, EMD-8441, EMD-8445, EMD-8450, EMD-8434)26. In this case, we focussed on the ability of LocOccupancy to help interpret and analyse reconstructions showing a high degree of compositional heterogeneity.
Figure 4 shows the results obtained. The first row shows the different maps to be processed, as deposited in the EMDB. Next, we show the occupancy maps obtained by LocOccupancy, with the mature 50S ribosome (EMD-8434) coloured according to the corresponding occupancy maps. The resolution range used was [30, 10] Å. These figures clearly show the regions that are missing in the different immature maps with respect to the mature map. We therefore used the occupancy maps to create binary masks to segment the mature 50S ribosome map and then extract the densities that are missing in the respective immature maps. These densities are shown in the third column of Fig. 4 in different colours (yellow, red, indigo and green). The obtained occupancy maps also allow us to define a ‘maturity level’ index, calculated by comparing the number of voxels set in the solvent mask of the mature 50S reconstruction with the number set in the occupancy masks (see ‘Methods’ section for a more detailed description). As can be seen from Fig. 4, the larger the unfolded regions in the immature maps, the smaller the maturity level. This index allows us to sort the different immature maps quantitatively along a spectrum according to their maturity.
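The maturity level index can be illustrated with a simple voxel-count ratio. This sketch assumes the index is the fraction of voxels in the mature-map solvent mask that remain in the binarised occupancy mask of an immature map; the occupancy cut-off of 0.5 and the function name are hypothetical, and the precise definition is given in ‘Methods’.

```python
import numpy as np

def maturity_level(occupancy, mature_mask, occ_threshold=0.5):
    """Fraction of voxels in the mature-map solvent mask that are also in the
    binarised occupancy mask of an immature map (hypothetical definition)."""
    occ_mask = (occupancy >= occ_threshold) & mature_mask
    return float(occ_mask.sum() / mature_mask.sum())

mature_mask = np.ones(10, dtype=bool)
occupancy = np.array([1.0] * 7 + [0.1] * 3)   # 30 % of the mature volume missing
print(maturity_level(occupancy, mature_mask))  # 0.7
```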
In Supplementary Note 2 and Supplementary Fig. 4, we further show the advantages of the LocSpiral and LocBFactor approaches for these highly heterogeneous datasets compared with the global sharpening approach.
SARS-CoV-2
We processed recent cryo-EM maps of the SARS-CoV-2 spike (S) glycoprotein27,28. These include cryo-EM reconstructions of the SARS-CoV-2 spike in the prefusion conformation with a single receptor-binding domain (RBD) up (EMD-21375), and the corresponding reconstruction obtained after imposing C3 symmetry during refinement to improve the visualisation of the symmetric S2 subunit (EMD-21374). We also processed additional cryo-EM reconstructions from the Veesler lab of the SARS-CoV-2 spike glycoprotein with three RBDs down (EMD-21452) and of the SARS-CoV-2 spike ectodomain with a single RBD up (EMD-21457). The reported global resolutions of these maps are 3.46, 3.17, 2.8 and 3.2 Å, respectively. Interestingly, the deposited atomic models (PDB 6vsb, 6vxx and 6vyb) only partially cover the reconstructed cryo-EM maps, revealing disordered or over-sharpened regions after B-factor correction that could not be modelled. Supplementary Fig. 5 displays the corresponding maps and fitted atomic models, showing a large amount of protein that is not currently modelled.
In Fig. 5A, we show the EMD-21375 map and the corresponding LocSpiral reconstruction. In this figure, we use a relatively low threshold to visualise the outer parts of the protein. The figure shows that our reconstruction presents fewer fragmented and broken densities and better map connectivity than the one deposited in the EMDB, suggesting that our approach improves the analysis and visualisation of the outer regions and could aid in the modelling of additional map motifs. In Supplementary Fig. 6A, we show similar results for the EMD-21374, EMD-21452 and EMD-21457 maps. Interestingly, the LocSpiral EMD-21374 map shows some additional fragmented densities at the top of the spike; however, we believe that these densities are artefacts resulting from artificially imposing C3 symmetry on asymmetric particles. In Fig. 5B, we show the local B-factor map to be used for sharpening (slope of the local Guinier plot multiplied by 4) obtained by LocBFactor for EMD-21375, and in Supplementary Fig. 6B, we compare the local B-factor maps obtained from the EMD-21375, EMD-21374, EMD-21452 and EMD-21457 maps using a similar colourmap. Supplementary Fig. 6B shows that EMD-21452 and EMD-21457 present lower B-factors than EMD-21375 and EMD-21374, and therefore better localisability of secondary structure elements and residues.
We then used the LocSpiral EMD-21375 reconstruction to improve the deposited atomic model (PDB 6vsb). As a result, we could model additional loops and motifs (K444.C-F490.C; E96.C-S98.C; NAG1322.C; P812.C-K814.C) and some additional amino acids that are now visible in the improved map (P621.C-G639.C; S673.C-V687.C; A829.B-A825.B). We were also able to visualise map densities corresponding to numerous additional N-linked glycans that could not be resolved in the original reconstruction. Examples of regions that could be further modelled are shown in Fig. 5C, D. In Fig. 5C, the LocSpiral map with the improved atomic model (green) is shown on the left, marked with an asterisk, and the deposited EMD map with PDB 6vsb (magenta) is rendered on the right. Figure 5D shows PDB 6vsb in white, with the traced parts of the glycans marked with purple spheres, and in red the additional parts that could be traced using the LocSpiral map. This figure also provides zoomed views of two glycans that could be further modelled with our improved map. The corresponding EMRINGER and MolProbity scores, calculated between the LocSpiral map and the improved atomic model and between EMD-21375 and the deposited model (PDB 6vsb), are shown in Supplementary Table S1. In both cases, the atomic structures were refined against the corresponding maps with Phenix real_space_refine25 using 5 refinement iterations.
Discussion
In this paper, we have introduced methods to improve the analysis and interpretability of cryo-EM maps. These include map enhancement approaches (LocSpiral and LocBSharpen) and approaches to calculate local B-factor (LocBFactor) and density occupancy maps (LocOccupancy). Our experiments show that LocSpiral improves map connectivity, yielding fewer fragmented and broken densities and better coverage of the atomic model. In fact, LocSpiral has already been used in several publications29–33, enabling molecular modelling of maps with flexibility and mild resolution anisotropy.
We envision that our methods to estimate local B-factor and occupancy maps could be used to improve de novo model building. First, these maps can be employed to guide manual tracing, as they can help estimate the range of structures compatible with a given electron microscopy density. Second, for very high-resolution cryo-EM maps, these values can be used as initial approximations of the atomic B-factors and occupancies, to be further refined during automatic model refinement by packages such as Phenix14 or Refmac15. B-factor maps provide information that is complementary to local resolution maps, although the two are usually correlated. Local resolution methods usually determine the resolution at a given point by comparing the map with noise or background amplitudes34, whereas B-factor maps describe the rate at which the signal amplitude falls off within a resolution range. Consequently, map regions with similar local resolution (map amplitude similar to the noise/background amplitude at that resolution and location) may have different B-factors, because the signal damping within the resolution range used can differ (steep or gradual slope).
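A toy calculation makes this distinction concrete. Under the simplifying assumption that the local log-amplitude follows a straight Guinier line, ln m = a + slope·ω², and that local resolution is where this line meets a fixed noise level (ln noise = 0 here), two voxels with different intercepts and slopes can reach the noise level at the same resolution while having very different B-factors. The function name and values are illustrative only.

```python
import math

def crossing_resolution(a, slope, log_noise=0.0):
    """Resolution (Å) where the fitted ln-amplitude a + slope * omega^2 meets the noise level."""
    omega_sq = (log_noise - a) / slope
    return 1.0 / math.sqrt(omega_sq)

for a, slope in [(2.0, -50.0), (4.0, -100.0)]:
    print(round(crossing_resolution(a, slope), 2), 4 * slope)
# 5.0 -200.0
# 5.0 -400.0
```

Both voxels would be assigned a local resolution of 5 Å, yet their sharpening B-factors differ by a factor of two.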
We have seen that care must be taken when processing maps affected by high flexibility and heterogeneity, or when analysing maps with moderate global resolution (close to 10–15 Å), as the obtained B-factors could be overestimated if the selected resolution range extends beyond the local resolution of these regions. B-factors obtained in such low-resolution regions mainly describe how the noise signal falls off inside the resolution range used (noise B-factors), and they should be filtered out of the B-factor map. However, these problematic cases can be easily detected, as the amplitude values in the corresponding Guinier plot lie below the noise level (obtained from the 90–95% quantile of the empirical noise/background distribution). These regions can therefore be filtered out automatically and excluded from the analysis. In our analysis of B-factors for low and high-resolution maps in the Supplementary Material, we show that existing methods to determine the global map B-factor, such as Relion postprocessing, do not filter out problematic low-resolution regions, so the estimated B-factor may be overestimated.
In principle, it might be possible to distinguish compositional heterogeneity from moderate conformational flexibility using the obtained occupancy maps, provided that the samples have been accurately 3D classified. In the former case, the occupancy map is expected to show values close to zero in missing regions, because the density values of these parts should be low and close to the noise level. Conversely, in the latter case, the occupancy is likely to show higher values. The density values of moving parts, although slightly blurred by the movement, should be similar to those in static regions of the macromolecule. However, such analyses require great care, because 3D classification approaches are not perfect. Macromolecules with different compositions could therefore yield 3D maps with significant density values in regions that should be empty. Additionally, samples undergoing large conformational changes could show low density values in moving regions compared with static parts, resulting in close-to-zero occupancy values.
The methods proposed here are semi-automated. As inputs, they essentially require only the unfiltered map to be enhanced or analysed, a resolution range and, in some cases, a binary solvent mask. They do not require additional information such as atomic models or local resolution maps. The common link between all these approaches is the spiral phase transform, which is used to factorise cryo-EM maps into amplitude and phase terms in real space at different resolutions. The spiral phase transform has been used extensively in optics for phase extraction in interferometry35–39 and in Shack–Hartmann sensors40,41. The transform is not new in cryo-EM: it has previously been proposed to facilitate particle screening42, CTF estimation43 and local and directional resolution determination34,44. In refs. 34,44, the authors obtained amplitude maps with the Riesz transform, which is similar to the spiral phase transform.
Cryo-EM reconstructions of different types of macromolecules have been used to test the performance of these algorithms. Specifically, we used the following reconstructions:
- a membrane protein (TRP channel);
- immature ribosomes affected by high compositional heterogeneity;
- the spliceosome, which shows high conformational heterogeneity;
- recent SARS-CoV-2 reconstructions exhibiting dynamic regions;
- high-resolution apoferritin reconstructions.

In all cases, our proposed approaches showed excellent results, improving the analysis and interpretability of the processed maps. The proposed methods are also computationally efficient. For example, processing EMD-21457 (map size 400³ px) with our local enhancement approach took only 12 min on a standard laptop using 4 cores.
Methods
The proposed methods are based on a 3D generalisation of the 2D spiral phase transform. In the following, we present the 3D spiral phase transform and its application to map enhancement, local B-factor determination, and estimation of local map occupancies.
3D spiral phase transform
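The spiral phase transform is a Fourier operator that can factorise a 3D map into its amplitude and phase terms in real space at different resolutions. Without loss of generality, we assume that a given 3D map can be modelled as a sum of 3D phase-modulated signals:

$$V(\mathbf{r}) = \sum_{\omega} V_{\omega}(\mathbf{r}) = \sum_{\omega} \left[ b_{\omega}(\mathbf{r}) + m_{\omega}(\mathbf{r}) \cos\varphi_{\omega}(\mathbf{r}) \right] \tag{3}$$

where $V(\mathbf{r})$ is the cryo-EM map, $V_{\omega}(\mathbf{r})$ is the map band-pass filtered at frequency $\omega$, $b_{\omega}(\mathbf{r})$ is the 3D background or DC term, $m_{\omega}(\mathbf{r})$ is the 3D amplitude map, $\varphi_{\omega}(\mathbf{r})$ is the 3D modulating phase and $\mathbf{r} = (x, y, z)$. We are interested in spatial frequencies higher than 1/50–1/30 Å⁻¹, and the background is usually a low-frequency signal. We can therefore approximate the map by a high-pass filtered map $V_{HP}$ for resolutions finer than 50–30 Å:

$$V_{HP}(\mathbf{r}) \approx \sum_{\omega} m_{\omega}(\mathbf{r}) \cos\varphi_{\omega}(\mathbf{r}) \tag{4}$$

From here on, $V_{\omega}(\mathbf{r}) = m_{\omega}(\mathbf{r}) \cos\varphi_{\omega}(\mathbf{r})$ denotes a band-passed component with its background removed. For convenience, Eq. (4) can be expanded into its corresponding analytic signal:

$$S(\mathbf{r}) = \sum_{\omega} S_{\omega}(\mathbf{r}) = \sum_{\omega} m_{\omega}(\mathbf{r}) \exp\left( j\varphi_{\omega}(\mathbf{r}) \right) \tag{5}$$

This analytic signal relates to the high-pass filtered map by

$$V_{HP}(\mathbf{r}) = \mathrm{Re}\{ S(\mathbf{r}) \} \tag{6}$$

where $\mathrm{Re}\{\cdot\}$ takes the real part and $j$ is the imaginary unit ($j^{2} = -1$). The analytic signal in Eq. (5) shows that $m_{\omega}$ and $\varphi_{\omega}$ clearly represent amplitude and phase terms. The quadrature transformation of Eq. (4) is

$$Q(\mathbf{r}) = \sum_{\omega} Q_{\omega}(\mathbf{r}) = \sum_{\omega} m_{\omega}(\mathbf{r}) \sin\varphi_{\omega}(\mathbf{r}) \tag{7}$$

so that Eq. (5) may be rewritten as

$$S(\mathbf{r}) = V_{HP}(\mathbf{r}) + jQ(\mathbf{r}) = \sum_{\omega} \left[ V_{\omega}(\mathbf{r}) + jQ_{\omega}(\mathbf{r}) \right] \tag{8}$$

Assuming that $m_{\omega}$ varies slowly compared with $\varphi_{\omega}$, the gradient of each band-passed component of $V_{HP}$ is approximately

$$\nabla V_{\omega}(\mathbf{r}) \approx -m_{\omega}(\mathbf{r}) \sin\varphi_{\omega}(\mathbf{r}) \, \nabla\varphi_{\omega}(\mathbf{r}) \tag{9}$$

Rearranging terms, we obtain

$$Q_{\omega}(\mathbf{r}) = m_{\omega}(\mathbf{r}) \sin\varphi_{\omega}(\mathbf{r}) = -\mathbf{n}_{\varphi}(\mathbf{r}) \cdot \frac{\nabla V_{\omega}(\mathbf{r})}{|\nabla\varphi_{\omega}(\mathbf{r})|}, \qquad \mathbf{n}_{\varphi} = \frac{\nabla\varphi_{\omega}}{|\nabla\varphi_{\omega}|} \tag{10}$$

Equation (10) shows that the quadrature term is composed of two terms. The first is an orientation map $\mathbf{n}_{\varphi}$. The second is a non-linear operator, $\nabla / |\nabla\varphi_{\omega}|$, that can be interpreted as a 3D generalisation of the 1D Hilbert transform and can be calculated efficiently using the Fourier transform. As shown in ref. 45, this operator corresponds to the 3D Hilbert transform $\mathbf{H}\{\cdot\}$ applied to our band-passed maps $V_{\omega}$:

$$\frac{\nabla V_{\omega}(\mathbf{r})}{|\nabla\varphi_{\omega}(\mathbf{r})|} \approx \mathbf{H}\{V_{\omega}\}(\mathbf{r}) = \mathcal{F}^{-1}\left\{ j \frac{\mathbf{k}}{|\mathbf{k}|} \mathcal{F}\{V_{\omega}\}(\mathbf{k}) \right\}(\mathbf{r}) \tag{11}$$

where $\mathcal{F}$ denotes the Fourier transform and $\mathbf{k}$ the frequency vector. Thus, Eq. (10) can be rewritten as

$$Q_{\omega}(\mathbf{r}) = -\mathbf{n}_{\varphi}(\mathbf{r}) \cdot \mathbf{H}\{V_{\omega}\}(\mathbf{r}) \tag{12}$$

Note that $\mathbf{n}_{H}(\mathbf{r}) = \mathbf{H}\{V_{\omega}\}(\mathbf{r}) / |\mathbf{H}\{V_{\omega}\}(\mathbf{r})|$ is a unit vector parallel to $\mathbf{n}_{\varphi}$, because $m_{\omega}$ varies slowly compared with $\varphi_{\omega}$. However, it may point in the opposite sense. This sign change comes from the term $\sin\varphi_{\omega}$, which appears when the cosine term in Eq. (4) is differentiated. We can therefore rewrite Eq. (12) as

$$Q_{\omega}(\mathbf{r}) = -s(\mathbf{r}) \, \mathbf{n}_{H}(\mathbf{r}) \cdot \mathbf{H}\{V_{\omega}\}(\mathbf{r}) = -s(\mathbf{r}) \, |\mathbf{H}\{V_{\omega}\}(\mathbf{r})| \tag{13}$$

where $s(\mathbf{r})$ takes only the values +1 or −1. This holds because $\mathbf{n}_{\varphi}$ and $\mathbf{n}_{H}$ can only be parallel or antiparallel, that is, $\mathbf{n}_{\varphi} = s(\mathbf{r}) \, \mathbf{n}_{H}$. From Eq. (13), we obtain an estimate of $Q_{\omega}$ that is determined only up to its sign:

$$Q_{\omega}(\mathbf{r}) = \pm |\mathbf{H}\{V_{\omega}\}(\mathbf{r})| \tag{14}$$

Nevertheless, Eq. (14) allows us to obtain the modulation and cosine terms of Eq. (4) separately and without sign ambiguity:

$$m_{\omega}(\mathbf{r}) = \sqrt{V_{\omega}(\mathbf{r})^{2} + |\mathbf{H}\{V_{\omega}\}(\mathbf{r})|^{2}}, \qquad \cos\varphi_{\omega}(\mathbf{r}) = \frac{V_{\omega}(\mathbf{r})}{m_{\omega}(\mathbf{r})} \tag{15}$$

Using these expressions, we obtain for each frequency $\omega$ the terms $m_{\omega}(\mathbf{r})$ and $\cos\varphi_{\omega}(\mathbf{r})$.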
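The factorisation in Eqs. (11) and (15) can be sketched in a few lines of NumPy. The sketch assumes that the input is already a zero-mean band-pass filtered map $V_{\omega}$ on a periodic grid, and it uses the `numpy.fft` sign conventions. The function name is hypothetical and is not taken from the authors' released code. Each Hilbert (Riesz) component is obtained with the Fourier multiplier $j k_i / |\mathbf{k}|$. The amplitude and cosine terms then follow from Eq. (15).

```python
import numpy as np

def spiral_amplitude_phase(v_band):
    """Factorise a band-passed 3D map into amplitude m and cos(phi).

    Hypothetical helper (not the authors' code): v_band is assumed zero-mean
    and periodic, as for a band-pass filtered cryo-EM map.
    """
    v_band = np.asarray(v_band, dtype=float)
    # Frequency coordinates (cycles/voxel) for each axis, matching fftn's layout.
    grids = np.meshgrid(*(np.fft.fftfreq(n) for n in v_band.shape), indexing="ij")
    k_norm = np.sqrt(sum(g**2 for g in grids))
    dc = (0,) * v_band.ndim
    k_norm[dc] = 1.0          # avoid 0/0 at the DC term
    spectrum = np.fft.fftn(v_band)
    spectrum[dc] = 0.0        # drop any residual background (DC) term

    # |H{V}|^2 as the sum of squared components F^-1{ j k_i/|k| F{V} }.
    riesz_sq = np.zeros_like(v_band)
    for k_i in grids:
        component = np.fft.ifftn(1j * (k_i / k_norm) * spectrum).real
        riesz_sq += component**2

    amplitude = np.sqrt(v_band**2 + riesz_sq)             # Eq. (15): m_omega
    cos_phase = np.divide(v_band, amplitude,              # Eq. (15): cos(phi_omega)
                          out=np.zeros_like(v_band), where=amplitude > 0)
    return amplitude, cos_phase

# Usage: a single oblique plane wave on a 32^3 grid.
n = 32
x, y, z = np.meshgrid(np.arange(n), np.arange(n), np.arange(n), indexing="ij")
wave = np.cos(2 * np.pi * (3 * x + 2 * y + z) / n)
m, c = spiral_amplitude_phase(wave)
print(np.allclose(m, 1.0), np.allclose(c, wave))
```

For a single plane wave, the recovered amplitude equals 1 everywhere and the cosine term reproduces the input. This illustrates that the magnitude of the Hilbert vector supplies the missing quadrature $|m_{\omega} \sin\varphi_{\omega}|$ without needing its sign.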
Local enhanced map (LocSpiral)
We propose here a robust local map enhancement method that requires as input only a binary mask of the macromolecule and a resolution range. The approach works for both high- and moderate-resolution maps. The details of the method are given below.
As explained above, each band-pass filtered map can be factorised into amplitude and phase terms by the spiral phase transform. Given a user-defined solvent mask, the method then obtains the empirical probability distribution of noise amplitudes at frequency ω. It does so from the amplitude values $m_{\omega}(\mathbf{r})$ of voxels outside the solvent mask. From this distribution, the approach determines the noise amplitude $m^{N}_{\omega}$ corresponding to the 90–95% quantile. This value is used to normalise map amplitudes locally in real space across frequencies. It is also used to remove local signals below this amplitude threshold, because they are likely to be noise at that frequency and position. After this non-linear amplitude transformation, the enhanced map at a given frequency ω is given by

$$V^{E}_{\omega}(\mathbf{r}) = \Theta\left( m_{\omega}(\mathbf{r}) - m^{N}_{\omega} \right) \cos\varphi_{\omega}(\mathbf{r}) \tag{16}$$

where $\Theta$ is the Heaviside step function (1 for a positive argument and 0 otherwise). The final enhanced map is obtained by summing over all frequencies in the selected resolution range:

$$V^{E}(\mathbf{r}) = \sum_{\omega} V^{E}_{\omega}(\mathbf{r}) \tag{17}$$
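Optionally, the method can apply a signal-to-noise ratio (SNR) weighting parameter $W_{\omega}(\mathbf{r})$ to weight the contribution of the different amplitudes to the final map. In this case, Eq. (17) becomes

$$V^{E}(\mathbf{r}) = \sum_{\omega} W_{\omega}(\mathbf{r}) \, V^{E}_{\omega}(\mathbf{r}) \tag{18}$$

where $V^{E}_{\omega}$ is the enhanced band of Eq. (16),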
with the SNR weighting parameter given by
19 |
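The thresholding and summation described above can be sketched as follows. The sketch assumes that each enhanced band keeps only $\cos\varphi_{\omega}$ where $m_{\omega}$ exceeds the noise threshold $m^{N}_{\omega}$, so that amplitudes are equalised to unity. It omits the optional SNR weighting. The function name and argument layout are hypothetical. Band-pass filtering and the spiral phase factorisation are assumed to have been done beforehand, one array pair per frequency band.

```python
import numpy as np

def locspiral_combine(amplitudes, cos_phases, solvent_mask, quantile=0.95):
    """Sum thresholded, amplitude-equalised bands (a sketch of Eqs. 16-17).

    Hypothetical helper. amplitudes / cos_phases: one array per band holding
    m_omega(r) and cos(phi_omega(r)). solvent_mask: boolean array, True
    inside the macromolecule. Optional SNR weighting is not implemented.
    """
    mask = np.asarray(solvent_mask, dtype=bool)
    enhanced = np.zeros(mask.shape)
    for m, cos_phi in zip(amplitudes, cos_phases):
        m = np.asarray(m, dtype=float)
        # Noise threshold m^N_omega: quantile of the amplitudes outside the mask.
        noise_threshold = np.quantile(m[~mask], quantile)
        # Keep the unit-amplitude phase term only where signal exceeds noise.
        enhanced += (m > noise_threshold) * np.asarray(cos_phi, dtype=float)
    return enhanced

# Usage with two bands on a toy 4-voxel "map" (first two voxels inside the mask).
mask = np.array([True, True, False, False])
amps = [np.array([5.0, 5.0, 0.1, 0.2]), np.array([3.0, 0.05, 0.1, 0.1])]
cosines = [np.array([1.0, -0.5, 0.3, 0.9]), np.full(4, 0.5)]
print(locspiral_combine(amps, cosines, mask))
```

In the toy example, the second band contributes only at the first voxel, because elsewhere its amplitude does not exceed the noise threshold. A 95% quantile also lets the strongest solvent voxel of the first band through, which is the expected false-positive rate of such a threshold.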
Local B-factor determination (LocBFactor)
Factorising a 3D map into amplitude and phase terms in real space at different frequencies allows local B-factor maps to be determined efficiently. To this end, the LocBFactor method first obtains the local map amplitudes for resolutions from 15–10 Å up to the estimated global map resolution. These amplitude maps are then used to obtain local SNR-weighted log-amplitudes of the structure factors as
20 |
with the SNR weighting parameter defined in Eq. (19). These log-amplitudes can be fitted linearly against ω² within the resolution range from 15–10 Å up to the estimated global map resolution, which finally gives
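$$\ln \tilde{m}_{\omega}(\mathbf{r}) \approx A(\mathbf{r}) - \frac{B(\mathbf{r})}{4} \left( \omega^{2} - \omega_{0}^{2} \right) \tag{21}$$

Here, $\ln \tilde{m}_{\omega}(\mathbf{r})$ denotes the SNR-weighted log-amplitude of Eq. (20), B(r) is the local B-factor map (B map) and A(r) is the log-amplitude map at ω0 (A map). In the linear fit of Eq. (21), the approach typically discards amplitude values below the noise level $m^{N}_{\omega}$, defined as the 90–95% quantile of the noise amplitude distribution. Additionally, voxels whose local Guinier plots have fewer than two points above the noise level are filtered out of the B map. Note that ω0 corresponds to the lowest frequency within the resolution range used (typically 1/15–1/10 Å⁻¹).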
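The per-voxel Guinier fit can be sketched as follows. The sketch assumes that amplitudes decay as exp(A − B(ω² − ω0²)/4), the usual crystallographic convention for structure-factor amplitudes. It uses plain, unweighted least squares on ln m_ω, so the SNR weighting of Eq. (20) is not reproduced. The function name is hypothetical. Bands at or below the noise level are discarded, and voxels with fewer than two remaining points return `None`, mirroring the filtering described above.

```python
import math

def fit_local_bfactor(amplitudes, freqs, noise_levels):
    """Least-squares Guinier fit for a single voxel (hypothetical helper).

    amplitudes: m_omega at this voxel, one per band.
    freqs: omega in 1/Å, distinct and ascending, so freqs[0] is omega_0.
    noise_levels: noise threshold m^N_omega for each band.
    Returns (B in Å^2, A = fitted log-amplitude at omega_0) or None.
    """
    w0_sq = freqs[0] ** 2
    # Keep only bands whose amplitude is above the noise level.
    points = [(w * w - w0_sq, math.log(m))
              for m, w, noise in zip(amplitudes, freqs, noise_levels) if m > noise]
    if len(points) < 2:
        return None  # too few points: voxel is filtered out of the B map
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx                    # equals -B/4 under the assumed model
    intercept = mean_y - slope * mean_x  # value of the fit at omega = omega_0
    return -4.0 * slope, intercept

# Usage: synthetic amplitudes generated with B = 100 Å^2 and A = 2.
freqs = [1 / 10, 1 / 8, 1 / 6, 1 / 5, 1 / 4]
amps = [math.exp(2.0 - 100.0 * (w * w - freqs[0] ** 2) / 4.0) for w in freqs]
print(fit_local_bfactor(amps, freqs, [1e-3] * len(freqs)))
```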
Local B-factor sharpened map (LocBSharpen)
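The spiral phase transform can also be used to obtain local B-factor sharpened maps. For frequencies higher than ω0, Expression (4) can be modified as

$$V^{S}(\mathbf{r}) = \sum_{\omega \geq \omega_{0}} \exp\left( A(\mathbf{r}) \right) \cos\varphi_{\omega}(\mathbf{r}) \tag{22}$$

where A(r) is the log-amplitude map at ω0 (A map).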
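One reading of LocBSharpen, stated here as an assumption, follows from the Guinier model fitted by LocBFactor. If the local amplitudes follow m_ω(r) ≈ exp(A(r) − B(r)(ω² − ω0²)/4), then correcting each band by exp(B(r)(ω² − ω0²)/4) flattens every amplitude to exp(A(r)). Each band then contributes exp(A(r)) cos φ_ω(r). The hypothetical helpers below check this for a single voxel.

```python
import math

def bfactor_corrected_amplitudes(amplitudes, freqs, b_factor):
    """Hypothetical helper: multiply each m_omega by exp(B (omega^2 - omega_0^2) / 4).

    freqs are in 1/Å with freqs[0] = omega_0; b_factor is the local B in Å^2.
    """
    w0_sq = freqs[0] ** 2
    return [m * math.exp(b_factor * (w * w - w0_sq) / 4.0)
            for m, w in zip(amplitudes, freqs)]

def sharpened_voxel(cos_phases, log_amp_a):
    """Hypothetical helper: sum over bands of exp(A) * cos(phi_omega) at one voxel."""
    return math.exp(log_amp_a) * sum(cos_phases)

# Usage: amplitudes that follow the assumed Guinier model with A = 1.5, B = 80 Å^2.
freqs = [1 / 10, 1 / 8, 1 / 6, 1 / 4]
a, b = 1.5, 80.0
amps = [math.exp(a - b * (w * w - freqs[0] ** 2) / 4.0) for w in freqs]
print(bfactor_corrected_amplitudes(amps, freqs, b))  # every entry close to exp(1.5)
print(sharpened_voxel([1.0, 0.5, -0.5], a))
```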
Local occupancy map (LocOccupancy)
Low-occupancy map regions correspond to parts of the macromolecule where the map amplitudes are significantly smaller than in other regions of the macromolecule. With this in mind, we define the occupancy map as
23 |
where the two amplitude thresholds are obtained from the empirical probability distribution of macromolecule amplitudes at frequency ω. This distribution is calculated from the amplitude values of voxels included in the solvent mask. From it, the approach determines the macromolecule amplitude values corresponding to the 25% and 0% quantiles, which are used as thresholds. To calculate local occupancy maps, a resolution range from 30 Å to 10–8 Å is typically used to obtain the density occupancies of complete secondary-structure motifs. For high-resolution cryo-EM maps, ranges from 5 Å to 3–1.5 Å are used to obtain residue occupancies.
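The quantile thresholds described above can be sketched in plain Python. The exact form of Eq. (23) is not reproduced in this sketch. As a hypothetical stand-in, each band's amplitudes are mapped linearly from 0 at the lower (0%) threshold to 1 at the upper (25%) threshold and clipped to [0, 1]. The per-band values are then averaged. All function names are hypothetical, and maps are flattened to lists.

```python
import math

def quantile(values, q):
    """Linear-interpolation quantile for 0 <= q <= 1 (same rule as numpy's default)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def band_occupancy(amplitudes, inside_mask, q_low=0.0, q_high=0.25):
    """Hypothetical per-band occupancy from macromolecule amplitude quantiles."""
    inside = [m for m, keep in zip(amplitudes, inside_mask) if keep]
    low, high = quantile(inside, q_low), quantile(inside, q_high)
    span = high - low if high > low else 1.0  # guard against a flat distribution
    # Assumed normalisation: 0 at the low threshold, 1 at or above the high one.
    return [min(1.0, max(0.0, (m - low) / span)) for m in amplitudes]

def local_occupancy(bands, inside_mask):
    """Average the hypothetical per-band occupancies over all frequency bands."""
    per_band = [band_occupancy(b, inside_mask) for b in bands]
    return [sum(vals) / len(vals) for vals in zip(*per_band)]

# Usage: five voxels inside the mask and two outside it, for a single band.
mask = [True, True, True, True, True, False, False]
band = [1.0, 2.0, 3.0, 4.0, 5.0, 1.5, 0.2]
print(local_occupancy([band], mask))
```

Because the upper threshold is a low (25%) quantile, most macromolecule voxels saturate at full occupancy under this assumed mapping. Only regions with markedly weaker amplitudes fall below 1.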
Maturity level index
In the analysis of the immature 50S ribosomes, we proposed a maturity level index. This index can be extended to any maturing macromolecule and is useful for placing immature macromolecules along a maturation timeline. Calculating the index requires reconstructions of both the immature and the mature macromolecule. The mature reconstruction is used to obtain a binary solvent mask, while the immature reconstructions are used to calculate occupancy maps. These occupancy maps are used to identify highly occupied regions (occupancy > 0.75) and to build occupancy masks. The index is then obtained by comparing the number of voxels set in the solvent mask of the mature reconstruction with the number set in each occupancy mask. As shown in Fig. 3, the larger the unfolded regions in an immature map, the lower its maturity level.
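The text specifies the inputs to the index, namely a mature solvent mask and occupancy masks thresholded at 0.75, but not the exact comparison. The sketch below assumes that the index is the fraction of mature-mask voxels that also belong to the occupancy mask, so a fully folded intermediate scores 1. Masks are flattened to plain Python lists, and the function name is hypothetical.

```python
def maturity_index(mature_mask, occupancy, threshold=0.75):
    """Hypothetical maturity level: fraction of mature-mask voxels that are highly occupied.

    mature_mask: booleans from the mature reconstruction's solvent mask.
    occupancy: occupancy values of the immature map at the same voxels.
    """
    mature_voxels = sum(1 for inside in mature_mask if inside)
    if mature_voxels == 0:
        raise ValueError("mature solvent mask is empty")
    # Voxels in both the mature mask and the occupancy mask (occupancy > threshold).
    occupied = sum(1 for inside, occ in zip(mature_mask, occupancy)
                   if inside and occ > threshold)
    return occupied / mature_voxels

# Usage: one of the four mature-mask voxels is poorly occupied in the immature map.
print(maturity_index([True, True, True, True, False], [0.9, 0.8, 0.2, 0.76, 0.99]))
```

Voxels outside the mature mask are ignored, so extra density in the immature map does not raise the index under this assumed definition.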
Cryo-EM image processing of the spliceosome data
The dataset is composed of 327,490 particle images of a spliceosomal B-complex from yeast (EMPIAR-10180)4. The particles were polished with Relion, downsampled to 1.699 Å/px and windowed to 320 × 320 pixels. A set of 30 initial volumes was obtained with RANSAC (15 maps) and Eman2 (15 maps) and processed with the volume selector approach22, producing two different initial volumes. Relion 3D classification was then used to compute two classes, with both volumes provided as initial reference maps. Class 1 and class 2 were composed of 201,407 and 126,083 particles, respectively. The resulting classes were refined with Relion autorefine using the maps obtained in the previous 3D classification. Relion postprocessing then provided maps at 4.28 and 4.58 Å for class 1 and class 2, respectively. Finally, local resolution was calculated with Relion for both classes.
Acknowledgements
This work was supported by an NSERC Discovery Grant (RGPIN-2018-04813) and by the Spanish Ministry of Science and Innovation through the call 2019 Proyectos de I + D + i - RTI Tipo A (PID2019-108850RA-I00). J.V. acknowledges financial support from the Ramón y Cajal 2018 programme (RYC2018-024087-I). We thank Jose Jesus Fernandez for helpful discussions.
Author contributions
J.V. conceived the idea. J.V., S.K. and J.G.-B. devised the theory, developed and implemented the algorithm, performed the experiments and wrote the manuscript. R.S.-G. and S.A. helped to analyse and interpret data. A.A.Z.K., D.W., J.S.M. and K.H.B. analysed data, wrote part of the manuscript and provided comments and feedback. All authors reviewed the manuscript, supervised the experiments and discussed the results.
Data availability
Previously published datasets used for testing are available from the Electron Microscopy Data Bank (https://www.ebi.ac.uk/pdbe/emdb/) under accession codes EMD-10418, EMD-8440, EMD-8441, EMD-8445, EMD-8450, EMD-8434, EMD-21375, EMD-21374, EMD-21452 and EMD-21457. Data that support the findings of this study have been deposited in http://t.ly/XKQa.
Code availability
The source code for the presented methods is freely available under the terms of an open-source software license and can be downloaded from https://github.com/1aviervargas/LocSpiral-LocBSharpen-LocBFactor-LocOccupancy46.
Competing interests
The authors declare no competing interests.
Footnotes
Peer review information Nature Communications thanks Michael Cianfrocco and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Satinder Kaur, Josue Gomez-Blanco.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-021-21509-5.
References
1. Wandzik JM, et al. A structure-based model for the complete transcription cycle of influenza polymerase. Cell. 2020. doi: 10.1016/j.cell.2020.03.061.
2. Ge P, et al. Action of a minimal contractile bactericidal nanomachine. Nature. 2020. doi: 10.1038/s41586-020-2186-z.
3. Davis JH, et al. Modular assembly of the bacterial large ribosomal subunit. Cell. 2016;167:1610–1622.e15. doi: 10.1016/j.cell.2016.11.020.
4. Plaschka C, Lin PC, Nagai K. Structure of a pre-catalytic spliceosome. Nature. 2017;546:617–621. doi: 10.1038/nature22799.
5. Razi A, et al. Role of Era in assembly and homeostasis of the ribosomal small subunit. Nucleic Acids Res. 2019. doi: 10.1093/nar/gkz571.
6. Rosenthal PB, Henderson R. Optimal determination of particle orientation, absolute hand, and contrast loss in single-particle electron cryomicroscopy. J. Mol. Biol. 2003;333:721–745. doi: 10.1016/j.jmb.2003.07.013.
7. Fernandez JJ, Luque D, Caston JR, Carrascosa JL. Sharpening high resolution information in single particle electron cryomicroscopy. J. Struct. Biol. 2008;164:170–175. doi: 10.1016/j.jsb.2008.05.010.
8. Scheres SH. Semi-automated selection of cryo-EM particles in RELION-1.3. J. Struct. Biol. 2015;189:114–122. doi: 10.1016/j.jsb.2014.11.010.
9. Terwilliger TC, Sobolev OV, Afonine PV, Adams PD. Automated map sharpening by maximization of detail and connectivity. Acta Crystallogr. D Struct. Biol. 2018;74:545–559. doi: 10.1107/S2059798318004655.
10. Murshudov GN. Refinement of atomic structures against cryo-EM maps. Methods Enzymol. 2016;579:277–305. doi: 10.1016/bs.mie.2016.05.033.
11. Jakobi AJ, Wilmanns M, Sachse C. Model-based local density sharpening of cryo-EM maps. eLife. 2017;6. doi: 10.7554/eLife.27131.
12. Ramirez-Aportela E, et al. Automatic local resolution-based sharpening of cryo-EM maps. Bioinformatics. 2020;36:765–772.
13. Sherwood D, Cooper J. Crystals, X-rays, and Proteins: Comprehensive Protein Crystallography. 2011.
14. Liebschner D, et al. Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Crystallogr. D Struct. Biol. 2019;75:861–877. doi: 10.1107/S2059798319011471.
15. Winn MD, Murshudov GN, Papiz MZ. Macromolecular TLS refinement in REFMAC at moderate resolutions. Methods Enzymol. 2003;374:300–321. doi: 10.1016/S0076-6879(03)74014-2.
16. Penczek PA. Image restoration in cryo-electron microscopy. Methods Enzymol. 2010;482:35–72. doi: 10.1016/S0076-6879(10)82002-6.
17. Liao HY, Frank J. Definition and estimation of resolution in single-particle reconstructions. Structure. 2010;18:768–775. doi: 10.1016/j.str.2010.05.008.
18. Afonine PV, et al. New tools for the analysis and validation of cryo-EM maps and atomic models. Acta Crystallogr. D Struct. Biol. 2018;74:814–840. doi: 10.1107/S2059798318009324.
19. Kimanius D, Forsberg BO, Scheres SH, Lindahl E. Accelerated cryo-EM structure determination with parallelisation using GPUs in RELION-2. eLife. 2016;5. doi: 10.7554/eLife.18722.
20. Wang Q, et al. Lipid interactions of a ciliary membrane TRP channel: simulation and structural studies of polycystin-2. Structure. 2020;28:169–184.e5. doi: 10.1016/j.str.2019.11.005.
21. Iudin A, Korir PK, Salavert-Torres J, Kleywegt GJ, Patwardhan A. EMPIAR: a public archive for raw electron microscopy image data. Nat. Methods. 2016;13:387–388. doi: 10.1038/nmeth.3806.
22. Gomez-Blanco J, Kaur S, Ortega J, Vargas J. A robust approach to ab initio cryo-electron microscopy initial volume determination. J. Struct. Biol. 2019;208:107397. doi: 10.1016/j.jsb.2019.09.014.
23. Kucukelbir A, Sigworth FJ, Tagare HD. Quantifying the local resolution of cryo-EM density maps. Nat. Methods. 2014;11:63–65. doi: 10.1038/nmeth.2727.
24. Barad BA, et al. EMRinger: side chain-directed model and map validation for 3D cryo-electron microscopy. Nat. Methods. 2015;12:943–946. doi: 10.1038/nmeth.3541.
25. Afonine PV, et al. Real-space refinement in PHENIX for cryo-EM and crystallography. Acta Crystallogr. D Struct. Biol. 2018;74:531–544. doi: 10.1107/S2059798318006551.
26. Lawson CL, et al. EMDataBank.org: unified data resource for CryoEM. Nucleic Acids Res. 2011;39:D456–D464. doi: 10.1093/nar/gkq880.
27. Wrapp D, et al. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science. 2020;367:1260–1263. doi: 10.1126/science.abb2507.
28. Walls AC, et al. Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell. 2020. doi: 10.1016/j.cell.2020.02.058.
29. Khalifa AAZ, et al. The inner junction complex of the cilia is an interaction hub that involves tubulin post-translational modifications. eLife. 2020;9. doi: 10.7554/eLife.52760.
30. Ichikawa M, et al. Tubulin lattice in cilia is in a stressed form regulated by microtubule inner proteins. Proc. Natl Acad. Sci. USA. 2019. doi: 10.1073/pnas.1911119116.
31. Yang M, et al. Cryo-electron microscopy structures of ArnA, a key enzyme for polymyxin resistance, revealed unexpected oligomerizations and domain movements. J. Struct. Biol. 2019;208:43–50. doi: 10.1016/j.jsb.2019.07.009.
32. Gutmann T, et al. Cryo-EM structure of the complete and ligand-saturated insulin receptor ectodomain. J. Cell Biol. 2020;219. doi: 10.1083/jcb.201907210.
33. Jahagirdar D, et al. Alternative conformations and motions adopted by 30S ribosomal subunits visualized by cryo-electron microscopy. RNA. 2020;26:2017–2030.
34. Vilas JL, et al. MonoRes: automatic and accurate estimation of local resolution for electron microscopy maps. Structure. 2018;26:337–344.e4. doi: 10.1016/j.str.2017.12.018.
35. Vargas J, Restrepo R, Quiroga JA, Belenguer T. High dynamic range imaging method for interferometry. Opt. Commun. 2011;284:4141–4145. doi: 10.1016/j.optcom.2011.04.059.
36. Larkin KG, Bone DJ, Oldfield MA. Natural demodulation of two-dimensional fringe patterns. I. General background of the spiral phase quadrature transform. J. Opt. Soc. Am. A. 2001;18:1862–1870. doi: 10.1364/JOSAA.18.001862.
37. Quiroga JA, Servin M. Isotropic n-dimensional fringe pattern normalization. Opt. Commun. 2003;224:221–227. doi: 10.1016/j.optcom.2003.07.014.
38. Vargas J, Quiroga JA, Sorzano CO, Estrada JC, Carazo JM. Two-step interferometry by a regularized optical flow algorithm. Opt. Lett. 2011;36:3485–3487. doi: 10.1364/OL.36.003485.
39. Vargas J, Quiroga JA, Sorzano CO, Estrada JC, Servin M. Multiplicative phase-shifting interferometry using optical flow. Appl. Opt. 2012;51:5903–5908. doi: 10.1364/AO.51.005903.
40. Vargas J, González-Fernandez L, Quiroga JA, Belenguer T. Shack–Hartmann centroid detection method based on high dynamic range imaging and normalization techniques. Appl. Opt. 2010;49:2409–2416. doi: 10.1364/AO.49.002409.
41. Vargas J, et al. Shack–Hartmann centroid detection using the spiral phase transform. Appl. Opt. 2012;51:7362–7367. doi: 10.1364/AO.51.007362.
42. Vargas J, et al. Particle quality assessment and sorting for automatic and semiautomatic particle-picking techniques. J. Struct. Biol. 2013;183:342–353. doi: 10.1016/j.jsb.2013.07.015.
43. Vargas J, et al. FASTDEF: fast defocus and astigmatism estimation for high-throughput transmission electron microscopy. J. Struct. Biol. 2013;181:136–148. doi: 10.1016/j.jsb.2012.12.006.
44. Vilas JL, Tagare HD, Vargas J, Carazo JM, Sorzano COS. Measuring local-directional resolution and local anisotropy in cryo-EM maps. Nat. Commun. 2020;11:55. doi: 10.1038/s41467-019-13742-w.
45. Servin M, Quiroga JA, Marroquin JL. General n-dimensional quadrature transform and its application to interferogram demodulation. J. Opt. Soc. Am. A. 2003;20:925–934. doi: 10.1364/JOSAA.20.000925.
46. Kaur S, et al. Local computational methods to improve the interpretability and analysis of cryo-EM maps. Zenodo. 2021. doi: 10.5281/zenodo.4452060.