A new program, Servalcat, to facilitate atomic model refinement in cryo-EM single-particle analysis is presented. It implements a refinement pipeline using REFMAC5 and Fo − Fc map calculation.
Keywords: cryo-EM, structure refinement, REFMAC5, Servalcat
Abstract
In 2020, cryo-EM single-particle analysis achieved true atomic resolution thanks to technological developments in hardware and software. The number of high-resolution reconstructions continues to grow, increasing the importance of the accurate determination of atomic coordinates. Here, a new Python package and program called Servalcat is presented that is designed to facilitate atomic model refinement. Servalcat implements a refinement pipeline using the program REFMAC5 from the CCP4 package. After the refinement, Servalcat calculates a weighted Fo − Fc difference map, which is derived from Bayesian statistics. This map helps manual and automatic model building in real space, as is common practice in crystallography. The Fo − Fc map helps in the visualization of weak features including hydrogen densities. Although hydrogen densities are weak, they are stronger than in the electron-density maps produced by X-ray crystallography, and some H atoms are even visible at ∼1.8 Å resolution. Servalcat also facilitates atomic model refinement under symmetry constraints. If point-group symmetry has been applied to the map during reconstruction, the asymmetric unit model is refined with the appropriate symmetry constraints.
1. Notation
FT: Fourier transform of unknown true map (complex values).
Fn: Fourier transform of noise in the observed map (complex values).
Fo1, Fo2: Fourier transforms of the two unweighted and unsharpened half maps from independent reconstructions (complex values).
Fo: Fourier transform of the observed full map, (Fo1 + Fo2)/2.
Fc: Fourier transform of calculated map from atomic coordinates (complex values).
E: structure factors normalized in resolution bins, F/(〈|F|²〉)^(1/2).
k: resolution-dependent scale factor between Fo and FT.
D: resolution-dependent scale factor between Fo and Fc.
σT²: variance of signal, var(FT).
σn²: variance of noise, var(Fn).
σU²: variance of unexplained signal, var(DFc − kFT).
f: atomic scattering factor.
s: column vector of position in reciprocal space.
sT: row vector of position in reciprocal space.
x: column vector of position in real space.
(R, t): rotation matrix and translation vector that could be an element of a point group.
B: displacement parameter of an atom, or blurring parameter for a local or global region of a map. A real value (isotropic case) or a 3 × 3 symmetric matrix (anisotropic case). Usually B is isotropic and atomic unless otherwise stated. Also called an atomic displacement parameter (ADP) if associated with an atom.
Unless otherwise stated, all quantities in Fourier space are dependent on s.
2. Introduction
Atomic model refinement is the optimization of the model’s parameters against the observed data. Atomic parameters typically include coordinates, atomic displacement parameters (ADPs) and occupancies. In crystallography, refinement is crucial because of the phase problem: the accuracy of density maps relies on the accuracy of the phases of the structure factors. Accurate phases are not observed and must be calculated from the model (Tronrud, 2004 ▸). More accurate maps may be obtained as the model becomes more accurate through the refinement. In single-particle analysis (SPA) there is no phase problem, although the Fourier coefficients can be noisy, especially at high resolution.
Accurate atomic model determination is becoming more and more important due to the ‘resolution revolution’ in cryo-EM SPA following the introduction of direct electron detectors and new data-processing methods (Bai et al., 2015 ▸). As of April 2021, more than 2500 SPA entries with resolutions better than 3.5 Å have been deposited in the Electron Microscopy Data Bank (EMDB; Tagari et al., 2002 ▸). This improvement in resolution has accelerated the development of methods for model building, refinement and validation. Automatic model-building programs that were originally developed for crystallography are now being adapted for cryo-EM SPA maps (Terwilliger, Adams et al., 2018 ▸; Hoh et al., 2020 ▸; Chojnowski et al., 2021 ▸). Density modification and local map sharpening can help to interpret the map (Jakobi et al., 2017 ▸; Terwilliger, Sobolev et al., 2018 ▸; Ramírez-Aportela et al., 2019 ▸; Ramlaul et al., 2019 ▸; Terwilliger et al., 2020 ▸). In general, care must be exercised when using any techniques based on prior knowledge; bias towards incorrect assumptions might lead to misinterpretation of the maps. Full-atom refinement can be performed either in real space (Afonine et al., 2018 ▸) or in reciprocal space (Murshudov, 2016 ▸).
After refinement, the model should be validated; the model should have a reasonable geometry and should describe the map well. Due to the low data-to-parameter ratio, all models will exhibit a degree of overfitting; however, the model should not deviate substantially from cross-validation data (Brown et al., 2015). MolProbity is the most widely used geometry validation tool, and includes analyses of clashes, rotamers and the Ramachandran plot (Chen et al., 2010). Map–model quality is assessed using real-space local correlations (Cragnolini et al., 2021), which have commonly been used in crystallography (Tickle, 2012). In reciprocal-space refinement, the R factor can be calculated as in crystallography, but the map–model Fourier shell correlation (FSC) is preferred as it does not depend on resolution-dependent scaling and takes phases into account explicitly. An Fo − Fc map, which highlights unmodelled features and errors in the current model, is almost always used in crystallography, and some similar tools already exist for SPA (Joseph et al., 2020). The σA-weighted (m|Fo| − D|Fc|)exp(iφc) map as used in crystallography is not directly applicable to SPA, because phases are available for both Fo and Fc and we should model the error of Fo in the complex plane, rather than simply using the estimated phase error as in crystallography (see below).
In 2020, cryo-EM SPA achieved atomic resolution, according to Sheldrick’s criterion (Wlodawer & Dauter, 2017), in structural analyses of apoferritin, which were reported by two groups (Nakane et al., 2020; Yip et al., 2020). Nakane et al. (2020) observed H-atom densities at 1.2 and 1.7 Å resolutions using Fo − Fc maps calculated by REFMAC5. There is a higher chance of observing hydrogen density in electron microscopy than in X-ray crystallography because of the increased contrast for the lighter elements (Clabbers & Abrahams, 2018). Nevertheless, hydrogen density is relatively weak and there is always a much higher peak from the parent atom nearby, so the Fo − Fc difference map is essential to see it. In addition, there is complexity in the interpretation of hydrogen peaks in EM. An electron in an H atom is usually shifted towards the parent atom from the nucleus position. In EM, both the electrons and the nucleus contribute to scattering, and this offset results in a shift of hydrogen density peaks beyond the position of the hydrogen nucleus (Nakane et al., 2020).
SPA structures often have point-group symmetries (rather than space-group symmetry as in crystallography). Approximately half of the SPA entries in the EMDB have non-C1 point-group symmetry according to their associated metadata. Such symmetry is advantageous and helps to reach higher resolution because it increases the effective number of particles. If the map is symmetrized, downstream analyses should be aware of it and the structural model must follow the symmetry. As in crystallography, it is natural to work in a single asymmetric unit. The MTRIX records in the PDB format or _struct_ncs_oper in the mmCIF format can be used to encode the symmetry information.¹ Currently, for structures from SPA there are only a few depositions of such asymmetric unit models in the PDB (excepting viruses). We recommend refining and depositing an asymmetric unit model, which makes sure the symmetry copies are truly identical. It should be noted that validation tools must be aware of any applied symmetry operators, but results should be reported for the asymmetric unit only. These considerations are only valid if the map is symmetrized, and we suggest that the point-group information should be required by the deposition system.
Here, we present Servalcat, a Python package and standalone program for the refinement and map calculation of cryo-EM SPA structures. Servalcat takes unsharpened and unweighted half maps of the independent reconstructions as inputs and implements a refinement pipeline using REFMAC5, which uses a dedicated likelihood function for SPA (Murshudov, 2016). After the refinement, Servalcat calculates a sharpened and weighted Fo − Fc map derived from Bayesian statistics as described below. If the map has point-group symmetry, the user can give an asymmetric unit model and a point-group symbol, and the program will output a refined asymmetric unit model with symmetry annotation as well as a symmetry-expanded model. The noncrystallographic symmetry (NCS) constraint function in REFMAC5 has been updated to consider symmetry-related nonbonded interactions and ADP similarity restraints (to ensure the similarity of ADPs of atoms brought into close proximity via symmetry operations).
Servalcat is freely available as a standalone package and also as part of CCP-EM (Burnley et al., 2017 ▸), where the REFMAC5 interface has been updated to use Servalcat.
3. Map calculation and sharpening using signal variance
Let us assume that Fo is the result of a position-independent blurring k of the true Fourier coefficients FT with an independent zero-mean Gaussian noise Fn with variance σn². That is,
$$F_o = kF_T + F_n.$$
Note that in this work we treat k as a function of resolution |s|. Multiplication by k in Fourier space is equivalent to isotropic blurring by a convolution in real space. In general, k could take on a different value at each point s in Fourier space, which would produce a position-independent but direction-dependent blurring in real space.
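The variance of the noise (σn²) can be calculated from the half maps in resolution bins (Murshudov, 2016),
$$\sigma_n^2 = \tfrac{1}{4}\langle |F_{o1} - F_{o2}|^2 \rangle.$$
We will later use the relationship of σT² and σn² to the FSC, the correlation coefficient between the half maps calculated in resolution bins (Rosenthal & Henderson, 2003),
$$\mathrm{FSC}_{\mathrm{half}} = \frac{k^2\sigma_T^2}{k^2\sigma_T^2 + 2\sigma_n^2}, \qquad \mathrm{FSC}_{\mathrm{full}} = \frac{2\,\mathrm{FSC}_{\mathrm{half}}}{1 + \mathrm{FSC}_{\mathrm{half}}} = \frac{k^2\sigma_T^2}{k^2\sigma_T^2 + \sigma_n^2}.$$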
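The half-map statistics described above can be sketched numerically. The following is a minimal illustration and not Servalcat's implementation: the function name `half_map_statistics` and the precomputed `bin_index` array (the resolution bin of each Fourier coefficient) are assumptions. The noise variance is taken as 〈|Fo1 − Fo2|²〉/4, which follows from Fo = (Fo1 + Fo2)/2 when the noise in the two half maps is independent.

```python
import numpy as np


def half_map_statistics(fo1, fo2, bin_index, n_bins):
    """Per-bin noise variance of the full map, FSC_half and FSC_full.

    fo1, fo2  : complex Fourier coefficients of the two half maps
    bin_index : integer resolution-bin label for every coefficient (assumed given)
    n_bins    : number of resolution bins; empty bins yield NaN
    """
    fo1 = np.asarray(fo1, dtype=complex)
    fo2 = np.asarray(fo2, dtype=complex)
    bin_index = np.asarray(bin_index)
    var_noise = np.full(n_bins, np.nan)
    fsc_half = np.full(n_bins, np.nan)
    for i in range(n_bins):
        sel = bin_index == i
        if not sel.any():
            continue
        a, b = fo1[sel], fo2[sel]
        # Fo = (Fo1 + Fo2)/2, hence var(Fn) = <|Fo1 - Fo2|^2> / 4
        var_noise[i] = np.mean(np.abs(a - b) ** 2) / 4.0
        # Fourier shell correlation between the two half maps
        num = np.real(np.sum(a * np.conj(b)))
        den = np.sqrt(np.sum(np.abs(a) ** 2) * np.sum(np.abs(b) ** 2))
        fsc_half[i] = num / den
    # Standard half-to-full map conversion
    fsc_full = 2.0 * fsc_half / (1.0 + fsc_half)
    return var_noise, fsc_half, fsc_full
```

With a signal variance of 4 and a half-map noise variance of 2 in a bin, this should return a full-map noise variance near 1, FSC_half near 2/3 and FSC_full near 0.8.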
Let us also assume that the errors in the model follow a Gaussian distribution (Luzzati, 1952),
$$kF_T - DF_c \sim \mathcal{N}(0, \sigma_U^2).$$
We need two functions: the likelihood p(Fo; Fc) for the estimation of parameters (of the atomic model and of the distribution function) and the posterior distribution p(FT; Fo, Fc) of the unknown FT for map calculation.
3.1. Likelihood
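As derived in Murshudov (2016),
$$p(F_o; F_c) = \frac{1}{\pi(\sigma_U^2 + \sigma_n^2)} \exp\left(-\frac{|F_o - DF_c|^2}{\sigma_U^2 + \sigma_n^2}\right)$$
is the likelihood function that is optimized during atomic model refinement. D and σU² are obtained in each resolution bin i by maximizing the joint likelihood (7):
$$\log L_i = -N_i \log(\sigma_{U,i}^2 + \sigma_{n,i}^2) - \frac{\sum_{j \in i} |F_{o,j} - D_i F_{c,j}|^2}{\sigma_{U,i}^2 + \sigma_{n,i}^2} + \mathrm{const},$$
where Ni is the number of Fourier coefficients in bin i.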
3.2. Posterior distribution and map calculation
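The posterior distribution p(FT; Fo, Fc), as derived in Murshudov (2016), is a 2D Gaussian distribution with the mean and variance
$$\langle F_T \rangle = \frac{1}{k}\left[DF_c + w(F_o - DF_c)\right], \qquad \mathrm{var}(F_T) = \frac{w\,\sigma_n^2}{k^2},$$
where
$$w = \frac{\sigma_U^2}{\sigma_U^2 + \sigma_n^2}.$$
Coefficients for an Fo − Fc-type difference map can be derived as
$$\langle F_T \rangle - \frac{DF_c}{k} = \frac{1}{k}\,\frac{\sigma_U^2}{\sigma_U^2 + \sigma_n^2}\,(F_o - DF_c).$$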
The remaining unknown variable is k, which cannot be determined from the data alone. For position-independent isotropic Gaussian blurring, k has the form exp(−Boverall|s|²/4) and Boverall may be estimated from line fitting of a Wilson plot (Wilson, 1942). However, such an estimate is unstable, especially when only low-resolution data are available. Here, we introduce a simple approximation using the variance of the signal. Let us assume that the true map consists of atoms with the same isotropic ADP of 〈B〉, and then
$$\sigma_T^2 = \langle |F_T|^2 \rangle \approx \exp\left(-\frac{\langle B \rangle |s|^2}{2}\right) \sum_j f_j^2.$$
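We ignored the interference terms Σj≠l fj fl exp[2πi sᵀ(xj − xl)]. Further ignoring resolution-dependent terms in Σj fj², we can use kσT as a proxy for k, which gives the best sharpening for the region, with a local blurring parameter of 〈B〉. kσT can be transformed as follows:
$$k\sigma_T = \left(\mathrm{FSC}_{\mathrm{full}}\,\langle |F_o|^2 \rangle\right)^{1/2},$$
which follows from FSCfull = k²σT²/(k²σT² + σn²) and 〈|Fo|²〉 = k²σT² + σn².
The Fo − Fc coefficient then finally has the form
$$\frac{1}{\left(\mathrm{FSC}_{\mathrm{full}}\,\langle |F_o|^2 \rangle\right)^{1/2}}\,\frac{\sigma_U^2}{\sigma_U^2 + \sigma_n^2}\,(F_o - DF_c).$$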
Servalcat calculates an Fo − Fc map using (17). Note that the Fo − Fc map is only sensible when the ADPs are properly refined; otherwise we will see spurious peaks due to incorrect ADPs. For this reason, unsharpened Fo should be used as the input for atomic model refinement (see Section 4.1); the sharpening is then consistent as the same sharpening factor is applied to Fo and Fc. Note also that the sharpening is based on the average B value, so regions having very different B values may show fewer structural features.
The map from the estimated true Fourier coefficients (11) may be useful, but there is a risk of model bias because of the contribution from Fc. In the future, techniques may be available to resolve the issue of model bias. At the moment, Servalcat provides the following as a default map for manual inspection. This is a special case of (11) in the absence of a model, that is with D = 0,
$$\left(\mathrm{FSC}_{\mathrm{full}}\right)^{1/2} \frac{F_o}{\langle |F_o|^2 \rangle^{1/2}} = \left(\mathrm{FSC}_{\mathrm{full}}\right)^{1/2} E_o.$$
This is equivalent to EMDA’s normalized expected map (Warshamanage et al., 2021 ▸).
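The map coefficients of this section can be sketched for a single resolution bin as follows. This is an illustration under stated assumptions rather than the code used by Servalcat: the function name and argument names are hypothetical, the difference coefficients are taken as [σU²/(σU² + σn²)](Fo − DFc)/(kσT), and kσT is taken as (FSCfull〈|Fo|²〉)^(1/2), which follows from FSCfull = k²σT²/(k²σT² + σn²).

```python
import numpy as np


def weighted_map_coefficients(fo, fc, d, var_u, var_n, fsc_full, mean_fo_sq):
    """Sharpened, weighted Fo-Fc and model-free Fo coefficients for one bin.

    fo, fc     : complex Fourier coefficients in the bin
    d          : scale factor D between Fo and Fc
    var_u      : variance of the unexplained signal (sigma_U^2)
    var_n      : noise variance of the full map (sigma_n^2)
    fsc_full   : full-map FSC of the bin
    mean_fo_sq : <|Fo|^2> of the bin
    """
    fo = np.asarray(fo, dtype=complex)
    fc = np.asarray(fc, dtype=complex)
    # k*sigma_T written in terms of measurable quantities (used as proxy for k)
    k_sigma_t = np.sqrt(fsc_full * mean_fo_sq)
    # Weight from the Gaussian posterior: sigma_U^2 / (sigma_U^2 + sigma_n^2)
    w = var_u / (var_u + var_n)
    diff = w * (fo - d * fc) / k_sigma_t
    # D = 0 special case: sqrt(FSC_full) * Eo, with Eo = Fo / sqrt(<|Fo|^2>)
    expected = np.sqrt(fsc_full) * fo / np.sqrt(mean_fo_sq)
    return diff, expected
```

Both outputs are on a normalized scale, so a bin with FSCfull close to zero contributes almost nothing to the model-free map.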
The approach here should work at any resolution where atomic model refinement is applicable.
3.3. Variance of a masked map
The significance of difference map peaks is usually defined by the r.m.s.d. (sigma) level in crystallography. However, in SPA the box size is arbitrary and the voxels outside the molecular envelope lead to underestimation of the r.m.s.d. value. Here, we demonstrate how a mask inflates sigma-scaled density and show that it is useful to normalize the map using the standard deviation within the mask.
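We consider a masked map containing n points in total, where m points are within the mask and thus the values for n − m points are zero. If we calculate the mean value of the whole data,
$$\mu_{\mathrm{total}} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\sum_{i \in \mathrm{mask}} x_i = \frac{m}{n}\,\mu_{\mathrm{mask}}.$$
Thus, to calculate the mean within the mask we can calculate the total mean and then use the formula for correction:
$$\mu_{\mathrm{mask}} = \frac{n}{m}\,\mu_{\mathrm{total}}.$$
For the variance,
$$\mathrm{var}_{\mathrm{total}} = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \mu_{\mathrm{total}}^2 = \frac{m}{n}\left(\mathrm{var}_{\mathrm{mask}} + \mu_{\mathrm{mask}}^2\right) - \mu_{\mathrm{total}}^2.$$
From here we can calculate varmask if we know vartotal and μtotal. If we denote f = m/n then we can write
$$\mathrm{var}_{\mathrm{mask}} = \frac{\mathrm{var}_{\mathrm{total}}}{f} - (1 - f)\,\mu_{\mathrm{mask}}^2 = \frac{\mathrm{var}_{\mathrm{total}}}{f} - \frac{1 - f}{f^2}\,\mu_{\mathrm{total}}^2.$$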
If the mean inside the mask is zero then there is a simple relationship between the total variance and the variance within the mask: vartotal = f varmask. This explains the dependence between the box size and the r.m.s.d. of a cryo-EM SPA map. Servalcat normalizes the Fo − Fc map by (varmask)^(1/2) when a mask file is given. (Otherwise only the Fo − Fc structure factors are written in MTZ format.)
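The correction from whole-box statistics to within-mask statistics can be checked with a few lines of NumPy. This is a minimal sketch assuming a binary mask and population (not sample) variances; the function name `mask_statistics_from_total` is hypothetical.

```python
import numpy as np


def mask_statistics_from_total(mean_total, var_total, frac):
    """Mean and variance inside a mask from whole-box statistics.

    frac is f = m/n, the fraction of grid points inside the mask; points
    outside the mask are assumed to be exactly zero.
    """
    if not 0.0 < frac <= 1.0:
        raise ValueError("frac must be in (0, 1]")
    mean_mask = mean_total / frac
    # var_total = f*var_mask + f*(1 - f)*mean_mask^2, solved for var_mask
    var_mask = var_total / frac - (1.0 - frac) * mean_mask ** 2
    return mean_mask, var_mask
```

For a map with zero mean inside the mask that fills one eighth of the box, the whole-box r.m.s.d. is smaller than the in-mask value by a factor of 8^(1/2), so sigma-scaled contour levels appear inflated by the same factor.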
If we assume that the map consists of signal and noise, and there is no correlation between them, then we can claim that varmask = varsignal + varnoise. Now, in addition, if we assume that we have modelled the map fully with an atomic model (or that two maps have an almost perfect overlap of signals) then the difference maps should consist almost entirely of noise. Therefore, vardiffmap,mask = varnoise. This variance should be calculated within the mask to make sure that we do not have variance reduction because of systematically low values outside the region occupied by the macromolecule. If we want to increase the reliability of these variances for a region of interest then we may also mask out other regions where there might be signal that is not fully accounted for by the current model. This can also be practiced in crystallography.
4. Refinement procedure
In this section the refinement and map-calculation procedures are described. Everything other than REFMAC5 itself is implemented in Servalcat using the GEMMI library (https://github.com/project-gemmi/gemmi). Fig. 1 ▸ summarizes the procedure.
4.1. Map choice
The optimal map depends on the purpose. For manual inspection, optimally sharpened and weighted maps should be used so that the best visual interpretability is achieved. In general, this does not mean the best signal-to-noise ratio, but it does mean that the details of structural features are visible in the map. On the other hand, unsharpened and unweighted maps are preferred in refinement. If a sharpened map is used, some atoms may need to be refined to have negative B values (or nonpositive definite if anisotropic), but they are constrained to be positive in the refinement, resulting in suboptimal atomic models. On the other hand, blurred maps will just give a shifted distribution of refined B values. An unweighted map is preferred because it enables the calculation of many properties including noise variance and optimally weighted maps after refinement (see Section 3). Users should therefore be aware that the ADPs in the model are not refined against the same map that is used for visual inspection. Cross-validation (Brown et al., 2015 ▸) can also be carried out throughout refinement and model building if both half maps are readily available. Therefore, unsharpened and unweighted half maps from two independent reconstructions are considered to be optimal inputs for the Servalcat pipeline, which performs atomic model refinement followed by map calculation.
4.2. Masking and trimming
The box size in SPA is often substantially larger than the molecule, which is unnecessary for atomic model refinement. Therefore the map is masked and trimmed into a smaller box to speed up calculations, as discussed in Nicholls et al. (2018 ▸).
Half maps are first sharpened, masked at a radius of 3 Å (default) from the atom positions and then blurred by the same factor. Sharpening before masking is important to avoid masking away any of the signal (the tails of the atomic density distributions), because the raw half maps are blurred and the signal is spread out. The optimal sharpening will differ depending on the region, but here we use an overall isotropic B value estimated by comparing |Fo| with |Fc| calculated from a copy of the initial model with all ADPs set to zero. Alternatively, a user-supplied B value can be used. The sharpened–masked–unsharpened half maps are then averaged to make a full map that is used as the refinement target in REFMAC5. After refinement, the map–model FSC is calculated using a newly created mask based on the refined model.
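The sharpen–mask–unsharpen step can be sketched with NumPy FFTs. This is a simplified illustration, not the Servalcat code: the function name is hypothetical, the mask is assumed to be supplied as an array on the same grid, the cell is assumed to be cubic-sampled with a single voxel `spacing`, and the sharpening factor is taken as exp(+B|s|²/4) with |s| = 1/d.

```python
import numpy as np


def sharpen_mask_unsharpen(volume, mask, b_value, spacing):
    """Sharpen a map by B, apply a real-space mask, then blur by the same B.

    volume  : 3D real array (half map)
    mask    : 3D array of the same shape with values in [0, 1]
    b_value : overall isotropic B (Å^2) used for sharpening
    spacing : voxel size in Å (assumed equal along all axes)
    """
    volume = np.asarray(volume, dtype=float)
    freqs = [np.fft.fftfreq(n, d=spacing) for n in volume.shape]
    sx, sy, sz = np.meshgrid(*freqs, indexing="ij")
    s_sq = sx ** 2 + sy ** 2 + sz ** 2
    sharpen = np.exp(b_value * s_sq / 4.0)
    # Sharpen in Fourier space so that atomic density tails are pulled in
    sharpened = np.fft.ifftn(np.fft.fftn(volume) * sharpen).real
    masked = sharpened * mask
    # Undo the sharpening so that refined B values keep their original scale
    return np.fft.ifftn(np.fft.fftn(masked) / sharpen).real
```

With a mask of ones the function returns the input map unchanged, which is a convenient sanity check; large B values combined with fine sampling can overflow the exponential and would need a resolution cutoff.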
4.3. Point-group symmetry
If the maps are symmetrized, the user can specify a point-group symbol and give the coordinates for just a single asymmetric unit. Symmetry operators are calculated from the symbols (Cn, Dn, O, T and I) following the axis convention in RELION (Scheres, 2012), which follows the common orientation convention (Heymann et al., 2005) except for T. It is also assumed that the centre of the box is the origin of symmetry. This requires a translation for each rotation Rj, which can be calculated as c − Rjc = (I − Rj)c, where c is the origin of symmetry. Reconstruction programs such as RELION (Scheres, 2012) usually follow this assumption. However, the rotation of the axes and the position of the origin are arbitrary in general, and in future will be determined automatically using ProSHADE (Nicholls et al., 2018; Tykac, 2018) and EMDA. The model in the asymmetric unit is expanded when creating a mask and performing map trimming. The rotation matrices are invariant to changing the box sizes and shifts of the molecule. The translation vectors in the symmetry operators are recalculated for the shifted model.
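The construction of symmetry operators about the box centre can be sketched for the cyclic groups. This is a minimal example, assuming that the n-fold axis of Cn lies along z and passes through the centre c; the function names are hypothetical and the other point groups (Dn, T, O and I) are not covered.

```python
import numpy as np


def cyclic_operators(n, centre):
    """(R_j, t_j) pairs for C_n about the z axis through `centre`."""
    centre = np.asarray(centre, dtype=float)
    ops = []
    for j in range(n):
        a = 2.0 * np.pi * j / n
        rot = np.array([[np.cos(a), -np.sin(a), 0.0],
                        [np.sin(a), np.cos(a), 0.0],
                        [0.0, 0.0, 1.0]])
        # t_j = c - R_j c = (I - R_j) c keeps the centre fixed
        ops.append((rot, centre - rot @ centre))
    return ops


def shift_operators(ops, shift):
    """Operators valid after the model has been translated by `shift`.

    Rotations are unchanged; only the translations are recalculated.
    """
    shift = np.asarray(shift, dtype=float)
    return [(rot, trans + shift - rot @ shift) for rot, trans in ops]
```

After trimming the map into a smaller box, `shift_operators` would be called with the translation that was applied to the model, so that the shifted symmetry centre remains a fixed point of every operator.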
REFMAC5 internally generates symmetry copies when calculating Fc and restraint terms. For anisotropic ADPs, the Baniso matrix in the Cartesian basis is transformed by R Baniso Rᵀ. This anisotropic ADP transformation is also implemented in GEMMI.
During the refinement, nonbonded interaction and ADP similarity restraints are evaluated using the symmetry-expanded model, and the gradients are calculated for the model in the asymmetric unit.
If atoms are on special positions (for example on a rotation axis), they are restrained to sit on the special position and have anisotropic ADPs consistent with symmetry. Firstly, atoms are identified as being on a special position if the following condition is obeyed for any of the non-identity symmetry operators j,
$$|R_j x + t_j - x| < \varepsilon,$$
where ɛ is a tolerance that can be modified by users. The default value is 0.25 Å. If an atom is on a special position then the program makes sure that the symmetry operators for this position form a group that is a subgroup of the point group of the map. Once the N elements of the subgroup for this atom have been identified, the atom is forced to be on that position by simply replacing its coordinates with
$$x_{\mathrm{new}} = \frac{1}{N}\sum_{j}(R_j x + t_j).$$
In every cycle, the positions of these atoms are restrained to be on their special positions by adding a term to the target function,
$$\frac{1}{\sigma_x^2}\sum_{j} |x - (R_j x + t_j)|^2,$$
where the summation is performed over all subgroup elements of the special position and σx is a user-controllable weight parameter for special positions. The occupancy of the atom is adjusted based on the multiplicity of the position.
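If anisotropic ADPs are used, they are also forced to obey symmetry conditions for atoms on special positions by replacing the anisotropic tensor with
$$B_{\mathrm{new}} = \frac{1}{N}\sum_{j} R_j B_{\mathrm{aniso}} R_j^{T},$$
where the sum runs over the N subgroup elements of the special position.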
After this, similarly to the positional parameters, in every cycle restraints are applied to the anisotropic tensor of the atoms on special positions to avoid violation of the symmetry condition for the ADP,
$$\frac{1}{\sigma_B^2}\sum_{j} |B_{\mathrm{aniso}} - R_j B_{\mathrm{aniso}} R_j^{T}|^2,$$
where σB is a user-controllable weight parameter for Baniso values on special positions. Here, the distance between anisotropic tensors is a Frobenius distance, |B1 − B2|² = Σij (B1,ij − B2,ij)².
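The handling of special positions described in this section can be sketched as follows. This is an illustration only: the function names are hypothetical, the symmetrized coordinates and tensor are taken as plain averages over the subgroup elements, and the occupancy factor 1/N (N being the subgroup order) is an assumption about how the multiplicity is applied.

```python
import numpy as np


def special_position_subgroup(x, ops, tol=0.25):
    """Operators (R, t) that map x onto itself within `tol` (Å).

    The identity operator, if present in `ops`, is always returned.
    """
    x = np.asarray(x, dtype=float)
    return [(r, t) for r, t in ops if np.linalg.norm(r @ x + t - x) < tol]


def snap_to_special_position(x, b_aniso, subgroup):
    """Average position and anisotropic tensor over the subgroup elements."""
    x = np.asarray(x, dtype=float)
    b = np.asarray(b_aniso, dtype=float)
    x_new = np.mean([r @ x + t for r, t in subgroup], axis=0)
    # Cartesian anisotropic tensors transform as R B R^T
    b_new = np.mean([r @ b @ r.T for r, _ in subgroup], axis=0)
    occupancy_factor = 1.0 / len(subgroup)  # assumed multiplicity correction
    return x_new, b_new, occupancy_factor
```

For an atom lying 0.06 Å from a twofold axis along z, the subgroup contains the identity and the twofold operator, the averaged position falls exactly on the axis, and the xz and yz components of the tensor are set to zero.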
4.4. H atoms
Hydrogen electrons are usually shifted towards the parent atoms by 0.1–0.2 Å (Williams et al., 2018 ▸). This must be accounted for when calculating structure factors from the atomic model (F c). REFMAC5 and Servalcat (GEMMI) use the Mott–Bethe formula (Mott & Bragg, 1930 ▸; Bethe, 1930 ▸; Murshudov, 2016 ▸), which can conveniently take this fact into account.
The atomic scattering factor for an atom with a shifted nucleus is
$$f(s) = \frac{1}{2\pi^2 a_0}\,\frac{Z\exp(2\pi i\, s^{T}\Delta x) - f_X(s)}{|s|^2},$$
where Δx is the positional shift of the nucleus with respect to the centre of the electron density, Z is the atomic number, fX is the X-ray scattering factor and a0 is the Bohr radius (the sign of the phase assumes the convention F(s) = Σ f exp(2πi sᵀx)). The hydrogen density peak in real space is shifted beyond the position of the hydrogen nucleus and varies depending on the ADP and resolution cutoff (Nakane et al., 2020). The expected peak position may be calculated by the Fourier transform of (28). The new CCP4 monomer library includes nucleus bond distances (_chem_comp_bond.value_dist_nucleus; Nicholls et al., 2021).
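The effect of a shifted nucleus on the electron scattering factor can be illustrated for hydrogen. This is a sketch under stated assumptions and not the GEMMI or REFMAC5 code: it uses the Mott–Bethe relation in the form f(s) = [Z exp(2πi sᵀΔx) − fX(s)]/(2π²a0|s|²) with |s| = 1/d, the analytic X-ray form factor of a ground-state hydrogen atom in place of tabulated coefficients, and a positive phase sign convention; all function names are hypothetical.

```python
import numpy as np

BOHR_RADIUS = 0.529177  # Å


def hydrogen_xray_form_factor(s_norm):
    """Analytic X-ray form factor of a ground-state H atom; s_norm = 1/d (Å^-1)."""
    return 1.0 / (1.0 + (np.pi * BOHR_RADIUS * s_norm) ** 2) ** 2


def electron_form_factor_shifted_nucleus(s, delta_x, z=1.0,
                                         fx=hydrogen_xray_form_factor):
    """Mott-Bethe electron scattering factor (Å) with the nucleus at delta_x.

    s       : reciprocal-space vector (Å^-1), must be non-zero
    delta_x : shift of the nucleus from the centre of the electron density (Å)
    """
    s = np.asarray(s, dtype=float)
    delta_x = np.asarray(delta_x, dtype=float)
    s_norm = np.linalg.norm(s)
    # The nucleus scatters as a point charge located at delta_x
    nuclear = z * np.exp(2j * np.pi * np.dot(s, delta_x))
    return (nuclear - fx(s_norm)) / (2.0 * np.pi ** 2 * BOHR_RADIUS * s_norm ** 2)
```

With Δx = 0 the result is real and tends to the Bohr radius as |s| approaches zero; a non-zero Δx makes the factor complex, which is what displaces the density peak relative to the electron-cloud centre.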
4.5. Refinement
REFMAC5 performs a maximum-likelihood refinement against the Fourier transform of a sharpened–masked–unsharpened map (see Section 4.2) using a dedicated likelihood function for SPA (7). The estimated noise is not used at the moment. No solvent model is used. The average of map–model FSC weighted by the number of Fourier coefficients in each shell (FSC average) is reported to monitor the refinement. At low resolution the use of jelly-body restraints or external restraints is encouraged to ensure a large radius of convergence and stabilize the refinement (Murshudov et al., 2011; Nicholls et al., 2012). Note that jelly-body restraints are only useful when the initial model geometry is of good quality because they try to keep the model in its current conformation. After the refinement, Servalcat shifts the model back to the original box and adjusts the translation vectors of the symmetry operators if needed. It also generates an MTZ file of map coefficients including the sharpened and weighted Fo − Fc and Fo maps (as calculated by equations 17 and 18).
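The FSC average used to monitor refinement is a weighted mean over resolution shells. A minimal sketch, with a hypothetical function name and the per-shell FSC values and coefficient counts assumed to be available already:

```python
import numpy as np


def fsc_average(fsc_per_shell, n_coeffs_per_shell):
    """Map-model FSC averaged over shells, weighted by coefficient counts."""
    fsc = np.asarray(fsc_per_shell, dtype=float)
    n = np.asarray(n_coeffs_per_shell, dtype=float)
    return float(np.sum(fsc * n) / np.sum(n))
```

Because the number of Fourier coefficients grows with resolution, the high-resolution shells dominate this average.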
4.6. User interface
Servalcat has a command-line interface. A graphical interface will be available in CCP-EM, where the REFMAC5 interface has been updated and is now based on Servalcat.
From the user’s point of view, the main difference in setting up a refinement job is that the default input is now a pair of half maps. (Refinement from a single input map is still possible but is no longer the default option.) The user is also offered more control over the options for refinement weight, symmetry and handling of H atoms. At the end of refinement, the Fo − Fc difference map from Servalcat is made available along with the other output files in the CCP-EM launcher.
5. Methods and results
5.1. Fo − Fc map for ligand visualization
Fo − Fc omit maps are widely used to convincingly demonstrate the existence of ligands in crystallography. They are also useful for this purpose in SPA. Fig. 2 shows examples of Fo − Fc omit maps for the ligand density from EMDB entries EMD-22898 (Kern et al., 2021) and EMD-8123 (Murray et al., 2016), clearly showing support for the presence of the ligand. To generate the map from EMD-22898, chain A of the atomic model from PDB entry 7kjr was refined using the half maps under C2 symmetry constraints. For EMD-8123, PDB entry 5it7 was refined using the half maps without symmetry constraints. After the refinement, the ligand and water atoms were omitted and the Fo − Fc maps were calculated. Map values were normalized within a mask. Since a suitable mask for EMD-22898 was not available in the EMDB, one was calculated from half-map correlation using EMDA.
The weighting and sharpening scheme in Servalcat was compared with alternatives using no weights or (FSCfull)^(1/2) weights (Rosenthal & Henderson, 2003), both with sharpening by the overall B value as determined from Wilson plot fitting by RELION (Supplementary Figs. S1 and S2). Especially in the case of EMDB entry EMD-8123 (Supplementary Fig. S2), sharpening by the overall B value obtained by line fitting gave oversharpened maps.
5.2. Fo − Fc map for detecting model errors
In crystallography, Fo − Fc maps are almost always used for manual and automatic model rebuilding. Strong negative density usually indicates that parts of the model should be moved away or removed, while strong positive density implies that there are unmodelled atoms. The Fo − Fc map is typically updated after every refinement session, and refinement may be stopped when there are no significant strong peaks.
The same refinement practice is possible in SPA. Fig. 3 illustrates the use of the Fo − Fc map for detecting model errors using EMDB entry EMD-0919 and PDB entry 6lmt (Demura et al., 2020). Chain A of the model was refined using the half maps under C8 symmetry constraints. After refinement, the Fo − Fc map was calculated and normalized using the standard deviation of the region within the EMDB-deposited mask. In this example, it is clear from the positive and negative difference peaks that the tryptophan and methionine side chains should be repositioned. The weighting and sharpening schemes are compared in Supplementary Fig. S3, demonstrating that appropriate weighting can increase the interpretability of maps.
5.3. Hydrogen density analysis
Nakane et al. (2020 ▸) reported convincing densities for H atoms in apoferritin and GABAAR maps by cryo-EM SPA at 1.2 and 1.7 Å resolution, respectively. It is natural to ask what is the lowest resolution at which H atoms can be seen in cryo-EM SPA using currently available computational tools.
Here, we analyzed apoferritin maps from the EMDB to see if and when hydrogen densities could be observed. There are 25 mouse or human apoferritin entries at resolutions better than 2.1 Å, of which 19 had half maps and were used in the analysis (Table 1). Chain A of each model was refined using the half maps under O symmetry constraints. If there was no corresponding PDB entry, PDB entry 7a4m or 6z6u was placed in the map using MOLREP (Vagin & Teplyakov, 2010) followed by jiggle fit in Coot (Brown et al., 2015) before full atomic refinement. After ten cycles of refinement with REFMAC5, an Fo − Fc map was calculated and normalized within the mask. Riding H atoms were used in the refinement (so they are not refined, but generated at fixed positions; this is the default in REFMAC5) and they were omitted for Fo − Fc map calculation. Peaks of ≥2σ and ≥3σ were detected using PEAKMAX from the CCP4 package (Winn et al., 2011), and were associated with hydrogen positions if the distance from the peak was less than 0.3 Å. H atoms having multiple potential minima (such as those in hydroxyl, sulfhydryl or carboxyl groups) were ignored in the analysis. The ratios of the number of hydrogen peaks to the number of H atoms in the model are plotted in Fig. 4(a). The result shows that the 1.25 Å resolution data gave the highest ratio of ∼70% hydrogens detected (Fig. 5a). Even at 1.84 Å resolution approximately 17% of the H atoms may be found (Fig. 5b), while at 2.0 or 2.1 Å resolution only a few H atoms are visible in the map (Fig. 5c). The weighting and sharpening schemes are compared in Supplementary Figs. S4–S6. Note that there may be false positives due to, for example, alternative conformations or inaccuracies in the model.
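The peak-to-hydrogen association used in this analysis can be sketched as a simple distance test. This is an illustration rather than the script used for the paper: the function name is hypothetical, peaks and hydrogen positions are assumed to be given as Cartesian coordinates in the same frame, and neither symmetry mates nor one-to-one matching of peaks are considered.

```python
import numpy as np


def fraction_of_hydrogens_with_peaks(h_positions, peak_positions, peak_heights,
                                     min_sigma=2.0, max_dist=0.3):
    """Fraction of H atoms with a difference-map peak within `max_dist` Å.

    peak_heights are expected in sigma units (map normalized within the mask).
    """
    h = np.asarray(h_positions, dtype=float).reshape(-1, 3)
    p = np.asarray(peak_positions, dtype=float).reshape(-1, 3)
    heights = np.asarray(peak_heights, dtype=float)
    p = p[heights >= min_sigma]          # keep only sufficiently strong peaks
    if len(h) == 0 or len(p) == 0:
        return 0.0
    # Pairwise distances between every H atom and every retained peak
    dist = np.linalg.norm(h[:, None, :] - p[None, :, :], axis=2)
    return float((dist < max_dist).any(axis=1).mean())
```

Calling the function with `min_sigma=2.0` and `min_sigma=3.0` gives the two detection ratios of the kind plotted against resolution.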
Table 1. Test data for hydrogen peak analysis.
EMDB code | PDB code | Resolution (Å) | Reference |
---|---|---|---|
EMD-11638 | 7a4m | 1.22 | Nakane et al. (2020 ▸) |
EMD-11103 | 6z6u | 1.25 | Yip et al. (2020 ▸) |
EMD-30683 | (7a4m)† | 1.31 | Danev et al. (2021 ▸) |
EMD-30685 | (7a4m)† | 1.35 | Danev et al. (2021 ▸) |
EMD-30684 | (7a4m)† | 1.43 | Danev et al. (2021 ▸) |
EMD-30686 | (7a4m)† | 1.43 | Danev et al. (2021 ▸) |
EMD-9865 | (7a4m)† | 1.54 | Kato et al. (2019 ▸) |
EMD-11121 | 6z9e | 1.55 | Yip et al. (2020 ▸) |
EMD-11122 | 6z9f | 1.56 | Yip et al. (2020 ▸) |
EMD-9599 | (7a4m)† | 1.62 | Danev et al. (2019 ▸) |
EMD-0144 | (6z6u)† | 1.65 | Zivanov et al. (2018 ▸) |
EMD-20026 | (6z6u)† | 1.75 | Pintilie et al. (2020 ▸) |
EMD-21024 | 6v21 | 1.75 | Wu et al. (2020 ▸) |
EMD-10101 | 6s61 | 1.84 | No publication |
EMD-10675 | (7a4m)† | 1.86 | Fislage et al. (2020 ▸) |
EMD-21951 | 6wx6 | 2.00 | Tan & Rubinstein (2020 ▸) |
EMD-22351 | (6z6u)† | 2.07 | Guo et al. (2020 ▸) |
EMD-4905 | 6rjh | 2.10 | Naydenova et al. (2019 ▸) |
EMD-20521 | 6pxm | 2.10 | No publication |
† No corresponding PDB entry; the model given in parentheses was placed in the map as described in the text.
In addition, Fo − Fc maps were generated from the 1.2 Å resolution data (PDB entry 7a4m; EMDB entry EMD-11638) using several different resolution cutoffs. These were analysed in the same way (Fig. 4c), along with Fc maps calculated from the PDB entry 7a4m model at the same resolutions (Fig. 4d). Figs. 4(c) and 4(d) show that if the cryo-EM experiment and atomic model refinement are carried out carefully, with due attention to ADPs, then some H atoms can be seen even at 2.0 Å resolution.
For comparison, we performed the same analysis using X-ray crystallographic data for (apo)ferritins deposited in the PDB. 51 re-refined atomic models available in the PDB-REDO database (Joosten et al., 2012) were downloaded, crystallographic mFo − DFc maps were calculated using REFMAC5 and density peaks for H atoms were analysed as just described. The result (Fig. 4b) confirms that, as expected, H atoms are more visible in EM than using X-rays.
6. Conclusions
A new program, Servalcat, for the refinement and validation of atomic models using cryo-EM SPA maps has been developed. The program controls the refinement flow and performs difference-map calculations. A weighted and sharpened Fo − Fc map was derived as a validation tool, obtained from the posterior distribution of FT and an approximation of an overall blurring factor calculated from the variance of the signal. We showed that such maps are useful to visualize H atoms and model errors, as in crystallography.
In this work, we assumed the blurring factor k was position-independent (see Section 3). However, in reality, blurring of maps is position- and direction-dependent, for example due to the varying mobility of different domains and/or uncertainty in the particle alignments. For such regions k should ideally be replaced with klocal, derived from a local map blurring parameter Blocal according to klocal(s) = exp(−Blocal|s|²/4) (if isotropic) or exp(−sᵀ Blocal s/4) (if anisotropic). If we could estimate Blocal values, then we would be able to use them for the visual improvement of maps. This is especially important for identifying weak densities. We are working on this subject.
We showed that many H atoms may be observed in the difference maps, even up to a resolution of 2 Å. We would expect that they should also be visible in electron diffraction (MicroED) experiments. However, high accuracy would be needed in the experiment, data analysis and model refinement in both MicroED and cryo-EM SPA to achieve this experimentally. For example, the electron dose in cryo-EM experiments is often high enough to cause radiation damage (Hattne et al., 2018 ▸); H atoms are known to suffer from radiation damage (Leapman & Sun, 1995 ▸) and this would hinder their detection. Lower dose experiments might be needed for more reliable identification of hydrogen, even at the expense of resolution.
Symmetry is widely used in cryo-EM SPA. When symmetry is imposed in the reconstruction, it should be used throughout the downstream analyses, and all software tools should be aware of it and take it into account. The asymmetric unit model should be refined under symmetry constraints, and it should be deposited in the PDB with the correct annotation of the symmetry. The PDB and EMDB deposition system will need to validate the symmetry of both the model and the map. We hope that this will become common practice in the future. The same practice should be established for helical reconstructions, in which symmetry is described by the axial symmetry type (Cn or Dn), twist and rise (He & Scheres, 2017). Servalcat will support helical symmetry in the future.
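To illustrate how an axial symmetry type, twist and rise together define a set of symmetry operators, the sketch below enumerates rotation and translation pairs for a helix with Cn axial symmetry. It is an illustrative assumption rather than Servalcat functionality: the function names are hypothetical, the helical axis is assumed to lie along z through the origin, and Dn symmetry (which adds twofold axes perpendicular to the helical axis) is not covered.

```python
import math


def helical_operators(n_fold, twist_deg, rise, n_units):
    """Enumerate (rotation, translation) pairs for Cn helical symmetry.

    n_fold    -- order n of the axial Cn symmetry
    twist_deg -- rotation about z between successive subunits, in degrees
    rise      -- shift along z between successive subunits
    n_units   -- number of helical steps to generate, starting from 0
    """
    ops = []
    for i in range(n_units):        # successive helical steps
        for j in range(n_fold):     # Cn copies within each step
            angle = math.radians(i * twist_deg + j * 360.0 / n_fold)
            c, s = math.cos(angle), math.sin(angle)
            rot = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
            ops.append((rot, (0.0, 0.0, i * rise)))
    return ops


def apply_operator(op, xyz):
    """Apply x' = R x + t to a Cartesian coordinate."""
    rot, trans = op
    return tuple(sum(rot[r][k] * xyz[k] for k in range(3)) + trans[r]
                 for r in range(3))
```

Calling `helical_operators(2, 90.0, 5.0, 3)` with these made-up parameters yields six operators, the first of which is the identity. Operators of this kind play the same role for a helical assembly that point-group operators play for the asymmetric unit model.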
Servalcat is freely available under an open source (MPL-2.0) licence at https://github.com/keitaroyam/servalcat. The features described in this paper have been implemented in REFMAC 5.8.0291 and Servalcat 0.2.0 (which requires GEMMI 0.4.9). Servalcat is also available in the latest nightly builds of the CCP-EM suite and will be included in the upcoming version 1.6 release.
Supplementary Material
Supplementary Figures. DOI: 10.1107/S2059798321009475/qt5003sup1.pdf
Acknowledgments
The authors are grateful to Marcin Wojdyr for the implementation of F c calculation for EM in the GEMMI library, Takanori Nakane for critical reading of the manuscript, computational structural biology group members for discussion, and Jake Grimmett and Toby Darling from the MRC–LMB Scientific Computing Department for computing support and resources.
Funding Statement
This work was funded by Medical Research Council grants MC_UP_A025_1012 and MR/V000403/1 to Garib Murshudov, Keitaro Yamashita, Colin Palmer, and Tom Burnley.
Footnotes
There is a similar record, BIOMT, which encodes the biological assembly. In SPA, the symmetry of the map usually corresponds to the biological assembly, but this is not always the case. Both MTRIX and BIOMT records are generally required during deposition.
Technically, fixed position constraints would be more appropriate here. We used restraints instead of constraints for simplicity of implementation. In the future, we will implement the use of constraints instead.
References
- Afonine, P. V., Poon, B. K., Read, R. J., Sobolev, O. V., Terwilliger, T. C., Urzhumtsev, A. & Adams, P. D. (2018). Acta Cryst. D74, 531–544.
- Bai, X.-C., McMullan, G. & Scheres, S. H. W. (2015). Trends Biochem. Sci. 40, 49–57.
- Bethe, H. (1930). Ann. Phys. 397, 325–400.
- Brown, A., Long, F., Nicholls, R. A., Toots, J., Emsley, P. & Murshudov, G. (2015). Acta Cryst. D71, 136–153.
- Burnley, T., Palmer, C. M. & Winn, M. (2017). Acta Cryst. D73, 469–477.
- Chen, V. B., Arendall, W. B., Headd, J. J., Keedy, D. A., Immormino, R. M., Kapral, G. J., Murray, L. W., Richardson, J. S. & Richardson, D. C. (2010). Acta Cryst. D66, 12–21.
- Chojnowski, G., Sobolev, E., Heuser, P. & Lamzin, V. S. (2021). Acta Cryst. D77, 142–150.
- Clabbers, M. T. B. & Abrahams, J. P. (2018). Crystallogr. Rev. 24, 176–204.
- Cragnolini, T., Sahota, H., Joseph, A. P., Sweeney, A., Malhotra, S., Vasishtan, D. & Topf, M. (2021). Acta Cryst. D77, 41–47.
- Danev, R., Yanagisawa, H. & Kikkawa, M. (2019). Trends Biochem. Sci. 44, 837–848.
- Danev, R., Yanagisawa, H. & Kikkawa, M. (2021). Microscopy, dfab016.
- Demura, K., Kusakizako, T., Shihoya, W., Hiraizumi, M., Nomura, K., Shimada, H., Yamashita, K., Nishizawa, T., Taruno, A. & Nureki, O. (2020). Sci. Adv. 6, eaba8105.
- Fislage, M., Shkumatov, A. V., Stroobants, A. & Efremov, R. G. (2020). IUCrJ, 7, 707–718.
- Guo, H., Franken, E., Deng, Y., Benlekbir, S., Singla Lezcano, G., Janssen, B., Yu, L., Ripstein, Z. A., Tan, Y. Z. & Rubinstein, J. L. (2020). IUCrJ, 7, 860–869.
- Hattne, J., Shi, D., Glynn, C., Zee, C.-T., Gallagher-Jones, M., Martynowycz, M. W., Rodriguez, J. A. & Gonen, T. (2018). Structure, 26, 759–766.
- He, S. & Scheres, S. H. W. (2017). J. Struct. Biol. 198, 163–176.
- Heymann, J. B., Chagoyen, M. & Belnap, D. M. (2005). J. Struct. Biol. 151, 196–207.
- Hoh, S. W., Burnley, T. & Cowtan, K. (2020). Acta Cryst. D76, 531–541.
- Jakobi, A. J., Wilmanns, M. & Sachse, C. (2017). eLife, 6, e27131.
- Joosten, R. P., Joosten, K., Murshudov, G. N. & Perrakis, A. (2012). Acta Cryst. D68, 484–496.
- Joseph, A. P., Lagerstedt, I., Jakobi, A., Burnley, T., Patwardhan, A., Topf, M. & Winn, M. (2020). J. Chem. Inf. Model. 60, 2552–2560.
- Kato, T., Makino, F., Nakane, T., Terahara, N., Kaneko, T., Shimizu, Y., Motoki, S., Ishikawa, I., Yonekura, K. & Namba, K. (2019). Microsc. Microanal. 25, 998–999.
- Kern, D. M., Sorum, B., Mali, S. S., Hoel, C. M., Sridharan, S., Remis, J. P., Toso, D. B., Kotecha, A., Bautista, D. M. & Brohawn, S. G. (2021). Nat. Struct. Mol. Biol. 28, 573–582.
- Leapman, R. D. & Sun, S. (1995). Ultramicroscopy, 59, 71–79.
- Luzzati, V. (1952). Acta Cryst. 5, 802–810.
- Mott, N. F. & Bragg, W. L. (1930). Proc. R. Soc. London A, 127, 658–665.
- Murray, J., Savva, C. G., Shin, B.-S., Dever, T. E., Ramakrishnan, V. & Fernández, I. S. (2016). eLife, 5, e13567.
- Murshudov, G. N. (2016). Methods Enzymol. 579, 277–305.
- Murshudov, G. N., Skubák, P., Lebedev, A. A., Pannu, N. S., Steiner, R. A., Nicholls, R. A., Winn, M. D., Long, F. & Vagin, A. A. (2011). Acta Cryst. D67, 355–367.
- Nakane, T., Kotecha, A., Sente, A., McMullan, G., Masiulis, S., Brown, P. M. G. E., Grigoras, I. T., Malinauskaite, L., Malinauskas, T., Miehling, J., Uchański, T., Yu, L., Karia, D., Pechnikova, E. V., de Jong, E., Keizer, J., Bischoff, M., McCormack, J., Tiemeijer, P., Hardwick, S. W., Chirgadze, D. Y., Murshudov, G., Aricescu, A. R. & Scheres, S. H. W. (2020). Nature, 587, 152–156.
- Naydenova, K., Peet, M. J. & Russo, C. J. (2019). Proc. Natl Acad. Sci. USA, 116, 11718–11724.
- Nicholls, R. A., Long, F. & Murshudov, G. N. (2012). Acta Cryst. D68, 404–417.
- Nicholls, R. A., Tykac, M., Kovalevskiy, O. & Murshudov, G. N. (2018). Acta Cryst. D74, 492–505.
- Nicholls, R. A., Wojdyr, M., Joosten, R. P., Catapano, L., Long, F., Fischer, M., Emsley, P. & Murshudov, G. N. (2021). Acta Cryst. D77, 727–745.
- Pintilie, G., Zhang, K., Su, Z., Li, S., Schmid, M. F. & Chiu, W. (2020). Nat. Methods, 17, 328–334.
- Ramírez-Aportela, E., Vilas, J. L., Glukhova, A., Melero, R., Conesa, P., Martínez, M., Maluenda, D., Mota, J., Jiménez, A., Vargas, J., Marabini, R., Sexton, P. M., Carazo, J. M. & Sorzano, C. O. S. (2019). Bioinformatics, 36, 765–772.
- Ramlaul, K., Palmer, C. M. & Aylett, C. H. (2019). J. Struct. Biol. 205, 30–40.
- R Core Team (2020). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.
- Rosenthal, P. B. & Henderson, R. (2003). J. Mol. Biol. 333, 721–745.
- Scheres, S. H. W. (2012). J. Struct. Biol. 180, 519–530.
- Schrodinger, LLC (2020). The PyMOL Molecular Graphics System, Version 2.4.
- Tagari, M., Newman, R., Chagoyen, M., Carazo, J.-M. & Henrick, K. (2002). Trends Biochem. Sci. 27, 589.
- Tan, Y. Z. & Rubinstein, J. L. (2020). Acta Cryst. D76, 1092–1103.
- Terwilliger, T. C., Adams, P. D., Afonine, P. V. & Sobolev, O. V. (2018a). Nat. Methods, 15, 905–908.
- Terwilliger, T. C., Sobolev, O. V., Afonine, P. V. & Adams, P. D. (2018b). Acta Cryst. D74, 545–559.
- Terwilliger, T. C., Sobolev, O. V., Afonine, P. V., Adams, P. D. & Read, R. J. (2020). Acta Cryst. D76, 912–925.
- Tickle, I. J. (2012). Acta Cryst. D68, 454–467.
- Tronrud, D. E. (2004). Acta Cryst. D60, 2156–2168.
- Tykac, M. (2018). PhD thesis. University of Cambridge. https://doi.org/10.17863/CAM.31783.
- Vagin, A. & Teplyakov, A. (2010). Acta Cryst. D66, 22–25.
- Warshamanage, R., Yamashita, K. & Murshudov, G. N. (2021). bioRxiv, 2021.07.26.453750.
- Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. New York: Springer.
- Williams, C. J., Headd, J. J., Moriarty, N. W., Prisant, M. G., Videau, L. L., Deis, L. N., Verma, V., Keedy, D. A., Hintze, B. J., Chen, V. B., Jain, S., Lewis, S. M., Arendall, W. B. III, Snoeyink, J., Adams, P. D., Lovell, S. C., Richardson, J. S. & Richardson, D. C. (2018). Protein Sci. 27, 293–315.
- Wilson, A. J. C. (1942). Nature, 150, 152.
- Winn, M. D., Ballard, C. C., Cowtan, K. D., Dodson, E. J., Emsley, P., Evans, P. R., Keegan, R. M., Krissinel, E. B., Leslie, A. G. W., McCoy, A., McNicholas, S. J., Murshudov, G. N., Pannu, N. S., Potterton, E. A., Powell, H. R., Read, R. J., Vagin, A. & Wilson, K. S. (2011). Acta Cryst. D67, 235–242.
- Wlodawer, A. & Dauter, Z. (2017). Acta Cryst. D73, 379–380.
- Wu, M., Lander, G. C. & Herzik, M. A. (2020). J. Struct. Biol. X, 4, 100020.
- Yip, K. M., Fischer, N., Paknia, E., Chari, A. & Stark, H. (2020). Nature, 587, 157–161.
- Zivanov, J., Nakane, T., Forsberg, B. O., Kimanius, D., Hagen, W. J., Lindahl, E. & Scheres, S. H. W. (2018). eLife, 7, e42166.