Abstract
Virtually all single-particle cryo-EM experiments currently suffer from specimen adherence to the air-water interface, leading to a non-uniform distribution in the set of projection views. Whereas it is well accepted that uniform projection distributions can lead to high-resolution reconstructions, non-uniform (anisotropic) distributions can negatively affect map quality, elongate structural features, and in some cases, prohibit interpretation altogether. Although some consequences of non-uniform sampling have been described qualitatively, we know little about how sampling quantitatively affects resolution in cryo-EM. Here, we show how inhomogeneity in any projection distribution scheme attenuates the global Fourier Shell Correlation (FSC) in relation to the number of particles and a single geometrical parameter, which we term the sampling compensation factor (SCF). The reciprocal of the SCF is defined as the average over Fourier shells of the reciprocal of the per-particle sampling and normalized to unity for uniform distributions. The SCF therefore ranges from one to zero, with values close to the latter implying large regions of poorly sampled or completely missing data in Fourier space. Using two synthetic test cases, influenza hemagglutinin and human apoferritin, we demonstrate how any amount of sampling inhomogeneity always attenuates the FSC compared to a uniform distribution. We advocate quantitative evaluation of the SCF criterion to approximate the effect of non-uniform sampling on resolution within experimental single-particle cryo-EM reconstructions.
Keywords: Fourier Shell Correlation, single particle analysis, preferred orientation, anisotropy
Introduction
Single-particle cryo-electron microscopy (cryo-EM) has gained increasing popularity for structural analysis of macromolecules and macromolecular assemblies. Numerous technical advances have contributed to improvements in resolution [1–3], throughput [4], and overall usability of the approaches, leading to a wealth of novel insights pertaining to macromolecular structure and function [5]. Although many steps in the single-particle workflow are becoming more streamlined and automated, a principal remaining challenge is posed by non-uniform distributions of the projections that contribute to reconstructed density maps.
Non-uniformity in the distribution of projection orientations recorded in a single-particle imaging experiment originates from adherence of the specimen to one of the two interfaces (top or bottom) of the sample layer on the grid. The interfaces, which may be air-water or support-water (e.g. thin carbon), cause specimens to stick in one of several “preferential orientations”. It is now clear that virtually every specimen prepared for single-particle imaging using conventional blotting techniques adopts a preferential orientation on cryo-EM grids [6]. The reason is that macromolecules, which continuously undergo rapid thermal motion, adhere to interfaces on a time scale that is orders of magnitude shorter than the time needed to blot off excess sample. Recent inkjet dispensing technologies have ameliorated some of the effects of preferential specimen orientation by attempting to outrun sample adherence to interfaces and by minimizing the time between sample application and plunging into liquid ethane [7]. However, such devices do not yet eliminate preferential orientation entirely, and they require high sample concentrations. Furthermore, the growing interest in specimen supports such as graphene [8–11], which also cause preferential orientation, indicates that the effects of non-uniform sampling on final reconstructions will remain problematic for many single-particle experiments.
Numerous approaches have been devised to estimate the quality of angular distributions and their effects on a reconstructed density. These ideas have primarily been developed in conjunction with some measure of anisotropy. One measure derives from the application of a 3D point spread function to estimate the strength of signal above some significance criterion in all directions of the 3D Fourier transform [12]. In another approach, the 3D spectral signal-to-noise ratio (SSNR) is used to define directional resolution differences [13]; the SSNR bears a direct relationship to the Fourier Shell Correlation (FSC), the conventional means of measuring resolution in single-particle cryo-EM [14]. Multiple groups have also described the use of conical FSCs to evaluate anisotropic resolution in tomographic reconstructions [15, 16], and we and others have evaluated anisotropic resolution in single-particle analysis [17, 18]. More recently, the “efficiency” metric [11] was introduced to characterize an orientation distribution, based on the observed relationship between orientation distribution and experimental resolution. We have proposed that an evaluation of anisotropy should be standard for every cryo-EM reconstruction [19].
The consequences of sampling non-uniformity for a reconstructed density map vary and depend on the extent and distribution of projection views. In many experimental cases, one might see a few distinct preferential orientations across the Euler distribution profile, but the resulting map still looks reasonable and is readily interpretable with an atomic model. In more severe cases, an anisotropic distribution leads to apparent elongation of structural features within the map. In such cases, the interpretation of the map may be affected, sometimes severely, due to the appearance of artefactual density parallel to the dominant view [18]. In the most severe cases, structure determination may be stifled altogether. Hallmarks of pathologically anisotropic distributions include inflated Fourier Shell Correlation (FSC) curves, elongated features beyond interpretability, an inability to converge on a final structure, and/or the appearance of false positive orientations in the course of refinement [18]. All these factors can reinforce problems in the density. One interesting observation was that anisotropic orientation distributions lead to an increase in the temperature factor associated with the data, thereby also affecting global resolution [11]. However, this relationship has not been derived from standard models of image formation.
While different measures have been introduced to evaluate the effect of anisotropic distributions on the directional resolution of the reconstructed density map, the effect of sampling on global resolution has largely been neglected. Furthermore, there has been no systematic, quantitative study of the effects of inhomogeneous projection distributions on cryo-EM reconstructions. Here, we examine the relationship between non-uniform angular sampling and global resolution, as measured using conventional analyses in cryo-EM. A major conclusion from our work is that any inhomogeneity, and especially missing information in Fourier space, directly attenuates global resolution in 3D reconstructions, and thus impedes the single-particle experiment.
Section 1. Summary of the major findings
Given a set of projection views, we develop an assessment of the quality of the sampling, chosen on the basis of its expected effect on the spectral signal-to-noise ratio (SSNR) defined through the FSC. We show that, if all other aspects of the problem associated with the overall experimental envelope are held constant, the angular average of the reciprocal of the sampling forms a quantity whose reciprocal attenuates the SSNR. More specifically, we argue:
$$\mathrm{SSNR}(k) = \frac{N\,\mathrm{SCF}(k)}{2k}\,\mathrm{ssnr}_{\mathrm{model}}(k) \tag{1.1}$$
where N is the number of particles and k is spatial frequency. We define ssnrmodel(k) to be the spectral signal-to-noise ratio describing image formation in cryo-EM. This definition is identical to previous formulations of ssnr and incorporates the modulation of the image by a transfer function and the additional presence of noise. The SCF is the “sampling compensation factor”, which we define to be
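$$\mathrm{SCF}(k) = \left[\frac{1}{2k}\left\langle \frac{1}{s_p(\mathbf{k})}\right\rangle\right]^{-1} \tag{1.2}$$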
Here, <·> denotes the average in Fourier space over the nonzero values of the sampling on shells at (approximately) fixed spatial frequency, k, and $s_p$ is the per-particle sampling, determined from the Euler angle assignments.
The factor N/(2k) is the average number of samples at lattice sites a distance k from the Fourier origin. Thus, the SSNR has a very intuitive expression in terms of a single geometrical factor, the average number of observations at the given Fourier radius, and the signal-to-noise ratio of a Fourier voxel:
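$$\mathrm{SSNR}(k) = \mathrm{SCF}(k)\,\langle S(\mathbf{k})\rangle\,\mathrm{ssnr}_{\mathrm{model}}(k), \qquad \langle S(\mathbf{k})\rangle = \frac{N}{2k} \tag{1.3}$$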
Notably, one must compensate for the geometry of the sampling to correctly estimate the SSNR: hence the name, “sampling compensation factor”. The sampling geometry enters only in one place in Eq (1.1), as does the total number of particles, N. One notes from (1.1) that the number of particles necessary to perform a reconstruction also depends inversely on the SCF, with smaller SCFs requiring larger numbers of particles.
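Because the number of particles needed for a reconstruction depends inversely on the SCF, holding the target SSNR and ssnr_model fixed means the required particle count scales as 1/SCF. A minimal Python sketch of that arithmetic follows; the function name `particles_needed` and the use of a reference particle count for uniform sampling are illustrative assumptions, not part of the original analysis.

```python
import math


def particles_needed(n_uniform: float, scf: float) -> float:
    """Particle count that matches the SSNR reached by `n_uniform` particles under
    uniform sampling (SCF = 1), assuming the SSNR scales with the product N * SCF."""
    if not 0.0 < scf <= 1.0:
        raise ValueError("SCF must lie in (0, 1]")
    return n_uniform / scf


# Pure side views have SCF = 8/pi^2 (about 0.81), so roughly 23% more particles are needed.
side_scf = 8.0 / math.pi**2
print(round(particles_needed(100_000, side_scf)))  # prints 123370
```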
In section 2, we derive all the formulae relating sampling to SSNR, including the case with missing data, which requires special handling. In section 3, we derive analytical solutions for the sampling and SCF for a variety of different cases. In section 4, we discuss the linear dependence of the SSNR on N, as well as how to estimate the number of particles needed to perform a reconstruction. In section 5, we show the correspondence between the decrement in signal predicted from the sampling and the actual decrement in the SSNR when reconstructions are performed for two different proteins.
Section 2. Decrement of SSNR due to sampling inhomogeneity
In this section, we derive Eq (1.1), which provides an expression for the SSNR in which all aspects of the sampling are incorporated into two parameters: the number of particles and a single geometrical factor. We assume that the effects of the microscope and the effects of the noise can be approximately decoupled, as is typically assumed in the literature [20–22]. Our treatment from (2.1) to (2.10) is standard; we diverge from the standard treatment at Eq. (2.11), where we account for the anisotropy of the sampling.
In section 2.1, we first consider the case where the voxels in 3D Fourier space are completely measured and derive the SSNR relationship, Eqs (1.1) and (1.2), which is the main result of this paper. In section 2.2, we extend these derivations to cases with missing data, arriving at the adjusted formula for resolution (2.30). We refer to other sources as necessary (Sorzano et al. [22] and Penczek [20]) for more detail on aspects that are not central to the derivations given here.
2.1. Derivation of the Sampling Compensation Factor (SCF):
The generally accepted understanding of 2D projection data after orientation assignment in cryo-EM single-particle analysis is given by:
(2.1) |
Here, the left-hand side of Eq (2.1) is the value at a point in 2D Fourier space measured on projection j, where projection j has data on the 2D grid point labeled by k (see Figure 1). This is the usual Fourier-space description of a “single particle”. Eq (2.1) is our statement of the projection-slice theorem: the measured data should be a slice out of the true 3D map, X, that has been modified in the microscope by a transfer function, Mj(k), and corrupted by additive noise that is identically distributed, with mean zero and a variance that is independent of direction. This is the same set of arguments that appears starting at Eq (7) of [22], as well as elsewhere.
The 3D rotation, Rj, that appears in Eq (2.1) maps the 3D version of the 2D point, namely (kx, ky, 0), to the corresponding 3D point on the map, X, that is being reconstructed (Figure 1). The “Euler angles” for projection j are the angles that appear in the conventional ZYZ representation of the rotation Rj. The factor Mj(k) has been extensively described (Sorzano et al. [22], Penczek [20]); it is an oscillating contrast transfer function (CTF) with a frequency-dependent attenuation caused by various envelope effects. Eq. (2.1) is the generally accepted starting point for cryo-EM data.
We next redefine k to represent points on the 3D grid and shift our attention to the reconstruction of the map in 3D. Following the reasoning of direct Fourier reconstruction, we can average over the samples that contribute to each 3D grid point to estimate the value of the reconstructed map, X, at that point:
(2.2) |
where S(k) is the number of times that the particular 3D point k has been measured (by means of projections, as described above). In a conventional direct Fourier reconstruction, both the running estimate of the reconstruction and the total weights used for interpolation (that is, S(k)) are accumulated as projection data are added. We have used the simplest form of this estimate for the target X, neglecting any dependence on reconstruction weights [23–25].
One key observation is that, after substituting (2.1) into (2.2), the resulting noise is always reduced by a factor of one over the square root of the amount of sampling, 1/√S(k) (see [22]):
(2.3) |
where the “renormalized” noise has mean zero and the same variance as the average of the variances of the constituent noise variables. A complete derivation of Eq. (2.3) can be found in Appendix A. Eq. (2.3) has been written so that the variance of the renormalized noise does not depend on the sampling, parallel to the argument that appears in [22] at about Eq. (11). We have introduced E(k), an effective envelope, which is the average over the samples of the microscope influences (Mj(k)) and also incorporates misalignment effects. Such envelopes have long been considered [26] and can incorporate a variety of factors that attenuate signal, as demonstrated, for example, in [27]. Strictly speaking, Eq (2.3) can only be approximate, but it is consistent with previous analyses [27].
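The 1/√S(k) noise reduction in Eq (2.3) is the familiar behavior of an average of S independent samples. The Monte Carlo sketch below (Python with NumPy; the Gaussian noise model, the constant envelope folded into `signal`, and all names are our assumptions) averages S noisy measurements of one Fourier voxel, as in Eq (2.2), and checks that the mean stays at the signal while the variance falls as 1/S, so that the rescaled (“renormalized”) noise keeps a sampling-independent variance.

```python
import numpy as np


def averaged_estimate(signal: float, S: int, sigma: float = 1.0,
                      trials: int = 100_000, seed: int = 0) -> np.ndarray:
    """Average S noisy measurements of the same Fourier voxel, repeated `trials` times.

    Each measurement is signal + Gaussian noise with standard deviation sigma,
    mimicking Eq (2.2) with a constant effective envelope folded into `signal`.
    """
    rng = np.random.default_rng(seed)
    measurements = signal + rng.normal(0.0, sigma, size=(trials, S))
    return measurements.mean(axis=1)


for S in (1, 4, 16):
    est = averaged_estimate(signal=2.0, S=S)
    # the mean stays at the signal; the variance shrinks as 1/S, so S * variance stays near 1
    print(S, round(est.mean(), 3), round(S * est.var(), 3))
```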
In the typical evaluation of cryo-EM resolution, two independent reconstructions are performed to arrive at half maps which we can write in Fourier space as:
(2.4) |
We are interested in the FSC of half maps drawn from the same statistical ensemble as in (2.3). We therefore consider two maps assembled as in (2.3) and calculate their FSC. Each half map, F and G, should therefore have the form given in Eq. (2.3):
(2.5) |
That both maps experience the same reduction in noise will only be approximately true, but this is the standard treatment. The normal prescription is to compute the correlation of these half maps at a discrete set of wavevector magnitudes and then examine this scalar correlation as a function of wavevector magnitude:
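$$\mathrm{FSC}(k) = \frac{\mathrm{Re}\sum_{|\mathbf{k}'|\approx k} F(\mathbf{k}')\,G^{*}(\mathbf{k}')}{\sqrt{\sum_{|\mathbf{k}'|\approx k} |F(\mathbf{k}')|^{2}\;\sum_{|\mathbf{k}'|\approx k} |G(\mathbf{k}')|^{2}}} \tag{2.6}$$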
(2.7) |
Since F and G are assumed to be statistically similar, we can write (2.6) in shorthand as
$$\mathrm{FSC}(k) \approx \frac{\langle F\,G^{*}\rangle}{\langle |F|^{2}\rangle} \tag{2.8}$$
where we have used <·> to denote averages over shells in Fourier space. Very crudely, it is the cross-correlation divided by the self-correlation; for more rigor, see Sorzano et al. [22] or Penczek [20, 28]. All the calculations for the FSC, such as Eq. (2.8) and those that follow, are more precisely for the expectation of the FSC (see [20] and [22]), but we omit this distinction for clarity.
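For readers who want to compute the quantity in Eqs (2.6) to (2.8) directly, the sketch below evaluates a conventional FSC between two cubic half maps with NumPy, binning Fourier voxels into shells by rounding the radius to the nearest integer. The shell-binning rule and the function name are our choices; production packages differ in details such as shell width and masking.

```python
import numpy as np


def fsc(half1: np.ndarray, half2: np.ndarray, n_shells: int) -> np.ndarray:
    """Fourier Shell Correlation between two real-space half maps of identical cubic shape.

    Shell k collects the Fourier voxels whose radius, in voxel units, rounds to k.
    """
    F, G = np.fft.fftn(half1), np.fft.fftn(half2)
    L = half1.shape[0]
    freq = np.fft.fftfreq(L) * L                     # integer frequencies in FFT order
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    curve = np.zeros(n_shells)
    for k in range(n_shells):
        m = shell == k
        cross = np.real(np.sum(F[m] * np.conj(G[m])))    # cross-correlation on the shell
        auto = np.sqrt(np.sum(np.abs(F[m]) ** 2) * np.sum(np.abs(G[m]) ** 2))
        curve[k] = cross / auto if auto > 0 else 0.0
    return curve


rng = np.random.default_rng(1)
truth = rng.normal(size=(32, 32, 32))
half_a = truth + 0.5 * rng.normal(size=truth.shape)
half_b = truth + 0.5 * rng.normal(size=truth.shape)
print(np.round(fsc(half_a, half_b, 8), 2))
```

With white signal of unit variance and independent noise of variance 0.25 in each half map, the shells beyond the origin should hover around 1/(1 + 0.25) = 0.8.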
Starting from (2.8), we can perform the familiar sort of calculation [20, 22, 29]
(2.9) |
(2.10) |
(2.11) |
where:
(2.12) |
and we have decoupled the noise variance from the sampling in going from Eqs (2.10) to (2.11). There is no a priori reason to anticipate that the noise variances are related to the Euler angle assignments, so the decoupling implicit in going from (2.10) to (2.11) is consistent with standard assumptions. For the half maps, this leads to the following approximate estimate for the expectation of the FSC using the above (2.8), (2.9), (2.11):
(2.13) |
(2.14) |
(2.15) |
(2.16) |
In going from (2.14) to (2.15), we have defined
(2.17) |
which is a signal-to-noise power ratio. We have used “model” here in the sense that this ssnr is directly related to the standard model of image formation in cryo-EM, as for example formulated previously ([22],[20]). The model ssnr incorporates attenuation and/or corruption of information of the imaged object due to the effects of the transfer function and by the presence of noise, but it is not directly observable from standard error analyses (e.g. of beam incoherence, detector inefficiency, mis-assignment of orientation). It can only be indirectly inferred from the data, for example by means of the FSC and knowledge of the projection directions. Also, in going from (2.15) to (2.16), we have defined:
(2.18) |
The expression (2.18) is the same as (1.2), after identifying the per-particle version of the sampling. Notably, all the effects of sampling anisotropy are gathered into a single term: the sampling compensation factor (SCF), as given by (2.18). The factor 2k is grouped into the SCF so that, in a continuum calculation of the average density of sampling, the SCF equals unity for uniform distributions, as we will show in section 3.
Following previous formulations, we can define the spectral signal-to-noise ratio (SSNR):
(2.19) |
Substituting (2.16) into (2.19), we arrive at Eq (1.1):
(2.20) |
where N is the number of particles, SCF is the geometrical factor, k is spatial frequency and ssnrmodel(k) is the signal-to-noise power described by (2.17). Eq (2.20) is essentially the same expression that appears in [29] except for the appearance of the SCF term.
Under certain circumstances, the reconstructed volume may have regions of Fourier space that have not been sampled. Two typical causes are: (1) the set of projection views is poorly distributed (such as predominantly top views), so that Fourier voxels, even very near the Fourier origin, have not been filled; and (2) the set of projection views is reasonably well distributed, but lattice sites further from the Fourier origin are not sampled. Because Fourier voxels that receive no information during reconstruction are traditionally left as zeros, some voxels will not contribute to the angular averages of (2.9) – (2.11). A careful recalculation shows that the amended formula for the SSNR, as defined by (2.19), is still (2.20), except that the angular average in (2.18) used to evaluate the SCF should only be taken over non-zero voxels:
(2.21) |
Eqs (2.20) and (2.21) together constitute our Eq. (1.1). As an aside, we show in section 3 that the average per-particle sampling goes like 1/2k, so that the total sampling follows N/2k. Therefore, in the standard cryo-EM experiment, the total sampling will typically not thin to zero, and the only zeros are the result of deficient projection distributions.
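A minimal NumPy sketch of the shell-averaged SCF, taking the SCF as 2k divided by the shell average of 1/s_p, in line with the normalization described above, and restricting that average to non-zero voxels as Eq (2.21) prescribes. We assume that the per-particle sampling is stored as a cubic array with the Fourier origin at index L // 2 and that shells are formed by rounding the radius to the nearest integer; both conventions, and the function name, are our own.

```python
import numpy as np


def scf_by_shell(s_p: np.ndarray, k_max: int) -> np.ndarray:
    """SCF(k) = 2k / <1/s_p>, averaging only over non-zero voxels on each shell.

    `s_p` is a cubic array of per-particle sampling with the Fourier origin at index L // 2.
    Entry k of the result holds shell k (entry 0 is left as NaN).
    """
    L = s_p.shape[0]
    ax = np.arange(L) - L // 2
    kx, ky, kz = np.meshgrid(ax, ax, ax, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    scf = np.full(k_max + 1, np.nan)
    for k in range(1, k_max + 1):
        vals = s_p[(shell == k) & (s_p > 0)]   # Eq (2.21): skip unmeasured voxels
        if vals.size:
            scf[k] = 2.0 * k / np.mean(1.0 / vals)
    return scf
```

For uniform sampling, s_p = 1/(2k) on every shell and the function returns 1; zeroing part of a shell leaves the result unchanged, which is exactly why the adjusted SSNR* of section 2.2 is needed when data are missing.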
2.2. An adjusted formula for SSNR for half maps with unmeasured data
The SSNR based on half maps has a drawback when some Fourier voxels are left unmeasured. These voxels are typically set to zero in each half map, which leads to smooth but artefactual maps and may yield artificially high resolution measures. We see this in detail later in Section 5, when we examine reconstructions performed from projections in a 45° cone, with a percentage of additional, randomly distributed projections that is decreased in the sequence 10%, 3%, 1%, 0%. There is a sudden increase in the (improperly defined) FSC resolution measure at 0%; we defer discussion of the reconstructed data to that section. Here, we seek an adjusted SSNR expression that allows variance to be assigned to regions of Fourier space that have not been measured. Consider the simplest situation, shown in Figure 2, where some shell of Fourier space is represented by P measured values having mean T and a variance per voxel, varN, given by the reciprocal of the sampling at each measured point. There are also Q unmeasured voxels, which are assigned 0 values in Figure 2A and contribute neither to the signal nor to the variance. Working through the derivation of the SSNR above with the variance of the Q unmeasured voxels set to 0, the ratio of signal to variance, shown in Figure 2A, is proportional to N divided by the average of the reciprocal per-particle sampling over the measured values. However, if one assigns a variance of 1 to each of the Q unmeasured voxels and repeats the same calculation, one arrives at the expression in Figure 2B, whose variance now contains a contribution from the unmeasured voxels that does not decrease with N. In particular, when N is large, the behavior is completely different: in Figure 2A, the SSNR increases without bound, whereas in Figure 2B, the SSNR plateaus to a finite value proportional to the ratio of measured to unmeasured area, P/Q.
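The contrast between Figures 2A and 2B can be reproduced with a few lines of arithmetic. In the sketch below (Python with NumPy; the unit noise variance of a single measurement, the per-voxel signal power `signal_power`, and the function name are our assumptions), each measured voxel has variance 1/(N s_p), and the Q unmeasured voxels carry zero signal together with either zero variance (Figure 2A) or unit variance (Figure 2B).

```python
import numpy as np


def toy_shell_ssnr(N: float, signal_power: float, s_p, Q: int,
                   unmeasured_variance: float) -> float:
    """Signal-to-variance ratio on a shell of P measured and Q unmeasured voxels.

    Measured voxel i has variance 1 / (N * s_p[i]); unmeasured voxels add no signal
    and `unmeasured_variance` each (0 as in Figure 2A, 1 as in Figure 2B).
    """
    s_p = np.asarray(s_p, dtype=float)
    P = s_p.size
    signal = P * signal_power
    variance = np.sum(1.0 / (N * s_p)) + Q * unmeasured_variance
    return signal / variance


s_p = np.full(50, 0.1)          # 50 measured voxels, plus Q = 50 missing ones
for N in (1e3, 1e5, 1e7):
    print(N, toy_shell_ssnr(N, 1.0, s_p, 50, 0.0), toy_shell_ssnr(N, 1.0, s_p, 50, 1.0))
```

The first ratio grows linearly with N, while the second saturates near P·signal_power/Q = 1, the plateau described by Eq (2.32).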
Generalizing the scenario in Figure 2, we consider a Fourier shell at Fourier radius k, and let Pk be the number of voxels that have non-zero sampling and Qk the number of voxels with missing data, so that the shell contains Pk + Qk voxels in total. We calculate adjusted values of the quantities in (2.9) and (2.10), allowing the missing voxels to carry variance. Then:
(2.22) |
(2.23) |
where E is defined through (2.3), N² is the noise variance defined in (2.12), and X(k) is the target structure. Our approach to missing data is now clear: each missing voxel takes on a single unit of noise, unattenuated by any sampling. The fairest assignment for such voxels is one unit of variance and zero units of signal. The adjusted FSC, FSC*, then becomes:
(2.24) |
This leads to an adjusted SSNR, which we develop by starting with its reciprocal:
(2.25) |
(2.26) |
(2.27) |
Thus:
(2.28) |
(2.29) |
which may be rewritten:
(2.30) |
where SCF* is the geometrical factor to use in place of the SCF in the adjusted version of Eq. (1.1). Eq. (2.30) gives an expression for re-evaluating the SSNR from half maps when the original half maps have missing data. One way to think of Eq (2.30) is that it shows how the conventionally constructed SSNR is inflated by not assigning any variance to missing data. Eq. (2.30) also yields a condition under which a correction is necessary. The ratio of occupied to unoccupied voxels at a given Fourier wavevector is typically only a weak function of Fourier magnitude, which makes it a geometrical parameter, similar to the SCF. Therefore, when
(2.31) |
then one should apply the correction factor in the denominator of Eq. (2.30) to obtain a more realistic value of the SSNR. Condition (2.31) states that the unmeasured variance is comparable in magnitude to the measured variance, which decreases in proportion to the number of particles. If the gap is sufficiently narrow, the adjustment can be ignored. In practice, however, if the sampling geometry produces true gaps, then the adjusted formula should be used for any number of particles.
If N is sufficiently large, the resolution is limited solely by the gap. Adding more particles will not improve the SSNR, because additional particles do not resolve the missing voxels, and the already measured region is sufficiently well resolved. The expression for the adjusted SSNR in this regime is most readily read off from (2.27) when the unadjusted value becomes large: the first term on the right-hand side can then be neglected, and taking the reciprocal of the remaining terms gives the limit of large particle numbers with missing data:
(2.32) |
Thus, in the limit of large particle numbers, the adjusted SSNR plateaus to the model ssnr multiplied by the ratio of measured to unmeasured voxels. The expression implies that FSC* deviates from unity even at small k > 0, and that this deficit is not remedied by adding more particles. In this case, the measured voxels are perfectly well sampled, and all the variance is due to the missing values.
To summarize, we derived the relationship between the SSNR and the sampling distribution involved in the reconstruction. The latter enters the formula solely through a single geometrical factor, the SCF, given by Eqs (2.20) and (2.21) (which together reiterate Eq. (1.1)), the main result of this work. In section 3, we derive analytical expressions for the SCF, and in section 5, we evaluate the efficacy of (1.1) using simulated cryo-EM datasets. In the case of missing data, we propose an adjusted expression for the SSNR in place of the one that is usually used: SSNR*(k), given by Eq. (2.30).
Section 3. Numerical and analytical forms for the sampling function, and expressions for the SCF geometrical factor
In Sections 1 and 2, we showed that the entire effect of sampling inhomogeneity on the SSNR can be incorporated into a single geometrical coefficient, the SCF. In this section, we provide numerical and analytical forms for the sampling function, as well as for the geometrical SCF factor that causes the decrement in SSNR curves. In section 3.1, we explain our numerical and analytical approaches for evaluating the sampling and show that they agree in appropriate cases. In section 3.2, we give continuum expressions for the sampling for several families of distributions: 1) a one-parameter family of axially symmetric distributions that span the complements of cones, which we term “side-like”; 2) a one-parameter family of side-like distributions modulated by fluctuations in the ϕ angle; and 3) a two-parameter family of projection views that are constrained to fall within a cone of half-angle α, with, in addition, a fraction ϵ of views randomly scattered through the remainder of Euler space. In section 3.3, we calculate the SCF analytically for each of these distributions using the continuum formalism that we developed, which is valid when the sampling is not too small. The SCF for side-like views ranges from 1 (the maximum, corresponding to uniform) to 8/π² ≈ 0.81 (side views). For the modulated side-view cases, the SCF decreases as a function of λ, the magnitude of the modulation (we restrict λ > 1). This gives a complete parametrization of reasonable sampling, in which the SCF decreases from 1 (uniform) to 0.81 (side) to 0. For the poorly sampled top-like views, we give a closed-form integral expression for SCF(α, ϵ) and evaluate it graphically.
In the case ϵ = 0, there are typically missing values, and the usual expression for the SSNR is problematic because it neglects the variance that can be estimated for the unmeasured voxels using the data already measured on the same shells of Fourier space. Using the expressions developed in Section 2, we show theoretically that properly defined SSNR curves should always improve as the sampling increases (that is, as the percentage of uniformly distributed views that fill the unmeasured region is increased). Figures of the SCF curves and of their dependence on the control parameters are provided throughout.
3.1. Discrete and continuum approaches to the sampling function
3.1.1. Discrete treatment for sampling
The projection-slice theorem [30] states that the 2D Fourier transform of a projection of a 3D map along some direction is the central slice of the 3D Fourier transform perpendicular to that direction, that is, the plane perpendicular to the direction and passing through the origin, as shown in Figure 1. If we think of the map as being rotated by R before projection along the z-axis, then what we term the projection direction, n̂j, is (approximately) perpendicular to the sampled points and is the z-axis carried into the frame of the map by the corresponding rotation. As suggested in Figure 1, each projection, j, samples the set of lattice points k satisfying
$$\left|\mathbf{k}\cdot\hat{\mathbf{n}}_j\right| < \tfrac{1}{2} \tag{3.1}$$
The totality of the discretely sampled points forms a lattice, as shown in Figure 3. Here, a single projection (in Fourier space) is taken along the z-direction, with Fourier magnitudes bounded by the box size, L. Lattice sites in the kz = 0 plane, shown as black dots, are considered to be sampled. Each sampled plane selects a set of lattice points in this manner. Our numerical algorithm hinges on finding the lattice sites that satisfy (3.1) for each projection. As we sum over projections, we accumulate the total number of “viewings” of each lattice site. In direct Fourier inversion, this integer number of “viewings” corresponds roughly to the reconstruction weights.
We term the number of times a particular 3D point, k, is sampled S(k); it is the cardinality of the set of projections, j, that satisfy the criterion of Eq. (3.1) for that k. Therefore:
$$S(\mathbf{k}) = \sum_{j=1}^{N} \Theta\!\left(\left|\mathbf{k}\cdot\hat{\mathbf{n}}_j\right| < \tfrac{1}{2}\right) \tag{3.2}$$
where Θ is the indicator function (see Glossary). We define the per-particle sampling function as:
$$s_p(\mathbf{k}) = \frac{S(\mathbf{k})}{N} \tag{3.3}$$
Eq. (3.1) is used numerically to find the sampling at each voxel: the vector to each voxel is checked against every projection to see whether its dot product with the unit projection direction is sufficiently small (less than ½ in magnitude).
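The counting rule of Eqs (3.1) to (3.3) can be sketched in a few lines of NumPy. This is an illustrative, unoptimized version under our own conventions (an L×L×L integer lattice centred on the origin, projection directions supplied as 3-vectors, hypothetical function names); it is not the authors' code.

```python
import numpy as np


def sampling_counts(directions, L: int) -> np.ndarray:
    """S(k) of Eq (3.2): for each site of an L x L x L lattice centred on the origin,
    count the projections whose central plane passes within half a voxel (Eq 3.1)."""
    ax = np.arange(L) - L // 2
    kx, ky, kz = np.meshgrid(ax, ax, ax, indexing="ij")
    coords = np.stack([kx, ky, kz], axis=-1).astype(float)   # shape (L, L, L, 3)
    counts = np.zeros((L, L, L), dtype=np.int64)
    for n in directions:
        n = np.asarray(n, dtype=float)
        n = n / np.linalg.norm(n)                             # unit projection direction
        counts += np.abs(coords @ n) < 0.5                    # Eq (3.1) criterion
    return counts


def per_particle_sampling(directions, L: int) -> np.ndarray:
    """s_p of Eq (3.3): counts divided by the number of projections N."""
    directions = list(directions)
    return sampling_counts(directions, L) / len(directions)
```

A projection along z marks exactly the kz = 0 plane, as in Figure 3, and the origin is sampled by every projection, so its per-particle sampling is 1.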
We make sufficiently many approximations that the sampling function emerges as a quantity that affects the SSNR independently, coupled only to average microscope effects (and not, for example, to the individual CTF of each particle). We hypothesize that this level of approximation is sufficient to understand the effect of anisotropy on resolution.
3.1.2. A continuum treatment for sampling.
We wish to formulate the expressions analytically whenever possible. Toward this end, we recast (3.2) using Dirac delta functions, which will provide continuum calculations that are both useful and accurate. For a single projection in the z-direction, we would like to employ
$$\Theta\!\left(|k_z| < \tfrac{1}{2}\right) \;\longrightarrow\; \delta(k_z) \tag{3.4}$$
Generally, the Dirac delta function is regarded as $\delta(x) = \lim_{M\to\infty} M\,\Theta\!\left(|x| < \tfrac{1}{2M}\right)$, the limit of a unit-area box of width 1/M as the parameter M becomes arbitrarily large, whereas we have effectively taken M = 1 in (3.1). The delta-function approximation satisfies the proper normalization and should be effective so long as the total sampling is well above unity.
The generalization of (3.1) for continuum calculations using the idea in (3.2) yields
(3.5) |
(3.6) |
where the integral is taken over a (generally continuous) measure on the distribution of projection directions. Eq. (3.5) is the continuum approximation, valid when the box side length, which we denote L, is much larger than 1. This is sufficient for many of our analytical treatments and for the development of formulae, since we are often working far from the Fourier origin. In Eq. (3.5), we treat Fourier space as dimensionless (unitless), which is common practice. To reintroduce units: if one has, in 1D, 200 voxels of size 1 Å, then each Fourier-space voxel has width 1/(200 Å), and the largest distance from the Fourier origin is 100/(200 Å) = 0.5 Å⁻¹ (the Nyquist frequency). The average sampling across a shell at fixed Fourier magnitude can be derived using our continuum treatment. Starting from Eq. (3.4):
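The unit conversion above is a one-line formula: shell index i in a box of n voxels of size a (in Å) corresponds to a spatial frequency of i/(n a). A hypothetical Python helper, for illustration only:

```python
def shell_frequency(shell_index: float, box_voxels: int, voxel_size_A: float) -> float:
    """Spatial frequency, in 1/Angstrom, of Fourier shell `shell_index` in a cubic box."""
    return shell_index / (box_voxels * voxel_size_A)


nyquist = shell_frequency(100, 200, 1.0)
print(nyquist, 1.0 / nyquist)   # prints 0.5 2.0 (frequency, and resolution in Angstrom)
```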
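$$\langle s_p(k)\rangle = \frac{1}{4\pi}\int_0^{2\pi}\! d\phi \int_0^{\pi}\! \sin\theta\, d\theta\;\delta(k\cos\theta) = \frac{1}{2}\int_{-1}^{1}\! du\;\delta(k u) \tag{3.7}$$
$$\langle s_p(k)\rangle = \frac{1}{2k} \tag{3.8}$$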
This is a natural result: when planes (Fourier slices) are placed into a volume, their density must fall off as one over the Fourier radius. A thorough discussion is given in Appendix B.1, including an interpretation of the geometrical factor 1/2k as the ratio of the circumference of a great circle on the Fourier sphere (2πk) to the surface area of that sphere (4πk²). Eq. (3.8) is also the per-particle sampling for a uniform set of projections, but Eqs (3.7) and (3.8) hold for any distribution of projections.
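Eq (3.8) can be checked directly for uniformly distributed projection directions. For a unit vector drawn uniformly from the sphere, its component along any fixed axis is uniform on [−1, 1] (Archimedes' hat-box theorem), so the probability that |k·n̂| < 1/2 is exactly 1/(2k) for k ≥ 1/2, at every point of the shell. The Monte Carlo sketch below (NumPy; the sample size and function name are our choices) illustrates this.

```python
import numpy as np


def mc_per_particle_sampling(k_vec, n_proj: int = 200_000, seed: int = 0) -> float:
    """Fraction of uniformly random projection directions whose central plane
    passes within half a voxel of the Fourier point k_vec (criterion of Eq 3.1)."""
    rng = np.random.default_rng(seed)
    n = rng.normal(size=(n_proj, 3))
    n /= np.linalg.norm(n, axis=1, keepdims=True)      # isotropic unit vectors
    return float(np.mean(np.abs(n @ np.asarray(k_vec, dtype=float)) < 0.5))


for k_vec in [(0, 0, 10), (6, 8, 0), (3, 4, 12)]:
    k = np.linalg.norm(k_vec)
    print(k, mc_per_particle_sampling(k_vec), 1 / (2 * k))
```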
3.1.3. Consistency between numerical and analytical expressions for sampling
As a check of both our code and our analytical treatment, we tested the total amount of sampling in our volume by placing 50,000 projections in a box of size LxLxL with L = 41. We calculated the integer sum, S, of the sampling over all points and evaluated S/4L² numerically to be 1.19. To develop an analytical expression for this quantity, we can write
(3.9) |
where k is spatial frequency. This is the average area of intersection of arbitrarily oriented central planes with a cube of side 2L. In Appendix B.2, we show that the integral evaluates to
(3.10) |
This corroborates our numerical result described above.
3.2: Sampling Function for three different distributions in continuum representation
We calculate the sampling function for three different distributions. The first is the case of the complement to a cone (which we term side-like). The second is for side views with a modulation in the azimuthal Euler angle. Finally, we also calculate the top-like cases, where a certain fraction of uniform views is also included. The side-like views and side-modulated views are each governed by single parameters: i) the cone half angle, α and ii) the modulation parameter, λ. The top-like family of distributions is governed by two parameters: once again, the cone half-angle, α, and a parameter, ϵ, to cover uniform projections in the complement to this region.
Figure 4 shows the projection distributions described in this section, including a schematic representation of the projection distribution (top row), how Fourier space is populated through slice insertion (middle row), and the numerically computed sampling map derived from 10,000 insertions (bottom row). These are displayed for different sampling schemes: uniform (Figure 4A), side-like or complement to a cone (Figure 4B), side (Figure 4C), side-modulated (Figure 4D), and top-like (Figure 4E). In the first four cases, all of the Fourier voxels have been sampled, albeit to varying extents. In the last case, the transform contains a missing cone about the preferred axis; its half-angle, π/2 − α, shrinks as the cone of projection directions widens.
3.2.1. side-like cases (α)
For the side-like case, we have
(3.11) |
where CN is a constant to ensure the normalization (3.8), leading to (see Appendix C):
(3.12) |
(3.13) |
The distribution for side views is obtained by taking the limit α → π/2, which gives:
(3.14) |
Along the z-axis (that is, where k sin θ = 0), sp must be capped at its value at the origin, which is 1.
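Side-like projection directions, the complement of a cone of half-angle α about z, can be drawn uniformly by solid angle by restricting cos θ to [−cos α, cos α]. This is a sketch: we treat the excluded cone as a double cone about ±z, which is equivalent for slice insertion because n and −n define the same central slice. Setting α = 0 recovers the uniform distribution, and α → π/2 gives pure side views. The function name is hypothetical.

```python
import numpy as np


def side_like_directions(n, alpha, seed=0):
    """Unit vectors uniform by solid angle, with polar angle restricted to [alpha, pi - alpha]."""
    rng = np.random.default_rng(seed)
    # Uniform cos(theta) gives uniform area on the sphere; the bounds remove the cones.
    cos_theta = rng.uniform(-np.cos(alpha), np.cos(alpha), n)
    phi = rng.uniform(0.0, 2.0 * np.pi, n)
    sin_theta = np.sqrt(1.0 - cos_theta**2)
    return np.column_stack([sin_theta * np.cos(phi), sin_theta * np.sin(phi), cos_theta])


dirs = side_like_directions(5000, np.radians(60))
print(np.abs(dirs[:, 2]).max(), np.cos(np.radians(60)))
```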
3.2.2. modulated side-views (λ):
A set of modulated side-view projections can be written as a density distribution:
(3.15) |
where ϕn is the azimuthal angle for a projection direction . This therefore gives rise to a sampling given by:
(3.16) |
We describe in appendix B.3 how to select a set of projections with this form, using the cumulative distribution function.
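Drawing azimuths from a prescribed density via its cumulative distribution function can be sketched as follows: tabulate the CDF on a fine grid, then invert it by interpolation. The density 1 + λ cos(2ϕ) used here is a **hypothetical stand-in** for Eq. (3.15), chosen only because it is non-negative for 0 ≤ λ ≤ 1 and vanishes at λ = 1. Any other non-negative density can be substituted without changing the method.

```python
import numpy as np


def modulated_azimuths(n, lam, seed=0, grid=4096):
    """Sample phi in [0, 2*pi) from a density proportional to 1 + lam*cos(2*phi) (assumed form)."""
    rng = np.random.default_rng(seed)
    phi = np.linspace(0.0, 2.0 * np.pi, grid + 1)
    density = 1.0 + lam * np.cos(2.0 * phi)
    # Cumulative trapezoid rule gives a tabulated, non-decreasing CDF.
    cdf = np.concatenate([[0.0], np.cumsum(0.5 * (density[1:] + density[:-1]) * np.diff(phi))])
    cdf /= cdf[-1]
    # Inverse-CDF sampling: map uniform variates through the inverted table.
    return np.interp(rng.uniform(0.0, 1.0, n), cdf, phi)


lam = 0.6
phi = modulated_azimuths(200_000, lam)
# Side-view projection directions lie in the xy-plane at these azimuths.
side_dirs = np.column_stack([np.cos(phi), np.sin(phi), np.zeros_like(phi)])
print(np.mean(np.cos(2 * phi)), lam / 2)
```

For this assumed density, the mean of cos 2ϕ is λ/2, which makes for a quick sanity check of the sampler.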
3.2.3. top-like cases (α, ϵ)
Finally, we consider sampling for the top-like cases:
(3.17) |
which leads to (see Appendix C):
(3.18) |
(3.19) |
Taking α ≪ 1 leads to arbitrarily large values of sp: (for |η| < α). Once again, the sampling needs to be truncated to unity, when α = 0, π/2, that is in the xy plane, if the top-view is taken along the z-direction.
These distributions have missing data, and so we can calculate, for each shell, the ratio of filled, P, to unfilled voxels, Q.
(3.20) |
Because data are missing, we need Eq. (3.20) to compute SSNR*, as argued in Section 2, and to develop an adjusted formula for the SCF of top-like distributions in the next subsection.
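In the continuum limit, a cone of projection directions of half-angle α about z fills only the band of each Fourier shell whose elevation from the xy-plane satisfies |η| ≤ α. That band covers a fraction sin α of the sphere, so P/Q ≈ sin α/(1 − sin α). This is our own derivation, and its form may differ from Eq. (3.20). The sketch below estimates the filled fraction from a finite set of cone directions, assuming one-voxel-thick slices on a shell of radius k. The finite thickness slightly widens the band.

```python
import numpy as np


def filled_fraction(alpha, k, n_views=1000, n_test=2000, seed=0):
    """Fraction of the Fourier shell of radius k hit by at least one slice from a top-like cone."""
    rng = np.random.default_rng(seed)
    # Projection directions uniform (by solid angle) in a cap of half-angle alpha about +z.
    cos_t = rng.uniform(np.cos(alpha), 1.0, n_views)
    phi = rng.uniform(0.0, 2.0 * np.pi, n_views)
    sin_t = np.sqrt(1.0 - cos_t**2)
    views = np.column_stack([sin_t * np.cos(phi), sin_t * np.sin(phi), cos_t])
    # Random test points on the shell.
    u = rng.normal(size=(n_test, 3))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    # A point counts as filled if any one-voxel-thick slice passes through it.
    filled = (np.abs(k * (u @ views.T)) < 0.5).any(axis=1)
    return filled.mean()


alpha = np.radians(45)
p_frac = filled_fraction(alpha, k=30)
print(p_frac, np.sin(alpha), p_frac / (1.0 - p_frac))   # filled fraction, continuum value, P/Q
```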
We also can evaluate the top-like cases, when we add a random distribution of projections, so as to fill in the missing data. Then:
(3.21) |
where ϵ is the fraction of projections that are distributed randomly, and the rest fall in the original cone of half-angle α.
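The top-like family with uniform sprinkling can be generated by drawing a fraction ϵ of the directions uniformly over the sphere and the remainder uniformly inside the cone of half-angle α. This is a sketch with a hypothetical function name. It places the cone about +z only, which suffices because n and −n give the same slice.

```python
import numpy as np


def top_like_directions(n, alpha, eps, seed=0):
    """n unit vectors: a fraction eps uniform on the sphere, the rest inside a cone about +z."""
    rng = np.random.default_rng(seed)
    n_uniform = int(round(eps * n))
    # Lower bound on cos(theta): cos(alpha) for cone views, -1 for the uniform (sprinkled) views.
    low = np.concatenate([np.full(n - n_uniform, np.cos(alpha)), np.full(n_uniform, -1.0)])
    cos_t = rng.uniform(low, 1.0)
    phi = rng.uniform(0.0, 2.0 * np.pi, n)
    sin_t = np.sqrt(1.0 - cos_t**2)
    dirs = np.column_stack([sin_t * np.cos(phi), sin_t * np.sin(phi), cos_t])
    return dirs[rng.permutation(n)]   # shuffle so the sprinkled subset is not contiguous


alpha = np.radians(45)
dirs = top_like_directions(10_000, alpha, eps=0.03)
outside = int(np.sum(dirs[:, 2] < np.cos(alpha)))
print(outside)   # only sprinkled views can fall outside the cone
```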
Section 3.3. The Sampling Compensation Factors for the three different distributions
The SCF is defined via:
(3.22) |
where <·> is the average over solid angle regions that have non-zero values of the sampling. We can evaluate this numerically for the “top-like” and “top-like with uniform” distributions, and analytically for the “side-like” and “modulated side-view” cases.
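A numerical evaluation of the SCF from a sampling map can be sketched as follows. We follow the definition stated in the Abstract: the reciprocal of the SCF is the shell average of the reciprocal per-particle sampling, normalized to unity for uniform views. Using the uniform value 1/(2k) from Eq. (3.8), this gives 1/SCF = ⟨1/s_p⟩/(2k), averaged over voxels with non-zero sampling. This is our reading of Eq. (3.22), not the authors' code. The nearest-neighbour insertion rule (|n · x| < 0.5) and the function names are assumptions.

```python
import numpy as np


def sampling_map(directions, half):
    """Slice-hit counts on a (2*half+1)^3 grid; slice with normal n covers x when |n.x| < 0.5."""
    r = np.arange(-half, half + 1, dtype=float)
    coords = np.stack(np.meshgrid(r, r, r, indexing="ij"), axis=-1).reshape(-1, 3)
    counts = np.zeros(len(coords))
    for n in directions:
        counts += np.abs(coords @ n) < 0.5
    return coords, counts


def shell_scf(directions, k):
    """SCF on shell k: 2k / <1/s_p>, averaged over sampled voxels of that shell."""
    coords, counts = sampling_map(directions, half=k + 2)
    radius = np.linalg.norm(coords, axis=1)
    shell = (np.rint(radius) == k) & (counts > 0)    # non-zero sampling only
    s_p = counts[shell] / len(directions)            # per-particle sampling
    return 2.0 * k / np.mean(1.0 / s_p)


rng = np.random.default_rng(0)
N, k = 2000, 10
uniform = rng.normal(size=(N, 3))
uniform /= np.linalg.norm(uniform, axis=1, keepdims=True)
psi = rng.uniform(0.0, np.pi, N)
side = np.column_stack([np.cos(psi), np.sin(psi), np.zeros(N)])   # pure side views

scf_uniform = shell_scf(uniform, k)
scf_side = shell_scf(side, k)
print(f"uniform: {scf_uniform:.2f}, side views: {scf_side:.2f}")
```

With finite N, the estimate of ⟨1/s_p⟩ is biased slightly upward, so values come out a little below their continuum counterparts.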
In Appendix B.5, we evaluate Eq. (3.22) using (3.14) to arrive at:
(3.23) |
which ranges from arbitrarily large values to (when λ = 0 ; no modulation). In practice, the sampling never achieves a continuum of values, so the expression given by (3.23) cannot be used for λ very close to 1.
One can also show:
(3.24) |
ranges from ( side views) to a value of 1 (α = 0 , uniformly distributed views).
Figure 5 shows schematically the behavior of the SCF for the fully sampled cases, including side-modulated and side-like cases. It is shown there how to continuously vary the SCF from its low values (corresponding to side-modulated cases with sampling suffering from deep pockets) to pure side views to side-like (complement to cone) to its highest value unity (for uniform views).
Finally, we calculate the quantity that represents the top-like situation. These cases, characterized by a single preferential orientation, are often encountered in cryo-EM experiments. Reconstructions from particles with top-like orientation distributions lack information within a conical region of the transform (the missing cone). Because the sampling vanishes over much of this region, the normalization in (3.22) must be calculated carefully, which leads to:
(3.25) |
The right-hand side of (3.25) ranges from 1 (, uniformly distributed views) to 0 (for α = 0, purely top views). The asymptotics are for , and for small α. Thus, we have a set of analytical expressions for the SCF that can run from arbitrarily small values to unity, and from unity to arbitrarily large values. However, any distribution with SCF > 1 involves missing data. Ultimately, the more relevant quantity will be SCF*, which describes how the correctly adjusted SSNR* is decremented by the sampling. Thus, SCF* is bounded by 0 ≤ SCF* ≤ 1.
Repeating the calculation with the additional random projections gives a drastically different SCF, even though adding a partially uniform perturbation changes the distribution only slightly, because now all (or most) of the Fourier points have at least some sampling.
(3.26) |
For such top-like distributions, it is instructive to compare from (2.31) with SCF_top-like,ϵ(α) for small but finite ϵ from Eq. (3.26). From Eqs. (2.31), (3.20), and (3.25), we can derive the ϵ = 0 quantity:
(3.27) |
(3.28) |
Note that , when . For small ϵ, but large N, the second terms of both (3.26) and (3.28) dominate and we get:
(3.29) |
(3.30) |
The crossover between these expressions occurs approximately when
(3.31) |
The decrement in the correctly adjusted SSNR is depicted in Figure 6 for the poorly sampled cases. Eq. (3.28) is the lower bounding curve in gray (ϵ = 0); the other curves represent Eq. (3.26). Curves cross only when ϵ is sufficiently small; in the figure, there are only three crossings. For k = 15 and N = 10⁴, Eq. (3.31) places the crossover between curves at ϵ = 0.03 sin α.
These expressions summarize the effect of missing data. If data are missing in some sizeable region, the adjusted SSNR is drastically reduced. However, even a small fraction of random perturbations quickly reintroduces signal. If there is a gap, the SCF is increased by a factor
(3.32) |
by adding back a fraction ϵ of random perturbations. For N = 10⁴, k = 15, α = 45°, and ϵ = 0.01, the right-hand side becomes 4.7, a large jump for such a small change in ϵ. Conversely, an empty region of Fourier space results in a much lower SCF* than a lightly sampled one.
Section 4. Relationship between SSNR and the number of particles N in a reconstruction
In Section 2, we derived the relationship (2.20) (or equivalently (1.1)), which estimates the SSNR in terms of the sampling. The sampling enters in two ways: through the cumulative, extrinsic effect of the number of particles in the data, and through the shape of the distribution of the sampling (or of the projection directions), an intrinsic quality. When Fourier space is reasonably sampled everywhere, we can assign a single parameter to each: N, the number of particles, and SCF, the sampling compensation factor, defined as in Eq. (1.2). The SSNR is proportional to each quantity, with the SCF attaining its maximum value of unity when the distribution of projections is uniform.
In this section, we revisit the dependence of the SSNR on N, the number of particles, when every other aspect of the problem is held constant:
$$\mathrm{SSNR}(N,k) = N\,\mathrm{ssnr}_{pp}(k) \tag{4.1}$$
for some function, ssnrpp(k), the “per-particle SSNR”, which is the form of Eq. (1.1) with . Eq. (4.1) is the familiar way in which signal accrues in a noisy system when N is the total number of measurements. The per-particle SSNR, which depends on the many factors that corrupt the final reconstruction, is observed to be quite rigid and independent of N, as previously noted [21]. It incorporates multiple components of the cryo-EM pipeline that impair resolution: attenuation due to the microscope transfer function, detector noise, incorrect orientation assignment, and structural heterogeneity, among others. Consequently, the number of particles needed to reach a higher resolution with the same collection scheme can be determined from a single SSNR curve, provided that the curve is sufficiently smooth at the desired resolution; indeed, smoothness of the SSNR curve might serve as another criterion for resolution. The universality of the SSNR/N curves is akin to the familiar Reslog [31] or Guinier [21] analyses.
4.1. Linear dependence of SSNR on N
According to Eq. (4.1), dividing the SSNR by the number of particles yields a universal per-particle curve. To test this idea, we examined sequences of FSC (equivalently, SSNR) curves from reconstructions using successively larger numbers of particles, N, drawn from an experimental dataset that contributed to a 2.9 Å reconstruction of the eukaryotic large ribosomal subunit [32]. Figure 7A shows seventeen experimental FSC curves, from N = 7000 to 70000 particles. As predicted by Eq. (4.1), the series of FSC curves collapses onto a universal curve via SSNR/N, where SSNR = FSC/(1 − FSC), as shown in Figure 7B. Although this idea has appeared formally in many places [21, 31, 33, 34], we have not seen the explicit construction of such universal curves, as highlighted here. For smaller values of N, the ssnrpp(k) curve loses continuity at lower spatial frequencies, which limits our ability to calculate the number of particles needed to achieve higher resolutions, as described below.
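The collapse onto a per-particle curve amounts to converting each FSC curve to SSNR with SSNR = FSC/(1 − FSC) and dividing by its particle count. The sketch below uses a **hypothetical** Gaussian ssnr_pp(k) to generate synthetic curves for several N, including the 7000 and 70000 endpoints mentioned above, and then checks that the collapse recovers it. Real FSC curves can dip below zero at high k, and those points should be masked before this step.

```python
import numpy as np


def fsc_to_ssnr(fsc):
    """SSNR implied by a half-map FSC via SSNR = FSC / (1 - FSC)."""
    fsc = np.asarray(fsc, dtype=float)
    return fsc / (1.0 - fsc)


def collapse_to_per_particle(fsc_curves, n_particles):
    """Divide each curve's SSNR by its particle count to estimate ssnr_pp(k)."""
    return [fsc_to_ssnr(f) / n for f, n in zip(fsc_curves, n_particles)]


k = np.linspace(0.01, 0.35, 200)                 # spatial frequency (1/Å)
true_pp = 1e-3 * np.exp(-80.0 * k**2)            # hypothetical per-particle SSNR
n_particles = [7000, 14000, 35000, 70000]
fsc_curves = [n * true_pp / (1.0 + n * true_pp) for n in n_particles]

collapsed = collapse_to_per_particle(fsc_curves, n_particles)
spread = max(np.max(np.abs(c / true_pp - 1.0)) for c in collapsed)
print(spread)   # relative deviation of every collapsed curve from ssnr_pp
```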
4.2. Number of Particles Necessary for Reconstruction
Eq. (4.1) can be used to predict the number of particles necessary to attain a given resolution for a general envelope function, ssnrpp(k), derived from a single SSNR curve. A common scenario during cryo-EM data collection is one in which the experimentalist asks whether the current approach can achieve a target resolution, given a fixed amount of collection time. Our claim is that there is some N0 such that, for N = N0, we can construct the curve ssnrpp(k) ≡ SSNR(N0, k)/N0 and arrive at a reasonable estimate of the necessary number of particles (it is conceivable to make a lower estimate for the necessary N0, but this is beyond the scope of the current discussion). Thus, a resolution criterion FSC = FSC* implies a criterion SSNR = SSNR* = FSC*/(1 − FSC*) (if FSC* = 0.143 or 0.5, then SSNR* = 0.167 or 1.0, respectively). Next, one defines kT to be the spatial frequency of the target resolution. Then the necessary number of particles, NT, to achieve the target resolution is given by:
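$$N_T = \frac{\mathrm{SSNR}^*}{\mathrm{ssnr}_{pp}(k_T)} = N_0\,\frac{\mathrm{SSNR}^*}{\mathrm{SSNR}(N_0,k_T)} \tag{4.2}$$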
Graphically, we can make a construction on a semilog plot of the original SSNR curve, and realize that
$$\ln N_T - \ln N_0 = \ln \mathrm{SSNR}^* - \ln \mathrm{SSNR}(N_0,k_T) \tag{4.3}$$
which follows from Eq. (4.1), since it implies NT/N0 = SSNR(NT,kT)/SSNR(N0,kT), together with SSNR(NT,kT) = SSNR*. Eq. (4.3) can therefore be used to find graphically the number of particles needed to achieve a target resolution: on the semilog plot, the vertical distance between SSNR(N0,kT) and SSNR* equals the logarithm of the ratio of the particle numbers contributing to each reconstruction. Unlike the other methods discussed in this section, this makes no assumption about the functional form of the per-particle SSNR, ssnrpp(k): it can be exponential (Reslog), Gaussian (Guinier), or any other shape.
The idea is demonstrated for the ribosome sequence of reconstructions in Figure 7C. For convenience, and in line with standard practice in the cryo-EM literature, we used the same two FSC criteria described above, 0.5 and 0.143, which are equivalent to SSNR* = 1 and 0.167, respectively, and analyzed the SSNR curve corresponding to 7000 particles. Using SSNR* = 1 (equivalently, FSC = 0.5), the resolution is measured to be 7.9 Å. To obtain the ratio of particle numbers required to reach a target resolution of 4.2 Å with the same criterion, we measure the difference on the log plot, which is 2.3 ≈ ln 10, i.e., one decade. The prediction is therefore that 10 times the original number of particles is needed to obtain a reconstruction at 4.2 Å. Inspecting the orange dotted curve, which corresponds to 10× the particles, corroborates the prediction: the 70K-particle curve intercepts the SSNR* condition at the predicted 4.2 Å. The identical analysis, repeated with the FSC = 0.143 criterion in Figure 7D, leads to similar conclusions.
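The single-curve prediction NT = N0 · SSNR*/SSNR(N0, kT) can be sketched as follows. The FSC curve is synthetic, generated from a **hypothetical** Gaussian per-particle SSNR; the function name and all numerical parameters except N0 = 7000 and FSC* = 0.143 are illustrative assumptions. The check confirms that a reconstruction from NT particles would reach FSC* exactly at kT.

```python
import numpy as np


def particles_for_target(k, fsc, n0, k_target, fsc_star=0.143):
    """Predict N_T = N0 * SSNR* / SSNR(N0, k_T) from a single FSC curve (Eq. 4.1 scaling)."""
    ssnr = fsc / (1.0 - fsc)
    ssnr_star = fsc_star / (1.0 - fsc_star)
    ssnr_at_target = np.interp(k_target, k, ssnr)   # k must be increasing
    return n0 * ssnr_star / ssnr_at_target


k = np.linspace(0.0, 0.3, 3001)                  # spatial frequency grid (1/Å)
ssnr_pp = 1e-3 * np.exp(-150.0 * k**2)          # hypothetical per-particle SSNR
n0 = 7000
ssnr0 = n0 * ssnr_pp
fsc0 = ssnr0 / (1.0 + ssnr0)                     # FSC implied by SSNR = FSC / (1 - FSC)

n_t = particles_for_target(k, fsc0, n0, k_target=0.2)
ssnr_t = n_t * 1e-3 * np.exp(-150.0 * 0.2**2)
fsc_t = ssnr_t / (1.0 + ssnr_t)
print(round(n_t), fsc_t)
```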
Finally, we note that the SSNR is proportional to the geometrical SCF factor, so distributions with lower SCF (larger fluctuations in the sampling) require proportionally larger numbers of particles. Under typical data collection procedures, the SCF is fixed by the sample preparation and microscope conditions, so it cannot easily be treated as an independent control parameter. The exception is specimen tilting, which alters the orientation distribution, and thus the SCF [18].
4.3. Comparison of graphical methods (Guinier, Reslog, and per-particle SSNR curve)
The Guinier [21] and Reslog [31] formulations are popular for extrapolating the number of particles necessary for reconstruction. We would like to understand the relationship between these graphical constructions and the per-particle SSNR curves. We analyze the Guinier construction in detail and show that its assumptions essentially also imply (4.1), but restrict ssnrpp(k) to a Gaussian form. Our method is therefore slightly more general but, in typical usage, equivalent to these constructions, based on the argument below.
The prescription in Guinier analysis [21] is to estimate the number of particles needed to achieve a given resolution, mark this on a semilog plot of N as a function of the square of the spatial frequency, and repeat. This procedure is presumed to form a line, which can be extrapolated to find the number of particles needed to achieve a desired resolution. That is, knowing N1, define k1 implicitly by SSNR(N1,k1) = SSNR*, where SSNR* is a fixed value of SSNR that demarcates resolution as described above, and define k2 similarly. The Guinier assumption is that:
(4.4) |
for some constant λ for resolutions of interest corresponding to k1, k2. That is, along the fixed contours of SSNR, the change in the square of the resolution is proportional to the logarithm of the ratio of the number of particles used to achieve the SSNR criterion. By means of such a construction, one can estimate the number of particles needed to achieve a higher resolution. Eq (4.4) is easily solved formally as
(4.5) |
where C1 acts like a constant of integration, which depends on the SSNR, which is held fixed in the construction. This implies (exponentiate 4.5):
(4.6) |
for some function H. Every set of SSNR curves of the form (4.6) yields (4.4). The only reasonable choice for H is linear, which matches our result (4.1) when ssnrpp(k) in (4.1) is Gaussian. We note that in light scattering, Guinier plots are used as a low-frequency approximation in which the physical parameters can rigorously be argued to follow the damped Gaussian form indicated by (4.6) [35].
The Reslog analysis [31] is very similar and leads to:
(4.7) |
for some constant wavevector c. Once again, the only reasonable choice for H is linear, leading to Eq. (4.1) with an exponential form for ssnrpp(k).
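The equivalence argued above can be illustrated numerically. If SSNR(N, k) = N · ssnr_pp(k), a Gaussian ssnr_pp makes the Guinier plot (k² at fixed SSNR* versus ln N) a straight line, and an exponential ssnr_pp does the same for the Reslog plot (k versus ln N). Both envelopes and their constants below are **hypothetical**, and the helper name is ours.

```python
import numpy as np


def resolution_at(n, ssnr_pp, k, ssnr_star):
    """Spatial frequency at which N * ssnr_pp(k) falls to ssnr_star (ssnr_pp decreasing in k)."""
    ssnr = n * ssnr_pp
    # np.interp needs increasing x-values, so reverse the monotonically decreasing curve.
    return np.interp(ssnr_star, ssnr[::-1], k[::-1])


k = np.linspace(0.0, 0.5, 20001)
n_values = np.array([5e3, 1e4, 2e4, 5e4, 1e5])
ssnr_star = 0.143 / (1 - 0.143)

gaussian = 1e-3 * np.exp(-100.0 * k**2)   # hypothetical Guinier-type ssnr_pp (B = 100)
exponential = 1e-3 * np.exp(-k / 0.02)    # hypothetical Reslog-type ssnr_pp (c = 0.02)

k_gauss = np.array([resolution_at(n, gaussian, k, ssnr_star) for n in n_values])
k_exp = np.array([resolution_at(n, exponential, k, ssnr_star) for n in n_values])

# Guinier plot: k^2 against ln N; Reslog plot: k against ln N. Both should be straight lines.
slope_g, _ = np.polyfit(np.log(n_values), k_gauss**2, 1)
slope_r, _ = np.polyfit(np.log(n_values), k_exp, 1)
print(slope_g, 1 / 100.0)   # Guinier slope should equal 1/B
print(slope_r, 0.02)        # Reslog slope should equal c
```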
Heymann [29] made an identical argument to arrive at our Eq. (4.1) and used the Guinier analysis, based partly on formal results on blurring [36] and other envelopes [27]. As an aside, just as a superposition of many time scales [37] can create effective 1/frequency noise in physical systems, the many differing sources of noise in cryo-EM may mean that, depending on the experimental circumstances, a log-envelope that is linear in k is as valid as one that is quadratic. In any case, Heymann suggests a Gaussian form for ssnrpp, consistent with the Guinier analysis. Although Heymann arrives at Eq. (4.1), he does not arrive at our Eq. (1.1), because possible anisotropy is not discussed; he therefore uses the per-particle sampling of a uniform distribution, 1/(2k), which is our Eq. (3.8) and Eq. (9) of Heymann [29].
Section 5. Decrement of SSNR through non-uniform sampling
In Section 4, we observed that the SSNR depends on the sampling in two ways: extrinsically, through the number of particles N (as has already been discussed in the literature), and intrinsically, through the shape of the sampling, governed by the geometrical factor of the sampling map that we have termed the SCF. In this section, we test whether the SCF (or SCF*) has the effect on the SSNR predicted by Eq. (1.1) and explained in Section 3.3. We examine sequences of reconstructions of two proteins that differ in size and shape, the influenza hemagglutinin trimer and human apoferritin, for all the situations for which we calculated SCF (or SCF*) values in Section 3.3. In each case, we compare the SSNR curves of the reconstructions with the baseline case, a set of uniformly distributed views.
5.1. Methods
5.1.1. Generation of projection distributions
We generated a set of 10,000 projection Euler angles for sequences of different sampling distributions, each of which is described in section 3.2. We evaluated three different schemes for modulating the projection distribution and comparing to the uniform distribution, as depicted in Figure 8. For the well-sampled side-like sequence, we used pure side views and modulated side views with a set of modulation parameters given by λ = 0.4, 0.6, 0.8, and 1.0, (Figure 8A). For the first of the more poorly-sampled cases, we selected top-like projections, distributed within varying cone sizes of half angular width (5°, 30° and 45°), and fixed a small amount of random projections (3%) distributed evenly across the rest of Euler space (Figure 8B). This scenario evaluates the effect of increasing cone size. For the second of the more poorly-sampled cases, we fixed the cone size to be 45° and added random assignments of 0%, 1%, 3% and 10% evenly distributed projections across the rest of Euler space (Figure 8C). This scenario evaluates the effect of increasing the amount of random projection “sprinkling” in the presence of an otherwise fixed distribution.
5.1.2. Synthetic data generation with distinct projection distributions
To test our idea relating the effect of a single geometrical parameter to the SSNR, we generated synthetic datasets corresponding to two proteins of differing size and shape, namely the hemagglutinin (HA) trimer and apoferritin. The synthetic data generation followed previously described protocols [18, 38–40]. Briefly, 10,000 projections were generated from cryo-EM maps of either HA or apoferritin, according to the viewing directions described in Section 5.1.1. These projections were shifted and rotated at random, and noise was added. Next, a distribution of CTFs was applied to the 2D projections, followed by an additional layer of noise to arrive at an SNR of approximately 0.05, which is consistent with experimental cryo-EM data [41]. A reconstruction was performed with the known orientations using the Frealign software, and the usual FSC was calculated between half-maps. In parallel, the angle assignments were used to calculate sampling maps, as described in Section 3 (and shown graphically in Figure 4). From the sampling maps, the SCFs were calculated numerically by implementing (3.22).
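The final noise step, which brings the images to a target SNR, can be sketched as adding white Gaussian noise with variance var(signal)/SNR. This assumes that SNR is defined as the ratio of signal variance to noise variance over the whole image. Other conventions, such as measuring within a particle mask, change σ, and the cited protocols may differ. The Gaussian blob is only a stand-in for a CTF-modulated projection, and the function name is hypothetical.

```python
import numpy as np


def add_noise_for_snr(image, snr, rng):
    """Add white Gaussian noise so that var(image) / var(noise) is approximately snr."""
    sigma = np.sqrt(np.var(image) / snr)
    return image + rng.normal(0.0, sigma, size=image.shape)


rng = np.random.default_rng(0)
yy, xx = np.mgrid[-64:64, -64:64]
projection = np.exp(-(xx**2 + yy**2) / (2 * 15.0**2))   # stand-in for a projection image
noisy = add_noise_for_snr(projection, 0.05, rng)
measured_snr = np.var(projection) / np.var(noisy - projection)
print(round(measured_snr, 3))
```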
5.2. Results comparing decrements predicted by sampling and reconstructions
We tested how well the SCF geometrical parameter, based solely on the projection directions, could predict the decrement of the SSNR, with all other aspects of the problem held constant.
5.2.1. Side-like and side-modulated sampling cases
We first proceeded to test the predictive ability of the SCF on well-sampled cases, where most of the values of the sampling remain reasonably high and each index point is sufficiently sampled, e.g. above 20. From a theoretical perspective, we should expect that the ideas set forth are most accurate in this scenario. This is a typical case in cryo-EM reconstructions, even if some views are dominant. In the well sampled cases, all the structure factors remain at play, so we expect that the formulae relating SSNR to SCF are reasonably accurate. The situation is presented in Figure 9, where we describe the effect on reconstructions for a uniform case (Figure 4A), for side views (Figure 4C), and for modulated side-views (Figure 4D). These cases are the most frequently encountered scenarios in experimental cryo-EM datasets characterized by complete Fourier sampling. For both reconstructions of HA and apoferritin, in comparison to uniform, the SSNR curves are attenuated for side sampling in accordance with the amount of sampling inhomogeneity (Figure 9A–C). Side views have a range of sampling values over the surface of a Fourier sphere of radius, k, from the on-axis values to those on the orthogonal plane with a max-min ratio of πk. For the modulated side-view case, the ratio is even larger: πk /(1 – λ), where λ is the strength of the modulation. Nevertheless, the agreement between the decrement in SSNR and the SCF, as shown in the table in Figure 9D, is acceptable for both HA and Apoferritin.
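For pure side views, the continuum per-particle sampling can be written down directly. The following is our own derivation, consistent with the cap at unity on the z-axis noted for Eq. (3.14). A Fourier point at distance k sin θ from the z-axis is hit by a fraction 1/(πk sin θ) of uniformly spread side-view slices, so sp = min(1, 1/(πk sin θ)). The sketch integrates this over the shell with the midpoint rule. It checks that the shell average is close to 1/(2k), that the max-min ratio is πk as quoted above, and that the SCF (normalized so that uniform views give 1) approaches 8/π² ≈ 0.81, an attenuation of roughly 20%.

```python
import numpy as np


def side_view_shell_stats(k, m=200_000):
    """Shell averages <s_p> and <1/s_p>, and max/min ratio, for continuum pure side views."""
    dtheta = np.pi / m
    theta = (np.arange(m) + 0.5) * dtheta           # midpoints in (0, pi)
    s_p = np.minimum(1.0, 1.0 / (np.pi * k * np.sin(theta)))
    w = 0.5 * np.sin(theta) * dtheta                # normalized solid-angle weight per bin
    mean_s = np.sum(w * s_p)
    mean_inv_s = np.sum(w / s_p)
    return mean_s, mean_inv_s, s_p.max() / s_p.min()


k = 15
mean_s, mean_inv_s, ratio = side_view_shell_stats(k)
scf_side = 2 * k / mean_inv_s
print(2 * k * mean_s, ratio / (np.pi * k), scf_side, 8 / np.pi**2)
```

The first printed value falls slightly below 1 because of the cap near the axis; the deficit shrinks as k grows.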
5.2.2. Top-like sampling cases for varying cone sizes
We then proceeded to describe cases that would be reflective of predominantly top views. These cases are often encountered in experimental cryo-EM datasets characterized by a single preferential orientation and no additional views. For this scenario, we constructed the two-parameter family of distributions described by α and ϵ, where the former represents the half-angle of the cone from which the projections are drawn, and the latter represents the percentage of uniform projections besides those drawn from the cone. First, we vary the size of the cone, while fixing 3% uniform sampling across the rest of Euler space. Figure 10A–C shows how the SSNR is attenuated for reconstructions generated from such top-like distributions containing a fixed amount of sprinkled projections. In these cases, the maximum to minimum sampling can be as large as 1 + 4/(πϵα), for small ϵ,α according to the analytical formula. Nevertheless, in Figure 10D, the multiplicative shift determined from SCF (both numerically and from formulae) approximately matches the decrement in SSNR.
5.2.3. Top-like sampling cases for fixed cone size and varying fraction of randomly sprinkled projections
Finally, we took the same two parameter family as in Section 5.2.2, but examined a fixed cone size, and varied the fraction of random projections. This case is intended to show how altering a sampling distribution, even to small extents, can affect the SCF. The scenario has implications for modifying and/or altering orientation distributions in experimental cryo-EM datasets. Figure 11A–C shows how the SSNR is attenuated for reconstructions generated from such top-like distributions containing a fixed cone and varied number of random projections. The first observation from this data, as we explained in Section 3, is the artefactual increase in the SSNR for cases with completely missing data (black dotted curve in Figure 11A–B). This stems from the singularity in the theory for how the SSNR is typically defined, and a separate formula is needed to properly account for the variation that is implicitly missing, in half maps created from sets of projections with missing data. The adjusted formula from Eq. (3.27) pushes the SSNR curve to the appropriate ordering of the curves, where increasing the sampling always increases the SSNR (black solid curve in Figure 11A–B). Theoretically, the other curves (1%, 3%, 10%) should not be adjusted, because there is sufficient sampling to add information to the missing regions. In practice, there is also a small shift in those curves, which is not shown for the sake of clarity. The second observation from this data is that, for cases with large gaps in Fourier space, a small amount of additional projections goes a long way in increasing the SSNR. This is not surprising. Even in the early days of reconstructions, it was realized that, for under-sampled cases, adding small amounts of information to deficient parts of Fourier space greatly improves the ability to solve the reconstruction problem [42]. 
The experimental attenuations of the SSNR are also in line with the geometrical decrement of the SCF in continuum calculations (compare Figures 11 and 6). As in the previous cases described above, Figure 11D shows that the multiplicative shift determined from the SCF (both numerically and from formulae) approximately matches the decrement in SSNR.
Discussion
In this work, we show that non-uniformity of the set of projection views drives down properly defined global resolution measures. Our calculations rest on the standard assumption that the envelope stabilizes at resolutions better than about 10 Å [21]. The SSNR resolution measure estimates the ratio of the signal power to the variance of its estimate (the noise power). By ordinary statistics, the variance of the estimate in each voxel decreases in inverse proportion to its sampling. Therefore, if we assume that the noise variance approximately decouples from the sampling, the average over Fourier shells of the reciprocal sampling arises naturally in the expression for the SSNR, leading to Eqs. (1.1) and (1.2). Thus, the measure of sampling efficacy that we advocate, the SCF, emerges naturally if we wish to isolate the effects of the sampling geometry on the resolution. Incorporating the SCF is the step that distinguishes our calculations from similar ones, such as [29].
The SCF can be obtained directly from a set of Euler angle distributions for a given dataset and describes the extent by which the sampling geometry attenuates global resolution. Our formulations assume that the sampling distribution is known. However, in an experimental setting, this can only be approximate, and in certain cases even misleading. Euler angle assignments depend on numerous factors, including the type of sampling distribution inherent to the data. For most cases, in particular those that are fully sampled, the approximate assignment of orientations is probably close enough, such that slight deviations from their true values should not dramatically affect the SCF. However, severe mis-assignment of Euler angles can occur during cryo-EM refinement, especially for “top-like” cases characterized by missing data. For example, we previously showed that for a pathological case, Euler angles are incorrectly assigned during the course of refinement and as a function of added particles in the data; this led to erroneous features in the resulting map [18], and would likewise skew the experimentally derived SCF. Thus, future work must be able to properly account for errors in orientation assignment in order to accurately estimate how the sampling geometry attenuates resolution.
A typical cryo-EM reconstruction procedure carries along information that can be represented by three maps: two half-maps and a sampling map that can be created from knowledge of the angle-assignments or that can be taken to be the map of reconstruction weights in a direct Fourier reconstruction. From these maps, one can estimate up to second moments and continue to combine information to arrive at better reconstructions. Ultimately, one arrives at a mean map, variance map, and sampling map, or three pieces of information per voxel. If there is missing data, then there is a pathology in the way that SSNR is typically defined. Although defining the mean of the missing values to zero is acceptable (and forms the best estimate of the original structure), setting the variance to zero is problematic, since there is enough information to give a better estimate. We find a self-consistent correction to the ordinary SSNR and showed in section 5 that the redefined SSNR always increases with more uniform sampling, as should be expected.
We also demonstrated the linear dependence of the SSNR on the total sampling, which is governed by the number of particles. This was implicit in earlier analyses of Guinier or Reslog, as shown in the mathematical description of section 4, but takes on a simpler form here. We show that these latter constructions imply a definite functional form for the SSNR, which is more restrictive than necessary. We provide the mathematical argument, that one can estimate the number of particles necessary to achieve a higher resolution, using the same collection strategy, but with a single SSNR curve, provided that the curve is sufficiently continuous over the resolution ranges in question. This has value during data acquisition, since it can inform the experimentalist how a given collection might be altered or abandoned based on the goals of the experiment, and the prediction is achieved without the need to recalculate reconstructions using particle subsets.
An important consideration is the case with missing data. For the fully sampled cases (Figure 4A–D), increasing the number of particles leads to a corresponding increase in the SSNR, and therefore one can explicitly estimate the expected improvement in resolution with increasing N. However, for the case with missing data (Figure 4E), increasing the number of particles only improves the SSNR within the measured region, whereas the unmeasured region remains devoid of information. In this case, the adjusted SSNR plateaus at a value related to the ratio of measured to unmeasured voxels (defined in Eq. 2.32 and described in the corresponding section). The extent of missing data (the “pathology” of the sampling distribution in the single-particle case) can have significant effects on the quality and interpretability of a map. Therefore, one cannot expect to improve the global SSNR simply by adding particles, and it may be more beneficial to improve the sampling distribution. As we show in Figure 6, even small alterations to the orientation distribution can substantially improve the global SSNR.
There are several major implications from the current work. Most importantly, the direct relationship between sampling and global resolution in single-particle cryo-EM implies that any deviation from uniformity always drives down the SSNR, and thereby leads to an increase in the number of particles that are required for attaining a specified resolution. There is a persistent problem of preferred specimen orientation (and consequently non-uniform projection distributions) that appears to affect the vast majority of single-particle reconstructions [6]. This means that virtually all data sets are characterized by non-ideal imaging and image processing conditions. As dictated by Eq. 1.1 (also Eq. 2.20) the experimental situation therefore requires optimizing two parameters – the experimental “envelope” as well as the sampling distribution. Here, we use the term “envelope” in a broad sense to encompass all of the factors that attenuate experimental resolution. These include, but are not limited to, beam coherence, ice thickness (and its effect on the background signal-to-noise ratio), quantum efficiency of the detector, residual specimen movement that is not corrected by motion correction, errors in computational orientation assignment, structural heterogeneity, and any other factors that generally attenuate experimental resolution, as measured by the FSC. In addition to the envelope, the sampling distribution matters. To reach the hypothetical resolution limit for small particles [43], it is therefore essential to not only improve hardware and software, but also techniques for specimen preparation, in order to maximize sampling uniformity on cryo-EM grids. Some effort toward this goal is ongoing [7], but more needs to be done. Along these lines, the more symmetric the particle, the more orientations are sampled during the reconstruction process. 
Therefore, symmetry does not merely multiply the number of particles in the data by the order of the symmetry group. The improved sampling of symmetric particles also contributes to gains in SSNR by virtue of improvements to the SCF. Thus, symmetry has a dual effect, improving both data quantity and quality. In part for this reason, cases like AAV [2] and apoferritin [3] have pushed the resolution limits and are associated with very low temperature factors (or slowly decreasing envelopes) in the data.
Beyond attenuation of global resolution, the extent to which the map suffers as a consequence of incomplete sampling is currently unclear. Specimens with high C- or D-fold symmetry that are characterized by pure side views are, strictly speaking, anisotropic. However, the effect at the level of the reconstructed map is negligible. The experimentalist should not notice differences in structural details when directly comparing to a map reconstructed from a uniform sampling distribution. Nonetheless, as we show in Figure 9 and emphasize throughout this work, the SSNR for pure side views is still attenuated by ~20% in comparison to uniform sampling. Thus the amount of data required to reach a given resolution is increased by approximately the same percentage. Beyond these simple cases, multiple factors currently complicate an exhaustive analysis of experimental maps characterized by different symmetries and sampling geometries. First, it will be necessary to decouple the effect of anisotropy (in its strict definition, impacting directional resolution) from the attenuating effect on global resolution. More worryingly, however, we believe that in certain extreme cases of missing data there may be a systematic bias in the resolutions reported in the field, caused by artefactual inflation of the FSC (for example, as observed in Figure 11). In part for this reason, we introduced the FSC* and SSNR* criteria, which compensate for missing views in Euler space and report a more realistic resolution for the pathological cases. FSC* and SSNR* can, in principle, be extended to highly under-sampled orientations that may be prevalent in experimental situations. Application of these criteria to experimental data, and a careful analysis of the underlying sources of error and resulting statistics, will be the subject of future work.
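The ~20% attenuation quoted above for pure side views can be checked with a worked example of an idealized case. The following are our assumptions, for illustration only:

- Projection directions n̂ = (cos β, sin β, 0) are spread uniformly in β.
- The per-particle sampling is the orientation average s_p(k) = ⟨δ(n̂·k)⟩, which equals 1/(2k) for a uniform distribution.
- 1/SCF = ⟨1/(2k s_p)⟩ over a Fourier shell.

For a shell point at polar angle θ:

$$2k\,s_p(\theta) = \frac{2k}{2\pi}\int_0^{2\pi}\delta\big(k\sin\theta\cos\beta\big)\,d\beta = \frac{2}{\pi\sin\theta},$$

$$\frac{1}{\mathrm{SCF}} = \Big\langle\frac{\pi\sin\theta}{2}\Big\rangle_{\text{shell}} = \frac{\pi}{4}\int_0^{\pi}\sin^2\theta\,d\theta = \frac{\pi^2}{8}, \qquad \mathrm{SCF} = \frac{8}{\pi^2} \approx 0.81.$$

Suppose the SSNR at a fixed envelope scales as N·SCF. Then this corresponds to an SSNR about 19% below that of uniform sampling, and roughly π²/8 ≈ 1.23 times as many particles are needed to recover it.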
Experimental improvements to the sampling distribution can be achieved by tilting the specimen inside the electron microscope. However, this comes at the cost of degraded image quality [18]. The direct relationship between sampling and resolution indicates that any attenuation due to sampling can now be compared with other types of experimental attenuation, for example due to beam-induced movement, ice thickness, or errors in the image processing pipeline. Thus, a natural direction will be to compare two effects during data acquisition at tilt angles: the resolution gained from improved orientation sampling and the resolution lost from degraded image quality. Such studies will help to establish quantitatively an optimal tilt angle for any dataset with a defined sampling distribution.
Finally, the SCF provides a direct means of evaluating a sampling distribution on an intuitive scale ranging from 0 to 1. We propose using the SCF to evaluate the Euler angle assignments of sets of particles that produce 3D reconstructions in cryo-EM.
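A minimal Monte Carlo sketch of such an evaluation follows. It is not the authors' implementation, and it rests on these assumptions:

- Each projection direction n̂ (for example, the viewing direction derived from a particle's Euler angles) samples the central plane n̂·k = 0.
- On a shell, 2k s_p is estimated as the fraction of projections with |n̂·k̂| < δ, divided by δ. This thin slab stands in for the delta function.
- The slab width δ, the number of shell directions and all function names are our own choices.

The SCF is then 1/⟨1/(2k s_p)⟩ over the shell, and any unsampled shell direction drives it to 0. Under these assumptions, the SCF should come out close to:

- 1 for uniformly distributed projections;
- 8/π² ≈ 0.81 for pure side views;
- 0 for a single top view.

```python
import numpy as np


def random_unit_vectors(n, rng):
    """Draw n directions uniformly on the unit sphere."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def estimate_scf(proj_dirs, n_shell=2000, delta=0.02, seed=0):
    """Rough Monte Carlo SCF estimate for an (N, 3) array of projection directions.

    For each shell direction khat, 2*k*s_p is approximated by the fraction of
    projections with |n . khat| < delta, divided by delta (equal to 1 on average
    for uniformly distributed projections).
    """
    rng = np.random.default_rng(seed)
    khat = random_unit_vectors(n_shell, rng)          # sample directions on a Fourier shell
    rel = np.empty(n_shell)
    for start in range(0, n_shell, 200):              # blocks keep the N x 200 array small
        block = khat[start:start + 200]
        hits = np.abs(block @ proj_dirs.T) < delta
        rel[start:start + 200] = hits.mean(axis=1) / delta
    if np.any(rel == 0.0):
        return 0.0                                    # part of the shell is never sampled
    return float(1.0 / np.mean(1.0 / rel))            # 1/SCF = <1/(2 k s_p)> over the shell


rng = np.random.default_rng(42)
n_proj = 20000
uniform = random_unit_vectors(n_proj, rng)
beta = rng.uniform(0.0, 2.0 * np.pi, n_proj)
side_views = np.column_stack([np.cos(beta), np.sin(beta), np.zeros(n_proj)])
top_views = np.tile([0.0, 0.0, 1.0], (n_proj, 1))

scf_uniform = estimate_scf(uniform)
scf_side = estimate_scf(side_views)
scf_top = estimate_scf(top_views)
print(f"uniform {scf_uniform:.3f}, side {scf_side:.3f} (8/pi^2 = {8 / np.pi**2:.3f}), top {scf_top:.3f}")
```

The slab width trades bias against noise. Smaller δ approximates the delta function more closely, but it needs more projections per shell direction to keep the counts well above zero.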
Acknowledgements
PRB would like to thank Pawel Penczek for conceptual explanations pertaining to the current work. PRB and DL would like to thank David De Rosier for discussions. PRB and DL are supported by grants from the NIH: DP5-OD021396, R01 AI136680, and U54 GM103368.
Glossary
- FSC(k)
is the Fourier shell correlation of the reconstruction at Fourier frequency k.
- SSNR(k)
is the spectral signal-to-noise ratio of the reconstruction at Fourier frequency k.
- ssnrmodel(k)
is the model SSNR, defined by Eq. (2.17) and used in Sections 1 and 2.
- ssnrpp(k)
is the per-particle SSNR, closely related to ssnrmodel(k); see Eq. (4.1).
- L
the side length of the real-space box
- N
the number of particles in the reconstruction
- k
is the Fourier magnitude
is a 2D or 3D point in Fourier space
- SCF
is the sampling compensation factor, which characterizes the effect of sampling on the SSNR
- Sp()
sp() is the sampling function (and per-particle sampling function), defined at a 3D lattice site
- Fj()
is the Fourier value of the jth projection at the point .
- Mj(k)
is the effect of the microscope (CTF) on the jth projection
- X()
() is the target model and a running estimate of the model
- Rj
is a 3D rotation matrix describing the projection, j.
- Nj()
is the noise added to the projection, j.
- E(k)
is the total envelope that attenuates the image owing to microscope and misalignment effects.
- ()
is the effective noise at 3D lattice sites after regrouping from projections
is the power of the noise
- F, G
are half maps used to derive FSC relations in Section 2
- Pk, Qk
the number of measured and unmeasured voxels, on a Fourier shell of radius k, when there is missing data.
is used as a unit vector demarcating a projection
- Θ (x)
indicator function, which is 1 if the condition x is true, 0 otherwise
- λ
is the amplitude of the modulation of side views (Section 3)
- α
is a cone half-angle: for top-like views, projections are inside the cone; for side-like views, projections are outside the cone.
- ϵ
is the fraction of projections that are not restricted to be in the main cone of half angle α (Section 3)
- Euler Angle
is one of the three angles used to describe rotation matrices (θ is rotation around the Z-axis, ϕ is rotation around the Y-axis, ψ is in-plane rotation)
Appendix A. Heuristic proof of (2.3) from (2.1), (2.2)
From
(A.1) |
and
(A.2) |
We argue
(A.3) |
Once the data have been aligned, the information contributed by a given projection, j, is a noisy version of the actual target map, X. This version has possibly been misaligned and altered by the microscope transfer function (the CTF). So
(A.4) |
This is consistent with (A.1). Strictly speaking, there will be reconstruction weights: the rotation described in (A.1) does not map a 2D point precisely onto a 3D lattice site [23–25]. In this work, we did not consider alignment errors in the data, but in general one could; these would give rise to the phase errors in (A.4). As discussed in [36], rotational errors generally cannot be treated so simply, but Jensen [27] maintains that this can be done approximately.
We assume that the noise terms from different projections in (A.4), at the same Fourier point, all have the same variance:
(A.5) |
Therefore each (for all j), and also , is independent and identically distributed (iid) white noise with the same variance. Since iid white noise is essentially defined by its variance, (A.5) completes the definition of .
Now we take averages over the projections, labeled by j, as in Eq. (A.2). It is standard to decouple the first terms:
(A.6) |
All the current and prior work assumes that < Mj(k) >j may be modeled by a single envelope, say E1(k), which is normally taken to be isotropic; this assumption is made in all the other cited papers. The second term to consider is the average over the phase errors. Following Jensen [27] or Baldwin and Penczek [36], these are taken to be alignment errors. When averaged, they give rise to a blurring that is generally modeled as a Gaussian in spatial frequency; Heymann also discusses this point [29]. Strictly speaking, this is not true for rotational errors, because the proper basis for rotational blurring is a polar one. However, it was modeled successfully in the Jensen paper and is generally accepted. So, the phase-error average is thought to give rise to a second envelope, E2(k). Multiplied by the first, it gives a combined envelope: E(k) ≡ E1(k) E2(k). This brings Eq. (A.6) to
(A.7) |
With the assumptions given above, the variance of the last term can be calculated:
(A.8) |
(A.9) |
Eq. (A.9) follows from Eq. (A.8) after using the definition of < >j from Eq. (A.2). For convenience, we have treated all terms above as real (complexifying is a minor and straightforward detail). The main point now is that the noises do not contribute systematically to the sum unless j1 = j2, because the cross terms of independent noises have zero expectation. This holds on average; moreover, if the number of terms in the sum is large, the j1 = j2 terms dominate the j1 ≠ j2 terms. Approximately, we can write:
(A.10) |
But the last term in < > is simply the original variance given by Eq. (A.5) above. This variance is modeled to be independent of the projection (and of the sampling) that delivers the information to the lattice site. The sum over j1 results in a factor of in the sum (because the noises are assumed iid), yielding:
(A.11) |
The last term of (A.7), therefore, can be replaced by an effective noise, , with the same statistics as the original, so long as we downweight by . Eq. (A.7) then yields
(A.12) |
This is a robust, and usually numerically effective, result of the central limit theorem. After regrouping, the variance of the resultant noise is reduced, relative to the systematic part, by a factor equal to the number of samples. This is a central tenet of the current work. Eq. (A.12) becomes our best estimate for the reconstruction:
(A.13) |
Eq. (A.13) is the workhorse of theoretical work in cryo-EM. Usually the anisotropy of the noise is neglected as well. The only shortcoming of the previous work cited in our manuscript is that it does not incorporate the effect of fluctuations in the reciprocal sampling on the SSNR, even though doing so is not prohibitively difficult.
Regrouping of noise always reduces the noise standard deviation by the square root of the number of samples (equivalently, the variance by the number of samples). This is a mainstay of statistics in its applications to science. It is the reason that thermodynamic fluctuations are so small. It is also why the root-mean-square distance of a random walker from the origin scales only as the square root of the number of steps, whereas a walker moving with constant velocity travels a distance proportional to the time, or number of steps. Because the sampling at intermediate resolutions remains fairly high, our use of the central limit theorem is robust.
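The scaling invoked in (A.10)–(A.12) can be illustrated with a toy simulation. The choice of Gaussian noise, ±1 random-walk steps, and the trial counts are our own assumptions for illustration. The simulation checks two things:

- Averaging n iid noise samples reduces the variance by a factor of n.
- The rms distance of an n-step random walk grows only as √n.

```python
import numpy as np

rng = np.random.default_rng(0)
trials, sigma = 20_000, 1.0
var_of_mean, rms_walk = {}, {}
for n in (1, 4, 16, 64):
    # average n iid Gaussian noise samples (all with variance sigma**2) in each trial
    mean_noise = rng.normal(0.0, sigma, size=(trials, n)).mean(axis=1)
    var_of_mean[n] = float(mean_noise.var())           # expected: sigma**2 / n
    # random walk of n unit steps: the rms distance grows like sqrt(n), not like n
    steps = rng.choice([-1.0, 1.0], size=(trials, n))
    rms_walk[n] = float(np.sqrt(np.mean(steps.sum(axis=1) ** 2)))
    print(n, round(var_of_mean[n] * n, 3), round(rms_walk[n] / np.sqrt(n), 3))
```

Both printed ratios should stay close to 1 for every n, which is the behavior the regrouping argument relies on.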
Appendix B. Some details for calculations in Section 3: geometrical factor for decay of density Eq (3.5), checking numerical sampling code Eq (3.9), creating distributions according to some prescribed function Eq (3.15), proof that uniform distribution maximizes the SCF Eq (3.22), derivation of Eq. (3.23) for the SCF for modulated side-views
B.1. A general formula for the projection geometrical factor: Eq. (3.5)
Our claim in Eq (3.5) is that
(B.1) |
where cp is a geometrical factor that we wish to calculate in general dimensions, especially for D = 2, 3. By , we mean the average over the surface of the unit ball in D dimensions. One easy way is to integrate the above equation over all with k < L in D dimensions. Then, on the left-hand side, we get:
(B.2) |
(B.3) |
(B.4) |
(B.5) |
where V_{D−1} is the volume of the unit ball in D − 1 dimensions. Eq. (B.3) holds because the integrand in (B.2) no longer depends on the direction, , so the average over seen in (B.2) integrates to 1. Moreover, where it appears in the integral, may be set to for convenience. On the right-hand side of (B.1), we also perform the integration over the ball of radius L and get
(B.6) |
(B.7) |
(B.8) |
where AD is the surface area of the unit ball in D dimensions.
Equating the last two expressions shows that
(B.9) |
This gives
(B.10) |
and
(B.11) |
Therefore, the geometrical factor 2 that appears in Eq 3.5 is simply the ratio of the surface area of a unit ball to the circumference of a great circle of the same ball.
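The general-dimension result can be checked numerically. This rests on our reading of (B.1), in our notation: the average ⟨δ(n̂·k)⟩ over uniformly distributed unit vectors n̂ equals 1/(cp k). On that reading, integrating over the ball of radius L gives cp = A_D/((D − 1)V_{D−1}). Since (D − 1)V_{D−1} is the area A_{D−1} of the unit sphere in D − 1 dimensions, this equals A_D/A_{D−1}.

The sketch evaluates this ratio with the standard formula A_D = 2π^{D/2}/Γ(D/2), giving c_3 = 2 and c_2 = π. It then cross-checks 1/c_D by Monte Carlo, using a thin slab |n̂·k̂| < δ as a stand-in for the delta function. The function names and slab width are our own.

```python
import math

import numpy as np


def sphere_area(d):
    """Surface area A_d of the unit sphere bounding the unit ball in d dimensions."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)


def geometric_factor(d):
    """c_p = A_D / A_{D-1}: unit-sphere area over the area of a great (D-2)-sphere."""
    return sphere_area(d) / sphere_area(d - 1)


def mc_plane_density(d, n=1_000_000, delta=0.01, seed=0):
    """Monte Carlo estimate of <delta(n . khat)> for n uniform on the unit sphere, |khat| = 1.

    The delta function is replaced by a slab of half-width delta; by symmetry,
    khat can be taken along the first coordinate axis.
    """
    rng = np.random.default_rng(seed)
    v = rng.normal(size=(n, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return float(np.mean(np.abs(v[:, 0]) < delta) / (2.0 * delta))


for d in (2, 3):
    print(d, geometric_factor(d), 1.0 / mc_plane_density(d))
```

In 3D the product n̂·k̂ is uniformly distributed on [−1, 1], which is why its density at zero is exactly 1/2 = 1/c_3.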
B.2. Checking the Sampling Code Eq. (3.9)
We want to evaluate
(B.12) |
(B.13) |
It is enough to consider the positive octant, where all the components are positive.
This is where both the azimuthal angle, ϕ, and the polar angle, θ, are in the range [0, π/2]. This gives us a symmetrization factor of 8. In addition, we may impose a definite ordering on kx, ky, kz, giving a further symmetrization factor of 6. Putting this together, we have
(B.14) |
We wish to reorder the integrations: first k, then θ, then ϕ. The components may be written in spherical coordinates as (kx, ky, kz) ≡ k (sin θ cos ϕ, sin θ sin ϕ, cos θ).
Now 0 ≤ ky ≤ kx is easily represented by 0 ≤ ϕ ≤ π/4. Let’s write down what we have so far:
(B.15) |
To ensure the last two inequalities, we need k ≤ L/cos θ and tan θ cos ϕ ≤ 1. This last inequality can be used to set the upper limit of the θ integration, in place of π/2, because tan θ always attains the value 1/cos ϕ on the interval [0, π/2].
Putting this all together and developing we get
(B.16) |
(B.17) |
(B.18) |
(B.19) |
(B.20) |
To evaluate the last integral, I, we use the substitution ; the limits for γ then become ϕ = 0 ↔ γ = 0 and ϕ = π/4 ↔ γ = π/3. We also have , . This leads to , which can be simplified to . So
(B.21) |
(B.22) |
(B.23) |
(B.24) |
(B.25) |
(B.26) |
(B.27) |
(B.28) |
(B.29) |
Finally, now
(B.30) |
This factor is the empirically observed excess area of an average plane embedded in a cube, that is, an increase of approximately 20 per cent in actively sampled points. It is larger than the comparable factor of 1.12 that appears in the analogous 2D problem.
(B.31) |
Another approach to evaluating (B.9) is to introduce an auxiliary variable via . Then there are just a few steps to a single integral and a numerical evaluation: , which is (B.30).
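The two factors quoted above (about 1.2 in 3D and 1.12 in 2D) can be reproduced numerically. This is our own reconstruction of the quantity computed in (B.12)–(B.30), not the authors' sampling code. We assume, as in (B.1), that uniformly distributed projections give a per-particle sampling of 1/(cp k), with cp = 2 in 3D and π in 2D. Then the total sampling inside a cube (square) of half-side L, divided by the area (length) of a face (side), equals the average area (length) of a randomly oriented central plane (line) relative to that face.

- In 2D, this ratio is (4/π) ln(1 + √2) ≈ 1.12.
- In 3D, the ordering 0 ≤ ky ≤ kx ≤ kz used in (B.14)–(B.15) reduces it to 3∫₀^{π/4}(√(1 + sec²ϕ) − 1) dϕ. Evaluating this integral analytically gives (3/2) ln(2 + √3) − π/4 ≈ 1.190.

The sketch compares the 3D integral with that closed form and with a Monte Carlo estimate of E[1/|k|] for k uniform in [−1, 1]³. This is the same ratio, written as an average over the cube.

```python
import numpy as np


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    return h / 3.0 * (y[0] + y[-1] + 4.0 * y[1:-1:2].sum() + 2.0 * y[2:-1:2].sum())


# 2D: mean chord of a random central line in [-1, 1]^2, relative to the side length 2
ratio_2d = 4.0 / np.pi * np.log(1.0 + np.sqrt(2.0))

# 3D: mean area of a random central plane in [-1, 1]^3, relative to the face area 4,
# after the 48-fold symmetry reduction 0 <= k_y <= k_x <= k_z <= 1
ratio_3d = 3.0 * simpson(lambda p: np.sqrt(1.0 + 1.0 / np.cos(p) ** 2) - 1.0, 0.0, np.pi / 4)
closed_form_3d = 1.5 * np.log(2.0 + np.sqrt(3.0)) - np.pi / 4

# Monte Carlo cross-check: the same 3D ratio equals E[1/|k|] for k uniform in the cube
rng = np.random.default_rng(0)
k = rng.uniform(-1.0, 1.0, size=(1_000_000, 3))
ratio_3d_mc = float(np.mean(1.0 / np.linalg.norm(k, axis=1)))

print(f"2D: {ratio_2d:.4f}  3D: {ratio_3d:.6f} (closed form {closed_form_3d:.6f}, MC {ratio_3d_mc:.4f})")
```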
B.3. Creating distributions according to some prescribed function Eq (3.15)
In order to create a numerical sampling map for modulated side views, we would like to assign azimuthal angles to projections such that the oscillatory azimuthal distribution density indicated by (3.11) is achieved. The procedure is well known; for completeness, we include the argument here. From the density function (3.11), the cumulative distribution function is found to be
(B.32) |
Now the azimuthal angle should be given by
(B.33) |
That is, numbers should be drawn evenly between 0 and 2π, resulting in an array of angles given by (B.33). These are the angle labels that achieve the desired distribution (3.11). So long as λ < 1, this is easy to do, because the distribution is positive and the cdf is monotonically increasing. Graphically, the inverse corresponds to flipping across the diagonal, which maps a function onto another function owing to monotonicity. The Python pseudocode would read: phi0 = cdf0 = np.linspace(0, 2*np.pi, NumPoints); cdf = (the cdf (B.32) evaluated on cdf0); cdfInv = np.interp(phi0, cdf, cdf0). That is, the regularly spaced array phi0 is mapped to the desired angles (cdfInv) in the same manner that cdf was mapped to cdf0, where phi0 and cdf0 are both regularly spaced.
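A runnable sketch of the inverse-cdf recipe above follows. Because (3.11) and (B.32) are not reproduced here, we hypothetically assume an azimuthal density proportional to 1 + λ cos 2ϕ. Its cdf, scaled to [0, 2π], is ϕ + (λ/2) sin 2ϕ. For a different modulation, substitute the actual (B.32). The function name and grid size are our own.

```python
import numpy as np


def modulated_azimuths(num_points, lam, grid_size=4097):
    """Azimuths with density proportional to 1 + lam*cos(2*phi) (an assumed form of (3.11)).

    Tabulates the cdf on a regular grid (cdf0), then interpolates regularly spaced
    levels (phi0) back onto angles, i.e. evaluates the inverse cdf.
    """
    if not abs(lam) < 1.0:
        raise ValueError("need |lam| < 1 so that the density stays positive")
    cdf0 = np.linspace(0.0, 2.0 * np.pi, grid_size)        # regular grid of angles
    cdf = cdf0 + 0.5 * lam * np.sin(2.0 * cdf0)             # strictly increasing, cdf(2*pi) = 2*pi
    phi0 = np.linspace(0.0, 2.0 * np.pi, num_points)        # levels drawn evenly in [0, 2*pi]
    return np.interp(phi0, cdf, cdf0)                        # cdfInv: the desired angles


phis = modulated_azimuths(100_000, lam=0.5)
# fraction below pi/4 should match the cdf there: (pi/4 + lam/2) / (2*pi)
print(np.mean(phis < np.pi / 4), (np.pi / 4 + 0.25) / (2 * np.pi))
```

np.interp requires its second argument to be increasing. This is the monotonicity condition λ < 1 from the text.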
B.4. Proof that uniform distribution maximizes the SCF Eq (3.22).
Consider a set of positive numbers {ai} that satisfies a constraint C: Σi ai = M. The set represents the sampling on the unit sphere. We wish to maximize the SCF, which amounts to minimizing Σi 1/ai, subject to C. We begin by writing the usual variational functional:
(B.34) $$\mathcal{F}(\{a_i\}) = \sum_i \frac{1}{a_i} + \mu\Big(\sum_i a_i - M\Big)$$
where μ is a Lagrange parameter. Extremizing wrt the aj yields
(B.35) $$-\frac{1}{a_j^{2}} + \mu = 0 \quad\Longrightarrow\quad a_j = \mu^{-1/2} \ \text{for all } j$$
The second variation is:
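(B.36) $$\frac{\partial^{2}}{\partial a_j\,\partial a_k}\Big[\sum_i \frac{1}{a_i}\Big] = \frac{2\,\delta_{jk}}{a_j^{3}} > 0$$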
Since the second variation is positive, the uniform solution aj = constant, corresponds to a minimum.
This argument explains why the SCF attains its maximum (1/SCF attains its minimum, as in the above calculation) when the sampling is distributed as uniformly as possible or, equivalently, when the projections are distributed uniformly. The sampling is a conserved quantity on every shell of Fourier space.
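The arithmetic–harmonic mean inequality gives the same conclusion directly. For positive ai with Σi ai = M over n sites, Σi 1/ai ≥ n²/M, with equality only when every ai = M/n. The quick check below draws random positive samplings with a fixed total and confirms that the ratio (n²/M)/Σi 1/ai never exceeds 1. This ratio is a discrete stand-in for the SCF. The shell size, total, and Dirichlet draws are hypothetical choices for illustration.

```python
import numpy as np


def reciprocal_sum(a):
    """Sum of 1/a_i, the quantity minimized in (B.34)-(B.36)."""
    return float(np.sum(1.0 / a))


n_sites, M = 50, 100.0                        # hypothetical shell size and total sampling
uniform = np.full(n_sites, M / n_sites)
best = reciprocal_sum(uniform)                 # equals the AM-HM bound n_sites**2 / M

rng = np.random.default_rng(1)
ratios = []
for _ in range(1000):
    a = rng.dirichlet(np.ones(n_sites)) * M    # random positive sampling with the same total M
    # discrete analogue of the SCF: 1 for uniform sampling, below 1 otherwise
    ratios.append(best / reciprocal_sum(a))
print(best, max(ratios))
```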
B.5. Derivation of Eq. (3.23); SCF for modulated side-views
From Eq. 3.16, we have
(B.37) |
Using the definition of the SCF, 1/SCF = <1/(2k sp)>, (B.37) becomes
(B.38) |
where the last term in the integrand of (B.38) is the reciprocal of (B.37). The integration over θ can easily be performed, leaving:
(B.39) |
(B.40) |
Integrals of the sort that appear in (B.40) are easily reduced by means of the so-called Weierstrass half-angle formula: . The integral in (B.40) becomes . So the expression in (B.40) becomes:
(B.41) |
which is (3.23).
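The Weierstrass substitution t = tan(ϕ/2) turns integrals of this type into rational integrals. The standard result is ∫₀^{2π} dϕ/(1 + λ cos ϕ) = 2π/√(1 − λ²) for |λ| < 1, and the same value holds with cos 2ϕ in place of cos ϕ. Since (B.40) itself is not reproduced here, the sketch only checks this generic identity numerically. The form of the integrand is our assumption. The rectangle rule is used because it converges very quickly for smooth periodic integrands.

```python
import numpy as np


def periodic_integral(f, n=4096):
    """Rectangle rule over one period [0, 2*pi); very accurate for smooth periodic f."""
    phi = np.arange(n) * (2.0 * np.pi / n)
    return (2.0 * np.pi / n) * np.sum(f(phi))


errors = []
for lam in (0.0, 0.3, 0.6, 0.9):
    closed_form = 2.0 * np.pi / np.sqrt(1.0 - lam ** 2)
    for m in (1, 2):  # cos(phi) and cos(2*phi) give the same integral over a full period
        numeric = periodic_integral(lambda p: 1.0 / (1.0 + lam * np.cos(m * p)))
        errors.append(abs(numeric - closed_form))
print(max(errors))
```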
Appendix C. Derivation of Eq. (3.12) and (3.18): sampling distributions from projection distributions
Eq (3.11) reads
(C.1) |
where integrations are taken over all unit vectors, , in 3D. Also, CN,side is a normalization constant ensuring Eq. (3.8): , where denotes the angular average over the angles in k with the uniform measure on the sphere. The integration in (C.1) is over the set of normal vectors to the sphere, subject to the given constraint. Putting this together with (C.1) yields:
(C.2) |
(C.3) |
(C.4) |
Now cannot be a function of the direction of . So it can be conveniently calculated when , which does not depend on an azimuthal angle in the integration over , and therefore leads only to the average over the altitude. This leads to:
(C.5) |
(C.6) |
(C.7) |
Eq. (C.7) is a natural result. It is the ratio of the circumference of a great circle to the surface area of the unit sphere: 2π/4π = 1/2. Returning to (C.4), we get:
(C.8) |
(C.9) |
(C.10) |
(C.11) |
So, substituting (C.11) into (C.1) yields
(C.12) |
It is easy to argue that does not depend on the azimuthal angle of , which we can therefore take to be zero in order to evaluate (C.12): . Instead of integrating over the sphere of unit vectors, , we need to perform the integral in (C.12) over the great circle perpendicular to . Therefore, we can parametrize in the integration in (C.12) by
(C.13) |
Eq. (C.13) parametrizes all the unit vectors perpendicular to , as described in the last paragraph. By varying β, we sweep out the unit vectors given by (C.13): these form the locus of vectors normal to and outside the cone of half-angle α. So, from (C.12),
(C.14) |
The criterion Θ(|sin θ sin β| < cos α) in (C.14) is a restatement of the constraint on the projection directions, , from Eq. B.12. Continuing from Eq. (C.14):
(C.15) |
If cos α > sin θ, then the argument of the indicator function in (C.15) is always true. Otherwise, the upper limit of β in the integral must be reduced to asin(cos α / sin θ). This leads to:
(C.16) |
which is (3.12).
Finally
(C.17) |
Using (C.8) and (C.17), together with the argument parallel to (C.1)–(C.7), we note that
(C.18) |
So
(C.19) |
The parallel derivation to (C.14) now becomes:
(C.20) |
This is the integration around the locus of points normal to and inside the cone of half-angle α. Moreover, sin β may be replaced by cos β by a shift of origin, and an overall factor of 4 introduced owing to the four equivalent quadrants:
(C.21) |
If cos α > sin θ, then the condition of the indicator function cannot be fulfilled, and the left-hand side vanishes. Otherwise,
(C.22) |
So
(C.23) |
(C.24) |
This is (3.18). Thus, for the top-like cases, the sampling is zero in directions close to the z-axis (sin θ < cos α).
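The sampling profiles implied by (C.14)–(C.16) and (C.20)–(C.22) can be tabulated numerically. Since (3.12) and (3.18) are not reproduced here, the sketch uses our own reading of those steps, with ϵ = 0. Projection directions are uniform in the allowed region, so their density relative to uniform is 1/cos α (side-like) or 1/(1 − cos α) (top-like). This density is multiplied by the fraction of the great circle normal to k̂ that satisfies the indicator function.

The sketch checks three things:

- Each profile averages to 1 over the shell.
- The side-like SCF tends to the pure-side-view value 8/π² ≈ 0.81 as α → π/2.
- Top-like sampling gives SCF = 0.

The function names are hypothetical.

```python
import numpy as np


def sampling_side_like(theta, alpha):
    """2*k*s_p(theta) for projections uniform on |n_z| < cos(alpha), i.e. outside a cone
    of half-angle alpha about z (our reading of (C.14)-(C.16)); requires alpha < pi/2."""
    c = np.cos(alpha)
    ratio = np.minimum(c / np.maximum(np.sin(theta), 1e-300), 1.0)
    # allowed fraction of the great circle normal to khat, divided by the band's area fraction c
    return (2.0 / np.pi) * np.arcsin(ratio) / c


def sampling_top_like(theta, alpha):
    """2*k*s_p(theta) for projections uniform on |n_z| > cos(alpha), i.e. inside a cone
    of half-angle alpha about z (our reading of (C.20)-(C.22)); requires alpha > 0."""
    c = np.cos(alpha)
    ratio = np.minimum(c / np.maximum(np.sin(theta), 1e-300), 1.0)
    return (2.0 / np.pi) * np.arccos(ratio) / (1.0 - c)


def shell_average_and_scf(profile, n=200_000):
    """Return <2 k s_p> over the shell (should be ~1) and SCF = 1 / <1/(2 k s_p)>."""
    dtheta = 0.5 * np.pi / n
    theta = (np.arange(n) + 0.5) * dtheta     # midpoints on [0, pi/2]; profiles are symmetric about the equator
    weights = np.sin(theta) * dtheta          # uniform measure on the shell; sums to ~1
    values = profile(theta)
    mean_sampling = float(np.sum(values * weights))
    if np.any(values <= 0.0):
        return mean_sampling, 0.0             # unsampled directions drive the SCF to 0
    return mean_sampling, float(1.0 / np.sum(weights / values))


results = {
    "side, alpha=0": shell_average_and_scf(lambda t: sampling_side_like(t, 0.0)),
    "side, alpha=1.0": shell_average_and_scf(lambda t: sampling_side_like(t, 1.0)),
    "side, alpha~pi/2": shell_average_and_scf(lambda t: sampling_side_like(t, np.pi / 2 - 1e-3)),
    "top, alpha=0.5": shell_average_and_scf(lambda t: sampling_top_like(t, 0.5)),
}
for name, (mean_sampling, scf) in results.items():
    print(f"{name}: <2k s_p> = {mean_sampling:.4f}, SCF = {scf:.4f}")
```

The check that every profile averages to 1 reflects the conservation of sampling on each shell noted in Appendix B: restricting the projections redistributes the sampling but does not change its total.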
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References
- 1. Bartesaghi A, et al., Atomic Resolution Cryo-EM Structure of beta-Galactosidase. Structure, 2018. 26(6): p. 848–856 e3.
- 2. Tan YZ, et al., Sub-2 Å Ewald curvature corrected structure of an AAV2 capsid variant. Nat Commun, 2018. 9(1): p. 3628.
- 3. Zivanov J, et al., New tools for automated high-resolution cryo-EM structure determination in RELION-3. Elife, 2018. 7.
- 4. Tan YZ, et al., Automated data collection in single particle electron microscopy. Microscopy (Oxf), 2016. 65(1): p. 43–56.
- 5. Fernandez-Leiro R and Scheres SH, Unravelling biological macromolecules with cryo-electron microscopy. Nature, 2016. 537(7620): p. 339–46.
- 6. Noble AJ, et al., Routine single particle CryoEM sample and grid characterization by tomography. Elife, 2018. 7.
- 7. Noble AJ, et al., Reducing effects of particle adsorption to the air-water interface in cryo-EM. Nat Methods, 2018. 15(10): p. 793–795.
- 8. D’Imprima E, et al., Protein denaturation at the air-water interface and how to prevent it. Elife, 2019. 8.
- 9. Fan X, et al., Single particle cryo-EM reconstruction of 52 kDa streptavidin at 3.2 Angstrom resolution. Nat Commun, 2019. 10(1): p. 2386.
- 10. Liu N, et al., Bioactive Functionalized Monolayer Graphene for High-Resolution Cryo-Electron Microscopy. J Am Chem Soc, 2019. 141(9): p. 4016–4025.
- 11. Naydenova K and Russo CJ, Measuring the effects of particle orientation to improve the efficiency of electron cryomicroscopy. Nat Commun, 2017. 8(1): p. 629.
- 12. Grigorieff N, Three-dimensional structure of bovine NADH:ubiquinone oxidoreductase (complex I) at 22 Å in ice. J Mol Biol, 1998. 277(5): p. 1033–46.
- 13. Penczek PA, Three-dimensional spectral signal-to-noise ratio for a class of reconstruction algorithms. J Struct Biol, 2002. 138(1–2): p. 34–46.
- 14. Harauz G and van Heel M, Exact Filters for General Geometry Three Dimensional Reconstruction. Optik, 1986. 73: p. 146–156.
- 15. Diebolder CA, et al., Conical Fourier shell correlation applied to electron tomograms. J Struct Biol, 2015. 190(2): p. 215–23.
- 16. Dudkina NV, et al., Interaction of complexes I, III, and IV within the bovine respirasome by single particle cryoelectron tomography. Proc Natl Acad Sci U S A, 2011. 108(37): p. 15196–200.
- 17. Dang S, et al., Cryo-EM structures of the TMEM16A calcium-activated chloride channel. Nature, 2017. 552(7685): p. 426–429.
- 18. Tan YZ, et al., Addressing preferred specimen orientation in single-particle cryo-EM through tilting. Nat Methods, 2017. 14(8): p. 793–796.
- 19. Lyumkis D, Challenges and opportunities in cryo-EM single-particle analysis. J Biol Chem, 2019. 294(13): p. 5181–5197.
- 20. Penczek PA, Resolution measures in molecular electron microscopy. Methods Enzymol, 2010. 482: p. 73–100.
- 21. Rosenthal PB and Henderson R, Optimal determination of particle orientation, absolute hand, and contrast loss in single-particle electron cryomicroscopy. J Mol Biol, 2003. 333(4): p. 721–45.
- 22. Sorzano CO, et al., A review of resolution measures and related aspects in 3D Electron Microscopy. Prog Biophys Mol Biol, 2017. 124: p. 1–30.
- 23. Abrishami V, et al., A fast iterative convolution weighting approach for gridding-based direct Fourier three-dimensional reconstruction with correction for the contrast transfer function. Ultramicroscopy, 2015. 157: p. 79–87.
- 24. Penczek PA, Renka R, and Schomberg H, Gridding-based direct Fourier inversion of the three-dimensional ray transform. J Opt Soc Am A Opt Image Sci Vis, 2004. 21(4): p. 499–509.
- 25. Scheres SH, RELION: implementation of a Bayesian approach to cryo-EM structure determination. J Struct Biol, 2012. 180(3): p. 519–30.
- 26. Frank J, The envelope of electron microscopic transfer functions for partially coherent illumination. Optik, 1973. 38: p. 519–536.
- 27. Jensen GJ, Alignment error envelopes for single particle analysis. J Struct Biol, 2001. 133(2–3): p. 143–55.
- 28. Penczek PA, Image restoration in cryo-electron microscopy. Methods Enzymol, 2010. 482: p. 35–72.
- 29. Heymann JB, Single-particle reconstruction statistics: a diagnostic tool in solving biomolecular structures by cryo-EM. Acta Crystallogr F Struct Biol Commun, 2019. 75(Pt 1): p. 33–44.
- 30. Bracewell RN, Strip Integration in Radio Astronomy. Australian Journal of Physics, 1956. 9: p. 198.
- 31. Stagg SM, et al., ResLog plots as an empirical metric of the quality of cryo-EM reconstructions. J Struct Biol, 2014. 185(3): p. 418–26.
- 32. Passos DO and Lyumkis D, Single-particle cryoEM analysis at near-atomic resolution from several thousand asymmetric subunits. J Struct Biol, 2015. 192(2): p. 235–44.
- 33. Saad A, et al., Fourier amplitude decay of electron cryomicroscopic images of single particles and effects on structure determination. J Struct Biol, 2001. 133(1): p. 32–42.
- 34. Stagg SM, et al., A test-bed for optimizing high-resolution single particle reconstructions. J Struct Biol, 2008. 163(1): p. 29–39.
- 35. Plot G, http://gisaxs.com/index.php/Guinier_plot, 2019.
- 36. Baldwin PR and Penczek PA, Estimating alignment errors in sets of 2-D images. J Struct Biol, 2005. 150(2): p. 211–25.
- 37. Milotti E, 1/f noise: a pedagogical review. 2013.
- 38. Lyumkis D, et al., Likelihood-based classification of cryo-EM images using FREALIGN. J Struct Biol, 2013. 183(3): p. 377–388.
- 39. Voss NR, et al., A toolbox for ab initio 3-D reconstructions in single-particle electron microscopy. J Struct Biol, 2010. 169(3): p. 389–98.
- 40. Zhang C, et al., Analysis of discrete local variability and structural covariance in macromolecular assemblies using Cryo-EM and focused classification. Ultramicroscopy, 2018.
- 41. Baxter WT, et al., Determination of signal-to-noise ratios and spectral SNRs in cryo-EM low-dose imaging of molecules. J Struct Biol, 2009. 166(2): p. 126–32.
- 42. Crowther RA, et al., Three dimensional reconstructions of spherical viruses by fourier synthesis from electron micrographs. Nature, 1970. 226(5244): p. 421–5.
- 43. Henderson R, The potential and limitations of neutrons, electrons and X-rays for atomic resolution microscopy of unstained biological molecules. Q Rev Biophys, 1995. 28(2): p. 171–93.